Submitting RDF

From NINESWiki

RDF is the metadata format that contributors use to make their resources available for use within NINES, 18thConnect and MESA. With RDF, contributors describe each of their resources in general terms that allow those resources to be categorized and searched through COLLEX.

For more basic information about RDF and the principles of generating it for your project, see this page at NINES.


Samples

Below is a generic, mock RDF file. We have also made several RDF samples available, along with samples of XSLT for transforming XML source files into RDF metadata.

RDF Mock-up

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:dcterms="http://purl.org/dc/terms/"
      xmlns:collex="http://www.collex.org/schema#"
      xmlns:ra="http://www.rossettiarchive.org/schema#"
      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
      xmlns:role="http://www.loc.gov/loc.terms/relators/">

   <YOUR:NAMESPACE rdf:about="UNIQUE_OBJECT_ID">

<!-- choose your federation: multiples are also allowed. -->

      <collex:federation>NINES</collex:federation>
      <collex:federation>18thConnect</collex:federation>
      <collex:federation>MESA</collex:federation>
      
      <collex:archive>CONTRIBUTING PROJECT</collex:archive>

      <dc:title>OBJECT TITLE</dc:title>
      <dcterms:alternative>ALTERNATE TITLE</dcterms:alternative>
      <dc:source>TITLE OF JOURNAL/ANTHOLOGY/LARGER WORK</dc:source>

      <role:ART>VISUAL ARTIST</role:ART>
      <role:AUT>AUTHOR</role:AUT>
      <role:EDT>EDITOR</role:EDT>
      <role:PBL>PUBLISHER</role:PBL>
      <role:TRL>TRANSLATOR</role:TRL>
      <role:CRE>CREATOR</role:CRE>
      <role:ETR>ETCHER</role:ETR>
      <role:EGR>ENGRAVER</role:EGR>

      <dc:type>TYPE</dc:type>

      <collex:discipline>DISCIPLINE</collex:discipline>

      <collex:genre>GENRE</collex:genre>
      <collex:genre>ANOTHER GENRE</collex:genre>

      <collex:freeculture>TRUE OR FALSE</collex:freeculture>
      <collex:ocr>TRUE OR FALSE</collex:ocr>
      <collex:fulltext>TRUE OR FALSE</collex:fulltext>

      <dc:language>LANGUAGE OF RESOURCE</dc:language>

      <dc:date>4-DIGIT-DATE</dc:date>

      <collex:thumbnail rdf:resource="http://YOUR_PUBLICATION.ORG/THUMBNAIL.JPG"/>
      <collex:image rdf:resource="http://YOUR_PUBLICATION.ORG/FULL_SIZE_IMAGE.JPG"/>

      <collex:source_xml rdf:resource="http://YOUR_ENCODED_RESOURCE.XML"/>
      <collex:text rdf:resource="http://PLAIN_TEXT_OBJECT_TRANSCRIPTION.TXT"/>
      <rdfs:seeAlso rdf:resource="http://YOUR_PUBLICATION.ORG/YOUR_OBJECT.HTML"/>

      <dcterms:hasPart rdf:resource="ANOTHER_OBJECT_CONTAINED_BY_THIS_OBJECT"/>
      <dcterms:isPartOf rdf:resource="AN_OBJECT_THAT_CONTAINS_THIS_OBJECT"/>
      <dc:relation rdf:resource="AN_ASSOCIATED_OBJECT"/>

   </YOUR:NAMESPACE>

</rdf:RDF>
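
As a concrete counterpart to the mock-up, the following is a minimal sketch that uses only the elements marked as required in the specification below. The "podunk" prefix, its namespace URI, the example.org URLs, and all of the values are hypothetical placeholders chosen for illustration; substitute your own project's namespace, identifiers, and metadata.

```xml
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:collex="http://www.collex.org/schema#"
      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
      xmlns:role="http://www.loc.gov/loc.terms/relators/"
      xmlns:podunk="http://www.example.org/podunk/schema#">

   <!-- "podunk", its namespace URI, and every value below are hypothetical -->
   <podunk:poem rdf:about="http://www.example.org/podunk/poem-001">
      <collex:federation>NINES</collex:federation>
      <collex:archive>podunk_up_journal1</collex:archive>
      <dc:title>An Example Poem</dc:title>
      <role:AUT>Unknown</role:AUT>
      <dc:type>Codex</dc:type>
      <collex:discipline>Literature</collex:discipline>
      <collex:genre>Poetry</collex:genre>
      <dc:date>1875</dc:date>
      <rdfs:seeAlso rdf:resource="http://www.example.org/podunk/poem-001.html"/>
   </podunk:poem>

</rdf:RDF>
```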

RDF Specification

Available Elements
<custom_namespace rdf:about=""> <collex:source_xml>
<collex:source_html> <collex:source_sgml>
<dc:date> <collex:thumbnail>
<dc:relation> <collex:text>
<dc:source> <rdf:RDF>
<dc:title> <rdf:value>
<dcterms:alternative> <rdfs:label>
<dcterms:hasPart> <rdfs:seeAlso>
<dcterms:isPartOf> <role:ART>
<collex:archive> <role:AUT>
<collex:date> <role:EDT>
<collex:genre> <role:PBL>
<collex:image> <role:TRL>
<role:ETR> <role:CRE>
<collex:federation> <role:EGR>
<collex:ocr> <collex:fulltext>
<dc:language> <collex:freeculture>

Element Definitions

Element values must not include leading or trailing whitespace. In other words, <dc:date>1875</dc:date> is correct while <dc:date> 1875 </dc:date> is incorrect.

<rdf:RDF>

the root element of the RDF file, listing namespace declarations with multiple "xmlns:___" attributes
it isn't necessary to reference an actual XSD schema to validate the RDF--use the "xmlns" value only to establish a unique namespace
Required? YES
Can appear? ONCE

<custom_namespace rdf:about="value">

denotes the object
a child element of <rdf:RDF> with a project-defined namespace
"rdf:about" attribute records the unique id for the object. This id should be meaningful to the home project.
Required? YES
Can appear? ONCE

<collex:archive>

a shorthand reference to the contributing project or journal, one word such as "rossetti" or "rc_praxis". This word should be unique to this particular set of content. You shouldn't, therefore, choose a reference like "PodunkUP" if Podunk University Press intends to contribute a different set of content in future. (Instead, choose "podunk_up_journal1"). You may use a wide variety of characters to form this name, but it is recommended that you use only lower case letters, numbers, and the underscore.
Required? YES
Can appear? ONCE

<dc:title>

the title of the object
Required? YES
Can appear? ONCE

<dcterms:alternative>

an alternative title or name of the object. The distinction between titles and alternative titles is application-specific. In Collex search results, the alternative title displays below the main title, in plain (non-hyperlinked) text.

Required? NO
Can appear? MULTIPLE

<dc:source>

title of the larger work, resource, or collection of which the present object is a part
can be used for the title of a journal, anthology, book, online collection, etc.
Required? NO
Can appear? MULTIPLE

<dc:subject>

a single keyword. It is displayed in search results on some ARC nodes but not on others. It may be used for tagging and for future data mining across ARC partner sites. MESA additionally requires that the term come from a recognized or MESA-approved authority list. See the MESA metadata standards for more information [1].

Required? NO
Can appear? MULTIPLE

<dc:type>

adapted from the DCMI list of types, this term should describe the medium or format of the object.
Available Type Values
Codex; Collection; Drawing; Illustration
Interactive Resource; Manuscript; Map; Moving Image
Periodical; Physical Object; Roll; Sheet
Sound; Still Image; Typescript


Required? YES
Can appear? MULTIPLE

<role:***>

individuals or organizations involved in the creation of the object. Though this list is drawn from the LOC relator codes ([2]), it is hard-coded into the Collex indexing software; if ARC nodes want to use additional relator codes such as RCD (Recordist) or DRT (Director), the ARC Steering Committee would need to add them to the set of terms allowed in Collex.
possible element names include:
<role:ART> for Visual Artist
<role:AUT> for Author
<role:EDT> for Editor
<role:PBL> for Publisher
<role:TRL> for Translator
<role:CRE> for Creator
<role:ETR> for Etcher
<role:EGR> for Engraver
<role:OWN> for Owner
<role:ARC> for Architect
<role:BND> for Binder
<role:BKD> for Book designer
<role:BKP> for Book producer
<role:CLL> for Calligrapher
<role:CTG> for Cartographer
<role:COL> for Collector
<role:CLR> for Colorist
<role:CWT> for Commentator for written text
<role:COM> for Compiler
<role:CMT> for Compositor
<role:DUB> for Dubious author
<role:FAC> for Facsimilist
<role:ILU> for Illuminator
<role:ILL> for Illustrator
<role:LTG> for Lithographer
<role:PRT> for Printer
<role:POP> for Printer of plates
<role:PRM> for Printmaker
<role:RPS> for Repository
<role:RBR> for Rubricator
<role:SCR> for Scribe
<role:SCL> for Sculptor
<role:TYD> for Type designer
<role:TYG> for Typographer
<role:WDE> for Wood engraver
<role:WDC> for Wood cutter
ARC Partner sites recommend the submission of names in the format of "Last, First." Contributors are also encouraged to consult the Library of Congress [3] authorities list. Please be internally consistent and keep good records of any names you use.
Please note: each element's content values pertain only to the object at hand, not to the object's content or subject matter; when you list a particular name as "author," this should be the author of the object, not an author described in the object's text.
ARC strongly encourages using <role:ART> or <role:AUT>, even when the agent is unknown or anonymous. In such cases, use the standard values "Unknown" or "Anonymous." For example, <role:AUT>Unknown</role:AUT>. Variants of those values ("Unk." or "Anon.") will degrade the usability of the faceted browser.
Required? YES
Can appear? MULTIPLE
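
To illustrate the naming guidance above, here is a small sketch of the role elements for a hypothetical engraved illustration whose artist is not known. The personal and corporate names are invented for the example, and the fragment assumes the "role" prefix is declared on <rdf:RDF> as in the mock-up.

```xml
<!-- Hypothetical names; note the "Last, First" form and the standard value "Unknown" -->
<role:ART>Unknown</role:ART>
<role:EGR>Doe, Jane</role:EGR>
<role:PBL>Podunk University Press</role:PBL>
```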

<collex:discipline>

information about the disciplines that may be interested in the object
Each object is required to have at least one valid discipline from the list below.
Required? YES
Can appear? MULTIPLE
Available Discipline Values
Anthropology; Archaeology; Architecture; Art History
Book History; Classics and Ancient History; Ethnic Studies; Film Studies
Gender Studies; Geography; History; Law
Literature; Manuscript Studies; Math; Musicology
Philosophy; Religious Studies; Science; Theater Studies

<collex:genre>

basic descriptive genres for Collex materials
Each object is required to have at least one valid genre from the list below.
Required? YES
Can appear? MULTIPLE
Available Genre Values
Bibliography; Catalog; Citation; Collection
Correspondence; Criticism; Drama; Ephemera
Fiction; Historiography; Law; Life Writing
Liturgy; Musical Analysis; Music, Other; Musical Recording
Musical Score; Nonfiction; Paratext; Philosophy
Photograph; Poetry; Religion; Religion, Other
Reference Works; Review; Scripture; Sermon
Translation; Travel Writing; Unspecified; Visual Art
Please note the difference between 'Bibliography' (that is, a collection of bibliographical citations) and 'Citation' (referring to one citation, possibly within a larger list).

<dc:date>

date of the object
may contain either a four-digit year and nothing else (<dc:date>1959</dc:date>), or a nested <collex:date> element, whose usage is described in this entry and the next one.
Please note: contributors should, whenever possible, include a date even when the exact date is unknown or uncertain.
Unknown or uncertain dates can, in most cases, be narrowed to a possible date range, be it a decade or a century; contributors should use the <collex:date> formula to record a human-readable value (<rdfs:label>) and a computational value (<rdf:value>).
To narrow a range to a decade, replace the last digit of the year with "u". E.g. the 1860s are written as "186u" in <rdf:value>.
To narrow a range to a century, replace the last two digits of the year with "uu". E.g. the 1800s are written as "18uu" in <rdf:value>.
The value of <rdfs:label> can be anything one would like: "1860's", "1800's", "Likely the 1860's", "1860 through 1869", etc.
  <dc:date>
    <collex:date>
      <rdfs:label>1890-99 (circa)</rdfs:label>
      <rdf:value>189u</rdf:value>
    </collex:date>
  </dc:date>
Objects which were produced over a number of years can receive a date range. Again, use the <collex:date> scheme. The <rdfs:label> takes any human-readable formulation, e.g. "1861 through 1862". <rdf:value> would encode the start-date and end-date in the range in a comma-separated format, e.g. "1861,1862".
  <dc:date>
    <collex:date>
      <rdfs:label>1891-93</rdfs:label>
      <rdf:value>1891,1893</rdf:value>
    </collex:date>
  </dc:date>
Objects that are uncertainly dated but still known to be composed within a specific date range should receive a hybrid formulation, involving two <dc:date> elements. One <dc:date> records the date range; the second <dc:date> marks the object's date as "Uncertain".
  <dc:date>
    <collex:date>
      <rdfs:label>Sometime between 1891 and 1893</rdfs:label>
      <rdf:value>1891,1893</rdf:value>
    </collex:date>
  </dc:date>
  <dc:date>Uncertain</dc:date>
Objects worked on in nonconsecutive years should receive a distinct <dc:date> element for each year. So, for an object begun in 1890, put on hiatus in 1891, then concluded in 1892, the encoding would be:
  <dc:date>1890</dc:date>
  <dc:date>1892</dc:date>
Required? YES
Can appear? MULTIPLE
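
The examples above cover a decade, a range, and an uncertain range. For completeness, a century-level date following the same "u" convention might be sketched as below; the label text is only one possible human-readable wording.

```xml
<dc:date>
  <collex:date>
    <!-- the label is free text; the value uses "uu" for the two unknown digits -->
    <rdfs:label>Nineteenth century</rdfs:label>
    <rdf:value>18uu</rdf:value>
  </collex:date>
</dc:date>
```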

<collex:date>

element used when contributor wants to preserve more human readable date information while also including a formal date value
has two child elements, <rdfs:label> and <rdf:value>
Required? NO
Can appear? ONCE

<rdfs:label>

preserves a human readable date value, e.g. "1806 (circa)"
will appear as the "Date" value in COLLEX query results
Required? NO
Can appear? ONCE

<rdf:value>

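formal date value corresponding to the <rdfs:label> contents: a four-digit year, a year with "u" placeholders (e.g. "189u"), or a comma-separated range (e.g. "1891,1893")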
used for computational sorting and querying
Required? NO
Can appear? ONCE

<collex:freeculture>

if present, a "true" value denotes that the content is free and available for use by all people in all places, whereas a "false" value denotes that the content is restricted in some way to subscribers.
Required? NO (defaults to "true" if not present)
Can appear? ONCE

<collex:source_xml>

pointer to the web-accessible source code for the data in XML format.
Required? NO
Can appear? ONCE

<collex:source_html>

pointer to the web-accessible source code for the data in HTML format.
Required? NO
Can appear? ONCE

<collex:source_sgml>

pointer to the web-accessible source code for the data in SGML format.
Required? NO
Can appear? ONCE

<rdfs:seeAlso rdf:resource="">

pointer to the web-accessible object as it is rendered in your own interface. Distinct URLs displaying the same content should each get an rdfs:seeAlso entry.
usually an HTML page. During indexing, the NINES server issues a HEAD request to the specified URL (not a GET) and follows redirects.
Required? YES
Can appear? ONCE

<collex:text>

contains either:
1) URL to a web-accessible, plain text transcription of the object, like the following:
<collex:text rdf:resource="http://www.rossettiarchive.org/docs/1-1835.raw.txt"/>
2) plain text of the transcript within the collex:text element, such as:
<collex:text>full text goes here</collex:text>
indexed by the COLLEX search engine and used for full-text queries. This should be a "pure" transcript of the text content of the object, without extraneous text from navigation elements, copyright statements, etc. Encode plain text in UTF-8 format.
Required? NO
Can appear? ONCE

<collex:image rdf:resource="">

pointer to the web-accessible, full-size digital image of the object. This optional element is used to specify the full-sized image that may appear in a pop-up box when a user clicks on the object thumbnail in a list of search results
Required? NO
Can appear? ONCE

<collex:thumbnail rdf:resource="">

pointer to the web-accessible, thumbnail-sized digital image of the object being described. If this tag is not present, the display will default to the archive or node thumbnail. We suggest that you make your thumbnails no larger than 100 pixels in either height or width.

Required? NO
Can appear? ONCE

<dcterms:hasPart rdf:resource="">

pointer to divisions of the present object which have their own RDF objects
expresses a hierarchical relationship
e.g. a book object could point to its subordinate chapter objects
not currently exploited by COLLEX, but useful in the future for describing a graph of objects
Required? NO
Can appear? MULTIPLE

<dcterms:isPartOf rdf:resource="">

pointer to the RDF object of which the present object is a division
expresses a hierarchical relationship
e.g. a chapter object points to a book object
Required? NO
Can appear? MULTIPLE
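
The two hierarchical pointers are meant to mirror each other. The sketch below shows a hypothetical book and one of its chapters; it assumes that each rdf:resource value is the rdf:about identifier of the related object, and the "podunk" prefix and example.org identifiers are invented for illustration.

```xml
<!-- In the book's record (identifiers are hypothetical) -->
<podunk:book rdf:about="http://www.example.org/podunk/book-1">
   <!-- other required elements omitted for brevity -->
   <dcterms:hasPart rdf:resource="http://www.example.org/podunk/book-1/chapter-1"/>
   <dcterms:hasPart rdf:resource="http://www.example.org/podunk/book-1/chapter-2"/>
</podunk:book>

<!-- In the record for chapter 1 -->
<podunk:chapter rdf:about="http://www.example.org/podunk/book-1/chapter-1">
   <!-- other required elements omitted for brevity -->
   <dcterms:isPartOf rdf:resource="http://www.example.org/podunk/book-1"/>
</podunk:chapter>
```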

<dc:relation rdf:resource="">

pointer to an associated resource
provides a means to express relationships among resources that have formal relationships to others, but exist as discrete resources themselves
e.g. images in a document, other volumes in a series or items in a collection
Required? NO
Can appear? MULTIPLE

<collex:federation>

The federation that this object belongs to. Currently, the legal values are "NINES", "18thConnect", and "MESA". Notice that an object can belong to more than one federation.
Required? YES
Can appear? MULTIPLE

<collex:ocr>

if present, a "true" value denotes that the content was obtained by OCR, so there may be mistakes.
Required? NO (defaults to "false" if not present)
Can appear? ONCE

<collex:fulltext>

if present, a "true" value denotes that the text element points to the full text of the object, and not a summary, or other abbreviated content. In some cases, mostly for resources like victbib, this full text was NOT the full text of the object in question, rather it was project-specific text (like an annotated bibliography). In other cases it could be the full text of a poem, for example, for which there is no online plain text to harvest.
Required? NO (defaults to "false" if not present)
Can appear? ONCE
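
As an illustration of how the three flags combine, the fragment below sketches a subscription-only object whose complete text was produced by OCR. The URL is a hypothetical placeholder, and the lowercase "true"/"false" spelling follows the quoted values in the definitions above.

```xml
<!-- restricted to subscribers -->
<collex:freeculture>false</collex:freeculture>
<!-- text came from OCR and may contain errors -->
<collex:ocr>true</collex:ocr>
<!-- collex:text below points to the complete text, not a summary -->
<collex:fulltext>true</collex:fulltext>
<collex:text rdf:resource="http://www.example.org/podunk/poem-001.txt"/>
```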

<dc:language>

This element identifies the language of the resource using the language codes from the ISO 639-2 Language Code List. The content of dc:language may be either from the first column (ISO 639-2 Code), the third column (English name of Language), or the fourth column (French name of Language). Please note that currently only the MESA search interface allows the user to limit a search by language; NINES and 18thConnect do not have this capability. An example appears at [4].
Required? NO (defaults to "English" if not present)
Can appear? ONCE
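
For instance, a resource written in Latin could be marked with the ISO 639-2 code "lat". This is only a sketch of the first option described above; the English name "Latin" would be the equivalent alternative value.

```xml
<!-- ISO 639-2 code for Latin; use either the code or the language name, not both -->
<dc:language>lat</dc:language>
```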

Testing, Troubleshooting, and Submitting RDF

The W3C makes available a great RDF Validator. Use this service to ensure your RDF parses correctly and to gain a deeper understanding of the graph nature of RDF (by enabling the graph display option).

NINES has developed a mechanism for contributors to upload their own RDF submissions, parse them against our schema, and test and tinker with them in a sandbox Collex interface. Once you've prepared a set of RDF you'd like to test, you can gain access to this staging area by submitting your materials to the NINES Inbox. From that page you can contact the Project Manager to set up your account and start the indexing process.

The Importance of Being Stable

We recommend linking to your RDF in the meta tags of your HTML as follows:

<link rel="meta" type="application/rdf+xml" href="myobject.rdf"/>

These links are a semantic web "best practice." That said, Collex does not currently pick up changes to your HTML-linked RDF in any automated way. Instead, when you have revised RDF, you should upload it through the data administration system as a fresh batch. Please note that your fresh upload will completely replace all the RDF records NINES currently holds for your project. This means that the unique object id's expressed in each rdf:about field should remain stable.

These id's are the most brittle aspect of the NINES system. If you change an id, all the user-created content built on top of your object will be lost or ruined. This includes tags and annotations as well as NINES exhibits, such as course syllabi or critical essays.

The requirement that you keep stable NINES id's should not impact your ability to alter identifiers within your own archive at will.

Special Considerations for Dynamic Content

The Collex software matches a site's public URL to the one given in the rdfs:seeAlso link in order to make objects collectible from the bookmarklet. Web resources generated from a database or XSLT at run time present additional challenges, as parameters may be re-ordered or absent. Rather than listing every URL permutation as an rdfs:seeAlso entry, one should explicitly reference the RDF from the meta tag in the HTML (see note above). The Collex system then matches the rdf:about unique identifier for objects defined in your RDF and objects loaded in Collex.

The preferable solution for dynamically driven sites is to use URIs which hide the underlying technology, which is certain to change. See the following article for a technical explanation of future-proofing your URIs:

 http://www.wrox.com/WileyCDA/Section/id-301495.html