Submitting RDF
Revision as of 09:16, 22 August 2013
RDF is the metadata format that contributors use to make their resources available for use within NINES, 18thConnect and MESA. With RDF, contributors describe each of their resources in general terms that allow those resources to be categorized and searched through COLLEX.
For more basic information about RDF and the principles of generating it for your project, see this page at NINES.
Contents
- 1 Samples
- 2 RDF Specification
- 2.1 Element Definitions
- 2.1.1 <rdf:RDF>
- 2.1.2 <custom_namespace rdf:about="value">
- 2.1.3 <collex:archive>
- 2.1.4 <dc:title>
- 2.1.5 <dcterms:alternative>
- 2.1.6 <dc:source>
- 2.1.7 <dc:subject>
- 2.1.8 <dc:type>
- 2.1.9 <role:***>
- 2.1.10 <collex:discipline>
- 2.1.11 <collex:genre>
- 2.1.12 <dc:date>
- 2.1.13 <collex:date>
- 2.1.14 <rdfs:label>
- 2.1.15 <rdf:value>
- 2.1.16 <collex:freeculture>
- 2.1.17 <collex:source_xml>
- 2.1.18 <collex:source_html>
- 2.1.19 <collex:source_sgml>
- 2.1.20 <rdfs:seeAlso rdf:resource="">
- 2.1.21 <collex:text>
- 2.1.22 <collex:image rdf:resource="">
- 2.1.23 <collex:thumbnail rdf:resource="">
- 2.1.24 <dcterms:hasPart rdf:resource="">
- 2.1.25 <dcterms:isPartOf rdf:resource="">
- 2.1.26 <dc:relation rdf:resource="">
- 2.1.27 <collex:federation>
- 2.1.28 <collex:ocr>
- 2.1.29 <collex:fulltext>
- 2.1.30 <dc:language>
- 3 Testing, Troubleshooting, and Submitting RDF
- 4 The Importance of Being Stable
- 5 Special Considerations for Dynamic Content
Samples
Below is a generic, mock RDF file. We have also made several RDF samples available, along with samples of XSLT for transforming XML source files into RDF metadata.
RDF Mock-up
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:collex="http://www.collex.org/schema#"
         xmlns:ra="http://www.rossettiarchive.org/schema#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:role="http://www.loc.gov/loc.terms/relators/">
  <!-- replace YOUR with your project's own prefix and declare it with an xmlns attribute, as ra: is declared above -->
  <YOUR:NAMESPACE rdf:about="UNIQUE_OBJECT_ID">
    <!-- choose your federation: multiples are also allowed. -->
    <collex:federation>NINES</collex:federation>
    <collex:federation>18thConnect</collex:federation>
    <collex:federation>MESA</collex:federation>
    <collex:archive>CONTRIBUTING PROJECT</collex:archive>
    <dc:title>OBJECT TITLE</dc:title>
    <dcterms:alternative>ALTERNATE TITLE</dcterms:alternative>
    <dc:source>TITLE OF JOURNAL/ANTHOLOGY/LARGER WORK</dc:source>
    <role:ART>VISUAL ARTIST</role:ART>
    <role:AUT>AUTHOR</role:AUT>
    <role:EDT>EDITOR</role:EDT>
    <role:PBL>PUBLISHER</role:PBL>
    <role:TRL>TRANSLATOR</role:TRL>
    <role:CRE>CREATOR</role:CRE>
    <role:ETR>ETCHER</role:ETR>
    <role:EGR>ENGRAVER</role:EGR>
    <dc:type>TYPE</dc:type>
    <collex:discipline>DISCIPLINE</collex:discipline>
    <collex:genre>GENRE</collex:genre>
    <collex:genre>ANOTHER GENRE</collex:genre>
    <collex:freeculture>TRUE OR FALSE</collex:freeculture>
    <collex:ocr>TRUE OR FALSE</collex:ocr>
    <collex:fulltext>TRUE OR FALSE</collex:fulltext>
    <dc:language>LANGUAGE OF RESOURCE</dc:language>
    <dc:date>4-DIGIT-DATE</dc:date>
    <collex:thumbnail rdf:resource="http://YOUR_PUBLICATION.ORG/THUMBNAIL.JPG"/>
    <collex:image rdf:resource="http://YOUR_PUBLICATION.ORG/FULL_SIZE_IMAGE.JPG"/>
    <collex:source_xml rdf:resource="http://YOUR_ENCODED_RESOURCE.XML"/>
    <collex:text rdf:resource="http://PLAIN_TEXT_OBJECT_TRANSCRIPTION.TXT"/>
    <rdfs:seeAlso rdf:resource="http://YOUR_PUBLICATION.ORG/YOUR_OBJECT.HTML"/>
    <dcterms:hasPart rdf:resource="ANOTHER_OBJECT_CONTAINED_BY_THIS_OBJECT"/>
    <dcterms:isPartOf rdf:resource="AN_OBJECT_THAT_CONTAINS_THIS_OBJECT"/>
    <dc:relation rdf:resource="AN_ASSOCIATED_OBJECT"/>
  </YOUR:NAMESPACE>
</rdf:RDF>
RDF Specification
Element Definitions
Element values must not include leading or trailing whitespace. In other words, <dc:date>1875</dc:date> is correct, while <dc:date> 1875 </dc:date> is incorrect.
<rdf:RDF>
- the root element of the RDF file; it lists namespace declarations as multiple "xmlns:___" attributes
- it isn't necessary to reference an actual XSD schema to validate the RDF; use the "xmlns" value only to establish a unique namespace
- Required? YES
- Can appear? ONCE
<custom_namespace rdf:about="value">
- denotes the object
- a child element of <rdf:RDF>, with a project-defined namespace
- the "rdf:about" attribute records the unique id for the object
- Required? YES
- Can appear? ONCE
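A minimal sketch of the object element inside the root element. It assumes a hypothetical project prefix `myproj` bound to a hypothetical namespace URI under example.org; the element name `myproj:poem` and the rdf:about value are likewise placeholders. Whatever prefix you choose must be declared with its own `xmlns:` attribute on <rdf:RDF>.

```xml
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:myproj="http://www.example.org/myproject/schema#">
  <!-- hypothetical: the myproj prefix, its URI, and the rdf:about value -->
  <myproj:poem rdf:about="http://www.example.org/myproject/poem_001">
    <!-- the elements defined below (collex:archive, dc:title, ...) go here -->
  </myproj:poem>
</rdf:RDF>
```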
<collex:archive>
- a shorthand reference to the contributing project or journal, one word such as "rossetti" or "rc_praxis." This word should be unique to this particular set of content. You shouldn't, therefore, choose a reference like "PodunkUP" if Podunk University Press intends to contribute a different set of content in future. (Instead, choose "PodunkUP_journal1.") You may use a wide variety of characters to form this name, but it is recommended that you use only lower case letters, numbers, and the underscore.
- Required? YES
- Can appear? ONCE
<dc:title>
- the title of the object
- Required? YES
- Can appear? ONCE
<dcterms:alternative>
- an alternative title of the object
- Required? NO
- Can appear? MULTIPLE
<dc:source>
- title of the larger work, resource, or collection of which the present object is a part
- can be used for the title of a journal, anthology, book, online collection, etc.
- Required? NO
- Can appear? MULTIPLE
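The identifying elements described above might combine as in the following sketch. The `myproj` prefix, the URIs, and the title and journal values are all hypothetical; the archive name follows the recommendation to use only lower-case letters, numbers, and the underscore.

```xml
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:collex="http://www.collex.org/schema#"
         xmlns:myproj="http://www.example.org/myproject/schema#">
  <myproj:poem rdf:about="http://www.example.org/myproject/poem_001">
    <!-- one word, unique to this particular set of content -->
    <collex:archive>podunkup_journal1</collex:archive>
    <!-- required, exactly once -->
    <dc:title>An Example Poem</dc:title>
    <!-- optional; may repeat -->
    <dcterms:alternative>Another Title for the Example Poem</dcterms:alternative>
    <!-- optional; the larger work the object is part of (hypothetical journal) -->
    <dc:source>The Podunk Quarterly</dc:source>
  </myproj:poem>
</rdf:RDF>
```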
<dc:subject>
- a single keyword; it is not currently displayed, but may be applied to tags and used for future mining of data from ARC Partner sites.
- Required? NO
- Can appear? MULTIPLE
<dc:type>
- adapted from the DCMI list of types, this term should describe the medium or format of the object.
Available type values: Codex; Collection; Drawing; Illustration; Interactive Resource; Manuscript; Map; Moving Image; Periodical; Physical Object; Roll; Sheet; Sound; Still Image; Typescript
- Required? YES
- Can appear? MULTIPLE
<role:***>
- individual involved in the creation of the object
- possible element names include:
  - <role:ART> for Artist or Visual Artist
  - <role:AUT> for Author
  - <role:EDT> for Editor
  - <role:PBL> for Publisher
  - <role:TRL> for Translator
  - <role:CRE> for Creator
  - <role:ETR> for Etcher
  - <role:EGR> for Engraver
  - <role:OWN> for Owner
  - <role:ARC> for Architect
  - <role:BND> for Binder
  - <role:BKD> for Book designer
  - <role:BKP> for Book producer
  - <role:CLL> for Calligrapher
  - <role:CTG> for Cartographer
  - <role:COL> for Collector
  - <role:CLR> for Colorist
  - <role:CWT> for Commentator for written text
  - <role:COM> for Compiler
  - <role:CMT> for Compositor
  - <role:DUB> for Dubious author
  - <role:FAC> for Facsimilist
  - <role:ILU> for Illuminator
  - <role:ILL> for Illustrator
  - <role:LTG> for Lithographer
  - <role:PRT> for Printer
  - <role:POP> for Printer of plates
  - <role:PRM> for Printmaker
  - <role:RPS> for Repository
  - <role:RBR> for Rubricator
  - <role:SCR> for Scribe
  - <role:SCL> for Sculptor
  - <role:TYD> for Type designer
  - <role:TYG> for Typographer
  - <role:WDE> for Wood engraver
  - <role:WDC> for Wood cutter
- ARC Partner sites recommend the submission of names in the format of "Last, First." Contributors are also encouraged to consult the Library of Congress authorities list. Please be internally consistent and keep good records of any names you use.
- Please note: each element's content values pertain only to the object at hand, not to the object's content or subject matter; when you list a particular name as "author," this should be the author of the object, not an author described in the object's text.
- ARC strongly encourages using <role:ART> or <role:AUT>, even when the agent is unknown or anonymous. In such cases, use the standard values "Unknown" or "Anonymous", for example <role:AUT>Unknown</role:AUT>. Variants of those values ("Unk." or "Anon.") will degrade the usability of the faceted browser.
- Required? YES
- Can appear? MULTIPLE
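A sketch of role elements for a single object, assuming a hypothetical `myproj` prefix and URI. The personal names are invented placeholders written in the recommended "Last, First" form, and the unknown author uses the standard value rather than an abbreviation.

```xml
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:role="http://www.loc.gov/loc.terms/relators/"
         xmlns:myproj="http://www.example.org/myproject/schema#">
  <myproj:broadside rdf:about="http://www.example.org/myproject/broadside_001">
    <!-- author unknown: use "Anonymous" or "Unknown", never "Anon." or "Unk." -->
    <role:AUT>Anonymous</role:AUT>
    <!-- hypothetical names, in "Last, First" form -->
    <role:EGR>Doe, Jane</role:EGR>
    <role:PBL>Roe, Richard</role:PBL>
  </myproj:broadside>
</rdf:RDF>
```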
<collex:discipline>
- information about the disciplines that may be interested in the object
- Each object is required to have at least one valid discipline from the list below.
- Required? YES
- Can appear? MULTIPLE
Available discipline values: Anthropology; Archaeology; Architecture; Art History; Book History; Classics and Ancient History; Ethnic Studies; Film Studies; Gender Studies; Geography; History; Law; Literature; Manuscript Studies; Math; Musicology; Philosophy; Religious Studies; Science; Theater Studies
<collex:genre>
- basic descriptive genres for Collex materials
- Each object is required to have at least one valid genre from the list below.
- Required? YES
- Can appear? MULTIPLE
Available genre values: Bibliography; Catalog; Citation; Collection; Correspondence; Criticism; Drama; Ephemera; Fiction; Historiography; Law; Life Writing; Liturgy; Musical Analysis; Music, Other; Musical Recording; Musical Score; Nonfiction; Paratext; Philosophy; Photograph; Poetry; Religion; Religion, Other; Reference Works; Review; Scripture; Sermon; Translation; Travel Writing; Unspecified; Visual Art
- Please note the difference between 'Bibliography' (that is, a collection of bibliographical citations) and 'Citation' (referring to one citation, possibly within a larger list).
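A sketch of how type, discipline, and genre values might be combined for a manuscript poem. The `myproj` prefix and URI are hypothetical; the values themselves are taken from the lists above and are assumed to need the same spelling and capitalization.

```xml
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:collex="http://www.collex.org/schema#"
         xmlns:myproj="http://www.example.org/myproject/schema#">
  <myproj:poem rdf:about="http://www.example.org/myproject/poem_001">
    <!-- medium or format of the object -->
    <dc:type>Manuscript</dc:type>
    <!-- at least one discipline; several are allowed -->
    <collex:discipline>Literature</collex:discipline>
    <collex:discipline>Manuscript Studies</collex:discipline>
    <!-- at least one genre -->
    <collex:genre>Poetry</collex:genre>
  </myproj:poem>
</rdf:RDF>
```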
<dc:date>
- date of the object
- may contain either a four-digit year inside the <dc:date> tags or a <collex:date> element
- Please note: contributors should, when at all possible, attempt to include a date even when a date value is unknown or uncertain
  - Unknown or uncertain dates can, in most cases, be narrowed to a possible date range, be it a decade or a century; contributors should use the <collex:date> formula to record a human-readable value (<rdfs:label>) and a computational value (<rdf:value>).
  - To narrow a range to a decade, replace the last digit of the year with "u"; e.g., the 1860s are written as "186u" in <rdf:value>.
  - To narrow a range to a century, replace the last two digits of the year with "u"; e.g., the 1800s are written as "18uu" in <rdf:value>.
  - The value of <rdfs:label> can be anything one would like: "1860's", "1800's", "Likely the 1860's", "1860 through 1869", etc.
<dc:date>
  <collex:date>
    <rdfs:label>1890-99 (circa)</rdfs:label>
    <rdf:value>189u</rdf:value>
  </collex:date>
</dc:date>
- Objects which were produced over a number of years can receive a date range. Again, use the <collex:date> scheme. The <rdfs:label> takes any human-readable formulation, e.g. "1861 through 1862". The <rdf:value> encodes the start date and end date of the range in comma-separated format, e.g. "1861,1862".
<dc:date>
  <collex:date>
    <rdfs:label>1891-93</rdfs:label>
    <rdf:value>1891,1893</rdf:value>
  </collex:date>
</dc:date>
- Objects that are uncertainly dated but still known to have been composed within a specific date range should receive a hybrid formulation involving two <dc:date> elements. One <dc:date> records the date range; the second <dc:date> marks the object's date as "Uncertain".
<dc:date>
  <collex:date>
    <rdfs:label>Sometime between 1891 and 1893</rdfs:label>
    <rdf:value>1891,1893</rdf:value>
  </collex:date>
</dc:date>
<dc:date>Uncertain</dc:date>
- Objects worked on in nonconsecutive years should receive a distinct <dc:date> element for each year. So, for an object begun in 1890, put on hiatus in 1891, then concluded in 1892, the encoding would be:
<dc:date>1890</dc:date>
<dc:date>1892</dc:date>
- Required? YES
- Can appear? MULTIPLE
<collex:date>
- element used when a contributor wants to preserve more human-readable date information while also including a formal date value
- has two child elements, <rdfs:label> and <rdf:value>
- Required? NO
- Can appear? ONCE
<rdfs:label>
- preserves a human readable date value, e.g. "1806 (circa)"
- will appear as the "Date" value in COLLEX query results
- Required? NO
- Can appear? ONCE
<rdf:value>
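- formal date value of the <rdfs:label> contents: a four-digit year, a year with "u" in place of unknown digits (e.g. "189u"), or a comma-separated start and end year (e.g. "1891,1893")
- used for computational sorting and querying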
- Required? NO
- Can appear? ONCE
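A sketch pairing a human-readable label with its formal value for an approximately dated object, reusing the "1806 (circa)" example above. The `myproj` prefix and URI are hypothetical, and the choice of "1806" as the formal value for a circa date is an illustrative assumption.

```xml
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:collex="http://www.collex.org/schema#"
         xmlns:myproj="http://www.example.org/myproject/schema#">
  <myproj:letter rdf:about="http://www.example.org/myproject/letter_001">
    <dc:date>
      <collex:date>
        <!-- displayed as the "Date" value in COLLEX query results -->
        <rdfs:label>1806 (circa)</rdfs:label>
        <!-- formal value used for sorting and querying -->
        <rdf:value>1806</rdf:value>
      </collex:date>
    </dc:date>
  </myproj:letter>
</rdf:RDF>
```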
<collex:freeculture>
- if present, a "true" value denotes that the content is free and available for use by all people in all places, whereas a "false" value denotes that the content is restricted in some way to subscribers.
- Required? NO (defaults to "true" if not present)
- Can appear? ONCE
<collex:source_xml>
- pointer to the web-accessible source code for the data in XML format.
- Required? NO
- Can appear? ONCE
<collex:source_html>
- pointer to the web-accessible source code for the data in HTML format.
- Required? NO
- Can appear? ONCE
<collex:source_sgml>
- pointer to the web-accessible source code for the data in SGML format.
- Required? NO
- Can appear? ONCE
<rdfs:seeAlso rdf:resource="">
- pointer to the web-accessible object as it is rendered in your own interface. Distinct URLs displaying the same content should each get an rdfs:seeAlso entry.
- usually an HTML page. During indexing, the NINES server issues a HEAD request to the specified URL (not a GET) and follows redirects.
- Required? YES
- Can appear? ONCE
<collex:text>
- contains either:
- 1) URL to a web-accessible, plain text transcription of the object, like the following:
<collex:text rdf:resource="http://www.rossettiarchive.org/docs/1-1835.raw.txt"/>
- 2) plain text of the transcript within the collex:text element, such as:
<collex:text>full text goes here</collex:text>
- indexed by the COLLEX search engine and used for full-text queries. This should be a "pure" transcript of the text content of the object, without extraneous text from navigation elements, copyright statements, etc. Encode plain text in UTF-8 format.
- Required? NO
- Can appear? ONCE
<collex:image rdf:resource="">
- pointer to the web-accessible, full-size digital image of the object
- Required? NO
- Can appear? ONCE
<collex:thumbnail rdf:resource="">
- pointer to the web-accessible, thumbnail-sized digital image of the object. We suggest that you make your thumbnails no larger than 100 pixels in either height or width.
- Required? NO
- Can appear? ONCE
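A sketch of the pointer elements for one object. Every URL below is a hypothetical example.org placeholder, as is the `myproj` prefix; each pointer uses an rdf:resource attribute and is written as an empty element.

```xml
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:collex="http://www.collex.org/schema#"
         xmlns:myproj="http://www.example.org/myproject/schema#">
  <myproj:poem rdf:about="http://www.example.org/myproject/poem_001">
    <!-- the object as rendered in your own interface, usually an HTML page -->
    <rdfs:seeAlso rdf:resource="http://www.example.org/myproject/poem_001.html"/>
    <!-- web-accessible XML source -->
    <collex:source_xml rdf:resource="http://www.example.org/myproject/poem_001.xml"/>
    <!-- plain-text transcription, UTF-8 -->
    <collex:text rdf:resource="http://www.example.org/myproject/poem_001.txt"/>
    <collex:image rdf:resource="http://www.example.org/myproject/images/poem_001.jpg"/>
    <!-- suggested: no larger than 100 pixels in height or width -->
    <collex:thumbnail rdf:resource="http://www.example.org/myproject/thumbs/poem_001.jpg"/>
  </myproj:poem>
</rdf:RDF>
```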
<dcterms:hasPart rdf:resource="">
- pointer to divisions of the present object which have their own RDF objects
- expresses a hierarchical relationship
- e.g. a book object could point to its subordinate chapter objects
- not currently exploited by COLLEX, but useful in the future for describing a graph of objects
- Required? NO
- Can appear? MULTIPLE
<dcterms:isPartOf rdf:resource="">
- pointer to the RDF object of which the present object is a division
- expresses a hierarchical relationship
- e.g. a chapter object points to a book object
- Required? NO
- Can appear? MULTIPLE
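A sketch of the hierarchical relationship in both directions: a book that lists its chapters with dcterms:hasPart, and a chapter that points back with dcterms:isPartOf. It assumes each rdf:resource value is the rdf:about identifier of another object in your RDF; the `myproj` prefix and all identifiers are hypothetical.

```xml
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:myproj="http://www.example.org/myproject/schema#">
  <!-- the containing object points down to its parts -->
  <myproj:book rdf:about="http://www.example.org/myproject/book_001">
    <dcterms:hasPart rdf:resource="http://www.example.org/myproject/book_001/ch_01"/>
    <dcterms:hasPart rdf:resource="http://www.example.org/myproject/book_001/ch_02"/>
  </myproj:book>
  <!-- each part points up to the object that contains it -->
  <myproj:chapter rdf:about="http://www.example.org/myproject/book_001/ch_01">
    <dcterms:isPartOf rdf:resource="http://www.example.org/myproject/book_001"/>
  </myproj:chapter>
</rdf:RDF>
```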
<dc:relation rdf:resource="">
- pointer to an associated resource
- provides a means of expressing relationships among resources that are formally related to others but exist as discrete resources themselves
- e.g. images in a document, other volumes in a series or items in a collection
- Required? NO
- Can appear? MULTIPLE
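A sketch of dc:relation linking one volume in a series to the other volumes, each of which remains a discrete object. The `myproj` prefix and all identifiers are hypothetical.

```xml
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:myproj="http://www.example.org/myproject/schema#">
  <myproj:volume rdf:about="http://www.example.org/myproject/series_vol_1">
    <!-- other volumes in the same series: associated, not contained -->
    <dc:relation rdf:resource="http://www.example.org/myproject/series_vol_2"/>
    <dc:relation rdf:resource="http://www.example.org/myproject/series_vol_3"/>
  </myproj:volume>
</rdf:RDF>
```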
<collex:federation>
- The federation that this object belongs to. Currently, the legal values are "NINES", "18thConnect", and "MESA". Note that an object can belong to more than one federation.
- Required? YES
- Can appear? MULTIPLE
<collex:ocr>
- if present, a "true" value denotes that the content was obtained by OCR, so there may be mistakes.
- Required? NO (defaults to "false" if not present)
- Can appear? ONCE
<collex:fulltext>
- if present, a "false" value denotes that the text element does not contain the full text of the object but a summary or other abbreviated content.
- Required? NO (defaults to "true" if not present)
- Can appear? ONCE
<dc:language>
- This element identifies the language of the resource, for example English, French, German, or Italian.
- Required? NO (defaults to "English" if not present)
- Can appear? ONCE
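A sketch of the status flags for a subscription-only French text whose transcription was produced by OCR. The `myproj` prefix and URIs are hypothetical; the lower-case "true"/"false" spelling follows the definitions above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:collex="http://www.collex.org/schema#"
         xmlns:myproj="http://www.example.org/myproject/schema#">
  <myproj:article rdf:about="http://www.example.org/myproject/article_001">
    <!-- restricted in some way to subscribers -->
    <collex:freeculture>false</collex:freeculture>
    <!-- text obtained by OCR, so it may contain mistakes -->
    <collex:ocr>true</collex:ocr>
    <!-- collex:fulltext omitted: defaults to "true" -->
    <dc:language>French</dc:language>
    <collex:text rdf:resource="http://www.example.org/myproject/article_001.txt"/>
  </myproj:article>
</rdf:RDF>
```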
Testing, Troubleshooting, and Submitting RDF
The W3C provides an RDF Validator. Use this service to ensure your RDF parses correctly and to gain a deeper understanding of the graph nature of RDF (by enabling the graph display option).
NINES has developed a mechanism for contributors to upload their own RDF submissions, parse them against our schema, and test and tinker with them in a sandbox Collex interface. Once you've prepared a set of RDF you'd like to test, you can gain access to this staging area by submitting your materials to the NINES Inbox. From that page you can contact the Project Manager to set up your account and start the indexing process.
The Importance of Being Stable
We recommend linking to your RDF in the meta tags of your HTML as follows:
<link rel="meta" type="application/rdf+xml" href="myobject.rdf"/>
These links are a semantic web "best practice." That said, Collex does not currently pick up changes to your HTML-linked RDF in any automated way. Instead, when you have revised RDF, you should upload it through the data administration system as a fresh batch. Please note that your fresh upload will completely replace all the RDF records NINES currently holds for your project. For this reason, the unique object IDs expressed in each rdf:about attribute must remain stable.
These IDs are the most brittle aspect of the NINES system. If you change an ID, all the user-created content built on top of your object will be lost or ruined. This includes tags and annotations as well as NINES exhibits, such as course syllabi or critical essays.
The requirement that you keep stable NINES IDs should not impact your ability to alter identifiers within your own archive at will.
Special Considerations for Dynamic Content
The Collex software matches a site's public URL to the one given in the rdfs:seeAlso link in order to make objects collectible from the bookmarklet. Web resources generated from a database or XSLT at run time present additional challenges, as URL parameters may be re-ordered or absent. Rather than listing every URL permutation as an rdfs:seeAlso entry, you should explicitly reference the RDF from the link tag in the HTML (see note above). The Collex system then matches objects defined in your RDF to objects loaded in Collex by their rdf:about unique identifiers.
The preferable solution for dynamically generated sites is to use URIs that hide the underlying technology, which is certain to change. See the following article for a technical explanation of future-proofing your URIs:
http://www.wrox.com/WileyCDA/Section/id-301495.html