Principles for Metadata Reform

From ARC Wiki

Our metadata form, which is used to submit projects to the ARC nodes (MESA, REKn, 18thConnect, NINES, and ModNets), was developed by NINES in 2005, and its specifications can be found at Submitting RDF. We are revising what we fondly call our ‘NINES RDF’, and what follows are some principles to which we are adhering as we do so.


1. The success of NINES and subsequent nodes (MESA, REKn, 18thConnect, and ModNets) has depended upon and will continue to depend upon the leanness of the metadata form. That is, if we require people to submit too much information, and if our forms are too complicated (coming closer and closer to library metadata such as MARC records or METS), literary scholars and proprietors of research databases will not be able to submit their projects to us for peer review and inclusion in our search of peer-reviewed scholarly sites. In that case, only libraries would be able to submit, and we would in fact just become a library search engine, albeit a more global one. We decided at the inception of NINES in 2003 to keep the bar for submission as low as possible, technically speaking, while making peer review as rigorous as possible: we are scholars first, valuing the quality of the research over the technical prowess of its creators, and so we cannot create technological barriers to entry. Moreover, too many required fields would make the form too complicated and ambiguous to be useful to us. Practically, this means that we will add required fields to our metadata form only after extended discussion and with the approval of all ARC members.


2. Another value of the nodes coming together under ARC is that it requires us to work together to generate the most important metadata categories for literary, cultural, and historical scholarship. That is, this organization spans both the disciplines within English literary studies (philosophy, history) and its periods: medieval, Renaissance, 18th century, 19th century, and modernist. We wish to achieve a very difficult balance: acknowledging the value of disciplinary categories while looking to a future in which disciplinary boundaries might disappear. When revising our metadata scheme, then, we must consciously attempt to create a scheme that is relevant for all the periods while simultaneously allowing our home discipline to best express what is significant in its own terms. Keeping an eye on the needs of other periods is the hardest part, and it will require reforming what we already have. For instance, one ‘genre’ term in NINES is ‘manuscript’. Aside from the fact that it is really a medium, the problem with this term is that it is useless to medievalists, and it does not mean the same thing to scholars of Renaissance and Restoration writers, for whom circulating manuscripts was one mode of publication, as it does to scholars of the 19th century, for whom it means something unpublished. We have been rethinking this problem since MESA joined, and this is good, important work, potentially helping the discipline of 19th-century studies to understand itself better.


3. The taxonomies we develop as part of our RDF scheme, our metadata form, will be taken up by contributors to our nodes and could potentially become one of the standards for the semantic web. This is a big responsibility: we can at this moment have an impact on the way that literature is found on the Internet, not just by search engines but by data-mining and information systems that create encyclopedic definitions of the world ‘on the fly’. We are participating in the emergence of knowledge organization for the future, beyond the Dewey and Library of Congress schemes. The temptation here will be to load our metadata with every possible category of importance to us. But just as scholars cannot meet really onerous technological requirements for self-definition as a condition of submitting their work to us, no Google-like search engine will take account of ALL our categories: these crawlers, data-miners, and web streaming systems will select a few categories randomly if we do not pick out for them what is truly important.