Archive basics

From ARC Wiki

An Introduction to Files and Markup at the Rossetti Archive

This brief guide is intended to introduce new wombats to the structure and maintenance of the Rossetti Archive.


"Markup" refers to a means of encoding text with information about its content or format. Markup consists of elements, or "tags", which surround information and describe it. A stylesheet then interprets the markup for a browser and tells the browser how to present the information on the computer screen.

We currently tag our documents in XML (Extensible Markup Language), although not so long ago we tagged them in SGML (Standard Generalized Markup Language). Both languages are systems for marking up structured information with tags. The Rossetti Archive has its own set of tags, defined in a DTD (Document Type Definition) and rendered in browsers by a set of stylesheets. All these elements of the archive are programmed and updated in-house.

Our XML editing programs (mainly oXygen) have functions that indicate what tags can be used in a given location. Additionally, a reference file called Tags Defined lists all the Rossetti Archive tags, their possible attributes, and their functions.

File Structures

We have three types of files, each with its own structural peculiarities:

  • RAW = Rossetti Archive Work
  • RAD = Rossetti Archive Document
  • RAP = Rossetti Archive Picture

Each type of file has a generic template, located in /home/jjm2f//home10/rossetti_work/ROSSETTI/HOLDING/0proof.

Files have two main parts:

  1. Header: containing information about the archive and the file itself; and
  2. Body: containing commentary, description, and tagged text.

Files are structured according to a DTD (Document Type Definition). A DTD essentially establishes the rules by which a tag-set operates: defining the hierarchy of tags, which tags can nest in which, what information is necessary, etc.

For example, if you mark up a certain phrase as a title (e.g. <title>The Blessed Damozel</title>) the DTD tells you that the <title> tag must also have the attribute "level" which denotes what kind of title it is: e.g. <title level="work">The Blessed Damozel</title>.
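To make the attribute rule concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It imitates only this one rule (every <title> needs a "level" attribute). It is not how oXygen or the archive's own scripts check files, and the <text> wrapper in the sample is a made-up element used purely for illustration.

```python
import xml.etree.ElementTree as ET

def titles_missing_level(xml_string):
    """Return the text of every <title> element that has no 'level' attribute."""
    root = ET.fromstring(xml_string)
    missing = []
    # iter() walks the whole tree, so titles nested at any depth are found
    for title in root.iter("title"):
        if "level" not in title.attrib:
            missing.append(title.text)
    return missing

# Hypothetical fragment: the <text> wrapper is for illustration only
sample = """<text>
  <title level="work">The Blessed Damozel</title>
  <title>The Blessed Damozel</title>
</text>"""

print(titles_missing_level(sample))  # ['The Blessed Damozel']
```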

The DTD also tells you that you cannot insert a <scribe> tag within <title> tags, although an <msadds> tag (manuscript additions) is allowed. You can see a graphical representation of the DTD with the oXygen program. The most up-to-date version of the Rossetti Archive DTD is called "ram.xsd" and can be found in SOURCE.
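The nesting rule can be sketched the same way. The hypothetical table FORBIDDEN_CHILDREN below encodes only the single <scribe>-inside-<title> rule mentioned above, and the check looks at direct children only, which mirrors how DTD and schema content models describe what an element may contain. In practice oXygen and the schema do this work; the sketch just shows the idea.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of one nesting rule from the text above; the complete,
# authoritative rules live in the archive's schema, not in this table.
FORBIDDEN_CHILDREN = {"title": {"scribe"}}

def nesting_violations(xml_string):
    """Return (parent, child) tag pairs where a forbidden child appears."""
    root = ET.fromstring(xml_string)
    violations = []
    for parent_tag, banned in FORBIDDEN_CHILDREN.items():
        for parent in root.iter(parent_tag):
            # Content models constrain direct children, so only those are checked
            for child in parent:
                if child.tag in banned:
                    violations.append((parent_tag, child.tag))
    return violations

# Hypothetical fragments; the <text> wrapper and contents are invented
allowed = '<text><title level="work">The Blessed <msadds>Damozel</msadds></title></text>'
forbidden = '<text><title level="work"><scribe/>The Blessed Damozel</title></text>'

print(nesting_violations(allowed))    # []
print(nesting_violations(forbidden))  # [('title', 'scribe')]
```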

Any new or newly edited document must be parsed, that is, verified against the DTD to ensure that the file's structure conforms to the structure dictated by the DTD. Only once all files parse cleanly can we publish material to the web, using a series of scripts.
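Before the publishing scripts run, a quick pre-flight check over a directory can catch files that will not even parse. The Python sketch below is hypothetical (the function names are not part of our actual scripts) and tests well-formedness only, that is, whether tags are properly nested and closed. It does not validate against ram.xsd, so a file can pass this check and still fail validation in oXygen.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

def files_that_fail_to_parse(directory):
    """Return (filename, error message) pairs for .xml files that are not well-formed.

    Well-formedness only: this does not validate against ram.xsd.
    """
    failures = []
    for path in sorted(Path(directory).glob("*.xml")):
        try:
            ET.parse(path)
        except ET.ParseError as err:
            failures.append((path.name, str(err)))
    return failures

def ready_to_publish(directory):
    """Hypothetical gate: True only if every .xml file in the directory parses."""
    return not files_that_fail_to_parse(directory)

if __name__ == "__main__":
    import tempfile

    # Two throwaway files: one well-formed, one with an unclosed <title>
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "good.xml").write_text('<rad><title level="work">X</title></rad>')
        Path(tmp, "bad.xml").write_text("<rad><title>X</rad>")
        for name, message in files_that_fail_to_parse(tmp):
            print(name, "->", message)
        print(ready_to_publish(tmp))  # False
```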

Organization and Navigation of the Archive

RAC (Rossetti Archive Commentary)

Patrons access the archive through introductory HTML pages that lead to RACs, Rossetti Archive Commentaries, which are text files tagged in XML. (See the collected RACs.) These RACs are the main portals for the archive and hold information and listings on genres or types of works and documents. For example, we have RACs for "poems", "prose", "manuscripts", "doubleworks", "pictures", etc. From the central RACs page, one can either

  • access certain important works and documents that are hotlinked in the RACs, or
  • proceed to an alphabetical or chronological listing of RAWs.

RAW (Rossetti Archive Work)

RAWs are XML files that refer not to any particular document but to the idea that informs and associates a group of documents with one another, creating a relatively cohesive unit. The RAW deals with Rossetti's work at the conceptual level. An example (the most popular one around here) is "The Blessed Damozel". "The Blessed Damozel", considered as a work, includes all the studies, complete paintings, and reproductions of the pictorial idea that is the Damozel, as well as all the textual instantiations of the poem "The Blessed Damozel", including drafts, revisions, proof copies, copies of printed editions, the memorial reconstruction in your head, etc. In practical terms, the RAW is the page that binds together the disparate objects that belong to the idea that is "The Blessed Damozel." These objects are collected under the heading "browse collection". RAWs provide commentary and information about the work and lead the user to particular documents.

RAD (Rossetti Archive Document), RAP (Rossetti Archive Picture), and Images

Those particular documents are the RADs, RAPs, and images. RADs and RAPs are XML files that give detailed information on a particular document: RADs deal with textual documents; RAPs deal with pictorial documents. Every document in the archive has its own RAD or RAP, unless that document is a photographic reproduction (exceptions include the Delaware repros, which carry a great deal of textual annotation). Photo reproductions are included in the RAP of the document they reproduce. Images of each document are also presented in its RAD or RAP file.

Thus, the basic navigation works as follows.

RAC --> RAW --> RAD/RAP; or
Genre --> Work --> Document (which holds images)
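One way to picture this hierarchy is as nested data. The Python sketch below is an illustrative model only, not the archive's actual file format: the class names are hypothetical, and the document names and their grouping under "doubleworks" are placeholders rather than real archive files.

```python
from dataclasses import dataclass, field
from typing import Iterator, List

@dataclass
class Document:
    """A RAD (textual) or RAP (pictorial) document; its images live inside it."""
    name: str
    kind: str  # "RAD" or "RAP"
    images: List[str] = field(default_factory=list)

@dataclass
class Work:
    """A RAW: the conceptual work that binds related documents together."""
    title: str
    documents: List[Document] = field(default_factory=list)

@dataclass
class Commentary:
    """A RAC: a genre-level portal that leads to works."""
    genre: str
    works: List[Work] = field(default_factory=list)

def navigation_paths(rac: Commentary) -> Iterator[str]:
    """Yield one 'Genre --> Work --> Document' path per document."""
    for work in rac.works:
        for doc in work.documents:
            yield f"{rac.genre} --> {work.title} --> {doc.name} ({doc.kind})"

# Placeholder contents for illustration only
damozel = Work("The Blessed Damozel", [
    Document("placeholder-manuscript", "RAD"),
    Document("placeholder-painting", "RAP", images=["placeholder.jpg"]),
])
doubleworks = Commentary("doubleworks", [damozel])

for path in navigation_paths(doubleworks):
    print(path)
# doubleworks --> The Blessed Damozel --> placeholder-manuscript (RAD)
# doubleworks --> The Blessed Damozel --> placeholder-painting (RAP)
```

The strict tree is a simplification: pages also link across works and documents, so the real archive behaves more like a web.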

From any page, the user can access a search engine. Also, each page usually has multiple links to other Archive works and documents, creating a vast interconnected web/hypertext.

Servers and Files Behind-the-Scenes

The Rossetti Archive is currently published to the internet from servers of the Institute for Advanced Technology in the Humanities (IATH) at the University of Virginia. Our staff currently uses a set of in-house servers for uploading and working with files. These are:

Jefferson

Our main collecting point for Rossetti Archive files and documentation. Jefferson includes our allocated web space, which contains these key directories:

  • /home10/rossetti_tools/ -- repository for scripts and technical stuff
  • /home10/rossetti_work/ -- the main editorial area, including
    • /ROSSETTI/SOURCE: includes all RAWs, RADs, and RAPs currently in process for publication to the web. This is not the live server but the central repository for files that are ready to go.
    • /ROSSETTI/HOLDING: includes templates, incomplete RADs currently being worked on, and the staging directory for images to be processed.
  • /home10/rossetti -- the live site, containing the XML, HTML, and TXT files accessed by our online users
  • /home10/rossetti2 -- the test site, usually a mirror of the live site, except when recent changes made there have not yet been deployed to the live site.

NINES (Networked Interface for Nineteenth-Century Electronic Scholarship)

The Rossetti Archive is among several scholarly projects associated with NINES, which was developed in-house. Through NINES, users can search and collect objects and information from the whole array of participating scholarly projects. This means that projects like the Rossetti Archive must encode their objects with certain standardized information to be compliant with the NINES interface. Most of this already happens behind the scenes: NINES harvests information from the tags within the Rossetti Archive's thousands of files. By following our standard tagging procedures and making sure that all tag attributes are correct, we are already providing NINES with the information it needs.

Once this information is harvested, NINES makes available to users an array of tools to collect, sort, juxtapose, tag, and manipulate files according to their own scholarly projects. See more about NINES at