Tagging Resources

From Collex/ARC Wiki


Production and Maintenance of the Rossetti Archive

Staff at the Rossetti Archive do the following work:

  1. Acquire document images for the archive, by scanning or through library services, and process them for publishing on the web. We use standard procedures for digitizing images: scan the image, correct the scan, assign a name to the image, make a JPEG from the TIFF, and put everything where it's supposed to be. Everything is documented on the Imaging Procedures page.
  2. Create new RAW, RAD, and RAP files by tagging these documents in xml and supplying information and commentaries.
  3. Proof and parse these files for publishing to the web. All files are proofed several times against the originals or page images. Before they can be published, all xml files must be parsed against the most up-to-date version of "ram.xsd" to ensure compliance with the DTD.
  4. Republish the archive to reflect updates and new files.

As of December 2005, most of the production of images and new RAWs, RADs, and RAPs has already been done. Most of our efforts are currently devoted to proofing and polishing the archive, and ensuring its compliance with NINES.

Proofing Workflow

To proof any Rossetti Archive document, first check the large black folders in which we keep records of what has been proofed, what needs proofing, etc. For new proofing jobs, please use either the RAD or RAP proofing checklist, as appropriate. Follow along, checking off what steps you've taken, and file the finished document back in the folder. Please make sure to add a comment tag to any file you've modified in SOURCE.

Comment Tags

Always note when you've worked on or altered a document by putting comment tags in the header. Each file should contain some version of the following three comment tags:

<!-- last updated [date][initials] -->
<!-- text proofed [date][initials] -->
<!-- parsed [date][initials] -->

Comment tags should appear after the RAM header information and before the rad/rap/raw header begins. For example:

<?xml version="1.0" encoding="iso-8859-1"?>
<ram xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="../ram.xsd" archivetype="rad">

<!-- last updated 06/09/04 jjm -->
<!-- text proofed 06/11/04 mw -->
<!-- parsed 06/13/04 mw -->

<rad type="serial" id="workcode">
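A quick way to confirm that a file carries all three comment tags before filing it is a small script. The following is a minimal sketch, not part of the archive's tooling: the function name is hypothetical, and it assumes each comment begins with one of the three labels shown above, without checking the date or initials format.

```python
import re

# Labels taken from the three comment tags listed above.
REQUIRED_LABELS = ("last updated", "text proofed", "parsed")

# Matches an XML comment and captures its body.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def missing_comment_tags(xml_text):
    """Return the required labels that no comment in xml_text starts with.

    Hypothetical helper: only the leading label is checked, not the
    date or the initials that follow it.
    """
    bodies = [m.group(1).strip().lower() for m in COMMENT_RE.finditer(xml_text)]
    return [
        label
        for label in REQUIRED_LABELS
        if not any(body.startswith(label) for body in bodies)
    ]


sample = """<!-- last updated 06/09/04 jjm -->
<!-- text proofed 06/11/04 mw -->
<rad type="serial" id="workcode">"""

# The sample has no "parsed" comment, so that label is reported.
print(missing_comment_tags(sample))
```

An empty list means all three tags are present; anything else names the tags that still need to be added.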

Tags and Attributes

We use a wide range of XML tags that have different requirements. Many of these tags need to appear within or outside of other tags. Many tags can specify attributes that themselves characterize something about what they contain, or modify their contents: <tag attribute1="[value]">

To find out which attributes a tag can take, or within which other tags it can appear, check Tags Defined. Another good bet is to look at active, proofed rads for models.

If that fails, you can also check the DTD, or document type definition, which is the information contained in the file "ram.xsd". This more complicated document specifies what Rossetti Archive tags are and what their "children" can be. (For example, searching for the element named "msadds" we find it requires the attribute "type" and that its possible values are add, sig, assign, note, printdir, and other.) Please do not alter this document without consulting the project manager.

"Highlight Rendering": <hi rend="">

A commonly used tag for text formatting options. Punctuation should be coded outside of the <hi rend=""></hi> tags. Please note: do not use the <hi rend> tag for font styles (e.g. "gothic") or font sizes.

  • b -- bold
  • i -- italic
  • u -- underline
  • c -- all caps: renders a word or a series of words in all capital letters of equal height.
  • sc -- small caps: renders a word or a series of words in capital letters, with the letters that are capitalized in the source code rendered slightly larger. Do not tag small-capped words in all capitals, or we won't know which characters are larger than the others. Encode only the larger letters in upper case. Examples:
    • Right way: <hi rend="sc">Andromeda</hi>, by Perseus wed
    • Wrong way: <hi rend="sc">ANDROMEDA</hi>, by Perseus wed
  • inf -- inferior, or subscript: renders a word or series of words which appear below the line of text.
  • sup -- superior, or superscript: renders a word or series of words which appear above the line of text.
  • center -- center on page
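The small-caps rule above lends itself to an automated check. This is a rough sketch under stated assumptions: the function name is hypothetical, it only looks at spans written exactly as <hi rend="sc">...</hi>, and it does not handle tags nested inside the span.

```python
import re

# Captures the text inside a small-caps span.
SMALL_CAPS_RE = re.compile(r'<hi rend="sc">(.*?)</hi>', re.DOTALL)


def all_caps_small_caps(xml_text):
    """Return the contents of small-caps spans typed entirely in capitals.

    Hypothetical helper. A single capital letter is allowed, since one
    letter on its own cannot show relative size.
    """
    flagged = []
    for match in SMALL_CAPS_RE.finditer(xml_text):
        content = match.group(1)
        letters = [ch for ch in content if ch.isalpha()]
        if len(letters) > 1 and all(ch.isupper() for ch in letters):
            flagged.append(content)
    return flagged


right = '<hi rend="sc">Andromeda</hi>, by Perseus wed'
wrong = '<hi rend="sc">ANDROMEDA</hi>, by Perseus wed'

print(all_caps_small_caps(right))  # nothing to flag
print(all_caps_small_caps(wrong))  # flags the all-capitals span
```

Anything the check reports still needs a look against the page image, since only the source shows which letters are actually larger.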

Paragraph rendering

The default will be to render paragraphs as indented. To render paragraphs with no indent (ni = no indent), use:

<p rend="ni"> 

To render paragraphs with a "hanging indent" otherwise called an outdent (oi = out indent), use:

<p rend="oi">

Limits on the "rend" attribute

In years past, the "rend" attribute was frequently added to various tags, not just the <hi> tag; examples include the line break, line, and title tags. We're working on making these render, but this usage is now considered non-standard. Going forward, please use the <hi> tag whenever possible.

Line Breaks: <lb/>

A self-closing tag. Use line breaks in prose only with DGR works. We do not record line breaks in prose works by other authors, as in reviews or essays.

Quotation Marks

Never use the keyboard quotes (" ") anywhere except inside tags (e.g. <workcode="#">, <msadds type="printdir"> etc.). To encode quotation marks that should render in a document, always use the Unicode Entity References.

Entity references are formatting tags. They do not mark content, they merely indicate characters to be displayed. Clichés and words emphasized using quotation marks should be tagged with entity references. They should not be tagged with <quote> tags unless they are specific quotes. Entity references should be used to mirror the formatting on the page, and will often be used in conjunction with <quote> tags.
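To hunt for stray keyboard quotes, one option is to blank out everything inside tags and comments and then look at what is left. The sketch below is an illustration only: the function name is hypothetical, and it assumes attribute values never contain a ">" character.

```python
import re

# Comments, processing instructions and tags; keyboard quotes are allowed there.
MARKUP_RE = re.compile(r"<!--.*?-->|<[^>]*>", re.DOTALL)


def lines_with_keyboard_quotes(xml_text):
    """Return 1-based numbers of lines with a keyboard quote outside markup.

    Hypothetical helper. Markup is replaced by the newlines it contained,
    so line numbers still match the original file.
    """
    stripped = MARKUP_RE.sub(lambda m: "\n" * m.group(0).count("\n"), xml_text)
    return [
        number
        for number, line in enumerate(stripped.splitlines(), start=1)
        if '"' in line
    ]


sample = (
    '<msadds type="printdir">Printer: "set in italics"</msadds>\n'
    "<p>No keyboard quotes here</p>"
)

# Only the first line has keyboard quotes in its text content.
print(lines_with_keyboard_quotes(sample))
```

Each reported line should then be corrected by hand, replacing the keyboard quotes with the appropriate entity references.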

Quote Tags: <quote>

<quote> tags are used to mark content only. They are to be used whenever you encounter an actual quotation. This does not include clichés or unusual word usage, unless those are direct quotations.

<quote> tags will not render any quotation marks. If you are tagging a quotation that appears in quotation marks, you will need to use both the <quote> tag pair and entity references for the quotation marks (e.g. &#8220; for “). A cliché that is not a direct quotation takes the entity references alone. For example, "Hell hath no fury ..." used as a cliché would be tagged:

“Hell hath no fury . . . ”

Rare exception: when the cliché is itself being quoted, it does take <quote> tags. For example, As DGR wrote in his poem, "Hell hath no fury ..." would be tagged:

As DGR wrote in his poem,
<quote>“Hell hath no fury . . . ”</quote>

Internal Links to Existing Rossetti Documents

When DGR or others reference works in the Rossetti Archive (i.e. anything with a workcode), use the <xref> tag to link to that document. We tag for titles and what kind of work (level=); also for specific documents and what pages (from=). Two samples:

To make a live link to a work in the Rossetti Archive:

DGR revised the first sonnet when he came to publish it in the 
<title level="wrk">
	<xref doc="a.1-1870.1stedn.rad" workcode="1-1870" from="265">
	“Sonnets for Pictures”
	</xref>
</title>

Linking to an image works much the same way:

The painting 
<title level="pic">
<xref doc="a.s40.rap">
<hi rend="i">The Girlhood of Mary Virgin</hi>
</xref>
</title>
was begun in August 1848 and completed for exhibition in March 1849.

Bibliographic Citations: <bibl>

Use the tag structure below for any bibliographic citations, including author, work, pages, etc. These usually include an <xref> link to a non-existent rad, tagged link="dead" so it doesn't show up. (We do this because we assume that someday, someway, everything will be archived and available, even stuff we don't put in ourselves. On that glorious day, a global find-replace will resuscitate all these latent links into live ones.)

The second sonnet may or may not have been subjected
to revisions before it was finally (first) published
in regular print form (in 
<title level="bk">
<xref doc="a.nd497.r8s5.rad" link="dead" from="130" workcode="9-1848.s40">
<hi rend="i">DGR: A Record and a Study </hi>
</xref>
</title>
), after DGR's death.  

To find the list of rads for any work cited in the Rossetti Archive, see the "workscited" master list in jefferson://home10/rossetti_work/ROSSETTI/SOURCE/racs/workscited.rac.xml
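Before any find-replace of dead links, it helps to know which documents they point at. Here is a minimal sketch that lists the "doc" values of every <xref> tagged link="dead"; the function name is hypothetical, and it assumes attributes are always written with double quotes, as in the samples above.

```python
import re

# Captures the attribute text of an opening <xref> tag.
XREF_RE = re.compile(r"<xref\b([^>]*)>")

# Captures name="value" pairs inside a tag.
ATTR_RE = re.compile(r'([\w:.-]+)="([^"]*)"')


def dead_link_docs(xml_text):
    """Return the doc attribute of every <xref> tagged link="dead".

    Hypothetical helper for surveying latent links; it does not change
    the file in any way.
    """
    docs = []
    for match in XREF_RE.finditer(xml_text):
        attrs = dict(ATTR_RE.findall(match.group(1)))
        if attrs.get("link") == "dead" and "doc" in attrs:
            docs.append(attrs["doc"])
    return docs


sample = (
    '<xref doc="a.nd497.r8s5.rad" link="dead" from="130" workcode="9-1848.s40">\n'
    '<xref doc="a.s40.rap">'
)

# Only the first xref is tagged dead.
print(dead_link_docs(sample))
```

Comparing the result against the rads that actually exist would show which dead links are ready to be made live.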

Valid Title Attributes: <title level="">

For works by DGR:

  • wrk -- works
  • doc -- docs (books by DGR are docs, not books)
  • pic -- pictures
  • ms -- manuscripts
  • prf -- proofs
  • statue -- statue

Works not by DGR:

  • es -- essays
  • per -- periodicals
  • bk -- books
  • etx -- e-texts
  • eph -- ephemera
  • wrk -- anything else

Work Unit: <workunit>

Whenever a DGR work is quoted (and even when he quotes his own works), use the <workunit> tag. The workunit tag simply lets us know that the lines in question are from a DGR work; <workunit> must exist inside a <quote> tag pair, and may contain formatting tags (unicode entity references). Note that DGR's letters and quotations of conversations with him do not need <workunit> tags; only things that have (or should have) workcodes take a <workunit> tag.

The <workunit> tag identifies both block and in-line quotations of DGR's work. It is to be used primarily in secondary prose works.

Attributes: <workunit [attribute]="">

  • display (block or inline): indicates position and presentation
  • wholeness (whole or part): a complete work or not?
  • workcode
  • type
  • doc (optional)
  • to (optional)
  • from (optional)

Example: marking a block quotation in the midst of a paragraph from The Stealthy School of Criticism:

<p>A third quotation is from <title level="wrk">
<xref doc="20-1869.raw">Eden Bower</xref></title>, and says,

<workunit display="block" wholeness="part" workcode="20-1869" type="ballad">
<l n="1" r="187">“What more prize than love to impel thee?</l>
<l n="2" r="188">Grip and lip my limbs as I tell thee!”</l>
</workunit>

<lb/>Here again no reference is given . . . [etc.]

Work Codes

A workcode is the number identifying a discrete concept, regardless of whether or not that concept was ever realized as text or image. A work is any conceptual entity of DGR's, realized in any medium, or media, or never realized. A work may be nested within another work. E.g., the second "Willowwood" sonnet has the workcode 14b-1869, is part of the "Willowwood" sequence (14-1869), which is contained in "Songs and Sonnets towards a work to be called 'The House of Life'" (44-1869), which is part of the 1870 _Poems_ (1-1870). [discuss overlap - 44-1869 vs. 22-1881]

For textual works: workcodes are assigned in chronological order (as far as it is possible to determine): the first work of 1870, for example, will be 1-1870, the second 2-1870, and so on. (See /home10/rossetti_work/tools/INDEXES/MASTER.LIST)
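For scripts that need to sort or group textual workcodes, the number-year pattern can be split into its parts. The sketch below is inferred only from the examples on this page (1-1870, 14b-1869); the function names and the pattern are assumptions, and picture or other workcode forms are deliberately not matched.

```python
import re

# Sequence number, optional lower-case part letter, hyphen, four-digit year.
# Assumed pattern, based on examples such as 1-1870 and 14b-1869.
TEXT_WORKCODE_RE = re.compile(r"^(\d+)([a-z]?)-(\d{4})$")


def parse_text_workcode(workcode):
    """Return (number, part, year) for a textual workcode, or None.

    Hypothetical helper: part is "" when the workcode has no letter.
    """
    match = TEXT_WORKCODE_RE.match(workcode)
    if match is None:
        return None
    number, part, year = match.groups()
    return int(number), part, int(year)


def chronological_key(workcode):
    """Sort key: year first, then sequence number, then part letter.

    Raises TypeError for workcodes that parse_text_workcode rejects.
    """
    number, part, year = parse_text_workcode(workcode)
    return year, number, part


codes = ["2-1870", "14b-1869", "1-1870", "14-1869"]
print(sorted(codes, key=chronological_key))
```

Picture workcodes such as S244 return None from the parser, so they can be filtered out before sorting.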

For Pictures, workcodes are assigned according to their catalogue numbers (e.g.: S244 for Surtees, SA56 for Surtees Additions, F8 for Fredeman, and OP35 for OtherPictures). (In /home10/rossetti_work/tools/INDEXES/ see: surtees-index.txt; surteesadds-index.txt; otherpictures.list; and fredeman-index.txt)

For Letters: DGR and non-DGR letters take the following form:


For editorial commentary (prefaces, endnotes, etc) and editorial constructions (such as WMR's assemblage of DGR's limericks) use the following form:


For notebooks and other ephemeral writings:


For marginalia (i.e. manuscript commentary added to a printed text, not including DGR's revisions to proof sets, trialbooks, etc.):


The "Master List"

A complete list of all the workcodes in the archive, organized by type. Included in the list is information about what files the workcodes contain and are associated with. [Why does the master list continue to be useful?]

Valid ram Types

Ram types appear in the header of any RAD or RAP. They are not to be confused with metatypes, or with the "type" attributes of other tags like <title>.



Metatypes are identifiers for on-the-fly web sorting, added as attributes to the <rad>, <rap>, or <raw> elements at the tops of ALL documents.

Format: <rad id="" type="" metatype="">

Possible values are:

web.book (full DGR volumes, proofs, pamphlets, broadsides, musical scores, etc)
web.serial (periodicals)
web.poem (DGR)
web.prose (DGR)
web.translation (DGR)
web.manuscript (DGR)
web.otherbook (non-DGR)
web.otherpic (non-DGR)
web.visual (any DGR artwork)
web.doublework (this supersedes all other categories)

Valid div Types

A "div" tag denotes a section or subsection of a book, poem, etc. You can see the range of attributes for a <div> tag in Tags Defined. One particular attribute is crucial: type. The type attribute specifies exactly what kind of object we're dealing with. This information is essential for search results and for interfacing with the NINES project. Please do not invent new div types; use only valid type attributes from the list below.

<div0 type="[something valid]" n="1">

art criticism
art notes
bibliographic notes
cover notes
drama notes
dramatic monologue
half title
notebook entry
picture notes
poem group
poem notes
prolonged sonnet
story notes
table of contents
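Because invented div types cause trouble downstream, it can be worth checking a file against the list before proofing. This is a sketch only: the function name is hypothetical, the set below copies the list exactly as it appears on this page, and the pattern assumes div tags are named div, div0, div1, and so on.

```python
import re

# The div types listed on this page, copied as they appear above.
VALID_DIV_TYPES = {
    "art criticism", "art notes", "bibliographic notes", "cover notes",
    "drama notes", "dramatic monologue", "half title", "notebook entry",
    "picture notes", "poem group", "poem notes", "prolonged sonnet",
    "story notes", "table of contents",
}

# Captures the type attribute of an opening div tag (div, div0, div1, ...).
DIV_TYPE_RE = re.compile(r'<div\d*\b[^>]*\btype="([^"]*)"')


def invalid_div_types(xml_text):
    """Return, sorted, the div types in xml_text that are not in the list.

    Hypothetical helper; it reports values and leaves the file unchanged.
    """
    found = DIV_TYPE_RE.findall(xml_text)
    return sorted({value for value in found if value not in VALID_DIV_TYPES})


sample = '<div0 type="poem notes" n="1"><div1 type="sonnet cycle" n="1">'

# "sonnet cycle" is a made-up value used here to trigger the check.
print(invalid_div_types(sample))
```

A reported value is not necessarily wrong, since the list on this page may lag behind "ram.xsd"; when in doubt, check the schema or ask the project manager.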

Tagging Procedure for New RADs

New RADs are usually assigned by Jerry or a project manager. Contact either one for help, if necessary.

  1. Locate the document or image(s) of the document to tag from. We typically place such source files in algernon/SOURCE. Make a copy of the file locally, and work from this copy, replacing the original when finished.
  2. Select a copy text (the reading text; ask Jerry) from which to make the RAD. If there is no copy text existing, use one of the templates in jefferson://home10/rossetti_work/ROSSETTI/HOLDING/0proof.
  3. Open the copy text or template with the oXygen XML editor.
  4. Save the file in 0proof according to the file name conventions:
    1. Use workcodes and ".(extension)" to differentiate, if necessary. For example:
      • 1-1870.bl1pr.rad.xml = the British library proof of the 1st edition of the 1870 Poems.
      • 1-1870.troxexhum1a.rad.xml = Princeton/Troxell exhumation proof 1a.
      • 1-1870.fiz1pr.rad.xml = Fitzwilliam Museum, proof for 1st edition.
    2. For other books (not by Rossetti): use the Library of Congress number. For example:
      • pr5240.f11.rad.xml = the 1911 edition of Rossetti's Works, compiled by William Michael Rossetti. (See /home10/rossetti_work/tools/INDEXES/otherpictures.list)
    3. For Periodicals: use the Library of Congress number. (See /home10/rossetti_work/tools/INDEXES/periodicals.list)
  5. Comment the top of the file with the date and your initials, as in: <!-- created on 2-12-04 by mkw -->
  6. Proceed to edit the content of the copy text to match the document you are tagging. If needed, build an appropriate div structure and pages. See instructions on tagging a rad.
  7. Focus on the transcription. If the document requires editorial commentary, a physical description, bibliographic information, etc. contact Jerry.

Line Scheme

Every document of a particular work has its own relative line scheme, numbered serially for that document. Lineation can vary from version to version, so to attach notes and annotations to lines, to compare equivalent lines across versions, or to show all versions of a particular line, we need a reference lineation. We therefore construct an absolute text with an absolute lineation. This absolute text is a construction of the editor. Its base is the last authorized document (for Blessed Damozel, this is the 1881 edition).


  1. Take that base document and begin lineation at the start.
  2. As soon as any document shows variation from the base text, establish a subnumber for that variation. (E.g., the base moves ...37, 38, 39, 40 and then the Morgan MS has a variant stanza; this numeration begins 40.1, 40.2 etc., and one returns to the base as soon as the variation is lineated.)
  3. If any variant appears in a different position in yet another document (say, the Morgan MS 40.1 through 40.7 appears in 1856 after 1856 line 112), in the relative text it gets the 40.1...7 absolute number.
  4. One chooses all numbers in relation to the last authorized document in which the variant appears.
  5. In texts that are organized by uniform stanzas in series, one uses an equivalent scheme, treating each stanza as a single line.
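The numbering steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the archive's method: the function names are hypothetical, and absolute line labels are assumed to be written as a base number with an optional ".n" subnumber (40, 40.1, 40.2). The point of the sort key is that 40.10 must follow 40.2, which plain text sorting would get wrong.

```python
def variant_labels(after_base_line, count):
    """Return subnumbered labels for a variant passage.

    Hypothetical helper: a variant of `count` lines following base line 40
    is labelled 40.1, 40.2, and so on.
    """
    return [f"{after_base_line}.{i}" for i in range(1, count + 1)]


def absolute_line_key(label):
    """Turn a label such as "40" or "40.2" into a sortable tuple.

    Base lines get subnumber 0, so "40" sorts before "40.1".
    """
    base, _, sub = label.partition(".")
    return int(base), int(sub) if sub else 0


# A seven-line variant stanza after base line 40, as in the Morgan MS example.
print(variant_labels(40, 7))

labels = ["41", "40.10", "40", "40.2", "39", "40.1"]
print(sorted(labels, key=absolute_line_key))
```

Sorting on the tuple keeps every variant line directly after the base line it follows, whichever document it comes from.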