Indexing the Collection (Finding Aids)
After you have followed all the steps to set up your directories and prepare
your files, as found in the finding aids preparation
documentation, indexing the collection is fairly straightforward. To create
an index for use with the Findaid Class interface, you will need to index
the words in the collection, then index the XML (the structural metadata,
if you will), and then finally "fabricate" structures based on a combination
of elements (for example, defining what the "main entry" is, without adding
a <MAINENTRY> tag around the appropriate <AUTHOR> or <TITLE> element).
The following commands can be used to make the index, alone or in combination.
- Ensure that your collection data is valid by running make validate, which
will use the dlxsead2002.dtd to validate the full XML file.
- Ensure that your collection data is normalized by running make norm. This
step puts attributes in the order in which they were defined in the DTD. Even
though your collection data is XML, xmlrgn (part of the make xml step below)
requires that the attributes appear in this order.
- make singledd indexes words for texts that have been concatenated
into one large file for a collection. Creating an index from a single file
(as opposed to multi-file system indexing) is the recommended process for
reasons of speed and reliability. Use the make singledd command in
the Makefile stored at $DLXSROOT/bin/c/collid/Makefile.
- make xml indexes the XML structure by reading the DTD. xmlrgn
validates as it indexes, and for this reason is slower than multiregion
indexing (see the XPAT documentation for more information). However, this
method is necessary for collections that have nested elements of the same
name (and the EAD DTD permits this). Use the make xml command in the Makefile
stored at $DLXSROOT/bin/c/collid/Makefile.
- make post builds and indexes fabricated regions based on the XPAT
queries stored in the $DLXSROOT/prep/c/collid/{coll}.extra.srch file. Because
every collection is different, this file will need to be adapted after you
have determined what you want to use as the "main title" for a finding aid
(e.g., perhaps the ORIGINATION within the DID within the ARCHDESC) and how
many levels of components (e.g., nested to C04) you have in your collection.
If you try to index/build fabricated regions from elements not used in your
finding aids collection, make post will report errors like
Error found: <Error>syntax error before: ")</Error>
Use the make post command in the Makefile stored at
$DLXSROOT/bin/c/collid/Makefile.
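The targets above can be chained from a small wrapper. The following is a minimal sketch, not part of the DLXS distribution: the function name run_index_steps and the DRY_RUN variable are hypothetical, and it assumes a make that accepts -C (GNU make does) and that the collection Makefile defines the targets named above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DLXS): run the indexing targets
# in the order the list above describes, stopping at the first failure.
# Usage: run_index_steps MAKEFILE_DIR [TARGET...]
run_index_steps() {
    if [ "$#" -lt 1 ]; then
        echo "usage: run_index_steps MAKEFILE_DIR [TARGET...]" >&2
        return 2
    fi
    makefile_dir=$1
    shift
    # With no explicit targets, default to every step in documented order.
    [ "$#" -gt 0 ] || set -- validate norm singledd xml post
    for target in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # DRY_RUN is an assumption of this sketch: print, do not run.
            echo "make -C $makefile_dir $target"
        else
            make -C "$makefile_dir" "$target" || return 1
        fi
    done
}
```

For example, DRY_RUN=1 run_index_steps "$DLXSROOT/bin/c/collid" prints the five make invocations without running them, and run_index_steps "$DLXSROOT/bin/c/collid" xml post reruns only the last two steps after you edit the .extra.srch file.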
You have now built indexes and region files for your collection. You can test
that things are properly indexed by issuing the command
xpat $DLXSROOT/idx/c/collid/collid.dd and running searches such as
region "c02" and region "main". For more information about searching, see the
XPAT manual. It is a good idea to run this test from a directory other
than the one you indexed in, to ensure that relative or absolute paths are
resolving appropriately.
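One way to make the "different directory" check routine is to start xpat from a scratch directory. This is a sketch under stated assumptions: the function name check_index_from_elsewhere and the XPAT override variable are hypothetical, and mktemp -d is assumed to be available on your system. It only launches xpat on the data dictionary; you still type the queries (for example region "c02") at the xpat prompt yourself.

```shell
#!/bin/sh
# Hypothetical helper: open the index from an empty scratch directory so
# that any path in the .dd file that only resolves relative to the
# indexing directory shows up as an error.
# Usage: check_index_from_elsewhere /absolute/path/to/collid.dd
check_index_from_elsewhere() {
    dd_file=$1
    case $dd_file in
        /*) ;;  # absolute path: still valid after we change directory
        *)  echo "use an absolute path: $dd_file" >&2; return 2 ;;
    esac
    [ -r "$dd_file" ] || { echo "cannot read $dd_file" >&2; return 1; }
    scratch=$(mktemp -d) || return 1
    # XPAT is an assumption of this sketch: it lets you name another binary.
    ( cd "$scratch" && "${XPAT:-xpat}" "$dd_file" )
    status=$?
    rmdir "$scratch"
    return "$status"
}
```

A typical call would be check_index_from_elsewhere "$DLXSROOT/idx/c/collid/collid.dd".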