Last updated | 2003-11-03 14:52:30 EST
Doc Title | Preparation for Index Building (Finding Aids)
Author 1 | Powell, Chris
Author 2 | Pagliere, Alan
CVS Revision | $Revision: 1.6 $
Preparation for Index Building (Finding Aids)
Setting up directories
You will need to identify directories where you plan to store your EAD2002
XML source files, your index file (approximately 75% of the size of your finding
aids), your "region" files, and other information such as data dictionaries
and the files you use to prepare your data. We recommend the following
structure:
- Store specialized scripts for your collection and its Makefile in $DLXSROOT/bin/{c}/{coll}/, where $DLXSROOT is
the "tree" where you install all DLXS components, {c} is the first
letter of the name of the collection you are indexing, and {coll} is
the collection ID of the collection you are indexing. For example, if your
collection ID is "bhlead" and your DLXSROOT is "/l1", you will place the
Makefile in /l1/bin/b/bhlead/, that is, at /l1/bin/b/bhlead/Makefile.
See directory conventions for more information.
- Store your source finding aids in $DLXSROOT/prep/{c}/{coll}/data/.
The Makefile will find them here.
- Store any DTDs, doctype, and files for preparing your data in $DLXSROOT/prep/{c}/{coll}/.
Unlike the contents of the other directories, nothing in prep should be
needed once the indexes are built and in use.
- After you run all the targets in the Makefile, the finalized, concatenated
XML file for your finding aids collection will be in $DLXSROOT/obj/{c}/{coll}/,
e.g., /l1/obj/b/bhlead/bhlead.xml.
- Store index, region, data dictionary, and init files in $DLXSROOT/idx/{c}/{coll}/,
e.g., /l1/idx/b/bhlead/bhlead.idx. These will be updated as the
index-related targets in the Makefile are run. See the XPAT
documentation for more on these types of files.
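As a rough illustration of the layout described above, the following Python sketch derives the per-collection directories from a DLXSROOT value and a collection ID. The function name collection_dirs and the dictionary keys are hypothetical, chosen for this example; they are not part of DLXS.

```python
import posixpath


def collection_dirs(dlxsroot, coll):
    """Return the per-collection directories described above.

    Hypothetical helper for illustration only; not part of DLXS.
    """
    c = coll[0]  # {c} is the first letter of the collection ID
    prep = posixpath.join(dlxsroot, "prep", c, coll)
    return {
        "bin": posixpath.join(dlxsroot, "bin", c, coll),  # Makefile and scripts
        "prep": prep,                                     # DTDs and prep files
        "data": posixpath.join(prep, "data"),             # source finding aids
        "obj": posixpath.join(dlxsroot, "obj", c, coll),  # concatenated XML
        "idx": posixpath.join(dlxsroot, "idx", c, coll),  # index, region, dd, init
    }


dirs = collection_dirs("/l1", "bhlead")
print(dirs["bin"])  # /l1/bin/b/bhlead
```

Passing each value to os.makedirs(path, exist_ok=True) would be one way to create the tree before copying files in.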
The files that are located in $DLXSROOT/bin/s/samplefa and $DLXSROOT/prep/s/samplefa can
be used as examples in preparing your finding aids and creating your indexes.
The following files may need their #! line adjusted to point to your location of
perl:
- $DLXSROOT/bin/f/findaid/output.dd.frag.pl
- $DLXSROOT/bin/f/findaid/inc.extra.dd.pl
- $DLXSROOT/bin/s/samplefa/prepdocs.pl (Note: prepdocs.pl is a Perl script
that processes the individual files in $DLXSROOT/prep/s/samplefa/data, so
review it and edit it as needed for whatever processing your files
require.)
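Adjusting the #! line by hand in each script is straightforward, but it could also be scripted. The sketch below is a minimal example under stated assumptions: set_perl_shebang is a hypothetical helper, it works on a script's text rather than on a file, and it leaves the script untouched when there is no #! line or when the line goes through env.

```python
import posixpath


def set_perl_shebang(script_text, perl_path):
    """Return script_text with its first-line #! pointing at perl_path.

    Hypothetical helper; options after the interpreter (e.g. -w) are kept.
    """
    lines = script_text.split("\n")
    if not lines[0].startswith("#!"):
        return script_text  # no #! line to adjust
    parts = lines[0][2:].split(None, 1)  # [interpreter, optional flags]
    if parts and posixpath.basename(parts[0]) == "env":
        return script_text  # env already looks perl up on the PATH
    flags = " " + parts[1] if len(parts) > 1 else ""
    lines[0] = "#!" + perl_path + flags
    return "\n".join(lines)


# The original interpreter path here is a made-up example value.
print(set_perl_shebang("#!/l/local/bin/perl -w\nprint 1;\n", "/usr/bin/perl"))
```

To apply it, read each of the listed scripts, pass the text through the function with your own perl location, and write the result back.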
The following files will need to be edited to reflect your collection names
and paths:
- $DLXSROOT/bin/s/samplefa/Makefile
- $DLXSROOT/prep/s/samplefa/samplefa.blank.dd
- $DLXSROOT/prep/s/samplefa/samplefa.extra.srch
- $DLXSROOT/prep/s/samplefa/samplefa.inp
The following file will need to be edited to reflect the location of your
entity reference files:
- $DLXSROOT/prep/s/samplefa/dlxsead2002.dtd
Preparing your data
In your prep directory, create a data subdirectory for your collection
and copy the finding aids for your collection into it. In our example collection,
which is a subset of the Bentley Historical Library's Finding Aids, this would
be $DLXSROOT/prep/s/samplefa/data/. Ensure that each finding aid has a unique
value in its eadid element.
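One way to check eadid uniqueness before validating is a quick scan of the files. The sketch below is an assumption-laden illustration, not a DLXS tool: duplicate_eadids is a hypothetical helper, and it uses a simple regular expression instead of an XML parser, since a parser may need the DTD and entity files to be resolvable.

```python
import re
from collections import defaultdict

# Rough pattern for <eadid ...>value</eadid>; not a full XML parse.
EADID_RE = re.compile(r"<eadid\b[^>]*>(.*?)</eadid>", re.DOTALL)


def duplicate_eadids(docs):
    """Map each repeated eadid value to the sorted names of files using it.

    docs maps a file name to that file's text. Files with no eadid element
    are grouped under the empty string. Hypothetical helper.
    """
    seen = defaultdict(list)
    for name, text in docs.items():
        match = EADID_RE.search(text)
        value = match.group(1).strip() if match else ""
        seen[value].append(name)
    return {v: sorted(names) for v, names in seen.items() if len(names) > 1}
```

An empty result means no two files share an eadid value. The docs dictionary could be built by reading every file in the data subdirectory.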
- At the command line, run make validateeach, which will check
each file in the data directory individually. Catching errors now (they will
show up in *.errors files in the data subdirectory) is much easier
than catching them later, after the files are concatenated into a large
bundle of XML.
- Check for error files in the data subdirectory and fix any parsing
errors in the culprit files.
- Back at $DLXSROOT/bin/{c}/{coll}, run make prepdocs,
which processes the files and concatenates them into a single XML file in the $DLXSROOT/obj/{c}/{coll}/ directory.
- Continue to indexing your Finding Aids
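The check for error files after make validateeach could be scripted along these lines. This is a minimal sketch: nonempty_error_files is a hypothetical helper, and it assumes that an empty *.errors file means the corresponding finding aid validated cleanly, which you should confirm against your own Makefile's behavior.

```python
from pathlib import Path


def nonempty_error_files(data_dir):
    """Return sorted names of *.errors files in data_dir that contain output.

    Hypothetical helper; assumes empty error files indicate no errors.
    """
    return sorted(
        p.name for p in Path(data_dir).glob("*.errors") if p.stat().st_size > 0
    )
```

Calling it on the data subdirectory for your collection gives the list of files to inspect before running make prepdocs.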