Last updated: 2003-03-04 09:44:03 EST
Doc Title: Index Building, Bibliographic Class
Author: Hagedorn, Kat
CVS Revision: $Revision: 1.6 $
Index Building, Bibliographic Class
You will need to identify a directory or directories where you plan to store
your SGML or XML source file, your index file (approximately 75% of the size
of your bibliographic information), your "region" files, and other information
such as data dictionaries. We recommend you use the following structure:
- Store SGML or XML files in {DLXSROOT}/obj/{s}/{sample}/, where {DLXSROOT} is
the "tree" where you install all DLXS components, {s} is the first
letter of the collection ID, and {sample} is the ID of the collection you are
indexing. For example, if your collection ID is "nyt" and your DLXSROOT is
"/l1", you will place nyt.xml in /l1/obj/n/nyt/, i.e., /l1/obj/n/nyt/nyt.xml.
See directory conventions for more information.
- Store index, region, data dictionary, and init files in {DLXSROOT}/idx/{s}/{sample}/,
e.g., /l1/idx/n/nyt/nyt.idx. See the XPAT
documentation for more on these types of files.
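If you prefer to script this layout, the following is a minimal sketch, assuming bash or another Bourne-compatible shell. The function name collection_dirs is hypothetical and not part of DLXS. It takes a DLXSROOT and a collection ID, derives {s} from the first letter of the ID, creates both directories, and prints them.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a DLXS tool): create the recommended obj/ and
# idx/ directories for one collection and print them, one per line.
# Usage: collection_dirs DLXSROOT COLLECTION_ID
collection_dirs() {
  local root="$1"                  # DLXSROOT, e.g. /l1
  local coll="$2"                  # collection ID, e.g. nyt
  local s="${coll%"${coll#?}"}"    # first letter of the collection ID, e.g. n
  mkdir -p "$root/obj/$s/$coll" "$root/idx/$s/$coll" || return 1
  printf '%s\n' "$root/obj/$s/$coll" "$root/idx/$s/$coll"
}
```

For example, collection_dirs /l1 nyt should create and print /l1/obj/n/nyt and /l1/idx/n/nyt, provided you have write permission under /l1.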
The instructions below assume a sample collection named "nyt" and a DLXSROOT
of "/l1", as in the above examples. Replace these sample values with your own
collection ID and DLXSROOT.
- Ensure that your SGML is fully validated or normalized, or that your XML
is fully validated. Use a validating parser such as nsgmls to
accomplish this. NB: Building indexes from unvalidated data can cause problems
such as unreliable search results; data that will not validate should not be
put online. Assuming SGML, place the validated file at /l1/obj/n/nyt/nyt.sgm.
- Copy the sample data dictionary file bib-sample.dd to /l1/idx/n/nyt/ and
rename it nyt.dd.
- Edit nyt.dd to replace:
  - b/bib-sample/bib-sample.sgm with n/nyt/nyt.sgm
  - b/bib-sample/bib-sample.idx with n/nyt/nyt.idx
  - b/bib-sample/bib-sample.init with n/nyt/nyt.init
- Copy the sample init file bib-sample.init to /l1/idx/n/nyt/ and
rename it nyt.init.
- Index your collection with the following command, replacing the value 10m with
an appropriate amount of memory. See the XPAT documentation to determine
how much memory to allocate.
xpatbld -m 10m -D /l1/idx/n/nyt/nyt.dd
- Create your region files with the following command:
multirgn -f -D /l1/idx/n/nyt/nyt.dd -t bib-regions.tags
The file bib-regions.tags can be located in
any directory and can be deleted after the regions have been indexed. DLPS
keeps a copy of this file in /l1/obj/lib/sgml/bib-regions.tags.
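The steps above can be strung together in one script. This is a minimal sketch, not a DLXS-supplied tool: the functions make_dd and build_collection are hypothetical. It assumes that bib-sample.dd and bib-sample.init sit together in a directory you pass in, that nyt.sgm carries a DOCTYPE that nsgmls can resolve (for example through your SGML catalog), that the obj/ and idx/ directories already exist, and that nsgmls, xpatbld and multirgn are on your PATH. It stops at the first failing step, so data that does not validate is never indexed.

```shell
#!/usr/bin/env bash
# Hypothetical build helper for one Bibliographic Class collection.
# Assumes nsgmls, xpatbld and multirgn are on PATH and that
# {DLXSROOT}/obj/{s}/{coll} and {DLXSROOT}/idx/{s}/{coll} already exist.

# make_dd SAMPLE_DD IDXDIR S COLL
# Copies the sample data dictionary to IDXDIR/COLL.dd in one pass,
# rewriting the three bib-sample paths to point at the new collection.
make_dd() {
  local sample_dd="$1" idxdir="$2" s="$3" coll="$4"
  sed -e "s#b/bib-sample/bib-sample\.sgm#$s/$coll/$coll.sgm#g" \
      -e "s#b/bib-sample/bib-sample\.idx#$s/$coll/$coll.idx#g" \
      -e "s#b/bib-sample/bib-sample\.init#$s/$coll/$coll.init#g" \
      "$sample_dd" > "$idxdir/$coll.dd"
}

# build_collection DLXSROOT COLL SAMPLE_DIR TAGS_FILE [MEMORY]
build_collection() {
  local root="$1" coll="$2" sample_dir="$3" tags="$4" mem="${5:-10m}"
  local s="${coll%"${coll#?}"}"    # first letter of the collection ID
  local objdir="$root/obj/$s/$coll"
  local idxdir="$root/idx/$s/$coll"
  local dd="$idxdir/$coll.dd"

  # Validate first; nsgmls -s suppresses normal output and reports errors only.
  nsgmls -s "$objdir/$coll.sgm" || { echo "validation failed: not indexing" >&2; return 1; }

  # Copy and edit the data dictionary, then copy the init file.
  make_dd "$sample_dir/bib-sample.dd" "$idxdir" "$s" "$coll" || return 1
  cp "$sample_dir/bib-sample.init" "$idxdir/$coll.init" || return 1

  # Build the index, then the region files, from the same data dictionary.
  xpatbld -m "$mem" -D "$dd" || return 1
  multirgn -f -D "$dd" -t "$tags"
}
```

For example: build_collection /l1 nyt /path/to/samples /l1/obj/lib/sgml/bib-regions.tags 10m, where /path/to/samples is a placeholder for wherever your copies of the sample files live. The sed substitutions assume the sample paths appear in bib-sample.dd exactly as listed in the editing step above.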
You have now built indexes and region files for your collection. You can test
that everything is properly indexed by issuing the command
xpat /l1/idx/n/nyt/nyt.dd
and then searching for a common word (e.g., "the") and for
region A.
It is a good idea to run this test from a directory other than the one you
indexed in, to confirm that the relative or absolute paths in your data
dictionary resolve correctly.
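To make this cross-directory check repeatable, a sketch like the following runs xpat inside a subshell that first changes to another directory, so your own shell's working directory is unchanged afterwards. The function name xpat_from_elsewhere is hypothetical, and the sketch assumes xpat is on your PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start xpat on a data dictionary from a different
# working directory. The ( ... ) subshell keeps the caller's directory intact.
# Usage: xpat_from_elsewhere DATA_DICTIONARY [DIRECTORY]
xpat_from_elsewhere() {
  local dd="$1"
  local dir="${2:-/}"    # default: run from the filesystem root
  ( cd "$dir" && xpat "$dd" )
}
```

For example, xpat_from_elsewhere /l1/idx/n/nyt/nyt.dd /tmp opens the session from /tmp; then run the searches described above.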