Last updated | 2003-12-01 11:31:02 EST |
Doc Title | Image Class XPAT Index Building |
Author 1 | Weise, John |
CVS Revision | $Revision: 1.6 $ |
This document describes the steps necessary to build an XPAT index for the Image Class.
New in Version 2:
Image Class is distributed with a preconfigured XPAT index directory named "image-blank" that can be used as boilerplate for building new Image Class XPAT indexes. It is located at:
$DLXSROOT/idx/i/image-blank
In DLXS, all content data (SGML for Image Class) are stored under $DLXSROOT/obj, except for continuous-tone images, which are stored under $DLXSROOT/img. Each collection needs its own collection-specific obj and idx directories.
Starting with DLXS CD #8, a shell script ($DLXSROOT/bin/i/image/setupcollindex) is included that automatically creates and configures the idx and obj directories for a new collection. It also copies the SGML file from $DLXSROOT/prep/c/collid to $DLXSROOT/obj/c/collid. It does not build the index itself.
usage: $DLXSROOT/bin/i/image/setupcollindex c/collid
example: $DLXSROOT/bin/i/image/setupcollindex s/sampleic
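Where setupcollindex is not available (for example on an installation older than CD #8), the same preparation can be approximated by hand. The following is a minimal sketch, not the actual contents of setupcollindex: the function name setup_coll_dirs is hypothetical, and it assumes DLXSROOT is set, the collection's SGML files sit in $DLXSROOT/prep/c/collid, and image-blank is used as the boilerplate. The copied configuration files will still need collection-specific edits, which the real script handles.

```shell
#!/bin/sh
# Hypothetical sketch of the directory setup that setupcollindex automates;
# it is NOT the real script. Assumes DLXSROOT is set, the collection's SGML
# is in $DLXSROOT/prep/c/collid, and image-blank serves as the boilerplate.
setup_coll_dirs() {
    coll="$1"                                # e.g. s/sampleic
    if [ -z "$DLXSROOT" ] || [ -z "$coll" ]; then
        echo "usage: DLXSROOT=... setup_coll_dirs c/collid" >&2
        return 1
    fi
    idx="$DLXSROOT/idx/$coll"
    obj="$DLXSROOT/obj/$coll"
    prep="$DLXSROOT/prep/$coll"

    # Never clobber an existing index directory.
    if [ -e "$idx" ]; then
        echo "refusing to overwrite existing $idx" >&2
        return 1
    fi

    # Start the new index directory as a copy of image-blank.
    mkdir -p "$(dirname "$idx")" || return 1
    cp -R "$DLXSROOT/idx/i/image-blank" "$idx" || return 1

    # Copy the prepared SGML into the collection's obj directory.
    mkdir -p "$obj" || return 1
    cp "$prep"/* "$obj"/ || return 1

    echo "created $idx and $obj; now edit the copied files for $coll"
}
```

With DLXSROOT set, a call such as setup_coll_dirs s/sampleic would create $DLXSROOT/idx/s/sampleic and $DLXSROOT/obj/s/sampleic. The function refuses to run over an existing idx directory so that a working index is never overwritten by accident.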
With all of the SGML files properly placed in the $DLXSROOT/obj/c/collid directory and the $DLXSROOT/idx/c/collid directory set up, the XPAT index can be built. Build time depends on the amount of data and the available computing power: most collections of several thousand records build in less than an hour, while large collections can take several hours. For a first attempt, build an index from a small amount of data; a few hundred records is a good starting point and takes only a few minutes.
Tip: To build the index in the background, so that the process keeps running even if your login session is lost, run: nohup make all &
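The tip can be wrapped in a small function that also captures the build output in a log file and records the process ID. This is a sketch under stated assumptions: start_index_build and make.log are hypothetical names, and the directory passed in is assumed to be the one holding the collection's Makefile.

```shell
#!/bin/sh
# Hypothetical wrapper around "nohup make all &". Assumes the directory
# passed as $1 contains the collection's Makefile.
start_index_build() {
    builddir="$1"
    if [ ! -d "$builddir" ]; then
        echo "no such directory: $builddir" >&2
        return 1
    fi
    log="$builddir/make.log"
    # The subshell cds into builddir so the caller's working directory is
    # left alone; exec replaces the subshell with nohup, so $! is the PID
    # of the build itself. nohup makes the build ignore the hangup signal
    # sent when the session is lost. Output goes to make.log, not nohup.out.
    ( cd "$builddir" && exec nohup make all ) > "$log" 2>&1 &
    BUILD_PID=$!
    echo "build started (PID $BUILD_PID); follow it with: tail -f $log"
}
```

From the same shell, wait "$BUILD_PID" blocks until the build finishes and returns make's exit status. If the session has been lost in the meantime, the log file is the place to check progress.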
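You can test the index by starting an XPAT session on the command line from within $DLXSROOT/idx/c/collid. In the session below, region "ENTRY" reports how many ENTRY records the index contains, and pr sample prints a sample of the matches.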
jweise@kukicha% xpat image.dd
Digital Library eXtension Service, XPAT, Release 5.2
COPYRIGHT (c) 2000 The Regents of the University of Michigan
All Rights Reserved
>> region "ENTRY"
1: 8 matches
>> pr sample
1327, ..D</BASE></GEN><ENTRY COLLID="MCSAMPLEIC" ENTRYID="X-34" CA="samp..
4245, ..D></I></ENTRY><ENTRY COLLID="MCSAMPLEIC" ENTRYID="X-49" CA="samp..
5090, ..D></I></ENTRY><ENTRY COLLID="MCSAMPLEIC" ENTRYID="X-51" CA="samp..
5970, ..D></I></ENTRY><ENTRY COLLID="MCSAMPLEIC" ENTRYID="X-52" CA="samp..
6802, ..D></I></ENTRY><ENTRY COLLID="MCSAMPLEIC" ENTRYID="X-59" CA="samp..
7581, ..D></I></ENTRY><ENTRY COLLID="MCSAMPLEIC" ENTRYID="X-62" CA="samp..
10101, ..D></I></ENTRY><ENTRY COLLID="MCSAMPLEIC" ENTRYID="X-77" CA="samp..
14959, ..D></I></ENTRY><ENTRY COLLID="MCSAMPLEIC" ENTRYID="X-84" CA="samp..
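A quick sanity check is to count the ENTRY start tags in the collection's SGML and compare the total with the match count XPAT reports for region "ENTRY"; the two numbers should agree. The sketch below assumes each record is one ENTRY element and that start tags are written in upper case as in the sample session. count_entries is a hypothetical name, and grep -o, while not POSIX, is supported by both GNU and BSD grep.

```shell
#!/bin/sh
# Rough check: count <ENTRY ...> / <ENTRY> start tags across the given SGML
# files. End tags (</ENTRY>) and names such as ENTRYID are not counted.
# A start tag split across a line break would be missed.
count_entries() {
    # grep -o prints each match on its own line; wc -l counts them, and
    # tr strips the padding some wc implementations add.
    cat "$@" | grep -o '<ENTRY[ >]' | wc -l | tr -d ' '
}
```

For the sample collection, one would run count_entries "$DLXSROOT"/obj/s/sampleic/* and compare the result with the number printed after region "ENTRY".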
It is possible, and often preferable, to move a built index to a new location. For example, at Michigan an XPAT index is built on a development machine and then moved to a production machine. Building an index is a CPU-intensive process that can take anywhere from a few minutes to several hours, so building on the development machine removes that burden from the production machine. It also allows the index to be tested thoroughly in the development environment before it is moved to production.
The steps for moving an index and associated SGML files from one machine to another, and into production are:
Because paths are hard-coded in the index, the index must be placed in an identical directory location at the destination; otherwise it will not work.
It can be useful to maintain multiple instances of the idx and obj directories for a single collection and use a symlink to point to the instance the middleware should use. For example, one could have $DLXSROOT/idx/c/collid-a and $DLXSROOT/idx/c/collid-b, plus a symlink $DLXSROOT/idx/c/collid that points to either the a or the b instance. An update can then be built in the inactive instance and put into service by repointing the symlink, which keeps disruption of service to a minimum.
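The symlink arrangement can be scripted. In this sketch, switch_instance is a hypothetical name, and it assumes collid-a and collid-b instances already exist under both idx and obj, and that c/collid is either absent or already a symlink. It relies on ln -sfn, where -n makes ln replace an existing symlink to a directory instead of creating a new link inside that directory.

```shell
#!/bin/sh
# Hypothetical sketch: point $DLXSROOT/{idx,obj}/c/collid at instance
# "a" or "b" (i.e. collid-a or collid-b in the same parent directory).
switch_instance() {
    coll="$1"        # e.g. s/sampleic
    inst="$2"        # a or b
    name=$(basename "$coll")

    # Validate both areas before touching anything, so a missing obj
    # instance cannot leave idx and obj pointing at different instances.
    for area in idx obj; do
        link="$DLXSROOT/$area/$coll"
        if [ ! -d "$link-$inst" ]; then
            echo "missing $link-$inst" >&2
            return 1
        fi
        if [ -e "$link" ] && [ ! -L "$link" ]; then
            echo "$link is a real directory, not a symlink" >&2
            return 1
        fi
    done

    for area in idx obj; do
        # Relative target: the link resolves to collid-a or collid-b
        # next to it.
        ln -sfn "$name-$inst" "$DLXSROOT/$area/$coll" || return 1
    done
}
```

For example, switch_instance s/sampleic b would repoint both s/sampleic links at the b instance. Because ln -sfn removes and recreates the link, the switch is quick but not strictly atomic.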
A better approach is to build indexes in a development environment (preferably on a separate machine) and use a tool such as rdist to transfer the index files to the production location.
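As one way to do this with rdist, the sketch below writes a Distfile that lists a collection's idx and obj directories. Because the install command is given no destination name, rdist updates each file at the same path on the remote host, which matches the identical-location requirement for the index. The function name write_distfile and the host name are hypothetical; the sketch assumes rdist is installed on both machines and that DLXSROOT is the same path on each.

```shell
#!/bin/sh
# Hypothetical sketch: print an rdist Distfile that pushes one collection's
# idx and obj directories to the same paths on a production host.
write_distfile() {
    host="$1"        # production host name (placeholder)
    coll="$2"        # e.g. s/sampleic
    if [ -z "$DLXSROOT" ] || [ -z "$host" ] || [ -z "$coll" ]; then
        echo "usage: DLXSROOT=... write_distfile host c/collid" >&2
        return 1
    fi
    # The ${FILES} and ${HOSTS} references are escaped so they reach
    # rdist literally instead of being expanded by the shell.
    cat <<EOF
HOSTS = ( $host )
FILES = ( $DLXSROOT/idx/$coll $DLXSROOT/obj/$coll )

\${FILES} -> \${HOSTS}
        install ;
EOF
}
```

After write_distfile prodhost s/sampleic > Distfile, running rdist -n -f Distfile prints the commands rdist would execute without running them, and rdist -f Distfile performs the transfer.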