xpatmaint [ -v ] [ -o ] [ -l log_file ] [ -1 ] [ -2 ] [ -3 ] [ -4 ] [ -5 ] [ -d delete_file ] -D main_data_dictionary -a append_data_dictionary
xpatmaint uses a five-stage, off-line process to merge the append database with the main database. If a delete_file is specified, xpatmaint will also delete portions of the main database. The first stage consists of a full scan of the main text to generate index update directives. The second stage consists of an in-place update of the main text and the main index, using the update directives produced in the first stage. Also during the second stage, the append text is physically appended to the end of the main text and any specified deletion portions are physically removed from the main text. The third stage consists of merging the region indices of the append database with the region indices of the main database. The region indices of the main database are also updated to reflect the deleted portions of text. The fourth stage consists of merging the fast find indices of the append database with the fast find indices of the main database. The fifth stage rebuilds the fast region indices used by either the main database or the append database. The final step updates the Data Dictionary for the main database.
The second requirement is that the database be off-line (i.e. with nobody searching it) while stages 2 to 5 are being performed. This requirement exists because the database is in an invalid state during these stages. See the Partial Execution Options section, below, for more details on running the various stages separately.
xpatmaint is usually run from the directory containing the main database. Because of this, file specifications in the append_data_dictionary should have full pathnames so they can be located from anywhere in the file system, particularly from the main database's directory.
For regions having the same name in both the main_data_dictionary and the append_data_dictionary, the region information from the append database is merged into the corresponding region file in the main database. Any region from the append database, whose name does not match a region name in the main database, is placed at the end of the main database's `.rgn' file. If the main database does not already have a `.rgn' file, then one is created.
The index file in the append database is not used by xpatmaint. Instead, xpatmaint rebuilds the index for the append database using the index specifications in the main_data_dictionary before the index merging takes place.
If xpatmaint is run with only the -1 option specified, it will only perform stage 1 and will write the index update directives into a file called pmt_dir in the current directory. When the time comes to perform stages 2 to 5, xpatmaint can be executed with the -2, -3, -4 and -5 options specified. xpatmaint will then read the update directives from the pmt_dir file and update the index and region indices. If stage 4 is required, the pmt_sv_dir directives file will be created during stage 2 processing. The pmt_sv_dir directives are required to update the fast find index. Stage 5 does not need any directives.
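The split described above can be sketched as a small shell script. This is only an illustration: the file names are taken from the example later in this page, the DRY_RUN variable and the run helper are hypothetical conveniences (not part of xpatmaint), and the -d delete_file option is left out for brevity. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: stage 1 while the database is still on-line, stages 2 to 5 once it
# has been taken off-line. Run from the main database's directory so that
# pmt_dir is written to (and later read from) the same place.
MAIN_DD=main_db.dd                         # from the example in this page
APPEND_DD=/usr/database/new_data/new.dd    # full pathname, as recommended
DRY_RUN=${DRY_RUN:-1}                      # hypothetical switch: 1 = print only

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

stage1() {
    # Full scan of the main text; writes the update directives to pmt_dir.
    run xpatmaint -1 -D "$MAIN_DD" -a "$APPEND_DD"
}

stages2to5() {
    # Reads pmt_dir (and, for stage 4, pmt_sv_dir created during stage 2).
    run xpatmaint -2 -3 -4 -5 -D "$MAIN_DD" -a "$APPEND_DD"
}

stage1
# ... take the database off-line here before continuing ...
stages2to5
```

Setting DRY_RUN=0 in the environment makes the script execute the commands instead of printing them.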
There is another benefit from these options. Even if no partial execution options are specified, xpatmaint still writes the update directives to the pmt_dir file after it has finished stage 1. The pmt_dir directives file is only removed after stage 3 completes. Should a machine crash occur after stage 2, it is only necessary to restore the index and region files before re-running xpatmaint with the -2 and -3 options specified. Should a machine crash occur during stage 3, only the region files would need to be restored before re-running xpatmaint with the -3 option specified.
After stage 2 completes successfully, the pmt_sv_dir directives file will have been created. As long as pmt_sv_dir was created successfully, should a machine crash occur during stage 4, only the fast find index files would need to be restored before re-running xpatmaint with the -4 option specified.
Stage 5 completely rebuilds all fast regions specified in either the main database or the append database. Therefore, no files need to be restored before re-running xpatmaint with the -5 option specified.
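Restoring files after a crash presupposes that copies were made before stage 2 started. The following is a minimal sketch of such a backup, assuming the index and region files carry the `.idx' and `.rgn' extensions used in the example in this page. The function names and the backup directory are hypothetical, and fast find index files are not covered because their names are not given here.

```shell
#!/bin/sh
# Sketch: copy index and region files aside before stages 2 and 3 update
# them in place, and copy them back if a crash makes a restore necessary.

backup_db_files() {
    # $1 = database directory, $2 = backup directory (hypothetical arguments)
    mkdir -p "$2" || return 1
    for f in "$1"/*.idx "$1"/*.rgn; do
        [ -f "$f" ] || continue        # skip patterns that matched nothing
        cp -p "$f" "$2"/ || return 1
    done
}

restore_db_files() {
    # $1 = backup directory, $2 = database directory
    for f in "$1"/*.idx "$1"/*.rgn; do
        [ -f "$f" ] || continue
        cp -p "$f" "$2"/ || return 1
    done
}
```

A typical sequence would be to call backup_db_files before stage 2 and, after a crash, restore_db_files followed by xpatmaint with the appropriate stage options, as described above.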
During stage 1, the scan rate for the main text is logarithmic in the size of the append text but tends to level off when the size of the append text exceeds 2 MB. Using the -o optimization option, a typical scan rate for a 1 MB append text is 140 KB/sec. So, stage 1 requires time equal to the size of the main text divided by the effective scan rate.
Stage 2 merges the append index with the main index. The rate at which this stage progresses varies with the relative sizes of the main and append texts. When the append text is 10% the size of the main text, the processing rate is typically 300 KB/sec. Stage 2 requires time equal to the size of the main index (not the main text) divided by the effective processing rate.
Stage 3 merges the append region files with the main region files. The time required to do this depends on the total size of all the region files from both databases but a typical processing rate is 400 KB/sec.
Stage 4 merges the append fast find index files with the main fast find index files. The time required to do this depends on the size of the fast find index files and the number of deleted text regions.
Stage 5 rebuilds the append fast region indices. The time required will be the same as the time required to run xpatfr independently on each fast region.
When all the above stages except stages 4 and 5 are combined, the overall processing rate for the addition of an append text that is 1% the size of the main text, on a Sun SPARCstation 2, is approximately 120 KB/sec relative to the size of the main text. If stage 4 is included, the rate drops to approximately 40 KB/sec. If stage 5 is also included, the rate decreases further, depending on the number of fast regions that need to be rebuilt.
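As a rough worked example of these figures, the sketch below divides a main text size by a processing rate to estimate elapsed time. The 100 MB main text (with a 1 MB append text, i.e. 1% of the main text) is a hypothetical case chosen to match the conditions quoted above; est_seconds is an illustrative helper, and actual rates will differ on other hardware.

```shell
#!/bin/sh
# est_seconds SIZE_KB RATE_KB_PER_SEC
# Prints SIZE_KB / RATE_KB_PER_SEC, rounded up to a whole second.
est_seconds() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

MAIN_KB=$((100 * 1024))    # hypothetical 100 MB main text

# Rates are the typical figures quoted in this section.
echo "stage 1 alone (140 KB/sec, -o, 1 MB append): $(est_seconds "$MAIN_KB" 140) sec"
echo "stages 1 to 3 combined (120 KB/sec):         $(est_seconds "$MAIN_KB" 120) sec"
echo "stages 1 to 4 combined (40 KB/sec):          $(est_seconds "$MAIN_KB" 40) sec"
```

Stage 5 is not included because its cost depends on the number of fast regions to be rebuilt rather than on a fixed rate.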
This example assumes that the main database is in the directory /usr/database/main. The database consists of the text (main_db), the index (main_db.idx), the Data Dictionary (main_db.dd), and two region files (main_db.rgn and custom.rgn). There is also a deletion portions file called main_delete.
The append text is 500 KB in size and is in the file /usr/database/new_data/new. The first step in merging the append text with the main database is to build an append database. This is done by going to the /usr/database/new_data directory and running xpatbld. xpatbld is given 1200 KB of memory to use - enough to index the 500 KB of text. In the following steps, note the use of full pathnames in the various file specifications:
% cd /usr/database/new_data
% xpatbld -m 1200k -t /usr/database/new_data/new \
      -o /usr/database/new_data/new
The region indices are built next, using a combination of multirgn and xpatrgn. It is assumed that a descriptive tagnames file with a `.tag' extension has been created for use by multirgn. It is further assumed that the pattern for the region called Custom, to be created using xpatrgn, is in the file custom.ptn and that the xpatrgn generated region pointers will be placed into a file called custom.rgn. Note that while this example uses multirgn and xpatrgn, sgmlrgn can also be used to create region files.
% multirgn -f new.dd tagnames.d
% xpatrgn new.dd \
      /usr/database/new_data/custom.rgn Custom < custom.ptn
The append database can now be merged with the main database. The databases are merged by going to the main database's directory and running xpatmaint.
% cd /usr/database/main
% xpatmaint -v -o -D main_db.dd -d main_delete \
      -a /usr/database/new_data/new.dd
  .
  .
(various progress messages from xpatmaint)
  .
  .
**** xpatmaint completed ****
%
At this point, the append text has been added to the main database's text file (main_db), an index has been built on the new text and has been merged with the main database's index. The regions in the append database's new.rgn file have been merged with the regions in the main database's main_db.rgn file. The regions in the append database's custom.rgn file have been merged with the regions in the main database's custom.rgn file, and the main database's Data Dictionary (main_db.dd) file has been updated to reflect the new state of the database. Note that the region files are merged by region name, and not on the basis of the region files themselves.
FINAL NOTE: For xpatmaint to run properly, it is important that the region indices created for the append database are proper xpat region indices. The validity of region files created by multirgn, sgmlrgn, or xpatrgn can be assumed. However, if a custom program is used to create the region files, then care should be taken to ensure that faulty data does not produce incorrect region indices.