XPATMAINT

Section: User Commands (1)
Updated: November 2000

NAME

xpatmaint - XPAT text DBMS maintenance utility

SYNOPSIS

xpatmaint [ -v ] [ -o ] [ -l log_file ] [ -1 ] [ -2 ] [ -3 ] [ -4 ] [ -5 ] [ -d delete_file ] -D main_data_dictionary -a append_data_dictionary

DESCRIPTION

xpatmaint uses a five-stage, off-line process to merge the append database with the main database. If a delete_file is specified, xpatmaint will also delete portions of the main database. The first stage consists of a full scan of the main text to generate index update directives. The second stage consists of an in-place update of the main text and the main index, using the update directives produced in the first stage. Also during the second stage, the append text is physically appended to the end of the main text and any specified deletion portions are physically removed from the main text. The third stage merges the region indices of the append database with the region indices of the main database, and updates the region indices of the main database to reflect the deleted portions of text. The fourth stage merges the fast find indices of the append database with those of the main database. The fifth stage rebuilds the fast region indices used by either the main database or the append database. After the fifth stage, a final step updates the Data Dictionary for the main database.

OPTIONS

-v: verbose - report progress through the stages of execution. If -v is not specified, xpatmaint works quietly.
-o: optimize - use optimization mode. This option increases the speed with which the main text is scanned by using a special speedup algorithm. This algorithm increases the amount of core memory allocated during the first stage. This option should always be used if core memory is not an issue.
-l log_file: log - log progress through the stages of execution. This option is similar in nature to the -v option, except that the output is placed in a file. xpatmaint will append log information if the specified log_file already exists. This makes the log_file useful as an audit trail for updates to the main database.
-1 [ -2 [ -3 [ -4 [ -5 ] ] ] ]
-2 [ -3 [ -4 [ -5 ] ] ]
-3 [ -4 [ -5 ] ]
-4 [ -5 ]
-5: partial execution options - specify partial execution. xpatmaint can be started or re-started at any stage using these options. For more details, refer to the Description section above, and the Partial Execution Options section, below.
-d delete_file: deletion portions file - specify the portions of the main text to be deleted. For more details, refer to the Deletion Portions File section, below.

NOTES

As mentioned above, xpatmaint physically modifies various files of the main database during its 2nd to 5th stages. Two requirements result from this method of operation. The first requirement is to ensure that the database is backed up before xpatmaint is run. A backup is necessary because xpatmaint will leave the database in an invalid state if it terminates abnormally during an update operation. In such cases, the database can be restored from backup and the update can then be re-started. See the Partial Execution Options section, below, for more details on re-starting an update.

The second requirement is that the database be off-line (i.e. with nobody searching it) while stages 2 to 5 are being performed. This requirement exists because the database is in an invalid state during these stages. See the Partial Execution Options section, below, for more details on running the various stages separately.

xpatmaint is usually run from the directory containing the main database. Because of this, file specifications in the append_data_dictionary should have full pathnames so they can be located from anywhere in the file system, particularly from the main database's directory.

For regions having the same name in both the main_data_dictionary and the append_data_dictionary, the region information from the append database is merged into the corresponding region file in the main database. Any region from the append database, whose name does not match a region name in the main database, is placed at the end of the main database's `.rgn' file. If the main database does not already have a `.rgn' file, then one is created.

The index file in the append database is not used by xpatmaint. Instead, xpatmaint rebuilds the index for the append database using the index specifications in the main_data_dictionary before the index merging takes place.

PARTIAL EXECUTION OPTIONS

During the first stage of xpatmaint, users can still search the database since the main and append database text files are only scanned, not physically changed. However, the main database must be taken off-line for the 2nd to 5th stages. The first stage generally takes much longer to run than the other stages. As such, it is sometimes convenient to have stage 1 run while users are using the database (e.g. as a low-priority process during the day), and then run stages 2 to 5 afterwards (e.g. at night). This policy can be implemented using the partial execution options to xpatmaint.

If xpatmaint is run with only the -1 option specified, it will perform only stage 1 and will write the index update directives into a file called pmt_dir in the current directory. When the time comes to perform stages 2 to 5, xpatmaint can be executed with the -2, -3, -4 and -5 options specified. xpatmaint will then read the update directives from the pmt_dir file and update the index and region indices. If stage 4 is required, the pmt_sv_dir directives file will be created during stage 2 processing. The pmt_sv_dir directives are required to update the fast find index. Stage 5 does not need any directives.
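The split between a daytime stage 1 run and a later run of stages 2 to 5 can be sketched as a small helper that assembles the two command lines. This is only an illustration: the function name build_xpatmaint_argv and its parameters are hypothetical, and the rule that the selected stages must form one consecutive run is a reading of the nested option listing in the Options section, not something xpatmaint is known to enforce. The helper only builds argument lists; it does not run xpatmaint.

```python
def build_xpatmaint_argv(main_dd, append_dd, stages=(), delete_file=None,
                         log_file=None, verbose=False, optimize=False):
    """Assemble an xpatmaint command line from the documented options.

    Hypothetical helper: only the option letters come from the manual page.
    """
    selected = sorted(set(stages))
    if selected:
        if selected[0] < 1 or selected[-1] > 5:
            raise ValueError("stages must be between 1 and 5")
        # Assumption: partial execution covers one consecutive run of stages,
        # as suggested by the "-1 [ -2 [ -3 ... ] ]" option listing.
        if selected != list(range(selected[0], selected[-1] + 1)):
            raise ValueError("stages must form one consecutive run")

    argv = ["xpatmaint"]
    if verbose:
        argv.append("-v")
    if optimize:
        argv.append("-o")
    if log_file is not None:
        argv += ["-l", log_file]
    argv += ["-%d" % stage for stage in selected]
    if delete_file is not None:
        argv += ["-d", delete_file]
    argv += ["-D", main_dd, "-a", append_dd]
    return argv


# Daytime: scan only, while users may still search the database.
day_run = build_xpatmaint_argv("main_db.dd", "/usr/database/new_data/new.dd",
                               stages=[1], optimize=True)

# Night: the database is off-line for stages 2 to 5.
night_run = build_xpatmaint_argv("main_db.dd", "/usr/database/new_data/new.dd",
                                 stages=[2, 3, 4, 5])
```

Both invocations would need to be started from the same directory, since the pmt_dir file is written to and read from the current directory.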

There is another benefit from these options. Even if no partial execution options are specified, xpatmaint still writes the update directives to the pmt_dir file after it has finished stage 1. The pmt_dir directives file is only removed after stage 3 completes. Should a machine crash occur after stage 2, it is only necessary to restore the index and region files before re-running xpatmaint with the -2 and -3 options specified. Should a machine crash occur during stage 3, only the region files would need to be restored before re-running xpatmaint with the -3 option specified.

Once stage 2 has completed successfully, the pmt_sv_dir directives file will have been created. Provided that pmt_sv_dir was created successfully, should a machine crash occur during stage 4, only the fast find index files would need to be restored before re-running xpatmaint with the -4 option specified.

Stage 5 completely rebuilds all fast regions specified in either the main database or the append database. Therefore, no files need to be restored before re-running xpatmaint with the -5 option specified.
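The recovery rules described in this section can be collected into a small lookup table. The sketch below is only a restatement of the text above, not a tool shipped with XPAT; the names RECOVERY and recovery_plan are hypothetical, and the entry for stage 2 follows the wording above for a crash after stage 2.

```python
# Keyed by the stage at which the crash occurred. Each entry lists what to
# restore from backup and which stage options the text says to re-run with.
RECOVERY = {
    2: {"restore": ["index", "region files"], "rerun": ["-2", "-3"]},
    3: {"restore": ["region files"], "rerun": ["-3"]},
    4: {"restore": ["fast find index files"], "rerun": ["-4"]},
    5: {"restore": [], "rerun": ["-5"]},
}


def recovery_plan(stage):
    """Return the restore list and restart options for a crashed stage."""
    if stage not in RECOVERY:
        raise ValueError("no recovery rule is documented for stage %r" % (stage,))
    return RECOVERY[stage]


plan = recovery_plan(3)
print("restore:", ", ".join(plan["restore"]) or "nothing")
print("re-run xpatmaint with:", " ".join(plan["rerun"]))
```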

DELETION PORTIONS FILE

The deletion portions file specifies the portions of the main database that are to be removed. The portions are specified by pairs of start and end positions and these positions are inclusive. The positions must be monotonically increasing and must not overlap. Each pair of positions must be on a separate line. Positions are 1-based offsets into the text. This means that the first character of the file is at position 1. For example,


122 345
790 930
3507 5603

Text between offsets 122 and 345, 790 and 930, and 3507 and 5603 in the main database will be deleted. Note that the positions are monotonically increasing and do not overlap.
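Because a malformed deletion portions file is applied to the main text in place, it may be worth checking the file before running xpatmaint. The following is a minimal sketch of such a check, based only on the rules stated above (one pair per line, 1-based inclusive positions, increasing, no overlap). The function names are hypothetical, and skipping blank lines is an assumption of the sketch; xpatmaint itself may be stricter or more lenient.

```python
def parse_deletion_portions(text):
    """Parse 'start end' pairs, one per line, and check the stated rules."""
    portions = []
    previous_end = 0  # positions are 1-based, so 0 precedes any valid start
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        fields = line.split()
        if len(fields) != 2:
            raise ValueError("line %d: expected two positions" % line_number)
        start, end = int(fields[0]), int(fields[1])
        if start < 1:
            raise ValueError("line %d: positions are 1-based" % line_number)
        if end < start:
            raise ValueError("line %d: end precedes start" % line_number)
        if start <= previous_end:
            raise ValueError("line %d: portion overlaps or is out of order"
                             % line_number)
        portions.append((start, end))
        previous_end = end
    return portions


def deleted_length(portions):
    """Total characters removed; both positions of each pair are inclusive."""
    return sum(end - start + 1 for start, end in portions)


example = "122 345\n790 930\n3507 5603\n"
portions = parse_deletion_portions(example)
print(portions, deleted_length(portions))
```

Since both positions are inclusive, the portion 122 345 removes 224 characters, not 223.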

DISK REQUIREMENTS

In addition to the disk space necessary for the main and append databases, xpatmaint requires disk space equal to the size of the append database text, index (built using the index specifications from the main_data_dictionary), and region files. Disk space for the pmt_dir update directives file must also be available. The size of this file is approximately the size of the append text, plus 300 bytes for each portion of text to be deleted. For example, 10 deletions will increase the size of the directives file by 3 KB.
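The disk figures above can be turned into a rough estimate. The sketch below assumes that the sizes of the append text, the rebuilt append index, and the append region files are already known in bytes; the function names are hypothetical, and the result is only as accurate as the approximations stated above.

```python
BYTES_PER_DELETION = 300  # figure quoted in the text above


def estimate_pmt_dir_bytes(append_text_bytes, deletion_count):
    """Approximate size of the pmt_dir update directives file."""
    return append_text_bytes + BYTES_PER_DELETION * deletion_count


def estimate_extra_disk_bytes(append_text_bytes, append_index_bytes,
                              append_region_bytes, deletion_count):
    """Disk space needed beyond the main and append databases themselves."""
    working_copy = append_text_bytes + append_index_bytes + append_region_bytes
    return working_copy + estimate_pmt_dir_bytes(append_text_bytes,
                                                 deletion_count)


# Ten deletions account for 3000 bytes of the directives file.
print(estimate_pmt_dir_bytes(0, 10))
```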

MEMORY REQUIREMENTS

xpatmaint has the following memory requirements:
1) approximately 5 times the size of the append text. Note, however, that only 3 times the size of the append text need be available in core at a time - the rest may be swapped out in virtual memory.
2) 500 bytes per deletion portion.
3) 32K of buffer space.
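
The three requirements above add up as in the following sketch. Two points are assumptions of the sketch rather than statements from this page: that 32K means 32 * 1024 bytes, and that the per-deletion memory and the buffer space are counted as resident in core. The function name is hypothetical.

```python
BUFFER_BYTES = 32 * 1024        # assumption: "32K" read as 32 * 1024 bytes
BYTES_PER_DELETION = 500        # figure quoted in the list above


def estimate_memory_bytes(append_text_bytes, deletion_count):
    """Return (total virtual memory, portion that should fit in core)."""
    fixed = BYTES_PER_DELETION * deletion_count + BUFFER_BYTES
    total = 5 * append_text_bytes + fixed
    # Only 3 times the append text needs to be in core at a time; treating
    # the fixed part as resident is an assumption of this sketch.
    in_core = 3 * append_text_bytes + fixed
    return total, in_core


total, in_core = estimate_memory_bytes(1024 * 1024, 10)
print(total, in_core)
```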

EXECUTION TIME

The following times are the execution characteristics of xpatmaint running on a Sun SPARCstation 2. Before the first stage begins, a setup stage is performed. The time required to perform the setup is related to the size of the append text. The setup stage for a 1 MB append text typically requires about 20 seconds.

During stage 1, the scan rate for the main text is logarithmic in the size of the append text but tends to level off when the size of the append text exceeds 2 MB. Using the -o optimization option, a typical scan rate for a 1 MB append text is 140 KB/sec. So, stage 1 requires time equal to the size of the main text divided by the effective scan rate.

Stage 2 merges the append index with the main index. The rate at which this stage progresses varies with the relative sizes of the main and append texts. When the append text is 10% the size of the main text, the processing rate is typically 300 KB/sec. Stage 2 requires time equal to the size of the main index (not the main text) divided by the effective processing rate.

Stage 3 merges the append region files with the main region files. The time required to do this depends on the total size of all the region files from both databases but a typical processing rate is 400 KB/sec.
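The per-stage figures for stages 1 to 3 all follow the same rule: time equals size divided by rate. A minimal sketch of that arithmetic follows. The default rates are the typical values quoted above, which were measured on a Sun SPARCstation 2 under the stated conditions and will differ on other hardware; the function name and parameters are hypothetical.

```python
# Typical rates in KB/sec quoted above; pass other values for other machines.
TYPICAL_RATES_KB_PER_SEC = {1: 140.0, 2: 300.0, 3: 400.0}


def estimate_stage_seconds(main_text_kb, main_index_kb, region_files_kb,
                           rates=TYPICAL_RATES_KB_PER_SEC):
    """Estimate seconds for stages 1 to 3 as size divided by rate."""
    return {
        1: main_text_kb / rates[1],      # scan of the main text
        2: main_index_kb / rates[2],     # merge into the main index
        3: region_files_kb / rates[3],   # all region files, both databases
    }


estimate = estimate_stage_seconds(main_text_kb=1400, main_index_kb=3000,
                                  region_files_kb=400)
print(estimate)
```

The setup time before stage 1 and the time for stages 4 and 5 are not included, since no rates are given for them.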

Stage 4 merges the append fast find index files with the main fast find index files. The time required to do this depends on the size of the fast find index files and the number of deleted text regions.

Stage 5 rebuilds the append fast region indices. The time required will be the same as the time required to run xpatfr independently on each fast region.

When all the above stages (except stage 4 and stage 5) are combined, the overall processing rate for the addition of an append text that is 1% the size of the main text, on a Sun SPARCstation 2, is approximately 120 KB/sec in the size of the main text. If stage 4 is included, the rate will be approximately 40 KB/sec. If stage 5 is also included, the rate will decrease further, depending on the number of fast regions that need to be rebuilt.

EXAMPLE

This example assumes that the main database is in the directory /usr/database/main. The database consists of the text (main_db), the index (main_db.idx), the Data Dictionary (main_db.dd), and two region files (main_db.rgn and custom.rgn). There is also a deletion portions file called main_delete.

The append text is 500 KB in size and is in the file /usr/database/new_data/new. The first step in merging the append text with the main database is to build an append database. This is done by going to the /usr/database/new_data directory and running xpatbld. xpatbld is given 1200 KB of memory to use - enough to index the 500 KB of text. In the following steps, note the use of full pathnames in the various file specifications:

% cd /usr/database/new_data
% xpatbld -m 1200k -t /usr/database/new_data/new \
    -o /usr/database/new_data/new

The region indices are built next, using a combination of multirgn and xpatrgn. It is assumed that a descriptive tagnames file with a `.tag' extension has been created for use by multirgn. It is further assumed that the pattern for the region called Custom, to be created using xpatrgn, is in the file custom.ptn and that the xpatrgn generated region pointers will be placed into a file called custom.rgn. Note that while this example uses multirgn and xpatrgn, sgmlrgn can also be used to create region files.

% multirgn -f new.dd tagnames.d
% xpatrgn new.dd \
   /usr/database/new_data/custom.rgn Custom < custom.ptn

The append database can now be merged with the main database. The databases are merged by going to the main database's directory and running xpatmaint.

% cd /usr/database/main
% xpatmaint -v -o -D main_db.dd -d main_delete \
                -a /usr/database/new_data/new.dd
          .
          .
      (various progress messages from xpatmaint)
          .
          .
      **** xpatmaint completed ****
%

At this point, the append text has been added to the main database's text file (main_db), an index has been built on the new text and has been merged with the main database's index. The regions in the append database's new.rgn file have been merged with the regions in the main database's main_db.rgn file. The regions in the append database's custom.rgn file have been merged with the regions in the main database's custom.rgn file, and the main database's Data Dictionary (main_db.dd) file has been updated to reflect the new state of the database. Note that the region files are merged by region name, and not on the basis of the region files themselves.

FINAL NOTE: For xpatmaint to run properly, it is important that the region indices created for the append database are proper xpat region indices. The validity of region files created by multirgn, sgmlrgn, or xpatrgn can be assumed. However, if a custom program is used to create the region files, then care should be taken to ensure that faulty data does not produce incorrect region indices.
