XPAT export file formats (man page)

XPAT_EXPORT

Section: File Formats (5)
Index

NAME

xpat_export - XPAT xpat export file formats

DESCRIPTION

xpat exports data to files in one of three formats: two styles of match sets and one style of region sets. All formats consist of a common 512 byte header, followed by the actual data.

In the following discussion, the term ``pointer'' refers to a 4 byte integer whose value is a 0-based byte offset into the text.

The file header is defined by the following `C' structure.

Note: longs are assumed to be four bytes and chars are assumed to be one byte.


  struct
  {
    long        file_type;
    long        swapped;
    long        reserved1;
    long        reserved2;
    long        reserved3;
    long        version_number;
    long        compressed;
    long        download_check;
    char        reserved[512 - 8*sizeof(long)];
  }

file_type indicates the type of data exported by xpat. The following are the valid values for file_type:

1
The exported data consist of pairs of pointers. The first pointer in the pair is the byte offset of the first character of the region. The second pointer is the byte offset of the last character of the region. Viewing the pairs as an array, the following relationships must hold true for each even value of i (0, 2, 4, ...):

p[i+0] <= p[i+1] < p[i+2] <= p[i+3]

2
(Reserved.)

3
The exported data consist of pointers to matches in the database. The pointers are ordered so that the text strings they point to are in alphabetic order.

4
The exported data consist of pointers to matches in the database. The pointers are arranged in increasing numerical order. This means that the matches are ordered by their position (byte offset) in the text.

The swapped field is used to determine if the file was written on a machine architecture which swaps the bytes with respect to the architecture on which the file is being read. When the file is written, swapped should contain the value 0x01020304.

The reserved1 and reserved2 fields are reserved for future Open Text use. reserved1 should have the value 0x00000001. The reserved2 and reserved3 fields should have the value 0x00000000.

The version_number field contains the version number of the program that created the file. The decimal format of this number is MMmmss, where MM is the major version number, mm is the minor version number, and ss is the sub-version number. For instance, for Release 5.0, the decimal form of this number is 050000.

The compressed field identifies whether the file is compressed or not. A value of 0 indicates an uncompressed file, while a non-zero value indicates a compressed file. The actual non-zero value specifies the compression method used (different compression methods may be used to compress different files). All files are currently uncompressed so this value should always be set to 0.

The download_check field is used to detect index files that were transferred between Unix and DOS machines using text (ASCII) transfers instead of binary transfers. Most programs that transfer data between Unix and DOS machines allow for both binary and text transfers. Binary transfers copy the data as-is without any transformations. In contrast, text transfers translate the line-ending characters to the convention used on the target machine (CR/LF for DOS, LF for Unix). If an index file is transferred using a text transfer it will become corrupted. The download_check field detects these corruptions by containing the value 0x0a0d0a00. If a binary transfer is used, this value will remain unchanged; if a text transfer is used, this value will be changed (and the changed value will be different for Unix-to-DOS transfers and DOS-to-Unix transfers). Please note that if a text transfer was used, a DOS-TO-UNIX or UNIX-TO-DOS conversion program may not accurately restore the transferred file to the original binary file. Instead, you must re-transfer the file using a binary transfer. Also note that, for backwards-compatibility, the value 0x00000000 is also an acceptable value (but it will not be changed by text transfers).

The remaining bytes in the 512 byte header are reserved for future DLXS use and should be set to 0x00.

SEE ALSO

System Integration Guide
xpat(1)

Index

NAME
DESCRIPTION
SEE ALSO