xpat_export - XPAT export file formats
xpat exports data to files in one of three formats: two styles of match set and one style of region set. All three formats consist of a common 512-byte header followed by the data itself.
In the following discussion, the term "pointer" refers to a 4-byte integer whose value is a 0-based byte offset into the text.
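The file header is defined by the following C structure.
Note: longs are assumed to be four bytes and chars are assumed to be one byte. On platforms where long is eight bytes (for example, most 64-bit Unix systems), a 4-byte type such as int32_t from <stdint.h> must be used for these fields instead.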
    struct {
        long file_type;
        long swapped;
        long reserved1;
        long reserved2;
        long reserved3;
        long version_number;
        long compressed;
        long download_check;
        char reserved[512 - 8*sizeof(long)];
    };
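Because the layout above depends on long being four bytes, a portable reader may prefer exact-width types. The following is a minimal sketch, assuming a C11 compiler; the names xpat_export_header and xpat_read_header are hypothetical and not part of any xpat API. It only reads the raw bytes; byte-order correction is a separate step driven by the swapped field.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical portable layout: every field is exactly 4 bytes,
 * even on LP64 systems where long is 8 bytes. */
typedef struct {
    int32_t file_type;
    int32_t swapped;        /* 0x01020304 as written */
    int32_t reserved1;      /* 0x00000001 */
    int32_t reserved2;      /* 0x00000000 */
    int32_t reserved3;      /* 0x00000000 */
    int32_t version_number; /* decimal MMmmss */
    int32_t compressed;     /* 0 = uncompressed */
    int32_t download_check; /* 0x0a0d0a00, or legacy 0 */
    char reserved[512 - 8 * sizeof(int32_t)];
} xpat_export_header;

/* Compile-time checks that the layout matches the 512-byte header. */
_Static_assert(sizeof(xpat_export_header) == 512, "header must be 512 bytes");
_Static_assert(offsetof(xpat_export_header, download_check) == 28,
               "download_check must be the eighth 4-byte field");

/* Read the raw header bytes; returns 0 on success, -1 on a short read.
 * Fields are left in file byte order. */
int xpat_read_header(FILE *fp, xpat_export_header *h)
{
    return fread(h, sizeof *h, 1, fp) == 1 ? 0 : -1;
}
```

The static assertions turn a layout mistake into a compile error rather than a silently misread file.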
file_type indicates the type of data exported by xpat. The following are the valid values for file_type:
p[i+0] <= p[i+1] < p[i+2] <= p[i+3]
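The constraint above reads most naturally as an ordering rule for region data stored as (start, end) pointer pairs: each region starts no later than it ends, and each region ends strictly before the next one starts. That interpretation, and stepping i over even indices, are assumptions. Under them, a hypothetical validator (xpat_regions_ordered is not an xpat function) might look like this:

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: p holds n pointers as consecutive (start, end) pairs.
 * Checks p[i] <= p[i+1] < p[i+2] <= p[i+3] for every even i.
 * Returns 1 if the array satisfies the ordering, 0 otherwise. */
int xpat_regions_ordered(const int32_t *p, size_t n)
{
    if (n % 2 != 0)
        return 0;                       /* an unpaired pointer */
    for (size_t i = 0; i < n; i += 2) {
        if (p[i] > p[i + 1])
            return 0;                   /* region starts after it ends */
        if (i + 2 < n && !(p[i + 1] < p[i + 2]))
            return 0;                   /* next region does not start
                                           strictly after this one ends */
    }
    return 1;
}
```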
The swapped field lets a reader determine whether the file was written on a machine whose byte order differs from that of the machine reading it. When the file is written, swapped should contain the value 0x01020304. A reader that instead sees the byte-reversed value 0x04030201 must reverse the bytes of each 4-byte field before using it.
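A reader can classify the swapped field by comparing it against the expected value and its byte-reversed form. This is a sketch only; xpat_byte_order and xpat_bswap32 are hypothetical helpers, and the caller is assumed to have loaded the field as an unsigned 32-bit value in native order.

```c
#include <stdint.h>

#define XPAT_SWAPPED_MAGIC 0x01020304u

/* Reverse the byte order of a 32-bit value. */
static uint32_t xpat_bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Returns 0 if the file was written in the reader's byte order,
 * 1 if every 4-byte field must be byte-reversed before use,
 * -1 if the field holds neither value (damaged or not an export file). */
int xpat_byte_order(uint32_t swapped_field)
{
    if (swapped_field == XPAT_SWAPPED_MAGIC)
        return 0;
    if (swapped_field == xpat_bswap32(XPAT_SWAPPED_MAGIC))
        return 1;
    return -1;
}
```

When the result is 1, the same xpat_bswap32 would be applied to every header field, including version_number and download_check, before interpreting them.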
The reserved1, reserved2, and reserved3 fields are reserved for future Open Text use. reserved1 should have the value 0x00000001; reserved2 and reserved3 should have the value 0x00000000.
The version_number field contains the version number of the program that created the file. The decimal form of this number is MMmmss, where MM is the major version number, mm is the minor version number, and ss is the sub-version number. For instance, for Release 5.0 the decimal form of this number is 050000, that is, the integer 50000.
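Since MMmmss is a decimal encoding, the parts can be recovered with integer division and remainder by powers of ten. A minimal sketch with hypothetical names (xpat_version, xpat_decode_version, xpat_encode_version). Note that in C source the literal 050000 would be octal (20480), so the Release 5.0 value must be written as 50000.

```c
/* Hypothetical decoded form of version_number (decimal MMmmss). */
typedef struct { int major, minor, sub; } xpat_version;

xpat_version xpat_decode_version(long version_number)
{
    xpat_version v;
    v.major = (int)(version_number / 10000);        /* MM */
    v.minor = (int)((version_number / 100) % 100);  /* mm */
    v.sub   = (int)(version_number % 100);          /* ss */
    return v;
}

long xpat_encode_version(int major, int minor, int sub)
{
    return (long)major * 10000 + (long)minor * 100 + sub;
}
```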
The compressed field indicates whether the file is compressed. A value of 0 indicates an uncompressed file; a non-zero value indicates a compressed file and identifies the compression method used (different files may be compressed with different methods). All files are currently uncompressed, so this value should always be set to 0.
The download_check field detects files that were transferred between Unix and DOS machines using a text (ASCII) transfer instead of a binary transfer. Most programs that transfer data between Unix and DOS machines support both modes. A binary transfer copies the data as-is, without any transformation; a text transfer translates line-ending characters to the convention of the target machine (CR/LF for DOS, LF for Unix). A file transferred in text mode therefore becomes corrupted. To detect this, download_check contains the value 0x0a0d0a00, whose bytes include both LF (0x0a) and CR (0x0d). A binary transfer leaves this value unchanged; a text transfer changes it, and the changed value differs between Unix-to-DOS and DOS-to-Unix transfers. If a text transfer was used, a DOS-to-Unix or Unix-to-DOS conversion program may not accurately restore the original binary file; re-transfer the file using a binary transfer instead. For backward compatibility, the value 0x00000000 is also acceptable, but it is not changed by text transfers and so cannot detect them.
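The check itself is a comparison against 0x0a0d0a00 (or the legacy 0) after byte-order correction. To show why a text transfer trips it, the sketch below also mimics a Unix-to-DOS conversion on the field's four bytes as they would appear on a big-endian machine. The helper names are hypothetical, and the LF-to-CR/LF rewrite is a simplified stand-in for what real transfer programs do.

```c
#include <stddef.h>
#include <stdint.h>

#define XPAT_DOWNLOAD_CHECK 0x0a0d0a00u

/* Returns 1 if download_check (already in the reader's byte order)
 * shows no sign of a text-mode transfer. The legacy value 0 is
 * accepted, but it cannot detect such a transfer. */
int xpat_download_ok(uint32_t download_check)
{
    return download_check == XPAT_DOWNLOAD_CHECK || download_check == 0u;
}

/* Illustration only: rewrite every LF (0x0a) as CR LF (0x0d 0x0a),
 * as a Unix-to-DOS text transfer would. out needs room for 2 * n bytes.
 * Returns the number of bytes written. */
size_t mimic_unix_to_dos(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == 0x0a)
            out[k++] = 0x0d;
        out[k++] = in[i];
    }
    return k;
}

/* Assemble a 32-bit value from four big-endian bytes. */
uint32_t load_be32(const unsigned char *b)
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

Starting from the bytes 0a 0d 0a 00, the mimicked transfer yields 0d 0a 0d 0d 0a 00, so the first four bytes now read as 0x0d0a0d0d and the check fails. The inserted bytes also shift every later byte in the file, which is why no simple conversion can be trusted to undo the damage.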
The remaining bytes in the 512-byte header are reserved for future DLXS use and should be set to 0x00.
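Putting the writer-side requirements together: swapped is 0x01020304 in native order, reserved1 is 1, reserved2, reserved3, and the trailing reserved bytes are 0, compressed is 0, and download_check is 0x0a0d0a00. The sketch below fills a header accordingly. xpat_hdr and xpat_init_header are hypothetical names, 4-byte int32_t fields are assumed per the note on long, and file_type and version_number are left to the caller.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical writer-side layout with exactly 4-byte fields. */
typedef struct {
    int32_t file_type, swapped, reserved1, reserved2, reserved3;
    int32_t version_number, compressed, download_check;
    char reserved[512 - 8 * sizeof(int32_t)];
} xpat_hdr;

/* Fill in a fresh header with the values a writer is required to use. */
void xpat_init_header(xpat_hdr *h, int32_t file_type, int32_t version_number)
{
    memset(h, 0, sizeof *h);          /* reserved2, reserved3, reserved[] = 0 */
    h->file_type = file_type;         /* caller supplies a valid type code */
    h->swapped = 0x01020304;          /* stored in the writer's byte order */
    h->reserved1 = 0x00000001;
    h->version_number = version_number;  /* decimal MMmmss, e.g. 50000 */
    h->compressed = 0;                /* files are currently uncompressed */
    h->download_check = 0x0a0d0a00;   /* detects text-mode transfers */
}
```

Zeroing the whole structure first guarantees the reserved bytes are 0x00 without listing them individually.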
See also: xpat(1), System Integration Guide.