The region patterns in the patterns_file consist of pairs of starting and ending strings, one pair per line. xpatrgn searches the text for occurrences of these string pairs and records their offsets in region_file. Once it finds a starting string, xpatrgn searches for the first subsequent occurrence of the corresponding ending string, which ends the region. Occurrences of the starting string inside an open region are ignored, so regions never nest. A region begins on the first character of the starting string and ends on the last character of the ending string. Either position may be shifted by adding or subtracting an integer value, as shown in the examples below. If a pair omits the ending string, xpatrgn begins a region at each occurrence of the starting string and ends it on the character before the first character of the next region. If the end of the text is reached in the middle of a region, xpatrgn records the last character of the text as the end position of that region.
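The rules above can be summarized in a short sketch. It is not xpatrgn's code: it assumes patterns are plain substrings, offsets are 0-based, and a region is an inclusive (first, last) pair; the special sequences `\^' and `\$' are not handled, and the name build_regions is hypothetical.

```python
from typing import List, Optional, Tuple

# Inclusive (first, last) character offsets of one region.
Region = Tuple[int, int]


def build_regions(text: str, start_pat: str, end_pat: Optional[str] = None,
                  start_adj: int = 0, end_adj: int = 0) -> List[Region]:
    """Hypothetical sketch of the xpatrgn rule: no nesting, first end wins."""
    if not start_pat:
        raise ValueError("starting string must be non-empty")
    last = len(text) - 1
    regions: List[Region] = []

    if not end_pat:
        # No ending string: each region stops on the character before the
        # next region begins; the final region runs to the end of the text.
        # (end_adj is ignored in this case.)
        starts: List[int] = []
        pos = text.find(start_pat)
        while pos != -1:
            starts.append(pos + start_adj)
            pos = text.find(start_pat, pos + len(start_pat))
        for first, nxt in zip(starts, starts[1:] + [last + 1]):
            regions.append((first, nxt - 1))
        return regions

    pos = text.find(start_pat)
    while pos != -1:
        # The first ending string after the starting string closes the region.
        end = text.find(end_pat, pos + len(start_pat))
        if end == -1:
            # Text ended inside a region: close it on the last character.
            regions.append((pos + start_adj, last))
            break
        end_last = end + len(end_pat) - 1  # last character of the ending string
        regions.append((pos + start_adj, end_last + end_adj))
        # Starting strings inside the region are skipped, so regions never nest.
        pos = text.find(start_pat, end_last + 1)
    return regions
```

For example, build_regions("( a b ( c d ) ( d e f", "(", ")") returns [(0, 12), (14, 20)], that is, `( a b ( c d )' and `( d e f'.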
Note: this algorithm differs from the one xpat uses to build regions during a search session. Consider the text
( a b ( c d ) ( d e f
and the region pattern,
"(" ")"
(i.e., build regions between the `(' and `)' characters). xpatrgn builds the regions `( a b ( c d )' and `( d e f'. xpat, on the other hand, finds all the matches that could start a region and all the matches that could end a region, then keeps only the nearest pairs. For the text above, xpat records the single region `( c d )'. It records no region for either `( a b' or `( d e f'.
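xpat's nearest-pair behavior can be illustrated with a rough sketch for contrast. This manual does not define exactly how xpat chooses among candidate pairs, so the sketch assumes that each ending match is paired with the closest starting match that precedes it and lies after the previous region; find_all and nearest_pairs are hypothetical names, not part of xpat.

```python
from typing import List, Tuple


def find_all(text: str, pat: str) -> List[int]:
    """Offsets of every (possibly overlapping) occurrence of pat in text."""
    hits: List[int] = []
    pos = text.find(pat)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pat, pos + 1)
    return hits


def nearest_pairs(text: str, start_pat: str, end_pat: str) -> List[Tuple[int, int]]:
    """Assumed xpat-style pairing: each end takes the nearest earlier start."""
    starts = find_all(text, start_pat)
    ends = find_all(text, end_pat)
    regions: List[Tuple[int, int]] = []
    floor = -1  # assumption: a start cannot fall inside an earlier region
    for e in ends:
        # Starts that finish before this end and follow the previous region.
        candidates = [s for s in starts if floor < s and s + len(start_pat) <= e]
        if candidates:
            s = max(candidates)  # the nearest preceding start
            end_last = e + len(end_pat) - 1
            regions.append((s, end_last))
            floor = end_last
    return regions
```

On the text above, nearest_pairs("( a b ( c d ) ( d e f", "(", ")") returns [(6, 12)], the single region `( c d )', whereas xpatrgn's first-start rule yields two regions.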
The special character sequences `\^' and `\$' will match the first and last characters in the text, respectively.
The input pattern,
"\n"
creates regions between newline characters. Note that each region starts on a newline character, and no region is created for the first line (the text before the first newline).
The input pattern,
"\^" "\n" + 1
creates a region for each line in the file, starting on the first character in each line. This pattern will also include the first line in the file.
The input pattern,
"<Headline>" +10 "</Headline>" -11
creates regions between `<Headline>' and `</Headline>' tags, except that each region actually begins on the first character after the 10-character `<Headline>' tag and ends on the last character before the 11-character `</Headline>' tag. This differs from multirgn, which includes the tags in its regions.
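The pattern-file line format is shown here only by example. The following sketch parses such a line under these assumptions: a quoted starting string, an optional signed integer, an optional quoted ending string, and another optional signed integer, where each integer adjusts the string just before it and whitespace may separate the sign from the digits (as in `+ 1'). Backslash sequences such as `\n' and `\^' are returned as written, not decoded. RegionPattern and parse_pattern_line are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class RegionPattern(NamedTuple):
    start: str           # starting string, escapes left undecoded
    start_adj: int       # offset added to the region's first character
    end: Optional[str]   # ending string, or None if the line omits it
    end_adj: int         # offset added to the region's last character


# A double-quoted string that may contain backslash escapes.
_QUOTED = r'"((?:[^"\\]|\\.)*)"'
# An optional signed integer such as +10, -11 or "+ 1".
_ADJ = r'(?:([+-])\s*(\d+))?'
_LINE = re.compile(r'^\s*' + _QUOTED + r'\s*' + _ADJ
                   + r'\s*(?:' + _QUOTED + r'\s*' + _ADJ + r')?\s*$')


def _adj(sign: Optional[str], digits: Optional[str]) -> int:
    """Convert an optional sign/digits pair into an integer offset."""
    if digits is None:
        return 0
    return int(digits) if sign == '+' else -int(digits)


def parse_pattern_line(line: str) -> RegionPattern:
    """Parse one assumed pattern-file line into its strings and offsets."""
    m = _LINE.match(line)
    if m is None:
        raise ValueError(f"unrecognized pattern line: {line!r}")
    s, s_sign, s_num, e, e_sign, e_num = m.groups()
    return RegionPattern(s, _adj(s_sign, s_num), e, _adj(e_sign, e_num))
```

For example, parse_pattern_line('"<Headline>" +10 "</Headline>" -11') yields RegionPattern(start='<Headline>', start_adj=10, end='</Headline>', end_adj=-11).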
The command,
xpatrgn -p my_patrns.ptn -o Patrn1.rgn -D text.dd
builds a region for the database specified by the Data Dictionary `text.dd'. It uses the patterns in `my_patrns.ptn', puts the index in the file `Patrn1.rgn', and names the region `Patrn1'.
The command,
xpatrgn -v -d "This is my pattern" -r "My Pattern" -p my_patrns.ptn -o MyPat.rgn -D data.dd
builds a region for the database specified by the Data Dictionary `data.dd'. xpatrgn prints progress messages as it builds the index. It records the description `This is my pattern' in the Data Dictionary entry for the region it builds and names the region `My Pattern'. It reads the patterns from the file `my_patrns.ptn'. Finally, it places the index in the file `MyPat.rgn'.
The command,
ptrn_prog | xpatrgn -o Patrn1.rgn -D text.dd
builds a region called `Patrn1' for the database specified by `text.dd', reading the patterns from the output of ptrn_prog. It puts the index in the file `Patrn1.rgn'.