Information Retrieval: Data Structures and Algorithms
W. B. Frakes and R. S. Baeza-Yates
1992.

Sistrings, as you'll see them in the literature, have some interesting properties:
They start at some offset in the entire string of the database.
They stretch off to at least the end of the database (implying that they always overlap with each other...).
One can change (with the IndexPts section of the pat50 .dd file) where they start.
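
As a rough illustration of the sistring properties listed above, here is a minimal Python sketch. It is not how pat50 is implemented: a sorted list of start offsets stands in for the Pat tree, and the `index_points` function with its word-start regular expression is a hypothetical substitute for the IndexPts setting, whose actual syntax is not shown here.

```python
import bisect
import re


def index_points(text, pattern=r"\b\w"):
    """Offsets where sistrings begin. Assumption: one per word start."""
    return [m.start() for m in re.finditer(pattern, text)]


def sort_sistrings(text, points):
    """Order start offsets by the sistring (text[p:]) that begins at each one."""
    return sorted(points, key=lambda p: text[p:])


def prefix_search(text, sorted_points, query):
    """Return, in text order, the offsets whose sistring starts with `query`."""
    # Truncating sorted sistrings to len(query) keeps them in sorted order,
    # so a binary search over the truncated keys finds the matching run.
    keys = [text[p:p + len(query)] for p in sorted_points]
    lo = bisect.bisect_left(keys, query)
    hi = bisect.bisect_right(keys, query)
    return sorted(sorted_points[lo:hi])


text = "to be or not to be"
points = index_points(text)            # [0, 3, 6, 9, 13, 16]
sorted_points = sort_sistrings(text, points)

print(prefix_search(text, sorted_points, "to be"))   # [0, 13]
print(prefix_search(text, sorted_points, "be"))      # [3, 16]
```

Every sistring here runs from its start offset to the end of the text, so the sistring at offset 0 contains all the others. Changing `pattern` changes which offsets are searchable, which is the effect the third property describes.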
One of the most important features of pat50/XPat is its ability
to index not only full text but also SGML regions. This gives
the ability to create complex searches that reach into regions of text
based on the markup elements.
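
The idea of searching within regions can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions (well-formed, non-nested elements; regions found by regular expression at query time rather than stored in an index), and the function names `find_regions`, `find_term`, and `term_within` are hypothetical, not part of pat50/XPat or its query language.

```python
import re


def find_regions(text, tag):
    """(start, end) offsets of the content of each <tag>...</tag> element.

    Assumption: elements are well formed and do not nest inside themselves.
    """
    t = re.escape(tag)
    pattern = re.compile(rf"<{t}\b[^>]*>(.*?)</{t}>", re.DOTALL)
    return [m.span(1) for m in pattern.finditer(text)]


def find_term(text, term):
    """Start offsets of every (case-sensitive) occurrence of `term`."""
    return [m.start() for m in re.finditer(re.escape(term), text)]


def term_within(text, term, tag):
    """Offsets of `term` that fall entirely inside a `tag` region."""
    regions = find_regions(text, tag)
    return [
        off
        for off in find_term(text, term)
        if any(start <= off and off + len(term) <= end for start, end in regions)
    ]


doc = "<TITLE>Pat trees</TITLE><P>Trees and sistrings.</P><P>Pat again.</P>"

print(term_within(doc, "Pat", "TITLE"))   # [7]
print(term_within(doc, "Pat", "P"))       # [54]
```

Both a term match and a region reduce to offsets into the same text, so "term inside region" becomes a comparison of offsets.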
In 1998 and 1999 we began to use the next generation of the OpenText search engine, ot60, which builds indexes on tokens rather than sistrings. However, we much preferred the Pat tree structure and some of its features.
Happily, DLPS was able, in 1999, to acquire the source code to pat50 (see pat50/DLPS recent developments), which OpenText was no longer supporting. We can now offer a license to use our version of the original engine, which we call XPat. We have already begun to make changes to it and will continue to do so. These enhancements include: