Indexing will be covered in detail during the Text Class Data Preparation section.
A full list of XPAT commands can be found at: http://www.hti.umich.edu/sgml/pat/pat50manual.html
% xpatu $DLXSROOT/idx/s/sampletc_utf8/sampletc_utf8.dd
>> "prince"
  1: 134 matches
>> "prince "
  2: 123 matches
>> sample
  3: 10 matches
>> pr
  539939, ..was said that Prince Alexander of Battenberg had changed into a ..
  957348, ..e only child, Prince Alexander, who came in before we went to ta..
  1390470, ..TEM>Bismarck,Prince, and the Austro-German alliance ~ <REF>xxiv..
  552103, ..alliance that Prince Bismarck, in 1879, entered into the very cl..
  208247, .. sceptre d'un prince de religion orthodoxe.</P> <P> <..
  1016444, ..n the streets Prince Michael and Teresia, 20 to 30 dinars toward..
  943446, ..ian statue of Prince Michael, whose name and portrait are found ..
  483031, ..la volonté du prince Nicolas, ses résolutions personnelles au su..
  1411801, ..udolph, Crown Prince, Popularity of ~ <REF>69</REF> </ITEM..
  1141121, ..raged it. The Prince suspected nothing of what was taking place ..
>> [290947]
  4: one match
The first query finds all "semi-infinite strings" that begin with "prince",
the second finds those that are "prince" exactly (with the space, or anything
that has been mapped to a space), and the final query, [290947], finds the
string beginning at byte 290947.
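To make the idea of a semi-infinite string concrete, here is a toy model in Python. It is only a sketch and not how XPAT works internally: XPAT consults a prebuilt index and applies character mappings (the session above matches "Prince" as well as "prince"), while this sketch scans the text directly and is case-sensitive. The function names and the sample sentence are invented for illustration.

```python
def sistring_matches(text: str, prefix: str) -> list:
    """Return every offset whose semi-infinite string begins with `prefix`.

    A semi-infinite string is the text from a given offset to the end of
    the document, so a "match" is simply an offset where `prefix` occurs.
    """
    hits = []
    start = text.find(prefix)
    while start != -1:
        hits.append(start)
        start = text.find(prefix, start + 1)
    return hits


def sistring_at(text: str, offset: int, width: int) -> str:
    """Return the first `width` characters of the string starting at `offset`,
    loosely analogous to a query such as [290947]."""
    return text[offset:offset + width]


# Hypothetical sample text, not taken from the sample collection.
text = "the prince met princes and a prince."

prefix_hits = sistring_matches(text, "prince")   # prefix search
exact_hits = sistring_matches(text, "prince ")   # trailing space: whole word
print(len(prefix_hits), "prefix matches;", len(exact_hits), "exact match(es)")
print(sistring_at(text, prefix_hits[-1], 6))
```

In this toy model the final "prince." is not an exact match because the full stop is not mapped to a space, whereas XPAT's character mappings may treat punctuation as a space.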
To find how many of a particular region type exist, enter region plus the name of the region (double quotes are needed if the name contains non-alphanumeric characters).
>> region "DIV1"
  1: 38 matches
>> region "A-NODE"
  2: 46 matches
Also see the {ddinfo regionnames} command.
Also see the history command.
>> long
  1: 244 matches
>> help
  2: 54 matches
>> 1 + 2
  3: 298 matches
>> "alternate"
  4: 5 matches
>> pr 4
  1175485, ..most from the alternate advance and retreat of the Russian and T..
  1165090, ..in. Vineyards alternated with fields of barley, oats, and maize;..
  967310, ..men and women alternately; <EPB/> <PB REF="00000208.tif" S..
  1313659, ..a and Austria alternately. But, when able to repel aggression, s..
  1303571, .. each country alternately. It should be composed of three secti..
>> mysearch = "pair"
  5: mysearch = 3 matches
>> pr *mysearch
  1170568, ..and a half; a pair of buffaloes, 600 francs (£24).</P> <P>B..
  848085, ..s dress was a pair of large Turkish trousers of white wool, a sh..
  1085132, ..nd thick; two pairs of oxen drew it by means of a pole which was..
Also see the subset command.
Also see the {sortorder} setting.
Also see other operators and relations.
Using some basic XPAT operators, we can build very specific searches that take advantage of the SGML markup. Here is an actual example from the TextClass implementation: the following query is the basis for the fabricated region called mainauthor in most of our text collections. Note that this query depends on knowing the structure of the document's markup (in the case of TextClass documents, the regions here are essentially the same as those in the TEIHEADER of the TEI.2 DTD).
>> ((region AUTHOR within (region TITLESTMT within region FILEDESC)) not within (region SOURCEDESC))
  6: 2 matches
>> pr.region.6
  235, ..<AUTHOR> Yriarte, Charles, 1832-1898. </AUTHOR> ..
  513768, ..<AUTHOR> Laveleye, Emile de, 1822-1892. </AUTHOR>..
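The logic of the mainauthor query can be sketched in Python by treating each region as a (start, end) pair of byte offsets. This is only an illustrative model under stated assumptions: the containment rule (inclusive on both ends), the helper names, and all offsets below are hypothetical and are not taken from XPAT or from the sample collection.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    start: int  # offset of the first byte of the region
    end: int    # offset of the last byte of the region


def _inside(inner: Region, outer: Region) -> bool:
    """True if `inner` lies entirely inside `outer` (assumed inclusive)."""
    return outer.start <= inner.start and inner.end <= outer.end


def within(candidates: List[Region], containers: List[Region]) -> List[Region]:
    """Keep candidates contained in at least one container region."""
    return [c for c in candidates if any(_inside(c, o) for o in containers)]


def not_within(candidates: List[Region], containers: List[Region]) -> List[Region]:
    """Keep candidates contained in none of the container regions."""
    return [c for c in candidates if not any(_inside(c, o) for o in containers)]


# Hypothetical header layout: one FILEDESC holding a TITLESTMT and a
# SOURCEDESC, where the SOURCEDESC has its own nested TITLESTMT and AUTHOR.
filedesc = [Region(100, 400)]
titlestmt = [Region(120, 200), Region(310, 380)]
sourcedesc = [Region(300, 390)]
author = [Region(130, 160), Region(320, 350), Region(500, 520)]

# ((region AUTHOR within (region TITLESTMT within region FILEDESC))
#   not within (region SOURCEDESC))
main_author = not_within(within(author, within(titlestmt, filedesc)), sourcedesc)
print(main_author)
```

The AUTHOR nested in the SOURCEDESC and the AUTHOR outside any TITLESTMT are both filtered out, which is why the query needs both the within and the not within clauses.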
Here we construct a query to return a PSet consisting of hits on a user-entered search term. We want to display a line containing the immediate context of the hit and also a title from an enclosing division:
The query for the user's search is simply:
>> firstsearch= ("Branivoj " + "Branivoj<")
  7: firstsearch = one match
To get a division title for the hit we need to build up regions based on the hit:
>> slicesearch= subset.1.25 *firstsearch
  8: slicesearch = one match
>> mainslicesearch = (region "DLPSTEXTCLASS" incl *slicesearch)
  9: mainslicesearch = one match
>> mainheader = (region "HEADER" within *mainslicesearch)
  10: mainheader = one match
Finally, to view the content of the region we have constructed, we enter:
>> pr.region."HEADER" (region *mainheader)
See also viewing sets.
The default mode, in an interactive XPAT session, is "quietoff". This gives the results messages you have seen so far: numbered sets, byte offsets followed by snippets of SGML with ".." on either end, etc. Another mode, and the most useful for interacting with XPAT programmatically, is "quieton raw". Nothing seems to happen when one enters:
>> {quieton raw}
However, entering queries now produces results that are tagged in a way that is easily parsable from within a program. First enter an earlier point search:
firstsearch = ("Branivoj " + "Branivoj<")
<SSize>1</SSize>
pr
<PSet><Start>313615</Start><Raw><Size>64</Size>res du nom de Branivoj s'emparent du territoire qu'ils gouvernen</Raw></PSet>
Now enter an earlier region search:
((region AUTHOR within (region TITLESTMT within region FILEDESC)) not within (region SOURCEDESC))
<SSize>4</SSize>
pr.region.AUTHOR
<RSet><Start>143</Start><End>178</End><Raw><Size>36</Size> <AUTHOR>Holbach, Maude M. </AUTHOR></Raw><Start>298344</Start> <End>298391</End><Raw><Size>48</Size><AUTHOR>Yriarte, Charles, 1832-1898. </AUTHOR></Raw> <Start>792438</Start><End>792487</End><Raw><Size>50</Size> <AUTHOR>Laveleye, Emile de, 1822-1892. </AUTHOR></Raw><Start>1689410</Start> <End>1689486</End><Raw><Size>77</Size> <AUTHOR>Sebright, Georgina Mary Muir (Mackenzie), Lady, d. 1874- </AUTHOR></Raw></RSet>
Some of these tags are self-explanatory (e.g., SSize = set size). But some may need a bit of explanation.
XPAT's ability to return results with tags allows a program to parse the results into pieces. In the DLXS Middleware this is done by a group of DLXS Perl modules. These modules have methods to let the CGI program interact with XPAT (an XPAT process is forked off by the CGI program and queries can be made of it at any time). The main object the code uses is the xpat object. It has methods for making queries in different ways and for interacting with the forked off XPAT process.
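As a rough illustration of this kind of parsing, here is a minimal Python sketch (the DLXS middleware itself does this in Perl, and this is not its code). It reads the tags as they appear in the examples above: SSize as the set size, Start and End as byte offsets, Size as the length of the Raw content, PSet as a set of points, and RSet as a set of regions. That reading is inferred from the sample output and is not an authoritative definition. The names Hit, parse_set_size, and parse_hits are invented for this example.

```python
import re
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    start: int            # <Start>: byte offset where the hit begins
    end: Optional[int]    # <End>: present for region hits, absent for points
    size: int             # <Size>: reported length of the raw content
    raw: str              # content of <Raw> after the <Size> element


_SSIZE = re.compile(r"<SSize>(\d+)</SSize>")
_HIT = re.compile(
    r"<Start>(\d+)</Start>\s*"
    r"(?:<End>(\d+)</End>\s*)?"
    r"<Raw><Size>(\d+)</Size>(.*?)</Raw>",
    re.DOTALL,
)


def parse_set_size(output: str) -> Optional[int]:
    """Return the set size from a query response, or None if absent."""
    match = _SSIZE.search(output)
    return int(match.group(1)) if match else None


def parse_hits(output: str) -> List[Hit]:
    """Extract every hit from a <PSet> or <RSet> response.

    Assumes the raw content never contains a literal </Raw>; a more careful
    parser would use <Size> to slice the byte stream instead of a regex.
    """
    return [
        Hit(int(start), int(end) if end else None, int(size), raw)
        for start, end, size, raw in _HIT.findall(output)
    ]


# Responses copied from the examples above (the RSet is shortened to one hit).
pset = ("<PSet><Start>313615</Start><Raw><Size>64</Size>res du nom de Branivoj "
        "s'emparent du territoire qu'ils gouvernen</Raw></PSet>")
rset = ("<RSet><Start>143</Start><End>178</End><Raw><Size>36</Size> "
        "<AUTHOR>Holbach, Maude M. </AUTHOR></Raw></RSet>")

print(parse_set_size("<SSize>4</SSize>"))
for hit in parse_hits(pset) + parse_hits(rset):
    print(hit.start, hit.end, hit.size, repr(hit.raw))
```

Checking the set size before asking for results mirrors what the middleware code shown next does with its own SSize match.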
Here is some code (from TextClass.pm) that illustrates how the middleware uses a method of the Perl-based XPAT object (created in an earlier part of the code).
...
my $query = qq{(region mainheader incl ( $idnorgn incl "$idno" ) );};
my ( $error, $result) = $xpat->GetSimpleResultsFromQuery( $query );
if ( $error )
{
    &DlpsUtils::errorBail( qq{Query error in FindXPATContainingIdno: $result} );
}
&DlpsUtils::StripAllRSetCruft( \$result );
$result =~ m,<SSize>(\d+)</SSize>,;
my $hit = $1;
if ( $hit > 0 )
{
    $returnXpat = $xpat;
    last;
}
...
While some code, such as this, makes a query via a simple method, most queries in the middleware are actually made by other means, through other objects and their methods. Once data has been prepared according to the DLXS Class DTDs, the middleware can be thought of, in terms of searching, as an engine that simply "runs" the data.
NOTE: In Release 11a and before, if DLXS users needed to change any code, it was usually because a different display of the data was needed ("filtering"). Now, nearly all "filtering" of data for display is done via XSLT stylesheets. Occasionally, collection-specific searches need to be made (based on, for example, idiosyncratic markup). The query building for those searches may still need to be subclassed. However, most text-type collections, if using the admittedly loose Text Class DTD, will run through the middleware with little if any modification, since most standard searches are done via the features that help abstract out many idiosyncrasies of markup: fabricated regions, mapped search region names, etc.
A fabricated region is a "virtual" region that has been indexed. You can use any valid XPAT query to create a result set. Then, with the {export} command, you can have XPAT create a binary index of the points in the result.
Why would you want to do this? If you, or your program, will often make a query that is somewhat complex, you can have XPAT consult a previously created index rather than run the complex query each time against the usual idx and SGML rgn indexes.
For examples and more discussion of fabricated regions, see: Fabricated Regions.
Once the fabricated regions are created and indexed, they can be searched for and printed just like any other region.
>> region maindate
  1: 4 matches
>> pr.region.maindate region maindate
  990, ..<DATE>1910.</DATE>..
  299182, ..<DATE>1876.</DATE>..
  793555, ..<DATE>1887.</DATE>..
  1690542, ..<DATE>1877.</DATE>..
For more information about all XPAT commands, see the regular DLXS documentation about XPAT.
The pr command is the heart of viewing sets. In an interactive XPAT session, it lets you view the results of your searches. Within the middleware, getting the data back from XPAT is just one step. Before Release 12, it was followed by "filtering" operations (Perl substitutions using regular expressions) that removed or changed other tags in the content and changed its appearance, e.g. by highlighting hits, eventually resulting in HTML. As of Release 12, although there is a small amount of manipulation of the XML returned from XPAT queries, essentially all "filtering" (conversion to HTML) is done via XSLT stylesheets.
The format of the results that XPAT returns with pr or save is determined by the current {quieton} setting. There is a big difference between the normal interactive mode, with a user sitting at the pat terminal, and the machine-readable modes.
Note: The save command is, in a sense, the same as the pr command: pr displays to STDOUT, save outputs (appends) to a file whose name is given by {savefile}. The format of the output is the same.
<Sync>string</Sync>