SGMLRGN - XPAT multi-purpose SGML application (man page)

SGMLRGN

Section: User Commands (1)

NAME

sgmlrgn - XPAT multi-purpose SGML application

An SGML Application Conforming to International Standard ISO 8879 ---
Standard Generalized Markup Language

SYNOPSIS

sgmlrgn -m mode [ -C ] [ -J ] [ -c ] [ -d ] [ -D dd_name ] [ -e ] [ -g ] [ -p ] [ -u ] [ -v ] [ -i name ] [ -o outfile_name ] [ -M meta_structure_file ] [ -G group_name ] filename ...

DESCRIPTION

sgmlrgn parses and validates the SGML document entity in filename(s) and prints the results on the standard output according to the mode specification. Note that the document entity may be spread among several files. For example, the SGML document type definition (DTD) and document instance set could each be in a separate file. If the filename is a dash (-), sgmlrgn will process SGML text from standard input. This is especially useful in filter mode.

OPTIONS

-m mode: Specify a processing mode. All results are printed on standard output and all error messages are printed on standard error. The valid modes are listed below; only the first two letters of the mode name are significant. NOTE: To build regions on only the SGML documents in an MFS database, see MFS DATABASE OPTIONS.

region -D data_dictionary [ -Cv ] [ -o outfile_name ] [ -M meta_structure_file ] [ -G group_name ]

: Generate xpat region indices (see regions(5)) and update the region information in the data_dictionary. The -D argument is required; it specifies the Data Dictionary to be updated. If the input file is named Text, the output region file will be named Text.rgn. Region files produced by sgmlrgn have the same format as those produced by xpatrgn(1) and multirgn(1). The -o option can be used to specify the output region file name; otherwise, the region file name will be the same as the data dictionary file name with the `.rgn' extension. For the -M and -G options, see the section MFS DATABASE OPTIONS.

filter [ -Cv ] [ -R 0 | 1 ]

: Parse the DTD and normalize the SGML input. Specifying -R 1 (the default) expands entity references; specifying -R 0 leaves them unexpanded. This mode is frequently used as a filter to normalize any minimized SGML references in the text. When sgmlrgn is used in this way, it takes two filename arguments. The first is the name of the document type declaration (`.inp') file and the second is a dash (to specify standard input).

check [ -Ccdegipuv ] [ -D data_dictionary ]

: Validate the DTD and document instance. Only errors are reported.

dtd [ -Ccdegipuv ] [ -D data_dictionary ]

: Parse the DTD and expand all parameter entity references. The elements are nicely printed and sorted alphabetically.

root [ -v ] [ -D data_dictionary ]

: Print the root element of the document instance in the form `{DefaultRegion root}'. DefaultRegion is an xpat command that sets the default region. The output from this mode can be used in the xpat initialization file (see xpat(1) and data_dict(5)).

elist [ -Ccipv ] [ -D data_dictionary ]

: Alphabetically list all elements defined in the DTD.

normalize [ -Cciv ] [ -D data_dictionary ]

: Transform the document instance into a fully expanded document instance, expanding all minimized elements.

declare [ -v ] [ -D data_dictionary ]

: Print the SGML declaration.

The following modes are "undocumented":

print [ -Cciv ] [ -D data_dictionary ] [-S print_length]

: Print the start tag, end tag and attribute positions in the document instance. A sample of each region, at most print_length characters long (default 30), is also printed.

test [ -D generate_file ] [ -S seed# ]

: Generate a test document according to the DTD. The generation parameters can be specified in the generate_file and the random number generator seed is seed#. The generate test parameter mode, '-m gentest', can be used to produce the initial generate_file. If the generate_file is not specified, the default parameters will be used.

gentest

: Generate a set of default test-generation parameters. These parameters can be changed and used in the test mode, '-m test'.

-C: Make sgmlrgn case sensitive.
-J: Enable sgmlrgn to merge the new regions into the existing regions.
-c: Describe capacity usage at the end of the parse.
-d: Warn about duplicate entity declarations.
-e: Describe open entities in error messages. Error messages always include the position of the most recently opened external entity.
-g: Show the Generic Identifiers of open elements in error messages.
-i name: Pretend that <!ENTITY % name "INCLUDE"> occurs at the start of the document type definition subset in the SGML document entity. Since repeated definitions of an entity are ignored, this definition will take precedence over any other definitions of this entity in the document type definition. Multiple -i options are allowed. If the SGML declaration replaces the reserved name INCLUDE then the new reserved name will be the replacement text of the entity. Typically the document type definition will contain <!ENTITY % name "IGNORE"> and will use %name; in the status keyword specification of a marked section declaration. In this case the effect of the option will be to cause the marked section not to be ignored.

-o outfile_name: Specify the output region file name. sgmlrgn automatically appends the region file extension ('.rgn') when generating the file name.
-p: Parse only the prologue. sgmlrgn will exit after parsing the document type definition.
-u: Warn about undefined elements - elements used in the DTD but not defined.
-v: Turn ON verbose mode. Verbose mode outputs messages concerning the progress of sgmlrgn.
-M meta_structure_file: The meta_structure_file is generated by the mfsmeta(1) program. It records the start and end parsing positions of each file. Only files whose <DisplayFmt> is 'sgml' are processed by sgmlrgn. The -G group_name option further narrows processing to the group of SGML documents that can be parsed by the given DTD.
-G group_name: The group_name identifies which group of SGML documents should be parsed. Since only one DTD can be used for each sgmlrgn run, the group_name basically defines the SGML documents that can be parsed by the given DTD.
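
As noted under -m, only the first two letters of a mode name are significant. The following Python sketch illustrates that rule. It is not sgmlrgn's own code: the function name resolve_mode is hypothetical, and rejecting unknown prefixes is an assumption, since this page does not say how sgmlrgn reports them.

```python
# Mode names as listed under the -m option.
MODES = ["region", "filter", "check", "dtd", "root", "elist",
         "normalize", "declare", "print", "test", "gentest"]

# Index the modes by their first two letters; all eleven prefixes are distinct.
BY_PREFIX = {name[:2]: name for name in MODES}


def resolve_mode(arg):
    """Return the full mode name selected by arg, looking only at its first two letters."""
    try:
        return BY_PREFIX[arg[:2]]
    except KeyError:
        raise ValueError("unknown mode: " + arg) from None


print(resolve_mode("re"))         # region
print(resolve_mode("normalise"))  # normalize (only "no" is examined)
```

Under this reading, `-m re`, `-m reg` and `-m region` all select the same mode.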

MFS DATABASE OPTIONS

Background

In MFS databases, the MFS system creates a ``virtual text'' from the text of all the files in the database. The portion of this virtual text that corresponds to each file consists of three pieces: the Meta-Header section, the Data section, and the Meta-Trailer section. This breakdown is illustrated in the following diagram:


<OTDoc><OTMeta>..</OTMeta><OTData>.............</OTData></OTDoc>
|--------- Meta-Header ----------|| SGML Data ||- Meta-Trailer -|
^                                 ^            ^                ^
start                             start        start            end
header                            data         trailer          pos

The data in the Meta-Header and Meta-Trailer sections is highly structured and is uniform across all the files in the MFS database. In contrast, the data in the Data sections may be untagged text, tagged text without a DTD, or tagged text with a DTD (SGML data).

The process of building region indices on such databases involves several steps. The first step involves running mfsmeta over the database to build a meta_structure_file. This file contains information about the positions of the Meta-Header, Data, and Meta-Trailer sections for each file in the database.

The second step involves building regions on the fields in the Meta-Header and Meta-Trailer sections that are common to all files. Refer to the multirgn(1) man page for further details.

The third step involves building regions for the Data sections. For the Data sections that contain tagged text without a DTD, this task is accomplished using multirgn. For SGML Data sections (that do have a DTD), this task is accomplished using sgmlrgn.

There are three types of SGML MFS databases. The first consists of a group of SGML files that all conform to the same DTD - each file is a complete document.

The second type consists of a group of SGML files that conform to several different DTD's - each file is still a complete document.

The third type consists of a group of SGML files that conform to one or more DTD's - the files may contain either complete documents or pieces of documents (i.e., the text for specific elements in the DTD). Each of the next three sections discusses how to build regions for one of the above database types.

Building Regions for Type 1 SGML Databases

The first step involves setting up the FilterChain section of the Data Dictionary which specifies the SGML files to be included in the database. In particular, the DisplayFmt field should be set to the value, `sgml'.

For example, the following FilterChain section might be appropriate for the first kind of SGML database:


<FilterChain>
<SearchView>meta</SearchView>
<DisplayView>meta</DisplayView>
<RawView>meta</RawView>
<DisplayFmt>sgml</DisplayFmt>
<FileGroup>
<MfsDir>sgmldata</MfsDir>
<MfsFile>*.sgm</MfsFile>
<MfsExpand>tree</MfsExpand>
</FileGroup>
</FilterChain>

Once the FilterChain sections have been set up, the following command can be used to build the SGML regions (usually separately after dbbuild). For this example, assume that the meta_structure_file generated by mfsmeta is called `data.str' and that `data.inp' contains the <!DOCTYPE> declaration for the SGML files in the database:


% sgmlrgn -v -m region -M data.str -D data.dd data.inp data.dd

sgmlrgn will use data.str to identify all the sgml format files and will build SGML regions on them.

Building Regions for Type 2 SGML Databases

As with Type 1 SGML databases, the first step involves setting up the FilterChain sections of the Data Dictionary. However, because the files conform to more than one DTD, they must be separated into groups, where all the files in a group conform to a particular DTD. A FilterChain section is then set up for each group.

The DisplayFmt section of each FilterChain section is then set to contain two values, separated by a comma. The first value is the keyword `sgml' and the second value is a short group name that you pick, which uniquely identifies the group.

For example, the following FilterChain sections might be appropriate for a Type 2 SGML database that contains files from two DTD's (with group names `manual' and `news').


<FilterChain>
<SearchView>meta</SearchView>
<DisplayView>meta</DisplayView>
<RawView>meta</RawView>
<DisplayFmt>sgml,manual</DisplayFmt>
<FileGroup>
<MfsDir>mandata</MfsDir>
<MfsFile>*.sgm</MfsFile>
<MfsExpand>tree</MfsExpand>
</FileGroup>
</FilterChain>
<FilterChain>
<SearchView>meta</SearchView>
<DisplayView>meta</DisplayView>
<RawView>meta</RawView>
<DisplayFmt>sgml,news</DisplayFmt>
<FileGroup>
<MfsDir>newsdata</MfsDir>
<MfsFile>*.sgm</MfsFile>
<MfsExpand>tree</MfsExpand>
</FileGroup>
</FilterChain>

Once the FilterChain sections have been set up, the following commands can be used to build the SGML regions (each DTD in the database requires one pass with sgmlrgn). For this example, assume the meta_structure_file generated by mfsmeta is called data.str. Assume that the file, `manual.inp' contains the <!DOCTYPE> declaration for the `manual' files. Finally, assume that the file, `news.inp' contains the <!DOCTYPE> declaration for the `news' files.


% sgmlrgn -v -m region -M data.str -G manual -D data.dd manual.inp data.dd
% sgmlrgn -v -m region -M data.str -G news -D data.dd news.inp data.dd

Note that the `-G' option is used to specify which group to build the regions on in each pass.

Building Regions for Type 3 SGML Databases

As with Type 2 SGML databases, the first step involves setting up the FilterChain sections of the Data Dictionary. Also, as in Type 2 SGML databases, the files must be separated into groups. What is different in Type 3 databases is that a group not only specifies files that use a particular DTD, but may also be refined to specify files that contain text for a specific element of that DTD.

For example, assume the newspaper documents in the example above consisted of two elements, HEADLINE and TEXT. Further, assume that the text for all the HEADLINE parts was in files with the suffix `.hl' and that the text for the TEXT parts was in files with the suffix `.txt'. Then the following FilterChain sections could be used to define this database (which also includes the `manual' files in the other directory):


<FilterChain>
<SearchView>meta</SearchView>
<DisplayView>meta</DisplayView>
<RawView>meta</RawView>
<DisplayFmt>sgml,manual</DisplayFmt>
<FileGroup>
<MfsDir>mandata</MfsDir>
<MfsFile>*.sgm</MfsFile>
<MfsExpand>tree</MfsExpand>
</FileGroup>
</FilterChain>
<FilterChain>
<SearchView>meta</SearchView>
<DisplayView>meta</DisplayView>
<RawView>meta</RawView>
<DisplayFmt>sgml,newshl,HEADLINE</DisplayFmt>
<FileGroup>
<MfsDir>newsdata</MfsDir>
<MfsFile>*.hl</MfsFile>
<MfsExpand>tree</MfsExpand>
</FileGroup>
</FilterChain>
<FilterChain>
<SearchView>meta</SearchView>
<DisplayView>meta</DisplayView>
<RawView>meta</RawView>
<DisplayFmt>sgml,newstxt,TEXT</DisplayFmt>
<FileGroup>
<MfsDir>newsdata</MfsDir>
<MfsFile>*.txt</MfsFile>
<MfsExpand>tree</MfsExpand>
</FileGroup>
</FilterChain>

Note that a third attribute has been added to the DisplayFmt fields of the `news' files, which identifies the element that the text in those files corresponds to. Also note that `HEADLINE' and `TEXT' groups have different group names (`newshl' and `newstxt'). Also note that there is no element attribute defined for the `manual' files because they are to be parsed using the entire `manual' DTD.
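
The DisplayFmt values used in the three database types (`sgml', `sgml,manual', `sgml,newshl,HEADLINE') can be read as a format keyword, an optional group name and an optional element name. The following Python sketch splits such a value into those parts. The names DisplayFmt and parse_display_fmt are hypothetical, and the handling of surrounding white space is an assumption; this is an illustration, not the parser used by the MFS tools.

```python
from typing import NamedTuple, Optional


class DisplayFmt(NamedTuple):
    fmt: str                 # format keyword, e.g. "sgml"
    group: Optional[str]     # group name (Type 2 and Type 3 databases)
    element: Optional[str]   # element the files hold text for (Type 3 only)


def parse_display_fmt(value):
    """Split a comma-separated DisplayFmt value into its one to three parts."""
    parts = [p.strip() for p in value.split(",")]
    if not 1 <= len(parts) <= 3 or not all(parts):
        raise ValueError("malformed DisplayFmt value: " + repr(value))
    parts += [None] * (3 - len(parts))   # pad the missing optional parts
    return DisplayFmt(*parts)


for value in ("sgml", "sgml,manual", "sgml,newshl,HEADLINE"):
    print(parse_display_fmt(value))
```

The group name is what is later passed to sgmlrgn with the -G option.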

Once the FilterChain sections have been set up, the following commands can be used to build the SGML regions. For this example, assume the meta_structure_file generated by mfsmeta is called data.str. Assume that the file, `manual.inp' contains the <!DOCTYPE> declaration for the `manual' files. Finally, assume that the file, `news.inp' contains the <!DOCTYPE> declaration for the `news' files.


% sgmlrgn -v -m region -M data.str -G manual -D data.dd manual.inp data.dd
% sgmlrgn -v -m region -M data.str -G newshl -D data.dd news.inp data.dd
% sgmlrgn -v -m region -M data.str -G newstxt -D data.dd news.inp data.dd

Note that the `-G' option is used to specify which group to build the regions on in each pass.
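
Because each group needs its own pass, the command lines shown above follow a fixed pattern. The Python sketch below only assembles the argument lists for those passes, using the options exactly as they appear in the examples; the function name region_commands is hypothetical, and the file names are the ones assumed in this section. Running the commands (for instance with subprocess.run) requires sgmlrgn to be installed and is left out.

```python
def region_commands(passes, meta="data.str", dd="data.dd"):
    """Build one sgmlrgn argument list per (group_name, doctype_file) pass.

    A group_name of None produces the Type 1 form, without -G.
    """
    commands = []
    for group, doctype_file in passes:
        argv = ["sgmlrgn", "-v", "-m", "region", "-M", meta]
        if group is not None:
            argv += ["-G", group]
        argv += ["-D", dd, doctype_file, dd]
        commands.append(argv)
    return commands


# The three passes of the Type 3 example.
type3 = [("manual", "manual.inp"), ("newshl", "news.inp"), ("newstxt", "news.inp")]
for argv in region_commands(type3):
    print("% " + " ".join(argv))
```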

DESCRIPTIONS

Entity Manager

An external entity resides in one or more files. The entity manager component of sgmlrgn maps a sequence of files into an entity in three sequential stages:

1.: each carriage return character is turned into a non-SGML character;
2.: each newline character is turned into a record end character, and at the same time a record start character is inserted at the beginning of each line;
3.: the files are concatenated.
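
The three stages can be sketched in Python as follows. This is an illustration under stated assumptions, not the entity manager's code: the function name build_entity is hypothetical, the record start, record end and non-SGML characters are shown as the symbolic strings <RS>, <RE> and <NONSGML> for readability, and the treatment of a final line with no trailing newline is a guess.

```python
# Symbolic stand-ins; the real characters depend on the SGML declaration in use.
RS = "<RS>"
RE = "<RE>"
NONSGML = "<NONSGML>"


def build_entity(file_texts):
    """Map the contents of a sequence of files into one entity."""
    records = []
    for text in file_texts:
        # Stage 1: each carriage return becomes a non-SGML character.
        text = text.replace("\r", NONSGML)
        # Stage 2: each newline becomes a record end, and a record start
        # is inserted at the beginning of each line.
        *lines, tail = text.split("\n")
        records += [RS + line + RE for line in lines]
        if tail:  # assumption: an unterminated last line gets RS but no RE
            records.append(RS + tail)
    # Stage 3: the files are concatenated.
    return "".join(records)


print(build_entity(["a\r\nb\n", "c"]))
# <RS>a<NONSGML><RE><RS>b<RE><RS>c
```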

A system identifier is interpreted as a list of filenames separated by colons. If no system identifier is supplied, the entity manager attempts to generate a filename from the public identifier by a table lookup in a file named sgmlentity.map. Each entry in sgmlentity.map has two white-space delimited fields: the first is the system filename and the second is the PUBLIC ID. The following are sample entries in the sgmlentity.map file:

: # comments
filename_1 "PUBLIC ID 1"
filename_2 "PUBLIC ID 2"

....
filename_N "PUBLIC ID N"

sgmlrgn searches for the PUBLIC ID in the following order of precedence:

1.: the sgmlentity.map file in the local directory.
2.: the sgmlentity.map file pointed to by the SGMLREGION_PATH environment variable
3.: the system filename in the local directory.
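
The first two steps of this search can be sketched in Python. The sketch assumes that SGMLREGION_PATH names a directory containing sgmlentity.map, that the search continues to the next map when an identifier is missing from the first, and that the first matching entry in a map wins; none of this is confirmed by this page. The function names are hypothetical, and the third step (falling back to a system filename in the local directory) is left to the caller.

```python
import os
import shlex

MAP_NAME = "sgmlentity.map"


def parse_entity_map(text):
    """Return {public_id: filename} from the text of an sgmlentity.map file."""
    table = {}
    for line in text.splitlines():
        # shlex handles the quoted PUBLIC ID and drops '#' comments.
        fields = shlex.split(line, comments=True)
        if len(fields) == 2:
            filename, public_id = fields
            table.setdefault(public_id, filename)  # assumption: first entry wins
    return table


def resolve_public_id(public_id, cwd=".", environ=os.environ):
    """Look up public_id in the local map, then in the SGMLREGION_PATH map."""
    candidates = [os.path.join(cwd, MAP_NAME)]
    extra = environ.get("SGMLREGION_PATH")
    if extra:
        candidates.append(os.path.join(extra, MAP_NAME))
    for path in candidates:
        if os.path.isfile(path):
            with open(path) as fh:
                table = parse_entity_map(fh.read())
            if public_id in table:
                return table[public_id]
    return None  # the caller falls back to a filename in the local directory
```

For example, with a local map containing the ISOlat1 entry shown in the Example section below, resolve_public_id("ISO 8879-1986//ENTITIES Added Latin 1//EN") would return /usr/app/isolat1.gml.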

Example

The mode examples that follow rely on three files: the document type definition, the document instance, and the input file. The following is a sample document type definition called example.dtd.

: <!ENTITY % ISOlat1 PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN">
%ISOlat1;
<!ELEMENT doc - O (intro, body, concl)>
<!ELEMENT intro - O (#PCDATA)>
<!ELEMENT body - O (p|#PCDATA)+>
<!ELEMENT concl - O (#PCDATA)>
<!ELEMENT p - O (#PCDATA)>
<!ATTLIST p type (center|left|right) left>

The following is an example SGML document instance called example.sgm:

: <!DOCTYPE doc SYSTEM "example.dtd">
<doc><intro>Introduction
<body><p type=left>Paragraph 1
<p type=center>Paragraph 2
<concl>Conclusion

The PUBLIC entity "ISO 8879-1986//ENTITIES Added Latin 1//EN" triggers a table lookup in the entity map file sgmlentity.map (or in $(SGMLREGION_PATH)/sgmlentity.map). The following is an entry in the sgmlentity.map file:

: ....
/usr/app/isolat1.gml "ISO 8879-1986//ENTITIES Added Latin 1//EN"
....

This particular ISOlat1 public entity will be mapped to the system id /usr/app/isolat1.gml. The following is a simple input file:

: <!DOCTYPE doc SYSTEM "example.dtd">

Example: region mode

The following invokes sgmlrgn as a region file generator.

: sgmlrgn -v -m region -D example.dd example.sgm

This will generate a region file called example.rgn containing region pointers in the document instance called example.sgm. The region names are the names of the entities defined in the DTD. When sgmlrgn completes, the example.dd Data Dictionary will be updated. The -v option causes the following messages to be displayed while sgmlrgn executes:

: .. building regions #(0) size(0K) time(1s)
total built regions #(14) size(0K) time(1s)
.. sorting now
.. writing now

The total number of regions is 14 and the total execution time is 1 second. The resulting regions can be used by xpat(1) and Latitude Query. See regions(5) and data_dict(5) for more details.

Example: filter mode

The filter mode can be invoked as follows:

: sgmlrgn -m filter example.inp -

With this command, sgmlrgn is invoked as a filter. The example.inp (the document type declaration) file is parsed and the filter is loaded with the DTD. sgmlrgn will expect the SGML text on standard input (-) and will produce a fully expanded instance of the input text on standard output.

Example: check mode

The following invokes sgmlrgn to check the validity of the SGML document instance:

: sgmlrgn -m check example.sgm

sgmlrgn will produce error messages for any syntax errors in the DTD or the document instance. sgmlrgn will produce a summary line when it has finished parsing the document instance. The following is an example of the summary information produced.

: checking total size(94K) time(1s)

sgmlrgn took 1 second to validate the file (which is approximately 94 KB in length).

Example: dtd mode

The following invokes sgmlrgn to expand the DTD.

: sgmlrgn -m dtd example.sgm

or
sgmlrgn -m dtd example.inp

The first form of the command retrieves the name of the DTD from the SGML document instance, whereas the second form retrieves the name of the DTD from the document type declaration file. Either of the above commands will generate the following output:

: <!ELEMENT BODY - O (P|#PCDATA)+>
<!ELEMENT CONCL - O (#PCDATA)>
<!ELEMENT DOC - O (INTRO,BODY,CONCL)>
<!ELEMENT INTRO - O (#PCDATA)>
<!ELEMENT P - O (#PCDATA)>
<!ATTLIST P TYPE (CENTER|LEFT|RIGHT) LEFT>

All of the parameter entity references are expanded. The element names are sorted alphabetically and the element definitions are nicely printed in a structured format.

Example: print mode

The following command invokes sgmlrgn in print mode:

: sgmlrgn -m print example.sgm

It will generate the following output:

: [STG DOC 36:40]-->[<doc>]
[STG INTRO 41:47]-->[<intro>]
[ETG INTRO 41:61]-->[<intro>Introduction\n\]
[STG BODY 62:67]-->[<body>]
[ATT P TYPE TOKEN 76:79]-->[left]
[STG P 68:80]-->[<p type=left>]
[ETG P 68:92]-->[<p type=left>Paragraph 1\n\]
[ATT P TYPE TOKEN 101:106]-->[center]
[STG P 93:107]-->[<p type=center>]
[ETG P 93:119]-->[<p type=center>Paragraph 2\n\]
[ETG BODY 62:119]-->[<body><p type=l....er>Paragraph 2\n\]
[STG CONCL 120:126]-->[<concl>]
[ETG CONCL 120:120]
[ETG DOC 36:120]

The number before the colon is the starting position; the number after the colon is the ending position. STG stands for a start tag region, ETG for the region of the whole tagged element (it begins at the start tag and runs to the end of the element), and ATT for an attribute region within a tag. For each nested level of region, the printing is indented by a corresponding number of blank characters. The end of line character is represented by \n\. If the printed region is too long, it is shortened and .... is substituted for the omitted text. For example:

: [ETG BODY 62:119]-->[<body><p type=l....er>Paragraph 2\n\]

This can be used to check the region generation and also allows structure of the document instance to be viewed.
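
The shortening rule can be inferred from the BODY region in the sample output: with the default print_length of 30, the sample keeps the first 15 and the last 15 characters of the region and puts .... between them. The Python sketch below reproduces that behaviour. It is inferred from that single line, so the even split and the order of the steps are assumptions, and the function name region_sample is hypothetical.

```python
def region_sample(region_text, print_length=30):
    """Return the printable sample of a region as print mode appears to show it."""
    if len(region_text) > print_length:
        half = print_length // 2
        head = region_text[:half]
        tail = region_text[len(region_text) - half:]
        region_text = head + "...." + tail
    # The end of line character is displayed as \n\ (backslash, n, backslash).
    return region_text.replace("\n", "\\n\\")


# The BODY region occupies positions 62:119, i.e. 58 characters.
body = "<body><p type=left>Paragraph 1\n<p type=center>Paragraph 2\n"
print(region_sample(body))
# <body><p type=l....er>Paragraph 2\n\
```

Regions of 30 characters or fewer, such as the first P element, are printed in full.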

Example: root mode

The following invokes the root mode of sgmlrgn:

: sgmlrgn -m root example.sgm

or
sgmlrgn -m root example.inp

The first form of the command retrieves the name of the DTD from the SGML document instance, whereas the second form retrieves the name of the DTD from the document type declaration file. The DTD is then parsed to determine the root element. Either of the above commands will generate the following output:

: {DefaultRegion DOC}

The output is suitable for use in an xpat(1) initialization file to set the default region.

Example: elist mode

The following invokes the element list generation mode:

: sgmlrgn -m elist example.sgm

This will list all the element names in alphabetical order:

: BODY
CONCL
DOC
INTRO
P

Example: normalize mode

The following invokes sgmlrgn's document instance normalization mode:

: sgmlrgn -m normalize example.sgm

The original minimized document instance is transformed to a fully expanded document instance:

: <DOC>
<INTRO>Introduction </INTRO>
<BODY>
Paragraph 1
Paragraph 2</BODY>
<CONCL>Conclusion</CONCL></DOC>

Example: test mode

The test mode is used to generate a test document instance and is invoked by:

: sgmlrgn -m test example.sgm

or
sgmlrgn -m test example.inp

The following is fully validated sample output generated by sgmlrgn:

: <DOC><INTRO></INTRO><BODY>cb b
bca bccabcab</BODY><CONCL></DOC>

This output document instance can be used as a test input document to sgmlrgn. If the generate_file is specified, the generation parameters will be taken from generate_file. Otherwise, the default generation parameters will be used.

Example: gentest mode

Invoke the generation of a sample set of test-generation parameters by:

: sgmlrgn -m gentest example.sgm

or
sgmlrgn -m gentest example.inp

The parameters are described in a tagged format and their meanings are:

Tag Name           Description                          Default
<OmitStag>         allow omit start tag                 0
<OmitStagProb>     allow omit start tag probability     0.3
<OmitEtag>         allow omit end tag                   1
<OmitEtagProb>     allow omit end tag probability       0.3
<NetTag>           allow NET tag                        1
<NetTagProb>       allow NET tag probability            0.1
<UncloseTag>       allow unclosed tag                   1
<UncloseProb>      allow unclosed tag probability       0.1
<Orep>             range of * element                   2
<Rep>              range of + element                   3
<OptProb>          ? element probability                0.3
<GrpOrep>          range of * group                     2
<GrpRep>           range of + group                     3
<GrpOptProb>       ? group probability                  0.3
<DataRange>        range of # of data chars             12
<DataCharRange>    range of data used chars             3
<DataCharStart>    start data used char                 a
<DataEoln>         allow end of line in data            1
<DataEolnProb>     allow eoln in data probability       0.3
<StagEoln>         allow end of line in start tag       1
<StagEolnProb>     allow eoln in start tag probability  0.3
<AttRange>         range of # of char-attrib. chars     8
<AttCharRange>     range of char-attrib. used chars     2
<AttCharStart>     start char-attrib. used char         x
<AttNumRange>      range of # of number-attrib. chars   8
<AttNumCharRange>  range of number-attrib. used chars   2
<AttNumCharStart>  start number-attrib. used char       0
<IDVal>            start ID number                      0
<OpenElement>      not used                             0

The global default values, which apply to every element, are enclosed in a pair of <Regions> tags.

Within the <Regions> description, each element can override values in the global set of parameters by:

: <Region><Name>BODY</Name>

<Orep>10</Orep>
</Region>

In this case, the BODY element has a range of 10 where it is optional and repeatable in a definition. For all other parameters, the BODY element takes the global values.
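
The relationship between the global values and a per-element override can be sketched in Python as a two-level lookup. The dictionaries below hold only a few of the defaults from the table and the BODY override from the example; the function name effective_value is hypothetical, and how sgmlrgn stores these parameters internally is not described here.

```python
# A few of the global defaults from the table above.
GLOBAL_DEFAULTS = {"Orep": 2, "Rep": 3, "OptProb": 0.3}

# Per-element overrides, as in the <Region><Name>BODY</Name> example.
OVERRIDES = {"BODY": {"Orep": 10}}


def effective_value(element, name, defaults=GLOBAL_DEFAULTS, overrides=OVERRIDES):
    """Return the element's own value for a parameter, else the global default."""
    return overrides.get(element, {}).get(name, defaults[name])


print(effective_value("BODY", "Orep"))   # 10, overridden for BODY
print(effective_value("BODY", "Rep"))    # 3, global value
print(effective_value("INTRO", "Orep"))  # 2, global value
```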

This set of test-generation parameters can be stored in a generate_file which can be used in the test generation mode, -m test.

System Declaration

The system declaration for sgmlrgn is as follows:

SYSTEM "ISO 8879:1986"
CHARSET
BASESET	"ISO 646-1983//CHARSET
	International Reference Version (IRV)//ESC 2/5 4/0"
DESCSET	0 128 0
CAPACITY	PUBLIC	"ISO 8879:1986//CAPACITY Reference//EN"
FEATURES
MINIMIZE	DATATAG	NO	OMITTAG	YES	RANK	NO	SHORTTAG	YES
LINK	SIMPLE	NO	IMPLICIT	NO	EXPLICIT	NO
OTHER	CONCUR	NO	SUBDOC	YES 1	FORMAL	YES
SCOPE	DOCUMENT
SYNTAX	PUBLIC	"ISO 8879:1986//SYNTAX Reference//EN"
SYNTAX	PUBLIC	"ISO 8879:1986//SYNTAX Core//EN"
VALIDATE
	GENERAL	YES	MODEL	YES	EXCLUDE	YES	CAPACITY	YES
	NONSGML	YES	SGML	YES	FORMAL	YES
SDIF
	PACK	NO	UNPACK	NO

The memory usage of sgmlrgn is not a function of the capacity points used by a document; however, sgmlrgn can handle capacities significantly greater than the reference capacity set.

In some environments, higher values may be supported for the SUBDOC parameter.

Documents that do not use optional features are also supported. For example, if FORMAL NO is specified in the SGML declaration, public identifiers will not be required to be valid formal public identifiers.

Certain parts of the concrete syntax may be changed:

The shunned character numbers can be changed. Eight bit characters can be assigned to LCNMSTRT, UCNMSTRT, LCNMCHAR and UCNMCHAR. Declaring this requires that the syntax reference character set be declared like this:

BASESET	"ISO Registration Number 100//CHARSET
	ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
DESCSET	0 256 0

Uppercase substitution can be performed or not performed both for entity names and for other names.

Either short reference delimiters assigned by the reference delimiter set or no short reference delimiters are supported.

The reserved names can be changed.

The quantity set can be increased within certain limits, subject to there being sufficient memory available. The upper limit on NAMELEN is 239. The upper limits on ATTCNT, ATTSPLEN, BSEQLEN, ENTLVL, LITLEN, PILEN, TAGLEN, and TAGLVL are more than thirty times greater than the reference limits. The upper limit on GRPCNT, GRPGTCNT, and GRPLVL is 253. NORMSEP cannot be changed. DTAGLEN and DTEMPLEN are irrelevant since sgmlrgn does not support the DATATAG feature.
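
The exact limits stated above (NAMELEN, the three group quantities, and the fixed NORMSEP) can be expressed as a small check. The Python sketch below covers only those; the quantities described as "more than thirty times greater than the reference limits" have no exact bound in this page and are not checked. The function name check_quantities is hypothetical, and NORMSEP is taken to be fixed at its reference value of 2.

```python
# Upper limits stated in the text.
UPPER_LIMITS = {"NAMELEN": 239, "GRPCNT": 253, "GRPGTCNT": 253, "GRPLVL": 253}

# Quantities that cannot be changed, with their reference values.
FIXED = {"NORMSEP": 2}


def check_quantities(quantities):
    """Return a list of problems found in a {name: value} quantity set."""
    problems = []
    for name, value in quantities.items():
        if name in FIXED and value != FIXED[name]:
            problems.append(f"{name} cannot be changed from {FIXED[name]}")
        elif name in UPPER_LIMITS and value > UPPER_LIMITS[name]:
            problems.append(
                f"{name} {value} exceeds the upper limit {UPPER_LIMITS[name]}")
    return problems


print(check_quantities({"NAMELEN": 80, "GRPCNT": 100, "NORMSEP": 2}))  # []
print(check_quantities({"NAMELEN": 240}))
```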

SGML Declaration

The SGML declaration may be omitted. If the SGML declaration is omitted, the following declaration will be implied:

<!SGML "ISO 8879:1986"
CHARSET
BASESET	"ISO 646-1983//CHARSET
	International Reference Version (IRV)//ESC 2/5 4/0"
DESCSET	0 9 UNUSED
	9 2 9
	11 2 UNUSED
	13 1 13
	14 18 UNUSED
	32 95 32
	127 1 UNUSED
CAPACITY	SGMLREF
	TOTALCAP	1000000
	ENTCAP	1000000
	ENTCHCAP	1000000
	ELEMCAP	1000000
	GRPCAP	1000000
	EXGRPCAP	1000000
	EXNMCAP	1000000
	ATTCAP	1000000
	ATTCHCAP	1000000
	AVGRPCAP	1000000
	NOTCAP	1000000
	NOTCHCAP	1000000
	IDCAP	1000000
	IDREFCAP	1000000
	MAPCAP	1000000
	LKSETCAP	1000000
	LKNMCAP	1000000
SCOPE	DOCUMENT
SYNTAX	PUBLIC	"ISO 8879:1986//SYNTAX Reference//EN"
QUANTITY	SGMLREF
	ATTCNT	100
	ATTSPLEN	960
	BSEQLEN	960
	DTAGLEN	32
	DTEMPLEN	32
	ENTLVL	32
	GRPCNT	100
	GRPGTCNT	96
	GRPLVL	32
	LITLEN	1024
	NAMELEN	80
	NORMSEP	2
	PILEN	1024
	TAGLEN	960
	TAGLVL	1000
FEATURES
MINIMIZE	DATATAG	NO	OMITTAG	YES	RANK	NO	SHORTTAG	YES
LINK	SIMPLE	NO	IMPLICIT	NO	EXPLICIT	NO
OTHER	CONCUR	NO	SUBDOC	YES 99999	FORMAL	YES
APPINFO NONE>

with the exception that characters 128 through 254 will be assigned to DATACHAR. When exporting documents that use characters in this range, an accurate description of the upper half of the document character set should be added to this declaration. For ISO Latin-1, an appropriate description would be:

BASESET	"ISO Registration Number 100//CHARSET
	ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
DESCSET	128 32 UNUSED
	160 95 32
	255 1 UNUSED

The reference capacity set is upgraded to a 1 MB limit and the reference quantity set is upgraded to a bigger limit.
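
Each DESCSET entry is a triple: the first described character number, the number of characters described, and either the corresponding base set character number or UNUSED. The Python sketch below expands the DESCSET of the implied declaration, reading its entries as the triples (0, 9, UNUSED), (9, 2, 9), (11, 2, UNUSED), (13, 1, 13), (14, 18, UNUSED), (32, 95, 32) and (127, 1, UNUSED), and confirms that they describe characters 0 through 127 exactly once. The function name expand_descset is hypothetical.

```python
UNUSED = None

# DESCSET of the implied SGML declaration, as (described, count, base) triples.
IMPLIED_DESCSET = [
    (0, 9, UNUSED),
    (9, 2, 9),
    (11, 2, UNUSED),
    (13, 1, 13),
    (14, 18, UNUSED),
    (32, 95, 32),
    (127, 1, UNUSED),
]


def expand_descset(descset):
    """Return {described character number: base character number or None}."""
    mapping = {}
    for described, count, base in descset:
        for offset in range(count):
            number = described + offset
            if number in mapping:
                raise ValueError(f"character {number} is described twice")
            mapping[number] = UNUSED if base is UNUSED else base + offset
    return mapping


mapping = expand_descset(IMPLIED_DESCSET)
used = [n for n, base in mapping.items() if base is not UNUSED]
print(sorted(mapping) == list(range(128)))  # True: 0..127 covered exactly once
print(len(used))                            # 98 characters are in use
```

The characters in use are tab and line feed (9, 10), carriage return (13), and 32 through 126; the rest of the lower half is declared UNUSED.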

ORIGIN

ARCSGML was written by Charles F. Goldfarb.

Sgmls was derived from ARCSGML by James Clark.

sgmlrgn was derived from Sgmls.
