Last updated | 2003-03-04 09:46:45 EST |
Doc Title | Transforming Bibliographic Files, Bibliographic Class |
Author 1 | Hagedorn, Kat |
CVS Revision | $Revision: 1.6 $ |
Please note that for most classes in DLXS, DLXS does not formally support transformations from non-DLXS formats to DLXS XML/SGML. The following instructions and programs are provided as guides or aids only.
To date, DLPS has received and processed bibliographic information in a variety of formats, including USMARC records from our NOTIS catalog, SGML from Chadwyck-Healey, bibliographic information in our own generic TEI-like SGML (encoded for the Text Class), and a variety of local database schemas from applications like FMPro and MS-Access. The three Perl programs linked below are representative both in their ability to transform one of these formats to Bibliographic Class and in their degree of "polish." They are provided as freely available aids for those implementing the DLXS Bibliographic Class and doing similar work.
gums2bib.pl: This very rudimentary program transforms data encoded in Text Class's "grand unified markup scheme" (a tongue-in-cheek name for gums.dtd, a TEI-derived DTD) to the Bibliographic Class's DTD. It will be replaced with a similarly simple program that transforms data from the more current Text Class DTD. Wherever possible, however, we rely on USMARC records as the source both for the encoded information in the gums.dtd/textclass.dtd and for bibliographic information in the Bibliographic Class, so this program is reserved for exceptions: bibliographic data found only in the online text. To use it, ensure that the path to Perl is correct and issue the gums2bib.pl command, giving the input file as an argument and redirecting the output to the output file, as in:
gums2bib.pl my-texts.sgm > my-bib.sgm
marc2bib.pl: This much more thoughtful program (written primarily by Beth Kirschner) derives bibliographic information from NOTIS records in the USMARC format and produces output conforming to the Bibliographic Class's bib.dtd. We often use it in conjunction with a companion script, marc_split.pl, which divides a file of NOTIS-generated records into individual files, each named with the NOTIS record identifier and the ".marc" extension. Current practice, in fact, relies on periodically updating the approximately 15,000 bibliographic files from the Library online catalog, splitting the resulting records into individual files, and generating new Bibliographic Class and Text Class records.
marc2bib.pl looks for USMARC records in a file called records.marc or, alternatively, in the file or files identified on the command line. It produces a collection of individual files with the .bib extension, each named with the NOTIS record identifier (or key), and puts those output files in a directory called sgmlout. Thus, running marc2bib.pl by itself, with a file called records.marc containing records with the NOTIS keys foo, bar, and foobar, will result in sgmlout/foo.bib, sgmlout/bar.bib, and sgmlout/foobar.bib. Similarly, running marc2bib.pl with the command line argument marc/*.marc, where that pattern matches the files foo.marc, bar.marc, and foobar.marc, will also result in (or overwrite) sgmlout/foo.bib, sgmlout/bar.bib, and sgmlout/foobar.bib.
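The naming convention described above can be sketched in a few lines of Python. This is an illustration only, not part of marc2bib.pl: the function names (input_files, expected_output, outputs_for_split_files) are hypothetical, and the sketch assumes that each split file holds a single record and is named with its NOTIS identifier plus ".marc", as marc_split.pl is described as producing.

```python
from pathlib import PurePosixPath

DEFAULT_INPUT = "records.marc"  # file read when no arguments are given
OUTPUT_DIR = "sgmlout"          # directory that receives the .bib files


def input_files(argv):
    """Return the files to read: the command-line arguments, else the default."""
    return list(argv) if argv else [DEFAULT_INPUT]


def expected_output(record_id, out_dir=OUTPUT_DIR):
    """Return the output path for one NOTIS record identifier."""
    return str(PurePosixPath(out_dir) / f"{record_id}.bib")


def outputs_for_split_files(marc_paths, out_dir=OUTPUT_DIR):
    """Map split input files to the output files one would expect.

    Assumption (hypothetical helper): the file name minus ".marc" is the
    NOTIS identifier of the single record the file contains.
    """
    return {
        path: expected_output(PurePosixPath(path).stem, out_dir)
        for path in marc_paths
    }


if __name__ == "__main__":
    paths = ["marc/foo.marc", "marc/bar.marc", "marc/foobar.marc"]
    for src, dst in outputs_for_split_files(paths).items():
        print(f"{src} -> {dst}")
```

Because both modes name the output after the record identifier, a second run over the same records writes to the same paths, which is why existing files in sgmlout may be overwritten.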