Reissued as: CC:DA/TF/TEI/3
May 30, 1995
Committee on Cataloging: Description & Access
Call for CC:DA Action on the TEI Header
A Position Paper
Sherry Kelley, UCLA Library
Bradford Eden, NEEDS Cataloger
In Consultation With
Edward Gaynor, University of Virginia Library,
Cataloging Services Department
May 30, 1995
Please note that the purpose of this document is to facilitate the work of the Committee and to provide a means for outreach to both library and non-library cataloging communities.
This document is intended for the exclusive use of CC:DA and its cataloging constituencies, and is presented for discussion in the ongoing process of rule revision. Under no circumstances should the information here be copied or re-transmitted without prior consultation with the current Chair of CC:DA.
The CC:DA charge: "To make a continuing assessment of the state of the art and suggest the direction of change in the field of descriptive cataloging; to recommend solutions to problems relating not only to bibliographic description but also to choice and form of access points, other than subject access; to initiate proposals for additions to and revisions of the cataloging code currently adopted by ALA; to develop official ALA positions on such proposals in consultation with other appropriate ALA units and organizations in the U.S.A." (ALA Handbook of Organization, 1994/1995)
An important new direction for the field of descriptive cataloging is the growth of electronic "publishing" and the increase in electronic documents as a percentage of materials collected by libraries. Electronic documents are structured in many ways, from flat files such as ASCII to those encoded following international standards such as Standard Generalized Markup Language (SGML) and the Text Encoding Initiative (TEI) guidelines. We are concerned with the latter category of document in this position paper but wish to point out that the creation of TEI-conformant documents and their accompanying TEI headers is just one of many new challenges for bibliographic control and description created by electronic publishing. The Committee on Cataloging: Description and Access (CC:DA) should join, even lead, the dialogue going on in MARBI, CONSER, and various OCLC projects, including the OCLC Internet Resources Cataloging Project, concerning these materials.
It is appropriate that CC:DA do so on two counts. First, catalogers are struggling to describe electronic documents with a patchwork of rules from the Anglo-American Cataloguing Rules, 2nd ed., 1988 revision (AACR2r), especially Chapter 9, Computer Files. These rules are inadequate for a number of reasons, chief of which is their strong orientation toward print and commercial publishers. Second, the number of electronic text projects converting printed texts to TEI-conformant documents is increasing. Each TEI-conformant document carries its own "bibliographic record" in the form of a TEI header. Libraries will be the chief users of these headers as surrogates for title pages, as potential access records in their OPACs, and as source records for descriptive catalogers. CC:DA can and should help standardize the creation and use of these headers.
Text Encoding Initiative
The Text Encoding Initiative is a major international academic effort to establish guidelines for the encoding and interchange of electronic texts. (E. van Herwijnen, Practical SGML, 2nd ed., 1994. p. 53)
The encoding scheme used by the TEI guidelines is an application of the Standard Generalized Markup Language (SGML), an international standard (ISO 8879) for the description of marked-up electronic text. SGML defines methods of representing text in electronic form through the use of coding conventions. TEI uses a subset of SGML for its encoding scheme. By way of comparison, MARC is a markup language, as are the word processing coding conventions that indicate how a document will display in print format. (TEI P3, Cumulative Draft. Chapter 2. A Gentle Introduction to SGML, p. 19.)
The TEI Header
Every TEI-conformant text has an encoded set of descriptions prefixed to it. This is known as the TEI header and consists of four major parts: file description, encoding description, text profile, and revision history. The file description is mandatory and is our chief concern in this position paper because it contains "a full bibliographical description of the computer file from which a user of the text could derive a proper bibliographic citation, or which a librarian or archivist could use in creating a catalogue entry recording its presence within a library or archive." (TEI P3, Chapter 5) In documenting information about the text, its source, its encoding and its revisions, headers "provide an analogue to the title page attached to a printed work." (Ibid.)
Seven data elements may appear in the file description (the <fileDesc> element of the <teiHeader> element), three of which are mandatory. The mandatory elements are:
- title statement (<titleStmt>): information about the title of a work and those responsible for its intellectual content.
- publication statement (<publicationStmt>): information about the publication or distribution of an electronic or other text.
- source description (<sourceDesc>): bibliographic description of the copy text(s) from which an electronic text was derived or generated.
Optional elements are:
- edition statement (<editionStmt>): information about one edition of a text.
- extent (<extent>): information about the approximate size of the electronic text as stored on some carrier medium, specified in any convenient units.
- series statement (<seriesStmt>): information about the series, if any, to which a publication belongs.
- notes statement (<notesStmt>): any notes providing information about a text additional to that recorded in other parts of the bibliographic description. (Guidelines for Electronic Text Encoding and Interchange, version P3, 1994. Chapter 24.)
These elements clearly parallel descriptive cataloging areas 1-7 of AACR2r. Collaboration between the authors of the TEI and the descriptive cataloging community as represented by CC:DA in the further development of header elements would be mutually beneficial, since these serve as bibliographic records and as title page equivalents. Some collaboration has already occurred, as indicated by references in the TEI documentation to use of AACR2 in formulating data element definitions.
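The structure of the file description can be illustrated with a short check for its mandatory elements. What follows is a minimal sketch in Python, not part of the TEI documentation. It assumes the header has been written with every tag explicitly closed so that Python's standard xml.etree.ElementTree module can read it (SGML itself permits tag omission). The function name missing_mandatory and the sample header are hypothetical.

```python
import xml.etree.ElementTree as ET

# The three children of <fileDesc> that TEI P3 makes mandatory.
MANDATORY = ("titleStmt", "publicationStmt", "sourceDesc")

# Hypothetical sample header with every tag closed (an assumption made
# so that an XML parser can read it; SGML allows tag omission).
SAMPLE_HEADER = """<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Emma [a machine-readable transcription]</title>
      <author>Austen, Jane</author>
    </titleStmt>
    <publicationStmt>
      <p>University of Virginia Library</p>
    </publicationStmt>
  </fileDesc>
</teiHeader>"""


def missing_mandatory(header_text):
    """Return the mandatory <fileDesc> elements absent from the header."""
    root = ET.fromstring(header_text.strip())
    file_desc = root.find("fileDesc")
    if file_desc is None:
        # Without a file description, every mandatory element is missing.
        return list(MANDATORY)
    return [name for name in MANDATORY if file_desc.find(name) is None]


# The sample omits the source description, so that name is reported.
print(missing_mandatory(SAMPLE_HEADER))
```

A check of this kind could be run on headers before they are passed to cataloging staff for review. It tests only for the presence of elements, not for the AACR2 conformance of their content.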
University of Virginia Library, Cataloging Services Department Experience
The University of Virginia Library's Electronic Text Center provides a good illustration of the possible interconnections between staff creating TEI-conformant documents and catalogers. University of Virginia Library was one of the first to create electronic texts and headers following TEI guidelines and to produce MARC records for its online public access catalog. Early in the project, cataloging department staff created headers following TEI, Chapter 24, AACR2r, and their own local policies. A set of procedures grew out of this process that provided sufficient guidance for staff from the Electronic Text Center to assume responsibility for the creation of headers. Once created, headers are stored in an online file to be reviewed by cataloging staff. Separate MARC records are created from the headers and the headers themselves are edited to conform to AACR2r as appropriate. For example, names are entered in the header in their authorized form. (Cataloging Procedures Manual, Chapter 12, Part B: Electronic Texts. University of Virginia Library, Cataloging Services Dept.)
A useful byproduct from this project for the cataloging community was the incorporation of suggestions made by University of Virginia Library catalogers into the TEI, third edition.
There are many electronic text projects using SGML. The Center for Electronic Texts in the Humanities, the Berkeley Finding Aid Project, and the University of Virginia Library Electronic Text Center are examples. Internationally, notable examples include the WEBDOC and RIDDLE projects. Bibliographic records in the form of headers are being created to accompany texts, and to serve as surrogate catalog records in separate hypermedia databases, with links to the texts. These databases are built on multiple and sometimes proprietary platforms. CC:DA should strongly support efforts to record data in headers according to AACR2 as a means of standardizing their data content, in anticipation of a time when the header can be used as a record that will appear in Online Public Access Catalogues (OPACs), either through links that are opaque to the user, or through SGML/MARC reversible mapping programs.
The potential for direct integration into the OPAC makes the TEI header a candidate for CC:DA review. Records are being created that consist of descriptive elements that must be AACR2-conformant in order to integrate, virtually or physically, with AACR2/MARC records. Not only must these records be AACR2-conformant, but AACR2 must be revised to guide cataloging staff in the preparation and use of headers for electronic texts. As stated in Chapter 5.7 of the TEI Guide, Note for Library Cataloguers: "The (TEI) file header is not a library catalogue record, and so will not make all of the distinctions essential in standard library work. It is the intention of the developers, however, to ensure that the information required for a catalogue record be retrievable from the TEI file header, and moreover that the mapping from one to the other be as simple and straightforward as possible."
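The kind of mapping from header to catalogue record that the TEI developers describe can be sketched in a few lines. The Python example below is a hedged illustration only, not a mapping endorsed by the TEI or by any cataloging agency. It assumes a header with all tags closed so that the standard xml.etree.ElementTree module can parse it, and it assumes the MARC tags 245 (title statement), 260 (publication) and 500 (general note) as targets. The subfield layout imitates the OCLC workform reproduced at the end of this paper. The function names and the sample data, including the date, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical header; element names follow the TEI file description.
SAMPLE_HEADER = """<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Emma</title>
      <author>Austen, Jane</author>
    </titleStmt>
    <publicationStmt>
      <publisher>University of Virginia Library</publisher>
      <pubPlace>Charlottesville, Va.</pubPlace>
      <date>1995</date>
    </publicationStmt>
    <sourceDesc><p>Source unknown.</p></sourceDesc>
  </fileDesc>
</teiHeader>"""


def text_of(parent, path):
    """Return the stripped text found at path under parent, or ''."""
    node = parent.find(path)
    return (node.text or "").strip() if node is not None else ""


def header_to_marc_like(header_text):
    """Map a few file description elements to (tag, field text) pairs."""
    root = ET.fromstring(header_text.strip())
    file_desc = root.find("fileDesc")
    if file_desc is None:
        raise ValueError("header has no <fileDesc> element")

    title = text_of(file_desc, "titleStmt/title")
    author = text_of(file_desc, "titleStmt/author")
    place = text_of(file_desc, "publicationStmt/pubPlace")
    publisher = text_of(file_desc, "publicationStmt/publisher")
    date = text_of(file_desc, "publicationStmt/date")

    # Assumed target tags: 245 title, 260 publication, 500 general note.
    return [
        ("245", f"{title} |h [computer file] / |c {author}"),
        ("260", f"{place} : |b {publisher}, |c {date}."),
        ("500", "Title from TEI header."),
    ]


for tag, field in header_to_marc_like(SAMPLE_HEADER):
    print(tag, field)
```

The sketch copies data content unchanged from header to record, which is why it matters that the header data already follow AACR2: a name entered in its authorized form in the header needs no further editing on the way into the record. A reversible mapping would also need the opposite conversion and a fuller set of fields.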
We recommend that a Task Force be formed by CC:DA, charged with, but not limited to, the following:
- Investigate ways CC:DA and the editors of the TEI might collaborate to inform each other on implementation and development of header data elements.
- Consider amending AACR2 to instruct catalogers in the use of TEI headers as title page substitutes.
- Investigate possible collaboration with MARBI and other organizations to standardize encoding conventions in support of reversible mapping between MARC and SGML, or to support seamless integration of TEI-conformant headers into MARC databases. CC:DA would be concerned with the appropriate definition of data content, not data encoding.
To be added.
Examples from the Cataloging Procedures Manual,
Chapter 12, Part B: Electronic Texts.
University of Virginia Library, Cataloging Services Dept.
TEI header template:
<!DOCTYPE TEI.2 system 'teilite.dtd'>
The work's title [a machine-readable transcription]
The work's author, last name first
Creation of machine-readable version:
creator of electronic version
Conversion to TEI.2-conformant markup:
University of Virginia Library Electronic Text Center
ca. XXX kilobytes
University of Virginia Library
collection and ID, e.g. Modern English, AusEmma
<p>Place where text can be found,
e.g. Available from: Oxford Text Archive</p>
<p>Available commercially from:</p>
<p>Name of electronic series, if any</p>
Illustrations have been included from the
Any other notes.
The work's title
The author's name, first name first
e.g. Editor / Translator / Annotator
<p>Edition information, e.g. 1st ed.</p>
place of publication
date of publication
<p>Name of print series</p>
<p>Prepared for the University of Virginia
Library Electronic Text Center</p>
<p>All quotation marks retained as data</p>
<p>Spell-check and verification made against printed
text using WordPerfect spell checker</p>
<p>All unambiguous end-of-line hyphens have been
removed, and the trailing part of a word has been
joined to the preceding line.</p>
<p id=ETC>Keywords in the header are a local
Electronic Text Center scheme to aid in
establishing analytical groupings</p>
<p>ID elements are given for each page element and
are composed of the text's unique cryptogram and
the given page number, as in AusEmma1 for page
one of Jane Austen's Emma.</p>
First published date
languages used in the text
fiction or non-fiction; poetry, prose, or drama
date of changes
who made the changes
what was done
OCLC Workform for Electronic Texts:
|| ____ , ____|
||VA@ |c VA@|
|||a Modern |a English|
||<title> |h [computer file] / |c <author>|
||Computer data (1 file : ca. kilobytes)|
||Charlottesville, Va. : |b University of Virginia Library, |c <date>.|
||Mode of access: Internet. Host: etext.lib.virginia.edu|
||Text in French and English.|
||Title from TEI header.|
||Prepared for the University of Virginia Library Electronic Text Center.|
||Conversion to TEI.2-conformant markup.|
||Tagging checked and parsed against teilite.dtd.|
||All quotation marks retained as data. All unambiguous end-of-line hyphens have been removed, and the trailing part of the word has been joined to the preceding line.|
||ID elements are given for each page element and are composed of the text's unique cryptogram and the given page number, as in AusEmma1 for page one of Jane Austen's Emma.|
||Available (commercially) from:|
||Also available as ASCII text via campus gopher GWIS.|
||Includes bibliographical references.|
|||p Transcribed from: |a author. |t title / author. ed. |c place : publisher, date. |e p. : ill. ; cm. |f (series)|
|||p Transcribed from: |n Source unknown.|
||University of Virginia. |b Library. |b Electronic Text Center.|
||Virginia.EDU |m The Electronic Text Center, Alderman Library, University of Virginia, Charlottesville, VA 22903 (804) 924-3230 |u mailto://email@example.com|
|||u http://etext.lib.virginia.edu/modeng.browse.html |2 http|