Committee on Cataloging: Description & Access
Task Force on Metadata
Fairmont Hotel, Bayou II
The CC:DA 2000 Metadata subcommittee met on Saturday, 6/26 to plan for the pre-conference to be held at the 2000 annual conference. It is to have a practical orientation, describing options for providing access to electronic resources and the pros and cons of the various options. It will last 2 days (Thursday-Friday); the estimated cost is $235.00. OCLC will sponsor the coffee breaks.
The tentative outline is:
The full pre-conference program will be posted on the web by mid-August.
The TF decided that the intended audience for the pre-conference would be decision makers in the technical services arena: catalogers, collection development librarians, and their managers.
Recommending, as needed, rule revision to enable interoperability of cataloging (with AACR2R) with metadata schemes.
The Joint Steering Committee on the Revision of AACR will meet in Brisbane, Australia in October 1999 and will consider the reports of the Task Force on ISBD(ER), the Task Force on Rule 0.24 (which currently says that the cardinal principle of AACR2 is to identify the carrier and then consider content), and Tom Delsey's analysis of AACR2. If the JSC decisions are made public by Midwinter 2000, it will help us take into account the direction in which they are going. John Attig is putting together a JSC web-site.
Group 1 (Matthew Beacom) which considered charge 1:
Analyzing resource-description needs of libraries, seeking input from interested librarians and discussion via the metamarda-l email reflector.
The Group reviewed existing principles for libraries and catalogs and focussed on the four user tasks identified in the Functional Requirements for Bibliographic Records (FRBR) report: Find, Identify, Select, and Obtain, adding a fifth one proposed by Rahmatollah Fattahi at the 1997 Toronto Conference: Manage our systems in ways to fulfill the four user tasks.
After trying to identify categories of users, the Group decided that users are multiple, complex, and protean. As Erik Jul phrased it, "We don't define who our users are; users define who users are." (Cf. also a June 1997 article in D-Lib Magazine by Carl Lagoze entitled "From static to dynamic surrogates: resource discovery in the digital age," which contains thoughts on defining users: <http://www.dlib.org/dlib/june97/06lagoze.html>.)
In looking at the context in which we describe resources the Group decided that the traditional catalog is just one tool in a network of tools; it must be compatible with those other tools. These considerations raised a number of questions: How then can we create a coherent environment for our users? What is the role of the catalog in that network/environment? In attempting to answer the latter question we decided that we must be able to search multiple databases concurrently, and to take the output of information from one tool and use it as input to other tools. From this point we need to try to identify future needs, primarily interoperability and authority control. Matthew noted that Project CORC is an example of an attempt to make the Dublin Core and MARC data compatible in a single database.
Rebecca Guenther, the co-chair from MARBI, commented that it might be useful to analyze what has been learned from CORC. Possibly useful data include such things as: 1) points of friction, i.e., problems that have shown up; 2) whether users of CORC find they are doing things in MARC because there is no place for them in the Dublin Core, i.e., flaws in the DC (as someone commented, we need to overcome the hurdle of people trying to make DC do what MARC does); and 3) whether there are types of resources for which DC is better than MARC. Erik Jul and Eric Childress offered to compile such data for the Midwinter 2000 meeting.
Mark Watson commented that it seemed wrong to go from such large questions as interoperability to charge 5: modifying AACR2R rules!
Group 3 (Bill Fietzer) which considered charge 3:
Devising a definition of metadata and investigating the interoperability of newly emerging metadata schemes with the cataloging rules (AACR2R) and the USMARC format.
The Group started with working definitions culled from a variety of sources. The definitions depended on who the defining group was and what its purpose was. The Group ended up with the following definitions which are still works in progress:
Erik Jul suggested using FRBR terminology (find, identify, select, obtain, to which he added "use") in the Metadata definition so that it reads:
Mary Larsgaard asked if there are any changes we might need to make in AACR2 and/or MARC so that we can integrate them with other metadata schemes. Mary Woodley pointed out that content descriptions based on standards other than AACR2 can be loaded in MARC. She referred to the SCIPIO data which had been loaded in MARC with different authority control; no-one seems to have noticed these records!
Willy Cromwell-Kessler asked how we fit together pots of data which have different semantics so that they look seamless to users.
Erik Jul said it is possible to have multiple kinds of cataloging in a single online catalog. Referring back to the beginnings of the Intercat Project (1991) he said that it asked two questions: 1) Can we use AACR2/MARC to describe electronic resources? And 2) If we can, does it work? We aimed those questions inward, at ourselves. Now we are looking outward, at our users, but we need to ask the same two questions.
Group 2 (presented by Sherman Clarke since Diane Hillmann, the Chair of the Group, was not able to attend the first part of the Task Force meeting) which considered charge 2:
Building a conceptual map (or maps) of the resource-description terrain/landscape and developing models for accessing/using metadata both within and outside the library community.
Clarke said that the Conceptual Map of the Resource Description Landscape showed where metadata for various resources were found Before Computers, Presently, and Possibly in the Future. He noted that it mixed carrier, content, and seriality. Work needed to be done to develop thinking in multiple dimensions.
Diane Hillmann commented that the Conceptual Map was intended to be suggestive, not exhaustive.
Group 4 (Brad Eden) which considered charge 4:
Recommending ways in which libraries may best incorporate the use of metadata schemes into the current library methods of resource description and resource discovery.
The Group started by defining "prototype" as: a virtually seamless access to information and relevant retrieval of information from the user's point of view. The Group's next step was to identify existing experimental prototypes. For the future we need to evaluate them and to recommend best practice. They identified the following candidates to be considered as possible prototypes:
Other projects of interest
Mary Larsgaard agreed that the entire Task Force should concentrate its future attention on continuing Charge 4 by studying these prototypes and that Charge 5 would be secondary.
Mary Woodley pointed out that the A.K.A. project (a search engine designed to search across multiple databases simultaneously using one of a choice of vocabularies) would no longer exist after June 30.
Erik Jul said he was bothered by the implication of oneness in talk about integration. Willy Cromwell-Kessler said that she did not find that integration implied oneness.
Matthew Beacom commented that the preconference topics discussed today fit in well with the import of charge 4: recommending ways in which libraries may best incorporate the use of metadata schemes into the current library methods of resource description and resource discovery.
While people and economies depend on information, the exchange of information has been hindered by "potholes" of incompatible hardware and software. The World Wide Web forces us to recognize this as a big problem.
How do we solve this problem? By designing enabling technologies/standards. The W3C is a leader in this effort. The context for the problem requires us to recognize:
We also make the assumption that common architectural components (syntax, structure, semantics, protocols, etc.) help in solving the problem.
A resource description community is characterized by common semantic, structural and syntactic conventions for exchange of resource description information. Libraries are one resource description community; the conventions they use include AACR2R, MARC, LCSH, etc.
One enabling technology, a common syntax, is XML (eXtensible Markup Language). A markup language is a mechanism to define tags and the structural relationship between them in documents. "Extensible" means that the semantics is not defined; there is no pre-coordinated set of tags. XML is a subset of SGML, the Standard Generalized Markup Language (ISO 8879), and is composed of the most widely-used parts of SGML. As a subset of SGML, XML is able to benefit from SGML DTDs (Document Type Definitions), and SGML can validate XML instance data.
Among the important concepts of XML is the notion of well-formedness, which means that if you have a beginning tag, you must also have an ending tag. Another concept is validity checking. Since there is no such thing as an XML DTD the SGML DTD must be used for this purpose, at least until the XML schema now in progress is completed. The XML schema is a next-generation SGML DTD. Schemas enable resource communities to define and exchange vocabularies.
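The well-formedness rule described above can be illustrated with a minimal sketch in Python. It assumes the standard-library xml.etree.ElementTree parser; the helper name is_well_formed and the sample records are hypothetical. The parser checks well-formedness only and does no validity checking against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as a well-formed XML document.

    This checks well-formedness only; it does not validate the
    document against any DTD or schema.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample records for illustration.
good = "<record><title>Metadata notes</title></record>"
bad = "<record><title>Metadata notes</record>"  # <title> is never closed

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```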
SGML is character-based. For XML we need to be able to express different data types, e.g., ISO dates, floats, document numbers. XML uses Unicode. This enables it to handle many different kinds of characters but also makes it case sensitive.
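The points about case sensitivity, Unicode, and data types can be sketched briefly in Python, again assuming the standard-library xml.etree.ElementTree parser. The element names (Title, título, issued, price) and the sample values are hypothetical. Note that plain XML carries everything as character data, so the conversion to a date or a float happens in the application here.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical record; element names are chosen only for illustration.
record = ET.fromstring(
    "<record>"
    "<Title>Informe anual</Title>"
    "<título>Año 1999</título>"
    "<issued>1999-06-26</issued>"
    "<price>235.00</price>"
    "</record>"
)

# Element names are case sensitive: "Title" and "title" are different names.
has_upper = record.find("Title") is not None
has_lower = record.find("title") is not None

# Unicode allows non-ASCII characters in both names and content.
accented = record.findtext("título")

# Character data has to be converted to typed values by the application.
issued = date.fromisoformat(record.findtext("issued"))
price = float(record.findtext("price"))

print(has_upper, has_lower)
print(accented, issued.year, price)
```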
There are several data transmission methods:
Often syntax is not enough and a common structural representation for expressing statements is required. Another enabling technology, the RDF (Resource Description Framework), provides that common structure and semantics. This data model is designed to impose structural constraint on syntax; to support consistent encoding, exchange, and processing of metadata.
The basic structural element of RDF is the RDF statement and there is only one way to express a statement. A statement has 3 components: a resource, a property, and a value. A value can be either a literal string or a resource. A property type is a type of resource. By resource the RDF means anything that can be uniquely identified.
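The three-part statement described above can be modelled in a few lines of Python. This is a sketch of the data model only, not of any RDF library or serialization syntax. The class and function names, and the URIs under example.org, are hypothetical; the Dublin Core element URIs and the D-Lib article are used purely as sample data.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    """Anything that can be uniquely identified; here, by a URI string."""
    uri: str


@dataclass(frozen=True)
class Statement:
    resource: Resource            # what the statement is about
    property_type: Resource       # a property type is itself a resource
    value: Union[Resource, str]   # a literal string or another resource


DC = "http://purl.org/dc/elements/1.1/"
article = Resource("http://www.dlib.org/dlib/june97/06lagoze.html")
author = Resource("http://example.org/people/lagoze")  # hypothetical URI

statements = [
    Statement(article, Resource(DC + "title"), "From static to dynamic surrogates"),
    Statement(article, Resource(DC + "creator"), author),
    Statement(author, Resource("http://example.org/terms/name"), "Carl Lagoze"),
]


def describe(s: Statement) -> str:
    """Render a statement as one line: resource, property, value."""
    if isinstance(s.value, Resource):
        value = f"<{s.value.uri}>"
    else:
        value = f'"{s.value}"'
    return f"<{s.resource.uri}> <{s.property_type.uri}> {value} ."


for s in statements:
    print(describe(s))
```

Because a value may itself be a resource, the second and third statements chain together: the article's creator is a resource that has its own name property.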
A namespace is a place to go to find a vocabulary or standards for semantics; a way of defining context. Communities and individuals will declare their own namespaces; it won't matter if others use the same name as long as each name refers to a different URL (Uniform Resource Locator).
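A short Python sketch shows how two communities can both use the name "title" without conflict, because each name is qualified by a different namespace URL. It assumes the standard-library xml.etree.ElementTree parser, which writes a qualified name as {namespace}name. The example.org namespace and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set
EX_NS = "http://example.org/local-terms/"    # hypothetical local vocabulary

xml_text = (
    f'<record xmlns:dc="{DC_NS}" xmlns:ex="{EX_NS}">'
    "<dc:title>Functional Requirements for Bibliographic Records</dc:title>"
    "<ex:title>Dr.</ex:title>"
    "</record>"
)

record = ET.fromstring(xml_text)

# The same local name, "title", resolves to two different qualified names.
dc_title = record.find(f"{{{DC_NS}}}title")
ex_title = record.find(f"{{{EX_NS}}}title")

print(dc_title.tag, "->", dc_title.text)
print(ex_title.tag, "->", ex_title.text)
```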
Metadata is data and can be described.
What does it all mean? A common syntax, a common structure, and the ability to share semantics lead to a growing number of tools, both free and commercial, that support the creation, management, and navigation of structured information. These tools are ubiquitous in the web infrastructure in the form of browsers, servers, proxies, etc.
What does it all mean to the library community? The "hard problems" still remain to be solved and the library community has something to offer. We are familiar with concepts such as
The library community has a well-developed infrastructure already in place:
A role that the library community can play is to start using the W3C enabling technologies to decide what it means to start building within our own community.
Additional information can be found at:
Notes by Judith Hopkins, SUNY Buffalo