Committee on Cataloging: Description & Access
Task Force on Metadata
Meeting Minutes, 28 June 1998, Renaissance Washington Hotel, Room 6, 8:30 am - 12:30 pm
The Task Force met in a small room where the first order of business was obtaining enough chairs to seat all persons attending. This being done, the group turned to its formal agenda, with Sherry Kelley, outgoing chair, presiding. (Note: Discussion items are numbered in the order of presentation, not the order in which they appeared on the agenda.)
Michael Fox, Division of Library and Archives, Minnesota Historical Society
Mr. Fox began by presenting the context for his remarks. Both the EAD web site [http://www.loc.gov/ead/] and the ALCTS web pages have a paper on the EAD.
Archives have large holdings, composed of collections. In the past they depended on hard-copy finding aids (multiple-level descriptions which reflect the hierarchical organization of a collection), but are now sharing information electronically. The EAD is a model composed of elements and a structure. It is also a syntax for rendering those elements in a way that is compliant with SGML [Standard Generalized Markup Language] and XML [Extensible Markup Language]. EAD (see http://www.loc.gov/ead/eadback.html) is a communications format that was designed to be used by catalogers, with close ties to MARC, the Canadian archives rules, and Steven L. Hensen's Archives, Personal Papers and Manuscripts (APPM), which provide guidelines for the contents of EAD records.
An EAD document is a tool for searching but it is also a container for resource delivery as images of the data being described can be embedded in an EAD record. Libraries can use it to catalog finding aids for related collections in other institutions.
The first issue: if a finding aid is metadata, what do you catalog, the finding aid or the collection itself? There are also issues related to the choice of title. The EAD contains an EAD header, similar to the TEI header. Thus a record has two separate titles: the collection title and the EAD header title.
There is also an issue about MARC record connections. There may be multiple descriptions created (standard MARC records) for the collection as a whole and for parts of it. The MARC record can be embedded in the EAD record, though the EAD doesn't provide all the MARC element content designators; e.g., it lacks subfield codes.
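The two-title issue above can be made concrete with a small sketch. The fragment below is not a complete, valid EAD instance, and the collection name is hypothetical, but the element names (eadheader, titleproper, archdesc, did, unittitle) follow the EAD tag library: the header title names the finding aid itself, while the unittitle names the collection being described.

```python
import xml.etree.ElementTree as ET

# Minimal, illustrative EAD-like fragment (not a valid EAD document).
ead_fragment = """
<ead>
  <eadheader>
    <filedesc>
      <titlestmt>
        <titleproper>Register of the Jane Doe Papers</titleproper>
      </titlestmt>
    </filedesc>
  </eadheader>
  <archdesc level="collection">
    <did>
      <unittitle>Jane Doe Papers, 1901-1950</unittitle>
    </did>
  </archdesc>
</ead>
"""

root = ET.fromstring(ead_fragment)
# Title of the finding aid (the metadata document itself).
finding_aid_title = root.findtext("eadheader/filedesc/titlestmt/titleproper")
# Title of the collection the finding aid describes.
collection_title = root.findtext("archdesc/did/unittitle")
print(finding_aid_title)
print(collection_title)
```

A cataloger deciding "what do you catalog?" is choosing which of these two titles anchors the record.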
The report of the MARBI/CC:DA Joint Task Force on Metadata and the Cataloging Rules asked catalogers to focus on screen displays. Depending on the browser used, these may differ from one situation to another even if the underlying cataloging code is the same.
Catalogers need to understand the underlying authoritativeness of the data. Can I rely on this record as valid and authoritative? For example, are the names under authority control?
At Minnesota the catalogers create the MARC record first and then cut and paste it into the EAD record. They are issuing a Tag Library and hope to bring out implementation guidelines.
The EAD is the property of the Society of American Archivists (SAA) but its maintenance is done by LC.
Questions from other members of the Task Force followed.
Q. Why create MARC or EAD records? Why not just mark up the finding aid in HTML?
Q. What should libraries be doing in terms of resource allocation?
Mr. Weibel focused on the concept of the Internet Commons: the Internet has made an international community of all the persons who use it. Among the issues he raised were: Why do we need another content-description tool? Because the OPAC can no longer be the center of the world. Searches done in OPACs today probably represent only a small proportion of the searches done with search engines. MARC and AACR2 will not be adopted by the rest of the world. The success of Yahoo! should be a clear wake-up call and an indication of the direction in which we should go. We need to build bridges between communities.
The Dublin Core is international in scope with participants coming from about 20 different countries and pilot projects under way in about 10 countries. It provides a syntax for resource discovery. Its major functions are to serve as a switching language (semantics) and to provide access to databases with different underlying schemas (structural).
The Dublin Core bridges differences between communities. For example, archives provide many ways to access one type of material, while libraries provide a single way to access many types of materials.
It may not be necessary to create Dublin Core records; instead, the Dublin Core can be used to provide an interface between existing descriptions. The Dublin Core can be used as a window into databases without modifying or converting existing data, but rather by mapping existing schemas into Dublin Core semantics. While such mapping will inevitably result in some loss in precision, the countervailing benefit is a single conceptual model for users to access many disparate resource stores... that is, Dublin Core can serve as a switching language among many existing database schemas.
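The "switching language" idea described above can be sketched in a few lines. This is a hedged illustration only: the two local schemas and their field names are hypothetical, and the crosswalk keeps just three Dublin Core elements (Title, Creator, Date) to show how mapping yields a single conceptual model at some cost in precision.

```python
# Hypothetical crosswalks from two native schemas into Dublin Core elements.
DC_CROSSWALKS = {
    "library": {"main_title": "Title", "author": "Creator", "pub_year": "Date"},
    "museum": {"object_name": "Title", "maker": "Creator", "date_made": "Date"},
}

def to_dublin_core(record, schema):
    """Map a native record into Dublin Core semantics.

    Native fields with no crosswalk entry are simply dropped, which
    illustrates the loss of precision the mapping entails.
    """
    crosswalk = DC_CROSSWALKS[schema]
    return {dc: record[field] for field, dc in crosswalk.items() if field in record}

library_rec = {"main_title": "Moby-Dick", "author": "Melville, Herman", "pub_year": "1851"}
museum_rec = {"object_name": "Scrimshaw whale tooth", "maker": "Unknown", "date_made": "ca. 1850"}

# One query model ("show me Title and Creator") now works over both stores.
for rec, schema in [(library_rec, "library"), (museum_rec, "museum")]:
    dc = to_dublin_core(rec, schema)
    print(dc["Title"], "/", dc["Creator"])
```

The existing databases are untouched; only the thin mapping layer knows both vocabularies.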
Diane Hillmann is preparing a preliminary set of guidelines for use of the Dublin Core. There is an effort to make the Dublin Core extensible, to provide an interdisciplinary approach. Efforts are also underway to standardize it through preparation of IETF RFCs [Internet Engineering Task Force Requests for Comments]. It is expected that formal NISO [National Information Standards Organization] and perhaps ISO [International Organization for Standardization] standardization will follow.
Q. Who is "we"?
Q. What is your view on the suggestion, made by some library administrators looking at cost measures, that Dublin Core records should replace MARC records?
CIMI is the Consortium for the Computer Interchange of Museum Information, a membership organization of international museum groups. It exists to coordinate standards and to develop and test specifications for standardizing the migration of data across systems in a meaningful way. It has not developed a metadata standard.
CIMI does not assume the centrality of the OPAC as a gateway to information resources. Instead, it assumes the existence of multiple disparate repositories of data: art museums, natural history museums, etc. The Dublin Core offers CIMI a means for expressing queries and for accessing databases. CIMI has defined additional elements appropriate to museums, e.g., Provenance. However, it assumes that there will not be one metadata standard but that Z39.50, an information retrieval protocol that supports communication among different information systems, will provide access to all.
Information about CIMI can be found at: http://www.cimi.org
In January 1998 CIMI started to run a Dublin Core Testbed to see if it could search and retrieve across various types of databases: text files, graphical databases, etc. (For an announcement describing the Dublin Core Testbed see http://www.cimi.org/documents/metafinalPD.html.)
The testbed is meant to test both implicit and explicit assumptions about the Dublin Core, e.g., Is it easy to create and use? Is it cost effective? What museum-centric qualifiers are needed? The report on this project is due in the first quarter of 1999. Following that will be a project to test a large repository of DC records.
Q. Robin Wendler (Harvard) asked about natural history museum participation in the testbed project.
Q. How do you plan to go about testing the assumptions?
Q. Robin Wendler referred to a phrase that had been widely used this morning: "Not your grandfather's OPAC." She noted that at Harvard they are creating multiple OPACs for different types of resources, e.g., Law School portraits in an Art database.
After the completion of these three presentations from metadata set developers and implementers, Willy Cromwell-Kessler (Research Libraries Group) spoke on search interfaces and interoperability. That at least was the formal title of her talk, but she said that its actual scope was the issue of integration for the user, based on work with various sets of museum-structured databases for RLG (which serves as the financial manager for CIMI).
While earlier discussion had covered many different data sets, we now need a system that can bring them together. In the library community we have done that by standardizing the data exchange format (MARC) and the content description (AACR2, LCSH and other subject authority lists, internationally accepted classification schemes). It is unlikely, however, that these tools will work in the other communities that have been discussed.
We need to make data retrieved from different systems meaningful for users. We do that by establishing sets of equivalents between different sets of data. We need an overlying user interface that can deal with these various systems, one that takes into account the different syntaxes they use. MARC uses linking fields to show how fields relate to each other; other systems use different means. There isn't even a one-to-one mapping between elements in different systems.
Among the issues that must be dealt with are:
Ms. Cromwell-Kessler referred to the AMICO (Art Museum Image Consortium) project between the University of California and Harvard. cf. http://www.archimuse.com/papers/amico.spectra.9708.html
Q. Are the 120 or so data elements in AMICO organized in supergroups?
Q. The AMICO project sounds as if it is incorporating thesauri and authority records into the OPAC.
Priscilla Caplan mentioned a metadata effort she thought the group should know about. She pointed out that libraries get information from many sources; one such source is publishers. Each publisher creates its own system. With the Scientific-Technical-Medical (S-T-M) publishing community organizing itself around rights management for digital data, publishers are showing that they recognize they create data that could be used across the entire publishing community, and beyond. They are creating a basic standard. Art and music publishers, as well as S-T-M publishers, are getting involved. They are talking about highly structured, coded, authority-controlled data. Much of the work is going on under the auspices of the British book industry. There has been very little input as yet from the abstracting and indexing community.
Sherry and John noted that the report was based on the assumptions that other metadata (beyond AACR2/MARC records) would be incorporated into OPACs, that the OPAC is central, and that some of the materials cataloged would be digital.
The Task Force had looked at two metadata standards: the TEI header and the Dublin Core and had evaluated them on the basis of catalog records. How well do they map into AACR2 and MARC? They tried to make comparisons on the basis of content, not of syntax.
Q. Are there ways of indicating within metadata its authoritativeness?
Sherry Kelley had to leave at this point and Mary Larsgaard assumed the chair.
Robin Wendler noted that we are dealing with changes on many levels: more kinds of materials, more kinds of user demands, changing technology, current systems that we have to keep running, and the need to save money.
Stu Weibel said he was unhappy about the tone of the report; he thought it reflected a fortress mentality. We have to work towards new models that are positive, that take into account user needs and ways to satisfy those needs. We cannot be defensive.
Bill Moen asked if it had been a conscious decision to distinguish between metadata and bibliographic records based on accepted standards?
Jackie Shieh pointed out that there are many different types of brief records in local databases: temporary reserve records, on-order records, collection-level records, in-process records. The OPAC is already full of such records, and users can cope with them.
Brad Eden thought the report was an attempt to alert CC:DA to the fact that changes are occurring rapidly and that it needs to react rapidly. In fact, things had changed markedly even during the term of the Task Force. Things are changing ever more rapidly, and we cannot wait for CC:DA to propose changes. There will be less standardization in the future as we make the changes we need locally.
Diane Hillmann said that we bring something to other venues: experience with categorizing data, experience with large databases, etc. Even if we don't all catalog according to the same specific rules, we are all imbued with the same outlook and approach.
John Attig responded that this was something we had to get out of the way; there is no way that one catalog will be able to deal with everything.
Priscilla Caplan said that we should not base our work on this report. She noted that she disagreed with all its conclusions; and thought it would give people wrong impressions.
Turning to the specific Recommendations in the report, John Attig noted that the existing cataloging rules allow us to use metadata as sources, that we do not really need changes in the rules in order to use metadata.
Bob Thomas (WLN) suggested that it might be time to do away with the concept of chief sources of data. Use ALL sources to create the best access.
Willy Cromwell-Kessler commented that the models are changing.
Diane Hillmann emphasized the importance of going back to the four user tasks set forth in the IFLA report on Functional Requirements for Bibliographic Records. The purpose of description has more to do with providing access than with actually describing an object.
Stu Weibel asked if the allocation of resources within libraries would change as a result of these activities.
Dan Kinney, chair of CC:DA, reported on the draft charge of the joint CC:DA /MARBI task force. As it then stood, the charge read:
The Metadata Task Force is charged with, but not limited to:
Many people present had problems with the language of the draft charge. Among the questions raised were whether it was appropriate for a CC:DA Task Force to study whether the existing AACR2 sources are the only ones appropriate for bibliographic records, and whether it would be possible for the group to come up with deliverables. Stu Weibel, pointing out that the library community is waiting for leadership, thought the latter question was key. Robin Wendler replied that we have to lead from a position of knowledge and understanding, which we now lack. Stu retorted that there comes a time when we have to act even if our action doesn't work.
Bill Moen pointed out that there is no one answer. Brad Eden noted that we have to work on these questions daily via an electronic discussion list, not just semi-annually at ALA conferences.
Priscilla Caplan asked if a few specific problems could be identified that could be solved within the context of the charge.
Mary Larsgaard phrased the goal as: making it quick and easy for users to find things through the Web with high precision.
Priscilla Caplan said that the Dublin Core has lots of flexibility; what is needed are guidelines for library use of the Dublin Core elements. Diane Hillmann said librarians are experienced in being mediators; now we are also generating resources (through scanning, etc.). She saw the Dublin Core as providing metadata elements. What we need to do is to adjust our focus in terms of our own roles.
Rhonda Marker asked what Priscilla meant by guidelines. Priscilla replied with an example: In the library community, is one allowed to put non-controlled headings in a record?
Stu Weibel asked Diane Hillmann (who is working on a user guide for the Dublin Core) if that guide was aimed at librarians. The answer was no.
Some of the things a set of guidelines might do would be to suggest ways for librarians to provide qualifiers for Dublin Core elements, and to answer questions such as where created records are to reside and what software will be used to access them.
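One way such guidelines might standardize qualifiers can be sketched as follows. This is illustrative only: it renders qualified Dublin Core elements as HTML meta tags using the dotted "element.qualifier" convention common in Dublin Core discussions of the period, and the qualifier and scheme values shown (Created, ISO8601, LCSH) are examples, not a prescribed list.

```python
def dc_meta(element, content, qualifier=None, scheme=None):
    """Render one qualified Dublin Core element as an HTML <meta> tag.

    Illustrative sketch: the "DC.Element.Qualifier" naming and the
    scheme attribute follow one proposed convention, not a standard
    library guideline.
    """
    name = f"DC.{element}" + (f".{qualifier}" if qualifier else "")
    scheme_attr = f' scheme="{scheme}"' if scheme else ""
    return f'<meta name="{name}"{scheme_attr} content="{content}">'

tags = [
    dc_meta("Title", "Finding Aid to the Jane Doe Papers"),
    dc_meta("Date", "1998-06-28", qualifier="Created", scheme="ISO8601"),
    dc_meta("Subject", "Women authors, American", scheme="LCSH"),
]
print("\n".join(tags))
```

Guidelines would then need to say which qualifiers and schemes library-created records may use, and where such records reside.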
Willy Cromwell-Kessler said the above comments implied endorsement of using the Dublin Core as a switching mechanism.
Bill Moen said there was a need to clarify the definitions in the Task Force charge. There is an ecology of metadata of which library cataloging based on AACR2/MARC is just one type.
The meeting was adjourned at 12:30 pm.