ALCTS - Association of Library Collections & Technical Services

Task Force on Metadata

Meeting Minutes, 28 June 1998

Renaissance Washington Hotel, Room 6, 8:30 am - 12:30 pm


The Task Force met in a small room where the first order of business was obtaining enough chairs to seat all persons attending. This being done, the group turned to its formal agenda, with Sherry Kelley, outgoing chair, presiding. (Note: Discussion items are numbered in the order of presentation, not the order in which they appeared on the agenda.)

  1. Presentations from metadata element set developers and implementers

    1. Encoded Archival Description
      Michael Fox, Division of Library and Archives, Minnesota Historical Society

      Mr. Fox started by presenting the context for his remarks. Papers on the EAD are available on both the EAD [http://www.loc.gov/ead/] and ALCTS web sites.

      Archives have large holdings, composed of collections. In the past they depended on hard-copy finding aids (multiple-level descriptions that reflect the hierarchical organization of a collection), but they are now sharing this information electronically. The EAD is a model composed of elements and a structure. It is also a syntax for rendering those elements in a way that is compliant with SGML [Standard Generalized Markup Language] and XML [eXtensible Markup Language]. EAD (see http://www.loc.gov/ead/eadback.html) is a communications format that was designed to be used by catalogers, with close ties to MARC, the Canadian archival description rules, and Steven L. Hensen’s Archives, Personal Papers and Manuscripts (APPM), which provide guidelines for the content of EAD records.

      An EAD document is a tool for searching, but it is also a container for resource delivery, since images of the material being described can be embedded in an EAD record. Libraries can use it to catalog finding aids for related collections in other institutions.

      The first issue: if a finding aid is metadata, what does one catalog, the finding aid or the collection itself? There are also issues related to the choice of title. The EAD contains an EAD header, similar to the TEI header. Thus a record has two separate titles: the collection title and the EAD header title.
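      The two-title issue can be made concrete with a small sketch. The following Python fragment is an invented illustration (not anything shown at the meeting) that builds a skeletal EAD-style instance in which the finding-aid title lives in the EAD header while the collection title lives in the archival description; the element names follow the EAD tag library, and the titles are hypothetical.

```python
# A minimal sketch of the two-title issue: a skeletal EAD-style
# instance with one title in the EAD header and another in the
# archival description. Illustrative only; attributes and required
# elements beyond these are omitted.
import xml.etree.ElementTree as ET

ead = ET.Element("ead")

# Title 1: the title of the finding aid itself, in the EAD header
# (which is similar to the TEI header, as noted above).
eadheader = ET.SubElement(ead, "eadheader")
filedesc = ET.SubElement(eadheader, "filedesc")
titlestmt = ET.SubElement(filedesc, "titlestmt")
ET.SubElement(titlestmt, "titleproper").text = "Guide to the Jane Doe Papers"

# Title 2: the title of the collection being described.
archdesc = ET.SubElement(ead, "archdesc", level="collection")
did = ET.SubElement(archdesc, "did")
ET.SubElement(did, "unittitle").text = "Jane Doe Papers, 1902-1956"

print(ET.tostring(ead, encoding="unicode"))
```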

      There is also an issue about MARC record connections. Multiple descriptions (standard MARC records) may be created, for the collection as a whole and for parts of it. The MARC record can be embedded in the EAD record, though the EAD doesn’t provide all the MARC element content designators, e.g., it lacks subfield codes.

      The report of the MARBI/CC:DA Joint Task Force on Metadata and the Cataloging Rules asked catalogers to focus on screen displays. Depending on the browser used, these displays may differ from one situation to another even when the underlying cataloging code is the same.

      Catalogers need to understand the underlying authoritativeness of the data. Can I rely on this record as valid and authoritative? For example, are the names under authority control?

      At Minnesota the catalogers create the MARC record first and then cut and paste it into the EAD record. They are issuing a Tag Library and hope to bring out implementation guidelines.
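      The Minnesota workflow, together with the loss of subfield codes noted above, might look roughly like the following sketch. This is an invented illustration, not Minnesota’s actual software, and the tag-to-element pairings are hypothetical rather than an official crosswalk.

```python
# An illustrative sketch of moving MARC content into EAD elements.
# Note that the MARC subfield codes are discarded: as noted above,
# EAD lacks subfield codes, so subfield values are simply
# concatenated into the element text.

# A hypothetical MARC field: (tag, [(subfield code, value), ...]).
marc_245 = ("245", [("a", "Jane Doe papers,"), ("f", "1902-1956.")])

# Hypothetical crosswalk from MARC tags to EAD element names.
MARC_TO_EAD = {
    "245": "unittitle",    # title statement -> collection title
    "100": "origination",  # main entry, personal name -> creator
    "520": "scopecontent", # summary note -> scope and content
}

def marc_field_to_ead(tag, subfields):
    """Flatten one MARC field into an EAD-style element string."""
    element = MARC_TO_EAD[tag]
    text = " ".join(value for code, value in subfields)  # codes lost here
    return f"<{element}>{text}</{element}>"

print(marc_field_to_ead(*marc_245))
# <unittitle>Jane Doe papers, 1902-1956.</unittitle>
```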

      The EAD is the property of the Society of American Archivists (SAA), but it is maintained by LC.

      Questions from other members of the Task Force followed.

      Q. Why create MARC or EAD records? Why not just mark up the finding aid in HTML?
      A. HTML is designed for screen presentation; it does not aid in content manipulation.

      Q. What should libraries be doing in terms of resource allocation?
      A. For new work the cost is no greater than was the cost of past practices. Recon of legacy data is different, and will require additional resources.
      Comment (Stuart Weibel): The focus should be on the user populations. What will the users be wanting to find and how can you help them find it? Bill Moen agreed, noting that it is better to use a title used by users than the precise collection title.

    2. Dublin Core
      Stuart Weibel, OCLC Office of Research and Special Projects

      Mr. Weibel focused on the concept of the Internet Commons: the Internet has made an international community of all the persons who use it. Among the issues he raised were: Why do we need another content-description tool? Because the OPAC can no longer be the center of the world. Searches done in OPACs today probably account for only a small fraction of all searches, most of which are done with Web search engines. MARC and AACR2 will not be adopted by the rest of the world. The success of Yahoo! should be a clear wake-up call and an indication of the direction in which we should go. We need to build bridges between communities.

      The Dublin Core is international in scope with participants coming from about 20 different countries and pilot projects under way in about 10 countries. It provides a syntax for resource discovery. Its major functions are to serve as a switching language (semantics) and to provide access to databases with different underlying schemas (structural).

      The Dublin Core provides bridges across the differences, electronic and real-life, between communities. For example, archives provide many ways to access one type of material, while libraries provide a single way to access many types of materials.

      It may not be necessary to create Dublin Core records; instead, the Dublin Core can be used to provide an interface between existing descriptions. The Dublin Core can be used as a window into databases without modifying or converting existing data, by mapping existing schemas into Dublin Core semantics. While such mapping will inevitably result in some loss of precision, the countervailing benefit is a single conceptual model through which users can access many disparate resource stores; that is, the Dublin Core can serve as a switching language among many existing database schemas.
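      As a rough sketch of this switching-language idea (an invented illustration, not part of the presentation): two hypothetical local schemas, one library-like and one museum-like, are each mapped onto shared Dublin Core element names, so that a single query can be run against both without converting any underlying records.

```python
# Dublin Core as a switching language: map two disparate
# (hypothetical) database schemas onto shared DC element names, then
# search both through the single DC view. No records are converted;
# only the query layer knows about the mappings.

# Per-repository mappings from Dublin Core elements to local fields.
LIBRARY_MAP = {"title": "245a", "creator": "100a", "subject": "650a"}
MUSEUM_MAP = {"title": "object_name", "creator": "maker", "subject": "keywords"}

def dc_view(record, mapping):
    """Expose a local record through Dublin Core semantics."""
    return {dc: record.get(local) for dc, local in mapping.items()}

def search(dc_element, term, sources):
    """Run one DC-level query against every mapped repository."""
    hits = []
    for records, mapping in sources:
        for rec in records:
            value = dc_view(rec, mapping).get(dc_element) or ""
            if term.lower() in value.lower():
                hits.append(dc_view(rec, mapping))
    return hits

library = [{"245a": "Prairie Landscapes", "100a": "Doe, Jane", "650a": "Prairies"}]
museum = [{"object_name": "Prairie Sunset (oil)", "maker": "Jane Doe",
           "keywords": "prairies; painting"}]

# One query, two very different underlying schemas.
print(search("title", "prairie", [(library, LIBRARY_MAP), (museum, MUSEUM_MAP)]))
```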

      Diane Hillmann is preparing a preliminary set of guidelines for use of the Dublin Core. There is an effort to make the Dublin Core extensible, to provide an interdisciplinary approach. Efforts are also under way to standardize it through preparation of IETF RFCs [Internet Engineering Task Force Requests for Comments]. It is expected that formal NISO [National Information Standards Organization] and perhaps ISO [International Organization for Standardization] standardization will follow.

      Q. Who is “we”?
      A. There is a Dublin Core Directorate formed of two committees that guide the development of the Dublin Core. The Technical Advisory Committee is composed of leaders of the various Dublin Core working groups. The Policy Advisory Committee provides input from the groups that are major stakeholders. One thing they have to do is to find a way of supporting the Dublin Core financially. OCLC is currently doing so but perhaps groups such as the IETF, W3C [World Wide Web Consortium], NSF [National Science Foundation] and even the European Community might take a role.

      Q. What is your view on the suggestion made by some library administrators looking at cost measures that Dublin Core records should replace MARC records?
      A. That was not the intention of the Dublin Core initiative. Libraries are unlikely to abandon MARC; they have too much invested in it.

    3. CIMI Project
      William Moen, School of Library and Information Sciences, University of North Texas

      CIMI is the Consortium for the Computer Interchange of Museum Information, a membership organization of international museum groups. It exists to coordinate standards and to develop and test specifications for migrating data across systems in a meaningful way. It has not developed a metadata standard.

      CIMI does not assume the centrality of the OPAC as a gateway to information resources. Instead, it assumes the existence of multiple disparate repositories of data: art museums, natural history museums, etc. The Dublin Core offers CIMI a means for expressing queries and for accessing databases. CIMI has defined additional elements appropriate to museums, e.g., Provenance. However, it assumes that there will not be one metadata standard, but that the Z39.50 standard, an information retrieval protocol that supports communication among different information systems, will provide access to all.
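      How a single Z39.50 query can reach systems with different internal schemas can be sketched as follows. A query carries abstract attribute-value pairs rather than local field names, and each server maps those attributes onto its own fields. The use-attribute values shown (4 = title, 1003 = author) come from the Bib-1 attribute set; everything else is an invented, simplified model, not a wire-level protocol implementation.

```python
# A simplified model of Z39.50 query abstraction. The client states
# *what* it wants via Bib-1 use attributes; each server decides which
# of its internal fields those attributes correspond to.

from dataclasses import dataclass

# Bib-1 use attributes (values taken from the Bib-1 attribute set).
USE_TITLE = 4
USE_AUTHOR = 1003

@dataclass
class AttributePlusTerm:
    """One leaf of a Type-1 query: a use attribute plus a search term."""
    use_attribute: int
    term: str

# Hypothetical server profiles mapping use attributes to local fields.
ART_MUSEUM_PROFILE = {USE_TITLE: "object_name", USE_AUTHOR: "artist"}
NATURAL_HISTORY_PROFILE = {USE_TITLE: "specimen_label", USE_AUTHOR: "collector"}

def resolve(query, profile):
    """Translate the abstract query into a server-local (field, term) search."""
    local_field = profile.get(query.use_attribute)
    if local_field is None:
        raise ValueError("use attribute unsupported by this server")
    return (local_field, query.term)

query = AttributePlusTerm(USE_TITLE, "prairie")
print(resolve(query, ART_MUSEUM_PROFILE))       # ('object_name', 'prairie')
print(resolve(query, NATURAL_HISTORY_PROFILE))  # ('specimen_label', 'prairie')
```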

      Information about CIMI can be found at: http://www.cimi.org

      In January 1998 CIMI started to run a Dublin Core Testbed to see if it could search and retrieve across various types of databases: text files, graphical databases, etc. (For an announcement describing the Dublin Core Testbed, see http://www.cimi.org/documents/metafinalPD.html.)

      The testbed is meant to test both implicit and explicit assumptions about the Dublin Core, e.g., Is it easy to create and use? Is it cost effective? What museum-centric qualifiers are needed? The report on this project is due in the first quarter of 1999. Following that will be a project to test a large repository of DC records.

      Q. Robin Wendler (Harvard) asked about the natural history participation in the testbed project.
      A. The focus has been on the cultural history elements, but placeholders have been provided for different types of elements. There may be separate specifications for using Z39.50 with lab specimens, etc.
      A. Sherry Kelley (Smithsonian) noted that the Smithsonian Institution has its own test project involving many different groups but that none are from the natural history community.

      Q. How do you plan to go about testing the assumptions?
      A. While they have identified about 12-15 assumptions, the first phase has focused on 3 (with several sub-assumptions) related to creation of Dublin Core records. (See http://www.cimi.org/documents/DC_hypothoses.html for a document that identifies these assumptions.)

      Comment (Robin Wendler): Referring to a phrase that had been widely used this morning, “Not your grandfather’s OPAC,” she noted that at Harvard they are creating multiple OPACs for different types of resources, e.g., Law School portraits in an Art database.

  2. Search Interfaces and Interoperability

    After the completion of these three presentations from metadata set developers and implementers, Willy Cromwell-Kessler (Research Libraries Group) spoke on search interfaces and interoperability. That, at least, was the formal title of her talk, but she said that its actual scope was the issue of integration for the user, based on her work with various museum-structured databases for RLG (which serves as the financial manager for CIMI).

    While earlier discussion had covered many different data sets, we now need a system that can bring them together. In the library community we have done that by standardizing the data exchange format (MARC) and the content description (AACR2, LCSH and other subject authority lists, internationally accepted classification schemes). It is unlikely, however, that these tools will work in the other communities that have been discussed.

    We need to make data retrieved from different systems meaningful for users. We do that by establishing sets of equivalents between different sets of data. We need an overarching user interface that can deal with these various systems, one that takes into account the different syntaxes they use. MARC uses linking fields to show how fields relate to each other; other systems use different means. There isn’t even a one-to-one mapping between elements in different systems.
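    One way to picture such sets of equivalents is sketched below; this is an invented illustration with hypothetical field names. Each display slot in the interface lists the source elements, possibly several per system, that feed it, which accommodates the absence of one-to-one mappings.

```python
# A sketch of interface-level "sets of equivalents": each display slot
# names the (possibly multiple) source elements that feed it in each
# system. Schemas and field names are invented for illustration.

EQUIVALENTS = {
    # display slot : per-system source elements (note: not one-to-one)
    "title": {"marc": ["245a", "245b"], "museum": ["object_name"]},
    "date":  {"marc": ["260c"], "museum": ["date_begun", "date_completed"]},
}

def display_value(slot, system, record):
    """Assemble one display slot from whichever fields the system has."""
    parts = [record[f] for f in EQUIVALENTS[slot][system] if f in record]
    return " ".join(parts) if parts else None

marc_rec = {"245a": "Prairie landscapes :", "245b": "a survey,", "260c": "1956."}
museum_rec = {"object_name": "Prairie Sunset", "date_begun": "1902",
              "date_completed": "1905"}

print(display_value("title", "marc", marc_rec))    # Prairie landscapes : a survey,
print(display_value("date", "museum", museum_rec)) # 1902 1905
```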

    Among the issues that must be dealt with are:

    1. How do we deal with legacy data?
    2. What do we do to regularize display? Incorporate thesauri into interfaces?
    3. How can one send a Z39.50 query that will incorporate thesauri data?
    4. A new OPAC is defining itself; it may be different in different environments and for different users.

    Ms. Cromwell-Kessler referred to the AMICO (Art Museum Image Consortium) project between the University of California and Harvard (cf. http://www.archimuse.com/papers/amico.spectra.9708.html).

    Q. Are the 120 or so data elements in AMICO organized in supergroups?
    A. They are closest to those in Categories for the Description of Works of Art (CDWA).
    Bill Moen: When searching across databases there will be a loss of precision and specificity.

    Q. The AMICO project sounds as if it incorporates thesauri authority records into the OPAC.
    A. There is a need for experimentation in how to use thesauri, including multilingual thesauri, to retrieve material. Putting things at the front end is a way of better serving users rather than of saving resources.
    Stu Weibel: Resource savings can come from re-using existing resources, e.g., a MARC AACR2 record.

    Priscilla Caplan mentioned a metadata effort she thought the group should know about. She pointed out that libraries get lots of information from many sources, one of which is publishers. Each publisher creates its own system. With the scientific-technical-medical (STM) publishing community organizing itself around rights management for digital data, publishers are showing that they recognize that they create data that could be used across the entire publishing community, and beyond. They are creating a basic standard. Art and music publishers, as well as STM publishers, are getting involved. They are talking about highly structured, coded, authority-controlled data. Much of the work is going on under the auspices of the British Book Industry. There has been very little input as yet from the abstracting and indexing community.

  3. Summary of the report from the CC:DA Task Force on Metadata and the Cataloging Rules, which Sherry Kelley had chaired and for which John Attig had served as editor.

    Sherry and John noted that the report was based on the assumptions that other metadata (beyond AACR2/MARC records) would be incorporated into OPACs, that the OPAC is central, and that some of the materials cataloged would be digital.

    The Task Force had looked at two metadata standards, the TEI header and the Dublin Core, and had evaluated them from the standpoint of catalog records: How well do they map into AACR2 and MARC? It tried to make comparisons on the basis of content, not of syntax.

    Q. Are there ways of indicating within metadata its authoritativeness?
    A. (S.K.) You must study the guidelines for a particular metadata set to learn what they provide for content.
    A. (J.A.) Some metadata provide a mechanism for indicating sources.
    Stu Weibel: We trust in cataloging because we trust the source of the information (LC, OCLC, RLG). In the future that trust will have to be broadened to include other sources.
    J.A. Those sources (LC, OCLC, RLG) are trustworthy because of the commonly accepted standards that lie behind them.
    Bill Moen: We now have the opportunity to re-think how we are going to present information to users. The different metadata schemes, coming from different perspectives, imply that they have different users.
    Willy Cromwell-Kessler: We are moving away from an environment where we have commonly accepted definitions; it’s like moving from linear to three-dimensional chess.

    Sherry Kelley had to leave at this point and Mary Larsgaard assumed the chair.

    Robin Wendler noted that we are dealing with changes on many levels: more kinds of materials, more kinds of user demands, changing technology, current systems that we have to keep running, and the need to save money.

    Stu Weibel said he was unhappy about the tone of the report; he thought it reflected a fortress mentality. We have to work towards new models that are positive, that take into account user needs and ways to satisfy those needs. We cannot be defensive.

    Bill Moen asked whether it had been a conscious decision to distinguish between metadata and bibliographic records based on accepted standards.

    Jackie Shieh pointed out that there are many different types of brief records in local databases: temporary reserve records, on-order records, collection-level records, in-process records. The OPAC is already full of such records, and users can cope with them.

    Brad Eden thought the report was an attempt to alert CC:DA to the fact that changes are occurring rapidly and that it needs to react rapidly. In fact, things had changed rapidly even during the term of the Task Force. Things are changing ever more rapidly, and we cannot wait for CC:DA to propose changes. There will be less standardization in the future as we make the changes we need locally.

    Diane Hillmann said that we bring something to other venues: experience with categorizing data, experience with large databases, etc. Even if we don’t all catalog according to the same specific rule we are all imbued with the same outlook and approach.

    John Attig responded that this was something we had to get out of the way; there is no way that one catalog will be able to deal with everything.

    Priscilla Caplan said that we should not base our work on this report. She noted that she disagreed with all its conclusions and thought it would give people wrong impressions.

    Turning to the specific Recommendations in the report, John Attig noted that the existing cataloging rules allow us to use metadata as sources, that we do not really need changes in the rules in order to use metadata.

    Bob Thomas (WLN) suggested that it might be time to do away with the concept of a chief source of information: use ALL sources to create the best access.

    Willy Cromwell-Kessler commented that the models are changing.

    Diane Hillmann emphasized the importance of going back to the four user tasks (to find, identify, select, and obtain) set forth in the IFLA report on Functional Requirements for Bibliographic Records. The purpose of description has more to do with providing access than with actually describing an object.

    Stu Weibel asked if the allocation of resources within libraries would change as a result of these activities.

  4. Draft Charge for the Task Force

    Dan Kinney, chair of CC:DA, reported on the draft charge of the joint CC:DA/MARBI task force. As it then stood, the charge read:

    “The Metadata Task Force is charged with, but not limited to:

    1. Analyzing and documenting the impact of metadata schemes on well-established library standards, such as AACR2 and MARC. Specifically, the Task Force is charged with monitoring metadata groups, evaluating implementation projects utilizing metadata schemes, and evaluating data mapping between metadata schemes. The Task Force shall assess the consequences of integrating records containing various resource descriptions into library databases, evaluate mechanisms for integration, and recommend appropriate measures for libraries.
    2. As needed, prepare rule revision proposals and discussion papers.
    The Task Force should report to CC:DA at midwinter and annual conferences and, in general, inform ALA and the library community about metadata development.”

    Many people present had problems with the language of the draft charge. Among the questions raised were whether it is appropriate for a CC:DA Task Force to study whether the existing AACR2 sources are the only ones appropriate for bibliographic records, and whether it would be possible for the group to come up with deliverables. Stu Weibel, pointing out that the library community is waiting for leadership, thought the latter question was key. Robin Wendler replied that we have to lead from a position of knowledge and understanding, which we now lack. Stu retorted that there comes a time when we have to act even if our action doesn’t work.

    Bill Moen pointed out that there is no one answer. Brad Eden noted that we have to work on these questions daily via an electronic discussion list, not just semi-annually at ALA conferences.

    Priscilla Caplan asked if a few specific problems could be identified that could be solved within the context of the charge.

    Mary Larsgaard phrased the goal as: making it quick and easy for users to find things through the Web, with high precision.

    Priscilla Caplan said that the Dublin Core has lots of flexibility; what is needed are guidelines for library use of the Dublin Core elements. Diane Hillmann said librarians are experienced in being mediators; now we are also generating resources (through scanning, etc.). She saw the Dublin Core as providing metadata elements. What we need to do is to adjust our focus in terms of our own roles.

    Rhonda Marker asked what Priscilla meant by guidelines. Priscilla replied by giving an example: in the library community, is one allowed to put non-controlled headings in a record?

    Stu Weibel asked Diane Hillmann (who is working on a user guide for the Dublin Core) if that guide was aimed at librarians. The answer was no.

    Some of the things a set of guidelines might do would be to suggest ways for librarians to provide qualifiers for Dublin Core elements, to answer questions such as where the created records are to reside and what software will be used to access them.

    Willy Cromwell-Kessler said the above comments implied endorsement of using the Dublin Core as a switching mechanism.

    Bill Moen said there was a need to clarify the definitions in the Task Force charge. There is an ecology of metadata of which library cataloging based on AACR2/MARC is just one type.

    The meeting was adjourned at 12:30 pm.