ALCTS - Association for Library Collections & Technical Services

Task Force on Metadata

Meeting Minutes

Sunday, June 27, 1999 8:30 a.m. - 12:30 p.m.
Fairmont Hotel, Bayou II

  1. The Task Force chair, Mary Larsgaard, opened the meeting by describing plans for the 2000 pre-conference.

    The CC:DA 2000 Metadata subcommittee met on Saturday, June 26, to plan the pre-conference, which will be held at the 2000 Annual Conference. It is to have a practical orientation, describing options for providing access to electronic resources and the pros and cons of the various options. It will last two days (Thursday-Friday); the estimated cost is $235.00. OCLC will sponsor the coffee breaks.

    The tentative outline is:

    • Overview by a library director with cataloging experience
    • Use of AACR2 and MARC21 (with emphasis on recent changes)
    • Seriality
    • Alternate approaches: CORC, ISO standards of interest (e.g., the date standard)
    • Non-library and consortial groups working with the web, e.g., the Colorado group, EAD, Art world
    • Creating metadata for items scanned in-house: What kinds of metadata do you create for that?
    • Vision: What are the ideal methods for providing access to materials on the web?

    The full pre-conference program will be posted on the web by mid-August.

    The TF decided that the intended audience for the pre-conference would be decision makers in the technical services arena: catalogers, collection development librarians, and their managers.

  2. Plan for the coming year. Mary Larsgaard next discussed the Task Force's plan for the coming year (the Task Force goes out of existence at the 2000 Annual Conference). We need to focus on the fifth charge (but see the discussion of Charge 4 below):

    Recommending, as needed, rule revision to enable interoperability of cataloging (with AACR2R) with metadata schemes.

    The Joint Steering Committee for Revision of AACR (JSC) will meet in Brisbane, Australia, in October 1999 and will consider the reports of the Task Force on ISBD(ER), the Task Force on Rule 0.24 (which currently says that the cardinal principle of AACR2 is to identify the carrier and then consider content), and Tom Delsey's analysis of AACR2. If the JSC's decisions are made public by Midwinter 2000, that will help us take into account the direction in which it is going. John Attig is putting together a JSC Web site.

  3. Summary reports on charges 1-4 by the leaders of the Midwinter breakout groups that considered them.

    Group 1 (Matthew Beacom), which considered charge 1:

    Analyzing resource-description needs of libraries, seeking input from interested librarians and discussion via the metamarda-l email reflector.

    The Group reviewed existing principles for libraries and catalogs and focused on the four user tasks identified in the Functional Requirements for Bibliographic Records (FRBR) report: Find, Identify, Select, and Obtain, adding a fifth one proposed by Rahmatollah Fattahi at the 1997 Toronto Conference: Manage our systems in ways to fulfill the four user tasks.

    After trying to identify categories of users, the Group decided that users are multiple, complex, and protean. As Erik Jul phrased it, "We don't define who our users are; users define who users are." (Cf. also a June 1997 article in D-Lib Magazine by Carl Lagoze entitled "From static to dynamic surrogates: resource discovery in the digital age," which contains thoughts on defining users: <http://www.dlib.org/dlib/june97/06lagoze.html>.)

    In looking at the context in which we describe resources the Group decided that the traditional catalog is just one tool in a network of tools; it must be compatible with those other tools. These considerations raised a number of questions: How then can we create a coherent environment for our users? What is the role of the catalog in that network/environment? In attempting to answer the latter question we decided that we must be able to search multiple databases concurrently, and to take the output of information from one tool and use it as input to other tools. From this point we need to try to identify future needs, primarily interoperability and authority control. Matthew noted that Project CORC is an example of an attempt to make the Dublin Core and MARC data compatible in a single database.

    Rebecca Guenther, the co-chair from MARBI, commented that it might be useful to analyze what has been learned from CORC. Potentially useful data include: 1) points of friction and problems that have shown up; 2) whether users of CORC find they are doing things in MARC because there is no place for them in the Dublin Core, i.e., flaws in the DC (as someone commented, we need to overcome the hurdle of people trying to make DC do what MARC does); and 3) whether there are types of resources for which DC is better than MARC. Erik Jul and Eric Childress offered to compile such data for the Midwinter 2000 meeting.

    Mark Watson commented that it seemed wrong to go from such large questions as interoperability to charge 5: modifying AACR2R rules!

    Group 3 (Bill Fietzer), which considered charge 3:

    Devising a definition of “metadata” and investigating the interoperability of newly emerging metadata schemes with the cataloging rules (AACR2R) and the USMARC format.

    The Group started with working definitions culled from a variety of sources. The definitions depended on who the defining group was and what its purpose was. The Group ended up with the following definitions, which are still works in progress:

    METADATA are structured, encoded data that describe characteristics of information-bearing entities to aid in the identification, discovery, assessment, and management of the described entities.

    INTEROPERABILITY is the ability of two or more systems or components to exchange information and use the exchanged information without special effort on either system.

    A METADATA SCHEME provides a formal structure designed to identify the knowledge structure of a given discipline and to link that structure to the information of the discipline through the creation of an information system that will assist the identification, discovery and use of information within that discipline.

    Erik Jul suggested using FRBR terminology (find, identify, select, obtain, to which he added use) in the metadata definition so that it reads:

    METADATA are structured, encoded data that describe characteristics of information-bearing entities to aid in the finding, identification, selection, obtaining, assessment, and use of the described entities.

    Mary Larsgaard asked if there are any changes we might need to make in AACR2 and/or MARC so that we can integrate them with other metadata schemes. Mary Woodley pointed out that content descriptions based on standards other than AACR2 can be loaded into MARC. She referred to the SCIPIO data, which had been loaded into MARC with different authority control; no one seems to have noticed these records!

    Willy Cromwell-Kessler asked how we fit together “pots of data” which have different semantics so that they look seamless to users.

    Erik Jul said it is possible to have multiple kinds of cataloging in a single online catalog. Referring back to the beginnings of the Intercat Project (1991) he said that it asked two questions: 1) Can we use AACR2/MARC to describe electronic resources? And 2) If we can, does it work? We aimed those questions inward, at ourselves. Now we are looking outward, at our users, but we need to ask the same two questions.

    Group 2 (presented by Sherman Clarke, since Diane Hillmann, the chair of the Group, was not able to attend the first part of the Task Force meeting), which considered charge 2:

    Building a conceptual map (or maps) of the resource-description terrain/landscape and developing models for accessing/using metadata both within and outside the library community.

    Clarke said that the Conceptual Map of the Resource Description Landscape showed where metadata for various resources were found Before Computers, Presently, and Possibly in the Future. He noted that it mixed carrier, content, and seriality. Work needed to be done to develop thinking in multiple dimensions.

    Diane Hillmann commented that the Conceptual Map was intended to be suggestive, not exhaustive.

    Group 4 (Brad Eden), which considered charge 4:

    Recommending ways in which libraries may best incorporate the use of metadata schemes into the current library methods of resource description and resource discovery.

    The Group started by defining "prototype" as "a virtually seamless access to information and relevant retrieval of information from the user's point of view." The Group's next step was to identify existing experimental prototypes; for the future we need to evaluate them and to recommend best practice. They identified the following candidates to be considered as possible prototypes:

    BIBLINK
    http://hosted.ukoln.ac.uk/biblink/

    Cooperative Online Resource Catalog (CORC)
    http://corc.oclc.org/

    Arts and Humanities Data Service (articles about art)
    http://ahds.ac.uk/public/metadata/discovery.html

    Getty Research Institute auction catalog records
    http://opac.pub.getty.edu

    a.k.a. project by Getty Research Institute (public site to be pulled shortly)
    http://www.ahip.getty.edu/aka/ [Powerpoint presentation on a.k.a.]

    Arthur project by Getty Research Institute (an images database)
    http://www.ahip.getty.edu/arthur/

    Other projects of interest

    http://facesla.org

    Mary Larsgaard agreed that the entire Task Force should concentrate its future attention on continuing Charge 4 by studying these prototypes and that Charge 5 would be secondary.

    Mary Woodley pointed out that the A.K.A. project (a search engine designed to search across multiple databases simultaneously using one of a choice of vocabularies) would no longer exist after June 30.

    Erik Jul said he was bothered by the implication of “oneness” in talk about “integration.” Willy Cromwell-Kessler said that she did not find that integration implied “oneness.”

    Matthew Beacom commented that the preconference topics discussed today fit in well with the import of charge 4: recommending ways in which libraries may best incorporate the use of metadata schemes into the current library methods of resource description and resource discovery.

  4. Web standards for describing data: portions of the landscape (Eric Miller and Diane Hillmann)

    The standards described were XML (eXtensible Markup Language) and RDF (the Resource Description Framework), both of which are Recommendations of the W3C (World Wide Web Consortium), its highest level of standard.

    Eric Miller of the OCLC Office of Research gave the presentation, with Diane Hillmann adding comments.

    While people and economies depend on information, the exchange of information has been hindered by the potholes of incompatible hardware and software. The World Wide Web forces us to recognize this as a major problem.

    How do we solve this problem? By designing enabling technologies and standards. The W3C is a leader in this effort. The context for the problem requires us to recognize that:

    • There are multiple stakeholders and requirements
    • The community is international
    • Requirements will evolve

    We also make the assumption that common architectural components (syntax, structure, semantics, protocols, etc.) help in solving the problem.

    A resource description community is characterized by common semantic, structural and syntactic conventions for exchange of resource description information. Libraries are one resource description community; the conventions they use include AACR2R, MARC, LCSH, etc.

    One enabling technology, a common syntax, is XML (eXtensible Markup Language). A markup language is a mechanism to define tags and the structural relationships between them in documents. Extensible means that the semantics are not defined; there is no pre-coordinated set of tags. XML is a subset of SGML, the Standard Generalized Markup Language (ISO 8879), and is composed of the most widely used parts of SGML. As a subset of SGML, XML is able to benefit from SGML DTDs (Document Type Definitions), and SGML tools can validate XML instance data.

    Among the important concepts of XML is the notion of "well-formedness," which means that if you have a beginning tag, you must also have an ending tag. Another concept is "validity checking." Since there is as yet no such thing as an XML DTD, the SGML DTD must be used for this purpose, at least until the XML Schema work now in progress is completed. An XML Schema is a next-generation SGML DTD. Schemas enable resource communities to define and exchange vocabularies.
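
    To make well-formedness concrete, here is a minimal sketch of a well-formed XML record (the element names and values are invented for illustration and belong to no particular scheme): every start tag has a matching end tag, and elements nest properly, while XML itself assigns the tags no meaning.

        <?xml version="1.0"?>
        <!-- Hypothetical record: the tag names are invented to show
             extensibility; XML assigns them no semantics. -->
        <resource>
          <title>Annual Report 1999</title>
          <creator>Example Library</creator>
          <date>1999-06-27</date>
        </resource>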

    SGML is character-based. For XML we need to represent different data types, e.g., ISO dates, floats, document numbers. XML is Unicode-based, which enables it to handle many different kinds of characters but also makes it case sensitive.
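
    For example (an illustrative fragment, not from the presentation), because XML is case sensitive the following two elements have different names, and each start tag must be closed by an end tag in exactly the same case:

        <Title>Annual Report 1999</Title>   <!-- "Title" and "title" are distinct element names -->
        <title>Annual Report 1999</title>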

    There are several data transmission methods:

    1. Embedded in the XML document (analogous to the use of META in HTML; see the sketch following this list)
    2. Associated with the document (in the HTTP header)
    3. Trusted third party (explicit HTTP GET)
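
    As a sketch of the analogy in method 1, metadata can be embedded in the document it describes, much as Dublin Core elements are embedded in an HTML page with META tags (the page and its values here are hypothetical):

        <html>
          <head>
            <title>Annual Report 1999</title>
            <!-- Embedded metadata: Dublin Core elements carried in HTML META tags -->
            <meta name="DC.Title" content="Annual Report 1999">
            <meta name="DC.Creator" content="Example Library">
            <meta name="DC.Date" content="1999-06-27">
          </head>
          <body> ... </body>
        </html>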

    Often syntax alone is not enough, and a common structural representation for expressing statements is required. Another enabling technology, RDF (the Resource Description Framework), provides that common structure and semantics. This data model is designed to impose structural constraints on the syntax and to support consistent encoding, exchange, and processing of metadata.

    The basic structural element of RDF is the RDF statement, and there is only one way to express a statement. A statement has three components: a resource, a property, and a value. A value can be either a literal string or another resource. A property type is itself a type of resource. By "resource" RDF means "anything that can be uniquely identified."
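
    As an illustration (a sketch, not taken from the presentation; the document URL is hypothetical and the property types are borrowed from the Dublin Core), here are two RDF statements expressed in XML. The resource is the thing identified by the URI, dc:title and dc:creator are the properties, and the values here are literal strings:

        <?xml version="1.0"?>
        <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
          <!-- The resource being described, identified by a (hypothetical) URI -->
          <rdf:Description rdf:about="http://www.example.org/report.html">
            <dc:title>Annual Report 1999</dc:title>       <!-- property: value (literal) -->
            <dc:creator>Example Library</dc:creator>      <!-- property: value (literal) -->
          </rdf:Description>
        </rdf:RDF>

    The xmlns declarations anticipate the namespace discussion below: they say which vocabulary each property name is drawn from.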

    A "namespace" is a place to go to find a vocabulary or standards for semantics; it is a way of defining context. Communities and individuals will declare their own namespaces; it won't matter if others use the same name as long as each name refers to a different URL (Uniform Resource Locator).
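
    For instance (an invented fragment; the acquisitions vocabulary and its URI are hypothetical), two vocabularies can both use the element name "date," and the namespace prefix, tied to a distinct URI, makes clear which meaning is intended:

        <record xmlns:dc="http://purl.org/dc/elements/1.1/"
                xmlns:acq="http://www.example.org/acquisitions/elements/">
          <dc:date>1851</dc:date>         <!-- Dublin Core: date associated with the resource -->
          <acq:date>1999-06-27</acq:date> <!-- hypothetical acquisitions vocabulary: date the item was ordered -->
        </record>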

    Metadata is data and can be described.

    What does it all mean? A common syntax, structure, and ability to share semantics leads to a growing number of tools, both free and commercial, that support the creation, management, and navigation of structured information. These tools are ubiquitous in the web infrastructure in the form of browsers, servers, proxies, etc.

    What does it all mean to the library community? The "hard problems" still remain to be solved, and the library community has something to offer. We are familiar with concepts such as:

    • works and manifestations,
    • dealing with multiple languages, and
    • aggregations (collection vs. item, where the item itself may be a collection of other items).

    The library community has a well-developed infrastructure already in place:

    • Link authorities (Subject, name, place, etc.)
    • The modular components ("Legos") such as AACR2, MARC, LCSH
    • Guidelines, such as Diane Hillmann’s User Guidelines for Dublin Core.

    A role that the library community can play is to start using the W3C enabling technologies and to decide what it means to start building within our own community.

    Additional information can be found at:


Notes by Judith Hopkins, SUNY Buffalo