A Paper Presented at the Conference on Research Issues for Authority Control, OCLC, March 31-April 1, 1996
Background
- Planning for an automated authority control component for LIAS,
Penn State's local system
- Presentation and article by Jennifer Younger
-- Concept of utility in authority control
-- Maximizing the impact of authority work
-- Correlation between disciplines and the way information is likely to be retrieved
-- Do humanists and scientists search online catalogs differently?
- Need for research
Research Plan -- Two Studies
- Survey of the content of the online catalog
[preliminary results reported here]
- Examination of user behavior related to use of the online catalog
[to be conducted later in 1996]
Basic Question:
Is there a relationship between subject matter and the distribution
of types of headings or the frequency of occurrence of individual
headings?
The Database Survey
The Sample
- A random sample of 449 records
- Sampled from records added to the online catalog 1983 through 1995
- All formats included
- In-process records excluded
- Expected to be representative of current cataloging
The "Questionnaire"
- A stripped-down MARC record
-- Included title and all access points
- Data collected on print-outs of records
- 68 questions asked for each record
- Data gathered on call number, language, names, subjects, and series
Sample Validation
- Call number breakdown of sample compared with October 1995 call number data
for the entire Penn State database
- Almost an exact match
Disciplines
Results of the Survey
Profile of the Sample
Number of Records by Language
| English | 382 | 85% |
| German | 22 | 5% |
| Spanish | 12 | 3% |
| French | 11 | 2% |
| Other | 22 | 5% |
| TOTAL | 449 | 100% |
- English accounted for an overwhelming majority of records in the sample.
Number of Records by Discipline
| Humanities | 141 | 31.4% |
| History / Geography | 48 | 10.7% |
| Social Sciences | 142 | 31.6% |
| Science & Technology | 118 | 26.3% |
| TOTAL | 449 | 100% |
- Humanities, Social Sciences, and Science are more or less balanced.
Number of Headings per Record: Names
| None | 7 | 1.5% |
| 1 | 289 | 64.4% |
| 2 | 105 | 23.4% |
| 3-9 | 47 | 10.5% |
| 10+ | 1 | 0.2% |
| TOTAL | 449 | 100% |
- The overwhelming majority (almost 88%) of the records had 1 or 2 names.
- Only 7 records had no name headings.
Number of Headings per Record: Subjects
| None | 45 | 10.0% |
| 1 | 145 | 32.3% |
| 2 | 117 | 26.1% |
| 3 | 78 | 17.4% |
| 4 | 40 | 8.9% |
| 5+ | 24 | 5.3% |
| TOTAL | 449 | 100% |
- 45 records (10%) had no subject headings.
- The large majority (almost 76%) of the records had 1 to 3 subject headings.
Number of Headings per Record: Series
- 63% of the records had no series.
Name Headings in the Sample
There were a total of 667 name headings in the sample.
Number of Names by Type of Name
| Personal | 505 | 75.7% |
| Corporate | 144 | 21.6% |
| Conference | 18 | 2.7% |
| TOTAL | 667 | 100% |
Number of Names by Discipline: Personal Names
| Humanities | 188 | 37.2% |
| History | 45 | 8.9% |
| Social Sciences | 140 | 27.7% |
| Science | 132 | 26.2% |
| TOTAL | 505 | 100% |
- 37% of the personal name headings were in Humanities
(only 31% of the records in the sample were in Humanities)
Number of Names by Discipline: Corporate Names
| Humanities | 19 | 13.2% |
| History | 16 | 11.1% |
| Social Sciences | 61 | 42.4% |
| Science | 48 | 33.3% |
| TOTAL | 144 | 100% |
- 42% of the corporate name headings were in Social Sciences
(only 32% of the records in the sample were in Social Sciences)
- 33% of the corporate name headings were in Science
(only 26% of the records in the sample were in Science)
- In contrast, only 13% of the corporate name headings were in Humanities
(31% of the records in the sample were in Humanities)
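The comparisons in the bullets above (share of headings versus share of records) can be expressed as a single ratio per discipline. The following is a minimal sketch, not part of the original study: the counts are copied from the tables above, the function names are illustrative, and the short labels "History" and "Science" stand for "History / Geography" and "Science & Technology". A ratio above 1 means a discipline holds more than its proportional share of that type of heading.

```python
# Counts copied from the tables above.
records = {"Humanities": 141, "History": 48, "Social Sciences": 142, "Science": 118}
personal = {"Humanities": 188, "History": 45, "Social Sciences": 140, "Science": 132}
corporate = {"Humanities": 19, "History": 16, "Social Sciences": 61, "Science": 48}


def share(counts):
    """Convert raw counts into fractions of the column total."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def representation_ratio(headings, records):
    """Share of headings divided by share of records, per discipline."""
    heading_share = share(headings)
    record_share = share(records)
    return {name: heading_share[name] / record_share[name] for name in records}


personal_ratio = representation_ratio(personal, records)
corporate_ratio = representation_ratio(corporate, records)

for name in records:
    print(f"{name:16s} personal {personal_ratio[name]:.2f}  corporate {corporate_ratio[name]:.2f}")
```

Read this way, the Humanities ratio is above 1 for personal names and well below 1 for corporate names, which is the contrast the bullets describe.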
Number of Names by Discipline: Conference Names
| Humanities | 0 | 0.0% |
| History | 1 | 5.6% |
| Social Sciences | 4 | 22.2% |
| Science | 13 | 72.2% |
| TOTAL | 18 | 100% |
- Most of the conference names were in Science.
- However, out of the 667 name headings in the sample,
there were only 18 conference names.
Frequency of Sample Headings in the Database
Each name heading in the sample was searched in the Penn State online catalog [about 1.75 million records],
and the number of records with that heading was counted.
Number of Names by Type of Name
| Frequency | 1 | 2-5 | 6-10 | 11-99 | 100+ |
| Personal Names | 148 | 184 | 66 | 96 | 11 |
| Corporate / Conference Names | 21 | 22 | 10 | 36 | 55 |
- Most personal names occur 5 times or fewer in the database.
- Most corporate and conference names occur more than 10 times.
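The two statements above follow from adding adjacent frequency bands. A minimal sketch of that arithmetic, using the counts from the table above (the function and variable names are illustrative, not from the study):

```python
# Frequency bands and counts copied from the table above.
bands = ["1", "2-5", "6-10", "11-99", "100+"]
personal = [148, 184, 66, 96, 11]
corporate_conference = [21, 22, 10, 36, 55]


def band_share(counts, first, last):
    """Fraction of headings falling in bands first..last (inclusive indexes)."""
    return sum(counts[first:last + 1]) / sum(counts)


# Personal names found 5 times or fewer: bands "1" and "2-5".
low_personal = band_share(personal, 0, 1)

# Corporate and conference names found more than 10 times: bands "11-99" and "100+".
high_corporate = band_share(corporate_conference, 3, 4)

print(f"Personal names, 5 or fewer:        {low_personal:.1%}")
print(f"Corporate/conference, more than 10: {high_corporate:.1%}")
```

Both shares come out at roughly two thirds, so "most" is a fair description in each case.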
Number of Personal Names by Discipline
| Frequency | 1 | 2-5 | 6-10 | 11-99 | 100+ |
| Humanities | 52 | 55 | 25 | 45 | 11 |
| History | 10 | 18 | 9 | 8 | 0 |
| Social Sciences | 37 | 60 | 17 | 26 | 0 |
| Science | 49 | 51 | 15 | 17 | 0 |
- Personal names are most likely to occur frequently in Humanities.
Number of Corporate and Conference Names by Discipline
| Frequency | 1 | 2-5 | 6-10 | 11-99 | 100+ |
| Humanities | 4 | 5 | 3 | 4 | 3 |
| History | 4 | 1 | 1 | 1 | 10 |
| Social Sciences | 12 | 2 | 6 | 17 | 28 |
| Science | 10 | 20 | 1 | 16 | 14 |
- Corporate and conference names are
most likely to occur frequently in Social Sciences.
Conclusions
Type of Name
- Personal names are concentrated in Humanities.
- Corporate and conference names are concentrated in Social Sciences and Science.
Frequency of Occurrence
- Personal names are less likely to occur frequently.
- Corporate and conference names are more likely to occur frequently.
Subject Headings
Topics of Interest
- Structure of subject headings:
-- How complex are the headings?
-- How many subdivisions do they contain?
- Verification of subject headings:
-- How many headings are in LCSH?
-- What is not in LCSH?
Complexity of subject headings
Of 906 subject headings in the sample,
- 31% had only one element (the main heading, $a)
- 42% had two elements
- 27% had three or more elements
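A count like the one above can be produced by splitting each heading into its subfields. The sketch below is illustrative only: it assumes headings are available as strings with "$" subfield codes (for example "$a Education $x History"), which is not necessarily how the survey data were recorded, and the example headings are made up rather than taken from the sample.

```python
import re
from collections import Counter

# Matches a subfield code such as $a, $x, $y, $z and the text that follows it.
SUBFIELD = re.compile(r"\$([a-z])\s*([^$]*)")


def parse_heading(heading):
    """Split a heading string into (code, text) pairs, in order."""
    return [(code, text.strip()) for code, text in SUBFIELD.findall(heading)]


def complexity_distribution(headings):
    """Count headings with 1, 2, or 3+ subfields (the main heading $a counts as one)."""
    counts = Counter()
    for heading in headings:
        n = len(parse_heading(heading))
        counts["3+" if n >= 3 else str(n)] += 1
    return counts


# Made-up example headings, for illustration only.
examples = [
    "$a Chemistry",
    "$a Education $x History",
    "$a Art $z France $y 19th century",
]
distribution = complexity_distribution(examples)
print(distribution)
```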
Frequency of occurrence
Looking at each subdivision in the sample independent of the rest of the heading
and searching each subdivision in the entire Penn State online catalog:
- 49% of the main headings ($a) occurred in more than 100 headings in the catalog
- 92% of the topical subdivisions ($x) occurred in more than 100 headings
- 85% of the chronological subdivisions ($y) occurred in more than 100 headings
- 93% of the geographic subdivisions ($z) occurred in more than 100 headings
Verification against LCSH
Complete Headings
- Total of 816 LC subject headings (650 or 651)
- 365 of these (45%) could be verified completely in LCSH
- 451 (55%) could not be verified completely
- Almost always, this was because one or more subdivisions were not in LCSH
Topical Subdivisions ($x)
- The 816 subject headings contained 473 topical subdivisions
- 149 of these subdivisions were in LCSH, 324 were not
- Of these 324 unverified subdivisions, there were 122 unique subdivisions
- Of these 122 unverified topical subdivisions, 111 are free-floating subdivisions
- 75 of these appear in H1095 (the list of free-floating subdivisions of general application in the Subject Cataloging Manual)
- Conclusion: Subdivision records for the headings in H1095 would greatly enhance
the odds of verification
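The tallies above (occurrences, unverified occurrences, unique unverified subdivisions, and how many of those are free-floating) can be sketched as follows. This is a simplified illustration under stated assumptions: LCSH is modeled as a set of established (main heading, subdivision) pairs, the free-floating list as a set of subdivision strings, and all of the example data are invented.

```python
def summarize_subdivisions(occurrences, established_pairs, free_floating):
    """Tally subdivision occurrences against an authority list.

    occurrences: list of (main heading, subdivision) pairs found in records
    established_pairs: pairs explicitly established in the authority list
    free_floating: subdivision strings that may be used under any heading
    """
    unverified = [sub for main, sub in occurrences
                  if (main, sub) not in established_pairs]
    unique = set(unverified)
    return {
        "occurrences": len(occurrences),
        "unverified": len(unverified),
        "unique_unverified": len(unique),
        "unique_free_floating": len(unique & free_floating),
    }


# Invented example data, for illustration only.
occurrences = [
    ("Education", "History"),
    ("Art", "History"),
    ("Education", "Finance"),
    ("Chemistry", "Laboratory manuals"),
    ("Art", "Private collections"),
]
established_pairs = {("Education", "Finance")}
free_floating = {"History", "Laboratory manuals"}

summary = summarize_subdivisions(occurrences, established_pairs, free_floating)
print(summary)
```

The gap between "unverified" and "unique_unverified" is what makes subdivision records attractive: one record for a frequently repeated subdivision resolves many occurrences at once.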
Chronological Subdivisions ($y)
- The 816 subject headings contained 51 chronological subdivisions
- 39 of these subdivisions were in LCSH, 12 were not
- Of these 12 unverified subdivisions, there were only 2 unique subdivisions:
19th century
20th century
Geographic Subdivisions ($z)
- The 816 subject headings contained 264 geographic subdivisions
- Only 33 of these subdivisions were in LCSH, 231 were not
- Of these 231 unverified subdivisions, there were 74 unique subdivisions
- Of the 74 unique geographic subdivisions
-- 3 were names of continents
-- 35 were names of countries
-- 11 were names of states
- 49 of the subdivisions belong to these three categories
Conclusion: If subdivision records were created for these three categories,
the odds of verification would be greatly increased.
Note: For all three categories, the form of the subdivision is the same
as the form of the main heading, so the subdivision records could be
created automatically.
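Because the subdivision form matches the main heading form for these three categories, generating the subdivision records is essentially a copy operation. A minimal sketch, assuming each geographic main heading carries a category label; the GeographicHeading class, the category labels, and the output record layout are all hypothetical and do not reflect any actual LIAS or MARC authority structure.

```python
from dataclasses import dataclass

# Categories for which the subdivision form equals the main heading form.
ELIGIBLE_CATEGORIES = {"continent", "country", "state"}


@dataclass(frozen=True)
class GeographicHeading:
    name: str      # established form of the main heading
    category: str  # hypothetical label, e.g. "continent", "country", "state", "city"


def make_subdivision_records(headings):
    """Create one $z subdivision record per eligible heading, skipping duplicates."""
    seen = set()
    records = []
    for heading in headings:
        if heading.category in ELIGIBLE_CATEGORIES and heading.name not in seen:
            seen.add(heading.name)
            # The subdivision text is copied unchanged from the main heading.
            records.append({"subfield": "z", "text": heading.name})
    return records


headings = [
    GeographicHeading("Europe", "continent"),
    GeographicHeading("France", "country"),
    GeographicHeading("Pennsylvania", "state"),
    GeographicHeading("Paris (France)", "city"),  # not eligible: form may differ
    GeographicHeading("France", "country"),       # duplicate, skipped
]
subdivision_records = make_subdivision_records(headings)
print(subdivision_records)
```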
Impact of Subdivision Records on our Sample
- Using just LCSH
-- 365 headings (45%) were verified
-- 451 headings (55%) were not
- Using LCSH plus the subdivision records recommended
-- 705 headings (86.4%) were verified
-- Only 111 headings (13.6%) were not
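The before-and-after comparison can be recomputed directly from the counts. A small sketch using the figures reported above (the constant and function names are illustrative):

```python
# Counts reported above for the 816 LC subject headings in the sample.
TOTAL_HEADINGS = 816
VERIFIED_LCSH_ONLY = 365
VERIFIED_WITH_SUBDIVISION_RECORDS = 705


def verification_rate(verified, total):
    """Percentage of headings that could be verified completely."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * verified / total


rate_before = verification_rate(VERIFIED_LCSH_ONLY, TOTAL_HEADINGS)
rate_after = verification_rate(VERIFIED_WITH_SUBDIVISION_RECORDS, TOTAL_HEADINGS)
newly_verified = VERIFIED_WITH_SUBDIVISION_RECORDS - VERIFIED_LCSH_ONLY

print(f"LCSH only:                     {rate_before:.1f}%")
print(f"LCSH plus subdivision records: {rate_after:.1f}%")
print(f"Additional headings verified:  {newly_verified}")
```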
Conclusions
- Subject headings are complex and hierarchical.
- LCSH does not contain a majority of the complete subject headings appearing
in bibliographic records.
- Subdivision records can dramatically increase odds of verification.
Last updated: 5/15/96
Copyright © 1996 Christine Avery, Mary Ann Itoga, John C. Attig