Penn State University Libraries

Social Sciences Statistics & Data

 

Contact


Stephen Woods
Title: Social Sciences Librarian Specializing in Data and Government Information


Subject Specialist:
Statistics and Data
U.S. Government Documents
814-865-0665
e-mail: sj231@psu.edu

Social Sciences Library

 

Getting Started with Data Research

Overview

Welcome. The Social Sciences Statistics and Data Web site contains most of the recognized social sciences data available through the Penn State University Library system. Much of the information is digital online, along with print copies of many codebooks, software manuals, and reference books pertaining to data and statistical analysis.

 

Getting Started With Data Research

This tutorial is intended to assist researchers in finding essential social sciences numbers, data, and data sets. Before you begin searching for data sets, identify all the relevant variables that you will need. Some examples of variables:

  • age
  • gender
  • race
  • nationality
  • marital status
  • employment
  • education
  • political affiliation
  • religion
  • geographic location
  • housing

Researchers who have a fundamental understanding of statistics, or who are unable to find the appropriate resources, can use The CAT (Penn State Libraries' online catalog), the Libraries' Databases by Title (A-Z List), online reference resources, and consultations with librarians at the Social Sciences Library (or your campus library) to find the appropriate data sets.

Reference Librarians are available to help you find the resources you need. Please do not hesitate to contact a librarian for more assistance.

 

Research Resource Example

How you search and work with the data will depend on which study or data resource you select for your research. One of the primary data resources available to Penn State researchers of social sciences is the extensive ICPSR (Inter-university Consortium for Political and Social Research) data archive. You can search for studies in the ICPSR archive in three ways:

  1. Search by keywords or key phrases appearing in the title, abstract, principal investigator, and ICPSR study number fields
  2. Search by subject using the ICPSR list of subject headings
  3. Search the thesaurus for detailed subject terms, geographic terms, and personal names

 

datacommons@psu

"The datacommons@psu was developed to provide a resource for data sharing, discovery, and archiving for the Penn State research and teaching community. Access to information is vital to the research, teaching, and outreach conducted at Penn State. The datacommons@psu serves as a data discovery tool, a data archive for research data created by Penn State for projects funded by agencies like the National Science Foundation, as well as a portal to data, applications, and resources throughout the university.

The datacommons@psu facilitates interdisciplinary cooperation and collaboration by connecting people and resources and by:

  • Acquiring, storing, and providing discovery tools for geospatial data and applications.
  • Highlighting existing resources developed or housed by Penn State.
  • Providing access to project/program partners via map or web services.
  • Providing support for researchers and Penn State organizations to store and publish their data.

Members of the Penn State community can easily share and house their data through the datacommons@psu. The datacommons@psu will also develop metadata for your data and provide information to support your NSF, NIH, or other agency data management plan."

Frequently Asked Questions

What is a Data Set?

A data set is a compilation of data elements that represent the characteristics of a systematically drawn sample of observations. There are two types of data sets: primary and secondary.

  • Primary Data set
    A primary data set is collected and compiled by the researcher for the purpose of addressing a specific research question.

  • Secondary Data set
    A secondary data set is collected and compiled by another individual or agency. Secondary data sets can consist of public use files, restricted access files, or a combination of both. Examples of secondary data sets include the Health Care Financing Administration's Medicare Current Beneficiary Survey (which profiles the demographic characteristics, health status and functioning, access to care, sources of and satisfaction with care, insurance coverage, financial resources, and family supports of Medicare beneficiaries); the National Opinion Research Center's General Social Survey (GSS) (which focuses on topics such as the role of government, sociopolitical participation, social networks, and religious socialization); and the National Center for Health Statistics' National Health Interview Survey (which provides information about the amount and distribution of illness, its effects in terms of disability and chronic impairments, and the kinds of health services people receive).

 

What is a codebook?

Generically, a codebook is any information on the structure, contents, and layout of a data file. Typically, a codebook includes:

  • column locations and widths for each variable
  • definitions of different record types
  • response codes for each variable
  • codes used to indicate nonresponse and missing data
  • exact questions and skip patterns used in a survey

Many codebooks also include frequencies of response. Codebooks vary widely in quality and amount of information included.
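As an illustration of how the column locations, widths, and missing-data codes in a codebook are used, here is a minimal Python sketch that reads fixed-width records. The variable names, column positions, and missing codes are hypothetical and do not come from any real study; an actual codebook would supply its own layout.

```python
# Hypothetical codebook layout: variable name -> (start column, width).
# Columns are 1-based, as they are commonly listed in codebooks.
CODEBOOK = {
    "age": (1, 2),
    "gender": (3, 1),
    "marital_status": (4, 1),
}

# Hypothetical codes that this made-up codebook reserves for
# nonresponse or missing data.
MISSING_CODES = {
    "age": {"99"},
    "gender": {"9"},
    "marital_status": {"9"},
}


def parse_record(line):
    """Slice one fixed-width record into variables using the layout above."""
    record = {}
    for name, (start, width) in CODEBOOK.items():
        raw = line[start - 1 : start - 1 + width]
        # Replace missing-data codes with None so they are not
        # mistaken for real responses.
        record[name] = None if raw in MISSING_CODES[name] else raw
    return record


# Three made-up raw records, one per line of a data file.
records = [parse_record(line) for line in ["3421", "9912", "5729"]]
for record in records:
    print(record)
```

Without the codebook, a raw line such as 9912 cannot be interpreted: nothing in the data file itself says where one variable ends and the next begins, or that 99 means "no answer" rather than an age.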

 

How to access Codebooks

The codebook provides the user with the information necessary to access and analyze a data set. It is usually necessary to review the codebook to determine whether the data set will provide the information you need.

In some cases the Penn State Libraries will have a paper copy of the codebook that accompanies a data set. To see if a specific codebook is available in the Libraries, search The CAT. Use keywords from the name of the study to locate a codebook.

In cases where a keyword search yields several similar entries, you must look through the records for the one whose file type is listed as data. For example, a keyword search for the Medicare Current Beneficiary Survey yields several similar entries, and you will have to determine which of these is the appropriate one. In this case, the appropriate entry would be the one that reads "Medicare Current Beneficiary Survey". Any codebooks associated with the data will be listed on the record and can be checked out of the library like any other book.

In some cases data sets are owned by the Libraries on CD-ROM or floppy disk. These data sets are cataloged and can be located using The CAT. The CAT record will note if there is a print codebook available. In some cases the codebook is available on the CD-ROM along with the data itself.

 

How do I get help with data set software packages?

For assistance getting started, make an appointment with Stephen Woods [e-mail: swoods@psu.edu], the Social Sciences Librarian who specializes in statistics and data.

 

What's the difference between phrase searching and word searching?

When you select word searching, the search engine looks for each word independently of the others. Phrase searching looks for the words together, in order. For example, searching for "labor force" as a phrase ensures that you don't get results relating to pregnancy, which a search on the word "labor" by itself could return. Phrase searching is best used when one of your search terms is very common but its meaning within the phrase is specific.
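The difference between the two search modes can be sketched in a few lines of Python. This is only an illustration, not how any particular catalog or archive search engine is implemented. It assumes that a word search requires every query word to appear somewhere in the text, in any position, and the two titles are made up.

```python
def tokenize(text):
    """Lowercase the text and split it into words on whitespace."""
    return text.lower().split()


def word_search(query, text):
    """True if every query word appears somewhere in the text, in any order."""
    words = tokenize(text)
    return all(term in words for term in tokenize(query))


def phrase_search(query, text):
    """True only if the query words appear next to each other, in order."""
    words = tokenize(text)
    terms = tokenize(query)
    n = len(terms)
    return any(words[i : i + n] == terms for i in range(len(words) - n + 1))


# Two made-up titles that both contain the words "labor" and "force".
titles = [
    "Labor force participation among married women",
    "Task force report on premature labor and pregnancy outcomes",
]

for title in titles:
    print(title)
    print("  word search:  ", word_search("labor force", title))
    print("  phrase search:", phrase_search("labor force", title))
```

Both titles satisfy the word search, but only the first satisfies the phrase search, because the second contains "labor" and "force" in unrelated positions.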

 

When searching, should I type my terms in uppercase or lowercase letters?

Searching is not case-sensitive when your search query is in lowercase. Searching is case-sensitive when the query term contains uppercase letters.
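The case rule described above can be expressed as a short Python sketch. The function name and the use of simple substring matching are assumptions made for illustration; only the rule about uppercase and lowercase queries comes from the text.

```python
def matches(query, text):
    """Apply the case rule: an all-lowercase query ignores case,
    while a query containing any uppercase letter must match exactly."""
    if query == query.lower():
        return query in text.lower()
    return query in text


title = "Medicare Current Beneficiary Survey"

print(matches("medicare", title))                # lowercase query, case ignored
print(matches("Medicare", title))                # uppercase in query, exact case required
print(matches("Medicare", "medicare handbook"))  # case differs, so no match
```

In practice this means a lowercase query is the broader choice, and capital letters should be typed only when the exact capitalization matters.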

Data Tutorials

Use of Secondary Data

Restrictions on Data Use

It is essential to consider the legal issues associated with the use of any data set that is not a public use file. Several federal agencies impose restrictions on the use of confidential data, such as limited scope of use, limited period of access, and specific access and storage procedures. Generally, the data release forms of each agency dictate the level and extent of security measures required. Failure to comply with the data release agreements or restrictions could result in cancellation of funding and debarment from future grants or contracts from any federal agency. Further, violating any confidentiality agreement with a federal agency could lead to fines and imprisonment. Examine the restrictions stipulated by the specific federal agency in detail before accessing any data set.

Penn State University's Training on the Protection of Human Research Participants

Before you use any data at Penn State involving human participants you are required to take Penn State University's Protection of Human Research Participants training.

This online basic training course has been mandated by Federal regulations and is required before approval can be granted for the use of human participants in any University research project.

Each person proposing to conduct research involving human participants (collecting new data or using existing data) must complete the basic training at least once while at Penn State. After you have read the web material, you will be presented with the quiz link. The quiz does not have to be taken as soon as you finish reading. The quiz consists of ten randomly selected questions. Upon completion of the quiz, your answers will be scored and the results will be immediately sent to you and to the Office for Research Protections.

Completion of this course and a passing grade of at least 70% on the corresponding exam will fulfill the training requirement. Please note that you may re-take the exam, if needed; however, you cannot simply change incorrect answers. A new quiz will need to be completed and submitted.

Before you start any research project (an honors or master's thesis, a doctoral dissertation, grant-funded research, or any research that could result in published materials or be presented in a public venue), you must receive approval from Penn State's Institutional Review Board (IRB). The IRB application must be filled out by anyone who is involved in any of the research endeavors mentioned above.

A recent change is that if you have not taken the Human Participants Training Course (see above), you will not be granted IRB approval. Be sure to take the course before you seek IRB approval for your research project.
