Penn State University Libraries
 

Why make data accessible?

Data Sharing

“Not disseminating information that you get from your research is scientifically like being against mom and apple pie.”  ~ Richard G. Weiss, chemistry professor at Georgetown, as quoted in "NSF Revamps Data‑Sharing Policy" (September 27, 2010), in Chemical & Engineering News.

Graph showing biodiversity during Phanerozoic era.
Phanerozoic biodiversity/Sublevirtue; Wikimedia Commons, GNU FDL

Giving access to data is key for many reasons:

  • It's a requirement for a growing number of funders, and others are likely to adopt it: first NIH, then NSF, and now NEH. Can NEA be far behind?
  • It's part of good science (and good social science and good humanities): sharing data frames them as a public good, promoting community, collaboration, and conscientiousness.
  • Data sharing is a step toward reuse of data, which adds value to them: allowing other researchers access to your data opens up possibilities for new readings, new interpretations, and new discoveries.
  • It's a practice that bodes well for other aspects of doing research: being open about data can lead to sharing of process, workflows, and tools, all of which enhance the potential for replicating the research results.

In June 2011 the Office of Digital Humanities (ODH), a division of the National Endowment for the Humanities (NEH), announced a new grant program, Digital Humanities Implementation Grants (DHIG). Unlike previous ODH grant programs, this one requires submission of a sustainability and digital management plan:

Data Management Plans for NEH Office of Digital Humanities Proposals and Awards

Guide for DHIG

Not all data resulting from research can be shared, however. See the next tab in this section, Issues to Parse before Sharing Data. For more about data sharing, see the following:


NIH

NIH Data Sharing Requirements

NIH requires a brief paragraph on data sharing, which does not count toward the application page limit. It follows the research plan section of the PHS 398 application form and comes immediately after the letters of support.

According to the National Institutes of Health, data sharing achieves many important goals for the scientific community, such as:

  • reinforcing open scientific inquiry
  • encouraging diversity of analysis and opinion
  • promoting new research and the testing of new or alternative hypotheses and methods of analysis
  • supporting studies on data collection methods and measurement
  • facilitating education of new researchers
  • enabling the exploration of topics not envisioned by the initial investigators
  • permitting the creation of new datasets by combining data from multiple sources

Key Elements to Consider in Preparing a Data Sharing Plan Under NIH Extramural Support (PDF)

NIH Data Sharing Policy and Implementation Guidance

Frequently Asked Questions on NIH Data Sharing Policy

NIH requests that all extramural applicants seeking $500,000 or more in direct costs in any one year provide a data sharing plan in their applications. Researchers submitting grant, cooperative agreement, or contract applications will be required to include either a data sharing plan or an explanation of why data sharing is not possible. The plan or explanation should be given in a brief paragraph placed after the research plan. The precise content of the data sharing plan will vary depending on the data being collected. The following may be included:

  • Schedule for data sharing
  • Format of final dataset
  • Documentation to be provided
  • Analytical tools to be provided, if any
  • Need for data sharing agreement
  • Mode of data sharing

Issues to parse before sharing data

Sometimes data resulting from funded research cannot be shared, and there are policies addressing this. For example, the “Privacy Rule” of the Health Insurance Portability and Accountability Act (HIPAA) has guidelines for maintaining confidentiality of research data derived from health care records, requiring specification of data handling responsibilities and privileges.

Black and white graph showing planetary movements with cyclical lines on spatial-temporal grid. From the 10th century.
Chart of planetary movements, from 10th cent./Wikimedia Commons

The following are important questions to pose prior to making your data available (adapted from "Data Sharing Essentials," University of Wisconsin-Madison):

  • Is there personal information embedded in the data?
    • If so, then redaction and/or anonymizing techniques will need to be applied to the data before making them public.
  • Have you taken necessary measures to render your data understandable by others?
    • Raw data are rarely comprehensible outside a community of interest, or sometimes even by one's peers. Supplementary materials for describing, deciphering, and contextualizing data should be made available. These include codebooks, descriptions of methodology, data dictionaries, legends, and metadata schema, among others.
  • Do your data adhere to recognized standards and normalization practices in your field?
    • Have you ensured compliance with respect to metadata, format, and standards for description and sharing?
  • Have you considered ways your data could be repurposed? This raises the question of reuse policies for your data: should they have any?
    • As you consider this question, consult the Panton Principles, a kind of manifesto in support of open data.
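
As a rough illustration of the first two questions above, the following Python sketch removes direct identifiers from tabular records, replaces a linking key with a keyed hash, and generates a bare-bones data dictionary. The column names, the secret key, and the function names are all hypothetical, chosen only for this example. Dropping direct identifiers does not by itself make data anonymous, since combinations of other fields can still identify a person, so treat this as a starting point rather than a compliance procedure.

```python
import hashlib
import hmac

# Hypothetical column names, chosen only for illustration.
DIRECT_IDENTIFIERS = {"name", "email"}  # removed outright
PSEUDONYMIZE = {"participant_id"}       # replaced with a keyed hash


def pseudonym(value, secret_key):
    """Return a stable 16-character pseudonym using HMAC-SHA256."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(rows, secret_key):
    """Drop direct identifiers and pseudonymize linking keys in each row."""
    cleaned = []
    for row in rows:
        out = {}
        for column, value in row.items():
            if column in DIRECT_IDENTIFIERS:
                continue  # never copy direct identifiers to the shared file
            if column in PSEUDONYMIZE:
                value = pseudonym(str(value), secret_key)
            out[column] = value
        cleaned.append(out)
    return cleaned


def data_dictionary(rows, descriptions):
    """List each column with an example value and a human-written description."""
    columns = sorted({column for row in rows for column in row})
    entries = []
    for column in columns:
        example = next((row[column] for row in rows if column in row), "")
        entries.append({
            "column": column,
            "example": example,
            "description": descriptions.get(column, ""),
        })
    return entries


if __name__ == "__main__":
    records = [
        {"participant_id": "P001", "name": "Ada", "email": "ada@example.org", "score": 7},
        {"participant_id": "P002", "name": "Ben", "email": "ben@example.org", "score": 9},
    ]
    shared = deidentify(records, secret_key=b"keep-this-key-private")
    for entry in data_dictionary(shared, {"score": "Test score (hypothetical scale)"}):
        print(entry)
```

The keyed hash keeps records from the same participant linkable across files without exposing the original identifier, provided the key itself is never shared alongside the data.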

Data repositories

“The challenge before us as a profession, before each of us as researchers, and before the broader community of social scientists, is to prepare for the collection and analysis of these new data sources, to unlock the secrets they hold, and to use this new information to better understand and ameliorate the major problems that affect society and the well-being of human populations.”  ~ Gary King (Harvard University)

Diagram shows circles with text inside them and all of them linked together, to convey the idea of linked data.
Linked data sets, 2007/Cygri; Wikimedia Commons, GNU FDL

Data Repositories, Open Data, and Open Access

As Open Science Data and Open Access, both of which exemplify approaches to unrestricted dissemination of research and scholarship, gain traction in academic communities, the potential for repositories and networks that encourage the sharing of data also grows. An Open Data Query service exists, where you can ask data holders about the openness and availability of their data. For more about Open Access and Open Data, see the Scholarly Publishing and Academic Resources Coalition (SPARC).

Below is a selected list of repositories and networks founded on the idea of making data publicly accessible:

See "Data Repositories," "Disciplinary Repositories," and Purdue's list of "Other Data Repositories" for more of these types of resources.
