Managing data is an integral part of the research process. It can be challenging, particularly when studies involve several researchers or are conducted from multiple locations. How data are managed depends on the types of data, how they are collected and stored, and how they are used throughout the study.
The outcome of your research depends in part on how well you manage your data. Managing data helps you as a researcher organize research files and data for easier access and analysis. It helps ensure the quality of your research. It supports the published results of your work and, in the long term, helps ensure accountability in data analysis. Good data management starts with comprehensive and consistent data documentation and should be maintained through the life cycle of the data.
What are metadata? Metadata give context to your research data by providing descriptive detail about it. They offer standardized, structured information explaining data in terms of, for example, purpose, origin, time references, geographic location, creator, access conditions, and terms of use of your data collection. Used to enable resource discovery, metadata can provide pathways for searching existing data; present as a bibliographic record for citation; or facilitate online browsing of data.
Deciding on what elements to use to describe your data is one way to start structuring them. Examples of metadata elements are title, contributor, creator, subject, description, type, format, date, relation, identifier. An example of a metadata schema, or element set, is the Dublin Core metadata schema.
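As a rough illustration of how such elements might be recorded, the sketch below builds a small Dublin Core description in XML using Python's standard library. The dataset values, the `record` wrapper element, and the helper name `build_dc_record` are assumptions made for this example; they do not come from any Penn State system.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_dc_record(fields):
    """Return an XML element holding one Dublin Core description.

    `fields` maps an element name (e.g. "title") to a string or a list
    of strings, because Dublin Core elements may be repeated.
    """
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
            child.text = value
    return record


# Hypothetical dataset description; every value here is made up.
example = {
    "title": "Leaf cross-section images, sample set A",
    "creator": ["Doe, Jane", "Roe, Richard"],
    "subject": "Plant anatomy",
    "type": "Dataset",
    "format": "image/tiff",
    "date": "2011-06-15",
    "identifier": "example-dataset-0001",
}

record = build_dc_record(example)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Keeping the description in a structured form like this, rather than in free text, is what allows catalogs and search tools to index each element separately.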
What you input for "title," "subject," "format," or for any metadata field, tells something about the data you have collected. An important best practice approach to creating metadata is to use a controlled vocabulary, a standardized terminology for your community of interest (e.g., art history catalogers often use the Getty Research Institute's Art & Architecture Thesaurus). Complying with an accepted standard, such as a controlled vocabulary or an authority list (e.g., Library of Congress Authorities), will help in the retrieval and indexing of your data.
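To make the idea of a controlled vocabulary concrete, here is a minimal sketch that maps free-text subject entries onto preferred terms and flags anything unrecognized. The tiny vocabulary below is invented for illustration; a real project would draw its terms from an established source such as the thesaurus or authority list named above.

```python
# Hypothetical controlled vocabulary: preferred term -> accepted variants.
VOCABULARY = {
    "Plant anatomy": {"plant anatomy", "anatomy, plant", "phytotomy"},
    "Botany": {"botany", "plant science", "plant biology"},
}


def normalize_subject(raw):
    """Return the preferred term for `raw`, or None if it is not recognized."""
    key = raw.strip().lower()
    for preferred, variants in VOCABULARY.items():
        if key == preferred.lower() or key in variants:
            return preferred
    return None


def check_subjects(subjects):
    """Split free-text subjects into (preferred terms, unrecognized entries)."""
    accepted, rejected = [], []
    for raw in subjects:
        term = normalize_subject(raw)
        if term is None:
            rejected.append(raw)
        elif term not in accepted:  # skip duplicates of a preferred term
            accepted.append(term)
    return accepted, rejected


accepted, rejected = check_subjects(
    ["plant science", " Phytotomy ", "leafy stuff", "Botany"]
)
print(accepted)  # ['Botany', 'Plant anatomy']
print(rejected)  # ['leafy stuff']
```

Rejected entries are returned rather than silently dropped so that a person can decide whether to correct them or propose a new term.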
For metadata consultation services at Penn State University Libraries, contact:
Kevin Clair - Metadata Librarian, Cataloging and Metadata Services kmc35@psu.edu 814-865-2257
An example of a standardized approach to describing botany data can be seen in the Swingle Plant Anatomy Collection, particularly in its data dictionary (a guide to the types of data in a database), which is based on Dublin Core.
Another example is a sample GenBank record.
Close attention to storage, back-up, security, and sustainability of your data means you lessen the risks of compromising their quality and accessibility over the long term. This is also a part of data management that likely will entail collaborating with IT staff in your department or campus unit. For guidance on this, see "Questions to Ask As You Prepare a Grant Proposal," from Stanford University.
Issues related to storage include considering how rapidly data are expected to increase over the lifetime of the research project. Part of answering this question involves determining whether data will be collected in automated ways, which potentially steps up the scale of data collection, or whether staff on the project will be gathering data themselves (e.g., via inputting in a database, or a lab notebook). Options for short-term storage include hard disk drives and portable media (e.g., DVDs and CDs).
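A back-of-the-envelope estimate can help frame the growth question. The sketch below assumes linear growth and a fixed number of full copies; the rates shown are invented purely to contrast manual entry with automated collection, and real projects may grow unevenly.

```python
def projected_storage_gb(initial_gb, gb_per_month, months, copies=2):
    """Estimate total storage needed at the end of a project.

    Assumes linear growth and `copies` full copies of the data
    (for example, one working copy plus one backup).
    """
    if months < 0 or copies < 1:
        raise ValueError("months must be >= 0 and copies must be >= 1")
    return (initial_gb + gb_per_month * months) * copies


# Hypothetical three-year project, comparing two collection methods.
manual = projected_storage_gb(initial_gb=1, gb_per_month=0.5, months=36)
automated = projected_storage_gb(initial_gb=1, gb_per_month=40, months=36)
print(manual)     # 38.0
print(automated)  # 2882
```

Even with made-up numbers, the gap between the two results shows why the collection method should be settled before choosing a storage option.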
Penn State offers a range of server storage options - some free of charge, others fee-based. These services are generally for short-term storage. They are not recommended for long-term archiving or curation of digital data and content. Contact the Research Data Management Services Team for guidance on archiving and other curation issues.
All Penn State students, faculty, and staff can apply for personal Web space on the Personal server. Students, faculty and staff who request Personal Web space will get a www folder/directory added to their respective PASS folders. Space allocations for this service no longer apply due to the 10GB maximum quota available to PASS users.
TSM is a fee-based service. It acts as a file backup and archive server for the disk drives of any workstation or personal computer connected to the Internet. TSM runs as a server on the IBM RS/6000 SP under the AIX operating system. In addition, TSM supports 25 different platforms as clients and offers disaster recovery and Hierarchical Storage Management (HSM).
TSM is available to Penn State faculty, staff, and departments.
Many academic departments, units, and schools/colleges at Penn State also have storage options - check with your affiliated department or unit. Penn State colleges, departments, and official units are welcome to purchase ITS Web space. An ITS Charge Account (formerly a P-Account) must be established for charging purposes.

Ownership is key: It is important for researchers to understand the relevant ownership rules for any data that they collect or use. From an ethical standpoint, researchers should consider the implications of data ownership agreements before they are made with other researchers, institutions, or funding agencies.
Typically, when research is funded by federal or nonprofit granting agencies, the data are owned by the institution receiving the grant. The primary researcher or scholar receiving the grant has the responsibility for storage and maintenance of the data, including the protection of confidential or sensitive information.
Data obtained through research supported by private or corporate funding, however, may have different guidelines for ownership and restrictions on sharing. This issue is further complicated when organizations such as universities patent data sets.
Scholars and researchers have a moral and professional responsibility to ensure that confidential or sensitive data are stored and released in a way that protects research participants. For example, the “Privacy Rule” of the federal Health Insurance Portability and Accountability Act (HIPAA) advises on maintaining confidentiality for research data that come from health care records; HIPAA calls for specifications of data handling responsibilities and privileges.
Data that include confidential or sensitive information, if properly cleaned, can still be shared by following certain guidelines:
Data must be archived in a controlled, secure environment in a way that safeguards the primary data, observations, or recordings. The archive must be accessible by scholars analyzing the data, and available to collaborators or others who have rights of access. Primary research data should be stored securely for sufficient time following publication, analysis, or termination of the project. The number of years that data should be retained varies from field to field and may depend on the nature of the data and the research.
Sustainable data management is crucial to the value of research and to ensuring continued scholarship. Typically, data storage involves an access copy, for use, and an archival copy, kept for preservation and back-up purposes. The importance of backing up data cannot be overemphasized, because natural disasters and breakdowns in systems and software cannot be predicted. Back up your data early and often!
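One way to keep an archival copy alongside the access copy is to copy the file and then confirm, by checksum, that the two are identical. The following is a minimal sketch using only Python's standard library; the file name and directory layout are hypothetical, and a real archival copy should live on separate hardware or at a separate location rather than next to the original.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_archival_copy(access_path, archive_dir):
    """Copy a file into `archive_dir` and confirm the copy is bit-identical."""
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / Path(access_path).name
    shutil.copy2(access_path, target)  # copy2 also preserves timestamps
    if sha256_of(access_path) != sha256_of(target):
        raise IOError(f"checksum mismatch for {target}")
    return target


# Demonstration in a throwaway directory; the file name is hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    access = Path(tmp) / "observations.csv"
    access.write_text("site,count\nA,12\nB,7\n")
    archived = make_archival_copy(access, Path(tmp) / "archive")
    copies_match = sha256_of(access) == sha256_of(archived)
    print(archived.name, copies_match)
```

Recording the checksum at the time of copying also makes it possible to detect later corruption of the archival copy by recomputing and comparing the digest.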
Choosing data formats and software depends mostly on the preference of the researcher but can often be dictated by discipline-specific standards and customs. Ensuring the long-term usability and sustainability of data requires attention to standard, interchangeable software; the UK Data Archive also lists Preferred Formats for data creation and preservation.
For more information about selecting data formats and software with respect to sustainability, see "Sustainable Data Formats" (University of Wisconsin-Madison).