Describing Data with Metadata
Often described simply as “data about data,” metadata helps others find and understand your research, much as keywords help Google find and return a website. More precisely, metadata is “structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource” (NISO, 2004).
Chances are that you are already producing experiment-specific metadata in the course of your research. Instrumentation data often captures when and where a sample was analyzed; codebooks often describe how qualitative research has been collected, coded, and counted.
The challenge for many researchers lies in capturing project-wide metadata and putting it into a format that both computers and humans can find quickly and read easily. Below, we describe some general points to consider when designing your research protocols. They will help you create and manage metadata much more easily, and can also help you write a data management plan where funding agencies require one.
Guiding Questions
- What metadata have you already created, or will you create, without even trying? If you are using electronic instruments, survey collection tools, data dictionaries or qualitative research analysis software in the course of your research, you may already have created a great deal of experiment-specific metadata. Project-specific metadata can be found in grant proposals and codebooks.
- What research communities will expect to see your data? What are the metadata standards for that community? Metadata standards already exist in many research communities: Darwin Core for Biology; DDI for Social Science; CSDGM (FGDC) for geospatial data; TEI for the Humanities. You can find a comprehensive list of metadata standards on Wikipedia.
- How can you describe your data in a way that will help others find it? Information on the web is usually found through keyword associations and search engine optimization, so the more fully you describe your data, the more discoverable it will be. Consider assigning keywords or subject headings to your data that are relevant to others in your field. Using a controlled vocabulary (e.g., MeSH, the Getty Thesaurus of Geographic Names®) can also help when deciding how to describe data.
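As a rough illustration of the controlled-vocabulary idea, the sketch below maps free-text keywords onto preferred subject headings. The `PREFERRED_TERMS` table and the `normalize_keywords` function are hypothetical names invented for this example. The table is a tiny hand-made excerpt in the style of MeSH headings; it is not loaded from MeSH itself, and a real project would draw on the full vocabulary published by its maintainers.

```python
# Hypothetical, hand-made excerpt of a controlled vocabulary:
# free-text variants (lowercase) -> preferred subject heading.
PREFERRED_TERMS = {
    "cancer": "Neoplasms",
    "tumor": "Neoplasms",
    "tumour": "Neoplasms",
    "heart attack": "Myocardial Infarction",
}


def normalize_keywords(keywords, vocabulary=PREFERRED_TERMS):
    """Split keywords into (controlled terms, unmatched free-text keywords).

    Matching is case-insensitive and ignores surrounding whitespace.
    Duplicate controlled terms are collapsed; input order is preserved.
    """
    controlled, unmatched = [], []
    for keyword in keywords:
        cleaned = keyword.strip()
        term = vocabulary.get(cleaned.lower())
        if term is None:
            # Keep unmatched keywords so a person can review them later.
            unmatched.append(cleaned)
        elif term not in controlled:
            controlled.append(term)
    return controlled, unmatched
```

Calling `normalize_keywords(["Cancer", " tumour ", "zebrafish"])` collapses the first two keywords into a single heading and leaves "zebrafish" in the unmatched list for manual review. Keeping the unmatched keywords, rather than discarding them, is a deliberate choice here: free-text terms can still help discovery even when they are absent from the vocabulary.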
Best Practices
- Make metadata central to your study design or research project. Adding metadata after the fact is expensive and time-consuming.
- Use existing metadata standards and controlled vocabularies where possible.
- At the very least, supply these basic metadata elements, which map roughly onto the Dublin Core elements Creator, Title, Format, and Description (or onto related elements in other metadata schemes): Creator name(s), Title of dataset, File Information (what programs are needed to open and work with the data), and Methodology.
- To increase discoverability (and to comply with grant requirements), also supply these suggested metadata: Identifier(s) (DOIs, PDB IDs, etc.), External URI (if copies of this dataset exist in other databases or websites), Coverage/Creation Dates, and Grant Number(s).
- Consider supplying preservation metadata (e.g., technical specifications or MD5 checksums) in addition to the general metadata that describes the data set.
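The minimum and suggested elements listed above can be captured in a small machine-readable record stored alongside the data. The following is a minimal sketch, not a standard-compliant implementation: the function names `md5_checksum` and `build_record`, the key names (`creator`, `file_information`, `preservation`, and so on), and the choice of JSON are all assumptions made for illustration. A real project would map these fields onto the metadata standard used by its research community.

```python
import hashlib
import json
import tempfile
from pathlib import Path

# Optional fields suggested above; the key names are illustrative assumptions.
OPTIONAL_FIELDS = {"identifier", "external_uri", "coverage_dates", "grant_numbers"}


def md5_checksum(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_record(data_path, creator, title, file_information, methodology, **optional):
    """Build a metadata record (a dict) for one data file.

    The four positional fields are the minimum elements; keyword arguments
    must come from OPTIONAL_FIELDS. Preservation metadata is computed from
    the file itself.
    """
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"Unknown metadata fields: {sorted(unknown)}")
    data_path = Path(data_path)
    record = {
        "creator": list(creator),
        "title": title,
        "file_information": file_information,
        "methodology": methodology,
        "preservation": {
            "file_name": data_path.name,
            "size_bytes": data_path.stat().st_size,
            "md5": md5_checksum(data_path),
        },
    }
    record.update(optional)
    return record


if __name__ == "__main__":
    # Demonstration with a throwaway file; all values are placeholders.
    with tempfile.TemporaryDirectory() as folder:
        sample = Path(folder) / "sample.csv"
        sample.write_text("id,value\n1,3.14\n", encoding="utf-8")
        record = build_record(
            sample,
            creator=["A. Researcher"],
            title="Placeholder dataset",
            file_information="Plain CSV; opens in any spreadsheet program",
            methodology="Placeholder description of how the data were collected",
            grant_numbers=["PLACEHOLDER-0000"],
        )
        print(json.dumps(record, indent=2))
```

Computing the checksum when the record is created means that anyone who later receives the file can recompute it and confirm that the data have not changed. MD5 is used here only because the text names it; it detects accidental corruption but should not be relied on to detect deliberate tampering.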
Tools and Resources
- Wikipedia: Metadata standards
- Digital Curation Centre: Metadata
- Digital Curation Centre: Scientific Metadata
- Colorado Clinical and Translational Sciences Institute: Keep Well-Documented Data Dictionaries
- JISC Digital Media: Controlling your Language: A Directory of Metadata Vocabularies [Controlled vocabularies]
- Data to Insight Center (IU): XML Metadata Concept Catalog (XMC Cat) - An open source web service that helps manage the adaptability and interoperability of metadata schema and provides support for automatic capture of metadata.






