How to Cite Data: Key Components

The purpose of citations is to enable others to find the same sources you used. Data are like any other source and should be cited in your bibliography and your writing.

Why Cite Data?

When you collect your own data, citing their location makes it possible for others to find them and extend your research, which raises your profile as a researcher. ICPSR provides a good overview of the importance of data citation:

"Citing data files in publications based on those data is important for several reasons:

  • Other researchers may want to replicate research findings and need the bibliographic information provided in citations to identify and locate the referenced data.
  • Citations appearing in publication references are harvested by key electronic social sciences indexes, such as Web of Science, providing credit to the researchers.
  • Data producers, funding agencies, and others can track citations to specific collections to determine types and levels of usage, thus measuring impact."

If you're using data you didn't gather yourself, citing your source is just as important as citing your other research sources. For other scholars to be able to examine and extend your work, they must be able to find the original data.

Consequently, although most style guides do not include examples for citing data, consider the key components and other possible elements described in this guide and work them into the citation style you are using.

Citing Scraped Data

Note that the elements provided here all refer to datasets that have been either published in some way or deposited in a repository. It is more difficult to cite data that have not been preserved or fixed in some way.

If you plan to scrape data, FIRST CONTACT DIGITAL RESEARCH SERVICES to be sure you are not violating the license terms under which we operate. You will also need to determine whether copyright and licensing terms allow you to preserve and/or share the data you obtain in this manner.

Once you are sure you have permission to scrape, preserve and/or share, make a plan for how to share this information with other researchers.

You may want to

  1. Deposit and cite the data you scraped, and
  2. Deposit the script(s) you used to scrape them in figshare or Zenodo, and cite them. (Both of these repositories can assign Digital Object Identifiers (DOIs) to both software [i.e., scripts] and datasets, making them easier and more reliable to cite.)

If you are scraping web pages (as opposed to database content), you should cite a list of all the URLs you scraped. You may also wish to make sure all scraped pages are archived by the Wayback Machine so that they remain accessible in the form you encountered, even if the live pages change later.
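One way to keep such a URL list citable is to record it in a small manifest file that can be deposited alongside the scraped data. The sketch below is only an illustration: the function names, the column names, and the CSV layout are assumptions, not a standard. The `https://web.archive.org/save/` prefix is the Wayback Machine's "Save Page Now" address as commonly used; the code only builds those addresses and does not contact the Internet Archive, so check its current documentation and terms before sending requests.

```python
import csv
import io
from datetime import datetime, timezone

# Assumed prefix for the Wayback Machine "Save Page Now" feature.
WAYBACK_SAVE_PREFIX = "https://web.archive.org/save/"


def build_manifest(urls, accessed=None):
    """Return one record per unique URL, in the order first seen.

    `accessed` should be a timezone-aware UTC datetime; if omitted,
    the current UTC time is used.
    """
    accessed = accessed or datetime.now(timezone.utc)
    stamp = accessed.strftime("%Y-%m-%dT%H:%M:%SZ")
    seen = set()
    rows = []
    for url in urls:
        url = url.strip()
        if not url or url in seen:
            continue  # skip blank lines and duplicates
        seen.add(url)
        rows.append({
            "url": url,
            "accessed_utc": stamp,
            # Address to visit (or request) to ask for a snapshot.
            "wayback_save_request": WAYBACK_SAVE_PREFIX + url,
        })
    return rows


def manifest_to_csv(rows):
    """Serialize the manifest records to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["url", "accessed_utc", "wayback_save_request"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    scraped = ["https://example.org/a", "https://example.org/b"]
    print(manifest_to_csv(build_manifest(scraped)))
```

The resulting CSV could be deposited with the dataset and scripts so that the citation points to a fixed, complete list of sources.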

Thanks to Sebastian Karcher of the Qualitative Data Repository for much of this advice.

Key Components of a Data Citation

Elements and their descriptions:

  • Author: The original researcher(s) who collected the data
  • Study name/Title: What did the original researcher call it?
  • Producer: The organization that sponsored the research, usually the author's institution. This takes the place of a publisher in an ordinary citation, so be prepared to list the place of publication as well. It may be useful to add a designation like [producer] if it is not actually a publisher.
  • Year Data Produced: When did the Producer first release the data? Treat this like the publication date.

Other Possible Elements

Elements and their descriptions:

  • Unique Identifier, like a Digital Object Identifier (DOI): If you got the data from a repository like ICPSR, note their unique identifier as part of the title. If the data file has a DOI, include it as you would a URL for a web site. Check here for information on how to obtain a DOI.
  • Distributor: The organization that makes the data available. From what organization did you get it? If directly from the author, listing the author's institution/organization once (as the publisher) is sufficient. However, if the distributor is different from the producer, it's important to list it separately; it may be useful to add a designation like “[distributor]” to clarify its role.
  • Year Data Collected: When did the original researcher collect the data? You may choose how specific to be: it may only be important to list the years, or you may want to provide more specific date ranges if it would be important for subsequent users to know the periodicity (months, weeks, days, etc.).
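To show how the components above might be combined, here is a minimal sketch that assembles them into a single citation string. The `DataCitation` class, its field names, and the output order are assumptions made for illustration; they do not follow any particular style guide, so adapt the order and punctuation to the style you are using. The sample values are placeholders, not a real dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Hypothetical container for the elements described in this guide."""
    author: str
    title: str
    producer: str
    year_produced: int
    distributor: Optional[str] = None
    doi: Optional[str] = None

    def format(self) -> str:
        """Join the elements into one string (illustrative layout only)."""
        parts = [
            f"{self.author} ({self.year_produced}).",
            f"{self.title}.",
            f"{self.producer} [producer].",
        ]
        # List the distributor separately only when it differs from the producer.
        if self.distributor and self.distributor != self.producer:
            parts.append(f"{self.distributor} [distributor].")
        # Present the DOI as a resolvable link, as you would a URL.
        if self.doi:
            parts.append(f"https://doi.org/{self.doi}")
        return " ".join(parts)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    citation = DataCitation(
        author="Doe, Jane",
        title="Example Household Survey",
        producer="Example University",
        year_produced=2020,
        distributor="Example Data Archive",
        doi="10.1234/example",
    )
    print(citation.format())
```

Keeping the elements as separate fields makes it easier to reorder them later if a journal or instructor requires a different style.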

  • Last Updated: Jan 5, 2021 9:19 AM
  • URL: https://guides.lib.unc.edu/citedata