
Molecular Biology Data Sources: Sequence Data

Created by Health Science Librarians

NCBI General Resources

Search Tools

Submission Information

  • NCBI Submission Portal: NCBI's submission portal provides information on how to submit data to its databases, organized according to the type of data being submitted: sequence, project, biological materials, microarray, manuscript or clinical
  • Submit Sequence Data to NCBI: Information on how to submit specific types of sequence data

GenBank

Access GenBank

What is it?

  • Annotated collection of all publicly available DNA sequences
  • Part of International Nucleotide Sequence Database Collaboration
  • New release every 2 months

Access

  • Public access
  • NCBI places no restrictions on use or distribution of data
  • Submitters may place patent, copyright or other intellectual property restrictions on all or part of their data

How to Retrieve Data

  • Search for sequence identifiers and annotations with Nucleotide
  • Search for and align sequences similar to a query sequence with BLAST
  • Search, link and download sequences using NCBI E-utilities
  • Download sequence records in flat file format via FTP
    • Full contents are described in the README.genbank file on GenBank's FTP server
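The E-utilities option above can be illustrated with a short Python sketch. It only builds request URLs for the ESearch (find record IDs) and EFetch (download records) endpoints and does not contact NCBI. The database name `nucleotide`, the `rettype=gb` GenBank flat-file format and the example query are illustrative choices; NCBI's E-utilities documentation should be checked for current parameters and usage limits.

```python
from urllib.parse import urlencode

# Base URL of the public NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that returns IDs of records matching a query."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def efetch_url(db: str, ids: list[str], rettype: str = "gb", retmode: str = "text") -> str:
    """Build an EFetch URL that downloads full records for the given IDs."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    # Illustrative query and accession; urlencode escapes spaces and brackets.
    print(esearch_url("nucleotide", "BRCA1[Gene] AND human[Organism]"))
    print(efetch_url("nucleotide", ["NM_007294"]))
```

Passing the second URL to an HTTP client such as `urllib.request.urlopen` would return the record as a GenBank flat file; setting `db` to `protein` and `rettype` to `fasta` follows the same pattern for protein records.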

Submissions

  • Accepts mRNA or genomic sequence data determined directly by the submitter
  • Several options for submitting data to GenBank
  • Upon request, will withhold release of new submissions for a specified period of time


Database of Short Genetic Variations (dbSNP)

Access dbSNP

What is it?

  • Archive and repository for short genetic variations, including:
    • Single nucleotide polymorphisms
    • Small-scale multi-base insertions or deletions
    • Short tandem repeats
  • Integrates genetic variation and clinical data in collaboration with locus-specific databases and diagnostic laboratories
  • Two major classes of content:
    • Submitted: original observations of sequence variation
    • Computed: generated during the dbSNP build cycle
  • Each entry includes:
    • Sequence surrounding polymorphism
    • Occurrence frequency of polymorphism
    • Experimental methods used to assay the variation
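The occurrence frequency listed for each entry is an allele frequency. As a minimal sketch, assuming a diploid organism and a biallelic site, it can be computed from genotype counts as p = (2 × n_AA + n_Aa) / (2 × N), where N is the number of individuals genotyped. The function name and the counts below are hypothetical, for illustration only.

```python
def allele_frequencies(hom_ref: int, het: int, hom_alt: int) -> tuple[float, float]:
    """Return (reference, alternate) allele frequencies from diploid genotype counts.

    Assumes a biallelic site: each homozygote carries two copies of one allele,
    and each heterozygote carries one copy of each allele.
    """
    total_alleles = 2 * (hom_ref + het + hom_alt)
    if total_alleles == 0:
        raise ValueError("no genotypes observed")
    ref_freq = (2 * hom_ref + het) / total_alleles
    alt_freq = (2 * hom_alt + het) / total_alleles
    return ref_freq, alt_freq


if __name__ == "__main__":
    # Hypothetical counts: 36 ref/ref, 48 ref/alt, 16 alt/alt individuals.
    ref, alt = allele_frequencies(36, 48, 16)
    print(f"ref={ref:.2f} alt={alt:.2f}")  # ref=0.60 alt=0.40
```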

Access

  • Public access

How to Retrieve Data

Submissions

  • Accepts submissions from all organisms, including prokaryotes
  • Loose definition of SNPs: no minimum allele frequency is required
    • Large-scale insertions, deletions, etc. should be submitted to dbVar
  • Submitters need a handle assigned by NCBI before submitting
  • For more information, visit Submissions to dbSNP


Database of Genomic Structural Variation (dbVar)

Access dbVar

What is it?

  • Database for large-scale genomic variants: insertions, deletions, translocations, inversions
  • Accepts data from all species and clinical data

Access

  • Public access

How to Retrieve Data

  • Enter search terms into dbVar search box on homepage
    • Search will return studies and variants
    • Refine search using Limits or Advanced Search
    • For more information, visit dbVar Entrez Search Help
  • Data available via FTP on a per-study and per-assembly basis

Submissions

  • dbVar recommends submitting variants larger than 50 base pairs to dbVar rather than dbSNP
  • Accepts submissions from all organisms
  • Accepts human clinical studies
  • Complete information on how to submit data available on the dbVar homepage under "Submitting Data"
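The size recommendation above (dbVar for variants larger than 50 base pairs, dbSNP for shorter ones) can be expressed as a small routing helper. This is a hypothetical sketch, not an NCBI tool: the function names are illustrative, and estimating event size from the reference and alternate allele strings is a simplifying assumption that ignores complex rearrangements.

```python
# Size threshold taken from the dbVar recommendation in this guide (> 50 bp).
DBVAR_THRESHOLD_BP = 50


def variant_length(ref: str, alt: str) -> int:
    """Approximate event size in base pairs from reference and alternate alleles.

    Simplifying assumption: for insertions and deletions the size is the length
    difference between the alleles; for equal-length substitutions it is the
    allele length.
    """
    difference = abs(len(ref) - len(alt))
    return difference if difference > 0 else len(ref)


def suggested_archive(ref: str, alt: str) -> str:
    """Return 'dbVar' for variants longer than the threshold, otherwise 'dbSNP'."""
    return "dbVar" if variant_length(ref, alt) > DBVAR_THRESHOLD_BP else "dbSNP"


if __name__ == "__main__":
    print(suggested_archive("A", "G"))             # single-base substitution -> dbSNP
    print(suggested_archive("A" + "T" * 60, "A"))  # 60 bp deletion -> dbVar
```

A variant of exactly 50 bp stays with dbSNP here because the guide's threshold is strictly "> 50 base pairs".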


Sequence Read Archive (SRA)

Access SRA

What is it?

  • Repository for raw sequencing data from next-generation sequencing technologies, including:
    • Roche 454 GS System®
    • Ion Torrent®
    • Illumina Genome Analyzer®
    • SOLiD®
    • Helicos HeliScope®
    • Complete Genomics®
  • Goals for SRA
    • Provide central repository for next-generation sequencing data
    • Provide links to other resources using this data
    • Provide retrieval based on ancillary information and sequence comparison
    • Track studies and experiments
    • Separate submission from content

Access

  • Public access

How to Retrieve Data

  • Searching
  • Multiple options for downloading data
    • Via Aspera or FTP from SRA homepage
    • Run Browser
      • Download data from one or more runs in FASTA and FASTQ format
      • Under "Browse" tab on SRA homepage
    • Individual-level data require controlled access through dbGaP
    • For more information, visit SRA Download Guide
  • Using data
    • The SRA Software Development Kit provides application programming interfaces (APIs) for accessing and manipulating large quantities of data
    • Available on the SRA homepage, under "Software"
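Runs downloaded in FASTQ form store each read as a header line starting with `@`, the bases, a separator line starting with `+`, and a quality string. The sketch below parses such text, assuming unwrapped 4-line records and Phred+33 quality encoding (used by Sanger and Illumina 1.8+ data); the `FastqRecord` type and `parse_fastq` function are hypothetical names, not part of any SRA software.

```python
from typing import NamedTuple


class FastqRecord(NamedTuple):
    name: str             # header text after the leading '@'
    sequence: str         # base calls
    qualities: list[int]  # per-base Phred scores


def parse_fastq(text: str, offset: int = 33) -> list[FastqRecord]:
    """Parse 4-line FASTQ records; qualities are decoded as ord(char) - offset."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence and quality lengths differ in {header!r}")
        records.append(FastqRecord(header[1:], seq, [ord(c) - offset for c in qual]))
    return records


if __name__ == "__main__":
    # Toy record for illustration, not real SRA data.
    for record in parse_fastq("@read1\nACGT\n+\nII#5\n"):
        print(record.name, record.sequence, record.qualities)  # read1 ACGT [40, 40, 2, 20]
```

Chunking by four lines, rather than searching for `@`, avoids misreading quality strings that happen to begin with `@`.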

Submissions

  • Accepts primary sequencing data from next-generation sequencing platforms
  • Submissions tools available under "Submit" tab on SRA homepage
  • For more information, visit Submitting to the SRA


Protein

Access Protein

What is it?

  • Collection of protein sequences
  • Source of sequences
    • GenPept sequences: translations of annotated coding regions in GenBank
    • RefSeq database: curated database of genomic DNA, transcripts (RNA) and protein products
    • Third Party Annotation (TPA) database: database of sequences derived from primary sequences
    • UniProtKB/Swiss-Prot
    • Protein Research Foundation
    • Protein Data Bank
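GenPept sequences are translations of annotated coding regions. The idea can be sketched as codon-by-codon translation with the standard genetic code (NCBI translation table 1). This is an illustration under that assumption only: real GenBank coding-region features may specify a different table (via the `/transl_table` qualifier) or a non-zero reading-frame start, which this hypothetical `translate` function ignores.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), in codon order TTT, TTC, TTA, TTG, TCT, ... GGG.
# '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate a coding sequence (DNA or RNA) into one-letter amino acid codes."""
    seq = cds.upper().replace("U", "T")  # accept RNA input
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for pos in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")  # 'X' for ambiguous codons such as NNN
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))                 # MA
    print(translate("ATGGCCTAA", to_stop=False))  # MA*
```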

Access

  • Public access

How to Retrieve Data

  • Searching
    • Enter keywords into search box on Protein homepage
      • Can limit search by component database, gene location, or publication or modification date
    • Search and align sequences using BLAST
      • Select 'Protein BLAST' under Basic BLAST heading
    • Additional tools available on Protein homepage
  • Multiple options available for downloading data
    • Download directly to a file on your computer by selecting 'Send To' in the upper right-hand corner of the results page
    • Via FTP from:
      • GenBank
      • RefSeq
      • BLAST
      • Links to these resources found on Protein homepage
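Protein records saved to a file are commonly written in FASTA format (assumed here): each record starts with a `>` header line followed by one or more sequence lines. The hypothetical `parse_fasta` function below is a minimal sketch that joins wrapped sequence lines and keeps records in file order, so duplicate headers are not silently overwritten.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA text, in file order."""
    records: list[tuple[str, str]] = []
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)  # wrapped sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Toy input for illustration, not a real Protein record.
    for name, seq in parse_fasta(">p1 toy protein\nMKT\nAYI\n>p2\nGG\n"):
        print(name, len(seq))
```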

Submissions

  • Protein sequences are not accepted on their own; they must be accompanied by a nucleotide sequence (DNA or RNA), which can be submitted through GenBank
  • For more information, see the GenBank section of this guide
