Molecular Biology Data Sources: Sequence Data
Created by Health Science Librarians
NCBI General Resources
Search Tools
Submission Information
NCBI Submission Portal: NCBI's submission portal provides information on how to submit data to its databases, organized according to the type of data being submitted: sequence, project, biological materials, microarray, manuscript or clinical
Submit Sequence Data to NCBI: Information on how to submit specific types of sequence data
GenBank
Access GenBank
What is it?
Annotated collection of all publicly available DNA sequences
Part of International Nucleotide Sequence Database Collaboration
New release every 2 months
Access
Public access
NCBI places no restrictions on use or distribution of data
Submitters may place patent, copyright or other intellectual property rights restrictions on all or a portion of data
How to Retrieve Data
Search for sequence identifiers and annotations with Nucleotide
Search and align sequences to query sequence with BLAST
Search, link and download sequences using NCBI E-utilities
Download sequence records in flat file format via FTP. Full contents are described in the README.genbank file on GenBank's FTP server
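As a rough illustration of the E-utilities option listed above, the sketch below builds an EFetch request URL that would return one nucleotide record as FASTA text. The helper name efetch_url and the example accession are assumptions made for illustration and do not come from this guide; check the E-utilities documentation for current parameters and usage limits before relying on it.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities EFetch endpoint.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore", rettype="fasta"):
    """Return an EFetch URL for one record (hypothetical helper)."""
    params = {
        "db": db,            # Entrez database to query; nuccore = Nucleotide
        "id": accession,     # accession or UID of the record
        "rettype": rettype,  # record format, e.g. "fasta" or "gb"
        "retmode": "text",   # plain-text response
    }
    return EFETCH_BASE + "?" + urlencode(params)


# Accession chosen for illustration only; no network request is made here.
example_url = efetch_url("NM_000546")
```

The resulting URL can be opened in a browser or passed to any HTTP client. Building the URL separately from fetching it keeps the sketch free of network dependencies.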
Submissions
Accepts mRNA or genomic sequence data determined directly by submitter
Several options for submitting data to GenBank
Will, upon request, withhold release of new submissions for specified period of time
Database of Short Genetic Variations (dbSNP)
Access dbSNP
What is it?
Archive and repository for short genetic variations, including:
Single nucleotide polymorphisms
Small-scale multi-base insertions or deletions
Short tandem repeats
Integrates genetic variation and clinical data in collaboration with locus-specific databases and diagnostic laboratories
Two major classes of content:
Submitted: original observations of sequence variation
Computed: generated during SNP build cycle
Each entry includes:
Sequence surrounding polymorphism
Occurrence frequency of polymorphism
Experimental methods used to assay the variation
Access
How to Retrieve Data
Searching
Where to start
Multiple search options are available:
Data available for download via FTP from the dbSNP homepage
Submissions
Accepts submissions from all organisms, including prokaryotes
Loose definition of SNPs: no requirement about minimum allele frequencies
Large-scale insertions, deletions, etc. should be submitted to dbVar
Need handle assignment from NCBI prior to submission
For more information, visit Submissions to dbSNP
Database of Genomic Structural Variation (dbVar)
Access dbVar
What is it?
Database for large-scale genomic variants: insertions, deletions, translocations, inversions
Accepts data from all species and clinical data
Access
How to Retrieve Data
Enter search terms into dbVar search box on homepage
Search will return studies and variants
Refine search using Limits or Advanced Search
For more information, visit dbVar Entrez Search Help
Data available via FTP on a per study and per assembly basis
Submissions
Variants longer than 50 base pairs should be submitted to dbVar
Accepts submissions from all organisms
Accepts human clinical studies
Complete information on how to submit data available on the dbVar homepage under "Submitting Data"
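The size guidance in this guide (short variations go to dbSNP, variants longer than 50 base pairs go to dbVar) can be sketched as a small routing check. The function name suggested_database is hypothetical, and sending a variant of exactly 50 base pairs to dbSNP is an assumption read from the "> 50" wording; confirm borderline cases with NCBI before submitting.

```python
# Size threshold taken from this guide: variants longer than 50 base pairs
# are recommended for dbVar; shorter variations belong in dbSNP.
DBVAR_MIN_LENGTH_BP = 50


def suggested_database(variant_length_bp):
    """Suggest a submission target for a variant of the given length.

    Hypothetical helper; exactly 50 bp is routed to dbSNP by assumption.
    """
    if variant_length_bp <= 0:
        raise ValueError("variant length must be a positive number of base pairs")
    if variant_length_bp > DBVAR_MIN_LENGTH_BP:
        return "dbVar"
    return "dbSNP"
```

For example, a single nucleotide polymorphism (length 1) is routed to dbSNP, while a 2,000 base pair deletion is routed to dbVar.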
Sequence Read Archive (SRA)
Access SRA
What is it?
Repository for raw sequencing data from next-generation sequencing technologies, including:
Roche 454 GS System®
Ion Torrent®
Illumina Genome Analyzer®
SOLiD®
Helicos HeliScope®
Complete Genomics®
Goals for SRA
Provide central repository for next-generation sequencing data
Provide links to other resources using this data
Provide retrieval based on ancillary information and sequence comparison
Track studies and experiments
Separate submission from content
Access
How to Retrieve Data
Searching
Multiple options for downloading data
Via Aspera or FTP from SRA homepage
Run Browser
Download data from one or more runs in FASTA and FASTQ format
Under "Browse" tab on SRA homepage
Individual level data will require controlled access to dbGaP
For more information, visit SRA Download Guide
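Once run data has been downloaded in FASTQ format, each read can be processed as a four-line record: identifier, sequence, separator and quality string. The sketch below is a minimal reader, assuming unwrapped sequences and no blank lines between records; the function name read_fastq and the sample record are made up for illustration and are not part of the SRA tools.

```python
import io


def read_fastq(handle):
    """Yield (identifier, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file (or a blank line, which this sketch treats as the end)
        sequence = handle.readline().rstrip("\r\n")
        handle.readline()  # separator line beginning with '+', ignored here
        quality = handle.readline().rstrip("\r\n")
        # Basic sanity checks: header marker and one quality score per base.
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError("malformed FASTQ record: " + header)
        yield header[1:], sequence, quality


# Made-up record for illustration; real SRA downloads are far larger.
example = io.StringIO("@read1\nACGT\n+\nIIII\n")
records = list(read_fastq(example))
```

Because the reader is a generator, it handles one record at a time and does not load a whole file into memory, which matters for the large files typical of sequencing runs.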
Using data
The SRA Software Development Kit (SDK) provides application programming interfaces (APIs) for accessing and manipulating larger quantities of data
On SRA Homepage, under "Software"
Submissions
Accepts primary sequencing data from next-generation sequencing platforms
Submission tools available under "Submit" tab on SRA homepage
For more information, visit Submitting to the SRA
Protein
Access Protein
What is it?
Collection of protein sequences
Source of sequences
GenPept sequences: translations of annotated coding regions in GenBank
RefSeq database: curated database of genomic DNA, transcripts (RNA) and protein products
Third Party Annotation (TPA) database: database of sequences derived from primary sequences
UniProtKB/Swiss-Prot
Protein Research Foundation
Protein Data Bank
Access
How to Retrieve Data
Searching
Enter keywords into search box on Protein homepage
Can limit search to records within a particular component database, gene location, publishing or modification date
Search and align sequences using BLAST
Select 'Protein BLAST' under Basic BLAST heading
Additional tools available on Protein homepage
Multiple options available for downloading data
Download directly to a file on your computer by selecting 'Send To' in the upper right-hand corner of the results page
Via FTP from:
GenBank
RefSeq
BLAST
Links to these resources found on Protein homepage
Submissions
Protein sequences alone are not accepted; they must be accompanied by a nucleotide sequence (DNA or RNA), which can be submitted through GenBank
For more information, see the GenBank section of this guide