Molecular Biology Data Sources: Genes and Expression

Created by Health Science Librarians

Gene Expression Omnibus (GEO)

Access GEO

What is it?

  • International repository for high-throughput functional genomic data
  • Stores original submitter records and curated DataSets
    • GEO Series: original submitter record that summarizes a study
    • GEO DataSets: collection of biologically and statistically comparable samples processed using the same platform
      • Not all Series have been assembled into a DataSet
      • DataSets form the basis of GEO's analysis features
  • GEO Profiles: Stores gene expression profiles derived from curated GEO DataSets
    • Presented as a chart that displays expression of one gene across all samples within a DataSet
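
The Series/DataSet distinction above is visible in GEO accession numbers: Series records carry a GSE prefix and curated DataSets carry a GDS prefix. The following is a minimal sketch, not part of GEO itself, that tells the two apart. The function name and the accession numbers in the usage note are hypothetical placeholders.

```python
import re

# Accession prefixes for the two record types described above.
GEO_RECORD_TYPES = {
    "GSE": "Series (original submitter record)",
    "GDS": "DataSet (curated, comparable samples)",
}


def classify_geo_accession(accession: str) -> str:
    """Return the record type for a GSE/GDS accession, or 'unknown'."""
    # Accept surrounding whitespace and lowercase input.
    match = re.fullmatch(r"(GSE|GDS)(\d+)", accession.strip().upper())
    if match is None:
        return "unknown"
    return GEO_RECORD_TYPES[match.group(1)]
```

For example, classify_geo_accession("GSE1234") reports a Series and classify_geo_accession("GDS1234") reports a DataSet. Because not all Series have been assembled into a DataSet, a GSE accession does not imply that a matching GDS record exists.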

Access

  • Public access, no login required
  • NCBI places no restrictions on use or distribution of data
  • Submitters may place patent, copyright or other intellectual property restrictions on all or a portion of data

How to Retrieve Data

  • GEO DataSets provides a study-centric view of data
    • Stores descriptions of Series and DataSets
    • Enter search terms into GEO DataSets search box on homepage
  • GEO Profiles provides a gene-centric view of data
    • Profiles can be searched using keywords, gene names, GenBank accession numbers and other attributes
    • Enter search terms into GEO Profiles search box on homepage
  • For information on advanced search techniques, visit Querying GEO DataSets and GEO Profiles
  • GEO DataSets and GEO Profiles are part of NCBI's Entrez cross-database search system
  • All data can be downloaded in several formats using a variety of mechanisms. For more information, visit Download GEO Data
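
Because GEO DataSets and GEO Profiles are Entrez databases, the same searches can also be expressed as NCBI E-utilities ESearch requests. The sketch below only builds the request URL and does not contact NCBI. It assumes the Entrez database names gds (GEO DataSets) and geoprofiles (GEO Profiles); the function name, the "datasets"/"profiles" labels, and the search terms are illustrative choices, not part of the guide.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Entrez database names for the two GEO views described above.
ENTREZ_DB = {
    "datasets": "gds",          # study-centric view
    "profiles": "geoprofiles",  # gene-centric view
}


def geo_search_url(view: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for a GEO DataSets or GEO Profiles query."""
    params = {
        "db": ENTREZ_DB[view],
        "term": term,
        "retmax": retmax,
        "retmode": "json",
    }
    # urlencode handles spaces and other reserved characters in the term.
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"
```

A study-centric search might use geo_search_url("datasets", "breast cancer"), while a gene-centric search might use geo_search_url("profiles", "TP53"). Consult the advanced search documentation mentioned above before relying on any particular query syntax.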

Submissions

  • Accepts several categories of high-throughput functional genomic data
  • For more information, visit Submitting Data and Frequently Asked Questions
  • Will, upon request, hold release of new submissions for specified period of time

Database of Genotypes and Phenotypes (dbGaP)

Access dbGaP

What is it?

  • Archives and distributes studies that investigate interaction between genotype and phenotype, including Genome Wide Association Study (GWAS) data
  • Contains 4 types of data:
    • Study documentation
    • Phenotypic data: individual level and summary
    • Genetic data
    • Statistical results

Access

  • Two levels of access: open and controlled
  • Open access
    • Public access
    • Non-sensitive data
    • Summaries of studies, descriptive statistics for phenotypic variables, original study document text
    • Data available as open access may vary between studies
  • Controlled
    • Managed through dbGaP Authorized Access portal
      • Submit and manage data access requests
      • Download approved data sets
    • Sensitive data involving personal health information
    • Individual level data that has been de-identified

How to Retrieve Data

  • Enter search terms into dbGaP search box or use Advanced features
  • Data access policies found on individual study pages
  • Open access data available via FTP from homepage
  • Controlled access data
    • Access determined by an NIH Data Access Committee (DAC) on a per-study basis
    • Submit Data Use Certification (DUC), through Authorized Access portal, to appropriate NIH DAC for approval
      • DUC requires name of Signing Official at requestor's institution
    • Once approved, download data through Authorized Access portal
    • For more information on applying for individual level data, review these Step-by-Step Instructions

Submissions

ClinVar

Access ClinVar

What is it?

  • Goal: Archive high quality variation and phenotype information
  • Aggregates information about sequence variation and its relationship to human health
  • Submitter-driven
    • ClinVar will integrate data for same genotype/phenotype combination and highlight conflicts
    • ClinVar has calculated values for clinical significance and is currently reviewing some cases
  • Anticipated use
    • Search DNA or protein location for what is known about sequence variations at the location
    • Review evidence about phenotype associated with allele

Access

  • Public access

How to Retrieve Data

  • Enter search terms into the search box on the ClinVar homepage. This preliminary release of ClinVar provides basic query and retrieval functions. Not all content within ClinVar is displayed on the website, although it is available via FTP (see below).
    • Browse via: variant, gene, location, submitter, and other categories
    • Refine search using Limits or Advanced Search
  • Data available for download via FTP from the ClinVar homepage.
  • For more on these and other access methods (e.g., the API), see Accessing and Using Data in ClinVar.
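
As one illustration of programmatic access, a gene-based ClinVar search can be written as an NCBI E-utilities ESearch request. This is a hedged sketch that only constructs the URL. It assumes the Entrez database name clinvar and the [gene] search field; the function name and the gene symbol in the usage note are illustrative, and the ClinVar documentation remains the authority on supported fields.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def clinvar_gene_search_url(gene_symbol: str, retmax: int = 20) -> str:
    """Build an ESearch URL for ClinVar records tagged with a gene symbol."""
    # Restrict the term to the gene field rather than searching all text.
    term = f"{gene_symbol.strip()}[gene]"
    params = {
        "db": "clinvar",
        "term": term,
        "retmax": retmax,
        "retmode": "json",
    }
    # Square brackets are percent-encoded by urlencode.
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"
```

For example, clinvar_gene_search_url("BRCA1") yields a URL whose term parameter is the encoded form of BRCA1[gene]. Fetching that URL returns record identifiers only; retrieving the records themselves is a separate step covered in the access documentation cited above.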

Submissions

Database of Major Histocompatibility Complex (dbMHC)

Access dbMHC

What is it?

  • Platform for Human Leukocyte Antigen (HLA) community to submit, edit, view and exchange DNA and clinical data related to the human major histocompatibility complex (MHC)
  • Integrated with International Histocompatibility Working Group (IHWG), whose projects include:
    • Hematopoietic cell transplantation: HLA genotype and clinical outcome data
    • Type I diabetes
    • Rheumatoid arthritis: HLA genotype, microsatellite, clinical and demographic data

Access

  • Public access
  • Can access as guest or member
  • Guest
    • Cannot submit data
    • Cannot edit existing data
    • Data not saved from session to session or frame to frame
    • Can download data from a session
  • dbMHC Member
    • On dbMHC homepage, select "Create an Account" under "Accounts"
      • Provide institutional information
      • Specify account administrator

How to Retrieve Data

  • Several resources available for accessing data:
    • Alignment viewer for HLA and related genes
    • MHC microsatellite database (dbMHCms)
    • Sequencing interpretation site for Sequence Based Typing (SBT)
    • Primer/Probe database
    • Typing Kit interface
  • Data available for download via FTP from dbMHC homepage
    • See "Download" in column labeled 'Resources'

Submissions

  • No direct submission mechanism, but will accept data
  • If interested in submission, contact dbMHC staff at dbMHC@ncbi.nlm.nih.gov
