Public datasets

Explore 3.6 PB of genomics data across 4.4M files

r2://genomics-data

Data in various genomics file formats (BAM, VCF, BED, etc.) from multiple sequencing technologies.

s3://human-pangenomics

Sequencing data and analysis of 10 trios, as well as the first complete human genome assembly.

https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/

Reference data from several sequencing technologies. Used as ground truth for benchmarking.

s3://1000genomes

Sequencing data and analysis of >2,500 individuals from around the world.

gs://deepvariant

Sample data used for testing and benchmarking the DeepVariant variant caller.

gs://broad-public-datasets

Sample datasets from the Broad Institute for testing bioinformatics workflows.

s3://genomeark

Data from the Vertebrate Genomes Project (VGP), featuring reference genomes for vertebrate species.

s3://human-microbiome-project

Microbiome data from 300 healthy adults and from individuals with several disease conditions.

s3://threatenedspecies

Sequencing datasets and reference genomes of several threatened Australasian species.

s3://3kricegenome

Sequencing data and analysis of >3,000 rice varieties from 89 countries.

s3://gatk-test-data

Test datasets for the GATK variant-calling toolkit, with WGS, WES, and RNA-seq data.

s3://avidity-manuscript-data

Data from the Element Biosciences manuscript describing avidity sequencing.

s3://ont-open-data

Oxford Nanopore benchmarking datasets from various sequencing chemistries and samples.

s3://d3b-openaccess-us-east-1-prd-pbta

Analysis of pediatric brain tumors: gene expression, gene fusions, somatic mutations, CNVs, and SVs.

Custom Public Dataset

Access another public bucket by specifying its provider and bucket name.
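Most datasets above are identified by a bucket URI whose scheme names the storage provider (s3, gs, r2), while a custom public bucket is identified by a provider and a bucket name. As a minimal sketch, assuming the provider corresponds to the URI scheme and the bucket to the part after `//`, the Python below splits such a URI. The function name `split_bucket_uri` and the `SUPPORTED_SCHEMES` table are hypothetical illustrations, not part of this tool.

```python
from typing import Tuple
from urllib.parse import urlparse

# Hypothetical table of URI schemes seen in the catalog above; the provider
# values actually accepted for a custom dataset may differ.
SUPPORTED_SCHEMES = {
    "s3": "Amazon S3",
    "gs": "Google Cloud Storage",
    "r2": "Cloudflare R2",
}


def split_bucket_uri(uri: str) -> Tuple[str, str, str]:
    """Split e.g. 's3://1000genomes/phase3' into (scheme, bucket, key prefix).

    Raises ValueError for schemes not in SUPPORTED_SCHEMES (such as https)
    or when no bucket name is present.
    """
    parsed = urlparse(uri)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    if not parsed.netloc:
        raise ValueError(f"missing bucket name in {uri!r}")
    # urlparse places the bucket in netloc and the object key prefix in path.
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")
```

For example, `split_bucket_uri("s3://1000genomes")` yields `("s3", "1000genomes", "")`. The GIAB entry is an HTTPS FTP mirror rather than a bucket, so this sketch rejects it with a `ValueError`.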