Public Datasets

Sample Genomics Data
r2://genomics-data

Data in various genomics file formats (BAM, VCF, BED, etc.) and from various sequencing technologies.

Human Pangenome Project
s3://human-pangenomics

Sequencing data and analysis of 10 trios. First complete human genome assembly.

Genome in a Bottle
https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/

Reference data from several sequencing technologies. Used as ground truth for benchmarking.

DNAstack COVID-19 Data
s3://dnastack-covid-19-sra-data

Sequencing data from COVID-19 samples, generated on Illumina, PacBio, and ONT platforms.

1000 Genomes Project
s3://1000genomes

Sequencing data and analysis of >2,500 individuals from around the world.

DeepVariant Datasets
gs://deepvariant

Sample data used for testing and benchmarking the DeepVariant variant caller.

Broad Public Datasets
gs://broad-public-datasets

Sample datasets from the Broad Institute for testing bioinformatics workflows.

Human Microbiome Project
s3://human-microbiome-project

Microbiome data from 300 healthy adults and from several individuals with disease conditions.

Australasian Genomes
s3://threatenedspecies

Sequencing datasets and reference genomes of several threatened Australasian species.

3000 Rice Genomes Project
s3://3kricegenome

Sequencing data and analysis of >3,000 rice varieties from 89 countries.

GATK Test Data
s3://gatk-test-data

Test datasets for GATK (the Genome Analysis Toolkit), with data from WGS, WES, and RNA-seq.


Custom Public Dataset

To access a public bucket that is not listed above, specify its provider and bucket name.
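
The provider and bucket pair can be read straight off the dataset URIs listed above. The following is a minimal sketch of that split, not the page's own logic: the function name split_bucket_uri and the provider labels in PROVIDERS are hypothetical, chosen for illustration, and only the s3, gs, and r2 schemes that appear in the list are handled.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to a provider label. The labels are
# assumptions for illustration, not values taken from the page's form.
PROVIDERS = {
    "s3": "Amazon S3",
    "gs": "Google Cloud Storage",
    "r2": "Cloudflare R2",
}


def split_bucket_uri(uri):
    """Split a bucket URI such as s3://1000genomes into (provider, bucket)."""
    parsed = urlparse(uri)
    if parsed.scheme not in PROVIDERS:
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    if not parsed.netloc:
        raise ValueError(f"no bucket name in {uri!r}")
    # For scheme://bucket/path URIs, the bucket name is the netloc component.
    return PROVIDERS[parsed.scheme], parsed.netloc


if __name__ == "__main__":
    for uri in ("s3://1000genomes", "gs://deepvariant", "r2://genomics-data"):
        provider, bucket = split_bucket_uri(uri)
        print(f"{uri} -> provider={provider}, bucket={bucket}")
```

The Genome in a Bottle entry is an HTTPS URL rather than a bucket URI, so this sketch rejects it with a ValueError instead of guessing a provider.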