Public datasets

Explore 5 PB of genomics data across 4.8M files

Sample Genomics Data

Data in various genomics file formats (BAM, VCF, BED, etc.) and from various sequencing technologies. (5.15 GB)

Human Pangenome Project

Sequencing data and analysis of 10 trios. First complete human genome assembly. (3 PB)

Genome in a Bottle

Reference data from several sequencing technologies. Used as ground truth for benchmarking.

1000 Genomes Project

Sequencing data and analysis of >2,500 individuals from around the world. (766 TB)

DeepVariant Datasets

Sample data used for testing and benchmarking the DeepVariant variant caller. (4.71 TB)

Broad Public Datasets

Sample datasets from the Broad Institute for testing bioinformatics workflows. (4.14 TB)

Genome Ark

Data from the Vertebrate Genomes Project (VGP), featuring reference genomes for vertebrate species. (918 TB)

Human Microbiome Project

Microbiome data from 300 healthy adults and several individuals with disease conditions. (5.86 TB)

Australasian Genomes

Sequencing datasets and reference genomes of several threatened Australasian species. (7.49 TB)

3000 Rice Genomes

Sequencing data and analysis of >3,000 rice varieties from 89 countries. (248 TB)

GATK Test Data

Test datasets for the GATK variant caller, with data from WGS, WES, and RNA-seq. (1.05 TB)

Element Bio Data

Data from the Element Bio manuscript about the Avidity instrument. (535 GB)

Nanopore Data

Oxford Nanopore benchmarking datasets from various sequencing chemistries and samples. (90.6 TB)

Pediatric Brain Tumor Atlas

Analysis of pediatric brain tumors: gene expression, gene fusions, somatic mutations, CNVs, and SVs. (2.64 TB)
