Public datasets

Explore 5.7 PB of genomics data across 4.9M files

Sample Genomics Data

Data in various genomics file formats (BAM, VCF, BED, etc.) and from various sequencing technologies. (5.15 GB)

Human Pangenome Project

Sequencing data and analysis of 10 trios. First complete human genome assembly. (3.5 PB)

Genome in a Bottle

Reference data from several sequencing technologies. Used as ground truth for benchmarking.

1000 Genomes Project

Sequencing data and analysis of >2,500 individuals from around the world. (766 TB)

Bio Data Zoo

Example genomics data for tool developers. (619 kB)

DeepVariant Datasets

Sample data used for testing and benchmarking the DeepVariant variant caller. (4.74 TB)

Broad Public Datasets

Sample datasets from the Broad Institute for testing bioinformatics workflows. (4.09 TB)

Genome Ark

Data from the Vertebrate Genomes Project (VGP), featuring reference genomes for vertebrate species. (1.1 PB)

Human Microbiome Project

Microbiome data from 300 healthy adults and several individuals with disease conditions. (5.86 TB)

Australasian Genomes

Sequencing datasets and reference genomes of several threatened Australasian species. (8.38 TB)

3000 Rice Genomes

Sequencing data and analysis of >3,000 rice varieties from 89 countries. (248 TB)

GATK Test Data

Test datasets for the GATK variant caller, with data from WGS, WES, and RNA-seq. (1.05 TB)

Element Bio Data

Data from the Element Bio manuscript about the Avidity instrument. (535 GB)

Nanopore Data

Oxford Nanopore benchmarking datasets from various sequencing chemistries and samples. (91.6 TB)

Pediatric Brain Tumor Atlas

Analysis of pediatric brain tumors: gene expression, gene fusions, somatic mutations, CNVs, and SVs. (2.98 TB)

To feature your public bucket on 42basepairs, please reach out to us!