42basepairs is a free tool for exploring public genomics datasets hosted on Amazon S3, Google Cloud Storage, and Cloudflare R2.
42basepairs is built by Robert Aboukhalil. Thanks to Maria Nattestad for advice and ideas on visualizing genomics data.
Preview genomics data (.bam, .vcf, .bed, .fa, .fastq, .gff, .wig, .bigWig, .bigBed, etc.) and non-genomics formats (.pdf, .md, .gz, .xls, etc.).
Download small, representative subsets of large data files.
Load files into the igv.js genome browser for further exploration.
Toggle Interactive Mode to see which command line tools were used for the preview.
To open a folder or a file preview in a new window, hold down the Cmd or Ctrl key while clicking the file/folder name.
To browse an s3:// or gs:// path, simply append it to 42basepairs.com/
For example, to browse s3://giab/data/NA12878, go to:
42basepairs.com/s3://giab/data/NA12878
If you use a path that corresponds to a file, you will be redirected to its parent folder, where the file will be highlighted.
First, get the download URL by clicking the file's Copy download URL button. Then:
curl -O "https://42basepairs.com/download/r2/genomics-data/alignments_NA12878.bam"
If you don't have curl on your machine, you can replace curl -O with wget in the command above.
The download URL can also be inferred programmatically, without using the UI:
42basepairs.com/download/cloudProvider/bucketName/path/to/file
where cloudProvider is one of s3, gs, or r2.
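The pattern above can be sketched as a small shell function. This is only an illustration: download_url is a hypothetical helper name, not something 42basepairs provides, and treating r2:// as a URI scheme is an assumption made here so that all three providers can be written the same way.

```shell
# Hypothetical helper (not part of 42basepairs): turn a cloud storage URI
# such as s3://bucketName/path/to/file into a 42basepairs download URL.
# Assumption: r2:// is accepted here as a made-up scheme for Cloudflare R2.
download_url() {
  uri="$1"
  case "$uri" in
    s3://?*|gs://?*|r2://?*) ;;
    *) echo "expected an s3://, gs://, or r2:// URI, got: $uri" >&2
       return 1 ;;
  esac
  provider="${uri%%://*}"   # text before "://", e.g. "s3"
  rest="${uri#*://}"        # text after "://", i.e. bucketName/path/to/file
  echo "https://42basepairs.com/download/${provider}/${rest}"
}

# Example usage (commented out so that sourcing this file has no side effects):
#   curl -O "$(download_url r2://genomics-data/alignments_NA12878.bam)"
```

The function only rewrites the string; it does not check that the file exists in the bucket.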
Some bioinformatics tools, such as samtools and bcftools, support URLs, so you don't need to download the entire file locally before running a query.
To subset a small region from a large .bam:
samtools view "https://42basepairs.com/download/r2/genomics-data/alignments_NA12878.bam" 11:93646751-93647750
To subset a small region from a large .vcf:
bcftools view "https://42basepairs.com/download/r2/genomics-data/variants_CHM13.vcf.gz" chr18:50e6-60e6
These commands automatically download the much smaller .bai or .tbi index files from 42basepairs into the current folder, then use them to fetch only the requested region instead of the whole file.
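To keep a subset as a local file rather than printing it to the terminal, the two commands can be wrapped as in the sketch below. subset_region and the DRY_RUN variable are hypothetical names introduced for this example. The wrapper assumes the file type can be read from the URL's extension, and it relies only on the standard -b and -o options of samtools view and the -O z and -o options of bcftools view.

```shell
# Hypothetical wrapper (not part of 42basepairs, samtools, or bcftools):
# save one region of a remote .bam or .vcf.gz file to a local file.
# Set DRY_RUN=1 to print the command instead of running it.
subset_region() {
  url="$1"; region="$2"; out="$3"
  case "$url" in
    # -b writes BAM output, -o names the output file
    *.bam)    set -- samtools view -b -o "$out" "$url" "$region" ;;
    # -O z writes bgzip-compressed VCF, -o names the output file
    *.vcf.gz) set -- bcftools view -O z -o "$out" "$url" "$region" ;;
    *) echo "unsupported file type: $url" >&2
       return 1 ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"   # show the command that would be run
  else
    "$@"        # run it; the index file is fetched into the current folder
  fi
}

# Example usage (commented out because it needs network access):
#   subset_region "https://42basepairs.com/download/r2/genomics-data/alignments_NA12878.bam" \
#     11:93646751-93647750 subset.bam
```

Running with DRY_RUN=1 first is a cheap way to check the command before any data is transferred.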
You can use 42basepairs download URLs in your web applications to fetch public genomics data from the front end, as the URLs are CORS-enabled.
Refer to the biowasm docs if you're interested in lazy-loading those download URLs, and running small analyses in the browser.