The goal of ChromoZoom is to make genome browsing online as effortless as navigating the world on Google Maps, while retaining superior data density and customizability, modeled on the capabilities of the UCSC Genome Browser and IGV.
All data is drawn directly in the browser using canvas and SVG, similar to the approach of igv.js and pileup.js. There are a few substantial differences, though:
- We placed a premium on fast navigation. You can zoom with the mousewheel and "throw" the display, just like Google Maps.
- You don't need to install software on a server or embed code into a webpage to use ChromoZoom. Simply visit chromozoom.org, which is designed as a first-class genome browsing experience for nearly all of UCSC's tracks and genomes.
- It's easy to create and load custom genomes using the IGB Quickload format.
ChromoZoom is free for academic, nonprofit, and personal use. The source code is licensed under the GNU Affero General Public License v3. In a nutshell, this license means that you are free to copy, redistribute, and modify the source code, but you are expected to provide the source for any code derived from ChromoZoom to anybody that receives the modified code or uses it over a computer network (e.g. as a web application). ChromoZoom is not free for commercial use. For commercial licensing, please contact the Roth laboratory.
To host ChromoZoom or run the UCSC track scraper, you need either macOS or Linux. For Windows users, we suggest using our virtual environment.
The web interface should work in any recent version of a modern HTML5-capable web browser (Chrome, Firefox, Safari, IE ≥11).
Out of the box, ChromoZoom serves a web interface that can display data on top of genome layouts crossloaded from UCSC, or data in IGB Quickload directories. You will need:
- PHP 5.x + Apache (or another webserver that can run PHP scripts)
  - Note that magic quotes must be disabled.
  - libcurl bindings for PHP (on macOS, this is included in the default PHP install)
- To support any of the binary track and genome formats, you will need the following on your $PATH, which during setup will be symlinked into a new directory in this repo called bin/:
  - tabix, a generic indexer for TAB-delimited genome position files
  - samtools, utilities for viewing the Sequence Alignment/Map (SAM) and BAM (Binary SAM) formats
  - The following Jim Kent binaries:
    - bigBedInfo
    - bigBedSummary
    - bigBedToBed
    - bigWigSummary
    - bigWigInfo
    - twoBitToFa
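If you'd like a quick look at which of these binaries are already on your $PATH, a few lines of shell will do. This is only a minimal sketch and not part of ChromoZoom; the missing_tools function name is made up for illustration.

```shell
#!/bin/sh
# Print the name of every listed tool that is NOT found on $PATH.
# (missing_tools is a hypothetical helper, not something ChromoZoom ships.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the binaries named in the list above.
missing=$(missing_tools tabix samtools bigBedInfo bigBedSummary bigBedToBed \
    bigWigSummary bigWigInfo twoBitToFa)
if [ -n "$missing" ]; then
    echo "Not on PATH:" $missing
else
    echo "All required binaries found."
fi
```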
Place a checkout of this repo somewhere in your webserver's DOCROOT. To set up the aforementioned symlinks to binaries, run rake check from the command line at the root of the repo. Files under php/ and index.php will need to be executable by the webserver. Access index.php from a web browser to view the ChromoZoom interface.
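Exactly which permissions the webserver needs depends on your setup. As a rough sketch, assuming the webserver runs as a different user than the one who owns the checkout, something like the following would make index.php and everything under php/ readable and executable by all users. make_web_executable is a hypothetical helper name, and you may well prefer tighter, group-based permissions.

```shell
#!/bin/sh
# Make index.php and everything under php/ readable and executable by all users.
# $1 is the repo root (defaults to the current directory).
# Assumption: world-readable/executable is acceptable on your server.
make_web_executable() {
    root=${1:-.}
    chmod a+rx "$root/index.php" &&
    chmod -R a+rx "$root/php"
}

# Example, run from the root of the checkout:
#   make_web_executable .
```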
Note: To support HTTPS URLs for VCF/tabix or BAM files, you will need to compile tabix and samtools with libcurl support. See below for details.
We provide a pipeline to convert data from genomes hosted at UCSC into highly efficient binary formats that make it simple to serve thousands of annotation tracks from flatfiles. This is the strategy used for chromozoom.org.
The script is at UCSC_tracks/get_tracks.py. See the README.md in that directory for full instructions on how to run the track scraper. You can target the scraper to specific UCSC genome assemblies using the --org_prefix switch.
Using virtualization, ChromoZoom can run easily on any system. VirtualBox and Vagrant must be installed. To set up your environment, run the following:
$ cd path/to/this/repo
$ vagrant up
Once set up, you can access ChromoZoom at localhost:8080.
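It may take a little while after vagrant up before the web interface answers. If you're scripting around it, a small polling loop along these lines can wait until localhost:8080 responds. It's a sketch only; wait_for_http is a hypothetical helper, and the attempt count and delay are arbitrary defaults.

```shell
#!/bin/sh
# Poll a URL until it answers with a successful HTTP status, or give up.
# $1 = URL, $2 = number of attempts (default 10), $3 = seconds between attempts (default 3).
wait_for_http() {
    url=$1
    tries=${2:-10}
    delay=${3:-3}
    while [ "$tries" -gt 0 ]; do
        # -f: treat HTTP errors (>= 400) as failures; -s: silent;
        # -o /dev/null: discard the body; --max-time: don't hang on one attempt.
        if curl -fs --max-time 5 -o /dev/null "$url"; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example:
#   wait_for_http http://localhost:8080/ && echo "ChromoZoom is up"
```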
In addition to the above, you'll need node.js and two npm packages:
$ npm install -g browserify watchify
$ git clone https://github.com/rothlab/chromozoom.git
$ cd chromozoom
$ rake check
This will tell you if you're missing any of the previously mentioned binaries needed for hosting ChromoZoom or running the UCSC track scraper. You should then serve this directory from Apache + PHP (symlinking into your existing webroot usually works) and access index.php.
After making changes to the JavaScript in js/, you need to recompile the scripts in build/. When developing, use
$ rake watchify
which will open three screen sessions and continuously recompile debug-friendly versions of the scripts (to quit, press Ctrl + A, then type :quit and press Enter). To compile minified scripts for production, use
$ rake browserify
which also runs right before you commit code to git, since rake check installs a pre-commit hook (see git-hooks-pre-commit.sh).
None of the following components are strictly necessary for running ChromoZoom; however, they add useful capabilities, such as improved searching and track format support. Both of these upgrades were used for our main instance at chromozoom.org.
- Compiling bigBedSearch, which allows prefix searching of bigBed fields
- HTTPS support for samtools and tabix
Compiling bigBedSearch
The bigBed format can include extra B+ tree indices in the very last section of the file, which ChromoZoom can then use to search for features by the text content of various fields in the uncompressed BED data. For example, if you want to search a gene track for gene names matching a certain prefix, these indices make such a search practical even if the track itself is large and hosted somewhere else on the web.
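To make concrete what a prefix query returns, here is a naive linear scan over plain, tab-delimited BED text. This is not how the indexed search works (the B+ tree indices exist precisely so that the whole file doesn't have to be read); it only illustrates the expected result. The bed_prefix_search name is hypothetical, and the sketch assumes the feature name sits in column 4, as in standard BED.

```shell
#!/bin/sh
# Print the BED lines whose name field (column 4) starts with a given prefix.
# $1 = prefix, $2 = tab-delimited BED file.
# Naive O(n) scan for illustration only; an index avoids reading every line.
bed_prefix_search() {
    awk -F '\t' -v prefix="$1" 'index($4, prefix) == 1' "$2"
}

# Example (genes.bed is a placeholder file name):
#   bed_prefix_search BRCA genes.bed
```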
I've created a binary that enables these prefix queries, which you can install if you have gcc and make:
$ git clone https://github.com/powerpak/bigBedSearch.git
$ cd bigBedSearch
$ make
This should produce a bigBedSearch executable that you can copy to ChromoZoom's bin/ directory so the web frontend can use it.
If you want HTTPS to work, either make sure /usr/include/openssl is available, or specify the equivalent SSL_DIR as an environment variable.
You can also use that source tree to produce customized versions of bigBedInfo, bigBedSummary, bigBedToBed, bigWigInfo, and bigWigSummary, if UCSC's binaries weren't compiled in the way you prefer. (For example, HTTPS doesn't always seem to work in UCSC's macOS binaries.)
Current release versions of samtools and tabix don't support HTTPS, but libcurl support is being merged into the next planned release so that this is possible. To get these features now, follow these instructions, which are largely cribbed from this answer on BioStars, with a major change being that libcurl was already merged into the development branch for htslib.
You'll first need gcc and autoconf, plus zlib, libcurl, openssl, and ncurses with development headers. On Macs, run brew install autoconf; you should already have the rest if you have Xcode. On most Linux distros, these are all easily found in your package manager.
Get the development version of htslib and set up the configure script:
$ git clone https://github.com/samtools/htslib.git
$ cd htslib/
$ autoconf
If the last step fails with something about m4 macros, try being more forceful with autoreconf --install. Then configure with libcurl support and compile:
$ ./configure --enable-libcurl
$ make
(Side note: to get this to compile with a slightly older libcurl, such as the moderately ancient version 7.19.7 on certain high-performance computing nodes, you may have to remove the case statement about CURLE_NOT_BUILT_IN from hfile_libcurl.c.)
Once it works, you'll find tabix in this directory, along with htsfile (which is like file, for sequencing formats), both with HTTPS support. Test that it's working with
$ ./htsfile https://hostname.example.com/path/to/some.bam
All good? Then get the source release for samtools 1.2:
$ cd ..
$ curl -LO https://github.com/samtools/samtools/releases/download/1.2/samtools-1.2.tar.bz2
$ tar xjvf samtools-1.2.tar.bz2
$ cd samtools-1.2
Although this includes htslib 1.2.1, you want to point it to the development version you just installed:
$ rm -rf htslib-1.2.1
$ ln -s ../htslib htslib-1.2.1
$ make LDLIBS+=-lcurl LDLIBS+=-lcrypto
You should find samtools in this directory. Test it against some BAM file on an HTTPS server, and if you get back SAM data you're in good shape:
$ ./samtools view https://hostname.example.com/path/to/some.bam 1:1-10000
(Note that this will spit out a .bai file into the current directory, which you can safely delete afterward.)
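If either HTTPS test fails, one quick diagnostic is to see whether the binary you built is dynamically linked against libcurl at all. The following is a rough sketch rather than a definitive check: links_libcurl is a hypothetical helper, it assumes ldd (Linux) or otool (macOS) is available, and it will report nothing useful for a statically linked libcurl.

```shell
#!/bin/sh
# Succeed (exit status 0) if the given binary lists libcurl among its
# dynamically linked libraries. $1 = path to the binary.
# Uses ldd where available (Linux), otherwise falls back to otool -L (macOS).
links_libcurl() {
    if command -v ldd >/dev/null 2>&1; then
        ldd "$1" 2>/dev/null | grep -q libcurl
    else
        otool -L "$1" 2>/dev/null | grep -q libcurl
    fi
}

# Example, run from the samtools-1.2 directory:
#   links_libcurl ./samtools && echo "samtools is linked against libcurl"
```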