Skip to content

Latest commit

 

History

History
223 lines (157 loc) · 9.76 KB

README.md

File metadata and controls

223 lines (157 loc) · 9.76 KB

slow5curl

slow5curl is a command line tool and a library and for fetching reads from remote BLOW5 files, which is built on top of slow5lib and libcurl. Note that slow5curl is kept separate from slow5lib/slow5tools, because libcurl is a complex dependency and we want to keep slow5lib/slow5tools as simple as possible.

This project is still under active development. Currently, the tool and the C API is available. Python API is under construction.

GitHub Downloads BioConda Install CI

Full documentation: https://BonsonW.github.io/slow5curl
Publication (SLOW5 format): https://www.nature.com/articles/s41587-021-01147-4
SLOW5 specification: https://hasindu2008.github.io/slow5specs
SLOW5 ecosystem: https://hasindu2008.github.io/slow5

Table of Contents

Quick Start

If you are a Linux user on x86_64 architecture and want to quickly try slow5curl out, download the compiled binaries from the latest release. Binaries should work on most Linux distributions as long as the curl and zlib runtime libraries are available. You can install curl using apt-get install curl on Ubuntu. zlib is typically available by default on most Linux distributions. For compiled binaries to work, your processor must support SSSE3 instructions or higher (processors after 2007 have these) and your operating system must have GLIBC 2.17 or higher (Linux distributions from 2014 onwards typically have this).

sudo apt-get install curl # curl runtime library on Ubuntu (CentOS have this by default)
VERSION=v0.2.1
wget "https://github.com/BonsonW/slow5curl/releases/download/$VERSION/slow5curl-$VERSION-x86_64-linux-binaries.tar.gz" && tar xvf slow5curl-$VERSION-x86_64-linux-binaries.tar.gz && cd slow5curl-$VERSION/

# list commands and options
./slow5curl

# print out the header
./slow5curl head https://github.com/BonsonW/slow5curl/raw/main/examples/data/reads_10.blow5

# fetch a single read
./slow5curl get https://github.com/BonsonW/slow5curl/raw/main/examples/data/reads_10.blow5 000286ab-1f80-40e3-a778-8d89e4e52940 -o read.blow5

# print out the list of reads
./slow5curl reads https://github.com/BonsonW/slow5curl/raw/main/examples/data/reads_10.blow5

Building

Building A Release

Users are recommended to build from the latest release tar ball. A compiler that supports C99 is needed to build slow5curl.

Quick example for Ubuntu:

sudo apt-get install zlib1g-dev libcurl4-openssl-dev # install zlib and libcurl development libraries
VERSION=v0.2.1
wget "https://github.com/BonsonW/slow5curl/releases/download/$VERSION/slow5curl-$VERSION-release.tar.gz" && tar xvf slow5curl-$VERSION-release.tar.gz && cd slow5curl-$VERSION/
make

The commands to install libcurl and zlib development libraries on some popular distributions:

On Debian/Ubuntu: sudo apt-get install libcurl4-openssl-dev zlib1g-dev
On Fedora/CentOS: sudo dnf/yum install libcurl-devel zlib-devel
On OS X: brew install curl zlib

Building From GitHub

To build from GitHub:

sudo apt-get install zlib1g-dev libcurl4-openssl-dev # install zlib and libcurl development libraries
git clone --recursive https://github.com/BonsonW/slow5curl
cd slow5curl
make

The commands to install libcurl and zlib development libraries on some popular distributions:

On Debian/Ubuntu: sudo apt-get install libcurl4-openssl-dev zlib1g-dev
On Fedora/CentOS: sudo dnf/yum install libcurl-devel zlib-devel
On OS X: brew install curl zlib

Other Building Options

Optional zstd compression

If you want to access BLOW5 files compressed with zstd, you can optionally enable zstd compression support in slow5lib when building slow5curl by invoking make zstd=1. This requires zstd 1.3 or higher development libraries installed on your system:

On Debian/Ubuntu: sudo apt-get install libzstd1-dev # libzstd-dev on newer distributions if libzstd1-dev is unavailable
On Fedora/CentOS: sudo yum libzstd-devel
On OS X: brew install zstd

Tool

Manual: Commands and options.

One-Liners: Example bash one-liners.

Workflows: Example bash workflows.

Tool Examples

# fetch reads from a remote BLOW5, one at a time
slow5curl get https://url/to/file.blow5 readid1 readid2 ....

# fetch reads from a remote BLOW5, a batch at a time
slow5curl get https://url/to/file.blow5 --list readids.txt

# get a read from a remote BLOW5, caching the index
slow5curl get https://url/to/file.blow5 readid1 --cache /path/to/file.blow5.idx -o read.blow5

# use a custom BLOW5 index which is available locally
slow5curl get https://url/to/file.blow5 --index /path/to/file.blow5.idx --list readids.txt

# use a custom remote BLOW5 index
slow5curl get https://url/to/file.blow5 --index https://url/to/file.blow5.idx --list readids.txt

# get the header info of a remote BLOW5
slow5curl head https://url/to/file1.blow5

# get the list of read-ids of a remote BLOW5
slow5curl reads https://url/to/file1.blow5

Working examples:

# fetch from a small BLOW5
slow5curl get https://github.com/BonsonW/slow5curl/raw/main/examples/data/reads_10.blow5 000286ab-1f80-40e3-a778-8d89e4e52940 -o read.blow5

# fetch from a large BLOW5 while caching the indexing (note downloading the index will take some time)
slow5curl get https://gtgseq.s3.amazonaws.com/ont-r10-dna/NA24385/raw/PGXX22394_reads.blow5 05ef1592-a969-4eb8-b917-44ca536bec36  --cache /tmp/PGXX22394_reads.blow5.idx -o read.blow5

# fetch from a large BLOW5 with the cached index
slow5curl get https://gtgseq.s3.amazonaws.com/ont-r10-dna/NA24385/raw/PGXX22394_reads.blow5 05ef1592-a969-4eb8-b917-44ca536bec36 --index /tmp/PGXX22394_reads.blow5.idx -o read.blow5

Troubleshooting/Questions

Open an Issue.

Library

Follow the steps to build slow5curl from GitHub. Include <slow5curl/s5curl.h> in your C program and call the API functions. To compile your program and statically link against slow5curl:

gcc [OPTIONS] -I path/to/slow5curl/include -I path/to/slow5curl/slow5lib/include your_program.c path/to/slow5curl/lib/libslow5curl.a -lcurl -lpthread -lz

path/to/slow5curl/ is the absolute or relative path to the slow5curl repository cloned above.

If you compiled slow5curl with zstd support enabled, make sure you append -lzstd to the above command.

C API: Full documentation of the C API.

C API Examples: Main features of the C API and how to use them.

If you need any help with using the library please open an Issue.

Compatibility

slow5curl version slow5lib version
0.2.1 1.2.0-beta
0.2.0 1.2.0-beta

Acknowledgement

AWS Open Data Sponsorship Program for generously hosting open data in BLOW5 format, which greatly helped us to develop and test this tool.

Citation

Please cite the following in your publications when using the SLOW5 file format:

Gamaarachchi, H., Samarakoon, H., Jenner, S.P. et al. Fast nanopore sequencing data analysis with SLOW5. Nat Biotechnol 40, 1026-1029 (2022). https://doi.org/10.1038/s41587-021-01147-4

@article{gamaarachchi2022fast,
  title={Fast nanopore sequencing data analysis with SLOW5},
  author={Gamaarachchi, Hasindu and Samarakoon, Hiruna and Jenner, Sasha P and Ferguson, James M and Amos, Timothy G and Hammond, Jillian M and Saadat, Hassaan and Smith, Martin A and Parameswaran, Sri and Deveson, Ira W},
  journal={Nature biotechnology},
  pages={1--4},
  year={2022},
  publisher={Nature Publishing Group}
}

Please cite the following in your publications when using slow5curl:

Bonson Wong, James M Ferguson, Jessica Y Do, Hasindu Gamaarachchi, Ira W Deveson, Streamlining remote nanopore data access with slow5curl, GigaScience, Volume 13, 2024, giae016, https://doi.org/10.1093/gigascience/giae016

@article{10.1093/gigascience/giae016,
    title="{Streamlining remote nanopore data access with slow5curl}",
    author={Wong, Bonson and Ferguson, James M and Do, Jessica Y and Gamaarachchi, Hasindu and Deveson, Ira W},
    journal={GigaScience},
    volume={13},
    pages={7},
    year={2024},
    publisher={Oxford University Press},
}