diff --git a/README.md b/README.md
index 9005c12..f0f633f 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,6 @@
 # slow5curl
 
-slow5curl is a command line tool and a library and for fetching reads from remote BLOW5 files, which is built on top of [slow5lib](https://github.com/hasindu2008/slow5lib) and [libcurl](https://curl.se/libcurl/). See [slow5lib](https://github.com/hasindu2008/slow5lib) or [slow5tools](https://github.com/hasindu2008/slow5tools) for fully featured SLOW5 file manipulation; slow5curl is kept standalone for its specific use case and dependencies.
-
+slow5curl is a command line tool and a library for fetching reads from remote BLOW5 files. It is built on top of [slow5lib](https://github.com/hasindu2008/slow5lib) and [libcurl](https://curl.se/libcurl/). Note that slow5curl is kept separate from [slow5lib](https://github.com/hasindu2008/slow5lib)/[slow5tools](https://github.com/hasindu2008/slow5tools) because *libcurl* is a complex dependency, and we want to keep [slow5lib](https://github.com/hasindu2008/slow5lib)/[slow5tools](https://github.com/hasindu2008/slow5tools) as simple as possible.
 
 *This project is still under active development. Currently, the tool and the C API is available. Python API is under construction.*
 
@@ -30,7 +29,7 @@ SLOW5 ecosystem: https://hasindu2008.github.io/slow5
 
 ## Quick Start
 
-If you are a Linux user on x86_64 architecture and want to quickly try slow5curl out, download the compiled binaries from the [latest release](https://github.com/BonsonW/slow5curl/releases). Binaries should work on most Linux distributions as long as the `curl` and `zlib` runtime libraries are available. You can install `curl` using `apt-get install curl` on Ubuntu. `zlib` is typically available by default on most Linux distributions. For compiled binaries to work, your processor must support SSSE3 instructions or higher (processors after 2007 have these) and your operating system must have GLIBC 2.17 or higher (Linux distributions from 2014 onwards typically have this). 
+If you are a Linux user on x86_64 architecture and want to quickly try slow5curl out, download the compiled binaries from the [latest release](https://github.com/BonsonW/slow5curl/releases). Binaries should work on most Linux distributions as long as the `curl` and `zlib` runtime libraries are available. You can install `curl` using `apt-get install curl` on Ubuntu. `zlib` is typically available by default on most Linux distributions. For compiled binaries to work, your processor must support SSSE3 instructions or higher (processors after 2007 have these) and your operating system must have GLIBC 2.17 or higher (Linux distributions from 2014 onwards typically have this).
 
 ```sh
 sudo apt-get install curl # curl runtime library on Ubuntu (CentOS have this by default)
@@ -133,7 +132,7 @@ slow5curl get https://github.com/BonsonW/slow5curl/raw/main/examples/data/reads_
 slow5curl get https://gtgseq.s3.amazonaws.com/ont-r10-dna/NA24385/raw/PGXX22394_reads.blow5 05ef1592-a969-4eb8-b917-44ca536bec36 --cache /tmp/PGXX22394_reads.blow5.idx -o read.blow5
 
 # fetch from a large BLOW5 with the cached index
-slow5curl get https://gtgseq.s3.amazonaws.com/ont-r10-dna/NA24385/raw/PGXX22394_reads.blow5 05ef1592-a969-4eb8-b917-44ca536bec36 --index /tmp/PGXX22394_reads.blow5.idx -o read.blow5 
+slow5curl get https://gtgseq.s3.amazonaws.com/ont-r10-dna/NA24385/raw/PGXX22394_reads.blow5 05ef1592-a969-4eb8-b917-44ca536bec36 --index /tmp/PGXX22394_reads.blow5.idx -o read.blow5
 ```
 
 ### Troubleshooting/Questions
diff --git a/docs/commands.md b/docs/commands.md
index 482eda8..eb41c00 100644
--- a/docs/commands.md
+++ b/docs/commands.md
@@ -29,7 +29,7 @@ slow5curl get [OPTIONS] https://url/to/file1.blow5 --list readids.txt
 * `-s, --sig-compress compression_type`:
     Specifies the raw signal compression method used for BLOW5 output. `compression_type` can be `none` for uncompressed raw signal or `svb-zd` to compress the raw signal using StreamVByte zig-zag delta [default value: svb-zd]. Note that record compression (-c option above) is still applied on top of the compressed signal. Signal compression with svb-zd and record compression with zstd is similar to ONT's vbz. zstd+svb-zd offers slightly smaller file size and slightly better performance compared to the default zlib+svb-zd, however, will be less portable.
 * `-t, --threads INT`:
-    Number of threads (connections) [default value: 128]. The number of threads will depend on the request limit of the server and the number of cores available to the client. For example, a 40 core CPU may go up to 512 threads, but if the server only allows 10 requests at a time per client we will be limited by the latter.
+    Number of threads (connections) [default value: 128]. As these threads are used for network access, the number of threads can be much larger than the number of CPU threads available on the system. The maximum useful number of threads typically depends on the server's per-client request limit. For example, we may go up to 512 threads, but if the server only allows 128 requests at a time per client, we will be limited by the latter.
 * `-K, --batchsize`:
     The batch size. This is the number of records on the memory at once [default value: 4096]. An increased batch size improves multi-threaded performance at the cost of higher RAM.
 * `-l, --list FILE`:
@@ -45,7 +45,7 @@ slow5curl get [OPTIONS] https://url/to/file1.blow5 --list readids.txt
 
 ### head
 
-Print [header information](https://hasindu2008.github.io/slow5specs/summary#slow5-header) from a remote BLOW5 file URL. 
+Print [header information](https://hasindu2008.github.io/slow5specs/summary#slow5-header) from a remote BLOW5 file URL.
 
 ```sh
 slow5curl head https://url/to/file1.blow5
diff --git a/test/test.sh b/test/test.sh
index 63be3fa..45c0314 100755
--- a/test/test.sh
+++ b/test/test.sh
@@ -11,8 +11,6 @@ BLOW_OUT="${OUT}reads.blow5"
 TXT_OUT="${OUT}text.txt"
 READ_LIST="${RAW}reads_10.txt"
 
-test -d ${OUT} || mkdir ${OUT}
-
 die() {
     echo "$1" >&2
     echo
@@ -37,6 +35,9 @@ echo_test_name() {
     printf '\n--%s--\n' "$1"
 }
 
+test -d ${OUT} && rm -r ${OUT}
+mkdir ${OUT} || die "mkdir failed"
+
 # cache opt
 TESTCASE_NAME="get_cached"
 echo_test_name ${TESTCASE_NAME}
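The reordered setup in `test/test.sh` makes every run start from an empty output directory, and it now runs after `die()` is defined, because a shell function must be defined before it is called. Below is a minimal standalone sketch of the same idiom. `reset_dir` is a hypothetical helper name that does not exist in the repository. Quoting the path is an added assumption so that directories containing spaces also work.

```sh
#!/bin/sh
# Sketch of the "start from a clean output directory" idiom in test/test.sh.
# reset_dir is a hypothetical helper, not part of slow5curl.

# Print an error message to stderr and abort.
die() {
    echo "$1" >&2
    exit 1
}

# Remove the directory if it already exists, then recreate it empty.
reset_dir() {
    if [ -d "$1" ]; then
        rm -r "$1" || die "rm failed"
    fi
    mkdir "$1" || die "mkdir failed"
}
```

For example, `reset_dir "${OUT}"` would have the same effect as the two added lines. It removes stale outputs from a previous run, so the tests cannot pass by accident on leftover files.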