skripche edited this page Jan 24, 2020 · 46 revisions

Welcome to the sra-tools wiki!

ANNOUNCEMENTS:

2020-01-15 2.10.2 Release

The 2.10.2 release allows access to dbGaP controlled human data in AWS and GCP buckets, provided you have approval from dbGaP.

  1. Prefetch now accepts a JWT, which acts as both authorization and selection of the data to download, via the "--perm" command line argument
  2. Prefetch allows users to download original data files submitted to SRA along with SRA computed data files using "prefetch --type all"
  3. Prefetch retains the ability to accept an old-style kart file, but the file is now specified with the command line argument "--cart"
  4. Prefetch download has been limited to https and the eliminate-quals option has been temporarily disabled
  5. Added command line options for cloud configurations for vdb-config
  6. Random error at startup of fasterq-dump has been fixed
  7. "-Z" option is not accepted for fasterq-dump
  8. A GUID is shown in vdb-config or created if not yet present
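
The prefetch options in items 1 to 3 can be combined on one command line. The sketch below only assembles the argument list and runs nothing. The helper name `build_prefetch_args`, the file names, and the accession in the usage note are hypothetical, and it assumes that "--perm" and "--cart" each take a file path.

```python
from typing import List, Optional


def build_prefetch_args(
    accession: Optional[str] = None,
    perm_file: Optional[str] = None,
    cart_file: Optional[str] = None,
    all_types: bool = False,
) -> List[str]:
    """Assemble a prefetch command line from the options named in the 2.10.2 notes.

    Hypothetical helper: it only builds the argument list and does not run prefetch.
    """
    if accession is None and perm_file is None and cart_file is None:
        raise ValueError("need an accession, a --perm file, or a --cart file")
    args = ["prefetch"]
    if perm_file is not None:
        # Assumption: --perm takes the path of a file holding the JWT.
        args += ["--perm", perm_file]
    if cart_file is not None:
        # Assumption: --cart takes the path of the kart file.
        args += ["--cart", cart_file]
    if all_types:
        # "--type all" requests original submitted files plus SRA computed files.
        args += ["--type", "all"]
    if accession is not None:
        args.append(accession)
    return args
```

For example, `build_prefetch_args(accession="SRR000001", all_types=True)` yields the list for `prefetch --type all SRR000001`, which could then be handed to `subprocess.run` on a machine where the toolkit is installed.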

2019-08-19
We have released version 2.10.0 of sra-tools, which operates natively within AWS and GCP cloud environments. Most of the functionality you are accustomed to has been preserved, although there are a few changes.

  1. This release allows access to public SRA data stored within cloud buckets, now including the ability to retrieve original submission files (raw, unharmonized, no error correction) with prefetch.
  2. The local caching model has changed to support original submission files: we have introduced the accession directory for prefetch that will contain any files you have requested related to a particular accession.
  3. Contrary to prior behavior, if you have not specifically established a designated cache area, prefetch will use the accession directory.
  4. Similarly, the converter (dumper) tools will make use of a process-local temporary cache area unless you have configured the toolkit for a specific cache. NB - this behavior will temporarily use more local space, but is preferred for cluster operation.
  5. Access to data within the cloud will generally require setting up cloud-specific account credentials and making them known to the toolkit via vdb-config. The tools will not send out any credentials until you have agreed to accept charges within vdb-config. Your account information is required so that the cloud provider may assess egress charges and is not used in any way by NCBI or transmitted for any other purpose.
  6. As a special exception, access to cloud data from within a region that would not incur egress charges may be allowed without account credentials. In this case, you may configure the toolkit (using vdb-config) to send an environment credential provided by the cloud service as proof of your execution environment.
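
The cloud settings in items 5 and 6 can also be applied non-interactively. The following is a sketch under assumptions rather than a reference: the helper `vdb_config_commands` and the credentials path are hypothetical, and the option names (`--set-aws-credentials`, `--set-gcp-credentials`, `--accept-aws-charges`, `--accept-gcp-charges`, `--report-cloud-identity`) are believed to match the vdb-config command line of the 2.10.x releases but should be checked against `vdb-config --help` on your installation. Nothing is executed; the function only returns the invocations.

```python
from typing import List, Optional


def vdb_config_commands(
    provider: str,
    credentials_file: Optional[str] = None,
    accept_charges: bool = False,
    report_identity: bool = False,
) -> List[List[str]]:
    """Return one vdb-config invocation per cloud setting (hypothetical helper).

    Option names are assumptions; confirm them with `vdb-config --help`.
    """
    if provider not in ("aws", "gcp"):
        raise ValueError("provider must be 'aws' or 'gcp'")
    commands: List[List[str]] = []
    if credentials_file is not None:
        # Item 5: make the account credentials known to the toolkit.
        commands.append(["vdb-config", f"--set-{provider}-credentials", credentials_file])
    if accept_charges:
        # Item 5: credentials are only sent after charges have been accepted.
        commands.append(["vdb-config", f"--accept-{provider}-charges", "yes"])
    if report_identity:
        # Item 6: send the environment credential as proof of execution environment.
        commands.append(["vdb-config", "--report-cloud-identity", "yes"])
    return commands
```

Calling `vdb_config_commands("aws", credentials_file="/path/to/credentials", accept_charges=True)` covers the item 5 case, while `vdb_config_commands("gcp", report_identity=True)` corresponds to the in-region exception in item 6. One invocation per setting is used because it is not assumed here that the options can be combined.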

With release 2.9.1 of sra-tools we have finally made available the tool fasterq-dump, a replacement for the much older fastq-dump tool. As its name implies, it runs faster, and it is better suited to the large-scale conversion of SRA objects into FASTQ files that is common at sites with enough disk space for temporary files. fasterq-dump is multi-threaded and performs bulk joins, which improves performance compared to fastq-dump, which is single-threaded and performs joins on a per-record basis.

fastq-dump is still supported as it handles more corner cases than fasterq-dump, but it is likely to be deprecated in the future.

You can get more information about fasterq-dump in this Wiki at https://github.com/ncbi/sra-tools/wiki/HowTo:-fasterq-dump.
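
As a rough illustration of the points above (multiple threads, disk space for temporary files), a fasterq-dump command line might be assembled as follows. The helper `build_fasterq_dump_args`, the accession, and the directory names are hypothetical. The options `--threads`, `--temp`, and `--outdir` are the ones believed to control thread count, scratch location, and output location; see the HowTo page for the full list. The sketch builds the argument list only and does not run the tool.

```python
from typing import List, Optional


def build_fasterq_dump_args(
    accession: str,
    threads: Optional[int] = None,
    temp_dir: Optional[str] = None,
    out_dir: Optional[str] = None,
) -> List[str]:
    """Assemble a fasterq-dump command line (hypothetical helper, nothing is run)."""
    if threads is not None and threads < 1:
        raise ValueError("threads must be at least 1")
    args = ["fasterq-dump"]
    if threads is not None:
        # fasterq-dump is multi-threaded; this sets how many threads it uses.
        args += ["--threads", str(threads)]
    if temp_dir is not None:
        # Temporary files can be large, so point them at a disk with enough space.
        args += ["--temp", temp_dir]
    if out_dir is not None:
        args += ["--outdir", out_dir]
    args.append(accession)
    return args
```

For instance, `build_fasterq_dump_args("SRR000001", threads=8, temp_dir="/scratch/tmp", out_dir="fastq")` produces the list for `fasterq-dump --threads 8 --temp /scratch/tmp --outdir fastq SRR000001`.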