a Rust crate for working with a local copy of the NCBI Taxonomy database

pvanheus/ncbitaxonomy

ncbitaxonomy

This is a Rust crate (i.e. library) for working with a local copy of the NCBI Taxonomy database. The database can be downloaded (either taxdump.zip or taxdump.tar.gz) from the NCBI Taxonomy FTP site and reformatted into a SQLite database using the taxonomy_util utility's to_sqlite subcommand.

Documentation is available at crates.io.

taxonomy_filter_refseq

(new in 0.1.1)

A tool to filter an NCBI RefSeq FASTA file so that only sequences from taxa that descend from a given ancestor taxon are retained.

$ taxonomy_filter_refseq --help
taxonomy_filter_refseq 1.0.0
Peter van Heusden <pvh@sanbi.axc.za>
Filter NCBI RefSeq FASTA files by taxonomic lineage

USAGE:
    taxonomy_filter_refseq [FLAGS] [OPTIONS] <INPUT_FASTA> <ANCESTOR_NAME> [OUTPUT_FASTA]

FLAGS:
        --no_curated      Don't accept curated RNAs and proteins (NM_, NR_ and NP_ accessions)
        --no_predicted    Don't accept computationally predicted RNAs and proteins (XM_, XR_ and XP_ accessions)
    -h, --help            Prints help information
    -V, --version         Prints version information

OPTIONS:
    -d, --db <TAXDB_URL>    URL for SQLite taxonomy database

ARGS:
    <INPUT_FASTA>      FASTA file with RefSeq sequences
    <ANCESTOR_NAME>    Name of ancestor to use as ancestor filter
    <OUTPUT_FASTA>     Output FASTA filename (or stdout if omitted)
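The `--no_curated` and `--no_predicted` flags select records by RefSeq accession prefix. The following is a minimal sketch of that kind of prefix test, not the tool's actual implementation: the names `RefSeqClass`, `classify_accession` and `accept` are hypothetical, and the choice to accept accessions with any other prefix is an assumption.

```rust
/// Record classes distinguished by the two flags in the help text above.
/// (Hypothetical type, introduced only for this illustration.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum RefSeqClass {
    /// NM_, NR_ and NP_ accessions
    Curated,
    /// XM_, XR_ and XP_ accessions
    Predicted,
    /// Any other prefix
    Other,
}

/// Classify a RefSeq accession by its three-character prefix.
pub fn classify_accession(accession: &str) -> RefSeqClass {
    const CURATED: [&str; 3] = ["NM_", "NR_", "NP_"];
    const PREDICTED: [&str; 3] = ["XM_", "XR_", "XP_"];
    if CURATED.iter().any(|p| accession.starts_with(*p)) {
        RefSeqClass::Curated
    } else if PREDICTED.iter().any(|p| accession.starts_with(*p)) {
        RefSeqClass::Predicted
    } else {
        RefSeqClass::Other
    }
}

/// Decide whether a record passes the two flags.
/// Assumption: accessions that are neither curated nor predicted always pass.
pub fn accept(accession: &str, no_curated: bool, no_predicted: bool) -> bool {
    match classify_accession(accession) {
        RefSeqClass::Curated => !no_curated,
        RefSeqClass::Predicted => !no_predicted,
        RefSeqClass::Other => true,
    }
}
```

For example, `accept("XM_005252345.2", false, true)` models running with `--no_predicted` and rejects the record, while the same call with an `NP_` accession accepts it.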

taxonomy_filter_fastq

(new in 0.2.0)

$ taxonomy_filter_fastq --help
taxonomy_filter_fastq 1.0.0
Peter van Heusden <pvh@sanbi.axc.za>
Filter FASTQ files whose reads have been classified by Centrifuge or Kraken2, only retaining reads in taxa descending
from given ancestor

USAGE:
    taxonomy_filter_fastq [FLAGS] [OPTIONS] <INPUT_FASTQ>... --ancestor_taxid <ANCESTOR_ID> --tax_report_filename <TAXONOMY_REPORT_FILENAME> <--centrifuge|--kraken2>

FLAGS:
    -d, --output_dir    Directory to deposited filtered output files in
    -C, --centrifuge    Filter using report from Centrifuge
    -h, --help          Prints help information
    -K, --kraken2       Filter using report from Kraken2
    -V, --version       Prints version information

OPTIONS:
    -A, --ancestor_taxid <ANCESTOR_ID>                      Name of ancestor to use as ancestor filter
    -d, --db <TAXDB_URL>                                    URL for SQLite taxonomy database
    -F, --tax_report_filename <TAXONOMY_REPORT_FILENAME>    Output from Kraken2 (default) or Centrifuge

ARGS:
    <INPUT_FASTQ>...    FASTA file with RefSeq sequences
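Retaining only "reads in taxa descending from given ancestor" comes down to a descendant test on the taxonomy tree. The sketch below shows one way to express that test over a child-to-parent map. It does not use this crate's API: `descends_from` and `toy_parents` are hypothetical names, and the toy tree omits most intermediate ranks. It relies on the NCBI convention that the root node (taxid 1) is its own parent.

```rust
use std::collections::HashMap;

/// Returns true if `taxid` is `ancestor` itself or has `ancestor`
/// somewhere in its chain of parents.
/// Assumes `parents` describes a tree (no cycles other than root -> root).
pub fn descends_from(parents: &HashMap<u32, u32>, taxid: u32, ancestor: u32) -> bool {
    let mut current = taxid;
    loop {
        if current == ancestor {
            return true;
        }
        match parents.get(&current) {
            // Keep climbing until we reach the root, which is its own parent.
            Some(&parent) if parent != current => current = parent,
            // Unknown taxid, or root reached without meeting `ancestor`.
            _ => return false,
        }
    }
}

/// Toy child -> parent map for illustration only; intermediate ranks omitted.
pub fn toy_parents() -> HashMap<u32, u32> {
    let mut parents = HashMap::new();
    parents.insert(1, 1); // root
    parents.insert(2, 1); // Bacteria
    parents.insert(543, 2); // Enterobacteriaceae
    parents.insert(561, 543); // Escherichia
    parents.insert(562, 561); // Escherichia coli
    parents.insert(9606, 1); // Homo sapiens
    parents
}
```

With this map, a read classified as taxid 562 would be kept for ancestor taxid 2, while a read classified as 9606 would be dropped.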

taxonomy_util

(new in 1.0.0)

Utilities for working with the NCBI taxonomy database, including converting the NCBI taxonomy database files into a SQLite database (the input format used by the other tools).

$ taxonomy_util --help
taxonomy_util 1.0.0
Peter van Heusden <pvh@sanbi.axc.za>
Utilities for working with the NCBI taxonomy database

USAGE:
    taxonomy_util [OPTIONS] [SUBCOMMAND]

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -d, --db <TAXDB_URL>    URL for SQLite taxonomy database

SUBCOMMANDS:
    common_ancestor_distance    find the tree distance to te common ancestor between two taxa
    get_id                      find taxonomy ID for name
    get_lineage                 get lineage for name
    get_name                    find name for taxonomy ID
    help                        Prints this message or the help of the given subcommand(s)
    to_sqlite                   save taxonomy database loaded from files to SQLite database file
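To make the idea behind `common_ancestor_distance` concrete, here is a small sketch of how the distance to a common ancestor can be computed over a child-to-parent map. It illustrates the concept, not the subcommand's implementation or output format. The names `lineage`, `common_ancestor_distance` and `example_parents` are hypothetical, the tree is a toy, and returning the number of steps from each taxon to the lowest common ancestor is an assumption about what "distance" means here.

```rust
use std::collections::HashMap;

/// Path from `taxid` up to the root, inclusive, following parent links.
/// The root is recognised as the node that is its own parent.
fn lineage(parents: &HashMap<u32, u32>, taxid: u32) -> Vec<u32> {
    let mut path = vec![taxid];
    let mut current = taxid;
    while let Some(&parent) = parents.get(&current) {
        if parent == current {
            break;
        }
        path.push(parent);
        current = parent;
    }
    path
}

/// Returns (lowest common ancestor, steps up from `a`, steps up from `b`),
/// or None if the two taxa share no node in the map.
pub fn common_ancestor_distance(
    parents: &HashMap<u32, u32>,
    a: u32,
    b: u32,
) -> Option<(u32, usize, usize)> {
    let path_a = lineage(parents, a);
    let path_b = lineage(parents, b);
    // The first node on a's path that also lies on b's path is the lowest
    // common ancestor; its indices in the two paths are the distances.
    for (dist_a, node) in path_a.iter().enumerate() {
        if let Some(dist_b) = path_b.iter().position(|n| n == node) {
            return Some((*node, dist_a, dist_b));
        }
    }
    None
}

/// Toy child -> parent map for illustration only; intermediate ranks omitted.
pub fn example_parents() -> HashMap<u32, u32> {
    let mut parents = HashMap::new();
    parents.insert(1, 1); // root
    parents.insert(2, 1); // Bacteria
    parents.insert(543, 2); // Enterobacteriaceae
    parents.insert(561, 543); // Escherichia
    parents.insert(562, 561); // Escherichia coli
    parents.insert(620, 543); // Shigella
    parents.insert(9606, 1); // Homo sapiens
    parents
}
```

In this toy tree, taxids 562 and 620 meet at 543, two steps above 562 and one step above 620. The nested scan is quadratic in lineage length, which is acceptable for the short lineages found in a taxonomy.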
