
Current roadmap (2022)

Milestone 0.2.12

  • updates to the version 0.2 distribution on PyPI and bioconda

Bugs

| Issue | Branch | PR | Description |
|-------|--------|----|-------------|
| 248   |        |    | Make version_0_2 the default branch; rename master |
| 396   |        |    | Checklist of steps for this update |

Milestone 0.3.0

  • new command-line API
  • functionality similar to v0.2.x

Subcommands

| Issue | Branch      | PR  | Description |
|-------|-------------|-----|-------------|
| 123   | anib_123    | 338 | ANIb(lastall) |
| 124   |             |     | TETRA |
| 150   |             |     | Classify |
| 378   | alembic_378 | 387 | Alembic |
| --    | compare     | 364 | Compare |

Features

| Issue | Branch | PR | Description |
|-------|--------|----|-------------|
| 146   |        |    | Config file |
| 215   |        |    | SLURM support |
| 147   |        |    | Pipe 3rd party output to temp location |

Questions

| Issue | Branch | PR | Description |
|-------|--------|----|-------------|
| 151   |        |    | ANIm metric status |

Bugs

| Issue | Branch                  | PR  | Description |
|-------|-------------------------|-----|-------------|
| 373   | issue_373               | 376 | ANIm should not be symmetric |
| 383   | issue_383               | 385 | try/except around extraction in pyani download |
| 371   |                         |     | ValueError: zero-size array to reduction operation minimum which has no identity |
| 342   | issue_342, noextend_342 |     | Use --noextend in NUCmer as a rule |
| 340   |                         |     | Alignment coverage >1.0 |

Misc.

| Issue | Branch | PR | Description |
|-------|--------|----|-------------|
| 145   |        |    | Warnings for 0-identity comparisons |
| 188   |        |    | Propagate labels for taxon determination |
| 392   |        |    | Rationalise documentation |
| 152   |        |    | Update logging exceptions |
| 194   |        |    | Adopt concurrent.futures in place of multiprocessing |

Close?

| Issue | Branch | PR | Description |
|-------|--------|----|-------------|
| 221   |        |    | Missing labels and captions in plots with default settings |
| 129   |        |    | ANIm: check class/label files before loading sequences |

Milestone 0.3.1

  • Extension of pyani v0.3.0 to add new functionality and outputs

Subcommands

| Issue | Branch   | PR  | Description |
|-------|----------|-----|-------------|
| 187   | tree_186 | 370 | Tree (branch named for a now-closed issue) |
| 180   |          |     | Evolve |
| 135   |          |     | Subsample |
| 362   |          |     | Add tests for --recovery mode |

Features

| Issue | Branch | PR | Description |
|-------|--------|----|-------------|
| 136   |        |    | Use JSON for labels/classes files |
| 116   |        |    | Order rows and columns in clustering order like images |
| 94    |        |    | Fetching only N genomes |
| 343   |        |    | --dry-run flag |

Bugs

| Issue | Branch | PR | Description |
|-------|--------|----|-------------|
| 14    |        |    | Collating results is slow for large datasets (>1500 genomes) |
| 306   |        |    | NUCmer job generation for large jobs slows down rapidly |

Milestone 0.3.2

  • Extension of pyani v0.3.1 to accommodate alternative measures of similarity

Subcommands

| Issue | Branch | PR | Description |
|-------|--------|----|-------------|
| 156   |        |    | wANI |
| 155   |        |    | gANI |
| 137   |        |    | mash |
| 16    |        |    | AAI |

Milestone 0.3.3

  • Flask interface onto pyani database.

Features

| Issue | Branch | PR | Description |
|-------|--------|----|-------------|
| 148   |        |    | Flask interface onto SQLite3 backend |

Previous roadmap (2017)

This page contains notes on the planned future development of pyani.

Interface

The current interface to the pyani scripts is to call either average_nucleotide_identity.py or genbank_get_genomes_by_taxon.py with a combination of arguments. For average_nucleotide_identity.py in particular, some arguments perform a stage of the overall analysis while others prevent a stage from executing. I would like to change this interface to a pyani.py COMMAND OPTIONS structure, similar to git and other tools.

More specifically, I would like to enable operations such as the following (a sketch of one possible command-line structure is given after the list):

  • pyani.py download -t 931 -o my_organism: download all NCBI assemblies under taxon 931 to the directory my_organism
  • pyani.py index my_organism: generate MD5 or other hashes for each genome in the directory my_organism
  • pyani.py anim my_organism -o my_organism_ANIm --scheduler SGE: conduct ANIm analysis on the genomes in the directory my_organism
  • pyani.py anib my_organism -o my_organism_ANIb --scheduler SGE: conduct ANIb analysis on the genomes in the directory my_organism
  • pyani.py render my_organism_ANIm --gmethod seaborn: draw graphical output for the ANIm analysis in the directory my_organism_ANIm
  • pyani.py classify my_organism_ANIm: conduct classification analysis of ANIm results in the directory my_organism_ANIm
  • pyani.py db --setdb my_db: specify the database (sqlite3?) to hold comparison data; create it if it does not exist
  • pyani.py db --update my_organism_ANIm: update the current comparison data database with the results contained in my_organism_ANIm - this might be useful after a partial run/failure.
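As a rough illustration of that COMMAND OPTIONS structure, the sketch below builds a handful of the subcommands above with argparse; the function names, option names, and defaults are assumptions for illustration only, not a settled design.

```python
# pyani.py -- sketch of a git-style subcommand interface (illustrative only)
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="pyani.py")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # pyani.py download -t 931 -o my_organism
    p_download = subparsers.add_parser("download", help="download NCBI assemblies under a taxon")
    p_download.add_argument("-t", "--taxon", required=True, help="NCBI taxonomy ID")
    p_download.add_argument("-o", "--outdir", required=True, help="directory to write genomes to")

    # pyani.py index my_organism
    p_index = subparsers.add_parser("index", help="generate MD5 (or other) hashes for each genome")
    p_index.add_argument("indir", help="directory of genome FASTA files")

    # pyani.py anim my_organism -o my_organism_ANIm --scheduler SGE
    p_anim = subparsers.add_parser("anim", help="conduct ANIm analysis")
    p_anim.add_argument("indir", help="directory of genome FASTA files")
    p_anim.add_argument("-o", "--outdir", required=True, help="directory for ANIm output")
    p_anim.add_argument("--scheduler", choices=["multiprocessing", "SGE"],
                        default="multiprocessing", help="job scheduler to use")

    # anib, render, classify and db would be added in the same way
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)  # in practice, dispatch to the function handling args.command
```

Each subparser could carry its own handler via set_defaults(func=...), which would keep the download, index, analysis, and rendering stages as cleanly separated as the list above implies.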

Some modifications to the options are also desirable:

  • specify multiple input directories
  • specify multiple class/label files

Database Storage

I have a goal of storing all the comparison results in a persistent database, so that incremental additions to existing analyses are easier and partially complete jobs can be resumed.

General Implementation

  • A specific sqlite3 database is designated as 'current' for any analysis (e.g. with pyani.py db --setdb <location>)
  • The default database location could be .pyani/pyanidb in the root directory for the analysis (other configuration/debug information may go into .pyani)
  • The database will recognise an MD5 (or other) hash as representing a unique input genome. This will require all input genomes to be 'indexed':
    • The indexing may be performed during download with pyani download
    • Indexing may be forced with pyani index <directory>
  • Previously-seen genomes will be stored in a table (a separate table of their previously-seen locations will be kept)
  • Comparisons between genome pairs will be recorded in a table, indicating the tool (MUMmer, BLAST+, etc.) and date (which may be used to force a recomparison if requested)
  • For each comparison, we will record in another table the values that are currently recorded in the output .tab files
  • Anticipated tables (a schema sketch is given after this list):
    • genomes: hashes of genome sequences
    • paths: known paths for each hash, keyed by hash from genomes
    • comparisons: pairwise comparisons conducted, multikeyed by query and subject genomes from genomes, with a column describing the comparison (and options used)
    • data: pairwise comparison results: identity, coverage, mismatches, etc. - what is currently reported in .tab files
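A minimal sketch of what those four tables might look like as an SQLite schema, together with the MD5 'indexing' step, is shown below. The column names, and the use of Python's sqlite3 and hashlib modules, are assumptions for illustration rather than the eventual pyani schema.

```python
# Illustrative only: one possible SQLite layout for the tables listed above.
import hashlib
import os
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS genomes (
    hash TEXT PRIMARY KEY               -- MD5 of the genome sequence
);
CREATE TABLE IF NOT EXISTS paths (
    hash TEXT REFERENCES genomes(hash),
    path TEXT,                          -- a location this genome has been seen at
    PRIMARY KEY (hash, path)
);
CREATE TABLE IF NOT EXISTS comparisons (
    query_hash   TEXT REFERENCES genomes(hash),
    subject_hash TEXT REFERENCES genomes(hash),
    program      TEXT,                  -- e.g. MUMmer, BLAST+
    options      TEXT,                  -- command-line options used
    run_date     TEXT,                  -- may be used to force a recomparison
    PRIMARY KEY (query_hash, subject_hash, program, options)
);
CREATE TABLE IF NOT EXISTS data (
    query_hash   TEXT,
    subject_hash TEXT,
    program      TEXT,
    identity     REAL,                  -- values currently reported in the .tab files
    coverage     REAL,
    mismatches   INTEGER
);
"""


def index_genome(path):
    """Return the MD5 hash of a genome FASTA file (the 'indexing' step)."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def init_db(dbpath=".pyani/pyanidb"):
    """Create (or open) the 'current' analysis database at the default location."""
    dirname = os.path.dirname(dbpath)
    if dirname:
        os.makedirs(dirname, exist_ok=True)
    conn = sqlite3.connect(dbpath)
    conn.executescript(SCHEMA)
    return conn
```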

This database will allow rapid identification of which analyses have been performed before, negating the need to redo comparisons.

It will also provide a persistent record of comparisons that can be accessed for downstream analyses using, e.g., pyani render and a set of genome files (or a list of their hashes?). This will allow ready subsetting of outputs.
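For example, deciding whether a pairwise comparison has already been run could then be a single lookup against the comparisons table (again sketched against the assumed schema above, not pyani's actual query layer):

```python
def comparison_exists(conn, query_hash, subject_hash, program):
    """Return True if this pairwise comparison is already recorded in the database."""
    cursor = conn.execute(
        "SELECT 1 FROM comparisons "
        "WHERE query_hash = ? AND subject_hash = ? AND program = ?",
        (query_hash, subject_hash, program),
    )
    return cursor.fetchone() is not None
```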

Tables