Merge branch 'release-candidate-0.4.0'
apetkau committed Feb 14, 2019
2 parents d8bc1f7 + 1bac73d commit e58cce1
Showing 31 changed files with 792 additions and 459 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -10,3 +10,4 @@ __pycache__
/staramr/databases/data/
/.eggs
/.venv
/.mypy_cache
15 changes: 15 additions & 0 deletions .mypy.ini
@@ -0,0 +1,15 @@
[mypy]
python_version = 3.5
warn_unused_configs = False

[mypy-pandas.*]
ignore_missing_imports = True

[mypy-Bio.*]
ignore_missing_imports = True

[mypy-git.*]
ignore_missing_imports = True

[mypy-numpy.*]
ignore_missing_imports = True
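Because the configuration above targets `python_version = 3.5`, variable annotations (PEP 526, Python 3.6+) are not available, which is consistent with the `# type:` comments used later in this commit. The sketch below illustrates that style only; `count_hits` and its sample data are hypothetical and not part of staramr.

```python
from typing import Dict, List


def count_hits(hits: List[str]) -> Dict[str, int]:
    """Count how many times each gene name appears in a list of hits."""
    # Comment-style annotation: understood by mypy and valid syntax on Python 3.5.
    counts = {}  # type: Dict[str, int]
    for gene in hits:
        counts[gene] = counts.get(gene, 0) + 1
    return counts


result = count_hits(['gyrA', 'parC', 'gyrA'])
```

Function signatures can still use inline annotations on Python 3.5; only local and attribute variables need the comment form.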
7 changes: 4 additions & 3 deletions .travis.yml
@@ -1,6 +1,7 @@
sudo: required
language: python
python:
- "3.5"
- "3.6"

env:
@@ -18,7 +19,7 @@ install:
- conda create -c bioconda -c conda-forge -q -y -n test-environment python=$TRAVIS_PYTHON_VERSION blast=2.7.1 git
- source activate test-environment
- python setup.py install
- mkdir -p staramr/databases/data/
- staramr db build --dir staramr/databases/data/update $DATABASE_COMMITS
- staramr db build --dir staramr/databases/data $DATABASE_COMMITS
- pip install mypy==0.600

script: python setup.py test
script: ./scripts/mypy && python setup.py test
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,11 @@
# Version 0.4.0

* Add support for campylobacter from PointFinder database.
* Fix `read_table` deprecation warnings by replacing `read_table` with `read_csv`.
* Handle an issue with the name of the `16S` gene in the PointFinder database for salmonella.
* Refactor and simplify some of the git ResFinder/PointFinder database code.
* Add automated type checking with [mypy](https://mypy.readthedocs.io).

# Version 0.3.0

* Exclusion of `aac(6')-Iaa` from results by default. Added ability to override this with `--no-exclude-genes` or pass a custom list of genes to exclude from results with `--exclude-genes-file`.
8 changes: 4 additions & 4 deletions README.md
@@ -90,7 +90,7 @@ To include acquired point-mutation resistances using PointFinder, please run:
staramr search --pointfinder-organism salmonella -o out *.fasta
```

Where `--pointfinder-organism` is the specific organism you are interested in (currently only *salmonella* is supported).
Where `--pointfinder-organism` is the specific organism you are interested in (currently only *salmonella* and *campylobacter* are supported).


## Database Info
@@ -194,7 +194,7 @@ staramr db restore-default

## Dependencies

* Python 3
* Python 3.5+
* BLAST+
* Git

@@ -386,7 +386,7 @@ positional arguments:
optional arguments:
-h, --help show this help message and exit
--pointfinder-organism POINTFINDER_ORGANISM
The organism to use for pointfinder {salmonella}. Defaults to disabling search for point mutations. [None].
The organism to use for pointfinder {salmonella, campylobacter}. Defaults to disabling search for point mutations. [None].
-d DATABASE, --database DATABASE
The directory containing the resfinder/pointfinder databases [staramr/databases/data].
-n NPROCS, --nprocs NPROCS
@@ -528,7 +528,7 @@ Example:

# Caveats

This software is still a work-in-progress. In particular, not all organisms stored in the PointFinder database are supported (only *salmonella* is currently supported). Additionally, the predicted phenotypes are for microbiological resistance and *not* clinical resistance. Phenotype/drug resistance predictions are an experimental feature which is continually being improved.
This software is still a work-in-progress. In particular, not all organisms stored in the PointFinder database are supported (only *salmonella* and *campylobacter* are currently supported). Additionally, the predicted phenotypes are for microbiological resistance and *not* clinical resistance. Phenotype/drug resistance predictions are an experimental feature which is continually being improved.

`staramr` only works on assembled genomes and not directly on reads. A quick genome assembler you could use is [Shovill][shovill]. Or, you may also wish to try out the [ResFinder webservice][resfinder-web], or the command-line tools [rgi][] or [ariba][] which will work on sequence reads as well as genome assemblies. You may also wish to check out the [CARD webservice][card-web].

4 changes: 2 additions & 2 deletions bin/staramr
@@ -67,10 +67,10 @@ if __name__ == '__main__':
try:
args.run_command(args)
except CommandParseException as e:
logger.error(e)
logger.error(str(e))
if e.print_help():
e.get_parser().print_help()
sys.exit(1)
except Exception as e:
logger.exception(e)
logger.exception(str(e))
sys.exit(1)
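The hunk above switches from passing the exception object to the logger to passing `str(e)`. The commit does not state the reason; one plausible motivation is the newly added mypy check. The following is a minimal sketch of the pattern, assuming a hypothetical `CommandParseError` class standing in for staramr's `CommandParseException` and a throwaway logger named `demo`.

```python
import io
import logging


class CommandParseError(Exception):
    """Hypothetical stand-in for staramr's CommandParseException."""


def log_failure(logger: logging.Logger, error: Exception) -> None:
    # Pass the message as a str rather than the exception object itself.
    logger.error(str(error))


# Capture log output in memory so the effect is easy to inspect.
stream = io.StringIO()
demo_logger = logging.getLogger('demo')
demo_logger.setLevel(logging.ERROR)
demo_logger.addHandler(logging.StreamHandler(stream))
demo_logger.propagate = False

try:
    raise CommandParseError('no input files given')
except CommandParseError as e:
    log_failure(demo_logger, e)

logged = stream.getvalue()
```

The logged text is the same either way, because the logging module converts the message with `str()` when formatting; the explicit conversion only makes the argument type unambiguous.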
6 changes: 6 additions & 0 deletions scripts/mypy
@@ -0,0 +1,6 @@
#!/bin/bash

SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
ROOT_DIR="$SCRIPT_DIR/.."

mypy --config "$ROOT_DIR/.mypy.ini" "$ROOT_DIR/bin/staramr" "$ROOT_DIR/staramr"
2 changes: 1 addition & 1 deletion staramr/__init__.py
@@ -1 +1 @@
__version__ = '0.3.0'
__version__ = '0.4.0'
8 changes: 8 additions & 0 deletions staramr/blast/AbstractBlastDatabase.py
@@ -33,6 +33,14 @@ def get_path(self, database_name):
"""
pass

@abc.abstractmethod
def get_name(self) -> str:
"""
Gets a name for this blast database implementation.
:return: A name for this implementation.
"""
pass

def get_database_paths(self):
"""
Gets a list of all database paths.
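The new abstract `get_name` method forces every database implementation to identify itself. Below is a minimal sketch of how `abc.abstractmethod` enforces that; `AbstractDatabase` and `ResfinderLike` are hypothetical simplifications, and the use of `abc.ABC` as the base class is an assumption, since the diff does not show how `AbstractBlastDatabase` is declared.

```python
import abc


class AbstractDatabase(abc.ABC):
    """Hypothetical, simplified counterpart of AbstractBlastDatabase."""

    @abc.abstractmethod
    def get_name(self) -> str:
        """Return a name for this database implementation."""


class ResfinderLike(AbstractDatabase):
    """Hypothetical concrete subclass that supplies the required method."""

    def get_name(self) -> str:
        return 'resfinder'


# A class with an unimplemented abstract method cannot be instantiated.
try:
    AbstractDatabase()
    instantiated = True
except TypeError:
    instantiated = False

name = ResfinderLike().get_name()
```

A subclass that forgets to define `get_name` fails at construction time rather than later, when the name is first needed.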
83 changes: 38 additions & 45 deletions staramr/blast/BlastHandler.py
@@ -3,9 +3,11 @@
import subprocess
from concurrent.futures import ThreadPoolExecutor
from os import path
from typing import Dict

from Bio.Blast.Applications import NcbiblastnCommandline

from staramr.blast.AbstractBlastDatabase import AbstractBlastDatabase
from staramr.exceptions.BlastProcessError import BlastProcessError

logger = logging.getLogger('BlastHandler')
@@ -32,16 +34,14 @@ class BlastHandler:
qseq
'''.strip().split('\n')]

def __init__(self, resfinder_database, threads, output_directory, pointfinder_database=None):
def __init__(self, blast_database_objects_map: Dict[str, AbstractBlastDatabase], threads: int,
output_directory: str) -> None:
"""
Creates a new BlastHandler.
:param resfinder_database: The staramr.blast.resfinder.ResfinderBlastDatabase for the particular ResFinder database.
:param blast_database_objects_map: A map containing the blast databases.
:param threads: The maximum number of threads to use, where one BLAST process gets assigned to one thread.
:param output_directory: The output directory to store BLAST results.
:param pointfinder_database: The staramr.blast.pointfinder.PointfinderBlastDatabase to use for the particular PointFinder database.
"""
self._resfinder_database = resfinder_database

if threads is None:
raise Exception("threads is None")

@@ -53,11 +53,13 @@ def __init__(self, resfinder_database, threads, output_directory, pointfinder_da
self._output_directory = output_directory
self._input_genomes_tmp_dir = path.join(output_directory, 'input-genomes')

if (pointfinder_database == None):
self._pointfinder_configured = False
self._blast_database_objects_map = blast_database_objects_map

if (self._blast_database_objects_map['pointfinder'] is None):
self._pointfinder_configured = False # type: bool
del self._blast_database_objects_map['pointfinder']
else:
self._pointfinder_database = pointfinder_database
self._pointfinder_configured = True
self._pointfinder_configured = True # type: bool

self._thread_pool_executor = None
self.reset()
@@ -70,10 +72,8 @@ def reset(self):
if self._thread_pool_executor is not None:
self._thread_pool_executor.shutdown()
self._thread_pool_executor = ThreadPoolExecutor(max_workers=self._threads)
self._resfinder_blast_map = {}
self._pointfinder_blast_map = {}
self._pointfinder_future_blasts = []
self._resfinder_future_blasts = []
self._blast_map = {}
self._future_blasts_map = {}

if path.exists(self._input_genomes_tmp_dir):
logger.debug("Directory [%s] already exists", self._input_genomes_tmp_dir)
@@ -86,23 +86,15 @@ def run_blasts(self, files):
:param files: The files to scan.
:return: None
"""
database_names_resfinder = self._resfinder_database.get_database_names()
logger.debug("Resfinder Databases: %s", database_names_resfinder)

if self.is_pointfinder_configured():
database_names_pointfinder = self._pointfinder_database.get_database_names()
logger.debug("Pointfinder Databases: %s", database_names_pointfinder)
else:
database_names_pointfinder = None

db_files = self._make_db_from_input_files(self._input_genomes_tmp_dir, files)
logger.debug("Done making blast databases for input files")

for file in db_files:
logger.info("Scheduling blast for %s", path.basename(file))
self._schedule_resfinder_blast(file, database_names_resfinder)
if self.is_pointfinder_configured():
self._schedule_pointfinder_blast(file, database_names_pointfinder)
logger.info("Scheduling blasts for %s", path.basename(file))

for name in self._blast_database_objects_map:
database_object = self._blast_database_objects_map[name]
self._schedule_blast(file, database_object)

def _make_db_from_input_files(self, db_dir, files):
logger.info("Making BLAST databases for input files")
@@ -126,33 +118,34 @@ def _make_db_from_input_files(self, db_dir, files):

return db_files

def _schedule_resfinder_blast(self, file, database_names):
def _schedule_blast(self, file, blast_database):
database_names = blast_database.get_database_names()
logger.debug("%s databases: %s", blast_database.get_name(), database_names)
for database_name in database_names:
database = self._resfinder_database.get_path(database_name)
database = blast_database.get_path(database_name)
file_name = os.path.basename(file)

blast_out = os.path.join(self._output_directory, file_name + "." + database_name + ".resfinder.blast.xml")
blast_out = os.path.join(self._output_directory,
file_name + "." + database_name + "." + blast_database.get_name() + ".blast.tsv")
if os.path.exists(blast_out):
raise Exception("Error, blast_out [%s] already exists" % blast_out)

self._resfinder_blast_map.setdefault(file_name, {})[database_name] = blast_out
self._get_blast_map(blast_database.get_name()).setdefault(file_name, {})[database_name] = blast_out

future_blast = self._thread_pool_executor.submit(self._launch_blast, database, file, blast_out)
self._resfinder_future_blasts.append(future_blast)
self._get_future_blasts_from_map(blast_database.get_name()).append(future_blast)

def _schedule_pointfinder_blast(self, file, database_names):
for database_name in database_names:
database = self._pointfinder_database.get_path(database_name)
file_name = os.path.basename(file)
def _get_blast_map(self, name):
if name not in self._blast_map:
self._blast_map[name] = {}

blast_out = os.path.join(self._output_directory, file_name + "." + database_name + ".pointfinder.blast.xml")
if os.path.exists(blast_out):
raise Exception("Error, blast_out [%s] already exists", blast_out)
return self._blast_map[name]

self._pointfinder_blast_map.setdefault(file_name, {})[database_name] = blast_out
def _get_future_blasts_from_map(self, name):
if name not in self._future_blasts_map:
self._future_blasts_map[name] = []

future_blast = self._thread_pool_executor.submit(self._launch_blast, database, file, blast_out)
self._pointfinder_future_blasts.append(future_blast)
return self._future_blasts_map[name]

def is_pointfinder_configured(self):
"""
@@ -169,9 +162,9 @@ def get_resfinder_outputs(self):
"""

# Forces any exceptions to be thrown if error with blasts
for future_blast in self._resfinder_future_blasts:
for future_blast in self._get_future_blasts_from_map('resfinder'):
future_blast.result()
return self._resfinder_blast_map
return self._get_blast_map('resfinder')

def get_pointfinder_outputs(self):
"""
@@ -181,9 +174,9 @@ def get_pointfinder_outputs(self):
"""
if (self.is_pointfinder_configured()):
# Forces any exceptions to be thrown if error with blasts
for future_blast in self._pointfinder_future_blasts:
for future_blast in self._get_future_blasts_from_map('pointfinder'):
future_blast.result()
return self._pointfinder_blast_map
return self._get_blast_map('pointfinder')
else:
raise Exception("Error, pointfinder has not been configured")

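The refactoring above replaces the separate ResFinder and PointFinder attributes with dictionaries keyed by database name. A rough sketch of that pattern follows, assuming a hypothetical `fake_blast` function in place of the real BLAST launch and invented file names.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List


def fake_blast(database: str, query: str) -> str:
    """Hypothetical placeholder for launching a BLAST process."""
    return '{}.{}.blast.tsv'.format(query, database)


# One list of futures per database name, created on first use.
future_map = {}  # type: Dict[str, List[Future]]

with ThreadPoolExecutor(max_workers=2) as executor:
    for name in ['resfinder', 'pointfinder']:
        for query in ['genome1.fasta', 'genome2.fasta']:
            future = executor.submit(fake_blast, name, query)
            future_map.setdefault(name, []).append(future)

# Calling result() re-raises any exception raised inside the worker.
outputs = {name: [f.result() for f in futures]
           for name, futures in future_map.items()}
```

Keying by name means a further database type would need no new attributes or scheduling methods, only another entry in the map.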
7 changes: 5 additions & 2 deletions staramr/blast/pointfinder/PointfinderBlastDatabase.py
@@ -71,13 +71,16 @@ def get_organism(self):
"""
return self.organism

def get_name(self):
return 'pointfinder'

@classmethod
def get_available_organisms(cls):
"""
A Class Method to get a list of organisms that are currently supported by staramr.
:return: The list of organisms currently supported by staramr.
"""
return ['salmonella']
return ['salmonella', 'campylobacter']

@classmethod
def get_organisms(cls, database_dir):
@@ -86,7 +89,7 @@ def get_organisms(cls, database_dir):
:param database_dir: The PointFinder database root directory.
:return: A list of organisms.
"""
config = pd.read_table(path.join(database_dir, 'config'), comment='#', header=None,
config = pd.read_csv(path.join(database_dir, 'config'), sep='\t', comment='#', header=None,
names=['db_prefix', 'name', 'description'])
return config['db_prefix'].tolist()

27 changes: 24 additions & 3 deletions staramr/blast/pointfinder/PointfinderDatabaseInfo.py
@@ -1,18 +1,27 @@
import pandas as pd
import logging

from os import path

"""
A Class storing information about the specific PointFinder database.
"""

logger = logging.getLogger('PointfinderDatabaseInfo')


class PointfinderDatabaseInfo:

def __init__(self, database_info_dataframe):
def __init__(self, database_info_dataframe, file=None):
"""
Creates a new PointfinderDatabaseInfo.
:param database_info_dataframe: A pd.DataFrame containing the information in PointFinder.
:param file: The file where the pointfinder database info originates from.
"""
self._pointfinder_info = database_info_dataframe
self._file = file

self._resistance_table_hacks(self._pointfinder_info)

@classmethod
def from_file(cls, file):
@@ -22,8 +31,8 @@ def from_file(cls, file):
:param file: The file containing drug resistance mutations.
:return: A new PointfinderDatabaseInfo.
"""
pointfinder_info = pd.read_table(file, index_col=False)
return cls(pointfinder_info)
pointfinder_info = pd.read_csv(file, sep='\t', index_col=False)
return cls(pointfinder_info, file)

@classmethod
def from_pandas_table(cls, database_info_dataframe):
@@ -34,6 +43,18 @@ def from_pandas_table(cls, database_info_dataframe):
"""
return cls(database_info_dataframe)

def _resistance_table_hacks(self, table):
"""
A function implementing some hacks to try and fix mismatched strings in the pointfinder databases.
These should be removed when the underlying database is corrected.
:param table: The pointfinder resistance table to fix.
:return: None, but modifies the passed table in place.
"""
if self._file and 'salmonella' in str(self._file) and path.exists(
path.join(path.dirname(self._file), '16S_rrsD.fsa')):
logger.debug("Replacing [16S] with [16S_rrsD] for pointfinder organism [salmonella]")
table[['#Gene_ID']] = table[['#Gene_ID']].replace('16S', '16S_rrsD')

def _get_resistance_codon_match(self, gene, codon_mutation):
table = self._pointfinder_info

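Two pandas details from this file can be sketched together: `pd.read_csv(..., sep='\t')` as the replacement for the deprecated `pd.read_table`, and `DataFrame.replace`, which by default matches whole cell values rather than substrings. The table contents below are invented for illustration, and the `Codon_pos` column name is an assumption; only `#Gene_ID` appears in the diff.

```python
import io

import pandas as pd

# Hypothetical tab-separated resistance table with a mix of gene IDs.
tsv = '#Gene_ID\tCodon_pos\n16S\t1065\n16S_rrsD\t1192\ngyrA\t83\n'

# read_csv with an explicit tab separator instead of read_table.
table = pd.read_csv(io.StringIO(tsv), sep='\t', index_col=False)

# Whole-value replacement: only cells equal to '16S' are changed.
table[['#Gene_ID']] = table[['#Gene_ID']].replace('16S', '16S_rrsD')

gene_ids = table['#Gene_ID'].tolist()
```

Because the match is on the entire value, an entry that is already `16S_rrsD` is left untouched rather than becoming `16S_rrsD_rrsD`.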
3 changes: 3 additions & 0 deletions staramr/blast/resfinder/ResfinderBlastDatabase.py
@@ -25,3 +25,6 @@ def get_database_names(self):

def get_path(self, database_name):
return os.path.join(self.database_dir, database_name + self.fasta_suffix)

def get_name(self):
return 'resfinder'