Commit

Merge pull request #9 from joshuailevy/unit-test-adding
Unit test adding
joshuailevy authored Nov 2, 2021
2 parents 0a15c8a + 5170877 commit 805db30
Showing 20 changed files with 41,858 additions and 2,266 deletions.
5 changes: 3 additions & 2 deletions .github/workflows/python-package-conda.yml
@@ -1,5 +1,4 @@
name: freyja CI

on:
push:
branches: [ main ]
@@ -25,7 +24,9 @@ jobs:
run: |
conda create --yes -n freyja python=3.7
conda activate freyja
conda config --add channels conda-forge
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda install --yes --file ci/conda_requirements.txt
pip install -e . --no-deps
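Note on the channel reordering in this hunk: `conda config --add channels` prepends to the channel list, so the channel added last ends up with the highest priority. A minimal sketch of that behaviour follows; `channel_priority` is a hypothetical helper written for illustration, not part of conda or Freyja.

```python
def channel_priority(add_order):
    """Simulate repeated `conda config --add channels NAME` calls.

    Each call puts NAME at the top of the list (moving it there if it
    is already present), so later additions outrank earlier ones.
    """
    channels = []
    for name in add_order:
        if name in channels:
            channels.remove(name)
        channels.insert(0, name)
    return channels


# Order used by the updated workflow step.
print(channel_priority(["defaults", "bioconda", "conda-forge"]))
# → ['conda-forge', 'bioconda', 'defaults']
```

With this order, conda-forge takes precedence over bioconda, which in turn takes precedence over defaults.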
29 changes: 8 additions & 21 deletions README.md
@@ -8,47 +8,34 @@ Freyja is entirely written in Python 3, but requires preprocessing by tools like

### Dependencies
* [iVar](https://github.com/andersen-lab/ivar)
* [samtools](https://github.com/samtools/samtools)
* [UShER](https://usher-wiki.readthedocs.io/en/latest/#)
* [cvxpy](https://www.cvxpy.org/)
* [tqdm](https://github.com/tqdm/tqdm)
* [numpy](https://numpy.org/)
* [pandas](https://pandas.pydata.org/)

## Usage
After primer trimming in iVar, we get both variant call and sequencing depth information with the command:
```
samtools mpileup -aa -A -d 600000 -Q 20 -q 0 -B -f NC_045512_Hu-1.fasta filename.trimmed.bam | tee >(cut -f1-4 > filename.depth) | ivar variants -p filename -q 20 -r NC_045512_Hu-1.fa
freyja variants [bamfile] --variants [variant outfile name] --depths [depths outfile name]
```
which uses both samtools and iVar.

We can then run Freyja on the output files using the command:
```
python sample_deconv.py variant_tsvs/ depth_files/ output_result.tsv
freyja demix [variants-file] [depth-file] --output [output-file]
```
This results in a tsv file, which includes the lineages present and their corresponding abundances.
This outputs to a tsv file that includes the lineages present, their corresponding abundances, and summarization by constellation.

---
### Additional options
1. By default, this method will use the existing "usher_barcodes.csv" file for the barcodes. To make a new barcode library, download the latest global phylogenetic tree from UShER: http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/. To pull the latest tree from the command line, run
By default, this method ships with an existing "data/usher_barcodes.csv" file for the barcodes, and the [outbreak.info](https://outbreak.info/) curated lineage metadata file for summarizing lineages by WHO designation. To update both of these we recommend running the command

```
wget http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/public-latest.all.masked.pb.gz
```

Lineage defining mutation barcodes are extracted using
```
matUtils extract -i public-latest.all.masked.pb.gz -C lineagePaths.txt
```
and these are converted to a new barcode set by
```
python convert_paths2barcodes.py lineagePaths.txt
```
which saves the new barcodes as "usher_barcodes.csv".
freyja update
2. For summarizing of lineages by constellation, we pull directly from the [outbreak.info](https://outbreak.info/) curated lineage metadata file. To pull a new one, just run

```
wget -N https://raw.githubusercontent.com/outbreak-info/outbreak.info/master/web/src/assets/genomics/curated_lineages.json
```
which downloads new versions of the curated lineage file as well as the UShER global phylogenetic [tree](http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/), which is subsequently converted into barcodes and saved in "data/usher_barcodes.csv".
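
The download half of `freyja update` can be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: `plan_downloads` and `fetch_reference_data` are hypothetical names, the data directory is assumed to exist already, and the later conversion of the tree into barcodes is not covered. The two URLs are the ones quoted in this README.

```python
import os
import urllib.request

TREE_URL = ("http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/"
            "UShER_SARS-CoV-2/public-latest.all.masked.pb.gz")
LINEAGES_URL = ("https://raw.githubusercontent.com/outbreak-info/outbreak.info/"
                "master/web/src/assets/genomics/curated_lineages.json")


def plan_downloads(data_dir):
    """Pair each source URL with the local path it should be saved to."""
    return [
        (TREE_URL, os.path.join(data_dir, "public-latest.all.masked.pb.gz")),
        (LINEAGES_URL, os.path.join(data_dir, "curated_lineages.json")),
    ]


def fetch_reference_data(data_dir, fetch=urllib.request.urlretrieve):
    """Download both files into data_dir (assumed to exist) and return the saved paths.

    `fetch` is injectable so the function can be exercised without network access.
    """
    saved = []
    for url, dest in plan_downloads(data_dir):
        fetch(url, dest)
        saved.append(dest)
    return saved
```

Passing a stub as `fetch` (for example `lambda url, dest: None`) makes it possible to check the planned paths in a unit test without touching the network.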

---

3 changes: 3 additions & 0 deletions ci/conda_requirements.txt
@@ -6,3 +6,6 @@ cvxpy
pytest
flake8
coveralls
ivar
samtools
usher
5 changes: 0 additions & 5 deletions demixing_result.csv

This file was deleted.

121 changes: 63 additions & 58 deletions freyja.egg-info/PKG-INFO
@@ -5,63 +5,68 @@ Summary: Freyja recovers relative lineage abundances from mixed SARS-CoV-2 sampl
Home-page: https://github.com/joshuailevy/freyja
Author: Joshua Levy
Author-email: jolevy@scripps.edu
License: XXX
Description: # Freyja
Freyja is a tool to recover relative lineage abundances from mixed SARS-CoV-2 samples from a sequencing dataset (BAM aligned to the Hu-1 reference). The method uses lineage-determining mutational "barcodes" derived from the UShER global phylogenetic tree as a basis set to solve the constrained (unit sum, non-negative) de-mixing problem.

Freyja is intended as a post-processing step after primer trimming and variant calling in [iVar (Grubaugh and Gangavarapu et al., 2019)](https://github.com/andersen-lab/ivar). From measurements of SNV frequency and sequencing depth at each position in the genome, Freyja returns an estimate of the true lineage abundances in the sample.

## Installation
Freyja is entirely written in Python 3, but requires preprocessing by tools like iVar and [samtools](https://github.com/samtools/samtools) mpileup to generate the required input data. Successful installation of iVar (available via conda) should be sufficient to perform all required steps.

### Dependencies
* [iVar](https://github.com/andersen-lab/ivar)
* [UShER](https://usher-wiki.readthedocs.io/en/latest/#)
* [cvxpy](https://www.cvxpy.org/)
* [tqdm](https://github.com/tqdm/tqdm)
* [numpy](https://numpy.org/)
* [pandas](https://pandas.pydata.org/)

## Usage
After primer trimming in iVar, we get both variant call and sequencing depth information with the command:
```
samtools mpileup -aa -A -d 600000 -Q 20 -q 0 -B -f NC_045512_Hu-1.fasta filename.trimmed.bam | tee >(cut -f1-4 > filename.depth) | ivar variants -p filename -q 20 -r NC_045512_Hu-1.fa
```

We can then run Freyja on the output files using the command:
```
python sample_deconv.py variant_tsvs/ depth_files/ output_result.tsv
```
This results in a tsv file, which includes the lineages present and their corresponding abundances.

---
### Additional options
1. By default, this method will use the existing "usher_barcodes.csv" file for the barcodes. To make a new barcode library, download the latest global phylogenetic tree from UShER: http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/. To pull the latest tree from the command line, run

```
wget http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/public-latest.all.masked.pb.gz
```

Lineage defining mutation barcodes are extracted using
```
matUtils extract -i public-latest.all.masked.pb.gz -C lineagePaths.txt
```
and these are converted to a new barcode set by
```
python convert_paths2barcodes.py lineagePaths.txt
```
which saves the new barcodes as "usher_barcodes.csv".

2. For summarizing of lineages by constellation, we pull directly from the [outbreak.info](https://outbreak.info/) curated lineage metadata file. To pull a new one, just run

```
wget -N https://raw.githubusercontent.com/outbreak-info/outbreak.info/master/web/src/assets/genomics/curated_lineages.json
```

---

Acknowledgements


License: BSD 2-Clause
Platform: UNKNOWN
Description-Content-Type: text/markdown
License-File: LICENSE

# Freyja
Freyja is a tool to recover relative lineage abundances from mixed SARS-CoV-2 samples from a sequencing dataset (BAM aligned to the Hu-1 reference). The method uses lineage-determining mutational "barcodes" derived from the UShER global phylogenetic tree as a basis set to solve the constrained (unit sum, non-negative) de-mixing problem.

Freyja is intended as a post-processing step after primer trimming and variant calling in [iVar (Grubaugh and Gangavarapu et al., 2019)](https://github.com/andersen-lab/ivar). From measurements of SNV frequency and sequencing depth at each position in the genome, Freyja returns an estimate of the true lineage abundances in the sample.
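
To make the constrained (unit sum, non-negative) de-mixing problem concrete, here is a minimal standard-library sketch. It is not Freyja's solver: Freyja lists cvxpy as a dependency, whereas this example swaps in plain projected gradient descent on an unweighted least-squares loss. The names `project_to_simplex` and `demix_sketch`, and the layout of `barcodes` (one row per mutation, one column per lineage), are assumptions made for illustration.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) == 1}."""
    u = sorted(v, reverse=True)
    running_sum = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        running_sum += u_k
        candidate = (running_sum - 1.0) / k
        # The condition holds for a prefix of the sorted values;
        # the last candidate that satisfies it is the shift to apply.
        if u_k - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def demix_sketch(barcodes, freqs, steps=500):
    """Estimate lineage abundances x minimising ||A x - b||^2 on the simplex.

    barcodes[i][j] is 1 if lineage j carries mutation i, else 0 (matrix A).
    freqs[i] is the observed frequency of mutation i (vector b).
    """
    n_lineages = len(barcodes[0])
    # Safe step size: 1 / ||A||_F^2 never exceeds 1 / (largest eigenvalue of A^T A).
    frob_sq = sum(a * a for row in barcodes for a in row)
    step = 1.0 / frob_sq if frob_sq > 0 else 1.0
    x = [1.0 / n_lineages] * n_lineages  # start from a uniform mixture
    for _ in range(steps):
        residual = [sum(a * xj for a, xj in zip(row, x)) - f
                    for row, f in zip(barcodes, freqs)]
        grad = [sum(row[j] * r for row, r in zip(barcodes, residual))
                for j in range(n_lineages)]
        x = project_to_simplex([xj - step * g for xj, g in zip(x, grad)])
    return x


# Two lineages, three mutations; the sample is a 70/30 mixture.
abundances = demix_sketch([[1, 0], [0, 1], [1, 1]], [0.7, 0.3, 1.0])
print([round(a, 3) for a in abundances])
```

The projection step is what enforces both constraints at once: every iterate is non-negative and sums to one, so the result can be read directly as relative abundances.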

## Installation
Freyja is entirely written in Python 3, but requires preprocessing by tools like iVar and [samtools](https://github.com/samtools/samtools) mpileup to generate the required input data. Successful installation of iVar (available via conda) should be sufficient to perform all required steps.

### Dependencies
* [iVar](https://github.com/andersen-lab/ivar)
* [samtools](https://github.com/samtools/samtools)
* [UShER](https://usher-wiki.readthedocs.io/en/latest/#)
* [cvxpy](https://www.cvxpy.org/)
* [numpy](https://numpy.org/)
* [pandas](https://pandas.pydata.org/)

## Usage
After primer trimming in iVar, we get both variant call and sequencing depth information with the command:
```
samtools mpileup -aa -A -d 600000 -Q 20 -q 0 -B -f NC_045512_Hu-1.fasta [filename.trimmed.bam] | tee >(cut -f1-4 > [outpath.depth]) | ivar variants -p [outpath] -q 20 -r NC_045512_Hu-1.fa
```

We can then run Freyja on the output files using the command:
```
freyja demix [variants-file] [depth-file] --output [output-file]
```
This outputs to a tsv file, which includes the lineages present, their corresponding abundances, and summarization by constellation.

---
### Additional options
1. By default, this method will use the existing "data/usher_barcodes.csv" file for the barcodes, and the [outbreak.info](https://outbreak.info/) curated lineage metadata file for summarizing lineages by WHO designation. To update both of these we recommend running the command

```
freyja update

```
which downloads new versions of the UShER global phylogenetic [tree](http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/) as well as the curated lineage file.

Lineage defining mutation barcodes are extracted using
```
matUtils extract -i public-latest.all.masked.pb.gz -C lineagePaths.txt
```
These are converted to a new barcode set by
```
freyja barcode lineagePaths.txt
```
which saves the new barcodes as "usher_barcodes.csv".

2. For summarizing of lineages by constellation, we pull directly from the [outbreak.info](https://outbreak.info/) curated lineage metadata file. To pull a new one, just run

```
wget -N https://raw.githubusercontent.com/outbreak-info/outbreak.info/master/web/src/assets/genomics/curated_lineages.json
```

---

Acknowledgements



10 changes: 9 additions & 1 deletion freyja.egg-info/SOURCES.txt
@@ -1,21 +1,29 @@
LICENSE
README.md
setup.py
freyja/__init__.py
freyja/_cli.py
freyja/convert_paths2barcodes.py
freyja/sample_deconv.py
freyja/updates.py
freyja.egg-info/PKG-INFO
freyja.egg-info/SOURCES.txt
freyja.egg-info/dependency_links.txt
freyja.egg-info/entry_points.txt
freyja.egg-info/requires.txt
freyja.egg-info/top_level.txt
freyja/data/NC_045512_Hu-1.fasta
freyja/data/NC_045512_Hu-1.fasta.fai
freyja/data/curated_lineages.json
freyja/data/lineagePaths.txt
freyja/data/mixture.depth
freyja/data/mixture.tsv
freyja/data/public-latest.all.masked.pb.gz
freyja/data/test.bam
freyja/data/test.depth
freyja/data/test.tsv
freyja/data/usher_barcodes.csv
freyja/tests/__init__.py
freyja/tests/test_barcoding.py
freyja/tests/test_deconv.py
freyja/tests/test_deconv.py
freyja/tests/test_variants.py
53 changes: 40 additions & 13 deletions freyja/_cli.py
@@ -4,8 +4,13 @@
convert_to_barcodes, reversion_checking
from freyja.sample_deconv import buildLineageMap, build_mix_and_depth_arrays,\
reindex_dfs, map_to_constellation, solve_demixing_problem
from freyja.updates import download_tree, convert_tree,\
get_curated_lineage_data
import os
import urllib.request
import subprocess
import sys

locDir = os.path.abspath(os.path.join(os.path.realpath(__file__), os.pardir))


@click.group()
@@ -16,6 +21,7 @@ def cli():
@cli.command()
@click.argument('filename', type=click.Path(exists=True))
def barcode(filename):
# not needed anymore. This all takes place in the update function now.
print('Building barcodes from global phylogenetic tree')
df = pd.read_csv(filename, sep='\t')
df = parse_tree_paths(df)
@@ -57,19 +63,40 @@ def demix(variants, depths, output):

@cli.command()
def update():
# get data from UShER
print('Downloading a new global tree')
url = 'http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/\
UShER_SARS-CoV-2/public-latest.all.masked.pb.gz'
locDir = os.path.abspath(os.path.join(os.path.realpath(__file__),
os.pardir))
urllib.request.urlretrieve(url, os.path.join(locDir,
"data/public-latest.all.masked.pb.gz"))
print('Downloading an updated curated lineage set from outbreak.info')
url2 = 'https://raw.githubusercontent.com/outbreak-info/outbreak.info/\
master/web/src/assets/genomics/curated_lineages.json'
urllib.request.urlretrieve(url2,
os.path.join(locDir,
"data/curated_lineages.json"))
download_tree()
print('Getting outbreak data')
get_curated_lineage_data()
print("Converting tree info to barcodes")
convert_tree() # returns paths for each lineage
# Now parse into barcode form
lineagePath = os.path.join(locDir, "data/lineagePaths.txt")
print('Building barcodes from global phylogenetic tree')
df = pd.read_csv(lineagePath, sep='\t')
df = parse_tree_paths(df)
df_barcodes = convert_to_barcodes(df)
df_barcodes = reversion_checking(df_barcodes)
df_barcodes.to_csv(os.path.join(locDir, 'data/usher_barcodes.csv'))


@cli.command()
@click.argument('bamfile', type=click.Path(exists=True))
@click.option('--ref', help='Reference',
default=os.path.join(locDir,
'data/NC_045512_Hu-1.fasta'),
type=click.Path())
@click.option('--variants', help='Variant call output file', type=click.Path())
@click.option('--depths', help='Sequencing depth output file',
type=click.Path())
def variants(bamfile, ref, variants, depths):
bashCmd = f"samtools mpileup -aa -A -d 600000 -Q 20 -q 0 -B -f "\
f"{ref} {bamfile} | tee >(cut -f1-4 > {depths}.depth) |"\
f" ivar variants -p {variants} -q 20 -t 0.0 -r {ref}"
sys.stdout.flush() # force python to flush
completed = subprocess.run(bashCmd, shell=True, executable="/bin/bash",
stdout=subprocess.PIPE)
sys.exit(completed.returncode)


if __name__ == '__main__':

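Note on the new `variants` command: the pipeline string is assembled with f-strings and run with `shell=True`, so a path containing spaces or shell metacharacters would be split or interpreted by bash. One possible hardening, sketched below under assumptions, is to quote each path with `shlex.quote` before interpolation. `build_variants_command` is a hypothetical helper, not code from this commit, and unlike the committed version it writes depths to the path exactly as given rather than appending `.depth`.

```python
import shlex


def build_variants_command(bamfile, ref, variants_prefix, depths_path):
    """Return the samtools | tee | ivar pipeline as one bash string.

    Every user-supplied path is shell-quoted; the tool flags are the
    ones used in the diff above.
    """
    ref_q = shlex.quote(ref)
    bam_q = shlex.quote(bamfile)
    return (
        "samtools mpileup -aa -A -d 600000 -Q 20 -q 0 -B "
        f"-f {ref_q} {bam_q} "
        f"| tee >(cut -f1-4 > {shlex.quote(depths_path)}) "
        f"| ivar variants -p {shlex.quote(variants_prefix)} -q 20 -t 0.0 -r {ref_q}"
    )


print(build_variants_command("my sample.bam", "ref.fasta",
                             "out/calls", "out/sample.depth"))
```

The resulting string still relies on process substitution (`>(...)`), so it would need to be run with bash, as the committed code does via `executable="/bin/bash"`.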