Emergenet

Emergenet is a framework for building a digital twin of the wild viral ecosystem of Influenza strains, used to rapidly and scalably assess the emergence risk of Influenza A strains circulating in non-human hosts. By analyzing genomic sequences of key viral proteins (HA and NA), Emergenet estimates the likelihood of future mutations and predicts the potential for these strains to acquire human host adaptability.

File Tree

The file tree shows the relevant directories for the current version of the project.

To replicate the results from the paper, go to paper_data_v3 and follow the README.

emergenet
├── emergenet : Emergenet package source code
├── examples : Examples using the emergenet.emergenet and emergenet.domseq modules
├── paper_data_v3 : Results for current version of the paper
└── tex : LaTeX source files for paper

Description

Predicting seasonal vaccine strains
Estimating emergence risk of non-human strains

Installation

PyPI

To install with pip:

pip install emergenet --upgrade

Dependencies

quasinet
numpy
pandas
matplotlib
distance
biopython
scikit-learn
shapely
alphashape
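
To confirm that the dependencies above are available before running the examples, a small check like the following may help. This is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical and not part of Emergenet. Note that two packages are imported under a different name than they are installed under: biopython as `Bio` and scikit-learn as `sklearn`.

```python
from importlib.util import find_spec

# Import names for the packages listed above. Two differ from their PyPI names:
# biopython is imported as "Bio" and scikit-learn as "sklearn".
IMPORT_NAMES = [
    "quasinet", "numpy", "pandas", "matplotlib", "distance",
    "Bio", "sklearn", "shapely", "alphashape",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(IMPORT_NAMES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

An empty result only shows that the modules can be located; it does not check their versions.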

Quick Start

Examples are located in the examples directory.

For more documentation, see here.

Estimating Emergence Risk with `emergenet.emergenet`

To evaluate a strain, use examples/estimate_risk.py. You only need to provide the HA and NA sequences. Run python examples/estimate_risk.py -h for more arguments.

Here is a detailed example using an IRAT strain, A/Indiana/08/2011, evaluated at the present time.

$ HA=MKTIIAFSCILCLIFAQKLPGSDNSMATLCLGHHAVPNGTLVKTITDDQIEVTNATELVQSSSTGGICNSPHQILDGKNCTLIDALLGDPHCDDFQNKEWDLFVERSTAYSNCYPYYVPDYATLRSLVASSGNLEFTQESFNWTGVAQGGSSYACRRGSVNSFFSRLNWLYNLNYKYPEQNVTMPNNDKFDKLYIWGVHHPGTDKDQTNLYVQASGRVIVSTKRSQQTVIPNIGSRPWVRGVSSIISIYWTIVKPGDILLINSTGNLIAPRGYFKIQSGKSSIMRSDAHIDECNSECITPNGSIPNDKPFQNVNKITYGACPRYVKQNTLKLATGMRNVPEKQTRGIFGAIAGFIENGWEGMVDGWYGFRHQNSEGTGQAADLKSTQAAINQITGKLNRVIKKTNEKFHQIEKEFSEVEGRIQDLEKYVEDTKIDLWSYNAEILVALENQHTIDLTDSEMSKLFERTRRQLRENAEDMGNGCFKIYHKCDNACIGSIRNGTYDHDIYRNEALNNRFQIKGVQLKSGYKDWILWISFAISCFLLCVVLLGFIMWACQKGNIRCNICI
$ NA=MNPNQKIITIGSVSLIIATICFLMQIAILVTTVTLHFKQHDYNSPPNNQAMLCEPTIIERNTTEIVYLTNITIEKEICPKLAEYRNWSKPQCNITGFAPFSKDNSIRLSAGGDIWVTREPYVSCDPDKCYQFALGQGTTLNNGHSNNTVHDRTPYRTLLMNELGVPFHLGTRQVCMAWSSSSCHDGKAWLHVCITGNDNNATASFIYNGRLVDSIGSWSKNILRTQESECVCINGTCTVVMTDGSASGKADTKILFVEEGKIVHISTLSGSAQHVEECSCYPRFPGVRCVCRDNWKGSNRPIVDINVKNYSIVSSYVCSGLVGDTPRKSDSVSSSYCLDPNNEKGGHGVKGWAFDDGNDVWMGRTINETLRLGYETFKVIEGWSKANSKLQTNRQVIVEKGDRSGYSGIFSVEGKSCINRCFYVELIRGRKEETKVWWTSNSIVVFCGTSGTYGTGSWPDGADINLMPI
$ python examples/estimate_risk.py $HA $NA --risk_sample_size=100

Estimated IRAT Emergence Score: 6.50
Time taken: 31.28 seconds
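
Because the sequences are passed on the command line, a stray character or line break in a pasted sequence is easy to miss. The sketch below shows one way to clean a sequence and assemble the same command from Python. The helpers `clean_sequence` and `build_command` are hypothetical and not part of Emergenet, and rejecting everything outside the 20 standard amino-acid letters is an assumption of this sketch, not a documented requirement of estimate_risk.py.

```python
import sys

# The 20 standard amino-acid one-letter codes. Ambiguity codes (X, B, Z, ...)
# and gap characters are rejected by this sketch; loosen the set if needed.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(seq):
    """Remove whitespace, uppercase, and check the alphabet (hypothetical helper)."""
    cleaned = "".join(seq.split()).upper()
    bad = sorted(set(cleaned) - AMINO_ACIDS)
    if not cleaned or bad:
        raise ValueError(f"invalid protein sequence; unexpected characters: {bad}")
    return cleaned


def build_command(ha, na, risk_sample_size=100, script="examples/estimate_risk.py"):
    """Build the argument list for the command shown above (hypothetical helper)."""
    return [
        sys.executable,
        script,
        clean_sequence(ha),
        clean_sequence(na),
        f"--risk_sample_size={risk_sample_size}",
    ]
```

The resulting list can be passed to `subprocess.run(build_command(ha, na), check=True)` from the repository root.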

Here is a detailed example using the same IRAT strain, A/Indiana/08/2011, evaluated at the time of IRAT assessment.

import pandas as pd
from emergenet.emergenet import Enet, predict_irat_emergence

DATA_DIR = 'data/emergenet/'

# Load IRAT sequence - A/Indiana/08/2011
irat_df = pd.read_csv(DATA_DIR+'irat.csv')
row = irat_df.iloc[20]

# We need the analysis date, and HA and NA sequences
# Optionally, we can provide a save_data directory
analysis_date = row['Date of Risk Assessment']
ha_seq = row['HA Sequence']
na_seq = row['NA Sequence']
SAVE_DIR = 'data/emergenet/example_results/'

# Initialize the Enet
enet = Enet(analysis_date=analysis_date, 
            ha_seq=ha_seq, 
            na_seq=na_seq, 
            save_data=SAVE_DIR, 
            random_state=42)

# Estimate the Enet risk scores
ha_risk, na_risk = enet.risk(risk_sample_size=100)

# Map the Enet risk scores to the IRAT risk scale
irat, irat_low, irat_high = predict_irat_emergence(ha_risk=ha_risk, 
                                                   na_risk=na_risk)
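
The example above selects the strain by its position in the table (`iloc[20]`), which silently picks a different strain if the file is reordered. Looking the row up by name is more robust. The sketch below illustrates the idea with the standard library on a placeholder table; the `Strain` column name and the sample rows are assumptions made for illustration and do not reflect the actual contents of irat.csv.

```python
import csv
import io


def find_row(csv_text, name, name_column="Strain"):
    """Return the first row whose name column equals `name`, as a dict."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[name_column] == name:
            return row
    raise KeyError(f"no row with {name_column} == {name!r}")


# Placeholder table, not real IRAT data. Only the last three column names
# are taken from the example above; 'Strain' is a hypothetical column name.
SAMPLE = """Strain,Date of Risk Assessment,HA Sequence,NA Sequence
A/example/1/2000,2001-01-01,MKT,MNP
A/example/2/2005,2006-06-01,MKA,MNQ
"""

row = find_row(SAMPLE, "A/example/2/2005")
print(row["Date of Risk Assessment"], row["HA Sequence"], row["NA Sequence"])
```

With pandas, the equivalent lookup would be a boolean filter on the name column followed by `.iloc[0]`, once the real column name is known.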

Predicting Future Dominant Strain with `emergenet.domseq`

import pandas as pd
from emergenet.domseq import DomSeq
from emergenet.utils import save_model, load_model

DATA_DIR = 'data/domseq/'

# Initialize the DomSeq
domseq = DomSeq(seq_trunc_length=565, random_state=42)

# Load data from current time period (2021-2022 season)
df = pd.read_csv(DATA_DIR+'north_h1n1_21_22.csv')

# Train enet
enet = domseq.train(seq_df=df, sample_size=3000, n_jobs=1)

# Load candidate sequences for recommendation
# This includes all human H1N1 strains up until the 2021-2022 season
candidate_df = pd.read_csv(DATA_DIR+'north_h1n1_21_22_pred.csv')

# Compute prediction sequences (return predictions from top 3 largest clusters)
pred_df = domseq.predict_domseq(seq_df=df, 
                                pred_seq_df=candidate_df, 
                                enet=enet, 
                                n_clusters=3, 
                                sample_size=3000)

# Compute a single prediction for the dominant strain
single_pred_seq = domseq.predict_single_domseq(pred_seqs=pred_df, 
                                               pred_seq_df=candidate_df)
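
As intuition for what it means to reduce a set of sequences to a single representative, the sketch below picks the medoid of a toy set: the member with the smallest total distance to all the others. This is an illustration only. It uses plain Hamming distance on equal-length strings, which is a stand-in chosen for simplicity; it is not the distance used by the trained Enet and not a description of how `predict_single_domseq` works internally.

```python
def hamming(a, b):
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def medoid(seqs):
    """Return the sequence with the smallest total distance to all others."""
    if not seqs:
        raise ValueError("need at least one sequence")
    return min(seqs, key=lambda s: sum(hamming(s, t) for t in seqs))


# Toy sequences, not real influenza data.
toy = ["MKTII", "MKTIV", "MKAIV", "MKTIV"]
print(medoid(toy))
```

Ties are broken in favor of the first sequence in the list, since `min` returns the first minimal element.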
