andrewgull/HeteroR

Machine learning detection of unstable antibiotic heteroresistance in E. coli

The pipelines are built with Snakemake.

Data analysis and modelling are performed using R and tidyverse.

Snakemake pipelines

Main pipeline

File: workflow/snakefile.smk

Purpose: assembling and annotating E. coli genomes (resistance genes, IS elements, direct repeats) from both short and long sequencing reads.

Configuration file: workflow/config.yaml

To run the pipeline, place the short and long reads in the resources/data_raw/{strain}/short/ and resources/data_raw/{strain}/long/ directories, respectively.
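Before launching the pipeline, it can help to confirm that every strain directory contains both read folders. The following is a minimal Python sketch, not part of the repository: the function name `strains_missing_reads` is hypothetical, and it assumes only the directory layout described above. It checks that each `short/` and `long/` folder exists and is non-empty, without assuming anything about read file names.

```python
from pathlib import Path


def strains_missing_reads(data_raw="resources/data_raw"):
    """Return {strain: [read types that are missing or empty]}.

    Hypothetical helper: it relies only on the layout
    resources/data_raw/{strain}/short/ and resources/data_raw/{strain}/long/.
    """
    root = Path(data_raw)
    if not root.is_dir():
        raise FileNotFoundError(f"raw data directory not found: {root}")

    problems = {}
    for strain_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        missing = []
        for kind in ("short", "long"):
            reads_dir = strain_dir / kind
            # A read folder counts as present only if it exists
            # and holds at least one entry.
            if not reads_dir.is_dir() or not any(reads_dir.iterdir()):
                missing.append(kind)
        if missing:
            problems[strain_dir.name] = missing
    return problems
```

Called from the repository root, `strains_missing_reads()` returns an empty dictionary when every strain has both read sets in place.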

DAG:

[Figure: DAG of the main pipeline]

Phylogeny pipeline

File: workflow/phylogeny.smk

Purpose: phylogenetic analysis of the samples including 27 reference strains.

Configuration file: workflow/config_phylogeny.yaml

Analysis of mutants

File: mutants.smk

Purpose: analysis of the heteroresistant (HR) mutants.

Configuration file: workflow/config_mutants.yaml

DAG:

[Figure: DAG of the mutants pipeline]
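The three pipelines above can presumably be launched with standard Snakemake command-line options. The sketch below only assembles the command lines and does not run anything. It rests on several assumptions: each config file is passed explicitly with `--configfile` (the snakefiles may already load it themselves), the core count of 4 is a placeholder, and the helper name `snakemake_command` is hypothetical. The snakefile paths are copied exactly as listed in this README.

```python
import shlex

# Snakefile / config pairs exactly as listed in this README.
PIPELINES = {
    "main": ("workflow/snakefile.smk", "workflow/config.yaml"),
    "phylogeny": ("workflow/phylogeny.smk", "workflow/config_phylogeny.yaml"),
    "mutants": ("mutants.smk", "workflow/config_mutants.yaml"),
}


def snakemake_command(pipeline, cores=4, dry_run=True):
    """Build (but do not execute) a Snakemake command line for one pipeline."""
    snakefile, config = PIPELINES[pipeline]
    cmd = [
        "snakemake",
        "--snakefile", snakefile,
        "--configfile", config,  # assumption: config is supplied on the command line
        "--cores", str(cores),   # placeholder core count
    ]
    if dry_run:
        cmd.append("-n")  # -n: dry run, list the jobs without executing them
    return cmd


if __name__ == "__main__":
    for name in PIPELINES:
        print(shlex.join(snakemake_command(name)))
```

Starting with a dry run is a cautious default: Snakemake then reports which jobs it would run, so missing inputs surface before any assembly starts.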

Data analysis and machine learning

For feature generation, see notebooks/modelling/features.qmd.

For exploratory data analysis of the features, see notebooks/modelling/EDA.qmd.

For training and validation procedures, see notebooks/modelling/training_and_validation.Rmd.

For analysis of the models, see notebooks/modelling/models_analysis.Rmd.

Features table: notebooks/modelling/data/features_strain.csv
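For a quick look at the features table outside R, a minimal Python sketch such as the following could be used. It is not part of the repository: the function name `summarize_table` is hypothetical, and the code assumes the file is comma-separated with a single header row. It makes no assumption about which columns the table contains.

```python
import csv


def summarize_table(path="notebooks/modelling/data/features_strain.csv"):
    """Return (column_names, number_of_data_rows) for a CSV file.

    Assumes a comma-separated file whose first row is the header.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])       # empty list if the file has no rows
        n_rows = sum(1 for _ in reader)  # count the remaining data rows
    return header, n_rows
```

Calling `summarize_table()` from the repository root returns the column names and the number of data rows, which is a cheap sanity check before opening the modelling notebooks.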

Strains will be available from SRA under BioProject PRJNA1165464.
