The pipelines are created using Snakemake.
Data analysis and modelling are performed using R and tidyverse.
File: workflow/snakefile.smk
Purpose: assembling and annotating E. coli genomes (resistance genes, IS elements, direct repeats) from both short and long sequencing reads.
Configuration file: workflow/config.yaml
To run the pipeline, place the short and long reads in the resources/data_raw/{strain}/short/ and resources/data_raw/{strain}/long/ directories, respectively.
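Before starting a run, it can help to confirm that every strain has both read directories. The following is a minimal sketch, not part of the repository: the function name `find_incomplete_strains` is hypothetical, and it assumes that each subdirectory of resources/data_raw/ corresponds to one strain.

```python
from pathlib import Path


def find_incomplete_strains(data_raw: Path) -> dict[str, list[str]]:
    """Map each strain directory name to the read subdirectories it lacks.

    Hypothetical helper: assumes every subdirectory of `data_raw`
    is a strain that needs both `short/` and `long/`.
    """
    missing: dict[str, list[str]] = {}
    for strain_dir in sorted(p for p in data_raw.iterdir() if p.is_dir()):
        # Collect whichever of the two expected read directories is absent.
        absent = [
            kind for kind in ("short", "long")
            if not (strain_dir / kind).is_dir()
        ]
        if absent:
            missing[strain_dir.name] = absent
    return missing


if __name__ == "__main__":
    root = Path("resources/data_raw")
    if not root.is_dir():
        print(f"{root} not found; run this from the repository root")
    else:
        for strain, absent in find_incomplete_strains(root).items():
            print(f"{strain}: missing {', '.join(absent)}")
```

Run from the repository root, the script prints one line per strain that lacks a read directory and prints nothing when the layout is complete.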
DAG:
File: workflow/phylogeny.smk
Purpose: phylogenetic analysis of the samples including 27 reference strains.
Configuration file: workflow/config_phylogeny.yaml
File: mutants.smk
Purpose: analysis of the HR mutants.
Configuration file: workflow/config_mutants.yaml
DAG:
For feature generation, see notebooks/modelling/features.qmd.
For exploratory data analysis of the features, see notebooks/modelling/EDA.qmd.
For training and validation procedures, see notebooks/modelling/training_and_validation.Rmd.
For analysis of the models, see notebooks/modelling/models_analysis.Rmd.
Features table: notebooks/modelling/data/features_strain.csv
Strains will be available from SRA under BioProject PRJNA1165464.