joelfnogueira/PD_Metagenomic_Analysis

 
 

Metagenomic Analysis of the Parkinson's Disease Microbiome (Under Construction)

This repository recreates the shotgun metagenomic analysis presented in: paper link


Badges will go here

Table of Contents

  1. Background
  2. Requirements
  3. Workflow
  4. Setup/Installation
  5. Metadata Explained Variance
  6. Community Composition
  7. Multivariate Statistical Linear Models
  8. Dirichlet Multinomial Mixtures
  9. Probabilistic Graphical Models
  10. Gut Metabolic and Gut Brain Modules (GMMs, GBMs)
  11. Virulence Analysis
  12. Microbial Amyloid Quantification
  13. Biomarker Selection and Validation
  14. Feature Specificity for Parkinson's Disease

Background

Add Abstract

Requirements

All software used for this analysis is open source and freely available. Most of the analysis takes place in RStudio. Certain packages require R version >= 3.6; we recommend updating to R 4.0.1 ("See Things Now") for this analysis.

  1. Download R
  2. Download RStudio

In addition, the FlashWeave Probabilistic Graphical Models use Julia, Python, and Jupyter Notebook. Follow the instructions below to download the necessary software and packages:

  1. Download python
  2. Download numpy
  3. Download networkx
  4. Download matplotlib
  5. Download Jupyter Notebook
  6. Download Julia (see below)

Download the binary version from https://julialang.org/downloads/. FlashWeave currently supports Julia 1.0 and above.

To call julia from the command line, add the following to your .bash_profile, replacing the quoted path with your own download location and version:

PATH="/Applications/Julia-1.4.app/Contents/Resources/julia/bin/:${PATH}"
export PATH

The following lines source your .bashrc file (also add them to .bash_profile):

if [ -f "$HOME/.bashrc" ]; then
    . "$HOME/.bashrc"
fi
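
After updating the profile and opening a new terminal, a quick check can confirm that the tools listed above resolve on your PATH. The following is a minimal sketch and not part of this repository: the function name `missing_tools` is hypothetical, and the command names in the example assume default installations.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): print each command
# name that cannot be found on the current PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        # `command -v` is the POSIX way to ask whether a command resolves.
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example (command names assume default installs):
#   missing_tools julia python jupyter Rscript
```

If the function prints nothing, every tool was found; any name it prints still needs to be installed or added to PATH.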

Workflow:

Run the following analyses in the specified order. The scripts are located in the source (src) folder, and their outputs are written to the data/ and figures/ folders. R scripts may be run by opening each one in RStudio, selecting all, and pressing Command + Enter, or from the command line by typing the following:

Rscript name_of_rscript.R
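
When running from the command line, the ordering can also be scripted. This is a minimal sketch, assuming it is run from the folder that contains the scripts; `run_in_order` and the `RUNNER` variable are hypothetical names introduced here, not part of the repository. It stops at the first script that fails, so later steps never run on incomplete outputs.

```shell
#!/bin/sh
# Hypothetical helper: run each script with $RUNNER (default: Rscript),
# in the order given, and stop at the first failure.
run_in_order() {
    runner=${RUNNER:-Rscript}
    for script in "$@"; do
        echo "==> $script"
        if ! "$runner" "$script"; then
            echo "FAILED: $script" >&2
            return 1
        fi
    done
}

# Example, using the first two scripts named in Setup/Installation:
#   run_in_order configure.enviornment.R create_phyloseq_obj.R
# Dry run that only prints the script names:
#   RUNNER=echo run_in_order configure.enviornment.R create_phyloseq_obj.R
```

Long-running steps such as the MaAsLin2 models can be appended to the same call once the earlier scripts have been verified.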

Phyloseq object names and descriptions

Phyloseq Object      Feature Abundance
dat                  Species
dat.genus            Genus
dat.phylum           Phylum
dat.pathways         Pathways (Stratified)
dat.pathways.slim    Pathways
dat.ecs              Enzymes (Stratified)
dat.ecs.slim         Enzymes
dat.KOs              KEGG Orthologs (Stratified)
dat.KOs.slim         KEGG Orthologs
dat.Pfams            Pfams (Stratified)
dat.Pfams.slim       Pfams
dat.Eggnogs          eggNOGs (Stratified)
dat.Eggnogs.slim     eggNOGs

Setup/Installation:

Load all necessary packages for the analysis. If any errors occur, make sure you are using the proper version of R (4.0.1, "See Things Now").

  • Run: configure.enviornment.R

Next, collate the data tables into the phyloseq objects that are used downstream.

  • Run: create_phyloseq_obj.R

Metadata Explained Variance

This analysis sources PERMANOVA_Analysis.R, which may take a few minutes to complete with permutations = 9,999.

  • Run: PERMANOVA_Viz.R

Community Composition

Run the following scripts:

  • Community_Composition_Overview.R
  • Beta_Diversity_Analysis_adaptable_input_script.R
  • Alpha_Diversity_Analysis_adaptable_input_script.R

Multivariate Statistical Linear Models

To test for associations between our PD donors and the two control groups, we used MaAsLin2 with general linear models accounting for age, sex, and BMI. One comparison was made between PD patients (n=48) and Healthy Population Controls (n=41), and a separate model was fitted for PD patients and Spouse Controls (n=29 each), which accounts for the household effect.

Data generated by this analysis are used in multiple scripts downstream. The analysis takes approximately one hour (longer if low-variance trimming is not selected) because of the large number of features in the enzyme and KO datasets, along with the multiple models that account for stratification.

Run the following scripts:

  • MaAsLin2_Analysis.R

To visualize data generated from these models:

Run:

  • Differential_Abundance_Viz_Taxa_Figure_2.R

  • Differential_Abundance_Viz_Functional.R

Note that this script requires some manual input: to visualize a particular dataset of interest, replace the name of the R object in the section of the script titled (SWAP FUNCTION LEVEL HERE). See the table above for options.

(NOTE TO SELF: It may be best to run this analysis from the command line and have users supply a tag that fills in the R object name.)
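
One way to approach the command-line idea in the note above is to validate a user-supplied tag against the object names in the table before passing it on. This is only a sketch: `is_valid_robj` is a hypothetical helper, and the commented `Rscript` line describes an assumed future interface, since the script as described here expects a manual edit rather than an argument.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the tag is one of the phyloseq
# object names listed in the table in the Workflow section.
is_valid_robj() {
    case "$1" in
        dat | dat.genus | dat.phylum) return 0 ;;
        dat.pathways | dat.pathways.slim) return 0 ;;
        dat.ecs | dat.ecs.slim) return 0 ;;
        dat.KOs | dat.KOs.slim) return 0 ;;
        dat.Pfams | dat.Pfams.slim) return 0 ;;
        dat.Eggnogs | dat.Eggnogs.slim) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumed future interface (the R script would need to read the tag with
# commandArgs(); the script name here is an assumption as well):
#   tag=dat.KOs.slim
#   is_valid_robj "$tag" && Rscript Differential_Abundance_Viz_Functional.R "$tag"
```

Rejecting unknown tags in the shell keeps a typo from surfacing only after the R session has loaded its packages and data.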

Dirichlet Multinomial Mixtures

  • Run: DMM_Analysis.R

Probabilistic Graphical Models

This analysis requires multiple platforms. Run the following script in RStudio to prepare the input tables and metadata needed for FlashWeave analysis and visualization. The FlashWeave analysis takes place at the command line, while visualization uses D3.js within a Jupyter notebook.

  • Run: FlashWeave_input_prep.R

(WARNING: This analysis may take up to 3-4 hours.) Next, open a terminal and navigate to the directory containing this repository. Run the FlashWeave analysis with the following line:

julia correlation_analysis.jl
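
Because this step can run for hours, it may help to capture its output in a log file so that any errors can be reviewed afterwards. The following is a minimal sketch, assuming a POSIX shell; `run_logged` and the log file name are hypothetical and not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper: run a command, send its stdout and stderr to a
# log file, and report the exit status when it finishes.
run_logged() {
    log=$1
    shift
    "$@" >"$log" 2>&1
    status=$?
    echo "exit status $status (log: $log)"
    return "$status"
}

# Example (the log file name is an arbitrary choice):
#   run_logged flashweave.log julia correlation_analysis.jl
```

The function returns the command's own exit status, so it can be chained with `&&` before launching the notebook step.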

To build the interactive network, open Jupyter Notebook (type jupyter notebook in a terminal). Go to the Help tab and select Launch Classic Notebook. Navigate to the D3.js_Network_viz.ipynb file and select Cell -> Run All.

Gut Metabolic and Gut Brain Modules (GMMs, GBMs)

  • Run: omixer-rmpR_setup.R

Virulence Analysis

  • Run: Virulence_Analysis.R

Microbial Amyloid Quantification

  • Run: Amyloidgenic_Protein_Analysis.R

Biomarker Selection and Validation

(On-going)

PD Feature Specificity

  • Run: Disease_Specificity.R (On-going)

Acknowledgements

License

MIT License 2020 jboktor

For any questions, contact: jboktor@caltech.edu
