PowerBacGWAS: Power calculations for Bacterial GWAS

PowerBacGWAS is a computational pipeline for conducting power calculations for bacterial GWAS. It uses existing collections of bacterial genomes to establish the sample sizes required to detect statistically significant associations for a given genotype frequency and effect size (or phenotype heritability). It supports a range of genomic variation, including SNPs, indels, and variation in gene content (pan-genome). Here, we make the code available and provide installation and usage instructions. PowerBacGWAS can be applied to any bacterial population; here we applied it to three different bacterial species: Enterococcus faecium, Klebsiella pneumoniae, and Mycobacterium tuberculosis.

Docker/Nextflow Installation

The easiest and recommended way to install and run PowerBacGWAS is via its Docker/Nextflow implementation.

You will need to:

  1. Install Docker or Singularity
  2. Install Nextflow
  3. Download PowerBacGWAS Nextflow files from GitHub:
git clone https://github.com/francesccoll/powerbacgwas/
cd powerbacgwas/nextflow
nextflow run main.nf --help

See the PowerBacGWAS wiki page for examples of Nextflow commands.

Local Installation

PowerBacGWAS consists of a set of Python and R scripts that work provided all the required dependencies listed below (software, Python modules, and R libraries) are installed on your local machine.

Required dependencies

Software

  • Python3 version >= 3.6.9
  • R version >= 3.6.3
  • PastML version >= 1.9.20
  • plink version >= 1.90b6.17 (64-bit, 28 Apr 2020)
  • GCTA version >= 1.93.2
  • pyseer version >= 1.3.7-dev
  • bcftools version >= 1.9
  • bgzip version >= 1.9
  • tabix version >= 1.9

Python Modules

  • cyvcf2 version >= 0.20.8
  • scipy version >= 1.5.4
  • numpy version >= 1.19.4
  • pandas version >= 1.1.4
  • PyVCF version >= 0.6.8

R libraries

  • optparse version >= 1.6.6
  • ggplot2 version >= 3.3.1
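
Before running the scripts locally, it can help to confirm that the dependencies listed above are actually present. The following is a minimal sketch, not part of PowerBacGWAS itself. It checks that the command-line tools are on your PATH and that the Python modules meet the minimum versions listed above. The executable names (for example `gcta64` and `Rscript`) are assumptions about how these tools are usually installed and may differ on your system. The sketch uses `importlib.metadata`, so it assumes Python 3.8 or later, which is newer than the minimum listed above. It does not check the versions of the command-line tools or the R libraries.

```python
import re
import shutil
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the "Python Modules" list above,
# keyed by the distribution name used at install time.
MIN_MODULE_VERSIONS = {
    "cyvcf2": "0.20.8",
    "scipy": "1.5.4",
    "numpy": "1.19.4",
    "pandas": "1.1.4",
    "PyVCF": "0.6.8",
}

# ASSUMPTION: these are the usual executable names; adjust them to match
# your installation (GCTA in particular is often installed as "gcta64").
EXECUTABLES = [
    "python3", "Rscript", "pastml", "plink", "gcta64",
    "pyseer", "bcftools", "bgzip", "tabix",
]


def parse_version(text):
    """Return the leading dotted-integer part of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no leading version number in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    have, need = parse_version(installed), parse_version(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_modules(minimums):
    """Return one message per Python module that is missing or too old."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (need >= {minimum})")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name}: found {installed}, need >= {minimum}")
    return problems


def check_executables(names):
    """Return one message per executable that cannot be found on PATH."""
    return [f"{name}: not found on PATH" for name in names
            if shutil.which(name) is None]


if __name__ == "__main__":
    problems = check_executables(EXECUTABLES) + check_modules(MIN_MODULE_VERSIONS)
    print("\n".join(problems) if problems else "All checked dependencies were found.")
```

Saved under a hypothetical name such as `check_deps.py`, this can be run with `python3 check_deps.py`. Any line it prints names a dependency to install or upgrade.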

From Source

Download the latest release from this GitHub repository, or clone it:

git clone https://github.com/francesccoll/powerbacgwas/
cd powerbacgwas/

As the pipeline uses scripts from PastML and pyseer, clone their GitHub repositories into the downloaded powerbacgwas folder:

git clone https://github.com/evolbioinfo/pastml
git clone https://github.com/mgalardini/pyseer
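
To confirm that both repositories ended up inside the powerbacgwas folder, a small check along these lines can be run from that folder. This is an illustrative sketch only. The function name `missing_clones` is hypothetical, and the sketch assumes the clones keep their default directory names (`pastml` and `pyseer`).

```python
from pathlib import Path

# Default directory names created by the two "git clone" commands above.
EXPECTED_SUBDIRS = ("pastml", "pyseer")


def missing_clones(root):
    """Return the expected clone directories that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_clones(".")
    if missing:
        print("Missing in the current folder: " + ", ".join(missing))
    else:
        print("Both pastml and pyseer folders are present.")
```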

Usage and Tutorials

Please read the PowerBacGWAS wiki page for full usage instructions and tutorials.

License

PowerBacGWAS is free software, licensed under the GNU General Public License v3.0.

Feedback/Issues

Use the issues page to report installation and usage issues.

Citation

Not available yet