Computationally efficient and statistically powerful software for detecting context-specific eQTL effects in multi-context genomic studies.

BrunildaBalliu/FastGxC

FastGxC

Preprint available on BioRxiv

Extended data with FastGxC results on GTEx and CLUES cohorts can be found here

Scripts are still under construction, but please email me (bballiu at ucla dot edu) with comments or questions.

Simulate toy data

Please install the R packages required by generate_simulated_data.R (listed inside the script) before running the toy example.

To run a toy example, generate simulated data by running:

  project_directory=your_project_directory
  
  Rscript generate_simulated_data.R $project_directory

This script will create a data folder in your project_directory (if one does not already exist) and generate and save the following files:
(1) SNPs.txt: SNP genotype data for 10,000 SNPs and 300 individuals (MatrixEQTL input format),
(2) snpsloc.txt: location information for these 10,000 SNPs (MatrixEQTL input format),
(3) expression.txt: gene expression data for 300 individuals across 100 genes and 50 contexts,
(4) geneloc.txt: location information for these genes (MatrixEQTL input format).
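As a quick sanity check on the generated files, the sketch below counts the rows and columns of a tab-delimited matrix with one header row and one leading ID column, which is the layout MatrixEQTL reads for genotype and expression matrices. The helper name matrix_dims and the data/SNPs.txt path under project_directory are assumptions made for illustration, not part of the FastGxC scripts.

```shell
# Print "<rows> <columns>" for a tab-delimited matrix, not counting
# the header row or the leading ID column.
# matrix_dims is a hypothetical helper, not part of FastGxC.
matrix_dims() {
    awk -F'\t' 'NR == 1 { cols = NF - 1 } END { print NR - 1, cols }' "$1"
}

# Assumed location of the simulated genotype file; adjust as needed.
snp_file="${project_directory:-.}/data/SNPs.txt"
if [ -f "$snp_file" ]; then
    matrix_dims "$snp_file"
fi
```

If the file matches the description above, the two numbers reported for SNPs.txt should be the number of SNPs and the number of individuals.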

Running FastGxC

Please install the R packages required by decompose_expression.R and run_MatrixEQTL.R (listed inside each script) before running FastGxC.

FastGxC works in two steps. In the first step, expression is decomposed into shared and context-specific components. In the second step, eQTLs are separately mapped on these components.

Step 1 - Decomposition: For each individual, decompose the phenotype of interest (e.g. gene expression) across C contexts (e.g. tissues or cell types) into one context-shared and C context-specific components by running:

project_directory=your_project_directory
exp_file_name="expression.txt"
Rscript decompose_expression.R $project_directory $exp_file_name

This script takes as input a file with gene expression data for all individuals, genes, and contexts (see expression.txt for the right format) and outputs one file with context-shared expression (context_shared_expression.txt) and C files with expression specific to each context (CONTEXT_NAME_specific_expression.txt). The files will be saved in the data folder in your project_directory.
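To make the idea of the decomposition concrete, here is a minimal sketch for a single gene. It assumes the shared component is each individual's mean expression across contexts and the specific components are the deviations from that mean. This is a simplification for illustration and not the code in decompose_expression.R, which may treat missing contexts or other details differently. The three-column toy layout (individual, context, expression) and the decompose helper are also assumptions; the real expression.txt format may differ.

```shell
# decompose <file>: input has tab-separated columns individual, context, value
# (hypothetical toy layout for one gene). Output adds, per row, the
# individual's mean across contexts (shared) and the residual (specific).
decompose() {
    awk -F'\t' -v OFS='\t' '
        { sum[$1] += $3; n[$1]++; ind[NR] = $1; ctx[NR] = $2; val[NR] = $3 }
        END {
            for (r = 1; r <= NR; r++) {
                shared = sum[ind[r]] / n[ind[r]]
                print ind[r], ctx[r], shared, val[r] - shared
            }
        }' "$1"
}

# Toy data: two individuals measured in two contexts.
toy=$(mktemp)
printf 'ind1\tcontext1\t1\nind1\tcontext2\t3\nind2\tcontext1\t4\nind2\tcontext2\t8\n' > "$toy"
decompose "$toy"
rm -f "$toy"
```

For ind1 the shared value is 2 and the specific values are -1 and 1; within each individual the specific values sum to zero by construction.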

Step 2 - eQTL mapping: FastGxC estimates genetic effects on the context-shared component and each of the C context-specific components separately using simple linear models. Note: Here we use the R package MatrixEQTL but any other software that can perform quick linear regression can be used (e.g. FastQTL or tensorqtl).
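One way to write the per-component linear models for a single gene and variant is sketched below. The notation is an assumption introduced here for illustration: G_i is the genotype of individual i, E^sh_i is the context-shared expression component, and E^sp_ic is the component specific to context c. Covariates are omitted for brevity.

```latex
\begin{align}
E^{\mathrm{sh}}_{i}  &= \mu^{\mathrm{sh}} + \beta^{\mathrm{sh}} G_{i} + \varepsilon^{\mathrm{sh}}_{i}, \\
E^{\mathrm{sp}}_{ic} &= \mu^{\mathrm{sp}}_{c} + \beta^{\mathrm{sp}}_{c} G_{i} + \varepsilon^{\mathrm{sp}}_{ic},
\qquad c = 1, \dots, C.
\end{align}
```

Under this notation, a shared eQTL corresponds to a nonzero shared effect and a context-specific eQTL to a nonzero specific effect in at least one context, with each of the C+1 regressions fitted separately.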

Map context-specific eQTLs

project_directory=your_project_directory
nC=your_number_of_contexts

for i in $(seq 1 "$nC"); do
    Rscript run_MatrixEQTL.R $project_directory SNPs.txt snpsloc.txt context${i}_specific_expression.txt geneloc.txt context${i}_specific_eQTLs.txt specific
done

Map context-shared eQTLs

  Rscript run_MatrixEQTL.R $project_directory SNPs.txt snpsloc.txt context_shared_expression.txt geneloc.txt  "context_shared_eQTLs.txt" shared

This script takes as input the data needed to run MatrixEQTL and outputs eQTL summary statistics in the MatrixEQTL format. In the end, you should have one file with summary statistics for shared eQTLs and C files with summary statistics for context-specific eQTLs, one per context.
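Before moving on, it can help to confirm that all C+1 result files were written. The sketch below lists any that are missing, using the output file names from the commands above. The helper name missing_eqtl_files is hypothetical, and the output directory is passed as an argument because the location where run_MatrixEQTL.R saves its results is not stated here.

```shell
# missing_eqtl_files <dir> <number_of_contexts>
# Prints the name of every expected eQTL result file not found in <dir>.
# Hypothetical helper; file names follow the commands shown above.
missing_eqtl_files() {
    out_dir=$1
    n_ctx=$2
    [ -f "$out_dir/context_shared_eQTLs.txt" ] || echo "context_shared_eQTLs.txt"
    k=1
    while [ "$k" -le "$n_ctx" ]; do
        f="context${k}_specific_eQTLs.txt"
        [ -f "$out_dir/$f" ] || echo "$f"
        k=$((k + 1))
    done
}

# Example call (the directory is an assumption; point it at your output folder):
# missing_eqtl_files your_output_directory "$nC"
```

An empty result means the shared file and every context-specific file are present.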

Multiple testing adjustment

Please install the R packages required by run_TreeQTL.R (listed inside the script) before running TreeQTL.

To adjust for multiple testing across all contexts, genes, and genetic variants tested, you can use the hierarchical FDR procedures implemented in the R package TreeQTL.

TreeQTL requires that you run MatrixEQTL to do eQTL mapping (see Step 2 above). If you used other eQTL mapping software, please make sure the output is in the format required by TreeQTL. You can also replace TreeQTL with other methods, e.g. mashr, which can also lead to a considerable increase in power.

Map specific-eGenes, i.e., genes with at least one context-specific eQTL

project_directory=your_project_directory
Rscript run_TreeQTL.R $project_directory specific

Map shared-eGenes, i.e., genes with at least one context-shared eQTL

project_directory=your_project_directory
Rscript run_TreeQTL.R $project_directory shared

This script takes as input the data needed to run TreeQTL and outputs shared and specific eGenes (two files) and eAssociation (C+1 files) summary statistics in the TreeQTL format.
