FactorGo is a scalable variational factor analysis model that learns pleiotropic factors using GWAS summary statistics!
We present Factor analysis model in Genetic assOciation (FactorGo) to learn latent
pleiotropic factors using GWAS summary statistics. Our model is implemented in Python using JAX with
just-in-time (JIT) compilation, which compiles heavily optimized code at run time (through the XLA compiler)
and runs seamlessly on CPU, GPU, or TPU. FactorGo is a command line tool;
please see the example below and the full documentation.
For the published paper, please see:
Zhang, Z., Jung, J., Kim, A., Suboc, N., Gazal, S., and Mancuso, N. (2023). A scalable approach to characterize pleiotropy across thousands of human diseases and complex traits using GWAS summary statistics. Am. J. Hum. Genet. 110, 1863–1874. (https://www.cell.com/ajhg/abstract/S0002-9297(23)00353-1)
We are currently working on more detailed documentation. Feel free to contact me (zzhang39@usc.edu) if you need help running our tool or with further analysis. I am happy to schedule a Zoom call if needed.
Installation | Example | Notes | Support | Other Software
FactorGo assumes the true genetic effect can be decomposed into latent pleiotropic factors.
Briefly, we model test statistics at
where
To model our uncertainty in
where $\alpha \in \mathbb{R}^{k \times 1}_{>0}$ ($\phi > 0$) controls the prior precision for variant loadings (intercept). To avoid overfitting,
and “shut off” uninformative factors when $k$ is misspecified, we use automatic relevance determination (ARD) [1]
and place a prior over
Lastly, we place a prior over the shared residual variance across GWAS studies as
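As a rough illustration of the ARD idea described above, the following NumPy sketch draws one data set from a generic Bayesian factor model with per-factor precisions. It is not FactorGo's implementation: the Gamma priors, the hyperparameter values, the fixed intercept precision, and every function and variable name are assumptions made for this example, and the sketch ignores GWAS sample sizes entirely.

```python
import numpy as np


def simulate_factor_model(n=20, p=1000, k=5, seed=0):
    """Draw test statistics from a generic ARD factor model (assumptions only)."""
    rng = np.random.default_rng(seed)

    # Assumed Gamma priors on the precisions; shape and scale are arbitrary.
    alpha = rng.gamma(shape=2.0, scale=1.0, size=k)  # one precision per factor
    tau = rng.gamma(shape=2.0, scale=1.0)            # shared residual precision
    phi = 1.0                                        # intercept precision, fixed here

    # Column j of the loadings has variance 1 / alpha[j], so a large alpha[j]
    # shrinks factor j toward zero, which is how ARD "shuts off" a factor.
    loadings = rng.normal(size=(p, k)) / np.sqrt(alpha)
    scores = rng.normal(size=(n, k))
    intercept = rng.normal(scale=1.0 / np.sqrt(phi), size=p)

    # Residual noise shares a single variance 1 / tau across all studies.
    noise = rng.normal(scale=1.0 / np.sqrt(tau), size=(n, p))
    zscores = scores @ loadings.T + intercept + noise
    return zscores, loadings, scores, alpha
```

Calling `simulate_factor_model(n=20, p=1000, k=5)` returns an n x p matrix of simulated statistics together with the loadings, factor scores, and precisions that generated it.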
We recommend first creating a conda environment with pip installed.
# clone over HTTPS
git clone https://github.com/mancusolab/FactorGo.git
# or clone over SSH
git clone git@github.com:mancusolab/FactorGo.git
cd FactorGo
pip install .
For illustration, we use the example data stored in /example/data, including a Z-score summary statistics file and a sample size file.
To run the factorgo command line tool, we specify the following input files and flags:
- GWAS Zscore file: n20_p1k.Zscore.tsv.gz
- Sample size file: n20_p1k.SampleN.tsv
- -k 5: estimate 5 latent factors
- --scale: center and standardize the SNP columns of the Z-score matrix
- -o: output directory and prefix
For all available flags, please use factorgo -h.
factorgo \
./example/data/n20_p1k.Zscore.tsv.gz \
./example/data/n20_p1k.SampleN.tsv \
-k 5 \
--scale \
-o ./example/result/demo_test
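To make the effect of the --scale flag used above concrete, here is a minimal NumPy sketch of centering and standardizing each SNP column. It assumes an in-memory array with one row per study and one column per SNP; the function name, the use of the population standard deviation, and the handling of constant columns are assumptions, and factorgo's internal implementation may differ.

```python
import numpy as np


def center_and_standardize(zscores):
    """Center each column to mean 0 and scale it to unit standard deviation."""
    zscores = np.asarray(zscores, dtype=float)
    col_mean = zscores.mean(axis=0)
    col_std = zscores.std(axis=0)
    # Leave constant columns at zero instead of dividing by zero.
    col_std = np.where(col_std == 0, 1.0, col_std)
    return (zscores - col_mean) / col_std
```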
The output contains five result files:
- demo_test.Wm.tsv.gz: posterior mean of loading matrix W (p x k)
- demo_test.Zm.tsv.gz: posterior mean of factor score Z (n x k)
- demo_test.Wvar.tsv.gz: posterior variance of loading matrix W (k x 1)
- demo_test.Zvar.tsv.gz: posterior variance of factor score Z (n x k)
- demo_test.factor.tsv.gz: contains the following three columns:
  a) factor index (ordered by R2)
  b) posterior mean of ARD precision parameters
  c) variance explained by each factor (R2)
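The result files can be read back for downstream analysis. The sketch below uses only the Python standard library and assumes the files are gzip-compressed, tab-separated text named with the suffixes listed above. Whether they carry a header row is not stated here, so rows are returned as raw strings. The helper name `load_factorgo_outputs` is hypothetical and not part of factorgo.

```python
import csv
import gzip
from pathlib import Path

# Suffixes of the five result files listed above.
SUFFIXES = ("Wm", "Zm", "Wvar", "Zvar", "factor")


def load_factorgo_outputs(prefix):
    """Return {suffix: rows} for every result file found under `prefix`."""
    results = {}
    for suffix in SUFFIXES:
        path = Path(f"{prefix}.{suffix}.tsv.gz")
        if not path.exists():
            continue  # skip files that were not produced
        with gzip.open(path, "rt", newline="") as handle:
            # Each row is a list of strings; convert to float as needed.
            results[suffix] = list(csv.reader(handle, delimiter="\t"))
    return results
```

For the demo run, `load_factorgo_outputs("./example/result/demo_test")` would return one entry per file that exists.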
The default computation device for factorgo is CPU. To switch to a GPU device, specify the platform (cpu/gpu/tpu) with the -p flag, for example -p gpu:
# -p gpu selects the GPU device
factorgo \
./example/data/n20_p1k.Zscore.tsv.gz \
./example/data/n20_p1k.SampleN.tsv \
-k 5 \
--scale \
-p gpu \
-o ./example/result/demo_test
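Before passing -p gpu, it can help to confirm that JAX actually sees an accelerator in the current environment. The following sketch uses `jax.devices()` and the `platform` attribute of each device; the helper name is hypothetical, and the exact platform strings reported may vary with the JAX version and installed backend.

```python
def available_jax_platforms():
    """Return the platform names JAX can see, or [] if JAX is not importable."""
    try:
        import jax
    except ImportError:
        return []
    # Each device object exposes a `platform` string such as "cpu".
    return sorted({device.platform for device in jax.devices()})
```

If the returned list contains only a CPU entry, a GPU run is unlikely to work until a GPU-enabled JAX build is installed.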
factorgo uses JAX with just-in-time compilation to achieve high-speed computation.
However, JAX has some known issues on Macs with the M1 chip.
To work around them, users need to set up conda using miniforge and then install factorgo using pip in the desired environment.
[1] Bishop, C.M. (1999). Variational principal components. 509–514.
Please report any bugs or feature requests in the Issue Tracker. If you have any questions or comments please contact zzhang39@usc.edu and/or nmancuso@usc.edu.
Feel free to use other software developed by Mancuso Lab:
- SuShiE: a Bayesian fine-mapping framework for molecular QTL data across multiple ancestries.
- MA-FOCUS: a Bayesian fine-mapping framework using TWAS statistics across multiple ancestries to identify the causal genes for complex traits.
- SuSiE-PCA: a scalable Bayesian variable selection technique for sparse principal component analysis.
- twas_sim: Python software to simulate TWAS statistics.
- HAMSTA: Python software to estimate heritability explained by local ancestry data from admixture mapping summary statistics.
This project has been set up using PyScaffold 4.1.1. For details and usage information on PyScaffold see https://pyscaffold.org/.