Skip to content

LucaJiang/ldsc_notes

Repository files navigation

LDSC Notes

GitHub Pages

Notes for LD Score Regression.

TODO: Delete sliding window, because the l2 in reference panel is already the LD score. Figure out why 'M=1173569' which is the number of SNPs in the reference panel.

Structure of the Repository

  • 'master' branch: the source code of my implementation of LDSC.
    • 'ldscore' folder: the source code of LDSC.
    • 'ldsc.py' file: the entry point of LDSC.
    • 'data' folder: the data used by LDSC, i.e. $\ell^2$ and reference panel.
    • 'results' folder: the output log of LDSC.
    • 'test' folder: the test code and results. Mostly used for analysis performance.
  • 'gh-pages' branch: the code of the presentation page.

How to Run

  • Install dependencies:
pip3 install -r requirements.txt

Note: only need install jackknife to estimate the variance of heritability.

  • Calculate LD scores and estimate heritability:
python3 ldsc.py \
-M all \
-s ./data/full.sumstats \
-r ./data/eur_w_ld_chr/ \
-m CM -w 1e-4 \
-o ./results/test
  • Calculate LD scores only:
python3 ldsc.py \
-mi ldsc \
-s ./data/full.sumstats \
-r ./data/eur_w_ld_chr/ \
-m CM -w 1e-2 \
-o ./results/test
  • Estimate heritability only:
python3 ldsc.py \
-mi h2 \
-s ./results/test.txt \
-n 61220 \
-o ./results/test

Implementation Details

  • LD Score Calculation with MPI
    • Delete MAF <= 0.01 SNPs, ignore the order of 'A1' and 'A2'
    • Mapping from SNP in sumsstats to SNP in reference panel
    • LD Score is the 'l2' column in the reference panel
  • Iterative ReWeighted Least Squares(IRWLS): $$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$$ $$\text{argmin}_{\boldsymbol{\beta}} \sum_{i=1}^n ||y_i - \mathbf{x}_i\boldsymbol{\beta}||_p$$ where $\mathbf{X}=[\mathbf{1},\mathbf{\ell}]$ is a $n\times 2$ matrix.
    • Initial with pesudo-inverse: $$\hat{\boldsymbol{\beta}}_0 = \text{pinv}(\mathbf{X})\mathbf{Y}$$

    • Iteratively update the weight matrix, for $k=1,\ldots$, until convergence or max iteration:

      1. Calculate the residuals: $$\mathbf{r}_k = \mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}_k$$
      2. Calculate the weight matrix: $$\mathbf{w}_k = | \mathbf{r}_k |^{p-2}$$ $$\mathbf{W}_k = \text{diag}(\mathbf{w}_k/\sum \mathbf{w}_k)$$
      3. Update the regression coefficient: $$\hat{\boldsymbol{\beta}}_{k+1} = (\mathbf{X}^T\mathbf{W}_k\mathbf{X})^{-1}\mathbf{X}^T\mathbf{W}_k\mathbf{Y}$$
      4. Check convergence: $$||\mathbf{r}_k - \mathbf{r}_{k+1}||_2 &lt; \epsilon$$
    • jackknife estimate of the variance:

Analysis of MPI

vizviewer result_mpi.json

You may install vizviewer python package first.

Presentation Page

The presentation on LDSC is here, which is generated base on the 'gh-pages' branch and deployed by github action.

Reference

  1. Bulik-Sullivan, Brendan K., et al. "LD score regression distinguishes confounding from polygenicity in genome-wide association studies." Nature genetics 47.3 (2015): 291-295. doi: 10.1038/ng.3211
  2. Cai M, Wang Z, Xiao J, et al. XMAP: Cross-population fine-mapping by leveraging genetic diversity and accounting for confounding bias[J]. bioRxiv, 2023: 2023.03. 30.534832.
  3. Burrus C S. Iterative reweighted least squares[J]. OpenStax CNX. Available online: web.archive, 2012, 12.

Releases

No releases published

Packages

No packages published

Languages