This repo contains results and scripts for reproducing and repurposing the analyses in our work:
Alan Aw, Jeremy McRae, Elior Rahmani and Yun Song (2024+) "Highly parameterized polygenic scores tend to overfit to population stratification via random effects." bioRxiv preprint. DOI: 10.1101/2024.01.27.577589
As the image above shows, we also present our results in an interactive dashboard, available here.
Under results:
- random_projections contains results from "Performance Inflation by rPGS"
- effect_perturbation contains results from "Performance Relative to pPGS and sPGS"
- PGS_catalog contains results from analyses of polygenic risk scores obtained from the PGS Catalogue ("Evaluation of MCH PGSs" in our paper)
- PGS_perf_vs_pval_thres contains the data file and plot showing the performance, averaged across traits, of PGSs we trained on UKB phenotypes using various GWAS p-value thresholds
Under scripts, the directory organization mirrors that of results. We include one additional subdirectory:
- angular_central_gaussian contains a script to simulate random vectors under the Angular Central Gaussian distribution. This is mentioned briefly in our Main Text and discussed in our Supplementary Material.
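For readers unfamiliar with the Angular Central Gaussian (ACG) distribution, the following is a minimal Python sketch of one standard way to simulate from it: draw a zero-mean Gaussian vector with a chosen covariance matrix and rescale it to unit length. This is an illustration only and is not the script shipped in angular_central_gaussian; the function name `sample_acg` and its parameters are hypothetical, and the covariance matrix in the usage example is arbitrary.

```python
import numpy as np


def sample_acg(n_samples, cov, seed=None):
    """Draw unit vectors from an Angular Central Gaussian distribution.

    Hypothetical helper for illustration: each sample is a zero-mean
    Gaussian draw with covariance `cov`, projected onto the unit sphere.
    """
    rng = np.random.default_rng(seed)
    cov = np.asarray(cov, dtype=float)
    dim = cov.shape[0]
    # Step 1: z ~ N(0, cov), one row per sample.
    z = rng.multivariate_normal(np.zeros(dim), cov, size=n_samples)
    # Step 2: normalize each row so it lies on the unit sphere.
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    return z / norms


if __name__ == "__main__":
    # Arbitrary example covariance: most mass concentrates along axis 0.
    example_cov = np.diag([4.0, 1.0, 0.25])
    x = sample_acg(1000, example_cov, seed=0)
    print(x.shape)
```

With an identity covariance this reduces to the uniform distribution on the sphere; an anisotropic covariance concentrates the directions along its leading eigenvectors.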
Under logs, we provide log files that record the statistical tests we performed, as described in our paper.
Under manuscript, we provide the Supplementary Material for our work, which contains technical details and more mathematical ideas that unfortunately could not fit into the Main Text.
- The summary of non-zero variant counts of all PGS Catalogue scores is available under results/PGS_catalog/all_polygenic_scores. The associated data wrangling script is available under scripts/PGS_catalog/all_polygenic_scores.