Benchmarking Mendelian Randomization methods for causal inference using genome‐wide association study summary statistics
We present a benchmarking analysis of MR methods for causal inference with real-world genetic datasets. Our focus is on MR methods that take GWAS summary statistics as input, as they do not require access to individual-level GWAS data and are widely applicable. Specifically, we consider 16 MR methods: the standard IVW (fixed) and IVW (random), plus 14 more advanced MR methods: dIVW, Egger, RAPS, Weighted-median, Weighted-mode, MR-PRESSO, MRMix, cML-MA, MR-Robust, MR-Lasso, MR-CUE, CAUSE, MR-APSS and MR-ConMix (Figure A). The procedure for running the MR methods is outlined in Figure B. To assess the performance of these MR methods, we used real-world datasets and focused on four key aspects: type I error control, the accuracy of causal effect estimates, replicability, and power (Figure C).
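For readers unfamiliar with the baseline method, the following is a minimal base-R sketch of the textbook fixed-effect IVW estimator computed from summary statistics. It is not the code used in this repository; the function name `ivw_fixed`, the variable names, and the toy values are assumptions made for illustration only.

```r
# Toy summary statistics for three hypothetical IVs (values invented for illustration).
b_exp  <- c(0.10, 0.20, 0.15)    # SNP-exposure effect estimates
b_out  <- c(0.05, 0.10, 0.075)   # SNP-outcome effect estimates
se_out <- c(0.01, 0.02, 0.015)   # standard errors of the SNP-outcome effects

# Textbook fixed-effect IVW: a weighted average of per-SNP ratio estimates,
# with weights b_exp^2 / se_out^2 (first-order approximation).
ivw_fixed <- function(b_exp, b_out, se_out) {
  w     <- b_exp^2 / se_out^2
  ratio <- b_out / b_exp
  est   <- sum(w * ratio) / sum(w)
  se    <- sqrt(1 / sum(w))
  z     <- est / se
  list(estimate = est, se = se, pval = 2 * pnorm(-abs(z)))
}

res <- ivw_fixed(b_exp, b_out, se_out)
print(res$estimate)
```

In this toy example every per-SNP ratio equals 0.5, so the pooled estimate is also 0.5. The random-effects variant and the other benchmarked methods differ mainly in how they handle heterogeneity and invalid instruments.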
The original GWAS datasets used in this study are summarized in Table GWASs.xlsx. You can access the original GWAS datasets directly through the download links provided in the table. The formatted datasets used in this study are provided below.
Dataset 1: GWASATLAS Dataset for evaluation of type I error control in confounding scenario (a): Population stratification
Formatted GWASs for exposures; Formatted GWASs for outcomes; Formatted IV data for MR analysis;
Dataset 2: the Neal Lab Dataset for evaluation of type I error control in confounding scenario (a): Population stratification
Formatted GWASs; Formatted IV data for MR analysis.
Dataset 3: the Pan UKBB Dataset for evaluation of type I error control in confounding scenario (a): Population stratification
Formatted GWASs; Formatted IV data for MR analysis.
Dataset 4: the dataset for evaluation of type I error control in confounding scenario (b): Pleiotropy
Formatted GWASs; Formatted IV data for MR analysis
Dataset 5: the dataset for evaluation of type I error control in confounding scenario (c): Family-level confounders
Formatted GWASs; Formatted IV data for MR analysis
Dataset 6: Formatted GWASs; Formatted IV data for MR analysis;
Dataset 7: Formatted GWASs; Formatted IV data for MR analysis.
(1) "Formatted GWASs" refers to the formatted summary-level data files generated after quality control from the original GWAS datasets.
(2) "Formatted IV data for MR analysis" contains the following three types of files:
"Tested Trait pairs": the exposure-outcome trait pairs to be analyzed;
"MRdat": the summary statistics of the LD-clumped IV sets for each tested trait pair, which can be used directly for MR analysis;
"bg_paras": the estimated background parameters "Omega" and "C", which are used for MR estimation in MR-APSS.
(3) The details on data preprocessing including quality control of GWAS summary statistics, formatting GWASs, and LD clumping for IV selection can be found in the supplementary note of our paper[1].
Implementation details for data preprocessing can be found in the MR-APSS software tutorial on the MR-APSS GitHub website.
# install.packages("devtools")
# install.packages("remotes")
devtools::install_github("gqi/MRMix")
devtools::install_github("xue-hr/MRcML")
devtools::install_github("jean997/cause@v1.2.0")
devtools::install_github("rondolab/MR-PRESSO")
install.packages("MendelianRandomization")
devtools::install_github("YangLabHKUST/MR-APSS")
devtools::install_github("QingCheng0218/MR.CUE@main")
remotes::install_github("MRCIEU/TwoSampleMR")
devtools::install_github("qingyuanzhao/mr.raps")
install.packages("robustbase")
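After installation, it can be useful to confirm that every package is available before running the benchmark. The sketch below is one way to do this; the package names are assumed from the repositories listed above (for example, that the MR-PRESSO repository installs a package named `MRPRESSO` and MR-APSS installs `MRAPSS`) and should be adjusted if they differ on your system.

```r
# Package names assumed from the install commands above; adjust if needed.
pkgs <- c("MRMix", "MRcML", "cause", "MRPRESSO", "MendelianRandomization",
          "MRAPSS", "MR.CUE", "TwoSampleMR", "mr.raps", "robustbase")

# requireNamespace() returns TRUE or FALSE without attaching the package.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- names(installed)[!installed]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```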
We performed IV selection for each trait pair in each dataset. The R code for IV selection is available in IV_selection.R.
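As a rough illustration of the first step of IV selection only, the sketch below filters SNPs by a p-value threshold in base R. It does not reproduce IV_selection.R: the column names, the toy data, the function name `select_ivs`, and the threshold of 5e-8 are all assumptions, and the LD clumping step is not shown.

```r
# Toy exposure GWAS summary statistics (hypothetical SNP IDs and p-values).
gwas <- data.frame(
  SNP  = c("rs1", "rs2", "rs3", "rs4"),
  pval = c(1e-9, 3e-6, 0.02, 4e-8),
  stringsAsFactors = FALSE
)

# Keep SNPs whose exposure p-value is below the chosen threshold.
# The default threshold here is an assumption, not the repository's setting.
select_ivs <- function(dat, threshold = 5e-8) {
  dat[dat$pval < threshold, , drop = FALSE]
}

ivs <- select_ivs(gwas)
print(ivs$SNP)
```

In practice the retained SNPs would then be LD clumped so that the final IV set is approximately independent.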
We then applied each method to the datasets obtained after IV selection. The R code for running the MR methods on each dataset is available in main_run_MR_methods.R. Before running main_run_MR_methods.R, you must load the required packages and the R functions in the folder Rfuncs.
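One way to load every R function in the Rfuncs folder is sketched below. The helper name `source_rfuncs` is hypothetical, and the sketch assumes the folder contains plain `.R` scripts and that the working directory is the repository root.

```r
# Source every .R file in a directory (hypothetical helper, not part of the repository).
source_rfuncs <- function(dir = "Rfuncs") {
  if (!dir.exists(dir)) {
    stop("Directory not found: ", dir)
  }
  files <- list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE)
  for (f in files) {
    source(f)  # evaluates each script in the global environment
  }
  invisible(files)
}

# Usage, from the repository root:
# source_rfuncs("Rfuncs")
```

Returning the file list invisibly makes it easy to check which scripts were loaded without cluttering the console.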
Results for dataset 1;
Results for dataset 2;
Results for dataset 3;
Results for dataset 4;
Results for dataset 5;
Results for dataset 6;
Results for dataset 7.
The datasets were reorganized on September 24, 2024.
Xianghong Hu, Mingxuan Cai, Jiashun Xiao, Xiaomeng Wan, Zhiwei Wang, Hongyu Zhao, Can Yang, Benchmarking Mendelian randomization methods for causal inference using genome-wide association study summary statistics, The American Journal of Human Genetics, 2024. [link]; [medrxiv version].
Please feel free to contact Xianghong Hu (maxhu@ust.hk) or Prof. Can Yang (macyang@ust.hk) if you have any questions.