Benchmarking Mendelian Randomization methods for causal inference using genome‐wide association study summary statistics

The experimental design for benchmarking MR methods

We present a benchmarking analysis of MR methods for causal inference with real-world genetic datasets. Our focus is on MR methods that take GWAS summary statistics as input, as they do not require access to individual-level GWAS data and are widely applicable. Specifically, we consider 16 MR methods: the standard IVW (fixed) and IVW (random), plus 14 more advanced MR methods: dIVW, Egger, RAPS, Weighted-median, Weighted-mode, MR-PRESSO, MRMix, cML-MA, MR-Robust, MR-Lasso, MR-CUE, CAUSE, MR-APSS and MR-ConMix (Figure A). The procedure for running the MR methods is outlined in Figure B. To assess the performance of these MR methods, we used real-world datasets and focused on four key aspects: type I error control, the accuracy of causal effect estimates, replicability, and power (Figure C).
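To make the summary-statistics setting concrete, the following is a minimal base-R sketch of the textbook IVW estimator with fixed-effect and multiplicative random-effects standard errors. It is an illustration only, not the implementation used in this repository; the function name `ivw_estimate` and the argument names `bx`, `by` and `se_y` are hypothetical, and the toy data are simulated rather than taken from any GWAS.

```r
# Textbook IVW estimator from per-SNP summary statistics (illustrative sketch).
# bx:   SNP-exposure effect estimates
# by:   SNP-outcome effect estimates
# se_y: standard errors of the SNP-outcome effect estimates
ivw_estimate <- function(bx, by, se_y) {
  w <- 1 / se_y^2                               # first-order inverse-variance weights
  beta <- sum(w * bx * by) / sum(w * bx^2)      # weighted regression through the origin
  se_fixed <- sqrt(1 / sum(w * bx^2))
  # Cochran's Q measures heterogeneity; the multiplicative random-effects
  # model inflates the SE by sqrt(phi), with phi bounded below by 1.
  q <- sum(w * (by - beta * bx)^2)
  phi <- max(1, q / (length(bx) - 1))
  se_random <- se_fixed * sqrt(phi)
  z <- beta / c(se_fixed, se_random)
  data.frame(
    model = c("IVW (fixed)", "IVW (random)"),
    estimate = beta,
    se = c(se_fixed, se_random),
    p_value = 2 * pnorm(-abs(z))
  )
}

# Toy data simulated with a true causal effect of 0.3 (not real GWAS data).
set.seed(1)
bx <- rnorm(50, mean = 0.1, sd = 0.02)
se_y <- rep(0.01, 50)
by <- 0.3 * bx + rnorm(50, sd = se_y)
print(ivw_estimate(bx, by, se_y))
```

Both rows share the same point estimate; only the standard error, and therefore the p-value, differs between the fixed and random variants.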

Datasets

GWAS sources

The original GWAS datasets used in this study are summarized in Table GWASs.xlsx. You can access the original GWAS datasets directly through the download links provided in the table. The formatted datasets used in this study are provided below.

Dataset 1: GWASATLAS Dataset for evaluation of type I error control in confounding scenario (a): Population stratification

Formatted GWASs for exposures; Formatted GWASs for outcomes; Formatted IV data for MR analysis.

Dataset 2: the Neale Lab Dataset for evaluation of type I error control in confounding scenario (a): Population stratification

Formatted GWASs; Formatted IV data for MR analysis.

Dataset 3: the Pan UKBB Dataset for evaluation of type I error control in confounding scenario (a): Population stratification

Formatted GWASs; Formatted IV data for MR analysis.

Dataset 4: the dataset for evaluation of type I error control in confounding scenario (b): Pleiotropy

Formatted GWASs; Formatted IV data for MR analysis.

Dataset 5: the dataset for evaluation of type I error control in confounding scenario (c): Family-level confounders

Formatted GWASs; Formatted IV data for MR analysis.

Dataset 6: the dataset for evaluation of the accuracy of causal effect estimates

Formatted GWASs; Formatted IV data for MR analysis.

Dataset 7: the dataset for evaluation of replicability

Formatted GWASs; Formatted IV data for MR analysis.

Notes:

(1) "Formatted GWASs" refers to the formatted summary-level data files generated from the original GWAS datasets after quality control.
(2) "Formatted IV data for MR analysis" contains the following three types of files:
"Tested Trait pairs": the exposure-outcome trait pairs to be analyzed;
"MRdat": the summary statistics of the LD-clumped IV sets for each tested trait pair, which can be used directly for MR analysis;
"bg_paras": the estimated background parameters "Omega" and "C", which are used for MR estimation in MR-APSS.
(3) Details on data preprocessing, including quality control of GWAS summary statistics, formatting of GWASs, and LD clumping for IV selection, can be found in the supplementary note of our paper[1]. Implementation details on data preprocessing can be found in the MR-APSS software tutorial on the MR-APSS GitHub website.

R code

Install required packages

#install.packages("devtools")
#install.packages("remotes")

devtools::install_github("gqi/MRMix")

devtools::install_github("xue-hr/MRcML")

devtools::install_github("jean997/cause@v1.2.0")

devtools::install_github("rondolab/MR-PRESSO")

install.packages("MendelianRandomization")

devtools::install_github("YangLabHKUST/MR-APSS")

devtools::install_github("QingCheng0218/MR.CUE@main")

remotes::install_github("MRCIEU/TwoSampleMR")

devtools::install_github("qingyuanzhao/mr.raps")

install.packages("robustbase")
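After installation, it can help to confirm which packages are actually available before running the benchmark. The sketch below uses only base R. The package names in `required_pkgs` are assumptions inferred from the repositories listed above (an installed package name can differ from its repository name), and `check_installed` is a hypothetical helper, not part of this repository.

```r
# Assumed installed package names for the installs above; adjust if a
# package registers under a different name than its repository.
required_pkgs <- c(
  "MRMix", "MRcML", "cause", "MRPRESSO", "MendelianRandomization",
  "MRAPSS", "MR.CUE", "TwoSampleMR", "mr.raps", "robustbase"
)

# Return a named logical vector: TRUE if the package namespace can be loaded.
check_installed <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

status <- check_installed(required_pkgs)
missing_pkgs <- names(status)[!status]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All listed packages are available.")
}
```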

Run MR Methods

We performed IV selection for each trait pair in each dataset. The R code for IV selection is available in IV_selection.R.

We then applied each compared method to the dataset obtained after IV selection. The R code for running the 15 MR methods on each dataset is available in main_run_MR_methods.R. To run main_run_MR_methods.R, you must first load the required packages and the R functions in the folder Rfuncs.
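One way to load every helper in the Rfuncs folder is to source each R file it contains. This is a minimal sketch that assumes the folder holds plain `.R` scripts and that the working directory is the repository root; `source_dir` is a hypothetical helper name, not a function provided by this repository.

```r
# Source every .R file in a directory into the global environment.
# Returns the sourced file paths invisibly.
source_dir <- function(dir) {
  files <- list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE)
  for (f in files) {
    source(f)
  }
  invisible(files)
}

# Load the helpers only if the folder exists in the working directory.
if (dir.exists("Rfuncs")) {
  loaded <- source_dir("Rfuncs")
  message("Sourced ", length(loaded), " file(s) from Rfuncs")
}
```

If a helper fails to load, sourcing the files one at a time makes it easier to see which script raised the error.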

Results of MR methods

Results for dataset 1;
Results for dataset 2;
Results for dataset 3;
Results for dataset 4;
Results for dataset 5;
Results for dataset 6;
Results for dataset 7.
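As an illustration of how type I error control might be summarized from such results, the sketch below computes, for each method, the fraction of tested trait pairs with a p-value below a nominal level. It assumes that the trait pairs in question have no true causal effect, and it assumes a hypothetical long-format table with columns `method` and `p_value`; the actual layout of the result files may differ, and the p-values shown are made up for the example.

```r
# Empirical type I error rate per method: the share of null trait pairs
# whose p-value falls below the nominal level alpha.
type1_error <- function(results, alpha = 0.05) {
  rate <- tapply(results$p_value < alpha, results$method, mean)
  data.frame(method = names(rate), type1_error = as.numeric(rate))
}

# Made-up p-values for two hypothetical methods (not results from the paper).
toy_results <- data.frame(
  method = c("A", "A", "A", "A", "B", "B"),
  p_value = c(0.01, 0.50, 0.20, 0.03, 0.60, 0.70)
)
print(type1_error(toy_results))
```

A well-calibrated method should give a rate close to `alpha` on null trait pairs; a rate well above it indicates inflated false positives.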

Updates

The datasets were reorganized on September 24, 2024.

Reference

Xianghong Hu, Mingxuan Cai, Jiashun Xiao, Xiaomeng Wan, Zhiwei Wang, Hongyu Zhao, Can Yang, Benchmarking Mendelian randomization methods for causal inference using genome-wide association study summary statistics, The American Journal of Human Genetics, 2024. [link]; [medrxiv version].

Contact information

Please feel free to contact Xianghong Hu (maxhu@ust.hk) or Prof. Can Yang (macyang@ust.hk) if you have any questions.
