Jared M. Cole, Carly B. Scott, Mackenzie M. Johnson, Peter R. Golightly, Jedidiah Carlson, Matthew J. Ming, Arbel Harpak, & Mark Kirkpatrick
Welcome! Here you can find the code to reproduce the analyses of "The battle of the sexes in humans is highly polygenic".
- All data generated from this project can be found at Zenodo:
- Code used to perform all analyses and plotting can be found in the scripts directory.
- The following software was used:
- The following R packages were used:
  - optparse
  - dplyr
  - tidyverse
  - ggplot2
  - ggExtra
  - cowplot
  - scales
  - data.table
  - qqman
  - abc
Detailed descriptions of all the data files found at the Zenodo repository can be found in the file: Data_file_descriptions.txt
Scripts are numbered in the approximate order in which the analyses appear in the text. The main scripts (in both R and bash, numbered 1-10) call several unnumbered subscripts to perform various tasks (see the scripts directory). ML_functions.R contains most of the functions shared across multiple scripts, including the likelihood functions.
Simulates data and fits the likelihood to estimate selection coefficients on the simulated data. Requires the following data files:
UKB_mafs_imputed_filtered.txt
UKB_r2_genotyped_filtered.txt
UKB_mafs_r2_genotyped.txt
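Before running the simulation script, it can help to confirm that the three input files listed above are in place. The following is a minimal sketch rather than part of the repository: the function name `check_required_files` and the data directory argument are hypothetical, and only the three file names come from this README.

```shell
# Check that the three input files named in this README are present.
# data_dir is a hypothetical location: point it at wherever the Zenodo files were downloaded.
check_required_files() {
  local data_dir="${1:-.}"
  local missing=0
  local f
  for f in UKB_mafs_imputed_filtered.txt \
           UKB_r2_genotyped_filtered.txt \
           UKB_mafs_r2_genotyped.txt; do
    if [ ! -f "${data_dir}/${f}" ]; then
      # Report each missing file on stderr and remember that something is missing.
      echo "Missing required file: ${data_dir}/${f}" >&2
      missing=1
    fi
  done
  # Return 0 only if every file was found.
  return "$missing"
}
```

For example, `check_required_files ./data && echo "All inputs found"` prints the message only when all three files exist in `./data`.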
Set of bash commands to extract the relevant haplotype and imputation data from UKB (BGEN files), as well as extract relevant metadata fields (as text file).
R script that conducts sample-level quality control (QC) steps using metadata (e.g., missingness and relatedness).
Bash commands (with several intermediary scripts) that conduct site-level QC on the UKB genomic data (e.g., MAF filtering, genotype quality, and regions homologous to sex chromosome regions). Outputs filtered haplotype data.
Filtering is done using PLINK 1.9 and PLINK 2.0.
This script is used in conjunction with Site_marker_QC_steps.R, described below.
R script that performs additional site-level QC (e.g., removing sites with excess heterozygosity and assessing missingness between males and females).
This script is used in conjunction with 4.PLINK_site_filtering_steps_and_processing.sh, described above.
Bash commands (with several intermediary scripts) to count haplotypes from the filtered output generated by 4.PLINK_site_filtering_steps_and_processing.sh.
Bash commands (with several intermediary R and bash scripts) that run the likelihood analyses and perform bootstrapping.
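As a generic illustration of bootstrapping, one replicate can be drawn by sampling the rows of a table with replacement until the replicate matches the original size. This is a sketch under stated assumptions, not the repository's procedure: the function name `bootstrap_resample` is hypothetical, what a row represents (a site, an individual, or a genomic block) is left open, and the repository's scripts may resample differently.

```shell
# Draw one bootstrap replicate from a text file: sample rows with replacement,
# producing as many rows as the input. The optional second argument seeds the
# random number generator so that a replicate can be reproduced.
bootstrap_resample() {
  awk -v seed="${2:-1}" '
    BEGIN { srand(seed) }
    { rows[NR] = $0 }                       # hold every input row in memory
    END {
      for (i = 1; i <= NR; i++)
        print rows[int(rand() * NR) + 1]    # pick a row index uniformly from 1..NR
    }
  ' "$1"
}
```

Calling `bootstrap_resample counts.txt 7 > replicate_7.txt` (both file names are placeholders) writes one replicate. Repeating the call with different seeds and re-estimating on each replicate gives the spread of the estimate.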
Bash commands for running the Standard Major Axis regression on 27 traits.
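For orientation, standard major axis regression fits a line whose slope is the ratio of the standard deviation of y to that of x, carrying the sign of their covariance, with the line passing through the two means. The following awk sketch shows that calculation on a two-column file. It is an illustration only: the function name `sma_fit` and the input layout (x then y, whitespace-delimited, no header) are assumptions, and the repository's own scripts may compute the fit differently.

```shell
# Fit a standard major axis (SMA) line to a two-column file (x then y).
# Prints "slope intercept". Assumes no header line, at least two rows,
# and non-zero variance in x.
sma_fit() {
  awk '
    { n++; sx += $1; sy += $2; sxx += $1 * $1; syy += $2 * $2; sxy += $1 * $2 }
    END {
      mx = sx / n; my = sy / n
      vx  = sxx / n - mx * mx     # variance of x (the n vs n-1 choice cancels in the ratio)
      vy  = syy / n - my * my     # variance of y
      cxy = sxy / n - mx * my     # covariance, used only for its sign
      sign = (cxy < 0) ? -1 : 1
      slope = sign * sqrt(vy / vx)
      printf "%.4f %.4f\n", slope, my - slope * mx
    }
  ' "$1"
}
```

Unlike ordinary least squares, this slope treats x and y symmetrically, which is why it is often preferred when both variables are measured with error.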
R script used to perform Approximate Bayesian Computation.
R script that runs statistical analyses (e.g., Mann-Whitney U tests and chi-square tests) on data generated by 6.Run_likelihood_analyses_with_bootstrapping.sh (Figs 1C-3). Includes all analyses presented in the main text and in the supplement.
R script to perform all plotting in the main text and the supplement.