feat: finish running on geuvadis #43

Merged
merged 131 commits into from
Jun 5, 2024
131 commits
1c1b752
wip: classes for multiple multiple-testing corrections
aryarm Sep 17, 2023
e791e18
integrate corrections into Terminator
aryarm Sep 26, 2023
cfdada1
fix issues with corrector class
aryarm Oct 1, 2023
d9c8199
try to generate simple plot with different alpha values
aryarm Oct 2, 2023
322e016
standardize in validate_linreg_methods
aryarm Nov 13, 2023
a72b7f2
disable multiple testing corrections by default
aryarm Nov 13, 2023
f846daa
subset gt file by samples in pt file first
aryarm Nov 13, 2023
e55bc90
add happler transform command
aryarm Nov 13, 2023
df8a19e
also show direction of LD in plots
aryarm Nov 13, 2023
58f9562
allow for comparing LD between causal and observed haps
aryarm Nov 13, 2023
8423d24
prune the negative allele but keep the positive one
aryarm Nov 13, 2023
087607c
handle single gwas in manhattan script
aryarm Nov 14, 2023
1dad9ab
make manhattan script more modular
aryarm Nov 21, 2023
1770ca3
remove multiple testing corrections as we add more nodes to a branch
aryarm Nov 28, 2023
d2d0ac2
fix broken code in summarize_results.R
aryarm Dec 5, 2023
27bcbdf
revise broken code from refactor
aryarm Dec 5, 2023
518c71f
make ld_range potentially reproducible
aryarm Dec 12, 2023
d6d26b7
switch from runtime to runtime_min
aryarm Dec 12, 2023
815d001
try adding support for random haps
aryarm Dec 13, 2023
5ca12d2
fix miscellaneous errors
aryarm Dec 13, 2023
d6cb28d
allow for arbitrary haplotype IDs hap files
aryarm Dec 13, 2023
8c2f93f
move causal_gt param to inputs
aryarm Dec 13, 2023
dae8dc9
handle case where causal hap is also within input
aryarm Dec 13, 2023
0b84477
handle case with no hap
aryarm Dec 15, 2023
f8dcb67
add susie metrics script
aryarm Dec 21, 2023
78b3adf
add bash script for running on tscc 2.0
aryarm Dec 21, 2023
13f3bb3
add cluster-sync command
aryarm Jan 4, 2024
faa089e
add step to plot finemapping metrics
aryarm Jan 4, 2024
5f2c187
create simpler version of haptools ld command
aryarm Jan 18, 2024
75a34ee
reinstate hap mode support
aryarm Jan 28, 2024
e647bec
modify run.bash to run on slurm
aryarm Feb 2, 2024
81fd9e0
allow for specifying various sample sizes
aryarm Feb 2, 2024
a930bed
start on geuvadis workflow
aryarm Feb 8, 2024
04003cf
install peer script in default env
aryarm Feb 8, 2024
dd7e6bd
add Michael's peer residual script
aryarm Feb 8, 2024
0b1075b
try to use region param in entire happler workflow
aryarm Feb 9, 2024
f88d721
use maf in happler script
aryarm Feb 10, 2024
cec879a
create separate env with peer since it conflicts
aryarm Feb 10, 2024
4a6a40e
run happler pipeline until gwas but not manhattan
aryarm Feb 11, 2024
ecfd1d5
support inferring orange hap ids in manhattan plot
aryarm Feb 11, 2024
fb88243
implement region pgen handling code in R scripts
aryarm Feb 11, 2024
f008959
remove double log file
aryarm Feb 20, 2024
a070cbc
give up on running covariates and just use residuals instead
aryarm Feb 20, 2024
0fa0b3b
run 1000G different sample sizes
aryarm Feb 20, 2024
4d629ff
enable plotting module to handle sampsize and locus wildcards
aryarm Feb 21, 2024
ff2c39b
run on geuvadis
aryarm Mar 12, 2024
7c36a40
ensure scientific notat not used and liftover regions
aryarm Mar 21, 2024
a639a33
compute pgen ld a little faster
aryarm Mar 28, 2024
578435e
also enable computing allele LD heatmaps
aryarm Mar 28, 2024
a6ddbef
fix happler_results rule when in 'run' mode
aryarm Mar 28, 2024
0a149b2
enable debug mode via config
aryarm Mar 31, 2024
cd1f2ac
plot finemapping without causal variable
aryarm Mar 31, 2024
05564a0
add r debugging function
aryarm Apr 1, 2024
552454d
allow summarize_results.R to work wo causal variant
aryarm Apr 1, 2024
7c7b3b6
fix issue with loading pgen data in R with region
aryarm Apr 1, 2024
62aa305
add SVs to manhattan plots
aryarm Apr 2, 2024
474a8f9
switch to snakemake 8 and slurm executor
aryarm Apr 2, 2024
58ce2a5
add table at bottom of allele heatmap
aryarm Apr 4, 2024
fcdc69a
switch to black and white
aryarm Apr 4, 2024
791c678
convert white to gray in LD heatmap
aryarm Apr 8, 2024
56f5153
make script for plotting pts vs gts for a hap
aryarm Apr 9, 2024
a2e6d81
add boxplots to normal snakemake output
aryarm Apr 9, 2024
c4bd437
try to batch igv
aryarm Apr 12, 2024
bdce829
add IGV plotting
aryarm Apr 16, 2024
7bc79a5
update to snakemake8
aryarm Apr 16, 2024
ade0975
add plot for variance explained
aryarm Apr 18, 2024
48f4443
refactor variance_explained and remove linears param
aryarm Apr 18, 2024
6ead321
allow for extracting PIPs to a TSV
aryarm Apr 19, 2024
1651bc7
update to latest haptools
aryarm Apr 19, 2024
1f1d06b
use inplace subset of phenotypes
aryarm Apr 19, 2024
5f6ed3d
upgradesnakemake version
aryarm Apr 22, 2024
9216cfa
create conditional linreg plots
aryarm Apr 22, 2024
e3e9af8
add original manhattan to conditional regression plot
aryarm Apr 26, 2024
a5067b0
revise run.bash to work with condo instead of hotel
aryarm Apr 26, 2024
6be296e
use set instead of tuple of samples after updating haptools
aryarm Apr 30, 2024
2f9fb61
use only a specific hap in the IGV plots
aryarm Apr 30, 2024
197e0ed
try the strict branching strategy instead
aryarm Apr 30, 2024
fa633a4
create script for computing residuals
aryarm May 1, 2024
c226788
add residual computation and split_pheno scripts
aryarm May 1, 2024
b86c369
fix comment in residuals script
aryarm May 1, 2024
3434767
handle samples that are phenotyped but not genotyped
aryarm May 4, 2024
f2c513b
use 'trait' instead of 'gene' as wildcard name
aryarm May 6, 2024
7553a4a
tweak workflow slightly for running on UKB
aryarm May 6, 2024
ec81cec
upgrade plink2 in workflow
aryarm May 7, 2024
e123834
do not import simulation workflow if just running happler
aryarm May 7, 2024
a7036ee
ensure withdrawn samples are removed not kept
aryarm May 7, 2024
3b86df6
make ukb config the default instead
aryarm May 7, 2024
7a6a3d6
add space after param in genotypes module
aryarm May 7, 2024
8c6ea4d
fix issue where we were only running one locus for each trait
aryarm May 7, 2024
9fd618d
prune even with the strict branching strategy
aryarm May 13, 2024
26eaf8c
fix broken workflow after removing --no-temp
aryarm May 21, 2024
2fc43f6
fix labeling of manhattans
aryarm May 22, 2024
1013b7e
handle buffer/padding in heatmap plots
aryarm May 24, 2024
17f604d
switch from shapeit to beagle bc it's faster
aryarm May 24, 2024
7496a48
ensure chosen haplotypes exceed maf threshold
aryarm May 28, 2024
7c20bfa
add test for 1000G test dataset
aryarm May 30, 2024
f7296ea
disable filtering with maf of None
aryarm May 30, 2024
a199403
fix bug in maf masking and add proper test
aryarm May 30, 2024
f3cc6b1
switch to symlink for config file
aryarm May 30, 2024
7d271b1
upgrade to latest snakemake
aryarm May 30, 2024
127808a
try to use less memory while tree building
aryarm May 30, 2024
1fc5ab3
add checkpoint to check which loci have multi-variant haps
aryarm May 31, 2024
636652a
provide best allele idx instead of best res idx
aryarm May 31, 2024
f4731a9
fix bugs in multiline checkpoint
aryarm May 31, 2024
278a42b
update to latest beagle and reduce happler runtime
aryarm May 31, 2024
a2c7005
add extra plots to output
aryarm May 31, 2024
7c21201
filter by maf in all happler steps
aryarm May 31, 2024
8f7b508
add test with real data
aryarm Jun 2, 2024
c3a56d7
fix error when output is stdout but --show-tree is given
aryarm Jun 2, 2024
0d38703
fix bug when terminating strict branching with an maf threshold
aryarm Jun 2, 2024
3dbcf57
try to switch to dev-env.yml file
aryarm Jun 3, 2024
c3af2f3
update pyproject to use only py3.8+
aryarm Jun 3, 2024
9b3566e
remove py3.7 from ci
aryarm Jun 3, 2024
1c56b7d
update sphinx-autodoc-typehints
aryarm Jun 3, 2024
1d90176
add statsmodels and update sphinx-autodoc-typehints again
aryarm Jun 3, 2024
ea9838b
disable macos ci for now
aryarm Jun 3, 2024
9d36b70
also add SV analysis to target rule
aryarm Jun 3, 2024
9c0afba
try to use custom formula for memory requirement
aryarm Jun 3, 2024
269fc35
use custom formula for runtime as well
aryarm Jun 3, 2024
9714546
fix rtd issue
aryarm Jun 4, 2024
2aa71bb
refmt with black
aryarm Jun 4, 2024
4763715
re-adjust memory to work for more cases
aryarm Jun 4, 2024
a3aba3f
Merge branch 'feat/multiple-testing' of github.com:gymrek-lab/happler…
aryarm Jun 4, 2024
346a442
fix bool typing rtd err
aryarm Jun 4, 2024
fada3f4
fix remaining rtd issues again
aryarm Jun 4, 2024
115b53c
try to resolve issues with newer versions of packages
aryarm Jun 4, 2024
961b553
lower maf thresh and try to rerun happler
aryarm Jun 4, 2024
b615764
skip debug log for real example and only log tree when needed
aryarm Jun 5, 2024
0c5399a
refmt with black
aryarm Jun 5, 2024
112c3ce
try to fix last rtd warning
aryarm Jun 5, 2024
02f36c4
increase time for vcf2plink command
aryarm Jun 5, 2024
41 changes: 41 additions & 0 deletions .devcontainer.json
@@ -0,0 +1,41 @@
// For format details, see https://aka.ms/devcontainer.json. For config options, see the
// README at: https://github.com/devcontainers/templates/tree/main/src/ubuntu
{
"name": "Ubuntu",
// Or use a Dockerfile or Docker Compose file. More info: https://containers.dev/guide/dockerfile
"image": "mcr.microsoft.com/devcontainers/base:jammy",
"features": {
"ghcr.io/rocker-org/devcontainer-features/miniforge:1": {
"version": "latest",
"variant": "Mambaforge"
}
},

// Features to add to the dev container. More info: https://containers.dev/features.
// "features": {},

// Use 'forwardPorts' to make a list of ports inside the container available locally.
// "forwardPorts": [],

// Use 'postCreateCommand' to run commands after the container is created.
"postCreateCommand": "mamba env create -n happler -f dev-env.yml && conda run -n happler poetry install",

// Configure tool-specific properties.
"customizations": {
"vscode": {
"extensions": ["ms-python.python"],
"settings": {
"python.condaPath": "/opt/conda/condabin/conda",
"python.defaultInterpreterPath": "/opt/conda/envs/happler/bin/python",
"python.terminal.activateEnvironment": true,
"python.terminal.activateEnvInCurrentTerminal": true,
"python.venvFolders": ["/home/vscode/.cache/pypoetry/virtualenvs"],
"terminal.integrated.environmentChangesRelaunch": true,
"terminal.integrated.hideOnStartup": "always"
}
}
}

// Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
// "remoteUser": "root"
}
13 changes: 13 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,13 @@
## Checklist

* [ ] I've checked to ensure there aren't already other open [pull requests](../../../pulls) for the same update/change
* [ ] I've prefixed the title of my PR according to [the conventional commits specification](https://www.conventionalcommits.org/). If your PR fixes a bug, please prefix the PR with `fix: `. Otherwise, if it introduces a new feature, please prefix it with `feat: `. If it introduces a breaking change, please add an exclamation before the colon, like `feat!: `. If the scope of the PR changes because of a revision to it, please update the PR title, since the title will be used in our CHANGELOG.
* [ ] At the top of the PR, I've [listed any open issues that this PR will resolve](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword). For example, "resolves #0" if this PR resolves issue #0
- [ ] I've explained my changes in a manner that will make it possible for both users and maintainers of happler to understand them
* [ ] I have followed the [contributing guidelines](https://happler.readthedocs.io/en/stable/project_info/contributing.html#how-to-fix-a-bug-or-implement-a-new-feature)
* [ ] I have adhered to the [style guidelines](https://happler.readthedocs.io/en/stable/project_info/contributing.html#style)
* [ ] I've added tests for any new functionality. Or, if this PR fixes a bug, I've added test(s) that replicate it
* [ ] I've updated the relevant documentation and checked that the newly built documentation is formatted properly
* [ ] All functions, modules, classes etc. still conform to [numpy docstring standards](https://numpydoc.readthedocs.io/en/latest/format.html)
* [ ] (if applicable) I've updated the pyproject.toml file with any changes I've made to happler's dependencies, and I've run `poetry lock --no-update` to ensure the lock file stays up to date and that our dependencies are locked to their minimum versions
* [ ] In the body of this PR, I've included a short address to the reviewer highlighting one or two items that might deserve their focus
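The title-prefix rule in the checklist above can be checked mechanically. The following is a minimal sketch, not part of happler: the `TITLE_PATTERN` and `check_title` names are hypothetical, only the `fix` and `feat` types named in the checklist are accepted, and the optional `(scope)` segment is an assumption taken from the conventional commits specification rather than from this template.

```python
import re

# Hypothetical pattern: "fix" or "feat", an optional "(scope)" (assumed from the
# conventional commits spec), an optional "!" for breaking changes, then ": "
# followed by at least one non-space character of description.
TITLE_PATTERN = re.compile(r"^(fix|feat)(\([\w-]+\))?!?: \S")


def check_title(title: str) -> bool:
    """Return True if the PR title starts with an accepted prefix."""
    return TITLE_PATTERN.match(title) is not None


if __name__ == "__main__":
    for title in ("feat: finish running on geuvadis", "finish running on geuvadis"):
        print(title, "->", check_title(title))
```

A check like this could run in CI, but whether the project wants to enforce the rule automatically is a separate design decision.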
5 changes: 0 additions & 5 deletions .github/workflows/constraints.txt

This file was deleted.

126 changes: 56 additions & 70 deletions .github/workflows/tests.yml
@@ -1,7 +1,6 @@
name: Tests

on:
- pull_request
on: [pull_request, workflow_call]

jobs:
tests:
@@ -11,13 +10,15 @@ jobs:
fail-fast: false
matrix:
include:
- { python: "3.7", os: "ubuntu-latest", session: "lint" }
- { python: "3.7", os: "ubuntu-latest", session: "tests" }
- { python: "3.8", os: "ubuntu-latest", session: "lint" }
- { python: "3.8", os: "ubuntu-latest", session: "tests" }
- { python: "3.9", os: "ubuntu-latest", session: "tests" }
- { python: "3.10", os: "ubuntu-latest", session: "tests" }
# - { python: "3.10", os: "windows-latest", session: "tests" }
# - { python: "3.10", os: "macos-latest", session: "tests" }
- { python: "3.11", os: "ubuntu-latest", session: "tests" }
- { python: "3.12", os: "ubuntu-latest", session: "tests" }
# - { python: "3.11", os: "windows-latest", session: "tests" }
# - { python: "3.9", os: "macos-latest", session: "tests" }
- { python: "3.8", os: "ubuntu-latest", session: "size" }

env:
NOXSESSION: ${{ matrix.session }}
@@ -26,90 +27,75 @@ jobs:

steps:
- name: Check out the repository
uses: actions/checkout@v3
uses: actions/checkout@v4

- name: Set up Python ${{ matrix.python }}
uses: actions/setup-python@v3
- name: Setup Mambaforge
uses: conda-incubator/setup-miniconda@v3
with:
python-version: ${{ matrix.python }}
activate-environment: happler
miniforge-variant: Mambaforge
auto-activate-base: false
miniforge-version: latest
use-mamba: true

- name: Upgrade pip
run: |
pip install --constraint=.github/workflows/constraints.txt pip
pip --version
- name: Upgrade pip in virtual environments
shell: python
run: |
import os
import pip
with open(os.environ["GITHUB_ENV"], mode="a") as io:
print(f"VIRTUALENV_PIP={pip.__version__}", file=io)
- name: Install Poetry
- name: Get Date
id: get-date
run: echo "today=$(/bin/date -u '+%Y%m%d')" >> $GITHUB_OUTPUT
shell: bash

- name: Cache Conda env
uses: actions/cache@v3
with:
path: ${{ env.CONDA }}/envs
key:
conda-${{ runner.os }}--${{ runner.arch }}--${{ steps.get-date.outputs.today }}-${{ hashFiles('dev-env.yml') }}-${{ env.CACHE_NUMBER }}
env:
# Increase this value to reset cache if dev-env.yml has not changed
CACHE_NUMBER: 0
id: cache

- name: Install dev environment
run:
mamba env update -n happler -f dev-env.yml
if: steps.cache.outputs.cache-hit != 'true'

- name: Try to build happler
shell: bash -el {0}
run: |
pipx install --pip-args=--constraint=.github/workflows/constraints.txt poetry
poetry --version
- name: Install Nox
poetry build --no-ansi

- name: Check distribution size
if: matrix.session == 'size'
run: |
pipx install --pip-args=--constraint=.github/workflows/constraints.txt nox
pipx inject --pip-args=--constraint=.github/workflows/constraints.txt nox nox-poetry
nox --version
- name: Run Nox
du -csh dist/*
# check that the generated dist/ directory does not exceed 0.3 MB
# if this check fails, it's because you forgot to list large files in a "tool.poetry.exclude" section of our pyproject.toml
# https://python-poetry.org/docs/pyproject/#include-and-exclude
[ $(du -b dist | cut -f1) -lt 300000 ]

- name: Run tests with nox
if: matrix.session != 'size'
shell: bash -el {0}
run: |
nox --python=${{ matrix.python }}
nox --verbose --python=${{ matrix.python }}

- name: Upload coverage data
if: always() && matrix.session == 'tests'
uses: "actions/upload-artifact@v3"
with:
name: coverage-data
path: ".coverage.*"

large-files:
name: File sizes
runs-on: ubuntu-latest
steps:
- name: Check out the repository
uses: actions/checkout@v3
uses: actions/checkout@v4

- name: Check for large files
uses: actionsdesk/lfs-warning@v3.2
with:
token: ${{ secrets.GITHUB_TOKEN }} # Optional
filesizelimit: 500000b
labelName: large-files

# coverage:
# runs-on: ubuntu-latest
# needs: tests
# steps:
# - name: Check out the repository
# uses: actions/checkout@v3

# - name: Set up Python
# uses: actions/setup-python@v3
# with:
# python-version: "3.10"

# - name: Upgrade pip
# run: |
# pip install --constraint=.github/workflows/constraints.txt pip
# pip --version
# - name: Install Poetry
# run: |
# pipx install --pip-args=--constraint=.github/workflows/constraints.txt poetry
# poetry --version
# - name: Install Nox
# run: |
# pipx install --pip-args=--constraint=.github/workflows/constraints.txt nox
# pipx inject --pip-args=--constraint=.github/workflows/constraints.txt nox nox-poetry
# nox --version
# - name: Download coverage data
# uses: actions/download-artifact@v3
# with:
# name: coverage-data

# - name: Combine coverage data and display human readable report
# run: |
# nox --session=coverage
# - name: Create coverage report
# run: |
# nox --session=coverage -- xml
# - name: Upload coverage report
# uses: codecov/codecov-action@v3.1.0
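The "Check distribution size" step in the workflow above compares the size of `dist/` against 300000 bytes. A rough Python equivalent is sketched below for local use. It is not part of the workflow: the `dist_size` and `dist_is_small_enough` names are hypothetical, and the sketch sums file sizes only, so it can differ slightly from `du -b dist`, which also counts the directory entry itself.

```python
from pathlib import Path

# Mirrors the 300000-byte threshold used in the workflow step
MAX_DIST_BYTES = 300_000


def dist_size(dist_dir: str) -> int:
    """Return the total size in bytes of all regular files under dist_dir."""
    return sum(p.stat().st_size for p in Path(dist_dir).rglob("*") if p.is_file())


def dist_is_small_enough(dist_dir: str, limit: int = MAX_DIST_BYTES) -> bool:
    """Check that the build artifacts stay strictly below the size limit."""
    return dist_size(dist_dir) < limit


if __name__ == "__main__":
    # Assumes "poetry build" has already populated dist/
    if Path("dist").is_dir():
        print(dist_size("dist"), dist_is_small_enough("dist"))
```

Running this after `poetry build` gives a quick local answer before pushing, which can help spot large files that should be listed under `tool.poetry.exclude`.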
3 changes: 3 additions & 0 deletions .gitignore
@@ -12,7 +12,10 @@ dist/
analysis/data
analysis/out
analysis/log
analysis/Rplots.pdf
analysis/myenv.RData
.snakemake
.ipynb_checkpoints
# vscode
.vscode
venv/
28 changes: 18 additions & 10 deletions .readthedocs.yaml
@@ -2,18 +2,26 @@
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Note: I used https://github.com/readthedocs/readthedocs.org/issues/4912#issuecomment-664002569 for inspiration
# Note: I used https://docs.readthedocs.io/en/stable/build-customization.html#install-dependencies-with-poetry for inspiration

version: 2

build:
os: "ubuntu-22.04"
tools:
python: "3.8"
jobs:
post_create_environment:
# Install poetry
# https://python-poetry.org/docs/#installing-manually
- pip install poetry
post_install:
# Install dependencies with 'docs' dependency group
# https://python-poetry.org/docs/managing-dependencies/#dependency-groups
# VIRTUAL_ENV needs to be set manually for now.
# See https://github.com/readthedocs/readthedocs.org/pull/11152/
- VIRTUAL_ENV=$READTHEDOCS_VIRTUALENV_PATH poetry install --only main,docs

sphinx:
configuration: docs/conf.py

python:
version: 3.7
install:
- method: pip
path: .
extra_requirements:
- docs

fail_on_warning: true
10 changes: 5 additions & 5 deletions analysis/README.md
@@ -1,4 +1,4 @@
![Snakemake](https://img.shields.io/badge/snakemake-≥6.7.0-brightgreen.svg?style=flat-square)](https://snakemake.bitbucket.io)
![Snakemake](https://img.shields.io/badge/snakemake-≥8.12.0-brightgreen.svg?style=flat-square)](https://snakemake.bitbucket.io)

# download
Execute the following command.
@@ -8,9 +8,9 @@ git clone https://github.com/aryarm/happler
You can also download example data for the pipeline. See [the config file](config/config.yml) for links and instructions.

# setup
The pipeline is written as a Snakefile which can be executed via [Snakemake](https://snakemake.readthedocs.io). For reproduciblity, we recommend installing the version that we used (6.7.0):
The pipeline is written as a Snakefile which can be executed via [Snakemake](https://snakemake.readthedocs.io). For reproduciblity, we recommend installing the version that we used (8.12.0):
```
conda create -n snakemake -c conda-forge --no-channel-priority 'bioconda::snakemake==6.7.0'
conda create -n snakemake -c conda-forge --no-channel-priority 'bioconda::snakemake==8.12.0'
```
`snakemake` will [automatically install all dependencies](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#integrated-package-management) of the pipeline upon its first execution using `conda`.

@@ -25,9 +25,9 @@ conda create -n snakemake -c conda-forge --no-channel-priority 'bioconda::snakem
```
./run.bash &
```
__or__ on a TORQUE cluster:
__or__ on a SLURM cluster:
```
qsub run.bash
sbatch run.bash
```
### Output
All output of the pipeline will be placed in a new directory (`out/`, by default).
87 changes: 87 additions & 0 deletions analysis/config/config-geuvadis.yml
@@ -0,0 +1,87 @@
# This is the Snakemake configuration file that specifies paths and
# options for the pipeline. Anybody wishing to use
# the provided snakemake pipeline should first fill out this file with paths to
# their own data, as the Snakefile requires it.
# Every config option has reasonable defaults unless it is labeled as "required."
# All paths are relative to the directory that Snakemake is executed in.
# Note: this file is written in the YAML syntax (https://learnxinyminutes.com/docs/yaml/)


# Paths to a SNP-STR haplotype reference panel
# You can download this from http://gymreklab.com/2018/03/05/snpstr_imputation.html
# If the VCFs are per-chromosome, replace the contig name in the file name with "{chr}"
# The VCF(s) must be sorted and indexed (with a .tbi file in the same directory)
# required!
# ref_panel: "/projects/ps-gymreklab/resources/datasets/snpstr/1kg.snp.str.chr{chr}.vcf.gz"
# snp_panel: "/projects/ps-gymreklab/resources/datasets/ukbiobank/array_imputed/pfile_converted/chr{chr}.pgen"
snp_panel: "data/geuvadis/geuvadis_ensemble_phasing.pgen"
# str_panel: "/tscc/projects/ps-gymreklab/jmargoli/ukbiobank/str_imputed/runs/first_pass/vcfs/annotated_strs/chr{chr}.vcf.gz"

# Path to a list of samples to exclude from the analysis
# There should be one sample ID per line
# exclude_samples: data/ukb_random_samples_exclude.tsv

# If SNPs are unphased, provide the path to a SHAPEIT4 map file like these:
# https://github.com/odelaneau/shapeit4/tree/master/maps
# The map file should use the same reference genome as the reference panel VCFs
# phase_map: data/genetic_maps/chr{chr}.b37.gmap.gz

# A "locus" is a string with a contig name, a colon, the start position, a dash, and
# the end position or a BED file with a ".bed" file ending
# There are different simulation modes that you can use:
# 1. "str" - a tandem repeat is a string with a contig name, a colon, and the start position
# 2. "snp" - a SNP follows the same format as "str"
# 3. "hap" - a haplotype
# 4. "ld_range" - creates random two-SNP haplotypes with a range of LD values between the alleles of each haplotype
# 5. "run" - execute happler on a locus without simulating anything
# The STR and SNP positions should be contained within the locus.
# The positions should be provided in the same coordinate system as the reference
# genome of the reference panel VCFs
# The contig should correspond with the contig name from the {chr} wildcard in the VCF
# required! and unless otherwise noted, all attributes of each mode are required
# locus: 19:45401409-46401409 # center: 45901409 (APOe4)
locus: data/geuvadis/geuvadis_eqtl_genes.full.liftover.bed
modes:
str:
pos: 19:45903857 # STR_691361
snp:
pos: 19:45910672 # rs1046282
hap:
alleles: [rs36046716:G, rs1046282:G] # 45892145, 45910672
beta: [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45]
ld_range:
reps: 1
min_ld: 0
max_ld: 1
step: 0.1
min_af: 0.25
max_af: 0.75
# beta: [0.35]
beta: [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45]
alpha: [0.05]
random: false # whether to also produce random haplotypes
run:
pheno: data/geuvadis/phenos/{trait}.pheno
SVs: data/geuvadis/pangenie_hprc_hg38_all-samples_bi_SVs-missing_removed.pgen
# pheno_matrix: data/geuvadis/EUR_converted_expr_hg38.csv # optional
mode: run

# Covariates to use if they're needed
# Otherwise, they're assumed to be regressed out of the phenotypes
# Note: the finemapping methods won't be able to handle these covariates
# covar: data/geuvadis/5PCs_sex.covar

# Discard rare variants with a MAF below this number
# Defaults to 0 if not specified
min_maf: 0.1

# Sample sizes to use
# sample_size: [500, 1000, 1500, 2000, 2500]
# sample_size: 777

# Whether to exclude the causal variant from the set of genotypes provided to the
# finemapping methods. Set this to true if you're interested in seeing how the
# methods perform when the causal variant is absent from the data.
# Defaults to false if not specified
# You can also provide a list of booleans, if you want to test both values
exclude_causal: [true, false]
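The config comments above describe a "locus" as either a `contig:start-end` string or a path ending in `.bed`. The sketch below shows one way such a value could be interpreted. It is an illustration only and not the pipeline's actual parsing code: the `Region` type and `parse_locus` function are hypothetical names introduced here.

```python
from typing import NamedTuple, Union


class Region(NamedTuple):
    """A single genomic interval (hypothetical helper type)."""

    contig: str
    start: int
    end: int


def parse_locus(locus: str) -> Union[Region, str]:
    """Interpret a locus value from the config.

    Returns the path unchanged if it ends in ".bed"; otherwise parses a
    "contig:start-end" string into a Region.
    """
    if locus.endswith(".bed"):
        return locus
    # Split on the last colon so that contig names containing colons still work
    contig, _, span = locus.rpartition(":")
    start, _, end = span.partition("-")
    if not contig or not start.isdigit() or not end.isdigit():
        raise ValueError(f"expected 'contig:start-end' or a .bed path, got {locus!r}")
    region = Region(contig, int(start), int(end))
    if region.start > region.end:
        raise ValueError(f"start exceeds end in {locus!r}")
    return region


if __name__ == "__main__":
    print(parse_locus("19:45401409-46401409"))
    print(parse_locus("data/geuvadis/geuvadis_eqtl_genes.full.liftover.bed"))
```

Validating the string up front makes a malformed locus fail early with a clear message instead of surfacing later inside a workflow rule.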