Skip to content

Commit

Permalink
Update documentation
Browse files Browse the repository at this point in the history
  • Loading branch information
gaow committed Nov 15, 2023
1 parent f5fdc15 commit 313bc6f
Show file tree
Hide file tree
Showing 261 changed files with 63,191 additions and 45,060 deletions.
2 changes: 1 addition & 1 deletion .buildinfo
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 18c70c2741933e246bb2d3bfd3128ab1
config: c107af7d3dded06e26715a49c95b6214
tags: 645f666f9bcd5a90fca523b33c5a78b7
1,230 changes: 665 additions & 565 deletions README.html

Large diffs are not rendered by default.

File renamed without changes
File renamed without changes
Binary file added _images/PCA_20_1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
File renamed without changes
Binary file added _images/PCA_8_1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added _images/RNA_calling_2_1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
File renamed without changes
File renamed without changes
File renamed without changes

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
:root {
--tabs-color-label-active: hsla(231, 99%, 66%, 1);
--tabs-color-label-inactive: rgba(178, 206, 245, 0.62);
--tabs-color-overline: rgb(207, 236, 238);
--tabs-color-underline: rgb(207, 236, 238);
--tabs-size-label: 1rem;
}
136 changes: 116 additions & 20 deletions _sources/code/association_scan/TensorQTL/TensorQTL.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
"kernel": "SoS"
},
"source": [
"# TensorQTL QTL association testing"
"# TensorQTL QTL Association Testing"
]
},
{
Expand All @@ -15,7 +15,7 @@
"kernel": "SoS"
},
"source": [
"This notebook implements a workflow using [tensorQTL](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1836-7) to perform QTL association testing."
"This notebook implements a workflow using [TensorQTL](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1836-7) to perform QTL association testing."
]
},
{
Expand All @@ -39,9 +39,13 @@
"\n",
"For cis-analysis:\n",
"\n",
"- Optionally, a list of genomic regions associate with each molecular features to analyze. The default cis-analysis will use a window around TSS. This can be customized to take given start and end genomic coordinates.\n",
"\n",
"\n",
"- Optionally, a list of genomic regions associate with each molecular features to analyze. The default cis-analysis will use a window around TSS. This can be customized to take given start and end genomic coordinates."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Output\n",
"\n",
"For each chromosome, several of summary statistics files are generated, including both nominal test statistics for each test, as well as region (gene) level association evidence.\n",
Expand Down Expand Up @@ -84,13 +88,98 @@
"- pval_beta - Second permutation P-value obtained via beta approximation. This is the one to use for downstream analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Minimal Working Example\n",
"\n",
"The data can be found on [Synapse](https://www.synapse.org/#!Synapse:syn36416559/files/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cis TensorQTL Command "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"sos run pipeline/TensorQTL.ipynb cis \\\n",
" --genotype-file output/genotype_by_chrom/protocol_example.genotype.chr21_22.genotype_by_chrom_files.txt \\\n",
" --phenotype-file output/phenotype_by_chrom/protocol_example.protein.bed.phenotype_by_chrom_files.txt \\\n",
" --covariate-file output/covariate/protocol_example.protein.protocol_example.samples.protocol_example.genotype.chr21_22.pQTL.plink_qc.prune.pca.Marchenko_PC.gz \\\n",
" --customized_cis_windows prototype_example/protocol_example/protocol_example.protein.enhanced_cis_chr21_chr22.bed \\\n",
" --cwd output/cis_association/ \\\n",
" --container containers/TensorQTL.sif --MAC 5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Trans TensorQTL Command \n",
"\n",
"Note: Some protein is not in the customized cis windows list. There we will need to remove them from the analysis by create a region_list. Noted that the region list need to be a actual file. So `<()` file is not acceptable. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"zcat output/protocol_example.protein.bed.gz | cut -f 1,2,3,4 | grep -v -e ENSG00000163554 \\\n",
" -e ENSG00000171564 -e ENSG00000171560 -e ENSG00000171557 > output/protocol_example.protein.region_list"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following will take more than 180G of memory to run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"sos run xqtl-pipeline/pipeline/TensorQTL.ipynb trans \\\n",
" --genotype-file output/genotype_by_chrom/protocol_example.genotype.chr21_22.genotype_by_chrom_files.txt \\\n",
" --phenotype-file output/phenotype_by_chrom/protocol_example.protein.bed.phenotype_by_chrom_files.txt \\\n",
" --region-list output/protocol_example.protein.region_list \\\n",
" --covariate-file output/protocol_example.protein.protocol_example.samples.protocol_example.genotype.chr21_22.pQTL.unrelated.plink_qc.prune.pca.Marchenko_PC.gz \\\n",
" --customized-cis-windows output/protocol_example.protein.customized_cis.tsv \\\n",
" --cwd output/association/trans/ \\\n",
" --container containers/TensorQTL.sif --MAC 5 --numThreads 8 -J 1 -q csg --mem 240G -c /mnt/vast/hpc/csg/molecular_phenotype_calling/csg.yml "
]
},
{
"cell_type": "markdown",
"metadata": {
"kernel": "SoS"
},
"source": [
"# Command interface "
"## Command Interface "
]
},
{
Expand Down Expand Up @@ -196,7 +285,7 @@
"kernel": "Bash"
},
"source": [
"## Minimal working example\n",
"## Old Minimal working example\n",
"\n",
"An MWE is uploaded to [google drive](https://drive.google.com/drive/folders/1yjTwoO0DYGi-J9ouMsh9fHKfDmsXJ_4I?usp=sharing).\n",
"The singularity image (sif) for running this MWE is uploaded to [google drive](https://drive.google.com/drive/folders/1mLOS3AVQM8yTaWtCbO8Q3xla98Nr5bZQ)\n",
Expand Down Expand Up @@ -296,13 +385,17 @@
"# The name of phenotype corresponding to gene_id or gene_name in the region\n",
"parameter: region_name = \"gene_id\"\n",
"# The phenotype group file to group molecule_trait into molecule_trait_object\n",
"# This is applicable to when there are multiple molecular events in the same region, such as sQTL analysis.\n",
"# This applies to multiple molecular events in the same region, such as sQTL analysis.\n",
"parameter: phenotype_group = path() \n",
"parameter: region_list_phenotype_column = 4\n",
"\n",
"# Specify the cis window for the up and downstream radius to analyze around the region of interest, in units of bp\n",
"# This parameter will be set to zero if `customized_cis_windows` is provided.\n",
"\n",
"# Specify the cis window for the up and downstream radius to analyze around the region of interest in units of bp\n",
"# This parameter will be zero if `customized_cis_windows` is provided.\n",
"parameter: window = 1000000\n",
"\n",
"\n",
"\n",
"# Number of threads\n",
"parameter: numThreads = 8\n",
"# For cluster jobs, number commands to run per job\n",
Expand All @@ -314,7 +407,7 @@
"import re\n",
"parameter: entrypoint= ('micromamba run -a \"\" -n' + ' ' + re.sub(r'(_apptainer:latest|_docker:latest|\\.sif)$', '', container.split('/')[-1])) if container else \"\"\n",
"\n",
"# Use the header of covariate file to decide the sample size\n",
"# Use the header of the covariate file to decide the sample size\n",
"import pandas as pd\n",
"N = len(pd.read_csv(covariate_file, sep = \"\\t\",nrows = 1).columns) - 1\n",
"\n",
Expand Down Expand Up @@ -358,9 +451,11 @@
"# and you want to avoid computing the nominal results again\n",
"parameter: skip_nominal_if_exist = False\n",
"input: input_files, group_by = len(input_files[0]), group_with = \"input_chroms\" \n",
"output: parquet = f'{cwd:a}/{name}.cis_qtl_pairs.{_input_chroms}.parquet', # This convention is necessary to match the pattern of map_norminal output\n",
" regional = f'{cwd:a}/{name}.{_input_chroms}.cis_qtl_regional.gz',\n",
" nominal = f'{cwd:a}/{name}.{_input_chroms}.cis_qtl_nominal.gz'\n",
"output: parquet = f'{cwd:a}/{_input[0]:bnn}.cis_qtl_pairs.{_input_chroms}.parquet', # This convention is necessary to match the pattern of map_norminal output\n",
" regional = f'{cwd:a}/{_input[0]:bnn}.cis_qtl_regional.gz',\n",
" nominal = f'{cwd:a}/{_input[0]:bnn}.cis_qtl_nominal.gz'\n",
"\n",
"\n",
"task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output[0]:bn}'\n",
"python: expand= \"$[ ]\", stderr = f'{_output[0]}.stderr', stdout = f'{_output[0]}.stdout' , container = container, entrypoint = entrypoint\n",
" import pandas as pd\n",
Expand All @@ -380,7 +475,8 @@
" ## Analyze only the regions listed\n",
" if $[region_list.is_file()]:\n",
" region = pd.read_csv(\"$[region_list:a]\", comment=\"#\", header=None)\n",
" keep_region = region[phenotype_id].to_list()\n",
" phenotype_column = 1 if len(region.columns) == 1 else $[region_list_phenotype_column]\n",
" keep_region = region.iloc[:,phenotype_column-1].to_list()\n",
" phenotype_df = phenotype_df[phenotype_df.index.isin(keep_region)]\n",
" phenotype_pos_df = phenotype_pos_df[phenotype_pos_df.index.isin(keep_region)]\n",
"\n",
Expand Down Expand Up @@ -451,7 +547,7 @@
" phenotype_pos_df,\n",
" covariates_df=covariates_df, \n",
" seed=999, \n",
" window=$[window], \n",
" window=window, \n",
" maf_threshold = $[maf_threshold],\n",
" group_s=group_s)\n",
" \n",
Expand Down Expand Up @@ -531,10 +627,8 @@
"input_chroms = [x[0] for x in input_files]\n",
"input_files = [x[1:] for x in input_files]\n",
"\n",
"#input: phenotype_file, genotype_file\n",
"#output: nominal = f'{cwd:a}/{_input[0]:bnn}.trans_qtl_nominal.gz'\n",
"input: input_files, group_by = len(input_files[0]), group_with = \"input_chroms\" \n",
"output: nominal = f'{cwd:a}/{name}.{_input_chroms}.trans_qtl_nominal.gz'\n",
"output: nominal = f'{cwd:a}/{_input[0]:bnn}.trans_qtl_nominal.gz'\n",
"parameter: batch_size = 50000\n",
"parameter: pval_threshold = 1.0\n",
"task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output[0]:bn}'\n",
Expand All @@ -556,7 +650,8 @@
" ## Analyze only the regions listed\n",
" if $[region_list.is_file()]:\n",
" region = pd.read_csv(\"$[region_list:a]\", comment=\"#\", header=None)\n",
" keep_region = region[phenotype_id].to_list()\n",
" phenotype_column = 1 if len(region.columns) == 1 else $[region_list_phenotype_column]\n",
" keep_region = region.iloc[:,phenotype_column-1].to_list()\n",
" phenotype_df = phenotype_df[phenotype_df.index.isin(keep_region)]\n",
" phenotype_pos_df = phenotype_pos_df[phenotype_pos_df.index.isin(keep_region)]\n",
"\n",
Expand All @@ -570,6 +665,7 @@
" raise ValueError(\"cannot uniquely match all the phentoype data in the input to the customized cis windows provided\")\n",
" window = 0 # In the updated tensorQTL, by default if there is a customized cis window, the actual cis window will be start - window & end + window, so it is necessary to change the window parameter to 0\n",
"\n",
"\n",
" covariates_df = pd.read_csv(covariates_file, sep='\\t', index_col=0).T\n",
" pr = genotypeio.PlinkReader(plink_prefix_path)\n",
" genotype_df = pr.load_genotypes()\n",
Expand Down
Loading

0 comments on commit 313bc6f

Please sign in to comment.