Update documentation
gaow committed Aug 11, 2023
1 parent ef5f867 commit 564c84a
Showing 3 changed files with 59 additions and 29 deletions.
71 changes: 43 additions & 28 deletions _sources/code/data_preprocessing/genotype/VCF_QC.ipynb
@@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "markdown",
"id": "scientific-inspiration",
"id": "caroline-lawyer",
"metadata": {
"kernel": "SoS",
"tags": []
@@ -13,7 +13,7 @@
},
{
"cell_type": "markdown",
"id": "northern-primary",
"id": "changing-essex",
"metadata": {
"kernel": "SoS"
},
@@ -23,7 +23,7 @@
},
{
"cell_type": "markdown",
"id": "competitive-enemy",
"id": "interstate-invalid",
"metadata": {
"kernel": "SoS"
},
@@ -46,7 +46,7 @@
},
{
"cell_type": "markdown",
"id": "dress-influence",
"id": "announced-attempt",
"metadata": {
"kernel": "SoS"
},
@@ -69,7 +69,7 @@
},
{
"cell_type": "markdown",
"id": "banner-portrait",
"id": "closing-cleaners",
"metadata": {
"kernel": "SoS",
"tags": []
@@ -94,7 +94,7 @@
},
{
"cell_type": "markdown",
"id": "patient-catering",
"id": "facial-pregnancy",
"metadata": {
"kernel": "SoS"
},
@@ -112,7 +112,7 @@
},
{
"cell_type": "markdown",
"id": "inclusive-reconstruction",
"id": "behavioral-sequence",
"metadata": {
"kernel": "SoS"
},
@@ -134,7 +134,7 @@
},
{
"cell_type": "markdown",
"id": "wicked-teens",
"id": "ruled-youth",
"metadata": {
"kernel": "SoS"
},
@@ -177,7 +177,7 @@
},
{
"cell_type": "markdown",
"id": "ranging-kansas",
"id": "psychological-understanding",
"metadata": {
"kernel": "SoS"
},
@@ -192,7 +192,7 @@
{
"cell_type": "code",
"execution_count": 13,
"id": "commercial-wagon",
"id": "extended-bachelor",
"metadata": {
"kernel": "SoS"
},
@@ -211,7 +211,7 @@
},
{
"cell_type": "markdown",
"id": "otherwise-gnome",
"id": "lightweight-swedish",
"metadata": {
"kernel": "SoS"
},
@@ -222,7 +222,7 @@
{
"cell_type": "code",
"execution_count": 4,
"id": "tribal-product",
"id": "royal-prompt",
"metadata": {
"kernel": "SoS"
},
@@ -241,7 +241,7 @@
},
{
"cell_type": "markdown",
"id": "wrapped-indie",
"id": "passing-moral",
"metadata": {
"kernel": "SoS"
},
@@ -252,7 +252,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "thorough-microwave",
"id": "elegant-exclusion",
"metadata": {
"kernel": "SoS"
},
@@ -264,7 +264,7 @@
},
{
"cell_type": "markdown",
"id": "approximate-patient",
"id": "japanese-buddy",
"metadata": {
"kernel": "SoS"
},
@@ -275,7 +275,7 @@
{
"cell_type": "code",
"execution_count": 1,
"id": "coupled-pipeline",
"id": "governmental-welcome",
"metadata": {
"kernel": "Bash",
"tags": []
@@ -362,6 +362,12 @@
" qc_3:\n",
" Workflow Options:\n",
" --[no-]remove-duplicates (default to False)\n",
" --remove-samples . (as path)\n",
" The path to the file that contains the list of samples\n",
" to remove (format FID, IID)\n",
" --keep-samples . (as path)\n",
" The path to the file that contains the list of samples\n",
" to keep (format FID, IID)\n",
" qc_4:\n"
]
}
@@ -372,7 +378,7 @@
},
{
"cell_type": "markdown",
"id": "square-smoke",
"id": "liable-blocking",
"metadata": {
"kernel": "Bash"
},
@@ -383,7 +389,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "thrown-wheat",
"id": "chubby-fishing",
"metadata": {
"kernel": "SoS"
},
@@ -463,7 +469,7 @@
},
{
"cell_type": "markdown",
"id": "paperback-requirement",
"id": "empirical-gasoline",
"metadata": {
"kernel": "SoS",
"tags": []
@@ -479,7 +485,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "brown-national",
"id": "lined-hungarian",
"metadata": {
"kernel": "SoS"
},
@@ -501,7 +507,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "scenic-chart",
"id": "terminal-cleaners",
"metadata": {
"kernel": "SoS"
},
@@ -532,7 +538,7 @@
},
{
"cell_type": "markdown",
"id": "increased-complaint",
"id": "funky-light",
"metadata": {
"kernel": "SoS"
},
@@ -545,7 +551,7 @@
{
"cell_type": "code",
"execution_count": 4,
"id": "neither-liberty",
"id": "general-lewis",
"metadata": {
"kernel": "SoS"
},
@@ -594,7 +600,7 @@
},
{
"cell_type": "markdown",
"id": "needed-mitchell",
"id": "loving-mission",
"metadata": {
"kernel": "SoS"
},
@@ -605,7 +611,7 @@
{
"cell_type": "code",
"execution_count": 3,
"id": "southeast-garlic",
"id": "dirty-census",
"metadata": {
"kernel": "SoS"
},
@@ -671,7 +677,7 @@
},
{
"cell_type": "markdown",
"id": "driven-mills",
"id": "understood-motivation",
"metadata": {
"kernel": "SoS"
},
@@ -686,7 +692,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "quick-windsor",
"id": "monthly-soccer",
"metadata": {
"kernel": "SoS"
},
@@ -695,6 +701,13 @@
"[qc_3 (export to PLINK)]\n",
"parameter: walltime = '24h'\n",
"parameter: remove_duplicates = False\n",
"# The path to the file that contains the list of samples to remove (format FID, IID)\n",
"parameter: remove_samples = path('.')\n",
"# The path to the file that contains the list of samples to keep (format FID, IID)\n",
"parameter: keep_samples = path('.')\n",
"fail_if(not (keep_samples.is_file() or keep_samples == path('.')), msg = f'Cannot find ``{keep_samples}``')\n",
"fail_if(not (remove_samples.is_file() or remove_samples == path('.')), msg = f'Cannot find ``{remove_samples}``')\n",
"\n",
"output: f'{_input:nn}.bed'\n",
"task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'\n",
"bash: container = container, expand= \"${ }\", stderr = f'{_output:n}.stderr', stdout = f'{_output:n}.stdout', entrypoint=entrypoint\n",
@@ -703,6 +716,8 @@
" --vcf-require-gt \\\n",
" --allow-extra-chr \\\n",
" --geno-counts \\\n",
" ${('--keep %s' % keep_samples) if keep_samples.is_file() else \"\"} \\\n",
" ${('--remove %s' % remove_samples) if remove_samples.is_file() else \"\"} \\\n",
" --make-bed --out ${_output:n} ${\"--rm-dup exclude-all\" if remove_duplicates else \"\" }\n",
"bash: expand= \"${ }\", stderr = f'{_output:n}.stderr', stdout = f'{_output:n}.stdout', container = container, entrypoint=entrypoint\n",
" stdout=${_output:n}.stdout\n",
@@ -715,7 +730,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "still-basketball",
"id": "convertible-massage",
"metadata": {
"kernel": "SoS"
},
15 changes: 15 additions & 0 deletions code/data_preprocessing/genotype/VCF_QC.html
@@ -913,6 +913,12 @@ <h2>Command Interface<a class="headerlink" href="#command-interface" title="Perm
qc_3:
Workflow Options:
--[no-]remove-duplicates (default to False)
--remove-samples . (as path)
The path to the file that contains the list of samples
to remove (format FID, IID)
--keep-samples . (as path)
The path to the file that contains the list of samples
to keep (format FID, IID)
qc_4:
</pre></div>
</div>
@@ -1166,6 +1172,13 @@ <h2>Genotype QC<a class="headerlink" href="#genotype-qc" title="Permalink to thi
<div class="highlight-sos notranslate"><div class="highlight"><pre><span></span><span class="p">[</span><span class="n">qc_3</span> <span class="p">(</span><span class="n">export</span> <span class="n">to</span> <span class="n">PLINK</span><span class="p">)]</span>
<span class="kn">parameter:</span> <span class="kp">walltime</span> <span class="o">=</span> <span class="s1">&#39;24h&#39;</span>
<span class="kn">parameter:</span> <span class="n">remove_duplicates</span> <span class="o">=</span> <span class="kc">False</span>
<span class="c1"># The path to the file that contains the list of samples to remove (format FID, IID)</span>
<span class="kn">parameter:</span> <span class="n">remove_samples</span> <span class="o">=</span> <span class="n">path</span><span class="p">(</span><span class="s1">&#39;.&#39;</span><span class="p">)</span>
<span class="c1"># The path to the file that contains the list of samples to keep (format FID, IID)</span>
<span class="kn">parameter:</span> <span class="n">keep_samples</span> <span class="o">=</span> <span class="n">path</span><span class="p">(</span><span class="s1">&#39;.&#39;</span><span class="p">)</span>
<span class="n">fail_if</span><span class="p">(</span><span class="ow">not</span> <span class="p">(</span><span class="n">keep_samples</span><span class="o">.</span><span class="n">is_file</span><span class="p">()</span> <span class="ow">or</span> <span class="n">keep_samples</span> <span class="o">==</span> <span class="n">path</span><span class="p">(</span><span class="s1">&#39;.&#39;</span><span class="p">)),</span> <span class="n">msg</span> <span class="o">=</span> <span class="sa">f</span><span class="s1">&#39;Cannot find ``</span><span class="si">{</span><span class="n">keep_samples</span><span class="si">}</span><span class="s1">``&#39;</span><span class="p">)</span>
<span class="n">fail_if</span><span class="p">(</span><span class="ow">not</span> <span class="p">(</span><span class="n">remove_samples</span><span class="o">.</span><span class="n">is_file</span><span class="p">()</span> <span class="ow">or</span> <span class="n">remove_samples</span> <span class="o">==</span> <span class="n">path</span><span class="p">(</span><span class="s1">&#39;.&#39;</span><span class="p">)),</span> <span class="n">msg</span> <span class="o">=</span> <span class="sa">f</span><span class="s1">&#39;Cannot find ``</span><span class="si">{</span><span class="n">remove_samples</span><span class="si">}</span><span class="s1">``&#39;</span><span class="p">)</span>

<span class="kn">output:</span> <span class="sa">f</span><span class="s1">&#39;</span><span class="si">{</span><span class="n">_input</span><span class="si">:</span><span class="s1">nn</span><span class="si">}</span><span class="s1">.bed&#39;</span>
<span class="kn">task:</span> <span class="kp">trunk_workers</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="kp">trunk_size</span> <span class="o">=</span> <span class="n">job_size</span><span class="p">,</span> <span class="kp">walltime</span> <span class="o">=</span> <span class="kp">walltime</span><span class="p">,</span> <span class="kp">mem</span> <span class="o">=</span> <span class="kp">mem</span><span class="p">,</span> <span class="kp">cores</span> <span class="o">=</span> <span class="n">numThreads</span><span class="p">,</span> <span class="kp">tags</span> <span class="o">=</span> <span class="sa">f</span><span class="s1">&#39;</span><span class="si">{</span><span class="n">step_name</span><span class="si">}</span><span class="s1">_</span><span class="si">{</span><span class="n">_output</span><span class="si">:</span><span class="s1">bn</span><span class="si">}</span><span class="s1">&#39;</span>
<span class="kn">bash:</span> <span class="n">container</span> <span class="o">=</span> <span class="n">container</span><span class="p">,</span> <span class="n">expand</span><span class="o">=</span> <span class="s2">&quot;${ }&quot;</span><span class="p">,</span> <span class="n">stderr</span> <span class="o">=</span> <span class="sa">f</span><span class="s1">&#39;</span><span class="si">{</span><span class="n">_output</span><span class="si">:</span><span class="s1">n</span><span class="si">}</span><span class="s1">.stderr&#39;</span><span class="p">,</span> <span class="n">stdout</span> <span class="o">=</span> <span class="sa">f</span><span class="s1">&#39;</span><span class="si">{</span><span class="n">_output</span><span class="si">:</span><span class="s1">n</span><span class="si">}</span><span class="s1">.stdout&#39;</span><span class="p">,</span> <span class="n">entrypoint</span><span class="o">=</span><span class="n">entrypoint</span>
@@ -1174,6 +1187,8 @@ <h2>Genotype QC<a class="headerlink" href="#genotype-qc" title="Permalink to thi
<span class="o">--</span><span class="n">vcf</span><span class="o">-</span><span class="n">require</span><span class="o">-</span><span class="n">gt</span> \
<span class="o">--</span><span class="n">allow</span><span class="o">-</span><span class="n">extra</span><span class="o">-</span><span class="nb">chr</span> \
<span class="o">--</span><span class="n">geno</span><span class="o">-</span><span class="n">counts</span> \
<span class="err">$</span><span class="p">{(</span><span class="s1">&#39;--keep </span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">keep_samples</span><span class="p">)</span> <span class="k">if</span> <span class="n">keep_samples</span><span class="o">.</span><span class="n">is_file</span><span class="p">()</span> <span class="k">else</span> <span class="s2">&quot;&quot;</span><span class="p">}</span> \
<span class="err">$</span><span class="p">{(</span><span class="s1">&#39;--remove </span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">remove_samples</span><span class="p">)</span> <span class="k">if</span> <span class="n">remove_samples</span><span class="o">.</span><span class="n">is_file</span><span class="p">()</span> <span class="k">else</span> <span class="s2">&quot;&quot;</span><span class="p">}</span> \
<span class="o">--</span><span class="n">make</span><span class="o">-</span><span class="n">bed</span> <span class="o">--</span><span class="n">out</span> <span class="err">$</span><span class="p">{</span><span class="n">_output</span><span class="p">:</span><span class="n">n</span><span class="p">}</span> <span class="err">$</span><span class="p">{</span><span class="s2">&quot;--rm-dup exclude-all&quot;</span> <span class="k">if</span> <span class="n">remove_duplicates</span> <span class="k">else</span> <span class="s2">&quot;&quot;</span> <span class="p">}</span>
<span class="kn">bash:</span> <span class="n">expand</span><span class="o">=</span> <span class="s2">&quot;${ }&quot;</span><span class="p">,</span> <span class="n">stderr</span> <span class="o">=</span> <span class="sa">f</span><span class="s1">&#39;</span><span class="si">{</span><span class="n">_output</span><span class="si">:</span><span class="s1">n</span><span class="si">}</span><span class="s1">.stderr&#39;</span><span class="p">,</span> <span class="n">stdout</span> <span class="o">=</span> <span class="sa">f</span><span class="s1">&#39;</span><span class="si">{</span><span class="n">_output</span><span class="si">:</span><span class="s1">n</span><span class="si">}</span><span class="s1">.stdout&#39;</span><span class="p">,</span> <span class="n">container</span> <span class="o">=</span> <span class="n">container</span><span class="p">,</span> <span class="n">entrypoint</span><span class="o">=</span><span class="n">entrypoint</span>
<span class="n">stdout</span><span class="o">=</span><span class="err">$</span><span class="p">{</span><span class="n">_output</span><span class="p">:</span><span class="n">n</span><span class="p">}</span><span class="o">.</span><span class="n">stdout</span>
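
Note on the hunks above: the new `qc_3` options default to `path('.')`, which acts as a "not provided" placeholder, and the `--keep` / `--remove` flags are only emitted when the value is an existing file. The sketch below restates that logic in plain Python for readers unfamiliar with SoS. It assumes `pathlib.Path` as a stand-in for the SoS `path` type, and the helper names `is_valid_sample_arg` and `sample_filter_flags` are hypothetical, not part of the workflow.

```python
from pathlib import Path


def is_valid_sample_arg(p: Path) -> bool:
    """Mirror the fail_if condition: an existing file or the '.' placeholder is accepted."""
    return p.is_file() or p == Path(".")


def sample_filter_flags(keep_samples: Path = Path("."),
                        remove_samples: Path = Path(".")) -> str:
    """Build the optional sample-filter fragment of the command line.

    Hypothetical helper: pathlib.Path stands in for the SoS path type.
    """
    for p in (keep_samples, remove_samples):
        if not is_valid_sample_arg(p):
            # Same message text as the fail_if calls in the diff
            raise FileNotFoundError(f"Cannot find ``{p}``")
    parts = []
    if keep_samples.is_file():
        parts.append(f"--keep {keep_samples}")
    if remove_samples.is_file():
        parts.append(f"--remove {remove_samples}")
    return " ".join(parts)
```

With both arguments left at their defaults the function returns an empty string, so the command line is unchanged for existing users; a path that is neither `.` nor an existing file is rejected before any command is built.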
Expand Down
2 changes: 1 addition & 1 deletion searchindex.js

Large diffs are not rendered by default.
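
For reference, the file passed to `--keep-samples` or `--remove-samples` is described in this commit only as "format FID, IID". A minimal sketch of producing such a file follows, assuming one sample per line with the two IDs separated by a tab; the separator, the helper name `write_sample_list`, and the sample IDs are assumptions for illustration and are not taken from the commit.

```python
from pathlib import Path


def write_sample_list(samples, out_path):
    """Write (FID, IID) pairs, one per line.

    Assumption: tab-separated columns; the commit only states "format FID, IID".
    """
    out_path = Path(out_path)
    lines = [f"{fid}\t{iid}" for fid, iid in samples]
    out_path.write_text("\n".join(lines) + "\n")
    return out_path


if __name__ == "__main__":
    # Hypothetical sample IDs, for illustration only
    write_sample_list([("FAM1", "S1"), ("FAM2", "S2")], "keep_samples.txt")
```

The resulting path can then be supplied through the `--keep-samples` or `--remove-samples` option listed in the command interface output.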
