Merge pull request #225 from monarch-initiative/update-tutorial
Update tutorial
ielis authored Aug 22, 2024
2 parents 6cc296e + acb8e55 commit b8d7b7b
Showing 33 changed files with 1,223 additions and 541 deletions.
2 changes: 1 addition & 1 deletion docs/_static/genophenocorr.css
@@ -1,3 +1,3 @@
.wy-nav-content {
max-width: 55% !important;
max-width: 1200px !important;
}
Binary file added docs/img/tutorial/tbx5_protein_diagram.png
16 changes: 8 additions & 8 deletions docs/index.rst
@@ -1,6 +1,6 @@
=============
genophenocorr
=============
=====
GPSEA
=====


The concept of phenotype denotes the observable attributes of an individual, but in
@@ -10,9 +10,9 @@ A key question in biology and human genetics concerns the relationships between
genetics, the focus is generally placed on the study of whether specific disease-causing alleles are associated with specific phenotypic
manifestations of the disease.

`genophenocorr` is a Python package designed to support genotype-phenotype correlation analysis.
The input to `genophenocorr` is a collection of `Global Alliance for Genomics and Health (GA4GH) Phenopackets <https://pubmed.ncbi.nlm.nih.gov/35705716/>`_.
`genophenocorr` ingests data from these phenopackets and performs analysis of the correlation of specific variants,
`GPSEA` (genotypes and phenotypes - study and evaluation of associations) is a Python package designed to support genotype-phenotype correlation analysis.
We pronounce GPSEA as "G"-"P"-"C". The input to `GPSEA` is a collection of `Global Alliance for Genomics and Health (GA4GH) Phenopackets <https://pubmed.ncbi.nlm.nih.gov/35705716/>`_.
`gpsea` ingests data from these phenopackets and performs analysis of the correlation of specific variants,
variant types (e.g., missense vs. premature termination codon), or variant location in protein motifs or other features.
The phenotypic abnormalities are represented by `Human Phenotype Ontology (HPO) <https://hpo.jax.org/app/>`_ terms.
Statistical analysis is performed using a `Fisher Exact Test <https://en.wikipedia.org/wiki/Fisher%27s_exact_test>`_,
@@ -35,11 +35,11 @@ We provide recommended reading for background on the study of genotype-phenotype
Feedback
--------

The best place to leave feedback, ask questions, and report bugs is the `genophenocorr Issue Tracker <https://github.com/monarch-initiative/genophenocorr/issues>`_.
The best place to leave feedback, ask questions, and report bugs is the `GPSEA Issue Tracker <https://github.com/monarch-initiative/genophenocorr/issues>`_.

.. toctree::
:caption: Installation & Tutorial
:name: tutorial
:name: index-toc
:maxdepth: 1
:hidden:

334 changes: 334 additions & 0 deletions docs/report/tbx5_cohort_info.html
@@ -0,0 +1,334 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Cohort</title>
<style>
table {
border-collapse: collapse;
margin: 25px 0;
font-size: 0.9em;
font-family: sans-serif;
min-width: 400px;
box-shadow: 0 0 20px rgba(0, 0, 0, 0.15);
}

.table .column-1 {
text-align: left;
}
th {
background-color: LightSkyBlue;
border: 1px solid #dddddd;
text-align: left;
padding: 2px;
font-weight: bold;
font-size: 120%;
}

tr {
border: 1px solid #dddddd;
}

td {
padding: 2px;
font-weight: bold;
}

tr:nth-child(even) {
background-color: #f2f2f2;
}

.table td, tr {
text-align: right;
}

.lft {
text-align: left;
}

div {
display: block;
width: 100%;
padding-right: 1%;
margin-bottom: 20px; /* Add margin to separate divs */
}

caption {
caption-side: top;
text-align: left;
padding-bottom: 10px;
font-weight: bold;
}
</style>
</head>

<body>
<h1>genophenocorr cohort analysis</h1>
<p>Successfully ingested 156 individuals.</p>

<p>No errors encountered.</p>


<table>
<caption style="color: black;">
<h3>Top 10 HPO Terms</h3>
A total of 116 HPO terms were used to annotate the cohort.
</caption>
<tbody>
<tr class="strng">
<th class="lft">HPO Term</th>
<th>ID</th>
<th>Seen in <em>n</em> individuals</th>
</tr>

<tr>
<td class="lft">Atrial septal defect</td>
<td>HP:0001631</td>
<td>50</td>
</tr>

<tr>
<td class="lft">Ventricular septal defect</td>
<td>HP:0001629</td>
<td>41</td>
</tr>

<tr>
<td class="lft">Hypoplasia of the radius</td>
<td>HP:0002984</td>
<td>40</td>
</tr>

<tr>
<td class="lft">Triphalangeal thumb</td>
<td>HP:0001199</td>
<td>36</td>
</tr>

<tr>
<td class="lft">Absent thumb</td>
<td>HP:0009777</td>
<td>32</td>
</tr>

<tr>
<td class="lft">Short thumb</td>
<td>HP:0009778</td>
<td>32</td>
</tr>

<tr>
<td class="lft">Abnormal carpal morphology</td>
<td>HP:0001191</td>
<td>30</td>
</tr>

<tr>
<td class="lft">Secundum atrial septal defect</td>
<td>HP:0001684</td>
<td>27</td>
</tr>

<tr>
<td class="lft">Absent radius</td>
<td>HP:0003974</td>
<td>15</td>
</tr>

<tr>
<td class="lft">Cardiac conduction abnormality</td>
<td>HP:0031546</td>
<td>14</td>
</tr>

</tbody>
</table>

<table>
<caption style="color: black;">
<h3>Top 10 Variants</h3>
Variants are shown according to NM_181486.4. A total of 156 unique variants were identified in the cohort.
</caption>
<tbody>
<tr class="strng">
<th>Count</th>
<th class="lft">Variant key</th>
<th>Variant Name</th>
<th>Protein Variant</th>
<th>Variant Class</th>
</tr>

<tr>
<td>22</td>
<td class="lft">12_114385521_114385521_C_T</td>
<td>c.710G>A</td>
<td>p.Arg237Gln</td>
<td>MISSENSE_VARIANT</td>
</tr>

<tr>
<td>20</td>
<td class="lft">12_114401830_114401830_C_T</td>
<td>c.238G>A</td>
<td>p.Gly80Arg</td>
<td>MISSENSE_VARIANT</td>
</tr>

<tr>
<td>8</td>
<td class="lft">12_114385563_114385563_G_A</td>
<td>c.668C>T</td>
<td>p.Thr223Met</td>
<td>MISSENSE_VARIANT</td>
</tr>

<tr>
<td>6</td>
<td class="lft">12_114398675_114398675_G_T</td>
<td>c.408C>A</td>
<td>p.Tyr136Ter</td>
<td>STOP_GAINED</td>
</tr>

<tr>
<td>5</td>
<td class="lft">12_114398682_114398682_C_CG</td>
<td>c.400dup</td>
<td>p.Arg134ProfsTer49</td>
<td>FRAMESHIFT_VARIANT</td>
</tr>

<tr>
<td>5</td>
<td class="lft">12_114403792_114403792_C_CG</td>
<td>c.106_107insC</td>
<td>p.Ser36ThrfsTer25</td>
<td>FRAMESHIFT_VARIANT</td>
</tr>

<tr>
<td>5</td>
<td class="lft">12_114399514_114399514_A_C</td>
<td>c.361T>G</td>
<td>p.Trp121Gly</td>
<td>MISSENSE_VARIANT, SPLICE_REGION_VARIANT</td>
</tr>

<tr>
<td>4</td>
<td class="lft">12_114385474_114385474_A_G</td>
<td>c.755+2T>C</td>
<td>None</td>
<td>SPLICE_DONOR_VARIANT</td>
</tr>

<tr>
<td>4</td>
<td class="lft">12_114398656_114398656_C_CG</td>
<td>c.426dup</td>
<td>p.Ala143ArgfsTer40</td>
<td>FRAMESHIFT_VARIANT</td>
</tr>

<tr>
<td>4</td>
<td class="lft">12_114366360_114366360_C_T</td>
<td>c.787G>A</td>
<td>p.Val263Met</td>
<td>MISSENSE_VARIANT</td>
</tr>

</tbody>
</table>
<table>
<caption style="color: black;">
<h3>Diseases</h3>
</caption>
<tbody>
<tr class="strng">
<th class="lft">Disease Name</th>
<th >Disease ID</th>
<th>Annotation Count</th>
</tr>

<tr>
<td class="lft">Holt-Oram syndrome</td>
<td>OMIM:142900</td>
<td>156</td>
</tr>

</tbody>
</table>

<table>
<caption style="color: black;">
<h3>Variant categories for NM_181486.4</h3>
</caption>
<tbody>
<tr class="strng">
<th class="lft">Variant effect</th>
<th>Annotation Count</th>
</tr>

<tr>
<td class="lft">FRAMESHIFT_VARIANT</td>
<td>38</td>
</tr>

<tr>
<td class="lft">MISSENSE_VARIANT</td>
<td>85</td>
</tr>

<tr>
<td class="lft">STOP_GAINED</td>
<td>19</td>
</tr>

<tr>
<td class="lft">SPLICE_REGION_VARIANT</td>
<td>10</td>
</tr>

<tr>
<td class="lft">SPLICE_ACCEPTOR_VARIANT</td>
<td>2</td>
</tr>

<tr>
<td class="lft">SPLICE_DONOR_VARIANT</td>
<td>7</td>
</tr>

<tr>
<td class="lft">SPLICE_DONOR_5TH_BASE_VARIANT</td>
<td>2</td>
</tr>

<tr>
<td class="lft">INTRON_VARIANT</td>
<td>2</td>
</tr>

<tr>
<td class="lft">INFRAME_INSERTION</td>
<td>2</td>
</tr>

<tr>
<td class="lft">STOP_RETAINED_VARIANT</td>
<td>2</td>
</tr>

<tr>
<td class="lft">INFRAME_DELETION</td>
<td>1</td>
</tr>

</tbody>
</table>



</body>
</html>
18 changes: 18 additions & 0 deletions docs/report/tbx5_frameshift_vs_missense.csv
@@ -0,0 +1,18 @@
"Genotype group: Missense, Frameshift",Missense,Missense,Frameshift,Frameshift,,
,Count,Percent,Count,Percent,p value,Corrected p value
Ventricular septal defect [HP:0001629],31/60,52%,19/19,100%,5.6190936213143254e-05,0.0008990549794102921
Abnormal atrioventricular conduction [HP:0005150],0/22,0%,3/3,100%,0.00043478260869565214,0.003478260869565217
Atrioventricular block [HP:0001678],0/22,0%,2/2,100%,0.0036231884057971015,0.014492753623188406
Heart block [HP:0012722],0/22,0%,2/2,100%,0.0036231884057971015,0.014492753623188406
Absent thumb [HP:0009777],12/71,17%,14/31,45%,0.005628510156750059,0.018011232501600187
Patent ductus arteriosus [HP:0001643],3/37,8%,2/2,100%,0.01349527665317139,0.03598740440845704
Triphalangeal thumb [HP:0001199],13/72,18%,13/32,41%,0.02560162393963452,0.058517997576307476
Cardiac conduction abnormality [HP:0031546],14/36,39%,3/3,100%,0.07440639019586388,0.14881278039172777
Secundum atrial septal defect [HP:0001684],14/35,40%,4/22,18%,0.1424522583078588,0.2532484592139712
Muscular ventricular septal defect [HP:0011623],6/59,10%,6/25,24%,0.1687456462342971,0.2699930339748754
Pulmonary arterial hypertension [HP:0002092],4/6,67%,0/2,0%,0.42857142857142855,0.6233766233766234
Hypoplasia of the ulna [HP:0003022],1/12,8%,2/10,20%,0.5714285714285713,0.7619047619047618
Hypoplasia of the radius [HP:0002984],30/62,48%,6/14,43%,0.7735491022101784,0.9520604334894502
Absent radius [HP:0003974],7/32,22%,6/25,24%,1.0,1.0
Short humerus [HP:0005792],7/17,41%,4/9,44%,1.0,1.0
Atrial septal defect [HP:0001631],42/44,95%,20/20,100%,1.0,1.0
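
The p values in the CSV above can be checked by hand. The sketch below is not GPSEA code: it is a standard-library reimplementation of a two-sided Fisher exact test, applied to the counts in the table. The function names and the `P_VALUES` list are hypothetical and introduced only for illustration. The "Corrected p value" column is numerically consistent with a Benjamini-Hochberg adjustment over the 16 tested terms; the commit does not name the correction method, so treat that as an assumption.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables (with the same margins) that are
    no more likely than the observed one. Integer weights keep the
    comparison exact.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def weight(x: int) -> int:
        # Unnormalised hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return extreme / total


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Ventricular septal defect: Missense 31/60, Frameshift 19/19.
p_vsd = fisher_exact_two_sided(31, 60 - 31, 19, 19 - 19)

# Abnormal atrioventricular conduction: Missense 0/22, Frameshift 3/3.
p_avc = fisher_exact_two_sided(0, 22, 3, 0)

# "p value" column of the CSV, in row order.
P_VALUES = [
    5.6190936213143254e-05, 0.00043478260869565214, 0.0036231884057971015,
    0.0036231884057971015, 0.005628510156750059, 0.01349527665317139,
    0.02560162393963452, 0.07440639019586388, 0.1424522583078588,
    0.1687456462342971, 0.42857142857142855, 0.5714285714285713,
    0.7735491022101784, 1.0, 1.0, 1.0,
]
corrected = benjamini_hochberg(P_VALUES)

print(p_vsd, p_avc)
print(corrected[:3])
```

Each row becomes a 2x2 table of individuals with and without the HPO term in each genotype group, so the denominators in the Count columns differ from row to row.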