Commit

Fix #11 (#12)
Add simple graph alignment notebook to showcase `GraphAlignment` and greedy bootstrapped optimization using `n_repeats`
ulupo committed May 15, 2024
1 parent fb7c79b commit 5844e39
Showing 5 changed files with 356 additions and 20 deletions.
36 changes: 23 additions & 13 deletions README.md
@@ -120,7 +120,7 @@ python -m pip install -e .

## Quickstart

### Input data preprocessing
### Input data preprocessing (MSA pairing)

First, parse your multiple sequence alignments (MSAs) in FASTA format
into a list of tuples `(header, sequence)` using
@@ -205,21 +205,28 @@ msa_B_oh = one_hot_encode_msa(msa_B_for_pairing, device=device)

### Pairing optimization

Finally, we can instantiate an
Finally, we can instantiate a class from `diffpass.train` to find an
optimal pairing between `x` and `y`. Here, `x` and `y` are MSAs, so we
can look for a pairing that optimizes the mutual information between `x`
and `y`. For this, we use
[`InformationPairing`](https://Bitbol-Lab.github.io/DiffPaSS/train.html#informationpairing)
object and optimize the mutual information between the paired MSAs using
the DiffPaSS bootstrapped optimization algorithm. The results are stored
in a
[`DiffPaSSResults`](https://Bitbol-Lab.github.io/DiffPaSS/base.html#diffpassresults)
container. The lists of (hard) losses and permutations found during the
optimization can be accessed as attributes of the container.
and the DiffPaSS bootstrapped optimization algorithm. See the tutorials
below for other examples, including for graph alignment when `x` and `y`
are weighted adjacency matrices.

``` python
from diffpass.train import InformationPairing

information_pairing = InformationPairing(group_sizes=species_sizes).to(device)
bootstrap_results = information_pairing.fit_bootstrap(x, y)
```

The results are stored in a
[`DiffPaSSResults`](https://Bitbol-Lab.github.io/DiffPaSS/base.html#diffpassresults)
container. The lists of (hard) losses and permutations found during the
optimization can be accessed as attributes of the container:

``` python
print(f"Final hard loss: {bootstrap_results.hard_losses[-1].item()}")
print(f"Final hard permutations (one permutation per species): {bootstrap_results.hard_perms[-1][-1].item()}")
```
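
To make the objective concrete, here is a minimal, library-independent sketch of mutual information between two paired columns of symbols. It is not the DiffPaSS implementation: the function `mutual_information` and the toy columns are hypothetical names introduced only for illustration, and the estimate is the plain plug-in formula over observed frequencies.

```python
from collections import Counter
from math import log


def mutual_information(col_x, col_y):
    """Plug-in estimate of mutual information (in nats) between two
    equal-length sequences of symbols, paired position by position.
    Hypothetical helper for illustration; not part of DiffPaSS."""
    n = len(col_x)
    joint = Counter(zip(col_x, col_y))  # counts of symbol pairs
    px = Counter(col_x)                 # marginal counts for x
    py = Counter(col_y)                 # marginal counts for y
    # sum over observed pairs of p(a, b) * log(p(a, b) / (p(a) * p(b)))
    return sum(
        (c / n) * log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


# Toy data: one symbol per sequence, the same position in each MSA.
col_x = "AACC"
well_paired = "GGTT"    # every A sits with G, every C with T
badly_paired = "GTGT"   # pairing carries no information about col_x

mi_good = mutual_information(col_x, well_paired)
mi_bad = mutual_information(col_x, badly_paired)
print(mi_good, mi_bad)
```

In this toy case the consistent pairing scores higher than the uninformative one, which is the signal a pairing search tries to increase.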
@@ -229,11 +236,14 @@ the tutorials.

## Tutorials

See the
[`mutual_information_msa_pairing.ipynb`](https://github.com/Bitbol-Lab/DiffPaSS/blob/main/nbs/tutorials/mutual_information_msa_pairing.ipynb)
notebook for an example of paired MSA optimization in the case of
well-known prokaryotic datasets, for which ground truth pairings are
given by genome proximity.
- [`mutual_information_msa_pairing.ipynb`](https://github.com/Bitbol-Lab/DiffPaSS/blob/main/nbs/tutorials/mutual_information_msa_pairing.ipynb):
paired MSA optimization using mutual information in the case of
well-known prokaryotic datasets, for which ground truth pairings are
given by genome proximity.
- [`graph_alignment.ipynb`](https://github.com/Bitbol-Lab/DiffPaSS/blob/main/nbs/tutorials/graph_alignment.ipynb):
general graph alignment using
[`diffpass.train.GraphAlignment`](https://Bitbol-Lab.github.io/DiffPaSS/train.html#graphalignment),
with an example of aligning two weighted adjacency matrices.
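
For intuition about what aligning two weighted adjacency matrices means, the sketch below searches exhaustively for the node relabelling that makes one matrix match the other. This is a brute-force illustration that is only feasible for very small graphs, not the method used by `GraphAlignment`; the helpers `alignment_cost` and `align_brute_force` and the toy matrices are assumptions made for this example.

```python
from itertools import permutations


def alignment_cost(A, B, perm):
    """Sum of squared differences between A and B after reordering the
    rows and columns of B by perm. Hypothetical helper for illustration."""
    n = len(A)
    return sum(
        (A[i][j] - B[perm[i]][perm[j]]) ** 2
        for i in range(n)
        for j in range(n)
    )


def align_brute_force(A, B):
    """Try every permutation of the nodes and keep the cheapest one.
    Cost grows as n!, so this is only usable for tiny graphs."""
    n = len(A)
    return min(permutations(range(n)), key=lambda p: alignment_cost(A, B, p))


# Toy weighted graph on 4 nodes (symmetric, distinct edge weights).
A = [
    [0, 1, 2, 0],
    [1, 0, 0, 3],
    [2, 0, 0, 4],
    [0, 3, 4, 0],
]

# B is the same graph with its nodes relabelled.
relabel = [2, 0, 3, 1]
B = [[A[relabel[i]][relabel[j]] for j in range(4)] for i in range(4)]

perm = align_brute_force(A, B)
cost = alignment_cost(A, B, perm)
print(perm, cost)
```

Because `B` is an exact relabelling of `A`, the best permutation recovers `A` with zero cost; with noisy or larger graphs an approximate optimizer is needed instead of exhaustive search.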

## Documentation

10 changes: 7 additions & 3 deletions nbs/index.ipynb
@@ -91,7 +91,7 @@
"source": [
"## Quickstart\n",
"\n",
"### Input data preprocessing\n",
"### Input data preprocessing (MSA pairing)\n",
"\n",
"First, parse your multiple sequence alignments (MSAs) in FASTA format into a list of tuples ``(header, sequence)`` using `read_msa`.\n",
"\n",
@@ -164,14 +164,17 @@
"\n",
"### Pairing optimization\n",
"\n",
"Finally, we can instantiate an `InformationPairing` object and optimize the mutual information between the paired MSAs using the DiffPaSS bootstrapped optimization algorithm. The results are stored in a `DiffPaSSResults` container. The lists of (hard) losses and permutations found during the optimization can be accessed as attributes of the container.\n",
"Finally, we can instantiate a class from `diffpass.train` to find an optimal pairing between `x` and `y`. Here, `x` and `y` are MSAs, so we can look for a pairing that optimizes the mutual information between `x` and `y`. For this, we use `InformationPairing` and the DiffPaSS bootstrapped optimization algorithm. See the tutorials below for other examples, including for graph alignment when `x` and `y` are weighted adjacency matrices.\n",
"\n",
"```python\n",
"from diffpass.train import InformationPairing\n",
"\n",
"information_pairing = InformationPairing(group_sizes=species_sizes).to(device)\n",
"bootstrap_results = information_pairing.fit_bootstrap(x, y)\n",
"```\n",
"\n",
"The results are stored in a `DiffPaSSResults` container. The lists of (hard) losses and permutations found during the optimization can be accessed as attributes of the container:\n",
"```python\n",
"print(f\"Final hard loss: {bootstrap_results.hard_losses[-1].item()}\")\n",
"print(f\"Final hard permutations (one permutation per species): {bootstrap_results.hard_perms[-1][-1].item()}\")\n",
"```\n",
@@ -185,7 +188,8 @@
"source": [
"## Tutorials\n",
"\n",
"See the [`mutual_information_msa_pairing.ipynb`](https://github.com/Bitbol-Lab/DiffPaSS/blob/main/nbs/tutorials/mutual_information_msa_pairing.ipynb) notebook for an example of paired MSA optimization in the case of well-known prokaryotic datasets, for which ground truth pairings are given by genome proximity."
"- [`mutual_information_msa_pairing.ipynb`](https://github.com/Bitbol-Lab/DiffPaSS/blob/main/nbs/tutorials/mutual_information_msa_pairing.ipynb): paired MSA optimization using mutual information in the case of well-known prokaryotic datasets, for which ground truth pairings are given by genome proximity.\n",
"- [`graph_alignment.ipynb`](https://github.com/Bitbol-Lab/DiffPaSS/blob/main/nbs/tutorials/graph_alignment.ipynb): general graph alignment using `diffpass.train.GraphAlignment`, with an example of aligning two weighted adjacency matrices."
]
},
{
1 change: 1 addition & 0 deletions nbs/sidebar.yml
@@ -14,4 +14,5 @@ website:
- train.ipynb
- section: tutorials
contents:
- tutorials/graph_alignment.ipynb
- tutorials/mutual_information_msa_pairing.ipynb
323 changes: 323 additions & 0 deletions nbs/tutorials/graph_alignment.ipynb

Large diffs are not rendered by default.

6 changes: 2 additions & 4 deletions nbs/tutorials/mutual_information_msa_pairing.ipynb
@@ -16,7 +16,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# DiffPaSS – Example usage on datasets of interacting protein systems\n",
"# Pairing two protein MSAs by maximising mutual information\n",
"\n",
"> DiffPaSS and DiffPaSS-IPA for pairing two interacting MSAs using mutual information."
]
@@ -180,9 +180,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Optimize pairings by maximising mutual information between chains: ``InformationAlignment``"
]
"source": "## 3. Optimize pairings by maximising mutual information between chains: `InformationAlignment`"
},
{
"cell_type": "code",