[DOC] Add graph alignment tutorial #12

Merged
merged 2 commits into from
May 15, 2024
36 changes: 23 additions & 13 deletions README.md
@@ -120,7 +120,7 @@ python -m pip install -e .
 
 ## Quickstart
 
-### Input data preprocessing
+### Input data preprocessing (MSA pairing)
 
 First, parse your multiple sequence alignments (MSAs) in FASTA format
 into a list of tuples `(header, sequence)` using
@@ -205,21 +205,28 @@ msa_B_oh = one_hot_encode_msa(msa_B_for_pairing, device=device)
 
 ### Pairing optimization
 
-Finally, we can instantiate an
+Finally, we can instantiate a class from `diffpass.train` to find an
+optimal pairing between `x` and `y`. Here, `x` and `y` are MSAs, so we
+can look for a pairing that optimizes the mutual information between `x`
+and `y`. For this, we use
 [`InformationPairing`](https://Bitbol-Lab.github.io/DiffPaSS/train.html#informationpairing)
-object and optimize the mutual information between the paired MSAs using
-the DiffPaSS bootstrapped optimization algorithm. The results are stored
-in a
-[`DiffPaSSResults`](https://Bitbol-Lab.github.io/DiffPaSS/base.html#diffpassresults)
-container. The lists of (hard) losses and permutations found during the
-optimization can be accessed as attributes of the container.
+and the DiffPaSS bootstrapped optimization algorithm. See the tutorials
+below for other examples, including for graph alignment when `x` and `y`
+are weighted adjacency matrices.
 
 ``` python
 from diffpass.train import InformationPairing
 
 information_pairing = InformationPairing(group_sizes=species_sizes).to(device)
 bootstrap_results = information_pairing.fit_bootstrap(x, y)
+```
 
+The results are stored in a
+[`DiffPaSSResults`](https://Bitbol-Lab.github.io/DiffPaSS/base.html#diffpassresults)
+container. The lists of (hard) losses and permutations found during the
+optimization can be accessed as attributes of the container:
+
+``` python
 print(f"Final hard loss: {bootstrap_results.hard_losses[-1].item()}")
 print(f"Final hard permutations (one permutation per species): {bootstrap_results.hard_perms[-1][-1].item()}")
 ```
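
Note on the hunk above: the README text speaks of one permutation per species and of optimizing mutual information between the paired MSAs. The following is a minimal, standard-library-only sketch of that idea on a single pair of toy columns. It does not use DiffPaSS and makes no claim about how `InformationPairing` computes its loss; the helper names `mutual_information` and `apply_group_perms` and the toy data are assumptions introduced purely for illustration.

```python
from collections import Counter
from math import log


def mutual_information(col_x, col_y):
    """Plug-in mutual information (in nats) between two equally long symbol lists."""
    n = len(col_x)
    joint = Counter(zip(col_x, col_y))
    px, py = Counter(col_x), Counter(col_y)
    # Sum over observed symbol pairs of p(a, b) * log(p(a, b) / (p(a) * p(b))).
    return sum(
        (c / n) * log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def apply_group_perms(seq, group_sizes, perms):
    """Reorder `seq` inside consecutive groups (hypothetical helper).

    perms[g][i] is the index, within group g, of the item moved to position i.
    """
    out, start = [], 0
    for size, perm in zip(group_sizes, perms):
        group = seq[start:start + size]
        out.extend(group[j] for j in perm)
        start += size
    return out


# Toy data: two "species" with 2 and 3 sequences, one column per MSA.
species_sizes = [2, 3]
col_x = ["A", "C", "A", "C", "G"]
col_y = ["E", "D", "K", "D", "E"]  # partners shuffled within each species

# One permutation per species, chosen by hand for this toy case.
perms = [[1, 0], [1, 2, 0]]
repaired = apply_group_perms(col_y, species_sizes, perms)

mi_before = mutual_information(col_x, col_y)
mi_after = mutual_information(col_x, repaired)
print(f"MI before: {mi_before:.4f}, MI after: {mi_after:.4f}")
```

In this toy case the hand-picked permutations restore a one-to-one correspondence between the symbols of the two columns, so the mutual information rises to the entropy of `col_x`. Sequences are only ever reordered within their own species group, which is the constraint that `group_sizes` expresses in the README snippet.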
@@ -229,11 +236,14 @@ the tutorials.
 
 ## Tutorials
 
-See the
-[`mutual_information_msa_pairing.ipynb`](https://github.com/Bitbol-Lab/DiffPaSS/blob/main/nbs/tutorials/mutual_information_msa_pairing.ipynb)
-notebook for an example of paired MSA optimization in the case of
-well-known prokaryotic datasets, for which ground truth pairings are
-given by genome proximity.
+- [`mutual_information_msa_pairing.ipynb`](https://github.com/Bitbol-Lab/DiffPaSS/blob/main/nbs/tutorials/mutual_information_msa_pairing.ipynb):
+paired MSA optimization using mutual information in the case of
+well-known prokaryotic datasets, for which ground truth pairings are
+given by genome proximity.
+- [`graph_alignment.ipynb`](https://github.com/Bitbol-Lab/DiffPaSS/blob/main/nbs/tutorials/graph_alignment.ipynb):
+general graph alignment using
+[`diffpass.train.GraphAlignment`](https://Bitbol-Lab.github.io/DiffPaSS/train.html#graphalignment),
+with an example of aligning two weighted adjacency matrices.
 
 ## Documentation
 
10 changes: 7 additions & 3 deletions nbs/index.ipynb
@@ -91,7 +91,7 @@
 "source": [
 "## Quickstart\n",
 "\n",
-"### Input data preprocessing\n",
+"### Input data preprocessing (MSA pairing)\n",
 "\n",
 "First, parse your multiple sequence alignments (MSAs) in FASTA format into a list of tuples ``(header, sequence)`` using `read_msa`.\n",
 "\n",
@@ -164,14 +164,17 @@
 "\n",
 "### Pairing optimization\n",
 "\n",
-"Finally, we can instantiate an `InformationPairing` object and optimize the mutual information between the paired MSAs using the DiffPaSS bootstrapped optimization algorithm. The results are stored in a `DiffPaSSResults` container. The lists of (hard) losses and permutations found during the optimization can be accessed as attributes of the container.\n",
+"Finally, we can instantiate a class from `diffpass.train` to find an optimal pairing between `x` and `y`. Here, `x` and `y` are MSAs, so we can look for a pairing that optimizes the mutual information between `x` and `y`. For this, we use `InformationPairing` and the DiffPaSS bootstrapped optimization algorithm. See the tutorials below for other examples, including for graph alignment when `x` and `y` are weighted adjacency matrices.\n",
 "\n",
 "```python\n",
 "from diffpass.train import InformationPairing\n",
 "\n",
 "information_pairing = InformationPairing(group_sizes=species_sizes).to(device)\n",
 "bootstrap_results = information_pairing.fit_bootstrap(x, y)\n",
+"```\n",
 "\n",
+"The results are stored in a `DiffPaSSResults` container. The lists of (hard) losses and permutations found during the optimization can be accessed as attributes of the container:\n",
+"```python\n",
 "print(f\"Final hard loss: {bootstrap_results.hard_losses[-1].item()}\")\n",
 "print(f\"Final hard permutations (one permutation per species): {bootstrap_results.hard_perms[-1][-1].item()}\")\n",
 "```\n",
@@ -185,7 +188,8 @@
 "source": [
 "## Tutorials\n",
 "\n",
-"See the [`mutual_information_msa_pairing.ipynb`](https://github.com/Bitbol-Lab/DiffPaSS/blob/main/nbs/tutorials/mutual_information_msa_pairing.ipynb) notebook for an example of paired MSA optimization in the case of well-known prokaryotic datasets, for which ground truth pairings are given by genome proximity."
+"- [`mutual_information_msa_pairing.ipynb`](https://github.com/Bitbol-Lab/DiffPaSS/blob/main/nbs/tutorials/mutual_information_msa_pairing.ipynb): paired MSA optimization using mutual information in the case of well-known prokaryotic datasets, for which ground truth pairings are given by genome proximity.\n",
+"- [`graph_alignment.ipynb`](https://github.com/Bitbol-Lab/DiffPaSS/blob/main/nbs/tutorials/graph_alignment.ipynb): general graph alignment using `diffpass.train.GraphAlignment`, with an example of aligning two weighted adjacency matrices."
 ]
 },
 {
1 change: 1 addition & 0 deletions nbs/sidebar.yml
@@ -14,4 +14,5 @@ website:
 - train.ipynb
 - section: tutorials
 contents:
+- tutorials/graph_alignment.ipynb
 - tutorials/mutual_information_msa_pairing.ipynb
323 changes: 323 additions & 0 deletions nbs/tutorials/graph_alignment.ipynb

Large diffs are not rendered by default.
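
Since the notebook diff is not rendered here, the sketch below illustrates what "aligning two weighted adjacency matrices" can mean in the simplest setting: find a node relabeling of one graph that makes its weight matrix match the other's. It is a brute-force toy over all permutations, written with the standard library only. It is not the DiffPaSS algorithm and does not call `diffpass.train.GraphAlignment`; the function names, the squared-difference cost, and the example matrices are assumptions made for illustration.

```python
from itertools import permutations


def permute_matrix(m, perm):
    """Relabel nodes: entry (r, c) of the result is m[perm[r]][perm[c]]."""
    return [[m[i][j] for j in perm] for i in perm]


def alignment_cost(a, b):
    """Sum of squared differences between two equally sized square matrices."""
    n = len(a)
    return sum((a[i][j] - b[i][j]) ** 2 for i in range(n) for j in range(n))


def brute_force_align(a, b):
    """Try every relabeling of `b` and keep the one closest to `a`.

    Exhaustive search is only feasible for very small graphs (n! candidates).
    """
    n = len(a)
    best_perm = min(
        permutations(range(n)),
        key=lambda p: alignment_cost(a, permute_matrix(b, p)),
    )
    return best_perm, alignment_cost(a, permute_matrix(b, best_perm))


# Symmetric weighted adjacency matrix with distinct edge weights.
a = [
    [0, 1, 2, 3],
    [1, 0, 4, 5],
    [2, 4, 0, 6],
    [3, 5, 6, 0],
]
# Second graph: the same graph with its nodes relabeled.
hidden = (2, 0, 3, 1)
b = permute_matrix(a, hidden)

best_perm, best_cost = brute_force_align(a, b)
print(f"Recovered relabeling: {best_perm}, cost: {best_cost}")
```

Exhaustive search scales factorially with the number of nodes, which is why differentiable or otherwise approximate methods are of interest for graphs of realistic size.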

4 changes: 1 addition & 3 deletions nbs/tutorials/mutual_information_msa_pairing.ipynb
@@ -180,9 +180,7 @@
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": [
-"## 3. Optimize pairings by maximising mutual information between chains: ``InformationAlignment``"
-]
+"source": "## 3. Optimize pairings by maximising mutual information between chains: `InformationAlignment`"
 },
 {
 "cell_type": "code",