-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement the workflow pipeline #114
Changes from 126 commits
eed8ab8
4f96b58
f05c5dd
36f107a
70d56bd
d415427
f814bac
e197f67
61c9de2
37a2503
ba2dcc3
94d7bb6
e4f6627
2e0aed0
5ab748e
ddcaa8d
e566131
90d033a
c5a9b44
26e83fb
eff7981
71524fc
ea6b834
f4b736c
7dcce77
a926597
cf142a1
ce1539a
c5724e1
3cc332d
1a97228
4d5425b
058aa99
ef52952
9d2eb06
ab1ab76
10fbdc4
622125a
4a72c2a
90495b5
dd1735e
eaf0b6e
cd2ff92
aa3054e
ef21b2d
7238298
8ad6bb2
e87867f
7d2b936
36f9adc
f9d08bc
daa9870
305ceac
212e955
d281582
f6ce666
e9601fb
a4d379a
1340367
8b5135e
6a38d27
e082035
e7ce4fb
3cbd016
b70e68c
204d22a
c8b7f66
696c36d
24298a4
73b781e
e93d3bf
281a864
602b98b
b8606dd
bbd281d
dd5a604
1c40e97
aecac7e
672a41a
afb22f5
09efd77
b173d93
9ddc811
3dc8c2a
ecd29ee
0b37cf4
9b73218
dd74acf
b8355bb
3132af3
36ca207
02e9520
b58d117
58f52e7
a71d32f
3dbc165
36ed0d0
f045017
7da1bda
81c33b8
b5a27a3
87c473e
5ee44a8
db5232e
1d2d536
e79d31b
1d28042
456288a
aea041a
d859c4e
03aad47
995bba1
5e3d6c4
1a9f155
2ccef0c
7aa6ecb
eca06db
720a76b
5c829d0
50f7432
2b0941d
2d173a8
b2ed582
80b9bb3
d5c9ea5
bd8873b
d061fdc
a778a0e
ab71686
c246e9c
c35ca7a
9f17cbc
cc33374
f840d27
5070fe6
f7c7c21
30bccd3
fa4c4bf
6cf8895
8a31787
be55f59
e6752c9
282e98d
594bb33
77ce671
ed3871d
d6d7b9f
ef4a8d7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -40,6 +40,7 @@ Changes | |
|
||
Enhancements | ||
- Add a base class for workflows (PR #188). | ||
- Add the ABFE workflow (PR #114). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. needs to go in 1.0 |
||
- Add filter function to gmx.extract to make it more robust (PR #183): can filter | ||
incomplete/corrupted lines (#126, #171) with filter=True. | ||
- Add support to util.anyopen() for taking filelike objects (PR #197) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7,10 +7,16 @@ of the results and step-by-step version that allows more flexibility. | |
For developers, the skeleton of the workflow should follow the example in | ||
:class:`alchemlyb.workflows.base.WorkflowBase`. | ||
|
||
For users, **alchemlyb** offered a workflow :class:`alchemlyb.workflows.ABFE` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. offers |
||
similar to | ||
`Alchemical Analysis <https://github.com/MobleyLab/alchemical-analysis>`_ | ||
for doing automatic ABFE analysis. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. absolute binding free energy (ABFE) |
||
|
||
.. currentmodule:: alchemlyb.workflows | ||
|
||
.. autosummary:: | ||
:toctree: workflows | ||
|
||
base | ||
ABFE | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,157 @@ | ||
The ABFE workflow | ||
================== | ||
Though **alchemlyb** is a library offering great flexibility in deriving free | ||
energy estimate, it also provide a easy pipeline that is similar to | ||
`Alchemical Analysis <https://github.com/MobleyLab/alchemical-analysis>`_ and a | ||
step-by-step version that allows more flexibility. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This fits better in the Start by explaining in one sentence what ABFE is and what the workflow does. |
||
|
||
Fully Automatic analysis | ||
------------------------ | ||
*Absolute binding free energy* (ABFE) calculations can be analyzed with | ||
two lines of code in a fully automated manner (similar to | ||
`Alchemical Analysis <https://github.com/MobleyLab/alchemical-analysis>`_). | ||
In this case, any parameters are set when invoking :class:`~alchemlyb.workflows.abfe.ABFE` | ||
and reasonable defaults are chosen for any parameters not set explicitly. The two steps | ||
are to | ||
|
||
1. initialize an instance of the :class:`~alchemlyb.workflows.abfe.ABFE` class | ||
2. invoke the :meth:`~alchemlyb.workflows.ABFE.run` method to execute | ||
complete workflow. | ||
|
||
For a GROMACS ABFE simulation, executing the workflow would look similar | ||
to the following code:: | ||
|
||
>>> from alchemtest.gmx import load_ABFE | ||
>>> from alchemlyb.workflows import ABFE | ||
>>> # Enable the logger | ||
>>> import logging | ||
>>> logging.basicConfig(filename='ABFE.log', level=logging.INFO) | ||
>>> # Obtain the path of the data | ||
>>> import os | ||
>>> dir = os.path.dirname(load_ABFE()['data']['complex'][0]) | ||
>>> print(dir) | ||
'alchemtest/gmx/ABFE/complex' | ||
>>> workflow = ABFE(units='kcal/mol', software='Gromacs', dir=dir, | ||
>>> prefix='dhdl', suffix='xvg', T=298, outdirectory='./') | ||
>>> workflow.run(skiptime=10, uncorr='dhdl', threshold=50, | ||
>>> methods=('mbar', 'bar', 'ti'), overlap='O_MBAR.pdf', | ||
xiki-tempula marked this conversation as resolved.
Show resolved
Hide resolved
|
||
>>> breakdown=True, forwrev=10) | ||
orbeckst marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
|
||
See :mod:`~alchemlyb.workflows.ABFE` for the explanation with regard to the | ||
parameters. The next two sections explains the output of the workflow and a | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. explain There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry I wonder what do you mean here? The explanation is linked to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not the parameters. I am looking for a narrative that puts in words what the code above does and why it was written that way. Talk about enabling a logger to see messages. Talk about organizing the data and what kind of data you're using here. Talk about what kind of analysis you want to perform and justify your choices of parameters. Think of the code above as a figure in a paper. You always have text that explains what is in the figure and point out the key observations that the reader should make. You never just say "See Fig 3." and let the reader come up with their own interpretation of the data, do you? Or in other words: "In scientific papers, images, tables, equations, and code samples do NOT speak for themselves. You need to give them your voice." |
||
set of analysis that allows the user to examine the quality of the estimate. | ||
|
||
File Input | ||
^^^^^^^^^^ | ||
|
||
This command expects the energy files to be structured in two common ways. It | ||
could either be :: | ||
simulation | ||
├── lambda_0 | ||
│ ├── prod.xvg | ||
│ └── ... | ||
├── lambda_1 | ||
│ ├── prod.xvg | ||
│ └── ... | ||
└── ... | ||
|
||
Where :code:`dir='simulation/lambda_*', prefix='prod', suffix='xvg'`. Or :: | ||
|
||
dhdl_files | ||
├── dhdl_0.xvg | ||
├── dhdl_1.xvg | ||
└── ... | ||
|
||
Where :code:`dir='dhdl_files', prefix='dhdl_', suffix='xvg'`. | ||
|
||
Output | ||
^^^^^^ | ||
|
||
The workflow returns the free energy estimate using all of | ||
:class:`~alchemlyb.estimators.TI`, :class:`~alchemlyb.estimators.BAR`, | ||
:class:`~alchemlyb.estimators.MBAR`. For ABFE calculations, the alchemical | ||
transformation is usually done is three stages, the *bonded*, *coul* and *vdw* | ||
which corresponds to the free energy contribution from applying the | ||
restraint to restrain the ligand to the protein, decouple/annihilate the | ||
coulombic interaction between the ligand and the protein and | ||
decouple/annihilate the protein-ligand lennard jones interactions. The result | ||
will be stored in :attr:`~alchemlyb.workflows.ABFE.summary` as | ||
:class:`pandas.Dataframe`. :: | ||
|
||
|
||
MBAR MBAR_Error BAR BAR_Error TI TI_Error | ||
States 0 -- 1 0.065967 0.001293 0.066544 0.001661 0.066663 0.001675 | ||
1 -- 2 0.089774 0.001398 0.089303 0.002101 0.089566 0.002144 | ||
2 -- 3 0.132036 0.001638 0.132687 0.002990 0.133292 0.003055 | ||
... | ||
26 -- 27 1.243745 0.011239 1.245873 0.015711 1.248959 0.015762 | ||
27 -- 28 1.128429 0.012859 1.124554 0.016999 1.121892 0.016962 | ||
28 -- 29 1.010313 0.016442 1.005444 0.017692 1.019747 0.017257 | ||
Stages coul 10.215658 0.033903 10.017838 0.041839 10.017854 0.048744 | ||
vdw 22.547489 0.098699 22.501150 0.060092 22.542936 0.106723 | ||
bonded 2.374144 0.014995 2.341631 0.005507 2.363828 0.021078 | ||
TOTAL 35.137291 0.103580 34.860619 0.087022 34.924618 0.119206 | ||
orbeckst marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
Output Files | ||
^^^^^^^^^^^^ | ||
|
||
For quality assessment, a couple of plots were generated and written to | ||
the folder specified by `outdirectory`. | ||
|
||
The :ref:`overlay matrix for the MBAR estimator <plot_overlap_matrix>` will be | ||
plotted and saved to :file:`O_MBAR.pdf`, which examines the overlap between | ||
different lambda windows. | ||
|
||
The :ref:`dHdl for TI <plot_TI_dhdl>` will be plotted to | ||
:file:`dhdl_TI.pdf`, allows one to examine if the lambda scheduling has | ||
covered the change of the gradient in the lambda space. | ||
|
||
The :ref:`dF states <plot_dF_states>` will be plotted to :file:`dF_state.pdf` in | ||
portrait model and :file:`dF_state_long.pdf` in landscape model, which | ||
allows the user to example the contributions from each lambda window. | ||
|
||
The forward and backward convergence will be plotted to :file:`dF_t.pdf` using | ||
:class:`~alchemlyb.estimators.MBAR` and save in | ||
:attr:`~alchemlyb.workflows.ABFE.convergence`, which allows the user to | ||
examine if the simulation time is enough to achieve a converged result. | ||
|
||
Semi-automatic analysis | ||
----------------------- | ||
The same analysis could also performed in steps allowing access and modification | ||
to the data generated at each stage of the analysis. :: | ||
|
||
>>> from alchemtest.gmx import load_ABFE | ||
>>> from alchemlyb.workflows import ABFE | ||
>>> # Obtain the path of the data | ||
>>> import os | ||
>>> dir = os.path.dirname(load_ABFE()['data']['complex'][0]) | ||
>>> print(dir) | ||
'alchemtest/gmx/ABFE/complex' | ||
>>> # Load the data | ||
>>> workflow = ABFE(software='Gromacs', dir=dir, | ||
>>> prefix='dhdl', suffix='xvg', T=298, outdirectory='./') | ||
>>> # Set the unit. | ||
>>> workflow.update_units('kcal/mol') | ||
>>> # Read the data | ||
>>> workflow.read() | ||
>>> # Decorrelate the data. | ||
>>> workflow.preprocess(skiptime=10, uncorr='dhdl', threshold=50) | ||
>>> # Run the estimator | ||
>>> workflow.estimate(methods=('mbar', 'bar', 'ti')) | ||
>>> # Retrieve the result | ||
>>> summary = workflow.generate_result() | ||
>>> # Plot the overlap matrix | ||
>>> workflow.plot_overlap_matrix(overlap='O_MBAR.pdf') | ||
>>> # Plot the dHdl for TI | ||
>>> workflow.plot_ti_dhdl(dhdl_TI='dhdl_TI.pdf') | ||
>>> # Plot the dF states | ||
>>> workflow.plot_dF_state(dF_state='dF_state.pdf') | ||
>>> # Convergence analysis | ||
>>> workflow.check_convergence(10, dF_t='dF_t.pdf') | ||
|
||
API Reference | ||
------------- | ||
.. autoclass:: alchemlyb.workflows.ABFE | ||
:members: | ||
:inherited-members: |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,11 +2,12 @@ | |
import logging | ||
import numpy as np | ||
|
||
from ..estimators import MBAR, BAR, TI, AutoMBAR | ||
from ..estimators import BAR, TI | ||
from ..estimators import AutoMBAR as MBAR | ||
orbeckst marked this conversation as resolved.
Show resolved
Hide resolved
|
||
from .. import concat | ||
|
||
|
||
def forward_backward_convergence(df_list, estimator='mbar', num=10): | ||
def forward_backward_convergence(df_list, estimator='MBAR', num=10): | ||
'''Forward and backward convergence of the free energy estimate. | ||
|
||
Generate the free energy estimate as a function of time in both directions, | ||
|
@@ -20,7 +21,7 @@ def forward_backward_convergence(df_list, estimator='mbar', num=10): | |
---------- | ||
df_list : list | ||
List of DataFrame of either dHdl or u_nk. | ||
estimator : {'mbar', 'bar', 'ti', 'autombar'} | ||
estimator : {'MBAR', 'BAR', 'TI'} | ||
orbeckst marked this conversation as resolved.
Show resolved
Hide resolved
|
||
Name of the estimators. | ||
xiki-tempula marked this conversation as resolved.
Show resolved
Hide resolved
|
||
num : int | ||
The number of time points. | ||
|
@@ -51,16 +52,13 @@ def forward_backward_convergence(df_list, estimator='mbar', num=10): | |
logger.info('Start convergence analysis.') | ||
logger.info('Check data availability.') | ||
|
||
if estimator.lower() == 'mbar': | ||
orbeckst marked this conversation as resolved.
Show resolved
Hide resolved
|
||
logger.info('Use MBAR estimator for convergence analysis.') | ||
estimator_fit = MBAR().fit | ||
elif estimator.lower() == 'autombar': | ||
if estimator == 'MBAR': | ||
logger.info('Use AutoMBAR estimator for convergence analysis.') | ||
estimator_fit = AutoMBAR().fit | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd prefer the user having a choice — or does it not really matter because users cannot pass arguments to the estimator so they should always use AutoMBAR?? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have made it such that the kwargs are passed to the estimator. |
||
elif estimator.lower() == 'bar': | ||
estimator_fit = MBAR().fit | ||
elif estimator == 'BAR': | ||
logger.info('Use BAR estimator for convergence analysis.') | ||
estimator_fit = BAR().fit | ||
elif estimator.lower() == 'ti': | ||
elif estimator == 'TI': | ||
logger.info('Use TI estimator for convergence analysis.') | ||
estimator_fit = TI().fit | ||
else: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,6 @@ | ||
from .mbar_ import MBAR, AutoMBAR | ||
from .bar_ import BAR | ||
from .ti_ import TI | ||
|
||
FEP_ESTIMATORS = [MBAR.__name__, AutoMBAR.__name__, BAR.__name__] | ||
TI_ESTIMATORS = [TI.__name__] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leave at
false
so that we see all failuresThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is initially true, then in one of my PR, I set it to false so I could see the fails more clearly but I forget to change it back before the PR is merged. So we want to have this as false?