alchemistry · orbeckst · Sep 6, 2022 · Feb 28, 2021 · Mar 7, 2021 · Mar 8, 2021
diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml
@@ -26,7 +26,7 @@ jobs:
   test:
     runs-on: ${{ matrix.os }}
     strategy:
-      fail-fast: false
+      fail-fast: true
       matrix:
         os: ["ubuntu-latest", "macOS-latest", "windows-latest"]
         python-version: ["3.8", "3.9", "3.10"]

diff --git a/CHANGES b/CHANGES
@@ -40,6 +40,7 @@ Changes
 
 Enhancements
   - Add a base class for workflows (PR #188).
+  - Add the ABFE workflow (PR #114).
   - Add filter function to gmx.extract to make it more robust (PR #183): can filter 
     incomplete/corrupted lines (#126, #171) with filter=True.
   - Add support to util.anyopen() for taking filelike objects (PR #197)

diff --git a/docs/api_principles.rst b/docs/api_principles.rst
@@ -65,7 +65,8 @@ The library is structured as follows, following a similar style to
     │   ├── ti_dhdl.py
     │   └── ...
     └── workflows           ### WORK IN PROGRESS
-        └── ...
+        ├── base.py
+        └── abfe.py
 
 
 * The :mod:`~alchemlyb.parsing` submodule contains parsers for individual MD engines, since the output files needed to perform alchemical free energy calculations vary widely and are not standardized.  Each module at the very least provides an `extract_u_nk` function for extracting reduced potentials (needed for MBAR), as well as an `extract_dHdl` function for extracting derivatives required for thermodynamic integration.  Other helper functions may be exposed for additional processing, such as generating an XVG file from an EDR file in the case of GROMACS.  All `extract\_*` functions take similar arguments (a file path, parameters such as temperature), and produce standard outputs (:class:`pandas.DataFrame` for reduced potentials, :class:`pandas.Series` for derivatives).

diff --git a/docs/workflows.rst b/docs/workflows.rst
@@ -7,10 +7,16 @@ of the results and step-by-step version that allows more flexibility.
 For developers, the skeleton of the workflow should follow the example in
 :class:`alchemlyb.workflows.base.WorkflowBase`.
 
+For users, **alchemlyb** offers a workflow :class:`alchemlyb.workflows.ABFE`
+similar to
+`Alchemical Analysis <https://github.com/MobleyLab/alchemical-analysis>`_
+for automatic ABFE analysis.
+
 .. currentmodule:: alchemlyb.workflows
 
 .. autosummary::
     :toctree: workflows
 
     base
+    ABFE
 
diff --git a/docs/workflows/alchemlyb.workflows.ABFE.rst b/docs/workflows/alchemlyb.workflows.ABFE.rst
@@ -0,0 +1,157 @@
+The ABFE workflow
+==================
+Though **alchemlyb** is a library offering great flexibility in deriving free
+energy estimates, it also provides an easy pipeline that is similar to
+`Alchemical Analysis <https://github.com/MobleyLab/alchemical-analysis>`_ and a
+step-by-step version that allows more flexibility.
+
+Fully Automatic analysis
+------------------------
+*Absolute binding free energy* (ABFE) calculations can be analyzed with
+two lines of code in a fully automated manner (similar to 
+`Alchemical Analysis <https://github.com/MobleyLab/alchemical-analysis>`_).
+In this case, any parameters are set when invoking :class:`~alchemlyb.workflows.abfe.ABFE`
+and reasonable defaults are chosen for any parameters not set explicitly. The two steps
+are to
+
+1. initialize an instance of the :class:`~alchemlyb.workflows.abfe.ABFE` class
+2. invoke the :meth:`~alchemlyb.workflows.ABFE.run` method to execute
+   the complete workflow.
+
+For a GROMACS ABFE simulation, executing the workflow would look similar
+to the following code::
+
+    >>> from alchemtest.gmx import load_ABFE
+    >>> from alchemlyb.workflows import ABFE
+    >>> # Enable the logger
+    >>> import logging
+    >>> logging.basicConfig(filename='ABFE.log', level=logging.INFO)
+    >>> # Obtain the path of the data
+    >>> import os
+    >>> dir = os.path.dirname(load_ABFE()['data']['complex'][0])
+    >>> print(dir)
+    alchemtest/gmx/ABFE/complex
+    >>> workflow = ABFE(units='kcal/mol', software='Gromacs', dir=dir,
+    ...                 prefix='dhdl', suffix='xvg', T=298, outdirectory='./')
+    >>> workflow.run(skiptime=10, uncorr='dhdl', threshold=50,
+    ...              methods=('mbar', 'bar', 'ti'), overlap='O_MBAR.pdf',
+    ...              breakdown=True, forwrev=10)
+
+
+See :class:`~alchemlyb.workflows.ABFE` for an explanation of the
+parameters. The following sections explain the expected input, the output of
+the workflow, and a set of analyses for examining the quality of the estimate.
+
+File Input
+^^^^^^^^^^
+
+The workflow expects the energy files in one of two common layouts. Either ::
+
+    simulation
+    ├── lambda_0
+    │   ├── prod.xvg
+    │   └── ...
+    ├── lambda_1
+    │   ├── prod.xvg
+    │   └── ...
+    └── ...
+
+In this case, set :code:`dir='simulation/lambda_*', prefix='prod', suffix='xvg'`. Or ::
+
+    dhdl_files
+    ├── dhdl_0.xvg
+    ├── dhdl_1.xvg
+    └── ...
+
+In this case, set :code:`dir='dhdl_files', prefix='dhdl_', suffix='xvg'`.
+
+Output
+^^^^^^
+
+The workflow returns the free energy estimates from all of
+:class:`~alchemlyb.estimators.TI`, :class:`~alchemlyb.estimators.BAR`, and
+:class:`~alchemlyb.estimators.MBAR`. For ABFE calculations, the alchemical
+transformation is usually done in three stages, *bonded*, *coul* and *vdw*,
+which correspond to the free energy contributions from applying the
+restraint that holds the ligand to the protein, decoupling/annihilating the
+Coulombic interactions between the ligand and the protein, and
+decoupling/annihilating the protein-ligand Lennard-Jones interactions. The result
+is stored in :attr:`~alchemlyb.workflows.ABFE.summary` as a
+:class:`pandas.DataFrame`. ::
+
+
+                          MBAR  MBAR_Error        BAR  BAR_Error         TI  TI_Error
+    States 0 -- 1     0.065967    0.001293   0.066544   0.001661   0.066663  0.001675
+           1 -- 2     0.089774    0.001398   0.089303   0.002101   0.089566  0.002144
+           2 -- 3     0.132036    0.001638   0.132687   0.002990   0.133292  0.003055
+    ...
+           26 -- 27   1.243745    0.011239   1.245873   0.015711   1.248959  0.015762
+           27 -- 28   1.128429    0.012859   1.124554   0.016999   1.121892  0.016962
+           28 -- 29   1.010313    0.016442   1.005444   0.017692   1.019747  0.017257
+    Stages coul      10.215658    0.033903  10.017838   0.041839  10.017854  0.048744
+           vdw       22.547489    0.098699  22.501150   0.060092  22.542936  0.106723
+           bonded     2.374144    0.014995   2.341631   0.005507   2.363828  0.021078
+           TOTAL     35.137291    0.103580  34.860619   0.087022  34.924618  0.119206
+
+Output Files
+^^^^^^^^^^^^
+
+For quality assessment, several plots are generated and written to
+the folder specified by ``outdirectory``.
+
+The :ref:`overlap matrix for the MBAR estimator <plot_overlap_matrix>` will be
+plotted and saved to :file:`O_MBAR.pdf`, which shows the overlap between
+different lambda windows.
+
+The :ref:`dHdl for TI <plot_TI_dhdl>` will be plotted to
+:file:`dhdl_TI.pdf`, which allows one to examine whether the lambda schedule
+covers the change of the gradient in lambda space.
+
+The :ref:`dF states <plot_dF_states>` will be plotted to :file:`dF_state.pdf` in
+portrait mode and :file:`dF_state_long.pdf` in landscape mode, which
+allows the user to examine the contributions from each lambda window.
+
+The forward and backward convergence will be plotted to :file:`dF_t.pdf` using
+:class:`~alchemlyb.estimators.MBAR` and saved in
+:attr:`~alchemlyb.workflows.ABFE.convergence`, which allows the user to
+examine whether the simulation time is long enough to achieve a converged result.
+
+Semi-automatic analysis
+-----------------------
+The same analysis can also be performed in steps, allowing access to and
+modification of the data generated at each stage of the analysis. ::
+
+    >>> from alchemtest.gmx import load_ABFE
+    >>> from alchemlyb.workflows import ABFE
+    >>> # Obtain the path of the data
+    >>> import os
+    >>> dir = os.path.dirname(load_ABFE()['data']['complex'][0])
+    >>> print(dir)
+    alchemtest/gmx/ABFE/complex
+    >>> # Initialize the workflow
+    >>> workflow = ABFE(software='Gromacs', dir=dir,
+    ...                 prefix='dhdl', suffix='xvg', T=298, outdirectory='./')
+    >>> # Set the unit.
+    >>> workflow.update_units('kcal/mol')
+    >>> # Read the data
+    >>> workflow.read()
+    >>> # Decorrelate the data.
+    >>> workflow.preprocess(skiptime=10, uncorr='dhdl', threshold=50)
+    >>> # Run the estimator
+    >>> workflow.estimate(methods=('mbar', 'bar', 'ti'))
+    >>> # Retrieve the result
+    >>> summary = workflow.generate_result()
+    >>> # Plot the overlap matrix
+    >>> workflow.plot_overlap_matrix(overlap='O_MBAR.pdf')
+    >>> # Plot the dHdl for TI
+    >>> workflow.plot_ti_dhdl(dhdl_TI='dhdl_TI.pdf')
+    >>> # Plot the dF states
+    >>> workflow.plot_dF_state(dF_state='dF_state.pdf')
+    >>> # Convergence analysis
+    >>> workflow.check_convergence(10, dF_t='dF_t.pdf')
+
+API Reference
+-------------
+.. autoclass:: alchemlyb.workflows.ABFE
+    :members:
+    :inherited-members:
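
Review note on the "File Input" section of the new ABFE page: the two layouts suggest that `dir`, `prefix` and `suffix` are combined into a single glob pattern. The code that does this is not part of this diff, so the sketch below is only an assumption about how such matching could work. `find_energy_files` is a hypothetical helper, not a function from alchemlyb, and the temporary files exist only to make the example runnable.

```python
import glob
import os
import tempfile


def find_energy_files(directory, prefix, suffix):
    """Return sorted paths matching <directory>/<prefix>*<suffix>.

    Hypothetical helper: the pattern is an assumption about how the
    dir/prefix/suffix arguments might be combined.
    """
    pattern = os.path.join(directory, prefix + "*" + suffix)
    return sorted(glob.glob(pattern))


with tempfile.TemporaryDirectory() as root:
    # Layout 1: one sub-directory per lambda window.
    for i in range(2):
        window = os.path.join(root, "simulation", f"lambda_{i}")
        os.makedirs(window)
        open(os.path.join(window, "prod.xvg"), "w").close()

    # Layout 2: all files in a single directory.
    flat = os.path.join(root, "dhdl_files")
    os.makedirs(flat)
    for i in range(2):
        open(os.path.join(flat, f"dhdl_{i}.xvg"), "w").close()

    nested = find_energy_files(
        os.path.join(root, "simulation", "lambda_*"), "prod", "xvg"
    )
    single = find_energy_files(flat, "dhdl_", "xvg")

    # Reduce the paths to the parts that identify each window.
    nested_windows = [os.path.basename(os.path.dirname(p)) for p in nested]
    single_names = [os.path.basename(p) for p in single]
```

Plain `sorted` orders paths lexically, so `dhdl_10.xvg` would come before `dhdl_2.xvg`. A real implementation would probably need a numeric sort key if the window order matters.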
diff --git a/src/alchemlyb/convergence/convergence.py b/src/alchemlyb/convergence/convergence.py
@@ -2,11 +2,12 @@
 import logging
 import numpy as np
 
-from ..estimators import MBAR, BAR, TI, AutoMBAR
+from ..estimators import BAR, TI
+from ..estimators import AutoMBAR as MBAR
 from .. import concat
 
 
-def forward_backward_convergence(df_list, estimator='mbar', num=10):
+def forward_backward_convergence(df_list, estimator='MBAR', num=10):
     '''Forward and backward convergence of the free energy estimate.
 
     Generate the free energy estimate as a function of time in both directions,
@@ -20,7 +21,7 @@ def forward_backward_convergence(df_list, estimator='mbar', num=10):
     ----------
     df_list : list
         List of DataFrame of either dHdl or u_nk.
-    estimator : {'mbar', 'bar', 'ti', 'autombar'}
+    estimator : {'MBAR', 'BAR', 'TI'}
         Name of the estimators.
     num : int
         The number of time points.
@@ -51,16 +52,13 @@ def forward_backward_convergence(df_list, estimator='mbar', num=10):
     logger.info('Start convergence analysis.')
     logger.info('Check data availability.')
 
-    if estimator.lower() == 'mbar':
-        logger.info('Use MBAR estimator for convergence analysis.')
-        estimator_fit = MBAR().fit
-    elif estimator.lower() == 'autombar':
+    if estimator == 'MBAR':
         logger.info('Use AutoMBAR estimator for convergence analysis.')
-        estimator_fit = AutoMBAR().fit
-    elif estimator.lower() == 'bar':
+        estimator_fit = MBAR().fit
+    elif estimator == 'BAR':
         logger.info('Use BAR estimator for convergence analysis.')
         estimator_fit = BAR().fit
-    elif estimator.lower() == 'ti':
+    elif estimator == 'TI':
         logger.info('Use TI estimator for convergence analysis.')
         estimator_fit = TI().fit
     else:
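
Review note on this hunk: the comparison changes from `estimator.lower() == 'mbar'` to `estimator == 'MBAR'`, so matching becomes case-sensitive and callers that pass `'mbar'` fall through to the `else` branch. The sketch below condenses the chain into a dictionary lookup to show that effect. The classes are empty stand-ins so that it runs without alchemlyb, `select_estimator` is a hypothetical name, and raising `ValueError` is an assumption because the body of the `else` branch is not shown in the diff.

```python
# Empty stand-ins for the real estimator classes.
class AutoMBAR:
    pass


class BAR:
    pass


class TI:
    pass


# Mirrors "from ..estimators import AutoMBAR as MBAR" in the hunk.
MBAR = AutoMBAR


def select_estimator(estimator):
    """Hypothetical condensed form of the if/elif chain."""
    lookup = {'MBAR': MBAR, 'BAR': BAR, 'TI': TI}
    try:
        return lookup[estimator]
    except KeyError:
        # Assumption: the real else branch is not visible in the diff.
        raise ValueError(f"{estimator} is not a valid estimator.") from None


# The upper-case name resolves to the aliased AutoMBAR class.
selected = select_estimator('MBAR')

# The lower-case spelling accepted before this change no longer matches.
try:
    select_estimator('mbar')
    lowercase_accepted = True
except ValueError:
    lowercase_accepted = False
```

If backward compatibility with lower-case names matters, normalizing the argument before the comparison would be one option. Whether that is wanted here is a design decision for the PR.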

diff --git a/src/alchemlyb/estimators/__init__.py b/src/alchemlyb/estimators/__init__.py
@@ -1,3 +1,6 @@
 from .mbar_ import MBAR, AutoMBAR
 from .bar_ import BAR
 from .ti_ import TI
+
+FEP_ESTIMATORS = [MBAR.__name__, AutoMBAR.__name__, BAR.__name__]
+TI_ESTIMATORS = [TI.__name__]
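
Review note on the two new constants: they hold class names as strings, which suggests they are meant for deciding whether an estimator name needs reduced potentials (`u_nk`) or gradients (`dHdl`). The sketch below illustrates that use with empty stand-in classes. `required_data` is a hypothetical helper and is not part of this diff.

```python
# Empty stand-ins so the sketch runs without alchemlyb.
class MBAR:
    pass


class AutoMBAR:
    pass


class BAR:
    pass


class TI:
    pass


# Same construction as in the hunk: lists of class names.
FEP_ESTIMATORS = [MBAR.__name__, AutoMBAR.__name__, BAR.__name__]
TI_ESTIMATORS = [TI.__name__]


def required_data(estimator_name):
    """Hypothetical helper: name the input type an estimator needs."""
    if estimator_name in FEP_ESTIMATORS:
        return 'u_nk'
    if estimator_name in TI_ESTIMATORS:
        return 'dHdl'
    raise ValueError(f"Unknown estimator: {estimator_name}")


bar_input = required_data('BAR')
ti_input = required_data('TI')
```

Because the lists are built from `__name__`, they contain `'MBAR'`, `'AutoMBAR'` and `'BAR'` exactly as the classes are spelled, so a lookup with a lower-case name would not match.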
diff --git a/src/alchemlyb/tests/test_convergence.py b/src/alchemlyb/tests/test_convergence.py
@@ -30,7 +30,7 @@ def test_convergence_mbar(gmx_benzene):
 
 def test_convergence_autombar(gmx_benzene):
     dHdl, u_nk = gmx_benzene
-    convergence = forward_backward_convergence(u_nk, 'AutoMBAR')
+    convergence = forward_backward_convergence(u_nk, 'MBAR')
     assert convergence.shape == (10, 5)
     assert convergence.iloc[0, 0] == pytest.approx(3.02, 0.01)
     assert convergence.iloc[0, 2] == pytest.approx(3.06, 0.01)

diff --git a/src/alchemlyb/tests/test_visualisation.py b/src/alchemlyb/tests/test_visualisation.py
@@ -130,7 +130,7 @@ def test_plot_dF_state():
 def test_plot_convergence_dataframe():
     bz = load_benzene().data
     data_list = [extract_u_nk(xvg, T=300) for xvg in bz['Coulomb']]
-    df = forward_backward_convergence(data_list, 'mbar')
+    df = forward_backward_convergence(data_list, 'MBAR')
     ax = plot_convergence(df)
     assert isinstance(ax, matplotlib.axes.Axes)
     plt.close(ax.figure)