Automated Solvation Shell Analysis #227

cadeduckworth · 2023-01-08T23:11:55Z

No description provided.

* separating functionality for use with _single_universe without iterating frames
* retaining functionality for use with _single_frame

* try/except pattern condensed to one run method
* single run method implemented for _single_universe and _single_frame
* functionality of original run method retained, while removing frame iteration for _single_universe usage
* progress bar still active when using _single_universe, but no frame iteration occurs
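The run-method pattern described in this commit message can be pictured with a toy sketch. Only the hook names `_single_universe` and `_single_frame` and the use of `NotImplementedError` come from the commit messages; the classes `ToyEnsembleAnalysis`, `PerFrame`, and `PerUniverse`, and the dict-of-lists stand-in for an ensemble, are hypothetical and are not the MDPOW code.

```python
class ToyEnsembleAnalysis:
    """Toy stand-in: `ensemble` maps a system key to a list of frames."""

    def __init__(self, ensemble):
        self.ensemble = ensemble
        self.results = []

    def _single_universe(self):
        # Subclasses override this to analyze a whole system at once.
        raise NotImplementedError

    def _single_frame(self):
        # Subclasses override this to analyze one frame at a time.
        raise NotImplementedError

    def run(self):
        for self._key, frames in self.ensemble.items():
            self._frames = frames
            try:
                # Preferred path: one call per system, no frame iteration.
                self._single_universe()
            except NotImplementedError:
                # Fallback path: iterate over the frames of this system.
                for self._frame in frames:
                    self._single_frame()
        return self


class PerFrame(ToyEnsembleAnalysis):
    def _single_frame(self):
        self.results.append((self._key, self._frame))


class PerUniverse(ToyEnsembleAnalysis):
    def _single_universe(self):
        self.results.append((self._key, len(self._frames)))


ensemble = {"water": [0, 1, 2], "octanol": [0, 1]}
per_frame = PerFrame(ensemble).run().results
per_universe = PerUniverse(ensemble).run().results
```

In this sketch the outer loop over systems always runs, so a progress bar wrapped around it would stay active in both cases, while the inner frame loop is only entered when `_single_universe` is not implemented.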

directory_paths: finds MDPOW project data and returns pandas DataFrame
directory_iteration: takes pandas DataFrame or csv and iterates automated_dihedral_analysis over MDPOW project directories
automated_dihedral_analysis: group of functions that includes automation scripts for DihedralAnalysis, automatic dihedral atom group selection, seaborn violin plots function, and other required functions
Future changes will eliminate redundancies following initial testing


removed raise
added docstrings to describe use of NotImplementedError
responded with reason for removing _conclude_universe()
next step: updating html documentation following successful testing

fixed expected indent block error
removed raise
removed _conclude_universe()


* incomplete, pushing to test
* source and html doc additions
* changed all instances of datadir to dirname for consistency with DihedralAnalysis
* all changes pertain to new workflows modules

codecov · 2023-01-08T23:15:47Z

Codecov Report

Merging #227 (1f62fa6) into develop (cafb300) will decrease coverage by 1.45%.
The diff coverage is 0.00%.

❗ Current head 1f62fa6 differs from pull request most recent head 61f53b9. Consider uploading reports for the commit 61f53b9 to get more accurate results

@@             Coverage Diff             @@
##           develop     #227      +/-   ##
===========================================
- Coverage    80.15%   78.70%   -1.45%     
===========================================
  Files           15       16       +1     
  Lines         1900     1935      +35     
  Branches       291      295       +4     
===========================================
  Hits          1523     1523              
- Misses         286      321      +35     
  Partials        91       91

Impacted Files	Coverage Δ
mdpow/workflows/solvations.py	`0.00% <0.00%> (ø)`



orbeckst · 2023-04-04T02:00:48Z

Lots of conflicts to resolve, @cadeduckworth !

cadeduckworth · 2023-04-05T17:56:32Z

@orbeckst
The following is a brief overview of the new manual EnsembleAnalysis workflow based on the changes I made to the original SolvationAnalysis (the changes to mdpow.analysis.solvation are in the most recent commit):

I made changes that allow a user to input a max distance, and obtain all of the solute-solvent interactions within that distance (distance and atom/atomgroup info included), per molecule, per solvent, per interaction, per lambda.
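As a rough illustration of "all solute-solvent interactions within a max distance", here is a minimal brute-force sketch in plain Python. It is not the code in mdpow.analysis.solvation: `pairs_within` and the coordinates are hypothetical, and periodic boundaries are ignored. For real trajectories a cell-list search such as MDAnalysis's `capped_distance` would presumably be the better tool.

```python
import math


def pairs_within(solute, solvent, max_distance):
    """Return (solute_index, solvent_index, distance) for every pair
    of positions that lies within max_distance (no periodic boundaries)."""
    pairs = []
    for i, a in enumerate(solute):
        for j, b in enumerate(solvent):
            d = math.dist(a, b)
            if d <= max_distance:
                pairs.append((i, j, d))
    return pairs


# Made-up coordinates in Å, only to exercise the function.
solute = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
solvent = [(0.0, 3.0, 0.0), (1.5, 0.0, 3.5), (10.0, 10.0, 10.0)]

pairs = pairs_within(solute, solvent, max_distance=3.6)
pair_indices = [(i, j) for i, j, _ in pairs]
```

Note that solvent position 0 shows up twice here, once for each solute atom within range, so the number of pairs is larger than the number of solvent atoms near the solute.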

This is a very rough draft, especially for the plot, but the workflow, results DataFrame, and FacetGrid get the main idea across for what I am going for.

SAMPL9 Data, FF = GAFF, Molecule = S08

Workflow:

Results DF: (it might be better to break the *_ix lists up into more columns)

Figure (FacetGrid): (please excuse the quality, I am trying to work out the kinks in the .map() function)

orbeckst · 2023-04-05T19:59:27Z

The manual workflow as shown is a good mini tutorial and looks already very good.

Presenting the data will be more of a challenge: the distance KDEs are naturally cut off at the cutoff, which makes it look a bit weird/incomplete. And once the solute disappears, the N becomes quite big and swamps the small numbers. Perhaps try a log plot?? Or normalize??? Also, are the large numbers compatible with what you get for a sphere with a radius equal to the cutoff, filled at the density of water or octanol/other solvent?
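One possible reading of "normalize" is to scale each lambda window's distance histogram to unit area, so that windows with very different N can share an axis, and to report the counts themselves as log10(N). A minimal sketch with made-up distances follows; `normalized_histogram` is a hypothetical helper, not part of MDPOW or seaborn.

```python
import math


def normalized_histogram(distances, bin_width, cutoff):
    """Histogram of distances on [0, cutoff), scaled to unit area."""
    n_bins = math.ceil(cutoff / bin_width)
    counts = [0] * n_bins
    for d in distances:
        if 0 <= d < cutoff:
            # Guard against floating-point edge cases at the last bin.
            index = min(int(d // bin_width), n_bins - 1)
            counts[index] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / (total * bin_width) for c in counts]


# Made-up data: one window with few contacts, one with many.
few = [1.2, 2.5, 3.1]
many = [0.5] * 100 + [3.5] * 300

hist_few = normalized_histogram(few, bin_width=1.0, cutoff=4.0)
hist_many = normalized_histogram(many, bin_width=1.0, cutoff=4.0)

# Counts on a log scale, so the large N does not swamp the small one.
log_counts = {"few": math.log10(len(few)), "many": math.log10(len(many))}
```

After this scaling both histograms integrate to 1 regardless of how many distances went in, which separates the shape of the distribution from the size of N.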

orbeckst · 2023-04-05T20:01:57Z

The distance histograms are interesting. You can see how molecules start to overlap with the disappearing solute.


cadeduckworth · 2023-04-06T03:14:39Z

@orbeckst
Although there is a large step size specified, from a theoretical perspective I interpreted the violins as stepwise visual confirmation that things are going according to plan in the context of turning off intermolecular interactions. I might be thinking about it wrong, but the solute or solvent is essentially being removed from the system, electrostatically and otherwise alchemically, which is positively indicated by the increased equilibration of solvent molecules, right?

On that note, I think seeing that there is not much change when turning off Coulombic interactions for a neutral solute/solvent is a good indication, and seeing the gradual increase in the number of solvent molecules at a decreased distance when turning off VDW is also a good indication of sufficient sampling, IFF this holds up for other cases with more frames included.

Perhaps I should run a simulation with one of the charged molecules that was excluded for logP calculations from one of the SAMPL sets and see what happens if I can get GROMACS to comply?

EDIT: paragraph spacing, wording

Afterthought: a nearest neighbors calculation or comparison of individual lambdas to a population distribution could help later in identifying when some systems are not behaving?
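One way the "comparison to a population distribution" idea could look is a simple z-score check: for a fixed lambda window, compare each system's mean solvent count to the spread over all systems. This is a sketch under assumptions; `flag_outliers`, the system labels, and all numbers are made up for illustration.

```python
import statistics


def flag_outliers(value_by_system, threshold=2.0):
    """Return the keys whose value lies more than `threshold` sample
    standard deviations away from the mean over all systems."""
    values = list(value_by_system.values())
    mean = statistics.fmean(values)
    spread = statistics.stdev(values)
    if spread == 0:
        return []
    return [key for key, value in value_by_system.items()
            if abs(value - mean) / spread > threshold]


# Hypothetical mean solvent counts at one lambda window, one per system.
mean_counts = {
    "sys_a": 50.0, "sys_b": 52.0, "sys_c": 51.0, "sys_d": 49.0,
    "sys_e": 50.0, "sys_f": 48.0, "sys_g": 53.0, "sys_h": 120.0,
}

suspicious = flag_outliers(mean_counts)
```

With very few systems a single outlier inflates the standard deviation enough to hide itself, so a robust measure (median and median absolute deviation) might be a better choice in practice.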

cadeduckworth · 2023-04-06T03:26:52Z

@orbeckst

I will look into approximately homogeneous equilibration of solvents to try to confirm the data for more completely sampled datasets when I get the chance (those lambdas for VDW closer to 1, when the solute is essentially removed). I apologize if I did not use the correct terms there, it has been a while since I covered equilibrium chemistry, but I think I understand what you are getting at.

cadeduckworth · 2023-04-06T04:57:27Z

@orbeckst

I ran a more complete analysis of the same SAMPL9 GAFF molecule (S08) with a step size of 500 and cleaned up the plots a bit.
This is the product of what I have been playing with using cairosvg and svgutils instead of sns.FacetGrid. When initialized directly (outside of .catplots), FacetGrid pretty much requires that axes be shared for the same data being analyzed, which is exactly the same as sharing axes in .catplots.

So, this is the SVG converted to pdf. The width ratios are slightly off, which affects the placement of the Mol object, but I cleaned up the scale on the y-axes as well as the height/width aspect ratio. Fixing the axes/scales and including much more data seems to give a better look at what is going on.

If we do eventually go with a stacked plot scheme, then I will reserve less of the right side and reduce it to a single Mol object and legend to make it look better.

S08.pdf

cadeduckworth · 2023-04-06T05:06:29Z

From earlier this morning before making changes to SolvationAnalysis:

Using the original SolvationAnalysis class, which summed the number of solvent molecules within a specified first and second cutoff distance, separately, I made a time series of the number of solvent molecules.
The concept does not make much sense since each lambda simulation starts from the same relaxed equilibrated MD sim(?), but if I can map the time series to the lambda values then it might be useful as an option later on for something. Note: this uses very undersampled data (large step size), but it also shows the 95% CI.
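For reference, a 95% CI on the mean of such a count series can be approximated as mean ± 1.96 standard errors. This sketch assumes the frames are uncorrelated, which MD frames generally are not, so the interval is likely too narrow; it is also not necessarily how the plotted CI was computed. `mean_ci95` and the counts are hypothetical.

```python
import statistics


def mean_ci95(samples):
    """Mean and normal-approximation 95% confidence interval of the mean."""
    mean = statistics.fmean(samples)
    sem = statistics.stdev(samples) / len(samples) ** 0.5
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width


# Made-up solvent counts per analyzed frame for one lambda window.
counts = [10, 12, 11, 13, 9, 11, 12, 10]
mean, low, high = mean_ci95(counts)
```

With a large step size there are few samples per window, so the interval widens accordingly.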

orbeckst · 2023-04-07T01:30:07Z

For the S08 data, the number of solvent atoms is initially 60,000. There aren't that many waters in the box, are there?

Can you do the following calculation: Given the standard density of water at the simulation conditions, how many water molecules would you expect to find in a volume roughly equivalent to (1) the solute, (2) the solute with its surface blown up by 4 Å? A rough physicist estimate will suffice as a start, e.g., approximating the molecule as a cylinder or sphere...

Then compare the estimate to the numbers from solvation analysis.
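A rough version of this estimate, treating the solute as a sphere, might look like the sketch below. The solute radius of 4 Å is a placeholder, not a measured value for S08, and the water density of 0.997 g/cm³ assumes roughly ambient conditions; both should be replaced with values for the actual system.

```python
import math

AVOGADRO = 6.02214076e23      # 1/mol
WATER_MOLAR_MASS = 18.015     # g/mol
WATER_DENSITY = 0.997         # g/cm^3, assumed ambient conditions
CUBIC_ANGSTROM_PER_CM3 = 1e24


def number_density(mass_density, molar_mass):
    """Molecules per cubic Å for a liquid of given density and molar mass."""
    return mass_density / molar_mass * AVOGADRO / CUBIC_ANGSTROM_PER_CM3


def sphere_volume(radius):
    return 4.0 / 3.0 * math.pi * radius ** 3


def expected_molecules(radius, rho):
    """Molecules expected in a sphere of `radius` (Å) at number density rho."""
    return rho * sphere_volume(radius)


rho_water = number_density(WATER_DENSITY, WATER_MOLAR_MASS)

solute_radius = 4.0  # Å, hypothetical placeholder for the solute size
n_solute = expected_molecules(solute_radius, rho_water)          # case (1)
n_inflated = expected_molecules(solute_radius + 4.0, rho_water)  # case (2)
n_shell = n_inflated - n_solute  # waters in the 4 Å shell only
```

Since the solvation analysis counts atoms rather than molecules, the water estimates would need to be multiplied by 3 before comparing. For octanol, the same functions apply with its density and molar mass.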

orbeckst · 2023-04-07T01:36:33Z

mdpow/analysis/solvation.py

@@ -56,21 +56,38 @@ def __init__(self, solute: EnsembleAtomGroup, solvent: EnsembleAtomGroup, distan
        self._dists = distances

    def _prepare_ensemble(self):
-        self._col = ['distance', 'solvent', 'interaction',
-                     'lambda', 'time', 'N_solvent']


Are you now returning distances? If so, better write a new class and leave the old one as-is.

@orbeckst
I agree, because they are doing different things. Can they both exist as separate classes within the same module?

orbeckst · 2023-04-07T01:40:00Z

I am not sure if the time series contain more interesting information than the averages or distributions — at least in equilibrium. Time series in equilibrium are good as a diagnostic to check if the system is likely sampling equilibrium: if the fluctuations are around a stable mean then it might be equilibrium...
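A crude form of this diagnostic is to compare the drift between the first and second half of a time series with the fluctuations inside each half. The sketch below is only a heuristic: passing it does not prove equilibrium, and `looks_stationary`, its tolerance, and the data are hypothetical.

```python
import statistics


def looks_stationary(series, tolerance=1.0):
    """True if the shift in the mean between the two halves of the series
    is no larger than `tolerance` times the average within-half spread."""
    half = len(series) // 2
    first, second = series[:half], series[half:]
    drift = abs(statistics.fmean(second) - statistics.fmean(first))
    spread = (statistics.stdev(first) + statistics.stdev(second)) / 2.0
    return drift <= tolerance * spread


# Made-up examples: fluctuation around a stable mean vs. a steady ramp.
fluctuating = [10, 12, 11, 13, 9, 11, 12, 10]
ramp = list(range(100))

stable = looks_stationary(fluctuating)
drifting = looks_stationary(ramp)
```

A block-averaging or autocorrelation analysis would be a more careful check, but this captures the idea of "fluctuations around a stable mean".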

orbeckst · 2023-04-07T01:41:27Z

I don't think that we need to add the molecular structure next to the solvation plots unless you want to highlight specific atoms. Otherwise I'd expect a molecular structure to be included in an overview panel and the reader can use the ID to cross-reference.

cadeduckworth · 2023-04-08T15:51:08Z

@orbeckst
I am working on some calculations/estimates, but for clarification, the 'counts' bar plot counts each solvent atom that is within the cutoff distance of any solute atom non-exclusively. Multiple solvent atom to solute atom distances are recorded and counted, meaning that solvent atom A near solute atom B and near solute atom C is counted separately for each. I believe this is why the numbers are so high. There could be an easy way to count each solvent atom only once with a minor adjustment to the script. Additionally, we could still record all of the data, and only count each solvent molecule once in the plot by only counting the unique resids.
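The difference between the three ways of counting can be shown with a few made-up records. The `Contact` tuple and its field names are hypothetical and do not reflect the actual columns of the results DataFrame.

```python
from collections import namedtuple

# One record per solvent atom / solute atom pair within the cutoff.
Contact = namedtuple("Contact", "solvent_ix solvent_resid solute_ix distance")

contacts = [
    Contact(solvent_ix=101, solvent_resid=1, solute_ix=0, distance=3.1),
    Contact(solvent_ix=101, solvent_resid=1, solute_ix=1, distance=3.4),
    Contact(solvent_ix=102, solvent_resid=1, solute_ix=0, distance=3.8),
    Contact(solvent_ix=201, solvent_resid=2, solute_ix=1, distance=2.9),
]

# Non-exclusive count: every pair is counted.
n_pairs = len(contacts)

# Each solvent atom counted once, however many solute atoms it is near.
n_solvent_atoms = len({c.solvent_ix for c in contacts})

# Each solvent molecule counted once, via its unique resid.
n_solvent_molecules = len({c.solvent_resid for c in contacts})
```

All records are still kept, so the full pair information remains available while the plot can use either of the deduplicated counts.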

orbeckst · 2023-04-08T18:35:12Z

Each atom should only be counted once. There's a single closest interaction between a solvent atom and the solute (barring coincidental equidistant triangles).
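Keeping only the single closest interaction per solvent atom could be done by reducing the list of pairs to the minimum distance for each solvent index. A minimal sketch with hypothetical names and made-up pairs:

```python
def closest_per_solvent_atom(pairs):
    """Map each solvent index to its (solute_index, distance) of minimum
    distance. `pairs` holds (solute_index, solvent_index, distance) tuples.
    On an exact tie the first pair seen is kept."""
    closest = {}
    for solute_ix, solvent_ix, distance in pairs:
        best = closest.get(solvent_ix)
        if best is None or distance < best[1]:
            closest[solvent_ix] = (solute_ix, distance)
    return closest


# Made-up pairs: solvent atom 101 is within the cutoff of two solute atoms.
pairs = [
    (0, 101, 3.1),
    (1, 101, 2.7),
    (0, 102, 3.8),
    (1, 201, 2.9),
]

closest = closest_per_solvent_atom(pairs)
n_counted = len(closest)
```

Each solvent atom then contributes exactly one distance to the histogram and one to the count.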

orbeckst · 2024-10-14T20:06:33Z

@cadeduckworth is this something you'd want to complete for an MDPOW paper?

Can you resolve the conflicts so that one can look at the changes in the context of the latest version of the code?

cadeduckworth and others added 30 commits September 15, 2022 10:27

developing back end run methods

e71456d

* separating functionality for use with _single_universe without iterating frames
* retaining functionality for use with _single_frame

moved directory_paths function to reside within the DihedralAnalysis …

f9b6646

…class to test possible ways to call automation-related functions

Addressed change requests by Oliver

7bcdf49

removed raise
added docstrings to describe use of NotImplementedError
responded with reason for removing _conclude_universe()
next step: updating html documentation following successful testing

fixed indentation error for _single_frame() block

a87cb11

reverting previous tab issue, necessary for normal function

7edde0d

fixed errors previously reverted for testing

48016c5

fixed expected indent block error
removed raise
removed _conclude_universe()

Merge branch 'develop' into ensemble_run_update

59c819c

restored original dihedral.py file

6b1be1b

created automation/dihedral subdirectory under mdpow/analysis/ to hou…

2ddea0e

…se automation for DihedralAnalysis automated_dihedral_analysis.py directory_iteration.py directory_paths.py

updated docstrings and added examples for directory_paths.py

ad6db6e

added docstrings and examples for directory_iteration.py

848dd87

added initial draft of docstrings and examples for all functions cont…

6394d7a

…ained within automated_dihedral_analysis.py

added the option to save the DihedralAnalysis results DataFrame as cs…

d8f3ca3

…v files in provided parent directory

removed redundant DataFrame saving pattern from automated_dihedral_an…

d8137cb

…alysis

Merge remote-tracking branch 'origin/ensemble_run_update' into automa…

29477ab

…ted-dihedral-analysis updating with changes for backend and testing from PR #216 ensemble_run_update

merged updates from PR #216 and PR #218

642e87e

Merge branch 'develop' into automated-dihedral-analysis

4814376

Merge branch 'develop' into automated-dihedral-analysis

f333ee8

resolved conflicts to merge updates in from develop from PR#216

6f16b84

adding data for testing automated dihedral analysis

ddffbdf

added starting framework for testing automated dihedral analysis

c850448

reorganized automation directory into workflows directory

69b0351

added preliminary simple core tests for automated dihedral analysis

4cf176d

initial reformatting of existing docs for sphinx markup compatibility

97c4a73

* incomplete, pushing to test
* source and html doc additions
* changed all instances of datadir to dirname for consistency with DihedralAnalysis
* all changes pertain to new workflows modules

fix errors in docs for workflows

930d54e

added imports for automated dihedral analysis tests

84536ff

added init file for workflows module

f01b2a4

added logging functionality for workflows modules

61ff494

cadeduckworth added 2 commits January 8, 2023 15:31

generalizing automation base module for use with other analyses

1ad6b8c

initialize PR for automated-solvation-shell module

3f65e28

cadeduckworth added 5 commits January 8, 2023 17:09

initial script for automated solvation shell analysis

80c641c

updated automated solvation module, incomplete, issue with compression

09721c7

fixed automated solvation shell analysis module

68c0b1e

ASA works as is workflows base module needs adjustment for proper use

docs and func names

7181017

cleanup and remove workflows base module/move to new PR

e3ef88f

orbeckst mentioned this pull request Jan 16, 2023

Workflows Base Module #229

Merged

cadeduckworth added 5 commits April 5, 2023 01:40

resolving conflicts

36be611

add new version of testing resources

b876ad1

correct module name

1f62fa6

start test module for solvations

77f7054

change data obtained by SolvationAnalysis

041c6ed

name top-level automated solvation function and complete addition to …

61f53b9

…workflows registry/registry docs

orbeckst reviewed Apr 7, 2023
