Dihedral Plots: RDKit Mol Object #243

cadeduckworth · 2023-02-21T12:35:13Z

add RDKit Mol object to dihedral analysis plots
add tests, and close Add test to verify that dihedral analysis plots are labeled with the correct molecule and dihedral atom group #238
close Fix import error in workflows base module (develop branch) #245

codecov · 2023-02-21T12:40:59Z

Codecov Report

Merging #243 (4af6de6) into develop (3b93aad) will increase coverage by 0.38%.
The diff coverage is 99.02%.

@@             Coverage Diff             @@
##           develop     #243      +/-   ##
===========================================
+ Coverage    80.73%   81.12%   +0.38%     
===========================================
  Files           15       15              
  Lines         1905     1960      +55     
  Branches       294      296       +2     
===========================================
+ Hits          1538     1590      +52     
- Misses         276      278       +2     
- Partials        91       92       +1

Impacted Files	Coverage Δ
mdpow/workflows/dihedrals.py	`95.93% <99.01%> (+1.05%)`	⬆️
mdpow/workflows/base.py	`77.55% <100.00%> (-4.09%)`	⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

…to atom_indices/get_atom_indices

…hedrals

cadeduckworth · 2023-02-21T16:34:23Z

@orbeckst
I think I fixed the bond highlighting issue and image size issue.
I incorporated the new algorithm/methods with the 'low res' plotting functionality so that there is a functioning base to improve upon.

Currently going back and looking at our previous conversations and the new links you sent to improve upon the existing methods..

cadeduckworth · 2023-02-21T16:38:44Z

orbeckst · 2023-02-21T17:31:12Z

That looks better. Very good. Did you check other examples, too?

The other comments were along the lines

add numerical labels (I shared a notebook with prettyfy() I think)
add the dihedral name using atom numbers as well (... need to find original discussion)

cadeduckworth · 2023-02-21T18:02:09Z

I checked the plots for that molecule that I have been testing on, but no others yet.

I changed a lot of things that will be helpful moving forward, especially for testing.
The information now used for Mol objects almost all come directly from atom_indices, which uses the RDKit conversion method that is tested and verified from PR#217.

For each dihedral group in a molecule the atom_indices and bond_indices are saved in a dictionary, where the key is in the format of 'C1-C2-C3-C4' to correspond to MDAnalysis selection style, so that the results DataFrame can be the source that calls the associated plotting graphics info. That way, the DF determines which atom and bond indices are highlighted, rather than an iterative method.

EDIT (reminder): @orbeckst - relevant to discussion from our meeting on 6/29/23.. I did not remember because it was several months ago, but there was a method to the madness all along!

cadeduckworth · 2023-02-21T18:13:26Z

@orbeckst

found the notebook with prettyfy(), will add changes

cadeduckworth · 2023-02-21T18:30:15Z

atom_indices added to label Mol object on plots

To-Do:

fix image size to auto adjust by size of Mol object
switch to SVG or enhance image quality

cadeduckworth · 2023-02-21T18:50:17Z

cadeduckworth · 2023-02-21T19:02:54Z

cleaner plot titles

cadeduckworth · 2023-02-21T23:49:42Z

This is where I have landed with the combined RDKIT/matplotlib to SVG composition.
(SVG converted to PDF attached below)

With the number of atoms in the SAMPL9 molecules, and the small text size of the atom indices, the SVG plots will look much better for those purposes. It should also maintain a consistent box size for all of the molecules, so the orientation or overall size of the molecule should not cause the image to zoom in or out like the MolToImage RDKit PDF output method.

Regarding the PDF output method, I figured out that, because the default size of the MolToImage conversion is 300x300, 'square' shaped molecules end up appearing bigger under increased extent values, which means the 'scale' cannot really afford to be 'zoomed in' for horizontally oriented molecules that assume more of a 'linear' conformation. There exists a noticeable difference between the aspect options, auto and equal, but changing those without changing the frame size and other parameters causes distortion. All of this of course can be changed, but would be difficult to detect how to do so for each case. For the current methods that output PDF, the best combination I have come across is the most recently pushed commit (see also, plot directly beneath it), and likely suits most sizes/shapes (needs further testing to be sure).

Regarding the SVG output method, the svgutil code, for me, has been tricky for size, scale, and placement. There is a way to get the sizes after the original Mol or MPL objects are saved to SVG files which could make things easier, but it seems like minor changes in different steps of processing can cause big downstream changes for placement/proportions/scales. There is also a method that converts units between px, cm, pt, etc., but I have not had a chance to get into it yet.

It might be best to use what I have developed for SVGs as an option that can be specified, for now, and use the pdf method as default, unless conda has something similar to inkscape that we can include as a dependency in order to facilitate viewing, conversion, and notebook embedding?

test.pdf

orbeckst · 2023-02-22T01:17:14Z

The PDF looks sharp and can be zoomed in ✅

The next thing is to make sure that we don't need to scale it to out it into a paper. To make it publication-ready we would want it to be "at final size", i.e., typical two-column wide figure (about 190 mm, just randomly used https://writing.stackexchange.com/questions/21658/what-is-the-image-size-in-scientific-paper-if-indicated-as-a-single-1-5-or-2-c ) and font size should not be smaller than 9 pt. I think the matplotlib default is 10 so if you make figures at final size, everything should look good.

orbeckst · 2023-02-22T01:18:53Z

(see also https://github.com/Becksteinlab/making-prettier-plots/blob/master/Making%20better%20plots.ipynb )

cadeduckworth · 2023-03-04T09:27:14Z

@orbeckst

Here are some plotting results after implementing existing method with svgutils and new method with cairosvg into dihedrals.py module.

svgutils is being used to create the SVG files/graphics and layer them.
cairosvg is being used to convert to pdf and scale/save in the desired size.
I added a kwarg to adjust the final exported pdf plot width, with the default set to 190mm.
I also added conversion to pixels for compatibility with cairosvg.svg2pdf.

Example with a fairly cluttered molecule from SAMPL9 set (step=1000), before I added a little white space to the right border to allow for longer solvent names:
S07_C21-C22-C25-F27_violins.pdf

SM25 & SM26 (testing resources) after increasing whitespace on right border:
SM25_C2-N3-S4-C5_violins.pdf
SM26_O1-S2-C3-C4_violins.pdf

I have not been able to figure out how to suppress matplotlib output, which does not contain the Mol image.

Moving forward I am working on fixing the tests to accommodate what was changed during development. After I achieve full coverage again I will request a review and start making changes to the documentation to reflect changes and new functionality.

EDIT: see parentheses

…s and altered function definitions

…s, remove broken test, add reminder to update func list in docs

cadeduckworth · 2023-03-04T12:14:18Z

@orbeckst

The codecov report of 98.80% is misleading; some of my current tests are weak at best, and some are possibly not testing anything at all. Once the new SVG methods are finalized, more specific tests need to be written, existing tests need to be revisited, and fixture scopes need to be optimized.
Logger/error info needs to be added or updated. Once PR229 is merged the test_workflows_base.py module needs to test iterative usage of new plotting functionality for multiple projects. Docs will need a thorough rewrite. Tutorials and examples will need to be revisited.

The 'plotting only one solvent issue' needs to be revisited (see most recent comments, SM25 & SM26 pdfs). The current method almost works, but results in an unnecessary solvent label in the legend. I have already already come up with a way to probably fix this but have not tested it yet.

The plots look nice, they seem scalable (a few examples showed signs of adequate scaling, see also most recent comment for S07, SM255, SM26), and I am pleased with the combination of svgutils and cairosvg. I am not sure if the problem exists outside of Jupyter or IDEs, but I tried my hand at suppressing the matplotlib output with no luck.. and the output is a plot which does not contain the Mol object image, so it is annoying. I will look more into this.

EDIT: As a forewarning the changed files here are littered with comments and bad practices, so I will try to clean those up before you get a chance to review. This one required changes on more levels, so PR229 and PR242 are more complete and take precedence.

mdpow/tests/test_automated_dihedral_analysis.py

cadeduckworth · 2023-07-02T02:13:33Z

@orbeckst
Please see lines 143 and 144 (dihedrals.build_universe). If MDPOW supports single solvent Pow calculations (does it?), should we not always assume that results will be available for water by default?

EDIT: Maybe it is best to default to using the topology and trajectory from the first solvent specified?

orbeckst · 2023-07-02T02:40:23Z

It supports single solvent solvation free calculations, yes. You're right, the water directory does not necessarily exist.

However, just raise an issue that we improve build_universe() to be smarter about finding the topology. Right now it's working for everything that we need.

Let's not use the ITP file because that might not be present, e.g., when someone else ran just the FEP on a cluster.

orbeckst · 2023-07-02T02:41:46Z

Actually, your idea to use the first solvent seems pretty good and easy enough to implement. You could even make solvent='water' a kwarg and then the default is the current behavior.

cadeduckworth · 2023-07-02T02:47:18Z

It supports single solvent solvation free calculations, yes. You're right, the water directory does not necessarily exist.

However, just raise an issue that we improve build_universe() to be smarter about finding the topology. Right now it's working for everything that we need.

Let's not use the ITP file because that might not be present, e.g., when someone else ran just the FEP on a cluster.

ITP file?

mdpow/workflows/base.py

mdpow/workflows/dihedrals.py

orbeckst · 2023-07-02T04:48:15Z

mdpow/tests/test_workflows_base.py


-        assert 'all analyses completed' in caplog.text, ('automated_dihedral_analysis '
-               'did not iteratively run to completion for the provided project')
+            assert 'all analyses completed' in caplog.text, ('automated_dihedral_analysis '


Unindent — that's better outside the pytest.raises block.

As a general rule reduce the amount of code in a with block to the essentials as this makes clearer what's relevant for testing the condition.

I believe the test fails without the indent, possibly because of what is being tested or how it is written in the module. I am not certain of this, but making the change last night resulted in a passing test.

orbeckst · 2023-07-02T06:12:18Z

ITP file?

I was thinking along what I used for the dihedral sampler: use the initial files in the top level directories, including the topology file, but on second thought it makes no sense ebcause that files does not contain the solvent and is not useful for using with the trajectory. The TPR is really what we want.

Sorry to have confused you.

cadeduckworth · 2023-07-03T03:58:57Z

in response to

Docs say hue_order takes a list of strings, so we will do the list conversion as part of the kwarg.

EDIT: I think hue_order=["water", "octanol"] was the original kwarg from the dihedral violins code you provided me with to start developing the figure functions a long while ago.

cadeduckworth · 2023-07-03T05:36:29Z

@orbeckst Ready for review, all requested changes resolved or responded to. I raised several new issues and commented or updated some existing issues to document what needs to be done soon, but is not immediately necessary for merging this PR. Minor reviewing left to do on my part tomorrow as a sanity check, but I hope to get this merged and resubmit the SAMPL9 jobs Monday afternoon.

orbeckst

Only one code compactifitransmogration.

mdpow/workflows/dihedrals.py

orbeckst · 2023-07-04T02:35:33Z

A momentous occasion: PR #243 has been merged. 🚀 🌟

Congratulations!

new PR

b4aeb11

cadeduckworth linked an issue Feb 21, 2023 that may be closed by this pull request

Add test to verify that dihedral analysis plots are labeled with the correct molecule and dihedral atom group #238

Closed

cadeduckworth added 5 commits February 21, 2023 06:26

update mol object methods, change dihedral_indices naming convention …

cc17a6b

…to atom_indices/get_atom_indices

add get_bond_indices

30ad175

reflect name change in tests from dihedrals automated workflow

8d67d81

missed name change in test

59f32a5

fix bond highlighting and add low resolution plotting function for di…

9a7c25a

…hedrals

add atom_indices to titles of plot columns

b6f66c2

add atom_indices to label Mol object on plots

2f8c042

cleaner plot titles

844b29f

current best plot resolution/proportion

5fa5f11

cadeduckworth added 5 commits March 4, 2023 02:34

add svgutils and cairosvg methods to plot svg mol object

82cbaa2

reimplement DF input option and fix most tests to reflect name change…

d26d9f1

…s and altered function definitions

add svgutils and cairosvg to dependencies, install, requirements list…

d80ae4c

…s, remove broken test, add reminder to update func list in docs

remove unneccessary import

a8f5b5e

remove import

2522b5c

orbeckst reviewed Jul 1, 2023

View reviewed changes

mdpow/tests/test_automated_dihedral_analysis.py Show resolved Hide resolved

Merge branch 'develop' into plot-Mol

b6d4f61

cadeduckworth mentioned this pull request Jul 2, 2023

Fix import error in workflows base module (develop branch) #245

Closed

cadeduckworth added 2 commits July 1, 2023 20:58

change imports to follow PEP 8

7843fce

modify dihedrals workflow docs to explain figdir kwarg requirement

0cb2ddc

orbeckst self-assigned this Jul 2, 2023

orbeckst reviewed Jul 2, 2023

View reviewed changes

mdpow/workflows/base.py Outdated Show resolved Hide resolved

orbeckst reviewed Jul 2, 2023

View reviewed changes

mdpow/workflows/dihedrals.py Outdated Show resolved Hide resolved

orbeckst reviewed Jul 2, 2023

View reviewed changes

use first solvent specified to build MDAnalysis Universe

84aed57

cadeduckworth mentioned this pull request Jul 2, 2023

use relative imports inside MDPOW #257

Merged

cadeduckworth and others added 2 commits July 2, 2023 00:38

Merge branch 'develop' into plot-Mol

d231b45

Merge branch 'develop' into plot-Mol

9a7d385

cadeduckworth added 4 commits July 2, 2023 21:06

modify single solvent plotting method, add solvent count assertion

2970b24

comment expected fixture scope changes, reference issue #235

9801848

remove solute.unwrap, not needed

4ebf3c0

reference issue #260 to fix jupyter notebook figure output

292a8bd

orbeckst requested changes Jul 3, 2023

View reviewed changes

mdpow/workflows/dihedrals.py Outdated Show resolved Hide resolved

finalize single solvent figure modifications and add test

4af6de6

orbeckst approved these changes Jul 4, 2023

View reviewed changes

orbeckst merged commit 76b96d4 into develop Jul 4, 2023

orbeckst deleted the plot-Mol branch July 4, 2023 02:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dihedral Plots: RDKit Mol Object #243

Dihedral Plots: RDKit Mol Object #243

cadeduckworth commented Feb 21, 2023 •

edited

Loading

codecov bot commented Feb 21, 2023 •

edited

Loading

cadeduckworth commented Feb 21, 2023

cadeduckworth commented Feb 21, 2023

orbeckst commented Feb 21, 2023

cadeduckworth commented Feb 21, 2023 •

edited

Loading

cadeduckworth commented Feb 21, 2023 •

edited

Loading

cadeduckworth commented Feb 21, 2023

cadeduckworth commented Feb 21, 2023

cadeduckworth commented Feb 21, 2023

cadeduckworth commented Feb 21, 2023

orbeckst commented Feb 22, 2023

orbeckst commented Feb 22, 2023

cadeduckworth commented Mar 4, 2023 •

edited

Loading

cadeduckworth commented Mar 4, 2023 •

edited

Loading

cadeduckworth commented Jul 2, 2023 •

edited

Loading

orbeckst commented Jul 2, 2023

orbeckst commented Jul 2, 2023

cadeduckworth commented Jul 2, 2023

orbeckst Jul 2, 2023

cadeduckworth Jul 2, 2023

orbeckst commented Jul 2, 2023

cadeduckworth commented Jul 3, 2023 •

edited

Loading

cadeduckworth commented Jul 3, 2023

orbeckst left a comment

orbeckst commented Jul 4, 2023

Dihedral Plots: RDKit Mol Object #243

Dihedral Plots: RDKit Mol Object #243

Conversation

cadeduckworth commented Feb 21, 2023 • edited Loading

codecov bot commented Feb 21, 2023 • edited Loading

Codecov Report

cadeduckworth commented Feb 21, 2023

cadeduckworth commented Feb 21, 2023

orbeckst commented Feb 21, 2023

cadeduckworth commented Feb 21, 2023 • edited Loading

cadeduckworth commented Feb 21, 2023 • edited Loading

cadeduckworth commented Feb 21, 2023

cadeduckworth commented Feb 21, 2023

cadeduckworth commented Feb 21, 2023

cadeduckworth commented Feb 21, 2023

orbeckst commented Feb 22, 2023

orbeckst commented Feb 22, 2023

cadeduckworth commented Mar 4, 2023 • edited Loading

cadeduckworth commented Mar 4, 2023 • edited Loading

cadeduckworth commented Jul 2, 2023 • edited Loading

orbeckst commented Jul 2, 2023

orbeckst commented Jul 2, 2023

cadeduckworth commented Jul 2, 2023

orbeckst Jul 2, 2023

Choose a reason for hiding this comment

cadeduckworth Jul 2, 2023

Choose a reason for hiding this comment

orbeckst commented Jul 2, 2023

cadeduckworth commented Jul 3, 2023 • edited Loading

cadeduckworth commented Jul 3, 2023

orbeckst left a comment

Choose a reason for hiding this comment

orbeckst commented Jul 4, 2023

cadeduckworth commented Feb 21, 2023 •

edited

Loading

codecov bot commented Feb 21, 2023 •

edited

Loading

cadeduckworth commented Feb 21, 2023 •

edited

Loading

cadeduckworth commented Feb 21, 2023 •

edited

Loading

cadeduckworth commented Mar 4, 2023 •

edited

Loading

cadeduckworth commented Mar 4, 2023 •

edited

Loading

cadeduckworth commented Jul 2, 2023 •

edited

Loading

cadeduckworth commented Jul 3, 2023 •

edited

Loading