Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dihedral Plots: RDKit Mol Object #243

Merged
merged 90 commits into from
Jul 4, 2023
Merged

Dihedral Plots: RDKit Mol Object #243

merged 90 commits into from
Jul 4, 2023

Conversation

cadeduckworth
Copy link
Contributor

@cadeduckworth cadeduckworth commented Feb 21, 2023

@codecov
Copy link

codecov bot commented Feb 21, 2023

Codecov Report

Merging #243 (4af6de6) into develop (3b93aad) will increase coverage by 0.38%.
The diff coverage is 99.02%.

@@             Coverage Diff             @@
##           develop     #243      +/-   ##
===========================================
+ Coverage    80.73%   81.12%   +0.38%     
===========================================
  Files           15       15              
  Lines         1905     1960      +55     
  Branches       294      296       +2     
===========================================
+ Hits          1538     1590      +52     
- Misses         276      278       +2     
- Partials        91       92       +1     
Impacted Files Coverage Δ
mdpow/workflows/dihedrals.py 95.93% <99.01%> (+1.05%) ⬆️
mdpow/workflows/base.py 77.55% <100.00%> (-4.09%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@cadeduckworth
Copy link
Contributor Author

@orbeckst
I think I fixed the bond highlighting issue and image size issue.
I incorporated the new algorithm/methods with the 'low res' plotting functionality so that there is a functioning base to improve upon.

Currently going back and looking at our previous conversations and the new links you sent to improve upon the existing methods..

@cadeduckworth
Copy link
Contributor Author

image

@orbeckst
Copy link
Member

That looks better. Very good. Did you check other examples, too?

The other comments were along the lines

  • add numerical labels (I shared a notebook with prettyfy() I think)
  • add the dihedral name using atom numbers as well (... need to find original discussion)

@cadeduckworth
Copy link
Contributor Author

cadeduckworth commented Feb 21, 2023

I checked the plots for that molecule that I have been testing on, but no others yet.

I changed a lot of things that will be helpful moving forward, especially for testing.
The information now used for Mol objects almost all come directly from atom_indices, which uses the RDKit conversion method that is tested and verified from PR#217.

For each dihedral group in a molecule the atom_indices and bond_indices are saved in a dictionary, where the key is in the format of 'C1-C2-C3-C4' to correspond to MDAnalysis selection style, so that the results DataFrame can be the source that calls the associated plotting graphics info. That way, the DF determines which atom and bond indices are highlighted, rather than an iterative method.

EDIT (reminder): @orbeckst - relevant to discussion from our meeting on 6/29/23.. I did not remember because it was several months ago, but there was a method to the madness all along!

@cadeduckworth
Copy link
Contributor Author

cadeduckworth commented Feb 21, 2023

@orbeckst

  • found the notebook with prettyfy(), will add changes

@cadeduckworth
Copy link
Contributor Author

atom_indices added to label Mol object on plots

To-Do:

  • fix image size to auto adjust by size of Mol object
  • switch to SVG or enhance image quality

@cadeduckworth
Copy link
Contributor Author

image

@cadeduckworth
Copy link
Contributor Author

cleaner plot titles

image

@cadeduckworth
Copy link
Contributor Author

This is where I have landed with the combined RDKIT/matplotlib to SVG composition.
(SVG converted to PDF attached below)

With the number of atoms in the SAMPL9 molecules, and the small text size of the atom indices, the SVG plots will look much better for those purposes. It should also maintain a consistent box size for all of the molecules, so the orientation or overall size of the molecule should not cause the image to zoom in or out like the MolToImage RDKit PDF output method.

Regarding the PDF output method, I figured out that, because the default size of the MolToImage conversion is 300x300, 'square' shaped molecules end up appearing bigger under increased extent values, which means the 'scale' cannot really afford to be 'zoomed in' for horizontally oriented molecules that assume more of a 'linear' conformation. There exists a noticeable difference between the aspect options, auto and equal, but changing those without changing the frame size and other parameters causes distortion. All of this of course can be changed, but would be difficult to detect how to do so for each case. For the current methods that output PDF, the best combination I have come across is the most recently pushed commit (see also, plot directly beneath it), and likely suits most sizes/shapes (needs further testing to be sure).

Regarding the SVG output method, the svgutil code, for me, has been tricky for size, scale, and placement. There is a way to get the sizes after the original Mol or MPL objects are saved to SVG files which could make things easier, but it seems like minor changes in different steps of processing can cause big downstream changes for placement/proportions/scales. There is also a method that converts units between px, cm, pt, etc., but I have not had a chance to get into it yet.

It might be best to use what I have developed for SVGs as an option that can be specified, for now, and use the pdf method as default, unless conda has something similar to inkscape that we can include as a dependency in order to facilitate viewing, conversion, and notebook embedding?

test.pdf

@orbeckst
Copy link
Member

The PDF looks sharp and can be zoomed in ✅

The next thing is to make sure that we don't need to scale it to out it into a paper. To make it publication-ready we would want it to be "at final size", i.e., typical two-column wide figure (about 190 mm, just randomly used https://writing.stackexchange.com/questions/21658/what-is-the-image-size-in-scientific-paper-if-indicated-as-a-single-1-5-or-2-c ) and font size should not be smaller than 9 pt. I think the matplotlib default is 10 so if you make figures at final size, everything should look good.

@orbeckst
Copy link
Member

@cadeduckworth
Copy link
Contributor Author

cadeduckworth commented Mar 4, 2023

@orbeckst

Here are some plotting results after implementing existing method with svgutils and new method with cairosvg into dihedrals.py module.

svgutils is being used to create the SVG files/graphics and layer them.
cairosvg is being used to convert to pdf and scale/save in the desired size.
I added a kwarg to adjust the final exported pdf plot width, with the default set to 190mm.
I also added conversion to pixels for compatibility with cairosvg.svg2pdf.

Example with a fairly cluttered molecule from SAMPL9 set (step=1000), before I added a little white space to the right border to allow for longer solvent names:
S07_C21-C22-C25-F27_violins.pdf

SM25 & SM26 (testing resources) after increasing whitespace on right border:
SM25_C2-N3-S4-C5_violins.pdf
SM26_O1-S2-C3-C4_violins.pdf

I have not been able to figure out how to suppress matplotlib output, which does not contain the Mol image.

Moving forward I am working on fixing the tests to accommodate what was changed during development. After I achieve full coverage again I will request a review and start making changes to the documentation to reflect changes and new functionality.

EDIT: see parentheses

@cadeduckworth
Copy link
Contributor Author

cadeduckworth commented Mar 4, 2023

@orbeckst

The codecov report of 98.80% is misleading; some of my current tests are weak at best, and some are possibly not testing anything at all. Once the new SVG methods are finalized, more specific tests need to be written, existing tests need to be revisited, and fixture scopes need to be optimized.
Logger/error info needs to be added or updated. Once PR229 is merged the test_workflows_base.py module needs to test iterative usage of new plotting functionality for multiple projects. Docs will need a thorough rewrite. Tutorials and examples will need to be revisited.

The 'plotting only one solvent issue' needs to be revisited (see most recent comments, SM25 & SM26 pdfs). The current method almost works, but results in an unnecessary solvent label in the legend. I have already already come up with a way to probably fix this but have not tested it yet.

The plots look nice, they seem scalable (a few examples showed signs of adequate scaling, see also most recent comment for S07, SM255, SM26), and I am pleased with the combination of svgutils and cairosvg. I am not sure if the problem exists outside of Jupyter or IDEs, but I tried my hand at suppressing the matplotlib output with no luck.. and the output is a plot which does not contain the Mol object image, so it is annoying. I will look more into this.

EDIT: As a forewarning the changed files here are littered with comments and bad practices, so I will try to clean those up before you get a chance to review. This one required changes on more levels, so PR229 and PR242 are more complete and take precedence.

@cadeduckworth
Copy link
Contributor Author

cadeduckworth commented Jul 2, 2023

@orbeckst
Please see lines 143 and 144 (dihedrals.build_universe). If MDPOW supports single solvent Pow calculations (does it?), should we not always assume that results will be available for water by default?

EDIT: Maybe it is best to default to using the topology and trajectory from the first solvent specified?

@orbeckst
Copy link
Member

orbeckst commented Jul 2, 2023

It supports single solvent solvation free calculations, yes. You're right, the water directory does not necessarily exist.

However, just raise an issue that we improve build_universe() to be smarter about finding the topology. Right now it's working for everything that we need.

Let's not use the ITP file because that might not be present, e.g., when someone else ran just the FEP on a cluster.

@orbeckst
Copy link
Member

orbeckst commented Jul 2, 2023

Actually, your idea to use the first solvent seems pretty good and easy enough to implement. You could even make solvent='water' a kwarg and then the default is the current behavior.

@cadeduckworth
Copy link
Contributor Author

It supports single solvent solvation free calculations, yes. You're right, the water directory does not necessarily exist.

However, just raise an issue that we improve build_universe() to be smarter about finding the topology. Right now it's working for everything that we need.

Let's not use the ITP file because that might not be present, e.g., when someone else ran just the FEP on a cluster.

ITP file?

@orbeckst orbeckst self-assigned this Jul 2, 2023
mdpow/workflows/base.py Outdated Show resolved Hide resolved

assert 'all analyses completed' in caplog.text, ('automated_dihedral_analysis '
'did not iteratively run to completion for the provided project')
assert 'all analyses completed' in caplog.text, ('automated_dihedral_analysis '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unindent — that's better outside the pytest.raises block.

As a general rule reduce the amount of code in a with block to the essentials as this makes clearer what's relevant for testing the condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the test fails without the indent, possibly because of what is being tested or how it is written in the module. I am not certain of this, but making the change last night resulted in a passing test.

@orbeckst
Copy link
Member

orbeckst commented Jul 2, 2023

ITP file?

I was thinking along what I used for the dihedral sampler: use the initial files in the top level directories, including the topology file, but on second thought it makes no sense ebcause that files does not contain the solvent and is not useful for using with the trajectory. The TPR is really what we want.

Sorry to have confused you.

@cadeduckworth
Copy link
Contributor Author

cadeduckworth commented Jul 3, 2023

in response to

Docs say hue_order takes a list of strings, so we will do the list conversion as part of the kwarg.

EDIT: I think hue_order=["water", "octanol"] was the original kwarg from the dihedral violins code you provided me with to start developing the figure functions a long while ago.

@cadeduckworth
Copy link
Contributor Author

@orbeckst Ready for review, all requested changes resolved or responded to. I raised several new issues and commented or updated some existing issues to document what needs to be done soon, but is not immediately necessary for merging this PR. Minor reviewing left to do on my part tomorrow as a sanity check, but I hope to get this merged and resubmit the SAMPL9 jobs Monday afternoon.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one code compactifitransmogration.

mdpow/workflows/dihedrals.py Outdated Show resolved Hide resolved
@orbeckst orbeckst merged commit 76b96d4 into develop Jul 4, 2023
@orbeckst orbeckst deleted the plot-Mol branch July 4, 2023 02:34
@orbeckst
Copy link
Member

orbeckst commented Jul 4, 2023

A momentous occasion: PR #243 has been merged. 🚀 🌟

Congratulations!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants