Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Analysis #38

Closed
wants to merge 39 commits into from
Closed

[WIP] Analysis #38

wants to merge 39 commits into from

Conversation

lilyminium
Copy link
Member

@lilyminium lilyminium commented Feb 7, 2020

Closes #4, #7, most of #29

This is taking much longer than I thought it would. I would appreciate any feedback on what I have so far, what I should prioritise next, and also any changes I should make before I try to finish the rest of the notebooks.

Every notebook aims to show a useful use case of a class or function in MDAnalysis, point out variables that users will likely need to change, visualise the results, and interpret them in some way.

Notebooks on Analysis modules

  • base
  • align
    • alignto
    • AlignTraj
      • mention match_atoms (new in 0.21.0 or 1.0)
    • Aligning a trajectory to various frames
    • AverageStructure (new in 0.21.0 or 1.0)
  • rms
    • rmsd
    • RMSD
    • RMSF
    • pairwise RMSD
  • contacts
    • Contacts (1D) (notebook currently for develop version)
    • q1q2 (2D)
    • Writing a custom analysis
  • distances
    • dist
    • distance_array
    • self_distance_array
    • between -- this is such a small function... might put it as a note in the Selection Language page
  • helanal is very broken. Low priority
  • psa
    • PSAnalysis
    • [ ] PSAPair don't think people will use this much
    • [ ] Pair don't think people will use this much
  • encore
    • hes (harmonic ensemble similarity) (currently broken, waiting on Changes encore.hes() details output mdanalysis#2484)
    • ces (clustering ensemble similarity)
    • dres (dimension reduction ensemble similarity)
    • ces_convergence
    • dres_convergence
    • Indirectly (the following modules are to support the above)
      • clustering
      • dimensionality_reduction
      • creating your own DimensionalityReductionMethod
      • creating your own ClusteringMethod
      • [ ] confdistmatrix not for casual users
      • [ ] covariance_matrix not for casual users
      • [ ] bootstrap not for casual users
  • hydrogen_bonds
  • hbonds
    • [ ] hbond_analysis soon to be deprecated?
    • hbond_autocorrel (procrastinating due to hope that this will get moved to hydrogen_bonds)
    • WaterBridgeAnalysis (procrastinating due to hope that this will get moved to hydrogen_bonds)
  • hole
    • [x] HOLE
    • [x] HOLEtraj
    • hole2
  • leaflet
    • LeafletFinder
  • nuclinfo: this contains assorted very self-explanatory functions. Low priority.
  • polymer
  • density: maybe someone will do upgrade DensityAnalysis to use AnalysisBase mdanalysis#2502 ?
    • DensityAnalysis density_from_Universe
    • [ ] density_from_PDB
    • Bfactor2RMSF
  • lineardensity: waiting on Have LinearDensity support updating AtomGroups, more intuitive names, save bins mdanalysis#2508. Low priority
    • LinearDensity
  • waterdynamics: maybe someone will move this to AnalysisBase....
    • HydrogenBondLifetimes: not sure how this is different from hbond_autocorrel yet
    • WaterOrientationalRelaxation
    • AngularDistribution
    • Mean Square Displacement
    • SurvivalProbability
  • diffusionmap
    • DistanceMatrix
    • DiffusionMap
  • pca

Other notebooks added

References and executing the notebooks

In #34 I mentioned that a pro of having examples in Jupyter notebooks is that the examples get checked by executing the notebook. To ensure that the notebooks get checked every time the documentation is built, I've written a script scripts/clean_example_notebooks.py that takes in a list of notebooks. For each notebook, it:

  1. Goes through and collects references
  2. Converts any shorthand references to an inline citation that links to the paper (in the notebook) or the actual reference (online HTML)
  3. Builds a bibliography cell at the end of the notebook
  4. Executes the notebook
  5. On success, adds or updates a Last executed line to the first cell
  6. If successful, rewrites the notebook.
  7. On failure, it collects all the errors and raises them all at once. So doc building should fail.

This script already successfully fails on the harmonic ensemble clustering notebook. In this notebook, the shorthand references haven't been converted yet.

Caveats

The NGLView widget can only save the molecule data if it is displayed in a browser. These notebooks are not executed in a browser with the script, and there are only a few of them, so right now I just re-run them after executing through the script. If this becomes too annoying I can convert to selenium.

(The script will print which notebooks have NGLView in them and need to be re-executed when it's done.)

References

nbsphinx can handle citations through sphinxcontrib-bibtex, so I moved the user guide references to a bibtex file references.bib. Pasting citations gets quite tedious when making notebooks, so I use a shorthand with the pattern '#{bibtex-key}'. e.g. #michaud-agrawal_mdanalysis_2011 gets converted to the reference with a bibtex key of michaud-agrawal_mdanalysis_2011. The full HTML in the notebook looks like <a data-cite="michaud-agrawal_mdanalysis_2011" href="https://doi.org/10.1002/jcc.21787">Michaud-Agrawal *et al.*, 2011</a>

The inline display in the browser looks quite ugly (e.g. first cell here) but I'm not sure how to change that in sphinxcontrib-bibtex.

Structure of docs

Right now, the notebooks are all displayed in Examples and in the sidebar of Analysis. It can be a bit distracting to click on a notebook from the main Analysis page, and suddenly realise you've moved to the Examples section in the sidebar. I might fiddle with this later.

@lilyminium
Copy link
Member Author

As always, a built version is at https://lilyminium.github.io/UserGuide/

@orbeckst
Copy link
Member

orbeckst commented Feb 7, 2020

RE: PSA — we did a more real-life example in https://www.mdanalysis.org/SPIDAL-MDAnalysis-Midas-tutorial/ and the data set is available as https://www.mdanalysis.org/MDAnalysisData/adk_transitions.html ; @sseyler created a tutorial in 2015 https://github.com/Becksteinlab/PSAnalysisTutorial but this might be out-of-date with the current code.

By the way, finding pairs is a useful tool once one wants to know why two trajectories differ.

I don't suggest that you change the example that you have. But perhaps add links to the other PSA tutorials, with a note that these showe more complicated cases and more advanced functions but might be somewhat out of date. If we have someone particularly interested in PSA then they can update the User Guide.

@orbeckst
Copy link
Member

orbeckst commented Feb 7, 2020

MDAnalysis.analysis.hbonds is indeed going to be only in 1.0 and we recommend MDAnalysis.analysis.hydrogenbonding. Don't bother adding examples with hbonds.

@orbeckst
Copy link
Member

orbeckst commented Feb 7, 2020

It seems pretty clear that we will go to 1.0.0 and that we're not bothering with 0.21.0 — @richardjgowers ?? — so change references from 0.21 to 1.0. In general, I would say you can simply state that the User Guide requires MDAnalysis 1.0 or better. If you link to specific classes such as AverageStructure then they have their own versionadded notes.

@orbeckst
Copy link
Member

orbeckst commented Feb 7, 2020

https://lilyminium.github.io/UserGuide/examples/analysis/alignment_and_rms/rmsf.html#Plotting-RMSF : "As we can see, the LID and NMP residues indeed move much more compared to the rest of the enzyme."

I would not say this about the PSF/DCD data from the tests. It's a non-equilibrium trajectory that is biased to make a transition, so computing fluctuations on it seems inappropriate. A better example would be a real AdK equilibrium trajectory https://www.mdanalysis.org/MDAnalysisData/adk_equilibrium.html

If you want to stick with the test trajectory then I would say that the apparent fluctuations in these domains are high but in this particular case it is due to the fact that these domains move from their closed to their open conformation. Note that this trajectory does not sample from the equilibrium distribution so it's not really meaningful to calculate these fluctuations (but we do it anyway for demonstration purposes).

@richardjgowers
Copy link
Member

richardjgowers commented Feb 7, 2020 via email

@lilyminium
Copy link
Member Author

lilyminium commented Feb 8, 2020

@orbeckst Thanks for the comments, I wasn't aware of the MDAnalysisData package. This makes things a lot easier.

In general, I would say you can simply state that the User Guide requires MDAnalysis 1.0 or better.

Should I remove the Minimum version lines from the first cell then? I was also wondering if the Last updated was overkill.

Edit: I mostly look at this as a Jupyter notebook where the first cell looks neat, but all the info takes up quite a lot of space in the HTML conversion.

Screenshot 2020-02-09 at 12 42 23 PM
vs
Screenshot 2020-02-09 at 12 42 34 PM


/examples/analysis/polymers_and_membranes/polymer
/examples/analysis/polymers_and_membranes/hole
/examples/analysis/polymers_and_membranes/hole2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hole2 docs show up with the same title as the hole ones. Perhaps put "Deprecated", "Old", or somthing else in the title of the old module?

Copy link
Member

@orbeckst orbeckst Feb 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

References in the hole notebooks did not resolve eg the Note:

The classes in MDAnalysis.analysis.hole are wrappers around the HOLE program. Please cite ([smart_pore_1993], [smart_hole_1996]) when using this module in published work.

Copy link
Member

@orbeckst orbeckst Feb 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hole2 docs still contain reference to HOLEtraj

One of the limitations of the hole program is that it can only accept PDB files. In order to use other formats with hole, or to run hole on trajectories, we can use the hole.HOLEtraj class with an MDAnalysis.Universe

and ht.profiles instead of ha.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gif of the hole surface is cute but the gA channel is not properly shown in VMD because of the distortions along the normal mode. To get a nice representation do the following:

import MDAnalysis as mda; from MDAnalysis.tests import datafiles as data

# write out the structure without ETA (which elNemo ignored :-p)
u = mda.Universe(data.PDB_HOLE)
u.select_atoms("not resname ETA").write("elNemo_ga.pdb")

Use the above

vmd elNemo_ga.pdb path/to/1grm_elNemo_mode7.pdb

and delete the first frame in the trajectory.

We need to to remove ETA (the ethanolamine C-terminus) from the original PDB file because the elNemo trajectory does not contain it and so you cannot directly use PDB_HOLE as the "topology" for MULTI_PDB_HOLE.

This should give you a nicer representation of the channel around the pore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use your new plot_order_parameters(..., aggregrator=...) function to do the most common application of a hole trajectory profile, the "average HOLE profile" with "mean(R) ± std(R)" (as at the end of https://github.com/MDAnalysis/binder-notebook/blob/master/notebooks/analysis/hole-basics.ipynb )?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but you could plot the fillbetween standard deviation with on the returned axis from aggregator=np.mean, or just make the plot yourself with the data returned from over_order_parameters. I could add another plotting function for this?

@lilyminium
Copy link
Member Author

Thanks @orbeckst. Is this approximately what you were thinking? (Is there a way to use GROMACS to wrap molecules back into the unit cell?)

@richardjgowers
Copy link
Member

Those embedded nglviews are awesome!

@orbeckst
Copy link
Member

Awesome, I am so happy to see the transform + density workflow actually working (probably glacially slow but that's just an engineering problem ;-) ).

Apparently, nglview can display density data, at least it is mentioned in the API docs. Maybe @arose can give us a hint how to load a DX file ... or directly use a NumPy array with a density?

@orbeckst
Copy link
Member

orbeckst commented Feb 19, 2020

(Is there a way to use GROMACS to wrap molecules back into the unit cell?)
The standard Gromacs workflow is

printf "protein\nsystem\n"| gmx trjconv -f md.xtc -s md.tpr -o md_centered.xtc -pbc mol -center -ur compact
printf "backbone\nsystem\n" | gmx trjconv -s md.tpr -f md_centered.xtc -o md_fit.xtc -fit rot+trans

The pbc mol -center step will pack all molecules around whatever is specified as the center (in my example the protein) and make all molecules whole. This means that some water molecules or lipids will poke outside the actual box boundaries but that's generally what you want to end up with. Once this is done, the rotational/translational superposition is performed on the centered system.

EDIT: Gromacs also has the really nice -ur compact feature where it converts the triclinic unitcell into a compact representation, namely truncated octahedron or rhombic dodecahedron. MDAnalysis cannot do this at the moment.

@arose
Copy link

arose commented Feb 19, 2020

Maybe @arose can give us a hint how to load a DX file ...

I think view.add_component('my.dx') should be enough. There is no way to directly use a numpy array. Instead of .dx I would suggest to use a binary format like .ccp4 for faster parsing if you don't already have the dx file. Note that there is also a binary dx format variant.

@orbeckst
Copy link
Member

Thanks!

In principle we can write ccp4 ... unless there's a bug in the writer MDAnalysis/GridDataFormats#76

@lilyminium
Copy link
Member Author

This has merge conflicts up the wazoo and I'm not sure how many of the notebooks are even accurate to behaviour anymore; it's probably best to separate each one out and merge one-by-one after checking. I do still have them up at my fork for easier viewing.

@lilyminium
Copy link
Member Author

I can do this over the next few days myself but help is very welcome.

@lilyminium
Copy link
Member Author

Superseded by #125 and #126.

@lilyminium lilyminium closed this Dec 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
version-1.0 Contains 1.0 features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Analysis (general)
4 participants