Many documentation updates, small tweaks to database interface. (#100)
lubbersnick authored Sep 9, 2024
1 parent be79f3a commit 110b831
Showing 29 changed files with 530 additions and 179 deletions.
4 changes: 3 additions & 1 deletion AUTHORS.txt
@@ -19,7 +19,7 @@ Emily Shinkle (LANL)
Michael G. Taylor (LANL)
Jan Janssen (LANL)
Cagri Kaymak (LANL)
Shuhao Zhang (CMU, LANL)
Shuhao Zhang (CMU, LANL) - Batched Optimization routines

Also thanks to testing and feedback from:

@@ -36,3 +36,5 @@ David Rosenberger
Michael Tynes
Drew Rohskopf
Neil Mehta
Alice E A Allen

32 changes: 28 additions & 4 deletions CHANGELOG.rst
@@ -3,23 +3,47 @@
Breaking changes:
-----------------

- set_e0_values has been renamed hierarchical_energy_initialization. The old name is
still provided but deprecated, and will be removed.

New Features:
-------------

- Added a new custom cuda kernel implementation using triton. These are highly performant and now the default implementation.
- Exporting a database to NPZ or H5 format after preprocessing is now just a function call away.
- SNAPjson format can now support an optional number of comment lines.
- Added Batch optimizer features in order to optimize geometries in parallel on the GPU. Algorithms include FIRE and BFGS.
- Added a new custom cuda kernel implementation using triton.
These are highly performant and now the default implementation.
- Exporting any database to NPZ or H5 format after preprocessing can be done with a method call.
- Database states can be cached to disk to simplify the restarting of training.
- Added batch geometry optimizer features in order to optimize geometries
in parallel on the GPU. Algorithms include FIRE, Newton-Raphson, and BFGS.
- Added an experiment pytorch lightning trainer to provide simple parallelized training.
- Added a molecular dynamics engine which includes the ability to batch over systems.
- Added examples pertaining to coarse graining.
- Added pair finders based on scipy KDTree to support training on large systems.
- Added a tool to drastically simplify creating ensemble models (see the sketch at the end of this list).
  The ensemblized graphs are compatible with molecular dynamics codes such as ASE and LAMMPS.
- Added the ability to weight different systems/atoms/bonds in a loss function.
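
A minimal sketch of the ensembling tool follows. This is a hedged
illustration rather than the authoritative usage (see the ensembles example
in the repository); the glob pattern is a placeholder for your own saved
models::

    from hippynn.graphs import make_ensemble

    # Build a single graph that evaluates every matched model and
    # aggregates their predictions.
    ensemble_graph, ensemble_info = make_ensemble("./my_models/model_*")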


Improvements:
-------------

- Eliminated dependency on pyanitools for loading ANI-style H5 datasets.
- SNAPjson format can now support an optional number of comment lines.
- Added unit conversion options to the LAMMPS interface.
- Improved performance of bond order regression.
- It is now possible to limit the memory usage of the MLIAP interface in LAMMPS
using a library setting.
- Provide tunable regularization of HIP-NN-TS with an epsilon parameter, and
set the default to use a better value for epsilon.


Bug Fixes:
----------

- Fixed bug where custom kernels were not launching properly on non-default GPUs.
- Fixed error when LAMMPS interface is in kokkos mode and the kokkos device was set to CPU.
- MLIAPInterface objects
- Fixed bug with RDF computer automatic initialization.

0.0.3
=======
10 changes: 10 additions & 0 deletions COPYRIGHT.txt
@@ -0,0 +1,10 @@

Copyright 2019. Triad National Security, LLC. All rights reserved.
This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos
National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S.
Department of Energy/National Nuclear Security Administration. All rights in the program are
reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear
Security Administration. The Government is granted for itself and others acting on its behalf a
nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare
derivative works, distribute copies to the public, perform publicly and display publicly, and to permit
others to do so.
10 changes: 0 additions & 10 deletions LICENSE.txt
@@ -1,15 +1,5 @@


Copyright 2019. Triad National Security, LLC. All rights reserved.
This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos
National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S.
Department of Energy/National Nuclear Security Administration. All rights in the program are
reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear
Security Administration. The Government is granted for itself and others acting on its behalf a
nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare
derivative works, distribute copies to the public, perform publicly and display publicly, and to permit
others to do so.

This program is open source under the BSD-3 License.
Redistribution and use in source and binary forms, with or without modification, are permitted
provided that the following conditions are met:
1 change: 1 addition & 0 deletions README.rst
@@ -106,6 +106,7 @@ The Journal of chemical physics, 148(24), 241715.
See AUTHORS.txt for information on authors.

See LICENSE.txt for licensing information. hippynn is licensed under the BSD-3 license.
See COPYRIGHT.txt for copyright information.

Triad National Security, LLC (Triad) owns the copyright to hippynn, which it identifies as project number LA-CC-19-093.

4 changes: 3 additions & 1 deletion docs/source/conf.py
@@ -45,9 +45,11 @@
"no-show-inheritance": True,
"special-members": "__init__",
}
autodoc_member_order = 'bysource'


# The following are highly optional, so we mock them for doc purposes.
autodoc_mock_imports = ["pyanitools", "seqm", "schnetpack", "cupy", "lammps", "numba", "triton", "pytorch_lightning"]
autodoc_mock_imports = ["pyanitools", "seqm", "schnetpack", "cupy", "lammps", "numba", "triton", "pytorch_lightning", "scipy"]


# -- Options for HTML output -------------------------------------------------
1 change: 0 additions & 1 deletion docs/source/examples/controller.rst
@@ -1,7 +1,6 @@
Controller
==========


How to define a controller for more customized control of the training process.
We assume that there is a set of ``training_modules`` assembled and a ``database`` object has been constructed.
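
A brief, hedged sketch using the classes from the controllers module, with
placeholder hyperparameters; defer to the full example in the repository
for authoritative usage::

    import torch
    from hippynn.experiment.controllers import (
        PatienceController,
        RaiseBatchSizeOnPlateau,
    )

    optimizer = torch.optim.Adam(training_modules.model.parameters(), lr=1e-3)
    # Raise the batch size when validation progress stalls.
    scheduler = RaiseBatchSizeOnPlateau(
        optimizer=optimizer, max_batch_size=128, patience=5
    )
    controller = PatienceController(
        optimizer=optimizer,
        scheduler=scheduler,
        batch_size=16,
        eval_batch_size=64,
        max_epochs=500,
        termination_patience=20,
        stopping_key="Loss",  # placeholder: the validation metric to monitor
    )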

6 changes: 3 additions & 3 deletions docs/source/examples/index.rst
@@ -3,8 +3,8 @@ Examples

Here are some examples about how to use various features in
``hippynn``. Besides the :doc:`/examples/minimal_workflow` example,
the examples are just snippets. For runnable example scripts, see
`the examples at the hippynn github repository`_
the examples are just snippets, rather than full scripts.
For runnable example scripts, see `the examples at the hippynn github repository`_

.. _`the examples at the hippynn github repository`: https://github.com/lanl/hippynn/tree/development/examples

@@ -23,5 +23,5 @@ the examples are just snippets. For runnable example scripts, see
mliap_unified
excited_states
weighted_loss

lightning

20 changes: 20 additions & 0 deletions docs/source/examples/lightning.rst
@@ -0,0 +1,20 @@
Pytorch Lightning module
========================


Hippynn includes support for distributed training using `pytorch-lightning`_.
This can be accessed using the :class:`hippynn.experiment.HippynnLightningModule` class.
The class has two class-methods for creating the lightning module using the same
types of arguments that would be used for an ordinary hippynn experiment.
These are :meth:`hippynn.experiment.HippynnLightningModule.from_experiment_setup`
and :meth:`hippynn.experiment.HippynnLightningModule.from_train_setup`.
Alternatively, you may construct and supply the arguments for the module yourself.

Finally, in addition to the usual pytorch lightning arguments,
the hippynn lightning module saves an additional file, `experiment_structure.pt`,
which needs to be provided as an argument to the
:meth:`hippynn.experiment.HippynnLightningModule.load_from_checkpoint` constructor.
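
A minimal sketch of a distributed run, assuming ``training_modules``,
``database``, and ``setup_params`` from a standard hippynn experiment; the
unpacked return values and the ``structure_file`` keyword are assumptions
here, so check the API documentation::

    import pytorch_lightning as pl
    from hippynn.experiment import HippynnLightningModule

    lightning_module, datamodule = HippynnLightningModule.from_experiment_setup(
        training_modules, database, setup_params
    )
    trainer = pl.Trainer(accelerator="gpu", devices=2)
    trainer.fit(lightning_module, datamodule=datamodule)

    # Reloading requires the extra file saved during training:
    restored = HippynnLightningModule.load_from_checkpoint(
        "checkpoints/last.ckpt", structure_file="experiment_structure.pt"
    )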


.. _pytorch-lightning: https://github.com/Lightning-AI/pytorch-lightning

40 changes: 31 additions & 9 deletions docs/source/index.rst
@@ -8,31 +8,53 @@ We hope you enjoy your stay.
What is hippynn?
================

`hippynn` is a python library for machine learning on atomistic systems.
``hippynn`` is a python library for machine learning on atomistic systems
using `pytorch`_.
We aim to provide high-performance modular design so that different
components can be re-used, extended, or added to. You can find more information
at the :doc:`/user_guide/features` page. The development home is located
at `the hippynn github repository`_, which also contains `many example files`_
about overall library features at the :doc:`/user_guide/features` page.
The development home is located at `the hippynn github repository`_, which also contains `many example files`_.
Additionally, the :doc:`user guide </user_guide/index>` aims to describe abstract
aspects of the library, while the
:doc:`examples documentation section </examples/index>` aims to show
more concretely how to perform tasks with hippynn. Finally, the
:doc:`api documentation </api_documentation/hippynn>` contains a comprehensive
listing of the library components and their documentation.

The main components of hippynn are constructing models, loading databases,
training the models to those databases, making predictions on new databases,
and interfacing with other atomistic codes. In particular, we provide interfaces
to `ASE`_ (prediction), `PYSEQM`_ (training/prediction), and `LAMMPS`_ (prediction).
and interfacing with other atomistic codes for operations such as molecular dynamics.
In particular, we provide interfaces to `ASE`_ (prediction),
`PYSEQM`_ (training/prediction), and `LAMMPS`_ (prediction).
hippynn is also used within `ALF`_ for generating machine learned potentials
along with their training data completely from scratch.

Multiple formats for training data are supported, including
Numpy arrays, the ASE Database, `fitSNAP`_ JSON format, and `ANI HDF5 files`_.
Multiple :doc:`database formats </user_guide/databases>` for training data are supported, including
Numpy arrays, `ASE`_-compatible formats, `FitSNAP`_ JSON format, and `ANI HDF5 files`_.

``hippynn`` includes many tools, such as an :doc:`ASE calculator</examples/ase_calculator>`,
a :doc:`LAMMPS MLIAP interface</examples/mliap_unified>`,
:doc:`batched prediction </examples/predictor>` and batched geometry optimization,
:doc:`automatic ensemble creation </examples/ensembles>`,
:doc:`restarting training from checkpoints </examples/restarting>`,
:doc:`sample-weighted loss functions </examples/weighted_loss>`,
:doc:`distributed training with pytorch lightning </examples/lightning>`,
and more.

``hippynn`` is highly modular, and if you are a model developer, interfacing your
pytorch model into the hippynn node/graph system will make it simple and easy for users
to build models of energy, charge, bond order, excited state energies, and more.

.. _`ASE`: https://wiki.fysik.dtu.dk/ase/
.. _`PYSEQM`: https://github.com/lanl/PYSEQM/
.. _`LAMMPS`: https://www.lammps.org
.. _`fitSNAP`: https://github.com/FitSNAP/FitSNAP
.. _`FitSNAP`: https://github.com/FitSNAP/FitSNAP
.. _`ANI HDF5 files`: https://doi.org/10.1038/s41597-020-0473-z
.. _`ALF`: https://github.com/lanl/ALF/

.. _`the hippynn github repository`: https://github.com/lanl/hippynn/
.. _`many example files`: https://github.com/lanl/hippynn/tree/development/examples
.. _`pytorch`: https://pytorch.org


.. toctree::
4 changes: 2 additions & 2 deletions docs/source/installation.rst
@@ -2,7 +2,6 @@ Installation
============



Requirements
^^^^^^^^^^^^

@@ -43,6 +42,8 @@ Interfacing codes:
.. _LAMMPS: https://www.lammps.org/
.. _PYSEQM: https://github.com/lanl/PYSEQM
.. _pytorch-lightning: https://github.com/Lightning-AI/pytorch-lightning
.. _hippynn: https://github.com/lanl/hippynn/


Installation Instructions
^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -67,7 +68,6 @@ Clone the hippynn_ repository and navigate into it, e.g.::
$ git clone https://github.com/lanl/hippynn.git
$ cd hippynn

.. _hippynn: https://github.com/lanl/hippynn/


Dependencies using conda
2 changes: 1 addition & 1 deletion docs/source/user_guide/ckernels.rst
@@ -60,7 +60,7 @@ The three custom kernels correspond to the interaction sum in hip-nn:

.. math::
a'_{i,a} = = \sum_{\nu,b} V^\nu_{a,b} e^{\nu}_{i,b}
a'_{i,a} = \sum_{\nu,b} V^\nu_{a,b} e^{\nu}_{i,b}
e^{\nu}_{i,a} = \sum_p s^\nu_{p} z_{p_j,a}
5 changes: 3 additions & 2 deletions docs/source/user_guide/concepts.rst
@@ -45,8 +45,9 @@ Graphs

A :class:`~hippynn.graphs.GraphModule` is a 'compiled' set of nodes; a ``torch.nn.Module`` that executes the graph.

GraphModules are used in a number of places within hippynn.

GraphModules are used in a number of places within hippynn: the model,
the loss, the evaluator, the predictor, the ASE interface,
and the LAMMPS interface objects all use them.

Experiment
^^^^^^^^^^
43 changes: 38 additions & 5 deletions docs/source/user_guide/databases.rst
@@ -31,12 +31,45 @@ the [i,j] element of the cell gives the j cartesian coordinate of cell vector i.
massive difficulties fitting to periodic boundary conditions, you may check the transposed version
of your cell data, or compute the RDF.

Database Formats and notes
---------------------------

ASE Objects Database handling
----------------------------------------------------------
If your training data is stored as ASE files of any type (.json,.db,.xyz,.traj ... etc.) it can be loaded directly
a Database for hippynn.
Numpy arrays on disk
........................

See :class:`hippynn.databases.NPZDatabase` (if arrays are stored
in a `.npz` dictionary) or :class:`hippynn.databases.DirectoryDatabase`
(if each array is in its own file).
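
For instance, a hedged sketch of the ``.npz`` case, where the file name,
array names, and split fractions are placeholders::

    from hippynn.databases import NPZDatabase

    database = NPZDatabase(
        "my_data.npz",
        inputs=["species", "coordinates"],
        targets=["energy"],
        seed=0,            # seed for reproducible splits
        test_size=0.1,
        valid_size=0.1,
    )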

Numpy arrays in memory
........................

Use the base :class:`hippynn.databases.Database` class directly to initialize
a database from a dictionary mapping db_names to numpy arrays.
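
A sketch of this pattern, with placeholder arrays and db_names::

    import numpy as np
    from hippynn.databases import Database

    arr_dict = {
        "species": np.load("species.npy"),          # (n_systems, n_atoms)
        "coordinates": np.load("coordinates.npy"),  # (n_systems, n_atoms, 3)
        "energy": np.load("energy.npy"),            # (n_systems, 1)
    }
    database = Database(
        arr_dict,
        inputs=["species", "coordinates"],
        targets=["energy"],
        seed=0,
        test_size=0.1,
        valid_size=0.1,
    )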

pyanitools H5 files
........................

See :class:`hippynn.databases.PyAniFileDB` and :class:`hippynn.databases.PyAniDirectoryDB`.

This format requires ``h5py`` and ``ase`` to be installed.
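
A hedged sketch; the file name and key names are placeholders, and the
``species_key`` argument is an assumption, so consult the class
documentation::

    from hippynn.databases import PyAniFileDB

    database = PyAniFileDB(
        "ani_data.h5",
        species_key="species",
        inputs=["species", "coordinates"],
        targets=["energies"],
        seed=0,
    )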

SNAP JSON Format
........................

See :class:`hippynn.databases.SNAPDirectoryDatabase`. This format requires ``ase`` to be installed.

For more information on this format, see the FitSNAP_ software.

.. _FitSNAP: https://fitsnap.github.io
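
A hedged sketch; the directory and argument names below are assumptions,
so consult the class documentation::

    from hippynn.databases import SNAPDirectoryDatabase

    database = SNAPDirectoryDatabase(
        "./snap_json_data/",
        inputs=["species", "coordinates"],
        targets=["energy", "forces"],
        seed=0,
    )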

ASE Database
........................

If your training data is stored as ASE files of any type
(.json, .db, .xyz, .traj, etc.), it can be loaded directly
as a Database for hippynn.

The :class:`~hippynn.databases.AseDatabase` class is available when ASE is installed.

See ~/examples/ase_db_example.py for a basic example utilzing the class.
See ~/examples/ase_db_example.py for a basic example utilizing the class.
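
A hedged sketch; the file name is a placeholder and the argument names are
assumptions, so defer to the example script above::

    from hippynn.databases import AseDatabase

    database = AseDatabase(
        "training_data.traj",
        inputs=["numbers", "positions"],
        targets=["energy"],
        seed=0,
    )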
15 changes: 11 additions & 4 deletions docs/source/user_guide/features.rst
@@ -11,7 +11,7 @@ Modular set of pytorch layers for atomistic operations
if you want to use them in your scripts without using the rest of the features
provided here -- no problem!

API documentation for :mod:`~hippynn.layers`
API documentation for :mod:`~hippynn.layers` and :mod:`~hippynn.networks`

Graph level API for simple and flexible construction of models from pytorch components.
---------------------------------------------------------------------------------------
@@ -26,6 +26,12 @@ Graph level API for simple and flexible construction of models from pytorch components.

API documentation for :mod:`~hippynn.graphs`

For more information on nodes and graphs, see the `graph exploration ipython notebook`_, which can also
be found in the example files.

.. _graph exploration ipython notebook: https://github.com/lanl/hippynn/blob/development/examples/graph_exploration.ipynb


Plot level API for tracking your training.
----------------------------------------------------------
- Using the graph API, define quantities to evaluate before, during, or after training as
@@ -46,7 +52,7 @@ API documentation for :mod:`~hippynn.experiment`
Custom Kernels for fast execution
----------------------------------------------------------
- Certain operations are not efficiently written in pure pytorch, so we provide
alternative implementations with ``numba``
alternative implementations.
- These are directly linked in with pytorch Autograd -- use them like native pytorch functions.
- These provide advantages in memory footprint and speed
- Includes CPU and GPU execution for custom kernels
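
A sketch of selecting an implementation explicitly; the string options are
assumptions that require the corresponding package to be installed::

    import hippynn

    # True picks the best available implementation automatically;
    # a string such as "triton", "numba", or "cupy" requests a specific one.
    hippynn.custom_kernels.set_custom_kernels(True)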
@@ -55,7 +61,8 @@ More information at :doc:`this page </user_guide/ckernels>`

Interfaces
----------------------------------------------------------
- ASE: Define `ase` calculators based on the graph-level API.
- PYSEQM: Use `pyseqm` calculations as nodes in a graph.
- ASE: Define ``ase`` calculators based on the graph-level API.
- PYSEQM: Use ``pyseqm`` calculations as nodes in a graph.
- LAMMPS: Create a file for use as a `pair style mliap` object.

API documentation for :mod:`~hippynn.interfaces`
5 changes: 5 additions & 0 deletions docs/source/user_guide/settings.rst
@@ -69,3 +69,8 @@ The following settings are available:
- float between 0 and 1
- 1.0
- no
* - TIMEPLOT_AUTOSCALING
- If True, only provide log-scaled plots of training quantities over time if warranted by the data. If False, always produce all plots in linear, log, and loglog scales.
- bool
- True
- yes
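
As a hedged sketch, a setting like this can be set through the environment
before import (assuming the standard ``HIPPYNN_`` variable prefix) or, when
marked dynamically updatable, changed at runtime::

    import os
    os.environ["HIPPYNN_TIMEPLOT_AUTOSCALING"] = "False"  # before importing

    import hippynn
    hippynn.settings.TIMEPLOT_AUTOSCALING = False  # dynamic update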
2 changes: 1 addition & 1 deletion examples/ani1x_training.py
@@ -108,7 +108,7 @@ def load_db(db_info, en_name, force_name, seed, anidata_location, n_workers):
found_indices = ~np.isnan(database.arr_dict[en_name])
database.arr_dict = {k: v[found_indices] for k, v in database.arr_dict.items()}

database.make_trainvalidtest_split(0.1, 0.1)
database.make_trainvalidtest_split(test_size=0.1, valid_size=0.1)
return database

