Many documentation updates, small tweaks to database interface. (#100)
lubbersnick authored Sep 9, 2024
1 parent be79f3a commit 110b831
Showing 29 changed files with 530 additions and 179 deletions.
4 changes: 3 additions & 1 deletion AUTHORS.txt
@@ -19,7 +19,7 @@ Emily Shinkle (LANL)
Michael G. Taylor (LANL)
Jan Janssen (LANL)
Cagri Kaymak (LANL)
Shuhao Zhang (CMU, LANL)
Shuhao Zhang (CMU, LANL) - Batched Optimization routines

Also thanks to testing and feedback from:

@@ -36,3 +36,5 @@ David Rosenberger
Michael Tynes
Drew Rohskopf
Neil Mehta
Alice E A Allen

32 changes: 28 additions & 4 deletions CHANGELOG.rst
@@ -3,23 +3,47 @@
Breaking changes:
-----------------

- set_e0_values has been renamed hierarchical_energy_initialization. The old name is
still provided but deprecated, and will be removed.

New Features:
-------------

- Added a new custom cuda kernel implementation using triton. These are highly performant and now the default implementation.
- Exporting a database to NPZ or H5 format after preprocessing is now just a function call away.
- SNAPjson format can now support an optional number of comment lines.
- Added Batch optimizer features in order to optimize geometries in parallel on the GPU. Algorithms include FIRE and BFGS.
- Added a new custom cuda kernel implementation using triton.
These are highly performant and now the default implementation.
- Exporting any database to NPZ or H5 format after preprocessing can be done with a method call.
- Database states can be cached to disk to simplify the restarting of training.
- Added batch geometry optimizer features in order to optimize geometries
in parallel on the GPU. Algorithms include FIRE, Newton-Raphson, and BFGS.
- Added an experiment pytorch lightning trainer to provide simple parallelized training.
- Added a molecular dynamics engine which includes the ability to batch over systems.
- Added examples pertaining to coarse graining.
- Added pair finders based on scipy KDTree to support training on large systems.
- Added a tool to drastically simplify creating ensemble models (see the sketch at the end of this list).
  The ensemblized graphs are compatible with molecular dynamics codes such as ASE and LAMMPS.
- Added the ability to weight different systems/atoms/bonds in a loss function.
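
A minimal sketch of the ensembling tool follows. This is a hedged
illustration rather than the authoritative usage (see the ensembles example
in the repository); the glob pattern is a placeholder for your own saved
models::

    from hippynn.graphs import make_ensemble

    # Build a single graph that evaluates every matched model and
    # aggregates their predictions.
    ensemble_graph, ensemble_info = make_ensemble("./my_models/model_*")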


Improvements:
-------------

- Eliminated dependency on pyanitools for loading ANI-style H5 datasets.
- SNAPjson format can now support an optional number of comment lines.
- Added unit conversion options to the LAMMPS interface.
- Improved performance of bond order regression.
- It is now possible to limit the memory usage of the MLIAP interface in LAMMPS
using a library setting.
- Provide tunable regularization of HIP-NN-TS with an epsilon parameter, and
set the default to use a better value for epsilon.


Bug Fixes:
----------

- Fixed bug where custom kernels were not launching properly on non-default GPUs.
- Fixed error when LAMMPS interface is in kokkos mode and the kokkos device was set to CPU.
- MLIAPInterface objects
- Fixed bug with RDF computer automatic initialization.

0.0.3
=======
10 changes: 10 additions & 0 deletions COPYRIGHT.txt
@@ -0,0 +1,10 @@

Copyright 2019. Triad National Security, LLC. All rights reserved.
This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos
National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S.
Department of Energy/National Nuclear Security Administration. All rights in the program are
reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear
Security Administration. The Government is granted for itself and others acting on its behalf a
nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare
derivative works, distribute copies to the public, perform publicly and display publicly, and to permit
others to do so.
10 changes: 0 additions & 10 deletions LICENSE.txt
@@ -1,15 +1,5 @@


Copyright 2019. Triad National Security, LLC. All rights reserved.
This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos
National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S.
Department of Energy/National Nuclear Security Administration. All rights in the program are
reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear
Security Administration. The Government is granted for itself and others acting on its behalf a
nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare
derivative works, distribute copies to the public, perform publicly and display publicly, and to permit
others to do so.

This program is open source under the BSD-3 License.
Redistribution and use in source and binary forms, with or without modification, are permitted
provided that the following conditions are met:
1 change: 1 addition & 0 deletions README.rst
@@ -106,6 +106,7 @@ The Journal of chemical physics, 148(24), 241715.
See AUTHORS.txt for information on authors.

See LICENSE.txt for licensing information. hippynn is licensed under the BSD-3 license.
See COPYRIGHT.txt for copyright information.

Triad National Security, LLC (Triad) owns the copyright to hippynn, which it identifies as project number LA-CC-19-093.

4 changes: 3 additions & 1 deletion docs/source/conf.py
@@ -45,9 +45,11 @@
"no-show-inheritance": True,
"special-members": "__init__",
}
autodoc_member_order = 'bysource'


# The following are highly optional, so we mock them for doc purposes.
autodoc_mock_imports = ["pyanitools", "seqm", "schnetpack", "cupy", "lammps", "numba", "triton", "pytorch_lightning"]
autodoc_mock_imports = ["pyanitools", "seqm", "schnetpack", "cupy", "lammps", "numba", "triton", "pytorch_lightning", "scipy"]


# -- Options for HTML output -------------------------------------------------
1 change: 0 additions & 1 deletion docs/source/examples/controller.rst
@@ -1,7 +1,6 @@
Controller
==========


How to define a controller for more customized control of the training process.
We assume that there is a set of ``training_modules`` assembled and a ``database`` object has been constructed.
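
A brief, hedged sketch using the classes from the controllers module, with
placeholder hyperparameters; defer to the full example in the repository
for authoritative usage::

    import torch
    from hippynn.experiment.controllers import (
        PatienceController,
        RaiseBatchSizeOnPlateau,
    )

    optimizer = torch.optim.Adam(training_modules.model.parameters(), lr=1e-3)
    # Raise the batch size when validation progress stalls.
    scheduler = RaiseBatchSizeOnPlateau(
        optimizer=optimizer, max_batch_size=128, patience=5
    )
    controller = PatienceController(
        optimizer=optimizer,
        scheduler=scheduler,
        batch_size=16,
        eval_batch_size=64,
        max_epochs=500,
        termination_patience=20,
        stopping_key="Loss",  # placeholder: the validation metric to monitor
    )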

6 changes: 3 additions & 3 deletions docs/source/examples/index.rst
@@ -3,8 +3,8 @@ Examples

Here are some examples about how to use various features in
``hippynn``. Besides the :doc:`/examples/minimal_workflow` example,
the examples are just snippets. For runnable example scripts, see
`the examples at the hippynn github repository`_
the examples are just snippets, rather than full scripts.
For runnable example scripts, see `the examples at the hippynn github repository`_

.. _`the examples at the hippynn github repository`: https://github.com/lanl/hippynn/tree/development/examples

@@ -23,5 +23,5 @@ the examples are just snippets. For runnable example scripts, see
mliap_unified
excited_states
weighted_loss

lightning

20 changes: 20 additions & 0 deletions docs/source/examples/lightning.rst
@@ -0,0 +1,20 @@
Pytorch Lightning module
========================


Hippynn includes support for distributed training using `pytorch-lightning`_.
This can be accessed using the :class:`hippynn.experiment.HippynnLightningModule` class.
The class has two class-methods for creating the lightning module using the same
types of arguments that would be used for an ordinary hippynn experiment.
These are :meth:`hippynn.experiment.HippynnLightningModule.from_experiment_setup`
and :meth:`hippynn.experiment.HippynnLightningModule.from_train_setup`.
Alternatively, you may construct and supply the arguments for the module yourself.

Finally, in addition to the usual pytorch lightning arguments,
the hippynn lightning module saves an additional file, `experiment_structure.pt`,
which needs to be provided as an argument to the
:meth:`hippynn.experiment.HippynnLightningModule.load_from_checkpoint` constructor.
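
A minimal sketch of a distributed run, assuming ``training_modules``,
``database``, and ``setup_params`` from a standard hippynn experiment; the
unpacked return values and the ``structure_file`` keyword are assumptions
here, so check the API documentation::

    import pytorch_lightning as pl
    from hippynn.experiment import HippynnLightningModule

    lightning_module, datamodule = HippynnLightningModule.from_experiment_setup(
        training_modules, database, setup_params
    )
    trainer = pl.Trainer(accelerator="gpu", devices=2)
    trainer.fit(lightning_module, datamodule=datamodule)

    # Reloading requires the extra file saved during training:
    restored = HippynnLightningModule.load_from_checkpoint(
        "checkpoints/last.ckpt", structure_file="experiment_structure.pt"
    )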


.. _pytorch-lightning: https://github.com/Lightning-AI/pytorch-lightning

40 changes: 31 additions & 9 deletions docs/source/index.rst
@@ -8,31 +8,53 @@ We hope you enjoy your stay.
What is hippynn?
================

`hippynn` is a python library for machine learning on atomistic systems.
``hippynn`` is a python library for machine learning on atomistic systems
using `pytorch`_.
We aim to provide high-performance modular design so that different
components can be re-used, extended, or added to. You can find more information
at the :doc:`/user_guide/features` page. The development home is located
at `the hippynn github repository`_, which also contains `many example files`_
about overall library features at the :doc:`/user_guide/features` page.
The development home is located at `the hippynn github repository`_, which also contains `many example files`_.
Additionally, the :doc:`user guide </user_guide/index>` aims to describe abstract
aspects of the library, while the
:doc:`examples documentation section </examples/index>` aims to show
more concretely how to perform tasks with hippynn. Finally, the
:doc:`api documentation </api_documentation/hippynn>` contains a comprehensive
listing of the library components and their documentation.

The main components of hippynn are constructing models, loading databases,
training the models to those databases, making predictions on new databases,
and interfacing with other atomistic codes. In particular, we provide interfaces
to `ASE`_ (prediction), `PYSEQM`_ (training/prediction), and `LAMMPS`_ (prediction).
and interfacing with other atomistic codes for operations such as molecular dynamics.
In particular, we provide interfaces to `ASE`_ (prediction),
`PYSEQM`_ (training/prediction), and `LAMMPS`_ (prediction).
hippynn is also used within `ALF`_ for generating machine learned potentials
along with their training data completely from scratch.

Multiple formats for training data are supported, including
Numpy arrays, the ASE Database, `fitSNAP`_ JSON format, and `ANI HDF5 files`_.
Multiple :doc:`database formats </user_guide/databases>` for training data are supported, including
Numpy arrays, `ASE`_-compatible formats, `FitSNAP`_ JSON format, and `ANI HDF5 files`_.

``hippynn`` includes many tools, such as an :doc:`ASE calculator</examples/ase_calculator>`,
a :doc:`LAMMPS MLIAP interface</examples/mliap_unified>`,
:doc:`batched prediction </examples/predictor>` and batched geometry optimization,
:doc:`automatic ensemble creation </examples/ensembles>`,
:doc:`restarting training from checkpoints </examples/restarting>`,
:doc:`sample-weighted loss functions </examples/weighted_loss>`,
:doc:`distributed training with pytorch lightning </examples/lightning>`,
and more.

``hippynn`` is highly modular, and if you are a model developer, interfacing your
pytorch model into the hippynn node/graph system will make it simple and easy for users
to build models of energy, charge, bond order, excited state energies, and more.

.. _`ASE`: https://wiki.fysik.dtu.dk/ase/
.. _`PYSEQM`: https://github.com/lanl/PYSEQM/
.. _`LAMMPS`: https://www.lammps.org
.. _`fitSNAP`: https://github.com/FitSNAP/FitSNAP
.. _`FitSNAP`: https://github.com/FitSNAP/FitSNAP
.. _`ANI HDF5 files`: https://doi.org/10.1038/s41597-020-0473-z
.. _`ALF`: https://github.com/lanl/ALF/

.. _`the hippynn github repository`: https://github.com/lanl/hippynn/
.. _`many example files`: https://github.com/lanl/hippynn/tree/development/examples
.. _`pytorch`: https://pytorch.org


.. toctree::
4 changes: 2 additions & 2 deletions docs/source/installation.rst
@@ -2,7 +2,6 @@ Installation
============



Requirements
^^^^^^^^^^^^

@@ -43,6 +42,8 @@ Interfacing codes:
.. _LAMMPS: https://www.lammps.org/
.. _PYSEQM: https://github.com/lanl/PYSEQM
.. _pytorch-lightning: https://github.com/Lightning-AI/pytorch-lightning
.. _hippynn: https://github.com/lanl/hippynn/


Installation Instructions
^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -67,7 +68,6 @@ Clone the hippynn_ repository and navigate into it, e.g.::
$ git clone https://github.com/lanl/hippynn.git
$ cd hippynn

.. _hippynn: https://github.com/lanl/hippynn/


Dependencies using conda
2 changes: 1 addition & 1 deletion docs/source/user_guide/ckernels.rst
@@ -60,7 +60,7 @@ The three custom kernels correspond to the interaction sum in hip-nn:

.. math::
a'_{i,a} = = \sum_{\nu,b} V^\nu_{a,b} e^{\nu}_{i,b}
a'_{i,a} = \sum_{\nu,b} V^\nu_{a,b} e^{\nu}_{i,b}
e^{\nu}_{i,a} = \sum_p s^\nu_{p} z_{p_j,a}
5 changes: 3 additions & 2 deletions docs/source/user_guide/concepts.rst
@@ -45,8 +45,9 @@ Graphs

A :class:`~hippynn.graphs.GraphModule` is a 'compiled' set of nodes; a ``torch.nn.Module`` that executes the graph.

GraphModules are used in a number of places within hippynn.

GraphModules are used in a number of places within hippynn: the model,
the loss, the evaluator, the predictor, the ASE interface,
and the LAMMPS interface objects all use them.

Experiment
^^^^^^^^^^
43 changes: 38 additions & 5 deletions docs/source/user_guide/databases.rst
@@ -31,12 +31,45 @@ the [i,j] element of the cell gives the j cartesian coordinate of cell vector i.
massive difficulties fitting to periodic boundary conditions, you may check the transposed version
of your cell data, or compute the RDF.

Database Formats and notes
---------------------------

ASE Objects Database handling
----------------------------------------------------------
If your training data is stored as ASE files of any type (.json,.db,.xyz,.traj ... etc.) it can be loaded directly
a Database for hippynn.
Numpy arrays on disk
........................

See :class:`hippynn.databases.NPZDatabase` (if arrays are stored
in a `.npz` dictionary) or :class:`hippynn.databases.DirectoryDatabase`
(if each array is in its own file).
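
For instance, a hedged sketch of the ``.npz`` case, where the file name,
array names, and split fractions are placeholders::

    from hippynn.databases import NPZDatabase

    database = NPZDatabase(
        "my_data.npz",
        inputs=["species", "coordinates"],
        targets=["energy"],
        seed=0,            # seed for reproducible splits
        test_size=0.1,
        valid_size=0.1,
    )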

Numpy arrays in memory
........................

Use the base :class:`hippynn.databases.Database` class directly to initialize
a database from a dictionary mapping db_names to numpy arrays.
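
A sketch of this pattern, with placeholder arrays and db_names::

    import numpy as np
    from hippynn.databases import Database

    arr_dict = {
        "species": np.load("species.npy"),          # (n_systems, n_atoms)
        "coordinates": np.load("coordinates.npy"),  # (n_systems, n_atoms, 3)
        "energy": np.load("energy.npy"),            # (n_systems, 1)
    }
    database = Database(
        arr_dict,
        inputs=["species", "coordinates"],
        targets=["energy"],
        seed=0,
        test_size=0.1,
        valid_size=0.1,
    )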

pyanitools H5 files
........................

See :class:`hippynn.databases.PyAniFileDB` and :class:`hippynn.databases.PyAniDirectoryDB`.

This format requires ``h5py`` and ``ase`` to be installed.
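
A hedged sketch; the file name and key names are placeholders, and the
``species_key`` argument is an assumption, so consult the class
documentation::

    from hippynn.databases import PyAniFileDB

    database = PyAniFileDB(
        "ani_data.h5",
        species_key="species",
        inputs=["species", "coordinates"],
        targets=["energies"],
        seed=0,
    )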

SNAP JSON Format
........................

See :class:`hippynn.databases.SNAPDirectoryDatabase`. This format requires ``ase`` to be installed.

For more information on this format, see the FitSNAP_ software.

.. _FitSNAP: https://fitsnap.github.io
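
A hedged sketch; the directory and argument names below are assumptions,
so consult the class documentation::

    from hippynn.databases import SNAPDirectoryDatabase

    database = SNAPDirectoryDatabase(
        "./snap_json_data/",
        inputs=["species", "coordinates"],
        targets=["energy", "forces"],
        seed=0,
    )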

ASE Database
........................

If your training data is stored as ASE files of any type
(.json, .db, .xyz, .traj, etc.), it can be loaded directly
as a Database for hippynn.

The :class:`~hippynn.databases.AseDatabase` class is available when ASE is installed.

See ~/examples/ase_db_example.py for a basic example utilzing the class.
See ~/examples/ase_db_example.py for a basic example utilizing the class.
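
A hedged sketch; the file name is a placeholder and the argument names are
assumptions, so defer to the example script above::

    from hippynn.databases import AseDatabase

    database = AseDatabase(
        "training_data.traj",
        inputs=["numbers", "positions"],
        targets=["energy"],
        seed=0,
    )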
15 changes: 11 additions & 4 deletions docs/source/user_guide/features.rst
@@ -11,7 +11,7 @@ Modular set of pytorch layers for atomistic operations
if you want to use them in your scripts without using the rest of the features
provided here -- no problem!

API documentation for :mod:`~hippynn.layers`
API documentation for :mod:`~hippynn.layers` and :mod:`~hippynn.networks`

Graph level API for simple and flexible construction of models from pytorch components.
---------------------------------------------------------------------------------------
@@ -26,6 +26,12 @@ Graph level API for simple and flexible construction of models from pytorch components.

API documentation for :mod:`~hippynn.graphs`

For more information on nodes and graphs, see the `graph exploration ipython notebook`_, which can also
be found in the example files.

.. _graph exploration ipython notebook: https://github.com/lanl/hippynn/blob/development/examples/graph_exploration.ipynb


Plot level API for tracking your training.
----------------------------------------------------------
- Using the graph API, define quantities to evaluate before, during, or after training as
@@ -46,7 +52,7 @@ API documentation for :mod:`~hippynn.experiment`
Custom Kernels for fast execution
----------------------------------------------------------
- Certain operations are not efficiently written in pure pytorch, so we provide
alternative implementations with ``numba``
alternative implementations.
- These are directly linked in with pytorch Autograd -- use them like native pytorch functions.
- These provide advantages in memory footprint and speed
- Includes CPU and GPU execution for custom kernels
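
A sketch of selecting an implementation explicitly; the string options are
assumptions that require the corresponding package to be installed::

    import hippynn

    # True picks the best available implementation automatically;
    # a string such as "triton", "numba", or "cupy" requests a specific one.
    hippynn.custom_kernels.set_custom_kernels(True)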
@@ -55,7 +61,8 @@ More information at :doc:`this page </user_guide/ckernels>`

Interfaces
----------------------------------------------------------
- ASE: Define `ase` calculators based on the graph-level API.
- PYSEQM: Use `pyseqm` calculations as nodes in a graph.
- ASE: Define ``ase`` calculators based on the graph-level API.
- PYSEQM: Use ``pyseqm`` calculations as nodes in a graph.
- LAMMPS: Create a file for use as a `pair style mliap` object.

API documentation for :mod:`~hippynn.interfaces`
5 changes: 5 additions & 0 deletions docs/source/user_guide/settings.rst
@@ -69,3 +69,8 @@ The following settings are available:
- float between 0 and 1
- 1.0
- no
* - TIMEPLOT_AUTOSCALING
- If True, only provide log-scaled plots of training quantities over time if warranted by the data. If False, always produce all plots in linear, log, and loglog scales.
- bool
- True
- yes
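
As a hedged sketch, a setting like this can be set through the environment
before import (assuming the standard ``HIPPYNN_`` variable prefix) or, when
marked dynamically updatable, changed at runtime::

    import os
    os.environ["HIPPYNN_TIMEPLOT_AUTOSCALING"] = "False"  # before importing

    import hippynn
    hippynn.settings.TIMEPLOT_AUTOSCALING = False  # dynamic update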
2 changes: 1 addition & 1 deletion examples/ani1x_training.py
@@ -108,7 +108,7 @@ def load_db(db_info, en_name, force_name, seed, anidata_location, n_workers):
found_indices = ~np.isnan(database.arr_dict[en_name])
database.arr_dict = {k: v[found_indices] for k, v in database.arr_dict.items()}

database.make_trainvalidtest_split(0.1, 0.1)
database.make_trainvalidtest_split(test_size=0.1, valid_size=0.1)
return database

