Ensemble creation feature (#68)

* first draft of ensemble code
* add ensemble.py
* working version, still working on merging identical nodes
* add merging of identical nodes and some cleaning
* try to add type annotations
* adding ensemble usage example
* update docs for ensembles and more
* further docs
* update example wording
lanl · Apr 13, 2024 · 7d3e0f3
1 parent 604a93c
commit 7d3e0f3
Showing 18 changed files with 625 additions and 25 deletions.
diff --git a/docs/source/examples/ensembles.rst b/docs/source/examples/ensembles.rst
@@ -0,0 +1,25 @@
+Ensembling Models
+#################
+
+
+Using the :func:`~hippynn.graphs.make_ensemble` function makes it easy to combine models.
+
+By default, ensembling is based on the ``db_name`` of the nodes in each input graph.
+Nodes which have the same name will be assigned an ensemble node which combines
+the different versions of that quantity, and additionally calculates the
+mean and standard deviation.
+
+It is easy to make an ensemble from a glob string or a list of directories where
+the models are saved::
+
+    from hippynn.graphs import make_ensemble
+    model_form = '../../collected_models/quad0_b512_p5_GPU*'
+    ensemble_graph, ensemble_info = make_ensemble(model_form)
+
+The ensemble graph takes the inputs which are required for all of the models in the ensemble.
+The ``ensemble_info`` object provides the counts for the inputs and targets of the ensemble
+and the counts of those corresponding quantities across the ensemble members.
+
+A typical use case is to then build a Predictor or ASE Calculator from the ensemble.
+See :file:`examples/ensembling_models.py` for a detailed example.
+
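The new docs page says an ensemble can be built from a glob string or a list of directories. As a rough illustration of how the two forms relate, the sketch below expands a glob pattern into an explicit, sorted list of directories using only the standard library. The helper `expand_model_dirs` and the temporary directory layout are hypothetical and made up for this illustration; how `make_ensemble` resolves the pattern internally is not shown in this part of the diff.

```python
import glob
import os
import tempfile


def expand_model_dirs(pattern):
    """Expand a glob pattern into a sorted list of matching directories.

    Hypothetical helper for illustration; not part of hippynn.
    """
    return sorted(p for p in glob.glob(pattern) if os.path.isdir(p))


# Hypothetical layout: three saved-model directories plus one unrelated file.
with tempfile.TemporaryDirectory() as root:
    for i in range(3):
        os.mkdir(os.path.join(root, f"quad0_b512_p5_GPU{i}"))
    open(os.path.join(root, "quad0_b512_p5_GPU_notes.txt"), "w").close()

    pattern = os.path.join(root, "quad0_b512_p5_GPU*")
    model_dirs = expand_model_dirs(pattern)
    # Keep only the directory names so the result is independent of the temp path.
    model_names = [os.path.basename(d) for d in model_dirs]

print(model_names)
```

Expanding the pattern by hand like this can be a convenient way to check which saved models a glob actually picks up before building the ensemble from it.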
diff --git a/docs/source/examples/index.rst b/docs/source/examples/index.rst
@@ -3,8 +3,10 @@ Examples
 
 Here are some examples about how to use various features in
 ``hippynn``. Besides the :doc:`/examples/minimal_workflow` example,
-the examples are just snippets. For fully-fledged examples see the
-``examples`` directory in the repository.
+the examples are just snippets. For runnable example scripts, see
+`the examples at the hippynn github repository`_.
+
+.. _`the examples at the hippynn github repository`: https://github.com/lanl/hippynn/tree/development/examples
 
 .. toctree::
     :maxdepth: 1
@@ -13,6 +15,7 @@ the examples are just snippets. For fully-fledged examples see the
     controller
     plotting
     predictor
+    ensembles
     periodic
     forces
     restarting
@@ -21,3 +24,4 @@ the examples are just snippets. For fully-fledged examples see the
     excited_states
     weighted_loss
 
+
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -12,10 +12,28 @@ What is hippynn?
 We aim to provide high-performance modular design so that different
 components can be re-used, extended, or added to. You can find more information
 at the :doc:`/user_guide/features` page. The development home is located
-at `the hippynn github repository`_.
+at `the hippynn github repository`_, which also contains `many example files`_.
 
+The main components of hippynn are constructing models, loading databases,
+training the models to those databases, making predictions on new databases,
+and interfacing with other atomistic codes. In particular, we provide interfaces
+to `ASE`_ (prediction), `PYSEQM`_ (training/prediction), and `LAMMPS`_ (prediction).
+hippynn is also used within `ALF`_ for generating machine learned potentials
+along with their training data completely from scratch.
+
+Multiple formats for training data are supported, including
+Numpy arrays, the ASE Database, `fitSNAP`_ JSON format, and `ANI HDF5 files`_.
+
+.. _`ASE`: https://wiki.fysik.dtu.dk/ase/
+.. _`PYSEQM`: https://github.com/lanl/PYSEQM/
+.. _`LAMMPS`: https://www.lammps.org
+.. _`fitSNAP`: https://github.com/FitSNAP/FitSNAP
+.. _`ANI HDF5 files`: https://doi.org/10.1038/s41597-020-0473-z
+.. _`ALF`: https://github.com/lanl/ALF/
 
 .. _`the hippynn github repository`: https://github.com/lanl/hippynn/
+.. _`many example files`: https://github.com/lanl/hippynn/tree/development/examples
+
 
 .. toctree::
    :maxdepth: 1
@@ -27,7 +45,6 @@ at `the hippynn github repository`_.
    hippynn API documentation <api_documentation/hippynn>
    license
 
-
 Indices and tables
 ==================
 

diff --git a/docs/source/installation.rst b/docs/source/installation.rst
@@ -43,6 +43,21 @@ Interfacing codes:
 Installation Instructions
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 
+Conda
+-----
+Install using conda::
+
+    conda install -c conda-forge hippynn
+
+Pip
+---
+Install using pip::
+
+    pip install hippynn
+
+Install from source
+--------------------
+
 Clone the hippynn_ repository and navigate into it, e.g.::
 
     $ git clone https://github.com/lanl/hippynn.git
@@ -55,14 +70,14 @@ Clone the hippynn_ repository and navigate into it, e.g.::
   out ``cupy`` from the conda_requirements.txt file.
 
 Dependencies using conda
--------------------------
+........................
 
 Install dependencies from conda using recommended channels::
 
     $ conda install -c pytorch -c conda-forge --file conda_requirements.txt
 
 Dependencies using pip
------------------------
+.......................
 
 Minimum dependencies using pip::
 

diff --git a/examples/ensembling_models.py b/examples/ensembling_models.py
@@ -0,0 +1,56 @@
+import torch
+import hippynn
+
+if torch.cuda.is_available():
+    device = 0
+else:
+    device = 'cpu'
+
+### Building the ensemble requires just one function call.
+model_form = '../../collected_models/quad0_b512_p5_GPU*'
+ensemble_graph, ensemble_info = hippynn.graphs.make_ensemble(model_form)
+
+# Retrieve the ensemble node which has just been created.
+# The name will be the prefix 'ensemble' followed by the db_name from the ensemble members.
+ensemble_energy = ensemble_graph.node_from_name("ensemble_T")
+
+### Building an ASE calculator for the ensemble
+
+import ase.build
+
+from hippynn.interfaces.ase_interface import HippynnCalculator
+
+# The ensemble node has `mean`, `std`, and `all` outputs.
+energy_node = ensemble_energy.mean
+extra_properties = {"ens_predictions": ensemble_energy.all, "ens_std": ensemble_energy.std}
+calc = HippynnCalculator(energy=energy_node, extra_properties=extra_properties)
+calc.to(device)
+
+# build something and attach the calculator
+molecule = ase.build.molecule("CH4")
+molecule.calc = calc
+
+energy_value = molecule.get_potential_energy()  # Activate calculation to get results dict
+
+print("Got energy", energy_value)
+print("In units of kcal/mol", energy_value / (ase.units.kcal/ase.units.mol))
+
+# All outputs from the ensemble members. Because the model was trained in kcal/mol, these values are in kcal/mol too.
+# The name in the results dictionary comes from the key in the 'extra_properties' dictionary.
+print("All predictions:", calc.results["ens_predictions"])
+
+
+### Building a Predictor object for the ensemble
+pred = hippynn.graphs.Predictor.from_graph(ensemble_graph)
+
+# get batch-like inputs to the ensemble
+z_vals = torch.as_tensor(molecule.get_atomic_numbers()).unsqueeze(0)
+r_vals = torch.as_tensor(molecule.positions).unsqueeze(0)
+
+pred.to(r_vals.dtype)
+pred.to(device)
+# Do some computation
+output = pred(Z=z_vals, R=r_vals)
+# Print the output of a node using the node or the db_name.
+print(output[ensemble_energy.all])
+print(output["T_all"])
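The example script above relies on the `mean`, `std`, and `all` outputs of an ensemble node. To make concrete what those three quantities represent, here is a minimal pure-Python sketch that combines per-member predictions for the same quantity. The function `combine`, the dictionary keys, and the numbers are hypothetical; the sketch also assumes a population standard deviation, since the diff does not state which convention hippynn uses.

```python
import statistics

# Hypothetical energies predicted by four ensemble members for two systems.
# Outer index: ensemble member; inner index: system in the batch.
member_predictions = [
    [-10.0, -20.5],
    [-10.2, -20.1],
    [-9.8, -20.3],
    [-10.0, -20.3],
]


def combine(predictions):
    """Combine per-member predictions into all/mean/std per system.

    Illustration only; this is not hippynn's implementation.
    """
    # Transpose so that each entry holds every member's value for one system.
    per_system = list(zip(*predictions))
    return {
        "all": [list(values) for values in per_system],
        "mean": [statistics.fmean(values) for values in per_system],
        # Assumption: population standard deviation (divide by N).
        "std": [statistics.pstdev(values) for values in per_system],
    }


combined = combine(member_predictions)
print("mean:", combined["mean"])
print("std:", combined["std"])
```

The spread across members is what makes the ensemble useful as a rough uncertainty indicator: a large `std` for a given system suggests the members disagree on it.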
diff --git a/hippynn/graphs/__init__.py b/hippynn/graphs/__init__.py
@@ -27,6 +27,7 @@
 from .graph import GraphModule
 
 from .predictor import Predictor
+from .ensemble import make_ensemble
 
 __all__ = [
     "get_subgraph",
@@ -39,4 +40,5 @@
     "GraphModule",
     "Predictor",
     "IdxType",
+    "make_ensemble",
 ]