Custom kernel improvements (#107)
- Wrapped "pytorch" version with torch.compile.
- Allows "cupy" to run without "numba" installed.
- Added "sparse" version using torch.sparse.
- Provide a registration mechanism for implementations of message passing, simplifying the process of adding new versions (a hypothetical sketch of such a registry follows this list).
- Refactored to require only a single test script.
- Added a speed-test script.
- Moved old implementations so that they are hidden, but still accessible if needed.
- Improved documentation
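
As a rough illustration of the registration-and-dispatch idea described above (hypothetical names; this is not the actual hippynn API, only a sketch of the pattern), a new message-passing implementation could be registered and looked up like this:

    # Hypothetical registry sketch; names are illustrative only.
    _IMPLEMENTATIONS = {}

    def register_implementation(name):
        """Record an implementation function under a string key."""
        def decorator(func):
            _IMPLEMENTATIONS[name] = func
            return func
        return decorator

    @register_implementation("pytorch")
    def envsum_pytorch(sensitivities, features, pair_first, pair_second):
        """Pure-pytorch fallback; the real computation would go here."""
        ...

    def get_implementation(name):
        """Look up an implementation by name, raising a clear error for unknown names."""
        try:
            return _IMPLEMENTATIONS[name]
        except KeyError:
            raise ValueError(
                f"Unknown custom kernel implementation: {name!r}. "
                f"Available: {sorted(_IMPLEMENTATIONS)}"
            ) from None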

Other changes, not related to custom kernels:
- improved documentation.

Commit messages:
* Add first implementation of sparse kernels.

Note: the sparse kernel sensesum cannot be used when more than one entry connects the same pair;
however, this is caught as an error.

Fixed bug with compare_against='triton'.

* Add improved and simplified dispatch for custom kernel implementation.

* remove individual test files that are no longer necessary

* improve pytorch-only kernels and add compile and jit options

* fix python error on gpu part of test

* wrong print message

* put speed tests into a single function

* make comparison function also be wrapped

* add script for benchmarking implementation speeds

* formatting

* add features to speed tester

* add legacy atomic based kernels, improved test script

* update docs

* fix arg passing

* tweak documentation tables

* adjust error message
lubbersnick authored Sep 27, 2024
1 parent c1c084c commit 36b7350
Showing 30 changed files with 1,201 additions and 352 deletions.
5 changes: 3 additions & 2 deletions conda_requirements.txt
@@ -1,5 +1,5 @@
numpy
pytorch >= 1.9
pytorch >= 2.0
torchtriton
matplotlib
numba
@@ -8,4 +8,5 @@ ase
h5py
tqdm
python-graphviz
lightning
lightning
opt_einsum
5 changes: 3 additions & 2 deletions docs/source/examples/controller.rst
@@ -2,11 +2,12 @@ Controller
==========

How to define a controller for more customized control of the training process.
We assume that there is a set of ``training_modules`` assembled and a ``database`` object has been constructed.
We assume that there is a set of :class:`~hippynn.experiment.assembly.TrainingModules` assembled, called ``training_modules``,
and a :class:`~hippynn.databases.Database`-like object called ``database`` that has been constructed.

The following snippet shows how to set up a controller using a custom scheduler or optimizer::

from hippynn.experiment.controllers import RaiseBatchSizeOnPlateau,PatienceController
from hippynn.experiment.controllers import RaiseBatchSizeOnPlateau, PatienceController

optimizer = torch.optim.Adam(training_modules.model.parameters(), lr=1e-3)

3 changes: 2 additions & 1 deletion docs/source/examples/ensembles.rst
@@ -21,5 +21,6 @@ The ``ensemble_info`` object provides the counts for the inputs and targets of t
and the counts of those corresponding quantities across the ensemble members.

A typical use case would be to then build a Predictor or ASE Calculator from the ensemble.
See :file:`~examples/ensembling_models.py` for a detailed example.
See `/examples/ensembling_models.py`_ for a detailed example.

.. _/examples/ensembling_models.py: https://github.com/lanl/hippynn/blob/development/examples/ensembling_models.py
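
As noted above, a typical follow-up is to build a Predictor from the ensemble. A minimal sketch, assuming ``ensemble_graph`` is the combined graph produced earlier in this example (the variable name is illustrative)::

    from hippynn.graphs import Predictor

    # Build a predictor over the whole ensemble; its outputs can then be used,
    # for example, to estimate model uncertainty across ensemble members.
    predictor = Predictor.from_graph(ensemble_graph)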
12 changes: 7 additions & 5 deletions docs/source/examples/plotting.rst
@@ -2,12 +2,13 @@ Plotting
========


How to make a plotmaker.
:mod:`hippynn.plotting` is only available if matplotlib is installed.

Let's assume you have a ``molecule_energy`` node that you are training to.
By default, hippynn will plot loss metrics over time when training ends.
On top of this, hippynn can make diagnostic plots during its evaluation phase.
For example, Let's assume you have a ``molecule_energy`` node that you are training to.
A simple plot maker would look like this::


from hippynn import plotting

plot_maker = hippynn.plotting.PlotMaker(
@@ -19,7 +20,8 @@ A simple plot maker would look like this::

training_modules, db_info = assemble_for_training(train_loss, validation_losses, plot_maker=plot_maker)

The plot maker is thus passed to `assemble_for_training` and attached to the model evaluator.
The plot maker is thus passed to :func:`~hippynn.experiment.assemble_for_training` and attached to the model evaluator.



Note that :mod:`hippynn.plotting` is only available if matplotlib is installed.

4 changes: 2 additions & 2 deletions docs/source/examples/predictor.rst
@@ -1,10 +1,10 @@
Predictor
=========

The predictor is a simple API for making predictions on an entire database.
The :class:`~hippynn.graphs.Predictor` is a class for making predictions on an entire database.

Often you'll want to make predictions based on the model. For this,
use :meth:`Predictor.from_graph`. Let's assume you have a ``GraphModule`` called ``model``::
use the :meth:`~hippynn.graphs.Predictor.from_graph` method. Let's assume you have a :class:`~hippynn.GraphModule` called ``model``::

predictor = hippynn.graphs.Predictor.from_graph(model)

2 changes: 1 addition & 1 deletion docs/source/examples/restarting.rst
@@ -117,7 +117,7 @@ Advanced Details
- Here is a list of objects and their final device after loading.

.. list-table::
:widths: 40 30
:widths: 30 70
:header-rows: 1

* - Objects
5 changes: 3 additions & 2 deletions docs/source/installation.rst
@@ -7,7 +7,7 @@ Requirements

Requirements:
* Python_ >= 3.9
* pytorch_ >= 1.9
* pytorch_ >= 2.0
* numpy_

Optional Dependencies:
@@ -20,6 +20,7 @@ Optional Dependencies:
* graphviz_ (for visualizing model graphs)
* h5py_ (for loading ani-h5 datasets)
* pytorch-lightning_ (for distributed training)
* opt_einsum_ (backend for accelerating some pytorch expressions)

Interfacing codes:
* ASE_
@@ -41,7 +42,7 @@ Interfacing codes:
.. _PYSEQM: https://github.com/lanl/PYSEQM
.. _pytorch-lightning: https://github.com/Lightning-AI/pytorch-lightning
.. _hippynn: https://github.com/lanl/hippynn/

.. _opt_einsum: https://github.com/dgasmith/opt_einsum

Installation Instructions
^^^^^^^^^^^^^^^^^^^^^^^^^
102 changes: 97 additions & 5 deletions docs/source/user_guide/ckernels.rst
@@ -4,14 +4,106 @@ Custom Kernels
Bottom line up front
--------------------

We use custom kernels in `hippynn` to accelerate the HIP-NN neural network message passing.
On the GPU, the best implementation to select is ``triton``, followed by ``cupy``,
followed by ``numba``. On the CPU, only ``numba`` is available. In general, these
If possible, install ``triton`` and ``numba``, as they will accelerate HIP-NN networks
and reduce memory cost on GPU and CPU, respectively.


Brief Description
-----------------

We use custom kernels in hippynn to accelerate the HIP-NN neural network message passing and
to significantly reduce the amount of memory required in passing messages.
On the GPU, the best implementation to select is ``"triton"``, followed by ``"cupy"``,
followed by ``"numba"``. On the CPU, only ``"numba"`` is available. In general, these
custom kernels are very useful, and the only reasons to turn them off are if the packages
are not available for installation in your environment, or to diagnose whether a bug
could be related to a potential misconfiguration of these additional packages.
``triton`` comes with recent versions of ``pytorch``, so optimistically you may already be
configured to use the custom kernels.
``"triton"`` comes with recent versions of ``"pytorch"``, so optimistically you may already be
configured to use the custom kernels. Finally, there is the ``"sparse"`` implementation, which
uses torch.sparse functions. This saves memory much as the kernels from external packages,
however, it does not currently achieve a significant speedup over pytorch.
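
As a quick way to check which of these optional packages are importable in your environment (a small standard-library sketch, not part of hippynn)::

    import importlib.util

    # Report whether each optional kernel backend can be imported.
    for package in ("triton", "numba", "cupy"):
        found = importlib.util.find_spec(package) is not None
        print(f"{package}: {'available' if found else 'not installed'}")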


Comparison Table
----------------


.. list-table:: Hippynn Custom Kernels Options Summary
:widths: 4 30 3 3 3 3 10 30
:header-rows: 1

* - Name
- Description
- Low memory
- Speedup
- CPU
- GPU
- Required Packages
- Notes
* - pytorch
- Dense operations and index add operations
- No
- No
- Yes
- Yes
- None
- Lowest overhead; guaranteed to run, but poorest performance for large data.
* - triton
- CSR-dense with OpenAI's triton compiler
using autotuning.
- Yes
- Excellent
- No
- Yes
- triton
- Best option for GPU. Does incur some start-up lag due to autotuning.
* - numba
- CSR-dense hybrid with numba
- Yes
- Good
- Yes
- Yes
- numba
- Best option for CPU; non-CPU implementations fall back to this on CPU when available.
* - cupy
- CSR-dense hybrid with cupy/C code.
- Yes
- Great
- No
- Yes
- cupy
- Direct translation of numba algorithm, but has improved performance.
* - sparse
- CSR-dense using torch.sparse operations.
- Yes
- None
- Yes
- Yes
- pytorch>=2.4
- Cannot handle all systems, but raises an error on failure.

.. note::
Kernels which do not support the CPU fall back to numba if it is available, and
to pytorch if it is not.

.. note::
Custom Kernels do come with some launch overheads compared to the pytorch implementation.
If your workload is small (small batch sizes, networks, and/or small systems)
and you're using a GPU, then you may find best performance with kernels set to ``"pytorch"``.

.. note::
The sparse implementation is slow for very small workload sizes. At large workload
sizes, it is about as fast as pytorch (while using less memory), but still slower
than numba.

.. note::
The sparse implementation does not handle message passing in which the same pair of atoms
appears in two or more pair entries, as can happen in small systems with periodic boundary conditions.


For information on how to set the custom kernels, see :doc:`settings`.
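
As a minimal sketch of selecting an implementation (this assumes the ``HIPPYNN_USE_CUSTOM_KERNELS`` environment variable and the ``hippynn.custom_kernels.set_custom_kernels`` function described in the settings documentation)::

    # Option 1: choose the implementation before launching python, via the environment
    # (shell command, assuming the HIPPYNN_USE_CUSTOM_KERNELS setting name):
    #     export HIPPYNN_USE_CUSTOM_KERNELS=triton   # or auto, pytorch, numba, cupy, sparse

    # Option 2: choose it at runtime from python:
    import hippynn
    hippynn.custom_kernels.set_custom_kernels("auto")  # pick the best available implementation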


Detailed Explanation
--------------------
Expand Down
5 changes: 2 additions & 3 deletions hippynn/_settings_setup.py
@@ -88,13 +88,12 @@ def kernel_handler(kernel_string):
kernel = {
"0": False,
"false": False,
"pytorch": False,
"1": True,
"true": True,
}.get(kernel_string, kernel_string)

if kernel not in [True, False, "auto", "triton", "cupy", "numba"]:
warnings.warn(f"Unexpected custom kernel setting: {kernel_string}.", stacklevel=3)
# This function used to warn about unexpected kernel settings.
# Now this is an error which is raised in the custom_kernels module.

return kernel
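# Editorial sketch of how the handler above behaves (illustrative values only):
#   kernel_handler("0")  or kernel_handler("false") -> False      (custom kernels disabled)
#   kernel_handler("1")  or kernel_handler("true")  -> True       (custom kernels enabled)
#   kernel_handler("triton")                        -> "triton"   (passed through; unrecognized
#                                                     names now raise an error in custom_kernels)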

