Process: Add the metadata.disable_cache input (#6293)
Often, a user will want caching enabled by default but disabled for a
particular process in some specific cases. The `disable_caching` context
manager was designed for such a use case; however, it only affects the
current Python interpreter and is therefore not useful when submitting to
the daemon.

The `metadata.disable_cache` input is added to the `Process` class. It
is not set by default, but when set to `True`, the cache is completely
disabled when storing the node, overriding all other caching
configuration rules that may be active. Since this value is set directly
on the node through the process's inputs, its effect applies to all
interpreters, including those of the daemon workers.
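The precedence described above can be sketched as a small decision function. This is a simplified illustration, not AiiDA's actual implementation; the function and parameter names are hypothetical:

```python
from typing import Optional


def should_use_cache(disable_cache: Optional[bool], config_use_cache: bool) -> bool:
    """Decide whether the cache should be consulted for a process.

    ``disable_cache`` stands for the ``metadata.disable_cache`` input
    (``None`` when not provided); ``config_use_cache`` stands for the result
    of evaluating all other caching configuration rules.
    """
    if disable_cache:
        # An explicit ``True`` overrides every other caching rule.
        return False
    # ``None`` or ``False``: defer to the configuration rules, as before.
    return config_use_cache
```

Note how `None` (unset) and an explicit `False` behave identically, which is what keeps the change backwards compatible.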

To implement this, the `should_use_cache` method is added to the
`NodeCaching` class. It simply returns the result of the `get_use_cache`
utility function, which evaluates the caching configuration settings.
This function used to be called directly in the `Node.store` method, but
abstracting it into a method allows it to be overridden by the
`ProcessNodeCaching` class, which now returns `False` if the
`disable_cache` metadata input is set to `True`. If the input is not set
or is set to `False`, it is ignored, preserving backwards compatibility.
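The override mechanism can be illustrated with stand-in classes. The `FakeNode` class and the trivial `get_use_cache` below are hypothetical stand-ins for illustration only; only the structure of the two `should_use_cache` methods mirrors the actual change:

```python
def get_use_cache(identifier: str) -> bool:
    """Stand-in for ``aiida.manage.caching.get_use_cache``: pretend the
    configuration enables caching for every identifier."""
    return True


class FakeNode:
    """Hypothetical stand-in for a process node carrying metadata inputs."""

    process_type = 'some.process'

    def __init__(self, metadata_inputs=None):
        self._metadata_inputs = metadata_inputs

    def get_metadata_inputs(self):
        return self._metadata_inputs


class NodeCaching:
    """Base class: defer entirely to the caching configuration."""

    def __init__(self, node):
        self._node = node

    def should_use_cache(self) -> bool:
        return get_use_cache(identifier=self._node.process_type)


class ProcessNodeCaching(NodeCaching):
    """Subclass: short-circuit when ``disable_cache`` is explicitly ``True``."""

    def should_use_cache(self) -> bool:
        metadata_inputs = self._node.get_metadata_inputs() or {}
        if metadata_inputs.get('metadata', {}).get('disable_cache'):
            return False
        return super().should_use_cache()
```

Because the decision lives in an overridable method rather than an inline call in `Node.store`, process nodes get the new behaviour while all other node types are unaffected.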

Since the value of `disable_cache` is stored in the node's attributes
under the `metadata_inputs` key, that key should be ignored when
computing the hash of the node. If the key were included in the hash, a
process that explicitly specifies `disable_cache=False` would never be
cached, because its hash would not match that of a node that did not
specify the input at all.

The `metadata_inputs` key contains all metadata inputs, not just
`disable_cache`, and `_hash_ignored_attributes` currently does not
support specifying nested keys. That does not matter, though, as
`metadata_inputs` should be ignored entirely anyway; the commit that
introduced this key simply forgot to add it to the ignore list.
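The hashing problem can be demonstrated with a toy hash over node attributes. This is a simplified sketch with hypothetical names; AiiDA's real hashing machinery is more involved:

```python
import hashlib
import json

# Attribute keys excluded from the hash, analogous to ``_hash_ignored_attributes``.
HASH_IGNORED_ATTRIBUTES = ('metadata_inputs',)


def compute_hash(attributes: dict) -> str:
    """Hash a node's attributes, skipping the ignored top-level keys."""
    relevant = {k: v for k, v in attributes.items() if k not in HASH_IGNORED_ATTRIBUTES}
    return hashlib.sha256(json.dumps(relevant, sort_keys=True).encode()).hexdigest()


# A node stored with an explicit ``disable_cache=False`` and one stored
# without the input have different ``metadata_inputs``, but must hash
# identically for the first to be a cache hit for the second.
with_input = {'x': 1, 'metadata_inputs': {'metadata': {'disable_cache': False}}}
without_input = {'x': 1}
```

With `metadata_inputs` on the ignore list, `compute_hash(with_input)` equals `compute_hash(without_input)`; without it, the two hashes diverge and the cache is never hit.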
sphuber authored Feb 23, 2024
1 parent 82c3728 commit 4626b11
Showing 8 changed files with 98 additions and 18 deletions.
11 changes: 11 additions & 0 deletions docs/source/howto/run_codes.rst
@@ -969,5 +969,16 @@ When ``strict`` is set to ``True``, the function will raise a ``ValueError`` if
Besides controlling which process classes are cached, it may be useful or necessary to control what already *stored* nodes are used as caching *sources*.
Section :ref:`topics:provenance:caching:control-caching` provides details how AiiDA decides which stored nodes are equivalent to the node being stored and which are considered valid caching sources.

Alternatively, if the cache should not be considered for a specific process, its ``metadata.disable_cache`` input can be set to ``True``:

.. code-block:: python

    from aiida.engine import submit

    submit(SomeProcess, inputs={'metadata': {'disable_cache': True}})

The advantage of this approach is that the ``disable_cache`` metadata input overrides all other configuration and controls of caching, so the process is guaranteed not to be taken from the cache.
Unlike the ``enable_caching`` and ``disable_caching`` context managers which only affect the local interpreter, this approach is respected by all interpreters.
This approach, therefore, is mostly useful when submitting processes to the daemon that should ignore the cache.

.. |Computer| replace:: :py:class:`~aiida.orm.Computer`
.. |CalcJob| replace:: :py:class:`~aiida.engine.processes.calcjobs.calcjob.CalcJob`
8 changes: 7 additions & 1 deletion src/aiida/engine/processes/process.py
@@ -114,6 +114,12 @@ def define(cls, spec: ProcessSpec) -> None:  # type: ignore[override]
default='CALL',
help='The label to use for the `CALL` link if the process is called by another process.',
)
spec.input(
'metadata.disable_cache',
required=False,
valid_type=bool,
help='Do not consider the cache for this process, ignoring all other caching configuration rules.',
)
spec.inputs.valid_type = orm.Data
spec.inputs.dynamic = False  # Setting a ``valid_type`` automatically makes it dynamic, so we reset it again
spec.exit_code(
@@ -721,7 +727,7 @@ def _setup_version_info(self) -> None:
def _setup_metadata(self, metadata: dict) -> None:
"""Store the metadata on the ProcessNode."""
for name, value in metadata.items():
if name in ['store_provenance', 'dry_run', 'call_link_label']:
if name in ['store_provenance', 'dry_run', 'call_link_label', 'disable_cache']:
continue

if name == 'label':
9 changes: 9 additions & 0 deletions src/aiida/orm/nodes/caching.py
@@ -94,6 +94,15 @@ def is_created_from_cache(self) -> bool:
"""
return self.get_cache_source() is not None

def should_use_cache(self) -> bool:
"""Return whether the cache should be considered when storing this node.
:returns: True if the cache should be considered, False otherwise.
"""
from aiida.manage.caching import get_use_cache

return get_use_cache(identifier=self._node.process_type)

def _get_same_node(self) -> 'Node' | None:
"""Returns a stored node from which the current Node can be cached or None if it does not exist
9 changes: 2 additions & 7 deletions src/aiida/orm/nodes/node.py
@@ -447,8 +447,6 @@ def store(self) -> 'Node':
:note: After successful storage, those links that are in the cache, and for which also the parent node is
already stored, will be automatically stored. The others will remain unstored.
"""
from aiida.manage.caching import get_use_cache

if not self.is_stored:
# Call `_validate_storability` directly and not in `_validate` in case sub class forgets to call the super.
self._validate_storability()
@@ -457,15 +455,12 @@
# Verify that parents are already stored. Raises if this is not the case.
self._verify_are_parents_stored()

# Determine whether the cache should be used for the process type of this node.
use_cache = get_use_cache(identifier=self.process_type)

# Clean the values on the backend node *before* computing the hash in `_get_same_node`. This will allow
# us to set `clean=False` if we are storing normally, since the values will already have been cleaned
self._backend_entity.clean_values()

# Retrieve the cached node.
same_node = self.base.caching._get_same_node() if use_cache else None
# Retrieve the cached node if ``should_use_cache`` returns True
same_node = self.base.caching._get_same_node() if self.base.caching.should_use_cache() else None

if same_node is not None:
self._store_from_cache(same_node)
17 changes: 17 additions & 0 deletions src/aiida/orm/nodes/process/process.py
@@ -33,6 +33,19 @@ class ProcessNodeCaching(NodeCaching):
# The link_type might not be correct while the object is being created.
_hash_ignored_inputs = ['CALL_CALC', 'CALL_WORK']

def should_use_cache(self) -> bool:
"""Return whether the cache should be considered when storing this node.
:returns: True if the cache should be considered, False otherwise.
"""
metadata_inputs = self._node.get_metadata_inputs() or {}
disable_cache = metadata_inputs.get('metadata', {}).get('disable_cache', None)

if disable_cache:
return False

return super().should_use_cache()

@property
def is_valid_cache(self) -> bool:
"""Return whether the node is valid for caching
@@ -151,6 +164,10 @@ def __str__(self) -> str:

return f'{base}'

@classproperty
def _hash_ignored_attributes(cls) -> Tuple[str, ...]: # noqa: N805
return super()._hash_ignored_attributes + ('metadata_inputs',)

@classproperty
def _updatable_attributes(cls) -> Tuple[str, ...]: # noqa: N805
return super()._updatable_attributes + (
6 changes: 6 additions & 0 deletions tests/conftest.py
@@ -347,6 +347,12 @@ def manager():
return get_manager()


@pytest.fixture
def runner(manager):
"""Get the ``Runner`` instance of the currently loaded profile."""
return manager.get_runner()


@pytest.fixture
def event_loop(manager):
"""Get the event loop instance of the currently loaded profile.
48 changes: 42 additions & 6 deletions tests/engine/test_process.py
@@ -15,7 +15,7 @@
from aiida.common.lang import override
from aiida.engine import ExitCode, ExitCodesNamespace, Process, run, run_get_node, run_get_pk
from aiida.engine.processes.ports import PortNamespace
from aiida.manage.caching import enable_caching
from aiida.manage.caching import disable_caching, enable_caching
from aiida.orm.nodes.caching import NodeCaching
from aiida.plugins import CalculationFactory
from plumpy.utils import AttributesFrozendict
@@ -509,15 +509,11 @@ def define(cls, spec):
spec.input('metadata_portnamespace.without_default', is_metadata=True)


def test_metadata_inputs():
def test_metadata_inputs(runner):
"""Test that explicitly passed ``is_metadata`` inputs are stored in the attributes.
This is essential to make it possible to recreate a builder for the process with the original inputs.
"""
from aiida.manage import get_manager

runner = get_manager().get_runner()

inputs = {
'metadata_port': 'value',
'metadata_port_non_serializable': orm.Data().store(),
@@ -531,3 +527,43 @@ def test_metadata_inputs():
'metadata_port': 'value',
'metadata_portnamespace': {'without_default': 100},
}


class CachableProcess(Process):
"""Dummy process that defines a storable and cachable node class."""

_node_class = orm.CalculationNode


@pytest.mark.usefixtures('aiida_profile_clean')
def test_metadata_disable_cache(runner, entry_points):
"""Test the ``metadata.disable_cache`` input."""
from aiida.engine.processes import ProcessState

entry_points.add(CachableProcess, 'aiida.workflows:core.dummy')

# Create a ``ProcessNode`` instance that is a valid cache source
process_original = CachableProcess(runner=runner)
process_original.node.set_process_state(ProcessState.FINISHED)
process_original.node.seal()
assert process_original.node.base.caching.is_valid_cache

# Cache is disabled, so node should not be cached
with disable_caching():
process = CachableProcess(runner=runner)
assert not process.node.base.caching.is_created_from_cache

# Cache is disabled; the fact that ``disable_cache`` is explicitly set to ``False`` should not change anything
with disable_caching():
process = CachableProcess(runner=runner, inputs={'metadata': {'disable_cache': False}})
assert not process.node.base.caching.is_created_from_cache

# Cache is enabled, so node should be cached
with enable_caching():
process = CachableProcess(runner=runner)
assert process.node.base.caching.is_created_from_cache

# Cache is enabled, but ``disable_cache`` is explicitly set to ``True``, so node should not be cached
with enable_caching():
process = CachableProcess(runner=runner, inputs={'metadata': {'disable_cache': True}})
assert not process.node.base.caching.is_created_from_cache