Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

after_context_created not the first hook to run, but is fixed when kedro-telemetry is uninstalled #2492

Closed
melvinkokxw opened this issue Apr 4, 2023 · 7 comments

Comments

@melvinkokxw
Copy link

melvinkokxw commented Apr 4, 2023

Description

after_context_created should be the earliest hook triggered (as per Kedro's documentation) but after_catalog_created is triggered before it.

When I uninstalled kedro-telemetry (to fix a separate issue), the hooks are triggered in the expected order.

Raising the bug here instead of the kedro-plugins repo as I cannot confirm if kedro-telemetry causes the bug

Context

I was doing a kedro run and trying to read parameters in a hook, expecting the after_context_created hook to be triggered first. Instead, after_catalog_created was triggered before it.

Steps to Reproduce

  1. Create hooks and add them in settings.py
  2. Do a kedro run

Expected Result

after_context_created hook should be the earliest hook to be triggered

Actual Result

after_catalog_created is triggered before the after_context_created hook

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V): 0.18.7
  • Python version used (python -V): 3.8.16
  • Operating system and version: Windows 10 Enterprise for Virtual Desktops version 22H2
  • settings.py looks like this:
from kedro.config import TemplatedConfigLoader

from model_ops.hooks import MLFlowHooks

HOOKS = (MLFlowHooks(),)

CONFIG_LOADER_CLASS = TemplatedConfigLoader

CONFIG_LOADER_ARGS = {
    "config_patterns": {
        "mlflow_catalog": ["mlflow/*_catalog.yml"],
        "mlflow_parameters": ["mlflow/*_parameters.yml"],
    },
    "globals_pattern": "*globals.yml",
}
@melvinkokxw melvinkokxw changed the title after_context_created not the first hook to run, but is fixed when kedro-telemetry is uninstalled after_context_created not the first hook to run, but is fixed when kedro-telemetry is uninstalled Apr 4, 2023
@noklam
Copy link
Contributor

noklam commented Apr 4, 2023

@melvinkokxw Thanks for reporting this. In a kedro run, the catalog is created by the context so context hook is the earliest triggered hook.

Can you share more how's your hook look like and do you still see the same issue when removed telemetry?

@melvinkokxw
Copy link
Author

@noklam Here is the hook I'm using:

class MLFlowHooks:
    """
    Initialise MLFlow and log useful tracking info automatically
    """

    @hook_impl
    def after_context_created(self, context):
        """Loads MLFlow catalog and parameters if MLFlow is enabled

        Args:
            context: Kedro context
        """
        if not self.mlflow_enabled:
            return

        context.config_loader["parameters"] = {
            **context.config_loader["parameters"],
            **context.config_loader["mlflow_parameters"],
        }
        context.config_loader["catalog"] = {
            **context.config_loader["catalog"],
            **context.config_loader["mlflow_catalog"],
        }

    @hook_impl
    def after_catalog_created(self, feed_dict):
        """
        Check if MLFLow is enabled

        Args:
            feed_dict: Kedro feed dict

        Returns:

        """
        parameters = feed_dict["parameters"]

        # MLFlow enabled?
        self.mlflow_enabled = parameters.get("mlflow_enabled", False)

When I removed kedro-telemetry, after_context_created was triggered first. When I reinstall kedro-telemetry, after_catalog_created was triggered first.

@astrojuanlu
Copy link
Member

Some debugging:

after_context_created, the one supposed to be the earliest, is called in KedroSession.load_context:

self._hook_manager.hook.after_context_created( # pylint: disable=no-member
context=context
)

after_catalog_created is called by either accessing KedroContext.catalog or calling KedroSession.run:

self._hook_manager.hook.after_catalog_created(
catalog=catalog,
conf_catalog=conf_catalog,
conf_creds=conf_creds,
feed_dict=feed_dict,
save_version=save_version,
load_versions=load_versions,
)

Note that KedroSession.run calls .load_context() (hence triggering after_context_created) before triggering after_catalog_created.

However, kedro-telemetry hooks on before_command_run, which is a CLI hook that runs earlier:

self._cli_hook_manager.hook.before_command_run( # pylint: disable=no-member
project_metadata=self._metadata, command_args=args
)
try:
super().main(
args=args,
prog_name=prog_name,
complete_var=complete_var,
standalone_mode=standalone_mode,
**extra,
)

Working on gh-1942 to better understand what's happening.

@noklam
Copy link
Contributor

noklam commented May 20, 2023

I think this was answer in slack but I can't find the thread now.

@astrojuanlu
Copy link
Member

https://www.linen.dev/s/kedro/t/10245031/hi-kedro-team-users-i-found-two-unusual-behaviours-with-kedr#6be97170-46ab-4562-8395-425b62f779b8

What happen is this - catalog is a read-only object, everytime context.catalog get called it get created and trigger the after_catalog_created hook.
In the telemetry hook after_context_created it created catalog, so it trigger the after_catalog_created before your MLFlowHook’s after_context_created

@noklam
Copy link
Contributor

noklam commented Jul 31, 2023

@melvinkokxw I am closing this for now. The short answer to this is the execution orders are not clear when you have multiple hooks exist. In this case, kedro-telemetry try to read catalog so it triggers after_catalog_created.

What happens roughly

  1. kedro-telemtry after_context_created read catalog
  2. after_catalog_created is triggered
  3. after_context_created is triggered twice in your custom hook.

So your hook isn't really the "first hook". We need to document this better but this is not a bug.

Related issues:

@noklam noklam closed this as completed Jul 31, 2023
@astrojuanlu
Copy link
Member

This goes a bit deeper into the design of KedroContext but maybe, given that it's immutable in develop (#1465), .catalog could be pre-computed, hence after_catalog_created would always be triggered before after_context_created, reducing ambiguity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants