Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Double execution of after_catalog_created hook if opted into kedro-telemetry #382

Closed
fdroessler opened this issue Oct 9, 2023 · 4 comments · Fixed by #422
Closed

Double execution of after_catalog_created hook if opted into kedro-telemetry #382

fdroessler opened this issue Oct 9, 2023 · 4 comments · Fixed by #422
Labels
bug Something isn't working Community Issue/PR opened by the open-source community

Comments

@fdroessler
Copy link
Contributor

Description

As part of investigating an issue in kedro-azureml (getindata/kedro-azureml#79) We came across the fact that any after_catalog_created hooks will be executed twice if the user has opted in kedro-telemetry. This is because in telemetry after_context_created hook the context.catalog is accessed which in turn triggers the after_catalog_created hook for the first and I'd assume unexpected time.

This might lead to bugs and unintended behaviours. Is this by design or could one open a PR that as a potential solution. E.g. one could create a stateful KedroTelemetryProjectHooks that only sends the Project data as part of the after_catalog_created hook. Better solutions might be found when thinking about it more than 5 min 😅

Steps to Reproduce

  1. install kedro spaceflight starter
  2. add a hook that prints something in a after_catalog_created hook
import logging

from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog


class DataCatalogHooks:
    @property
    def _logger(self):
        return logging.getLogger(self.__class__.__name__)

    @hook_impl
    def after_catalog_created(self, catalog: DataCatalog) -> None:
        # self._logger.info("************************I was executed!")
        print("************************I was executed!")
  1. run the kedro pipeline

Expected Result

I would have expected ************************I was executed! to only appear once but it might not be so by design.

Actual Result

❯ kedro run
[10/09/23 21:41:59] INFO     Kedro project space-telemetry                                                                                                          session.py:364
************************I was executed!
************************I was executed!
[10/09/23 21:42:01] INFO     Loading data from 'companies' (CSVDataset)...

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

python=3.10

kedro==0.18.13
kedro-datasets==1.7.1
kedro-telemetry==0.2.5
kedro-viz==6.5.0

@astrojuanlu astrojuanlu added bug Something isn't working Community Issue/PR opened by the open-source community labels Oct 9, 2023
@astrojuanlu
Copy link
Member

Thanks a lot @fdroessler for flagging this. @ankatiyar this might be related to the weird things you observed while investigating the telemetry.

See also kedro-org/kedro#2492 cc @noklam

@ankatiyar
Copy link
Contributor

The thing that I encountered was different and intentional - essentially, before_command_run hook in kedro-telemetry sends the Heap event twice to make it easy to aggregate.
This seems to be a separate issue. It's happening because of what @fdroessler described above. The after_context_created hook in telemetry also accesses the context.catalog to get catalog information which "creates" the catalog once before the actual kedro run.

Of the top of my head, I think one solution could be separate out the sending of Catalog size related information to Heap into a after_catalog_created hook implementation and have after_context_created for the other information - no of pipelines etc. We can discuss this more when we do a deep dive into improving telemetry workflows?

@noklam
Copy link
Contributor

noklam commented Oct 12, 2023

I agree delay sending the project info in after_context_created is a better solution. @fdroessler Would you like to open a PR for this?

@fdroessler
Copy link
Contributor Author

Sure @noklam I'll put something together in the next days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Community Issue/PR opened by the open-source community
Projects
Archived in project
4 participants