Double execution of `after_catalog_created` hook if opted into `kedro-telemetry` #382

fdroessler · 2023-10-09T19:42:42Z

Description

As part of investigating an issue in kedro-azureml (getindata/kedro-azureml#79) We came across the fact that any after_catalog_created hooks will be executed twice if the user has opted in kedro-telemetry. This is because in telemetry after_context_created hook the context.catalog is accessed which in turn triggers the after_catalog_created hook for the first and I'd assume unexpected time.

This might lead to bugs and unintended behaviours. Is this by design or could one open a PR that as a potential solution. E.g. one could create a stateful KedroTelemetryProjectHooks that only sends the Project data as part of the after_catalog_created hook. Better solutions might be found when thinking about it more than 5 min 😅

Steps to Reproduce

install kedro spaceflight starter
add a hook that prints something in a after_catalog_created hook

import logging

from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog


class DataCatalogHooks:
    @property
    def _logger(self):
        return logging.getLogger(self.__class__.__name__)

    @hook_impl
    def after_catalog_created(self, catalog: DataCatalog) -> None:
        # self._logger.info("************************I was executed!")
        print("************************I was executed!")

run the kedro pipeline

Expected Result

I would have expected ************************I was executed! to only appear once but it might not be so by design.

Actual Result

❯ kedro run
[10/09/23 21:41:59] INFO     Kedro project space-telemetry                                                                                                          session.py:364
************************I was executed!
************************I was executed!
[10/09/23 21:42:01] INFO     Loading data from 'companies' (CSVDataset)...

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

python=3.10

kedro==0.18.13
kedro-datasets==1.7.1
kedro-telemetry==0.2.5
kedro-viz==6.5.0

The text was updated successfully, but these errors were encountered:

astrojuanlu · 2023-10-09T19:45:53Z

Thanks a lot @fdroessler for flagging this. @ankatiyar this might be related to the weird things you observed while investigating the telemetry.

See also kedro-org/kedro#2492 cc @noklam

ankatiyar · 2023-10-10T15:11:10Z

The thing that I encountered was different and intentional - essentially, before_command_run hook in kedro-telemetry sends the Heap event twice to make it easy to aggregate.
This seems to be a separate issue. It's happening because of what @fdroessler described above. The after_context_created hook in telemetry also accesses the context.catalog to get catalog information which "creates" the catalog once before the actual kedro run.

Of the top of my head, I think one solution could be separate out the sending of Catalog size related information to Heap into a after_catalog_created hook implementation and have after_context_created for the other information - no of pipelines etc. We can discuss this more when we do a deep dive into improving telemetry workflows?

noklam · 2023-10-12T11:07:02Z

I agree delay sending the project info in after_context_created is a better solution. @fdroessler Would you like to open a PR for this?

fdroessler · 2023-10-15T10:04:56Z

Sure @noklam I'll put something together in the next days.

astrojuanlu added bug Something isn't working Community Issue/PR opened by the open-source community labels Oct 9, 2023

fdroessler mentioned this issue Oct 29, 2023

fix(telemetry): remove double execution by moving to after catalog created hook #422

Merged

4 tasks

astrojuanlu closed this as completed in #422 Nov 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Double execution of `after_catalog_created` hook if opted into `kedro-telemetry` #382

Double execution of `after_catalog_created` hook if opted into `kedro-telemetry` #382

fdroessler commented Oct 9, 2023

astrojuanlu commented Oct 9, 2023

ankatiyar commented Oct 10, 2023

noklam commented Oct 12, 2023

fdroessler commented Oct 15, 2023

Double execution of after_catalog_created hook if opted into kedro-telemetry #382

Double execution of after_catalog_created hook if opted into kedro-telemetry #382

Comments

fdroessler commented Oct 9, 2023

Description

Steps to Reproduce

Expected Result

Actual Result

Your Environment

astrojuanlu commented Oct 9, 2023

ankatiyar commented Oct 10, 2023

noklam commented Oct 12, 2023

fdroessler commented Oct 15, 2023

Double execution of `after_catalog_created` hook if opted into `kedro-telemetry` #382

Double execution of `after_catalog_created` hook if opted into `kedro-telemetry` #382