Description
While investigating an issue in kedro-azureml (getindata/kedro-azureml#79), we found that any after_catalog_created hooks are executed twice if the user has opted in to kedro-telemetry. This happens because the telemetry after_context_created hook accesses context.catalog, which in turn triggers the after_catalog_created hook for a first and, I'd assume, unexpected time.
This might lead to bugs and unintended behaviour. Is this by design, or could one open a PR with a potential solution? For example, one could create a stateful KedroTelemetryProjectHooks that only sends the project data as part of the after_catalog_created hook. Better solutions might be found when thinking about it for more than 5 minutes 😅
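To make the mechanism concrete, here is a minimal, dependency-free sketch of the behaviour described above. It is a simplified model and not Kedro's actual code: ToyContext, ToyHookManager and telemetry_after_context_created are hypothetical stand-ins, and the sketch assumes that the catalog is rebuilt (and the hook fired) on every access of context.catalog.

```python
class ToyHookManager:
    """Hypothetical stand-in for a hook manager; records every hook call."""

    def __init__(self):
        self.calls = []

    def after_catalog_created(self, catalog):
        self.calls.append("after_catalog_created")
        print("************************I was executed!")


class ToyContext:
    """Simplified model: the catalog is rebuilt on every property access."""

    def __init__(self, hook_manager):
        self._hook_manager = hook_manager

    @property
    def catalog(self):
        catalog = {"companies": "CSVDataset"}  # placeholder catalog
        # Assumption: building the catalog always fires the hook.
        self._hook_manager.after_catalog_created(catalog)
        return catalog


def telemetry_after_context_created(context):
    """Mimics a telemetry hook that reads the catalog to count datasets."""
    return len(context.catalog)


hooks = ToyHookManager()
context = ToyContext(hooks)
telemetry_after_context_created(context)  # first, unexpected trigger
_ = context.catalog  # second trigger, as would happen during the run
print(hooks.calls)
```

In this toy model the message is printed twice, once for each access of the catalog property, which mirrors the duplicated line in the kedro run output reported in this issue.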
Steps to Reproduce
Install the Kedro spaceflights starter.
Add a hook that prints something in an after_catalog_created hook:
import logging

from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog


class DataCatalogHooks:
    @property
    def _logger(self):
        return logging.getLogger(self.__class__.__name__)

    @hook_impl
    def after_catalog_created(self, catalog: DataCatalog) -> None:
        # self._logger.info("************************I was executed!")
        print("************************I was executed!")
Run the Kedro pipeline with kedro run.
Expected Result
I would have expected ************************I was executed! to only appear once but it might not be so by design.
Actual Result
❯ kedro run
[10/09/23 21:41:59] INFO Kedro project space-telemetry session.py:364
************************I was executed!
************************I was executed!
[10/09/23 21:42:01] INFO Loading data from 'companies' (CSVDataset)...
The thing that I encountered was different and intentional: essentially, the before_command_run hook in kedro-telemetry sends the Heap event twice to make it easy to aggregate.
This seems to be a separate issue. It happens because of what @fdroessler described above: the after_context_created hook in kedro-telemetry also accesses context.catalog to get catalog information, which "creates" the catalog once before the actual kedro run.
Off the top of my head, I think one solution could be to separate out the sending of catalog-size information to Heap into an after_catalog_created hook implementation, and keep after_context_created for the other information (number of pipelines, etc.). Could we discuss this more when we do a deep dive into improving telemetry workflows?
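As a rough illustration of that split, here is a dependency-free sketch. It is not kedro-telemetry's implementation: SplitTelemetryHooks, ToyContext, the pipelines attribute and the stat names are all hypothetical, and appending to a list stands in for sending an event to Heap. The only point it demonstrates is that after_context_created no longer touches context.catalog.

```python
class ToyContext:
    """Hypothetical context that counts how often its catalog is built."""

    def __init__(self, pipelines):
        self.pipelines = pipelines  # assumption: toy attribute, not Kedro's API
        self.catalog_builds = 0

    @property
    def catalog(self):
        self.catalog_builds += 1
        return {"companies": "CSVDataset", "reviews": "CSVDataset"}


class SplitTelemetryHooks:
    """Hypothetical stateful hooks that collect stats across two hook calls."""

    def __init__(self):
        self.project_stats = {}
        self.sent = []  # stand-in for events sent to the analytics backend

    def after_context_created(self, context):
        # Record only information that does not require the catalog.
        self.project_stats["number_of_pipelines"] = len(context.pipelines)

    def after_catalog_created(self, catalog):
        # Catalog-size information is added when the catalog already exists.
        self.project_stats["number_of_datasets"] = len(catalog)
        self.sent.append(dict(self.project_stats))


context = ToyContext(pipelines=["__default__", "data_processing"])
hooks = SplitTelemetryHooks()
hooks.after_context_created(context)  # does not access context.catalog
catalog = context.catalog  # the run builds the catalog exactly once
hooks.after_catalog_created(catalog)
print(context.catalog_builds, hooks.sent)
```

Whether the real hooks can be split this cleanly depends on what kedro-telemetry needs at each stage, so this should be read as a starting point for the discussion rather than a proposed patch.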
Your Environment
Include as many relevant details about the environment in which you experienced the bug:
python=3.10
kedro==0.18.13
kedro-datasets==1.7.1
kedro-telemetry==0.2.5
kedro-viz==6.5.0