Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(telemetry): Run telemetry outside of kedro projects #775

Conversation

DimedS
Copy link
Contributor

@DimedS DimedS commented Jul 22, 2024

Description

Closes #729,

Telemetry will be executed if a CLI command is run outside of a Kedro project (including the kedro new command). In this case, the same information will be sent, except for the project-related data.

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes

Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com>
@DimedS DimedS changed the title Run telemetry outside of kedro projects feat(telemetry): Run telemetry outside of kedro projects Jul 22, 2024
@@ -168,26 +168,23 @@ def before_command_run(
):
"""Hook implementation to send command run data to Heap"""

if not project_metadata: # in package mode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean we also track packaged mode now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that line isn't related to package mode:
#729 (comment)

As I understand it, in package mode, execution will occur without the CLI. Am I correct?

Copy link
Member

@merelcht merelcht Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A packaged project will use the __main__.py as entrypoint. It will then look for the run method to execute.

I think you're right this line here in telemetry actually isn't related to package mode.. It seems to only be reached if you're not in a kedro project.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Packaged mode will execute the CLI through the entrypoint. I believe the commend about package mode because it was assumed that it is the only situation that project_metadata will be missing. This is not exactly true now because it would be either:

  1. packaged mode
  2. kedro command outside of project (i.e. kedro new)

I believe bot situations should be handled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @noklam. Could you please provide more details about packaged mode? From what I see in the code, if I package a Kedro project, it will have an entry point project_name.__main__:main. When I execute project_name in the terminal, the kedro/framework/cli/project.py:run() function will be executed directly, without calling the before_command_run() hook. This is because that hook is only called in the KedroCLI() class during Kedro CLI execution. I'm trying to understand what we need to handle in the telemetry hooks code concerning packaging mode.

if not self._consent:
self._opt_out_notification()
return

# get KedroCLI and its structure from actual project root
cli = KedroCLI(project_path=project_metadata.project_path)
cli = KedroCLI(project_path=project_path if project_path else Path.cwd())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my question is, does that project_path here actually matters? Where is it used as the command_args are coming from hook and KedroCLI which was created in kedro framework.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also trying to understand why KedroCLI is dependent on a specific Kedro project. I believe the CLI structure should be the same for all projects. Do you think we can just use Path.cwd() here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the project_path matters but we should add a comment to explain this. I have questions about the _recurse_cli used by the function, does it takes care of the LazyCommand we added lately?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have questions about the _recurse_cli used by the function, does it takes care of the LazyCommand we added lately?

I checked the Lazy Loading PR, looks like _recurse_cli() func should work as previously. Also I tested current PR manually with latest kedro main branch on fresh virtual env, everything works well from the 1st command.

Copy link
Contributor

@noklam noklam Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dig into this a bit and I think project_path is not an arbitrary value here and it matters.

The output of recurse_cli change when I provide different value of project_path (mlflow etc is missing), they are not important here but I am not sure if this will trigger the imports.
image

When project_path is set as root correctly:
image

I am also concerned would this revert the effort of LazyCommand since this seems to try to load the CLI group (hopefully not the dependencies). How could we test this properly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Nok is right, the recursion through the CLI definitely structure triggers the loading of lazy the CLI command groups -
It uses get_command() :

cli_element.get_command(ctx, command_name),

Which if you look at the implementation of the LazyGroup, triggers the lazy load:
https://github.com/kedro-org/kedro/blob/8bbbfb6c7257005bef841143690308f05d4b829b/kedro/framework/cli/utils.py#L504-L509

I believe we need to find a way to do the masking without loading relying on KedroCLI to not defeat the purpose of introducing lazy loading

kedro-telemetry/kedro_telemetry/plugin.py Outdated Show resolved Hide resolved
Comment on lines +275 to +276
if project_path:
pyproject_path = project_path / PYPROJECT_CONFIG_NAME
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of overwriting the supplied project_path here I'd just create a new variable to make it easier to follow what happens and debug if something is wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's already a new variable?

Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com>
@DimedS DimedS marked this pull request as ready for review July 31, 2024 15:42
Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with notes. #775 (comment), upon some investigation, I think there are issues to follow up:

  1. project_path is not arbitary as it affects the result of recursive call
  2. The recursive call undo the effort of LazyCommand as it eagerly import all the commands.

I will open a new issue and I suggest we fix both problem together (1. may not needed if we end up throwing away the recursive call)

@DimedS
Copy link
Contributor Author

DimedS commented Jul 31, 2024

Thanks for your review, @ankatiyar and @noklam. I agree with you; recurse_cli() in telemetry is currently undermining the efforts made by Lazy Load, and we should prioritise fixing that. However, this issue is somewhat loosely connected to the current PR. Therefore, it's better to proceed with this PR before the release to fully complete the opt-out telemetry modernisation.

Copy link
Contributor

@ElenaKhaustova ElenaKhaustova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @DimedS!

Agree that the issue found #794 should be addressed asap but it is outside of the scope of this PR.

@DimedS DimedS merged commit 05a794f into main Aug 1, 2024
10 checks passed
@DimedS DimedS deleted the 729-enable-telemetry-for-kedro-new-and-commands-outside-kedro-project-folder branch August 1, 2024 09:58
merelcht pushed a commit to galenseilis/kedro-plugins that referenced this pull request Aug 27, 2024
* Run telemetry outside of kedro projects

---------

Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com>
Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants