Add runner benchmark #4210

Open: wants to merge 30 commits into base: main

noklam (Contributor) commented Oct 7, 2024

Description

Dev Notes:

QA notes

pip install asv or pip install -e ".[benchmark]"
asv run --quick
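
For readers unfamiliar with asv, the sketch below shows the general shape of a benchmark that `asv run` picks up. It is not the suite added in this PR: the class name, parameter values, and workload are hypothetical, and it only relies on asv's documented naming convention (a `setup` method, and timing methods prefixed with `time_`).

```python
class TimeComputeBoundSuite:
    """Hypothetical suite; asv finds timing benchmarks by the `time_` prefix."""

    # asv runs each benchmark once per parameter value.
    params = [1_000, 10_000]
    param_names = ["n_items"]

    def setup(self, n_items):
        # Runs before the timed method and is excluded from the measurement.
        self.data = list(range(n_items))

    def time_sum_of_squares(self, n_items):
        # Only the body of this method is timed.
        sum(x * x for x in self.data)
```

With `--quick`, asv runs each benchmark only once, which is enough to check that the suite works but not to produce reliable timings.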

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

Signed-off-by: Nok <nok.lam.chan@quantumblack.com>
Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
noklam (Contributor Author) commented Oct 9, 2024

============================================================Done============================================================
{'SequentialRunner': 7.8415021896362305, 'ThreadRunner': 7.56311297416687, 'ParallelRunner': 3.4262261390686035}

At the moment I am only running the test for a compute-bound workload, so the result is as expected: only ParallelRunner should speed things up. I sometimes run into a "dataset already registered" error. I think it's likely related to #4191.

@ElenaKhaustova any thoughts? Should I focus on testing only the KedroDataCatalog instead?
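
To illustrate why a compute-bound workload is expected to benefit only from process-based parallelism, here is a minimal standard-library sketch. It does not use Kedro's runners; `cpu_bound`, `run_with_executor`, and `compare` are hypothetical names, and the thread pool and process pool stand in for thread-based and process-based execution under the assumption of a standard CPython build with the GIL.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_bound(n):
    """Pure-Python loop; it holds the GIL for its whole duration."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_sequential(sizes):
    return [cpu_bound(n) for n in sizes]


def run_with_executor(executor_cls, sizes, max_workers=4):
    # The same call works for both thread pools and process pools.
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(cpu_bound, sizes))


def timed(func, *args):
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def compare(sizes):
    """Return {strategy: seconds}; every strategy must compute the same result."""
    timings = {}
    expected, timings["sequential"] = timed(run_sequential, sizes)
    for label, cls in (
        ("threads", ThreadPoolExecutor),
        ("processes", ProcessPoolExecutor),
    ):
        result, timings[label] = timed(run_with_executor, cls, sizes)
        assert result == expected
    return timings
```

Calling `compare([2_000_000] * 4)` from a script, inside an `if __name__ == "__main__":` guard so the process pool can start safely, should show the thread pool roughly matching the sequential time and the process pool finishing faster on a multi-core machine. Exact numbers depend on the hardware.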

@noklam noklam requested a review from ankatiyar October 9, 2024 14:34
Comment on lines 1 to 2
[project]
name = "kedro_benchmarks"
Contributor Author

The reason for this is that multiprocessing breaks imports. Without it I get a "benchmarks module is not found" error.

To make sure this is always in sys.path, I added it as part of the installation step. Other suggestions are welcome.

I have tried PYTHONPATH as well, but it doesn't seem to work, as the new process doesn't inherit it.
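
A likely mechanism behind the import error, offered as an assumption rather than a diagnosis of this PR: multiprocessing sends functions to worker processes with pickle, and pickle stores a function as a reference to its module and name, so the worker has to be able to import that module. The sketch below reproduces this in a single process. The module name `hypothetical_benchmarks_pkg` and both helper functions are made up for illustration.

```python
import pickle
import sys
import types

MODULE_NAME = "hypothetical_benchmarks_pkg"  # illustrative name, not from the PR


def register_module(name):
    """Create an in-memory module with one function and make it importable."""
    module = types.ModuleType(name)
    exec("def compute(x):\n    return x * 2\n", module.__dict__)
    sys.modules[name] = module
    return module


def roundtrip_outcomes(name=MODULE_NAME):
    module = register_module(name)
    payload = pickle.dumps(module.compute)

    # The payload holds the module and function names, not the function body.
    stores_reference = name.encode() in payload and b"compute" in payload

    # While the module is importable, unpickling resolves the function.
    ok_result = pickle.loads(payload)(21)

    # Simulate a worker whose import path does not contain the module.
    del sys.modules[name]
    try:
        pickle.loads(payload)
        error = None
    except ModuleNotFoundError as exc:
        error = type(exc).__name__
    return stores_reference, ok_result, error
```

Here `roundtrip_outcomes()` returns `(True, 42, "ModuleNotFoundError")`: the same payload loads fine while the module is importable and fails once it is not, which is why installing the benchmarks directory as a package makes it resolvable from every worker.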

@noklam noklam force-pushed the noklam/stress-testing-runners-4127 branch from a8883b9 to 66ad6c5 Compare October 16, 2024 11:23
@noklam noklam marked this pull request as ready for review October 17, 2024 14:53
@noklam noklam requested a review from merelcht as a code owner October 17, 2024 14:53
"pip install -e . kedro-datasets[pandas-csvdataset]"
],
"branches": [
"noklam/stress-testing-runners-4127"
Contributor Author

Need to revert before merge.

@merelcht
Member

The tests look good! I'm getting the same failure with ThreadRunner that you mentioned. Do we have an idea of how hard it is to fix the bug in the DataCatalog? I think we should fix it while KedroDataCatalog is still experimental.

…edro-org/kedro into noklam/stress-testing-runners-4127

3 participants