Add benchmarks for KedroDataCatalog and fix tests for DataCatalog #4246

Merged: ankatiyar merged 6 commits into main from fix-asv-benchmarks on Oct 21, 2024

@ankatiyar (Contributor) commented Oct 21, 2024

Description

Close #4125

Development notes

  • Added benchmarks for KedroDataCatalog
  • Fixed tests for DataCatalog: the benchmarks for DataCatalog.add_feed_dict() and DataCatalog.add_all() were failing because asv runs setup(), repeats the benchmark a number of times, and only then runs teardown(). Adding the same datasets on every repetition therefore raises DatasetAlreadyExistsError. Failing runs: https://github.com/kedro-org/kedro/actions/workflows/benchmark-performance.yml

To test locally: asv run
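The failure described in the development notes can be reproduced without Kedro or asv installed. The sketch below is hypothetical: ToyCatalog, DatasetAlreadyExistsError and run_like_asv are stand-ins defined here, not Kedro's or asv's API, and run_like_asv only approximates asv's loop (one setup(), several calls to the benchmark, one teardown()). It shows the second call failing once the names are registered, and one possible fix, passing replace=True. Kedro's DataCatalog.add_all() and add_feed_dict() accept a replace flag, but this is not necessarily how the PR fixed the benchmarks.

```python
class DatasetAlreadyExistsError(Exception):
    """Hypothetical stand-in for Kedro's exception of the same name."""


class ToyCatalog:
    """Hypothetical minimal catalog that refuses duplicate names unless replace=True."""

    def __init__(self):
        self._datasets = {}

    def add_all(self, datasets, replace=False):
        for name, dataset in datasets.items():
            if name in self._datasets and not replace:
                raise DatasetAlreadyExistsError(f"Dataset '{name}' has already been registered")
            self._datasets[name] = dataset


def run_like_asv(bench, method_name, number=3):
    """Approximate asv's timing loop: setup once, call the method `number` times, then teardown."""
    bench.setup()
    try:
        for _ in range(number):
            getattr(bench, method_name)()
    finally:
        if hasattr(bench, "teardown"):
            bench.teardown()


class TimeAddAll:
    def setup(self):
        # Runs once per repeat, NOT before every call of the benchmark method.
        self.catalog = ToyCatalog()
        self.new_datasets = {f"dataset_new_{i}": object() for i in range(1, 1001)}

    def add_all_without_replace(self):
        # Not prefixed with time_, so asv would not collect it.
        # Fails on the second call: the names were registered by the first call.
        self.catalog.add_all(self.new_datasets)

    def time_add_all(self):
        # replace=True makes every call idempotent, so repeated calls succeed.
        self.catalog.add_all(self.new_datasets, replace=True)


if __name__ == "__main__":
    run_like_asv(TimeAddAll(), "time_add_all")
    try:
        run_like_asv(TimeAddAll(), "add_all_without_replace")
    except DatasetAlreadyExistsError as exc:
        print("benchmark without replace failed:", exc)
```

An alternative design is to generate fresh dataset names on each call (for example with a counter), which keeps the default replace=False path under measurement at the cost of a growing catalog across iterations.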

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
@ElenaKhaustova (Contributor) left a comment

Thank you, @ankatiyar!

Added some suggestions on extending tests.

def time_setitem(self):
    """Benchmark the time to set a dataset"""
    for i in range(1, 1001):
        self.catalog[f"dataset_new_{i}"] = CSVDataset(filepath="data.csv")
Contributor

Can we please also add a benchmark for setting raw data, so that this part of the setter is covered too?

Contributor Author

Added a separate test for this
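The reviewer's request concerns the branch of the setter that receives a plain Python object rather than a dataset. Below is a sketch of what such a benchmark could look like. It is self-contained and hypothetical: ToyCatalog, AbstractDatasetStub, MemoryStub and the method name time_setitem_raw are illustrative names, and the assumption that the real setter wraps raw values in an in-memory dataset (MemoryDataset in Kedro) is ours, not taken from this PR.

```python
class AbstractDatasetStub:
    """Hypothetical stand-in for a dataset base class."""


class MemoryStub(AbstractDatasetStub):
    """Hypothetical stand-in for an in-memory dataset: it just holds the data."""

    def __init__(self, data):
        self.data = data


class ToyCatalog:
    """Hypothetical catalog whose setter accepts either datasets or raw data."""

    def __init__(self):
        self._datasets = {}

    def __setitem__(self, name, value):
        if isinstance(value, AbstractDatasetStub):
            # Dataset path: store the dataset object as-is.
            self._datasets[name] = value
        else:
            # Raw-data path (the branch the reviewer wants covered):
            # wrap the value in an in-memory dataset.
            self._datasets[name] = MemoryStub(data=value)

    def __getitem__(self, name):
        return self._datasets[name]


class TimeToyCatalogSetitem:
    def setup(self):
        self.catalog = ToyCatalog()
        self.raw_data = {"col1": [1, 2], "col2": [4, 5]}

    def time_setitem_raw(self):
        """Benchmark setting raw (non-dataset) values, which goes through the wrapping branch."""
        for i in range(1, 1001):
            self.catalog[f"param_{i}"] = self.raw_data
```

In this toy the setter overwrites existing names, so calling time_setitem_raw() repeatedly after a single setup() does not raise.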

benchmarks/benchmark_kedrodatacatalog.py (resolved)
benchmarks/benchmark_kedrodatacatalog.py (outdated, resolved)
benchmarks/benchmark_kedrodatacatalog.py (outdated, resolved)
@ElenaKhaustova (Contributor) left a comment

Thank you 🚀

@ankatiyar enabled auto-merge (squash) October 21, 2024 16:30
@ankatiyar merged commit 9a0a779 into main Oct 21, 2024 (28 checks passed)
@ankatiyar deleted the fix-asv-benchmarks branch October 21, 2024 16:50
Development

Successfully merging this pull request may close these issues.

[Stress Testing] - Data Catalog and Config Loader
4 participants