DO NOT MERGE - Pipeline performance test project #4154

lrcouto · 2024-09-10T16:05:32Z

Description

Kedro project made to simulate delays and latency at specific points of a Kedro pipeline. Pass the desired delays, in seconds, using the --params flag. For example:

kedro run --params=hook_delay=5,dataset_load_delay=5,file_save_delay=5
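As a rough illustration of how delay parameters like the ones in the command above might be consumed, here is a minimal sketch in plain Python. It is not taken from the project: the helper names `resolve_delays` and `simulate_delay` and the `DEFAULT_DELAYS` mapping are hypothetical, and it assumes the parameters arrive as a simple name-to-seconds mapping.

```python
import time
from typing import Callable, Dict, Mapping

# Hypothetical defaults that mirror the parameter names in the command above.
DEFAULT_DELAYS: Dict[str, float] = {
    "hook_delay": 0.0,
    "dataset_load_delay": 0.0,
    "file_save_delay": 0.0,
}


def resolve_delays(params: Mapping[str, float]) -> Dict[str, float]:
    """Overlay run-time parameters on the defaults, ignoring unrelated keys."""
    delays = dict(DEFAULT_DELAYS)
    for name in DEFAULT_DELAYS:
        if name in params:
            delays[name] = float(params[name])
    return delays


def simulate_delay(
    delays: Mapping[str, float],
    name: str,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Block for the configured number of seconds and return that number."""
    seconds = delays.get(name, 0.0)
    if seconds > 0:
        sleep(seconds)
    return seconds
```

A hook or dataset wrapper could call `simulate_delay(delays, "hook_delay")` at the point where latency should be injected. The `sleep` argument is injectable so the behaviour can be checked without actually waiting.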

Development notes

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

Read the contributing guidelines
Signed off each commit with a Developer Certificate of Origin (DCO)
Opened this PR as a 'Draft Pull Request' if it is work-in-progress
Updated the documentation to reflect the code changes
Added a description of this change in the RELEASE.md file
Added tests to cover my changes
Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

Signed-off-by: Laura Couto <laurarccouto@gmail.com>

noklam

Thanks! I can't run the pipeline because the data is missing, so I only reviewed it quickly at a high level.

Can you add a description to the PR explaining how to use this pipeline test? I see that most of the pipeline here simulates work with sleep. Why did you end up going with this implementation?

For example, if I want to answer the question "does Kedro run too slowly when it needs to connect to a database?", what command should I run?

noklam · 2024-09-17T10:28:08Z

performance-test/conf/README.md

@@ -0,0 +1,20 @@
+# What is this for?
+
+This folder should be used to store configuration files used by Kedro or by separate tools.


Is there any specific configuration that needs to be documented? Otherwise, I think we can remove this from our project.

noklam · 2024-09-17T10:29:07Z

performance-test/README.md

@@ -0,0 +1,98 @@
+# performance-test


It might be more helpful to document how this project should be used. Otherwise, I suggest removing it, as this template doesn't add much information for us.

noklam · 2024-09-17T10:30:54Z

performance-test/src/performance_test/hooks.py

+def register_pipelines(self) -> Dict[str, Pipeline]:
+    from performance_test.pipelines.expense_analysis import (
+        pipeline as expense_analysis_pipeline,
+    )
+
+    return {
+        "__default__": expense_analysis_pipeline.create_pipeline(),
+        "expense_analysis": expense_analysis_pipeline.create_pipeline(),
+    }


Does this belong in pipeline_registry.py?

ankatiyar · 2024-09-23T15:35:23Z

performance-test/requirements.txt

+notebook
+ruff~=0.1.8
+scikit-learn~=1.5.1; python_version >= "3.9"
+scikit-learn<=1.4.0,>=1.0; python_version < "3.9"


pyspark should probably be here

noklam · 2024-09-24T13:40:33Z

I'm still unable to run the pipeline. Am I supposed to get the data from somewhere? Also, can we merge this folder with Ankita's benchmark? (No hurry, we can do this at the end.)

ankatiyar

Thanks @lrcouto, I was able to get the pipeline to run! (Thanks for helping with the setup.) It looks good to me, just some minor comments.
I think it'd be nice for this project to be its own separate repository that we could use to run performance tests, instead of being part of the Kedro code base, but I'm keen to hear what others think.

ankatiyar · 2024-09-24T16:17:53Z

performance-test/.viz/stats.json

@@ -0,0 +1 @@
+{}


We can get rid of this folder entirely; it's generated by Kedro-Viz.

ankatiyar · 2024-09-24T16:19:23Z

performance-test/README.md

@@ -0,0 +1,19 @@
+# performance-test
+


We could also add setup instructions here, just so it's recorded somewhere!

noklam

I can run the pipeline successfully too, with one extra step: installing Java on Gitpod.

kedro run --params=hook_delay=5,dataset_load_delay=5,file_save_delay=5

Apart from @ankatiyar's comments, I have some minor comments about making the parameter names consistent. As we discussed, a few preset configurations would be helpful so people know how to use the parameters for testing (we'll likely need these presets anyway to run the benchmark automatically).

If I understand correctly, I don't expect any difference between:
kedro run --file-save-delay=5 and kedro run --file-load-delay=5
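One way the preset idea could look is sketched below. This is only an illustration: the preset names and values in `PRESETS` and the helper `build_run_command` are hypothetical, and the parameter names are the ones that appear in the `kedro run --params=...` command earlier in this comment.

```python
from typing import Dict, List

# Hypothetical presets; the names and values are illustrative only.
PRESETS: Dict[str, Dict[str, int]] = {
    "baseline": {"hook_delay": 0, "dataset_load_delay": 0, "file_save_delay": 0},
    "slow_load": {"hook_delay": 0, "dataset_load_delay": 5, "file_save_delay": 0},
    "slow_save": {"hook_delay": 0, "dataset_load_delay": 0, "file_save_delay": 5},
}


def build_run_command(preset: str) -> List[str]:
    """Return the argument list for a `kedro run` using the named preset."""
    params = PRESETS[preset]
    joined = ",".join(f"{key}={value}" for key, value in params.items())
    return ["kedro", "run", f"--params={joined}"]
```

The returned list could be handed to `subprocess.run` by an automated benchmark, or joined with spaces and printed for someone to copy into a terminal.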

noklam · 2024-09-25T15:40:34Z

performance-test/conf/base/parameters.yml

@@ -0,0 +1,3 @@
+hook_delay: 0
+dataset_load_delay: 0
+file_save_delay: 0


Can we choose one name? Either data_save_delay or file_load_delay.

We could just call them "save_delay" and "load_delay", maybe?

Sounds good.

lrcouto · 2024-10-16T15:20:44Z

The project is now located at https://github.com/kedro-org/pipeline-performance-test

Closing this PR since it's not necessary anymore.

lrcouto added 3 commits September 9, 2024 18:01

Add test project

b31c0d6

Signed-off-by: Laura Couto <laurarccouto@gmail.com>

Add delays

43b7571

Signed-off-by: Laura Couto <laurarccouto@gmail.com>

Use env vars to determine delay

f505a7f

Signed-off-by: Laura Couto <laurarccouto@gmail.com>

lrcouto linked an issue Sep 10, 2024 that may be closed by this pull request

[Stress Testing] - Create example projects to assess Kedro performance for complex pipelines #3866

Open

astrojuanlu mentioned this pull request Sep 11, 2024

[Stress Testing] - Data Catalog and Config Loader #4125

Open

lrcouto added 2 commits September 11, 2024 16:01

Use kedro run --params to determine delays

eafe4c5

Signed-off-by: Laura Couto <laurarccouto@gmail.com>

Add extra nodes

c5a1ac3

Signed-off-by: Laura Couto <laurarccouto@gmail.com>

lrcouto marked this pull request as ready for review September 12, 2024 00:10

lrcouto requested a review from merelcht as a code owner September 12, 2024 00:10

Merge branch 'main' into pipeline-performance-test

d1a492a

lrcouto requested review from noklam, ankatiyar, ElenaKhaustova and astrojuanlu September 12, 2024 13:52

Merge branch 'main' into pipeline-performance-test

bfd5844

noklam reviewed Sep 17, 2024

lrcouto and others added 7 commits September 17, 2024 11:01

Merge branch 'main' into pipeline-performance-test

97bc3d4

Merge branch 'main' into pipeline-performance-test

bd16556

Remove redundant function from hooks

6c5ac73

Signed-off-by: Laura Couto <laurarccouto@gmail.com>

Merge branch 'main' into pipeline-performance-test

e6ec50f

Add usage instructions to readme

60f06ad

Signed-off-by: Laura Couto <laurarccouto@gmail.com>

Merge branch 'pipeline-performance-test' of github.com:kedro-org/kedr…

3399c37

…o into pipeline-performance-test

Merge branch 'main' into pipeline-performance-test

5acf23c

ankatiyar reviewed Sep 23, 2024

lrcouto and others added 3 commits September 23, 2024 15:44

Merge branch 'main' into pipeline-performance-test

86e53fe

Add pyspark to project requirements

6f24fe0

Signed-off-by: Laura Couto <laurarccouto@gmail.com>

Merge branch 'pipeline-performance-test' of github.com:kedro-org/kedr…

3d2e5b8

…o into pipeline-performance-test

lrcouto added 2 commits September 24, 2024 11:18

Add example dataset to repo

6f3b67d

Signed-off-by: Laura Couto <laurarccouto@gmail.com>

Add spark dataset requirements to project requirements file

28b938a

Signed-off-by: Laura Couto <laurarccouto@gmail.com>

ankatiyar reviewed Sep 24, 2024

Merge branch 'main' into pipeline-performance-test

f4fa341

noklam self-requested a review September 25, 2024 15:39

noklam reviewed Sep 25, 2024

lrcouto and others added 4 commits September 26, 2024 10:46

Merge branch 'main' into pipeline-performance-test

d476736

Change param names

3414c78

Signed-off-by: Laura Couto <laurarccouto@gmail.com>

Rerun docs build

f1ba080

Signed-off-by: Laura Couto <laurarccouto@gmail.com>

Merge branch 'main' into pipeline-performance-test

b24048c

noklam mentioned this pull request Sep 30, 2024

[Stress Testing] Setup Kedro performance test for executing #4200

Open

lrcouto closed this Oct 16, 2024

lrcouto removed a link to an issue Oct 17, 2024

[Stress Testing] - Create example projects to assess Kedro performance for complex pipelines #3866

Open

lrcouto linked an issue Oct 17, 2024 that may be closed by this pull request

[Stress Testing] - Create example projects to assess Kedro performance for complex pipelines #3866

Open

lrcouto removed a link to an issue Oct 17, 2024

[Stress Testing] - Create example projects to assess Kedro performance for complex pipelines #3866

Open
