DO NOT MERGE - Pipeline performance test project #4154
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,151 @@ | ||
########################## | ||
# KEDRO PROJECT | ||
|
||
# ignore all local configuration | ||
conf/local/** | ||
!conf/local/.gitkeep | ||
|
||
# ignore potentially sensitive credentials files | ||
conf/**/*credentials* | ||
|
||
# ignore everything in the following folders | ||
data/** | ||
|
||
# except their sub-folders | ||
!data/**/ | ||
|
||
# also keep all .gitkeep files | ||
!.gitkeep | ||
|
||
# keep also the example dataset | ||
!data/01_raw/* | ||
|
||
|
||
########################## | ||
# Common files | ||
|
||
# IntelliJ | ||
.idea/ | ||
*.iml | ||
out/ | ||
.idea_modules/ | ||
|
||
### macOS | ||
*.DS_Store | ||
.AppleDouble | ||
.LSOverride | ||
.Trashes | ||
|
||
# Vim | ||
*~ | ||
.*.swo | ||
.*.swp | ||
|
||
# emacs | ||
*~ | ||
\#*\# | ||
/.emacs.desktop | ||
/.emacs.desktop.lock | ||
*.elc | ||
|
||
# JIRA plugin | ||
atlassian-ide-plugin.xml | ||
|
||
# C extensions | ||
*.so | ||
|
||
### Python template | ||
# Byte-compiled / optimized / DLL files | ||
__pycache__/ | ||
*.py[cod] | ||
*$py.class | ||
|
||
# Distribution / packaging | ||
.Python | ||
build/ | ||
develop-eggs/ | ||
dist/ | ||
downloads/ | ||
eggs/ | ||
.eggs/ | ||
lib/ | ||
lib64/ | ||
parts/ | ||
sdist/ | ||
var/ | ||
wheels/ | ||
*.egg-info/ | ||
.installed.cfg | ||
*.egg | ||
MANIFEST | ||
|
||
# PyInstaller | ||
# Usually these files are written by a python script from a template | ||
# before PyInstaller builds the exe, so as to inject date/other infos into it. | ||
*.manifest | ||
*.spec | ||
|
||
# Installer logs | ||
pip-log.txt | ||
pip-delete-this-directory.txt | ||
|
||
# Unit test / coverage reports | ||
htmlcov/ | ||
.tox/ | ||
.coverage | ||
.coverage.* | ||
.cache | ||
nosetests.xml | ||
coverage.xml | ||
*.cover | ||
.hypothesis/ | ||
|
||
# Translations | ||
*.mo | ||
*.pot | ||
|
||
# Django stuff: | ||
*.log | ||
.static_storage/ | ||
.media/ | ||
local_settings.py | ||
|
||
# Flask stuff: | ||
instance/ | ||
.webassets-cache | ||
|
||
# Scrapy stuff: | ||
.scrapy | ||
|
||
# Sphinx documentation | ||
docs/_build/ | ||
|
||
# PyBuilder | ||
target/ | ||
|
||
# Jupyter Notebook | ||
.ipynb_checkpoints | ||
|
||
# pyenv | ||
.python-version | ||
|
||
# celery beat schedule file | ||
celerybeat-schedule | ||
|
||
# SageMath parsed files | ||
*.sage.py | ||
|
||
# Environments | ||
.env | ||
.venv | ||
env/ | ||
venv/ | ||
ENV/ | ||
env.bak/ | ||
venv.bak/ | ||
|
||
# mkdocs documentation | ||
/site | ||
|
||
# mypy | ||
.mypy_cache/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
{} | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
# performance-test | ||
|
||
Maybe it would be more helpful to document how this project should be used; otherwise I suggest removing it, as this template doesn't add much information for us. |
||
|
||
We could also add setup instructions here, just so it's recorded somewhere! |
||
## Overview | ||
|
||
This is a test project that simulates delays in specific parts of a Kedro pipeline. It is a tool for gauging pipeline performance and for comparing in-development changes to Kedro against a stable release. | ||
|
||
## Usage | ||
|
||
There are three delay parameters that can be set in this project: | ||
|
||
**hook_delay** - Simulates slow-loading hooks, for example hooks that perform complex operations or access external services that suffer from latency. | ||
|
||
|
||
**load_delay** - Simulates a delay in loading a dataset, for example because the dataset is large or the connection is slow. | ||
|
||
|
||
**save_delay** - Simulates a delay in saving an output file, for example because of latency when accessing remote storage. | ||
|
||
|
||
When invoking the `kedro run` command, you can pass the desired value in seconds for each delay as a parameter using the `--params` flag. For example: | ||
|
||
`kedro run --params=hook_delay=5,load_delay=5,save_delay=5` |
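
The README does not show how the delays are applied, so the following is only a minimal sketch of the idea, assuming a delay is simulated by sleeping before the real operation runs. The names `with_delay` and `load_expenses` are hypothetical and are not taken from this project or from Kedro.

```python
import time
from typing import Any, Callable, Dict, List


def with_delay(func: Callable[..., Any], delay_seconds: float) -> Callable[..., Any]:
    """Return a wrapper that sleeps for delay_seconds, then calls func.

    Hypothetical helper: illustrates one way a load or save step could be slowed.
    """
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        time.sleep(delay_seconds)  # simulated latency, in seconds
        return func(*args, **kwargs)

    return wrapper


def load_expenses() -> List[Dict[str, Any]]:
    """Stand-in for a dataset load; returns a tiny in-memory sample."""
    return [{"party": "A", "amount": 10.0}, {"party": "B", "amount": 5.0}]


# Assumed parameter values for illustration; the project defaults are all 0.
params = {"hook_delay": 0, "load_delay": 0.05, "save_delay": 0}

slow_load = with_delay(load_expenses, params["load_delay"])

start = time.perf_counter()
rows = slow_load()
elapsed = time.perf_counter() - start
print(f"loaded {len(rows)} rows in {elapsed:.2f}s")
```

Timing a run with a known delay and comparing it against a run with all delays set to 0 gives a rough baseline for how much overhead comes from the pipeline machinery itself.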
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
congress_expenses: | ||
  type: spark.SparkDataset | ||
  filepath: data/gastos-deputados.csv | ||
  file_format: csv | ||
  load_args: | ||
    header: True | ||
    inferSchema: True | ||
|
||
expenses_per_party: | ||
  type: spark.SparkDataset | ||
  filepath: data/output/expenses_per_party.csv | ||
  file_format: csv | ||
  save_args: | ||
    sep: ',' | ||
    header: True | ||
    mode: overwrite | ||
  load_args: | ||
    header: True | ||
    inferSchema: True | ||
|
||
largest_expense_source: | ||
  type: spark.SparkDataset | ||
  filepath: data/output/largest_expense_source.parquet | ||
  file_format: parquet | ||
  save_args: | ||
    sep: ',' | ||
    header: True | ||
    mode: overwrite | ||
|
||
top_spender_per_party: | ||
  type: spark.SparkDataset | ||
  filepath: data/output/top_spender_per_party.csv | ||
  file_format: csv | ||
  save_args: | ||
    sep: ',' | ||
    header: True | ||
    mode: overwrite | ||
  load_args: | ||
    header: True | ||
    inferSchema: True | ||
|
||
top_overall_spender: | ||
  type: spark.SparkDataset | ||
  filepath: data/output/top_overall_spender.parquet | ||
  file_format: parquet | ||
  save_args: | ||
    sep: ',' | ||
    header: True | ||
    mode: overwrite | ||
|
||
top_spending_party: | ||
  type: spark.SparkDataset | ||
  filepath: data/output/top_spending_party.parquet | ||
  file_format: parquet | ||
  save_args: | ||
    sep: ',' | ||
    header: True | ||
    mode: overwrite |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
hook_delay: 0 | ||
load_delay: 0 | ||
save_delay: 0 |
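
The zero defaults above are meant to be overridden at run time with `--params`. The sketch below illustrates the intended effect of that override on a `key=value,key=value` string. It is not Kedro's own parsing code: `overlay_params` is a hypothetical helper, and converting every value to `float` is an assumption made for this example.

```python
from typing import Dict

# Defaults mirroring the parameters file in this diff.
DEFAULTS: Dict[str, float] = {"hook_delay": 0, "load_delay": 0, "save_delay": 0}


def overlay_params(defaults: Dict[str, float], cli_params: str) -> Dict[str, float]:
    """Overlay 'key=value' pairs from a comma-separated string onto defaults.

    Hypothetical helper: Kedro performs its own parsing of --params.
    """
    merged = dict(defaults)  # copy so the defaults stay untouched
    for pair in filter(None, cli_params.split(",")):
        key, _, raw_value = pair.partition("=")
        key = key.strip()
        if key not in merged:
            raise KeyError(f"unknown delay parameter: {key!r}")
        merged[key] = float(raw_value)  # delays are expressed in seconds
    return merged


result = overlay_params(DEFAULTS, "hook_delay=5,load_delay=5,save_delay=5")
print(result)
```

Passing only some of the keys, for example `load_delay=2.5`, leaves the remaining delays at their default of 0.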
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# This is a boilerplate parameters config generated for pipeline 'expense_analysis' | ||
# using Kedro 0.19.8. | ||
# | ||
# Documentation for this file format can be found in "Parameters" | ||
# Link: https://docs.kedro.org/en/0.19.8/configuration/parameters.html |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
# You can define spark specific configuration here. | ||
|
||
spark.driver.maxResultSize: 3g | ||
spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem | ||
spark.sql.execution.arrow.pyspark.enabled: true | ||
|
||
# https://docs.kedro.org/en/stable/integrations/pyspark_integration.html#tips-for-maximising-concurrency-using-threadrunner | ||
spark.scheduler.mode: FAIR |
Can get rid of this folder entirely, it's generated by viz