End to end tests #14

Merged · 23 commits · Mar 25, 2022
81 changes: 81 additions & 0 deletions .github/workflows/e2e-test.yml
@@ -0,0 +1,81 @@
name: End 2 End Tests (Spaceflights Quickstart)

on:
push:
branches:
- develop
pull_request:

jobs:
build:
runs-on: ubuntu-latest
if: github.event.pull_request.draft == false
steps:
- uses: actions/checkout@v2

- name: Setup python
uses: actions/setup-python@v2.2.1
with:
python-version: 3.8

- name: Setup virtualenv
run: |
python -V
python -m pip install virtualenv
virtualenv venv
source venv/bin/activate

- name: Initialize kedro spaceflights project
run: |
pip install . 'kedro<0.18'
kedro new --starter spaceflights --config tests/e2e/starter-config.yml --verbose --checkout=0.17.6

- name: Install project dependencies
run: |
cd ./spaceflights
echo "git+https://github.com/getindata/kedro-vertexai.git@$GITHUB_SHA" >> src/requirements.txt
echo "kedro-docker" >> src/requirements.txt
sed -i '/kedro-telemetry/d' src/requirements.txt
cat src/requirements.txt
> Review comment on `cat src/requirements.txt`
>
> Contributor: Leftover?
>
> Author: You mean this `cat`? I left it intentionally for debugging purposes.

pip install -r src/requirements.txt

- name: Init and update configuration
run: |
cd ./spaceflights
kedro docker init
kedro vertexai init gid-ml-ops-sandbox europe-west4
echo "!data/01_raw" >> .dockerignore
mv ../tests/e2e/catalog.yml conf/base/catalog.yml
mv ../tests/e2e/vertexai.yml conf/base/vertexai.yml

- name: Prepare docker env
uses: docker/setup-buildx-action@v1
id: buildx
with:
install: true

- name: Build pipeline docker image
run: |
cd ./spaceflights
docker build --build-arg BASE_IMAGE=python:3.8-buster --tag kedro-vertexai-e2e:latest --load .

- name: Publish docker image to GCR
uses: mattes/gce-docker-push-action@v1
with:
creds: ${{ secrets.GOOGLE_APPLICATION_CREDENTIALS }}
src: kedro-vertexai-e2e:latest
dst: gcr.io/gid-ml-ops-sandbox/kedro-vertexai-e2e:${{ github.sha }}

- name: Set up GCP Credentials
uses: google-github-actions/auth@v0.6.0
with:
credentials_json: ${{ secrets.GOOGLE_APPLICATION_CREDENTIALS }}
create_credentials_file: true
cleanup_credentials: true

- name: Run project on vertex pipeline
run: |
cd ./spaceflights
export KEDRO_CONFIG_COMMIT_ID=$GITHUB_SHA
kedro vertexai run-once
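The "Install project dependencies" step above patches `src/requirements.txt` with `echo`/`sed`: it appends the plugin pinned to the current commit plus `kedro-docker`, and drops `kedro-telemetry`. The same transformation can be sketched in Python; the function name and sample inputs are illustrative, not part of the workflow:

```python
# Sketch of the workflow's requirements patching: drop kedro-telemetry,
# then append the plugin pinned to the current commit and kedro-docker,
# mirroring the sed/echo steps in the CI job above.
def patch_requirements(lines, commit_sha):
    """Return a patched requirements list (helper name is illustrative)."""
    kept = [line for line in lines if "kedro-telemetry" not in line]
    kept.append(f"git+https://github.com/getindata/kedro-vertexai.git@{commit_sha}")
    kept.append("kedro-docker")
    return kept

reqs = ["kedro==0.17.6", "kedro-telemetry~=0.1", "pandas"]
print(patch_requirements(reqs, "abc123"))
```

The CI step appends before deleting, but the resulting file contents are the same either way.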

2 changes: 2 additions & 0 deletions .gitignore
@@ -10,6 +10,8 @@ out/
.LSOverride
.Trashes

spaceflights

# Vim
*~
.*.swo
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -2,6 +2,7 @@

## [Unreleased]

- Add end 2 end tests based on Kedro Spaceflights quickstart guide from our docs.
- Move service account configuration from env variables to config file. (#7)
- Refactored config to use `pydantic` for validation instead of homemade code. (#1)

4 changes: 2 additions & 2 deletions docs/source/03_getting_started/01_quickstart.md
@@ -27,7 +27,7 @@ $ pip install 'kedro<0.18' kedro-vertexai kedro-docker
With the dependencies in place, let's create a new project:

```
$ kedro new --starter=spaceflights
$ kedro new --starter=spaceflights --checkout=0.17.6

Project Name:
=============
@@ -163,7 +163,7 @@ The usage of `${run_id}` is described in section [Dynamic configuration support]
Execute:

```console
kedro docker build
kedro docker build --build-arg BASE_IMAGE=python:3.8-buster
```

When execution finishes, your docker image is ready. If you don't use local cluster, you should push the image to the remote repository:
2 changes: 1 addition & 1 deletion kedro_vertexai/cli.py
@@ -196,7 +196,7 @@ def init(ctx, project_id, region, with_github_actions: bool):
run_name=run_name,
region=region,
)
config_path = Path.cwd().joinpath("conf/base/vertexai.yaml")
config_path = Path.cwd().joinpath("conf/base/vertexai.yml")
with open(config_path, "w") as f:
f.write(sample_config)
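The fix above changes the generated file name from `vertexai.yaml` to `vertexai.yml`. That write path can be exercised in isolation; a minimal sketch, where the helper name and sample config are illustrative and only the `conf/base/vertexai.yml` location comes from the diff:

```python
from pathlib import Path
import tempfile

# Minimal sketch of the init step's file write: place a generated config at
# conf/base/vertexai.yml (the corrected extension from this diff).
def write_sample_config(project_root: Path, sample_config: str) -> Path:
    config_path = project_root / "conf" / "base" / "vertexai.yml"
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(sample_config)
    return config_path

with tempfile.TemporaryDirectory() as tmp:
    path = write_sample_config(Path(tmp), "project_id: demo\n")
    print(path.name)  # vertexai.yml
```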

59 changes: 59 additions & 0 deletions tests/e2e/catalog.yml
@@ -0,0 +1,59 @@
companies:
type: pandas.CSVDataSet
filepath: data/01_raw/companies.csv
# more about layers in the Data Engineering Convention:
# https://kedro.readthedocs.io/en/stable/03_tutorial/06_visualise_pipeline.html#interact-with-data-engineering-convention
layer: raw

reviews:
type: pandas.CSVDataSet
filepath: data/01_raw/reviews.csv
layer: raw

shuttles:
type: pandas.ExcelDataSet
filepath: data/01_raw/shuttles.xlsx
layer: raw
load_args:
engine: openpyxl

preprocessed_companies:
type: pandas.CSVDataSet
filepath: gs://gid-ml-ops-sandbox-plugin-tests/${run_id}/02_intermediate/preprocessed_companies.csv
layer: intermediate

preprocessed_shuttles:
type: pandas.CSVDataSet
filepath: gs://gid-ml-ops-sandbox-plugin-tests/${run_id}/02_intermediate/preprocessed_shuttles.csv
layer: intermediate

model_input_table:
type: pandas.CSVDataSet
filepath: gs://gid-ml-ops-sandbox-plugin-tests/${run_id}/03_primary/model_input_table.csv
layer: primary

X_train:
type: pickle.PickleDataSet
filepath: gs://gid-ml-ops-sandbox-plugin-tests/${run_id}/05_model_input/X_train.pickle
layer: model_input

y_train:
type: pickle.PickleDataSet
filepath: gs://gid-ml-ops-sandbox-plugin-tests/${run_id}/05_model_input/y_train.pickle
layer: model_input

X_test:
type: pickle.PickleDataSet
filepath: gs://gid-ml-ops-sandbox-plugin-tests/${run_id}/05_model_input/X_test.pickle
layer: model_input

y_test:
type: pickle.PickleDataSet
filepath: gs://gid-ml-ops-sandbox-plugin-tests/${run_id}/05_model_input/y_test.pickle
layer: model_input

regressor:
type: pickle.PickleDataSet
filepath: gs://gid-ml-ops-sandbox-plugin-tests/${run_id}/06_models/regressor.pickle
versioned: true
layer: models
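The `${run_id}` placeholders in the GCS filepaths above are interpolated at runtime (see the quickstart's "Dynamic configuration support" section). The substitution itself can be illustrated with stdlib templating; the plugin's actual mechanism hooks into Kedro's config loading and may differ:

```python
from string import Template

# Illustration only: how a ${run_id} placeholder in a catalog filepath
# resolves to a concrete GCS path once the run id is known.
path = Template(
    "gs://gid-ml-ops-sandbox-plugin-tests/${run_id}/03_primary/model_input_table.csv"
)
print(path.substitute(run_id="20220325-120000"))
```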
3 changes: 3 additions & 0 deletions tests/e2e/starter-config.yml
@@ -0,0 +1,3 @@
project_name: Spaceflights E2E Test
repo_name: spaceflights
python_package: spaceflights
53 changes: 53 additions & 0 deletions tests/e2e/vertexai.yml
@@ -0,0 +1,53 @@
project_id: gid-ml-ops-sandbox
region: europe-west4
run_config:
# Name of the image to run as the pipeline steps
image: gcr.io/gid-ml-ops-sandbox/kedro-vertexai-e2e:${commit_id}

# Pull policy to be used for the steps. Use Always if you push the images
# on the same tag, or Never if you use only local images
image_pull_policy: IfNotPresent

# Location of Vertex AI GCS root
root: gid-ml-ops-sandbox-plugin-tests/staging

# Name of the kubeflow experiment to be created
experiment_name: kedro-vertex-e2e

# Name of the scheduled run, templated with the schedule parameters
scheduled_run_name: kedro-vertex-e2e

# Optional service account to run vertex AI Pipeline with
service_account: vertex-ai-pipelines@gid-ml-ops-sandbox.iam.gserviceaccount.com

# Optional pipeline description
# description: "Very Important Pipeline"

# How long to keep underlying Argo workflow (together with pods and data
# volume after pipeline finishes) [in seconds]. Default: 1 week
ttl: 604800

# Optional network configuration
# network:

# Name of the vpc to use for running Vertex Pipeline
# vpc: my-vpc

# Hosts aliases to be placed in /etc/hosts when pipeline is executed
# host_aliases:
# - ip: 127.0.0.1
# hostnames: me.local

# What Kedro pipeline should be run as the last step regardless of the
# pipeline status. Used to send notifications or raise the alerts
# on_exit_pipeline: notify_via_slack

# Optional section allowing adjustment of the resources, reservations and limits
# for the nodes. When not provided they're set to 500m cpu and 1024Mi memory.
# If you don't want to specify pipeline resources set both to None in __default__.
resources:

# Default settings for the nodes
__default__:
cpu: 500m
memory: 1024Mi
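The comments in the `resources` section describe per-node settings falling back to `__default__`. A sketch of that merge, assuming dict-style config; this is not the plugin's implementation, and the `train_model` node name is hypothetical:

```python
# Sketch of resolving a node's resources against the __default__ section,
# as the config comments above describe: node-specific keys override the
# defaults, unset keys fall back to __default__.
def resolve_resources(resources: dict, node_name: str) -> dict:
    merged = dict(resources.get("__default__", {}))
    merged.update(resources.get(node_name, {}))
    return merged

config = {
    "__default__": {"cpu": "500m", "memory": "1024Mi"},
    "train_model": {"memory": "4Gi"},  # hypothetical node override
}
print(resolve_resources(config, "train_model"))  # {'cpu': '500m', 'memory': '4Gi'}
```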
2 changes: 1 addition & 1 deletion tests/test_cli.py
@@ -163,7 +163,7 @@ def test_init(self, cwd):

assert result.exit_code == 0, result.output
assert result.output.startswith("Configuration generated in ")
with open(path.joinpath("conf/base/vertexai.yaml"), "r") as f:
with open(path.joinpath("conf/base/vertexai.yml"), "r") as f:
cfg = yaml.safe_load(f)
assert isinstance(cfg, dict), "Could not parse config as yaml"
