From ead0436e8a98eababce2514bf7857fadd785abbe Mon Sep 17 00:00:00 2001
From: Matej Sestak <70316007+sestys@users.noreply.github.com>
Date: Tue, 11 May 2021 09:37:51 +0200
Subject: [PATCH] [Submission] Demo: CI&CD with Kubeflow Pipelines in GCP (#1500)

* init feedback proposal
* Final demo submission
---
 contributions/demo/johhamm-sestak/README.md | 116 ++++++++++++++++++--
 1 file changed, 108 insertions(+), 8 deletions(-)

diff --git a/contributions/demo/johhamm-sestak/README.md b/contributions/demo/johhamm-sestak/README.md
index 0c762c04a1..582165c514 100644
--- a/contributions/demo/johhamm-sestak/README.md
+++ b/contributions/demo/johhamm-sestak/README.md
@@ -1,14 +1,114 @@
-# Demo: Machine Learning CI/CD with Github Actions and Kubernetes
+# MLOps: CI&CD with Kubeflow Pipelines in GCP
-## Members
-* Johan Hammarstedt, johham@kth.se, Github: jhammarstedt
-* Matej Sestak, sestak@kth.se, Github: sestys
-## Topic
-In this demo, we will show how to set up MLops pipeline using Kuberflow and Google Cloud.
+This repo will demonstrate how to take the first step towards MLOps by setting up and deploying an ML CI/CD pipeline using Google Cloud's AI Platform, Kubeflow, and Docker.
-## Format
-The demo will be presented in a 3-5 minute long video showcasing our implementation and results.
+This demo was created as part of an assignment for a DevOps course given at KTH in spring 2021; a video demo is linked below.
+
+## ✍ Authors
+* Johan Hammarstedt, [jhammarstedt](https://github.com/jhammarstedt)
+* Matej Sestak, [sestys](https://github.com/sestys)
+
+## Video
+[YouTube](https://www.youtube.com/watch?v=1DQxoU1s8dw)
+
+## Git repo
+Source code for this demo can be found [here](https://github.com/jhammarstedt/gcloud_MLOPS_demo).
+
+## 🗺 Overview
+The following topics will be covered:
+1. Building each task as a Docker container and running them with Cloud Build
+   * Preprocessing step: loading data from a GCS bucket, editing it, and storing a new file
+   * Training: creating a PyTorch model and building a **custom prediction routine** (GCP mainly supports TensorFlow, but you can add custom models)
+   * Deployment: deploying your custom model to the AI Platform with version control
+2. Creating a Kubeflow pipeline and connecting the above tasks
+3. Performing CI by building GitHub triggers in Cloud Build that rebuild a container upon a push to the repository
+4. Performing CD by using Cloud Functions to trigger the pipeline upon uploading new data to your bucket
+ +
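The CD step (overview item 4) amounts to a Cloud Function subscribed to the bucket's `google.storage.object.finalize` event that kicks off a new pipeline run. A minimal sketch, assuming hypothetical names throughout (the endpoint URL, file name, and function name are not taken from the demo repo):

```python
# main.py -- sketch of a GCS-triggered Cloud Function that starts a pipeline run.
# Deploy with: gcloud functions deploy trigger_pipeline --trigger-resource <bucket>
#              --trigger-event google.storage.object.finalize --runtime python37
def trigger_pipeline(event, context):
    """Start a pipeline run whenever a new object lands in the data bucket."""
    # Import inside the handler so the module also loads where kfp is absent.
    import kfp

    client = kfp.Client(host="https://<your-pipelines-endpoint>")  # placeholder
    client.create_run_from_pipeline_package(
        pipeline_file="demo_pipeline.yaml",          # compiled pipeline (assumed name)
        arguments={"bucket": f"gs://{event['bucket']}"},
    )
    print(f"Started run for gs://{event['bucket']}/{event['name']}")
```

The `event` dict is supplied by Cloud Functions and carries the bucket and object name of the uploaded file, so every new data drop re-runs preprocess, train, and deploy.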
+
+## 🌉 Setting up the pipeline
+Here we will go through the process of running the pipeline step by step:
+
+1. Create a GCP project, open the Cloud Shell (make sure you're in the project), and clone the repository:
+
+   `$ git clone https://github.com/jhammarstedt/gcloud_MLOPS_demo.git`
+
+2. Create a [Kubeflow pipeline](https://console.cloud.google.com/ai-platform/pipelines).
+3. Run the `$ ./scripts/set_auth.sh` script in the Cloud Shell (you may want to change the SA_NAME); this grants the roles we need to run the pipeline.
+4. Create a Docker container for each step (each folder in the `containers` directory represents a different step).
+   * Do this by running `$ ./build_containers.sh` from `gcloud_MLOPS_demo/containers` in the Cloud Shell. This will run `build_single_container.sh` in each directory.
+   * If you wish to build just one container, enter the directory you want to build and run:
+
+     `$ bash ../build_single_container.sh {directory name}`
+
+5. Each subfolder (which will be a container) includes: