Tools for building your MLOps/ResearchOps workflows.
Workflows is a container-native workflow engine for orchestrating parallel jobs on Kubernetes. We build it on top of Argo Workflows, which is implemented as a Kubernetes CRD (Custom Resource Definition).
You need to configure the Argo Workflows environment before deploying your own workflows. Fortunately, installing Argo Workflows on a cloud-native Kubernetes cluster is very convenient. Make sure you have kubectl configured correctly on your machine, then run the kubectl apply command with the installation yaml file in the workflows/ folder to install Argo:
# Create specific namespace for ArgoWorkflow
$ kubectl create ns argo
# Install ArgoWorkflow
$ kubectl apply -f workflows/install.yaml -n argo
Optional: Download the latest Argo CLI from the official releases page, which includes a guide on setting it up.
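For reference, a typical CLI installation on a Linux machine looks roughly like this (the version below is only an example; pick the latest release and the binary matching your platform):
# Download and install the Argo CLI (example version and platform)
$ curl -sLO https://github.com/argoproj/argo-workflows/releases/download/v3.4.4/argo-linux-amd64.gz
$ gunzip argo-linux-amd64.gz
$ chmod +x argo-linux-amd64
$ sudo mv argo-linux-amd64 /usr/local/bin/argo
# Verify the installation
$ argo version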
To run Argo workflows that use artifacts, such as the Mnist example we are running, you must configure and use an artifact repository. Argo supports any S3-compatible artifact repository such as AWS S3, GCS (Google Cloud Storage) and MinIO. We use GCS in our examples:
- Create a bucket named mlops-example-bucket from the GCP Console (https://console.cloud.google.com/storage/browser).
- Create the Service Account key and store it as a K8s secret:
# Create specific namespace for Mnist Demo
$ kubectl create ns mnist-demo
# Create secret for GCS used by Mnist workflows
$ kubectl create secret generic mlops-bucket-serviceaccount --from-file=serviceAccountKey=<YOUR-SERVICE-ACCOUNT-KEY-file> -n mnist-demo
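If you prefer the command line over the GCP Console, a rough equivalent (the bucket location is only an example) is:
# Create the GCS bucket (location is an example)
$ gsutil mb -l us-central1 gs://mlops-example-bucket
# Confirm that the secret exists in the mnist-demo namespace
$ kubectl get secret mlops-bucket-serviceaccount -n mnist-demo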
To access cluster resources such as pods and the workflow controller, you need to create a new service account with proper authorization.
$ kubectl create -f workflows/create-serviceaccounts.yaml -n mnist-demo
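You can check that the objects were created, for example:
# List the service account and RBAC bindings created in the mnist-demo namespace
# (check clusterrolebindings as well if create-serviceaccounts.yaml defines cluster-wide access)
$ kubectl get serviceaccounts,rolebindings -n mnist-demo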
All scripts used for Mnist model training and evaluation are in the mnist/ folder. Use the docker build command to build and tag the image:
$ cd workflows/examples/mnist/docker/
$ docker build -t $DOCKER_REGISTRY/$MY_ORG/mnist-example:$TAG ./
$ docker push $DOCKER_REGISTRY/$MY_ORG/mnist-example:$TAG
Scripts used for Mnist serving are in the mnist-serving folder:
$ cd workflows/examples/mnist-serving/docker/
$ docker build -t $DOCKER_REGISTRY/$MY_ORG/mnist-serving:$TAG ./
$ docker push $DOCKER_REGISTRY/$MY_ORG/mnist-serving:$TAG
Feel free to choose your favorite Docker registry (Docker Hub, Huawei Cloud SWR, etc.) and create your organization there. You may need to log in to the registry before pushing.
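As a purely illustrative example with Docker Hub (replace the values with your own registry, organization and tag):
# Example values only
$ export DOCKER_REGISTRY=docker.io
$ export MY_ORG=my-org
$ export TAG=v0.1
# Log in to the registry before pushing
$ docker login $DOCKER_REGISTRY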
After building and pushing the images, specify the image URL in the corresponding yaml file, mnist-train-eval.yaml in this demo.
$ cd workflows
$ vim examples/mnist-train-eval.yaml
# Update the value of 'image' field to your own docker registry url
Then all you have to do is set up the resources with kubectl:
# Setup Mnist workflow:
$ cd workflows
$ kubectl apply -f ./examples/mnist-train-eval.yaml -n mnist-demo
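While the workflow runs, you can watch its progress with kubectl or, if you installed the CLI earlier, with argo:
# Watch the workflow status in the mnist-demo namespace
$ kubectl get wf -n mnist-demo -w
# Or follow the latest workflow and its logs with the Argo CLI
$ argo get @latest -n mnist-demo
$ argo logs @latest -n mnist-demo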
NOTE: Once all three steps in the workflow mnist-train-eval have passed, you can visit the Mnist website at https://MASTER_NODE_IP:9003. Draw a digit and test it.
Argo Events is an event-driven workflow automation framework for Kubernetes which helps you trigger K8s objects, Argo Workflows, serverless workloads, etc. on events from a variety of sources like webhooks, S3, schedules, messaging queues, GCP Pub/Sub, SNS, SQS, etc.
# Create specific namespace for argo events
$ kubectl create namespace argo-events
# Deploy Argo Events, SA, ClusterRoles, Sensor Controller, EventBus Controller and EventSource Controller.
# Cluster-wide Installation
$ kubectl apply -f events/install.yaml
# Or Namespace Installation
$ kubectl apply -f events/namespace-install.yaml
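After the installation you can confirm that the controllers and CRDs are in place:
# Check the Argo Events controller pods
$ kubectl get pods -n argo-events
# Check that the EventBus, EventSource and Sensor CRDs are registered
$ kubectl get crds | grep argoproj.io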
To allow the Sensors to trigger Workflows, a Service Account with RBAC settings is required (assuming you run the examples in the namespace argo-events).
$ kubectl apply -f events/create-serviceaccount.yaml -n argo-events
We are going to set up a sensor and event-source for webhook. The goal is to trigger an Argo workflow upon an HTTP POST request.
- Set up the eventbus.
$ kubectl apply -f events/examples/eventbus_native.yaml -n argo-events
- Create the webhook event source.
$ kubectl apply -f events/examples/webhook/eventsource_webhook.yaml -n argo-events
- Create the webhook sensor.
$ kubectl apply -f events/examples/webhook/sensor_webhook.yaml -n argo-events
If the commands are executed successfully, the eventbus, event-source and sensor pods will get created. You will also notice that a service is created for the event-source.
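You can check them with kubectl and, if the event-source service is not reachable from your machine, forward the webhook port locally (the service name below is an assumption based on Argo Events' default <eventsource-name>-eventsource-svc naming; use the name and port reported by kubectl get svc):
# List the eventbus, event-source and sensor pods plus the event-source service
$ kubectl get pods,svc -n argo-events
# If needed, port-forward the event-source service so that localhost:9100 reaches it
# (service name and port are assumptions; check 'kubectl get svc -n argo-events')
$ kubectl port-forward svc/webhook-eventsource-svc 9100:9100 -n argo-events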
- Use either curl or Postman to send a POST request to http://localhost:9100/example.
$ curl -d '{"message":"this is my first webhook"}' -H "Content-Type: application/json" -X POST http://localhost:9100/example
- Now, you should see an Argo workflow being created.
$ kubectl get wf -n argo-events
- Make sure the workflow pod ran successfully. You will see the message printed in the workflow logs:
 _____________________________
< this is my first webhook >
 -----------------------------
    \
     \
      \
                    ##        .
              ## ## ##       ==
           ## ## ## ##      ===
       /""""""""""""""""___/ ===
  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~
       \______ o          __/
        \    \        __/
          \____\______/
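To fetch those logs you can use either the Argo CLI or plain kubectl, for example:
# Follow the logs of the most recent workflow in the argo-events namespace
$ argo logs @latest -n argo-events
# Or select the workflow pods directly by their Argo label
$ kubectl logs -l workflows.argoproj.io/workflow -n argo-events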
We build CD on top of ArgoCD, which is a declarative, GitOps continuous delivery tool for Kubernetes.
We will create a new namespace, argocd, where Argo CD services and application resources will live.
$ kubectl create namespace argocd
$ kubectl apply -f cd/install.yaml -n argocd
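It may take a minute or two for the Argo CD components to come up; you can watch them with:
# Wait until all Argo CD pods are Running
$ kubectl get pods -n argocd -w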
The initial password for the admin account is auto-generated and stored as clear text in the field password in a secret named argocd-initial-admin-secret in your Argo CD installation namespace. You can simply retrieve this password using kubectl:
$ kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo
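If you also use the argocd CLI, a hypothetical login against the UI endpoint used in this guide, followed by a password change, would look like:
# Log in with the initial admin password (the endpoint is an example; --insecure skips TLS verification)
$ argocd login MASTER_NODE_IP:2747 --username admin --password <INITIAL_PASSWORD> --insecure
# Optionally replace the auto-generated password
$ argocd account update-password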
Open the Argo CD external UI by visiting the IP/hostname (https://MASTER_NODE_IP:2747) in a browser and log in.
You can deploy your first MLOps application in just a few steps; see cd/examples/getting-started-guide.md or ResearchOps/README.md (recommended).
- The current default setting of Argo requires a token to log in; you may need to generate one with the shell script we provide:
$ ./workflows/gen_token.sh
# Copy the output starting with 'Bearer' into the token box of the Argo login interface (https://MASTER_NODE_IP:2746)
# Now you can see all the workflows in the argo namespace at https://MASTER_NODE_IP:2746/workflows/argo
Argo URL: https://MASTER_NODE_IP:2746
- Before you deploy the public network service, please make sure that the firewall policy of your cloud server allows inbound traffic on the required ports, such as 2746 and 9003.
- See more technical details in the Argo Workflows official document and the Argo Events official document.
- See more examples in the Argo Workflows GitHub repository and the Argo Events GitHub repository.
- We plan to build the ultimate developer experience platform on top of the MLOps platform, which is also our original intention in building it: to help models land in production and iterate quickly.