# KubernetesPodOperator on Airflow

The KubernetesPodOperator is one of Airflow's most powerful operators. It lets you run any container image in a Kubernetes cluster as an Airflow task. This is particularly useful once you have enough DAGs on your Airflow server that you start to run into dependency conflicts: by running a task inside its own image, you can define an environment completely separate from the Airflow environment.

Things you'll need before you can run the KubernetesPodOperator on Airflow:

  1. A Kubernetes cluster, with its kubeconfig file available on your Airflow server for your DAG to point to.
  2. A registry to pull your image from. Here I will simply use the open-source Docker Registry.

## Preparations

This simple demonstration shows how to use secrets and configmaps from a Kubernetes cluster in a KubernetesPodOperator, so the first step is to create those resources in the cluster.

### Create Kubernetes resources

To create the configmap, run

```shell
# kubectl create configmap NAME --from-env-file=/path/to/file
kubectl create configmap airflow-configmap --from-env-file=configmap.txt
```
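The `--from-env-file` flag expects one `KEY=VALUE` pair per line, with each key becoming an entry in the configmap. The contents of `configmap.txt` are not shown in this repository listing, but a hypothetical file could look like:

```
# hypothetical configmap.txt (keys and values are placeholders)
GREETING=hello
TARGET=world
```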

To create the secret, run

```shell
# kubectl create secret generic NAME --from-file=/path/to/file
kubectl create secret generic airflow-secret --from-file=secret.json
```
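With `--from-file`, the whole file is stored under a single key named after the file (`secret.json` here). The real payload is not shown in this listing; a hypothetical one:

```json
{
  "username": "example-user",
  "password": "example-password"
}
```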

## Docker Registry

The next step is to set up the Docker Registry. The docker_registry_command.sh file contains a docker run command for spinning up a registry container. With the registry container up, build the image inside python_image/, tag it so that it points to your registry, then push it:

```shell
cd python_image
docker build -t localhost:5000/python_script .
docker push localhost:5000/python_script
```
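The actual Dockerfile lives in python_image/ and is not reproduced here; an image whose job is just to run script.py could be as simple as this hypothetical sketch:

```dockerfile
# hypothetical Dockerfile — the real one in python_image/ is authoritative
FROM python:3.9-slim
COPY script.py /script.py
CMD ["python", "/script.py"]
```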

To check if the image has been successfully pushed to the registry, go to http://localhost:5000/v2/_catalog.

## kubepod_DAG.py

With these initial steps completed, the kubepod DAG can be run. The DAG contains two tasks. The first is a KubernetesPodOperator that runs script.py from python_image/ inside the cluster; the script reads the values of the secret and the configmap we just created. The second is a BashOperator that reads the output the first task pushed to XCom.