feat(spark): integrate Spark operator in Kubeflow manifests #2889

Merged · 21 commits · Oct 15, 2024
`.github/workflows/spark_test.yaml` (new file, +41 lines)

```yaml
name: Build & Apply Spark manifest in KinD
# Note: the format_YAML_files check flags the bare `on:` key below with
# yamllint warning "[truthy] truthy value should be one of [false, true]".
on:
  pull_request:
    paths:
      - tests/gh-actions/install_KinD_create_KinD_cluster_install_kustomize.sh
      - .github/workflows/spark_test.yaml
      - contrib/spark/**

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Install KinD, Create KinD cluster and Install kustomize
        run: ./tests/gh-actions/install_KinD_create_KinD_cluster_install_kustomize.sh

      - name: Install Istio
        run: ./tests/gh-actions/install_istio.sh

      - name: Install oauth2-proxy
        run: ./tests/gh-actions/install_oauth2-proxy.sh

      - name: Install cert-manager
        run: ./tests/gh-actions/install_cert_manager.sh

      - name: Create kubeflow namespace
        run: kustomize build common/kubeflow-namespace/base | kubectl apply -f -

      - name: Install KF Multi Tenancy
        run: ./tests/gh-actions/install_multi_tenancy.sh

      - name: Create KF Profile
        run: kustomize build common/user-namespace/base | kubectl apply -f -

      - name: Build & Apply manifests
        run: |
          cd contrib/spark/
          export KF_PROFILE=kubeflow-user-example-com
          make test
```
`contrib/spark/Makefile` (new file, +11 lines)

```makefile
SPARK_OPERATOR_RELEASE_VERSION ?= 2.0.1
SPARK_OPERATOR_HELM_CHART_REPO ?= https://kubeflow.github.io/spark-operator

.PHONY: spark-operator/base
spark-operator/base:
	mkdir -p spark-operator/base
	cd spark-operator/base && helm template --include-crds spark-operator spark-operator --version ${SPARK_OPERATOR_RELEASE_VERSION} --repo ${SPARK_OPERATOR_HELM_CHART_REPO} > resources.yaml

.PHONY: test
test:
	./test.sh ${KF_PROFILE}
```
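The `spark-operator/base` target renders the upstream Helm chart (CRDs included) into a static `resources.yaml` for kustomize. A sketch of overriding the pinned version at invocation time, assuming `helm` is installed; the `2.0.2` version here is hypothetical, not part of this PR:

```sh
# Hypothetical: regenerate the manifest against a newer chart release
make spark-operator/base SPARK_OPERATOR_RELEASE_VERSION=2.0.2
```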
`contrib/spark/OWNERS` (new file, +5 lines)

```yaml
approvers:
  - juliusvonkohout
reviewers:
  - juliusvonkohout
  - GezimSejdiu
```
`contrib/spark/README.md` (new file, +26 lines)

# Kubeflow Spark Operator

[![Integration Test](https://github.com/kubeflow/spark-operator/actions/workflows/integration.yaml/badge.svg)](https://github.com/kubeflow/spark-operator/actions/workflows/integration.yaml) [![Go Report Card](https://goreportcard.com/badge/github.com/kubeflow/spark-operator)](https://goreportcard.com/report/github.com/kubeflow/spark-operator)

## What is Spark Operator?

The Kubernetes Operator for Apache Spark aims to make specifying and running [Spark](https://github.com/apache/spark) applications as easy and idiomatic as running other workloads on Kubernetes. It uses
[Kubernetes custom resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) for specifying, running, and surfacing status of Spark applications.

## Overview

For a complete reference of the custom resource definitions, please refer to the [API Definition](docs/api-docs.md). For details on its design, please refer to the [Architecture](https://www.kubeflow.org/docs/components/spark-operator/overview/#architecture). The operator requires Spark 2.3 or above, the versions that support Kubernetes as a native scheduler backend.

The Kubernetes Operator for Apache Spark currently supports the following list of features:

* Supports Spark 2.3 and up.
* Enables declarative application specification and management of applications through custom resources.
* Automatically runs `spark-submit` on behalf of users for each `SparkApplication` eligible for submission.
* Provides native [cron](https://en.wikipedia.org/wiki/Cron) support for running scheduled applications.
* Supports customization of Spark pods beyond what Spark natively supports, via a mutating admission webhook, e.g., mounting ConfigMaps and volumes, and setting pod affinity/anti-affinity.
* Supports automatic application re-submission when a `SparkApplication` object's specification is updated.
* Supports automatic application restart with a configurable restart policy.
* Supports automatic retries of failed submissions with optional linear back-off.
* Supports mounting local Hadoop configuration as a Kubernetes ConfigMap automatically via `sparkctl`.
* Supports automatically staging local application dependencies to Google Cloud Storage (GCS) via `sparkctl`.
* Supports collecting and exporting application-level metrics and driver/executor metrics to Prometheus.
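The features above revolve around the `SparkApplication` custom resource. A minimal sketch of one, for orientation only; the name, image tag, service account, and resource sizes are illustrative, not taken from this PR:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi                       # illustrative
  namespace: kubeflow-user-example-com
spec:
  type: Scala
  mode: cluster
  image: spark:3.5.0                   # illustrative image tag
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
  sparkVersion: "3.5.0"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: 512m
    serviceAccount: default-editor     # illustrative
  executor:
    instances: 1
    cores: 1
    memory: 512m
```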
`contrib/spark/UPGRADE.md` (new file, +6 lines)

# Upgrading
```sh
# Step 1: Update SPARK_OPERATOR_RELEASE_VERSION in Makefile
# Step 2: Create new Spark operator manifest
make spark-operator/base
```
`contrib/spark/spark-operator/base/aggregated-roles.yaml` (new file, +65 lines)

```yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubeflow-spark-admin
  labels:
    app: spark-operator
    app.kubernetes.io/name: spark-operator
    rbac.authorization.kubeflow.org/aggregate-to-kubeflow-admin: "true"
rules: []
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubeflow-spark-edit
  labels:
    app: spark-operator
    app.kubernetes.io/name: spark-operator
    rbac.authorization.kubeflow.org/aggregate-to-kubeflow-edit: "true"
    rbac.authorization.kubeflow.org/aggregate-to-kubeflow-admin: "true"
rules:
  - apiGroups:
      - sparkoperator.k8s.io
    resources:
      - sparkapplications
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - sparkoperator.k8s.io
    resources:
      - sparkapplications/status
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubeflow-spark-view
  labels:
    app: spark-operator
    app.kubernetes.io/name: spark-operator
    rbac.authorization.kubeflow.org/aggregate-to-kubeflow-view: "true"
rules:
  - apiGroups:
      - sparkoperator.k8s.io
    resources:
      - sparkapplications
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - sparkoperator.k8s.io
    resources:
      - sparkapplications/status
    verbs:
      - get
```
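These ClusterRoles are never bound directly: they carry Kubeflow aggregation labels, and Kubernetes merges their rules into any ClusterRole whose `aggregationRule` selects those labels. A minimal sketch of how a Kubeflow-side parent role would pick up the `edit` rules; the parent role's name here is illustrative, not part of this PR:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubeflow-edit        # illustrative parent role
aggregationRule:
  clusterRoleSelectors:
    - matchLabels:
        rbac.authorization.kubeflow.org/aggregate-to-kubeflow-edit: "true"
rules: []                    # filled in automatically from matching ClusterRoles
```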
`contrib/spark/spark-operator/base/kustomization.yaml` (new file, +22 lines)

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - resources.yaml
  - aggregated-roles.yaml
namespace: kubeflow
patches:
  # Add securityContext to Spark Operator Pod.
  - target:
      kind: Deployment
      labelSelector: "app.kubernetes.io/name=spark-operator"
    patch: |-
      - op: add
        path: /spec/template/spec/containers/0/securityContext
        value:
          runAsUser: 1000
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
          runAsNonRoot: true
          seccompProfile:
            type: RuntimeDefault
```

Review thread on the `runAsUser: 1000` line of this patch:

> The operator is based on the Spark image, which uses 185 as its non-root user.

> **Contributor (author):** Good catch. Let me then remove this, as we have the non-root user by default on Spark.

> **Member:** Just `runAsNonRoot: true` is enough then.
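Following the review thread's conclusion, the `runAsUser: 1000` line would be dropped so the image's own non-root user (185) is kept. A sketch of the resulting patch value:

```yaml
value:
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
  runAsNonRoot: true
  seccompProfile:
    type: RuntimeDefault
```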