feat(build): add Spark v3.4.1 (#40)
* update to 3.4.1
Fan Ting Wei authored Sep 11, 2023
1 parent 89131b5 commit 947a1ab
Showing 5 changed files with 52 additions and 7 deletions.
24 changes: 24 additions & 0 deletions .github/workflows/ci.yml
@@ -111,6 +111,30 @@ jobs:
             scala: "2.13"
             with_hive: "true"
             with_pyspark: "true"
+          - spark: "3.4.1"
+            java: "8"
+            hadoop: "3.3.4"
+            scala: "2.12"
+            with_hive: "true"
+            with_pyspark: "true"
+          - spark: "3.4.1"
+            java: "8"
+            hadoop: "3.3.4"
+            scala: "2.13"
+            with_hive: "true"
+            with_pyspark: "true"
+          - spark: "3.4.1"
+            java: "11"
+            hadoop: "3.3.4"
+            scala: "2.12"
+            with_hive: "true"
+            with_pyspark: "true"
+          - spark: "3.4.1"
+            java: "11"
+            hadoop: "3.3.4"
+            scala: "2.13"
+            with_hive: "true"
+            with_pyspark: "true"
     runs-on: ubuntu-20.04
     env:
       IMAGE_NAME: "spark-k8s"
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -3,8 +3,9 @@
 ## v3

 - (Temporarily drop support for R due to keyserver issues)
-- Only supports Spark 3.1.3, 3.2.2, 3.3.0 (dropped 2.4.8).
+- Only supports Spark 3.1.3, 3.2.2, 3.3.0, 3.4.1 (dropped 2.4.8).
 - Supports both Java 8 and 11 for Spark 3 builds.
+- Add an Ubuntu-based image following the migration to eclipse-temurin as the JRE image source.

 ## v2

16 changes: 11 additions & 5 deletions README.md
@@ -12,16 +12,22 @@ Debian:
 - `3.3.0`
 - `3.2.2`
 - `3.1.3`
+- `3.4.1`

 ## Note

 (R builds are temporarily suspended due to keyserver issues at the current time.)

-All the build images here are Debian-based as the official Spark repository now
-uses `openjdk:<java>-jdk-slim-buster` as the base image for Kubernetes build.
-Because currently the official Dockerfiles do not pin the Debian distribution,
-they are incorrectly using the latest Debian `bullseye`, which does not have
-support for Python 2, and its Python 3.9 does not work well with PySpark.
+The build image for Spark 3.4.1 is Ubuntu-based because openjdk is deprecated and,
+going forward, the official Spark repository uses `eclipse-temurin:<java>-jre`,
+for which slim variants of the JRE images are not available at the moment.
+
+All the build images for Spark versions before v3.4.0 are Debian-based, as the
+official Spark repository uses `openjdk:<java>-jre-slim-buster` as the base image
+for the Kubernetes build. Because the official Dockerfiles currently do not pin
+the Debian distribution, they incorrectly use the latest Debian `bullseye`,
+which does not support Python 2, and whose Python 3.9 does not work well
+with PySpark.

 Hence some Dockerfile overrides are in place to make sure that Spark 2 builds
 can still work.
11 changes: 10 additions & 1 deletion make-distribution.sh
@@ -67,6 +67,15 @@ else
     DOCKERFILE_PY="./resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile"
 fi

+if [[ ${SPARK_MAJOR_VERSION} -eq 3 && ${SPARK_MINOR_VERSION} -ge 4 ]]; then # >=3.4
+    # From Spark v3.4.0 onwards, openjdk is not the preferred base image source as it is
+    # deprecated and taken over by eclipse-temurin. slim-buster variants are not available
+    # on eclipse-temurin at the moment.
+    IMAGE_VARIANT="jre"
+else
+    IMAGE_VARIANT="jre-slim-buster"
+fi
+
 # Temporarily remove R build due to keyserver issue
 # DOCKERFILE_R="./resource-managers/kubernetes/docker/src/main/dockerfiles/R/Dockerfile"

@@ -83,7 +92,7 @@ TAG_NAME="${SELF_VERSION}_${SPARK_LABEL}_hadoop-${HADOOP_VERSION}_scala-${SCALA_
 # build

 ./bin/docker-image-tool.sh \
-    -b java_image_tag=${JAVA_VERSION}-jre-slim-buster \
+    -b java_image_tag=${JAVA_VERSION}-${IMAGE_VARIANT} \
     -r "${IMAGE_NAME}" \
     -t "${TAG_NAME}" \
     -f "${DOCKERFILE_BASE}" \
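
The version check and the `java_image_tag` build argument in `make-distribution.sh` can be tried in isolation. The following is a minimal sketch, not the script itself: the helper names `image_variant` and `java_image_tag` are hypothetical, and the way the version string is split into major and minor parts is an assumption, since the diff does not show how `SPARK_MAJOR_VERSION` and `SPARK_MINOR_VERSION` are derived.

```shell
#!/bin/sh
set -eu

# Hypothetical helper (not part of make-distribution.sh): print the base-image
# variant for a Spark version string such as "3.4.1".
image_variant() (
  version="$1"
  major="${version%%.*}"   # text before the first dot (assumed parsing)
  rest="${version#*.}"     # text after the first dot
  minor="${rest%%.*}"      # text before the next dot
  # Same condition as the diff: Spark 3 with a minor version of 4 or higher.
  if [ "${major}" -eq 3 ] && [ "${minor}" -ge 4 ]; then
    echo "jre"
  else
    echo "jre-slim-buster"
  fi
)

# Hypothetical helper: compose the value passed as java_image_tag,
# given a Java version and a Spark version.
java_image_tag() (
  echo "$1-$(image_variant "$2")"
)

java_image_tag 11 3.4.1   # prints 11-jre
java_image_tag 8 3.3.0    # prints 8-jre-slim-buster
```

Note that, as written in the diff, the condition matches only 3.x releases with a minor version of 4 or higher, so a 4.x version would take the `else` branch and select `jre-slim-buster`.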
5 changes: 5 additions & 0 deletions templates/vars.yml
@@ -15,3 +15,8 @@ versions:
     java: ['8', '11']
     hadoop: ['3.3.2']
     scala: ['2.12', '2.13']
+
+  - spark: ['3.4.1']
+    java: ['8', '11']
+    hadoop: ['3.3.4']
+    scala: ['2.12', '2.13']
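
The new `templates/vars.yml` entry lists two Java versions and two Scala versions, which corresponds to the four Spark 3.4.1 entries added to `.github/workflows/ci.yml`. The sketch below only illustrates that cross product. The helper name `expand_matrix` is hypothetical, and how the repository actually renders the template into the workflow file is not shown in this commit.

```shell
#!/bin/sh
set -eu

# Values copied from the new templates/vars.yml entry.
spark="3.4.1"
hadoop="3.3.4"

# Hypothetical helper: print one line per java x scala combination,
# in the same order as the entries added to ci.yml.
expand_matrix() {
  for java in 8 11; do
    for scala in 2.12 2.13; do
      echo "spark=${spark} java=${java} hadoop=${hadoop} scala=${scala}"
    done
  done
}

expand_matrix
```

Running it prints four lines, one for each combination of Java 8 or 11 with Scala 2.12 or 2.13.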
