[SPARK-48664] Add Apache Spark 4.0.0-preview1 Dockerfiles #61
Conversation
Thank you @cloud-fan for publishing the docker image, can we make the CI pass?
I have no idea why it fails...
I just followed #33
wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \
wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \
export GNUPGHOME="$(mktemp -d)"; \
wget -nv -O KEYS https://downloads.apache.org/spark/KEYS; \
@yaooqinn do you know why other Spark versions' dockerfiles are fine without this change?
I'm not sure (it might be because your key is new and theirs are cached).
See related: #54 (comment)
we need to upload the key to https://keys.openpgp.org/upload
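
For context, a minimal sketch of the full verification sequence this snippet leads into, written as plain shell (the `gpg` lines after the downloads are reconstructed from the surrounding discussion, and `GPG_KEY` is a hypothetical variable holding the release key's fingerprint):

```sh
wget -nv -O spark.tgz "$SPARK_TGZ_URL"
wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"
export GNUPGHOME="$(mktemp -d)"
# Either import the whole Apache KEYS file (the approach shown in the diff)...
wget -nv -O KEYS https://downloads.apache.org/spark/KEYS
gpg --batch --import KEYS
# ...or fetch just the release key from the keyserver mentioned above:
# gpg --batch --keyserver hkps://keys.openpgp.org --recv-keys "$GPG_KEY"
gpg --batch --verify spark.tgz.asc spark.tgz
gpgconf --kill all
rm -rf "$GNUPGHOME" spark.tgz.asc KEYS
```

Either way, verification only proves authenticity if the key itself is reachable and trusted, which is why publishing the key to keys.openpgp.org matters here.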
let me check the file permissions.
It seems that we need to bump minikube to fix ITs like apache/spark#44813
# curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
# TODO(SPARK-44495): Resume to use the latest minikube for k8s-integration-tests.
We can remove these comments
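
For reference, a pinned-version CI step might look like this sketch (the version number is an assumption, not taken from this PR; it should match whatever apache/spark#44813 validated):

```sh
MINIKUBE_VERSION=v1.33.1  # assumed version, pick the release the ITs were validated against
curl -LO "https://storage.googleapis.com/minikube/releases/${MINIKUBE_VERSION}/minikube-linux-amd64"
sudo install minikube-linux-amd64 /usr/local/bin/minikube
minikube version  # sanity check before running the integration tests
```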
LGTM, pending CIs
The merge script didn't work for me, I'll just use the merge PR button in GitHub.
@Yikun I think we need to update `versions.json`
"4.0.0-preview1-scala2.13-java17-python3-ubuntu", | ||
"4.0.0-preview1-java17-python3", | ||
"4.0.0-preview1-java17", | ||
"python3-java17" |
With hindsight, I think we should have a stricter naming rule for tags. For example, `python3-java17` is a bad name, as we don't even know which Spark version it is.
I'd propose: `spark_version-scala_version-java_version-[python support]-[r support]-OS`.
If the Spark version only supports one Scala or Java version, we can omit the Scala/Java version in the name.
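
For illustration, tags under this rule could look like the following (the version numbers here are hypothetical):

```
4.0.0-scala2.13-java17-python3-r-ubuntu   # fully qualified
4.0.0-python3                             # scala/java omitted: only one supported version
4.0.0                                     # all defaults
```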
+1 for the name.
Is `-OS` necessary for a JVM app or not? If necessary, do the OS versions or codenames like Focal Fossa also matter?
Ah, we can omit the OS name as well. If one day we publish Docker images with a different OS, we can add the OS back to the tag name.
There are some new cases in 4.0.0, so we can't just take the 3.5 java17 version as a reference:
- java17 is the default version
- scala2.13 is the default version

In the original design's version rule, there are 4 types of special versions:
- latest: the latest version of the current major.minor version
- 3.3.0: the 3.3.0 version with default scala/python3
- python3/r/scala: the latest python3/r/scala version
- 3.3.0-r/3.3.0-scala/3.3.0-python3: the 3.3.0 version with r/scala only/python3
If we publish 4.0.0 someday, it should be:
- 4.0.0-scala2.13-java17-python3-ubuntu:
  "4.0.0-scala2.13-java17-python3-ubuntu"
  "4.0.0-python3",
  "4.0.0", // for users who don't care about the pyspark/java/scala version
  "python3", // for pyspark users who don't care about the java/scala version
  "latest" // for users who don't care about the pyspark/java/scala version
- 4.0.0-scala2.13-java17-r-ubuntu:
  "4.0.0-scala2.13-java17-r-ubuntu"
  "4.0.0-r", // for sparkr 4.0.0 users who don't care about the java/scala version
  "r", // for sparkr users who don't care about the java/scala version
- 4.0.0-scala2.13-java17-ubuntu:
  "4.0.0-scala2.13-java17-ubuntu"
  "4.0.0-scala", // for spark 4.0.0 scala users who don't care about the java/scala version
  "scala", // for spark scala users who don't care about the java/scala version
- 4.0.0-scala2.13-java17-python3-r-ubuntu:
  "4.0.0-scala2.13-java17-python3-r-ubuntu"
so for 4.0.0-preview at this point:
{
  "path": "4.0.0-preview1/scala2.13-java17-python3-ubuntu",
  "tags": [
    "4.0.0-preview1-scala2.13-java17-python3-ubuntu",
    "4.0.0-preview1-python3"
  ]
},
{
  "path": "4.0.0-preview1/scala2.13-java17-r-ubuntu",
  "tags": [
    "4.0.0-preview1-scala2.13-java17-r-ubuntu",
    "4.0.0-preview1-r"
  ]
},
{
  "path": "4.0.0-preview1/scala2.13-java17-ubuntu",
  "tags": [
    "4.0.0-preview1-scala2.13-java17-ubuntu",
    "4.0.0-preview1-scala"
  ]
},
{
  "path": "4.0.0-preview1/scala2.13-java17-python3-r-ubuntu",
  "tags": [
    "4.0.0-preview1-scala2.13-java17-python3-r-ubuntu"
  ]
},
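
As a side note, assuming `versions.json` is a top-level array of entries shaped like the above, the path-to-tags mapping can be eyeballed locally with a one-liner like this (the `jq` invocation is an illustration, not part of the PR):

```sh
jq -r '.[] | "\(.path): \(.tags | join(", "))"' versions.json
```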
It's hard to believe people don't care about spark versions...
So, the idea is: should we remove the `python3`, `scala`, `r` special tags?

original:
- latest: the latest version of the current major.minor version
- 3.3.0: the 3.3.0 version with default scala/python3
- python3/r/scala: the latest python3/r/scala version
- 3.3.0-r/3.3.0-scala/3.3.0-python3: the 3.3.0 version with r/scala only/python3

to be:
- latest: the latest version of the current major.minor version
- 3.3.0: the 3.3.0 version with default scala/python3
- 3.3.0-r/3.3.0-scala/3.3.0-python3: the 3.3.0 version with r/scala only/python3

> It's hard to believe people don't care about spark versions...

- IIRC, the background of `python3`/`r`/`scala` is that there were 3 images before (spark, spark-py, spark-r); then spark --> scala, spark-py --> python3, spark-r --> r.
- BTW, `spark:python3` and `spark:r` are also used by "Run now" on https://spark.apache.org/
also cc @HyukjinKwon
I'm wondering if we should provide a single image with everything supported (Python and R). I don't think package size is a big concern here. We can also provide variants with different Scala and Java versions.
> I'm wondering if we should provide a single image with everything supported (Python and R). I don't think package size is a big concern here.

There was some discussion on it [1] (adding R only adds about 150 MB); the `4.0.0-scala2.13-java17-python3-r-ubuntu` image is the all-in-one image.
yea, the all-in-one image isn't that big. Why do we provide other variants that remove python or r support?
There are 2 reasons, I think:
- To follow Kubernetes best practice and keep image sizes as small as possible.
- As I said before, there were spark-py / spark-r / spark images before; this aligns with the old version images.
I have seen these tags present at https://hub.docker.com/r/apache/spark/tags but absent at https://hub.docker.com/_/spark. It seems that we need a PR like docker-library/official-images#16325 to publish Docker Official Images.
The version that the `latest` tag points to is also outdated.
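
One quick way to confirm what a tag actually resolves to is to ask the image itself (the `/opt/spark` path is an assumption based on the usual Spark image layout):

```sh
docker run --rm apache/spark:latest /opt/spark/bin/spark-submit --version
```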
So sorry for the late reply
### What changes were proposed in this pull request?
- Remove `wget -nv -O KEYS https://downloads.apache.org/spark/KEYS` and `gpg --import KEYS;`; it works but doesn't meet the [security concern](https://github.com/docker-library/official-images?tab=readme-ov-file#security) from DOI.
- Add `add-dockerfiles.sh 4.0.0-preview1` support to address #61 (comment).
- Fix `versions.json`: treat java17 as the default version, so we can remove the java17 tag.

### Why are the changes needed?
Update 4.0.0-preview1.

### Does this PR introduce _any_ user-facing change?
New release.

### How was this patch tested?
`./add-dockerfiles.sh 4.0.0-preview1`, no diff.

Closes #63 from Yikun/4.0.0-preview1.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
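
For reference, the "no diff" check in the test section presumably amounts to something like this (the git commands are an assumption about how "no diff" was verified):

```sh
./add-dockerfiles.sh 4.0.0-preview1  # regenerate the Dockerfiles for the release
git diff --exit-code                 # exits non-zero if generated files drifted
git status --short                   # also catches untracked generated files
```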
### What changes were proposed in this pull request?
Add Dockerfiles for 4.0.0-preview1.

### Why are the changes needed?
New release.

### Does this PR introduce any user-facing change?
New release.

### How was this patch tested?
New tests.