Incomplete relocation of scala case classes (Apache Spark + Shadow) #400
Comments
@johkelly Did you ever resolve this issue?
I did not. It hasn't been a high enough priority for me to dig further for more info, either.
@johkelly we are now running into this issue as well, trying to shade spark-xml for use in Scala. |
Yes, this also affects me; I am trying to shade json4s to resolve a conflict with Apache Spark.
FYI: apparently a workaround has been proposed here: https://discuss.gradle.org/t/possible-to-build-spark-fat-jar-with-gradle/42235/3 — I haven't verified it yet; you are welcome to post your test results here.
No, it doesn't work; that turns out to be false info.
Apache Spark allows Scala case classes to be used as implicit definitions of schemas for the Dataset structure. Under the hood, this appears to be powered by reflection code buried in the spark-catalyst codebase. I couldn't follow what was actually happening in there; IntelliJ's debugger was having trouble following the codepaths, and they're using some mildly esoteric Scala language features.
The issue arises when the case class being used as a schema has been relocated. A motivating example is com.google.protobuf.timestamp.Timestamp being relocated to avoid a common dependency conflict with Spark/Hadoop libraries, but still being used in a schema as part of a case class compiled from protobuf definitions.
Related to #146, #269, and #318 in that this might be another case of the ASM libraries not being up to the task of dealing with Scala bytecode.
Shadow Version
Tested with 2.0.3 and 2.0.4.
Gradle Version
Tested with 4.4 and 4.6.
Expected Behavior
The Encoder(s) for the relocated case class(es) are generated correctly, and allow Spark to use the relocated class(es) as schemas or as components in a schema.
Actual Behavior
When run on a multi-node cluster:
Full stack trace: https://gist.github.com/johkelly/0c99c7bf717adc610fc906296be02850
When run locally, triggering (what I believe is) the same code:
Gradle Build Script(s)
Sample gradle project:
https://gist.github.com/johkelly/ff78f6c80bcbe38e1e73c598b364395b