Incomplete relocation of scala case classes (Apache Spark + Shadow) #400

Open
johkelly opened this issue Sep 17, 2018 · 6 comments

@johkelly

Apache Spark allows Scala case classes to be used as implicit schema definitions for the Dataset structure. Under the hood, this appears to be powered by reflection code buried in the spark-catalyst codebase. I couldn't follow what was actually happening in there: IntelliJ's debugger had trouble following the code paths, and the code uses some mildly esoteric Scala language features.
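For readers unfamiliar with the feature, here is a minimal sketch of a case class acting as a Dataset schema. The `Event` class, the object name, and the local master setting are hypothetical and chosen only for illustration; they are not taken from the sample project.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical case class, used only for illustration.
// It is declared at the top level so Spark can derive an Encoder for it.
final case class Event(id: Long, name: String)

object CaseClassSchemaSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark session is enough to show the behavior.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("case-class-schema-sketch")
      .getOrCreate()

    // Brings the implicit Encoder derivation for case classes into scope.
    import spark.implicits._

    // The implicit Encoder[Event] is derived from the case class definition,
    // so the Dataset's columns follow the fields of Event.
    val ds: Dataset[Event] = Seq(Event(1L, "a"), Event(2L, "b")).toDS()
    ds.printSchema()

    spark.stop()
  }
}
```

The encoder is resolved through the `TypeTag` of the case class, which is why the class metadata seen by Scala reflection matters here.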

The issue arises when the case class being used as a schema has been relocated. A motivating example is com.google.protobuf.timestamp.Timestamp being relocated to avoid a common dependency conflict with spark/hadoop libraries, but still being used in a schema as part of a case class compiled from protobuf definitions.

Related to #146, #269, and #318, in that this might be another case of the ASM libraries not being up to the task of dealing with Scala bytecode.

Shadow Version

Tested with 2.0.3 and 2.0.4.

Gradle Version

Tested with 4.4 and 4.6.

Expected Behavior

The Encoder(s) for the relocated case class(es) are generated correctly and allow Spark to use the relocated class(es) as schemas or as components of a schema.

Actual Behavior

When run on a multi-node cluster:

java.lang.AssertionError: assertion failed: unsafe symbol Timestamp (child of package timestamp) in runtime reflection universe

Full stack trace: https://gist.github.com/johkelly/0c99c7bf717adc610fc906296be02850

When run locally, triggering (what I believe is) the same code:

Exception in thread "main" java.lang.UnsupportedOperationException: No Encoder found for encoder_relocation.relocated.RelocatedClass
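One way to surface either failure without a cluster is to force encoder derivation in a small check and run it against the shaded jar. The following is only a sketch: `Inner` and `Outer` are hypothetical stand-ins for a relocated class and the case class that embeds it, not the classes from the sample project.

```scala
import org.apache.spark.sql.Encoders
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for a class that the build relocates.
final case class Inner(seconds: Long, nanos: Int)

// Hypothetical schema class that embeds the relocated type.
final case class Outer(id: String, inner: Inner)

object EncoderSmokeCheck {
  def main(args: Array[String]): Unit = {
    // Encoders.product derives the encoder through Scala reflection and
    // does not need a running SparkSession.
    Try(Encoders.product[Outer]) match {
      case Success(encoder) =>
        // Derivation worked: print the schema Spark inferred.
        println(encoder.schema.treeString)
      case Failure(error) =>
        // Derivation failed, for example with "No Encoder found for ...".
        println(s"Encoder derivation failed: $error")
    }
  }
}
```

Running the same check against the unshaded build and then the shaded jar should show whether relocation is what breaks the derivation.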

Gradle Build Script(s)

Sample gradle project:
https://gist.github.com/johkelly/ff78f6c80bcbe38e1e73c598b364395b

@murphp15

@johkelly Did you ever resolve this issue?
We are facing the same issue now.

@johkelly
Author

I did not. It hasn't been high enough priority for me to dig further for more info, either.

@mfawcett

@johkelly we are now running into this issue as well, trying to shade spark-xml for use in Scala.

@tribbloid

Yes, this also affects me. I'm trying to relocate json4s to work around a conflict with Apache Spark.

@tribbloid

FYI: apparently a workaround has been proposed here:

https://discuss.gradle.org/t/possible-to-build-spark-fat-jar-with-gradle/42235/3

I haven't verified it yet; you are welcome to post your test results here.

@tribbloid

tribbloid commented Sep 18, 2022

No, it doesn't work; it turned out to be false information:

Symbol 'term org.json4s' is missing from the classpath.
This symbol is required by ' <none>'.
Make sure that term json4s is in your classpath and check for conflicting dependencies with `-Ylog-classpath`.
A full rebuild may help if 'package.class' was compiled against an incompatible version of org.
