Incomplete relocation of scala case classes (Apache Spark + Shadow) #400
Comments
@johkelly Did you ever resolve this issue?
I did not. It hasn't been a high enough priority for me to dig further for more info, either.
@johkelly we are now running into this issue as well, trying to shade spark-xml for use in Scala. |
Yes, this also affects me; I am trying to shade json4s to resolve a conflict with Apache Spark.
FYI: apparently a workaround has been proposed here: https://discuss.gradle.org/t/possible-to-build-spark-fat-jar-with-gradle/42235/3 — I haven't verified it yet; you are welcome to post your test results here.
No, it doesn't work; that turns out to be false info.
Apache Spark allows Scala case classes to be used as implicit definitions of schemas for the Dataset structure. Under the hood, this appears to be powered by reflection code buried in the spark-catalyst codebase. I couldn't follow what was actually happening in there; IntelliJ's debugger was having trouble following the codepaths, and they're using some mildly esoteric Scala language features.
The issue arises when the case class being used as a schema has been relocated. A motivating example is com.google.protobuf.timestamp.Timestamp being relocated to avoid a common dependency conflict with Spark/Hadoop libraries, but still being used in a schema as part of a case class compiled from protobuf definitions.
Related to #146, #269, and #318 in that this might be another case of the ASM libraries not being up to the task of dealing with Scala bytecode.
Shadow Version
Tested with 2.0.3 and 2.0.4.
Gradle Version
Tested with 4.4 and 4.6.
Expected Behavior
The Encoder(s) for the relocated case class(es) are generated correctly, and allow Spark to use the relocated class(es) as schemas or as components in a schema.
Actual Behavior
When run on a multi-node cluster:
Full stack trace: https://gist.github.com/johkelly/0c99c7bf717adc610fc906296be02850
When run locally, triggering (what I believe is) the same code:
Gradle Build Script(s)
Sample gradle project:
https://gist.github.com/johkelly/ff78f6c80bcbe38e1e73c598b364395b