Kryo serialization exception with Spark


André Araújo

Sep 25, 2024, 10:38:06 PM
to kryo-users
Hi, all,

A bit of a n00bie question, but I would appreciate any tips on how to solve this.
The last statement in the following Spark/Scala example fails with the exception shown below.

What am I doing wrong? What's the best/right way to execute this?

Thanks,
André

case class Data(val msg: String)
def transform(msg: String): Data = Data(msg)

// this works locally
List("a").map(transform)

// this also works
sc.parallelize(List("a")).map(transform).count

// but collecting values from the RDD fails with serialization issues
sc.parallelize(List("a")).map(transform).collect

24/09/26 02:02:12 165 ERROR TaskResultGetter: Exception while getting task result
com.esotericsoftware.kryo.KryoException: Unable to find class: [L$line577.$read$$iw$$iw$Data;
at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:160)
at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:693)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:804)
at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:348)
at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:88)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:75)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1919)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:62)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: $line577.$read$$iw$$iw$Data
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
... 13 more
Name: org.apache.spark.SparkException
Message: Job aborted due to stage failure: Exception while getting task result: com.esotericsoftware.kryo.KryoException: Unable to find class: [L$line577.$read$$iw$$iw$Data;
StackTrace:   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1928)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1916)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1915)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1915)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:951)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:951)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:951)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2149)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2098)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2087)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:762)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2079)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2100)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2119)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2144)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:990)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
  at org.apache.spark.rdd.RDD.collect(RDD.scala:989)

j...@durchholz.org

Sep 26, 2024, 12:13:41 PM
to kryo-...@googlegroups.com
On 26.09.24 04:38, André Araújo wrote:
> // but collecting values from the RDD fails with serialization issues

What's the RDD? (I know nothing about Spark.)

> com.esotericsoftware.kryo.KryoException: Unable to find class:
> [L$line577.$read$$iw$$iw$Data;
> at
> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:160)

This means that Kryo cannot find the bytecode for the class
[L$line577.$read$$iw$$iw$Data .
It looks like a class generated by the compiler, likely for a closure;
for some reason, your data contains such a thing, maybe because Scala
chose not to evaluate the closure (no idea if Scala has lazy evaluation)
or because the data structure was explicitly defined to contain a closure.
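
On the Kryo side, the standard way to avoid this kind of name-based
lookup is to register the classes up front, so the stream carries a
small numeric ID instead of a class name that has to be resolved.
Spark apparently exposes that through SparkConf.registerKryoClasses.
A minimal sketch, untested, and assuming Data is compiled into a jar
that both sides have on their classpath rather than typed into the
shell:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // The [L...Data; in the error looks like the JVM name for an
  // Array of Data, so register the array class as well as Data itself.
  .registerKryoClasses(Array(classOf[Data], classOf[Array[Data]]))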

My conjecture would be:

Step 1: The Scala compiler generates that [L$line577.$read$$iw$$iw$Data
class; a data structure with a closure is serialized using that class.
Step 2: The Scala code is modified, or merely recompiled with different
options. Whatever the reason, the class generated for the closure now
has a different name than [L$line577.$read$$iw$$iw$Data .
Step 3: The new version tries to read the serialized data but cannot
interpret it, because it knows the class only under a different name.
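
A self-contained Kryo sketch of that failure mode (untested; Payload is
a made-up stand-in for Data):

import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.{Input, Output}
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}

// Made-up stand-in for Data, with a no-arg constructor so that
// plain Kryo can instantiate it without extra configuration.
class Payload { var msg: String = "" }

val kryo = new Kryo()
kryo.setRegistrationRequired(false) // resolve classes by name, as in the stack trace

val buffer = new ByteArrayOutputStream()
val output = new Output(buffer)
// writeClassAndObject stores the fully qualified class NAME in the stream
kryo.writeClassAndObject(output, new Payload)
output.close()

val input = new Input(new ByteArrayInputStream(buffer.toByteArray))
// This works here because the same class is still on the classpath.
// If the reading side only knows the class under a different name
// (renamed, recompiled, a different shell session), this is the call
// that throws "KryoException: Unable to find class".
val deserialized = kryo.readClassAndObject(input)
input.close()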

Disclaimer: My assumptions about what Scala does may be totally wrong,
or the problem might stem from something else entirely.
I'm just offering an answer since nobody else has responded (yet).

HTH, and if it does not, please ignore.

Regards,
Jo