I'm on the master branch to test out the new Kryo 2.0 serialization. The value I'm trying to broadcast is a trove hashmap of several GB, and I use com.esotericsoftware.kryo.serializers.JavaSerializer for the hashmap. I can confirm that when used directly with Kryo, the trove collection is serialized and deserialized normally with com.esotericsoftware.kryo.serializers.JavaSerializer. However, I encounter an EOFException when using it with Spark, during the stage that deserializes the broadcast variable:
13/02/04 20:02:39 INFO cluster.TaskSetManager: Loss was due to java.io.EOFException
at spark.KryoDeserializationStream.readObject(KryoSerializer.scala:44)
at spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:129)
at spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:40)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at java.io.ObjectStreamClass.invokeReadObject(Unknown Source)
at java.io.ObjectInputStream.readSerialData(Unknown Source)
at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
at java.io.ObjectInputStream.readSerialData(Unknown Source)
at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
at java.io.ObjectInputStream.readSerialData(Unknown Source)
at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
at java.io.ObjectInputStream.readSerialData(Unknown Source)
at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.readObject(Unknown Source)
at spark.scheduler.ShuffleMapTask.readExternal(ShuffleMapTask.scala:115)
at java.io.ObjectInputStream.readExternalData(Unknown Source)
at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.readObject(Unknown Source)
at spark.JavaDeserializationStream.readObject(JavaSerializer.scala:23)
at spark.JavaSerializerInstance.deserialize(JavaSerializer.scala:45)
at spark.executor.Executor$TaskRunner.run(Executor.scala:93)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
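For comparison, here is a JDK-only sketch of the kind of standalone round trip described above. It is hypothetical: java.util.HashMap stands in for the trove map, plain java.io serialization stands in for Kryo's JavaSerializer (which uses Java's built-in serialization), and the names RoundTripCheck, write, read and failsWithEof are made up for illustration. It also shows that reading a truncated copy of the same bytes fails with java.io.EOFException, i.e. the reader ran out of bytes before the object was complete; whether that is what happens in the broadcast path is not confirmed.

```java
import java.io.*;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.HashMap;

// Hypothetical standalone check: java.util.HashMap stands in for the trove map and
// plain Java serialization stands in for Kryo's JavaSerializer.
public class RoundTripCheck {

    // Build a sample map with n entries (illustrative data only).
    static HashMap<Integer, Integer> sampleMap(int n) {
        HashMap<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(i, i * 2);
        }
        return map;
    }

    // Serialize the value to a file; try-with-resources closes (and therefore flushes) the stream.
    static long write(Serializable value, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(value);
        }
        return file.length();
    }

    // Deserialize a single object from the file.
    static Object read(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return in.readObject();
        }
    }

    // Copy only the first keepBytes bytes of the file and try to read that copy back.
    // Returns true if the read fails with EOFException (the stream ended too early).
    static boolean failsWithEof(File file, int keepBytes) throws IOException, ClassNotFoundException {
        byte[] all = Files.readAllBytes(file.toPath());
        File truncated = File.createTempFile("roundtrip-truncated", ".bin");
        truncated.deleteOnExit();
        Files.write(truncated.toPath(), Arrays.copyOf(all, keepBytes));
        try {
            read(truncated);
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("roundtrip", ".bin");
        file.deleteOnExit();
        HashMap<Integer, Integer> original = sampleMap(100_000);
        long bytes = write(original, file);
        System.out.println("wrote " + bytes + " bytes");
        System.out.println("round trip equal: " + original.equals(read(file)));
        System.out.println("half-length copy fails with EOF: " + failsWithEof(file, (int) (bytes / 2)));
    }
}
```

Running main prints the number of bytes written, whether the full round trip reproduces the map, and whether the half-length copy fails with EOFException.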
Any suggestions on how to debug this?
Thanks,
Jiacheng Guo