Hi,
I'm running a scalding job over a very large dataset.
In its last phase, I get the following exception in the Map Cleanup phase:
Note that running the same job on a smaller dataset does not cause this. Any idea?
ERROR | cascading.tuple.hadoop.TupleSerialization$SerializationElementWriter | failed serializing token: null with classname: com.akamai.csi.jobs.needles.scalding.fpneedle.model.AggregationObject caught Throwable, no trap available, rethrowing |
From: Hagai Attias
Sent: April 16, 2015 12:19:05am PDT
To: cascadi...@googlegroups.com
Subject: [scalding] Spill failed in Map cleanup phase
--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at http://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/7cc48b71-3c07-420e-9d4c-69696916a3dd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
From: Hagai Attias
Sent: April 16, 2015 6:24:37am PDT
To: cascadi...@googlegroups.com
Subject: Re: [scalding] Spill failed in Map cleanup phase
AggregationObject is a custom case class which extends scala.Serializable. It doesn't implement Hadoop's Writable interface explicitly. It does look like a serialization issue that occurs when records are sorted and spilled, but I'm not sure how to solve it.
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1089)
and by that time, the data is already serialized. I think something else is going on here to trigger the IOException.
Caused by: java.io.EOFException
.groupBy { case (k, v) => PolicyHostSelectorKey(k.appId, k.host, k.selector) }
Here's PolicyHostSelectorKey:
case class PolicyHostSelectorKey(appId: String, host: String, selector: String) extends Serializable {
}
object PolicyHostSelectorKey {
  implicit val ord: Ordering[PolicyHostSelectorKey] = Ordering.by(PolicyHostSelectorKey.unapply)
}
And now the error pops when keys are being ordered. Is something wrong with my ordering?
Here's the updated stacktrace:
Caused by: cascading.CascadingException: unable to compare Tuples, likely a CoGroup is being attempted on fields of different types or custom comparators are incorrectly set on Fields, lhs: '7.175E-43' rhs: 'PolicyHostSelectorKey(klrs_16124,somedomain.com,ARGS_NAMES:{"cid":"1090cbb008","options":{"data":{"styles":[{"name":"Image With Caption","props":{"color":{"value":""},"font-family":{"value":""},"font-weight":{"value":""},"text-align":{"value":""},"font-size":{"value":""},"font-style":{"value":""},"line-height":{"value":""}},"selector":".mcnTextContent"}],"numberOfCaptions":1,"captionPosition":"right","captionWidth":"half","captions":[{"image":{"alt":"","width":1024,"src":"https:\/\/domain.com\/98943935dd3a74794c6fe19c8\/images\/77d330cf-ed79-42e9-8918-06950bd9228c.png","height":683},"text":"Your text caption goes here"}],"align":"center","selectedCaption":0},"socket_id":"44455.18049364","html":"<table border)'
at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:91)
at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:33)
at cascading.tuple.hadoop.util.DeserializerComparator.compareTuples(DeserializerComparator.java:160)
... 7 more
Caused by: java.lang.ClassCastException: java.lang.Float cannot be cast to com.akamai.csi.jobs.needles.scalding.fpneedle.model.keys.PolicyHostSelectorKey
at com.akamai.csi.jobs.needles.scalding.fpneedle.model.keys.PolicyHostSelectorKey$$anonfun$1.apply(PolicyHostSelectorKey.scala:19)
at scala.math.Ordering$$anonfun$by$1.apply(Ordering.scala:219)
at scala.math.Ordering$$anonfun$by$1.apply(Ordering.scala:219)
at scala.math.Ordering$$anon$9.compare(Ordering.scala:200)
at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:87)
... 9 more
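For anyone reading the trace above: the ClassCastException says the Ordering was handed a java.lang.Float on the left-hand side, so one way to narrow this down is to check the ordering on its own, outside the job. The following is only a minimal sketch under assumptions, not the original job: the wrapper object OrderingCheck, the sample keys and the helper compareUntyped are all made up for illustration, and the ordering is written as a tuple of fields rather than via unapply. It suggests that an ordering like this sorts keys fine, and that the same exception appears as soon as an untyped caller passes it a value of another type.

```scala
// Hypothetical, self-contained reproduction; names and data are illustrative only.
object OrderingCheck {
  case class PolicyHostSelectorKey(appId: String, host: String, selector: String)

  object PolicyHostSelectorKey {
    // Same intent as the thread's Ordering.by(unapply): compare field by field.
    implicit val ord: Ordering[PolicyHostSelectorKey] =
      Ordering.by((k: PolicyHostSelectorKey) => (k.appId, k.host, k.selector))
  }

  val keys: List[PolicyHostSelectorKey] = List(
    PolicyHostSelectorKey("b", "h2", "s"),
    PolicyHostSelectorKey("a", "h9", "s"),
    PolicyHostSelectorKey("a", "h1", "s"))

  // Sorting values of the expected type works.
  val sorted: List[PolicyHostSelectorKey] = keys.sorted

  // Simulates a caller that holds the comparator without its type parameter
  // and can therefore pass it any object.
  def compareUntyped(lhs: Any, rhs: Any): Int =
    PolicyHostSelectorKey.ord.asInstanceOf[Ordering[Any]].compare(lhs, rhs)

  // True if comparing a Float against a key throws ClassCastException.
  val mismatchFails: Boolean =
    try { compareUntyped(1.0f, keys.head); false }
    catch { case _: ClassCastException => true }

  def main(args: Array[String]): Unit = {
    println(sorted)
    println(s"Float vs key throws ClassCastException: $mismatchFails")
  }
}
```

If the ordering behaves like this in isolation, the more likely place to look is which field the comparator ends up attached to in the flow, rather than the Ordering definition itself.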
mapred.reduce.child.java.opts | -Xmx2576980378 |
mapred.child.java.opts | -Xmx200m |
2015-04-26 15:41:09,749 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2015-04-26 15:41:09,750 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:215)
at com.esotericsoftware.kryo.io.Input.readString(Input.java:448)
at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:157)
at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:146)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
at com.twitter.chill.TraversableSerializer.read(Traversable.scala:43)
at com.twitter.chill.TraversableSerializer.read(Traversable.scala:21)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
at com.twitter.chill.TraversableSerializer.read(Traversable.scala:43)
at com.twitter.chill.TraversableSerializer.read(Traversable.scala:21)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:629)
at com.twitter.chill.SerDeState.readObject(SerDeState.java:58)
at com.twitter.chill.KryoPool.fromBytes(KryoPool.java:105)
at com.twitter.chill.hadoop.KryoDeserializer.deserialize(KryoDeserializer.java:51)
at cascading.tuple.hadoop.TupleSerialization$SerializationElementReader.read(TupleSerialization.java:628)
at cascading.tuple.hadoop.io.HadoopTupleInputStream.readType(HadoopTupleInputStream.java:105)
at cascading.tuple.hadoop.io.HadoopTupleInputStream.getNextElement(HadoopTupleInputStream.java:52)
at cascading.tuple.io.TupleInputStream.readTuple(TupleInputStream.java:78)
at cascading.tuple.hadoop.io.TupleDeserializer.deserialize(TupleDeserializer.java:40)
at cascading.tuple.hadoop.io.TupleDeserializer.deserialize(TupleDeserializer.java:28)
at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1261)
at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1199)
at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:255)
at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:251)
at cascading.flow.hadoop.util.TimedIterator.next(TimedIterator.java:74)
at cascading.flow.hadoop.HadoopGroupByClosure$1.next(HadoopGroupByClosure.java:113)
at cascading.flow.hadoop.HadoopGroupByClosure$1.next(HadoopGroupByClosure.java:71)
at cascading.pipe.joiner.InnerJoin$JoinIterator.next(InnerJoin.java:190)