Hello,
I am trying to write Avro objects to HDFS using the PackedAvroScheme. I have an aggregator function which sets the filled Avro object in a Tuple and adds it to the output collector.
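For context, a minimal sketch of the kind of aggregator described above, assuming Cascading 2.x and a generated Avro SpecificRecord class (called `MyRecord` here; all names are illustrative, not from my actual job):

```java
import cascading.flow.FlowProcess;
import cascading.operation.Aggregator;
import cascading.operation.AggregatorCall;
import cascading.operation.BaseOperation;
import cascading.tuple.Fields;
import cascading.tuple.Tuple;

public class AvroAggregator extends BaseOperation<MyRecord> implements Aggregator<MyRecord> {

  public AvroAggregator() {
    super(new Fields("record"));
  }

  @Override
  public void start(FlowProcess flowProcess, AggregatorCall<MyRecord> call) {
    // accumulate into a fresh Avro object per grouping
    call.setContext(new MyRecord());
  }

  @Override
  public void aggregate(FlowProcess flowProcess, AggregatorCall<MyRecord> call) {
    // fill fields of call.getContext() from call.getArguments() ...
  }

  @Override
  public void complete(FlowProcess flowProcess, AggregatorCall<MyRecord> call) {
    // the Avro object rides inside the Tuple, so Hadoop's SerializationFactory
    // needs a serializer registered for its type
    call.getOutputCollector().add(new Tuple(call.getContext()));
  }
}
```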
When I run my job, I see errors like: "Could not load serializer for java.util.ArrayList" from Hadoop Serializer ...
Following some forum discussions, I added "cascading.avro.serialization.AvroSpecificRecordSerialization" and "org.apache.hadoop.io.serializer.JavaSerialization" to the "io.serializations" configuration parameter. But now I am seeing errors like the following in my task logs (during the reduce phase):
java.io.IOException: java.lang.ClassNotFoundException: cascading.tuple.Tuple
at org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer.deserialize(JavaSerialization.java:61)
at org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer.deserialize(JavaSerialization.java:40)
at cascading.tuple.hadoop.TupleSerialization$SerializationElementReader.read(TupleSerialization.java:628)
at cascading.tuple.hadoop.io.HadoopTupleInputStream.readType(HadoopTupleInputStream.java:105)
at cascading.tuple.hadoop.io.HadoopTupleInputStream.getNextElement(HadoopTupleInputStream.java:52)
at cascading.tuple.io.TupleInputStream.readTuple(TupleInputStream.java:78)
at cascading.tuple.io.TupleInputStream.readTuple(TupleInputStream.java:67)
at cascading.tuple.hadoop.io.HadoopTupleInputStream.readIndexTuple(HadoopTupleInputStream.java:58)
at cascading.tuple.io.TupleInputStream.readIndexTuple(TupleInputStream.java:106)
at cascading.tuple.hadoop.io.IndexTupleDeserializer.deserialize(IndexTupleDeserializer.java:38)
at cascading.tuple.hadoop.io.IndexTupleDeserializer.deserialize(IndexTupleDeserializer.java:28)
at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1421)
at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1361)
I am using Cascading 2.6.1 and cascading-avro 2.1.2. Any idea what's going on here?
Thanks!
Sowmi
From: Sowmitra Thallapragada
Sent: May 19, 2015 7:13:17am PDT
To: cascadi...@googlegroups.com
Subject: Serialization errors
cascading.tuple.hadoop.TupleSerialization,org.apache.hadoop.io.serializer.WritableSerialization,cascading.avro.serialization.AvroSpecificRecordSerialization,org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization,org.apache.hadoop.io.serializer.avro.AvroReflectSerialization,org.apache.hadoop.io.serializer.JavaSerialization
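For what it's worth, rather than maintaining that full comma-separated value by hand, Cascading can append serializations to the Hadoop property itself. A sketch, assuming Cascading 2.x and cascading.avro on the job classpath:

```java
import java.util.Properties;
import cascading.tuple.hadoop.TupleSerializationProps;

public class SerializationSetup {
  public static Properties jobProperties() {
    Properties properties = new Properties();
    // Appends to io.serializations rather than overwriting it, so
    // TupleSerialization and WritableSerialization stay registered.
    TupleSerializationProps.addSerialization(properties,
        "cascading.avro.serialization.AvroSpecificRecordSerialization");
    return properties; // hand this to the HadoopFlowConnector constructor
  }
}
```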
From: Sowmitra Thallapragada
Sent: May 19, 2015 7:35:39am PDT
To: cascadi...@googlegroups.com
Subject: Re: Serialization errors
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/66155670-785F-40D5-A888-55B2BB7E91F6%40transpac.com.
--
You received this message because you are subscribed to a topic in the Google Groups "cascading-user" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/cascading-user/ROsxUjQvVfo/unsubscribe.
To unsubscribe from this group and all its topics, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at http://groups.google.com/group/cascading-user.
From: Sowmitra Thallapragada
Sent: May 19, 2015 9:49:01am PDT
2015-05-19 21:25:35,022 INFO [Thread-5] cascading.tuple.collect.SpillableTupleList: attempting to load codec: org.apache.hadoop.io.compress.GzipCodec
2015-05-19 21:25:35,022 INFO [Thread-5] cascading.tuple.collect.SpillableTupleList: found codec: org.apache.hadoop.io.compress.GzipCodec
2015-05-19 21:25:35,031 ERROR [Thread-5] cascading.tuple.hadoop.TupleSerialization$SerializationElementReader: failed deserializing token: 32 with classname: java.util.ArrayList
java.io.IOException: java.lang.ClassNotFoundException: cascading.tuple.Tuple
at org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer.deserialize(JavaSerialization.java:61)
at org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer.deserialize(JavaSerialization.java:40)
at cascading.tuple.hadoop.TupleSerialization$SerializationElementReader.read(TupleSerialization.java:628)
at cascading.tuple.hadoop.io.HadoopTupleInputStream.readType(HadoopTupleInputStream.java:105)
at cascading.tuple.hadoop.io.HadoopTupleInputStream.getNextElement(HadoopTupleInputStream.java:52)
at cascading.tuple.io.TupleInputStream.readTuple(TupleInputStream.java:78)
at cascading.tuple.io.TupleInputStream.readTuple(TupleInputStream.java:67)
Hi Ken,
Finally zeroed in on the problem. For one of the datasets, I was using AvroScheme instead of PackedAvroScheme, and the data I was reading had the schema below. It looks like AvroScheme does not handle serialization/deserialization of the array field well. Are you aware of any issues around that?
{
  "type" : "record",
  "name" : "Object",
  "fields" : [ {
    "name" : "field1",
    "type" : "int"
  }, {
    "name" : "field2",
    "type" : "string"
  }, {
    "name" : "field3",
    "type" : "boolean"
  }, {
    "name" : "field4",
    "type" : {
      "type" : "array",
      "items" : {
        "type" : "record",
        "name" : "field4_type",
        "fields" : [ {
          "name" : "subField1",
          "type" : "string"
        }, {
          "name" : "subField2",
          "type" : "string"
        } ]
      }
    }
  } ]
}
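For anyone hitting the same thing, this is roughly the change that fixed it for me: switching the offending source tap from AvroScheme to PackedAvroScheme, so each record (nested array included) travels as a single SpecificRecord in one tuple field instead of being unpacked into Cascading fields. A sketch assuming cascading.avro 2.1.x and the generated class for the record above (renamed here to `AvroObject` to avoid clashing with `java.lang.Object`; the path is illustrative):

```java
import cascading.avro.PackedAvroScheme;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;

public class TapSetup {
  @SuppressWarnings({"rawtypes", "unchecked"})
  public static Tap source(String inputPath) {
    // Packs the whole record into a single tuple field, so the nested array
    // never goes through AvroScheme's per-field unpacking.
    return new Hfs(new PackedAvroScheme<AvroObject>(AvroObject.getClassSchema()), inputPath);
  }
}
```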
From: Sowmitra Thallapragada
Sent: May 19, 2015 9:15:29pm PDT