ClassCastException while reading JSON protobuf data through Hive external table using EB


Narayanan K

Mar 30, 2014, 4:56:47 PM
to elephant...@googlegroups.com
Hi,

We have JSON data stored in protobuf format. I was trying to create an external table on top of these protobuf files in HDFS, following the "Reading Protocol Buffers" section of https://github.com/kevinweil/elephant-bird/wiki/How-to-use-Elephant-Bird-with-Hive

I added the following jars (and also included them in the auxpath):

add jar /homes/xyz/elephant-bird-hive-3.0.5.jar;
add jar /homes/xyz/impression.jar;
add jar /homes/xyz/elephant-bird-core-3.0.5.jar;
add jar /homes/xyz/Impproto.jar;


CREATE EXTERNAL TABLE EXT_IMP_PROTO
PARTITIONED BY (dt string)
ROW FORMAT SERDE 'com.twitter.elephantbird.hive.serde.ProtobufDeserializer'
with serdeproperties (
    "serialization.class"="org.myorg.ImpProto.Data")
LOCATION '/user/xyz/data/';

ALTER TABLE EXT_IMP_PROTO ADD PARTITION(dt='2014032504') LOCATION '/user/xyz/data/2014032504/';

The table and partition were created successfully.

DESCRIBE EXT_IMP_PROTO;
gives the expected schema for the table, matching the proto class.

But when I run either of these queries:

select * from EXT_IMP_PROTO where dt='2014032504';
select userid from EXT_IMP_PROTO where dt='2014032504';

I get the following error:

Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
Time taken: 1.656 seconds


I also tried altering the table to set the input format and output format to com.twitter.elephantbird.mapred.input.DeprecatedRawMultiInputFormat and org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, respectively, but I still get the ClassCastException.

Is there something I am missing here?

It would be great if someone could tell me why I am getting this exception.

Regards
Narayan

Narayanan K

Apr 8, 2014, 10:29:00 AM
to elephant...@googlegroups.com
Hi,

Just checking if someone has any suggestions on this problem.

Regards
Narayanan

Børge Svingen

May 6, 2014, 5:35:35 PM
to elephant...@googlegroups.com

On Sunday, March 30, 2014 10:56:47 PM UTC+2, Narayanan K wrote:
 
We have Json data in protobuf format. I was trying to create an external table on top of the Protobuf files of Json data in HDFS following the https://github.com/kevinweil/elephant-bird/wiki/How-to-use-Elephant-Bird-with-Hive – Reading Protocol Buffers.

...
 
I also tried altering the table to add inputformat and outformat as com.twitter.elephantbird.mapred.input.DeprecatedRawMultiInputFormat and org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat respectively. But still I am getting the ClassCastException.

Is there something that I am missing here ?
 
Am I correct in assuming that your protobuf file has been created using writeDelimitedTo?

If so, you need to provide an input format class capable of reading delimited protobuf files. As far as I know, none of the EB classes do this yet.
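For readers unfamiliar with the delimited layout: writeDelimitedTo writes each message's byte length as a base-128 varint, followed by the message bytes. The following is only a minimal sketch of splitting such a stream into per-message byte arrays, in plain Java with no Hadoop, protobuf, or Elephant Bird dependency. The class name DelimitedFrames and its methods are hypothetical names chosen for illustration; this is not the input format class mentioned in this thread, and a real Hive input format would still have to wrap each payload in a BytesWritable and handle file splits.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a varint-length-delimited stream into raw frames.
final class DelimitedFrames {

    // Decodes a 32-bit varint whose first byte has already been read.
    static int readVarint32(int firstByte, InputStream in) throws IOException {
        int result = firstByte & 0x7F;
        int b = firstByte;
        // The high bit of each byte says whether another byte follows.
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            if (shift >= 35) {
                throw new IOException("malformed varint length prefix");
            }
            b = in.read();
            if (b == -1) {
                throw new EOFException("stream ended inside a length prefix");
            }
            result |= (b & 0x7F) << shift;
        }
        return result;
    }

    // Reads frames until the stream ends cleanly on a frame boundary.
    static List<byte[]> readFrames(InputStream in) throws IOException {
        List<byte[]> frames = new ArrayList<>();
        int first;
        while ((first = in.read()) != -1) {
            int length = readVarint32(first, in);
            if (length < 0) {
                throw new IOException("negative frame length");
            }
            byte[] payload = new byte[length];
            int off = 0;
            while (off < length) {
                int n = in.read(payload, off, length - off);
                if (n == -1) {
                    throw new EOFException("stream ended inside a message");
                }
                off += n;
            }
            frames.add(payload);
        }
        return frames;
    }

    // Writes one frame in the same layout: varint length, then the payload.
    static void writeFrame(OutputStream out, byte[] payload) throws IOException {
        int n = payload.length;
        while ((n & ~0x7F) != 0) {
            out.write((n & 0x7F) | 0x80);
            n >>>= 7;
        }
        out.write(n);
        out.write(payload);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeFrame(out, new byte[] {10, 20, 30});
        writeFrame(out, new byte[300]);
        List<byte[]> frames = readFrames(new ByteArrayInputStream(out.toByteArray()));
        for (byte[] frame : frames) {
            System.out.println("frame of " + frame.length + " bytes");
        }
    }
}
```

Each byte array returned by readFrames would then be the serialized form of one message, which is the kind of raw payload a protobuf deserializer expects to receive as bytes rather than as a line of text.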

I have such a class myself, and it is working fine. I plan to add it to EB, but so far I can only get it to work with EB 3.0.3. I believe this is due to the same problem as the one described at https://groups.google.com/d/msg/elephantbird-dev/L--OOt_N2K0/VSxf_faBRcQJ


-- 

Børge Svingen

Ramkumar Karunanidhi

Jun 11, 2015, 2:25:34 PM
to elephant...@googlegroups.com
Hi,
Were you able to resolve this issue? I am facing the same problem. Please share the solution if you have one.