I'd like to process the output of a MapReduce job with Hive. The output is a SequenceFile with NullWritable as the key and ProtobufWritable as the value.
I try to read the data into Hive:
ADD JAR elephant-bird-core-4.7-SNAPSHOT.jar;
ADD JAR elephant-bird-hadoop-compat-4.7-SNAPSHOT.jar;
ADD JAR elephant-bird-hive-4.7-SNAPSHOT.jar;
ADD JAR ProtobufGeneratedClass.jar;
CREATE EXTERNAL TABLE sessions
ROW FORMAT SERDE "com.twitter.elephantbird.hive.serde.ProtobufDeserializer"
WITH SERDEPROPERTIES ("serialization.class"="Serialization ClassPath")
STORED AS
INPUTFORMAT "org.apache.hadoop.mapred.SequenceFileInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION 'PathToDirectoryOfMROutput';
SELECT COUNT(*) FROM sessions;
and get this exception:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable com.twitter.elephantbird.mapreduce.io.ProtobufWritable@e35fcc5{could not be deserialized}
....
Caused by: java.lang.ClassCastException: com.twitter.elephantbird.mapreduce.io.ProtobufWritable cannot be cast to org.apache.hadoop.io.BytesWritable
How can I read my output structure into Hive?