Protobuf serialized data stored in a hive table


ian.von...@rd.io

Mar 21, 2013, 6:09:16 PM
to elephant...@googlegroups.com
Sorry to start a second thread. I am trying to build one table of Thrift-serialized data and one of protobuf-serialized data so that eventually I can compare the two. On the protobuf side, I cannot seem to create a table; I get a NullPointerException. I have the following .proto file:

package PlayEvent;

option java_package = "rdio.hive_extensions";

message PlayEventMessage {
    optional string tid = 1;  // transaction id
    ....
}

After running protoc play_event.proto --java_out=. --python_out=. and then javac to compile the generated sources into class files, I created a jar of all the class files:

$ jar -tf PlayEventProtos.jar 
META-INF/
META-INF/MANIFEST.MF
rdio/hive_extensions/PlayEvent$1.class
rdio/hive_extensions/PlayEvent.class
rdio/hive_extensions/PlayEvent$PlayEventMessage$Builder.class
rdio/hive_extensions/PlayEvent$PlayEventMessage.class
rdio/hive_extensions/PlayEvent$PlayEventMessageOrBuilder.class
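The jar listing explains the `$` in the class name. The .proto file sets java_package but not java_outer_classname, so protoc wraps the generated message class in an outer class named after the file: play_event.proto becomes PlayEvent. As a result, the message class's JVM binary name is rdio.hive_extensions.PlayEvent$PlayEventMessage.

Here is a minimal Java sketch of that file-name-to-class-name conversion. It is a simplified approximation of protoc's behavior and ignores edge cases such as a name clash with a message type. The class name OuterClassName is hypothetical.

```java
// Simplified approximation (not protoc's actual code) of how the default Java
// outer class name is derived from a .proto file name when java_outer_classname
// is not set. Edge cases such as name clashes are deliberately ignored.
public class OuterClassName {

    /** Converts e.g. "play_event.proto" into "PlayEvent". */
    public static String fromProtoFile(String fileName) {
        // Drop any directory prefix and the ".proto" extension.
        String base = fileName.substring(fileName.lastIndexOf('/') + 1);
        if (base.endsWith(".proto")) {
            base = base.substring(0, base.length() - ".proto".length());
        }
        StringBuilder out = new StringBuilder();
        boolean capNext = true; // the first letter is capitalized
        for (char c : base.toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                out.append(capNext ? (char) (c - 'a' + 'A') : c);
                capNext = false;
            } else if (c >= 'A' && c <= 'Z') {
                out.append(c); // upper-case letters are kept as-is
                capNext = false;
            } else if (c >= '0' && c <= '9') {
                out.append(c);
                capNext = true; // a letter after a digit is capitalized
            } else {
                capNext = true; // '_' and other separators are dropped
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String outer = fromProtoFile("play_event.proto");
        System.out.println(outer); // PlayEvent
        // Binary (JVM) name of the nested message class, as used in "serialization.class".
        System.out.println("rdio.hive_extensions." + outer + "$PlayEventMessage");
    }
}
```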

Then I have an HQL file:
ADD JAR PlayEventProtos.jar;
ADD JAR /home/ivonseggern/elephant-bird/hive/target/elephant-bird-hive-3.0.8-SNAPSHOT.jar;
ADD JAR /home/ivonseggern/elephant-bird/core/target/elephant-bird-core-3.0.8-SNAPSHOT.jar;

CREATE DATABASE IF NOT EXISTS test_db;

CREATE TABLE IF NOT EXISTS test_db.test_pb
-- no need to specify a schema - it will be discovered at runtime
    PARTITIONED BY (month STRING, day INT)
        ROW FORMAT serde "com.twitter.elephantbird.hive.serde.ProtobufDeserializer"
        with serdeproperties (
            "serialization.class"="rdio.hive_extensions.PlayEvent$PlayEventMessage")
        stored as
        inputformat "com.twitter.elephantbird.mapred.input.DeprecatedRawMultiInputFormat"
        outputformat "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";

When I run it, I get:
FAILED: Error in metadata: java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.NullPointerException)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
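The output does not say where the NullPointerException comes from. One thing worth ruling out is whether the "serialization.class" value actually loads as written; this is a guess, not a diagnosis. Class.forName needs the binary name, with `$` between the outer and nested class, not a dotted source-style name.

The sketch below uses its own nested class as a stand-in, since the generated PlayEvent classes aren't available here. The class name SerdeClassCheck is hypothetical.

```java
// Hypothetical diagnostic: check whether a "serialization.class" value
// names a class that can be loaded from the current classpath.
public class SerdeClassCheck {

    // Stand-in for a protoc-generated outer class (like PlayEvent)
    // that holds a nested message class (like PlayEventMessage).
    public static class Outer {
        public static class Inner {}
    }

    /** Returns true if the given binary class name can be loaded. */
    public static boolean isLoadable(String binaryName) {
        try {
            Class.forName(binaryName);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Nested classes use '$' in their binary name, as in the jar listing.
        String binary = Outer.Inner.class.getName();
        System.out.println(binary + " -> " + isLoadable(binary));
        // A dotted, source-style name for the same class does not load.
        String dotted = binary.replace('$', '.');
        System.out.println(dotted + " -> " + isLoadable(dotted));
        // Any names passed on the command line are checked too.
        for (String name : args) {
            System.out.println(name + " -> " + (isLoadable(name) ? "loadable" : "NOT loadable"));
        }
    }
}
```

To check the real class, one could run it with the protobuf jar and the protobuf-java runtime on the classpath. For example: java -cp PlayEventProtos.jar:protobuf-java.jar:. SerdeClassCheck 'rdio.hive_extensions.PlayEvent$PlayEventMessage'. The protobuf-java jar name is a placeholder. The single quotes keep the shell from expanding $PlayEventMessage.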

I must be doing something silly, but I can't figure out what. Any help is much appreciated, thanks!
Ian

Namit Jain

Jul 10, 2013, 6:59:01 AM
to elephant...@googlegroups.com, ian.von...@rd.io
Something similar is failing for me, with a related error:

hive> create table users
    >   row format serde "com.twitter.elephantbird.hive.serde.ProtobufDeserializer"
    >   with serdeproperties (
    >   "serialization.class"=
    >   "org.apache.hadoop.hive.serde2.proto.test.Complexpb$Complex")
    > ;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.ClassCastException: class org.apache.hadoop.hive.serde2.proto.test.Complexpb$Complex)


    > create table users
    >   row format serde "com.twitter.elephantbird.hive.serde.ProtobufDeserializer"
    >   with serdeproperties (
    >   "serialization.class"=
    >   "org.apache.hadoop.hive.serde2.proto.test.Complexpb$Complex")
    >   stored as
    >   inputformat "com.twitter.elephantbird.mapred.input.DeprecatedRawMultiInputFormat"
    > ;

NoViableAltException(26@[])
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:899)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:423)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:966)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:881)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:782)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
FAILED: ParseException line 1:0 cannot recognize input near 'FAILED' ':' 'Execution'
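The message in the first failure is just "class" followed by the fully qualified class name. That is the format Class.asSubclass produces: when the type check fails, it throws a ClassCastException whose message is the class's toString().

Suppose the serde does something similar; that is an assumption, not checked against the elephant-bird source. Then the error would mean Complexpb$Complex loaded but was not seen as a subtype of the protobuf type the serde expects. Possible reasons include generation against a different protobuf-java version or loading by a different class loader, but both are guesses.

A small stdlib-only Java sketch of that behavior, using a hypothetical AsSubclassDemo class:

```java
// Sketch of how Class.asSubclass reports a failed type check.
public class AsSubclassDemo {

    /**
     * Loads a class by name and checks that it extends/implements expectedBase.
     * Returns null on success, or the ClassCastException message on failure.
     */
    public static String checkSubclass(String className, Class<?> expectedBase) {
        Class<?> cls;
        try {
            cls = Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("cannot load " + className, e);
        }
        try {
            cls.asSubclass(expectedBase);
            return null;
        } catch (ClassCastException e) {
            return e.getMessage(); // the failing class's toString()
        }
    }

    public static void main(String[] args) {
        // Integer extends Number, so the check passes.
        System.out.println(checkSubclass("java.lang.Integer", Number.class)); // null
        // String does not, so the message has the same shape as the Hive error.
        System.out.println(checkSubclass("java.lang.String", Number.class)); // class java.lang.String
    }
}
```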


Were you able to get it to work?

-namit