Hive external table with protobuf


Piotrek Stapp

Nov 5, 2013, 7:20:42 AM
to elephant...@googlegroups.com
I am trying to load an external file into Hive. Combining everything I found on Google gives me the following code:

create external table test1                                                   
  row format serde "com.twitter.elephantbird.hive.serde.ProtobufDeserializer"  
 with serdeproperties ("serialization.class"="Test.Messages$Event")
 stored as                                                                     
  inputformat "com.twitter.elephantbird.mapred.input.DeprecatedRawMultiInputFormat"
  outputformat "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
 location '/tmp/datafolder/';

Unfortunately, SELECT * FROM test1; gives me an empty result.

I created my file using the following code:
 
ProtobufBlockWriter<Messages.Event> writer = new ProtobufBlockWriter<Messages.Event>(os, Messages.Event.class);
Messages.Event msg = Messages.Event.newBuilder()
        .setServiceEnd(
                Messages.ServiceCallEnd.newBuilder()
                        .setId(Messages.ServiceCallId.newBuilder().setId(ByteString.copyFromUtf8("call numero 1")))
                        .setDate(Messages.Date.newBuilder().setUtcTicks(1111111))
                        .build())
        .setEventDiscriminator(
                Messages.Event.EventType.valueOf(Messages.Event.EventType.ServiceCallEnd_VALUE))
        .build();

writer.write(msg);
writer.finish();
writer.close();

 

學德岳

Jan 7, 2015, 5:55:55 AM
to elephant...@googlegroups.com
Hi Piotrek

Have you solved this problem?

I have the same problem as you do.

If you solved the problem, could you tell me how you fixed it?

Many thanks

On Tuesday, November 5, 2013 at 8:20:42 PM UTC+8, Piotrek Stapp wrote:

Piotrek Stapp

Jan 8, 2015, 3:47:59 AM
to elephant...@googlegroups.com
No, I didn't. I decided to convert my data into JSON, using a protobuf-to-JSON serializer.

岳學德

Jan 9, 2015, 11:08:42 PM
to elephant...@googlegroups.com
Hi Piotrek Stapp 

Thanks for your kind response.

I solved the problem.
I'm new to protobuf, Hive, elephant-bird, and Hadoop. At first I put the protobuf binary data directly into HDFS.
I tried to query it from Hive integrated with elephant-bird, but no results were shown,
because it doesn't work that way.

When I asked some senior colleagues, they said protobuf binary data should be written into some kind of container file format, such as a Hadoop SequenceFile.
The elephant-bird page mentions this too, but at first I couldn't fully understand it.

After writing the protobuf binary data into a SequenceFile, I can read it correctly with the Hive create table syntax that the elephant-bird page provides.
Oh, and because I use the SequenceFile format, I changed the input and output formats in the create table statement to:
inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat'
outputformat 'org.apache.hadoop.mapred.SequenceFileOutputFormat'
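For reference, combining the table definition from the first post in this thread with the two format lines above would look roughly like the statement below. This is only a sketch, not a tested statement: the table name test1_seq and the location /tmp/datafolder_seq/ are placeholders, the serialization.class value is simply copied from the first post, and it assumes each SequenceFile record holds the serialized protobuf message bytes as its value.

```sql
-- Sketch only: test1_seq and /tmp/datafolder_seq/ are placeholder names.
-- The SerDe and serialization.class are copied from the first post;
-- only the input and output formats are changed to the SequenceFile ones.
create external table test1_seq
  row format serde "com.twitter.elephantbird.hive.serde.ProtobufDeserializer"
  with serdeproperties ("serialization.class"="Test.Messages$Event")
  stored as
    inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat'
    outputformat 'org.apache.hadoop.mapred.SequenceFileOutputFormat'
  location '/tmp/datafolder_seq/';
```

The location has to point at a folder containing the SequenceFile data, not the raw protobuf binary files.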

Thank you for your response.
Hope it helps.

Best regards
