Issue with ProtobufPigLoader loading an LZO-compressed file with one protobuf record per line

varnit

Feb 13, 2013, 6:02:05 PM
to elephant...@googlegroups.com
I am having issues with ProtobufPigLoader loading our log files which have one protobuf record per line. Here is the backend error message:

java.lang.RuntimeException: error rate while reading input records crossed threshold
        at com.twitter.elephantbird.mapreduce.input.LzoRecordReader$InputErrorTracker.incErrors(LzoRecordReader.java:155)
        at com.twitter.elephantbird.mapreduce.input.LzoBinaryB64LineRecordReader.nextKeyValue(LzoBinaryB64LineRecordReader.java:135)
        at com.twitter.elephantbird.pig.load.LzoBaseLoadFunc.getNextBinaryValue(LzoBaseLoadFunc.java:107)
        at com.twitter.elephantbird.pig.load.ProtobufPigLoader.getNext(ProtobufPigLoader.java:62)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:540)
        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.Exception: Unknown error
        at com.twitter.elephantbird.mapreduce.input.LzoRecordReader$InputErrorTracker.incErrors(LzoRecordReader.java:138)
        ... 14 more

Pig Stack Trace
---------------
ERROR 2997: Unable to recreate exception from backed error: java.lang.RuntimeException: error rate while reading input records crossed threshold

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias raw_data. Backend error : Unable to recreate exception from backed error: java.lang.RuntimeException: error rate while reading input records crossed threshold
        at org.apache.pig.PigServer.openIterator(PigServer.java:890)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:679)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
        at org.apache.pig.Main.run(Main.java:500)
        at org.apache.pig.Main.main(Main.java:114)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.lang.RuntimeException: error rate while reading input records crossed threshold
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:221)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:151)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:354)
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1313)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1298)
        at org.apache.pig.PigServer.storeEx(PigServer.java:995)
        at org.apache.pig.PigServer.store(PigServer.java:962)
        at org.apache.pig.PigServer.openIterator(PigServer.java:875)
        ... 12 more

Pig script:

REGISTER path/to/elephant-bird-*-3.0.7.jar;
REGISTER path/to/my/proto-0.0.1.jar;

raw_data = load 'path/to/*' using com.twitter.elephantbird.pig.load.ProtobufPigLoader('com.example.stats.proto.StatsEvent.Event');
dump raw_data;

Any idea what's wrong here?

-varnit

Bill Graham

Feb 13, 2013, 6:27:48 PM
to elephant...@googlegroups.com
This is the key part:


error rate while reading input records crossed threshold

Your input data has bad records. Try experimenting with this setting in Pig:
 
SET elephantbird.mapred.input.bad.record.threshold 0.001






--
You received this message because you are subscribed to the Google Groups "elephantbird-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elephantbird-d...@googlegroups.com.
To post to this group, send email to elephant...@googlegroups.com.
Visit this group at http://groups.google.com/group/elephantbird-dev?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Raghu Angadi

Feb 13, 2013, 6:51:24 PM
to elephant...@googlegroups.com
Varnit, are those records base64 encoded? Please attach the log from the task (the log includes the actual line number, etc.).

Raghu.

varnit

Feb 13, 2013, 7:46:44 PM
to elephant...@googlegroups.com
Raghu,
Encoding to base64 worked. Does EB not support protobuf files with one raw record per line? I would prefer not to encode to base64.

Thanks,
-varnit

Dmitriy Ryaboy

Feb 13, 2013, 7:55:09 PM
to elephant...@googlegroups.com
AFAIK there is no way to guarantee that your serialized protobuf does not contain \n, so no.
But if you want to avoid the base64 overhead, you can use the Block format rather than the newline-delimited format. It's a fair bit more efficient.
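
To illustrate the point about \n, here is a minimal sketch that hand-encodes a protobuf varint field without the protobuf library. The helper names (`encode_varint`, `encode_int_field`) are made up for this example; only the wire-format rules (tag = field number shifted left by 3, OR'ed with the wire type; base-128 varints) come from the protobuf encoding spec. An integer field holding the value 10 serializes to the byte 0x0A, which is the newline character, so a newline-delimited reader splits the record in the middle.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Serialize one varint field: tag followed by the value."""
    tag = (field_number << 3) | 0  # wire type 0 = varint
    return encode_varint(tag) + encode_varint(value)


# Field 1 set to 10: the value byte is 0x0A, i.e. "\n".
record = encode_int_field(1, 10)

# Two raw records written one per line, then split the way a line reader would.
stream = record + b"\n" + encode_int_field(1, 7) + b"\n"
pieces = stream.split(b"\n")

print(record)   # the record itself contains a newline byte
print(pieces)   # the first record no longer appears intact
```

Any field value, length prefix, or embedded string can produce the same byte, which is why the raw one-record-per-line layout cannot be read back reliably.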

D
Dmitriy V Ryaboy
Twitter Analytics
http://twitter.com/squarecog

Vinodh K

Oct 9, 2015, 2:50:33 AM
to elephantbird-dev
Hi Raghu, can you please explain how you resolved this issue? I mean, can you elaborate on "Encoding to base64 worked."?

Thanks,
Vinodh K

Dmitriy Ryaboy

Oct 9, 2015, 3:25:42 AM
to elephant...@googlegroups.com
You serialize a protobuf and then encode it as a base64 string. Append a newline. Or use block encoding, as in my earlier response in this thread. If you are not sure what base64 is or how to encode a binary blob that way, please let us know what you tried and in what way it didn't work.
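
A minimal sketch of those steps in Python, assuming the records are already serialized to bytes (for example with `message.SerializeToString()` from the protobuf Python library). The function names `write_b64_lines` and `read_b64_lines` are hypothetical, and the sample bytes are stand-ins rather than real `StatsEvent.Event` messages. The output would still need to be LZO-compressed before the loader in the original post could read it; that step is not shown.

```python
import base64
import io


def write_b64_lines(serialized_records, out):
    """Write each serialized record as one base64 line."""
    for raw in serialized_records:
        out.write(base64.b64encode(raw))  # standard alphabet, no newlines
        out.write(b"\n")                  # record delimiter


def read_b64_lines(data: bytes):
    """Recover the original serialized records from base64 lines."""
    return [base64.b64decode(line) for line in data.splitlines() if line]


# Stand-in serialized records; the first one contains a raw newline byte.
records = [b"\x08\x0a", b"\x08\x07"]

buf = io.BytesIO()
write_b64_lines(records, buf)
encoded = buf.getvalue()
decoded = read_b64_lines(encoded)

print(encoded)
print(decoded == records)
```

Because the base64 alphabet contains no newline, each line maps back to exactly one record, at the cost of roughly a one-third size increase before compression.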


--
Dmitriy V Ryaboy
Product Instrumentation and Experimentation @ Twitter
http://twitter.com/squarecog
I like PIE.

Vinodh K

Oct 9, 2015, 5:08:11 AM
to elephantbird-dev
Thank you. I resolved it by increasing the bad-record threshold.