Need help with EB/Protobuf/EMR/Hive


da...@livefyre.com

Sep 2, 2013, 9:24:02 PM
to elephant...@googlegroups.com
Hi All,

I'm new to EB and was following this old thread - https://groups.google.com/forum/#!msg/elephantbird-dev/7gaYDBONtkk/kHPGMyc9dtUJ - from which I pieced together the 'latest' versions of each of the required jars.

All my MR jobs are on Amazon's EMR.  This is my test launch script/setup on EMR:
1) ./elastic-mapreduce --create --alive --num-instances 1 --instance-type m1.small --name 'onconv' --hive-interactive --hive-versions 0.11.0 --ami-version latest --hadoop-version 1.0.3

When EMR is ready, I ssh into the Hadoop cluster (a single node).
Data file preparation (using ProtobufMRExample):
% export HADOOP_CLASSPATH=/mnt/var/lib/hive_0110/downloaded_resources/elephant-bird-core-3.0.5.jar:/home/hadoop/lib/guava-13.0.1.jar:/home/hadoop/lib/protobuf-java-2.4.1.jar
% hadoop jar /mnt/var/lib/hive_0110/downloaded_resources/elephant-bird-examples-3.0.4.jar com.twitter.elephantbird.examples.ProtobufMRExample -libjars /mnt/var/lib/hive_0110/downloaded_resources/elephant-bird-core-3.0.5.jar,/home/hadoop/lib/guava-13.0.1.jar,/home/hadoop/lib/protobuf-java-2.4.1.jar -Dproto.test=lzoOut -Dproto.test.format=Block s3://<ROOT_DEV>/tmp/input/test1 s3://<ROOT_DEV>/tmp/output6

I was able to decompress using -Dproto.test=lzoIn to verify that the compression was fine. test1 has 5 lines; each line has 2 columns: name<TAB>age

And with part-m-00000.lzo in place in my tmp/output6, these were my Hive commands:
add jar s3://<ROOT_DEV>/lib/elephant-bird/elephant-bird-core-3.0.5.jar;
add jar s3://<ROOT_DEV>/lib/elephant-bird/elephant-bird-hive-3.0.5.jar;
add jar s3://<ROOT_DEV>/lib/elephant-bird/guava-13.0.1.jar;
add jar s3://<ROOT_DEV>/lib/elephant-bird/protobuf-java-2.4.1.jar;
add jar s3://<ROOT_DEV>/lib/elephant-bird/elephant-bird-examples-3.0.4.jar;

drop table test1;

create external table test1
  partitioned by (dt string)
  row format serde "com.twitter.elephantbird.hive.serde.ProtobufDeserializer"
 with serdeproperties (
 "serialization.class"="com.twitter.elephantbird.examples.proto.Examples$Age")
 stored as
  inputformat "com.twitter.elephantbird.mapred.input.DeprecatedRawMultiInputFormat"
  outputformat "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";

ALTER TABLE test1 ADD IF NOT EXISTS PARTITION (dt='output6')
    LOCATION 's3://<ROOT_DEV>/tmp';

There were no errors or warnings, so I thought that was a good sign.

hive> describe test1;
OK
name                 string               from deserializer   
age                 int                 from deserializer   
dt                   string               None                
 
# Partition Information  
# col_name             data_type           comment             
 
dt                   string               None                
Time taken: 0.62 seconds, Fetched: 8 row(s)

The schema looks good, though I don't quite understand why dt is listed twice.

hive> ALTER TABLE test1 ADD IF NOT EXISTS PARTITION (dt='output6')
    >     LOCATION 's3://<ROOT_DEV>/tmp';
OK
Time taken: 1.057 seconds
hive> select count(*) from test1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Kill Command = /home/hadoop/bin/hadoop job  -kill job_201309022344_0002
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 1
2013-09-02 23:56:34,464 Stage-1 map = 0%,  reduce = 0%
2013-09-02 23:56:41,606 Stage-1 map = 0%,  reduce = 100%, Cumulative CPU 1.06 sec
2013-09-02 23:56:42,647 Stage-1 map = 0%,  reduce = 100%, Cumulative CPU 1.06 sec
2013-09-02 23:56:43,716 Stage-1 map = 0%,  reduce = 100%, Cumulative CPU 1.06 sec
2013-09-02 23:56:44,788 Stage-1 map = 0%,  reduce = 100%, Cumulative CPU 1.06 sec
2013-09-02 23:56:45,795 Stage-1 map = 0%,  reduce = 100%, Cumulative CPU 1.06 sec
2013-09-02 23:56:46,826 Stage-1 map = 0%,  reduce = 100%, Cumulative CPU 1.06 sec
2013-09-02 23:56:47,858 Stage-1 map = 0%,  reduce = 100%, Cumulative CPU 1.06 sec
2013-09-02 23:56:48,891 Stage-1 map = 0%,  reduce = 100%, Cumulative CPU 1.06 sec
2013-09-02 23:56:49,899 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.06 sec
MapReduce Total cumulative CPU time: 1 seconds 60 msec
Ended Job = job_201309022344_0002
Counters:
MapReduce Jobs Launched: 
Job 0: Reduce: 1   Cumulative CPU: 1.06 sec   HDFS Read: 0 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 60 msec
OK
0
Time taken: 53.766 seconds, Fetched: 1 row(s)

I did a 'select name, age from test1;' but the job returned nothing.  In EMR, perusing the log files in S3 revealed zero errors.  However, I'm seeing
Task TASKID="task_201309022344_0007_m_000001" TASK_TYPE="SETUP" TASK_STATUS="SUCCESS" FINISH_TIME="1378167778191" COUNTERS="{(FileSystemCounters)(FileSystemCounters)[(FILE_BYTES_WRITTEN)(FILE_BYTES_WRITTEN)(61239)]}{(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce Framework)[(PHYSICAL_MEMORY_BYTES)(Physical memory \\(bytes\\) snapshot)(43864064)][(SPILLED_RECORDS)(Spilled Records)(0)][(CPU_MILLISECONDS)(CPU time spent \\(ms\\))(100)][(COMMITTED_HEAP_BYTES)(Total committed heap usage \\(bytes\\))(16252928)][(VIRTUAL_MEMORY_BYTES)(Virtual memory \\(bytes\\) snapshot)(442163200)]}"
which I suspect was trying to tell me something.

Why am I reading zero?  Any pointers and suggestions are deeply appreciated.

Thanks in advance,
dave


da...@livefyre.com

Sep 6, 2013, 5:38:57 PM
to elephant...@googlegroups.com
I was able to set up my own local dev box (instead of using EMR) and to reproduce the problem there.  The local and EMR stack traces are identical.

Stacktrace:
2013-09-06 21:27:00,877 INFO org.apache.hadoop.hive.ql.exec.MapOperator: dump TS struct<name:string,age:int>
2013-09-06 21:27:00,877 INFO ExecMapper:
<MAP>Id =3
  <Children>
    <TS>Id =0
      <Children>
        <SEL>Id =1
          <Children>
            <FS>Id =2
              <Parent>Id = 1 null<\Parent>
            <\FS>
          <\Children>
          <Parent>Id = 0 null<\Parent>
        <\SEL>
      <\Children>
      <Parent>Id = 3 null<\Parent>
    <\TS>
  <\Children>
<\MAP>
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable
        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:162)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:643)
        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
        ... 8 more
Caused by: java.lang.IllegalArgumentException: FieldDescriptor does not match message type.
        at com.google.protobuf.GeneratedMessage$FieldAccessorTable.getField(GeneratedMessage.java:1445)
        at com.google.protobuf.GeneratedMessage$FieldAccessorTable.access$100(GeneratedMessage.java:1391)
        at com.google.protobuf.GeneratedMessage$Builder.setField(GeneratedMessage.java:340)
        at com.google.protobuf.GeneratedMessage$Builder.setField(GeneratedMessage.java:207)
        at com.twitter.elephantbird.hive.serde.ProtobufStructObjectInspector.setStructFieldData(ProtobufStructObjectInspector.java:139)
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:330)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:626)

da...@livefyre.com

Sep 6, 2013, 8:56:19 PM
to elephant...@googlegroups.com
So, 'select * from test1;' works and doesn't trigger a MapReduce job.

Probing further into the EB source on GitHub, com.twitter.elephantbird.hive.serde.ProtobufStructObjectInspector.setStructFieldData(ProtobufStructObjectInspector.java:139) shows that setField(field, value) is the failure point, after failing this check in com.google.protobuf.GeneratedMessage.java:1445:
if (field.getContainingType() != descriptor) {
        throw new IllegalArgumentException(
          "FieldDescriptor does not match message type.");
}
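For what it's worth, that check compares descriptors by reference, not by structural equality. A minimal plain-Java sketch (all class names here are hypothetical stand-ins for the protobuf types, not the real API) of how two structurally identical descriptor instances still fail the `!=` test:

```java
// Hypothetical stand-ins for protobuf's Descriptor/FieldDescriptor,
// illustrating that the guard uses reference identity, not equals().
public class DescriptorIdentityDemo {
    static class Descriptor {
        final String name;
        Descriptor(String name) { this.name = name; }
    }

    static class FieldDescriptor {
        final Descriptor containingType;
        FieldDescriptor(Descriptor containingType) { this.containingType = containingType; }
        Descriptor getContainingType() { return containingType; }
    }

    // Mirrors the shape of the guard quoted above from GeneratedMessage.
    static void checkField(FieldDescriptor field, Descriptor messageDescriptor) {
        if (field.getContainingType() != messageDescriptor) {
            throw new IllegalArgumentException("FieldDescriptor does not match message type.");
        }
    }

    public static void main(String[] args) {
        Descriptor a = new Descriptor("Age");
        Descriptor b = new Descriptor("Age"); // same shape, different instance
        FieldDescriptor field = new FieldDescriptor(a);

        checkField(field, a); // passes: same instance

        try {
            checkField(field, b); // throws: identical structure, different instance
        } catch (IllegalArgumentException e) {
            System.out.println("threw: " + e.getMessage());
        }
    }
}
```

If that is what is happening here, the descriptor held by the inspector would have to be the very same instance as the one inside the message being built; a descriptor obtained from a second copy of the generated class (e.g., a duplicate jar on the classpath) would not match.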

Was ProtobufStructObjectInspector wrongly instantiated with an invalid descriptor param?

Would appreciate some comments.

Thanks,
dave

Raghu Angadi

Sep 8, 2013, 2:12:48 PM
to elephant...@googlegroups.com, da...@livefyre.com
Hi Dave,

Thanks for digging into this. I myself haven't worked with Hive & EB much. 

So count(*) was successful even though the mapper had the above exception? 

Is it possible that the extra 'dt' you noticed is somehow causing Hive to look for a field 'dt' which does not exist in the protobuf? If you can compile EB, you could print the fieldName inside setStructFieldData().

By the way, EB 3.x is an older version; the current version is 4.1 (though the upgrade mostly does not affect your simple example).



--
You received this message because you are subscribed to the Google Groups "elephantbird-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elephantbird-d...@googlegroups.com.
To post to this group, send email to elephant...@googlegroups.com.
Visit this group at http://groups.google.com/group/elephantbird-dev.
For more options, visit https://groups.google.com/groups/opt_out.

da...@livefyre.com

Sep 8, 2013, 3:29:22 PM
to elephant...@googlegroups.com, da...@livefyre.com
Hi Raghu,

Thanks for the reply.  Always good to hear from an ex-Yahoo (search team)  :)

'select count(*) from test1' does not work; it was 'select * from test1' that works.  Except for the latter, every other HQL statement I tried would trigger a MR job, and that's when the exception happens.

e.g. This fails too:
drop table test1_1;
CREATE TABLE test1_1 ( name string, age int );

INSERT OVERWRITE TABLE test1_1
    SELECT *
    FROM test1 ;

I'm guessing that when there is a MR job, that's when that codepath gets executed.  I also don't fully understand why the single 'select * from test1' works; the data seems to be already deserialized/loaded at that point.

I tried a version without the 'dt'.  Same exception, in the same place, when executing 'select name, age from test1', but 'select *' works as before:
hduser@t13:~$ hive

Logging initialized using configuration in jar:file:/mnt/hive/hive-0.11.0-bin/lib/hive-common-0.11.0.jar!/hive-log4j.properties
Hive history file=/tmp/hduser/hive_job_log...@t13.livefyre.com_201309081911_737844943.txt
hive> add jar /mnt/emr/dynamodb/elephant-bird-core-3.0.5.jar;
Added /mnt/emr/dynamodb/elephant-bird-core-3.0.5.jar to class path
Added resource: /mnt/emr/dynamodb/elephant-bird-core-3.0.5.jar
hive> add jar /mnt/emr/dynamodb/elephant-bird-hive-3.0.5.jar;                                                                                                         
Added /mnt/emr/dynamodb/elephant-bird-hive-3.0.5.jar to class path
Added resource: /mnt/emr/dynamodb/elephant-bird-hive-3.0.5.jar
hive> add jar /mnt/emr/dynamodb/guava-13.0.1.jar;
Added /mnt/emr/dynamodb/guava-13.0.1.jar to class path
Added resource: /mnt/emr/dynamodb/guava-13.0.1.jar
hive> add jar /mnt/emr/dynamodb/protobuf-java-2.4.1.jar;
Added /mnt/emr/dynamodb/protobuf-java-2.4.1.jar to class path
Added resource: /mnt/emr/dynamodb/protobuf-java-2.4.1.jar
hive> add jar /mnt/emr/dynamodb/elephant-bird-examples-3.0.4.jar;
Added /mnt/emr/dynamodb/elephant-bird-examples-3.0.4.jar to class path
Added resource: /mnt/emr/dynamodb/elephant-bird-examples-3.0.4.jar
hive> 
    > drop table test1;
OK
Time taken: 8.329 seconds
hive> 
    > create external table test1
    >   -- partitioned by (dt string)
    >   row format serde "com.twitter.elephantbird.hive.serde.ProtobufDeserializer"
    >  with serdeproperties (
    >  "serialization.class"="com.twitter.elephantbird.examples.proto.Examples$Age")
    >  stored as
    >   inputformat "com.twitter.elephantbird.mapred.input.DeprecatedRawMultiInputFormat"
    >   outputformat "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
    >   LOCATION 'file:/mnt/emr/data/output6';
OK
Time taken: 0.345 seconds
hive> select * from test1;
OK
dave 28
adrian 11
brandon 8
doris 28
everyone 100
Time taken: 0.694 seconds, Fetched: 5 row(s)

I downloaded the eb-core and eb-hive 4.1 jars and you're right: the upgrade didn't touch these parts. FYI, EB 4.1 requires an additional jar, elephant-bird-hadoop-compat-4.1.jar, which I downloaded and 'added' in each Hive run.  Again, 'select *' works and everything else broke at the same point.

I downloaded the EB source from GitHub and, with Eclipse, set breakpoints at various points in the call stack.  I could see the execution traces in Eclipse, BUT none of the breakpoints were hit!

Any ideas, or anyone who can help us here?

Thanks,
dave

da...@livefyre.com

Sep 10, 2013, 4:05:41 PM
to elephant...@googlegroups.com
So I suspect the issue was with a 'corrupted' elephant-bird-examples-3.0.4.jar.

I created my own test protobuf Message, compiled it with protoc 2.4.1, and all of that works!  Note that I had problems with protoc 2.5.0 and had to switch to 2.4.1.

Thanks to all for listening.


Raghu Angadi

Sep 16, 2013, 7:36:59 PM
to elephant...@googlegroups.com
Thanks to Dave for digging into this. We both met in SF and debugged a bit more.

It looks like ProtobufStructObjectInspector.java (EB's Hive/protobuf support) has major issues. It never worked properly.

e.g.
  • setStructFieldData() returns a new object, but Hive expects the object inspector to modify the object it is given.
  • create() does not return a Message object; it returns a ProtoDescriptor object.
We will see what the correct fixes should be.
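The first point can be illustrated with a plain-Java sketch (the names below are hypothetical, not EB's or Hive's actual API): if the caller keeps its own reference and the setter returns a fresh object instead of mutating the one passed in, the caller's copy never sees the field.

```java
// Hypothetical sketch of the contract mismatch: the caller hands the
// inspector an object and keeps using ITS OWN reference afterwards, so a
// setter that builds and returns a new object leaves the caller's copy
// untouched.
import java.util.HashMap;
import java.util.Map;

public class InspectorContractDemo {
    // Broken style: returns a new map; the caller's map is never updated.
    static Map<String, Object> setFieldReturningNew(Map<String, Object> data,
                                                    String field, Object value) {
        Map<String, Object> copy = new HashMap<>(data);
        copy.put(field, value);
        return copy; // the caller drops this return value
    }

    // Expected style: mutate the object the caller passed in.
    static void setFieldInPlace(Map<String, Object> data, String field, Object value) {
        data.put(field, value);
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        setFieldReturningNew(row, "age", 28); // return value ignored
        System.out.println("after return-new: " + row); // still empty

        setFieldInPlace(row, "age", 28);
        System.out.println("after in-place:  " + row); // field present
    }
}
```

With immutable protobuf Messages this is exactly the tension: a built Message cannot be modified in place, which is presumably why a correct fix needs more thought than a one-line change.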



da...@livefyre.com

Sep 17, 2013, 1:09:42 AM
to elephant...@googlegroups.com
Thank you, Raghu, for taking the time to jam on this.  FYI, this is particular to Hive only; the Pig version works just fine.

To be clear, elephant-bird-hive-3.0.3.jar works fine too (if you work around the field-name mangling).  ProtobufStructObjectInspector.java was introduced in 3.0.4 and onwards, including the latest 4.1.  It's strange that such an important change wasn't given a 3.1.x or 3.2.x version bump to alert users.

If you have to use Hive and protobuf, 3.0.3 is the way to go until a fix is available.

In any case, the following script has been successfully tested:

add jar /mnt/emr/dynamodb/elephant-bird-examples-3.0.4.jar;
add jar /mnt/emr/dynamodb/elephant-bird-core-3.0.3.jar;
add jar /mnt/emr/dynamodb/elephant-bird-hive-3.0.3.jar;
add jar /mnt/emr/dynamodb/protobuf-java-2.4.1.jar;
add jar /mnt/emr/dynamodb/guava-13.0.1.jar;

drop table test1;

create external table test1                                                   
  row format serde "com.twitter.elephantbird.hive.serde.ProtobufDeserializer"  
 with serdeproperties (                                                        
 "serialization.class"="com.twitter.elephantbird.examples.proto.Examples$Age") 
 stored as                                                                     
  inputformat "com.twitter.elephantbird.mapred.input.DeprecatedRawMultiInputFormat"   
  outputformat "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
  LOCATION 'file:/mnt/emr/data/output2';

desc test1;
select * from test1;
select name_ from test1;

drop table test1;



amalho...@gmail.com

Oct 17, 2013, 4:17:06 PM
to elephant...@googlegroups.com
I see this error in Hive on CDH 4.4.0 but don't see it in CDH 4.2.0. Any workarounds?

Chen Song

Apr 27, 2014, 2:42:41 PM
to elephant...@googlegroups.com
It appears to be a problem with protobuf versions after 2.4.1. I tested on Hive on CDH5 (0.12.0) with protobuf-java 2.5.0 and saw the same issues. Is there a quick workaround for this issue?



Rahul Ravindran

May 2, 2014, 8:47:36 PM
to elephant...@googlegroups.com
Hi Dave, Raghu,
  I am hitting this with EB 4.4. I am using protocol buffers 2.4.1 but still see this, which seems contrary to the behavior you noticed. Can you confirm that this problem does not exist with 2.4.1?
Thanks,
~Rahul.

Chen Song

May 15, 2014, 11:26:00 AM
to elephant...@googlegroups.com
This issue exists due to an incompatibility between ProtobufStructObjectInspector.java and Hive for protobuf versions >= 2.4.1. What Raghu meant was that by using EB 3.0.3 one can get around this issue, because ProtobufStructObjectInspector.java does not exist in 3.0.3.

Rahul Ravindran

May 15, 2014, 4:58:08 PM
to elephant...@googlegroups.com
I have a fix for ProtobufStructObjectInspector.java. I will clean it up, update the tests, and submit a merge request.



Cristi Calugaru

Aug 7, 2014, 8:42:52 AM
to elephant...@googlegroups.com
Hi Rahul,

I am facing a similar problem and was wondering what the current status of this issue is.
I am using CDH 4.6, elephant-bird-core-4.5, and protobuf 2.4.1.
My table was created like this:

create table test
  row format serde "com.twitter.elephantbird.hive.serde.ProtobufDeserializer"
 with serdeproperties (
 "serialization.class"="xxxxxxx.SegmentProto$Segment")
 stored as
  inputformat "com.twitter.elephantbird.mapred.input.DeprecatedRawMultiInputFormat"
  outputformat "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";


When doing a select I get: 

org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:647)
        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:141)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.IllegalArgumentException: FieldDescriptor does not match message type.
        at com.google.protobuf.GeneratedMessage$FieldAccessorTable.getField(GeneratedMessage.java:1445)

I switched to version 3.0.3, as suggested here, but then I got a different error:

Error: class com.twitter.elephantbird.mapred.input.DeprecatedInputFormatWrapper$RecordReaderWrapper$1 has interface org.apache.hadoop.mapreduce.TaskInputOutputContext as super class

Should I try with the latest elephant-bird nightly build/master?

Chenjie Yu

Aug 20, 2014, 6:16:57 PM
to elephant...@googlegroups.com
Is there any update on this?


Rahul Ravindran

Aug 20, 2014, 6:31:26 PM
to elephant...@googlegroups.com
I have a pull request at https://github.com/kevinweil/elephant-bird/pull/400 which is currently working in our production environment.


Cristi Calugaru

Sep 10, 2014, 4:12:24 AM
to elephant...@googlegroups.com
Rahul, I tried compiling elephant-bird with your patch, and bumped into a different exception:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable 

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable 
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:647)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:141)
... 8 more
Caused by: java.lang.IllegalArgumentException: argument type mismatch

These are the jars I use, where SNAPSHOT denotes elephant-bird built with your patch:
elephant-bird-core-4.6-SNAPSHOT.jar;
elephant-bird-hadoop-compat-4.6-SNAPSHOT.jar;
elephant-bird-hive-4.6-SNAPSHOT.jar;
protobuf-java-2.4.1.jar;
protobuf-java-format-1.2.jar;

Can you post what your jar versions are, in your production environment?

Thanks!

jkm137

Nov 11, 2014, 5:37:14 PM
to elephant...@googlegroups.com
Was this ever resolved?  I hit the same issue, and I'd be grateful for a workaround.

Thanks!

Chen Song

Nov 14, 2014, 2:35:14 PM
to elephant...@googlegroups.com
Has this pull request been merged? https://github.com/kevinweil/elephant-bird/pull/400

Please try this patch; if it doesn't work, I have a patch of my own that has been working in our production for a few months, which I can share.





--
Chen Song

jkm137

Nov 14, 2014, 4:32:03 PM
to elephant...@googlegroups.com
Yeah, I tried using that patch, but I'm still seeing issues.  Could you point me to your patch?

Thanks again!

吴磊

Feb 2, 2015, 4:37:56 AM
to elephant...@googlegroups.com
I used this patch, and it works:
https://github.com/twitter/elephant-bird/pull/425
