LZO compression using Elephant-Bird fails with Hive "union all"-like queries

Irfan Mohammed

Sep 25, 2013, 7:19:34 PM
to elephant...@googlegroups.com
Hi,

I am running into an error where DeprecatedInputFormatWrapper cannot find a "valueCopier" when running compound queries such as "union all" or "lateral view". The full exception is posted at the end of this message.
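My rough reading of the error message is that the old-API (mapred) wrapper has to fill the value object its caller passes to next(), while the wrapped new-API reader hands back its own value object; when the two objects differ and no copier is configured, it gives up. Below is a simplified, self-contained sketch of that check under that assumption only. It is NOT Elephant-Bird's actual code, and every class and method name in it (ValueCopierSketch, Holder, NewApiReader, OldApiWrapper, copyValue) is hypothetical.

```java
import java.io.IOException;

// Hypothetical, simplified model of an old-API wrapper around a new-API reader.
// NOT Elephant-Bird's implementation; it only mirrors what the error message implies.
public class ValueCopierSketch {

    /** Hypothetical: copies the reader's value into the caller-supplied value object. */
    interface ValueCopier<V> {
        void copyValue(V callerValue, V readerValue);
    }

    /** Mutable holder standing in for a Writable such as Text. */
    static final class Holder {
        String text = "";
    }

    /** Mimics a new-API reader that always returns its own value instance. */
    static final class NewApiReader {
        private final String[] rows;
        private int pos = -1;
        private final Holder current = new Holder();

        NewApiReader(String... rows) { this.rows = rows; }

        boolean nextKeyValue() {
            if (++pos >= rows.length) return false;
            current.text = rows[pos];
            return true;
        }

        Holder getCurrentValue() { return current; }
    }

    /** Old-API style next(value): must leave the row in the object the caller passed in. */
    static final class OldApiWrapper {
        private final NewApiReader reader;
        private final ValueCopier<Holder> copier; // may be null, i.e. "no value copier provided"

        OldApiWrapper(NewApiReader reader, ValueCopier<Holder> copier) {
            this.reader = reader;
            this.copier = copier;
        }

        boolean next(Holder callerValue) throws IOException {
            if (!reader.nextKeyValue()) return false;
            Holder readerValue = reader.getCurrentValue();
            if (readerValue != callerValue) {          // caller passed its own object
                if (copier == null) {
                    throw new IOException("value is different and no value copier provided");
                }
                copier.copyValue(callerValue, readerValue);
            }
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // With a copier, the caller's object receives each row.
        OldApiWrapper withCopier =
                new OldApiWrapper(new NewApiReader("a", "b"), (dst, src) -> dst.text = src.text);
        Holder value = new Holder();
        while (withCopier.next(value)) {
            System.out.println(value.text);
        }

        // Without a copier, a caller-owned value object triggers the failure.
        OldApiWrapper noCopier = new OldApiWrapper(new NewApiReader("a"), null);
        try {
            noCopier.next(new Holder());
        } catch (IOException e) {
            System.out.println("IOException: " + e.getMessage());
        }
    }
}
```

If this reading is right, the simple select works because its caller reuses the reader's own value object, while the union path goes through a different reader (the trace shows CombineHiveRecordReader) that supplies its own object. That part is only my guess from the stack trace.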

I am using Hive 0.9.0 with Elephant-Bird 4.1.

Successful Insert Query [ ref: insert_from_simple_select.sql ] 
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

insert overwrite table dest_table_01 partition (dt)
select * from source_table_01
;

Failure Insert Query [ ref: insert_from_union_select.sql ] 
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

insert overwrite table dest_table_01 partition (dt)
select * from (
select * from source_table_01
union all
select * from source_table_02
) t1
;

A tarball to reproduce the exception is attached. Running "run.sh" will:
  1. set up the schema
  2. load the data into the source tables
  3. query the source tables
  4. run the insert queries:
    1. the insert from the simple select succeeds
    2. the insert from the union select fails
Am I missing something in the Hive setup, or do I need to write the query differently?

Thanks for the help. 

Thanks,
Irfan

2013-09-25 15:53:17,002 INFO  ExecMapper (ExecMapper.java:close(215)) - ExecMapper: processed 100000 rows: used memory = 216894840
2013-09-25 15:53:17,006 WARN  mapred.LocalJobRunner (LocalJobRunner.java:run(347)) - job_local_0001
java.io.IOException: java.io.IOException: java.io.IOException: DeprecatedInputFormatWrapper - value is different and no value copier provided. current reader class : class com.twitter.elephantbird.mapreduce.input.LzoLineRecordReader
  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
  at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:311)
  at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:227)
  at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:210)
  at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:195)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:393)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
  at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:263)
Caused by: java.io.IOException: java.io.IOException: DeprecatedInputFormatWrapper - value is different and no value copier provided. current reader class : class com.twitter.elephantbird.mapreduce.input.LzoLineRecordReader
  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
  at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
  at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
  at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
  at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
  at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:309)
  ... 7 more
Caused by: java.io.IOException: DeprecatedInputFormatWrapper - value is different and no value copier provided. current reader class : class com.twitter.elephantbird.mapreduce.input.LzoLineRecordReader
  at com.twitter.elephantbird.mapred.input.DeprecatedInputFormatWrapper$RecordReaderWrapper.next(DeprecatedInputFormatWrapper.java:325)
  at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
  ... 11 more
hive_elephant_bird_test.tar.gz