Problem with Presto 0.54 and ORC files

damien...@gmail.com

Dec 14, 2013, 5:37:28 AM
to presto...@googlegroups.com
Presto 0.54
Hadoop 1.2.1 (apache tgz)
Hive 0.12 (with remote metastore)

We created a new table in ORC file format from a text table with this query:

CREATE TABLE foo STORED AS ORC AS SELECT * FROM table_txt;

The strange thing is that other tools handle this table fine:
- Hue reads it correctly (the displayed data are OK)
- Hive can query it and returns correct data (see the sample checks below)
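
For reference, a minimal sketch of the kind of check we ran in Hive (the exact statements are an assumption; foo is the table from the CTAS above):

-- Hypothetical sanity checks from the Hive CLI
SELECT COUNT(*) FROM foo;   -- row count matches the source text table
SELECT * FROM foo LIMIT 10; -- sample rows display correctly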

When we query this table in Presto, we get this error:

java.lang.RuntimeException: Error opening Hive split hdfs://nc-h04/user/hive/warehouse/casino.db/encaissementn2_orc/000002_1 (offset=0, length=40494983) using org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: Message missing required fields: columns[1].kind, columns[2].kind, columns[3].kind, columns[4].kind, columns[5].kind, columns[7].kind, columns[8].kind, columns[9].kind, columns[11].kind, columns[12].kind, columns[13].kind, columns[14].kind, columns[16].kind
at com.facebook.presto.hive.HiveRecordSet.createRecordReader(HiveRecordSet.java:190) ~[na:na]
at com.facebook.presto.hive.HiveRecordSet.cursor(HiveRecordSet.java:111) ~[na:na]
at com.facebook.presto.spi.classloader.ClassLoaderSafeRecordSet.cursor(ClassLoaderSafeRecordSet.java:46) ~[presto-spi-0.54.jar:0.54]
at com.facebook.presto.operator.RecordProjectOperator.<init>(RecordProjectOperator.java:45) ~[presto-main-0.54.jar:0.54]
at com.facebook.presto.split.RecordSetDataStreamProvider.createNewDataStream(RecordSetDataStreamProvider.java:46) ~[presto-main-0.54.jar:0.54]
at com.facebook.presto.split.DataStreamManager.createNewDataStream(DataStreamManager.java:61) ~[presto-main-0.54.jar:0.54]
at com.facebook.presto.operator.TableScanOperator.addSplit(TableScanOperator.java:132) ~[presto-main-0.54.jar:0.54]
at com.facebook.presto.operator.Driver.addSplit(Driver.java:166) ~[presto-main-0.54.jar:0.54]
at com.facebook.presto.operator.Driver.updateSource(Driver.java:142) ~[presto-main-0.54.jar:0.54]
at com.facebook.presto.execution.SqlTaskExecution.createDriver(SqlTaskExecution.java:460) ~[presto-main-0.54.jar:0.54]
at com.facebook.presto.execution.SqlTaskExecution.access$400(SqlTaskExecution.java:73) ~[presto-main-0.54.jar:0.54]
at com.facebook.presto.execution.SqlTaskExecution$2.apply(SqlTaskExecution.java:333) ~[presto-main-0.54.jar:0.54]
at com.facebook.presto.execution.SqlTaskExecution$2.apply(SqlTaskExecution.java:329) ~[presto-main-0.54.jar:0.54]
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.initialize(SqlTaskExecution.java:592) ~[presto-main-0.54.jar:0.54]
at com.facebook.presto.execution.TaskExecutor$PrioritizedSplitRunner.initializeIfNecessary(TaskExecutor.java:395) ~[presto-main-0.54.jar:0.54]
at com.facebook.presto.execution.TaskExecutor$Runner.run(TaskExecutor.java:543) ~[presto-main-0.54.jar:0.54]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
Caused by: com.facebook.presto.hive.shaded.com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: columns[1].kind, columns[2].kind, columns[3].kind, columns[4].kind, columns[5].kind, columns[7].kind, columns[8].kind, columns[9].kind, columns[11].kind, columns[12].kind, columns[13].kind, columns[14].kind, columns[16].kind
at com.facebook.presto.hive.shaded.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:81) ~[na:na]
at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter$Builder.buildParsed(OrcProto.java:5908) ~[na:na]
at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter$Builder.access$10700(OrcProto.java:5834) ~[na:na]
at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.parseFrom(OrcProto.java:5779) ~[na:na]
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:1108) ~[na:na]
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:1114) ~[na:na]
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:94) ~[na:na]
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rows(ReaderImpl.java:242) ~[na:na]
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.<init>(OrcInputFormat.java:56) ~[na:na]
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:168) ~[na:na]
at com.facebook.presto.hive.HiveRecordSet$1.call(HiveRecordSet.java:185) ~[na:na]
at com.facebook.presto.hive.HiveRecordSet$1.call(HiveRecordSet.java:180) ~[na:na]
at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:85) ~[na:na]
at com.facebook.presto.hive.HiveRecordSet.createRecordReader(HiveRecordSet.java:179) ~[na:na]
... 18 common frames omitted

Stephen Sprague

Dec 14, 2013, 5:46:53 PM
to presto...@googlegroups.com, damien...@gmail.com
you may wish to review this post: https://groups.google.com/forum/#!searchin/presto-users/ORC/presto-users/89P9Qyt0twQ/bukdHcTKLSIJ

It doesn't answer your question directly, but it does give some insight into where ORC support stands at the moment.

damien...@gmail.com

Dec 16, 2013, 3:20:38 AM
to presto...@googlegroups.com, damien...@gmail.com
Thanks for the tip.

I had already seen that post. Anyway, we just converted all our tables to the RCFILE format, since it seems that Facebook uses Presto with RCFile. The conversion was done with statements along the lines of the sketch below.
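
A minimal sketch of the per-table conversion in HiveQL (the table name foo is hypothetical; rewrite the table as RCFile, then swap the names):

-- Hypothetical conversion of one table to RCFILE
CREATE TABLE foo_rcfile STORED AS RCFILE AS SELECT * FROM foo;
DROP TABLE foo;
ALTER TABLE foo_rcfile RENAME TO foo;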

We don't want to be beta testers.

Regards,
Damien Carol

damien...@gmail.com

Dec 31, 2013, 4:14:29 AM
to presto...@googlegroups.com, damien...@gmail.com
The problem is still there with the 0.55 release.

David Phillips

Jan 1, 2014, 7:26:17 PM
to presto...@googlegroups.com
On Sat, Dec 14, 2013 at 2:37 AM, <damien...@gmail.com> wrote:
We created a new table in ORC file format from a text table with this query:

[...]


When we query this table in Presto, we get this error:

We do not currently support ORC. I filed an issue to track this:


We will need to figure out why it does not work (at a minimum, we probably need to upgrade to Hive 0.12) and add it to our integration tests.

However, making it work is just the first step. Making it fast requires a lot more work, and the ORC code makes that difficult because all of the classes are private. We have been talking to Hortonworks about the required changes to ORC.

For now, we recommend you use RCFile with the binary SerDe.
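
As a minimal sketch, that recommendation translates to HiveQL along these lines (the table name is hypothetical; LazyBinaryColumnarSerDe is Hive's binary columnar SerDe for RCFile):

-- Hypothetical RCFile table using the binary SerDe
CREATE TABLE foo_rcfile
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe'
STORED AS RCFILE
AS SELECT * FROM table_txt;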
