Read ORC File Error

Syed Akram

Nov 30, 2015, 2:18:22 AM
to Presto
I am trying to read an ORC file, and I hit the error below:

java.lang.IndexOutOfBoundsException: end index (2775362) must not be greater than size (1696)
        at io.airlift.slice.Preconditions.checkPositionIndexes(Preconditions.java:94)
        at io.airlift.slice.Slice.checkIndexLength(Slice.java:1185)
        at io.airlift.slice.Slice.slice(Slice.java:739)
        at io.airlift.slice.BasicSliceInput.readSlice(BasicSliceInput.java:156)
        at com.facebook.presto.orc.stream.OrcInputStream.advance(OrcInputStream.java:210)
        at com.facebook.presto.orc.stream.OrcInputStream.read(OrcInputStream.java:125)
        at java.io.InputStream.read(InputStream.java:101)
        at com.facebook.presto.hive.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:737)
        at com.facebook.presto.hive.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
        at com.facebook.presto.hive.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.<init>(OrcProto.java:10661)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.<init>(OrcProto.java:10625)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:10730)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:10725)
        at com.facebook.presto.hive.protobuf.AbstractParser.parseFrom(AbstractParser.java:89)
        at com.facebook.presto.hive.protobuf.AbstractParser.parseFrom(AbstractParser.java:95)
        at com.facebook.presto.hive.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.parseFrom(OrcProto.java:10958)
        at com.facebook.presto.orc.metadata.OrcMetadataReader.readStripeFooter(OrcMetadataReader.java:112)
        at com.facebook.presto.orc.StripeReader.readStripeFooter(StripeReader.java:321)
        at com.facebook.presto.orc.StripeReader.readStripe(StripeReader.java:100)
        at com.facebook.presto.orc.OrcRecordReader.advanceToNextStripe(OrcRecordReader.java:355)
        at com.facebook.presto.orc.OrcRecordReader.advanceToNextRowGroup(OrcRecordReader.java:317)
        at com.facebook.presto.orc.OrcRecordReader.nextBatch(OrcRecordReader.java:281)


Any fix?

Thanks

Dain Sundstrom

Nov 30, 2015, 6:00:11 PM
to presto...@googlegroups.com
The stack shows that the code is attempting to read a new stripe, and the stripe size in the file is larger than the bytes in the file. To me it looks like you have a corrupt ORC file. How did you write this file?
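
A quick way to confirm that is to compare each stripe's offset + length against the actual file length using Hive's own ORC Reader API. A minimal sketch (assuming the Hive 1.x org.apache.hadoop.hive.ql.io.orc classes; the file path is passed in as an argument):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.Reader;
import org.apache.hadoop.hive.ql.io.orc.StripeInformation;

public class CheckOrcStripes
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        Path path = new Path(args[0]);
        FileSystem fs = path.getFileSystem(conf);
        long fileLength = fs.getFileStatus(path).getLen();

        // A healthy file has every stripe ending at or before the file length;
        // a truncated/corrupt file has stripes that point past the end.
        Reader reader = OrcFile.createReader(fs, path);
        for (StripeInformation stripe : reader.getStripes())
        {
            long end = stripe.getOffset() + stripe.getLength();
            System.out.println("stripe offset=" + stripe.getOffset()
                    + " length=" + stripe.getLength()
                    + " rows=" + stripe.getNumberOfRows()
                    + (end > fileLength ? "  <-- past end of file (" + fileLength + " bytes)" : ""));
        }
    }
}

The orcfiledump utility (hive --orcfiledump <path>) prints similar stripe metadata without writing any code.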

-dain

Syed Akram

Nov 30, 2015, 10:47:34 PM
to Presto
I wrote the file using the Hive ORC writer.


I used the above code to read the ORC file. As you said, this file has multiple stripes, but I am not able to read it with that code.
Can you suggest any changes?

Thanks

Syed Akram

Nov 30, 2015, 10:57:10 PM
to Presto
FYI, I tried reading a small file with the ORC reader, and it works fine.

Dain Sundstrom

Dec 1, 2015, 9:04:25 PM
to presto...@googlegroups.com
Can you post the code for the writer? IIRC you get this kind of error if you don’t flush the writer.
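
For what it's worth, the ORC writer only writes the remaining buffered stripe and the file footer/postscript when close() is called, so if close() is skipped (for example because an exception escapes addRow()), the file ends up truncated and readers report it as corrupt. A minimal, self-contained sketch of the write-and-close pattern against the old Hive writer API (the two-column schema and generated rows are made up purely for illustration):

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.CompressionKind;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.Writer;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class OrcWriteSketch
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        Path path = new Path(args[0]);
        FileSystem fs = path.getFileSystem(conf);

        // Illustrative two-column struct <id int, name string>.
        ObjectInspector inspector = ObjectInspectorFactory.getStandardStructObjectInspector(
                Arrays.asList("id", "name"),
                Arrays.<ObjectInspector>asList(
                        PrimitiveObjectInspectorFactory.writableIntObjectInspector,
                        PrimitiveObjectInspectorFactory.writableStringObjectInspector));

        Writer writer = null;
        try
        {
            writer = OrcFile.createWriter(fs, path, conf, inspector,
                    67108864, CompressionKind.SNAPPY, 4096, 10000);
            for (int i = 0; i < 10000; i++)
            {
                writer.addRow(Arrays.asList(new IntWritable(i), new Text("row-" + i)));
            }
        }
        finally
        {
            // close() flushes buffered data and writes the ORC footer;
            // without it the file is unreadable.
            if (writer != null)
            {
                writer.close();
            }
        }
    }
}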

-dain

Syed Akram

Dec 1, 2015, 10:38:42 PM
to Presto
OrcStructInspector orcStructInsp = createObjectInspector(structFieldNames, fieldObject);
ObjectInspector objInsp = (ObjectInspector) orcStructInsp;
List<? extends StructField> fieldRef = orcStructInsp.getAllStructFieldRefs();
setConf(hdfsIp, hdfsPort);
writer = OrcFile.createWriter(fs, new Path(finalPath),
        conf, objInsp, 67108864, CompressionKind.SNAPPY, 4096, 10000);

int fieldSize = fieldRef.size();
ObjectInspector[] objArray = new ObjectInspector[fieldSize];
for (int i = 0; i < fieldSize; i++)
{
    objArray[i] = fieldRef.get(i).getFieldObjectInspector();
}

// ... (enclosing try block and per-row loop elided) ...
        row.setFieldValue(i, Writables.getWritable(objArray[i], res));
        writer.addRow(row);
}
finally
{
    try
    {
        if (writer != null)
        {
            writer.close();
        }
    }
    // ... (rest of the finally block elided) ...
