Hello everyone,
For the last few weeks we have been seeing weird behaviour on some groupBy queries. They randomly fail with the following error:
net.jpountz.lz4.LZ4Exception: Error decoding offset 35777 of input buffer
at net.jpountz.lz4.LZ4JNIFastDecompressor.decompress(LZ4JNIFastDecompressor.java:66) ~[lz4-1.3.0.jar:?]
at io.druid.segment.data.CompressedObjectStrategy$LZ4Decompressor.decompress(CompressedObjectStrategy.java:290) ~[druid-processing-0.9.2.jar:0.9.2]
at io.druid.segment.data.FixedSizeCompressedObjectStrategy.decompress(FixedSizeCompressedObjectStrategy.java:54) ~[druid-processing-0.9.2.jar:0.9.2]
at io.druid.segment.data.CompressedObjectStrategy.fromByteBuffer(CompressedObjectStrategy.java:339) ~[druid-processing-0.9.2.jar:0.9.2]
at io.druid.segment.data.CompressedObjectStrategy.fromByteBuffer(CompressedObjectStrategy.java:45) ~[druid-processing-0.9.2.jar:0.9.2]
at io.druid.segment.data.GenericIndexed$BufferIndexed._get(GenericIndexed.java:225) ~[druid-processing-0.9.2.jar:0.9.2]
at io.druid.segment.data.GenericIndexed$1.get(GenericIndexed.java:300) ~[druid-processing-0.9.2.jar:0.9.2]
at io.druid.segment.data.CompressedVSizeIntsIndexedSupplier$CompressedVSizeIndexedInts.loadBuffer(CompressedVSizeIntsIndexedSupplier.java:383) ~[druid-processing-0.9.2.jar:0.9.2]
at io.druid.segment.data.CompressedVSizeIntsIndexedSupplier$CompressedVSizeIndexedInts.get(CompressedVSizeIntsIndexedSupplier.java:344) ~[druid-processing-0.9.2.jar:0.9.2]
at io.druid.segment.column.SimpleDictionaryEncodedColumn.getSingleValueRow(SimpleDictionaryEncodedColumn.java:65) ~[druid-processing-0.9.2.jar:0.9.2]
at io.druid.segment.QueryableIndexStorageAdapter$CursorSequenceBuilder$1$1QueryableIndexBaseCursor$2$1.get(QueryableIndexStorageAdapter.java:526) ~[druid-processing-0.9.2.jar:0.9.2]
at io.druid.query.groupby.epinephelinae.GroupByQueryEngineV2$GroupByEngineIterator.next(GroupByQueryEngineV2.java:238) ~[druid-processing-0.9.2.jar:0.9.2]
at io.druid.query.groupby.epinephelinae.GroupByQueryEngineV2$GroupByEngineIterator.next(GroupByQueryEngineV2.java:148) ~[druid-processing-0.9.2.jar:0.9.2]
This log is from one of the historical servers. This groupBy is on two dimensions, but the same behaviour also occurred with a single dimension. TopN queries have the same issue.
The weird thing is that when we ran the same query again two minutes later, without changing anything on the cluster, it succeeded.
We are using Druid 0.9.2, and the groupBy failed with both v1 and v2.
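Since the failure is intermittent, one way to measure how often it happens would be to re-send the identical query in a loop and record each outcome. Below is a minimal sketch of that idea, not something we have run in production. The broker URL (default port 8082 assumed), the datasource name, the dimension names and the interval are all hypothetical placeholders; only the POST to /druid/v2/ with a JSON body is the standard native query interface.

```python
import json
import time
import urllib.error
import urllib.request

# Hypothetical placeholders: adjust host, datasource, dimensions and interval.
BROKER_URL = "http://localhost:8082/druid/v2/"
EXAMPLE_QUERY = {
    "queryType": "groupBy",
    "dataSource": "my_datasource",        # hypothetical name
    "granularity": "all",
    "dimensions": ["dim_a", "dim_b"],     # hypothetical names
    "aggregations": [{"type": "count", "name": "rows"}],
    "intervals": ["2017-01-01/2017-01-02"],
}


def post_query(url, query, timeout=60):
    """POST a native query as JSON and return (ok, detail)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()
            return True, "HTTP %d" % response.status
    except urllib.error.HTTPError as error:
        # Keep the start of the error body, which usually names the exception.
        body = error.read().decode("utf-8", "replace")
        return False, "HTTP %d: %s" % (error.code, body[:500])
    except OSError as error:
        # Covers connection failures and socket timeouts.
        return False, "connection error: %s" % error


def repeat_query(send, attempts, pause_seconds=0.0):
    """Call send() repeatedly; return a list of (attempt, ok, detail)."""
    results = []
    for attempt in range(attempts):
        ok, detail = send()
        results.append((attempt, ok, detail))
        if pause_seconds:
            time.sleep(pause_seconds)
    return results


def failure_rate(results):
    """Fraction of attempts that failed (0.0 for an empty list)."""
    if not results:
        return 0.0
    failed = sum(1 for _, ok, _ in results if not ok)
    return failed / len(results)


# Example usage (commented out so nothing is sent on import):
# results = repeat_query(lambda: post_query(BROKER_URL, EXAMPLE_QUERY), 50, 1.0)
# print(failure_rate(results))
```

Comparing the timestamps of the failed attempts with the historical logs should show whether the failures are tied to a particular historical node or segment.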
What can we do to find more information about the error? Do you know what is happening here?
Thanks!