groupBy/topN failure

Federico Nieves

Jan 4, 2017, 1:19:06 PM
to Druid User
Hello everyone,

Over the last few weeks we've been seeing weird behaviour on some groupBy queries. They fail intermittently with the following error:

net.jpountz.lz4.LZ4Exception: Error decoding offset 35777 of input buffer
        at net.jpountz.lz4.LZ4JNIFastDecompressor.decompress(LZ4JNIFastDecompressor.java:66) ~[lz4-1.3.0.jar:?]
        at io.druid.segment.data.CompressedObjectStrategy$LZ4Decompressor.decompress(CompressedObjectStrategy.java:290) ~[druid-processing-0.9.2.jar:0.9.2]
        at io.druid.segment.data.FixedSizeCompressedObjectStrategy.decompress(FixedSizeCompressedObjectStrategy.java:54) ~[druid-processing-0.9.2.jar:0.9.2]
        at io.druid.segment.data.CompressedObjectStrategy.fromByteBuffer(CompressedObjectStrategy.java:339) ~[druid-processing-0.9.2.jar:0.9.2]
        at io.druid.segment.data.CompressedObjectStrategy.fromByteBuffer(CompressedObjectStrategy.java:45) ~[druid-processing-0.9.2.jar:0.9.2]
        at io.druid.segment.data.GenericIndexed$BufferIndexed._get(GenericIndexed.java:225) ~[druid-processing-0.9.2.jar:0.9.2]
        at io.druid.segment.data.GenericIndexed$1.get(GenericIndexed.java:300) ~[druid-processing-0.9.2.jar:0.9.2]
        at io.druid.segment.data.CompressedVSizeIntsIndexedSupplier$CompressedVSizeIndexedInts.loadBuffer(CompressedVSizeIntsIndexedSupplier.java:383) ~[druid-processing-0.9.2.jar:0.9.2]
        at io.druid.segment.data.CompressedVSizeIntsIndexedSupplier$CompressedVSizeIndexedInts.get(CompressedVSizeIntsIndexedSupplier.java:344) ~[druid-processing-0.9.2.jar:0.9.2]
        at io.druid.segment.column.SimpleDictionaryEncodedColumn.getSingleValueRow(SimpleDictionaryEncodedColumn.java:65) ~[druid-processing-0.9.2.jar:0.9.2]
        at io.druid.segment.QueryableIndexStorageAdapter$CursorSequenceBuilder$1$1QueryableIndexBaseCursor$2$1.get(QueryableIndexStorageAdapter.java:526) ~[druid-processing-0.9.2.jar:0.9.2]
        at io.druid.query.groupby.epinephelinae.GroupByQueryEngineV2$GroupByEngineIterator.next(GroupByQueryEngineV2.java:238) ~[druid-processing-0.9.2.jar:0.9.2]
        at io.druid.query.groupby.epinephelinae.GroupByQueryEngineV2$GroupByEngineIterator.next(GroupByQueryEngineV2.java:148) ~[druid-processing-0.9.2.jar:0.9.2]

This log is from one of the historical servers. This groupBy is on two dimensions, but the same behavior also happens with a single dimension. TopN queries have the same issue.

The weird thing is that when we ran the same query again two minutes later, without touching the cluster, it succeeded. Nothing was changed.

We are using Druid 0.9.2, and groupBy fails with both groupBy v1 and v2.

What can we do to find more information about the error? Do you know what is happening here?


Thanks!

Gian Merlino

Jan 4, 2017, 2:11:36 PM
to druid...@googlegroups.com
Hey Federico,

I wonder if one of the segment files on one of your historicals is corrupt. The query might work sometimes (if a different historical was picked) and fail sometimes (if the historical with the bad copy was picked). Also, if the bad segment is moved, it gets re-downloaded, and the query should then always work (since the bad copy is gone).

At this point, does the query still fail at all? If so, could you try narrowing the query's "intervals" down to a specific segment-granular interval that causes it to fail, and then track that down to a bad segment on a specific historical node?
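One way to do that narrowing is to re-issue the failing query once per segment-granular interval. The Python sketch below assumes hourly segments (the cache path later in this thread covers one hour), UTC times, and an hour-aligned start and end; `hourly_intervals` and `per_interval_queries` are hypothetical helpers, and posting each query to the broker is left out. Because the failure only appears when the bad replica is picked, each interval may need several runs before it fails.

```python
import copy
from datetime import datetime, timedelta, timezone

ISO = "%Y-%m-%dT%H:%M:%S.000Z"  # Druid-style UTC timestamp with milliseconds

def hourly_intervals(start, end):
    """Split [start, end) into one-hour "start/end" interval strings (assumes UTC)."""
    out = []
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(hours=1), end)
        out.append(f"{cur.strftime(ISO)}/{nxt.strftime(ISO)}")
        cur = nxt
    return out

def per_interval_queries(base_query, start, end):
    """Yield copies of a query dict, each restricted to a single hourly interval."""
    for interval in hourly_intervals(start, end):
        q = copy.deepcopy(base_query)
        q["intervals"] = [interval]
        yield q
```

Each yielded query can then be sent a few times; the interval whose query fails points at the segment to inspect.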

Gian

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+unsubscribe@googlegroups.com.
To post to this group, send email to druid...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/b93e7a63-2aa1-4075-82ed-c4e0e9f1f622%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Federico Nieves

Jan 4, 2017, 3:08:59 PM
to Druid User
Hi Gian,

That's quite possible. It is very hard to track down, though: the query failed 4 or 5 times in a row, but then started to work every time without any exception.

It also happened with another query, with the same behavior: after failing a couple of times, it doesn't fail anymore.

Any thoughts on how I could find the bad segment? Maybe from the logs?

Thanks for the help !!

Gian Merlino

Jan 4, 2017, 4:26:13 PM
to druid...@googlegroups.com
It's tough to tell from the logs; the best way is probably to use "intervals" to narrow down which segments may be involved, and then try running dump-segment on those segments as they are found in the historical node caches: http://druid.io/docs/latest/operations/dump-segment.html

If a segment is corrupt, dump-segment should fail on it.
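A minimal sketch of running dump-segment on one cached segment from Python, following the command shape in the linked documentation (`java -classpath "<druid lib>/*" io.druid.cli.Main tools dump-segment --directory ... --out ...`). The jar location `/opt/druid/lib` and the helper names are hypothetical, and treating a non-zero exit status as "dump failed" is an assumption: an uncaught exception in the JVM's main thread normally ends the process with a non-zero code.

```python
import subprocess

# Hypothetical location of the Druid jars; adjust to your installation.
DRUID_LIB = "/opt/druid/lib"

def dump_segment_cmd(segment_dir, out_file, druid_lib=DRUID_LIB):
    """Build the dump-segment invocation for one segment (partition) directory."""
    # The "*" is passed to java unexpanded; java expands classpath wildcards itself.
    return ["java", "-classpath", f"{druid_lib}/*", "io.druid.cli.Main",
            "tools", "dump-segment",
            "--directory", segment_dir, "--out", out_file]

def check_segment(segment_dir, out_file, druid_lib=DRUID_LIB, run=subprocess.run):
    """Return (ok, stderr); assumes a failed dump exits with a non-zero status."""
    result = run(dump_segment_cmd(segment_dir, out_file, druid_lib),
                 capture_output=True, text=True)
    return result.returncode == 0, result.stderr
```

The `run` parameter only exists so the check can be exercised without a Druid install.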

Gian

Federico Nieves

Jan 4, 2017, 5:34:21 PM
to Druid User
I followed the instructions at your link and got the following error:

Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: /var/druid/cache/historical/source/2016-11-23T12:00:00.000Z_2016-11-23T13:00:00.000Z/2016-11-23T12:00:19.081Z/index.drd

It seems that index.drd isn't present in any of the cached segments. Do I have to download the segment from HDFS, extract index.zip, and then point to the folder of the extracted files?

Thanks for your help !

Federico Nieves

Jan 24, 2017, 6:51:30 PM
to Druid User
Hi Gian, did you see my earlier post? The error is still happening (and quite frequently). The weird thing is that sometimes, after a couple of retries, the query succeeds.

Thanks!

Gian Merlino

Jan 24, 2017, 7:13:14 PM
to druid...@googlegroups.com
Sorry I missed it. That error is a bit misleading; it just means there are no segment files in the directory you provided. You're probably just missing the partition number of the segment. Try using /var/druid/cache/historical/source/2016-11-23T12:00:00.000Z_2016-11-23T13:00:00.000Z/2016-11-23T12:00:19.081Z/X/, where the final X/ is the partition directory (it will be a number like 0, 1, 2, etc.; if there are multiple, each one holds its own segment). Then the dump should work.
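The cache layout implied by the paths in this thread is `<cache root>/<dataSource>/<start>_<end>/<version>/<partitionNum>`. A small sketch that builds the full directory, including the partition number that was missing here; the layout is inferred from this thread rather than taken from Druid documentation, and `segment_cache_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath

def segment_cache_path(cache_root, datasource, start, end, version, partition):
    """Compose <cache_root>/<dataSource>/<start>_<end>/<version>/<partitionNum>.

    Layout inferred from the cache paths in this thread (an assumption).
    """
    return str(PurePosixPath(cache_root, datasource, f"{start}_{end}",
                             version, str(partition)))
```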

Gian

Federico Nieves

Jan 24, 2017, 7:25:26 PM
to Druid User
Thanks Gian, that was quick.

Actually, files are there, but not the one the process is looking for. Here is what the folder you mentioned contains:

00000.smoosh
meta.smoosh
version.bin

Only those 3 files (00000.smoosh is the large one and should hold all the segment data). But there is no "index.drd". Am I missing something?
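Before running a full dump, a cheap sanity check is to confirm that the entries listed in meta.smoosh fit inside the chunk files. This sketch assumes the meta.smoosh layout written by Druid's smoosh writer: a header line `v1,<maxChunkSize>,<numChunks>` followed by one `<name>,<chunkNumber>,<startOffset>,<endOffset>` line per entry, with chunks named 00000.smoosh, 00001.smoosh, and so on; verify that against your Druid version. It only catches missing or truncated chunk files; corrupted bytes inside a chunk, such as a bad LZ4 block, still need dump-segment to surface. `check_smoosh_offsets` is a hypothetical helper.

```python
from pathlib import Path

def check_smoosh_offsets(segment_dir):
    """Return a list of problems; an empty list means the offsets look consistent.

    Assumed meta.smoosh layout (verify against your Druid version):
      v1,<maxChunkSize>,<numChunks>
      <name>,<chunkNumber>,<startOffset>,<endOffset>
    """
    base = Path(segment_dir)
    lines = (base / "meta.smoosh").read_text().splitlines()
    if not lines or not lines[0].startswith("v1,"):
        return [f"unexpected meta.smoosh header: {lines[:1]!r}"]
    problems = []
    for line in lines[1:]:
        if not line.strip():
            continue
        name, chunk, start, end = line.rsplit(",", 3)
        chunk_file = base / f"{int(chunk):05d}.smoosh"
        if not chunk_file.is_file():
            problems.append(f"{name}: missing {chunk_file.name}")
        elif not 0 <= int(start) <= int(end) <= chunk_file.stat().st_size:
            problems.append(f"{name}: bytes {start}-{end} do not fit in {chunk_file.name}")
    return problems
```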

Thanks!

Gian Merlino

Jan 24, 2017, 7:29:06 PM
to druid...@googlegroups.com
dump-segment only looks for index.drd if version.bin is missing, and you do have version.bin. That's why I suggest double-checking the directory parameter you're giving to dump-segment.
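That lookup order (version.bin first, index.drd only as a fallback) also explains the misleading FileNotFoundException: pointing at the version directory instead of the partition directory finds neither file. A hypothetical pre-flight check in Python that mirrors the order described above and, when a directory holds neither file, lists the child directories that do contain version.bin:

```python
from pathlib import Path

def classify_segment_dir(path):
    """Return (marker, candidates) for a directory about to be passed to dump-segment.

    marker is "version.bin" or "index.drd" when that file is present (checked in
    that order), else "none"; in the "none" case, candidates lists child
    directories that contain version.bin (e.g. the missing partition number).
    """
    base = Path(path)
    if (base / "version.bin").is_file():
        return "version.bin", []
    if (base / "index.drd").is_file():
        return "index.drd", []
    candidates = sorted(
        (p for p in base.iterdir() if p.is_dir() and (p / "version.bin").is_file()),
        key=lambda p: p.name,
    )
    return "none", candidates
```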

Gian

Federico Nieves

Jan 24, 2017, 7:41:27 PM
to Druid User
It worked! As you said, the path was missing the /0 folder, which is present in every segment directory. Silly me for not checking that before.

Checking them one by one seems like a lot of work. I could script it, but it would take a while. Isn't there any other way to investigate that error (from my first message) or to check segment integrity?

Thanks a lot Gian, one more time :)

Gian Merlino

Jan 24, 2017, 7:58:48 PM
to druid...@googlegroups.com
I think scripting it is probably the best way. At least that's what I would do.
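A minimal sketch of such a script, meant to run on each historical node (every node holds its own copies): find every directory under the segment cache that contains a version.bin, run dump-segment on it, and collect the ones that fail. The Druid jar path `/opt/druid/lib`, the assumption that a failed dump exits with a non-zero status, and the function names are all hypothetical; the command shape follows the dump-segment documentation linked earlier in the thread.

```python
import subprocess
import tempfile
from pathlib import Path

# Hypothetical install location of the Druid jars; adjust for your nodes.
DRUID_LIB = "/opt/druid/lib"

def find_segment_dirs(cache_root):
    """Every directory under cache_root holding a version.bin (one per partition)."""
    return sorted(p.parent for p in Path(cache_root).rglob("version.bin"))

def scan_cache(cache_root, druid_lib=DRUID_LIB, run=subprocess.run):
    """Run dump-segment on each cached segment; return (dir, first stderr line) per failure."""
    failures = []
    for seg_dir in find_segment_dirs(cache_root):
        # Dump into a throwaway directory; only the exit status matters here.
        with tempfile.TemporaryDirectory() as tmp:
            cmd = ["java", "-classpath", f"{druid_lib}/*", "io.druid.cli.Main",
                   "tools", "dump-segment",
                   "--directory", str(seg_dir),
                   "--out", str(Path(tmp, "dump.json"))]
            result = run(cmd, capture_output=True, text=True)
        if result.returncode != 0:  # assumed to signal an unreadable segment
            lines = result.stderr.strip().splitlines()
            failures.append((seg_dir, lines[0] if lines else ""))
    return failures
```

For example, `scan_cache("/var/druid/cache/historical")` would return one entry per cached segment that dump-segment could not read. Dumping every segment is slow on a large cache, so it may be worth limiting the scan to the intervals already suspected.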

Gian
