Error when querying a huge volume of data in Druid

shailendra kumar

Apr 15, 2021, 6:03:47 AM
to Druid User
We were running a Druid query server on a 2-core, 8 GB RAM machine (broker JVM heap: 4g, router JVM heap: 512m). With this configuration we were able to query ~5 GiB of data, or about 50L (5 million) records.
If we try to query a longer time period, we get the error "unparsable row in string return" in the Druid UI.

[screenshot: 2021-04-15.png]


When we looked at the logs, we found the error below in broker.log:

2021-04-15T08:16:00,618 ERROR [qtp1821711066-139] org.apache.druid.sql.http.SqlResource - Unable to send SQL response [20da59b3-0640-48fd-a00e-49cc8b8902a3]
org.apache.druid.query.QueryInterruptedException: Unexpected end-of-input: expected close marker for Array (start marker at [Source: (SequenceInputStream); line: -1, column: -1])
at [Source: (SequenceInputStream); line: -1, column: 396074945]
       at org.apache.druid.client.JsonParserIterator.interruptQuery(JsonParserIterator.java:197) ~[druid-server-0.20.0.jar:0.20.0]
       at org.apache.druid.client.JsonParserIterator.next(JsonParserIterator.java:119) ~[druid-server-0.20.0.jar:0.20.0]
       at org.apache.druid.java.util.common.guava.BaseSequence.makeYielder(BaseSequence.java:90) ~[druid-core-0.20.0.jar:0.20.0]
       at org.apache.druid.java.util.common.guava.BaseSequence.access$000(BaseSequence.java:27) ~[druid-core-0.20.0.jar:0.20.0]
       at org.apache.druid.java.util.common.guava.BaseSequence$1.next(BaseSequence.java:114) ~[druid-core-0.20.0.jar:0.20.0]
       at org.apache.druid.java.util.common.guava.MergeSequence.makeYielder(MergeSequence.java:131) ~[druid-core-0.20.0.jar:0.20.0]
       at org.apache.druid.java.util.common.guava.MergeSequence.access$000(MergeSequence.java:32) ~[druid-core-0.20.0.jar:0.20.0]
       at org.apache.druid.java.util.common.guava.MergeSequence$2.next(MergeSequence.java:173) ~[druid-core-0.20.0.jar:0.20.0]
       at org.apache.druid.java.util.common.guava.WrappingYielder$1.get(WrappingYielder.java:53) ~[druid-core-0.20.0.jar:0.20.0]
       at org.apache.druid.java.util.common.guava.WrappingYielder$1.get(WrappingYielder.java:49) ~[druid-core-0.20.0.jar:0.20.0]
       at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.20.0.jar:0.20.0]
       at org.apache.druid.java.util.common.guava.WrappingYielder.next(WrappingYielder.java:48) ~[druid-core-0.20.0.jar:0.20.0]
       at org.apache.druid.java.util.common.guava.MergeSequence.makeYielder(MergeSequence.java:131) ~[druid-core-0.20.0.jar:0.20.0]

We then tried upgrading to a 4-core, 16 GB RAM machine (broker JVM heap: 10g, router JVM heap: 1024m),
but nothing changed: the query limit and the error remained the same.

[screenshot: qqqq.png]

We are trying to fetch nearly 200L (20 million) records, or about 20 GB of data, in a single query. Can someone suggest a hardware or software configuration that would achieve this?

Rachel Pedreschi

Apr 15, 2021, 10:42:34 AM
to druid...@googlegroups.com
Is it possible that it is being caused by a certain value that you are only retrieving when you query more data? Or does this happen when you reach a certain number of rows or amount of data, regardless of what time period you are querying?
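
For instance, one way to separate the two cases is to fix the time period and vary only the row count with a LIMIT. This is illustrative SQL only; the datasource name and interval below are placeholders:

-- Same interval every time; raise or lower the LIMIT to find where the error starts
SELECT *
FROM "your_datasource"
WHERE __time >= TIMESTAMP '2021-03-01 00:00:00'
  AND __time <  TIMESTAMP '2021-04-01 00:00:00'
LIMIT 1000000

If the error shows up at roughly the same LIMIT no matter which interval you pick, that points at result-set size rather than a particular value in the data.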


--
Rachel Pedreschi
VP Developer Relations and Community
Imply.io

shailendra kumar

Apr 16, 2021, 2:11:53 AM
to Druid User

Hi Rachel,

    This happens when we reach a certain number of rows or amount of data, regardless of what time period we are querying.

Peter Marshall

Apr 16, 2021, 10:42:37 AM
to Druid User
I'm asking around some developers and will see what comes back :)

vai...@imply.io

Apr 16, 2021, 1:41:46 PM
to Druid User

It seems like you are hitting the bug reported below, which is still open:
https://github.com/apache/druid/issues/10907

Thanks and Regards,
Vaibhav

Ben Krug

Apr 16, 2021, 5:15:26 PM
to druid...@googlegroups.com
I also found a similar report in which it turned out to be a misleading message caused by a timeout. Can you add a longer timeout in the query context and see whether it runs longer? E.g., you can edit the query context by clicking the three dots next to "Run" and adding something like this:

"timeout": 20000

Of course, your query will run longer if it works, which might strain your system, so keep an eye on it.
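
One note: the context "timeout" is in milliseconds, so to give the query more time than the broker's default (300000 ms, unless overridden) you would use a larger value, e.g. 600000 for ten minutes. If it is easier to test outside the console, the same context can be passed through the SQL HTTP API. This is a sketch only; it assumes the Router is reachable at localhost:8888 and uses a placeholder query:

# POST the query plus a ten-minute timeout to the SQL endpoint
curl -X POST http://localhost:8888/druid/v2/sql \
  -H 'Content-Type: application/json' \
  -d '{"query": "SELECT COUNT(*) FROM your_datasource", "context": {"timeout": 600000}}'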