Help needed regarding the druid exception

Keerthi Kumar

Jun 12, 2024, 1:28:31 AM
to Druid User
Hi All,

We are trying to cache around 80 million rows of data in Druid with the hardware below:

No. of nodes - 4
CPU : 46
RAM : 96 GB
OS Disk : 100 GB
Data Disk : 1 * 500 GB
Storage : 2 *Data Disk 1 TB

- 20 parallel queries: a few are executed manually and a few are scheduled
- Dashboards with 6 charts

The application is now failing to load the dashboards from Druid, with the exceptions below:

Historicals:
2024-06-08T10:17:38,141 WARN [qtp1990720701-72[groupBy_[ds_ws_mvno_d_trm_rat_miss_t_ra]_f1c07071-9885-4b96-b9c3-0f3d1c896ce2]] org.apache.druid.server.QueryLifecycle - Exception while processing queryId [f1c07071-9885-4b96-b9c3-fdgfdgfdgfdg] (org.apache.druid.query.QueryTimeoutException: Query Timed Out!)

Router:
TRACE StatusLogger Call to LogManager.getLogger(com.sun.jersey.server.impl.application.CloseableServiceFactory)
2024-06-07T17:47:39,388 ERROR [CoordinatorRuleManager-Exec--0] org.apache.druid.server.router.CoordinatorRuleManager - Exception while polling for rules
org.apache.druid.java.util.common.IOE: No known server
at org.apache.druid.discovery.DruidLeaderClient.getCurrentKnownLeader(DruidLeaderClient.java:267) ~[druid-server-24.0.1.jar:24.0.1]
at org.apache.druid.discovery.DruidLeaderClient.makeRequest(DruidLeaderClient.java:122) ~[druid-server-24.0.1.jar:24.0.1]
at org.apache.druid.server.router.CoordinatorRuleManager.poll(CoordinatorRuleManager.java:138) ~[druid-services-24.0.1.jar:24.0.1]
at org.apache.druid.java.util.common.concurrent.ScheduledExecutors$1.call(ScheduledExecutors.java:55) ~[druid-core-24.0.1.jar:24.0.1]

Broker:
2024-06-08T10:17:38,132 WARN [sql[b563386c-0247-4a90-80f8-95dff3577d58]] org.apache.druid.server.QueryLifecycle - Exception while processing queryId [a4660137-48bf-4dfc-8374-05425aa4a0a0] (org.apache.druid.query.QueryTimeoutException: Query [a4660137-48bf-4dfc-8374-05425aa4a0a0] timed out!)
2024-06-08T10:17:38,133 WARN [sql[64e55900-20d0-43f5-815c-6b94b019a3ac]] org.apache.druid.server.QueryLifecycle - Exception while processing queryId [1e035b2c-d951-4802-a618-320a03289976] (org.apache.druid.query.QueryTimeoutException: Query [1e035b2c-d951-4802-a618-320a03289976] timed out!)
2024-06-08T10:18:58,884 ERROR [qtp1389978471-113] org.apache.druid.sql.http.SqlResource - Unable to send SQL response [ee876e5a-b24b-440a-9744-b0c391aa941e]
org.eclipse.jetty.io.EofException: null
at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:280) ~[jetty-io-9.4.48.v20220622.jar:9.4.48.v20220622]
at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422) ~[jetty-io-9.4.48.v20220622.jar:9.4.48.v20220622]
at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:277) ~[jetty-io-9.4.48.v20220622.jar:9.4.48.v20220622]
at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:381) ~[jetty-io-9.4.48.v20220622.jar:9.4.48.v20220622]
at org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:831) ~[jetty-server-9.4.48.v20220622.jar:9.4.48.v20220622]

Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.writev0(Native Method) ~[?:1.8.0_275]
at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51) ~[?:1.8.0_275]
at sun.nio.ch.IOUtil.write(IOUtil.java:148) ~[?:1.8.0_275]
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:503) ~[?:1.8.0_275]
at java.nio.channels.SocketChannel.write(SocketChannel.java:502) ~[?:1.8.0_275]
at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:274) ~[jetty-io-9.4.48.v20220622.jar:9.4.48.v20220622]
... 84 more
2024-06-08T10:18:59,404 ERROR [qtp1389978471-122] org.apache.druid.sql.http.SqlResource - Unable to send SQL response [7175fc93-f81e-4137-b103-419b3986d9a7]
org.eclipse.jetty.io.EofException: null
at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:280) ~[jetty-io-9.4.48.v20220622.jar:9.4.48.v20220622]
at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422) ~[jetty-io-9.4.48.v20220622.jar:9.4.48.v20220622]
at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:277) ~[jetty-io-9.4.48.v20220622.jar:9.4.48.v20220622]
at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:381) ~[jetty-io-9.4.48.v20220622.jar:9.4.48.v20220622]
at org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:831) ~[jetty-server-9.4.48.v20220622.jar:9.4.48.v20220622]

Can anyone please help me with this issue? Is the hardware described above sufficient for this requirement, and how can we fix this?

Peter Marshall

Jun 18, 2024, 4:00:30 AM
to Druid User
Hey!

Given the number of "timed out" messages, I would say this is a genuine, intentional timeout: the queries took too long to run.

This would indicate to me that you have a complex query that is not "interactive", yet you are running it through the interactive API (which is meant for sub-second query responses) rather than the asynchronous ("MSQ") API.
If you do need queries to come back < 5s, then you may want to look at the https://learn.imply.io/ course on data modelling to see if you are partitioning etc. your data sufficiently. I believe there is also a video in the course that explains how queries execute when you use the interactive SQL API. These may lead you to look at things like the __time filter in your queries, whether you are using secondary partitioning on commonly filtered dimensions, whether you could use aggregation, etc.
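To make the `__time` filter point a bit more concrete, here is a rough sketch of a request body for the interactive SQL endpoint (`POST /druid/v2/sql`) that bounds the scan to a time window and sets the `timeout` query context parameter explicitly. The helper name, the hourly `COUNT(*)` query, and the 30 second default are only assumptions for illustration; the real query and columns will differ.

```python
import json


def build_sql_request(datasource, start_ts, end_ts, timeout_ms=30000):
    """Build a JSON body for Druid's SQL endpoint (POST /druid/v2/sql).

    Hypothetical helper: the query shape is a placeholder, not the
    poster's actual dashboard query. Timestamps are expected in the
    form 'YYYY-MM-DD HH:MM:SS' and come from trusted code, not users.
    """
    query = (
        "SELECT TIME_FLOOR(__time, 'PT1H') AS hr, COUNT(*) AS row_count "
        f'FROM "{datasource}" '
        # Bounding __time lets Druid skip segments outside the window.
        f"WHERE __time >= TIMESTAMP '{start_ts}' "
        f"AND __time < TIMESTAMP '{end_ts}' "
        "GROUP BY 1"
    )
    # "timeout" is the per-query timeout in milliseconds.
    return {"query": query, "context": {"timeout": timeout_ms}}


if __name__ == "__main__":
    body = build_sql_request(
        "ds_ws_mvno_d_trm_rat_miss_t_ra",
        "2024-06-08 00:00:00",
        "2024-06-09 00:00:00",
    )
    print(json.dumps(body, indent=2))
```

Raising the timeout only hides the symptom, so narrowing the time window is usually the first thing to try.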

I also notice that you're running Druid 24. Druid 30 has just been released, so you are quite some way behind and may want to think about upgrading.

Hope these things help!

John Kowtko

Jun 18, 2024, 7:26:59 AM
to Druid User
Hi N.Keerth,

In addition to everything Peter mentioned, if you send us a copy of your query we can take a quick look and see whether there are any easy optimizations.
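If it helps when sharing the query, Druid SQL supports `EXPLAIN PLAN FOR`, which returns the plan instead of running the query. A minimal sketch of wrapping an existing SQL string for the same `/druid/v2/sql` endpoint follows; the helper name is made up for this example.

```python
def explain_request(sql):
    """Wrap a SQL string in EXPLAIN PLAN FOR (hypothetical helper).

    Strips surrounding whitespace and any trailing semicolon so the
    wrapped statement stays a single valid statement.
    """
    cleaned = sql.strip().rstrip(";").rstrip()
    return {"query": "EXPLAIN PLAN FOR " + cleaned}


if __name__ == "__main__":
    print(explain_request("SELECT COUNT(*) FROM my_datasource;"))
```

Posting the plan output alongside the query text makes it easier to see which native query type is used and what filters it applies.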

Thanks.  John
