Hi Vijay,
Thanks for the document. We already have maxQueuedBytes set to 10 MiB, but we still face this issue.
Interestingly, after reading the document I tried an experiment: I increased druid.router.http.readTimeout on the router from the existing PT5M to PT15M. Surprisingly, the query then ran for 8 min 40 s instead of the usual 5 min before failing with the same error; on the client side it closed without an exception, and it returned 8,136,866 rows versus the previous 3,000,000.
This is a big improvement, and it is also why I am confused: I am not sure how to debug this, nor do I understand what is causing the issue. I am starting to believe it is timeout related, but since each process has so many timeout parameters, I would like to understand how they correlate and how I can debug this problem.
I have listed all our server configurations below.
We are currently running Druid 0.20.0.
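To isolate whether the router's readTimeout is the limiting factor, I am planning to send the same SQL query straight to a broker, bypassing the router. A minimal sketch using only the standard library; the broker host and port (8082) are placeholders for our environment, and the payload shape follows Druid's /druid/v2/sql endpoint:

```python
import json
import urllib.request


def build_sql_request(query, timeout_ms):
    """Build the JSON body for Druid's SQL endpoint, with a per-query timeout
    in the query context."""
    return {"query": query, "context": {"timeout": timeout_ms}}


def query_druid(host, port, query, timeout_ms):
    """POST the query directly to the given host/port (e.g. a broker),
    bypassing the router entirely."""
    body = json.dumps(build_sql_request(query, timeout_ms)).encode("utf-8")
    req = urllib.request.Request(
        "http://{h}:{p}/druid/v2/sql".format(h=host, p=port),
        data=body,
        headers={"Content-Type": "application/json"},
    )
    # Socket-level timeout set slightly above the query timeout so the
    # server-side timeout fires first:
    with urllib.request.urlopen(req, timeout=timeout_ms / 1000 + 60) as resp:
        return json.load(resp)


# Example (host/port are assumptions for our setup):
# rows = query_druid("broker-1", 8082, "SELECT 1", 900000)
```

If the direct broker query survives past 5 minutes while the routed one does not, that would point at the router hop.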
################################################################
Current values in our servers
################################################################
druid.server.http.defaultQueryTimeout
-------------------------------------
Historical : not set, so the default of 300000 ms (5 minutes) applies
Broker : not set, so the default of 300000 ms (5 minutes) applies
MiddleManager : not set, so the default of 300000 ms (5 minutes) applies
druid.broker.http.readTimeout
-------------------------------------
Broker-1 : PT5M
Broker-2 : PT5M
druid.router.http.readTimeout
-------------------------------------
Router : PT5M
druid.broker.http.maxQueuedBytes
-------------------------------------
Broker-1: 10MiB
Broker-2: 10MiB
druid.processing.buffer.sizeBytes
-------------------------------------
Historical-1 : 1024MiB
Historical-2 : 1024MiB
Broker-1 : 500MiB
Broker-2 : 500MiB
MiddleManager-1 : druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100MiB
Router : not set
druid.query.groupBy.maxMergingDictionarySize
-------------------------------------
Historical-1 : not set, so the default is 100000000 (100 MB)
Historical-2 : not set, so the default is 100000000 (100 MB)
Broker-1 : not set, so the default is 100000000 (100 MB)
Broker-2 : not set, so the default is 100000000 (100 MB)
MiddleManager-1 : not set, so the default is 100000000 (100 MB)
Router : not set
druid.query.groupBy.maxOnDiskStorage
-------------------------------------
Historical-1 : 1000000000 (1 GB)
Historical-2 : 1000000000 (1 GB)
Broker-1 : not set, so the default is 0 (disk spilling disabled)
Broker-2 : not set, so the default is 0 (disk spilling disabled)
MiddleManager-1 : not set, so the default is 0 (disk spilling disabled)
Router : not set
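For my own reference, this is my understanding of how these server-side timeouts combine (please correct me if I am wrong): the query context `timeout` is used if set, otherwise `druid.server.http.defaultQueryTimeout`, and the result is capped by `druid.server.http.maxQueryTimeout`. A small sketch of that rule, with defaults mirroring our current broker settings:

```python
def effective_timeout_ms(context_timeout, default_timeout=300000, max_timeout=900000):
    """Return the server-side query timeout in ms.

    context_timeout: the 'timeout' key from the query context, or None.
    The combination rule is my reading of the docs, not verified
    against the Druid source.
    """
    t = context_timeout if context_timeout is not None else default_timeout
    return min(t, max_timeout)


# With no context timeout we fall back to the 5-minute default:
print(effective_timeout_ms(None))      # 300000
# Our client sends 900000, which equals maxQueryTimeout:
print(effective_timeout_ms(900000))    # 900000
# Anything larger is capped:
print(effective_timeout_ms(2000000))   # 900000
```

If this rule is right, our client's 900000 ms context timeout should already be in effect on the broker, which makes the 5-minute failures even harder to explain.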
#########################
Broker configuration
#########################
# HTTP server settings
druid.server.http.numThreads=50
# 15 minutes should be the maximum timeout; we should not allow any query
# that holds up the Druid system
druid.server.http.maxQueryTimeout=900000
druid.server.http.maxSubqueryRows=1000000
druid.broker.balancer.type=connectionCount
druid.broker.http.numConnections=40
druid.broker.http.maxQueuedBytes=10MiB
druid.broker.http.readTimeout=PT5M
druid.broker.http.unusedConnectionTimeout=PT4M
druid.broker.http.numMaxThreads=40
druid.processing.buffer.sizeBytes=500MiB
druid.processing.numMergeBuffers=6
druid.processing.numThreads=1
druid.processing.tmpDir=/opt/apache-druid-0.20.0/var/druid/processing
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.SysMonitor","org.apache.druid.java.util.metrics.JvmMonitor","org.apache.druid.java.util.metrics.JvmThreadsMonitor","org.apache.druid.java.util.metrics.JvmCpuMonitor","org.apache.druid.java.util.metrics.CpuAcctDeltaMonitor","org.apache.druid.client.cache.CacheMonitor","org.apache.druid.server.metrics.QueryCountStatsMonitor","org.apache.druid.server.emitter.HttpEmittingMonitor"]
#########################
Router configuration
#########################
druid.router.http.numConnections=50
druid.router.http.readTimeout=PT15M
druid.router.http.numMaxThreads=40
druid.server.http.numThreads=50
druid.router.defaultBrokerServiceName=druid/broker
druid.router.coordinatorServiceName=druid/coordinator
druid.router.managementProxy.enabled=true
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.SysMonitor","org.apache.druid.java.util.metrics.JvmCpuMonitor","org.apache.druid.java.util.metrics.CpuAcctDeltaMonitor","org.apache.druid.java.util.metrics.JvmMonitor","org.apache.druid.java.util.metrics.JvmThreadsMonitor","org.apache.druid.server.emitter.HttpEmittingMonitor"]
#########################
Historical configuration
#########################
druid.server.http.numThreads=50
druid.processing.buffer.sizeBytes=1024MiB
druid.processing.numMergeBuffers=2
druid.processing.numThreads=7
druid.processing.tmpDir=/opt/apache-druid-0.20.0/data/processing
druid.segmentCache.locations=[{"path":"/opt/apache-druid-0.20.0/data/segment-cache","maxSize":"540g"}]
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=caffeine
druid.cache.sizeInBytes=512MiB
druid.query.groupBy.maxOnDiskStorage=1000000000
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.SysMonitor","org.apache.druid.java.util.metrics.JvmCpuMonitor","org.apache.druid.java.util.metrics.CpuAcctDeltaMonitor","org.apache.druid.java.util.metrics.JvmMonitor","org.apache.druid.java.util.metrics.JvmThreadsMonitor","org.apache.druid.client.cache.CacheMonitor","org.apache.druid.server.metrics.QueryCountStatsMonitor","org.apache.druid.server.metrics.HistoricalMetricsMonitor", "org.apache.druid.server.emitter.HttpEmittingMonitor"]
#########################
MiddleManager configuration
#########################
druid.indexer.runner.startPort=8100
druid.indexer.runner.endPort=8140
druid.worker.capacity=8
druid.indexer.runner.javaOptsArray=["-server" ,"-Xms1200m" ,"-Xmx1200m" ,"-XX:MaxDirectMemorySize=1g","-Duser.timezone=UTC","-Dfile.encoding=UTF-8","-XX:+ExitOnOutOfMemoryError","-Djute.maxbuffer=1024000","-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager","-Dorg.jboss.logging.provider=slf4j","-Dnet.spy.log.LoggerImpl=net.spy.memcached.compat.log.SLF4JLogger","-Dlog4j.shutdownCallbackRegistry=org.apache.druid.common.config.Log4jShutdown","-Dlog4j.shutdownHookEnabled=true","-XX:+HeapDumpOnOutOfMemoryError","-XX:HeapDumpPath=/log/jvm/heapdump"]
druid.indexer.task.baseTaskDir=/opt/apache-druid-0.20.0/data/task
druid.server.http.numThreads=50
druid.indexer.fork.property.druid.processing.numMergeBuffers=2
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100MiB
druid.indexer.fork.property.druid.processing.numThreads=1
druid.indexer.fork.property.druid.processing.tmpDir=/opt/apache-druid-0.20.0/data/tmp
#########################
Python code
#########################
import pandas as pd
from pydruid.db import connect
from datetime import datetime

st = datetime.now()
query = """select __time, MRIClientId, MRISessionId, EventType, RequestTimestamp, ResponseTimestamp, Phone, TIN, PNR from mriprodstream where __time BETWEEN TIMESTAMP '2021-08-24 00:00:00' and TIMESTAMP '2021-08-24 23:59:59'"""
# Initialise to None so the finally block does not raise a NameError
# if connect() itself fails.
conn = None
curs = None
try:
    conn = connect(host='mridruidquery', port=8888, path='/druid/v2/sql',
                   scheme='http', context={"timeout": 900000})
    curs = conn.cursor()
    print("Curs connection done!!")
    df = pd.DataFrame(curs.execute(query))
    print("data frame ready")
    print(df.shape)
except Exception as e:
    print(e)
finally:
    if curs:
        curs.close()
    if conn:
        conn.close()
print("Execution taken = {tt}".format(tt=(datetime.now() - st)))
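As a workaround while we debug, I am also considering splitting the day into smaller windows so that each query finishes well under the 5-minute mark. A minimal sketch; the one-hour window size is just a guess and would need tuning to our data volume:

```python
from datetime import datetime, timedelta


def time_windows(start, end, step=timedelta(hours=1)):
    """Yield (window_start, window_end) pairs covering [start, end)."""
    cur = start
    while cur < end:
        nxt = min(cur + step, end)
        yield cur, nxt
        cur = nxt


WINDOW_QUERY = """select __time, MRIClientId, MRISessionId from mriprodstream
where __time >= TIMESTAMP '{s}' and __time < TIMESTAMP '{e}'"""

windows = list(time_windows(datetime(2021, 8, 24), datetime(2021, 8, 25)))
print(len(windows))  # 24 hourly windows
print(WINDOW_QUERY.format(s=windows[0][0], e=windows[0][1]))
```

Each window's result could then be concatenated client-side, at the cost of more round trips.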
Regards,
Nitish