druid performance optimization issues


zoucaitou

Feb 28, 2018, 9:53:01 PM
to Druid User
System
  • ubuntu 16.04
  • druid 0.10.1
  • hadoop 2.9.0
Hardware
  • 16 CPU cores
  • 64 GB memory
  • 500 GB storage (not SSD)
Distributed
  • master node => node1
  • data node => node2
  • query node => node3

Master Node

Coordinator
 
jvm.config
-server
-Xmx10g
-Xms10g
-XX:NewSize=512m
-XX:MaxNewSize=512m
-XX:+UseG1GC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/home/druid/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
-Dderby.stream.error.file=/home/druid/derby.log
 
runtime.properties 
druid.service=druid/coordinator
druid.host=xxxxxxxx
druid.port=8081

druid.coordinator.startDelay=PT30S
druid.coordinator.period=PT30S
druid.coordinator.merge.on=true
 
Overlord
jvm.config
-server
-Xmx4g
-Xms4g
-XX:NewSize=256m
-XX:MaxNewSize=256m
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/home/druid/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

 
runtime.properties 
druid.service=druid/overlord
druid.host=xxxxxxx
druid.port=8090

druid.indexer.autoscale.doAutoscale=true
druid.indexer.autoscale.strategy=ec2
druid.indexer.autoscale.workerIdleTimeout=PT90m
druid.indexer.autoscale.terminatePeriod=PT5M

druid.indexer.queue.startDelay=PT30S
druid.coordinator.period=PT30S

druid.indexer.runner.type=remote
druid.indexer.storage.type=metadata

Data Node

Historical
 
jvm.config
-server
-Xmx12g
-Xms12g
-XX:NewSize=6g
-XX:MaxNewSize=6g
-XX:MaxDirectMemorySize=30g
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/home/druid/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

 
runtime.properties 
druid.service=druid/historical
druid.host=xxxxxx
druid.port=8083

druid.server.tier=hot
druid.server.priority=100

# HTTP server threads
druid.server.http.numThreads=45

# Processing threads and buffers
druid.processing.buffer.sizeBytes=1073741824
druid.processing.numMergeBuffers=11
druid.processing.numThreads=15
druid.processing.tmpDir=/home/druid/processing

# Segment storage
druid.segmentCache.locations=[{"path":"/home/druid/segment-cache","maxSize":300000000000}]
druid.server.maxSize=300000000000

# Query cache
druid.historical.cache.useCache=false
druid.historical.cache.populateCache=false
# druid.cache.type=caffeine
# druid.cache.sizeInBytes=2000000000
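As a sanity check on the direct-memory sizing above: Druid needs roughly one processing buffer per processing thread, one per merge buffer, plus one spare, so required direct memory is about sizeBytes × (numThreads + numMergeBuffers + 1). A minimal sketch of that arithmetic (my own back-of-the-envelope check, using the numbers from this config, not part of the original post):

```python
# Rough direct-memory check for the historical node config above.
# Assumption: required direct memory ~= sizeBytes * (numThreads + numMergeBuffers + 1).
buffer_size = 1073741824        # druid.processing.buffer.sizeBytes (1 GiB)
num_threads = 15                # druid.processing.numThreads
num_merge_buffers = 11          # druid.processing.numMergeBuffers

required = buffer_size * (num_threads + num_merge_buffers + 1)
max_direct = 30 * 1024**3       # -XX:MaxDirectMemorySize=30g

print(required / 1024**3)       # 27.0 GiB required
print(required <= max_direct)   # True: 27 GiB fits under the 30g limit
```

So the 30g MaxDirectMemorySize is just enough for these buffer settings, but it leaves little headroom on a 64 GB box once the 12g heap is counted.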

 
MiddleManager
jvm.config
-server
-Xmx64m
-Xms64m
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/home/druid/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager


 
runtime.properties 
druid.service=druid/middlemanager
druid.host=xxxxxxxx
druid.port=8091

# Number of tasks per middleManager
druid.worker.capacity=10

# Task launch parameters
druid.indexer.runner.javaOpts=-server -Xmx3g -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

druid.indexer.task.baseTaskDir=/home/druid/task
druid.indexer.task.restoreTasksOnRestart=true

# HTTP server threads
druid.server.http.numThreads=45

# Processing threads and buffers
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=336870912
druid.indexer.fork.property.druid.processing.numThreads=2
druid.indexer.fork.property.druid.segmentCache.locations=[{"path": "/home/druid/processing", "maxSize": 0}]
druid.indexer.fork.property.druid.server.http.numThreads=45

druid.processing.buffer.sizeBytes=100000000
druid.processing.numMergeBuffers=2
druid.processing.numThreads=3
druid.processing.tmpDir=/home/druid/processing

# Hadoop indexing
druid.indexer.task.hadoopWorkingPath=/home/druid/hadoop-tmp
druid.indexer.task.defaultHadoopCoordinates=["org.apache.hadoop:hadoop-client:2.7.3"]

Query Node

Broker
 
jvm.config
-server
-Xmx20g
-Xms20g
-XX:NewSize=6g
-XX:MaxNewSize=6g
-XX:MaxDirectMemorySize=30g
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/home/druid/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager


 
runtime.properties 
druid.service=druid/broker
druid.host=xxxxxx
druid.port=8082

# HTTP server threads
druid.broker.http.numConnections=20
druid.server.http.numThreads=45

# Processing threads and buffers
druid.processing.buffer.sizeBytes=1073741824
druid.processing.numMergeBuffers=11
druid.processing.numThreads=15
druid.processing.tmpDir=/home/druid/processing

# Query cache (enabled on the broker)
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
druid.cache.type=memcached
druid.cache.hosts=node1:11211,node3:11211
druid.cache.memcachedPrefix=druid
druid.cache.numConnections=12

druid.broker.select.tier=highestPriority


Cluster

[metric monitor screenshots omitted]

The metric monitor shows that query/wait time on the historical node is very high.



GunWoo Kim

Jun 30, 2018, 1:51:21 PM
to Druid User
Hi, zoucaitou

How many historical nodes do you use?

I checked your historical node JVM config:
-Xmx12g
-Xms12g
-XX:NewSize=6g
-XX:MaxNewSize=6g
-XX:MaxDirectMemorySize=30g


Your server has 64 GB of memory, and with this historical JVM config, the memory available for segment loading (OS page cache) is under 20 GB.

If 20 GB is not enough to serve the segments on the historical node, segments will be paged in and out, which can affect query processing.

Check your total segment size per historical node against the memory available for those segments.
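To make the arithmetic concrete, here is a quick sketch using the numbers quoted in this thread (my own estimate; the "everything else fits in direct memory" simplification is an assumption):

```python
# Back-of-the-envelope memory budget for one 64 GB historical server,
# using the JVM settings quoted above.
total_ram_gb = 64
heap_gb = 12              # -Xmx12g / -Xms12g
direct_gb = 30            # -XX:MaxDirectMemorySize=30g

# What remains for the OS page cache, which Druid relies on
# to keep memory-mapped segments hot:
page_cache_gb = total_ram_gb - heap_gb - direct_gb
print(page_cache_gb)      # 22

# The segment cache is allowed to hold up to 300 GB of segments
# (druid.server.maxSize), so at best only a small fraction can stay cached:
segment_cache_gb = 300
print(segment_cache_gb / page_cache_gb)   # ~13.6x more segments than page cache
```

If the node actually holds anywhere near 300 GB of segments, queries will constantly fault segments in from a non-SSD disk, which would explain the high query/wait times.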

good luck :)

zhangxin...@gmail.com

Jul 14, 2018, 10:13:21 AM
to Druid User
I think you can add GC logging configuration to your historical nodes' jvm.config, then watch how the heap behaves in the GC logs while queries are running. My guess is either the young generation is too small, in which case you can increase -XX:NewSize and -XX:MaxNewSize, or the whole heap is too small, which you can address by raising -Xmx and -Xms.
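For reference, the jvm.config above already enables -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps; the missing piece is writing the output to a file. One way to do that on a JDK 8 HotSpot JVM (standard HotSpot flags, not from the original post; the log path is just an example) is to add:

```
-Xloggc:/home/druid/historical-gc.log
-XX:+PrintGCDateStamps
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=5
-XX:GCLogFileSize=20m
```

With rotation enabled the logs stay bounded, so the flags can be left on permanently while you watch young-generation and full-GC behavior under query load.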

On Thursday, March 1, 2018 at 10:53:01 AM UTC+8, zoucaitou wrote: