Hi All,
Our Druid query performance seems to be OK under light load but degrades fairly quickly as load increases. I initially thought it was an issue with specific queries that filter on certain dimensions (see here). But I also wanted to take a higher-level look at my configs and get some feedback on whether there are areas that can be optimized.
Here is our setup:
3 broker nodes. Each node has:
- CPU: 8 cores
- Memory: 61 GB
- Storage: 160 GB
broker node config:
druid.cache.type=local
druid.cache.sizeInBytes=104857600
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
druid.broker.cache.unCacheable=[]
druid.processing.numThreads=7
druid.processing.buffer.sizeBytes=2147483647
druid.broker.http.numConnections=15
druid.broker.http.readTimeout=PT5M
druid.server.http.numThreads=30
broker node mem config:
-Xmx24g
-Xms24g
-XX:NewSize=8g
-XX:MaxNewSize=8g
-XX:MaxDirectMemorySize=16g
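One sanity check I did while writing this up: if the direct-memory sizing rule of roughly druid.processing.buffer.sizeBytes * (druid.processing.numThreads + 1) applies to the broker, we need about 2147483647 * (7 + 1) bytes, which is just barely under 16 GiB, so we are right at the -XX:MaxDirectMemorySize=16g cap. I may be misreading that rule, so corrections welcome.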
2 historical nodes. Each node has:
- CPU: 8 cores
- Memory: 61 GB
- Storage: 1.6 TB SSD
historical node config:
druid.cache.type=local
druid.cache.sizeInBytes=104857600
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.historical.cache.unCacheable=[]
druid.server.maxSize=1503238553600
druid.segmentCache.locations=[{"path":"/data/b/druid/segmentCache","maxSize": 751619276800},{"path":"/data/c/druid/segmentCache","maxSize": 751619276800}]
druid.processing.numThreads=7
druid.processing.buffer.sizeBytes=1073741824
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
historical node mem config:
-Xmx2g
-Xms2g
-XX:MaxDirectMemorySize=8g
-XX:NewSize=1g
-XX:MaxNewSize=1g
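Same arithmetic on the historicals, assuming the same sizing rule applies: 1073741824 * (7 + 1) bytes is exactly 8 GiB, which exactly matches -XX:MaxDirectMemorySize=8g, so there is no headroom there either.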
Segment Info:
-Our datasource schema has 22 dimensions and 7 metrics.
-Queries use only longSum metric aggregators (a representative query is sketched after this list).
-The coordinator reports the datasource at 348 GB in cold storage (I assume this is what's stored on the historicals' local disks?).
-No replication on the historical nodes
-Segment granularity is DAY
-The datasource has a total of 517 shards across 129 intervals.
-We have a skew in our segment sizes: each interval had previously been around 800 MB across two shards, but a gradual increase in data volume over the last month now yields around 4 GB across 10 shards per interval.
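For reference, a representative query looks roughly like the sketch below. The datasource, aggregator, and field names are placeholders rather than our real schema:

{
  "queryType": "timeseries",
  "dataSource": "our_datasource",
  "granularity": "day",
  "intervals": ["2015-01-01/2015-02-01"],
  "aggregations": [
    { "type": "longSum", "name": "event_count", "fieldName": "count" }
  ]
}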
The degradation in query performance seems to increase when the queries contain more dimension filters, or more values to filter on per dimension (a representative filtered query is sketched after the timings below). The degradation is also gradual, slowly affecting a larger percentage of the queries being made, and when a query is degraded the slowdown is very pronounced:
Timeseries queries (normal): 1-10 ms
TopN queries (normal): 100-200 ms
Degraded queries (both timeseries and topN): up to 30 seconds
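To make the filter shape concrete, a degraded query looks roughly like the following (again with placeholder dimension names and values): an AND over several dimensions, with an OR over multiple values for a single dimension:

{
  "queryType": "topN",
  "dataSource": "our_datasource",
  "granularity": "all",
  "intervals": ["2015-01-01/2015-02-01"],
  "dimension": "dim_a",
  "metric": "event_count",
  "threshold": 10,
  "filter": {
    "type": "and",
    "fields": [
      { "type": "selector", "dimension": "dim_b", "value": "x" },
      { "type": "or",
        "fields": [
          { "type": "selector", "dimension": "dim_c", "value": "1" },
          { "type": "selector", "dimension": "dim_c", "value": "2" }
        ]
      }
    ]
  },
  "aggregations": [
    { "type": "longSum", "name": "event_count", "fieldName": "count" }
  ]
}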
I am already working on optimizing the client calls so that they don't put as much concurrent load on the broker, but are there other optimizations that can be made to our configuration or setup? Please let me know if any additional information would be helpful.
Thank you!
-James