Druid Historical node potential memory leak

ita...@oribi.io

Jun 15, 2017, 10:15:30 AM
to Druid User
Hi,

Running a single Historical node on an m4.4xlarge machine (16 cores, 64 GB RAM). After running for 1-2 days, the node hit OOM errors. The GC log details suggested the young generation size could be the problem, so I raised it to 6 GB, as in the Druid production cluster configuration.

After a few more days, I'm seeing lots of GCs again. Ran jstat and got the following (even after triggering a manual GC with jcmd <pid> GC.run):

  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00   0.00   5.73  99.99  97.67  94.52   1495   48.873   103  100.036  148.908

The old generation is at 99.99% of its capacity. Only restarting the server clears it, which is not something we can keep doing every 1-2 days.
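
For reference, that's jstat's -gcutil view; the commands were along the lines of (with <pid> being the Historical process id):

  jcmd <pid> GC.run      # request a full GC
  jstat -gcutil <pid>    # S0/S1/E/O/M/CCS as % of capacity, plus GC counts and times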

This is the current jvm.config file:

-server
-Xms16g
-Xmx16g
-XX:NewSize=6g
-XX:MaxNewSize=6g
-XX:MaxDirectMemorySize=12g
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

This is the runtime.properties file (the rest of the values are defaults):
druid.service=druid/historical
druid.port=8083

# HTTP server threads
druid.server.http.numThreads=20

# Processing threads and buffers
druid.processing.buffer.sizeBytes=536870912

# Segment storage
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize"\:130000000000}]
druid.server.maxSize=130000000000

# Query cache
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=local
druid.cache.sizeInBytes=2000000000

druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor", "io.druid.server.metrics.HistoricalMetricsMonitor"]

These are the extensions being used:
druid.extensions.loadList=["druid-s3-extensions", "mysql-metadata-storage", "druid-histogram", "druid-datasketches", "druid-kafka-indexing-service", "druid-lookups-cached-global", "graphite-emitter" ]

I'd appreciate any insight into where to look for the cause. Perhaps one of the extensions? We're currently running Imply 2.0.0 (Druid 0.9.2). The total size of the segments in deep storage is ~8 GB. Free memory on the machine when the GCs happen is around 20 GB, and I've had to keep raising the heap size.

Thanks in advance

Itamar

Gian Merlino

Jun 15, 2017, 4:33:49 PM
to druid...@googlegroups.com
Can you analyze a heap dump and see what's taking up all the space? My first guess would be something related to one of the extensions, like druid-lookups-cached-global or graphite-emitter.
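
If it helps, something along these lines should capture a dump of live objects from the running process (the pid and file path are just placeholders), which you can then open in a tool like Eclipse MAT:

  jmap -dump:live,format=b,file=/tmp/historical.hprof <pid>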

Gian


ita...@oribi.io

Jun 17, 2017, 3:49:15 PM
to Druid User
Hi Gian,

Will do. That was my thinking as well (these issues are new). I will add the -XX:+HeapDumpOnOutOfMemoryError flag and wait for it to recur. Will keep this post updated. Thanks!
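
Concretely, I'm planning to add lines like these to jvm.config (the dump path is just an example):

  -XX:+HeapDumpOnOutOfMemoryError
  -XX:HeapDumpPath=var/tmp/historical-heap.hprof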

Itamar


Arpan Khagram

Oct 24, 2017, 9:59:44 AM
to Druid User
Hi Itamar, did you ever find out what exactly the issue was? I am having a similar problem and have been raising the heap size every few days now. I have 9 GB of data on the Historical node, and the heap it is currently using is 19 GB :(

Regards,
Arpan Khagram