Druid Historical node potential memory leak

ita...@oribi.io

Jun 15, 2017, 10:15:30 AM
to Druid User
Hi,

Running a single Historical node on an m4.4xlarge machine (16 cores, 64 GB RAM). After running for 1-2 days, the node hit OOM errors. The GC log details suggested the young generation size could be the problem, so I raised it to 6 GB, as in the Druid production cluster configuration.

After a few more days, I'm seeing lots of GCs again. Ran jstat and got the following (even after triggering a manual GC with jcmd <pid> GC.run):

  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00   0.00   5.73  99.99  97.67  94.52   1495   48.873   103  100.036  148.908

The old generation is at 99.99% of its capacity. Only restarting the server clears it, which is not something we can keep doing every 1-2 days.
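
For reference, that's jstat's -gcutil view; the commands were along the lines of (with <pid> being the Historical process id):

  jcmd <pid> GC.run      # request a full GC
  jstat -gcutil <pid>    # S0/S1/E/O/M/CCS as % of capacity, plus GC counts and times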

This is the current jvm.config file:

-server
-Xms16g
-Xmx16g
-XX:NewSize=6g
-XX:MaxNewSize=6g
-XX:MaxDirectMemorySize=12g
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

This is the runtime.properties file (the rest of the values are defaults):
druid.service=druid/historical
druid.port=8083

# HTTP server threads
druid.server.http.numThreads=20

# Processing threads and buffers
druid.processing.buffer.sizeBytes=536870912

# Segment storage
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize"\:130000000000}]
druid.server.maxSize=130000000000

# Query cache
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=local
druid.cache.sizeInBytes=2000000000

druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor", "io.druid.server.metrics.HistoricalMetricsMonitor"]

These are the extensions being used:
druid.extensions.loadList=["druid-s3-extensions", "mysql-metadata-storage", "druid-histogram", "druid-datasketches", "druid-kafka-indexing-service", "druid-lookups-cached-global", "graphite-emitter" ]

I'd appreciate any insight into where to look for the cause. Perhaps one of the extensions? We're currently running Imply 2.0.0 (Druid 0.9.2). The total size of the segments in deep storage is ~8 GB. Free memory on the machine when the GCs happen is around 20 GB, and I've had to keep raising the heap size.

Thanks in advance

Itamar

Gian Merlino

Jun 15, 2017, 4:33:49 PM
to druid...@googlegroups.com
Can you analyze a heap dump and see what's taking up all the space? My first guess would be something related to one of the extensions, like druid-lookups-cached-global or graphite-emitter.
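
If it helps, something along these lines should capture a dump of live objects from the running process (the pid and file path are just placeholders), which you can then open in a tool like Eclipse MAT:

  jmap -dump:live,format=b,file=/tmp/historical.hprof <pid>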

Gian


ita...@oribi.io

Jun 17, 2017, 3:49:15 PM
to Druid User
Hi Gian,

Will do. That was my thinking as well (these issues are new). I will add the -XX:+HeapDumpOnOutOfMemoryError flag and wait for it to recur. Will keep this post updated. Thanks!
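
Concretely, I'm planning to add lines like these to jvm.config (the dump path is just an example):

  -XX:+HeapDumpOnOutOfMemoryError
  -XX:HeapDumpPath=var/tmp/historical-heap.hprof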

Itamar


Arpan Khagram

Oct 24, 2017, 9:59:44 AM
to Druid User
Hi Itamar, did you ever find out what exactly the issue was? I am having a similar problem and have been raising the heap size every few days now. I have 9 GB of data on the Historical node, and the heap it is currently using is 19 GB :(

Regards,
Arpan Khagram