Hi,
I'm running a single Historical node on an m4.4xlarge machine (16 cores, 64 GB RAM). After 1-2 days of operation it started throwing OOM errors. The -XX:+PrintGCDetails output suggested the young generation size could be the problem, so I raised it to 6 GB, following the Druid production cluster configuration.
A few days later, I am seeing lots of GCs again. I ran jstat and got the following (even after triggering a manual GC with jcmd <pid> GC.run):
S0    S1    E     O      M      CCS    YGC   YGCT    FGC  FGCT     GCT
0.00  0.00  5.73  99.99  97.67  94.52  1495  48.873  103  100.036  148.908
The old generation is 99.99% full. Only restarting the server resolves this, which is not feasible every 1-2 days.
This is the current jvm.config file:
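To make the jstat row easier to read, here is a minimal sketch that pairs each column with its value and derives a few ratios from it. The function names and the 90% threshold are hypothetical choices for illustration, not JVM or Druid settings.

```python
# Header and sample row copied from the jstat output above.
HEADER = "S0 S1 E O M CCS YGC YGCT FGC FGCT GCT"
ROW = "0.00 0.00 5.73 99.99 97.67 94.52 1495 48.873 103 100.036 148.908"


def parse_jstat(header: str, row: str) -> dict:
    """Pair each column name with its numeric value."""
    names = header.split()
    values = [float(v) for v in row.split()]
    if len(names) != len(values):
        raise ValueError("header and row have different column counts")
    return dict(zip(names, values))


def summarize(stats: dict, old_threshold: float = 90.0) -> dict:
    """Derive a few numbers that make the row easier to interpret.

    old_threshold is a hypothetical alert level chosen for this sketch.
    """
    return {
        "old_gen_pct": stats["O"],
        "old_gen_nearly_full": stats["O"] >= old_threshold,
        # Average time per collection event, guarding against zero counts.
        "avg_young_gc_s": stats["YGCT"] / stats["YGC"] if stats["YGC"] else 0.0,
        "avg_full_gc_s": stats["FGCT"] / stats["FGC"] if stats["FGC"] else 0.0,
        # Fraction of total GC time reported under FGCT.
        "full_gc_share_of_gc_time": stats["FGCT"] / stats["GCT"] if stats["GCT"] else 0.0,
    }


if __name__ == "__main__":
    for key, value in summarize(parse_jstat(HEADER, ROW)).items():
        print(f"{key}: {value}")
```

For this row, FGCT makes up roughly two thirds of GCT, which matches the picture of an old generation that stays full even after a manual GC.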
-server
-Xms16g
-Xmx16g
-XX:NewSize=6g
-XX:MaxNewSize=6g
-XX:MaxDirectMemorySize=12g
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
This is the runtime.properties file (all other values are left at their defaults):
druid.service=druid/historical
druid.port=8083
# HTTP server threads
druid.server.http.numThreads=20
# Processing threads and buffers
druid.processing.buffer.sizeBytes=536870912
# Segment storage
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize"\:130000000000}]
druid.server.maxSize=130000000000
# Query cache
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=local
druid.cache.sizeInBytes=2000000000
druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor", "io.druid.server.metrics.HistoricalMetricsMonitor"]
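As a rough sanity check on these settings, the sketch below works out how the heap and direct memory are divided. It rests on two assumptions that are not confirmed anywhere in this post: that the local query cache lives on the heap, and that druid.processing.numThreads takes a value of 15 (cores minus one) with one processing buffer per thread plus one. Treat the numbers as an estimate only.

```python
GIB = 1024 ** 3

# Values copied from jvm.config and runtime.properties above.
HEAP_BYTES = 16 * GIB          # -Xmx16g
YOUNG_BYTES = 6 * GIB          # -XX:MaxNewSize=6g
MAX_DIRECT_BYTES = 12 * GIB    # -XX:MaxDirectMemorySize=12g
CACHE_BYTES = 2_000_000_000    # druid.cache.sizeInBytes
BUFFER_BYTES = 536_870_912     # druid.processing.buffer.sizeBytes

# Assumption: druid.processing.numThreads is not set in the post; 15 is
# "cores - 1" on a 16-core machine and should be checked on the real node.
ASSUMED_PROCESSING_THREADS = 15


def budget(threads: int = ASSUMED_PROCESSING_THREADS) -> dict:
    """Estimate how the configured memory is split, in GiB and ratios."""
    old_bytes = HEAP_BYTES - YOUNG_BYTES
    return {
        "old_gen_gib": old_bytes / GIB,
        # Assumption: the local cache is held on the heap and its entries
        # live long enough to end up in the old generation.
        "cache_share_of_old_gen": CACHE_BYTES / old_bytes,
        # Assumption (rule of thumb): one buffer per thread, plus one.
        "processing_buffers_gib": BUFFER_BYTES * (threads + 1) / GIB,
        "direct_limit_gib": MAX_DIRECT_BYTES / GIB,
        "heap_plus_direct_gib": (HEAP_BYTES + MAX_DIRECT_BYTES) / GIB,
    }


if __name__ == "__main__":
    for key, value in budget().items():
        print(f"{key}: {value:.2f}")
```

Under these assumptions the old generation is 10 GiB, and a full 2 GB query cache would account for a little under a fifth of it, so the cache alone would not explain an old generation at 99.99%.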
These are the extensions being used:
druid.extensions.loadList=["druid-s3-extensions", "mysql-metadata-storage", "druid-histogram", "druid-datasketches", "druid-kafka-indexing-service", "druid-lookups-cached-global", "graphite-emitter" ]
I'd appreciate any insight on where to look for the cause. Could it be one of the extensions? I'm currently running Imply 2.0.0 (Druid 0.9.2). The segments in deep storage total about 8 GB. Free memory on the machine while the GCs are happening is around 20 GB. I have been raising the heap size repeatedly.
Thanks in advance
Itamar