Hi,
we are running nightly reindex tasks to reduce the large number of shards/segments produced by the Kafka indexing service. For that task we
use the "ingestSegment" firehose, so we read segments from Druid and write them back (merged) to the same datasource.
We did an update yesterday (0.10.0 => 0.11.0) and realized that the jobs are much faster now. (from hours down to minutes =))
I noticed a problem with one datasource: java.lang.OutOfMemoryError: Java heap space
My workaround was to reduce maxRowsInMemory from 100,000 to 50,000, and now everything works fine.
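For reference, our reindex task spec looks roughly like the sketch below (datasource name and interval are placeholders, and the dataSchema section is omitted for brevity; field names follow the Druid native index task docs):

```json
{
  "type": "index",
  "spec": {
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "ingestSegment",
        "dataSource": "my_datasource",
        "interval": "2018-03-03/2018-03-04"
      }
    },
    "tuningConfig": {
      "type": "index",
      "maxRowsInMemory": 50000
    }
  }
}
```

With maxRowsInMemory at 100,000 the task hit the OOM during the incremental persist; halving it was enough to keep the task within its heap.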
StackTrace:
2018-03-07T08:16:39,839 INFO [xxx_xxx_xxx_supervisor-incremental-persist] io.druid.segment.IndexMergerV9 - Starting persist for interval[2018-03-03T00:00:00.000Z/2018-03-04T00:00:00.000Z], rows[100,000]
2018-03-07T08:15:40,824 ERROR [main-EventThread] org.apache.zookeeper.ClientCnxn - Caught unexpected throwable
java.lang.OutOfMemoryError: Java heap space
at java.util.zip.InflaterInputStream.<init>(InflaterInputStream.java:88) ~[?:1.8.0_151]
at java.util.zip.ZipFile$ZipFileInflaterInputStream.<init>(ZipFile.java:408) ~[?:1.8.0_151]
at java.util.zip.ZipFile.getInputStream(ZipFile.java:389) ~[?:1.8.0_151]
at java.util.jar.JarFile.getManifestFromReference(JarFile.java:199) ~[?:1.8.0_151]
at java.util.jar.JarFile.getManifest(JarFile.java:180) ~[?:1.8.0_151]
at sun.misc.URLClassPath$JarLoader$2.getManifest(URLClassPath.java:981) ~[?:1.8.0_151]
at java.net.URLClassLoader.defineClass(URLClassLoader.java:450) ~[?:1.8.0_151]
at java.net.URLClassLoader.access$100(URLClassLoader.java:73) ~[?:1.8.0_151]
at java.net.URLClassLoader$1.run(URLClassLoader.java:368) ~[?:1.8.0_151]
at java.net.URLClassLoader$1.run(URLClassLoader.java:362) ~[?:1.8.0_151]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_151]
at java.net.URLClassLoader.findClass(URLClassLoader.java:361) ~[?:1.8.0_151]
at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_151]
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335) ~[?:1.8.0_151]
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_151]
at org.apache.logging.log4j.core.impl.Log4jLogEvent.getThrownProxy(Log4jLogEvent.java:482) ~[log4j-core-2.5.jar:2.5]
at org.apache.logging.log4j.core.pattern.ExtendedThrowablePatternConverter.format(ExtendedThrowablePatternConverter.java:64) ~[log4j-core-2.5.jar:2.5]
at org.apache.logging.log4j.core.pattern.PatternFormatter.format(PatternFormatter.java:36) ~[log4j-core-2.5.jar:2.5]
at org.apache.logging.log4j.core.layout.PatternLayout$PatternSerializer.toSerializable(PatternLayout.java:292) ~[log4j-core-2.5.jar:2.5]
at org.apache.logging.log4j.core.layout.PatternLayout.toSerializable(PatternLayout.java:206) ~[log4j-core-2.5.jar:2.5]
at org.apache.logging.log4j.core.layout.PatternLayout.toSerializable(PatternLayout.java:56) ~[log4j-core-2.5.jar:2.5]
at org.apache.logging.log4j.core.layout.AbstractStringLayout.toByteArray(AbstractStringLayout.java:148) ~[log4j-core-2.5.jar:2.5]
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:112) ~[log4j-core-2.5.jar:2.5]
at org.apache.logging.log4j.core.appender.RollingFileAppender.append(RollingFileAppender.java:88) ~[log4j-core-2.5.jar:2.5]
at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:152) ~[log4j-core-2.5.jar:2.5]
at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:125) ~[log4j-core-2.5.jar:2.5]
at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:116) ~[log4j-core-2.5.jar:2.5]
at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84) ~[log4j-core-2.5.jar:2.5]
at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:390) ~[log4j-core-2.5.jar:2.5]
at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:378) ~[log4j-core-2.5.jar:2.5]
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:362) ~[log4j-core-2.5.jar:2.5]
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:352) ~[log4j-core-2.5.jar:2.5]
My questions are:
-) Is this a problem with the Druid memory settings? (Judging by the stack trace, it seems to come from ZooKeeper.)
-) If it is a Druid problem, where do we have to tweak the memory settings? (It's not clear to me where the heap setting for running tasks comes from.)
Thanks, Alex