We are running druid on cent OS 6 with openJDK 8 64bit with around 16GB ram with 2GB swap space for the druid overlord node.
However we keep facing OOM exceptions after which we need to restart the overlord node. Stack trace of the exception is as below.
Is this really OOM error or is this due to max utilization of the thread count on the java process.
Any help would be appreciated.
druid.host=<%= %>
druid.port=<%= %>
druid.service=druid/overlord
druid.indexer.runner.type=remote
druid.indexer.storage.type=local
druid.indexer.queue.startDelay=PT30S
druid.db.connector.connectURI=<%= %>
druid.db.connector.user=<%= %>
druid.db.connector.password=<%= %>
#druid.selectors.indexing.serviceName=druid/overlord
druid.indexer.runner.javaOpts="-server -Xmx256m"
druid.indexer.runner.startPort=8088
druid.indexer.fork.property.druid.processing.numThreads=1
druid.indexer.fork.property.druid.computation.buffer.size=100000000
2016-10-17T10:42:24,259 ERROR [Curator-LeaderSelector-0] io.druid.indexing.overlord.TaskMaster - Failed to lead: {class=io.druid.indexing.overlord.TaskMaster, exceptionType=class java.lang.reflect.InvocationTargetException, exceptionMessage=null}
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_79]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_79]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_79]
at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_79]
at com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:350) ~[java-util-0.27.9.jar:?]
at com.metamx.common.lifecycle.Lifecycle.start(Lifecycle.java:259) ~[java-util-0.27.9.jar:?]
at io.druid.indexing.overlord.TaskMaster$1.takeLeadership(TaskMaster.java:141) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
at org.apache.curator.framework.recipes.leader.LeaderSelector$WrappedListener.takeLeadership(LeaderSelector.java:534) [curator-recipes-2.10.0.jar:?]
at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:399) [curator-recipes-2.10.0.jar:?]
at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:441) [curator-recipes-2.10.0.jar:?]
at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:64) [curator-recipes-2.10.0.jar:?]
at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:245) [curator-recipes-2.10.0.jar:?]
at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:239) [curator-recipes-2.10.0.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_79]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_79]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_79]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_79]
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method) ~[?:1.7.0_79]
at java.lang.Thread.start(Thread.java:714) ~[?:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949) ~[?:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1360) ~[?:1.7.0_79]
at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:628) ~[?:1.7.0_79]
at org.apache.curator.utils.CloseableExecutorService.submit(CloseableExecutorService.java:191) ~[curator-client-2.10.0.jar:?]
at org.apache.curator.framework.recipes.cache.PathChildrenCache.submitToExecutor(PathChildrenCache.java:812) ~[curator-recipes-2.10.0.jar:?]
at org.apache.curator.framework.recipes.cache.PathChildrenCache.offerOperation(PathChildrenCache.java:763) ~[curator-recipes-2.10.0.jar:?]
at org.apache.curator.framework.recipes.cache.PathChildrenCache.start(PathChildrenCache.java:310) ~[curator-recipes-2.10.0.jar:?]
at io.druid.indexing.overlord.RemoteTaskRunner.start(RemoteTaskRunner.java:304) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
... 19 more