Facing OOM Exception in druid overlord


Roshan Pv

Apr 5, 2017, 6:39:11 AM
to Druid User
Hi,
We are running Druid on CentOS 6 with 64-bit OpenJDK 8. The Druid overlord node has around 16 GB of RAM and 2 GB of swap space.
However, we keep hitting OOM exceptions, after which we need to restart the overlord node. The stack trace of the exception is below.
Is this really an out-of-memory error, or is it caused by the Java process reaching its maximum thread count?
Any help would be appreciated.


runtime.properties for the Druid overlord (<%= %> denotes embedded Ruby template placeholders):

druid.host=<%= %>
druid.port=<%= %>
druid.service=druid/overlord





druid.indexer.runner.type=remote
druid.indexer.storage.type=local
druid.indexer.queue.startDelay=PT30S



druid.db.connector.connectURI=<%=  %>
druid.db.connector.user=<%=  %>
druid.db.connector.password=<%=  %>

#druid.selectors.indexing.serviceName=druid/overlord
druid.indexer.runner.javaOpts="-server -Xmx256m"
druid.indexer.runner.startPort=8088
druid.indexer.fork.property.druid.processing.numThreads=1
druid.indexer.fork.property.druid.computation.buffer.size=100000000


Java stack trace for OOM Exception:
2016-10-17T10:42:24,259 ERROR [Curator-LeaderSelector-0] io.druid.indexing.overlord.TaskMaster - Failed to lead: {class=io.druid.indexing.overlord.TaskMaster, exceptionType=class java.lang.reflect.InvocationTargetException, exceptionMessage=null}
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_79]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_79]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_79]
        at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_79]
        at com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:350) ~[java-util-0.27.9.jar:?]
        at com.metamx.common.lifecycle.Lifecycle.start(Lifecycle.java:259) ~[java-util-0.27.9.jar:?]
        at io.druid.indexing.overlord.TaskMaster$1.takeLeadership(TaskMaster.java:141) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
        at org.apache.curator.framework.recipes.leader.LeaderSelector$WrappedListener.takeLeadership(LeaderSelector.java:534) [curator-recipes-2.10.0.jar:?]
        at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:399) [curator-recipes-2.10.0.jar:?]
        at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:441) [curator-recipes-2.10.0.jar:?]
        at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:64) [curator-recipes-2.10.0.jar:?]
        at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:245) [curator-recipes-2.10.0.jar:?]
        at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:239) [curator-recipes-2.10.0.jar:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_79]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_79]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_79]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_79]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_79]
        at java.lang.Thread.run(Thread.java:745) [?:1.7.0_79]
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method) ~[?:1.7.0_79]
        at java.lang.Thread.start(Thread.java:714) ~[?:1.7.0_79]
        at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949) ~[?:1.7.0_79]
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1360) ~[?:1.7.0_79]
        at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:628) ~[?:1.7.0_79]
        at org.apache.curator.utils.CloseableExecutorService.submit(CloseableExecutorService.java:191) ~[curator-client-2.10.0.jar:?]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache.submitToExecutor(PathChildrenCache.java:812) ~[curator-recipes-2.10.0.jar:?]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache.offerOperation(PathChildrenCache.java:763) ~[curator-recipes-2.10.0.jar:?]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache.start(PathChildrenCache.java:310) ~[curator-recipes-2.10.0.jar:?]
        at io.druid.indexing.overlord.RemoteTaskRunner.start(RemoteTaskRunner.java:304) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
        ... 19 more

Nishant Bangarwa

Apr 10, 2017, 11:57:15 AM
to Druid User
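You may be hitting the ulimit on "max user processes" (ulimit -u). "unable to create new native thread" usually means the OS refused to create another thread, not that the Java heap is full. Can you try raising that ulimit?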
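
One way to check this on a Linux host is to compare the overlord's current thread count with the "Max processes" limit that applies to it. The script below is a minimal sketch, not part of Druid: the function names thread_count and max_processes are made up for illustration, and it assumes /proc is available and that you pass the overlord's PID as the first argument (it falls back to its own PID otherwise). Note that on Linux the limit counts all threads owned by the user across every process, so a single process's count is only a lower bound on the usage.

```python
import os
import sys


def thread_count(pid):
    """Return the number of threads of a process, read from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    raise RuntimeError(f"no 'Threads:' line in /proc/{pid}/status")


def max_processes(pid):
    """Return the (soft, hard) 'Max processes' limits from /proc/<pid>/limits.

    Values are returned as strings because a limit may be 'unlimited'.
    """
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max processes"):
                soft, hard = line[len("Max processes"):].split()[:2]
                return soft, hard
    raise RuntimeError(f"no 'Max processes' line in /proc/{pid}/limits")


if __name__ == "__main__":
    # Hypothetical usage: pass the overlord's PID; defaults to this script's PID.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    soft, hard = max_processes(pid)
    print(f"pid {pid}: {thread_count(pid)} threads")
    print(f"max user processes: soft={soft} hard={hard}")
```

If the soft limit is low (1024 is a common default for non-root users on CentOS 6) and the user running Druid owns close to that many threads in total, raising the limit for that user and restarting the overlord is the thing to try.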

