Historical OOM


guojing.feng

Oct 11, 2016, 7:36:40 AM
to Druid User
Hi team,
My Historical nodes always hit an OOM after running for several days.
The log is:
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_80]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_80]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_80]
at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_80]
at com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:350) ~[java-util-0.27.9.jar:?]
at com.metamx.common.lifecycle.Lifecycle.start(Lifecycle.java:259) ~[java-util-0.27.9.jar:?]
at io.druid.guice.LifecycleModule$2.start(LifecycleModule.java:155) ~[druid-api-0.9.1.1.jar:0.9.1.1]
at io.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:91) [druid-services-0.9.1.1.jar:0.9.1.1]
at io.druid.cli.ServerRunnable.run(ServerRunnable.java:40) [druid-services-0.9.1.1.jar:0.9.1.1]
at io.druid.cli.Main.main(Main.java:105) [druid-services-0.9.1.1.jar:0.9.1.1]
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method) ~[?:1.7.0_80]
at java.lang.Thread.start(Thread.java:714) ~[?:1.7.0_80]
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949) ~[?:1.7.0_80]
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1360) ~[?:1.7.0_80]
at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:628) ~[?:1.7.0_80]
at io.druid.curator.ShutdownNowIgnoringExecutorService.execute(ShutdownNowIgnoringExecutorService.java:132) ~[druid-server-0.9.1.1.jar:0.9.1.1]
at org.apache.curator.utils.CloseableExecutorService.submit(CloseableExecutorService.java:191) ~[curator-client-2.10.0.jar:?]
at org.apache.curator.framework.recipes.cache.PathChildrenCache.submitToExecutor(PathChildrenCache.java:812) ~[curator-recipes-2.10.0.jar:?]
at org.apache.curator.framework.recipes.cache.PathChildrenCache.offerOperation(PathChildrenCache.java:763) ~[curator-recipes-2.10.0.jar:?]
at org.apache.curator.framework.recipes.cache.PathChildrenCache.start(PathChildrenCache.java:297) ~[curator-recipes-2.10.0.jar:?]
at org.apache.curator.framework.recipes.cache.PathChildrenCache.start(PathChildrenCache.java:236) ~[curator-recipes-2.10.0.jar:?]
at io.druid.curator.announcement.Announcer.startCache(Announcer.java:373) ~[druid-server-0.9.1.1.jar:0.9.1.1]
at io.druid.curator.announcement.Announcer.announce(Announcer.java:259) ~[druid-server-0.9.1.1.jar:0.9.1.1]
at io.druid.curator.announcement.Announcer.announce(Announcer.java:152) ~[druid-server-0.9.1.1.jar:0.9.1.1]
at io.druid.server.coordination.BatchDataSegmentAnnouncer.announceSegments(BatchDataSegmentAnnouncer.java:195) ~[druid-server-0.9.1.1.jar:0.9.1.1]
at io.druid.server.coordination.ZkCoordinator$BackgroundSegmentAnnouncer.finishAnnouncing(ZkCoordinator.java:586) ~[druid-server-0.9.1.1.jar:0.9.1.1]
at io.druid.server.coordination.ZkCoordinator.addSegments(ZkCoordinator.java:434) ~[druid-server-0.9.1.1.jar:0.9.1.1]
at io.druid.server.coordination.ZkCoordinator.loadLocalCache(ZkCoordinator.java:277) ~[druid-server-0.9.1.1.jar:0.9.1.1]
at io.druid.server.coordination.ZkCoordinator.start(ZkCoordinator.java:133) ~[druid-server-0.9.1.1.jar:0.9.1.1]
... 10 more
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007fc2b2878000, 12288, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 12288 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/hadoop/druid/druid-0.9.1.1/hs_err_pid99636.log

My Historical config is:
jvm.config:
-server
-Xmx4g
-Xms4g
-XX:NewSize=1g
-XX:MaxNewSize=1g
-XX:PermSize=512M
-XX:MaxPermSize=512m
-XX:+UseConcMarkSweepGC
-XX:MaxDirectMemorySize=10g
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Duser.timezone=GMT+8
-Dfile.encoding=UTF-8
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
-Djava.io.tmpdir=var/tmp

runtime.properties:
# HTTP server threads
druid.server.http.numThreads=25

# Processing threads and buffers
druid.processing.buffer.sizeBytes=1073741824
druid.processing.numThreads=7

# Segment storage
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize"\:30000000000}]
druid.server.maxSize=30000000000

And free -g output:
              total        used        free      shared  buff/cache   available
Mem:             62           5           7           0          49          56
Swap:            14           0          14



David Lim

unread,
Oct 11, 2016, 3:41:22 PM10/11/16
to Druid User
Note that the exception says: Caused by: java.lang.OutOfMemoryError: unable to create new native thread. This isn't a memory issue but a thread allocation issue, and it's likely that you've hit a user/system limit somewhere. The following link has some suggestions for things you can look at: http://www.mastertheboss.com/jboss-server/jboss-monitoring/how-to-solve-javalangoutofmemoryerror-unable-to-create-new-native-thread
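One way to check whether a process is actually near a thread limit is to read its live thread count from procfs. The following is a minimal sketch, assuming a Linux host; the helper names `parse_thread_count` and `thread_count` are hypothetical and not part of Druid.

```python
import re
from pathlib import Path


def parse_thread_count(status_text: str) -> int:
    """Return the 'Threads:' value from the text of /proc/<pid>/status."""
    match = re.search(r"^Threads:\s+(\d+)", status_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no 'Threads:' line found in status text")
    return int(match.group(1))


def thread_count(pid: int) -> int:
    """Read the current number of threads of a running process (Linux only)."""
    return parse_thread_count(Path(f"/proc/{pid}/status").read_text())
```

Calling `thread_count(pid)` with the PID of the running Historical process gives a number that can be compared against the `max user processes` limit. If the count is far below the limit, the thread error probably has another cause.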

guojing.feng

Oct 11, 2016, 10:03:06 PM
to Druid User
Hi David,
Thank you for the response.
I have checked my system limits, and they are large enough:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 256534
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 655350
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 655350
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
I also checked the Historical process; it has about 70 threads.
There are some other errors in hs_err_pid**.log, like this:
Internal exceptions (10 events):
Event: 90.894 Thread 0x0000000005a0a000 Exception <a 'java/lang/OutOfMemoryError': Map failed> (0x00000005c0000040) thrown at [/HUDSON/workspace/8-2-build-linux-amd64/jdk8u92/6642/hotspot/src/share/vm/prims/jni.cpp, line 735]
Event: 90.896 Thread 0x0000000005a0a000 Exception <a 'java/lang/OutOfMemoryError': unable to create new native thread> (0x00000005c000b038) thrown at [/HUDSON/workspace/8-2-build-linux-amd64/jdk8u92/6642/hotspot/src/share/vm/prims/jvm.cpp, line 3020]
Event: 90.897 Thread 0x0000000005a0a000 Exception <a 'java/lang/OutOfMemoryError': Map failed> (0x00000005c0019a88) thrown at [/HUDSON/workspace/8-2-build-linux-amd64/jdk8u92/6642/hotspot/src/share/vm/prims/jni.cpp, line 735]
Event: 93.724 Thread 0x0000000005a0a000 Exception <a 'java/lang/OutOfMemoryError': Map failed> (0x00000005c0000040) thrown at [/HUDSON/workspace/8-2-build-linux-amd64/jdk8u92/6642/hotspot/src/share/vm/prims/jni.cpp, line 735]
Event: 93.738 Thread 0x0000000005a0a000 Exception <a 'java/lang/OutOfMemoryError': unable to create new native thread> (0x00000005c02b6808) thrown at [/HUDSON/workspace/8-2-build-linux-amd64/jdk8u92/6642/hotspot/src/share/vm/prims/jvm.cpp, line 3020]
Event: 93.739 Thread 0x0000000005a0a000 Exception <a 'java/lang/OutOfMemoryError': Map failed> (0x00000005c02c5178) thrown at [/HUDSON/workspace/8-2-build-linux-amd64/jdk8u92/6642/hotspot/src/share/vm/prims/jni.cpp, line 735]
Event: 96.640 Thread 0x0000000005a0a000 Exception <a 'java/lang/OutOfMemoryError': Map failed> (0x00000005c0000040) thrown at [/HUDSON/workspace/8-2-build-linux-amd64/jdk8u92/6642/hotspot/src/share/vm/prims/jni.cpp, line 735]
Event: 96.642 Thread 0x0000000005a0a000 Exception <a 'java/lang/OutOfMemoryError': unable to create new native thread> (0x00000005c000b1c8) thrown at [/HUDSON/workspace/8-2-build-linux-amd64/jdk8u92/6642/hotspot/src/share/vm/prims/jvm.cpp, line 3020]
Event: 96.642 Thread 0x0000000005a0a000 Exception <a 'java/lang/OutOfMemoryError': Map failed> (0x00000005c0019b60) thrown at [/HUDSON/workspace/8-2-build-linux-amd64/jdk8u92/6642/hotspot/src/share/vm/prims/jni.cpp, line 735]
Event: 99.669 Thread 0x0000000005a0a000 Exception <a 'java/lang/OutOfMemoryError': Map failed> (0x00000005c1d05f30) thrown at [/HUDSON/workspace/8-2-build-linux-amd64/jdk8u92/6642/hotspot/src/share/vm/prims/jni.cpp, line 735]




On Wednesday, October 12, 2016 at 3:41:22 AM UTC+8, David Lim wrote:

David Lim

Oct 12, 2016, 1:59:34 PM
to Druid User
Can you try 'cat /proc/{PID}/limits', where {PID} is the PID of the process that's unable to create more threads? Processes can actually be running under more restrictive limits than the system limit, for example if they're invoked by a supervisor or another process that enforces its own limits.
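The per-process check suggested above can also be scripted, which helps when comparing several nodes. This is a rough sketch, assuming the usual Linux layout of `/proc/<pid>/limits` (a header row, then columns separated by runs of spaces); `parse_limits` and `process_limits` are hypothetical helper names.

```python
import re
from pathlib import Path


def parse_limits(limits_text: str) -> dict:
    """Map each limit name to its (soft, hard) values, kept as strings."""
    limits = {}
    for line in limits_text.splitlines()[1:]:  # skip the header row
        # Columns are padded with spaces; names contain only single spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            name, soft, hard = fields[:3]
            limits[name] = (soft, hard)
    return limits


def process_limits(pid: int) -> dict:
    """Read the limits that apply to a running process (Linux only)."""
    return parse_limits(Path(f"/proc/{pid}/limits").read_text())
```

For a thread-creation failure, the entry to look at is `process_limits(pid)["Max processes"]`, since on Linux each thread counts against that limit. Values are left as strings because a limit may be reported as `unlimited`.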

fgjvip

Oct 12, 2016, 7:49:45 PM
to davi...@imply.io, druid...@googlegroups.com
Great, thank you.
Yesterday I found that /proc/sys/vm/max_map_count is 65530 on my machine, but there are 65xxx segment files in the local cache.
After increasing max_map_count, the error disappeared.
Thanks (〜 ̄▽ ̄)〜
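A likely explanation, offered as an assumption rather than something confirmed in this thread: Historical nodes memory-map their segment files, so each cached segment uses at least one memory mapping. Once a process reaches `vm.max_map_count`, further `mmap` calls fail, including those the JVM makes for new thread stacks. That would explain why "Map failed" and "unable to create new native thread" appear together. A minimal sketch for checking how close a process is to the limit, assuming Linux procfs and using hypothetical helper names:

```python
from pathlib import Path


def count_mappings(maps_text: str) -> int:
    """Count memory mappings: one non-empty line per mapping in /proc/<pid>/maps."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def mapping_headroom(maps_text: str, max_map_count: int) -> int:
    """Return how many more mappings fit before the limit is reached."""
    return max_map_count - count_mappings(maps_text)


def check_process(pid: int) -> tuple:
    """Return (current mappings, vm.max_map_count) for a running process."""
    maps_text = Path(f"/proc/{pid}/maps").read_text()
    limit = int(Path("/proc/sys/vm/max_map_count").read_text())
    return count_mappings(maps_text), limit
```

If the two numbers returned by `check_process(pid)` are close, the limit can be raised at runtime with `sysctl -w vm.max_map_count=<value>` (run as root) and made persistent through `/etc/sysctl.conf`. The right value depends on how many segments the node is expected to hold.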


