java.lang.OutOfMemoryError with many partitions

Jacob Biesinger

Sep 3, 2013, 8:46:54 PM

to hyrack...@googlegroups.com

Hi!

I'm having some trouble scaling Hyracks to use the many cores available on my hardware (24-64 cores).

My application works well with a low partition count (4). The partition count is set via $IO_DIRS in cluster.properties and $store in stores.properties, both of which are read in by our Hyracks driver.

When I increase the partition count to 21, the NCs all fail with the error message:

Exception in thread "TCPEndpoint IO Thread" java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:691)
        at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:943)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1336)
        at edu.uci.ics.hyracks.control.nc.partitions.MaterializingPipelinedPartition.writeTo(MaterializingPipelinedPartition.java:81)
        at edu.uci.ics.hyracks.control.nc.partitions.PartitionManager.registerPartitionRequest(PartitionManager.java:102)
        at edu.uci.ics.hyracks.control.nc.net.NetworkManager$InitialBufferAcceptor.accept(NetworkManager.java:104)
        at edu.uci.ics.hyracks.net.protocols.muxdemux.ChannelControlBlock$ReadInterface.flush(ChannelControlBlock.java:163)
        at edu.uci.ics.hyracks.net.protocols.muxdemux.ChannelControlBlock$ReadInterface.read(ChannelControlBlock.java:155)
        at edu.uci.ics.hyracks.net.protocols.muxdemux.ChannelControlBlock.read(ChannelControlBlock.java:320)
        at edu.uci.ics.hyracks.net.protocols.muxdemux.MultiplexedConnection.driveReaderStateMachine(MultiplexedConnection.java:403)
        at edu.uci.ics.hyracks.net.protocols.muxdemux.MultiplexedConnection.notifyIOReady(MultiplexedConnection.java:123)
        at edu.uci.ics.hyracks.net.protocols.tcp.TCPEndpoint$IOThread.run(TCPEndpoint.java:171)
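
The trace shows ThreadPoolExecutor.addWorker failing inside MaterializingPipelinedPartition.writeTo, meaning the executor tried to start a brand-new native thread to serve a partition request. "unable to create new native thread" is raised when the operating system refuses to create the thread, not when the Java heap is full. The standalone sketch below (Java 8+, not Hyracks code) illustrates why the thread count can explode if that executor behaves like Executors.newCachedThreadPool(); that is an assumption, since the trace only shows that some ThreadPoolExecutor was asked to add a worker. A cached pool starts one thread for every task still running, so N long-lived writers cost N threads, while a fixed-size pool queues the excess. PoolGrowthDemo and poolSizeWithBlockedTasks are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolGrowthDemo {
    // Submits `tasks` jobs that all block until released, records how many worker
    // threads the pool created to accept them, then releases the jobs and shuts down.
    static int poolSizeWithBlockedTasks(ThreadPoolExecutor pool, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // stands in for a long-lived per-request writer
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int size = pool.getPoolSize();
        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return size;
    }

    public static void main(String[] args) {
        // Cached pool: no worker is ever idle, so every execute() starts a new thread.
        ThreadPoolExecutor cached = (ThreadPoolExecutor) Executors.newCachedThreadPool();
        // Fixed pool: 4 threads; the remaining tasks wait in the queue.
        ThreadPoolExecutor fixed = (ThreadPoolExecutor) Executors.newFixedThreadPool(4);
        System.out.println("cached pool threads: " + poolSizeWithBlockedTasks(cached, 50));
        System.out.println("fixed pool threads:  " + poolSizeWithBlockedTasks(fixed, 50));
    }
}
```

Running main reports 50 threads for the cached pool and 4 for the fixed pool, even though both accepted the same 50 tasks.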

Interestingly, the NC process itself doesn't die; it just runs out of work to do. Other Java processes running under my user name on the machine (e.g., Hadoop TaskTracker heartbeat processes) also start failing with the same error. I can't even run jps until I kill -9 the NCs manually.

The machine has plenty of free memory (128GB RAM), and adjusting the Java heap size (-Xmx) doesn't seem to help (I tried values from 10g all the way up to 100g).

I ran jstack $NC_PID once per second to capture per-thread Java stack dumps, which you can see at http://goo.gl/2bgy9l. Grepping those dumps for edu.uci.ics.hyracks.api.rewriter.runtime.SuperActivity shows my workers start at stack.nc.32 and finish at stack.nc.71. stack.nc.91 shows the problem I mentioned: jstack itself can't start a new JVM because of the same thread-allocation problem…
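
To see how the thread count grows from one per-second dump to the next, a small counter like the one below could be run over the stack.nc.N files. It assumes the usual jstack layout, in which each thread's entry begins with a line starting with the thread's quoted name; JstackCounter and its methods are hypothetical helpers (Java 8+).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

class JstackCounter {
    // In jstack output every thread entry starts with a line beginning with its quoted name.
    static long countThreads(List<String> lines) {
        return lines.stream().filter(l -> l.startsWith("\"")).count();
    }

    // Counts lines that mention a given class name (e.g. the frames of running workers).
    static long countFrames(List<String> lines, String className) {
        return lines.stream().filter(l -> l.contains(className)).count();
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical): java JstackCounter stack.nc.32 stack.nc.71 stack.nc.91
        String worker = "edu.uci.ics.hyracks.api.rewriter.runtime.SuperActivity";
        for (String name : args) {
            List<String> lines = Files.readAllLines(Paths.get(name));
            System.out.printf("%s: %d threads, %d SuperActivity frames%n",
                    name, countThreads(lines), countFrames(lines, worker));
        }
    }
}
```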

For reference, here are my ulimits:

[wbiesing@compute-2-10 ~]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1033199
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I'll investigate a bit further… going from 4 partitions to 21 is admittedly a large jump. Next, I will try adjusting my max stack size via ulimit -s and turning down the per-thread stack size with -Xss.

I just wanted to get feedback on the problem: has anyone else encountered something similar? Or had success with even more partitions than I've listed here?

Thanks!

--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine

Jacob Biesinger

Sep 3, 2013, 8:57:27 PM

to hyrack...@googlegroups.com

BTW, in case it wasn't obvious from the memory sizes I mentioned, this is a 64-bit JVM:

[wbiesing@compute-2-10 appassembler]$ java -version
java version "1.7.0_09"
Java(TM) SE Runtime Environment (build 1.7.0_09-b05)
Java HotSpot(TM) 64-Bit Server VM (build 23.5-b02, mixed mode)

--

Jake Biesinger
Graduate Student
Xie Lab, UC Irvine

Jacob Biesinger

Sep 4, 2013, 3:27:41 AM

to hyrack...@googlegroups.com

I've polled /proc/PID while the jobs are running, right up until the crash. When the JVM throws this error, the process isn't actually shut down (as far as I can tell, this is by design). Since the process has eaten all of this “memory” (not RAM) but isn't shut down, bash also starts complaining any time I enter a command:

-bash: fork: retry: Resource temporarily unavailable

AFAICT, the cause of this error is one of the following:

1) My stack size is too small. ulimit -s shows 8MB, and I think this limit applies per process, not per user… so bash shouldn't be affected by a rogue JVM.
2) Too many processes are open. But I've only started 3-4, so scratch this one…
3) The process creates too many threads. /proc/PID/task shows ~600 threads, which is a lot… but I don't see a user-level thread limit, and again, I don't think maxing out the JVM's thread count should affect bash at all (there's still lots of memory available).
4) The cap on open files is being reached. That's a real possibility… lsof -u wbiesing | wc -l reports between 2500 and 3800, which is pretty close to the ulimit -n of 4000. But Google tells me this problem would produce a different bash and JVM error (wouldn't the JVM complain about the number of open files rather than the number of threads?)
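
Items 3 and 4 can be checked for a single process directly from /proc. The sketch below counts the entries in /proc/PID/task (one per thread) and /proc/PID/fd (one per open descriptor) for a PID given on the command line, defaulting to the JVM running it. Note that ulimit -n applies per process, while lsof -u counts every process owned by the user and also lists entries such as memory-mapped files and working directories, so the two numbers are not directly comparable. ProcCounts is a hypothetical name; the code is Linux-only and needs Java 8+.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class ProcCounts {
    // Number of entries in a /proc directory:
    // /proc/PID/task has one entry per thread, /proc/PID/fd one per open descriptor.
    static long countEntries(Path dir) {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass the NC's PID; defaults to this JVM. Listing /proc/self/fd also counts
        // the directory handle that Files.list itself holds open.
        Path proc = Paths.get("/proc", args.length > 0 ? args[0] : "self");
        System.out.println("threads:  " + countEntries(proc.resolve("task")));
        System.out.println("open fds: " + countEntries(proc.resolve("fd")));
    }
}
```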

I can play around with Java stack sizes to see if it's a thread/stack limitation… but if it is, why would it bleed over into other, non-JVM processes? Any thoughts?

Thanks!

--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine

Jacob Biesinger

Sep 4, 2013, 4:12:21 AM

to hyrack...@googlegroups.com

It turns out that on Linux, ulimit -u ("max user processes") actually limits the total number of (*threads* + processes). For my particular settings and data sizes (a few GB of input, ~40GB of output, FRAME_SIZE=65535, FRAME_LIMIT=4096), Hyracks is apparently generating hundreds and hundreds of threads, so many that I'm hitting the ulimit -u soft limit of 1024 on this machine. If I raise it to the hard limit of 2000, the jobs run a bit longer and eventually fail with "Too many open files!" (hard ulimit -n 4000), *in addition to* the thread error I mentioned previously. So... Hyracks is opening ~4k files and running ~2k threads (I have nothing else running on this machine).
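
Because this limit is checked against every task (thread or process) owned by the user's real UID, the number that matters is the sum over all of that user's processes, not one JVM's thread count. Below is a Linux-only sketch (Java 8+; UserThreadCount is a hypothetical name) that adds up the Threads: field of /proc/PID/status for every process with the current UID and reads the soft "Max processes" limit from /proc/self/limits. The total includes the threads of the JVM running the check.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class UserThreadCount {
    // First value of a "Key:<tab>value ..." line in /proc/PID/status, or null if absent.
    static String statusField(List<String> status, String key) {
        for (String line : status) {
            if (line.startsWith(key + ":")) {
                return line.substring(key.length() + 1).trim().split("\\s+")[0];
            }
        }
        return null;
    }

    static List<String> readLines(Path p) {
        try {
            return Files.readAllLines(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Real UID of this JVM (first number on the Uid: line).
    static String currentUid() {
        return statusField(readLines(Paths.get("/proc/self/status")), "Uid");
    }

    // Sum of the Threads: field over every process whose real UID is `uid`.
    static long threadsForUid(String uid) {
        long total = 0;
        try (DirectoryStream<Path> procs = Files.newDirectoryStream(Paths.get("/proc"), "[0-9]*")) {
            for (Path pid : procs) {
                try {
                    List<String> status = Files.readAllLines(pid.resolve("status"));
                    String threads = statusField(status, "Threads");
                    if (uid.equals(statusField(status, "Uid")) && threads != null) {
                        total += Long.parseLong(threads);
                    }
                } catch (IOException e) {
                    // The process exited between listing and reading; skip it.
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    // Soft "Max processes" limit (what `ulimit -u` shows), from /proc/self/limits.
    static String softNprocLimit() {
        for (String line : readLines(Paths.get("/proc/self/limits"))) {
            if (line.startsWith("Max processes")) {
                return line.substring("Max processes".length()).trim().split("\\s+")[0];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String uid = currentUid();
        System.out.println("threads owned by uid " + uid + ": " + threadsForUid(uid));
        System.out.println("soft max user processes: " + softNprocLimit());
    }
}
```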

Since I don't have root access to change the hard limits, I guess I need to tweak my settings to use a smaller FRAME_SIZE and a larger FRAME_LIMIT, so that the external sort creates fewer open files in total. (Or could there be a bug where Hyracks doesn't close files and/or join threads appropriately?)
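
To reason about how FRAME_SIZE and FRAME_LIMIT affect the number of run files, here is a back-of-the-envelope model of a textbook external merge sort. It is an assumption that the Hyracks sorter behaves this way: each in-memory run is bounded by FRAME_SIZE × FRAME_LIMIT bytes, so a sorter fed B bytes writes about ceil(B / (FRAME_SIZE × FRAME_LIMIT)) run files, and a merge pass that keeps one frame for output can read at most FRAME_LIMIT - 1 runs at once. SortRunEstimate and the 2 GiB per-sorter input are hypothetical.

```java
class SortRunEstimate {
    // Bytes one sorter can buffer before spilling a run: FRAME_SIZE * FRAME_LIMIT.
    static long budgetBytes(long frameSize, long frameLimit) {
        return frameSize * frameLimit;
    }

    // Initial run files written for inputBytes of data: ceil(inputBytes / budget).
    static long initialRuns(long inputBytes, long frameSize, long frameLimit) {
        long budget = budgetBytes(frameSize, frameLimit);
        return (inputBytes + budget - 1) / budget;
    }

    // Runs one merge pass can read at once if one frame is kept for output.
    static long maxMergeFanIn(long frameLimit) {
        return frameLimit - 1;
    }

    public static void main(String[] args) {
        long input = 2L << 30; // hypothetical 2 GiB fed to one sorter
        System.out.println(budgetBytes(65535, 4096));        // 268431360
        System.out.println(initialRuns(input, 65535, 4096)); // 9
        System.out.println(maxMergeFanIn(4096));             // 4095
    }
}
```

In this model, the product of the two settings determines the run count, while FRAME_LIMIT alone caps how many runs a single merge pass may hold open; if each partition runs its own sort, those per-sorter counts add up across partitions on a node.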