-XX:+PerfDisableSharedMem removed hundreds of milliseconds of latency...


Kevin Burton

Jan 25, 2016, 6:29:17 PM
to mechanical-sympathy
This is really interesting:


(see overview below)

Note that Cassandra 2.2 is enabling this by default.  The DOWNSIDE is that it breaks tools.

I wonder if there's a middle path here. Maybe keep the stats file enabled but configure Java to write it to a directory other than /tmp, and then mount that directory as tmpfs / shared memory.

jstack and friends can be super helpful... 


TL;DR: The JVM by default exports statistics by mmap-ing a file in /tmp (hsperfdata). On Linux, modifying a memory mapped file can block until disk I/O completes, which can be hundreds of milliseconds. Since the JVM modifies these statistics during garbage collection and safepoints, this causes pauses that are hundreds of milliseconds long. To reduce worst-case pause latencies, add the -XX:+PerfDisableSharedMem JVM flag to disable this feature. This will break tools that read this file, like jstat. Update: how I found this problem
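
To confirm from inside a running process whether the flag was actually passed, something like the following might do. It is only a sketch: the class and method names (PerfSharedMemCheck, disabledByFlag) are made up for illustration, and it only sees the arguments that the runtime MXBean reports, so a flag applied some other way may not show up.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class PerfSharedMemCheck {
    // Hypothetical helper: true if the flag appears verbatim in the given JVM arguments.
    static boolean disabledByFlag(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:+PerfDisableSharedMem");
    }

    public static void main(String[] args) {
        // Arguments the JVM was started with, as reported by the runtime MXBean.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("PerfDisableSharedMem passed explicitly: " + disabledByFlag(jvmArgs));
    }
}
```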


Long version: There are few things I find more satisfying than solving a challenging bug, and this is the hardest bug I can remember. I've spent four months figuring this one out. The root cause is that on rare occasions, writing to memory that is a memory mapped file on Linux will block, waiting for disk writes to complete. This is surprising, since in the code it doesn't look like modifying some variable is related to disk I/O. I wrote a test program to demonstrate this behaviour, and I was able to cause pauses on multiple file systems: ext2, ext3, ext4, btrfs (extremely common), and xfs (less common). The pauses occur even if the I/O is to a separate disk, or if you call mlock. The only workaround I've found is to place the mmap-ed file in tmpfs (a RAM disk), or disable it completely.
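
For anyone who wants to poke at this themselves, here is a rough sketch of the kind of measurement described above. It is not the author's test program: the class name (MmapWriteLatency), the 4096-byte mapping, and the iteration count are all assumptions. It maps a file, stores into the mapped page in a loop, and reports the slowest single store.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapWriteLatency {
    // Writes one long into a memory-mapped file `iterations` times and
    // returns the slowest single write in nanoseconds.
    static long maxWriteNanos(Path file, int iterations) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping 4096 bytes read-write grows the file to that size if needed.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            long max = 0;
            for (int i = 0; i < iterations; i++) {
                long start = System.nanoTime();
                buf.putLong(0, i); // looks like a plain memory store, but the page is file-backed
                long elapsed = System.nanoTime() - start;
                if (elapsed > max) {
                    max = elapsed;
                }
            }
            return max;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-latency", ".dat");
        try {
            System.out.println("max write ns: " + maxWriteNanos(file, 1_000_000));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

On an idle machine this will most likely show nothing interesting. The pauses described above are rare and tied to disk writes, so a real reproduction would presumably need heavy background I/O and a much longer run, neither of which this sketch provides. GC and JIT pauses in the measuring process will also add noise to the numbers.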

Kevin Burton

Jan 25, 2016, 6:40:51 PM
to mechanical-sympathy
Also, looks like this is mitigated by just placing /tmp on tmpfs ... Interesting that our Debian boxes (100 or so) all have /tmp still on disk.  So we're going to push out a release that resolves that problem.
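
A quick way to check a box from Java, as a sketch only: FileStore.type() returns an implementation-specific string, and this assumes it reports "tmpfs" for a tmpfs mount on Linux. The class and method names (TmpfsCheck, fileStoreType, isTmpfs) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TmpfsCheck {
    // Assumption: a tmpfs mount is reported with the type string "tmpfs".
    static boolean isTmpfs(String fsType) {
        return "tmpfs".equals(fsType);
    }

    // Returns the file system type of the store that holds `dir`.
    static String fileStoreType(Path dir) throws IOException {
        return Files.getFileStore(dir).type();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Paths.get("/tmp");
        String type = fileStoreType(tmp);
        System.out.println(tmp + " is on " + type + (isTmpfs(type) ? " (RAM-backed)" : " (not tmpfs)"));
    }
}
```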

Dan Eloff

Jan 26, 2016, 12:08:19 AM
to mechanica...@googlegroups.com
Reading the comments there it looks like the Linux kernel pauses were mtime related. The lazytime patch set (new mount option) should resolve it, apparently. Not sure what kernel version that's in.
