How do you keep the OS from flushing to disk at all with memory-mapped I/O?


ymo

Sep 24, 2014, 2:41:01 PM
to mechanica...@googlegroups.com
According to http://stackoverflow.com/questions/21748078/why-does-java-memory-mapped-buffer-cause-massive-unexpected-disk-io Linux will flush dirty mapped pages to disk based on the kernel parameters set via sysctl. I am wondering if there is anything you can do to make sure it does not write to the disk at all.

Using off-heap memory does not guarantee that the OS will not page the memory to disk. So what gives?

Ariel Weisberg

Sep 24, 2014, 3:40:41 PM
to mechanica...@googlegroups.com
Hi,

If you run without swap you won't see applications paged out. You can also set swappiness to 1.

Linux also supports memory only filesystems and there is usually one at /dev/shm.
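
A minimal sketch of the memory-only filesystem approach in Java, assuming /dev/shm is a tmpfs mount on the machine (typical on Linux, but worth confirming with `mount`). The class and method names are hypothetical, and the fallback to java.io.tmpdir is only there so the sketch runs elsewhere; that directory may well be disk-backed.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of any library.
final class ShmMapping {
    /** Returns /dev/shm when it is a writable directory, else java.io.tmpdir. */
    static Path memoryBackedDirOrTemp() {
        Path shm = Paths.get("/dev/shm"); // assumption: tmpfs is mounted here
        if (Files.isDirectory(shm) && Files.isWritable(shm)) {
            return shm;
        }
        return Paths.get(System.getProperty("java.io.tmpdir"));
    }

    /** Creates (or opens) dir/name and maps its first size bytes read-write. */
    static MappedByteBuffer mapShared(Path dir, String name, int size) throws IOException {
        Path file = dir.resolve(name);
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // READ_WRITE mappings are shared: writes are visible to other
            // processes that map the same file. The mapping stays valid
            // after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }
}
```

Something like `ShmMapping.mapShared(ShmMapping.memoryBackedDirOrTemp(), "region.bin", 4096)` would then give a buffer whose pages have no storage device behind them when the directory really is tmpfs. Such pages can still be swapped if swap is enabled.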

If you want to avoid any flushing at all until you are done, you can write to private memory and then memcpy to the file when finished.
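
A rough Java equivalent of "write privately, copy at the end", assuming the data fits in a single direct buffer. `StagedWriter` and `flushTo` are hypothetical names; the copy is done with an ordinary channel write rather than a literal memcpy.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of any library.
final class StagedWriter {
    /** Writes bytes [0, staged.position()) to target; staged itself is left untouched. */
    static void flushTo(Path target, ByteBuffer staged) throws IOException {
        ByteBuffer src = staged.duplicate(); // own position/limit, same memory
        src.flip();                          // limit = bytes staged so far, position = 0
        try (FileChannel ch = FileChannel.open(target,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            while (src.hasRemaining()) {
                ch.write(src); // a single call may write only part of the buffer
            }
        }
    }
}
```

Until `flushTo` is called, the staged data lives only in the buffer returned by `ByteBuffer.allocateDirect`, so there is no file for the kernel to write it back to.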

MongoDB had to do something tricky to make their mmap stuff work with journaling in a crash-safe way: http://www.kchodorow.com/blog/2012/10/04/how-mongodbs-journaling-works. I don't quite follow how the private view can both reflect the contents of the shared view, but not reflect subsequent changes to it. It must be some mmapped file feature I don't know about.
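
One mmap feature that may be relevant here is a private (copy-on-write) mapping, which Java exposes as `FileChannel.MapMode.PRIVATE`. The sketch below only shows that writes through such a mapping never reach the file; whether it matches MongoDB's implementation in detail is an assumption, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of any library.
final class PrivateMappingDemo {
    /** Writes 42 through a PRIVATE mapping, then returns byte 0 as stored in the file. */
    static byte writePrivatelyThenReadFile(Path file) throws IOException {
        // A private mapping requires a channel opened for both reading and writing.
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer priv = ch.map(FileChannel.MapMode.PRIVATE, 0, ch.size());
            priv.put(0, (byte) 42); // touches a private copy of the page only
        }
        return Files.readAllBytes(file)[0];
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("private-map", ".bin");
        try {
            Files.write(file, new byte[4096]); // one zero-filled page
            System.out.println("byte 0 in file = " + writePrivatelyThenReadFile(file));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The file still holds 0 at offset 0 afterwards, because the modified page belongs to the process alone. Pages of a private mapping that have not been written to continue to come from the file, which may be how a private view can start out mirroring the shared one.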

Ariel

Gil Tene

Sep 24, 2014, 7:02:46 PM
to mechanica...@googlegroups.com
Paging and swapping are separate things you need to worry about.

The only things that will ever be paged (not swapped) to disk are things that are mapped to files, e.g. mmapped files (or MappedByteBuffers) and code. Anonymous memory (the sort you get with malloc and when you explicitly ask to mmap with MAP_ANONYMOUS) will not get paged. So if your off-heap stuff is created with a newly allocated DirectByteBuffer (which uses anonymous memory) it won't get paged. But if it's created with a MappedByteBuffer of an actual file, expect paging. E.g. if you need a persistent journal file, you've signed up for paging.
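
The two kinds of off-heap buffer described above can be put side by side in a short sketch. `OffHeapKinds` and its methods are hypothetical names; the comments restate the distinction made in this post rather than anything the JDK documents about paging.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of any library.
final class OffHeapKinds {
    /** Off-heap memory with no file behind it: nothing for the kernel to page to a file. */
    static ByteBuffer anonymous(int size) {
        return ByteBuffer.allocateDirect(size);
    }

    /** Off-heap memory backed by a real file: dirty pages are eligible for write-back. */
    static MappedByteBuffer fileBacked(Path file, int size) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapping remains valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }
}
```

With the file-backed buffer, `force()` asks for write-back explicitly; without it, the kernel decides when dirty pages go to disk, which is the behaviour the original question is about.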

Anonymous memory is still susceptible to swapping. The best way to prevent that is to disable swapping (it's a useless and arcane thing, anyway). You can do that by simply not having a swap file, which will truly make sure it won't happen. Short of that, you can "hint" that you really wish the OS didn't swap memory for no reason by setting swappiness to 0. But I've seen systems swap with swappiness set to 0 under various conditions. For example, you can end up swapping with swappiness set to 0 even when half the memory in the box is sitting empty, if your zone_reclaim_mode isn't set to 0.
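
To check how a given Linux box is configured, something like the following could read the relevant /proc entries. It is only a sketch: `SwapCheck` is a hypothetical name, and it assumes /proc/swaps has a single header line followed by one line per active swap area.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of any library.
final class SwapCheck {
    /** Counts active swap areas given the lines of /proc/swaps (line 0 is the header). */
    static int countSwapAreas(List<String> procSwapsLines) {
        int count = 0;
        for (int i = 1; i < procSwapsLines.size(); i++) {
            if (!procSwapsLines.get(i).trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    /** Parses the single integer stored in /proc/sys/vm/swappiness. */
    static int parseSwappiness(String text) {
        return Integer.parseInt(text.trim());
    }

    public static void main(String[] args) throws IOException {
        Path swaps = Paths.get("/proc/swaps");
        Path swappiness = Paths.get("/proc/sys/vm/swappiness");
        if (!Files.isReadable(swaps) || !Files.isReadable(swappiness)) {
            System.out.println("No /proc swap information here (not Linux?)");
            return;
        }
        System.out.println("swap areas: " + countSwapAreas(Files.readAllLines(swaps)));
        System.out.println("swappiness: " + parseSwappiness(
                new String(Files.readAllBytes(swappiness), StandardCharsets.US_ASCII)));
    }
}
```

A count of zero swap areas corresponds to the "no swap file" setup described above; the swappiness value is only the hint.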

A common reason people end up paging without wanting to is sharing. The easiest way to share memory between processes is with an mmap of a shared file, and if this file has an actual backing store, the OS will end up paging it. There are various ways to try to avoid and reduce this effect. One is to place the shared file on a memory-backed filesystem such as /dev/shm (or /tmp on systems where it is mounted as tmpfs), which is not backed by a storage device. Another is to use shared memory that is not mapped to a file, but created using variants of shmat (the System V shared memory service) [this usually involves creating Direct buffers with JNI].

Richard Warburton

Sep 25, 2014, 4:56:13 AM
to mechanica...@googlegroups.com
Hi,

If you run without swap you won't see applications paged out. You can also set swappiness to 1.

Happy to be corrected here, but I think vm.swappiness on Linux is a bit more complex than this.

Swappiness is a 0-100 property controlling the tendency of the kernel to move pages from physical memory to disk. Not only that, but in kernel version 3.5 the meaning of this parameter changed.

* In < 3.5 the value 0 means "lowest tendency to swap", and swapping will only happen if the kernel thinks there is going to be an out-of-memory error, i.e. swap before invoking the OOM killer.

* In >= 3.5 the value 0 effectively means "disable swap". This may be what you want if you would rather processes got killed than have swapping happen. The value 1 means what 0 used to mean in older kernels, i.e. try to avoid swapping, but if you're about to run out of memory then swap.

regards,

  Richard Warburton

Ariel Weisberg

Sep 25, 2014, 10:26:59 AM
to mechanica...@googlegroups.com
Certainly there is more subtlety to this since they changed the behavior in a breaking way. I reflexively say 1 because people who have swap configured typically expect swapping over OOM. I think, without factual basis, that 1 behaves similarly to 0 pre-3.5, so the advice that people should set it to 1 or disable swap makes for an easy way to explain how to get the desired behavior across versions without accidentally disabling swap post-3.5.

Your and Gil's explanations are definitely better.

Kevin Burton

Sep 26, 2014, 4:28:40 PM
to mechanica...@googlegroups.com
This is a complicated topic... but you can also call mlock(2) on the mapped region. The good part of this strategy is that if the OS is doing something silly, you can prevent the region from being paged back out to disk. But if you aren't careful, this can lead to the OOM killer kicking in.

This can help if you KNOW a certain section of memory should not be paged out, and it's a reasonably sized amount of memory. So say 1GB of locked memory on a 64GB system where you have 50GB of data that's just VFS page cache but NOT locked.

This way if you need memory the OS can just evict the page cache.

It's a bad idea to call mlockall() or to mlock most of your data. The OS can freak out...

If you REALLY know what you are doing you can get away with it... but having multiple daemons on the box can end up shooting you in the foot.




Michael Barker

Oct 1, 2014, 1:31:56 AM
to mechanica...@googlegroups.com
As I understand it, mlock(2) won't prevent I/O to disk.  An mlock'd page will be written back to disk if dirty, but won't be evicted from the page cache.

Mike.


Rüdiger Möller

Oct 3, 2014, 7:55:23 PM
to mechanica...@googlegroups.com
Why not create a RAM disk and put your mmapped files there?