Heaps, forking, vm.overcommit_memory=2, MADV_DONTFORK

Albert Strasheim

May 16, 2011, 11:51:20 AM
to golang-dev
Hello all

We recently ran into a situation where a Go process running in a Linux
VM with 2 GB RAM, no swap space and vm.overcommit_memory=2 couldn't
fork another process.

Because the forking process had created quite a bit of garbage in a
short time without the GC having had a chance to run, the memory it
had mmapped (Sys in runtime.MemStats, IIRC) was about 1.3 GB.
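
To see the figure being referred to, a minimal sketch along these lines could be used. It assumes the current runtime.ReadMemStats API; the helper name sysAndHeap and the allocation loop are made up for illustration and are not from the thread.

```go
package main

import (
	"fmt"
	"runtime"
)

// sysAndHeap (hypothetical helper) returns Sys, the bytes the runtime has
// obtained from the OS, and HeapAlloc, the bytes of allocated heap objects
// (including garbage the GC has not yet freed).
func sysAndHeap() (sys, heapAlloc uint64) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.Sys, m.HeapAlloc
}

func main() {
	// Produce some short-lived garbage, 1 MiB at a time.
	var sink []byte
	for i := 0; i < 1000; i++ {
		sink = make([]byte, 1<<20)
	}
	_ = sink

	sys, heap := sysAndHeap()
	fmt.Printf("Sys=%d MiB HeapAlloc=%d MiB\n", sys>>20, heap>>20)
}
```

The exact numbers depend on the Go version and on when the collector last ran, so no particular output should be expected.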

With vm.overcommit_memory=2, the fork operation failed because Linux
refused to overcommit memory.
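
The arithmetic behind the failure can be sketched roughly as follows. In strict mode the kernel's commit limit is swap plus RAM times vm.overcommit_ratio / 100 (huge pages ignored here). The ratio in use on this VM is not stated in the thread; 100 is an illustrative assumption (the kernel default is 50), and the function names are hypothetical. The sketch also counts only the Go process, whereas the kernel accounts for the whole system.

```go
package main

import "fmt"

// commitLimitKB approximates the strict-overcommit limit:
// swap + RAM * overcommit_ratio / 100 (huge pages are ignored).
func commitLimitKB(ramKB, swapKB, ratioPercent uint64) uint64 {
	return swapKB + ramKB*ratioPercent/100
}

// forkFits reports whether charging another processKB of private writable
// memory on top of what is already committed stays within the limit.
func forkFits(committedKB, processKB, limitKB uint64) bool {
	return committedKB+processKB <= limitKB
}

func main() {
	const (
		ramKB  = 2 * 1024 * 1024 // 2 GB of RAM, as in the thread
		swapKB = 0               // no swap, as in the thread
		ratio  = 100             // ASSUMPTION: not given in the thread
		procKB = 1300 * 1024     // roughly the 1.3 GB the process had mapped
	)
	limit := commitLimitKB(ramKB, swapKB, ratio)
	fmt.Println("commit limit (KB):", limit)
	fmt.Println("fork fits:", forkFits(procKB, procKB, limit))
}
```

Even with the most generous ratio of 100, a second 1.3 GB charge for the child does not fit under a 2 GB limit, which is consistent with the fork being refused.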

I was just browsing some other code, and came across
madvise(MADV_DONTFORK), which has been available since Linux 2.6.16.

Would it make sense for Go to mark its heap with MADV_DONTFORK so that
a "large" Go process can still fork small processes on
resource-constrained systems?

Regards

Albert

Russ Cox

May 16, 2011, 11:54:16 AM
to Albert Strasheim, golang-dev
I have no problem with setting MADV_DONTFORK on Linux.

Russ

Florian Weimer

May 16, 2011, 2:05:04 PM
to Albert Strasheim, golang-dev
* Albert Strasheim:

> I was just browsing some other code, and came across
> madvise(MADV_DONTFORK) which is available since Linux 2.6.16.
>
> Would it make sense for Go to mark its heap with MADV_DONTFORK so that
> a "large" Go process can still fork small processes on resource-
> constrained systems?

A more standard approach would be to call vfork() or clone(2) with
suitable flags. Martin Buchholz contributed such code to OpenJDK (which
faces exactly the same problem), and he might still have a copy under
terms that are suitable for inclusion with Go.

Albert Strasheim

May 17, 2011, 4:46:59 AM
to Florian Weimer, golang-dev
Hello

There is some info here:

http://developers.sun.com/solaris/articles/subprocess/subprocess.html

I'll look it over and try to submit a CL.

Regards

Albert

Albert Strasheim

May 17, 2011, 5:14:57 AM
to Florian Weimer, golang-dev
Hello again

It seems the options are vfork and clone(CLONE_VFORK).

As also noted at

http://en.wikipedia.org/wiki/Fork_(operating_system)

the man page for vfork says:

`It is rather unfortunate that Linux revived this specter from the
past. The BSD man page states: "This system call will be eliminated
when proper system sharing mechanisms are implemented. Users should
not depend on the memory sharing semantics of vfork() as it will, in
that case, be made synonymous to fork(2)."`

which makes me wonder if madvise(MADV_DONTFORK) is the solution for
the new millennium?

If vfork is okay, it might be as simple as changing

RawSyscall(SYS_FORK, 0, 0, 0)

to

RawSyscall(SYS_VFORK, 0, 0, 0)

in exec_unix.go, but this would leave the forking process suspended
until the child process reaches RawSyscall(SYS_EXECVE, ...).

I'm guessing the madvise approach avoids this pause.
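
For context, the fork-then-exec path being discussed is reachable from user code through syscall.ForkExec. The sketch below assumes a Unix-like system with a `true` binary on PATH; runChild is a hypothetical helper. It starts a child and waits for it, and if the fork is refused, the failure would surface as the error returned by ForkExec.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// runChild (hypothetical helper) starts name with args via syscall.ForkExec,
// waits for it to finish, and returns its exit status.
func runChild(name string, args ...string) (int, error) {
	path, err := exec.LookPath(name)
	if err != nil {
		return 0, err
	}
	argv := append([]string{name}, args...)
	attr := &syscall.ProcAttr{
		Env:   os.Environ(),
		Files: []uintptr{0, 1, 2}, // pass stdin, stdout, stderr to the child
	}
	pid, err := syscall.ForkExec(path, argv, attr)
	if err != nil {
		return 0, err // a refused fork is reported here
	}
	var ws syscall.WaitStatus
	if _, err := syscall.Wait4(pid, &ws, 0, nil); err != nil {
		return 0, err
	}
	return ws.ExitStatus(), nil
}

func main() {
	status, err := runChild("true")
	if err != nil {
		fmt.Println("fork/exec failed:", err)
		return
	}
	fmt.Println("child exit status:", status)
}
```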

Thoughts on the way forward?

Regards

Albert

Ian Lance Taylor

May 17, 2011, 10:17:54 AM
to Albert Strasheim, Florian Weimer, golang-dev
Albert Strasheim <ful...@gmail.com> writes:

> `It is rather unfortunate that Linux revived this specter from the
> past. The BSD man page states: "This system call will be eliminated
> when proper system sharing mechanisms are implemented. Users should
> not depend on the memory sharing semantics of vfork() as it will, in
> that case, be made synonymous to fork(2)."`
>
> which makes me wonder if madvise(MADV_DONTFORK) is the solution for
> the new millennium?

That paragraph has been there since vfork was invented, and it's a
canard. It will always be quicker to clone a process if you don't have
to clone the memory mappings, and that is getting more true rather than
less. It will never be more efficient to implement vfork as fork.

> If vfork is okay, it might be as simple as changing
>
> RawSyscall(SYS_FORK, 0, 0, 0)
>
> to
>
> RawSyscall(SYS_VFORK, 0, 0, 0)
>
> in exec_unix.go, but this would leave the forking process suspended
> until the child process reaches RawSyscall(SYS_EXECVE, ...).

Worse, it's not clear that all systems which provide vfork also permit
the child process to do things like call setuid. vfork only knows how
to unwind a limited set of changes to the child environment, and that
set of changes is poorly documented.

> I'm guessing the madvise approach avoids this pause.

I think madvise is the right way to go. But note that it will require
copying all the parameters to forkAndExecInChild into local stack
variables before the fork, as they may otherwise be inaccessible to the
child.

Ian
