Catching failed memory allocations

Richard Gooch

Jan 25, 2016, 1:09:05 AM

to golang-dev

Hi, all. There's currently no way to safely allocate memory in Go without running the risk of the process panicking. This makes it very difficult to write an application that automatically scales to fit the resources of the machine (or container). Further, determining how much memory the OS has available for allocation is non-portable and unreliable, so an application cannot simply wrap allocations in a function that checks whether they will fit. Any guess it makes about available memory can be under or over the real value, leading to wasted resources or OOM panics.
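For illustration, the kind of checking wrapper mentioned above might look like the following minimal sketch. The names Budget, NewBudget, Alloc, Free and ErrOverBudget are hypothetical and not part of any Go package, and the limit is exactly the sort of guess the paragraph warns about: the wrapper only counts bytes handed out through it and knows nothing about what the OS can really provide.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrOverBudget is returned when a request would exceed the configured budget.
var ErrOverBudget = errors.New("allocation would exceed memory budget")

// Budget is a hypothetical application-level accounting helper. It tracks
// only the bytes handed out through Alloc, not the memory actually available.
type Budget struct {
	mu    sync.Mutex
	limit uint64 // guessed upper bound, in bytes
	used  uint64 // bytes handed out so far (never exceeds limit)
}

// NewBudget returns a Budget with the given guessed limit in bytes.
func NewBudget(limit uint64) *Budget { return &Budget{limit: limit} }

// Alloc returns a byte slice of length n, or ErrOverBudget if the request
// does not fit in what is left of the budget.
func (b *Budget) Alloc(n uint64) ([]byte, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if n > b.limit-b.used {
		return nil, ErrOverBudget
	}
	b.used += n
	return make([]byte, n), nil
}

// Free returns n bytes to the budget. The caller must also drop its
// reference to the slice so the garbage collector can reclaim it.
func (b *Budget) Free(n uint64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if n > b.used {
		n = b.used
	}
	b.used -= n
}

func main() {
	b := NewBudget(1 << 20) // the guess: 1 MiB
	if _, err := b.Alloc(512 << 10); err != nil {
		fmt.Println("unexpected:", err)
	}
	// This request does not fit in the remaining 512 KiB of the budget.
	_, err := b.Alloc(768 << 10)
	fmt.Println(err)
}
```

If the guessed limit is too low, memory goes unused; if it is too high, Alloc succeeds on paper and the process can still die when the real allocation fails.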

Ideally, one could recover() from an OOM panic, but that currently does not work. There is even code in the standard package library that attempts to do a recover() (to change the panic message), but it doesn't work. Commit b0d2713b77f80986f688d18bd0df03ed56d6e7b5 by Rob Pike attempted this 4 years ago. Perhaps it worked once?

I realise that running defer() code when a memory allocation has failed is problematic, since the recovery code will also need to allocate memory. However, the common case is probably that a large allocation failed, and there is room for small allocations needed for cleanup, so just allowing applications to catch OOM panics would probably help most of the time. If there is another memory allocation failure during recovery, kill the application. There could also be a reserved chunk of memory that is freed/made available during OOM recovery, which is re-reserved once recovery is complete. That would also allow effective recovery if a small memory allocation failed. As long as the recovery code uses less memory than the reserved size, this approach should be reliable.

A less attractive option is to add a trymake() built-in function, which returns a value,error tuple. I like this less because there are many other ways in which memory is allocated, so one cannot catch them all. It also requires changing a large number of callsites. Nevertheless, it would be better than the current situation.

I'm hoping to get at least in-principle agreement that the current behaviour needs to be fixed and how to do that. We can then figure out the "who" :-)

Regards,

Richard....

Ian Lance Taylor

Jan 25, 2016, 1:24:55 AM

to Richard Gooch, golang-dev

On Sun, Jan 24, 2016 at 8:27 PM, Richard Gooch <rg+go...@safe-mbox.com> wrote:
>
> Ideally, one could recover() from an OOM panic, but that currently does not
> work. There is even code in the standard package library that attempts to do
> a recover() (to change the panic message), but it doesn't work. Commit
> b0d2713b77f80986f688d18bd0df03ed56d6e7b5 by Rob Pike attempted this 4 years
> ago. Perhaps it worked once?

That code doesn't recover from an OOM panic (as you note, an OOM is
not a panic, and it can not be recovered). That code recovers from an
attempt to create a slice so large that it does not fit in memory.
Such an attempt, which can never succeed, does panic, and can be
recovered (see the makeslice function near the start of
https://golang.org/src/runtime/slice.go). I agree that it's a subtle,
and often useless, distinction. It was a distinction that meant
slightly more at the time that change was written, because back then
int was 32 bits even on a 64-bit machine, and an attempt to create a
slice that required more than 31 bits to index it would panic.
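
To illustrate the distinction: a slice request whose size can never be satisfied panics in makeslice, and that panic can be recovered, whereas genuinely running out of memory is a fatal runtime error that recover() never sees. The following is a minimal sketch; tryMakeInt64s and maxInt are hypothetical names chosen for this example, not an existing or proposed API.

```go
package main

import "fmt"

// maxInt is the largest value of type int on the current platform.
const maxInt = int(^uint(0) >> 1)

// tryMakeInt64s is a hypothetical helper. It converts the recoverable
// panic raised for an impossible slice size into an error. It does NOT
// protect against a real out-of-memory condition, which is fatal.
func tryMakeInt64s(n int) (s []int64, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("cannot make slice of %d elements: %v", n, r)
		}
	}()
	return make([]int64, n), nil
}

func main() {
	// A small request succeeds.
	s, err := tryMakeInt64s(4)
	fmt.Println(len(s), err)

	// maxInt 8-byte elements can never fit in the address space, so make
	// panics and the deferred function turns the panic into an error.
	s, err = tryMakeInt64s(maxInt)
	fmt.Println(s == nil, err != nil)
}
```

A request that is merely larger than the memory currently free, rather than impossible, would not take this path: the runtime would attempt the allocation and the process could be terminated.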

> I realise that running defer() code when a memory allocation has failed is
> problematic, since the recovery code will also need to allocate memory. However, the
> common case is probably that a large allocation failed, and there is room
> for small allocations needed for cleanup, so just allowing applications to
> catch OOM panics would probably help most of the time. If there is another
> memory allocation failure during recovery, kill the application. There could
> also be a reserved chunk of memory that is freed/made available during OOM
> recovery, which is re-reserved once recovery is complete. That would also
> allow effective recovery if a small memory allocation failed. As long as the
> recovery code uses less memory than the reserved size, this approach should
> be reliable.

I don't agree that the common case is that a large allocation failed.
That is one case, but I think the common case is that the program has
a memory leak and has in fact run out of memory. Adding a reserved
chunk of memory introduces a new allocation approach that is almost
never used, and is therefore more likely to be buggy, and (I suspect)
will rarely help.

> A less attractive option is to add a trymake() built-in function, which
> returns a value,error tuple. I like this less because there are many other
> ways in which memory is allocated, so one cannot catch them all. It also
> requires changing a large number of callsites. Nevertheless, it would be
> better than the current situation.

It's true that there are many ways that allocation can occur, but
there aren't all that many ways that a large allocation can occur. If
we restrict ourselves to slices, which seems reasonable at first
glance, then there is really only one way: the make function (one
could write a truly large composite literal, but that seems
implausible). If we can agree that the only kind of memory allocation
from which one can plausibly reliably recover is a large one, then I
think an approach like trymake might seem more reasonable.

Ian

Benny Siegert

Jan 25, 2016, 1:51:49 AM

to Ian Lance Taylor, Richard Gooch, golang-dev

The other scenario that comes to mind is appending to a large slice which then needs to reallocate a chunk of backing memory. Then you would also need a tryappend function, I assume.


Richard Gooch

Jan 25, 2016, 3:09:40 AM

to golang-dev, rg+go...@safe-mbox.com

On Sunday, 24 January 2016 22:24:55 UTC-8, Ian Lance Taylor wrote:

> On Sun, Jan 24, 2016 at 8:27 PM, Richard Gooch <rg+go...@safe-mbox.com> wrote:
> >
> > Ideally, one could recover() from an OOM panic, but that currently does not
> > work. There is even code in the standard package library that attempts to do
> > a recover() (to change the panic message), but it doesn't work. Commit
> > b0d2713b77f80986f688d18bd0df03ed56d6e7b5 by Rob Pike attempted this 4 years
> > ago. Perhaps it worked once?
>
> > I realise that running defer() code when a memory allocation has failed is
> > problematic, since the recovery code will also need to allocate memory. However, the
> > common case is probably that a large allocation failed, and there is room
> > for small allocations needed for cleanup, so just allowing applications to
> > catch OOM panics would probably help most of the time. If there is another
> > memory allocation failure during recovery, kill the application. There could
> > also be a reserved chunk of memory that is freed/made available during OOM
> > recovery, which is re-reserved once recovery is complete. That would also
> > allow effective recovery if a small memory allocation failed. As long as the
> > recovery code uses less memory than the reserved size, this approach should
> > be reliable.
>
> I don't agree that the common case is that a large allocation failed.
> That is one case, but I think the common case is that the program has
> a memory leak and has in fact run out of memory. Adding a reserved
> chunk of memory introduces a new allocation approach that is almost
> never used, and is therefore more likely to be buggy, and (I suspect)
> will rarely help.

Memory leaks may be one class of problems, but there is another class of problems where the memory consumption is proportional to the load, and hence the load that can be processed is memory constrained. A reserved chunk of memory would definitely help (when used for OOM recovery). Right now, it's challenging using Go for large, scalable applications.

I'm not proposing something complex. Just a simple chunk of memory allocated the normal way (internally) which is freed by the OOM handler before deferred functions are called. It could even be something the application has to do explicitly, such as runtime.ReserveOOMBuffer(size uint64). The application recovery code would be expected to call it again once it's freed up memory and called runtime.GC().
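
As a rough illustration of the reserve-and-release cycle described above, here is a minimal application-level sketch. Reserve, NewReserve, Release, Rearm and Held are hypothetical names; runtime.ReserveOOMBuffer does not exist in the Go runtime, and because the runtime offers no OOM hook, this sketch assumes the application itself decides when to call Release (for example, when its own accounting says memory is tight). It cannot rescue a process from a real out-of-memory failure.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// Reserve is a hypothetical application-level stand-in for the proposed
// runtime.ReserveOOMBuffer. It simply holds a block of memory that can be
// dropped to make room for recovery work.
type Reserve struct {
	mu   sync.Mutex
	size int
	buf  []byte
}

// NewReserve allocates the reserve block of the given size in bytes.
func NewReserve(size int) *Reserve {
	r := &Reserve{size: size}
	r.Rearm()
	return r
}

// Release drops the reserved block and runs a garbage collection so the
// memory can be reused. It reports whether there was a block to release.
func (r *Reserve) Release() bool {
	r.mu.Lock()
	had := r.buf != nil
	r.buf = nil
	r.mu.Unlock()
	if had {
		runtime.GC()
	}
	return had
}

// Rearm re-reserves the block once the application has freed up memory.
// It does nothing if the block is already held.
func (r *Reserve) Rearm() {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.buf == nil {
		r.buf = make([]byte, r.size)
	}
}

// Held reports how many bytes are currently reserved.
func (r *Reserve) Held() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.buf)
}

func main() {
	r := NewReserve(1 << 20) // reserve 1 MiB at startup
	fmt.Println("held:", r.Held())

	// The application has decided it is under memory pressure.
	if r.Release() {
		// ... shed load, drop caches, and so on ...
		r.Rearm()
	}
	fmt.Println("held after recovery:", r.Held())
}
```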

Regarding rarely used code paths: that's what unit tests are for :-)

> > A less attractive option is to add a trymake() built-in function, which
> > returns a value,error tuple. I like this less because there are many other
> > ways in which memory is allocated, so one cannot catch them all. It also
> > requires changing a large number of callsites. Nevertheless, it would be
> > better than the current situation.
>
> It's true that there are many ways that allocation can occur, but
> there aren't all that many ways that a large allocation can occur. If
> we restrict ourselves to slices, which seems reasonable at first
> glance, then there is really only one way: the make function (one
> could write a truly large composite literal, but that seems
> implausible). If we can agree that the only kind of memory allocation
> from which one can plausibly reliably recover is a large one, then I
> think an approach like trymake might seem more reasonable.

Appending to slices and inserting into maps are also common, so we'd want extensions for those, too. It's starting to get ugly. Another factor to consider is that there's a lot of library code that we'd want to change, for example the gob decoder. I want to be able to read structured data and have it fail with an error if the data structure won't fit in memory. I have a couple of applications where the memory goes not to a couple of huge slices but to a large map.

Reserving an OOM buffer seems a lot cleaner.

Regards,

Richard....

Richard Gooch

Jan 30, 2016, 11:39:22 AM

to golang-dev

Converted to an issue so it doesn't get dropped: https://github.com/golang/go/issues/14162
