oom killer gone nuts


Jens Axboe

Jan 20, 2005, 7:35:43 AM
to Linux Kernel, Andrew Morton, Andrea Arcangeli
Hi,

Using current BK on my x86-64 workstation, it went completely nuts today
killing tasks left and right with oodles of free memory available.
Here's a little snippet from messages:


Out of Memory: Killed process 2847 (screen).
oom-killer: gfp_mask=0xd1
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
HighMem per-cpu: empty

Free pages: 529184kB (0kB HighMem)
Active:19127 inactive:20440 dirty:92 writeback:0 unstable:0 free:132296 slab:3827 mapped:3503 pagetables:164
DMA free:4536kB min:60kB low:72kB high:88kB active:0kB inactive:0kB present:16384kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Normal free:524648kB min:4028kB low:5032kB high:6040kB active:76508kB inactive:81760kB present:1031360kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 556*4kB 155*8kB 65*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4536kB
Normal: 29800*4kB 25115*8kB 6953*16kB 1251*32kB 326*64kB 103*128kB 31*256kB 12*512kB 3*1024kB 1*2048kB 0*4096kB = 524648kB
HighMem: empty
Swap cache: add 59864, delete 55781, find 6188/8478, race 0+0
Out of Memory: Killed process 27326 (bash).
oom-killer: gfp_mask=0xd1
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
HighMem per-cpu: empty

Free pages: 530472kB (0kB HighMem)
Active:18861 inactive:20434 dirty:147 writeback:0 unstable:0 free:132618 slab:3836 mapped:2955 pagetables:144
DMA free:4536kB min:60kB low:72kB high:88kB active:0kB inactive:0kB present:16384kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Normal free:525936kB min:4028kB low:5032kB high:6040kB active:75444kB inactive:81736kB present:1031360kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 556*4kB 155*8kB 65*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4536kB
Normal: 30040*4kB 25124*8kB 6953*16kB 1255*32kB 328*64kB 103*128kB 31*256kB 12*512kB 3*1024kB 1*2048kB 0*4096kB = 525936kB
HighMem: empty
Swap cache: add 59864, delete 55781, find 6188/8478, race 0+0
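
The per-order buddy lists above can be checked against the totals the kernel prints; a quick Python sketch of the arithmetic (the line format is just the 2.6-era show_mem() output):

```python
import re

def buddy_total_kb(line):
    """Sum a show_mem() buddy line like 'DMA: 556*4kB 155*8kB ... = 4536kB',
    multiplying each block count by its block size and ignoring the
    kernel's own printed total."""
    return sum(int(count) * int(size)
               for count, size in re.findall(r"(\d+)\*(\d+)kB", line))

dma = ("DMA: 556*4kB 155*8kB 65*16kB 1*32kB 0*64kB 0*128kB 0*256kB "
       "0*512kB 0*1024kB 0*2048kB 0*4096kB = 4536kB")
print(buddy_total_kb(dma))  # 4536, matching the total the kernel printed
```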


--
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Andries Brouwer

Jan 20, 2005, 8:17:28 AM
to Jens Axboe, Linux Kernel, Andrew Morton, Andrea Arcangeli
On Thu, Jan 20, 2005 at 01:34:06PM +0100, Jens Axboe wrote:

> Using current BK on my x86-64 workstation, it went completely nuts today
> killing tasks left and right with oodles of free memory available.

Yes, the fact that the oom-killer exists is a serious problem.
People work on trying to tune it, instead of just removing it.

I am getting reports that, even in overcommit mode 2 (no overcommit,
so no oom-killer should ever be needed), processes are killed by the
oom-killer (on 2.6.10).

Andries

Andrea Arcangeli

Jan 20, 2005, 12:24:32 PM
to Andries Brouwer, Jens Axboe, Linux Kernel, Andrew Morton
On Thu, Jan 20, 2005 at 02:15:56PM +0100, Andries Brouwer wrote:
> On Thu, Jan 20, 2005 at 01:34:06PM +0100, Jens Axboe wrote:
>
> > Using current BK on my x86-64 workstation, it went completely nuts today
> > killing tasks left and right with oodles of free memory available.
>
> Yes, the fact that the oom-killer exists is a serious problem.
> People work on trying to tune it, instead of just removing it.

I'm working on fixing it, not just tuning it. The bugs in mainline
aren't in the selection algorithm (which is what people normally
call the oom killer). The bugs in mainline are about being able to kill a
task reliably, regardless of which task we pick, and every Linux kernel
out there has always killed some task when it was oom. So the bugs are
just obvious regressions of 2.6 compared to 2.4.

But this is all fixed now; I'm starting to send the first patches to
Andrew very shortly (last week there was still the oracle stuff going
on). Now I can fix the rejects.

I will guarantee nothing about which task will be picked (that's the old
code at work; I changed not a bit of what people normally call "the oom
killer", plus the recent improvement from Thomas), but I guarantee the
VM won't kill tasks left and right like it does now (i.e. by invoking the
oom killer multiple times).

Marcelo Tosatti

Jan 20, 2005, 12:32:08 PM
to Andries Brouwer, Jens Axboe, Linux Kernel, Andrew Morton, Andrea Arcangeli
On Thu, Jan 20, 2005 at 02:15:56PM +0100, Andries Brouwer wrote:
> On Thu, Jan 20, 2005 at 01:34:06PM +0100, Jens Axboe wrote:
>
> > Using current BK on my x86-64 workstation, it went completely nuts today
> > killing tasks left and right with oodles of free memory available.
>
> Yes, the fact that the oom-killer exists is a serious problem.
> People work on trying to tune it, instead of just removing it.
>
> I am getting reports that also in overcommit mode 2 (no overcommit,
> no oom-killer ever needed) processes are killed by the oom-killer
> (on 2.6.10).

Hi Andries,

There is a user requirement for overcommit mode, you know.

Saying "hey, there's no more overcommit mode in future v2.6 releases, you
run out of memory and get -ENOMEM" is not really an option, is it?

You propose to remove the OOM killer and do what? Lockup solid?

It is _WAY_ off right now: look at the amount of free pages:

DMA free:4536kB min:60kB low:72kB high:88kB active:0kB inactive:0kB present:16384kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Normal free:524648kB min:4028kB low:5032kB high:6040kB active:76508kB inactive:81760kB present:1031360kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 556*4kB 155*8kB 65*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4536kB
Normal: 29800*4kB 25115*8kB 6953*16kB 1251*32kB 326*64kB 103*128kB 31*256kB 12*512kB 3*1024kB 1*2048kB 0*4096kB = 524648kB
HighMem: empty

v2.4 gets it pretty much right for most cases, and it's obviously screwed up in v2.6 right now.

Andrea/Thomas were working on getting it fixed?

Andries Brouwer

Jan 20, 2005, 1:58:52 PM
to Marcelo Tosatti, Andries Brouwer, Jens Axboe, Linux Kernel, Andrew Morton, Andrea Arcangeli
On Thu, Jan 20, 2005 at 12:00:34PM -0200, Marcelo Tosatti wrote:
> On Thu, Jan 20, 2005 at 02:15:56PM +0100, Andries Brouwer wrote:

> > Yes, the fact that the oom-killer exists is a serious problem.
> > People work on trying to tune it, instead of just removing it.
> >
> > I am getting reports that also in overcommit mode 2 (no overcommit,
> > no oom-killer ever needed) processes are killed by the oom-killer
> > (on 2.6.10).
>
> Hi Andries,
>
> There is a user requirement for overcommit mode, you know.
>
> Saying "hey, there's no more overcommit mode in future v2.6 releases, you
> run out of memory and get -ENOMEM" is not really an option is it?
>
> You propose to remove the OOM killer and do what? Lockup solid?

Right now we have three overcommit modes.
They are specified by:
0: overcommit, but keep it reasonable (the current default)
1: overcommit, always say yes
2: keep track of all our obligations, do not overcommit

So, one has the right to expect that no OOM situation can occur
in overcommit mode 2. But in 2.6.10 it can. That is a bug.
The conclusion must be that the bookkeeping is done incorrectly.
Perhaps mode 0 is also affected by that same bug.


Now you ask what I propose. There is no hurry worrying about that -
the first thing should be to fix the bookkeeping problem.


But assume that fixed. Then everybody can run in mode 2 and never
have any problems. That is what I do.

Yes, you say, but that is an inefficient use of memory. Perhaps.
That is the price I am willing to pay for the guarantee that my
processes are not killed at some random moment.

But if someone else does not do anything of importance and doesn't
care if his processes die at arbitrary moments, as long as things go
as fast as possible and use as much of his precious memory as possible,
then overcommit mode 2 can be useful for him as well.
It is accompanied by the variable overcommit_ratio R - the amount
of memory that can be used is Swap + Memory*(R/100). Here R can be
larger than 100, so in overcommit mode 2 one can specify very precisely
what amount of overcommitment is considered acceptable.
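
The formula above is easy to check numerically; a sketch (the sizes and ratios here are made up for illustration, and the kernel's real accounting of committed address space is of course more involved):

```python
def commit_limit_kb(swap_kb, mem_kb, ratio):
    """Mode-2 allocation ceiling as described above: Swap + Memory*(R/100),
    where R is the overcommit_ratio sysctl."""
    return swap_kb + mem_kb * ratio // 100

# 1 GB swap + 1 GB RAM with R=50 (the default overcommit_ratio):
print(commit_limit_kb(1048576, 1048576, 50))   # 1572864 kB
# R may exceed 100, allowing a precisely bounded amount of overcommit:
print(commit_limit_kb(1048576, 1048576, 150))  # 2621440 kB
```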


Very few people run overcommit mode 2, and lots of things are
badly tested. It cannot become the default today.
But I would like to see it become the default at some future moment.

Andries

Andries Brouwer

Jan 20, 2005, 3:53:56 PM
to Andrea Arcangeli, Andries Brouwer, Jens Axboe, Linux Kernel, Andrew Morton
On Thu, Jan 20, 2005 at 06:15:44PM +0100, Andrea Arcangeli wrote:

> > Yes, the fact that the oom-killer exists is a serious problem.
> > People work on trying to tune it, instead of just removing it.
>
> I'm working on fixing it, not just tuning it. The bugs in mainline
> aren't in the selection algorithm (which is what people normally
> call the oom killer). The bugs in mainline are about being able to kill a
> task reliably, regardless of which task we pick, and every Linux kernel
> out there has always killed some task when it was oom. So the bugs are
> just obvious regressions of 2.6 compared to 2.4.

Yes, earlier one lost a job once in a great while; these days it is
once in a while - the frequency has gone up.

But let me stress that I also consider the earlier situation
unacceptable. It is really bad to lose a few weeks of computation.

You talk about "when it is oom", as if it would be unavoidable,
an act of nature. But it can be avoided, and should be avoided,
unless the sysadmin explicitly says that oom is OK for him.

(Compare allowing oom with overclocking - there is a trade-off
between speed and reliability. It must be possible to choose
for reliability. Indeed, reliability must be the default.)

Andries

Chris Friesen

Jan 20, 2005, 5:04:13 PM
to Andries Brouwer, Andrea Arcangeli, Jens Axboe, Linux Kernel, Andrew Morton
Andries Brouwer wrote:

> But let me stress that I also consider the earlier situation
> unacceptable. It is really bad to lose a few weeks of computation.

Shouldn't the application be backing up intermediate results to disk
periodically? Power outages do occur, as do bus faults, electrical
glitches, dead fans, etc.
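
A long-running computation can hedge against all of these the same way; a minimal checkpoint/resume sketch (the file name and state layout are purely illustrative):

```python
import os
import pickle
import tempfile

def save_checkpoint(state, path):
    # Write to a temp file then rename, so a crash mid-write can
    # never leave a truncated checkpoint behind.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path, default):
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default

# Resume from wherever the previous run got to, then keep going:
state = load_checkpoint("job.ckpt", {"step": 0, "acc": 0})
for step in range(state["step"], 10):
    state = {"step": step + 1, "acc": state["acc"] + step}
    save_checkpoint(state, "job.ckpt")
print(state["acc"])  # 45, the sum 0+1+...+9
```

If the process is killed partway through, the next run picks up at the last saved step instead of losing weeks of work.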

Chris

Andrea Arcangeli

Jan 20, 2005, 6:29:48 PM
to Chris Friesen, Andries Brouwer, Jens Axboe, Linux Kernel, Andrew Morton
On Thu, Jan 20, 2005 at 03:57:07PM -0600, Chris Friesen wrote:
> Andries Brouwer wrote:
>
> >But let me stress that I also consider the earlier situation
> >unacceptable. It is really bad to lose a few weeks of computation.
>
> Shouldn't the application be backing up intermediate results to disk
> periodically? Power outages do occur, as do bus faults, electrical
> glitches, dead fans, etc.

Agreed. Plus, if you truly cannot change the app because it's binary-only,
at least you can set the ulimit based on the virtual sizes; ulimit
should work reliably even if overcommit doesn't.
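
From a wrapper script that is just `ulimit -v`; the same cap can also be set programmatically, e.g. in Python via setrlimit (the 1 GiB figure here is illustrative):

```python
import resource

# Cap this process's virtual address space at 1 GiB. RLIMIT_AS is the
# limit that `ulimit -v` (which takes a size in kB) adjusts in the shell.
cap = 1 << 30
resource.setrlimit(resource.RLIMIT_AS, (cap, cap))

try:
    hog = bytearray(2 << 30)  # 2 GiB: exceeds the cap
    refused = False
except MemoryError:
    refused = True
print("oversized allocation refused:", refused)
```

With the cap in place, the app gets a plain allocation failure it can handle immediately, instead of becoming a candidate for the oom killer later.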

Jens Axboe

Jan 21, 2005, 2:43:38 AM
to Andrea Arcangeli, Andries Brouwer, Linux Kernel, Andrew Morton
On Thu, Jan 20 2005, Andrea Arcangeli wrote:
> On Thu, Jan 20, 2005 at 02:15:56PM +0100, Andries Brouwer wrote:
> > On Thu, Jan 20, 2005 at 01:34:06PM +0100, Jens Axboe wrote:
> >
> > > Using current BK on my x86-64 workstation, it went completely nuts today
> > > killing tasks left and right with oodles of free memory available.
> >
> > Yes, the fact that the oom-killer exists is a serious problem.
> > People work on trying to tune it, instead of just removing it.
>
> I'm working on fixing it, not just tuning it. The bugs in mainline
> aren't in the selection algorithm (which is what people normally
> call the oom killer). The bugs in mainline are about being able to kill a
> task reliably, regardless of which task we pick, and every Linux kernel
> out there has always killed some task when it was oom. So the bugs are
> just obvious regressions of 2.6 compared to 2.4.
>
> But this is all fixed now; I'm starting to send the first patches to
> Andrew very shortly (last week there was still the oracle stuff going
> on). Now I can fix the rejects.
>
> I will guarantee nothing about which task will be picked (that's the old
> code at work; I changed not a bit of what people normally call "the oom
> killer", plus the recent improvement from Thomas), but I guarantee the
> VM won't kill tasks left and right like it does now (i.e. by invoking the
> oom killer multiple times).

And especially not with 500MB of zone normal free, thanks :)

2.6.11-rc1-xx vm behaviour is looking a _lot_ worse than 2.6.10 btw; I
haven't looked closer at what has changed yet, it's just a subjective
feeling. I regularly have to run a fillmem.c hog to prune the caches or it
runs like an old dog.
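
fillmem.c itself isn't posted in the thread, but a hog of that sort is presumably just an allocate-and-touch loop; a Python sketch of the idea, kept to a deliberately small default size:

```python
import sys

def fill_mem(n_mb, page=4096):
    """Allocate n_mb megabytes and dirty every page, forcing the VM to
    hand over real page frames and so pushing clean cache pages out."""
    buf = bytearray(n_mb << 20)
    for off in range(0, len(buf), page):
        buf[off] = 1  # touch the page so it is actually faulted in
    return len(buf)

if __name__ == "__main__":
    mb = int(sys.argv[1]) if len(sys.argv) > 1 else 16
    print(fill_mem(mb) >> 20, "MB allocated and touched")
```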

--
Jens Axboe

Andrea Arcangeli

unread,
Jan 21, 2005, 3:08:13 AM1/21/05
to Jens Axboe, Andries Brouwer, Linux Kernel, Andrew Morton
On Fri, Jan 21, 2005 at 08:42:08AM +0100, Jens Axboe wrote:
> And especially not with 500MB of zone normal free, thanks :)

;) Are you sure you had 500m free even before the _first_ oom killing?

I assumed what you posted was not the first of the oom-killing
messages. If it was the first, then there was a regression. But if OTOH I
didn't misunderstand your message and it wasn't the first, then what
you've seen is just the brokenness of 2.6 w.r.t. oom killing (that's what
made Thomas drive a few hours too), and you only have to apply the 5
patches I just posted; everything will then work perfectly correctly
in terms of _not_ killing left and right anymore, even despite the 500m
free ;). I tested the code before posting and my regression test passed
at least, so it looked like there was no other regression. The several
rejects I got while porting the code all looked due to noop-cleanups. So
I doubt there was a regression, and I'm optimistic you've just seen the
old bugs.

Jens Axboe

Jan 21, 2005, 3:13:20 AM
to Andrea Arcangeli, Andries Brouwer, Linux Kernel, Andrew Morton
On Fri, Jan 21 2005, Andrea Arcangeli wrote:
> On Fri, Jan 21, 2005 at 08:42:08AM +0100, Jens Axboe wrote:
> > And especially not with 500MB of zone normal free, thanks :)
>
> ;) Are you sure you had 500m free even before the _first_ oom killing?

No it wasn't, the first looked like this:

Jan 20 13:22:15 wiggum kernel: oom-killer: gfp_mask=0xd1
Jan 20 13:22:15 wiggum kernel: DMA per-cpu:
Jan 20 13:22:15 wiggum kernel: cpu 0 hot: low 2, high 6, batch 1
Jan 20 13:22:15 wiggum kernel: cpu 0 cold: low 0, high 2, batch 1
Jan 20 13:22:15 wiggum kernel: Normal per-cpu:
Jan 20 13:22:15 wiggum kernel: cpu 0 hot: low 32, high 96, batch 16
Jan 20 13:22:15 wiggum kernel: cpu 0 cold: low 0, high 32, batch 16
Jan 20 13:22:15 wiggum kernel: HighMem per-cpu: empty
Jan 20 13:22:15 wiggum kernel:
Jan 20 13:22:15 wiggum kernel: Free pages: 155720kB (0kB HighMem)
Jan 20 13:22:15 wiggum kernel: Active:113367 inactive:14428 dirty:2048 writeback:0 unstable:0 free:38930 slab:6284 mapped:102966 pagetables:2010
Jan 20 13:22:15 wiggum kernel: DMA free:4080kB min:60kB low:72kB high:88kB active:16kB inactive:0kB present:16384kB pages_scanned:21 all_unreclaimable? yes
Jan 20 13:22:15 wiggum kernel: protections[]: 0 0 0
Jan 20 13:22:15 wiggum kernel: Normal free:151640kB min:4028kB low:5032kB high:6040kB active:453452kB inactive:57712kB present:1031360kB pages_scanned:0 all_unreclaimable? no
Jan 20 13:22:15 wiggum kernel: protections[]: 0 0 0
Jan 20 13:22:15 wiggum kernel: HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Jan 20 13:22:15 wiggum kernel: protections[]: 0 0 0
Jan 20 13:22:15 wiggum kernel: DMA: 520*4kB 120*8kB 65*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4080kB
Jan 20 13:22:15 wiggum kernel: Normal: 23636*4kB 6171*8kB 225*16kB 7*32kB 1*64kB 4*128kB 3*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 151640kB
Jan 20 13:22:15 wiggum kernel: HighMem: empty
Jan 20 13:22:15 wiggum kernel: Swap cache: add 35304, delete 31894, find 5456/7337, race 0+0
Jan 20 13:22:15 wiggum kernel: Out of Memory: Killed process 12786 (firefox-bin).
Jan 20 13:22:20 wiggum kernel: oom-killer: gfp_mask=0xd1
Jan 20 13:22:20 wiggum kernel: DMA per-cpu:
Jan 20 13:22:20 wiggum kernel: cpu 0 hot: low 2, high 6, batch 1
Jan 20 13:22:20 wiggum kernel: cpu 0 cold: low 0, high 2, batch 1
Jan 20 13:22:20 wiggum kernel: Normal per-cpu:
Jan 20 13:22:20 wiggum kernel: cpu 0 hot: low 32, high 96, batch 16
Jan 20 13:22:20 wiggum kernel: cpu 0 cold: low 0, high 32, batch 16
Jan 20 13:22:20 wiggum kernel: HighMem per-cpu: empty
Jan 20 13:22:20 wiggum kernel:
Jan 20 13:22:20 wiggum kernel: Free pages: 215112kB (0kB HighMem)
Jan 20 13:22:20 wiggum kernel: Active:97117 inactive:15986 dirty:2693 writeback:0 unstable:0 free:53778 slab:6223 mapped:85471 pagetables:1948
Jan 20 13:22:20 wiggum kernel: DMA free:4152kB min:60kB low:72kB high:88kB active:0kB inactive:0kB present:16384kB pages_scanned:0 all_unreclaimable? no
Jan 20 13:22:20 wiggum kernel: protections[]: 0 0 0
Jan 20 13:22:20 wiggum kernel: Normal free:210960kB min:4028kB low:5032kB high:6040kB active:388468kB inactive:63944kB present:1031360kB pages_scanned:0 all_unreclaimable? no
Jan 20 13:22:20 wiggum kernel: protections[]: 0 0 0
Jan 20 13:22:20 wiggum kernel: HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Jan 20 13:22:20 wiggum kernel: protections[]: 0 0 0
Jan 20 13:22:20 wiggum kernel: DMA: 524*4kB 125*8kB 66*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4152kB
Jan 20 13:22:20 wiggum kernel: Normal: 29382*4kB 8689*8kB 669*16kB 91*32kB 29*64kB 10*128kB 4*256kB 4*512kB 0*1024kB 2*2048kB 0*4096kB = 210960kB
Jan 20 13:22:20 wiggum kernel: HighMem: empty
Jan 20 13:22:20 wiggum kernel: Swap cache: add 35388, delete 32397, find 5465/7365, race 0+0
Jan 20 13:22:20 wiggum kernel: Out of Memory: Killed process 12909 (xmms).

I've kept the timestamps this time. As you can see, there are 5 seconds
between the two kills, and the first kill happens with 150MB of zone
normal free! At 13:22:25, my psi IM client gets killed as well.

--
Jens Axboe

Andrea Arcangeli

Jan 21, 2005, 3:44:05 AM
to Jens Axboe, Andries Brouwer, Linux Kernel, Andrew Morton
On Fri, Jan 21, 2005 at 09:09:41AM +0100, Jens Axboe wrote:
> Jan 20 13:22:15 wiggum kernel: oom-killer: gfp_mask=0xd1

This was a GFP_KERNEL|GFP_DMA allocation triggering it. However, the DMA
zone didn't look that far out of memory; there's 4M of ram free. Could be
the ram was released by another CPU in the meantime if this was SMP (or
even by an interrupt in UP too).
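
For reference, that decoding can be read straight off the mask; a sketch using the 2.6-era flag values from include/linux/gfp.h:

```python
# 2.6-era gfp flag bits (from include/linux/gfp.h of that period)
GFP_BITS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}

def decode_gfp(mask):
    return [name for bit, name in sorted(GFP_BITS.items()) if mask & bit]

# GFP_KERNEL is __GFP_WAIT|__GFP_IO|__GFP_FS = 0xd0, so 0xd1 is
# GFP_KERNEL plus the DMA-zone bit:
print(decode_gfp(0xd1))
```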

Could very well be that you'll get things fixed by the lowmem_reserve
patch, which will reserve part of the dma zone; with it you're sure it
couldn't have gone below 4M due to slab allocs like skbs.

I recommend trying again with the patches applied; the oom stuff is so
buggy right now that it's better to apply the fixes and retest, and
if it still happens we know it's a regression.

Thanks!

Jens Axboe

Jan 21, 2005, 3:50:33 AM
to Andrea Arcangeli, Andries Brouwer, Linux Kernel, Andrew Morton
On Fri, Jan 21 2005, Andrea Arcangeli wrote:
> On Fri, Jan 21, 2005 at 09:09:41AM +0100, Jens Axboe wrote:
> > Jan 20 13:22:15 wiggum kernel: oom-killer: gfp_mask=0xd1
>
> This was a GFP_KERNEL|GFP_DMA allocation triggering this. However it
> didn't look so much out of DMA zone, there's 4M of ram free. Could be
> the ram was relased by another CPU in the meantime if this was SMP (or
> even by an interrupt in UP too).

It is/was UP.

> Could very well be you'll get things fixed by the lowmem_reserve patch,
> that will reserve part of the dma zone, so with it you're sure it
> couldn't have gone below 4M due slab allocs like skb.
>
> I recommend trying again with the patches applied, the oom stuff is so
> buggy right now that it's better you apply the fixes and try again, and
> if it still happens we know it's a regression.

I've added all 6 of the OOM patches (I didn't notice that thread until
now).

--
Jens Axboe
