OOM-killer invoked but why ?

Claude Frantz

Jan 31, 2008, 8:07:32 AM
to linux-...@vger.kernel.org
Hello!

I'm facing a problem where the OOM-killer is invoked, but I cannot find
the reason why. The machine is rather powerful, the load is very moderate,
and the disk swap space is nearly unused. The only strange thing I have
observed is a slow but steady decrease of kbbuffers over many hours.

Can you help me diagnose the problem and find a good solution?

Thanks a lot!

Claude


kernel: 2.6.22.14-72.fc6 (Fedora 6)

"sar -r" output:

12:00:01 AM  kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbswpfree  kbswpused  %swpused  kbswpcad
12:10:01 AM    1739920    1635056     48.45       9368    135620    8192960        148      0.00         0
12:20:01 AM    1691180    1683796     49.89       8644    162992    8192960        148      0.00         0
12:30:01 AM    1732076    1642900     48.68       8608    141168    8192960        148      0.00         0
12:40:01 AM    1766308    1608668     47.66       8128    134744    8192960        148      0.00         0
12:50:01 AM    1718156    1656820     49.09       6884    134288    8192960        148      0.00         0
01:00:01 AM    1728448    1646528     48.79       6476    137912    8192960        148      0.00         0
01:10:01 AM    1707652    1667324     49.40       5792    156572    8192960        148      0.00         0
01:20:01 AM    1736928    1638048     48.54       6368    138872    8192960        148      0.00         0
01:30:02 AM    1776288    1598688     47.37       5412    145136    8192960        148      0.00         0
01:40:01 AM    1780456    1594520     47.25       5464    150536    8192960        148      0.00         0
01:50:01 AM    1744856    1630120     48.30       4960    154732    8192960        148      0.00         0
02:00:02 AM    1687012    1687964     50.01       3996    171048    8192960        148      0.00         0
02:10:01 AM    1696020    1678956     49.75       3916    145424    8192960        148      0.00         0
02:20:02 AM    1740864    1634112     48.42       4340    142900    8192960        148      0.00         0
02:30:01 AM    1769460    1605516     47.57       3516    138056    8192960        148      0.00         0
02:40:02 AM    1764376    1610600     47.72       3184    138844    8192960        148      0.00         0
02:50:02 AM    1702100    1672876     49.57       3736    157448    8192960        148      0.00         0
03:00:01 AM    1750396    1624580     48.14       3556    141016    8192960        148      0.00         0
03:10:02 AM    1744168    1630808     48.32       1900    136612    8192960        148      0.00         0
03:20:01 AM    1749388    1625588     48.17       1012    136804    8192960        148      0.00         0
03:30:01 AM    1728028    1646948     48.80       1980    139104    8192960        148      0.00         0
03:40:01 AM    1718596    1656380     49.08       1136    156932    8192960        148      0.00         0
03:50:02 AM    1692684    1682292     49.85        768    140808    8192960        148      0.00         0
~~~~~~ OOM-killer in action. Then reboot.
07:30:01 AM    2134568    1240408     36.75     233624    506224    8193108          0      0.00         0
07:40:01 AM    2104412    1270564     37.65     252204    524220    8193108          0      0.00         0
07:50:01 AM    2049712    1325264     39.27     265368    527096    8193108          0      0.00         0
08:00:01 AM    1813652    1561324     46.26     281708    527296    8193108          0      0.00         0

The values in /proc/sys/vm :

/proc/sys/vm/overcommit_memory
0
/proc/sys/vm/panic_on_oom
0
/proc/sys/vm/overcommit_ratio
50
/proc/sys/vm/page-cluster
3
/proc/sys/vm/dirty_background_ratio
5
/proc/sys/vm/dirty_ratio
10
/proc/sys/vm/dirty_writeback_centisecs
499
/proc/sys/vm/dirty_expire_centisecs
2999
/proc/sys/vm/nr_pdflush_threads
2
/proc/sys/vm/swappiness
60
/proc/sys/vm/nr_hugepages
0
/proc/sys/vm/hugetlb_shm_group
0
/proc/sys/vm/lowmem_reserve_ratio
256 32
/proc/sys/vm/drop_caches
0
/proc/sys/vm/min_free_kbytes
3816
/proc/sys/vm/percpu_pagelist_fraction
0
/proc/sys/vm/max_map_count
65536
/proc/sys/vm/laptop_mode
0
/proc/sys/vm/block_dump
0
/proc/sys/vm/vfs_cache_pressure
100
/proc/sys/vm/legacy_va_layout
0
/proc/sys/vm/stat_interval
1
/proc/sys/vm/vdso_enabled
1

The syslog extract:

Jan 28 03:50:24 toaster kernel: ps invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0
Jan 28 03:50:24 toaster kernel: [<c045cf52>] out_of_memory+0x69/0x1a7
Jan 28 03:50:24 toaster kernel: [<c045e3bb>] __alloc_pages+0x216/0x2a0
Jan 28 03:50:24 toaster kernel: [<c04a6f1e>] proc_info_read+0x0/0x9d
Jan 28 03:50:24 toaster kernel: [<c045e471>] __get_free_pages+0x2c/0x3a
Jan 28 03:50:24 toaster kernel: [<c04a6f57>] proc_info_read+0x39/0x9d
Jan 28 03:50:24 toaster kernel: [<c04a6f1e>] proc_info_read+0x0/0x9d
Jan 28 03:50:24 toaster kernel: [<c0477dda>] vfs_read+0xa6/0x158
Jan 28 03:50:24 toaster kernel: [<c0478238>] sys_read+0x41/0x67
Jan 28 03:50:24 toaster kernel: [<c0404fa2>] syscall_call+0x7/0xb
Jan 28 03:50:24 toaster kernel: =======================
Jan 28 03:50:24 toaster kernel: Mem-info:
Jan 28 03:50:24 toaster kernel: DMA per-cpu:
Jan 28 03:50:35 toaster xinetd[3182]: START: time-dgram pid=0 from=137.193.74.3
Jan 28 03:50:48 toaster kernel: CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jan 28 03:50:48 toaster kernel: CPU 1: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jan 28 03:50:48 toaster kernel: CPU 2: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jan 28 03:50:48 toaster kernel: CPU 3: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jan 28 03:50:48 toaster kernel: Normal per-cpu:
Jan 28 03:50:48 toaster kernel: CPU 0: Hot: hi: 186, btch: 31 usd: 58 Cold: hi: 62, btch: 15 usd: 60
Jan 28 03:50:48 toaster kernel: CPU 1: Hot: hi: 186, btch: 31 usd: 34 Cold: hi: 62, btch: 15 usd: 60
Jan 28 03:50:49 toaster kernel: CPU 2: Hot: hi: 186, btch: 31 usd: 42 Cold: hi: 62, btch: 15 usd: 52
Jan 28 03:50:49 toaster kernel: CPU 3: Hot: hi: 186, btch: 31 usd: 85 Cold: hi: 62, btch: 15 usd: 51
Jan 28 03:50:49 toaster kernel: HighMem per-cpu:
Jan 28 03:50:49 toaster kernel: CPU 0: Hot: hi: 186, btch: 31 usd: 176 Cold: hi: 62, btch: 15 usd: 9
Jan 28 03:50:49 toaster kernel: CPU 1: Hot: hi: 186, btch: 31 usd: 57 Cold: hi: 62, btch: 15 usd: 14
Jan 28 03:50:49 toaster kernel: CPU 2: Hot: hi: 186, btch: 31 usd: 143 Cold: hi: 62, btch: 15 usd: 11
Jan 28 03:50:49 toaster kernel: CPU 3: Hot: hi: 186, btch: 31 usd: 55 Cold: hi: 62, btch: 15 usd: 0
Jan 28 03:50:49 toaster kernel: Active:186294 inactive:2340 dirty:6 writeback:55 unstable:0
Jan 28 03:50:49 toaster kernel: free:431675 slab:177466 mapped:7100 pagetables:1915 bounce:0
Jan 28 03:50:49 toaster kernel: DMA free:3544kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16256kB pages_scanned:0 all_unreclaimable? yes
Jan 28 03:50:49 toaster kernel: lowmem_reserve[]: 0 873 3285
Jan 28 03:50:49 toaster kernel: Normal free:3684kB min:3744kB low:4680kB high:5616kB active:212kB inactive:112kB present:894080kB pages_scanned:365 all_unreclaimable? yes
Jan 28 03:50:49 toaster kernel: lowmem_reserve[]: 0 0 19300
Jan 28 03:50:49 toaster kernel: HighMem free:1719472kB min:512kB low:3100kB high:5688kB active:744964kB inactive:9248kB present:2470404kB pages_scanned:0 all_unreclaimable? no
Jan 28 03:50:49 toaster kernel: lowmem_reserve[]: 0 0 0
Jan 28 03:50:49 toaster kernel: DMA: 3*4kB 4*8kB 3*16kB 0*32kB 0*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3548kB
Jan 28 03:50:49 toaster kernel: Normal: 30*4kB 29*8kB 8*16kB 1*32kB 3*64kB 5*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 3904kB
Jan 28 03:50:49 toaster kernel: HighMem: 5991*4kB 8849*8kB 18804*16kB 12622*32kB 7820*64kB 2500*128kB 354*256kB 15*512kB 1*1024kB 0*2048kB 0*4096kB = 1719332kB
Jan 28 03:50:49 toaster kernel: Swap cache: add 37, delete 37, find 0/0, race 0+0
Jan 28 03:50:49 toaster kernel: Free swap = 8192960kB
Jan 28 03:50:49 toaster kernel: Total swap = 8193108kB
Jan 28 03:50:49 toaster kernel: Free swap: 8192960kB
Jan 28 03:50:49 toaster kernel: 851840 pages of RAM
Jan 28 03:50:49 toaster kernel: 622464 pages of HIGHMEM
Jan 28 03:50:49 toaster kernel: 8096 reserved pages
Jan 28 03:50:49 toaster kernel: 638310 pages shared
Jan 28 03:50:49 toaster kernel: 0 pages swap cached
Jan 28 03:50:49 toaster kernel: 6 pages dirty
Jan 28 03:50:49 toaster kernel: 55 pages writeback
Jan 28 03:50:49 toaster kernel: 7100 pages mapped
Jan 28 03:50:49 toaster kernel: 177466 pages slab
Jan 28 03:50:49 toaster kernel: 1915 pages pagetables
Jan 28 03:50:49 toaster kernel: Out of memory: kill process 10859 (amavisd) score 36218 or a child
Jan 28 03:50:49 toaster kernel: Killed process 19146 (amavisd)

from "lspci":

SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Peter Zijlstra

Jan 31, 2008, 9:37:06 AM
to r31d...@pc0312b.rz.unibw-muenchen.de, linux-...@vger.kernel.org


You seem to have run out of zone normal memory, with all of it stuck in
kernel allocations. Would you have /proc/slabinfo available?
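
A minimal sketch of how this reading can be checked against the numbers in the OOM report: it compares each zone's free memory with its min watermark and estimates how much of low memory the slab occupies. The three zone lines and the slab page count are copied from the syslog extract; the 4 kB page size is taken from the buddy lists in the same report, and treating DMA plus Normal as the memory available to slab is an assumption about this 32-bit highmem configuration. All names in the code are illustrative.

```python
import re

# Zone summary lines copied from the OOM report (syslog prefix stripped).
ZONE_LINES = """\
DMA free:3544kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16256kB pages_scanned:0 all_unreclaimable? yes
Normal free:3684kB min:3744kB low:4680kB high:5616kB active:212kB inactive:112kB present:894080kB pages_scanned:365 all_unreclaimable? yes
HighMem free:1719472kB min:512kB low:3100kB high:5688kB active:744964kB inactive:9248kB present:2470404kB pages_scanned:0 all_unreclaimable? no
"""

PAGE_KB = 4          # assumption: 4 kB pages, as in the "N*4kB" buddy lists
SLAB_PAGES = 177466  # "177466 pages slab" from the same report

FIELD_RE = re.compile(r"(\w+):(\d+)kB")


def parse_zone(line):
    """Split one zone line into (zone name, {field: value in kB})."""
    name, rest = line.split(" ", 1)
    return name, {key: int(value) for key, value in FIELD_RE.findall(rest)}


zones = dict(parse_zone(line) for line in ZONE_LINES.splitlines())

# Zones whose free memory has dropped under the 'min' watermark.
below_min = [name for name, f in zones.items() if f["free"] < f["min"]]

# Assumption: slab can only live in DMA + Normal on this machine.
lowmem_kb = zones["DMA"]["present"] + zones["Normal"]["present"]
slab_share = SLAB_PAGES * PAGE_KB / lowmem_kb

print("zones below min watermark:", below_min)
print("slab as share of low memory: %.0f%%" % (slab_share * 100))
```

With the figures from the report, only the Normal zone is under its watermark while HighMem still has well over 1.5 GB free, and slab accounts for the bulk of low memory.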

Claude Frantz

Jan 31, 2008, 10:17:41 AM
to Peter Zijlstra, linux-...@vger.kernel.org
Peter Zijlstra wrote:

> You seem to have run out of zone normal memory, with all of it stuck in
> kernel allocations. Would you have /proc/slabinfo available?

Thanks, Peter!

No, there is no /proc/slabinfo available.

Claude

Peter Zijlstra

Jan 31, 2008, 1:14:54 PM
to Claude Frantz, linux-...@vger.kernel.org

On Thu, 2008-01-31 at 15:41 +0100, Claude Frantz wrote:
> Peter Zijlstra wrote:
>
> > You seem to have run out of zone normal memory, with all of it stuck in
> > kernel allocations. Would you have /proc/slabinfo available?
>
> Thanks, Peter!
>
> No, there is no /proc/slabinfo available.

If you're using SLUB there is:
Documentation/vm/slabinfo.c

Andrew Morton

Feb 5, 2008, 5:08:18 AM
to r31d...@pc0312b.rz.unibw-muenchen.de, linux-...@vger.kernel.org, Christoph Lameter, sta...@kernel.org
On Thu, 31 Jan 2008 13:53:05 +0100 Claude Frantz <r31d...@pc0312b.rz.unibw-muenchen.de> wrote:

> Hello!
>
> I'm facing a problem where the OOM-killer is invoked, but I cannot find
> the reason why. The machine is rather powerful, the load is very moderate,
> and the disk swap space is nearly unused. The only strange thing I have
> observed is a slow but steady decrease of kbbuffers over many hours.
>
> Can you help me diagnose the problem and find a good solution?
>

> ...


>
> Jan 28 03:50:49 toaster kernel: 177466 pages slab
> Jan 28 03:50:49 toaster kernel: 1915 pages pagetables
> Jan 28 03:50:49 toaster kernel: Out of memory: kill process 10859 (amavisd) score 36218 or a child
> Jan 28 03:50:49 toaster kernel: Killed process 19146 (amavisd)

slab. Maybe you've been bitten by the quicklist leak. If you're able to
patch your kernel then please try this fix:

commit 96990a4ae979df9e235d01097d6175759331e88c
Author: Christoph Lameter <clam...@sgi.com>
Date: Mon Jan 14 00:55:14 2008 -0800

quicklists: Only consider memory that can be used with GFP_KERNEL

Quicklists calculates the size of the quicklists based on the number of
free pages. This must be the number of free pages that can be allocated
with GFP_KERNEL. node_page_state() includes the pages in ZONE_HIGHMEM and
ZONE_MOVABLE which may lead the quicklists to become too large causing OOM.

Signed-off-by: Christoph Lameter <clam...@sgi.com>
Tested-by: Dhaval Giani <dha...@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>

diff --git a/mm/quicklist.c b/mm/quicklist.c
index ae8189c..3f703f7 100644
--- a/mm/quicklist.c
+++ b/mm/quicklist.c
@@ -26,9 +26,17 @@ DEFINE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK];
 static unsigned long max_pages(unsigned long min_pages)
 {
 	unsigned long node_free_pages, max;
+	struct zone *zones = NODE_DATA(numa_node_id())->node_zones;
+
+	node_free_pages =
+#ifdef CONFIG_ZONE_DMA
+		zone_page_state(&zones[ZONE_DMA], NR_FREE_PAGES) +
+#endif
+#ifdef CONFIG_ZONE_DMA32
+		zone_page_state(&zones[ZONE_DMA32], NR_FREE_PAGES) +
+#endif
+		zone_page_state(&zones[ZONE_NORMAL], NR_FREE_PAGES);
 
-	node_free_pages = node_page_state(numa_node_id(),
-			NR_FREE_PAGES);
 	max = node_free_pages / FRACTION_OF_NODE_MEM;
 	return max(max, min_pages);
 }


I note that this didn't have the sta...@kernel.org cc. Christoph, did we
deliberately decide not to backport?
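
To illustrate what the patch changes, here is a rough userspace model of max_pages() before and after the fix, fed with the per-zone free figures from the OOM report in this thread. It is only a sketch: the value 16 for FRACTION_OF_NODE_MEM and the min_pages default of 25 are assumptions that should be checked against the tree in question, and the helper names are made up for the example.

```python
PAGE_KB = 4                # 4 kB pages, as in the buddy lists of the report
FRACTION_OF_NODE_MEM = 16  # assumed value of the constant in mm/quicklist.c

# Free memory per zone in kB, taken from the OOM report.
FREE_KB = {"DMA": 3544, "Normal": 3684, "HighMem": 1719472}


def free_pages(zone_names):
    """Sum the free pages of the given zones."""
    return sum(FREE_KB[name] // PAGE_KB for name in zone_names)


def max_pages(node_free_pages, min_pages=25):
    """Model of max_pages(): a fraction of free memory, but at least min_pages.

    min_pages=25 is an illustrative placeholder, not a value from the thread.
    """
    return max(node_free_pages // FRACTION_OF_NODE_MEM, min_pages)


# Before the patch: every zone counts, including HighMem.
before = max_pages(free_pages(["DMA", "Normal", "HighMem"]))

# After the patch: only zones usable for GFP_KERNEL allocations count.
after = max_pages(free_pages(["DMA", "Normal"]))

print("quicklist limit before patch: %d pages (%d kB)" % (before, before * PAGE_KB))
print("quicklist limit after patch:  %d pages (%d kB)" % (after, after * PAGE_KB))
```

Under these assumptions the limit is dominated by free HighMem before the patch, even though quicklist pages themselves have to come from low memory, which is the mismatch the commit message describes.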

Dhaval Giani

Feb 5, 2008, 6:04:02 AM
to Andrew Morton, r31d...@pc0312b.rz.unibw-muenchen.de, linux-...@vger.kernel.org, Christoph Lameter, sta...@kernel.org

According to
http://archive.netbsd.se/?ml=linux-stable-commits&a=2008-01&m=6134301 ,
it's been added to the stable tree. I remember asking Greg to add it.

Thanks
--
regards,
Dhaval

Greg KH

Feb 5, 2008, 5:07:40 PM
to Dhaval Giani, Andrew Morton, sta...@kernel.org, linux-...@vger.kernel.org, r31d...@pc0312b.rz.unibw-muenchen.de, Christoph Lameter

And then Christoph told me to remove it...

thanks,

greg k-h

Christoph Lameter

Feb 5, 2008, 5:14:02 PM
to Greg KH, Dhaval Giani, Andrew Morton, sta...@kernel.org, linux-...@vger.kernel.org, r31d...@pc0312b.rz.unibw-muenchen.de

No, I asked you to add this patch and remove the earlier patch that
tinkered around with tlb flushing.

Oliver Pinter

Feb 5, 2008, 5:18:53 PM
to Greg KH, Dhaval Giani, Andrew Morton, sta...@kernel.org, linux-...@vger.kernel.org, r31d...@pc0312b.rz.unibw-muenchen.de, Christoph Lameter
It was that other patch, not this version.

This is the BAD one:
----8<----
From stable-...@linux.kernel.org Mon Dec 17 16:32:25 2007
From: Christoph Lameter <clam...@sgi.com>
Date: Mon, 17 Dec 2007 16:20:27 -0800
Subject: quicklist: Set tlb->need_flush if pages are remaining in quicklist 0
To: torv...@linux-foundation.org
Cc: sta...@kernel.org, ak...@linux-foundation.org, dha...@linux.vnet.ibm.com, clam...@sgi.com
Message-ID: <200712180020....@imap1.linux-foundation.org>


From: Christoph Lameter <clam...@sgi.com>

patch 421d99193537a6522aac2148286f08792167d5fd in mainline.

This ensures that the quicklists are drained. Otherwise draining may only
occur when the processor reaches an idle state.

Fixes fatal leakage of pgd_t's on 2.6.22 and later.

Signed-off-by: Christoph Lameter <clam...@sgi.com>
Reported-by: Dhaval Giani <dha...@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gre...@suse.de>


---
 include/asm-generic/tlb.h | 4 ++++
 1 file changed, 4 insertions(+)

--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -14,6 +14,7 @@
 #define _ASM_GENERIC__TLB_H
 
 #include <linux/swap.h>
+#include <linux/quicklist.h>
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 
@@ -85,6 +86,9 @@ tlb_flush_mmu(struct mmu_gather *tlb, un
 static inline void
 tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
 {
+#ifdef CONFIG_QUICKLIST
+	tlb->need_flush += &__get_cpu_var(quicklist)[0].nr_pages != 0;
+#endif
 	tlb_flush_mmu(tlb, start, end);
 
 	/* keep the page table cache within bounds
----8<----


--
Thanks,
Oliver

Greg KH

Feb 5, 2008, 5:41:17 PM
to Christoph Lameter, linux-...@vger.kernel.org, Andrew Morton, Dhaval Giani, r31d...@pc0312b.rz.unibw-muenchen.de, sta...@kernel.org

Argh, I'm too confused...

As long as everyone is happy with what is currently queued up for
.22-stable and .23-stable, I'll just shut up now and get on with releasing
them :)

thanks,

greg k-h
