Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Re: Regression caused by commit "netfilter: iptables: lock free counters"

76 views
Skip to first unread message

Eric Dumazet

unread,
Mar 30, 2009, 3:33:24 AM3/30/09
to Roman Mindalev, Stephen Hemminger, Patrick McHardy, LKML, netf...@vger.kernel.org
Roman Mindalev a écrit :
> Hi.
> Since 2.6.29-git1 I got many debug messages in kernel log.
> One from them:
>
> Mar 29 12:06:11 desktop kernel: [ 917.414369] BUG: using
> smp_processor_id() in preemptible [00000000] code: hddtemp/2820 Mar 29
> 12:06:11 desktop kernel: [ 917.414378] caller is
> ipt_do_table+0xd0/0x3a8 [ip_tables] Mar 29 12:06:11 desktop kernel:
> [ 917.414382] Pid: 2820, comm: hddtemp Not tainted
> 2.6.29-05093-gc31f403 #1 Mar 29 12:06:11 desktop kernel: [ 917.414386]
> Call Trace: Mar 29 12:06:11 desktop kernel: [ 917.414393]
> [<ffffffff8042d037>] debug_smp_processor_id+0xe7/0xf0 Mar 29 12:06:11
> desktop kernel: [ 917.414400] [<ffffffffa0171f10>]
> ipt_do_table+0xd0/0x3a8 [ip_tables] Mar 29 12:06:11 desktop kernel:
> [ 917.414414] [<ffffffffa0180d0e>] ? nf_conntrack_in+0x23e/0x610
> [nf_conntrack] Mar 29 12:06:11 desktop kernel: [ 917.414422]
> [<ffffffff8026d47d>] ? trace_hardirqs_off+0xd/0x10 Mar 29 12:06:11
> desktop kernel: [ 917.414429] [<ffffffffa008e0b3>]
> ipt_local_out_hook+0x53/0x60 [iptable_filter] Mar 29 12:06:11 desktop
> kernel: [ 917.414435] [<ffffffff80546a07>] nf_iterate+0x67/0xa0 Mar
> 29 12:06:11 desktop kernel: [ 917.414441] [<ffffffff80551140>] ?
> dst_output+0x0/0x10 Mar 29 12:06:11 desktop kernel: [ 917.414446]
> [<ffffffff80546af5>] nf_hook_slow+0xb5/0x100 Mar 29 12:06:11 desktop
> kernel: [ 917.414452] [<ffffffff80551140>] ? dst_output+0x0/0x10 Mar
> 29 12:06:11 desktop kernel: [ 917.414458] [<ffffffff80552f7f>]
> __ip_local_out+0x9f/0xb0 Mar 29 12:06:11 desktop kernel:
> [ 917.414464] [<ffffffff8058c4db>] ? bictcp_cong_avoid+0x4b/0x2d0 Mar
> 29 12:06:11 desktop kernel: [ 917.414469] [<ffffffff80552fa1>]
> ip_local_out+0x11/0x30 Mar 29 12:06:11 desktop kernel: [ 917.414474]
> [<ffffffff8055389c>] ip_queue_xmit+0x21c/0x410 Mar 29 12:06:11 desktop
> kernel: [ 917.414480] [<ffffffff80272825>] ?
> print_lock_contention_bug+0x25/0x120 Mar 29 12:06:11 desktop kernel:
> [ 917.414486] [<ffffffff8056810e>] ?
> tcp_established_options+0x3e/0xe0 Mar 29 12:06:11 desktop kernel:
> [ 917.414492] [<ffffffff80272825>] ?
> print_lock_contention_bug+0x25/0x120 Mar 29 12:06:11 desktop kernel:
> [ 917.414498] [<ffffffff80568ef5>] tcp_transmit_skb+0x535/0x820 Mar
> 29 12:06:11 desktop kernel: [ 917.414504] [<ffffffff8056b5fb>]
> tcp_write_xmit+0x20b/0xaa0 Mar 29 12:06:11 desktop kernel:
> [ 917.414510] [<ffffffff8055d451>] ? sk_stream_alloc_skb+0x41/0x110
> Mar 29 12:06:11 desktop kernel: [ 917.414515] [<ffffffff8052799e>] ?
> __alloc_skb+0x6e/0x150 Mar 29 12:06:11 desktop kernel: [ 917.414521]
> [<ffffffff8056befb>] __tcp_push_pending_frames+0x2b/0x90 Mar 29
> 12:06:11 desktop kernel: [ 917.414527] [<ffffffff8055ddc3>]
> tcp_sendmsg+0x8a3/0xbb0 Mar 29 12:06:11 desktop kernel: [ 917.414533]
> [<ffffffff8051d2bb>] sock_aio_write+0x12b/0x140 Mar 29 12:06:11 desktop
> kernel: [ 917.414539] [<ffffffff802df6f1>] do_sync_write+0xf1/0x140
> Mar 29 12:06:11 desktop kernel: [ 917.414545] [<ffffffff8025e470>] ?
> autoremove_wake_function+0x0/0x40 Mar 29 12:06:11 desktop kernel:
> [ 917.414551] [<ffffffff802e0259>] vfs_write+0x179/0x190 Mar 29
> 12:06:11 desktop kernel: [ 917.414557] [<ffffffff802e0360>]
> sys_write+0x50/0x90 Mar 29 12:06:11 desktop kernel: [ 917.414563]
> [<ffffffff8020b51b>] system_call_fastpath+0x16/0x1b
>
> Result of the bisection:
>
> 784544739a25c30637397ace5489eeb6e15d7d49 is first bad commit
> commit 784544739a25c30637397ace5489eeb6e15d7d49
> Author: Stephen Hemminger <shemm...@vyatta.com>
> Date: Fri Feb 20 10:35:32 2009 +0100
>
> netfilter: iptables: lock free counters
>
> The reader/writer lock in ip_tables is acquired in the critical
> path of processing packets and is one of the reasons just loading
> iptables can cause a 20% performance loss. The rwlock serves two
> functions:
>
> 1) it prevents changes to table state (xt_replace) while table is
> in use. This is now handled by doing rcu on the xt_table. When
> table is replaced, the new table(s) are put in and the old one
> table(s) are freed after RCU period.
>
> 2) it provides synchronization when accesing the counter values.
> This is now handled by swapping in new table_info entries for
> each cpu then summing the old values, and putting the result back
> onto one cpu. On a busy system it may cause sampling to occur at
> different times on each cpu, but no packet/byte counts are lost in
> the process.
>
> Signed-off-by: Stephen Hemminger <shemm...@vyatta.com>
>
> Sucessfully tested on my dual quad core machine too, but iptables
> only (no ipv6 here) BTW, my new "tbench 8" result is 2450 MB/s, (it
> was 2150 MB/s not so long ago)
>
> Acked-by: Eric Dumazet <da...@cosmosbay.com>
> Signed-off-by: Patrick McHardy <ka...@trash.net>
>
> :040000 040000 67ef5761d009330e8d538dd1a3aff77c86995172
> 0d8d6398ced2117bb6995b0c414fd8013685324e M include :040000 040000
> 0317c745a0722ade9a6c0cbe9d6cb92abbd40efa
> 0eb195a365164c05e7c764ab1c2a7da891276837 M net
>
> After reverting commit no more these messages.
>
> This issue in bugzilla: http://bugzilla.kernel.org/show_bug.cgi?id=12960

Thanks Roman for this excellent report

Could you try following patch ?

[PATCH] netfilter: use rcu_read_bh() in ipt_do_table()

Commit 784544739a25c30637397ace5489eeb6e15d7d49
(netfilter: iptables: lock free counters) forgot to disable BH
in arpt_do_table(), ipt_do_table() and ip6t_do_table()

Use rcu_read_lock_bh() instead of rcu_read_lock() cures the problem.

Reported-and-bisected-by: Roman Mindalev <r0...@r000n.net>
Signed-off-by: Eric Dumazet <da...@cosmosbay.com>
---

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 35c5f6a..5ba533d 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -253,7 +253,7 @@ unsigned int arpt_do_table(struct sk_buff *skb,
indev = in ? in->name : nulldevname;
outdev = out ? out->name : nulldevname;

- rcu_read_lock();
+ rcu_read_lock_bh();
private = rcu_dereference(table->private);
table_base = rcu_dereference(private->entries[smp_processor_id()]);

@@ -329,7 +329,7 @@ unsigned int arpt_do_table(struct sk_buff *skb,
}
} while (!hotdrop);

- rcu_read_unlock();
+ rcu_read_unlock_bh();

if (hotdrop)
return NF_DROP;
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 82ee7c9..810c0b6 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -339,7 +339,7 @@ ipt_do_table(struct sk_buff *skb,

IP_NF_ASSERT(table->valid_hooks & (1 << hook));

- rcu_read_lock();
+ rcu_read_lock_bh();
private = rcu_dereference(table->private);
table_base = rcu_dereference(private->entries[smp_processor_id()]);

@@ -437,7 +437,7 @@ ipt_do_table(struct sk_buff *skb,
}
} while (!hotdrop);

- rcu_read_unlock();
+ rcu_read_unlock_bh();

#ifdef DEBUG_ALLOW_ALL
return NF_ACCEPT;
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index e89cfa3..dfed176 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -365,7 +365,7 @@ ip6t_do_table(struct sk_buff *skb,

IP_NF_ASSERT(table->valid_hooks & (1 << hook));

- rcu_read_lock();
+ rcu_read_lock_bh();
private = rcu_dereference(table->private);
table_base = rcu_dereference(private->entries[smp_processor_id()]);

@@ -466,7 +466,7 @@ ip6t_do_table(struct sk_buff *skb,
#ifdef CONFIG_NETFILTER_DEBUG
((struct ip6t_entry *)table_base)->comefrom = NETFILTER_LINK_POISON;
#endif
- rcu_read_unlock();
+ rcu_read_unlock_bh();

#ifdef DEBUG_ALLOW_ALL
return NF_ACCEPT;

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Roman Mindalev

unread,
Mar 30, 2009, 6:51:25 AM3/30/09
to Eric Dumazet, Stephen Hemminger, Patrick McHardy, LKML, netf...@vger.kernel.org

2.6.29-git4 and 2.6.29-git6 both works fine for me with this patch.
Thanks, Eric!

Patrick McHardy

unread,
Mar 30, 2009, 8:07:10 AM3/30/09
to Roman Mindalev, Stephen Hemminger, Eric Dumazet, LKML, netf...@vger.kernel.org
Roman Mindalev wrote:
> Since 2.6.29-git1 I got many debug messages in kernel log.
> One from them:
>
> Mar 29 12:06:11 desktop kernel: [ 917.414369] BUG: using
> smp_processor_id() in preemptible [00000000] code: hddtemp/2820 Mar 29
> 12:06:11 desktop kernel: [ 917.414378] caller is
> ipt_do_table+0xd0/0x3a8 [ip_tables]

Stephen, please have a look at this. I think we need to disable
preemption in *t_do_table() for PREEMPT_RCU.

Patrick McHardy

unread,
Mar 30, 2009, 8:08:29 AM3/30/09
to Roman Mindalev, Eric Dumazet, Stephen Hemminger, LKML, netf...@vger.kernel.org, David S. Miller
Roman Mindalev wrote:
> On Mon, 30 Mar 2009 09:32:51 +0200
> Eric Dumazet <da...@cosmosbay.com> wrote:
>
>> Could you try following patch ?
>>
>> [PATCH] netfilter: use rcu_read_bh() in ipt_do_table()
>>
>> Commit 784544739a25c30637397ace5489eeb6e15d7d49
>> (netfilter: iptables: lock free counters) forgot to disable BH
>> in arpt_do_table(), ipt_do_table() and ip6t_do_table()
>>
>> Use rcu_read_lock_bh() instead of rcu_read_lock() cures the problem.
>>
>> Reported-and-bisected-by: Roman Mindalev <r0...@r000n.net>
>> Signed-off-by: Eric Dumazet <da...@cosmosbay.com>
>> ---
>
> 2.6.29-git4 and 2.6.29-git6 both works fine for me with this patch.
> Thanks, Eric!

Thanks indeed Eric :) Dave, please apply.

Acked-by: Patrick McHardy <ka...@trash.net>

Stephen Hemminger

unread,
Mar 30, 2009, 10:56:43 AM3/30/09
to Patrick McHardy, Roman Mindalev, Eric Dumazet, LKML, netf...@vger.kernel.org, David S. Miller
On Mon, 30 Mar 2009 14:08:08 +0200
Patrick McHardy <ka...@trash.net> wrote:

> Roman Mindalev wrote:
> > On Mon, 30 Mar 2009 09:32:51 +0200
> > Eric Dumazet <da...@cosmosbay.com> wrote:
> >
> >> Could you try following patch ?
> >>
> >> [PATCH] netfilter: use rcu_read_bh() in ipt_do_table()
> >>
> >> Commit 784544739a25c30637397ace5489eeb6e15d7d49
> >> (netfilter: iptables: lock free counters) forgot to disable BH
> >> in arpt_do_table(), ipt_do_table() and ip6t_do_table()
> >>
> >> Use rcu_read_lock_bh() instead of rcu_read_lock() cures the problem.
> >>
> >> Reported-and-bisected-by: Roman Mindalev <r0...@r000n.net>
> >> Signed-off-by: Eric Dumazet <da...@cosmosbay.com>
> >> ---
> >
> > 2.6.29-git4 and 2.6.29-git6 both works fine for me with this patch.
> > Thanks, Eric!
>
> Thanks indeed Eric :) Dave, please apply.
>
> Acked-by: Patrick McHardy <ka...@trash.net>
>
>

Acked-by: Stephen Hemminger <shemm...@vyatta.com>

The original version had rcu_read_lock_bh() but someone (not sure who)
talked me into removing it.

David Miller

unread,
Apr 2, 2009, 3:54:29 AM4/2/09
to ka...@trash.net, r0...@r000n.net, da...@cosmosbay.com, shemm...@vyatta.com, linux-...@vger.kernel.org, netf...@vger.kernel.org
From: Patrick McHardy <ka...@trash.net>
Date: Mon, 30 Mar 2009 14:08:08 +0200

> Thanks indeed Eric :) Dave, please apply.
>
> Acked-by: Patrick McHardy <ka...@trash.net>

Applied, thanks everyone.

Graham Murray

unread,
Apr 5, 2009, 3:05:35 AM4/5/09
to linux-...@vger.kernel.org
Roman Mindalev <r0...@r000n.net> writes:

> Result of the bisection:
>
> 784544739a25c30637397ace5489eeb6e15d7d49 is first bad commit
> commit 784544739a25c30637397ace5489eeb6e15d7d49

I am seeing a different problem which also bisects to this commit. There are
no kernel messages but ip6tables fails to run.

newton ~ # ip6tables -L -v
FATAL: Module ip6_tables not found.
ip6tables v1.4.3.1: can't initialize ip6tables table `filter': Memory allocation problem
Perhaps ip6tables or your kernel needs to be upgraded.

I get this error no matter which ip6tables sub-command I run. Ip6tables
is built into the kernel, not as modules.

An strace shows the failure to be
socket(PF_INET6, SOCK_RAW, IPPROTO_RAW) = 3
getsockopt(3, SOL_IPV6, 0x40 /* IPV6_??? */, "filter\0\305\0w~\300\0wb\305P\24\312\t\0009b\305\216\23\0\0\310\341/g\16"..., [84]) = 0
brk(0) = 0x8273000
brk(0x8294000) = 0x8294000
getsockopt(3, SOL_IPV6, 0x41 /* IPV6_??? */, 0x8273090, 0xbfd23628) = -1 ENOMEM (Cannot allocate memory)
close(3) = 0

David Miller

unread,
Apr 5, 2009, 4:23:21 AM4/5/09
to gra...@gmurray.org.uk, linux-...@vger.kernel.org, net...@vger.kernel.org, netfilt...@vger.kernel.org
From: Graham Murray <gra...@gmurray.org.uk>
Date: Sun, 05 Apr 2009 08:05:17 +0100

Please CC the appropriate mailing lists (as I have now) when reporting
this incredibly useful information. The networking and netfilter
developers largely do not read linux-kernel.

Eric Dumazet

unread,
Apr 5, 2009, 6:01:40 AM4/5/09
to David Miller, gra...@gmurray.org.uk, linux-...@vger.kernel.org, net...@vger.kernel.org, netfilt...@vger.kernel.org
David Miller a écrit :

> From: Graham Murray <gra...@gmurray.org.uk>
> Date: Sun, 05 Apr 2009 08:05:17 +0100
>
> Please CC the appropriate mailing lists (as I have now) when reporting
> this incredibly useful information. The networking and netfilter
> developers largely do not read linux-kernel.
>
>> Roman Mindalev <r0...@r000n.net> writes:
>>
>>> Result of the bisection:
>>>
>>> 784544739a25c30637397ace5489eeb6e15d7d49 is first bad commit
>>> commit 784544739a25c30637397ace5489eeb6e15d7d49
>> I am seeing a different problem which also bisects to this commit. There are
>> no kernel messages but ip6tables fails to run.
>>
>> newton ~ # ip6tables -L -v
>> FATAL: Module ip6_tables not found.
>> ip6tables v1.4.3.1: can't initialize ip6tables table `filter': Memory allocation problem
>> Perhaps ip6tables or your kernel needs to be upgraded.
>>
>> I get this error no matter which ip6tables sub-command I run. Ip6tables
>> is built into the kernel, not as modules.
>>
>> An strace shows the failure to be
>> socket(PF_INET6, SOCK_RAW, IPPROTO_RAW) = 3
>> getsockopt(3, SOL_IPV6, 0x40 /* IPV6_??? */, "filter\0\305\0w~\300\0wb\305P\24\312\t\0009b\305\216\23\0\0\310\341/g\16"..., [84]) = 0
>> brk(0) = 0x8273000
>> brk(0x8294000) = 0x8294000

so ip6tables allocates about 128 Kbytes of ram in order to get rules from kernel.

>> getsockopt(3, SOL_IPV6, 0x41 /* IPV6_??? */, 0x8273090, 0xbfd23628) = -1 ENOMEM (Cannot allocate memory)
>> close(3) = 0
>>


This is a big problem yes, since "iptables|ip6tables" -L needs to allocate kernel memory
to perform the momentary swap.

On x86, this is potentially a problem if vmalloc space is exhausted or fragmented,
(or lowmem exhausted) and/or many cpus are online/possible.

Graham, could you please give us :

# cat /proc/vmallocinfo
# cat /proc/meminfo

I wonder if your machine is in a state where even an "ip6tables -A ..." would fail anyway
since it should allocate same amount of memory than "ip6tables -L "


This could probably be solved using a single "table" containing rules only, that could
be shared for every cpus. Only counters should be percpu. This should save a lot of ram,
over previous situation (2.6.29 or current one)

(current scheme is to allocate a copy of all rules logic *and* counters per cpu)

Then if we want to be sure "iptables -L" cannot fail, we should reserve this extra space
at load time (iptables -{A|I}", instead.

Other possibility is to use a percpu seqlock as Stephen did in one of his patch, and not swap tables
when doing "iptables -L".
This would slowdown fast path a litle bit (one spinlock/spinunlock) per ipt_do_table() call.

Graham Murray

unread,
Apr 5, 2009, 6:12:58 AM4/5/09
to linux-...@vger.kernel.org
Eric Dumazet <da...@cosmosbay.com> writes:

>>>> 784544739a25c30637397ace5489eeb6e15d7d49 is first bad commit
>>>> commit 784544739a25c30637397ace5489eeb6e15d7d49
>>> I am seeing a different problem which also bisects to this commit. There are
>>> no kernel messages but ip6tables fails to run.
>>>
>>> newton ~ # ip6tables -L -v
>>> FATAL: Module ip6_tables not found.
>>> ip6tables v1.4.3.1: can't initialize ip6tables table `filter': Memory allocation problem
>>> Perhaps ip6tables or your kernel needs to be upgraded.
>>>
>>> I get this error no matter which ip6tables sub-command I run. Ip6tables
>>> is built into the kernel, not as modules.
>>>
>>> An strace shows the failure to be
>>> socket(PF_INET6, SOCK_RAW, IPPROTO_RAW) = 3
>>> getsockopt(3, SOL_IPV6, 0x40 /* IPV6_??? */, "filter\0\305\0w~\300\0wb\305P\24\312\t\0009b\305\216\23\0\0\310\341/g\16"..., [84]) = 0
>>> brk(0) = 0x8273000
>>> brk(0x8294000) = 0x8294000
>
> so ip6tables allocates about 128 Kbytes of ram in order to get rules from kernel.
>
>>> getsockopt(3, SOL_IPV6, 0x41 /* IPV6_??? */, 0x8273090, 0xbfd23628) = -1 ENOMEM (Cannot allocate memory)
>>> close(3) = 0
>>>
>
>
> This is a big problem yes, since "iptables|ip6tables" -L needs to allocate kernel memory
> to perform the momentary swap.
>
> On x86, this is potentially a problem if vmalloc space is exhausted or fragmented,
> (or lowmem exhausted) and/or many cpus are online/possible.

iptables gives me no problems at all, it is just ip6tables that
fails. The first indication of this is during the init scripts when
ip6tables-restore fails.

> Graham, could you please give us :
>
> # cat /proc/vmallocinfo
> # cat /proc/meminfo

newton ~ # cat /proc/vmallocinfo
0xf81fe000-0xf8200000 8192 hpet_enable+0x2f/0x272 phys=fed00000 ioremap
0xf8202000-0xf8204000 8192 acpi_os_map_memory+0x24/0x3e phys=bff8e000 ioremap
0xf8206000-0xf8208000 8192 acpi_os_map_memory+0x24/0x3e phys=bff8e000 ioremap
0xf820a000-0xf820c000 8192 acpi_os_map_memory+0x24/0x3e phys=bff8e000 ioremap
0xf820e000-0xf8210000 8192 acpi_os_map_memory+0x24/0x3e phys=bff89000 ioremap
0xf8210000-0xf821b000 45056 acpi_os_map_memory+0x24/0x3e phys=bff80000 ioremap
0xf821c000-0xf821e000 8192 acpi_os_map_memory+0x24/0x3e phys=bff80000 ioremap
0xf8220000-0xf8222000 8192 acpi_os_map_memory+0x24/0x3e phys=fed1f000 ioremap
0xf8224000-0xf8227000 12288 acpi_os_map_memory+0x24/0x3e phys=bfff0000 ioremap
0xf8228000-0xf822a000 8192 acpi_os_map_memory+0x24/0x3e phys=bff8e000 ioremap
0xf822c000-0xf822e000 8192 acpi_os_map_memory+0x24/0x3e phys=bff80000 ioremap
0xf8230000-0xf8232000 8192 acpi_os_map_memory+0x24/0x3e phys=bff80000 ioremap
0xf8234000-0xf8236000 8192 acpi_os_map_memory+0x24/0x3e phys=bff8e000 ioremap
0xf8238000-0xf823a000 8192 acpi_os_map_memory+0x24/0x3e phys=bff89000 ioremap
0xf823b000-0xf8247000 49152 zisofs_init+0x1b/0x46 pages=11 vmalloc
0xf8248000-0xf824d000 20480 radeonfb_pci_register+0x193/0x174c phys=ff9f0000 ioremap
0xf824e000-0xf8250000 8192 pci_iomap+0xdd/0xf0 phys=ffafb000 ioremap
0xf8252000-0xf8254000 8192 usb_hcd_pci_probe+0x1a1/0x410 phys=ffafb000 ioremap
0xf8254000-0xf8257000 12288 pci_iomap+0xdd/0xf0 phys=ff6fe000 ioremap
0xf8258000-0xf825a000 8192 iTCO_wdt_probe+0x2f1/0x360 phys=fed1f000 ioremap
0xf825e000-0xf8260000 8192 module_alloc+0x1e/0x40 pages=1 vmalloc
0xf8260000-0xf8265000 20480 pci_ioremap_bar+0x4e/0xa0 phys=ffafc000 ioremap
0xf8266000-0xf8277000 69632 snd_malloc_sgbuf_pages+0x1f4/0x230 vmap
0xf827b000-0xf8280000 20480 module_alloc+0x1e/0x40 pages=4 vmalloc
0xf8280000-0xf9281000 16781312 radeonfb_pci_register+0x3a0/0x174c phys=d0000000 ioremap
0xf9282000-0xf9293000 69632 snd_malloc_sgbuf_pages+0x1f4/0x230 vmap
0xf9294000-0xf92a5000 69632 snd_malloc_sgbuf_pages+0x1f4/0x230 vmap
0xf92a6000-0xf92b7000 69632 snd_malloc_sgbuf_pages+0x1f4/0x230 vmap
0xf92b8000-0xf92c9000 69632 snd_malloc_sgbuf_pages+0x1f4/0x230 vmap
0xf92ca000-0xf92db000 69632 snd_malloc_sgbuf_pages+0x1f4/0x230 vmap
0xf92dc000-0xf931d000 266240 pcpu_alloc+0x16d/0x440 vpages
0xf9320000-0xf9325000 20480 sky2_probe+0x12d/0xa79 [sky2] phys=ff7fc000 ioremap
0xf9326000-0xf932f000 36864 module_alloc+0x1e/0x40 pages=8 vmalloc
0xf932f000-0xf933c000 53248 module_alloc+0x1e/0x40 pages=12 vmalloc
0xf933d000-0xf9340000 12288 drm_addmap_core+0x6f8/0x8a0 [drm] pages=2 vmalloc user
0xf9340000-0xf9345000 20480 sky2_probe+0x12d/0xa79 [sky2] phys=ff8fc000 ioremap
0xf9346000-0xf934f000 36864 drm_sg_alloc+0x13f/0x2a0 [drm] pages=8 vmalloc
0xf9352000-0xf9357000 20480 drm_ht_create+0xaf/0xe0 [drm] pages=4 vmalloc
0xf935d000-0xf9361000 16384 module_alloc+0x1e/0x40 pages=3 vmalloc
0xf9368000-0xf936c000 16384 module_alloc+0x1e/0x40 pages=3 vmalloc
0xf9373000-0xf9377000 16384 module_alloc+0x1e/0x40 pages=3 vmalloc
0xf9380000-0xf9391000 69632 drm_addmap_core+0x6cb/0x8a0 [drm] phys=ff9f0000 ioremap
0xf9394000-0xf9399000 20480 module_alloc+0x1e/0x40 pages=4 vmalloc
0xf93a8000-0xf93b3000 45056 module_alloc+0x1e/0x40 pages=10 vmalloc
0xf93c0000-0xf93c7000 28672 module_alloc+0x1e/0x40 pages=6 vmalloc
0xf93d3000-0xf93d9000 24576 module_alloc+0x1e/0x40 pages=5 vmalloc
0xf93e0000-0xf93e9000 36864 drm_core_ioremap_wc+0x43/0x80 [drm] phys=dfff8000 ioremap
0xf93ef000-0xf93ff000 65536 module_alloc+0x1e/0x40 pages=15 vmalloc
0xf9400000-0xf9411000 69632 drm_addmap_core+0x6cb/0x8a0 [drm] phys=ff9f0000 ioremap
0xf9421000-0xf9437000 90112 module_alloc+0x1e/0x40 pages=21 vmalloc
0xf9445000-0xf944d000 32768 module_alloc+0x1e/0x40 pages=7 vmalloc
0xf9457000-0xf945a000 12288 module_alloc+0x1e/0x40 pages=2 vmalloc
0xf9471000-0xf9480000 61440 module_alloc+0x1e/0x40 pages=14 vmalloc
0xf948b000-0xf9490000 20480 module_alloc+0x1e/0x40 pages=4 vmalloc
0xf94ac000-0xf94ae000 8192 dm_vcalloc+0x5a/0x90 [dm_mod] pages=1 vmalloc
0xf94af000-0xf94d6000 159744 module_alloc+0x1e/0x40 pages=38 vmalloc
0xf94d9000-0xf94db000 8192 dm_vcalloc+0x5a/0x90 [dm_mod] pages=1 vmalloc
0xf9569000-0xf95c3000 368640 module_alloc+0x1e/0x40 pages=89 vmalloc
0xf989d000-0xf9a94000 2060288 sys_swapon+0x56d/0xa60 pages=502 vmalloc
0xf9a95000-0xf9c18000 1585152 sys_swapon+0x56d/0xa60 pages=386 vmalloc
0xf9c46000-0xf9c49000 12288 xt_alloc_table_info+0xc0/0xe0 pages=2 vmalloc
0xf9c4a000-0xf9c4d000 12288 xt_alloc_table_info+0xc0/0xe0 pages=2 vmalloc
0xf9c4e000-0xf9c51000 12288 xt_alloc_table_info+0xc0/0xe0 pages=2 vmalloc
0xf9c52000-0xf9c55000 12288 xt_alloc_table_info+0xc0/0xe0 pages=2 vmalloc
0xf9c60000-0xf9c62000 8192 module_alloc+0x1e/0x40 pages=1 vmalloc
0xf9c6d000-0xf9c74000 28672 module_alloc+0x1e/0x40 pages=6 vmalloc
0xf9c7d000-0xf9c80000 12288 module_alloc+0x1e/0x40 pages=2 vmalloc
0xf9c98000-0xf9ca7000 61440 module_alloc+0x1e/0x40 pages=14 vmalloc
0xf9ccc000-0xf9cd1000 20480 module_alloc+0x1e/0x40 pages=4 vmalloc
0xf9cf6000-0xfbcf7000 33558528 drm_sg_alloc+0x13f/0x2a0 [drm] pages=8192 vmalloc vpages

newton ~ # cat /proc/meminfo
MemTotal: 6210480 kB
MemFree: 2935668 kB
Buffers: 648416 kB
Cached: 1188316 kB
SwapCached: 0 kB
Active: 1606488 kB
Inactive: 1483096 kB
Active(anon): 1151104 kB
Inactive(anon): 129184 kB
Active(file): 455384 kB
Inactive(file): 1353912 kB
Unevictable: 0 kB
Mlocked: 0 kB
HighTotal: 5379592 kB
HighFree: 2925400 kB
LowTotal: 830888 kB
LowFree: 10268 kB
SwapTotal: 7261360 kB
SwapFree: 7261360 kB
Dirty: 84 kB
Writeback: 0 kB
AnonPages: 1253024 kB
Mapped: 130664 kB
Slab: 114644 kB
SReclaimable: 102084 kB
SUnreclaim: 12560 kB
PageTables: 8132 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 10366600 kB
Committed_AS: 1969496 kB
VmallocTotal: 122880 kB
VmallocUsed: 55208 kB
VmallocChunk: 62492 kB
DirectMap4k: 8184 kB
DirectMap2M: 903168 kB

Eric Dumazet

unread,
Apr 5, 2009, 6:30:46 AM4/5/09
to Graham Murray, linux-...@vger.kernel.org, Linux Netdev List
Graham Murray a écrit :

I see, its a plain bug in net/ipv6/netfilter/ip6_tables.c
function alloc_counters() always returns -ENOMEM

Unfortunatly , its Sunday here and I have to run for lunch time with family :)

If nobody beats me, I will do the fix in a couple of hours...

Thank you

Jan Engelhardt

unread,
Apr 5, 2009, 7:37:11 AM4/5/09
to Eric Dumazet, David Miller, gra...@gmurray.org.uk, linux-...@vger.kernel.org, net...@vger.kernel.org, netfilt...@vger.kernel.org

On Sunday 2009-04-05 12:01, Eric Dumazet wrote:
>
>This could probably be solved using a single "table" containing
>rules only, that could be shared for every cpus. Only counters
>should be percpu. This should save a lot of ram, over previous
>situation (2.6.29 or current one)

Why would counters stay separate?

I recognize all of this table copying is related to do NUMA
optimizations, and I think I heard cache bouncing too somewhere else.

[ http://marc.info/?l=netfilter-devel&m=119903624211253&w=2 ]

Eric Dumazet

unread,
Apr 5, 2009, 8:30:32 AM4/5/09
to Graham Murray, David S. Miller, linux-...@vger.kernel.org, Linux Netdev List
Eric Dumazet a écrit :
> Graham Murray a écrit :

>> iptables gives me no problems at all, it is just ip6tables that
>> fails. The first indication of this is during the init scripts when
>> ip6tables-restore fails.
>
> I see, its a plain bug in net/ipv6/netfilter/ip6_tables.c
> function alloc_counters() always returns -ENOMEM
>
> Unfortunatly , its Sunday here and I have to run for lunch time with family :)
>
> If nobody beats me, I will do the fix in a couple of hours...

Here is the fix, thanks Graham for the report !

[PATCH] netfilter: ip6tables fix

ip6_tables.c alloc_counters() misses a return statement, making
ip6tables -N always failing and leaking memory.

Reported-by: Graham Murray <gra...@gmurray.org.uk>


Signed-off-by: Eric Dumazet <da...@cosmosbay.com>
---

diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index dfed176..800ae85 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -1033,6 +1033,8 @@ static struct xt_counters *alloc_counters(struct xt_table *table)

xt_free_table_info(info);

+ return counters;
+
free_counters:
vfree(counters);
nomem:

Eric Dumazet

unread,
Apr 5, 2009, 8:35:25 AM4/5/09
to Jan Engelhardt, David Miller, gra...@gmurray.org.uk, linux-...@vger.kernel.org, net...@vger.kernel.org, netfilt...@vger.kernel.org
Jan Engelhardt a écrit :

> On Sunday 2009-04-05 12:01, Eric Dumazet wrote:
>> This could probably be solved using a single "table" containing
>> rules only, that could be shared for every cpus. Only counters
>> should be percpu. This should save a lot of ram, over previous
>> situation (2.6.29 or current one)
>
> Why would counters stay separate?
>
> I recognize all of this table copying is related to do NUMA
> optimizations, and I think I heard cache bouncing too somewhere else.
>
> [ http://marc.info/?l=netfilter-devel&m=119903624211253&w=2 ]
>
>

Not only NUMA, but SMP too. counters are integrated in rules themselves.
So in order to avoid ping pongs between cpus, we choose to allocate
one copy of rules/counters per cpu.

But with some changes, we could let the rules read-only and shared by
all cpus, and shadow counters only on percpu variables, thus reducing
memory costs.

Eric Dumazet

unread,
Apr 5, 2009, 8:40:57 AM4/5/09
to Graham Murray, David S. Miller, linux-...@vger.kernel.org, Linux Netdev List
Eric Dumazet a écrit :

> Eric Dumazet a écrit :
>> Graham Murray a écrit :
>>> iptables gives me no problems at all, it is just ip6tables that
>>> fails. The first indication of this is during the init scripts when
>>> ip6tables-restore fails.
>> I see, its a plain bug in net/ipv6/netfilter/ip6_tables.c
>> function alloc_counters() always returns -ENOMEM
>>
>> Unfortunatly , its Sunday here and I have to run for lunch time with family :)
>>
>> If nobody beats me, I will do the fix in a couple of hours...
>
> Here is the fix, thanks Graham for the report !
>
> [PATCH] netfilter: ip6tables fix
>
> ip6_tables.c alloc_counters() misses a return statement, making
> ip6tables -N always failing and leaking memory.

Oh well, this ChangeLog is not correct :(

Here is an updated patch with correct ChangeLog, sorry David.
(-L instead of -N, and there was no memory leak involved)

[PATCH] netfilter: ip6tables fix

ip6_tables.c alloc_counters() misses a return statement, making

ip6tables -L always failing.

0 new messages