kernel panic: Out of memory and no killable processes... (2)

syzbot

unread,

Jan 28, 2018, 12:58:05 PM1/28/18

to aarc...@redhat.com, ak...@linux-foundation.org, gu...@fb.com, kirill....@linux.intel.com, linux-...@vger.kernel.org, linu...@kvack.org, mho...@suse.com, mi...@kernel.org, penguin...@i-love.sakura.ne.jp, rien...@google.com, syzkall...@googlegroups.com, yan...@alibaba-inc.com

Hello,

syzbot hit the following crash on net-next commit
6bb46bc57c8e9ce947cc605e555b7204b44d2b10 (Fri Jan 26 16:00:23 2018 +0000)
Merge branch 'cxgb4-fix-dump-collection-when-firmware-crashed'

C reproducer is attached.
syzkaller reproducer is attached.
Raw console output is attached.
compiler: gcc (GCC) 7.1.1 20170620
.config is attached.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+8630e3...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for
details.
If you forward the report, please keep this part and the footer.

[ 3685] 0 3685 17821 1 184320 0 0 sshd
[ 3692] 0 3692 4376 0 32768 0 0
syzkaller025682
[ 3695] 0 3695 4376 0 36864 0 0
syzkaller025682
Kernel panic - not syncing: Out of memory and no killable processes...

CPU: 0 PID: 1 Comm: init Not tainted 4.15.0-rc9+ #212
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:53
panic+0x1e4/0x41c kernel/panic.c:183
out_of_memory+0xc56/0x1220 mm/oom_kill.c:1076
__alloc_pages_may_oom mm/page_alloc.c:3395 [inline]
__alloc_pages_slowpath+0x1d1b/0x2d00 mm/page_alloc.c:4096
__alloc_pages_nodemask+0x9fb/0xd80 mm/page_alloc.c:4252
alloc_pages_current+0xb6/0x1e0 mm/mempolicy.c:2036
alloc_pages include/linux/gfp.h:492 [inline]
__page_cache_alloc+0x334/0x500 mm/filemap.c:946
page_cache_read mm/filemap.c:2388 [inline]
filemap_fault+0xefc/0x1c20 mm/filemap.c:2572
ext4_filemap_fault+0x82/0xad fs/ext4/inode.c:6164
__do_fault+0xeb/0x30f mm/memory.c:3202
do_read_fault mm/memory.c:3612 [inline]
do_fault mm/memory.c:3712 [inline]
handle_pte_fault mm/memory.c:3943 [inline]
__handle_mm_fault+0x1d8f/0x3ce0 mm/memory.c:4067
handle_mm_fault+0x334/0x8d0 mm/memory.c:4104
__do_page_fault+0x5c9/0xc90 arch/x86/mm/fault.c:1430
do_page_fault+0xee/0x720 arch/x86/mm/fault.c:1505
page_fault+0x4c/0x60 arch/x86/entry/entry_64.S:1260
RIP: 0033:0x7ffa5f1e1410
RSP: 002b:00007ffe0fa32248 EFLAGS: 00010206
RAX: 0000000000000000 RBX: 00007ffe0fa329b0 RCX: 00007ffa5f299dd3
RDX: 00007ffe0fa32280 RSI: 00007ffe0fa323b0 RDI: 0000000000000011
RBP: 00007ffe0fa32b60 R08: 00007ffe0fa32be0 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00007ffe0fa32f40 R14: 0000000000000000 R15: 0000000000000000
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: disabled

---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzk...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
If you want to test a patch for this bug, please reply with:
#syz test: git://repo/address.git branch
and provide the patch inline or as an attachment.
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug
report.
Note: all commands must start from beginning of the line in the email body.

raw.log.txt

repro.syz.txt

repro.c.txt

config.txt

Tetsuo Handa

unread,

Jan 28, 2018, 7:20:35 PM1/28/18

to da...@davemloft.net, netfilt...@vger.kernel.org, core...@netfilter.org, net...@vger.kernel.org, aarc...@redhat.com, ak...@linux-foundation.org, gu...@fb.com, kirill....@linux.intel.com, linux-...@vger.kernel.org, linu...@kvack.org, mho...@suse.com, mi...@kernel.org, rien...@google.com, syzkall...@googlegroups.com, yan...@alibaba-inc.com

syzbot wrote:
> syzbot hit the following crash on net-next commit
> 6bb46bc57c8e9ce947cc605e555b7204b44d2b10 (Fri Jan 26 16:00:23 2018 +0000)
> Merge branch 'cxgb4-fix-dump-collection-when-firmware-crashed'
>
> C reproducer is attached.
> syzkaller reproducer is attached.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+8630e3...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> [ 3685] 0 3685 17821 1 184320 0 0 sshd
> [ 3692] 0 3692 4376 0 32768 0 0
> syzkaller025682
> [ 3695] 0 3695 4376 0 36864 0 0
> syzkaller025682
> Kernel panic - not syncing: Out of memory and no killable processes...
>

This sounds like too huge vmalloc() request where size is controlled by userspace.

----------
[ 27.738855] syzkaller025682 invoked oom-killer: gfp_mask=0x14002c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_NOWARN), nodemask=(null), order=0, oom_score_adj=0
[ 27.754960] syzkaller025682 cpuset=/ mems_allowed=0
[ 27.760386] CPU: 0 PID: 3689 Comm: syzkaller025682 Not tainted 4.15.0-rc9+ #212
[ 27.767820] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 27.777166] Call Trace:
[ 27.779739] dump_stack+0x194/0x257
[ 27.793194] dump_header+0x28c/0xe1e
[ 27.888997] oom_kill_process+0x8b5/0x14a0
[ 27.981648] out_of_memory+0x86d/0x1220
[ 28.003684] __alloc_pages_slowpath+0x1d1b/0x2d00
[ 28.054140] __alloc_pages_nodemask+0x9fb/0xd80
[ 28.090590] alloc_pages_current+0xb6/0x1e0
[ 28.094927] __vmalloc_node_range+0x409/0x650
[ 28.103837] __vmalloc_node_flags_caller+0x50/0x60
[ 28.113166] kvmalloc_node+0x82/0xd0
[ 28.116869] xt_alloc_table_info+0x64/0xe0
[ 28.121097] do_ip6t_set_ctl+0x29b/0x5f0
[ 28.139158] nf_setsockopt+0x67/0xc0
[ 28.142862] ipv6_setsockopt+0x115/0x150
[ 28.146912] udpv6_setsockopt+0x45/0x80
[ 28.150867] sock_common_setsockopt+0x95/0xd0
[ 28.155359] SyS_setsockopt+0x189/0x360
[ 28.177379] entry_SYSCALL_64_fastpath+0x29/0xa0
----------

struct xt_table_info *xt_alloc_table_info(unsigned int size)
{
(...snipped...)
info = kvmalloc(sz, GFP_KERNEL);
(...snipped...)
}

static int
do_ip6t_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
{
int ret;

if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
return -EPERM;

switch (cmd) {
case IP6T_SO_SET_REPLACE:
ret = do_replace(sock_net(sk), user, len);
break;

case IP6T_SO_SET_ADD_COUNTERS:
ret = do_add_counters(sock_net(sk), user, len, 0);
break;

default:
ret = -EINVAL;
}

return ret;
}

vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back
off when the current task is killed") but then became unkillable by commit
b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is
killed""). Therefore, we can't handle this problem from MM side.
Please consider adding some limit from networking side.

Florian Westphal

unread,

Jan 29, 2018, 2:27:05 AM1/29/18

to Tetsuo Handa, da...@davemloft.net, netfilt...@vger.kernel.org, core...@netfilter.org, net...@vger.kernel.org, aarc...@redhat.com, yan...@alibaba-inc.com, mho...@suse.com, syzkall...@googlegroups.com, linux-...@vger.kernel.org, mi...@kernel.org, linu...@kvack.org, rien...@google.com, ak...@linux-foundation.org, gu...@fb.com, kirill....@linux.intel.com

Tetsuo Handa <penguin...@i-love.sakura.ne.jp> wrote:
> syzbot wrote:
> > syzbot hit the following crash on net-next commit
> > 6bb46bc57c8e9ce947cc605e555b7204b44d2b10 (Fri Jan 26 16:00:23 2018 +0000)
> > Merge branch 'cxgb4-fix-dump-collection-when-firmware-crashed'
> >
> > C reproducer is attached.
> > syzkaller reproducer is attached.
> > Raw console output is attached.
> > compiler: gcc (GCC) 7.1.1 20170620
> > .config is attached.
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+8630e3...@syzkaller.appspotmail.com
> > It will help syzbot understand when the bug is fixed. See footer for
> > details.
> > If you forward the report, please keep this part and the footer.
> >
> > [ 3685] 0 3685 17821 1 184320 0 0 sshd
> > [ 3692] 0 3692 4376 0 32768 0 0
> > syzkaller025682
> > [ 3695] 0 3695 4376 0 36864 0 0
> > syzkaller025682
> > Kernel panic - not syncing: Out of memory and no killable processes...
> >
>
> This sounds like too huge vmalloc() request where size is controlled by userspace.

Right.

Before eacd86ca3b036e55e172b7279f101cef4a6ff3a4
this used

info = __vmalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY,

would it help to re-add that?

> vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back
> off when the current task is killed") but then became unkillable by commit
> b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is
> killed""). Therefore, we can't handle this problem from MM side.
> Please consider adding some limit from networking side.

I don't know what "some limit" would be. I would prefer if there was
a way to supress OOM Killer in first place so we can just -ENOMEM user.

AFAIU NOWARN|NORETRY does that, so I would propose to just read-add it?

Kirill A. Shutemov

unread,

Jan 29, 2018, 3:26:52 AM1/29/18

to Florian Westphal, Tetsuo Handa, da...@davemloft.net, netfilt...@vger.kernel.org, core...@netfilter.org, net...@vger.kernel.org, aarc...@redhat.com, yan...@alibaba-inc.com, mho...@suse.com, syzkall...@googlegroups.com, linux-...@vger.kernel.org, mi...@kernel.org, linu...@kvack.org, rien...@google.com, ak...@linux-foundation.org, gu...@fb.com, kirill....@linux.intel.com

On Mon, Jan 29, 2018 at 08:23:57AM +0100, Florian Westphal wrote:
> > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back
> > off when the current task is killed") but then became unkillable by commit
> > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is
> > killed""). Therefore, we can't handle this problem from MM side.
> > Please consider adding some limit from networking side.
>
> I don't know what "some limit" would be. I would prefer if there was
> a way to supress OOM Killer in first place so we can just -ENOMEM user.

Just supressing OOM kill is a bad idea. We still leave a way to allocate
arbitrary large buffer in kernel.

--
Kirill A. Shutemov

Florian Westphal

unread,

Jan 29, 2018, 12:00:31 PM1/29/18

to Kirill A. Shutemov, Florian Westphal, Tetsuo Handa, da...@davemloft.net, netfilt...@vger.kernel.org, core...@netfilter.org, net...@vger.kernel.org, aarc...@redhat.com, yan...@alibaba-inc.com, mho...@suse.com, syzkall...@googlegroups.com, linux-...@vger.kernel.org, mi...@kernel.org, linu...@kvack.org, rien...@google.com, ak...@linux-foundation.org, gu...@fb.com, kirill....@linux.intel.com

Isn't that what we do everywhere in network stack?

I think we should try to allocate whatever amount of memory is needed
for the given xtables ruleset, given that is what admin requested us to do.

I also would not know what limit is sane -- I've seen setups with as much
as 100k iptables rules, and that was 5 years ago.

And even if we add a "Xk rules" limit, it might be too much for
low-memory systems, or not enough for whatever other use case there
might be.

Michal Hocko

unread,

Jan 29, 2018, 1:25:25 PM1/29/18

to Florian Westphal, Kirill A. Shutemov, Tetsuo Handa, da...@davemloft.net, netfilt...@vger.kernel.org, core...@netfilter.org, net...@vger.kernel.org, aarc...@redhat.com, yan...@alibaba-inc.com, syzkall...@googlegroups.com, linux-...@vger.kernel.org, mi...@kernel.org, linu...@kvack.org, rien...@google.com, ak...@linux-foundation.org, gu...@fb.com, kirill....@linux.intel.com

On Mon 29-01-18 17:57:22, Florian Westphal wrote:
> Kirill A. Shutemov <kir...@shutemov.name> wrote:
> > On Mon, Jan 29, 2018 at 08:23:57AM +0100, Florian Westphal wrote:
> > > > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back
> > > > off when the current task is killed") but then became unkillable by commit
> > > > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is
> > > > killed""). Therefore, we can't handle this problem from MM side.
> > > > Please consider adding some limit from networking side.
> > >
> > > I don't know what "some limit" would be. I would prefer if there was
> > > a way to supress OOM Killer in first place so we can just -ENOMEM user.
> >
> > Just supressing OOM kill is a bad idea. We still leave a way to allocate
> > arbitrary large buffer in kernel.
>
> Isn't that what we do everywhere in network stack?
>
> I think we should try to allocate whatever amount of memory is needed
> for the given xtables ruleset, given that is what admin requested us to do.

If this is a root only thing then __GFP_NORETRY sounds like the most
straightforward way to go.
--
Michal Hocko
SUSE Labs

Kirill A. Shutemov

unread,

Jan 29, 2018, 1:28:14 PM1/29/18

to Florian Westphal, Tetsuo Handa, da...@davemloft.net, netfilt...@vger.kernel.org, core...@netfilter.org, net...@vger.kernel.org, aarc...@redhat.com, yan...@alibaba-inc.com, mho...@suse.com, syzkall...@googlegroups.com, linux-...@vger.kernel.org, mi...@kernel.org, linu...@kvack.org, rien...@google.com, ak...@linux-foundation.org, gu...@fb.com, kirill....@linux.intel.com

On Mon, Jan 29, 2018 at 05:57:22PM +0100, Florian Westphal wrote:
> Kirill A. Shutemov <kir...@shutemov.name> wrote:
> > On Mon, Jan 29, 2018 at 08:23:57AM +0100, Florian Westphal wrote:
> > > > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back
> > > > off when the current task is killed") but then became unkillable by commit
> > > > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is
> > > > killed""). Therefore, we can't handle this problem from MM side.
> > > > Please consider adding some limit from networking side.
> > >
> > > I don't know what "some limit" would be. I would prefer if there was
> > > a way to supress OOM Killer in first place so we can just -ENOMEM user.
> >
> > Just supressing OOM kill is a bad idea. We still leave a way to allocate
> > arbitrary large buffer in kernel.
>
> Isn't that what we do everywhere in network stack?
>
> I think we should try to allocate whatever amount of memory is needed
> for the given xtables ruleset, given that is what admin requested us to do.

Is it correct that "admin" in this case is root in random container?
I mean, can we get access to it with CLONE_NEWUSER|CLONE_NEWNET?

This can be fun.

> I also would not know what limit is sane -- I've seen setups with as much
> as 100k iptables rules, and that was 5 years ago.
>
> And even if we add a "Xk rules" limit, it might be too much for
> low-memory systems, or not enough for whatever other use case there
> might be.

I hate what I'm saying, but I guess we need some tunable here.
Not sure what exactly.

--
Kirill A. Shutemov

Florian Westphal

unread,

Jan 29, 2018, 5:38:29 PM1/29/18

to Kirill A. Shutemov, Florian Westphal, Tetsuo Handa, da...@davemloft.net, netfilt...@vger.kernel.org, core...@netfilter.org, net...@vger.kernel.org, aarc...@redhat.com, yan...@alibaba-inc.com, mho...@suse.com, syzkall...@googlegroups.com, linux-...@vger.kernel.org, mi...@kernel.org, linu...@kvack.org, rien...@google.com, ak...@linux-foundation.org, gu...@fb.com, kirill....@linux.intel.com

Kirill A. Shutemov <kir...@shutemov.name> wrote:
> On Mon, Jan 29, 2018 at 05:57:22PM +0100, Florian Westphal wrote:
> > Kirill A. Shutemov <kir...@shutemov.name> wrote:
> > > On Mon, Jan 29, 2018 at 08:23:57AM +0100, Florian Westphal wrote:
> > > > > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back
> > > > > off when the current task is killed") but then became unkillable by commit
> > > > > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is
> > > > > killed""). Therefore, we can't handle this problem from MM side.
> > > > > Please consider adding some limit from networking side.
> > > >
> > > > I don't know what "some limit" would be. I would prefer if there was
> > > > a way to supress OOM Killer in first place so we can just -ENOMEM user.
> > >
> > > Just supressing OOM kill is a bad idea. We still leave a way to allocate
> > > arbitrary large buffer in kernel.
> >
> > Isn't that what we do everywhere in network stack?
> >
> > I think we should try to allocate whatever amount of memory is needed
> > for the given xtables ruleset, given that is what admin requested us to do.
>
> Is it correct that "admin" in this case is root in random container?

Yes.

> I mean, can we get access to it with CLONE_NEWUSER|CLONE_NEWNET?

Yes.

> This can be fun.

Do we prevent "admin in random container" to insert 2m ipv6 routes
(alternatively: ipsec tunnels, interfaces etc etc)?

> > I also would not know what limit is sane -- I've seen setups with as much
> > as 100k iptables rules, and that was 5 years ago.
> >
> > And even if we add a "Xk rules" limit, it might be too much for
> > low-memory systems, or not enough for whatever other use case there
> > might be.
>
> I hate what I'm saying, but I guess we need some tunable here.
> Not sure what exactly.

Would memcg help?

(I don't buy the "run untrusted binaries on linux is safe" thing, so
I would not know).

Jan Engelhardt

unread,

Jan 30, 2018, 2:03:46 AM1/30/18

to Florian Westphal, Kirill A. Shutemov, Tetsuo Handa, da...@davemloft.net, netfilt...@vger.kernel.org, core...@netfilter.org, net...@vger.kernel.org, aarc...@redhat.com, yan...@alibaba-inc.com, mho...@suse.com, syzkall...@googlegroups.com, linux-...@vger.kernel.org, mi...@kernel.org, linu...@kvack.org, rien...@google.com, ak...@linux-foundation.org, gu...@fb.com, kirill....@linux.intel.com

On Monday 2018-01-29 17:57, Florian Westphal wrote:
>> > > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back
>> > > off when the current task is killed") but then became unkillable by commit
>> > > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is
>> > > killed""). Therefore, we can't handle this problem from MM side.
>> > > Please consider adding some limit from networking side.
>> >
>> > I don't know what "some limit" would be. I would prefer if there was
>> > a way to supress OOM Killer in first place so we can just -ENOMEM user.
>>
>> Just supressing OOM kill is a bad idea. We still leave a way to allocate
>> arbitrary large buffer in kernel.

At the very least, mm could limit that kind of "arbitrary". If the machine has
x GB (swap included) and the admin tries to make the kernel allocate space for
an x GB ruleset, no way is it going to be satisfiable _even with OOM_.

Michal Hocko

unread,

Jan 30, 2018, 2:52:30 AM1/30/18

to Florian Westphal, Kirill A. Shutemov, Tetsuo Handa, da...@davemloft.net, netfilt...@vger.kernel.org, core...@netfilter.org, net...@vger.kernel.org, aarc...@redhat.com, yan...@alibaba-inc.com, syzkall...@googlegroups.com, linux-...@vger.kernel.org, mi...@kernel.org, linu...@kvack.org, rien...@google.com, ak...@linux-foundation.org, gu...@fb.com, kirill....@linux.intel.com

On Mon 29-01-18 23:35:22, Florian Westphal wrote:
> Kirill A. Shutemov <kir...@shutemov.name> wrote:

[...]

> > I hate what I'm saying, but I guess we need some tunable here.
> > Not sure what exactly.
>
> Would memcg help?

That really depends. I would have to check whether vmalloc path obeys
__GFP_ACCOUNT (I suspect it does except for page tables allocations but
that shouldn't be a big deal). But then the other potential problem is
the life time of the xt_table_info (or other potentially large) data
structures. Are they bound to any process life time. Because if they are
not then the OOM killer will not help. The OOM panic earlier in this
thread suggests it doesn't because the test case managed to eat all the
available memory and killed all the eligible tasks which didn't help.

So in some sense the memcg would help to stop the excessive allocation,
but it wouldn't resolve it other than kill all tasks in the affected
memcg/container. Whether this is sufficient or not, I dunno. It sounds
quite suboptimal to me. But it is true this would be less tricky then
adding a global knob...

Florian Westphal

unread,

Jan 30, 2018, 3:14:38 AM1/30/18

to Michal Hocko, Florian Westphal, Kirill A. Shutemov, Tetsuo Handa, da...@davemloft.net, netfilt...@vger.kernel.org, core...@netfilter.org, net...@vger.kernel.org, aarc...@redhat.com, yan...@alibaba-inc.com, syzkall...@googlegroups.com, linux-...@vger.kernel.org, mi...@kernel.org, linu...@kvack.org, rien...@google.com, ak...@linux-foundation.org, gu...@fb.com, kirill....@linux.intel.com

Michal Hocko <mho...@kernel.org> wrote:
> On Mon 29-01-18 23:35:22, Florian Westphal wrote:
> > Kirill A. Shutemov <kir...@shutemov.name> wrote:
> [...]
> > > I hate what I'm saying, but I guess we need some tunable here.
> > > Not sure what exactly.
> >
> > Would memcg help?
>
> That really depends. I would have to check whether vmalloc path obeys
> __GFP_ACCOUNT (I suspect it does except for page tables allocations but
> that shouldn't be a big deal). But then the other potential problem is
> the life time of the xt_table_info (or other potentially large) data
> structures. Are they bound to any process life time.

No.

> Because if they are
> not then the OOM killer will not help. The OOM panic earlier in this
> thread suggests it doesn't because the test case managed to eat all the
> available memory and killed all the eligible tasks which didn't help.

Yes, which is why we do not want any OOM killer invocation in first
place...

> So in some sense the memcg would help to stop the excessive allocation,
> but it wouldn't resolve it other than kill all tasks in the affected
> memcg/container. Whether this is sufficient or not, I dunno. It sounds
> quite suboptimal to me. But it is true this would be less tricky then
> adding a global knob...

Global knob doesn't really help at all, I can add multiple large
iptables rulesets (so we would have to account), and we have same issue
in virtually all of networking, so we need limits for interface count,
tunnel count, ipsec policies/SAs, nftables, tc, etc etc.

Kirill A. Shutemov

unread,

Jan 30, 2018, 3:28:21 AM1/30/18

to Florian Westphal, Michal Hocko, Tetsuo Handa, da...@davemloft.net, netfilt...@vger.kernel.org, core...@netfilter.org, net...@vger.kernel.org, aarc...@redhat.com, yan...@alibaba-inc.com, syzkall...@googlegroups.com, linux-...@vger.kernel.org, mi...@kernel.org, linu...@kvack.org, rien...@google.com, ak...@linux-foundation.org, gu...@fb.com, kirill....@linux.intel.com

On Tue, Jan 30, 2018 at 09:11:27AM +0100, Florian Westphal wrote:
> Michal Hocko <mho...@kernel.org> wrote:
> > On Mon 29-01-18 23:35:22, Florian Westphal wrote:
> > > Kirill A. Shutemov <kir...@shutemov.name> wrote:
> > [...]
> > > > I hate what I'm saying, but I guess we need some tunable here.
> > > > Not sure what exactly.
> > >
> > > Would memcg help?
> >
> > That really depends. I would have to check whether vmalloc path obeys
> > __GFP_ACCOUNT (I suspect it does except for page tables allocations but
> > that shouldn't be a big deal). But then the other potential problem is
> > the life time of the xt_table_info (or other potentially large) data
> > structures. Are they bound to any process life time.
>
> No.

Well, IIUC they bound to net namespace life time, so killing all
proccesses in the namespace would help to get memory back. :)

--
Kirill A. Shutemov

Dmitry Vyukov

unread,

Jan 30, 2018, 4:02:56 AM1/30/18

to Kirill A. Shutemov, Florian Westphal, Michal Hocko, Tetsuo Handa, David Miller, netfilt...@vger.kernel.org, core...@netfilter.org, netdev, Andrea Arcangeli, Yang Shi, syzkall...@googlegroups.com, LKML, Ingo Molnar, Linux-MM, David Rientjes, Andrew Morton, gu...@fb.com, Kirill A. Shutemov

On Tue, Jan 30, 2018 at 9:28 AM, Kirill A. Shutemov

<kir...@shutemov.name> wrote:
> On Tue, Jan 30, 2018 at 09:11:27AM +0100, Florian Westphal wrote:
>> Michal Hocko <mho...@kernel.org> wrote:
>> > On Mon 29-01-18 23:35:22, Florian Westphal wrote:
>> > > Kirill A. Shutemov <kir...@shutemov.name> wrote:
>> > [...]
>> > > > I hate what I'm saying, but I guess we need some tunable here.
>> > > > Not sure what exactly.
>> > >
>> > > Would memcg help?
>> >
>> > That really depends. I would have to check whether vmalloc path obeys
>> > __GFP_ACCOUNT (I suspect it does except for page tables allocations but
>> > that shouldn't be a big deal). But then the other potential problem is
>> > the life time of the xt_table_info (or other potentially large) data
>> > structures. Are they bound to any process life time.
>>
>> No.
>
> Well, IIUC they bound to net namespace life time, so killing all
> proccesses in the namespace would help to get memory back. :)

... unless the namespace is mounted into file system.

Let's start with NOWARN as that's what kernel generally uses for
allocations with user-controllable size. ENOMEM is roughly as
informative as the WARNING message in this case.

I think we also need to consider setting up memory cgroup for
syzkaller test processes (we do RLIMIT_AS, but that's weak).

Michal Hocko

unread,

Jan 30, 2018, 4:51:38 AM1/30/18

to Florian Westphal, Kirill A. Shutemov, Tetsuo Handa, da...@davemloft.net, netfilt...@vger.kernel.org, core...@netfilter.org, net...@vger.kernel.org, aarc...@redhat.com, yan...@alibaba-inc.com, syzkall...@googlegroups.com, linux-...@vger.kernel.org, mi...@kernel.org, linu...@kvack.org, rien...@google.com, ak...@linux-foundation.org, gu...@fb.com, kirill....@linux.intel.com

On Tue 30-01-18 09:11:27, Florian Westphal wrote:
> Michal Hocko <mho...@kernel.org> wrote:
> > On Mon 29-01-18 23:35:22, Florian Westphal wrote:
> > > Kirill A. Shutemov <kir...@shutemov.name> wrote:
> > [...]
> > > > I hate what I'm saying, but I guess we need some tunable here.
> > > > Not sure what exactly.
> > >
> > > Would memcg help?
> >
> > That really depends. I would have to check whether vmalloc path obeys
> > __GFP_ACCOUNT (I suspect it does except for page tables allocations but
> > that shouldn't be a big deal). But then the other potential problem is
> > the life time of the xt_table_info (or other potentially large) data
> > structures. Are they bound to any process life time.
>
> No.
>
> > Because if they are
> > not then the OOM killer will not help. The OOM panic earlier in this
> > thread suggests it doesn't because the test case managed to eat all the
> > available memory and killed all the eligible tasks which didn't help.
>
> Yes, which is why we do not want any OOM killer invocation in first
> place...

The problem is that as soon as you eat that memory and ask for more
until you fail with ENOMEM then the OOM is simply unavoidable.

Michal Hocko

unread,

Jan 30, 2018, 4:57:42 AM1/30/18

to Dmitry Vyukov, Kirill A. Shutemov, Florian Westphal, Tetsuo Handa, David Miller, netfilt...@vger.kernel.org, core...@netfilter.org, netdev, Andrea Arcangeli, Yang Shi, syzkall...@googlegroups.com, LKML, Ingo Molnar, Linux-MM, David Rientjes, Andrew Morton, gu...@fb.com, Kirill A. Shutemov

You want __GFP_NORETRY but that is not _fully_ supported by kvmalloc
right now. More specifically kvmalloc doesn't guanratee that the request
will not trigger the OOM killer (like regular __GFP_NORETRY). This is
because of internal vmalloc restrictions. If you are however OK to
simply bail out in most cases then __GFP_NORETRY should work reasonably
fine.

> I think we also need to consider setting up memory cgroup for
> syzkaller test processes (we do RLIMIT_AS, but that's weak).

Well, this is not about syzkaller, it merely pointed out a potential
DoS... And that has to be addressed somehow.

Michal Hocko

unread,

Jan 30, 2018, 9:01:07 AM1/30/18

to Dmitry Vyukov, Kirill A. Shutemov, Florian Westphal, Tetsuo Handa, David Miller, netfilt...@vger.kernel.org, core...@netfilter.org, netdev, Andrea Arcangeli, Yang Shi, syzkall...@googlegroups.com, LKML, Ingo Molnar, Linux-MM, David Rientjes, Andrew Morton, gu...@fb.com, Kirill A. Shutemov

So how about this?
---
From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001
From: Michal Hocko <mho...@suse.com>
Date: Tue, 30 Jan 2018 14:51:07 +0100
Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive

syzbot has noticed that xt_alloc_table_info can allocate a lot of
memory. This is an admin only interface but an admin in a namespace
is sufficient as well. eacd86ca3b03 ("net/netfilter/x_tables.c: use
kvmalloc() in xt_alloc_table_info()") has changed the opencoded
kmalloc->vmalloc fallback into kvmalloc. It has dropped __GFP_NORETRY on
the way because vmalloc has simply never fully supported __GFP_NORETRY
semantic. This is still the case because e.g. page tables backing the
vmalloc area are hardcoded GFP_KERNEL.

Revert back to __GFP_NORETRY as a poors man defence against excessively
large allocation request here. We will not rule out the OOM killer
completely but __GFP_NORETRY should at least stop the large request
in most cases.

Fixes: eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_table_info()")
Signed-off-by: Michal Hocko <mho...@suse.com>
---
net/netfilter/x_tables.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index d8571f414208..a5f5c29bcbdc 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1003,7 +1003,13 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages)
return NULL;

- info = kvmalloc(sz, GFP_KERNEL);
+ /*
+ * __GFP_NORETRY is not fully supported by kvmalloc but it should
+ * work reasonably well if sz is too large and bail out rather
+ * than shoot all processes down before realizing there is nothing
+ * more to reclaim.
+ */
+ info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY);
if (!info)
return NULL;

--
2.15.1

Florian Westphal

unread,

Jan 30, 2018, 9:04:18 AM1/30/18

to Michal Hocko, Dmitry Vyukov, Kirill A. Shutemov, Florian Westphal, Tetsuo Handa, David Miller, netfilt...@vger.kernel.org, core...@netfilter.org, netdev, Andrea Arcangeli, Yang Shi, syzkall...@googlegroups.com, LKML, Ingo Molnar, Linux-MM, David Rientjes, Andrew Morton, gu...@fb.com, Kirill A. Shutemov

> From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mho...@suse.com>
> Date: Tue, 30 Jan 2018 14:51:07 +0100
> Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive

Acked-by: Florian Westphal <f...@strlen.de>

Michal Hocko

unread,

Jan 30, 2018, 9:40:03 AM1/30/18

to Florian Westphal, Dmitry Vyukov, Kirill A. Shutemov, Tetsuo Handa, David Miller, netfilt...@vger.kernel.org, core...@netfilter.org, netdev, Andrea Arcangeli, Yang Shi, syzkall...@googlegroups.com, LKML, Ingo Molnar, Linux-MM, David Rientjes, Andrew Morton, gu...@fb.com, Kirill A. Shutemov

Thanks! How should we route this change? Andrew, David?

Andrew Morton

unread,

Jan 30, 2018, 2:27:49 PM1/30/18

to Michal Hocko, Dmitry Vyukov, Kirill A. Shutemov, Florian Westphal, Tetsuo Handa, David Miller, netfilt...@vger.kernel.org, core...@netfilter.org, netdev, Andrea Arcangeli, Yang Shi, syzkall...@googlegroups.com, LKML, Ingo Molnar, Linux-MM, David Rientjes, gu...@fb.com, Kirill A. Shutemov

On Tue, 30 Jan 2018 15:01:04 +0100 Michal Hocko <mho...@kernel.org> wrote:

> > Well, this is not about syzkaller, it merely pointed out a potential
> > DoS... And that has to be addressed somehow.
>
> So how about this?
> ---

argh ;)

offtopic: preceding comment here is "prevent them from hitting BUG() in
vmalloc.c". I suspect this is ancient code and vmalloc sure as heck
shouldn't go BUG with this input. And it should be using `sz' ;)

So I suspect and hope that this code can be removed. If not, let's fix
vmalloc!

> - info = kvmalloc(sz, GFP_KERNEL);
> + /*
> + * __GFP_NORETRY is not fully supported by kvmalloc but it should
> + * work reasonably well if sz is too large and bail out rather
> + * than shoot all processes down before realizing there is nothing
> + * more to reclaim.
> + */
> + info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY);
> if (!info)
> return NULL;

checkpatch sayeth

networking block comments don't use an empty /* line, use /* Comment...

So I'll do that and shall scoot the patch Davewards.

Michal Hocko

unread,

Jan 31, 2018, 3:19:19 AM1/31/18

to Andrew Morton, Dmitry Vyukov, Kirill A. Shutemov, Florian Westphal, Tetsuo Handa, David Miller, netfilt...@vger.kernel.org, core...@netfilter.org, netdev, Andrea Arcangeli, Yang Shi, syzkall...@googlegroups.com, LKML, Ingo Molnar, Linux-MM, David Rientjes, gu...@fb.com, Kirill A. Shutemov

On Tue 30-01-18 11:27:45, Andrew Morton wrote:
> On Tue, 30 Jan 2018 15:01:04 +0100 Michal Hocko <mho...@kernel.org> wrote:
>
> > > Well, this is not about syzkaller, it merely pointed out a potential
> > > DoS... And that has to be addressed somehow.
> >
> > So how about this?
> > ---
>
> argh ;)

doh, those hardwired moves...

Yeah, we do not BUG but rather fail instead. See __vmalloc_node_range.
My excavation tools pointed me to "VM: Rework vmalloc code to support mapping of arbitray pages"
by Christoph back in 2002. So yes, we can safely remove it finally. Se
below.

From 8d52e1d939d101b0dafed6ae5c3c1376183e65bb Mon Sep 17 00:00:00 2001
From: Michal Hocko <mho...@suse.com>
Date: Wed, 31 Jan 2018 09:16:56 +0100
Subject: [PATCH] net/netfilter/x_tables.c: remove size check

Back in 2002 vmalloc used to BUG on too large sizes. We are much better
behaved these days and vmalloc simply returns NULL for those. Remove
the check as it simply not needed and the comment even misleading.

Suggested-by: Andrew Morton <ak...@linux-foundation.org>

Signed-off-by: Michal Hocko <mho...@suse.com>
---

net/netfilter/x_tables.c | 4 ----
1 file changed, 4 deletions(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index b55ec5aa51a6..48a6ff620493 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -999,10 +999,6 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
if (sz < sizeof(*info))
return NULL;

- /* Pedantry: prevent them from hitting BUG() in vmalloc.c --RR */
- if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages)
- return NULL;
-
/* __GFP_NORETRY is not fully supported by kvmalloc but it should

* work reasonably well if sz is too large and bail out rather

* than shoot all processes down before realizing there is nothing

Pablo Neira Ayuso

unread,

Feb 2, 2018, 6:41:27 AM2/2/18

to Michal Hocko, Florian Westphal, Dmitry Vyukov, Kirill A. Shutemov, Tetsuo Handa, David Miller, netfilt...@vger.kernel.org, core...@netfilter.org, netdev, Andrea Arcangeli, Yang Shi, syzkall...@googlegroups.com, LKML, Ingo Molnar, Linux-MM, David Rientjes, Andrew Morton, gu...@fb.com, Kirill A. Shutemov

I'll place this in the nf.git tree if everyone is happy with it.

Pablo Neira Ayuso

unread,

Feb 7, 2018, 12:44:45 PM2/7/18

to Michal Hocko, Andrew Morton, Andrea Arcangeli, Yang Shi, Tetsuo Handa, netdev, gu...@fb.com, LKML, Ingo Molnar, syzkall...@googlegroups.com, Linux-MM, core...@netfilter.org, netfilt...@vger.kernel.org, Kirill A. Shutemov, David Rientjes, Kirill A. Shutemov, David Miller, Dmitry Vyukov

Hi,

On Wed, Jan 31, 2018 at 09:19:16AM +0100, Michal Hocko wrote:
[...]

Patchwork didn't catch this patch for some reason, would you mind to
resend?

Thanks!

Andrew Morton

unread,

Feb 7, 2018, 2:06:46 PM2/7/18

to Pablo Neira Ayuso, Michal Hocko, Andrea Arcangeli, Yang Shi, Tetsuo Handa, netdev, gu...@fb.com, LKML, Ingo Molnar, syzkall...@googlegroups.com, Linux-MM, core...@netfilter.org, netfilt...@vger.kernel.org, Kirill A. Shutemov, David Rientjes, Kirill A. Shutemov, David Miller, Dmitry Vyukov

From: Michal Hocko <mho...@suse.com>
Subject: net/netfilter/x_tables.c: remove size check

Back in 2002 vmalloc used to BUG on too large sizes. We are much better
behaved these days and vmalloc simply returns NULL for those. Remove the

check as it simply not needed and the comment is even misleading.

Link: http://lkml.kernel.org/r/20180131081...@dhcp22.suse.cz

Suggested-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Michal Hocko <mho...@suse.com>

Reviewed-by: Andrew Morton <ak...@linux-foundation.org>
Cc: Florian Westphal <f...@strlen.de>
Cc: David S. Miller <da...@davemloft.net>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>

---

net/netfilter/x_tables.c | 4 ----
1 file changed, 4 deletions(-)

diff -puN net/netfilter/x_tables.c~net-netfilter-x_tablesc-remove-size-check net/netfilter/x_tables.c
--- a/net/netfilter/x_tables.c~net-netfilter-x_tablesc-remove-size-check
+++ a/net/netfilter/x_tables.c
@@ -1004,10 +1004,6 @@ struct xt_table_info *xt_alloc_table_inf

if (sz < sizeof(*info))
return NULL;

- /* Pedantry: prevent them from hitting BUG() in vmalloc.c --RR */

- if ((size >> PAGE_SHIFT) + 2 > totalram_pages)

- return NULL;
-
/* __GFP_NORETRY is not fully supported by kvmalloc but it should
* work reasonably well if sz is too large and bail out rather
* than shoot all processes down before realizing there is nothing

_

Pablo Neira Ayuso

unread,

Feb 7, 2018, 2:15:26 PM2/7/18

to Andrew Morton, Michal Hocko, Andrea Arcangeli, Yang Shi, Tetsuo Handa, netdev, gu...@fb.com, LKML, Ingo Molnar, syzkall...@googlegroups.com, Linux-MM, core...@netfilter.org, netfilt...@vger.kernel.org, Kirill A. Shutemov, David Rientjes, Kirill A. Shutemov, David Miller, Dmitry Vyukov

On Wed, Feb 07, 2018 at 11:06:42AM -0800, Andrew Morton wrote:
> From: Michal Hocko <mho...@suse.com>
> Subject: net/netfilter/x_tables.c: remove size check
>
> Back in 2002 vmalloc used to BUG on too large sizes. We are much better
> behaved these days and vmalloc simply returns NULL for those. Remove the
> check as it simply not needed and the comment is even misleading.

Applied, thanks!

Tetsuo Handa

unread,

Mar 23, 2018, 6:40:24 AM3/23/18

to syzbot, aarc...@redhat.com, ak...@linux-foundation.org, gu...@fb.com, kirill....@linux.intel.com, linux-...@vger.kernel.org, linu...@kvack.org, mho...@suse.com, mi...@kernel.org, rien...@google.com, syzkall...@googlegroups.com, yan...@alibaba-inc.com

On 2018/01/29 2:58, syzbot wrote:
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzk...@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> If you want to test a patch for this bug, please reply with:
> #syz test: git://repo/address.git branch
> and provide the patch inline or as an attachment.
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug report.
> Note: all commands must start from beginning of the line in the email body.

#syz fix: netfilter: x_tables: make allocation less aggressive

Why not to backport to 4.13+ ?

Michal Hocko

unread,

Mar 23, 2018, 7:23:26 AM3/23/18

to Tetsuo Handa, syzbot, aarc...@redhat.com, ak...@linux-foundation.org, gu...@fb.com, kirill....@linux.intel.com, linux-...@vger.kernel.org, linu...@kvack.org, mi...@kernel.org, rien...@google.com, syzkall...@googlegroups.com, yan...@alibaba-inc.com, Pablo Neira Ayuso

You should CC maintainer to get an answer

Reply all

Reply to author

Forward