kernel panic: audit: backlog limit exceeded

syzbot

Feb 24, 2020, 3:18:15 AM
to a...@unstable.cc, b.a.t...@lists.open-mesh.org, dan.ca...@oracle.com, da...@davemloft.net, epa...@redhat.com, fz...@cray.com, gre...@linuxfoundation.org, john.h...@intel.com, linux...@redhat.com, linux-...@vger.kernel.org, marekl...@neomailbox.ch, net...@vger.kernel.org, pa...@paul-moore.com, s...@simonwunderlich.de, syzkall...@googlegroups.com
Hello,

syzbot found the following crash on:

HEAD commit: 36a44bcd Merge branch 'bnxt_en-shutdown-and-kexec-kdump-re..
git tree: net
console output: https://syzkaller.appspot.com/x/log.txt?x=148bfdd9e00000
kernel config: https://syzkaller.appspot.com/x/.config?x=768cc3d3e277cc16
dashboard link: https://syzkaller.appspot.com/bug?extid=9a5e789e4725b9ef1316
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=151b1109e00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=128bfdd9e00000

The bug was bisected to:

commit 0c1b9970ddd4cc41002321c3877e7f91aacb896d
Author: Dan Carpenter <dan.ca...@oracle.com>
Date: Fri Jul 28 14:42:27 2017 +0000

staging: lustre: lustre: Off by two in lmv_fid2path()

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=17e6c3e9e00000
final crash: https://syzkaller.appspot.com/x/report.txt?x=1416c3e9e00000
console output: https://syzkaller.appspot.com/x/log.txt?x=1016c3e9e00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+9a5e78...@syzkaller.appspotmail.com
Fixes: 0c1b9970ddd4 ("staging: lustre: lustre: Off by two in lmv_fid2path()")

audit: audit_backlog=13 > audit_backlog_limit=7
audit: audit_lost=1 audit_rate_limit=0 audit_backlog_limit=7
Kernel panic - not syncing: audit: backlog limit exceeded
CPU: 1 PID: 9913 Comm: syz-executor024 Not tainted 5.6.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x197/0x210 lib/dump_stack.c:118
panic+0x2e3/0x75c kernel/panic.c:221
audit_panic.cold+0x32/0x32 kernel/audit.c:307
audit_log_lost kernel/audit.c:377 [inline]
audit_log_lost+0x8b/0x180 kernel/audit.c:349
audit_log_start kernel/audit.c:1788 [inline]
audit_log_start+0x70e/0x7c0 kernel/audit.c:1745
audit_log+0x95/0x120 kernel/audit.c:2345
xt_replace_table+0x61d/0x830 net/netfilter/x_tables.c:1413
__do_replace+0x1da/0x950 net/ipv6/netfilter/ip6_tables.c:1084
do_replace net/ipv6/netfilter/ip6_tables.c:1157 [inline]
do_ip6t_set_ctl+0x33a/0x4c8 net/ipv6/netfilter/ip6_tables.c:1681
nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
nf_setsockopt+0x77/0xd0 net/netfilter/nf_sockopt.c:115
ipv6_setsockopt net/ipv6/ipv6_sockglue.c:949 [inline]
ipv6_setsockopt+0x147/0x180 net/ipv6/ipv6_sockglue.c:933
tcp_setsockopt net/ipv4/tcp.c:3165 [inline]
tcp_setsockopt+0x8f/0xe0 net/ipv4/tcp.c:3159
sock_common_setsockopt+0x94/0xd0 net/core/sock.c:3149
__sys_setsockopt+0x261/0x4c0 net/socket.c:2130
__do_sys_setsockopt net/socket.c:2146 [inline]
__se_sys_setsockopt net/socket.c:2143 [inline]
__x64_sys_setsockopt+0xbe/0x150 net/socket.c:2143
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x44720a
Code: 49 89 ca b8 37 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 1a e0 fb ff c3 66 0f 1f 84 00 00 00 00 00 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 fa df fb ff c3 66 0f 1f 84 00 00 00 00 00
RSP: 002b:00007ffd032dec78 EFLAGS: 00000286 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 000000000044720a
RDX: 0000000000000040 RSI: 0000000000000029 RDI: 0000000000000003
RBP: 00007ffd032deda0 R08: 00000000000003b8 R09: 0000000000004000
R10: 00000000006d7b40 R11: 0000000000000286 R12: 00007ffd032deca0
R13: 00000000006d9d60 R14: 0000000000000029 R15: 00000000006d7ba0
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

Paul Moore

Feb 24, 2020, 5:38:43 PM
to syzbot, a...@unstable.cc, b.a.t...@lists.open-mesh.org, dan.ca...@oracle.com, da...@davemloft.net, Eric Paris, fz...@cray.com, gre...@linuxfoundation.org, john.h...@intel.com, linux...@redhat.com, linux-...@vger.kernel.org, marekl...@neomailbox.ch, net...@vger.kernel.org, s...@simonwunderlich.de, syzkall...@googlegroups.com
Similar to syzbot report 72461ac44b36c98f58e5, see my comments there.

--
paul moore
www.paul-moore.com

Eric Paris

Feb 24, 2020, 5:43:39 PM
to Paul Moore, syzbot, a...@unstable.cc, b.a.t...@lists.open-mesh.org, dan.ca...@oracle.com, da...@davemloft.net, fz...@cray.com, gre...@linuxfoundation.org, john.h...@intel.com, linux...@redhat.com, linux-...@vger.kernel.org, marekl...@neomailbox.ch, net...@vger.kernel.org, s...@simonwunderlich.de, syzkall...@googlegroups.com
https://syzkaller.appspot.com/x/repro.syz?x=151b1109e00000 (the
reproducer listed) looks like it is literally fuzzing the AUDIT_SET.
Which seems like this is working as designed if it is setting the
failure mode to 2.
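
For readers following along, here is a minimal sketch of the kind of AUDIT_SET payload being described. It only fills in a struct and sends nothing over netlink. The constants are written from memory of include/uapi/linux/audit.h and are defined locally so the snippet builds without kernel headers; treat them as assumptions. `struct audit_status_sketch` and `make_panic_request()` are hypothetical names, and the struct is a trimmed stand-in for the leading fields of the kernel's `struct audit_status`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed values, recalled from include/uapi/linux/audit.h. */
#define AUDIT_SET                  1001    /* netlink message type */
#define AUDIT_STATUS_FAILURE       0x0002  /* mask bit: 'failure' is valid */
#define AUDIT_STATUS_BACKLOG_LIMIT 0x0010  /* mask bit: 'backlog_limit' is valid */
#define AUDIT_FAIL_PANIC           2       /* the "failure mode 2" from the thread */

/* Hypothetical, trimmed stand-in for the start of struct audit_status. */
struct audit_status_sketch {
	uint32_t mask;          /* which of the fields below are meaningful */
	uint32_t enabled;
	uint32_t failure;       /* action taken when a record cannot be logged */
	uint32_t pid;
	uint32_t rate_limit;
	uint32_t backlog_limit; /* maximum number of queued records */
};

static_assert(sizeof(struct audit_status_sketch) == 6 * sizeof(uint32_t),
	      "sketch struct is expected to have no padding");

/* Build a request that selects panic-on-failure and a small backlog limit. */
static struct audit_status_sketch make_panic_request(uint32_t backlog_limit)
{
	struct audit_status_sketch s;

	memset(&s, 0, sizeof(s));
	s.mask = AUDIT_STATUS_FAILURE | AUDIT_STATUS_BACKLOG_LIMIT;
	s.failure = AUDIT_FAIL_PANIC;
	s.backlog_limit = backlog_limit;
	return s;
}
```

With a payload like `make_panic_request(7)` accepted by the kernel, the `audit_backlog=13 > audit_backlog_limit=7` line in the report would be expected to end in a panic rather than a log message.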

Paul Moore

Feb 24, 2020, 5:47:03 PM
to Eric Paris, syzbot, a...@unstable.cc, b.a.t...@lists.open-mesh.org, dan.ca...@oracle.com, da...@davemloft.net, fz...@cray.com, gre...@linuxfoundation.org, john.h...@intel.com, linux...@redhat.com, linux-...@vger.kernel.org, marekl...@neomailbox.ch, net...@vger.kernel.org, s...@simonwunderlich.de, syzkall...@googlegroups.com
On Mon, Feb 24, 2020 at 5:43 PM Eric Paris <epa...@redhat.com> wrote:
> https://syzkaller.appspot.com/x/repro.syz?x=151b1109e00000 (the
> reproducer listed) looks like it is literally fuzzing the AUDIT_SET.
> Which seems like this is working as designed if it is setting the
> failure mode to 2.

So it is, good catch :) I saw the panic and instinctively chalked
that up to a mistaken config, not expecting that it was what was being
tested.
--
paul moore
www.paul-moore.com

Dmitry Vyukov

Feb 27, 2020, 10:40:11 AM
to Paul Moore, Tetsuo Handa, Eric Paris, syzbot, a...@unstable.cc, b.a.t...@lists.open-mesh.org, Dan Carpenter, David Miller, fz...@cray.com, Greg Kroah-Hartman, john.h...@intel.com, linux...@redhat.com, LKML, marekl...@neomailbox.ch, netdev, s...@simonwunderlich.de, syzkaller-bugs, syzkaller
On Mon, Feb 24, 2020 at 11:47 PM Paul Moore <pa...@paul-moore.com> wrote:
>
> On Mon, Feb 24, 2020 at 5:43 PM Eric Paris <epa...@redhat.com> wrote:
> > https://syzkaller.appspot.com/x/repro.syz?x=151b1109e00000 (the
> > reproducer listed) looks like it is literally fuzzing the AUDIT_SET.
> > Which seems like this is working as designed if it is setting the
> > failure mode to 2.
>
> So it is, good catch :) I saw the panic and instinctively chalked
> that up to a mistaken config, not expecting that it was what was being
> tested.

Yes, this audit failure mode is quite unpleasant for fuzzing. And
since this is not a top-level syscall argument value, it's effectively
impossible to filter out in the fuzzer. Maybe another use case for the
"fuzzer lockdown" feature +Tetsuo proposed.
With the current state of things, I think our only option is to
disable fuzzing of audit, which is a pity because it has found 5 or
so real bugs in audit too.
But this has effectively happened anyway: audit is only reachable from
the init pid namespace, and syzkaller always unshares the pid namespace
for sandboxing reasons. That unshare was removed accidentally, which is
how syzkaller managed to find the bugs. The unshare is restored now:
https://github.com/google/syzkaller/commit/5e0e1d1450d7c3497338082fc28912fdd7f93a3c

As a side effect, all other real bugs in audit will be auto-obsoleted
in the future if not fixed, because they will stop happening.

#syz invalid

Paul Moore

Feb 27, 2020, 7:14:28 PM
to Dmitry Vyukov, Tetsuo Handa, Eric Paris, syzbot, a...@unstable.cc, b.a.t...@lists.open-mesh.org, Dan Carpenter, David Miller, fz...@cray.com, Greg Kroah-Hartman, john.h...@intel.com, linux...@redhat.com, LKML, marekl...@neomailbox.ch, netdev, s...@simonwunderlich.de, syzkaller-bugs, syzkaller
On Thu, Feb 27, 2020 at 10:40 AM Dmitry Vyukov <dvy...@google.com> wrote:
> On Mon, Feb 24, 2020 at 11:47 PM Paul Moore <pa...@paul-moore.com> wrote:
> > On Mon, Feb 24, 2020 at 5:43 PM Eric Paris <epa...@redhat.com> wrote:
> > > https://syzkaller.appspot.com/x/repro.syz?x=151b1109e00000 (the
> > > reproducer listed) looks like it is literally fuzzing the AUDIT_SET.
> > > Which seems like this is working as designed if it is setting the
> > > failure mode to 2.
> >
> > So it is, good catch :) I saw the panic and instinctively chalked
> > that up to a mistaken config, not expecting that it was what was being
> > tested.
>
> Yes, this audit failure mode is quite unpleasant for fuzzing. And
> since this is not a top-level syscall argument value, it's effectively
> impossible to filter out in the fuzzer. Maybe another use case for the
> "fuzzer lockdown" feature +Tetsuo proposed.
> With the current state of things, I think our only option is to
> disable fuzzing of audit, which is a pity because it has found 5 or
> so real bugs in audit too.
> But this has effectively happened anyway: audit is only reachable from
> the init pid namespace, and syzkaller always unshares the pid namespace
> for sandboxing reasons. That unshare was removed accidentally, which is
> how syzkaller managed to find the bugs. The unshare is restored now:
> https://github.com/google/syzkaller/commit/5e0e1d1450d7c3497338082fc28912fdd7f93a3c
>
> As a side effect, all other real bugs in audit will be auto-obsoleted
> in the future if not fixed, because they will stop happening.

On the plus side, I did submit fixes for the other real audit bugs
that syzbot found recently and Linus pulled them into the tree today
so at least we have that small victory.

We could consider adding a fuzz-friendly build time config which would
disable the panic failsafe, but it probably isn't worth it at the
moment considering syzbot's pid namespace limitations.

--
paul moore
www.paul-moore.com

Tetsuo Handa

Feb 28, 2020, 5:03:31 AM
to Paul Moore, Dmitry Vyukov, syzbot, LKML, syzkaller-bugs, syzkaller
On 2020/02/28 9:14, Paul Moore wrote:
> We could consider adding a fuzz-friendly build time config which would
> disable the panic failsafe, but it probably isn't worth it at the
> moment considering syzbot's pid namespace limitations.
>

I think adding a fuzz-friendly build-time config is worthwhile. For
example, we have locations where printk() emits "BUG:" or "WARNING:" and
the fuzzer mistakes that for a crash. The PID namespace is irrelevant here.
I proposed one at
https://lkml.kernel.org/r/20191216095955.988...@I-love.SAKURA.ne.jp .
I appreciate your response.

Paul Moore

Feb 28, 2020, 8:09:07 AM
to Tetsuo Handa, Dmitry Vyukov, syzbot, LKML, syzkaller-bugs, syzkaller
To be clear, I was talking specifically about the intentional panic in
audit_panic(). It is different from every other panic I've ever seen
(perhaps there are others?) in that it doesn't indicate a serious
error condition in the kernel, it indicates that audit records were
dropped. It seems extreme to most people, but some use cases require
that the system panic rather than lose audit records.

My suggestion was that we could introduce a Kconfig build flag that
syzbot (and other fuzzers) could use to make the AUDIT_FAIL_PANIC case
in audit_panic() less panicky. However, as syzbot isn't currently
able to test the kernel's audit code due to its pid namespace
restrictions, it doesn't make much sense to add this capability. If
syzbot removes that restriction, or when we get to the point that we
support multiple audit daemons, we can revisit this.
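
To make the idea concrete, here is a small userspace model of the decision being discussed. It is a sketch, not kernel code: the backlog test and the three failure modes are modeled on the behavior described in this thread (limit treated as "0 means unlimited", which is an assumption), and the `fuzz_friendly` flag is a hypothetical stand-in for the Kconfig option floated above, which does not exist. All names in the snippet are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Failure modes as discussed in the thread (mode 2 is the panic case). */
enum fail_mode { FAIL_SILENT = 0, FAIL_PRINTK = 1, FAIL_PANIC = 2 };

/* What the model decides to do when an audit record is lost. */
enum lost_action { ACT_NONE, ACT_LOG, ACT_PANIC };

static_assert(FAIL_PANIC == 2, "matches the failure mode 2 in the report");

/* True when the queue is over its limit; a limit of 0 is assumed to
 * mean "no limit". */
static bool backlog_exceeded(unsigned int backlog, unsigned int limit)
{
	return limit != 0 && backlog > limit;
}

/* fuzz_friendly is hypothetical: it downgrades the intentional panic to
 * a log message and leaves the other modes untouched. */
static enum lost_action on_lost_record(enum fail_mode mode, bool fuzz_friendly)
{
	switch (mode) {
	case FAIL_SILENT:
		return ACT_NONE;
	case FAIL_PRINTK:
		return ACT_LOG;
	case FAIL_PANIC:
		return fuzz_friendly ? ACT_LOG : ACT_PANIC;
	}
	return ACT_NONE; /* unknown mode: do nothing in this model */
}
```

With the values from the report, `backlog_exceeded(13, 7)` is true, so `on_lost_record(FAIL_PANIC, false)` yields `ACT_PANIC`, while the hypothetical fuzz-friendly build would only log.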

--
paul moore
www.paul-moore.com

Dmitry Vyukov

Mar 2, 2020, 3:42:54 AM
to Paul Moore, Tetsuo Handa, Eric Paris, syzbot, a...@unstable.cc, b.a.t...@lists.open-mesh.org, Dan Carpenter, David Miller, fz...@cray.com, Greg Kroah-Hartman, john.h...@intel.com, linux...@redhat.com, LKML, marekl...@neomailbox.ch, netdev, s...@simonwunderlich.de, syzkaller-bugs, syzkaller
+1!

Dmitry Vyukov

Mar 2, 2020, 3:47:33 AM
to Paul Moore, Tetsuo Handa, syzbot, LKML, syzkaller-bugs, syzkaller
Yes, we need some story for both panic and pid ns.

We also use a separate net ns, but allow fuzzer to create some sockets
in the init net ns to overcome similar limitations. This is done using
a pseudo-syscall hack:
https://github.com/google/syzkaller/blob/4a4e0509de520c7139ca2b5606712cbadc550db2/executor/common_linux.h#L1546-L1562

But the pid ns is different and looks a bit harder as we need it
during send of netlink messages.

As a strawman proposal: the comment there says "for now":

	/* Only support auditd and auditctl in initial pid namespace
	 * for now. */
	if (task_active_pid_ns(current) != &init_pid_ns)
		return -EPERM;

What does that mean? Is it a kind of TODO? I mean if removing that
limitation is useful for other reasons, then maybe we could kill 2
birds with 1 stone.
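
As an aside, a process can get a rough idea from userspace of whether it would pass that check by looking at its pid namespace link in procfs. The sketch below assumes the initial pid namespace has inode number 0xEFFFFFFC (recalled from the kernel's PROC_PID_INIT_INO; an assumption, not something stated in this thread) and that /proc is mounted. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Assumed inode number of the initial pid namespace. */
#define INIT_PID_NS_INO 0xEFFFFFFCu

/* Parse a link target such as "pid:[4026531836]".
 * Returns false if the text does not have that shape. */
bool parse_pid_ns_ino(const char *target, unsigned int *ino)
{
	assert(target != NULL && ino != NULL);
	return sscanf(target, "pid:[%u]", ino) == 1;
}

/* Returns 1 if the caller appears to be in the initial pid namespace,
 * 0 if it is in another one, and -1 if that cannot be determined. */
int in_init_pid_ns(void)
{
	char buf[64];
	unsigned int ino;
	ssize_t n = readlink("/proc/self/ns/pid", buf, sizeof(buf) - 1);

	if (n < 0)
		return -1;
	buf[n] = '\0'; /* readlink() does not terminate the string */
	if (!parse_pid_ns_ino(buf, &ino))
		return -1;
	return ino == INIT_PID_NS_INO ? 1 : 0;
}
```

A sandboxed executor that has unshared its pid namespace would be expected to see 0 here, which matches the -EPERM path quoted above.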

Paul Moore

Mar 2, 2020, 8:43:40 AM
to Dmitry Vyukov, Tetsuo Handa, syzbot, LKML, syzkaller-bugs, syzkaller
Long story made short - the audit subsystem doesn't handle namespaces
or containers as well as it should. Work is ongoing to add the
necessary support, but it isn't there yet and I don't want us to just
start removing restrictions until we have the proper support in place
(this is what I alluded to with my "... when we get to the point that we
support multiple audit daemons, we can revisit this").

--
paul moore
www.paul-moore.com

Dmitry Vyukov

Mar 2, 2020, 9:25:49 AM
to Paul Moore, Tetsuo Handa, syzbot, LKML, syzkaller-bugs, syzkaller
I see. Thanks for context.

FTR we've started collecting such cases
(panic-but-working-as-intended-and-hard-to-selectively-filter-out) in
https://github.com/google/syzkaller/issues/1622. So that they are not
lost in future.