WARNING: ODEBUG bug in tcf_queue_work

26 views
Skip to first unread message

syzbot

unread,
Mar 5, 2020, 1:45:11ā€ÆPM3/5/20
to da...@davemloft.net, j...@mojatatu.com, ji...@resnulli.us, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com, xiyou.w...@gmail.com
Hello,

syzbot found the following crash on:

HEAD commit: 63623fd4 Merge tag 'for-linus' of git://git.kernel.org/pub..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10535e45e00000
kernel config: https://syzkaller.appspot.com/x/.config?x=9833e26bab355358
dashboard link: https://syzkaller.appspot.com/bug?extid=9c2df9fd5e9445b74e01
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=168c4839e00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=10587419e00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+9c2df9...@syzkaller.appspotmail.com

------------[ cut here ]------------
ODEBUG: activate active (active state 1) object type: rcu_head hint: 0x0
WARNING: CPU: 0 PID: 9599 at lib/debugobjects.c:485 debug_print_object+0x168/0x250 lib/debugobjects.c:485
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 9599 Comm: syz-executor772 Not tainted 5.6.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x197/0x210 lib/dump_stack.c:118
panic+0x2e3/0x75c kernel/panic.c:221
__warn.cold+0x2f/0x3e kernel/panic.c:582
report_bug+0x289/0x300 lib/bug.c:195
fixup_bug arch/x86/kernel/traps.c:174 [inline]
fixup_bug arch/x86/kernel/traps.c:169 [inline]
do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267
do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286
invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
RIP: 0010:debug_print_object+0x168/0x250 lib/debugobjects.c:485
Code: dd 00 e7 91 88 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 b5 00 00 00 48 8b 14 dd 00 e7 91 88 48 c7 c7 60 dc 91 88 e8 07 6e 9f fd <0f> 0b 83 05 03 6c ff 06 01 48 83 c4 20 5b 41 5c 41 5d 41 5e 5d c3
RSP: 0018:ffffc90005cd70b0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff815ebe46 RDI: fffff52000b9ae08
RBP: ffffc90005cd70f0 R08: ffff888093472400 R09: fffffbfff16a3370
R10: fffffbfff16a336f R11: ffffffff8b519b7f R12: 0000000000000001
R13: ffffffff89bac220 R14: 0000000000000000 R15: 1ffff92000b9ae24
debug_object_activate+0x346/0x470 lib/debugobjects.c:652
debug_rcu_head_queue kernel/rcu/rcu.h:176 [inline]
__call_rcu kernel/rcu/tree.c:2597 [inline]
call_rcu+0x2f/0x700 kernel/rcu/tree.c:2683
queue_rcu_work+0x8a/0xa0 kernel/workqueue.c:1742
tcf_queue_work+0xd3/0x110 net/sched/cls_api.c:206
route4_change+0x19e8/0x2250 net/sched/cls_route.c:550
tc_new_tfilter+0xb82/0x2480 net/sched/cls_api.c:2103
rtnetlink_rcv_msg+0x824/0xaf0 net/core/rtnetlink.c:5427
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2478
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5454
netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
netlink_unicast+0x59e/0x7e0 net/netlink/af_netlink.c:1329
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1918
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:672
____sys_sendmsg+0x753/0x880 net/socket.c:2343
___sys_sendmsg+0x100/0x170 net/socket.c:2397
__sys_sendmsg+0x105/0x1d0 net/socket.c:2430
__do_sys_sendmsg net/socket.c:2439 [inline]
__se_sys_sendmsg net/socket.c:2437 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2437
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x446709
Code: e8 1c ba 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 ab 0e fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fb9e0c67d98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000006dbc68 RCX: 0000000000446709
RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000003
RBP: 00000000006dbc60 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dbc6c
R13: 0000000000000005 R14: 00a3a20740000000 R15: 0507002400000038
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

Hillf Danton

unread,
Mar 6, 2020, 1:44:47ā€ÆAM3/6/20
to syzbot, da...@davemloft.net, j...@mojatatu.com, ji...@resnulli.us, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com, xiyou.w...@gmail.com

On Thu, 05 Mar 2020 10:45:10 -0800
Add a put callback for cls_route4_ops in an attempt to pair with get.

--- a/net/sched/cls_route.c
+++ b/net/sched/cls_route.c
@@ -48,6 +48,8 @@ struct route4_filter {
u32 id;
int iif;

+ atomic_t ref;
+
struct tcf_result res;
struct tcf_exts exts;
u32 handle;
@@ -56,6 +58,12 @@ struct route4_filter {
struct rcu_work rwork;
};

+static inline void route4_get_filter(struct route4_filter *filter)
+{
+ if (filter)
+ atomic_inc(&filter->ref);
+}
+
#define ROUTE4_FAILURE ((struct route4_filter *)(-1L))

static inline int route4_fastmap_hash(u32 id, int iif)
@@ -232,8 +240,10 @@ static void *route4_get(struct tcf_proto
for (f = rtnl_dereference(b->ht[h2]);
f;
f = rtnl_dereference(f->next))
- if (f->handle == handle)
+ if (f->handle == handle) {
+ route4_get_filter(f);
return f;
+ }
}
return NULL;
}
@@ -267,9 +277,14 @@ static void route4_delete_filter_work(st
rtnl_unlock();
}

-static void route4_queue_work(struct route4_filter *f)
+static inline void route4_put_filter(struct tcf_proto *tp,
+ struct route4_filter *filter)
{
- tcf_queue_work(&f->rwork, route4_delete_filter_work);
+ if (!filter || !atomic_dec_and_test(&filter->ref))
+ return;
+ tcf_unbind_filter(tp, &filter->res);
+ tcf_exts_get_net(&filter->exts);
+ tcf_queue_work(&filter->rwork, route4_delete_filter_work);
}

static void route4_destroy(struct tcf_proto *tp, bool rtnl_held,
@@ -294,11 +309,7 @@ static void route4_destroy(struct tcf_pr

next = rtnl_dereference(f->next);
RCU_INIT_POINTER(b->ht[h2], next);
- tcf_unbind_filter(tp, &f->res);
- if (tcf_exts_get_net(&f->exts))
- route4_queue_work(f);
- else
- __route4_delete_filter(f);
+ route4_put_filter(tp, f);
}
}
RCU_INIT_POINTER(head->table[h1], NULL);
@@ -338,10 +349,7 @@ static int route4_delete(struct tcf_prot
*/
route4_reset_fastmap(head);

- /* Delete it */
- tcf_unbind_filter(tp, &f->res);
- tcf_exts_get_net(&f->exts);
- tcf_queue_work(&f->rwork, route4_delete_filter_work);
+ route4_put_filter(tp, f);

/* Strip RTNL protected tree */
for (i = 0; i <= 32; i++) {
@@ -523,6 +531,8 @@ static int route4_change(struct net *net
break;

tcf_block_netif_keep_dst(tp->chain->block);
+ route4_get_filter(f); /* for adding filter in hash table */
+ route4_get_filter(f); /* for putting fold as filter will be put by caller */
rcu_assign_pointer(f->next, f1);
rcu_assign_pointer(*fp, f);

@@ -536,6 +546,7 @@ static int route4_change(struct net *net
fp = &pfp->next, pfp = rtnl_dereference(*fp)) {
if (pfp == f) {
*fp = f->next;
+ route4_put_filter(tp, f);
break;
}
}
@@ -544,11 +555,7 @@ static int route4_change(struct net *net

route4_reset_fastmap(head);
*arg = f;
- if (fold) {
- tcf_unbind_filter(tp, &fold->res);
- tcf_exts_get_net(&fold->exts);
- tcf_queue_work(&fold->rwork, route4_delete_filter_work);
- }
+ route4_put_filter(tp, fold);
return 0;

errout:
@@ -649,12 +656,18 @@ static void route4_bind_class(void *fh,
f->res.class = cl;
}

+static void route4_put(struct tcf_proto *tp, void *f)
+{
+ route4_put_filter(tp, (struct route4_filter *) f);
+}
+
static struct tcf_proto_ops cls_route4_ops __read_mostly = {
.kind = "route",
.classify = route4_classify,
.init = route4_init,
.destroy = route4_destroy,
.get = route4_get,
+ .put = route4_put,
.change = route4_change,
.delete = route4_delete,
.walk = route4_walk,
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1964,6 +1964,7 @@ static int tc_new_tfilter(struct sk_buff
return -EPERM;

replay:
+ fh = NULL;
tp_created = 0;

err = nlmsg_parse_deprecated(n, sizeof(*t), tca, TCA_MAX,
@@ -2110,7 +2111,6 @@ replay:
goto errout;
}
} else if (n->nlmsg_flags & NLM_F_EXCL) {
- tfilter_put(tp, fh);
NL_SET_ERR_MSG(extack, "Filter already exists");
err = -EEXIST;
goto errout;
@@ -2128,13 +2128,13 @@ replay:
if (err == 0) {
tfilter_notify(net, skb, n, tp, block, q, parent, fh,
RTM_NEWTFILTER, false, rtnl_held);
- tfilter_put(tp, fh);
/* q pointer is NULL for shared blocks */
if (q)
q->flags &= ~TCQ_F_CAN_BYPASS;
}

errout:
+ tfilter_put(tp, fh);
if (err && tp_created)
tcf_chain_tp_delete_empty(chain, tp, rtnl_held, NULL);
errout_tp:

Cong Wang

unread,
Mar 11, 2020, 1:26:23ā€ÆAM3/11/20
to Hillf Danton, syzbot, David Miller, Jamal Hadi Salim, Jiri Pirko, Jakub Kicinski, LKML, Linux Kernel Network Developers, syzkaller-bugs
This does not look like a refcnt pairing issue, it seems to be a
double "free", more precisely, we call call_rcu() twice on the same
object. I guess for some reason 'fold' is still visible even after
it is scheduled to be freed by rcu work.

I will take a deeper look tomorrow.

Thanks.

Cong Wang

unread,
Mar 13, 2020, 11:06:29ā€ÆPM3/13/20
to syzbot, David Miller, Jamal Hadi Salim, Jiri Pirko, Jakub Kicinski, LKML, Linux Kernel Network Developers, syzkaller-bugs

syzbot

unread,
Mar 14, 2020, 12:25:03ā€ÆAM3/14/20
to da...@davemloft.net, j...@mojatatu.com, ji...@resnulli.us, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com, xiyou.w...@gmail.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger crash:

Reported-and-tested-by: syzbot+9c2df9...@syzkaller.appspotmail.com

Tested on:

commit: 29311b74 cls_route: remove the old filter from hashtable
git tree: https://github.com/congwang/linux.git tcindex
kernel config: https://syzkaller.appspot.com/x/.config?x=8e8e51c36c1e1ca7
dashboard link: https://syzkaller.appspot.com/bug?extid=9c2df9fd5e9445b74e01
compiler: gcc (GCC) 9.0.0 20181231 (experimental)

Note: testing is done by a robot and is best-effort only.

Cong Wang

unread,
Mar 16, 2020, 2:42:56ā€ÆPM3/16/20
to syzbot, David Miller, Jamal Hadi Salim, Jiri Pirko, Jakub Kicinski, LKML, Linux Kernel Network Developers, syzkaller-bugs
#syz fix: net_sched: cls_route: remove the right filter from hashtable
Reply all
Reply to author
Forward
0 new messages