inconsistent lock state in rxrpc_put_client_conn


syzbot

Feb 3, 2020, 7:38:12 PM
to da...@davemloft.net, dhow...@redhat.com, ku...@kernel.org, linu...@lists.infradead.org, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot found the following crash on:

HEAD commit: 3d80c653 Merge tag 'rxrpc-fixes-20200203' of git://git.ker..
git tree: net
console output: https://syzkaller.appspot.com/x/log.txt?x=16a38595e00000
kernel config: https://syzkaller.appspot.com/x/.config?x=95b275782b150c86
dashboard link: https://syzkaller.appspot.com/bug?extid=3f1fd6b8cbf8702d134e
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=14ac314ee00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13ec4c5ee00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+3f1fd6...@syzkaller.appspotmail.com

================================
WARNING: inconsistent lock state
5.5.0-syzkaller #0 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/1/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
ffff88808e8fa1c8 (&(&local->client_conns_lock)->rlock){+.?.}, at: spin_lock include/linux/spinlock.h:338 [inline]
ffff88808e8fa1c8 (&(&local->client_conns_lock)->rlock){+.?.}, at: rxrpc_put_one_client_conn net/rxrpc/conn_client.c:948 [inline]
ffff88808e8fa1c8 (&(&local->client_conns_lock)->rlock){+.?.}, at: rxrpc_put_client_conn+0x6ed/0xc90 net/rxrpc/conn_client.c:1001
{SOFTIRQ-ON-W} state was registered at:
lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4484
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:338 [inline]
rxrpc_get_client_conn net/rxrpc/conn_client.c:304 [inline]
rxrpc_connect_call+0x358/0x4e30 net/rxrpc/conn_client.c:701
rxrpc_new_client_call+0x9c0/0x1ad0 net/rxrpc/call_object.c:290
rxrpc_new_client_call_for_sendmsg net/rxrpc/sendmsg.c:595 [inline]
rxrpc_do_sendmsg+0xffa/0x1d5f net/rxrpc/sendmsg.c:652
rxrpc_sendmsg+0x4d6/0x5f0 net/rxrpc/af_rxrpc.c:586
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:672
____sys_sendmsg+0x358/0x880 net/socket.c:2343
___sys_sendmsg+0x100/0x170 net/socket.c:2397
__sys_sendmmsg+0x1bf/0x4d0 net/socket.c:2487
__do_sys_sendmmsg net/socket.c:2516 [inline]
__se_sys_sendmmsg net/socket.c:2513 [inline]
__x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2513
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
irq event stamp: 130510
hardirqs last enabled at (130510): [<ffffffff87e8d446>] __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline]
hardirqs last enabled at (130510): [<ffffffff87e8d446>] _raw_spin_unlock_irqrestore+0x66/0xe0 kernel/locking/spinlock.c:191
hardirqs last disabled at (130509): [<ffffffff87e8d7bf>] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
hardirqs last disabled at (130509): [<ffffffff87e8d7bf>] _raw_spin_lock_irqsave+0x6f/0xcd kernel/locking/spinlock.c:159
softirqs last enabled at (130494): [<ffffffff8147535c>] _local_bh_enable+0x1c/0x30 kernel/softirq.c:162
softirqs last disabled at (130495): [<ffffffff81477d5b>] invoke_softirq kernel/softirq.c:373 [inline]
softirqs last disabled at (130495): [<ffffffff81477d5b>] irq_exit+0x19b/0x1e0 kernel/softirq.c:413

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&(&local->client_conns_lock)->rlock);
<Interrupt>
lock(&(&local->client_conns_lock)->rlock);

*** DEADLOCK ***

1 lock held by swapper/1/0:
#0: ffffffff89babe80 (rcu_callback){....}, at: rcu_do_batch kernel/rcu/tree.c:2176 [inline]
#0: ffffffff89babe80 (rcu_callback){....}, at: rcu_core+0x562/0x1390 kernel/rcu/tree.c:2410

stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.5.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x197/0x210 lib/dump_stack.c:118
print_usage_bug.cold+0x327/0x378 kernel/locking/lockdep.c:3100
valid_state kernel/locking/lockdep.c:3111 [inline]
mark_lock_irq kernel/locking/lockdep.c:3308 [inline]
mark_lock+0xbb4/0x1220 kernel/locking/lockdep.c:3665
mark_usage kernel/locking/lockdep.c:3565 [inline]
__lock_acquire+0x1e8e/0x4a00 kernel/locking/lockdep.c:3908
lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4484
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:338 [inline]
rxrpc_put_one_client_conn net/rxrpc/conn_client.c:948 [inline]
rxrpc_put_client_conn+0x6ed/0xc90 net/rxrpc/conn_client.c:1001
rxrpc_put_connection net/rxrpc/ar-internal.h:965 [inline]
rxrpc_rcu_destroy_call+0xbd/0x200 net/rxrpc/call_object.c:572
rcu_do_batch kernel/rcu/tree.c:2186 [inline]
rcu_core+0x5e1/0x1390 kernel/rcu/tree.c:2410
rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2419
__do_softirq+0x262/0x98c kernel/softirq.c:292
invoke_softirq kernel/softirq.c:373 [inline]
irq_exit+0x19b/0x1e0 kernel/softirq.c:413
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x1a3/0x610 arch/x86/kernel/apic/apic.c:1137
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829
</IRQ>
RIP: 0010:native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:61
Code: b8 43 cb f9 eb 8a cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d 24 bf 5f 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 14 bf 5f 00 fb f4 <c3> cc 55 48 89 e5 41 57 41 56 41 55 41 54 53 e8 4e 19 7a f9 e8 e9
RSP: 0018:ffffc90000d3fd68 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
RAX: 1ffffffff13675b2 RBX: ffff8880a99fc340 RCX: 0000000000000000
RDX: dffffc0000000000 RSI: 0000000000000006 RDI: ffff8880a99fcbd4
RBP: ffffc90000d3fd98 R08: ffff8880a99fc340 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000
R13: ffffffff8aa3e080 R14: 0000000000000000 R15: 0000000000000001
arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:686
default_idle_call+0x84/0xb0 kernel/sched/idle.c:94
cpuidle_idle_call kernel/sched/idle.c:154 [inline]
do_idle+0x3c8/0x6e0 kernel/sched/idle.c:269
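
To summarise the report: local->client_conns_lock is taken with plain spin_lock() in process context (the sendmsg path recorded under SOFTIRQ-ON-W above), and the same lock is now also taken from softirq context, because rxrpc_rcu_destroy_call() runs as an RCU callback and drops the last connection ref there. A minimal sketch of the flagged pattern, purely illustrative (example_lock and both function names are stand-ins, not rxrpc code):

/* Process context, BHs enabled; corresponds to the sendmsg path. */
static DEFINE_SPINLOCK(example_lock);

static void process_context_path(void)
{
	spin_lock(&example_lock);
	/* ... if a softirq fires on this CPU here ... */
	spin_unlock(&example_lock);
}

/* Softirq context; corresponds to the RCU-callback path. */
static void softirq_path(void)
{
	spin_lock(&example_lock);	/* ... it spins here on the lock it
					 * interrupted: self-deadlock */
	spin_unlock(&example_lock);
}

The usual ways out are either to take the lock with spin_lock_bh() (or _irqsave) on the process-context side, or to stop taking it from softirq context altogether, for example by deferring the softirq-side work to a workqueue; the rest of the thread weighs these options.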


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

syzbot

Feb 4, 2020, 12:41:03 AM
to da...@davemloft.net, dhow...@redhat.com, ku...@kernel.org, linu...@lists.infradead.org, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com
syzbot has bisected this bug to:

commit 5273a191dca65a675dc0bcf3909e59c6933e2831
Author: David Howells <dhow...@redhat.com>
Date: Thu Jan 30 21:50:36 2020 +0000

rxrpc: Fix NULL pointer deref due to call->conn being cleared on disconnect

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1182314ee00000
start commit: 3d80c653 Merge tag 'rxrpc-fixes-20200203' of git://git.ker..
git tree: net
final crash: https://syzkaller.appspot.com/x/report.txt?x=1382314ee00000
console output: https://syzkaller.appspot.com/x/log.txt?x=1582314ee00000
Reported-by: syzbot+3f1fd6...@syzkaller.appspotmail.com
Fixes: 5273a191dca6 ("rxrpc: Fix NULL pointer deref due to call->conn being cleared on disconnect")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

Hillf Danton

Feb 4, 2020, 3:40:26 AM
to syzbot, da...@davemloft.net, dhow...@redhat.com, ku...@kernel.org, linu...@lists.infradead.org, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com

Mon, 03 Feb 2020 16:38:12 -0800 (PST)
> syzbot found the following crash on:
>
> HEAD commit: 3d80c653 Merge tag 'rxrpc-fixes-20200203' of git://git.ker..
> git tree: net
> console output: https://syzkaller.appspot.com/x/log.txt?x=16a38595e00000
> compiler: gcc (GCC) 9.0.0 20181231 (experimental)
Take lock with irq quiesced.

--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -282,6 +282,7 @@ static int rxrpc_get_client_conn(struct
 	struct rxrpc_local *local = cp->local;
 	struct rb_node *p, **pp, *parent;
 	long diff;
+	unsigned long flags;
 	int ret = -ENOMEM;
 
 	_enter("{%d,%lx},", call->debug_id, call->user_call_ID);
@@ -301,7 +302,7 @@ static int rxrpc_get_client_conn(struct
 	 */
 	if (!cp->exclusive) {
 		_debug("search 1");
-		spin_lock(&local->client_conns_lock);
+		spin_lock_irqsave(&local->client_conns_lock, flags);
 		p = local->client_conns.rb_node;
 		while (p) {
 			conn = rb_entry(p, struct rxrpc_connection, client_node);
@@ -328,7 +329,7 @@ static int rxrpc_get_client_conn(struct
 				break;
 			}
 		}
-		spin_unlock(&local->client_conns_lock);
+		spin_unlock_irqrestore(&local->client_conns_lock, flags);
 	}
 
 	/* There wasn't a connection yet or we need an exclusive connection.
@@ -365,7 +366,7 @@ static int rxrpc_get_client_conn(struct
 	 * conflicting instance.
 	 */
 	_debug("search 2");
-	spin_lock(&local->client_conns_lock);
+	spin_lock_irqsave(&local->client_conns_lock, flags);
 
 	pp = &local->client_conns.rb_node;
 	parent = NULL;
@@ -408,7 +409,7 @@ candidate_published:
 	call->security = candidate->security;
 	call->security_ix = candidate->security_ix;
 	call->service_id = candidate->service_id;
-	spin_unlock(&local->client_conns_lock);
+	spin_unlock_irqrestore(&local->client_conns_lock, flags);
 	_leave(" = 0 [new %d]", candidate->debug_id);
 	return 0;
 
@@ -418,7 +419,7 @@ candidate_published:
 	 */
 found_extant_conn:
 	_debug("found conn");
-	spin_unlock(&local->client_conns_lock);
+	spin_unlock_irqrestore(&local->client_conns_lock, flags);
 
 	if (candidate) {
 		trace_rxrpc_client(candidate, -1, rxrpc_client_duplicate);

David Howells

Feb 6, 2020, 8:09:46 AM
to Hillf Danton, dhow...@redhat.com, syzbot, da...@davemloft.net, ku...@kernel.org, linu...@lists.infradead.org, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com
Hillf Danton <hda...@sina.com> wrote:

> Take lock with irq quiesced.

I think that's overkill. It only needs _bh annotations, not _irqsave/restore
- but even that is probably not the best way.

The best way is to offload the stuff done by rxrpc_rcu_destroy_call() to a
workqueue if called in softirq mode. I'm not sure whether rcu callbacks are
done in softirq mode - if they are, then it can just call rxrpc_queue_work().

David
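
In the trace above the RCU callback is indeed invoked from softirq context (rcu_core_si via __do_softirq), so the offload David describes would make the connection put run in process context instead. A generic sketch of that shape, purely illustrative (my_obj, my_obj_free_workfn and schedule_work() are stand-ins; rxrpc would use its own rxrpc_queue_work() helper):

#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct my_obj {
	union {
		struct rcu_head rcu;		/* used until the grace period ends */
		struct work_struct work;	/* reused once the callback has run */
	};
	/* ... payload ... */
};

/* Runs on a workqueue (process context), so it can safely take locks that
 * are elsewhere taken with plain spin_lock().
 */
static void my_obj_free_workfn(struct work_struct *work)
{
	struct my_obj *obj = container_of(work, struct my_obj, work);

	kfree(obj);
}

/* RCU callback: runs from RCU_SOFTIRQ, so it only queues the real work. */
static void my_obj_rcu_free(struct rcu_head *rcu)
{
	struct my_obj *obj = container_of(rcu, struct my_obj, rcu);

	INIT_WORK(&obj->work, my_obj_free_workfn);
	schedule_work(&obj->work);
}

/* Freeing side: call_rcu(&obj->rcu, my_obj_rcu_free); */

The rcu_head and the work_struct can share storage because the rcu_head is finished with by the time the callback runs; Hillf's follow-up below applies the same shape to struct rxrpc_call.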

Hillf Danton

Feb 6, 2020, 10:16:15 PM
to David Howells, syzbot, da...@davemloft.net, ku...@kernel.org, linu...@lists.infradead.org, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com

On Thu, 06 Feb 2020 13:09:36 +0000 David Howells wrote:
>
> > Take lock with irq quiesced.
>
> I think that's overkill. It only needs _bh annotations, not _irqsave/restore
> - but even that is probably not the best way.
>
> The best way is to offload the stuff done by rxrpc_rcu_destroy_call() to a
> workqueue if called in softirq mode. I'm not sure whether rcu callbacks are
> done in softirq mode - if they are, then it can just call rxrpc_queue_work().

Fair.
It may look like this:

--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -551,7 +551,10 @@ enum rxrpc_congest_mode {
  * - matched by { connection, call_id }
  */
 struct rxrpc_call {
-	struct rcu_head rcu;
+	union {
+		struct rcu_head rcu;
+		struct work_struct destruct_work;
+	};
 	struct rxrpc_connection *conn;	/* connection carrying call */
 	struct rxrpc_peer *peer;	/* Peer record for remote address */
 	struct rxrpc_sock __rcu *socket;	/* socket responsible */
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -561,12 +561,10 @@ void rxrpc_put_call(struct rxrpc_call *c
 	}
 }
 
-/*
- * Final call destruction under RCU.
- */
-static void rxrpc_rcu_destroy_call(struct rcu_head *rcu)
+static void rxrpc_destruct_call_workfn(struct work_struct *work)
 {
-	struct rxrpc_call *call = container_of(rcu, struct rxrpc_call, rcu);
+	struct rxrpc_call *call = container_of(work, struct rxrpc_call,
+					       destruct_work);
 	struct rxrpc_net *rxnet = call->rxnet;
 
 	rxrpc_put_peer(call->peer);
@@ -578,6 +576,17 @@ static void rxrpc_rcu_destroy_call(struc
 }
 
 /*
+ * Final call destruction under RCU.
+ */
+static void rxrpc_rcu_destroy_call(struct rcu_head *rcu)
+{
+	struct rxrpc_call *call = container_of(rcu, struct rxrpc_call, rcu);
+
+	INIT_WORK(&call->destruct_work, rxrpc_destruct_call_workfn);
+	rxrpc_queue_work(&call->destruct_work);
+}
+
+/*
  * clean up a call
  */
 void rxrpc_cleanup_call(struct rxrpc_call *call)

David Howells

Feb 7, 2020, 2:23:25 AM
to Hillf Danton, dhow...@redhat.com, syzbot, da...@davemloft.net, ku...@kernel.org, linu...@lists.infradead.org, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com