When a BPF program of type BPF_PROG_TYPE_CGROUP_SOCK_ADDR is attached at the BPF_CGROUP_UNIX_RECVMSG hook, a lock-ordering inversion can occur between sk_setsockopt() on one side and __unix_dgram_recvmsg() or unix_stream_read_generic() on the other. This can result in a deadlock, specifically in unix_set_peek_off() (net/unix/af_unix.c:789) or unix_stream_read_generic() (net/unix/af_unix.c:2775).
Deadlock between sk_setsockopt() and __unix_dgram_recvmsg():

Thread 0                                    | Thread 1
--------------------------------------------+--------------------------------------------------
sk_setsockopt(sk)                           |
1. sockopt_lock_sock(sk)                    |
                                            | __unix_dgram_recvmsg()
                                            | 2. mutex_lock(&u->iolock)
                                            |    BPF_CGROUP_RUN_PROG_UNIX_RECVMSG_LOCK(sk)
                                            | 3. lock_sock(sk)
unix_set_peek_off()                         |
4. if (mutex_lock_interruptible(&u->iolock))|
   // deadlock                              |
--------------------------------------------------------------------------------------------

Deadlock between sk_setsockopt() and unix_stream_read_generic():

Thread 0                                    | Thread 1
--------------------------------------------+--------------------------------------------------
unix_stream_read_generic(state)             |
1. mutex_lock(&u->iolock)                   |
                                            | sk_setsockopt(sk)
                                            | 2. sockopt_lock_sock(sk)
                                            |    unix_set_peek_off()
                                            | 3. if (mutex_lock_interruptible(&u->iolock))
BPF_CGROUP_RUN_PROG_UNIX_RECVMSG_LOCK(sk)   |
4. lock_sock(sk) // deadlock                |
A PoC (proof of concept) that reproduces both scenarios is attached.

1. Execute ./poc_dgram:
======================================================
WARNING: possible circular locking dependency detected
6.7.0-rc5 #6 Not tainted
------------------------------------------------------
a.out/261 is trying to acquire lock:
ffff888104fb17b0 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: __unix_dgram_recvmsg+0x3f0/0x420
but task is already holding lock:
ffff888104fb1c00 (&u->iolock){+.+.}-{3:3}, at: __unix_dgram_recvmsg+0xb7/0x420
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&u->iolock){+.+.}-{3:3}:
__mutex_lock+0xa7/0xb70
unix_set_peek_off+0x1e/0x50
sk_setsockopt+0xbdd/0x12f0
do_sock_setsockopt+0xa9/0x190
__se_sys_setsockopt+0x83/0xc0
do_syscall_64+0x50/0xf0
entry_SYSCALL_64_after_hwframe+0x6f/0x77
-> #0 (sk_lock-AF_UNIX){+.+.}-{0:0}:
__lock_acquire+0x1449/0x2cd0
lock_acquire+0xe3/0x250
lock_sock_nested+0x2e/0x80
__unix_dgram_recvmsg+0x3f0/0x420
sock_recvmsg+0x9d/0xc0
____sys_recvmsg+0x111/0x1f0
___sys_recvmsg+0x210/0x2d0
__x64_sys_recvmsg+0x10c/0x150
do_syscall_64+0x50/0xf0
entry_SYSCALL_64_after_hwframe+0x6f/0x77
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&u->iolock);
lock(sk_lock-AF_UNIX);
lock(&u->iolock);
lock(sk_lock-AF_UNIX);
*** DEADLOCK ***
1 lock held by a.out/261:
#0: ffff888104fb1c00 (&u->iolock){+.+.}-{3:3}, at: __unix_dgram_recvmsg+0xb7/0x420
stack backtrace:
CPU: 5 PID: 261 Comm: a.out Not tainted 6.7.0-rc5 #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x6e/0xb0
check_noncircular+0x138/0x160
__lock_acquire+0x1449/0x2cd0
? __skb_try_recv_datagram+0x80/0x180
? lock_acquire+0xe3/0x250
? __skb_try_recv_datagram+0x80/0x180
? __unix_dgram_recvmsg+0x3f0/0x420
lock_acquire+0xe3/0x250
? __unix_dgram_recvmsg+0x3f0/0x420
? __skb_try_recv_datagram+0xaa/0x180
lock_sock_nested+0x2e/0x80
? __unix_dgram_recvmsg+0x3f0/0x420
__unix_dgram_recvmsg+0x3f0/0x420
sock_recvmsg+0x9d/0xc0
____sys_recvmsg+0x111/0x1f0
___sys_recvmsg+0x210/0x2d0
__x64_sys_recvmsg+0x10c/0x150
do_syscall_64+0x50/0xf0
entry_SYSCALL_64_after_hwframe+0x6f/0x77
RIP: 0033:0x7f974647adad
Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 6a ef ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2f 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 9e ef ff ff 48
RSP: 002b:00007f9745a8de60 EFLAGS: 00000293 ORIG_RAX: 000000000000002f
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f974647adad
RDX: 0000000000000040 RSI: 00007f9745a8eeb0 RDI: 0000000000000003
RBP: 00007f9745a8eef0 R08: 0000000000000000 R09: 00007f9745a8f700
R10: fffffffffffff744 R11: 0000000000000293 R12: 00007ffcfa69042e
R13: 00007ffcfa69042f R14: 00007f9745a8efc0 R15: 0000000000802000
</TASK>
2. Execute ./poc_stream:
======================================================
WARNING: possible circular locking dependency detected
6.7.0-rc5-dirty #12 Not tainted
------------------------------------------------------
poc_stream/265 is trying to acquire lock:
ffff8881019335b0 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: unix_stream_r0
but task is already holding lock:
ffff888101933a00 (&u->iolock){+.+.}-{3:3}, at: unix_stream_read_ge0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&u->iolock){+.+.}-{3:3}:
__mutex_lock+0xa7/0xb70
unix_set_peek_off+0x1e/0x50
sk_setsockopt+0xbdd/0x12f0
do_sock_setsockopt+0xa9/0x190
__se_sys_setsockopt+0x83/0xc0
do_syscall_64+0x50/0xf0
entry_SYSCALL_64_after_hwframe+0x6f/0x77
-> #0 (sk_lock-AF_UNIX){+.+.}-{0:0}:
__lock_acquire+0x1449/0x2cd0
lock_acquire+0xe3/0x250
lock_sock_nested+0x2e/0x80
unix_stream_read_generic+0x580/0xa50
unix_stream_recvmsg+0x84/0xb0
sock_recvmsg+0x9d/0xc0
____sys_recvmsg+0x111/0x1f0
___sys_recvmsg+0x210/0x2d0
__x64_sys_recvmsg+0x10c/0x150
do_syscall_64+0x50/0xf0
entry_SYSCALL_64_after_hwframe+0x6f/0x77
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&u->iolock);
lock(sk_lock-AF_UNIX);
lock(&u->iolock);
lock(sk_lock-AF_UNIX);
*** DEADLOCK ***
1 lock held by poc_stream/265:
#0: ffff888101933a00 (&u->iolock){+.+.}-{3:3}, at: unix_stream_re0
stack backtrace:
CPU: 2 PID: 265 Comm: poc_stream Not tainted 6.7.0-rc5-dirty #12
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.04
Call Trace:
<TASK>
dump_stack_lvl+0x6e/0xb0
check_noncircular+0x138/0x160
__lock_acquire+0x1449/0x2cd0
? unix_stream_read_generic+0xde/0xa50
? lock_acquire+0xe3/0x250
? lockdep_hardirqs_on_prepare+0x175/0x260
? unix_stream_read_generic+0x580/0xa50
lock_acquire+0xe3/0x250
? unix_stream_read_generic+0x580/0xa50
? unix_stream_read_generic+0x130/0xa50
lock_sock_nested+0x2e/0x80
? unix_stream_read_generic+0x580/0xa50
unix_stream_read_generic+0x580/0xa50
? selinux_socket_recvmsg+0xd6/0x100
unix_stream_recvmsg+0x84/0xb0
? __pfx_unix_stream_read_actor+0x10/0x10
sock_recvmsg+0x9d/0xc0
____sys_recvmsg+0x111/0x1f0
___sys_recvmsg+0x210/0x2d0
__x64_sys_recvmsg+0x10c/0x150
do_syscall_64+0x50/0xf0
entry_SYSCALL_64_after_hwframe+0x6f/0x77
RIP: 0033:0x7f04a2d02dad
Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 6a ef ff ff 8b 8
RSP: 002b:00007f04a1b14e60 EFLAGS: 00000293 ORIG_RAX: 000000000000f
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f04a2d02dad
RDX: 0000000000000000 RSI: 00007f04a1b15eb0 RDI: 0000000000000005
RBP: 00007f04a1b15ef0 R08: 0000000000000000 R09: 00007f04a1b16700
R10: fffffffffffff648 R11: 0000000000000293 R12: 00007ffd8aa9fb6e
R13: 00007ffd8aa9fb6f R14: 00007f04a1b15fc0 R15: 0000000000802000
</TASK>
To prevent this, apply the following patch: in __unix_dgram_recvmsg() and unix_stream_read_generic(), release u->iolock with mutex_unlock() before calling BPF_CGROUP_RUN_PROG_UNIX_RECVMSG_LOCK(), and re-acquire it with mutex_lock() once the hook returns. This ensures lock_sock() is never taken while u->iolock is held, breaking the circular dependency reported by lockdep.
Fixes: 859051dd165e ("bpf: Implement cgroup sockaddr hooks for unix sockets")
Reported-by: Team p0pk3rn <bob.p...@gmail.com>
Signed-off-by: Team p0pk3rn <bob.p...@gmail.com>
---
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index ac1f2bc18fc9..64e847d93527 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2414,9 +2414,11 @@ int __unix_dgram_recvmsg(struct sock *sk, struct msghdr *msg, size_t size,
 	if (msg->msg_name) {
 		unix_copy_addr(msg, skb->sk);
+		mutex_unlock(&u->iolock);
 		BPF_CGROUP_RUN_PROG_UNIX_RECVMSG_LOCK(sk,
 						      msg->msg_name,
 						      &msg->msg_namelen);
+		mutex_lock(&u->iolock);
 	}
 
 	if (size > skb->len - skip)
@@ -2772,9 +2774,11 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state,
 					 state->msg->msg_name);
 			unix_copy_addr(state->msg, skb->sk);
 
+			mutex_unlock(&u->iolock);
 			BPF_CGROUP_RUN_PROG_UNIX_RECVMSG_LOCK(sk,
 							      state->msg->msg_name,
 							      &state->msg->msg_namelen);
+			mutex_lock(&u->iolock);
 
 			sunaddr = NULL;
 		}
--
2.34.1