[syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_get_doit


syzbot

Jun 15, 2024, 06:39:22
to Dai...@oracle.com,chuck...@oracle.com,jla...@kernel.org,ko...@netapp.com,linux-...@vger.kernel.org,linu...@vger.kernel.org,ne...@suse.de,syzkall...@googlegroups.com,t...@talpey.com
Hello,

syzbot found the following issue on:

HEAD commit: cea2a26553ac mailmap: Add my outdated addresses to the map..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=169fd8ee980000
kernel config: https://syzkaller.appspot.com/x/.config?x=fa0ce06dcc735711
dashboard link: https://syzkaller.appspot.com/bug?extid=4207adf14e7c0981d28d
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/1f7ce933512f/disk-cea2a265.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/0ce3b9940616/vmlinux-cea2a265.xz
kernel image: https://storage.googleapis.com/syzbot-assets/19e24094ea37/bzImage-cea2a265.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+4207ad...@syzkaller.appspotmail.com

INFO: task syz-executor.1:17770 blocked for more than 143 seconds.
Not tainted 6.10.0-rc3-syzkaller-00022-gcea2a26553ac #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.1 state:D stack:23800 pid:17770 tgid:17767 ppid:11381 flags:0x00000006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5408 [inline]
__schedule+0x17e8/0x4a20 kernel/sched/core.c:6745
__schedule_loop kernel/sched/core.c:6822 [inline]
schedule+0x14b/0x320 kernel/sched/core.c:6837
schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6894
__mutex_lock_common kernel/locking/mutex.c:684 [inline]
__mutex_lock+0x6a4/0xd70 kernel/locking/mutex.c:752
nfsd_nl_listener_get_doit+0x115/0x5d0 fs/nfsd/nfsctl.c:2124
genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0xb16/0xec0 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x1e5/0x430 net/netlink/af_netlink.c:2564
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
netlink_unicast+0x7ec/0x980 net/netlink/af_netlink.c:1361
netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x223/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
___sys_sendmsg net/socket.c:2639 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2668
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f24ed27cea9
RSP: 002b:00007f24ee0080c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f24ed3b3f80 RCX: 00007f24ed27cea9
RDX: 0000000000000000 RSI: 0000000020000100 RDI: 0000000000000005
RBP: 00007f24ed2ebff4 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Jeff Layton

Jun 17, 2024, 06:15:32
to syzbot,Dai...@oracle.com,chuck...@oracle.com,ko...@netapp.com,linux-...@vger.kernel.org,linu...@vger.kernel.org,ne...@suse.de,syzkall...@googlegroups.com,t...@talpey.com,David S. Miller,Eric Dumazet,Jakub Kicinski,Paolo Abeni,Lorenzo Bianconi
We've had a number of these reports recently. I think I understand what's
happening but I'm not sure how to fix it. The problem manifests as a
stuck nfsd_mutex:

nfsd_nl_rpc_status_get_start takes the nfsd_mutex, and it's released in
nfsd_nl_rpc_status_get_done. These are the ->start and ->done
operations for the rpc_status_get dumpit routine.

I think syzbot is triggering one of the two "goto errout_skb"
conditions in netlink_dump (not sure which). In those cases we end up
returning from that function without calling ->done, which would lead
to the hung mutex like we see here.
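A minimal userspace model of the sequence described above, assuming a plain counter in place of nfsd_mutex; every `model_*` name and `struct dump_session` are hypothetical and this is not kernel code. It shows why a lock taken in ->start and released only in ->done outlives the syscall: an error return from one recvmsg() leaves the dump (and the lock) open until userspace either reads to the end or closes the socket.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for nfsd_mutex: >0 means "held". */
static int model_mutex_held;

/* Hypothetical per-dump state, standing in for the netlink dump context. */
struct dump_session {
	bool started;
	bool finished;
};

/* Models ->start(): takes the "mutex" and returns with it still held. */
static int model_start(struct dump_session *s)
{
	model_mutex_held++;
	s->started = true;
	s->finished = false;
	return 0;
}

/* Models ->done(): the only place the "mutex" is released. */
static void model_done(struct dump_session *s)
{
	if (s->started && !s->finished) {
		assert(model_mutex_held > 0);
		model_mutex_held--;
		s->finished = true;
	}
}

/* Models one recvmsg() on the dump socket. An error path (like the
 * "goto errout_skb" cases) returns without calling ->done(), so the
 * dump, and the "mutex", stay open until a later recvmsg() reaches
 * the end or the socket is closed. */
static int model_recvmsg(struct dump_session *s, bool fill_fails, bool last_chunk)
{
	if (fill_fails)
		return -1;
	if (last_chunk)
		model_done(s);
	return 0;
}

/* Models closing the socket mid-dump: ->done() runs at that point. */
static void model_close(struct dump_session *s)
{
	model_done(s);
}
```

Between the failed recvmsg() and model_close(), any other path that wants the same lock (such as nfsd_nl_listener_get_doit in the trace) would block for as long as userspace chooses to wait.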

Is this a bug in the netlink code, or is the rpc_status_get dumpit
routine not using ->start and ->done correctly?

Thanks,
--
Jeff Layton <jla...@kernel.org>

Jakub Kicinski

Jun 17, 2024, 10:51:35
to Jeff Layton,syzbot,Dai...@oracle.com,chuck...@oracle.com,ko...@netapp.com,linux-...@vger.kernel.org,linu...@vger.kernel.org,ne...@suse.de,syzkall...@googlegroups.com,t...@talpey.com,David S. Miller,Eric Dumazet,Paolo Abeni,Lorenzo Bianconi
On Mon, 17 Jun 2024 06:15:25 -0400 Jeff Layton wrote:
> We've had a number of these reports recently. I think I understand what's
> happening but I'm not sure how to fix it. The problem manifests as a
> stuck nfsd_mutex:
>
> nfsd_nl_rpc_status_get_start takes the nfsd_mutex, and it's released in
> nfsd_nl_rpc_status_get_done. These are the ->start and ->done
> operations for the rpc_status_get dumpit routine.
>
> I think syzbot is triggering one of the two "goto errout_skb"
> conditions in netlink_dump (not sure which). In those cases we end up
> returning from that function without calling ->done, which would lead
> to the hung mutex like we see here.
>
> Is this a bug in the netlink code, or is the rpc_status_get dumpit
> routine not using ->start and ->done correctly?

Dumps are spread over multiple recvmsg() calls. Even if we error out,
the next recvmsg() will dump again, until ->done() is called. And we'll
call ->done() if the socket is closed without reaching the end.

But the multi-syscall nature puts us at the mercy of the user, meaning
that holding locks from ->start() to ->done() is a bit of a no-no.
Many dumps walk the contents of an XArray, so it's easy to remember
an index and continue dumping from where we left off.
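A minimal userspace sketch of that resume-from-index pattern, assuming a plain array in place of an XArray and a hypothetical `struct model_cb` that mimics the `args[]` scratch space of the kernel's `struct netlink_callback`. Each call resumes at the saved index, so nothing has to stay locked between recvmsg() calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct netlink_callback; only the args[]
 * scratch space a dump can use to remember its position is modeled. */
struct model_cb {
	long args[6];
};

/* Emits up to 'budget' items into 'out', resuming at the index saved
 * in cb->args[0]. Returns the number emitted; 0 means the dump is
 * complete, mirroring a dumpit that returns 0 once nothing is left. */
static size_t model_dumpit(struct model_cb *cb, const int *items, size_t n,
			   int *out, size_t budget)
{
	size_t idx = (size_t)cb->args[0];
	size_t emitted = 0;

	while (idx < n && emitted < budget)
		out[emitted++] = items[idx++];

	assert(idx <= n);
	cb->args[0] = (long)idx;  /* where the next call picks up */
	return emitted;
}
```

With five items and a budget of two per call, successive calls return 2, 2, 1 and then 0, and the cursor ends at 5.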

Lorenzo Bianconi

Jun 17, 2024, 11:01:45
to Jakub Kicinski,Jeff Layton,syzbot,Dai...@oracle.com,chuck...@oracle.com,ko...@netapp.com,linux-...@vger.kernel.org,linu...@vger.kernel.org,ne...@suse.de,syzkall...@googlegroups.com,t...@talpey.com,David S. Miller,Eric Dumazet,Paolo Abeni
I guess we can grab the nfsd_mutex lock in nfsd_nl_rpc_status_get_dumpit() and get
rid of nfsd_nl_rpc_status_get_start() and nfsd_nl_rpc_status_get_done()
completely. We would just verify that the NFS server is running each time the dumpit
callback is executed. What do you think?
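A userspace sketch of this proposal, assuming a counter in place of nfsd_mutex and a boolean in place of the `nn->nfsd_serv` check; all `model_*` names are hypothetical. Locking and unlocking happen inside a single call, with one `out_unlock` exit, so the lock is balanced on every return path regardless of what userspace does between recvmsg() calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins: the counter models nfsd_mutex, the flag
 * models "nn->nfsd_serv != NULL". */
static int model_mutex_depth;
static bool model_server_running;

static void model_lock(void)
{
	model_mutex_depth++;
}

static void model_unlock(void)
{
	model_mutex_depth--;
}

/* One dumpit call: take the lock, re-check that the server is running,
 * emit up to 'budget' entries from the saved cursor, drop the lock. */
static int model_dumpit_locked(long *cursor, long n_items, long budget)
{
	long emitted = 0;
	int ret;

	model_lock();
	if (!model_server_running) {
		ret = -ENODEV;
		goto out_unlock;
	}
	while (*cursor < n_items && emitted < budget) {
		(*cursor)++;  /* stand-in for emitting one rpc_status entry */
		emitted++;
	}
	ret = (int)emitted;
out_unlock:
	model_unlock();
	assert(model_mutex_depth == 0);  /* balanced on every path */
	return ret;
}
```

Because the server check is repeated on each call, a server that stops between two recvmsg() calls yields -ENODEV on the next call instead of a dump against stale state.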

Regards,
Lorenzo

Jeff Layton

Jun 17, 2024, 11:45:16
to Jakub Kicinski,syzbot,Dai...@oracle.com,chuck...@oracle.com,ko...@netapp.com,linux-...@vger.kernel.org,linu...@vger.kernel.org,ne...@suse.de,syzkall...@googlegroups.com,t...@talpey.com,David S. Miller,Eric Dumazet,Paolo Abeni,Lorenzo Bianconi
Understood, thanks. I wasn't keyed into the fact that ->start and
->done weren't always called in the context of the same syscall. In that
case, I think we have no choice but to move the locking into the
->dumpit routine. I believe Lorenzo is drafting a patch along those
lines.
--
Jeff Layton <jla...@kernel.org>

Lorenzo Bianconi

Jun 17, 2024, 12:26:32
to syzbot,Dai...@oracle.com,chuck...@oracle.com,jla...@kernel.org,ko...@netapp.com,linux-...@vger.kernel.org,linu...@vger.kernel.org,ne...@suse.de,syzkall...@googlegroups.com,t...@talpey.com
#syz test https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git 4ddfda417a50

From be9676fba16c0b8769c3b6094f35da39b1ba3953 Mon Sep 17 00:00:00 2001
Message-ID: <be9676fba16c0b8769c3b6094f35da3...@kernel.org>
From: Lorenzo Bianconi <lor...@kernel.org>
Date: Mon, 17 Jun 2024 16:26:26 +0200
Subject: [PATCH] NFSD: grab nfsd_mutex in nfsd_nl_rpc_status_get_dumpit()

Grab nfsd_mutex lock in nfsd_nl_rpc_status_get_dumpit routine and remove
nfsd_nl_rpc_status_get_start() and nfsd_nl_rpc_status_get_done(). This
patch fixes the syzbot issue reported below:
Fixes: 1bd773b4f0c9 ("nfsd: hold nfsd_mutex across entire netlink operation")
Fixes: bd9d6a3efa97 ("NFSD: add rpc_status netlink support")
Signed-off-by: Lorenzo Bianconi <lor...@kernel.org>
---
Documentation/netlink/specs/nfsd.yaml | 2 --
fs/nfsd/netlink.c | 2 --
fs/nfsd/netlink.h | 3 --
fs/nfsd/nfsctl.c | 48 ++++++---------------------
4 files changed, 11 insertions(+), 44 deletions(-)

diff --git a/Documentation/netlink/specs/nfsd.yaml b/Documentation/netlink/specs/nfsd.yaml
index 5a98e5a06c68..c87658114852 100644
--- a/Documentation/netlink/specs/nfsd.yaml
+++ b/Documentation/netlink/specs/nfsd.yaml
@@ -132,8 +132,6 @@ operations:
doc: dump pending nfsd rpc
attribute-set: rpc-status
dump:
- pre: nfsd-nl-rpc-status-get-start
- post: nfsd-nl-rpc-status-get-done
reply:
attributes:
- xid
diff --git a/fs/nfsd/netlink.c b/fs/nfsd/netlink.c
index 137701153c9e..ca54aa583530 100644
--- a/fs/nfsd/netlink.c
+++ b/fs/nfsd/netlink.c
@@ -49,9 +49,7 @@ static const struct nla_policy nfsd_pool_mode_set_nl_policy[NFSD_A_POOL_MODE_MOD
static const struct genl_split_ops nfsd_nl_ops[] = {
{
.cmd = NFSD_CMD_RPC_STATUS_GET,
- .start = nfsd_nl_rpc_status_get_start,
.dumpit = nfsd_nl_rpc_status_get_dumpit,
- .done = nfsd_nl_rpc_status_get_done,
.flags = GENL_CMD_CAP_DUMP,
},
{
diff --git a/fs/nfsd/netlink.h b/fs/nfsd/netlink.h
index 9459547de04e..8eb903f24c41 100644
--- a/fs/nfsd/netlink.h
+++ b/fs/nfsd/netlink.h
@@ -15,9 +15,6 @@
extern const struct nla_policy nfsd_sock_nl_policy[NFSD_A_SOCK_TRANSPORT_NAME + 1];
extern const struct nla_policy nfsd_version_nl_policy[NFSD_A_VERSION_ENABLED + 1];

-int nfsd_nl_rpc_status_get_start(struct netlink_callback *cb);
-int nfsd_nl_rpc_status_get_done(struct netlink_callback *cb);
-
int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
struct netlink_callback *cb);
int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info);
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index e5d2cc74ef77..78091a73b33b 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1468,28 +1468,6 @@ static int create_proc_exports_entry(void)

unsigned int nfsd_net_id;

-/**
- * nfsd_nl_rpc_status_get_start - Prepare rpc_status_get dumpit
- * @cb: netlink metadata and command arguments
- *
- * Return values:
- * %0: The rpc_status_get command may proceed
- * %-ENODEV: There is no NFSD running in this namespace
- */
-int nfsd_nl_rpc_status_get_start(struct netlink_callback *cb)
-{
- struct nfsd_net *nn = net_generic(sock_net(cb->skb->sk), nfsd_net_id);
- int ret = -ENODEV;
-
- mutex_lock(&nfsd_mutex);
- if (nn->nfsd_serv)
- ret = 0;
- else
- mutex_unlock(&nfsd_mutex);
-
- return ret;
-}
-
static int nfsd_genl_rpc_status_compose_msg(struct sk_buff *skb,
struct netlink_callback *cb,
struct nfsd_genl_rqstp *rqstp)
@@ -1566,8 +1544,16 @@ static int nfsd_genl_rpc_status_compose_msg(struct sk_buff *skb,
int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
struct netlink_callback *cb)
{
- struct nfsd_net *nn = net_generic(sock_net(skb->sk), nfsd_net_id);
int i, ret, rqstp_index = 0;
+ struct nfsd_net *nn;
+
+ mutex_lock(&nfsd_mutex);
+
+ nn = net_generic(sock_net(skb->sk), nfsd_net_id);
+ if (!nn->nfsd_serv) {
+ ret = -ENODEV;
+ goto out_unlock;
+ }

rcu_read_lock();

@@ -1644,22 +1630,10 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
ret = skb->len;
out:
rcu_read_unlock();
-
- return ret;
-}
-
-/**
- * nfsd_nl_rpc_status_get_done - rpc_status_get dumpit post-processing
- * @cb: netlink metadata and command arguments
- *
- * Return values:
- * %0: Success
- */
-int nfsd_nl_rpc_status_get_done(struct netlink_callback *cb)
-{
+out_unlock:
mutex_unlock(&nfsd_mutex);

- return 0;
+ return ret;
}

/**
--
2.45.1



syzbot

Jun 17, 2024, 12:26:32
to lor...@kernel.org,chuck...@oracle.com,dai...@oracle.com,jla...@kernel.org,ko...@netapp.com,linux-...@vger.kernel.org,linu...@vger.kernel.org,lor...@kernel.org,ne...@suse.de,syzkall...@googlegroups.com,t...@talpey.com
This crash does not have a reproducer. I cannot test it.

Jeff Layton

Jun 17, 2024, 12:50:01
to Lorenzo Bianconi,syzbot,Dai...@oracle.com,chuck...@oracle.com,ko...@netapp.com,linux-...@vger.kernel.org,linu...@vger.kernel.org,ne...@suse.de,syzkall...@googlegroups.com,t...@talpey.com
> <be9676fba16c0b8769c3b6094f35da39b1ba3953.1718640518.git.lorenzo@kern
Reviewed-by: Jeff Layton <jla...@kernel.org>

Chuck Lever

Jun 17, 2024, 13:22:00
to Lorenzo Bianconi,syzbot,Dai...@oracle.com,jla...@kernel.org,ko...@netapp.com,linux-...@vger.kernel.org,linu...@vger.kernel.org,ne...@suse.de,syzkall...@googlegroups.com,t...@talpey.com
Applied to nfsd-fixes (for v6.10-rc). Thanks!



--
Chuck Lever