[syzbot] [fs?] BUG: unable to handle kernel NULL pointer dereference in path_from_stashed

21 views
Skip to first unread message

syzbot

unread,
Jul 20, 2024, 12:36:28 PM (7 days ago) Jul 20
to bra...@kernel.org, ja...@suse.cz, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
Hello,

syzbot found the following issue on:

HEAD commit: 51835949dda3 Merge tag 'net-next-6.11' of git://git.kernel..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1325e60d980000
kernel config: https://syzkaller.appspot.com/x/.config?x=c1c1b0a8065e216
dashboard link: https://syzkaller.appspot.com/bug?extid=34a0ee986f61f15da35d
compiler: arm-linux-gnueabi-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11bd9a5e980000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15726e95980000

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/8ead8862021c/non_bootable_disk-51835949.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/a129ae4ab997/vmlinux-51835949.xz
kernel image: https://storage.googleapis.com/syzbot-assets/9339fe082652/zImage-51835949.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+34a0ee...@syzkaller.appspotmail.com

8<--- cut here ---
Unable to handle kernel NULL pointer dereference at virtual address 00000008 when read
[00000008] *pgd=844c0003, *pmd=fe12e003
Internal error: Oops: 205 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 3011 Comm: syz-executor103 Not tainted 6.10.0-syzkaller #0
Hardware name: ARM-Versatile Express
PC is at path_from_stashed+0x1c/0x308 fs/libfs.c:2204
LR is at open_namespace+0x44/0xbc fs/nsfs.c:102
pc : [<8053af54>] lr : [<8054d6f8>] psr: 80000013
sp : df959e80 ip : 84183000 fp : df959ec4
r10: 84183000 r9 : 00000003 r8 : 843f3300
r7 : 82caa250 r6 : 84183000 r5 : 00000000 r4 : 82625878
r3 : df959ecc r2 : 00000008 r1 : 82c95800 r0 : 00000008
Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 30c5387d Table: 843f1a00 DAC: fffffffd
Register r0 information: non-paged memory
Register r1 information: slab kmalloc-1k start 82c95800 pointer offset 0 size 1024
Register r2 information: non-paged memory
Register r3 information: 2-page vmalloc region starting at 0xdf958000 allocated at kernel_clone+0xac/0x3e4 kernel/fork.c:2780
Register r4 information: non-slab/vmalloc memory
Register r5 information: NULL pointer
Register r6 information: slab task_struct start 84183000 pointer offset 0 size 3072
Register r7 information: slab mnt_cache start 82caa240 pointer offset 16 size 184
Register r8 information: slab filp start 843f3300 pointer offset 0 size 160
Register r9 information: non-paged memory
Register r10 information: slab task_struct start 84183000 pointer offset 0 size 3072
Register r11 information: 2-page vmalloc region starting at 0xdf958000 allocated at kernel_clone+0xac/0x3e4 kernel/fork.c:2780
Register r12 information: slab task_struct start 84183000 pointer offset 0 size 3072
Process syz-executor103 (pid: 3011, stack limit = 0xdf958000)
Stack: (0xdf959e80 to 0xdf95a000)
9e80: 841b2400 00000008 df959eb4 df959e98 806e6544 804065ac 00000009 82625878
9ea0: 00000000 84183000 841b2c80 843f3300 00000003 84183000 df959ef4 df959ec8
9ec0: 8054d6f8 8053af44 df959ef4 00000000 00000000 f9244696 82625878 82625878
9ee0: 841b2400 00000008 df959f14 df959ef8 8055271c 8054d6c0 0000ff07 00000000
9f00: 843f3300 00000000 df959fa4 df959f18 8051a7f0 805525a4 000001b2 8020029c
9f20: 84183000 000001b2 df959fac df959f38 8020ba70 8042c724 83f01500 df959f80
9f40: 00000000 843f3300 00000003 82cb0800 df959f7c df959f60 805283b8 8027aebc
9f60: 83f01500 00000003 83f01500 00000003 df959fa4 f9244696 8026b8b0 00000000
9f80: 00000000 0008e058 00000036 8020029c 84183000 00000036 00000000 df959fa8
9fa0: 80200060 8051a6c8 00000000 00000000 00000003 0000ff07 00000000 00000000
9fc0: 00000000 00000000 0008e058 00000036 7ee54e0c 00000000 00000001 00000000
9fe0: 7ee54c70 7ee54c60 0001064c 0002e7a0 00000010 00000003 00000000 00000000
Call trace:
[<8053af38>] (path_from_stashed) from [<8054d6f8>] (open_namespace+0x44/0xbc fs/nsfs.c:102)
r10:84183000 r9:00000003 r8:843f3300 r7:841b2c80 r6:84183000 r5:00000000
r4:82625878
[<8054d6b4>] (open_namespace) from [<8055271c>] (pidfd_ioctl+0x184/0x4c4 fs/pidfs.c:196)
r6:00000008 r5:841b2400 r4:82625878
[<80552598>] (pidfd_ioctl) from [<8051a7f0>] (vfs_ioctl fs/ioctl.c:51 [inline])
[<80552598>] (pidfd_ioctl) from [<8051a7f0>] (do_vfs_ioctl fs/ioctl.c:861 [inline])
[<80552598>] (pidfd_ioctl) from [<8051a7f0>] (__do_sys_ioctl fs/ioctl.c:905 [inline])
[<80552598>] (pidfd_ioctl) from [<8051a7f0>] (sys_ioctl+0x134/0xda4 fs/ioctl.c:893)
r7:00000000 r6:843f3300 r5:00000000 r4:0000ff07
[<8051a6bc>] (sys_ioctl) from [<80200060>] (ret_fast_syscall+0x0/0x1c arch/arm/mm/proc-v7.S:67)
Exception stack(0xdf959fa8 to 0xdf959ff0)
9fa0: 00000000 00000000 00000003 0000ff07 00000000 00000000
9fc0: 00000000 00000000 0008e058 00000036 7ee54e0c 00000000 00000001 00000000
9fe0: 7ee54c70 7ee54c60 0001064c 0002e7a0
r10:00000036 r9:84183000 r8:8020029c r7:00000036 r6:0008e058 r5:00000000
r4:00000000
Code: e24dd01c e1a07001 e5911004 ee1dcf70 (e5905000)
---[ end trace 0000000000000000 ]---
----------------
Code disassembly (best guess):
0: e24dd01c sub sp, sp, #28
4: e1a07001 mov r7, r1
8: e5911004 ldr r1, [r1, #4]
c: ee1dcf70 mrc 15, 0, ip, cr13, cr0, {3}
* 10: e5905000 ldr r5, [r0] <-- trapping instruction


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Edward Adam Davis

unread,
Jul 21, 2024, 12:50:26 AM (6 days ago) Jul 21
to syzbot+34a0ee...@syzkaller.appspotmail.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
time_ns is null ?

#syz test: upstream 5e0497553643

diff --git a/fs/pidfs.c b/fs/pidfs.c
index c9cb14181def..fdae58eb1d98 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -168,6 +168,8 @@ static long pidfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case PIDFD_GET_TIME_NAMESPACE:
get_time_ns(nsp->time_ns);
ns_common = to_ns_common(nsp->time_ns);
+ if (!nsp->time_ns)
+ return -EINVAL;
break;
case PIDFD_GET_TIME_FOR_CHILDREN_NAMESPACE:
get_time_ns(nsp->time_ns_for_children);

syzbot

unread,
Jul 21, 2024, 12:57:04 AM (6 days ago) Jul 21
to ead...@qq.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot tried to test the proposed patch but the build/boot failed:

failed to apply patch:
checking file fs/pidfs.c
Hunk #1 FAILED at 168.
1 out of 1 hunk FAILED



Tested on:

commit: 5e049755 Merge branch 'link_path_walk'
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
userspace arch: arm
patch: https://syzkaller.appspot.com/x/patch.diff?x=13ea160d980000

Edward Adam Davis

unread,
Jul 21, 2024, 1:14:49 AM (6 days ago) Jul 21
to syzbot+34a0ee...@syzkaller.appspotmail.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
time_ns is null ?

#syz test: upstream 51835949dda3

syzbot

unread,
Jul 21, 2024, 1:41:07 AM (6 days ago) Jul 21
to ead...@qq.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-by: syzbot+34a0ee...@syzkaller.appspotmail.com
Tested-by: syzbot+34a0ee...@syzkaller.appspotmail.com

Tested on:

commit: 51835949 Merge tag 'net-next-6.11' of git://git.kernel..
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=1421b72d980000
compiler: arm-linux-gnueabi-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm
patch: https://syzkaller.appspot.com/x/patch.diff?x=100ea8b1980000

Note: testing is done by a robot and is best-effort only.

Edward Adam Davis

unread,
Jul 21, 2024, 2:23:23 AM (6 days ago) Jul 21
to syzbot+34a0ee...@syzkaller.appspotmail.com, bra...@kernel.org, ja...@suse.cz, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
syzbot call pidfd_ioctl() with cmd "PIDFD_GET_TIME_NAMESPACE" and disabled
CONFIG_TIME_NS, since time_ns is NULL, it will make NULL ponter deref in
open_namespace.

Reported-and-tested-by: syzbot+34a0ee...@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=34a0ee986f61f15da35d
Signed-off-by: Edward Adam Davis <ead...@qq.com>
---
fs/pidfs.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/pidfs.c b/fs/pidfs.c
index c9cb14181def..fe0ddab48f57 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -168,6 +168,8 @@ static long pidfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case PIDFD_GET_TIME_NAMESPACE:
get_time_ns(nsp->time_ns);
ns_common = to_ns_common(nsp->time_ns);
+ if (!nsp->time_ns)
+ return -EINVAL;
break;
case PIDFD_GET_TIME_FOR_CHILDREN_NAMESPACE:
get_time_ns(nsp->time_ns_for_children);
--
2.43.0

Matthew Wilcox

unread,
Jul 21, 2024, 6:59:12 AM (6 days ago) Jul 21
to Edward Adam Davis, syzbot+34a0ee...@syzkaller.appspotmail.com, bra...@kernel.org, ja...@suse.cz, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
On Sun, Jul 21, 2024 at 02:23:12PM +0800, Edward Adam Davis wrote:
> syzbot call pidfd_ioctl() with cmd "PIDFD_GET_TIME_NAMESPACE" and disabled
> CONFIG_TIME_NS, since time_ns is NULL, it will make NULL ponter deref in
> open_namespace.

what about PIDFD_GET_TIME_FOR_CHILDREN_NAMESPACE?

Christian Brauner

unread,
Jul 22, 2024, 4:01:08 AM (5 days ago) Jul 22
to Edward Adam Davis, syzbot+34a0ee...@syzkaller.appspotmail.com, ja...@suse.cz, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
On Sun, Jul 21, 2024 at 02:23:12PM GMT, Edward Adam Davis wrote:
> syzbot call pidfd_ioctl() with cmd "PIDFD_GET_TIME_NAMESPACE" and disabled
> CONFIG_TIME_NS, since time_ns is NULL, it will make NULL ponter deref in
> open_namespace.
>
> Reported-and-tested-by: syzbot+34a0ee...@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=34a0ee986f61f15da35d
> Signed-off-by: Edward Adam Davis <ead...@qq.com>
> ---

Hm, nsproxy is really messy in that regard. Some namespaces will always
be set to init_<type>_ns and others will be set to NULL.

That's an invitation for bugs such as this. Imho the correct fix is to
change nsproxy to always set nsp-><type>_ns to init_<type>_ns and no
code ever needs to worry about dereferencing NULL.

But that'll require more changes so this seems an appropriate fix for now.

Christian Brauner

unread,
Jul 22, 2024, 9:14:19 AM (5 days ago) Jul 22
to linux-...@vger.kernel.org, Christian Brauner, Edward Adam Davis, syzbot+34a0ee...@syzkaller.appspotmail.com, ja...@suse.cz, linux-...@vger.kernel.org, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
The nsproxy structure contains nearly all of the namespaces associated
with a task. When a given namespace type is not supported by this kernel
the rules whether the corresponding pointer in struct nsproxy is NULL or
always init_<ns_type>_ns differ per namespace. Ideally, that wouldn't be
the case and for all namespace types we'd always set it to
init_<ns_type>_ns when the corresponding namespace type isn't supported.

Make sure we handle all namespaces where the pointer in struct nsproxy
can be NULL when the namespace type isn't supported.

Fixes: 5b08bd408534 ("pidfs: allow retrieval of namespace file descriptors") # mainline only
Signed-off-by: Christian Brauner <bra...@kernel.org>
---
fs/pidfs.c | 65 +++++++++++++++++++++++++++++++++++-------------------
1 file changed, 42 insertions(+), 23 deletions(-)

diff --git a/fs/pidfs.c b/fs/pidfs.c
index fe0ddab48f57..7ffdc88dfb52 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -119,7 +119,7 @@ static long pidfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
struct task_struct *task __free(put_task) = NULL;
struct nsproxy *nsp __free(put_nsproxy) = NULL;
struct pid *pid = pidfd_pid(file);
- struct ns_common *ns_common;
+ struct ns_common *ns_common = NULL;

if (arg)
return -EINVAL;
@@ -146,54 +146,73 @@ static long pidfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
switch (cmd) {
/* Namespaces that hang of nsproxy. */
case PIDFD_GET_CGROUP_NAMESPACE:
- get_cgroup_ns(nsp->cgroup_ns);
- ns_common = to_ns_common(nsp->cgroup_ns);
+ if (IS_ENABLED(CONFIG_CGROUPS)) {
+ get_cgroup_ns(nsp->cgroup_ns);
+ ns_common = to_ns_common(nsp->cgroup_ns);
+ }
break;
case PIDFD_GET_IPC_NAMESPACE:
- get_ipc_ns(nsp->ipc_ns);
- ns_common = to_ns_common(nsp->ipc_ns);
+ if (IS_ENABLED(CONFIG_IPC_NS)) {
+ get_ipc_ns(nsp->ipc_ns);
+ ns_common = to_ns_common(nsp->ipc_ns);
+ }
break;
case PIDFD_GET_MNT_NAMESPACE:
get_mnt_ns(nsp->mnt_ns);
ns_common = to_ns_common(nsp->mnt_ns);
break;
case PIDFD_GET_NET_NAMESPACE:
- ns_common = to_ns_common(nsp->net_ns);
- get_net_ns(ns_common);
+ if (IS_ENABLED(CONFIG_NET_NS)) {
+ ns_common = to_ns_common(nsp->net_ns);
+ get_net_ns(ns_common);
+ }
break;
case PIDFD_GET_PID_FOR_CHILDREN_NAMESPACE:
- get_pid_ns(nsp->pid_ns_for_children);
- ns_common = to_ns_common(nsp->pid_ns_for_children);
+ if (IS_ENABLED(CONFIG_PID_NS)) {
+ get_pid_ns(nsp->pid_ns_for_children);
+ ns_common = to_ns_common(nsp->pid_ns_for_children);
+ }
break;
case PIDFD_GET_TIME_NAMESPACE:
- get_time_ns(nsp->time_ns);
- ns_common = to_ns_common(nsp->time_ns);
- if (!nsp->time_ns)
- return -EINVAL;
+ if (IS_ENABLED(CONFIG_TIME_NS)) {
+ get_time_ns(nsp->time_ns);
+ ns_common = to_ns_common(nsp->time_ns);
+ }
break;
case PIDFD_GET_TIME_FOR_CHILDREN_NAMESPACE:
- get_time_ns(nsp->time_ns_for_children);
- ns_common = to_ns_common(nsp->time_ns_for_children);
+ if (IS_ENABLED(CONFIG_TIME_NS)) {
+ get_time_ns(nsp->time_ns_for_children);
+ ns_common = to_ns_common(nsp->time_ns_for_children);
+ }
break;
case PIDFD_GET_UTS_NAMESPACE:
- get_uts_ns(nsp->uts_ns);
- ns_common = to_ns_common(nsp->uts_ns);
+ if (IS_ENABLED(CONFIG_UTS_NS)) {
+ get_uts_ns(nsp->uts_ns);
+ ns_common = to_ns_common(nsp->uts_ns);
+ }
break;
/* Namespaces that don't hang of nsproxy. */
case PIDFD_GET_USER_NAMESPACE:
- rcu_read_lock();
- ns_common = to_ns_common(get_user_ns(task_cred_xxx(task, user_ns)));
- rcu_read_unlock();
+ if (IS_ENABLED(CONFIG_USER_NS)) {
+ rcu_read_lock();
+ ns_common = to_ns_common(get_user_ns(task_cred_xxx(task, user_ns)));
+ rcu_read_unlock();
+ }
break;
case PIDFD_GET_PID_NAMESPACE:
- rcu_read_lock();
- ns_common = to_ns_common(get_pid_ns(task_active_pid_ns(task)));
- rcu_read_unlock();
+ if (IS_ENABLED(CONFIG_PID_NS)) {
+ rcu_read_lock();
+ ns_common = to_ns_common( get_pid_ns(task_active_pid_ns(task)));
+ rcu_read_unlock();
+ }
break;
default:
return -ENOIOCTLCMD;
}

+ if (!ns_common)
+ return -EOPNOTSUPP;
+
/* open_namespace() unconditionally consumes the reference */
return open_namespace(ns_common);
}
--
2.43.0

Christian Brauner

unread,
Jul 22, 2024, 9:14:25 AM (5 days ago) Jul 22
to linux-...@vger.kernel.org, Christian Brauner, Edward Adam Davis, syzbot+34a0ee...@syzkaller.appspotmail.com, ja...@suse.cz, linux-...@vger.kernel.org, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
Add selftests to verify that deriving namespace file descriptors from
pidfd file descriptors works correctly.

Signed-off-by: Christian Brauner <bra...@kernel.org>
---
.../selftests/pidfd/pidfd_setns_test.c | 258 +++++++++++++++---
1 file changed, 227 insertions(+), 31 deletions(-)

diff --git a/tools/testing/selftests/pidfd/pidfd_setns_test.c b/tools/testing/selftests/pidfd/pidfd_setns_test.c
index 47746b0c6acd..7c2a4349170a 100644
--- a/tools/testing/selftests/pidfd/pidfd_setns_test.c
+++ b/tools/testing/selftests/pidfd/pidfd_setns_test.c
@@ -16,11 +16,56 @@
#include <unistd.h>
#include <sys/socket.h>
#include <sys/stat.h>
+#include <linux/ioctl.h>

#include "pidfd.h"
#include "../clone3/clone3_selftests.h"
#include "../kselftest_harness.h"

+#ifndef PIDFS_IOCTL_MAGIC
+#define PIDFS_IOCTL_MAGIC 0xFF
+#endif
+
+#ifndef PIDFD_GET_CGROUP_NAMESPACE
+#define PIDFD_GET_CGROUP_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 1)
+#endif
+
+#ifndef PIDFD_GET_IPC_NAMESPACE
+#define PIDFD_GET_IPC_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 2)
+#endif
+
+#ifndef PIDFD_GET_MNT_NAMESPACE
+#define PIDFD_GET_MNT_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 3)
+#endif
+
+#ifndef PIDFD_GET_NET_NAMESPACE
+#define PIDFD_GET_NET_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 4)
+#endif
+
+#ifndef PIDFD_GET_PID_NAMESPACE
+#define PIDFD_GET_PID_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 5)
+#endif
+
+#ifndef PIDFD_GET_PID_FOR_CHILDREN_NAMESPACE
+#define PIDFD_GET_PID_FOR_CHILDREN_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 6)
+#endif
+
+#ifndef PIDFD_GET_TIME_NAMESPACE
+#define PIDFD_GET_TIME_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 7)
+#endif
+
+#ifndef PIDFD_GET_TIME_FOR_CHILDREN_NAMESPACE
+#define PIDFD_GET_TIME_FOR_CHILDREN_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 8)
+#endif
+
+#ifndef PIDFD_GET_USER_NAMESPACE
+#define PIDFD_GET_USER_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 9)
+#endif
+
+#ifndef PIDFD_GET_UTS_NAMESPACE
+#define PIDFD_GET_UTS_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 10)
+#endif
+
enum {
PIDFD_NS_USER,
PIDFD_NS_MNT,
@@ -31,22 +76,25 @@ enum {
PIDFD_NS_CGROUP,
PIDFD_NS_PIDCLD,
PIDFD_NS_TIME,
+ PIDFD_NS_TIMECLD,
PIDFD_NS_MAX
};

const struct ns_info {
const char *name;
int flag;
+ unsigned int pidfd_ioctl;
} ns_info[] = {
- [PIDFD_NS_USER] = { "user", CLONE_NEWUSER, },
- [PIDFD_NS_MNT] = { "mnt", CLONE_NEWNS, },
- [PIDFD_NS_PID] = { "pid", CLONE_NEWPID, },
- [PIDFD_NS_UTS] = { "uts", CLONE_NEWUTS, },
- [PIDFD_NS_IPC] = { "ipc", CLONE_NEWIPC, },
- [PIDFD_NS_NET] = { "net", CLONE_NEWNET, },
- [PIDFD_NS_CGROUP] = { "cgroup", CLONE_NEWCGROUP, },
- [PIDFD_NS_PIDCLD] = { "pid_for_children", 0, },
- [PIDFD_NS_TIME] = { "time", CLONE_NEWTIME, },
+ [PIDFD_NS_USER] = { "user", CLONE_NEWUSER, PIDFD_GET_USER_NAMESPACE, },
+ [PIDFD_NS_MNT] = { "mnt", CLONE_NEWNS, PIDFD_GET_MNT_NAMESPACE, },
+ [PIDFD_NS_PID] = { "pid", CLONE_NEWPID, PIDFD_GET_PID_NAMESPACE, },
+ [PIDFD_NS_UTS] = { "uts", CLONE_NEWUTS, PIDFD_GET_UTS_NAMESPACE, },
+ [PIDFD_NS_IPC] = { "ipc", CLONE_NEWIPC, PIDFD_GET_IPC_NAMESPACE, },
+ [PIDFD_NS_NET] = { "net", CLONE_NEWNET, PIDFD_GET_NET_NAMESPACE, },
+ [PIDFD_NS_CGROUP] = { "cgroup", CLONE_NEWCGROUP, PIDFD_GET_CGROUP_NAMESPACE, },
+ [PIDFD_NS_TIME] = { "time", CLONE_NEWTIME, PIDFD_GET_TIME_NAMESPACE, },
+ [PIDFD_NS_PIDCLD] = { "pid_for_children", 0, PIDFD_GET_PID_FOR_CHILDREN_NAMESPACE, },
+ [PIDFD_NS_TIMECLD] = { "time_for_children", 0, PIDFD_GET_TIME_FOR_CHILDREN_NAMESPACE, },
};

FIXTURE(current_nsset)
@@ -54,6 +102,7 @@ FIXTURE(current_nsset)
pid_t pid;
int pidfd;
int nsfds[PIDFD_NS_MAX];
+ int child_pidfd_derived_nsfds[PIDFD_NS_MAX];

pid_t child_pid_exited;
int child_pidfd_exited;
@@ -61,10 +110,12 @@ FIXTURE(current_nsset)
pid_t child_pid1;
int child_pidfd1;
int child_nsfds1[PIDFD_NS_MAX];
+ int child_pidfd_derived_nsfds1[PIDFD_NS_MAX];

pid_t child_pid2;
int child_pidfd2;
int child_nsfds2[PIDFD_NS_MAX];
+ int child_pidfd_derived_nsfds2[PIDFD_NS_MAX];
};

static int sys_waitid(int which, pid_t pid, int options)
@@ -128,9 +179,12 @@ FIXTURE_SETUP(current_nsset)
char c;

for (i = 0; i < PIDFD_NS_MAX; i++) {
- self->nsfds[i] = -EBADF;
- self->child_nsfds1[i] = -EBADF;
- self->child_nsfds2[i] = -EBADF;
+ self->nsfds[i] = -EBADF;
+ self->child_nsfds1[i] = -EBADF;
+ self->child_nsfds2[i] = -EBADF;
+ self->child_pidfd_derived_nsfds[i] = -EBADF;
+ self->child_pidfd_derived_nsfds1[i] = -EBADF;
+ self->child_pidfd_derived_nsfds2[i] = -EBADF;
}

proc_fd = open("/proc/self/ns", O_DIRECTORY | O_CLOEXEC);
@@ -139,6 +193,11 @@ FIXTURE_SETUP(current_nsset)
}

self->pid = getpid();
+ self->pidfd = sys_pidfd_open(self->pid, 0);
+ EXPECT_GT(self->pidfd, 0) {
+ TH_LOG("%m - Failed to open pidfd for process %d", self->pid);
+ }
+
for (i = 0; i < PIDFD_NS_MAX; i++) {
const struct ns_info *info = &ns_info[i];
self->nsfds[i] = openat(proc_fd, info->name, O_RDONLY | O_CLOEXEC);
@@ -148,20 +207,27 @@ FIXTURE_SETUP(current_nsset)
info->name, self->pid);
}
}
- }

- self->pidfd = sys_pidfd_open(self->pid, 0);
- EXPECT_GT(self->pidfd, 0) {
- TH_LOG("%m - Failed to open pidfd for process %d", self->pid);
+ self->child_pidfd_derived_nsfds[i] = ioctl(self->pidfd, info->pidfd_ioctl, 0);
+ if (self->child_pidfd_derived_nsfds[i] < 0) {
+ EXPECT_EQ(errno, EOPNOTSUPP) {
+ TH_LOG("%m - Failed to derive %s namespace from pidfd of process %d",
+ info->name, self->pid);
+ }
+ }
}

/* Create task that exits right away. */
- self->child_pid_exited = create_child(&self->child_pidfd_exited,
- CLONE_NEWUSER | CLONE_NEWNET);
+ self->child_pid_exited = create_child(&self->child_pidfd_exited, 0);
EXPECT_GE(self->child_pid_exited, 0);

- if (self->child_pid_exited == 0)
+ if (self->child_pid_exited == 0) {
+ if (self->nsfds[PIDFD_NS_USER] >= 0 && unshare(CLONE_NEWUSER) < 0)
+ _exit(EXIT_FAILURE);
+ if (self->nsfds[PIDFD_NS_NET] >= 0 && unshare(CLONE_NEWNET) < 0)
+ _exit(EXIT_FAILURE);
_exit(EXIT_SUCCESS);
+ }

ASSERT_EQ(sys_waitid(P_PID, self->child_pid_exited, WEXITED | WNOWAIT), 0);

@@ -174,18 +240,43 @@ FIXTURE_SETUP(current_nsset)
EXPECT_EQ(ret, 0);

/* Create tasks that will be stopped. */
- self->child_pid1 = create_child(&self->child_pidfd1,
- CLONE_NEWUSER | CLONE_NEWNS |
- CLONE_NEWCGROUP | CLONE_NEWIPC |
- CLONE_NEWUTS | CLONE_NEWPID |
- CLONE_NEWNET);
+ if (self->nsfds[PIDFD_NS_USER] >= 0 && self->nsfds[PIDFD_NS_PID] >= 0)
+ self->child_pid1 = create_child(&self->child_pidfd1, CLONE_NEWUSER | CLONE_NEWPID);
+ else if (self->nsfds[PIDFD_NS_PID] >= 0)
+ self->child_pid1 = create_child(&self->child_pidfd1, CLONE_NEWPID);
+ else if (self->nsfds[PIDFD_NS_USER] >= 0)
+ self->child_pid1 = create_child(&self->child_pidfd1, CLONE_NEWUSER);
+ else
+ self->child_pid1 = create_child(&self->child_pidfd1, 0);
EXPECT_GE(self->child_pid1, 0);

if (self->child_pid1 == 0) {
close(ipc_sockets[0]);

- if (!switch_timens())
+ if (self->nsfds[PIDFD_NS_MNT] >= 0 && unshare(CLONE_NEWNS) < 0) {
+ TH_LOG("%m - Failed to unshare mount namespace for process %d", self->pid);
_exit(EXIT_FAILURE);
+ }
+ if (self->nsfds[PIDFD_NS_CGROUP] >= 0 && unshare(CLONE_NEWCGROUP) < 0) {
+ TH_LOG("%m - Failed to unshare cgroup namespace for process %d", self->pid);
+ _exit(EXIT_FAILURE);
+ }
+ if (self->nsfds[PIDFD_NS_IPC] >= 0 && unshare(CLONE_NEWIPC) < 0) {
+ TH_LOG("%m - Failed to unshare ipc namespace for process %d", self->pid);
+ _exit(EXIT_FAILURE);
+ }
+ if (self->nsfds[PIDFD_NS_UTS] >= 0 && unshare(CLONE_NEWUTS) < 0) {
+ TH_LOG("%m - Failed to unshare uts namespace for process %d", self->pid);
+ _exit(EXIT_FAILURE);
+ }
+ if (self->nsfds[PIDFD_NS_NET] >= 0 && unshare(CLONE_NEWNET) < 0) {
+ TH_LOG("%m - Failed to unshare net namespace for process %d", self->pid);
+ _exit(EXIT_FAILURE);
+ }
+ if (self->nsfds[PIDFD_NS_TIME] >= 0 && !switch_timens()) {
+ TH_LOG("%m - Failed to unshare time namespace for process %d", self->pid);
+ _exit(EXIT_FAILURE);
+ }

if (write_nointr(ipc_sockets[1], "1", 1) < 0)
_exit(EXIT_FAILURE);
@@ -203,18 +294,43 @@ FIXTURE_SETUP(current_nsset)
ret = socketpair(AF_LOCAL, SOCK_STREAM | SOCK_CLOEXEC, 0, ipc_sockets);
EXPECT_EQ(ret, 0);

- self->child_pid2 = create_child(&self->child_pidfd2,
- CLONE_NEWUSER | CLONE_NEWNS |
- CLONE_NEWCGROUP | CLONE_NEWIPC |
- CLONE_NEWUTS | CLONE_NEWPID |
- CLONE_NEWNET);
+ if (self->nsfds[PIDFD_NS_USER] >= 0 && self->nsfds[PIDFD_NS_PID] >= 0)
+ self->child_pid2 = create_child(&self->child_pidfd2, CLONE_NEWUSER | CLONE_NEWPID);
+ else if (self->nsfds[PIDFD_NS_PID] >= 0)
+ self->child_pid2 = create_child(&self->child_pidfd2, CLONE_NEWPID);
+ else if (self->nsfds[PIDFD_NS_USER] >= 0)
+ self->child_pid2 = create_child(&self->child_pidfd2, CLONE_NEWUSER);
+ else
+ self->child_pid2 = create_child(&self->child_pidfd2, 0);
EXPECT_GE(self->child_pid2, 0);

if (self->child_pid2 == 0) {
close(ipc_sockets[0]);

- if (!switch_timens())
+ if (self->nsfds[PIDFD_NS_MNT] >= 0 && unshare(CLONE_NEWNS) < 0) {
+ TH_LOG("%m - Failed to unshare mount namespace for process %d", self->pid);
+ _exit(EXIT_FAILURE);
+ }
+ if (self->nsfds[PIDFD_NS_CGROUP] >= 0 && unshare(CLONE_NEWCGROUP) < 0) {
+ TH_LOG("%m - Failed to unshare cgroup namespace for process %d", self->pid);
_exit(EXIT_FAILURE);
+ }
+ if (self->nsfds[PIDFD_NS_IPC] >= 0 && unshare(CLONE_NEWIPC) < 0) {
+ TH_LOG("%m - Failed to unshare ipc namespace for process %d", self->pid);
+ _exit(EXIT_FAILURE);
+ }
+ if (self->nsfds[PIDFD_NS_UTS] >= 0 && unshare(CLONE_NEWUTS) < 0) {
+ TH_LOG("%m - Failed to unshare uts namespace for process %d", self->pid);
+ _exit(EXIT_FAILURE);
+ }
+ if (self->nsfds[PIDFD_NS_NET] >= 0 && unshare(CLONE_NEWNET) < 0) {
+ TH_LOG("%m - Failed to unshare net namespace for process %d", self->pid);
+ _exit(EXIT_FAILURE);
+ }
+ if (self->nsfds[PIDFD_NS_TIME] >= 0 && !switch_timens()) {
+ TH_LOG("%m - Failed to unshare time namespace for process %d", self->pid);
+ _exit(EXIT_FAILURE);
+ }

if (write_nointr(ipc_sockets[1], "1", 1) < 0)
_exit(EXIT_FAILURE);
@@ -267,6 +383,22 @@ FIXTURE_SETUP(current_nsset)
info->name, self->child_pid1);
}
}
+
+ self->child_pidfd_derived_nsfds1[i] = ioctl(self->child_pidfd1, info->pidfd_ioctl, 0);
+ if (self->child_pidfd_derived_nsfds1[i] < 0) {
+ EXPECT_EQ(errno, EOPNOTSUPP) {
+ TH_LOG("%m - Failed to derive %s namespace from pidfd of process %d",
+ info->name, self->child_pid1);
+ }
+ }
+
+ self->child_pidfd_derived_nsfds2[i] = ioctl(self->child_pidfd2, info->pidfd_ioctl, 0);
+ if (self->child_pidfd_derived_nsfds2[i] < 0) {
+ EXPECT_EQ(errno, EOPNOTSUPP) {
+ TH_LOG("%m - Failed to derive %s namespace from pidfd of process %d",
+ info->name, self->child_pid2);
+ }
+ }
}

close(proc_fd);
@@ -288,6 +420,12 @@ FIXTURE_TEARDOWN(current_nsset)
close(self->child_nsfds1[i]);
if (self->child_nsfds2[i] >= 0)
close(self->child_nsfds2[i]);
+ if (self->child_pidfd_derived_nsfds[i] >= 0)
+ close(self->child_pidfd_derived_nsfds[i]);
+ if (self->child_pidfd_derived_nsfds1[i] >= 0)
+ close(self->child_pidfd_derived_nsfds1[i]);
+ if (self->child_pidfd_derived_nsfds2[i] >= 0)
+ close(self->child_pidfd_derived_nsfds2[i]);
}

if (self->child_pidfd1 >= 0)
@@ -446,6 +584,42 @@ TEST_F(current_nsset, nsfd_incremental_setns)
}
}

+TEST_F(current_nsset, pidfd_derived_nsfd_incremental_setns)
+{
+ int i;
+ pid_t pid;
+
+ pid = getpid();
+ for (i = 0; i < PIDFD_NS_MAX; i++) {
+ const struct ns_info *info = &ns_info[i];
+ int nsfd;
+
+ if (self->child_pidfd_derived_nsfds1[i] < 0)
+ continue;
+
+ if (info->flag) {
+ ASSERT_EQ(setns(self->child_pidfd_derived_nsfds1[i], info->flag), 0) {
+ TH_LOG("%m - Failed to setns to %s namespace of %d via nsfd %d",
+ info->name, self->child_pid1,
+ self->child_pidfd_derived_nsfds1[i]);
+ }
+ }
+
+ /* Verify that we have changed to the correct namespaces. */
+ if (info->flag == CLONE_NEWPID)
+ nsfd = self->child_pidfd_derived_nsfds[i];
+ else
+ nsfd = self->child_pidfd_derived_nsfds1[i];
+ ASSERT_EQ(in_same_namespace(nsfd, pid, info->name), 1) {
+ TH_LOG("setns failed to place us correctly into %s namespace of %d via nsfd %d",
+ info->name, self->child_pid1,
+ self->child_pidfd_derived_nsfds1[i]);
+ }
+ TH_LOG("Managed to correctly setns to %s namespace of %d via nsfd %d",
+ info->name, self->child_pid1, self->child_pidfd_derived_nsfds1[i]);
+ }
+}
+
TEST_F(current_nsset, pidfd_one_shot_setns)
{
unsigned flags = 0;
@@ -542,6 +716,28 @@ TEST_F(current_nsset, no_foul_play)
info->name, self->child_pid2,
self->child_nsfds2[i]);
}
+
+ /*
+ * Can't setns to a user namespace outside of our hierarchy since we
+ * don't have caps in there and didn't create it. That means that under
+ * no circumstances should we be able to setns to any of the other
+ * ones since they aren't owned by our user namespace.
+ */
+ for (i = 0; i < PIDFD_NS_MAX; i++) {
+ const struct ns_info *info = &ns_info[i];
+
+ if (self->child_pidfd_derived_nsfds2[i] < 0 || !info->flag)
+ continue;
+
+ ASSERT_NE(setns(self->child_pidfd_derived_nsfds2[i], info->flag), 0) {
+ TH_LOG("Managed to setns to %s namespace of %d via nsfd %d",
+ info->name, self->child_pid2,
+ self->child_pidfd_derived_nsfds2[i]);
+ }
+ TH_LOG("%m - Correctly failed to setns to %s namespace of %d via nsfd %d",
+ info->name, self->child_pid2,
+ self->child_pidfd_derived_nsfds2[i]);
+ }
}

TEST(setns_einval)
--
2.43.0

Reply all
Reply to author
Forward
0 new messages