Hello everyone, is there any way to run nsjail in Docker without privileged mode?
> docker --privileged mode is used just because of the personality() syscall

There is much more to it than that. Docker itself uses Linux namespaces, cgroups, and a default seccomp profile (via the docs: "The default seccomp profile will adjust to the selected capabilities, in order to allow use of facilities allowed by the capabilities, so you should not have to adjust this") to limit what the user can do inside the container, so inside an unprivileged container nsjail is blocked from many other things it needs (setting up namespaces, cgroups, and mounts).

So:

> Is there any way to run nsjail in docker without privileged mode?

In theory: yes, or maybe. We would need to configure the container so that it can use everything nsjail needs.

So let's try this. We know that nsjail works properly if we use `--privileged`:

```
$ docker run --rm -it --privileged disconnect3d/nsjail nsjail -R / /bin/ls -- /
[2019-03-05T13:12:33+0000] Mode: STANDALONE_ONCE
[2019-03-05T13:12:33+0000] Jail parameters: hostname:'NSJAIL', chroot:'', process:'/bin/ls', bind:[::]:0, max_conns_per_ip:0, time_limit:0, personality:0, daemonize:false, clone_newnet:true, clone_newuser:true, clone_newns:true, clone_newpid:true, clone_newipc:true, clonew_newuts:true, clone_newcgroup:true, keep_caps:false, disable_no_new_privs:false, max_cpus:0
[2019-03-05T13:12:33+0000] Mount point: src:'' dst:'/' flags:'MS_RDONLY' type:'tmpfs' options:'' is_dir:true
[2019-03-05T13:12:33+0000] Mount point: src:'/' dst:'/' flags:'MS_RDONLY|MS_BIND|MS_REC|MS_PRIVATE' type:'' options:'' is_dir:true
[2019-03-05T13:12:33+0000] Mount point: src:'' dst:'/proc' flags:'MS_RDONLY' type:'proc' options:'' is_dir:true
[2019-03-05T13:12:33+0000] Uid map: inside_uid:0 outside_uid:0 count:1 newuidmap:false
[2019-03-05T13:12:33+0000] [W][1] void cmdline::logParams(nsjconf_t*)():236 Process will be UID/EUID=0 in the global user namespace, and will have user root-level access to files
[2019-03-05T13:12:33+0000] Gid map: inside_gid:0 outside_gid:0 count:1 newgidmap:false
[2019-03-05T13:12:33+0000] [W][1] void cmdline::logParams(nsjconf_t*)():247 Process will be GID/EGID=0 in the global user namespace, and will have group root-level access to files
[2019-03-05T13:12:33+0000] Executing '/bin/ls' for '[STANDALONE MODE]'
bin boot dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var
[2019-03-05T13:12:33+0000] PID: 6 ([STANDALONE MODE]) exited with status: 0, (PIDs left: 0)
```

So maybe we can remove `--privileged`, add all Linux capabilities, and remove the seccomp profile?

```
$ docker run --rm -it --cap-add=ALL --security-opt seccomp=unconfined disconnect3d/nsjail nsjail -R / /bin/ls -- /
[2019-03-05T13:12:49+0000] Mode: STANDALONE_ONCE
[2019-03-05T13:12:49+0000] Jail parameters: hostname:'NSJAIL', chroot:'', process:'/bin/ls', bind:[::]:0, max_conns_per_ip:0, time_limit:0, personality:0, daemonize:false, clone_newnet:true, clone_newuser:true, clone_newns:true, clone_newpid:true, clone_newipc:true, clonew_newuts:true, clone_newcgroup:true, keep_caps:false, disable_no_new_privs:false, max_cpus:0
[2019-03-05T13:12:49+0000] Mount point: src:'' dst:'/' flags:'MS_RDONLY' type:'tmpfs' options:'' is_dir:true
[2019-03-05T13:12:49+0000] Mount point: src:'/' dst:'/' flags:'MS_RDONLY|MS_BIND|MS_REC|MS_PRIVATE' type:'' options:'' is_dir:true
[2019-03-05T13:12:49+0000] Mount point: src:'' dst:'/proc' flags:'MS_RDONLY' type:'proc' options:'' is_dir:true
[2019-03-05T13:12:49+0000] Uid map: inside_uid:0 outside_uid:0 count:1 newuidmap:false
[2019-03-05T13:12:49+0000] [W][1] void cmdline::logParams(nsjconf_t*)():236 Process will be UID/EUID=0 in the global user namespace, and will have user root-level access to files
[2019-03-05T13:12:49+0000] Gid map: inside_gid:0 outside_gid:0 count:1 newgidmap:false
[2019-03-05T13:12:49+0000] [W][1] void cmdline::logParams(nsjconf_t*)():247 Process will be GID/EGID=0 in the global user namespace, and will have group root-level access to files
[2019-03-05T13:12:49+0000] [E][1] bool mnt::initNsInternal(nsjconf_t*)():370 mount('/', '/', NULL, MS_REC|MS_PRIVATE, NULL): Permission denied
[2019-03-05T13:12:49+0000] [F][1] bool subproc::runChild(nsjconf_t*, int, int, int)():432 Launching child process failed
[2019-03-05T13:12:49+0000] [W][1] bool subproc::runChild(nsjconf_t*, int, int, int)():460 Received error message from the child process before it has been executed
[2019-03-05T13:12:49+0000] [E][1] int nsjail::standaloneMode(nsjconf_t*)():146 Couldn't launch the child process
```

But we are still getting:

> [2019-03-05T13:12:49+0000] [E][1] bool mnt::initNsInternal(nsjconf_t*)():370 mount('/', '/', NULL, MS_REC|MS_PRIVATE, NULL): Permission denied

And I am not sure how to make it work from here. It might be something to do with the "device cgroup controller"? Via the docs (https://docs.docker.com/v17.09/edge/engine/reference/commandline/run/#full-container-capabilities-privileged):

> The `--privileged` flag gives all capabilities to the container, and it also lifts all the limitations enforced by the `device` cgroup controller. In other words, the container can then do almost everything that the host can do. This flag exists to allow special use-cases, like running Docker within Docker.

---

Also, it is probably worth asking: why would one want to run nsjail in a Docker container?

I used it because building nsjail and running it on different machines and distros can be a hassle, and Docker speeds up this process (i.e. you have *one* nsjail build & run environment).

Also, if your nsjail config needs to run as root, that is probably the same as running it through Docker with `--privileged`.

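One way to narrow down what still differs between `--privileged` and `--cap-add=ALL --security-opt seccomp=unconfined` is to compare the effective capability mask and the seccomp mode that the kernel reports in `/proc/self/status` inside each container. The following is only a minimal sketch, assuming a Linux `/proc`; `show_confinement` is a hypothetical helper name, and it does not cover AppArmor or the device cgroup:

```shell
#!/bin/sh
# show_confinement is a hypothetical helper, not part of Docker or nsjail.
# It prints the effective capability mask and the seccomp mode found in a
# /proc/<pid>/status file (defaults to the current process).
show_confinement() {
    status_file="${1:-/proc/self/status}"
    # CapEff:  hex bitmask of effective capabilities
    # Seccomp: 0 = disabled, 1 = strict, 2 = filter
    grep -E '^(CapEff|Seccomp):' "$status_file"
}

# Only run against the live process when /proc is available.
if [ -r /proc/self/status ]; then
    show_confinement
fi
```

If both containers print the same two lines, the remaining difference is probably not in capabilities or seccomp, and something else (for example an AppArmor profile or the device cgroup) would be the next thing to check.
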
vagrant@localhost:~$ sudo docker run --rm -it --security-opt seccomp=unconfined --security-opt apparmor=unconfined --security-opt=no-new-privileges --cap-add SYS_ADMIN -v /proc:/new_proc disconnect3d/nsjail /bin/bash
root@471cd0e0c9f0:/# nsjail -R / /bin/ls -- /
[2019-03-13T10:23:51+0000] Mode: STANDALONE_ONCE
[2019-03-13T10:23:51+0000] Jail parameters: hostname:'NSJAIL', chroot:'', process:'/bin/ls', bind:[::]:0, max_conns_per_ip:0, time_limit:0, personality:0, daemonize:false, clone_newnet:true, clone_newuser:true, clone_newns:true, clone_newpid:true, clone_newipc:true, clonew_newuts:true, clone_newcgroup:true, keep_caps:false, disable_no_new_privs:false, max_cpus:0
[2019-03-13T10:23:51+0000] Mount point: src:'' dst:'/' flags:'MS_RDONLY' type:'tmpfs' options:'' is_dir:true
[2019-03-13T10:23:51+0000] Mount point: src:'/' dst:'/' flags:'MS_RDONLY|MS_BIND|MS_REC|MS_PRIVATE' type:'' options:'' is_dir:true
[2019-03-13T10:23:51+0000] Mount point: src:'' dst:'/proc' flags:'MS_RDONLY' type:'proc' options:'' is_dir:true
[2019-03-13T10:23:51+0000] Uid map: inside_uid:0 outside_uid:0 count:1 newuidmap:false
[2019-03-13T10:23:51+0000] [W][15] void cmdline::logParams(nsjconf_t*)():236 Process will be UID/EUID=0 in the global user namespace, and will have user root-level access to files
[2019-03-13T10:23:51+0000] Gid map: inside_gid:0 outside_gid:0 count:1 newgidmap:false
[2019-03-13T10:23:51+0000] [W][15] void cmdline::logParams(nsjconf_t*)():247 Process will be GID/EGID=0 in the global user namespace, and will have group root-level access to files
[2019-03-13T10:23:51+0000] Executing '/bin/ls' for '[STANDALONE MODE]'
bin boot dev etc home lib lib64 media mnt new_proc opt proc root run sbin srv sys tmp usr var
[2019-03-13T10:23:51+0000] PID: 16 ([STANDALONE MODE]) exited with status: 0, (PIDs left: 0)
root@471cd0e0c9f0:/#
root@471cd0e0c9f0:/#
root@471cd0e0c9f0:/# exit
vagrant@localhost:~$ sudo docker run --rm -it --security-opt seccomp=unconfined --security-opt apparmor=unconfined --security-opt=no-new-privileges --cap-add SYS_ADMIN disconnect3d/nsjail /bin/bash
root@c649a6874ef7:/# mkdir -p /tmp/nsjail.root/proc
root@c649a6874ef7:/# mount -t proc none /tmp/nsjail.root/proc
root@c649a6874ef7:/# nsjail -R / /bin/ls -- /
[2019-03-13T10:24:42+0000] Mode: STANDALONE_ONCE
[2019-03-13T10:24:42+0000] Jail parameters: hostname:'NSJAIL', chroot:'', process:'/bin/ls', bind:[::]:0, max_conns_per_ip:0, time_limit:0, personality:0, daemonize:false, clone_newnet:true, clone_newuser:true, clone_newns:true, clone_newpid:true, clone_newipc:true, clonew_newuts:true, clone_newcgroup:true, keep_caps:false, disable_no_new_privs:false, max_cpus:0
[2019-03-13T10:24:42+0000] Mount point: src:'' dst:'/' flags:'MS_RDONLY' type:'tmpfs' options:'' is_dir:true
[2019-03-13T10:24:42+0000] Mount point: src:'/' dst:'/' flags:'MS_RDONLY|MS_BIND|MS_REC|MS_PRIVATE' type:'' options:'' is_dir:true
[2019-03-13T10:24:42+0000] Mount point: src:'' dst:'/proc' flags:'MS_RDONLY' type:'proc' options:'' is_dir:true
[2019-03-13T10:24:42+0000] Uid map: inside_uid:0 outside_uid:0 count:1 newuidmap:false
[2019-03-13T10:24:42+0000] [W][17] void cmdline::logParams(nsjconf_t*)():236 Process will be UID/EUID=0 in the global user namespace, and will have user root-level access to files
[2019-03-13T10:24:42+0000] Gid map: inside_gid:0 outside_gid:0 count:1 newgidmap:false
[2019-03-13T10:24:42+0000] [W][17] void cmdline::logParams(nsjconf_t*)():247 Process will be GID/EGID=0 in the global user namespace, and will have group root-level access to files
[2019-03-13T10:24:42+0000] Executing '/bin/ls' for '[STANDALONE MODE]'
bin boot dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var
[2019-03-13T10:24:42+0000] PID: 18 ([STANDALONE MODE]) exited with status: 0, (PIDs left: 0)
root@c649a6874ef7:/# ^C
root@c649a6874ef7:/# exit
vagrant@localhost:~$ sudo docker run --rm -it --security-opt seccomp=unconfined --security-opt apparmor=unconfined --security-opt=no-new-privileges --cap-add SYS_ADMIN disconnect3d/nsjail /bin/bash
root@4f2ae317840d:/# nsjail -R / /bin/ls -- /
[2019-03-13T10:25:25+0000] Mode: STANDALONE_ONCE
[2019-03-13T10:25:25+0000] Jail parameters: hostname:'NSJAIL', chroot:'', process:'/bin/ls', bind:[::]:0, max_conns_per_ip:0, time_limit:0, personality:0, daemonize:false, clone_newnet:true, clone_newuser:true, clone_newns:true, clone_newpid:true, clone_newipc:true, clonew_newuts:true, clone_newcgroup:true, keep_caps:false, disable_no_new_privs:false, max_cpus:0
[2019-03-13T10:25:25+0000] Mount point: src:'' dst:'/' flags:'MS_RDONLY' type:'tmpfs' options:'' is_dir:true
[2019-03-13T10:25:25+0000] Mount point: src:'/' dst:'/' flags:'MS_RDONLY|MS_BIND|MS_REC|MS_PRIVATE' type:'' options:'' is_dir:true
[2019-03-13T10:25:25+0000] Mount point: src:'' dst:'/proc' flags:'MS_RDONLY' type:'proc' options:'' is_dir:true
[2019-03-13T10:25:25+0000] Uid map: inside_uid:0 outside_uid:0 count:1 newuidmap:false
[2019-03-13T10:25:25+0000] [W][15] void cmdline::logParams(nsjconf_t*)():236 Process will be UID/EUID=0 in the global user namespace, and will have user root-level access to files
[2019-03-13T10:25:25+0000] Gid map: inside_gid:0 outside_gid:0 count:1 newgidmap:false
[2019-03-13T10:25:25+0000] [W][15] void cmdline::logParams(nsjconf_t*)():247 Process will be GID/EGID=0 in the global user namespace, and will have group root-level access to files
[2019-03-13T10:25:25+0000] [W][1] bool mnt::mountPt(mount_t*, const char*, const char*)():204 mount('src:'' dst:'/proc' flags:'MS_RDONLY' type:'proc' options:'' is_dir:true') src:'none' dstpath:'/tmp/nsjail.0.root//proc' failed: Operation not permitted
[2019-03-13T10:25:25+0000] [W][1] bool mnt::mountPt(mount_t*, const char*, const char*)():209 procfs can only be mounted if the original /proc doesn't have any other file-systems mounted on top of it (e.g. /dev/null on top of /proc/kcore): Operation not permitted
[2019-03-13T10:25:25+0000] [F][1] bool subproc::runChild(nsjconf_t*, int, int, int)():432 Launching child process failed
[2019-03-13T10:25:25+0000] [W][15] bool subproc::runChild(nsjconf_t*, int, int, int)():460 Received error message from the child process before it has been executed
[2019-03-13T10:25:25+0000] [E][15] int nsjail::standaloneMode(nsjconf_t*)():146 Couldn't launch the child process
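The warning in the last run says procfs can only be mounted if the original `/proc` has nothing mounted on top of it, while the two earlier runs succeed once another procfs (`/new_proc`, or a fresh mount under `/tmp/nsjail.root/proc`) is present. As far as I can tell, this is because the kernel wants some fully visible procfs in the mount namespace before it allows a new one, but treat that as an assumption. A minimal sketch for inspecting both conditions from inside the container, using `/proc/self/mountinfo` (the two function names are hypothetical helpers, not nsjail features):

```shell
#!/bin/sh
# In /proc/self/mountinfo, field 4 is the root of the mount within its
# filesystem, field 5 is the mount point, and the filesystem type is the
# first field after the "-" separator (see proc(5)).

# Mount points stacked on top of /proc (e.g. /proc/kcore, /proc/sys).
list_proc_overmounts() {
    awk '$5 ~ /^\/proc\// { print $5 }' "${1:-/proc/self/mountinfo}"
}

# Mount points of procfs instances that expose their whole tree (root "/").
list_full_procfs_mounts() {
    awk '$4 == "/" {
        for (i = 7; i <= NF; i++) {
            if ($i == "-") {
                if ($(i + 1) == "proc") print $5
                break
            }
        }
    }' "${1:-/proc/self/mountinfo}"
}

# Only run against the live system when mountinfo is available.
if [ -r /proc/self/mountinfo ]; then
    echo "mounts on top of /proc:"
    list_proc_overmounts
    echo "full procfs mount points:"
    list_full_procfs_mounts
fi
```

In the failing container one would expect the first list to be non-empty and the second to contain only `/proc`; in the two working setups the second list should also show the extra procfs mount. The kernel's real check is more involved than this listing, so the output is a hint rather than a verdict.
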