Hi all,
I'm a PhD student from Argentina, and I'm trying to set up syzkaller to fuzz the Linux kernel using x86_64 QEMU VMs on an x86_64 host. I've run into some problems getting it to run correctly, so I'd like to ask for your help in diagnosing the issue.
I'm using the gcr.io/syzkaller/env Docker image to run syzkaller. As I understand it, the published image is missing the qemu-system-x86 utility, so I've written a custom Dockerfile that extends the base image by adding it. I run this image manually, since the syz-env tool is hardcoded to look for "gcr.io/syzkaller/env".
I'm working with kernel version v6.15-rc5 (commit hash 92a09c47464d). I've tested that I can boot a QEMU instance inside the container, using the compiled kernel and the bullseye image generated by the create-image.sh tool, and that I can ssh into it (from inside the container).
The problem arises when I try to run syz-manager. In particular, even the smoke-test mode seems to fail. Running it with the debug flag gives me the following output repeatedly, until it crashes:
2025/05/09 16:26:13 running ssh: []string{"-p", "30581", "-F", "/dev/null", "-o", "UserKnownHostsFile=/dev/null", "-o", "IdentitiesOnly=yes", "-o", "BatchMode=yes", "-o", "StrictHostKeyChecking=no", "-o", "ConnectTimeout=10", "-i", "/syzkaller/gopath/src/github.com/google/syzkaller/tmp/bullseye.id_rsa", "-v", "root@localhost", "pwd"}
2025/05/09 16:26:23 ssh failed: failed to run ["ssh" "-p" "30581" "-F" "/dev/null" "-o" "UserKnownHostsFile=/dev/null" "-o" "IdentitiesOnly=yes" "-o" "BatchMode=yes" "-o" "StrictHostKeyChecking=no" "-o" "ConnectTimeout=10" "-i" "/syzkaller/gopath/src/github.com/google/syzkaller/tmp/bullseye.id_rsa" "-v" "root@localhost" "pwd"]: exit status 255
OpenSSH_9.2p1 Debian-2+deb12u5, OpenSSL 3.0.15 3 Sep 2024
debug1: Reading configuration data /dev/null
debug1: Connecting to localhost [::1] port 30581.
debug1: connect to address ::1 port 30581: Connection refused
debug1: Connecting to localhost [127.0.0.1] port 30581.
debug1: fd 3 clearing O_NONBLOCK
debug1: Connection established.
debug1: Local version string SSH-2.0-OpenSSH_9.2p1 Debian-2+deb12u5
Connection timed out during banner exchange
Connection to 127.0.0.1 port 30581 timed out
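
In the output above, the TCP connection is established but no SSH banner arrives within the timeout. As far as I understand, QEMU's user-mode networking accepts connections on the forwarded host port itself, so this pattern may simply mean that sshd in the guest is not answering yet (or at all). A minimal sketch of a probe that separates "port refused" from "connected but no banner" is shown here; the function name and the default timeout are my own choices and do not come from syzkaller:

```python
import socket


def read_ssh_banner(host, port, timeout=10.0):
    """Connect to host:port and return the first line the server sends.

    Returns None if the connection succeeds but no banner arrives within
    `timeout` seconds. Connection errors (e.g. ConnectionRefusedError)
    are deliberately not caught, so "refused" and "silent" stay distinct.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        data = b""
        try:
            # An SSH server sends its identification string first,
            # terminated by a newline (e.g. b"SSH-2.0-OpenSSH_9.2p1 ...\r\n").
            while b"\n" not in data and len(data) < 256:
                chunk = sock.recv(256)
                if not chunk:
                    break  # peer closed the connection
                data += chunk
        except socket.timeout:
            return None
    line = data.split(b"\n", 1)[0].strip()
    return line.decode("ascii", errors="replace") or None


# Example (the port is the one from the log above and changes on every boot):
#   print(read_ssh_banner("127.0.0.1", 30581))
```

If this returns None while the guest console already shows the login prompt, the problem is more likely in the guest's network or sshd setup than in the ssh options syz-manager uses.
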
At the end, just before the crash, I get these few lines of output:
syzkaller login:
2025/05/09 16:30:38 pool: booting instance 0
2025/05/09 16:30:38 running command: qemu-system-x86_64 []string{"-m", "2048", "-smp", "2", "-chardev", "socket,id=SOCKSYZ,server=on,wait=off,host=localhost,port=52471", "-mon", "chardev=SOCKSYZ,mode=control", "-display", "none", "-serial", "stdio", "-no-reboot", "-name", "VM-0", "-device", "virtio-rng-pci", "-enable-kvm", "-cpu", "host,migratable=off", "-device", "e1000,netdev=net0", "-netdev", "user,id=net0,restrict=on,hostfwd=tcp:127.0.0.1:52079-:22", "-hda", "/syzkaller/gopath/src/github.com/google/syzkaller/tmp/bullseye.img", "-snapshot", "-kernel", "/syzkaller/kernel/arch/x86/boot/bzImage", "-append", "root=/dev/sda console=ttyS0 "}
2025/05/09 16:30:38 VM 0: crash: can't ssh into the instance
2025/05/09 16:30:38 [FATAL] kernel crashed in smoke testing mode, exiting
To me, this looks like syz-manager is failing to ssh into the VMs. This is weird because, as I mentioned, I'm able to do it manually (from inside the container, at least).
I should mention that while syz-manager is running I can use wget to access the hosted website at localhost:56741 just fine. However, I can only do this from inside the running container; the service doesn't appear to be accessible from anywhere outside it. Likewise, I can ssh into the running VMs from inside the container, but not from outside (even though the corresponding ports are published). I've tested this extensively, and I'm able to host HTTP services in containers created from other images just fine. I don't know whether this indicates that something is wrong with the networking of the container I'm running, or whether it is related to the problems I'm experiencing.
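
One detail that may be relevant to the outside-access part: the QEMU command line in the log forwards SSH with hostfwd=tcp:127.0.0.1:52079-:22, which binds to the container's loopback address. As far as I understand, Docker's published ports only reach services listening on the container's non-loopback interfaces, so a loopback-only listener would be unreachable from outside regardless of what is published. This would not explain the failure inside the container. The sketch below just extracts the listen address from a -netdev value and checks whether it is loopback-only; both helper names are hypothetical, and it assumes IPv4 literals or "localhost" as the host part:

```python
import ipaddress


def hostfwd_listen_address(netdev_arg):
    """Return (host, port) of the first hostfwd= rule in a '-netdev user,...' value.

    Returns None if the value contains no hostfwd rule.
    """
    for field in netdev_arg.split(","):
        if field.startswith("hostfwd="):
            spec = field[len("hostfwd="):]         # 'tcp:127.0.0.1:52079-:22'
            host_side = spec.split("-", 1)[0]      # 'tcp:127.0.0.1:52079'
            _proto, host, port = host_side.split(":")
            return host, int(port)
    return None


def is_loopback_only(host):
    """True if a listener bound to `host` is reachable only via loopback."""
    if host == "":
        return False  # an empty host part means "listen on all addresses"
    if host == "localhost":
        return True
    return ipaddress.ip_address(host).is_loopback


netdev = "user,id=net0,restrict=on,hostfwd=tcp:127.0.0.1:52079-:22"  # from the log
host, port = hostfwd_listen_address(netdev)
print(host, port, is_loopback_only(host))
```

The same reasoning might apply to the web interface if its listen address in the syz-manager config is a localhost address, but I have not verified that.
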
Perhaps I misunderstood, and I shouldn't be using "gcr.io/syzkaller/env" for this? Should I be running everything directly on my machine instead?
Any help will be greatly appreciated.
Please let me know if there's any other useful information I can share, such as the configs I'm using for syzkaller or the Linux kernel.
Thanks a lot in advance.
Best regards,
Daniel