Hello,
I've been working on getting syzkaller set up on a freebsd guest
from a linux host using qemu/kvm. I am a student looking to build
up some experience and get the career ball rolling, and someone
suggested to me that contributing to an open source project is a
good place to start. On the FreeBSD project ideas page I found
references to syzkaller which caught my interest. I hoped I could
start by adding syscall definitions if any are still unsupported
and help with the FreeBSD documentation section. In the process of
getting everything up and running I made a few changes to correct
some errors that I got along the way.
I noticed that the docs are a bit dated, particularly in the section about compiling the executor on the guest. I found a thread in this group which was helpful in getting the executor compilation going without errors. For some reason, I wasn't able to use 'go get' to acquire the syzkaller sources (probably user error, I haven't done much with golang before) and ended up cloning the repo with git. I'm not sure if that will cause issues down the road but please correct me if I am wrong.
Basically, the steps I took were:
   1.) Set up a freebsd14 VM from the .qcow2 snapshot and grow the image by 45GB with 'qemu-img resize image.qcow2 +45GB' on the host, then run '/etc/rc.d/growfs onestart' on the guest. (45GB is probably overkill, but I kept running out of space during buildworld and added extra to be safe. Less would probably work: my guest image ended up using about 22GB out of 47GB in the end.) Then make the SYZKALLER custom kernel config file according to the current freebsd doc. At first I had issues building the kernel. For me, it worked after running:
      make -j4 buildworld && make -j4 buildkernel KERNCONF=SYZKALLER
      make -j4 installkernel KERNCONF=SYZKALLER
   2.) Install syzkaller dependencies on the host (replace
pacman with the relevant package manager for the host OS):
      pacman -S bash gcc git gmake go golangci-lint llvm
   (golangci-lint was not available in the pacman repos and was
installed from the AUR, so it may need to be acquired separately
from the other dependencies)
   3.) Acquire syzkaller sources via 'git clone' or 'go get'
(git clone in my case, correct me if this was the wrong way to do
things):
      git clone https://github.com/google/syzkaller/
   (OR)
      go get -d -u github.com/google/syzkaller/prog
   4.) Compile the binaries (in host syzkaller directory):
      make all HOSTOS=linux HOSTARCH=amd64 TARGETOS=freebsd TARGETARCH=amd64
   5.) Start the guest image:
      qemu-system-x86_64 -m 2048 -smp 4 \
        -chardev socket,id=SOCKSYZ,server=on,wait=off,host=localhost,port=51727 \
        -mon chardev=SOCKSYZ,mode=control -display none -serial stdio \
        -no-reboot -name VM-0 -device virtio-rng-pci -enable-kvm \
        -device e1000,netdev=net0 \
        -netdev user,id=net0,restrict=on,hostfwd=tcp:127.0.0.1:1569-:22 \
        -hda /path/to/snapshot.qcow2
   6.) Copy the syzkaller/executor/* files from the host to the
guest (replace $PORT and $DIR/TO/SSHKEY as appropriate):
      scp -r -P $PORT -i $DIR/TO/SSHKEY syzkaller/executor/ root@localhost:/root/
   7.) Compile syz-executor (on the guest):
      c++ -o syz-executor executor/executor.cc -pthread -Wall -Wframe-larger-than=8192 -Wparentheses -Werror -O2 -m64 -static -DGOOS_freebsd=1 -DGOARCH_amd64=1 -DGIT_REVISION=\"61f862782082c777ba335aa4b4b08d4f74d7d86e\"
   8.) Copy the syz-executor binary from the guest to the host
in the syzkaller/bin/freebsd_amd64/ directory (replace $PORT and
$SSHKEY as appropriate):
      scp -P $PORT -i $SSHKEY root@localhost:/root/syz-executor syzkaller/bin/freebsd_amd64/
   9.) Shut down the guest and run syz-manager on the host with:
      syz-manager -config freebsd.cfg
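Steps 6 through 8 have to be repeated whenever the executor sources change, so they can be wrapped in a small script. The following is only a rough sketch under a few assumptions: the function names, the DRY_RUN switch, and the default key path are made up for illustration, and it assumes the GIT_REVISION value in step 7 should be the HEAD commit of the host checkout. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Dry-run sketch of steps 6-8. All defaults below are placeholders.

# run_cmd prints the command when DRY_RUN is 1 (the default) and runs it otherwise.
run_cmd() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

executor_roundtrip() {
    port="${PORT:-1569}"                 # host port forwarded to guest port 22 in step 5
    key="${SSHKEY:-$HOME/.ssh/id_rsa}"   # placeholder key path
    dir="${SYZ_DIR:-syzkaller}"          # syzkaller checkout on the host
    # Assumption: the revision is the HEAD of the host checkout ("unknown" outside git).
    rev="${REV:-$(git -C "$dir" rev-parse HEAD 2>/dev/null || echo unknown)}"

    # Step 6: copy the executor sources to /root/executor on the guest.
    run_cmd scp -r -P "$port" -i "$key" "$dir/executor" root@localhost:/root/ || return 1
    # Step 7: compile on the guest with the flags listed in step 7.
    run_cmd ssh -p "$port" -i "$key" root@localhost \
        "c++ -o syz-executor executor/executor.cc -pthread -Wall -Wframe-larger-than=8192 -Wparentheses -Werror -O2 -m64 -static -DGOOS_freebsd=1 -DGOARCH_amd64=1 -DGIT_REVISION=\\\"$rev\\\"" || return 1
    # Step 8: copy the binary back into the host's bin directory.
    run_cmd scp -P "$port" -i "$key" root@localhost:/root/syz-executor "$dir/bin/freebsd_amd64/"
}

executor_roundtrip
```

Running it with DRY_RUN=0 while the guest from step 5 is up would execute the three commands instead of printing them.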
At first, I had some issues with ssh authentication, but realized I had run the qemu command with the -snapshot flag, which made config changes to the guest not persist across reboots. Removing that flag allowed me to properly configure the ssh keys.
Even so, I was getting tons of "SYZFAIL: tun_id out of range" errors (almost 200 in 30-40 minutes of uptime, near constant), which I didn't see any explicit references to in the docs, on the mailing list, or on syzbot. I saw in the report logs that these errors would appear when syz-executor tried to create a tun/tap interface with an ID higher than 4. Being new to kernel fuzzing, at first I wasn't sure if this was normal behavior or a syzkaller bug.
I found this thread which had a brief reference to similar crashes with NetBSD guests and mentions increasing the number of tun/tap devices to resolve the issue. I grep'd around and looked at the commit history, finding that not long after that discussion executor/common_bsd.h was changed to give NetBSD guests a MAX_TUN value of 64, higher than the default of 4. I subsequently edited that line (130) from:
   #if GOOS_netbsd
to:
   #if GOOS_netbsd || GOOS_freebsd
After recompiling, this gave freebsd guests more tun/tap
interfaces to work with and eliminated the SYZFAIL crashes. The
syz-manager was now able to run properly as far as I am aware. I
let it run for 2-3 hours and didn't have a single SYZFAIL so I
think it worked.
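For anyone who wants to try the same MAX_TUN change, the one-line edit can be scripted. This is a sketch only: widen_tun_guard is a hypothetical helper, the line number is passed in because line 130 may have moved since I looked, and the demo runs against a stand-in file written from the description above, not the real contents of executor/common_bsd.h.

```shell
#!/bin/sh
# widen_tun_guard FILE LINE: if LINE of FILE is exactly "#if GOOS_netbsd",
# extend the guard so FreeBSD takes the same branch. Other lines are untouched.
widen_tun_guard() {
    file="$1"
    line="$2"
    tmp=$(mktemp /tmp/tunguard.XXXXXX) || return 1
    sed "${line}s/^#if GOOS_netbsd\$/#if GOOS_netbsd || GOOS_freebsd/" "$file" > "$tmp" &&
        cat "$tmp" > "$file"    # write back in place so file permissions are kept
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Stand-in for the relevant lines (an assumption, not copied from the repo).
demo=$(mktemp /tmp/common_bsd_demo.XXXXXX)
cat > "$demo" <<'EOF'
#if GOOS_netbsd
#define MAX_TUN 64
#else
#define MAX_TUN 4
#endif
EOF
widen_tun_guard "$demo" 1
cat "$demo"
rm -f "$demo"
```

Against a real checkout the call would be something like widen_tun_guard executor/common_bsd.h 130, after checking with grep -n that the guard is still on that line.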
I also noticed a warning very early in the execution of
syz-manager: when calling qemu to start the guest snapshots,
the 'nowait' option is deprecated in favor of 'wait=off'. I was
able to edit vm/qemu/qemu.go (line 408) from:
"-chardev", fmt.Sprintf("socket,id=SOCKSYZ,server=on,nowait,host=localhost,port=%v", inst.monport),
to:
"-chardev", fmt.Sprintf("socket,id=SOCKSYZ,server=on,wait=off,host=localhost,port=%v", inst.monport),
After recompiling again this change stopped that particular
warning. I'm not sure if either of these changes is desirable to
merge into the main branch, but I wanted to document the
process and my rationale for the patches in a way that would be
visible to others trying to use syzkaller with freebsd. I'd be
happy to put in a PR if it would be helpful to the project as a
whole. I'm also willing to update the documentation pages for
freebsd to include the steps I took to get things up and running
for future users.
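Before sending a PR for the nowait change it may be worth checking whether the short form is passed anywhere else in the Go sources. A minimal sketch, assuming a grep that supports -r and --include (GNU and FreeBSD grep both do); find_short_nowait is a made-up name and the demo tree only contains the two strings quoted above.

```shell
#!/bin/sh
# find_short_nowait DIR: list lines in .go files under DIR that still pass the
# short-form "nowait" option (followed by a comma, a quote, or end of line).
find_short_nowait() {
    grep -rnE --include='*.go' ',nowait(,|"|$)' "$1"
}

# Demo on a throwaway tree holding the old and the new option string.
demo=$(mktemp -d /tmp/nowait.XXXXXX)
printf '%s\n' '"socket,id=SOCKSYZ,server=on,nowait,host=localhost,port=%v"' > "$demo/old.go"
printf '%s\n' '"socket,id=SOCKSYZ,server=on,wait=off,host=localhost,port=%v"' > "$demo/new.go"
find_short_nowait "$demo"    # prints only the line from old.go
rm -r "$demo"
```

In a checkout, find_short_nowait vm would cover the VM backends. grep exits non-zero when nothing matches, which is the desired result after the patch.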
I have a few questions after going through the process of setting
everything up. I've seen that when a crash is found the automatic
reproducing fails each time with a cant_build_freebsd_on_linux
error and wonder if this is expected behavior, or another issue
that I should be looking into. I have not tried to manually
reproduce any crashes thus far, but I expect it has something to
do with freebsd using a different make than linux, and I may just
turn reproduction off for now. I also see that clicking the
coverage hyperlink in the web interface reports that a coverage
profile failed to generate because no kernel obj directory was
specified. Can the obj directory be copied to the host and
specified, or is this functionality not yet supported for freebsd?
Lastly, I see that most of the crashes (46 and 75 respectively
from 3-4 hours of fuzzing) are either 'lost connection to test
machine' or 'no output from test machine', which from the docs I
understand can indicate a kernel lockup or other issue.
Is the high rate of these two crash types to be expected, or
should I be looking into whether something else is going wrong?
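To keep an eye on how those two crash types compare with everything else, the counts can be tallied from the manager's working directory. This sketch assumes the layout I see locally, where each crash gets a directory under workdir/crashes/ holding a description file plus numbered log files (log0, log1, ...). crash_counts and the WORKDIR variable are illustrative names.

```shell
#!/bin/sh
# crash_counts WORKDIR: print "<number of logs><TAB><crash title>" for each crash
# directory, most frequent first. Assumes crashes/<id>/description and logN files.
crash_counts() {
    for d in "$1"/crashes/*/; do
        [ -f "${d}description" ] || continue
        n=$(ls "$d" | grep -c '^log[0-9][0-9]*$' || true)
        printf '%s\t%s\n' "$n" "$(head -n 1 "${d}description")"
    done | sort -rn
}

crash_counts "${WORKDIR:-workdir}"
```

If the manager caps the number of logs it keeps per crash, the numbers here would undercount long runs, so treat them as a rough guide.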
Please let me know if I have misunderstood something along the
way or am barking up the wrong tree. I am new to participating in
large projects and kernel fuzzing and am open to feedback on how
to improve.
Thanks,
Adam Goska
Thanks for the responses! I might be at fault for the mail being
flagged as spam. The email account I'm using runs on an openbsd
VPS that I rented and configured, and while I managed to get
SPF/DKIM/DMARC/rDNS working, I've had mixed results on whether the
major providers file mails from my server as spam which I believe
could be related to the .xyz TLD. Curiously, on Outlook accounts
it always goes right to the spam folder, but on my personal Gmail
account it has gone to the inbox without issue. Funny that it
doesn't for Google Groups mails. If it becomes a persistent issue I
may switch over to a Gmail account to make things easier. I use
this one for mailing lists and the like because it doesn't get too
much spam or unwanted messages from various services I've signed
up for over the years.
Back on topic, I appreciate all of your input and it's given me a
lot to think about. I've put in an initial pull
request to update the readme with a working command to
compile syz-executor on a freebsd guest. It's mostly as a test to
see if I am doing things correctly, formatting the commit message
right and the like. Being new to using git/go I'm a bit worried I
have messed up somewhere. I had issues with running the presubmit
tests, not sure yet if that's related to my go setup not being
correct or because I run arch linux on the laptop I'm working this
with and 'make install_prerequisites' runs commands meant for
debian based distros which aren't compatible with pacman. After a
cursory look at the Makefile and repos, some packages look like
they might have slightly different naming conventions between the
two distros. I'll have to look into this further, but it might be
better to set up syz-env instead. I figured because I had only
edited a single line of a markdown file without changing any
formatting it wasn't critical for me to run the local tests and
pulled the trigger anyway, though I hope to get the tests working
soon and definitely before I make any changes to files which will
affect the compiled binaries.
I'm excited to start exploring some of these crashes! Since my last mail I've manually reproduced a crash using syz-execprog with the log file, though I shot myself in the foot a bit by not running the VM in snapshot mode. The disk seemed to be somewhat corrupted, and the VM wouldn't boot normally again until I entered single user mode and ran fsck. I haven't yet minimized the programs that caused the crash to determine the exact cause.
As for the first change I made to stop the SYZFAIL messages, is it possible that syzbot running a freebsd host configuration, as opposed to my linux+qemu config, explains the difference? I did find it weird that there were no occurrences of the 'tun_id out of range' error on syzbot or in reports from other users. The only other explanation I can think of at the moment is that my using git clone instead of go get somehow left out a dependency which would prevent that error, or that my go environment being out of whack in general somehow caused it. I will hold off on submitting a pull request for that change until I understand the problem a little more.
For the second change to modify the qemu flags called to start
the fuzzing VMs, I saw that the nowait option has been deprecated
since version 6.0, which was released in April 2021. This
probably means that it is too early to commit this change just
yet, to preserve compatibility with users on older versions of
qemu. I didn't notice a significant difference in performance
before and after the patch, just that the warning message didn't
appear during the startup routine.
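A quick way to tell whether a given host would print the warning at all is to look at the qemu version. The sketch below assumes the first line of 'qemu-system-x86_64 --version' has the form "QEMU emulator version X.Y.Z" and uses 6.0 as the cutoff mentioned above. qemu_warns_on_short_bool is a hypothetical helper name.

```shell
#!/bin/sh
# qemu_warns_on_short_bool: read `qemu-system-x86_64 --version` output on stdin
# and succeed (exit 0) when the major version is 6 or newer.
qemu_warns_on_short_bool() {
    major=$(sed -n 's/^QEMU emulator version \([0-9][0-9]*\)\..*/\1/p' | head -n 1)
    [ -n "$major" ] && [ "$major" -ge 6 ]
}

# Only query the local binary if it is installed.
if command -v qemu-system-x86_64 >/dev/null 2>&1; then
    if qemu-system-x86_64 --version | qemu_warns_on_short_bool; then
        echo "this qemu is 6.0 or newer: short-form options like nowait are deprecated"
    else
        echo "this qemu predates 6.0: no deprecation warning expected for nowait"
    fi
fi
```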
On those lines of output, where the other tests show pass or ok at the beginning, there is a "?" instead, and toward the end of the line it says 'no test files', so maybe these are just being skipped as you said. There do seem to be quite a few of them though. I am using the syz-env from the current pull of the repo; is it possible that there is a more recent version in the gcr.io registry?
I'm hoping to get some feedback from Mark on whether increasing MAX_TUN for freebsd is a good move or not. I'm curious why I would be getting those SYZFAILs when others don't seem to be.
I also did some looking into the qemu warning change and found the commit where the preferred wait=off syntax was introduced. It was back in 2009, so I don't believe that anyone will be using an old enough qemu version to be affected. The qemu deprecated features page says that after a feature is declared deprecated it stays functional for that release and one more, and is liable to be removed after that point. The short-form boolean options like nowait were declared deprecated in the 6.0 release and it looks like they're on 6.2 now. Am I clear to put in a PR for this change?
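On the test output question, counting the package lines by status shows how many packages were actually skipped. This is a small sketch that assumes the usual 'go test' package summary lines, which start with "ok", "?" or "FAIL". summarize_go_test is just an illustrative name.

```shell
#!/bin/sh
# summarize_go_test: read `go test` output on stdin and count package summary
# lines by their first field. A bare "FAIL" line (the overall result) is ignored.
summarize_go_test() {
    awk '
        $1 == "ok"             { ok++ }
        $1 == "?"              { skipped++ }
        $1 == "FAIL" && NF > 1 { failed++ }
        END { printf "ok=%d no_test_files=%d failed=%d\n", ok, skipped, failed }
    '
}

# Example with made-up package names.
printf 'ok  \texample.com/a\t0.012s\n?   \texample.com/b\t[no test files]\n' | summarize_go_test
```

With a real run it would be used as: go test ./... 2>&1 | summarize_go_test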