NetBSD syzbot broken?

Taylor R Campbell

Mar 28, 2024, 11:05:24 PM
to Aleksandr Nogikh, Dmitry Vyukov, syzkaller-...@googlegroups.com
It's been a little while since I looked at syzbot, and it looks like
all three NetBSD builders are broken right now, but possibly because
of an issue on the host rather than in the NetBSD guest:

https://syzkaller.appspot.com/bug?id=27607fdf2130867e13d3546a583d489bab005260

> failed to create the VM Instance: failed to read from qemu: EOF
> WARNING: Image format was not specified for '/syzkaller/managers/ci2-netbsd/latest.tmp/image' and probing guessed raw.
> Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
> Specify the 'raw' format explicitly to remove the restrictions.
> Could not access KVM kernel module: No such file or directory
> qemu-system-x86_64: failed to initialize kvm: No such file or directory

Am I reading this right -- that it's a host problem with qemu failing
to start, not a guest problem? Is there anything I can do to diagnose
this or bring it back?
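
A quick host-side check can narrow this down. The sketch below, in Python, classifies the state of the KVM device node; qemu's "Could not access KVM kernel module: No such file or directory" usually corresponds to the "missing" case. The function name `kvm_status` and its return strings are made up for illustration and are not part of qemu or syzkaller.

```python
import os


def kvm_status(path="/dev/kvm"):
    """Classify whether a KVM device node is usable by the current user.

    Returns "missing", "no-permission", or "ok". These labels are
    invented for this sketch.
    """
    if not os.path.exists(path):
        # Matches ENOENT: no KVM module loaded, or no virtualization
        # support exposed to this machine.
        return "missing"
    if not os.access(path, os.R_OK | os.W_OK):
        # The node exists, but this user cannot open it read/write.
        return "no-permission"
    return "ok"


if __name__ == "__main__":
    print("/dev/kvm:", kvm_status())
```

A "missing" result points at the host (no KVM module loaded, or no virtualization support available to it) rather than at the NetBSD guest.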

Aleksandr Nogikh

Apr 3, 2024, 2:06:11 PM
to Taylor R Campbell, Dmitry Vyukov, syzkaller-...@googlegroups.com, Taras Madan
Hello,

Thanks for reaching out!

That is indeed due to some problem with the host qemu, but that makes it even more mysterious. I didn't update anything on the host and I think nobody else did that either. And it's now unfortunately impossible to pinpoint the exact moment when it began to fail with "failed to create the VM Instance:" -- syzbot cleans up old logs and in some of these logs there are actually ordinary NetBSD compilation errors. Could a NetBSD host have updated the qemu package itself?

In the syzkaller code, we do have an option to distinguish whether we need to explicitly specify the raw format or not:
https://github.com/google/syzkaller/blob/51c4dcff83b0574620c280cc5130ef59cc4a2e32/vm/qemu/qemu.go#L459

We can set UseNewQemuImageOptions=true for NetBSD and it should hopefully be working again. But I still don't understand why it has popped up..

-- 
Aleksandr




Taylor R Campbell

Apr 10, 2024, 9:36:49 PM
to Aleksandr Nogikh, Dmitry Vyukov, syzkaller-...@googlegroups.com, Taras Madan
> Date: Wed, 3 Apr 2024 20:05:54 +0200
> From: Aleksandr Nogikh <nog...@google.com>
>
> That is indeed due to some problem with the host qemu, but that makes it
> even more mysterious. I didn't update anything on the host and I think
> nobody else did that either. And it's now unfortunately impossible to
> pinpoint the exact moment when it began to fail with "failed to create the
> VM Instance:" -- syzbot cleans up old logs and in some of these logs there
> are actually ordinary NetBSD compilation errors. Could a NetBSD host have
> updated the qemu package itself?

Not sure what you mean by this -- is syzbot running NetBSD guests
under a NetBSD host?

If so, this message is pretty weird:

Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize kvm: No such file or directory

I would expect to see something about nvmm, not kvm, if that were the
case; qemu under a NetBSD host has never supported `-accel kvm' or
anything like that -- only `-accel nvmm'. If there haven't been any
recent changes about this, surely this would have failed all along
because NetBSD as a host has never had kvm.

But maybe I misunderstood your question? I'm fuzzy on how syzbot
operates at a high level -- I've only dug into the issues it reports.
Certainly I'd be astonished if a NetBSD _guest_ changed any kind of
package installation on the _host_.

> In the syzkaller code, we do have an option to distinguish whether we need
> to explicitly specify the raw format or not:
> https://github.com/google/syzkaller/blob/51c4dcff83b0574620c280cc5130ef59cc4a2e32/vm/qemu/qemu.go#L459
>
> We can set UseNewQemuImageOptions=true for NetBSD and it should hopefully
> be working again. But I still don't understand why it has popped up..

It seems like if a file is supposed to be interpreted as a raw image,
it would be prudent to say `format=raw'. How do I find how the image
(/syzkaller/managers/ci2-netbsd/latest.tmp/image) is created and what
format it is supposed to be in?

Conceivably, if something changed about that somehow (e.g., maybe
NetBSD newfs or mkimage started putting in different formatting
that might confuse qemu's format detection), that could break this.
But it seems unlikely.

Aleksandr Nogikh

Apr 11, 2024, 6:44:43 AM
to Taylor R Campbell, Dmitry Vyukov, syzkaller-...@googlegroups.com, Taras Madan
On Thu, Apr 11, 2024 at 3:36 AM Taylor R Campbell <rias...@netbsd.org> wrote:
>
> > Date: Wed, 3 Apr 2024 20:05:54 +0200
> > From: Aleksandr Nogikh <nog...@google.com>
> >
> > That is indeed due to some problem with the host qemu, but that makes it
> > even more mysterious. I didn't update anything on the host and I think
> > nobody else did that either. And it's now unfortunately impossible to
> > pinpoint the exact moment when it began to fail with "failed to create the
> > VM Instance:" -- syzbot cleans up old logs and in some of these logs there
> > are actually ordinary NetBSD compilation errors. Could a NetBSD host have
> > updated the qemu package itself?
>
> Not sure what you mean by this -- is syzbot running NetBSD guests
> under a NetBSD host?
>
> If so, this message is pretty weird:
>
> Could not access KVM kernel module: No such file or directory
> qemu-system-x86_64: failed to initialize kvm: No such file or directory
>
> I would expect to see something about nvmm, not kvm, if that were the
> case; qemu under a NetBSD host has never supported `-accel kvm' or
> anything like that -- only `-accel nvmm'. If there haven't been any
> recent changes about this, surely this would have failed all along
> because NetBSD as a host has never had kvm.
>
> But maybe I misunderstood your question? I'm fuzzy on how syzbot
> operates at a high level -- I've only dug into the issues it reports.
> Certainly I'd be astonished if a NetBSD _guest_ changed any kind of
> package installation on the _host_.


Ah, I'm sorry, I have confused it with the FreeBSD setup on syzbot.
NetBSD fuzzing indeed runs on a Linux host.

>
> > In the syzkaller code, we do have an option to distinguish whether we need
> > to explicitly specify the raw format or not:
> > https://github.com/google/syzkaller/blob/51c4dcff83b0574620c280cc5130ef59cc4a2e32/vm/qemu/qemu.go#L459
> >
> > We can set UseNewQemuImageOptions=true for NetBSD and it should hopefully
> > be working again. But I still don't understand why it has popped up..
>
> It seems like if a file is supposed to be interpreted as a raw image,
> it would be prudent to say `format=raw'. How do I find how the image
> (/syzkaller/managers/ci2-netbsd/latest.tmp/image) is created and what
> format it is supposed to be in?

I think it's easier to just use the new arguments format here. I've
sent a PR: https://github.com/google/syzkaller/pull/4672

But the main problem is that it tries to use kvm acceleration, but
fails to: `qemu-system-x86_64: failed to initialize kvm: No such file
or directory`.

It's still unclear whether NetBSD builds did not use kvm before and
somehow began to, or whether for some reason we lost nested
virtualization support on our GCE instance.

Aleksandr Nogikh

Apr 11, 2024, 8:56:27 AM
to Taylor R Campbell, Dmitry Vyukov, syzkaller-...@googlegroups.com, Taras Madan

Taylor R Campbell

Apr 12, 2024, 2:41:35 PM
to Aleksandr Nogikh, Dmitry Vyukov, syzkaller-...@googlegroups.com, Taras Madan
> Date: Thu, 11 Apr 2024 14:56:11 +0200
> From: Aleksandr Nogikh <nog...@google.com>
>
> On Thu, Apr 11, 2024 at 12:44 PM Aleksandr Nogikh <nog...@google.com> wrote:
> >
> I've sent https://github.com/google/syzkaller/commit/95ed9ece851c5ce0f8db8fbe8c852457b4c36a85,
> let's see if it changes anything.

Looks like progress! It now reports a different error:

failed to create the VM Instance: failed to read from qemu: EOF
qemu-system-x86_64: -device virtio-blk-device,drive=hd0: No 'virtio-bus' bus found for device 'virtio-blk-device'

Maybe this needs to be virtio-blk-pci instead? I don't think NetBSD
on x86 currently has any non-pci virtio.
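
To make the two points raised so far concrete (stating the image format explicitly, and attaching the disk through a PCI virtio device), here is a minimal Python sketch of how such qemu arguments could be assembled. It is not syzkaller's actual code, which is written in Go; the function name `qemu_disk_args` and its defaults are illustrative only.

```python
def qemu_disk_args(image, device="virtio-blk-pci", drive_id="hd0"):
    """Return qemu arguments attaching `image` as an explicitly raw disk."""
    return [
        # format=raw tells qemu not to probe the image format, which is
        # what produces the "probing guessed raw" warning.
        "-drive", f"file={image},if=none,format=raw,id={drive_id}",
        # On the usual x86 PC machine types virtio devices sit on PCI;
        # virtio-blk-device instead needs a 'virtio-bus' to plug into.
        "-device", f"{device},drive={drive_id}",
    ]


if __name__ == "__main__":
    print(" ".join(qemu_disk_args("/path/to/image")))
```

The `-drive` and `-device` entries are linked through the shared `id=`/`drive=` name, so swapping the device model does not require touching the drive definition.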

Aleksandr Nogikh

Apr 16, 2024, 8:53:55 AM
to Taylor R Campbell, Dmitry Vyukov, syzkaller-...@googlegroups.com, Taras Madan
I've sent https://github.com/google/syzkaller/pull/4696

That was only a warning, so maybe it will work well now.

--
Aleksandr

Taylor R Campbell

Dec 19, 2024, 8:01:14 PM
to Aleksandr Nogikh, Dmitry Vyukov, syzkaller-...@googlegroups.com, Taras Madan
Looks like the NetBSD syzbot is still not running, but for different
reasons now:

https://syzkaller.appspot.com/bug?extid=848e5b707cc911337a29 looks
like a bug in the NetBSD build which was fixed back in October by
adding shquote(3) to libnbcompat during the tools build

https://syzkaller.appspot.com/bug?extid=85f03a4da9f6aedf4c44 looks
like a golang build problem that probably isn't NetBSD specific?

--- FAIL: TestFuzz (11.34s)
testutil.go:35: seed=1733770450604578864
fuzzer_test.go:210: CRASH: first bug
fuzzer_test.go:210: CRASH: second bug
fuzzer_test.go:88: resulting corpus:
testing.go:1231: TempDir RemoveAll cleanup: unlinkat /tmp/TestFuzz1015377258/001: directory not empty
FAIL

https://syzkaller.appspot.com/bug?extid=ed1ef9cf91c3de65d44b looks
like a testbed problem with posix_spawnp -- presumably this is
happening before anything even starts running NetBSD?

Can I trouble any of you folks to help get syzbot running again on
NetBSD? Anything I can do to help make it happen?

Aleksandr Nogikh

Dec 26, 2024, 4:26:43 AM
to Taylor R Campbell, Dmitry Vyukov, syzkaller-...@googlegroups.com, Taras Madan
Thanks for reaching out!

On Fri, Dec 20, 2024 at 2:01 AM Taylor R Campbell <rias...@netbsd.org> wrote:
>
> Looks like the NetBSD syzbot is still not running, but for different
> reasons now:
>
> https://syzkaller.appspot.com/bug?extid=848e5b707cc911337a29 looks
> like a bug in the NetBSD build which was fixed back in October by
> adding shquote(3) to libnbcompat during the tools build

Does something need to be done on the syzbot side to incorporate this fix?

>
> https://syzkaller.appspot.com/bug?extid=85f03a4da9f6aedf4c44 looks
> like a golang build problem that probably isn't NetBSD specific?
>
> --- FAIL: TestFuzz (11.34s)
> testutil.go:35: seed=1733770450604578864
> fuzzer_test.go:210: CRASH: first bug
> fuzzer_test.go:210: CRASH: second bug
> fuzzer_test.go:88: resulting corpus:
> testing.go:1231: TempDir RemoveAll cleanup: unlinkat /tmp/TestFuzz1015377258/001: directory not empty
> FAIL

This looks like a one-off reincarnation of
https://github.com/google/syzkaller/issues/4920
It may cause syzbot sometimes to not update to a newer revision, but
it should not prevent it from doing fuzzing.

>
> https://syzkaller.appspot.com/bug?extid=ed1ef9cf91c3de65d44b looks
> like a testbed problem with posix_spawnp -- presumably this is
> happening before anything even starts running NetBSD?

Yes, it happens when syzbot tries to update to a newer NetBSD kernel
revision. It runs syz-manager with a "-mode=smoke-test" option and
that quickly fails with "SYZFAIL: posix_spawnp failed".

This must have begun after
https://github.com/google/syzkaller/commit/bc1a1b50f942408a9139887b914f745d9fa02adc

>
> Can I trouble any of you folks to help get syzbot running again on
> NetBSD? Anything I can do to help make it happen?

It would really help if you could take a look at the posix_spawnp()
invocation in the commit I referenced above. On NetBSD, the syscall
fails with "errno 22: Invalid argument". What could that invalid
argument be? Why was posix_spawn() working fine?

--
Aleksandr

Taylor R Campbell

Dec 26, 2024, 10:24:06 AM
to Aleksandr Nogikh, Dmitry Vyukov, syzkaller-...@googlegroups.com, Taras Madan
> Date: Thu, 26 Dec 2024 10:26:29 +0100
> From: Aleksandr Nogikh <nog...@google.com>
>
> On Fri, Dec 20, 2024 at 2:01 AM Taylor R Campbell <rias...@netbsd.org> wrote:
> >
> > Looks like the NetBSD syzbot is still not running, but for different
> > reasons now:
> >
> > https://syzkaller.appspot.com/bug?extid=848e5b707cc911337a29 looks
> > like a bug in the NetBSD build which was fixed back in October by
> > adding shquote(3) to libnbcompat during the tools build
>
> Does something need to be done on the syzbot side to incorporate this fix?

Shouldn't be -- if you just do a clean build of NetBSD with build.sh,
starting with build.sh tools, it should pick that up.

If you had cached the tools build, it's possible you need to delete
your tooldir and rebuild it. A quick skim of
https://github.com/google/syzkaller/blob/d3ccff6372e07c6aabd02b5da419aa6492b5f0ad/pkg/build/netbsd.go#L24
suggests you do use fresh tools:

    // Clear the tools.
    if _, err := osutil.RunCmd(5*time.Minute, params.KernelDir, "rm", "-rf", "obj/"); err != nil {
        return ImageDetails{}, err
    }

So this should be addressed, but if it still happens (e.g., if I
misunderstood what is getting cleaned here -- usually I use `build.sh
-O ../obj' so all the build products go into ../obj and the source
tree can even be read-only), I can take a closer look.

> > https://syzkaller.appspot.com/bug?extid=85f03a4da9f6aedf4c44 looks
> > like a golang build problem that probably isn't NetBSD specific?
> >
> > --- FAIL: TestFuzz (11.34s)
> > testutil.go:35: seed=1733770450604578864
> > fuzzer_test.go:210: CRASH: first bug
> > fuzzer_test.go:210: CRASH: second bug
> > fuzzer_test.go:88: resulting corpus:
> > testing.go:1231: TempDir RemoveAll cleanup: unlinkat /tmp/TestFuzz1015377258/001: directory not empty
> > FAIL
>
> This looks like a one-off reincarnation of
> https://github.com/google/syzkaller/issues/4920
> It may cause syzbot sometimes to not update to a newer revision, but
> it should not prevent it from doing fuzzing.

OK, thanks!

> > https://syzkaller.appspot.com/bug?extid=ed1ef9cf91c3de65d44b looks
> > like a testbed problem with posix_spawnp -- presumably this is
> > happening before anything even starts running NetBSD?
>
> Yes, it happens when syzbot tries to update to a newer NetBSD kernel
> revision. It runs syz-manager with a "-mode=smoke-test" option and
> that quickly fails with "SYZFAIL: posix_spawnp failed".
>
> This must have begun after
> https://github.com/google/syzkaller/commit/bc1a1b50f942408a9139887b914f745d9fa02adc
>
> > Can I trouble any of you folks to help get syzbot running again on
> > NetBSD? Anything I can do to help make it happen?
>
> It would really help if you could take a look at the posix_spawnp()
> invocation in the commit I referenced above. On NetBSD, the syscall
> fails with "errno 22: Invalid argument". What could that invalid
> argument be? Why was posix_spawn() working fine?

That's pretty weird, because posix_spawnp just searches $PATH and then
defers to posix_spawn:

https://nxr.netbsd.org/xref/src/lib/libc/gen/posix_spawnp.c?r=1.5

Only the file -> fpath argument is changed (and then only if there's
no slash in the pathname); all of the other ones are passed through
verbatim to posix_spawn.
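
The relationship described above can be modelled roughly as follows, in Python rather than the C of libc. This is a simplified sketch and not NetBSD's implementation: `resolve_spawn_path` is a hypothetical helper, the existence-and-executable check is an assumption of the sketch, and details such as the default search path when PATH is unset are left out.

```python
import os


def resolve_spawn_path(file, path_env):
    """Model how a posix_spawnp-style wrapper picks the path to execute."""
    if "/" in file:
        # A name containing a slash is used as-is; PATH is not consulted.
        return file
    for directory in path_env.split(":"):
        # An empty PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or ".", file)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Nothing found; a real implementation would report an error here.
    return None


if __name__ == "__main__":
    print(resolve_spawn_path("sh", os.environ.get("PATH", "")))
```

Under this model only the path is ever rewritten, which is why switching from posix_spawn to posix_spawnp looks like an unlikely source of an "Invalid argument" error on its own.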

If you can ktrace or ktruss the program it might help to find what
arguments are being passed to posix_spawn. The usage is:

ktrace ./prog # writes binary trace to ktrace.out
kdump # formats binary trace in ktrace.out human-readably

or, if you want a usage model a little more like strace at somewhat
higher overhead:

ktruss [-o <outfile>] ./prog # traces and formats simultaneously

(I guess I should try going through the instructions to run syzkaller
locally at this point rather than just teledebug everything remotely!)

Aleksandr Nogikh

Jan 8, 2025, 12:06:40 PM
to Taylor R Campbell, Dmitry Vyukov, syzkaller-...@googlegroups.com, Taras Madan
On Thu, Dec 26, 2024 at 4:24 PM Taylor R Campbell <rias...@netbsd.org> wrote:
>
> > Date: Thu, 26 Dec 2024 10:26:29 +0100
> > From: Aleksandr Nogikh <nog...@google.com>
> >
> > On Fri, Dec 20, 2024 at 2:01 AM Taylor R Campbell <rias...@netbsd.org> wrote:
> > >
> > > Looks like the NetBSD syzbot is still not running, but for different
> > > reasons now:
> > >
> > > https://syzkaller.appspot.com/bug?extid=848e5b707cc911337a29 looks
> > > like a bug in the NetBSD build which was fixed back in October by
> > > adding shquote(3) to libnbcompat during the tools build
> >
> > Does something need to be done on the syzbot side to incorporate this fix?
>
> Shouldn't be -- if you just do a clean build of NetBSD with build.sh,
> starting with build.sh tools, it should pick that up.
>
> If you had cached the tools build, it's possible you need to delete
> your tooldir and rebuild it. A quick skim of
> https://github.com/google/syzkaller/blob/d3ccff6372e07c6aabd02b5da419aa6492b5f0ad/pkg/build/netbsd.go#L24
> suggests you do use fresh tools:
>
>     // Clear the tools.
>     if _, err := osutil.RunCmd(5*time.Minute, params.KernelDir, "rm", "-rf", "obj/"); err != nil {
>         return ImageDetails{}, err
>     }
>
> So this should be addressed, but if it still happens (e.g., if I
> misunderstood what is getting cleaned here -- usually I use `build.sh
> -O ../obj' so all the build products go into ../obj and the source
> tree can even be read-only), I can take a closer look.

Looked once more at https://syzkaller.appspot.com/bug?extid=848e5b707cc911337a29

I think it was also a transient problem -- there was a spike on Oct
31st and on Dec 20th, and that's all. Since we have a "test error" for
every recent NetBSD commit, syzbot can build NetBSD just fine. So I
guess nothing needs to be done here.

I'm sorry, it looks like I misinformed you about the
https://github.com/google/syzkaller/commit/bc1a1b50f942408a9139887b914f745d9fa02adc
commit.

The commit has indeed caused "netbsd test error: SYZFAIL: posix_spawnp
failed" to happen, but that actually only replaced an older problem
that already existed. It just had a different title: "netbsd test
error: SYZFAIL: posix_spawn failed", see
https://syzkaller.appspot.com/bug?extid=6b8de84f7954fa8b4ecb

And that one appeared ~ on Jun 25th:
https://syzkaller.appspot.com/netbsd/graph/crashes?regexp=netbsd+test+error%3A+SYZFAIL%3A+posix&Months=10&show-graph=Show+graph

That's actually around the time when the corresponding code snippet
first appeared in syzkaller.

Before we start deep debugging of this, does the code ring any bells
for you now given that the problem is not in the posix_spawn ->
posix_spawnp change? :)

>
> https://nxr.netbsd.org/xref/src/lib/libc/gen/posix_spawnp.c?r=1.5
>
> Only the file -> fpath argument is changed (and then only if there's
> no slash in the pathname); all of the other ones are passed through
> verbatim to posix_spawn.
>
> If you can ktrace or ktruss the program it might help to find what
> arguments are being passed to posix_spawn. The usage is:
>
> ktrace ./prog # writes binary trace to ktrace.out
> kdump # formats binary trace in ktrace.out human-readably
>
> or, if you want a usage model a little more like strace at somewhat
> higher overhead:
>
> ktruss [-o <outfile>] ./prog # traces and formats simultaneously
>

I fear it might not be so straightforward to get meaningful data out
of syz-executor traces. It does a lot of initialization and forking,
and it has to follow a complicated communication protocol with the
host process before it even reaches the point of calling posix_spawnp().

If we still want to, we can try to do it by
1) Hacking https://github.com/google/syzkaller/blob/f3558dbf032eab2b77c1cb11b9ce2baffe7838d3/pkg/rpcserver/local.go#L67
to prepend ktrace/ktruss to the syz-executor call.
2) Building syz-execprog on the hacked syzkaller and using it to run
some simple program inside a VM.
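
Step 1 could look roughly like the sketch below, written in Python for brevity even though the real change would be made in syzkaller's Go code. `wrap_with_ktruss` is a hypothetical helper; the only ktruss option it relies on is `-o <outfile>`, as quoted earlier in the thread.

```python
def wrap_with_ktruss(argv, outfile=None):
    """Prefix a command line so that the program runs under ktruss."""
    if not argv:
        raise ValueError("argv must contain at least the program name")
    prefix = ["ktruss"]
    if outfile is not None:
        # Send the formatted trace to a file.
        prefix += ["-o", outfile]
    return prefix + list(argv)


if __name__ == "__main__":
    print(wrap_with_ktruss(["./syz-executor", "exec"], "/tmp/executor.trace"))
```

Because the wrapper only prepends arguments, the executor's own command line stays unchanged.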

**But** if we can get the needed information just by adding extra
`debug(...)` logging calls into syz-executor code, that would be
incomparably easier than everything else.
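
As an illustration of what such logging would need to capture, here is a small Python sketch that calls posix_spawnp through `os.posix_spawnp` and reports the file, argv, and errno on failure. The real change would be `debug(...)` calls in syz-executor's own code; `spawn_and_report` is a hypothetical helper, and it does not exercise any spawn attributes or file actions that syz-executor may pass.

```python
import errno
import os


def spawn_and_report(file, argv, env):
    """Spawn `file` via posix_spawnp and describe the outcome as a string."""
    try:
        pid = os.posix_spawnp(file, argv, env)
    except OSError as e:
        # Record everything needed to reproduce the failing call.
        name = errno.errorcode.get(e.errno, "?")
        return (f"posix_spawnp({file!r}, {argv!r}) failed: "
                f"errno {e.errno} ({name}): {e.strerror}")
    # Reap the child so that its exit status can be reported too.
    _, status = os.waitpid(pid, 0)
    code = os.waitstatus_to_exitcode(status)
    return f"posix_spawnp({file!r}) ok, exit status {code}"


if __name__ == "__main__":
    print(spawn_and_report("sh", ["sh", "-c", "exit 0"], dict(os.environ)))
```

Logging the exact arguments next to the errno makes it possible to compare a failing call with a working one, which is the information a ktrace run would otherwise have to supply.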