About NetBSD bug reports

Aleksandr Nogikh

Mar 14, 2022, 12:41:34 PM
to rias...@netbsd.org, Dmitry Vyukov, Taras Madan, syzkaller-...@googlegroups.com
Hello,

Thank you for such an impressive NetBSD bug fixing sprint!

I would like to ask you a question, just in case you could help us.

Unfortunately, our NetBSD bugs page/group seems to have been abandoned
recently, which is very sad, because it could definitely benefit the
security and reliability of the NetBSD kernel. I already tried to seek
advice on what to do here a couple of months ago, but nobody replied.

Do you know of any ways (e.g. mailing lists other than
syzkaller-...@googlegroups.com) syzbot could reach NetBSD
developers with the bugs it finds? For example, for Linux bugs, we use
`scripts/get_maintainer.pl` to get the related mailing lists /
developer emails and then report to those addresses. Is there
something similar for NetBSD?

P.S. If there are also some things which could have made your recent
active interaction with syzbot easier, please let us know.


Best Regards
Aleksandr

Taylor R Campbell

Mar 14, 2022, 2:00:51 PM
to Aleksandr Nogikh, Dmitry Vyukov, Taras Madan, syzkaller-...@googlegroups.com
> Date: Mon, 14 Mar 2022 17:41:22 +0100
> From: Aleksandr Nogikh <nog...@google.com>
>
> I would like to ask you a question, just in case you could help us.
>
> Unfortunately, our NetBSD bugs page/group seems to have been abandoned
> recently, which is very sad, because it could definitely benefit the
> security and reliability of the NetBSD kernel. I already tried to seek
> advice on what to do here a couple of months ago, but nobody replied.
>
> Do you know of any ways (e.g. mailing lists other than
> syzkaller-...@googlegroups.com) syzbot could reach NetBSD
> developers with the bugs it finds? For example, for Linux bugs, we use
> `scripts/get_maintainer.pl` to get the related mailing lists /
> developer emails and then report to those addresses. Is there
> something similar for NetBSD?

I recall a discussion internally about that a couple months ago but
I'm not sure what happened -- it looks like we ended up with a mailing
list netbsd...@NetBSD.org like coverity...@NetBSD.org, which
was maybe supposed to be subscribed to netbsd-syzkaller-bugs, but the
netbsd-syzbot mail archive doesn't have anything in it so maybe
something is wrong on our end. I'll see if I can find what's up.

> P.S. If there are also some things which could have made your recent
> active interaction with syzbot easier, please let us know.

Thanks! I think the main thing I've found so far -- in my first time
interacting with syzbot much -- is just that cross-referencing fix
commits wasn't obvious, and the feedback isn't clear.

I'm still not sure if the changes listed under `fix pending' for which
I gave a commit subject line are correctly cross-referenced; maybe
there's a delay before it updates its determination of whether it has
the commit?

Also it would be nice if I could easily go from a line like

Reported-by: syzbot+6ff1f6...@syzkaller.appspotmail.com

which we have copied & pasted into commit messages, and find my way to
the dashboard at

https://syzkaller.appspot.com/bug?id=fa28fdb1140b2b0b9b5e4b8066983ca560555d1c

with log, reproducer, &c. Right now I'm guessing and checking, or
just searching `6ff1f6ab536e2d1d4376' in Google, but it's not very
reliable. Is there a better way to do this?

Aleksandr Nogikh

Mar 15, 2022, 7:35:23 AM
to Taylor R Campbell, Dmitry Vyukov, Taras Madan, syzkaller-...@googlegroups.com, syzbot user survey
On Mon, Mar 14, 2022 at 7:00 PM Taylor R Campbell <rias...@netbsd.org> wrote:
>
> > Date: Mon, 14 Mar 2022 17:41:22 +0100
> > From: Aleksandr Nogikh <nog...@google.com>
> >
> > I would like to ask you a question, just in case you could help us.
> >
> > Unfortunately, our NetBSD bugs page/group seems to have been abandoned
> > recently, which is very sad, because it could definitely benefit the
> > security and reliability of the NetBSD kernel. I already tried to seek
> > advice on what to do here a couple of months ago, but nobody replied.
> >
> > Do you know of any ways (e.g. mailing lists other than
> > syzkaller-...@googlegroups.com) syzbot could reach NetBSD
> > developers with the bugs it finds? For example, for Linux bugs, we use
> > `scripts/get_maintainer.pl` to get the related mailing lists /
> > developer emails and then report to those addresses. Is there
> > something similar for NetBSD?
>
> I recall a discussion internally about that a couple months ago but
> I'm not sure what happened -- it looks like we ended up with a mailing
> list netbsd...@NetBSD.org like coverity...@NetBSD.org, which
> was maybe supposed to be subscribed to netbsd-syzkaller-bugs, but the
> netbsd-syzbot mail archive doesn't have anything in it so maybe
> something is wrong on our end. I'll see if I can find what's up.

Thank you!
FWIW syzbot could also directly send emails to
netbsd...@NetBSD.org (or whatever), if that could simplify things.

>
> > P.S. If there are also some things which could have made your recent
> > active interaction with syzbot easier, please let us know.
>
> Thanks! I think the main thing I've found so far -- in my first time
> interacting with syzbot much -- is just that cross-referencing fix
> commits wasn't obvious, and the feedback isn't clear.
>
> I'm still not sure if the changes listed under `fix pending' for which
> I gave a commit subject line are correctly cross-referenced; maybe
> there's a delay before it updates its determination of whether it has
> the commit?

Yes, at the moment syzbot only matches the received titles with the
actual commits when it builds a new kernel (~ every 12h, provided
there are new commits in the kernel or syzkaller repo). I agree, it's
not really intuitive for a new user.
As I understand it, it would help if syzbot validated the commit title
immediately and explained in its reply email what it has done and what
its next steps are.

>
> Also it would be nice if I could easily go from a line like
>
> Reported-by: syzbot+6ff1f6...@syzkaller.appspotmail.com
>
> which we have copied & pasted into commit messages, and find my way to
> the dashboard at
>
> https://syzkaller.appspot.com/bug?id=fa28fdb1140b2b0b9b5e4b8066983ca560555d1c
>
> with log, reproducer, &c. Right now I'm guessing and checking, or
> just searching `6ff1f6ab536e2d1d4376' in Google, but it's not very
> reliable. Is there a better way to do this?

Hmm, I think such a use case was not considered. This line was only
intended to enable syzbot to automatically mark bugs as 'fixed' as it
processes new commits. Obviously, we have all the information to match
these hashes back to the actual bugs, the question is only how to best
expose it to the user and how users would figure out that it's become
possible. We'll think about what can be done in this case.

Thank you very much for your feedback! It really helps to see the user
perspective and make the tool more convenient to work with.

If anything else comes to your mind, feel free to share :)

--
Best Regards,
Aleksandr

Taylor R Campbell

Mar 16, 2022, 9:46:17 AM
to Aleksandr Nogikh, Dmitry Vyukov, Taras Madan, syzkaller-...@googlegroups.com, syzbot user survey
I notice there are a number of syzbot crashes that work by

(a) using mknod to create a /dev/mem equivalent (dev=0x200 on x86;
dev=0 on arm; and a couple others on some other architectures),
and then
(b) opening it and writing somewhere in it.

If something crashes when you do this, that's not a bug -- /dev/mem
just maps the entire physical address space of the system, so there's
no limit to the amount of havoc you can wreak by writing to it.
That's why we limit /dev/mem to the superuser, and disable writes to
it at securelevel>=1, and disable reads from it at securelevel>=2.
It's useful for, e.g., debugging a live kernel with gdb.

I haven't catalogued all of these, but here's an example:

https://syzkaller.appspot.com/bug?id=f06a5182accb8ee1c0ca5b48631d0d37c6695779

Is there a way to teach syzbot to avoid this? I can mark them as
`invalid', but that's not exactly right -- it's not a one-off mistake
induced by previous memory corruption; it's just that this whole
avenue -- reading or writing /dev/mem -- is apt to lead to crashes
that are not bugs.

Same deal with /dev/kmem (0x201 on x86, 1 on arm, &c.), which maps the
kernel virtual address space, although I'm not sure I've seen any
crashes reported with that.

Aleksandr Nogikh

Mar 16, 2022, 1:19:10 PM
to Taylor R Campbell, Dmitry Vyukov, Taras Madan, syzkaller-...@googlegroups.com, syzbot user survey
Thank you for sharing these observations!

On Wed, Mar 16, 2022 at 2:46 PM Taylor R Campbell <rias...@netbsd.org> wrote:
>
> I notice there are a number of syzbot crashes that work by
>
> (a) using mknod to create a /dev/mem equivalent (dev=0x200 on x86;
> dev=0 on arm; and a couple others on some other architectures),
> and then
> (b) opening it and writing somewhere in it.
>
> If something crashes when you do this, that's not a bug -- /dev/mem
> just maps the entire physical address space of the system, so there's
> no limit to the amount of havoc you can wreak by writing to it.
> That's why we limit /dev/mem to the superuser, and disable writes to
> it at securelevel>=1, and disable reads from it at securelevel>=2.
> It's useful for, e.g., debugging a live kernel with gdb.
>
> I haven't catalogued all of these, but here's an example:
>
> https://syzkaller.appspot.com/bug?id=f06a5182accb8ee1c0ca5b48631d0d37c6695779

This is an interesting example. When I look at the reproducer, I don't
really understand how it worked.

mknod(&(0x7f0000000080)='./file0\x00', 0x205e, 0x200)
https://syzkaller.appspot.com/text?tag=ReproSyz&x=11ea7661900000

The device is 0x200, but the mode here is 0x205e, so it shouldn't have
made it past the check here:
https://github.com/NetBSD/src/blob/819b3848b3ef6d1a09d64c97167ba355230a972f/sys/kern/vfs_syscalls.c#L2422

Or did I miss something?

>
> Is there a way to teach syzbot to avoid this? I can mark them as
> `invalid', but that's not exactly right -- it's not a one-off mistake
> induced by previous memory corruption; it's just that this whole
> avenue -- reading or writing /dev/mem -- is apt to lead to crashes
> that are not bugs.

Yes, syzkaller actually already attempts to neutralize such
potentially dangerous calls.
https://github.com/google/syzkaller/blob/master/sys/targets/common.go#L94

For mknod/mknodat, whenever it wants to use S_IFBLK or S_IFCHR, it
uses S_IFREG instead (with two exceptions - /dev/null and loop
devices).

Taylor R Campbell

Mar 16, 2022, 3:53:37 PM
to Aleksandr Nogikh, Dmitry Vyukov, Taras Madan, syzkaller-...@googlegroups.com, syzbot user survey
> Date: Wed, 16 Mar 2022 18:18:57 +0100
> From: Aleksandr Nogikh <nog...@google.com>
>
> On Wed, Mar 16, 2022 at 2:46 PM Taylor R Campbell <rias...@netbsd.org> wrote:
> >
> > I haven't catalogued all of these, but here's an example:
> >
> > https://syzkaller.appspot.com/bug?id=f06a5182accb8ee1c0ca5b48631d0d37c6695779
>
> This is an interesting example. When I look at the reproducer, I don't
> really understand how it worked.
>
> mknod(&(0x7f0000000080)='./file0\x00', 0x205e, 0x200)
> https://syzkaller.appspot.com/text?tag=ReproSyz&x=11ea7661900000
>
> The device is 0x200, but the mode here is 0x205e, so it shouldn't have
> made it past the check here:
> https://github.com/NetBSD/src/blob/819b3848b3ef6d1a09d64c97167ba355230a972f/sys/kern/vfs_syscalls.c#L2422

I don't see why? We have (from sys/stat.h):

#define _S_IFMT 0170000 /* type of file mask */
#define _S_IFCHR 0020000 /* character special */

Or, in hexadecimal:

S_IFMT = 0xf000
S_IFCHR = 0x2000

Here, mode = 0x205e, so mode & S_IFMT = 0x2000 = S_IFCHR, so we take
the S_IFCHR case, set vattr.va_type = VCHR, and proceed on our way.

> > Is there a way to teach syzbot to avoid this? I can mark them as
> > `invalid', but that's not exactly right -- it's not a one-off mistake
> > induced by previous memory corruption; it's just that this whole
> > avenue -- reading or writing /dev/mem -- is apt to lead to crashes
> > that are not bugs.
>
> Yes, syzkaller actually already attempts to neutralize such
> potentially dangerous calls.
> https://github.com/google/syzkaller/blob/master/sys/targets/common.go#L94
>
> For mknod/mknodat, whenever it wants to use S_IFBLK or S_IFCHR, it
> uses S_IFREG instead (with two exceptions - /dev/null and loop
> devices).

I'm puzzled by this, though. It looks like this logic should prevent
mknod from ever passing mode=0x205e=020136 and dev=0x200 to mknod. Am
I missing something?

Some of the reproducers use compat_50_mknod -- the old mknod syscall
with 32-bit dev_t -- and maybe there needs to be a case for that, but
the one you quoted appears to use the current (64-bit) mknod:

https://syzkaller.appspot.com/text?tag=ReproSyz&x=11ea7661900000

Maybe this reproducer was just generated before the mknod rules were
put in?

Taylor R Campbell

Mar 17, 2022, 5:31:43 AM
to Aleksandr Nogikh, Dmitry Vyukov, Taras Madan, syzkaller-...@googlegroups.com
Had some trouble recently getting syzbot to test a patch. It looks
like a problem on syzbot's end with the syzbot build:

> Date: Wed, 16 Mar 2022 16:54:08 -0700
> From: syzbot <syzbot+fd58d1...@syzkaller.appspotmail.com>
>
> Hello,
>
> syzbot tried to test the proposed patch but the build/boot failed:
>
> syzkaller build failed: failed to run ["make" "target"]: exit status 2
> tools/syz-make/make.go:14:2: no required module provides package github.com/google/syzkaller/pkg/osutil: go.mod file not found in current directory or any parent directory; see 'go help modules'
> tools/syz-make/make.go:15:2: no required module provides package github.com/google/syzkaller/sys/targets: go.mod file not found in current directory or any parent directory; see 'go help modules'
> Makefile:39: *** syz-make failed. Stop.
>
> go env (err=<nil>)
> GO111MODULE=""
> GOARCH="amd64"
> GOBIN=""
> GOCACHE="/syzkaller/.cache/go-build"
> GOENV="/syzkaller/.config/go/env"
> GOEXE=""
> GOEXPERIMENT=""
> GOFLAGS=""
> GOHOSTARCH="amd64"
> GOHOSTOS="linux"
> GOINSECURE=""
> GOMODCACHE="/syzkaller/jobs/netbsd/gopath/pkg/mod"
> GONOPROXY=""
> GONOSUMDB=""
> GOOS="linux"
> GOPATH="/syzkaller/jobs/netbsd/gopath"
> GOPRIVATE=""
> GOPROXY="https://proxy.golang.org,direct"
> GOROOT="/usr/local/go"
> GOSUMDB="sum.golang.org"
> GOTMPDIR=""
> GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
> GOVCS=""
> GOVERSION="go1.17"
> GCCGO="gccgo"
> AR="ar"
> CC="gcc"
> CXX="g++"
> CGO_ENABLED="1"
> GOMOD="/dev/null"
> CGO_CFLAGS="-g -O2"
> CGO_CPPFLAGS=""
> CGO_CXXFLAGS="-g -O2"
> CGO_FFLAGS="-g -O2"
> CGO_LDFLAGS="-g -O2"
> PKG_CONFIG="pkg-config"
> GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3854820466=/tmp/go-build -gno-record-gcc-switches"
>
> git status (err=<nil>)
> HEAD detached at a2cdad9d4
> nothing to commit, working tree clean
>
>
>
> Tested on:
>
> commit: [unknown
> git tree: https://github.com/NetBSD/src trunk
> dashboard link: https://syzkaller.appspot.com/bug?extid=fd58d1d4dd12f8931486
> compiler:
> patch: https://syzkaller.appspot.com/x/patch.diff?x=17f943d5700000
>

Aleksandr Nogikh

Mar 17, 2022, 6:29:20 AM
to Taylor R Campbell, Dmitry Vyukov, Taras Madan, syzkaller-...@googlegroups.com, syzbot user survey
On Wed, Mar 16, 2022 at 8:53 PM Taylor R Campbell <rias...@netbsd.org> wrote:
>
> > Date: Wed, 16 Mar 2022 18:18:57 +0100
> > From: Aleksandr Nogikh <nog...@google.com>
> >
> > On Wed, Mar 16, 2022 at 2:46 PM Taylor R Campbell <rias...@netbsd.org> wrote:
> > >
> > > I haven't catalogued all of these, but here's an example:
> > >
> > > https://syzkaller.appspot.com/bug?id=f06a5182accb8ee1c0ca5b48631d0d37c6695779
> >
> > This is an interesting example. When I look at the reproducer, I don't
> > really understand how it worked.
> >
> > mknod(&(0x7f0000000080)='./file0\x00', 0x205e, 0x200)
> > https://syzkaller.appspot.com/text?tag=ReproSyz&x=11ea7661900000
> >
> > The device is 0x200, but the mode here is 0x205e, so it shouldn't have
> > made it past the check here:
> > https://github.com/NetBSD/src/blob/819b3848b3ef6d1a09d64c97167ba355230a972f/sys/kern/vfs_syscalls.c#L2422
>
> I don't see why? We have (from sys/stat.h):
>
> #define _S_IFMT 0170000 /* type of file mask */
> #define _S_IFCHR 0020000 /* character special */
>
> Or, in hexadecimal:
>
> S_IFMT = 0xf000
> S_IFCHR = 0x2000
>
> Here, mode = 0x205e, so mode & S_IFMT = 0x2000 = S_IFCHR, so we take
> the S_IFCHR case, set vattr.va_type = VCHR, and proceed on our way.
>

Argh, I didn't notice that these constants were in octal. Thank you!
Then these reproducers indeed make perfect sense.

> > > Is there a way to teach syzbot to avoid this? I can mark them as
> > > `invalid', but that's not exactly right -- it's not a one-off mistake
> > > induced by previous memory corruption; it's just that this whole
> > > avenue -- reading or writing /dev/mem -- is apt to lead to crashes
> > > that are not bugs.
> >
> > Yes, syzkaller actually already attempts to neutralize such
> > potentially dangerous calls.
> > https://github.com/google/syzkaller/blob/master/sys/targets/common.go#L94
> >
> > For mknod/mknodat, whenever it wants to use S_IFBLK or S_IFCHR, it
> > uses S_IFREG instead (with two exceptions - /dev/null and loop
> > devices).
>
> I'm puzzled by this, though. It looks like this logic should prevent
> mknod from ever passing mode=0x205e=020136 and dev=0x200 to mknod. Am
> I missing something?

Yes, it's actually very strange that it still executes such calls.
When I execute only that piece of code, it works as expected (mode
becomes 0x805e).
So it looks like there's some bug elsewhere in the syzkaller code.
I'll try to take a closer look at this.

>
> Some of the reproducers use compat_50_mknod -- the old mknod syscall
> with 32-bit dev_t -- and maybe there needs to be a case for that, but
> the one you quoted appears to use the current (64-bit) mknod:
>
> https://syzkaller.appspot.com/text?tag=ReproSyz&x=11ea7661900000
>
> Maybe this reproducer was just generated before the mknod rules were
> put in?

The code was added in 2018, so the reproducer can't simply predate it; unfortunately, the problem is still present.
I also see in https://syzkaller.appspot.com/bug?id=3681606a7d82f90dcfcbfa20eafd448596ae9353
that logs from Mar 10th still mention
"mknod(&(0x7f0000000080)='./file0\x00', 0x205e, 0x200)".

Aleksandr Nogikh

Mar 17, 2022, 8:35:48 AM
to Taylor R Campbell, Dmitry Vyukov, Taras Madan, syzkaller-...@googlegroups.com
On Thu, Mar 17, 2022 at 10:31 AM Taylor R Campbell <rias...@netbsd.org> wrote:
>
> Had some trouble recently getting syzbot to test a patch. It looks
> like a problem on syzbot's end with the syzbot build:
>
> > Date: Wed, 16 Mar 2022 16:54:08 -0700
> > From: syzbot <syzbot+fd58d1...@syzkaller.appspotmail.com>
> >
> > Hello,
> >
> > syzbot tried to test the proposed patch but the build/boot failed:
> >
> > syzkaller build failed: failed to run ["make" "target"]: exit status 2
> > tools/syz-make/make.go:14:2: no required module provides package github.com/google/syzkaller/pkg/osutil: go.mod file not found in current directory or any parent directory; see 'go help modules'
> > tools/syz-make/make.go:15:2: no required module provides package github.com/google/syzkaller/sys/targets: go.mod file not found in current directory or any parent directory; see 'go help modules'
> > Makefile:39: *** syz-make failed. Stop.

Thanks for reporting! I sent a fix
(https://github.com/google/syzkaller/pull/3033). After it's merged and
deployed, patch testing for old bugs should work fine.

Taylor R Campbell

Mar 19, 2022, 8:10:17 AM
to Aleksandr Nogikh, Dmitry Vyukov, Taras Madan, syzkaller-...@googlegroups.com
> Date: Thu, 17 Mar 2022 13:35:36 +0100
> From: Aleksandr Nogikh <nog...@google.com>
>
> Thanks for reporting! I sent a fix
> (https://github.com/google/syzkaller/pull/3033). After it's merged and
> deployed, patch testing for old bugs should work fine.

Thanks, seems better now!

Taylor R Campbell

Mar 30, 2022, 8:06:30 AM
to Aleksandr Nogikh, Dmitry Vyukov, Taras Madan, syzkaller-...@googlegroups.com
Do you know what's up with the latest syzkaller NetBSD runs, as of
about a week ago? It seems to have gotten stuck on something but I'm
not sure what. Is the VM hanging at boot? Is there a way to interact
with it at the console, e.g. to hit C-A-ESC or send a serial break to
enter ddb and run `bt' or `ps' to see what it's stuck on?

The log suggests that the file system is not clean but it is booting
without fsck, which sounds bad, but maybe that's how it always runs,
so maybe it's only by accident that it hasn't caused problems before?

https://syzkaller.appspot.com/text?tag=CrashLog&x=16fc8b73700000

Aleksandr Nogikh

Mar 30, 2022, 10:35:48 AM
to Taylor R Campbell, Dmitry Vyukov, Taras Madan, syzkaller-...@googlegroups.com
Hi,

On Wed, Mar 30, 2022 at 2:06 PM Taylor R Campbell <rias...@netbsd.org> wrote:
>
> Do you know what's up with the latest syzkaller NetBSD runs, as of
> about a week ago? It seems to have gotten stuck on something but I'm
> not sure what. Is the VM hanging at boot?

Yes, for some reason NetBSD VMs do not boot normally now.

> Is there a way to interact
> with it at the console, e.g. to hit C-A-ESC or send a serial break to
> enter ddb and run `bt' or `ps' to see what it's stuck on?

Via syzbot, unfortunately no. It just captures and saves the serial
port output (+ ssh output, when it has managed to establish an ssh
connection).
For OpenBSD, it "interviews" ddb with a set of predefined commands
(https://github.com/google/syzkaller/blob/42718dd659525414aa0bf2794688ac94a32f7764/vm/vmimpl/openbsd.go#L15),
which I think can also be done for NetBSD. Though in its current
implementation it apparently relies on ddb being automatically started
after a panic, so if there's no panic (like in this case), it wouldn't
help.

>
> The log suggests that the file system is not clean but it is booting
> without fsck, which sounds bad, but maybe that's how it always runs,
> so maybe it's only by accident that it hasn't caused problems before?

We keep a clean image separately and recreate fresh VMs based on that
image each time, so it's quite unlikely that the image is corrupted.

I tried to export the GCE image to qcow2 and reproduce it with qemu;
however, it boots absolutely fine then. On GCE, the problem can be
reproduced in 100% of cases.
I connected to the serial console of a running VM:
1. The kernel itself is not hanging - if I press Ctrl+C, I get
`^C/etc/rc.d/network terminated with signal 2`. Though then it hangs
on "Starting dhcpcd." and here Ctrl+C doesn't help anymore.
2. I didn't manage to enter the ddb mode. After typing ~B (which does
BREAK, as was written in
https://cloud.google.com/compute/docs/troubleshooting/troubleshooting-using-serial-console#sending_a_serial_break),
nothing happens. Maybe ddb is just not enabled and syzbot's NetBSD
config needs to be adjusted somehow?

Here's the exported qcow2, in case you'd like to check something yourself:
https://storage.googleapis.com/artifacts.syzkaller.appspot.com/shared-files/30-03-2022%20netbsd%20image/netbd-disk.qcow2
I was running it via "qemu-system-x86_64 -smp 2 -m 4G -enable-kvm
-hda "./netbd-disk.qcow2" -snapshot -nographic -net nic -net user"

>
> https://syzkaller.appspot.com/text?tag=CrashLog&x=16fc8b73700000

Taylor R Campbell

Apr 11, 2022, 8:01:28 PM
to Aleksandr Nogikh, Dmitry Vyukov, Taras Madan, syzkaller-...@googlegroups.com
> Date: Wed, 30 Mar 2022 16:35:36 +0200
> From: Aleksandr Nogikh <nog...@google.com>
>
> I tried to export the GCE image to qcow2 and reproduce it with qemu;
> however, it boots absolutely fine then. On GCE, the problem can be
> reproduced in 100% of cases.
> I connected to the serial console of a running VM:
> 1. The kernel itself is not hanging - if I press Ctrl+C, I get
> `^C/etc/rc.d/network terminated with signal 2`. Though then it hangs
> on "Starting dhcpcd." and here Ctrl+C doesn't help anymore.

I'm guessing it's something different about qemu and GCE networking,
since userland is obviously still functioning but it's stuck on
configuring the network. I have a couple candidate changes to back
out, but the `report' of these hangs at boot doesn't have a reproducer
so syzbot refuses to test a patch. How can I try asking syzbot to run
them?

(syzbot1.patch backs out one change; syzbot2.patch backs out that
change and another one. They can only be tested separately, not on
top of each other.)

> 2. I didn't manage to enter the ddb mode. After typing ~B (which does
> BREAK, as was written in
> https://cloud.google.com/compute/docs/troubleshooting/troubleshooting-using-serial-console#sending_a_serial_break),
> nothing happens. Maybe ddb is just not enabled and syzbot's NetBSD
> config needs to be adjusted somehow?

As I recall, ddb is enabled but the magic break logic might be broken.
If you build a kernel with `options CNMAGIC="\"+++++\""' you could
enter ddb by typing `+++++' instead of sending a break.

Of course, syzbot would immediately reboot, because it has `options
DDB_COMMANDONENTER="show registers;bt;ps;...stuff...;reboot"'.
syzbot1.patch
syzbot2.patch

Aleksandr Nogikh

Apr 12, 2022, 1:52:53 PM
to Taylor R Campbell, Dmitry Vyukov, Taras Madan, syzkaller-...@googlegroups.com
(+others)

On Tue, Apr 12, 2022 at 7:52 PM Aleksandr Nogikh <nog...@google.com> wrote:
>
> On Tue, Apr 12, 2022 at 2:01 AM Taylor R Campbell <rias...@netbsd.org> wrote:
> >
> > > Date: Wed, 30 Mar 2022 16:35:36 +0200
> > > From: Aleksandr Nogikh <nog...@google.com>
> > >
> > > I tried to export the GCE image to qcow2 and reproduce it with qemu;
> > > however, it boots absolutely fine then. On GCE, the problem can be
> > > reproduced in 100% of cases.
> > > I connected to the serial console of a running VM:
> > > 1. The kernel itself is not hanging - if I press Ctrl+C, I get
> > > `^C/etc/rc.d/network terminated with signal 2`. Though then it hangs
> > > on "Starting dhcpcd." and here Ctrl+C doesn't help anymore.
> >
> > I'm guessing it's something different about qemu and GCE networking,
> > since userland is obviously still functioning but it's stuck on
> > configuring the network. I have a couple candidate changes to back
> > out, but the `report' of these hangs at boot doesn't have a reproducer
> > so syzbot refuses to test a patch. How can I try asking syzbot to run
> > them?
> >
> > (syzbot1.patch backs out one change; syzbot2.patch backs out that
> > change and another one. They can only be tested separately, not on
> > top of each other.)
>
> I've pushed an update to syzbot, now it should be able to accept
> patches for boot errors testing (unless I missed something or messed
> it all up).
>
> >
> > > 2. I didn't manage to enter the ddb mode. After typing ~B (which does
> > > BREAK, as was written in
> > > https://cloud.google.com/compute/docs/troubleshooting/troubleshooting-using-serial-console#sending_a_serial_break),
> > > nothing happens. Maybe ddb is just not enabled and syzbot's NetBSD
> > > config needs to be adjusted somehow?
> >
> > As I recall, ddb is enabled but the magic break logic might be broken.
> > If you build a kernel with `options CNMAGIC="\"+++++\""' you could
> > enter ddb by typing `+++++' instead of sending a break.
> >
> > Of course, syzbot would immediately reboot, because it has `options
> > DDB_COMMANDONENTER="show registers;bt;ps;...stuff...;reboot"'.
>
> That's interesting, I can try to build and run it this way and give
> you the output if the idea with patches doesn't work out well.

Taylor R Campbell

Apr 14, 2022, 9:45:00 AM
to Aleksandr Nogikh, Dmitry Vyukov, Taras Madan, syzkaller-...@googlegroups.com
> Date: Tue, 12 Apr 2022 19:52:41 +0200
> From: Aleksandr Nogikh <nog...@google.com>
>
So I found the offending patch through bisection, and I'm about to
revert it. But I don't yet understand why it's a problem. Could you
try the approach above to get some diagnostics out, since it's
apparently hard to reproduce and doesn't happen on qemu even with
virtio-net and virtio-rng?

If it's not too much trouble, it would also be nice to get stack
traces from the softint threads by removing `reboot' or `sync' from
the DDB_COMMANDONENTER string, and querying them like this:

db> ps
...
0 66 1 7 200 ffff87f0f1242720 softser/7
0 65 1 7 200 ffff87f0f12422e0 softclk/7
0 64 1 7 200 ffff87f0f121bb40 softbio/7
0 63 1 7 200 ffff87f0f121b700 softnet/7
...
db> bt/a ffff87f0f1242720

At least, query the ones that have `>' or something else written on the
line after `softser/7' or similar, usually `tstile'; the others are idle
and their stack traces will be empty. (I don't think we have a ddb
command to print every thread's stack trace -- maybe we should! But I
realize this is a little involved, so I'll try to find another way to
diagnose it if this is too much trouble.)

Taylor R Campbell

Apr 15, 2022, 7:19:17 PM
to Aleksandr Nogikh, Dmitry Vyukov, Taras Madan, syzkaller-...@googlegroups.com
I backed out the offending change that had been holding up syzkaller
for a couple of weeks, but one of the builders is still failing,
apparently because `clang++' is missing from the PATH:

https://syzkaller.appspot.com/bug?id=4ff03a64fbabf97e33eaafa7ad754c51af7a72bc

But there's no log. Did something change in the build environment?
Do you know where this message came from?

Aleksandr Nogikh

Apr 17, 2022, 1:02:18 PM
to Taylor R Campbell, Dmitry Vyukov, Taras Madan, syzkaller-...@googlegroups.com
On Sat, Apr 16, 2022 at 1:19 AM Taylor R Campbell <rias...@netbsd.org> wrote:
>
> I backed out the offending change that had been holding up syzkaller
> for a couple of weeks, but one of the builders is still failing,
> apparently because `clang++' is missing from the PATH:

Thanks! Glad to hear that you bisected and resolved the issue.

>
> https://syzkaller.appspot.com/bug?id=4ff03a64fbabf97e33eaafa7ad754c51af7a72bc
>
> But there's no log. Did something change in the build environment?
> Do you know where this message came from?

Yes, there were changes and I see why it could have happened. I'll
update our syzbot images and redeploy them after the holidays (on
Tue).

In the previous email you said "But I don't yet understand why it's a
problem.". Is this still relevant? If yes, I'll try to do the
debugging steps you suggested and let you know the result.

Taylor R Campbell

Apr 17, 2022, 2:20:08 PM
to Aleksandr Nogikh, Dmitry Vyukov, Taras Madan, syzkaller-...@googlegroups.com
> Date: Sun, 17 Apr 2022 19:02:06 +0200
> From: Aleksandr Nogikh <nog...@google.com>
>
> On Sat, Apr 16, 2022 at 1:19 AM Taylor R Campbell <rias...@netbsd.org> wrote:
> > https://syzkaller.appspot.com/bug?id=4ff03a64fbabf97e33eaafa7ad754c51af7a72bc
> >
> > But there's no log. Did something change in the build environment?
> > Do you know where this message came from?
>
> Yes, there were changes and I see why it could have happened. I'll
> update our syzbot images and redeploy them after the holidays (on
> Tue).

Great, thanks! No particular hurry, have a nice Passover or Easter or
whatever you're up to!

> In the previous email you said "But I don't yet understand why it's a
> problem.". Is this still relevant? If yes, I'll try to do the
> debugging steps you suggested and let you know the result.

Yes, it is still relevant -- I would like to make the change I had to
revert, because the goal of the change is definitely worthwhile (get
entropy early on from the host more reliably).

Aleksandr Nogikh

Apr 19, 2022, 1:41:59 PM
to Taylor R Campbell, Dmitry Vyukov, Taras Madan, syzkaller-...@googlegroups.com
So, I uploaded the new images to syzbot, now the kmsan instance works
normally :)
The image I used -
https://storage.googleapis.com/artifacts.syzkaller.appspot.com/shared-files/19-04-2022-netbsd/netbsd.debug.tar.gz
(ddb activates as you said - with +++++).
The output of ddb - https://pastebin.com/ZVKAx57k

Just in case - it seems that it's possible to set up a small GCE
instance for free (https://cloud.google.com/free). If so, you can also
try to upload the image above and create a VM with it. Though, if
you've never worked with Google Cloud, it can be rather confusing at
first.

Hope this helps!