Help with syzbot reproducers for bcachefs


Kent Overstreet

Oct 25, 2024, 9:33:04 PM
to syzk...@googlegroups.com
Syzbot has been turning up some juicy stuff lately...

I'm trying to get the local reproducers working, but I'm clearly missing
something. Just running the C reproducer isn't enough, it seems to need
the disk images too, but I'm not seeing documentation or the scripts
that do that prep.

What would be really slick is if I could get a ktest test written that
just takes as input a specific syzbot bug to reproduce and does all the
rest - in ktest, tests declaratively specify all their dependencies
(kernel config, extra qemu config, scratch disks) so in theory we could
make something that can reproduce any syzbot bug locally with a single
command.

Aleksandr Nogikh

Oct 28, 2024, 2:03:36 PM
to Kent Overstreet, syzk...@googlegroups.com
Hi Kent,

There's a guide on how to reproduce a bug using the assets shared by syzbot:
https://github.com/google/syzkaller/blob/master/docs/syzbot_assets.md

If you need to use a locally built kernel instead, it will only differ
in that you would take the `.config` file shared in the bug report and
build the kernel image with it.

> Just running the C reproducer isn't enough, it seems to need the disk images too

Do you mean the images that are mounted by the reproducer or the disk
image for the VM itself? The former are already included in the C/syz
reproducers and for the latter we use Buildroot-based images:
https://github.com/google/syzkaller/blob/master/tools/create-buildroot-image.sh

Here's the latest one:
https://storage.googleapis.com/syzkaller/images/buildroot_amd64_2024.09.gz

> What would be really slick is if I could get a ktest test

That sounds interesting!
I've grepped the torvalds tree for "ktest", but did not find any real
tests, only a few samples in the tools/testing/ktest folder. Are real
ktests scattered somewhere across the maintainer trees?

In any case, if it makes life easier, we can try to generate ktest
files on the syzbot side.

--
Aleksandr

Kent Overstreet

Oct 28, 2024, 10:36:08 PM
to Aleksandr Nogikh, syzk...@googlegroups.com
On Mon, Oct 28, 2024 at 07:03:18PM +0100, Aleksandr Nogikh wrote:
> Hi Kent,
>
> There's a guide on how to reproduce a bug using the assets shared by syzbot:
> https://github.com/google/syzkaller/blob/master/docs/syzbot_assets.md
>
> If you need to use a locally built kernel instead, it will only differ
> in that you would take the `.config` file shared in the bug report and
> build the kernel image with it.
>
> > Just running the C reproducer isn't enough, it seems to need the disk images too
>
> Do you mean the images that are mounted by the reproducer or the disk
> image for the VM itself? The former are already included in the C/syz
> reproducers and for the latter we use Buildroot-based images:
> https://github.com/google/syzkaller/blob/master/tools/create-buildroot-image.sh

I'm still reading your docs and grokking syzbot, but I believe I just
want the image mounted by the reproducer - ktest has its own root image,
and I'd just add any syzbot specific tools to that.

What I'm mainly looking for is a standard way to set up the test
environment - presumably you have that as scripts in your VM root image,
and I'm wondering if it's subsystem dependent? The docs don't say too
much about the image mounted by the reproducer.

> Here's the latest one:
> https://storage.googleapis.com/syzkaller/images/buildroot_amd64_2024.09.gz
>
> > What would be really slick is if I could get a ktest test
>
> That sounds interesting!
> I've grepped the torvalds tree for "ktest", but did not find any real
> tests, only a few samples in the tools/testing/ktest folder. Are real
> ktests scattered somewhere across the maintainer trees?

ktest is my own thing:
https://evilpiepirate.org/git/ktest.git/

Try it out, it's slick - it's a full CI too, with a dashboard:
https://evilpiepirate.org/~testdashboard/ci?user=kmo&branch=bcachefs-testing

> In any case, if it makes life easier, we can try to generate ktest
> files on the syzbot side.

What I have in mind is a syzbot.ktest in the ktest repo; you'd pass it a
syzkaller ID as a command-line argument, and syzbot.ktest would fetch
the C reproducer and disk image (if applicable).

But I'm not seeing anything to tie the C reproducer to the disk image,
so maybe you could supply a metadata file that describes that?

The other thing to consider is the kernel .config. It should work to
just use your .config and overlay any ktest-specific kconfig
dependencies (we probably configure qemu a bit differently than you),
but I wonder if the ktest code for declaratively specifying a kernel
config would be of interest to you:

https://evilpiepirate.org/git/ktest.git/tree/tests/kconfig.sh
That's the base, then tests can do their own require-kernel-config or
require-kernel-config-soft to pull in e.g. CONFIG_BCACHEFS_FS.

Aleksandr Nogikh

Oct 30, 2024, 2:54:55 PM
to Kent Overstreet, syzk...@googlegroups.com
Hi Kent,

On Tue, Oct 29, 2024 at 3:36 AM Kent Overstreet
<kent.ov...@linux.dev> wrote:
>
> On Mon, Oct 28, 2024 at 07:03:18PM +0100, Aleksandr Nogikh wrote:
> > Hi Kent,
> >
> > There's a guide on how to reproduce a bug using the assets shared by syzbot:
> > https://github.com/google/syzkaller/blob/master/docs/syzbot_assets.md
> >
> > If you need to use a locally built kernel instead, it will only differ
> > in that you would take the `.config` file shared in the bug report and
> > build the kernel image with it.
> >
> > > Just running the C reproducer isn't enough, it seems to need the disk images too
> >
> > Do you mean the images that are mounted by the reproducer or the disk
> > image for the VM itself? The former are already included in the C/syz
> > reproducers and for the latter we use Buildroot-based images:
> > https://github.com/google/syzkaller/blob/master/tools/create-buildroot-image.sh
>
> I'm still reading your docs and grokking syzbot, but I believe I just
> want the image mounted by the reproducer - ktest has its own root image,
> and I'd just add any syzbot specific tools to that.
>
> What I'm mainly looking for is a standard way to set up the test
> environment - presumably you have that as scripts in your VM root image,
> and I'm wondering if it's subsystem dependent? The docs don't say too
> much about the image mounted by the reproducer.

It's all subsystem independent. When we handle a #syz test command, we
build the kernel at the required revision, create and boot a VM with
it, upload the compiled program (for C repros) and syzkaller binaries
(for syz repro), then we run the program and monitor the console log
for the signs of crash reports. We only substitute the bzImage into the
VM root image; the rest is done externally.

C reproducers are intended to be fully self-contained -- if you look
closely at the source code, you will notice that they enclose the raw
disk images that they will mount (though in a compressed format to
keep the total size sane). You just need to compile them and run on an
instrumented kernel. Same for Syz reproducers, but they are more
tricky as they also require a specific syzkaller revision to be
checked out and built.

We extract the mounted images only to simplify debugging. Because of
compression, it's not very straightforward to get the raw binary image
from the C file, so we do this automatically just in case someone
needs those blobs.
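To illustrate that layout (the concept only; syzkaller's actual C reproducers do this in C with their own framing and compression scheme, so treat zlib here as a stand-in): a disk image can be embedded in compressed form and inflated at runtime before mounting.

```python
import zlib

# A stand-in "disk image" (real ones are raw filesystem images; the
# superblock marker below is purely illustrative).
image = b"\x00" * 4096 + b"bcachefs-superblock-goes-here" + b"\x00" * 4096

embedded = zlib.compress(image)       # what gets baked into the reproducer source
restored = zlib.decompress(embedded)  # what the reproducer writes out before mounting

print(len(embedded) < len(image), restored == image)
```

This is also why, as noted above, recovering the raw image from the C file by hand is awkward: you have to find the embedded blob and undo the compression yourself.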

>
> > Here's the latest one:
> > https://storage.googleapis.com/syzkaller/images/buildroot_amd64_2024.09.gz
> >
> > > What would be really slick is if I could get a ktest test
> >
> > That sounds interesting!
> > I've grepped the torvalds tree for "ktest", but did not find any real
> > tests, only a few samples in the tools/testing/ktest folder. Are real
> > ktests scattered somewhere across the maintainer trees?
>
> ktest is my own thing:
> https://evilpiepirate.org/git/ktest.git/
>
> Try it out, it's slick - it's a full CI too, with a dashboard:
> https://evilpiepirate.org/~testdashboard/ci?user=kmo&branch=bcachefs-testing

Ah, interesting!
Thanks for sharing the links.

>
> > In any case, if it makes life easier, we can try to generate ktest
> > files on the syzbot side.
>
> What I have in mind is a syzbot.ktest in the ktest repo, and then you'd
> pass a commandline argument to that, which would be a syzkaller ID, then
> syzbot.ktest would fetch the c reproducer and disk image (if
> applicable).

By ID you mean the bug ID, right?

We have a JSON API on our syzbot dashboard. For example, if you want
some machine-readable info for
https://syzkaller.appspot.com/bug?extid=8761afeaaf2249358b14, you can
query https://syzkaller.appspot.com/bug?extid=8761afeaaf2249358b14&json=1.
It does not include links to the downloadable assets, but that can be
quickly fixed if needed.

Note that we have request throttling (max 15 requests per 15 seconds
from the same IP). Though if you just want to use the ktest for
debugging purposes, that should not be a problem at all.
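For scripting against this, the URL shape and the rate limit above fit in a small client helper. A Python sketch: the endpoint shape is taken from this thread, while the `Throttle` class is purely a client-side illustration (syzbot enforces the limit server-side).

```python
import time
import urllib.parse

BASE = "https://syzkaller.appspot.com"

def bug_json_url(extid: str) -> str:
    """Build the machine-readable dashboard URL for a syzbot bug ID."""
    return f"{BASE}/bug?{urllib.parse.urlencode({'extid': extid, 'json': 1})}"

class Throttle:
    """Client-side pacing for syzbot's 15-requests-per-15-seconds cap."""
    def __init__(self, max_requests: int = 15, window: float = 15.0):
        self.max_requests, self.window, self.sent = max_requests, window, []

    def wait(self) -> None:
        # Drop timestamps outside the window; sleep if the window is still full.
        now = time.monotonic()
        self.sent = [t for t in self.sent if now - t < self.window]
        if len(self.sent) >= self.max_requests:
            time.sleep(self.window - (now - self.sent[0]))
        self.sent.append(time.monotonic())

print(bug_json_url("8761afeaaf2249358b14"))
# → https://syzkaller.appspot.com/bug?extid=8761afeaaf2249358b14&json=1
```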

>
> But I'm not seeing anything to tie the C reproducer to the disk image,
> so maybe you could supply a metadata file that describes that?

As mentioned above, there's no need to do anything to tie them. They
are already tied as much as possible :)

>
> The other thing to consider is the kernel .config. It should work to
> just use your .config and just overlay any ktest-specific kconfig
> dependencies (we probably configure qemu a bit differently than you),
> but I wonder if the ktest code for declaratively specifying a kernel
> config would be of interest to you:
>
> https://evilpiepirate.org/git/ktest.git/tree/tests/kconfig.sh
> That's the base, then tests can do their own require-kernel-config or
> require-kernel-config-soft to pull in e.g. CONFIG_BCACHEFS_FS.

On our side, we have our custom tool for mass-generating .config files
for all of our different syzkaller instances:
https://github.com/google/syzkaller/tree/master/dashboard/config/linux
https://github.com/google/syzkaller/blob/master/dashboard/config/linux/main.yml

Since we fuzz all kernel subsystems at the same time, our .config
files are very heavy and take a lot of time to build even on a
powerful machine. If you just want to debug bcachefs findings, you
might be better off enabling some limited set of options. Here's what
might be related:

https://github.com/google/syzkaller/blob/master/dashboard/config/linux/bits/filesystems.yml#L198
https://github.com/google/syzkaller/blob/master/dashboard/config/linux/bits/kasan.yml
https://github.com/google/syzkaller/blob/master/dashboard/config/linux/bits/lockdep.yml

--
Aleksandr

Kent Overstreet

Oct 30, 2024, 11:00:50 PM
to Aleksandr Nogikh, syzk...@googlegroups.com
On Wed, Oct 30, 2024 at 07:54:37PM +0100, Aleksandr Nogikh wrote:
> It's all subsystem independent. When we handle a #syz test command, we
> build the kernel at the required revision, create and boot a VM with
> it, upload the compiled program (for C repros) and syzkaller binaries
> (for syz repro), then we run the program and monitor the console log
> for the signs of crash reports. We only substitute the bzImage to the
> VM root image, the rest is done externally.
>
> C reproducers are intended to be fully self-contained -- if you look
> closely at the source code, you will notice that they enclose the raw
> disk images that they will mount (though in a compressed format to
> keep the total size sane). You just need to compile them and run on an
> instrumented kernel. Same for Syz reproducers, but they are more
> tricky as they also require a specific syzkaller revision to be
> checked out and built.

Ah, ok.

So my 'syzbot-repro.ktest' worked once (and popped one of the bugs I
need) - but not reliably; mostly it just loops spewing 'executing
program' and does nothing else. Is this something you've seen?

> We extract the mounted images only to simplify debugging. Because of
> compression, it's not very straightforward to get the raw binary image
> from the C file, so we do this automatically just in case someone
> needs those blobs.

Gotcha

> We have a JSON API on our syzbot dashboard. For example, if you want
> some machine-readable info for
> https://syzkaller.appspot.com/bug?extid=8761afeaaf2249358b14, you can
> query https://syzkaller.appspot.com/bug?extid=8761afeaaf2249358b14&json=1.
> It does not include links to the downloadable assets, but that can be
> quickly fixed if needed.
>
> Note that we have request throttling (max 15 requests per 15 seconds
> from the same IP). Though if you just want to use the ktest for
> debugging purposes, that should not be a problem at all.

Cool :)

I'll let you know when I have a nice polished syzbot-repro.ktest; maybe
we can add a link to it?

Kent Overstreet

Oct 31, 2024, 1:54:30 AM
to Aleksandr Nogikh, syzk...@googlegroups.com
On Wed, Oct 30, 2024 at 07:54:37PM +0100, Aleksandr Nogikh wrote:
> We have a JSON API on our syzbot dashboard. For example, if you want
> some machine-readable info for
> https://syzkaller.appspot.com/bug?extid=8761afeaaf2249358b14, you can
> query https://syzkaller.appspot.com/bug?extid=8761afeaaf2249358b14&json=1.
> It does not include links to the downloadable assets, but that can be
> quickly fixed if needed.

It looks like I'll need a way of getting the kernel config that goes
with a given reproducer (shame they don't have the same hex ID, I wonder
if that could be changed?).

Aleksandr Nogikh

Oct 31, 2024, 12:42:40 PM
to Kent Overstreet, syzk...@googlegroups.com
(combined both emails)

On Thu, Oct 31, 2024 at 4:00 AM Kent Overstreet
<kent.ov...@linux.dev> wrote:
>
> On Wed, Oct 30, 2024 at 07:54:37PM +0100, Aleksandr Nogikh wrote:
> > It's all subsystem independent. When we handle a #syz test command, we
> > build the kernel at the required revision, create and boot a VM with
> > it, upload the compiled program (for C repros) and syzkaller binaries
> > (for syz repro), then we run the program and monitor the console log
> > for the signs of crash reports. We only substitute the bzImage to the
> > VM root image, the rest is done externally.
> >
> > C reproducers are intended to be fully self-contained -- if you look
> > closely at the source code, you will notice that they enclose the raw
> > disk images that they will mount (though in a compressed format to
> > keep the total size sane). You just need to compile them and run on an
> > instrumented kernel. Same for Syz reproducers, but they are more
> > tricky as they also require a specific syzkaller revision to be
> > checked out and built.
>
> Ah, ok.
>
> So my 'syzbot-repro.ktest' worked once (and popped one of the bugs I
> need) - but not reliably, mostly it just loops spewing 'executing
> program', and does nothing else. Is this something you've seen?

Yes, that may happen. Sometimes it takes several minutes of looping to
actually crash the kernel, sometimes the reproducer only works in the
same environment where it was generated. E.g. if the reproducer was
found on a GCE-based instance, it may not be as reliable when run on
QEMU.

.config differences could be another reason -- were you using the one
shared by syzbot?

>
> > We extract the mounted images only to simplify debugging. Because of
> > compression, it's not very straightforward to get the raw binary image
> > from the C file, so we do this automatically just in case someone
> > needs those blobs.
>
> Gotcha
>
> > We have a JSON API on our syzbot dashboard. For example, if you want
> > some machine-readable info for
> > https://syzkaller.appspot.com/bug?extid=8761afeaaf2249358b14, you can
> > query https://syzkaller.appspot.com/bug?extid=8761afeaaf2249358b14&json=1.
> > It does not include links to the downloadable assets, but that can be
> > quickly fixed if needed.
> >
> > Note that we have request throttling (max 15 requests per 15 seconds
> > from the same IP). Though if you just want to use the ktest for
> > debugging purposes, that should not be a problem at all.
>
> Cool :)
>
> I'll let you know when I have a nice polished syzbot-repro.ktest; maybe
> we can add a link to it?

A link from where? There are quite a number of options :)

On Thu, Oct 31, 2024 at 6:54 AM Kent Overstreet
<kent.ov...@linux.dev> wrote:
>
> On Wed, Oct 30, 2024 at 07:54:37PM +0100, Aleksandr Nogikh wrote:
> > We have a JSON API on our syzbot dashboard. For example, if you want
> > some machine-readable info for
> > https://syzkaller.appspot.com/bug?extid=8761afeaaf2249358b14, you can
> > query https://syzkaller.appspot.com/bug?extid=8761afeaaf2249358b14&json=1.
> > It does not include links to the downloadable assets, but that can be
> > quickly fixed if needed.
>
> It looks like I'll need a way of getting the kernel config that goes
> with a given reproducer (shame they don't have the same hex ID, I wonder
> if that could be changed?).

That would be problematic to change, unfortunately. In our DB, these
IDs are not related to each other: one is for bugs, the other one is
for texts.

The crashes on our web dashboard (and in the &json=1 response as well)
are ordered according to some concrete rules. When we report a bug to
the mailing lists, we always pick the one with the highest score (=
the first one on the returned list). So unless syzbot has found a
better crash in the meanwhile, the first record is the one you saw in
the email.

If it's important to always fetch the exact one, we could add some
&reported=1 parameter to only return those crashes/reproducers that
were actually reported to LKML.

--
Aleksandr

Kent Overstreet

Oct 31, 2024, 6:07:41 PM
to Aleksandr Nogikh, syzk...@googlegroups.com
Yeah, I tracked it down to a .config issue. An odd one: my workdir where
it was working wasn't using the .config from syzbot, just the normal
bcachefs one (with something else applied, apparently), which had me confused :)

> > Cool :)
> >
> > I'll let you know when I have a nice polished syzbot-repro.ktest; maybe
> > we can add a link to it?
>
> A link from where? There are quite a number of options :)

I'll be adding it to the ktest repo - then yeah, so many options :)

I'd really love to have the closest thing possible to "one click to
reproduce this locally" - what if we added something to the syzbot
dashboard that copies the ktest command to the clipboard?

> <kent.ov...@linux.dev> wrote:
> >
> > On Wed, Oct 30, 2024 at 07:54:37PM +0100, Aleksandr Nogikh wrote:
> > > We have a JSON API on our syzbot dashboard. For example, if you want
> > > some machine-readable info for
> > > https://syzkaller.appspot.com/bug?extid=8761afeaaf2249358b14, you can
> > > query https://syzkaller.appspot.com/bug?extid=8761afeaaf2249358b14&json=1.
> > > It does not include links to the downloadable assets, but that can be
> > > quickly fixed if needed.
> >
> > It looks like I'll need a way of getting the kernel config that goes
> > with a given reproducer (shame they don't have the same hex ID, I wonder
> > if that could be changed?).
>
> That would be problematic to change, unfortunately. In our DB, these
> IDs are not related to each other: one is for bugs, the other one is
> for texts.
>
> The crashes on our web dashboard (and in the &json=1 response as well)
> are ordered according to some concrete rules. When we report a bug to
> the mailing lists, we always pick the one with the highest score (=
> the first one on the returned list). So unless syzbot has found a
> better crash in the meanwhile, the first record is the one you saw in
> the email.
>
> If it's important to always fetch the exact one, we could add some
> &reported=1 parameter to only return those crashes/reproducers that
> were actually reported to LKML.

Hmm, how does it work when there are multiple crashes and reproducers?
Are they the same, or will syzbot keep coming up with new reproducers?

If the latter, then I think we'd want to make it easy to try all of
them; some of them might trigger the crash more reliably than others,
and then for the sake of determinism I think we'd want to provide a link
that fetches .config that goes with that C reproducer as well.

(highest score - you're already determining which one reproduces it the
most reliably? that's nifty)

It seems like you already track which ones go together for the
dashboard, would providing an API be workable?

Maybe it could be something like "fetch the nth reproducer/.config for a
given bug" - then the ktest command would be referencing the bug ID,
which would be nice, and it seems like that might work well with what
the dashboard is already doing...

Aleksandr Nogikh

Nov 1, 2024, 12:19:14 PM
to Kent Overstreet, syzk...@googlegroups.com
On Thu, Oct 31, 2024 at 11:07 PM Kent Overstreet
Yes, something like that is definitely doable. If it's not too
involved, we could even generate the ktest file content, so that
there's no need to query the &json=1 API.
If the crashes keep on happening, syzbot will still find new
reproducers once in a while.

>
> If the latter, then I think we'd want to make it easy to try all of
> them; some of them might trigger the crash more reliably than others,
> and then for the sake of determinism I think we'd want to provide a link
> that fetches .config that goes with that C reproducer as well.
>
> (highest score - you're already determining which one reproduces it the
> most reliably? that's nifty)

That's not so much about reliability, but more about prioritizing
those crashes/reproducers found on the mainline kernel and those that
were detected on amd64 (as opposed to the setups with 32bit userspace
or arm64 or riscv).

>
> It seems like you already track which ones go together for the
> dashboard, would providing an API be workable?
>
> Maybe it could be something like "fetch the nth reproducer/.config for a
> given bug" - then the ktest command would be referencing the bug ID,
> which would be nice, and it seems like that might work well with what
> the dashboard is already doing...

Yes, currently we essentially already share all that information.
There's a list of crashes with per-crash links to configs/repros/etc.

--
Aleksandr

Kent Overstreet

Nov 1, 2024, 4:51:54 PM
to Aleksandr Nogikh, syzk...@googlegroups.com
Is the json API set up to provide that now?

Aleksandr Nogikh

Nov 4, 2024, 1:21:03 PM
to Kent Overstreet, syzk...@googlegroups.com
On Fri, Nov 1, 2024 at 9:51 PM Kent Overstreet
Let's examine the existing output to make sure we're on the same page.

https://syzkaller.appspot.com/bug?extid=8761afeaaf2249358b14&json=1

There's a "crashes" array, each element of which has "kernel-config"
and "c-reproducer" download links (you just need to prepend
https://syzkaller.appspot.com/ to them). The output corresponds
directly to what is displayed on the bug's page on the web dashboard.
As I understand, that array should provide enough information to run
the tests you were talking about, though I don't know how problematic
it might be to iterate over the JSON in Bash.
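Rather than iterating over the JSON in Bash, a few lines of Python do it comfortably. A hedged sketch: the `crashes` / `kernel-config` / `c-reproducer` field names follow the description above (verify against a live response), and the inline `sample` stands in for an actual `&json=1` response body.

```python
import json

BASE = "https://syzkaller.appspot.com"

def asset_urls(bug: dict) -> list[str]:
    """Expand the relative per-crash links in a &json=1 response to full URLs."""
    urls = []
    for crash in bug.get("crashes", []):
        for key in ("kernel-config", "c-reproducer"):
            if crash.get(key):
                urls.append(BASE + crash[key])
    return urls

# Inline stand-in for the body of 'https://syzkaller.appspot.com/bug?extid=<ID>&json=1'.
sample = json.loads('{"crashes": [{"kernel-config": "/text?tag=KernelConfig&x=1",'
                    ' "c-reproducer": "/text?tag=ReproC&x=2"}]}')
for url in asset_urls(sample):
    print(url)
```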

--
Aleksandr

Kent Overstreet

Nov 4, 2024, 4:57:30 PM
to Aleksandr Nogikh, syzk...@googlegroups.com
On Mon, Nov 04, 2024 at 07:20:47PM +0100, Aleksandr Nogikh wrote:
> On Fri, Nov 1, 2024 at 9:51 PM Kent Overstreet
> > Is the json API set up to provide that now?
>
> Let's examine the existing output to make sure we're on the same page.
>
> https://syzkaller.appspot.com/bug?extid=8761afeaaf2249358b14&json=1
>
> There's a "crashes" array, each element of which has "kernel-config"
> and "c-reproducer" download links (you just need to prepend
> https://syzkaller.appspot.com/ to them). The output corresponds
> directly to what is displayed on the bug's page on the web dashboard.
> As I understand, that array should provide enough information to run
> the tests you were talking about, though I don't know how problematic
> it might be to iterate over the JSON in Bash.

Yeah, that's what I want. I'll just write a helper for parsing the json
(rust + serde should make that trivial), so I believe I just need the
endpoint.

Speaking of helpers, I noticed on your crash reports that it looks like
you're using something lighter weight than scripts/decode_stacktrace.sh
- is that something you could point me at as well?

Aleksandr Nogikh

Nov 5, 2024, 4:53:54 AM
to Kent Overstreet, syzk...@googlegroups.com
We have implemented it in syzkaller's Go code. See e.g.
1) https://github.com/google/syzkaller/blob/master/pkg/symbolizer/symbolizer.go
2) https://github.com/google/syzkaller/blob/509da42949c4013fb236ebf6e25d3562d110198c/pkg/report/linux.go#L383
and the functions below

But essentially we also rely on addr2line just like decode_stacktrace.sh does.
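For reference, the core of that approach fits in a few lines. A sketch (not syzkaller's actual code): `parse_frame` assumes the usual `func+0xOFF/0xSIZE` stack-line shape, and `symbolize` shells out to addr2line the same way decode_stacktrace.sh does; the `vmlinux` path is whatever you built.

```python
import re
import subprocess

FRAME_RE = re.compile(r"(?P<func>[A-Za-z0-9_.]+)\+0x(?P<off>[0-9a-f]+)/0x[0-9a-f]+")

def parse_frame(line: str):
    """Extract (function, offset) from a 'func+0xOFF/0xSIZE' stack line."""
    m = FRAME_RE.search(line)
    return (m.group("func"), int(m.group("off"), 16)) if m else None

def symbolize(vmlinux: str, addr: str) -> str:
    """Resolve one address via addr2line (-f function names, -i inlines)."""
    res = subprocess.run(["addr2line", "-f", "-i", "-e", vmlinux, addr],
                         capture_output=True, text=True, check=True)
    return res.stdout.strip()

print(parse_frame(" bch2_fs_open+0x1a4/0x8f0"))
# → ('bch2_fs_open', 420)
```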

Kent Overstreet

Nov 5, 2024, 2:47:44 PM
to Aleksandr Nogikh, syzk...@googlegroups.com
Thanks, maybe I'll crib off of that and do a C version so we can have it
in the kernel tree

What's the json endpoint?

Aleksandr Nogikh

Nov 7, 2024, 8:38:03 AM
to Kent Overstreet, syzk...@googlegroups.com
On Tue, Nov 5, 2024 at 8:47 PM Kent Overstreet
It's the URL from "Dashboard link" (*) + "&json=1". E.g. for
https://lore.kernel.org/all/672b9f03.050a022...@google.com/T/

it would be https://syzkaller.appspot.com/bug?extid=985f827280dc3a6e7e92&json=1

(*) You get the same link also when you click on any bug title from
our main page: https://syzkaller.appspot.com/upstream

--
Aleksandr

Kent Overstreet

Nov 7, 2024, 1:35:50 PM
to Aleksandr Nogikh, syzk...@googlegroups.com
That looks nice and clean, perfect :)

Kent Overstreet

Nov 7, 2024, 4:14:16 PM
to Aleksandr Nogikh, syzk...@googlegroups.com
And, it's working - syzbot reproducers running locally in a single
command, with kgdb/ssh and all that:

$ btk run -I ~/ktest/tests/syzbot-repro.ktest f074d2e31d8d35a6a38c

It takes an optional crash index, i.e.

$ btk run -I ~/ktest/tests/syzbot-repro.ktest f074d2e31d8d35a6a38c

Patch is here:
https://evilpiepirate.org/git/ktest.git/commit/?id=3c30e501fb0d1413849cfc4f5832f8f5cff48585

How shall we tell people about this? :)

Kent Overstreet

Nov 7, 2024, 5:16:01 PM
to Aleksandr Nogikh, syzk...@googlegroups.com
On Thu, Nov 07, 2024 at 04:14:12PM -0500, Kent Overstreet wrote:
> And, it's working - syzbot reproducers running locally in a single
> command, with kgdb/ssh and all that:
>
> $ btk run -I ~/ktest/tests/syzbot-repro.ktest f074d2e31d8d35a6a38c
>
> It takes an optional crash index, i.e.
>
> $ btk run -I ~/ktest/tests/syzbot-repro.ktest f074d2e31d8d35a6a38c
>
> Patch is here:
> https://evilpiepirate.org/git/ktest.git/commit/?id=3c30e501fb0d1413849cfc4f5832f8f5cff48585
>
> How shall we tell people about this? :)

Also, have you given any thought to kconfig minimization? The syzbot
kconfigs take something like 10x longer to build than what I'm used to.

There's also something in there that breaks kgdb (not panic timeout, I
fixed that one), so if you've got any ideas there I'm all ears...

Aleksandr Nogikh

Nov 13, 2024, 6:47:35 AM
to Kent Overstreet, Dmitry Vyukov, syzk...@googlegroups.com
On Thu, Nov 7, 2024 at 10:14 PM Kent Overstreet
Cool! :)

> How shall we tell people about this? :)

We should definitely mention it in the syzbot documentation. Currently
the bug reproduction instructions are scattered across
https://github.com/google/syzkaller/blob/master/docs/syzbot_assets.md
and https://github.com/google/syzkaller/blob/master/docs/reproducing_crashes.md
(and IMO the latter one desperately needs an update/rewrite).

In any case, as a first step, it would be great to have a complete
list of commands to use ktest from scratch. E.g. from the first `git
clone` of ktest and checkout of the kernel tree to actually running
the syzbot-repro.ktest test.

Unfortunately we don't have a syzbot blog to advertise the feature,
but I guess we could at least ask Dmitry to tweet the news? :)

--
Aleksandr

Kent Overstreet

Nov 13, 2024, 9:55:13 PM
to Aleksandr Nogikh, Dmitry Vyukov, syzk...@googlegroups.com
Clone the repo, then run as root

<path to ktest>/root_image create

Then (optional) symlink build-test-kernel into your path

Then, from your kernel source tree run

build-test-kernel run -I <path to ktest>/tests/syzbot-repro.ktest <bug-id>

Could that go in the syzbot docs?

> Unfortunately we don't have a syzbot blog to advertise the feature,
> but I guess we could at least ask Dmitry to tweet the news? :)

That'd be cool :)

Aleksandr Nogikh

Nov 18, 2024, 7:29:49 AM
to Kent Overstreet, syzkaller
On Wed, Nov 13, 2024 at 6:03 PM Kent Overstreet
<kent.ov...@linux.dev> wrote:
>
> On Wed, Nov 13, 2024 at 12:55:53PM +0100, Aleksandr Nogikh wrote:
> > On Thu, Nov 7, 2024 at 11:15 PM Kent Overstreet
> > <kent.ov...@linux.dev> wrote:
> > >
> > > On Thu, Nov 07, 2024 at 04:14:12PM -0500, Kent Overstreet wrote:
> > > > And, it's working - syzbot reproducers running locally in a single
> > > > command, with kgdb/ssh and all that:
> > > >
> > > > $ btk run -I ~/ktest/tests/syzbot-repro.ktest f074d2e31d8d35a6a38c
> > > >
> > > > It takes an optional crash index, i.e.
> > > >
> > > > $ btk run -I ~/ktest/tests/syzbot-repro.ktest f074d2e31d8d35a6a38c
> > > >
> > > > Patch is here:
> > > > https://evilpiepirate.org/git/ktest.git/commit/?id=3c30e501fb0d1413849cfc4f5832f8f5cff48585
> > > >
> > > > How shall we tell people about this? :)
> > >
> > > Also, have you given any thought to kconfig minimization? The syzbot
> > > kconfigs take something like 10x longer to build than what I'm used to.
> >
> > Yes, we considered it and it will very likely happen some day, but I
> > cannot say when exactly. It's going to be quite resource-intensive, so
> > definitely not until after we have refactored syz-ci (the component
> > that actually builds kernels/runs jobs) to make it more scalable.
> >
> > See this issue: https://github.com/google/syzkaller/issues/3199
>
> It might be possible to skip the config bisection if we can come up with
> per-subsystem .configs (which would have to be automatic, given the
> number of subsystems you have to deal with)
>
> ktest also has some machinery for declarative specification of kconfigs,
> which I wonder if you'd find useful.
>
> Maybe it'd be something like - bisect to find a minimized kconfig once
> per subsystem, then check if a bug repros with the subsystem minimized
> kconfig - if it doesn't, maybe add an an option or two to the subsystem
> config?

That is a very interesting suggestion, thank you!
I've noted it in the GitHub issue.

The most tricky part here would be to actually come up with those
per-subsystem configs. It's totally doable if we e.g. just define a
few generic smaller configs (one config for all filesystems, one
for all networking stuff, one for all USB drivers), but otherwise it's
not entirely clear how to automatically attribute subsets of configs
to each of the 100s of Linux subsystems.

Currently syzbot extracts its subsystems from the MAINTAINERS entries
and those do not reference any KConfig entries at all. So it looks
like it would unfortunately require manual work in any case.
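For what it's worth, the mechanical part (merging per-subsystem fragments, with later fragments winning) is simple, in the spirit of the kernel's scripts/kconfig/merge_config.sh; the hard part is deciding what goes into each fragment, as noted above. A sketch, with fragment contents made up for illustration:

```python
def parse_fragment(text: str) -> dict:
    """Parse 'CONFIG_FOO=y' / '# CONFIG_FOO is not set' lines into a dict."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, _, val = line.partition("=")
            opts[key] = val
        elif line.startswith("# CONFIG_") and line.endswith(" is not set"):
            opts[line[2:-len(" is not set")]] = "n"
    return opts

def merge(base: dict, overlay: dict) -> dict:
    """Overlay wins on conflicts, like merge_config.sh's last-fragment rule."""
    merged = dict(base)
    merged.update(overlay)
    return merged

fs_bits = parse_fragment("CONFIG_BCACHEFS_FS=y\nCONFIG_KASAN=y\n")
debug_bits = parse_fragment("# CONFIG_KASAN is not set\nCONFIG_KGDB=y\n")
print(merge(fs_bits, debug_bits))
# → {'CONFIG_BCACHEFS_FS': 'y', 'CONFIG_KASAN': 'n', 'CONFIG_KGDB': 'y'}
```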

>
> > > There's also something in there that breaks kgdb (not panic timeout, I
> > > fixed that one), so if you've got any ideas there I'm all ears...
> >
> > We don't set CONFIG_KGDB in syzbot's .config files. Does ktest change
> > this before building the kernel?
>
> We do - I bisected it to, of all things, CONFIG_NR_CPUS=8 breaking kgdb
> (!)

Oh, it's very surprising indeed. So if you set it to a different
value, kgdb starts working fine?

Aleksandr Nogikh

Nov 18, 2024, 7:36:55 AM
to Kent Overstreet, syzkaller
(+ the public group)

On Mon, Nov 18, 2024 at 1:36 PM Aleksandr Nogikh <nog...@google.com> wrote:
>
> On Thu, Nov 14, 2024 at 3:55 AM Kent Overstreet
> Yes, definitely :)
>
> Let me try it out on my workstation first. I'll let you know how it
> went and whether I needed any extra steps.

Aleksandr Nogikh

Nov 22, 2024, 8:15:07 AM
to Kent Overstreet, syzkaller
I've run the commands locally and got to the VM running stage, then it
hung. Here's the output:
https://pastebin.com/B2HAzAup

I guess it's having trouble sshing into the VM. Do you actually need
to authenticate with keys to the test VMs? On syzbot, we now use
buildroot-based VMs to which you can just log in with an empty
password. These are anyway only for testing purposes.

Aleksandr Nogikh

Nov 22, 2024, 9:08:14 AM
to Kent Overstreet, syzkaller
Ah, I used -l instead of -I. With -I, there's no "illegal option -- l"
error, but the outcome is still the same.

In the meanwhile, I've drafted a new version of the documentation that
will include ktest as well:
https://github.com/google/syzkaller/pull/5529

Kent Overstreet

Nov 22, 2024, 3:36:30 PM
to Aleksandr Nogikh, syzkaller
Ahh, no: I run all my bash scripts with errexit, and that error isn't
handled - I'll post a fix momentarily

Kent Overstreet

Nov 22, 2024, 3:43:02 PM
to Aleksandr Nogikh, syzkaller
On Fri, Nov 22, 2024 at 02:14:50PM +0100, Aleksandr Nogikh wrote:
> I've run the commands locally and got to the VM running stage, then it
> hung. Here's the output:
> https://pastebin.com/B2HAzAup
>
> I guess it's having trouble sshing into the VM. Do you actually need
> to authenticate with keys to the test VMs? On syzbot, we now use
> buildroot-based VMs to which you can just log in with an empty
> password. These are anyway only for testing purposes.

Back when we were mainly using qemu userspace networking and connecting
over a unix domain socket that would've been fine, but opening up a TCP
port without authentication is something I don't think I want to do :)

(That code still exists in ktest, but the liblwipv6 code was prone to
getting stuck burning a core, doh)

It now just prints a warning and continues if you don't have an ssh
pubkey.
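For the curious, the failure mode is easy to demonstrate with a toy
script pair; the missing-key path below is made up, and only the
guarded variant survives errexit:

```python
# Toy demonstration: an unguarded nonzero exit under `set -o errexit`
# aborts the script, while an `if !` guard lets it warn and continue
# (the pattern described above). The pubkey path is hypothetical.
import subprocess

unguarded = "set -o errexit\ncat /no/such/pubkey\necho continued\n"
guarded = (
    "set -o errexit\n"
    "if ! cat /no/such/pubkey 2>/dev/null; then\n"
    "  echo 'warning: no ssh pubkey found' >&2\n"
    "fi\n"
    "echo continued\n"
)

def runs_to_end(script):
    """Return True if the script reaches its final echo."""
    p = subprocess.run(["bash", "-c", script],
                       capture_output=True, text=True)
    return "continued" in p.stdout

print(runs_to_end(unguarded), runs_to_end(guarded))  # prints: False True
```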

Kent Overstreet

Nov 22, 2024, 3:49:06 PM
to Aleksandr Nogikh, syzkaller
On Fri, Nov 22, 2024 at 03:07:59PM +0100, Aleksandr Nogikh wrote:
> Ah, I used -l instead of -I. With -I, there's no "illegal option -- l"
> error, but the outcome is still the same.
>
> In the meanwhile, I've drafted a new version of the documentation that
> will include ktest as well:
> https://github.com/google/syzkaller/pull/5529

Nice, added a comment about the rust and cap'n proto stuff - those parts
are only for the CI.

Any thoughts on other features we might want to add?

I've been thinking it might be worth having it loop through all the
reproducers instead of just running one; there are some bugs (e.g.
https://syzkaller.appspot.com/bug?extid=c6fd966ebbdea1e8ff08) that
haven't wanted to pop for me locally.

Another thought...

the ktest CI/dashboard is built around providing a git log view, and
that visualization of where in the history bugs are popping up (and with
what frequency) has been immensely useful. You've already got crash
reports tied to git revisions, so...

Aleksandr Nogikh

Nov 25, 2024, 9:00:52 AM
to Kent Overstreet, syzkaller
I've repeated the steps using the latest ktest checkout - and it's
working correctly :)

On Fri, Nov 22, 2024 at 9:49 PM Kent Overstreet
<kent.ov...@linux.dev> wrote:
>
> On Fri, Nov 22, 2024 at 03:07:59PM +0100, Aleksandr Nogikh wrote:
> > Ah, I used -l instead of -I. With -I, there's no "illegal option -- l"
> > error, but the outcome is still the same.
> >
> > In the meanwhile, I've drafted a new version of the documentation that
> > will include ktest as well:
> > https://github.com/google/syzkaller/pull/5529
>
> Nice, added a comment about the rust and cap'n proto stuff - those parts
> are only for the CI.

Thanks for having taken a look!

>
> Any thoughts on other features we might want to add?
>
> I've been thinking it might be worth having it loop through all the
> reproducers instead of just running one; there are some bugs (e.g.
> https://syzkaller.appspot.com/bug?extid=c6fd966ebbdea1e8ff08) that
> haven't wanted to pop for me locally.

That can be quite useful, I think.

>
> Another thought...
>
> the ktest CI/dashboard is built around providing a git log view, and
> that visualization of where in the history bugs are popping up (and with
> what frequency) has been immensely useful.

Could you please share how this visualization has been helpful for you?

> You've already got crash reports tied to git revisions, so...

We always report a git revision on which we found a bug, but due to
the non-deterministic nature of fuzzing, the discovered bug could be
the one introduced at any time from yesterday to the beginning of the
Linux history.

Once a reproducer is found, syzbot attempts to bisect the git log to
identify the exact revision when the kernel starts crashing. Its
results are correct, I'd say, in ~80% of cases.

We have this visualization of the distribution of the cause/fix
bisection results over time, but, to be honest, I cannot draw any
practical conclusions from it :)
https://syzkaller.appspot.com/upstream/graph/lifetimes

--
Aleksandr

Kent Overstreet

Nov 25, 2024, 4:33:42 PM
to Aleksandr Nogikh, syzkaller
Test data is noisy - I know we all want clean dashboards, but I don't
think that's really the right thing to be aiming for; as soon as my test
dashboard starts getting close to clean I start looking for new
hardening to do (fault injection, new assertions, etc.).

So there are always failing tests, and a lot of those tests only fail
intermittently. Running tests x number of times would help, but not
eliminate the problem, and it's not really necessary with the "test the
whole branch history" approach I take.

That gets me a lot of information on what's happening over time, so I
can quickly spot a commit that introduces a heisenbug - even in the
presence of many failing tests, since I'll see the commit at which the
number of test failures ticked up.

And looking at an individual test failure, I can see at a glance if it's
something that just started failing, or recently started failing more
often.
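As a toy illustration of the "spot the tick-up" part (the per-commit
counts and the threshold heuristic here are invented, not how my
dashboard actually computes it):

```python
# Toy sketch: find the commit where a test's failure rate jumps above
# the running baseline. Data and threshold are made up for illustration.
history = [
    ("commit-a", 0, 50),   # (revision, failures, runs), oldest first
    ("commit-b", 1, 50),
    ("commit-c", 2, 48),
    ("commit-d", 12, 50),  # failure rate jumps here
    ("commit-e", 11, 49),
]

def first_rate_jump(history, factor=3.0, floor=0.05):
    """Return the first revision whose failure rate exceeds
    factor * (running baseline rate) + floor, else None."""
    fails = runs = 0
    for rev, f, r in history:
        if runs and r:
            baseline = fails / runs
            if f / r > factor * baseline + floor:
                return rev
        fails += f
        runs += r
    return None

print(first_rate_jump(history))  # prints: commit-d
```

The `floor` term is there so a single intermittent failure on an
otherwise-clean test doesn't count as a jump.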

> > You've already got crash reports tied to git revisions, so...
>
> We always report a git revision on which we found a bug, but due to
> the non-deterministic nature of fuzzing, the discovered bug could be
> the one introduced at any time from yesterday to the beginning of the
> Linux history.

*nod*

> Once a reproducer is found, syzbot attempts to bisect the git log to
> identify the exact revision when the kernel starts crashing. Its
> results are correct, I'd say, in ~80% of cases.

80%? Uhh, of all the bcachefs syzbot bugs I don't think I've seen a
single one where the bisect was accurate :)

For the hard, interesting bugs where I end up spending most of my time,
bisect tends not to be useful; I rarely use it. I suspect that's because
most of the bugs where bisect would be useful are caught immediately by
my CI.

> We have this visualization of the distribution of the cause/fix
> bisection results over time, but, to be honest, I cannot draw any
> practical conclusions from it :)
> https://syzkaller.appspot.com/upstream/graph/lifetimes

What I have in mind would be a log/visualization for individual bugs -
if you have that I haven't seen it yet.

Here's the equivalent from my dashboard:
https://evilpiepirate.org/~testdashboard/ci?user=kmo&branch=bcachefs-testing&test=^fs.bcachefs.fstests-nocow.generic.224$

So that tells me it's not a new bug, just one that occurs quite
infrequently (and if you look at the test failure, it's not hard to see
why; it's a latency issue where we hit a 10 second timer, and it's a
multithreaded metadata test that would stress that sort of thing).

If you're interested, I think it'd be pretty easy to adapt my dashboard
code to your results database.

Aleksandr Nogikh

Dec 2, 2024, 6:23:35 AM
to Kent Overstreet, syzkaller
On Mon, Nov 25, 2024 at 10:33 PM Kent Overstreet
Hmmm, that's interesting. Apart from the obvious cases where a
bisection points to some absolutely irrelevant part of the kernel,
there are also results that look quite reasonable, at least to a
person not well acquainted with the bcachefs implementation. Some
examples after a quick look at the syzbot dashboard:

https://syzkaller.appspot.com/bug?extid=92e65e9b7a42d379f92e
https://syzkaller.appspot.com/bug?extid=47f334396d741f9cb1ce
https://syzkaller.appspot.com/bug?extid=a27c3aaa3640dd3e1dfb

Are these also incorrect? If not, is there something that syzbot could
have done differently to get a correct result?

>
> For the hard, interesting bugs where I end up spending most of my time,
> bisect tends not to be useful; I rarely use it. I suspect that's because
> most of the bugs where bisect would be useful are caught immediately by
> my CI.
>
> > We have this visualization of the distribution of the cause/fix
> > bisection results over time, but, to be honest, I cannot draw any
> > practical conclusions from it :)
> > https://syzkaller.appspot.com/upstream/graph/lifetimes
>
> What I have in mind would be a log/visualization for individual bugs -
> if you have that I haven't seen it yet.
>
> Here's the equivalent from my dashboard:
> https://evilpiepirate.org/~testdashboard/ci?user=kmo&branch=bcachefs-testing&test=^fs.bcachefs.fstests-nocow.generic.224$

For this one I only see "passed" on all rows. Is that intended?

>
> So that tells me it's not a new bug, just one that occurs quite
> infrequently (and if you look at the test failure, it's not hard to see
> why; it's a latency issue where we hit a 10 second timer, and it's a
> multithreaded metadata test that would stress that sort of thing).
>
> If you're interested, I think it'd be pretty easy to adapt my dashboard
> code to your results database.

I think I understand what you mean now, thanks for the explanations!

In our case, we're dealing with records of what was found during
fuzzing, and these are, I'd say, even noisier than the results
of running unstable tests. For tests, you run them at least once on
each revision of your tree, while the fuzzer may not even try to
execute the faulty sequence of system calls on some (or, actually, on
most) days. So even if the bug is 100% reproducible, it will just not
be seen regularly. The data we process is also different: we do not
deal with passed/failed scenarios, but rather with crash titles that
can theoretically be triggered in many different ways.

We do have some historical per-bug data on the number of times it
occurred on each day, e.g.
https://syzkaller.appspot.com/upstream/graph/crashes?regexp=kernel+BUG+in+bch2_bucket_alloc_trans&Months=5&show-graph=Show+graph

...but this data isn't directly linked to specific revisions, at least
not on the pages we currently display.
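If each crash record kept its kernel revision, the per-revision list
would be a simple aggregation. The record layout and revision strings
below are hypothetical, not the actual syzbot schema:

```python
# Sketch: group existing crash records by the kernel revision they were
# observed on, yielding a per-revision detection count for one bug title.
# Record fields and revision strings are hypothetical.
from collections import Counter

crashes = [
    {"title": "kernel BUG in bch2_bucket_alloc_trans", "rev": "v6.12-rc5-1234-gabc"},
    {"title": "kernel BUG in bch2_bucket_alloc_trans", "rev": "v6.12-rc5-1234-gabc"},
    {"title": "kernel BUG in bch2_bucket_alloc_trans", "rev": "v6.12-rc6-0042-gdef"},
]

def crashes_by_revision(records, title):
    """Count how often `title` was seen on each revision."""
    return Counter(r["rev"] for r in records if r["title"] == title)

for rev, n in crashes_by_revision(
        crashes, "kernel BUG in bch2_bucket_alloc_trans").items():
    print(rev, n)
```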

So we could definitely do a much better job of displaying the list of
revisions where the bug was detected. I've filed
https://github.com/google/syzkaller/issues/5557. Is that something
that would make the analysis easier for you?

--
Aleksandr