what trees/branches to test on syzbot

Dmitry Vyukov

Jan 16, 2018, 2:51:58 AM
to LKML, Theodore Ts'o, Eric W. Biederman, Greg Kroah-Hartman, Andrew Morton, Linus Torvalds, syzkaller, Guenter Roeck, Stephen Rothwell, David Miller
Hello,

Several people proposed that linux-next should not be tested on
syzbot, while others suggested that syzbot should test as many
trees as possible. I initially included linux-next because it is a
staging area before the upstream tree, with the intention that patches are
_tested_ there; if they are not tested there, bugs enter the upstream
tree, and then it takes much longer to get a fix into other trees.

So the question is: what trees/branches should be tested? Preferably
in priority order as syzbot can't test all of them.

Thanks

Guenter Roeck

Jan 16, 2018, 4:45:51 AM
to Dmitry Vyukov, LKML, Theodore Ts'o, Eric W. Biederman, Greg Kroah-Hartman, Andrew Morton, Linus Torvalds, syzkaller, Stephen Rothwell, David Miller
I always thought that -next existed specifically to give people a
chance to test the code in it. Maybe the question is where to report
the test results?

Guenter

Dmitry Vyukov

Jan 16, 2018, 4:59:13 AM
to Guenter Roeck, LKML, Theodore Ts'o, Eric W. Biederman, Greg Kroah-Hartman, Andrew Morton, Linus Torvalds, syzkaller, Stephen Rothwell, David Miller, Fengguang Wu
FTR, from Guenter on another thread:

> Interesting. Assuming that refers to linux-next, not linux-net, that
> may explain why linux-next tends to deteriorate. I wonder if I should
> drop it from my testing as well. I'll be happy to follow whatever the
> result of this exchange is and do the same.

If we agree on some list of important branches, and what branches
specifically should not be tested with automatic reporting, I think it
will benefit everybody.
+Fengguang, can you please share your list and rationale behind it?

Guenter Roeck

Jan 16, 2018, 11:58:02 AM
to Dmitry Vyukov, LKML, Theodore Ts'o, Eric W. Biederman, Greg Kroah-Hartman, Andrew Morton, Linus Torvalds, syzkaller, Stephen Rothwell, David Miller, Fengguang Wu

Eric W. Biederman

Jan 16, 2018, 12:04:21 PM
to Dmitry Vyukov, Guenter Roeck, LKML, Theodore Ts'o, Greg Kroah-Hartman, Andrew Morton, Linus Torvalds, syzkaller, Stephen Rothwell, David Miller, Fengguang Wu
The problem is testing linux-next and then using get-maintainer.pl to
report the problem.

If you are resource limited I would start by testing Linus's tree to
find the existing bugs, and to get a baseline. Using get-maintainer.pl
is fine for sending emails to developers there.

After that I would test the individual trees that are pulled into
linux-next, so that any issue not found in Linus's tree can be
attributed to the tree you are testing and sent to the appropriate
maintainer.

After that I would consider testing linux-next itself and see if any
issues are caused by the merger of all of those trees.

Eric
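
The ordering above (mainline first, then the individual trees, then linux-next) can be read as an attribution rule: a crash is charged to the highest-priority tree in which it has been seen. What follows is only a minimal sketch of that reading; the `Tree` class, the `attribute` function, and the crash and tree names are hypothetical and are not taken from syzbot.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    name: str
    priority: int  # lower number means the tree is tested first
    crashes: set = field(default_factory=set)


def attribute(crash, trees):
    """Return the name of the highest-priority tree in which `crash` was seen."""
    for tree in sorted(trees, key=lambda t: t.priority):
        if crash in tree.crashes:
            return tree.name
    return None  # crash was not seen in any tested tree


# Hypothetical data: the baseline is mainline, then one subsystem tree,
# then the merged linux-next tree.
trees = [
    Tree("linux-next", 2, {"crash-a", "crash-b", "crash-c"}),
    Tree("mainline", 0, {"crash-a"}),
    Tree("subsystem-tree", 1, {"crash-a", "crash-b"}),
]
```

With this data, `crash-a` is attributed to mainline, `crash-b` to the subsystem tree, and only `crash-c` is left as a candidate merge problem in linux-next.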

Greg Kroah-Hartman

Jan 16, 2018, 12:34:44 PM
to Eric W. Biederman, Dmitry Vyukov, Guenter Roeck, LKML, Theodore Ts'o, Andrew Morton, Linus Torvalds, syzkaller, Stephen Rothwell, David Miller, Fengguang Wu
I second this; almost all of the issues you are hitting are usually in
Linus's tree. Let's make that "clean" first, before messing around and
adding 100+ other random developers' trees into the mix :)

thanks,

greg k-h

Fengguang Wu

Jan 18, 2018, 8:48:45 PM
to Dmitry Vyukov, Guenter Roeck, LKML, Theodore Ts'o, Eric W. Biederman, Greg Kroah-Hartman, Andrew Morton, Linus Torvalds, syzkaller, Stephen Rothwell, David Miller
Hi Dmitry,
0-day aims to aggressively test as many trees and branches as possible,
including various developer trees, maintainer trees, linux-next, mainline and
stable trees. Here is the complete list of the 800+ trees we monitor:

https://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git/tree/repo/linux

The rationale is obvious. IMHO what really matters here is
capability rather than rationale: that policy relies heavily on the
fundamental capability of auto bisecting. Once regressions are
bisected, we know the owners of the problem to automatically send the
report to, i.e. the first bad commit's author and committer.

For the bugs that cannot be bisected, they tend to be old ones and
we report more often on mainline tree than linux-next.

Thanks,
Fengguang
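
To make the bisect-then-notify policy concrete, here is a minimal sketch, not 0-day's actual code. It assumes a linear commit history in which every commit before the regression is good and every commit from the regression onward is bad; `Commit`, `first_bad_commit`, `report_recipients` and all the names in the sample history are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    sha: str
    author: str
    committer: str


def first_bad_commit(commits, is_bad):
    """Binary-search an ordered commit list for the first bad commit.

    Assumes commits[0..k-1] are good and commits[k..] are bad for some k,
    and that the last commit in the list is known to be bad.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid       # first bad commit is at mid or earlier
        else:
            lo = mid + 1   # first bad commit is after mid
    return commits[lo]


def report_recipients(commit):
    """The report goes to the first bad commit's author and committer."""
    return sorted({commit.author, commit.committer})


# Hypothetical history: the regression is introduced by commit "c3".
history = [
    Commit(f"c{i}", author=f"dev-{i}", committer="maintainer-x")
    for i in range(6)
]
culprit = first_bad_commit(history, lambda c: int(c.sha[1:]) >= 3)
```

In this example `culprit.sha` is `c3`, and `report_recipients(culprit)` yields `dev-3` and `maintainer-x`.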

Dmitry Vyukov

Jan 22, 2018, 8:32:26 AM
to Greg Kroah-Hartman, Eric W. Biederman, Guenter Roeck, LKML, Theodore Ts'o, Andrew Morton, Linus Torvalds, syzkaller, Stephen Rothwell, David Miller, Fengguang Wu
FTR I've just dropped linux-next and mmots from syzbot.

Dmitry Vyukov

Jan 22, 2018, 8:34:53 AM
to Fengguang Wu, Guenter Roeck, LKML, Theodore Ts'o, Eric W. Biederman, Greg Kroah-Hartman, Andrew Morton, Linus Torvalds, syzkaller, Stephen Rothwell, David Miller
Thanks for the info, Fengguang.

Bisecting is something we need to add to syzbot in the future. However, about 50%
of syzbot bugs are due to races and are somewhat difficult to bisect
reliably.

Tetsuo Handa

Jun 9, 2018, 2:36:50 AM
to Dmitry Vyukov, Greg Kroah-Hartman, Eric W. Biederman, Guenter Roeck, LKML, Theodore Ts'o, Andrew Morton, Linus Torvalds, syzkaller, Stephen Rothwell, David Miller, Fengguang Wu
I hope that we can test linux-next on syzbot, as a tree for testing debug
printk() patches. People do not like sending debug printk() patches to
Linus's tree, while the majority of bugs are found in Linus's tree.

We could automatically expire (and delete) reports found in linux-next from
the table at https://syzkaller.appspot.com/ if the bug was not reproduced
for some time (e.g. one week or one month).

Linus Torvalds

Jun 9, 2018, 6:17:33 PM
to Tetsuo Handa, Dmitry Vyukov, Greg Kroah-Hartman, Eric W. Biederman, Guenter Roeck, Linux Kernel Mailing List, Theodore Ts'o, Andrew Morton, syzkaller, Stephen Rothwell, David Miller, Wu Fengguang
On Fri, Jun 8, 2018 at 11:36 PM Tetsuo Handa
<penguin...@i-love.sakura.ne.jp> wrote:
> On 2018/01/22 22:32, Dmitry Vyukov wrote:
> >
> > FTR I've just dropped linux-next and mmots from syzbot.
>
> I hope that we can test linux-next on syzbot, as a tree for testing debug
> printk() patches.

I think it would be lovely to get linux-next back eventually, but it
sounds like it's just too noisy right now, and yes, we should have a
baseline for the standard tree first.

But once there's a "this is known for the baseline", I think adding
linux-next back in and then maybe even have linux-next simply just
kick out trees that cause problems would be a good idea.

Right now linux-next only kicks things out based on build issues (or
extreme merge issues), afaik. But it *would* be good to also have
things like syzbot do quality control on linux-next.

Because the more things get found and fixed before they even hit my
tree, the better.

Linus

Theodore Y. Ts'o

Jun 9, 2018, 9:51:17 PM
to Linus Torvalds, Tetsuo Handa, Dmitry Vyukov, Greg Kroah-Hartman, Eric W. Biederman, Guenter Roeck, Linux Kernel Mailing List, Andrew Morton, syzkaller, Stephen Rothwell, David Miller, Wu Fengguang
On Sat, Jun 09, 2018 at 03:17:21PM -0700, Linus Torvalds wrote:
> I think it would be lovely to get linux-next back eventually, but it
> sounds like it's just too noisy right now, and yes, we should have a
> baseline for the standard tree first.
>
> But once there's a "this is known for the baseline", I think adding
> linux-next back in and then maybe even have linux-next simply just
> kick out trees that cause problems would be a good idea.
>
> Right now linux-next only kicks things out based on build issues (or
> extreme merge issues), afaik. But it *would* be good to also have
> things like syzbot do quality control on linux-next.

Syzbot is always getting improved to find new classes of problems. So
the only way to get a baseline would be to use an older version of
syzbot for linux-next, and to have it suppress sending e-mails about
failures that are duplicates that were already found via the mainline
tree.

Then periodically, once version N has run for M weeks, and has spewed
some large number of new failures to LKML, then you could promote
version N to be run against linux-next, and so hopefully the only
thing it would report against linux-next are regressions, and not
duplicates of new bugs also being found via the latest and greatest
version of syzbot being run against the mainline kernel.

- Ted

Dmitry Vyukov

Jun 10, 2018, 2:11:27 AM
to Theodore Y. Ts'o, Linus Torvalds, Tetsuo Handa, Dmitry Vyukov, Greg Kroah-Hartman, Eric W. Biederman, Guenter Roeck, Linux Kernel Mailing List, Andrew Morton, syzkaller, Stephen Rothwell, David Miller, Wu Fengguang
The set of trees where a crash happened is visible on the dashboard, so
one can see if it's only linux-next or the whole set of trees. Potentially
syzbot can act differently depending on this predicate, but I don't
see what the difference should be. However, this does not fully save us
from falsely assessing bugs as linux-next-only just because they
happened a few times and only on linux-next so far. But using an older
syzkaller revision won't fully save us from this either, because (1) some
bugs take a long time to find, and (2) a bug can be hidden by another
known bug, so when the second bug is fixed the first one suddenly pops
up, but it's not a new bug (and the chances are that the second one
will be fixed on linux-next first, so the first bug will look
linux-next-only).
I think re removing commits from linux-next, one of the main signals
can be: were there recent changes related to the bug. Looking at new
bugs being reported, frequently it's quite obvious (e.g.
"use-after-free in foo" and a recent "make foo faster").
But in general, if we go with linux-next, maintainers and developers
need to agree to deal with this additional aspect during bug triage.

There is also a problem with rebasing of linux-next: reported commit
hashes do not make sense and we can forget about bisection.

On a related note, recently Greg suggested to onboard more subsystem
-next trees (currently we test only net-next and bpf-next), so I tried
to formulate requirements for these trees:

https://github.com/google/syzkaller/issues/592
- not rebased (commit hashes work, bisection works)
- maintained in a reasonably good shape (no tons of assorted crashes)
- reasonably active (makes sense to test)
- merge upstream periodically (bugs are getting fixed)
- with maintainers who are willing to cooperate and fix bugs

Any volunteers?

Thanks

Theodore Y. Ts'o

Jun 10, 2018, 9:23:00 PM
to Dmitry Vyukov, Linus Torvalds, Tetsuo Handa, Greg Kroah-Hartman, Eric W. Biederman, Guenter Roeck, Linux Kernel Mailing List, Andrew Morton, syzkaller, Stephen Rothwell, David Miller, Wu Fengguang
On Sun, Jun 10, 2018 at 08:11:05AM +0200, Dmitry Vyukov wrote:
>
> The set of trees where a crash happened is visible on dashboard, so
> one can see if it's only linux-next or whole set of trees. Potentially
> syzbot can act differently depending on this predicate, but I don't
> see what should be the difference. However, this does not fully save
> from falsely assessing bugs as linux-next-only just because they
> happened few times and only on linux-next so far.

So how about this, only report something as being a linux-next
regression if (a) there is a reproducer, and (b) the reproducer does
not trigger any kind of crash on mainline?

> There is also a problem with rebasing of linux-next: reported commit
> hashes do not make sense and we can forget about bisection.

If there is a valid reproducer, bisection should simply be a matter of
running and if we know the reproducer doesn't trigger on mainline,
then the bisection should only require no more than 8-10 VM runs. For
Linux-next, this would be *super* valuable. Reporting the commit ID
and the one-line commit summary will be enough for most maintainers,
since even if they are using a rewinding head, so long as the
bisection can be done quickly enough (e.g., within a few days), it
will still be in their git repository.
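
The 8-10 figure is consistent with plain binary search: each VM run halves the candidate range, so n candidate commits need roughly ceil(log2(n)) runs. The helper below is a hypothetical illustration of that arithmetic only; it assumes one decisive run per step, which will not hold for racy bugs that need repeated runs.

```python
def bisection_runs(candidate_commits: int) -> int:
    """Worst-case number of test runs to find the first bad commit.

    Equals ceil(log2(candidate_commits)), computed with integer arithmetic.
    """
    if candidate_commits < 1:
        raise ValueError("need at least one candidate commit")
    return (candidate_commits - 1).bit_length()
```

Under this model 8 runs cover up to 256 candidate commits and 10 runs cover up to 1024.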

And if you have a reproducer, then once it's identified as a
linux-next reproducer with a guilty commit, that can be confirmed by
either (a) seeing if you can revert the commit and if it makes the
problem go away, or (b) figure out which subsystem git tree the commit
was introduced via, and then verify that the reproducer triggers on
the tip of the subsystem git tree.

All of this will require development effort, so I suspect it's not
something we'll see from syzbot tomorrow --- but it's not
*impossible*.

I think though that sending e-mail about a linux-next syzbot crash if
there is a reproducer and the reproducer doesn't trigger a crash on
mainline should be really simple to implement, and it would add huge
value without spamming the subsystem maintainers.

- Ted
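
The reporting rule proposed above reduces to a two-input predicate. A minimal sketch follows, with a hypothetical function name and with `None` standing for "the reproducer has not been run on mainline yet" (an assumption added here, not part of the proposal):

```python
from typing import Optional


def should_mail_linux_next_crash(has_reproducer: bool,
                                 crashes_on_mainline: Optional[bool]) -> bool:
    """Mail a linux-next crash only if (a) there is a reproducer and
    (b) that reproducer is known not to crash mainline.

    crashes_on_mainline is None when the mainline result is unknown;
    an unknown result is treated as "do not mail yet".
    """
    return has_reproducer and crashes_on_mainline is False
```

Treating the unknown case as "do not mail" errs on the side of less noise for maintainers, which is the stated goal of the proposal.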

Dmitry Vyukov

Jun 15, 2018, 5:54:38 AM
to Theodore Y. Ts'o, Dmitry Vyukov, Linus Torvalds, Tetsuo Handa, Greg Kroah-Hartman, Eric W. Biederman, Guenter Roeck, Linux Kernel Mailing List, Andrew Morton, syzkaller, Stephen Rothwell, David Miller, Wu Fengguang
But if this also happens on upstream, then we want to report it
twofold. So this predicate can be reduced to "report crashes that
happen only on linux-next iff they have reproducers", right?
We will probably also need something that will auto-invalidate old
bugs that were never reported.

Re backwards bisection (when bug is introduced), we can actually test
linux-next-history instead of linux-next, right?
But forward bisection (when bug is fixed) unfortunately won't work
because these commits are not connected to HEAD. And forward bisection
is very important, otherwise who will bring order to all these
hundreds of open bugs?
https://syzkaller.appspot.com/

Stephen Rothwell

Jun 18, 2018, 12:52:54 AM
to Dmitry Vyukov, Theodore Y. Ts'o, Linus Torvalds, Tetsuo Handa, Greg Kroah-Hartman, Eric W. Biederman, Guenter Roeck, Linux Kernel Mailing List, Andrew Morton, syzkaller, David Miller, Wu Fengguang
Hi Dmitry,

On Fri, 15 Jun 2018 11:54:16 +0200 Dmitry Vyukov <dvy...@google.com> wrote:
>
> Re backwards bisection (when bug is introduced), we can actually test
> linux-next-history instead of linux-next, right?

I don't see why using linux-next-history would be any better, it just
contains all the linux-next releases while the linux-next tree contains
the last 3 months worth.

--
Cheers,
Stephen Rothwell

Eric W. Biederman

Jun 18, 2018, 2:11:24 AM
to Dmitry Vyukov, Theodore Y. Ts'o, Linus Torvalds, Tetsuo Handa, Greg Kroah-Hartman, Guenter Roeck, Linux Kernel Mailing List, Andrew Morton, syzkaller, Stephen Rothwell, David Miller, Wu Fengguang
Maybe you want to monitor linux-next and see if the problem commits
disappear. That can let you stop worrying about the issue.

I don't see the point of worrying about which linux-next build a problem
appeared in. It is the first commit that reproduces the problem that is
interesting.

That commit tells you who did something that was problematic. If you
notify the committer with the reproducer they should be able to
reproduce the problem and fix it.

Very rarely I suspect it will be the merge commit into linux-next that
is the problem, but most of the time these commits are going to be in
the subsystem trees.

Eric

Alan Cox

Jun 18, 2018, 9:54:44 AM
to Dmitry Vyukov, Theodore Y. Ts'o, Linus Torvalds, Tetsuo Handa, Greg Kroah-Hartman, Eric W. Biederman, Guenter Roeck, Linux Kernel Mailing List, Andrew Morton, syzkaller, Stephen Rothwell, David Miller, Wu Fengguang
> But forward bisection (when bug is fixed) unfortunately won't work
> because these commits are not connected to HEAD. And forward bisection
> is very important, otherwise who will bring order to all these
> hundreds of open bugs?
> https://syzkaller.appspot.com/

Bisection isn't so important when you are trying to close bugs that
got fixed, with a note that it's no longer reproducible. It might mean the
reproducer broke, but it also stops you drowning, and it tells a user that
they might as well try the new one and see if it still breaks, thus
collecting the information needed.

True, it's nice to know what commit may have magically fixed it, but it's
not essential. Furthermore, once you see a bug is fixed even in -next you
can later run the reproducer against an actual release to make sure it's
still fixed there, and bisect between the previous release and that release to
find a mainline commit id if it's a single fix point.

Alan

Tetsuo Handa

Jun 26, 2018, 6:55:31 AM
to Linus Torvalds, Dmitry Vyukov, Greg Kroah-Hartman, Eric W. Biederman, Guenter Roeck, Linux Kernel Mailing List, Theodore Ts'o, Andrew Morton, syzkaller, Stephen Rothwell, David Miller, Wu Fengguang
I hope we can accept NOW either "reviving linux-next.git" or "allowing debug printk()
patches for linux.git". For example, "INFO: task hung in __sb_start_write" got 900
crashes in 81 days but we are still unable to find a reproducer. Dmitry tried to reproduce
it locally with debug printk() patches but has not yet succeeded. I think that testing with
http://lkml.kernel.org/r/f91e1c82-9693-cca3...@i-love.sakura.ne.jp
on linux.git or linux-next.git is the only realistic way of debugging this bug.
The longer we postpone the revival of linux-next, the more syzbot reports we will get...

Theodore Y. Ts'o

Jun 26, 2018, 10:16:13 AM
to Tetsuo Handa, Linus Torvalds, Dmitry Vyukov, Greg Kroah-Hartman, Eric W. Biederman, Guenter Roeck, Linux Kernel Mailing List, Andrew Morton, syzkaller, Stephen Rothwell, David Miller, Wu Fengguang
On Tue, Jun 26, 2018 at 07:54:53PM +0900, Tetsuo Handa wrote:
> I hope we can accept NOW either "reviving linux-next.git" or "allowing debug printk()
> patches for linux.git". For example, "INFO: task hung in __sb_start_write" got 900
> crashes in 81 days but still unable to find a reproducer. Dmitry tried to reproduce
> locally with debug printk() patches but not yet successful. I think that testing with
> http://lkml.kernel.org/r/f91e1c82-9693-cca3...@i-love.sakura.ne.jp
> on linux.git or linux-next.git is the only realistic way for debugging this bug.
> More we postpone revival of the linux-next, more syzbot reports we will get...

Here's a proposal for adding linux-next back:

*) Subsystems or maintainers need to have a way to opt out of getting
spammed with Syzkaller reports that have no reproducer. More often
than not, they are not actionable, and just annoy the maintainers,
with the net result that they tune out all Syzkaller reports as
noise.

*) Email reports for failures on linux-next that correspond to known
failures on mainline should be suppressed. Another way of doing
this would be to only report a problem found by a specific
reproducer to the mailing list unless the recipient has agreed to
be spammed by Syzkaller noise.

And please please please, Syzkaller needs to figure out how to do
bisection runs once you have a reproducer.

- Ted

Dmitry Vyukov

Jun 26, 2018, 10:38:51 AM
to Theodore Y. Ts'o, Tetsuo Handa, Linus Torvalds, Dmitry Vyukov, Greg Kroah-Hartman, Eric W. Biederman, Guenter Roeck, Linux Kernel Mailing List, Andrew Morton, syzkaller, Stephen Rothwell, David Miller, Wu Fengguang
On Tue, Jun 26, 2018 at 4:16 PM, Theodore Y. Ts'o <ty...@mit.edu> wrote:
> On Tue, Jun 26, 2018 at 07:54:53PM +0900, Tetsuo Handa wrote:
>> I hope we can accept NOW either "reviving linux-next.git" or "allowing debug printk()
>> patches for linux.git". For example, "INFO: task hung in __sb_start_write" got 900
>> crashes in 81 days but still unable to find a reproducer. Dmitry tried to reproduce
>> locally with debug printk() patches but not yet successful. I think that testing with
>> http://lkml.kernel.org/r/f91e1c82-9693-cca3...@i-love.sakura.ne.jp
>> on linux.git or linux-next.git is the only realistic way for debugging this bug.
>> More we postpone revival of the linux-next, more syzbot reports we will get...
>
> Here's a proposal for adding linux-next back:
>
> *) Subsystems or maintainers need to have a way to opt out of getting
> spammed with Syzkaller reports that have no reproducer. More often
> than not, they are not actionable, and just annoy the maintainers,
> with the net result that they tune out all Syzkaller reports as
> noise.

False. You can count yourself. 2/3 are actionable and fixed.

This also makes the following point ungrounded.

Guenter Roeck

Jun 26, 2018, 10:55:02 AM
to Dmitry Vyukov, Theodore Ts'o, penguin...@i-love.sakura.ne.jp, Linus Torvalds, Greg Kroah-Hartman, Eric W. Biederman, linux-kernel, Andrew Morton, syzkaller, Stephen Rothwell, David Miller, kbuild test robot
On Tue, Jun 26, 2018 at 7:38 AM Dmitry Vyukov <dvy...@google.com> wrote:
>
> On Tue, Jun 26, 2018 at 4:16 PM, Theodore Y. Ts'o <ty...@mit.edu> wrote:
> > On Tue, Jun 26, 2018 at 07:54:53PM +0900, Tetsuo Handa wrote:
> >> I hope we can accept NOW either "reviving linux-next.git" or "allowing debug printk()
> >> patches for linux.git". For example, "INFO: task hung in __sb_start_write" got 900
> >> crashes in 81 days but still unable to find a reproducer. Dmitry tried to reproduce
> >> locally with debug printk() patches but not yet successful. I think that testing with
> >> http://lkml.kernel.org/r/f91e1c82-9693-cca3...@i-love.sakura.ne.jp
> >> on linux.git or linux-next.git is the only realistic way for debugging this bug.
> >> More we postpone revival of the linux-next, more syzbot reports we will get...
> >
> > Here's a proposal for adding linux-next back:
> >
> > *) Subsystems or maintainers need to have a way to opt out of getting
> > spammed with Syzkaller reports that have no reproducer. More often
> > than not, they are not actionable, and just annoy the maintainers,
> > with the net result that they tune out all Syzkaller reports as
> > noise.
>
> False. You can count yourself. 2/3 are actionable and fixed.
>

Problem is that some if not many of the other 1/3 will be considered
noise, and even some of the 2/3 will be considered noise because they
have already been fixed by the time they are reported. Same problem as
with, say, stable tree merges: People don't see the thousands of bug
fixes inherited with such merges, but they do see the two or three
regressions. Plus, of course, one can not prove that the thousands of
bug fixes did any good because the fixed bugs are not observable
anymore. The only remedy is to try to reduce regressions down to zero
(or, of course, stop using/merging stable releases).

The same applies here: People won't see the good, they only see the
noise. This is pretty much the reason why I all but stopped reporting
build/boot failures on -next. You would have to reduce the noise
almost down to zero for people to stop complaining, and you would have
to be _really_ sure that the problem was not already fixed or reported
elsewhere.

Guenter

Tetsuo Handa

Jun 26, 2018, 4:37:35 PM
to Guenter Roeck, Dmitry Vyukov, Theodore Ts'o, Linus Torvalds, Greg Kroah-Hartman, Eric W. Biederman, linux-kernel, Andrew Morton, syzkaller, Stephen Rothwell, David Miller, kbuild test robot
I think that syzbot can stop deciding email recipients and leave it to those who
diagnose bugs, for the ratio of reports sent to the wrong subsystem maintainers is not low.
For example, syzbot assumed that "INFO: task hung in __get_super" is a fs layer bug.
But I think that the problem is in lower layers (block or mm or locking layer).
The root cause could even be just overstress due to instructions enabled by
CONFIG_KCOV_ENABLE_COMPARISONS=y.

Tetsuo Handa

Jul 5, 2018, 6:49:46 AM
to Dmitry Vyukov, Guenter Roeck, Theodore Ts'o, Linus Torvalds, Greg Kroah-Hartman, Eric W. Biederman, linux-kernel, Andrew Morton, syzkaller, Stephen Rothwell, David Miller, kbuild test robot
On 2018/06/27 5:37, Tetsuo Handa wrote:
> I think that syzbot can stop deciding email recipients and leave it to those who
> diagnose bugs, for the ratio of sending to wrong subsystem maintainers is not low.
> For example, syzbot assumed that "INFO: task hung in __get_super" is a fs layer bug.
> But I think that the problem is in more lower layers (block or mm or locking layer).
> The root cause could even be just overstressed due to instructions enabled by
> CONFIG_KCOV_ENABLE_COMPARISONS=y.
>

Thinking from today's bpf related reports, I think that subversion/quilt-based
custom patches will be useful as well.

Since quilt can apply changes in a patch atomically (using "quilt push" command),
we can maintain one custom patch for one git tree. Then, the kernel source syzbot
will test is either "no custom patch applied" or "only one custom patch applied".
That is, if "quilt push" fails, syzbot will continue testing without custom patch.

Since subversion manages revision number using an integer, adding a column for
indicating "which custom patch was applied for this report" to the table will not
occupy much space. We will figure out that custom patch needs to be updated via
syzbot reports with that column being empty.

The custom patch can contain whatever changes which might be useful for debugging.
For example, debug printk() for "INFO: task hung in __sb_start_write" case.
For another example, context identifier for printk().

Updating custom patches in subversion repository is done manually. But the cost is
negligible.

Tetsuo Handa

Jul 6, 2018, 7:27:04 PM
to Andrew Morton, Dmitry Vyukov, Guenter Roeck, Theodore Ts'o, Linus Torvalds, Greg Kroah-Hartman, Eric W. Biederman, linux-kernel, syzkaller, Stephen Rothwell, David Miller, kbuild test robot
Hello Andrew,

It seems that syzbot (experimentally ?) restarted testing linux-next.

May I ask you to temporarily carry the debug printk() patch at
https://groups.google.com/d/msg/syzkaller-bugs/E8M8WTqt034/OpadOICfCAAJ
for the "INFO: task hung in __sb_start_write" case?

The bug should be reproduced within a day if executed under syzbot environment.
Thus, I'm sure that we don't need to carry this patch for long.

Andrew Morton

Jul 9, 2018, 8:35:48 PM
to Tetsuo Handa, Dmitry Vyukov, Guenter Roeck, Theodore Ts'o, Linus Torvalds, Greg Kroah-Hartman, Eric W. Biederman, linux-kernel, syzkaller, Stephen Rothwell, David Miller, kbuild test robot
Sure, I can add that. Let's get the build warning sorted out first,
please. Any old silly workaround will suffice in a developer-only
debug patch.

Tetsuo Handa

Jul 9, 2018, 10:14:32 PM
to Andrew Morton, Dmitry Vyukov, Guenter Roeck, Theodore Ts'o, Linus Torvalds, Greg Kroah-Hartman, Eric W. Biederman, linux-kernel, syzkaller, Stephen Rothwell, David Miller, kbuild test robot
Andrew Morton wrote:
> On Sat, 7 Jul 2018 08:26:32 +0900 Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp> wrote:
>
> > Hello Andrew,
> >
> > It seems that syzbot (experimentally ?) restarted testing linux-next.
> >
> > May I ask you to carry temporarily debug printk() patch at
> > https://groups.google.com/d/msg/syzkaller-bugs/E8M8WTqt034/OpadOICfCAAJ
> > for "INFO: task hung in __sb_start_write" case?
> >
> > The bug should be reproduced within a day if executed under syzbot environment.
> > Thus, I'm sure that we don't need to carry this patch for long.
>
> Sure, I can add that.

Thank you.

> Let's get the build warning sorted out first,
> please. Any old silly workaround will suffice in a developer-only
> debug patch.

The build warning is about mips architecture rather than this patch itself,
for x86_64 builds fine. You can add this patch despite the build warning.

Tetsuo Handa

Aug 27, 2018, 8:36:20 AM
to Dmitry Vyukov, syzkaller
Hello.

Any chance we can use a subscribe/unsubscribe approach?

If subscribe/unsubscribe were implemented, we could remove the distinction
between "moderation" and "open", which is an annoying barrier because
we cannot mark entries as "dup" across them.

On 2018/06/28 19:54, Tetsuo Handa wrote:
> On 2018/06/27 5:37, Tetsuo Handa wrote:
>> I think that syzbot can stop deciding email recipients and leave it to those who
>> diagnose bugs, for the ratio of sending to wrong subsystem maintainers is not low.
>> For example, syzbot assumed that "INFO: task hung in __get_super" is a fs layer bug.
>> But I think that the problem is in more lower layers (block or mm or locking layer).
>> The root cause could even be just overstressed due to instructions enabled by
>> CONFIG_KCOV_ENABLE_COMPARISONS=y.
>>
> Below is what I suggest syzbot to do. Trying to find maintainers automatically tends
> to select persons who are too busy to diagnose that bug. Your "actionable" is
> different from maintainer's "can make time for examining what is wrong". Thus,
> I think that we should stop sending mails to automatically selected maintainers.
> Instead, syzbot interface can have subscribe/unsubscribe interface, and those who
> can spend resource for diagnosing each bug and estimating whom to notify adds/removes
> mail addresses. This way, we can revive linux-next.git without spamming maintainers.
>
> Note that I don't understand go language syntax. This patch unlikely passes
> the build and unlikely works as expected.
>
> ---
> dashboard/app/api.go | 2 --
> dashboard/app/app_test.go | 6 ------
> dashboard/app/bug.html | 2 --
> dashboard/app/config.go | 2 --
> dashboard/app/email_test.go | 33 -------------------------------
> dashboard/app/entities.go | 2 +-
> dashboard/app/jobs_test.go | 1 -
> dashboard/app/mail_bug.txt | 1 -
> dashboard/app/main.go | 2 --
> dashboard/app/reporting.go | 14 ++++++++------
> dashboard/app/reporting_email.go | 31 ++++++++++++-----------------
> dashboard/app/reporting_test.go | 2 --
> dashboard/dashapi/dashapi.go | 4 ++--
> pkg/email/parser.go | 25 ++++++++++++++++++++++++
> pkg/report/linux.go | 42 ----------------------------------------
> pkg/report/report.go | 4 +---
> syz-ci/manager.go | 1 -
> syz-manager/manager.go | 2 --
> 18 files changed, 49 insertions(+), 127 deletions(-)

Dmitry Vyukov

Aug 27, 2018, 10:06:01 AM
to Tetsuo Handa, syzkaller
On Mon, Aug 27, 2018 at 5:36 AM, Tetsuo Handa
<penguin...@i-love.sakura.ne.jp> wrote:
> Hello.
>
> Any chance we can use subscribe/unsubscribe approach?
>
> If subscribe/unsubscribe were implemented, we could remove distinction
> between "moderation" and "open" which is an annoying barrier because
> we cannot mark as "dup" entries across them.

Hi,

Please detail what exactly you mean by a subscribe/unsubscribe approach
and what the workflow will look like.
But it seems that implementing duping from moderation to open would solve
the problem. And it's useful in itself and in other contexts, and looks
like less work.

Tetsuo Handa

Aug 27, 2018, 5:14:07 PM
to Dmitry Vyukov, syzkaller
Suppose Alice, Bob and Carol are there.

Alice looks at the dashboard and finds a report. Then, Alice gets interested
in that report. Thus, Alice invites herself using "subscribe Al...@mail1.com"
command in order to receive emails for that report.

Later, some more crashes get added to that report. Then, Alice examines that
report and thinks that Bob would be responsible for that report. Thus, Alice
invites Bob using "subscribe B...@mail2.com" command.

Now, Bob starts receiving emails. But Bob thinks that Carol will be more familiar
for that report. Thus, Bob removes himself using "unsubscribe B...@mail2.com" and
invites Carol using "subscribe Ca...@mail3.com".

Now, Alice and Carol receive emails for that report.

That is: syzbot might be allowed to determine candidates of initial subscribers,
but anyone is allowed to "add/remove subscribers" before emails are actually
sent to syzbot-determined candidates.
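
The workflow above amounts to a per-report subscriber set that anyone may edit. The sketch below replays the Alice/Bob/Carol scenario; the `Report` class and its methods are hypothetical, plain names stand in for the mail addresses, and nothing here reflects an existing syzbot command.

```python
class Report:
    """A crash report with an editable set of mail subscribers."""

    def __init__(self, title, candidates=()):
        self.title = title
        # syzbot may propose initial candidates; anyone can change the set.
        self.subscribers = set(candidates)

    def subscribe(self, who):
        self.subscribers.add(who)

    def unsubscribe(self, who):
        # discard() so that removing a non-subscriber is not an error
        self.subscribers.discard(who)


report = Report("hypothetical report")
report.subscribe("alice")    # Alice gets interested and adds herself
report.subscribe("bob")      # Alice thinks Bob is responsible
report.unsubscribe("bob")    # Bob removes himself ...
report.subscribe("carol")    # ... and invites Carol instead
```

At the end of this sequence the subscriber set contains Alice and Carol, matching the outcome described above.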

Dmitry Vyukov

Jan 4, 2019, 8:17:08 AM
to Tetsuo Handa, syzkaller
The question is: when and how will this last part happen? This implies
that a human looks at all reports, in a timely manner, before they are
mailed. The reality is that no human looks at all of them and triages
all of them. What happens if nobody looks at a report? Is it not
mailed at all? Who are these special people triaging all reports? How
are they different from all kernel developers? Why can't all kernel
developers do this triage after the report has been mailed? If all
kernel developers can't cope with all bug reports, how can we expect a
group of people hundreds of times smaller to be able to do this work?

If we delay a report even by a few days, it can actually have a
negative effect if the bug was already reported by somebody else
(duplicate, useless emails), or if syzbot has already mailed one
incarnation of a bug and kernel developers are spending time debugging
it, but we did not mail another, more useful, incarnation of the same
bug because we are holding it internally and don't see the relation.
Taking into account that most of the time reports are sent to relevant
people, any additional delay can actually be a net loss.

What I am getting at: adding people to CC after the report has been
sent is trivial and already works (just add them to CC). Removing
people from CC is currently missing, but very few people have complained; I
guess kernel developers are used to lots of semi-relevant emails.

This all also looks exactly like a bug tracking system. See here, where you can
add/remove people to CC, set an assignee, note statuses and keep all
history in one place:
https://bugzilla.kernel.org/show_bug.cgi?id=199359
But implementing yet another bug tracking system is a huge amount of
work; it requires much more resources than we have now. And in the end
almost nothing of this is specific to syzbot, so solving this in the
limited context of syzbot looks wrong.

Tetsuo Handa

Jan 5, 2019, 12:34:09 AM
to Dmitry Vyukov, syzkaller
Yes. Always involve a human check before sending to specific individuals.

> The reality is that no human looks at all of them and triages
> all of them.

Of course. Therefore, we need people who play "frontend" role and people
who play "backend" role. The former (Alice in the example above) is a
"dispatcher" who interprets syzbot reports and decides whom to notify.

Currently syzbot is making decisions based on the MAINTAINERS file. But
you are missing that the people listed there are generally too busy to
look at details. Reports which do not tell "what is happening" and
"why it is happening" at a glance are not useful for them. If I recall
correctly, many weeks ago, you were discussing with Linus what should
be done before reporting (e.g. bisect).

> What happens if nobody looks at a report? It is not
> mailed at all? Who are these special people triaging all reports?

Only "frontend" people will look at reports from syzbot, and
"backend" people will triage reports from "frontend" people.

> How
> are they different from all kernel developers? Why can't all kernel
> developers do this triage after the report has been mailed?

It is impossible for all kernel developers to read all mails.
That's why we need to use roles. As I said, I have worked at a
support center and experienced both the "frontend" and "backend" roles.

> If all
> kernel developers can't cope with all bug reports, how can we expect a
> group of people hundreds of times smaller to be able to do this work?

We can do this work if we use roles. And I mostly work as "frontend".
For example, I interpreted a report from syzbot and forwarded it to the
authors who caused that bug, as in
http://lkml.kernel.org/r/a6e8d929-3765-ad70...@I-love.SAKURA.ne.jp .

> If we delay a report even by a few days, it can actually cause a
> negative effect if the bug was already reported by somebody else
> (duplicate, useless emails).

If we do want to send a report automatically, it should be sent to only
one mailing list. That would be either LKML or a syzbot-specific one
(so that people can find reports using Google search).

> Or if syzbot has already mailed one
> incarnation of a bug and kernel developers are spending time debugging
> it, but we did not mail another, more useful, incarnation of the same
> bug because we are holding it internally and don't see the relation.
> Taking into account that most of the time reports are sent to relevant
> people, any additional delays can actually be a net loss.

I disagree. Mails sent to maintainers are seldom useful; again, they are
too busy to examine the bugs. Mails sent to the authors who caused the bug
will be useful. Thus, sending to a mailing list for Google search would be
fine, but sending to automatically selected individuals is bad.

> What I am getting at: adding people to CC after the report has been
> sent is trivial and already works (just add them to CC). Removing
> people from CC is currently missing, but very few people complained,

I am the one who is complaining about lack of ability to unsubscribe.

> I
> guess kernel developers are used to lots of semi-relevant emails.

Are they reading semi-relevant emails?
I don't read threads which are not actionable for me.

> This all also looks exactly like a bug tracking system. See here you can
> add/remove people to CC, set assignee, note statuses and keep all
> history in one place:
> https://bugzilla.kernel.org/show_bug.cgi?id=199359
> But implementing yet another bug tracking system is a huge amount of
> work, it requires much more resources than we have now. And in the end
> almost nothing of this is specific to syzbot, so solving this in the
> limited context of syzbot looks wrong.

As a "frontend" person, what I want is not a bug tracking system like Bugzilla.
What I want is a column for holding a free-text memo (like a "sticky" note or a
"Google Spreadsheet" cell) rather than structures for choosing from
checkboxes/comboboxes (for finding entries among many thousands of bugs).

The "backend" people can use Bugzilla etc. if the bug is complicated enough to
require a bug tracking system. There are too many bugs left unnoticed in bug
tracking systems. I don't think that registering bugs in a bug tracking system
from the beginning is a good choice. The more trivial bugs there are, the more
difficult it is to manage the important ones.

yang ou

Jan 25, 2019, 11:43:26 AM
to syzkaller
How about the reproducible percent of race bugs?

On Monday, January 22, 2018 at 9:34:53 PM UTC+8, Dmitry Vyukov wrote:

yang ou

Jan 26, 2019, 8:12:20 PM
to syzkaller
How about the reproducible percent of race bugs?

On Monday, January 22, 2018 at 9:34:53 PM UTC+8, Dmitry Vyukov wrote:
On Fri, Jan 19, 2018 at 2:48 AM, Fengguang Wu <fenggu...@intel.com> wrote:

Dmitry Vyukov

Jan 27, 2019, 4:47:26 AM
to yang ou, syzkaller, Tetsuo Handa
On Sun, Jan 27, 2019 at 2:12 AM yang ou <aen...@gmail.com> wrote:
>
> How about the reproducible percent of race bugs?

What about it? Do you mean what is the percent of reproducible bugs or what?

Tetsuo Handa

Jan 27, 2019, 6:17:41 AM
to Dmitry Vyukov, yang ou, syzkaller
On 2019/01/27 18:47, Dmitry Vyukov wrote:
> On Sun, Jan 27, 2019 at 2:12 AM yang ou <aen...@gmail.com> wrote:
>>
>> How about the reproducible percent of race bugs?
>
> What about it? Do you mean what is the percent of reproducible bugs or what?

I guess that yang meant to ask:

  What is the ratio of "(crashes / trials) * 100" for each race bug?
  Can syzbot tell us how likely/unlikely each race bug is to reproduce?

That is, if syzbot found a reproducer, syzbot would run that reproducer e.g.
100 times and report how many times it succeeded in triggering the bug.
If the ratio is low, there might be room for trying to find a better reproducer.
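
As a rough sketch of that measurement, assuming a hypothetical
`run_reproducer` callable that returns True when a run triggers the bug
(this is not syzbot's code, and the fake reproducer below only stands in
for a real one):

```python
import itertools


def reproduction_rate(run_reproducer, trials=100):
    """Return (crashes / trials) * 100 for the given reproducer.

    run_reproducer is a hypothetical callable that performs one run and
    returns True if the bug was triggered.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    crashes = sum(1 for _ in range(trials) if run_reproducer())
    return crashes / trials * 100


def make_fake_reproducer(period):
    """Stand-in reproducer that 'crashes' on every period-th run."""
    counter = itertools.count(1)
    return lambda: next(counter) % period == 0


rate = reproduction_rate(make_fake_reproducer(4), trials=100)
print(f"reproducer triggered the bug in {rate:.0f}% of runs")
```

A real race bug would give a noisy estimate, so more trials give a more
trustworthy percentage at the cost of more machine time.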

Dmitry Vyukov

Jan 27, 2019, 6:29:56 AM
to Tetsuo Handa, yang ou, syzkaller
I see. This is https://github.com/google/syzkaller/issues/885. Let's
move the discussion there.