Syzkaller Bisection Sanity Checking (extend noop change detection)


Lukas Bulwahn

Jul 7, 2020, 1:12:27 PM7/7/20
to Jouni Högander, Jukka Kaartinen, Dmitry Vyukov, syzkaller
Hi Dmitry,

I took your proposal (fixing the "noop change detection") from our
last discussion
(https://groups.google.com/forum/#!topic/syzkaller/DTZU5gRHGoo) and
expanded a bit on that idea:


Problem:

Reproducing a specific bug identified with syzkaller is tricky.
Triggering the intended bug with a reproducer can fail because:
1. the intended bug only happens sporadically. Hence, it might not
trigger within a certain time limit.
The reproducer attempt then simply runs until the time limit. In
such a case, in a bisection, we must assume that the bug is not in
that commit, simply because we did not observe it anymore.
2. the reproducer hits another, unrelated crash and not the intended bug.
In a bisection, any identified kernel crash is considered to be the
specific bug that the reproducer is intended to trigger.

This leads to incorrect bisections, and ultimately to reporting a
kernel crash to the wrong maintainers, decreasing the priority and
attention towards syzkaller reports.

We would like to identify wrong bisections due to these flaky
reproducers early, and with little runtime overhead in the bisection
process, using the following feature:

First, we need to ensure that the syzkaller kernel builds are
reproducible independently of the specific commit, i.e., kernel builds
that are semantically identical in the build-relevant files lead to
bit-identical kernel images.

Now, let us assume we are bisecting between the two commits A and B.
Commit C is selected as the next bisection commit between A and B.
The hashes of the kernel builds of commits A, B and C are shaA, shaB
and shaC, respectively.
The reported bugs (error message strings) of the reproducer on commits
A, B and C are denoted bugA, bugB and bugC.

If shaC is identical to shaA, then we know that the kernel images of A
and C are identical. So in the absence of other sporadic bugs
interfering, the error message bugA must be identical to bugC. If bugA
is not identical to bugC, we must assume that one of the reproducer
runs, either on A or on C, was subject to some interfering sporadic
bug, or that the bug itself is sporadic.

At this point, the bisection algorithm has detected that the
reproducer only triggers the bug sporadically and that the bisection
is not fully trustworthy (potentially a wrong bisection). This
information can then be used within syzkaller for further decisions on
reporting.

Otherwise, if bugA == bugC, we simply continue the commit bisection as usual.

Also, if shaC is identical to shaB, we apply the approach described
above analogously, for B instead of A.
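In pseudo-Go, the check for a single bisection step could look roughly
like this (a sketch only; the function and argument names are
hypothetical, not syzkaller's actual API):

// Sanity check for one bisection step (illustrative sketch).
// shaA/shaB are the build hashes of the known-good/known-bad endpoints,
// bugA/bugB the crash titles observed on them, shaC/bugC the hash and
// crash title for the candidate commit C.
func stepTrustworthy(shaA, bugA, shaB, bugB, shaC, bugC string) bool {
    if shaC == shaA && bugC != bugA {
        // Identical kernel images, different outcomes: either one run
        // was disturbed by a sporadic bug, or the bug itself is sporadic.
        return false
    }
    if shaC == shaB && bugC != bugB {
        return false
    }
    return true
}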

With continued bisection and a decreasing number of commits between
the bisection commits A and B, the likelihood that shaC is identical
to shaA or shaB increases, and with it the likelihood that wrong
bisections are identified.

We could further extend the bisection algorithm with a suitable
backtracking approach:

Once we have identified that the bug is sporadic and either the result
for A or for C is wrong, the bisection algorithm goes back to testing
A, resulting in the report bugA'.
If bugA' == bugA, we consider the result from commit A trustworthy and
continue by re-testing C, resulting in bugC'.
If bugC' == bugA, we continue the bisection, assuming that bugC was a
sporadic issue and that bugA (== bugA' == bugC') is the actual
relevant output for the reproducer run on the kernel builds shaA and
shaC (remember that shaA == shaC here).
If bugC' != bugA, then we give up (otherwise we would just go in
circles in this bisection backtracking). One could think about
repeating this reproducer run back and forth for a fixed time, but we
need to stop eventually in this unclear situation.

If bugA' != bugA, we backtrack to the bisection step before the
bisection between commits A and B.
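As a sketch of this backtracking step (hypothetical types and helpers,
assuming shaA == shaC and bugA != bugC from the runs above):

// backtrack handles the case shaA == shaC but bugA != bugC.
func backtrack(commitA, commitB, commitC Commit, bugA string) Outcome {
    bugA2 := runReproducer(commitA) // bugA'
    if bugA2 != bugA {
        // A itself is unstable: go back to the bisection step before
        // the range (A, B) was chosen.
        return backtrackBefore(commitA, commitB)
    }
    bugC2 := runReproducer(commitC) // A looks trustworthy; re-test C: bugC'
    if bugC2 == bugA {
        // bugC was sporadic interference; bugA is the relevant result
        // for the identical builds shaA == shaC, so keep bisecting.
        return continueBisection(bugA)
    }
    // Still inconsistent: give up rather than cycle between A and C.
    return giveUp()
}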

Dmitry, is this roughly what you had in mind with your suggestion?

If so, we would start by implementing the simple bisection sanity
checking above and provide a pull request for it.


Best regards,

Lukas

Dmitry Vyukov

Jul 8, 2020, 4:51:36 AM7/8/20
to Lukas Bulwahn, Jouni Högander, Jukka Kaartinen, syzkaller, Eric Biggers, Pedro Lopes
On Tue, Jul 7, 2020 at 7:12 PM Lukas Bulwahn <lukas....@gmail.com> wrote:
>
> Hi Dmitry,
>
> I took your proposal (fixing the "noop change detection") from our
> last discussion
> (https://groups.google.com/forum/#!topic/syzkaller/DTZU5gRHGoo) and
> expanded a bit on that idea:

+Eric, Pedro, as they were looking at "noop change detection" recently as well.

Dmitry Vyukov

Jul 8, 2020, 4:59:16 AM7/8/20
to Lukas Bulwahn, Jouni Högander, Jukka Kaartinen, syzkaller, Eric Biggers, Pedro Lopes
One thing is that we won't get shaA == shaC until the very last steps
of the bisection. All but the last steps include lots of commits
(hundreds/thousands), so the chances that we will get the same hash
are close to zero.

Then, different crash messages do not necessarily mean sporadic bugs.
The same bug can result in multiple manifestations (most notably
races, timing-dependent crashes, multi-threaded reproducers). I think
in the "bisection results analysis" table I've seen a number of such
bugs.
I had a much more prosaic thing in mind -- currently kernel binary
hashing is broken, we are getting different hashes for comment-only
changes:
https://github.com/google/syzkaller/issues/1271#issuecomment-559093018

Lukas Bulwahn

Jul 8, 2020, 1:49:31 PM7/8/20
to Dmitry Vyukov, Jouni Högander, Jukka Kaartinen, syzkaller, Eric Biggers, Pedro Lopes
Yes, I agree, but the noop detection can only kick in once the number
of commits gets smaller;
otherwise, it is not a "noop", but an actual change that could impact
the reproducer run.

Well, if I reduce the kernel to a minimal config for the specific
reproducer, the git range could still be hundreds of commits, but all
those commits may not change the kernel image, because they only
modify code that is not relevant to the config, e.g., hundreds of
changes to Documentation, device trees, etc. It is still mostly an
end-of-bisection check, but it could already trigger with a number of
irrelevant commits in between.

> Then, different crash messages do not necessarily mean sporadic bugs.
> The same bug can result in multiple manifestations (most notably
> races, timing-dependent crashes, multi-threaded reproducers). I think
> in the "bisection results analysis" table I've seen a number of such
> bugs.
>

Do such multiple manifestations occur for the majority of syzkaller bugs?
In particular, we are talking about multiple manifestations of a bug
on the same kernel binary here.
My suggestion cannot handle multiple manifestations and would simply
give up on them, but if that is only a smaller class of bugs, then
there is still value for the single-manifestation bugs (a single
manifestation on a single fixed kernel build).

But you also need to consider timeouts, i.e. that the bug might just
not manifest at all, right?

Do I understand noop detection correctly: you would simply bisect
with all those known weaknesses and then determine whether the
bisection points to a commit that changes the kernel build hash
compared to its parent commit (assuming a reproducible build)? And if
not, would you simply restart the bisection from the original git
range and run the whole bisection once again?
Okay, we need to investigate that issue first. I guess this is needed
anyway for any kind of noop detection.

Just some random ideas:
We find the cause of those hash changing but irrelevant parts in the
kernel build.
Then, we remove that part, make them constant by setting something, or
we refine that identity checking to simply ignore some irrelevant meta
data.

Possibly, some debug tools need to store maps to line numbers etc.,
and those just change whenever you change the file even in irrelevant
parts.

Dmitry, thanks for your hints so far. I see and agree with the points
you have raised; I think some of them can be resolved with a bit of
work, and some are just limitations of working with the fuzzy
information from running reproducers. That does not invalidate the
idea completely, and the smaller steps towards a feature like the one
I sketched are valuable anyway, even if they end up being used
differently.

Eric, Pedro, what have you already thought about, and what are you
currently tackling in that regard?

Lukas

Dmitry Vyukov

Jul 8, 2020, 2:10:26 PM7/8/20
to Lukas Bulwahn, Jouni Högander, Jukka Kaartinen, syzkaller, Eric Biggers, Pedro Lopes
I agree that config minimization may help somewhat here.
Here is some quick sampling of bisections we do:

Bisecting: 7028 revisions left to test after this (roughly 13 steps)
Bisecting: 7248 revisions left to test after this (roughly 13 steps)
Bisecting: 21685 revisions left to test after this (roughly 15 steps)
Bisecting: 7937 revisions left to test after this (roughly 13 steps)
Bisecting: 9032 revisions left to test after this (roughly 13 steps)
Bisecting: 14820 revisions left to test after this (roughly 14 steps)
Bisecting: 8497 revisions left to test after this (roughly 13 steps)
Bisecting: 7032 revisions left to test after this (roughly 13 steps)
Bisecting: 8695 revisions left to test after this (roughly 13 steps)
Bisecting: 7596 revisions left to test after this (roughly 13 steps)
Bisecting: 7882 revisions left to test after this (roughly 13 steps)
Bisecting: 5991 revisions left to test after this (roughly 13 steps)
Bisecting: 7882 revisions left to test after this (roughly 13 steps)
Bisecting: 7882 revisions left to test after this (roughly 13 steps)
Bisecting: 7596 revisions left to test after this (roughly 13 steps)

If we assume that a range of 128 commits doesn't change the binary (?),
this means that ~half of the last steps will have the same binary.

But if we have the same binary, why do we need to run any runtime
tests really? ;)

Dmitry Vyukov

Jul 8, 2020, 2:17:02 PM7/8/20
to Lukas Bulwahn, Jouni Högander, Jukka Kaartinen, syzkaller, Eric Biggers, Pedro Lopes
I don't have a precise number, but my impression is that it's frequent enough.
Here is some quick sampling for such cases. Each group corresponds to
a single build:

run #0: crashed: general protection fault in sctp_assoc_rwnd_increase
run #1: crashed: general protection fault in sctp_ulpevent_free
run #2: crashed: general protection fault in sctp_assoc_rwnd_increase
run #3: crashed: general protection fault in sctp_assoc_rwnd_increase
run #4: crashed: general protection fault in sctp_assoc_rwnd_increase
run #5: crashed: general protection fault in sctp_assoc_rwnd_increase
run #6: crashed: general protection fault in sctp_assoc_rwnd_increase
run #7: crashed: general protection fault in sctp_assoc_rwnd_increase

run #0: crashed: general protection fault in sctp_assoc_rwnd_increase
run #1: crashed: general protection fault in sctp_assoc_rwnd_increase
run #2: crashed: general protection fault in sctp_assoc_rwnd_increase
run #3: crashed: general protection fault in sctp_assoc_rwnd_increase
run #4: crashed: general protection fault in corrupted
run #5: crashed: general protection fault in sctp_assoc_rwnd_increase
run #6: crashed: general protection fault in sctp_assoc_rwnd_increase
run #7: crashed: general protection fault in corrupted

run #0: crashed: INFO: rcu detected stall in corrupted
run #1: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #2: crashed: INFO: rcu detected stall in sys_sendfile64
run #3: crashed: INFO: rcu detected stall in corrupted
run #4: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #5: crashed: INFO: rcu detected stall in corrupted
run #6: crashed: INFO: rcu detected stall in corrupted
run #7: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #8: crashed: INFO: rcu detected stall in sys_sendfile64
run #9: crashed: INFO: rcu detected stall in ext4_file_write_iter

run #0: crashed: INFO: rcu detected stall in sys_sendfile64
run #1: crashed: INFO: rcu detected stall in corrupted
run #2: crashed: INFO: rcu detected stall in corrupted
run #3: crashed: INFO: rcu detected stall in sys_sendfile64
run #4: crashed: INFO: rcu detected stall in corrupted
run #5: crashed: INFO: rcu detected stall in corrupted
run #6: crashed: INFO: rcu detected stall in corrupted
run #7: crashed: INFO: rcu detected stall in corrupted
run #8: crashed: INFO: rcu detected stall in sys_sendfile64
run #9: crashed: INFO: rcu detected stall in corrupted

run #0: crashed: INFO: rcu detected stall in rw_verify_area
run #1: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #2: crashed: INFO: rcu detected stall in corrupted
run #3: crashed: INFO: rcu detected stall in corrupted
run #4: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #5: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #6: crashed: INFO: rcu detected stall in corrupted
run #7: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #8: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #9: crashed: INFO: rcu detected stall in rw_verify_area

run #0: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #1: crashed: INFO: rcu detected stall in corrupted
run #2: crashed: INFO: rcu detected stall in sys_sendfile64
run #3: crashed: INFO: rcu detected stall in sys_sendfile64
run #4: crashed: INFO: rcu detected stall in corrupted
run #5: crashed: INFO: rcu detected stall in sys_sendfile64
run #6: crashed: INFO: rcu detected stall in sys_sendfile64
run #7: crashed: INFO: rcu detected stall in corrupted
run #8: crashed: INFO: rcu detected stall in corrupted
run #9: crashed: INFO: rcu detected stall in sys_sendfile64

run #0: crashed: KASAN: use-after-free Read in link_path_walk
run #1: crashed: KASAN: use-after-free Read in link_path_walk
run #2: crashed: KASAN: use-after-free Read in trailing_symlink
run #3: crashed: KASAN: use-after-free Read in trailing_symlink
run #4: crashed: KASAN: use-after-free Read in trailing_symlink
run #5: crashed: KASAN: use-after-free Read in link_path_walk
run #6: crashed: KASAN: use-after-free Read in link_path_walk
run #7: crashed: KASAN: use-after-free Read in link_path_walk
run #8: crashed: KASAN: use-after-free Read in trailing_symlink
run #9: crashed: KASAN: use-after-free Read in trailing_symlink

run #0: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #1: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #2: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #3: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #4: crashed: WARNING: ODEBUG bug in corrupted
run #5: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #6: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #7: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #8: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #9: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup

run #0: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #1: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #2: crashed: general protection fault in delayed_uprobe_remove
run #3: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #4: crashed: general protection fault in delayed_uprobe_remove
run #5: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #6: OK
run #7: OK
run #8: OK
run #9: OK

run #0: crashed: general protection fault in delayed_uprobe_remove
run #1: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #2: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #3: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #4: crashed: general protection fault in delayed_uprobe_remove
run #5: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #6: OK
run #7: OK
run #8: OK
run #9: OK

run #0: crashed: KASAN: use-after-free Read in fuse_dev_do_read
run #1: crashed: WARNING in request_end
run #2: crashed: KASAN: use-after-free Read in fuse_dev_do_read
run #3: OK
run #4: OK

run #0: crashed: KASAN: slab-out-of-bounds Read in __ip_append_data
run #1: crashed: inconsistent lock state in rhashtable_walk_enter
run #2: crashed: inconsistent lock state in rhashtable_walk_enter
run #3: crashed: inconsistent lock state in rhashtable_walk_enter
run #4: crashed: inconsistent lock state in rhashtable_walk_enter
run #5: crashed: inconsistent lock state in rhashtable_walk_enter
run #6: crashed: inconsistent lock state in rhashtable_walk_enter
run #7: crashed: kernel BUG at arch/x86/mm/physaddr.c:LINE!
run #8: crashed: inconsistent lock state in rhashtable_walk_enter
run #9: crashed: inconsistent lock state in rhashtable_walk_enter

run #0: crashed: kernel BUG at arch/x86/mm/physaddr.c:LINE!
run #1: crashed: inconsistent lock state in rhashtable_walk_enter
run #2: crashed: inconsistent lock state in rhashtable_walk_enter
run #3: crashed: inconsistent lock state in rhashtable_walk_enter
run #4: crashed: inconsistent lock state in rhashtable_walk_enter
run #5: crashed: kernel BUG at arch/x86/mm/physaddr.c:LINE!
run #6: crashed: inconsistent lock state in rhashtable_walk_enter
run #7: crashed: kernel BUG at arch/x86/mm/physaddr.c:LINE!
run #8: crashed: inconsistent lock state in rhashtable_walk_enter
run #9: crashed: inconsistent lock state in rhashtable_walk_enter

run #0: crashed: KASAN: use-after-free Read in __vb2_perform_fileio
run #1: crashed: KASAN: use-after-free Write in __vb2_cleanup_fileio
run #2: crashed: KASAN: use-after-free Read in __vb2_perform_fileio
run #3: crashed: KASAN: use-after-free Read in __vb2_perform_fileio
run #4: crashed: INFO: task hung in vivid_stop_generating_vid_cap
run #5: crashed: INFO: task hung in vivid_stop_generating_vid_cap
run #6: crashed: INFO: task hung in vivid_stop_generating_vid_cap
run #7: crashed: INFO: task hung in vivid_stop_generating_vid_cap
run #8: crashed: INFO: task hung in vivid_stop_generating_vid_cap
run #9: crashed: INFO: task hung in vivid_stop_generating_vid_cap

Dmitry Vyukov

Jul 8, 2020, 2:24:22 PM7/8/20
to Lukas Bulwahn, Jouni Högander, Jukka Kaartinen, syzkaller, Eric Biggers, Pedro Lopes
On Wed, Jul 8, 2020 at 7:49 PM Lukas Bulwahn <lukas....@gmail.com> wrote:
> But you also need to consider timeouts, i.e. that the bug might just
> not manifest at all, right?

Yes.

> Do I understand noop detection correctly: you would simply bisect
> with all those known weaknesses and then determine whether the
> bisection points to a commit that changes the kernel build hash
> compared to its parent commit (assuming a reproducible build)?

Yes.
The initial emails contained lots of such cases. The major problem was
that the result was immediately and obviously laughable for any human,
e.g. a bug is bisected to a comment-only change and the author gets
the finger pointed at them. Obviously that author will then point the
finger back at syzbot for doing this dumbness :)
Other cases were an arm64 change for an x86_64 build, or Linus tagging
a release.


> And if not, would you
> simply restart the bisection from the original git range and run the
> whole bisection once again?

No, currently we simply don't send an email.
But this is orthogonal. If we have a good signal for a bad bisection,
we may not send the email, or retry it, or try a different config, a
different timeout, or maybe something else.
Yes.

> Just some random ideas:
> We find the cause of those hash changing but irrelevant parts in the
> kernel build.
> Then, we remove that part, make them constant by setting something, or
> we refine that identity checking to simply ignore some irrelevant meta
> data.
>
> Possibly, some debug tools need to store maps to line numbers etc.,
> and those just change whenever you change the file even in irrelevant
> parts.

Yes, we ignore debug data, we ignore some other sections of the binary
and we set some env vars to some special values for kbuild (e.g. not
embedding the build timestamp into the binary).

Dmitry Vyukov

Jul 8, 2020, 2:26:05 PM7/8/20
to Lukas Bulwahn, Jouni Högander, Jukka Kaartinen, syzkaller, Eric Biggers, Pedro Lopes
On Wed, Jul 8, 2020 at 8:24 PM Dmitry Vyukov <dvy...@google.com> wrote:
> > Just some random ideas:
> > We find the cause of those hash changing but irrelevant parts in the
> > kernel build.
> > Then, we remove that part, make them constant by setting something, or
> > we refine that identity checking to simply ignore some irrelevant meta
> > data.
> >
> > Possibly, some debug tools need to store maps to line numbers etc.,
> > and those just change whenever you change the file even in irrelevant
> > parts.
>
> Yes, we ignore debug data, we ignore some other sections of the binary
> and we set some env vars to some special values for kbuild (e.g. not
> embedding the build timestamp into the binary).

Here are env vars:
https://github.com/google/syzkaller/blob/9f9845eb28f0b516d36cb5478bae8dd5714733f1/pkg/build/linux.go#L140-L151

And here are sections we checksum:
https://github.com/google/syzkaller/blob/9f9845eb28f0b516d36cb5478bae8dd5714733f1/pkg/build/linux.go#L168-L178
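For reference, the general approach is roughly the following (an
illustrative sketch only, using the documented kbuild environment
variables; the exact values syzkaller sets are in the first link):

package build

import (
    "os"
    "os/exec"
)

// buildKernel builds the kernel with the non-deterministic kbuild inputs
// pinned to fixed values, so that semantically identical trees produce
// bit-identical images (modulo the ignored sections).
func buildKernel(kernelDir string) error {
    cmd := exec.Command("make", "-j8", "bzImage")
    cmd.Dir = kernelDir
    cmd.Env = append(os.Environ(),
        "KBUILD_BUILD_TIMESTAMP=1970-01-01", // fixed timestamp (passed to `date -d`)
        "KBUILD_BUILD_USER=syzkaller",       // fixed build user
        "KBUILD_BUILD_HOST=syzkaller",       // fixed build host
        "KBUILD_BUILD_VERSION=0",            // fixed build counter
    )
    cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
    return cmd.Run()
}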

Lukas Bulwahn

Jul 8, 2020, 5:56:25 PM7/8/20
to Dmitry Vyukov, Jouni Högander, Jukka Kaartinen, syzkaller, Eric Biggers, Pedro Lopes
Thanks! I think we will start by determining how to identify whether
two builds are essentially "identical". So, hunting down the current
issue and fixing it.

Then, we will see how to make use of that in the bisection.

Lukas

Lukas Bulwahn

Jul 8, 2020, 6:02:22 PM7/8/20
to Dmitry Vyukov, Jouni Högander, Jukka Kaartinen, syzkaller, Eric Biggers, Pedro Lopes
Okay, but then we will need some kind of "magic" to compute the
"likelihood that an error message is related to another error message
from the same bug or not", as Jukka suggested looking into. I guess we
need to collect more data of the kind shown above, and then we can
derive some kind of heuristic from it.
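As a crude starting point, such a heuristic could, for example, just
compare the crashing function extracted from the crash titles, which
would already group pairs like "KASAN: use-after-free Read in
delayed_uprobe_remove" and "general protection fault in
delayed_uprobe_remove" from the sampling above (purely illustrative
sketch):

// crashFunc extracts the crashing function after the last " in " in a
// crash title, e.g. "link_path_walk" from
// "KASAN: use-after-free Read in link_path_walk".
func crashFunc(title string) string {
    const sep = " in "
    for i := len(title) - len(sep); i >= 0; i-- {
        if title[i:i+len(sep)] == sep {
            return title[i+len(sep):]
        }
    }
    return ""
}

// possiblySameBug treats two crash titles as possibly being
// manifestations of the same bug if they name the same crashing function.
func possiblySameBug(title1, title2 string) bool {
    f1, f2 := crashFunc(title1), crashFunc(title2)
    return f1 != "" && f1 == f2
}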

Lukas Bulwahn

Jul 13, 2020, 5:19:23 AM7/13/20
to Dmitry Vyukov, Jouni Högander, Jukka Kaartinen, syzkaller, Eric Biggers, Pedro Lopes
Good point. You see that I am still in the middle of the thought
process of understanding the problem and its relation to the solution
here :)

I think we can split the solution into two aspects:

So, one part of the solution I proposed is simply being smarter when bisecting:

For this first part, let us assume:
1. we have a bisection criterion that is fully reliable, deterministic
and complete, i.e., for each commit, it gives us, with 100%
correctness, either a good or a bad answer.
Let us call this function b (as in bisecting function), with type
commit -> {good, bad}.

(This assumption does not hold for syzkaller reproducers, as discussed
already in this thread, but we will get to that later in part 2.)

2. we have a reliable way to identify, for a commit, whether the
bisection criterion would lead to the same result as another run on a
different commit.
Let us call this binary relation C (as in characterizing equivalence
relation) on commits, i.e., its type is (commit, commit) -> boolean.

The default bisection works as usual:
1. Determine a commit c in the middle of the git range (start, end).
WLOG let start be good and end be bad.
2. Test: Let r be the result of b(c).
3. If r is good, then set start to c; if it is bad, set end to c.
4. Repeat from step 1 with the new start or end, until the first bad
commit is identified.

Our "smarter" bisection works now as follows:

1. Determine a commit c in between a git range (start, end). WLOG let
start be good and end be bad.
2a. Characteristic Test:
  1. Check if (c, start) \in C, then let r be good (because we know
b(start) = good) and go to Step 3.
  2. Otherwise Check if (c, end) \in C then let r be bad and go to Step 3.
  3. Otherwise continue with Step 2b.
2b. Test: Let r be the result of b(c).
3. If r is good, then set start to c; if is bad set end to c.
4. Repeat with a new start or end on step 1, until the first bad
commit is identified.

That is pretty straightforward.
It would allow us not to bisect towards irrelevant commits when C is
reliable, even if b is not reliable.
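A minimal sketch of this loop (Commit, Result, and the helpers
middleCommit, commitsBetween, equivalentBuilds and runReproducer are
all assumed here; equivalentBuilds stands for the relation C and
runReproducer for b):

type Commit string

type Result int

const (
    Good Result = iota
    Bad
)

// smartBisect returns the first bad commit in (start, end). It reuses
// the known results for start (good) and end (bad) whenever the
// candidate's build is equivalent to one of them, and only falls back
// to the runtime test otherwise.
func smartBisect(start, end Commit) Commit {
    for commitsBetween(start, end) > 0 {
        c := middleCommit(start, end)
        var r Result
        switch {
        case equivalentBuilds(c, start): // (c, start) \in C
            r = Good // reuse b(start) = good, no runtime test needed
        case equivalentBuilds(c, end): // (c, end) \in C
            r = Bad // reuse b(end) = bad
        default:
            r = runReproducer(c) // Step 2b: the actual runtime test b(c)
        }
        if r == Good {
            start = c
        } else {
            end = c
        }
    }
    return end // first bad commit
}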

Now we simply need to have a good way of determining this relation C:
A suitable build and hash function on the image would do that.

These would be the first topics that we can work on: 1. the basic
smarter git bisection (possibly fully syzkaller-agnostic, as a
self-contained Go module) and 2. the specific hash function for the
relation C.


The second part is really about dealing with the unreliable
reproducers, but at this point, we are really just relying on
executing the reproducers multiple times on the same commit and trying
to determine if reproducing is reliable or not. There is not much more
thought in that part so far.

Lukas

Dmitry Vyukov

Jul 16, 2020, 6:53:37 AM7/16/20
to Lukas Bulwahn, Jouni Högander, Jukka Kaartinen, syzkaller, Eric Biggers, Pedro Lopes
To make sure I understand: you propose to skip the actual runtime test
if we already did the runtime test on an equivalent build, and instead
simply reuse the result of the previous run?
It makes sense. We already have all necessary info collected for this.
We have this results map:
https://github.com/google/syzkaller/blob/b090c64363768b1fc574b98a8d42a3dd5f7af3e9/pkg/bisect/bisect.go#L239
which holds results for all tests we did before along with the kernel
build hash:
https://github.com/google/syzkaller/blob/b090c64363768b1fc574b98a8d42a3dd5f7af3e9/pkg/bisect/bisect.go#L422-L427
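So the reuse itself could be as simple as something along these lines
(hypothetical names and types; the real map and signatures are in the
links above):

// testWithReuse returns a cached result if an identical build was
// already tested, and only runs the reproducer otherwise.
func testWithReuse(build Build) Result {
    if prev, ok := resultsBySignature[build.Signature]; ok {
        return prev // an equivalent build was already tested, reuse its result
    }
    res := runTests(build)
    resultsBySignature[build.Signature] = res
    return res
}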

However... this will effectively disable the current sanity check on
the bisection result (detecting when we bisect to a comment-only
change, to another arch, etc.), because with your proposal we will
never bisect into the middle of an equivalence range. We will either
point to the beginning of an equivalence range, or to one commit after
it. Yet the bisection can still go wrong, we will just not detect
that. So the question that is not clear to me: will it improve quality
or make things worse? :)
I am not sure how to answer this question... it would be amusing to
build a parametrized model of this process, with parameters like
reliability of the reproducer, amount and duration of other bugs,
proportion of equivalent builds. Then one could run 10^6 randomized
experiments in no time and compare how parameters and algorithm
details affect the result.
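For example, a toy model of plain bisection with a flaky reproducer
and occasional unrelated crashes might look like the sketch below (all
parameter values are made up, and the equivalent-builds dimension is
left out for brevity):

package main

import (
    "fmt"
    "math/rand"
)

const (
    nCommits    = 10000   // size of the bisected range
    pReproduce  = 0.9     // chance a run on a broken kernel triggers the real bug
    pUnrelated  = 0.02    // chance a run hits some unrelated crash
    experiments = 1000000 // number of randomized experiments
)

// test models one reproducer run: the bisection only sees "crashed or
// not", which can come from the real bug or from an unrelated one.
func test(commit, culprit int, rng *rand.Rand) bool {
    if commit >= culprit && rng.Float64() < pReproduce {
        return true
    }
    return rng.Float64() < pUnrelated
}

func main() {
    rng := rand.New(rand.NewSource(1))
    correct := 0
    for i := 0; i < experiments; i++ {
        culprit := rng.Intn(nCommits-1) + 1 // commit 0 is known good
        lo, hi := 0, nCommits-1             // lo: last good, hi: first bad
        for hi-lo > 1 {
            mid := (lo + hi) / 2
            if test(mid, culprit, rng) {
                hi = mid
            } else {
                lo = mid
            }
        }
        if hi == culprit {
            correct++
        }
    }
    fmt.Printf("correct bisections: %d / %d\n", correct, experiments)
}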

Lukas Bulwahn

Jul 17, 2020, 8:15:54 AM7/17/20
to Dmitry Vyukov, Jouni Högander, Jukka Kaartinen, syzkaller, Eric Biggers, Pedro Lopes
Yes, that is what I suggest.
 
> However... this will effectively disable the current sanity check on
> the bisection result (detecting when we bisect to a comment-only
> change, to another arch, etc.), because with your proposal we will
> never bisect into the middle of an equivalence range. We will either
> point to the beginning of an equivalence range, or to one commit after
> it. Yet the bisection can still go wrong, we will just not detect
> that. So the question that is not clear to me: will it improve quality
> or make things worse? :)

Agree.

My earlier proposal was just mixing two really separate aspects into one single proposal:
as we know, the current bisection is "not optimal", but then we also use these duplicated reproducer runs to make statements about the flakiness of reproducers.

But it is conceptually much simpler to state two completely independent facts with two "independent" features:

1. The reproducer bisection is optimized as described.

2. Run the reproducers multiple times on the bisected commits (last good, first bad) and determine whether we really believe that the bisection was correct, or whether it was due to wrong decisions during the bisection.

Of course, the optimization in 1. gives you some compute power to spend on 2. (If we just spend less time bisecting, we might be willing to give syzkaller some time for repeated runs.)

> I am not sure how to answer this question... it would be amusing to
> build a parametrized model of this process, with parameters like
> reliability of the reproducer, amount and duration of other bugs,
> proportion of equivalent builds. Then one could run 10^6 randomized
> experiments in no time and compare how parameters and algorithm
> details affect the result.

Agree that is possible; I think the challenge is to give proper numbers here.

What is the proportion of equivalent builds for each reproducer? Remember that we want to combine that with the kernel configuration minimization... so we are already back to a full evaluation, not a quick simulation.

I am not afraid of such a benchmark; my reasoning above is sound and should lead to less reproducer runs while reaching both goals, find a reliable first bad commit and identify wrong bisections due to flaky reproducers (as good as that is possible with a flaky reproducer and a fixed number of repetitions on the last good and first bad commit).

We will move forward with a first implementation. Dmitry, let us know what you want to see as evaluation on the current reproducers and bisections for this feature to get this merged and activated in the current syzbot infrastructure. 

Lukas

Dmitry Vyukov

Jul 18, 2020, 5:32:06 AM7/18/20
to Lukas Bulwahn, Jouni Högander, Jukka Kaartinen, syzkaller, Eric Biggers, Pedro Lopes
Good point.
We can point to a "comment only" change only due to a flaky
reproducer, right? I think if the bisection was distracted by an
unrelated crash, then we should point to the change that introduced
that unrelated crash (which won't be a "comment only" change), right?
If this is true, then I agree it makes sense to assess reproducer
flakiness separately. This has an additional benefit of detecting
wrong bisections which point to a not "comment only" change due to a
flaky reproducer.


>> I am not sure how to answer this question... it would be amusing to
>> build a parametrized model of this process, with parameters like
>> reliability of the reproducer, amount and duration of other bugs,
>> proportion of equivalent builds. Then one could run 10^6 randomized
>> experiments in no time and compare how parameters and algorithm
>> details affect the result.
>
>
> Agree that is possible; I think the challenge is to give proper numbers here.
>
> What is the proportion of equivalent builds for each reproducer? Remember that we want to combine that with the kernel configuration minimization... so we are already back to a full evaluation, not a quick simulation.

Agree.

> I am not afraid of such a benchmark; my reasoning above is sound and should lead to less reproducer runs while reaching both goals, find a reliable first bad commit and identify wrong bisections due to flaky reproducers (as good as that is possible with a flaky reproducer and a fixed number of repetitions on the last good and first bad commit).
>
> We will move forward with a first implementation. Dmitry, let us know what you want to see as evaluation on the current reproducers and bisections for this feature to get this merged and activated in the current syzbot infrastructure.

I understand that evaluating this is hard. So if we agree that this is
a win-win change based on our understanding and thought experiment,
then I am ready to merge it without additional evaluation.
How do you want to detect flaky reproducers?
We already do multiple tests on each commit (we do 10, and doing one
set of 10 is not really different from doing 2 sets of 5, right). So
maybe we can gather some info from these 10 runs?...
However, one interesting aspect is that if we do more runs, we
increase the chances of hitting some very low-probability unrelated
crash (e.g. if we run 100 tests, likely at least 1 of them will hit
some unrelated hang or something).
Also, I am sure I've seen cases where reproducer flakiness changes
over time. Say, on some commits we hit the crash most of the time
(say, 9-10 out of 10), but on other commits we start hitting very few
crashes (say, 1-2 out of 10), and it is still the same bug, just
something else has changed that affected the rate of reproduction. So
we need to be careful when extrapolating flakiness across commits.