Minimizing configuration before running commit bisection


Jouni Högander

8 May 2020, 6:52:53 am
to syzk...@googlegroups.com, dvy...@google.com, lukas....@gmail.com, Jukka Kaartinen
Hello,

We have been working on implementing configuration minimization for
syzkaller commit bisection. RFC pull request is available here:

https://github.com/google/syzkaller/pull/1689

We have now finished a first evaluation of how this impacts commit
bisection. The evaluation and the scripts to reproduce it can be found
here:

https://github.com/hogander-unikie/syzkaller/blob/config_bisect/docs/config_bisect_evaluation.md

The results were surprisingly bad. We suspect this is due to the
configurations used and/or the reproducers selected.

Any comments or suggestions are appreciated.

BR,

Jouni

Dmitry Vyukov

12 May 2020, 1:14:40 pm
to Jouni Högander, syzkaller, Lukas Bulwahn, Jukka Kaartinen
Hi Jouni,

Just to let you know, I've received this and this is on my radar. I am
just paged out by some intern-season preparation work. I hope to look
at this in more detail tomorrow.

Lukas Bulwahn

12 May 2020, 2:43:59 pm
to Jouni Högander, syzk...@googlegroups.com, Dmitry Vyukov, Jukka Kaartinen
On Fri, May 8, 2020 at 12:52 PM Jouni Högander
<jouni.h...@unikie.com> wrote:
>
Jouni, I agree that the evaluation did not meet our initial expectations.

However, we have so far evaluated the config bisection/minimization on
only 10 reproducers, and without starting from the best minimal
configuration for bisection. Further, on the syzkaller mailing list
and in its issue tracker, we have all seen that crash identification
and git bisection have a number of difficulties (and some randomness is
always part of the reproducer runs). Conceptually, we still believe
that config minimization (by config bisection) before running the git
bisection should reduce the interference of unrelated issues with the
git bisection. Our evaluation on 10 reproducers unfortunately could not
yet show this conceptual benefit in numbers, because many random factors
skewed the results.

That is why it is important to:
a. continue to improve the crash identification, which improves git
bisection in general, and
b. extend our evaluation to a larger set of reproducers.
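
To make the idea of config minimization by bisection concrete, here is a
minimal sketch in Python. It is not the code from the pull request:
`minimize_config`, the `crashes` predicate, and the option names are all
hypothetical. It assumes a baseline config that does not reproduce the crash,
a list of extra options that does, and a deterministic test, which real
reproducer runs do not guarantee.

```python
from typing import Callable, FrozenSet, List, Set


def minimize_config(baseline: Set[str], extra: List[str],
                    crashes: Callable[[FrozenSet[str]], bool]) -> List[str]:
    """Drop chunks of `extra` options for as long as the crash still reproduces."""
    needed = list(extra)
    chunk = max(len(needed) // 2, 1)
    while True:
        i = 0
        while i < len(needed):
            # Try the config without the chunk needed[i:i + chunk].
            trial = needed[:i] + needed[i + chunk:]
            if crashes(frozenset(baseline) | frozenset(trial)):
                needed = trial   # chunk was not required; keep it removed
            else:
                i += chunk       # chunk is required; move past it
        if chunk == 1:
            return needed
        chunk = max(chunk // 2, 1)


# Hypothetical example: the crash needs both CONFIG_C and CONFIG_F.
def crash_needs_c_and_f(cfg: FrozenSet[str]) -> bool:
    return {"CONFIG_C", "CONFIG_F"} <= cfg


result = minimize_config(
    {"CONFIG_BASE"},
    ["CONFIG_B", "CONFIG_C", "CONFIG_D", "CONFIG_E", "CONFIG_F"],
    crash_needs_c_and_f,
)
print(result)  # ['CONFIG_C', 'CONFIG_F']
```

Each call to `crashes` stands for a full kernel build plus reproducer run, so
the number of calls is what dominates the cost in practice.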

Lukas

Jouni Högander

25 May 2020, 7:36:34 am
to Dmitry Vyukov, syzkaller, Lukas Bulwahn, Jukka Kaartinen
Hello Dmitry,

We are now considering how to continue with this config
minimization. Should we continue improving the config bisection part and
then run more evaluation before integration? Another option would be to
put our effort into getting the code into a shape where it can be merged
into syzkaller master, and then run and follow the evaluation on syzbot.

What is your opinion on this?

BR,

Jouni Högander

Dmitry Vyukov

26 May 2020, 7:59:08 am
to Jouni Högander, syzkaller, Lukas Bulwahn, Jukka Kaartinen
Hi,

I am failing badly at keeping my promises of prompt replies. Sorry.

Overall I agree that config minimization should improve quality (maybe
with some additional tweaks).

I looked at the table at:
https://github.com/hogander-unikie/syzkaller/blob/config_bisect/docs/config_bisect_evaluation.md
What is the meaning of SUCCESS/FAILURE? Is it that SUCCESS is
"bisection produces correct result", and FAILURE is "bisection
produces wrong result"?
If so, there is not a single case where config minimization helped
(caused a switch from FAILURE to SUCCESS), while there are 2 where we
got a SUCCESS->FAILURE change. Is that correct? Or am I misinterpreting
it?

Yes, it would be good to get more data to prove that (1) there are
cases where it improves things, (2) there are not many cases where it
makes things worse.

For (1) maybe you can take some bugs from this table:
https://docs.google.com/spreadsheets/d/1WdBAN54-csaZpD3LgmTcIMR7NDFuQoOZZqPZ-CUqQgA/edit#gid=0
where Correct=- Racy=- Unrelated=Y (these should be the best case for
config minimization to help).

I would suggest capturing a full bisection log in any future
experiments, it's very helpful in answering any concrete questions
about a particular bisection later.

I guess to improve quality we need to understand root causes for wrong
results in each case and then try to address common root causes.

One thing that comes to mind: we may get a different bug as we change
configuration. Since we are bisecting configuration on the same kernel
commit, it may be possible to make some guesses as to whether we see the
same bug or a different one (it's much harder to do for different
kernel commits). But I don't know if it's a common reason for
bisection failures or not.

Regarding improving logic further vs getting code into shape and
integrating now.
Overall integrating earlier and in smaller steps is much better. If I
get a huge PR a year later, it will be much harder to review,
productionize and merge it.
We may use syzbot as an automatic, scalable evaluation platform, but we
need to at least be sure that it does not make things significantly
worse (syzbot mails results to real people). We may also merge, but
disable it initially under some flag and enable later. Moving in
incremental steps will be easier.




Lukas Bulwahn

26 May 2020, 11:44:18 am
to Dmitry Vyukov, Jouni Högander, syzkaller, Jukka Kaartinen
On Tue, May 26, 2020 at 1:59 PM Dmitry Vyukov <dvy...@google.com> wrote:
> [...]
>
> Regarding improving logic further vs getting code into shape and
> integrating now.
> Overall integrating earlier and in smaller steps is much better. If I
> get a huge PR a year later, it will be much harder to review,
> productionize and merge it.
> We may use syzbot as an automatic, scalable evaluation platform, but we
> need to at least be sure that it does not make things significantly
> worse (syzbot mails results to real people). We may also merge, but
> disable it initially under some flag and enable later. Moving in
> incremental steps will be easier.
>

So the planned steps would be:

1. Get the current implementation with the feature set as-is in shape
for integrating it now, but keep it disabled by default. Get the
current pull request ready and merged.
2. Extend evaluation, especially building on top of the referred
syzbot bisection evaluation results. Share evaluation result with the
syzkaller community, i.e., on this mailing list.
3. Add further refinements to improve bisection results until we can
confidently show we are better off with kernel config minimization.
Open new pull requests with refinements, get that ready and merged.
4. Make kernel config minimization default.
5. Roll out into public syzbot instance.
6. Take care of responses from kernel developers when this
functionality is enabled.

Right?

Lukas

Dmitry Vyukov

27 May 2020, 10:27:35 am
to Lukas Bulwahn, Jouni Högander, syzkaller, Jukka Kaartinen
The plan sounds good to me.

Jukka Kaartinen

30 June 2020, 11:41:56 am
to syzkaller
Hi all,

We evaluated whether config bisection helps to find the correct commit in commit bisection.

https://docs.google.com/spreadsheets/d/1KgRs2zISoyXa4Cz66dFnCN2BEjkNgDDemejrEVMg1Wc/edit?usp=sharing with filter Correct=- Racy=- Unrelated=Y

In the 19 cases that we examined, we found that using a minimal configuration does help a bit. It will not solve the problem, but it reduces the amount of noise seen in the commit bisection process, and it speeds up commit bisection because there is less to compile.

While doing the evaluation we found other problems that would be good to fix.

When the goal is to find the commit that introduces a bug, we should search only for that specific bug. A good example is here: https://syzkaller.appspot.com/text?tag=Log&x=106c36d7200000

Bisecting should look for "crashed: general protection fault in qca_setup", but once we start bisecting, that bug is no longer seen; instead, the bisection treats "WARNING in __might_sleep" as the crash.
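
One way to express "search only for that specific bug" is to classify each test run by crash title instead of treating any crash as bad. The sketch below is illustrative only: `classify_run` and the verdict names are hypothetical and do not describe how syzkaller's bisection is implemented. Mapping a different crash to "skip" borrows the idea of `git bisect skip` and is an assumption, since a different crash may hide the original bug rather than prove its absence.

```python
from typing import List


def classify_run(original_title: str, seen_titles: List[str]) -> str:
    """Map the crash titles seen while testing one commit to a bisection verdict."""
    if original_title in seen_titles:
        return "bad"    # the bug we are bisecting reproduced
    if seen_titles:
        return "skip"   # only other crashes were seen: no reliable verdict
    return "good"       # no crash at all


verdict = classify_run(
    "general protection fault in qca_setup",
    ["WARNING in __might_sleep"],
)
print(verdict)  # skip
```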

Here is another good example: https://syzkaller.appspot.com/text?tag=Log&x=160eabcf200000. Only the first run reproduces the original crash; it is not seen after that.

Another issue is that some crashes are very hard to reproduce. Here the original "WARNING in rcu_check_gp_start_stall" is not seen at all in the bisection log: https://syzkaller.appspot.com/text?tag=Log&x=111856cf200000 from bug: https://syzkaller.appspot.com/bug?id=0c963236471bc9561fd3b38da03cd09482e90c72


Commit bisect logs here:

https://drive.google.com/drive/folders/18k91d0uAI0f5lySYMd7Qw0O56vIeC4pS


[Attachment: config_bisect_evaluation.png]



Dmitry Vyukov

1 July 2020, 4:10:14 am
to Jukka Kaartinen, syzkaller
Interesting.
Do I understand correctly that these 2 cases of "config bisection and commit bisect succeeded" are ones where the original bisection went wrong because the config was too big, and where bisection with the minimized config produced the correct answer (let's call the number of such cases X)?
I think the most interesting number would be that X divided by number of cases where we have reproducer, we were able to reproduce the crash on the original commit but bisection went wrong. Taking into account cases where we don't have a reproducer, or where we were not able to reproduce on the original commit is not too interesting in this context, because this is not something config bisection is supposed to help with.

Re different manifestations, this has come up multiple times, but so far I don't see practical options to do this:
Do you have any suggestions on this front?



Antti Stenhäll

1 July 2020, 9:30:53 am
to syzkaller
Hi, 

I made minor updates to the material (the last execution round has finished). There is one more successful commit bisection round; details are available in the sheet that Jukka linked, and the bisect log has been added to the shared drive.

BR,
Antti

Lukas Bulwahn

2 July 2020, 1:33:47 am
to Antti Stenhäll, syzkaller
Thanks, Jouni, Jukka and Antti.

Dmitry, if I understand the summary from Jukka correctly, the "number of cases where we have reproducer, we were able to reproduce the crash on the original commit but bisection went wrong." (let's call it Y) is 7. So, X / Y = 2 / 7 ≈ 0.29.

Why 7?
The original 19 cases we looked at all are cases where the original bisection went wrong (according to your evaluation; we filtered on Unrelated=Y).
Of those, 12 cases dropped out: for 3 there was no reproducer to start with (I guess that was not available anymore from your original evaluation), and for 9 we could not reproduce the original bug, so the evaluation "stopped" before we ran the config minimization. So we were left with 7 cases where we started minimizing the config.
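
The arithmetic behind Y and X / Y can be checked with a few lines of Python. The variable names are mine; the numbers are the ones given in this thread.

```python
from fractions import Fraction

examined = 19         # cases filtered on Unrelated=Y
no_reproducer = 3     # no reproducer to start with
not_reproduced = 9    # original bug could not be reproduced
x = 2                 # X: bisection became correct with the minimized config

y = examined - no_reproducer - not_reproduced   # Y: cases where minimization ran
ratio = Fraction(x, y)
print(y, ratio, round(float(ratio), 2))  # 7 2/7 0.29
```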

The data set is still quite small, but running the whole config minimization & commit bisection simply takes time, and we do not have a larger-scale server available here. I would love to see this minimization rolled out on the syzbot infrastructure. Maybe we can run a few bisections in parallel, with and without config minimization, on some recently reported bugs and, if the results differ, get a report, e.g., to the syzkaller-bugs mailing list. Then we can determine which bisection is correct and which is wrong, and report back. We will continue to work on the implementation and get the pull request ready. Dmitry, let us know if you would need further changes to the code to make that possible.


Let us see if Jukka can describe the ideas on distinguishing bugs by their warning messages. Maybe a very simple heuristic, splitting the warning into the function name and the message string and checking whether the message string is "similar" to the original one, is sufficient to do better there.
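
A minimal sketch of such a heuristic, to make the idea concrete. The helper names, the " in " split rule, and the 0.8 threshold are assumptions for illustration, not an existing syzkaller API; `difflib.SequenceMatcher` is from the Python standard library.

```python
from difflib import SequenceMatcher
from typing import Tuple


def split_title(title: str) -> Tuple[str, str]:
    """Split 'MESSAGE in FUNCTION' into (message, function); function may be empty."""
    message, sep, function = title.rpartition(" in ")
    if not sep:
        return title, ""
    return message, function


def similar_titles(a: str, b: str, threshold: float = 0.8) -> bool:
    """Heuristic: same function name and similar message text."""
    msg_a, fn_a = split_title(a)
    msg_b, fn_b = split_title(b)
    if fn_a != fn_b:
        return False
    return SequenceMatcher(None, msg_a, msg_b).ratio() >= threshold


original = "general protection fault in qca_setup"
print(similar_titles(original, "WARNING in __might_sleep"))  # False
print(similar_titles(original, original))                    # True
```

Requiring an identical function name keeps the sketch simple, but it would miss the same bug after a function rename.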


Lukas

Dmitry Vyukov

2 July 2020, 3:31:49 am
to Lukas Bulwahn, Antti Stenhäll, syzkaller
On Thu, Jul 2, 2020 at 7:33 AM Lukas Bulwahn <lukas....@gmail.com> wrote:
>
> Thanks, Jouni, Jukka and Antti.
>
> Dmitry, if I understand the summary from Jukka correctly, the "number of cases where we have reproducer, we were able to reproduce the crash on the original commit but bisection went wrong." (let's call it Y) is 7. So, X / Y = 2 / 7 ≈ 0.29.

2/7 looks pretty good. This means 2/7 emails will be correct and add
right people to CC rather than point to a random commit and CC random
people.
For cases where we were not able to reproduce, etc, syzbot will also
not send an email, so that's fine (for some definition of fine).
Maybe we will also be able to improve config bisection in future as we
gather more data on cases where it's not working as expected.


> Why 7?
> The original 19 cases we looked at all are cases where the original bisection went wrong (according to your evaluation; we filtered on Unrelated=Y).
> Of those, 12 cases dropped out: for 3 there was no reproducer to start with (I guess that was not available anymore from your original evaluation), and for 9 we could not reproduce the original bug, so the evaluation "stopped" before we ran the config minimization. So we were left with 7 cases where we started minimizing the config.
>
> The data set is still quite small, but running the whole config minimization & commit bisection simply takes time, and we do not have a larger-scale server available here. I would love to see this minimization rolled out on the syzbot infrastructure. Maybe we can run a few bisections in parallel, with and without config minimization, on some recently reported bugs and, if the results differ, get a report, e.g., to the syzkaller-bugs mailing list. Then we can determine which bisection is correct and which is wrong, and report back. We will continue to work on the implementation and get the pull request ready. Dmitry, let us know if you would need further changes to the code to make that possible.

I fixed the noop change detection:
https://github.com/google/syzkaller/pull/1889/commits/27ed94e9183814d80d05722bea394a256d83ae8f
And now looking into merging it carefully.

Btw, "noop change" detection (when we attribute bug to a commit in a
different arch or docs change that does not even affect vmlinux
binary) is not working completely for other reasons:
https://github.com/google/syzkaller/issues/1271#issuecomment-559093018
It seems that it could help to filter out 1/3 of incorrect emails
(don't fix bisections, but at least prevent nonsense emails).
Maybe you are interested in looking into it? ;)
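
The idea behind "noop change" detection can be sketched as comparing the
build output before and after the suspected commit. This is only an
illustration under the assumption that builds are reproducible byte for
byte; `is_noop_change` is a hypothetical helper, not syzkaller's
implementation, and it takes the binaries as bytes instead of building a
kernel.

```python
import hashlib


def is_noop_change(vmlinux_before: bytes, vmlinux_after: bytes) -> bool:
    """Return True if the commit left the kernel binary byte-for-byte unchanged."""
    digest_before = hashlib.sha256(vmlinux_before).hexdigest()
    digest_after = hashlib.sha256(vmlinux_after).hexdigest()
    return digest_before == digest_after


# Stand-ins for real build artifacts.
print(is_noop_change(b"kernel-image", b"kernel-image"))    # True
print(is_noop_change(b"kernel-image", b"kernel-image-v2"))  # False
```

A bisection that lands on a commit for which this check returns True is
pointing at a change that cannot have introduced the crash.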


> Let us see if Jukka can describe the ideas on distinguishing bugs by their warning messages. Maybe a very simple heuristic, splitting the warning into the function name and the message string and checking whether the message string is "similar" to the original one, is sufficient to do better there.

I can tell you ahead of time what I will ask for if you are doing
this: a base of samples for testing where a single reproducer leads to
different crash titles. Both for racy bugs that lead to different
crashes on the same kernel and especially different crashes on
different kernel versions when a function was renamed or bug detection
tool changed to produce a different message.

Also you seem to assume that if we see a different bug, it means that
the original bug is not present in this kernel. This is not
necessarily true. As a result, an attempt to detect the "same crash" may
result in an incorrect bisection, whereas the current blind logic would
result in a correct bisection. Overall, will it result in more correct
bisections, or more broken bisections?

Lukas Bulwahn

2 July 2020, 5:29:17 am
to Dmitry Vyukov, Antti Stenhäll, syzkaller
Agreed. That might be much simpler, and if your statistics are right, it
is quite effective, especially when combined with the minimized kernel
config, because many commits probably have no impact on the actual
kernel build. That is certainly going to help to detect the false
bisections and reduce the false reporting.

>
> > Let us see if Jukka can describe the ideas on distinguishing bugs by their warning messages. Maybe a very simple heuristic, splitting the warning into the function name and the message string and checking whether the message string is "similar" to the original one, is sufficient to do better there.
>
> I can tell you ahead of time what I will ask for if you are doing
> this: a base of samples for testing where a single reproducer leads to
> different crash titles. Both for racy bugs that lead to different
> crashes on the same kernel and especially different crashes on
> different kernel versions when a function was renamed or bug detection
> tool changed to produce a different message.
>
> Also you seem to assume that if we see a different bug, it means that
> the original bug is not present in this kernel. This is not
> necessarily true. As a result, an attempt to detect the "same crash" may
> result in an incorrect bisection, whereas the current blind logic would
> result in a correct bisection. Overall, will it result in more correct
> bisections, or more broken bisections?
>

Agree. Those are the challenges here.

We will discuss what we think is most promising and give it a shot to
do something about it.

Lukas

Dmitry Vyukov

2 July 2020, 11:02:43 am
to Lukas Bulwahn, Antti Stenhäll, syzkaller
Right. It will be even more efficient with config minimization. I did
not consider this positive side-effect of config minimization.