to syzk...@googlegroups.com
FYI, of the 97 open syzbot reports against upstream Linux that were bisected to
a single commit, I think 52 of the bisection results are probably correct, and
45 are probably incorrect. I.e., about 53% accuracy.
It would be really helpful to improve this -- ideally by producing more good
results, but even just not sending as many bad results would be an improvement.
It seems the biggest issues that cause bad results are (1) not running the
reproducer enough times for hard-to-reproduce bugs, and (2) treating every
single crash as the desired one even when there are strong indicators it's not.
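
As a rough illustration of point (1), here is a minimal sketch of how many reproducer runs a flaky bug might need. It assumes each run triggers the bug independently with a fixed probability `p`, which real reproducers may not satisfy. The function `runs_needed` and the parameter `alpha` (the acceptable chance of wrongly calling a bad kernel "good") are hypothetical names for illustration, not part of syzbot.

```python
def runs_needed(p: float, alpha: float) -> int:
    """Smallest n such that (1 - p)**n <= alpha.

    p     -- assumed per-run probability that the reproducer triggers the bug
    alpha -- acceptable probability of seeing no crash on a kernel that has the bug
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")

    n = 1
    miss = 1.0 - p  # probability that a single run does not crash
    while miss > alpha:
        miss *= 1.0 - p  # one more independent run
        n += 1
    return n


if __name__ == "__main__":
    # A reproducer that fires on 10% of runs needs many runs per bisection step.
    print(runs_needed(0.1, 0.05))
```

Under these assumptions, a bug that reproduces on one run in ten needs on the order of a few dozen runs per tested commit, and the risk of a wrong answer compounds across bisection steps.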
These are the bugs I've marked as bisected incorrectly
(it's usually fairly obvious):
to Eric Biggers, syzkaller
On Tue, Jun 25, 2019 at 8:59 AM Eric Biggers <ebig...@kernel.org> wrote:
>
> FYI, of the 97 open syzbot reports against upstream Linux that were bisected to
> a single commit, I think 52 of the bisection results are probably correct, and
> 45 are probably incorrect. I.e., about 53% accuracy.
>
> It would be really helpful to improve this -- ideally by producing more good
> results, but even just not sending as many bad results would be an improvement.
> It seems the biggest issues that cause bad results are (1) not running the
> reproducer enough times for hard-to-reproduce bugs, and (2) treating every
> single crash as the desired one even when there are strong indicators it's not.
>
> These are the bugs I've marked as bisected incorrectly
> (it's usually fairly obvious):
Hi Eric,
I agree the results are not good and improving them would be useful.
There needs to be some tuning around the number of runs. Though any idea
has some immediate counter-examples. E.g. doing more runs will
trigger more unrelated bugs, and unrelated bugs are the major source of
incorrect results. I also wasn't able to figure out what exactly can
be done with crash identification. I see that more than half of
crashes have different manifestations, either in time or in space.
I've seen counter-examples to just about any idea I could come up with. At
this point we need some concrete algorithmic ideas backed by some data.
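
The tension between more runs and more unrelated crashes can be sketched numerically. This is a toy model, not how syzbot works: it assumes each run independently hits the target bug with probability `p` (only on kernels that contain it) and an unrelated crash with probability `q` (on any kernel), and that every crash is treated as the target one. The name `step_error_rates` and both parameters are hypothetical.

```python
def step_error_rates(p: float, q: float, n: int) -> tuple[float, float]:
    """Error probabilities for one bisection step with n runs.

    Returns (miss, false_bad):
      miss      -- chance a kernel with the bug shows no target crash in n runs
      false_bad -- chance a kernel without the bug shows at least one
                   unrelated crash in n runs (misread as "bad")
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    miss = (1.0 - p) ** n
    false_bad = 1.0 - (1.0 - q) ** n
    return miss, false_bad


if __name__ == "__main__":
    # Assumed rates: target bug fires on 10% of runs, unrelated crashes on 2%.
    for n in (1, 10, 30, 100):
        miss, false_bad = step_error_rates(0.1, 0.02, n)
        print(f"n={n:3d}  miss={miss:.3f}  false_bad={false_bad:.3f}")
```

In this model `miss` falls as `n` grows while `false_bad` rises, so increasing the run count alone trades one kind of wrong bisection for another unless unrelated crashes can be told apart from the target one.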