On Thu, Jan 18, 2018 at 2:09 AM, Theodore Ts'o <ty...@mit.edu> wrote:
> On Wed, Jan 17, 2018 at 04:21:13PM -0800, Alexei Starovoitov wrote:
>>
>> If syzkaller can only test one tree than linux-next should be the one.
>
> Well, there's been some controversy about that.  The problem is that
> it's often not clear if this is long-standing bug, or a bug which is
> in a particular subsystem tree --- and if so, *which* subsystem tree,
> etc.  So it gets blasted to linux-kernel, and to get_maintainer.pl,
> which is often not accurate --- since the location of the crash
> doesn't necessarily point out where the problem originated, and hence
> who should look at the syzbot report.  And so this has caused
> some.... irritation.
Re: the set of tested trees.
We now have an interesting spectrum of opinions.
Some assorted thoughts on this:
1. First, "upstream is clean" won't happen any time soon. There are
several reasons for this:
 - Currently syzkaller only tests the subset of subsystems that it knows
how to test, and even those it tests poorly. Over time it will be
improved to cover more subsystems and to test existing ones better.
Just a few weeks ago I added some descriptions for the crypto subsystem
and that uncovered 20+ old bugs.
 - syzkaller is a guided, genetic fuzzer; over time it learns how to do
more complex things in small steps. It takes time.
 - We have more bug detection tools coming: LEAKCHECK, KMSAN (uninit
memory), KTSAN (data races).
 - Generic syzkaller smartness will be improved over time.
 - It will get more CPU resources.
The effect of all of these things is multiplicative: we test more code,
smarter, with more bug-detection tools, with more resources. So I
think we need to plan for a mix of old and new bugs for the foreseeable
future.
2. get_maintainer.pl and the mix of old and new bugs were mentioned as
harming attribution. I don't see what will change when/if we test only
upstream. The same mix of old/new bugs will then be detected on
upstream, with all of the same problems for old/new, maintainers,
which subsystem, etc. I think the number of bugs in the kernel is a
significant part of the problem, but the exact boundary where we
decide to start killing them won't affect the number of bugs.
3. If we test only upstream, we increase the chances of new security
bugs sinking into releases. We sure could raise the perceived security
value of the bugs by keeping them private, letting them sink into a
release, letting them sink into distros, and then reporting a
high-profile vulnerability. I think that's wrong. There is something
broken with how value is measured in the security community. A bug that
is killed before sinking into any release is the highest-impact thing.
As Alexei noted, fixing bugs as early as possible also reduces fix
costs, backporting burden, etc. It can also eliminate the need for
bisection in some cases: say, if you accepted a large change to some
files and a bunch of crashes appears in these files on your tree soon
after, it's obvious what happened.
4. It was mentioned that linux-next can have a broken slab allocator
and that this will manifest as multiple random crashes. FWIW I don't
remember ever seeing this. Yes, sometimes it does not build/boot,
but such builds are just rejected for testing.
I don't mind dropping linux-next specifically if that's the common
decision. However, (1) Alexei and Gruenter expressed the opposite
opinion, (2) I don't see what it will change dramatically, (3) as far
as I understand Linus actually relies on linux-next giving some
concrete testing to the code there.
But I think that testing bpf-next is a positive thing provided that
there is explicit interest from maintainers. And note that this will
be testing targeted specifically at the bpf subsystem, so that instance
will not generate bugs in SCSI, USB, etc (though it will cover a part
of net). Also note that the latest email format includes the set of
trees where the crash happened, so if you see "upstream" or "upstream
and bpf-next", nothing really changes, you still know that it happens
upstream. Or if you see only "bpf-next", then you know that it's only
that tree.
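
To make the last point concrete, here is a rough sketch of how a
recipient could read the set of trees listed in a report. It is purely
illustrative: the function name, its input (a plain set of tree names)
and its return values are made up for this example and are not part of
syzbot or of the actual email format.

```python
def where_to_look(trees):
    """Turn the set of trees a crash was seen on into a triage hint.

    `trees` is a collection of tree names as listed in a report, e.g.
    {"upstream", "bpf-next"}. Hypothetical helper, not syzbot code.
    """
    trees = set(trees)
    if not trees:
        raise ValueError("a report must name at least one tree")
    if "upstream" in trees:
        # Seen upstream (possibly on other trees too): nothing changes,
        # the bug is known to happen upstream.
        return "upstream"
    if len(trees) == 1:
        # Seen on exactly one non-upstream tree: only that tree.
        (only,) = trees
        return only
    # Seen on several non-upstream trees: list them all.
    return ", ".join(sorted(trees))
```

So {"upstream"} and {"upstream", "bpf-next"} both point at upstream,
while {"bpf-next"} alone points only at bpf-next.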