WARNING in up_write

syzbot

Apr 2, 2018, 10:01:02 PM
to linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
Hello,

syzbot hit the following crash on upstream commit
86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
Merge branch 'ras-core-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
syzbot dashboard link:
https://syzkaller.appspot.com/bug?extid=dc5ab2babdf22ca091af

So far this crash happened 8 times on upstream.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5688491102961664
syzkaller reproducer:
https://syzkaller.appspot.com/x/repro.syz?id=5709211904245760
Raw console output:
https://syzkaller.appspot.com/x/log.txt?id=5720789257027584
Kernel config:
https://syzkaller.appspot.com/x/.config?id=6801295859785128502
compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+dc5ab2...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for
details.
If you forward the report, please keep this part and the footer.

EXT4-fs (sda1): shut down requested (0)
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(sem->owner != get_current())
WARNING: CPU: 1 PID: 4441 at kernel/locking/rwsem.c:133
up_write+0x1cc/0x210 kernel/locking/rwsem.c:133
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 4441 Comm: syzkaller594909 Not tainted 4.16.0+ #11
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x1a7/0x27d lib/dump_stack.c:53
panic+0x1f8/0x42c kernel/panic.c:183
__warn+0x1dc/0x200 kernel/panic.c:547
report_bug+0x1f4/0x2b0 lib/bug.c:186
fixup_bug.part.10+0x37/0x80 arch/x86/kernel/traps.c:178
fixup_bug arch/x86/kernel/traps.c:247 [inline]
do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
RIP: 0010:up_write+0x1cc/0x210 kernel/locking/rwsem.c:133
RSP: 0018:ffff8801b349f710 EFLAGS: 00010286
RAX: dffffc0000000008 RBX: ffff8801ccc0ce40 RCX: ffffffff815ae26e
RDX: 0000000000000000 RSI: 1ffff10036693e92 RDI: 1ffff10036693e67
RBP: ffff8801b349f798 R08: fffffbfff10b0659 R09: fffffbfff10b0659
R10: ffff8801b349f708 R11: fffffbfff10b0658 R12: 1ffff10036693ee2
R13: dffffc0000000000 R14: ffff8801b349f770 R15: ffff8801ccc0ce98
percpu_up_write+0xca/0x110 kernel/locking/percpu-rwsem.c:183
sb_freeze_unlock fs/super.c:1390 [inline]
thaw_super+0x1ca/0x260 fs/super.c:1524
thaw_bdev+0x151/0x180 fs/block_dev.c:555
ext4_shutdown fs/ext4/ioctl.c:489 [inline]
ext4_ioctl+0x1f85/0x3e60 fs/ext4/ioctl.c:1048
vfs_ioctl fs/ioctl.c:46 [inline]
do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
SYSC_ioctl fs/ioctl.c:701 [inline]
SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x440109
RSP: 002b:00007fffce185d28 EFLAGS: 00000213 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440109
RDX: 0000000020000100 RSI: 000000008004587d RDI: 0000000000000003
RBP: 00000000006ca018 R08: 000000000000000f R09: 65732f636f72702f
R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000401990
R13: 0000000000401a20 R14: 0000000000000000 R15: 0000000000000000
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzk...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is
merged into any tree, please reply to this email with:
#syz fix: exact-commit-title
If you want to test a patch for this bug, please reply with:
#syz test: git://repo/address.git branch
and provide the patch inline or as an attachment.
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug
report.
Note: all commands must start from beginning of the line in the email body.
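
The reply commands listed in this footer could be picked out of an email body along the following lines. This is only a minimal sketch, not syzbot's actual parser: the name `parse_syz_commands` and the regular expression are assumptions made for illustration. Only the four command names and the start-of-line rule come from the footer itself.

```python
import re

# Command names are taken from the footer; the pattern itself is a
# hypothetical illustration. "^" with re.MULTILINE enforces the rule that
# a command must start at the beginning of a line.
SYZ_COMMAND = re.compile(
    r"^#syz (fix|test|dup|invalid)\b:?[ \t]*(.*)$", re.MULTILINE
)


def parse_syz_commands(body):
    """Return (command, argument) pairs for lines that begin with '#syz'."""
    return [(m.group(1), m.group(2).strip()) for m in SYZ_COMMAND.finditer(body)]
```

With this sketch, a quoted line such as `> #syz invalid` or an indented command is skipped, because neither starts at the beginning of the line.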

Dmitry Vyukov

Apr 4, 2018, 3:24:27 PM
to syzbot, Theodore Ts'o, linux-fsdevel, LKML, syzkall...@googlegroups.com, Al Viro
On Tue, Apr 3, 2018 at 4:01 AM, syzbot
<syzbot+dc5ab2...@syzkaller.appspotmail.com> wrote:
> Hello,
>
> syzbot hit the following crash on upstream commit
> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
> Merge branch 'ras-core-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=dc5ab2babdf22ca091af
>
> So far this crash happened 8 times on upstream.
> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5688491102961664
> syzkaller reproducer:
> https://syzkaller.appspot.com/x/repro.syz?id=5709211904245760
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=5720789257027584
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=6801295859785128502
> compiler: gcc (GCC) 7.1.1 20170620
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+dc5ab2...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.

+Ted for ext4 frames

Matthew Wilcox

Apr 4, 2018, 3:35:08 PM
to Dmitry Vyukov, syzbot, Theodore Ts'o, linux-fsdevel, LKML, syzkall...@googlegroups.com, Al Viro
On Wed, Apr 04, 2018 at 09:24:05PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 3, 2018 at 4:01 AM, syzbot
> <syzbot+dc5ab2...@syzkaller.appspotmail.com> wrote:
> > DEBUG_LOCKS_WARN_ON(sem->owner != get_current())
> > WARNING: CPU: 1 PID: 4441 at kernel/locking/rwsem.c:133 up_write+0x1cc/0x210
> > kernel/locking/rwsem.c:133
> > Kernel panic - not syncing: panic_on_warn set ...

Message-Id: <1522852646-2196-1-gi...@redhat.com>

Theodore Y. Ts'o

Apr 4, 2018, 11:22:02 PM
to Matthew Wilcox, Dmitry Vyukov, syzbot, linux-fsdevel, LKML, syzkall...@googlegroups.com, Al Viro
We were way ahead of syzbot in this case. :-)

I reported the problem Tuesday morning:

https://lkml.org/lkml/2018/4/4/814

And within a few hours Waiman had proposed a fix:

https://patchwork.kernel.org/patch/10322639/

Note also that it's not ext4 specific. It can be trivially reproduced using any one of:

kvm-xfstests -c ext4 generic/068
kvm-xfstests -c btrfs generic/068
kvm-xfstests -c xfs generic/068

(Basically, any file system that supports freeze/thaw.)

Cheers,

- Ted

Matthew Wilcox

Apr 4, 2018, 11:24:57 PM
to Theodore Y. Ts'o, Dmitry Vyukov, syzbot, linux-fsdevel, LKML, syzkall...@googlegroups.com, Al Viro
On Wed, Apr 04, 2018 at 11:22:00PM -0400, Theodore Y. Ts'o wrote:
> On Wed, Apr 04, 2018 at 12:35:04PM -0700, Matthew Wilcox wrote:
> > On Wed, Apr 04, 2018 at 09:24:05PM +0200, Dmitry Vyukov wrote:
> > > On Tue, Apr 3, 2018 at 4:01 AM, syzbot
> > > <syzbot+dc5ab2...@syzkaller.appspotmail.com> wrote:
> > > > DEBUG_LOCKS_WARN_ON(sem->owner != get_current())
> > > > WARNING: CPU: 1 PID: 4441 at kernel/locking/rwsem.c:133 up_write+0x1cc/0x210
> > > > kernel/locking/rwsem.c:133
> > > > Kernel panic - not syncing: panic_on_warn set ...
> >
> > Message-Id: <1522852646-2196-1-gi...@redhat.com>
> >
>
> We were way ahead of syzbot in this case. :-)

Not really ... syzbot caught it Monday evening ;-)

Date: Mon, 02 Apr 2018 19:01:01 -0700
From: syzbot <syzbot+dc5ab2...@syzkaller.appspotmail.com>
To: linux-...@vger.kernel.org, linux-...@vger.kernel.org,
syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
Subject: WARNING in up_write

Dmitry Vyukov

Apr 5, 2018, 4:22:29 AM
to Matthew Wilcox, Theodore Y. Ts'o, syzbot, linux-fsdevel, LKML, syzkall...@googlegroups.com, Al Viro
:)

#syz fix: locking/rwsem: Add up_write_non_owner() for percpu_up_write()
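
The assertion in the report, `DEBUG_LOCKS_WARN_ON(sem->owner != get_current())`, together with the fix title above, suggests a write lock being released by a task other than the one that took it. The following is a userspace analogy in Python under that reading. The class `OwnerCheckedLock` and its methods are hypothetical and are not the kernel's rwsem code.

```python
import threading


class OwnerCheckedLock:
    """Userspace analogy: a lock that remembers which thread acquired it."""

    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None
        self.mismatches = 0  # how many times the owner check failed

    def acquire(self):
        self._lock.acquire()
        self.owner = threading.get_ident()

    def release(self):
        # Analogue of the owner check: count releases by a non-owner thread.
        if self.owner != threading.get_ident():
            self.mismatches += 1
        self.owner = None
        self._lock.release()

    def release_non_owner(self):
        # Hypothetical counterpart that deliberately skips the owner check.
        self.owner = None
        self._lock.release()
```

Acquiring in one thread and calling `release()` from another increments `mismatches`, while `release_non_owner()` does not. That is the general shape of a fix that adds a separate non-owner release path for callers that legitimately hand the lock to another task.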

Dave Chinner

Apr 5, 2018, 6:33:47 PM
to Matthew Wilcox, Theodore Y. Ts'o, Dmitry Vyukov, syzbot, linux-fsdevel, LKML, syzkall...@googlegroups.com, Al Viro
On Wed, Apr 04, 2018 at 08:24:54PM -0700, Matthew Wilcox wrote:
> On Wed, Apr 04, 2018 at 11:22:00PM -0400, Theodore Y. Ts'o wrote:
> > On Wed, Apr 04, 2018 at 12:35:04PM -0700, Matthew Wilcox wrote:
> > > On Wed, Apr 04, 2018 at 09:24:05PM +0200, Dmitry Vyukov wrote:
> > > > On Tue, Apr 3, 2018 at 4:01 AM, syzbot
> > > > <syzbot+dc5ab2...@syzkaller.appspotmail.com> wrote:
> > > > > DEBUG_LOCKS_WARN_ON(sem->owner != get_current())
> > > > > WARNING: CPU: 1 PID: 4441 at kernel/locking/rwsem.c:133 up_write+0x1cc/0x210
> > > > > kernel/locking/rwsem.c:133
> > > > > Kernel panic - not syncing: panic_on_warn set ...
> > >
> > > Message-Id: <1522852646-2196-1-gi...@redhat.com>
> > >
> >
> > We were way ahead of syzbot in this case. :-)
>
> Not really ... syzbot caught it Monday evening ;-)

Rather than arguing over who reported it first, I think that time
would be better spent reflecting on why the syzbot report was
completely ignored until *after* Ted diagnosed the issue
independently and Waiman had already fixed it....

Clearly there is scope for improvement here.

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com

Eric Biggers

Apr 5, 2018, 8:13:29 PM
to Dave Chinner, Matthew Wilcox, Theodore Y. Ts'o, Dmitry Vyukov, syzbot, linux-fsdevel, LKML, syzkall...@googlegroups.com, Al Viro
Well, ultimately a human needed to investigate the syzbot bug report to figure
out what was really going on. In my view, the largest problem is that there are
simply too many bugs, so many are getting ignored. If there were only a few
bugs, then Dmitry would investigate each one and send a "real" bug report of
better quality than the automated system can provide, or even send a fix
directly. But in reality, on the same day this bug was reported, syzbot also
found 10 other bugs, and in the previous 2 days it had found 38 more. No single
person can keep up with that. You can see the current bug list, which has 172
open bugs, on the dashboard at https://syzkaller.appspot.com/. Yes, the kernel
really is that broken. Though, of course most bugs are in specific modules, not
the core kernel.

And although quite a few of these bugs will end up to be duplicates or even
already fixed, a human still has to look at each one to figure that out.
(Though, I do think that syzbot should try to automatically detect when a
reproducible bug was already fixed, via bisection. It would cause a few bugs to
be incorrectly considered fixed, but it may be a worthwhile tradeoff.)

These bugs are all over the kernel as well, so most developers don't see the big
picture but rather just see a few bugs for "their" subsystem on "their"
subsystem's mailing list and sometimes demand special attention. Of course,
it's great when people suggest ways to improve the process. But it's not great
when people just don't feel responsible for fixing bugs and wait for
Someone Else to do it.

I'm hoping that in the future the syzbot "team", which seems to actually be just
Dmitry now, can get more resources towards helping fix the bugs. But either
way, in the end Linux is a community effort.

Note also that syzbot wasn't super useful in this particular case because people
running xfstests came across the same bug. But, this is actually a rare case.
Most syzbot bug reports have been for weird corner cases or races that no one
ever thought of before, so there are no existing tests that find them.

Thanks,

Eric

Theodore Y. Ts'o

Apr 5, 2018, 9:37:45 PM
to Eric Biggers, Dave Chinner, Matthew Wilcox, Dmitry Vyukov, syzbot, linux-fsdevel, LKML, syzkall...@googlegroups.com, Al Viro
On Thu, Apr 05, 2018 at 05:13:25PM -0700, Eric Biggers wrote:
> Well, ultimately a human needed to investigate the syzbot bug report to figure
> out what was really going on. In my view, the largest problem is that there are
> simply too many bugs, so many are getting ignored. If there were only a few
> bugs, then Dmitry would investigate each one and send a "real" bug report of
> better quality than the automated system can provide, or even send a fix
> directly. But in reality, on the same day this bug was reported, syzbot also
> found 10 other bugs, and in the previous 2 days it had found 38 more. No single
> person can keep up with that. You can see the current bug list, which has 172
> open bugs, on the dashboard at https://syzkaller.appspot.com/. Yes, the kernel
> really is that broken. Though, of course most bugs are in specific modules, not
> the core kernel.

There are a lot of bugs, so it needs to be easier for humans to figure
out which ones they should care about. And not all bugs are created
equal. Some are WARN_ON's that aren't all that important. Others
will hard crash the kernel, but are not likely to be something that
can be turned into a privilege escalation attack. Some bugs are
trivially reproducible, and some take a lot more effort. Making it
easier for humans to decide which ones should be looked at first would
certainly be helpful.

For me the prioritization goes as follows.

1) Is it a regression? If it's a regression, I want to fix it fast.

2) Is it something that can be easily escalated to a privilege escalation attack?
Again, if so, I want to fix it fast.

3) Is it going to get in the way of my development process? Things
that trigger new xfstests failures are important, because it's how I
detect (1).
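
As an illustration only, the three questions above can be read as a sort order over incoming reports. This is a minimal sketch that assumes the questions are applied strictly in the order given. The `Report` fields and the function names are hypothetical, and in practice the answers to (1) and (2) are exactly what is hard to obtain from a raw report.

```python
from dataclasses import dataclass


@dataclass
class Report:
    title: str
    is_regression: bool = False         # question 1
    privilege_escalation: bool = False  # question 2
    breaks_xfstests: bool = False       # question 3


def triage_key(report):
    # False sorts before True, so each flag is negated to put matches first.
    return (
        not report.is_regression,
        not report.privilege_escalation,
        not report.breaks_xfstests,
    )


def prioritize(reports):
    """Return reports ordered by the three questions, most urgent first."""
    return sorted(reports, key=triage_key)
```

Because `sorted` is stable, reports that answer all three questions the same way keep their arrival order.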

So I ignored the Syzkaller reports this week because it's hard to
differentiate important bugs from less important ones, and after the
merge window, I want to make sure that I have not introduced any
regressions, and I also want to make sure that commits getting merged
by others have not introduced any regressions in the testing suite
that I use, which is xfstests.

This is why I've been asking for the bisection feature --- not to find
out when a bug has been fixed, but to find out when a bug has been
*introduced*. If I know that this is a bug which has recently been
introduced, especially if it has been recently introduced by commits
in my tree, or which I have recently pushed to Linus, I'm going to
care a lot more. If I can't make that determination, I'm going to
deprioritize that bug in favor of those that definitely do meet these
criteria.
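
The idea behind such a bisection can be sketched as a binary search over the commit history. This is a simplified illustration rather than how syzbot does or would implement it. It assumes a linear list of commits ordered oldest to newest, a first commit known to be good, a last commit known to be bad, and a reliable reproducer. The names `first_bad_commit` and `is_bad` are hypothetical.

```python
def first_bad_commit(commits, is_bad):
    """Return the earliest commit for which is_bad() is true.

    Assumes commits[0] is good, commits[-1] is bad, and that every commit
    after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

Each step halves the remaining range, so the number of kernels to build and test grows with the logarithm of the number of commits. `git bisect` automates the same search on real, non-linear history.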

It's not a matter of waiting for someone else to fix it (although I
won't complain if someone does :-). It's that I'm overloaded, and I
have to prioritize the work that I do. If syzbot reports are hard to
parse or hard to prioritize, then I may end up prioritizing other work
as being more important. Sorry, but that's just the way that it is.

Note that I haven't just been complaining about it. I've been working
on ways so that the gce-xfstests and kvm-xfstests test appliances can
more easily be used to work on Syzbot reports. If I can make myself
more efficient, or help other people be more efficient, that's
arguably more important than trying to fix some of the 174 currently
open Syzbot issues --- unless you can tell me that certain ones are
super urgent because they (for example) result in CVSS score > 8.

Cheers,

- Ted

Dave Chinner

Apr 5, 2018, 10:02:10 PM
to Eric Biggers, Matthew Wilcox, Theodore Y. Ts'o, Dmitry Vyukov, syzbot, linux-fsdevel, LKML, syzkall...@googlegroups.com, Al Viro
On Thu, Apr 05, 2018 at 05:13:25PM -0700, Eric Biggers wrote:
> On Fri, Apr 06, 2018 at 08:32:26AM +1000, Dave Chinner wrote:
> > On Wed, Apr 04, 2018 at 08:24:54PM -0700, Matthew Wilcox wrote:
> > > On Wed, Apr 04, 2018 at 11:22:00PM -0400, Theodore Y. Ts'o wrote:
> > > > On Wed, Apr 04, 2018 at 12:35:04PM -0700, Matthew Wilcox wrote:
> > > > > On Wed, Apr 04, 2018 at 09:24:05PM +0200, Dmitry Vyukov wrote:
> > > > > > On Tue, Apr 3, 2018 at 4:01 AM, syzbot
> > > > > > <syzbot+dc5ab2...@syzkaller.appspotmail.com> wrote:
> > > > > > > DEBUG_LOCKS_WARN_ON(sem->owner != get_current())
> > > > > > > WARNING: CPU: 1 PID: 4441 at kernel/locking/rwsem.c:133 up_write+0x1cc/0x210
> > > > > > > kernel/locking/rwsem.c:133
> > > > > > > Kernel panic - not syncing: panic_on_warn set ...
> > > > >
> > > > > Message-Id: <1522852646-2196-1-gi...@redhat.com>
> > > > >
> > > >
> > > > We were way ahead of syzbot in this case. :-)
> > >
> > > Not really ... syzbot caught it Monday evening ;-)
> >
> > Rather than arguing over who reported it first, I think that time
> > would be better spent reflecting on why the syzbot report was
> > completely ignored until *after* Ted diagnosed the issue
> > independently and Waiman had already fixed it....
> >
> > Clearly there is scope for improvement here.
> >
> > Cheers,
> >
>
> Well, ultimately a human needed to investigate the syzbot bug report to figure
> out what was really going on. In my view, the largest problem is that there are
> simply too many bugs, so many are getting ignored.

Well, yeah. And when there are too many bugs, looking at the ones
people are actually hitting tends to take precedence over those
reported by a bot with an image problem...

> If there were only a few bugs, then Dmitry would investigate each
> one and send a "real" bug report of better quality than the
> automated system can provide, or even send a fix directly. But in
> reality, on the same day this bug was reported, syzbot also found
> 10 other bugs, and in the previous 2 days it had found 38 more.
> No single person can keep up with that.

And this is precisely why people turn around and ask the syzbot
developers to do things that make it easier for them to diagnose
the problems syzbot reports.

> You can see the current
> bug list, which has 172 open bugs, on the dashboard at
> https://syzkaller.appspot.com/.

Is that all? That's *nothing*.

> Yes, the kernel really is that
> broken.

Actually, that tells me the kernel is a hell of a lot better than my
experience leads me to believe it is. I'd have expected thousands of
bugs, even tens of thousands of bugs given how many issues we deal
with in individual subsystems on a day to day basis.

> And although quite a few of these bugs will end up to be
> duplicates or even already fixed, a human still has to look at
> each one to figure that out. (Though, I do think that syzbot
> should try to automatically detect when a reproducible bug was
> already fixed, via bisection. It would cause a few bugs to be
> incorrectly considered fixed, but it may be a worthwhile
> tradeoff.)
>
> These bugs are all over the kernel as well, so most developers
> don't see the big picture but rather just see a few bugs for
> "their" subsystem on "their" subsystem's mailing list and
> sometimes demand special attention. Of course, it's great when
> people suggest ways to improve the process.

That's not the response I got....

> But it's not great
> when people just don't feel responsible for fixing bugs and wait
> for Someone Else to do it.

The excessive cross posting of the reports is one of the reasons
people think someone else will take care of it. i.e. "Oh, that looks VFS,
that went to -fsdevel, I don't need to look at it"....

Put simply: if you're mounting an XFS filesystem image and something
goes bang, then it should be reported to the XFS list. It does not
need to be cross posted to LKML, -fsdevel, 10 individual developers,
etc. If it's not an XFS problem, then the XFS developers will CC the
relevant lists as needed.

> I'm hoping that in the future the syzbot "team", which seems to
> actually be just Dmitry now, can get more resources towards
> helping fix the bugs. But either way, in the end Linux is a
> community effort.

We don't really need help fixing the bugs - we need help making it
easier to *find the bug* the bot tripped over. That's what the
syzbot team needs to focus on, not tell people that what they got is
all they are going to get.

> Note also that syzbot wasn't super useful in this particular case
> because people running xfstests came across the same bug. But,
> this is actually a rare case. Most syzbot bug reports have been
> for weird corner cases or races that no one ever thought of
> before, so there are no existing tests that find them.

Which is exactly what these whacky "mount a filesystem fragment"
tests it is now doing are exercising. Finding the cause of
corruption related crashes is not easy and takes time. Having the
bot developers add something to the bot that will save the developer
looking at the problem 10 minutes of setup time makes a huge
difference to the effort required to find the problem.

The tool is useless if people find it too hard to make sense of the
bug reports (*cough* lockdep *cough*) or perform triage of the
report. If we want to get the bugs fixed faster, we have to make the
reports from automated tools contain the exact information the
developer needs to solve the problem.

Dmitry Vyukov

Apr 13, 2018, 2:25:50 PM
to Dave Chinner, Eric Biggers, Matthew Wilcox, Theodore Y. Ts'o, syzbot, linux-fsdevel, LKML, syzkaller-bugs, Al Viro, syzkaller
Hi,

Regarding feature requests.
We too have limited resources unfortunately and can't handle all
feature requests. Feature requests generally fall into the following
categories:

1. General features that are easy to do.
These are generally done right away (more or less).

2. General features that require significant time.
These are noted and are done as resources permit. For example:
- bisection (https://github.com/google/syzkaller/issues/501)
- kdump collection (https://github.com/google/syzkaller/issues/491)
Examples of what is done already:
- patch testing
- significantly restructured reports

3. Subsystem-specific features that are easy to do.
I don't remember that we got any. I guess they would compete with case 2.

4. Subsystem-specific features that require significant time.
For these we don't have resources at the moment. Our company has
dedicated people for some subsystems (to not go far -- Ted for ext4),
but we don't have people for just any subsystem.
Kernel developers working on Infiniband contributed to syzkaller
themselves, and as far as I understand they are very happy with the
results because it allowed them to find and fix several dozen
critical bugs (without involving us at all), so that's an option too.

Then, the context of the system is not a single subsystem and not a
single bug. Please don't draw all conclusions from a small subset of
cases. At this scale there inevitably will be harder bugs that will be
handled worse than a dedicated human would do (but a dedicated human
would not be able to handle that amount of bugs). But this does not
make the overall effect negative: many hundreds of bugs are getting
fixed. In lots of cases developers pick up bugs from "C program +
repro instructions". There is also a considerable number of simpler bugs
that are getting fixed even without reproducers. That can be the case for
a filesystem too, for example, a NULL deref with an obvious missed
preceding state check, or a KASAN report with all stacks. It's not
possible to know ahead of time if it's something that can be fixed
with the existing information, or something that can't be. So there is
no option of reporting just the former bugs, we can report either all
of them or none of them (which would mean that none of the bugs are
fixed).

Regarding prioritization.
Bisection is on our plate. But note that a WARNING can be misleading.
One of the bad bugs syzkaller has found was exactly a WARNING, a
WARNING when restoring FPU registers on context switch, which means an
interprocess or host->guest information leak. One of the worst ones
manifested in no kernel report at all. It was one of these "target
machine just became unresponsive with no self-detected reports".
"There is something wrong with kernel" reports get lowest priority,
but that one turned out to be a full guest->host escape. Even if it's
just a WARNING, but triggered remotely, that can be a large problem
too. So generally prioritization still requires expert attention,
which in turn requires reporting all these bugs in the first place.
It can also be the case that an innocent bug masks critical bugs. For
example, if there is an easy to trigger bug on entrance to a
subsystem, nothing else will be discovered until that one is fixed.

There are definitely more than 172 bugs. I agree, thousands. And the
system is generally capable of finding them, it already has found
close to 2000 I think. It's just that the system chokes with existing
bugs and all test machines crash right after boot. The more bugs we
fix, the more new bugs we will see.

Bugs with high CVSS scores are frequently found with similar fuzzing
systems. But these won't be reported by humans on mailing lists, and
these are not bugs people are actually hitting. These look exactly
like this -- some insane inputs to the kernel -- and are sold and used
to exploit our phones and bank accounts.

Regarding CC lists.
If you see issues there, please improve scripts/get_maintainer.pl.
That's what most people use to find relevant emails when reporting
bugs (when they are not maintainers of this very subsystem and have
some secret knowledge) and that's what syzbot uses. If it produces
wrong results, the scope of the problem is larger than syzbot.

Dmitry Vyukov

Sep 4, 2018, 4:29:14 AM
to Matthew Wilcox, Theodore Y. Ts'o, syzbot, linux-fsdevel, LKML, syzkaller-bugs, Al Viro
The title was later changed to:

#syz fix: locking/rwsem: Add a new RWSEM_ANONYMOUSLY_OWNED flag