INFO: task hung in filemap_fault

95 views
Skip to first unread message

syzbot

unread,
Dec 18, 2017, 3:43:04 AM12/18/17
to a...@linux.intel.com, ak...@linux-foundation.org, ja...@suse.cz, jla...@redhat.com, linux-...@vger.kernel.org, linu...@kvack.org, mgo...@techsingularity.net, mi...@kernel.org, npi...@gmail.com, rgol...@suse.com, syzkall...@googlegroups.com
Hello,

syzkaller hit the following crash on
6084b576dca2e898f5c101baef151f7bfdbb606d
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
compiler: gcc (GCC) 7.1.1 20170620
.config is attached
Raw console output is attached.

Unfortunately, I don't have any reproducer for this bug yet.


INFO: task syz-executor7:10280 blocked for more than 120 seconds.
Not tainted 4.15.0-rc3-next-20171214+ #67
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor7 D 0 10280 3310 0x00000004
Call Trace:
context_switch kernel/sched/core.c:2800 [inline]
__schedule+0x30b/0xaf0 kernel/sched/core.c:3376
schedule+0x2e/0x90 kernel/sched/core.c:3435
io_schedule+0x11/0x40 kernel/sched/core.c:5043
wait_on_page_bit_common mm/filemap.c:1099 [inline]
wait_on_page_bit mm/filemap.c:1132 [inline]
wait_on_page_locked include/linux/pagemap.h:530 [inline]
__lock_page_or_retry+0x391/0x3e0 mm/filemap.c:1310
lock_page_or_retry include/linux/pagemap.h:510 [inline]
filemap_fault+0x61c/0xa70 mm/filemap.c:2532
__do_fault+0x23/0xa4 mm/memory.c:3206
do_read_fault mm/memory.c:3616 [inline]
do_fault mm/memory.c:3716 [inline]
handle_pte_fault mm/memory.c:3947 [inline]
__handle_mm_fault+0x10b5/0x1930 mm/memory.c:4071
handle_mm_fault+0x215/0x450 mm/memory.c:4108
faultin_page mm/gup.c:502 [inline]
__get_user_pages+0x1ff/0x980 mm/gup.c:699
populate_vma_page_range+0xa1/0xb0 mm/gup.c:1200
__mm_populate+0xcc/0x190 mm/gup.c:1250
mm_populate include/linux/mm.h:2233 [inline]
vm_mmap_pgoff+0x103/0x110 mm/util.c:338
SYSC_mmap_pgoff mm/mmap.c:1533 [inline]
SyS_mmap_pgoff+0x215/0x2c0 mm/mmap.c:1491
SYSC_mmap arch/x86/kernel/sys_x86_64.c:100 [inline]
SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:91
entry_SYSCALL_64_fastpath+0x1f/0x96
RIP: 0033:0x452a09
RSP: 002b:00007efce66dac58 EFLAGS: 00000212 ORIG_RAX: 0000000000000009
RAX: ffffffffffffffda RBX: 000000000071bea0 RCX: 0000000000452a09
RDX: 0000000000000003 RSI: 0000000000001000 RDI: 0000000020e5b000
RBP: 0000000000000033 R08: 0000000000000016 R09: 0000000000000000
R10: 0000000000002011 R11: 0000000000000212 R12: 00000000006ed568
R13: 00000000ffffffff R14: 00007efce66db6d4 R15: 0000000000000000

Showing all locks held in the system:
2 locks held by khungtaskd/673:
#0: (rcu_read_lock){....}, at: [<00000000f4a26b03>]
check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
#0: (rcu_read_lock){....}, at: [<00000000f4a26b03>] watchdog+0xbf/0x750
kernel/hung_task.c:249
#1: (tasklist_lock){.+.+}, at: [<00000000cd00a56d>]
debug_show_all_locks+0x3d/0x1a0 kernel/locking/lockdep.c:4464
1 lock held by rsyslogd/2967:
#0: (&f->f_pos_lock){+.+.}, at: [<0000000085b629a7>]
__fdget_pos+0x5b/0x70 fs/file.c:765
2 locks held by getty/3089:
#0: (&tty->ldisc_sem){++++}, at: [<000000000b906052>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000d59241b2>]
n_tty_read+0xce/0xa40 drivers/tty/n_tty.c:2131
2 locks held by getty/3090:
#0: (&tty->ldisc_sem){++++}, at: [<000000000b906052>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000d59241b2>]
n_tty_read+0xce/0xa40 drivers/tty/n_tty.c:2131
2 locks held by getty/3091:
#0: (&tty->ldisc_sem){++++}, at: [<000000000b906052>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000d59241b2>]
n_tty_read+0xce/0xa40 drivers/tty/n_tty.c:2131
2 locks held by getty/3092:
#0: (&tty->ldisc_sem){++++}, at: [<000000000b906052>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000d59241b2>]
n_tty_read+0xce/0xa40 drivers/tty/n_tty.c:2131
2 locks held by getty/3093:
#0: (&tty->ldisc_sem){++++}, at: [<000000000b906052>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000d59241b2>]
n_tty_read+0xce/0xa40 drivers/tty/n_tty.c:2131
2 locks held by getty/3094:
#0: (&tty->ldisc_sem){++++}, at: [<000000000b906052>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000d59241b2>]
n_tty_read+0xce/0xa40 drivers/tty/n_tty.c:2131
2 locks held by getty/3095:
#0: (&tty->ldisc_sem){++++}, at: [<000000000b906052>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000d59241b2>]
n_tty_read+0xce/0xa40 drivers/tty/n_tty.c:2131

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 673 Comm: khungtaskd Not tainted 4.15.0-rc3-next-20171214+ #67
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0xe9/0x14b lib/dump_stack.c:53
nmi_cpu_backtrace+0x149/0x150 lib/nmi_backtrace.c:103
nmi_trigger_cpumask_backtrace+0x101/0x150 lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
check_hung_task kernel/hung_task.c:132 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
watchdog+0x562/0x750 kernel/hung_task.c:249
kthread+0x149/0x170 kernel/kthread.c:238
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:524
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0x6/0x10
arch/x86/include/asm/irqflags.h:54


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzk...@googlegroups.com.
Please credit me with: Reported-by: syzbot <syzk...@googlegroups.com>

syzbot will keep track of this bug report.
Once a fix for this bug is merged into any tree, reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug
report.
Note: all commands must start from beginning of the line in the email body.
config.txt
raw.log

Tetsuo Handa

unread,
Dec 18, 2017, 9:52:23 AM12/18/17
to syzbot, a...@linux.intel.com, ak...@linux-foundation.org, ja...@suse.cz, jla...@redhat.com, linux-...@vger.kernel.org, linu...@kvack.org, mgo...@techsingularity.net, mi...@kernel.org, npi...@gmail.com, rgol...@suse.com, syzkall...@googlegroups.com
On 2017/12/18 17:43, syzbot wrote:
> Hello,
>
> syzkaller hit the following crash on 6084b576dca2e898f5c101baef151f7bfdbb606d
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
>
> Unfortunately, I don't have any reproducer for this bug yet.
>

This log has a lot of mmap() but also has Android's binder messages.

r9 = syz_open_dev$binder(&(0x7f0000000000)='/dev/binder#\x00', 0x0, 0x800)

[ 49.200735] binder: 9749:9755 IncRefs 0 refcount change on invalid ref 2 ret -22
[ 49.221514] binder: 9749:9755 Acquire 1 refcount change on invalid ref 4 ret -22
[ 49.233325] binder: 9749:9755 Acquire 1 refcount change on invalid ref 0 ret -22
[ 49.241979] binder: binder_mmap: 9749 205a3000-205a7000 bad vm_flags failed -1
[ 49.256949] binder: 9749:9755 unknown command 0
[ 49.262470] binder: 9749:9755 ioctl c0306201 20000fd0 returned -22
[ 49.293365] binder: 9749:9755 IncRefs 0 refcount change on invalid ref 2 ret -22
[ 49.301297] binder: binder_mmap: 9749 205a3000-205a7000 bad vm_flags failed -1
[ 49.314146] binder: 9749:9755 Acquire 1 refcount change on invalid ref 4 ret -22
[ 49.322732] binder: 9749:9755 Acquire 1 refcount change on invalid ref 0 ret -22
[ 49.332063] binder: 9749:9755 Release 1 refcount change on invalid ref 1 ret -22
[ 49.340796] binder: 9749:9755 Acquire 1 refcount change on invalid ref 2 ret -22
[ 49.349457] binder: 9749:9755 BC_DEAD_BINDER_DONE 0000000000000001 not found
[ 49.349462] binder: 9749:9755 BC_DEAD_BINDER_DONE 0000000000000000 not found

[ 246.752088] INFO: task syz-executor7:10280 blocked for more than 120 seconds.

Anything that hung after uptime > 46.75 can be reported at uptime = 246.75, can't it?

Is it possible to reproduce this problem by running the same program?

Tetsuo Handa

unread,
Dec 18, 2017, 5:19:58 PM12/18/17
to syzbot, a...@linux.intel.com, ak...@linux-foundation.org, ja...@suse.cz, jla...@redhat.com, linux-...@vger.kernel.org, linu...@kvack.org, mgo...@techsingularity.net, mi...@kernel.org, npi...@gmail.com, rgol...@suse.com, syzkall...@googlegroups.com
Typo. I wanted to say 126.75 >= uptime > 6.75.
khungtaskd warning with 120 seconds check interval can be delayed for up to 240 seconds.

Dmitry Vyukov

unread,
Dec 28, 2017, 9:09:54 AM12/28/17
to Tetsuo Handa, syzbot, Andi Kleen, Andrew Morton, Jan Kara, jla...@redhat.com, LKML, Linux-MM, Mel Gorman, Ingo Molnar, npi...@gmail.com, rgol...@suse.com, syzkall...@googlegroups.com, Jens Axboe, Ming Lei, Hannes Reinecke, Omar Sandoval, sh...@fb.com
Hi Tetsuo,

syzbot always re-runs the same workload on a new machine. If it
manages to reproduce the problem, it provides a reproducer. In this
case it didn't.

The program that triggered this is this one (number 7 matches task
syz-executor7):

2017/12/18 06:16:18 executing program 7:

It has only 2 mmaps. The first one is pretty standard, but the second
one mmaps loop device:

r7 = syz_open_dev$loop(&(0x7f0000e58000-0xb)='/dev/loop#\x00', 0x0, 0x4102)
mmap(&(0x7f0000e5b000/0x1000)=nil, 0x1000, 0x3, 0x2011, r7, 0x0)

We have a bunch of hangs around /dev/loop:

https://groups.google.com/forum/#!msg/syzkaller-bugs/qzz2v1M93O4/DjHEEvq5AQAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/jy-bXYbRh7c/a1dQYyD9CgAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/vjGYuMMspAU/K3oOF_eHCgAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/BwpEc6q6gFY/5kHDMGElAgAJ

Probably related to these ones.
+loop maintainers.
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bug...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/82d89066-7dd2-12fe-3cc0-c8d624fe0d51%40I-love.SAKURA.ne.jp.
> For more options, visit https://groups.google.com/d/optout.

Tetsuo Handa

unread,
Jan 1, 2018, 10:27:57 AM1/1/18
to dvy...@google.com, bot+980f5e5fc060c37505...@syzkaller.appspotmail.com, a...@linux.intel.com, ak...@linux-foundation.org, ja...@suse.cz, jla...@redhat.com, linux-...@vger.kernel.org, linu...@kvack.org, mgo...@techsingularity.net, mi...@kernel.org, npi...@gmail.com, rgol...@suse.com, syzkall...@googlegroups.com, ax...@kernel.dk, tom.l...@gmail.com, ha...@suse.de, osa...@fb.com, sh...@fb.com
> > Anything that hung in 126.75 >= uptime > 6.75 can be reported at uptime = 246.75, can't it?
> > khungtaskd warning with 120 seconds check interval can be delayed for up to 240 seconds.
> >
> > Is it possible to reproduce this problem by running the same program?
>
>
> Hi Tetsuo,
>
> syzbot always re-runs the same workload on a new machine. If it
> manages to reproduce the problem, it provides a reproducer. In this
> case it didn't.

Even if it did not manage to reproduce the problem, showing raw.log in
C format is helpful for me. For example,

ioctl$LOOP_CHANGE_FD(r3, 0x4c00, r1)

is confusing. 0x4c00 is not LOOP_CHANGE_FD but LOOP_SET_FD.
If the message were

ioctl(r3, 0x4c00, r1)

more people will be able to read what the program tried to do.
There are many operations done on loop devices, but are too hard
for me to pick up only loop related actions.

>
> The program that triggered this is this one (number 7 matches task
> syz-executor7):
>
> 2017/12/18 06:16:18 executing program 7:
>
> It has only 2 mmaps. The first one is pretty standard, but the second
> one mmaps loop device:
>
> r7 = syz_open_dev$loop(&(0x7f0000e58000-0xb)='/dev/loop#\x00', 0x0, 0x4102)
> mmap(&(0x7f0000e5b000/0x1000)=nil, 0x1000, 0x3, 0x2011, r7, 0x0)
>
> We have a bunch of hangs around /dev/loop:
>
> https://groups.google.com/forum/#!msg/syzkaller-bugs/qzz2v1M93O4/DjHEEvq5AQAJ
> https://groups.google.com/forum/#!msg/syzkaller-bugs/jy-bXYbRh7c/a1dQYyD9CgAJ
> https://groups.google.com/forum/#!msg/syzkaller-bugs/vjGYuMMspAU/K3oOF_eHCgAJ
> https://groups.google.com/forum/#!msg/syzkaller-bugs/BwpEc6q6gFY/5kHDMGElAgAJ
>
> Probably related to these ones.
> +loop maintainers.
>

I suggest syzbot to try linux.git before reporting bugs in linux-next.git.
You know there are many duplicates caused by an invalid free in pcrypt.
Soft lockups in ioctl(LOOP_SET_FD) are

/* Avoid recursion */
f = file;
while (is_loop_device(f)) {
struct loop_device *l;

if (f->f_mapping->host->i_bdev == bdev)
goto out_putf;

l = f->f_mapping->host->i_bdev->bd_disk->private_data;
if (l->lo_state == Lo_unbound) {
error = -EINVAL;
goto out_putf;
}
f = l->lo_backing_file;
}

loop which means that something (maybe memory corruption) is forming circular
chain, and there seems to be some encryption related parameters/values in
raw.log file. It is nice to retest a kernel without encryption related things
and/or a kernel without known encryption related bugs.

Dmitry Vyukov

unread,
Jan 8, 2018, 4:09:04 AM1/8/18
to Tetsuo Handa, syzbot, Andi Kleen, Andrew Morton, Jan Kara, jla...@redhat.com, LKML, Linux-MM, Mel Gorman, Ingo Molnar, npi...@gmail.com, rgol...@suse.com, syzkall...@googlegroups.com, Jens Axboe, Ming Lei, Hannes Reinecke, Omar Sandoval, sh...@fb.com
On Mon, Jan 1, 2018 at 4:27 PM, Tetsuo Handa
Hi Tetsuo,

The main purpose of this format is different, this is a complete
representation of programs that allows replaying them using syzkaller
tools. We can't simply drop info from there. Do you propose to add
another attached file that contains the same info in a different
format? What is the exact format you are proposing? Is it just
dropping the syscall name part after $ sign? Note that it's still not
C, more complex syscall generally look as follows:

perf_event_open(&(0x7f0000b5a000)={0x4000000002, 0x78, 0x1e2, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xffff, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
@perf_bp={&(0x7f0000000000)=0x0, 0x0}, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0}, 0x0, 0x0, 0xffffffffffffffff, 0x0)
recvmmsg(0xffffffffffffffff, &(0x7f0000003000)=[{{0x0, 0x0,
&(0x7f0000002000)=[{&(0x7f000000a000)=""/193, 0xc1},
{&(0x7f0000007000-0x3d)=""/61, 0x3d}], 0x2,
&(0x7f0000005000-0x67)=""/103, 0x67, 0x0}, 0x0}], 0x1, 0x0,
&(0x7f0000003000-0x10)={0x77359400, 0x0})
bpf$PROG_LOAD(0x5, &(0x7f0000000000)={0x1, 0x5,
&(0x7f0000002000)=@framed={{0x18, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0}, [@jmp={0x5, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}], {0x95, 0x0, 0x0,
0x0}}, &(0x7f0000004000-0xa)='syzkaller\x00', 0x3, 0xc3,
&(0x7f0000386000)=""/195, 0x0, 0x0, [0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0], 0x0}, 0x48)

Note: you can convert any syzkaller program to equivalent C code using
syz-prog2c utility that comes with syzkaller.

Dmitry Vyukov

unread,
Jan 8, 2018, 4:17:25 AM1/8/18
to Tetsuo Handa, syzbot, Andi Kleen, Andrew Morton, Jan Kara, jla...@redhat.com, LKML, Linux-MM, Mel Gorman, Ingo Molnar, npi...@gmail.com, rgol...@suse.com, syzkall...@googlegroups.com, Jens Axboe, Ming Lei, Hannes Reinecke, Omar Sandoval, sh...@fb.com
Hi Tetsuo,

Let's forget about the single crypto bug. We can't build the system
that handles hundreds of bugs around that single bug which is fixed at
this point. What is the general improvement you are proposing?

Note that some bugs are only in linux.git, some are only in
linux-next.git, some are only in net, kvm, etc, or maybe in some
combination of these. And we generally don't know where a bug is
present and where it is not. We can try to do some assumption _if_ the
bug has a reproducer, but even then most kernel bugs are due to races
and can't be reproduced with 100% probability, or it can't be just
that the same bug can be reproduced on a different tree but requires a
slightly different reproducer. So any such assumptions won't be 100%
reliable, and any flaw in information syzbot provides usually provokes
lots of very negative reaction from kernel developers.

Tetsuo Handa

unread,
Jan 8, 2018, 5:48:33 AM1/8/18
to dvy...@google.com, bot+980f5e5fc060c37505...@syzkaller.appspotmail.com, a...@linux.intel.com, ak...@linux-foundation.org, ja...@suse.cz, jla...@redhat.com, linux-...@vger.kernel.org, linu...@kvack.org, mgo...@techsingularity.net, mi...@kernel.org, npi...@gmail.com, rgol...@suse.com, syzkall...@googlegroups.com, ax...@kernel.dk, tom.l...@gmail.com, ha...@suse.de, osa...@fb.com, sh...@fb.com
Dmitry Vyukov wrote:
> >> Hi Tetsuo,
> >>
> >> syzbot always re-runs the same workload on a new machine. If it
> >> manages to reproduce the problem, it provides a reproducer. In this
> >> case it didn't.
> >
> > Even if it did not manage to reproduce the problem, showing raw.log in
> > C format is helpful for me. For example,
> >
> > ioctl$LOOP_CHANGE_FD(r3, 0x4c00, r1)
> >
> > is confusing. 0x4c00 is not LOOP_CHANGE_FD but LOOP_SET_FD.
> > If the message were
> >
> > ioctl(r3, 0x4c00, r1)
> >
> > more people will be able to read what the program tried to do.
> > There are many operations done on loop devices, but are too hard
> > for me to pick up only loop related actions.
>
>
> Hi Tetsuo,
>
> The main purpose of this format is different, this is a complete
> representation of programs that allows replaying them using syzkaller
> tools.

What is ioctl$LOOP_CHANGE_FD(r3, 0x4c00, r1) ?
0x4c00 is LOOP_SET_FD. Why LOOP_CHANGE_FD is there?

> We can't simply drop info from there. Do you propose to add
> another attached file that contains the same info in a different
> format? What is the exact format you are proposing?

Plain C program which can be compiled without installing additional
program/library packages (except those needed for building kernels).

> Is it just
> dropping the syscall name part after $ sign? Note that it's still not
> C, more complex syscall generally look as follows:
>
> perf_event_open(&(0x7f0000b5a000)={0x4000000002, 0x78, 0x1e2, 0x0,
> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xffff, 0x0,
> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> @perf_bp={&(0x7f0000000000)=0x0, 0x0}, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0}, 0x0, 0x0, 0xffffffffffffffff, 0x0)
> recvmmsg(0xffffffffffffffff, &(0x7f0000003000)=[{{0x0, 0x0,
> &(0x7f0000002000)=[{&(0x7f000000a000)=""/193, 0xc1},
> {&(0x7f0000007000-0x3d)=""/61, 0x3d}], 0x2,
> &(0x7f0000005000-0x67)=""/103, 0x67, 0x0}, 0x0}], 0x1, 0x0,
> &(0x7f0000003000-0x10)={0x77359400, 0x0})
> bpf$PROG_LOAD(0x5, &(0x7f0000000000)={0x1, 0x5,
> &(0x7f0000002000)=@framed={{0x18, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0}, [@jmp={0x5, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}], {0x95, 0x0, 0x0,
> 0x0}}, &(0x7f0000004000-0xa)='syzkaller\x00', 0x3, 0xc3,
> &(0x7f0000386000)=""/195, 0x0, 0x0, [0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0], 0x0}, 0x48)
>
> Note: you can convert any syzkaller program to equivalent C code using
> syz-prog2c utility that comes with syzkaller.

I won't install go language into my environment for analyzing/reproducing your
reports. If syz-prog2c is provided as a CGI service (e.g. receive URL containing
raw.log and print the converted C program), I might try it.

Dmitry Vyukov

unread,
Jan 15, 2018, 4:40:38 AM1/15/18
to Tetsuo Handa, syzbot, Andi Kleen, Andrew Morton, Jan Kara, jla...@redhat.com, LKML, Linux-MM, Mel Gorman, Ingo Molnar, npi...@gmail.com, rgol...@suse.com, syzkall...@googlegroups.com, Jens Axboe, Ming Lei, Hannes Reinecke, Omar Sandoval, sh...@fb.com
On Mon, Jan 8, 2018 at 11:48 AM, Tetsuo Handa
<penguin...@i-love.sakura.ne.jp> wrote:
> Dmitry Vyukov wrote:
>> >> Hi Tetsuo,
>> >>
>> >> syzbot always re-runs the same workload on a new machine. If it
>> >> manages to reproduce the problem, it provides a reproducer. In this
>> >> case it didn't.
>> >
>> > Even if it did not manage to reproduce the problem, showing raw.log in
>> > C format is helpful for me. For example,
>> >
>> > ioctl$LOOP_CHANGE_FD(r3, 0x4c00, r1)
>> >
>> > is confusing. 0x4c00 is not LOOP_CHANGE_FD but LOOP_SET_FD.
>> > If the message were
>> >
>> > ioctl(r3, 0x4c00, r1)
>> >
>> > more people will be able to read what the program tried to do.
>> > There are many operations done on loop devices, but are too hard
>> > for me to pick up only loop related actions.
>>
>>
>> Hi Tetsuo,
>>
>> The main purpose of this format is different, this is a complete
>> representation of programs that allows replaying them using syzkaller
>> tools.
>
> What is ioctl$LOOP_CHANGE_FD(r3, 0x4c00, r1) ?
> 0x4c00 is LOOP_SET_FD. Why LOOP_CHANGE_FD is there?


In short, it specifies exact discrimination of the syscall which
affects parsing of the rest of the arguments. For some syscalls
(ioctl/setsockopt/sendmsg) kernel has hundreds of different
discriminations with radically different arguments.
Now if you are asking why the discrimination is LOOP_CHANGE_FD, but
the actual command is LOOP_SET_FD, that's because this is a fuzzer,
it's sole purpose is to mess things in unexpected ways.
raw.log is not a _program_, it's hundreds of separate programs that
were executed before the crash. It's also very compressed
representation as compared to equivalent C programs. For example for
this program:

mmap(&(0x7f0000000000/0xfff000)=nil, 0xfff000, 0x3, 0x32,
0xffffffffffffffff, 0x0)
r0 = socket$nl_generic(0x10, 0x3, 0x10)
sendmsg$nl_generic(r0,
&(0x7f0000b3e000-0x38)={&(0x7f0000d4a000-0xc)={0x10, 0x0, 0x0, 0x0},
0xc, &(0x7f0000007000)={&(0x7f0000f7c000-0x15c)={0x24, 0x1c, 0x109,
0xffffffffffffffff, 0xffffffffffffffff, {0x4, 0x0, 0x0},
[@nested={0x10, 0x9, [@typed={0xc, 0x0, @u32=0x0}]}]}, 0x24}, 0x1,
0x0, 0x0, 0x0}, 0x0)

you can get up to this amount of C code:
https://gist.githubusercontent.com/dvyukov/eeaeb4e4ac45c3a251f72098c9295bf9/raw/700cd583507eca90711ba11b42e406f317553371/gistfile1.txt

that is, 700 lines of C source for 3 line program. So instead of a 1MB
file that will be 100MB, and then it probably should be a gzip archive
with hundreds of separate C files. There are people on this list
complaining even about 200K of attachments. I don't see that this will
be better and well accepted.

Tetsuo Handa

unread,
Jan 15, 2018, 5:45:00 AM1/15/18
to dvy...@google.com, bot+980f5e5fc060c37505...@syzkaller.appspotmail.com, a...@linux.intel.com, ak...@linux-foundation.org, ja...@suse.cz, jla...@redhat.com, linux-...@vger.kernel.org, linu...@kvack.org, mgo...@techsingularity.net, mi...@kernel.org, npi...@gmail.com, rgol...@suse.com, syzkall...@googlegroups.com, ax...@kernel.dk, tom.l...@gmail.com, ha...@suse.de, osa...@fb.com, sh...@fb.com
??? I can't catch what you want to say.

I understand that a fuzzer intentionally tests various cases.
My question is simple. Why don't you use actual command name like
ioctl$LOOP_SET_FD(r3, 0x4c00, r1) ?
Writing like ioctl$LOOP_CHANGE_FD is confusing. I consider it as a bug.
No problem. In the "tty: User triggerable soft lockup." case, I manually
trimmed the reproducer at https://marc.info/?l=linux-mm&m=151368630414963 .
That is,

(1) Can the problem be reproduced even if setup_tun(0, true); is commented out?

(2) Can the problem be reproduced even if NONFAILING(A = B); is replaced with
plain A = B; assignment?

(3) Can the problem be reproduced even if install_segv_handler(); is commented
out?

(4) Can the problem be reproduced even if some syscalls (e.g. __NR_memfd_create,
__NR_getsockopt, __NR_perf_event_open) are replaced with no-op?

and so on. Then, I finally reached a reproducer which I sent, and the bug was fixed.

What is important is that everyone can try simplifying the reproducer written
in plain C in order to narrow down the culprit. Providing a (e.g.) CGI service
which generates plain C reproducer like gistfile1.txt will be helpful to me.

Dmitry Vyukov

unread,
Jan 15, 2018, 7:30:00 AM1/15/18
to Tetsuo Handa, syzbot, Andi Kleen, Andrew Morton, Jan Kara, jla...@redhat.com, LKML, Linux-MM, Mel Gorman, Ingo Molnar, npi...@gmail.com, rgol...@suse.com, syzkall...@googlegroups.com, Jens Axboe, Ming Lei, Hannes Reinecke, Omar Sandoval, sh...@fb.com
On Mon, Jan 15, 2018 at 11:44 AM, Tetsuo Handa
To answer this I need to dive a bit into syzkaller internals.
There are 2 main operations with programs: generation and mutations.
During these operations consts specified in syscall discriminations
always match (we don't get LOOP_CHANGE_FD which is actually
LOOP_SET_FD). However, there are some additional operations, in
particular argument changes based on intercepted comparison arguments
in kernel code. Say, we execute LOOP_CHANGE_FD and collect all
comparison arguments and then alter syscall arguments that match one
argument to another argument. That's when we can get discrimination X
which is actually discrimination Y. There is bunch of open questions
associated with this process. First of which is -- should we ever
mutate arguments that are specified as const in a discrimination.
Mutating consts is useful, because that's what allows fuzzer to get to
all of these undescribed ioctls/setsockopts/netlink messages/network
packet protocols, etc. But then, should we also change the
discrimination? What to do if 2 disciminations have completely
different arguments, which are not possible to translate? What if we
don't have a discrimination for the new value at all? What if we have
a partial const match for 2 discriminations, but no exact match for
any discrimination. Also note that it's usually more complex than this
ioctl change because consts can be buried deep into sendmsg's data.
Even in this case, the actual ioctl depends on the file type, but we
can as well have an unknown fd type, so even when we do
ioctl$LOOP_SET_FD(r3, 0x4c00, r1), that's not necessasry LOOP_SET_FD.

Dmitry Vyukov

unread,
Jan 15, 2018, 8:44:13 AM1/15/18
to Tetsuo Handa, syzbot, Andi Kleen, Andrew Morton, Jan Kara, jla...@redhat.com, LKML, Linux-MM, Mel Gorman, Ingo Molnar, npi...@gmail.com, rgol...@suse.com, syzkall...@googlegroups.com, Jens Axboe, Ming Lei, Hannes Reinecke, Omar Sandoval, sh...@fb.com
I am not completely following. You previously mentioned raw.log, which
is a collection of multiple programs, but now you seem to be talking
about a single reproducer. When syzbot manages to reproduce the bug
only with syzkaller program but not with a corresponding C program, it
provides only syzkaller program. It such case it can sense to convert.
But the case you pointed to actually contains C program. So there is
no need to do the conversion at all... What am I missing?

Tetsuo Handa

unread,
Jan 15, 2018, 8:57:00 AM1/15/18
to dvy...@google.com, bot+980f5e5fc060c37505...@syzkaller.appspotmail.com, a...@linux.intel.com, ak...@linux-foundation.org, ja...@suse.cz, jla...@redhat.com, linux-...@vger.kernel.org, linu...@kvack.org, mgo...@techsingularity.net, mi...@kernel.org, npi...@gmail.com, rgol...@suse.com, syzkall...@googlegroups.com, ax...@kernel.dk, tom.l...@gmail.com, ha...@suse.de, osa...@fb.com, sh...@fb.com
Dmitry Vyukov wrote:
> > No problem. In the "tty: User triggerable soft lockup." case, I manually
> > trimmed the reproducer at https://marc.info/?l=linux-mm&m=151368630414963 .
> > That is,
> >
> > (1) Can the problem be reproduced even if setup_tun(0, true); is commented out?
> >
> > (2) Can the problem be reproduced even if NONFAILING(A = B); is replaced with
> > plain A = B; assignment?
> >
> > (3) Can the problem be reproduced even if install_segv_handler(); is commented
> > out?
> >
> > (4) Can the problem be reproduced even if some syscalls (e.g. __NR_memfd_create,
> > __NR_getsockopt, __NR_perf_event_open) are replaced with no-op?
> >
> > and so on. Then, I finally reached a reproducer which I sent, and the bug was fixed.
> >
> > What is important is that everyone can try simplifying the reproducer written
> > in plain C in order to narrow down the culprit. Providing a (e.g.) CGI service
> > which generates plain C reproducer like gistfile1.txt will be helpful to me.
>
> I am not completely following. You previously mentioned raw.log, which
> is a collection of multiple programs, but now you seem to be talking
> about a single reproducer. When syzbot manages to reproduce the bug
> only with syzkaller program but not with a corresponding C program, it
> provides only syzkaller program. It such case it can sense to convert.
> But the case you pointed to actually contains C program. So there is
> no need to do the conversion at all... What am I missing?
>

raw.log is not readable for me.
I want to see C program even if syzbot did not manage to reproduce the bug.
If C program is available, everyone can try reproducing the bug with manually
trimmed C program.

Dmitry Vyukov

unread,
Jan 15, 2018, 9:01:01 AM1/15/18
to Tetsuo Handa, syzbot, Andi Kleen, Andrew Morton, Jan Kara, jla...@redhat.com, LKML, Linux-MM, Mel Gorman, Ingo Molnar, npi...@gmail.com, rgol...@suse.com, syzkall...@googlegroups.com, Jens Axboe, Ming Lei, Hannes Reinecke, Omar Sandoval, sh...@fb.com
If it did not manage to reproduce the bug, there is no C program.
There is no program at all.

Tetsuo Handa

unread,
Jan 15, 2018, 9:26:12 AM1/15/18
to dvy...@google.com, bot+980f5e5fc060c37505...@syzkaller.appspotmail.com, a...@linux.intel.com, ak...@linux-foundation.org, ja...@suse.cz, jla...@redhat.com, linux-...@vger.kernel.org, linu...@kvack.org, mgo...@techsingularity.net, mi...@kernel.org, npi...@gmail.com, rgol...@suse.com, syzkall...@googlegroups.com, ax...@kernel.dk, tom.l...@gmail.com, ha...@suse.de, osa...@fb.com, sh...@fb.com
Dmitry Vyukov wrote:
> >> I am not completely following. You previously mentioned raw.log, which
> >> is a collection of multiple programs, but now you seem to be talking
> >> about a single reproducer. When syzbot manages to reproduce the bug
> >> only with syzkaller program but not with a corresponding C program, it
> >> provides only syzkaller program. It such case it can sense to convert.
> >> But the case you pointed to actually contains C program. So there is
> >> no need to do the conversion at all... What am I missing?
> >>
> >
> > raw.log is not readable for me.
> > I want to see C program even if syzbot did not manage to reproduce the bug.
> > If C program is available, everyone can try reproducing the bug with manually
> > trimmed C program.
>
> If it did not manage to reproduce the bug, there is no C program.
> There is no program at all.
>

What!? Then, what does raw.log contain? I want to read raw.log as C program.

Dmitry Vyukov

unread,
Jan 15, 2018, 9:28:20 AM1/15/18
to Tetsuo Handa, syzbot, Andi Kleen, Andrew Morton, Jan Kara, jla...@redhat.com, LKML, Linux-MM, Mel Gorman, Ingo Molnar, npi...@gmail.com, rgol...@suse.com, syzkall...@googlegroups.com, Jens Axboe, Ming Lei, Hannes Reinecke, Omar Sandoval, sh...@fb.com

Tetsuo Handa

unread,
Jan 15, 2018, 9:38:38 AM1/15/18
to dvy...@google.com, bot+980f5e5fc060c37505...@syzkaller.appspotmail.com, a...@linux.intel.com, ak...@linux-foundation.org, ja...@suse.cz, jla...@redhat.com, linux-...@vger.kernel.org, linu...@kvack.org, mgo...@techsingularity.net, mi...@kernel.org, npi...@gmail.com, rgol...@suse.com, syzkall...@googlegroups.com, ax...@kernel.dk, tom.l...@gmail.com, ha...@suse.de, osa...@fb.com, sh...@fb.com
Dmitry Vyukov wrote:
> On Mon, Jan 15, 2018 at 3:25 PM, Tetsuo Handa
> <penguin...@i-love.sakura.ne.jp> wrote:
> > Dmitry Vyukov wrote:
> >> >> I am not completely following. You previously mentioned raw.log, which
> >> >> is a collection of multiple programs, but now you seem to be talking
> >> >> about a single reproducer. When syzbot manages to reproduce the bug
> >> >> only with syzkaller program but not with a corresponding C program, it
> >> >> provides only syzkaller program. It such case it can sense to convert.
> >> >> But the case you pointed to actually contains C program. So there is
> >> >> no need to do the conversion at all... What am I missing?
> >> >>
> >> >
> >> > raw.log is not readable for me.
> >> > I want to see C program even if syzbot did not manage to reproduce the bug.
> >> > If C program is available, everyone can try reproducing the bug with manually
> >> > trimmed C program.
> >>
> >> If it did not manage to reproduce the bug, there is no C program.
> >> There is no program at all.
> >>
> >
> > What!? Then, what does raw.log contain? I want to read raw.log as C program.
>
>
> raw.log is not a _program_, it's hundreds of separate programs that
> were executed before the crash.

I want to know the hundreds of separate programs as C programs.
Even if there are hundreds, more recently ran programs tend to be
the culprit.

> It's also very compressed
> representation as compared to equivalent C programs. For example for
> this program:
>
> mmap(&(0x7f0000000000/0xfff000)=nil, 0xfff000, 0x3, 0x32,
> 0xffffffffffffffff, 0x0)
> r0 = socket$nl_generic(0x10, 0x3, 0x10)
> sendmsg$nl_generic(r0,
> &(0x7f0000b3e000-0x38)={&(0x7f0000d4a000-0xc)={0x10, 0x0, 0x0, 0x0},
> 0xc, &(0x7f0000007000)={&(0x7f0000f7c000-0x15c)={0x24, 0x1c, 0x109,
> 0xffffffffffffffff, 0xffffffffffffffff, {0x4, 0x0, 0x0},
> [@nested={0x10, 0x9, [@typed={0xc, 0x0, @u32=0x0}]}]}, 0x24}, 0x1,
> 0x0, 0x0, 0x0}, 0x0)
>
> you can get up to this amount of C code:
> https://gist.githubusercontent.com/dvyukov/eeaeb4e4ac45c3a251f72098c9295bf9/raw/700cd583507eca90711ba11b42e406f317553371/gistfile1.txt
>
> that is, 700 lines of C source for 3 line program. So instead of a 1MB
> file that will be 100MB, and then it probably should be a gzip archive
> with hundreds of separate C files. There are people on this list
> complaining even about 200K of attachments. I don't see that this will
> be better and well accepted.
>

I don't think attaching to mails is acceptable. Thus, I suggest e.g. CGI
service so that only those who want to try can obtain the C programs.

Dmitry Vyukov

unread,
Feb 13, 2018, 2:52:02 PM2/13/18
to syzbot, syzkall...@googlegroups.com
old bug bankruptcy

#syz invalid

On Mon, Dec 18, 2017 at 9:43 AM, syzbot
<bot+980f5e5fc060c37505...@syzkaller.appspotmail.com>
wrote:
> --
> You received this message because you are subscribed to the Google Groups
> "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to syzkaller-bug...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/syzkaller-bugs/001a11444d0e7bfd7f05609956c6%40google.com.
Reply all
Reply to author
Forward
0 new messages