Debugging "no output"

Alexey Kardashevskiy

Jun 12, 2020, 12:18:44 AM
to syzkaller
Hi,

After about 20 minutes of running syzkaller (as "syzkaller/bin/syz-manager
-config syzkaller/bin/my.cfg" with "procs"=8; the setup is 4 QEMU VMs
with 8 CPUs each), the VMs stop with "no output".

I look at the VM's output and see that the last tests were:

09:03:31 executing program 1:

r0 = socket$inet_sctp(0x2, 0x5, 0x84)

ioctl$sock_SIOCETHTOOL(r0, 0x8946,
&(0x7f0000000100)={'bridge_slave_1\x00',
&(0x7f00000000c0)=@ethtool_ts_info})


09:03:31 executing program 6:

inotify_init1(0x180800)

prctl$PR_SET_KEEPCAPS(0x8, 0x0)



I can ssh to that VM and see it is very much alive, with lots of
/syz-executor.X processes, but, for example, running "strace
/syz-executor.6" shows no "inotify" calls.

The question is: how do I tell which one of the 8 executors failed to
produce output? Thanks,


--
Alexey

Dmitry Vyukov

Jun 12, 2020, 12:50:43 AM
to Alexey Kardashevskiy, syzkaller
Hi Alexey,

It means no output from the whole VM whatsoever.
Each test process has multiple levels of timeouts and SIGKILLs and
should be hard-killed after 7 seconds or so. Each test then prints
"executing program N:" to the output. However, syz-manager has not
seen any output from the VM for 5 minutes, so something is wrong
with it.
If you ssh into the VM, try to kill all executor processes. Frequently
there is an unkillable one, and it causes this failure mode. An
unkillable process is a kernel bug too. With the TASK_HUNG debug config,
the kernel should self-diagnose it and print an informative bug report.
If it's not enabled in your kernel, consider enabling it. We have
recommended debug configs here:
https://github.com/google/syzkaller/blob/master/dashboard/config/bits-syzbot.config
https://github.com/google/syzkaller/blob/master/dashboard/config/bits-syzbot-aux-debug.config
and timeout tuning here:
https://github.com/google/syzkaller/blob/master/dashboard/config/upstream-selinux.cmdline
https://github.com/google/syzkaller/blob/master/dashboard/config/upstream.sysctl
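For reference, the hung-task detection mentioned above boils down to a couple of kernel config options (a minimal sketch; the timeout value here is only an example and should be tuned together with the cmdline/sysctl files above):

```
CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=140
```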

If you are using qemu emulation mode, then this may be subject to:
https://github.com/google/syzkaller/issues/1552

And these features may be useful for debugging of such cases:
https://github.com/google/syzkaller/issues/491
https://github.com/google/syzkaller/issues/1217

Alexey Kardashevskiy

Jun 15, 2020, 4:55:01 AM
to Dmitry Vyukov, syzkaller
Well, this is never the case on my machine. VMs keep running, no hung
tasks detected, etc.


> If you ssh into the VM, try to kill all executor processes. Frequently
> there is an unkillable one, and it causes such failure mode.

Nothing like that; I can kill them all with no problems. I suspect that
a successful response from a syz-executor process is somehow lost;
trying to pin that down...


> An
> unkillable process is a kernel bug too. With TASK_HUNG debug config,
> kernel should self-diagnose it and print informative bug report. If
> it's not enabled in your kernel, consider enabling it. We have
> recommended debug configs here:
> https://github.com/google/syzkaller/blob/master/dashboard/config/bits-syzbot.config
> https://github.com/google/syzkaller/blob/master/dashboard/config/bits-syzbot-aux-debug.config
> and timeout tuning here:
> https://github.com/google/syzkaller/blob/master/dashboard/config/upstream-selinux.cmdline
> https://github.com/google/syzkaller/blob/master/dashboard/config/upstream.sysctl

Thanks, this is interesting.

> If you are using qemu emulation mode, then this may be subject to:
> https://github.com/google/syzkaller/issues/1552
>
> And these features may be useful for debugging of such cases:
> https://github.com/google/syzkaller/issues/491
> https://github.com/google/syzkaller/issues/1217
>

--
Alexey

Alexey Kardashevskiy

Jun 17, 2020, 1:21:32 AM
to Dmitry Vyukov, syzkaller
Tried "procs"=1 and saw this (hopefully the formatting won't break):

root@syzkaller:~# ps axjf | egrep '(syz|PID)'
 PPID   PID  PGID   SID TTY      TPGID STAT  UID  TIME COMMAND
26266 26272 26271 26266 pts/0    26271 S+      0  0:00  \_ grep -E (syz|PID)
    1   359   359   359 ?           -1 Ssl     0  0:26 /syz-fuzzer -executor=/syz-executor -name=vm-7 -arch=ppc64le -manager=10.0.2.10:44993 -sandbox=none -procs=1 -cover=true -debug=false -test=false
  359   372   372   359 ?           -1 Z       0  0:00  \_ [syz-executor.0] <defunct>
    1 13901 13899   359 ?           -1 S       0  0:00 /syz-executor.0
    1 13906 13904   359 ?           -1 S       0  0:00 /syz-executor.0
    1 13911 13908   359 ?           -1 S       0  0:00 /syz-executor.0
    1 13940 13938   359 ?           -1 S       0  0:00 /syz-executor.0
    1 13945 13942   359 ?           -1 S       0  0:00 /syz-executor.0
    1 13950 13948   359 ?           -1 S       0  0:00 /syz-executor.0
    1 13967 13965   359 ?           -1 S       0  0:00 /syz-executor.0
    1 13972 13970   359 ?           -1 S       0  0:00 /syz-executor.0
    1 13977 13975   359 ?           -1 S       0  0:00 /syz-executor.0
    1 19540 19538   359 ?           -1 S       0  0:00 /syz-executor.0
    1 19542 19538   359 ?           -1 S       0  0:00 /syz-executor.0
    1 19568 19566   359 ?           -1 S       0  0:00 /syz-executor.0
    1 19572 19570   359 ?           -1 S       0  0:00 /syz-executor.0
    1 19576 19574   359 ?           -1 S       0  0:00 /syz-executor.0
    1 19580 19578   359 ?           -1 S       0  0:00 /syz-executor.0
    1 26212 26212   359 ?           -1 Zl      0  0:00 [syz-executor.0] <defunct>


I can see many instances of /syz-executor.0 with the same parent pid=1,
so I assume it is not one multithreaded executor but many instances; do
they all run different things? I expected to see one syz-executor.0
after setting procs to 1, so what am I missing here?


Another thing: I added timestamps to the logging and see this:

===
04:49:38 executing program 0 -- :
r0 = openat$null(0xffffffffffffff9c, &(0x7f0000000000)='/dev/null\x00', 0x101101, 0x0)
fcntl$setstatus(r0, 0x4, 0x2800)
r1 = openat$zero(0xffffffffffffff9c, &(0x7f0000000380)='/dev/zero\x00', 0x0, 0x0)
fcntl$setstatus(r1, 0x4, 0x400)
r2 = geteuid()
r3 = getegid()
syz_mount_image$tmpfs(&(0x7f00000000c0)='tmpfs\x00', &(0x7f0000000100)='./file0\x00', 0x0, 0x0, 0x0, 0x0, &(0x7f0000000540)=ANY=[@ANYBLOB='uid=', @ANYRESHEX=r2, @ANYBLOB=',mode=00000000000000000000001,mode=00000000000000000000777,gid=', @ANYRESHEX=r3])
r4 = geteuid()
r5 = geteuid()
r6 = getegid()
syz_mount_image$tmpfs(&(0x7f00000000c0)='tmpfs\x00', &(0x7f0000000100)='./file0\x00', 0x0, 0x0, 0x0, 0x0, &(0x7f0000000540)=ANY=[@ANYBLOB='uid=', @ANYRESHEX=r5, @ANYBLOB=',mode=00000000000000000000001,mode=00000000000000000000777,gid=', @ANYRESHEX=r6])
ioctl$NS_GET_OWNER_UID(r0, 0x2000b704, &(0x7f00000000c0)=<r7=>0x0)
mount$9p_fd(0x0, &(0x7f0000000040)='./file0\x00', &(0x7f0000000080)='9p\x00', 0x800400, &(0x7f0000000100)={'trans=fd,', {'rfdno', 0x3d, r1}, 0x2c, {'wfdno', 0x3d, r0}, 0x2c, {[{@access_user='access=user'}, {@loose='loose'}, {@aname={'aname', 0x3d, '/dev/null\x00'}}, {@uname={'uname', 0x3d, '/dev/null\x00'}}, {@msize={'msize', 0x3d, 0x4}}, {@cache_mmap='cache=mmap'}], [{@uid_gt={'uid>', r2}}, {@uid_eq={'uid', 0x3d, r4}}, {@uid_gt={'uid>', r5}}, {@euid_eq={'euid', 0x3d, r7}}, {@dont_hash='dont_hash'}, {@pcr={'pcr', 0x3d, 0x7}}, {@hash='hash'}]}})
mprotect(&(0x7f00007e0000/0x800000)=nil, 0x800000, 0x4)

04:50:38 executor 0: HUNG
===

It took 1 minute for syzkaller to decide the process hung, but since
there are many syz-executor.0 processes, I cannot even tell which one
is actually hung. Oh well, I will have to add the actual syz-executor
process pid to the logging; let's try that...


--
Alexey

Dmitry Vyukov

Jun 17, 2020, 12:03:32 PM
to Alexey Kardashevskiy, syzkaller
This looks like some kernel bug, especially the <defunct> ones. These
are leftover runaway, unkillable processes; they should have SIGKILL
pending in /proc/PID/status, yet they don't die.
Normally syz-executor has 3 processes, each a child of the previous one
(parent, child and grand-child).
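To check for that pending-but-undelivered SIGKILL from inside the VM, something along these lines works (a sketch; it just walks /proc for leftover executors):

```shell
# Show state and pending-signal masks for each leftover executor.
# A live process whose SigPnd/ShdPnd hex mask has the SIGKILL bit set
# (bit 9, i.e. 0x100) is stuck in unkillable kernel code.
for pid in $(pgrep -f syz-executor); do
    echo "== pid $pid =="
    grep -E '^(State|SigPnd|ShdPnd)' "/proc/$pid/status"
done
```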

To see how things should look, you may try the following.
Copy syz-executor and syz-execprog to a VM. Create a prog.txt file
with "getpid()" as its contents (the simplest syzkaller program), and
then run:

$ syz-execprog -executor syz-executor -procs=1 -repeat=0 prog.txt

This will repeat the single test forever with 1 test process. In
another console run ps to see the expected process tree. If you Ctrl+C
it, there should be no remaining processes.

You may also save a crash log from a "no output" bug and then replay
it manually in a VM with:

$ syz-execprog -executor syz-executor -procs=1 -repeat=0 crash.log

This can make debugging simpler. Normally it should continue running
with no leftover processes (well, maybe except one). But the <defunct>
ones are definitely bad.

Alexey Kardashevskiy

Jun 18, 2020, 5:54:24 AM
to Dmitry Vyukov, syzkaller
3? I never see 3; it is two maximum, plus execprog or the fuzzer as the parent:

./syz-execprog -executor syz-executor
\_ /root/syz-executor.0
\_ /root/syz-executor.0


or

/syz-fuzzer -executor=/syz-executor
\_ /syz-executor.7
| \_ /syz-executor.7
\_ /syz-executor.5
| \_ /syz-executor.5
\_ /syz-executor.2
| \_ /syz-executor.2


Even two seems like too many; why are there two?


And is there anything anywhere, besides the VM output, telling which
syz-executor is doing what?


As in the example above, there are more than 15 syz-executor.0 processes
with parent==1==init, so they are orphans that did not die; were they
all running the same test or different ones?
It is getting a bit clearer now :)


--
Alexey

Dmitry Vyukov

Jun 18, 2020, 6:16:28 AM
to Alexey Kardashevskiy, syzkaller
If you are running the "getpid()" program, then the third one is super
short-lived; you may need to run ps lots of times to "catch" it.
One of the parent processes is for sandboxing, the other serves as a fork server.


> And is there anything anywhere, besides the VM output, telling which
> syz-executor is doing what?

If you run syz-execprog, syz-fuzzer, or syz-manager with the -debug
flag, then the executor will print very detailed info on how it starts
and finishes running each system call.
It's sometimes useful with syz-execprog, but with syz-manager it will
produce a tremendous amount of output and slow down the fuzzing
process; it's not supposed to be enabled for actual fuzzing.

> As in the example above, there are more than 15 syz-executor.0 processes
> with parent==1==init, so they are orphans that did not die; were they
> all running the same test or different ones?

Different tests. But the different tests may also be very similar, in
the sense that they all trigger the same kernel bug.
The upstream kernel now has a debug config that can detect broken unkillable tasks:
https://syzkaller.appspot.com/bug?id=9386d051e11e09973d5a4cf79af5e8cedf79386d
https://syzkaller.appspot.com/bug?id=a0e3436829698d5824231251fad9d8e998f94f5e
Enabling it will produce clearer diagnostics and will separate these
cases from the rest of the more generic "no output" cases.

Alexey Kardashevskiy

Jun 18, 2020, 6:40:34 AM
to Dmitry Vyukov, syzkaller
I noticed this... Unfortunately, the minimal config that starts
producing these within 20 minutes is 4 VMs, 8 vCPUs each, with 8 procs
in each VM.
git grep "die for more than" 1b5044021070 (where 1b5044021070 is a
16-hour-old upstream commit) finds nothing; got any pointers handy?

--
Alexey

Dmitry Vyukov

Jun 18, 2020, 8:32:26 AM
to Alexey Kardashevskiy, syzkaller
That's:

commit c64b18417bfa3affa4d9ca1e27ff1397f395b7f2
Author: Tetsuo Handa <penguin...@i-love.sakura.ne.jp>
Date: Tue Jun 9 20:22:51 2020 +1000
kernel/hung_task.c: Monitor killed tasks.

in linux-next next-20200618. Not sure if it's in the mainline or not.

Alexey Kardashevskiy

Aug 6, 2020, 10:30:52 AM
to Dmitry Vyukov, syzkaller


On 18/06/2020 20:16, Dmitry Vyukov wrote:
Out of curiosity, what exactly is this fork server doing? And what is
that flag_threaded business? Does it run every test in 2 threads by default?


--
Alexey

Dmitry Vyukov

Aug 6, 2020, 10:34:26 AM
to Alexey Kardashevskiy, syzkaller
What any fork server typically does: precreating some resources and
doing some setup before fork() so that we don't need to repeat it for
every test.
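As an illustration only (subshells standing in for fork(); the function names are made up, this is not syz-executor's real code):

```shell
# The fork server does the expensive setup once...
setup() { echo "one-time setup done"; }

# ...then every test runs in a forked child that inherits the set-up
# state and can crash or be killed without redoing the setup.
run_test() { echo "test $1 ran in a forked child"; }

setup
for n in 0 1 2; do
    ( run_test "$n" )   # subshell = fork: fresh, disposable child per test
done
```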

> And what is
> that flag_threaded business? Does it run every test in 2 threads by default?

If it's set, syscalls are executed in separate worker threads; this
allows execution to continue past blocking syscalls.
If it's not set, everything is executed in the main thread.
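The idea can be illustrated with background jobs standing in for the worker threads (a sketch of the concept only; the real executor uses actual threads, and these function names are invented):

```shell
# Each "syscall" gets its own worker, so one that blocks forever does
# not stall the rest of the program.
blocking_call() { sleep 1000; }             # stand-in for a stuck mount()
quick_call()    { echo "quick call done"; }

blocking_call &  stuck=$!
quick_call    &  ok=$!
wait "$ok"                   # only wait for the worker that finished
echo "main flow not blocked"
kill "$stuck" 2>/dev/null    # give up on the stuck worker
```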

Alexey Kardashevskiy

Aug 6, 2020, 8:08:24 PM
to Dmitry Vyukov, syzkaller
I have a sequence of 3 syscalls:

open$dir
openat$null
mount$9p_fd

and I can see the syscalls being called in thread#1; it gets stuck in
mount() (yeah, a bug), and then thread#2 does exactly the same and gets
stuck too. What is the idea here? If mount() is stuck and there is
something after mount() in the program, will syzkaller continue? I am
missing the point here.


--
Alexey

Dmitry Vyukov

Aug 7, 2020, 2:58:25 AM
to Alexey Kardashevskiy, syzkaller
That's probably "collide" mode.
Run syz-execprog with the -debug flag and -threaded/-collide set to
0/1; it will show what happens.

Alexey Kardashevskiy

Aug 9, 2020, 10:56:15 PM
to Dmitry Vyukov, syzkaller


On 07/08/2020 16:58, Dmitry Vyukov wrote:

>>>> And what is
>>>> that flag_threaded business? Does it run every test in 2 threads by default?
>>>
>>> If it's set, syscalls are executed in separate worker threads, this
>>> allows it to continue after blocking syscalls.
>>> If it's not set, everything is executed in the main thread.
>>
>> I have a sequence of 3 syscalls:
>>
>> open$dir
>> openat$null
>> mount$9p_fd
>>
>> and I can see the syscalls being called in thread#1; it gets stuck in
>> mount() (yeah, a bug), and then thread#2 does exactly the same and gets
>> stuck too. What is the idea here? If mount() is stuck and there is
>> something after mount() in the program, will syzkaller continue? I am
>> missing the point here.
>
> That's probably "collide" mode.
> Run syz-execprog with -debug flag and -threaded/collide=0/1, it will
> explain what happens.

A-ha, it was this collide mode, so it is intentional and meant for
catching races; cool, thanks.

Another (unrelated) question: how exactly does KCOV guide fuzzing, what
is the mechanism? As I understand it, KCOV tells how a particular
syscall was executed, but is this information consumed by the fuzzer in
any way? Thanks,


--
Alexey

Dmitry Vyukov

Aug 10, 2020, 1:55:05 AM
to Alexey Kardashevskiy, syzkaller
Please start new threads for new unrelated questions; that makes it
easier to search for answers later.