Very high number of "suppressed report"


Christoph Paasch

Jun 28, 2022, 7:47:59 AM
to syzkaller
Hello,

I have a syzkaller instance running and am seeing a very high number of "suppressed report" errors.

Looking at the logs they all seem to be "syz-executor.0 invoked oom-killer".

Besides these, my syzkaller instance isn't finding any other crashers (which is quite suspicious...).


Any suggestions on what kind of logs I could collect to understand what is going on?


Thanks,
Christoph

Aleksandr Nogikh

Jun 28, 2022, 8:19:20 AM
to Christoph Paasch, syzkaller, Space Meyer
Hi Christoph,

(I'm assuming you're fuzzing Linux. In case you're interested in
Darwin, I have Cc'd Space, who was implementing the support.)

That's interesting. Overall, syzkaller does not really worry about
oom-killed syz-executors. It only suppresses a report if e.g.
syz-fuzzer was affected or there are other signs that the system state
is too mangled (see
https://github.com/google/syzkaller/blob/master/pkg/report/linux.go#L115
+ there are per-bug-type suppressions like
https://github.com/google/syzkaller/blob/master/pkg/report/linux.go#L1840).
Is there anything else in your reports/logs that matches the
suppression regexps from the link?

You can try running syzkaller with the `-debug` flag. In that case it
will only run one VM/device and will dump all console output + extra
debug output. Maybe this can shed some more light on the problem.

FWIW also note that syzkaller can actually be quite memory-demanding.
It runs a syz-fuzzer process on the target device, which keeps the whole
corpus in memory, and each syz-executor process can take, I think, more
than 50 MB of RAM, depending on the configuration. The `procs` param of
the syz-manager config specifies the number of syz-executors.
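
A rough back-of-the-envelope version of that memory budget, as a minimal sketch. The 50 MB per-executor figure is only the approximate estimate given above, and the corpus size, VM size and executor count in the example are hypothetical placeholders, not measured values.

```python
# Rough per-VM memory estimate for the fuzzing processes.
# Assumptions: ~50 MB per syz-executor (the estimate given above) and a
# hypothetical in-memory corpus size; neither is a measured value.

EXECUTOR_MB = 50  # approximate per-executor footprint, per the message


def estimate_vm_usage_mb(procs: int, corpus_mb: int) -> int:
    """Estimated footprint in MB of the executors plus the in-memory corpus."""
    return procs * EXECUTOR_MB + corpus_mb


def headroom_mb(vm_mem_mb: int, procs: int, corpus_mb: int) -> int:
    """Memory left over for the kernel and everything else in the VM."""
    return vm_mem_mb - estimate_vm_usage_mb(procs, corpus_mb)


if __name__ == "__main__":
    # Hypothetical example: 8 executors, a 200 MB corpus, a 2048 MB VM.
    print(estimate_vm_usage_mb(8, 200), headroom_mb(2048, 8, 200))
```

If the headroom comes out small or negative for a given configuration, lowering the number of executors or raising the VM memory would be the obvious knobs to try.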

--
Best Regards,
Aleksandr

Christoph Paasch

Jun 28, 2022, 1:05:08 PM
to Aleksandr Nogikh, syzkaller, Space Meyer
Hello,

On Jun 28, 2022, at 5:19 AM, 'Aleksandr Nogikh' via syzkaller <syzk...@googlegroups.com> wrote:

> (I'm assuming you're fuzzing Linux. In case you're interested in
> Darwin, I have Cc'd Space, who was implementing the support.)

Yes, it's Linux. I have been running syzkaller for the past several years on a modified Linux kernel.

> You can try running syzkaller with a `-debug` flag, in that case it
> will only run one VM/device and will dump all console output + extra
> debug output. Maybe this can shed some more light on the problem.

I will try -debug and also try with an upstream kernel, just to rule out that the custom pieces are causing this.

> FWIW also note that syzkaller can be actually quite memory-demanding.
> It runs a syz-fuzzer process on a target device, which keeps the whole
> corpus in memory, and each syz-executor process can take I think more
> than 50mb of RAM, depending on the configuration. The `proc` param of
> the syz-manager config specifies the number of syz-executors.

I give 2GB to the syzkaller instances. I haven't changed that setting in years.


Thanks for your feedback, I will come back to you once I have gathered more info with -debug.


Christoph



Aleksandr Nogikh

Jun 29, 2022, 4:21:30 AM
to Christoph Paasch, syzkaller, Space Meyer
Hi Christoph,

On Tue, Jun 28, 2022 at 7:05 PM Christoph Paasch <cpa...@apple.com> wrote:
>
>
> I will try -debug and also try with an upstream kernel - just to exclude that it's the custom pieces that cause this.

I think that the most important step now is to figure out why
syzkaller actually suppresses those reports, i.e. which
kernel-printed line triggers the suppression.
If possible, it would be very helpful to see the full report
(even better, also the log) of one such crash. You can also
match the report(s) you have against the suppression regexps from linux.go
yourself.
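
Matching a log against a list of suppression patterns by hand could look roughly like the sketch below. The two patterns are illustrative assumptions modeled on OOM-killer messages; the authoritative list is the one in pkg/report/linux.go and may differ.

```python
import re

# Illustrative patterns only: the real suppression list lives in
# pkg/report/linux.go and may differ from these two examples.
EXAMPLE_SUPPRESSIONS = [
    re.compile(r"Killed process \d+ \(syz-fuzzer\)"),
    re.compile(r"Out of memory: Kill process \d+ \(syz-fuzzer\)"),
]


def find_suppressing_lines(log_text, patterns=EXAMPLE_SUPPRESSIONS):
    """Return (line_number, line) pairs that match any of the patterns."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in patterns):
            hits.append((number, line))
    return hits
```

Running `find_suppressing_lines(open("syzkaller.log").read())` on a saved console log would then point at the lines worth looking at first.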

>
> I give 2GB to the syzkaller instances.

Yeah, that should be more than enough.

> Haven't changed that setting in years.

And it started failing as you described just recently, right?

Christoph Paasch

Jun 29, 2022, 11:29:04 AM
to Aleksandr Nogikh, syzkaller, Space Meyer
Hello,
Attached is the log.
syzkaller.log

Aleksandr Nogikh

Jun 30, 2022, 5:47:45 AM
to Christoph Paasch, syzkaller, Space Meyer
Hi Christoph,

Thanks for the log and for the info!

The syz-fuzzer process gets killed here due to an OOM situation:
[ 1203.383996] Out of memory (oom_kill_allocating_task): Killed
process 1423 (syz-fuzzer) total-vm:3997212kB, anon-rss:3140548kB,
file-rss:0kB, shmem-rss:572kB, UID:0 pgtables:6332kB oom_score_adj:0

syz-fuzzer's anon-rss is almost 3GB, that's crazy..
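
To pull the numbers out of such a line programmatically, a minimal sketch follows. The regular expressions are assumptions fitted to the one line quoted above, not a general parser for every kernel version's OOM output.

```python
import re

# The OOM-killer line quoted above, joined back into a single line.
OOM_LINE = (
    "[ 1203.383996] Out of memory (oom_kill_allocating_task): Killed "
    "process 1423 (syz-fuzzer) total-vm:3997212kB, anon-rss:3140548kB, "
    "file-rss:0kB, shmem-rss:572kB, UID:0 pgtables:6332kB oom_score_adj:0"
)

# Matches fields such as "anon-rss:3140548kB".
FIELD_RE = re.compile(r"([a-z-]+):(\d+)kB")


def parse_oom_kill(line):
    """Return (process_name, {field: kB}) extracted from an OOM-kill line."""
    name = re.search(r"Killed process \d+ \(([^)]+)\)", line)
    fields = {key: int(value) for key, value in FIELD_RE.findall(line)}
    return (name.group(1) if name else None), fields


def kb_to_gib(kb):
    """Convert kilobytes (as printed by the kernel) to GiB."""
    return kb / (1024 * 1024)
```

For the line above, `kb_to_gib(fields["anon-rss"])` comes out just under 3 GiB, which is the figure referred to in the message.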

Do you run syz-manager with "sandbox": "none"? If yes, I think there's
a chance that it somehow corrupts the syz-fuzzer process. Could you
try to run it with "sandbox": "namespace" and see if it changes
anything?

If it doesn't, then we likely have (introduced?) some memory leak
problem in syz-fuzzer.

--
Best Regards,
Aleksandr
>
> I give 2GB to the syzkaller instances.
>
>
> Yeah, that should be more than enough.
>
>
> Actually, it's 4GB. But that should be even better ;-)
>
>
> Haven't changed that setting in years.
>
>
> And it started failing as you described just recently, right?
>
>
> Yes, just recently. I updated my kernel to 5.4.197+ custom changes and I updated syzkaller as well.
>
> Right now, I confirmed that plain 5.4.197 has the same problems. I'm now trying to bisect syzkaller to see if a change in syzkaller introduced the "regression". If I don't make progress on that today, I will try bisecting the kernel.
>
>
> Christoph

Christoph Paasch

Jun 30, 2022, 6:36:32 PM
to Aleksandr Nogikh, syzkaller, Space Meyer


> On Jun 30, 2022, at 2:47 AM, 'Aleksandr Nogikh' via syzkaller <syzk...@googlegroups.com> wrote:
>
> Do you run syz-manager with "sandbox": "none"? If yes, I think there's
> a chance that it somehow corrupts the syz-fuzzer process. Could you
> try to run it with "sandbox": "namespace" and see if it changes
> anything?

This is my config-file:

$ cat my.cfg
{
    "target": "linux/amd64",
    "http": "127.0.0.1:56741",
    "workdir": "/home/cpaasch/gopath/src/github.com/google/syzkaller/workdir_kasan_net",
    "kernel_obj": "/mnt/tmp/build/",
    "kernel_src": "/mnt/tmp/mptcp_syzkaller/",
    "kernel_build_src": "/mnt/tmp/mptcp_syzkaller/",
    "image": "/home/cpaasch/syzkaller/tools/stretch.img",
    "sshkey": "/home/cpaasch/syzkaller/tools/stretch.id_rsa",
    "syzkaller": "/home/cpaasch/gopath/src/github.com/google/syzkaller",
    "disable_syscalls": ["perf_event_open", "syz_mount_image", "syz_read_part_table", "openat$ttyprintk", "mount", "mkdir", "openat$ptmx", "mq_open", "fsetxattr", "rt_tgsigqueueinfo", "ioctl$VT_RESIZE", "ioctl$TIOCVHANGUP", "get_robust_list", "openat$nullb", "ioctl$SCSI_IOCTL_SEND_COMMAND", "lremovexattr", "mknod$loop", "write$binfmt_script", "syz_open_dev$sg", "write$nbd", "prlimit64", "write$P9_RRENAMEAT", "fcntl$addseals", "finit_module", "ioctl$KDSETMODE", "write$FUSE_NOTIFY_STORE", "ioctl$TIOCL_SETVESABLANK", "fsmount", "socket$vsock_stream", "socketpair$unix", "socket$nl_audit", "connect$unix", "bind$unix", "openat$loop_ctrl", "syz_io_uring_setup", "ioctl$FS_IOC_RESVSP", "openat$cdrom", "syz_genetlink_get_family_id$devlink", "ioctl$FS_IOC_SETVERSION", "sendmsg$SMC_PNETID_FLUSH"],
    "procs": 8,
    "type": "qemu",
    "vm": {
        "count": 24,
        "cmdline": "net.ifnames=0",
        "kernel": "/mnt/tmp/build/arch/x86/boot/bzImage",
        "cpu": 2,
        "mem": 4096
    }
}


But - I see that the default value for "sandbox" is "none", right?

I will give it a shot with "namespace".



Christoph

Christoph Paasch

Jul 1, 2022, 11:28:25 AM
to Christoph Paasch, Aleksandr Nogikh, syzkaller, Space Meyer
Coming back to this - even with sandbox: "namespace", I keep on getting oom-kills.


Christoph

Aleksandr Nogikh

Jul 1, 2022, 11:33:15 AM
to Christoph Paasch, syzkaller, Space Meyer
On Fri, Jul 1, 2022 at 5:28 PM Christoph Paasch <cpa...@apple.com> wrote:
>
> Coming back to this - even with sandbox: "namespace", I keep on getting oom-kills.
>
>
> Christoph

Thank you very much for testing this and for providing the details!

I'll try to reproduce your setup on my side (using upstream Linux
5.4.197) and see if the situation is the same.

--
Best regards,
Aleksandr

Christoph Paasch

Jul 1, 2022, 11:45:09 AM
to Aleksandr Nogikh, syzkaller, Space Meyer
Thanks, Aleksandr!

It takes at least an hour on 24 VMs to start producing these OOM-kills.

I am bisecting syzkaller and am currently between 3f1e91ed11e657c596a375b100695a8b4e0d6e37 and e2d91b1d0dd8c8b4760986ec8114469246022bb8. Does that ring a bell?


Christoph


Christoph Paasch

Jul 1, 2022, 7:32:09 PM
to Christoph Paasch, Aleksandr Nogikh, syzkaller, Space Meyer

On Jul 1, 2022, at 8:45 AM, 'Christoph Paasch' via syzkaller <syzk...@googlegroups.com> wrote:

> I am bisecting syzkaller and am currently between 3f1e91ed11e657c596a375b100695a8b4e0d6e37 and e2d91b1d0dd8c8b4760986ec8114469246022bb8. Does that ring a bell?

I take this back. My bisection was incorrect. Seems rather to be a kernel issue (?).


Christoph


Dmitry Vyukov

Jul 2, 2022, 3:23:34 AM
to Christoph Paasch, Aleksandr Nogikh, syzkaller, Space Meyer
On Sat, 2 Jul 2022 at 01:32, 'Christoph Paasch' via syzkaller
<syzk...@googlegroups.com> wrote:
> On Jul 1, 2022, at 8:45 AM, 'Christoph Paasch' via syzkaller <syzk...@googlegroups.com> wrote:
>
> I take this back. My bisection was incorrect. Seems rather to be a kernel issue (?).

Does this correlate with corpus/coverage growth?
corpus/coverage are the main moving parts in the syz-fuzzer that can
consume lots of memory. It's also consistent with the fact that it
starts happening after a few hours of running.
What are your corpus/coverage numbers? Maybe you noticed that they are
higher than before?
In the past, it has happened that the fuzzer managed to produce
programs that yield fake, argument-controlled coverage. E.g. you execute
syscall "foo(0x42)" and get kernel coverage 0x42, then execute
"foo(0x43)" and get 0x43. This can explode corpus/coverage. Check the
syscalls page, it should show per-syscall corpus/coverage. Maybe
something stands out there.
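
A toy model of that failure mode, as a sketch only: the two coverage functions and the "keep an input if it adds new coverage" rule are simplified stand-ins, not syzkaller's actual triage logic.

```python
def honest_coverage(arg):
    """A call whose covered code does not depend on the argument value."""
    return {0x1000, 0x1004, 0x1008}


def leaky_coverage(arg):
    """A call whose reported coverage echoes the argument (the bad case)."""
    return {0x1000, arg}


def grow_corpus(coverage_fn, args):
    """Keep an input only if it contributes coverage not seen before."""
    seen, corpus = set(), []
    for arg in args:
        new = coverage_fn(arg) - seen
        if new:
            corpus.append(arg)
            seen |= new
    return corpus, seen


if __name__ == "__main__":
    args = range(0x42, 0x42 + 1000)
    for fn in (honest_coverage, leaky_coverage):
        corpus, seen = grow_corpus(fn, args)
        print(fn.__name__, len(corpus), len(seen))
```

With the honest function only the first input is kept; with the leaky one every distinct argument looks like new coverage, so both the corpus and the coverage set grow with the number of inputs tried.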

Christoph Paasch

Jul 2, 2022, 11:15:31 AM
to Dmitry Vyukov, Aleksandr Nogikh, syzkaller, Space Meyer


> On Jul 2, 2022, at 12:23 AM, Dmitry Vyukov <dvy...@google.com> wrote:
>
> Does this correlate with corpus/coverage growth?

My corpus db is indeed much bigger than in the past. And at the start it takes a long time to load the corpus db.

> corpus/coverage are the main moving parts in the syz-fuzzer that can
> consume lots of memory. It's also consistent with the fact that it
> starts happening after a few hours of running.
> What are your corpus/coverage numbers?

They are indeed much higher than usual. I’m away from my computer now, but can give exact numbers on Tuesday, if you need them.

Should I delete my corpus file and start fresh?


Thanks,
Christoph

Dmitry Vyukov

Jul 4, 2022, 4:01:31 AM
to Christoph Paasch, Aleksandr Nogikh, syzkaller, Space Meyer
On Sat, 2 Jul 2022 at 17:15, Christoph Paasch <cpa...@apple.com> wrote:
> Should I delete my corpus file and start fresh?

It depends on what happened there.
If we believe it was one-in-gazillion luck on the fuzzer side, we can
assume it will never learn this program again and just drop the
corpus.
But generally we try to prevent the fuzzer from doing this.

I would check the syscalls page and see if there is one or a few syscalls dominating.
If so, I would use tools/syz-db to unpack corpus.db and then grep for
the offending syscall.
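
As an alternative to grepping for one name, the calls across an unpacked corpus can be tallied in one go. This is a minimal sketch that assumes the corpus was unpacked into a directory with one text program per file; the regular expression is an assumption fitted to program lines like the ones seen in reproducers, not a full parser for the syzkaller program syntax.

```python
import re
from collections import Counter
from pathlib import Path

# Matches the call name at the start of lines such as
#   r0 = creat(&(0x7f0000000080)='./file\x00', 0x0)
#   flock(0xffffffffffffffff, 0x1)
# Comment lines starting with '#' do not match.
CALL_RE = re.compile(r"^(?:r\d+ = )?([A-Za-z0-9_$]+)\(")


def count_calls(program_text):
    """Count call names in the text of a single program."""
    counts = Counter()
    for line in program_text.splitlines():
        match = CALL_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


def count_calls_in_dir(corpus_dir):
    """Sum the per-program counts over every file in corpus_dir."""
    total = Counter()
    for path in Path(corpus_dir).iterdir():
        if path.is_file():
            total += count_calls(path.read_text(errors="replace"))
    return total
```

`count_calls_in_dir("unpacked").most_common(20)` (with a hypothetical directory name) would then list the most frequent calls.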

Christoph Paasch

Jul 5, 2022, 11:54:37 AM
to Dmitry Vyukov, Aleksandr Nogikh, syzkaller, Space Meyer
I have two instances, both with the same "oom-problem".

Here is the output of the syscalls page:



That does not look like it is limited to a few syscalls.


Unless you need any other debug-info, I'm just gonna delete the corpus.db so that I can start fuzzing again.



Thanks for your help, Dmitry & Aleksandr!


Christoph



Dmitry Vyukov

Jul 6, 2022, 4:46:56 AM
to Christoph Paasch, Aleksandr Nogikh, syzkaller, Space Meyer
On Tue, 5 Jul 2022 at 17:54, Christoph Paasch <cpa...@apple.com> wrote:
> I have two instances, both with the same "oom-problem".
>
> Here is the output of the syscalls page:

Yes, there is something wrong.
Here is how it looks on syzbot's largest instance:

syz_clone [3836] 2403 54205 prio
syz_clone3 [3837] 666 50281 prio
syz_emit_ethernet [3838] 1002 39078 prio
syz_emit_vhci [3839] 6 633 prio
syz_extract_tcp_res [3841] 0 0 prio
syz_extract_tcp_res$synack [3842] 1 520 prio
syz_fuse_handle_req [3843] 66 24738 prio
syz_genetlink_get_family_id$SEG6 [3844] 4 2381 prio
syz_genetlink_get_family_id$batadv [3845] 48 12591 prio
syz_genetlink_get_family_id$devlink [3846] 29 18448 prio
syz_genetlink_get_family_id$ethtool [3847] 69 22755 prio
syz_genetlink_get_family_id$fou [3848] 19 3258 prio
syz_genetlink_get_family_id$gtp [3849] 8 3157 prio
syz_genetlink_get_family_id$ieee802154 [3850] 19 9499 prio
syz_genetlink_get_family_id$ipvs [3851] 9 4785 prio
syz_genetlink_get_family_id$l2tp [3852] 30 6601 prio
syz_genetlink_get_family_id$mptcp [3853] 7 9445 prio
syz_genetlink_get_family_id$nbd [3854] 10 3566 prio
syz_genetlink_get_family_id$net_dm [3855] 0 0 prio
syz_genetlink_get_family_id$netlbl_calipso [3856] 4 2194 prio
syz_genetlink_get_family_id$netlbl_cipso [3857] 4 3751 prio
syz_genetlink_get_family_id$netlbl_mgmt [3858] 8 2491 prio
syz_genetlink_get_family_id$netlbl_unlabel [3859] 4 6668 prio
syz_genetlink_get_family_id$nl80211 [3860] 113 32760 prio
syz_genetlink_get_family_id$nl802154 [3861] 34 11794 prio
syz_genetlink_get_family_id$smc [3862] 12 10564 prio


And it affects all syscalls, not just one. Which suggests KCOV returns
random coverage all the time.
Do you use/load modules by any chance? Modules loaded at different
addresses may lead to this.
Somebody added some support for modules, but I don't remember if that
support detects badly loaded modules.
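
For comparing two such listings, the rows can be parsed into numbers. This is a sketch under the assumption that the two numeric columns after the bracketed ID are the per-syscall input count and coverage count, in that order; the "share of the largest syscalls" measure is only an illustrative way to see whether a few calls dominate.

```python
import re

# Matches rows such as "syz_clone [3836] 2403 54205 prio".
# Assumed column order: name, [id], inputs, coverage.
ROW_RE = re.compile(r"^(\S+) \[(\d+)\] (\d+) (\d+) prio$")


def parse_rows(text):
    """Parse copied rows of the syscalls page into dictionaries."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            rows.append({
                "name": match.group(1),
                "id": int(match.group(2)),
                "inputs": int(match.group(3)),
                "cover": int(match.group(4)),
            })
    return rows


def top_share(rows, n=3):
    """Fraction of all corpus inputs attributed to the n largest syscalls."""
    total = sum(row["inputs"] for row in rows)
    if total == 0:
        return 0.0
    largest = sorted((row["inputs"] for row in rows), reverse=True)[:n]
    return sum(largest) / total
```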

Christoph Paasch

Jul 6, 2022, 11:58:57 AM
to Dmitry Vyukov, Aleksandr Nogikh, syzkaller, Space Meyer
No, I'm not loading any modules.

I'm also now hitting a lot of "SYZFATAL: executor NUM failed NUM times[...]".

Maybe something on my debian-image got messed up. I will start fresh and generate a new image.


Christoph


Christoph Paasch

Jul 7, 2022, 11:48:26 AM
to Christoph Paasch, Dmitry Vyukov, Aleksandr Nogikh, syzkaller, Space Meyer
After I started with a fresh debian-image, no more oom-kills, but a very large number of "SYZFATAL: executor NUM failed NUM times [...]" and no other panics :-/

Most of them are reporting "no space left on device"


Any ideas?

Thanks,
Christoph




Dmitry Vyukov

Jul 8, 2022, 1:34:50 AM
to Christoph Paasch, Aleksandr Nogikh, syzkaller, Space Meyer
I think we are seeing the same on syzbot.
In the past such cases happened when the fuzzer messed with the underlying fs:
https://github.com/google/syzkaller/blob/bff65f44b47bd73f56c3d6a5c3899de5f5775136/sys/linux/init.go#L316-L325
Potentially more of such restrictions are necessary. But this requires
localizing the problematic test program first.

Christoph Paasch

Jul 8, 2022, 1:16:39 PM
to Dmitry Vyukov, Aleksandr Nogikh, syzkaller, Space Meyer
My instance found reproducers and quite a few of them have:
r0 = creat(&(0x7f0000000080)='./cgroup.net/devices.allow\x00', 0x0)

Let me know if there is anything I can do to help.


Christoph







Dmitry Vyukov

Jul 11, 2022, 4:22:00 AM
to Christoph Paasch, Aleksandr Nogikh, syzkaller, Space Meyer
Humm.. is it the whole program? Not sure how creation of that cgroup
file can eat all disk space... but who knows.
What would be actionable is a program that reproduces "no space left
on device" when run in the VM as:
$ syz-execprog -repeat=0 -procs=4 prog
Then disk state can be examined and the root cause localized.

Christoph Paasch

Jul 11, 2022, 11:57:37 AM
to Dmitry Vyukov, Aleksandr Nogikh, syzkaller, Space Meyer
No, the reproducers all have more statements as well.

> Not sure how creation of that cgroup
> file can eat all disk space... but who knows.
> What would be actionable is a program that reproduces "no space left
> on device" when run in the VM as:
> $ syz-execprog -repeat=0 -procs=4 prog
> Then disk state can be examined and the root cause localized.

Yes, I can repro:

root@syzkaller:~# ./syz-execprog -repeat=0 -procs=4 ./repro
2022/07/11 15:54:52 parsed 1 programs
2022/07/11 15:54:52 executed programs: 0
2022/07/11 15:55:04 SYZFATAL: executor failed 11 times: failed to create temp dir: mkdir syzkaller-testdir161155160: no space left on device
root@syzkaller:~# cat repro
# {Threaded:false Repeat:true RepeatTimes:0 Procs:8 Slowdown:1 Sandbox:none Leak:false NetInjection:true NetDevices:true NetReset:true Cgroups:true BinfmtMisc:true CloseFDs:true KCSAN:false DevlinkPCI:false USB:false VhciInjection:false Wifi:false IEEE802154:false Sysctl:true UseTmpDir:true HandleSegv:true Repro:false Trace:false LegacyOptions:{Collide:false Fault:false FaultCall:0 FaultNth:0}}
flock(0xffffffffffffffff, 0x1)
r0 = creat(&(0x7f0000000080)='./cgroup.net/devices.allow\x00', 0x0)
fallocate(r0, 0x11, 0x0, 0xffffffff000)
syz_clone(0x52d0300, 0x0, 0x0, 0x0, 0x0, 0x0)


All of the disk space is consumed in /syzcgroup/net/
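
To see how much the fallocate call in this repro asks for, its mode and length arguments can be decoded. A minimal sketch: the flag values are the ones from the kernel's linux/falloc.h header, and the helper names are made up for illustration.

```python
# fallocate() mode bits as defined in the kernel's linux/falloc.h.
FALLOC_FLAGS = {
    0x01: "FALLOC_FL_KEEP_SIZE",
    0x02: "FALLOC_FL_PUNCH_HOLE",
    0x04: "FALLOC_FL_NO_HIDE_STALE",
    0x08: "FALLOC_FL_COLLAPSE_RANGE",
    0x10: "FALLOC_FL_ZERO_RANGE",
    0x20: "FALLOC_FL_INSERT_RANGE",
    0x40: "FALLOC_FL_UNSHARE_RANGE",
}


def decode_mode(mode):
    """Return the names of the flag bits set in a fallocate mode value."""
    return [name for bit, name in sorted(FALLOC_FLAGS.items()) if mode & bit]


def length_tib(length):
    """Convert a byte count to TiB."""
    return length / 2**40


if __name__ == "__main__":
    # Arguments from the repro: fallocate(r0, 0x11, 0x0, 0xffffffff000)
    print(decode_mode(0x11), length_tib(0xFFFFFFFF000))
```

For the repro above, mode 0x11 decodes to KEEP_SIZE plus ZERO_RANGE and the requested length is just under 16 TiB, far more than the root filesystem of the image can hold.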


Christoph


Dmitry Vyukov

Jul 12, 2022, 10:48:11 AM
to Christoph Paasch, Aleksandr Nogikh, syzkaller, Space Meyer
OK, this is a start.

This does not reproduce for me. I get to:
2022/07/12 14:39:40 executed programs: 87201
and according to df disk space is not consumed at all.

syz-executor tries to mount cgroup fs at /syzcgroup to allow some
testing of cgroups:
https://github.com/google/syzkaller/blob/master/executor/common_linux.h#L3459

I am thinking maybe your kernel does not have cgroups enabled, and
these mounts fail and instead of creating cgroups this program
actually creates normal files. Then obviously the fallocate call will
cause huge disk space consumption.
Normally, syz-executor also cleans up all temp files after tests, but
not in /syzcgroup dir.

What's the state of /syzcgroup? What's mounted there? What are the
contents after the test?
You can get some debug output with:
./syz-execprog -debug prog

If that's the case, perhaps we should remove /syzcgroup and links to
it if cgroup mounts fail.
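
Checking which of the expected directories are really mount points can be scripted from the mount table. A minimal sketch, assuming /proc/mounts-style input; the list of expected paths used in the example is an assumption for illustration.

```python
def mount_points(mounts_text):
    """Return {mount_point: fs_type} parsed from /proc/mounts-style text."""
    result = {}
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            result[parts[1]] = parts[2]
    return result


def unmounted(expected, mounts_text):
    """Return the expected paths that do not appear as mount points."""
    mounted = mount_points(mounts_text)
    return [path for path in expected if path not in mounted]


# Hypothetical list of directories that should be cgroup mounts.
EXPECTED = ["/syzcgroup/unified", "/syzcgroup/cpu", "/syzcgroup/net"]
```

Inside the VM, `unmounted(EXPECTED, open("/proc/mounts").read())` would list the directories that are plain directories on the root filesystem, where files created by a test program take up real disk space.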

Christoph Paasch

Jul 12, 2022, 12:46:57 PM
to Dmitry Vyukov, Aleksandr Nogikh, syzkaller, Space Meyer
I do have cgroups enabled, but not all of the options:
$  grep -i cgroup ./.config
CONFIG_CGROUPS=y
# CONFIG_BLK_CGROUP is not set
CONFIG_CGROUP_SCHED=y
CONFIG_CGROUP_PIDS=y
# CONFIG_CGROUP_RDMA is not set
CONFIG_CGROUP_FREEZER=y
# CONFIG_CGROUP_HUGETLB is not set
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
# CONFIG_CGROUP_PERF is not set
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_CGROUP_NET_PRIO is not set
# CONFIG_CGROUP_NET_CLASSID is not set


> Then obviously the fallocate call will
> cause huge disk space consumption.
> Normally, syz-executor also cleans up all temp files after tests, but
> not in /syzcgroup dir.
>
> What's the state of /syzcgroup? What's mounted there? What are the
> contents after the test?

root@syzkaller:/syzcgroup# mount | grep syzcgroup
none on /syzcgroup/unified type cgroup2 (rw,relatime)
none on /syzcgroup/cpu type cgroup (rw,relatime,cpuset,clone_children)

root@syzkaller:/syzcgroup# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       2.0G  1.9G     0 100% /
devtmpfs        1.7G     0  1.7G   0% /dev
tmpfs           1.7G     0  1.7G   0% /dev/shm
tmpfs           1.7G  9.6M  1.7G   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           1.7G     0  1.7G   0% /sys/fs/cgroup


> You can get some debug output with:
> ./syz-execprog -debug prog

root@syzkaller:~# ./syz-execprog -debug prog -repeat=0 -procs=4 ./repro
2022/07/12 16:38:42 parsed 1 programs
2022/07/12 16:38:42 executed programs: 0
spawned loop pid 1434
mount(fusectl) failed: 16
netlink: add addr 172.30.0.4 dev nr3: No such device
[... a bunch of these ...]
netlink: device wg2 up master NULL: No such device
mount of binder at /dev/binderfs failed: 19
[780ms] exec opts: procid=3 threaded=1 cover=0 comps=0 dedup=1 signal=0 timeouts=50/5000/1 prog=0 filter=0
spawned worker pid 2
#0 [782ms] -> flock(0xffffffffffffffff, 0x1)
#0 [782ms] <- flock=0xffffffffffffffff errno=9
#0 [782ms] -> creat(0x20000080, 0x0)
#0 [782ms] <- creat=0x3
#0 [783ms] -> fallocate(0x3, 0x11, 0x0, 0xffffffff000)
#0 [796ms] <- fallocate=0xffffffffffffffff errno=28
#0 [797ms] -> syz_clone(0x52d0300, 0x0, 0x0, 0x0, 0x0, 0x0)
#0 [797ms] <- syz_clone=0xffffffffffffffff errno=22
2022/07/12 16:38:43 result: hanged=false err=<nil>
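
The numeric codes in this output can be turned into names with a few lines. A minimal sketch, assuming the numbers after "failed:" are errno values just like the explicit errno= fields.

```python
import errno
import os


def describe(code):
    """Return 'NAME: message' for an errno value, or '?' if unknown."""
    return f"{errno.errorcode.get(code, '?')}: {os.strerror(code)}"


if __name__ == "__main__":
    # Codes that appear in the debug output above.
    for code in (16, 19, 9, 28, 22):
        print(code, describe(code))
```

On Linux, 28 is ENOSPC, which matches the "no space left on device" message from the earlier SYZFATAL reports.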


Christoph




Dmitry Vyukov

Jul 13, 2022, 3:46:40 AM
to Christoph Paasch, Aleksandr Nogikh, syzkaller, Space Meyer
Yes, it looks like it. I've filed
https://github.com/google/syzkaller/issues/3241 for this.
Do you want to fix it?

Christoph Paasch

Jul 13, 2022, 1:36:57 PM
to Dmitry Vyukov, Aleksandr Nogikh, syzkaller, Space Meyer
I'm not sure I will have the time to do a proper pull-request.

However, in the issue you mention that it happens because the NET_PRIO and NET_CLASSID kernel configs are not set. I enabled them and the bug still happens.
Could it be because I'm on a 5.4 kernel that does not have the net and rlimit cgroups?


Christoph


Dmitry Vyukov

Jul 15, 2022, 4:03:15 AM
to Christoph Paasch, Aleksandr Nogikh, syzkaller, Space Meyer
I think this should fix it:
https://github.com/google/syzkaller/pull/3242


> However, in the issue you mention that it happens because the NET_PRIO and NET_CLASSID kernel configs are not set. I enabled them and the bug still happens.
> Could it be because I'm on a 5.4 kernel that does not have the net and rlimit cgroups?

I think it's something else.
Maybe your user-space system mounts all these cgroup controllers
somewhere else in init. IIRC they can't be mounted twice.
If that's the case, the PR will fix ENOSPC, but these cgroup
controllers will still be untested.

I think our buildroot images shouldn't be mounting any cgroup
controllers in init:
https://github.com/google/syzkaller/blob/master/tools/create-buildroot-image.sh
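
For anyone who wants to check whether init has already mounted cgroup controllers in their image, here is a minimal sketch. It only assumes the standard /proc/mounts line layout (device, mount point, filesystem type, options). SAMPLE_MOUNTS is made-up example data, not output from the system discussed in this thread, and both function names are hypothetical helpers rather than anything from syzkaller.

```python
# Sketch: list cgroup mounts from /proc/mounts-style text.
# SAMPLE_MOUNTS is made-up illustration data, not taken from a real VM.
SAMPLE_MOUNTS = """\
/dev/root / ext4 rw,relatime 0 0
tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
"""


def cgroup_mounts(mounts_text):
    """Return (mountpoint, fstype, options) for every cgroup or cgroup2 mount."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mountpoint, fstype, options = fields[1], fields[2], fields[3].split(",")
        if fstype in ("cgroup", "cgroup2"):
            found.append((mountpoint, fstype, options))
    return found


def mounts_with_controller(mounts_text, controller):
    """Return mount points whose options name the given cgroup v1 controller."""
    return [mp for mp, _fstype, opts in cgroup_mounts(mounts_text)
            if controller in opts]


for mountpoint, fstype, options in cgroup_mounts(SAMPLE_MOUNTS):
    print(fstype, mountpoint, ",".join(options))
```

On the VM itself, the contents of /proc/mounts can be passed in place of SAMPLE_MOUNTS. A non-empty result for a controller such as net_prio would suggest that init mounted it before the executor tried to.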

Christoph Paasch

Jul 15, 2022, 7:43:32 PM
to Dmitry Vyukov, Aleksandr Nogikh, syzkaller, Space Meyer
Yes, this works!

>
>
>> However, in the issue you mention that it happens because the NET_PRIO and NET_CLASSID kernel configs are not set. I enabled them and the bug still happens.
>> Could it be because I'm on a 5.4 kernel that does not have the net and rlimit cgroups?
>
> I think it's something else.
> Maybe your user-space system mounts all these cgroup controllers
> somewhere else in init. IIRC they can't be mounted twice.
> If that's the case, the PR will fix ENOSPC, but these cgroup
> controllers will still be untested.

Seems like that's indeed what is happening.

>
> I think our buildroot images shouldn't be mounting any cgroup
> controllers in init:
> https://github.com/google/syzkaller/blob/master/tools/create-buildroot-image.sh

I use create-image.sh, not this one. Does that make a difference?


Thanks,
Christoph


Dmitry Vyukov

Jul 16, 2022, 2:35:53 AM
to Christoph Paasch, Aleksandr Nogikh, syzkaller, Space Meyer
create-image.sh uses a Debian distro, so it's possible that it messes with cgroups in init.
The buildroot image boots faster and does fewer surprising things.

Christoph Paasch

Jul 28, 2022, 12:06:13 PM
to Dmitry Vyukov, Aleksandr Nogikh, syzkaller, Space Meyer
I see! I'm running on buildroot now. Looks good for now :-)

One frequent SYZFATAL I get now is:
SYZFATAL: executor NUM failed NUM times: failed to start executor binary: fork/exec /syz-executor: permission denied

It's not as bad as the no-space one, but still quite frequent.


Christoph




Dmitry Vyukov

Jul 28, 2022, 1:46:30 PM
to Christoph Paasch, Aleksandr Nogikh, syzkaller, Space Meyer
Good!

> One frequent SYZFATAL I get now is:
> SYZFATAL: executor NUM failed NUM times: failed to start executor binary: fork/exec /syz-executor: permission denied
>
> It's not as bad as the no-space one, but still quite frequent.

It seems that the fuzzer messes with the syz-executor binary... or
with the mount. Without a reproducer it's hard to tell.

pengfei xu

Aug 12, 2025, 5:29:38 AM
to syzkaller
Hi Christoph Paasch,

I ran into a similar issue on an ARM64 platform: a very high number of "suppressed report" errors without any crash.

Do you see an error like the one below when syzkaller initializes?
"

Comparisons             : mmap of data segment failed. want 0x20000000, got 0xffffffffffffffff (errno 17: File exists). . process exited with status 67.

"
If so, you could try changing the kernel config and rebuilding the kernel Image:

change CONFIG_ARM64_MTE=y to CONFIG_ARM64_MTE=n; having MTE enabled may cause the mmap failure above.

With that change, the "mmap of data segment failed." error was gone and I no longer saw the "suppressed report" issue.

Thanks!
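
To check how an option such as CONFIG_ARM64_MTE is set in a given kernel build, a minimal sketch follows. It assumes the usual .config conventions ("CONFIG_X=value" for set options, "# CONFIG_X is not set" for disabled ones). SAMPLE_CONFIG is a made-up fragment and kconfig_value is a hypothetical helper name.

```python
import re

# Made-up .config fragment, for illustration only.
SAMPLE_CONFIG = """\
CONFIG_ARM64=y
CONFIG_ARM64_MTE=y
# CONFIG_KASAN_HW_TAGS is not set
CONFIG_NR_CPUS=256
"""


def kconfig_value(config_text, option):
    """Return the option's value, "n" if explicitly not set, or None if absent."""
    set_re = re.compile(r"^%s=(.*)$" % re.escape(option))
    unset_re = re.compile(r"^# %s is not set$" % re.escape(option))
    for line in config_text.splitlines():
        line = line.strip()
        match = set_re.match(line)
        if match:
            return match.group(1)
        if unset_re.match(line):
            return "n"
    return None


print("CONFIG_ARM64_MTE =", kconfig_value(SAMPLE_CONFIG, "CONFIG_ARM64_MTE"))
```

Reading the real .config from the kernel build directory and passing its text in place of SAMPLE_CONFIG gives the value for the kernel actually being fuzzed.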

Aleksandr Nogikh

Aug 12, 2025, 5:45:53 AM
to pengfei xu, syzkaller
Hi Pengfei,

On Tue, Aug 12, 2025 at 11:29 AM pengfei xu <xpf...@gmail.com> wrote:
>
> Hi Christoph Paasch,
>
> I ran into a similar issue on an ARM64 platform: a very high number of "suppressed report" errors without any crash.
>
> Do you see an error like the one below when syzkaller initializes?
> "
>
> Comparisons : mmap of data segment failed. want 0x20000000, got 0xffffffffffffffff (errno 17: File exists). . process exited with status 67.

On what syzkaller revision do you observe the error above?
https://github.com/google/syzkaller/commit/5ba0fed13435213276f29e3d9e39d926f04ac1a8
should have fixed a very similar looking problem.

--
Aleksandr

pengfei xu

Aug 12, 2025, 10:17:02 PM
to syzkaller
Hi Aleksandr Nogikh,
   Thanks for your suggestion!
   I updated syzkaller to the latest revision and reused the previous kernel config with CONFIG_ARM64_MTE=y.
   The "mmap of data segment failed. want 0x20000000, got 0xffffffffffffffff (errno 17: File exists)" issue is now fixed.
   Your commit 81ed97dd2689fd fixed the issue above, thanks!

  However, there are still many "suppressed report" errors. Do you have any tips for dealing with them?

  Thanks!

Aleksandr Nogikh

Aug 13, 2025, 6:15:06 AM
to pengfei xu, syzkaller
On Wed, Aug 13, 2025 at 4:17 AM pengfei xu <xpf...@gmail.com> wrote:
>
> Hi Aleksandr Nogikh,
> Thanks for your suggestion!
> I updated syzkaller to the latest revision and reused the previous kernel config with CONFIG_ARM64_MTE=y.
> The "mmap of data segment failed. want 0x20000000, got 0xffffffffffffffff (errno 17: File exists)" issue is now fixed.
> Your commit 81ed97dd2689fd fixed the issue above, thanks!
>
> However, there are still many "suppressed report" errors. Do you have any tips for dealing with them?

I'd recommend looking at the suppressed reports themselves - if they
are suppressed, there's usually a reason for that. If these are mostly
OOMs, then it may be worth adding more RAM to the VMs, using fewer
procs, or checking what is missing in syzkaller's executor code to
prevent them.
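
As a rough way to tell whether the suppressed reports are mostly OOMs, here is a minimal sketch that counts "invoked oom-killer" lines per process name in console log text. The regular expression assumes the usual kernel log shape (an optional "[ seconds.micros]" timestamp followed by the process name). SAMPLE_LOG contains made-up lines in that shape, count_oom_invocations is a hypothetical helper name, and process names containing spaces are not handled.

```python
import re
from collections import Counter

# Optional "[  947.445712] " timestamp, then the process name, then the marker.
OOM_RE = re.compile(r"^(?:\[\s*\d+\.\d+\]\s+)?(\S+) invoked oom-killer")

# Made-up log lines in the usual kernel format, for illustration only.
SAMPLE_LOG = """\
[  947.445712] syz-executor invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
[  947.456058] Call trace:
syz-executor.0 invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
[  950.000001] syz-executor invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
"""


def count_oom_invocations(log_text):
    """Count oom-killer invocations per invoking process name."""
    counts = Counter()
    for line in log_text.splitlines():
        match = OOM_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


for name, count in count_oom_invocations(SAMPLE_LOG).most_common():
    print(name, count)
```

If nearly every suppressed report shows up in this count, memory pressure is the likely cause, and more RAM per VM or a lower procs value is the first thing to try.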

pengfei xu

Aug 13, 2025, 9:20:54 PM
to syzkaller

Hi Aleksandr Nogikh,
    Thanks for the suggestion!
    Yes, the suppressed reports mention out-of-memory (OOM) conditions and the process being terminated.
    I'll try increasing the memory size or reducing the number of procs.

[  947.445712] syz-executor invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
[  947.455193] CPU: 1 PID: 171 Comm: syz-executor Not tainted 6.6.88-f148c32bf96d-gf148c32bf96d #8
[  947.455778] Hardware name: linux,dummy-virt (DT)
[  947.456058] Call trace:
[  947.456217]  dump_backtrace+0x11c/0x1dc
[  947.456499]  show_stack+0x2c/0x4c
[  947.456707]  dump_stack_lvl+0x68/0x84
[  947.456951]  dump_stack+0x20/0x2c
[  947.457167]  dump_header+0xf0/0x6e4
[  947.457417]  oom_kill_process+0x174/0x4a8
[  947.457687]  out_of_memory+0xeec/0x11a8
[  947.457933]  __alloc_pages_may_oom+0x224/0x340
[  947.458223]  __alloc_pages+0xa44/0x171c
...
BR.
Thanks!