Trying to find code related to Syzbot's (syz-hub's?) corpus behavior


Joseph Bursey

Aug 18, 2023, 2:56:00 PM
to syzk...@googlegroups.com, Ardalan Amiri Sani, Zhiyun Qian
Hello! I've been looking into how Syzbot handles its corpus, specifically how seeds are shared and how the corpus grows and is pruned, if at all. While observing Syzbot, I've come across some behavior that I can't pin down in the source code. I have noticed that every night at 00:00 UTC, the corpus and corresponding coverage drop by a few thousand. Notably, when I run Syzkaller without syz-hub, this behavior does not occur, so I'm pretty sure it is related to syz-hub. In short, what causes this drop?

While looking for the code, I found a function called purgeCorpus() in syz-hub/state/state.go, which appears to delete unused items from the corpus. If this is the cause, how do these unused items come about?

Thanks in advance!

Joseph (Joey) Bursey
Graduate
Department of Computer Science
University of California, Irvine

Aleksandr Nogikh

Aug 21, 2023, 5:18:06 AM
to Joseph Bursey, syzk...@googlegroups.com, Ardalan Amiri Sani, Zhiyun Qian
Hello,

On Fri, Aug 18, 2023 at 8:56 PM Joseph Bursey <jbu...@uci.edu> wrote:
>
> Hello! I've been looking into how Syzbot handles its corpus, specifically how seeds are shared and how the corpus grows and is pruned, if at all.

Take a look at minimizeCorpus() [1], which is executed by syz-manager
once in a while.
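
As a rough illustration of what coverage-based corpus minimization means in general, here is a minimal sketch. It is not the code behind minimizeCorpus(); the `Input` type, the `minimize` function and the sample data are all hypothetical, and the strategy (consider larger inputs first, keep an input only if it adds signal not yet covered) is an assumption about the general technique.

```go
package main

import (
	"fmt"
	"sort"
)

// Input is a hypothetical stand-in for a corpus program together with
// the coverage signal (for example, edge IDs) that it triggers.
type Input struct {
	Name   string
	Signal []uint32
}

// minimize returns the inputs that contribute at least one signal element
// not already covered by a previously kept input. Inputs with more signal
// are considered first; ties keep their original order.
func minimize(corpus []Input) []Input {
	sorted := append([]Input(nil), corpus...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return len(sorted[i].Signal) > len(sorted[j].Signal)
	})
	covered := make(map[uint32]bool)
	var kept []Input
	for _, in := range sorted {
		addsNew := false
		for _, s := range in.Signal {
			if !covered[s] {
				covered[s] = true
				addsNew = true
			}
		}
		if addsNew {
			kept = append(kept, in)
		}
	}
	return kept
}

func main() {
	corpus := []Input{
		{Name: "a", Signal: []uint32{1, 2}},
		{Name: "b", Signal: []uint32{1, 2, 3}},
		{Name: "c", Signal: []uint32{4}},
		{Name: "d", Signal: []uint32{2, 3}},
	}
	for _, in := range minimize(corpus) {
		fmt.Println(in.Name) // prints "b" then "c"
	}
}
```

In this toy corpus, "a" and "d" are dropped because "b" already covers everything they trigger, so the corpus shrinks without losing coverage.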

If you want to see some corpus growth graphs, you can run syzkaller
with the `-bench` argument and supply the resulting file to
tools/syz-benchcmp to generate an .html file. There's also a tool[2]
for mass-running syzkaller instances and grouping the results.

[1] https://github.com/google/syzkaller/blob/d216d8a03b50bef82eac746d227230835f061640/syz-manager/manager.go#L1210
[2] https://github.com/google/syzkaller/blob/master/docs/syz_testbed.md

> Through observing Syzbot, I've come across some behavior that I can't pin down in the source code. I have noticed that every night at 00:00 UTC, the corpus and corresponding coverage drop by a few thousand. Notably, when I run Syzkaller without syz-hub, this same behavior does not occur, so I'm pretty sure this is related to syz-hub. In short, what causes this drop?

There's actually a pretty simple explanation. What we display is the
max corpus size over the present day:
https://github.com/google/syzkaller/blob/master/dashboard/app/entities.go#L41

If an instance restarted during the day, it will likely not get back to
its earlier corpus maximum until the following day. Hence the corpus
size drop on the web dashboard. Syz-hub has nothing to do with it.
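
To make the effect concrete, here is a minimal sketch of a "max over the present day" display. It is not the dashboard code; the `Sample` type, the `displayedSeries` function and all the numbers are hypothetical and chosen only to show how a mid-day restart turns into a drop at the day boundary.

```go
package main

import "fmt"

// Sample is a hypothetical corpus-size report from one instance.
type Sample struct {
	Day    string // UTC date of the report
	Corpus int    // corpus size at the time of the report
}

// displayedSeries returns, for each sample in chronological order, the
// value a "max over the present day" display would show right after
// that sample arrives. The running maximum resets when the day changes.
func displayedSeries(samples []Sample) []int {
	out := make([]int, 0, len(samples))
	day, cur := "", 0
	for _, s := range samples {
		if s.Day != day {
			day, cur = s.Day, 0
		}
		if s.Corpus > cur {
			cur = s.Corpus
		}
		out = append(out, cur)
	}
	return out
}

func main() {
	samples := []Sample{
		{Day: "2023-08-17", Corpus: 40000},
		{Day: "2023-08-17", Corpus: 41000},
		{Day: "2023-08-17", Corpus: 30000}, // instance restarted, corpus still reloading
		{Day: "2023-08-17", Corpus: 35000},
		{Day: "2023-08-18", Corpus: 36000}, // first report after 00:00 UTC
		{Day: "2023-08-18", Corpus: 38000},
	}
	fmt.Println(displayedSeries(samples))
	// prints [40000 41000 41000 41000 36000 38000]
}
```

The displayed value stays at 41000 for the rest of the first day even though the real corpus fell after the restart, and the gap only becomes visible once the maximum resets at midnight.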

> While looking for the code, I found a function called purgeCorpus() in syz-hub/state/state.go, which appears to delete unused items from the corpus. If this is the cause, how do these unused items come about?

In syz-hub, purgeCorpus() just removes from the db any progs that are
not in use by any instance that's using syz-hub.
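
The idea can be sketched as follows. This is not the syz-hub implementation; the `purge` function, the map-based db and the manager and program names are hypothetical, and the sketch assumes each instance keeps a set of the program hashes it currently uses.

```go
package main

import "fmt"

// purge deletes from db every program hash that is not referenced by the
// corpus of any manager, and returns how many entries were removed.
// db maps a program hash to the program data; managers maps a manager
// name to the set of program hashes that manager currently uses.
func purge(db map[string][]byte, managers map[string]map[string]bool) int {
	used := make(map[string]bool)
	for _, corpus := range managers {
		for sig := range corpus {
			used[sig] = true
		}
	}
	removed := 0
	for sig := range db {
		if !used[sig] {
			delete(db, sig) // deleting during range is safe in Go
			removed++
		}
	}
	return removed
}

func main() {
	db := map[string][]byte{
		"p1": []byte("prog 1"),
		"p2": []byte("prog 2"),
		"p3": []byte("prog 3"),
	}
	managers := map[string]map[string]bool{
		"manager-a": {"p1": true},
		"manager-b": {"p1": true, "p3": true},
	}
	fmt.Println(purge(db, managers), len(db)) // prints "1 2"
}
```

Here "p2" is the only program no manager references, so it is the only one removed.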

--
Aleksandr

Joseph Bursey

Aug 21, 2023, 1:50:53 PM
to Aleksandr Nogikh, syzk...@googlegroups.com, Ardalan Amiri Sani, Zhiyun Qian
Thank you! This makes a lot more sense now.

- Joey Bursey

