Re: Inquiry About Syzbot's Historical Coverage Corpus for Linux


Dmitry Vyukov

Mar 22, 2025, 10:19:17 AM
to Xingyu Li, Taras Madan, syzkaller, Zhiyun Qian
+syzkaller, Taras

(I am currently travelling, but Taras may provide an answer)

On Wed, 19 Mar 2025 at 13:03, Xingyu Li <xli...@ucr.edu> wrote:
>
> Hi Dmitry,
>
> I hope this email finds you well!
> My name is Xingyu Li, a PhD student at UC Riverside advised by Prof. Zhiyun Qian. My ongoing project aims to take historical syzbot corpora and find a good seed that
> can reach a given line of kernel code (possibly in an old kernel). I have the following questions:
> 1. I noticed that the "coverage" page on https://syzkaller.appspot.com/upstream maps each covered line in the Linux kernel to the syscall sequence that covers it. But https://syzkaller.appspot.com/upstream only shows this mapping for one specific Linux kernel. Is there a corpus that contains such mappings for all of the old Linux kernel versions?
> 2. I also noticed two corpus links: https://storage.googleapis.com/syzkaller/temp/corpus.db and https://storage.googleapis.com/syzkaller/corpus/ci-upstream-kasan-gce-corpus.db. What are the differences between these two corpora? Are they updated automatically every day? Does either of them contain all of the syscall sequences generated by syzkaller over the past several years? Can I treat the code covered by the test cases in this corpus as all of the code that syzkaller can cover?
>
> Thanks.
>
>
> --
> Yours sincerely,
> Xingyu

Aleksandr Nogikh

Mar 22, 2025, 9:56:42 PM
to Dmitry Vyukov, Xingyu Li, Taras Madan, syzkaller, Zhiyun Qian
On Sat, Mar 22, 2025 at 7:19 AM 'Dmitry Vyukov' via syzkaller
<syzk...@googlegroups.com> wrote:
> On Wed, 19 Mar 2025 at 13:03, Xingyu Li <xli...@ucr.edu> wrote:
> > 1. I noticed that the "coverage" page on https://syzkaller.appspot.com/upstream maps each covered line in the Linux kernel to the syscall sequence that covers it. But https://syzkaller.appspot.com/upstream only shows this mapping for one specific Linux kernel. Is there a corpus that contains such mappings for all of the old Linux kernel versions?

In our context, "corpus" is just the minimal set of programs that
cover all the code we've been able to reach so far. Did you mean
coverage reports?
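To make the idea of a "minimal set of programs that cover all the code" concrete, here is a toy Python sketch of one common way to shrink a collection of programs while keeping their combined coverage: greedy set cover. This is an illustration under assumptions, not syzkaller's actual corpus-minimization code; the program labels and PC values are made up, and greedy selection yields a small covering set rather than a provably minimum one.

```python
def greedy_cover(programs: dict[str, set[int]]) -> list[str]:
    """Pick programs until every PC covered by any program is covered.

    `programs` maps a program label to the set of PCs (coverage points) it
    reaches. Greedy set cover returns a small covering set, not necessarily
    the smallest possible one.
    """
    uncovered = set().union(*programs.values())
    chosen = []
    while uncovered:
        # Take the program adding the most not-yet-covered PCs; iterating in
        # sorted order makes ties resolve deterministically to the first name.
        best = max(sorted(programs), key=lambda p: len(programs[p] & uncovered))
        chosen.append(best)
        uncovered -= programs[best]
    return chosen


programs = {
    "p1": {1, 2, 3},
    "p2": {3, 4},
    "p3": {4, 5, 6},
    "p4": {1, 6},
}
print(greedy_cover(programs))  # ['p1', 'p3']
```

Here "p2" and "p4" are dropped because everything they reach is already covered by "p1" and "p3".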

There's some historical data in per-manager build history, e.g.
https://syzkaller.appspot.com/upstream/manager/ci-upstream-kasan-gce-root

Also, there's a special page that focuses specifically on historical
coverage data: https://syzkaller.appspot.com/upstream/coverage?period=month

> > 2. I also noticed two corpus links: https://storage.googleapis.com/syzkaller/temp/corpus.db and https://storage.googleapis.com/syzkaller/corpus/ci-upstream-kasan-gce-corpus.db. What are the differences between these two corpora? Are they updated automatically every day? Does either of them contain all of the syscall sequences generated by syzkaller over the past several years? Can I treat the code covered by the test cases in this corpus as all of the code that syzkaller can cover?

The latter one is updated daily. It doesn't contain any coverage data,
it's just the set of syzlang programs that are executed on syzkaller
restart. Only after executing them all is a syzkaller instance
able to construct a PC => program mapping.

--
Aleksandr


Xingyu Li

Mar 22, 2025, 11:56:05 PM
to Aleksandr Nogikh, Dmitry Vyukov, Taras Madan, syzkaller, Zhiyun Qian
Hi Dmitry,

Happy travels! Thanks for forwarding the message to Aleksandr.

Hi Aleksandr,

Thanks for your reply!

> There's some historical data in per-manager build history, e.g.
> https://syzkaller.appspot.com/upstream/manager/ci-upstream-kasan-gce-root

But this link only shows the history of kernel versions fuzzed by syzbot; it has no coverage information and none of the programs that produced that coverage.

> Also, there's a special page that focuses specifically on historical
> coverage data: https://syzkaller.appspot.com/upstream/coverage?period=month

This link shows which lines can be reached, but not the programs that reach them.

> The latter one is updated daily. It doesn't contain any coverage data,
> it's just the set of syzlang programs that are executed on syzkaller
> restart. Only after executing them all is a syzkaller instance
> able to construct a PC => program mapping.

So how was the latter corpus built? Can I assume it is a minimal set of programs that covers all of the code syzkaller has covered over the past several years?

Basically, my goal is to get a syzlang program that can reach any specific line of the Linux kernel (provided, of course, that syzkaller has covered it at some point).
So it seems that https://syzkaller.appspot.com/upstream/coverage?period=month contains all of the covered lines, and https://storage.googleapis.com/syzkaller/corpus/ci-upstream-kasan-gce-corpus.db contains a minimal set of syzlang programs that reach all of the lines in https://syzkaller.appspot.com/upstream/coverage?period=month.
But these two pieces of information alone are not enough for my goal. I need a mapping between them, so that given any specific line, I can get a syzlang program that reaches it.

--
Yours sincerely,
Xingyu

Taras Madan

Mar 26, 2025, 2:20:06 PM
to Xingyu Li, Eduardo Vela Nava, Aleksandr Nogikh, Dmitry Vyukov, syzkaller, Zhiyun Qian
Hi Xingyu Li,
I think the only option for you is to parse our cover.html files.
You can find these reports here, for example. This is one of many.
This is the only storage of per-program coverage that I know of.

I'm currently working on a better export of programs, but there is no plan to store the data for longer than 1 month.
We just don't have many use cases for this data.
You can see the example here (will be auto-deleted soon).

@Eduardo Vela Nava, is your cover.html parsing code public?

BR,
Taras.

Xingyu Li

Mar 26, 2025, 2:54:20 PM
to Taras Madan, Eduardo Vela Nava, Aleksandr Nogikh, Dmitry Vyukov, syzkaller, Zhiyun Qian
Hi Taras,

Thanks for your response.

> You can see the example here (will be auto-deleted soon).

It seems the elements on this webpage look like {"repo": xxx, "commit": xxx, "program": "a syzlang program", "coverage": [{"file_path": "xxx", "functions": [covered line numbers]}]}. Does each element list all of the lines covered by that specific syzlang program? I ask because the number of covered lines can be very large.
And was this generated by @Eduardo Vela Nava's cover.html parsing code? Could you share it if it is not public?
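If the export really is one JSON object per line with roughly the shape described above, the missing line => program mapping can be built by inverting each record's coverage. The Python sketch below works under those assumptions: the field holding line numbers (`LINES_KEY`) is a guess taken from the description above, the helper names are hypothetical rather than part of any syzkaller tool, and choosing the program with the fewest calls as the "good seed" is just one possible heuristic. The program strings in the demo are placeholders, not real syzlang.

```python
from __future__ import annotations

import json
from collections import defaultdict

# Assumed field name: the export was described as "functions": [covered line
# numbers]; change this if the real schema differs.
LINES_KEY = "functions"


def build_line_index(jsonl_text: str) -> dict[tuple[str, int], list[str]]:
    """Map (file_path, line) -> programs whose record lists that line as covered."""
    index: dict[tuple[str, int], list[str]] = defaultdict(list)
    for raw in jsonl_text.splitlines():
        if not raw.strip():
            continue  # tolerate blank lines between records
        record = json.loads(raw)
        for entry in record["coverage"]:
            for line in entry[LINES_KEY]:
                index[(entry["file_path"], line)].append(record["program"])
    return dict(index)


def call_count(program: str) -> int:
    """Rough size of a syzlang program: non-empty, non-comment lines."""
    return sum(1 for ln in program.splitlines()
               if ln.strip() and not ln.lstrip().startswith("#"))


def shortest_program(index: dict[tuple[str, int], list[str]],
                     file_path: str, line: int) -> str | None:
    """Return the smallest program reaching file_path:line, or None if none does."""
    candidates = index.get((file_path, line))
    if not candidates:
        return None
    return min(candidates, key=call_count)


# Demo with two made-up records (placeholder program text).
sample = "\n".join(json.dumps(r) for r in [
    {"program": "open()\nread()\nclose()",
     "coverage": [{"file_path": "fs/read_write.c", LINES_KEY: [10, 11]}]},
    {"program": "read()",
     "coverage": [{"file_path": "fs/read_write.c", LINES_KEY: [11]}]},
])
index = build_line_index(sample)
print(shortest_program(index, "fs/read_write.c", 11))  # read()
```

Because the export is only kept for about a month, an index like this would only reflect recent kernels unless the records are archived separately.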

--
Yours sincerely,
Xingyu

Xingyu Li

Apr 29, 2025, 5:16:17 PM
to Taras Madan, Eduardo Vela Nava, Aleksandr Nogikh, Dmitry Vyukov, syzkaller, Zhiyun Qian
@Eduardo Vela Nava: Hi, is your cover.html parsing code public? If not, could you share it? Much appreciated!
--
Yours sincerely,
Xingyu

Xingyu Li

Sep 11, 2025, 6:15:52 PM
to Taras Madan, Eduardo Vela Nava, Aleksandr Nogikh, Dmitry Vyukov, syzkaller, Zhiyun Qian
Hi Taras and Eduardo,

I wanted to follow up on whether the code to parse cover.html is public, and whether there have been any changes to syzbot to store historical coverage.

Thanks!