[syzbot] Request for bulk access to syzbot crash reports for research

이한

Dec 11, 2025, 1:11:54 AM
to syzk...@googlegroups.com
Dear syzkaller team,

My name is Han Lee, and I am a security researcher studying Linux kernel bug behavior and memory safety detection patterns.  
As part of my current research, I am analyzing syzbot bugs to understand which Sanitizer (KASAN, KMSAN, KCSAN, UBSAN, KMEMLEAK, etc.) actually detected each bug.
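
As a rough illustration of this attribution step, a report body can be scanned for the header line each sanitizer prints. This is only a minimal sketch: the marker strings are assumptions based on commonly seen kernel report headers, `detect_sanitizers` is a hypothetical helper name, and real syzbot reports may contain variants that these patterns miss.

```python
import re

# Assumed header markers, one per sanitizer. They are illustrative and
# may not cover every report format that syzbot stores.
SANITIZER_PATTERNS = [
    ("KASAN", re.compile(r"\bBUG: KASAN:")),
    ("KMSAN", re.compile(r"\bBUG: KMSAN:")),
    ("KCSAN", re.compile(r"\bBUG: KCSAN:")),
    ("UBSAN", re.compile(r"\bUBSAN:")),
    ("KMEMLEAK", re.compile(r"\bBUG: memory leak\b|\bkmemleak\b")),
]


def detect_sanitizers(report_text: str) -> list:
    """Return the names of all sanitizers whose marker appears in the text.

    Names are returned in the order of SANITIZER_PATTERNS, not in the
    order in which they occur in the report.
    """
    return [
        name
        for name, pattern in SANITIZER_PATTERNS
        if pattern.search(report_text)
    ]
```

Returning a list rather than a single name keeps reports that mention more than one sanitizer visible, which matters when several reports under the same extid have to be compared.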

To do this accurately, I need to process all crash reports associated with each bug extid, because different reports under the same extid may be triggered by different sanitizers or different reproductions.  
At the moment I am collecting these reports via the public web interface, but I have encountered the rate limit:

    "429 Too Many Requests
     Allowed rate is 15 requests per 15 seconds."

I understand and respect the limit, and I would like to avoid placing unnecessary load on your servers.  
Therefore, I am writing to ask whether there is a way to obtain this data in bulk or through a more efficient access method.
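
For context, a client can stay within the quoted limit by throttling itself before every request. The following is a minimal sketch of a sliding-window throttle, assuming the limit really is 15 requests in any 15-second window as the error message states. `SlidingWindowThrottle` and its parameters are hypothetical names, not part of syzkaller.

```python
import time
from collections import deque


class SlidingWindowThrottle:
    """Block until another request fits into the current time window."""

    def __init__(self, max_requests=15, window_seconds=15.0,
                 clock=time.monotonic, sleep=time.sleep):
        # clock and sleep are injectable so the class can be tested
        # without actually waiting.
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # start times of recent requests

    def wait(self):
        """Return once a request may be sent, recording its timestamp."""
        while True:
            now = self._clock()
            # Forget requests that have left the window.
            while self._stamps and now - self._stamps[0] >= self._window:
                self._stamps.popleft()
            if len(self._stamps) < self._max:
                self._stamps.append(now)
                return
            # Sleep until the oldest recorded request leaves the window.
            self._sleep(self._window - (now - self._stamps[0]))
```

A caller would create one throttle and call `wait()` immediately before each HTTP request. Choosing a `max_requests` slightly below 15 leaves some margin in case the server counts the window differently.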

Specifically, the data I am hoping to access includes:

- All crash report texts for each bug extid (the content behind the “report” links),
- Basic metadata such as extid, title, subsystem, and manager,
- And, if possible, historical crash reports for the upstream/fixed bugs set.

This data will be used solely for academic research on sanitizer coverage, kernel bug classification, and behavioral analysis of kernel faults.  
I am happy to comply with any restrictions or requirements, and I will properly acknowledge syzkaller/syzbot in all resulting research outputs.

If bulk access is possible (e.g., via a downloadable archive, an internal API endpoint, or by granting a higher per-user rate limit), I would sincerely appreciate your guidance.  
If certain parts of the data cannot be shared, any subset you are able to provide would still be extremely helpful.

Thank you very much for your time and for maintaining syzkaller — an invaluable resource for the kernel and security community.  
I look forward to your reply.

Best regards,  
Han Lee

Aleksandr Nogikh

Dec 11, 2025, 1:19:23 AM
to 이한, Taras Madan, syzk...@googlegroups.com
Hi Han Lee,

On Thu, Dec 11, 2025 at 3:11 PM 이한 <tw...@korea.ac.kr> wrote:
Specifically, the data I am hoping to access includes:

- All crash report texts for each bug extid (the content behind the “report” links),
- Basic metadata such as extid, title, subsystem, and manager,
- And, if possible, historical crash reports for the upstream/fixed bugs set.


We don't have archives that export exactly this kind of data, but we do bulk-export the json=1 versions of the per-bug pages, e.g.
https://syzkaller.appspot.com/bug?extid=e06bb7478e687f235ad7&json=1

This should hopefully simplify things for you. You can find the export here:
https://storage.googleapis.com/artifacts.syzkaller.appspot.com/shared-files/repro-export/upstream.tar.gz
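
Once the archive has been downloaded, it can be read without unpacking it to disk. The sketch below is a hedged example: the internal layout of `upstream.tar.gz` is not described in this thread, so the code simply assumes that some members are JSON files and skips everything that does not parse. `iter_json_records` is a hypothetical helper name.

```python
import json
import tarfile


def iter_json_records(fileobj):
    """Yield (member_name, parsed_json) for each JSON file in a .tar.gz.

    fileobj must be a binary file object positioned at the start of the
    archive. Members that are not regular files, or that do not contain
    valid UTF-8 JSON, are skipped silently.
    """
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue
            extracted = archive.extractfile(member)
            if extracted is None:
                continue
            try:
                record = json.loads(extracted.read().decode("utf-8"))
            except (UnicodeDecodeError, json.JSONDecodeError):
                continue  # assumption: non-JSON members can be ignored
            yield member.name, record
```

Typical use would be `with open("upstream.tar.gz", "rb") as f:` followed by a loop over `iter_json_records(f)`, where the local file name is whatever the download was saved as.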

@Taras Madan it has around 8.5k records. Are these open and fixed bugs?

이한

Dec 11, 2025, 2:00:23 AM
to Aleksandr Nogikh, syzk...@googlegroups.com, taras...@google.com

Dear syzkaller team,

Thank you very much for providing the JSON crash metadata.
It is extremely helpful for mapping each bug extid to its associated crash reports.

However, after reviewing the data, I found that I still need to fetch the full crash report text from each `crash-report-link` entry.
Since these links must be accessed individually via the public `/text?tag=CrashReport&x=...` endpoint, my script still needs to issue a separate HTTP request per crash report.
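
To make the request volume concrete, the links can first be gathered and deduplicated from each JSON record. This is a minimal sketch under stated assumptions: only the key name `crash-report-link` comes from the data described above, the nesting of the records is not specified here, so the code walks the whole structure recursively. `collect_report_links` and `unique_report_links` are hypothetical helper names.

```python
from urllib.parse import urljoin

# Host taken from the URLs quoted in this thread.
BASE_URL = "https://syzkaller.appspot.com"


def collect_report_links(node, key="crash-report-link"):
    """Return every string stored under `key`, at any nesting depth.

    Relative links are resolved against BASE_URL.
    """
    links = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key and isinstance(value, str):
                links.append(urljoin(BASE_URL, value))
            else:
                links.extend(collect_report_links(value, key))
    elif isinstance(node, list):
        for item in node:
            links.extend(collect_report_links(item, key))
    return links


def unique_report_links(record):
    """Return the links of one record without duplicates, order preserved."""
    return list(dict.fromkeys(collect_report_links(record)))
```

Deduplicating before fetching avoids spending part of the request budget on the same report twice.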


Because my research requires analyzing all crash reports associated with each extid to determine which Sanitizer (KASAN, KMSAN, KCSAN, UBSAN, KMEMLEAK, etc.) actually detected the bug, this results in a large number of requests.
As a result, I consistently hit the public rate limit:

    "429 Too Many Requests – Allowed rate is 15 requests per 15 seconds."

I fully understand the need for rate limiting, and I want to avoid placing unnecessary load on your infrastructure.
To proceed with the research, I would like to kindly ask whether one of the following could be possible:


1. Bulk access to all crash report texts for the bugs included in the JSON dump
   (e.g., a downloadable compressed archive containing the report bodies).
2. A higher rate limit for my research IP or a dedicated API key with expanded quota.
3. Any internal format or dataset (even partially sanitized) containing the raw crash report contents.


The data will be used strictly for academic research on sanitizer coverage patterns, bug classification, and kernel debugging behavior.
I will properly acknowledge syzkaller/syzbot in all research output.


If some options above are not possible, I would sincerely appreciate any alternative method you can suggest that allows retrieving the crash report text more efficiently without overwhelming the public interface.


Thank you once again for your support and for providing the JSON dataset.
Your help is invaluable to my ongoing research.


Best regards,
Han Lee
