Re: [QUESTION] How to generate random corrupted filesystem image by syzkaller?

Aleksandr Nogikh

Nov 5, 2024, 5:08:31 AM
to Zhihao Cheng, syzk...@googlegroups.com, Zhihao Cheng, zhangyi (F), yang...@huawei.com, dvy...@google.com
Hi Zhihao,

The syz-imagegen generates valid fs images to seed the fuzzing
process. It's expected that fsck.ext4 will succeed on all of them.

During fuzzing, syzkaller will start mutating corpus programs, which
eventually leads to mounts of corrupted images and kernel crashes
similar to the report you shared. You don't need to do anything
special to force that behavior, it's the default.

--
Aleksandr

On Tue, Nov 5, 2024 at 10:22 AM Zhihao Cheng
<cheng...@huaweicloud.com> wrote:
>
> Hi, I have one question: how can I generate random corrupted
> filesystem images with syzkaller?
>
> I noticed tools/syz-imagegen, which can generate various kinds of fs
> images with different mkfs options, for example:
> [root@localhost syzkaller]$ ./bin/syz-imagegen -fs ext4 --keep
> generated images: 63/63
> [root@localhost syzkaller]$ ls sys/linux/test/syz_mount_image_ext4_*
> sys/linux/test/syz_mount_image_ext4_0
> sys/linux/test/syz_mount_image_ext4_28.img
> sys/linux/test/syz_mount_image_ext4_47.img
> sys/linux/test/syz_mount_image_ext4_0.img
> sys/linux/test/syz_mount_image_ext4_29
> sys/linux/test/syz_mount_image_ext4_48
>
> All *.img files pass the fsck.ext4 check and can be mounted
> successfully. After looking through the code
> (tools/syz-imagegen/imagegen.go), I think syz-imagegen does not inject
> corruptions into fs images.
>
> So, how can I generate corrupted filesystem images with syzkaller? It
> looks like syzkaller can do it, because I did find a problem caused by
> a corrupted syz image[1].
>
> [1] https://syzkaller.appspot.com/bug?extid=ae688d469e36fb5138d0
>

Aleksandr Nogikh

Nov 5, 2024, 8:34:19 AM
to Zhihao Cheng, syzk...@googlegroups.com, Zhihao Cheng, zhangyi (F), yang...@huawei.com, dvy...@google.com
On Tue, Nov 5, 2024 at 12:10 PM Zhihao Cheng
<cheng...@huaweicloud.com> wrote:
>
>
>
> On 2024/11/5 18:08, Aleksandr Nogikh wrote:
> > Hi Zhihao,
> >
> > The syz-imagegen generates valid fs images to seed the fuzzing
> > process. It's expected that fsck.ext4 will succeed on all of them.
> >
> > During fuzzing, syzkaller will start mutating corpus programs, which
> > eventually leads to mounts of corrupted images and kernel crashes
> > similar to the report you shared. You don't need to do anything
> > special to force that behavior, it's the default.
> >
>
> Thanks for replying, Aleksandr. Is there any way to save those
> corrupted fs images during fuzzing? I want to collect some corrupted fs
> images to test my newly developed fsck tool (fsck.ubifs, for the UBIFS
> filesystem). After fsck checks and repairs them, the corrupted fs images
> should be mountable and accessible normally; that is my use case for
> fuzzing fs images.
>

It is theoretically possible to hack the syzkaller code to extract and
record those images while fuzzing the kernel, but I wonder if you
might be better off just using a userspace coverage-guided fuzzer like
libafl or https://github.com/google/fuzztest. You can supply it with a
set of pregenerated fs images and it will mutate them to better cover
and exercise specifically your tool's code.

--
Aleksandr

Aleksandr Nogikh

Nov 6, 2024, 9:06:32 AM
to Zhihao Cheng, syzk...@googlegroups.com, Zhihao Cheng, zhangyi (F), yang...@huawei.com, dvy...@google.com
On Wed, Nov 6, 2024 at 2:38 PM Zhihao Cheng <cheng...@huaweicloud.com> wrote:
> Hi, Aleksandr. Thanks for the suggestions. I found another simple
> tool, fsfuzzer (https://github.com/stevegrubb/fsfuzzer), whose mutation
> algorithm (mangle.c) is simple. Do you know where syzkaller's mutation
> is implemented (in which source file can I find the mutation
> algorithm)? I want to do a comparison.

The complexity of syzkaller's mutation engine comes from having to
deal with not just binary blobs, but with highly structured arguments
/ pointers / resources / syscall dependencies. For filesystems, we do
not describe their structure, these are just blobs of data.

The related code is as follows:
1) We find the potentially interesting locations within the raw binary
image: https://github.com/google/syzkaller/blob/master/prog/heatmap.go
2) Then we randomly pick one of those locations and corrupt it:
https://github.com/google/syzkaller/blob/df3dc63b8ba0b52ca67025f5b55cd4356b3eda75/prog/mutation.go#L435-L451
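
A minimal sketch of these two steps, for readers who want to experiment outside syzkaller. It is not a port of heatmap.go or mutation.go: it assumes that "interesting" means fixed-size chunks that are not a single repeated byte, and the chunk size, the function names, and the single bit flip are all illustrative choices.

```python
import random

GRANULARITY = 64  # hypothetical chunk size, chosen only for illustration


def interesting_chunks(data, granularity=GRANULARITY):
    """Return start offsets of chunks that are not one repeated byte."""
    offsets = []
    for start in range(0, len(data), granularity):
        chunk = data[start:start + granularity]
        if len(set(chunk)) > 1:
            offsets.append(start)
    return offsets


def corrupt_once(data, rng, granularity=GRANULARITY):
    """Flip one random bit inside one randomly chosen interesting chunk."""
    chunks = interesting_chunks(data, granularity)
    if not chunks:
        return data  # nothing but constant filler, leave the image alone
    start = rng.choice(chunks)
    end = min(start + granularity, len(data))
    pos = rng.randrange(start, end)
    out = bytearray(data)
    out[pos] ^= 1 << rng.randrange(8)
    return bytes(out)
```

Calling `corrupt_once` repeatedly on a valid seed image and writing each result to disk is one simple way to collect corrupted images for testing an fsck tool.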

--
Aleksandr

Zhihao Cheng

Nov 7, 2024, 2:28:08 AM
to Aleksandr Nogikh, syzk...@googlegroups.com, Zhihao Cheng, zhangyi (F), yang...@huawei.com, dvy...@google.com
Thanks Aleksandr. The mutation engine does not look too complex, and it
looks more reasonable than fsfuzzer's. Maybe I can turn it into a
standalone tool.

Dmitry Vyukov

Nov 7, 2024, 3:57:42 AM
to Zhihao Cheng, Aleksandr Nogikh, syzk...@googlegroups.com, Zhihao Cheng, zhangyi (F), yang...@huawei.com
You may also try conventional fuzzing engines (AFL++, libafl,
libfuzzer) with a good seed corpus.

Theoretically you can combine coverage from both kernel and fsck.
Kernel coverage can be collected using KCOV and injected into the
fuzzer engine (at least libfuzzer supports that).

Just to make sure: build fsck with asan/msan.

You can also do differential fuzzing: feed the same image into kernel
mount and fsck. If the kernel refuses to mount it, then fsck must also
report some error for it. If it does not, then it's a logic bug in
fsck.
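
The differential check can be sketched as follows. This is an illustration, not an existing tool: the function names are made up, the fsck command line is supplied by the caller, and exit status 0 is assumed to mean "no errors found", following the usual fsck convention. How the image is attached and mounted is left to the caller.

```python
import subprocess


def fsck_reports_clean(image_path, fsck_cmd):
    """Run the caller-supplied fsck command on the image.

    Assumption: exit status 0 means fsck found nothing wrong.
    """
    result = subprocess.run(
        list(fsck_cmd) + [image_path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def differential_verdict(mount_ok, fsck_clean):
    """Compare the kernel's answer with fsck's answer for one image."""
    if not mount_ok and fsck_clean:
        # The kernel refused the image, yet fsck saw no problem.
        return "suspected fsck bug"
    # The other combinations are not contradictions by themselves.
    return "consistent"
```

Only the "kernel rejects, fsck accepts" case is flagged, because a successful mount does not prove the image is free of errors.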

Zhihao Cheng

Nov 7, 2024, 8:21:26 AM
to Dmitry Vyukov, Aleksandr Nogikh, syzk...@googlegroups.com, Zhihao Cheng, zhangyi (F), yang...@huawei.com
Do you know where I can find a user guide or example on how to mutate fs
images with libafl? I could not find more information at
https://aflplus.plus/libafl-book/baby_fuzzer/more_examples.html.

> Theoretically you can combine coverage from both kernel and fsck.
> Kernel coverage can be collected using KCOV and injected into the
> fuzzer engine (at least libfuzzer supports that).

How is the kernel coverage injected into the fuzzer engine? Is there a
document or source file I can refer to?

Thanks.

Dmitry Vyukov

Nov 7, 2024, 8:30:24 AM
to Zhihao Cheng, Aleksandr Nogikh, syzk...@googlegroups.com, Zhihao Cheng, zhangyi (F), yang...@huawei.com, Dominik Maier
I would start with just the stock mutation provided by the engine.
I don't know details about libafl, but I assume it should have a
"custom mutator" concept as well.

+Dominik for libafl: this is about fuzzing mounting/fsck of disk
images, and using a custom mutator for them.


> > Theoretically you can combine coverage from both kernel and fsck.
> > Kernel coverage can be collected using KCOV and injected into the
> > fuzzer engine (at least libfuzzer supports that).
>
> How is the kernel coverage injected into the fuzzer engine? Is there a
> document or source file I can refer to?

Here is my rough prototype of gluing KCOV and libfuzzer. It needs to
add coverage to this libfuzzer_coverage array:
https://github.com/google/syzkaller/blob/master/tools/kcovfuzzer/kcovfuzzer.c#L119C70-L119C88

I would assume other engines have something similar.
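
The underlying idea is small enough to sketch. The following is not a transcription of kcovfuzzer.c; it only illustrates folding kernel PCs (as KCOV would report them) into a fixed-size counter array that a coverage-guided engine can treat like any other coverage map. The map size, the modulo hashing, and the function names are assumptions made for illustration.

```python
MAP_SIZE = 1 << 15  # hypothetical size of the shared counter array


def fold_pcs_into_counters(pcs, counters):
    """Bump one saturating 8-bit counter per observed kernel PC."""
    for pc in pcs:
        slot = pc % len(counters)
        counters[slot] = min(counters[slot] + 1, 255)


def has_new_coverage(counters, seen):
    """Return True if this run hit a slot no earlier run has hit."""
    new = False
    for slot, count in enumerate(counters):
        if count and not seen[slot]:
            seen[slot] = 1
            new = True
    return new
```

An engine that keeps inputs for which `has_new_coverage` is true would then be guided by kernel coverage in the same way as by userspace coverage. Distinct PCs can collide in one slot, which is the usual trade-off of a fixed-size map.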

Dominik, does something similar exist for libafl?

Zhihao Cheng

Nov 7, 2024, 10:12:03 PM
to Dominik Maier, Dmitry Vyukov, Aleksandr Nogikh, syzk...@googlegroups.com, Zhihao Cheng, zhangyi (F), yang...@huawei.com


On 2024/11/7 22:46, Dominik Maier wrote:
> Hi Zhihao, all,
>
> Thanks for adding me.
>
> >  I would assume other engines have something similar.
>
> LibAFL is very flexible, yes. You can define your own observers such as
> coverage maps, state, etc. I would browse around in the example
> fuzzers folder
> <https://github.com/AFLplusplus/LibAFL/tree/main/fuzzers> to see what's
> possible.
> E.g., this
> <https://github.com/AFLplusplus/LibAFL/blob/d1c746a0a2d697e7ab6294dca8b9fbc908e92251/fuzzers/baby/baby_fuzzer/src/main.rs#L64> simple
> example defines its own MapObserver from a global buffer and uses that
> for a MapFeedback.
> You can then combine different feedbacks with `feedback_and`,
> `feedback_or`, etc. to define if a run should be considered `interesting`
> or not.
> (Interesting samples will be kept for future mutations).
>
> > I would start with just the stock mutation provided by the engine.
> > I don't know details about libafl, but I assume it should have a
> > "custom mutator" concept as well.
>
> Yes, sounds like a very good starting point.
> There are multiple levels of how far you want to take this. The simplest
> one is to use the havoc mutations (the same ones traditional AFL uses)
> plus CmpLog that tries to solve multi-byte comparisons, see this example
> <https://github.com/AFLplusplus/LibAFL/tree/main/fuzzers/inprocess/libfuzzer_libpng_launcher#libfuzzer-for-libpng-with-launcher>
> for a pretty complete fuzzer.
> You may want to fix some hard-to-fuzz fields such as checksums and
> length fields in the harness.
>
> Then, you could try to define a grammar for your target and use Nautilus
> <https://github.com/AFLplusplus/LibAFL/tree/main/fuzzers/structure_aware/baby_fuzzer_nautilus#baby-nautilus-fuzzer>,
> although this feels like it would be a bad fit for file systems as there
> is probably no well-defined grammar for these(?).
> Probably a better idea is to use multipart inputs
> <https://github.com/AFLplusplus/LibAFL/tree/main/fuzzers/structure_aware/baby_fuzzer_multi#baby-fuzzer-multi>
> that can mutate and splice different fields of your input manually, if
> you know how you could split up a filesystem in logical parts.
>
> Lastly, you can write your own mutators from scratch, of course by
> implementing your own Mutator
> <https://docs.rs/libafl/latest/libafl/mutators/trait.Mutator.html>s but
> it's likely not going to outperform the other options in this case,
> unless you really know what you are looking for.
>
> Feel free to follow up with further questions.
>
> Best
> Dominik

Hi Dominik, thanks for the detailed introduction. If I want to learn the
mutation algorithm, can I refer to
https://github.com/AFLplusplus/LibAFL/tree/main/libafl/src/mutators?

Dominik Maier

Nov 13, 2024, 3:45:56 AM
to Dmitry Vyukov, Zhihao Cheng, Aleksandr Nogikh, syzk...@googlegroups.com, Zhihao Cheng, zhangyi (F), yang...@huawei.com
Hi Zhihao, all,

Thanks for adding me. 

>  I would assume other engines have something similar.

LibAFL is very flexible, yes. You can define your own observers such as coverage maps, state, etc. I would browse around in the example fuzzers folder to see what's possible.
E.g., this simple example defines its own MapObserver from a global buffer and uses that for a MapFeedback.
You can then combine different feedbacks with `feedback_and`, `feedback_or`, etc. to define if a run should be considered `interesting` or not.
(Interesting samples will be kept for future mutations).

> I would start with just the stock mutation provided by the engine.
> I don't know details about libafl, but I assume it should have a
> "custom mutator" concept as well.

Yes, sounds like a very good starting point.
There are multiple levels of how far you want to take this. The simplest one is to use the havoc mutations (the same ones traditional AFL uses) plus CmpLog that tries to solve multi-byte comparisons, see this example for a pretty complete fuzzer.
You may want to fix some hard-to-fuzz fields such as checksums and length fields in the harness.
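
As an illustration of fixing a checksum in the harness, here is a minimal sketch. The layout is hypothetical (a 4-byte little-endian CRC32 stored at a known offset and covering the rest of the block), and `zlib.crc32` stands in for whatever checksum the real filesystem uses; actual on-disk formats differ in CRC variant, seed, and coverage.

```python
import struct
import zlib

CRC_SIZE = 4  # hypothetical: 32-bit checksum field


def _crc_of_body(block, crc_offset):
    """CRC32 over everything in the block except the checksum field."""
    body = block[:crc_offset] + block[crc_offset + CRC_SIZE:]
    return zlib.crc32(body) & 0xFFFFFFFF


def fix_checksum(block, crc_offset=0):
    """Rewrite the checksum field so it matches the (mutated) contents."""
    crc = struct.pack("<I", _crc_of_body(block, crc_offset))
    return block[:crc_offset] + crc + block[crc_offset + CRC_SIZE:]


def checksum_ok(block, crc_offset=0):
    """Check the stored checksum the way a parser under test might."""
    (stored,) = struct.unpack_from("<I", block, crc_offset)
    return stored == _crc_of_body(block, crc_offset)
```

Running `fix_checksum` on each mutated input before handing it to the target lets mutations reach the code behind the checksum validation instead of being rejected early.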

Then, you could try to define a grammar for your target and use Nautilus, although this feels like it would be a bad fit for file systems as there is probably no well-defined grammar for these(?).
Probably a better idea is to use multipart inputs that can mutate and splice different fields of your input manually, if you know how you could split up a filesystem in logical parts.

Lastly, you can of course write your own mutators from scratch by implementing your own Mutators, but it's likely not going to outperform the other options in this case, unless you really know what you are looking for.

Feel free to follow up with further questions.

Best
Dominik

Dominik Maier

Nov 13, 2024, 3:46:44 AM
to Zhihao Cheng, Dmitry Vyukov, Aleksandr Nogikh, syzk...@googlegroups.com, Zhihao Cheng, zhangyi (F), yang...@huawei.com
Hi Zhihao,

> Hi Dominik, thanks for the detailed introduction. If I want to learn the
> mutation algorithm, can I refer to
> https://github.com/AFLplusplus/LibAFL/tree/main/libafl/src/mutators?

Yes, that's a collection of most (non-custom) mutators supported by LibAFL.
Specifically, the mutations.rs file contains all interesting havoc mutations (they are picked at random and stacked). The token_mutations.rs file contains some additional mutations that insert known tokens ("dictionary" entries, comparisons learned at runtime) at random places.
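
To make "picked at random and stacked" concrete, here is a minimal sketch of the idea. It is not LibAFL code and does not mirror its API: the function names, the three example mutations, and the stacking limit are all assumptions for illustration.

```python
import random


def bit_flip(data, rng):
    """Flip one random bit."""
    if not data:
        return data
    out = bytearray(data)
    out[rng.randrange(len(out))] ^= 1 << rng.randrange(8)
    return bytes(out)


def byte_random(data, rng):
    """Overwrite one random byte with a random value."""
    if not data:
        return data
    out = bytearray(data)
    out[rng.randrange(len(out))] = rng.randrange(256)
    return bytes(out)


def make_token_insert(tokens):
    """Build a mutation that inserts a known token at a random position."""
    def token_insert(data, rng):
        pos = rng.randrange(len(data) + 1)
        return data[:pos] + rng.choice(tokens) + data[pos:]
    return token_insert


def havoc(data, rng, mutators, max_stack=8):
    """Apply a random number of randomly chosen mutations in sequence."""
    for _ in range(rng.randint(1, max_stack)):
        data = rng.choice(mutators)(data, rng)
    return data
```

For example, `havoc(image, random.Random(), [bit_flip, byte_random, make_token_insert([b"UBI#"])])` returns one mutated copy of `image`; the token list here is only an example of a dictionary entry.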

The other files implement other mutation strategies, such as the previously mentioned grammar mutations, etc.
I don't think you really need to understand them, though: _most_ are pretty simple, and the idea of fuzzing is that, empirically, more executions outperform smarter mutators for most targets.
You can just use mutators from these files in your own project if you don't want to use the whole library for one reason or another. Interfacing Rust with C code is simple (assuming your target is in C?).

Best
Dominik