Re: Fuzzing ioctls related to other file systems on syzbot


Dmitry Vyukov

Jul 22, 2020, 2:49:49 AM
to Jiaheng Hu, Eric Biggers, Matt Morehouse, Zubin Mithra, syzkaller
On Tue, Jul 21, 2020 at 5:49 PM Jiaheng Hu <jiah...@google.com> wrote:
>
> Hi Dmitry,
>
> I'm a summer intern working on syzkaller and I've been working on writing descriptions for fs-specific ioctls recently (specifically, those of f2fs). I've been able to create an f2fs image on my local machine and fuzz it, but so far it seems that syzbot doesn't run with any f2fs image. I'd like to enable syzbot to fuzz fs-specific ioctls if possible, and would really like to set up a meeting and talk about potential paths moving forward.

+syzkaller mailing list

Hi Jiaheng,

I saw your PRs on github ;)

It's a hard question. There are multiple options, but I don't see any
that is good, reasonably simple, and feels like the right way to do it.
The solution should scale to other filesystems in the future, even if
initially we concentrate just on f2fs.
Another very important aspect to keep in mind is: how syzkaller will
provide reproducers for these bugs and how developers will be able to
reproduce these bugs?

syzkaller can mount filesystems via syz_mount_image pseudo-syscall.
However, it does not know how to generate correct filesystems.
One direction may be just preseeding it with correct images. I did
this in the past for some filesystems (see
tools/syz-imagegen/imagegen.go). The simplest way to do it is to
connect a local seeded instance to the syzbot syz-hub. It's probably
the simplest one to get some results. The downside is that it feels
somewhat like a one-off effort rather than a long-term solution.
syzkaller actually seems to be able to find some fs-specific bugs, e.g.:
https://syzkaller.appspot.com/bug?extid=1f85ab0ddb8405a93116
And I think I saw others as well.

Another direction would be teaching syzkaller how to generate correct images:
https://github.com/google/syzkaller/issues/1020
This is harder. But it is a better long-term investment, and I would
expect it to result in way more bugs (also in image parsing and in
handling of weirdly set-up images, though that may not be a problem for
all threat models).

Another simple one is to set up a separate syzbot instance with an
f2fs root image.
But this feels too one-off and too non-scalable. Setting up and
maintaining instances is currently expensive (it should not be, but
currently it is).

Another direction is to package multiple fs images as files in the
syzbot image and mount them after boot at known locations. Say, we
will have /syzkaller/fs/f2fs/ as mounted f2fs images, and so on for
other filesystems. And then we can make the fuzzer use these dirs
somehow as well. There are multiple options here again:
- pre-create a temp dir in each of the filesystems and link them into
the main temp dir; a downside here is per-test overhead: we will need
to mkdir/link/rmdir per fs, even if it's not used in the test at all
- run the whole test in one/random filesystem
- create and cleanup an fs dir only if we see the program uses it
- create an fs dir explicitly with a new pseudo-syscall

Those are the options I see. Do you have any other suggestions/ideas?

Let's do several iterations over email and then we can do a meeting to
discuss more concrete details. Frankly, I would not be able to come up
with a proper list of ideas like this during a meeting, and I would not
be able to answer arbitrary questions right away either.

Matt Morehouse

Jul 24, 2020, 2:33:44 PM
to Dmitry Vyukov, Jiaheng Hu, Eric Biggers, Zubin Mithra, syzkaller
Naive question:
If we can put the f2fs image file in the syzbot image and mount it as a special resource fd, isn't that all we need to test the ioctls Jiaheng added?

Matt Morehouse

Jul 24, 2020, 3:25:29 PM
to Dmitry Vyukov, Jiaheng Hu, Eric Biggers, Zubin Mithra, syzkaller
Probably this falls under Dmitry's pseudo-syscall approach.  We could have something like this:

int syz_get_f2fs_dirfd() {
  static int f2fs_dirfd = []() {
    syz_mount_image(..., "/syzkaller/fs/f2fs", ...);
    return open("/syzkaller/fs/f2fs", ...);
  }();
  return f2fs_dirfd;
}

Then any *at() syscall has a chance of getting the f2fs dirfd.
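
As a rough illustration of the "open once, reuse the dirfd" idea above, here is a minimal Python sketch. It is not syzkaller code: `FS_ROOT`, `get_fs_dirfd` and `create_in_fs` are hypothetical names, and a temporary directory stands in for the mounted f2fs image, so no mount is performed. It only shows how a cached directory descriptor can be handed to openat-style calls.

```python
import functools
import os
import tempfile

# Assumption: a plain temp dir stands in for the mount point
# ("/syzkaller/fs/f2fs" in the thread); nothing is actually mounted.
FS_ROOT = tempfile.mkdtemp(prefix="fake-f2fs-")


@functools.lru_cache(maxsize=None)
def get_fs_dirfd() -> int:
    """Open the directory on first use and return the same fd afterwards."""
    return os.open(FS_ROOT, os.O_RDONLY | os.O_DIRECTORY)


def create_in_fs(name: str, data: bytes) -> None:
    """Create a file relative to the cached dirfd, like openat(2) would."""
    fd = os.open(name, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600,
                 dir_fd=get_fs_dirfd())
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


create_in_fs("probe", b"hello")
```

Caching the descriptor means the setup cost is paid only if some call actually asks for it, which matches the lazy static initializer in the snippet above.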

Dmitry Vyukov

Jul 25, 2020, 1:20:54 AM
to Matt Morehouse, Jiaheng Hu, Eric Biggers, Zubin Mithra, syzkaller
On Fri, Jul 24, 2020 at 9:25 PM Matt Morehouse <mas...@google.com> wrote:
>
> Probably this falls under Dmitry's pseudo-syscall approach. We could have something like this:
>
> int syz_get_f2fs_dirfd() {
> static int f2fs_dirfd = []() {
> syz_mount_image(..., "/syzkaller/fs/f2fs", ...);
> return open("/syzkaller/fs/f2fs", ...)
> }();
> return f2fs_dirfd;
> }
>
> Then any *at() syscall has a chance of getting the f2fs dirfd.

Yes, this is one of the possible approaches.

Packaging images into the syzbot disk image template is probably not
too hard, though it will involve some manual re-deployment work for each
update.
I see more significant issues with C reproducers. They will become
totally dependent on the image. We will need to provide the whole
image containing all other images, and probably each fs image
separately as well(?) and also somehow make the reason for
non-reproducibility obvious when the image is not present at the
necessary location.

There are also 2 suboptions:
- we could either mount it once globally and then create temp sub
dirs there when a test requests it (but we would also need to clean up
and ensure that different processes don't collide)
  - we could also link it into the process work dir, then it can be
used in any syscalls that work with files
- or mount a new copy in the pseudo-syscall right in the process work
dir (automatically resolves all issues with collisions,
non-reproducibility and cleanup)

Or we could actually embed the binary image data right in the executor
in syz_mount_image format, then we don't depend on the image and don't
need to redeploy syzbot and don't have problems with C reproducers.
However, as far as I remember some of the fs images are quite large...

Matt Morehouse

Jul 27, 2020, 1:28:12 PM
to Dmitry Vyukov, Jiaheng Hu, Eric Biggers, Zubin Mithra, syzkaller
On Fri, Jul 24, 2020 at 10:20 PM Dmitry Vyukov <dvy...@google.com> wrote:
> On Fri, Jul 24, 2020 at 9:25 PM Matt Morehouse <mas...@google.com> wrote:
> >
> > Probably this falls under Dmitry's pseudo-syscall approach.  We could have something like this:
> >
> > int syz_get_f2fs_dirfd() {
> >   static int f2fs_dirfd = []() {
> >     syz_mount_image(..., "/syzkaller/fs/f2fs", ...);
> >     return open("/syzkaller/fs/f2fs", ...)
> >   }();
> >   return f2fs_dirfd;
> > }
> >
> > Then any *at() syscall has a chance of getting the f2fs dirfd.
>
> Yes, this is one of possible approaches.
>
> Packaging images into the syzbot disk image template is probably not
> too hard, though, will involve some manual re-deployment work for each
> update.
> I see more significant issues with C reproducers. They will become
> totally dependent on the image. We will need to provide the whole
> image containing all other images, and probably each fs image
> separately as well(?) and also somehow make the reason for
> non-reproducibility obvious when the image is not present at the
> necessary location.

An error message from syz_mount_image should be able to indicate the image is missing.  And we could provide the f2fs image in the bug report with instructions when a repro contains syz_get_f2fs_dirfd.
 

> There are also 2 suboptions:
>  - we could either mount it once globally and then create temp sub
> dirs there when a test requests it (but also need to cleanup and
> ensure that different processes don't collide)
>    - also could link it in the process work dir, then it will be able
> to use it in any syscalls that work with files
>  - or mount a new copy in the pseudo-syscall right in the process work
> dir (automatically resolves all issues with collisions,
> non-reproducibility and cleanup)

+1 on mounting a copy in a new dir each time.
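
To make the "fresh copy per test, with an obvious error when the image is missing" idea concrete, here is a small Python sketch. It is an illustration only, not how syz_mount_image works: `copy_image_to_workdir` is a hypothetical helper, and it copies a file rather than mounting anything.

```python
import os
import shutil
import tempfile


def copy_image_to_workdir(image_path: str, workdir: str) -> str:
    """Copy a seed image into a fresh private directory under `workdir`.

    Raises FileNotFoundError with an explicit message when the seed image
    is absent, so a failing reproducer points at the real cause.
    """
    if not os.path.isfile(image_path):
        raise FileNotFoundError(
            f"seed image {image_path!r} is missing; "
            "the reproducer cannot run without it")
    os.makedirs(workdir, exist_ok=True)
    # A new directory per call avoids collisions between processes.
    private_dir = tempfile.mkdtemp(prefix="fs-", dir=workdir)
    dest = os.path.join(private_dir, os.path.basename(image_path))
    return shutil.copy(image_path, dest)
```

Because every call gets its own directory, cleanup reduces to removing that directory, and two tests never share image state.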
 

> Or we could actually embed the binary image data right in the executor
> in syz_mount_image format, then we don't depend on the image and don't
> need to redeploy syzbot and don't have problems with C reproducers.
> However, as far as I remember some of fs images are quite large...

Jiaheng, maybe you can look into this and see how large the f2fs image would be.  Embedding directly might simplify deployment.

Jiaheng Hu

Jul 30, 2020, 2:30:19 AM
to Matt Morehouse, Dmitry Vyukov, Eric Biggers, Zubin Mithra, syzkaller
Hi Dmitry,

Thank you for your reply!

Considering the duration of my internship, Matt and I think that going down the path of mounting an image of the filesystem type we want to fuzz would be the most accessible approach. The third suboption you mentioned ("mount a new copy in the pseudo-syscall right in the process work dir") seems to be the best option along this path, as it seems to handle the reproducibility problem. Going into the details of this, do we want to just create a new pseudo-syscall which performs the mount with a set of pre-set parameters, or do we want to force syzkaller to mount the given image every time a new fuzzing thread starts (and if so, how do we actually do it)?
 
Secondly, what do you think would be a suitable image to be mounted? I used the create-image.sh script to create an f2fs image w/ a minimal debian stretch os, and the size of that is 2GB (which is 134217729 lines of binary, probably too large to be directly embedded). I'm not really familiar with image-related stuff, but I kind of feel like we don't really need an os on the image? Is there any way for us to downsize the image but still have it work for mounting purposes?

Thirdly, you mentioned that having syzkaller generate random images would be a better and "long-term" approach. I'm wondering how different this would be from creating a "parameterized" version of the create-image.sh script, and just using that script with random parameters to generate new images. I also need some help understanding the importance of mutating images: would some bug occur only given a specific image?


Thank you again,
Best,
Jiaheng

Dmitry Vyukov

Jul 30, 2020, 3:23:00 AM
to Jiaheng Hu, Matt Morehouse, Eric Biggers, Zubin Mithra, syzkaller
On Thu, Jul 30, 2020 at 8:30 AM Jiaheng Hu <jiah...@google.com> wrote:
>
> Hi Dmitry,
>
> Thank you for your reply!
>
> Considering the duration of my internship, Matt and I think that going down the path of mounting an image of the filesystem type we want to fuzz would be the most accessible approach. The third suboption you mentioned ("mount a new copy in the pseudo-syscall right in the process work dir") seems to be the best option along this path, as it seems to handle the reproducibility problem. Going into the details of this, do we want to just create a new pseudo-syscall which performs the mount with a set of pre-set parameters, or do we want to force syzkaller to mount the given image every time a new fuzzing thread starts (and if so, how do we actually do it)?

Sounds reasonable.
I think the image should be mounted by a pseudo-syscall rather than on
every test process startup. This way we (1) use some fuzzer-provided
randomness during mount, (2) don't pay the cost of the mount if it's
not necessary, and (3) can mount several filesystems (of the same type
or of different types).

There are still several options:
1. We can use images stored as files in known locations.
2. Embed some compressed version directly into executor (this can use
the existing syz_mount_image syscall, but with an "empty" image; if
image is empty, executor will use the "default" image).
3. Pre-seed the corpus with valid images in syz_mount_image syscall.

(3) looks like the best option to me so far. It does everything we
want, and makes C reproducers work without any external dependencies
and on top of option (2) also allows the fuzzer to mutate the seed
images to test the mount operation itself.
I injected some images into the corpus manually as a one-off effort at
some point. But now nobody knows which images I injected, how well
that worked, whether they are still in the corpus or were lost for
some reason, or how to inject more images for other filesystems
(f2fs).
So I wonder if it's possible to do this in a more principled,
controlled and extensible way...
We have these "unit-tests" for some descriptions:
https://github.com/google/syzkaller/tree/master/sys/linux/test
I had wondered before whether we could use these as seeds for the corpus.
Some of them contain very interesting non-trivial scenarios, e.g.:
https://github.com/google/syzkaller/blob/master/sys/linux/test/binder
https://github.com/google/syzkaller/blob/master/sys/linux/test/wireguard
https://github.com/google/syzkaller/blob/master/sys/linux/test/io_uring
It looks very reasonable to use them as seeds: it is useful in itself,
and it gives contributors a nice way not just to add descriptions, but
also to seed the corpus with some non-trivial initial use examples for
the subsystem (currently that's not possible).
And it looks like a perfect option for what you are trying to achieve.
We could add a few seeds for fs there and then arrange either syz-ci
or syz-manager and build process (not sure yet what's the best course
here) to use them as initial seeds when starting fuzzing.
What do you think?


> Secondly, what do you think would be a suitable image to be mounted? I used the create-image.sh script to create an f2fs image w/ a minimal debian stretch os, and the size of that is 2GB (which is 134217729 lines of binary, probably too large to be directly embedded). I'm not really familiar with image-related stuff, but I kind of feel like we don't really need an os on the image? Is there any way for us to downsize the image but still have it work for mounting purposes?

Yes, don't use that, we totally don't need a full distro image. We
just need a minimal empty (or almost empty) filesystem image.
Here is the command to create a minimal FAT image:
$ fallocate -l 56K disk.raw && mkfs.fat disk.raw
That's it. It gives you a 56K file that can be mounted as FAT fs.

And the nice thing is that it's actually mostly 0's, which plays well
with the syz_mount_image "compressed" format (which makes all regions
with 0's implicit):

$ hexdump -C disk.raw
00000000 eb 3c 90 6d 6b 66 73 2e 66 61 74 00 02 04 01 00 |.<.mkfs.fat.....|
00000010 02 00 02 70 00 f8 01 00 20 00 40 00 00 00 00 00 |...p.... .@.....|
00000020 00 00 00 00 80 00 29 73 4d 0a df 4e 4f 20 4e 41 |......)sM..NO NA|
00000030 4d 45 20 20 20 20 46 41 54 31 32 20 20 20 0e 1f |ME    FAT12   ..|
00000040 be 5b 7c ac 22 c0 74 0b 56 b4 0e bb 07 00 cd 10 |.[|.".t.V.......|
00000050 5e eb f0 32 e4 cd 16 cd 19 eb fe 54 68 69 73 20 |^..2.......This |
00000060 69 73 20 6e 6f 74 20 61 20 62 6f 6f 74 61 62 6c |is not a bootabl|
00000070 65 20 64 69 73 6b 2e 20 20 50 6c 65 61 73 65 20 |e disk.  Please |
00000080 69 6e 73 65 72 74 20 61 20 62 6f 6f 74 61 62 6c |insert a bootabl|
00000090 65 20 66 6c 6f 70 70 79 20 61 6e 64 0d 0a 70 72 |e floppy and..pr|
000000a0 65 73 73 20 61 6e 79 20 6b 65 79 20 74 6f 20 74 |ess any key to t|
000000b0 72 79 20 61 67 61 69 6e 20 2e 2e 2e 20 0d 0a 00 |ry again ... ...|
000000c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa |..............U.|
00000200 f8 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000400 f8 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000410 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
0000e000
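
To illustrate why a mostly-zero image is cheap to describe when zero regions are implicit, here is a small Python sketch. It does not reproduce the actual syz_mount_image encoding, which is not spelled out in this thread; `nonzero_segments` and `rebuild` are hypothetical helpers, and the 16-byte block size is an arbitrary choice for the demo.

```python
def nonzero_segments(image: bytes, block: int = 16):
    """Return (offset, data) pairs covering every non-zero block of `image`."""
    segments = []
    start = None
    for off in range(0, len(image), block):
        if any(image[off:off + block]):
            if start is None:
                start = off          # a non-zero run begins here
        elif start is not None:
            segments.append((start, image[start:off]))
            start = None             # run ended at an all-zero block
    if start is not None:
        segments.append((start, image[start:]))
    return segments


def rebuild(size: int, segments) -> bytes:
    """Recreate the image: zero-fill, then copy each segment in place."""
    out = bytearray(size)
    for off, data in segments:
        out[off:off + len(data)] = data
    return bytes(out)


# Toy 56K image with a few non-zero bytes at offsets taken from the hexdump
# above (a simplified stand-in, not the full mkfs.fat output).
image = bytearray(0xE000)
image[0:3] = b"\xeb\x3c\x90"
image[0x1FE:0x200] = b"\x55\xaa"
image[0x200:0x203] = b"\xf8\xff\xff"
image[0x400:0x403] = b"\xf8\xff\xff"
image = bytes(image)
segments = nonzero_segments(image)
```

For this toy image only 64 of the 57344 bytes need to be stored explicitly; everything else is recovered by zero-filling.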


mkfs also accepts some per-fs options, so it makes sense to generate a
number of different images.
E.g. here is what I used for hfsplus (I don't remember which of these
succeeded, most likely not all, but it gives an idea):

fallocate -l 64K disk.raw && mkfs.hfsplus disk.raw
fallocate -l 512K disk.raw && mkfs.hfsplus disk.raw
fallocate -l 512K disk.raw && mkfs.hfsplus -w disk.raw
fallocate -l 768K disk.raw && mkfs.hfsplus -w disk.raw
fallocate -l 512K disk.raw && mkfs.hfsplus disk.raw
fallocate -l 768K disk.raw && mkfs.hfsplus -w -s disk.raw
fallocate -l 768K disk.raw && mkfs.hfsplus -h disk.raw
fallocate -l 768K disk.raw && mkfs.hfsplus -w -s disk.raw
fallocate -l 768K disk.raw && mkfs.hfsplus -s disk.raw
fallocate -l 768K disk.raw && mkfs.hfsplus -w -J disk.raw
fallocate -l 768K disk.raw && mkfs.hfsplus -w -J 256K disk.raw
fallocate -l 768K disk.raw && mkfs.hfsplus -w -J 1M disk.raw
fallocate -l 2M disk.raw && mkfs.hfsplus -w -J 1M disk.raw
fallocate -l 2M disk.raw && mkfs.hfsplus -w -b 512 disk.raw
fallocate -l 2M disk.raw && mkfs.hfsplus -w -i 17 -b 512 disk.raw
fallocate -l 2M disk.raw && mkfs.hfsplus -w -i 17 disk.raw
fallocate -l 1M disk.raw && mkfs.hfsplus -w -i 17 -b 512 disk.raw
fallocate -l 1M disk.raw && mkfs.hfsplus -w -i 17 -b 512 -n e=512,c=512,a=512 disk.raw
fallocate -l 1M disk.raw && mkfs.hfsplus -w -i 17 -b 4096 -n e=4096,c=4096,a=4096 disk.raw
fallocate -l 1M disk.raw && mkfs.hfsplus -w -i 17 -b 4096 -n e=512,c=4096,a=4096 disk.raw
fallocate -l 1M disk.raw && mkfs.hfsplus -w -i 17 -b 4096 -n e=1024,c=4096,a=4096 disk.raw
fallocate -l 1M disk.raw && mkfs.hfsplus -w -i 17 -b 4096 -n e=1024,c=512,a=4096 disk.raw
fallocate -l 1M disk.raw && mkfs.hfsplus -w -i 17 -b 4096 -n e=1024,c=2048,a=4096 disk.raw
fallocate -l 1M disk.raw && mkfs.hfsplus -w -i 17 -b 4096 -n e=1024,c=1024,a=4096 disk.raw
fallocate -l 1M disk.raw && mkfs.hfsplus -w -i 17 -b 4096 -n e=1024,c=4096,a=4096 disk.raw
fallocate -l 1M disk.raw && mkfs.hfsplus -w -i 17 -b 4096 -n e=1024,c=4096,a=512 disk.raw
fallocate -l 1M disk.raw && mkfs.hfsplus -w -i 17 -b 4096 -n e=1024,c=4096,a=1024 disk.raw
fallocate -l 1M disk.raw && mkfs.hfsplus -w -i 17 -b 4096 -n e=1024,c=4096,a=2048 disk.raw
fallocate -l 1M disk.raw && mkfs.hfsplus -w -i 17 -b 8192 -n e=1024,c=4096,a=2048 disk.raw
fallocate -l 1M disk.raw && mkfs.hfsplus -w -i 17 -b 8192 -n e=1024,c=2048,a=2048 disk.raw
fallocate -l 1M disk.raw && mkfs.hfsplus -w -i 17 -b 2048 -n e=1024,c=4096,a=2048 disk.raw
fallocate -l 1M disk.raw && mkfs.hfsplus -w -b 2048 -n e=1024,c=8192,a=2048 disk.raw
fallocate -l 1M disk.raw && mkfs.hfsplus -w -b 2048 -c a=128,b=3 disk.raw
fallocate -l 1M disk.raw && mkfs.hfsplus -w -c a=128,b=3 disk.raw
fallocate -l 1M disk.raw && mkfs.hfsplus -w -c a=128,b=3,c=17 disk.raw
fallocate -l 1M disk.raw && mkfs.hfsplus -w -c a=128,b=3,c=17,e=10000 disk.raw
fallocate -l 1M disk.raw && mkfs.hfsplus -h disk.raw
fallocate -l 512K disk.raw && mkfs.hfsplus -h disk.raw
fallocate -l 512K disk.raw && mkfs.hfsplus -h -s disk.raw
fallocate -l 512K disk.raw && mkfs.hfsplus -h -s -w disk.raw
fallocate -l 512K disk.raw && mkfs.hfsplus -s disk.raw
fallocate -l 512K disk.raw && mkfs.hfsplus -h -s disk.raw
fallocate -l 512K disk.raw && mkfs.hfsplus -h -w disk.raw
fallocate -l 512K disk.raw && mkfs.hfsplus -h -J 13 disk.raw
fallocate -l 768K disk.raw && mkfs.hfsplus -h -v asd disk.raw
fallocate -l 768K disk.raw && mkfs.hfsplus -h -v syz disk.raw
fallocate -l 768K disk.raw && mkfs.hfsplus -h -v syz -b 5000 disk.raw
fallocate -l 768K disk.raw && mkfs.hfsplus -h -v syz -b 10000 disk.raw
fallocate -l 768K disk.raw && mkfs.hfsplus -h -v syz -b 8192 -n e=1024,c=1024,a=1024 disk.raw
fallocate -l 768K disk.raw && mkfs.hfsplus -h -v syz -b 8192 -n e=1024,c=2048,a=1024 disk.raw
fallocate -l 768K disk.raw && mkfs.hfsplus -h -v syz -b 8192 -n e=1024,c=4095,a=1024 disk.raw
fallocate -l 768K disk.raw && mkfs.hfsplus -h -v syz -b 8192 -n e=1024,c=4096,a=1024 disk.raw
fallocate -l 768K disk.raw && mkfs.hfsplus -h -v syz -b 8192 -n e=1024,c=4097,a=1024 disk.raw
fallocate -l 768K disk.raw && mkfs.hfsplus -h -v syz -b 8192 -n e=1024,c=4096,a=1024 disk.raw
fallocate -l 768K disk.raw && mkfs.hfsplus -h -v syz -b 8192 -n e=1024,c=4096,a=1024 -c a=128,b=3,c=17,e=10000 disk.raw
fallocate -l 768K disk.raw && mkfs.hfsplus -h -b 8192 -n e=1024,c=4096,a=1024 -c a=128,b=3,c=17,e=10000 disk.raw
fallocate -l 768K disk.raw && mkfs.hfsplus -h -v syz -b 8192 -n e=1024,c=4096,a=1024 disk.raw
fallocate -l 4M disk.raw && mkfs.hfsplus -h -v syz -b 8192 -n e=1024,c=4096,a=1024 disk.raw
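
A list like the one above can be produced mechanically. The following Python sketch only builds command strings for a size and flag matrix and does not run them; `mkfs_commands` is a hypothetical helper, the flags are taken from the commands in this message, and (as noted above) there is no guarantee that every combination is accepted by mkfs.hfsplus.

```python
import itertools
import shlex


def mkfs_commands(mkfs, sizes, flag_sets, image="disk.raw"):
    """Build one 'fallocate && mkfs' command line per (size, flags) pair."""
    cmds = []
    for size, flags in itertools.product(sizes, flag_sets):
        argv = [mkfs, *flags, image]
        cmds.append(f"fallocate -l {size} {image} && {shlex.join(argv)}")
    return cmds


# Small example matrix: 2 sizes x 3 flag sets = 6 commands.
commands = mkfs_commands(
    "mkfs.hfsplus",
    sizes=["512K", "768K"],
    flag_sets=[[], ["-w"], ["-h", "-s"]],
)
for cmd in commands:
    print(cmd)
```

Keeping the matrix in data rather than in a hand-written list makes it easier to record which images were generated and to add other filesystems later.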


> Thirdly, you mentioned that having syzkaller generate random images would be a better and "long-term" approach. I'm wondering how different this would be from creating a "parameterized" version of the create-image.sh script, and just using that script with random parameters to generate new images.

See above. We can use mkfs to generate a few variants.
But this won't replace full random generation, because the number of
different valid images is effectively infinite and we can't
pre-generate them all. mkfs can't even produce them all.

> I also need some help understanding the importance of mutating images: would some bug occur only given a specific image?

Totally.
There are some "major" features/options of images that have a very
significant effect on behavior of all fs operations. Plus an infinite
long tail of various details that may have some effect.

Matt Morehouse

Aug 19, 2020, 12:36:06 PM
to Dmitry Vyukov, Jiaheng Hu, Eric Biggers, Zubin Mithra, syzkaller
Since Jiaheng's internship ends Friday, I'd like to summarize the current status and direction of this work:
  • PR #2032 - Currently has a single test that mounts a minimal f2fs image using syz_mount_image
    • Jiaheng is working on a shell script to auto-generate more interesting f2fs images to mount, to be integrated with "make generate"
    • Also needs to update syzbot kernel config to work
  • PR #2053 - Adds ability to seed the corpus with test files on syz-manager startup, using a syz-manager config parameter
    • Currently we need to specify each file individually, probably we just want to pass a "seed corpus directory" instead
    • Needs to update the syz-ci config to preseed from syzkaller/sys/linux/test (I'm not sure where exactly to do this)
If any of this seems off, please let us know, so we don't waste the final few days of Jiaheng's internship.
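
The "seed corpus directory" idea from the list above could look roughly like the following Python sketch. This is not the code from PR #2053 or from syz-manager (which is written in Go); `load_seed_dir` is a hypothetical helper that simply reads every regular file in one directory.

```python
import os


def load_seed_dir(seed_dir: str) -> dict:
    """Read every regular file in `seed_dir` and return {file name: contents}.

    Subdirectories are ignored; names are visited in sorted order so the
    result is stable across runs.
    """
    seeds = {}
    for name in sorted(os.listdir(seed_dir)):
        path = os.path.join(seed_dir, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                seeds[name] = f.read()
    return seeds
```

Pointing such a loader at a single directory removes the need to list each test file individually in the config.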

Matt Morehouse

Aug 19, 2020, 12:37:12 PM
to Dmitry Vyukov, Jiaheng Hu, Eric Biggers, Zubin Mithra, syzkaller
Time permitting, Jiaheng may also write another PR to add BTRFS tests/seeds.

Dmitry Vyukov

Sep 14, 2020, 7:23:45 AM
to Matt Morehouse, Jiaheng Hu, Eric Biggers, Zubin Mithra, syzkaller

Matt Morehouse

Sep 14, 2020, 11:25:10 AM
to Dmitry Vyukov, Jiaheng Hu, Jiaheng Hu, Eric Biggers, Zubin Mithra, syzkaller
+Jiaheng Hu (non-google email)

Fantastic!