Re: Questions on writing syzkaller descriptions


Dmitry Vyukov

Jun 4, 2018, 6:16:58 AM
to Chi Pham, syzkaller
On Fri, Jun 1, 2018 at 12:58 PM, Chi Pham <chi...@gmail.com> wrote:
> Hi Dmitry,
>
> Thanks for making syzkaller, it's a very interesting project!
>
> I've been reading syzkaller descriptions to understand how they relate to
> the syscall implementations.
> This has spurred a couple of questions that I hope you can answer:
>
> 1. Resources are, as I understand, values that are transferred between
> functions and other functions/data types, i.e. they describe a sort of data
> dependency.
> I was puzzled by this syntax: resource fd[int32]: 0xffffffffffffffff,
> AT_FDCWD
> Specifically, what does the 0xfff..., AT_FDCWD signify in this case - is
> it a sort of value constraint? And if yes, what for?

+syzkaller mailing list

Hi Chi,

I've just extended the description of resources in the documentation;
please see if it answers your questions:

https://github.com/google/syzkaller/blob/master/docs/syscall_descriptions_syntax.md#resources
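
A minimal sketch of how the two constants in that declaration can be read, assuming (as the linked section describes) that the values after the colon are special values that may stand in for a real resource. The helper names to_int32 and pick_fd are hypothetical and are not syzkaller code; AT_FDCWD is -100 in the Linux headers.

```python
import random

AT_FDCWD = -100  # value from the Linux uapi headers
SPECIAL_FD_VALUES = [0xFFFFFFFFFFFFFFFF, AT_FDCWD]


def to_int32(value):
    """Truncate to 32 bits and reinterpret as signed (the resource is fd[int32])."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def pick_fd(rng, live_fds, special_prob=0.1):
    """Hypothetical chooser: usually reuse an fd returned by an earlier call,
    occasionally substitute one of the special values instead."""
    if not live_fds or rng.random() < special_prob:
        return to_int32(rng.choice(SPECIAL_FD_VALUES))
    return rng.choice(live_fds)
```

Read this way, 0xffffffffffffffff is simply -1 (an invalid fd) once truncated to int32, and AT_FDCWD is the "current directory" value accepted by the *at family of syscalls.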


> 2. Return values are discarded unless they are resources. What about out
> parameters? (i.e. uninitialised pointers passed as arguments, to be
> initialised by the function)
> Based on the existing descriptions, I'm guessing they don't matter, aside
> from needing to be valid memory to prevent the program from failing on
> copy_to_user.

Correct.
Values in output structures only matter if there are some resources
returned, e.g. see pipe syscall:

https://github.com/google/syzkaller/blob/master/sys/linux/sys.txt#L85

Pipe returns the resulting fd's in a struct passed into the syscall.
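
As an illustration only (plain Python, not a syzkaller description), the sketch below shows why those output values matter: the two fds produced by pipe are consumed by later calls. The function name pipe_roundtrip is made up for this example.

```python
import os


def pipe_roundtrip(payload=b"ping"):
    """Create a pipe and feed both returned fds into follow-up calls."""
    # In C, pipe(int fds[2]) writes the two fds into caller-provided memory;
    # Python's os.pipe() hands back the same pair as a tuple.
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, payload)
        return os.read(read_fd, len(payload))
    finally:
        os.close(read_fd)
        os.close(write_fd)
```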


> 3. Sometimes data types are described to have specific constant values, e.g.
> for flags or padding. But I've come across some cases where I couldn't
> immediately see why. Example: in sys/linux/sndcontrol.txt, line 57, the
> field "type" of the struct "snd_ctl_elem_info" is set to 0. But based on the
> source code (include/uapi/sound/asound.h), it looks like 0 corresponds to
> SNDRV_CTL_ELEM_TYPE_NONE, and there are several other possible types.
> What is the reasoning behind setting it to 0? Is it just in this specific
> case it makes sense, i.e. it requires some domain-specific knowledge about
> how the structs are used?


I think it's just a bug in our descriptions. Descriptions are hard to
extract because the interfaces are complex and undocumented. I've just
pushed a commit that fixes this and a few other things in the
sndcontrol descriptions:

https://github.com/google/syzkaller/commit/63f18a76c394930d1368a6b120b9e432bb37d332


> 4. Regarding size: how do you decide whether an int should be 32-bit or
> 64-bit?
> E.g. when some struct field may be declared in the Linux header as "int",
> but in the syzkaller description, it needs to have a size.
> Is the convention to just put int32 when not specified by the source?


C's int type does not have a size in its name, but it has a
well-defined size in the ABI. A type simply can't have no size, whether
it's spelled out explicitly or not. E.g. char, bool, long and short also
don't have sizes in their names, but they do have sizes. The size of C's
int is 4 bytes on all relevant platforms, thus it is translated to
int32. We just make the size explicit.
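
One quick way to check the sizes on a given machine is Python's ctypes module, which reports the sizes used by the platform's C ABI. This is only a sketch; syz_int_type is a hypothetical helper, not part of syzkaller.

```python
import ctypes

# Sizes (in bytes) of C types under the ABI of the platform running Python.
C_TYPE_SIZES = {
    "char": ctypes.sizeof(ctypes.c_char),
    "bool": ctypes.sizeof(ctypes.c_bool),
    "short": ctypes.sizeof(ctypes.c_short),
    "int": ctypes.sizeof(ctypes.c_int),
    "long": ctypes.sizeof(ctypes.c_long),
    "void*": ctypes.sizeof(ctypes.c_void_p),
}


def syz_int_type(c_type):
    """Hypothetical helper: name the fixed-width int matching a C type's size."""
    return "int%d" % (C_TYPE_SIZES[c_type] * 8)


if __name__ == "__main__":
    for name, size in C_TYPE_SIZES.items():
        print("%-6s %d bytes" % (name, size))
```

Only long and pointers are expected to vary between 32-bit and 64-bit targets; the size reported for int should be 4 on any platform this runs on.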


> 5. Initialising the devices: what is the difference between using
> syz_open_dev and openat? Is there any documentation for these
> syzkaller-specific functions?

You can see the source for these pseudo-syscalls here:

https://github.com/google/syzkaller/blob/master/executor/common_linux.h

The only difference between openat and syz_open_dev is that
syz_open_dev substitutes '#' in the name with the passed-in id. This is
used to open families of devices like /dev/snd/controlC0,
/dev/snd/controlC1, /dev/snd/controlC2.
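
The substitution can be sketched in a few lines of Python. This assumes each '#' is replaced by a single decimal digit of the id, lowest digit first; the linked executor source is authoritative on the exact rule, and expand_dev_name is a made-up name for illustration.

```python
def expand_dev_name(template, dev_id):
    """Replace each '#' with one decimal digit of dev_id, lowest digit first."""
    out = []
    for ch in template:
        if ch == "#":
            out.append(str(dev_id % 10))
            dev_id //= 10
        else:
            out.append(ch)
    return "".join(out)
```

For example, expand_dev_name("/dev/snd/controlC#", 2) gives "/dev/snd/controlC2"; the resulting path would then be passed to a regular open.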


> Background: I'm a student working with Julia Lawall on the possibilities of
> auto-inferring some of the simpler interfaces (I'm aware of the
> headerparser, my approach will involve some source code analysis). I'd also
> like to manually make a few syzkaller descriptions, just to get a better
> feel for it.

Great!

I've filed an issue for this recently:

https://github.com/google/syzkaller/issues/590

You may want to drop a note there to avoid duplicating work with other
people. Are you looking at using Coccinelle? Or clang?


Thanks

Chi Pham

Jun 4, 2018, 7:06:00 AM
to Dmitry Vyukov, syzkaller
Hi Dmitry,

Thanks for replying.

On Mon, Jun 4, 2018 at 12:16 PM, Dmitry Vyukov <dvy...@google.com> wrote:
> On Fri, Jun 1, 2018 at 12:58 PM, Chi Pham <chi...@gmail.com> wrote:
>> Hi Dmitry,
>>
>> Thanks for making syzkaller, it's a very interesting project!
>>
>> I've been reading syzkaller descriptions to understand how they relate to
>> the syscall implementations.
>> This has spurred a couple of questions that I hope you can answer:
>>
>> 1. Resources are, as I understand, values that are transferred between
>> functions and other functions/data types, i.e. they describe a sort of data
>> dependency.
>>    I was puzzled by this syntax: resource fd[int32]: 0xffffffffffffffff,
>> AT_FDCWD
>>    Specifically, what does the 0xfff..., AT_FDCWD signify in this case - is
>> it a sort of value constraint? And if yes, what for?
>
> +syzkaller mailing list
>
> Hi Chi,
>
> I've just extended description of resources in documentation, please
> see if it answers your questions:
>
> https://github.com/google/syzkaller/blob/master/docs/syscall_descriptions_syntax.md#resources

Yes, that makes sense, thank you.
Ok, good to know. I agree.
 

>> 4. Regarding size: how do you decide whether an int should be 32-bit or
>> 64-bit?
>>    E.g. when some struct field may be declared in the Linux header as "int",
>> but in the syzkaller description, it needs to have a size.
>>    Is the convention to just put int32 when not specified by the source?
>
> C's int type does not have size in its name, but it has well defined
> size in ABI. A type simply can't have no size, whether it's spelled
> explicitly or not. E.g. char, bool, long and short also don't have
> sizes in their names, but they do have sizes. Size of C's int is 4
> bytes on all relevant platforms, thus it is translated to int32. We
> just make the size explicit.


Right, I see. I suppose the main platform-specific size is long, which is covered by intptr.
 

>> 5. Initialising the devices: what is the difference between using
>> syz_open_dev and openat? Is there any documentation for these
>> syzkaller-specific functions?
>
> You can see source for these pseudo-syscalls here:
>
> https://github.com/google/syzkaller/blob/master/executor/common_linux.h
>
> The only difference between openat and syz_open_dev is that
> syz_open_dev substitutes '#' in name with passed in id. This is used
> to open families of devices like /dev/snd/controlC0,
> /dev/snd/controlC1, /dev/snd/controlC2.


Ok. On a related note, does syzkaller do anything in the realm of finalising/flushing/closing devices?
 

>> Background: I'm a student working with Julia Lawall on the possibilities of
>> auto-inferring some of the simpler interfaces (I'm aware of the
>> headerparser, my approach will involve some source code analysis). I'd also
>> like to manually make a few syzkaller descriptions, just to get a better
>> feel for it.
>
> Great!
>
> I've filed an issue for this recently:
>
> https://github.com/google/syzkaller/issues/590
>
> You may want to drop a note there to avoid duplicated with other
> people. Are you looking at using Coccinelle? Or clang?

 
Oh, I didn't see this issue at all!
I have read the DIFUZE paper though, which is similar to what I'll be doing, but with another approach and focus (DIFUZE being more security-oriented).
Initially, I'm aiming to use Coccinelle (and Coccinelle internals), but we'll see what kind of capabilities will be needed. Not opposed to using clang or other tools.
I'll comment on the issue soon.

- Chi

Dmitry Vyukov

Jun 4, 2018, 7:14:28 AM
to Chi Pham, syzkaller
On Mon, Jun 4, 2018 at 1:05 PM, Chi Pham <chi...@gmail.com> wrote:
>> > 5. Initialising the devices: what is the difference between using
>> > syz_open_dev and openat? Is there any documentation for these
>> > syzkaller-specific functions?
>>
>> You can see source for these pseudo-syscalls here:
>>
>> https://github.com/google/syzkaller/blob/master/executor/common_linux.h
>>
>> The only difference between openat and syz_open_dev is that
>> syz_open_dev substitutes '#' in name with passed in id. This is used
>> to open families of devices like /dev/snd/controlC0,
>> /dev/snd/controlC1, /dev/snd/controlC2.
>>
>
> Ok. On a related note, does syzkaller do anything in the realm of
> finalising/flushing/closing devices?

What exactly do you mean by finalising/flushing/closing?
syzkaller can call close on a device fd, it can also call an ioctl
that does some kind of flushing (if it's present for a device). All
tests are executed in isolated processes, so after a test the process
exits and all device fd's are closed automatically.

Chi Pham

Jun 4, 2018, 7:27:30 AM
to Dmitry Vyukov, syzkaller
I mean if it has tests that check if closing breaks anything.
E.g. tests that do things to the device after closing, or reopening devices and performing new operations?

- Chi 

Dmitry Vyukov

Jun 4, 2018, 7:36:28 AM
to Chi Pham, syzkaller
Yes, syzkaller should be able to generate something like:

r0 = open("/dev/foo")
ioctl(r0, ...)
close(r0)
r1 = open("/dev/foo")
ioctl(r1, ...)
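
The same open/use/close/reopen sequence can be tried by hand. The sketch below is plain Python rather than a syzkaller program; it assumes /dev/null (os.devnull) as a stand-in for /dev/foo and uses fstat in place of a device-specific ioctl, since the real ioctl arguments are elided above. The function name is made up.

```python
import errno
import os


def close_and_reopen(path=os.devnull):
    """Open, use, close, poke the stale fd, then reopen and use again."""
    r0 = os.open(path, os.O_RDWR)
    os.fstat(r0)              # stand-in for ioctl(r0, ...)
    os.close(r0)

    # Using the closed descriptor should fail with EBADF.
    try:
        os.fstat(r0)
        stale_errno = None
    except OSError as exc:
        stale_errno = exc.errno

    r1 = os.open(path, os.O_RDWR)
    try:
        os.fstat(r1)          # stand-in for ioctl(r1, ...)
        reopened_ok = True
    finally:
        os.close(r1)
    return stale_errno, reopened_ok
```

The stale-fd check runs before the second open because the kernel typically hands out the lowest free descriptor number, so r1 is likely to reuse the number r0 had.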

Chi Pham

Jun 30, 2018, 1:51:11 PM
to Dmitry Vyukov, syzkaller
I've written a syzkaller description for rtc (real-time clock).
Some follow-up questions for writing descriptions:

1. For integer arguments/fields that have a range constraint, is it always best to specify the range as accurately as possible (e.g. based on documentation)?
Will the fuzzer ever generate values that hit outside these constraints, just to test if the implementation handles them correctly?
As far as I can tell superficially, it does not (https://github.com/google/syzkaller/blob/master/prog/rand.go#L632), but maybe it is not useful to do so.
Example: specifying that a minute field has type int32[0:59].

2. There are some ioctl options that appear to only exist in certain architectures. Should they be included in the description?
Example: RTC_PLL_GET is only handled in the m68k architecture, which is not represented in the generated .const files.

Thanks,
- Chi

Dmitry Vyukov

Jul 2, 2018, 4:12:41 AM
to Chi Pham, syzkaller
Hi Chi,

Are you planning to send a pull request with the rtc description?

Chi Pham

Jul 2, 2018, 4:40:33 AM
to Dmitry Vyukov, syzkaller
Yes. Do you prefer to comment on the pull request? I figured these were general enough questions that the list might benefit from them.

- Chi 

Dmitry Vyukov

Jul 2, 2018, 5:05:47 AM
to Chi Pham, syzkaller
On Mon, Jul 2, 2018 at 10:40 AM, Chi Pham <chi...@gmail.com> wrote:
>> > I've written a syzkaller description for rtc (real-time clock).
>> > Some follow-up questions for writing descriptions:
>> >
>> > 1. For integer arguments/fields that have a range constraint, is it
>> > always
>> > best to specify the range as accurately as possible (e.g. based on
>> > documentation)?
>> > Will the fuzzer ever generate values that hit outside these constraints,
>> > just to test if the implementation handles them correctly?
>> > As far as I can tell superficially, it does not
>> > (https://github.com/google/syzkaller/blob/master/prog/rand.go#L632), but
>> > maybe it is not useful to do so.
>> > Example: specifying that a minute field has type int32[0:59].


Yes, the idea is to describe correct arguments as precisely as possible.
Generating correct non-trivial programs is what's extremely hard;
generating incorrect programs is trivial. So we want the description
to help with generating correct programs. We obviously need some
degree of incorrect input values too, but that can and should be
implemented on top of correct descriptions.
We actually diverge from the specified int range during mutation:
https://github.com/google/syzkaller/blob/master/prog/mutation.go#L174
So we should be good here. But if necessary, the common logic in prog
generation/mutation (rather than each and every argument/field in the
description) is the right place to loosen generation precision.
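
A toy model of that split, for illustration only: generation stays inside the described range, while a shared mutation step occasionally leaves it. This is not the logic in prog/rand.go or prog/mutation.go; the function names, the escape probability and the delta bound are all assumptions made for this sketch.

```python
import random


def generate_int(rng, lo, hi):
    """Generation: always respect the described range, e.g. int32[0:59]."""
    return rng.randint(lo, hi)


def mutate_int(rng, lo, hi, escape_prob=0.1):
    """Mutation: mostly re-pick inside the range, sometimes step just outside.

    The out-of-range behaviour lives here, in common code, so the
    description itself can stay precise.
    """
    if rng.random() < escape_prob:
        delta = rng.randint(1, 16)
        return hi + delta if rng.random() < 0.5 else lo - delta
    return rng.randint(lo, hi)
```

With a minute field described as int32[0:59], generate_int(rng, 0, 59) never leaves the range, while repeated calls to mutate_int(rng, 0, 59) will eventually produce values such as -3 or 61.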


>> > 2. There are some ioctl options that appear to only exist in certain
>> > architectures. Should they be included in the description?
>> > Example: RTC_PLL_GET is only handled in the m68k architecture, which is
>> > not
>> > represented in the generated .const files.

If it's only m68k, then I think we need to drop it. We don't support
m68k at the moment, and I think make extract/generate will bark on an
ioctl with an undefined const.
If it's only arm, then we need to include it. I think we should have
some examples of this in kvm.txt descriptions which also have some
arch-specific ioctls.




>> Hi Chi,
>>
>> Are you planning to send a pull request with the rtc description?
>
>
> Yes. Do you prefer to comment on the pull request? I figured these were
> general enough questions that the list might benefit from them.

Good. I am fine answering here.

Chi Pham

Jul 2, 2018, 6:13:40 AM
to Dmitry Vyukov, syzkaller
Ok, makes sense.
 

>>>> 2. There are some ioctl options that appear to only exist in certain
>>>> architectures. Should they be included in the description?
>>>> Example: RTC_PLL_GET is only handled in the m68k architecture, which is
>>>> not
>>>> represented in the generated .const files.
>
> If it's only m68k, then I think we need to drop it. We don't support
> m68k at the moment, and I think make extract/generate will bark on
> ioctl with undefined const.
> If it's only arm, then we need to include it. I think we should have
> some examples of this in kvm.txt descriptions which also have some
> arch-specific ioctls.


The const is not undefined - it is included in the general header file (i.e. in uapi/linux/rtc.h).
It just happens to only be implemented for m68k currently, but doesn't it leave an open door for it to be implemented in other architectures in the future?
I merely noticed it because it's not covered in my testing environment (QEMU with Debian, x86-64 kernel).
 
>>> Hi Chi,
>>>
>>> Are you planning to send a pull request with the rtc description?
>>
>> Yes. Do you prefer to comment on the pull request? I figured these were
>> general enough questions that the list might benefit from them.
>
> Good. I am fine answering here.

Thanks very much!

- Chi 

Dmitry Vyukov

Jul 2, 2018, 6:44:43 AM
to Chi Pham, syzkaller
Then I don't have a strong preference. Either way is fine.