Syscall OUT arg as resource (for IN arg of another) in Syzlang


Branden Sherrell

Jan 27, 2021, 5:43:34 PM
to syzk...@googlegroups.com
Hi there,

It appears as if resource types are restricted mostly to integer (or pointer) types. But what of the situation where a resource is returned as an OUT param for a syscall? In particular, a situation like this:

struct uuid {
   uint8_t id[16];
};
int get_id_syscall(struct uuid *id);
int use_id_syscall(struct uuid *id, …);

Essentially a situation in which one syscall output (not the return value) is to be used as input to another. In this contrived example, the get_id_syscall will request some ID value that is to be used with the second use_id_syscall. It might be known that without a valid (e.g. from get_id_syscall) ID the latter syscall may exit early without anything interesting happening. This is the general motivation for properly defining this relationship. How can syzlang be used to describe this situation? At first I tried creating an array resource:

    resource id[array[int8, 16]]

But was met with the distinct error:

    array can't be resource base (int types can)

You can do the following:

uuid {
    id array[int8, 16]
}
get_id_syscall(id ptr[out, uuid])
use_id_syscall(id ptr[in, uuid])

However there is no explicit link between the OUT arg of get_id_syscall and the IN arg of use_id_syscall. Because of this, syzkaller will just randomly generate the input for the latter syscall, even after (possibly) receiving one from the first. 

Is there a way to properly link an OUT param of one syscall to be used as a resource for the IN param of another?


Thanks so much,
Branden

Dmitry Vyukov

Jan 28, 2021, 2:35:22 AM
to Branden Sherrell, syzkaller
Hi Branden,

> Is there a way to properly link an OUT param of one syscall to be used as a resource for the IN param of another?

This has been fully supported from day one. Look at e.g. the pipe syscall.

However, resources can currently only be int8/16/32/64. Larger
resources are not supported at the moment, though they would be useful
in several other cases.
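
As a rough sketch of the pattern referred to here: in the Linux descriptions, pipe writes two file descriptors through an out pointer, and because fd is a resource, later calls can consume them. The second half applies the same shape to the example from the question, under the assumption (not true for a 16-byte UUID) that the ID fits into a 64-bit integer. The name example_id is hypothetical, and the fd and pipe lines are reproduced approximately rather than verbatim.

```
# Approximately as in the Linux descriptions: the out pointer carries resources.
resource fd[int32]
pipe(pipefd ptr[out, pipefd])
pipefd {
	rfd	fd
	wfd	fd
}

# Hypothetical: only possible if the ID is an integer of at most 64 bits.
resource example_id[int64]
get_id_syscall(id ptr[out, example_id])
use_id_syscall(id ptr[in, example_id])
```

With a declaration like this, syzkaller would know that use_id_syscall needs a value produced by get_id_syscall, rather than generating the input at random.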

Branden Sherrell

Mar 9, 2021, 3:18:39 PM
to syzk...@googlegroups.com
Hello,

I am trying to understand the control flow from a given fuzz-case execution through to a panic report entry in the web UI. While going through the sources I found myself somewhat confused by the responsibilities of the following callback functions:
 - Diagnose
 - ContainsCrash
 - Parse
 - Symbolize

Taking the QEMU vm “type” as an example, I can see that when the executor is “Run”, its output is merged into the instance merger:

However, so is the output of the VM instance (the kernel serial) where panic reports are written:

Although I see they are added with different tags, notably “ssh” for the first and “qemu” for the latter.
It’s from this point forward that I am confused on the overall architecture design of syzkaller. 

1) What is the purpose of Diagnose (defined in the QEMU vm type source)?

2) When a panic occurs, what is the overall expected flow through the following “report” functions? (I chose gvisor as an example because its implementation is much simpler than Linux and the others)
If the output of the kernel serial port is merged with the output of the executor running the fuzz case, how is this information separated?

3) How are the actual fuzz cases exported from the VM to the host and stored in the database?
By this I mean specifically the text-based syscall/resource representation (e.g. https://syzkaller.appspot.com/text?tag=ReproSyz&x=15a18240500000)

I tried looking through the sources to answer these questions myself, but I am concerned that the Linux example might be too complicated, and the gvisor example may be too simplified to really understand the scope of each of these callback functions. 
* by “purpose” I mean, what are their design intentions from input to output?


Thank you!
Branden

Dmitry Vyukov

Mar 10, 2021, 4:03:18 AM
to Branden Sherrell, syzkaller
Hi Branden,

Yes, kernel console output and syz-fuzzer (not executor) outputs are
merged together. You can see examples of intermixed output in any
"log" files, they contain both kernel oops (from console) and programs
executed before the crash (syz-fuzzer output). “ssh” and “qemu” tags
are not used to separate outputs later, I think they are used only to
produce meaningful log messages.

ContainsCrash is called first and it's called continuously on the
merged output to understand if the output contains any crash yet or
not.
If ContainsCrash returns true, Parse is called to get the crash title
and full oops message.
Symbolize simply adds file:line info to oops messages, so it's called
after Parse.
Diagnose is a VM-type-specific function that may produce some additional
diagnostic output that may help to debug the crash (e.g. for qemu it
outputs CPU registers, for gVisor it sends some commands to runsc to
produce additional stacks). Diagnose is called after Parse as well;
however, I think we may call Parse again after Diagnose because
Diagnose may cause the kernel to produce more output on the console.

The test cases are exported from the VM by the syz-fuzzer process,
which prints them to stdout. So they appear in the merged "log" files.

Hope this clears some things up.

Branden Sherrell

Sep 20, 2021, 5:07:08 PM
to Dmitry Vyukov, syzkaller
Hi Dmitry,

I am only recently getting back to looking at syzlang. Thank you for the info you provided below. My next question has to do with system calls where one argument is a pointer to a size. I can see that syzkaller supports this scenario:

void syscall_ex(void *buf, size_t size_of_buf);

Defined as

syscall_ex(buf buffer[in], size_of_buffer len[buf])

But how can it be used to describe the scenario where the size field is a pointer to the size? e.g.

void syscall_ex(void *buf, size_t *size_of_buf);


Thank you,
Branden

Dmitry Vyukov

Sep 20, 2021, 11:27:29 PM
to Branden Sherrell, syzkaller
Hi Branden,

Check out getsockopt, I think it does something like this.
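
A sketch of what this might look like, modeled on the Linux getsockopt description, where the length argument is an inout pointer to a len field. The second line applies the same shape to the hypothetical syscall_ex from the question; using intptr as the underlying type for size_t and in as the pointer direction are assumptions.

```
getsockopt(fd sock, level int32, optname int32, optval buffer[out], optlen ptr[inout, len[optval, int32]])

# Hypothetical description for: void syscall_ex(void *buf, size_t *size_of_buf);
syscall_ex(buf buffer[in], size_of_buf ptr[in, len[buf, intptr]])
```

Inside a pointer, len takes the underlying integer type as its second argument, since the size of the pointee cannot be inferred from a syscall argument slot.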

Branden Sherrell

Sep 21, 2021, 12:53:29 AM
to Dmitry Vyukov, syzkaller
Hi Dmitry,

Indeed it does! I was looking for an example of this use case. Apparently the description `ptr[dir, len[var]]` has special meaning for this scenario. Another quick question: is there a way to force syscalls that do not consume or produce resources to be enabled? I’m thinking in particular of system calls like `sync`. When syzkaller starts, it seems to disable system calls of this type. This is mentioned in the documentation as well (https://github.com/google/syzkaller/blob/fcdb12ba70875c410749932abf39160d19c753d9/docs/syscall_descriptions.md):

All system calls in the enable_syscalls list will be enabled if their requirements are met (ie. if they are supported in the target machine and any other system calls that need to run in order to provide inputs for them are also enabled).

Branden

Dmitry Vyukov

Sep 21, 2021, 11:48:44 AM
to Branden Sherrell, syzkaller
Syscalls that don't consume resources are enabled (not disabled). They
don't need any inputs. Why do you think they are disabled?