Re: Getting started with syzbot


Dmitry Vyukov

Jun 11, 2018, 3:48:50 AM
to Shreeya Patel, syzkaller
On Sat, Jun 9, 2018 at 10:15 PM, Shreeya Patel
<shreeya.p...@gmail.com> wrote:
> Hi Dmitry,
>
> As I told you before, I am here to learn about syzbot. This task was
> actually given to me by Greg. I see a lot of people sending patches for
> bugs, and I also want to fix some. What do I need to learn for that? How
> should I read a syzbot report?
> Basically, I am not able to understand the syzbot report.
> Do you have some references through which I can understand the workflow and
> learn to debug here?
> I've done some Linux kernel development work, but this is new for me.

+syzkaller mailing list

Hi Shreeya,

It's great that you want to fix some syzbot-reported bugs.

The workflow is along the following lines:

1. Choose a bug from the "open" bugs on the syzbot dashboard:
https://syzkaller.appspot.com/

2. Check the current bug status and any custom comments on the mailing
list thread for the bug (available via "Reported" link). For example,
for this bug:
https://syzkaller.appspot.com/bug?id=98de7581d334faa54310825a332cafcddfcb5bb7
you can see here:
https://groups.google.com/forum/#!msg/syzkaller-bugs/H2CBWPXBYPk/Z4rb4zIgBQAJ
that it was actually already fixed (but syzbot does not know about it yet).
This is useful to avoid duplicated work and reuse any debugging other
people already did.

3. Debug and fix the bug as you would any other kernel bug.

4. Send a patch with the fix and include the Reported-by tag
referenced in the email and on the dashboard.

That's it. Some additional documentation is available at:
https://github.com/google/syzkaller/blob/master/docs/syzbot.md

Generally, bugs with reproducers are easier to debug, as you can
reproduce them locally and test your fix. So I would suggest starting
with bugs that have reproducers.

If you are asking about the format of syzbot reports, they are meant
to be self-explanatory. I am not sure what to add here. Do you have
any concrete questions?

Thanks

Shreeya Patel

Jun 11, 2018, 4:47:38 AM
to Dmitry Vyukov, syzkaller
So people will always have to check the mailing list. Instead, why can't we have something on the dashboard that shows that someone is working on a bug, or is willing to work on it, so that he/she can put a tag on it?

>
>3. Debug and fix the bug as you would any other kernel bug.
>
>4. Send a patch with the fix and include the Reported-by tag
>referenced in the email and on the dashboard.
>
>That's it. Some additional documentation is available at:
>https://github.com/google/syzkaller/blob/master/docs/syzbot.md
>
>Generally, bugs with reproducers are easier to debug, as you can
>reproduce them locally and test your fix. So I would suggest starting
>with bugs that have reproducers.
>
>If you are asking about the format of syzbot reports, they are meant
>to be self-explanatory. I am not sure what to add here. Do you have
>any concrete questions?

No, I think all my doubts have been cleared. Thank you so much :)

>
>Thanks

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Dmitry Vyukov

Jun 11, 2018, 9:15:34 AM
to Shreeya Patel, syzkaller

Dmitry Vyukov

Jun 22, 2018, 6:00:32 AM
to Shreeya Patel, syzkaller
Hi Shreeya,

Unfortunately, debugging is generally a creative process, without
step-by-step instructions that always lead to a result. Bugs can
also be of vastly different complexity. Some are just missed sanity
checks on function inputs, some are tricky interactions of concurrent
threads and complex state machines that give a failure under some
obscure conditions. Effective debugging requires lots of practice, and
this skill can't be acquired within a day.

Looking at stack traces and locating the involved source code is always
useful as a first step. I, for one, don't use gdb with the kernel. But
maybe you will find it useful. gdb would allow you to print the values of
the involved variables at the time of the crash. But the same is achievable
by adding more debug output to the kernel source code, and that's what I
generally use.

On the highest level, one needs to understand what should have
prevented the crash. If you find nothing, then you add logic that
prevents it. The simplest example would be adding a check that a
function argument is within certain bounds and returning EINVAL if
it is not. If you find existing logic that should have
prevented the crash, then you need to understand why/when it failed to
prevent it and fix it. But I'm afraid such instructions are so general
that they are practically useless.
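
As a minimal sketch of that simplest case: the snippet below is plain
userspace C, and the names table_set and TABLE_SIZE are hypothetical,
not taken from any real kernel code. It only illustrates the shape of
a bounds check that returns -EINVAL instead of touching memory outside
the array.

```c
#include <errno.h>
#include <stddef.h>

#define TABLE_SIZE 16	/* hypothetical size, for illustration only */

static int table[TABLE_SIZE];

/*
 * Hypothetical helper: store a value at a caller-supplied index.
 * Without the range check, an out-of-range idx would write past table[].
 */
int table_set(size_t idx, int value)
{
	if (idx >= TABLE_SIZE)
		return -EINVAL;	/* reject bad input instead of corrupting memory */

	table[idx] = value;
	return 0;
}
```

In a real fix the interesting part is deciding which bound is the
correct one and where the check belongs, which depends on the
subsystem.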

Regarding reproducers, yes, generally you reproduce the crash first,
then write a tentative fix, then re-run the reproducer and it should
not trigger the crash.

This topic is hard to explain in one email, sorry. Perhaps you can
find some literature on the topic, or a more experienced developer
whom you can sit with and learn from on some examples.



On Thu, Jun 21, 2018 at 4:30 PM, Shreeya Patel
<shreeya.p...@gmail.com> wrote:
> Hi Dmitry,
>
> I have been reading about how to locate bugs for a few days.
>
> https://www.kernel.org/doc/html/v4.11/admin-guide/bug-hunting.html?highlight=bug%20hunting
> https://static.lwn.net/images/pdf/LDD3/ch04.pdf
>
> Above are some of the references which I have been following. But I see that
> all these references talk about EIP
> to locate the bug with GDB or if that doesn't work (due to stack overflow)
> then we look at the call trace.
>
> Is this the method which all the developers are following? I know there are
> various ways to do so but I just want
> to know whether I'm on the right path or not.
>
> One more thing I would like to know. Maybe this is quite a basic question (or
> a silly one).
> I picked one bug report to work on which had a C reproducer. I saved that
> reproducer
> on my system, compiled and ran the program. It compiled successfully but
> after running it
> there was no output i.e. it hung there. This means that the bug was
> reproduced.
> What should be the further process?
> Maybe it should be that I have to solve that bug by locating it using gdb?
> Suppose I solved it and after that if I run the reproducer program again,
> will it be that
> it won't hang again?
>
> Sorry for disturbing you again and again. I'm a second year engineering
> student and have got no experience
> with debugging kernel oops but I want to learn it.
>
> Thanks

Alexander Potapenko

Jun 22, 2018, 6:56:05 AM
to Dmitriy Vyukov, shreeya.p...@gmail.com, syzkaller
My 5 cents on this topic.

First, GDB can sometimes be useful, but in many cases it may be
misleading or confusing because it was attached too late, other
things are happening in the background, stack traces are corrupted, etc.
I wouldn't recommend starting with it.
What GDB is good at, at the basic level, is quickly detecting that the
kernel deadlocked: if you attach GDB to a running program, dump the
stack and see some _spin_lock() functions at the top for a long time,
this basically means you're stuck.
But I believe there are easier means to detect deadlocks (there should
be a debug config for that).

Once you have a stable bug reproducer (which is critical!) and a crash
stack trace, your tools of the trade should be printk() and bisection.
The basic idea of debugging is to figure out which invariant in the
program is violated and to pinpoint the place at which the violation
happened (not necessarily the place where the program crashed).

For example, if you know that the kernel crashes on a NULL dereference
in a function foo(), you can figure out where the data comes from.
First, print the pointer value at the point of crash to make sure it's NULL.
Then check all assignments to that variable in the current function
(if any) to see which of them have introduced that NULL value.
If the pointer was passed to foo() as a parameter, check its value
right at the beginning of foo(). If it's also NULL, go up the crash
stack till you find the origin of that NULL value.
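
To make this concrete, here is a rough userspace sketch of walking a
NULL up the call chain. Everything in it is hypothetical: struct item,
lookup(), bar() and the body of foo() are invented for illustration,
and pr_err() is defined here as a stand-in that prints to stderr. The
NULL check in foo() is only there so the sketch does not crash; in the
real scenario that is where the dereference would oops.

```c
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's pr_err(); prints to stderr. */
#define pr_err(...) fprintf(stderr, __VA_ARGS__)

struct item { int value; };	/* hypothetical type */

static struct item items[2] = { { 10 }, { 20 } };

/* Hypothetical origin of the NULL: the lookup fails for unknown keys. */
struct item *lookup(int key)
{
	if (key < 0 || key >= 2)
		return NULL;
	return &items[key];
}

int foo(struct item *ptr)
{
	/* Print the pointer right at the beginning of foo(). */
	pr_err("foo: entry ptr=%p\n", (void *)ptr);
	if (!ptr)
		return -1;	/* the crash would happen on ptr->value below */
	return ptr->value;
}

int bar(int key)
{
	struct item *ptr = lookup(key);

	/* One level up the stack: was the value already NULL here? */
	pr_err("bar: lookup(%d) returned %p\n", key, (void *)ptr);
	return foo(ptr);
}
```

Calling bar(5) prints a NULL pointer in both bar() and foo(), which
points at lookup() as the place where the NULL was introduced.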

It's extremely handy to use the __FILE__ and __LINE__ macros when
printing the values, e.g.:
pr_err("ptr=%px HERE: %s:%d\n", ptr, __FILE__, __LINE__);

If for some reason you don't have the crash stack, but have a rough
idea what might have gone wrong, you can still figure out the exact
location by inserting pr_err() into the suspect functions and seeing
which of them were called before the crash.
This requires some knowledge of the subsystem in question, however.
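
A small sketch of that idea, again in userspace C with made-up names
(TRACE_HERE, suspect_a, suspect_b and run_path are all hypothetical).
In the kernel the fprintf() call would be pr_err(); the point is just
that each suspect function announces itself, so the last line in the
log before the crash tells you how far execution got.

```c
#include <stdio.h>

/* Hypothetical tracing macro; in the kernel pr_err() would replace fprintf(). */
#define TRACE_HERE() \
	fprintf(stderr, "HERE: %s:%d %s()\n", __FILE__, __LINE__, __func__)

static int calls;	/* how many suspect functions ran on this path */

void suspect_a(void)
{
	TRACE_HERE();
	calls++;
}

void suspect_b(void)
{
	TRACE_HERE();
	calls++;
}

/* Runs one of two hypothetical code paths and reports how many suspects ran. */
int run_path(int take_b)
{
	calls = 0;
	suspect_a();
	if (take_b)
		suspect_b();
	return calls;
}
```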

Bisection is a general trick to reduce the search space by dividing it
into two halves.
It can be applied to kernel commits (as in "which commit introduced
this bug?" [1]), reproducer programs ("which part of the program
actually triggers the bug?" [2]), source files ("on which line of the
function is the following invariant violated?"), printk outputs
("there are too many calls to printk already, can I remove some?") and
anything else.
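
The halving step itself is the same in all of these cases. As a
generic sketch (not how `git bisect` is implemented; the names
bisect_first_bad, is_bad_fn and bad_from are invented here), assume
the candidates are numbered, the first one is known good, the last one
is known bad, and everything after the first bad candidate is also bad:

```c
#include <stddef.h>

/* Predicate: returns nonzero if candidate i shows the bug. */
typedef int (*is_bad_fn)(size_t i, void *ctx);

/*
 * Return the first bad index, assuming `good` is known good, `bad` is
 * known bad, good < bad, and every index after the first bad one is bad.
 * Each iteration halves the remaining search space.
 */
size_t bisect_first_bad(size_t good, size_t bad, is_bad_fn is_bad, void *ctx)
{
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;	/* bug already present at mid */
		else
			good = mid;	/* bug introduced after mid */
	}
	return bad;
}

/* Hypothetical predicate for testing: the bug first appears at index *ctx. */
int bad_from(size_t i, void *ctx)
{
	return i >= *(size_t *)ctx;
}
```

With 100 candidates this needs about 7 checks instead of up to 100,
which matters when each check means building and booting a kernel.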

As Dmitry said, it's quite hard to write general instructions,
especially given the variety of bugs that syzkaller is able to detect.
I suggest you start with something more or less straightforward (like
null dereferences, local stack buffer overflows or assertion
violations) and reproducible to gain some hands-on experience, and
then move on to more complex cases like a hanging kernel.

[1] - you may want to check out the manual for `git bisect` if you
want to do commit bisection. This can also be handy: looking at the
commit that introduced a bug may be easier than reading the whole
file.
[2] - usually syzkaller repros are good enough, but they can be
reduced further either manually or by using tools like multidelta.
Reducing a test case manually may also give you a better understanding
of what's going on in the kernel.



--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

Shreeya Patel

Jun 23, 2018, 2:07:22 PM
to Alexander Potapenko, Dmitriy Vyukov, syzkaller
On Fri, 2018-06-22 at 12:55 +0200, Alexander Potapenko wrote:

Hi Alexander,
This example was quite useful in understanding the work flow.

>
> Bisection is a general trick to reduce the search space by dividing
> it
> into two halves.
> It can be applied to kernel commits (as in "which commit introduced
> this bug?" [1]), reproducer programs ("which part of the program
> actually triggers the bug?" [2]), source files ("on which line of the
> function is the following invariant violated?"), printk outputs
> ("there are too many calls to printk already, can I remove some?")
> and
> anything else.
>
> As Dmitry said, it's quite hard to write general instructions,
> especially given the variety of bugs that syzkaller is able to
> detect.
> I suggest you start with something more or less straightforward (like
> null dereferences, local stack buffer overflows or assertion
> violations) and reproducible to gain some hands-on experience, and
> then move on to more complex cases like a hanging kernel.

Ok.

>
> [1] - you may want to check out the manual for `git bisect` if you
> want to do commit bisection. This can also be handy: looking at the
> commit that introduced a bug may be easier than reading the whole
> file.
> [2] - usually syzkaller repros are good enough, but they can be
> reduced further either manually or by using tools like multidelta.
> Reducing a test case manually may also give you a better understanding
> of what's going on in the kernel.

Thanks for such a detailed explanation. It will be useful for other
newcomers as well :)
> > > > > On 11 June 2018 13:18:26 GMT+05:30, Dmitry Vyukov <dvyukov@go

Shreeya Patel

Jun 23, 2018, 2:14:29 PM
to Dmitry Vyukov, syzkaller
Ok, got it.

>
> This topic is hard to explain in one email, sorry. Perhaps you can
> find some literature on the topic, or a more experienced developer
> whom you can sit with and learn from on some examples.

Ok, so I'll try to locate and solve the bug by looking at the crash
stack. Thanks for such a detailed explanation :)
> > > > On 11 June 2018 13:18:26 GMT+05:30, Dmitry Vyukov <dvyukov@goog