Are all PCs in newInput pointing to __sanitizer_cov_trace_pc things?

Joey Jiao

May 25, 2021, 4:27:22 AM
to syzkaller
Hi,
I'm checking whether the PCs from /rawcover correspond to calls to __sanitizer_cov_trace_pc.

All PCs within vmlinux are correct.

But for DLKM modules, they are not.
I decoded some:
```
   13544:       aa1703e0        mov     x0, x23
   13580:       f94002d6        ldr     x22, [x22]
   13598:       94000000        bl      0 <__asan_load4_noabort>
   144f0:       94000000        bl      0 <mutex_lock>
   1451c:       94000000        bl      0 <v4l2_fh_open>
   16848:       91000000        add     x0, x0, #0x0
   168ec:       f83832df        stset   x24, [x22]
   168f8:       14000317        b       17554 <cam_destroy_session_hdl+0x5c>
   169e0:       90000000        adrp    x0, 0 <cam_req_mgr_core_link_reset>
   16b78:       90000016        adrp    x22, 0 <cam_req_mgr_core_link_reset>
   16bd0:       91000042        add     x2, x2, #0x0
   16cbc:       94000000        bl      0 <__asan_load8_noabort>
```
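For reference, KCOV records the return address of the `__sanitizer_cov_trace_pc` call, so on arm64 the call itself is expected one instruction (4 bytes) before each reported PC. A minimal sketch of that check follows; the helper names are hypothetical and not syzkaller API. It only recognises the `bl` opcode. In an unlinked `.ko` the branch target is filled in by relocation, so the callee name still has to come from the relocation entries.

```go
package main

import "fmt"

// prevInsnPC returns the address of the instruction before pc.
// arm64 instructions are fixed-width (4 bytes), and KCOV stores the
// return address of the callback, so the call sits one instruction earlier.
func prevInsnPC(pc uint64) uint64 {
	return pc - 4
}

// isBL reports whether a 32-bit arm64 instruction word encodes BL
// (branch with link): the top six bits are 0b100101.
func isBL(insn uint32) bool {
	return insn&0xFC000000 == 0x94000000
}

func main() {
	// Instruction words taken from the listing above.
	for _, insn := range []uint32{0x94000000, 0xaa1703e0, 0x14000317} {
		fmt.Printf("%08x isBL=%v\n", insn, isBL(insn))
	}
	fmt.Printf("expected call site for PC 0x13544: %#x\n", prevInsnPC(0x13544))
}
```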

I think the reasons are:
1. These PCs are read from the corpus (uint32).
2. DLKM modules are loaded at a different address after each reboot, so the base address is different:
```
# cat /proc/modules|grep camera
camera 8675328 36 - Live 0xffffffd00bb58000

reboot and check again.

# cat /proc/modules | grep camera
camera 8675328 36 - Live 0xffffffd00bb8800

# cat /proc/modules | grep camera
camera 8675328 36 - Live 0xffffffd00b6a0000
```

Any suggestion to correct this?

Dmitry Vyukov

May 25, 2021, 9:10:38 AM
to Joey Jiao, syzkaller
On Tue, May 25, 2021 at 10:27 AM Joey Jiao <joey....@gmail.com> wrote:
>
> Hi,
> I'm passing /rawcover PC to check if it's related to call to __sanitizer_cov_trace_pc.
>
> All PCs within vmlinux are correct.
>
> But for DLKM modules, it's not.

How are they incorrect? What PCs do you see?

> I decoded some:
> ```
> 13544: aa1703e0 mov x0, x23
> 13580: f94002d6 ldr x22, [x22]
> 13598: 94000000 bl 0 <__asan_load4_noabort>
> 144f0: 94000000 bl 0 <mutex_lock>
> 1451c: 94000000 bl 0 <v4l2_fh_open>
> 16848: 91000000 add x0, x0, #0x0
> 168ec: f83832df stset x24, [x22]
> 168f8: 14000317 b 17554 <cam_destroy_session_hdl+0x5c>
> 169e0: 90000000 adrp x0, 0 <cam_req_mgr_core_link_reset>
> 16b78: 90000016 adrp x22, 0 <cam_req_mgr_core_link_reset>
> 16bd0: 91000042 add x2, x2, #0x0
> 16cbc: 94000000 bl 0 <__asan_load8_noabort>
> ```
>
> I think the reasons are :
> 1. these PCs are read from corpus (uint32)

PCs are not stored in the corpus. PCs are transient. Each syz-manager
process starts collecting PCs from scratch.

> 2. DLKM modules loaded into different address after each reboot. So the base address is different:
> ```
> # cat /proc/modules|grep camera
> camera 8675328 36 - Live 0xffffffd00bb58000
>
> reboot and check again.
>
> # cat /proc/modules | grep camera
> camera 8675328 36 - Live 0xffffffd00bb8800
>
> # cat /proc/modules | grep camera
> camera 8675328 36 - Live 0xffffffd00b6a0000
> ```
>
> Any suggestion to correct this?

Yes, this won't work. Maybe if you disable all address randomization
in the kernel and pre-load modules in fixed addresses, they will be
loaded at fixed addresses.

Joey Jiao

May 25, 2021, 8:46:29 PM
to syzkaller
CONFIG_RANDOMIZE_BASE has been disabled, but I have no idea which option controls the module load address.

According to https://unix.stackexchange.com/questions/108620/loading-a-kernel-driver-to-a-specific-memory-address, it seems modules cannot be pre-loaded at fixed addresses.

Other options?


"PCs are not stored in the corpus. PCs are transient. Each syz-manager process starts collecting PCs from scratch."
I see that it reads progs from the corpus DB, and those progs are later used by prepareFileMap to get PCs.
```
    var progs []cover.Prog
    if sig := r.FormValue("input"); sig != "" {
        inp := mgr.corpus[sig]
        progs = append(progs, cover.Prog{
            Data: string(inp.Prog),
            PCs:  convert(rg, inp.Cover),
        })
    } else {
```
So why did you say they are not stored in the corpus?

And if it is not possible to load a module at a fixed address, would a design like this be possible:
store the module name, convert each PC to an offset inside that module, and store the offset?
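A minimal sketch of the "module name + offset" idea, assuming the module base addresses and sizes are available from /proc/modules. The `Module`, `ModulePC`, and `Relativize` names are hypothetical and are not part of syzkaller; the offset 0x13544 is an arbitrary illustrative value.

```go
package main

import (
	"fmt"
	"sort"
)

// Module describes one loaded module using fields found in /proc/modules.
type Module struct {
	Name string
	Base uint64
	Size uint64
}

// ModulePC is a PC expressed relative to the module that contains it.
// An empty Module means the PC is outside every known module (for
// example in vmlinux); Offset then holds the absolute PC unchanged.
type ModulePC struct {
	Module string
	Offset uint64
}

// Relativize converts an absolute PC into module name + offset.
// mods must be sorted by Base.
func Relativize(mods []Module, pc uint64) ModulePC {
	// Index of the first module whose base is above pc; the only
	// candidate that can contain pc is the module just before it.
	i := sort.Search(len(mods), func(i int) bool { return mods[i].Base > pc })
	if i > 0 {
		m := mods[i-1]
		if pc-m.Base < m.Size {
			return ModulePC{Module: m.Name, Offset: pc - m.Base}
		}
	}
	return ModulePC{Offset: pc}
}

func main() {
	// Two base addresses of the same module, as seen across reboots.
	boot1 := []Module{{"camera", 0xffffffd00bb58000, 8675328}}
	boot2 := []Module{{"camera", 0xffffffd00b6a0000, 8675328}}
	a := Relativize(boot1, 0xffffffd00bb58000+0x13544)
	b := Relativize(boot2, 0xffffffd00b6a0000+0x13544)
	// The absolute PCs differ, but the relative form is identical.
	fmt.Println(a, b, a == b)
}
```

With this representation the same instruction maps to the same value no matter where the module lands, at the cost of one lookup per PC.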

Joey Jiao

May 25, 2021, 11:44:00 PM
to syzkaller
Ahh, I mixed up corpusDB and corpus. OK, so:

1. How can a module be loaded at a fixed address?
2. Or is it possible to save the PC in the corpus as module name + offset?

Dmitry Vyukov

May 26, 2021, 6:52:13 AM
to Joey Jiao, syzkaller
On Wed, May 26, 2021 at 5:44 AM Joey Jiao <joey....@gmail.com> wrote:
>
> Ahh, I mixed corpusDB and corpus. ok, so
>
> 1. how to load module in fixed address?

I don't know. Maybe somebody else on the list knows?

So this support for modules coverage was never working for you? Or
what am I missing?
https://github.com/google/syzkaller/pull/2478

> 2. or possible to save PC in corpus as module name + offset.

First I would ensure that Linux kernel cannot possibly load at fixed addresses.

It is possible but it must not penalize execution without modules too much.
It definitely must not be strings, but rather some integer
representing module id/index/hash.
Then there is the question of how to represent it: should it be combined
with the PC or stored separately? The simplest option would be to hash the
module into the coverage signal; then no additional support is required
anywhere else. But there is also a question of coverage...
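A rough sketch of the "hash the module into the signal" option, under the assumption that each module gets an integer ID computed once (here an FNV-1a hash of its name) and that the PC has already been reduced to an offset inside the module. `moduleID` and `signal` are hypothetical helpers, not syzkaller functions, and the XOR mix is only illustrative: distinct (module, offset) pairs can collide.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// moduleID derives a stable 32-bit identifier from a module name.
// It is meant to be computed once per module, not once per PC, so no
// strings are involved on the hot path.
func moduleID(name string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(name))
	return h.Sum32()
}

// signal mixes the module identifier with the offset of the PC inside
// that module. The result does not depend on the module load address.
func signal(modID, offset uint32) uint32 {
	return modID ^ offset
}

func main() {
	id := moduleID("camera")
	// The same offset yields the same signal on every boot.
	fmt.Printf("module id %#x, signal %#x\n", id, signal(id, 0x13544))
}
```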

Joey Jiao

May 26, 2021, 7:18:10 AM
to syzkaller
It worked. I just happened to find recently, in https://github.com/google/syzkaller/pull/2596, that a mismatch stops it from working.
While debugging that, I found that DLKMs loaded at different addresses cause some PCs to be symbolized incorrectly.

Joey Jiao

May 26, 2021, 7:32:21 AM
to syzkaller
BTW, on qemu I didn't see this issue: only a few modules are loaded there, and the addresses stay constant every time I reboot. But my real device setup has hundreds of modules, and that is where the issue shows up.

Dmitry Vyukov

May 26, 2021, 7:35:16 AM
to Joey Jiao, syzkaller
On Wed, May 26, 2021 at 1:32 PM Joey Jiao <joey....@gmail.com> wrote:
>
> BTW, on qemu I didn't see this issue as there are only several modules loaded and every time I reboot, I see the address is constant. But in my real device setup, there are hundreds of modules and so the issue comes out.

Maybe you just need to preload them all after boot in fixed order?
If modules are loaded lazily and in random order, then obviously
addresses will be different even if kernel allocated addresses
deterministically.

Joey Jiao

May 26, 2021, 7:49:52 AM
to syzkaller
I'm also thinking about the preload order, since modifying syzkaller looks time-consuming and complicated for this use case.
I also investigated compiling these modules back into vmlinux, but that doesn't seem feasible at the moment because it needs many build system changes.

Thanks

Dmitry Vyukov

May 26, 2021, 7:55:53 AM
to Joey Jiao, syzkaller
On Wed, May 26, 2021 at 1:49 PM Joey Jiao <joey....@gmail.com> wrote:
>
> I'm also thinking of that preload order as modifying syzkaller seems really takes time and complicated for this usecase.
> I'm also investigated to compile these modules back into vmlinux but seems not feasible at the moment as many build system changes.

I was thinking of some init script that will do insmod for all modules
in fixed order.
Perhaps the startup_script can do it:
https://github.com/google/syzkaller/blob/master/vm/adb/adb.go#L45

Joey Jiao

May 26, 2021, 8:41:38 AM
to syzkaller
The idea seems to work.
I fixed some of the customized modprobe scripts, and some modules are now loaded at fixed addresses.
Some are not; for those I still need to find which script loads them.

Joey Jiao

May 27, 2021, 11:22:04 PM
to syzkaller
After digging further, I don't think fixed addresses can be guaranteed across reboots.

If I modprobe these modules in sequence on a specific CPU, some of the modules do get the same addresses.
But the kernel/mm doesn't guarantee the vmalloc address for some modules later.

Let me consider the option 2 above as well.

Dmitry Vyukov

May 28, 2021, 3:32:12 AM
to Joey Jiao, syzkaller
On Fri, May 28, 2021 at 5:22 AM Joey Jiao <joey....@gmail.com> wrote:
>
> After digging further, I think it cannot guarantee to have addresses fixed for each reboot.
>
> If I modprobe these modules in sequence on specific CPU, some of the modules true can have the same addresses.
> But the kernel/mm doesn't guarantee to have vmalloc address for some modules later.

What do you mean by "later"? If you load all of them, then no modules
will be loaded later, right?

Dmitry Vyukov

May 28, 2021, 3:35:20 AM
to Joey Jiao, syzkaller
On Fri, May 28, 2021 at 9:31 AM Dmitry Vyukov <dvy...@google.com> wrote:
> > After digging further, I think it cannot guarantee to have addresses fixed for each reboot.
> >
> > If I modprobe these modules in sequence on specific CPU, some of the modules true can have the same addresses.
> > But the kernel/mm doesn't guarantee to have vmalloc address for some modules later.
>
> What do you mean by "later"? If you load all of them, then no modules
> will be loaded later, right?

In both Documentation/x86/x86_64/mm.rst and
Documentation/arm64/memory.rst I see that modules have a dedicated
region, which is different from vmalloc region. So vmalloc's shouldn't
interfere with module loading addresses.

Joey Jiao

May 28, 2021, 3:35:45 AM
to syzkaller
I meant, for example: I have 40 modules to load.
The first 28 are loaded at the same addresses as before, then the 29th is loaded at a different address, and so are all the following modules.

From the move_module and vmalloc code, we couldn't find any cause for it.

Dmitry Vyukov

May 28, 2021, 3:42:51 AM
to Joey Jiao, syzkaller
On Fri, May 28, 2021 at 9:35 AM Joey Jiao <joey....@gmail.com> wrote:
>
> I meant for example, I have 40 modules to load,
> 28 are loaded at the same address, then 29th loaded in different address, and so all following modules are not.
>
> From the move_module and vmalloc code, we didn't find any cause of it.

I would add some logging to kernel around the place where it selects
the address. There must be some reason for this in the code.

Joey Jiao

May 28, 2021, 3:52:35 AM
to syzkaller
Do you have any real devices, like Pixels, to help debug too?

We added many printks too.

Dmitry Vyukov

May 28, 2021, 4:11:04 AM
to Joey Jiao, syzkaller
On Fri, May 28, 2021 at 9:52 AM Joey Jiao <joey....@gmail.com> wrote:
>
> Do you have any real devices like pixels to help debug too?
>
> We added many printks too.

I personally don't have any devices.

Modules can load other modules, maybe you have a race between loading
dependencies and loading subsequent modules?
I see there is something called autoclean, maybe some of the modules
are unloaded?

What I would try is:
1. Load modules once and collect expected load addresses
(/proc/modules contents).
2. Load them again and after loading each module verify that they are
at expected addresses (maybe just ensuring that /proc/modules is
prefix of the full /proc/modules will work).
3. If there is any difference, dump /proc/modules.
The question is: is just the address different? Or there are some
other differences, e.g. other modules are loaded/unloaded?
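The check in steps 1 to 3 could be sketched roughly as below. This is an illustrative assumption, not an existing tool: it compares snapshots by module name instead of relying on line order, and modules that are expected but not loaded yet are deliberately not reported, since the current snapshot is taken partway through loading. `parseModules` and `diffModules` are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// parseModules extracts name -> load address from /proc/modules text.
// Each line looks like: "camera 8675328 36 - Live 0xffffffd00bb58000".
func parseModules(text string) map[string]string {
	mods := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 6 {
			continue // skip blank or malformed lines
		}
		mods[f[0]] = f[5]
	}
	return mods
}

// diffModules compares the current snapshot against the expected one.
// It reports modules loaded at an unexpected address and modules that
// were never in the expected snapshot, in sorted order.
func diffModules(expected, current map[string]string) []string {
	var diffs []string
	for name, addr := range current {
		want, ok := expected[name]
		switch {
		case !ok:
			diffs = append(diffs, fmt.Sprintf("%s: unexpected module at %s", name, addr))
		case want != addr:
			diffs = append(diffs, fmt.Sprintf("%s: at %s, expected %s", name, addr, want))
		}
	}
	sort.Strings(diffs)
	return diffs
}

func main() {
	expected := parseModules("mod_a 4096 0 - Live 0xffffffd00b6a0000\n" +
		"mod_b 4096 0 - Live 0xffffffd00b6a2000\n")
	current := parseModules("mod_a 4096 0 - Live 0xffffffd00b6a0000\n" +
		"mod_b 4096 0 - Live 0xffffffd00b6a4000\n")
	for _, d := range diffModules(expected, current) {
		fmt.Println(d)
	}
}
```

Running the comparison after every single module load should pinpoint the first module whose address diverges and show whether an extra module appeared just before it.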

Joey Jiao

Jun 1, 2021, 4:22:35 AM
to syzkaller
Well, still no luck loading all modules at fixed addresses.
I can see 40 module allocations in /proc/vmallocinfo, but it's true that they are not continuous.

So I would prefer to implement the syzkaller change; a draft version is here: https://github.com/google/syzkaller/pull/2604

Dmitry Vyukov

Jun 1, 2021, 12:25:56 PM
to Joey Jiao, syzkaller
On Tue, Jun 1, 2021 at 10:22 AM Joey Jiao <joey....@gmail.com> wrote:
>
> Well, still no luck to load in fixed address for all modules.
> I can see 40 module alloc in /proc/vmallocinfo, but it's true there it's not continuous.
>
> So I prefer to implement the syzkaller change, and a draft version is here https://github.com/google/syzkaller/pull/2604

I looked at the PR. In its current form I think it breaks the coverage
filter, coverage reports, coverage exchange between VMs, and probably
more. All these places assume PCs don't change or use a fixed modules
snapshot.
If this were a small, low-risk change with no performance impact,
maybe we could just merge it. But it's not small, not low risk, has
performance/memory impact, and increases the system complexity globally
for the whole project forever. So I am still interested in getting to
the bottom of the non-deterministic module loading.

I see that modules are allocated from the dedicated region:
https://elixir.bootlin.com/linux/latest/source/arch/arm64/kernel/module.c#L38

and they are allocated deterministically at the lowest address:
https://elixir.bootlin.com/linux/latest/source/mm/vmalloc.c#L1105

What are the contents of /proc/vmallocinfo and /proc/modules at the
time of the first module address difference?

Joey Jiao

Jun 4, 2021, 1:44:33 AM
to syzkaller
OK, so modprobe -a causes the issue, but I didn't dig further.
So here's the solution:
1. Disable KASLR.
2. Get the module load order from the original device: `cat /proc/modules | sort -k 6`
3. modprobe these modules one by one, in that order.
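Step 2 can be sketched in Go as follows, assuming (as the thread does) that with KASLR disabled the ascending load address reflects the load order. The function name `loadOrder` and the sample module names are hypothetical. Unlike `sort -k 6`, which compares text, this parses the addresses and compares them numerically.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// loadOrder returns module names sorted by ascending load address,
// taken from the sixth column of /proc/modules.
func loadOrder(procModules string) ([]string, error) {
	type entry struct {
		name string
		addr uint64
	}
	var entries []entry
	sc := bufio.NewScanner(strings.NewReader(procModules))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 6 {
			continue // skip blank or malformed lines
		}
		// Base 0 lets ParseUint accept the "0x" prefix.
		addr, err := strconv.ParseUint(f[5], 0, 64)
		if err != nil {
			return nil, fmt.Errorf("module %s: %w", f[0], err)
		}
		entries = append(entries, entry{f[0], addr})
	}
	sort.Slice(entries, func(i, j int) bool { return entries[i].addr < entries[j].addr })
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.name)
	}
	return names, nil
}

func main() {
	// Sample input; on a device this text would come from /proc/modules.
	sample := "mod_b 4096 0 - Live 0xffffffd00bb58000\n" +
		"mod_a 4096 0 - Live 0xffffffd00b6a0000\n"
	names, err := loadOrder(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(strings.Join(names, " "))
}
```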

Dmitry Vyukov

Jun 4, 2021, 4:27:01 AM
to Joey Jiao, syzkaller
On Fri, Jun 4, 2021 at 7:44 AM Joey Jiao <joey....@gmail.com> wrote:
>
> OK, so modprobe -a causes the issue, but I didn't dig further.
> So here's the solution:
> 1. disable kaslr
> 2. get module loader order from original device `cat /proc/modules | sort -k 6`
> 3. modprobe these modules one by one.

Cool! Glad you sorted it out. And thanks for writing down the solution.

Max Spector

Jun 8, 2021, 10:41:35 PM
to syzkaller
Hey, we are trying to do this on an actual Pixel device but are unable to modprobe easily. Could you please share the script you are using to get the modules to align? We are having issues loading and unloading the modules on a Pixel device.

Joey Jiao

Jun 13, 2021, 5:39:34 AM
to syzkaller
For example, see https://source.android.com/devices/architecture/kernel/loadable-kernel-modules:
```
on early-init
    exec u:r:vendor_modprobe:s0 -- /vendor/bin/modprobe -a -d \
        /vendor/lib/modules module_a module_b module_c ...
```

Instead of `modprobe -a module_a module_b ...`, it can be done like this:
1. Put the modprobe calls into a shell script.
2. In the script, load the modules one by one:
```
# The order in M comes from: cat /proc/modules | sort -k 6 | awk '{print $1}'
M="module_a module_b"
for i in $M; do
    /vendor/bin/modprobe -d /vendor/lib/modules "$i"
done
```

Joey Jiao

Jun 24, 2021, 8:07:34 PM
to syzkaller
Adding a link to the way to fix the PC mismatch on arm64: https://github.com/google/syzkaller/pull/2631#issuecomment-868084389

The OS, binutils, and syzkaller all need to be modified to fix it.
