Re: [syzbot] [kernel?] possible deadlock in scheduler_tick (2)

Tetsuo Handa

May 20, 2023, 7:02:39 AM5/20/23
to syzbot, syzkall...@googlegroups.com, Mel Gorman, Huang, Ying, Vlastimil Babka, Andrew Morton, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, kasan-dev, linux-mm
This looks like the same pattern as https://lkml.kernel.org/r/6577e1fa-b6ee-f2be...@I-love.SAKURA.ne.jp .
I think stackdepot needs to drop __GFP_KSWAPD_RECLAIM, as debugobjects did.

As I wrote at https://lore.kernel.org/linux-mm/d642e597-cf7d-b410...@I-love.SAKURA.ne.jp/ ,
I wish that GFP_ATOMIC / GFP_NOWAIT did not imply __GFP_KSWAPD_RECLAIM...

On 2023/05/20 17:26, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: f1fcbaa18b28 Linux 6.4-rc2
> git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
> console output: https://syzkaller.appspot.com/x/log.txt?x=1332a029280000
> kernel config: https://syzkaller.appspot.com/x/.config?x=3dc1cdd68141cdc3
> dashboard link: https://syzkaller.appspot.com/bug?extid=ece2915262061d6e0ac1
> compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
> userspace arch: arm64
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/f9e1748cceea/disk-f1fcbaa1.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/6dea99343621/vmlinux-f1fcbaa1.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/f5a93f86012d/Image-f1fcbaa1.gz.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+ece291...@syzkaller.appspotmail.com

Tetsuo Handa

May 20, 2023, 7:33:22 AM5/20/23
to syzbot, syzkall...@googlegroups.com, Mel Gorman, Huang, Ying, Vlastimil Babka, Andrew Morton, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Andrey Ryabinin, kasan-dev, linux-mm
syzbot is reporting a lockdep warning in __stack_depot_save(), because
wakeup_kswapd() from wake_all_kswapds() from __alloc_pages_slowpath()
calls wakeup_kcompactd() when __GFP_KSWAPD_RECLAIM is set and
__GFP_DIRECT_RECLAIM is not set (i.e. GFP_ATOMIC).

Since __stack_depot_save() might be called with arbitrary locks held,
__stack_depot_save() should not wake kswapd, which in turn wakes kcompactd.

Reported-by: syzbot <syzbot+ece291...@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=ece2915262061d6e0ac1
Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
Fixes: cd11016e5f52 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB")
---
lib/stackdepot.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 2f5aa851834e..5c331a80b87a 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -405,7 +405,10 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
* contexts and I/O.
*/
alloc_flags &= ~GFP_ZONEMASK;
- alloc_flags &= (GFP_ATOMIC | GFP_KERNEL);
+ if (!(alloc_flags & __GFP_DIRECT_RECLAIM))
+ alloc_flags &= __GFP_HIGH;
+ else
+ alloc_flags &= GFP_KERNEL;
alloc_flags |= __GFP_NOWARN;
page = alloc_pages(alloc_flags, DEPOT_POOL_ORDER);
if (page)
--
2.18.4

Tetsuo Handa

May 20, 2023, 9:15:12 AM5/20/23
to syzbot, syzkall...@googlegroups.com, Mel Gorman, Huang, Ying, Vlastimil Babka, Andrew Morton, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Andrey Ryabinin, kasan-dev, linux-mm
On 2023/05/20 20:33, Tetsuo Handa wrote:
> @@ -405,7 +405,10 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
> * contexts and I/O.
> */
> alloc_flags &= ~GFP_ZONEMASK;
> - alloc_flags &= (GFP_ATOMIC | GFP_KERNEL);
> + if (!(alloc_flags & __GFP_DIRECT_RECLAIM))
> + alloc_flags &= __GFP_HIGH;
> + else
> + alloc_flags &= GFP_KERNEL;
> alloc_flags |= __GFP_NOWARN;

Well, comparing this with a report which reached __stack_depot_save() via fill_pool()
( https://syzkaller.appspot.com/bug?extid=358bb3e221c762a1adbb ), I feel that
the above lines might be bogus.

Maybe we want to enable __GFP_HIGH even if alloc_flags == GFP_NOWAIT, because
fill_pool() uses __GFP_HIGH | __GFP_NOWARN regardless of the caller's context.
Then, these lines could be simplified like below.

	if (!(alloc_flags & __GFP_DIRECT_RECLAIM))
		alloc_flags = __GFP_HIGH | __GFP_NOWARN;
	else
		alloc_flags = (alloc_flags & GFP_KERNEL) | __GFP_NOWARN;

How important is the memory allocation in __stack_depot_save()?
If allocation failure is acceptable, maybe we should avoid triggering the OOM
killer (e.g. by adding __GFP_NORETRY) when alloc_flags contains __GFP_FS ...

Tetsuo Handa

May 20, 2023, 6:44:40 PM5/20/23
to syzbot, syzkall...@googlegroups.com, Mel Gorman, Huang, Ying, Vlastimil Babka, Andrew Morton, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Andrey Ryabinin, kasan-dev, linux-mm
Well, since stackdepot itself simply uses the GFP flags supplied by KASAN,
shouldn't this be considered a KASAN problem?

__kasan_record_aux_stack() {
	alloc_meta->aux_stack[0] = kasan_save_stack(GFP_NOWAIT, can_alloc); // May deadlock due to including __GFP_KSWAPD_RECLAIM bit.
}

Any atomic allocation used by KASAN needs to drop __GFP_KSWAPD_RECLAIM bit.
Where do we want to drop this bit (in the caller side, or in the callee side)?

Huang, Ying

May 21, 2023, 10:14:44 PM5/21/23
to Tetsuo Handa, syzbot, syzkall...@googlegroups.com, Mel Gorman, Vlastimil Babka, Andrew Morton, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Andrey Ryabinin, kasan-dev, linux-mm, Johannes Weiner
Yes. I think we should fix the KASAN. Maybe define a new GFP_XXX
(instead of GFP_ATOMIC) for debug code? The debug code may be called at
almost arbitrary places, and wakeup_kswapd() isn't safe to be called in
some situations.

BTW: I still think that it's better to show the circular lock order in
the patch description. I know the information is in the syzkaller report.
It will make readers' lives easier if the patch description is
self-contained.

Best Regards,
Huang, Ying

Tetsuo Handa

May 21, 2023, 10:48:07 PM5/21/23
to Huang, Ying, syzbot, syzkall...@googlegroups.com, Mel Gorman, Vlastimil Babka, Andrew Morton, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Andrey Ryabinin, kasan-dev, linux-mm, Johannes Weiner, Michal Hocko
On 2023/05/22 11:13, Huang, Ying wrote:
>> Any atomic allocation used by KASAN needs to drop __GFP_KSWAPD_RECLAIM bit.
>> Where do we want to drop this bit (in the caller side, or in the callee side)?
>
> Yes. I think we should fix the KASAN. Maybe define a new GFP_XXX
> (instead of GFP_ATOMIC) for debug code? The debug code may be called at
> almost arbitrary places, and wakeup_kswapd() isn't safe to be called in
> some situations.

What do you think about removing __GFP_KSWAPD_RECLAIM from GFP_ATOMIC and GFP_NOWAIT?
Recent reports indicate that atomic allocations (GFP_ATOMIC and GFP_NOWAIT) are not safe
enough to be considered "atomic". They just don't do direct reclaim, but they do take spinlocks.
Removing __GFP_KSWAPD_RECLAIM from GFP_ATOMIC and GFP_NOWAIT simplifies locking dependencies and
reduces the latency of atomic allocations (which is important when called from "atomic" context).
I consider that memory allocations which do not do direct reclaim should be geared towards
fewer locking dependencies.

In general, GFP_ATOMIC or GFP_NOWAIT users will not allocate many pages.
It is likely that somebody else tries to allocate memory using __GFP_DIRECT_RECLAIM
right after GFP_ATOMIC or GFP_NOWAIT allocations. It is unlikely that we need to
wake kswapd upon GFP_ATOMIC or GFP_NOWAIT allocations.

If some GFP_ATOMIC or GFP_NOWAIT users need to allocate many pages, they can add
__GFP_KSWAPD_RECLAIM explicitly; though allocating many pages using GFP_ATOMIC or
GFP_NOWAIT is not recommended in the first place...

Huang, Ying

unread,
May 21, 2023, 11:09:08 PM5/21/23
to Tetsuo Handa, syzbot, syzkall...@googlegroups.com, Mel Gorman, Vlastimil Babka, Andrew Morton, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Andrey Ryabinin, kasan-dev, linux-mm, Johannes Weiner, Michal Hocko
Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp> writes:

> On 2023/05/22 11:13, Huang, Ying wrote:
>>> Any atomic allocation used by KASAN needs to drop __GFP_KSWAPD_RECLAIM bit.
>>> Where do we want to drop this bit (in the caller side, or in the callee side)?
>>
>> Yes. I think we should fix the KASAN. Maybe define a new GFP_XXX
>> (instead of GFP_ATOMIC) for debug code? The debug code may be called at
>> almost arbitrary places, and wakeup_kswapd() isn't safe to be called in
>> some situations.
>
> What do you think about removing __GFP_KSWAPD_RECLAIM from GFP_ATOMIC and GFP_NOWAIT?
> Recent reports indicate that atomic allocations (GFP_ATOMIC and GFP_NOWAIT) are not safe
> enough to think "atomic". They just don't do direct reclaim, but they do take spinlocks.
> Removing __GFP_KSWAPD_RECLAIM from GFP_ATOMIC and GFP_NOWAIT simplifies locking dependency and
> reduces latency of atomic allocations (which is important when called from "atomic" context).
> I consider that memory allocations which do not do direct reclaim should be geared towards
> less locking dependency.

Except debug code, where do you find locking issues for waking up kswapd?

> In general, GFP_ATOMIC or GFP_NOWAIT users will not allocate many pages.
> It is likely that somebody else tries to allocate memory using __GFP_DIRECT_RECLAIM
> right after GFP_ATOMIC or GFP_NOWAIT allocations. We unlikely need to wake kswapd
> upon GFP_ATOMIC or GFP_NOWAIT allocations.
>
> If some GFP_ATOMIC or GFP_NOWAIT users need to allocate many pages, they can add
> __GFP_KSWAPD_RECLAIM explicitly; though allocating many pages using GFP_ATOMIC or
> GFP_NOWAIT is not recommended from the beginning...

From a performance perspective, it's better to wake up kswapd as early as
possible, because it can reduce the possibility of direct reclaim, which may
cause very long latency.

Best Regards,
Huang, Ying

Tetsuo Handa

May 22, 2023, 7:34:08 AM5/22/23
to Huang, Ying, syzbot, syzkall...@googlegroups.com, Mel Gorman, Vlastimil Babka, Andrew Morton, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Andrey Ryabinin, kasan-dev, linux-mm, Johannes Weiner, Michal Hocko
On 2023/05/22 12:07, Huang, Ying wrote:
> Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp> writes:
>
>> On 2023/05/22 11:13, Huang, Ying wrote:
>>>> Any atomic allocation used by KASAN needs to drop __GFP_KSWAPD_RECLAIM bit.
>>>> Where do we want to drop this bit (in the caller side, or in the callee side)?
>>>
>>> Yes. I think we should fix the KASAN. Maybe define a new GFP_XXX
>>> (instead of GFP_ATOMIC) for debug code? The debug code may be called at
>>> almost arbitrary places, and wakeup_kswapd() isn't safe to be called in
>>> some situations.
>>
>> What do you think about removing __GFP_KSWAPD_RECLAIM from GFP_ATOMIC and GFP_NOWAIT?
>> Recent reports indicate that atomic allocations (GFP_ATOMIC and GFP_NOWAIT) are not safe
>> enough to think "atomic". They just don't do direct reclaim, but they do take spinlocks.
>> Removing __GFP_KSWAPD_RECLAIM from GFP_ATOMIC and GFP_NOWAIT simplifies locking dependency and
>> reduces latency of atomic allocations (which is important when called from "atomic" context).
>> I consider that memory allocations which do not do direct reclaim should be geared towards
>> less locking dependency.
>
> Except debug code, where do you find locking issues for waking up kswapd?

I'm not aware of lockdep reports except debug code.

But due to too many locking dependencies, lockdep gives up tracking them all (e.g.

https://syzkaller.appspot.com/bug?extid=8a249628ae32ea7de3a2
https://syzkaller.appspot.com/bug?extid=a70a6358abd2c3f9550f
https://syzkaller.appspot.com/bug?extid=9bbbacfbf1e04d5221f7
https://syzkaller.appspot.com/bug?extid=b04c9ffbbd2f303d00d9

). I want to reduce locking patterns where possible. pgdat->{kswapd,kcompactd}_wait.lock
and zonelist_update_seq are candidates which need not be held from interrupt context.

>
>> In general, GFP_ATOMIC or GFP_NOWAIT users will not allocate many pages.
>> It is likely that somebody else tries to allocate memory using __GFP_DIRECT_RECLAIM
>> right after GFP_ATOMIC or GFP_NOWAIT allocations. We unlikely need to wake kswapd
>> upon GFP_ATOMIC or GFP_NOWAIT allocations.
>>
>> If some GFP_ATOMIC or GFP_NOWAIT users need to allocate many pages, they can add
>> __GFP_KSWAPD_RECLAIM explicitly; though allocating many pages using GFP_ATOMIC or
>> GFP_NOWAIT is not recommended from the beginning...
>
> From a performance perspective, it's better to wake up kswapd as early as
> possible, because it can reduce the possibility of direct reclaim, which may
> cause very long latency.

My expectation is that a __GFP_DIRECT_RECLAIM allocation request which happens
after a !__GFP_KSWAPD_RECLAIM allocation request will wake kswapd before future
__GFP_DIRECT_RECLAIM allocation requests have to perform direct reclaim.

>
> Best Regards,
> Huang, Ying
>

Huang, Ying

May 22, 2023, 8:09:04 PM5/22/23
to Tetsuo Handa, syzbot, syzkall...@googlegroups.com, Mel Gorman, Vlastimil Babka, Andrew Morton, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Andrey Ryabinin, kasan-dev, linux-mm, Johannes Weiner, Michal Hocko
Why is it not safe to wake up kswapd/kcompactd from interrupt context?

Best Regards,
Huang, Ying

Tetsuo Handa

May 22, 2023, 8:45:19 PM5/22/23
to Huang, Ying, syzbot, syzkall...@googlegroups.com, Mel Gorman, Vlastimil Babka, Andrew Morton, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Andrey Ryabinin, kasan-dev, linux-mm, Johannes Weiner, Michal Hocko
On 2023/05/23 9:07, Huang, Ying wrote:
>>> Except debug code, where do you find locking issues for waking up kswapd?
>>
>> I'm not aware of lockdep reports except debug code.
>>
>> But due to too many locking dependency, lockdep gives up tracking all dependency (e.g.
>>
>> https://syzkaller.appspot.com/bug?extid=8a249628ae32ea7de3a2
>> https://syzkaller.appspot.com/bug?extid=a70a6358abd2c3f9550f
>> https://syzkaller.appspot.com/bug?extid=9bbbacfbf1e04d5221f7
>> https://syzkaller.appspot.com/bug?extid=b04c9ffbbd2f303d00d9
>>
>> ). I want to reduce locking patterns where possible. pgdat->{kswapd,kcompactd}_wait.lock
>> and zonelist_update_seq are candidates which need not to be held from interrupt context.
>
> Why is it not safe to wake up kswapd/kcompactd from interrupt context?

I'm not saying it is not safe to wake up kswapd/kcompactd from interrupt context.
Please notice that I'm using "need not" rather than "must not".

Since the total amount of RAM a Linux kernel can use has increased over the years,
the watermark gap between "kswapd should start background reclaim" and "the current
thread must start foreground reclaim" has also increased. Therefore, randomly
allocating a small number of pages from interrupt context (or atomic context)
without waking up kswapd will not needlessly increase the possibility of reaching
the "current thread must start foreground reclaim" watermark. Then, reducing
locking dependencies by not waking up kswapd becomes a gain.





KASAN developers, I'm waiting for your response on

How important is the memory allocation in __stack_depot_save()?
If allocation failure is acceptable, maybe we should avoid triggering the OOM
killer (e.g. by adding __GFP_NORETRY) when alloc_flags contains __GFP_FS ...

part.

Huang, Ying

May 22, 2023, 9:11:29 PM5/22/23
to Tetsuo Handa, syzbot, syzkall...@googlegroups.com, Mel Gorman, Vlastimil Babka, Andrew Morton, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Andrey Ryabinin, kasan-dev, linux-mm, Johannes Weiner, Michal Hocko
Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp> writes:

> On 2023/05/23 9:07, Huang, Ying wrote:
>>>> Except debug code, where do you find locking issues for waking up kswapd?
>>>
>>> I'm not aware of lockdep reports except debug code.
>>>
>>> But due to too many locking dependency, lockdep gives up tracking all dependency (e.g.
>>>
>>> https://syzkaller.appspot.com/bug?extid=8a249628ae32ea7de3a2
>>> https://syzkaller.appspot.com/bug?extid=a70a6358abd2c3f9550f
>>> https://syzkaller.appspot.com/bug?extid=9bbbacfbf1e04d5221f7
>>> https://syzkaller.appspot.com/bug?extid=b04c9ffbbd2f303d00d9
>>>
>>> ). I want to reduce locking patterns where possible. pgdat->{kswapd,kcompactd}_wait.lock
>>> and zonelist_update_seq are candidates which need not to be held from interrupt context.
>>
>> Why is it not safe to wake up kswapd/kcompactd from interrupt context?
>
> I'm not saying it is not safe to wake up kswapd/kcompactd from interrupt context.
> Please notice that I'm using "need not" rather than "must not".

Got it.

> Since the total amount of RAM a Linux kernel can use has increased over the years,
> the watermark gap between "kswapd should start background reclaim" and "the current
> thread must start foreground reclaim" has also increased. Therefore, randomly
> allocating a small number of pages from interrupt context (or atomic context)
> without waking up kswapd will not needlessly increase the possibility of reaching
> the "current thread must start foreground reclaim" watermark. Then, reducing
> locking dependencies by not waking up kswapd becomes a gain.

Personally, I prefer to wake up kswapd ASAP, and to fix the deadlock if
possible.

Best Regards,
Huang, Ying

Michal Hocko

May 24, 2023, 8:10:02 AM5/24/23
to Tetsuo Handa, Huang, Ying, syzbot, syzkall...@googlegroups.com, Mel Gorman, Vlastimil Babka, Andrew Morton, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Andrey Ryabinin, kasan-dev, linux-mm, Johannes Weiner
On Mon 22-05-23 11:47:25, Tetsuo Handa wrote:
> On 2023/05/22 11:13, Huang, Ying wrote:
> >> Any atomic allocation used by KASAN needs to drop __GFP_KSWAPD_RECLAIM bit.
> >> Where do we want to drop this bit (in the caller side, or in the callee side)?
> >
> > Yes. I think we should fix the KASAN. Maybe define a new GFP_XXX
> > (instead of GFP_ATOMIC) for debug code? The debug code may be called at
> > almost arbitrary places, and wakeup_kswapd() isn't safe to be called in
> > some situations.
>
> What do you think about removing __GFP_KSWAPD_RECLAIM from GFP_ATOMIC and GFP_NOWAIT?

Not a good idea IMO. It is really hard to achieve real locklessness in the
page allocator. If we ever need something like that it should be pretty
obviously requested by a dedicated gfp flag rather than overriding a
long term established semantic. While GFP_ATOMIC is a bit of a misnomer
it has many users who really only require non-sleeping behavior.

> Recent reports indicate that atomic allocations (GFP_ATOMIC and GFP_NOWAIT) are not safe
> enough to think "atomic". They just don't do direct reclaim, but they do take spinlocks.
> Removing __GFP_KSWAPD_RECLAIM from GFP_ATOMIC and GFP_NOWAIT simplifies locking dependency and
> reduces latency of atomic allocations (which is important when called from "atomic" context).

I would really like to see any numbers to believe this is the case
actually. Waking up kswapd should be pretty non-visible.

> I consider that memory allocations which do not do direct reclaim should be geared towards
> less locking dependency.
>
> In general, GFP_ATOMIC or GFP_NOWAIT users will not allocate many pages.

This hugely depends on the workload. I do not think we can make any
generic statements like that.

> It is likely that somebody else tries to allocate memory using __GFP_DIRECT_RECLAIM
> right after GFP_ATOMIC or GFP_NOWAIT allocations. We unlikely need to wake kswapd
> upon GFP_ATOMIC or GFP_NOWAIT allocations.

The thing is that you do not know this is the case. You might have
IRQ-heavy processing making a lot of memory allocations (e.g. networking)
while the rest of the processing doesn't require any additional memory.

> If some GFP_ATOMIC or GFP_NOWAIT users need to allocate many pages, they can add
> __GFP_KSWAPD_RECLAIM explicitly; though allocating many pages using GFP_ATOMIC or
> GFP_NOWAIT is not recommended from the beginning...

As much as I do not really like the long term GFP_ATOMIC semantic I do
not think we should be changing it to what you are proposing for reasons
mentioned above. GFP_NOWAIT change is even more questionable. Many users
simply use GFP_NOWAIT for an optimistic allocation with a more
expensive fallback. We do not want to allow those consumers to
consume watermark gap memory and force others to hit the direct reclaim
wall.

Really, there is very likely only a handful of users who cannot even
wake kswapd or perform other non-sleeping locking, and those should
currently drop __GFP_KSWAPD_RECLAIM. Maybe we should consider an alias
for them so they need not bother with the low level flag. Maybe we will need
GFP_LOCKLESS or something similar.
--
Michal Hocko
SUSE Labs

Tetsuo Handa

May 27, 2023, 11:30:56 AM5/27/23
to syzbot, syzkall...@googlegroups.com, Mel Gorman, Huang, Ying, Vlastimil Babka, Andrew Morton, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Andrey Ryabinin, Vincenzo Frascino, Marco Elver, kasan-dev, linux-mm
syzbot is reporting a lockdep warning in __stack_depot_save(). The caller
of __stack_depot_save() (i.e. __kasan_record_aux_stack() in this report)
is responsible for masking the __GFP_KSWAPD_RECLAIM flag, in order not to
wake kswapd, which in turn wakes kcompactd.

Since kasan/kmsan functions might be called with arbitrary locks held,
mask __GFP_KSWAPD_RECLAIM flag from all GFP_NOWAIT/GFP_ATOMIC allocations
in kasan/kmsan.

Note that kmsan_save_stack_with_flags() is changed to mask both the
__GFP_DIRECT_RECLAIM flag and the __GFP_KSWAPD_RECLAIM flag, because
wakeup_kswapd() from wake_all_kswapds() from __alloc_pages_slowpath()
calls wakeup_kcompactd() if the __GFP_KSWAPD_RECLAIM flag is set and
the __GFP_DIRECT_RECLAIM flag is not set.
---
mm/kasan/generic.c | 4 ++--
mm/kasan/tags.c | 2 +-
mm/kmsan/core.c | 6 +++---
mm/kmsan/instrumentation.c | 2 +-
4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index e5eef670735e..2c94f4943240 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -488,7 +488,7 @@ static void __kasan_record_aux_stack(void *addr, bool can_alloc)
return;

alloc_meta->aux_stack[1] = alloc_meta->aux_stack[0];
- alloc_meta->aux_stack[0] = kasan_save_stack(GFP_NOWAIT, can_alloc);
+ alloc_meta->aux_stack[0] = kasan_save_stack(0, can_alloc);
}

void kasan_record_aux_stack(void *addr)
@@ -518,7 +518,7 @@ void kasan_save_free_info(struct kmem_cache *cache, void *object)
if (!free_meta)
return;

- kasan_set_track(&free_meta->free_track, GFP_NOWAIT);
+ kasan_set_track(&free_meta->free_track, 0);
/* The object was freed and has free track set. */
*(u8 *)kasan_mem_to_shadow(object) = KASAN_SLAB_FREETRACK;
}
diff --git a/mm/kasan/tags.c b/mm/kasan/tags.c
index 67a222586846..7dcfe341d48e 100644
--- a/mm/kasan/tags.c
+++ b/mm/kasan/tags.c
@@ -140,5 +140,5 @@ void kasan_save_alloc_info(struct kmem_cache *cache, void *object, gfp_t flags)

void kasan_save_free_info(struct kmem_cache *cache, void *object)
{
- save_stack_info(cache, object, GFP_NOWAIT, true);
+ save_stack_info(cache, object, 0, true);
}
diff --git a/mm/kmsan/core.c b/mm/kmsan/core.c
index 7d1e4aa30bae..3adb4c1d3b19 100644
--- a/mm/kmsan/core.c
+++ b/mm/kmsan/core.c
@@ -74,7 +74,7 @@ depot_stack_handle_t kmsan_save_stack_with_flags(gfp_t flags,
nr_entries = stack_trace_save(entries, KMSAN_STACK_DEPTH, 0);

/* Don't sleep. */
- flags &= ~__GFP_DIRECT_RECLAIM;
+ flags &= ~(__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM);

handle = __stack_depot_save(entries, nr_entries, flags, true);
return stack_depot_set_extra_bits(handle, extra);
@@ -245,7 +245,7 @@ depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id)
extra_bits = kmsan_extra_bits(depth, uaf);

entries[0] = KMSAN_CHAIN_MAGIC_ORIGIN;
- entries[1] = kmsan_save_stack_with_flags(GFP_ATOMIC, 0);
+ entries[1] = kmsan_save_stack_with_flags(__GFP_HIGH, 0);
entries[2] = id;
/*
* @entries is a local var in non-instrumented code, so KMSAN does not
@@ -253,7 +253,7 @@ depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id)
* positives when __stack_depot_save() passes it to instrumented code.
*/
kmsan_internal_unpoison_memory(entries, sizeof(entries), false);
- handle = __stack_depot_save(entries, ARRAY_SIZE(entries), GFP_ATOMIC,
+ handle = __stack_depot_save(entries, ARRAY_SIZE(entries), __GFP_HIGH,
true);
return stack_depot_set_extra_bits(handle, extra_bits);
}
diff --git a/mm/kmsan/instrumentation.c b/mm/kmsan/instrumentation.c
index cf12e9616b24..cc3907a9c33a 100644
--- a/mm/kmsan/instrumentation.c
+++ b/mm/kmsan/instrumentation.c
@@ -282,7 +282,7 @@ void __msan_poison_alloca(void *address, uintptr_t size, char *descr)

/* stack_depot_save() may allocate memory. */
kmsan_enter_runtime();
- handle = stack_depot_save(entries, ARRAY_SIZE(entries), GFP_ATOMIC);
+ handle = stack_depot_save(entries, ARRAY_SIZE(entries), __GFP_HIGH);
kmsan_leave_runtime();

kmsan_internal_set_shadow_origin(address, size, -1, handle,
--
2.34.1

syzbot

May 27, 2023, 5:01:43 PM5/27/23
to ak...@linux-foundation.org, almaz.ale...@paragon-software.com, andre...@gmail.com, dvy...@google.com, el...@google.com, fred...@kernel.org, gli...@google.com, han...@cmpxchg.org, kasa...@googlegroups.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, mgo...@techsingularity.net, mho...@kernel.org, mho...@suse.com, mi...@kernel.org, nt...@lists.linux.dev, penguin...@i-love.sakura.ne.jp, ryabin...@gmail.com, syzkall...@googlegroups.com, tg...@linutronix.de, vba...@suse.cz, vincenzo...@arm.com, ying....@intel.com
syzbot has found a reproducer for the following issue on:

HEAD commit: eb0f1697d729 Merge branch 'for-next/core', remote-tracking..
console output: https://syzkaller.appspot.com/x/log.txt?x=17733b4d280000
kernel config: https://syzkaller.appspot.com/x/.config?x=8860074b9a9d6c45
dashboard link: https://syzkaller.appspot.com/bug?extid=ece2915262061d6e0ac1
compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: arm64
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15b4e536280000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=10224d6d280000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/034232da7cff/disk-eb0f1697.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/b11411bec33e/vmlinux-eb0f1697.xz
kernel image: https://storage.googleapis.com/syzbot-assets/a53c52e170dd/Image-eb0f1697.gz.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/2ce9b06770e0/mount_0.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+ece291...@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
6.4.0-rc3-syzkaller-geb0f1697d729 #0 Not tainted
------------------------------------------------------
klogd/5578 is trying to acquire lock:
ffff0001fea76c40 (&pgdat->kcompactd_wait){-...}-{2:2}, at: __wake_up_common_lock kernel/sched/wait.c:137 [inline]
ffff0001fea76c40 (&pgdat->kcompactd_wait){-...}-{2:2}, at: __wake_up+0xec/0x1a8 kernel/sched/wait.c:160

but task is already holding lock:
ffff0001b4259b18 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested kernel/sched/core.c:558 [inline]
ffff0001b4259b18 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock kernel/sched/sched.h:1366 [inline]
ffff0001b4259b18 (&rq->__lock){-.-.}-{2:2}, at: rq_lock kernel/sched/sched.h:1653 [inline]
ffff0001b4259b18 (&rq->__lock){-.-.}-{2:2}, at: scheduler_tick+0xa4/0x52c kernel/sched/core.c:5616

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&rq->__lock){-.-.}-{2:2}:
_raw_spin_lock_nested+0x50/0x6c kernel/locking/spinlock.c:378
raw_spin_rq_lock_nested+0x2c/0x44 kernel/sched/core.c:558
raw_spin_rq_lock kernel/sched/sched.h:1366 [inline]
rq_lock kernel/sched/sched.h:1653 [inline]
task_fork_fair+0x7c/0x23c kernel/sched/fair.c:12095
sched_cgroup_fork+0x38c/0x464 kernel/sched/core.c:4777
copy_process+0x24fc/0x3514 kernel/fork.c:2618
kernel_clone+0x1d8/0x8ac kernel/fork.c:2918
user_mode_thread+0x110/0x178 kernel/fork.c:2996
rest_init+0x2c/0x2f4 init/main.c:700
start_kernel+0x0/0x55c init/main.c:834
start_kernel+0x3f0/0x55c init/main.c:1088
__primary_switched+0xb8/0xc0 arch/arm64/kernel/head.S:523

-> #1 (&p->pi_lock){-.-.}-{2:2}:
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x5c/0x7c kernel/locking/spinlock.c:162
try_to_wake_up+0xb0/0xd9c kernel/sched/core.c:4191
default_wake_function+0x4c/0x60 kernel/sched/core.c:6993
autoremove_wake_function+0x24/0xf8 kernel/sched/wait.c:419
__wake_up_common+0x23c/0x3bc kernel/sched/wait.c:107
__wake_up_common_lock kernel/sched/wait.c:138 [inline]
__wake_up+0x10c/0x1a8 kernel/sched/wait.c:160
wakeup_kcompactd+0x254/0x310 mm/compaction.c:2942
balance_pgdat+0x1880/0x1c34 mm/vmscan.c:7541
kswapd+0x7d0/0x10fc mm/vmscan.c:7738
kthread+0x288/0x310 kernel/kthread.c:379
ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:853

-> #0 (&pgdat->kcompactd_wait){-...}-{2:2}:
check_prev_add kernel/locking/lockdep.c:3108 [inline]
check_prevs_add kernel/locking/lockdep.c:3227 [inline]
validate_chain kernel/locking/lockdep.c:3842 [inline]
__lock_acquire+0x3310/0x75f0 kernel/locking/lockdep.c:5074
lock_acquire+0x23c/0x71c kernel/locking/lockdep.c:5691
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x5c/0x7c kernel/locking/spinlock.c:162
__wake_up_common_lock kernel/sched/wait.c:137 [inline]
__wake_up+0xec/0x1a8 kernel/sched/wait.c:160
wakeup_kcompactd+0x254/0x310 mm/compaction.c:2942
wakeup_kswapd+0x350/0x8c8 mm/vmscan.c:7792
wake_all_kswapds+0x13c/0x23c mm/page_alloc.c:4028
__alloc_pages_slowpath+0x378/0x1edc mm/page_alloc.c:4296
__alloc_pages+0x3bc/0x698 mm/page_alloc.c:4781
alloc_pages+0x4bc/0x7c0
__stack_depot_save+0x4ac/0x678 lib/stackdepot.c:410
kasan_save_stack+0x54/0x6c mm/kasan/common.c:46
__kasan_record_aux_stack+0xcc/0xe8 mm/kasan/generic.c:491
kasan_record_aux_stack+0x14/0x20 mm/kasan/generic.c:496
task_work_add+0x94/0x3c0 kernel/task_work.c:48
task_tick_mm_cid kernel/sched/core.c:11940 [inline]
scheduler_tick+0x2d0/0x52c kernel/sched/core.c:5626
update_process_times+0x198/0x1f4 kernel/time/timer.c:2076
tick_sched_handle kernel/time/tick-sched.c:243 [inline]
tick_sched_timer+0x330/0x4e8 kernel/time/tick-sched.c:1481
__run_hrtimer kernel/time/hrtimer.c:1685 [inline]
__hrtimer_run_queues+0x458/0xca0 kernel/time/hrtimer.c:1749
hrtimer_interrupt+0x2c0/0xb64 kernel/time/hrtimer.c:1811
timer_handler drivers/clocksource/arm_arch_timer.c:656 [inline]
arch_timer_handler_virt+0x74/0x88 drivers/clocksource/arm_arch_timer.c:667
handle_percpu_devid_irq+0x2a4/0x804 kernel/irq/chip.c:930
generic_handle_irq_desc include/linux/irqdesc.h:158 [inline]
handle_irq_desc kernel/irq/irqdesc.c:651 [inline]
generic_handle_domain_irq+0x7c/0xc4 kernel/irq/irqdesc.c:707
__gic_handle_irq drivers/irqchip/irq-gic-v3.c:728 [inline]
__gic_handle_irq_from_irqson drivers/irqchip/irq-gic-v3.c:779 [inline]
gic_handle_irq+0x70/0x1e4 drivers/irqchip/irq-gic-v3.c:823
call_on_irq_stack+0x24/0x4c arch/arm64/kernel/entry.S:882
do_interrupt_handler+0xd4/0x138 arch/arm64/kernel/entry-common.c:274
__el1_irq arch/arm64/kernel/entry-common.c:471 [inline]
el1_interrupt+0x34/0x68 arch/arm64/kernel/entry-common.c:486
el1h_64_irq_handler+0x18/0x24 arch/arm64/kernel/entry-common.c:491
el1h_64_irq+0x64/0x68 arch/arm64/kernel/entry.S:587
__daif_local_irq_restore arch/arm64/include/asm/irqflags.h:182 [inline]
arch_local_irq_restore arch/arm64/include/asm/irqflags.h:202 [inline]
dump_stack_lvl+0x104/0x124 lib/dump_stack.c:107
dump_stack+0x1c/0x28 lib/dump_stack.c:113
dump_header+0xb4/0x954 mm/oom_kill.c:460
oom_kill_process+0x10c/0x6ec mm/oom_kill.c:1036
out_of_memory+0xe24/0x103c mm/oom_kill.c:1156
__alloc_pages_may_oom mm/page_alloc.c:3669 [inline]
__alloc_pages_slowpath+0x1714/0x1edc mm/page_alloc.c:4444
__alloc_pages+0x3bc/0x698 mm/page_alloc.c:4781
alloc_pages+0x4bc/0x7c0
alloc_slab_page+0xa0/0x164 mm/slub.c:1851
allocate_slab mm/slub.c:2006 [inline]
new_slab+0x210/0x2f4 mm/slub.c:2051
___slab_alloc+0x80c/0xdf4 mm/slub.c:3192
__slab_alloc mm/slub.c:3291 [inline]
__slab_alloc_node mm/slub.c:3344 [inline]
slab_alloc_node mm/slub.c:3441 [inline]
kmem_cache_alloc_node+0x318/0x46c mm/slub.c:3496
__alloc_skb+0x19c/0x3d8 net/core/skbuff.c:644
alloc_skb include/linux/skbuff.h:1288 [inline]
alloc_skb_with_frags+0xb4/0x590 net/core/skbuff.c:6378
sock_alloc_send_pskb+0x76c/0x884 net/core/sock.c:2729
unix_dgram_sendmsg+0x480/0x16c0 net/unix/af_unix.c:1944
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
__sys_sendto+0x3b4/0x538 net/socket.c:2144
__do_sys_sendto net/socket.c:2156 [inline]
__se_sys_sendto net/socket.c:2152 [inline]
__arm64_sys_sendto+0xd8/0xf8 net/socket.c:2152
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52
el0_svc_common+0x138/0x258 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:193
el0_svc+0x4c/0x15c arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591

other info that might help us debug this:

Chain exists of:
&pgdat->kcompactd_wait --> &p->pi_lock --> &rq->__lock

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&rq->__lock);
                               lock(&p->pi_lock);
                               lock(&rq->__lock);
  lock(&pgdat->kcompactd_wait);

*** DEADLOCK ***

2 locks held by klogd/5578:
#0: ffff8000161245e8 (oom_lock){+.+.}-{3:3}, at: __alloc_pages_may_oom mm/page_alloc.c:3618 [inline]
#0: ffff8000161245e8 (oom_lock){+.+.}-{3:3}, at: __alloc_pages_slowpath+0x1694/0x1edc mm/page_alloc.c:4444
#1: ffff0001b4259b18 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested kernel/sched/core.c:558 [inline]
#1: ffff0001b4259b18 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock kernel/sched/sched.h:1366 [inline]
#1: ffff0001b4259b18 (&rq->__lock){-.-.}-{2:2}, at: rq_lock kernel/sched/sched.h:1653 [inline]
#1: ffff0001b4259b18 (&rq->__lock){-.-.}-{2:2}, at: scheduler_tick+0xa4/0x52c kernel/sched/core.c:5616

stack backtrace:
CPU: 1 PID: 5578 Comm: klogd Not tainted 6.4.0-rc3-syzkaller-geb0f1697d729 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/28/2023
Call trace:
dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:233
show_stack+0x2c/0x44 arch/arm64/kernel/stacktrace.c:240
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106
dump_stack+0x1c/0x28 lib/dump_stack.c:113
print_circular_bug+0x150/0x1b8 kernel/locking/lockdep.c:2066
check_noncircular+0x2cc/0x378 kernel/locking/lockdep.c:2188
check_prev_add kernel/locking/lockdep.c:3108 [inline]
check_prevs_add kernel/locking/lockdep.c:3227 [inline]
validate_chain kernel/locking/lockdep.c:3842 [inline]
__lock_acquire+0x3310/0x75f0 kernel/locking/lockdep.c:5074
lock_acquire+0x23c/0x71c kernel/locking/lockdep.c:5691
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x5c/0x7c kernel/locking/spinlock.c:162
__wake_up_common_lock kernel/sched/wait.c:137 [inline]
__wake_up+0xec/0x1a8 kernel/sched/wait.c:160
wakeup_kcompactd+0x254/0x310 mm/compaction.c:2942
wakeup_kswapd+0x350/0x8c8 mm/vmscan.c:7792
wake_all_kswapds+0x13c/0x23c mm/page_alloc.c:4028
__alloc_pages_slowpath+0x378/0x1edc mm/page_alloc.c:4296
__alloc_pages+0x3bc/0x698 mm/page_alloc.c:4781
alloc_pages+0x4bc/0x7c0
__stack_depot_save+0x4ac/0x678 lib/stackdepot.c:410
kasan_save_stack+0x54/0x6c mm/kasan/common.c:46
__kasan_record_aux_stack+0xcc/0xe8 mm/kasan/generic.c:491
kasan_record_aux_stack+0x14/0x20 mm/kasan/generic.c:496
task_work_add+0x94/0x3c0 kernel/task_work.c:48
task_tick_mm_cid kernel/sched/core.c:11940 [inline]
scheduler_tick+0x2d0/0x52c kernel/sched/core.c:5626
update_process_times+0x198/0x1f4 kernel/time/timer.c:2076
tick_sched_handle kernel/time/tick-sched.c:243 [inline]
tick_sched_timer+0x330/0x4e8 kernel/time/tick-sched.c:1481
__run_hrtimer kernel/time/hrtimer.c:1685 [inline]
__hrtimer_run_queues+0x458/0xca0 kernel/time/hrtimer.c:1749
hrtimer_interrupt+0x2c0/0xb64 kernel/time/hrtimer.c:1811
timer_handler drivers/clocksource/arm_arch_timer.c:656 [inline]
arch_timer_handler_virt+0x74/0x88 drivers/clocksource/arm_arch_timer.c:667
handle_percpu_devid_irq+0x2a4/0x804 kernel/irq/chip.c:930
generic_handle_irq_desc include/linux/irqdesc.h:158 [inline]
handle_irq_desc kernel/irq/irqdesc.c:651 [inline]
generic_handle_domain_irq+0x7c/0xc4 kernel/irq/irqdesc.c:707
__gic_handle_irq drivers/irqchip/irq-gic-v3.c:728 [inline]
__gic_handle_irq_from_irqson drivers/irqchip/irq-gic-v3.c:779 [inline]
gic_handle_irq+0x70/0x1e4 drivers/irqchip/irq-gic-v3.c:823
call_on_irq_stack+0x24/0x4c arch/arm64/kernel/entry.S:882
do_interrupt_handler+0xd4/0x138 arch/arm64/kernel/entry-common.c:274
__el1_irq arch/arm64/kernel/entry-common.c:471 [inline]
el1_interrupt+0x34/0x68 arch/arm64/kernel/entry-common.c:486
el1h_64_irq_handler+0x18/0x24 arch/arm64/kernel/entry-common.c:491
el1h_64_irq+0x64/0x68 arch/arm64/kernel/entry.S:587
__daif_local_irq_restore arch/arm64/include/asm/irqflags.h:182 [inline]
arch_local_irq_restore arch/arm64/include/asm/irqflags.h:202 [inline]
dump_stack_lvl+0x104/0x124 lib/dump_stack.c:107
dump_stack+0x1c/0x28 lib/dump_stack.c:113
dump_header+0xb4/0x954 mm/oom_kill.c:460
oom_kill_process+0x10c/0x6ec mm/oom_kill.c:1036
out_of_memory+0xe24/0x103c mm/oom_kill.c:1156
__alloc_pages_may_oom mm/page_alloc.c:3669 [inline]
__alloc_pages_slowpath+0x1714/0x1edc mm/page_alloc.c:4444
__alloc_pages+0x3bc/0x698 mm/page_alloc.c:4781
alloc_pages+0x4bc/0x7c0
alloc_slab_page+0xa0/0x164 mm/slub.c:1851
allocate_slab mm/slub.c:2006 [inline]
new_slab+0x210/0x2f4 mm/slub.c:2051
___slab_alloc+0x80c/0xdf4 mm/slub.c:3192
__slab_alloc mm/slub.c:3291 [inline]
__slab_alloc_node mm/slub.c:3344 [inline]
slab_alloc_node mm/slub.c:3441 [inline]
kmem_cache_alloc_node+0x318/0x46c mm/slub.c:3496
__alloc_skb+0x19c/0x3d8 net/core/skbuff.c:644
alloc_skb include/linux/skbuff.h:1288 [inline]
alloc_skb_with_frags+0xb4/0x590 net/core/skbuff.c:6378
sock_alloc_send_pskb+0x76c/0x884 net/core/sock.c:2729
unix_dgram_sendmsg+0x480/0x16c0 net/unix/af_unix.c:1944
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
__sys_sendto+0x3b4/0x538 net/socket.c:2144
__do_sys_sendto net/socket.c:2156 [inline]
__se_sys_sendto net/socket.c:2152 [inline]
__arm64_sys_sendto+0xd8/0xf8 net/socket.c:2152
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52
el0_svc_common+0x138/0x258 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:193
el0_svc+0x4c/0x15c arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

Huang, Ying
May 28, 2023, 9:08:32 PM
to Tetsuo Handa, syzbot, syzkall...@googlegroups.com, Mel Gorman, Vlastimil Babka, Andrew Morton, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Andrey Ryabinin, Vincenzo Frascino, Marco Elver, kasan-dev, linux-mm
Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp> writes:

> syzbot is reporting lockdep warning in __stack_depot_save(), for
> the caller of __stack_depot_save() (i.e. __kasan_record_aux_stack() in
> this report) is responsible for masking __GFP_KSWAPD_RECLAIM flag in
> order not to wake kswapd which in turn wakes kcompactd.
>
> Since kasan/kmsan functions might be called with arbitrary locks held,
> mask __GFP_KSWAPD_RECLAIM flag from all GFP_NOWAIT/GFP_ATOMIC allocations
> in kasan/kmsan.
>
> Note that kmsan_save_stack_with_flags() is changed to mask both
> __GFP_DIRECT_RECLAIM flag and __GFP_KSWAPD_RECLAIM flag, for
> wakeup_kswapd() from wake_all_kswapds() from __alloc_pages_slowpath()
> calls wakeup_kcompactd() if __GFP_KSWAPD_RECLAIM flag is set and
> __GFP_DIRECT_RECLAIM flag is not set.
>
> Reported-by: syzbot <syzbot+ece291...@syzkaller.appspotmail.com>
> Closes: https://syzkaller.appspot.com/bug?extid=ece2915262061d6e0ac1
> Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>

This looks good to me. Thanks!

Reviewed-by: "Huang, Ying" <ying....@intel.com>

Alexander Potapenko
May 31, 2023, 9:32:32 AM
to Huang, Ying, Tetsuo Handa, syzbot, syzkall...@googlegroups.com, Mel Gorman, Vlastimil Babka, Andrew Morton, Andrey Konovalov, Dmitry Vyukov, Andrey Ryabinin, Vincenzo Frascino, Marco Elver, kasan-dev, linux-mm
On Mon, May 29, 2023 at 3:08 AM Huang, Ying <ying....@intel.com> wrote:
>
> Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp> writes:
>
> > syzbot is reporting lockdep warning in __stack_depot_save(), for
> > the caller of __stack_depot_save() (i.e. __kasan_record_aux_stack() in
> > this report) is responsible for masking __GFP_KSWAPD_RECLAIM flag in
> > order not to wake kswapd which in turn wakes kcompactd.
> >
> > Since kasan/kmsan functions might be called with arbitrary locks held,
> > mask __GFP_KSWAPD_RECLAIM flag from all GFP_NOWAIT/GFP_ATOMIC allocations
> > in kasan/kmsan.
> >
> > Note that kmsan_save_stack_with_flags() is changed to mask both
> > __GFP_DIRECT_RECLAIM flag and __GFP_KSWAPD_RECLAIM flag, for
> > wakeup_kswapd() from wake_all_kswapds() from __alloc_pages_slowpath()
> > calls wakeup_kcompactd() if __GFP_KSWAPD_RECLAIM flag is set and
> > __GFP_DIRECT_RECLAIM flag is not set.
> >
> > Reported-by: syzbot <syzbot+ece291...@syzkaller.appspotmail.com>
> > Closes: https://syzkaller.appspot.com/bug?extid=ece2915262061d6e0ac1
> > Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
>
> This looks good to me. Thanks!
>
> Reviewed-by: "Huang, Ying" <ying....@intel.com>

Sorry for the late reply, but maybe it would be better to mask this
flag in __stack_depot_save() (lib/stackdepot.c) instead?
We are already masking out a number of flags there, and the problem
seems quite generic.

Andrew Morton
Jun 9, 2023, 6:31:26 PM
to Alexander Potapenko, Huang, Ying, Tetsuo Handa, syzbot, syzkall...@googlegroups.com, Mel Gorman, Vlastimil Babka, Andrey Konovalov, Dmitry Vyukov, Andrey Ryabinin, Vincenzo Frascino, Marco Elver, kasan-dev, linux-mm
Tetsuo?

Tetsuo Handa
Jun 10, 2023, 7:40:54 AM
to Andrew Morton, Alexander Potapenko, Huang, Ying, syzbot, syzkall...@googlegroups.com, Mel Gorman, Vlastimil Babka, Andrey Konovalov, Dmitry Vyukov, Andrey Ryabinin, Vincenzo Frascino, Marco Elver, kasan-dev, linux-mm
syzbot is reporting lockdep warning in __stack_depot_save(), for
__kasan_record_aux_stack() is passing GFP_NOWAIT which will result in
calling wakeup_kcompactd() from wakeup_kswapd() from wake_all_kswapds()
from __alloc_pages_slowpath().

Strictly speaking, __kasan_record_aux_stack() is responsible for removing
__GFP_KSWAPD_RECLAIM flag in order not to wake kswapd which in turn wakes
kcompactd. But since KASAN and KMSAN functions might be called with
arbitrary locks held, we should consider removing __GFP_KSWAPD_RECLAIM
flag from KASAN and KMSAN. And this patch goes one step further; let's
remove __GFP_KSWAPD_RECLAIM flag in the __stack_depot_save() side, based
on the following reasons.

Reason 1:

Currently, __stack_depot_save() has "alloc_flags &= ~GFP_ZONEMASK;" line
which is pointless because "alloc_flags &= (GFP_ATOMIC | GFP_KERNEL);"
line will also zero out zone modifiers. But why is __stack_depot_save()
trying to mask gfp flags supplied by the caller?

I guess that __stack_depot_save() tried to be as robust as possible. But
__stack_depot_save() is a debugging function where all callers have to
be able to survive allocation failures. Scattering low-level gfp flags
like 0 or __GFP_HIGH as replacements for GFP_NOWAIT or GFP_ATOMIC should
be avoided.

Reason 2:

__stack_depot_save() from stack_depot_save() is also called by
ref_tracker_alloc() from __netns_tracker_alloc() from
netns_tracker_alloc() from get_net_track(), and some of get_net_track()
users are passing GFP_ATOMIC because waking kswapd/kcompactd is safe.
But even if we mask __GFP_KSWAPD_RECLAIM flag at __stack_depot_save(),
it is very likely that allocations with __GFP_KSWAPD_RECLAIM flag happen
somewhere else by the moment __stack_depot_save() is called for the next
time.

Therefore, not waking kswapd/kcompactd when doing allocation for
__stack_depot_save() will be acceptable from the memory reclaim latency
perspective.

While we are at it, let's make __stack_depot_save() accept __GFP_NORETRY
and __GFP_RETRY_MAYFAIL flags, based on the following reason.

Reason 3:

Since DEPOT_POOL_ORDER is defined as 2, we must mask __GFP_NOFAIL flag
so that rmqueue() does not complain. But masking __GFP_NORETRY flag and
__GFP_RETRY_MAYFAIL flag might be overkill.

The OOM killer might be needlessly invoked due to order-2 allocation if
GFP_KERNEL is supplied by the caller, even though the caller might have
passed GFP_KERNEL for doing order-0 allocation.

Allocation for order-2 might stall if GFP_NOFS or GFP_NOIO is supplied
by the caller, even though the caller might have passed GFP_NOFS or GFP_NOIO
for doing order-0 allocation.

Generally speaking, I feel that doing order-2 allocation from
__stack_depot_save() with gfp flags supplied by the caller is an
unexpected behavior for the callers. We might want to use only order-0
allocation, and/or stop using gfp flags supplied by the caller...

Suggested-by: Alexander Potapenko <gli...@google.com>
Cc: Huang, Ying <ying....@intel.com>
Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
---
Changes in v3:
Huang, Ying thinks that masking the __GFP_KSWAPD_RECLAIM flag on the
caller side is preferable
( https://lkml.kernel.org/r/87fs7ny...@yhuang6-desk2.ccr.corp.intel.com ).
But Alexander Potapenko thinks that masking the __GFP_KSWAPD_RECLAIM flag
on the callee side would be better
( https://lkml.kernel.org/r/CAG_fn=UTTbkGeOX0teGcNOeobtgV=mfGOefZpV-...@mail.gmail.com ).
I took Alexander's suggestion and added reasoning for masking the
__GFP_KSWAPD_RECLAIM flag on the callee side.

Changes in v2:
Mask the __GFP_KSWAPD_RECLAIM flag in the callers, as suggested by Huang, Ying
( https://lkml.kernel.org/r/87edn92...@yhuang6-desk2.ccr.corp.intel.com ).

lib/stackdepot.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 2f5aa851834e..33ebefaa7074 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -405,7 +405,10 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
* contexts and I/O.
*/
alloc_flags &= ~GFP_ZONEMASK;
- alloc_flags &= (GFP_ATOMIC | GFP_KERNEL);
+ if (!(alloc_flags & __GFP_DIRECT_RECLAIM))
+ alloc_flags &= __GFP_HIGH;
+ else
+ alloc_flags &= ~__GFP_NOFAIL;

Huang, Ying
Jun 11, 2023, 9:31:37 PM
to Tetsuo Handa, Andrew Morton, Alexander Potapenko, syzbot, syzkall...@googlegroups.com, Mel Gorman, Vlastimil Babka, Andrey Konovalov, Dmitry Vyukov, Andrey Ryabinin, Vincenzo Frascino, Marco Elver, kasan-dev, linux-mm
TBH, I don't like removing the __GFP_KSWAPD_RECLAIM flag unnecessarily.
But this is only my personal opinion.

> While we are at it, let's make __stack_depot_save() accept __GFP_NORETRY
> and __GFP_RETRY_MAYFAIL flags, based on the following reason.
>
> Reason 3:
>
> Since DEPOT_POOL_ORDER is defined as 2, we must mask __GFP_NOFAIL flag
> so that rmqueue() does not complain. But masking __GFP_NORETRY flag and
> __GFP_RETRY_MAYFAIL flag might be overkill.
>
> The OOM killer might be needlessly invoked due to order-2 allocation if
> GFP_KERNEL is supplied by the caller, even though the caller might have
> passed GFP_KERNEL for doing order-0 allocation.
>
> Allocation for order-2 might stall if GFP_NOFS or GFP_NOIO is supplied
> by the caller, even though the caller might have passed GFP_NOFS or GFP_NOIO
> for doing order-0 allocation.
>
> Generally speaking, I feel that doing order-2 allocation from
> __stack_depot_save() with gfp flags supplied by the caller is an
> unexpected behavior for the callers. We might want to use only order-0
> allocation, and/or stop using gfp flags supplied by the caller...

Per my understanding, this isn't the locking issue reported by syzbot? If
so, I suggest putting this in a separate patch.
Why not just

alloc_flags &= ~__GFP_KSWAPD_RECLAIM;

?

> + else
> + alloc_flags &= ~__GFP_NOFAIL;
> alloc_flags |= __GFP_NOWARN;
> page = alloc_pages(alloc_flags, DEPOT_POOL_ORDER);
> if (page)

Best Regards,
Huang, Ying

Alexander Potapenko
Jun 21, 2023, 8:57:03 AM
to Tetsuo Handa, Andrew Morton, Huang, Ying, syzbot, syzkall...@googlegroups.com, Mel Gorman, Vlastimil Babka, Andrey Konovalov, Dmitry Vyukov, Andrey Ryabinin, Vincenzo Frascino, Marco Elver, kasan-dev, linux-mm
On Sat, Jun 10, 2023 at 1:40 PM Tetsuo Handa
<penguin...@i-love.sakura.ne.jp> wrote:
>
> syzbot is reporting lockdep warning in __stack_depot_save(), for
> __kasan_record_aux_stack() is passing GFP_NOWAIT which will result in
> calling wakeup_kcompactd() from wakeup_kswapd() from wake_all_kswapds()
> from __alloc_pages_slowpath().
>
> Strictly speaking, __kasan_record_aux_stack() is responsible for removing
> __GFP_KSWAPD_RECLAIM flag in order not to wake kswapd which in turn wakes
> kcompactd. But since KASAN and KMSAN functions might be called with
> arbitrary locks held, we should consider removing __GFP_KSWAPD_RECLAIM
> flag from KASAN and KMSAN. And this patch goes one step further; let's
> remove __GFP_KSWAPD_RECLAIM flag in the __stack_depot_save() side, based
> on the following reasons.
>
> Reason 1:
>
> Currently, __stack_depot_save() has "alloc_flags &= ~GFP_ZONEMASK;" line
> which is pointless because "alloc_flags &= (GFP_ATOMIC | GFP_KERNEL);"
> line will also zero out zone modifiers.

Good catch, we indeed do not need the GFP_ZONEMASK line now.
But looks like you'll need it at least in the __GFP_NOFAIL branch?

> But why is __stack_depot_save()
> trying to mask gfp flags supplied by the caller?
>
> I guess that __stack_depot_save() tried to be as robust as possible. But
> __stack_depot_save() is a debugging function where all callers have to
> be able to survive allocation failures.

This, but also the allocation should not deadlock.
E.g. KMSAN can call __stack_depot_save() from almost any function in
the kernel, so we'd better avoid heavyweight memory reclaiming,
because that in turn may call __stack_depot_save() again.

>
> Reason 2:
>
> __stack_depot_save() from stack_depot_save() is also called by
> ref_tracker_alloc() from __netns_tracker_alloc() from
> netns_tracker_alloc() from get_net_track(), and some of get_net_track()
> users are passing GFP_ATOMIC because waking kswapd/kcompactd is safe.
> But even if we mask __GFP_KSWAPD_RECLAIM flag at __stack_depot_save(),
> it is very likely that allocations with __GFP_KSWAPD_RECLAIM flag happen
> somewhere else by the moment __stack_depot_save() is called for the next
> time.
>
> Therefore, not waking kswapd/kcompactd when doing allocation for
> __stack_depot_save() will be acceptable from the memory reclaim latency
> perspective.

Ack.

> While we are at it, let's make __stack_depot_save() accept __GFP_NORETRY
> and __GFP_RETRY_MAYFAIL flags, based on the following reason.

Looks like you're accepting a whole bunch of flags in addition to
__GFP_NORETRY and __GFP_RETRY_MAYFAIL - maybe list the two explicitly?

> Reason 3:
>
> Since DEPOT_POOL_ORDER is defined as 2, we must mask __GFP_NOFAIL flag
> so that rmqueue() does not complain. But masking __GFP_NORETRY flag and
> __GFP_RETRY_MAYFAIL flag might be overkill.
>
> The OOM killer might be needlessly invoked due to order-2 allocation if
> GFP_KERNEL is supplied by the caller, even though the caller might have
> passed GFP_KERNEL for doing order-0 allocation.

As you noted above, stackdepot is a debug feature anyway, so invoking
OOM killer because there is no memory for an order-2 allocation might
be an acceptable behavior?

> Allocation for order-2 might stall if GFP_NOFS or GFP_NOIO is supplied
> by the caller, even though the caller might have passed GFP_NOFS or GFP_NOIO
> for doing order-0 allocation.

What if the caller passed GFP_NOFS to avoid calling back into FS, and
discarding that flag would result in a recursion?
Same for GFP_NOIO.

> Generally speaking, I feel that doing order-2 allocation from
> __stack_depot_save() with gfp flags supplied by the caller is an
> unexpected behavior for the callers. We might want to use only order-0
> allocation, and/or stop using gfp flags supplied by the caller...

Right now stackdepot allows the following list of flags: __GFP_HIGH,
__GFP_KSWAPD_RECLAIM, __GFP_DIRECT_RECLAIM, __GFP_IO, __GFP_FS.
We could restrict it further to __GFP_HIGH | __GFP_DIRECT_RECLAIM to
be on the safe side - plus allow __GFP_NORETRY and
__GFP_RETRY_MAYFAIL.

Tetsuo Handa
Jun 21, 2023, 10:07:25 AM
to Alexander Potapenko, Andrew Morton, Huang, Ying, syzbot, syzkall...@googlegroups.com, Mel Gorman, Vlastimil Babka, Andrey Konovalov, Dmitry Vyukov, Andrey Ryabinin, Vincenzo Frascino, Marco Elver, kasan-dev, linux-mm
On 2023/06/21 21:56, Alexander Potapenko wrote:
>> But why is __stack_depot_save()
>> trying to mask gfp flags supplied by the caller?
>>
>> I guess that __stack_depot_save() tried to be as robust as possible. But
>> __stack_depot_save() is a debugging function where all callers have to
>> be able to survive allocation failures.
>
> This, but also the allocation should not deadlock.
> E.g. KMSAN can call __stack_depot_save() from almost any function in
> the kernel, so we'd better avoid heavyweight memory reclaiming,
> because that in turn may call __stack_depot_save() again.

Then, isn't "[PATCH] kasan,kmsan: remove __GFP_KSWAPD_RECLAIM usage from
kasan/kmsan" the better fix?



>> Allocation for order-2 might stall if GFP_NOFS or GFP_NOIO is supplied
>> by the caller, despite the caller might have passed GFP_NOFS or GFP_NOIO
>> for doing order-0 allocation.
>
> What if the caller passed GFP_NOFS to avoid calling back into FS, and
> discarding that flag would result in a recursion?
> Same for GFP_NOIO.

Excuse me, but "alloc_flags &= ~__GFP_NOFAIL;" will not discard flags in
GFP_NOFS / GFP_NOIO ?



>> Generally speaking, I feel that doing order-2 allocation from
>> __stack_depot_save() with gfp flags supplied by the caller is an
>> unexpected behavior for the callers. We might want to use only order-0
>> allocation, and/or stop using gfp flags supplied by the caller...
>
> Right now stackdepot allows the following list of flags: __GFP_HIGH,
> __GFP_KSWAPD_RECLAIM, __GFP_DIRECT_RECLAIM, __GFP_IO, __GFP_FS.
> We could restrict it further to __GFP_HIGH | __GFP_DIRECT_RECLAIM to
> be on the safe side - plus allow __GFP_NORETRY and
> __GFP_RETRY_MAYFAIL.

I feel that making such a change is killing more than needed; there is
no need to discard __GFP_KSWAPD_RECLAIM when GFP_KERNEL is given.

"[PATCH] kasan,kmsan: remove __GFP_KSWAPD_RECLAIM usage from kasan/kmsan"
looks better.

Alexander Potapenko
Jun 21, 2023, 10:43:08 AM
to Tetsuo Handa, Andrew Morton, Huang, Ying, syzbot, syzkall...@googlegroups.com, Mel Gorman, Vlastimil Babka, Andrey Konovalov, Dmitry Vyukov, Andrey Ryabinin, Vincenzo Frascino, Marco Elver, kasan-dev, linux-mm
On Wed, Jun 21, 2023 at 4:07 PM Tetsuo Handa
<penguin...@i-love.sakura.ne.jp> wrote:
>
> On 2023/06/21 21:56, Alexander Potapenko wrote:
> >> But why is __stack_depot_save()
> >> trying to mask gfp flags supplied by the caller?
> >>
> >> I guess that __stack_depot_save() tried to be as robust as possible. But
> >> __stack_depot_save() is a debugging function where all callers have to
> >> be able to survive allocation failures.
> >
> > This, but also the allocation should not deadlock.
> > E.g. KMSAN can call __stack_depot_save() from almost any function in
> > the kernel, so we'd better avoid heavyweight memory reclaiming,
> > because that in turn may call __stack_depot_save() again.
>
> Then, isn't "[PATCH] kasan,kmsan: remove __GFP_KSWAPD_RECLAIM usage from
> kasan/kmsan" the better fix?

Perhaps you are right and I shouldn't have insisted on pushing this
flag down to stackdepot.
If other users (e.g. page_owner) can afford invoking kswapd, then we
are good to go, and new compiler-based tools can use the same flags
KASAN and KMSAN do.


>
> >> Allocation for order-2 might stall if GFP_NOFS or GFP_NOIO is supplied
> >> by the caller, despite the caller might have passed GFP_NOFS or GFP_NOIO
> >> for doing order-0 allocation.
> >
> > What if the caller passed GFP_NOFS to avoid calling back into FS, and
> > discarding that flag would result in a recursion?
> > Same for GFP_NOIO.
>
> Excuse me, but "alloc_flags &= ~__GFP_NOFAIL;" will not discard flags in
> GFP_NOFS / GFP_NOIO ?

But not for the other if-clause?
Anyway, I actually confused GFP_NOIO (which is technically
__GFP_RECLAIM) and GFP_NOFS with __GFP_IO/__GFP_FS, thinking that
there's a separate pair of GFP flags opposite to __GFP_IO and
__GFP_FS.
Please disregard.

>
>
> >> Generally speaking, I feel that doing order-2 allocation from
> >> __stack_depot_save() with gfp flags supplied by the caller is an
> >> unexpected behavior for the callers. We might want to use only order-0
> >> allocation, and/or stop using gfp flags supplied by the caller...
> >
> > Right now stackdepot allows the following list of flags: __GFP_HIGH,
> > __GFP_KSWAPD_RECLAIM, __GFP_DIRECT_RECLAIM, __GFP_IO, __GFP_FS.
> > We could restrict it further to __GFP_HIGH | __GFP_DIRECT_RECLAIM to
> > be on the safe side - plus allow __GFP_NORETRY and
> > __GFP_RETRY_MAYFAIL.
>
> I feel that making such a change is killing more than needed; there is
> no need to discard __GFP_KSWAPD_RECLAIM when GFP_KERNEL is given.
>
> "[PATCH] kasan,kmsan: remove __GFP_KSWAPD_RECLAIM usage from kasan/kmsan"
> looks the better.
>

I agree, let's go for it.
Sorry for the trouble.

--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

Tetsuo Handa
Jun 21, 2023, 10:54:24 AM
to Alexander Potapenko, Andrew Morton, Huang, Ying, syzbot, syzkall...@googlegroups.com, Mel Gorman, Vlastimil Babka, Andrey Konovalov, Dmitry Vyukov, Andrey Ryabinin, Vincenzo Frascino, Marco Elver, kasan-dev, linux-mm
On 2023/06/21 23:42, Alexander Potapenko wrote:
>> "[PATCH] kasan,kmsan: remove __GFP_KSWAPD_RECLAIM usage from kasan/kmsan"
>> looks better.
>>
>
> I agree, let's go for it.
> Sorry for the trouble.
>

No problem. :-)

Andrew, please take "[PATCH] kasan,kmsan: remove __GFP_KSWAPD_RECLAIM usage from kasan/kmsan"
at https://lkml.kernel.org/r/656cb4f5-998b-c8d7...@I-love.SAKURA.ne.jp
with "Reviewed-by: "Huang, Ying" <ying....@intel.com>" added.

Alexander Potapenko
Jun 21, 2023, 11:38:16 AM
to Andrew Morton, Huang, Ying, Tetsuo Handa, syzbot, syzkall...@googlegroups.com, Mel Gorman, Vlastimil Babka, Andrey Konovalov, Dmitry Vyukov, Andrey Ryabinin, Vincenzo Frascino, Marco Elver, kasan-dev, linux-mm
Reviewed-by: Alexander Potapenko <gli...@google.com>

Andrew, please accept this patch. As noted in the other thread, no
changes to stackdepot are needed.