[syzbot] memory leak in __send_signal

syzbot

Jun 6, 2021, 10:32:25 AM
to ax...@kernel.dk, chri...@brauner.io, ebie...@xmission.com, el...@google.com, linux-...@vger.kernel.org, ol...@redhat.com, p...@google.com, pet...@infradead.org, syzkall...@googlegroups.com, tg...@linutronix.de
Hello,

syzbot found the following issue on:

HEAD commit: 9d32fa5d Merge tag 'net-5.13-rc5' of git://git.kernel.org/..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10fd97dfd00000
kernel config: https://syzkaller.appspot.com/x/.config?x=de8efb0998945e75
dashboard link: https://syzkaller.appspot.com/bug?extid=0bac5fec63d4f399ba98
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16029ce0300000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+0bac5f...@syzkaller.appspotmail.com

2021/06/05 21:42:36 executed programs: 303
2021/06/05 21:42:42 executed programs: 312
2021/06/05 21:42:48 executed programs: 319
2021/06/05 21:42:54 executed programs: 331
BUG: memory leak
unreferenced object 0xffff8881278e3c80 (size 80):
comm "syz-executor.4", pid 12851, jiffies 4295068441 (age 14.610s)
hex dump (first 32 bytes):
80 3c 8e 27 81 88 ff ff 80 3c 8e 27 81 88 ff ff .<.'.....<.'....
00 00 00 00 00 00 00 00 05 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff812450d6>] __sigqueue_alloc+0xd6/0x240 kernel/signal.c:441
[<ffffffff81247d31>] __send_signal+0x231/0x600 kernel/signal.c:1155
[<ffffffff8124b123>] do_send_sig_info+0x63/0xc0 kernel/signal.c:1333
[<ffffffff8124b4f9>] do_send_specific+0xc9/0xf0 kernel/signal.c:3881
[<ffffffff8124b5ab>] do_tkill+0x8b/0xb0 kernel/signal.c:3907
[<ffffffff8124e811>] __do_sys_tkill kernel/signal.c:3942 [inline]
[<ffffffff8124e811>] __se_sys_tkill kernel/signal.c:3936 [inline]
[<ffffffff8124e811>] __x64_sys_tkill+0x31/0x50 kernel/signal.c:3936
[<ffffffff843540da>] do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
[<ffffffff84400068>] entry_SYSCALL_64_after_hwframe+0x44/0xae



---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

Thomas Gleixner

Jun 21, 2021, 7:08:32 PM
to syzbot, ax...@kernel.dk, chri...@brauner.io, ebie...@xmission.com, el...@google.com, linux-...@vger.kernel.org, ol...@redhat.com, p...@google.com, pet...@infradead.org, syzkall...@googlegroups.com
syzbot reported a memory leak related to sigqueue caching. This happens
when a thread group leader with child tasks is reaped.

The group leader's sigqueue_cache is correctly freed. The group leader then
reaps the child tasks and if any of them has a signal pending it caches
that signal. That's obviously bogus because nothing will free the cached
signal of the reaped group leader anymore.

Prevent this by setting tsk::sigqueue_cache to an error pointer value in
exit_task_sigqueue_cache().

Add comments to all relevant places.

Fixes: 4bad58ebc8bc ("signal: Allow tasks to cache one sigqueue struct")
Reported-by: syzbot+0bac5f...@syzkaller.appspotmail.com
Signed-off-by: Thomas Gleixner <tg...@linutronix.de>
---
kernel/signal.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)

--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -435,6 +435,12 @@ static struct sigqueue *
 		 * Preallocation does not hold sighand::siglock so it can't
 		 * use the cache. The lockless caching requires that only
 		 * one consumer and only one producer run at a time.
+		 *
+		 * For the regular allocation case it is sufficient to
+		 * check @q for NULL because this code can only be called
+		 * if the target task @t has not been reaped yet; which
+		 * means this code can never observe the error pointer which is
+		 * written to @t->sigqueue_cache in exit_task_sigqueue_cache().
 		 */
 		q = READ_ONCE(t->sigqueue_cache);
 		if (!q || sigqueue_flags)
@@ -463,13 +469,18 @@ void exit_task_sigqueue_cache(struct tas
 	struct sigqueue *q = tsk->sigqueue_cache;

 	if (q) {
-		tsk->sigqueue_cache = NULL;
 		/*
 		 * Hand it back to the cache as the task might
 		 * be self reaping which would leak the object.
 		 */
 		 kmem_cache_free(sigqueue_cachep, q);
 	}
+
+	/*
+	 * Set an error pointer to ensure that @tsk will not cache a
+	 * sigqueue when it is reaping its child tasks
+	 */
+	tsk->sigqueue_cache = ERR_PTR(-1);
 }

 static void sigqueue_cache_or_free(struct sigqueue *q)
@@ -481,6 +492,10 @@ static void sigqueue_cache_or_free(struc
 	 * is intentional when run without holding current->sighand->siglock,
 	 * which is fine as current obviously cannot run __sigqueue_free()
 	 * concurrently.
+	 *
+	 * The NULL check is safe even if current has been reaped already,
+	 * in which case exit_task_sigqueue_cache() wrote an error pointer
+	 * into current->sigqueue_cache.
 	 */
 	if (!READ_ONCE(current->sigqueue_cache))
 		WRITE_ONCE(current->sigqueue_cache, q);
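
The effect of the error pointer can be illustrated with a small user-space
model. This is only a sketch, not kernel code: struct task, CACHE_DEAD,
model_alloc(), model_free() and leaked_after_late_release() are hypothetical
stand-ins, malloc()/free() replace the slab cache, and there is no locking or
READ_ONCE()/WRITE_ONCE(). It assumes the exit path runs exactly once per task,
as in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* User-space stand-ins; none of these are the kernel's definitions. */
struct sigqueue { int signo; };

struct task {
	struct sigqueue *sigqueue_cache;	/* one-entry cache slot */
};

/* Hypothetical sentinel in the role of ERR_PTR(-1): non-NULL, never dereferenced. */
#define CACHE_DEAD ((struct sigqueue *)(uintptr_t)-1)

static int live_objects;	/* allocated but not yet freed */

static struct sigqueue *model_alloc(int signo)
{
	struct sigqueue *q = malloc(sizeof(*q));

	if (!q)
		abort();
	q->signo = signo;
	live_objects++;
	return q;
}

static void model_free(struct sigqueue *q)
{
	free(q);
	live_objects--;
}

/* Models exit_task_sigqueue_cache() with the patch applied. */
static void model_exit_cache(struct task *tsk)
{
	struct sigqueue *q = tsk->sigqueue_cache;

	if (q)
		model_free(q);
	/* The slot is never NULL again, so nothing can be cached later. */
	tsk->sigqueue_cache = CACHE_DEAD;
}

/* Models sigqueue_cache_or_free(): cache only when the slot is NULL. */
static void model_cache_or_free(struct task *cur, struct sigqueue *q)
{
	if (!cur->sigqueue_cache)
		cur->sigqueue_cache = q;
	else
		model_free(q);
}

/* Returns the number of objects still allocated after a late release. */
int leaked_after_late_release(void)
{
	struct task t = { .sigqueue_cache = NULL };

	live_objects = 0;
	model_cache_or_free(&t, model_alloc(5));	/* slot empty: cached */
	model_exit_cache(&t);				/* cached entry freed */
	model_cache_or_free(&t, model_alloc(5));	/* slot dead: freed */
	return live_objects;
}
```

In this model the second model_cache_or_free() call sees a non-NULL slot and
frees the object instead of parking it in a cache that nobody will visit again.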

Oleg Nesterov

Jun 22, 2021, 2:34:17 AM
to Thomas Gleixner, syzbot, ax...@kernel.dk, chri...@brauner.io, ebie...@xmission.com, el...@google.com, linux-...@vger.kernel.org, p...@google.com, pet...@infradead.org, syzkall...@googlegroups.com
On 06/22, Thomas Gleixner wrote:
>
> syzbot reported a memory leak related to sigqueue caching. This happens
> when a thread group leader with child tasks is reaped.
>
> The group leader's sigqueue_cache is correctly freed. The group leader then
> reaps the child tasks and if any of them has a signal pending it caches
> that signal.

I guess you mean the race with exit_notify()? Could you spell it out, please?
I am just curious how exactly this problem was found.

This doesn't really matter, because damn yes, a task T can call
release_task(another_task)->sigqueue_cache_or_free() after
exit_task_sigqueue_cache(T) was already called. For example, a last non-leader
thread exits and reaps a zombie leader.

Somehow I thought that exit_task_sigqueue_cache() at the end of __exit_signal()
should fix this problem, but this is obviously wrong.
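
The ordering described above can be sketched in user space. This is a rough
model under stated assumptions, not kernel code: struct task, its single
`pending` slot, model_alloc(), model_free(), model_release_task() and
leak_in_old_ordering() are hypothetical stand-ins, and the exit path is the
pre-patch variant that resets the slot to NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* User-space stand-ins; none of these are the kernel's definitions. */
struct sigqueue { int signo; };

struct task {
	struct sigqueue *sigqueue_cache;	/* one-entry cache slot */
	struct sigqueue *pending;		/* at most one queued signal here */
};

static int live_objects;	/* allocated but not yet freed */

static struct sigqueue *model_alloc(int signo)
{
	struct sigqueue *q = malloc(sizeof(*q));

	if (!q)
		abort();
	q->signo = signo;
	live_objects++;
	return q;
}

static void model_free(struct sigqueue *q)
{
	free(q);
	live_objects--;
}

/* Pre-patch exit path: free the cached entry and reset the slot to NULL. */
static void old_exit_cache(struct task *tsk)
{
	struct sigqueue *q = tsk->sigqueue_cache;

	if (q) {
		tsk->sigqueue_cache = NULL;
		model_free(q);
	}
}

/* Models sigqueue_cache_or_free(): cache only when the slot is NULL. */
static void model_cache_or_free(struct task *cur, struct sigqueue *q)
{
	if (!cur->sigqueue_cache)
		cur->sigqueue_cache = q;
	else
		model_free(q);
}

/* Stand-in for cur running release_task(victim) and flushing its signal. */
static void model_release_task(struct task *cur, struct task *victim)
{
	if (victim->pending) {
		model_cache_or_free(cur, victim->pending);
		victim->pending = NULL;
	}
}

/* Returns how many objects the old ordering leaves behind. */
int leak_in_old_ordering(void)
{
	struct task thread = { 0 }, leader = { 0 };
	int leaked;

	live_objects = 0;
	leader.pending = model_alloc(5);	/* zombie leader has a signal queued */
	old_exit_cache(&thread);		/* T's cache is torn down first */
	model_release_task(&thread, &leader);	/* T then reaps the leader */

	/* Nothing in the exit path looks at thread.sigqueue_cache again. */
	leaked = live_objects;
	free(thread.sigqueue_cache);		/* keep the demo itself leak-free */
	return leaked;
}
```

Here the exiting thread's slot is NULL when it reaps the leader, so the
leader's pending entry is parked in a cache that has already had its teardown.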


> @@ -463,13 +469,18 @@ void exit_task_sigqueue_cache(struct tas
>  	struct sigqueue *q = tsk->sigqueue_cache;
>
>  	if (q) {
> -		tsk->sigqueue_cache = NULL;
>  		/*
>  		 * Hand it back to the cache as the task might
>  		 * be self reaping which would leak the object.
>  		 */
>  		 kmem_cache_free(sigqueue_cachep, q);
>  	}
> +
> +	/*
> +	 * Set an error pointer to ensure that @tsk will not cache a
> +	 * sigqueue when it is reaping its child tasks
> +	 */
> +	tsk->sigqueue_cache = ERR_PTR(-1);
>  }


Reviewed-by: Oleg Nesterov <ol...@redhat.com>

Thomas Gleixner

Jun 22, 2021, 3:59:21 AM
to Oleg Nesterov, syzbot, ax...@kernel.dk, chri...@brauner.io, ebie...@xmission.com, el...@google.com, linux-...@vger.kernel.org, p...@google.com, pet...@infradead.org, syzkall...@googlegroups.com
On Tue, Jun 22 2021 at 08:34, Oleg Nesterov wrote:
> On 06/22, Thomas Gleixner wrote:
>
> I guess you mean the race with exit_notify()? Could you spell it out, please?

Yes, let me rephrase that.

> I am just curious how exactly this problem was found.

I was looking at that syzbot report

https://lore.kernel.org/r/00000000000014...@google.com

and analyzed how it ends up leaking memory.

Christian Brauner

Jun 22, 2021, 4:06:44 AM
to Thomas Gleixner, syzbot, ax...@kernel.dk, chri...@brauner.io, ebie...@xmission.com, el...@google.com, linux-...@vger.kernel.org, ol...@redhat.com, p...@google.com, pet...@infradead.org, syzkall...@googlegroups.com
On Tue, Jun 22, 2021 at 01:08:30AM +0200, Thomas Gleixner wrote:
> syzbot reported a memory leak related to sigqueue caching. This happens
> when a thread group leader with child tasks is reaped.
>
> The group leader's sigqueue_cache is correctly freed. The group leader then
> reaps the child tasks and if any of them has a signal pending it caches
> that signal. That's obviously bogus because nothing will free the cached
> signal of the reaped group leader anymore.
>
> Prevent this by setting tsk::sigqueue_cache to an error pointer value in
> exit_task_sigqueue_cache().
>
> Add comments to all relevant places.
>
> Fixes: 4bad58ebc8bc ("signal: Allow tasks to cache one sigqueue struct")
> Reported-by: syzbot+0bac5f...@syzkaller.appspotmail.com
> Signed-off-by: Thomas Gleixner <tg...@linutronix.de>
> ---

Acked-by: Christian Brauner <christia...@ubuntu.com>