[syzbot] [kernel?] WARNING: ODEBUG bug in __mod_timer

9 views
Skip to first unread message

syzbot

unread,
May 29, 2023, 10:31:13 AM5/29/23
to linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: 44bf136283e5 Add linux-next specific files for 20230420
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1026a757c80000
kernel config: https://syzkaller.appspot.com/x/.config?x=ba413c0527139246
dashboard link: https://syzkaller.appspot.com/bug?extid=7937ba6a50bdd00fffdf
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/db069eb0e403/disk-44bf1362.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/9f9f4ca6cfb8/vmlinux-44bf1362.xz
kernel image: https://storage.googleapis.com/syzbot-assets/e53d01b894f7/bzImage-44bf1362.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+7937ba...@syzkaller.appspotmail.com

WARNING: CPU: 1 PID: 5171 at lib/debugobjects.c:505 debug_print_object+0x194/0x2c0 lib/debugobjects.c:505
Modules linked in:
CPU: 1 PID: 5171 Comm: kworker/1:5 Not tainted 6.3.0-rc7-next-20230420-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/30/2023
Workqueue: rcu_gp wait_rcu_exp_gp
RIP: 0010:debug_print_object+0x194/0x2c0 lib/debugobjects.c:505
Code: df 48 89 fe 48 c1 ee 03 80 3c 16 00 0f 85 c7 00 00 00 48 8b 14 dd c0 0f a7 8a 50 4c 89 ee 48 c7 c7 80 03 a7 8a e8 7c ec 35 fd <0f> 0b 58 83 05 86 75 63 0a 01 48 83 c4 20 5b 5d 41 5c 41 5d 41 5e
RSP: 0018:ffffc900056af890 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000000000
RDX: ffff88801e71d7c0 RSI: ffffffff814be0c7 RDI: 0000000000000001
RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8a4ecb40
R13: ffffffff8aa70a60 R14: ffffc900056af950 R15: ffffffff817267d0
FS: 0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020404030 CR3: 000000002bc8b000 CR4: 00000000003526e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
debug_object_assert_init+0x20f/0x310 lib/debugobjects.c:912
debug_timer_assert_init kernel/time/timer.c:792 [inline]
debug_assert_init kernel/time/timer.c:837 [inline]
__mod_timer+0x9b/0xe80 kernel/time/timer.c:1020
schedule_timeout+0x149/0x2b0 kernel/time/timer.c:2166
synchronize_rcu_expedited_wait_once kernel/rcu/tree_exp.h:572 [inline]
synchronize_rcu_expedited_wait kernel/rcu/tree_exp.h:624 [inline]
rcu_exp_wait_wake+0x2ae/0x1550 kernel/rcu/tree_exp.h:693
process_one_work+0x99a/0x15e0 kernel/workqueue.c:2405
worker_thread+0x67d/0x10c0 kernel/workqueue.c:2552
kthread+0x33e/0x440 kernel/kthread.c:379
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the bug is already fixed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to change bug's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the bug is a duplicate of another bug, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Tetsuo Handa

unread,
May 29, 2023, 10:40:22 AM5/29/23
to syzbot, syzkall...@googlegroups.com, Thomas Gleixner, linux-...@vger.kernel.org, Stephen Boyd, Andrew Morton, Peter Zijlstra, wuchi
syzbot is reporting false positive ODEBUG message immediately after
ODEBUG was disabled due to OOM.

[ 1062.309646][T22911] ODEBUG: Out of memory. ODEBUG disabled
[ 1062.886755][ T5171] ------------[ cut here ]------------
[ 1062.892770][ T5171] ODEBUG: assert_init not available (active state 0) object: ffffc900056afb20 object type: timer_list hint: process_timeout+0x0/0x40

This race happened because debug_objects_oom() emitted OOM message but did
not turn off debug_objects_enabled, and debug_print_object() did not check
debug_objects_enabled when calling WARN().

CPU 0 [ T5171] CPU 1 [T22911]
-------------- --------------
debug_object_assert_init() {
if (!debug_objects_enabled)
return;
db = get_bucket((unsigned long) addr); // Finds a bucket, but...
debug_objects_oom() {
pr_warn("Out of memory. ODEBUG disabled\n");
// all buckets get emptied here, and...
hlist_move_list(&db->list, &freelist);
}
lookup_object_or_alloc(addr, db, descr, false, true) {
lookup_object(addr, b) {
return NULL; // this bucket is already empty.
}
if (!descr->is_static_object || !descr->is_static_object(addr))
return ERR_PTR(-ENOENT);
}
if (!obj) { // obj == ERR_PTR(-ENOENT) because non-static object.
debug_objects_oom();
return;
}
debug_print_object(&o, "assert_init") {
// False positive due to not checking debug_objects_enabled.
WARN(1, KERN_ERR "ODEBUG: %s %s (active state %u) "
"object: %p object type: %s hint: %pS\n", ...);
}
}

Reported-by: syzbot <syzbot+7937ba...@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=7937ba6a50bdd00fffdf
Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
---
lib/debugobjects.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 984985c39c9b..63974e9edac5 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -466,6 +466,7 @@ static void debug_objects_oom(void)
unsigned long flags;
int i;

+ debug_objects_enabled = 0;
pr_warn("Out of memory. ODEBUG disabled\n");

for (i = 0; i < ODEBUG_HASH_SIZE; i++, db++) {
@@ -502,10 +503,10 @@ static void debug_print_object(struct debug_obj *obj, char *msg)
void *hint = descr->debug_hint ?
descr->debug_hint(obj->object) : NULL;
limit++;
- WARN(1, KERN_ERR "ODEBUG: %s %s (active state %u) "
- "object: %p object type: %s hint: %pS\n",
- msg, obj_states[obj->state], obj->astate,
- obj->object, descr->name, hint);
+ WARN(debug_objects_enabled, KERN_ERR
+ "ODEBUG: %s %s (active state %u) object: %p object type: %s hint: %pS\n",
+ msg, obj_states[obj->state], obj->astate,
+ obj->object, descr->name, hint);
}
debug_objects_warnings++;
}
--
2.18.4

Thomas Gleixner

unread,
Jun 6, 2023, 6:35:46 PM6/6/23
to Tetsuo Handa, syzbot, syzkall...@googlegroups.com, linux-...@vger.kernel.org, Stephen Boyd, Andrew Morton, Peter Zijlstra, wuchi
Tetsuo!
The above is undecodable gibberish.

Something like this is completely sufficient:

CPU 0 [ T5171] CPU 1 [T22911]
-------------- --------------
debug_object_assert_init() {
db = get_bucket(addr);
debug_objects_oom() {
pr_warn("Out of memory. ODEBUG disabled\n");
// all buckets get emptied here
}
lookup_object_or_alloc(addr, db, ...)
// Due to OOM:
return ERR_PTR(-ENOENT);
...

// Emits assert_init message and warning
debug_print_object(&o, "assert_init");
}

And this:

> This race happened because debug_objects_oom() emitted OOM message but did
> not turn off debug_objects_enabled

is completely wrong. Why?

The place where debug_objects_enabled is set to 0 is way before
debug_objects_oom() is invoked. That place _cannot_ invoke
debug_objects_oom() because it holds a hash bucket lock.

There are exactly three places which invoke debug_objects_oom() and for
all three places the pattern is exactly the same:

lock_bucket();
obj = lookup_object_or_alloc();
unlock_bucket();
if (!obj)
debug_objects_oom();

The place which clears debug_objects_enabled is unsurprisingly
lookup_object_or_alloc() itself, which _cannot_ invoke
debug_objects_oom() because it is invoked with the hash bucket lock
held. There is even a comment to that effect:

/* Out of memory. Do the cleanup outside of the locked region */
debug_objects_enabled = 0;
return NULL;

So at the point where debug_objects_oom() is invoked
@debug_objects_enabled is already 0.

But you claim that this is required, right?

> @@ -466,6 +466,7 @@ static void debug_objects_oom(void)
> unsigned long flags;
> int i;
>
> + debug_objects_enabled = 0;
> pr_warn("Out of memory. ODEBUG disabled\n");

Q: What is setting a variable which is already 0 to 0 solving?
A: Absolutely nothing

Now this:

> @@ -502,10 +503,10 @@ static void debug_print_object(struct de
> void *hint = descr->debug_hint ?
> descr->debug_hint(obj->object) : NULL;
> limit++;
> - WARN(1, KERN_ERR "ODEBUG: %s %s (active state %u) "
> - "object: %p object type: %s hint: %pS\n",
> - msg, obj_states[obj->state], obj->astate,
> - obj->object, descr->name, hint);
> + WARN(debug_objects_enabled, KERN_ERR
> + "ODEBUG: %s %s (active state %u) object: %p object type: %s hint: %pS\n",
> + msg, obj_states[obj->state], obj->astate,
> + obj->object, descr->name, hint);
> }

Q: Why is this related to the WARN() itself?
A: It's not related at all

The obvious fix is:

--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -498,6 +498,14 @@ static void debug_print_object(struct de
const struct debug_obj_descr *descr = obj->descr;
static int limit;

+ /*
+ * OOM handling is asynchronous for performance reasons. So the
+ * call site might have raced with a concurrent OOM which cleared
+ * the hash buckets.
+ */
+ if (!debug_objects_enabled)
+ return;
+
if (limit < 5 && descr != descr_test) {
void *hint = descr->debug_hint ?
descr->debug_hint(obj->object) : NULL;

Along with a understandable changelog, no?

Thanks,

tglx

Tetsuo Handa

unread,
Jun 7, 2023, 6:19:17 AM6/7/23
to Thomas Gleixner, syzbot, syzkall...@googlegroups.com, linux-...@vger.kernel.org, Stephen Boyd, Andrew Morton, Peter Zijlstra, wuchi
syzbot is reporting false positive ODEBUG message immediately after
ODEBUG was disabled due to OOM. We should recheck debug_objects_enabled
before reporting.

[ 1062.309646][T22911] ODEBUG: Out of memory. ODEBUG disabled
[ 1062.886755][ T5171] ------------[ cut here ]------------
[ 1062.892770][ T5171] ODEBUG: assert_init not available (active state 0) object: ffffc900056afb20 object type: timer_list hint: process_timeout+0x0/0x40

CPU 0 [ T5171] CPU 1 [T22911]
-------------- --------------
debug_object_assert_init() {
if (!debug_objects_enabled)
return;
db = get_bucket((unsigned long) addr); // Finds a bucket, but...
lookup_object_or_alloc() {
debug_objects_enabled = 0;
return NULL;
}
debug_objects_oom() {
pr_warn("Out of memory. ODEBUG disabled\n");
// all buckets get emptied here, and
}
lookup_object_or_alloc(addr, db, descr, false, true) {
// this bucket is already empty.
return ERR_PTR(-ENOENT);
}
// Emits false positive warning.
debug_print_object(&o, "assert_init");
Suggested-by: Thomas Gleixner <tg...@linutronix.de>
Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
---
Changes in v2:
No need to turn off debug_objects_enabled from debug_objects_oom().

lib/debugobjects.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 984985c39c9b..ab3bbcb78471 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -498,6 +498,15 @@ static void debug_print_object(struct debug_obj *obj, char *msg)
const struct debug_obj_descr *descr = obj->descr;
static int limit;

+ /*
+ * Don't report if lookup_object_or_alloc() by the current thread
+ * failed because lookup_object_or_alloc()/debug_objects_oom() by
+ * concurrent thread turned off debug_objects_enabled and cleared
+ * the hash buckets.
+ */
+ if (!debug_objects_enabled)
+ return;
+
if (limit < 5 && descr != descr_test) {
void *hint = descr->debug_hint ?
descr->debug_hint(obj->object) : NULL;
--
2.18.4

Reply all
Reply to author
Forward
0 new messages