Paul, this would be ready to be integrated with the RCU patches.
Thanks,
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
Thank you, Mathieu, queued up for 2.6.35!
Thanx, Paul
Mathieu Desnoyers wrote:
> Helps finding racy users of call_rcu(), which results in hangs because list
> entries are overwritten and/or skipped.
>
> Changelog since v4:
> - Bisectability is now OK
> - Now generate a WARN_ON_ONCE() for non-initialized rcu_head passed to
> call_rcu(). Statically initialized objects are detected with
> object_is_static().
> - Rename rcu_head_init_on_stack to init_rcu_head_on_stack.
> - Remove init_rcu_head() completely.
>
> Changelog since v3:
> - Include comments from Lai Jiangshan
Thank you, Lai!!! I have added your Reviewed-by.
Thanx, Paul
And testing got me the following debugobjects splat, which baffles me.
My first thought was that one of the synchronize_rcu() variants was
missing the init_rcu_head_on_stack(), but not so. Then I started
looking through the debugobjects code, and found the following:
static void debug_object_is_on_stack(void *addr, int onstack)
{
	int is_on_stack;
	static int limit;

	if (limit > 4)
		return;
This really confuses me. We are using a static variable, but as
near as I can tell, it is guarded only by a per-bucket lock:
	raw_spin_lock_irqsave(&db->lock, flags);
If I understand correctly, this means that multiple CPUs might be
concurrently updating the static variable "limit", which might in
turn be causing the splat below.
Or am I missing something?
Thanx, Paul
------------------------------------------------------------------------
ODEBUG: object is on stack, but not annotated
------------[ cut here ]------------
Badness at lib/debugobjects.c:294
NIP: c0000000002c76f0 LR: c0000000002c76ec CTR: c00000000041ecd8
REGS: c0000001de71b280 TRAP: 0700 Tainted: G W (2.6.34-rc3-autokern1)
MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 24000424 XER: 0000000f
TASK = c0000001de7dca00[3695] 'arping' THREAD: c0000001de718000 CPU: 1
GPR00: c0000000002c76ec c0000001de71b500 c00000000096c048 0000000000000034
GPR04: 0000000000000001 c000000000063918 0000000000000000 0000000000000002
GPR08: 0000000000000003 0000000000000000 c000000000086f68 c0000001de7dca00
GPR12: 000000000000256d c0000000074e4200 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 00000000201b8f60
GPR20: 00000000201b8f70 00000000201b8f48 0000000000000000 c0000000008766b8
GPR24: c0000001de71b800 0000000000000001 c0000000008ad400 c000000001247478
GPR28: c0000000e6abb8c0 c0000000e6abb8c0 c000000000904570 c000000001247470
NIP [c0000000002c76f0] .__debug_object_init+0x314/0x40c
LR [c0000000002c76ec] .__debug_object_init+0x310/0x40c
Call Trace:
[c0000001de71b500] [c0000000002c76ec] .__debug_object_init+0x310/0x40c (unreliable)
[c0000001de71b5d0] [c00000000007d990] .rcuhead_fixup_activate+0x40/0xdc
[c0000001de71b660] [c0000000002c6a7c] .debug_object_fixup+0x4c/0x74
[c0000001de71b6f0] [c0000000000c5e54] .__call_rcu+0x3c/0x1d4
[c0000001de71b790] [c0000000000c6050] .synchronize_rcu+0x4c/0x6c
[c0000001de71b870] [c0000000004be218] .synchronize_net+0x10/0x24
[c0000001de71b8e0] [c0000000005498c8] .packet_release+0x1d4/0x274
[c0000001de71b990] [c0000000004ac1f0] .sock_release+0x54/0x124
[c0000001de71ba20] [c0000000004ac9e4] .sock_close+0x34/0x4c
[c0000001de71baa0] [c00000000012469c] .__fput+0x174/0x264
[c0000001de71bb40] [c000000000120c54] .filp_close+0xb0/0xd8
[c0000001de71bbd0] [c000000000065e70] .put_files_struct+0x1a8/0x314
[c0000001de71bc70] [c000000000067e04] .do_exit+0x234/0x6f0
[c0000001de71bd30] [c000000000068354] .do_group_exit+0x94/0xc8
[c0000001de71bdc0] [c00000000006839c] .SyS_exit_group+0x14/0x28
[c0000001de71be30] [c000000000008554] syscall_exit+0x0/0x40
Instruction dump:
7f80b000 419e0030 2fa00000 e93e8140 380b0001 90090000 419e000c e87e8148
48000008 e87e8150 4bd9cb89 60000000 <0fe00000> 801c0010 2f800003 419e0024
This "limit" static variable is really only a printk suppressor: it stops the
printk warning output after approximately 5 occurrences (modulo racy increments).
But normally, it should not _cause_ a splat if there ain't any in the first
place.
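For illustration only, here is a minimal userspace sketch of a suppressor that
does not rely on racy increments. It is not the lib/debugobjects.c code: the
names WARN_LIMIT, warn_count and warn_allowed() are hypothetical, and it
assumes C11 atomics rather than kernel primitives. It claims a warning slot
with a compare-and-swap, so exactly WARN_LIMIT callers are allowed to print
no matter how many threads race.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: 5 warnings, mirroring the "limit > 4" check quoted above. */
#define WARN_LIMIT 5
static_assert(WARN_LIMIT > 0, "WARN_LIMIT must be positive");

/* Hypothetical shared counter; zero-initialized because it is static. */
static atomic_int warn_count;

/*
 * Returns true for the first WARN_LIMIT callers and false afterwards.
 * Each slot is claimed with a compare-and-swap, so two racing callers
 * can never claim the same slot and the counter never exceeds the limit.
 */
static bool warn_allowed(void)
{
	int seen = atomic_load(&warn_count);

	while (seen < WARN_LIMIT) {
		/* On failure, "seen" is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&warn_count, &seen, seen + 1))
			return true;
	}
	return false;
}
```

A caller would test warn_allowed() before emitting the message. Whether the
extra atomic is worth it for a pure output limiter is a separate question,
since an occasional extra or missing message is harmless.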
Will send the fix in a following email.
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
[Mathieu]
Here is the fix.
Signed-off-by: Mathieu Desnoyers <mathieu....@efficios.com>
CC: "Paul E. McKenney" <pau...@linux.vnet.ibm.com>
---
kernel/rcutree_plugin.h | 2 ++
1 file changed, 2 insertions(+)
Index: linux.trees.git/kernel/rcutree_plugin.h
===================================================================
--- linux.trees.git.orig/kernel/rcutree_plugin.h 2010-04-21 21:15:45.000000000 -0400
+++ linux.trees.git/kernel/rcutree_plugin.h 2010-04-21 21:16:57.000000000 -0400
@@ -515,11 +515,13 @@ void synchronize_rcu(void)
 	if (!rcu_scheduler_active)
 		return;
+	init_rcu_head_on_stack(&rcu.head);
 	init_completion(&rcu.completion);
 	/* Will wake me after RCU finished. */
 	call_rcu(&rcu.head, wakeme_after_rcu);
 	/* Wait for it. */
 	wait_for_completion(&rcu.completion);
+	destroy_rcu_head_on_stack(&rcu.head);
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu);
Thank you very much, Mathieu!!! Queued for 2.6.35.
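
As a rough illustration of why the init/destroy pair in the patch above
matters, here is a toy userspace model of the annotation lifecycle. It is not
the debugobjects implementation: struct head, MAX_TRACKED and the model_*()
functions are hypothetical stand-ins, and the real code does considerably more
(hash buckets, locking, fixup callbacks). The model only shows the ordering:
an object must be registered before it is activated, and unregistered before
its stack frame goes away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capacity for this model only. */
#define MAX_TRACKED 16
static_assert(MAX_TRACKED > 0, "need at least one tracking slot");

/* Stand-in for struct rcu_head; only its address matters here. */
struct head {
	struct head *next;
};

static const struct head *tracked[MAX_TRACKED];
static int unannotated_uses;	/* counts activations of unknown objects */

/* Models init_rcu_head_on_stack(): register the on-stack object. */
static bool model_init_on_stack(const struct head *h)
{
	for (size_t i = 0; i < MAX_TRACKED; i++) {
		if (!tracked[i]) {
			tracked[i] = h;
			return true;
		}
	}
	return false;	/* table full */
}

/* Models destroy_rcu_head_on_stack(): forget the object. */
static void model_destroy_on_stack(const struct head *h)
{
	for (size_t i = 0; i < MAX_TRACKED; i++) {
		if (tracked[i] == h)
			tracked[i] = NULL;
	}
}

/* Models the check on the call_rcu() path: is this object known? */
static bool model_activate(const struct head *h)
{
	for (size_t i = 0; i < MAX_TRACKED; i++) {
		if (tracked[i] == h)
			return true;
	}
	unannotated_uses++;	/* the real code would warn here */
	return false;
}
```

In this model, activating a head before model_init_on_stack() bumps
unannotated_uses, which corresponds loosely to the "object is on stack, but
not annotated" splat in the trace earlier in the thread.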