Here is an updated version of the rcu head debugobjects, following the comments
I received in the last rounds.
It applies on top of -tip, based on 2.6.34-rc2, commit
2e958f219d2b8d192d44e2472a214b3a93c44673
Unless people have any objection, it should be ready to be merged. I think the
appropriate maintainer to perform this merge would be Paul E. McKenney, because
this patchset is RCU-related.
Thanks,
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
> Unless people have any objection, it should be ready to be merged. I
> think the appropriate maintainer to perform this merge would be Paul
> E. McKenney, because this patchset is RCU-related.
Agreed, and I'm fine with the networking parts, so:
Acked-by: David S. Miller <da...@davemloft.net>
This should be very helpful in tracking down otherwise-painful bugs!!!
Thank you, Mathieu!!! I am happy to apply this, especially given Dave
Miller's Acked-by.
A few questions and comments:
o Patches 1/6, 2/6, 3/6: Was the intent for the three Subject
lines to read as follows?
[patch 1/6] Revert "net: remove INIT_RCU_HEAD() usage"
[patch 2/6] Revert "netfilter: don't use INIT_RCU_HEAD()"
[patch 3/6] Revert "net: don't use INIT_RCU_HEAD"
o Patch 4/6 looks good to me, and given that Thomas hasn't
complained, I am guessing that he is OK with it.
o Would it be possible to make this bisectable as follows?
a. Insert a new patch after current patch 4/6 that
defines destroy_rcu_head_on_stack(),
rcu_head_init_on_stack(), and rcu_head_init() with
their !CONFIG_DEBUG_OBJECTS_RCU_HEAD definitions.
b. Move current patch 6/6 to this position.
c. Move current patch 5/6 to this position, updating
to reflect the new patch added in (a) above.
o Patch 6/6: Would it be possible to use the object_is_on_stack()
function defined in include/linux/sched.h instead of passing
in the flag on_stack to bdi_work_init()? It looks like
fs/fs-writeback.c already includes include/linux/sched.h, so
shouldn't be a problem from a #include-hell viewpoint.
Please let me know if I am missing something on any of the above. If
these changes seem reasonable to you, please either submit a new patch
set or let me know that you are OK with me making these changes.
Thanx, Paul
Yep. Oops, got burnt by git show > patches/patchname.patch. Will fix and
re-send.
>
> o Patch 4/6 looks good to me, and given that Thomas hasn't
> complained, I am guessing that he is OK with it.
OK.
>
> o Would it be possible to make this bisectable as follows?
>
> a. Insert a new patch after current patch 4/6 that
> defines destroy_rcu_head_on_stack(),
> rcu_head_init_on_stack(), and rcu_head_init() with
> their !CONFIG_DEBUG_OBJECTS_RCU_HEAD definitions.
>
> b. Move current patch 6/6 to this position.
>
> c. Move current patch 5/6 to this position, updating
> to reflect the new patch added in (a) above.
>
Sure. Will do.
> o Patch 6/6: Would it be possible to use the object_is_on_stack()
> function defined in include/linux/sched.h instead of passing
> in the flag on_stack to bdi_work_init()? It looks like
> fs/fs-writeback.c already includes include/linux/sched.h, so
> shouldn't be a problem from a #include-hell viewpoint.
Wow, that's cool! We learn about exciting internal API functions every day,
isn't life great ? I will definitely change the fs-writeback.c code to make use
of it.
We might event want to go further. A similar scheme could be used for the
rcu_head debugobject activation fixup. Basically, I need a way to distinguish
between:
A) objects on stack and allocated objects
and
B) objects statically initialized
So either we use something resembling:
if (object_is_on_stack() || object_is_allocated())
or
if (object_is_static())
I am not aware of the proper API members to do that though.
>
> Please let me know if I am missing something on any of the above. If
> these changes seem reasonable to you, please either submit a new patch
> set or let me know that you are OK with me making these changes.
As soon as I get the information about the static vs stack/alloc, I'll complete
the update and re-send the updated patchset.
Thanks,
Mathieu
>
> Thanx, Paul
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
Something close to "object_is_static()" would be "kernel_text_address()", but it
only considers the text section addresses. I'd need something that would be
broader than that, containing all core kernel and module bss and data sections.
Still looking...
Thanks,
Mathieu
I was actually feeling pretty good about remembering object_is_on_stack()
and finding it again. ;-)
I am not seeing anything the identifies data or bss, and in any case,
other situations such as per-CPU, __read_mostly, and who knows what all
else would also need to be handled. So in the short term, my guess would
be that it would be best to provide the three functions (possibly renaming
them as noted above), but to leave the responsibility for figuring out
which to invoke with the caller. Always happy to be proven wrong, of
course!
Thanx, Paul
I think I found a neat way to do object_is_static(), which is all we need here.
Can you have a look at the following patch ? If you think it's right, I'll add
it to the patchset for the next submission.
extable and module add object is static
Adds an API to extable.c to check if an object is statically defined. This
permits, along with object_is_on_stack(), to figure out is an object is either
statically defined (object_is_static()) or allocated on the stack
(object_is_on_stack()).
Signed-off-by: Mathieu Desnoyers <mathieu....@efficios.com>
CC: "Paul E. McKenney" <pau...@linux.vnet.ibm.com>
CC: David S. Miller <da...@davemloft.net>
CC: ak...@linux-foundation.org
CC: mi...@elte.hu
CC: la...@cn.fujitsu.com
CC: dipa...@in.ibm.com
CC: jo...@joshtriplett.org
CC: dvh...@us.ibm.com
CC: n...@us.ibm.com
CC: tg...@linutronix.de
CC: pet...@infradead.org
CC: ros...@goodmis.org
CC: Valdis.K...@vt.edu
CC: dhow...@redhat.com
CC: eric.d...@gmail.com
CC: Alexey Dobriyan <adob...@gmail.com>
---
include/linux/kernel.h | 2 +
kernel/extable.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 54 insertions(+)
Index: linux.trees.git/include/linux/kernel.h
===================================================================
--- linux.trees.git.orig/include/linux/kernel.h 2010-03-27 19:27:02.000000000 -0400
+++ linux.trees.git/include/linux/kernel.h 2010-03-27 19:28:02.000000000 -0400
@@ -218,6 +218,8 @@ extern int __kernel_text_address(unsigne
extern int kernel_text_address(unsigned long addr);
extern int func_ptr_is_kernel_text(void *ptr);
+extern int object_is_static(void *obj);
+
struct pid;
extern struct pid *session_of_pgrp(struct pid *pgrp);
Index: linux.trees.git/kernel/extable.c
===================================================================
--- linux.trees.git.orig/kernel/extable.c 2010-03-27 19:28:06.000000000 -0400
+++ linux.trees.git/kernel/extable.c 2010-03-27 19:54:40.000000000 -0400
@@ -60,6 +60,14 @@ static inline int init_kernel_text(unsig
return 0;
}
+static inline int init_kernel_text_or_data(unsigned long addr)
+{
+ if (addr >= (unsigned long)__init_begin &&
+ addr <= (unsigned long)__init_end)
+ return 1;
+ return 0;
+}
+
int core_kernel_text(unsigned long addr)
{
if (addr >= (unsigned long)_stext &&
@@ -113,3 +121,47 @@ int func_ptr_is_kernel_text(void *ptr)
return 1;
return is_module_text_address(addr);
}
+
+/*
+ * Taking the safe approach to refer to each section individually rather than
+ * just taking a range between _stext and _end, just in case an architecture
+ * would decide to override the standard linker scripts and put a section
+ * outside of this range.
+ *
+ * Should be kept in sync with linux/section.h and asm-generic/vmlinux.lds.h.
+ */
+static int core_object_is_static(void *obj)
+{
+ unsigned long addr = (unsigned long) obj;
+
+ if (addr >= (unsigned long)_sdata &&
+ addr <= (unsigned long)_edata)
+ return 1;
+ if (addr >= (unsigned long)__bss_start &&
+ addr <= (unsigned long)__bss_stop)
+ return 1;
+ if (addr >= (unsigned long)__per_cpu_start &&
+ addr <= (unsigned long)__per_cpu_end)
+ return 1;
+ if (addr >= (unsigned long)__start_rodata &&
+ addr <= (unsigned long)__end_rodata)
+ return 1;
+ if (system_state == SYSTEM_BOOTING &&
+ init_kernel_text_or_data(addr))
+ return 1;
+ if (core_kernel_text(addr))
+ return 1;
+ return 0;
+}
+
+/*
+ * object_is_static - returns true is an object is statically defined
+ */
+int object_is_static(void *obj)
+{
+ if (core_object_is_static(obj))
+ return 1;
+ if (is_module_address((unsigned long) obj))
+ return 1;
+ return 0;
+}
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
Looks plausible to me!
Rusty, any thoughts?
Thanx, Paul
Looks sane at the first glance. Will go over it in detail tomorrow.
> o Patch 6/6: Would it be possible to use the object_is_on_stack()
> function defined in include/linux/sched.h instead of passing
> in the flag on_stack to bdi_work_init()? It looks like
> fs/fs-writeback.c already includes include/linux/sched.h, so
> shouldn't be a problem from a #include-hell viewpoint.
Well, I'm a bit wary about that. The reason is that we really want
the annotation of:
init_on_stack();
....
destroy_on_stack();
instead of the confusing:
init();
....
destroy_on_stack();
So having an automatism in the bdi_work_init() function will people
make forget to put the destroy_on_stack() annotation into it.
The flag is horrible as well. How about this:
/* helper function, do not use in code ! */
__bdi_work_init(....., onstack)
{
....
if (on_stack) {
work.state |= WS_ONSTACK;
init_rcu_head_on_stack(&work.rcu_head);
} else {
....
}
See, how this moves also the "work.state |= WS_ONSTACK;" line out of
the calling code.
bdi_work_init(...)
{
__bdi_work_init(...., false);
}
bdi_work_init_on_stack(...)
{
__bdi_work_init(...., true);
}
out of the code.
To make it complete, please do not use the asymmetric:
destroy_rcu_head_on_stack(&work.rcu_head);
Create a helper function:
bdi_destroy_work_on_stack(...)
{
destroy_rcu_head_on_stack(work->rcu_head);
}
That makes it way more readable and we did that with the other on
stack initializers as well.
Thanks,
tglx
Hello, Thomas,
I must defer to you on this one. ;-)
Thanx, Paul
These changes are queued for v3. Thanks Thomas!
Mathieu
>
> Thanks,
>
> tglx
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
This may only correct for UP.
You may need arch-special codes for SMP.
looking at include/linux/percpu.h:
#ifndef PERCPU_ENOUGH_ROOM
#define PERCPU_ENOUGH_ROOM \
(ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES) + \
PERCPU_MODULE_RESERVE)
#endif
I was under the impression that most architectures were keeping their per-cpu
data within the __per_cpu_start .. __per_cpu_end range. But I see that ia64
is the only one to redefine PERCPU_ENOUGH_ROOM. I'm not sure if it can be a
problem. Is that what you had in mind ?
(adding Tony and Fenghua in CC so they can confirm)
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
Tejun has patches for this.
That's great! Tejun, can you point me out to an update version of these
patches ? I am particularly interested in being able to know the range of
statically defined per-cpu data.
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
Thinking about the rcu head init topic, we might be able to drop the
init_rcu_head() initializer. The idea is the following:
- We need init_rcu_head_on_stack()/destroy_rcu_head_on_stack().
- call_rcu() populates the rcu_head and normally does not care about it being
pre-initialized.
- The activation fixup can detect if a non-initialized rcu head is being
activated and just perform the fixup without complaining.
- If we have two call_rcu() in a row in the same GP on the same rcu_head, the
activation check will detect it.
So either we remove all the init_rcu_head(), as was originally proposed, or we
use one that is a no-op on !DEBUG configs and initialize the object with DEBUG
configs.
That removes the dependency on object_is_static().
Thoughts ?
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
On 03/29/2010 10:16 PM, Mathieu Desnoyers wrote:
> That's great! Tejun, can you point me out to an update version of these
> patches ? I am particularly interested in being able to know the range of
> statically defined per-cpu data.
http://thread.gmane.org/gmane.linux.kernel/958794/focus=959493
These were waiting for Rusty's ACK. They got ACKed today and will be
pushed to mainline through percpu tree soonish.
Thanks.
--
tejun
OK. I just figured that I could initialize the rcu_heads in all cases in the
debugobject fixup anyway, so I guess I won't need "object_is_static()" after
all. But I can keep the patch around so it can eventually be re-sent to
standardize the debugobjects activation fixups. They currently need to keep a
flag around to identify statically defined objects.
Thanks,
Mathieu
>
> Thanks.
>
> --
> tejun
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
If I understand correctly, this does sound good. Here is what I think
you are proposing:
o call_rcu() and friends only complain if handed an rcu_head
structure that is still queued awaiting a grace period.
They don't care otherwise.
o rcu_do_batch() complains unless the rcu_head structure has
most recently been enqueued by call_rcu() or one if its friends.
Did I get it right?
Thanx, Paul
Exactly.
Thanks,
Mathieu
>
> Thanx, Paul
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
Very good!!!
Thanx, Paul
> Implement a basic state machine checker in the debugobjects.
Can you please add some real explanation how that checker works and
why we want to have it ?
Thanks,
tglx
We can add this to the changelog. Is it worth it to create a Documentation file
for it ?
This state machine checker detects races and inconsistencies within the "active"
life of a debugobject. The checker only keeps track of the current state; all
the state machine logic is kept at the object instance level.
The checker works by adding a supplementary "unsigned int astate" field to the
debug_obj structure. It keeps track of the current "active state" of the object.
The only constraints that are imposed on the states by the debugobjects system
is that:
- activation of an object sets the current active state to 0,
- deactivation of an object expects the current active state to be 0.
For the rest of the states, the state mapping is determined by the specific
object instance. Therefore, the logic keeping track of the state machine is
within the specialized instance, without any need to know about it at the
debugobject level.
The current object active state is changed by calling:
debug_object_active_state(addr, descr, expect, next)
where "expect" is the expected state and "next" is the next state to move to if
the expected state is found. A warning is generated if the expected is not
found.
Thanks,
Mathieu
>
> Thanks,
>
> tglx
>
>
>
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
> * Thomas Gleixner (tg...@linutronix.de) wrote:
> > On Mon, 29 Mar 2010, Mathieu Desnoyers wrote:
> >
> > > Implement a basic state machine checker in the debugobjects.
> >
> > Can you please add some real explanation how that checker works and
> > why we want to have it ?
>
> We can add this to the changelog. Is it worth it to create a Documentation file
> for it ?
I meant the changelog. The "Implement ...." line was not really helpful :)
>
> This state machine checker detects races and inconsistencies within the "active"
> life of a debugobject. The checker only keeps track of the current state; all
> the state machine logic is kept at the object instance level.
>
> The checker works by adding a supplementary "unsigned int astate" field to the
> debug_obj structure. It keeps track of the current "active state" of the object.
>
> The only constraints that are imposed on the states by the debugobjects system
> is that:
>
> - activation of an object sets the current active state to 0,
> - deactivation of an object expects the current active state to be 0.
>
> For the rest of the states, the state mapping is determined by the specific
> object instance. Therefore, the logic keeping track of the state machine is
> within the specialized instance, without any need to know about it at the
> debugobject level.
>
> The current object active state is changed by calling:
>
> debug_object_active_state(addr, descr, expect, next)
>
> where "expect" is the expected state and "next" is the next state to move to if
> the expected state is found. A warning is generated if the expected is not
> found.
Does it only warn or is there a callback to fixup things as well ?
Thanks,
tglx
For the moment, it only warns. I have not seen the need for a fixup callback
yet. It might become useful at some point, but I prefer to proceed
incrementally. This kind of callback could become quite big too, because it
would have to deal with transitions "from each to each" states of the system,
with, in the worse case scenario, different fixups for each situation.
Just for the specific case of "do RCU batch", when detecting that a non-queued
rcu head is there for execution, there are a few cases to consider:
- List corruption
- Appears in two lists.
- Appears in the same list twice.
- Race (two threads reading the list at the same time).
- ...
I am probably forgetting about others. So one way to fixup this would be not to
execute the callback, but even then, the lists might be corrupted. So it's not
at all clear to me if we can do much better than reporting the inconsistency
without increasing intrusiveness. But maybe I just need more imagination. ;)
Thanks,
Mathieu
>
> Thanks,
>
> tglx
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
Yes, that sounds tricky to implement and maybe not worth the
effort. Can you resend the debugobjects patch with a full changelog
please ?
Thanks,
tglx
This state machine checker detects races and inconsistencies within the "active"
life of a debugobject. The checker only keeps track of the current state; all
the state machine logic is kept at the object instance level.
The checker works by adding a supplementary "unsigned int astate" field to the
debug_obj structure. It keeps track of the current "active state" of the object.
The only constraints that are imposed on the states by the debugobjects system
is that:
- activation of an object sets the current active state to 0,
- deactivation of an object expects the current active state to be 0.
For the rest of the states, the state mapping is determined by the specific
object instance. Therefore, the logic keeping track of the state machine is
within the specialized instance, without any need to know about it at the
debugobject level.
The current object active state is changed by calling:
debug_object_active_state(addr, descr, expect, next)
where "expect" is the expected state and "next" is the next state to move to if
the expected state is found. A warning is generated if the expected is not
found.
Signed-off-by: Mathieu Desnoyers <mathieu....@efficios.com>
Acked-by: David S. Miller <da...@davemloft.net>
CC: "Paul E. McKenney" <pau...@linux.vnet.ibm.com>
CC: ak...@linux-foundation.org
CC: mi...@elte.hu
CC: la...@cn.fujitsu.com
CC: dipa...@in.ibm.com
CC: jo...@joshtriplett.org
CC: dvh...@us.ibm.com
CC: n...@us.ibm.com
CC: tg...@linutronix.de
CC: pet...@infradead.org
CC: ros...@goodmis.org
CC: Valdis.K...@vt.edu
CC: dhow...@redhat.com
CC: eric.d...@gmail.com
CC: Alexey Dobriyan <adob...@gmail.com>
---
include/linux/debugobjects.h | 11 ++++++++
lib/debugobjects.c | 59 ++++++++++++++++++++++++++++++++++++++++---
2 files changed, 67 insertions(+), 3 deletions(-)
Index: linux.trees.git/include/linux/debugobjects.h
===================================================================
--- linux.trees.git.orig/include/linux/debugobjects.h 2010-03-27 18:48:23.000000000 -0400
+++ linux.trees.git/include/linux/debugobjects.h 2010-03-27 18:50:09.000000000 -0400
@@ -20,12 +20,14 @@ struct debug_obj_descr;
* struct debug_obj - representaion of an tracked object
* @node: hlist node to link the object into the tracker list
* @state: tracked object state
+ * @astate: current active state
* @object: pointer to the real object
* @descr: pointer to an object type specific debug description structure
*/
struct debug_obj {
struct hlist_node node;
enum debug_obj_state state;
+ unsigned int astate;
void *object;
struct debug_obj_descr *descr;
};
@@ -60,6 +62,15 @@ extern void debug_object_deactivate(void
extern void debug_object_destroy (void *addr, struct debug_obj_descr *descr);
extern void debug_object_free (void *addr, struct debug_obj_descr *descr);
+/*
+ * Active state:
+ * - Set at 0 upon initialization.
+ * - Must return to 0 before deactivation.
+ */
+extern void
+debug_object_active_state(void *addr, struct debug_obj_descr *descr,
+ unsigned int expect, unsigned int next);
+
extern void debug_objects_early_init(void);
extern void debug_objects_mem_init(void);
#else
Index: linux.trees.git/lib/debugobjects.c
===================================================================
--- linux.trees.git.orig/lib/debugobjects.c 2010-03-27 18:48:23.000000000 -0400
+++ linux.trees.git/lib/debugobjects.c 2010-03-27 18:50:09.000000000 -0400
@@ -140,6 +140,7 @@ alloc_object(void *addr, struct debug_bu
obj->object = addr;
obj->descr = descr;
obj->state = ODEBUG_STATE_NONE;
+ obj->astate = 0;
hlist_del(&obj->node);
hlist_add_head(&obj->node, &b->list);
@@ -251,8 +252,10 @@ static void debug_print_object(struct de
if (limit < 5 && obj->descr != descr_test) {
limit++;
- WARN(1, KERN_ERR "ODEBUG: %s %s object type: %s\n", msg,
- obj_states[obj->state], obj->descr->name);
+ WARN(1, KERN_ERR "ODEBUG: %s %s (active state %u) "
+ "object type: %s\n",
+ msg, obj_states[obj->state], obj->astate,
+ obj->descr->name);
}
debug_objects_warnings++;
}
@@ -446,7 +449,10 @@ void debug_object_deactivate(void *addr,
case ODEBUG_STATE_INIT:
case ODEBUG_STATE_INACTIVE:
case ODEBUG_STATE_ACTIVE:
- obj->state = ODEBUG_STATE_INACTIVE;
+ if (!obj->astate)
+ obj->state = ODEBUG_STATE_INACTIVE;
+ else
+ debug_print_object(obj, "deactivate");
break;
case ODEBUG_STATE_DESTROYED:
@@ -552,6 +558,53 @@ out_unlock:
raw_spin_unlock_irqrestore(&db->lock, flags);
}
+/**
+ * debug_object_active_state - debug checks object usage state machine
+ * @addr: address of the object
+ * @descr: pointer to an object specific debug description structure
+ * @expect: expected state
+ * @next: state to move to if expected state is found
+ */
+void
+debug_object_active_state(void *addr, struct debug_obj_descr *descr,
+ unsigned int expect, unsigned int next)
+{
+ struct debug_bucket *db;
+ struct debug_obj *obj;
+ unsigned long flags;
+
+ if (!debug_objects_enabled)
+ return;
+
+ db = get_bucket((unsigned long) addr);
+
+ raw_spin_lock_irqsave(&db->lock, flags);
+
+ obj = lookup_object(addr, db);
+ if (obj) {
+ switch (obj->state) {
+ case ODEBUG_STATE_ACTIVE:
+ if (obj->astate == expect)
+ obj->astate = next;
+ else
+ debug_print_object(obj, "active_state");
+ break;
+
+ default:
+ debug_print_object(obj, "active_state");
+ break;
+ }
+ } else {
+ struct debug_obj o = { .object = addr,
+ .state = ODEBUG_STATE_NOTAVAILABLE,
+ .descr = descr };
+
+ debug_print_object(&o, "active_state");
+ }
+
+ raw_spin_unlock_irqrestore(&db->lock, flags);
+}
+
#ifdef CONFIG_DEBUG_OBJECTS_FREE
static void __debug_check_no_obj_freed(const void *address, unsigned long size)
{
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
Thank you, Thomas!!!
Thanx, Paul
Reviewed-by: Thomas Gleixner <tg...@linutronix.de>
Please feel free to pull that into the rcu branch where the users are
going to be.
Thanks,
tglx