After this patchset there will be only one user of local_t left: Mathieu's
trace ringbuffer.
I added some patches that define additional this_cpu ops in order to help
Mathieu with making the trace ringbuffer use this_cpu ops. These have barely
been tested (the kernel boots fine on 32 and 64 bit, but there is no user of these operations yet).
RFC state.
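For reference, the intended usage pattern of the new cmpxchg op looks roughly
like this (illustrative sketch only, not taken from the ring buffer code):

/* Bump a per-cpu sequence counter with this_cpu_cmpxchg(), retrying if
 * another context on this cpu (e.g. an interrupt) raced with us. */
DEFINE_PER_CPU(unsigned long, seq);

static unsigned long seq_advance(void)
{
	unsigned long old, new;

	do {
		old = this_cpu_read(seq);
		new = old + 1;
	} while (this_cpu_cmpxchg(seq, old, new) != old);

	return new;
}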
V7->V8
- Fix issue in slub patch
- Fix issue in modules patch
- Rediff page allocator patch
- Provide new this_cpu ops needed for ringbuffer [RFC state]
V6->V7
- Drop patches merged in 2.6.33 merge cycle
- Drop risky slub patches
V5->V6:
- Drop patches merged by Tejun.
- Drop irqless slub fastpath for now.
- Patches against Tejun percpu for-next branch.
V4->V5:
- Avoid setup_per_cpu_area() modifications and fold the remainder of the
patch into the page allocator patch.
- Irq disable / per cpu ptr fixes for page allocator patch.
V3->V4:
- Fix various macro definitions.
- Provide experimental percpu based fastpath that does not disable
interrupts for SLUB.
V2->V3:
- Available via git tree against latest upstream from
git://git.kernel.org/pub/scm/linux/kernel/git/christoph/percpu.git linus
- Rework SLUB per cpu operations. Get rid of dynamic DMA slab creation
for CONFIG_ZONE_DMA
- Create fallback framework so that 64 bit ops on 32 bit platforms
can fallback to the use of preempt or interrupt disable. 64 bit
platforms can use 64 bit atomic per cpu ops.
V1->V2:
- Various minor fixes
- Add SLUB conversion
- Add Page allocator conversion
- Patch against the git tree of today
Hi Christoph,
I am a bit concerned about the "generic" version of this_cpu_cmpxchg.
Given that what LTTng needs is basically an atomic, nmi-safe version of
the primitive (on all architectures that have something close to an NMI),
this means that it could not switch over to your primitives until we add
the equivalent support we currently have with local_t to all
architectures. The transition would be faster if we create an
atomic_cpu_*() variant which would map to local_t operations in the
initial version.
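Something along these lines is what I mean (hypothetical name, minimal
sketch, assuming the per-cpu variable is declared as a local_t):

/* Hypothetical atomic_cpu_cmpxchg(): wrap local_cmpxchg() so that the
 * NMI-safety we currently get from local_t is preserved on architectures
 * that already implement it atomically. */
#define atomic_cpu_cmpxchg(pcp, old, new) \
	local_cmpxchg(this_cpu_ptr(&(pcp)), (old), (new))

/* e.g. on a per-cpu counter: DEFINE_PER_CPU(local_t, commit_count); */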
Or maybe I have missed something in your patchset that addresses this?
Thanks,
Mathieu
> +
> +static inline unsigned long this_cpu_cmpxchg_generic(volatile void *ptr,
> + unsigned long old, unsigned long new, int size)
> +{
> + unsigned long prev;
> +
> + preempt_disable();
> + prev = __this_cpu_cmpxchg_generic(ptr, old, new, size);
> + preempt_enable();
> + return prev;
> +}
> +
> +#ifndef this_cpu_cmpxchg
> +# ifndef this_cpu_cmpxchg_1
> +# define this_cpu_cmpxchg_1(pcp, old, new) this_cpu_cmpxchg_generic((pcp), (old), (new), 1)
> +# endif
> +# ifndef this_cpu_cmpxchg_2
> +# define this_cpu_cmpxchg_2(pcp, old, new) this_cpu_cmpxchg_generic((pcp), (old), (new), 2)
> +# endif
> +# ifndef this_cpu_cmpxchg_4
> +# define this_cpu_cmpxchg_4(pcp, old, new) this_cpu_cmpxchg_generic((pcp), (old), (new), 4)
> +# endif
> +# ifndef this_cpu_cmpxchg_8
> +# define this_cpu_cmpxchg_8(pcp, old, new) this_cpu_cmpxchg_generic((pcp), (old), (new), 8)
> +# endif
> +# define this_cpu_cmpxchg(pcp, old, new) __pcpu_size_call_return(this_cpu_cmpxchg_, (pcp), (old), (new))
> +#endif
> +
> /*
> * Generic percpu operations that do not require preemption handling.
> * Either we do not care about races or the caller has the
> @@ -594,6 +650,22 @@ do { \
> # define __this_cpu_xor(pcp, val) __pcpu_size_call(__this_cpu_xor_, (pcp), (val))
> #endif
>
> +#ifndef __this_cpu_cmpxchg
> +# ifndef __this_cpu_cmpxchg_1
> +# define __this_cpu_cmpxchg_1(pcp, old, new) __this_cpu_cmpxchg_generic((pcp), (old), (new), 1)
> +# endif
> +# ifndef __this_cpu_cmpxchg_2
> +# define __this_cpu_cmpxchg_2(pcp, old, new) __this_cpu_cmpxchg_generic((pcp), (old), (new), 2)
> +# endif
> +# ifndef __this_cpu_cmpxchg_4
> +# define __this_cpu_cmpxchg_4(pcp, old, new) __this_cpu_cmpxchg_generic((pcp), (old), (new), 4)
> +# endif
> +# ifndef __this_cpu_cmpxchg_8
> +# define __this_cpu_cmpxchg_8(pcp, old, new) __this_cpu_cmpxchg_generic((pcp), (old), (new), 8)
> +# endif
> +# define __this_cpu_cmpxchg(pcp, old, new) __pcpu_size_call_return(__this_cpu_cmpxchg_, (pcp), (old), (new))
> +#endif
> +
> /*
> * IRQ safe versions of the per cpu RMW operations. Note that these operations
> * are *not* safe against modification of the same variable from another
> @@ -709,4 +781,31 @@ do { \
> # define irqsafe_cpu_xor(pcp, val) __pcpu_size_call(irqsafe_cpu_xor_, (val))
> #endif
>
> +static inline unsigned long irqsafe_cpu_cmpxchg_generic(volatile void *ptr,
> + unsigned long old, unsigned long new, int size)
> +{
> + unsigned long flags, prev;
> +
> + local_irq_save(flags);
> + prev = __this_cpu_cmpxchg_generic(ptr, old, new, size);
> + local_irq_restore(flags);
> + return prev;
> +}
> +
> +#ifndef irqsafe_cpu_cmpxchg
> +# ifndef irqsafe_cpu_cmpxchg_1
> +# define irqsafe_cpu_cmpxchg_1(pcp, old, new) irqsafe_cpu_cmpxchg_generic((pcp), (old), (new), 1)
> +# endif
> +# ifndef irqsafe_cpu_cmpxchg_2
> +# define irqsafe_cpu_cmpxchg_2(pcp, old, new) irqsafe_cpu_cmpxchg_generic((pcp), (old), (new), 2)
> +# endif
> +# ifndef irqsafe_cpu_cmpxchg_4
> +# define irqsafe_cpu_cmpxchg_4(pcp, old, new) irqsafe_cpu_cmpxchg_generic((pcp), (old), (new), 4)
> +# endif
> +# ifndef irqsafe_cpu_cmpxchg_8
> +# define irqsafe_cpu_cmpxchg_8(pcp, old, new) irqsafe_cpu_cmpxchg_generic((pcp), (old), (new), 8)
> +# endif
> +# define irqsafe_cpu_cmpxchg(pcp, old, new) __pcpu_size_call_return(irqsafe_cpu_cmpxchg_, (pcp), (old), (new))
> +#endif
> +
> #endif /* __LINUX_PERCPU_H */
>
> --
--
Mathieu Desnoyers
Patches 3-6 applied.
Wouldn't it be better to use __builtin_constant_p() and switch in the
arch add/dec macros? It just seems a bit extreme to define all those
different variants in the generic header. I'm quite unsure whether
providing overrides for all the different size variants from the generic
header is necessary at all. If an arch is gonna override the
operation, it's probably best to just let it override the whole thing.
Thanks.
--
tejun
On 12/19/2009 07:26 AM, Christoph Lameter wrote:
> Use cpu ops to deal with the per cpu data instead of a local_t. Reduces memory
> requirements and cache footprint, and decreases cycle counts.
>
> The this_cpu_xx operations are also used for !SMP mode. Otherwise we could
> not drop the use of __module_ref_addr() which would make per cpu data handling
> complicated. this_cpu_xx operations have their own fallback for !SMP.
>
> After this patch has been applied, the last holdout among local_t users is
> the tracing ringbuffer.
>
> Signed-off-by: Christoph Lameter <c...@linux-foundation.org>
Please keep Rusty Russell cc'd on module changes.
> Index: linux-2.6/kernel/trace/ring_buffer.c
> ===================================================================
> --- linux-2.6.orig/kernel/trace/ring_buffer.c 2009-12-18 13:13:24.000000000 -0600
> +++ linux-2.6/kernel/trace/ring_buffer.c 2009-12-18 14:15:57.000000000 -0600
> @@ -12,6 +12,7 @@
> #include <linux/hardirq.h>
> #include <linux/kmemcheck.h>
> #include <linux/module.h>
> +#include <asm/local.h>
This doesn't belong to this patch, right? I stripped this part out,
added Cc: to Rusty and applied 1, 2, 7 and 8 to percpu branch. I'll
post the updated patch here.
Thanks.
--
tejun
From 8af47ddd01364ae3c663c0bc92415c06fe887ba1 Mon Sep 17 00:00:00 2001
From: Christoph Lameter <c...@linux-foundation.org>
Date: Fri, 18 Dec 2009 16:26:24 -0600
Subject: [PATCH] module: Use this_cpu_xx to dynamically allocate counters
Use cpu ops to deal with the per cpu data instead of a local_t. Reduces memory
requirements and cache footprint, and decreases cycle counts.
The this_cpu_xx operations are also used for !SMP mode. Otherwise we could
not drop the use of __module_ref_addr() which would make per cpu data handling
complicated. this_cpu_xx operations have their own fallback for !SMP.
After this patch has been applied, the last holdout among local_t users is
the tracing ringbuffer.
Signed-off-by: Christoph Lameter <c...@linux-foundation.org>
Cc: Rusty Russell <ru...@rustcorp.com.au>
Signed-off-by: Tejun Heo <t...@kernel.org>
---
include/linux/module.h | 38 ++++++++++++++------------------------
kernel/module.c | 30 ++++++++++++++++--------------
2 files changed, 30 insertions(+), 38 deletions(-)
diff --git a/include/linux/module.h b/include/linux/module.h
index 482efc8..9350577 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -16,8 +16,7 @@
#include <linux/kobject.h>
#include <linux/moduleparam.h>
#include <linux/tracepoint.h>
-
-#include <asm/local.h>
+#include <linux/percpu.h>
#include <asm/module.h>
#include <trace/events/module.h>
@@ -361,11 +360,9 @@ struct module
/* Destruction function. */
void (*exit)(void);
-#ifdef CONFIG_SMP
- char *refptr;
-#else
- local_t ref;
-#endif
+ struct module_ref {
+ int count;
+ } *refptr;
#endif
#ifdef CONFIG_CONSTRUCTORS
@@ -452,25 +449,16 @@ void __symbol_put(const char *symbol);
#define symbol_put(x) __symbol_put(MODULE_SYMBOL_PREFIX #x)
void symbol_put_addr(void *addr);
-static inline local_t *__module_ref_addr(struct module *mod, int cpu)
-{
-#ifdef CONFIG_SMP
- return (local_t *) (mod->refptr + per_cpu_offset(cpu));
-#else
- return &mod->ref;
-#endif
-}
-
/* Sometimes we know we already have a refcount, and it's easier not
to handle the error case (which only happens with rmmod --wait). */
static inline void __module_get(struct module *module)
{
if (module) {
- unsigned int cpu = get_cpu();
- local_inc(__module_ref_addr(module, cpu));
+ preempt_disable();
+ __this_cpu_inc(module->refptr->count);
trace_module_get(module, _THIS_IP_,
- local_read(__module_ref_addr(module, cpu)));
- put_cpu();
+ __this_cpu_read(module->refptr->count));
+ preempt_enable();
}
}
@@ -479,15 +467,17 @@ static inline int try_module_get(struct module *module)
int ret = 1;
if (module) {
- unsigned int cpu = get_cpu();
+ preempt_disable();
+
if (likely(module_is_live(module))) {
- local_inc(__module_ref_addr(module, cpu));
+ __this_cpu_inc(module->refptr->count);
trace_module_get(module, _THIS_IP_,
- local_read(__module_ref_addr(module, cpu)));
+ __this_cpu_read(module->refptr->count));
}
else
ret = 0;
- put_cpu();
+
+ preempt_enable();
}
return ret;
}
diff --git a/kernel/module.c b/kernel/module.c
index 64787cd..9bc04de 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -474,9 +474,10 @@ static void module_unload_init(struct module *mod)
INIT_LIST_HEAD(&mod->modules_which_use_me);
for_each_possible_cpu(cpu)
- local_set(__module_ref_addr(mod, cpu), 0);
+ per_cpu_ptr(mod->refptr, cpu)->count = 0;
+
/* Hold reference count during initialization. */
- local_set(__module_ref_addr(mod, raw_smp_processor_id()), 1);
+ __this_cpu_write(mod->refptr->count, 1);
/* Backwards compatibility macros put refcount during init. */
mod->waiter = current;
}
@@ -555,6 +556,7 @@ static void module_unload_free(struct module *mod)
kfree(use);
sysfs_remove_link(i->holders_dir, mod->name);
/* There can be at most one match. */
+ free_percpu(i->refptr);
break;
}
}
@@ -619,7 +621,7 @@ unsigned int module_refcount(struct module *mod)
int cpu;
for_each_possible_cpu(cpu)
- total += local_read(__module_ref_addr(mod, cpu));
+ total += per_cpu_ptr(mod->refptr, cpu)->count;
return total;
}
EXPORT_SYMBOL(module_refcount);
@@ -796,14 +798,15 @@ static struct module_attribute refcnt = {
void module_put(struct module *module)
{
if (module) {
- unsigned int cpu = get_cpu();
- local_dec(__module_ref_addr(module, cpu));
+ preempt_disable();
+ __this_cpu_dec(module->refptr->count);
+
trace_module_put(module, _RET_IP_,
- local_read(__module_ref_addr(module, cpu)));
+ __this_cpu_read(module->refptr->count));
/* Maybe they're waiting for us to drop reference? */
if (unlikely(!module_is_live(module)))
wake_up_process(module->waiter);
- put_cpu();
+ preempt_enable();
}
}
EXPORT_SYMBOL(module_put);
@@ -1377,9 +1380,9 @@ static void free_module(struct module *mod)
kfree(mod->args);
if (mod->percpu)
percpu_modfree(mod->percpu);
-#if defined(CONFIG_MODULE_UNLOAD) && defined(CONFIG_SMP)
+#if defined(CONFIG_MODULE_UNLOAD)
if (mod->refptr)
- percpu_modfree(mod->refptr);
+ free_percpu(mod->refptr);
#endif
/* Free lock-classes: */
lockdep_free_key_range(mod->module_core, mod->core_size);
@@ -2145,9 +2148,8 @@ static noinline struct module *load_module(void __user *umod,
mod = (void *)sechdrs[modindex].sh_addr;
kmemleak_load_module(mod, hdr, sechdrs, secstrings);
-#if defined(CONFIG_MODULE_UNLOAD) && defined(CONFIG_SMP)
- mod->refptr = percpu_modalloc(sizeof(local_t), __alignof__(local_t),
- mod->name);
+#if defined(CONFIG_MODULE_UNLOAD)
+ mod->refptr = alloc_percpu(struct module_ref);
if (!mod->refptr) {
err = -ENOMEM;
goto free_init;
@@ -2373,8 +2375,8 @@ static noinline struct module *load_module(void __user *umod,
kobject_put(&mod->mkobj.kobj);
free_unload:
module_unload_free(mod);
-#if defined(CONFIG_MODULE_UNLOAD) && defined(CONFIG_SMP)
- percpu_modfree(mod->refptr);
+#if defined(CONFIG_MODULE_UNLOAD)
+ free_percpu(mod->refptr);
free_init:
#endif
module_free(mod, mod->module_init);
--
1.6.4.2
This was changed to Acked-by as Rusty acked on the previous thread.
Thanks.
--
tejun
This looks very wrong.
Rusty
Indeed, thanks for spotting it. Christoph, I'm rolling back all
patches from this series. Please re-post with updates.
Thanks.
--
tejun
> I am a bit concerned about the "generic" version of this_cpu_cmpxchg.
> Given that what LTTng needs is basically an atomic, nmi-safe version of
> the primitive (on all architectures that have something close to an NMI),
> this means that it could not switch over to your primitives until we add
> the equivalent support we currently have with local_t to all
> architectures. The transition would be faster if we create an
> atomic_cpu_*() variant which would map to local_t operations in the
> initial version.
>
> Or maybe I have missed something in your patchset that addresses this?
NMI safeness is not covered by this_cpu operations.
We could add nmi_safe_.... ops?
The atomic_cpu reference makes me think that you want full (LOCK)
semantics? Then use the regular atomic ops?
>
> > Index: linux-2.6/kernel/trace/ring_buffer.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/trace/ring_buffer.c 2009-12-18 13:13:24.000000000 -0600
> > +++ linux-2.6/kernel/trace/ring_buffer.c 2009-12-18 14:15:57.000000000 -0600
> > @@ -12,6 +12,7 @@
> > #include <linux/hardirq.h>
> > #include <linux/kmemcheck.h>
> > #include <linux/module.h>
> > +#include <asm/local.h>
>
> This doesn't belong to this patch, right? I stripped this part out,
> added Cc: to Rusty and applied 1, 2, 7 and 8 to percpu branch. I'll
> post the updated patch here. Thanks.
If you strip this part out then ringbuffer.c will no longer compile,
since it relies on the "#include <asm/local.h>" that this patch removes
from the module.h file.
> index 482efc8..9350577 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -16,8 +16,7 @@
> #include <linux/kobject.h>
> #include <linux/moduleparam.h>
> #include <linux/tracepoint.h>
> -
> -#include <asm/local.h>
> +#include <linux/percpu.h>
This is going to break ringbuffer.c.
> On 12/19/2009 07:26 AM, Christoph Lameter wrote:
> > Current this_cpu ops only allow an arch to specify add RMW operations or inc
> > and dec for all sizes. Some arches can do more efficient inc and dec
> > operations. Allow size specific override of fallback functions like with
> > the other operations.
>
> Wouldn't it be better to use __builtin_constant_p() and switch in the
> arch add/dec macros? It just seems a bit extreme to define all those
Yes that could be done but I am on vacation till next year ;-)...
> On 12/22/2009 08:28 AM, Rusty Russell wrote:
> > On Mon, 21 Dec 2009 06:29:36 pm Tejun Heo wrote:
> >> @@ -555,6 +556,7 @@ static void module_unload_free(struct module *mod)
> >> kfree(use);
> >> sysfs_remove_link(i->holders_dir, mod->name);
> >> /* There can be at most one match. */
> >> + free_percpu(i->refptr);
> >> break;
> >> }
> >> }
> >
> > This looks very wrong.
>
> Indeed, thanks for spotting it. Christoph, I'm rolling back all
> patches from this series. Please re-post with updates.
Simply drop the chunk?
nmi_safe would probably make sense here.
But given that we have to disable preemption anyway to get precise trace
clock timestamps, I wonder whether we would really gain anything
significant performance-wise.
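To make that concrete, the write path looks roughly like this (sketch with
hypothetical structure and function names); the preempt disable/enable pair
stays there no matter which per-cpu primitive is used underneath:

static void trace_write(struct rb_cpu_buffer *cpu_buf,
			const void *payload, int len)
{
	u64 ts;
	void *ev;

	preempt_disable();
	ts = trace_clock();			/* must not migrate after this */
	ev = rb_reserve(cpu_buf, len, ts);	/* hypothetical reserve */
	memcpy(ev, payload, len);
	rb_commit(cpu_buf, ev);			/* hypothetical commit */
	preempt_enable();
}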
I also thought about the design change this requires for the per-cpu
buffer commit count pointer, which would have to become a per-cpu pointer
independent of the buffer structure, and I foresee a problem with
Steven's irq-off tracing, which needs to perform buffer exchanges while
tracing is active. Basically, having only one top-level pointer for the
buffer makes it possible to exchange it atomically, but if we have to
have two separate pointers (one for the per-cpu buffer, one for the per-cpu
commit count array), then we are stuck.
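To illustrate the layout constraint (simplified, hypothetical field names):

/* Today the commit count is embedded in the per-cpu buffer structure as a
 * local_t, so exchanging the single top-level pointer atomically exchanges
 * counters and pages together. */
struct rb_cpu_buffer {
	local_t			commit_count;
	struct buffer_page	*pages;
	/* ... */
};

/* With this_cpu ops the counter would have to live in its own
 * alloc_percpu() area, i.e. a second top-level pointer that cannot be
 * swapped together with the buffer pointer as one atomic unit. */
struct rb_cpu_counters {
	int			commit_count;
};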
So given that per-cpu ops limit us in terms of data structure layout, I
am less and less sure they are the best fit for ring buffers, especially if
we don't gain much performance-wise.
Thanks,
Mathieu
--
Mathieu Desnoyers
On 12/23/2009 12:56 AM, Christoph Lameter wrote:
> On Mon, 21 Dec 2009, Tejun Heo wrote:
>
>> On 12/19/2009 07:26 AM, Christoph Lameter wrote:
>>> Current this_cpu ops only allow an arch to specify add RMW operations or inc
>>> and dec for all sizes. Some arches can do more efficient inc and dec
>>> operations. Allow size specific override of fallback functions like with
>>> the other operations.
>>
>> Wouldn't it be better to use __builtin_constant_p() and switch in the
>> arch add/dec macros? It just seems a bit extreme to define all those
>
> Yes that could be done but I am on vacation till next year ;-)...
I'm not sure how much I would be working from now till the end of the
year but if I end up doing some work I'll give it a shot.
Happy holidays and enjoy your vacation!
--
tejun
Oh... alright. I'll add that comment and drop the offending chunk and
recommit.
Thanks.
--
tejun
> On 12/23/2009 12:57 AM, Christoph Lameter wrote:
> > On Mon, 21 Dec 2009, Tejun Heo wrote:
> >
> >>
> >>> Index: linux-2.6/kernel/trace/ring_buffer.c
> >>> ===================================================================
> >>> --- linux-2.6.orig/kernel/trace/ring_buffer.c 2009-12-18 13:13:24.000000000 -0600
> >>> +++ linux-2.6/kernel/trace/ring_buffer.c 2009-12-18 14:15:57.000000000 -0600
> >>> @@ -12,6 +12,7 @@
> >>> #include <linux/hardirq.h>
> >>> #include <linux/kmemcheck.h>
> >>> #include <linux/module.h>
> >>> +#include <asm/local.h>
> >>
> >> This doesn't belong to this patch, right? I stripped this part out,
> >> added Cc: to Rusty and applied 1, 2, 7 and 8 to percpu branch. I'll
> >> post the updated patch here. Thanks.
> >
> > If you strip this part out then ringbuffer.c will no longer compile,
> > since it relies on the "#include <asm/local.h>" that this patch removes
> > from the module.h file.
>
> Oh... alright. I'll add that comment and drop the offending chunk and
> recommit.
So we need a separate patch to deal with removal of the #include
<asm/local.h> from module.h?
> > > I am a bit concerned about the "generic" version of this_cpu_cmpxchg.
> > > Given that what LTTng needs is basically an atomic, nmi-safe version of
> > > the primitive (on all architectures that have something close to an NMI),
> > > this means that it could not switch over to your primitives until we add
> > > the equivalent support we currently have with local_t to all
> > > architectures. The transition would be faster if we create an
> > > atomic_cpu_*() variant which would map to local_t operations in the
> > > initial version.
> > >
> > > Or maybe I have missed something in your patchset that addresses this?
> >
> > NMI safeness is not covered by this_cpu operations.
> >
> > We could add nmi_safe_.... ops?
> >
> > The atomic_cpu reference makes me think that you want full (LOCK)
> > semantics? Then use the regular atomic ops?
>
> nmi_safe would probably make sense here.
I am not sure how to implement a fallback for the nmi_safe operations,
though, since there is no way of disabling NMIs.
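The only generic "fallback" I can see that stays NMI safe (at least on
architectures with a real cmpxchg instruction) is a full cmpxchg on the
per cpu address, sketch only:

#define nmi_safe_cpu_cmpxchg(pcp, old, new) \
	cmpxchg(__this_cpu_ptr(&(pcp)), (old), (new))

and that is more or less the cost that local_t was introduced to avoid.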
> But given that we have to disable preemption anyway to get precise trace
> clock timestamps, I wonder whether we would really gain anything
> significant performance-wise.
I am not sure what exactly you are attempting to do there.
> I also thought about the design change this requires for the per-cpu
> buffer commit count pointer, which would have to become a per-cpu pointer
> independent of the buffer structure, and I foresee a problem with
> Steven's irq-off tracing, which needs to perform buffer exchanges while
> tracing is active. Basically, having only one top-level pointer for the
> buffer makes it possible to exchange it atomically, but if we have to
> have two separate pointers (one for the per-cpu buffer, one for the per-cpu
> commit count array), then we are stuck.
You just need to keep percpu pointers that are offsets into the percpu
area. They can be relocated as needed to the processor specific addresses
using the cpu ops.
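Roughly, with hypothetical names: the ring buffer structure keeps only the
percpu pointer returned by alloc_percpu() and relocates it where needed:

struct rb_counters {
	int	commit_count;
};

struct rb_buffer {
	struct rb_counters	*counters;	/* from alloc_percpu() */
	/* ... */
};

/* writer path, on the current cpu: */
static void rb_inc_commit(struct rb_buffer *rb)
{
	__this_cpu_inc(rb->counters->commit_count);
}

/* reader path, for a specific cpu: */
static int rb_read_commit(struct rb_buffer *rb, int cpu)
{
	return per_cpu_ptr(rb->counters, cpu)->commit_count;
}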
> So given that per-cpu ops limit us in terms of data structure layout, I
> am less and less sure they are the best fit for ring buffers, especially if
> we don't gain much performance-wise.
I don't understand how exactly the ring buffer logic works and what you are
trying to do here.
The ring buffers are per-cpu structures, right, and you do not change cpus
while performing operations on them? If not, then the per-cpu ops are not
useful to you.
If you don't: How can you safely use the local_t operations for the
ringbuffer logic?
I think this would make sense, yes. This would be a patch specific to
the ring-buffer code that would go through (or be acked by) Steven.
Mathieu
--
Mathieu Desnoyers
Trying to make a long story short:
In the scheme where Steven moves buffers from one CPU to another, he only
performs this operation when some other exclusion mechanism ensures that
the buffer is not being used for writing by that CPU while the move
is done.
It is therefore correct, and it needs the local_t type to deal with the data
structure hierarchy vs. atomic exchange issue I pointed out in my email.
Mathieu
>
> If you don't: How can you safely use the local_t operations for the
> ringbuffer logic?
>
--
Mathieu Desnoyers
> In the scheme where Steven moves buffers from one CPU to another, he only
> performs this operation when some other exclusion mechanism ensures that
> the buffer is not being used for writing by that CPU while the move
> is done.
>
> It is therefore correct, and it needs the local_t type to deal with the data
> structure hierarchy vs. atomic exchange issue I pointed out in my email.
Yes this_cpu_xx does not seem to work for your scheme. Please review the
patchset that I sent you instead.