
[PATCH] lib/percpu_counter.c: disable local irq when updating percpu counter


Ming Lei

Jan 7, 2014, 5:40:02 AM

__percpu_counter_add() may be called from a softirq/hardirq handler
(for example, blk_mq_queue_exit() is typically called in hardirq/softirq
context), so we need to disable local interrupts when updating the percpu
counter; otherwise counts may be lost.

The patch fixes a problem where 'rmmod null_blk' may hang in blk_cleanup_queue()
because of miscounting of request_queue->mq_usage_counter.
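For illustration, here is a minimal userspace sketch (illustrative only, not
kernel code, and the names are made up) of the interleaving that loses an
update when an interrupt handler touches the same per-CPU slot between the
read and the write-back in __percpu_counter_add():

#include <stdio.h>

static long cpu_slot;		/* stands in for this CPU's *fbc->counters */

/* Pretend this runs as an interrupt handler on the same CPU. */
static void fake_irq_handler(void)
{
	cpu_slot += 1;		/* the handler's contribution */
}

int main(void)
{
	/* Task context: the read-modify-write done by __percpu_counter_add() */
	long count = cpu_slot + 10;	/* read + add (cpu_slot is 0 here) */

	fake_irq_handler();	/* IRQ fires in between: cpu_slot becomes 1 */
	cpu_slot = count;	/* write-back of 10 overwrites the handler's +1 */

	printf("cpu_slot = %ld (the handler's +1 was lost)\n", cpu_slot);
	return 0;
}

preempt_disable() only prevents migration to another CPU and preemption by
another task; it does not stop an interrupt on the same CPU from running in
the middle, which is why the patch switches to raw_local_irq_save()/
raw_local_irq_restore().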

Cc: Paul Gortmaker <paul.go...@windriver.com>
Cc: Andrew Morton <ak...@linux-foundation.org>
Cc: Shaohua Li <sh...@fusionio.com>
Cc: Jens Axboe <ax...@kernel.dk>
Cc: Fan Du <fan...@windriver.com>
Signed-off-by: Ming Lei <tom.l...@gmail.com>
---
lib/percpu_counter.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 7473ee3..2b87bc1 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -75,19 +75,19 @@ EXPORT_SYMBOL(percpu_counter_set);
 void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
 {
 	s64 count;
+	unsigned long flags;
 
-	preempt_disable();
+	raw_local_irq_save(flags);
 	count = __this_cpu_read(*fbc->counters) + amount;
 	if (count >= batch || count <= -batch) {
-		unsigned long flags;
-		raw_spin_lock_irqsave(&fbc->lock, flags);
+		raw_spin_lock(&fbc->lock);
 		fbc->count += count;
-		raw_spin_unlock_irqrestore(&fbc->lock, flags);
+		raw_spin_unlock(&fbc->lock);
 		__this_cpu_write(*fbc->counters, 0);
 	} else {
 		__this_cpu_write(*fbc->counters, count);
 	}
-	preempt_enable();
+	raw_local_irq_restore(flags);
 }
 EXPORT_SYMBOL(__percpu_counter_add);

--
1.7.9.5


Andrew Morton

Jan 7, 2014, 5:30:02 PM

On Tue, 7 Jan 2014 18:29:27 +0800 Ming Lei <tom.l...@gmail.com> wrote:

> __percpu_counter_add() may be called from a softirq/hardirq handler
> (for example, blk_mq_queue_exit() is typically called in hardirq/softirq
> context), so we need to disable local interrupts when updating the percpu
> counter; otherwise counts may be lost.

OK.

> The patch fixes a problem where 'rmmod null_blk' may hang in blk_cleanup_queue()
> because of miscounting of request_queue->mq_usage_counter.
>
> ...
>
> --- a/lib/percpu_counter.c
> +++ b/lib/percpu_counter.c
> @@ -75,19 +75,19 @@ EXPORT_SYMBOL(percpu_counter_set);
>  void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
>  {
>  	s64 count;
> +	unsigned long flags;
> 
> -	preempt_disable();
> +	raw_local_irq_save(flags);
>  	count = __this_cpu_read(*fbc->counters) + amount;
>  	if (count >= batch || count <= -batch) {
> -		unsigned long flags;
> -		raw_spin_lock_irqsave(&fbc->lock, flags);
> +		raw_spin_lock(&fbc->lock);
>  		fbc->count += count;
> -		raw_spin_unlock_irqrestore(&fbc->lock, flags);
> +		raw_spin_unlock(&fbc->lock);
>  		__this_cpu_write(*fbc->counters, 0);
>  	} else {
>  		__this_cpu_write(*fbc->counters, count);
>  	}
> -	preempt_enable();
> +	raw_local_irq_restore(flags);
>  }
>  EXPORT_SYMBOL(__percpu_counter_add);

Can this be made more efficient?

The this_cpu_foo() documentation is fairly dreadful, but way down at
the end of Documentation/this_cpu_ops.txt we find "this_cpu ops are
interrupt safe". So I think this is a more efficient fix:

--- a/lib/percpu_counter.c~a
+++ a/lib/percpu_counter.c
@@ -82,10 +82,10 @@ void __percpu_counter_add(struct percpu_
 		unsigned long flags;
 		raw_spin_lock_irqsave(&fbc->lock, flags);
 		fbc->count += count;
+		__this_cpu_sub(*fbc->counters, count);
 		raw_spin_unlock_irqrestore(&fbc->lock, flags);
-		__this_cpu_write(*fbc->counters, 0);
 	} else {
-		__this_cpu_write(*fbc->counters, count);
+		this_cpu_add(*fbc->counters, amount);
 	}
 	preempt_enable();
 }

It avoids the local_irq_disable() in the common case, when the CPU
supports efficient this_cpu_add(). It will in rare race situations
permit the cpu-local counter to exceed `batch', but that should be
harmless.
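For context, the reason the fast path becomes cheap on such a CPU is that the
whole read-modify-write turns into a single instruction.  On x86, for example,
this_cpu_add() on the counter compiles down to roughly one %gs-relative add,
something like

	addq %rdx, %gs:(%rax)	# *this_cpu_ptr(fbc->counters) += amount

(illustrative disassembly, not taken from a real build), so an interrupt
cannot land between the load and the store and no irq-disable is needed.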

What do you think?

Ming Lei

Jan 7, 2014, 8:20:02 PM

Hi Andrew,
I am wondering if the above patch is more efficient, because:

- raw_local_irq_save()/raw_local_irq_restore() should be cheaper
than preempt_enable() in theory

- except for x86 and s390, other ARCHs do not have their own implementation
of this_cpu_foo(), and the generic fallback just disables local interrupts
when operating on the percpu variable (sketched below).
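A rough sketch of the shape of that generic fallback (simplified, and the
exact macro names differ a little between kernel versions, so treat it as
illustrative rather than the exact source):

#define _this_cpu_generic_to_op(pcp, val, op)				\
do {									\
	unsigned long flags;						\
	raw_local_irq_save(flags);					\
	*__this_cpu_ptr(&(pcp)) op val;					\
	raw_local_irq_restore(flags);					\
} while (0)

#define this_cpu_add(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)

So on those architectures this_cpu_add() already pays for the irq
save/restore internally.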

So I suggest fixing it by replacing preempt_* with raw_local_irq_*.


Thanks,
--
Ming Lei

Andrew Morton

Jan 7, 2014, 8:40:01 PM

On Wed, 8 Jan 2014 09:12:19 +0800 Ming Lei <tom.l...@gmail.com> wrote:

> Hi Andrew,
> I am wondering if the above patch is more efficient, because:
>
> - raw_local_irq_save()/raw_local_irq_restore() should be cheaper
> than preempt_enable() in theory

Don't think so - local_irq_disable() requires quite some internal
synchronization in the CPU and is expensive. preempt_disable() is just
an add+barrier, minus the add if the kernel is non-preemptable.
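For reference, with CONFIG_PREEMPT_COUNT the generic definition is roughly
the following (a sketch; the exact macro bodies vary between kernel versions):

#define preempt_disable() \
do { \
	preempt_count_inc();	/* the add */ \
	barrier();		/* the compiler barrier */ \
} while (0)

On a kernel built without CONFIG_PREEMPT_COUNT it collapses to just barrier().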

> - except for x86 and s390, other ARCHs do not have their own implementation
> of this_cpu_foo(), and the generic fallback just disables local interrupts
> when operating on the percpu variable.

Yup. But other CPUs should and will optimise their this_cpu
implementations over time.

Ming Lei

Jan 7, 2014, 10:40:02 PM

Hi Andrew,

On Wed, Jan 8, 2014 at 9:36 AM, Andrew Morton <ak...@linux-foundation.org> wrote:
>> I am wondering if the above patch is more efficient, because:
>>
>> - raw_local_irq_save()/raw_local_irq_restore() should be cheaper
>> than preempt_enable() in theory
>
> Don't think so - local_irq_disable() requires quite some internal
> synchronization in the CPU and is expensive. preempt_disable() is just

Yes, it might be a little expensive on some CPUs, but that should be
arch-dependent (CPU internals are involved).

> an add+barrier, minus the add if the kernel is non-preemptable.

IMO, generally, from a software point of view, local_irq_save() only saves
the CPU's interrupt mask to the local variable 'flags' and then sets the irq
mask register; since the local variable can be assumed to be in cache, I
think it might be cheaper than preempt_enable(), because the preempt counter
may not be in cache.
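As an illustration of what that boils down to on x86 (hypothetical helper
names, simplified from the arch code; cli/popf are privileged, so this is a
kernel-context sketch, not something to run from userspace):

static inline unsigned long sketch_local_irq_save(void)
{
	unsigned long flags;

	/* save EFLAGS into a local variable ... */
	asm volatile("pushf ; pop %0" : "=rm" (flags) : : "memory");
	/* ... then mask interrupts */
	asm volatile("cli" : : : "memory");
	return flags;
}

static inline void sketch_local_irq_restore(unsigned long flags)
{
	/* restoring EFLAGS re-enables IRQs only if they were enabled before */
	asm volatile("push %0 ; popf" : : "g" (flags) : "memory", "cc");
}

Whether that is cheaper than touching the preempt counter is exactly the
arch-dependent part.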

Also, this_cpu_add() won't work in the batch path (slow path); we still
need to prevent an interrupt from arriving between reading the percpu counter
and resetting it, otherwise counts might be lost too. :-)

Thanks,
--
Ming Lei

Ming Lei

Jan 7, 2014, 11:40:01 PM

On Wed, Jan 8, 2014 at 11:29 AM, Ming Lei <tom.l...@gmail.com> wrote:
> Hi Andrew,
>
> On Wed, Jan 8, 2014 at 9:36 AM, Andrew Morton <ak...@linux-foundation.org> wrote:
>>> I am wondering if the above patch is more efficient, because:
>>>
>>> - raw_local_irq_save()/raw_local_irq_restore() should be cheaper
>>> than preempt_enable() in theory
>>
>> Don't think so - local_irq_disable() requires quite some internal
>> synchronization in the CPU and is expensive. preempt_disable() is just
>
> Yes, it might be a little expensive on some CPUs, but that should be
> arch-dependent (CPU internals are involved).
>
>> an add+barrier, minus the add if the kernel is non-preemptable.
>
> IMO, generally, from a software point of view, local_irq_save() only saves
> the CPU's interrupt mask to the local variable 'flags' and then sets the irq
> mask register; since the local variable can be assumed to be in cache, I
> think it might be cheaper than preempt_enable(), because the preempt counter
> may not be in cache.
>
> Also, this_cpu_add() won't work in the batch path (slow path); we still
> need to prevent an interrupt from arriving between reading the percpu counter
> and resetting it, otherwise counts might be lost too. :-)

Sorry, I missed the __this_cpu_sub() in the slow path, so it is correct,
and even preempt_enable() and preempt_disable() can be removed.