
[PATCH] softirq: punt to ksoftirqd if __do_softirq recently looped


Rik van Riel
Apr 10, 2014, 12:00:02 PM
Jiri noticed that netperf throughput had gotten worse in recent years
for smaller message sizes. In the past, ksoftirqd would take around 80%
of a CPU, and netserver would take around 100% of another CPU.

On current kernels, sometimes all the softirq processing is done in the
context of the netperf process, which can result in as much as a 50%
performance drop, due to netserver spending all its CPU time "delivering"
packets to a socket it rarely empties, and dropping the packets on the
floor as a result.

This seems silly in an age where even cell phones are multi-core, when
we could simply let the ksoftirqd thread handle the softirq load, so
the scheduler can migrate the userspace task to another CPU.

This patch accomplishes that in a very simplistic way. The code
remembers when __do_softirq last looped, and will punt softirq
handling to ksoftirqd if another softirq happens in the same jiffie.

Netperf results:

                 without patch      with patch
UDP_STREAM 1472  957.17 / 954.18    957.15 / 951.73
UDP_STREAM  978  936.85 / 930.06    936.84 / 927.63
UDP_STREAM  466  875.98 / 865.62    875.98 / 868.65
UDP_STREAM  210  760.88 / 748.70    760.88 / 748.61
UDP_STREAM   82  554.06 / 329.96    554.06 / 505.95
                          ^^^^^^ unstable
UDP_STREAM   18  158.99 / 108.95    160.73 / 112.68

Signed-off-by: Rik van Riel <ri...@redhat.com>
Cc: Frederic Weisbecker <fwei...@gmail.com>
Cc: David Miller <da...@davemloft.net>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: Peter Zijlstra <pet...@infradead.org>
Tested-by: Jiri Benc <jb...@redhat.com>
Reported-by: Jiri Benc <jb...@redhat.com>
---
kernel/softirq.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 787b3a0..020be2f 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -56,6 +56,7 @@ EXPORT_SYMBOL(irq_stat);
static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;

DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
+DEFINE_PER_CPU(unsigned long, softirq_looped);

char *softirq_to_name[NR_SOFTIRQS] = {
"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
@@ -271,6 +272,9 @@ asmlinkage void __do_softirq(void)

pending = local_softirq_pending();
if (pending) {
+ /* Still busy? Remember this for invoke_softirq() below... */
+ this_cpu_write(softirq_looped, jiffies);
+
if (time_before(jiffies, end) && !need_resched() &&
--max_restart)
goto restart;
@@ -330,7 +334,11 @@ void irq_enter(void)

static inline void invoke_softirq(void)
{
- if (!force_irqthreads) {
+ /*
+ * If force_irqthreads is set, or if we looped in __do_softirq this
+ * jiffie, punt to ksoftirqd to prevent userland starvation.
+ */
+ if (!force_irqthreads && this_cpu_read(softirq_looped) != jiffies) {
/*
* We can safely execute softirq on the current stack if
* it is the irq stack, because it should be near empty


Eric Dumazet
Apr 10, 2014, 2:20:01 PM
I guess this is the tradeoff between latency and throughput.

Have you tried some TCP_RR / UDP_RR tests with one / multiple
instances? And have you tried drivers that use deferred skb freeing
(a hard irq calling the TX completion handler, then dev_kfree_skb_any()
scheduling the TX softirq), which increases the chance of a non-zero
local_softirq_pending()?
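
(To illustrate the pattern I mean; the foo_* names are made up:)

	/*
	 * Deferred skb freeing: a TX completion handler running in
	 * hard irq context cannot free the skb directly, so
	 * dev_kfree_skb_any() defers the free and raises the TX
	 * softirq, leaving local_softirq_pending() non-zero when
	 * the irq exits. The foo_* driver names are hypothetical.
	 */
	static irqreturn_t foo_tx_complete_irq(int irq, void *dev_id)
	{
		struct foo_priv *priv = dev_id;
		struct sk_buff *skb;

		while ((skb = foo_next_completed_skb(priv)) != NULL)
			dev_kfree_skb_any(skb); /* irq ctx: defer + raise softirq */

		return IRQ_HANDLED;
	}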

Calling skb destructor on a different cpu can have a huge false sharing
effect. A TCP socket is really big.
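
(A userspace toy that shows the effect, nothing kernel-specific: two
threads bumping counters that share a cache line ping-pong the line
between CPUs and slow each other down badly.)

	/* gcc -O2 -pthread false_sharing.c */
	#include <pthread.h>
	#include <stdio.h>

	static struct {
		volatile long a;	/* shares a cache line with b;     */
		volatile long b;	/* pad a to 64 bytes to fix it     */
	} c;

	static void *bump_a(void *arg)
	{
		(void)arg;
		for (long i = 0; i < 100000000; i++)
			c.a++;
		return NULL;
	}

	static void *bump_b(void *arg)
	{
		(void)arg;
		for (long i = 0; i < 100000000; i++)
			c.b++;
		return NULL;
	}

	int main(void)
	{
		pthread_t t1, t2;

		pthread_create(&t1, NULL, bump_a, NULL);
		pthread_create(&t2, NULL, bump_b, NULL);
		pthread_join(t1, NULL);
		pthread_join(t2, NULL);
		printf("%ld %ld\n", c.a, c.b);
		return 0;
	}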

Your UDP_STREAM test only stresses the RX part, and UDP sockets don't
use the complex callbacks that TCP sockets use.

David Miller
Apr 11, 2014, 4:40:02 PM
From: Rik van Riel <ri...@redhat.com>
Date: Thu, 10 Apr 2014 11:57:06 -0400

> @@ -330,7 +334,11 @@ void irq_enter(void)
>
> static inline void invoke_softirq(void)
> {
> - if (!force_irqthreads) {
> + /*
> + * If force_irqthreads is set, or if we looped in __do_softirq this
> + * jiffie, punt to ksoftirqd to prevent userland starvation.
> + */
> + if (!force_irqthreads && this_cpu_read(softirq_looped) != jiffies) {

If we do this, which I'm not convinced of yet, I think we should use two
jiffies as the cutoff.

Because if we are at the tail end of one jiffie we'll prematurely go to
ksoftirqd when we really have no reason to do so.
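
Something like this, perhaps (a completely untested sketch):

	/*
	 * Two-jiffy cutoff: only treat the last __do_softirq() loop
	 * as "recent" if it happened within the past two jiffies, so
	 * a loop at the very tail end of one jiffie does not punt the
	 * next softirq to ksoftirqd for no reason.
	 */
	if (!force_irqthreads &&
	    time_after(jiffies, this_cpu_read(softirq_looped) + 1)) {
		/* handle the softirq inline, as before */
	} else {
		/* defer to ksoftirqd */
	}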

Rik van Riel
Apr 11, 2014, 8:10:02 PM
On 04/11/2014 04:33 PM, David Miller wrote:
> From: Rik van Riel <ri...@redhat.com>
> Date: Thu, 10 Apr 2014 11:57:06 -0400
>
>> @@ -330,7 +334,11 @@ void irq_enter(void)
>>
>> static inline void invoke_softirq(void)
>> {
>> - if (!force_irqthreads) {
>> + /*
>> + * If force_irqthreads is set, or if we looped in __do_softirq this
>> + * jiffie, punt to ksoftirqd to prevent userland starvation.
>> + */
>> + if (!force_irqthreads && this_cpu_read(softirq_looped) != jiffies) {
>
> If we do this, which I'm not convinced of yet, I think we should use two
> jiffies as the cutoff.

I am not fully convinced, either. This patch mostly just illustrates
the problem, and gives something that solves Jiri's immediate problem.

It is quite likely that there is a better way to solve the problem:
1) softirq handling starving out userspace processing,
2) which could be solved by moving the userspace process elsewhere, and
3) shifting the softirq processing to ksoftirqd.

A working patch seems to be one of the better ways to start a
discussion, though.

If anybody has a nicer idea on how to solve the problem, I'd even be
willing to implement your idea, and give Jiri another patch to test :)

--
All rights reversed