Currently, when fair sleepers are enabled, a task that was sleeping seems to
get a bonus of cfs_rq->min_vruntime - sched_latency (in most cases). While
gentle fair sleepers reduce this effect by half, there remains a chance
that on busy machines with a large number of tasks, the sleepers might get
a huge undue bonus.
Here's a patch to avoid this by computing the CPU time the sleeping task
was entitled to during the period, taking into account only the current
cfs_rq->nr_running, which makes the credit adaptive.
Compile-tested only.
Signed-off-by: Suresh Jayaraman <sjaya...@suse.de>
---
kernel/sched_fair.c | 11 ++++++++++-
1 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 42ac3c9..d81fcb3 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -739,6 +739,15 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 	/* sleeps up to a single latency don't count. */
 	if (!initial && sched_feat(FAIR_SLEEPERS)) {
 		unsigned long thresh = sysctl_sched_latency;
+		unsigned long delta_exec = (unsigned long)
+				(rq_of(cfs_rq)->clock - se->exec_start);
+		unsigned long sleeper_bonus;
+
+		/* entitled share of CPU time adapted to current nr_running */
+		if (likely(cfs_rq->nr_running > 1))
+			sleeper_bonus = delta_exec/cfs_rq->nr_running;
+		else
+			sleeper_bonus = delta_exec;
 
 		/*
 		 * Convert the sleeper threshold into virtual time.
@@ -757,7 +766,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 		if (sched_feat(GENTLE_FAIR_SLEEPERS))
 			thresh >>= 1;
 
-		vruntime -= thresh;
+		vruntime -= min(thresh, sleeper_bonus);
 	}
 
 	/* ensure we never gain time by being placed backwards. */
--
There is no bonus. Sleepers simply get to keep some of their lag, but
any lag beyond sched_latency is trashed in the interest of reasonable
latency for non-sleepers as the sleeper preempts and tries to catch up.
-Mike
Sorry, perhaps it's not a bonus, but it seems that the credit given to
sleepers for their lag (accumulated while sleeping) doesn't take
into account the number of tasks currently on the run queue. IOW, the
credit to sleepers is the same irrespective of the number of current tasks.
This might mean sleepers get an edge (since this will slow down
current tasks) when the number of tasks is larger, doesn't it?
Would it be a good idea to make the threshold dependent on number of
tasks? This can help us achieve sleeper fairness with respect to the
current context and not relevant to when the task went to sleep, I think.
Does this make sense?
Thanks,
--
Suresh Jayaraman
As load increases, min_vruntime advances slower, so it's already scaled.
> Would it be a good idea to make the threshold dependent on number of
> tasks? This can help us achieve sleeper fairness with respect to the
> current context and not relevant to when the task went to sleep, I think.
>
> Does this make sense?
In one respect it makes some sense to scale. As load climbs, the waker
has to wait longer to get cpu, so sleepers sleep longer. This leads to
increased wakeup preemption as load climbs. However, if you do any kind
of scaling, you harm light threads, not their hog competition. Any
diddling of sleeper fairness would have to be accompanied with a
preemption model change methinks.
-Mike
Just told jays the exact same thing on IRC ;-)
Also, workloads are interesting: the signal test thing is the easiest way
to test the preemption side, while various things like QPID show the
down-side, iirc.
Best testcase for the downside in my arsenal is vmark. It performs a
_lot_ better with no wakeup preemption. 'Course if you run your box
that way, you quickly find out what a horrible idea that is :)
-Mike