Signed-off-by: Ripduman Sohan <ripduma...@cl.cam.ac.uk>
---
kernel/workqueue.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 25fb1b0..0a4e785 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2031,6 +2031,7 @@ static int rescuer_thread(void *__wq)
struct list_head *scheduled = &rescuer->scheduled;
bool is_unbound = wq->flags & WQ_UNBOUND;
unsigned int cpu;
+ cpumask_t allowed_cpus = current->cpus_allowed;
set_user_nice(current, RESCUER_NICE_LEVEL);
repeat:
@@ -2078,6 +2079,8 @@ repeat:
spin_unlock_irq(&gcwq->lock);
}
+ set_cpus_allowed_ptr(current, &allowed_cpus);
+
schedule();
goto repeat;
}
--
1.7.1
Except you cannot just allocate a cpumask_t like that on the stack;
those things can be massive.
> set_user_nice(current, RESCUER_NICE_LEVEL);
> repeat:
> @@ -2078,6 +2079,8 @@ repeat:
> spin_unlock_irq(&gcwq->lock);
> }
>
> + set_cpus_allowed_ptr(current, &allowed_cpus);
> +
> schedule();
> goto repeat;
> }
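(For reference, the usual way to avoid a large on-stack cpumask is
cpumask_var_t with alloc_cpumask_var(); with CONFIG_CPUMASK_OFFSTACK the
mask lives on the heap. A sketch only, not part of either patch — the
placement and error handling are assumed:)

	/*
	 * Sketch only: cpumask_var_t is a pointer when
	 * CONFIG_CPUMASK_OFFSTACK is set, so the mask is heap-allocated
	 * and nothing large ends up on the stack; otherwise it falls
	 * back to a plain array.
	 */
	cpumask_var_t saved_mask;

	if (!alloc_cpumask_var(&saved_mask, GFP_KERNEL))
		return -ENOMEM;

	cpumask_copy(saved_mask, &current->cpus_allowed);
	/* ... run bound to particular CPUs ... */
	set_cpus_allowed_ptr(current, saved_mask);
	free_cpumask_var(saved_mask);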
--
v2: Heeded Peter Zijlstra's comments; don't allocate a cpumask_t on the
stack, manipulate the task's cpus_allowed mask directly instead.
Signed-off-by: Ripduman Sohan <ripduma...@cl.cam.ac.uk>
---
kernel/workqueue.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 25fb1b0..29d2ddf 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2078,6 +2078,12 @@ repeat:
spin_unlock_irq(&gcwq->lock);
}
+ if (is_unbound)
+ cpumask_setall(&current->cpus_allowed);
+ else
+ for_each_cwq_cpu(cpu, wq)
+ cpu_set(cpu, current->cpus_allowed);
+
schedule();
goto repeat;
}
--
1.7.1
On Thu, Sep 01, 2011 at 02:36:33PM +0100, Ripduman Sohan wrote:
> Rescuer threads may be migrated (and are bound) to particular CPUs when
> active. However, the allowed_cpus mask is not restored when they return
> to sleep, leaving the reported set of CPUs the process may run on
> inconsistent with the actual set. This patch fixes this oversight by
> recording the allowed_cpus mask for rescuer threads when they enter the
> rescuer_thread() main loop and restoring it every time the thread sleeps.
Hmmm... so, currently, rescuer is left bound to the last cpu it worked
on. Why is this a problem?
Thanks.
--
tejun
On Thu, Sep 15, 2011 at 05:14:30PM +0100, Ripduman Sohan wrote:
> The rescuer being left bound to the last CPU it was active on is not a
> problem. As I pointed out in the commit log, the issue is that the
> allowed_cpus mask is not restored when rescuers return to sleep,
> leaving the reported set of CPUs the process may run on inconsistent
> with the actual set.
>
> Perhaps an explanation is in order. I am working on a system where we
> constantly sample process run-state (including the process
> Cpus_Allowed field in /proc/<pid>/status) to build a forward plan of
> where the process _may_ run in the future. In situations of high
> memory pressure (common on our setup), where the rescuers ran often,
> the plan began to deviate significantly from the calculated schedule
> because rescuer threads were marked as runnable only on a single CPU
> when in reality they would bounce across CPUs.
But cpus_allowed doesn't mean where the task *may* run in the future.
It indicates on which cpus the task is allowed to run *now* and it's
allowed to change.
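(For context, the sampling described above can be as simple as reading
the Cpus_allowed_list field of /proc/<pid>/status from userspace — a
hypothetical sketch, not the actual tooling referred to:)

#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Print the task's currently allowed CPUs.  The value is only a
 * snapshot -- as noted above, the kernel may change it at any time.
 */
static int print_cpus_allowed(pid_t pid)
{
	char path[64], line[256];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/status", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, "Cpus_allowed_list:", 18))
			fputs(line, stdout);

	fclose(f);
	return 0;
}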
> I've currently put in a special-case exception in our code to account
> for the fact that rescuer threads may run on _any_ CPU regardless of
> the current cpus_allowed mask but I thought it would be useful to
> correct it. I'm happy to continue with my current approach if you
> deem the patch irrelevant.
I'm not necessarily against the patch if it helps a valid use case but
let's do that when and if the use case becomes relevant enough, which
I don't think it is yet. Please feel free to raise the issue again
when the situation changes.
Thank you.
--
tejun
On Sun, Sep 25, 2011 at 09:02:25AM +0300, Gilad Ben-Yossef wrote:
> Thank you for taking the time to answer my query :-)
>
> I agree there is no real problem with having an additional task
> bound to an "isolated" CPU so long as it does not run, and of
> course, if a task on an isolated CPU initiated activity that
> resulted in requiring the services of a rescuer workqueue thread,
> it most certainly needs to run there and that is fine.
>
> I guess my question is: apart from running on the isolated CPU,
> can the fact that the rescuer thread is bound there cause
> activity on that CPU originating from a foreign CPU, such as,
> for example, running an IPI handler in order to migrate it there?
Hmmm... indeed. This can cause an unnecessary wakeup / migration on
an isolated CPU when another CPU asks for the rescuer, so yeah it
makes sense to change the behavior. BTW, why didn't the original
patch simply use set_cpus_allowed_ptr(cpu_all_mask)?
Thanks.
--
tejun
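(For reference, the alternative Tejun mentions would amount to roughly
the following at the point where the rescuer goes back to sleep — a
sketch only, placement assumed; cpu_all_mask and set_cpus_allowed_ptr()
are existing kernel interfaces:)

	/*
	 * Sketch only: reset the rescuer's affinity through the
	 * scheduler API before sleeping, instead of writing to
	 * ->cpus_allowed directly.  cpu_all_mask allows the task on
	 * every possible CPU.
	 */
	set_cpus_allowed_ptr(current, cpu_all_mask);

	schedule();
	goto repeat;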