Re: Fw: Re: [mm PATCH 4/6] RCU: (now) CPU hotplug

Dipankar Sarma

unread,

Jan 26, 2007, 2:12:31 PM1/26/07

to Paul E. McKenney

On Thu, Jan 25, 2007 at 02:36:45AM +0530, Dipankar Sarma wrote:
> On Wed, Jan 24, 2007 at 08:15:59AM -0800, Paul E. McKenney wrote:
>
> It should be relatively easy. Setting the offlined cpu's flags
> to neutral state should do the trick in most cases.
> I will send out the patches tomorrow after reviewing the code
> some more.

Famous last words.

It turns out that I have been bitten by the ugly cpu hotplug
locking mess while trying to get preemptible RCU code working
with CPU hotplug. The per-subsystem locking thing isn't really
user-friendly. Here is the dependency -

In cpu hotplug path (after CPU_LOCK_ACQUIRE) -

CPU_DOWN_PREPARE:sched domains -> detach_destroy_domains() ->
synchronize_sched() -> sched_setaffinity()

sched_setaffinity() tries to acquire the scheduler cpu hotplug
mutex and deadlocks.

I see no easy way of getting around this - doing "cpu hotplug locked"
version of all those APIs would add lots of code which is bad.
We could try Gautham's idea of letting each subsystem maintain
its own online cpu mask, but I bet implementing sched_setaffinity()
would not be very easy despite this.

What is the status on this now ? Is this a good example to
show why per-subsystem locks might be unmaintainable ? Can
we go back to a simple scalable refcount model
for CPU hotplug now ?

Thanks
Dipankar
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Dipankar Sarma

unread,

Jan 26, 2007, 2:47:37 PM1/26/07

to Andrew Morton

On Fri, Jan 26, 2007 at 11:28:37AM -0800, Andrew Morton wrote:

> On Sat, 27 Jan 2007 00:41:13 +0530
> Dipankar Sarma <dipa...@in.ibm.com> wrote:
>
> > Is this a good example to
> > show why per-subsystem locks might be unmaintainable ?
>

> Maybe. It might also be a good example of confused design.

Not this one. The point is that this forces us to determine
which APIs are usable in cpu hotplug path and which aren't.
It isn't unreasonable to expect that some one may need
to use sched_setaffinity() in hotplug path. This now forces
us to have two versions of sched_setaffinity() - one with
the hotplug per-subsystem lock, the other one without.

> > Can we go back
>
> "back" assumes it was once present. It wasn't.

IIRC, there was one get_cpu_hotplug()/put_cpu_hotplug() implementation
based on Arjan's suggestion and implemented in a scalable way by Gautham.

>
> > to a simple
>
> "simple" hasn't been demonstrated. New lock types and their use are never
> simple, especially magic ones.

get_cpu_hotplug()/put_cpu_hotplug() hides all the refcount complexities
underneath and most kernel programmers are familiar with the
get()/put() model. This seems like the most simple thing we
could do in the short term without making too many changes in all over the
kernel for CPU hotplug.

>
> > scalable refcount model
> > for CPU hotplug now ?
>

> The plan is, I hope, to rip it all out and do freeze_processes() on the
> hotplug side, so nobody else needs to worry about cpu hotplug any more.
> But at present everyone seems to be in hiding.

This would be ideal. However, we don't seem to have any momentum
on this. The other thing we would need to do in this case is to
check if all the users of cpu hotplug can tolerate a very slow
hotplug step if there are 10s of thousands of processes and threads.

Andrew Morton

unread,

Jan 26, 2007, 2:51:40 PM1/26/07

to dipa...@in.ibm.com

On Sat, 27 Jan 2007 00:41:13 +0530
Dipankar Sarma <dipa...@in.ibm.com> wrote:

> On Thu, Jan 25, 2007 at 02:36:45AM +0530, Dipankar Sarma wrote:
> > On Wed, Jan 24, 2007 at 08:15:59AM -0800, Paul E. McKenney wrote:
> >
> > It should be relatively easy. Setting the offlined cpu's flags
> > to neutral state should do the trick in most cases.
> > I will send out the patches tomorrow after reviewing the code
> > some more.
>
> Famous last words.
>
> It turns out that I have been bitten by the ugly cpu hotplug
> locking mess while trying to get preemptible RCU code working
> with CPU hotplug. The per-subsystem locking thing isn't really
> user-friendly. Here is the dependency -
>
> In cpu hotplug path (after CPU_LOCK_ACQUIRE) -
>
> CPU_DOWN_PREPARE:sched domains -> detach_destroy_domains() ->
> synchronize_sched() -> sched_setaffinity()
>
> sched_setaffinity() tries to acquire the scheduler cpu hotplug
> mutex and deadlocks.
>
> I see no easy way of getting around this - doing "cpu hotplug locked"
> version of all those APIs would add lots of code which is bad.
> We could try Gautham's idea of letting each subsystem maintain
> its own online cpu mask, but I bet implementing sched_setaffinity()
> would not be very easy despite this.

Suggest you just ignore cpu hotplug locking, if that helps.

> What is the status on this now ?

Stalled, apparently.

> Is this a good example to
> show why per-subsystem locks might be unmaintainable ?

Maybe. It might also be a good example of confused design.

> Can we go back

"back" assumes it was once present. It wasn't.

> to a simple

"simple" hasn't been demonstrated. New lock types and their use are never
simple, especially magic ones.

> scalable refcount model

> for CPU hotplug now ?

The plan is, I hope, to rip it all out and do freeze_processes() on the

hotplug side, so nobody else needs to worry about cpu hotplug any more.
But at present everyone seems to be in hiding.

Andrew Morton

unread,

Jan 26, 2007, 3:42:28 PM1/26/07

to dipa...@in.ibm.com

On Sat, 27 Jan 2007 01:16:22 +0530
Dipankar Sarma <dipa...@in.ibm.com> wrote:

> > The plan is, I hope, to rip it all out and do freeze_processes() on the
> > hotplug side, so nobody else needs to worry about cpu hotplug any more.
> > But at present everyone seems to be in hiding.
>
> This would be ideal. However, we don't seem to have any momentum
> on this.

There's no point in expending effort on a fancy new lock until this option
has been eliminated, so yeah, things are stuck.

> The other thing we would need to do in this case is to
> check if all the users of cpu hotplug can tolerate a very slow
> hotplug step if there are 10s of thousands of processes and threads.

Yes, that needs evaluation. If it's a problem then we might need to
introduce a more sophisticated user interface which enables userspace to
take multiple CPUs online and offline in a single step, rather than
one-at-a-time. I expect that'd be fairly straightforward.

Dipankar Sarma

unread,

Jan 26, 2007, 3:45:29 PM1/26/07

to Andrew Morton

On Fri, Jan 26, 2007 at 12:17:39PM -0800, Andrew Morton wrote:
> On Sat, 27 Jan 2007 01:16:22 +0530
> Dipankar Sarma <dipa...@in.ibm.com> wrote:
> > > The plan is, I hope, to rip it all out and do freeze_processes() on the
> > > hotplug side, so nobody else needs to worry about cpu hotplug any more.
> > > But at present everyone seems to be in hiding.
> >
> > This would be ideal. However, we don't seem to have any momentum
> > on this.
>
> There's no point in expending effort on a fancy new lock until this option
> has been eliminated, so yeah, things are stuck.

The new lock (scalable refcount) is almost already there.
This http://lkml.org/lkml/2006/10/26/65 can be used to implement
get_cpu_hotplug()/put_cpu_hotplug(). The unfairness issue
can be fixed. I am going to play with these patches and
see if I can come up with something useful quickly.

> > The other thing we would need to do in this case is to
> > check if all the users of cpu hotplug can tolerate a very slow
> > hotplug step if there are 10s of thousands of processes and threads.
>
> Yes, that needs evaluation. If it's a problem then we might need to

I will check with the dynamic LPAR people to see if they
have any requirements in this regard. I have also heard
about some people wanting to disable additional threads
in processors (running 1 h/w thread per core) using cpu hotplug
for some type of applications. I will check with those
folks as well.

> introduce a more sophisticated user interface which enables userspace to
> take multiple CPUs online and offline in a single step, rather than
> one-at-a-time. I expect that'd be fairly straightforward.

The worry I have is that all of this is a complete rewrite of
existing CPU hotplug and it took us 1+ years to get a decent
CPU hotplug implementation. Rusty and Vatsa can probably vouch
for how difficult it was to get it right. I don't think
it is a good idea to keep CPU hotplug broken until this
rewrite happens. Can't we just go with something that is
already there and works correctly for us without waiting
for the perfect solution ?

Thanks
Dipankar

Andrew Morton

unread,

Jan 26, 2007, 4:35:37 PM1/26/07

to dipa...@in.ibm.com

On Sat, 27 Jan 2007 02:14:06 +0530
Dipankar Sarma <dipa...@in.ibm.com> wrote:

> On Fri, Jan 26, 2007 at 12:17:39PM -0800, Andrew Morton wrote:
> > On Sat, 27 Jan 2007 01:16:22 +0530
> > Dipankar Sarma <dipa...@in.ibm.com> wrote:
> > > > The plan is, I hope, to rip it all out and do freeze_processes() on the
> > > > hotplug side, so nobody else needs to worry about cpu hotplug any more.
> > > > But at present everyone seems to be in hiding.
> > >
> > > This would be ideal. However, we don't seem to have any momentum
> > > on this.
> >
> > There's no point in expending effort on a fancy new lock until this option
> > has been eliminated, so yeah, things are stuck.
>
> The new lock (scalable refcount) is almost already there.
> This http://lkml.org/lkml/2006/10/26/65 can be used to implement
> get_cpu_hotplug()/put_cpu_hotplug(). The unfairness issue
> can be fixed. I am going to play with these patches and
> see if I can come up with something useful quickly.

You're forgetting the large, unknown number of places in the kernel which
are presently buggy in the presence of CPU hotplug. With your proposal, we
still need to hunt them all down and put magic locks around them, and we need to
continue to do that as the kernel evolves.

If we use the process freezer, these bugs all get automatically fixed, and we get
to remove the existing locking, and we don't need to think about it any
more.

> > > The other thing we would need to do in this case is to
> > > check if all the users of cpu hotplug can tolerate a very slow
> > > hotplug step if there are 10s of thousands of processes and threads.
> >
> > Yes, that needs evaluation. If it's a problem then we might need to
>
> I will check with the dynamic LPAR people to see if they
> have any requirements in this regard. I have also heard
> about some people wanting to disable additional threads
> in processors (running 1 h/w thread per core) using cpu hotplug
> for some type of applications. I will check with those
> folks as well.

OK.

> > introduce a more sophisticated user interface which enables userspace to
> > take multiple CPUs online and offline in a single step, rather than
> > one-at-a-time. I expect that'd be fairly straightforward.
>
> The worry I have is that all of this is a complete rewrite of
> existing CPU hotplug and it took us 1+ years to get a decent
> CPU hotplug implementation. Rusty and Vatsa can probably vouch
> for how difficult it was to get it right. I don't think
> it is a good idea to keep CPU hotplug broken until this
> rewrite happens. Can't we just go with something that is
> already there and works correctly for us without waiting
> for the perfect solution ?

Who said it's complete rewrite? Nobody's even started to look at it afaik.
It might be quite simple.

Paul E. McKenney

unread,

Jan 28, 2007, 6:14:20 PM1/28/07

to Andrew Morton

On Fri, Jan 26, 2007 at 01:29:49PM -0800, Andrew Morton wrote:
> On Sat, 27 Jan 2007 02:14:06 +0530
> Dipankar Sarma <dipa...@in.ibm.com> wrote:
>
> > On Fri, Jan 26, 2007 at 12:17:39PM -0800, Andrew Morton wrote:
> > > On Sat, 27 Jan 2007 01:16:22 +0530
> > > Dipankar Sarma <dipa...@in.ibm.com> wrote:
> > > > > The plan is, I hope, to rip it all out and do freeze_processes() on the
> > > > > hotplug side, so nobody else needs to worry about cpu hotplug any more.
> > > > > But at present everyone seems to be in hiding.
> > > >
> > > > This would be ideal. However, we don't seem to have any momentum
> > > > on this.
> > >
> > > There's no point in expending effort on a fancy new lock until this option
> > > has been eliminated, so yeah, things are stuck.
> >
> > The new lock (scalable refcount) is almost already there.
> > This http://lkml.org/lkml/2006/10/26/65 can be used to implement
> > get_cpu_hotplug()/put_cpu_hotplug(). The unfairness issue
> > can be fixed. I am going to play with these patches and
> > see if I can come up with something useful quickly.
>
> You're forgetting the large, unknown number of places in the kernel which
> are presently buggy in the presence of CPU hotplug. With your proposal, we
> still need to hunt them all down and put magic locks around them, and we need to
> continue to do that as the kernel evolves.

Certainly the current practice of notifying CPU_UP_PREPARE and CPU_ONLINE
in the same order as CPU_DOWN_PREPARE and CPU_DEAD is just asking for lots
of bugs. The "up" notifications should proceed in the same order as the
services were initialized at boot time, and the "down" notifications need
to proceed in the opposite order.

As things stand, if service B relies on service A, either service B must
go without during the CPU-removal process or services A and B must know
way too much about one another. Either approach is an excellent way
to breed endless quantities of bugs. If "down" notifications instead
proceeded in the opposite order from boot-time initialization, such
dependencies would be resolved automatically, and life is -much- simpler.

So why are we doing hard things the hard way here??? ;-)

> If we use the process freezer, these bugs all get automatically fixed,
> and we get to remove the existing locking, and we don't need to think
> about it any more.

The idea being to essentially suspend the system to RAM, remove the
CPU and then unsuspend it? Seems like quite high overhead -- or am
I misunderstanding the proposal?

Thanx, Paul

Andrew Morton

unread,

Jan 28, 2007, 6:36:45 PM1/28/07

to pau...@linux.vnet.ibm.com

On Sun, 28 Jan 2007 14:47:56 -0800
"Paul E. McKenney" <pau...@linux.vnet.ibm.com> wrote:

> > If we use the process freezer, these bugs all get automatically fixed,
> > and we get to remove the existing locking, and we don't need to think
> > about it any more.
>
> The idea being to essentially suspend the system to RAM, remove the
> CPU and then unsuspend it? Seems like quite high overhead -- or am
> I misunderstanding the proposal?

The process freezer basically wakes up all threads in the machine and makes
them go to sleep in a specific place, so they're all in a known state.
kernel threads are also captured, via their open-coded polling call to
try_to_freeze().

The machine suspend code uses the process freezer, as does kprobes. The
freezer isn't tied to suspend or to power management.

The freezer does have potential to be expensive if used frequently and if
there are many threads. But I don't think anyone has looked at optimising
it. For example, there are certain places in the kernel where threads
commonly sleep (eg, select()). We know that this is a safe place to sleep
wrt (at least) CPU hotplug, so there's not really a need to wake those
processes up. Perhaps something could be done with that observation...

But first we'd need to demonstrate that we actually have a problem.

Paul E. McKenney

unread,

Jan 28, 2007, 9:40:45 PM1/28/07

to Andrew Morton

On Sun, Jan 28, 2007 at 03:30:05PM -0800, Andrew Morton wrote:
> On Sun, 28 Jan 2007 14:47:56 -0800
> "Paul E. McKenney" <pau...@linux.vnet.ibm.com> wrote:
>
> > > If we use the process freezer, these bugs all get automatically fixed,
> > > and we get to remove the existing locking, and we don't need to think
> > > about it any more.
> >
> > The idea being to essentially suspend the system to RAM, remove the
> > CPU and then unsuspend it? Seems like quite high overhead -- or am
> > I misunderstanding the proposal?
>
> The process freezer basically wakes up all threads in the machine and makes
> them go to sleep in a specific place, so they're all in a known state.
> kernel threads are also captured, via their open-coded polling call to
> try_to_freeze().
>
> The machine suspend code uses the process freezer, as does kprobes. The
> freezer isn't tied to suspend or to power management.
>
> The freezer does have potential to be expensive if used frequently and if
> there are many threads. But I don't think anyone has looked at optimising
> it. For example, there are certain places in the kernel where threads
> commonly sleep (eg, select()). We know that this is a safe place to sleep
> wrt (at least) CPU hotplug, so there's not really a need to wake those
> processes up. Perhaps something could be done with that observation...
>
> But first we'd need to demonstrate that we actually have a problem.

Fair enough -- though if it is a goal to remove CPUs from systems with
realtime workloads, I can assure you that we do have a problem.

Thanx, Paul

Ingo Molnar

unread,

Jan 29, 2007, 2:15:52 PM1/29/07

to Andrew Morton

* Andrew Morton <ak...@osdl.org> wrote:

> > The idea being to essentially suspend the system to RAM, remove the
> > CPU and then unsuspend it? Seems like quite high overhead -- or am
> > I misunderstanding the proposal?
>
> The process freezer basically wakes up all threads in the machine and
> makes them go to sleep in a specific place, so they're all in a known
> state. kernel threads are also captured, via their open-coded polling
> call to try_to_freeze().
>
> The machine suspend code uses the process freezer, as does kprobes.
> The freezer isn't tied to suspend or to power management.
>
> The freezer does have potential to be expensive if used frequently and
> if there are many threads. But I don't think anyone has looked at

> optimising it. [...]

i've measured it and it takes a few milliseconds typically - certainly
not noticeable even for hypothetical highly scripted CPU-hotplug
environments. (i doubt they really exist in practice)

in fact (new) kprobes uses the freezer, and it's far more performance
sensitive than the handling of CPU hotplug events.

And there was almost no effort put into making the freezer fast, i'm
sure if it ever becomes a problem we could improve it quite drastically.
And that effort would improve suspend performance and kprobes
performance as well - while the cpu-hotplug locking hell is just an
unneeded distraction nobody really can deal with properly. So i say lets
ditch CPU hotplug locking completely and lets go for the dead-simple
freezer based design ...

Ingo

Paul E. McKenney

unread,

Jan 29, 2007, 9:48:13 PM1/29/07

to Ingo Molnar

On Mon, Jan 29, 2007 at 08:12:41PM +0100, Ingo Molnar wrote:
>
> * Andrew Morton <ak...@osdl.org> wrote:
>
> > > The idea being to essentially suspend the system to RAM, remove the
> > > CPU and then unsuspend it? Seems like quite high overhead -- or am
> > > I misunderstanding the proposal?
> >
> > The process freezer basically wakes up all threads in the machine and
> > makes them go to sleep in a specific place, so they're all in a known
> > state. kernel threads are also captured, via their open-coded polling
> > call to try_to_freeze().
> >
> > The machine suspend code uses the process freezer, as does kprobes.
> > The freezer isn't tied to suspend or to power management.
> >
> > The freezer does have potential to be expensive if used frequently and
> > if there are many threads. But I don't think anyone has looked at
> > optimising it. [...]
>
> i've measured it and it takes a few milliseconds typically - certainly
> not noticeable even for hypothetical highly scripted CPU-hotplug
> environments. (i doubt they really exist in practice)
>
> in fact (new) kprobes uses the freezer, and it's far more performance
> sensitive than the handling of CPU hotplug events.

Outside of realtime workloads, I agree that performance should not be
a problem. And I don't know of any reason why realtime systems need to
be able to do hotplug CPU. Yet, anyway. ;-)

> And there was almost no effort put into making the freezer fast, i'm
> sure if it ever becomes a problem we could improve it quite drastically.
> And that effort would improve suspend performance and kprobes
> performance as well - while the cpu-hotplug locking hell is just an
> unneeded distraction nobody really can deal with properly. So i say lets
> ditch CPU hotplug locking completely and lets go for the dead-simple
> freezer based design ...

I never have seen anyone try a freezer-based approach, so why not?

However, it seems to me that the notifiers are still needed in some form
or another -- for example, RCU needs to manipulate its masks. My guess
is that you are proposing doing the CPU-down and CPU-up notifications
between the freeze_processes() and the thaw_processes().

Going through the list:

o migration_call() needs to create a kthread, which can fail,
so there still needs to be the dual phase bring-up, with
CPU_ONLINE and CPU_UP_CANCELED.

o comp_pool_callback() can also fail during CPU_UP_PREPARE,
as can kernel/softirq.c:cpu_callback(),
kernel/softlockup.c:cpu_callback(), timer_cpu_notify(),
ratelimit_handler() [which returns 0 rather than NOTIFY_DONE,
patch in separate email], pageset_cpuup_callback(), and
cpuup_callback().

o The following are unconditional: vmstat_cpuup_callback(),
sysfs_cpu_notify(), cpu_numa_callback(), hrtimer_cpu_notify(),
and rcu_cpu_notify().

o None of the notifier handlers currently say "no" to a CPU
going down, which is reassuring -- at least given that
move_task_off_dead_cpu() is willing to forcibly migrate
if needed.

OK, I indeed do not see any obvious dependencies amoung current CPU
notifier handlers, though the same-order up/down traversal still seems to
me to be an accident waiting to happen. However, this is independent of
the method to be used to bring CPUs up and down.

And yes, I did in fact suffer severe trauma due to same-order up/down
traversal in a previous life, involving (among other things) CPUs
attempting to do useful work with out-of-date TLBs and attempting to
execute RCU read-side critical sections on CPUs that were no longer
participating in grace-period processing. Not pretty. :-/ But I don't
immediately see similar issues in Linux at the moment (with the caveat
that I can't claim to be expert in every area using CPU notifiers), and
if such problems do arise, it is pretty easy to provide reverse-order
down traversal. The fact that RCU doesn't stop grace-period processing
until CPU_DEAD is at least somewhat comforting to me. ;-)

So the thought is to make _cpu_down() and _cpu_up() do something like
the following (untested, probably does not even compile), perhaps with
suitable adjustments elsewhere as well?

static int _cpu_down(unsigned int cpu)
{
int err;
struct task_struct *p;
cpumask_t old_allowed, tmp;

if (num_online_cpus() == 1)
return -EBUSY;

if (!cpu_online(cpu))
return -EINVAL;

if (freeze_processes()) {
err = -EBUSY;
goto out_freeze_notify_failed;
}
err = raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE,
(void *)(long)cpu);
if (err == NOTIFY_BAD) {
printk("%s: attempt to take down CPU %u failed\n",
__FUNCTION__, cpu);
err = -EINVAL;
goto out_freeze_notify_failed;
}

/* Ensure that we are not runnable on dying cpu */
old_allowed = current->cpus_allowed;
tmp = CPU_MASK_ALL;
cpu_clear(cpu, tmp);
set_cpus_allowed(current, tmp);

mutex_lock(&cpu_bitmask_lock);
p = __stop_machine_run(take_cpu_down, NULL, cpu);
mutex_unlock(&cpu_bitmask_lock);

if (IS_ERR(p) || cpu_online(cpu)) {
/* CPU didn't die: tell everyone. Can't complain. */
if (raw_notifier_call_chain(&cpu_chain, CPU_DOWN_FAILED,
(void *)(long)cpu) == NOTIFY_BAD)
BUG();

if (IS_ERR(p)) {
err = PTR_ERR(p);
goto out_allowed;
}
goto out_thread;
}

/* Wait for it to sleep (leaving idle task). */
while (!idle_cpu(cpu))
yield();

/* This actually kills the CPU. */
__cpu_die(cpu);

/* Move it here so it can run. */
kthread_bind(p, get_cpu());
put_cpu();

/* CPU is completely dead: tell everyone. Too late to complain. */
if (raw_notifier_call_chain(&cpu_chain, CPU_DEAD,
(void *)(long)cpu) == NOTIFY_BAD)
BUG();

check_for_tasks(cpu);

out_thread:
err = kthread_stop(p);
out_allowed:
set_cpus_allowed(current, old_allowed);
out_freeze_notify_failed:
thaw_processes();
return err;
}

static int __devinit _cpu_up(unsigned int cpu)
{
int ret;
void *hcpu = (void *)(long)cpu;

if (cpu_online(cpu) || !cpu_present(cpu))
return -EINVAL;

if (freeze_processes()) {
ret = -EBUSY;
goto out_freeze_failed;
}
ret = raw_notifier_call_chain(&cpu_chain, CPU_UP_PREPARE, hcpu);
if (ret == NOTIFY_BAD) {
printk("%s: attempt to bring up CPU %u failed\n",
__FUNCTION__, cpu);
ret = -EINVAL;
goto out_notify;
}

/* Arch-specific enabling code. */
mutex_lock(&cpu_bitmask_lock);
ret = __cpu_up(cpu);
mutex_unlock(&cpu_bitmask_lock);
if (ret != 0)
goto out_notify;
BUG_ON(!cpu_online(cpu));

/* Now call notifier in preparation. */
raw_notifier_call_chain(&cpu_chain, CPU_ONLINE, hcpu);

out_notify:
if (ret != 0)
raw_notifier_call_chain(&cpu_chain,
CPU_UP_CANCELED, hcpu);

out_freeze_failed:
thaw_processes();
return ret;
}

Thanx, Paul

Ingo Molnar

unread,

Jan 30, 2007, 2:51:32 AM1/30/07

to Paul E. McKenney

* Paul E. McKenney <pau...@linux.vnet.ibm.com> wrote:

> > in fact (new) kprobes uses the freezer, and it's far more
> > performance sensitive than the handling of CPU hotplug events.
>
> Outside of realtime workloads, I agree that performance should not be
> a problem. And I don't know of any reason why realtime systems need
> to be able to do hotplug CPU. Yet, anyway. ;-)

even for -rt it's not really an issue: the hotplug locks are so
all-encompassing and so unbound at the moment that there's no realistic
expectation for them to ever become deterministic. So we might as well
make them encompass "everything" - without any noticeable effect.

> So the thought is to make _cpu_down() and _cpu_up() do something like
> the following (untested, probably does not even compile), perhaps with
> suitable adjustments elsewhere as well?
>
> static int _cpu_down(unsigned int cpu)
> {
> int err;
> struct task_struct *p;
> cpumask_t old_allowed, tmp;
>
> if (num_online_cpus() == 1)
> return -EBUSY;
>
> if (!cpu_online(cpu))
> return -EINVAL;
>
> if (freeze_processes()) {
> err = -EBUSY;
> goto out_freeze_notify_failed;
> }
> err = raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE,
> (void *)(long)cpu);

yeah. This all looks so nice that i almost cannot believe it's true :-)
This would allow us to rip out all the cpu-hotplug locking: wow! If only
someone would volunteer to try to pull this off and then to touch so
many subsystems ;-)

i fully agree that the opposite notifications should be traversed in
inverse order [but this is an orthogonal improvement]. Too bad the
notifier list is a single linked list.

Ingo

Gautham R Shenoy

unread,

Jan 30, 2007, 9:03:06 AM1/30/07

to Paul E. McKenney

We won't be needing this lock either, since it forms the core of
the unpopular lock_cpu_hotplug :-)

> p = __stop_machine_run(take_cpu_down, NULL, cpu);
> mutex_unlock(&cpu_bitmask_lock);

Looks good.

I'm wondering if we'll need to try_to_freeze in the fork and exit code
as well :?

Thanks and Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

Paul E. McKenney

unread,

Jan 30, 2007, 11:03:57 AM1/30/07

to Ingo Molnar

Well, it turns out that maybe it is in fact untrue. :-/

I need to look at all uses of PF_NOFREEZE -- as I understand the
code, processes marked PF_NOFREEZE will continue running, potentially
interfering with the hotplug operation. :-(

I will pass my findings on to this list.

> This would allow us to rip out all the cpu-hotplug locking: wow! If only
> someone would volunteer to try to pull this off and then to touch so
> many subsystems ;-)

Hey, just ending the debates on how to do CPU-hotplug locking would be
worth something! ;-)

> i fully agree that the opposite notifications should be traversed in
> inverse order [but this is an orthogonal improvement]. Too bad the
> notifier list is a single linked list.

:-(

But there can't be -that- many elements in that list... But agreed,
separate issue.

Thanx, Paul

Rafael J. Wysocki

unread,

Jan 30, 2007, 11:44:46 AM1/30/07

to pau...@linux.vnet.ibm.com

Well, I did it some time ago, although not very thoroughly.

AFAICS there are not so many, but one that stands out is the worker threads.
We needed two of them to actually go to sleep, so now it's possible to create
a "freezeable workqueue" the worker thread of which will not set PF_NOFREEZE,
but currently this is only used by XFS.

Greetings,
Rafael

--
If you don't have the time to read,
you don't have the time or the tools to write.
- Stephen King

Paul E. McKenney

unread,

Jan 30, 2007, 11:47:55 AM1/30/07

to Gautham R Shenoy

;-)

> > p = __stop_machine_run(take_cpu_down, NULL, cpu);
> > mutex_unlock(&cpu_bitmask_lock);
>
> Looks good.
>
> I'm wondering if we'll need to try_to_freeze in the fork and exit code
> as well :?

Quite possibly -- if you check into that, I will look at PF_NOFREEZE.

Fair enough?

Thanx, Paul

Andrew Morton

unread,

Jan 30, 2007, 1:56:22 PM1/30/07

to Rafael J. Wysocki

On Tue, 30 Jan 2007 17:44:47 +0100
"Rafael J. Wysocki" <r...@sisk.pl> wrote:

> > I need to look at all uses of PF_NOFREEZE -- as I understand the
> > code, processes marked PF_NOFREEZE will continue running, potentially
> > interfering with the hotplug operation. :-(
> >
> > I will pass my findings on to this list.
>
> Well, I did it some time ago, although not very thoroughly.
>
> AFAICS there are not so many, but one that stands out is the worker threads.
> We needed two of them to actually go to sleep, so now it's possible to create
> a "freezeable workqueue" the worker thread of which will not set PF_NOFREEZE,
> but currently this is only used by XFS.

Or we can create a variant of freeze_processes which ignores PF_NOFREEZE.

As I said eariler, we might need to change the freezer code for this
application. In fact we should do so: that sys_sync() call in there is
quite inappropriate, as is, I suppose, the two-pass freeze attempt. As are
the nice printks, come to that.

Pretty simple stuff though.

Paul E. McKenney

unread,

Jan 30, 2007, 2:50:44 PM1/30/07

to Andrew Morton

On Tue, Jan 30, 2007 at 10:27:18AM -0800, Andrew Morton wrote:
> On Tue, 30 Jan 2007 17:44:47 +0100
> "Rafael J. Wysocki" <r...@sisk.pl> wrote:
>
> > > I need to look at all uses of PF_NOFREEZE -- as I understand the
> > > code, processes marked PF_NOFREEZE will continue running, potentially
> > > interfering with the hotplug operation. :-(
> > >
> > > I will pass my findings on to this list.
> >
> > Well, I did it some time ago, although not very thoroughly.
> >
> > AFAICS there are not so many, but one that stands out is the worker threads.
> > We needed two of them to actually go to sleep, so now it's possible to create
> > a "freezeable workqueue" the worker thread of which will not set PF_NOFREEZE,
> > but currently this is only used by XFS.
>
> Or we can create a variant of freeze_processes which ignores PF_NOFREEZE.
>
> As I said eariler, we might need to change the freezer code for this
> application. In fact we should do so: that sys_sync() call in there is
> quite inappropriate, as is, I suppose, the two-pass freeze attempt. As are
> the nice printks, come to that.
>
> Pretty simple stuff though.

And we might need to change some of the processes that currently set
PF_NOFREEZE so that they periodically go somewhere that the freezer can
find them -- if I remember correctly, at least some of the PF_NOFREEZE
tasks were so marked in order to prevent suspend hangs.

Part of what I need to look at. ;-)

Thanx, Paul

Paul E. McKenney

unread,

Jan 31, 2007, 6:11:24 PM1/31/07

to Andrew Morton

On Tue, Jan 30, 2007 at 11:49:40AM -0800, Paul E. McKenney wrote:
> On Tue, Jan 30, 2007 at 10:27:18AM -0800, Andrew Morton wrote:
> > On Tue, 30 Jan 2007 17:44:47 +0100
> > "Rafael J. Wysocki" <r...@sisk.pl> wrote:
> >
> > > > I need to look at all uses of PF_NOFREEZE -- as I understand the
> > > > code, processes marked PF_NOFREEZE will continue running, potentially
> > > > interfering with the hotplug operation. :-(
> > > >
> > > > I will pass my findings on to this list.
> > >
> > > Well, I did it some time ago, although not very thoroughly.
> > >
> > > AFAICS there are not so many, but one that stands out is the worker threads.
> > > We needed two of them to actually go to sleep, so now it's possible to create
> > > a "freezeable workqueue" the worker thread of which will not set PF_NOFREEZE,
> > > but currently this is only used by XFS.
> >
> > Or we can create a variant of freeze_processes which ignores PF_NOFREEZE.
> >
> > As I said eariler, we might need to change the freezer code for this
> > application. In fact we should do so: that sys_sync() call in there is
> > quite inappropriate, as is, I suppose, the two-pass freeze attempt. As are
> > the nice printks, come to that.
> >
> > Pretty simple stuff though.
>
> And we might need to change some of the processes that currently set
> PF_NOFREEZE so that they periodically go somewhere that the freezer can
> find them -- if I remember correctly, at least some of the PF_NOFREEZE
> tasks were so marked in order to prevent suspend hangs.
>
> Part of what I need to look at. ;-)

OK. This just might be feasible. That said, there is a lot of code
containing PF_NOFREEZE that I am not familiar with. That said, here
are my thoughts -- this is in addition to the changes to freeze_processes()
and thaw_processes() called out earlier.

Thoughts?

Thanx, Paul

Proposal:

o Add a new task_struct field pf_exempt or some such.
This would be a bitfield:

#define PFE_SUSPEND 0x0001
#define PFE_KPROBES 0x0002
#define PFE_HOTPLUG 0x0004

There would be no tasks specifying PFE_HOTPLUG, since everything
needs to be frozen in that case. But see below for use of this
definition.

o freeze_processes() takes an argument that says what kind of
freezing is going on. For example, freeze_processes(PFE_SUSPEND)
would be appropriate for suspend.

o freezeable() would take an additional PFE_* argument.
Only processes matching the specified PFE_* would be exempt
from freezing -- so CPU hotplug would do freeze_processes(0).
Or better yet, use a PFE_THIS_MEANS_YOU that is defined to 0.
(OK, OK, PFE_ALL!!!)

Presumably we only allow one freeze_processes() to run at a
time (though I don't see anything preventing concurrent
freezing!), so that thaw_processes() just uses whatever was
passed to the matching freeze_processes().

o Introduce a mutex to prevent overlapping freezes -- or find
out what the heck prevents them at present!!! (I don't see
anything.) Keep in mind that CPU hotplug might be script
driven ("this CPU is getting ECC errors in its cache, time to
shut it down!").

Acquire the mutex in freeze_processes(), release in thaw_processes().

o Replace all the "current->flags |= PF_NOFREEZE" statements with
"exempt_from_freeze(current, int pfe)" or some such. This would
set the flags bit and also store the pfe argument into the pf_exempt
field.

Results of inspection -- skepticism advised, but here it is:

o arch/arm/kernel/apm.c line 308 apm_ioctl().
Specific to suspend. Restores flags after suspend-wait
complete.

Perhaps have different types of NOFREEZE in a separate bitmask.

o arch/arm/kernel/apm.c line 526 apm_init()
Manages the kapmd thread -- seems specific to suspend.
Module-init function.

o arch/i386/kernel/apm.c line 2352 apm_init()
Manages the kapmd thread for i386, also seems specific to suspend.
Module-init function.

o arch/mips/kernel/apm.c line 308 apm_ioctl(). Similar to
arch/arm/kernel/apm.c above.

o arch/mips/kernel/apm.c line 472 kapmd(). Seems specific to
suspend.

o arch/sh/kernel/apm.c line 314 apm_ioctl(). Similar to
arch/arm/kernel/apm.c above.

o arch/sh/kernel/apm.c line 455 kapmd(). Similar to mips version.

o drivers/block/loop.c line 589 loop_thread(). Prevent loop
driver from being stalled on suspension (might be needed for
encryption, they say...). There is a wait_event_interruptible()
on each request, so should be OK.

o drivers/ieee1394/ieee1394_core.c line 1028 hpsbpkt_thread().
Firewire packet processing. Drains the queue, then hits
schedule(). Could avoid schedule forever if there is a flood
of firewire packets. So probably need to add some sort of
cond_resched() to the packet-processing loop.

o drivers/md/md.c line 4489 md_thread(). Waits on each pass through the
loop. Wakes up other threads, so should be OK.

Main concern would be if some other thread is waiting on
a disk access, but in such a way that this other thread
blocks freeze_processes().

o drivers/mmc/mmc_queue.c line 68 mmc_queue_thread(). Seems
specific to suspend. It says that it handles suspension
itself, but not clear to me how this happens. MMC_QUEUE_EXIT?
Doesn't seem likely. Maybe it just is sufficiently untrusting
of the device state that it can tolerate random suspends?

o drivers/mtd/mtd_blkdevs.c line 83 mtd_blktrans_thread().
Again seems suspend-specific.

o drivers/scsi/libsas/sas_scsi_host.c line 718 sas_queue_thread().
This one is a bit bizarre. One motivation seems to be to allow
SAS processing to continue to happen during suspend processing,
but this one seems to run only when a queue is shut down.

So at first glance appears to be specific to suspend, but not
sure if it would avoid blocking freeze_processes() if not
exempted from freeze.

o drivers/scsi/scsi_error.c line 1500 scsi_error_handler().
Again seems to be specific to suspend. Looks like there
are no indefinite code paths that avoid sleeping.

o drivers/usb/storage/usb.c line 304 usb_stor_control_thread().
Whoa! This one doesn't seem to pay attention to
kthread_should_stop()! Or maybe it is doing so in some
non-obvious way.

Also appears to be for suspend (to USB memory stick).

o init/do_mounts_initrd.c line 57 handle_initrd().
This looks to be short term anyway, so OK to leave.
But does kernel_execve() clear PF_NOFREEZE?

But it should be OK to freeze the init process when doing CPU
hotplug ops, right?

o kernel/fork.c line 917 copy_flags(). This is -clearing-
PF_NOFREEZE, so OK.

o kernel/power/process.c line 26 freezeable(). This is just
checking the flag.

This will need to be augmented to check flags in some other
task_struct field -- a bitfield that indicates what sort of
freezing that this task is exempt from. No task will be
exempt from hotplug freezing.

o kernel/softirq.c line 473 ksoftirqd(). Main concern here is
that it might be necessary for ksoftirqd() to run in order
for some unrelated process to find its way to the next
try_to_freeze() or refrigerator() call. Might be able to
take care of such situations by adding try_to_freeze() or
refrigerator() calls. If they occur, of course.

o kernel/softlockup.c line 88 watchdog(). Well, we wouldn't
want false alarms when freezing for hotplug. Perhaps
temporarily disabling timestamp checking while doing hotplug
would do the trick. But if hotplug takes the time required
to trigger softlockup (seconds!), we are broken anyway.
The fix would be to speed up the freezing process.

o kernel/workqueue.c line 240 worker_thread(). Same situation
as for ksoftirqd().

o net/bluetooth/bnep/core.c line 476 bnep_session(). Suspending
to a bluetooth device??? These guys got -hair-!!! I bet this
one can tolerate being frozen for hotplugging CPUs -- though
I could imagine the bluetooth protocol needing some TLC after
such an event. But I don't know enough about bluetooth to do
more than raise the possibility.

o net/bluetooth/cmtp/core.c line 290 cmtp_session(). Same as
for bnep_session(), at least as far as I can tell.

o net/bluetooth/hidp/core.c line 476 hidp_session(). Same as
for bnep_session(), AFAICT.

o net/bluetooth/rfcomm/core.c line 1940 rfcomm_run(). Same as
for bnep_session(), AFAICT.

o kernel/rcutorture.c -- no problem freezing, just need to make
sure that these processes do try_to_freeze() occasionally.

Again, may need additional try_to_freeze() calls in various places.

Pavel Machek

unread,

Feb 3, 2007, 6:07:59 AM2/3/07

to Rafael J. Wysocki

Hi!

> > > > static int _cpu_down(unsigned int cpu)
> > > > {
> > > > int err;
> > > > struct task_struct *p;
> > > > cpumask_t old_allowed, tmp;
> > > >
> > > > if (num_online_cpus() == 1)
> > > > return -EBUSY;
> > > >
> > > > if (!cpu_online(cpu))
> > > > return -EINVAL;
> > > >
> > > > if (freeze_processes()) {
> > > > err = -EBUSY;
> > > > goto out_freeze_notify_failed;
> > > > }
> > > > err = raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE,
> > > > (void *)(long)cpu);
> > >
> > > yeah. This all looks so nice that i almost cannot believe it's true :-)
> >
> > Well, it turns out that maybe it is in fact untrue. :-/
> >
> > I need to look at all uses of PF_NOFREEZE -- as I understand the
> > code, processes marked PF_NOFREEZE will continue running, potentially
> > interfering with the hotplug operation. :-(
> >
> > I will pass my findings on to this list.
>
> Well, I did it some time ago, although not very thoroughly.
>
> AFAICS there are not so many, but one that stands out is the worker threads.
> We needed two of them to actually go to sleep, so now it's possible to create
> a "freezeable workqueue" the worker thread of which will not set PF_NOFREEZE,
> but currently this is only used by XFS.

We should slowly move as workqueues to freezeable ones... Having too
much stuff NOFREEZE is evil, even for swsusp.
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Pavel Machek

unread,

Feb 3, 2007, 6:24:23 AM2/3/07

to Paul E. McKenney

Hi!

> > Part of what I need to look at. ;-)
>
> OK. This just might be feasible. That said, there is a lot of code
> containing PF_NOFREEZE that I am not familiar with. That said, here
> are my thoughts -- this is in addition to the changes to freeze_processes()
> and thaw_processes() called out earlier.
>
> Thoughts?

Looks ok to me.

> o Introduce a mutex to prevent overlapping freezes -- or find
> out what the heck prevents them at present!!! (I don't see
> anything.)

swsusp is protected by some giant "doing suspend" mutex. Other users
may be buggy :-).

> o Replace all the "current->flags |= PF_NOFREEZE" statements with
> "exempt_from_freeze(current, int pfe)" or some such. This would
> set the flags bit and also store the pfe argument into the pf_exempt
> field.

I'd suggest step 0, remove as many PF_NOFREEZE as possible... ok, you
seem to be doing that one.

> o init/do_mounts_initrd.c line 57 handle_initrd().
> This looks to be short term anyway, so OK to leave.
> But does kernel_execve() clear PF_NOFREEZE?
>
> But it should be OK to freeze the init process when doing CPU
> hotplug ops, right?

That looks bogus. If it is short term, it can as well live _without_
PF_NOFREEZE. Noone should suspend system at that stage, right?

> o kernel/softlockup.c line 88 watchdog(). Well, we wouldn't
> want false alarms when freezing for hotplug. Perhaps
> temporarily disabling timestamp checking while doing hotplug
> would do the trick. But if hotplug takes the time required
> to trigger softlockup (seconds!), we are broken anyway.
> The fix would be to speed up the freezing process.

Freezing _can_ take seconds. We do sync in between freezing userspace
and kernel, for example. We avoid freezing in some difficult situations
by waiting for I/O to complete....

> o net/bluetooth/bnep/core.c line 476 bnep_session(). Suspending
> to a bluetooth device??? These guys got -hair-!!! I bet this
> one can tolerate being frozen for hotplugging CPUs -- though
> I could imagine the bluetooth protocol needing some TLC after
> such an event. But I don't know enough about bluetooth to do
> more than raise the possibility.

Should be fixed. Someone was probably lazy.

> o net/bluetooth/cmtp/core.c line 290 cmtp_session(). Same as
> for bnep_session(), at least as far as I can tell.
>
> o net/bluetooth/hidp/core.c line 476 hidp_session(). Same as
> for bnep_session(), AFAICT.
>
> o net/bluetooth/rfcomm/core.c line 1940 rfcomm_run(). Same as
> for bnep_session(), AFAICT.

Someone was definitely lazy :-).
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Rafael J. Wysocki

unread,

Feb 3, 2007, 5:52:32 PM2/3/07

to Pavel Machek

On the other hand, some of the workqueues may be necessary for saving the image
(still, I have no examples ;-)).

Greetings,
Rafael

--
If you don't have the time to read,
you don't have the time or the tools to write.
- Stephen King

-

Paul E. McKenney

unread,

Feb 3, 2007, 7:32:16 PM2/3/07

to Rafael J. Wysocki

In any case, getting CPU hotplug working is probably first priority.

That said, I am testing a patch that (hopefully) gets rid of the
RCU-boost and rcutorture NOFREEZE tasks. Every little bit will help.

Thanx, Paul

Paul E. McKenney

unread,

Feb 3, 2007, 11:40:38 PM2/3/07

to Pavel Machek

On Sat, Feb 03, 2007 at 01:17:45AM +0100, Pavel Machek wrote:
> Hi!
>
> > > Part of what I need to look at. ;-)
> >
> > OK. This just might be feasible. That said, there is a lot of code
> > containing PF_NOFREEZE that I am not familiar with. That said, here
> > are my thoughts -- this is in addition to the changes to freeze_processes()
> > and thaw_processes() called out earlier.
> >
> > Thoughts?
>
> Looks ok to me.

Cool!

> > o Introduce a mutex to prevent overlapping freezes -- or find
> > out what the heck prevents them at present!!! (I don't see
> > anything.)
>
> swsusp is protected by some giant "doing suspend" mutex. Other users
> may be buggy :-).

Ah! Any reason not to have locking at the level of the
freeze_processes()/thaw_processes() functions?

> > o Replace all the "current->flags |= PF_NOFREEZE" statements with
> > "exempt_from_freeze(current, int pfe)" or some such. This would
> > set the flags bit and also store the pfe argument into the pf_exempt
> > field.
>
> I'd suggest step 0, remove as many PF_NOFREEZE as possible... ok, you
> seem to be doing that one.

Well, in my little corner of the kernel, anyway. ;-)

> > o init/do_mounts_initrd.c line 57 handle_initrd().
> > This looks to be short term anyway, so OK to leave.
> > But does kernel_execve() clear PF_NOFREEZE?
> >
> > But it should be OK to freeze the init process when doing CPU
> > hotplug ops, right?
>
> That looks bogus. If it is short term, it can as well live _without_
> PF_NOFREEZE. Noone should suspend system at that stage, right?

I agree that any attempt to freeze that early in boot would be
at best an act of extreme bravery!

> > o kernel/softlockup.c line 88 watchdog(). Well, we wouldn't
> > want false alarms when freezing for hotplug. Perhaps
> > temporarily disabling timestamp checking while doing hotplug
> > would do the trick. But if hotplug takes the time required
> > to trigger softlockup (seconds!), we are broken anyway.
> > The fix would be to speed up the freezing process.
>
> Freezing _can_ take seconds. We do sync in between freezing userspace
> and kernel, for example. We avoid freezing in some difficult situations
> by waiting for I/O to complete....

OK. Point taken.

> > o net/bluetooth/bnep/core.c line 476 bnep_session(). Suspending
> > to a bluetooth device??? These guys got -hair-!!! I bet this
> > one can tolerate being frozen for hotplugging CPUs -- though
> > I could imagine the bluetooth protocol needing some TLC after
> > such an event. But I don't know enough about bluetooth to do
> > more than raise the possibility.
>
> Should be fixed. Someone was probably lazy.
>
> > o net/bluetooth/cmtp/core.c line 290 cmtp_session(). Same as
> > for bnep_session(), at least as far as I can tell.
> >
> > o net/bluetooth/hidp/core.c line 476 hidp_session(). Same as
> > for bnep_session(), AFAICT.
> >
> > o net/bluetooth/rfcomm/core.c line 1940 rfcomm_run(). Same as
> > for bnep_session(), AFAICT.
>
> Someone was definitely lazy :-).
> Pavel

OK, so we should think in terms of moving these to try_to_freeze(),
then.

Thanx, Paul

Rafael J. Wysocki

unread,

Feb 4, 2007, 6:09:00 AM2/4/07

to pau...@linux.vnet.ibm.com

On Sunday, 4 February 2007 05:39, Paul E. McKenney wrote:
> On Sat, Feb 03, 2007 at 01:17:45AM +0100, Pavel Machek wrote:
> > Hi!
> >
> > > > Part of what I need to look at. ;-)
> > >
> > > OK. This just might be feasible. That said, there is a lot of code
> > > containing PF_NOFREEZE that I am not familiar with. That said, here
> > > are my thoughts -- this is in addition to the changes to freeze_processes()
> > > and thaw_processes() called out earlier.
> > >
> > > Thoughts?
> >
> > Looks ok to me.
>
> Cool!
>
> > > o Introduce a mutex to prevent overlapping freezes -- or find
> > > out what the heck prevents them at present!!! (I don't see
> > > anything.)
> >
> > swsusp is protected by some giant "doing suspend" mutex. Other users
> > may be buggy :-).
>
> Ah! Any reason not to have locking at the level of the
> freeze_processes()/thaw_processes() functions?

I don't think so. It just wasn't needed before ...

> > > o Replace all the "current->flags |= PF_NOFREEZE" statements with
> > > "exempt_from_freeze(current, int pfe)" or some such. This would
> > > set the flags bit and also store the pfe argument into the pf_exempt
> > > field.
> >
> > I'd suggest step 0, remove as many PF_NOFREEZE as possible... ok, you
> > seem to be doing that one.
>
> Well, in my little corner of the kernel, anyway. ;-)
>
> > > o init/do_mounts_initrd.c line 57 handle_initrd().
> > > This looks to be short term anyway, so OK to leave.
> > > But does kernel_execve() clear PF_NOFREEZE?
> > >
> > > But it should be OK to freeze the init process when doing CPU
> > > hotplug ops, right?
> >
> > That looks bogus. If it is short term, it can as well live _without_
> > PF_NOFREEZE. Noone should suspend system at that stage, right?
>
> I agree that any attempt to freeze that early in boot would be
> at best an act of extreme bravery!

This is needed so that the _resume_ works, when it's handled from the user land
by our resume tool. Currently, the resume code calls freeze_processes() too.

Definitely.

I think we also should try to use freezeable workqueues wherever possible.

Greetings,
Rafael

Pavel Machek

unread,

Feb 4, 2007, 7:54:02 AM2/4/07

to Rafael J. Wysocki

Hi!

> > > > o init/do_mounts_initrd.c line 57 handle_initrd().
> > > > This looks to be short term anyway, so OK to leave.
> > > > But does kernel_execve() clear PF_NOFREEZE?
> > > >
> > > > But it should be OK to freeze the init process when doing CPU
> > > > hotplug ops, right?
> > >
> > > That looks bogus. If it is short term, it can as well live _without_
> > > PF_NOFREEZE. Noone should suspend system at that stage, right?
> >
> > I agree that any attempt to freeze that early in boot would be
> > at best an act of extreme bravery!
>
> This is needed so that the _resume_ works, when it's handled from the user land
> by our resume tool. Currently, the resume code calls
> freeze_processes() too.

I do not understand... freeze_processes() always leaves curent process
running... why is it needed?

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Rafael J. Wysocki

unread,

Feb 4, 2007, 8:46:24 AM2/4/07

to Pavel Machek

Hi,

On Sunday, 4 February 2007 13:53, Pavel Machek wrote:
> Hi!
>
> > > > > o init/do_mounts_initrd.c line 57 handle_initrd().
> > > > > This looks to be short term anyway, so OK to leave.
> > > > > But does kernel_execve() clear PF_NOFREEZE?
> > > > >
> > > > > But it should be OK to freeze the init process when doing CPU
> > > > > hotplug ops, right?
> > > >
> > > > That looks bogus. If it is short term, it can as well live _without_
> > > > PF_NOFREEZE. Noone should suspend system at that stage, right?
> > >
> > > I agree that any attempt to freeze that early in boot would be
> > > at best an act of extreme bravery!
> >
> > This is needed so that the _resume_ works, when it's handled from the user land
> > by our resume tool. Currently, the resume code calls
> > freeze_processes() too.
>
> I do not understand... freeze_processes() always leaves curent process
> running... why is it needed?

IIRC, the do_linuxrc thread cannot be frozen (doesn't call try_to_freeze()),
so the freeze_processes() during the resume fails and the resume fails as a
result.

Still, I have an idea:

Instead of hunting for PF_NOFREEZE and wondering if the suspend/resume fails
when we remove them or replace them with try_to_freeze(), why don't we add
an "ignore_pf_nofreeze" argument to freeze_processes() and make it regard
_all_ tasks as if they haven't set PF_NOFREEZE when this "ignore_pf_nofreeze"
is set? Of course, additionally we'll have to make everyone call
try_to_freeze(), even if they set PF_NOFREEZE anyway.

Then, if freeze_processes() is called with "ignore_pf_nofreeze = 0", it will
work just as it does now. However, if it's called with
"ignore_pf_nofreeze = 1", it will try to make all prcesses enter the
refrigerator. The "ignore_pf_nofreeze = 0" version will be suitable for us
(ie. suspend etc.) and the "ignore_pf_nofreeze = 1" version will be suitable
for the CPU hotplug and such things.

You think?

Rafael

Pavel Machek

unread,

Feb 4, 2007, 8:51:35 AM2/4/07

to Rafael J. Wysocki

Hi!

> > > This is needed so that the _resume_ works, when it's handled from the user land
> > > by our resume tool. Currently, the resume code calls
> > > freeze_processes() too.
> >
> > I do not understand... freeze_processes() always leaves curent process
> > running... why is it needed?
>
> IIRC, the do_linuxrc thread cannot be frozen (doesn't call try_to_freeze()),
> so the freeze_processes() during the resume fails and the resume fails as a
> result.

Aha, ok. (We still may want to add try_to_freeze there; there's no
reason to have that running while resuming).

> Still, I have an idea:
>
> Instead of hunting for PF_NOFREEZE and wondering if the suspend/resume fails
> when we remove them or replace them with try_to_freeze(), why don't we add
> an "ignore_pf_nofreeze" argument to freeze_processes() and make it regard
> _all_ tasks as if they haven't set PF_NOFREEZE when this "ignore_pf_nofreeze"
> is set? Of course, additionally we'll have to make everyone call
> try_to_freeze(), even if they set PF_NOFREEZE anyway.
>
> Then, if freeze_processes() is called with "ignore_pf_nofreeze = 0", it will
> work just as it does now. However, if it's called with
> "ignore_pf_nofreeze = 1", it will try to make all prcesses enter the
> refrigerator. The "ignore_pf_nofreeze = 0" version will be suitable for us
> (ie. suspend etc.) and the "ignore_pf_nofreeze = 1" version will be suitable
> for the CPU hotplug and such things.

Yep, something like that will be needed. Probably more finegrained
(with flags), because CPU hotplug and kprobes may want their own sets
of unfreezeable processes.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Rafael J. Wysocki

unread,

Feb 4, 2007, 9:00:03 AM2/4/07

to Pavel Machek

Yes, but for this purpose we'll need to use one more task flag at least, I
think. Or transform PF_NOFREEZE into a 2-bit field with values from 0 to 3
indicating the "freezing level" at which given task should be frozen and
call freeze_processes() with the level as an argument?

Paul, what is your opnion?

Rafael