Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[RFC PATCH] freezer: revert 27920651fe "PM / Freezer: Make fake_signal_wake_up() wake TASK_KILLABLE tasks too"

4 views
Skip to first unread message

Tejun Heo

unread,
Oct 31, 2011, 6:20:02 PM10/31/11
to
Commit 27920651fe "PM / Freezer: Make fake_signal_wake_up() wake
TASK_KILLABLE tasks too" made freezer wake up tasks in TASK_KILLABLE
sleep too citing non-interruptible but killable sleeps in cifs and
nfs.

I don't think we can do this. We should not send spurious unsolicited
non-interruptible wakeups. Most synchornization constructs are built
to cope with spurious wakeups and any INTERRUPTIBLE sleep must be able
to handle spurious wakeups but that's not true for KILLABLE sleeps -
KILLABLE condition cannot be cancelled.

This is probably okay for most cases but circumventing fundamental
wakeup condition like this is asking for trouble. Furthermore, I'm
not sure the behavior change brought on by this change - breaking
nfs/cifs uninterruptible operation guarantee - is correct. If such
behavior is desirable, the right thing to do is using intr mount
option, not circumventing it from PM layer.

Signed-off-by: Tejun Heo <t...@kernel.org>
Cc: Jeff Layton <jla...@redhat.com>
---
Neil, Steve, do the network filesystems need a way to indicate "I can
either be killed or enter freezer"?

Thanks.

kernel/freezer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/freezer.c b/kernel/freezer.c
index 66a594e..7b01de9 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -67,7 +67,7 @@ static void fake_signal_wake_up(struct task_struct *p)
unsigned long flags;

spin_lock_irqsave(&p->sighand->siglock, flags);
- signal_wake_up(p, 1);
+ signal_wake_up(p, 0);
spin_unlock_irqrestore(&p->sighand->siglock, flags);
}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Tejun Heo

unread,
Oct 31, 2011, 7:20:02 PM10/31/11
to
On Mon, Oct 31, 2011 at 03:17:43PM -0700, Tejun Heo wrote:
> Commit 27920651fe "PM / Freezer: Make fake_signal_wake_up() wake
> TASK_KILLABLE tasks too" made freezer wake up tasks in TASK_KILLABLE
> sleep too citing non-interruptible but killable sleeps in cifs and
> nfs.
>
> I don't think we can do this. We should not send spurious unsolicited
> non-interruptible wakeups. Most synchornization constructs are built
> to cope with spurious wakeups and any INTERRUPTIBLE sleep must be able
> to handle spurious wakeups but that's not true for KILLABLE sleeps -
> KILLABLE condition cannot be cancelled.
>
> This is probably okay for most cases but circumventing fundamental
> wakeup condition like this is asking for trouble. Furthermore, I'm
> not sure the behavior change brought on by this change - breaking
> nfs/cifs uninterruptible operation guarantee - is correct. If such
> behavior is desirable, the right thing to do is using intr mount
> option, not circumventing it from PM layer.
>
> Signed-off-by: Tejun Heo <t...@kernel.org>
> Cc: Jeff Layton <jla...@redhat.com>
> ---
> Neil, Steve, do the network filesystems need a way to indicate "I can
> either be killed or enter freezer"?

Hmm... okay, so commit f06ac72e929 "cifs, freezer: add
wait_event_freezekillable and have cifs use it" added freezable &
killable sleep. If this is necessary, the right thing to do is adding
another waking state, not modifying existing one's behavior.

But, before going there, is this *really* necessary? Do we really
have to choose among different combinations of interruptible, killable
and freezable? If something doesn't want to be bothered than being
killed, assuming it doesn't want to go hibernating is kinda natural.

IOW, can we please update cifs, if it wants to allow hibernation, to
use interruptible sleep in wait_for_response() than trying to modify
basic sleep mechanism?

Thank you.

--
tejun

Rafael J. Wysocki

unread,
Oct 31, 2011, 7:30:01 PM10/31/11
to
On Monday, October 31, 2011, Tejun Heo wrote:
> Commit 27920651fe "PM / Freezer: Make fake_signal_wake_up() wake
> TASK_KILLABLE tasks too" made freezer wake up tasks in TASK_KILLABLE
> sleep too citing non-interruptible but killable sleeps in cifs and
> nfs.
>
> I don't think we can do this. We should not send spurious unsolicited
> non-interruptible wakeups. Most synchornization constructs are built
> to cope with spurious wakeups and any INTERRUPTIBLE sleep must be able
> to handle spurious wakeups but that's not true for KILLABLE sleeps -
> KILLABLE condition cannot be cancelled.
>
> This is probably okay for most cases but circumventing fundamental
> wakeup condition like this is asking for trouble. Furthermore, I'm
> not sure the behavior change brought on by this change - breaking
> nfs/cifs uninterruptible operation guarantee - is correct. If such
> behavior is desirable, the right thing to do is using intr mount
> option, not circumventing it from PM layer.

Do you have any specific examples of breakage, or is it just that you _think_
it's not quite right?

One patch depending on that change has been merged already and I have two
more in the queue, so I'd like to clarify this ASAP. Jeff, Steve?

> Signed-off-by: Tejun Heo <t...@kernel.org>
> Cc: Jeff Layton <jla...@redhat.com>
> ---
> Neil, Steve, do the network filesystems need a way to indicate "I can
> either be killed or enter freezer"?
>
> Thanks.
>
> kernel/freezer.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/freezer.c b/kernel/freezer.c
> index 66a594e..7b01de9 100644
> --- a/kernel/freezer.c
> +++ b/kernel/freezer.c
> @@ -67,7 +67,7 @@ static void fake_signal_wake_up(struct task_struct *p)
> unsigned long flags;
>
> spin_lock_irqsave(&p->sighand->siglock, flags);
> - signal_wake_up(p, 1);
> + signal_wake_up(p, 0);
> spin_unlock_irqrestore(&p->sighand->siglock, flags);
> }

Thanks,
Rafael

Tejun Heo

unread,
Oct 31, 2011, 7:40:01 PM10/31/11
to
Hello,

On Tue, Nov 01, 2011 at 12:24:16AM +0100, Rafael J. Wysocki wrote:
> > This is probably okay for most cases but circumventing fundamental
> > wakeup condition like this is asking for trouble. Furthermore, I'm
> > not sure the behavior change brought on by this change - breaking
> > nfs/cifs uninterruptible operation guarantee - is correct. If such
> > behavior is desirable, the right thing to do is using intr mount
> > option, not circumventing it from PM layer.
>
> Do you have any specific examples of breakage, or is it just that you _think_
> it's not quite right?

I can't remember one off the top of my head but I'm pretty sure there
at least are few which expect tight inter-locking between sleeps and
wakeups. I'll look for examples and post reply. ISTR them being
kernel threads so this might not apply directly but it's still a
dangerous game to play.

Bugs caused by behaviors like this will be very difficult to reproduce
and diagnose. There is no reason to play a gamble like this. If
somebody *really* wants non-interruptible killable & freezable sleep,
we really should be adding TASK_WAKE_FREEZER or something instead of
modifying the behavior of TASK_KILLABLE.

Thanks.

--
tejun

Steve French

unread,
Oct 31, 2011, 7:50:02 PM10/31/11
to
Probably, yes, but I will defer to Jeff as he has looked
more recently at these issues.

I can explain cifs state, and disconnect/reconnection of sessions
(and smb2 is a little more feature rich in this regard), but will
let Jeff explain the more subtle points you are getting at.

--
Thanks,

Steve

Tejun Heo

unread,
Oct 31, 2011, 8:00:02 PM10/31/11
to
Hello,

On Mon, Oct 31, 2011 at 06:45:48PM -0500, Steve French wrote:
> >> Signed-off-by: Tejun Heo <t...@kernel.org>
> >> Cc: Jeff Layton <jla...@redhat.com>
> >> ---
> >> Neil, Steve, do the network filesystems need a way to indicate "I can
> >> either be killed or enter freezer"?
>
> Probably, yes, but I will defer to Jeff as he has looked
> more recently at these issues.
>
> I can explain cifs state, and disconnect/reconnection of sessions
> (and smb2 is a little more feature rich in this regard), but will
> let Jeff explain the more subtle points you are getting at.

Hmmm... I'm getting confused.

For nfs, this really is a non-issue. Either the user wants nointr or
intr behavior. NFS nointr is rather crazy - it's basically "nothing
can do anything to tasks which is doing NFS IO until it's complete"
and really meant to be used for servers sharing filesystems for /usr,
/home and stuff. It doesn't make whole lot of sense on systems which
may go suspend and that's why there's intr option.

I suppose the problem is that cifs doesn't know how to do 'intr' yet,
right? If that really is the problem, the correct long term solution
would be implementing proper intr behavior and it doesn't make any
sense to push this type of change to PM core to for short term
workaround. Just use prepare_to_wait() / schedule() / finish_wait()
directly w/ INTERRUPTIBLE sleep and don't break out of wait loop on
signal_pending(). If this should be used in multiple places, write up
a wait_event_XXX() wrapper. There is absolutely no reason to change
wakeup condition.

Thank you.

--
tejun

Steve French

unread,
Oct 31, 2011, 8:10:02 PM10/31/11
to
It isn't that simple (among other reasons due to much time waiting
in the socket interface), but since this directly addresses problems
Jeff has spent much time debugging, I would like him to chime in.


--
Thanks,

Steve

Tejun Heo

unread,
Oct 31, 2011, 9:00:02 PM10/31/11
to
Hey, again.

On Mon, Oct 31, 2011 at 04:30:59PM -0700, Tejun Heo wrote:
> I can't remember one off the top of my head but I'm pretty sure there
> at least are few which expect tight inter-locking between sleeps and
> wakeups. I'll look for examples and post reply. ISTR them being
> kernel threads so this might not apply directly but it's still a
> dangerous game to play.

Hmm... I couldn't find KILLABLE used like that but here are two
UNINTERRUPTIBLE sleep examples.

* kthread_start() depends on the fact that a kthread won't be woken up
from UNINTERRUPTIBLE sleep spuriously.

* jfs_flush_journal() doesn't check whether the wakeup was spurious
after waiting if !tblkGC_COMMITTED.

Maybe we can re-define KILLABLE as killable && freezable but IMHO that
requires pretty strong rationales. If at all possible, let's not
diddle with that if it can be worked around some other way.

Thank you.

Jeff Layton

unread,
Nov 1, 2011, 4:20:01 AM11/1/11
to
On Mon, 31 Oct 2011 17:55:05 -0700
Tejun Heo <t...@kernel.org> wrote:

> Hey, again.
>
> On Mon, Oct 31, 2011 at 04:30:59PM -0700, Tejun Heo wrote:
> > I can't remember one off the top of my head but I'm pretty sure there
> > at least are few which expect tight inter-locking between sleeps and
> > wakeups. I'll look for examples and post reply. ISTR them being
> > kernel threads so this might not apply directly but it's still a
> > dangerous game to play.
>
> Hmm... I couldn't find KILLABLE used like that but here are two
> UNINTERRUPTIBLE sleep examples.
>
> * kthread_start() depends on the fact that a kthread won't be woken up
> from UNINTERRUPTIBLE sleep spuriously.
>
> * jfs_flush_journal() doesn't check whether the wakeup was spurious
> after waiting if !tblkGC_COMMITTED.
>
> Maybe we can re-define KILLABLE as killable && freezable but IMHO that
> requires pretty strong rationales. If at all possible, let's not
> diddle with that if it can be worked around some other way.
>
> Thank you.
>

(cc'ing Trond and the linux-nfs mailing list -- fwiw, he maintains the
NFS client code -- Bruce is the NFS server maintainer and probably has
little interest in this thread).

The main reason for this change is primarily that we have people with
laptops and nfs and cifs mounts that sometimes fail to suspend.

IIUC, the TASK_KILLABLE was mostly added to ensure that file-store
writes would be uninterruptible, but still allow those tasks to be
killed if the process is going down anyway.

The intr/nointr mount options in NFS have been deprecated since
TASK_KILLABLE was added. The scheme now is basically that those sleeps
ignore any signals except for fatal ones. So, that knob is meaningless
and has been for a long time now.

cifs never had a working intr/nointr knob, but signal handling while
waiting for replies was always a difficult thing to handle correctly. I
don't think the right answer is to go back to using such a knob in cifs
or nfs.

I suppose we could look at going back to the world of complicated
signal handling and TASK_INTERRUPTIBLE, but that's not a trivial change
either. The TASK_WAKE_FREEZABLE flag you mention might make more sense
than doing that.

--
Jeff Layton <jla...@redhat.com>

Jeff Layton

unread,
Nov 1, 2011, 7:10:01 AM11/1/11
to
Also, since task state and signal handling clearly isn't my forte...can
you clarify what the main difference is in using a TASK_KILLABLE sleep
vs. TASK_INTERRUPTIBLE with all signals but SIGKILL blocked?

I know that the process state would end up different (S vs. D state).
Is there anything else we'd need to be concerned with if we converted
all these call sites to use such a scheme in lieu of a new
TASK_WAKE_FREEZABLE flag or similar?

Tejun Heo

unread,
Nov 1, 2011, 12:40:02 PM11/1/11
to
Hello, Jeff.

On Tue, Nov 01, 2011 at 06:59:58AM -0400, Jeff Layton wrote:
> > I suppose we could look at going back to the world of complicated
> > signal handling and TASK_INTERRUPTIBLE, but that's not a trivial change
> > either. The TASK_WAKE_FREEZABLE flag you mention might make more sense
> > than doing that.

I see.

> Also, since task state and signal handling clearly isn't my forte...can
> you clarify what the main difference is in using a TASK_KILLABLE sleep
> vs. TASK_INTERRUPTIBLE with all signals but SIGKILL blocked?
>
> I know that the process state would end up different (S vs. D state).
> Is there anything else we'd need to be concerned with if we converted
> all these call sites to use such a scheme in lieu of a new
> TASK_WAKE_FREEZABLE flag or similar?

Manipulating sigmask would work in most cases but there are corner
cases (e.g. debug signals will force the mask off) and diddling with
sigmask in kernel generally isn't a good idea.

If TASK_KILLABLE is only used for killable && freezable, that probably
would be the cleanest solution but given the name and the fact that
people are in general much more aware of SIGKILL's then freezer, I
think adding such assumption is likely to cause very subtle bugs. For
example, mem_cgroup_handle_oom() seems to assume that if it's waken
from TASK_KILLABLE either the condition it was waiting for is true or
it is dying.

If there are only a handful sites which need this type of behavior,
wouldn't something like the following work?

#define wait_event_freezekillable(wq, condition) \
do { \
DEFINE_WAIT(__wait); \
for (;;) { \
prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \
if (condition || fatal_signal_pending(current)) \
break; \
schedule(); \
try_to_freeze(); \
} \
finish_wait(&wq, &__wait); \
} while (0)

Thanks.

--
tejun

Trond Myklebust

unread,
Nov 1, 2011, 1:00:02 PM11/1/11
to
Err... Won't this end up busy-waiting if a non-fatal signal is received?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.M...@netapp.com
www.netapp.com

Tejun Heo

unread,
Nov 1, 2011, 1:00:02 PM11/1/11
to
Hello,

On Tue, Nov 01, 2011 at 12:49:31PM -0400, Trond Myklebust wrote:
> > #define wait_event_freezekillable(wq, condition) \
> > do { \
> > DEFINE_WAIT(__wait); \
> > for (;;) { \
> > prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \
> > if (condition || fatal_signal_pending(current)) \
> > break; \
> > schedule(); \
> > try_to_freeze(); \
> > } \
> > finish_wait(&wq, &__wait); \
> > } while (0)
>
> Err... Won't this end up busy-waiting if a non-fatal signal is received?

Ah... right, forgot about signal_pending_state() special case in
schedule(). Any better ideas, anyone?

--
tejun

Oleg Nesterov

unread,
Nov 1, 2011, 2:00:03 PM11/1/11
to
On 10/31, Tejun Heo wrote:
>
> Commit 27920651fe "PM / Freezer: Make fake_signal_wake_up() wake
> TASK_KILLABLE tasks too" made freezer wake up tasks in TASK_KILLABLE
> sleep too citing non-interruptible but killable sleeps in cifs and
> nfs.
>
> I don't think we can do this. We should not send spurious unsolicited
> non-interruptible wakeups. Most synchornization constructs are built
> to cope with spurious wakeups and any INTERRUPTIBLE sleep must be able
> to handle spurious wakeups but that's not true for KILLABLE sleeps -
> KILLABLE condition cannot be cancelled.

Agreed.

For example. sys_read() or page can sleep in TASK_KILLABLE assuming that
wait/down/whatever _killable can only fail if we can not return to the
usermode.

TASK_TRACED case is obviously wrong too.

> --- a/kernel/freezer.c
> +++ b/kernel/freezer.c
> @@ -67,7 +67,7 @@ static void fake_signal_wake_up(struct task_struct *p)
> unsigned long flags;
>
> spin_lock_irqsave(&p->sighand->siglock, flags);
> - signal_wake_up(p, 1);
> + signal_wake_up(p, 0);
> spin_unlock_irqrestore(&p->sighand->siglock, flags);
> }

Agreed, this looks like a bug fix to me.

Acked-by: Oleg Nesterov <ol...@redhat.com>

Tejun Heo

unread,
Nov 1, 2011, 2:10:02 PM11/1/11
to
On Tue, Nov 01, 2011 at 06:59:53PM +0100, Oleg Nesterov wrote:
> > #define wait_event_freezekillable(wq, condition) \
> > do { \
> > DEFINE_WAIT(__wait); \
> > for (;;) { \
> > prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \
> > if (condition || fatal_signal_pending(current)) \
> > break; \
> > schedule(); \
>
> No, this can't work, afaics.
>
> Once the caller recieves a non-fatal signal (gets TIF_SIGPENDING),
> schedule() won't block in TASK_INTERRUPTIBLE state.
>
> IOW, wait_event_freezekillable() becomes a busy-wait loop.

Yeah yeah, Trond already pointed it out. I forgot about the
sigpending special case in schedule(), which I think is rather odd, oh
well. Any better ideas?

Oleg Nesterov

unread,
Nov 1, 2011, 2:10:03 PM11/1/11
to
On 11/01, Tejun Heo wrote:
>
> diddling with
> sigmask in kernel generally isn't a good idea.

Agreed.

> If there are only a handful sites which need this type of behavior,
> wouldn't something like the following work?
>
> #define wait_event_freezekillable(wq, condition) \
> do { \
> DEFINE_WAIT(__wait); \
> for (;;) { \
> prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \
> if (condition || fatal_signal_pending(current)) \
> break; \
> schedule(); \

No, this can't work, afaics.

Once the caller recieves a non-fatal signal (gets TIF_SIGPENDING),
schedule() won't block in TASK_INTERRUPTIBLE state.

IOW, wait_event_freezekillable() becomes a busy-wait loop.

Oleg.

Oleg Nesterov

unread,
Nov 1, 2011, 2:20:02 PM11/1/11
to
On 11/01, Tejun Heo wrote:
>
> Yeah yeah, Trond already pointed it out. I forgot about the
> sigpending special case in schedule(), which I think is rather odd,

I disagree with "rather odd" ;)

We have a lot of examples of

current->state = TASK_INTERRUPTIBLE;
...
if (signal_pending())
break;
schedule();

Without that special case in schedule() the code above becomes racy.
Just consider __wait_event_interruptible().

> Any better ideas?

Well. As a simple (probably temporary) fix, I'd suggest

#define wait_event_freezekillable(wq, condition)
{
freezer_do_not_count();
__retval = wait_event_killable(condition);
freezer_count();
__retval;
}

Do you think it can work?

Oleg.

Jeff Layton

unread,
Nov 1, 2011, 2:30:01 PM11/1/11
to
On Tue, 1 Nov 2011 11:06:01 -0700
Tejun Heo <t...@kernel.org> wrote:

> On Tue, Nov 01, 2011 at 06:59:53PM +0100, Oleg Nesterov wrote:
> > > #define wait_event_freezekillable(wq, condition) \
> > > do { \
> > > DEFINE_WAIT(__wait); \
> > > for (;;) { \
> > > prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \
> > > if (condition || fatal_signal_pending(current)) \
> > > break; \
> > > schedule(); \
> >
> > No, this can't work, afaics.
> >
> > Once the caller recieves a non-fatal signal (gets TIF_SIGPENDING),
> > schedule() won't block in TASK_INTERRUPTIBLE state.
> >
> > IOW, wait_event_freezekillable() becomes a busy-wait loop.
>
> Yeah yeah, Trond already pointed it out. I forgot about the
> sigpending special case in schedule(), which I think is rather odd, oh
> well. Any better ideas?
>

This is (obviously) not my area of expertise, but the TAKE_WAKEFREEZE
flag that you mentioned earlier might be the best way to solve this.

Tasks that want to be awoken on freezer events can set that flag and we
can have the freezer code wake them up. For the NFS and CIFS code, we'll
just ensure that we set that flag

This does mean a rather nasty proliferation of new wait_event_* macros,
and we'll need a new schedule_timeout_freezekillable() variant for the
new state. It looks fairly straightforward to implement though.

--
Jeff Layton <jla...@redhat.com>

Tejun Heo

unread,
Nov 1, 2011, 2:30:01 PM11/1/11
to
Hello,

On Tue, Nov 01, 2011 at 07:13:29PM +0100, Oleg Nesterov wrote:
> On 11/01, Tejun Heo wrote:
> >
> > Yeah yeah, Trond already pointed it out. I forgot about the
> > sigpending special case in schedule(), which I think is rather odd,
>
> I disagree with "rather odd" ;)
>
> We have a lot of examples of
>
> current->state = TASK_INTERRUPTIBLE;
> ...
> if (signal_pending())
> break;
> schedule();
>
> Without that special case in schedule() the code above becomes racy.
> Just consider __wait_event_interruptible().

But __wait_event_interruptible() does proper set-TASK_*, check
sigpending and schedule() sequence. As long as the waker performs
seg-sigpending, wakeup sequence in the correct order, nothing is
broken (as w/ any other wakeup conditions). The special case deals
with callers which don't check sigpending between set-TASK_* and
schedule() and that's the part I think is a bit odd. Whether I feel
odd or not is irrelevant tho - it's already there.

> > Any better ideas?
>
> Well. As a simple (probably temporary) fix, I'd suggest
>
> #define wait_event_freezekillable(wq, condition)
> {
> freezer_do_not_count();
> __retval = wait_event_killable(condition);
> freezer_count();
> __retval;
> }
>
> Do you think it can work?

Yeah, probably. I was hoping to remove count/do_not_count tho.
Hmmm... maybe we can just flip PF_NOFREEZE instead with a bit of
modification, I think.

Thanks.

--
tejun

Oleg Nesterov

unread,
Nov 1, 2011, 3:50:01 PM11/1/11
to
On 11/01, Tejun Heo wrote:
>
> Hello,
>
> On Tue, Nov 01, 2011 at 07:13:29PM +0100, Oleg Nesterov wrote:
> > On 11/01, Tejun Heo wrote:
> > >
> > > Yeah yeah, Trond already pointed it out. I forgot about the
> > > sigpending special case in schedule(), which I think is rather odd,
> >
> > I disagree with "rather odd" ;)
> >
> > We have a lot of examples of
> >
> > current->state = TASK_INTERRUPTIBLE;
> > ...
> > if (signal_pending())
> > break;
> > schedule();
> >
> > Without that special case in schedule() the code above becomes racy.
> > Just consider __wait_event_interruptible().
>
> But __wait_event_interruptible() does proper set-TASK_*, check
> sigpending and schedule() sequence. As long as the waker performs
> seg-sigpending, wakeup sequence in the correct order, nothing is
> broken (as w/ any other wakeup conditions). The special case deals
> with callers which don't check sigpending between set-TASK_* and
> schedule() and that's the part I think is a bit odd.

OK, agreed, __wait_event_interruptible() was a bad example.

Yes, this is only needed for the code which doesn't check
signal_pending() "properly", or doesn't check it at all before
schedule(). OK, say, wait_for_completion_interruptible().
Or schedule_timeout_interruptible().

I suspect we have a lot more examples. Historically linux allows to
set TASK_INTERRUPTIBLE and schedule() without any checks.

> Whether I feel
> odd or not is irrelevant tho - it's already there.

Yes, I don't think we can remove it.

> > > Any better ideas?
> >
> > Well. As a simple (probably temporary) fix, I'd suggest
> >
> > #define wait_event_freezekillable(wq, condition)
> > {
> > freezer_do_not_count();
> > __retval = wait_event_killable(condition);
> > freezer_count();
> > __retval;
> > }
> >
> > Do you think it can work?
>
> Yeah, probably. I was hoping to remove count/do_not_count tho.

Or at least rename it ;)

> Hmmm... maybe we can just flip PF_NOFREEZE instead with a bit of
> modification, I think.

Perhaps.

Or we can add TASK_FREEZABLE (like TASK_WAKEKILL), iirc we already
discussed this some time ago. And probably it makes sense to add the
generic wait_event_state().

Oleg.

Oleg Nesterov

unread,
Nov 1, 2011, 4:00:02 PM11/1/11
to
On 11/01, Oleg Nesterov wrote:
>
> Or we can add TASK_FREEZABLE (like TASK_WAKEKILL), iirc we already
> discussed this some time ago. And probably it makes sense to add the
> generic wait_event_state().

Forgot to mention. I think that before anything else we need
signal_wake_up_state(). For example, note that none of the callers
of signal_wake_up(resume => true) in ptrace code wants to wake up
the killable task.

Tejun Heo

unread,
Nov 1, 2011, 6:00:02 PM11/1/11
to
Hey, Oleg.

On Tue, Nov 01, 2011 at 08:46:01PM +0100, Oleg Nesterov wrote:
> On 11/01, Oleg Nesterov wrote:
> >
> > Or we can add TASK_FREEZABLE (like TASK_WAKEKILL), iirc we already
> > discussed this some time ago. And probably it makes sense to add the
> > generic wait_event_state().
>
> Forgot to mention. I think that before anything else we need
> signal_wake_up_state(). For example, note that none of the callers
> of signal_wake_up(resume => true) in ptrace code wants to wake up
> the killable task.

Yeah, agreed for both wait_event_state() and signal_wake_up_state().
For now, let's go with the count/dont_count. Can you please write up
a patch for that? Jeff, does this seem okay to you?

For TASK_FREEZABLE, I'm not entirely sure. Combined with
wait_event_state(), it can definitely reduce the number of different
variants of wait_event_*(). Let's see.

Thanks.

--
tejun

Jeff Layton

unread,
Nov 2, 2011, 7:50:01 AM11/2/11
to
On Tue, 1 Nov 2011 14:57:10 -0700
Tejun Heo <t...@kernel.org> wrote:

> Hey, Oleg.
>
> On Tue, Nov 01, 2011 at 08:46:01PM +0100, Oleg Nesterov wrote:
> > On 11/01, Oleg Nesterov wrote:
> > >
> > > Or we can add TASK_FREEZABLE (like TASK_WAKEKILL), iirc we already
> > > discussed this some time ago. And probably it makes sense to add the
> > > generic wait_event_state().
> >
> > Forgot to mention. I think that before anything else we need
> > signal_wake_up_state(). For example, note that none of the callers
> > of signal_wake_up(resume => true) in ptrace code wants to wake up
> > the killable task.
>
> Yeah, agreed for both wait_event_state() and signal_wake_up_state().
> For now, let's go with the count/dont_count. Can you please write up
> a patch for that? Jeff, does this seem okay to you?
>

Let me make sure I understand since I don't have a great grasp of the
freezer internals...

This will set the PF_FREEZER_SKIP flag on the task, which prevents
try_to_freeze_tasks from incrementing the "todo" var for this process
and should let the suspend proceed.

So this really makes try_to_freeze_tasks set PF_FREEZING on the task,
but not get upset that it doesn't actually call try_to_freeze().

Is that sufficient for a process that's just sleeping here?

If so, guess I'll need to respin the NFS/RPC patches for this to do
something similar around the sleeps there since they don't use
wait_event_*.

> For TASK_FREEZABLE, I'm not entirely sure. Combined with
> wait_event_state(), it can definitely reduce the number of different
> variants of wait_event_*(). Let's see.
>

wait_event_state() sounds like a wonderful idea regardless of what we
do here.

--
Jeff Layton <jla...@redhat.com>

Oleg Nesterov

unread,
Nov 2, 2011, 11:20:02 AM11/2/11
to
On 11/02, Jeff Layton wrote:
>
> On Tue, 1 Nov 2011 14:57:10 -0700
> Tejun Heo <t...@kernel.org> wrote:
>
> > For now, let's go with the count/dont_count. Can you please write up
> > a patch for that? Jeff, does this seem okay to you?
> >
>
> Let me make sure I understand since I don't have a great grasp of the
> freezer internals...
>
> This will set the PF_FREEZER_SKIP flag on the task, which prevents
> try_to_freeze_tasks from incrementing the "todo" var for this process
> and should let the suspend proceed.
>
> So this really makes try_to_freeze_tasks set PF_FREEZING on the task,
> but not get upset that it doesn't actually call try_to_freeze().

Yes. PF_FREEZER_SKIP means, "please count me as frozen". This task can
do nothing except refrigerator() after wakeup.

> Is that sufficient for a process that's just sleeping here?

I think this should work... but this is the question to you ;)

Oleg.

Oleg Nesterov

unread,
Nov 2, 2011, 12:30:01 PM11/2/11
to
Hi,

On 11/01, Tejun Heo wrote:
>
> For now, let's go with the count/dont_count. Can you please write up
> a patch for that? Jeff, does this seem okay to you?

OK, will do in a minute. On top of
"[PATCH pm] freezer: fix wait_event_freezable/__thaw_task races"
you sent. (btw, thanks, I forgout about it ;)

Rafael, could you remind why freezer_do_not_count/freezer_count check
->mm != NULL ?

The comment says "However, we don't want kernel threads to be frozen",
but it is not clear anyway. A kernel thread simply shouldn't use this
interface if it doesn't want to freeze.

And in any case, PF_KTHREAD looks better if we really need to filter
out the kernel threads.

Oleg.

Oleg Nesterov

unread,
Nov 2, 2011, 2:00:02 PM11/2/11
to
fake_signal_wake_up() should not wake up the TASK_KILLABLE tasks,
we are going to fix this. This means that wait_event_freezekillable()
can't rely on wakeup from freezer. Reimplement it using
freezer_do_not_count/freezer_count. This is not really nice, just
a simple fix for now.

Signed-off-by: Oleg Nesterov <ol...@redhat.com>
---

include/linux/freezer.h | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)

--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -143,13 +143,9 @@ static inline void set_freezable_with_si
#define wait_event_freezekillable(wq, condition) \
({ \
int __retval; \
- for (;;) { \
- __retval = wait_event_killable(wq, \
- (condition) || freezing(current)); \
- if (__retval || (condition)) \
- break; \
- try_to_freeze(); \
- } \
+ freezer_do_not_count(); \
+ __retval = wait_event_killable(wq, (condition)); \
+ freezer_count(); \
__retval; \
})

Rafael J. Wysocki

unread,
Nov 2, 2011, 7:10:02 PM11/2/11
to
On Wednesday, November 02, 2011, Oleg Nesterov wrote:
> Hi,
>
> On 11/01, Tejun Heo wrote:
> >
> > For now, let's go with the count/dont_count. Can you please write up
> > a patch for that? Jeff, does this seem okay to you?
>
> OK, will do in a minute. On top of
> "[PATCH pm] freezer: fix wait_event_freezable/__thaw_task races"
> you sent. (btw, thanks, I forgout about it ;)
>
> Rafael, could you remind why freezer_do_not_count/freezer_count check
> ->mm != NULL ?

You're asking difficult questions. ;-)

The intention was to prevent PF_FREEZER_SKIP from having any effect on
kernel threads, IIRC. Anyway, there are only two legitimate users of it
(vfork and apm_ioctl) and in both cases the task in question is user space.

> The comment says "However, we don't want kernel threads to be frozen",
> but it is not clear anyway. A kernel thread simply shouldn't use this
> interface if it doesn't want to freeze.
>
> And in any case, PF_KTHREAD looks better if we really need to filter
> out the kernel threads.

PF_FREEZER_SKIP was introduced specifically with vfork in mind and I'm not
sure if it's a good idea to re-use it for something else (at least not for
something entirely obvious).

Thanks,
Rafael

Jeff Layton

unread,
Nov 3, 2011, 6:50:02 AM11/3/11
to
I'm not sure we really need this macro anymore since this is much
simpler. I could just move cifs back to using wait_event_killable and
simply wrap it in freezer_do_not_count/freezer_count (with some
comments to explain why we're doing that).

In any event, I plan to test this scheme out today and will let you
know whether it works...

Thanks,
--
Jeff Layton <jla...@redhat.com>

Jeff Layton

unread,
Nov 3, 2011, 7:20:02 AM11/3/11
to
On Thu, 3 Nov 2011 00:11:23 +0100
"Rafael J. Wysocki" <r...@sisk.pl> wrote:

> On Wednesday, November 02, 2011, Oleg Nesterov wrote:
> > Hi,
> >
> > On 11/01, Tejun Heo wrote:
> > >
> > > For now, let's go with the count/dont_count. Can you please write up
> > > a patch for that? Jeff, does this seem okay to you?
> >
> > OK, will do in a minute. On top of
> > "[PATCH pm] freezer: fix wait_event_freezable/__thaw_task races"
> > you sent. (btw, thanks, I forgout about it ;)
> >
> > Rafael, could you remind why freezer_do_not_count/freezer_count check
> > ->mm != NULL ?
>
> You're asking difficult questions. ;-)
>
> The intention was to prevent PF_FREEZER_SKIP from having any effect on
> kernel threads, IIRC. Anyway, there are only two legitimate users of it
> (vfork and apm_ioctl) and in both cases the task in question is user space.
>
> > The comment says "However, we don't want kernel threads to be frozen",
> > but it is not clear anyway. A kernel thread simply shouldn't use this
> > interface if it doesn't want to freeze.
> >
> > And in any case, PF_KTHREAD looks better if we really need to filter
> > out the kernel threads.
>
> PF_FREEZER_SKIP was introduced specifically with vfork in mind and I'm not
> sure if it's a good idea to re-use it for something else (at least not for
> something entirely obvious).
>

FWIW, wrapping wait_event_killable in freezer_do_not_count/freezer_count
seems to work. The machine suspends consistently with it. It sounds
like Rafael has concerns about this scheme however, so I'll let you
guys argue this out :)

Once you tell me the right scheme to use, I'll be happy to fix up cifs
and nfs to use it.

Thanks,
--
Jeff Layton <jla...@redhat.com>

Tejun Heo

unread,
Nov 3, 2011, 10:20:01 AM11/3/11
to
Hello,

On Thu, Nov 03, 2011 at 06:42:15AM -0400, Jeff Layton wrote:
> I'm not sure we really need this macro anymore since this is much
> simpler. I could just move cifs back to using wait_event_killable and
> simply wrap it in freezer_do_not_count/freezer_count (with some
> comments to explain why we're doing that).
>
> In any event, I plan to test this scheme out today and will let you
> know whether it works...

I think it would be better to put this in a macro as the
implementation is likely to change in future and I really don't want
to see FREEZER_SKIP scattered around the tree.

Thank you.

--
tejun

Oleg Nesterov

unread,
Nov 3, 2011, 11:20:02 AM11/3/11
to
On 11/03, Rafael J. Wysocki wrote:
>
> On Wednesday, November 02, 2011, Oleg Nesterov wrote:
> >
> > Rafael, could you remind why freezer_do_not_count/freezer_count check
> > ->mm != NULL ?
>
> You're asking difficult questions. ;-)

I am trying ;)

> The intention was to prevent PF_FREEZER_SKIP from having any effect on
> kernel threads, IIRC. Anyway, there are only two legitimate users of it
> (vfork and apm_ioctl) and in both cases the task in question is user space.

Actually CLONE_VFORK is used by call_usermodehelper() paths but this case
is fine. The caller is the PF_NOFREEZE workqueue thread.

> > The comment says "However, we don't want kernel threads to be frozen",
> > but it is not clear anyway. A kernel thread simply shouldn't use this
> > interface if it doesn't want to freeze.
> >
> > And in any case, PF_KTHREAD looks better if we really need to filter
> > out the kernel threads.
>
> PF_FREEZER_SKIP was introduced specifically with vfork in mind and I'm not
> sure if it's a good idea to re-use it for something else (at least not for
> something entirely obvious).

Indeed! So why do we check ->mm != NULL?

We can remove this check, right now it doesn't matter. And we are trying
to avoid the new users of this interface.

Oleg.

Jeff Layton

unread,
Nov 3, 2011, 11:30:02 AM11/3/11
to
On Thu, 3 Nov 2011 07:13:54 -0700
Tejun Heo <t...@kernel.org> wrote:

> Hello,
>
> On Thu, Nov 03, 2011 at 06:42:15AM -0400, Jeff Layton wrote:
> > I'm not sure we really need this macro anymore since this is much
> > simpler. I could just move cifs back to using wait_event_killable and
> > simply wrap it in freezer_do_not_count/freezer_count (with some
> > comments to explain why we're doing that).
> >
> > In any event, I plan to test this scheme out today and will let you
> > know whether it works...
>
> I think it would be better to put this in a macro as the
> implementation is likely to change in future and I really don't want
> to see FREEZER_SKIP scattered around the tree.
>

Ok, note though that I also need to do a similar set of changes to the
nfs and sunrpc code [1].

Those call sites do not use wait_event_* macros though. So I'll either
need to add a separate set of functions/macros to handle those, or
sprinkle freezer_do_not_count/freezer_count around the code there...

[1] Here are the older patches that depended on the change to
fake_signal_wake_up. I'll need to convert these to use the new scheme
once you guys settle on what it should be:

https://git.samba.org/jlayton/?p=jlayton/linux.git;a=commitdiff;h=0f85cbb747a0f9f8a582ae9bf642e094168001be
https://git.samba.org/jlayton/?p=jlayton/linux.git;a=commitdiff;h=084535708bcb33dd2448e73be5a0f4cac69bea92

--
Jeff Layton <jla...@redhat.com>

Tejun Heo

unread,
Nov 3, 2011, 11:40:01 AM11/3/11
to
Hello,

On Thu, Nov 03, 2011 at 11:27:04AM -0400, Jeff Layton wrote:
> Ok, note though that I also need to do a similar set of changes to the
> nfs and sunrpc code [1].
>
> Those call sites do not use wait_event_* macros though. So I'll either
> need to add a separate set of functions/macros to handle those, or
> sprinkle freezer_do_not_count/freezer_count around the code there...

Yes, I still think it would be better to isolate implementation
details from users. I really hate to see the 'count' functions
scattered around the kernel.

Thank you.

--
tejun
0 new messages