Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH 03/10] freezer: add new freezable helpers using freezer_do_not_count()

34 views
Skip to first unread message

Colin Cross

unread,
Apr 29, 2013, 5:46:09 PM4/29/13
to linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Colin Cross, Len Brown, Pavel Machek
Freezing tasks will wake up almost every userspace task from
where it is blocking and force it to run until it hits a
call to try_to_sleep(), generally on the exit path from the syscall
it is blocking in. On resume each task will run again, usually
restarting the syscall and running until it hits the same
blocking call as it was originally blocked in.

To allow tasks to avoid running on every suspend/resume cycle,
this patch adds additional freezable wrappers around blocking calls
that call freezer_do_not_count(). Combined with the previous patch,
these tasks will not run during suspend or resume unless they wake
up for another reason, in which case they will run until they hit
the try_to_freeze() in freezer_count(), and then continue processing
the wakeup after tasks are thawed. This patch also converts the
existing wait_event_freezable* wrappers to use freezer_do_not_count().

Additional patches will convert the most common locations that
userspace blocks in to use freezable helpers.

Signed-off-by: Colin Cross <ccr...@android.com>
---
include/linux/freezer.h | 72 +++++++++++++++++++++++++++++++++++++++----------
1 file changed, 58 insertions(+), 14 deletions(-)

diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index e70df40..b239f45 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -152,6 +152,26 @@ static inline bool freezer_should_skip(struct task_struct *p)
freezer_count(); \
})

+/* Like schedule_timeout(), but should not block the freezer. */
+#define freezable_schedule_timeout(timeout) \
+({ \
+ long __retval; \
+ freezer_do_not_count(); \
+ __retval = schedule_timeout(timeout); \
+ freezer_count(); \
+ __retval; \
+})
+
+/* Like schedule_timeout_interruptible(), but should not block the freezer. */
+#define freezable_schedule_timeout_interruptible(timeout) \
+({ \
+ long __retval; \
+ freezer_do_not_count(); \
+ __retval = schedule_timeout_interruptible(timeout); \
+ freezer_count(); \
+ __retval; \
+})
+
/* Like schedule_timeout_killable(), but should not block the freezer. */
#define freezable_schedule_timeout_killable(timeout) \
({ \
@@ -162,6 +182,17 @@ static inline bool freezer_should_skip(struct task_struct *p)
__retval; \
})

+/* Like schedule_hrtimeout_range(), but should not block the freezer. */
+#define freezable_schedule_hrtimeout_range(expires, delta, mode) \
+({ \
+ int __retval; \
+ freezer_do_not_count(); \
+ __retval = schedule_hrtimeout_range(expires, delta, mode); \
+ freezer_count(); \
+ __retval; \
+})
+
+
/*
* Freezer-friendly wrappers around wait_event_interruptible(),
* wait_event_killable() and wait_event_interruptible_timeout(), originally
@@ -180,30 +211,32 @@ static inline bool freezer_should_skip(struct task_struct *p)
#define wait_event_freezable(wq, condition) \
({ \
int __retval; \
- for (;;) { \
- __retval = wait_event_interruptible(wq, \
- (condition) || freezing(current)); \
- if (__retval || (condition)) \
- break; \
- try_to_freeze(); \
- } \
+ freezer_do_not_count(); \
+ __retval = wait_event_interruptible(wq, (condition)); \
+ freezer_count(); \
__retval; \
})

#define wait_event_freezable_timeout(wq, condition, timeout) \
({ \
long __retval = timeout; \
- for (;;) { \
- __retval = wait_event_interruptible_timeout(wq, \
- (condition) || freezing(current), \
+ freezer_do_not_count(); \
+ __retval = wait_event_interruptible_timeout(wq, (condition), \
__retval); \
- if (__retval <= 0 || (condition)) \
- break; \
- try_to_freeze(); \
- } \
+ freezer_count(); \
__retval; \
})

+#define wait_event_freezable_exclusive(wq, condition) \
+({ \
+ int __retval; \
+ freezer_do_not_count(); \
+ __retval = wait_event_interruptible_exclusive(wq, condition); \
+ freezer_count(); \
+ __retval; \
+})
+
+
#else /* !CONFIG_FREEZER */
static inline bool frozen(struct task_struct *p) { return false; }
static inline bool freezing(struct task_struct *p) { return false; }
@@ -225,15 +258,26 @@ static inline void set_freezable(void) {}

#define freezable_schedule() schedule()

+#define freezable_schedule_timeout(timeout) schedule_timeout(timeout)
+
+#define freezable_schedule_timeout_interruptible(timeout) \
+ schedule_timeout_interruptible(timeout)
+
#define freezable_schedule_timeout_killable(timeout) \
schedule_timeout_killable(timeout)

+#define freezable_schedule_hrtimeout_range(expires, delta, mode) \
+ schedule_hrtimeout_range(expires, delta, mode)
+
#define wait_event_freezable(wq, condition) \
wait_event_interruptible(wq, condition)

#define wait_event_freezable_timeout(wq, condition, timeout) \
wait_event_interruptible_timeout(wq, condition, timeout)

+#define wait_event_freezable_exclusive(wq, condition) \
+ wait_event_interruptible_exclusive(wq, condition)
+
#define wait_event_freezekillable(wq, condition) \
wait_event_killable(wq, condition)

--
1.8.2.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Colin Cross

unread,
Apr 29, 2013, 5:46:11 PM4/29/13
to linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Colin Cross
On slow cpus the large number of task wakeups and context switches
triggered by freezing and thawing tasks can take a significant amount
of cpu time. This patch series reduces the amount of work done during
freezing tasks by avoiding waking up tasks that are already in a freezable
state.

The first patch reduces the wasted time in try_to_freeze_tasks() by
starting with a 1 ms sleep during the first loop and backing off
up to an 8 ms sleep if all tasks are not frozen.

The second patch modifies the try_to_freeze_tasks() loop to skip tasks
that have set the PF_FREEZER_SKIP flag by calling freezer_do_not_count().
These tasks will not enter the refrigerator during the suspend/resume
cycle unless they woken up by something else, in which case they will
enter the refrigerator in freezer_count() before they access any
resources that would not be available in suspend or deadlock with
another freezing/frozen task.

The rest of the series adds a few more freezable helpers and converts the
top call sites that userspace tasks are usually blocked at to freezable
helpers. The list of call sites was collected on a Nexus 10 (ARM Exynos
5250 SoC), but all the top call sites other than binder show up at the
top of the list on Ubuntu x86-64 as well.

This series cuts the time for freezing tasks from 50 ms to 5 ms when
the cpu speed is locked at its lowest setting (200MHz), and reduces
the number of context switches and restarted syscalls from 1000 to
25.

Colin Cross

unread,
Apr 29, 2013, 5:46:55 PM4/29/13
to linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Colin Cross, Tejun Heo, Li Zefan, Len Brown, Pavel Machek, conta...@lists.linux-foundation.org, cgr...@vger.kernel.org
If a task has called freezer_do_not_count(), don't bother waking it
up. If it happens to wake up later it will call freezer_count() and
immediately enter the refrigerator.

Signed-off-by: Colin Cross <ccr...@android.com>
---
kernel/cgroup_freezer.c | 5 ++++-
kernel/power/process.c | 4 ++--
2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 75dda1e..406dd71 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -331,8 +331,11 @@ static void freeze_cgroup(struct freezer *freezer)
struct task_struct *task;

cgroup_iter_start(cgroup, &it);
- while ((task = cgroup_iter_next(cgroup, &it)))
+ while ((task = cgroup_iter_next(cgroup, &it))) {
+ if (freezer_should_skip(task))
+ continue;
freeze_task(task);
+ }
cgroup_iter_end(cgroup, &it);
}

diff --git a/kernel/power/process.c b/kernel/power/process.c
index eb7e6c1..0680be2 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -46,10 +46,10 @@ static int try_to_freeze_tasks(bool user_only)
todo = 0;
read_lock(&tasklist_lock);
do_each_thread(g, p) {
- if (p == current || !freeze_task(p))
+ if (p == current || freezer_should_skip(p))
continue;

- if (!freezer_should_skip(p))
+ if (freeze_task(p))
todo++;
} while_each_thread(g, p);
read_unlock(&tasklist_lock);
--
1.8.2.1

Colin Cross

unread,
Apr 29, 2013, 5:46:55 PM4/29/13
to linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Colin Cross, Alexander Viro, linux-...@vger.kernel.org
Avoid waking up every thread sleeping in a select call during
suspend and resume by calling a freezable blocking call.

Signed-off-by: Colin Cross <ccr...@android.com>
---
fs/select.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/select.c b/fs/select.c
index 8c1c96c..6b14dc7 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -27,6 +27,7 @@
#include <linux/rcupdate.h>
#include <linux/hrtimer.h>
#include <linux/sched/rt.h>
+#include <linux/freezer.h>

#include <asm/uaccess.h>

@@ -236,7 +237,8 @@ int poll_schedule_timeout(struct poll_wqueues *pwq, int state,

set_current_state(state);
if (!pwq->triggered)
- rc = schedule_hrtimeout_range(expires, slack, HRTIMER_MODE_ABS);
+ rc = freezable_schedule_hrtimeout_range(expires, slack,
+ HRTIMER_MODE_ABS);
__set_current_state(TASK_RUNNING);

/*
--
1.8.2.1

Colin Cross

unread,
Apr 29, 2013, 5:47:12 PM4/29/13
to linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Colin Cross, Al Viro, Andrew Morton, Oleg Nesterov, Eric W. Biederman, Serge Hallyn
Avoid waking up every thread sleeping in a sigtimedwait call during
suspend and resume by calling a freezable blocking call.

Signed-off-by: Colin Cross <ccr...@android.com>
---
kernel/signal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 598dc06..10a70a0 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2845,7 +2845,7 @@ int do_sigtimedwait(const sigset_t *which, siginfo_t *info,
recalc_sigpending();
spin_unlock_irq(&tsk->sighand->siglock);

- timeout = schedule_timeout_interruptible(timeout);
+ timeout = freezable_schedule_timeout_interruptible(timeout);

spin_lock_irq(&tsk->sighand->siglock);
__set_task_blocked(tsk, &tsk->real_blocked);
--
1.8.2.1

Colin Cross

unread,
Apr 29, 2013, 5:47:46 PM4/29/13
to linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Colin Cross, Alexander Viro, linux-...@vger.kernel.org
Avoid waking up every thread sleeping in an epoll_wait call during
suspend and resume by calling a freezable blocking call.

Signed-off-by: Colin Cross <ccr...@android.com>
---
fs/eventpoll.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 9fec183..65245e7 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -34,6 +34,7 @@
#include <linux/mutex.h>
#include <linux/anon_inodes.h>
#include <linux/device.h>
+#include <linux/freezer.h>
#include <asm/uaccess.h>
#include <asm/io.h>
#include <asm/mman.h>
@@ -1543,7 +1544,8 @@ fetch_events:
}

spin_unlock_irqrestore(&ep->lock, flags);
- if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS))
+ if (!freezable_schedule_hrtimeout_range(to, slack,
+ HRTIMER_MODE_ABS))
timed_out = 1;

spin_lock_irqsave(&ep->lock, flags);
--
1.8.2.1

Colin Cross

unread,
Apr 29, 2013, 5:51:14 PM4/29/13
to linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Colin Cross, Len Brown, Pavel Machek
All tasks can easily be frozen in under 10 ms, switch to using
an initial 1 ms sleep followed by exponential backoff until
8 ms. Also convert the printed time to ms instead of centiseconds.

Signed-off-by: Colin Cross <ccr...@android.com>
---
kernel/power/process.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/kernel/power/process.c b/kernel/power/process.c
index 98088e0..eb7e6c1 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -30,9 +30,10 @@ static int try_to_freeze_tasks(bool user_only)
unsigned int todo;
bool wq_busy = false;
struct timeval start, end;
- u64 elapsed_csecs64;
- unsigned int elapsed_csecs;
+ u64 elapsed_msecs64;
+ unsigned int elapsed_msecs;
bool wakeup = false;
+ int sleep_usecs = USEC_PER_MSEC;

do_gettimeofday(&start);

@@ -70,20 +71,22 @@ static int try_to_freeze_tasks(bool user_only)
* We need to retry, but first give the freezing tasks some
* time to enter the refrigerator.
*/
- msleep(10);
+ usleep_range(sleep_usecs / 2, sleep_usecs);
+ if (sleep_usecs < 8 * USEC_PER_MSEC)
+ sleep_usecs *= 2;
}

do_gettimeofday(&end);
- elapsed_csecs64 = timeval_to_ns(&end) - timeval_to_ns(&start);
- do_div(elapsed_csecs64, NSEC_PER_SEC / 100);
- elapsed_csecs = elapsed_csecs64;
+ elapsed_msecs64 = timeval_to_ns(&end) - timeval_to_ns(&start);
+ do_div(elapsed_msecs64, NSEC_PER_MSEC);
+ elapsed_msecs = elapsed_msecs64;

if (todo) {
printk("\n");
- printk(KERN_ERR "Freezing of tasks %s after %d.%02d seconds "
+ printk(KERN_ERR "Freezing of tasks %s after %d.%03d seconds "
"(%d tasks refusing to freeze, wq_busy=%d):\n",
wakeup ? "aborted" : "failed",
- elapsed_csecs / 100, elapsed_csecs % 100,
+ elapsed_msecs / 1000, elapsed_msecs % 1000,
todo - wq_busy, wq_busy);

if (!wakeup) {
@@ -96,8 +99,8 @@ static int try_to_freeze_tasks(bool user_only)
read_unlock(&tasklist_lock);
}
} else {
- printk("(elapsed %d.%02d seconds) ", elapsed_csecs / 100,
- elapsed_csecs % 100);
+ printk("(elapsed %d.%03d seconds) ", elapsed_msecs / 1000,
+ elapsed_msecs % 1000);
}

return todo ? -EBUSY : 0;
--
1.8.2.1

Colin Cross

unread,
Apr 29, 2013, 5:51:26 PM4/29/13
to linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Colin Cross, Thomas Gleixner
Avoid waking up every thread sleeping in a nanosleep call during
suspend and resume by calling a freezable blocking call.

Signed-off-by: Colin Cross <ccr...@android.com>
---
kernel/hrtimer.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 14be27f..e036276 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -47,6 +47,7 @@
#include <linux/sched/sysctl.h>
#include <linux/sched/rt.h>
#include <linux/timer.h>
+#include <linux/freezer.h>

#include <asm/uaccess.h>

@@ -1525,7 +1526,7 @@ static int __sched do_nanosleep(struct hrtimer_sleeper *t, enum hrtimer_mode mod
t->task = NULL;

if (likely(t->task))
- schedule();
+ freezable_schedule();

hrtimer_cancel(&t->timer);
mode = HRTIMER_MODE_ABS;
--
1.8.2.1

Tejun Heo

unread,
Apr 29, 2013, 5:52:11 PM4/29/13
to Colin Cross, linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Li Zefan, Len Brown, Pavel Machek, conta...@lists.linux-foundation.org, cgr...@vger.kernel.org
Hello,

On Mon, Apr 29, 2013 at 02:45:38PM -0700, Colin Cross wrote:
> If a task has called freezer_do_not_count(), don't bother waking it
> up. If it happens to wake up later it will call freezer_count() and
> immediately enter the refrigerator.
>
> Signed-off-by: Colin Cross <ccr...@android.com>
> ---
> kernel/cgroup_freezer.c | 5 ++++-
> kernel/power/process.c | 4 ++--
> 2 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index 75dda1e..406dd71 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -331,8 +331,11 @@ static void freeze_cgroup(struct freezer *freezer)
> struct task_struct *task;
>
> cgroup_iter_start(cgroup, &it);
> - while ((task = cgroup_iter_next(cgroup, &it)))
> + while ((task = cgroup_iter_next(cgroup, &it))) {
> + if (freezer_should_skip(task))
> + continue;
> freeze_task(task);
> + }
> cgroup_iter_end(cgroup, &it);

I feel a bit weary of changes which try to optimize state checks for
freezer because the synchronization rules are kinda fragile and things
may not work reliably depending on who's testing the flag, and it has
been subtly broken in various ways in the past (maybe even now). Can
you please explain the benefits of this patch (in terms of actual
overhead because not many use freezer_do_not_count()) and why this is
correct?

Thanks.

--
tejun

Colin Cross

unread,
Apr 29, 2013, 5:53:28 PM4/29/13
to linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Colin Cross, David S. Miller, Eric Dumazet, Al Viro, Eric W. Biederman, Gao feng, net...@vger.kernel.org
Avoid waking up every thread sleeping in read call on an AF_UNIX
socket during suspend and resume by calling a freezable blocking
call.

Signed-off-by: Colin Cross <ccr...@android.com>
---
net/unix/af_unix.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 2db702d..2bcac57 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -114,6 +114,7 @@
#include <linux/mount.h>
#include <net/checksum.h>
#include <linux/security.h>
+#include <linux/freezer.h>

struct hlist_head unix_socket_table[2 * UNIX_HASH_SIZE];
EXPORT_SYMBOL_GPL(unix_socket_table);
@@ -1880,7 +1881,7 @@ static long unix_stream_data_wait(struct sock *sk, long timeo)

set_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
unix_state_unlock(sk);
- timeo = schedule_timeout(timeo);
+ timeo = freezable_schedule_timeout(timeo);
unix_state_lock(sk);
clear_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
}
--
1.8.2.1

Colin Cross

unread,
Apr 29, 2013, 5:54:11 PM4/29/13
to linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Colin Cross, Greg Kroah-Hartman, Al Viro, Eric W. Biederman, Sachin Kamat, de...@driverdev.osuosl.org
Avoid waking up every thread sleeping in a binder call during
suspend and resume by calling a freezable blocking call.

Signed-off-by: Colin Cross <ccr...@android.com>
---
drivers/staging/android/binder.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/android/binder.c b/drivers/staging/android/binder.c
index 24456a0..af8fba4 100644
--- a/drivers/staging/android/binder.c
+++ b/drivers/staging/android/binder.c
@@ -20,6 +20,7 @@
#include <asm/cacheflush.h>
#include <linux/fdtable.h>
#include <linux/file.h>
+#include <linux/freezer.h>
#include <linux/fs.h>
#include <linux/list.h>
#include <linux/miscdevice.h>
@@ -2140,13 +2141,13 @@ retry:
if (!binder_has_proc_work(proc, thread))
ret = -EAGAIN;
} else
- ret = wait_event_interruptible_exclusive(proc->wait, binder_has_proc_work(proc, thread));
+ ret = wait_event_freezable_exclusive(proc->wait, binder_has_proc_work(proc, thread));
} else {
if (non_block) {
if (!binder_has_thread_work(thread))
ret = -EAGAIN;
} else
- ret = wait_event_interruptible(thread->wait, binder_has_thread_work(thread));
+ ret = wait_event_freezable(thread->wait, binder_has_thread_work(thread));
}

binder_lock(__func__);
--
1.8.2.1

Tejun Heo

unread,
Apr 29, 2013, 5:58:09 PM4/29/13
to Colin Cross, linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Li Zefan, Len Brown, Pavel Machek, conta...@lists.linux-foundation.org, cgr...@vger.kernel.org
On Mon, Apr 29, 2013 at 02:51:57PM -0700, Tejun Heo wrote:
> I feel a bit weary of changes which try to optimize state checks for
> freezer because the synchronization rules are kinda fragile and things
> may not work reliably depending on who's testing the flag, and it has
> been subtly broken in various ways in the past (maybe even now). Can

And BTW, this is why the function is only used when checking whether a
task is frozen rather than to decide to issue freeze_task() on it, and
it seems your change is correct now that we don't have per-task FREEZE
flag but I can't say I like the change. I'd really like to keep
things as dumb as possible for freezer.

Colin Cross

unread,
Apr 29, 2013, 6:02:44 PM4/29/13
to Tejun Heo, Linux PM list, lkml, Rafael J. Wysocki, Arve Hjønnevåg, Li Zefan, Len Brown, Pavel Machek, conta...@lists.linux-foundation.org, cgr...@vger.kernel.org
On Mon, Apr 29, 2013 at 2:57 PM, Tejun Heo <t...@kernel.org> wrote:
> On Mon, Apr 29, 2013 at 02:51:57PM -0700, Tejun Heo wrote:
>> I feel a bit weary of changes which try to optimize state checks for
>> freezer because the synchronization rules are kinda fragile and things
>> may not work reliably depending on who's testing the flag, and it has
>> been subtly broken in various ways in the past (maybe even now). Can
>
> And BTW, this is why the function is only used when checking whether a
> task is frozen rather than to decide to issue freeze_task() on it, and
> it seems your change is correct now that we don't have per-task FREEZE
> flag but I can't say I like the change. I'd really like to keep
> things as dumb as possible for freezer.

See the first patch in the series (which isn't available in the
archive yet, so I can't link to it). The short version is that
Android goes through suspend/resume very often (every few seconds when
on a busy wifi network with the screen off), and a significant portion
of the energy used to go in and out of suspend is spent in the
freezer. This patch series takes the most common userspace sleep
points and converts them to PF_FREEZER_SKIP, which reduces the number
of context switches for every suspend or resume event on a
freshly-booted Android device from 1000 to 25, and reduces the time
spent freezing by a factor of 5. It will have a similar effect on a
non-Android system, although those generally don't care about
suspend/resume optimization.

Colin Cross

unread,
Apr 29, 2013, 6:09:05 PM4/29/13
to linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Colin Cross, Darren Hart, Thomas Gleixner, Andrew Morton, Randy Dunlap, Al Viro
Avoid waking up every thread sleeping in a futex_wait call during
suspend and resume by calling a freezable blocking call.

Signed-off-by: Colin Cross <ccr...@android.com>
---
kernel/futex.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index b26dcfc..d710fae 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -61,6 +61,7 @@
#include <linux/nsproxy.h>
#include <linux/ptrace.h>
#include <linux/sched/rt.h>
+#include <linux/freezer.h>

#include <asm/futex.h>

@@ -1807,7 +1808,7 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
* is no timeout, or if it has yet to expire.
*/
if (!timeout || timeout->task)
- schedule();
+ freezable_schedule();
}
__set_current_state(TASK_RUNNING);
}
--
1.8.2.1

Tejun Heo

unread,
Apr 29, 2013, 6:09:06 PM4/29/13
to Colin Cross, Linux PM list, lkml, Rafael J. Wysocki, Arve Hjønnevåg, Li Zefan, Len Brown, Pavel Machek, conta...@lists.linux-foundation.org, cgr...@vger.kernel.org
Hey,

On Mon, Apr 29, 2013 at 03:02:19PM -0700, Colin Cross wrote:
> See the first patch in the series (which isn't available in the
> archive yet, so I can't link to it). The short version is that

It didn't arrive in my lkml folder either. Maybe vger is taking some
time distributing emails.

> Android goes through suspend/resume very often (every few seconds when
> on a busy wifi network with the screen off), and a significant portion
> of the energy used to go in and out of suspend is spent in the
> freezer. This patch series takes the most common userspace sleep
> points and converts them to PF_FREEZER_SKIP, which reduces the number
> of context switches for every suspend or resume event on a
> freshly-booted Android device from 1000 to 25, and reduces the time

Ah, okay, so you're spreading PF_FREEZER_SKIP. When you post patches
which touch the freezer can you please cc me and Oleg Nesterov
<ol...@redhat.com> (I'll ping him this time)? Freezer has been very
subtly broken in various ways and many kthread users are still broken,
so let's tread carefully.

> spent freezing by a factor of 5. It will have a similar effect on a
> non-Android system, although those generally don't care about
> suspend/resume optimization.

Yeah, if it's something which makes actual difference rather than
"this seems to be a good idea" thing, sure, let's find a way.

Thanks.

--
tejun

Tejun Heo

unread,
Apr 29, 2013, 6:16:47 PM4/29/13
to Colin Cross, Linux PM list, lkml, Rafael J. Wysocki, Arve Hjønnevåg, Li Zefan, Len Brown, Pavel Machek, conta...@lists.linux-foundation.org, cgr...@vger.kernel.org
On Mon, Apr 29, 2013 at 03:08:31PM -0700, Tejun Heo wrote:
> > spent freezing by a factor of 5. It will have a similar effect on a
> > non-Android system, although those generally don't care about
> > suspend/resume optimization.
>
> Yeah, if it's something which makes actual difference rather than
> "this seems to be a good idea" thing, sure, let's find a way.

And a probably better approach would be rolling should_skip test into
freeze_task() with a comment explaining the interlocking rather than
adding should_skip test in its callers.

Darren Hart

unread,
Apr 29, 2013, 6:52:48 PM4/29/13
to Colin Cross, linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Thomas Gleixner, Andrew Morton, Randy Dunlap, Al Viro, Matthew Helsley
Colin,

I don't know anything about when or when not to use freezable*, and I
suspect that may be true for others as well. A more complete
description of why it's acceptable here in the commit log might help
expedite acceptance.


Matt,

I have a vague memory of discussing something similar to this with you.
Do you see any potential problems here?

--
Darren
Darren Hart
Intel Open Source Technology Center
Yocto Project - Technical Lead - Linux Kernel

Colin Cross

unread,
Apr 29, 2013, 7:46:36 PM4/29/13
to Darren Hart, Linux PM list, lkml, Rafael J. Wysocki, Arve Hjønnevåg, Thomas Gleixner, Andrew Morton, Randy Dunlap, Al Viro, Matthew Helsley
On Mon, Apr 29, 2013 at 3:52 PM, Darren Hart <dvh...@linux.intel.com> wrote:
> Colin,
>
> I don't know anything about when or when not to use freezable*, and I
> suspect that may be true for others as well. A more complete
> description of why it's acceptable here in the commit log might help
> expedite acceptance.
>

Sure, how about:
This call can be converted to a freezable call because it doesn't hold
any locks or release any resources when interrupted that might be
needed by another freezing task or a kernel driver during suspend.

Rafael J. Wysocki

unread,
Apr 30, 2013, 7:40:50 AM4/30/13
to Tejun Heo, Colin Cross, Linux PM list, lkml, Arve Hjønnevåg, Li Zefan, Len Brown, Pavel Machek, conta...@lists.linux-foundation.org, cgr...@vger.kernel.org
On Monday, April 29, 2013 03:16:24 PM Tejun Heo wrote:
> On Mon, Apr 29, 2013 at 03:08:31PM -0700, Tejun Heo wrote:
> > > spent freezing by a factor of 5. It will have a similar effect on a
> > > non-Android system, although those generally don't care about
> > > suspend/resume optimization.
> >
> > Yeah, if it's something which makes actual difference rather than
> > "this seems to be a good idea" thing, sure, let's find a way.
>
> And a probably better approach would be rolling should_skip test into
> freeze_task() with a comment explaining the interlocking rather than
> adding should_skip test in its callers.

Agreed.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Oleg Nesterov

unread,
Apr 30, 2013, 12:41:49 PM4/30/13
to Colin Cross, linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Al Viro, Andrew Morton, Eric W. Biederman, Serge Hallyn
On 04/29, Colin Cross wrote:
>
> Avoid waking up every thread sleeping in a sigtimedwait call during
> suspend and resume by calling a freezable blocking call.

This doesn't explain why do want this change...

OK, probably to avoid -EAGAIN from sigtimedwait() if the freezer wakes
up the caller.

> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2845,7 +2845,7 @@ int do_sigtimedwait(const sigset_t *which, siginfo_t *info,
> recalc_sigpending();
> spin_unlock_irq(&tsk->sighand->siglock);
>
> - timeout = schedule_timeout_interruptible(timeout);
> + timeout = freezable_schedule_timeout_interruptible(timeout);

And I guess freezable_schedule_timeout_interruptible() is added by
http://marc.info/?l=linux-kernel&m=136727195719575 ...

+#define freezable_schedule_timeout_interruptible(timeout) \
+({ \
+ long __retval; \
+ freezer_do_not_count(); \
+ __retval = schedule_timeout_interruptible(timeout); \
+ freezer_count(); \
+ __retval; \
+})

How this can help?

The task will be interrupted anyway and the syscall will return
-EAGAIN, this only changes the time when try_to_freeze() is called.

For what? The task will call do_signal/try_to_freeze really "soon".

Confused...

Oleg.

Colin Cross

unread,
Apr 30, 2013, 12:58:48 PM4/30/13
to Oleg Nesterov, Linux PM list, lkml, Rafael J. Wysocki, Arve Hjønnevåg, Al Viro, Andrew Morton, Eric W. Biederman, Serge Hallyn
On Tue, Apr 30, 2013 at 9:38 AM, Oleg Nesterov <ol...@redhat.com> wrote:
> On 04/29, Colin Cross wrote:
>>
>> Avoid waking up every thread sleeping in a sigtimedwait call during
>> suspend and resume by calling a freezable blocking call.
>
> This doesn't explain why do want this change...
>
> OK, probably to avoid -EAGAIN from sigtimedwait() if the freezer wakes
> up the caller.

See http://marc.info/?l=linux-pm&m=136727197819593&w=2 for the full
justification. I will include a fuller description of the reason for
this patch in the next version.

>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -2845,7 +2845,7 @@ int do_sigtimedwait(const sigset_t *which, siginfo_t *info,
>> recalc_sigpending();
>> spin_unlock_irq(&tsk->sighand->siglock);
>>
>> - timeout = schedule_timeout_interruptible(timeout);
>> + timeout = freezable_schedule_timeout_interruptible(timeout);
>
> And I guess freezable_schedule_timeout_interruptible() is added by
> http://marc.info/?l=linux-kernel&m=136727195719575 ...
>
> +#define freezable_schedule_timeout_interruptible(timeout) \
> +({ \
> + long __retval; \
> + freezer_do_not_count(); \
> + __retval = schedule_timeout_interruptible(timeout); \
> + freezer_count(); \
> + __retval; \
> +})
>
> How this can help?
>
> The task will be interrupted anyway and the syscall will return
> -EAGAIN, this only changes the time when try_to_freeze() is called.
>
> For what? The task will call do_signal/try_to_freeze really "soon".

See http://marc.info/?l=linux-pm&m=136727204919622&w=2, which removes
the wakeup sent to skipped tasks, so schedule_timeout_interruptible()
will only return if the timeout finishes or another task (not the
freezer) calls wake_up on the task. If that happens during freezing
or while frozen freezer_count() will freeze. If another task does not
wake up this task while frozen, this task will not need to run at all
during suspend or resume, saving cpu cycles, context switches, and
power.

Oleg Nesterov

unread,
Apr 30, 2013, 12:59:36 PM4/30/13
to Colin Cross, linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Al Viro, Andrew Morton, Eric W. Biederman, Serge Hallyn
OK, I wasn't cced, I have found another patch
http://marc.info/?l=linux-kernel&m=136727200519602 which should make
a difference, it won't be woken if PF_FREEZER_SKIP was already set.

This is racy, but it seems that "avoid -EAGAIN" was not your goal...

> For what? The task will call do_signal/try_to_freeze really "soon".

It seems that you want to speed up the freezing.

Oleg Nesterov

unread,
Apr 30, 2013, 1:03:42 PM4/30/13
to Colin Cross, Linux PM list, lkml, Rafael J. Wysocki, Arve Hjønnevåg, Al Viro, Andrew Morton, Eric W. Biederman, Serge Hallyn
On 04/30, Colin Cross wrote:
> On Tue, Apr 30, 2013 at 9:38 AM, Oleg Nesterov <ol...@redhat.com> wrote:
> > On 04/29, Colin Cross wrote:
> >>
> >> Avoid waking up every thread sleeping in a sigtimedwait call during
> >> suspend and resume by calling a freezable blocking call.
> >
> > This doesn't explain why do want this change...
> >
> > OK, probably to avoid -EAGAIN from sigtimedwait() if the freezer wakes
> > up the caller.
>
> See http://marc.info/?l=linux-pm&m=136727197819593&w=2 for the full
> justification. I will include a fuller description of the reason for
> this patch in the next version.

Yes, thanks, I already realized what are you trying to do.

> > The task will be interrupted anyway and the syscall will return
> > -EAGAIN, this only changes the time when try_to_freeze() is called.
> >
> > For what? The task will call do_signal/try_to_freeze really "soon".
>
> See http://marc.info/?l=linux-pm&m=136727204919622&w=2, which removes
> the wakeup sent to skipped tasks, so schedule_timeout_interruptible()
> will only return if the timeout finishes or another task

Or if freeze_task() was already called.

but I guess you do not care.

Oleg Nesterov

unread,
Apr 30, 2013, 1:14:50 PM4/30/13
to Colin Cross, linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Tejun Heo, Li Zefan, Len Brown, Pavel Machek, conta...@redhat.com
On 04/29, Colin Cross wrote:
>
> @@ -46,10 +46,10 @@ static int try_to_freeze_tasks(bool user_only)
> todo = 0;
> read_lock(&tasklist_lock);
> do_each_thread(g, p) {
> - if (p == current || !freeze_task(p))
> + if (p == current || freezer_should_skip(p))
> continue;
>
> - if (!freezer_should_skip(p))
> + if (freeze_task(p))
> todo++;

What if we race with freezer_count() ?

try_to_freeze_tasks() can wrongly succeed leaving the running task
unfrozen, no?

Oleg.

Oleg Nesterov

unread,
Apr 30, 2013, 1:18:48 PM4/30/13
to Colin Cross, linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Tejun Heo, Li Zefan, Len Brown, Pavel Machek, conta...@redhat.com
On 04/30, Oleg Nesterov wrote:
>
> On 04/29, Colin Cross wrote:
> >
> > @@ -46,10 +46,10 @@ static int try_to_freeze_tasks(bool user_only)
> > todo = 0;
> > read_lock(&tasklist_lock);
> > do_each_thread(g, p) {
> > - if (p == current || !freeze_task(p))
> > + if (p == current || freezer_should_skip(p))
> > continue;
> >
> > - if (!freezer_should_skip(p))
> > + if (freeze_task(p))
> > todo++;
>
> What if we race with freezer_count() ?
>
> try_to_freeze_tasks() can wrongly succeed leaving the running task
> unfrozen, no?

Ah, no, sorry for noise.

Colin Cross

unread,
May 1, 2013, 9:35:40 PM5/1/13
to linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Tejun Heo, Oleg Nesterov, Colin Cross, Pavel Machek
Android goes through suspend/resume very often (every few seconds when
on a busy wifi network with the screen off), and a significant portion
of the energy used to go in and out of suspend is spent in the
freezer. If a task has called freezer_do_not_count(), don't bother
waking it up. If it happens to wake up later it will call
freezer_count() and immediately enter the refrigerator.

Combined with patches to convert freezable helpers to use
freezer_do_not_count() and convert common sites where idle userspace
tasks are blocked to use the freezable helpers, this reduces the
time and energy required to suspend and resume.

Signed-off-by: Colin Cross <ccr...@android.com>
---
v2: move check to freeze_task()

kernel/freezer.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/kernel/freezer.c b/kernel/freezer.c
index c38893b..8b2afc1 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -110,6 +110,18 @@ bool freeze_task(struct task_struct *p)
{
unsigned long flags;

+ /*
+ * This check can race with freezer_do_not_count, but worst case that
+ * will result in an extra wakeup being sent to the task. It does not
+ * race with freezer_count(), the barriers in freezer_count() and
+ * freezer_should_skip() ensure that either freezer_count() sees
+ * freezing == true in try_to_freeze() and freezes, or
+ * freezer_should_skip() sees !PF_FREEZE_SKIP and freezes the task
+ * normally.
+ */
+ if (freezer_should_skip(p))
+ return false;
+
spin_lock_irqsave(&freezer_lock, flags);
if (!freezing(p) || frozen(p)) {
spin_unlock_irqrestore(&freezer_lock, flags);
--
1.8.2.1

Colin Cross

unread,
May 1, 2013, 9:35:41 PM5/1/13
to linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Tejun Heo, Oleg Nesterov, Colin Cross, Len Brown, Pavel Machek
Freezing tasks will wake up almost every userspace task from
where it is blocking and force it to run until it hits a
call to try_to_sleep(), generally on the exit path from the syscall
it is blocking in. On resume each task will run again, usually
restarting the syscall and running until it hits the same
blocking call as it was originally blocked in.

To allow tasks to avoid running on every suspend/resume cycle,
this patch adds additional freezable wrappers around blocking calls
that call freezer_do_not_count(). Combined with the previous patch,
these tasks will not run during suspend or resume unless they wake
up for another reason, in which case they will run until they hit
the try_to_freeze() in freezer_count(), and then continue processing
the wakeup after tasks are thawed. This patch also converts the
existing wait_event_freezable* wrappers to use freezer_do_not_count().

Additional patches will convert the most common locations that
userspace blocks in to use freezable helpers.

Signed-off-by: Colin Cross <ccr...@android.com>
---
include/linux/freezer.h | 72 +++++++++++++++++++++++++++++++++++++++----------
1 file changed, 58 insertions(+), 14 deletions(-)

diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index e70df40..b239f45 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -152,6 +152,26 @@ static inline bool freezer_should_skip(struct task_struct *p)
freezer_count(); \
})

+/* Like schedule_timeout(), but should not block the freezer. */
+#define freezable_schedule_timeout(timeout) \
+({ \
+ long __retval; \
+ freezer_do_not_count(); \
+ __retval = schedule_timeout(timeout); \
+ freezer_count(); \
+ __retval; \
+})
+
+/* Like schedule_timeout_interruptible(), but should not block the freezer. */
+#define freezable_schedule_timeout_interruptible(timeout) \
+({ \
+ long __retval; \
+ freezer_do_not_count(); \
+ __retval = schedule_timeout_interruptible(timeout); \
+ freezer_count(); \
+ __retval; \
+})
+
/* Like schedule_timeout_killable(), but should not block the freezer. */
#define freezable_schedule_timeout_killable(timeout) \
({ \
@@ -162,6 +182,17 @@ static inline bool freezer_should_skip(struct task_struct *p)
__retval; \
})

+/* Like schedule_hrtimeout_range(), but should not block the freezer. */
+#define freezable_schedule_hrtimeout_range(expires, delta, mode) \
+({ \
+ int __retval; \
+ freezer_do_not_count(); \
+ __retval = schedule_hrtimeout_range(expires, delta, mode); \
+ freezer_count(); \
+ __retval; \
+})
+
+
/*
* Freezer-friendly wrappers around wait_event_interruptible(),
* wait_event_killable() and wait_event_interruptible_timeout(), originally
@@ -180,30 +211,32 @@ static inline bool freezer_should_skip(struct task_struct *p)
#define wait_event_freezable(wq, condition) \
({ \
int __retval; \
- for (;;) { \
- __retval = wait_event_interruptible(wq, \
- (condition) || freezing(current)); \
- if (__retval || (condition)) \
- break; \
- try_to_freeze(); \
- } \
+ freezer_do_not_count(); \
+ __retval = wait_event_interruptible(wq, (condition)); \
+ freezer_count(); \
__retval; \
})

#define wait_event_freezable_timeout(wq, condition, timeout) \
({ \
long __retval = timeout; \
- for (;;) { \
- __retval = wait_event_interruptible_timeout(wq, \
- (condition) || freezing(current), \
+ freezer_do_not_count(); \
+ __retval = wait_event_interruptible_timeout(wq, (condition), \
__retval); \
- if (__retval <= 0 || (condition)) \
- break; \
- try_to_freeze(); \
- } \
+ freezer_count(); \
__retval; \
})

+#define wait_event_freezable_exclusive(wq, condition) \
+({ \
+ int __retval; \
+ freezer_do_not_count(); \
+ __retval = wait_event_interruptible_exclusive(wq, condition); \
+ freezer_count(); \
+ __retval; \
+})
+
+
#else /* !CONFIG_FREEZER */
static inline bool frozen(struct task_struct *p) { return false; }
static inline bool freezing(struct task_struct *p) { return false; }
@@ -225,15 +258,26 @@ static inline void set_freezable(void) {}

#define freezable_schedule() schedule()

+#define freezable_schedule_timeout(timeout) schedule_timeout(timeout)
+
+#define freezable_schedule_timeout_interruptible(timeout) \
+ schedule_timeout_interruptible(timeout)
+
#define freezable_schedule_timeout_killable(timeout) \
schedule_timeout_killable(timeout)

+#define freezable_schedule_hrtimeout_range(expires, delta, mode) \
+ schedule_hrtimeout_range(expires, delta, mode)
+
#define wait_event_freezable(wq, condition) \
wait_event_interruptible(wq, condition)

#define wait_event_freezable_timeout(wq, condition, timeout) \
wait_event_interruptible_timeout(wq, condition, timeout)

+#define wait_event_freezable_exclusive(wq, condition) \
+ wait_event_interruptible_exclusive(wq, condition)
+
#define wait_event_freezekillable(wq, condition) \
wait_event_killable(wq, condition)

Colin Cross

unread,
May 1, 2013, 9:35:41 PM5/1/13
to linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Tejun Heo, Oleg Nesterov, Colin Cross, Al Viro, Andrew Morton, Eric W. Biederman, Serge Hallyn
Avoid waking up every thread sleeping in a sigtimedwait call during
suspend and resume by calling a freezable blocking call. Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.

This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.

Signed-off-by: Colin Cross <ccr...@android.com>
---
kernel/signal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 598dc06..10a70a0 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2845,7 +2845,7 @@ int do_sigtimedwait(const sigset_t *which, siginfo_t *info,
recalc_sigpending();
spin_unlock_irq(&tsk->sighand->siglock);

- timeout = schedule_timeout_interruptible(timeout);
+ timeout = freezable_schedule_timeout_interruptible(timeout);

spin_lock_irq(&tsk->sighand->siglock);
__set_task_blocked(tsk, &tsk->real_blocked);

Colin Cross

unread,
May 1, 2013, 9:36:11 PM5/1/13
to linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Tejun Heo, Oleg Nesterov, Colin Cross, David S. Miller, Eric Dumazet, Al Viro, Eric W. Biederman, Gao feng, net...@vger.kernel.org
Avoid waking up every thread sleeping in read call on an AF_UNIX
socket during suspend and resume by calling a freezable blocking
call. Previous patches modified the freezer to avoid sending
wakeups to threads that are blocked in freezable blocking calls.

This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.

Signed-off-by: Colin Cross <ccr...@android.com>
---
net/unix/af_unix.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 2db702d..2bcac57 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -114,6 +114,7 @@
#include <linux/mount.h>
#include <net/checksum.h>
#include <linux/security.h>
+#include <linux/freezer.h>

struct hlist_head unix_socket_table[2 * UNIX_HASH_SIZE];
EXPORT_SYMBOL_GPL(unix_socket_table);
@@ -1880,7 +1881,7 @@ static long unix_stream_data_wait(struct sock *sk, long timeo)

set_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
unix_state_unlock(sk);
- timeo = schedule_timeout(timeo);
+ timeo = freezable_schedule_timeout(timeo);
unix_state_lock(sk);
clear_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
}

Colin Cross

unread,
May 1, 2013, 9:36:37 PM5/1/13
to linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Tejun Heo, Oleg Nesterov, Colin Cross, Thomas Gleixner
Avoid waking up every thread sleeping in a nanosleep call during
suspend and resume by calling a freezable blocking call. Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.

This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.

Signed-off-by: Colin Cross <ccr...@android.com>
---
kernel/hrtimer.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 14be27f..e036276 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -47,6 +47,7 @@
#include <linux/sched/sysctl.h>
#include <linux/sched/rt.h>
#include <linux/timer.h>
+#include <linux/freezer.h>

#include <asm/uaccess.h>

@@ -1525,7 +1526,7 @@ static int __sched do_nanosleep(struct hrtimer_sleeper *t, enum hrtimer_mode mod
t->task = NULL;

if (likely(t->task))
- schedule();
+ freezable_schedule();

hrtimer_cancel(&t->timer);
mode = HRTIMER_MODE_ABS;

Colin Cross

unread,
May 1, 2013, 9:36:59 PM5/1/13
to linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Tejun Heo, Oleg Nesterov, Colin Cross, Alexander Viro, linux-...@vger.kernel.org
Avoid waking up every thread sleeping in a select call during
suspend and resume by calling a freezable blocking call. Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.

This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.

Signed-off-by: Colin Cross <ccr...@android.com>
---
fs/select.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/select.c b/fs/select.c
index 8c1c96c..6b14dc7 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -27,6 +27,7 @@
#include <linux/rcupdate.h>
#include <linux/hrtimer.h>
#include <linux/sched/rt.h>
+#include <linux/freezer.h>

#include <asm/uaccess.h>

@@ -236,7 +237,8 @@ int poll_schedule_timeout(struct poll_wqueues *pwq, int state,

set_current_state(state);
if (!pwq->triggered)
- rc = schedule_hrtimeout_range(expires, slack, HRTIMER_MODE_ABS);
+ rc = freezable_schedule_hrtimeout_range(expires, slack,
+ HRTIMER_MODE_ABS);
__set_current_state(TASK_RUNNING);

/*

Colin Cross

unread,
May 1, 2013, 9:37:01 PM5/1/13
to linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Tejun Heo, Oleg Nesterov, Colin Cross, Alexander Viro, linux-...@vger.kernel.org
Avoid waking up every thread sleeping in an epoll_wait call during
suspend and resume by calling a freezable blocking call. Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.

This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.

Signed-off-by: Colin Cross <ccr...@android.com>
---
fs/eventpoll.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 9fec183..65245e7 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -34,6 +34,7 @@
#include <linux/mutex.h>
#include <linux/anon_inodes.h>
#include <linux/device.h>
+#include <linux/freezer.h>
#include <asm/uaccess.h>
#include <asm/io.h>
#include <asm/mman.h>
@@ -1543,7 +1544,8 @@ fetch_events:
}

spin_unlock_irqrestore(&ep->lock, flags);
- if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS))
+ if (!freezable_schedule_hrtimeout_range(to, slack,
+ HRTIMER_MODE_ABS))
timed_out = 1;

spin_lock_irqsave(&ep->lock, flags);

Colin Cross

unread,
May 1, 2013, 9:41:05 PM5/1/13
to linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Tejun Heo, Oleg Nesterov, Colin Cross, Greg Kroah-Hartman, Al Viro, Eric W. Biederman, Sachin Kamat, de...@driverdev.osuosl.org
Avoid waking up every thread sleeping in a binder call during
suspend and resume by calling a freezable blocking call. Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.

This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.

Signed-off-by: Colin Cross <ccr...@android.com>
---

Colin Cross

unread,
May 1, 2013, 9:43:13 PM5/1/13
to linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Tejun Heo, Oleg Nesterov, Colin Cross, Darren Hart, Thomas Gleixner, Andrew Morton, Randy Dunlap, Al Viro
Avoid waking up every thread sleeping in a futex_wait call during
suspend and resume by calling a freezable blocking call. Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.

This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.

Signed-off-by: Colin Cross <ccr...@android.com>
---
kernel/futex.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index b26dcfc..d710fae 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -61,6 +61,7 @@
#include <linux/nsproxy.h>
#include <linux/ptrace.h>
#include <linux/sched/rt.h>
+#include <linux/freezer.h>

#include <asm/futex.h>

@@ -1807,7 +1808,7 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
* is no timeout, or if it has yet to expire.
*/
if (!timeout || timeout->task)
- schedule();
+ freezable_schedule();
}
__set_current_state(TASK_RUNNING);
}

Colin Cross

unread,
May 1, 2013, 9:43:14 PM5/1/13
to linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Tejun Heo, Oleg Nesterov, Colin Cross, Len Brown, Pavel Machek
All tasks can easily be frozen in under 10 ms, switch to using
an initial 1 ms sleep followed by exponential backoff until
8 ms. Also convert the printed time to ms instead of centiseconds.

Signed-off-by: Colin Cross <ccr...@android.com>
---
kernel/power/process.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/kernel/power/process.c b/kernel/power/process.c
index 98088e0..eb7e6c1 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -30,9 +30,10 @@ static int try_to_freeze_tasks(bool user_only)
unsigned int todo;
bool wq_busy = false;
struct timeval start, end;
- u64 elapsed_csecs64;
- unsigned int elapsed_csecs;
+ u64 elapsed_msecs64;
+ unsigned int elapsed_msecs;
bool wakeup = false;
+ int sleep_usecs = USEC_PER_MSEC;

do_gettimeofday(&start);

@@ -70,20 +71,22 @@ static int try_to_freeze_tasks(bool user_only)
* We need to retry, but first give the freezing tasks some
* time to enter the refrigerator.
*/
- msleep(10);
+ usleep_range(sleep_usecs / 2, sleep_usecs);
+ if (sleep_usecs < 8 * USEC_PER_MSEC)
+ sleep_usecs *= 2;
}

do_gettimeofday(&end);
- elapsed_csecs64 = timeval_to_ns(&end) - timeval_to_ns(&start);
- do_div(elapsed_csecs64, NSEC_PER_SEC / 100);
- elapsed_csecs = elapsed_csecs64;
+ elapsed_msecs64 = timeval_to_ns(&end) - timeval_to_ns(&start);
+ do_div(elapsed_msecs64, NSEC_PER_MSEC);
+ elapsed_msecs = elapsed_msecs64;

if (todo) {
printk("\n");
- printk(KERN_ERR "Freezing of tasks %s after %d.%02d seconds "
+ printk(KERN_ERR "Freezing of tasks %s after %d.%03d seconds "
"(%d tasks refusing to freeze, wq_busy=%d):\n",
wakeup ? "aborted" : "failed",
- elapsed_csecs / 100, elapsed_csecs % 100,
+ elapsed_msecs / 1000, elapsed_msecs % 1000,
todo - wq_busy, wq_busy);

if (!wakeup) {
@@ -96,8 +99,8 @@ static int try_to_freeze_tasks(bool user_only)
read_unlock(&tasklist_lock);
}
} else {
- printk("(elapsed %d.%02d seconds) ", elapsed_csecs / 100,
- elapsed_csecs % 100);
+ printk("(elapsed %d.%03d seconds) ", elapsed_msecs / 1000,
+ elapsed_msecs % 1000);
}

return todo ? -EBUSY : 0;

Colin Cross

unread,
May 1, 2013, 9:43:28 PM5/1/13
to linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Tejun Heo, Oleg Nesterov, Colin Cross
On slow cpus the large number of task wakeups and context switches
triggered by freezing and thawing tasks can take a significant amount
of cpu time. This patch series reduces the amount of work done during
freezing tasks by avoiding waking up tasks that are already in a freezable
state.

The first patch reduces the wasted time in try_to_freeze_tasks() by
starting with a 1 ms sleep during the first loop and backing off
up to an 8 ms sleep if all tasks are not frozen.

The second patch modifies the freeze_task() function to skip tasks
that have set the PF_FREEZER_SKIP flag by calling freezer_do_not_count().
These tasks will not enter the refrigerator during the suspend/resume
cycle unless they woken up by something else, in which case they will
enter the refrigerator in freezer_count() before they access any
resources that would not be available in suspend or deadlock with
another freezing/frozen task.

The rest of the series adds a few more freezable helpers and converts the
top call sites that userspace tasks are usually blocked at to freezable
helpers. The list of call sites was collected on a Nexus 10 (ARM Exynos
5250 SoC), but all the top call sites other than binder show up at the
top of the list on Ubuntu x86-64 as well.

This series cuts the time for freezing tasks from 50 ms to 5 ms when
the cpu speed is locked at its lowest setting (200MHz), and reduces
the number of context switches and restarted syscalls from 1000 to
25.

v2 moves the skip check to freeze_task(), and expands the commit
messages.

Pavel Machek

unread,
May 2, 2013, 8:28:10 AM5/2/13
to Colin Cross, linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Len Brown
On Mon 2013-04-29 14:45:37, Colin Cross wrote:
> All tasks can easily be frozen in under 10 ms, switch to using
> an initial 1 ms sleep followed by exponential backoff until
> 8 ms. Also convert the printed time to ms instead of centiseconds.
>
> Signed-off-by: Colin Cross <ccr...@android.com>

Acked-by: Pavel Machek <pa...@ucw.cz>

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Pavel Machek

unread,
May 2, 2013, 8:48:50 AM5/2/13
to Colin Cross, linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Len Brown
On Mon 2013-04-29 14:45:39, Colin Cross wrote:
> Freezing tasks will wake up almost every userspace task from
> where it is blocking and force it to run until it hits a
> call to try_to_sleep(), generally on the exit path from the syscall
> it is blocking in. On resume each task will run again, usually
> restarting the syscall and running until it hits the same
> blocking call as it was originally blocked in.

Ok, so you are optimizing suspend at the cost of runtime operations,
right?

Would it make sense to do suspends entirely without freezer in your
configurations? With the right drivers, it should work ok.

Actually, would it make sense to simply enter deep idle and do runtime
powersaving in the drivers... n900 does that rather successfully and
it is certainly cleaner design.

> +/* Like schedule_timeout(), but should not block the freezer. */
> +#define freezable_schedule_timeout(timeout) \
> +({ \
> + long __retval; \
> + freezer_do_not_count(); \
> + __retval = schedule_timeout(timeout); \
> + freezer_count(); \
> + __retval; \
> +})

You plan to use this a lot during normal operation, right? Is it going
to have some measureable impact?

Also, why not static inline?
Pavel

Oliver Neukum

unread,
May 2, 2013, 9:04:40 AM5/2/13
to Pavel Machek, Colin Cross, linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Len Brown
On Thursday 02 May 2013 14:48:26 Pavel Machek wrote:
> On Mon 2013-04-29 14:45:39, Colin Cross wrote:
> > Freezing tasks will wake up almost every userspace task from
> > where it is blocking and force it to run until it hits a
> > call to try_to_sleep(), generally on the exit path from the syscall
> > it is blocking in. On resume each task will run again, usually
> > restarting the syscall and running until it hits the same
> > blocking call as it was originally blocked in.
>
> Ok, so you are optimizing suspend at the cost of runtime operations,
> right?
>
> Would it make sense to do suspends entirely without freezer in your
> configurations? With the right drivers, it should work ok.

Right now drivers now that they will not be busy when runtime
suspend happens. The freezer has the same effect for system PM.
If you remove that certainty it becomes impossible for simple drivers
to declare their devices busy upon open and do no synchronization
between IO and PM.

Regards
Oliver

Pavel Machek

unread,
May 2, 2013, 9:47:03 AM5/2/13
to Oliver Neukum, Colin Cross, linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Len Brown
On Thu 2013-05-02 15:05:13, Oliver Neukum wrote:
> On Thursday 02 May 2013 14:48:26 Pavel Machek wrote:
> > On Mon 2013-04-29 14:45:39, Colin Cross wrote:
> > > Freezing tasks will wake up almost every userspace task from
> > > where it is blocking and force it to run until it hits a
> > > call to try_to_sleep(), generally on the exit path from the syscall
> > > it is blocking in. On resume each task will run again, usually
> > > restarting the syscall and running until it hits the same
> > > blocking call as it was originally blocked in.
> >
> > Ok, so you are optimizing suspend at the cost of runtime operations,
> > right?
> >
> > Would it make sense to do suspends entirely without freezer in your
> > configurations? With the right drivers, it should work ok.
>
> Right now drivers now that they will not be busy when runtime
> suspend happens. The freezer has the same effect for system PM.
> If you remove that certainty it becomes impossible for simple drivers
> to declare their devices busy upon open and do no synchronization
> between IO and PM.

I know. Drivers can mostly ignore suspend.

But android people do suspend multiple times a second, and are
optimizing freezer. I'm suggesting they should take a look at their
drivers, and perhaps they can optimize freezer out totally.

(Or perhaps switch to n900-like solution, and avoid suspend
entirely. They keep machines functioning in suspend mode. That aint no
suspend.)

Colin Cross

unread,
May 2, 2013, 1:06:02 PM5/2/13
to Pavel Machek, Linux PM list, lkml, Rafael J. Wysocki, Arve Hjønnevåg, Len Brown
On Thu, May 2, 2013 at 5:48 AM, Pavel Machek <pa...@ucw.cz> wrote:
> On Mon 2013-04-29 14:45:39, Colin Cross wrote:
>> Freezing tasks will wake up almost every userspace task from
>> where it is blocking and force it to run until it hits a
>> call to try_to_sleep(), generally on the exit path from the syscall
>> it is blocking in. On resume each task will run again, usually
>> restarting the syscall and running until it hits the same
>> blocking call as it was originally blocked in.
>
> Ok, so you are optimizing suspend at the cost of runtime operations,
> right?
The runtime operations are negligible, it is setting a bit before
blocking, and then clearing the bit and checking a single global
counter as the fast-path after blocking. Both will be lost in the
noise of the context switch that is happening.

> Would it make sense to do suspends entirely without freezer in your
> configurations? With the right drivers, it should work ok.

No, we need userspace tasks to freeze. That is one of the main
reasons we go to suspend, to prevent recurring timers from firing.

> Actually, would it make sense to simply enter deep idle and do runtime
> powersaving in the drivers... n900 does that rather successfully and
> it is certainly cleaner design.

It depends on the SoC if that is even possible, and if we did do it we
would still use cgroup freezers to prevent apps from running and
eating the battery so this patch would still be applicable.

>> +/* Like schedule_timeout(), but should not block the freezer. */
>> +#define freezable_schedule_timeout(timeout) \
>> +({ \
>> + long __retval; \
>> + freezer_do_not_count(); \
>> + __retval = schedule_timeout(timeout); \
>> + freezer_count(); \
>> + __retval; \
>> +})
>
> You plan to use this a lot during normal operation, right? Is it going
> to have some measureable impact?

It has a very measurable impact on suspend/resume operations (reducing
freeze time by a factor of 10, reducing context switches from 1000 to
25). I haven't measured the impact on total energy used for a
suspend/resume cycle, but based on previous measurements I expect it
to be beneficial.

> Also, why not static inline?

I copied the style of the existing helpers. I can change them all to
static inlines if you prefer.

Thomas Gleixner

unread,
May 2, 2013, 3:46:22 PM5/2/13
to Colin Cross, linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Tejun Heo, Oleg Nesterov, Darren Hart, Andrew Morton, Randy Dunlap, Al Viro
On Wed, 1 May 2013, Colin Cross wrote:

> Avoid waking up every thread sleeping in a futex_wait call during
> suspend and resume by calling a freezable blocking call. Previous
> patches modified the freezer to avoid sending wakeups to threads
> that are blocked in freezable blocking calls.
>
> This call was selected to be converted to a freezable call because
> it doesn't hold any locks or release any resources when interrupted
> that might be needed by another freezing task or a kernel driver
> during suspend, and is a common site where idle userspace tasks are
> blocked.
>
> Signed-off-by: Colin Cross <ccr...@android.com>

Acked-by: Thomas Gleixner <tg...@linutronix.de>

Thomas Gleixner

unread,
May 2, 2013, 3:46:25 PM5/2/13
to Colin Cross, linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Tejun Heo, Oleg Nesterov


On Wed, 1 May 2013, Colin Cross wrote:

> Avoid waking up every thread sleeping in a nanosleep call during
> suspend and resume by calling a freezable blocking call. Previous
> patches modified the freezer to avoid sending wakeups to threads
> that are blocked in freezable blocking calls.
>
> This call was selected to be converted to a freezable call because
> it doesn't hold any locks or release any resources when interrupted
> that might be needed by another freezing task or a kernel driver
> during suspend, and is a common site where idle userspace tasks are
> blocked.
>
> Signed-off-by: Colin Cross <ccr...@android.com>

Acked-by: Thomas Gleixner <tg...@linutronix.de>

Rafael J. Wysocki

unread,
May 2, 2013, 4:58:37 PM5/2/13
to Colin Cross, Tejun Heo, Oleg Nesterov, linu...@vger.kernel.org, linux-...@vger.kernel.org, ar...@android.com
The entire series makes sense to me, so are there any objections?

Tejun, Oleg??

Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Tejun Heo

unread,
May 2, 2013, 7:21:01 PM5/2/13
to Colin Cross, linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Oleg Nesterov, Len Brown, Pavel Machek
Hello,

On Wed, May 01, 2013 at 06:34:59PM -0700, Colin Cross wrote:
> All tasks can easily be frozen in under 10 ms, switch to using
> an initial 1 ms sleep followed by exponential backoff until
> 8 ms. Also convert the printed time to ms instead of centiseconds.

Can you please put the above as comment in the code? Other than that,

Acked-by: Tejun Heo <t...@kernel.org>

Thanks.

--
tejun

Tejun Heo

unread,
May 2, 2013, 7:24:32 PM5/2/13
to Colin Cross, linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Oleg Nesterov, Pavel Machek
Maybe a line or two explaining that this matters for power saving?
Other than that,

Acked-by: Tejun Heo <t...@kernel.org>

Oleg, this looks correct to me. Can you please ack too?

Thanks.

--
tejun

Tejun Heo

unread,
May 2, 2013, 7:55:32 PM5/2/13
to Colin Cross, linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Oleg Nesterov, Len Brown, Pavel Machek, Jeff Layton
Hello,

On Wed, May 01, 2013 at 06:35:01PM -0700, Colin Cross wrote:
> To allow tasks to avoid running on every suspend/resume cycle,
> this patch adds additional freezable wrappers around blocking calls
> that call freezer_do_not_count(). Combined with the previous patch,
> these tasks will not run during suspend or resume unless they wake
> up for another reason, in which case they will run until they hit
> the try_to_freeze() in freezer_count(), and then continue processing
> the wakeup after tasks are thawed. This patch also converts the
> existing wait_event_freezable* wrappers to use freezer_do_not_count().

I'd much prefer the latter part being a separate patch because the
change isn't really trivial and it isn't exactly equivalent - before
we prioritized meeting the condition over freezing, now it's the other
way around. It's fine but does deserve description in the log so that
if something gets tracked down to the commit, it's easy to tell how
the behavior flipped and why.

> +/* Like schedule_timeout(), but should not block the freezer. */
> +#define freezable_schedule_timeout(timeout) \
> +({ \
> + long __retval; \
> + freezer_do_not_count(); \
> + __retval = schedule_timeout(timeout); \
> + freezer_count(); \
> + __retval; \
> +})
> +
> +/* Like schedule_timeout_interruptible(), but should not block the freezer. */
> +#define freezable_schedule_timeout_interruptible(timeout) \
> +({ \
> + long __retval; \
> + freezer_do_not_count(); \
> + __retval = schedule_timeout_interruptible(timeout); \
> + freezer_count(); \
> + __retval; \
> +})
..
> +/* Like schedule_hrtimeout_range(), but should not block the freezer. */
> +#define freezable_schedule_hrtimeout_range(expires, delta, mode) \
> +({ \
> + int __retval; \
> + freezer_do_not_count(); \
> + __retval = schedule_hrtimeout_range(expires, delta, mode); \
> + freezer_count(); \
> + __retval; \
> +})

(cc'ing Jeff Layton)

So, one worry that I have about this is that freezer essentially
behaves like a huge lock that everyone grabs and while scattering
try_to_freeze() calls around might seem innocuous, it effectively
adding a dependency to that single giant lock to that place, so
whenever you're adding try_to_freeze() call while holding any kind of
locks, it substiantially increases the chance of possible deadlock
scenarios. NFS recently got bitten by it and now is trying to get rid
of them as it's fundamentally broken.

So, the freezable interface can't be something that people can use
casually. It is something which should be carefully and strategically
deployed where we *know* that lock dependency risks don't exist or at
least are acceptable. I'm a bit weary that this patch is expanding
the interface a lot that they now look like the equivalents of normal
schedule calls. Not exactly sure what to do here but can we please at
least have RED BOLD BLINKING comments which scream to people not to
use these unless they know what they're doing?

Thanks.

--
tejun

Tejun Heo

unread,
May 2, 2013, 8:04:27 PM5/2/13
to Colin Cross, linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Oleg Nesterov, Len Brown, Pavel Machek, Jeff Layton
On Thu, May 02, 2013 at 04:55:05PM -0700, Tejun Heo wrote:
> So, the freezable interface can't be something that people can use
> casually. It is something which should be carefully and strategically
> deployed where we *know* that lock dependency risks don't exist or at
> least are acceptable. I'm a bit weary that this patch is expanding
> the interface a lot that they now look like the equivalents of normal
> schedule calls. Not exactly sure what to do here but can we please at
> least have RED BOLD BLINKING comments which scream to people not to
> use these unless they know what they're doing?

Maybe we should trigger WARN_ON_ONCE() if lockdep_depth() > 0 by
default and have ugly variants which can be used if the caller is sure
that it's okay possibly with list of locks which are held?

Tejun Heo

unread,
May 2, 2013, 8:08:38 PM5/2/13
to Colin Cross, linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Oleg Nesterov, David S. Miller, Eric Dumazet, Al Viro, Eric W. Biederman, Gao feng, net...@vger.kernel.org
On Wed, May 01, 2013 at 06:35:08PM -0700, Colin Cross wrote:
> Avoid waking up every thread sleeping in read call on an AF_UNIX
> socket during suspend and resume by calling a freezable blocking
> call. Previous patches modified the freezer to avoid sending
> wakeups to threads that are blocked in freezable blocking calls.
>
> This call was selected to be converted to a freezable call because
> it doesn't hold any locks or release any resources when interrupted
> that might be needed by another freezing task or a kernel driver
> during suspend, and is a common site where idle userspace tasks are
> blocked.

Heh, so you are aware of the deadlock possibilities. Good selection
of spots. For all the conversion patches.

Acked-by: Tejun Heo <t...@kernel.org>

Thanks.

--
tejun

Tejun Heo

unread,
May 2, 2013, 8:10:49 PM5/2/13
to Rafael J. Wysocki, Colin Cross, Oleg Nesterov, linu...@vger.kernel.org, linux-...@vger.kernel.org, ar...@android.com
Hey, Rafael.

On Thu, May 02, 2013 at 11:06:35PM +0200, Rafael J. Wysocki wrote:
> > v2 moves the skip check to freeze_task(), and expands the commit
> > messages.
>
> The entire series makes sense to me, so are there any objections?
>
> Tejun, Oleg??

Generally looks good to me. I'm a bit uncomfortable with the
generic-looking expansion of the freezable API as it can't be
something which can be used in any generic manner. It'd be great if
we just add only the used ones (do we already?) and add some sort of
lockdep annotation so that people don't use them while holding any
locks.

Thanks.

--
tejun

Colin Cross

unread,
May 2, 2013, 10:17:21 PM5/2/13
to Tejun Heo, Linux PM list, lkml, Rafael J. Wysocki, Arve Hjønnevåg, Oleg Nesterov, Len Brown, Pavel Machek, Jeff Layton
This sounds the same as what ended up getting reverted in
https://lkml.org/lkml/2013/3/4/221
I can add the WARN_ON_ONCE to all my new calls, and leave them out of
existing calls, but that seems a little odd, and will be redundant if
the lockdep call in try_to_freeze goes back in in 3.11. Do you still
want it in the new apis?

Colin Cross

unread,
May 2, 2013, 10:42:00 PM5/2/13
to Tejun Heo, Linux PM list, lkml, Rafael J. Wysocki, Arve Hjønnevåg, Oleg Nesterov, Len Brown, Pavel Machek, Jeff Layton, Mandeep Baines
On Thu, May 2, 2013 at 7:16 PM, Colin Cross <ccr...@android.com> wrote:
> This sounds the same as what ended up getting reverted in
> https://lkml.org/lkml/2013/3/4/221
> I can add the WARN_ON_ONCE to all my new calls, and leave them out of
> existing calls, but that seems a little odd, and will be redundant if
> the lockdep call in try_to_freeze goes back in in 3.11. Do you still
> want it in the new apis?
>
> On Thu, May 2, 2013 at 5:03 PM, Tejun Heo <t...@kernel.org> wrote:
>> On Thu, May 02, 2013 at 04:55:05PM -0700, Tejun Heo wrote:
>>> So, the freezable interface can't be something that people can use
>>> casually. It is something which should be carefully and strategically
>>> deployed where we *know* that lock dependency risks don't exist or at
>>> least are acceptable. I'm a bit weary that this patch is expanding
>>> the interface a lot that they now look like the equivalents of normal
>>> schedule calls. Not exactly sure what to do here but can we please at
>>> least have RED BOLD BLINKING comments which scream to people not to
>>> use these unless they know what they're doing?
>>
>> Maybe we should trigger WARN_ON_ONCE() if lockdep_depth() > 0 by
>> default and have ugly variants which can be used if the caller is sure
>> that it's okay possibly with list of locks which are held?
>>
>> --
>> tejun

(sorry for the top post)

I could also put the lockdep check that was reveted back into
try_to_freeze(), and add a freezable_schedule_unsafe() that skips it
for use in the known-unsafe users in nfs, with a big comment not to
add new users of it.

Darren Hart

unread,
May 2, 2013, 11:13:20 PM5/2/13
to Thomas Gleixner, Colin Cross, linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Tejun Heo, Oleg Nesterov, Andrew Morton, Randy Dunlap, Al Viro


On 05/02/2013 12:45 PM, Thomas Gleixner wrote:
> On Wed, 1 May 2013, Colin Cross wrote:
>
>> Avoid waking up every thread sleeping in a futex_wait call during
>> suspend and resume by calling a freezable blocking call. Previous
>> patches modified the freezer to avoid sending wakeups to threads
>> that are blocked in freezable blocking calls.
>>
>> This call was selected to be converted to a freezable call because
>> it doesn't hold any locks or release any resources when interrupted
>> that might be needed by another freezing task or a kernel driver
>> during suspend, and is a common site where idle userspace tasks are
>> blocked.
>>
>> Signed-off-by: Colin Cross <ccr...@android.com>
>
> Acked-by: Thomas Gleixner <tg...@linutronix.de>

Beat me to it :-) Also agreed.

Acked-by: Darren Hart <dvh...@linux.intel.com>

>
>> ---
>> kernel/futex.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/futex.c b/kernel/futex.c
>> index b26dcfc..d710fae 100644
>> --- a/kernel/futex.c
>> +++ b/kernel/futex.c
>> @@ -61,6 +61,7 @@
>> #include <linux/nsproxy.h>
>> #include <linux/ptrace.h>
>> #include <linux/sched/rt.h>
>> +#include <linux/freezer.h>
>>
>> #include <asm/futex.h>
>>
>> @@ -1807,7 +1808,7 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
>> * is no timeout, or if it has yet to expire.
>> */
>> if (!timeout || timeout->task)
>> - schedule();
>> + freezable_schedule();
>> }
>> __set_current_state(TASK_RUNNING);
>> }
>> --
>> 1.8.2.1
>>
>>

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Technical Lead - Linux Kernel

Tejun Heo

unread,
May 3, 2013, 12:09:45 AM5/3/13
to Colin Cross, Linux PM list, lkml, Rafael J. Wysocki, Arve Hjønnevåg, Oleg Nesterov, Len Brown, Pavel Machek, Jeff Layton, Mandeep Baines
Hello,

On Thu, May 02, 2013 at 07:41:39PM -0700, Colin Cross wrote:
> On Thu, May 2, 2013 at 7:16 PM, Colin Cross <ccr...@android.com> wrote:
> > This sounds the same as what ended up getting reverted in
> > https://lkml.org/lkml/2013/3/4/221
> > I can add the WARN_ON_ONCE to all my new calls, and leave them out of
> > existing calls, but that seems a little odd, and will be redundant if
> > the lockdep call in try_to_freeze goes back in in 3.11. Do you still
> > want it in the new apis?
..
> I could also put the lockdep check that was reveted back into
> try_to_freeze(), and add a freezable_schedule_unsafe() that skips it
> for use in the known-unsafe users in nfs, with a big comment not to
> add new users of it.

Oh yeah, that sounds like a good idea, and as for the non-ugly
variants, at least in the mid/long term, I think it'd be best to add
the lockdep annotation to try_to_freeze() with
__try_to_freeze_unsafe_youre_gonna_burn_in_hell_if_you_use_this() for
the existing users which should be gradually converted, but if that's
too burdensome, adding warnings to the wrappers should do for now, I
guess.

And I *hope* the lockdep annotation is stricter than what was added
before. I think it better be "no lock ever should be held at this
point" rather than "consider this a big lock". I'm not sure how much
consensus we have on this but I'd like to, in the longer term, merge
freezer into job control stop so that frozen tasks don't appear as
being in uninterruptible sleep. Currently, the frozen tasks are
essentially in UNINTERRUPTIBLE sleep which is okay for suspend but for
cgroup freezer, it sucks. You end up with tasks which you can't even
kill.

Combined with the locking problems, I was planning to update the
freezer such that the frozen state is implemented as a form of jobctl
stop, so that things like ptrace / kill -9 could work on them and we
also have the clear definition of the frozen state rather than the
current "it may get stuck somewhere in the kernel".

But that conflicts with what you're doing here which seems pretty
useful, so, to satisfy both goals, when somebody needs to put a
pseudo-frozen task into the actual frozen jobctl stop, those spots
which are currently using try_to_stop() would have to return an error,
most likely -EINTR with TIF_SIGPENDING set, and the control should
return towards userland so that signal handling path can be invoked.
ie. It should be possible to steer the tasks which are considered
frozen but not in the frozen jobctl stop into the jobctl stop without
any side effect. To do that, those spots basically have to be pretty
close to the userland boundary where it can easily leave the kernel
with -EINTR and AFAICS all the spots that you converted are like that
(which I think is natural). While not holding any locks doesn't
guarantee that, I think there'd be a fairly high correlation at least
and it'd be able to drive people towards finding out what's going on.

So, that's my agenda. Not sure how many actually agree with it.
Rafael, Oleg, Jeff?

Thanks.

--
tejun

Tejun Heo

unread,
May 3, 2013, 12:12:21 AM5/3/13
to Colin Cross, Linux PM list, lkml, Rafael J. Wysocki, Arve Hjønnevåg, Oleg Nesterov, Len Brown, Pavel Machek, Jeff Layton, Mandeep Baines
On Thu, May 02, 2013 at 09:09:34PM -0700, Tejun Heo wrote:
> But that conflicts with what you're doing here which seems pretty
> useful, so, to satisfy both goals, when somebody needs to put a
> pseudo-frozen task into the actual frozen jobctl stop, those spots
> which are currently using try_to_stop() would have to return an error,
> most likely -EINTR with TIF_SIGPENDING set, and the control should

Let's make that -ERESTART*.

Colin Cross

unread,
May 3, 2013, 12:17:41 AM5/3/13
to Tejun Heo, Linux PM list, lkml, Rafael J. Wysocki, Arve Hjønnevåg, Oleg Nesterov, Len Brown, Pavel Machek, Jeff Layton, Mandeep Baines
On Thu, May 2, 2013 at 9:09 PM, Tejun Heo <t...@kernel.org> wrote:
> Hello,
>
> On Thu, May 02, 2013 at 07:41:39PM -0700, Colin Cross wrote:
>> On Thu, May 2, 2013 at 7:16 PM, Colin Cross <ccr...@android.com> wrote:
>> > This sounds the same as what ended up getting reverted in
>> > https://lkml.org/lkml/2013/3/4/221
>> > I can add the WARN_ON_ONCE to all my new calls, and leave them out of
>> > existing calls, but that seems a little odd, and will be redundant if
>> > the lockdep call in try_to_freeze goes back in in 3.11. Do you still
>> > want it in the new apis?
> ...
>> I could also put the lockdep check that was reveted back into
>> try_to_freeze(), and add a freezable_schedule_unsafe() that skips it
>> for use in the known-unsafe users in nfs, with a big comment not to
>> add new users of it.
>
> Oh yeah, that sounds like a good idea, and as for the non-ugly
> variants, at least in the mid/long term, I think it'd be best to add
> the lockdep annotation to try_to_freeze() with
> __try_to_freeze_unsafe_youre_gonna_burn_in_hell_if_you_use_this() for
> the existing users which should be gradually converted, but if that's
> too burdensome, adding warnings to the wrappers should do for now, I
> guess.
>
> And I *hope* the lockdep annotation is stricter than what was added
> before. I think it better be "no lock ever should be held at this
> point" rather than "consider this a big lock".

The previous patch (6aa9707099c4b25700940eb3d016f16c4434360d in Linus'
tree) already required that no locks be held, it wasn't using a lock
annotation.

Tejun Heo

unread,
May 3, 2013, 12:20:39 AM5/3/13
to Colin Cross, Linux PM list, lkml, Rafael J. Wysocki, Arve Hjønnevåg, Oleg Nesterov, Len Brown, Pavel Machek, Jeff Layton, Mandeep Baines
On Thu, May 02, 2013 at 09:17:21PM -0700, Colin Cross wrote:
> > And I *hope* the lockdep annotation is stricter than what was added
> > before. I think it better be "no lock ever should be held at this
> > point" rather than "consider this a big lock".
>
> The previous patch (6aa9707099c4b25700940eb3d016f16c4434360d in Linus'
> tree) already required that no locks be held, it wasn't using a lock
> annotation.

Ooh, cool. Thanks for the pointer. Forget about my rambling and
let's just add an unsafe version of try_to_freeze() and be done with
it. :)

--
tejun

Pavel Machek

unread,
May 3, 2013, 5:24:32 AM5/3/13
to Colin Cross, linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Tejun Heo, Oleg Nesterov
On Wed 2013-05-01 18:35:00, Colin Cross wrote:
> Android goes through suspend/resume very often (every few seconds when
> on a busy wifi network with the screen off), and a significant portion
> of the energy used to go in and out of suspend is spent in the
> freezer. If a task has called freezer_do_not_count(), don't bother
> waking it up. If it happens to wake up later it will call
> freezer_count() and immediately enter the refrigerator.
>
> Combined with patches to convert freezable helpers to use
> freezer_do_not_count() and convert common sites where idle userspace
> tasks are blocked to use the freezable helpers, this reduces the
> time and energy required to suspend and resume.
>
> Signed-off-by: Colin Cross <ccr...@android.com>

Acked-by: Pavel Machek <pa...@ucw.cz>

Rafael J. Wysocki

unread,
May 3, 2013, 7:35:45 AM5/3/13
to Tejun Heo, Colin Cross, Oleg Nesterov, linu...@vger.kernel.org, linux-...@vger.kernel.org, ar...@android.com
On Thursday, May 02, 2013 05:10:26 PM Tejun Heo wrote:
> Hey, Rafael.
>
> On Thu, May 02, 2013 at 11:06:35PM +0200, Rafael J. Wysocki wrote:
> > > v2 moves the skip check to freeze_task(), and expands the commit
> > > messages.
> >
> > The entire series makes sense to me, so are there any objections?
> >
> > Tejun, Oleg??
>
> Generally looks good to me. I'm a bit uncomfortable with the
> generic-looking expansion of the freezable API as it can't be
> something which can be used in any generic manner. It'd be great if
> we just add only the used ones (do we already?)

I'm not sure about that, unfortunately.

> and add some sort of lockdep annotation so that people don't use them while
> holding any locks.

I guess we can add that anyway?

Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Pavel Machek

unread,
May 3, 2013, 10:01:02 AM5/3/13
to Colin Cross, Linux PM list, lkml, Rafael J. Wysocki, Arve Hjønnevåg, Len Brown
Hi!

> > Also, why not static inline?
>
> I copied the style of the existing helpers. I can change them all to
> static inlines if you prefer.

That would be better, yes.

Alan Stern

unread,
May 3, 2013, 10:18:37 AM5/3/13
to Tejun Heo, Colin Cross, Linux PM list, lkml, Rafael J. Wysocki, Arve Hjønnevåg, Oleg Nesterov, Len Brown, Pavel Machek, Jeff Layton, Mandeep Baines
On Thu, 2 May 2013, Tejun Heo wrote:

> Combined with the locking problems, I was planning to update the
> freezer such that the frozen state is implemented as a form of jobctl
> stop, so that things like ptrace / kill -9 could work on them and we
> also have the clear definition of the frozen state rather than the
> current "it may get stuck somewhere in the kernel".
>
> But that conflicts with what you're doing here which seems pretty
> useful, so, to satisfy both goals, when somebody needs to put a
> pseudo-frozen task into the actual frozen jobctl stop, those spots
> which are currently using try_to_stop() would have to return an error,
> most likely -EINTR with TIF_SIGPENDING set, and the control should
> return towards userland so that signal handling path can be invoked.
> ie. It should be possible to steer the tasks which are considered
> frozen but not in the frozen jobctl stop into the jobctl stop without
> any side effect. To do that, those spots basically have to be pretty
> close to the userland boundary where it can easily leave the kernel
> with -EINTR and AFAICS all the spots that you converted are like that
> (which I think is natural). While not holding any locks doesn't
> guarantee that, I think there'd be a fairly high correlation at least
> and it'd be able to drive people towards finding out what's going on.

Don't forget about freezable kernel threads. They never cross the
kernel/user boundary.

Alan Stern

Tejun Heo

unread,
May 3, 2013, 12:55:13 PM5/3/13
to Alan Stern, Colin Cross, Linux PM list, lkml, Rafael J. Wysocki, Arve Hjønnevåg, Oleg Nesterov, Len Brown, Pavel Machek, Jeff Layton, Mandeep Baines
On Fri, May 03, 2013 at 10:18:17AM -0400, Alan Stern wrote:
> Don't forget about freezable kernel threads. They never cross the
> kernel/user boundary.

Doesn't really matter for this purpose. They first of all can't be
moved into !root cgroups and thus can't be cgroup-frozen and even if
they could they can't be killed or ptraced, so they just don't play
the same game.

Thanks.

--
tejun

Matt Helsley

unread,
May 3, 2013, 6:58:33 PM5/3/13
to Darren Hart, Colin Cross, linu...@vger.kernel.org, linux-...@vger.kernel.org, Rafael J. Wysocki, ar...@android.com, Thomas Gleixner, Andrew Morton, Randy Dunlap, Al Viro, Matthew Helsley
On Mon, Apr 29, 2013 at 03:52:27PM -0700, Darren Hart wrote:
> Colin,
>
> I don't know anything about when or when not to use freezable*, and I
> suspect that may be true for others as well. A more complete
> description of why it's acceptable here in the commit log might help
> expedite acceptance.
>
>
> Matt,
>
> I have a vague memory of discussing something similar to this with you.
> Do you see any potential problems here?

Re: vague memories: We discussed futexes in the context of the old
checkpoint/restart patch series (<= 2.6.32).

This change looks correct to me.

> --
> Darren
>
> On 04/29/2013 02:45 PM, Colin Cross wrote:
> > Avoid waking up every thread sleeping in a futex_wait call during
> > suspend and resume by calling a freezable blocking call.

(in addition to suspend/resume: freeze/thaw via the cgroup freezer.
I'm going to call it freeze/thaw since that should cover both cases..)

Here's my shot at explaining what I *think* the commit is supposed fix:

I imagine that before this patch on a highly-contended futex there
could be a thundering herd during freeze/thaw -- the wakeups are
*likely* to be painful because lots of tasks could be woken from the
schedule() call by the freezer only to find that the futex state hasn't
changed.

With this change the freezer won't wake these tasks up because the
FREEZER_SKIP flag is set while in the schedule() call and thus the
thundering herd won't be triggered by the freezer.

Cheers,
-Matt Helsley

Rafael J. Wysocki

unread,
May 4, 2013, 3:11:51 PM5/4/13
to Tejun Heo, Colin Cross, linu...@vger.kernel.org, linux-...@vger.kernel.org, ar...@android.com, Oleg Nesterov, David S. Miller, Eric Dumazet, Al Viro, Eric W. Biederman, Gao feng, net...@vger.kernel.org
On Thursday, May 02, 2013 05:08:12 PM Tejun Heo wrote:
> On Wed, May 01, 2013 at 06:35:08PM -0700, Colin Cross wrote:
> > Avoid waking up every thread sleeping in read call on an AF_UNIX
> > socket during suspend and resume by calling a freezable blocking
> > call. Previous patches modified the freezer to avoid sending
> > wakeups to threads that are blocked in freezable blocking calls.
> >
> > This call was selected to be converted to a freezable call because
> > it doesn't hold any locks or release any resources when interrupted
> > that might be needed by another freezing task or a kernel driver
> > during suspend, and is a common site where idle userspace tasks are
> > blocked.
>
> Heh, so you are aware of the deadlock possibilities. Good selection
> of spots. For all the conversion patches.
>
> Acked-by: Tejun Heo <t...@kernel.org>

I wonder if that includes [3/10] (just to get the record straight)?

Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Tejun Heo

unread,
May 4, 2013, 4:39:47 PM5/4/13
to Rafael J. Wysocki, Colin Cross, linu...@vger.kernel.org, lkml, ar...@android.com, Oleg Nesterov, David S. Miller, Eric Dumazet, Al Viro, Eric W. Biederman, Gao feng, netdev
Hello, Rafael.

On Sat, May 4, 2013 at 12:19 PM, Rafael J. Wysocki <r...@sisk.pl> wrote:
>> Heh, so you are aware of the deadlock possibilities. Good selection
>> of spots. For all the conversion patches.
>>
>> Acked-by: Tejun Heo <t...@kernel.org>
>
> I wonder if that includes [3/10] (just to get the record straight)?

I think we want the lockdep annotations before these go in. I'll
review Colin's new series later today.

Thanks!

--
tejun

Colin Cross

unread,
May 4, 2013, 6:24:12 PM5/4/13
to Tejun Heo, Rafael J. Wysocki, linu...@vger.kernel.org, lkml, Arve Hjønnevåg, Oleg Nesterov, David S. Miller, Eric Dumazet, Al Viro, Eric W. Biederman, Gao feng, netdev
On Sat, May 4, 2013 at 1:39 PM, Tejun Heo <t...@kernel.org> wrote:
> Hello, Rafael.
>
> On Sat, May 4, 2013 at 12:19 PM, Rafael J. Wysocki <r...@sisk.pl> wrote:
>>> Heh, so you are aware of the deadlock possibilities. Good selection
>>> of spots. For all the conversion patches.
>>>
>>> Acked-by: Tejun Heo <t...@kernel.org>
>>
>> I wonder if that includes [3/10] (just to get the record straight)?
>
> I think we want the lockdep annotations before these go in. I'll
> review Colin's new series later today.

Just to make sure the merge order is clear, the two patches I posted
yesterday to reintroduce the lockdep warning in try_to_freeze()
(https://lkml.org/lkml/2013/5/3/488) need to be merged first, then I
will repost this series on top of that one.

Rafael J. Wysocki

unread,
May 5, 2013, 7:15:59 AM5/5/13
to Colin Cross, Tejun Heo, linu...@vger.kernel.org, lkml, Arve Hjønnevåg, Oleg Nesterov, David S. Miller, Eric Dumazet, Al Viro, Eric W. Biederman, Gao feng, netdev
On Saturday, May 04, 2013 03:23:30 PM Colin Cross wrote:
> On Sat, May 4, 2013 at 1:39 PM, Tejun Heo <t...@kernel.org> wrote:
> > Hello, Rafael.
> >
> > On Sat, May 4, 2013 at 12:19 PM, Rafael J. Wysocki <r...@sisk.pl> wrote:
> >>> Heh, so you are aware of the deadlock possibilities. Good selection
> >>> of spots. For all the conversion patches.
> >>>
> >>> Acked-by: Tejun Heo <t...@kernel.org>
> >>
> >> I wonder if that includes [3/10] (just to get the record straight)?
> >
> > I think we want the lockdep annotations before these go in. I'll
> > review Colin's new series later today.
>
> Just to make sure the merge order is clear, the two patches I posted
> yesterday to reintroduce the lockdep warning in try_to_freeze()
> (https://lkml.org/lkml/2013/5/3/488) need to be merged first, then I
> will repost this series on top of that one.

OK, thanks for the clarification.

Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Colin Cross

unread,
May 6, 2013, 7:50:51 PM5/6/13
to linux-...@vger.kernel.org, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Colin Cross, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo
On slow cpus the large number of task wakeups and context switches
triggered by freezing and thawing tasks can take a significant amount
of cpu time. This patch series reduces the amount of work done during
freezing tasks by avoiding waking up tasks that are already in a freezable
state.

The first 4 patches reintroduce 6aa9707099c (lockdep: check that no locks
held at freeze time) which was reverted in dbf520a9d7d4, and fix up the
known callers with locks held in NFS and CIFS to skip the lockdep check
for now. The lockdep check will warn any future incorrect users of the
freezable helpers.

The fifth patch reduces the wasted time in try_to_freeze_tasks() by
starting with a 1 ms sleep during the first loop and backing off
up to an 8 ms sleep if all tasks are not frozen.

The sixth patch modifies the freeze_task() function to skip tasks
that have set the PF_FREEZER_SKIP flag by calling freezer_do_not_count().
These tasks will not enter the refrigerator during the suspend/resume
cycle unless they woken up by something else, in which case they will
enter the refrigerator in freezer_count() before they access any
resources that would not be available in suspend or deadlock with
another freezing/frozen task.

The rest of the series adds a few more freezable helpers and converts the
top call sites that userspace tasks are usually blocked at to freezable
helpers. The list of call sites was collected on a Nexus 10 (ARM Exynos
5250 SoC), but all the top call sites other than binder show up at the
top of the list on Ubuntu x86-64 as well.

This series cuts the time for freezing tasks from 50 ms to 5 ms when
the cpu speed is locked at its lowest setting (200MHz), and reduces
the number of context switches and restarted syscalls from 1000 to
25.

v2 moves the skip check to freeze_task(), and expands the commit
messages.

v3 adds the patches to reintroduce the lockdep check to this patchset,
adds a patch to convert the freezable helpers to static inlines when
possible, and splits the patch that adds the new helpers out of the one
that converts the existing helpers to use freezer_do_not_count.

Colin Cross

unread,
May 6, 2013, 7:50:52 PM5/6/13
to linux-...@vger.kernel.org, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Colin Cross, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, Trond Myklebust, Len Brown, J. Bruce Fields, David S. Miller
NFS calls the freezable helpers with locks held, which is unsafe
and will cause lockdep warnings when 6aa9707 "lockdep: check
that no locks held at freeze time" is reapplied (it was reverted
in dbf520a). NFS shouldn't be doing this, but it has
long-running syscalls that must hold a lock but also shouldn't
block suspend. Until NFS freeze handling is rewritten to use a
signal to exit out of the critical section, add new *_unsafe
versions of the helpers that will not run the lockdep test when
6aa9707 is reapplied, and call them from NFS.

In practice the likley result of holding the lock while freezing
is that a second task blocked on the lock will never freeze,
aborting suspend, but it is possible to manufacture a case using
the cgroup freezer, the lock, and the suspend freezer to create
a deadlock. Silencing the lockdep warning here will allow
problems to be found in other drivers that may have a more
serious deadlock risk, and prevent new problems from being added.

Acked-by: Pavel Machek <pa...@ucw.cz>
Acked-by: Tejun Heo <t...@kernel.org>
Signed-off-by: Colin Cross <ccr...@android.com>
---
fs/nfs/inode.c | 2 +-
fs/nfs/nfs3proc.c | 2 +-
fs/nfs/nfs4proc.c | 4 ++--
include/linux/freezer.h | 42 +++++++++++++++++++++++++++++++++++++++++-
net/sunrpc/sched.c | 2 +-
5 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 1f94167..53cbee5 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -79,7 +79,7 @@ int nfs_wait_bit_killable(void *word)
{
if (fatal_signal_pending(current))
return -ERESTARTSYS;
- freezable_schedule();
+ freezable_schedule_unsafe();
return 0;
}
EXPORT_SYMBOL_GPL(nfs_wait_bit_killable);
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index 43ea96c..ce90eb4 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -33,7 +33,7 @@ nfs3_rpc_wrapper(struct rpc_clnt *clnt, struct rpc_message *msg, int flags)
res = rpc_call_sync(clnt, msg, flags);
if (res != -EJUKEBOX)
break;
- freezable_schedule_timeout_killable(NFS_JUKEBOX_RETRY_TIME);
+ freezable_schedule_timeout_killable_unsafe(NFS_JUKEBOX_RETRY_TIME);
res = -ERESTARTSYS;
} while (!fatal_signal_pending(current));
return res;
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 0ad025e..a236077 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -266,7 +266,7 @@ static int nfs4_delay(struct rpc_clnt *clnt, long *timeout)
*timeout = NFS4_POLL_RETRY_MIN;
if (*timeout > NFS4_POLL_RETRY_MAX)
*timeout = NFS4_POLL_RETRY_MAX;
- freezable_schedule_timeout_killable(*timeout);
+ freezable_schedule_timeout_killable_unsafe(*timeout);
if (fatal_signal_pending(current))
res = -ERESTARTSYS;
*timeout <<= 1;
@@ -4309,7 +4309,7 @@ int nfs4_proc_delegreturn(struct inode *inode, struct rpc_cred *cred, const nfs4
static unsigned long
nfs4_set_lock_task_retry(unsigned long timeout)
{
- freezable_schedule_timeout_killable(timeout);
+ freezable_schedule_timeout_killable_unsafe(timeout);
timeout <<= 1;
if (timeout > NFS4_LOCK_MAXTIMEOUT)
return NFS4_LOCK_MAXTIMEOUT;
diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index e70df40..5b31e21c 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -46,7 +46,11 @@ extern int freeze_kernel_threads(void);
extern void thaw_processes(void);
extern void thaw_kernel_threads(void);

-static inline bool try_to_freeze(void)
+/*
+ * DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION
+ * If try_to_freeze causes a lockdep warning it means the caller may deadlock
+ */
+static inline bool try_to_freeze_unsafe(void)
{
might_sleep();
if (likely(!freezing(current)))
@@ -54,6 +58,11 @@ static inline bool try_to_freeze(void)
return __refrigerator(false);
}

+static inline bool try_to_freeze(void)
+{
+ return try_to_freeze_unsafe();
+}
+
extern bool freeze_task(struct task_struct *p);
extern bool set_freezable(void);

@@ -115,6 +124,14 @@ static inline void freezer_count(void)
try_to_freeze();
}

+/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
+static inline void freezer_count_unsafe(void)
+{
+ current->flags &= ~PF_FREEZER_SKIP;
+ smp_mb();
+ try_to_freeze_unsafe();
+}
+
/**
* freezer_should_skip - whether to skip a task when determining frozen
* state is reached
@@ -152,6 +169,14 @@ static inline bool freezer_should_skip(struct task_struct *p)
freezer_count(); \
})

+/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
+#define freezable_schedule_unsafe() \
+({ \
+ freezer_do_not_count(); \
+ schedule(); \
+ freezer_count_unsafe(); \
+})
+
/* Like schedule_timeout_killable(), but should not block the freezer. */
#define freezable_schedule_timeout_killable(timeout) \
({ \
@@ -162,6 +187,16 @@ static inline bool freezer_should_skip(struct task_struct *p)
__retval; \
})

+/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
+#define freezable_schedule_timeout_killable_unsafe(timeout) \
+({ \
+ long __retval; \
+ freezer_do_not_count(); \
+ __retval = schedule_timeout_killable(timeout); \
+ freezer_count_unsafe(); \
+ __retval; \
+})
+
/*
* Freezer-friendly wrappers around wait_event_interruptible(),
* wait_event_killable() and wait_event_interruptible_timeout(), originally
@@ -225,9 +260,14 @@ static inline void set_freezable(void) {}

#define freezable_schedule() schedule()

+#define freezable_schedule_unsafe() schedule()
+
#define freezable_schedule_timeout_killable(timeout) \
schedule_timeout_killable(timeout)

+#define freezable_schedule_timeout_killable_unsafe(timeout) \
+ schedule_timeout_killable(timeout)
+
#define wait_event_freezable(wq, condition) \
wait_event_interruptible(wq, condition)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index f8529fc..8dcfadc 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -254,7 +254,7 @@ static int rpc_wait_bit_killable(void *word)
{
if (fatal_signal_pending(current))
return -ERESTARTSYS;
- freezable_schedule();
+ freezable_schedule_unsafe();
return 0;
}

--
1.8.2.1

Colin Cross

unread,
May 6, 2013, 7:50:53 PM5/6/13
to linux-...@vger.kernel.org, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Colin Cross, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, Al Viro, Eric W. Biederman, Kees Cook
Avoid waking up every thread sleeping in a sigtimedwait call during
suspend and resume by calling a freezable blocking call. Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.

This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.

Acked-by: Tejun Heo <t...@kernel.org>
Signed-off-by: Colin Cross <ccr...@android.com>
---
kernel/signal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 598dc06..10a70a0 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2845,7 +2845,7 @@ int do_sigtimedwait(const sigset_t *which, siginfo_t *info,
recalc_sigpending();
spin_unlock_irq(&tsk->sighand->siglock);

- timeout = schedule_timeout_interruptible(timeout);
+ timeout = freezable_schedule_timeout_interruptible(timeout);

spin_lock_irq(&tsk->sighand->siglock);
__set_task_blocked(tsk, &tsk->real_blocked);
--
1.8.2.1

Colin Cross

unread,
May 6, 2013, 7:51:13 PM5/6/13
to linux-...@vger.kernel.org, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Colin Cross, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, David S. Miller, Eric Dumazet, Al Viro, Eric W. Biederman, Gao feng
Avoid waking up every thread sleeping in read call on an AF_UNIX
socket during suspend and resume by calling a freezable blocking
call. Previous patches modified the freezer to avoid sending
wakeups to threads that are blocked in freezable blocking calls.

This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.

Acked-by: Tejun Heo <t...@kernel.org>
Signed-off-by: Colin Cross <ccr...@android.com>
---
net/unix/af_unix.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 2db702d..2bcac57 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -114,6 +114,7 @@
#include <linux/mount.h>
#include <net/checksum.h>
#include <linux/security.h>
+#include <linux/freezer.h>

struct hlist_head unix_socket_table[2 * UNIX_HASH_SIZE];
EXPORT_SYMBOL_GPL(unix_socket_table);
@@ -1880,7 +1881,7 @@ static long unix_stream_data_wait(struct sock *sk, long timeo)

set_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
unix_state_unlock(sk);
- timeo = schedule_timeout(timeo);
+ timeo = freezable_schedule_timeout(timeo);
unix_state_lock(sk);
clear_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
}
--
1.8.2.1

Colin Cross

unread,
May 6, 2013, 7:52:04 PM5/6/13
to linux-...@vger.kernel.org, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Colin Cross, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, Thomas Gleixner
Avoid waking up every thread sleeping in a nanosleep call during
suspend and resume by calling a freezable blocking call. Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.

This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.

Acked-by: Tejun Heo <t...@kernel.org>
Acked-by: Thomas Gleixner <tg...@linutronix.de>
Signed-off-by: Colin Cross <ccr...@android.com>
---
kernel/hrtimer.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 14be27f..e036276 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -47,6 +47,7 @@
#include <linux/sched/sysctl.h>
#include <linux/sched/rt.h>
#include <linux/timer.h>
+#include <linux/freezer.h>

#include <asm/uaccess.h>

@@ -1525,7 +1526,7 @@ static int __sched do_nanosleep(struct hrtimer_sleeper *t, enum hrtimer_mode mod
t->task = NULL;

if (likely(t->task))
- schedule();
+ freezable_schedule();

hrtimer_cancel(&t->timer);
mode = HRTIMER_MODE_ABS;
--
1.8.2.1

Colin Cross

unread,
May 6, 2013, 7:52:05 PM5/6/13
to linux-...@vger.kernel.org, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Colin Cross, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, Darren Hart, Thomas Gleixner, Randy Dunlap, Al Viro
Avoid waking up every thread sleeping in a futex_wait call during
suspend and resume by calling a freezable blocking call. Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.

This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.

Acked-by: Tejun Heo <t...@kernel.org>
Acked-by: Thomas Gleixner <tg...@linutronix.de>
Acked-by: Darren Hart <dvh...@linux.intel.com>
Signed-off-by: Colin Cross <ccr...@android.com>
---
kernel/futex.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index b26dcfc..d710fae 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -61,6 +61,7 @@
#include <linux/nsproxy.h>
#include <linux/ptrace.h>
#include <linux/sched/rt.h>
+#include <linux/freezer.h>

#include <asm/futex.h>

@@ -1807,7 +1808,7 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
* is no timeout, or if it has yet to expire.
*/
if (!timeout || timeout->task)
- schedule();
+ freezable_schedule();
}
__set_current_state(TASK_RUNNING);
}
--
1.8.2.1

Colin Cross

unread,
May 6, 2013, 7:52:56 PM5/6/13
to linux-...@vger.kernel.org, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Colin Cross, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, Alexander Viro, linux-...@vger.kernel.org
Avoid waking up every thread sleeping in an epoll_wait call during
suspend and resume by calling a freezable blocking call. Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.

This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.

Acked-by: Tejun Heo <t...@kernel.org>
Signed-off-by: Colin Cross <ccr...@android.com>
---
fs/eventpoll.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 9fec183..65245e7 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -34,6 +34,7 @@
#include <linux/mutex.h>
#include <linux/anon_inodes.h>
#include <linux/device.h>
+#include <linux/freezer.h>
#include <asm/uaccess.h>
#include <asm/io.h>
#include <asm/mman.h>
@@ -1543,7 +1544,8 @@ fetch_events:
}

spin_unlock_irqrestore(&ep->lock, flags);
- if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS))
+ if (!freezable_schedule_hrtimeout_range(to, slack,
+ HRTIMER_MODE_ABS))
timed_out = 1;

spin_lock_irqsave(&ep->lock, flags);
--
1.8.2.1

Colin Cross

unread,
May 6, 2013, 7:53:19 PM5/6/13
to linux-...@vger.kernel.org, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Colin Cross, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, Alexander Viro, linux-...@vger.kernel.org
Avoid waking up every thread sleeping in a select call during
suspend and resume by calling a freezable blocking call. Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.

This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.

Acked-by: Tejun Heo <t...@kernel.org>
Signed-off-by: Colin Cross <ccr...@android.com>
---
fs/select.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/select.c b/fs/select.c
index 8c1c96c..6b14dc7 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -27,6 +27,7 @@
#include <linux/rcupdate.h>
#include <linux/hrtimer.h>
#include <linux/sched/rt.h>
+#include <linux/freezer.h>

#include <asm/uaccess.h>

@@ -236,7 +237,8 @@ int poll_schedule_timeout(struct poll_wqueues *pwq, int state,

set_current_state(state);
if (!pwq->triggered)
- rc = schedule_hrtimeout_range(expires, slack, HRTIMER_MODE_ABS);
+ rc = freezable_schedule_hrtimeout_range(expires, slack,
+ HRTIMER_MODE_ABS);
__set_current_state(TASK_RUNNING);

/*
--
1.8.2.1

Colin Cross

unread,
May 6, 2013, 7:54:02 PM5/6/13
to linux-...@vger.kernel.org, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Colin Cross, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, Len Brown
Freezing tasks will wake up almost every userspace task from
where it is blocking and force it to run until it hits a
call to try_to_sleep(), generally on the exit path from the syscall
it is blocking in. On resume each task will run again, usually
restarting the syscall and running until it hits the same
blocking call as it was originally blocked in.

To allow tasks to avoid running on every suspend/resume cycle,
this patch adds additional freezable wrappers around blocking calls
that call freezer_do_not_count(). Combined with the previous patch,
these tasks will not run during suspend or resume unless they wake
up for another reason, in which case they will run until they hit
the try_to_freeze() in freezer_count(), and then continue processing
the wakeup after tasks are thawed.

Additional patches will convert the most common locations that
userspace blocks in to use freezable helpers.

Signed-off-by: Colin Cross <ccr...@android.com>
---
v3:
split out the changes to existing helpers to a separate patch

include/linux/freezer.h | 61 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 61 insertions(+)

diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index 8430d4c5..7fd81b8 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -180,6 +180,32 @@ static inline void freezable_schedule_unsafe(void)
freezer_count_unsafe();
}

+/*
+ * Like freezable_schedule_timeout(), but should not block the freezer. Do not
+ * call this with locks held.
+ */
+static inline long freezable_schedule_timeout(long timeout)
+{
+ long __retval;
+ freezer_do_not_count();
+ __retval = schedule_timeout(timeout);
+ freezer_count();
+ return __retval;
+}
+
+/*
+ * Like schedule_timeout_interruptible(), but should not block the freezer. Do not
+ * call this with locks held.
+ */
+static inline long freezable_schedule_timeout_interruptible(long timeout)
+{
+ long __retval;
+ freezer_do_not_count();
+ __retval = schedule_timeout_interruptible(timeout);
+ freezer_count();
+ return __retval;
+}
+
/* Like schedule_timeout_killable(), but should not block the freezer. */
static inline long freezable_schedule_timeout_killable(long timeout)
{
@@ -201,6 +227,20 @@ static inline long freezable_schedule_timeout_killable_unsafe(long timeout)
}

/*
+ * Like schedule_hrtimeout_range(), but should not block the freezer. Do not
+ * call this with locks held.
+ */
+static inline int freezable_schedule_hrtimeout_range(ktime_t *expires,
+ unsigned long delta, const enum hrtimer_mode mode)
+{
+ int __retval;
+ freezer_do_not_count();
+ __retval = schedule_hrtimeout_range(expires, delta, mode);
+ freezer_count();
+ return __retval;
+}
+
+/*
* Freezer-friendly wrappers around wait_event_interruptible(),
* wait_event_killable() and wait_event_interruptible_timeout(), originally
* defined in <linux/wait.h>
@@ -244,6 +284,16 @@ static inline long freezable_schedule_timeout_killable_unsafe(long timeout)
__retval; \
})

+#define wait_event_freezable_exclusive(wq, condition) \
+({ \
+ int __retval; \
+ freezer_do_not_count(); \
+ __retval = wait_event_interruptible_exclusive(wq, condition); \
+ freezer_count(); \
+ __retval; \
+})
+
+
#else /* !CONFIG_FREEZER */
static inline bool frozen(struct task_struct *p) { return false; }
static inline bool freezing(struct task_struct *p) { return false; }
@@ -267,18 +317,29 @@ static inline void set_freezable(void) {}

#define freezable_schedule_unsafe() schedule()

+#define freezable_schedule_timeout(timeout) schedule_timeout(timeout)
+
+#define freezable_schedule_timeout_interruptible(timeout) \
+ schedule_timeout_interruptible(timeout)
+
#define freezable_schedule_timeout_killable(timeout) \
schedule_timeout_killable(timeout)

#define freezable_schedule_timeout_killable_unsafe(timeout) \
schedule_timeout_killable(timeout)

+#define freezable_schedule_hrtimeout_range(expires, delta, mode) \
+ schedule_hrtimeout_range(expires, delta, mode)
+
#define wait_event_freezable(wq, condition) \
wait_event_interruptible(wq, condition)

#define wait_event_freezable_timeout(wq, condition, timeout) \
wait_event_interruptible_timeout(wq, condition, timeout)

+#define wait_event_freezable_exclusive(wq, condition) \
+ wait_event_interruptible_exclusive(wq, condition)
+
#define wait_event_freezekillable(wq, condition) \
wait_event_killable(wq, condition)

--
1.8.2.1

Colin Cross

unread,
May 6, 2013, 7:54:23 PM5/6/13
to linux-...@vger.kernel.org, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Colin Cross, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, Len Brown
Freezing tasks will wake up almost every userspace task from
where it is blocking and force it to run until it hits a
call to try_to_sleep(), generally on the exit path from the syscall
it is blocking in. On resume each task will run again, usually
restarting the syscall and running until it hits the same
blocking call as it was originally blocked in.

Convert the existing wait_event_freezable* wrappers to use
freezer_do_not_count(). Combined with a previous patch,
these tasks will not run during suspend or resume unless they wake
up for another reason, in which case they will run until they hit
the try_to_freeze() in freezer_count(), and then continue processing
the wakeup after tasks are thawed.

This results in a small change in behavior, previously a race
between freezing and a normal wakeup would be won by the wakeup,
now the task will freeze and then handle the wakeup after thawing.

Signed-off-by: Colin Cross <ccr...@android.com>
---
v3:
split this out of the patch that adds new freezable helpers

include/linux/freezer.h | 22 +++++++---------------
1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index bcf9e65..c71337af 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -228,27 +228,19 @@ static inline bool freezer_should_skip(struct task_struct *p)
#define wait_event_freezable(wq, condition) \
({ \
int __retval; \
- for (;;) { \
- __retval = wait_event_interruptible(wq, \
- (condition) || freezing(current)); \
- if (__retval || (condition)) \
- break; \
- try_to_freeze(); \
- } \
+ freezer_do_not_count(); \
+ __retval = wait_event_interruptible(wq, (condition)); \
+ freezer_count(); \
__retval; \
})

#define wait_event_freezable_timeout(wq, condition, timeout) \
({ \
long __retval = timeout; \
- for (;;) { \
- __retval = wait_event_interruptible_timeout(wq, \
- (condition) || freezing(current), \
- __retval); \
- if (__retval <= 0 || (condition)) \
- break; \
- try_to_freeze(); \
- } \
+ freezer_do_not_count(); \
+ __retval = wait_event_interruptible_timeout(wq, (condition), \
+ __retval); \
+ freezer_count(); \
__retval; \
})

--
1.8.2.1

Colin Cross

unread,
May 6, 2013, 7:54:41 PM5/6/13
to linux-...@vger.kernel.org, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Colin Cross, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, Len Brown
Some of the freezable helpers have to be macros because their
condition argument needs to get evaluated every time through
the wait loop. Convert the others to static inline to make
future changes easier.

Signed-off-by: Colin Cross <ccr...@android.com>
---
include/linux/freezer.h | 58 ++++++++++++++++++++++++-------------------------
1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index c71337af..8430d4c5 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -159,46 +159,46 @@ static inline bool freezer_should_skip(struct task_struct *p)
}

/*
- * These macros are intended to be used whenever you want allow a sleeping
+ * These functions are intended to be used whenever you want allow a sleeping
* task to be frozen. Note that neither return any clear indication of
* whether a freeze event happened while in this function.
*/

/* Like schedule(), but should not block the freezer. */
-#define freezable_schedule() \
-({ \
- freezer_do_not_count(); \
- schedule(); \
- freezer_count(); \
-})
+static inline void freezable_schedule(void)
+{
+ freezer_do_not_count();
+ schedule();
+ freezer_count();
+}

/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
-#define freezable_schedule_unsafe() \
-({ \
- freezer_do_not_count(); \
- schedule(); \
- freezer_count_unsafe(); \
-})
+static inline void freezable_schedule_unsafe(void)
+{
+ freezer_do_not_count();
+ schedule();
+ freezer_count_unsafe();
+}

/* Like schedule_timeout_killable(), but should not block the freezer. */
-#define freezable_schedule_timeout_killable(timeout) \
-({ \
- long __retval; \
- freezer_do_not_count(); \
- __retval = schedule_timeout_killable(timeout); \
- freezer_count(); \
- __retval; \
-})
+static inline long freezable_schedule_timeout_killable(long timeout)
+{
+ long __retval;
+ freezer_do_not_count();
+ __retval = schedule_timeout_killable(timeout);
+ freezer_count();
+ return __retval;
+}

/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
-#define freezable_schedule_timeout_killable_unsafe(timeout) \
-({ \
- long __retval; \
- freezer_do_not_count(); \
- __retval = schedule_timeout_killable(timeout); \
- freezer_count_unsafe(); \
- __retval; \
-})
+static inline long freezable_schedule_timeout_killable_unsafe(long timeout)
+{
+ long __retval;
+ freezer_do_not_count();
+ __retval = schedule_timeout_killable(timeout);
+ freezer_count_unsafe();
+ return __retval;
+}

/*
* Freezer-friendly wrappers around wait_event_interruptible(),
--
1.8.2.1

Colin Cross

unread,
May 6, 2013, 7:55:04 PM5/6/13
to linux-...@vger.kernel.org, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Colin Cross, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo
Android goes through suspend/resume very often (every few seconds when
on a busy wifi network with the screen off), and a significant portion
of the energy used to go in and out of suspend is spent in the
freezer. If a task has called freezer_do_not_count(), don't bother
waking it up. If it happens to wake up later it will call
freezer_count() and immediately enter the refrigerator.

Combined with patches to convert freezable helpers to use
freezer_do_not_count() and convert common sites where idle userspace
tasks are blocked to use the freezable helpers, this reduces the
time and energy required to suspend and resume.

Acked-by: Tejun Heo <t...@kernel.org>
Acked-by: Pavel Machek <pa...@ucw.cz>
Signed-off-by: Colin Cross <ccr...@android.com>
---
v2: move check to freeze_task()

kernel/freezer.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/kernel/freezer.c b/kernel/freezer.c
index c38893b..8b2afc1 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -110,6 +110,18 @@ bool freeze_task(struct task_struct *p)
{
unsigned long flags;

+ /*
+ * This check can race with freezer_do_not_count, but worst case that
+ * will result in an extra wakeup being sent to the task. It does not
+ * race with freezer_count(), the barriers in freezer_count() and
+ * freezer_should_skip() ensure that either freezer_count() sees
+ * freezing == true in try_to_freeze() and freezes, or
+ * freezer_should_skip() sees !PF_FREEZE_SKIP and freezes the task
+ * normally.
+ */
+ if (freezer_should_skip(p))
+ return false;
+
spin_lock_irqsave(&freezer_lock, flags);
if (!freezing(p) || frozen(p)) {
spin_unlock_irqrestore(&freezer_lock, flags);
--
1.8.2.1

Colin Cross

unread,
May 6, 2013, 7:55:35 PM5/6/13
to linux-...@vger.kernel.org, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Colin Cross, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, Len Brown
All tasks can easily be frozen in under 10 ms, switch to using
an initial 1 ms sleep followed by exponential backoff until
8 ms. Also convert the printed time to ms instead of centiseconds.

Acked-by: Pavel Machek <pa...@ucw.cz>
Acked-by: Tejun Heo <t...@kernel.org>
Signed-off-by: Colin Cross <ccr...@android.com>
---
kernel/power/process.c | 26 +++++++++++++++-----------
1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/kernel/power/process.c b/kernel/power/process.c
index 98088e0..fc0df84 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -30,9 +30,10 @@ static int try_to_freeze_tasks(bool user_only)
unsigned int todo;
bool wq_busy = false;
struct timeval start, end;
- u64 elapsed_csecs64;
- unsigned int elapsed_csecs;
+ u64 elapsed_msecs64;
+ unsigned int elapsed_msecs;
bool wakeup = false;
+ int sleep_usecs = USEC_PER_MSEC;

do_gettimeofday(&start);

@@ -68,22 +69,25 @@ static int try_to_freeze_tasks(bool user_only)

/*
* We need to retry, but first give the freezing tasks some
- * time to enter the refrigerator.
+ * time to enter the refrigerator. Start with an initial
+ * 1 ms sleep followed by exponential backoff until 8 ms.
*/
- msleep(10);
+ usleep_range(sleep_usecs / 2, sleep_usecs);
+ if (sleep_usecs < 8 * USEC_PER_MSEC)
+ sleep_usecs *= 2;
}

do_gettimeofday(&end);
- elapsed_csecs64 = timeval_to_ns(&end) - timeval_to_ns(&start);
- do_div(elapsed_csecs64, NSEC_PER_SEC / 100);
- elapsed_csecs = elapsed_csecs64;
+ elapsed_msecs64 = timeval_to_ns(&end) - timeval_to_ns(&start);
+ do_div(elapsed_msecs64, NSEC_PER_MSEC);
+ elapsed_msecs = elapsed_msecs64;

if (todo) {
printk("\n");
- printk(KERN_ERR "Freezing of tasks %s after %d.%02d seconds "
+ printk(KERN_ERR "Freezing of tasks %s after %d.%03d seconds "
"(%d tasks refusing to freeze, wq_busy=%d):\n",
wakeup ? "aborted" : "failed",
- elapsed_csecs / 100, elapsed_csecs % 100,
+ elapsed_msecs / 1000, elapsed_msecs % 1000,
todo - wq_busy, wq_busy);

if (!wakeup) {
@@ -96,8 +100,8 @@ static int try_to_freeze_tasks(bool user_only)
read_unlock(&tasklist_lock);
}
} else {
- printk("(elapsed %d.%02d seconds) ", elapsed_csecs / 100,
- elapsed_csecs % 100);
+ printk("(elapsed %d.%03d seconds) ", elapsed_msecs / 1000,
+ elapsed_msecs % 1000);
}

return todo ? -EBUSY : 0;
--
1.8.2.1

Colin Cross

unread,
May 6, 2013, 7:56:01 PM5/6/13
to linux-...@vger.kernel.org, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Colin Cross, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, Ben Chan, Len Brown
From: Mandeep Singh Baines <m...@chromium.org>

We shouldn't try_to_freeze if locks are held. Holding a lock can cause a
deadlock if the lock is later acquired in the suspend or hibernate path
(e.g. by dpm). Holding a lock can also cause a deadlock in the case of
cgroup_freezer if a lock is held inside a frozen cgroup that is later
acquired by a process outside that group.

History:
This patch was originally applied as 6aa9707099c and reverted in
dbf520a9d7d4 because NFS was freezing with locks held. It was
deemed better to keep the bad freeze point in NFS to allow laptops
to suspend consistently. The previous patch in this series converts
NFS to call _unsafe versions of the freezable helpers so that
lockdep doesn't complain about them until a more correct fix
can be applied.

[ak...@linux-foundation.org: export debug_check_no_locks_held]
Signed-off-by: Mandeep Singh Baines <m...@chromium.org>
Cc: Ben Chan <ben...@chromium.org>
Cc: Oleg Nesterov <ol...@redhat.com>
Cc: Tejun Heo <t...@kernel.org>
Cc: Rafael J. Wysocki <r...@sisk.pl>
Cc: Ingo Molnar <mi...@redhat.com>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
[ccr...@android.com: don't warn if try_to_freeze_unsafe is called]
Signed-off-by: Colin Cross <ccr...@android.com>
---
include/linux/freezer.h | 3 +++
1 file changed, 3 insertions(+)

diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index d3c038e..bcf9e65 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -3,6 +3,7 @@
#ifndef FREEZER_H_INCLUDED
#define FREEZER_H_INCLUDED

+#include <linux/debug_locks.h>
#include <linux/sched.h>
#include <linux/wait.h>
#include <linux/atomic.h>
@@ -60,6 +61,8 @@ static inline bool try_to_freeze_unsafe(void)

static inline bool try_to_freeze(void)
{
+ if (!(current->flags & PF_NOFREEZE))
+ debug_check_no_locks_held();
return try_to_freeze_unsafe();
}

--
1.8.2.1

Colin Cross

unread,
May 6, 2013, 7:56:20 PM5/6/13
to linux-...@vger.kernel.org, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Colin Cross, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, Paul Walmsley, Al Viro, Eric W. Biederman, David Howells
The only existing caller to debug_check_no_locks_held calls it
with 'current' as the task, and the freezer needs to call
debug_check_no_locks_held but doesn't already have a current
task pointer, so remove the argument. It is already assuming
that the current task is relevant by dumping the current stack
trace as part of the warning.

This was originally part of 6aa9707099c (lockdep: check that
no locks held at freeze time) which was reverted in
dbf520a9d7d4.

Original-author: Mandeep Singh Baines <m...@chromium.org>
Signed-off-by: Colin Cross <ccr...@android.com>
---
include/linux/debug_locks.h | 4 ++--
kernel/exit.c | 2 +-
kernel/lockdep.c | 17 ++++++++---------
3 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h
index 3bd46f7..a975de1 100644
--- a/include/linux/debug_locks.h
+++ b/include/linux/debug_locks.h
@@ -51,7 +51,7 @@ struct task_struct;
extern void debug_show_all_locks(void);
extern void debug_show_held_locks(struct task_struct *task);
extern void debug_check_no_locks_freed(const void *from, unsigned long len);
-extern void debug_check_no_locks_held(struct task_struct *task);
+extern void debug_check_no_locks_held(void);
#else
static inline void debug_show_all_locks(void)
{
@@ -67,7 +67,7 @@ debug_check_no_locks_freed(const void *from, unsigned long len)
}

static inline void
-debug_check_no_locks_held(struct task_struct *task)
+debug_check_no_locks_held(void)
{
}
#endif
diff --git a/kernel/exit.c b/kernel/exit.c
index 60bc027..51e485c 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -835,7 +835,7 @@ void do_exit(long code)
/*
* Make sure we are holding no locks:
*/
- debug_check_no_locks_held(tsk);
+ debug_check_no_locks_held();
/*
* We can do this unlocked here. The futex code uses this flag
* just to verify whether the pi state cleanup has been done
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 8a0efac..259db20 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -4088,7 +4088,7 @@ void debug_check_no_locks_freed(const void *mem_from, unsigned long mem_len)
}
EXPORT_SYMBOL_GPL(debug_check_no_locks_freed);

-static void print_held_locks_bug(struct task_struct *curr)
+static void print_held_locks_bug(void)
{
if (!debug_locks_off())
return;
@@ -4097,22 +4097,21 @@ static void print_held_locks_bug(struct task_struct *curr)

printk("\n");
printk("=====================================\n");
- printk("[ BUG: lock held at task exit time! ]\n");
+ printk("[ BUG: %s/%d still has locks held! ]\n",
+ current->comm, task_pid_nr(current));
print_kernel_ident();
printk("-------------------------------------\n");
- printk("%s/%d is exiting with locks still held!\n",
- curr->comm, task_pid_nr(curr));
- lockdep_print_held_locks(curr);
-
+ lockdep_print_held_locks(current);
printk("\nstack backtrace:\n");
dump_stack();
}

-void debug_check_no_locks_held(struct task_struct *task)
+void debug_check_no_locks_held(void)
{
- if (unlikely(task->lockdep_depth > 0))
- print_held_locks_bug(task);
+ if (unlikely(current->lockdep_depth > 0))
+ print_held_locks_bug();
}
+EXPORT_SYMBOL_GPL(debug_check_no_locks_held);

void debug_show_all_locks(void)
{
--
1.8.2.1

Colin Cross

unread,
May 6, 2013, 7:56:38 PM5/6/13
to linux-...@vger.kernel.org, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Colin Cross, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, Len Brown
CIFS calls wait_event_freezekillable_unsafe with a VFS lock held,
which is unsafe and will cause lockdep warnings when 6aa9707
"lockdep: check that no locks held at freeze time" is reapplied
(it was reverted in dbf520a). CIFS shouldn't be doing this, but
it has long-running syscalls that must hold a lock but also
shouldn't block suspend. Until CIFS freeze handling is rewritten
to use a signal to exit out of the critical section, add a new
wait_event_freezekillable_unsafe helper that will not run the
lockdep test when 6aa9707 is reapplied, and call it from CIFS.

In practice the likley result of holding the lock while freezing
is that a second task blocked on the lock will never freeze,
aborting suspend, but it is possible to manufacture a case using
the cgroup freezer, the lock, and the suspend freezer to create
a deadlock. Silencing the lockdep warning here will allow
problems to be found in other drivers that may have a more
serious deadlock risk, and prevent new problems from being added.

Signed-off-by: Colin Cross <ccr...@android.com>
---
include/linux/freezer.h | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index 5b31e21c..d3c038e 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -212,6 +212,16 @@ static inline bool freezer_should_skip(struct task_struct *p)
__retval; \
})

+/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
+#define wait_event_freezekillable_unsafe(wq, condition) \
+({ \
+ int __retval; \
+ freezer_do_not_count(); \
+ __retval = wait_event_killable(wq, (condition)); \
+ freezer_count_unsafe(); \
+ __retval; \
+})
+
#define wait_event_freezable(wq, condition) \
({ \
int __retval; \
@@ -277,6 +287,9 @@ static inline void set_freezable(void) {}
#define wait_event_freezekillable(wq, condition) \
wait_event_killable(wq, condition)

+#define wait_event_freezekillable_unsafe(wq, condition) \
+ wait_event_killable(wq, condition)
+
#endif /* !CONFIG_FREEZER */

#endif /* FREEZER_H_INCLUDED */
--
1.8.2.1

Colin Cross

unread,
May 6, 2013, 7:57:39 PM5/6/13
to linux-...@vger.kernel.org, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Colin Cross, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, Greg Kroah-Hartman, Al Viro, Arve Hjønnevåg, Eric W. Biederman, Sachin Kamat, de...@driverdev.osuosl.org
Avoid waking up every thread sleeping in a binder call during
suspend and resume by calling a freezable blocking call. Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.

This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.

Acked-by: Tejun Heo <t...@kernel.org>
Signed-off-by: Colin Cross <ccr...@android.com>
---
drivers/staging/android/binder.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/android/binder.c b/drivers/staging/android/binder.c
index 24456a0..af8fba4 100644
--- a/drivers/staging/android/binder.c
+++ b/drivers/staging/android/binder.c
@@ -20,6 +20,7 @@
#include <asm/cacheflush.h>
#include <linux/fdtable.h>
#include <linux/file.h>
+#include <linux/freezer.h>
#include <linux/fs.h>
#include <linux/list.h>
#include <linux/miscdevice.h>
@@ -2140,13 +2141,13 @@ retry:
if (!binder_has_proc_work(proc, thread))
ret = -EAGAIN;
} else
- ret = wait_event_interruptible_exclusive(proc->wait, binder_has_proc_work(proc, thread));
+ ret = wait_event_freezable_exclusive(proc->wait, binder_has_proc_work(proc, thread));
} else {
if (non_block) {
if (!binder_has_thread_work(thread))
ret = -EAGAIN;
} else
- ret = wait_event_interruptible(thread->wait, binder_has_thread_work(thread));
+ ret = wait_event_freezable(thread->wait, binder_has_thread_work(thread));
}

binder_lock(__func__);
--
1.8.2.1

Jeff Layton

unread,
May 7, 2013, 6:08:26 AM5/7/13
to Colin Cross, linux-...@vger.kernel.org, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, Len Brown
I think you also need to convert wait_for_response in the cifs code to
use this helper. While it's a pretty straightforward change, you should
probably cc linux...@vger.kernel.org as well.

--
Jeff Layton <jla...@redhat.com>

Pavel Machek

unread,
May 7, 2013, 8:28:29 AM5/7/13
to Colin Cross, linux-...@vger.kernel.org, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, Len Brown
On Mon 2013-05-06 16:50:07, Colin Cross wrote:
> CIFS calls wait_event_freezekillable_unsafe with a VFS lock held,
> which is unsafe and will cause lockdep warnings when 6aa9707
> "lockdep: check that no locks held at freeze time" is reapplied
> (it was reverted in dbf520a). CIFS shouldn't be doing this, but
> it has long-running syscalls that must hold a lock but also
> shouldn't block suspend. Until CIFS freeze handling is rewritten
> to use a signal to exit out of the critical section, add a new
> wait_event_freezekillable_unsafe helper that will not run the
> lockdep test when 6aa9707 is reapplied, and call it from CIFS.
>
> In practice the likley result of holding the lock while freezing
> is that a second task blocked on the lock will never freeze,
> aborting suspend, but it is possible to manufacture a case using
> the cgroup freezer, the lock, and the suspend freezer to create
> a deadlock. Silencing the lockdep warning here will allow
> problems to be found in other drivers that may have a more
> serious deadlock risk, and prevent new problems from being added.
>
> Signed-off-by: Colin Cross <ccr...@android.com>

Acked-by: Pavel Machek <pa...@ucw.cz>

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Pavel Machek

unread,
May 7, 2013, 8:28:55 AM5/7/13
to Colin Cross, linux-...@vger.kernel.org, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, Paul Walmsley, Al Viro, Eric W. Biederman, David Howells
On Mon 2013-05-06 16:50:08, Colin Cross wrote:
> The only existing caller to debug_check_no_locks_held calls it
> with 'current' as the task, and the freezer needs to call
> debug_check_no_locks_held but doesn't already have a current
> task pointer, so remove the argument. It is already assuming
> that the current task is relevant by dumping the current stack
> trace as part of the warning.
>
> This was originally part of 6aa9707099c (lockdep: check that
> no locks held at freeze time) which was reverted in
> dbf520a9d7d4.
>
> Original-author: Mandeep Singh Baines <m...@chromium.org>
> Signed-off-by: Colin Cross <ccr...@android.com>

Pavel Machek

unread,
May 7, 2013, 8:29:27 AM5/7/13
to Colin Cross, linux-...@vger.kernel.org, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, Ben Chan, Len Brown

Colin Cross

unread,
May 7, 2013, 1:47:58 PM5/7/13
to Jeff Layton, lkml, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Oleg Nesterov, linux-nfs, Linux PM list, netdev, Linus Torvalds, Tejun Heo, Len Brown
Oops, dropped a hunk which is why linux-cifs didn't get cc'd. I will resend it.

Colin Cross

unread,
May 7, 2013, 1:52:42 PM5/7/13
to linux-...@vger.kernel.org, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Colin Cross, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, Steve French, Len Brown, linux...@vger.kernel.org, samba-t...@lists.samba.org
CIFS calls wait_event_freezekillable_unsafe with a VFS lock held,
which is unsafe and will cause lockdep warnings when 6aa9707
"lockdep: check that no locks held at freeze time" is reapplied
(it was reverted in dbf520a). CIFS shouldn't be doing this, but
it has long-running syscalls that must hold a lock but also
shouldn't block suspend. Until CIFS freeze handling is rewritten
to use a signal to exit out of the critical section, add a new
wait_event_freezekillable_unsafe helper that will not run the
lockdep test when 6aa9707 is reapplied, and call it from CIFS.

In practice the likley result of holding the lock while freezing
is that a second task blocked on the lock will never freeze,
aborting suspend, but it is possible to manufacture a case using
the cgroup freezer, the lock, and the suspend freezer to create
a deadlock. Silencing the lockdep warning here will allow
problems to be found in other drivers that may have a more
serious deadlock risk, and prevent new problems from being added.

Acked-by: Pavel Machek <pa...@ucw.cz>
Signed-off-by: Colin Cross <ccr...@android.com>
---
v4:
Corrected to include CIFS wait_for_response hunk.
The rest of this series is still at v3.

fs/cifs/transport.c | 2 +-
include/linux/freezer.h | 13 +++++++++++++
2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index 1a52868..e7f22f8 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -452,7 +452,7 @@ wait_for_response(struct TCP_Server_Info *server, struct mid_q_entry *midQ)
{
int error;

- error = wait_event_freezekillable(server->response_q,
+ error = wait_event_freezekillable_unsafe(server->response_q,
midQ->mid_state != MID_REQUEST_SUBMITTED);
if (error < 0)
return -ERESTARTSYS;
--
1.8.2.1

Jeff Layton

unread,
May 7, 2013, 2:12:38 PM5/7/13
to Colin Cross, linux-...@vger.kernel.org, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, Steve French, Len Brown, linux...@vger.kernel.org, samba-t...@lists.samba.org
Looks fine...

Reviewed-by: Jeff Layton <jla...@redhat.com>

Tejun Heo

unread,
May 7, 2013, 2:12:53 PM5/7/13
to Colin Cross, linux-...@vger.kernel.org, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds
Hello,

On Mon, May 06, 2013 at 04:50:05PM -0700, Colin Cross wrote:
> On slow cpus the large number of task wakeups and context switches
> triggered by freezing and thawing tasks can take a significant amount
> of cpu time. This patch series reduces the amount of work done during
> freezing tasks by avoiding waking up tasks that are already in a freezable
> state.

For the whole series,

Acked-by: Tejun Heo <t...@kernel.org>

Thanks a lot!

--
tejun

Rafael J. Wysocki

unread,
May 7, 2013, 7:54:19 PM5/7/13
to Tejun Heo, Colin Cross, linux-...@vger.kernel.org, Pavel Machek, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds
On Tuesday, May 07, 2013 11:12:37 AM Tejun Heo wrote:
> Hello,
>
> On Mon, May 06, 2013 at 04:50:05PM -0700, Colin Cross wrote:
> > On slow cpus the large number of task wakeups and context switches
> > triggered by freezing and thawing tasks can take a significant amount
> > of cpu time. This patch series reduces the amount of work done during
> > freezing tasks by avoiding waking up tasks that are already in a freezable
> > state.
>
> For the whole series,
>
> Acked-by: Tejun Heo <t...@kernel.org>
>
> Thanks a lot!

All 16 patches queued up as v3.11 material.

Many thanks to everyone involved,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

tip-bot for Colin Cross

unread,
Jun 25, 2013, 5:16:45 PM6/25/13
to linux-ti...@vger.kernel.org, linux-...@vger.kernel.org, h...@zytor.com, mi...@kernel.org, dvh...@linux.intel.com, rdu...@infradead.org, vi...@zeniv.linux.org.uk, ccr...@android.com, t...@kernel.org, tg...@linutronix.de, ol...@redhat.com, r...@sisk.pl
Commit-ID: 88c8004fd3a5fdd2378069de86b90b21110d33a4
Gitweb: http://git.kernel.org/tip/88c8004fd3a5fdd2378069de86b90b21110d33a4
Author: Colin Cross <ccr...@android.com>
AuthorDate: Wed, 1 May 2013 18:35:05 -0700
Committer: Thomas Gleixner <tg...@linutronix.de>
CommitDate: Tue, 25 Jun 2013 23:11:19 +0200

futex: Use freezable blocking call

Avoid waking up every thread sleeping in a futex_wait call during
suspend and resume by calling a freezable blocking call. Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.

This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.

Signed-off-by: Colin Cross <ccr...@android.com>
Cc: Rafael J. Wysocki <r...@sisk.pl>
Cc: ar...@android.com
Cc: Tejun Heo <t...@kernel.org>
Cc: Oleg Nesterov <ol...@redhat.com>
Cc: Darren Hart <dvh...@linux.intel.com>
Cc: Randy Dunlap <rdu...@infradead.org>
Cc: Al Viro <vi...@zeniv.linux.org.uk>
Link: http://lkml.kernel.org/r/1367458508-9133-8-g...@android.com
Signed-off-by: Thomas Gleixner <tg...@linutronix.de>
---
kernel/futex.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 49dacfb..c3a1a55 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -62,6 +62,7 @@
#include <linux/ptrace.h>
#include <linux/sched/rt.h>
#include <linux/hugetlb.h>
+#include <linux/freezer.h>

#include <asm/futex.h>

@@ -1808,7 +1809,7 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
* is no timeout, or if it has yet to expire.
*/
if (!timeout || timeout->task)
- schedule();
+ freezable_schedule();
}
__set_current_state(TASK_RUNNING);
}

Michael Leun

unread,
Jul 22, 2013, 7:04:54 PM7/22/13
to Colin Cross, linux-...@vger.kernel.org, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, Darren Hart, Thomas Gleixner, Randy Dunlap, Al Viro
On Mon, 6 May 2013 16:50:18 -0700
Colin Cross <ccr...@android.com> wrote:

> Avoid waking up every thread sleeping in a futex_wait call during
[...]

With 3.11-rc s2disk from suspend-utils stopped working: Frozen at
displaying 0% of saving image to disk.

echo "1" >/sys/power/state still works.

Bisecting yielded 88c8004fd3a5fdd2378069de86b90b21110d33a4, reverting
that from 3.11-rc2 makes s2disk working again.

--
MfG,

Michael Leun

Pavel Machek

unread,
Jul 22, 2013, 8:26:37 PM7/22/13
to Michael Leun, Colin Cross, linux-...@vger.kernel.org, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Oleg Nesterov, linu...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, Linus Torvalds, Tejun Heo, Darren Hart, Thomas Gleixner, Randy Dunlap, Al Viro
On Tue 2013-07-23 01:02:50, Michael Leun wrote:
> On Mon, 6 May 2013 16:50:18 -0700
> Colin Cross <ccr...@android.com> wrote:
>
> > Avoid waking up every thread sleeping in a futex_wait call during
> [...]
>
> With 3.11-rc s2disk from suspend-utils stopped working: Frozen at
> displaying 0% of saving image to disk.
>
> echo "1" >/sys/power/state still works.
>
> Bisecting yielded 88c8004fd3a5fdd2378069de86b90b21110d33a4, reverting
> that from 3.11-rc2 makes s2disk working again.

Would id be possible to get all the backtraces using magic sysrq?

..actually...

I see what could happen. Before, system hibernated in state where all
the futexes were unlocked. Now, it can happen that we attempt s2disk
with futex held. s2disk should not depend on other parts of userspace,
and should not take futexes, but maybe it does...?
Pavel

Linus Torvalds

unread,
Jul 22, 2013, 8:33:00 PM7/22/13
to Colin Cross, Michael Leun, lkml, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Oleg Nesterov, linux-nfs, Linux PM list, netdev, Tejun Heo, Darren Hart, Thomas Gleixner, Randy Dunlap, Al Viro
On Mon, Jul 22, 2013 at 4:55 PM, Colin Cross <ccr...@android.com> wrote:
>
> I think the right solution is to add a flag to the freezing task that
> marks it unfreezable. I think PF_NOFREEZE would work, although it is
> normally used on kernel threads, can you see if the attached patch
> helps?

Hmm. That does seem to be the right thing to do, but I wonder about
the *other* callers of freeze_processes() IOW, kexec and friends.

So maybe we should do this in {freeze|thaw}_processes() itself, and
just make the rule be that the caller of freeze_processes() itself is
obviously not frozen, and has to be the same one that then thaws
things?

Colin? Rafael? Comments?

Linus

Colin Cross

unread,
Jul 22, 2013, 8:43:00 PM7/22/13
to Linus Torvalds, Michael Leun, lkml, Pavel Machek, Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Oleg Nesterov, linux-nfs, Linux PM list, netdev, Tejun Heo, Darren Hart, Thomas Gleixner, Randy Dunlap, Al Viro
On Mon, Jul 22, 2013 at 5:32 PM, Linus Torvalds
<torv...@linux-foundation.org> wrote:
> On Mon, Jul 22, 2013 at 4:55 PM, Colin Cross <ccr...@android.com> wrote:
>>
>> I think the right solution is to add a flag to the freezing task that
>> marks it unfreezable. I think PF_NOFREEZE would work, although it is
>> normally used on kernel threads, can you see if the attached patch
>> helps?
>
> Hmm. That does seem to be the right thing to do, but I wonder about
> the *other* callers of freeze_processes() IOW, kexec and friends.
>
> So maybe we should do this in {freeze|thaw}_processes() itself, and
> just make the rule be that the caller of freeze_processes() itself is
> obviously not frozen, and has to be the same one that then thaws
> things?
>
> Colin? Rafael? Comments?
>
> Linus

I was worried about clearing the flag in thaw_processes(). If a
kernel thread with PF_NOFREEZE set ever called thaw_processes(), which
autosleep might do, it would clear the flag. Or if a different thread
called freeze_processes() and thaw_processes(). All the other callers
besides the SNAPSHOT_FREEZE ioctl stay in the kernel between
freeze_processes() and thaw_processes(), which makes the fanout of
places that could call try_to_freeze() much more controllable.

Using a new flag that operates like PF_NOFREEZE but doesn't conflict
with it, or a nofreeze_depth counter, would also work.

Rafael J. Wysocki

unread,
Jul 22, 2013, 9:31:49 PM7/22/13
to Colin Cross, Linus Torvalds, Michael Leun, lkml, Pavel Machek, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Oleg Nesterov, linux-nfs, Linux PM list, netdev, Tejun Heo, Darren Hart, Thomas Gleixner, Randy Dunlap, Al Viro
On Monday, July 22, 2013 05:42:49 PM Colin Cross wrote:
> On Mon, Jul 22, 2013 at 5:32 PM, Linus Torvalds
> <torv...@linux-foundation.org> wrote:
> > On Mon, Jul 22, 2013 at 4:55 PM, Colin Cross <ccr...@android.com> wrote:
> >>
> >> I think the right solution is to add a flag to the freezing task that
> >> marks it unfreezable. I think PF_NOFREEZE would work, although it is
> >> normally used on kernel threads, can you see if the attached patch
> >> helps?
> >
> > Hmm. That does seem to be the right thing to do, but I wonder about
> > the *other* callers of freeze_processes() IOW, kexec and friends.
> >
> > So maybe we should do this in {freeze|thaw}_processes() itself, and
> > just make the rule be that the caller of freeze_processes() itself is
> > obviously not frozen, and has to be the same one that then thaws
> > things?
> >
> > Colin? Rafael? Comments?
> >
> > Linus
>
> I was worried about clearing the flag in thaw_processes(). If a
> kernel thread with PF_NOFREEZE set ever called thaw_processes(), which
> autosleep might do, it would clear the flag. Or if a different thread
> called freeze_processes() and thaw_processes().

Is that legitimate?

> All the other callers besides the SNAPSHOT_FREEZE ioctl stay in the kernel
> between freeze_processes() and thaw_processes(), which makes the fanout of
> places that could call try_to_freeze() much more controllable.
>
> Using a new flag that operates like PF_NOFREEZE but doesn't conflict
> with it, or a nofreeze_depth counter, would also work.

Well, that would be robust enough. At least if the purpose of that new flag
is clearly specified, people hopefully won't be tempted to optimize it away in
the future.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Colin Cross

unread,
Jul 23, 2013, 2:28:53 AM7/23/13
to Rafael J. Wysocki, Linus Torvalds, Michael Leun, lkml, Pavel Machek, Peter Zijlstra, Ingo Molnar, Andrew Morton, Mandeep Singh Baines, Oleg Nesterov, linux-nfs, Linux PM list, netdev, Tejun Heo, Darren Hart, Thomas Gleixner, Randy Dunlap, Al Viro
On Mon, Jul 22, 2013 at 6:41 PM, Rafael J. Wysocki <r...@sisk.pl> wrote:
> On Monday, July 22, 2013 05:42:49 PM Colin Cross wrote:
>> On Mon, Jul 22, 2013 at 5:32 PM, Linus Torvalds
>> <torv...@linux-foundation.org> wrote:
>> > On Mon, Jul 22, 2013 at 4:55 PM, Colin Cross <ccr...@android.com> wrote:
>> >>
>> >> I think the right solution is to add a flag to the freezing task that
>> >> marks it unfreezable. I think PF_NOFREEZE would work, although it is
>> >> normally used on kernel threads, can you see if the attached patch
>> >> helps?
>> >
>> > Hmm. That does seem to be the right thing to do, but I wonder about
>> > the *other* callers of freeze_processes() IOW, kexec and friends.
>> >
>> > So maybe we should do this in {freeze|thaw}_processes() itself, and
>> > just make the rule be that the caller of freeze_processes() itself is
>> > obviously not frozen, and has to be the same one that then thaws
>> > things?
>> >
>> > Colin? Rafael? Comments?
>> >
>> > Linus
>>
>> I was worried about clearing the flag in thaw_processes(). If a
>> kernel thread with PF_NOFREEZE set ever called thaw_processes(), which
>> autosleep might do, it would clear the flag. Or if a different thread
>> called freeze_processes() and thaw_processes().
>
> Is that legitimate?

Nothing precludes it today, but I don't see any need for it. I'll add
a comment when I add the flag.

>> All the other callers besides the SNAPSHOT_FREEZE ioctl stay in the kernel
>> between freeze_processes() and thaw_processes(), which makes the fanout of
>> places that could call try_to_freeze() much more controllable.
>>
>> Using a new flag that operates like PF_NOFREEZE but doesn't conflict
>> with it, or a nofreeze_depth counter, would also work.
>
> Well, that would be robust enough. At least if the purpose of that new flag
> is clearly specified, people hopefully won't be tempted to optimize it away in
> the future.
>
> Thanks,
> Rafael

OK, I'll add a new flag.
It is loading more messages.
0 new messages