[PATCH 4/8] PM / Sleep: Use wait queue to signal "no wakeup events in progress"

Rafael J. Wysocki

Feb 6, 2012, 8:06:14 PM
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern
From: Rafael J. Wysocki <r...@sisk.pl>

The current wakeup source deactivation code doesn't do anything when
the counter of wakeup events in progress goes down to zero, which
requires pm_get_wakeup_count() to poll that counter periodically.
Although this reduces the average time it takes to deactivate a
wakeup source, it also may lead to a substantial amount of unnecessary
polling if there are extended periods of wakeup activity. Thus it
seems reasonable to use a wait queue for signaling the "no wakeup
events in progress" condition and remove the polling.

Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
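
Illustration only, not part of the patch: a user-space sketch of the same
idea, assuming POSIX threads. A condition variable stands in for
wakeup_count_wait_queue, and event_begin(), event_end() and wait_for_idle()
are hypothetical names. The waiter sleeps until the last event in progress
finishes instead of re-checking the counter on a timer.

```c
#include <pthread.h>

/* Hypothetical user-space stand-ins for the kernel objects in the patch. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t no_events = PTHREAD_COND_INITIALIZER; /* the "wait queue" */
static unsigned int in_progress;	/* wakeup events being processed */
static unsigned int registered;		/* wakeup events fully processed */

/* A wakeup event starts being processed. */
void event_begin(void)
{
	pthread_mutex_lock(&lock);
	in_progress++;
	pthread_mutex_unlock(&lock);
}

/* Processing finished; wake all waiters once nothing is in progress. */
void event_end(void)
{
	pthread_mutex_lock(&lock);
	if (in_progress > 0) {
		in_progress--;
		registered++;
	}
	if (in_progress == 0)
		pthread_cond_broadcast(&no_events);
	pthread_mutex_unlock(&lock);
}

/* Sleep until no events are in progress, then return the event count. */
unsigned int wait_for_idle(void)
{
	unsigned int count;

	pthread_mutex_lock(&lock);
	while (in_progress != 0)
		pthread_cond_wait(&no_events, &lock);
	count = registered;
	pthread_mutex_unlock(&lock);
	return count;
}
```

The kernel version additionally stops waiting when a signal is pending,
which this sketch does not model.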
drivers/base/power/wakeup.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)

Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -17,8 +17,6 @@

#include "power.h"

-#define TIMEOUT 100
-
/*
* If set, the suspend/hibernate code will abort transitions to a sleep state
* if wakeup events are registered during or immediately before the transition.
@@ -52,6 +50,8 @@ static void pm_wakeup_timer_fn(unsigned

static LIST_HEAD(wakeup_sources);

+static DECLARE_WAIT_QUEUE_HEAD(wakeup_count_wait_queue);
+
/**
* wakeup_source_create - Create a struct wakeup_source object.
* @name: Name of the new wakeup source.
@@ -84,7 +84,7 @@ void wakeup_source_destroy(struct wakeup
while (ws->active) {
spin_unlock_irq(&ws->lock);

- schedule_timeout_interruptible(msecs_to_jiffies(TIMEOUT));
+ schedule_timeout_interruptible(msecs_to_jiffies(100));

spin_lock_irq(&ws->lock);
}
@@ -411,6 +411,7 @@ EXPORT_SYMBOL_GPL(pm_stay_awake);
*/
static void wakeup_source_deactivate(struct wakeup_source *ws)
{
+ unsigned int cnt, inpr;
ktime_t duration;
ktime_t now;

@@ -444,6 +445,10 @@ static void wakeup_source_deactivate(str
* couter of wakeup events in progress simultaneously.
*/
atomic_add(MAX_IN_PROGRESS, &combined_event_count);
+
+ split_counters(&cnt, &inpr);
+ if (!inpr)
+ wake_up_all(&wakeup_count_wait_queue);
}

/**
@@ -624,14 +629,19 @@ bool pm_wakeup_pending(void)
bool pm_get_wakeup_count(unsigned int *count)
{
unsigned int cnt, inpr;
+ DEFINE_WAIT(wait);

for (;;) {
+ prepare_to_wait(&wakeup_count_wait_queue, &wait,
+ TASK_INTERRUPTIBLE);
split_counters(&cnt, &inpr);
if (inpr == 0 || signal_pending(current))
break;
pm_wakeup_update_hit_counts();
- schedule_timeout_interruptible(msecs_to_jiffies(TIMEOUT));
+
+ schedule();
}
+ finish_wait(&wakeup_count_wait_queue, &wait);

split_counters(&cnt, &inpr);
*count = cnt;

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Rafael J. Wysocki

Feb 6, 2012, 8:06:26 PM
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern
From: Rafael J. Wysocki <r...@sisk.pl>

Currently, the device suspend code only checks if there have been
any wakeup events, and therefore the ongoing system transition to a
sleep state should be aborted, during the first (i.e. "suspend")
device suspend phase. However, wakeup events may be reported later
as well, so it's reasonable to look for them in the subsequent
(i.e. "late suspend" and "suspend noirq") phases.

Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
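
Illustration only, not part of the patch: a stand-alone sketch of the
control flow added to each phase. struct fake_dev, wakeup_pending and
suspend_phase() are hypothetical stand-ins for the device list,
pm_wakeup_pending() and the dpm_suspend_*() loops. After each device has
been handled, the loop checks for a pending wakeup event and bails out
with -EBUSY.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device: may report a wakeup event while being suspended. */
struct fake_dev {
	const char *name;
	bool suspended;
	bool raises_wakeup;
};

/* Stand-in for the state that pm_wakeup_pending() reports. */
static bool wakeup_pending;

/* Run one suspend phase; abort as soon as a wakeup event is pending. */
int suspend_phase(struct fake_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		devs[i].suspended = true;	/* the phase callback ran */
		if (devs[i].raises_wakeup)
			wakeup_pending = true;

		if (wakeup_pending)
			return -EBUSY;	/* remaining devices are left alone */
	}
	return 0;
}
```

In this model, devices after the one that reported the event are never
touched, so the resume path has less to undo.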
drivers/base/power/main.c | 10 ++++++++++
1 file changed, 10 insertions(+)

Index: linux/drivers/base/power/main.c
===================================================================
--- linux.orig/drivers/base/power/main.c
+++ linux/drivers/base/power/main.c
@@ -889,6 +889,11 @@ static int dpm_suspend_noirq(pm_message_
if (!list_empty(&dev->power.entry))
list_move(&dev->power.entry, &dpm_noirq_list);
put_device(dev);
+
+ if (pm_wakeup_pending()) {
+ error = -EBUSY;
+ break;
+ }
}
mutex_unlock(&dpm_list_mtx);
if (error)
@@ -962,6 +967,11 @@ static int dpm_suspend_late(pm_message_t
if (!list_empty(&dev->power.entry))
list_move(&dev->power.entry, &dpm_late_early_list);
put_device(dev);
+
+ if (pm_wakeup_pending()) {
+ error = -EBUSY;
+ break;
+ }
}
mutex_unlock(&dpm_list_mtx);
if (error)

Rafael J. Wysocki

Feb 6, 2012, 8:06:40 PM
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern
From: Rafael J. Wysocki <r...@sisk.pl>

Wakeup statistics used by Android are slightly different from what we
have at the moment, so modify them to follow Android more closely.

This removes the struct wakeup_source's hit_count field, which is very
rough and therefore not very useful, and adds two new fields,
wakeup_count and expire_count. The first one tracks how many times
the wakeup source is activated with events_check_enabled set (which
roughly corresponds to the situations when a system power transition
to a sleep state is in progress and should be aborted by this wakeup
source if it is the only active one at that time) and the second one
is the number of times the wakeup source has been activated with a
timeout that expired.

Additionally, the last_time field is now updated when the wakeup
source is deactivated too (previously it was only updated during
the wakeup source's activation), which seems to be what Android does
with the analogous counter for wakelocks.

---
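
Illustration only, not part of the patch: a simplified user-space model of
how the two new counters and last_time are meant to evolve. struct
ws_stats, ws_report() and ws_relax() are hypothetical names, time is a
plain integer instead of jiffies/ktime, and both activation paths are
folded into one function for brevity.

```c
#include <stdbool.h>

/* Hypothetical, simplified mirror of the statistics fields. */
struct ws_stats {
	unsigned long event_count;
	unsigned long active_count;
	unsigned long relax_count;
	unsigned long wakeup_count;	/* activations while checks were enabled */
	unsigned long expire_count;	/* deactivations after the timeout passed */
	long last_time;			/* last activation or deactivation */
	long timer_expires;
	bool active;
	bool has_timeout;
};

/* Report an event; timeout == 0 means "stay awake until relaxed". */
void ws_report(struct ws_stats *ws, long now, long timeout, bool check_enabled)
{
	ws->event_count++;
	if (!ws->active) {
		ws->active = true;
		ws->active_count++;
		ws->last_time = now;
	}
	if (check_enabled)
		ws->wakeup_count++;
	if (timeout > 0) {
		ws->timer_expires = now + timeout;
		ws->has_timeout = true;
	}
}

/* Deactivate the source and update the statistics. */
void ws_relax(struct ws_stats *ws, long now)
{
	if (!ws->active)
		return;
	ws->active = false;
	ws->relax_count++;
	ws->last_time = now;	/* now updated on deactivation too */
	if (ws->has_timeout && now > ws->timer_expires)
		ws->expire_count++;
	ws->has_timeout = false;
}
```

The strict "now > timer_expires" comparison follows the time_after() test
used in the patch.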
drivers/base/power/sysfs.c | 30 +++++++++++++++++++++++-----
drivers/base/power/wakeup.c | 47 +++++++++++++++++---------------------------
include/linux/pm_wakeup.h | 12 +++++++----
3 files changed, 52 insertions(+), 37 deletions(-)

Index: linux/include/linux/pm_wakeup.h
===================================================================
--- linux.orig/include/linux/pm_wakeup.h
+++ linux/include/linux/pm_wakeup.h
@@ -33,12 +33,14 @@
*
* @total_time: Total time this wakeup source has been active.
* @max_time: Maximum time this wakeup source has been continuously active.
- * @last_time: Monotonic clock when the wakeup source's was activated last time.
+ * @last_time: Monotonic clock when the wakeup source's was touched last time.
* @event_count: Number of signaled wakeup events.
* @active_count: Number of times the wakeup sorce was activated.
* @relax_count: Number of times the wakeup sorce was deactivated.
- * @hit_count: Number of times the wakeup sorce might abort system suspend.
+ * @expire_count: Number of times the wakeup source's timeout has expired.
+ * @wakeup_count: Number of times the wakeup source might abort suspend.
* @active: Status of the wakeup source.
+ * @has_timeout: The wakeup source has been activated with a timeout.
*/
struct wakeup_source {
char *name;
@@ -52,8 +54,10 @@ struct wakeup_source {
unsigned long event_count;
unsigned long active_count;
unsigned long relax_count;
- unsigned long hit_count;
- unsigned int active:1;
+ unsigned long expire_count;
+ unsigned long wakeup_count;
+ bool active:1;
+ bool has_timeout:1;
};

#ifdef CONFIG_PM_SLEEP
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -21,7 +21,7 @@
* If set, the suspend/hibernate code will abort transitions to a sleep state
* if wakeup events are registered during or immediately before the transition.
*/
-bool events_check_enabled;
+bool events_check_enabled __read_mostly;

/*
* Combined counters of registered wakeup events and wakeup events in progress.
@@ -370,9 +370,15 @@ void __pm_stay_awake(struct wakeup_sourc
return;

spin_lock_irqsave(&ws->lock, flags);
+
ws->event_count++;
if (!ws->active)
wakeup_source_activate(ws);
+
+ /* This is racy, but the counter is approximate anyway. */
+ if (events_check_enabled)
+ ws->wakeup_count++;
+
spin_unlock_irqrestore(&ws->lock, flags);
}
EXPORT_SYMBOL_GPL(__pm_stay_awake);
@@ -438,6 +444,11 @@ static void wakeup_source_deactivate(str
if (ktime_to_ns(duration) > ktime_to_ns(ws->max_time))
ws->max_time = duration;

+ ws->last_time = now;
+ if (ws->has_timeout && time_after(jiffies, ws->timer_expires))
+ ws->expire_count++;
+
+ ws->has_timeout = false;
del_timer(&ws->timer);

/*
@@ -542,6 +553,7 @@ void __pm_wakeup_event(struct wakeup_sou
if (time_after(expires, ws->timer_expires)) {
mod_timer(&ws->timer, expires);
ws->timer_expires = expires;
+ ws->has_timeout = true;
}

unlock:
@@ -571,24 +583,6 @@ void pm_wakeup_event(struct device *dev,
EXPORT_SYMBOL_GPL(pm_wakeup_event);

/**
- * pm_wakeup_update_hit_counts - Update hit counts of all active wakeup sources.
- */
-static void pm_wakeup_update_hit_counts(void)
-{
- unsigned long flags;
- struct wakeup_source *ws;
-
- rcu_read_lock();
- list_for_each_entry_rcu(ws, &wakeup_sources, entry) {
- spin_lock_irqsave(&ws->lock, flags);
- if (ws->active)
- ws->hit_count++;
- spin_unlock_irqrestore(&ws->lock, flags);
- }
- rcu_read_unlock();
-}
-
-/**
* pm_wakeup_pending - Check if power transition in progress should be aborted.
*
* Compare the current number of registered wakeup events with its preserved
@@ -610,8 +604,6 @@ bool pm_wakeup_pending(void)
events_check_enabled = !ret;
}
spin_unlock_irqrestore(&events_lock, flags);
- if (ret)
- pm_wakeup_update_hit_counts();
return ret;
}

@@ -637,7 +629,6 @@ bool pm_get_wakeup_count(unsigned int *c
split_counters(&cnt, &inpr);
if (inpr == 0 || signal_pending(current))
break;
- pm_wakeup_update_hit_counts();

schedule();
}
@@ -670,8 +661,6 @@ bool pm_save_wakeup_count(unsigned int c
events_check_enabled = true;
}
spin_unlock_irq(&events_lock);
- if (!events_check_enabled)
- pm_wakeup_update_hit_counts();
return events_check_enabled;
}

@@ -706,9 +695,10 @@ static int print_wakeup_source_stats(str
active_time = ktime_set(0, 0);
}

- ret = seq_printf(m, "%-12s\t%lu\t\t%lu\t\t%lu\t\t"
+ ret = seq_printf(m, "%-12s\t%lu\t\t%lu\t\t%lu\t\t%lu\t\t"
"%lld\t\t%lld\t\t%lld\t\t%lld\n",
- ws->name, active_count, ws->event_count, ws->hit_count,
+ ws->name, active_count, ws->event_count,
+ ws->wakeup_count, ws->expire_count,
ktime_to_ms(active_time), ktime_to_ms(total_time),
ktime_to_ms(max_time), ktime_to_ms(ws->last_time));

@@ -725,8 +715,9 @@ static int wakeup_sources_stats_show(str
{
struct wakeup_source *ws;

- seq_puts(m, "name\t\tactive_count\tevent_count\thit_count\t"
- "active_since\ttotal_time\tmax_time\tlast_change\n");
+ seq_puts(m, "name\t\tactive_count\tevent_count\twakeup_count\t"
+ "expire_count\tactive_since\ttotal_time\tmax_time\t"
+ "last_change\n");

rcu_read_lock();
list_for_each_entry_rcu(ws, &wakeup_sources, entry)
Index: linux/drivers/base/power/sysfs.c
===================================================================
--- linux.orig/drivers/base/power/sysfs.c
+++ linux/drivers/base/power/sysfs.c
@@ -288,22 +288,41 @@ static ssize_t wakeup_active_count_show(

static DEVICE_ATTR(wakeup_active_count, 0444, wakeup_active_count_show, NULL);

-static ssize_t wakeup_hit_count_show(struct device *dev,
- struct device_attribute *attr, char *buf)
+static ssize_t wakeup_wakeup_count_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ unsigned long count = 0;
+ bool enabled = false;
+
+ spin_lock_irq(&dev->power.lock);
+ if (dev->power.wakeup) {
+ count = dev->power.wakeup->wakeup_count;
+ enabled = true;
+ }
+ spin_unlock_irq(&dev->power.lock);
+ return enabled ? sprintf(buf, "%lu\n", count) : sprintf(buf, "\n");
+}
+
+static DEVICE_ATTR(wakeup_wakeup_count, 0444, wakeup_wakeup_count_show, NULL);
+
+static ssize_t wakeup_expire_count_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
{
unsigned long count = 0;
bool enabled = false;

spin_lock_irq(&dev->power.lock);
if (dev->power.wakeup) {
- count = dev->power.wakeup->hit_count;
+ count = dev->power.wakeup->expire_count;
enabled = true;
}
spin_unlock_irq(&dev->power.lock);
return enabled ? sprintf(buf, "%lu\n", count) : sprintf(buf, "\n");
}

-static DEVICE_ATTR(wakeup_hit_count, 0444, wakeup_hit_count_show, NULL);
+static DEVICE_ATTR(wakeup_expire_count, 0444, wakeup_expire_count_show, NULL);

static ssize_t wakeup_active_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -460,7 +479,8 @@ static struct attribute *wakeup_attrs[]
&dev_attr_wakeup.attr,
&dev_attr_wakeup_count.attr,
&dev_attr_wakeup_active_count.attr,
- &dev_attr_wakeup_hit_count.attr,
+ &dev_attr_wakeup_wakeup_count.attr,
+ &dev_attr_wakeup_expire_count.attr,
&dev_attr_wakeup_active.attr,
&dev_attr_wakeup_total_time_ms.attr,
&dev_attr_wakeup_max_time_ms.attr,

Rafael J. Wysocki

Feb 6, 2012, 8:06:44 PM
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern
From: Rafael J. Wysocki <r...@sisk.pl>

Android allows user space to manipulate wakelocks using two
sysfs files located in /sys/power/, wake_lock and wake_unlock.
Writing a wakelock name and optionally a timeout to the wake_lock
file causes the wakelock whose name was written to be acquired (it
is created first if necessary), optionally with the given timeout.
Writing the name of a wakelock to wake_unlock causes that wakelock
to be released.

Implement an analogous interface for user space using wakeup sources.
Add the /sys/power/wake_lock and /sys/power/wake_unlock files
allowing user space to create, activate and deactivate wakeup
sources, such that writing a name and optionally a timeout to
wake_lock causes the wakeup source of that name to be activated,
optionally with the given timeout. If that wakeup source doesn't
exist, it will be created and then activated. Writing a name to
wake_unlock causes the wakeup source of that name, if there is one,
to be deactivated. Wakeup sources created with the help of
wake_lock that haven't been used for more than 5 minutes are garbage
collected and destroyed. Moreover, there can be only WL_NUMBER_LIMIT
wakeup sources created with the help of wake_lock present at a time.
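
Illustration only, not part of the patch: a user-space sketch of how a
string written to wake_lock is split into a name and an optional timeout,
modeled on pm_wake_lock() below. parse_wake_lock() is a hypothetical name
and strtoull() stands in for kstrtou64(), so corner cases (such as a
leading '+') may differ. As in the patch, the timeout is read as
nanoseconds and rounded up to whole milliseconds.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NSEC_PER_MSEC 1000000ULL

/*
 * Split "name [timeout_ns]" into the name length and a timeout in ms.
 * A timeout of 0 means "no timeout". Returns 0 or -EINVAL.
 */
int parse_wake_lock(const char *buf, size_t *name_len, uint64_t *timeout_ms)
{
	const char *str = buf;
	unsigned long long ns = 0;
	char *end;

	/* The name runs up to the first whitespace character. */
	while (*str && !isspace((unsigned char)*str))
		str++;

	*name_len = (size_t)(str - buf);
	if (*name_len == 0)
		return -EINVAL;

	if (*str && *str != '\n') {
		/* Something follows the name: it must be a decimal number. */
		while (isspace((unsigned char)*str))
			str++;
		if (!isdigit((unsigned char)*str))
			return -EINVAL;
		errno = 0;
		ns = strtoull(str, &end, 10);
		if (*end == '\n')
			end++;
		if (errno || *end)
			return -EINVAL;
	}

	/* Round up to milliseconds without overflowing. */
	*timeout_ms = ns / NSEC_PER_MSEC + (ns % NSEC_PER_MSEC != 0);
	return 0;
}
```

For example, "mylock 1500000\n" yields a 6-character name and a timeout of
2 ms.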

The data type used to track wakeup sources created by user space is
called "struct wakelock" to indicate the origins of this feature.

---
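
Illustration only, not part of the patch: a minimal sketch of how a
user-space program might drive the new files, assuming a kernel built with
CONFIG_PM_WAKELOCKS. wakelock_write() is a hypothetical helper, and the
path is a parameter so it can be pointed at /sys/power/wake_lock or
/sys/power/wake_unlock.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write one request, e.g. "mylock 1000000000\n" (name plus timeout in ns)
 * to wake_lock, or "mylock\n" to wake_unlock.
 * Returns 0 on success or a negative errno value.
 */
int wakelock_write(const char *path, const char *text)
{
	size_t len = strlen(text);
	ssize_t n;
	int err = 0;
	int fd;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* The whole request has to go out in a single write(). */
	n = write(fd, text, len);
	if (n < 0)
		err = -errno;
	else if ((size_t)n != len)
		err = -EIO;

	close(fd);
	return err;
}
```

With this helper, wakelock_write("/sys/power/wake_lock", "mylock\n")
followed later by wakelock_write("/sys/power/wake_unlock", "mylock\n")
would activate and then deactivate a wakeup source named "mylock".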
drivers/base/power/wakeup.c | 1
kernel/power/Kconfig | 8 +
kernel/power/Makefile | 1
kernel/power/main.c | 41 ++++++++
kernel/power/power.h | 9 +
kernel/power/wakelock.c | 218 ++++++++++++++++++++++++++++++++++++++++++++
6 files changed, 278 insertions(+)

Index: linux/kernel/power/main.c
===================================================================
--- linux.orig/kernel/power/main.c
+++ linux/kernel/power/main.c
@@ -415,6 +415,43 @@ static ssize_t autosleep_store(struct ko

power_attr(autosleep);
#endif /* CONFIG_PM_AUTOSLEEP */
+
+#ifdef CONFIG_PM_WAKELOCKS
+static ssize_t wake_lock_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ return pm_show_wakelocks(buf, true);
+}
+
+static ssize_t wake_lock_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ int error = pm_wake_lock(buf);
+ return error ? error : n;
+}
+
+power_attr(wake_lock);
+
+static ssize_t wake_unlock_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ return pm_show_wakelocks(buf, false);
+}
+
+static ssize_t wake_unlock_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ int error = pm_wake_unlock(buf);
+ return error ? error : n;
+}
+
+power_attr(wake_unlock);
+
+#endif /* CONFIG_PM_WAKELOCKS */
#endif /* CONFIG_PM_SLEEP */

#ifdef CONFIG_PM_TRACE
@@ -471,6 +508,10 @@ static struct attribute * g[] = {
#ifdef CONFIG_PM_AUTOSLEEP
&autosleep_attr.attr,
#endif
+#ifdef CONFIG_PM_WAKELOCKS
+ &wake_lock_attr.attr,
+ &wake_unlock_attr.attr,
+#endif
#ifdef CONFIG_PM_DEBUG
&pm_test_attr.attr,
#endif
Index: linux/kernel/power/power.h
===================================================================
--- linux.orig/kernel/power/power.h
+++ linux/kernel/power/power.h
@@ -287,3 +287,12 @@ static inline void pm_autosleep_unlock(v
static inline suspend_state_t pm_autosleep_state(void) { return PM_SUSPEND_ON; }

#endif /* !CONFIG_PM_AUTOSLEEP */
+
+#ifdef CONFIG_PM_WAKELOCKS
+
+/* kernel/power/wakelock.c */
+extern ssize_t pm_show_wakelocks(char *buf, bool show_active);
+extern int pm_wake_lock(const char *buf);
+extern int pm_wake_unlock(const char *buf);
+
+#endif /* !CONFIG_PM_WAKELOCKS */
Index: linux/kernel/power/Kconfig
===================================================================
--- linux.orig/kernel/power/Kconfig
+++ linux/kernel/power/Kconfig
@@ -111,6 +111,14 @@ config PM_AUTOSLEEP
Allow the kernel to trigger a system transition into a global sleep
state automatically whenever there are no active wakeup sources.

+config PM_WAKELOCKS
+ bool "User space wakeup sources interface"
+ depends on PM_SLEEP
+ default n
+ ---help---
+ Allow user space to create, activate and deactivate wakeup source
+ objects with the help of a sysfs-based interface.
+
config PM_RUNTIME
bool "Run-time PM core functionality"
depends on !IA64_HP_SIM
Index: linux/kernel/power/wakelock.c
===================================================================
--- /dev/null
+++ linux/kernel/power/wakelock.c
@@ -0,0 +1,218 @@
+/*
+ * kernel/power/wakelock.c
+ *
+ * User space wakeup sources support.
+ *
+ * Copyright (C) 2012 Rafael J. Wysocki <r...@sisk.pl>
+ *
+ * This code is based on the analogous interface allowing user space to
+ * manipulate wakelocks on Android.
+ */
+
+#include <linux/ctype.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/hrtimer.h>
+#include <linux/list.h>
+#include <linux/rbtree.h>
+#include <linux/slab.h>
+
+#define WL_NUMBER_LIMIT 100
+#define WL_GC_COUNT_MAX 100
+#define WL_GC_TIME_SEC 300
+
+static DEFINE_MUTEX(wakelocks_lock);
+
+struct wakelock {
+ char *name;
+ struct rb_node node;
+ struct wakeup_source ws;
+ struct list_head lru;
+};
+
+static struct rb_root wakelocks_tree = RB_ROOT;
+static LIST_HEAD(wakelocks_lru_list);
+static unsigned int number_of_wakelocks;
+static unsigned int wakelocks_gc_count;
+
+ssize_t pm_show_wakelocks(char *buf, bool show_active)
+{
+ struct rb_node *node;
+ struct wakelock *wl;
+ char *str = buf;
+ char *end = buf + PAGE_SIZE;
+
+ mutex_lock(&wakelocks_lock);
+
+ for (node = rb_first(&wakelocks_tree); node; node = rb_next(node)) {
+ bool active;
+
+ wl = rb_entry(node, struct wakelock, node);
+ spin_lock_irq(&wl->ws.lock);
+ active = wl->ws.active;
+ spin_unlock_irq(&wl->ws.lock);
+ if (active == show_active)
+ str += scnprintf(str, end - str, "%s ", wl->name);
+ }
+ str += scnprintf(str, end - str, "\n");
+
+ mutex_unlock(&wakelocks_lock);
+ return (str - buf);
+}
+
+static struct wakelock *wakelock_lookup_add(const char *name, size_t len,
+ bool add_if_not_found)
+{
+ struct rb_node **node = &wakelocks_tree.rb_node;
+ struct rb_node *parent = *node;
+ struct wakelock *wl;
+
+ while (*node) {
+ int diff;
+
+ wl = rb_entry(*node, struct wakelock, node);
+ diff = strncmp(name, wl->name, len);
+ if (diff == 0) {
+ if (wl->name[len])
+ diff = -1;
+ else
+ return wl;
+ }
+ if (diff < 0)
+ node = &(*node)->rb_left;
+ else
+ node = &(*node)->rb_right;
+
+ parent = *node;
+ }
+ if (!add_if_not_found)
+ return ERR_PTR(-EINVAL);
+
+ if (number_of_wakelocks > WL_NUMBER_LIMIT)
+ return ERR_PTR(-ENOSPC);
+
+ /* Not found, we have to add a new one. */
+ wl = kzalloc(sizeof(*wl), GFP_KERNEL);
+ if (!wl)
+ return ERR_PTR(-ENOMEM);
+
+ wl->name = kstrndup(name, len, GFP_KERNEL);
+ if (!wl->name) {
+ kfree(wl);
+ return ERR_PTR(-ENOMEM);
+ }
+ wl->ws.name = wl->name;
+ wakeup_source_add(&wl->ws);
+ rb_link_node(&wl->node, parent, node);
+ rb_insert_color(&wl->node, &wakelocks_tree);
+ list_add(&wl->lru, &wakelocks_lru_list);
+ number_of_wakelocks++;
+ return wl;
+}
+
+int pm_wake_lock(const char *buf)
+{
+ const char *str = buf;
+ struct wakelock *wl;
+ u64 timeout_ns = 0;
+ size_t len;
+ int ret = 0;
+
+ while (*str && !isspace(*str))
+ str++;
+
+ len = str - buf;
+ if (!len)
+ return -EINVAL;
+
+ if (*str && *str != '\n') {
+ /* Find out if there's a valid timeout string appended. */
+ ret = kstrtou64(skip_spaces(str), 10, &timeout_ns);
+ if (ret)
+ return -EINVAL;
+ }
+
+ mutex_lock(&wakelocks_lock);
+
+ wl = wakelock_lookup_add(buf, len, true);
+ if (IS_ERR(wl)) {
+ ret = PTR_ERR(wl);
+ goto out;
+ }
+ if (timeout_ns) {
+ u64 timeout_ms = timeout_ns + NSEC_PER_MSEC - 1;
+
+ do_div(timeout_ms, NSEC_PER_MSEC);
+ __pm_wakeup_event(&wl->ws, timeout_ms);
+ } else {
+ __pm_stay_awake(&wl->ws);
+ }
+
+ list_move(&wl->lru, &wakelocks_lru_list);
+
+ out:
+ mutex_unlock(&wakelocks_lock);
+ return ret;
+}
+
+static void wakelocks_gc(void)
+{
+ struct wakelock *wl, *aux;
+ ktime_t now = ktime_get();
+
+ list_for_each_entry_safe_reverse(wl, aux, &wakelocks_lru_list, lru) {
+ u64 idle_time_ns;
+ bool active;
+
+ spin_lock_irq(&wl->ws.lock);
+ idle_time_ns = ktime_to_ns(ktime_sub(now, wl->ws.last_time));
+ active = wl->ws.active;
+ spin_unlock_irq(&wl->ws.lock);
+
+ if (idle_time_ns < ((u64)WL_GC_TIME_SEC * NSEC_PER_SEC))
+ break;
+
+ if (!active) {
+ wakeup_source_remove(&wl->ws);
+ rb_erase(&wl->node, &wakelocks_tree);
+ list_del(&wl->lru);
+ kfree(wl->name);
+ kfree(wl);
+ number_of_wakelocks--;
+ }
+ }
+ wakelocks_gc_count = 0;
+}
+
+int pm_wake_unlock(const char *buf)
+{
+ struct wakelock *wl;
+ size_t len;
+ int ret = 0;
+
+ len = strlen(buf);
+ if (!len)
+ return -EINVAL;
+
+ if (buf[len-1] == '\n')
+ len--;
+
+ if (!len)
+ return -EINVAL;
+
+ mutex_lock(&wakelocks_lock);
+
+ wl = wakelock_lookup_add(buf, len, false);
+ if (IS_ERR(wl)) {
+ ret = PTR_ERR(wl);
+ goto out;
+ }
+ __pm_relax(&wl->ws);
+ list_move(&wl->lru, &wakelocks_lru_list);
+ if (++wakelocks_gc_count > WL_GC_COUNT_MAX)
+ wakelocks_gc();
+
+ out:
+ mutex_unlock(&wakelocks_lock);
+ return ret;
+}
Index: linux/kernel/power/Makefile
===================================================================
--- linux.orig/kernel/power/Makefile
+++ linux/kernel/power/Makefile
@@ -9,5 +9,6 @@ obj-$(CONFIG_PM_TEST_SUSPEND) += suspend
obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o \
block_io.o
obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o
+obj-$(CONFIG_PM_WAKELOCKS) += wakelock.o

obj-$(CONFIG_MAGIC_SYSRQ) += poweroff.o
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -107,6 +107,7 @@ void wakeup_source_add(struct wakeup_sou
spin_lock_init(&ws->lock);
setup_timer(&ws->timer, pm_wakeup_timer_fn, (unsigned long)ws);
ws->active = false;
+ ws->last_time = ktime_get();

spin_lock_irq(&events_lock);
list_add_rcu(&ws->entry, &wakeup_sources);

Rafael J. Wysocki

Feb 6, 2012, 8:06:56 PM
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern
From: Rafael J. Wysocki <r...@sisk.pl>

Introduce a mechanism by which the kernel can trigger global
transitions to a sleep state chosen by user space if there are no
active wakeup sources.

It consists of a new sysfs attribute, /sys/power/autosleep, that
can be written one of the strings returned by reads from
/sys/power/state, a freezable ordered workqueue and a work item
carrying out the "suspend" operations. If a string representing
the system's sleep state is written to /sys/power/autosleep, the
work item triggering transitions to that state is queued up and
it requeues itself after every execution until user space writes
"off" to /sys/power/autosleep. That work item enables the detection
of wakeup events using the functions already defined in
drivers/base/power/wakeup.c (with one small modification) and calls
either pm_suspend() or hibernate() to put the system into a sleep
state. If a wakeup event is reported while the transition is in
progress, it will abort the transition and the "system suspend" work
item will be queued up again.

---
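
Illustration only, not part of the patch: a single-threaded model of one
pass of the "suspend" work item. struct fake_pm and the fake_*() helpers
are hypothetical stand-ins for the wakeup event counters and for
pm_get_wakeup_count(), pm_save_wakeup_count() and pm_wakeup_pending(). The
-EAGAIN return value is an assumption of the sketch, where the real work
item simply requeues itself.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the wakeup event bookkeeping. */
struct fake_pm {
	unsigned int events;		/* registered wakeup events */
	unsigned int in_progress;	/* events still being processed */
	unsigned int saved;		/* count saved before suspending */
	bool check_enabled;
	unsigned int suspends;		/* completed transitions */
};

/* Non-blocking read: succeeds only if no events are in progress. */
static bool fake_get_wakeup_count(const struct fake_pm *pm, unsigned int *count)
{
	*count = pm->events;
	return pm->in_progress == 0;
}

/* Enable event checking if nothing happened since the count was read. */
static bool fake_save_wakeup_count(struct fake_pm *pm, unsigned int count)
{
	pm->check_enabled = false;
	if (count == pm->events && pm->in_progress == 0) {
		pm->saved = count;
		pm->check_enabled = true;
	}
	return pm->check_enabled;
}

/* True if an event arrived after the count was saved. */
static bool fake_wakeup_pending(struct fake_pm *pm)
{
	bool pending;

	if (!pm->check_enabled)
		return false;
	pending = pm->events != pm->saved || pm->in_progress > 0;
	pm->check_enabled = !pending;
	return pending;
}

/* One pass; event_arrives injects a wakeup event mid-transition. */
int try_to_suspend_once(struct fake_pm *pm, bool event_arrives)
{
	unsigned int count;

	if (!fake_get_wakeup_count(pm, &count))
		return -EAGAIN;		/* events in progress, retry later */
	if (!fake_save_wakeup_count(pm, count))
		return -EAGAIN;

	if (event_arrives)
		pm->events++;
	if (fake_wakeup_pending(pm))
		return -EBUSY;		/* transition aborted */

	pm->suspends++;
	pm->check_enabled = false;
	return 0;
}
```

In the patch itself, the loop around this logic is provided by
queue_up_suspend_work(), which requeues the work item until "off" is
written to /sys/power/autosleep.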
drivers/base/power/wakeup.c | 38 ++++++++------
include/linux/suspend.h | 13 ++++
kernel/power/Kconfig | 8 +++
kernel/power/Makefile | 1
kernel/power/autosleep.c | 115 ++++++++++++++++++++++++++++++++++++++++++++
kernel/power/main.c | 93 +++++++++++++++++++++++++++++------
kernel/power/power.h | 18 ++++++
7 files changed, 254 insertions(+), 32 deletions(-)

Index: linux/kernel/power/Makefile
===================================================================
--- linux.orig/kernel/power/Makefile
+++ linux/kernel/power/Makefile
@@ -8,5 +8,6 @@ obj-$(CONFIG_SUSPEND) += suspend.o
obj-$(CONFIG_PM_TEST_SUSPEND) += suspend_test.o
obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o \
block_io.o
+obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o

obj-$(CONFIG_MAGIC_SYSRQ) += poweroff.o
Index: linux/kernel/power/Kconfig
===================================================================
--- linux.orig/kernel/power/Kconfig
+++ linux/kernel/power/Kconfig
@@ -103,6 +103,14 @@ config PM_SLEEP_SMP
select HOTPLUG
select HOTPLUG_CPU

+config PM_AUTOSLEEP
+ bool "Opportunistic sleep"
+ depends on PM_SLEEP
+ default n
+ ---help---
+ Allow the kernel to trigger a system transition into a global sleep
+ state automatically whenever there are no active wakeup sources.
+
config PM_RUNTIME
bool "Run-time PM core functionality"
depends on !IA64_HP_SIM
Index: linux/kernel/power/power.h
===================================================================
--- linux.orig/kernel/power/power.h
+++ linux/kernel/power/power.h
@@ -269,3 +269,21 @@ static inline void suspend_thaw_processe
{
}
#endif
+
+#ifdef CONFIG_PM_AUTOSLEEP
+
+/* kernel/power/autosleep.c */
+extern int pm_autosleep_init(void);
+extern void pm_autosleep_lock(void);
+extern void pm_autosleep_unlock(void);
+extern suspend_state_t pm_autosleep_state(void);
+extern int pm_autosleep_set_state(suspend_state_t state);
+
+#else /* !CONFIG_PM_AUTOSLEEP */
+
+static inline int pm_autosleep_init(void) { return 0; }
+static inline void pm_autosleep_lock(void) {}
+static inline void pm_autosleep_unlock(void) {}
+static inline suspend_state_t pm_autosleep_state(void) { return PM_SUSPEND_ON; }
+
+#endif /* !CONFIG_PM_AUTOSLEEP */
Index: linux/include/linux/suspend.h
===================================================================
--- linux.orig/include/linux/suspend.h
+++ linux/include/linux/suspend.h
@@ -372,7 +372,7 @@ extern int unregister_pm_notifier(struct
extern bool events_check_enabled;

extern bool pm_wakeup_pending(void);
-extern bool pm_get_wakeup_count(unsigned int *count);
+extern bool pm_get_wakeup_count(unsigned int *count, bool block);
extern bool pm_save_wakeup_count(unsigned int count);

static inline void lock_system_sleep(void)
@@ -423,6 +423,17 @@ static inline void unlock_system_sleep(v

#endif /* !CONFIG_PM_SLEEP */

+#ifdef CONFIG_PM_AUTOSLEEP
+
+/* kernel/power/autosleep.c */
+void queue_up_suspend_work(void);
+
+#else /* !CONFIG_PM_AUTOSLEEP */
+
+static inline void queue_up_suspend_work(void) {}
+
+#endif /* !CONFIG_PM_AUTOSLEEP */
+
#ifdef CONFIG_ARCH_SAVE_PAGE_KEYS
/*
* The ARCH_SAVE_PAGE_KEYS functions can be used by an architecture
Index: linux/kernel/power/autosleep.c
===================================================================
--- /dev/null
+++ linux/kernel/power/autosleep.c
@@ -0,0 +1,115 @@
+/*
+ * kernel/power/autosleep.c
+ *
+ * Opportunistic sleep support.
+ *
+ * Copyright (C) 2012 Rafael J. Wysocki <r...@sisk.pl>
+ */
+
+#include <linux/device.h>
+#include <linux/mutex.h>
+#include <linux/pm_wakeup.h>
+
+#include "power.h"
+
+static struct workqueue_struct *autosleep_wq;
+static struct wakeup_source *autosleep_ws;
+
+static DEFINE_MUTEX(autosleep_lock);
+static DECLARE_COMPLETION(suspend_completion);
+
+static suspend_state_t autosleep_state;
+
+static void try_to_suspend(struct work_struct *work)
+{
+ unsigned int initial_count, final_count;
+
+ if (!pm_get_wakeup_count(&initial_count, true))
+ goto out;
+
+ if (!pm_save_wakeup_count(initial_count))
+ goto out;
+
+ mutex_lock(&autosleep_lock);
+ if (autosleep_state == PM_SUSPEND_ON) {
+ mutex_unlock(&autosleep_lock);
+ return;
+ }
+ INIT_COMPLETION(suspend_completion);
+ if (autosleep_state >= PM_SUSPEND_MAX)
+ hibernate();
+ else
+ pm_suspend(autosleep_state);
+
+ complete_all(&suspend_completion);
+ mutex_unlock(&autosleep_lock);
+
+ if (!pm_get_wakeup_count(&final_count, false))
+ goto out;
+
+ if (final_count == initial_count)
+ schedule_timeout(HZ / 2);
+
+ out:
+ queue_up_suspend_work();
+}
+
+static DECLARE_WORK(suspend_work, try_to_suspend);
+
+void queue_up_suspend_work(void)
+{
+ if (!work_pending(&suspend_work) && autosleep_state > PM_SUSPEND_ON)
+ queue_work(autosleep_wq, &suspend_work);
+}
+
+suspend_state_t pm_autosleep_state(void)
+{
+ return autosleep_state;
+}
+
+int pm_autosleep_set_state(suspend_state_t state)
+{
+#ifndef CONFIG_HIBERNATION
+ if (state >= PM_SUSPEND_MAX)
+ return -EINVAL;
+#endif
+ mutex_lock(&autosleep_lock);
+ __pm_stay_awake(autosleep_ws);
+ if (state == PM_SUSPEND_ON && autosleep_state != PM_SUSPEND_ON) {
+ autosleep_state = PM_SUSPEND_ON;
+ __pm_relax(autosleep_ws);
+ mutex_unlock(&autosleep_lock);
+ wait_for_completion(&suspend_completion);
+ } else if (state > PM_SUSPEND_ON) {
+ autosleep_state = state;
+ __pm_relax(autosleep_ws);
+ queue_up_suspend_work();
+ mutex_unlock(&autosleep_lock);
+ }
+ return 0;
+}
+
+void pm_autosleep_lock(void)
+{
+ mutex_lock(&autosleep_lock);
+}
+
+void pm_autosleep_unlock(void)
+{
+ mutex_unlock(&autosleep_lock);
+}
+
+int __init pm_autosleep_init(void)
+{
+ complete_all(&suspend_completion);
+ autosleep_ws = wakeup_source_register("main");
+ if (!autosleep_ws)
+ return -ENOMEM;
+
+ autosleep_wq = alloc_ordered_workqueue("autosleep", 0);
+ if (autosleep_wq)
+ return 0;
+
+ wakeup_source_unregister(autosleep_ws);
+ return -ENOMEM;
+}
Index: linux/kernel/power/main.c
===================================================================
--- linux.orig/kernel/power/main.c
+++ linux/kernel/power/main.c
@@ -269,8 +269,7 @@ static ssize_t state_show(struct kobject
return (s - buf);
}

-static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
- const char *buf, size_t n)
+static suspend_state_t decode_state(const char *buf, size_t n)
{
#ifdef CONFIG_SUSPEND
suspend_state_t state = PM_SUSPEND_STANDBY;
@@ -278,29 +277,46 @@ static ssize_t state_store(struct kobjec
#endif
char *p;
int len;
- int error = -EINVAL;

p = memchr(buf, '\n', n);
len = p ? p - buf : n;

- /* First, check if we are requested to hibernate */
- if (len == 4 && !strncmp(buf, "disk", len)) {
- error = hibernate();
- goto Exit;
- }
+ /* Check hibernation first. */
+ if (len == 4 && !strncmp(buf, "disk", len))
+ return PM_SUSPEND_MAX;

#ifdef CONFIG_SUSPEND
for (s = &pm_states[state]; state < PM_SUSPEND_MAX; s++, state++) {
if (*s && len == strlen(*s) && !strncmp(buf, *s, len))
break;
}
- if (state < PM_SUSPEND_MAX && *s) {
- error = enter_state(state);
- suspend_stats_update(error);
- }
+ if (state < PM_SUSPEND_MAX && *s)
+ return state;
#endif

- Exit:
+ return PM_SUSPEND_ON;
+}
+
+static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ suspend_state_t state;
+ int error = -EINVAL;
+
+ pm_autosleep_lock();
+ if (pm_autosleep_state() > PM_SUSPEND_ON) {
+ error = -EBUSY;
+ goto out;
+ }
+
+ state = decode_state(buf, n);
+ if (state < PM_SUSPEND_MAX)
+ error = pm_suspend(state);
+ else if (state > PM_SUSPEND_ON)
+ error = hibernate();
+
+ out:
+ pm_autosleep_unlock();
return error ? error : n;
}

@@ -341,7 +357,8 @@ static ssize_t wakeup_count_show(struct
{
unsigned int val;

- return pm_get_wakeup_count(&val) ? sprintf(buf, "%u\n", val) : -EINTR;
+ return pm_get_wakeup_count(&val, true) ?
+ sprintf(buf, "%u\n", val) : -EINTR;
}

static ssize_t wakeup_count_store(struct kobject *kobj,
@@ -358,6 +375,46 @@ static ssize_t wakeup_count_store(struct
}

power_attr(wakeup_count);
+
+#ifdef CONFIG_PM_AUTOSLEEP
+static ssize_t autosleep_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ suspend_state_t state = pm_autosleep_state();
+
+ if (state == PM_SUSPEND_ON)
+ return sprintf(buf, "off\n");
+
+#ifdef CONFIG_SUSPEND
+ if (state < PM_SUSPEND_MAX)
+ return sprintf(buf, "%s\n", valid_state(state) ?
+ pm_states[state] : "error");
+#endif
+#ifdef CONFIG_HIBERNATION
+ return sprintf(buf, "disk\n");
+#else
+ return sprintf(buf, "error\n");
+#endif
+}
+
+static ssize_t autosleep_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ suspend_state_t state = decode_state(buf, n);
+ int error;
+
+ if (state == PM_SUSPEND_ON && strncmp(buf, "off", 3)
+ && strncmp(buf, "off\n", 4))
+ return -EINVAL;
+
+ error = pm_autosleep_set_state(state);
+ return error ? error : n;
+}
+
+power_attr(autosleep);
+#endif /* CONFIG_PM_AUTOSLEEP */
#endif /* CONFIG_PM_SLEEP */

#ifdef CONFIG_PM_TRACE
@@ -411,6 +468,9 @@ static struct attribute * g[] = {
#ifdef CONFIG_PM_SLEEP
&pm_async_attr.attr,
&wakeup_count_attr.attr,
+#ifdef CONFIG_PM_AUTOSLEEP
+ &autosleep_attr.attr,
+#endif
#ifdef CONFIG_PM_DEBUG
&pm_test_attr.attr,
#endif
@@ -446,7 +506,10 @@ static int __init pm_init(void)
power_kobj = kobject_create_and_add("power", NULL);
if (!power_kobj)
return -ENOMEM;
- return sysfs_create_group(power_kobj, &attr_group);
+ error = sysfs_create_group(power_kobj, &attr_group);
+ if (error)
+ return error;
+ return pm_autosleep_init();
}

core_initcall(pm_init);
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -458,8 +458,10 @@ static void wakeup_source_deactivate(str
atomic_add(MAX_IN_PROGRESS, &combined_event_count);

split_counters(&cnt, &inpr);
- if (!inpr)
+ if (!inpr) {
wake_up_all(&wakeup_count_wait_queue);
+ queue_up_suspend_work();
+ }
}

/**
@@ -610,29 +612,33 @@ bool pm_wakeup_pending(void)
/**
* pm_get_wakeup_count - Read the number of registered wakeup events.
* @count: Address to store the value at.
+ * @block: Whether or not to block.
*
- * Store the number of registered wakeup events at the address in @count. Block
- * if the current number of wakeup events being processed is nonzero.
+ * Store the number of registered wakeup events at the address in @count. If
+ * @block is set, block until the current number of wakeup events being
+ * processed is zero.
*
- * Return 'false' if the wait for the number of wakeup events being processed to
- * drop down to zero has been interrupted by a signal (and the current number
- * of wakeup events being processed is still nonzero). Otherwise return 'true'.
+ * Return 'false' if the current number of wakeup events being processed is
+ * nonzero. Otherwise return 'true'.
*/
-bool pm_get_wakeup_count(unsigned int *count)
+bool pm_get_wakeup_count(unsigned int *count, bool block)
{
unsigned int cnt, inpr;
- DEFINE_WAIT(wait);

- for (;;) {
- prepare_to_wait(&wakeup_count_wait_queue, &wait,
- TASK_INTERRUPTIBLE);
- split_counters(&cnt, &inpr);
- if (inpr == 0 || signal_pending(current))
- break;
+ if (block) {
+ DEFINE_WAIT(wait);

- schedule();
+ for (;;) {
+ prepare_to_wait(&wakeup_count_wait_queue, &wait,
+ TASK_INTERRUPTIBLE);
+ split_counters(&cnt, &inpr);
+ if (inpr == 0 || signal_pending(current))
+ break;
+
+ schedule();
+ }
+ finish_wait(&wakeup_count_wait_queue, &wait);
}
- finish_wait(&wakeup_count_wait_queue, &wait);

split_counters(&cnt, &inpr);
*count = cnt;

Rafael J. Wysocki

Feb 6, 2012, 8:07:27 PM2/6/12
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern
From: Rafael J. Wysocki <r...@sisk.pl>

Android uses one wakelock statistic that is only necessary for
opportunistic sleep. Namely, the prevent_suspend_time field
accumulates the total time the given wakelock has been locked
while "automatic suspend" was enabled. Add an analogous field,
prevent_sleep_time, to wakeup sources and make it behave in a similar
way.
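
The accounting rule can be illustrated with a small user-space model. This is
only a sketch: struct ws_model and the ws_*() helpers are hypothetical names
that mirror the start_prevent_time/prevent_sleep_time fields added below, and
plain millisecond integers stand in for ktime_t. Time is charged only while
the source is active and autosleep is enabled at the same time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space model of the fields this patch adds to struct wakeup_source. */
struct ws_model {
	bool active;
	bool autosleep_enabled;
	int64_t start_prevent_ms;	/* start of the current charged period */
	int64_t prevent_sleep_ms;	/* accumulated total */
};

/* Close the current charged period and add it to the total. */
static void ws_close_period(struct ws_model *ws, int64_t now_ms)
{
	assert(now_ms >= ws->start_prevent_ms);
	ws->prevent_sleep_ms += now_ms - ws->start_prevent_ms;
}

static void ws_activate(struct ws_model *ws, int64_t now_ms)
{
	ws->active = true;
	if (ws->autosleep_enabled)
		ws->start_prevent_ms = now_ms;
}

static void ws_deactivate(struct ws_model *ws, int64_t now_ms)
{
	ws->active = false;
	if (ws->autosleep_enabled)
		ws_close_period(ws, now_ms);
}

/* Models what pm_wakep_autosleep_enabled() does for one source. */
static void ws_set_autosleep(struct ws_model *ws, bool set, int64_t now_ms)
{
	if (ws->autosleep_enabled == set)
		return;

	ws->autosleep_enabled = set;
	if (!ws->active)
		return;

	if (set)
		ws->start_prevent_ms = now_ms;	/* start charging now */
	else
		ws_close_period(ws, now_ms);	/* stop charging now */
}
```

For example, a source activated at t=100 with autosleep off, followed by
autosleep being enabled at t=150 and the source being deactivated at t=400,
is charged 250 ms rather than 300 ms.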

---
drivers/base/power/wakeup.c | 61 +++++++++++++++++++++++++++++++++++++++++---
include/linux/pm_wakeup.h | 4 ++
include/linux/suspend.h | 1
kernel/power/autosleep.c | 2 +
4 files changed, 64 insertions(+), 4 deletions(-)

Index: linux/include/linux/pm_wakeup.h
===================================================================
--- linux.orig/include/linux/pm_wakeup.h
+++ linux/include/linux/pm_wakeup.h
@@ -34,6 +34,7 @@
* @total_time: Total time this wakeup source has been active.
* @max_time: Maximum time this wakeup source has been continuously active.
* @last_time: Monotonic clock when the wakeup source's was touched last time.
+ * @prevent_sleep_time: Total time this source has been preventing autosleep.
* @event_count: Number of signaled wakeup events.
* @active_count: Number of times the wakeup sorce was activated.
* @relax_count: Number of times the wakeup sorce was deactivated.
@@ -51,6 +52,8 @@ struct wakeup_source {
ktime_t total_time;
ktime_t max_time;
ktime_t last_time;
+ ktime_t start_prevent_time;
+ ktime_t prevent_sleep_time;
unsigned long event_count;
unsigned long active_count;
unsigned long relax_count;
@@ -58,6 +61,7 @@ struct wakeup_source {
unsigned long wakeup_count;
bool active:1;
bool has_timeout:1;
+ bool autosleep_enabled:1;
};

#ifdef CONFIG_PM_SLEEP
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -351,6 +351,8 @@ static void wakeup_source_activate(struc
ws->active_count++;
ws->timer_expires = jiffies;
ws->last_time = ktime_get();
+ if (ws->autosleep_enabled)
+ ws->start_prevent_time = ws->last_time;

/* Increment the counter of events in progress. */
atomic_inc(&combined_event_count);
@@ -407,6 +409,17 @@ void pm_stay_awake(struct device *dev)
}
EXPORT_SYMBOL_GPL(pm_stay_awake);

+#ifdef CONFIG_PM_AUTOSLEEP
+static void update_prevent_sleep_time(struct wakeup_source *ws, ktime_t now)
+{
+ ktime_t delta = ktime_sub(now, ws->start_prevent_time);
+ ws->prevent_sleep_time = ktime_add(ws->prevent_sleep_time, delta);
+}
+#else
+static inline void update_prevent_sleep_time(struct wakeup_source *ws,
+ ktime_t now) {}
+#endif
+
/**
* wakup_source_deactivate - Mark given wakeup source as inactive.
* @ws: Wakeup source to handle.
@@ -451,6 +464,9 @@ static void wakeup_source_deactivate(str
ws->has_timeout = false;
del_timer(&ws->timer);

+ if (ws->autosleep_enabled)
+ update_prevent_sleep_time(ws, now);
+
/*
* Increment the counter of registered wakeup events and decrement the
* couter of wakeup events in progress simultaneously.
@@ -670,6 +686,34 @@ bool pm_save_wakeup_count(unsigned int c
return events_check_enabled;
}

+#ifdef CONFIG_PM_AUTOSLEEP
+/**
+ * pm_wakep_autosleep_enabled - Modify autosleep_enabled for all wakeup sources.
+ * @set: Whether to set or to clear the autosleep_enabled flags.
+ */
+void pm_wakep_autosleep_enabled(bool set)
+{
+ struct wakeup_source *ws;
+ ktime_t now = ktime_get();
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(ws, &wakeup_sources, entry) {
+ spin_lock_irq(&ws->lock);
+ if (ws->autosleep_enabled != set) {
+ ws->autosleep_enabled = set;
+ if (ws->active) {
+ if (set)
+ ws->start_prevent_time = now;
+ else
+ update_prevent_sleep_time(ws, now);
+ }
+ }
+ spin_unlock_irq(&ws->lock);
+ }
+ rcu_read_unlock();
+}
+#endif /* CONFIG_PM_AUTOSLEEP */
+
static struct dentry *wakeup_sources_stats_dentry;

/**
@@ -685,28 +729,37 @@ static int print_wakeup_source_stats(str
ktime_t max_time;
unsigned long active_count;
ktime_t active_time;
+ ktime_t prevent_sleep_time;
int ret;

spin_lock_irqsave(&ws->lock, flags);

total_time = ws->total_time;
max_time = ws->max_time;
+ prevent_sleep_time = ws->prevent_sleep_time;
active_count = ws->active_count;
if (ws->active) {
- active_time = ktime_sub(ktime_get(), ws->last_time);
+ ktime_t now = ktime_get();
+
+ active_time = ktime_sub(now, ws->last_time);
total_time = ktime_add(total_time, active_time);
if (active_time.tv64 > max_time.tv64)
max_time = active_time;
+
+ if (ws->autosleep_enabled)
+ prevent_sleep_time = ktime_add(prevent_sleep_time,
+ ktime_sub(now, ws->start_prevent_time));
} else {
active_time = ktime_set(0, 0);
}

ret = seq_printf(m, "%-12s\t%lu\t\t%lu\t\t%lu\t\t%lu\t\t"
- "%lld\t\t%lld\t\t%lld\t\t%lld\n",
+ "%lld\t\t%lld\t\t%lld\t\t%lld\t\t%lld\n",
ws->name, active_count, ws->event_count,
ws->wakeup_count, ws->expire_count,
ktime_to_ms(active_time), ktime_to_ms(total_time),
- ktime_to_ms(max_time), ktime_to_ms(ws->last_time));
+ ktime_to_ms(max_time), ktime_to_ms(ws->last_time),
+ ktime_to_ms(prevent_sleep_time));

spin_unlock_irqrestore(&ws->lock, flags);

@@ -723,7 +776,7 @@ static int wakeup_sources_stats_show(str

seq_puts(m, "name\t\tactive_count\tevent_count\twakeup_count\t"
"expire_count\tactive_since\ttotal_time\tmax_time\t"
- "last_change\n");
+ "last_change\tprevent_suspend_time\n");

rcu_read_lock();
list_for_each_entry_rcu(ws, &wakeup_sources, entry)
Index: linux/include/linux/suspend.h
===================================================================
--- linux.orig/include/linux/suspend.h
+++ linux/include/linux/suspend.h
@@ -374,6 +374,7 @@ extern bool events_check_enabled;
extern bool pm_wakeup_pending(void);
extern bool pm_get_wakeup_count(unsigned int *count, bool block);
extern bool pm_save_wakeup_count(unsigned int count);
+extern void pm_wakep_autosleep_enabled(bool set);

static inline void lock_system_sleep(void)
{
Index: linux/kernel/power/autosleep.c
===================================================================
--- linux.orig/kernel/power/autosleep.c
+++ linux/kernel/power/autosleep.c
@@ -78,11 +78,13 @@ int pm_autosleep_set_state(suspend_state
if (state == PM_SUSPEND_ON && autosleep_state != PM_SUSPEND_ON) {
autosleep_state = PM_SUSPEND_ON;
__pm_relax(autosleep_ws);
+ pm_wakep_autosleep_enabled(false);
mutex_unlock(&autosleep_lock);
wait_for_completion(&suspend_completion);
} else if (state > PM_SUSPEND_ON) {
autosleep_state = state;
__pm_relax(autosleep_ws);
+ pm_wakep_autosleep_enabled(true);
queue_up_suspend_work();
mutex_unlock(&autosleep_lock);

Rafael J. Wysocki

Feb 6, 2012, 8:07:55 PM2/6/12
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern
From: Rafael J. Wysocki <r...@sisk.pl>

Initialize wakeup source locks in wakeup_source_add() instead of
wakeup_source_create(), because otherwise the locks of the wakeup
sources that haven't been allocated with wakeup_source_create()
aren't initialized and handled properly.

Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
drivers/base/power/wakeup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -64,7 +64,6 @@ struct wakeup_source *wakeup_source_crea
if (!ws)
return NULL;

- spin_lock_init(&ws->lock);
if (name)
ws->name = kstrdup(name, GFP_KERNEL);

@@ -105,6 +104,7 @@ void wakeup_source_add(struct wakeup_sou
if (WARN_ON(!ws))
return;

+ spin_lock_init(&ws->lock);
setup_timer(&ws->timer, pm_wakeup_timer_fn, (unsigned long)ws);
ws->active = false;


Rafael J. Wysocki

Feb 6, 2012, 8:08:05 PM2/6/12
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern
From: Rafael J. Wysocki <r...@sisk.pl>

Use the observation that it is more efficient to check the wakeup
variable once before the loop reporting tasks that were not
frozen in try_to_freeze_tasks() than to do that in every step of that
loop.
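
The transformation can be shown in isolation with a user-space sketch. The
names task_model, count_reported_old() and count_reported_new() are
hypothetical, and a counter stands in for sched_show_task(). Both shapes give
the same result, but the second one tests the loop-invariant flag once and
skips the whole walk when it is set (in the patch itself that also means
tasklist_lock is not taken in that case).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the per-task state the freezer inspects. */
struct task_model {
	bool freezing;
	bool frozen;
};

/* Old shape: the loop-invariant flag is re-tested for every task. */
static size_t count_reported_old(const struct task_model *t, size_t n,
				 bool wakeup)
{
	size_t reported = 0;

	assert(t || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!wakeup && t[i].freezing && !t[i].frozen)
			reported++;
	}
	return reported;
}

/* New shape: test the flag once and skip the walk entirely if it is set. */
static size_t count_reported_new(const struct task_model *t, size_t n,
				 bool wakeup)
{
	size_t reported = 0;

	assert(t || n == 0);
	if (wakeup)
		return 0;

	for (size_t i = 0; i < n; i++) {
		if (t[i].freezing && !t[i].frozen)
			reported++;
	}
	return reported;
}
```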

Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
kernel/power/process.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)

Index: linux/kernel/power/process.c
===================================================================
--- linux.orig/kernel/power/process.c
+++ linux/kernel/power/process.c
@@ -98,13 +98,15 @@ static int try_to_freeze_tasks(bool user
elapsed_csecs / 100, elapsed_csecs % 100,
todo - wq_busy, wq_busy);

- read_lock(&tasklist_lock);
- do_each_thread(g, p) {
- if (!wakeup && !freezer_should_skip(p) &&
- p != current && freezing(p) && !frozen(p))
- sched_show_task(p);
- } while_each_thread(g, p);
- read_unlock(&tasklist_lock);
+ if (!wakeup) {
+ read_lock(&tasklist_lock);
+ do_each_thread(g, p) {
+ if (p != current && !freezer_should_skip(p)
+ && freezing(p) && !frozen(p))
+ sched_show_task(p);
+ } while_each_thread(g, p);
+ read_unlock(&tasklist_lock);
+ }
} else {
printk("(elapsed %d.%02d seconds) ", elapsed_csecs / 100,
elapsed_csecs % 100);

Rafael J. Wysocki

Feb 6, 2012, 8:08:42 PM2/6/12
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern
Hi all,

This series tests the theory that the easiest way to sell a once rejected
feature is to advertise it under a different name.

Well, there actually are two different features, although they are closely
related to each other. First, patch [6/8] introduces a feature that allows
the kernel to trigger system suspend (or more generally a transition into
a sleep state) whenever there are no active wakeup sources (no, they aren't
called wakelocks). It is called "autosleep" here, but it was called a few
different names in the past ("opportunistic suspend" was probably the most
popular one). Second, patch [8/8] introduces "wake locks" that are,
essentially, wakeup sources which may be created and manipulated by user
space. Using them user space may control the autosleep feature introduced
earlier.

This also is a kind of a proof of concept for the people who wanted me to
show a kernel-based implementation of automatic suspend, so there you go.
Please note, however, that it is done so that the user space "wake locks"
interface is compatible with Android in support of its user space. I don't
really like this interface, but since the Android's user space seems to rely
on it, I'm fine with using it as is. YMMV.

Let me say a few words about every patch in the series individually.

[1/8] - This really is a bug fix, so it's v3.4 material. Nobody has stepped
on this bug so far, but it should be fixed anyway.

[2/8] - This is a freezer cleanup, worth doing anyway IMO, so v3.4 material too.

[3/8] - This is something we can do no problem, although completely optional
without the autosleep feature. Rather necessary with it, though.

[4/8] - This kind of reintroduces my original idea of using a wait queue for
waiting until there are no wakeup events in progress. Alan convinced me that
it would be better to poll the counter to prevent wakeup_source_deactivate()
from having to call wake_up_all() occasionally (that may be costly in fast
paths), but then quite a few people told me that the wait queue might be
better. I think that the polling will make much less sense with autosleep
and user space "wake locks". Anyway, [4/8] is something we can do without
those things too.

The patches above were given Signed-off-by tags, because I think they make some
sense regardless of the features introduced by the remaining patches, which in
turn are total RFC.

[5/8] - This changes wakeup source statistics so that they are more similar to
the statistics collected for wakelocks on Android. The file those statistics
may be read from is still located in debugfs, though (I don't think it
belongs to proc and its name is different from the analogous Android's file
name anyway). It could be done without autosleep, but then it would be a bit
pointless. BTW, this changes interfaces that _in_ _theory_ may be used by
someone, but I'm not aware of anyone using them. If you are one, I'll be
pleased to learn about that, so please tell me who you are. :-)

[6/8] - Autosleep implementation. I think the changelog explains the idea
quite well and the code is really nothing special. It doesn't really add
anything new to the kernel in terms of infrastructure etc., it just uses
the existing stuff to implement an alternative method of triggering system
sleep transitions. Note, though, that the interface here is different
from the Android's one, because Android actually modifies /sys/power/state
to trigger something called "early suspend" (that is never going to be
implemented in the "stock" kernel as long as I have any influence on it) and
we simply can't do that in the mainline.

[7/8] - This adds a wakeup source statistic that only makes sense with
autosleep and (I believe) is analogous to Android's prevent_suspend_time
statistic. Nothing really special, but I didn't want
wakeup_source_activate/deactivate() to take a common lock to avoid
congestion.

[8/8] - This adds a user space interface to create, activate and deactivate
wakeup sources. Since the files it consists of are called wake_lock and
wake_unlock, to follow Android, the objects the wakeup sources are wrapped
into are called "wakelocks" (for added confusion). Since the interface
doesn't provide any means to destroy those "wakelocks", I added a garbage
collection mechanism to get rid of the unused ones, if any. I also thought
it might be a good idea to put a limit on the number of those things that
user space can operate simultaneously, so I did that too.

All in all, it's not as much code as I thought it would be and it seems to be
relatively simple (which raises the question why the Android people didn't
even _try_ to do something like this instead of slapping the "real" wakelocks
onto the kernel FWIW). IMHO it doesn't add anything really new to the kernel,
except for the user space interfaces that should be maintainable. At least I
think I should be able to maintain them. :-)

All of the above has been tested very briefly on my test-bed Mackerel board
and it quite obviously requires more thorough testing, but first I need to know
if it makes sense to spend any more time on it.

IOW, I need to know your opinions!

Thanks,
Rafael

Rafael J. Wysocki

Feb 6, 2012, 8:10:02 PM2/6/12
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern
Ouch. Sorry for breaking Greg's address. Please replace it with the
correct one when you reply.

John Stultz

Feb 7, 2012, 5:30:43 PM2/7/12
to Rafael J. Wysocki, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, Brian Swetland, Neil Brown, Alan Stern
On Tue, 2012-02-07 at 02:01 +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <r...@sisk.pl>
>
> Initialize wakeup source locks in wakeup_source_add() instead of
> wakeup_source_create(), because otherwise the locks of the wakeup
> sources that haven't been allocated with wakeup_source_create()
> aren't initialized and handled properly.
>
> Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>

Ah, I've shot myself in the foot before, forgetting to init the wakeup
source, so this should be good. Although, would a WARN_ON be better than
just initializing the lock in add? That way bad behavior is more likely
to be corrected, rather than just ignored.

thanks
-john

Rafael J. Wysocki

Feb 7, 2012, 5:37:58 PM2/7/12
to John Stultz, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, Brian Swetland, Neil Brown, Alan Stern
On Tuesday, February 07, 2012, John Stultz wrote:
> On Tue, 2012-02-07 at 02:01 +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <r...@sisk.pl>
> >
> > Initialize wakeup source locks in wakeup_source_add() instead of
> > wakeup_source_create(), because otherwise the locks of the wakeup
> > sources that haven't been allocated with wakeup_source_create()
> > aren't initialized and handled properly.
> >
> > Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
>
> Ah, I've shot myself in the foot before, forgetting to init the wakeup
> source, so this should be good. Although, would a WARN_ON be better than
> just initializing the lock in add? That way bad behavior is more likely
> to be corrected, rather than just ignored.

Well, that's not bad behavior, since users are not supposed to open code
wakeup source initialization. _add() is supposed to do the job (that's
why I regard this one as a fix).

Thanks,
Rafael

Rafael J. Wysocki

Feb 7, 2012, 5:45:58 PM2/7/12
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern
On Tuesday, February 07, 2012, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <r...@sisk.pl>
>
> Introduce a mechanism by which the kernel can trigger global
> transitions to a sleep state chosen by user space if there are no
> active wakeup sources.
>
> It consists of a new sysfs attribute, /sys/power/autosleep, that
> can be written one of the strings returned by reads from
> /sys/power/state, a freezable ordered workqueue and a work item
> carrying out the "suspend" operations. If a string representing
> the system's sleep state is written to /sys/power/autosleep, the
> work item triggering transitions to that state is queued up and
> it requeues itself after every execution until user space writes
> "off" to /sys/power/autosleep. That work item enables the detection
> of wakeup events using the functions already defined in
> drivers/base/power/wakeup.c (with one small modification) and calls
> either pm_suspend(), or hibernate() to put the system into a sleep
> state. If a wakeup event is reported while the transition is in
> progress, it will abort the transition and the "system suspend" work
> item will be queued up again.

OK, so before somebody points that out to me, the completion was redundant
(it was a leftover from one of the previous versions of the patch, sorry
about that).

Moreover, try_to_suspend() is racy with respect to wakeup_count_store()
(in theory, an automatic suspend without checking wakeup sources may happen
if the latter is used carelessly when autosleep is enabled).

Thus below is an updated patch (it requires [8/8] to be updated too because
of the changes in pm_autosleep_set_state(), but that's rather trivial).

Thanks,
Rafael

---
From: Rafael J. Wysocki <r...@sisk.pl>
Subject: PM / Sleep: Implement opportunistic sleep

Introduce a mechanism by which the kernel can trigger global
transitions to a sleep state chosen by user space if there are no
active wakeup sources.

It consists of a new sysfs attribute, /sys/power/autosleep, that
can be written one of the strings returned by reads from
/sys/power/state, an ordered workqueue and a work item carrying out
the "suspend" operations. If a string representing the system's
sleep state is written to /sys/power/autosleep, the work item
triggering transitions to that state is queued up and it requeues
itself after every execution until user space writes "off" to
/sys/power/autosleep.
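
From user space, driving this attribute could look roughly like the sketch
below. write_attr() is a hypothetical helper, not part of the patch, and "mem"
is only an example of a string that /sys/power/state may list on a given
system.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Attribute introduced by this patch. */
#define AUTOSLEEP_ATTR "/sys/power/autosleep"

/*
 * Write a short string to a sysfs-style attribute file.
 * Returns 0 on success or a negative errno value.
 */
static int write_attr(const char *path, const char *value)
{
	size_t len;
	ssize_t written;
	int fd, err = 0;

	assert(path && value);
	len = strlen(value);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	written = write(fd, value, len);
	if (written < 0)
		err = -errno;
	else if ((size_t)written != len)
		err = -EIO;	/* short write: treat as an I/O error */

	close(fd);
	return err;
}
```

Autosleep would then be switched on with write_attr(AUTOSLEEP_ATTR, "mem") and
off again with write_attr(AUTOSLEEP_ATTR, "off"). With the main.c changes in
this patch, a write to /sys/power/state while autosleep is enabled is expected
to fail with -EBUSY.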

That work item enables the detection of wakeup events using the
functions already defined in drivers/base/power/wakeup.c (with one
small modification) and calls either pm_suspend(), or hibernate() to
put the system into a sleep state. If a wakeup event is reported
while the transition is in progress, it will abort the transition and
the "system suspend" work item will be queued up again.
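
The check the work item relies on can be modelled in user space as follows.
This is a simplified, single-threaded sketch: struct wakeup_model and the
model_*() functions are hypothetical stand-ins for pm_get_wakeup_count(),
pm_save_wakeup_count() and pm_wakeup_pending(), there is no locking, and
model_get_count() never blocks, unlike pm_get_wakeup_count() called with
@block set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the wakeup event counters. */
struct wakeup_model {
	unsigned int registered;	/* events fully processed so far */
	unsigned int in_progress;	/* events still being processed */
	unsigned int saved;		/* count saved before suspending */
	bool check_enabled;		/* is "saved" meaningful? */
};

/* Report the registered count; false if events are still in progress. */
static bool model_get_count(const struct wakeup_model *m, unsigned int *count)
{
	assert(m != NULL && count != NULL);
	*count = m->registered;
	return m->in_progress == 0;
}

/* Save the count only if nothing has happened since it was read. */
static bool model_save_count(struct wakeup_model *m, unsigned int count)
{
	m->check_enabled = false;
	if (m->registered == count && m->in_progress == 0) {
		m->saved = count;
		m->check_enabled = true;
	}
	return m->check_enabled;
}

/* True if an event arrived after the count was saved. */
static bool model_wakeup_pending(const struct wakeup_model *m)
{
	return m->check_enabled &&
	       (m->registered != m->saved || m->in_progress > 0);
}

/* One pass of the work item: true if the transition would go ahead. */
static bool model_try_to_suspend(struct wakeup_model *m)
{
	unsigned int count;

	if (!model_get_count(m, &count))
		return false;	/* events in progress: try again later */
	if (!model_save_count(m, count))
		return false;	/* count changed under us: try again later */

	return !model_wakeup_pending(m);
}
```

In this model a pass that returns false corresponds to the work item simply
being queued up again without suspending.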

---
drivers/base/power/wakeup.c | 38 ++++++++------
include/linux/suspend.h | 13 ++++-
kernel/power/Kconfig | 8 +++
kernel/power/Makefile | 1
kernel/power/autosleep.c | 112 ++++++++++++++++++++++++++++++++++++++++++++
kernel/power/main.c | 105 ++++++++++++++++++++++++++++++++++-------
kernel/power/power.h | 18 +++++++
7 files changed, 262 insertions(+), 33 deletions(-)
@@ -0,0 +1,112 @@
+/*
+ * kernel/power/autosleep.c
+ *
+ * Opportunistic sleep support.
+ *
+ * Copyright (C) 2012 Rafael J. Wysocki <r...@sisk.pl>
+ */
+
+#include <linux/device.h>
+#include <linux/mutex.h>
+#include <linux/pm_wakeup.h>
+
+#include "power.h"
+
+static struct workqueue_struct *autosleep_wq;
+static struct wakeup_source *autosleep_ws;
+
+static DEFINE_MUTEX(autosleep_lock);
+
+static suspend_state_t autosleep_state;
+
+static void try_to_suspend(struct work_struct *work)
+{
+ unsigned int initial_count, final_count;
+
+ if (!pm_get_wakeup_count(&initial_count, true))
+ goto out;
+
+ mutex_lock(&autosleep_lock);
+
+ if (!pm_save_wakeup_count(initial_count)) {
+ mutex_unlock(&autosleep_lock);
+ goto out;
+ }
+
+ if (autosleep_state == PM_SUSPEND_ON) {
+ mutex_unlock(&autosleep_lock);
+ return;
+ }
+ if (autosleep_state >= PM_SUSPEND_MAX)
+ hibernate();
+ else
+ pm_suspend(autosleep_state);
+
+ } else if (state > PM_SUSPEND_ON) {
+ autosleep_state = state;
+ __pm_relax(autosleep_ws);
+ queue_up_suspend_work();
+ }
+ mutex_unlock(&autosleep_lock);
+ return 0;
+}
+
+void pm_autosleep_lock(void)
+{
+ mutex_lock(&autosleep_lock);
+}
+
+void pm_autosleep_unlock(void)
+{
+ mutex_unlock(&autosleep_lock);
+}
+
+int __init pm_autosleep_init(void)
+{
@@ -349,15 +366,65 @@ static ssize_t wakeup_count_store(struct
const char *buf, size_t n)
{
unsigned int val;
+ int error = -EINVAL;
+
+ pm_autosleep_lock();
+ if (pm_autosleep_state() > PM_SUSPEND_ON) {
+ error = -EBUSY;
+ goto out;
+ }

if (sscanf(buf, "%u", &val) == 1) {
if (pm_save_wakeup_count(val))
- return n;
+ error = n;
}
- return -EINVAL;
+
+ out:
+ pm_autosleep_unlock();
+ return error;
@@ -411,6 +478,9 @@ static struct attribute * g[] = {
#ifdef CONFIG_PM_SLEEP
&pm_async_attr.attr,
&wakeup_count_attr.attr,
+#ifdef CONFIG_PM_AUTOSLEEP
+ &autosleep_attr.attr,
+#endif
#ifdef CONFIG_PM_DEBUG
&pm_test_attr.attr,
#endif
@@ -446,7 +516,10 @@ static int __init pm_init(void)

Rafael J. Wysocki

Feb 8, 2012, 7:02:10 PM2/8/12
to NeilBrown, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Alan Stern
On Thursday, February 09, 2012, NeilBrown wrote:
> Would it be worth making this:
>
> if (!inpr && waitqueue_active(&wakeup_count_wait_queue))
> wake_up_all(&wakeup_count_wait_queue);
>
> ??
> It would often save a spinlock.

Yes, good point. :-)

> Also was there a reason you used wake_up_all()? That is only really needed
> where EXCLUSIVE waits are happening, and there aren't any of those.

Right, I think wake_up() should be fine too.

Thanks,
Rafael

Rafael J. Wysocki

Feb 9, 2012, 7:40:28 PM2/9/12
to NeilBrown, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Alan Stern
Hi,

On Thursday, February 09, 2012, NeilBrown wrote:
> On Tue, 7 Feb 2012 02:00:55 +0100 "Rafael J. Wysocki" <r...@sisk.pl> wrote:
>
>
> > All in all, it's not as much code as I thought it would be and it seems to be
> > relatively simple (which raises the question why the Android people didn't
> > even _try_ to do something like this instead of slapping the "real" wakelocks
> > onto the kernel FWIW). IMHO it doesn't add anything really new to the kernel,
> > except for the user space interfaces that should be maintainable. At least I
> > think I should be able to maintain them. :-)
> >
> > All of the above has been tested very briefly on my test-bed Mackerel board
> > and it quite obviously requires more thorough testing, but first I need to know
> > if it makes sense to spend any more time on it.
> >
> > IOW, I need to know your opinions!
>
> I've got opinions!!!

Good! :-)

It seems that no one else has.

> I'll try to avoid the obvious bike-shedding about interface design...
>
> The key point I want to make is that doing this in the kernel has one very
> important difference to doing it in userspace (which, as you know, I prefer)
> which may not be obvious to everyone at first sight. So I will try to make it
> apparent.
>
> In the user-space solution that we have previously discussed, it is only
> necessary for the kernel to hold a wakeup_source active until the event is
> *visible* to user-space. So a low level driver can queue e.g. an input event
> and then deactivate their wakeup_source. The event can remain in the input
> queue without any wakeup_source being active and there is no risk of going to
> sleep inappropriately.
> This is because - in the user-space approach - user-space must effectively
> poll every source of interesting wakeup events between the last wakeup_source
> being deactivated and the next attempt to suspend. This poll will notice the
> event sitting in a queue so that a well-written user-space will not go to
> sleep but will read the event.
> (Note that this 'poll-of-every-device' need not be expensive. It can be a
> single 'poll' or 'select' or even 'read' on a pollfd).
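
For reference, the check described in the quoted paragraph could be sketched
like this in a user-space power manager. handle_pending_events() and
MAX_SOURCES are hypothetical names, and the descriptors are assumed to be
whatever event sources the manager cares about (input devices, sockets and so
on), opened elsewhere.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_SOURCES 16

/*
 * Check every event source once without blocking and consume whatever is
 * readable.  Returns the number of descriptors that had data, or -1 if
 * poll() failed.  A caller would go ahead with suspend only on 0.
 */
static int handle_pending_events(const int *fds, size_t n)
{
	struct pollfd pfds[MAX_SOURCES];
	char buf[256];
	int handled = 0;

	assert(fds != NULL && n <= MAX_SOURCES);

	for (size_t i = 0; i < n; i++) {
		pfds[i].fd = fds[i];
		pfds[i].events = POLLIN;
		pfds[i].revents = 0;
	}

	/* Zero timeout: report what is queued right now and return. */
	if (poll(pfds, (nfds_t)n, 0) < 0)
		return -1;

	for (size_t i = 0; i < n; i++) {
		if ((pfds[i].revents & POLLIN) &&
		    read(pfds[i].fd, buf, sizeof(buf)) > 0)
			handled++;
	}
	return handled;
}
```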

So I see one little problem with that, which is that you'd need to teach user
space developers what to do and how to do that correctly.

Also, when you say "user space", it isn't exactly clear whether you mean a
power manager (that would carry out the attempts to suspend) or applications
(that would need to communicate with the power manager to let it know what
they are doing). This is important, because in general, before deactivating
a wakeup source the kernel subsystem should know that the associated event
has become visible not only to the "polling" application, but also (perhaps
indirectly) to the power manager, so that it doesn't trigger suspend too
early.

> In the kernel based approach that you have presented this is not the case.
> As the kernel will initiate suspend the moment the last wakeup_source is
> released (with no polling of other queues), there must be an unbroken chain of
> wakeup_sources from the initial interrupt all the way up to the user.
> In particular, any subsystem (such as 'input') must hold a wakeup_source
> active as long as any designated 'wakeup event' is in any of its queues.
> This means that the subsystem must be able to differentiate wakeup events
> from non-wakeup events.
> This might be easy (maybe "all events are wakeup events" or "all events on
> this queue are wakeup events") but it is not obvious to me that that is the
> case.
>
> To summarise: for this solution to be effective it also requires that
> 1/ every subsystem that carries wakeup events must know about wakeup_sources
> and must activate/deactivate them as events are queued/dequeued.
> 2/ these subsystems must be able to differentiate between wakeup events and
> non-wakeup events, and this must be a configurable decision.
>
> Currently, understanding wakeup events is restricted to:
> - drivers that are capable of configuring wakeup
> - user-space which cares about wakeup
> The proposed solution adds:
> - intermediate subsystems which might queue wakeup events
>
> I think that is a significant addition to make and not one to be made
> lightly. It might end up adding more code than you thought it would be :-)
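
To make requirement 1/ from the quoted list concrete, here is a user-space
model of a subsystem queue that keeps its wakeup source active for as long as
any wakeup event is queued. It is only an illustration and not code from any
existing subsystem: struct queue_model and both helpers are hypothetical, and
the ws_active flag stands in for a struct wakeup_source.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of a subsystem event queue that owns one wakeup source. */
struct queue_model {
	unsigned int queued_wakeup_events;
	bool ws_active;		/* stands in for a struct wakeup_source */
};

/* Called when the subsystem queues an event for user space. */
static void queue_event(struct queue_model *q, bool is_wakeup_event)
{
	if (!is_wakeup_event)
		return;		/* requirement 2/: only wakeup events count */

	if (q->queued_wakeup_events++ == 0)
		q->ws_active = true;	/* kernel: __pm_stay_awake() */
}

/* Called when user space has read the event from the queue. */
static void dequeue_event(struct queue_model *q, bool is_wakeup_event)
{
	if (!is_wakeup_event)
		return;

	assert(q->queued_wakeup_events > 0);
	if (--q->queued_wakeup_events == 0)
		q->ws_active = false;	/* kernel: __pm_relax() */
}
```

The source stays active from the first queued wakeup event until the last one
has been read, which is the "unbroken chain" the quoted text asks for.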

I'm aware of that and I expect people to come up with patches adding the
handling of wakeup events to a number of subsystems (this is kind of needed
regardless of autosleep if we want to be sure that user space has actually
consumed events we want it to take from us before suspending). However,
I'm not expecting that to be a lot of code (I think we both can only speculate
about that at this point) and those subsystems have maintainers and the
decision whether or not to take that code is theirs.

That may be a long process, but at least we can see from Android what's
needed and where.

Still, the point here is to give people something to start with so that they
can take the Android user space, test it against the mainline and see what
doesn't work and why and come up with fixes. Perhaps they will have better
ideas than we think right now, but surely nothing more is going to happen
without this starting point.

I'd like us and Android to use the same low-level data structures for power
management and the same API eventually, at least for drivers. This is not
the case at the moment and it's actively hurting us as a project quite a bit.
If Android needs to add patches on top of whatever we have to get the desired
functionality, I'm fine with that, as long as they don't require drivers to use
APIs that are incompatible with the mainline. Insisting that Android should
use a user-space-based autosleep implementation wouldn't help at all, because
realistically this isn't going to happen.

> Thanks for the opportunity to comment,

No need to thank me for that, it's Open Source after all ...

mark gross

Feb 11, 2012, 8:20:53 PM
to Rafael J. Wysocki, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern
dude, early suspend is the hallmark of enlightened coding for implementing
a kernel / user mode handshake to user mode when the display is turned
off. How can you not like that shit?

>
> [7/8] - This adds a wakeup source statistics that only makes sense with
> autosleep and (I believe) is analogous to the Android's prevent_suspend_time
> statistics. Nothing really special, but I didn't want
> wakeup_source_activate/deactivate() to take a common lock to avoid
> congestion.
>
> [8/8] - This adds a user space interface to create, activate and deactivate
> wakeup sources. Since the files it consists of are called wake_lock and
> wake_unlock, to follow Android, the objects the wakeup sources are wrapped
> into are called "wakelocks" (for added confusion). Since the interface
> doesn't provide any means to destroy those "wakelocks", I added a garbage
> collection mechanism to get rid of the unused ones, if any. I also thought
> it might be a good idea to put a limit on the number of those things that
> user space can operate simultaneously, so I did that too.
>
> All in all, it's not as much code as I thought it would be and it seems to be
> relatively simple (which raises the question why the Android people didn't
> even _try_ to do something like this instead of slapping the "real" wakelocks
> onto the kernel FWIW). IMHO it doesn't add anything really new to the kernel,
> except for the user space interfaces that should be maintainable. At least I
> think I should be able to maintain them. :-)
>
> All of the above has been tested very briefly on my test-bed Mackerel board
> and it quite obviously requires more thorough testing, but first I need to know
> if it makes sense to spend any more time on it.
>
> IOW, I need to know your opinions!
my opinion is "sigh".

FWIW we need to bring Android wakelocks into the mainline so we can fix
them WRT wake event notification handling. But I'll have to take a
look at the patches to see if I still have heartburn over the race
between wake sources and wake lock dropping in kernel mode.

/me goes and looks now....

--mark

mark gross

Feb 11, 2012, 8:28:12 PM
to Rafael J. Wysocki, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern
Nit / style comment: how is replacing a TIMEOUT macro with a magic number
an improvement? (Maybe TIMEOUT is an unhelpful name, but 100 isn't any
better.)

mark gross

Feb 11, 2012, 8:55:20 PM
to NeilBrown, Rafael J. Wysocki, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Alan Stern
On Thu, Feb 09, 2012 at 10:57:36AM +1100, NeilBrown wrote:
> On Tue, 7 Feb 2012 02:00:55 +0100 "Rafael J. Wysocki" <r...@sisk.pl> wrote:
>
>
> > All in all, it's not as much code as I thought it would be and it seems to be
> > relatively simple (which raises the question why the Android people didn't
> > even _try_ to do something like this instead of slapping the "real" wakelocks
> > onto the kernel FWIW). IMHO it doesn't add anything really new to the kernel,
> > except for the user space interfaces that should be maintainable. At least I
> > think I should be able to maintain them. :-)
> >
> > All of the above has been tested very briefly on my test-bed Mackerel board
> > and it quite obviously requires more thorough testing, but first I need to know
> > if it makes sense to spend any more time on it.
> >
> > IOW, I need to know your opinions!
>
> I've got opinions!!!
>
> I'll try to avoid the obvious bike-shedding about interface design...
>
> The key point I want to make is that doing this in the kernel has one very
> important difference to doing it in userspace (which, as you know, I prefer)
> which may not be obvious to everyone at first sight. So I will try to make it
> apparent.
>
> In the user-space solution that we have previously discussed, it is only
> necessary for the kernel to hold a wakeup_source active until the event is
> *visible* to user-space. So a low level driver can queue e.g. an input event
> and then deactivate their wakeup_source. The event can remain in the input
> queue without any wakeup_source being active and there is no risk of going to
> sleep inappropriately.
> This is because - in the user-space approach - user-space must effectively
> poll every source of interesting wakeup events between the last wakeup_source
> being deactivated and the next attempt to suspend. This poll will notice the
> event sitting in a queue so that a well-written user-space will not go to
> sleep but will read the event.
<sarcasm>
It's running on hundreds of millions of devices today... It must be well
written. Right?
</sarcasm>

> single 'poll' or 'select' or even 'read' on a pollfd).
>
> In the kernel based approach that you have presented this is not the case.
> As the kernel will initiate suspend the moment the last wakeup_source is
> released (with no polling of other queues), there must be an unbroken chain of
> wakeup_sources from the initial interrupt all the way up to the user.
> In particular, any subsystem (such as 'input') must hold a wakeup_source
> active as long as any designated 'wakeup event' is in any of its queues.
> This means that the subsystem must be able to differentiate wakeup events
> from non-wakeup events.
> This might be easy (maybe "all events are wakeup events" or "all events on
> this queue are wakeup events") but it is not obvious to me that that is the
> case.
>
And this brings us to a type of design in which user mode acknowledges
wake events before the system re-suspends.


> To summarise: for this solution to be effective it also requires that
> 1/ every subsystem that carries wakeup events must know about wakeup_sources
> and must activate/deactivate them as events are queued/dequeued.
> 2/ these subsystems must be able to differentiate between wakeup events and
> non-wakeup events, and this must be a configurable decision.
>
> Currently, understanding wakeup events is restricted to:
> - drivers that are capable of configuring wakeup
> - user-space which cares about wakeup
> The proposed solution adds:
> - intermediate subsystems which might queue wakeup events
>
> I think that is a significant addition to make and not one to be made
> lightly. It might end up adding more code than you thought it would be :-)
You mean wake-lock-itis, i.e. sprinkling timeout wake locks all over the
place?

--mark

> Thanks for the opportunity to comment,
> NeilBrown

mark gross

Feb 11, 2012, 9:05:58 PM
to Rafael J. Wysocki, NeilBrown, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Alan Stern
On Fri, Feb 10, 2012 at 01:44:10AM +0100, Rafael J. Wysocki wrote:
> Hi,
>
> On Thursday, February 09, 2012, NeilBrown wrote:
> > On Tue, 7 Feb 2012 02:00:55 +0100 "Rafael J. Wysocki" <r...@sisk.pl> wrote:
> >
> >
> > > All in all, it's not as much code as I thought it would be and it seems to be
> > > relatively simple (which raises the question why the Android people didn't
> > > even _try_ to do something like this instead of slapping the "real" wakelocks
> > > onto the kernel FWIW). IMHO it doesn't add anything really new to the kernel,
> > > except for the user space interfaces that should be maintainable. At least I
> > > think I should be able to maintain them. :-)
> > >
> > > All of the above has been tested very briefly on my test-bed Mackerel board
> > > and it quite obviously requires more thorough testing, but first I need to know
> > > if it makes sense to spend any more time on it.
> > >
> > > IOW, I need to know your opinions!
> >
> > I've got opinions!!!
>
> Good! :-)
>
> It seems that no one else has.
I'm sorry I've been really bad this last year about my email latency.
yup, an explicit user mode acknowledgment of the wake event would be
appropriate.
why not? I don't think having the PMS explicitly acknowledge a wake
event is a big ask at all.

--mark

Rafael J. Wysocki

Feb 12, 2012, 4:28:31 PM
to mark...@thegnar.org, NeilBrown, Linux PM list, LKML, Magnus Damm, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Alan Stern
On Sunday, February 12, 2012, mark gross wrote:
> On Fri, Feb 10, 2012 at 01:44:10AM +0100, Rafael J. Wysocki wrote:
[...]
> > I'd like us and Android to use the same low-level data structures for power
> > management and the same API eventually, at least for drivers. This is not
> > the case at the moment and it's actively hurting us as a project quite a bit.
> > If Android needs to add patches on top of whatever we have to get the desired
> > functionality, I'm fine with that, as long as they don't require drivers to use
> > APIs that are incompatible with the mainline. Insisting that Android should
> > use a user-space-based autosleep implementation wouldn't help at all, because
> > realistically this isn't going to happen.
>
> why not? I don't think having the PMS explicitly acknowledge a wake
> event is a big ask at all.

I'd like to hear what the Android people think about that, but somehow it seems
to me they won't like it. :-)

Arve Hjønnevåg

Feb 13, 2012, 7:11:33 PM
to Rafael J. Wysocki, mark...@thegnar.org, NeilBrown, Linux PM list, LKML, Magnus Damm, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Alan Stern
On Sun, Feb 12, 2012 at 1:32 PM, Rafael J. Wysocki <r...@sisk.pl> wrote:
> On Sunday, February 12, 2012, mark gross wrote:
>> On Fri, Feb 10, 2012 at 01:44:10AM +0100, Rafael J. Wysocki wrote:
> [...]
>> > I'd like us and Android to use the same low-level data structures for power
>> > management and the same API eventually, at least for drivers.  This is not
>> > the case at the moment and it's actively hurting us as a project quite a bit.
>> > If Android needs to add patches on top of whatever we have to get the desired
>> > functionality, I'm fine with that, as long as they don't require drivers to use
>> > APIs that are incompatible with the mainline.  Insisting that Android should
>> > use a user-space-based autosleep implementation wouldn't help at all, because
>> > realistically this isn't going to happen.
>>
>> why not?  I don't think having the PMS explicitly acknowledge a wake
>> event is a big ask at all.
>
> I'd like to hear what the Android people think about that, but somehow it seems
> to me they won't like it. :-)
>

Correct.

The android power manager service does not handle wake events and
therefore does not know when it is safe to acknowledge a wake event
(assuming this acknowledgement re-triggers suspend). Other components
handle the event and only notify the power manager if the event should
change a state (e.g. turn the screen on). Some wake events, like the
alarm used for battery monitoring, don't signal user space at all if
the user visible state did not change. Other wake events are processed
by lower level user-space services than the system-server where the
power manager runs.

--
Arve Hjønnevåg

Arve Hjønnevåg

Feb 13, 2012, 9:07:51 PM
to Rafael J. Wysocki, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Neil Brown, Alan Stern
On Mon, Feb 6, 2012 at 5:00 PM, Rafael J. Wysocki <r...@sisk.pl> wrote:
...
> All in all, it's not as much code as I thought it would be and it seems to be
> relatively simple (which raises the question why the Android people didn't
> even _try_ to do something like this instead of slapping the "real" wakelocks
> onto the kernel FWIW).  IMHO it doesn't add anything really new to the kernel,
> except for the user space interfaces that should be maintainable.  At least I
> think I should be able to maintain them. :-)
>

Replacing a working solution with an untested one takes time. That
said, I have recently tried replacing all our kernel wake-locks with a
thin wrapper around wake-sources. This appears to mostly work, but the
wake-source timeout feature has some bugs or incompatible apis. An
init api would also be useful for embedding wake-sources in other data
structures without adding another memory allocation. Your patch to
move the spinlock init to wakeup_source_add still requires the struct
to be zero initialized and the name set manually.

I needed to use two wake-sources per wake-lock since calling
__pm_stay_awake after __pm_wakeup_event on a wake-source does not
cancel the timeout. Unless there is a reason to keep this behavior I
would like __pm_stay_awake to cancel any active timeout.
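
To make the sequence above concrete, here is a minimal userspace sketch of
the behavior, not the kernel implementation. All model_* names and the
cancel_timer flag are hypothetical and exist only for illustration; the
real code uses a kernel timer and a spinlock, both omitted here. The flag
selects between the behavior described above (timeout left armed) and the
requested one (timeout cancelled).

```c
#include <stdbool.h>

/* Userspace model only: this is not struct wakeup_source. */
struct model_ws {
	bool active;
	bool timer_pending;
	unsigned long timer_expires;
};

/* Models __pm_wakeup_event(): activate and (re)arm the timeout. */
static void model_wakeup_event(struct model_ws *ws, unsigned long now,
			       unsigned long timeout)
{
	ws->active = true;
	ws->timer_pending = true;
	ws->timer_expires = now + timeout;
}

/* Models __pm_stay_awake(); cancel_timer selects the requested behavior. */
static void model_stay_awake(struct model_ws *ws, bool cancel_timer)
{
	ws->active = true;
	if (cancel_timer)
		ws->timer_pending = false;
}

/* Models the timer callback: deactivate once a pending timeout expires. */
static void model_tick(struct model_ws *ws, unsigned long now)
{
	if (ws->timer_pending && now >= ws->timer_expires) {
		ws->timer_pending = false;
		ws->active = false;
	}
}

/*
 * Event with a timeout of 10 ticks, then stay_awake, then the old timeout
 * expires. Returns whether the source is still active afterwards.
 */
static bool model_still_active(bool cancel_timer)
{
	struct model_ws ws = { 0 };

	model_wakeup_event(&ws, 0, 10);
	model_stay_awake(&ws, cancel_timer);
	model_tick(&ws, 10);
	return ws.active;
}
```

In this model, model_still_active(false) reports that the stale timeout
deactivated the source even though stay_awake was called, which is why a
wrapper would need a second wake-source; model_still_active(true) keeps
the source active.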

Destroying a wake-source also has some problems. If you call
wakeup_source_destroy it will spin forever if the wake-source is
active without a timeout. And, if you call __pm_relax then
wakeup_source_destroy it could free the wake-source memory while the
timer function is still running. It also looks as if the wake_source
can be immediately deactivated if you call __pm_wakeup_event at the
same time as the previous timeout expired.

--
Arve Hjønnevåg

Rafael J. Wysocki

Feb 14, 2012, 6:18:39 PM
to Arve Hjønnevåg, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Neil Brown, Alan Stern
On Tuesday, February 14, 2012, Arve Hjønnevåg wrote:
> On Mon, Feb 6, 2012 at 5:00 PM, Rafael J. Wysocki <r...@sisk.pl> wrote:
> ...
> > All in all, it's not as much code as I thought it would be and it seems to be
> > relatively simple (which raises the question why the Android people didn't
> > even _try_ to do something like this instead of slapping the "real" wakelocks
> > onto the kernel FWIW). IMHO it doesn't add anything really new to the kernel,
> > except for the user space interfaces that should be maintainable. At least I
> > think I should be able to maintain them. :-)
> >
>
> Replacing a working solution with an untested one takes time.

Sure, that's pretty obvious. :-)

> That said, I have recently tried replacing all our kernel wake-locks with a
> thin wrapper around wake-sources. This appears to mostly work,

Good!

> but the wake-source timeout feature has some bugs or incompatible apis. An
> init api would also be useful for embedding wake-sources in other data
> structures without adding another memory allocation. Your patch to
> move the spinlock init to wakeup_source_add still require the struct
> to be zero initialized and the name set manually.

That should be easy to fix. What about the appended patch?

> I needed to use two wake-sources per wake-lock since calling
> __pm_stay_awake after __pm_wakeup_event on a wake-source does not
> cancel the timeout. Unless there is a reason to keep this behavior I
> would like __pm_stay_awake to cancel any active timeout.

That actually is a bug. At least it's not consistent with
__pm_wakeup_event() that will replace the existing timeout with a new
one.

I'll post a patch to fix that in the next couple of days, stay tuned. :-)

> Destroying a wake-source also has some problems. If you call
> wakeup_source_destroy it will spin forever if the wake-source is
> active without a timeout. And, if you call __pm_relax then
> wakeup_source_destroy it could free the wake-source memory while the
> timer function is still running.

This also is a bug that needs fixing anyway.

> It also looks as if the wake_source can be immediately deactivated if
> you call __pm_wakeup_event at the same time as the previous timeout expired.

Yes, there is a race window if the timer function has already started.
It looks like I wanted to make it too simple. :-) Will fix.

Thanks,
Rafael


Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
drivers/base/power/wakeup.c | 44 +++++++++++++++++++++++++++++++++++++-------
include/linux/pm_wakeup.h | 9 +++++++++
2 files changed, 46 insertions(+), 7 deletions(-)

Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -53,6 +53,28 @@ static void pm_wakeup_timer_fn(unsigned
static LIST_HEAD(wakeup_sources);

/**
+ * wakeup_source_init - Initialize a struct wakeup_source object.
+ * @ws: Wakeup source to initialize.
+ * @name: Name of the new wakeup source.
+ */
+int wakeup_source_init(struct wakeup_source *ws, const char *name)
+{
+ int ret = 0;
+
+ if (!ws)
+ return -EINVAL;
+
+ memset(ws, 0, sizeof(*ws));
+ if (name) {
+ ws->name = kstrdup(name, GFP_KERNEL);
+ if (!ws->name)
+ ret = -ENOMEM;
+ }
+ return ret;
+}
+EXPORT_SYMBOL_GPL(wakeup_source_init);
+
+/**
* wakeup_source_create - Create a struct wakeup_source object.
* @name: Name of the new wakeup source.
*/
@@ -60,22 +82,20 @@ struct wakeup_source *wakeup_source_crea
{
struct wakeup_source *ws;

- ws = kzalloc(sizeof(*ws), GFP_KERNEL);
+ ws = kmalloc(sizeof(*ws), GFP_KERNEL);
if (!ws)
return NULL;

- if (name)
- ws->name = kstrdup(name, GFP_KERNEL);
-
+ wakeup_source_init(ws, name);
return ws;
}
EXPORT_SYMBOL_GPL(wakeup_source_create);

/**
- * wakeup_source_destroy - Destroy a struct wakeup_source object.
- * @ws: Wakeup source to destroy.
+ * wakeup_source_drop - Prepare a struct wakeup_source object for destruction.
+ * @ws: Wakeup source to prepare for destruction.
*/
-void wakeup_source_destroy(struct wakeup_source *ws)
+void wakeup_source_drop(struct wakeup_source *ws)
{
if (!ws)
return;
@@ -91,6 +111,16 @@ void wakeup_source_destroy(struct wakeup
spin_unlock_irq(&ws->lock);

kfree(ws->name);
+}
+EXPORT_SYMBOL_GPL(wakeup_source_drop);
+
+/**
+ * wakeup_source_destroy - Destroy a struct wakeup_source object.
+ * @ws: Wakeup source to destroy.
+ */
+void wakeup_source_destroy(struct wakeup_source *ws)
+{
+ wakeup_source_drop(ws);
kfree(ws);
}
EXPORT_SYMBOL_GPL(wakeup_source_destroy);
Index: linux/include/linux/pm_wakeup.h
===================================================================
--- linux.orig/include/linux/pm_wakeup.h
+++ linux/include/linux/pm_wakeup.h
@@ -73,7 +73,9 @@ static inline bool device_may_wakeup(str
}

/* drivers/base/power/wakeup.c */
+extern int wakeup_source_init(struct wakeup_source *ws, const char *name);
extern struct wakeup_source *wakeup_source_create(const char *name);
+extern void wakeup_source_drop(struct wakeup_source *ws);
extern void wakeup_source_destroy(struct wakeup_source *ws);
extern void wakeup_source_add(struct wakeup_source *ws);
extern void wakeup_source_remove(struct wakeup_source *ws);
@@ -103,11 +105,18 @@ static inline bool device_can_wakeup(str
return dev->power.can_wakeup;
}

+static inline int wakeup_source_init(struct wakeup_source *ws, const char *name)
+{
+ return -ENOSYS;
+}
+
static inline struct wakeup_source *wakeup_source_create(const char *name)
{
return NULL;
}

+static inline void wakeup_source_drop(struct wakeup_source *ws) {}
+
static inline void wakeup_source_destroy(struct wakeup_source *ws) {}

static inline void wakeup_source_add(struct wakeup_source *ws) {}

Arve Hjønnevåg

Feb 15, 2012, 12:57:54 AM
to Rafael J. Wysocki, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Neil Brown, Alan Stern
2012/2/14 Rafael J. Wysocki <r...@sisk.pl>:
> On Tuesday, February 14, 2012, Arve Hjønnevåg wrote:
>> On Mon, Feb 6, 2012 at 5:00 PM, Rafael J. Wysocki <r...@sisk.pl> wrote:
>> ...
>> but the wake-source timeout feature has some bugs or incompatible apis. An
>> init api would also be useful for embedding wake-sources in other data
>> structures without adding another memory allocation. Your patch to
>> move the spinlock init to wakeup_source_add still require the struct
>> to be zero initialized and the name set manually.
>
> That should be easy to fix.  What about the appended patch?
>

That works, but I still have to call more than one function before I
can use the wakeup-source (wakeup_source_init and wakeup_source_add)
and more than one function before I can free it (__pm_relax,
wakeup_source_remove and wakeup_source_drop). Is there any reason to
keep these separate?

Also, not copying the name when the caller provides the memory for the
wakeup-source would be a closer match to the wakelock api. Most of our
wakelocks pass a string constant as the name, and making a copy of
that string is not useful. wake_lock_init is also safe to call from
atomic context, but I don't know if anyone relies on this.

--
Arve Hjønnevåg

Arve Hjønnevåg

Feb 15, 2012, 1:15:26 AM
to Rafael J. Wysocki, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Neil Brown, Alan Stern
On Mon, Feb 6, 2012 at 5:05 PM, Rafael J. Wysocki <r...@sisk.pl> wrote:
> From: Rafael J. Wysocki <r...@sisk.pl>
>
> Wakeup statistics used by Android are slightly different from what we
> have at the moment, so modify them to follow Android more closely.
...
> @@ -438,6 +444,11 @@ static void wakeup_source_deactivate(str
>        if (ktime_to_ns(duration) > ktime_to_ns(ws->max_time))
>                ws->max_time = duration;
>
> +       ws->last_time = now;
> +       if (ws->has_timeout && time_after(jiffies, ws->timer_expires))

time_after_eq may work better (or increment the count from the timer).
I applied this patch and the expire counts I see for wakeup-sources
that always time-out do not match the active count.
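
One possible explanation, sketched in userspace rather than kernel code: if
the timer callback runs in the same jiffy as timer_expires, time_after() is
false for the two equal values while time_after_eq() is true. The macros
below reproduce only the comparison arithmetic of the kernel's time_after()
and time_after_eq() (without their type checking); the model_* and
counted_* names are made up for this illustration.

```c
/* Userspace copies of the wraparound-safe jiffies comparisons. */
#define model_time_after(a, b)    ((long)((b) - (a)) < 0)
#define model_time_after_eq(a, b) ((long)((a) - (b)) >= 0)

/* Returns 1 if a deactivation at 'now' would be counted as an expiry. */
static int counted_with_after(unsigned long now, unsigned long expires)
{
	return model_time_after(now, expires);
}

/* Same check, but also counting the case where now == expires. */
static int counted_with_after_eq(unsigned long now, unsigned long expires)
{
	return model_time_after_eq(now, expires);
}
```

With now equal to expires, counted_with_after() returns 0 and
counted_with_after_eq() returns 1, so under that assumption the strict
comparison would miss expirations handled in the expiry jiffy itself.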

--
Arve Hjønnevåg

mark gross

Feb 15, 2012, 10:29:09 AM
to Arve Hjønnevåg, Rafael J. Wysocki, mark...@thegnar.org, NeilBrown, Linux PM list, LKML, Magnus Damm, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Alan Stern
So you are all good with the wake event suspend race condition never ever
getting corrected, or the fact that we have to sprinkle overlapping
kernel wake locks up and down the stack if we want to attempt to
implement correct code, or that there is *no* way to deal with the
hand-off of a wake lock critical section between kernel and user mode on
wake events without having a somewhat arbitrary timeout wake lock dropping
in kernel mode?

Fine, if you don't like having the PMS ack wake events how about having
the services that handle them do it?

The basic problem with wake locks is that there is no explicit wake
event acknowledgment required before re-suspending. How about helping
us come up with a solution to that?

--mark

Rafael J. Wysocki

Feb 15, 2012, 5:33:41 PM
to Arve Hjønnevåg, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Neil Brown, Alan Stern
On Wednesday, February 15, 2012, Arve Hjønnevåg wrote:
> On Mon, Feb 6, 2012 at 5:05 PM, Rafael J. Wysocki <r...@sisk.pl> wrote:
> > From: Rafael J. Wysocki <r...@sisk.pl>
> >
> > Wakeup statistics used by Android are slightly different from what we
> > have at the moment, so modify them to follow Android more closely.
> ...
> > @@ -438,6 +444,11 @@ static void wakeup_source_deactivate(str
> > if (ktime_to_ns(duration) > ktime_to_ns(ws->max_time))
> > ws->max_time = duration;
> >
> > + ws->last_time = now;
> > + if (ws->has_timeout && time_after(jiffies, ws->timer_expires))
>
> time_after_eq may work better (or increment the count from the timer).

I think incrementing the count from the timer is a better approach.

> I applied this patch and the expire counts I see for wakeup-sources
> that always time-out do not match the active count.

I see. The reason may also be that __pm_wakeup_event() increments
ws->event_count even if the wakeup source is already active.

Thanks,
Rafael

Rafael J. Wysocki

Feb 15, 2012, 6:03:50 PM
to Arve Hjønnevåg, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Neil Brown, Alan Stern
On Wednesday, February 15, 2012, Arve Hjønnevåg wrote:
> 2012/2/14 Rafael J. Wysocki <r...@sisk.pl>:
> > On Tuesday, February 14, 2012, Arve Hjønnevåg wrote:
> >> On Mon, Feb 6, 2012 at 5:00 PM, Rafael J. Wysocki <r...@sisk.pl> wrote:
> >> ...
> >> but the wake-source timeout feature has some bugs or incompatible apis. An
> >> init api would also be useful for embedding wake-sources in other data
> >> structures without adding another memory allocation. Your patch to
> >> move the spinlock init to wakeup_source_add still require the struct
> >> to be zero initialized and the name set manually.
> >
> > That should be easy to fix. What about the appended patch?
> >
>
> That works, but I still have to call more than one function before I
> can use the wakeup-source (wakeup_source_init and wakeup_source_add)
> and more than one function before I can free it (__pm_relax,
> wakeup_source_remove and wakeup_source_drop). Is there any reason to
> keep these separate?

Yes, there is. I think that wakeup_source_create/_destroy() should
use the same initialization functions internally that will be used for
externally allocated wakeup sources (to make sure that all wakeup source
objects are initialized in exactly the same way).

> Also, not copying the name when the caller provides the memory for the
> wakeup-source would be a closer match to the wakelock api. Most of our
> wakelocks pass a string constant as the name, and making a copy of
> that string is not useful. wake_lock_init is also safe to call from
> atomic context, but I don't know if anyone relies on this.

OK, below is another go. It doesn't copy the name if wakeup_source_init() is
used (which also does the _add this time). I think, though, that copying
the name is generally safer, because someone might use wakeup_source_init()
with the name string allocated on the stack or otherwise temporary, which would
be a bug with the new version.
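
The trade-off can be sketched in userspace as follows. This is an
illustrative model, not the patch itself: struct model_ws and the model_*
helpers are hypothetical stand-ins, with malloc()/free() used in place of
kstrdup()/kfree(). model_prepare() stores only the caller's pointer,
model_init_copy() duplicates the string.

```c
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for the name field of a wakeup source. */
struct model_ws {
	const char *name;
};

/* Pointer-only variant: the caller must keep 'name' alive. */
static void model_prepare(struct model_ws *ws, const char *name)
{
	if (ws) {
		memset(ws, 0, sizeof(*ws));
		ws->name = name;
	}
}

/* Copying variant: safe for temporary strings, costs an allocation. */
static int model_init_copy(struct model_ws *ws, const char *name)
{
	size_t len = strlen(name) + 1;
	char *copy = malloc(len);

	if (!copy)
		return -1;
	memcpy(copy, name, len);
	memset(ws, 0, sizeof(*ws));
	ws->name = copy;
	return 0;
}

/* Returns 1 if the stored name is the caller's own pointer (no copy). */
static int model_prepare_shares_name(void)
{
	static const char name[] = "model-dev";
	struct model_ws ws;

	model_prepare(&ws, name);
	return ws.name == name;
}

/* Returns 1 if the copying variant stored a distinct but equal string. */
static int model_copy_owns_name(void)
{
	static const char name[] = "model-dev";
	struct model_ws ws;
	int ok;

	if (model_init_copy(&ws, name))
		return 0;
	ok = ws.name != name && strcmp(ws.name, name) == 0;
	free((void *)ws.name);
	return ok;
}
```

A string constant or static buffer, as in the helpers above, is safe with
the pointer-only variant; a name built in a stack buffer that goes out of
scope while the object is still in use would leave ws.name dangling.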

Thanks,
Rafael


Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
drivers/base/power/wakeup.c | 41 ++++++++++++++++++++++++++++++++++-------
include/linux/pm_wakeup.h | 20 ++++++++++++++++++++
2 files changed, 54 insertions(+), 7 deletions(-)

Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -53,6 +53,23 @@ static void pm_wakeup_timer_fn(unsigned
static LIST_HEAD(wakeup_sources);

/**
+ * wakeup_source_prepare - Prepare a new wakeup source for initialization.
+ * @ws: Wakeup source to prepare.
+ * @name: Pointer to the name of the new wakeup source.
+ *
+ * Callers must ensure that the @name string won't be freed when @ws is still in
+ * use.
+ */
+void wakeup_source_prepare(struct wakeup_source *ws, const char *name)
+{
+ if (ws) {
+ memset(ws, 0, sizeof(*ws));
+ ws->name = name;
+ }
+}
+EXPORT_SYMBOL_GPL(wakeup_source_prepare);
+
+/**
* wakeup_source_create - Create a struct wakeup_source object.
* @name: Name of the new wakeup source.
*/
@@ -60,31 +77,41 @@ struct wakeup_source *wakeup_source_crea
{
struct wakeup_source *ws;

- ws = kzalloc(sizeof(*ws), GFP_KERNEL);
+ ws = kmalloc(sizeof(*ws), GFP_KERNEL);
if (!ws)
return NULL;

- if (name)
- ws->name = kstrdup(name, GFP_KERNEL);
-
+ wakeup_source_prepare(ws, name ? kstrdup(name, GFP_KERNEL) : NULL);
return ws;
}
EXPORT_SYMBOL_GPL(wakeup_source_create);

/**
- * wakeup_source_destroy - Destroy a struct wakeup_source object.
- * @ws: Wakeup source to destroy.
+ * wakeup_source_drop - Prepare a struct wakeup_source object for destruction.
+ * @ws: Wakeup source to prepare for destruction.
*
* Callers must ensure that __pm_stay_awake() or __pm_wakeup_event() will never
* be run in parallel with this function for the same wakeup source object.
*/
-void wakeup_source_destroy(struct wakeup_source *ws)
+void wakeup_source_drop(struct wakeup_source *ws)
{
if (!ws)
return;

del_timer_sync(&ws->timer);
__pm_relax(ws);
+}
+EXPORT_SYMBOL_GPL(wakeup_source_drop);
+
+/**
+ * wakeup_source_destroy - Destroy a struct wakeup_source object.
+ * @ws: Wakeup source to destroy.
+ *
+ * Use only for wakeup source objects created with wakeup_source_create().
+ */
+void wakeup_source_destroy(struct wakeup_source *ws)
+{
+ wakeup_source_drop(ws);
kfree(ws->name);
kfree(ws);
}
Index: linux/include/linux/pm_wakeup.h
===================================================================
--- linux.orig/include/linux/pm_wakeup.h
+++ linux/include/linux/pm_wakeup.h
@@ -73,7 +73,9 @@ static inline bool device_may_wakeup(str
}

/* drivers/base/power/wakeup.c */
+extern void wakeup_source_prepare(struct wakeup_source *ws, const char *name);
extern struct wakeup_source *wakeup_source_create(const char *name);
+extern void wakeup_source_drop(struct wakeup_source *ws);
extern void wakeup_source_destroy(struct wakeup_source *ws);
extern void wakeup_source_add(struct wakeup_source *ws);
extern void wakeup_source_remove(struct wakeup_source *ws);
@@ -103,11 +105,16 @@ static inline bool device_can_wakeup(str
return dev->power.can_wakeup;
}

+static inline void wakeup_source_prepare(struct wakeup_source *ws,
+ const char *name) {}
+
static inline struct wakeup_source *wakeup_source_create(const char *name)
{
return NULL;
}

+static inline void wakeup_source_drop(struct wakeup_source *ws) {}
+
static inline void wakeup_source_destroy(struct wakeup_source *ws) {}

static inline void wakeup_source_add(struct wakeup_source *ws) {}
@@ -165,4 +172,17 @@ static inline void pm_wakeup_event(struc

#endif /* !CONFIG_PM_SLEEP */

+static inline void wakeup_source_init(struct wakeup_source *ws,
+ const char *name)
+{
+ wakeup_source_prepare(ws, name);
+ wakeup_source_add(ws);
+}
+
+static inline void wakeup_source_trash(struct wakeup_source *ws)
+{
+ wakeup_source_remove(ws);
+ wakeup_source_drop(ws);
+}
+
#endif /* _LINUX_PM_WAKEUP_H */

Rafael J. Wysocki

Feb 16, 2012, 5:18:37 PM
to Arve Hjønnevåg, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Neil Brown, Alan Stern
So, is the new version more suitable than the previous one?

Arve Hjønnevåg

Feb 16, 2012, 9:12:09 PM
to Rafael J. Wysocki, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Neil Brown, Alan Stern
2012/2/15 Rafael J. Wysocki <r...@sisk.pl>:
> On Wednesday, February 15, 2012, Arve Hjønnevåg wrote:
>> On Mon, Feb 6, 2012 at 5:05 PM, Rafael J. Wysocki <r...@sisk.pl> wrote:
>> > From: Rafael J. Wysocki <r...@sisk.pl>
>> >
>> > Wakeup statistics used by Android are slightly different from what we
>> > have at the moment, so modify them to follow Android more closely.
>> ...
>> > @@ -438,6 +444,11 @@ static void wakeup_source_deactivate(str
>> >        if (ktime_to_ns(duration) > ktime_to_ns(ws->max_time))
>> >                ws->max_time = duration;
>> >
>> > +       ws->last_time = now;
>> > +       if (ws->has_timeout && time_after(jiffies, ws->timer_expires))
>>
>> time_after_eq may work better (or increment the count from the timer).
>
> I think incrementing the count from the timer is a better approach.
>

OK.

>> I applied this patch and the expire counts I see for wakeup-sources
>> that always time-out do not match the active count.
>
> I see.  The reason may also be that __pm_wakeup_event() increments
> ws->event_count even if the wakeup source is already active.
>

The active count, which is what I was looking at, only changes if it
was not already active though.

--
Arve Hjønnevåg

Arve Hjønnevåg

Feb 16, 2012, 10:55:53 PM2/16/12
to Rafael J. Wysocki, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Neil Brown, Alan Stern
2012/2/15 Rafael J. Wysocki <r...@sisk.pl>:
> On Wednesday, February 15, 2012, Arve Hjønnevåg wrote:
>> 2012/2/14 Rafael J. Wysocki <r...@sisk.pl>:
>> > On Tuesday, February 14, 2012, Arve Hjønnevåg wrote:
>> >> On Mon, Feb 6, 2012 at 5:00 PM, Rafael J. Wysocki <r...@sisk.pl> wrote:
>> >> ...
>> >> but the wake-source timeout feature has some bugs or incompatible apis. An
>> >> init api would also be useful for embedding wake-sources in other data
>> >> structures without adding another memory allocation. Your patch to
>> >> move the spinlock init to wakeup_source_add still require the struct
>> >> to be zero initialized and the name set manually.
>> >
>> > That should be easy to fix.  What about the appended patch?
>> >
>>
>> That works, but I still have to call more than one function before I
>> can use the wakeup-source (wakeup_source_init and wakeup_source_add)
>> and more than one function before I can free it (__pm_relax,
>> wakeup_source_remove and wakeup_source_drop). Is there any reason to
>> keep these separate?
>
> Yes, there is.  I think that wakeup_source_create/_destroy() should
> use the same initialization functions internally that will be used for
> externally allocated wakeup sources (to make sure that all wakeup source
> objects are initialized in exactly the same way).
>

I agree with that, but is it useful to export these helper functions?

>> Also, not copying the name when the caller provides the memory for the
>> wakeup-source would be a closer match to the wakelock api. Most of our
>> wakelocks pass a string constant as the name, and making a copy of
>> that string is not useful. wake_lock_init is also safe to call from
>> atomic context, but I don't know if anyone relies on this.
>
> OK, below is another go.  It doesn't copy the name if wakeup_source_init() is
> used (which also does the _add this time).  I think, though, that copying
> the name is generally safer, because someone might use wakeup_source_init()
> with the name string allocated on the stack or otherwise temporary, which would
> be a bug with the new version.
>

I prefer this version. I have not seen a bug where someone passed a
temporary as the wakelock name, I assume since this will show up
immediately in the stats file.

--
Arve Hjønnevåg

Arve Hjønnevåg

Feb 16, 2012, 10:56:36 PM2/16/12
to Rafael J. Wysocki, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Neil Brown, Alan Stern
2012/2/16 Rafael J. Wysocki <r...@sisk.pl>:
..
>
> So, is the new version more suitable than the previous one?
>

Yes, I think it is.

--
Arve Hjønnevåg

Rafael J. Wysocki

Feb 17, 2012, 3:53:37 PM2/17/12
to Arve Hjønnevåg, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Neil Brown, Alan Stern
On Friday, February 17, 2012, Arve Hjønnevåg wrote:
> 2012/2/15 Rafael J. Wysocki <r...@sisk.pl>:
> > On Wednesday, February 15, 2012, Arve Hjønnevåg wrote:
> >> 2012/2/14 Rafael J. Wysocki <r...@sisk.pl>:
> >> > On Tuesday, February 14, 2012, Arve Hjønnevåg wrote:
> >> >> On Mon, Feb 6, 2012 at 5:00 PM, Rafael J. Wysocki <r...@sisk.pl> wrote:
> >> >> ...
> >> >> but the wake-source timeout feature has some bugs or incompatible apis. An
> >> >> init api would also be useful for embedding wake-sources in other data
> >> >> structures without adding another memory allocation. Your patch to
> >> >> move the spinlock init to wakeup_source_add still require the struct
> >> >> to be zero initialized and the name set manually.
> >> >
> >> > That should be easy to fix. What about the appended patch?
> >> >
> >>
> >> That works, but I still have to call more than one function before I
> >> can use the wakeup-source (wakeup_source_init and wakeup_source_add)
> >> and more than one function before I can free it (__pm_relax,
> >> wakeup_source_remove and wakeup_source_drop). Is there any reason to
> >> keep these separate?
> >
> > Yes, there is. I think that wakeup_source_create/_destroy() should
> > use the same initialization functions internally that will be used for
> > externally allocated wakeup sources (to make sure that all wakeup source
> > objects are initialized in exactly the same way).
> >
>
> I agree with that, but is it useful to export these helper functions?

Well, we need to export either them or the ones that will call them internally
and in principle someone may want to do something between _prepare() and _add()
sometimes ...
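
To make the layering concrete, here is a minimal userspace sketch of the split between _prepare()/_add() and the one-call wrappers. It is a model, not kernel code: struct ws_model, its fields and the ws_* helpers are illustrative stand-ins, user_tag is a hypothetical field, and the assumption that dropping a source leaves it inactive is mine.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for struct wakeup_source; not the kernel definition. */
struct ws_model {
	const char *name;
	bool registered;	/* on the modelled list of wakeup sources */
	bool active;
	int user_tag;		/* hypothetical field set between prepare and add */
};

/* Models wakeup_source_prepare(): zero the object and set its name. */
static void ws_prepare(struct ws_model *ws, const char *name)
{
	memset(ws, 0, sizeof(*ws));
	ws->name = name;
}

/* Models wakeup_source_add() and wakeup_source_remove(). */
static void ws_add(struct ws_model *ws) { ws->registered = true; }
static void ws_remove(struct ws_model *ws) { ws->registered = false; }

/* Models wakeup_source_drop(); assumption: it leaves the source inactive. */
static void ws_drop(struct ws_model *ws) { ws->active = false; }

/* Models wakeup_source_init(): prepare and add in one call. */
static void ws_init(struct ws_model *ws, const char *name)
{
	ws_prepare(ws, name);
	ws_add(ws);
}

/* Models wakeup_source_trash(): remove and drop in one call. */
static void ws_trash(struct ws_model *ws)
{
	ws_remove(ws);
	ws_drop(ws);
}
```

A caller that needs to do something between the two steps uses ws_prepare() and ws_add() directly; everyone else can use ws_init() and ws_trash().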

> >> Also, not copying the name when the caller provides the memory for the
> >> wakeup-source would be a closer match to the wakelock api. Most of our
> >> wakelocks pass a string constant as the name, and making a copy of
> >> that string is not useful. wake_lock_init is also safe to call from
> >> atomic context, but I don't know if anyone relies on this.
> >
> > OK, below is another go. It doesn't copy the name if wakeup_source_init() is
> > used (which also does the _add this time). I think, though, that copying
> > the name is generally safer, because someone might use wakeup_source_init()
> > with the name string allocated on the stack or otherwise temporary, which would
> > be a bug with the new version.
> >
>
> I prefer this version. I have not seen a bug where someone passed a
> temporary as the wakelock name, I assume since this will show up
> immediately in the stats file.

OK

Thanks,
Rafael

Rafael J. Wysocki

Feb 17, 2012, 5:58:25 PM2/17/12
to Linux PM list, Arve Hjønnevåg, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Neil Brown, Alan Stern
From: Rafael J. Wysocki <r...@sisk.pl>

The existing wakeup source initialization routines are not
particularly useful for wakeup sources that aren't created by
wakeup_source_create(), because their users have to open code
filling the objects with zeros and setting their names. For this
reason, introduce routines that can be used for initializing, for
example, static wakeup source objects.

Requested-by: Arve Hjønnevåg <ar...@android.com>
Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---

This patch is on top of the linux-next branch of the linux-pm tree.

Thanks,
Rafael

Rafael J. Wysocki

Feb 18, 2012, 6:46:57 PM2/18/12
to Linux PM list, Arve Hjønnevåg, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Neil Brown, Alan Stern
From: Rafael J. Wysocki <r...@sisk.pl>
Subject: PM / Sleep: Add more wakeup source initialization routines

The existing wakeup source initialization routines are not
particularly useful for wakeup sources that aren't created by
wakeup_source_create(), because their users have to open code
filling the objects with zeros and setting their names. For this
reason, introduce routines that can be used for initializing, for
example, static wakeup source objects.

Requested-by: Arve Hjønnevåg <ar...@android.com>
Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---

The name member of struct wakeup_source has to be of type (const char *)
due to the new dependencies between the arguments of the new initializers.
That also reflects the fact that that string is not supposed to be modified.
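
As an illustration of the difference in name ownership, the following userspace sketch contrasts a non-copying initializer with an allocating one. It is a model only: struct ws_model and the ws_* functions are hypothetical stand-ins, and malloc()/free() take the place of the kernel allocator.

```c
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in; only the name member matters here. */
struct ws_model {
	const char *name;
};

/* Models the non-copying path: the caller must keep the string alive. */
static void ws_init_nocopy(struct ws_model *ws, const char *name)
{
	memset(ws, 0, sizeof(*ws));
	ws->name = name;
}

/* Models the allocating path: the object owns a private copy of the name. */
static struct ws_model *ws_create_copy(const char *name)
{
	struct ws_model *ws = malloc(sizeof(*ws));
	char *copy;

	if (!ws)
		return NULL;
	copy = malloc(strlen(name) + 1);
	if (!copy) {
		free(ws);
		return NULL;
	}
	strcpy(copy, name);
	ws->name = copy;
	return ws;
}

/* Releases an object made by ws_create_copy(), including its name copy. */
static void ws_destroy_copy(struct ws_model *ws)
{
	if (!ws)
		return;
	free((char *)ws->name);
	free(ws);
}
```

With the non-copying variant a string constant is the safe choice for the name, since it has static storage duration; a name built in a stack buffer would dangle once the function returns.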

Thanks,
Rafael

---
drivers/base/power/wakeup.c | 41 ++++++++++++++++++++++++++++++++++-------
include/linux/pm_wakeup.h | 22 +++++++++++++++++++++-
2 files changed, 55 insertions(+), 8 deletions(-)
@@ -41,7 +41,7 @@
* @active: Status of the wakeup source.
*/
struct wakeup_source {
- char *name;
+ const char *name;
struct list_head entry;
spinlock_t lock;
struct timer_list timer;

Rafael J. Wysocki

Feb 20, 2012, 6:00:20 PM2/20/12
to Linux PM list, Arve Hjønnevåg, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Neil Brown, Alan Stern
From: Rafael J. Wysocki <r...@sisk.pl>
Subject: PM / Sleep: Add more wakeup source initialization routines

The existing wakeup source initialization routines are not
particularly useful for wakeup sources that aren't created by
wakeup_source_create(), because their users have to open code
filling the objects with zeros and setting their names. For this
reason, introduce routines that can be used for initializing, for
example, static wakeup source objects.

Requested-by: Arve Hjønnevåg <ar...@android.com>
Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---

Make sure that wakeup_source_unregister() won't crash or trigger the
WARN_ON() in wakeup_source_remove() if a NULL pointer is passed to it.
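
The intent can be sketched in userspace as follows. This is a model under stated assumptions: struct ws_model, ws_register() and the remove_calls counter are hypothetical, and abort() stands in for the WARN_ON() mentioned above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for a dynamically allocated wakeup source. */
struct ws_model {
	int registered;
};

static int remove_calls;	/* how many times the remove step actually ran */

/* Models wakeup_source_remove(); a NULL argument is treated as a bug. */
static void ws_remove(struct ws_model *ws)
{
	if (!ws)
		abort();	/* stand-in for the kernel's WARN_ON() */
	ws->registered = 0;
	remove_calls++;
}

/* Models wakeup_source_destroy(): tolerate NULL, then free. */
static void ws_destroy(struct ws_model *ws)
{
	if (!ws)
		return;
	free(ws);
}

/* Models wakeup_source_unregister() with the NULL check in place. */
static void ws_unregister(struct ws_model *ws)
{
	if (ws) {
		ws_remove(ws);
		ws_destroy(ws);
	}
}

/* Hypothetical counterpart that may fail and return NULL. */
static struct ws_model *ws_register(void)
{
	struct ws_model *ws = calloc(1, sizeof(*ws));

	if (ws)
		ws->registered = 1;
	return ws;
}
```

An error path can then call ws_unregister() unconditionally, whether or not the earlier registration succeeded.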

Thanks,
Rafael

---
drivers/base/power/wakeup.c | 50 ++++++++++++++++++++++++++++++++++++--------
include/linux/pm_wakeup.h | 22 ++++++++++++++++++-
2 files changed, 62 insertions(+), 10 deletions(-)
@@ -60,31 +77,44 @@ struct wakeup_source *wakeup_source_crea
+ if (!ws)
+ return;
+
+ wakeup_source_drop(ws);
kfree(ws->name);
kfree(ws);
}
@@ -147,8 +177,10 @@ EXPORT_SYMBOL_GPL(wakeup_source_register
*/
void wakeup_source_unregister(struct wakeup_source *ws)
{
- wakeup_source_remove(ws);
- wakeup_source_destroy(ws);
+ if (ws) {
+ wakeup_source_remove(ws);
+ wakeup_source_destroy(ws);
+ }
}
EXPORT_SYMBOL_GPL(wakeup_source_unregister);

Rafael J. Wysocki

Feb 21, 2012, 6:35:43 PM2/21/12
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
From: Rafael J. Wysocki <r...@sisk.pl>

Currently, the device suspend code in drivers/base/power/main.c
only checks if there have been any wakeup events, and therefore the
ongoing system transition to a sleep state should be aborted, during
the first (i.e. "suspend") device suspend phase. However, wakeup
events may be reported later as well, so it's reasonable to look for
them in the subsequent (i.e. "late suspend" and "suspend
noirq") phases.
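
The control flow being added can be pictured with a small userspace model. Everything here is illustrative: struct phase_model, model_wakeup_pending() and model_suspend_phase() are hypothetical names, and a counter takes the place of the real device lists.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of one device suspend phase. */
struct phase_model {
	size_t ndev;		/* devices to handle in this phase */
	size_t wakeup_after;	/* a wakeup event shows up after this many */
	size_t suspended;	/* devices handled so far */
};

/* Stand-in for pm_wakeup_pending(). */
static bool model_wakeup_pending(const struct phase_model *m)
{
	return m->suspended >= m->wakeup_after;
}

/* Handle devices one by one and bail out as soon as a wakeup is pending. */
static int model_suspend_phase(struct phase_model *m)
{
	int error = 0;

	m->suspended = 0;
	while (m->suspended < m->ndev) {
		m->suspended++;		/* "suspend" one device */
		if (model_wakeup_pending(m)) {
			error = -EBUSY;
			break;
		}
	}
	return error;
}
```

In this model a wakeup event arriving after the third of five devices stops the phase with -EBUSY and leaves the remaining two devices untouched.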

Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
drivers/base/power/main.c | 10 ++++++++++
1 file changed, 10 insertions(+)

Index: linux/drivers/base/power/main.c
===================================================================
--- linux.orig/drivers/base/power/main.c
+++ linux/drivers/base/power/main.c
@@ -889,6 +889,11 @@ static int dpm_suspend_noirq(pm_message_
if (!list_empty(&dev->power.entry))
list_move(&dev->power.entry, &dpm_noirq_list);
put_device(dev);
+
+ if (pm_wakeup_pending()) {
+ error = -EBUSY;
+ break;
+ }
}
mutex_unlock(&dpm_list_mtx);
if (error)
@@ -962,6 +967,11 @@ static int dpm_suspend_late(pm_message_t
if (!list_empty(&dev->power.entry))
list_move(&dev->power.entry, &dpm_late_early_list);
put_device(dev);
+
+ if (pm_wakeup_pending()) {
+ error = -EBUSY;
+ break;
+ }
}
mutex_unlock(&dpm_list_mtx);
if (error)

Rafael J. Wysocki

Feb 21, 2012, 6:35:48 PM2/21/12
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
From: Rafael J. Wysocki <r...@sisk.pl>

Wakeup statistics used by Android are slightly different from what we
have in wakeup sources at the moment and there aren't any known
users of those statistics other than Android, so modify them to make
it easier for Android to switch to wakeup sources.

This removes the struct wakeup_source's hit_count field, which is very
rough and therefore not very useful, and adds two new fields,
wakeup_count and expire_count. The first one tracks how many times
the wakeup source is activated with events_check_enabled set (which
roughly corresponds to the situations when a system power transition
to a sleep state is in progress and would be aborted by this wakeup
source if it were the only active one at that time) and the second
one is the number of times the wakeup source has been activated with
a timeout that expired.

Additionally, the last_time field is now updated when the wakeup
source is deactivated too (previously it was only updated during
the wakeup source's activation), which seems to be what Android does
with the analogous counter for wakelocks.
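
How the counters relate to one another can be shown with a userspace model. It is a sketch, not the patch itself: struct ws_stats and the model_* names are hypothetical, locking and timers are left out, and a plain global stands in for events_check_enabled.

```c
#include <stdbool.h>

/* Only the counters discussed above; not the kernel structure. */
struct ws_stats {
	unsigned long event_count;
	unsigned long active_count;
	unsigned long wakeup_count;
	unsigned long expire_count;
	bool active;
};

static bool model_events_check_enabled;	/* stand-in for events_check_enabled */

/* Every report counts as an event; only inactive sources get activated. */
static void model_report_event(struct ws_stats *ws)
{
	ws->event_count++;
	if (model_events_check_enabled)
		ws->wakeup_count++;
	if (!ws->active) {
		ws->active = true;
		ws->active_count++;
	}
}

/* Marks the source inactive. */
static void model_deactivate(struct ws_stats *ws)
{
	ws->active = false;
}

/* The timeout fired: count it only if it really deactivates the source. */
static void model_timer_expired(struct ws_stats *ws)
{
	if (ws->active) {
		model_deactivate(ws);
		ws->expire_count++;
	}
}
```

Two reports on an already active source give event_count 2 but active_count 1, which is the distinction raised earlier in the thread.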

Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
Documentation/ABI/testing/sysfs-devices-power | 24 ++++++---
drivers/base/power/sysfs.c | 30 ++++++++++--
drivers/base/power/wakeup.c | 64 +++++++++++---------------
include/linux/pm_wakeup.h | 11 ++--
4 files changed, 77 insertions(+), 52 deletions(-)

Index: linux/include/linux/pm_wakeup.h
===================================================================
--- linux.orig/include/linux/pm_wakeup.h
+++ linux/include/linux/pm_wakeup.h
@@ -33,12 +33,14 @@
*
* @total_time: Total time this wakeup source has been active.
* @max_time: Maximum time this wakeup source has been continuously active.
- * @last_time: Monotonic clock when the wakeup source's was activated last time.
+ * @last_time: Monotonic clock when the wakeup source's was touched last time.
* @event_count: Number of signaled wakeup events.
* @active_count: Number of times the wakeup sorce was activated.
* @relax_count: Number of times the wakeup sorce was deactivated.
- * @hit_count: Number of times the wakeup sorce might abort system suspend.
+ * @expire_count: Number of times the wakeup source's timeout has expired.
+ * @wakeup_count: Number of times the wakeup source might abort suspend.
* @active: Status of the wakeup source.
+ * @has_timeout: The wakeup source has been activated with a timeout.
*/
struct wakeup_source {
const char *name;
@@ -52,8 +54,9 @@ struct wakeup_source {
unsigned long event_count;
unsigned long active_count;
unsigned long relax_count;
- unsigned long hit_count;
- unsigned int active:1;
+ unsigned long expire_count;
+ unsigned long wakeup_count;
+ bool active:1;
};

#ifdef CONFIG_PM_SLEEP
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -21,7 +21,7 @@
* If set, the suspend/hibernate code will abort transitions to a sleep state
* if wakeup events are registered during or immediately before the transition.
*/
-bool events_check_enabled;
+bool events_check_enabled __read_mostly;

/*
* Combined counters of registered wakeup events and wakeup events in progress.
@@ -383,6 +383,21 @@ static void wakeup_source_activate(struc
}

/**
+ * wakeup_source_report_event - Report wakeup event using the given source.
+ * @ws: Wakeup source to report the event for.
+ */
+static void wakeup_source_report_event(struct wakeup_source *ws)
+{
+ ws->event_count++;
+ /* This is racy, but the counter is approximate anyway. */
+ if (events_check_enabled)
+ ws->wakeup_count++;
+
+ if (!ws->active)
+ wakeup_source_activate(ws);
+}
+
+/**
* __pm_stay_awake - Notify the PM core of a wakeup event.
* @ws: Wakeup source object associated with the source of the event.
*
@@ -397,10 +412,7 @@ void __pm_stay_awake(struct wakeup_sourc

spin_lock_irqsave(&ws->lock, flags);

- ws->event_count++;
- if (!ws->active)
- wakeup_source_activate(ws);
-
+ wakeup_source_report_event(ws);
del_timer(&ws->timer);
ws->timer_expires = 0;

@@ -469,6 +481,7 @@ static void wakeup_source_deactivate(str
if (ktime_to_ns(duration) > ktime_to_ns(ws->max_time))
ws->max_time = duration;

+ ws->last_time = now;
del_timer(&ws->timer);
ws->timer_expires = 0;

@@ -541,8 +554,10 @@ static void pm_wakeup_timer_fn(unsigned
spin_lock_irqsave(&ws->lock, flags);

if (ws->active && ws->timer_expires
- && time_after_eq(jiffies, ws->timer_expires))
+ && time_after_eq(jiffies, ws->timer_expires)) {
wakeup_source_deactivate(ws);
+ ws->expire_count++;
+ }

spin_unlock_irqrestore(&ws->lock, flags);
}
@@ -569,9 +584,7 @@ void __pm_wakeup_event(struct wakeup_sou

spin_lock_irqsave(&ws->lock, flags);

- ws->event_count++;
- if (!ws->active)
- wakeup_source_activate(ws);
+ wakeup_source_report_event(ws);

if (!msec) {
wakeup_source_deactivate(ws);
@@ -614,24 +627,6 @@ void pm_wakeup_event(struct device *dev,
EXPORT_SYMBOL_GPL(pm_wakeup_event);

/**
- * pm_wakeup_update_hit_counts - Update hit counts of all active wakeup sources.
- */
-static void pm_wakeup_update_hit_counts(void)
-{
- unsigned long flags;
- struct wakeup_source *ws;
-
- rcu_read_lock();
- list_for_each_entry_rcu(ws, &wakeup_sources, entry) {
- spin_lock_irqsave(&ws->lock, flags);
- if (ws->active)
- ws->hit_count++;
- spin_unlock_irqrestore(&ws->lock, flags);
- }
- rcu_read_unlock();
-}
-
-/**
* pm_wakeup_pending - Check if power transition in progress should be aborted.
*
* Compare the current number of registered wakeup events with its preserved
@@ -653,8 +648,6 @@ bool pm_wakeup_pending(void)
events_check_enabled = !ret;
}
spin_unlock_irqrestore(&events_lock, flags);
- if (ret)
- pm_wakeup_update_hit_counts();
return ret;
}

@@ -680,7 +673,6 @@ bool pm_get_wakeup_count(unsigned int *c
split_counters(&cnt, &inpr);
if (inpr == 0 || signal_pending(current))
break;
- pm_wakeup_update_hit_counts();

schedule();
}
@@ -713,8 +705,6 @@ bool pm_save_wakeup_count(unsigned int c
events_check_enabled = true;
}
spin_unlock_irq(&events_lock);
- if (!events_check_enabled)
- pm_wakeup_update_hit_counts();
return events_check_enabled;
}

@@ -749,9 +739,10 @@ static int print_wakeup_source_stats(str
active_time = ktime_set(0, 0);
}

- ret = seq_printf(m, "%-12s\t%lu\t\t%lu\t\t%lu\t\t"
+ ret = seq_printf(m, "%-12s\t%lu\t\t%lu\t\t%lu\t\t%lu\t\t"
"%lld\t\t%lld\t\t%lld\t\t%lld\n",
- ws->name, active_count, ws->event_count, ws->hit_count,
+ ws->name, active_count, ws->event_count,
+ ws->wakeup_count, ws->expire_count,
ktime_to_ms(active_time), ktime_to_ms(total_time),
ktime_to_ms(max_time), ktime_to_ms(ws->last_time));

@@ -768,8 +759,9 @@ static int wakeup_sources_stats_show(str
{
struct wakeup_source *ws;

- seq_puts(m, "name\t\tactive_count\tevent_count\thit_count\t"
- "active_since\ttotal_time\tmax_time\tlast_change\n");
+ seq_puts(m, "name\t\tactive_count\tevent_count\twakeup_count\t"
+ "expire_count\tactive_since\ttotal_time\tmax_time\t"
+ "last_change\n");

rcu_read_lock();
list_for_each_entry_rcu(ws, &wakeup_sources, entry)
Index: linux/drivers/base/power/sysfs.c
===================================================================
--- linux.orig/drivers/base/power/sysfs.c
+++ linux/drivers/base/power/sysfs.c
@@ -288,22 +288,41 @@ static ssize_t wakeup_active_count_show(

static DEVICE_ATTR(wakeup_active_count, 0444, wakeup_active_count_show, NULL);

-static ssize_t wakeup_hit_count_show(struct device *dev,
- struct device_attribute *attr, char *buf)
+static ssize_t wakeup_abort_count_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ unsigned long count = 0;
+ bool enabled = false;
+
+ spin_lock_irq(&dev->power.lock);
+ if (dev->power.wakeup) {
+ count = dev->power.wakeup->wakeup_count;
+ enabled = true;
+ }
+ spin_unlock_irq(&dev->power.lock);
+ return enabled ? sprintf(buf, "%lu\n", count) : sprintf(buf, "\n");
+}
+
+static DEVICE_ATTR(wakeup_abort_count, 0444, wakeup_abort_count_show, NULL);
+
+static ssize_t wakeup_expire_count_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
{
unsigned long count = 0;
bool enabled = false;

spin_lock_irq(&dev->power.lock);
if (dev->power.wakeup) {
- count = dev->power.wakeup->hit_count;
+ count = dev->power.wakeup->expire_count;
enabled = true;
}
spin_unlock_irq(&dev->power.lock);
return enabled ? sprintf(buf, "%lu\n", count) : sprintf(buf, "\n");
}

-static DEVICE_ATTR(wakeup_hit_count, 0444, wakeup_hit_count_show, NULL);
+static DEVICE_ATTR(wakeup_expire_count, 0444, wakeup_expire_count_show, NULL);

static ssize_t wakeup_active_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -460,7 +479,8 @@ static struct attribute *wakeup_attrs[]
&dev_attr_wakeup.attr,
&dev_attr_wakeup_count.attr,
&dev_attr_wakeup_active_count.attr,
- &dev_attr_wakeup_hit_count.attr,
+ &dev_attr_wakeup_abort_count.attr,
+ &dev_attr_wakeup_expire_count.attr,
&dev_attr_wakeup_active.attr,
&dev_attr_wakeup_total_time_ms.attr,
&dev_attr_wakeup_max_time_ms.attr,
Index: linux/Documentation/ABI/testing/sysfs-devices-power
===================================================================
--- linux.orig/Documentation/ABI/testing/sysfs-devices-power
+++ linux/Documentation/ABI/testing/sysfs-devices-power
@@ -96,16 +96,26 @@ Description:
is read-only. If the device is not enabled to wake up the
system from sleep states, this attribute is not present.

-What: /sys/devices/.../power/wakeup_hit_count
-Date: September 2010
+What: /sys/devices/.../power/wakeup_abort_count
+Date: February 2012
Contact: Rafael J. Wysocki <r...@sisk.pl>
Description:
- The /sys/devices/.../wakeup_hit_count attribute contains the
+ The /sys/devices/.../wakeup_abort_count attribute contains the
number of times the processing of a wakeup event associated with
- the device might prevent the system from entering a sleep state.
- This attribute is read-only. If the device is not enabled to
- wake up the system from sleep states, this attribute is not
- present.
+ the device might have aborted system transition into a sleep
+ state in progress. This attribute is read-only. If the device
+ is not enabled to wake up the system from sleep states, this
+ attribute is not present.
+
+What: /sys/devices/.../power/wakeup_expire_count
+Date: February 2012
+Contact: Rafael J. Wysocki <r...@sisk.pl>
+Description:
+ The /sys/devices/.../wakeup_expire_count attribute contains the
+ number of times a wakeup event associated with the device has
+ been reported with a timeout that expired. This attribute is
+ read-only. If the device is not enabled to wake up the system
+ from sleep states, this attribute is not present.

What: /sys/devices/.../power/wakeup_active
Date: September 2010

Rafael J. Wysocki

Feb 21, 2012, 6:35:52 PM2/21/12
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
From: Rafael J. Wysocki <r...@sisk.pl>

Android uses one wakelock statistic that is only necessary for
opportunistic sleep. Namely, the prevent_suspend_time field
accumulates the total time the given wakelock has been locked
while "automatic suspend" was enabled. Add an analogous field,
prevent_sleep_time, to wakeup sources and make it behave in a similar
way.
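
The accounting rule can be worked through in a userspace model. This is a sketch under assumptions: plain millisecond integers replace ktime_t, the caller passes the current time explicitly, and struct ws_time and the model_* names are hypothetical.

```c
#include <stdbool.h>

typedef long long model_ms;	/* milliseconds, in place of ktime_t */

struct ws_time {
	model_ms start_prevent_time;
	model_ms prevent_sleep_time;
	bool active;
	bool autosleep_enabled;
};

/* Add the interval since start_prevent_time to the running total. */
static void model_update_prevent(struct ws_time *ws, model_ms now)
{
	ws->prevent_sleep_time += now - ws->start_prevent_time;
}

/* Activation starts the interval only while autosleep is enabled. */
static void model_activate(struct ws_time *ws, model_ms now)
{
	ws->active = true;
	if (ws->autosleep_enabled)
		ws->start_prevent_time = now;
}

/* Deactivation closes the interval only while autosleep is enabled. */
static void model_deactivate(struct ws_time *ws, model_ms now)
{
	ws->active = false;
	if (ws->autosleep_enabled)
		model_update_prevent(ws, now);
}

/* Toggling autosleep opens or closes the interval of an active source. */
static void model_set_autosleep(struct ws_time *ws, bool set, model_ms now)
{
	if (ws->autosleep_enabled == set)
		return;
	ws->autosleep_enabled = set;
	if (ws->active) {
		if (set)
			ws->start_prevent_time = now;
		else
			model_update_prevent(ws, now);
	}
}
```

For example, a source activated at 100 ms, with autosleep enabled at 150 ms and the source relaxed at 400 ms, accumulates 250 ms; time spent active while autosleep is off is not counted.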

Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
Documentation/ABI/testing/sysfs-devices-power | 11 ++++
drivers/base/power/sysfs.c | 24 ++++++++++
drivers/base/power/wakeup.c | 61 ++++++++++++++++++++++++--
include/linux/pm_wakeup.h | 4 +
include/linux/suspend.h | 1
kernel/power/autosleep.c | 2
6 files changed, 99 insertions(+), 4 deletions(-)

Index: linux/include/linux/pm_wakeup.h
===================================================================
--- linux.orig/include/linux/pm_wakeup.h
+++ linux/include/linux/pm_wakeup.h
@@ -34,6 +34,7 @@
* @total_time: Total time this wakeup source has been active.
* @max_time: Maximum time this wakeup source has been continuously active.
* @last_time: Monotonic clock when the wakeup source's was touched last time.
+ * @prevent_sleep_time: Total time this source has been preventing autosleep.
* @event_count: Number of signaled wakeup events.
* @active_count: Number of times the wakeup sorce was activated.
* @relax_count: Number of times the wakeup sorce was deactivated.
@@ -51,12 +52,15 @@ struct wakeup_source {
ktime_t total_time;
ktime_t max_time;
ktime_t last_time;
+ ktime_t start_prevent_time;
+ ktime_t prevent_sleep_time;
unsigned long event_count;
unsigned long active_count;
unsigned long relax_count;
unsigned long expire_count;
unsigned long wakeup_count;
bool active:1;
+ bool autosleep_enabled:1;
};

#ifdef CONFIG_PM_SLEEP
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -377,6 +377,8 @@ static void wakeup_source_activate(struc
ws->active = true;
ws->active_count++;
ws->last_time = ktime_get();
+ if (ws->autosleep_enabled)
+ ws->start_prevent_time = ws->last_time;

/* Increment the counter of events in progress. */
atomic_inc(&combined_event_count);
@@ -444,6 +446,17 @@ void pm_stay_awake(struct device *dev)
}
EXPORT_SYMBOL_GPL(pm_stay_awake);

+#ifdef CONFIG_PM_AUTOSLEEP
+static void update_prevent_sleep_time(struct wakeup_source *ws, ktime_t now)
+{
+ ktime_t delta = ktime_sub(now, ws->start_prevent_time);
+ ws->prevent_sleep_time = ktime_add(ws->prevent_sleep_time, delta);
+}
+#else
+static inline void update_prevent_sleep_time(struct wakeup_source *ws,
+ ktime_t now) {}
+#endif
+
/**
* wakup_source_deactivate - Mark given wakeup source as inactive.
* @ws: Wakeup source to handle.
@@ -485,6 +498,9 @@ static void wakeup_source_deactivate(str
del_timer(&ws->timer);
ws->timer_expires = 0;

+ if (ws->autosleep_enabled)
+ update_prevent_sleep_time(ws, now);
+
/*
* Increment the counter of registered wakeup events and decrement the
* couter of wakeup events in progress simultaneously.
@@ -714,6 +730,34 @@ bool pm_save_wakeup_count(unsigned int c
return events_check_enabled;
}

+#ifdef CONFIG_PM_AUTOSLEEP
+/**
+ * pm_wakep_autosleep_enabled - Modify autosleep_enabled for all wakeup sources.
+ * @set: Whether to set or to clear the autosleep_enabled flags.
+ */
+void pm_wakep_autosleep_enabled(bool set)
+{
+ struct wakeup_source *ws;
+ ktime_t now = ktime_get();
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(ws, &wakeup_sources, entry) {
+ spin_lock_irq(&ws->lock);
+ if (ws->autosleep_enabled != set) {
+ ws->autosleep_enabled = set;
+ if (ws->active) {
+ if (set)
+ ws->start_prevent_time = now;
+ else
+ update_prevent_sleep_time(ws, now);
+ }
+ }
+ spin_unlock_irq(&ws->lock);
+ }
+ rcu_read_unlock();
+}
+#endif /* CONFIG_PM_AUTOSLEEP */
+
static struct dentry *wakeup_sources_stats_dentry;

/**
@@ -729,28 +773,37 @@ static int print_wakeup_source_stats(str
ktime_t max_time;
unsigned long active_count;
ktime_t active_time;
+ ktime_t prevent_sleep_time;
int ret;

spin_lock_irqsave(&ws->lock, flags);

total_time = ws->total_time;
max_time = ws->max_time;
+ prevent_sleep_time = ws->prevent_sleep_time;
active_count = ws->active_count;
if (ws->active) {
- active_time = ktime_sub(ktime_get(), ws->last_time);
+ ktime_t now = ktime_get();
+
+ active_time = ktime_sub(now, ws->last_time);
total_time = ktime_add(total_time, active_time);
if (active_time.tv64 > max_time.tv64)
max_time = active_time;
+
+ if (ws->autosleep_enabled)
+ prevent_sleep_time = ktime_add(prevent_sleep_time,
+ ktime_sub(now, ws->start_prevent_time));
} else {
active_time = ktime_set(0, 0);
}

ret = seq_printf(m, "%-12s\t%lu\t\t%lu\t\t%lu\t\t%lu\t\t"
- "%lld\t\t%lld\t\t%lld\t\t%lld\n",
+ "%lld\t\t%lld\t\t%lld\t\t%lld\t\t%lld\n",
ws->name, active_count, ws->event_count,
ws->wakeup_count, ws->expire_count,
ktime_to_ms(active_time), ktime_to_ms(total_time),
- ktime_to_ms(max_time), ktime_to_ms(ws->last_time));
+ ktime_to_ms(max_time), ktime_to_ms(ws->last_time),
+ ktime_to_ms(prevent_sleep_time));

spin_unlock_irqrestore(&ws->lock, flags);

@@ -767,7 +820,7 @@ static int wakeup_sources_stats_show(str

seq_puts(m, "name\t\tactive_count\tevent_count\twakeup_count\t"
"expire_count\tactive_since\ttotal_time\tmax_time\t"
- "last_change\n");
+ "last_change\tprevent_suspend_time\n");

rcu_read_lock();
list_for_each_entry_rcu(ws, &wakeup_sources, entry)
Index: linux/include/linux/suspend.h
===================================================================
--- linux.orig/include/linux/suspend.h
+++ linux/include/linux/suspend.h
@@ -358,6 +358,7 @@ extern bool events_check_enabled;
extern bool pm_wakeup_pending(void);
extern bool pm_get_wakeup_count(unsigned int *count, bool block);
extern bool pm_save_wakeup_count(unsigned int count);
+extern void pm_wakep_autosleep_enabled(bool set);

static inline void lock_system_sleep(void)
{
Index: linux/kernel/power/autosleep.c
===================================================================
--- linux.orig/kernel/power/autosleep.c
+++ linux/kernel/power/autosleep.c
@@ -73,8 +73,10 @@ int pm_autosleep_set_state(suspend_state
mutex_lock(&autosleep_lock);
if (state == PM_SUSPEND_ON && autosleep_state != PM_SUSPEND_ON) {
autosleep_state = PM_SUSPEND_ON;
+ pm_wakep_autosleep_enabled(false);
} else if (state > PM_SUSPEND_ON) {
autosleep_state = state;
+ pm_wakep_autosleep_enabled(true);
queue_up_suspend_work();
}
mutex_unlock(&autosleep_lock);
Index: linux/drivers/base/power/sysfs.c
===================================================================
--- linux.orig/drivers/base/power/sysfs.c
+++ linux/drivers/base/power/sysfs.c
@@ -391,6 +391,27 @@ static ssize_t wakeup_last_time_show(str
}

static DEVICE_ATTR(wakeup_last_time_ms, 0444, wakeup_last_time_show, NULL);
+
+#ifdef CONFIG_PM_AUTOSLEEP
+static ssize_t wakeup_prevent_sleep_time_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ s64 msec = 0;
+ bool enabled = false;
+
+ spin_lock_irq(&dev->power.lock);
+ if (dev->power.wakeup) {
+ msec = ktime_to_ms(dev->power.wakeup->prevent_sleep_time);
+ enabled = true;
+ }
+ spin_unlock_irq(&dev->power.lock);
+ return enabled ? sprintf(buf, "%lld\n", msec) : sprintf(buf, "\n");
+}
+
+static DEVICE_ATTR(wakeup_prevent_sleep_time_ms, 0444,
+ wakeup_prevent_sleep_time_show, NULL);
+#endif /* CONFIG_PM_AUTOSLEEP */
#endif /* CONFIG_PM_SLEEP */

#ifdef CONFIG_PM_ADVANCED_DEBUG
@@ -485,6 +506,9 @@ static struct attribute *wakeup_attrs[]
&dev_attr_wakeup_total_time_ms.attr,
&dev_attr_wakeup_max_time_ms.attr,
&dev_attr_wakeup_last_time_ms.attr,
+#ifdef CONFIG_PM_AUTOSLEEP
+ &dev_attr_wakeup_prevent_sleep_time_ms.attr,
+#endif
#endif
NULL,
};
Index: linux/Documentation/ABI/testing/sysfs-devices-power
===================================================================
--- linux.orig/Documentation/ABI/testing/sysfs-devices-power
+++ linux/Documentation/ABI/testing/sysfs-devices-power
@@ -158,6 +158,17 @@ Description:
not enabled to wake up the system from sleep states, this
attribute is not present.

+What: /sys/devices/.../power/wakeup_prevent_sleep_time_ms
+Date: February 2012
+Contact: Rafael J. Wysocki <r...@sisk.pl>
+Description:
+ The /sys/devices/.../wakeup_prevent_sleep_time_ms attribute
+ contains the total time the device has been preventing
+ opportunistic transitions to sleep states from occurring.
+ This attribute is read-only. If the device is not enabled to
+ wake up the system from sleep states, this attribute is not
+ present.
+
What: /sys/devices/.../power/autosuspend_delay_ms
Date: September 2010
Contact: Alan Stern <st...@rowland.harvard.edu>

Rafael J. Wysocki

Feb 21, 2012, 6:36:00 PM2/21/12
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
From: Rafael J. Wysocki <r...@sisk.pl>

Introduce a mechanism by which the kernel can trigger global
transitions to a sleep state chosen by user space if there are no
active wakeup sources.

It consists of a new sysfs attribute, /sys/power/autosleep, that
can be written one of the strings returned by reads from
/sys/power/state, an ordered workqueue and a work item carrying out
the "suspend" operations. If a string representing the system's
sleep state is written to /sys/power/autosleep, the work item
triggering transitions to that state is queued up and it requeues
itself after every execution until user space writes "off" to
/sys/power/autosleep.
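
From user space, the interface described above comes down to writing one string to a file. The following is a minimal sketch of that, not part of the patch: autosleep_set() is a hypothetical helper name, and the path is passed in as a parameter (on a kernel with this patch applied it would be /sys/power/autosleep).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write a state string ("mem", "disk", "off", ...)
 * to an autosleep-style file.  Returns 0 or a negative errno value.
 */
static int autosleep_set(const char *path, const char *state)
{
	FILE *f;
	int ret = 0;

	assert(path && state);

	f = fopen(path, "w");
	if (!f)
		return -errno;

	if (fputs(state, f) == EOF)
		ret = -EIO;

	/* The data reaches the file when the stdio buffer is flushed. */
	if (fclose(f) == EOF && !ret)
		ret = errno ? -errno : -EIO;

	return ret;
}
```

With this sketch, autosleep_set("/sys/power/autosleep", "mem") would be expected to queue up the work item and autosleep_set("/sys/power/autosleep", "off") to stop the requeuing, assuming the kernel exposes the attribute as described.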

That work item enables the detection of wakeup events using the
functions already defined in drivers/base/power/wakeup.c (with one
small modification) and calls either pm_suspend() or hibernate() to
put the system into a sleep state. If a wakeup event is reported
while the transition is in progress, it will abort the transition and
the "system suspend" work item will be queued up again.
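
For comparison, the sequence the work item follows (get the wakeup count, save it, then try to suspend) mirrors what a user space power manager can already do by hand with /sys/power/wakeup_count and /sys/power/state. The sketch below only illustrates that sequence and is not code from this patch: write_str() and try_suspend_once() are hypothetical names, and the paths are parameters so that it can be exercised against ordinary files.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Write a string to a file; returns 0 or a negative errno value. */
static int write_str(const char *path, const char *s)
{
	FILE *f = fopen(path, "w");
	int ret = 0;

	if (!f)
		return -errno;
	if (fputs(s, f) == EOF)
		ret = -EIO;
	/* A rejected sysfs write shows up when the buffer is flushed. */
	if (fclose(f) == EOF && !ret)
		ret = errno ? -errno : -EIO;
	return ret;
}

/* One suspend attempt: read the count, write it back, request the state. */
static int try_suspend_once(const char *count_path, const char *state_path,
			    const char *state)
{
	char count[32];
	FILE *f;
	int ret;

	assert(count_path && state_path && state);

	f = fopen(count_path, "r");
	if (!f)
		return -errno;
	/* On the real file this read waits for events in progress. */
	if (!fgets(count, sizeof(count), f)) {
		fclose(f);
		return -EIO;
	}
	fclose(f);

	/* Expected to fail if wakeup events arrived since the read. */
	ret = write_str(count_path, count);
	if (ret)
		return ret;

	return write_str(state_path, state);
}
```

Note that with this patch applied, writes to wakeup_count and state return -EBUSY while autosleep is enabled, so a loop built on this sketch and the in-kernel work item are alternatives rather than something to run together.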

Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
Documentation/ABI/testing/sysfs-power | 17 +++++
drivers/base/power/wakeup.c | 38 ++++++-----
include/linux/suspend.h | 13 +++-
kernel/power/Kconfig | 8 ++
kernel/power/Makefile | 1
kernel/power/autosleep.c | 98 ++++++++++++++++++++++++++++++
kernel/power/main.c | 108 ++++++++++++++++++++++++++++------
kernel/power/power.h | 18 +++++
8 files changed, 266 insertions(+), 35 deletions(-)

Index: linux/kernel/power/Makefile
===================================================================
--- linux.orig/kernel/power/Makefile
+++ linux/kernel/power/Makefile
@@ -9,5 +9,6 @@ obj-$(CONFIG_SUSPEND) += suspend.o
obj-$(CONFIG_PM_TEST_SUSPEND) += suspend_test.o
obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o \
block_io.o
+obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o

obj-$(CONFIG_MAGIC_SYSRQ) += poweroff.o
Index: linux/kernel/power/Kconfig
===================================================================
--- linux.orig/kernel/power/Kconfig
+++ linux/kernel/power/Kconfig
@@ -103,6 +103,14 @@ config PM_SLEEP_SMP
select HOTPLUG
select HOTPLUG_CPU

+config PM_AUTOSLEEP
+ bool "Opportunistic sleep"
+ depends on PM_SLEEP
+ default n
+ ---help---
+ Allow the kernel to trigger a system transition into a global sleep
+ state automatically whenever there are no active wakeup sources.
+
config PM_RUNTIME
bool "Run-time PM core functionality"
depends on !IA64_HP_SIM
Index: linux/kernel/power/power.h
===================================================================
--- linux.orig/kernel/power/power.h
+++ linux/kernel/power/power.h
@@ -264,3 +264,21 @@ static inline void suspend_thaw_processe
{
}
#endif
+
+#ifdef CONFIG_PM_AUTOSLEEP
+
+/* kernel/power/autosleep.c */
+extern int pm_autosleep_init(void);
+extern void pm_autosleep_lock(void);
+extern void pm_autosleep_unlock(void);
+extern suspend_state_t pm_autosleep_state(void);
+extern int pm_autosleep_set_state(suspend_state_t state);
+
+#else /* !CONFIG_PM_AUTOSLEEP */
+
+static inline int pm_autosleep_init(void) { return 0; }
+static inline void pm_autosleep_lock(void) {}
+static inline void pm_autosleep_unlock(void) {}
+static inline suspend_state_t pm_autosleep_state(void) { return PM_SUSPEND_ON; }
+
+#endif /* !CONFIG_PM_AUTOSLEEP */
Index: linux/include/linux/suspend.h
===================================================================
--- linux.orig/include/linux/suspend.h
+++ linux/include/linux/suspend.h
@@ -356,7 +356,7 @@ extern int unregister_pm_notifier(struct
extern bool events_check_enabled;

extern bool pm_wakeup_pending(void);
-extern bool pm_get_wakeup_count(unsigned int *count);
+extern bool pm_get_wakeup_count(unsigned int *count, bool block);
extern bool pm_save_wakeup_count(unsigned int count);

static inline void lock_system_sleep(void)
@@ -407,6 +407,17 @@ static inline void unlock_system_sleep(v

#endif /* !CONFIG_PM_SLEEP */

+#ifdef CONFIG_PM_AUTOSLEEP
+
+/* kernel/power/autosleep.c */
+void queue_up_suspend_work(void);
+
+#else /* !CONFIG_PM_AUTOSLEEP */
+
+static inline void queue_up_suspend_work(void) {}
+
+#endif /* !CONFIG_PM_AUTOSLEEP */
+
#ifdef CONFIG_ARCH_SAVE_PAGE_KEYS
/*
* The ARCH_SAVE_PAGE_KEYS functions can be used by an architecture
Index: linux/kernel/power/autosleep.c
===================================================================
--- /dev/null
+++ linux/kernel/power/autosleep.c
@@ -0,0 +1,98 @@
+/*
+ * kernel/power/autosleep.c
+ *
+ * Opportunistic sleep support.
+ *
+ * Copyright (C) 2012 Rafael J. Wysocki <r...@sisk.pl>
+ */
+
+#include <linux/device.h>
+#include <linux/mutex.h>
+#include <linux/pm_wakeup.h>
+
+#include "power.h"
+
+static suspend_state_t autosleep_state;
+static struct workqueue_struct *autosleep_wq;
+static DEFINE_MUTEX(autosleep_lock);
+
+static void try_to_suspend(struct work_struct *work)
+{
+ unsigned int initial_count, final_count;
+
+ if (!pm_get_wakeup_count(&initial_count, true))
+ goto out;
+
+ mutex_lock(&autosleep_lock);
+
+ if (!pm_save_wakeup_count(initial_count)) {
+ mutex_unlock(&autosleep_lock);
+ goto out;
+ }
+
+ if (autosleep_state == PM_SUSPEND_ON) {
+ mutex_unlock(&autosleep_lock);
+ return;
+ }
+ if (autosleep_state >= PM_SUSPEND_MAX)
+ hibernate();
+ else
+ pm_suspend(autosleep_state);
+
+ mutex_unlock(&autosleep_lock);
+
+ if (!pm_get_wakeup_count(&final_count, false))
+ goto out;
+
+ if (final_count == initial_count)
+ schedule_timeout(HZ / 2);
+
+ out:
+ queue_up_suspend_work();
+}
+
+static DECLARE_WORK(suspend_work, try_to_suspend);
+
+void queue_up_suspend_work(void)
+{
+ if (!work_pending(&suspend_work) && autosleep_state > PM_SUSPEND_ON)
+ queue_work(autosleep_wq, &suspend_work);
+}
+
+suspend_state_t pm_autosleep_state(void)
+{
+ return autosleep_state;
+}
+
+int pm_autosleep_set_state(suspend_state_t state)
+{
+#ifndef CONFIG_HIBERNATION
+ if (state >= PM_SUSPEND_MAX)
+ return -EINVAL;
+#endif
+ mutex_lock(&autosleep_lock);
+ if (state == PM_SUSPEND_ON && autosleep_state != PM_SUSPEND_ON) {
+ autosleep_state = PM_SUSPEND_ON;
+ } else if (state > PM_SUSPEND_ON) {
+ autosleep_state = state;
+ queue_up_suspend_work();
+ }
+ mutex_unlock(&autosleep_lock);
+ return 0;
+}
+
+void pm_autosleep_lock(void)
+{
+ mutex_lock(&autosleep_lock);
+}
+
+void pm_autosleep_unlock(void)
+{
+ mutex_unlock(&autosleep_lock);
+}
+
+int __init pm_autosleep_init(void)
+{
+ autosleep_wq = alloc_ordered_workqueue("autosleep", 0);
+ return autosleep_wq ? 0 : -ENOMEM;
+}
Index: linux/kernel/power/main.c
===================================================================
--- linux.orig/kernel/power/main.c
+++ linux/kernel/power/main.c
@@ -269,8 +269,7 @@ static ssize_t state_show(struct kobject
return (s - buf);
}

-static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
- const char *buf, size_t n)
+static suspend_state_t decode_state(const char *buf, size_t n)
{
#ifdef CONFIG_SUSPEND
suspend_state_t state = PM_SUSPEND_STANDBY;
@@ -278,27 +277,43 @@ static ssize_t state_store(struct kobjec
#endif
char *p;
int len;
- int error = -EINVAL;

p = memchr(buf, '\n', n);
len = p ? p - buf : n;

- /* First, check if we are requested to hibernate */
- if (len == 4 && !strncmp(buf, "disk", len)) {
- error = hibernate();
- goto Exit;
- }
+ /* Check hibernation first. */
+ if (len == 4 && !strncmp(buf, "disk", len))
+ return PM_SUSPEND_MAX;

#ifdef CONFIG_SUSPEND
- for (s = &pm_states[state]; state < PM_SUSPEND_MAX; s++, state++) {
- if (*s && len == strlen(*s) && !strncmp(buf, *s, len)) {
- error = pm_suspend(state);
- break;
- }
- }
+ for (s = &pm_states[state]; state < PM_SUSPEND_MAX; s++, state++)
+ if (*s && len == strlen(*s) && !strncmp(buf, *s, len))
+ return state;
#endif

- Exit:
+ return PM_SUSPEND_ON;
+}
+
+static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ suspend_state_t state;
+ int error = -EINVAL;
+
+ pm_autosleep_lock();
+ if (pm_autosleep_state() > PM_SUSPEND_ON) {
+ error = -EBUSY;
+ goto out;
+ }
+
+ state = decode_state(buf, n);
+ if (state < PM_SUSPEND_MAX)
+ error = pm_suspend(state);
+ else if (state > PM_SUSPEND_ON)
+ error = hibernate();
+
+ out:
+ pm_autosleep_unlock();
return error ? error : n;
}

@@ -339,7 +354,8 @@ static ssize_t wakeup_count_show(struct
{
unsigned int val;

- return pm_get_wakeup_count(&val) ? sprintf(buf, "%u\n", val) : -EINTR;
+ return pm_get_wakeup_count(&val, true) ?
+ sprintf(buf, "%u\n", val) : -EINTR;
}

static ssize_t wakeup_count_store(struct kobject *kobj,
@@ -347,15 +363,65 @@ static ssize_t wakeup_count_store(struct
const char *buf, size_t n)
{
unsigned int val;
+ int error = -EINVAL;
+
+ pm_autosleep_lock();
+ if (pm_autosleep_state() > PM_SUSPEND_ON) {
+ error = -EBUSY;
+ goto out;
+ }

if (sscanf(buf, "%u", &val) == 1) {
if (pm_save_wakeup_count(val))
return n;
}
- return -EINVAL;
+
+ out:
+ pm_autosleep_unlock();
+ return error;
}

power_attr(wakeup_count);
+
+#ifdef CONFIG_PM_AUTOSLEEP
+static ssize_t autosleep_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ suspend_state_t state = pm_autosleep_state();
+
+ if (state == PM_SUSPEND_ON)
+ return sprintf(buf, "off\n");
+
+#ifdef CONFIG_SUSPEND
+ if (state < PM_SUSPEND_MAX)
+ return sprintf(buf, "%s\n", valid_state(state) ?
+ pm_states[state] : "error");
+#endif
+#ifdef CONFIG_HIBERNATION
+ return sprintf(buf, "disk\n");
+#else
+ return sprintf(buf, "error");
+#endif
+}
+
+static ssize_t autosleep_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ suspend_state_t state = decode_state(buf, n);
+ int error;
+
+ if (state == PM_SUSPEND_ON && strncmp(buf, "off", 3)
+ && strncmp(buf, "off\n", 4))
+ return -EINVAL;
+
+ error = pm_autosleep_set_state(state);
+ return error ? error : n;
+}
+
+power_attr(autosleep);
+#endif /* CONFIG_PM_AUTOSLEEP */
#endif /* CONFIG_PM_SLEEP */

#ifdef CONFIG_PM_TRACE
@@ -409,6 +475,9 @@ static struct attribute * g[] = {
#ifdef CONFIG_PM_SLEEP
&pm_async_attr.attr,
&wakeup_count_attr.attr,
+#ifdef CONFIG_PM_AUTOSLEEP
+ &autosleep_attr.attr,
+#endif
#ifdef CONFIG_PM_DEBUG
&pm_test_attr.attr,
#endif
@@ -444,7 +513,10 @@ static int __init pm_init(void)
power_kobj = kobject_create_and_add("power", NULL);
if (!power_kobj)
return -ENOMEM;
- return sysfs_create_group(power_kobj, &attr_group);
+ error = sysfs_create_group(power_kobj, &attr_group);
+ if (error)
+ return error;
+ return pm_autosleep_init();
}

core_initcall(pm_init);
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -492,8 +492,10 @@ static void wakeup_source_deactivate(str
atomic_add(MAX_IN_PROGRESS, &combined_event_count);

split_counters(&cnt, &inpr);
- if (!inpr && waitqueue_active(&wakeup_count_wait_queue))
+ if (!inpr && waitqueue_active(&wakeup_count_wait_queue)) {
wake_up(&wakeup_count_wait_queue);
+ queue_up_suspend_work();
+ }
}

/**
@@ -654,29 +656,33 @@ bool pm_wakeup_pending(void)
/**
* pm_get_wakeup_count - Read the number of registered wakeup events.
* @count: Address to store the value at.
+ * @block: Whether or not to block.
*
- * Store the number of registered wakeup events at the address in @count. Block
- * if the current number of wakeup events being processed is nonzero.
+ * Store the number of registered wakeup events at the address in @count. If
+ * @block is set, block until the current number of wakeup events being
+ * processed is zero.
*
- * Return 'false' if the wait for the number of wakeup events being processed to
- * drop down to zero has been interrupted by a signal (and the current number
- * of wakeup events being processed is still nonzero). Otherwise return 'true'.
+ * Return 'false' if the current number of wakeup events being processed is
+ * nonzero. Otherwise return 'true'.
*/
-bool pm_get_wakeup_count(unsigned int *count)
+bool pm_get_wakeup_count(unsigned int *count, bool block)
{
unsigned int cnt, inpr;
- DEFINE_WAIT(wait);

- for (;;) {
- prepare_to_wait(&wakeup_count_wait_queue, &wait,
- TASK_INTERRUPTIBLE);
- split_counters(&cnt, &inpr);
- if (inpr == 0 || signal_pending(current))
- break;
+ if (block) {
+ DEFINE_WAIT(wait);

- schedule();
+ for (;;) {
+ prepare_to_wait(&wakeup_count_wait_queue, &wait,
+ TASK_INTERRUPTIBLE);
+ split_counters(&cnt, &inpr);
+ if (inpr == 0 || signal_pending(current))
+ break;
+
+ schedule();
+ }
+ finish_wait(&wakeup_count_wait_queue, &wait);
}
- finish_wait(&wakeup_count_wait_queue, &wait);

split_counters(&cnt, &inpr);
*count = cnt;
Index: linux/Documentation/ABI/testing/sysfs-power
===================================================================
--- linux.orig/Documentation/ABI/testing/sysfs-power
+++ linux/Documentation/ABI/testing/sysfs-power
@@ -172,3 +172,20 @@ Description:

Reading from this file will display the current value, which is
set to 1 MB by default.
+
+What: /sys/power/autosleep
+Date: February 2012
+Contact: Rafael J. Wysocki <r...@sisk.pl>
+Description:
+ The /sys/power/autosleep file can be written one of the strings
+ returned by reads from /sys/power/state. If that happens, a
+ work item attempting to trigger a transition of the system to
+ the sleep state represented by that string is queued up. This
+ attempt will only succeed if there are no active wakeup sources
+ in the system at that time. After every execution, regardless
+ of whether or not the attempt to put the system to sleep has
+ succeeded, the work item requeues itself until user space
+ writes "off" to /sys/power/autosleep.
+
+ Reading from this file causes the last string successfully
+ written to it to be displayed.

Rafael J. Wysocki

Feb 21, 2012, 6:36:16 PM2/21/12
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
From: Rafael J. Wysocki <r...@sisk.pl>

Android allows user space to manipulate wakelocks using two
sysfs files located in /sys/power/, wake_lock and wake_unlock.
Writing a wakelock name and optionally a timeout to the wake_lock
file causes the wakelock whose name was written to be acquired (it
is created first if necessary), optionally with the given timeout.
Writing the name of a wakelock to wake_unlock causes that wakelock
to be released.

Implement an analogous interface for user space using wakeup sources.
Add the /sys/power/wake_lock and /sys/power/wake_unlock files
allowing user space to create, activate and deactivate wakeup
sources, such that writing a name and optionally a timeout to
wake_lock causes the wakeup source of that name to be activated,
optionally with the given timeout. If that wakeup source doesn't
exist, it will be created and then activated. Writing a name to
wake_unlock causes the wakeup source of that name, if there is one,
to be deactivated. Wakeup sources created with the help of
wake_lock that haven't been used for more than 5 minutes are garbage
collected and destroyed. Moreover, there can be only WL_NUMBER_LIMIT
wakeup sources created with the help of wake_lock present at a time.
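
The string format accepted by wake_lock (a name, optionally followed by white space and a timeout in nanoseconds) can be modeled in user space as follows. This is a simplified sketch, not the kernel code: struct wl_request, parse_wake_lock() and the 64-byte name buffer are assumptions made for illustration, and strtoull() stands in for the kernel's kstrtou64(). As in the patch, the timeout is rounded up to whole milliseconds.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NSEC_PER_MSEC 1000000ULL

/* Hypothetical result type for this sketch. */
struct wl_request {
	char name[64];		/* size limit is an assumption */
	uint64_t timeout_ms;	/* 0 means "no timeout" */
};

static int parse_wake_lock(const char *buf, struct wl_request *req)
{
	const char *str = buf;
	uint64_t timeout_ns = 0;
	size_t len;

	assert(buf && req);

	/* The name is everything up to the first white space. */
	while (*str && !isspace((unsigned char)*str))
		str++;

	len = str - buf;
	if (!len || len >= sizeof(req->name))
		return -EINVAL;

	while (isspace((unsigned char)*str))
		str++;

	if (*str) {
		char *end;

		/* Anything after the name has to be a decimal timeout. */
		if (!isdigit((unsigned char)*str))
			return -EINVAL;
		errno = 0;
		timeout_ns = strtoull(str, &end, 10);
		if (errno)
			return -EINVAL;
		while (isspace((unsigned char)*end))
			end++;
		if (*end)
			return -EINVAL;
	}

	memcpy(req->name, buf, len);
	req->name[len] = '\0';
	/* Round up, so a nonzero timeout never turns into 0 ms. */
	req->timeout_ms = timeout_ns / NSEC_PER_MSEC +
			  (timeout_ns % NSEC_PER_MSEC != 0);
	return 0;
}
```

In this model a timeout of 0 ms corresponds to the __pm_stay_awake() branch of pm_wake_lock() and any other value to the __pm_wakeup_event() branch.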

The data type used to track wakeup sources created by user space is
called "struct wakelock" to indicate the origins of this feature.
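
The garbage collection policy for these objects can be sketched in user space as below. This is a model under stated assumptions rather than the kernel code: it uses a plain array ordered from most to least recently used instead of the list_head based LRU list, and struct wl_entry, wl_gc() and GC_IDLE_LIMIT_SEC are names invented for the example. Like wakelocks_gc() in the patch, it walks from the oldest entry, stops at the first one that was used recently and drops only entries that are not active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GC_IDLE_LIMIT_SEC 300	/* 5 minutes, as in the changelog */

/* Hypothetical, simplified stand-in for struct wakelock. */
struct wl_entry {
	const char *name;
	uint64_t last_used_sec;
	bool active;
};

/*
 * lru[0] is the most recently used entry, lru[n - 1] the oldest.
 * Compacts the array in place and returns the new number of entries.
 */
static size_t wl_gc(struct wl_entry *lru, size_t n, uint64_t now_sec)
{
	size_t first_old = n;
	size_t out, i;

	assert(lru || n == 0);

	/* Walk from the oldest end and stop at the first recent entry. */
	while (first_old > 0 &&
	       now_sec - lru[first_old - 1].last_used_sec >= GC_IDLE_LIMIT_SEC)
		first_old--;

	/* Among the old entries, keep only those that are still active. */
	out = first_old;
	for (i = first_old; i < n; i++)
		if (lru[i].active)
			lru[out++] = lru[i];

	return out;
}
```

Stopping at the first recently used entry keeps the scan short, since everything closer to the head of the LRU order has been touched even more recently.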

Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
Documentation/ABI/testing/sysfs-power | 42 ++++++
drivers/base/power/wakeup.c | 1
kernel/power/Kconfig | 8 +
kernel/power/Makefile | 1
kernel/power/main.c | 41 ++++++
kernel/power/power.h | 9 +
kernel/power/wakelock.c | 218 ++++++++++++++++++++++++++++++++++
7 files changed, 320 insertions(+)

Index: linux/kernel/power/main.c
===================================================================
--- linux.orig/kernel/power/main.c
+++ linux/kernel/power/main.c
@@ -422,6 +422,43 @@ static ssize_t autosleep_store(struct ko

power_attr(autosleep);
#endif /* CONFIG_PM_AUTOSLEEP */
+
+#ifdef CONFIG_PM_WAKELOCKS
+static ssize_t wake_lock_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ return pm_show_wakelocks(buf, true);
+}
+
+static ssize_t wake_lock_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ int error = pm_wake_lock(buf);
+ return error ? error : n;
+}
+
+power_attr(wake_lock);
+
+static ssize_t wake_unlock_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ return pm_show_wakelocks(buf, false);
+}
+
+static ssize_t wake_unlock_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ int error = pm_wake_unlock(buf);
+ return error ? error : n;
+}
+
+power_attr(wake_unlock);
+
+#endif /* CONFIG_PM_WAKELOCKS */
#endif /* CONFIG_PM_SLEEP */

#ifdef CONFIG_PM_TRACE
@@ -478,6 +515,10 @@ static struct attribute * g[] = {
#ifdef CONFIG_PM_AUTOSLEEP
&autosleep_attr.attr,
#endif
+#ifdef CONFIG_PM_WAKELOCKS
+ &wake_lock_attr.attr,
+ &wake_unlock_attr.attr,
+#endif
#ifdef CONFIG_PM_DEBUG
&pm_test_attr.attr,
#endif
Index: linux/kernel/power/power.h
===================================================================
--- linux.orig/kernel/power/power.h
+++ linux/kernel/power/power.h
@@ -282,3 +282,12 @@ static inline void pm_autosleep_unlock(v
static inline suspend_state_t pm_autosleep_state(void) { return PM_SUSPEND_ON; }

#endif /* !CONFIG_PM_AUTOSLEEP */
+
+#ifdef CONFIG_PM_WAKELOCKS
+
+/* kernel/power/wakelock.c */
+extern ssize_t pm_show_wakelocks(char *buf, bool show_active);
+extern int pm_wake_lock(const char *buf);
+extern int pm_wake_unlock(const char *buf);
+
+#endif /* CONFIG_PM_WAKELOCKS */
Index: linux/kernel/power/Kconfig
===================================================================
--- linux.orig/kernel/power/Kconfig
+++ linux/kernel/power/Kconfig
@@ -111,6 +111,14 @@ config PM_AUTOSLEEP
Allow the kernel to trigger a system transition into a global sleep
state automatically whenever there are no active wakeup sources.

+config PM_WAKELOCKS
+ bool "User space wakeup sources interface"
+ depends on PM_SLEEP
+ default n
+ ---help---
+ Allow user space to create, activate and deactivate wakeup source
+ objects with the help of a sysfs-based interface.
+
config PM_RUNTIME
bool "Run-time PM core functionality"
depends on !IA64_HP_SIM
Index: linux/kernel/power/wakelock.c
===================================================================
--- /dev/null
+++ linux/kernel/power/wakelock.c
@@ -0,0 +1,218 @@
+/*
+ * kernel/power/wakelock.c
+ *
+ * User space wakeup sources support.
+ *
+ * Copyright (C) 2012 Rafael J. Wysocki <r...@sisk.pl>
+ *
+ * This code is based on the analogous interface allowing user space to
+ * manipulate wakelocks on Android.
+ */
+
+#include <linux/ctype.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/hrtimer.h>
+#include <linux/list.h>
+#include <linux/rbtree.h>
+#include <linux/slab.h>
+
+#define WL_NUMBER_LIMIT 100
+#define WL_GC_COUNT_MAX 100
+#define WL_GC_TIME_SEC 300
+
+static DEFINE_MUTEX(wakelocks_lock);
+
+struct wakelock {
+ char *name;
+ struct rb_node node;
+ struct wakeup_source ws;
+ struct list_head lru;
+};
+
+static struct rb_root wakelocks_tree = RB_ROOT;
+static LIST_HEAD(wakelocks_lru_list);
+static unsigned int number_of_wakelocks;
+static unsigned int wakelocks_gc_count;
+
+ssize_t pm_show_wakelocks(char *buf, bool show_active)
+{
+ struct rb_node *node;
+ struct wakelock *wl;
+ char *str = buf;
+ char *end = buf + PAGE_SIZE;
+
+ mutex_lock(&wakelocks_lock);
+
+ for (node = rb_first(&wakelocks_tree); node; node = rb_next(node)) {
+ bool active;
+
+ wl = rb_entry(node, struct wakelock, node);
+ spin_lock_irq(&wl->ws.lock);
+ active = wl->ws.active;
+ spin_unlock_irq(&wl->ws.lock);
+ if (active == show_active)
+ str += scnprintf(str, end - str, "%s ", wl->name);
+ }
+ str += scnprintf(str, end - str, "\n");
+
+ mutex_unlock(&wakelocks_lock);
+ return (str - buf);
+}
+
+static struct wakelock *wakelock_lookup_add(const char *name, size_t len,
+ bool add_if_not_found)
+{
+ struct rb_node **node = &wakelocks_tree.rb_node;
+ struct rb_node *parent = *node;
+ struct wakelock *wl;
+
+ while (*node) {
+ int diff;
+
+ wl = rb_entry(*node, struct wakelock, node);
+ diff = strncmp(name, wl->name, len);
+ if (diff == 0) {
+ if (wl->name[len])
+ diff = -1;
+ else
+ return wl;
+ }
+ if (diff < 0)
+ node = &(*node)->rb_left;
+ else
+ node = &(*node)->rb_right;
+
+ parent = *node;
+ }
+ if (!add_if_not_found)
+ return ERR_PTR(-EINVAL);
+
+ if (number_of_wakelocks > WL_NUMBER_LIMIT)
+ return ERR_PTR(-ENOSPC);
+
+ /* Not found, we have to add a new one. */
+ wl = kzalloc(sizeof(*wl), GFP_KERNEL);
+ if (!wl)
+ return ERR_PTR(-ENOMEM);
+
+ wl->name = kstrndup(name, len, GFP_KERNEL);
+ if (!wl->name) {
+ kfree(wl);
+ return ERR_PTR(-ENOMEM);
+ }
+ wl->ws.name = wl->name;
+ wakeup_source_add(&wl->ws);
+ rb_link_node(&wl->node, parent, node);
+ rb_insert_color(&wl->node, &wakelocks_tree);
+ list_add(&wl->lru, &wakelocks_lru_list);
+ number_of_wakelocks++;
+ return wl;
+}
+
+int pm_wake_lock(const char *buf)
+{
+ const char *str = buf;
+ struct wakelock *wl;
+ u64 timeout_ns = 0;
+ size_t len;
+ int ret = 0;
+
+ while (*str && !isspace(*str))
+ str++;
+
+ len = str - buf;
+ if (!len)
+ return -EINVAL;
+
+ if (*str && *str != '\n') {
+ /* Find out if there's a valid timeout string appended. */
+ ret = kstrtou64(skip_spaces(str), 10, &timeout_ns);
+ if (ret)
+ return -EINVAL;
+ }
+
+ mutex_lock(&wakelocks_lock);
+
+ wl = wakelock_lookup_add(buf, len, true);
+ if (IS_ERR(wl)) {
+ ret = PTR_ERR(wl);
+ goto out;
+ }
+ if (timeout_ns) {
+ u64 timeout_ms = timeout_ns + NSEC_PER_MSEC - 1;
+
+ do_div(timeout_ms, NSEC_PER_MSEC);
+ __pm_wakeup_event(&wl->ws, timeout_ms);
+ } else {
+ __pm_stay_awake(&wl->ws);
+ }
+
+ list_move(&wl->lru, &wakelocks_lru_list);
+
+ out:
+ mutex_unlock(&wakelocks_lock);
+ return ret;
+}
+
+static void wakelocks_gc(void)
+{
+ struct wakelock *wl, *aux;
+ ktime_t now = ktime_get();
+
+ list_for_each_entry_safe_reverse(wl, aux, &wakelocks_lru_list, lru) {
+ u64 idle_time_ns;
+ bool active;
+
+ spin_lock_irq(&wl->ws.lock);
+ idle_time_ns = ktime_to_ns(ktime_sub(now, wl->ws.last_time));
+ active = wl->ws.active;
+ spin_unlock_irq(&wl->ws.lock);
+
+ if (idle_time_ns < ((u64)WL_GC_TIME_SEC * NSEC_PER_SEC))
+ break;
+
+ if (!active) {
+ wakeup_source_remove(&wl->ws);
+ rb_erase(&wl->node, &wakelocks_tree);
+ list_del(&wl->lru);
+ kfree(wl->name);
+ kfree(wl);
+ number_of_wakelocks--;
+ }
+ }
+ wakelocks_gc_count = 0;
+}
+
+int pm_wake_unlock(const char *buf)
+{
+ struct wakelock *wl;
+ size_t len;
+ int ret = 0;
+
+ len = strlen(buf);
+ if (!len)
+ return -EINVAL;
+
+ if (buf[len-1] == '\n')
+ len--;
+
+ if (!len)
+ return -EINVAL;
+
+ mutex_lock(&wakelocks_lock);
+
+ wl = wakelock_lookup_add(buf, len, false);
+ if (IS_ERR(wl)) {
+ ret = PTR_ERR(wl);
+ goto out;
+ }
+ __pm_relax(&wl->ws);
+ list_move(&wl->lru, &wakelocks_lru_list);
+ if (++wakelocks_gc_count > WL_GC_COUNT_MAX)
+ wakelocks_gc();
+
+ out:
+ mutex_unlock(&wakelocks_lock);
+ return ret;
+}
Index: linux/kernel/power/Makefile
===================================================================
--- linux.orig/kernel/power/Makefile
+++ linux/kernel/power/Makefile
@@ -10,5 +10,6 @@ obj-$(CONFIG_PM_TEST_SUSPEND) += suspend
obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o \
block_io.o
obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o
+obj-$(CONFIG_PM_WAKELOCKS) += wakelock.o

obj-$(CONFIG_MAGIC_SYSRQ) += poweroff.o
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -132,6 +132,7 @@ void wakeup_source_add(struct wakeup_sou
spin_lock_init(&ws->lock);
setup_timer(&ws->timer, pm_wakeup_timer_fn, (unsigned long)ws);
ws->active = false;
+ ws->last_time = ktime_get();

spin_lock_irq(&events_lock);
list_add_rcu(&ws->entry, &wakeup_sources);
Index: linux/Documentation/ABI/testing/sysfs-power
===================================================================
--- linux.orig/Documentation/ABI/testing/sysfs-power
+++ linux/Documentation/ABI/testing/sysfs-power
@@ -189,3 +189,45 @@ Description:

Reading from this file causes the last string successfully
written to it to be displayed.
+
+What: /sys/power/wake_lock
+Date: February 2012
+Contact: Rafael J. Wysocki <r...@sisk.pl>
+Description:
+ The /sys/power/wake_lock file allows user space to create
+ wakeup source objects and activate them on demand (if one of
+ those wakeup sources is active, reads from the
+ /sys/power/wakeup_count file block or return false). When a
+ string without white space is written to /sys/power/wake_lock,
+ it will be assumed to represent a wakeup source name. If there
+ is a wakeup source object with that name, it will be activated
+ (unless active already). Otherwise, a new wakeup source object
+ will be registered, assigned the given name and activated.
+ If a string written to /sys/power/wake_lock contains white
+ space, the part of the string preceding the white space will be
+ regarded as a wakeup source name and handled as described above.
+ The other part of the string will be regarded as a timeout (in
+ nanoseconds) such that the wakeup source will be automatically
+ deactivated after it has expired. The timeout, if present, is
+ set regardless of the current state of the wakeup source object
+ in question.
+
+ Reads from this file return a string consisting of the names of
+ wakeup sources created with the help of it that are active at
+ the moment, separated with spaces.
+
+
+What: /sys/power/wake_unlock
+Date: February 2012
+Contact: Rafael J. Wysocki <r...@sisk.pl>
+Description:
+ The /sys/power/wake_unlock file allows user space to deactivate
+ wakeup sources created with the help of /sys/power/wake_lock.
+ When a string is written to /sys/power/wake_unlock, it will be
+ assumed to represent the name of a wakeup source to deactivate.
+ If a wakeup source object of that name exists and is active at
+ the moment, it will be deactivated.
+
+ Reads from this file return a string consisting of the names of
+ wakeup sources created with the help of /sys/power/wake_lock
+ that are inactive at the moment, separated with spaces.

Rafael J. Wysocki

Feb 21, 2012, 6:36:39 PM2/21/12
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
Hi all,

After the feedback so far I've decided to follow up with a refreshed patchset.
The first two patches from the previous one went to linux-pm/linux-next
and I included the recent evdev patch from Arve (with some modifications)
to this patchset for completeness.

On Tuesday, February 07, 2012, Rafael J. Wysocki wrote:
> Hi all,
>
> This series tests the theory that the easiest way to sell a once rejected
> feature is to advertise it under a different name.
>
> Well, there actually are two different features, although they are closely
> related to each other. First, patch [6/8] introduces a feature that allows
> the kernel to trigger system suspend (or more generally a transition into
> a sleep state) whenever there are no active wakeup sources (no, they aren't
> called wakelocks). It is called "autosleep" here, but it was called a few
> different names in the past ("opportunistic suspend" was probably the most
> popular one). Second, patch [8/8] introduces "wake locks" that are,
> essentially, wakeup sources which may be created and manipulated by user
> space. Using them user space may control the autosleep feature introduced
> earlier.
>
> This also is a kind of a proof of concept for the people who wanted me to
> show a kernel-based implementation of automatic suspend, so there you go.
> Please note, however, that it is done so that the user space "wake locks"
> interface is compatible with Android in support of its user space. I don't
> really like this interface, but since the Android's user space seems to rely
> on it, I'm fine with using it as is. YMMV.
>
> Let me say a few words about every patch in the series individually.
>
> [1/8] - This really is a bug fix, so it's v3.4 material. Nobody has stepped
> on this bug so far, but it should be fixed anyway.
>
> [2/8] - This is a freezer cleanup, worth doing anyway IMO, so v3.4 material too.

The above two are in linux-pm/linux-next now. There are a few more fixes
related to wakeup sources in there and the patches below are based on that
branch.

> [3/8] - This is something we can do no problem, although completely optional
> without the autosleep feature. Rather necessary with it, though.

Now [1/7] - Look for wakeup events in later stages of device suspend.

> [4/8] - This kind of reintroduces my original idea of using a wait queue for
> waiting until there are no wakeup events in progress. Alan convinced me that
> it would be better to poll the counter to prevent wakeup_source_deactivate()
> from having to call wake_up_all() occasionally (that may be costly in fast
> paths), but then quite some people told me that the wait queue might be
> better. I think that the polling will make much less sense with autosleep
> and user space "wake locks". Anyway, [4/8] is something we can do without
> those things too.

Now [2/7] - Use wait queue to signal "no wakeup events in progress"

With a couple of improvements suggested by Neil.

> The patches above were given Signed-off-by tags, because I think they make some
> sense regardless of the features introduced by the remaining patches that in
> turn are total RFC.

This time all of the patches are signed-off and include the requisite
documentation changes (hopefully, I haven't forgotten about anything).

> [5/8] - This changes wakeup source statistics so that they are more similar to
> the statistics collected for wakelocks on Android. The file those statistics
> may be read from is still located in debugfs, though (I don't think it
> belongs to proc and its name is different from the analogous Android's file
> name anyway). It could be done without autosleep, but then it would be a bit
> pointless. BTW, this changes interfaces that _in_ _theory_ may be used by
> someone, but I'm not aware of anyone using them. If you are one, I'll be
> pleased to learn about that, so please tell me who you are. :-)

Now [3/7] - Change wakeup source statistics to follow Android.

Rebased and reworked in accordance with Arve's feedback.

[4/7] - Add ioctl to block suspend while event queue is not empty.

Originally posted by Arve as http://marc.info/?l=linux-pm&m=132711288825973&w=4
Reworked and with modified changelog (I wonder what Dmitry thinks about this).

It has some minor problems (for example, in some situations the queue wakeup
source may be activated for events that are not coming from a wakeup device),
but I think it's simple enough, at least for illustration. The ioctls
introduced here will be used by Android user space anyway, although perhaps
under different names, AFAICS.

> [6/8] - Autosleep implementation. I think the changelog explains the idea
> quite well and the code is really nothing special. It doesn't really add
> anything new to the kernel in terms of infrastructure etc., it just uses
> the existing stuff to implement an alternative method of triggering system
> sleep transitions. Note, though, that the interface here is different
> from the Android's one, because Android actually modifies /sys/power/state
> to trigger something called "early suspend" (that is never going to be
> implemented in the "stock" kernel as long as I have any influence on it) and
> we simply can't do that in the mainline.

Now [5/7] - Implement opportunistic sleep

Rebased and simplified (most notably, I've dropped the "main" wakeup source,
since it wasn't really necessary).

> [7/8] - This adds a wakeup source statistic that only makes sense with
> autosleep and (I believe) is analogous to the Android's prevent_suspend_time
> statistics. Nothing really special, but I didn't want
> wakeup_source_activate/deactivate() to take a common lock to avoid
> congestion.

Now [6/7] - Add "prevent autosleep time" statistics to wakeup sources.

Rebased.

> [8/8] - This adds a user space interface to create, activate and deactivate
> wakeup sources. Since the files it consists of are called wake_lock and
> wake_unlock, to follow Android, the objects the wakeup sources are wrapped
> into are called "wakelocks" (for added confusion). Since the interface
> doesn't provide any means to destroy those "wakelocks", I added a garbage
> collection mechanism to get rid of the unused ones, if any. I also thought
> it might be a good idea to put a limit on the number of those things that
> user space can operate simultaneously, so I did that too.

Now [7/7] - Add user space interface for manipulating wakeup sources.

> All of the above has been tested very briefly on my test-bed Mackerel board
> and it quite obviously requires more thorough testing, but first I need to know
> if it makes sense to spend any more time on it.

The above is still accurate, but I also verified that the patches don't break
my PC test boxes (at least as long as the new features aren't used ;-)).

Thanks,
Rafael

Rafael J. Wysocki

Feb 21, 2012, 6:36:49 PM
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
From: Arve Hjønnevåg <ar...@android.com>

Add a new ioctl, EVIOCSWAKEUPSRC, to attach a wakeup source object to
an evdev client event queue, such that it will be active whenever the
queue is not empty. Then, all events in the queue will be regarded
as wakeup events in progress and pm_get_wakeup_count() will block (or
return false if woken up by a signal) until they are removed from the
queue. In consequence, if the checking of wakeup events is enabled
(e.g. through the /sys/power/wakeup_count interface), the system
won't be able to go into a sleep state until the queue is empty.

This allows user space processes to handle situations in which they
want to do a select() on an evdev descriptor, so they go to sleep
until there are some events to read from the device's queue, and then
they don't want the system to go into a sleep state until all the
events are read (presumably for further processing). Of course, if
they don't want the system to go into a sleep state _after_ all the
events have been read from the queue, they have to use a separate
mechanism that will prevent the system from doing that and it has
to be activated before reading the first event (that also may be the
last one).
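
As an illustration of the rule described above (this sketch is not part of
the patch), the following user-space C model keeps a flag, ws_active, in step
with a small ring buffer: the flag is set when a complete packet is queued and
cleared when the last complete packet has been read. All names
(model_client, model_pass_event() and so on) are made up for the example, and
buffer overflow handling is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SIZE 8u /* power of two, so indices can be masked */

struct model_client {
	int buffer[QUEUE_SIZE];
	unsigned int head;        /* next slot to write */
	unsigned int tail;        /* next slot to read */
	unsigned int packet_head; /* end of the last complete packet */
	bool ws_attached;         /* a wakeup source has been attached */
	bool ws_active;           /* stands in for __pm_stay_awake()/__pm_relax() */
};

/* Attach the (modelled) wakeup source; activate it if data is already queued. */
void model_attach(struct model_client *c)
{
	assert(c != NULL);
	c->ws_attached = true;
	if (c->packet_head != c->tail)
		c->ws_active = true;
}

/* Queue one event; end_of_packet plays the role of EV_SYN/SYN_REPORT. */
void model_pass_event(struct model_client *c, int value, bool end_of_packet)
{
	c->buffer[c->head++] = value;
	c->head &= QUEUE_SIZE - 1;

	if (end_of_packet) {
		c->packet_head = c->head;
		if (c->ws_attached)
			c->ws_active = true;
	}
}

/* Read one event; relax the wakeup source once no complete packet is left. */
bool model_fetch_event(struct model_client *c, int *value)
{
	if (c->packet_head == c->tail)
		return false;

	*value = c->buffer[c->tail++];
	c->tail &= QUEUE_SIZE - 1;

	if (c->packet_head == c->tail)
		c->ws_active = false;
	return true;
}
```

In this model the flag stays set between the first and the last read of a
packet, which is the window in which the system is not supposed to suspend.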

[rjw: Removed unnecessary checks, changed the names of the new ioctls
and the names of the functions that add/remove wakeup source objects
to/from evdev clients, modified the changelog.]

Signed-off-by: Arve Hjønnevåg <ar...@android.com>
Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
drivers/input/evdev.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/input.h | 3 ++
2 files changed, 58 insertions(+)

Index: linux/drivers/input/evdev.c
===================================================================
--- linux.orig/drivers/input/evdev.c
+++ linux/drivers/input/evdev.c
@@ -43,6 +43,7 @@ struct evdev_client {
unsigned int tail;
unsigned int packet_head; /* [future] position of the first element of next packet */
spinlock_t buffer_lock; /* protects access to buffer, head and tail */
+ struct wakeup_source *wakeup_source;
struct fasync_struct *fasync;
struct evdev *evdev;
struct list_head node;
@@ -75,10 +76,12 @@ static void evdev_pass_event(struct evde
client->buffer[client->tail].value = 0;

client->packet_head = client->tail;
+ __pm_relax(client->wakeup_source);
}

if (event->type == EV_SYN && event->code == SYN_REPORT) {
client->packet_head = client->head;
+ __pm_stay_awake(client->wakeup_source);
kill_fasync(&client->fasync, SIGIO, POLL_IN);
}

@@ -255,6 +258,8 @@ static int evdev_release(struct inode *i
mutex_unlock(&evdev->mutex);

evdev_detach_client(evdev, client);
+ wakeup_source_unregister(client->wakeup_source);
+
kfree(client);

evdev_close_device(evdev);
@@ -373,6 +378,8 @@ static int evdev_fetch_next_event(struct
if (have_event) {
*event = client->buffer[client->tail++];
client->tail &= client->bufsize - 1;
+ if (client->packet_head == client->tail)
+ __pm_relax(client->wakeup_source);
}

spin_unlock_irq(&client->buffer_lock);
@@ -623,6 +630,45 @@ static int evdev_handle_set_keycode_v2(s
return input_set_keycode(dev, &ke);
}

+static int evdev_attach_wakeup_source(struct evdev *evdev,
+ struct evdev_client *client)
+{
+ struct wakeup_source *ws;
+ char name[28];
+
+ if (client->wakeup_source)
+ return 0;
+
+ snprintf(name, sizeof(name), "%s-%d",
+ dev_name(&evdev->dev), task_tgid_vnr(current));
+
+ ws = wakeup_source_register(name);
+ if (!ws)
+ return -ENOMEM;
+
+ spin_lock_irq(&client->buffer_lock);
+ client->wakeup_source = ws;
+ if (client->packet_head != client->tail)
+ __pm_stay_awake(client->wakeup_source);
+ spin_unlock_irq(&client->buffer_lock);
+ return 0;
+}
+
+static int evdev_detach_wakeup_source(struct evdev *evdev,
+ struct evdev_client *client)
+{
+ struct wakeup_source *ws;
+
+ spin_lock_irq(&client->buffer_lock);
+ ws = client->wakeup_source;
+ client->wakeup_source = NULL;
+ spin_unlock_irq(&client->buffer_lock);
+
+ wakeup_source_unregister(ws);
+
+ return 0;
+}
+
static long evdev_do_ioctl(struct file *file, unsigned int cmd,
void __user *p, int compat_mode)
{
@@ -696,6 +742,15 @@ static long evdev_do_ioctl(struct file *

case EVIOCSKEYCODE_V2:
return evdev_handle_set_keycode_v2(dev, p);
+
+ case EVIOCGWAKEUPSRC:
+ return put_user(!!client->wakeup_source, ip);
+
+ case EVIOCSWAKEUPSRC:
+ if (p)
+ return evdev_attach_wakeup_source(evdev, client);
+ else
+ return evdev_detach_wakeup_source(evdev, client);
}

size = _IOC_SIZE(cmd);
Index: linux/include/linux/input.h
===================================================================
--- linux.orig/include/linux/input.h
+++ linux/include/linux/input.h
@@ -129,6 +129,9 @@ struct input_keymap_entry {

#define EVIOCGRAB _IOW('E', 0x90, int) /* Grab/Release device */

+#define EVIOCGWAKEUPSRC _IOR('E', 0x91, int) /* Check if wakeup handling is enabled */
+#define EVIOCSWAKEUPSRC _IOW('E', 0x91, int) /* Enable/disable wakeup handling */
+
/*
* Device properties and quirks
*/

Rafael J. Wysocki

Feb 21, 2012, 6:37:09 PM
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
From: Rafael J. Wysocki <r...@sisk.pl>

The current wakeup source deactivation code doesn't do anything when
the counter of wakeup events in progress goes down to zero, which
requires pm_get_wakeup_count() to poll that counter periodically.
Although this reduces the average time it takes to deactivate a
wakeup source, it also may lead to a substantial amount of unnecessary
polling if there are extended periods of wakeup activity. Thus it
seems reasonable to use a wait queue for signaling the "no wakeup
events in progress" condition and remove the polling.
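
To illustrate the difference between polling and being woken up, here is a
rough user-space analogue (a sketch under stated assumptions, not the kernel
code): a pthread condition variable stands in for the wait queue and a
mutex-protected counter for the number of wakeup events in progress. The names
event_gate, gate_event_begin(), gate_event_end() and gate_wait_idle() are
invented for the example. Unlike pm_get_wakeup_count(), the waiter here cannot
be interrupted by a signal.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct event_gate {
	pthread_mutex_t lock;
	pthread_cond_t idle;      /* signaled when in_progress drops to zero */
	unsigned int in_progress; /* events currently being processed */
};

#define EVENT_GATE_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

/* An event starts being processed. */
void gate_event_begin(struct event_gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->in_progress++;
	pthread_mutex_unlock(&g->lock);
}

/*
 * An event has been processed. Returns true if this call brought the
 * counter down to zero, in which case all waiters have been woken up.
 */
bool gate_event_end(struct event_gate *g)
{
	bool now_idle;

	pthread_mutex_lock(&g->lock);
	assert(g->in_progress > 0);
	now_idle = (--g->in_progress == 0);
	if (now_idle)
		pthread_cond_broadcast(&g->idle);
	pthread_mutex_unlock(&g->lock);

	return now_idle;
}

/* Sleep, without any periodic timeout, until no events are in progress. */
void gate_wait_idle(struct event_gate *g)
{
	pthread_mutex_lock(&g->lock);
	while (g->in_progress != 0)
		pthread_cond_wait(&g->idle, &g->lock);
	pthread_mutex_unlock(&g->lock);
}
```

The waiter runs only when the counter actually reaches zero, so long periods
of wakeup activity cost it nothing.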

Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
drivers/base/power/wakeup.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -17,8 +17,6 @@

#include "power.h"

-#define TIMEOUT 100
-
/*
* If set, the suspend/hibernate code will abort transitions to a sleep state
* if wakeup events are registered during or immediately before the transition.
@@ -52,6 +50,8 @@ static void pm_wakeup_timer_fn(unsigned

static LIST_HEAD(wakeup_sources);

+static DECLARE_WAIT_QUEUE_HEAD(wakeup_count_wait_queue);
+
/**
* wakeup_source_prepare - Prepare a new wakeup source for initialization.
* @ws: Wakeup source to prepare.
@@ -442,6 +442,7 @@ EXPORT_SYMBOL_GPL(pm_stay_awake);
*/
static void wakeup_source_deactivate(struct wakeup_source *ws)
{
+ unsigned int cnt, inpr;
ktime_t duration;
ktime_t now;

@@ -476,6 +477,10 @@ static void wakeup_source_deactivate(str
* couter of wakeup events in progress simultaneously.
*/
atomic_add(MAX_IN_PROGRESS, &combined_event_count);
+
+ split_counters(&cnt, &inpr);
+ if (!inpr && waitqueue_active(&wakeup_count_wait_queue))
+ wake_up(&wakeup_count_wait_queue);
}

/**
@@ -667,14 +672,19 @@ bool pm_wakeup_pending(void)
bool pm_get_wakeup_count(unsigned int *count)
{
unsigned int cnt, inpr;
+ DEFINE_WAIT(wait);

for (;;) {
+ prepare_to_wait(&wakeup_count_wait_queue, &wait,
+ TASK_INTERRUPTIBLE);
split_counters(&cnt, &inpr);
if (inpr == 0 || signal_pending(current))
break;
pm_wakeup_update_hit_counts();
- schedule_timeout_interruptible(msecs_to_jiffies(TIMEOUT));
+
+ schedule();
}
+ finish_wait(&wakeup_count_wait_queue, &wait);

split_counters(&cnt, &inpr);
*count = cnt;
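
The hunk in wakeup_source_deactivate() relies on one atomic variable holding
two counters. The following user-space sketch models that arithmetic with C11
atomics, assuming the layout I believe wakeup.c uses (the lower half of the
word counts events in progress, the upper half counts registered events). The
model_*() names are hypothetical. Adding MAX_IN_PROGRESS decrements the lower
counter and carries one into the upper counter in a single atomic operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed layout: low half = events in progress, high half = event count. */
#define IN_PROGRESS_BITS (sizeof(unsigned int) * 4)
#define MAX_IN_PROGRESS  ((1u << IN_PROGRESS_BITS) - 1)

atomic_uint combined_event_count;

void model_split_counters(unsigned int *cnt, unsigned int *inpr)
{
	unsigned int comb = atomic_load(&combined_event_count);

	*cnt = comb >> IN_PROGRESS_BITS;
	*inpr = comb & MAX_IN_PROGRESS;
}

/* A wakeup source becomes active: one more event in progress. */
void model_activate(void)
{
	atomic_fetch_add(&combined_event_count, 1u);
}

/*
 * A wakeup source becomes inactive: one fewer event in progress and one
 * more registered event. Returns true if nothing is in progress any more,
 * which is the point where the patch wakes up the wait queue.
 */
bool model_deactivate(void)
{
	unsigned int cnt, inpr;

	model_split_counters(&cnt, &inpr);
	assert(inpr > 0); /* must be paired with an earlier activation */

	atomic_fetch_add(&combined_event_count, MAX_IN_PROGRESS);

	model_split_counters(&cnt, &inpr);
	return inpr == 0;
}
```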

John Stultz

Feb 21, 2012, 11:51:11 PM
to Rafael J. Wysocki, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
On Wed, 2012-02-22 at 00:31 +0100, Rafael J. Wysocki wrote:
> Hi all,
>
> After the feedback so far I've decided to follow up with a refreshed patchset.
> The first two patches from the previous one went to linux-pm/linux-next
> and I included the recent evdev patch from Arve (with some modifications)
> to this patchset for completeness.

Hey Rafael,
Thanks again for posting this! I've started playing around with it in a
kvm environment, and got the following warning after echoing off >
autosleep:
...
PM: resume of devices complete after 185.615 msecs
PM: Finishing wakeup.
Restarting tasks ... done.
PM: Syncing filesystems ... done.
PM: Preparing system for mem sleep
Freezing user space processes ...
Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
bash D ffff880015714010
===============================
[ INFO: suspicious RCU usage. ]
3.3.0-rc3john+ #131 Not tainted
-------------------------------
kernel/sched/core.c:4784 suspicious rcu_dereference_check() usage!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 0
5 locks held by kworker/u:1/10:
#0: (autosleep){.+.+.+}, at: [<ffffffff81066db8>] process_one_work+0x2d8/0x8c0
#1: (suspend_work){+.+.+.}, at: [<ffffffff81066db8>] process_one_work+0x2d8/0x8c0
#2: (autosleep_lock){+.+.+.}, at: [<ffffffff810a2d3d>] try_to_suspend+0x2d/0xe0
#3: (pm_mutex){+.+.+.}, at: [<ffffffff8109b9fc>] pm_suspend+0x8c/0x210
#4: (tasklist_lock){.+.+..}, at: [<ffffffff8109b0f1>] try_to_freeze_tasks+0x2d1/0x400

stack backtrace:
Pid: 10, comm: kworker/u:1 Not tainted 3.3.0-rc3john+ #131
Call Trace:
[<ffffffff81040d82>] ? vprintk+0x242/0x530
[<ffffffff810b0fdb>] lockdep_rcu_suspicious+0xeb/0x100
[<ffffffff81083371>] sched_show_task+0x121/0x180
[<ffffffff8109b1e5>] try_to_freeze_tasks+0x3c5/0x400
[<ffffffff810a2d10>] ? pm_autosleep_set_state+0x80/0x80
[<ffffffff8109b2eb>] freeze_processes+0x3b/0xb0
[<ffffffff8109baad>] pm_suspend+0x13d/0x210
[<ffffffff810a2d5d>] try_to_suspend+0x4d/0xe0
[<ffffffff81066f02>] process_one_work+0x422/0x8c0
[<ffffffff81066db8>] ? process_one_work+0x2d8/0x8c0
[<ffffffff810b063e>] ? put_lock_stats+0xe/0x40
[<ffffffff81067a16>] worker_thread+0x476/0x550
[<ffffffff810675a0>] ? rescuer_thread+0x200/0x200
[<ffffffff810706fe>] kthread+0xae/0xc0
[<ffffffff81af4cb4>] kernel_thread_helper+0x4/0x10
[<ffffffff81af3078>] ? retint_restore_args+0x13/0x13
[<ffffffff81070650>] ? __init_kthread_worker+0x70/0x70
[<ffffffff81af4cb0>] ? gs_change+0x13/0x13
0 1981 1980 0x00020004
ffff880015715d88 0000000000000046 ffff880015715c88 ffffffff8102c22b
ffff880015714010 ffff880015715fd8 ffff880015714010 ffff880015714000
ffff880015715fd8 ffff880015714000 ffff880015c4e3c0 ffff88001342e540
Call Trace:
[<ffffffff8102c22b>] ? kvm_clock_read+0x6b/0x90
[<ffffffff810b1f2d>] ? mark_held_locks+0xad/0x150
[<ffffffff81af10bf>] schedule+0x3f/0x60
[<ffffffff81aef33b>] mutex_lock_nested+0x1cb/0x4c0
[<ffffffff810a2cae>] ? pm_autosleep_set_state+0x1e/0x80
[<ffffffff810a2cae>] ? pm_autosleep_set_state+0x1e/0x80
[<ffffffff810a2cae>] pm_autosleep_set_state+0x1e/0x80
[<ffffffff8109a74b>] autosleep_store+0x3b/0x80
[<ffffffff813856e7>] kobj_attr_store+0x17/0x20
[<ffffffff81200dcc>] sysfs_write_file+0xec/0x170
[<ffffffff8118085f>] vfs_write+0x11f/0x1b0
[<ffffffff811809f4>] sys_write+0x54/0xa0
[<ffffffff81af4e66>] sysenter_dispatch+0x7/0x26
[<ffffffff8139238e>] ? trace_hardirqs_on_thunk+0x3a/0x3f

Restarting tasks ... done.

Srivatsa S. Bhat

Feb 22, 2012, 3:45:46 AM
to John Stultz, Rafael J. Wysocki, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
On 02/22/2012 10:19 AM, John Stultz wrote:

> On Wed, 2012-02-22 at 00:31 +0100, Rafael J. Wysocki wrote:
>> Hi all,
>>
>> After the feedback so far I've decided to follow up with a refreshed patchset.
>> The first two patches from the previous one went to linux-pm/linux-next
>> and I included the recent evdev patch from Arve (with some modifications)
>> to this patchset for completeness.
>
> Hey Rafael,
> Thanks again for posting this! I've started playing around with it in a
> kvm environment, and got the following warning after echoing off >
> autosleep:
> ...
> PM: resume of devices complete after 185.615 msecs
> PM: Finishing wakeup.
> Restarting tasks ... done.
> PM: Syncing filesystems ... done.
> PM: Preparing system for mem sleep
> Freezing user space processes ...
> Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
> bash D ffff880015714010


Ah.. I think I know what is the problem here..

The kernel was freezing userspace processes and meanwhile, you wrote "off"
to autosleep. So, as a result, this userspace process (bash) just now
entered kernel mode. Unfortunately, the autosleep_lock is held for too long,
that is, something like:

acquire autosleep_lock
modify autosleep_state
<============== "A"
pm_suspend or hibernate()

release autosleep_lock

At point marked "A", we should have released the autosleep lock and only then
entered pm_suspend or hibernate(). Since the current code holds the lock and
enters suspend/hibernate, the userspace process that wrote "off" to autosleep
(or even a userspace process that writes to /sys/power/state) will end up
waiting on autosleep_lock, thus failing the freezing operation.

So the solution is to always release the autosleep lock before entering
suspend/hibernation.


Regards,
Srivatsa S. Bhat

Srivatsa S. Bhat

Feb 22, 2012, 3:46:02 AM
to Rafael J. Wysocki, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
We are calling pm_suspend() or hibernate() directly here.
Won't this break build when CONFIG_SUSPEND or CONFIG_HIBERNATION is not set?
CONFIG_PM_AUTOSLEEP depends only on PM_SLEEP which means we could enable
either one of suspend or hibernation and yet come to this point, breaking
the option which was not enabled.

Regards,
Srivatsa S. Bhat

Rafael J. Wysocki

Feb 22, 2012, 5:06:40 PM
to Srivatsa S. Bhat, John Stultz, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
Well, the autosleep lock is intentionally held around suspend/hibernation in
try_to_suspend(), because otherwise it would be possible to trigger automatic
suspend right after user space has disabled it.

I think the solution is to make pm_autosleep_lock() do a _trylock() and
return error code if already locked.

Thanks,
Rafael

Rafael J. Wysocki

Feb 22, 2012, 5:06:59 PM
to Srivatsa S. Bhat, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
On Wednesday, February 22, 2012, Srivatsa S. Bhat wrote:
Both pm_suspend() and hibernate() have appropriate static inline definitions
for !CONFIG_SUSPEND and !CONFIG_HIBERNATION (in suspend.h), as far as I can say.

Thanks,
Rafael

Srivatsa S. Bhat

Feb 23, 2012, 12:36:52 AM
to Rafael J. Wysocki, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
Oh, you are right.. I overlooked that, sorry!

Regards,
Srivatsa S. Bhat

Srivatsa S. Bhat

Feb 23, 2012, 1:26:26 AM
to Rafael J. Wysocki, John Stultz, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
On 02/23/2012 03:40 AM, Rafael J. Wysocki wrote:

Hmm.. I was just wondering if we could avoid holding yet another lock in the
suspend/hibernate path, if possible..


> I think the solution is to make pm_autosleep_lock() do a _trylock() and
> return error code if already locked.
>

... and also do a trylock() in pm_autosleep_set_state() right?.... that is
where John hit the problem..

By the way, I am just curious.. how difficult will this make it for userspace
to disable autosleep? I mean, would a trylock mean that the user has to keep
fighting until he finally gets a chance to disable autosleep?

Regards,
Srivatsa S. Bhat

Rafael J. Wysocki

Feb 23, 2012, 4:22:51 PM
to Srivatsa S. Bhat, John Stultz, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
> ... and also do a trylock() in pm_autosleep_set_state() right?.... that is
> where John hit the problem..
>
> By the way, I am just curious.. how difficult will this make it for userspace
> to disable autosleep? I mean, would a trylock mean that the user has to keep
> fighting until he finally gets a chance to disable autosleep?

That's a good point, so I think it may be a good idea to do
mutex_lock_interruptible() in pm_autosleep_set_state() instead.

Thanks,
Rafael

Rafael J. Wysocki

Feb 23, 2012, 4:29:11 PM
to Srivatsa S. Bhat, John Stultz, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
Now that I think of it, perhaps it's a good idea to just make
pm_autosleep_lock() do mutex_lock_interruptible() _and_ make
pm_autosleep_set_state() use pm_autosleep_lock().

What do you think?

Srivatsa S. Bhat

Feb 23, 2012, 11:44:53 PM
to Rafael J. Wysocki, John Stultz, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
Well, I don't think mutex_lock_interruptible() would help us much..
Consider what would happen, if we use it:

* pm-suspend got initiated as part of autosleep. Acquired autosleep lock.
* Userspace is about to get frozen.
* Now, the user tries to write "off" to autosleep. And hence, he is waiting
for autosleep lock, interruptibly.
* The freezer sent a fake signal to all userspace processes and hence
this process also got interrupted.. it is no longer waiting on autosleep
lock - it got the signal and returned, and got frozen.
(And when the userspace gets thawed later, this process won't have the
autosleep lock - which is a different (but yet another) problem).

So ultimately the only thing we achieved is to ensure that freezing of
userspace goes smoothly. But the user process could not succeed in
disabling autosleep. Of course we can work around that by having the
mutex_lock_interruptible() in a loop and so on, but that gets very
ugly pretty soon.

So, I would suggest the following solution:

We want to achieve 2 things here:
a. A user process trying to write to /sys/power/state or
/sys/power/autosleep should not cause freezing failures.
b. When a user process writes "off" to autosleep, the suspend/hibernate
attempt that is on-going, if any, must be immediately aborted, to give
the user the feeling that his preference has been noticed and respected.

And to achieve this, we note that a user process can write "off" to autosleep
only until the userspace gets frozen. No chance after that.

So, let's do this:
1. Drop the autosleep lock before entering pm-suspend/hibernate.
2. This means, a user process can get hold of this lock and successfully
disable autosleep a moment after we initiated suspend, but before userspace
got frozen fully.
3. So, to respect the user's wish, we add a check immediately after the
freezing of userspace is complete - we check if the user disabled autosleep
and bail out, if he did. Otherwise, we continue and suspend the machine.

IOW, this is like hitting 2 birds with one stone ;-)
We don't hold autosleep lock throughout suspend/hibernate, but still react
instantly when the user disables autosleep. And of course, freezing of tasks
won't fail, ever! :-)
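
The three steps above can be sketched in user-space C as follows. This is only
a model of the proposed ordering under stated assumptions: a pthread mutex
stands in for autosleep_lock, the callback stands in for whatever user space
manages to do before it is frozen, and every name (autosleep_model,
model_try_to_suspend() and so on) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

enum model_state { MODEL_ON = 0, MODEL_MEM };

struct autosleep_model {
	pthread_mutex_t lock;   /* stands in for autosleep_lock */
	enum model_state state; /* stands in for autosleep_state */
};

/* Called in the window between the start of suspend and the end of freezing. */
typedef void (*window_fn)(struct autosleep_model *m);

void model_set_state(struct autosleep_model *m, enum model_state s)
{
	pthread_mutex_lock(&m->lock);
	m->state = s;
	pthread_mutex_unlock(&m->lock);
}

enum model_state model_get_state(struct autosleep_model *m)
{
	enum model_state s;

	pthread_mutex_lock(&m->lock);
	s = m->state;
	pthread_mutex_unlock(&m->lock);
	return s;
}

/* What a write of "off" to the autosleep file would do in this model. */
void model_user_writes_off(struct autosleep_model *m)
{
	model_set_state(m, MODEL_ON);
}

/* Returns true if the (modelled) suspend goes ahead, false if it bails out. */
bool model_try_to_suspend(struct autosleep_model *m, window_fn before_frozen)
{
	assert(m != NULL);

	/* Step 1: read the target state, then drop the lock before suspending. */
	if (model_get_state(m) == MODEL_ON)
		return false;

	/* Step 2: user space still runs and is able to take the lock. */
	if (before_frozen)
		before_frozen(m);

	/* Step 3: tasks are frozen now; recheck and bail out if disabled. */
	return model_get_state(m) != MODEL_ON;
}
```

Since the lock is never held while tasks are being frozen, the writer in step 2
cannot block on it, and its request is honored by the check in step 3.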


Regards,
Srivatsa S. Bhat

Matt Helsley

Feb 24, 2012, 12:17:28 AM
to Rafael J. Wysocki, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
On Wed, Feb 22, 2012 at 12:34:58AM +0100, Rafael J. Wysocki wrote:
> From: Arve Hjønnevåg <ar...@android.com>
>
> Add a new ioctl, EVIOCSWAKEUPSRC, to attach a wakeup source object to
> an evdev client event queue, such that it will be active whenever the
> queue is not empty. Then, all events in the queue will be regarded
> as wakeup events in progress and pm_get_wakeup_count() will block (or
> return false if woken up by a signal) until they are removed from the
> queue. In consequence, if the checking of wakeup events is enabled
> (e.g. through the /sys/power/wakeup_count interface), the system
> won't be able to go into a sleep state until the queue is empty.
>
> This allows user space processes to handle situations in which they
> want to do a select() on an evdev descriptor, so they go to sleep
> until there are some events to read from the device's queue, and then
> they don't want the system to go into a sleep state until all the
> events are read (presumably for further processing). Of course, if
> they don't want the system to go into a sleep state _after_ all the
> events have been read from the queue, they have to use a separate
> mechanism that will prevent the system from doing that and it has
> to be activated before reading the first event (that also may be the
> last one).

I haven't seen this idea mentioned before but I must admit I haven't
been following this thread too closely so apologies (and don't bother
rehashing) if it has:

Could you just add this to epoll so that any fd userspace chooses would be
capable of doing this without introducing potentially eclectic ioctl
interfaces?

struct epoll_event ev;

epfd = epoll_create1(EPOLL_STAY_AWAKE_SET);
ev.data.ptr = foo;
epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);

Which could be useful because you can put one epollfd in another's epoll
set. Or maybe as an EPOLLKEEPAWAKE flag in the event struct sort of like
EPOLLET:

epfd = epoll_create1(0);
ev.events = EPOLLIN|EPOLLKEEPAWAKE;
epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
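
For comparison, a minimal user-space sketch of the second variant.
EPOLL_STAY_AWAKE_SET and EPOLLKEEPAWAKE above are only proposals; the sketch
uses EPOLLWAKEUP instead, a flag that later mainline kernels (3.5 and newer,
as far as I know) provide with a similar purpose. The helper name
watch_fd_keep_awake() is made up, and the flag is silently ignored unless the
caller has CAP_BLOCK_SUSPEND.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Create an epoll instance that watches fd for input and asks the kernel
 * to keep the system awake while an event for fd is pending.
 * Returns the epoll descriptor, or -1 on error.
 */
int watch_fd_keep_awake(int fd)
{
	struct epoll_event ev;
	int epfd;

	assert(fd >= 0);

	epfd = epoll_create1(0);
	if (epfd < 0)
		return -1;

	memset(&ev, 0, sizeof(ev));
	ev.events = EPOLLIN | EPOLLWAKEUP;
	ev.data.fd = fd;

	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
		close(epfd);
		return -1;
	}

	return epfd;
}
```

The caller would then loop on epoll_wait() and read from fd as usual.
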

> + snprintf(name, sizeof(name), "%s-%d",
> + dev_name(&evdev->dev), task_tgid_vnr(current));

This does not look like it will work well with tasks in different pid
namespaces. What should happen, I think, is the wakeup_source should hold a
reference to either the struct pid of current or current itself. Then
when someone reads the file you should get the pid vnr in the reader's
pid namespace. That way, instead of a bogus pid vnr, 0 would show up if
"current" here is not in the reader's pid namespace.

Cheers,
-Matt Helsley

Rafael J. Wysocki

Feb 24, 2012, 6:17:42 PM
to Srivatsa S. Bhat, John Stultz, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
On Friday, February 24, 2012, Srivatsa S. Bhat wrote:
> On 02/24/2012 03:02 AM, Rafael J. Wysocki wrote:
>
> > On Thursday, February 23, 2012, Rafael J. Wysocki wrote:
> >> On Thursday, February 23, 2012, Srivatsa S. Bhat wrote:
> >>> On 02/23/2012 03:40 AM, Rafael J. Wysocki wrote:
[...]
Well, you are essentially proposing to restore the "interface" wakeup source
that was present in the previous version of this patch and that I dropped in
order to simplify the code.

I guess I can do that ...

Thanks,
Rafael

Arve Hjønnevåg

Feb 24, 2012, 11:26:14 PM
to Matt Helsley, Rafael J. Wysocki, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
This is an interesting idea, but I'm not sure how well it would work.

I looked at the epoll code and it looks like it is possible to
activate the wakeup-source from the wait queue function it uses. The
epoll callback will happen without holding evdev client buffer_lock,
so the wakeup-source and buffer state will not always be in sync (this
may be OK, but requires more thought). This callback is also called if
no data was added to the queue we are polling on because another
client has grabbed the input device (is this a bug or intended?).

There is no call into the epoll code when the input queue is emptied, so
we can't deactivate the wakeup-source until epoll_wait is called
again. This should also be workable, but would result in different stats.

It does not look like the normal poll and select interfaces can be
extended the same way (since they remove themselves from the
wait-queue before returning to user-space), so user-space has to be
changed to use epoll even if select or poll would be a better fit.

I don't know how many other drivers this would work for. The input
driver will wake up user-space from the same thread or interrupt
handler that queued the event, but other drivers may defer this to
another thread which makes an epoll wakeup-source insufficient.

...
>> +     snprintf(name, sizeof(name), "%s-%d",
>> +              dev_name(&evdev->dev), task_tgid_vnr(current));
>
> This does not look like it will work well with tasks in different pid
> namespaces. What should happen, I think, is the wakeup_source should hold a
> reference to either the struct pid of current or current itself. Then
> when someone reads the file you should get the pid vnr in the reader's
> pid namespace. That way, instead of a bogus pid vnr, 0 would show up if
> "current" here is not in the reader's pid namespace.
>

The pid here is only used for debugging purposes, and used less than
the dev_name. I don't think tracking pid namespaces is worth the
trouble here, so if this is a real problem we can just drop the pid
from the name for now.

--
Arve Hjønnevåg

Arve Hjønnevåg

Feb 24, 2012, 11:44:15 PM
to Rafael J. Wysocki, Srivatsa S. Bhat, John Stultz, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
If this wakeup source is reported as active whenever user-space has
not requested suspend that would be useful in the stats. It does not
look like your original patch did this however, but you could have a
main wakeup-source that you release when any form of suspend is
requested and activate when turning off auto suspend or returning from
a one-shot suspend operation.

--
Arve Hjønnevåg

Srivatsa S. Bhat

Feb 25, 2012, 2:20:59 PM
to Rafael J. Wysocki, John Stultz, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
Oh is it? I guess I haven't followed this thread very closely...

> I guess I can do that ...
>


Oh by the way, this scheme doesn't solve all problems. It might be effective
in reacting "instantly" to a request by the user to *switch off* autosleep.
But say, when the user wants to switch to suspend instead of hibernate as the
autosleep preference, for example, I don't think it would be as quick in
responding... (I mean, it might do the old operation one more time before
switching to the new one..)

But I guess at this point it might be wiser to say "sigh.. we can do only so
much..." instead of complicating the code too much in an attempt to meet
everybody's expectations :-)

Regards,
Srivatsa S. Bhat

Rafael J. Wysocki

Feb 25, 2012, 3:40:00 PM
to Arve Hjønnevåg, Srivatsa S. Bhat, John Stultz, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
No, it didn't.

> but you could have a
> main wakeup-source that you release when any form of suspend is
> requested and activate when turning off auto suspend or returning from
> a one-shot suspend operation.

I honestly don't think I can do that and handle the /sys/power/wakeup_count
-> /sys/power/state handoff (which is used by OLPC, as we've learnt recently)
sanely at the same time. OTOH, I don't want CONFIG_AUTOSLEEP to disable that
interface entirely, because things like that basically prevent people from
trying alternative features, which is essential to us for "interesting
feedback" reasons.

So, my "main" wakeup source is only going to register the number of times user
space has (successfully) written to /sys/power/autosleep (please have a look
at the updated patch I'm going to send in a reply to Srivatsa in a little
while).

Thanks,
Rafael

Rafael J. Wysocki

Feb 25, 2012, 3:57:20 PM
to Srivatsa S. Bhat, John Stultz, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
I think we can do something like in the updated patch [5/7] below.

It uses a special wakeup source object called "autosleep" to bump up the
number of wakeup events in progress before acquiring autosleep_lock in
pm_autosleep_set_state(). This way, either pm_autosleep_set_state() will
acquire autosleep_lock before try_to_suspend(), in which case the latter
will see the change of autosleep_state immediately (after autosleep_lock has
been passed to it), or try_to_suspend() will get it first, but then
pm_save_wakeup_count() or pm_suspend()/hibernate() will see the nonzero counter
of wakeup events in progress and return error code (sooner or later).

The drawback is that writes to /sys/power/autosleep may interfere with
the /sys/power/wakeup_count + /sys/power/state interface by interrupting
transitions started by writing to /sys/power/state, for example (although
I think that's highly unlikely).

Additionally, I made pm_autosleep_lock() use mutex_lock_interruptible()
to prevent operations on /sys/power/wakeup_count and/or /sys/power/state
from failing the freezing of tasks started by try_to_suspend().

Thanks,
Rafael

---
From: Rafael J. Wysocki <r...@sisk.pl>
Subject: PM / Sleep: Implement opportunistic sleep

Introduce a mechanism by which the kernel can trigger global
transitions to a sleep state chosen by user space if there are no
active wakeup sources.

It consists of a new sysfs attribute, /sys/power/autosleep, that
can be written one of the strings returned by reads from
/sys/power/state, an ordered workqueue and a work item carrying out
the "suspend" operations. If a string representing the system's
sleep state is written to /sys/power/autosleep, the work item
triggering transitions to that state is queued up and it requeues
itself after every execution until user space writes "off" to
/sys/power/autosleep.

That work item enables the detection of wakeup events using the
functions already defined in drivers/base/power/wakeup.c (with one
small modification) and calls either pm_suspend(), or hibernate() to
put the system into a sleep state. If a wakeup event is reported
while the transition is in progress, it will abort the transition and
the "system suspend" work item will be queued up again.

Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
Documentation/ABI/testing/sysfs-power | 17 ++++
drivers/base/power/wakeup.c | 38 ++++++-----
include/linux/suspend.h | 13 +++
kernel/power/Kconfig | 8 ++
kernel/power/Makefile | 1
kernel/power/autosleep.c | 113 ++++++++++++++++++++++++++++++++
kernel/power/main.c | 117 ++++++++++++++++++++++++++++------
kernel/power/power.h | 18 +++++
8 files changed, 290 insertions(+), 35 deletions(-)
+extern int pm_autosleep_lock(void);
+extern void pm_autosleep_unlock(void);
+extern suspend_state_t pm_autosleep_state(void);
+extern int pm_autosleep_set_state(suspend_state_t state);
+
+#else /* !CONFIG_PM_AUTOSLEEP */
+
+static inline int pm_autosleep_init(void) { return 0; }
+static inline int pm_autosleep_lock(void) { return 0; }
@@ -0,0 +1,113 @@
+/*
+ * kernel/power/autosleep.c
+ *
+ * Opportunistic sleep support.
+ *
+ * Copyright (C) 2012 Rafael J. Wysocki <r...@sisk.pl>
+ */
+
+#include <linux/device.h>
+#include <linux/mutex.h>
+#include <linux/pm_wakeup.h>
+
+#include "power.h"
+
+static suspend_state_t autosleep_state;
+static struct workqueue_struct *autosleep_wq;
+static DEFINE_MUTEX(autosleep_lock);
+static struct wakeup_source *autosleep_ws;
+
+static void try_to_suspend(struct work_struct *work)
+{
+ unsigned int initial_count, final_count;
+
+ if (!pm_get_wakeup_count(&initial_count, true))
+ goto out;
+
+ mutex_lock(&autosleep_lock);
+
+ if (!pm_save_wakeup_count(initial_count)) {
+ mutex_unlock(&autosleep_lock);
+ goto out;
+ }
+
+ if (autosleep_state == PM_SUSPEND_ON) {
+ mutex_unlock(&autosleep_lock);
+ return;
+ }
+ if (autosleep_state >= PM_SUSPEND_MAX)
+ hibernate();
+ else
+ pm_suspend(autosleep_state);
+
+ mutex_unlock(&autosleep_lock);
+
+ if (!pm_get_wakeup_count(&final_count, false))
+ goto out;
+
+ if (final_count == initial_count)
+ schedule_timeout(HZ / 2);
+
+ out:
+ queue_up_suspend_work();
+}
+
+static DECLARE_WORK(suspend_work, try_to_suspend);
+
+void queue_up_suspend_work(void)
+{
+ if (!work_pending(&suspend_work) && autosleep_state > PM_SUSPEND_ON)
+ queue_work(autosleep_wq, &suspend_work);
+}
+
+suspend_state_t pm_autosleep_state(void)
+{
+ return autosleep_state;
+}
+
+int pm_autosleep_lock(void)
+{
+ return mutex_lock_interruptible(&autosleep_lock);
+}
+
+void pm_autosleep_unlock(void)
+{
+ mutex_unlock(&autosleep_lock);
+}
+
+int pm_autosleep_set_state(suspend_state_t state)
+{
+
+#ifndef CONFIG_HIBERNATION
+ if (state >= PM_SUSPEND_MAX)
+ return -EINVAL;
+#endif
+
+ __pm_stay_awake(autosleep_ws);
+
+ mutex_lock(&autosleep_lock);
+
+ autosleep_state = state;
+
+ __pm_relax(autosleep_ws);
+
+ if (state > PM_SUSPEND_ON)
+ queue_up_suspend_work();
+
+ mutex_unlock(&autosleep_lock);
+ return 0;
+}
+
+int __init pm_autosleep_init(void)
+{
+ autosleep_ws = wakeup_source_register("autosleep");
+ if (!autosleep_ws)
+ return -ENOMEM;
+
+ autosleep_wq = alloc_ordered_workqueue("autosleep", 0);
+ if (autosleep_wq)
+ return 0;
+
+ wakeup_source_unregister(autosleep_ws);
+ return -ENOMEM;
+}
Index: linux/kernel/power/main.c
===================================================================
--- linux.orig/kernel/power/main.c
+++ linux/kernel/power/main.c
@@ -269,8 +269,7 @@ static ssize_t state_show(struct kobject
return (s - buf);
}

-static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
- const char *buf, size_t n)
+static suspend_state_t decode_state(const char *buf, size_t n)
{
#ifdef CONFIG_SUSPEND
suspend_state_t state = PM_SUSPEND_STANDBY;
@@ -278,27 +277,48 @@ static ssize_t state_store(struct kobjec
+ int error;
+
+ error = pm_autosleep_lock();
+ if (error)
+ return error;
+
+ if (pm_autosleep_state() > PM_SUSPEND_ON) {
+ error = -EBUSY;
+ goto out;
+ }
+
+ state = decode_state(buf, n);
+ if (state < PM_SUSPEND_MAX)
+ error = pm_suspend(state);
+ else if (state > PM_SUSPEND_ON)
+ error = hibernate();
+ else
+ error = -EINVAL;
+
+ out:
+ pm_autosleep_unlock();
return error ? error : n;
}

@@ -339,7 +359,8 @@ static ssize_t wakeup_count_show(struct
{
unsigned int val;

- return pm_get_wakeup_count(&val) ? sprintf(buf, "%u\n", val) : -EINTR;
+ return pm_get_wakeup_count(&val, true) ?
+ sprintf(buf, "%u\n", val) : -EINTR;
}

static ssize_t wakeup_count_store(struct kobject *kobj,
@@ -347,15 +368,69 @@ static ssize_t wakeup_count_store(struct
const char *buf, size_t n)
{
unsigned int val;
+ int error;
+
+ error = pm_autosleep_lock();
+ if (error)
+ return error;
+
+ if (pm_autosleep_state() > PM_SUSPEND_ON) {
+ error = -EBUSY;
+ goto out;
+ }

if (sscanf(buf, "%u", &val) == 1) {
if (pm_save_wakeup_count(val))
return n;
}
- return -EINVAL;
+ error = -EINVAL;
+
@@ -409,6 +484,9 @@ static struct attribute * g[] = {
#ifdef CONFIG_PM_SLEEP
&pm_async_attr.attr,
&wakeup_count_attr.attr,
+#ifdef CONFIG_PM_AUTOSLEEP
+ &autosleep_attr.attr,
+#endif
#ifdef CONFIG_PM_DEBUG
&pm_test_attr.attr,
#endif
@@ -444,7 +522,10 @@ static int __init pm_init(void)

Rafael J. Wysocki

Feb 25, 2012, 6:30:52 PM2/25/12
to Arve Hjønnevåg, Matt Helsley, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
I'm not sure I'm following you here. How exactly would you like to do that?

In particular, what data structure would the wakeup source object be
associated with?

> The epoll callback will happen without holding evdev client buffer_lock,
> so the wakeup-source and buffer state will not always be in sync (this
> may be OK, but requires more thought). This callback is also called if
> no data was added to the queue we are polling on because another
> client has grabbed the input device (is this a bug or intended?).
>
> There is no call into the epoll code when input queue is emptied, so
> we can't deactivate the wakeup-source until epoll_wait is called
> again. This also should be workable, but will result in different stats.
>
> It does not look like the normal poll and select interfaces can be
> extended the same way (since they remove themselves from the
> wait-queue before returning to user-space), so user-space has to be
> changed to use epoll even if select or poll would be a better fit.

Well, epoll without EPOLLET is equivalent to poll, so the only potential
issue is select. How serious may the problem with that be?
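
To illustrate the point about epoll without EPOLLET behaving like poll: in
level-triggered mode a descriptor keeps being reported as ready for as long
as unread data remains, whereas with EPOLLET it is reported once per new
event. A minimal user-space sketch, assuming a Linux system; the function
name ready_reports is made up for this example.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Illustrative helper (not from any patch in this thread): register the
 * read end of a pipe with EPOLLIN | extra_flags, make it readable, and call
 * epoll_wait() twice without reading.  Returns the sum of both results,
 * or -1 on setup failure.
 */
static int ready_reports(uint32_t extra_flags)
{
	struct epoll_event ev = { .events = EPOLLIN | extra_flags };
	struct epoll_event out;
	int pfd[2], epfd, total = -1;

	if (pipe(pfd))
		return -1;
	epfd = epoll_create1(0);
	if (epfd < 0)
		goto out_pipe;

	ev.data.fd = pfd[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0 &&
	    write(pfd[1], "x", 1) == 1) {
		/* The data is never read, so only the trigger mode matters. */
		total = epoll_wait(epfd, &out, 1, 0);
		total += epoll_wait(epfd, &out, 1, 0);
	}
	close(epfd);
out_pipe:
	close(pfd[0]);
	close(pfd[1]);
	return total;
}
```

Calling ready_reports(0) exercises the level-triggered (poll-like) case and
ready_reports(EPOLLET) the edge-triggered one.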

> I don't know how many other drivers this would work for. The input
> driver will wake up user-space from the same thread or interrupt
> handler that queued the event, but other drivers may defer this to
> another thread which makes an epoll wakeup-source insufficient.

If we go for new ioctls instead, we'll have to add them to all of those
drivers, so I would prefer the epoll-based approach if that's viable at
least for a subset of the relevant drivers.

> ...
> >> + snprintf(name, sizeof(name), "%s-%d",
> >> + dev_name(&evdev->dev), task_tgid_vnr(current));
> >
> > This does not look like it will work well with tasks in different pid
> > namespaces. What should happen, I think, is the wakeup_source should hold a
> > reference to either the struct pid of current or current itself. Then
> > when someone reads the file you should get the pid vnr in the reader's
> > pid namespace. That way instead of a bogus pid vnr 0 would show up if
> > "current" here is not in the reader's pid namespace.
> >
>
> The pid here is only used for debugging purposes, and used less than
> the dev_name. I don't think tracking pid namespaces is worth the
> trouble here, so if this is a real problem we can just drop the pid
> from the name for now.

OK

Thanks,
Rafael

Rafael J. Wysocki

Feb 26, 2012, 3:53:24 PM2/26/12
to Matt Helsley, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
Do you mean something like the patch below, or something different?

Rafael

---
drivers/input/evdev.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++
fs/eventpoll.c | 15 +++++++++++-
include/linux/eventpoll.h | 6 +++++
include/linux/fs.h | 1
4 files changed, 76 insertions(+), 1 deletion(-)

Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h
+++ linux/include/linux/fs.h
@@ -1604,6 +1604,7 @@ struct file_operations {
ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
int (*readdir) (struct file *, void *, filldir_t);
unsigned int (*poll) (struct file *, struct poll_table_struct *);
+ void (*epoll_ctl) (struct file *, int, unsigned int);
long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *);
Index: linux/fs/eventpoll.c
===================================================================
--- linux.orig/fs/eventpoll.c
+++ linux/fs/eventpoll.c
@@ -609,6 +609,10 @@ static int ep_remove(struct eventpoll *e
unsigned long flags;
struct file *file = epi->ffd.file;

+ /* Notify the underlying driver that the polling has completed */
+ if (file->f_op->epoll_ctl)
+ file->f_op->epoll_ctl(file, EPOLL_CTL_DEL, epi->event.events);
+
/*
* Removes poll wait queue hooks. We _have_ to do this without holding
* the "ep->lock" otherwise a deadlock might occur. This because of the
@@ -1094,6 +1098,10 @@ static int ep_insert(struct eventpoll *e
epq.epi = epi;
init_poll_funcptr(&epq.pt, ep_ptable_queue_proc);

+ /* Notify the underlying driver that we want to poll it */
+ if (tfile->f_op->epoll_ctl)
+ tfile->f_op->epoll_ctl(tfile, EPOLL_CTL_ADD, event->events);
+
/*
* Attach the item to the poll hooks and get current event bits.
* We can safely use the file* here because its usage count has
@@ -1185,6 +1193,7 @@ error_unregister:
*/
static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_event *event)
{
+ struct file *file = epi->ffd.file;
int pwake = 0;
unsigned int revents;

@@ -1196,11 +1205,15 @@ static int ep_modify(struct eventpoll *e
epi->event.events = event->events;
epi->event.data = event->data; /* protected by mtx */

+ /* Notify the underlying driver of the change */
+ if (file->f_op->epoll_ctl)
+ file->f_op->epoll_ctl(file, EPOLL_CTL_MOD, event->events);
+
/*
* Get current event bits. We can safely use the file* here because
* its usage count has been increased by the caller of this function.
*/
- revents = epi->ffd.file->f_op->poll(epi->ffd.file, NULL);
+ revents = file->f_op->poll(file, NULL);

/*
* If the item is "hot" and it is not registered inside the ready
Index: linux/drivers/input/evdev.c
===================================================================
--- linux.orig/drivers/input/evdev.c
+++ linux/drivers/input/evdev.c
@@ -16,6 +16,7 @@
#define EVDEV_BUF_PACKETS 8

#include <linux/poll.h>
+#include <linux/eventpoll.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/module.h>
@@ -43,6 +44,7 @@ struct evdev_client {
unsigned int tail;
unsigned int packet_head; /* [future] position of the first element of next packet */
spinlock_t buffer_lock; /* protects access to buffer, head and tail */
+ struct wakeup_source *wakeup_source;
struct fasync_struct *fasync;
struct evdev *evdev;
struct list_head node;
@@ -75,10 +77,12 @@ static void evdev_pass_event(struct evde
client->buffer[client->tail].value = 0;

client->packet_head = client->tail;
+ __pm_relax(client->wakeup_source);
}

if (event->type == EV_SYN && event->code == SYN_REPORT) {
client->packet_head = client->head;
+ __pm_stay_awake(client->wakeup_source);
kill_fasync(&client->fasync, SIGIO, POLL_IN);
}

@@ -255,6 +259,8 @@ static int evdev_release(struct inode *i
mutex_unlock(&evdev->mutex);

evdev_detach_client(evdev, client);
+ wakeup_source_unregister(client->wakeup_source);
+
kfree(client);

evdev_close_device(evdev);
@@ -373,6 +379,8 @@ static int evdev_fetch_next_event(struct
if (have_event) {
*event = client->buffer[client->tail++];
client->tail &= client->bufsize - 1;
+ if (client->packet_head == client->tail)
+ __pm_relax(client->wakeup_source);
}

spin_unlock_irq(&client->buffer_lock);
@@ -433,6 +441,52 @@ static unsigned int evdev_poll(struct fi
return mask;
}

+static void evdev_client_attach_wakeup_source(struct evdev_client *client)
+{
+ struct wakeup_source *ws;
+
+ ws = wakeup_source_register(dev_name(&client->evdev->dev));
+ spin_lock_irq(&client->buffer_lock);
+ client->wakeup_source = ws;
+ if (client->packet_head != client->tail)
+ __pm_stay_awake(client->wakeup_source);
+ spin_unlock_irq(&client->buffer_lock);
+}
+
+static void evdev_client_detach_wakeup_source(struct evdev_client *client)
+{
+ struct wakeup_source *ws;
+
+ spin_lock_irq(&client->buffer_lock);
+ ws = client->wakeup_source;
+ client->wakeup_source = NULL;
+ spin_unlock_irq(&client->buffer_lock);
+ wakeup_source_unregister(ws);
+}
+
+static void evdev_epoll_ctl(struct file *file, int op,
+ unsigned int events)
+{
+ struct evdev_client *client = file->private_data;
+
+ switch (op) {
+ case EPOLL_CTL_ADD:
+ if ((events & EPOLLWAKEUP) && !client->wakeup_source)
+ evdev_client_attach_wakeup_source(client);
+ break;
+ case EPOLL_CTL_DEL:
+ if (events & EPOLLWAKEUP)
+ evdev_client_detach_wakeup_source(client);
+ break;
+ case EPOLL_CTL_MOD:
+ /* 'events' is the new events mask (after the change) */
+ if ((events & EPOLLWAKEUP) && !client->wakeup_source)
+ evdev_client_attach_wakeup_source(client);
+ else if (!(events & EPOLLWAKEUP))
+ evdev_client_detach_wakeup_source(client);
+ }
+}
+
#ifdef CONFIG_COMPAT

#define BITS_PER_LONG_COMPAT (sizeof(compat_long_t) * 8)
@@ -845,6 +899,7 @@ static const struct file_operations evde
.read = evdev_read,
.write = evdev_write,
.poll = evdev_poll,
+ .epoll_ctl = evdev_epoll_ctl,
.open = evdev_open,
.release = evdev_release,
.unlocked_ioctl = evdev_ioctl,
Index: linux/include/linux/eventpoll.h
===================================================================
--- linux.orig/include/linux/eventpoll.h
+++ linux/include/linux/eventpoll.h
@@ -26,6 +26,12 @@
#define EPOLL_CTL_DEL 2
#define EPOLL_CTL_MOD 3

+/*
+ * Request the handling of system wakeup events so as to prevent automatic
+ * system suspends from happening while those events are being processed.
+ */
+#define EPOLLWAKEUP (1 << 29)
+
/* Set the One Shot behaviour for the target file descriptor */
#define EPOLLONESHOT (1 << 30)

Matt Helsley

Feb 27, 2012, 7:56:29 PM2/27/12
to Rafael J. Wysocki, Matt Helsley, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
Yeah, this was sort of what I was thinking of. It nicely avoids the
ioctl() bits. I guess my only issue is the fop mimics the epoll
interface -- should it just be an fop to manage the file as a wakeup
source rather than a generic hook into epoll?

Cheers,
-Matt Helsley

Matt Helsley

Feb 27, 2012, 7:57:15 PM2/27/12
to Arve Hjønnevåg, Matt Helsley, Rafael J. Wysocki, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
Yup, that is exactly why epoll is so well suited to this.

> changed to use epoll even if select or poll would be a better fit.

Either way, modification of application code is necessary, right?

> I don't know how many other drivers this would work for. The input
> driver will wake up user-space from the same thread or interrupt
> handler that queued the event, but other drivers may defer this to
> another thread which makes an epoll wakeup-source insufficient.

I don't understand how this would be insufficient. So long as the
interrupt causes the wakeup source to prevent the machine from suspending
before finishing interrupt handling does it matter whether the event
handling itself is deferred?

In case there's some confusion: I'm not saying that this idea will solve
all of the problems, especially:

> >> Of course, if
> >> they don't want the system to go into a sleep state _after_ all the
> >> events have been read from the queue, they have to use a separate
> >> mechanism that will prevent the system from doing that and it has
> >> to be activated before reading the first event (that also may be
> >> the
> >> last one).

(endquote)

>
> ...
> >> +     snprintf(name, sizeof(name), "%s-%d",
> >> +              dev_name(&evdev->dev), task_tgid_vnr(current));
> >
> > This does not look like it will work well with tasks in different pid
> > namespaces. What should happen, I think, is the wakeup_source should hold a
> > reference to either the struct pid of current or current itself. Then
> > when someone reads the file you should get the pid vnr in the reader's
> > pid namespace. That way instead of a bogus pid vnr 0 would show up if
> > "current" here is not in the reader's pid namespace.
> >
>
> The pid here is only used for debugging purposes, and used less than
> the dev_name. I don't think tracking pid namespaces is worth the
> trouble here, so if this is a real problem we can just drop the pid
> from the name for now.

I think dropping the pid would be the best choice. If it's absolutely
necessary in the output then it should be made to work with pid namespaces
because the interface will be maintained forever.

Cheers,
-Matt

Rafael J. Wysocki

Feb 27, 2012, 8:13:45 PM2/27/12
to Matt Helsley, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
I'm not exactly sure what you mean, could you be a bit more specific, please?

Rafael

Arve Hjønnevåg

Feb 28, 2012, 12:59:47 AM2/28/12
to Rafael J. Wysocki, Matt Helsley, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov, jeff...@android.com
I don't think it is useful to tie an evdev implementation to epoll
that way. You just replaced the ioctl with a new control function.

The code below tries to implement the same flag without modifying
evdev at all. The behavior of this is different as it will keep the
device awake until user-space calls epoll_wait again. I also used an
extra wakeup source to handle the function that runs without the
spin_lock held which means that non-wakeup files in same epoll list
could abort suspend.
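
From the application side, the flag would be used roughly as in the sketch
below. Assumptions: EPOLLWAKEUP has the value (1 << 29) proposed earlier in
this thread (it is defined here only if the system headers lack it), and the
helper names add_wakeup_fd and wait_and_read are made up for illustration.

```c
#include <sys/epoll.h>
#include <unistd.h>

#ifndef EPOLLWAKEUP
#define EPOLLWAKEUP (1u << 29)	/* value taken from the patch in this thread */
#endif

/* Hypothetical helper: watch 'fd' for input and ask for wakeup handling. */
static int add_wakeup_fd(int epfd, int fd)
{
	struct epoll_event ev = { .events = EPOLLIN | EPOLLWAKEUP };

	ev.data.fd = fd;
	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/*
 * Hypothetical event loop step.  With the approach described here, the
 * wakeup source stays active until user space calls epoll_wait() again,
 * so the data should be consumed before going back to epoll_wait().
 */
static ssize_t wait_and_read(int epfd, char *buf, size_t len)
{
	struct epoll_event out;

	if (epoll_wait(epfd, &out, 1, -1) != 1)
		return -1;
	return read(out.data.fd, buf, len);
}
```

The design point is that no driver-specific ioctl is involved: the only
change visible to the application is one extra bit in the events mask.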

--
Arve Hjønnevåg

----
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index f9cfd16..45af494 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -33,6 +33,7 @@
#include <linux/bitops.h>
#include <linux/mutex.h>
#include <linux/anon_inodes.h>
+#include <linux/device.h>
#include <asm/uaccess.h>
#include <asm/system.h>
#include <asm/io.h>
@@ -79,7 +80,7 @@
*/

/* Epoll private bits inside the event mask */
-#define EP_PRIVATE_BITS (EPOLLONESHOT | EPOLLET)
+#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET)

/* Maximum number of nesting allowed inside epoll sets */
#define EP_MAX_NESTS 4
@@ -146,6 +147,9 @@ struct epitem {
/* List header used to link this item to the "struct file" items list */
struct list_head fllink;

+ /* wakeup_source used when EPOLLWAKEUP is set */
+ struct wakeup_source *ws;
+
/* The structure that describe the interested events and the source fd */
struct epoll_event event;
};
@@ -186,6 +190,9 @@ struct eventpoll {
*/
struct epitem *ovflist;

+ /* wakeup_source used when ep_scan_ready_list is running */
+ struct wakeup_source *ws;
+
/* The user that created the eventpoll descriptor */
struct user_struct *user;
};
@@ -492,6 +499,7 @@ static int ep_scan_ready_list(struct eventpoll *ep,
* in a lockless way.
*/
spin_lock_irqsave(&ep->lock, flags);
+ __pm_stay_awake(ep->ws);
list_splice_init(&ep->rdllist, &txlist);
ep->ovflist = NULL;
spin_unlock_irqrestore(&ep->lock, flags);
@@ -515,9 +523,12 @@ static int ep_scan_ready_list(struct eventpoll *ep,
* queued into ->ovflist but the "txlist" might already
* contain them, and the list_splice() below takes care of them.
*/
- if (!ep_is_linked(&epi->rdllink))
+ if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);
+ }
}
+
/*
* We need to set back ep->ovflist to EP_UNACTIVE_PTR, so that after
* releasing the lock, events will be queued in the normal way inside
@@ -529,6 +540,7 @@ static int ep_scan_ready_list(struct eventpoll *ep,
* Quickly re-inject items left on "txlist".
*/
list_splice(&txlist, &ep->rdllist);
+ __pm_relax(ep->ws);

if (!list_empty(&ep->rdllist)) {
/*
@@ -583,6 +595,9 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi)
list_del_init(&epi->rdllink);
spin_unlock_irqrestore(&ep->lock, flags);

+ if (epi->ws)
+ wakeup_source_unregister(epi->ws);
+
/* At this point it is safe to free the eventpoll item */
kmem_cache_free(epi_cache, epi);

@@ -633,6 +648,8 @@ static void ep_free(struct eventpoll *ep)
mutex_unlock(&epmutex);
mutex_destroy(&ep->mtx);
free_uid(ep->user);
+ if (ep->ws)
+ wakeup_source_unregister(ep->ws);
kfree(ep);
}

@@ -661,6 +678,7 @@ static int ep_read_events_proc(struct eventpoll *ep, struct list_head *head,
* callback, but it's not actually ready, as far as
* caller requested events goes. We can remove it here.
*/
+ __pm_relax(epi->ws);
list_del_init(&epi->rdllink);
}
}
@@ -851,8 +869,10 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
}

/* If this file is already in the ready list we exit soon */
- if (!ep_is_linked(&epi->rdllink))
+ if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);
+ }

/*
* Wake up ( if active ) both the eventpoll wait list and the ->poll()
@@ -915,6 +935,30 @@ static void ep_rbtree_insert(struct eventpoll *ep, struct epitem *epi)
rb_insert_color(&epi->rbn, &ep->rbr);
}

+static int ep_create_wakeup_source(struct epitem *epi)
+{
+ const char *name;
+
+ if (!epi->ep->ws) {
+ epi->ep->ws = wakeup_source_register("eventpoll");
+ if (!epi->ep->ws)
+ return -ENOMEM;
+ }
+
+ name = epi->ffd.file->f_path.dentry->d_name.name;
+ epi->ws = wakeup_source_register(name);
+ if (!epi->ws)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static void ep_destroy_wakeup_source(struct epitem *epi)
+{
+ wakeup_source_unregister(epi->ws);
+ epi->ws = NULL;
+}
+
/*
* Must be called with "mtx" held.
*/
@@ -942,6 +986,13 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
epi->event = *event;
epi->nwait = 0;
epi->next = EP_UNACTIVE_PTR;
+ if (epi->event.events & EPOLLWAKEUP) {
+ error = ep_create_wakeup_source(epi);
+ if (error)
+ goto error_create_wakeup_source;
+ } else {
+ epi->ws = NULL;
+ }

/* Initialize the poll table using the queue callback */
epq.epi = epi;
@@ -982,6 +1033,7 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
/* If the file is already "ready" we drop it inside the ready list */
if ((revents & event->events) && !ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);

/* Notify waiting tasks that events are available */
if (waitqueue_active(&ep->wq))
@@ -1014,6 +1066,10 @@ error_unregister:
list_del_init(&epi->rdllink);
spin_unlock_irqrestore(&ep->lock, flags);

+ if (epi->ws)
+ wakeup_source_unregister(epi->ws);
+
+error_create_wakeup_source:
kmem_cache_free(epi_cache, epi);

return error;
@@ -1035,6 +1091,12 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
*/
epi->event.events = event->events;
epi->event.data = event->data; /* protected by mtx */
+ if (epi->event.events & EPOLLWAKEUP) {
+ if (!epi->ws)
+ ep_create_wakeup_source(epi);
+ } else if (epi->ws) {
+ ep_destroy_wakeup_source(epi);
+ }

/*
* Get current event bits. We can safely use the file* here because
@@ -1050,6 +1112,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
spin_lock_irq(&ep->lock);
if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);

/* Notify waiting tasks that events are available */
if (waitqueue_active(&ep->wq))
@@ -1085,6 +1148,7 @@ static int ep_send_events_proc(struct eventpoll *ep, struct list_head *head,
!list_empty(head) && eventcnt < esed->maxevents;) {
epi = list_first_entry(head, struct epitem, rdllink);

+ __pm_relax(epi->ws);
list_del_init(&epi->rdllink);

revents = epi->ffd.file->f_op->poll(epi->ffd.file, NULL) &
@@ -1100,6 +1164,7 @@ static int ep_send_events_proc(struct eventpoll *ep, struct list_head *head,
if (__put_user(revents, &uevent->events) ||
__put_user(epi->event.data, &uevent->data)) {
list_add(&epi->rdllink, head);
+ __pm_stay_awake(epi->ws);
return eventcnt ? eventcnt : -EFAULT;
}
eventcnt++;
@@ -1119,6 +1184,7 @@ static int ep_send_events_proc(struct eventpoll *ep, struct list_head *head,
* poll callback will queue them in ep->ovflist.
*/
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);
}
}
}
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index f362733..cd156ff 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h

Srivatsa S. Bhat

Feb 28, 2012, 5:24:23 AM2/28/12
to Rafael J. Wysocki, John Stultz, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov
On 02/26/2012 02:31 AM, Rafael J. Wysocki wrote:

>
> I think we can do something like in the updated patch [5/7] below.
>
> It uses a special wakeup source object called "autosleep" to bump up the
> number of wakeup events in progress before acquiring autosleep_lock in
> pm_autosleep_set_state(). This way, either pm_autosleep_set_state() will
> acquire autosleep_lock before try_to_suspend(), in which case the latter
> will see the change of autosleep_state immediately (after autosleep_lock has
> been passed to it), or try_to_suspend() will get it first, but then
> pm_save_wakeup_count() or pm_suspend()/hibernate() will see the nonzero counter
> of wakeup events in progress and return an error code (sooner or later).
>
> The drawback is that writes to /sys/power/autosleep may interfere with
> the /sys/power/wakeup_count + /sys/power/state interface by interrupting
> transitions started by writing to /sys/power/state, for example (although
> I think that's highly unlikely).


Yes, but I think we can live with that. It doesn't look like a big issue.

>
> Additionally, I made pm_autosleep_lock() use mutex_trylock_interruptible()


You have used mutex_lock_interruptible() in the code below. It wouldn't matter
as long as you have used some form of "interruptible", but I think
mutex_trylock_interruptible would be even better.

> to prevent operations on /sys/power/wakeup_count and/or /sys/power/state
> from failing the freezing of tasks started by try_to_suspend().
>
> Thanks,
> Rafael
>


The approach taken by the patch below looks good to me. I don't see any obvious
problems, except for the minor ones listed below.

> ---
> From: Rafael J. Wysocki <r...@sisk.pl>
> Subject: PM / Sleep: Implement opportunistic sleep
>
> Introduce a mechanism by which the kernel can trigger global
> transitions to a sleep state chosen by user space if there are no
> active wakeup sources.
>
> It consists of a new sysfs attribute, /sys/power/autosleep, that
> can be written one of the strings returned by reads from
> /sys/power/state, an ordered workqueue and a work item carrying out
> the "suspend" operations. If a string representing the system's
> sleep state is written to /sys/power/autosleep, the work item
> triggering transitions to that state is queued up and it requeues
> itself after every execution until user space writes "off" to
> /sys/power/autosleep.
>
> That work item enables the detection of wakeup events using the
> functions already defined in drivers/base/power/wakeup.c (with one
> small modification) and calls either pm_suspend(), or hibernate() to
> put the system into a sleep state. If a wakeup event is reported
> while the transition is in progress, it will abort the transition and
> the "system suspend" work item will be queued up again.
>
> Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
By the way, the condition checks in the above if-else block look kinda
odd, considering what is done in other similar places, which are more
readable. It would be great if you could make them consistent.

> +
> + out:
> + pm_autosleep_unlock();
> return error ? error : n;
> }
>
> @@ -339,7 +359,8 @@ static ssize_t wakeup_count_show(struct
> {
> unsigned int val;
>
> - return pm_get_wakeup_count(&val) ? sprintf(buf, "%u\n", val) : -EINTR;
> + return pm_get_wakeup_count(&val, true) ?
> + sprintf(buf, "%u\n", val) : -EINTR;
> }
>
> +
> +static ssize_t autosleep_store(struct kobject *kobj,
> + struct kobj_attribute *attr,
> + const char *buf, size_t n)
> +{
> + suspend_state_t state = decode_state(buf, n);
> + int error;
> +
> + if (state == PM_SUSPEND_ON && strncmp(buf, "off", 3)
> + && strncmp(buf, "off\n", 4))
> + return -EINVAL;
> +


I am pretty sure you meant "if autosleep is already off, and the user
wrote "off" to /sys/power/autosleep, then return -EINVAL"

But strncmp() returns 0 if the strings match, and hence the code above
doesn't seem to do what you intended.
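
For reference, the truth table of the quoted condition can be checked in user
space. This is a sketch only: autosleep_rejects is a made-up name, and
'decoded_on' stands for the case where the input did not decode to a sleep
state, which is assumed here to mean that decode_state() returned
PM_SUSPEND_ON.

```c
#include <string.h>

/*
 * User-space model of the quoted check in autosleep_store().  Returns 1 if
 * the write would be rejected with -EINVAL, 0 otherwise.
 */
static int autosleep_rejects(int decoded_on, const char *buf)
{
	/* strncmp() returns 0 on a match, so each term is true on a mismatch. */
	return decoded_on && strncmp(buf, "off", 3) && strncmp(buf, "off\n", 4);
}
```

Under that assumption the expression is nonzero only when the input matches
neither a sleep state nor "off", for example autosleep_rejects(1, "bogus").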

> + error = pm_autosleep_set_state(state);
> + return error ? error : n;
> +}
> +
> +power_attr(autosleep);
> +#endif /* CONFIG_PM_AUTOSLEEP */
> #endif /* CONFIG_PM_SLEEP */
>
> #ifdef CONFIG_PM_TRACE


Regards,
Srivatsa S. Bhat

Rafael J. Wysocki

Mar 4, 2012, 5:52:17 PM3/4/12
to Arve Hjønnevåg, Matt Helsley, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov, jeff...@android.com
Well, if that works for you, it will be better than adding ioctls to evdev
(and presumably a number of other devices).

Care to resubmit with a proper changelog and sign-off?

Rafael

Arve Hjønnevåg

Mar 5, 2012, 8:04:54 PM3/5/12
to Rafael J. Wysocki, Matt Helsley, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov, jeff...@android.com, Arve Hjønnevåg
When an epoll_event, that has the EPOLLWAKEUP flag set, is ready, a
wakeup_source will be active to prevent suspend. This can be used to
handle wakeup events from a driver that supports poll, e.g. input, if
that driver wakes up the waitqueue passed to epoll before allowing
suspend.

The current implementation uses an extra wakeup_source when
ep_scan_ready_list runs. This can cause problems if a single thread
is polling on wakeup events and frequent non-wakeup events (events
usually arrive during thread freezing) using the same epoll file.

Signed-off-by: Arve Hjønnevåg <ar...@android.com>
---
fs/eventpoll.c | 71 +++++++++++++++++++++++++++++++++++++++++++--
include/linux/eventpoll.h | 6 ++++
2 files changed, 74 insertions(+), 3 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index aabdfc3..6263ac6 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -33,6 +33,7 @@
#include <linux/bitops.h>
#include <linux/mutex.h>
#include <linux/anon_inodes.h>
+#include <linux/device.h>
#include <asm/uaccess.h>
#include <asm/system.h>
#include <asm/io.h>
@@ -88,7 +89,7 @@
*/

/* Epoll private bits inside the event mask */
-#define EP_PRIVATE_BITS (EPOLLONESHOT | EPOLLET)
+#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET)

/* Maximum number of nesting allowed inside epoll sets */
#define EP_MAX_NESTS 4
@@ -155,6 +156,9 @@ struct epitem {
/* List header used to link this item to the "struct file" items list */
struct list_head fllink;

+ /* wakeup_source used when EPOLLWAKEUP is set */
+ struct wakeup_source *ws;
+
/* The structure that describe the interested events and the source fd */
struct epoll_event event;
};
@@ -195,6 +199,9 @@ struct eventpoll {
*/
struct epitem *ovflist;

+ /* wakeup_source used when ep_scan_ready_list is running */
+ struct wakeup_source *ws;
+
/* The user that created the eventpoll descriptor */
struct user_struct *user;

@@ -524,6 +531,7 @@ static int ep_scan_ready_list(struct eventpoll *ep,
* in a lockless way.
*/
spin_lock_irqsave(&ep->lock, flags);
+ __pm_stay_awake(ep->ws);
list_splice_init(&ep->rdllist, &txlist);
ep->ovflist = NULL;
spin_unlock_irqrestore(&ep->lock, flags);
@@ -547,8 +555,10 @@ static int ep_scan_ready_list(struct eventpoll *ep,
* queued into ->ovflist but the "txlist" might already
* contain them, and the list_splice() below takes care of them.
*/
- if (!ep_is_linked(&epi->rdllink))
+ if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);
+ }
}
/*
* We need to set back ep->ovflist to EP_UNACTIVE_PTR, so that after
@@ -561,6 +571,7 @@ static int ep_scan_ready_list(struct eventpoll *ep,
* Quickly re-inject items left on "txlist".
*/
list_splice(&txlist, &ep->rdllist);
+ __pm_relax(ep->ws);

if (!list_empty(&ep->rdllist)) {
/*
@@ -615,6 +626,9 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi)
list_del_init(&epi->rdllink);
spin_unlock_irqrestore(&ep->lock, flags);

+ if (epi->ws)
+ wakeup_source_unregister(epi->ws);
+
/* At this point it is safe to free the eventpoll item */
kmem_cache_free(epi_cache, epi);

@@ -665,6 +679,8 @@ static void ep_free(struct eventpoll *ep)
mutex_unlock(&epmutex);
mutex_destroy(&ep->mtx);
free_uid(ep->user);
+ if (ep->ws)
+ wakeup_source_unregister(ep->ws);
kfree(ep);
}

@@ -693,6 +709,7 @@ static int ep_read_events_proc(struct eventpoll *ep, struct list_head *head,
* callback, but it's not actually ready, as far as
* caller requested events goes. We can remove it here.
*/
+ __pm_relax(epi->ws);
list_del_init(&epi->rdllink);
}
}
@@ -877,8 +894,10 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
}

/* If this file is already in the ready list we exit soon */
- if (!ep_is_linked(&epi->rdllink))
+ if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);
+ }

/*
* Wake up ( if active ) both the eventpoll wait list and the ->poll()
@@ -1034,6 +1053,30 @@ static int reverse_path_check(void)
return error;
}

+static int ep_create_wakeup_source(struct epitem *epi)
+{
+ const char *name;
+
+ if (!epi->ep->ws) {
+ epi->ep->ws = wakeup_source_register("eventpoll");
+ if (!epi->ep->ws)
+ return -ENOMEM;
+ }
+
+ name = epi->ffd.file->f_path.dentry->d_name.name;
+ epi->ws = wakeup_source_register(name);
+ if (!epi->ws)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static void ep_destroy_wakeup_source(struct epitem *epi)
+{
+ wakeup_source_unregister(epi->ws);
+ epi->ws = NULL;
+}
+
/*
* Must be called with "mtx" held.
*/
@@ -1061,6 +1104,13 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
epi->event = *event;
epi->nwait = 0;
epi->next = EP_UNACTIVE_PTR;
+ if (epi->event.events & EPOLLWAKEUP) {
+ error = ep_create_wakeup_source(epi);
+ if (error)
+ goto error_create_wakeup_source;
+ } else {
+ epi->ws = NULL;
+ }

/* Initialize the poll table using the queue callback */
epq.epi = epi;
@@ -1106,6 +1156,7 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
/* If the file is already "ready" we drop it inside the ready list */
if ((revents & event->events) && !ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);

/* Notify waiting tasks that events are available */
if (waitqueue_active(&ep->wq))
@@ -1146,6 +1197,10 @@ error_unregister:
list_del_init(&epi->rdllink);
spin_unlock_irqrestore(&ep->lock, flags);

+ if (epi->ws)
+ wakeup_source_unregister(epi->ws);
+
+error_create_wakeup_source:
kmem_cache_free(epi_cache, epi);

return error;
@@ -1167,6 +1222,12 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
*/
epi->event.events = event->events;
epi->event.data = event->data; /* protected by mtx */
+ if (epi->event.events & EPOLLWAKEUP) {
+ if (!epi->ws)
+ ep_create_wakeup_source(epi);
+ } else if (epi->ws) {
+ ep_destroy_wakeup_source(epi);
+ }

/*
* Get current event bits. We can safely use the file* here because
@@ -1182,6 +1243,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
spin_lock_irq(&ep->lock);
if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);

/* Notify waiting tasks that events are available */
if (waitqueue_active(&ep->wq))
@@ -1217,6 +1279,7 @@ static int ep_send_events_proc(struct eventpoll *ep, struct list_head *head,
!list_empty(head) && eventcnt < esed->maxevents;) {
epi = list_first_entry(head, struct epitem, rdllink);

+ __pm_relax(epi->ws);
list_del_init(&epi->rdllink);

revents = epi->ffd.file->f_op->poll(epi->ffd.file, NULL) &
@@ -1232,6 +1295,7 @@ static int ep_send_events_proc(struct eventpoll *ep, struct list_head *head,
if (__put_user(revents, &uevent->events) ||
__put_user(epi->event.data, &uevent->data)) {
list_add(&epi->rdllink, head);
+ __pm_stay_awake(epi->ws);
return eventcnt ? eventcnt : -EFAULT;
}
eventcnt++;
@@ -1251,6 +1315,7 @@ static int ep_send_events_proc(struct eventpoll *ep, struct list_head *head,
* poll callback will queue them in ep->ovflist.
*/
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);
}
}
}
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 657ab55..520a57c 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -26,6 +26,12 @@
#define EPOLL_CTL_DEL 2
#define EPOLL_CTL_MOD 3

+/*
+ * Request the handling of system wakeup events so as to prevent automatic
+ * system suspends from happening while those events are being processed.
+ */
+#define EPOLLWAKEUP (1 << 29)
+
/* Set the One Shot behaviour for the target file descriptor */
#define EPOLLONESHOT (1 << 30)

--
1.7.7.3

Arve Hjønnevåg

Mar 5, 2012, 8:05:02 PM
to Rafael J. Wysocki, Matt Helsley, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov, jeff...@android.com, Arve Hjønnevåg
Add tracepoints to wakeup_source_activate and wakeup_source_deactivate.
Useful for checking that specific wakeup sources overlap as expected.
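
For readers of the trace output, a small sketch of how the "state" value might be decoded. It assumes the combined counter layout used in drivers/base/power/wakeup.c at the time (registered events in the upper half of the word, events in progress in the lower half) and a 32-bit unsigned int. The helper name decode_cec() is hypothetical.

```c
/* Assumed layout: low half of the word counts wakeup events in progress. */
#define IN_PROGRESS_BITS (sizeof(int) * 4)
#define MAX_IN_PROGRESS  ((1u << IN_PROGRESS_BITS) - 1)

/*
 * Split a traced combined_event_count value into the number of
 * registered wakeup events (cnt) and the number in progress (inpr).
 */
static void decode_cec(unsigned int state, unsigned int *cnt,
		       unsigned int *inpr)
{
	*cnt = state >> IN_PROGRESS_BITS;
	*inpr = state & MAX_IN_PROGRESS;
}
```

Under these assumptions a traced state of 0x00030002 means three registered events and two in progress. Adding MAX_IN_PROGRESS, as the deactivation path does, yields 0x00040001: one more registered event and one fewer in progress, in a single atomic operation.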

Signed-off-by: Arve Hjønnevåg <ar...@android.com>
---
drivers/base/power/wakeup.c | 12 +++++++++---
include/trace/events/power.h | 34 ++++++++++++++++++++++++++++++++++
2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
index a896cc8..94b843d 100644
--- a/drivers/base/power/wakeup.c
+++ b/drivers/base/power/wakeup.c
@@ -14,6 +14,7 @@
#include <linux/suspend.h>
#include <linux/seq_file.h>
#include <linux/debugfs.h>
+#include <trace/events/power.h>

#include "power.h"

@@ -375,6 +376,8 @@ EXPORT_SYMBOL_GPL(device_set_wakeup_enable);
*/
static void wakeup_source_activate(struct wakeup_source *ws)
{
+ unsigned int cec;
+
ws->active = true;
ws->active_count++;
ws->last_time = ktime_get();
@@ -382,7 +385,9 @@ static void wakeup_source_activate(struct wakeup_source *ws)
ws->start_prevent_time = ws->last_time;

/* Increment the counter of events in progress. */
- atomic_inc(&combined_event_count);
+ cec = atomic_inc_return(&combined_event_count);
+
+ trace_wakeup_source_activate(ws->name, cec);
}

/**
@@ -468,7 +473,7 @@ static inline void update_prevent_sleep_time(struct wakeup_source *ws,
*/
static void wakeup_source_deactivate(struct wakeup_source *ws)
{
- unsigned int cnt, inpr;
+ unsigned int cnt, inpr, cec;
ktime_t duration;
ktime_t now;

@@ -506,7 +511,8 @@ static void wakeup_source_deactivate(struct wakeup_source *ws)
* Increment the counter of registered wakeup events and decrement the
* couter of wakeup events in progress simultaneously.
*/
- atomic_add(MAX_IN_PROGRESS, &combined_event_count);
+ cec = atomic_add_return(MAX_IN_PROGRESS, &combined_event_count);
+ trace_wakeup_source_deactivate(ws->name, cec);

split_counters(&cnt, &inpr);
if (!inpr && waitqueue_active(&wakeup_count_wait_queue)) {
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 1bcc2a8..5c7b721 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -65,6 +65,40 @@ TRACE_EVENT(machine_suspend,
TP_printk("state=%lu", (unsigned long)__entry->state)
);

+DECLARE_EVENT_CLASS(wakeup_source,
+
+ TP_PROTO(const char *name, unsigned int state),
+
+ TP_ARGS(name, state),
+
+ TP_STRUCT__entry(
+ __string( name, name )
+ __field( u64, state )
+ ),
+
+ TP_fast_assign(
+ __assign_str(name, name);
+ __entry->state = state;
+ ),
+
+ TP_printk("%s state=0x%lx", __get_str(name),
+ (unsigned long)__entry->state)
+);
+
+DEFINE_EVENT(wakeup_source, wakeup_source_activate,
+
+ TP_PROTO(const char *name, unsigned int state),
+
+ TP_ARGS(name, state)
+);
+
+DEFINE_EVENT(wakeup_source, wakeup_source_deactivate,
+
+ TP_PROTO(const char *name, unsigned int state),
+
+ TP_ARGS(name, state)
+);
+
/* This code will be removed after deprecation time exceeded (2.6.41) */
#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED

Rafael J. Wysocki

Apr 22, 2012, 5:21:29 PM
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
From: Rafael J. Wysocki <r...@sisk.pl>

Currently, the device suspend code in drivers/base/power/main.c
checks whether any wakeup events have been reported, in which case
the ongoing system transition to a sleep state should be aborted,
only during the first (i.e. "suspend") device suspend phase. However,
wakeup events may be reported later as well, so it's reasonable to
look for them in the subsequent (i.e. "late suspend" and "suspend
noirq") phases too.
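
As a rough user-space model of the control flow this adds (not the kernel code itself), the loop below marks each device as suspended and then asks whether a wakeup event is pending, bailing out with -EBUSY if so. All names here (fake_dev, suspend_phase, never_pending, pending_after_two) are hypothetical stand-ins.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device on the dpm list. */
struct fake_dev {
	bool suspended;
};

/* Stand-ins for pm_wakeup_pending(); "done" is the number of devices handled. */
static bool never_pending(size_t done)
{
	(void)done;
	return false;
}

static bool pending_after_two(size_t done)
{
	return done >= 2;
}

/*
 * Model of one late/noirq suspend phase: handle devices in order and
 * stop as soon as a wakeup event is reported.
 * Returns 0 if all devices were handled, -EBUSY if the phase was aborted.
 */
static int suspend_phase(struct fake_dev *devs, size_t n,
			 bool (*wakeup_pending)(size_t done))
{
	size_t i;

	for (i = 0; i < n; i++) {
		devs[i].suspended = true; /* stands in for the device callback */
		if (wakeup_pending(i + 1))
			return -EBUSY;
	}
	return 0;
}
```

In this model the devices after the abort point are never touched, which is the point of the check: the transition stops early instead of suspending everything and only then discovering the wakeup event.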

Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
drivers/base/power/main.c | 10 ++++++++++
1 file changed, 10 insertions(+)

Index: linux/drivers/base/power/main.c
===================================================================
--- linux.orig/drivers/base/power/main.c
+++ linux/drivers/base/power/main.c
@@ -889,6 +889,11 @@ static int dpm_suspend_noirq(pm_message_
if (!list_empty(&dev->power.entry))
list_move(&dev->power.entry, &dpm_noirq_list);
put_device(dev);
+
+ if (pm_wakeup_pending()) {
+ error = -EBUSY;
+ break;
+ }
}
mutex_unlock(&dpm_list_mtx);
if (error)
@@ -962,6 +967,11 @@ static int dpm_suspend_late(pm_message_t
if (!list_empty(&dev->power.entry))
list_move(&dev->power.entry, &dpm_late_early_list);
put_device(dev);
+
+ if (pm_wakeup_pending()) {
+ error = -EBUSY;
+ break;
+ }
}
mutex_unlock(&dpm_list_mtx);
if (error)

Rafael J. Wysocki

Apr 22, 2012, 5:21:30 PM
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
From: Rafael J. Wysocki <r...@sisk.pl>

The current wakeup source deactivation code doesn't do anything when
the counter of wakeup events in progress goes down to zero, which
requires pm_get_wakeup_count() to poll that counter periodically.
Although this reduces the average time it takes to deactivate a
wakeup source, it also may lead to a substantial amount of unnecessary
polling if there are extended periods of wakeup activity. Thus it
seems reasonable to use a wait queue for signaling the "no wakeup
events in progress" condition and remove the polling.
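
A user-space analogy may help to picture the change; this is a sketch with POSIX threads, not the kernel wait queue API. A condition variable plays the role of wakeup_count_wait_queue: the waiter sleeps until the last in-progress event ends, instead of re-checking the counter every 100 ms. All names below are hypothetical.

```c
#include <pthread.h>

static pthread_mutex_t events_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t no_events = PTHREAD_COND_INITIALIZER;
static unsigned int in_progress; /* analogue of "inpr" */
static unsigned int registered;  /* analogue of "cnt" */

/* Analogue of activating a wakeup source. */
static void event_begin(void)
{
	pthread_mutex_lock(&events_lock);
	in_progress++;
	pthread_mutex_unlock(&events_lock);
}

/* Analogue of deactivating a wakeup source: wake waiters on reaching zero. */
static void event_end(void)
{
	pthread_mutex_lock(&events_lock);
	in_progress--;
	registered++;
	if (in_progress == 0)
		pthread_cond_broadcast(&no_events);
	pthread_mutex_unlock(&events_lock);
}

/*
 * Analogue of pm_get_wakeup_count(): block until no events are in
 * progress, then return the number of registered events.
 */
static unsigned int wait_for_no_events(void)
{
	unsigned int count;

	pthread_mutex_lock(&events_lock);
	while (in_progress != 0)
		pthread_cond_wait(&no_events, &events_lock);
	count = registered;
	pthread_mutex_unlock(&events_lock);
	return count;
}
```

The kernel version differs in that the counters are updated atomically without a lock and the sleep is interruptible by signals, but the wake-on-zero structure is the same idea.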

Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
drivers/base/power/wakeup.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -17,8 +17,6 @@

#include "power.h"

-#define TIMEOUT 100
-
/*
* If set, the suspend/hibernate code will abort transitions to a sleep state
* if wakeup events are registered during or immediately before the transition.
@@ -52,6 +50,8 @@ static void pm_wakeup_timer_fn(unsigned

static LIST_HEAD(wakeup_sources);

+static DECLARE_WAIT_QUEUE_HEAD(wakeup_count_wait_queue);
+
/**
* wakeup_source_prepare - Prepare a new wakeup source for initialization.
* @ws: Wakeup source to prepare.
@@ -442,6 +442,7 @@ EXPORT_SYMBOL_GPL(pm_stay_awake);
*/
static void wakeup_source_deactivate(struct wakeup_source *ws)
{
+ unsigned int cnt, inpr;
ktime_t duration;
ktime_t now;

@@ -476,6 +477,10 @@ static void wakeup_source_deactivate(str
* couter of wakeup events in progress simultaneously.
*/
atomic_add(MAX_IN_PROGRESS, &combined_event_count);
+
+ split_counters(&cnt, &inpr);
+ if (!inpr && waitqueue_active(&wakeup_count_wait_queue))
+ wake_up(&wakeup_count_wait_queue);
}

/**
@@ -667,14 +672,19 @@ bool pm_wakeup_pending(void)
bool pm_get_wakeup_count(unsigned int *count)
{
unsigned int cnt, inpr;
+ DEFINE_WAIT(wait);

for (;;) {
+ prepare_to_wait(&wakeup_count_wait_queue, &wait,
+ TASK_INTERRUPTIBLE);
split_counters(&cnt, &inpr);
if (inpr == 0 || signal_pending(current))
break;
pm_wakeup_update_hit_counts();
- schedule_timeout_interruptible(msecs_to_jiffies(TIMEOUT));
+
+ schedule();
}
+ finish_wait(&wakeup_count_wait_queue, &wait);

split_counters(&cnt, &inpr);
*count = cnt;

Rafael J. Wysocki

Apr 22, 2012, 5:21:48 PM
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
From: Rafael J. Wysocki <r...@sisk.pl>

Introduce a mechanism by which the kernel can trigger global
transitions to a sleep state chosen by user space if there are no
active wakeup sources.

It consists of a new sysfs attribute, /sys/power/autosleep, that
can be written one of the strings returned by reads from
/sys/power/state, an ordered workqueue and a work item carrying out
the "suspend" operations. If a string representing the system's
sleep state is written to /sys/power/autosleep, the work item
triggering transitions to that state is queued up and it requeues
itself after every execution until user space writes "off" to
/sys/power/autosleep.

That work item enables the detection of wakeup events using the
functions already defined in drivers/base/power/wakeup.c (with one
small modification) and calls either pm_suspend(), or hibernate() to
put the system into a sleep state. If a wakeup event is reported
while the transition is in progress, it will abort the transition and
the "system suspend" work item will be queued up again.
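
From user space, the interface described above could be driven roughly as follows. This is a sketch only; the helper name write_sleep_state() is hypothetical, and "mem" is assumed to be one of the strings that /sys/power/state reports on the system in question.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Write a sleep state string (e.g. "mem", "disk" or "off") to the given
 * attribute file. Returns 0 on success or a negative errno value.
 */
static int write_sleep_state(const char *path, const char *state)
{
	FILE *f = fopen(path, "w");
	int err = 0;

	if (!f)
		return -errno;
	if (fputs(state, f) == EOF)
		err = -EIO;
	/* The write reaches the kernel on flush, so check fclose() too. */
	if (fclose(f) == EOF && !err)
		err = -errno;
	return err;
}
```

A caller would enable autosleep with write_sleep_state("/sys/power/autosleep", "mem") and disable it again by writing "off" to the same file.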

Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
+ goto out;
+
+ mutex_lock(&autosleep_lock);
+
+ if (!pm_save_wakeup_count(initial_count)) {
+ mutex_unlock(&autosleep_lock);
+ goto out;
+ }
+
+ if (autosleep_state == PM_SUSPEND_ON) {
+ mutex_unlock(&autosleep_lock);
+ return;
+ }
+ if (autosleep_state >= PM_SUSPEND_MAX)
+ hibernate();
+ else
+ pm_suspend(autosleep_state);
+
+ mutex_unlock(&autosleep_lock);
+
+ if (!pm_get_wakeup_count(&final_count, false))
+ goto out;
+
+ return 0;
+}
+
+int __init pm_autosleep_init(void)
+{
+ autosleep_ws = wakeup_source_register("autosleep");
+ if (!autosleep_ws)
+ return -ENOMEM;
+
+ autosleep_wq = alloc_ordered_workqueue("autosleep", 0);
+ if (autosleep_wq)
+ return 0;
+
+ wakeup_source_unregister(autosleep_ws);
+ return -ENOMEM;
+}
+ error = -EBUSY;
+ goto out;
+ }
+
+ state = decode_state(buf, n);
+ if (state < PM_SUSPEND_MAX)
+ error = pm_suspend(state);
+ else if (state == PM_SUSPEND_MAX)
+ error = hibernate();
+ else
+ error = -EINVAL;
+
+ out:
+ pm_autosleep_unlock();
return error ? error : n;
}

@@ -339,7 +359,8 @@ static ssize_t wakeup_count_show(struct
{
unsigned int val;

- return pm_get_wakeup_count(&val) ? sprintf(buf, "%u\n", val) : -EINTR;
+ return pm_get_wakeup_count(&val, true) ?
+ sprintf(buf, "%u\n", val) : -EINTR;
}

static ssize_t wakeup_count_store(struct kobject *kobj,
@@ -347,15 +368,69 @@ static ssize_t wakeup_count_store(struct
const char *buf, size_t n)
{
unsigned int val;
+ int error;
+
+ error = pm_autosleep_lock();
+ if (error)
+ return error;
+
+ if (pm_autosleep_state() > PM_SUSPEND_ON) {
+ error = -EBUSY;
+ goto out;
+ }

if (sscanf(buf, "%u", &val) == 1) {
if (pm_save_wakeup_count(val))
return n;
}
- return -EINVAL;
+ error = -EINVAL;
+
+ out:
+ pm_autosleep_unlock();
+ return error;
}

power_attr(wakeup_count);
+
+#ifdef CONFIG_PM_AUTOSLEEP
+static ssize_t autosleep_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ suspend_state_t state = pm_autosleep_state();
+
+ if (state == PM_SUSPEND_ON)
+ return sprintf(buf, "off\n");
+
+#ifdef CONFIG_SUSPEND
+ if (state < PM_SUSPEND_MAX)
+ return sprintf(buf, "%s\n", valid_state(state) ?
+ pm_states[state] : "error");
+#endif
+#ifdef CONFIG_HIBERNATION
+ return sprintf(buf, "disk\n");
+#else
+ return sprintf(buf, "error");
+#endif
+}
+
+static ssize_t autosleep_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ suspend_state_t state = decode_state(buf, n);
+ int error;
+
+ if (state == PM_SUSPEND_ON && strncmp(buf, "off", 3)
+ && strncmp(buf, "off\n", 4))
+ return -EINVAL;
+
+ error = pm_autosleep_set_state(state);
+ return error ? error : n;
+}
+
+power_attr(autosleep);
+#endif /* CONFIG_PM_AUTOSLEEP */
#endif /* CONFIG_PM_SLEEP */

#ifdef CONFIG_PM_TRACE
@@ -409,6 +484,9 @@ static struct attribute * g[] = {
#ifdef CONFIG_PM_SLEEP
&pm_async_attr.attr,
&wakeup_count_attr.attr,
+#ifdef CONFIG_PM_AUTOSLEEP
+ &autosleep_attr.attr,
+#endif
#ifdef CONFIG_PM_DEBUG
&pm_test_attr.attr,
#endif
@@ -444,7 +522,10 @@ static int __init pm_init(void)
power_kobj = kobject_create_and_add("power", NULL);
if (!power_kobj)
return -ENOMEM;
- return sysfs_create_group(power_kobj, &attr_group);
+ error = sysfs_create_group(power_kobj, &attr_group);
+ if (error)
+ return error;
+ return pm_autosleep_init();
}

core_initcall(pm_init);
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -498,8 +498,10 @@ static void wakeup_source_deactivate(str
trace_wakeup_source_deactivate(ws->name, cec);

split_counters(&cnt, &inpr);
- if (!inpr && waitqueue_active(&wakeup_count_wait_queue))
+ if (!inpr && waitqueue_active(&wakeup_count_wait_queue)) {
wake_up(&wakeup_count_wait_queue);
+ queue_up_suspend_work();
+ }
}

/**
@@ -660,29 +662,33 @@ bool pm_wakeup_pending(void)
+ for (;;) {
+ prepare_to_wait(&wakeup_count_wait_queue, &wait,
+ TASK_INTERRUPTIBLE);
+ split_counters(&cnt, &inpr);
+ if (inpr == 0 || signal_pending(current))
+ break;
+
+ schedule();
+ }
+ finish_wait(&wakeup_count_wait_queue, &wait);
}
- finish_wait(&wakeup_count_wait_queue, &wait);

split_counters(&cnt, &inpr);
*count = cnt;
Index: linux/Documentation/ABI/testing/sysfs-power
===================================================================
--- linux.orig/Documentation/ABI/testing/sysfs-power
+++ linux/Documentation/ABI/testing/sysfs-power
@@ -172,3 +172,20 @@ Description:

Reading from this file will display the current value, which is
set to 1 MB by default.
+
+What: /sys/power/autosleep
+Date: February 2012
+Contact: Rafael J. Wysocki <r...@sisk.pl>
+Description:
+ The /sys/power/autosleep file can be written one of the strings
+ returned by reads from /sys/power/state. If that happens, a
+ work item attempting to trigger a transition of the system to
+ the sleep state represented by that string is queued up. This
+ attempt will only succeed if there are no active wakeup sources
+ in the system at that time. After every execution, regardless
+ of whether or not the attempt to put the system to sleep has
+ succeeded, the work item requeues itself until user space
+ writes "off" to /sys/power/autosleep.
+
+ Reading from this file causes the last string successfully
+ written to it to be displayed.

Rafael J. Wysocki

Apr 22, 2012, 5:21:57 PM
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
From: Rafael J. Wysocki <r...@sisk.pl>

Android uses one wakelock statistic that is only necessary for
opportunistic sleep. Namely, the prevent_suspend_time field
accumulates the total time the given wakelock has been locked
while "automatic suspend" was enabled. Add an analogous field,
prevent_sleep_time, to wakeup sources and make it behave in a similar
way.
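
The accounting rule can be modelled in a few lines of user-space C. This is a simplified sketch with millisecond integers instead of ktime_t and without locking; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the fields this patch adds to struct wakeup_source. */
struct ws_model {
	bool active;
	bool autosleep_enabled;
	int64_t start_prevent_ms;  /* when the current "preventing" interval began */
	int64_t prevent_sleep_ms;  /* accumulated total */
};

static void ws_activate(struct ws_model *ws, int64_t now)
{
	ws->active = true;
	if (ws->autosleep_enabled)
		ws->start_prevent_ms = now;
}

static void ws_deactivate(struct ws_model *ws, int64_t now)
{
	ws->active = false;
	if (ws->autosleep_enabled)
		ws->prevent_sleep_ms += now - ws->start_prevent_ms;
}

/* Model of toggling autosleep while the source may be active. */
static void ws_set_autosleep(struct ws_model *ws, bool set, int64_t now)
{
	if (ws->autosleep_enabled == set)
		return;
	ws->autosleep_enabled = set;
	if (!ws->active)
		return;
	if (set)
		ws->start_prevent_ms = now;
	else
		ws->prevent_sleep_ms += now - ws->start_prevent_ms;
}
```

Only the overlap between "source active" and "autosleep enabled" is counted: a source activated at t=100 with autosleep switched on at t=150 and deactivated at t=200 accumulates 50 ms, not 100 ms.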

Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
Documentation/ABI/testing/sysfs-devices-power | 11 ++++
drivers/base/power/sysfs.c | 24 ++++++++++
drivers/base/power/wakeup.c | 61 ++++++++++++++++++++++++--
include/linux/pm_wakeup.h | 4 +
include/linux/suspend.h | 1
kernel/power/autosleep.c | 6 ++
6 files changed, 102 insertions(+), 5 deletions(-)

Index: linux/include/linux/pm_wakeup.h
===================================================================
--- linux.orig/include/linux/pm_wakeup.h
+++ linux/include/linux/pm_wakeup.h
@@ -34,6 +34,7 @@
* @total_time: Total time this wakeup source has been active.
* @max_time: Maximum time this wakeup source has been continuously active.
* @last_time: Monotonic clock when the wakeup source's was touched last time.
+ * @prevent_sleep_time: Total time this source has been preventing autosleep.
* @event_count: Number of signaled wakeup events.
* @active_count: Number of times the wakeup sorce was activated.
* @relax_count: Number of times the wakeup sorce was deactivated.
@@ -51,12 +52,15 @@ struct wakeup_source {
ktime_t total_time;
ktime_t max_time;
ktime_t last_time;
+ ktime_t start_prevent_time;
+ ktime_t prevent_sleep_time;
unsigned long event_count;
unsigned long active_count;
unsigned long relax_count;
unsigned long expire_count;
unsigned long wakeup_count;
bool active:1;
+ bool autosleep_enabled:1;
};

#ifdef CONFIG_PM_SLEEP
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -380,6 +380,8 @@ static void wakeup_source_activate(struc
ws->active = true;
ws->active_count++;
ws->last_time = ktime_get();
+ if (ws->autosleep_enabled)
+ ws->start_prevent_time = ws->last_time;

/* Increment the counter of events in progress. */
cec = atomic_inc_return(&combined_event_count);
@@ -449,6 +451,17 @@ void pm_stay_awake(struct device *dev)
}
EXPORT_SYMBOL_GPL(pm_stay_awake);

+#ifdef CONFIG_PM_AUTOSLEEP
+static void update_prevent_sleep_time(struct wakeup_source *ws, ktime_t now)
+{
+ ktime_t delta = ktime_sub(now, ws->start_prevent_time);
+ ws->prevent_sleep_time = ktime_add(ws->prevent_sleep_time, delta);
+}
+#else
+static inline void update_prevent_sleep_time(struct wakeup_source *ws,
+ ktime_t now) {}
+#endif
+
/**
* wakup_source_deactivate - Mark given wakeup source as inactive.
* @ws: Wakeup source to handle.
@@ -490,6 +503,9 @@ static void wakeup_source_deactivate(str
del_timer(&ws->timer);
ws->timer_expires = 0;

+ if (ws->autosleep_enabled)
+ update_prevent_sleep_time(ws, now);
+
/*
* Increment the counter of registered wakeup events and decrement the
* couter of wakeup events in progress simultaneously.
@@ -720,6 +736,34 @@ bool pm_save_wakeup_count(unsigned int c
return events_check_enabled;
}

+#ifdef CONFIG_PM_AUTOSLEEP
+/**
+ * pm_wakep_autosleep_enabled - Modify autosleep_enabled for all wakeup sources.
+ * @set: Whether to set or to clear the autosleep_enabled flags.
+ */
+void pm_wakep_autosleep_enabled(bool set)
+{
+ struct wakeup_source *ws;
+ ktime_t now = ktime_get();
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(ws, &wakeup_sources, entry) {
+ spin_lock_irq(&ws->lock);
+ if (ws->autosleep_enabled != set) {
+ ws->autosleep_enabled = set;
+ if (ws->active) {
+ if (set)
+ ws->start_prevent_time = now;
+ else
+ update_prevent_sleep_time(ws, now);
+ }
+ }
+ spin_unlock_irq(&ws->lock);
+ }
+ rcu_read_unlock();
+}
+#endif /* CONFIG_PM_AUTOSLEEP */
+
static struct dentry *wakeup_sources_stats_dentry;

/**
@@ -735,28 +779,37 @@ static int print_wakeup_source_stats(str
ktime_t max_time;
unsigned long active_count;
ktime_t active_time;
+ ktime_t prevent_sleep_time;
int ret;

spin_lock_irqsave(&ws->lock, flags);

total_time = ws->total_time;
max_time = ws->max_time;
+ prevent_sleep_time = ws->prevent_sleep_time;
active_count = ws->active_count;
if (ws->active) {
- active_time = ktime_sub(ktime_get(), ws->last_time);
+ ktime_t now = ktime_get();
+
+ active_time = ktime_sub(now, ws->last_time);
total_time = ktime_add(total_time, active_time);
if (active_time.tv64 > max_time.tv64)
max_time = active_time;
+
+ if (ws->autosleep_enabled)
+ prevent_sleep_time = ktime_add(prevent_sleep_time,
+ ktime_sub(now, ws->start_prevent_time));
} else {
active_time = ktime_set(0, 0);
}

ret = seq_printf(m, "%-12s\t%lu\t\t%lu\t\t%lu\t\t%lu\t\t"
- "%lld\t\t%lld\t\t%lld\t\t%lld\n",
+ "%lld\t\t%lld\t\t%lld\t\t%lld\t\t%lld\n",
ws->name, active_count, ws->event_count,
ws->wakeup_count, ws->expire_count,
ktime_to_ms(active_time), ktime_to_ms(total_time),
- ktime_to_ms(max_time), ktime_to_ms(ws->last_time));
+ ktime_to_ms(max_time), ktime_to_ms(ws->last_time),
+ ktime_to_ms(prevent_sleep_time));

spin_unlock_irqrestore(&ws->lock, flags);

@@ -773,7 +826,7 @@ static int wakeup_sources_stats_show(str

seq_puts(m, "name\t\tactive_count\tevent_count\twakeup_count\t"
"expire_count\tactive_since\ttotal_time\tmax_time\t"
- "last_change\n");
+ "last_change\tprevent_suspend_time\n");

rcu_read_lock();
list_for_each_entry_rcu(ws, &wakeup_sources, entry)
Index: linux/include/linux/suspend.h
===================================================================
--- linux.orig/include/linux/suspend.h
+++ linux/include/linux/suspend.h
@@ -358,6 +358,7 @@ extern bool events_check_enabled;
extern bool pm_wakeup_pending(void);
extern bool pm_get_wakeup_count(unsigned int *count, bool block);
extern bool pm_save_wakeup_count(unsigned int count);
+extern void pm_wakep_autosleep_enabled(bool set);

static inline void lock_system_sleep(void)
{
Index: linux/drivers/base/power/sysfs.c
===================================================================
--- linux.orig/drivers/base/power/sysfs.c
+++ linux/drivers/base/power/sysfs.c
@@ -417,6 +417,27 @@ static ssize_t wakeup_last_time_show(str
}

static DEVICE_ATTR(wakeup_last_time_ms, 0444, wakeup_last_time_show, NULL);
+
+#ifdef CONFIG_PM_AUTOSLEEP
+static ssize_t wakeup_prevent_sleep_time_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ s64 msec = 0;
+ bool enabled = false;
+
+ spin_lock_irq(&dev->power.lock);
+ if (dev->power.wakeup) {
+ msec = ktime_to_ms(dev->power.wakeup->prevent_sleep_time);
+ enabled = true;
+ }
+ spin_unlock_irq(&dev->power.lock);
+ return enabled ? sprintf(buf, "%lld\n", msec) : sprintf(buf, "\n");
+}
+
+static DEVICE_ATTR(wakeup_prevent_sleep_time_ms, 0444,
+ wakeup_prevent_sleep_time_show, NULL);
+#endif /* CONFIG_PM_AUTOSLEEP */
#endif /* CONFIG_PM_SLEEP */

#ifdef CONFIG_PM_ADVANCED_DEBUG
@@ -511,6 +532,9 @@ static struct attribute *wakeup_attrs[]
&dev_attr_wakeup_total_time_ms.attr,
&dev_attr_wakeup_max_time_ms.attr,
&dev_attr_wakeup_last_time_ms.attr,
+#ifdef CONFIG_PM_AUTOSLEEP
+ &dev_attr_wakeup_prevent_sleep_time_ms.attr,
+#endif
#endif
NULL,
};
Index: linux/Documentation/ABI/testing/sysfs-devices-power
===================================================================
--- linux.orig/Documentation/ABI/testing/sysfs-devices-power
+++ linux/Documentation/ABI/testing/sysfs-devices-power
@@ -158,6 +158,17 @@ Description:
not enabled to wake up the system from sleep states, this
attribute is not present.

+What: /sys/devices/.../power/wakeup_prevent_sleep_time_ms
+Date: February 2012
+Contact: Rafael J. Wysocki <r...@sisk.pl>
+Description:
+ The /sys/devices/.../power/wakeup_prevent_sleep_time_ms attribute
+ contains the total time the device has been preventing
+ opportunistic transitions to sleep states from occurring.
+ This attribute is read-only. If the device is not enabled to
+ wake up the system from sleep states, this attribute is not
+ present.
+
What: /sys/devices/.../power/autosuspend_delay_ms
Date: September 2010
Contact: Alan Stern <st...@rowland.harvard.edu>
Index: linux/kernel/power/autosleep.c
===================================================================
--- linux.orig/kernel/power/autosleep.c
+++ linux/kernel/power/autosleep.c
@@ -91,8 +91,12 @@ int pm_autosleep_set_state(suspend_state

__pm_relax(autosleep_ws);

- if (state > PM_SUSPEND_ON)
+ if (state > PM_SUSPEND_ON) {
+ pm_wakep_autosleep_enabled(true);
queue_up_suspend_work();
+ } else {
+ pm_wakep_autosleep_enabled(false);
+ }

mutex_unlock(&autosleep_lock);
return 0;

Rafael J. Wysocki

Apr 22, 2012, 5:22:11 PM
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
From: Arve Hjønnevåg <ar...@android.com>

When an epoll_event that has the EPOLLWAKEUP flag set is ready, a
wakeup_source will be active to prevent suspend. This can be used to
handle wakeup events from a driver that supports poll, e.g. input, if
that driver wakes up the waitqueue passed to epoll before allowing
suspend.

The current implementation uses an extra wakeup_source when
ep_scan_ready_list runs. This can cause problems if a single thread
is polling on wakeup events and frequent non-wakeup events (events
usually arrive during thread freezing) using the same epoll file.

Signed-off-by: Arve Hjønnevåg <ar...@android.com>
Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
fs/eventpoll.c | 71 ++++++++++++++++++++++++++++++++++++++++++++--
include/linux/eventpoll.h | 6 +++
2 files changed, 74 insertions(+), 3 deletions(-)

Index: linux/fs/eventpoll.c
===================================================================
--- linux.orig/fs/eventpoll.c
+++ linux/fs/eventpoll.c
@@ -33,6 +33,7 @@
#include <linux/bitops.h>
#include <linux/mutex.h>
#include <linux/anon_inodes.h>
+#include <linux/device.h>
#include <asm/uaccess.h>
#include <asm/io.h>
#include <asm/mman.h>
@@ -87,7 +88,7 @@
*/

/* Epoll private bits inside the event mask */
-#define EP_PRIVATE_BITS (EPOLLONESHOT | EPOLLET)
+#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET)

/* Maximum number of nesting allowed inside epoll sets */
#define EP_MAX_NESTS 4
@@ -154,6 +155,9 @@ struct epitem {
/* List header used to link this item to the "struct file" items list */
struct list_head fllink;

+ /* wakeup_source used when EPOLLWAKEUP is set */
+ struct wakeup_source *ws;
+
/* The structure that describe the interested events and the source fd */
struct epoll_event event;
};
@@ -194,6 +198,9 @@ struct eventpoll {
*/
struct epitem *ovflist;

+ /* wakeup_source used when ep_scan_ready_list is running */
+ struct wakeup_source *ws;
+
/* The user that created the eventpoll descriptor */
struct user_struct *user;

@@ -565,6 +572,7 @@ static int ep_scan_ready_list(struct eve
* in a lockless way.
*/
spin_lock_irqsave(&ep->lock, flags);
+ __pm_stay_awake(ep->ws);
list_splice_init(&ep->rdllist, &txlist);
ep->ovflist = NULL;
spin_unlock_irqrestore(&ep->lock, flags);
@@ -588,8 +596,10 @@ static int ep_scan_ready_list(struct eve
* queued into ->ovflist but the "txlist" might already
* contain them, and the list_splice() below takes care of them.
*/
- if (!ep_is_linked(&epi->rdllink))
+ if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);
+ }
}
/*
* We need to set back ep->ovflist to EP_UNACTIVE_PTR, so that after
@@ -602,6 +612,7 @@ static int ep_scan_ready_list(struct eve
* Quickly re-inject items left on "txlist".
*/
list_splice(&txlist, &ep->rdllist);
+ __pm_relax(ep->ws);

if (!list_empty(&ep->rdllist)) {
/*
@@ -656,6 +667,9 @@ static int ep_remove(struct eventpoll *e
list_del_init(&epi->rdllink);
spin_unlock_irqrestore(&ep->lock, flags);

+ if (epi->ws)
+ wakeup_source_unregister(epi->ws);
+
/* At this point it is safe to free the eventpoll item */
kmem_cache_free(epi_cache, epi);

@@ -706,6 +720,8 @@ static void ep_free(struct eventpoll *ep
mutex_unlock(&epmutex);
mutex_destroy(&ep->mtx);
free_uid(ep->user);
+ if (ep->ws)
+ wakeup_source_unregister(ep->ws);
kfree(ep);
}

@@ -737,6 +753,7 @@ static int ep_read_events_proc(struct ev
* callback, but it's not actually ready, as far as
* caller requested events goes. We can remove it here.
*/
+ __pm_relax(epi->ws);
list_del_init(&epi->rdllink);
}
}
@@ -932,8 +949,10 @@ static int ep_poll_callback(wait_queue_t
}

/* If this file is already in the ready list we exit soon */
- if (!ep_is_linked(&epi->rdllink))
+ if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);
+ }

/*
* Wake up ( if active ) both the eventpoll wait list and the ->poll()
@@ -1091,6 +1110,30 @@ static int reverse_path_check(void)
return error;
}

+static int ep_create_wakeup_source(struct epitem *epi)
+{
+ const char *name;
+
+ if (!epi->ep->ws) {
+ epi->ep->ws = wakeup_source_register("eventpoll");
+ if (!epi->ep->ws)
+ return -ENOMEM;
+ }
+
+ name = epi->ffd.file->f_path.dentry->d_name.name;
+ epi->ws = wakeup_source_register(name);
+ if (!epi->ws)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static void ep_destroy_wakeup_source(struct epitem *epi)
+{
+ wakeup_source_unregister(epi->ws);
+ epi->ws = NULL;
+}
+
/*
* Must be called with "mtx" held.
*/
@@ -1118,6 +1161,13 @@ static int ep_insert(struct eventpoll *e
epi->event = *event;
epi->nwait = 0;
epi->next = EP_UNACTIVE_PTR;
+ if (epi->event.events & EPOLLWAKEUP) {
+ error = ep_create_wakeup_source(epi);
+ if (error)
+ goto error_create_wakeup_source;
+ } else {
+ epi->ws = NULL;
+ }

/* Initialize the poll table using the queue callback */
epq.epi = epi;
@@ -1164,6 +1214,7 @@ static int ep_insert(struct eventpoll *e
/* If the file is already "ready" we drop it inside the ready list */
if ((revents & event->events) && !ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);

/* Notify waiting tasks that events are available */
if (waitqueue_active(&ep->wq))
@@ -1204,6 +1255,10 @@ error_unregister:
list_del_init(&epi->rdllink);
spin_unlock_irqrestore(&ep->lock, flags);

+ if (epi->ws)
+ wakeup_source_unregister(epi->ws);
+
+error_create_wakeup_source:
kmem_cache_free(epi_cache, epi);

return error;
@@ -1229,6 +1284,12 @@ static int ep_modify(struct eventpoll *e
epi->event.events = event->events;
pt._key = event->events;
epi->event.data = event->data; /* protected by mtx */
+ if (epi->event.events & EPOLLWAKEUP) {
+ if (!epi->ws)
+ ep_create_wakeup_source(epi);
+ } else if (epi->ws) {
+ ep_destroy_wakeup_source(epi);
+ }

/*
* Get current event bits. We can safely use the file* here because
@@ -1244,6 +1305,7 @@ static int ep_modify(struct eventpoll *e
spin_lock_irq(&ep->lock);
if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);

/* Notify waiting tasks that events are available */
if (waitqueue_active(&ep->wq))
@@ -1282,6 +1344,7 @@ static int ep_send_events_proc(struct ev
!list_empty(head) && eventcnt < esed->maxevents;) {
epi = list_first_entry(head, struct epitem, rdllink);

+ __pm_relax(epi->ws);
list_del_init(&epi->rdllink);

pt._key = epi->event.events;
@@ -1298,6 +1361,7 @@ static int ep_send_events_proc(struct ev
if (__put_user(revents, &uevent->events) ||
__put_user(epi->event.data, &uevent->data)) {
list_add(&epi->rdllink, head);
+ __pm_stay_awake(epi->ws);
return eventcnt ? eventcnt : -EFAULT;
}
eventcnt++;
@@ -1317,6 +1381,7 @@ static int ep_send_events_proc(struct ev
* poll callback will queue them in ep->ovflist.
*/
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);
}
}
}
Index: linux/include/linux/eventpoll.h
===================================================================
--- linux.orig/include/linux/eventpoll.h
+++ linux/include/linux/eventpoll.h
@@ -26,6 +26,12 @@
#define EPOLL_CTL_DEL 2
#define EPOLL_CTL_MOD 3

+/*
+ * Request the handling of system wakeup events so as to prevent automatic
+ * system suspends from happening while those events are being processed.
+ */
+#define EPOLLWAKEUP (1 << 29)
+
/* Set the One Shot behaviour for the target file descriptor */
#define EPOLLONESHOT (1 << 30)
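
For illustration only, here is a minimal user space sketch of how the new flag might be requested. It assumes that the item's wakeup source is activated while the item sits on the ready list, as the hunks above suggest. The helper name epollwakeup_demo() is hypothetical and the fallback define simply repeats the value from this patch. The sketch only checks that the event is delivered; it does not demonstrate anything about suspend behavior.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Fallback for older C library headers; the value matches the patch above. */
#ifndef EPOLLWAKEUP
#define EPOLLWAKEUP (1u << 29)
#endif

/*
 * Hypothetical helper: watch the read end of a pipe with EPOLLWAKEUP set,
 * make it readable and return the number of ready events reported by
 * epoll_wait() (1 is expected), or -1 on any error.
 */
int epollwakeup_demo(void)
{
	struct epoll_event ev = { 0 }, out = { 0 };
	int pipefd[2];
	int epfd, n = -1;

	if (pipe(pipefd) < 0)
		return -1;

	epfd = epoll_create1(0);
	if (epfd < 0)
		goto close_pipe;

	ev.events = EPOLLIN | EPOLLWAKEUP;
	ev.data.fd = pipefd[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)
		goto close_epoll;

	/* Make the watched descriptor ready ... */
	if (write(pipefd[1], "x", 1) != 1)
		goto close_epoll;

	/* ... and collect the event (1 second timeout as a safety net). */
	n = epoll_wait(epfd, &out, 1, 1000);

close_epoll:
	close(epfd);
close_pipe:
	close(pipefd[0]);
	close(pipefd[1]);
	return n;
}
```

With the patch applied, the per-item wakeup source would presumably stay active from the moment the pipe becomes readable until epoll_wait() hands the event to user space.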


Rafael J. Wysocki

Apr 22, 2012, 5:22:25 PM4/22/12
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
From: Rafael J. Wysocki <r...@sisk.pl>

Android allows user space to manipulate wakelocks using two
sysfs files located in /sys/power/, wake_lock and wake_unlock.
Writing a wakelock name and optionally a timeout to the wake_lock
file causes the wakelock whose name was written to be acquired (it
is created first if necessary), optionally with the given timeout.
Writing the name of a wakelock to wake_unlock causes that wakelock
to be released.

Implement an analogous interface for user space using wakeup sources.
Add the /sys/power/wake_lock and /sys/power/wake_unlock files
allowing user space to create, activate and deactivate wakeup
sources, such that writing a name and optionally a timeout to
wake_lock causes the wakeup source of that name to be activated,
optionally with the given timeout. If that wakeup source doesn't
exist, it will be created and then activated. Writing a name to
wake_unlock causes the wakeup source of that name, if there is one,
to be deactivated. Wakeup sources created with the help of
wake_lock that haven't been used for more than 5 minutes are garbage
collected and destroyed. Moreover, there can be only WL_NUMBER_LIMIT
wakeup sources created with the help of wake_lock present at a time.

The data type used to track wakeup sources created by user space is
called "struct wakelock" to indicate the origins of this feature.

Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
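As a rough illustration of the interface described above, the following user space sketch builds the string to be written to /sys/power/wake_lock and writes it out. The helper names wake_lock_format() and sysfs_write() are hypothetical and not part of this patch; the only things taken from the description are the "name[ timeout]" format and the file paths.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the string to write to /sys/power/wake_lock.
 * A zero timeout_ns produces just the name (no timeout requested).
 * Returns the length of the string, or -1 if the name is empty, contains
 * white space, or the result does not fit into the buffer.
 */
int wake_lock_format(char *buf, size_t size, const char *name,
		     uint64_t timeout_ns)
{
	int n;

	/* The name must be non-empty and must not contain white space. */
	if (!name[0] || strpbrk(name, " \t\n\v\f\r"))
		return -1;

	if (timeout_ns)
		n = snprintf(buf, size, "%s %" PRIu64, name, timeout_ns);
	else
		n = snprintf(buf, size, "%s", name);

	return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Hypothetical helper: write a string to a sysfs-style file. */
int sysfs_write(const char *path, const char *str)
{
	FILE *f = fopen(path, "w");
	int ret;

	if (!f)
		return -1;

	ret = (fputs(str, f) < 0) ? -1 : 0;
	if (fclose(f) != 0)
		ret = -1;

	return ret;
}
```

A caller would then do something like sysfs_write("/sys/power/wake_lock", buf) to activate the source and sysfs_write("/sys/power/wake_unlock", "mylock") to deactivate it, assuming CONFIG_PM_WAKELOCKS is set and the caller is allowed to write to those files.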
Documentation/ABI/testing/sysfs-power | 42 ++++++
drivers/base/power/wakeup.c | 1
kernel/power/Kconfig | 8 +
kernel/power/Makefile | 1
kernel/power/main.c | 41 ++++++
kernel/power/power.h | 9 +
kernel/power/wakelock.c | 218 ++++++++++++++++++++++++++++++++++
7 files changed, 320 insertions(+)

Index: linux/kernel/power/main.c
===================================================================
--- linux.orig/kernel/power/main.c
+++ linux/kernel/power/main.c
@@ -431,6 +431,43 @@ static ssize_t autosleep_store(struct ko

power_attr(autosleep);
#endif /* CONFIG_PM_AUTOSLEEP */
+
+#ifdef CONFIG_PM_WAKELOCKS
+static ssize_t wake_lock_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ return pm_show_wakelocks(buf, true);
+}
+
+static ssize_t wake_lock_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ int error = pm_wake_lock(buf);
+ return error ? error : n;
+}
+
+power_attr(wake_lock);
+
+static ssize_t wake_unlock_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ return pm_show_wakelocks(buf, false);
+}
+
+static ssize_t wake_unlock_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ int error = pm_wake_unlock(buf);
+ return error ? error : n;
+}
+
+power_attr(wake_unlock);
+
+#endif /* CONFIG_PM_WAKELOCKS */
#endif /* CONFIG_PM_SLEEP */

#ifdef CONFIG_PM_TRACE
@@ -487,6 +524,10 @@ static struct attribute * g[] = {
#ifdef CONFIG_PM_AUTOSLEEP
&autosleep_attr.attr,
#endif
+#ifdef CONFIG_PM_WAKELOCKS
+ &wake_lock_attr.attr,
+ &wake_unlock_attr.attr,
+#endif
#ifdef CONFIG_PM_DEBUG
&pm_test_attr.attr,
#endif
Index: linux/kernel/power/power.h
===================================================================
--- linux.orig/kernel/power/power.h
+++ linux/kernel/power/power.h
@@ -282,3 +282,12 @@ static inline void pm_autosleep_unlock(v
static inline suspend_state_t pm_autosleep_state(void) { return PM_SUSPEND_ON; }

#endif /* !CONFIG_PM_AUTOSLEEP */
+
+#ifdef CONFIG_PM_WAKELOCKS
+
+/* kernel/power/wakelock.c */
+extern ssize_t pm_show_wakelocks(char *buf, bool show_active);
+extern int pm_wake_lock(const char *buf);
+extern int pm_wake_unlock(const char *buf);
+
+#endif /* CONFIG_PM_WAKELOCKS */
Index: linux/kernel/power/Kconfig
===================================================================
--- linux.orig/kernel/power/Kconfig
+++ linux/kernel/power/Kconfig
@@ -111,6 +111,14 @@ config PM_AUTOSLEEP
Allow the kernel to trigger a system transition into a global sleep
state automatically whenever there are no active wakeup sources.

+config PM_WAKELOCKS
+ bool "User space wakeup sources interface"
+ depends on PM_SLEEP
+ default n
+ ---help---
+ Allow user space to create, activate and deactivate wakeup source
+ objects with the help of a sysfs-based interface.
+
config PM_RUNTIME
bool "Run-time PM core functionality"
depends on !IA64_HP_SIM
Index: linux/kernel/power/wakelock.c
===================================================================
--- /dev/null
+++ linux/kernel/power/wakelock.c
@@ -0,0 +1,218 @@
+/*
+ * kernel/power/wakelock.c
+ *
+ * User space wakeup sources support.
+ *
+ * Copyright (C) 2012 Rafael J. Wysocki <r...@sisk.pl>
+ *
+ * This code is based on the analogous interface allowing user space to
+ * manipulate wakelocks on Android.
+ */
+
+#include <linux/ctype.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/hrtimer.h>
+#include <linux/list.h>
+#include <linux/rbtree.h>
+#include <linux/slab.h>
+
+#define WL_NUMBER_LIMIT 100
+#define WL_GC_COUNT_MAX 100
+#define WL_GC_TIME_SEC 300
+
+static DEFINE_MUTEX(wakelocks_lock);
+
+struct wakelock {
+ char *name;
+ struct rb_node node;
+ struct wakeup_source ws;
+ struct list_head lru;
+};
+
+static struct rb_root wakelocks_tree = RB_ROOT;
+static LIST_HEAD(wakelocks_lru_list);
+static unsigned int number_of_wakelocks;
+static unsigned int wakelocks_gc_count;
+
+ssize_t pm_show_wakelocks(char *buf, bool show_active)
+{
+ struct rb_node *node;
+ struct wakelock *wl;
+ char *str = buf;
+ char *end = buf + PAGE_SIZE;
+
+ mutex_lock(&wakelocks_lock);
+
+ for (node = rb_first(&wakelocks_tree); node; node = rb_next(node)) {
+ bool active;
+
+ wl = rb_entry(node, struct wakelock, node);
+ spin_lock_irq(&wl->ws.lock);
+ active = wl->ws.active;
+ spin_unlock_irq(&wl->ws.lock);
+ if (active == show_active)
+ str += scnprintf(str, end - str, "%s ", wl->name);
+ }
+ str += scnprintf(str, end - str, "\n");
+
+ mutex_unlock(&wakelocks_lock);
+ return (str - buf);
+}
+
+static struct wakelock *wakelock_lookup_add(const char *name, size_t len,
+ bool add_if_not_found)
+{
+ struct rb_node **node = &wakelocks_tree.rb_node;
+ struct rb_node *parent = *node;
+ struct wakelock *wl;
+
+ while (*node) {
+ int diff;
+
+ /* Record the parent before descending to one of its children. */
+ parent = *node;
+ wl = rb_entry(*node, struct wakelock, node);
+ diff = strncmp(name, wl->name, len);
+ if (diff == 0) {
+ if (wl->name[len])
+ diff = -1;
+ else
+ return wl;
+ }
+ if (diff < 0)
+ node = &(*node)->rb_left;
+ else
+ node = &(*node)->rb_right;
+ }
+ if (!add_if_not_found)
+ return ERR_PTR(-EINVAL);
+
+ if (number_of_wakelocks > WL_NUMBER_LIMIT)
+ return ERR_PTR(-ENOSPC);
+
+ /* Not found, we have to add a new one. */
+ wl = kzalloc(sizeof(*wl), GFP_KERNEL);
+ if (!wl)
+ return ERR_PTR(-ENOMEM);
+
+ wl->name = kstrndup(name, len, GFP_KERNEL);
+ if (!wl->name) {
+ kfree(wl);
+ return ERR_PTR(-ENOMEM);
+ }
+ wl->ws.name = wl->name;
+ wakeup_source_add(&wl->ws);
+ rb_link_node(&wl->node, parent, node);
+ rb_insert_color(&wl->node, &wakelocks_tree);
+ list_add(&wl->lru, &wakelocks_lru_list);
+ number_of_wakelocks++;
+ return wl;
+}
+
+int pm_wake_lock(const char *buf)
+{
+ const char *str = buf;
+ struct wakelock *wl;
+ u64 timeout_ns = 0;
+ size_t len;
+ int ret = 0;
+
+ while (*str && !isspace(*str))
+ str++;
+
+ len = str - buf;
+ if (!len)
+ return -EINVAL;
+
+ if (*str && *str != '\n') {
+ /* Find out if there's a valid timeout string appended. */
+ ret = kstrtou64(skip_spaces(str), 10, &timeout_ns);
+ if (ret)
+ return -EINVAL;
+ }
+
+ mutex_lock(&wakelocks_lock);
+
+ wl = wakelock_lookup_add(buf, len, true);
+ if (IS_ERR(wl)) {
+ ret = PTR_ERR(wl);
+ goto out;
+ }
+ if (timeout_ns) {
+ u64 timeout_ms = timeout_ns + NSEC_PER_MSEC - 1;
+
+ do_div(timeout_ms, NSEC_PER_MSEC);
+ __pm_wakeup_event(&wl->ws, timeout_ms);
+ } else {
+ __pm_stay_awake(&wl->ws);
+ }
+
+ list_move(&wl->lru, &wakelocks_lru_list);
+
+ out:
+ mutex_unlock(&wakelocks_lock);
+ return ret;
+}
+
+static void wakelocks_gc(void)
+{
+ struct wakelock *wl, *aux;
+ ktime_t now = ktime_get();
+
+ list_for_each_entry_safe_reverse(wl, aux, &wakelocks_lru_list, lru) {
+ u64 idle_time_ns;
+ bool active;
+
+ spin_lock_irq(&wl->ws.lock);
+ idle_time_ns = ktime_to_ns(ktime_sub(now, wl->ws.last_time));
+ active = wl->ws.active;
+ spin_unlock_irq(&wl->ws.lock);
+
+ if (idle_time_ns < ((u64)WL_GC_TIME_SEC * NSEC_PER_SEC))
+ break;
+
+ if (!active) {
+ wakeup_source_remove(&wl->ws);
+ rb_erase(&wl->node, &wakelocks_tree);
+ list_del(&wl->lru);
+ kfree(wl->name);
+ kfree(wl);
+ number_of_wakelocks--;
+ }
+ }
+ wakelocks_gc_count = 0;
+}
+
+int pm_wake_unlock(const char *buf)
+{
+ struct wakelock *wl;
+ size_t len;
+ int ret = 0;
+
+ len = strlen(buf);
+ if (!len)
+ return -EINVAL;
+
+ if (buf[len-1] == '\n')
+ len--;
+
+ if (!len)
+ return -EINVAL;
+
+ mutex_lock(&wakelocks_lock);
+
+ wl = wakelock_lookup_add(buf, len, false);
+ if (IS_ERR(wl)) {
+ ret = PTR_ERR(wl);
+ goto out;
+ }
+ __pm_relax(&wl->ws);
+ list_move(&wl->lru, &wakelocks_lru_list);
+ if (++wakelocks_gc_count > WL_GC_COUNT_MAX)
+ wakelocks_gc();
+
+ out:
+ mutex_unlock(&wakelocks_lock);
+ return ret;
+}
Index: linux/kernel/power/Makefile
===================================================================
--- linux.orig/kernel/power/Makefile
+++ linux/kernel/power/Makefile
@@ -10,5 +10,6 @@ obj-$(CONFIG_PM_TEST_SUSPEND) += suspend
obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o \
block_io.o
obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o
+obj-$(CONFIG_PM_WAKELOCKS) += wakelock.o

obj-$(CONFIG_MAGIC_SYSRQ) += poweroff.o
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -133,6 +133,7 @@ void wakeup_source_add(struct wakeup_sou
spin_lock_init(&ws->lock);
setup_timer(&ws->timer, pm_wakeup_timer_fn, (unsigned long)ws);
ws->active = false;
+ ws->last_time = ktime_get();

spin_lock_irq(&events_lock);
list_add_rcu(&ws->entry, &wakeup_sources);
Index: linux/Documentation/ABI/testing/sysfs-power
===================================================================
--- linux.orig/Documentation/ABI/testing/sysfs-power
+++ linux/Documentation/ABI/testing/sysfs-power
@@ -189,3 +189,45 @@ Description:

Reading from this file causes the last string successfully
written to it to be displayed.
+
+What: /sys/power/wake_lock
+Date: February 2012
+Contact: Rafael J. Wysocki <r...@sisk.pl>
+Description:
+ The /sys/power/wake_lock file allows user space to create
+ wakeup source objects and activate them on demand (if one of
+ those wakeup sources is active, reads from the
+ /sys/power/wakeup_count file block or return false). When a
+ string without white space is written to /sys/power/wake_lock,
+ it will be assumed to represent a wakeup source name. If there
+ is a wakeup source object with that name, it will be activated
+ (unless active already). Otherwise, a new wakeup source object
+ will be registered, assigned the given name and activated.
+ If a string written to /sys/power/wake_lock contains white
+ space, the part of the string preceding the white space will be
+ regarded as a wakeup source name and handled as described above.
+ The other part of the string will be regarded as a timeout (in
+ nanoseconds) such that the wakeup source will be automatically
+ deactivated after it has expired. The timeout, if present, is
+ set regardless of the current state of the wakeup source object
+ in question.
+
+ Reads from this file return a string consisting of the names of
+ wakeup sources created with the help of it that are active at
+ the moment, separated with spaces.
+
+
+What: /sys/power/wake_unlock
+Date: February 2012
+Contact: Rafael J. Wysocki <r...@sisk.pl>
+Description:
+ The /sys/power/wake_unlock file allows user space to deactivate
+ wakeup sources created with the help of /sys/power/wake_lock.
+ When a string is written to /sys/power/wake_unlock, it will be
+ assumed to represent the name of a wakeup source to deactivate.
+ If a wakeup source object of that name exists and is active at
+ the moment, it will be deactivated.
+
+ Reads from this file return a string consisting of the names of
+ wakeup sources created with the help of /sys/power/wake_lock
+ that are inactive at the moment, separated with spaces.
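
To make the handling of the timeout more concrete, below is a user space model of the parsing done by pm_wake_lock() in this patch: the name ends at the first white space character and the optional timeout, given in nanoseconds, is rounded up to whole milliseconds. The function name parse_wake_lock() is hypothetical and strtoull() merely stands in for kstrtou64(), so corner cases may differ from the kernel code. The rounding is written as a division plus a remainder check, which gives the same result as the patch for ordinary values while avoiding overflow for very large ones.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define NSEC_PER_MSEC 1000000ULL

/*
 * Hypothetical model of the parsing in pm_wake_lock(): split "name[ timeout]"
 * into the name length and a timeout in milliseconds (0 means no timeout).
 * Returns 0 on success and -1 if the input is not acceptable.
 */
int parse_wake_lock(const char *buf, size_t *name_len, uint64_t *timeout_ms)
{
	const char *str = buf;
	unsigned long long ns;
	char *end;

	/* The name extends up to the first white space character. */
	while (*str && !isspace((unsigned char)*str))
		str++;

	*name_len = (size_t)(str - buf);
	if (!*name_len)
		return -1;

	*timeout_ms = 0;
	if (!*str || *str == '\n')
		return 0;

	/* Something follows the name, so it has to be a valid timeout. */
	while (isspace((unsigned char)*str))
		str++;
	if (!isdigit((unsigned char)*str))
		return -1;

	errno = 0;
	ns = strtoull(str, &end, 10);
	if (errno)
		return -1;

	/* Only a single trailing newline is tolerated after the number. */
	if (*end == '\n')
		end++;
	if (*end)
		return -1;

	/* Round up to whole milliseconds. */
	*timeout_ms = ns / NSEC_PER_MSEC + (ns % NSEC_PER_MSEC != 0);
	return 0;
}
```

In this model, writing "mylock 1500000" corresponds to a 2 ms timeout and any nonzero timeout shorter than 1 ms is rounded up to 1 ms.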

Rafael J. Wysocki

Apr 22, 2012, 5:22:51 PM4/22/12
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
From: Arve Hjønnevåg <ar...@android.com>

Add tracepoints to wakeup_source_activate and wakeup_source_deactivate.
Useful for checking that specific wakeup sources overlap as expected.

Signed-off-by: Arve Hjønnevåg <ar...@android.com>
Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
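The second argument of both tracepoints is the raw value of combined_event_count, so whoever reads the trace has to split it. The sketch below shows how that might be done in user space. It assumes that the number of events in progress lives in the lower half of the word and the number of registered events in the upper half, which is what the "increment ... and decrement ... simultaneously" comment and the use of MAX_IN_PROGRESS in the patch suggest; the exact bit width is an assumption, and split_cec(), model_activate() and model_deactivate() are hypothetical names.

```c
#include <limits.h>

/* Assumption: the in-progress count occupies the lower half of the word. */
#define IN_PROGRESS_BITS (sizeof(unsigned int) * CHAR_BIT / 2)
#define MAX_IN_PROGRESS ((1u << IN_PROGRESS_BITS) - 1)

/* Split a traced "cec" value into registered and in-progress event counts. */
void split_cec(unsigned int cec, unsigned int *cnt, unsigned int *inpr)
{
	*cnt = cec >> IN_PROGRESS_BITS;
	*inpr = cec & MAX_IN_PROGRESS;
}

/* Model of activation: one more event in progress. */
unsigned int model_activate(unsigned int *combined)
{
	return ++*combined;
}

/*
 * Model of deactivation: adding MAX_IN_PROGRESS adds one to the upper half
 * (registered events) and subtracts one from the lower half (in progress).
 */
unsigned int model_deactivate(unsigned int *combined)
{
	*combined += MAX_IN_PROGRESS;
	return *combined;
}
```

Under these assumptions, two wakeup sources overlap as expected if the in-progress part of the value traced at the second activation is 2.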
drivers/base/power/wakeup.c | 12 +++++++++---
include/trace/events/power.h | 34 ++++++++++++++++++++++++++++++++++
2 files changed, 43 insertions(+), 3 deletions(-)

Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -14,6 +14,7 @@
#include <linux/suspend.h>
#include <linux/seq_file.h>
#include <linux/debugfs.h>
+#include <trace/events/power.h>

#include "power.h"

@@ -374,12 +375,16 @@ EXPORT_SYMBOL_GPL(device_set_wakeup_enab
*/
static void wakeup_source_activate(struct wakeup_source *ws)
{
+ unsigned int cec;
+
ws->active = true;
ws->active_count++;
ws->last_time = ktime_get();

/* Increment the counter of events in progress. */
- atomic_inc(&combined_event_count);
+ cec = atomic_inc_return(&combined_event_count);
+
+ trace_wakeup_source_activate(ws->name, cec);
}

/**
@@ -454,7 +459,7 @@ EXPORT_SYMBOL_GPL(pm_stay_awake);
*/
static void wakeup_source_deactivate(struct wakeup_source *ws)
{
- unsigned int cnt, inpr;
+ unsigned int cnt, inpr, cec;
ktime_t duration;
ktime_t now;

@@ -489,7 +494,8 @@ static void wakeup_source_deactivate(str
* Increment the counter of registered wakeup events and decrement the
* couter of wakeup events in progress simultaneously.
*/
- atomic_add(MAX_IN_PROGRESS, &combined_event_count);
+ cec = atomic_add_return(MAX_IN_PROGRESS, &combined_event_count);
+ trace_wakeup_source_deactivate(ws->name, cec);

split_counters(&cnt, &inpr);
if (!inpr && waitqueue_active(&wakeup_count_wait_queue))
Index: linux/include/trace/events/power.h
===================================================================
--- linux.orig/include/trace/events/power.h
+++ linux/include/trace/events/power.h
#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED

/*

Rafael J. Wysocki

Apr 22, 2012, 5:23:01 PM4/22/12
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
From: Rafael J. Wysocki <r...@sisk.pl>

Wakeup statistics used by Android are slightly different from what we
have in wakeup sources at the moment and there aren't any known
users of those statistics other than Android, so modify them to make
it easier for Android to switch to wakeup sources.

This removes the struct wakeup_source's hit_count field, which is very
rough and therefore not very useful, and adds two new fields,
wakeup_count and expire_count. The first one tracks how many times
the wakeup source is activated with events_check_enabled set (which
roughly corresponds to the situations when a system power transition
to a sleep state is in progress and would be aborted by this wakeup
source if it were the only active one at that time) and the second
one is the number of times the wakeup source has been activated with
a timeout that expired.

Additionally, the last_time field is now updated when the wakeup
source is deactivated too (previously it was only updated during
the wakeup source's activation), which seems to be what Android does
with the analogous counter for wakelocks.

Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
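For reference, here is a simplified single-threaded model of how the counters are meant to behave after this change. It leaves out locking, timers and the time statistics. struct ws_model and the ws_*() helpers are hypothetical names that loosely mirror wakeup_source_report_event() and pm_wakeup_timer_fn() from the patch below.

```c
#include <stdbool.h>

/* Hypothetical, stripped-down stand-in for struct wakeup_source. */
struct ws_model {
	unsigned long event_count;
	unsigned long active_count;
	unsigned long wakeup_count;
	unsigned long expire_count;
	bool active;
};

/* Mirrors wakeup_source_report_event(): count the event, then activate. */
void ws_report_event(struct ws_model *ws, bool events_check_enabled)
{
	ws->event_count++;
	/* Counted only while a transition to a sleep state could be aborted. */
	if (events_check_enabled)
		ws->wakeup_count++;

	if (!ws->active) {
		ws->active = true;
		ws->active_count++;
	}
}

/* Mirrors the timer function: an expired timeout deactivates the source. */
void ws_timer_expired(struct ws_model *ws)
{
	if (ws->active) {
		ws->active = false;
		ws->expire_count++;
	}
}

/* Explicit deactivation does not touch expire_count. */
void ws_relax(struct ws_model *ws)
{
	ws->active = false;
}
```

In this model, an event reported while the source is already active bumps event_count (and possibly wakeup_count), but not active_count.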
Documentation/ABI/testing/sysfs-devices-power | 24 ++++++---
drivers/base/power/sysfs.c | 30 ++++++++++--
drivers/base/power/wakeup.c | 64 +++++++++++---------------
include/linux/pm_wakeup.h | 11 ++--
4 files changed, 77 insertions(+), 52 deletions(-)

Index: linux/include/linux/pm_wakeup.h
===================================================================
--- linux.orig/include/linux/pm_wakeup.h
+++ linux/include/linux/pm_wakeup.h
@@ -33,12 +33,14 @@
*
* @total_time: Total time this wakeup source has been active.
* @max_time: Maximum time this wakeup source has been continuously active.
- * @last_time: Monotonic clock when the wakeup source's was activated last time.
+ * @last_time: Monotonic clock when the wakeup source was touched last time.
* @event_count: Number of signaled wakeup events.
* @active_count: Number of times the wakeup sorce was activated.
* @relax_count: Number of times the wakeup sorce was deactivated.
- * @hit_count: Number of times the wakeup sorce might abort system suspend.
+ * @expire_count: Number of times the wakeup source's timeout has expired.
+ * @wakeup_count: Number of times the wakeup source might abort suspend.
* @active: Status of the wakeup source.
+ * @has_timeout: The wakeup source has been activated with a timeout.
*/
struct wakeup_source {
const char *name;
@@ -52,8 +54,9 @@ struct wakeup_source {
unsigned long event_count;
unsigned long active_count;
unsigned long relax_count;
- unsigned long hit_count;
- unsigned int active:1;
+ unsigned long expire_count;
+ unsigned long wakeup_count;
+ bool active:1;
};

#ifdef CONFIG_PM_SLEEP
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -21,7 +21,7 @@
* If set, the suspend/hibernate code will abort transitions to a sleep state
* if wakeup events are registered during or immediately before the transition.
*/
-bool events_check_enabled;
+bool events_check_enabled __read_mostly;

/*
* Combined counters of registered wakeup events and wakeup events in progress.
@@ -383,6 +383,21 @@ static void wakeup_source_activate(struc
}

/**
+ * wakeup_source_report_event - Report wakeup event using the given source.
+ * @ws: Wakeup source to report the event for.
+ */
+static void wakeup_source_report_event(struct wakeup_source *ws)
+{
+ ws->event_count++;
+ /* This is racy, but the counter is approximate anyway. */
+ if (events_check_enabled)
+ ws->wakeup_count++;
+
+ if (!ws->active)
+ wakeup_source_activate(ws);
+}
+
+/**
* __pm_stay_awake - Notify the PM core of a wakeup event.
* @ws: Wakeup source object associated with the source of the event.
*
@@ -397,10 +412,7 @@ void __pm_stay_awake(struct wakeup_sourc

spin_lock_irqsave(&ws->lock, flags);

- ws->event_count++;
- if (!ws->active)
- wakeup_source_activate(ws);
-
+ wakeup_source_report_event(ws);
del_timer(&ws->timer);
ws->timer_expires = 0;

@@ -469,6 +481,7 @@ static void wakeup_source_deactivate(str
if (ktime_to_ns(duration) > ktime_to_ns(ws->max_time))
ws->max_time = duration;

+ ws->last_time = now;
del_timer(&ws->timer);
ws->timer_expires = 0;

@@ -541,8 +554,10 @@ static void pm_wakeup_timer_fn(unsigned
spin_lock_irqsave(&ws->lock, flags);

if (ws->active && ws->timer_expires
- && time_after_eq(jiffies, ws->timer_expires))
+ && time_after_eq(jiffies, ws->timer_expires)) {
wakeup_source_deactivate(ws);
+ ws->expire_count++;
+ }

spin_unlock_irqrestore(&ws->lock, flags);
}
@@ -569,9 +584,7 @@ void __pm_wakeup_event(struct wakeup_sou

spin_lock_irqsave(&ws->lock, flags);

- ws->event_count++;
- if (!ws->active)
- wakeup_source_activate(ws);
+ wakeup_source_report_event(ws);

if (!msec) {
wakeup_source_deactivate(ws);
@@ -614,24 +627,6 @@ void pm_wakeup_event(struct device *dev,
EXPORT_SYMBOL_GPL(pm_wakeup_event);

/**
- * pm_wakeup_update_hit_counts - Update hit counts of all active wakeup sources.
- */
-static void pm_wakeup_update_hit_counts(void)
-{
- unsigned long flags;
- struct wakeup_source *ws;
-
- rcu_read_lock();
- list_for_each_entry_rcu(ws, &wakeup_sources, entry) {
- spin_lock_irqsave(&ws->lock, flags);
- if (ws->active)
- ws->hit_count++;
- spin_unlock_irqrestore(&ws->lock, flags);
- }
- rcu_read_unlock();
-}
-
-/**
* pm_wakeup_pending - Check if power transition in progress should be aborted.
*
* Compare the current number of registered wakeup events with its preserved
@@ -653,8 +648,6 @@ bool pm_wakeup_pending(void)
events_check_enabled = !ret;
}
spin_unlock_irqrestore(&events_lock, flags);
- if (ret)
- pm_wakeup_update_hit_counts();
return ret;
}

@@ -680,7 +673,6 @@ bool pm_get_wakeup_count(unsigned int *c
split_counters(&cnt, &inpr);
if (inpr == 0 || signal_pending(current))
break;
- pm_wakeup_update_hit_counts();

schedule();
}
@@ -713,8 +705,6 @@ bool pm_save_wakeup_count(unsigned int c
events_check_enabled = true;
}
spin_unlock_irq(&events_lock);
- if (!events_check_enabled)
- pm_wakeup_update_hit_counts();
return events_check_enabled;
}

@@ -749,9 +739,10 @@ static int print_wakeup_source_stats(str
active_time = ktime_set(0, 0);
}

- ret = seq_printf(m, "%-12s\t%lu\t\t%lu\t\t%lu\t\t"
+ ret = seq_printf(m, "%-12s\t%lu\t\t%lu\t\t%lu\t\t%lu\t\t"
"%lld\t\t%lld\t\t%lld\t\t%lld\n",
- ws->name, active_count, ws->event_count, ws->hit_count,
+ ws->name, active_count, ws->event_count,
+ ws->wakeup_count, ws->expire_count,
ktime_to_ms(active_time), ktime_to_ms(total_time),
ktime_to_ms(max_time), ktime_to_ms(ws->last_time));

@@ -768,8 +759,9 @@ static int wakeup_sources_stats_show(str
{
struct wakeup_source *ws;

- seq_puts(m, "name\t\tactive_count\tevent_count\thit_count\t"
- "active_since\ttotal_time\tmax_time\tlast_change\n");
+ seq_puts(m, "name\t\tactive_count\tevent_count\twakeup_count\t"
+ "expire_count\tactive_since\ttotal_time\tmax_time\t"
+ "last_change\n");

rcu_read_lock();
list_for_each_entry_rcu(ws, &wakeup_sources, entry)
Index: linux/drivers/base/power/sysfs.c
===================================================================
--- linux.orig/drivers/base/power/sysfs.c
+++ linux/drivers/base/power/sysfs.c
@@ -314,22 +314,41 @@ static ssize_t wakeup_active_count_show(

static DEVICE_ATTR(wakeup_active_count, 0444, wakeup_active_count_show, NULL);

-static ssize_t wakeup_hit_count_show(struct device *dev,
- struct device_attribute *attr, char *buf)
+static ssize_t wakeup_abort_count_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ unsigned long count = 0;
+ bool enabled = false;
+
+ spin_lock_irq(&dev->power.lock);
+ if (dev->power.wakeup) {
+ count = dev->power.wakeup->wakeup_count;
+ enabled = true;
+ }
+ spin_unlock_irq(&dev->power.lock);
+ return enabled ? sprintf(buf, "%lu\n", count) : sprintf(buf, "\n");
+}
+
+static DEVICE_ATTR(wakeup_abort_count, 0444, wakeup_abort_count_show, NULL);
+
+static ssize_t wakeup_expire_count_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
{
unsigned long count = 0;
bool enabled = false;

spin_lock_irq(&dev->power.lock);
if (dev->power.wakeup) {
- count = dev->power.wakeup->hit_count;
+ count = dev->power.wakeup->expire_count;
enabled = true;
}
spin_unlock_irq(&dev->power.lock);
return enabled ? sprintf(buf, "%lu\n", count) : sprintf(buf, "\n");
}

-static DEVICE_ATTR(wakeup_hit_count, 0444, wakeup_hit_count_show, NULL);
+static DEVICE_ATTR(wakeup_expire_count, 0444, wakeup_expire_count_show, NULL);

static ssize_t wakeup_active_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -486,7 +505,8 @@ static struct attribute *wakeup_attrs[]
&dev_attr_wakeup.attr,
&dev_attr_wakeup_count.attr,
&dev_attr_wakeup_active_count.attr,
- &dev_attr_wakeup_hit_count.attr,
+ &dev_attr_wakeup_abort_count.attr,
+ &dev_attr_wakeup_expire_count.attr,
&dev_attr_wakeup_active.attr,
&dev_attr_wakeup_total_time_ms.attr,
&dev_attr_wakeup_max_time_ms.attr,
Index: linux/Documentation/ABI/testing/sysfs-devices-power
===================================================================
--- linux.orig/Documentation/ABI/testing/sysfs-devices-power
+++ linux/Documentation/ABI/testing/sysfs-devices-power
@@ -96,16 +96,26 @@ Description:
is read-only. If the device is not enabled to wake up the
system from sleep states, this attribute is not present.

-What: /sys/devices/.../power/wakeup_hit_count
-Date: September 2010
+What: /sys/devices/.../power/wakeup_abort_count
+Date: February 2012
Contact: Rafael J. Wysocki <r...@sisk.pl>
Description:
- The /sys/devices/.../wakeup_hit_count attribute contains the
+ The /sys/devices/.../wakeup_abort_count attribute contains the
number of times the processing of a wakeup event associated with
- the device might prevent the system from entering a sleep state.
- This attribute is read-only. If the device is not enabled to
- wake up the system from sleep states, this attribute is not
- present.
+ the device might have aborted system transition into a sleep
+ state in progress. This attribute is read-only. If the device
+ is not enabled to wake up the system from sleep states, this
+ attribute is not present.
+
+What: /sys/devices/.../power/wakeup_expire_count
+Date: February 2012
+Contact: Rafael J. Wysocki <r...@sisk.pl>
+Description:
+ The /sys/devices/.../wakeup_expire_count attribute contains the
+ number of times a wakeup event associated with the device has
+ been reported with a timeout that expired. This attribute is
+ read-only. If the device is not enabled to wake up the system
+ from sleep states, this attribute is not present.

What: /sys/devices/.../power/wakeup_active
Date: September 2010

Rafael J. Wysocki

Apr 22, 2012, 5:23:14 PM4/22/12
to Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
Hi all,

Following is the third update of the autosleep patchset.

Patches [1-4/8] are regarded as v3.5 material, the rest - depending on
the feedback I get (lack of feedback will be understood as no objections,
though).

On Wednesday, February 22, 2012, Rafael J. Wysocki wrote:
> Hi all,
>
> After the feedback so far I've decided to follow up with a refreshed patchset.
> The first two patches from the previous one went to linux-pm/linux-next
> and I included the recent evdev patch from Arve (with some modifications)
> to this patchset for completeness.
>
> On Tuesday, February 07, 2012, Rafael J. Wysocki wrote:
> > Hi all,
> >
> > This series tests the theory that the easiest way to sell a once rejected
> > feature is to advertise it under a different name.
> >
> > Well, there actually are two different features, although they are closely
> > related to each other. First, patch [6/8] introduces a feature that allows
> > the kernel to trigger system suspend (or more generally a transition into
> > a sleep state) whenever there are no active wakeup sources (no, they aren't
> > called wakelocks). It is called "autosleep" here, but it was called a few
> > different names in the past ("opportunistic suspend" was probably the most
> > popular one). Second, patch [8/8] introduces "wake locks" that are,
> > essentially, wakeup sources which may be created and manipulated by user
> > space. Using them user space may control the autosleep feature introduced
> > earlier.
> >
> > This also is a kind of a proof of concept for the people who wanted me to
> > show a kernel-based implementation of automatic suspend, so there you go.
> > Please note, however, that it is done so that the user space "wake locks"
> > interface is compatible with Android in support of its user space. I don't
> > really like this interface, but since the Android's user space seems to rely
> > on it, I'm fine with using it as is. YMMV.
> >
> > Let me say a few words about every patch in the series individually.
> >
> > [1/8] - This really is a bug fix, so it's v3.4 material. Nobody has stepped
> > on this bug so far, but it should be fixed anyway.
> >
> > [2/8] - This is a freezer cleanup, worth doing anyway IMO, so v3.4 material too.

The two patches above have been merged.

> The above two are in linux-pm/linux-next now. There are a few more fixes
> related to wakeup sources in there and the patches below are based on that
> branch.
>
> > [3/8] - This is something we can do no problem, although completely optional
> > without the autosleep feature. Rather necessary with it, though.
>
> Now [1/7] - Look for wakeup events in later stages of device suspend.

[1/8] now - Look for wakeup events later down the suspend code path.

> > [4/8] - This kind of reintroduces my original idea of using a wait queue for
> > waiting until there are no wakeup events in progress. Alan convinced me that
> > it would be better to poll the counter to prevent wakeup_source_deactivate()
> > from having to call wake_up_all() occasionally (that may be costly in fast
> > paths), but then quite some people told me that the wait queue might be
> > better. I think that the polling will make much less sense with autosleep
> > and user space "wake locks". Anyway, [4/8] is something we can do without
> > those things too.
>
> Now [2/7] - Use wait queue to signal "no wakeup events in progress"
>
> With a couple of improvements suggested by Neil.

[2/8] now - Use wait queue to signal "no wakeup events in progress" condition.

> > The patches above were given Sign-off-by tags, because I think they make some
> > sense regardless of the features introduced by the remaining patches that in
> > turn are total RFC.
>
> This time all of the patches are signed-off and include the requisite
> documentation changes (hopefully, I haven't forgotten about anything).
>
> > [5/8] - This changes wakeup source statistics so that they are more similar to
> > the statistics collected for wakelocks on Android. The file those statistics
> > may be read from is still located in debugfs, though (I don't think it
> > belongs to proc and its name is different from the analogous Android's file
> > name anyway). It could be done without autosleep, but then it would be a bit
> > pointless. BTW, this changes interfaces that _in_ _theory_ may be used by
> > someone, but I'm not aware of anyone using them. If you are one, I'll be
> > pleased to learn about that, so please tell me who you are. :-)
>
> Now [3/7] - Change wakeup source statistics to follow Android.
>
> Rebased and reworked in accordance with Arve's feedback.

[3/8] now - Change wakeup source statistics to follow Android.

[4/8] - Add tracepoints to wakeup_source_{de}activate()

[5/8] - Teach epoll to use wakeup sources if requested

This should be sufficient to ensure that a wakeup source will be kept active
after a wakeup event all the way up to user space without a need to make a
number of random drivers use wakeup sources.

> > [6/8] - Autosleep implementation. I think the changelog explains the idea
> > quite well and the code is really nothing special. It doesn't really add
> > anything new to the kernel in terms of infrastructure etc., it just uses
> > the existing stuff to implement an alternative method of triggering system
> > sleep transitions. Note, though, that the interface here is different
> > from the Android's one, because Android actually modifies /sys/power/state
> > to trigger something called "early suspend" (that is never going to be
> > implemented in the "stock" kernel as long as I have any influence on it) and
> > we simply can't do that in the mainline.
>
> Now [5/7] - Implement opportunistic sleep
>
> Rebased and simplified (most notably, I've dropped the "main" wakeup source,
> since it wasn't really necessary).

[6/8] now - Implement opportunistic sleep.

> > [7/8] - This adds a wakeup source statistic that only makes sense with
> > autosleep and (I believe) is analogous to the Android's prevent_suspend_time
> > statistics. Nothing really special, but I didn't want
> > wakeup_source_activate/deactivate() to take a common lock to avoid
> > congestion.
>
> Now [6/7] - Add "prevent autosleep time" statistics to wakeup sources.
>
> Rebased.

[7/8] now - Add "prevent autosleep time" statistics to wakeup sources.

> > [8/8] - This adds a user space interface to create, activate and deactivate
> > wakeup sources. Since the files it consists of are called wake_lock and
> > wake_unlock, to follow Android, the objects the wakeup sources are wrapped
> > into are called "wakelocks" (for added confusion). Since the interface
> > doesn't provide any means to destroy those "wakelocks", I added a garbage
> > collection mechanism to get rid of the unused ones, if any. I also thought
> > it might be a good idea to put a limit on the number of those things that
> > user space can operate simultaneously, so I did that too.
>
> Now [7/7] - Add user space interface for manipulating wakeup sources.

[8/8] now - Add user space interface for manipulating wakeup sources.

> > All of the above has been tested very briefly on my test-bed Mackerel board
> > and it quite obviously requires more thorough testing, but first I need to know
> > if it makes sense to spend any more time on it.
>
> The above is still accurate, but I also verified that the patches don't break
> my PC test boxes (at least as long as the new features aren't used ;-)).

Nothing has changed in that respect, as far as I can tell.

The patches in the following series are available from the autosleep branch in
the linux-pm tree.

Thanks,
Rafael

mark gross

Apr 23, 2012, 12:02:12 AM
to Rafael J. Wysocki, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
Acked-by: mark gross <mark...@thegnar.org>

Greg KH

Apr 23, 2012, 12:49:50 PM
to Rafael J. Wysocki, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
On Sun, Apr 22, 2012 at 11:19:01PM +0200, Rafael J. Wysocki wrote:
> Hi all,
>
> Following is the third update of the autosleep patchset.
>
> Patches [1-4/8] are regarded as v3.5 material, the rest - depending on
> the feedback I get (lack of feedback will be understood as no objections,
> though).

This all looks great to me, thanks for continuing to push this:
Acked-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Rafael J. Wysocki

Apr 23, 2012, 3:47:06 PM
to Greg KH, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
On Monday, April 23, 2012, Greg KH wrote:
> On Sun, Apr 22, 2012 at 11:19:01PM +0200, Rafael J. Wysocki wrote:
> > Hi all,
> >
> > Following is the third update of the autosleep patchset.
> >
> > Patches [1-4/8] are regarded as v3.5 material, the rest - depending on
> > the feedback I get (lack of feedback will be understood as no objections,
> > though).
>
> This all looks great to me, thanks for continuing to push this:
> Acked-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Thanks a lot!

Rafael

John Stultz

Apr 23, 2012, 9:36:18 PM
to Rafael J. Wysocki, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
One small bug. In wakelock_lookup_add, you're assigning parent after
you assign node, so at loop exit the parent might be null.
This resulted in some strange cases where I'd add two wakelocks and
everything would be fine, but then adding the third would cause the
first two to get lost.

The following patch seems to fix it.

thanks
-john

diff --git a/kernel/power/wakelock.c b/kernel/power/wakelock.c
index 2f99f02..f950cc2 100644
--- a/kernel/power/wakelock.c
+++ b/kernel/power/wakelock.c
@@ -70,6 +70,7 @@ static struct wakelock *wakelock_lookup_add(const char *name, size_t len,
while (*node) {
int diff;

+ parent = *node;
wl = rb_entry(*node, struct wakelock, node);
diff = strncmp(name, wl->name, len);
if (diff == 0) {
@@ -82,8 +83,6 @@ static struct wakelock *wakelock_lookup_add(const char *name, size_t len,
node =&(*node)->rb_left;
else
node =&(*node)->rb_right;
-
- parent = *node;
}
if (!add_if_not_found)
return ERR_PTR(-EINVAL);
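
For readers without the kernel tree at hand, the ordering issue can be illustrated with a small user-space sketch. It is only an analogy: it uses a plain unbalanced binary search tree rather than the kernel rbtree, and struct wl_node, wl_name_cmp() and wl_lookup_add() are hypothetical names, not the ones in kernel/power/wakelock.c. The point it shows is that the parent has to be recorded before the link pointer descends, because after the descent the link may already point at an empty slot. It assumes the first len bytes of name contain no NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-in for struct wakelock; not the kernel type. */
struct wl_node {
	char *name;
	struct wl_node *parent, *left, *right;
};

/*
 * Compare a length-delimited name with a NUL-terminated stored name.
 * A stored name that merely starts with name[0..len) sorts after it,
 * which mirrors the "wl->name[len]" check in the lookup loop.
 */
int wl_name_cmp(const char *name, size_t len, const char *stored)
{
	int diff = strncmp(name, stored, len);

	if (diff == 0 && stored[len] != '\0')
		diff = -1;
	return diff;
}

/* Find the node called name[0..len), adding it if it is not there yet. */
struct wl_node *wl_lookup_add(struct wl_node **root,
			      const char *name, size_t len)
{
	struct wl_node **link = root;
	struct wl_node *parent = NULL;
	struct wl_node *wl;

	assert(root && name);
	while (*link) {
		int diff;

		/* Record the parent BEFORE descending; link moves below. */
		parent = *link;
		diff = wl_name_cmp(name, len, parent->name);
		if (diff == 0)
			return parent;
		link = diff < 0 ? &parent->left : &parent->right;
	}

	wl = calloc(1, sizeof(*wl));
	if (!wl)
		return NULL;
	wl->name = malloc(len + 1);
	if (!wl->name) {
		free(wl);
		return NULL;
	}
	memcpy(wl->name, name, len);
	wl->name[len] = '\0';
	wl->parent = parent;	/* would be NULL here with the old ordering */
	*link = wl;
	return wl;
}
```

With the assignment placed after the descent instead, parent would always be NULL when the loop exits, so every new node would be linked in as if it had no parent.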

Rafael J. Wysocki

Apr 24, 2012, 5:22:48 PM
to John Stultz, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Neil Brown, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
Thanks a lot for the fix!

I have folded it into the $subject patch and the new version is appended.

Thanks,
Rafael

---
From: Rafael J. Wysocki <r...@sisk.pl>
Subject: PM / Sleep: Add user space interface for manipulating wakeup sources, v2

Android allows user space to manipulate wakelocks using two
sysfs files located in /sys/power/, wake_lock and wake_unlock.
Writing a wakelock name and optionally a timeout to the wake_lock
file causes the wakelock whose name was written to be acquired (it
is created first if necessary), optionally with the given timeout.
Writing the name of a wakelock to wake_unlock causes that wakelock
to be released.

Implement an analogous interface for user space using wakeup sources.
Add the /sys/power/wake_lock and /sys/power/wake_unlock files
allowing user space to create, activate and deactivate wakeup
sources, such that writing a name and optionally a timeout to
wake_lock causes the wakeup source of that name to be activated,
optionally with the given timeout. If that wakeup source doesn't
exist, it will be created and then activated. Writing a name to
wake_unlock causes the wakeup source of that name, if there is one,
to be deactivated. Wakeup sources created with the help of
wake_lock that haven't been used for more than 5 minutes are garbage
collected and destroyed. Moreover, there can be only WL_NUMBER_LIMIT
wakeup sources created with the help of wake_lock present at a time.
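
As a rough illustration of how user space might drive this interface, the sketch below builds the string to be written to wake_lock and writes it out. It is not part of the patch: format_wake_lock_request() and wake_lock_write() are hypothetical helper names, a negative timeout is used here only as this sketch's way of saying "no timeout", and the timeout units are whatever the kernel side defines (the description above does not say).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build the request string: "<name>" or "<name> <timeout>".
 * A negative timeout means "no timeout" (a convention of this sketch).
 * Returns the string length, or -1 if it does not fit in buf.
 */
int format_wake_lock_request(char *buf, size_t size,
			     const char *name, long long timeout)
{
	int n;

	assert(buf && name);
	if (timeout < 0)
		n = snprintf(buf, size, "%s", name);
	else
		n = snprintf(buf, size, "%s %lld", name, timeout);
	return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/*
 * Write one request to the given file, e.g. /sys/power/wake_lock to
 * activate or /sys/power/wake_unlock to deactivate (pass timeout < 0).
 * Returns 0 on success and -1 on failure.
 */
int wake_lock_write(const char *path, const char *name, long long timeout)
{
	char req[128];
	FILE *f;
	int ok;

	if (format_wake_lock_request(req, sizeof(req), name, timeout) < 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(req, f) >= 0;
	return (fclose(f) == 0 && ok) ? 0 : -1;
}
```

A caller would then do something like wake_lock_write("/sys/power/wake_lock", "mylock", -1) before handling an event and wake_lock_write("/sys/power/wake_unlock", "mylock", -1) afterwards, where "mylock" is just an example name.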

The data type used to track wakeup sources created by user space is
called "struct wakelock" to indicate the origins of this feature.

This version of the patch includes an rbtree manipulation fix from John Stultz.

Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
Acked-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
Documentation/ABI/testing/sysfs-power | 42 ++++++
drivers/base/power/wakeup.c | 1
kernel/power/Kconfig | 8 +
kernel/power/Makefile | 1
kernel/power/main.c | 41 ++++++
kernel/power/power.h | 9 +
kernel/power/wakelock.c | 217 ++++++++++++++++++++++++++++++++++
7 files changed, 319 insertions(+)
@@ -0,0 +1,217 @@
+ parent = *node;
+ wl = rb_entry(*node, struct wakelock, node);
+ diff = strncmp(name, wl->name, len);
+ if (diff == 0) {
+ if (wl->name[len])
+ diff = -1;
+ else
+ return wl;
+ }
+ if (diff < 0)
+ node = &(*node)->rb_left;
+ else
+ node = &(*node)->rb_right;
+ }

Rafael J. Wysocki

Apr 26, 2012, 4:36:21 PM
to NeilBrown, Arve Hjønnevåg, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
On Thursday, April 26, 2012, NeilBrown wrote:
> On Sun, 22 Apr 2012 23:22:43 +0200 "Rafael J. Wysocki" <r...@sisk.pl> wrote:
>
> > From: Arve Hjønnevåg <ar...@android.com>
> >
> > When an epoll_event, that has the EPOLLWAKEUP flag set, is ready, a
> > wakeup_source will be active to prevent suspend. This can be used to
> > handle wakeup events from a driver that supports poll, e.g. input, if
> > that driver wakes up the waitqueue passed to epoll before allowing
> > suspend.
> >
> > The current implementation uses an extra wakeup_source when
> > ep_scan_ready_list runs. This can cause problems if a single thread
> > is polling on wakeup events and frequent non-wakeup events (events
> > usually arrive during thread freezing) using the same epoll file.
>
> This is quite neat.
>
> If I understand it correctly, you register file descriptors with epoll_ctl()
> on an fd created with epoll_create(), and set the new EPOLLWAKEUP flag.
> Then when a regular 'poll' or 'select' on the epoll fd reports that it is
> readable you:
> - get a wakelock
> - use epoll_wait to collect the events
> - process the events
> - release your wakelock
> - go back to poll() or select() on the epoll fd.
> Correct? As long as there are ready events with EPOLLWAKEUP set a
> wakeup_source is held active and the system won't go to sleep.
>
> My concern with this is about permissions. It appears that any process could
> wait on some fd (maybe a pipe they created themselves) with EPOLLWAKEUP, and
> then simply never epoll_wait() for the event. Then they would be keeping
> the system awake. I don't think that is acceptable.

I wonder what Arve has to say to that, but let me say that on systems without
autosleep every process can go into an infinite busy loop which is going to
drain battery relatively quickly just as well and I don't see why that's so
much different.

> So there needs to be some way to limit who can effectively block suspend by
> using EPOLLWAKEUP.
> (This is one of the reasons I like an all-user-space solution. Policy issues
> like this can easily be decided in user-space but are clumsy to put into the
> kernel).
>
> Also, I'm having trouble understanding the ep->ws wakeup_source.
> The epi->ws makes lots of sense and I think I understand it all.
> However I don't see why you need a wakeup_source for the 'struct eventpoll'.
>
> Every time that 'poll' decides to call the ->poll fop for the eventpoll, this
> wakeup_source will be activated and deactivated which will abort any current
> suspend cycle even if there are no events to report.
>
> I suspect it can just go away.

I'll leave this one entirely to Arve, if you don't mind. :-)

> One last item that doesn't really belong here - but it is in context.
>
> This mechanism is elegant because it provides a single implementation that
> provides wakeup_source for almost any sort of device. I would like to do the
> same thing for interrupts.
> Most (maybe all) of the wakeup devices on my phone have an interrupt where the
> body is run in a thread. When the thread has done its work the event is
> visible to userspace so the EPOLLWAKEUP mechanism is all that is needed to
> complete the path to user-space (or for my user-space solution, nothing else
> is needed once it is visible to user-space).
> So we just need to ensure a clear path from the "top half" interrupt handler
> to the threaded handler.
> So I imagine attaching a wakeup source to every interrupt for which 'wakeup'
> is enabled, activating it when the top-half starts and relaxing it when the
> bottom-half completes. With this in place, almost all drivers would get
> wakeup_source handling for free.
> Does this seem reasonable to you?

Yes, it does.

Wakeup devices have their own wakeup source objects anyway, so perhaps they may
be used for this purpose somehow (just wondering).

> I'm afraid I don't have code yet, but hope to find time in a few weeks.
>
> One difficulty with that is that I have noticed a number of drivers that
> potentially enable_irq_wake just before suspend and disable_irq_wake
> immediately after (e.g. gpio_keys.c). Allocating a wakeup_source on each
> enable_irq_wake would be an unfortunate overhead. Maybe we just allocate it
> the first time enable_irq_wake is called ....

I guess we can do something in analogy with device_wakeup_enable()?

Rafael

Rafael J. Wysocki

Apr 26, 2012, 5:48:22 PM
to NeilBrown, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
On Thursday, April 26, 2012, NeilBrown wrote:
> On Sun, 22 Apr 2012 23:23:23 +0200 "Rafael J. Wysocki" <r...@sisk.pl> wrote:
>
> > From: "Rafael J. Wysocki" <r...@sisk.pl>
> > To: Linux PM list <linu...@vger.kernel.org>
> > Cc: LKML <linux-...@vger.kernel.org>, Magnus Damm <magnu...@gmail.com>, mark...@thegnar.org, Matthew Garrett <m...@redhat.com>, Greg KH <gre...@linuxfoundation.org>, Arve Hjønnevåg <ar...@android.com>, John Stultz <john....@linaro.org>, Brian Swetland <swet...@google.com>, Neil Brown <ne...@suse.de>, Alan Stern <st...@rowland.harvard.edu>, Dmitry Torokhov <dmitry....@gmail.com>, "Srivatsa S. Bhat" <srivat...@linux.vnet.ibm.com>
> > Subject: [RFC][PATCH 6/8] PM / Sleep: Implement opportunistic sleep
> > Date: Sun, 22 Apr 2012 23:23:23 +0200
> > Sender: linux-ker...@vger.kernel.org
> > User-Agent: KMail/1.13.6 (Linux/3.4.0-rc3+; KDE/4.6.0; x86_64; ; )
> >
> > From: Rafael J. Wysocki <r...@sisk.pl>
> >
> > Introduce a mechanism by which the kernel can trigger global
> > transitions to a sleep state chosen by user space if there are no
> > active wakeup sources.
>
> Hi Rafael,

Hi,

> just a few little issues below. Over all I think that if we have to have
> auto-sleep in the kernel, then this is a good way to do it.

Good, we seem to agree in principle, then. :-)
> This doesn't do what you seem to expect it to do.
> You need to set current->state to something like TASK_UNINTERRUPTIBLE
> before calling schedule_timeout, otherwise it is effectively a no-op.
> schedule_timeout_uninterruptible(), for example, will do this for you.

Right. I obviously overlooked the missing state change.

> However the value of this isn't clear to me, so a comment would probably be a
> good thing.
> This continue presumably fires if we wake up without any wakeup sources
> being activated. In that case you want to delay for 500ms - presumably to
> avoid a tight suspend/resume loop if something goes wrong?

Yes.

> I have occasionally seen a stray/uninteresting interrupt wake from suspend
> immediately after entering suspend and the next attempt succeeds. Maybe this
> is a bug in some driver somewhere, but not a big one. I think I would rather
> in that case that we attempt to re-enter suspend immediately. Maybe after a
> few failed attempts it makes sense to back off.

Perhaps. We can adjust this particular thing later, I think.

> The other question is: if we want to back-off, is 500ms really enough? What
> will be gained by, or could be achieved in, that time? An exponential
> back-off might be defensible, but I can't see the value of a 500ms fixed
> back-off.
> However if you can, I'd love to see a comment in there explaining it.

Sure.
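
For the sake of discussion, the "retry immediately, then back off" idea could look roughly like the user-space sketch below. None of this is in the patch: autosleep_backoff_ms() is a hypothetical helper, and the number of free retries and the cap are made-up values; only the 500 ms base matches the fixed delay discussed above.

```c
#include <assert.h>

/*
 * Hypothetical policy: retry immediately after the first few spurious
 * wakeups, then double the delay for each further one, up to a cap.
 */
#define FREE_RETRIES	3u	/* assumed, not from the patch */
#define BASE_DELAY_MS	500u	/* same as the fixed delay discussed above */
#define MAX_DELAY_MS	60000u	/* assumed cap */

static_assert(BASE_DELAY_MS <= MAX_DELAY_MS, "base delay must not exceed cap");

/* Delay to apply after the given number of consecutive spurious wakeups. */
unsigned int autosleep_backoff_ms(unsigned int failures)
{
	unsigned int delay = BASE_DELAY_MS;
	unsigned int i;

	if (failures <= FREE_RETRIES)
		return 0;
	/* Stop doubling once the cap is reached, so delay cannot overflow. */
	for (i = FREE_RETRIES + 1; i < failures && delay < MAX_DELAY_MS; i++)
		delay *= 2;
	return delay < MAX_DELAY_MS ? delay : MAX_DELAY_MS;
}
```

The counter would have to be reset whenever a suspend attempt is followed by a real wakeup event, which the sketch leaves to the caller.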

> > +
> > + out:
> > + queue_up_suspend_work();
> > +}
> > +
>
>
> > +
> > +int pm_autosleep_set_state(suspend_state_t state)
> > +{
> > +
> > +#ifndef CONFIG_HIBERNATION
> > + if (state >= PM_SUSPEND_MAX)
> > + return -EINVAL;
> > +#endif
> > +
> > + __pm_stay_awake(autosleep_ws);
> > +
> > + mutex_lock(&autosleep_lock);
> > +
> > + autosleep_state = state;
> > +
> > + __pm_relax(autosleep_ws);
>
> I'm struggling to see the point of the autosleep_ws.
>
> A suspend cannot actually happen while this code is running (can it?) because
> it will wait for the process to enter the freezer.
> So the only effect of this is:
> 1/ cause the current auto-sleep cycle to abort and
> 2/ maybe add some accounting number in the autosleep_ws.
> Is that right?
> Which of these is needed?

This is to solve a problem when user space attempts to echo "off" to
/sys/power/autosleep exactly when pm_suspend() is initiated as a part
of autosleep under the autosleep lock. In that case, if autosleep_ws is not
there, the process wanting to disable autosleep will have to wait for the
pm_suspend() to complete (unless it holds a wakelock), which is suboptimal.

> I would imagine that any process writing to /sys/power/autosleep would be
> holding a wakelock, and if it didn't it should expect things to be racy...
>
> Am I missing something?

The assumption above is kind of optimistic in my opinion. That process
very well may be a system administrator's bash, for example. :-)

> > +
> > + if (state > PM_SUSPEND_ON)
> > + queue_up_suspend_work();
>
> The test here is superfluous as queue_up_suspend_work() itself tests that
> 'state' is > PM_SUSPEND_ON. However maybe it is more readable this way, so I
> won't object if you like it.

Well, patch [7/8] adds the second statement under this conditional,
so I'd prefer to keep it the current way.

> > +
> > + mutex_unlock(&autosleep_lock);
> > + return 0;
> > +}
>
>
> > @@ -339,7 +359,8 @@ static ssize_t wakeup_count_show(struct
> > {
> > unsigned int val;
> >
> > - return pm_get_wakeup_count(&val) ? sprintf(buf, "%u\n", val) : -EINTR;
> > + return pm_get_wakeup_count(&val, true) ?
> > + sprintf(buf, "%u\n", val) : -EINTR;
> > }
>
> I think it would be really nice for user-space auto-suspend if the 'block'
> flag were settable from the O_NONBLOCK setting. And for poll() to work
> on /sys/power/wakeup-count. However this would require a bit of surgery on
> sysfs. So that is a "maybe later", but having the 'block' flag in there is
> a step in the right direction.

Yes, "maybe later" is what I think about that too. :-)

> >
> > static ssize_t wakeup_count_store(struct kobject *kobj,
> > @@ -347,15 +368,69 @@ static ssize_t wakeup_count_store(struct
> > const char *buf, size_t n)
> > {
> > unsigned int val;
> > + int error;
> > +
> > + error = pm_autosleep_lock();
> > + if (error)
> > + return error;
> > +
> > + if (pm_autosleep_state() > PM_SUSPEND_ON) {
> > + error = -EBUSY;
> > + goto out;
> > + }
> >
> > if (sscanf(buf, "%u", &val) == 1) {
> > if (pm_save_wakeup_count(val))
> > return n;
>
> You need a 'pm_autosleep_unlock() in there - or possibly
> error = n; goto out;

Right, thanks for spotting this!

> > }
> > - return -EINVAL;
> > + error = -EINVAL;
> > +
> > + out:
> > + pm_autosleep_unlock();
> > + return error;
> > }
>
> > core_initcall(pm_init);
> > Index: linux/drivers/base/power/wakeup.c
> > ===================================================================
> > --- linux.orig/drivers/base/power/wakeup.c
> > +++ linux/drivers/base/power/wakeup.c
> > @@ -498,8 +498,10 @@ static void wakeup_source_deactivate(str
> > trace_wakeup_source_deactivate(ws->name, cec);
> >
> > split_counters(&cnt, &inpr);
> > - if (!inpr && waitqueue_active(&wakeup_count_wait_queue))
> > + if (!inpr && waitqueue_active(&wakeup_count_wait_queue)) {
> > wake_up(&wakeup_count_wait_queue);
> > + queue_up_suspend_work();
> > + }
>
> This doesn't look right. suspend_work always requeues itself unless
> autosleep_state == PM_SUSPEND_ON, and whenever autosleep_state is set we
> already call queue_up_suspend_work(). So there is no need to call it here.

OK, I agree. Good, I don't have to add more code to wakeup_source_deactivate(). :-)

> > Index: linux/Documentation/ABI/testing/sysfs-power
> > ===================================================================
> > --- linux.orig/Documentation/ABI/testing/sysfs-power
> > +++ linux/Documentation/ABI/testing/sysfs-power
> > @@ -172,3 +172,20 @@ Description:
> >
> > Reading from this file will display the current value, which is
> > set to 1 MB by default.
> > +
> > +What: /sys/power/autosleep
> > +Date: February 2012
> > +Contact: Rafael J. Wysocki <r...@sisk.pl>
> > +Description:
> > + The /sys/power/autosleep file can be written one of the strings
>
> "To the .. file can be written..." or
> "The .. file can have written ..." or
> "One of the strings returned by (reads from) /sys/power/state can be written
> to the file ..."
> ??
> > + returned by reads from /sys/power/state. If that happens, a
> > + work item attempting to trigger a transition of the system to
> > + the sleep state represented by that string is queued up. This
> > + attempt will only succeed if there are no active wakeup sources
> > + in the system at that time. After evey execution, regardless
> ^^^^
> "every"
>
> > + of whether or not the attempt to put the system to sleep has
> > + succeeded, the work item requeues itself until user space
> > + writes "off" to /sys/power/autosleep.
> > +
> > + Reading from this file causes the last string successfully
> > + written to it to be displayed.
> ^^^^^^^^^ "returned".

Well spotted, thanks!

Below is an updated patch hopefully addressing your comments.

Thanks,
Rafael

---
From: Rafael J. Wysocki <r...@sisk.pl>
Subject: PM / Sleep: Implement opportunistic sleep, v2

Introduce a mechanism by which the kernel can trigger global
transitions to a sleep state chosen by user space if there are no
active wakeup sources.

It consists of a new sysfs attribute, /sys/power/autosleep, that
can be written one of the strings returned by reads from
/sys/power/state, an ordered workqueue and a work item carrying out
the "suspend" operations. If a string representing the system's
sleep state is written to /sys/power/autosleep, the work item
triggering transitions to that state is queued up and it requeues
itself after every execution until user space writes "off" to
/sys/power/autosleep.

That work item enables the detection of wakeup events using the
functions already defined in drivers/base/power/wakeup.c (with one
small modification) and calls either pm_suspend(), or hibernate() to
put the system into a sleep state. If a wakeup event is reported
while the transition is in progress, it will abort the transition and
the "system suspend" work item will be queued up again.

Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
Acked-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
Documentation/ABI/testing/sysfs-power | 17 ++++
drivers/base/power/wakeup.c | 34 +++++----
include/linux/suspend.h | 13 +++
kernel/power/Kconfig | 8 ++
kernel/power/Makefile | 1
kernel/power/autosleep.c | 117 +++++++++++++++++++++++++++++++++
kernel/power/main.c | 119 ++++++++++++++++++++++++++++------
kernel/power/power.h | 18 +++++
8 files changed, 292 insertions(+), 35 deletions(-)
@@ -0,0 +1,117 @@
+ /*
+ * If the wakeup occurred for an unknown reason, wait to prevent the
+ * system from trying to suspend and waking up in a tight loop.
+ */
+ if (final_count == initial_count)
+ schedule_timeout_uninterruptible(HZ / 2);
+ else if (state == PM_SUSPEND_MAX)
+ error = -EINVAL;
if (sscanf(buf, "%u", &val) == 1) {
if (pm_save_wakeup_count(val))
- return n;
+ error = n;
}
- return -EINVAL;
+
+ && !(strncmp(buf, "off", 3) && strncmp(buf, "off\n", 4)))
@@ -660,29 +660,33 @@ bool pm_wakeup_pending(void)
+Date: April 2012
+Contact: Rafael J. Wysocki <r...@sisk.pl>
+Description:
+ The /sys/power/autosleep file can be written one of the strings
+ returned by reads from /sys/power/state. If that happens, a
+ work item attempting to trigger a transition of the system to
+ the sleep state represented by that string is queued up. This
+ attempt will only succeed if there are no active wakeup sources
+ in the system at that time. After every execution, regardless
+ of whether or not the attempt to put the system to sleep has
+ succeeded, the work item requeues itself until user space
+ writes "off" to /sys/power/autosleep.
+
+ Reading from this file causes the last string successfully
+ written to it to be returned.

Rafael J. Wysocki

Apr 26, 2012, 5:59:57 PM
to NeilBrown, John Stultz, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
On Thursday, April 26, 2012, NeilBrown wrote:
> Looks good. Just a couple of minor suggestions.
>
>
> > +ssize_t pm_show_wakelocks(char *buf, bool show_active)
> > +{
> > + struct rb_node *node;
> > + struct wakelock *wl;
> > + char *str = buf;
> > + char *end = buf + PAGE_SIZE;
> > +
> > + mutex_lock(&wakelocks_lock);
> > +
> > + for (node = rb_first(&wakelocks_tree); node; node = rb_next(node)) {
> > + bool active;
> > +
> > + wl = rb_entry(node, struct wakelock, node);
> > + spin_lock_irq(&wl->ws.lock);
> > + active = wl->ws.active;
> > + spin_unlock_irq(&wl->ws.lock);
>
> I don't think the spin_lock is needed. We are just reading one value and it
> is either 0 or not. So there is no possibility for any inconsistency.
> if (wl->ws.active == show_active)
> ?

Good point.

> > + if (active == show_active)
> > + str += scnprintf(str, end - str, "%s ", wl->name);
>
> Arg. Extra space on the end of the line!! :-)

Well, it's not too difficult to get rid of it (as in the patch below).

> I would suggest the entries be terminated by '\n' rather than separate by
> space.
> one-item-per-line is much more common in Unix in general. 'grep' allows
> you to find things more easily etc.
> while read a
> do echo $a > wake_unlock
> done < wake_lock

I know, but this follows the general convention of the files under /sys/power/.
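
To make the trailing-space handling concrete, here is a user-space sketch of the same idea. It is not the kernel code: sc_printf() is a hypothetical stand-in for scnprintf() (which is not available outside the kernel), show_names() is a made-up name, and the sketch assumes the line is finally terminated with a newline, which the excerpt below does not show.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the kernel's scnprintf(): returns the bytes actually stored. */
size_t sc_printf(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (size == 0)
		return 0;
	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	if (n < 0)
		return 0;
	return (size_t)n < size ? (size_t)n : size - 1;
}

/* Join names with single spaces and end the line with '\n', no stray space. */
size_t show_names(char *buf, size_t size,
		  const char *const *names, size_t count)
{
	char *str = buf;
	char *end = buf + size;
	size_t i;

	assert(buf && size > 0);
	for (i = 0; i < count; i++)
		str += sc_printf(str, end - str, "%s ", names[i]);
	if (str > buf)
		str--;		/* step back over the trailing space */
	str += sc_printf(str, end - str, "\n");
	return str - buf;
}
```

Because the newline overwrites the last separator, the output stays space-separated like the other files under /sys/power/ while still ending cleanly.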

Thanks,
Rafael

---
From: Rafael J. Wysocki <r...@sisk.pl>
Subject: PM / Sleep: Add user space interface for manipulating wakeup sources, v3
kernel/power/wakelock.c | 215 ++++++++++++++++++++++++++++++++++
7 files changed, 317 insertions(+)
@@ -0,0 +1,215 @@
+ wl = rb_entry(node, struct wakelock, node);
+ if (wl->ws.active == show_active)
+ str += scnprintf(str, end - str, "%s ", wl->name);
+ }
+ if (str > buf)
+ str--;
written to it to be returned.

Arve Hjønnevåg

Apr 26, 2012, 11:50:10 PM
to Rafael J. Wysocki, NeilBrown, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
2012/4/26 Rafael J. Wysocki <r...@sisk.pl>:
> On Thursday, April 26, 2012, NeilBrown wrote:
>> On Sun, 22 Apr 2012 23:22:43 +0200 "Rafael J. Wysocki" <r...@sisk.pl> wrote:
>>
>> > From: Arve Hjønnevåg <ar...@android.com>
>> >
>> > When an epoll_event, that has the EPOLLWAKEUP flag set, is ready, a
>> > wakeup_source will be active to prevent suspend. This can be used to
>> > handle wakeup events from a driver that supports poll, e.g. input, if
>> > that driver wakes up the waitqueue passed to epoll before allowing
>> > suspend.
>> >
>> > The current implementation uses an extra wakeup_source when
>> > ep_scan_ready_list runs. This can cause problems if a single thread
>> > is polling on wakeup events and frequent non-wakeup events (events
>> > usually arrive during thread freezing) using the same epoll file.
>>
>> This is quite neat.
>>
>> If I understand it correctly, you register file descriptors with epoll_ctl()
>> on an fd created with epoll_create(), and set the new EPOLLWAKEUP flag.
>> Then when a regular 'poll' or 'select' on the epoll fd reports that it is
>> readable you:

I think it makes more sense to use epoll_wait than mixing this with
select or poll.

>>   - get a wakelock
This may not be needed, since epoll does not reevaluate its state
until you call into it again (at least using epoll_wait).

>>   - use epoll_wait to collect the events
>>   - process the events
>>   - release your wakelock
>>   - go back to poll() or select() on the epoll fd.
>> Correct?  As long as there are ready events with EPOLLWAKEUP set a
>> wakeup_source is held active and the system won't go to sleep.
>>
>> My concern with this is about permissions.  It appears that any process could
>> wait on some fd (maybe a pipe they created themselves) with EPOLLWAKEUP, and
>> then simply never epoll_wait() for the event.  Then they would be keeping
>> the system awake.  I don't think that is acceptable.
>
> I wonder what Arve has to say to that, but let me say that on systems without
> autosleep every process can go into an infinite busy loop which is going to
> drain battery relatively quickly just as well and I don't see why that's so
> much different.
>

I still think it is useful to limit access to this feature. On a phone, a
process stuck in an infinite loop will increase battery drain, but if
this process does not have permission to prevent suspend, then this is
only catastrophic if another process that has that permission is
preventing suspend. I think we should add a capability for this.
Assuming you agree, do you want me to create a separate patch that
adds a capability, or roll it into this one?

>> So there needs to be some way to limit who can effectively block suspend by
>> using EPOLLWAKEUP.
>> (This is one of the reasons I like an all-user-space solution.  Policy issues
>> like this can easily be decided in user-space but are clumsy to put into the
>> kernel).
>>
>> Also, I'm having trouble understanding the ep->ws wakeup_source.
>> The epi->ws makes lots of sense and I think I understand it all.
>> However I don't see why you need a wakeup_source for the 'struct eventpoll'.
>>
>> Every time that 'poll' decides to call the ->poll fop for the eventpoll, this
>> wakeup_source will be activated and deactivated which will abort any current
>> suspend cycle even if there are no events to report.
>>
>> I suspect it can just go away.
>
> I'll leave this one entirely to Arve, if you don't mind. :-)
>

I keep the wakeup-source active whenever the epitem is on a list
(ep->rdllist or the local txlist). The temporary txlist is modified
without holding the lock that protects ep->rdllist. It is easier to
use a separate wakeup source to prevent suspend while this list is
manipulated than trying to maintain the wakeup-source state in a
different way than the existing eventpoll state. I think this only
causes real problems if the same epoll file is used for frequent
non-wakeup events (e.g. a gyro) and wakeup events. You should be able
to work around this by using two epoll files.
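
The two-epoll-file workaround can be sketched from user space roughly as follows. This is only an illustration, not part of the patch: watch_fd() and poll_once() are hypothetical helper names, and the pipes in the usage note stand in for real wakeup and non-wakeup devices. If a capability check is added as discussed above, the EPOLLWAKEUP registration may fail for unprivileged callers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Fallback for older libc headers; (1 << 29) is the value used by the patch. */
#ifndef EPOLLWAKEUP
#define EPOLLWAKEUP (1u << 29)
#endif

/* Hypothetical helper: create a dedicated epoll instance watching one fd. */
static int watch_fd(int fd, uint32_t events)
{
	struct epoll_event ev = { .events = events, .data.fd = fd };
	int epfd = epoll_create1(0);

	if (epfd < 0)
		return -1;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
		close(epfd);
		return -1;
	}
	return epfd;
}

/* Hypothetical helper: non-blocking check, returns the number of ready events. */
static int poll_once(int epfd, struct epoll_event *out)
{
	assert(out != NULL);
	return epoll_wait(epfd, out, 1, 0);
}
```

A caller would create one instance with watch_fd(wake_fd, EPOLLIN | EPOLLWAKEUP) and a second one with watch_fd(sensor_fd, EPOLLIN), so that frequent activity on the second instance never touches the wakeup-source state of the first.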

>> One last item that doesn't really belong here - but it is in context.
>>
>> This mechanism is elegant because it provides a single implementation that
>> provides wakeup_source for almost any sort of device.  I would like to do the
>> same thing for interrupts.
>> Most (maybe all) of the wakeup device on my phone have an interrupt where the
>> body is run in a thread.  When the thread has done it's work the event is
>> visible to userspace so the EPOLLWAKEUP mechanism is all that is needed to
>> complete the path to user-space (or for my user-space solution, nothing else
>> is needed once it is visible to user-space).
>> So we just need to ensure a clear path from the "top half" interrupt handler
>> to the threaded handler.
>> So I imagine attaching a wakeup source to every interrupt for which 'wakeup'
>> is enabled, activating it when the top-half starts and relaxing it when the
>> bottom-half completes.  With this in place, almost all drivers would get
>> wakeup_source handling for free.
>> Does this seem reasonable to you.
>
> Yes, it does.
>

How useful is that? Suspend already synchronizes with interrupt
handlers and will not proceed until they have returned. Are threaded
interrupt handlers not always run at that stage? For drivers that use
work-queues instead of a threaded interrupt handler, I think the
suspend-blocking work-queue patch I wrote a while back is convenient.

--
Arve Hjønnevåg

Arve Hjønnevåg

Apr 26, 2012, 11:57:33 PM4/26/12
to Rafael J. Wysocki, NeilBrown, John Stultz, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
2012/4/26 Rafael J. Wysocki <r...@sisk.pl>:
..
> ---
> From: Rafael J. Wysocki <r...@sisk.pl>
> Subject: PM / Sleep: Add user space interface for manipulating wakeup sources, v3
>
> Android allows user space to manipulate wakelocks using two
> sysfs file located in /sys/power/, wake_lock and wake_unlock.
> Writing a wakelock name and optionally a timeout to the wake_lock
> file causes the wakelock whose name was written to be acquired (it
> is created before is necessary), optionally with the given timeout.
> Writing the name of a wakelock to wake_unlock causes that wakelock
> to be released.
>
> Implement an analogous interface for user space using wakeup sources.
> Add the /sys/power/wake_lock and /sys/power/wake_unlock files
> allowing user space to create, activate and deactivate wakeup
> sources, such that writing a name and optionally a timeout to
> wake_lock causes the wakeup source of that name to be activated,
> optionally with the given timeout.  If that wakeup source doesn't
> exist, it will be created and then activated.  Writing a name to
> wake_unlock causes the wakeup source of that name, if there is one,
> to be deactivated.  Wakeup sources created with the help of
> wake_lock that haven't been used for more than 5 minutes are garbage
> collected and destroyed.  Moreover, there can be only WL_NUMBER_LIMIT

I think it would be better if the garbage collection and limit were
configurable and optional. I would probably turn both features off
since I do not want to chase down bugs because a wakelock was ignored,
and I think the garbage collection will erase stats that we care
about.
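
For reference, the interface described in the quoted changelog could be exercised from C roughly like this. A minimal sketch only: wake_request() is a hypothetical helper, not something from the patch, and the timeout unit is not stated in this message, so the value is passed through unchanged. The path is a parameter so the same helper can target wake_lock or wake_unlock.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write "<name>" or "<name> <timeout>" to a
 * wake_lock-style file. timeout == 0 means "omit the timeout field".
 * Returns 0 on success, -1 on any error.
 */
static int wake_request(const char *path, const char *name,
			unsigned long long timeout)
{
	char buf[128];
	int len, fd;
	ssize_t written;

	assert(path != NULL && name != NULL);
	if (strlen(name) == 0)
		return -1;

	if (timeout)
		len = snprintf(buf, sizeof(buf), "%s %llu", name, timeout);
	else
		len = snprintf(buf, sizeof(buf), "%s", name);
	if (len < 0 || (size_t)len >= sizeof(buf))
		return -1;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	written = write(fd, buf, (size_t)len);
	close(fd);
	return written == (ssize_t)len ? 0 : -1;
}
```

With the files from the changelog, activation would be wake_request("/sys/power/wake_lock", "mylock", 0) and deactivation wake_request("/sys/power/wake_unlock", "mylock", 0); "mylock" is just an example name.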

--
Arve Hjønnevåg

Rafael J. Wysocki

Apr 27, 2012, 5:10:19 PM4/27/12
to Arve Hjønnevåg, NeilBrown, John Stultz, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
On Friday, April 27, 2012, Arve Hjønnevåg wrote:
> 2012/4/26 Rafael J. Wysocki <r...@sisk.pl>:
> ...
OK, but would you mind if I added the configurability as a separate incremental
patch?

Rafael

Rafael J. Wysocki

Apr 27, 2012, 5:10:36 PM4/27/12
to NeilBrown, John Stultz, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
On Friday, April 27, 2012, NeilBrown wrote:
> On Fri, 27 Apr 2012 00:04:27 +0200 "Rafael J. Wysocki" <r...@sisk.pl> wrote:
>
> > ---
> > From: Rafael J. Wysocki <r...@sisk.pl>
> > Subject: PM / Sleep: Add user space interface for manipulating wakeup sources, v3
>
> Reviewed-by: NeilBrown <ne...@suse.de>

Thanks!

Rafael J. Wysocki

Apr 27, 2012, 5:14:12 PM4/27/12
to Arve Hjønnevåg, NeilBrown, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
I do.

> do you want me to create a separate patch that
> adds a capability, or roll it into this one?

Please roll it into this one, if that's not a problem.

Thanks,
Rafael

Rafael J. Wysocki

Apr 27, 2012, 5:17:33 PM4/27/12
to NeilBrown, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, Arve Hjønnevåg, John Stultz, Brian Swetland, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
On Friday, April 27, 2012, NeilBrown wrote:
> If it is, then presumably the auto-sleep could kick in between any pair of
> keystrokes that the sysadmin types. Or between the final 'enter' and when the
> write() system call begins. All that autosleep_ws seems to provide is
> certainty that when the write() system call completes, autosleep will be
> fully disabled.
> I don't think that is really worth anything.
>
> However, something did occur to me that I would like clarified.
> What happens if try_to_suspend() gets the autosleep_lock just before
> wakeup_count_store(), state_store() or pm_autosleep_set_state()
> try to get it?
> For pm_autosleep_set_state() the try_to_suspend() attempt will abort because
> it is holding autosleep_ws, so it will drop the lock and
> pm_autosleep_set_state() will continue happily.
> For the other two, what will happen (if there are no active wakesources and
> autosleep is enabled).
> I'm guessing that try_to_suspend will try to freeze all the process, which
> sends a pseudo signal to all processes, so the mutex_lock_interruptible will
> fail and the suspend will complete.
> Then will the aborted write() system call be re-attempted?
>
> If that is right, then here is a very clear need to autosleep_ws: it prevents
> a deadlock.

Yes, I think that this is the case.

> So it appears there is a very real need for autosleep_ws that even I can
> agree with. It seems subtle though and could usefully be documented:
>
> /* Note: it is only safe to mutex_lock(&autosleep_lock) if a wakeup_source
> * is active, otherwise a deadlock with try_to_suspend() is possible.
> * Alternatively mutex_lock_interruptible() can be used. This will then fail
> * if an auto_sleep cycle tries to freeze processes.
> */

I'll add the comment above if you don't mind. :-)

> static DEFINE_MUTEX(autosleep_lock);
>
> So:
> Reviewed-by: NeilBrown <ne...@suse.de>

Thanks!

Rafael

Arve Hjønnevåg

Apr 27, 2012, 5:24:53 PM4/27/12
to Rafael J. Wysocki, NeilBrown, John Stultz, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
2012/4/27 Rafael J. Wysocki <r...@sisk.pl>:
That is fine with me.

--
Arve Hjønnevåg

Rafael J. Wysocki

Apr 27, 2012, 5:29:31 PM4/27/12
to Arve Hjønnevåg, NeilBrown, John Stultz, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
Cool, thanks!

Arve Hjønnevåg

Apr 27, 2012, 7:26:53 PM4/27/12
to Rafael J. Wysocki, Arve Hjønnevåg, NeilBrown, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
When an epoll_event, that has the EPOLLWAKEUP flag set, is ready, a
wakeup_source will be active to prevent suspend. This can be used to
handle wakeup events from a driver that supports poll, e.g. input, if
that driver wakes up the waitqueue passed to epoll before allowing
suspend.

The current implementation uses an extra wakeup_source when
ep_scan_ready_list runs. This can cause problems if a single thread
is polling on wakeup events and frequent non-wakeup events (events
usually arrive during thread freezing) using the same epoll file.

Signed-off-by: Arve Hjønnevåg <ar...@android.com>
Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
fs/eventpoll.c | 75 ++++++++++++++++++++++++++++++++++++++++++--
include/linux/capability.h | 5 ++-
include/linux/eventpoll.h | 6 +++
3 files changed, 82 insertions(+), 4 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 739b098..16718f6 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -565,6 +572,7 @@ static int ep_scan_ready_list(struct eventpoll *ep,
* in a lockless way.
*/
spin_lock_irqsave(&ep->lock, flags);
+ __pm_stay_awake(ep->ws);
list_splice_init(&ep->rdllist, &txlist);
ep->ovflist = NULL;
spin_unlock_irqrestore(&ep->lock, flags);
@@ -588,8 +596,10 @@ static int ep_scan_ready_list(struct eventpoll *ep,
* queued into ->ovflist but the "txlist" might already
* contain them, and the list_splice() below takes care of them.
*/
- if (!ep_is_linked(&epi->rdllink))
+ if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);
+ }
}
/*
* We need to set back ep->ovflist to EP_UNACTIVE_PTR, so that after
@@ -602,6 +612,7 @@ static int ep_scan_ready_list(struct eventpoll *ep,
* Quickly re-inject items left on "txlist".
*/
list_splice(&txlist, &ep->rdllist);
+ __pm_relax(ep->ws);

if (!list_empty(&ep->rdllist)) {
/*
@@ -656,6 +667,9 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi)
list_del_init(&epi->rdllink);
spin_unlock_irqrestore(&ep->lock, flags);

+ if (epi->ws)
+ wakeup_source_unregister(epi->ws);
+
/* At this point it is safe to free the eventpoll item */
kmem_cache_free(epi_cache, epi);

@@ -706,6 +720,8 @@ static void ep_free(struct eventpoll *ep)
mutex_unlock(&epmutex);
mutex_destroy(&ep->mtx);
free_uid(ep->user);
+ if (ep->ws)
+ wakeup_source_unregister(ep->ws);
kfree(ep);
}

@@ -737,6 +753,7 @@ static int ep_read_events_proc(struct eventpoll *ep, struct list_head *head,
* callback, but it's not actually ready, as far as
* caller requested events goes. We can remove it here.
*/
+ __pm_relax(epi->ws);
list_del_init(&epi->rdllink);
}
}
@@ -932,8 +949,10 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
}

/* If this file is already in the ready list we exit soon */
- if (!ep_is_linked(&epi->rdllink))
+ if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);
+ }

/*
* Wake up ( if active ) both the eventpoll wait list and the ->poll()
@@ -1091,6 +1110,30 @@ static int reverse_path_check(void)
return error;
}

+static int ep_create_wakeup_source(struct epitem *epi)
+{
+ const char *name;
+
+ if (!epi->ep->ws) {
+ epi->ep->ws = wakeup_source_register("eventpoll");
+ if (!epi->ep->ws)
+ return -ENOMEM;
+ }
+
+ name = epi->ffd.file->f_path.dentry->d_name.name;
+ epi->ws = wakeup_source_register(name);
+ if (!epi->ws)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static void ep_destroy_wakeup_source(struct epitem *epi)
+{
+ wakeup_source_unregister(epi->ws);
+ epi->ws = NULL;
+}
+
/*
* Must be called with "mtx" held.
*/
@@ -1118,6 +1161,13 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
epi->event = *event;
epi->nwait = 0;
epi->next = EP_UNACTIVE_PTR;
+ if (epi->event.events & EPOLLWAKEUP) {
+ error = ep_create_wakeup_source(epi);
+ if (error)
+ goto error_create_wakeup_source;
+ } else {
+ epi->ws = NULL;
+ }

/* Initialize the poll table using the queue callback */
epq.epi = epi;
@@ -1164,6 +1214,7 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
/* If the file is already "ready" we drop it inside the ready list */
if ((revents & event->events) && !ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);

/* Notify waiting tasks that events are available */
if (waitqueue_active(&ep->wq))
@@ -1204,6 +1255,10 @@ error_unregister:
list_del_init(&epi->rdllink);
spin_unlock_irqrestore(&ep->lock, flags);

+ if (epi->ws)
+ wakeup_source_unregister(epi->ws);
+
+error_create_wakeup_source:
kmem_cache_free(epi_cache, epi);

return error;
@@ -1229,6 +1284,12 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
epi->event.events = event->events;
pt._key = event->events;
epi->event.data = event->data; /* protected by mtx */
+ if (epi->event.events & EPOLLWAKEUP) {
+ if (!epi->ws)
+ ep_create_wakeup_source(epi);
+ } else if (epi->ws) {
+ ep_destroy_wakeup_source(epi);
+ }

/*
* Get current event bits. We can safely use the file* here because
@@ -1244,6 +1305,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
spin_lock_irq(&ep->lock);
if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);

/* Notify waiting tasks that events are available */
if (waitqueue_active(&ep->wq))
@@ -1282,6 +1344,7 @@ static int ep_send_events_proc(struct eventpoll *ep, struct list_head *head,
!list_empty(head) && eventcnt < esed->maxevents;) {
epi = list_first_entry(head, struct epitem, rdllink);

+ __pm_relax(epi->ws);
list_del_init(&epi->rdllink);

pt._key = epi->event.events;
@@ -1298,6 +1361,7 @@ static int ep_send_events_proc(struct eventpoll *ep, struct list_head *head,
if (__put_user(revents, &uevent->events) ||
__put_user(epi->event.data, &uevent->data)) {
list_add(&epi->rdllink, head);
+ __pm_stay_awake(epi->ws);
return eventcnt ? eventcnt : -EFAULT;
}
eventcnt++;
@@ -1317,6 +1381,7 @@ static int ep_send_events_proc(struct eventpoll *ep, struct list_head *head,
* poll callback will queue them in ep->ovflist.
*/
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);
}
}
}
@@ -1629,6 +1694,10 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
if (!tfile->f_op || !tfile->f_op->poll)
goto error_tgt_fput;

+ /* Check if EPOLLWAKEUP is allowed */
+ if ((epds.events & EPOLLWAKEUP) && !capable(CAP_EPOLLWAKEUP))
+ goto error_tgt_fput;
+
/*
* We have to check that the file structure underneath the file descriptor
* the user passed to us _is_ an eventpoll file. And also we do not permit
diff --git a/include/linux/capability.h b/include/linux/capability.h
index 12d52de..222974a 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -360,8 +360,11 @@ struct cpu_vfs_cap_data {

#define CAP_WAKE_ALARM 35

+/* Allow preventing automatic system suspends while epoll events are pending */

-#define CAP_LAST_CAP CAP_WAKE_ALARM
+#define CAP_EPOLLWAKEUP 36
+
+#define CAP_LAST_CAP CAP_EPOLLWAKEUP

#define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)

diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 657ab55..520a57c 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -26,6 +26,12 @@
#define EPOLL_CTL_DEL 2
#define EPOLL_CTL_MOD 3

+/*
+ * Request the handling of system wakeup events so as to prevent automatic
+ * system suspends from happening while those events are being processed.
+ */
+#define EPOLLWAKEUP (1 << 29)
+
/* Set the One Shot behaviour for the target file descriptor */
#define EPOLLONESHOT (1 << 30)

--
1.7.7.3

Arve Hjønnevåg

Apr 30, 2012, 8:52:18 PM4/30/12
to NeilBrown, Rafael J. Wysocki, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
On Sun, Apr 29, 2012 at 6:58 PM, NeilBrown <ne...@suse.de> wrote:
> On Thu, 26 Apr 2012 20:49:51 -0700 Arve Hjønnevåg <ar...@android.com> wrote:
..
>> I keep the wakeup-source active whenever the epitem is on a list
>> (ep->rdllist or the local txlist). The temporary txlist is modified
>> without holding the lock that protects ep->rdllist. It is easier to
>> use a separate wakeup source to prevent suspend while this list is
>> manipulated than trying to maintain the wakeup-source state in a
>> different way than the existing eventpoll state. I think this only
>> causes real problems if the same epoll file is used for frequent
>> non-wakeup events (e.g. a gyro) and wakeup events. You should be able
>> to work around this by using two epoll files.
>
> Thanks for the explanation.  I can now see more clearly how your patch works.
> I can also see why you might need the ep->ws wakeup_source.  However I don't
> like it.
>
> If it acted purely as a lock and prevented suspend while it was active then
> it would be fine.  However it doesn't.  It also aborts any current suspend
> attempt - so it is externally visible.
> The way your code it written, *any* call to epoll_wait will abort the current
> suspend cycle, even if it is called by a completely non-privileged user.

With the patch I posted Friday, a non-privileged user will not be able
to pass EPOLLWAKEUP and have the wakeup-source created.

> That may not obviously be harmful, but it makes the precise semantics of the
> system call something quite non-obvious, and it is much better to have a very
> clean semantic.
> As you say, it can probably be worked-around but code is much safer when you
> don't need to work-around things.
>
> I see two alternatives:
> 1/ set the 'wakeup' flag on the whole epoll-fd, not on the individual events
>   that it is asked to monitor.  i.e. add a new flag to epoll_create1()
>   instead of to epoll_ctl events.
>   Then you just need a single wakeup_source for the fd which is active
>   whenever any event is ready.
>
>   This interface might be generally nicer, I'm not sure.
>
> 2/ Find a way to get rid of ep->ws.
>   Thinking about it more, I again think it isn't needed.
>   The reason is that suspend is already exclusive with any process running in
>   kernel context.
>   One of the first things suspend does is to freeze all process and (for
>   regular non-kernel-thread processes) this happens by sending a virtual
>   signal which is acted up when the process returns from a system call or
>   returns from a context switch.  So while any given system call is running
>   (e.g. epoll_wait) suspend is blocked.  When epoll_wait sets
>   TASK_INTERRUPTIBLE the 'freeze' signal will interrupt it of course, but
>   this is the only point where suspend can interfere with epoll_wait, and you
>   aren't holding ep->ws then anyway.
>   Hopefully Rafael will correct me if I got that outline wrong.  But even if
>   I did, I think we need to get rid of ep->ws.
>

If ep_scan_ready_list is only called from freezable threads, then
ep->ws is not strictly needed, but without it another suspend attempt
will be triggered if there are no other wakeup-sources active. I'm
also not sure if it could get called from a non-freezable thread since
other subsystems can call it through the poll hook.

A third option is to only activate ep->ws when needed. This may work:
---
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 16718f6..beb7138 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -572,7 +572,6 @@ static int ep_scan_ready_list(struct eventpoll *ep,
* in a lockless way.
*/
spin_lock_irqsave(&ep->lock, flags);
- __pm_stay_awake(ep->ws);
list_splice_init(&ep->rdllist, &txlist);
ep->ovflist = NULL;
spin_unlock_irqrestore(&ep->lock, flags);
@@ -753,6 +752,8 @@ static int ep_read_events_proc(struct eventpoll
*ep, struct list_head *head,
* callback, but it's not actually ready, as far as
* caller requested events goes. We can remove it here.
*/
+ if (epi->ws && epi->ws->active)
+ __pm_stay_awake(ep->ws);
__pm_relax(epi->ws);
list_del_init(&epi->rdllink);
}
@@ -1344,6 +1345,8 @@ static int ep_send_events_proc(struct eventpoll
*ep, struct list_head *head,
!list_empty(head) && eventcnt < esed->maxevents;) {
epi = list_first_entry(head, struct epitem, rdllink);

+ if (epi->ws && epi->ws->active)
+ __pm_stay_awake(ep->ws);
__pm_relax(epi->ws);
list_del_init(&epi->rdllink);

---
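
To illustrate why the ordering in the diff above matters, here is a toy user-space model. It is not kernel code: struct toy_ws, the counters and toy_send_event() are made up for the illustration, and the only behaviour assumed from the real API is that activating an already active source (or relaxing an inactive or NULL one) is a no-op. The point is that activating ep->ws before relaxing epi->ws keeps the count of active sources from dropping to zero while a still-ready item is re-queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct wakeup_source; not the kernel type. */
struct toy_ws {
	bool active;
};

static int in_progress;    /* toy count of currently active sources */
static int zero_crossings; /* how often that count dropped to zero */

static void toy_stay_awake(struct toy_ws *ws)
{
	if (ws && !ws->active) {
		ws->active = true;
		in_progress++;
	}
}

static void toy_relax(struct toy_ws *ws)
{
	if (ws && ws->active) {
		assert(in_progress > 0);
		ws->active = false;
		if (--in_progress == 0)
			zero_crossings++; /* a suspend attempt could start here */
	}
}

/*
 * Model the delivery of one ready item. 'handoff' selects the ordering
 * from the diff above; 'still_ready' models a level-triggered item that
 * is put back on the ready list.
 */
static void toy_send_event(struct toy_ws *ep_ws, struct toy_ws *epi_ws,
			   bool handoff, bool still_ready)
{
	if (handoff && epi_ws && epi_ws->active)
		toy_stay_awake(ep_ws);
	toy_relax(epi_ws);              /* item leaves the ready list */
	if (still_ready)
		toy_stay_awake(epi_ws); /* item is re-queued */
	toy_relax(ep_ws);               /* end of the scan */
}
```

Without the hand-off the model records one drop to zero per delivered item; with it, the count stays at one or above for the whole delivery.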


> Also, I think it is important to clearly document how to use this safely.
> You suggested that if any EPOLLWAKEUP event is ready, then suspend will
> remain disabled not only until the event is handled, but also until the next
> call to epoll_wait.  That sounds like very useful semantics, but it isn't at
> all explicit in the patch.  I think it should be made very clear in
> eventpoll.h how the flag can be used. (and then eventually get this into a
> man page of course).
>

OK
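
Until that documentation exists, the intended usage could look roughly like the sketch below. The assumptions are: level-triggered events (neither EPOLLET nor EPOLLONESHOT), and a handler that actually consumes the data. consume() and drain_ready() are hypothetical names, and handled_events is bookkeeping for the sketch only.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Fallback for older libc headers; (1 << 29) is the value used by the patch. */
#ifndef EPOLLWAKEUP
#define EPOLLWAKEUP (1u << 29)
#endif

static int handled_events; /* bookkeeping for the sketch only */

/* Hypothetical handler: consume whatever is readable on fd. */
static int consume(int fd)
{
	char buf[64];
	ssize_t n = read(fd, buf, sizeof(buf));

	if (n <= 0)
		return -1; /* EOF or error: the caller must not keep looping */
	handled_events++;
	return 0;
}

/*
 * Handle ready events, then call epoll_wait() again. Under the semantics
 * discussed above, it is the later call that finds the item no longer
 * ready which lets its wakeup_source go inactive.
 * Returns 0 when nothing is left, -1 on error.
 */
static int drain_ready(int epfd)
{
	struct epoll_event evs[8];
	int n, i;

	assert(epfd >= 0);
	for (;;) {
		n = epoll_wait(epfd, evs, 8, 0);
		if (n <= 0)
			return n;
		for (i = 0; i < n; i++)
			if (consume(evs[i].data.fd) < 0)
				return -1;
	}
}
```

A real event loop would block in epoll_wait() with a timeout of -1 rather than 0; the zero timeout here only keeps the sketch from sleeping.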

>>
>> >> One last item that doesn't really belong here - but it is in context.
>> >>
>> >> This mechanism is elegant because it provides a single implementation that
>> >> provides wakeup_source for almost any sort of device.  I would like to do the
>> >> same thing for interrupts.
>> >> Most (maybe all) of the wakeup device on my phone have an interrupt where the
>> >> body is run in a thread.  When the thread has done it's work the event is
>> >> visible to userspace so the EPOLLWAKEUP mechanism is all that is needed to
>> >> complete the path to user-space (or for my user-space solution, nothing else
>> >> is needed once it is visible to user-space).
>> >> So we just need to ensure a clear path from the "top half" interrupt handler
>> >> to the threaded handler.
>> >> So I imagine attaching a wakeup source to every interrupt for which 'wakeup'
>> >> is enabled, activating it when the top-half starts and relaxing it when the
>> >> bottom-half completes.  With this in place, almost all drivers would get
>> >> wakeup_source handling for free.
>> >> Does this seem reasonable to you.
>> >
>> > Yes, it does.
>> >
>>
>> How useful is that? Suspend already synchronizes with interrupt
>> handlers and will not proceed until they have returned. Are threaded
>> interrupts handlers not always run at that stage? For drivers that use
>> work-queues instead of a threaded interrupt handler, I think the
>> suspend-blocking work-queue patch I wrote a while back is convenient.
>>
>
> Maybe it isn't useful at all - I'm still working this stuff out.
>
> Yes, threaded interrupts are run "straight away", but what exactly does that
> mean?  And in particular, is there any interlocking to ensure they run
> before suspend gets stop the CPU?  Maybe the scheduling priority of the
> different threads is enough to make sure this works, as irq_threads are
> SCHED_FIFO and  the suspending thread almost certainly isn't.  But is that
> still a guarantee on an SMP machine?  irq_threads aren't freezable so suspend
> won't block on them for that reason..
>
> I really just want to be sure that some interlock is in place to ensure that
> the threaded interrupt handler runs before suspend absolutely commits to
> suspending.  If that is already the case, when what I suggest isn't needed as
> you suggest.  Do you know of such an interlock?
>

Normal interrupts are disabled during suspend. This synchronizes with
the interrupt handler, and pending wakeup interrupts abort suspend. I
have not looked at this code since threaded interrupt handlers were
added, so there could be bugs there.

Arve Hjønnevåg

May 1, 2012, 1:34:08 AM5/1/12
to Rafael J. Wysocki, Arve Hjønnevåg, NeilBrown, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
When an epoll_event, that has the EPOLLWAKEUP flag set, is ready, a
wakeup_source will be active to prevent suspend. This can be used to
handle wakeup events from a driver that supports poll, e.g. input, if
that driver wakes up the waitqueue passed to epoll before allowing
suspend.

Signed-off-by: Arve Hjønnevåg <ar...@android.com>
Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
---
fs/eventpoll.c | 90 ++++++++++++++++++++++++++++++++++++++++++-
include/linux/capability.h | 5 ++-
include/linux/eventpoll.h | 12 ++++++
3 files changed, 103 insertions(+), 4 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 739b098..1abed50 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -588,8 +595,10 @@ static int ep_scan_ready_list(struct eventpoll *ep,
* queued into ->ovflist but the "txlist" might already
* contain them, and the list_splice() below takes care of them.
*/
- if (!ep_is_linked(&epi->rdllink))
+ if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);
+ }
}
/*
* We need to set back ep->ovflist to EP_UNACTIVE_PTR, so that after
@@ -602,6 +611,7 @@ static int ep_scan_ready_list(struct eventpoll *ep,
* Quickly re-inject items left on "txlist".
*/
list_splice(&txlist, &ep->rdllist);
+ __pm_relax(ep->ws);

if (!list_empty(&ep->rdllist)) {
/*
@@ -656,6 +666,8 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi)
list_del_init(&epi->rdllink);
spin_unlock_irqrestore(&ep->lock, flags);

+ wakeup_source_unregister(epi->ws);
+
/* At this point it is safe to free the eventpoll item */
kmem_cache_free(epi_cache, epi);

@@ -706,6 +718,7 @@ static void ep_free(struct eventpoll *ep)
mutex_unlock(&epmutex);
mutex_destroy(&ep->mtx);
free_uid(ep->user);
+ wakeup_source_unregister(ep->ws);
kfree(ep);
}

@@ -737,6 +750,7 @@ static int ep_read_events_proc(struct eventpoll *ep, struct list_head *head,
* callback, but it's not actually ready, as far as
* caller requested events goes. We can remove it here.
*/
+ __pm_relax(epi->ws);
list_del_init(&epi->rdllink);
}
}
@@ -927,13 +941,23 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
if (epi->next == EP_UNACTIVE_PTR) {
epi->next = ep->ovflist;
ep->ovflist = epi;
+ if (epi->ws) {
+ /*
+ * Activate ep->ws since epi->ws may get
+ * deactivated at any time.
+ */
+ __pm_stay_awake(ep->ws);
+ }
+
}
goto out_unlock;
}

/* If this file is already in the ready list we exit soon */
- if (!ep_is_linked(&epi->rdllink))
+ if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);
+ }

/*
* Wake up ( if active ) both the eventpoll wait list and the ->poll()
@@ -1091,6 +1115,30 @@ static int reverse_path_check(void)
@@ -1118,6 +1166,13 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
epi->event = *event;
epi->nwait = 0;
epi->next = EP_UNACTIVE_PTR;
+ if (epi->event.events & EPOLLWAKEUP) {
+ error = ep_create_wakeup_source(epi);
+ if (error)
+ goto error_create_wakeup_source;
+ } else {
+ epi->ws = NULL;
+ }

/* Initialize the poll table using the queue callback */
epq.epi = epi;
@@ -1164,6 +1219,7 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
/* If the file is already "ready" we drop it inside the ready list */
if ((revents & event->events) && !ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);

/* Notify waiting tasks that events are available */
if (waitqueue_active(&ep->wq))
@@ -1204,6 +1260,9 @@ error_unregister:
list_del_init(&epi->rdllink);
spin_unlock_irqrestore(&ep->lock, flags);

+ wakeup_source_unregister(epi->ws);
+
+error_create_wakeup_source:
kmem_cache_free(epi_cache, epi);

return error;
@@ -1229,6 +1288,12 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
epi->event.events = event->events;
pt._key = event->events;
epi->event.data = event->data; /* protected by mtx */
+ if (epi->event.events & EPOLLWAKEUP) {
+ if (!epi->ws)
+ ep_create_wakeup_source(epi);
+ } else if (epi->ws) {
+ ep_destroy_wakeup_source(epi);
+ }

/*
* Get current event bits. We can safely use the file* here because
@@ -1244,6 +1309,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
spin_lock_irq(&ep->lock);
if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);

/* Notify waiting tasks that events are available */
if (waitqueue_active(&ep->wq))
@@ -1282,6 +1348,18 @@ static int ep_send_events_proc(struct eventpoll *ep, struct list_head *head,
!list_empty(head) && eventcnt < esed->maxevents;) {
epi = list_first_entry(head, struct epitem, rdllink);

+ /*
+ * Activate ep->ws before deactivating epi->ws to prevent
+ * triggering auto-suspend here (in case we reactivate epi->ws
+ * below).
+ *
+ * This could be rearranged to delay the deactivation of epi->ws
+ * instead, but then epi->ws would temporarily be out of sync
+ * with ep_is_linked().
+ */
+ if (epi->ws && epi->ws->active)
+ __pm_stay_awake(ep->ws);
+ __pm_relax(epi->ws);
list_del_init(&epi->rdllink);

pt._key = epi->event.events;
@@ -1298,6 +1376,7 @@ static int ep_send_events_proc(struct eventpoll *ep, struct list_head *head,
if (__put_user(revents, &uevent->events) ||
__put_user(epi->event.data, &uevent->data)) {
list_add(&epi->rdllink, head);
+ __pm_stay_awake(epi->ws);
return eventcnt ? eventcnt : -EFAULT;
}
eventcnt++;
@@ -1317,6 +1396,7 @@ static int ep_send_events_proc(struct eventpoll *ep, struct list_head *head,
* poll callback will queue them in ep->ovflist.
*/
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);
}
}
}
@@ -1629,6 +1709,10 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
if (!tfile->f_op || !tfile->f_op->poll)
goto error_tgt_fput;

+ /* Check if EPOLLWAKEUP is allowed */
+ if ((epds.events & EPOLLWAKEUP) && !capable(CAP_EPOLLWAKEUP))
+ goto error_tgt_fput;
+
/*
* We have to check that the file structure underneath the file descriptor
* the user passed to us _is_ an eventpoll file. And also we do not permit
diff --git a/include/linux/capability.h b/include/linux/capability.h
index 12d52de..222974a 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -360,8 +360,11 @@ struct cpu_vfs_cap_data {

#define CAP_WAKE_ALARM 35

+/* Allow preventing automatic system suspends while epoll events are pending */

-#define CAP_LAST_CAP CAP_WAKE_ALARM
+#define CAP_EPOLLWAKEUP 36
+
+#define CAP_LAST_CAP CAP_EPOLLWAKEUP

#define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)

diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 657ab55..5b591fb 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -26,6 +26,18 @@
#define EPOLL_CTL_DEL 2
#define EPOLL_CTL_MOD 3

+/*
+ * Request the handling of system wakeup events so as to prevent automatic
+ * system suspends from happening while those events are being processed.
+ *
+ * Assuming neither EPOLLET nor EPOLLONESHOT is set, automatic system suspends
+ * will not be re-allowed until epoll_wait is called again after consuming the
+ * wakeup event(s).
+ *
+ * Requires CAP_EPOLLWAKEUP
+ */
+#define EPOLLWAKEUP (1 << 29)
+
/* Set the One Shot behaviour for the target file descriptor */
#define EPOLLONESHOT (1 << 30)

--
1.7.7.3

Rafael J. Wysocki

May 1, 2012, 9:47:09 AM5/1/12
to NeilBrown, Arve Hjønnevåg, Linux PM list, LKML, Magnus Damm, mark...@thegnar.org, Matthew Garrett, Greg KH, John Stultz, Brian Swetland, Alan Stern, Dmitry Torokhov, Srivatsa S. Bhat
On Tuesday, May 01, 2012, NeilBrown wrote:
> On Mon, 30 Apr 2012 22:33:48 -0700 Arve Hjønnevåg <ar...@android.com> wrote:
>
> > When an epoll_event, that has the EPOLLWAKEUP flag set, is ready, a
> > wakeup_source will be active to prevent suspend. This can be used to
> > handle wakeup events from a driver that support poll, e.g. input, if
> > that driver wakes up the waitqueue passed to epoll before allowing
> > suspend.
> >
> > Signed-off-by: Arve Hjønnevåg <ar...@android.com>
> > Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>
>
> Thanks.
> Reviewed-by: NeilBrown <ne...@suse.de>

Thanks a lot for your involvement here!

> However:
> 1/ I think all references to "automatic system suspend" can be replaced with
> "system suspend" as an active wakeup_source disables any suspend, no matter
> it's source

OK, I'll change that when applying the patch (although that only applies to
suspends taking the wakeup events signaling through wakeup sources into
account).

> 2/ I reserve to right to submit for discussion a later patch which removes
> the ep->ws in favour or some other exclusion mechanism :-)

Well, you can always do that. :-) Of course, when the patch goes to Linus,
we'll have to be careful about changes visible to user space, though.

Thanks,
Rafael