[PATCH 0/3] Hook up powerclamp with PM QOS and cpuidle

Jacob Pan

unread,

Nov 26, 2013, 6:30:01 PM11/26/13

to

This patchset is intended to address the behavior change and efficiency
loss introduced by using consolidated idle routine in powerclamp driver.

Specifically,
[PATCH 3/8] idle, thermal, acpi: Remove home grown idle implementations

The motivation is that after using common idle routine, powerclamp driver
can no longer pick the deepest idle state needed to conserve power.
Idle state is selected by governors which can be influenced by PM QOS and
other factors. This patchset hooks up powerclamp idle injection with PM
QOS and eventually influce idle governors to pick the power saving target
states.

There are some downside of this approach. Due to overhead, communication
with PM QOS is at enable/disable idle injection time instead of each
injection period. The implication is that if the system natual idle is
more than target injected idle, powerclamp will skip some injection period.
During this period however, deepest idle state may still be chosen
necessarily regardless the latency constraint.

Jacob Pan (3):
pm/qos: allow state control of qos class
cpuidle: check for pm qos constraint override
thermal/powerclamp: communicate with pm qos when injecting idle

drivers/cpuidle/governors/ladder.c | 4 ++++
drivers/cpuidle/governors/menu.c | 13 +++++++++++--
drivers/thermal/intel_powerclamp.c | 8 ++++++++
include/linux/pm_qos.h | 10 +++++++++-
kernel/power/qos.c | 24 ++++++++++++++++++++++++
5 files changed, 56 insertions(+), 3 deletions(-)

--
1.8.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Rafael J. Wysocki

unread,

Nov 26, 2013, 6:30:01 PM11/26/13

to

On 11/27/2013 12:20 AM, Jacob Pan wrote:
> When power capping or thermal control is needed, CPU QOS latency cannot
> be satisfied. This patch adds a state variable to indicate whether a QOS
> class (including all constraint requests) should be ignored.
>
> Signed-off-by: Jacob Pan <jacob....@linux.intel.com>

Honestly, I don't like this. I know the motivation and what you're
trying to achieve, but I don't like the approach.

I need to think a bit more about that.

Thanks,
Rafael

> ---

> include/linux/pm_qos.h | 10 +++++++++-
> kernel/power/qos.c | 24 ++++++++++++++++++++++++

> 2 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
> index 5a95013..648b50b 100644
> --- a/include/linux/pm_qos.h
> +++ b/include/linux/pm_qos.h
> @@ -10,7 +10,7 @@
> #include <linux/device.h>
> #include <linux/workqueue.h>
>
> -enum {
> +enum pm_qos_class {
> PM_QOS_RESERVED = 0,
> PM_QOS_CPU_DMA_LATENCY,
> PM_QOS_NETWORK_LATENCY,
> @@ -20,6 +20,11 @@ enum {
> PM_QOS_NUM_CLASSES,
> };
>
> +enum pm_qos_state {
> + PM_QOS_CONSTRAINT_AVAILABLE,
> + PM_QOS_CONSTRAINT_IGNORED,
> +};
> +
> enum pm_qos_flags_status {
> PM_QOS_FLAGS_UNDEFINED = -1,
> PM_QOS_FLAGS_NONE,
> @@ -77,6 +82,7 @@ struct pm_qos_constraints {
> struct plist_head list;
> s32 target_value; /* Do not change to 64 bit */
> s32 default_value;
> + enum pm_qos_state state;
> enum pm_qos_type type;
> struct blocking_notifier_head *notifiers;
> };
> @@ -123,6 +129,8 @@ int pm_qos_add_notifier(int pm_qos_class, struct notifier_block *notifier);
> int pm_qos_remove_notifier(int pm_qos_class, struct notifier_block *notifier);
> int pm_qos_request_active(struct pm_qos_request *req);
> s32 pm_qos_read_value(struct pm_qos_constraints *c);
> +void pm_qos_set_constraint_class_state(enum pm_qos_class class,
> + enum pm_qos_state state);
>
> #ifdef CONFIG_PM
> enum pm_qos_flags_status __dev_pm_qos_flags(struct device *dev, s32 mask);
> diff --git a/kernel/power/qos.c b/kernel/power/qos.c
> index 8dff9b4..cf475b0 100644
> --- a/kernel/power/qos.c
> +++ b/kernel/power/qos.c
> @@ -146,6 +146,11 @@ static inline int pm_qos_get_value(struct pm_qos_constraints *c)
>
> s32 pm_qos_read_value(struct pm_qos_constraints *c)
> {
> + /* return invalid default value if constraints cannot be met, e.g.
> + * during idle injection.
> + */
> + if (c->state == PM_QOS_CONSTRAINT_IGNORED)
> + return PM_QOS_DEFAULT_VALUE;
> return c->target_value;
> }
>
> @@ -353,6 +358,25 @@ void pm_qos_add_request(struct pm_qos_request *req,
> }
> EXPORT_SYMBOL_GPL(pm_qos_add_request);
>
> +void pm_qos_set_constraint_class_state(enum pm_qos_class class,
> + enum pm_qos_state state)
> +{
> + struct pm_qos_constraints *c = pm_qos_array[class]->constraints;
> + unsigned long curr_value;
> +
> + if (c->state == state)
> + return;
> + curr_value = (state == PM_QOS_CONSTRAINT_IGNORED) ?
> + PM_QOS_DEFAULT_VALUE : c->target_value;
> + c->state = state;
> +
> + /* notify existing QOS requests change */
> + blocking_notifier_call_chain(c->notifiers,
> + curr_value,
> + NULL);
> +}
> +EXPORT_SYMBOL_GPL(pm_qos_set_constraint_class_state);
> +
> /**
> * pm_qos_update_request - modifies an existing qos request
> * @req : handle to list element holding a pm_qos request to use

Jacob Pan

unread,

Nov 26, 2013, 6:30:01 PM11/26/13

to

When power capping or thermal control is needed, CPU QOS latency cannot
be satisfied. This patch adds a state variable to indicate whether a QOS
class (including all constraint requests) should be ignored.

Signed-off-by: Jacob Pan <jacob....@linux.intel.com>

---

include/linux/pm_qos.h | 10 +++++++++-
kernel/power/qos.c | 24 ++++++++++++++++++++++++

Jacob Pan

unread,

Nov 26, 2013, 6:30:01 PM11/26/13

to

If a QOS class is disabled due to platform thermal or power capping
needs, idle governor shall pick the deepest idle state to conserve power.

Power saving should be prioritized over QOS in this case in that
hardware based catastrophic measure is more intrusive than
managed idle.

Signed-off-by: Jacob Pan <jacob....@linux.intel.com>
---

drivers/cpuidle/governors/ladder.c | 4 ++++
drivers/cpuidle/governors/menu.c | 13 +++++++++++--

2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
index 9f08e8c..40161d3 100644
--- a/drivers/cpuidle/governors/ladder.c
+++ b/drivers/cpuidle/governors/ladder.c
@@ -71,6 +71,10 @@ static int ladder_select_state(struct cpuidle_driver *drv,
int last_residency, last_idx = ldev->last_state_idx;
int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);

+ /* qos ignored, pick the deepest idle state */
+ if (unlikely(latency_req == PM_QOS_DEFAULT_VALUE))
+ return drv->state_count - 1;
+
/* Special case when user has set very strict latency requirement */
if (unlikely(latency_req == 0)) {
ladder_do_selection(ldev, last_idx, 0);
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index cf7f2f0..afa0088 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -297,12 +297,21 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
data->needs_update = 0;
}

+ /* qos ignored, pick the deepest idle state */
+ if (unlikely(latency_req == PM_QOS_DEFAULT_VALUE)) {
+ struct cpuidle_state *s = &drv->states[drv->state_count - 1];
+
+ data->last_state_idx = drv->state_count - 1;
+ data->exit_us = s->exit_latency;
+ goto exit_done;
+ }
+
data->last_state_idx = 0;
data->exit_us = 0;

/* Special case when user has set very strict latency requirement */
if (unlikely(latency_req == 0))
- return 0;
+ goto exit_done;

/* determine the expected residency time, round up */
t = ktime_to_timespec(tick_nohz_get_sleep_length());
@@ -361,7 +370,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
data->last_state_idx = i;
data->exit_us = s->exit_latency;
}
-
+exit_done:
return data->last_state_idx;

Jacob Pan

unread,

Nov 26, 2013, 6:30:02 PM11/26/13

to

During idle injection period, CPU PM QOS class shall be ignored.
This will indirectly influence the idle governors to choose the
deepest idle states.

Signed-off-by: Jacob Pan <jacob....@linux.intel.com>
---

drivers/thermal/intel_powerclamp.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index de90f89..62dbf95 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -51,6 +51,7 @@
#include <linux/debugfs.h>
#include <linux/seq_file.h>
#include <linux/sched/rt.h>
+#include <linux/pm_qos.h>

#include <asm/nmi.h>
#include <asm/msr.h>
@@ -483,6 +484,10 @@ static int start_power_clamp(void)
clamping = true;
schedule_delayed_work(&poll_pkg_cstate_work, 0);

+ /* Tell PM QOS all CPU constraints are ignored during idle injection */
+ pm_qos_set_constraint_class_state(PM_QOS_CPU_DMA_LATENCY,
+ PM_QOS_CONSTRAINT_IGNORED);
+
/* start one thread per online cpu */
for_each_online_cpu(cpu) {
struct task_struct **p =
@@ -524,6 +529,9 @@ static void end_power_clamp(void)
kthread_stop(thread);
}
}
+ /* make CPU PM QOS active again */
+ pm_qos_set_constraint_class_state(PM_QOS_CPU_DMA_LATENCY,
+ PM_QOS_CONSTRAINT_AVAILABLE);
}

static int powerclamp_cpu_callback(struct notifier_block *nfb,

Peter Zijlstra

unread,

Nov 27, 2013, 7:00:03 AM11/27/13

to

On Tue, Nov 26, 2013 at 03:20:08PM -0800, Jacob Pan wrote:
> This patchset is intended to address the behavior change and efficiency
> loss introduced by using consolidated idle routine in powerclamp driver.
>
> Specifically,
> [PATCH 3/8] idle, thermal, acpi: Remove home grown idle implementations
>
> The motivation is that after using common idle routine, powerclamp driver
> can no longer pick the deepest idle state needed to conserve power.
> Idle state is selected by governors which can be influenced by PM QOS and
> other factors. This patchset hooks up powerclamp idle injection with PM
> QOS and eventually influce idle governors to pick the power saving target
> states.
>
> There are some downside of this approach. Due to overhead, communication
> with PM QOS is at enable/disable idle injection time instead of each
> injection period. The implication is that if the system natual idle is
> more than target injected idle, powerclamp will skip some injection period.
> During this period however, deepest idle state may still be chosen
> necessarily regardless the latency constraint.

Does the QoS stuff have a means of notifying its users of constraints
violation? I suspect some applications might light to be told if their
requests aren't honoured.

jacob pan

unread,

Nov 27, 2013, 11:50:03 AM11/27/13

to

Each class has a notifier. This patchset is calling the notifier
when the qos class is disable/enable. the receiver of these
notifications are in the kernel.

I don't see the qos core code has a way to signal userspace about
target change.

Rafael J. Wysocki

unread,

Nov 27, 2013, 3:40:02 PM11/27/13

to

But user space can add constraints and expect them to be actually satisfied.
If those constraints cannot be satisfied, there should be a mechanism to
notify the processes who set them about that. That mechanism is not present
currently.

Thanks,
Rafael

Rafael J. Wysocki

unread,

Jan 15, 2014, 8:10:02 PM1/15/14

to

On Wednesday, November 27, 2013 12:28:16 AM Rafael J. Wysocki wrote:
> On 11/27/2013 12:20 AM, Jacob Pan wrote:
> > When power capping or thermal control is needed, CPU QOS latency cannot
> > be satisfied. This patch adds a state variable to indicate whether a QOS
> > class (including all constraint requests) should be ignored.
> >
> > Signed-off-by: Jacob Pan <jacob....@linux.intel.com>
>
> Honestly, I don't like this. I know the motivation and what you're
> trying to achieve, but I don't like the approach.
>
> I need to think a bit more about that.

So the reason I don't like this patch is mainly because it affects all of the
users of struct pm_qos_constraints and pm_qos_read_value(), which include
device PM QoS among other things, but it only really needs to affect
PM_QOS_CPU_DMA_LATENCY.

I would add a special routine, say pm_qos_cpu_dma_latency(), for reading the
current effective PM_QOS_CPU_DMA_LATENCY constraint and checking whether or
not it should be ignored. Then, I'd make cpuidle use that.

Thanks!

> To unsubscribe from this list: send the line "unsubscribe linux-pm" in

> the body of a message to majo...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Jacob Pan

unread,

Jan 21, 2014, 5:20:04 PM1/21/14

to

On Thu, 16 Jan 2014 02:17:01 +0100
"Rafael J. Wysocki" <r...@rjwysocki.net> wrote:

> On Wednesday, November 27, 2013 12:28:16 AM Rafael J. Wysocki wrote:
> > On 11/27/2013 12:20 AM, Jacob Pan wrote:
> > > When power capping or thermal control is needed, CPU QOS latency
> > > cannot be satisfied. This patch adds a state variable to indicate
> > > whether a QOS class (including all constraint requests) should be
> > > ignored.
> > >
> > > Signed-off-by: Jacob Pan <jacob....@linux.intel.com>
> >
> > Honestly, I don't like this. I know the motivation and what you're
> > trying to achieve, but I don't like the approach.
> >
> > I need to think a bit more about that.
>
> So the reason I don't like this patch is mainly because it affects
> all of the users of struct pm_qos_constraints and
> pm_qos_read_value(), which include device PM QoS among other things,
> but it only really needs to affect PM_QOS_CPU_DMA_LATENCY.
>
> I would add a special routine, say pm_qos_cpu_dma_latency(), for
> reading the current effective PM_QOS_CPU_DMA_LATENCY constraint and
> checking whether or not it should be ignored. Then, I'd make cpuidle
> use that.
>

Agreed, it was a little too broad. I will send an updated patch soon.

Alternatively, can we add a special check for ignored system wide QOS
class in:
int pm_qos_request(int pm_qos_class)

i.e.
diff --git a/kernel/power/qos.c b/kernel/power/qos.c
index 8dff9b4..9342da4 100644
--- a/kernel/power/qos.c
+++ b/kernel/power/qos.c
@@ -286,10 +286,28 @@ bool pm_qos_update_flags(struct pm_qos_flags *pqf,
*/
int pm_qos_request(int pm_qos_class)
{
- return pm_qos_read_value(pm_qos_array[pm_qos_class]->constraints);
+ struct pm_qos_constraints *c;
+
+ c = pm_qos_array[pm_qos_class]->constraints;

+ if (c->state == PM_QOS_CONSTRAINT_IGNORED)
+ return PM_QOS_DEFAULT_VALUE;

+ return pm_qos_read_value(c);

Then we don't have to add a special routine just for CPU_DMA_LATENCY
class. It does not affect other system wide QOS classes unless the
state is set to be ignored.

[Jacob Pan]

Rafael J. Wysocki

unread,

Jan 21, 2014, 6:10:02 PM1/21/14

to

Yes, but then the check has to be done regardless which is slightly inefficient
and I'm not sure if we need/want a mechanism to set "ignored" for all classes.

It actually is specific to CPU in practice, so I'd prefer to make it specific
in the code as well.

Thanks,
Rafael

Jacob Pan

unread,

Jan 21, 2014, 6:50:02 PM1/21/14

to

On Wed, 22 Jan 2014 00:15:44 +0100

Actually, the idle consolidation patches went into the tip tree do not
include common idle loop, it was different than the earlier patch with
play_idle() which causes idle injection to go through pm qos.

There is no need for this patchset for now. acpi_pad and powerclamp
driver still can pick their own target c-states.

Thanks,

Jacob