[PATCH 1/1] Cpufreq: Make governor data on nonboot cpus across system syspend/resume

Lan Tianyu

unread,

Nov 15, 2013, 1:10:02 AM11/15/13

to

Currently, governor of nonboot cpus will be put to EXIT when system syspend.
Since all these cpus will be unplugged and the governor usage_count decreases
to zero. The governor data and its sysfs interfaces will be freed or released.
This makes user config of these governors loss during suspend and resume.

This doesn't happen on the governor covering boot cpu because it isn't
unplugged during system suspend.

To fix this issue, skipping governor exit during system suspend and check
policy governor data to determine whether the governor is really needed
to be initialized when do init. If not, return EALREADY to indicate the
governor has been initialized and should do nothing. __cpufreq_governor()
convert EALREADY to 0 as return value for INIT event since governor is
still under INIT state and can do START operation.

Signed-off-by: Lan Tianyu <tiany...@intel.com>
---
drivers/cpufreq/cpufreq.c | 5 ++++-
drivers/cpufreq/cpufreq_governor.c | 13 ++++++++++++-
2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 02d534d..38f2e4a 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1239,7 +1239,7 @@ static int __cpufreq_remove_dev_finish(struct device *dev,

/* If cpu is last user of policy, free policy */
if (cpus == 1) {
- if (has_target()) {
+ if (has_target() && !frozen) {
ret = __cpufreq_governor(policy,
CPUFREQ_GOV_POLICY_EXIT);
if (ret) {
@@ -1822,6 +1822,9 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
((event == CPUFREQ_GOV_POLICY_EXIT) && !ret))
module_put(policy->governor->owner);

+ if ((event == CPUFREQ_GOV_POLICY_INIT) && ret == -EALREADY)
+ ret = 0;
+
return ret;
}

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 0806c31..ddb93af 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -204,9 +204,20 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,

switch (event) {
case CPUFREQ_GOV_POLICY_INIT:
+ /*
+ * In order to keep governor data across suspend/resume,
+ * Governor doesn't exit when suspend and will be
+ * reinitialized when resume. Here check policy governor
+ * data to determine whether the governor has been exited.
+ * If not, return EALREADY.
+ */
if (have_governor_per_policy()) {
- WARN_ON(dbs_data);
+ if (dbs_data)
+ return -EALREADY;
} else if (dbs_data) {
+ if (policy->governor_data == dbs_data)
+ return -EALREADY;
+
dbs_data->usage_count++;
policy->governor_data = dbs_data;
return 0;
--
1.8.2.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Lan Tianyu

unread,

Nov 15, 2013, 3:20:01 AM11/15/13

to

Currently, governor of nonboot cpus will be put to EXIT when system suspend.

Since all these cpus will be unplugged and the governor usage_count decreases
to zero. The governor data and its sysfs interfaces will be freed or released.
This makes user config of these governors loss during suspend and resume.

This doesn't happen on the governor covering boot cpu because it isn't
unplugged during system suspend.

To fix this issue, skipping governor exit during system suspend and check
policy governor data to determine whether the governor is really needed
to be initialized when do init. If not, return EALREADY to indicate the
governor has been initialized and should do nothing. __cpufreq_governor()
convert EALREADY to 0 as return value for INIT event since governor is
still under INIT state and can do START operation.

Signed-off-by: Lan Tianyu <tiany...@intel.com>
---

Fix some typos

Viresh Kumar

unread,

Nov 15, 2013, 5:30:02 AM11/15/13

to

On 15 November 2013 13:45, Lan Tianyu <tiany...@intel.com> wrote:
> Currently, governor of nonboot cpus will be put to EXIT when system suspend.
> Since all these cpus will be unplugged and the governor usage_count decreases
> to zero. The governor data and its sysfs interfaces will be freed or released.
> This makes user config of these governors loss during suspend and resume.
>
> This doesn't happen on the governor covering boot cpu because it isn't
> unplugged during system suspend.
>
> To fix this issue, skipping governor exit during system suspend and check
> policy governor data to determine whether the governor is really needed
> to be initialized when do init. If not, return EALREADY to indicate the
> governor has been initialized and should do nothing. __cpufreq_governor()
> convert EALREADY to 0 as return value for INIT event since governor is
> still under INIT state and can do START operation.
>
> Signed-off-by: Lan Tianyu <tiany...@intel.com>
> ---

Hi Lan..

Apologies!!

I already had a solution for this as this was reported by few Broadcom people
as well. But I haven't send it to mainline yet as it was untested. It
looked similar
to what you had..

And so I would have taken your patch (as you have sent it first to the list and
there is no real advantage of my patch over yours, they were almost same) :)

But then I went chasing another bug posted by Nishant:

https://lkml.org/lkml/2013/10/24/369

And the final solution I have to write solved all the problems you and he
reported.

Please have a look at that patch (you are cc'd) and give it a try to see
if it fixes your problem..

Btw, One question about your setup:
- you must have a multi cluster/socket SoC as you have atleast one more
policy structure than used for group containing boot cpu..
- Are you using separate governor for both groups?
- Or are you using CPUFREQ_HAVE_GOVERNOR_PER_POLICY stuff
to use same governor with separate tunables for both groups?

Just wanted to know if somebody else is also using
CPUFREQ_HAVE_GOVERNOR_PER_POLICY :)

Viresh Kumar

unread,

Nov 15, 2013, 6:00:01 AM11/15/13

to

Adding Rainer as well

On 15 November 2013 13:45, Lan Tianyu <tiany...@intel.com> wrote:

> Currently, governor of nonboot cpus will be put to EXIT when system suspend.
> Since all these cpus will be unplugged and the governor usage_count decreases
> to zero. The governor data and its sysfs interfaces will be freed or released.
> This makes user config of these governors loss during suspend and resume.
>
> This doesn't happen on the governor covering boot cpu because it isn't
> unplugged during system suspend.
>
> To fix this issue, skipping governor exit during system suspend and check
> policy governor data to determine whether the governor is really needed
> to be initialized when do init. If not, return EALREADY to indicate the
> governor has been initialized and should do nothing. __cpufreq_governor()
> convert EALREADY to 0 as return value for INIT event since governor is
> still under INIT state and can do START operation.

Though the patch I have sent fixes a problem similar to this but I don't think
patch of any of us will solve the issue Rainer is facing..

I checked his system configuration and its like this:
- Four CPUs, all having separate clock domains (atleast from kernel
perspective) and so separate policy structure.
- All are using ondemand governor
- not using CPUFREQ_HAVE_GOVERNOR_PER_POLICY feature
- So there is a single set of tunables for ondemand governor that is applicable
across all CPUs..

The way INIT/EXIT are designed in cpufreq_governor.c should take care
of this scenario.

memory for tunables must not be freed unless all the CPUs are removed.
Which can't happen, as we only offline non-boot CPUs and so I believe
that memory isn't getting freed and so your solution wouldn't address his
problem..

Sorry if I said something stupid enough :)

--
viresh

Rafael J. Wysocki

unread,

Nov 15, 2013, 7:30:04 PM11/15/13

to

On Friday, November 15, 2013 04:15:34 PM Lan Tianyu wrote:
> Currently, governor of nonboot cpus will be put to EXIT when system suspend.
> Since all these cpus will be unplugged and the governor usage_count decreases
> to zero. The governor data and its sysfs interfaces will be freed or released.
> This makes user config of these governors loss during suspend and resume.

First off, do we have a pointer to a bug report related to that?

Second, what does need to be done to reproduce this problem?

I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Lan Tianyu

unread,

Nov 15, 2013, 11:10:02 PM11/15/13

to

On 11/16/2013 08:38 AM, Rafael J. Wysocki wrote:
> On Friday, November 15, 2013 04:15:34 PM Lan Tianyu wrote:
>> Currently, governor of nonboot cpus will be put to EXIT when system suspend.
>> Since all these cpus will be unplugged and the governor usage_count decreases
>> to zero. The governor data and its sysfs interfaces will be freed or released.
>> This makes user config of these governors loss during suspend and resume.
>
> First off, do we have a pointer to a bug report related to that?
>

No, I found this bug when I tried to resolve other similar bug.
https://bugzilla.kernel.org/show_bug.cgi?id=63081. I still have no idea
about bug 63081 and asked reporter to try this patch.

> Second, what does need to be done to reproduce this problem?
>

Defaultly, all cpus use ondemand governor after bootup. Change one
non-boot cpu's governor to conservative, modify conservative config via
sysfs interface and then do system suspend. After resume, the config
of conservative is reset. On my machine, all cpus have owen policy.

Best Regards
Tianyu Lan

Lan Tianyu

unread,

Nov 15, 2013, 11:40:02 PM11/15/13

to

On 11/15/2013 06:22 PM, Viresh Kumar wrote:
> On 15 November 2013 13:45, Lan Tianyu <tiany...@intel.com> wrote:
>> Currently, governor of nonboot cpus will be put to EXIT when system suspend.
>> Since all these cpus will be unplugged and the governor usage_count decreases
>> to zero. The governor data and its sysfs interfaces will be freed or released.
>> This makes user config of these governors loss during suspend and resume.
>>
>> This doesn't happen on the governor covering boot cpu because it isn't
>> unplugged during system suspend.
>>
>> To fix this issue, skipping governor exit during system suspend and check
>> policy governor data to determine whether the governor is really needed
>> to be initialized when do init. If not, return EALREADY to indicate the
>> governor has been initialized and should do nothing. __cpufreq_governor()
>> convert EALREADY to 0 as return value for INIT event since governor is
>> still under INIT state and can do START operation.
>>
>> Signed-off-by: Lan Tianyu <tiany...@intel.com>
>> ---
>
> Hi Lan..
>

Hi Viresh:

> Apologies!!
>
> I already had a solution for this as this was reported by few Broadcom people
> as well. But I haven't send it to mainline yet as it was untested. It
> looked similar
> to what you had..
>
> And so I would have taken your patch (as you have sent it first to the list and
> there is no real advantage of my patch over yours, they were almost same) :)
>
> But then I went chasing another bug posted by Nishant:
>
> https://lkml.org/lkml/2013/10/24/369
>
> And the final solution I have to write solved all the problems you and he
> reported.
>
> Please have a look at that patch (you are cc'd) and give it a try to see
> if it fixes your problem..

Never mind. I think it should work and will try it.

>
> Btw, One question about your setup:
> - you must have a multi cluster/socket SoC as you have atleast one more
> policy structure than used for group containing boot cpu..

Actually, I test on a laptop and find this issue when reading code to
fix other bug. :)

All cpus have their own policys.

> - Are you using separate governor for both groups?

Just to produce the bug, I set one non-boot cpu to conservative gov. All
other remain default "ondemand".

> - Or are you using CPUFREQ_HAVE_GOVERNOR_PER_POLICY stuff
> to use same governor with separate tunables for both groups?
>

No, I am not using this.

> Just wanted to know if somebody else is also using
> CPUFREQ_HAVE_GOVERNOR_PER_POLICY :)
>

--

Best Regards
Tianyu Lan

viresh kumar

unread,

Nov 16, 2013, 12:30:01 AM11/16/13

to

On Friday 15 November 2013 04:24 PM, Viresh Kumar wrote:
> Though the patch I have sent fixes a problem similar to this but I don't think
> patch of any of us will solve the issue Rainer is facing..
>
> I checked his system configuration and its like this:
> - Four CPUs, all having separate clock domains (atleast from kernel
> perspective) and so separate policy structure.
> - All are using ondemand governor
> - not using CPUFREQ_HAVE_GOVERNOR_PER_POLICY feature
> - So there is a single set of tunables for ondemand governor that is applicable
> across all CPUs..
>
> The way INIT/EXIT are designed in cpufreq_governor.c should take care
> of this scenario.
>
> memory for tunables must not be freed unless all the CPUs are removed.
> Which can't happen, as we only offline non-boot CPUs and so I believe
> that memory isn't getting freed and so your solution wouldn't address his
> problem..
>
> Sorry if I said something stupid enough :)

I haven't :)

From your another mail it is clear that you have used separate governors and so
you have faced the real problem :)

Hope my patch fixes it for you.

Rafael J. Wysocki

unread,

Nov 16, 2013, 9:30:01 AM11/16/13

to

On Saturday, November 16, 2013 11:59:59 AM Lan Tianyu wrote:
> On 11/16/2013 08:38 AM, Rafael J. Wysocki wrote:
> > On Friday, November 15, 2013 04:15:34 PM Lan Tianyu wrote:
> >> Currently, governor of nonboot cpus will be put to EXIT when system suspend.
> >> Since all these cpus will be unplugged and the governor usage_count decreases
> >> to zero. The governor data and its sysfs interfaces will be freed or released.
> >> This makes user config of these governors loss during suspend and resume.
> >
> > First off, do we have a pointer to a bug report related to that?
> >
>
> No, I found this bug when I tried to resolve other similar bug.
> https://bugzilla.kernel.org/show_bug.cgi?id=63081. I still have no idea
> about bug 63081 and asked reporter to try this patch.
>
> > Second, what does need to be done to reproduce this problem?
> >
>
> Defaultly, all cpus use ondemand governor after bootup. Change one
> non-boot cpu's governor to conservative,

Well, why would anyone want to do that? Just out of curiosity ...

> modify conservative config via sysfs interface and then do system suspend.
> After resume, the config of conservative is reset. On my machine, all cpus
> have owen policy.

So this is acpi-cpufreq, right?

The patch looks basically OK to me, but ->

-> I'd prefer this check to be combined with the one done to determine whether
or not we need to do the module_put(). Something like

if (event == CPUFREQ_GOV_POLICY_EXIT && ret) {
module_put(policy->governor->owner);
if (ret == -EALREADY)
return 0;
} else if (event == CPUFREQ_GOV_POLICY_EXIT && !ret) {
module_put(policy->governor->owner);
}

Thanks!

I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Rafael J. Wysocki

unread,

Nov 16, 2013, 10:00:02 AM11/16/13

to

On Saturday, November 16, 2013 08:27:07 PM Viresh Kumar wrote:

> On 16 November 2013 20:11, Rafael J. Wysocki <r...@rjwysocki.net> wrote:
> > On Saturday, November 16, 2013 11:59:59 AM Lan Tianyu wrote:
>
> >> Defaultly, all cpus use ondemand governor after bootup. Change one
> >> non-boot cpu's governor to conservative,
> >
> > Well, why would anyone want to do that? Just out of curiosity ...
>

> People may want to use different group/cluster/socket of CPUs differently,
> with different kind of policies. Maybe performance governor for boot cpu
> and ondemand for others.
>
> This bug would also be there for big LITTLE where we want to have
> separate set of tunables for big and LITTLE clusters for the same type
> of governor.

>
> > So this is acpi-cpufreq, right?
>

> Probably yes, I saw something similar somewhere.. But this is driver
> independent..

>
> > The patch looks basically OK to me, but ->
>

> We wouldn't need this patch if my other patch (where I am disabling
> governors in suspend/resume goes in, in any form)..

Yes, I know that, but I don't think this is the right approach.

Thanks!

Viresh Kumar

unread,

Nov 16, 2013, 10:00:02 AM11/16/13

to

On 16 November 2013 20:11, Rafael J. Wysocki <r...@rjwysocki.net> wrote:

> On Saturday, November 16, 2013 11:59:59 AM Lan Tianyu wrote:

>> Defaultly, all cpus use ondemand governor after bootup. Change one
>> non-boot cpu's governor to conservative,
>
> Well, why would anyone want to do that? Just out of curiosity ...

People may want to use different group/cluster/socket of CPUs differently,
with different kind of policies. Maybe performance governor for boot cpu
and ondemand for others.

This bug would also be there for big LITTLE where we want to have
separate set of tunables for big and LITTLE clusters for the same type
of governor.

> So this is acpi-cpufreq, right?

Probably yes, I saw something similar somewhere.. But this is driver
independent..

> The patch looks basically OK to me, but ->

We wouldn't need this patch if my other patch (where I am disabling
governors in suspend/resume goes in, in any form)..

Rafael J. Wysocki

unread,

Nov 16, 2013, 10:00:03 AM11/16/13

to

On Saturday, November 16, 2013 03:41:10 PM Rafael J. Wysocki wrote:

[...]

> > >> @@ -1822,6 +1822,9 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
> > >> ((event == CPUFREQ_GOV_POLICY_EXIT) && !ret))
> > >> module_put(policy->governor->owner);
> > >>
> > >> + if ((event == CPUFREQ_GOV_POLICY_INIT) && ret == -EALREADY)
> > >> + ret = 0;
> > >> +
>
> -> I'd prefer this check to be combined with the one done to determine whether
> or not we need to do the module_put(). Something like
>
> if (event == CPUFREQ_GOV_POLICY_EXIT && ret) {

Obviously, that should be:

if (event == CPUFREQ_GOV_POLICY_INIT && ret) {

> module_put(policy->governor->owner);
> if (ret == -EALREADY)
> return 0;
> } else if (event == CPUFREQ_GOV_POLICY_EXIT && !ret) {
> module_put(policy->governor->owner);
> }

Sorry for the confusion.

Thanks!

Lan Tianyu

unread,

Nov 16, 2013, 10:30:02 AM11/16/13

to

On 11/16/2013 10:41 PM, Rafael J. Wysocki wrote:
> On Saturday, November 16, 2013 11:59:59 AM Lan Tianyu wrote:
>> On 11/16/2013 08:38 AM, Rafael J. Wysocki wrote:
>>> On Friday, November 15, 2013 04:15:34 PM Lan Tianyu wrote:
>>>> Currently, governor of nonboot cpus will be put to EXIT when system suspend.
>>>> Since all these cpus will be unplugged and the governor usage_count decreases
>>>> to zero. The governor data and its sysfs interfaces will be freed or released.
>>>> This makes user config of these governors loss during suspend and resume.
>>>
>>> First off, do we have a pointer to a bug report related to that?
>>>
>>
>> No, I found this bug when I tried to resolve other similar bug.
>> https://bugzilla.kernel.org/show_bug.cgi?id=63081. I still have no idea
>> about bug 63081 and asked reporter to try this patch.
>>
>>> Second, what does need to be done to reproduce this problem?
>>>
>>
>> Defaultly, all cpus use ondemand governor after bootup. Change one
>> non-boot cpu's governor to conservative,
>
> Well, why would anyone want to do that? Just out of curiosity ...

Just use this way to produce the issue. But on the laptop, I think
fewer people will do the same thing. Just like Viresh said, this also
will affect the systems of governor per policy.

>
>> modify conservative config via sysfs interface and then do system suspend.
>> After resume, the config of conservative is reset. On my machine, all cpus
>> have owen policy.
>
> So this is acpi-cpufreq, right?
>

Yes, it's acpi-cpufreq driver.

Ok. I will update soon.

Best Regards
Tianyu Lan

Lan Tianyu

unread,

Nov 16, 2013, 10:40:02 AM11/16/13

to

Currently, governor of nonboot cpus will be put to EXIT when system suspend.
Since all these cpus will be unplugged and the governor usage_count decreases
to zero. The governor data and its sysfs interfaces will be freed or released.
This makes user config of these governors loss during suspend and resume.

This doesn't happen on the governor covering boot cpu because it isn't
unplugged during system suspend.

To fix this issue, skipping governor exit during system suspend and check
policy governor data to determine whether the governor is really needed
to be initialized when do init. If not, return EALREADY to indicate the
governor has been initialized and should do nothing. __cpufreq_governor()
convert EALREADY to 0 as return value for INIT event since governor is
still under INIT state and can do START operation.

Signed-off-by: Lan Tianyu <tiany...@intel.com>
---

Change since v1:
Change coding style.

drivers/cpufreq/cpufreq.c | 10 +++++++---
drivers/cpufreq/cpufreq_governor.c | 13 ++++++++++++-
2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 02d534d..73ad593 100644

--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1239,7 +1239,7 @@ static int __cpufreq_remove_dev_finish(struct device *dev,

/* If cpu is last user of policy, free policy */
if (cpus == 1) {
- if (has_target()) {
+ if (has_target() && !frozen) {
ret = __cpufreq_governor(policy,
CPUFREQ_GOV_POLICY_EXIT);
if (ret) {

@@ -1818,9 +1818,13 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
mutex_unlock(&cpufreq_governor_lock);
}

- if (((event == CPUFREQ_GOV_POLICY_INIT) && ret) ||
- ((event == CPUFREQ_GOV_POLICY_EXIT) && !ret))
+ if ((event == CPUFREQ_GOV_POLICY_INIT) && ret) {
+ module_put(policy->governor->owner);
+ if (ret == -EALREADY)
+ return 0;
+ } else if ((event == CPUFREQ_GOV_POLICY_EXIT) && !ret) {
module_put(policy->governor->owner);
+ }

1.8.2.1

viresh kumar

unread,

Nov 16, 2013, 11:20:01 PM11/16/13

to

On 16 November 2013 21:06, Lan Tianyu <tiany...@intel.com> wrote:
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c

> @@ -1818,9 +1818,13 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
> mutex_unlock(&cpufreq_governor_lock);
> }
>
> - if (((event == CPUFREQ_GOV_POLICY_INIT) && ret) ||
> - ((event == CPUFREQ_GOV_POLICY_EXIT) && !ret))
> + if ((event == CPUFREQ_GOV_POLICY_INIT) && ret) {
> + module_put(policy->governor->owner);
> + if (ret == -EALREADY)
> + return 0;
> + } else if ((event == CPUFREQ_GOV_POLICY_EXIT) && !ret) {
> module_put(policy->governor->owner);
> + }

This can still be written more efficiently I believe:

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 138ebe9..54e28e1 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1866,10 +1866,12 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
ret = policy->governor->governor(policy, event);

if (!ret) {
- if (event == CPUFREQ_GOV_POLICY_INIT)
+ if (event == CPUFREQ_GOV_POLICY_INIT) {
policy->governor->initialized++;
- else if (event == CPUFREQ_GOV_POLICY_EXIT)
+ } else if (event == CPUFREQ_GOV_POLICY_EXIT) {
policy->governor->initialized--;

+ module_put(policy->governor->owner);
+ }

} else {
/* Restore original values */
mutex_lock(&cpufreq_governor_lock);
@@ -1877,13 +1879,14 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
policy->governor_enabled = true;
else if (event == CPUFREQ_GOV_START)
policy->governor_enabled = false;
+ else if (event == CPUFREQ_GOV_POLICY_INIT)

+ if (ret == -EALREADY) {

+ module_put(policy->governor->owner);

+ ret = 0;
+ }

mutex_unlock(&cpufreq_governor_lock);
}

- if (((event == CPUFREQ_GOV_POLICY_INIT) && ret) ||
- ((event == CPUFREQ_GOV_POLICY_EXIT) && !ret))

- module_put(policy->governor->owner);
-
return ret;

But I don't really like the solution here. You are handling frozen for EXIT in
cpufreq-core and for INIT in governor. That doesn't look like the right
approach. There are out of tree governors too (I know we don't care about them
:)), and those also need to adapt with some policy made at cpufreq-core level.

I told you that I had another solution for this problem, pretty similar to
yours. It looked like this:

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 4fa06de..e70e906 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -408,7 +408,8 @@ show_one(scaling_max_freq, max);
show_one(scaling_cur_freq, cur);

static int cpufreq_set_policy(struct cpufreq_policy *policy,
- struct cpufreq_policy *new_policy);
+ struct cpufreq_policy *new_policy,
+ bool frozen);

/**
* cpufreq_per_cpu_attr_write() / store_##file_name() - sysfs write access
@@ -428,7 +429,7 @@ static ssize_t store_##file_name \
if (ret != 1) \
return -EINVAL; \
\
- ret = cpufreq_set_policy(policy, &new_policy); \
+ ret = cpufreq_set_policy(policy, &new_policy, false); \
policy->user_policy.object = policy->object; \
\
return ret ? ret : count; \
@@ -486,7 +487,7 @@ static ssize_t store_scaling_governor(struct cpufreq_policy
*policy,
&new_policy.governor))
return -EINVAL;

- ret = cpufreq_set_policy(policy, &new_policy);
+ ret = cpufreq_set_policy(policy, &new_policy, false);

policy->user_policy.policy = policy->policy;
policy->user_policy.governor = policy->governor;
@@ -822,7 +823,7 @@ err_out_kobj_put:
return ret;
}

-static void cpufreq_init_policy(struct cpufreq_policy *policy)
+static void cpufreq_init_policy(struct cpufreq_policy *policy, bool frozen)
{
struct cpufreq_policy new_policy;
int ret = 0;
@@ -832,7 +833,7 @@ static void cpufreq_init_policy(struct cpufreq_policy *policy)
policy->governor = NULL;

/* set default policy */
- ret = cpufreq_set_policy(policy, &new_policy);
+ ret = cpufreq_set_policy(policy, &new_policy, frozen);
policy->user_policy.policy = policy->policy;
policy->user_policy.governor = policy->governor;

@@ -1077,7 +1078,7 @@ static int __cpufreq_add_dev(struct device *dev, struct
subsys_interface *sif,
list_add(&policy->policy_list, &cpufreq_policy_list);
write_unlock_irqrestore(&cpufreq_driver_lock, flags);

- cpufreq_init_policy(policy);
+ cpufreq_init_policy(policy, frozen);

kobject_uevent(&policy->kobj, KOBJ_ADD);
up_read(&cpufreq_rwsem);
@@ -1239,17 +1240,17 @@ static int __cpufreq_remove_dev_finish(struct device *dev,

/* If cpu is last user of policy, free policy */
if (cpus == 1) {
- if (has_target()) {

- ret = __cpufreq_governor(policy,
- CPUFREQ_GOV_POLICY_EXIT);
- if (ret) {
- pr_err("%s: Failed to exit governor\n",
- __func__);
- return ret;
+ if (!frozen) {
+ if (has_target()) {
+ ret = __cpufreq_governor(policy,
+ CPUFREQ_GOV_POLICY_EXIT);
+ if (ret) {
+ pr_err("%s: Failed to exit governor\n",
+ __func__);
+ return ret;
+ }
}
- }

- if (!frozen) {
down_read(&policy->rwsem);
kobj = &policy->kobj;
cmp = &policy->kobj_unregister;
@@ -1920,7 +1921,7 @@ EXPORT_SYMBOL(cpufreq_get_policy);
* new_policy: policy to be set.
*/
static int cpufreq_set_policy(struct cpufreq_policy *policy,
- struct cpufreq_policy *new_policy)
+ struct cpufreq_policy *new_policy, bool frozen)
{
int ret = 0, failed = 1;

@@ -1987,7 +1988,7 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,

/* start new governor */
policy->governor = new_policy->governor;
- if (!__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT)) {
+ if (frozen || !__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT)) {
if (!__cpufreq_governor(policy, CPUFREQ_GOV_START)) {
failed = 0;
} else {
@@ -2065,7 +2066,7 @@ int cpufreq_update_policy(unsigned int cpu)
}
}

- ret = cpufreq_set_policy(policy, &new_policy);
+ ret = cpufreq_set_policy(policy, &new_policy, false);

up_write(&policy->rwsem);

-------------------------------

But after the PM notifiers patch I even don't want this to go.. I will make sure
that that patch goes in, in one form or another :)

So, just wait for sometime before posting a new version :) (I know you did it
because Rafael suggested a change)..

Rafael J. Wysocki

unread,

Nov 17, 2013, 9:40:01 AM11/17/13

to

You can write this as

else if (event == CPUFREQ_GOV_POLICY_INIT && ret == -EALREADY) {

> + module_put(policy->governor->owner);
> + ret = 0;
> + }
> mutex_unlock(&cpufreq_governor_lock);
> }
>
> - if (((event == CPUFREQ_GOV_POLICY_INIT) && ret) ||
> - ((event == CPUFREQ_GOV_POLICY_EXIT) && !ret))
> - module_put(policy->governor->owner);
> -
> return ret;

Apart from the above I agree.

Thanks!

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Viresh Kumar

unread,

Nov 17, 2013, 11:30:02 AM11/17/13

to

On 17 November 2013 20:14, Rafael J. Wysocki <r...@rjwysocki.net> wrote:
> On Sunday, November 17, 2013 09:43:14 AM viresh kumar wrote:

>> + else if (event == CPUFREQ_GOV_POLICY_INIT)
>> + if (ret == -EALREADY) {
>
> You can write this as
>
> else if (event == CPUFREQ_GOV_POLICY_INIT && ret == -EALREADY) {

I wrote it that way on the first go (though with separate braces for
both comparisons
as well), but there were multiple statements to enclose and so will
require {} and then
that would have to be added for above single line if/else as well, if
we follow coding
guidelines properly..

Anyway, this code is no more required. :)

>> + module_put(policy->governor->owner);
>> + ret = 0;
>> + }
>> mutex_unlock(&cpufreq_governor_lock);
>> }
>>
>> - if (((event == CPUFREQ_GOV_POLICY_INIT) && ret) ||
>> - ((event == CPUFREQ_GOV_POLICY_EXIT) && !ret))
>> - module_put(policy->governor->owner);
>> -
>> return ret;
>
> Apart from the above I agree.

Thanks..

Rafael J. Wysocki

unread,

Nov 21, 2013, 9:40:03 AM11/21/13

to

The inner parens are not necessary.

> - ((event == CPUFREQ_GOV_POLICY_EXIT) && !ret))
> + if ((event == CPUFREQ_GOV_POLICY_INIT) && ret) {

Same here.

> + module_put(policy->governor->owner);
> + if (ret == -EALREADY)
> + return 0;
> + } else if ((event == CPUFREQ_GOV_POLICY_EXIT) && !ret) {

Same here.

> module_put(policy->governor->owner);
> + }

Can you please combine these checks with the checks made right after
calling policy->governor->governor() as indicated by Viresh?

Rafael

Viresh Kumar

unread,

Nov 21, 2013, 11:00:02 AM11/21/13

to

On 21 November 2013 20:13, Rafael J. Wysocki <r...@rjwysocki.net> wrote:
>> - if (((event == CPUFREQ_GOV_POLICY_INIT) && ret) ||
>
> The inner parens are not necessary.
>
>> - ((event == CPUFREQ_GOV_POLICY_EXIT) && !ret))
>> + if ((event == CPUFREQ_GOV_POLICY_INIT) && ret) {
>
> Same here.
>
>> + module_put(policy->governor->owner);
>> + if (ret == -EALREADY)
>> + return 0;
>> + } else if ((event == CPUFREQ_GOV_POLICY_EXIT) && !ret) {
>
> Same here.

Logically, yes you are correct. But probably its better for readability to
get these even if you know precedence is going to take care of our
expression..

Rafael J. Wysocki

unread,

Nov 21, 2013, 4:40:02 PM11/21/13

to

On Thursday, November 21, 2013 09:24:02 PM Viresh Kumar wrote:
> On 21 November 2013 20:13, Rafael J. Wysocki <r...@rjwysocki.net> wrote:
> >> - if (((event == CPUFREQ_GOV_POLICY_INIT) && ret) ||
> >
> > The inner parens are not necessary.
> >
> >> - ((event == CPUFREQ_GOV_POLICY_EXIT) && !ret))
> >> + if ((event == CPUFREQ_GOV_POLICY_INIT) && ret) {
> >
> > Same here.
> >
> >> + module_put(policy->governor->owner);
> >> + if (ret == -EALREADY)
> >> + return 0;
> >> + } else if ((event == CPUFREQ_GOV_POLICY_EXIT) && !ret) {
> >
> > Same here.
>
> Logically, yes you are correct. But probably its better for readability to
> get these even if you know precedence is going to take care of our
> expression..

Are you serious?

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Lan Tianyu

unread,

Nov 22, 2013, 2:40:01 AM11/22/13

to

Currently, governor of nonboot cpus will be put to EXIT when system suspend.
Since all these cpus will be unplugged and the governor usage_count decreases
to zero. The governor data and its sysfs interfaces will be freed or released.
This makes user config of these governors loss during suspend and resume.

This doesn't happen on the governor covering boot cpu because it isn't
unplugged during system suspend.

To fix this issue, skipping governor exit during system suspend and check
policy governor data to determine whether the governor is really needed
to be initialized when do init. If not, return EALREADY to indicate the
governor has been initialized and should do nothing. __cpufreq_governor()
convert EALREADY to 0 as return value for INIT event since governor is
still under INIT state and can do START operation.

Signed-off-by: Lan Tianyu <tiany...@intel.com>
---

Change since v1 and v2:
Change some coding style.

drivers/cpufreq/cpufreq.c | 14 ++++++++------
drivers/cpufreq/cpufreq_governor.c | 13 ++++++++++++-
2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 02d534d..0b5a1a2 100644

--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1239,7 +1239,7 @@ static int __cpufreq_remove_dev_finish(struct device *dev,

/* If cpu is last user of policy, free policy */
if (cpus == 1) {
- if (has_target()) {
+ if (has_target() && !frozen) {
ret = __cpufreq_governor(policy,
CPUFREQ_GOV_POLICY_EXIT);
if (ret) {

@@ -1806,8 +1806,10 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
if (!ret) {

if (event == CPUFREQ_GOV_POLICY_INIT)
policy->governor->initialized++;
- else if (event == CPUFREQ_GOV_POLICY_EXIT)

+ else if (event == CPUFREQ_GOV_POLICY_EXIT) {
policy->governor->initialized--;

+ module_put(policy->governor->owner);
+ }

} else {
/* Restore original values */
mutex_lock(&cpufreq_governor_lock);

@@ -1815,13 +1817,13 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,

policy->governor_enabled = true;
else if (event == CPUFREQ_GOV_START)
policy->governor_enabled = false;

+ else if (event == CPUFREQ_GOV_POLICY_INIT && ret == -EALREADY) {
+ module_put(policy->governor->owner);

+ ret = 0;
+ }
mutex_unlock(&cpufreq_governor_lock);
}

- if (((event == CPUFREQ_GOV_POLICY_INIT) && ret) ||

- ((event == CPUFREQ_GOV_POLICY_EXIT) && !ret))

- module_put(policy->governor->owner);
-
return ret;
}

--
1.8.2.1

viresh kumar

unread,

Nov 22, 2013, 3:00:02 AM11/22/13

to

On Sunday 17 November 2013 09:43 AM, viresh kumar wrote:
> On 16 November 2013 21:06, Lan Tianyu <tiany...@intel.com> wrote:

> But I don't really like the solution here. You are handling frozen for EXIT in
> cpufreq-core and for INIT in governor. That doesn't look like the right
> approach. There are out of tree governors too (I know we don't care about them
> :)), and those also need to adapt with some policy made at cpufreq-core level.
>
> I told you that I had another solution for this problem, pretty similar to
> yours. It looked like this:

Hi Lan,

There is some confusion going on here :)

There were few problems in the approach in your patch, which I have mentioned
above, and Rafael agreed to them..

> But after the PM notifiers patch I even don't want this to go.. I will make sure
> that that patch goes in, in one form or another :)

And I was still trying to get a better solution in place of these changes. And
was going on the suggestions you gave about calling cpufreq callbacks from
dpm_{suspend|resume}_noirq() calls.. And I am on my way to get things fixed that
way. And so we don't actually need this patch anymore (I just saw that you have
sent another version of it, probably because Rafael asked? Don't know what
happened there :))..

So, I will try to get something working soon for you and Nishanth..

Viresh Kumar

unread,

Nov 22, 2013, 3:40:02 AM11/22/13

to

On 22 November 2013 13:49, Lan Tianyu <tiany...@intel.com> wrote:

> I think you also are in the Cc list and replied the mail.:)
> https://lkml.org/lkml/2013/11/21/273

Yeah, my reply was more about the coding style and I should have
asked again on the same mail about waiting for sometime before
posting V3... My fault!!

> I only saw the out-of-tree governor issue your mentioned but where they
> are? How upstream kernel cares them?

That's what I said, we might not care about them. But I had issues with
the design of your patch:

>>> But I don't really like the solution here. You are handling frozen for EXIT in
>>> cpufreq-core and for INIT in governor. That doesn't look like the right
>>> approach.

And so gave the solution I had as well.. And then said, I even don't want my
solution to go in, as this can be fixed by taking adequate steps before
removing non-boot CPUs and we are still in discussions for that.

I will post the short term solution that Rafael referred to as soon as
possible and will discuss more with Rafael about the long term one
(As Rafael pointed out).

Sorry for the confusion..

Lan Tianyu

unread,

Nov 22, 2013, 3:40:02 AM11/22/13

to

On 2013年11月22日 15:49, viresh kumar wrote:
> On Sunday 17 November 2013 09:43 AM, viresh kumar wrote:
>> On 16 November 2013 21:06, Lan Tianyu <tiany...@intel.com> wrote:
>
>> But I don't really like the solution here. You are handling frozen for EXIT in
>> cpufreq-core and for INIT in governor. That doesn't look like the right
>> approach. There are out of tree governors too (I know we don't care about them
>> :)), and those also need to adapt with some policy made at cpufreq-core level.
>>
>> I told you that I had another solution for this problem, pretty similar to
>> yours. It looked like this:
>
> Hi Lan,
>

Hi Viresh:

> There is some confusion going on here :)
>

I think you also are in the Cc list and replied the mail.:)
https://lkml.org/lkml/2013/11/21/273

> There were few problems in the approach in your patch, which I have mentioned
> above, and Rafael agreed to them..

I only saw the out-of-tree governor issue your mentioned but where they
are? How upstream kernel cares them?

>

>> But after the PM notifiers patch I even don't want this to go.. I will make sure
>> that that patch goes in, in one form or another :)
>
> And I was still trying to get a better solution in place of these changes. And
> was going on the suggestions you gave about calling cpufreq callbacks from
> dpm_{suspend|resume}_noirq() calls.. And I am on my way to get things fixed that
> way. And so we don't actually need this patch anymore (I just saw that you have
> sent another version of it, probably because Rafael asked? Don't know what
> happened there :))..
>
> So, I will try to get something working soon for you and Nishanth..
>

--

Best regards
Tianyu Lan