Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[RFC] ARM: sched_clock: update epoch_cyc on resume

146 views
Skip to first unread message

Colin Cross

unread,
Jul 17, 2012, 7:30:02 PM7/17/12
to
Many clocks that are used to provide sched_clock will reset during
suspend. If read_sched_clock returns 0 after suspend, sched_clock will
appear to jump forward. This patch resets cd.epoch_cyc to the current
value of read_sched_clock during resume, which causes sched_clock() just
after suspend to return the same value as sched_clock() just before
suspend.

In addition, during the window where epoch_ns has been updated before
suspend, but epoch_cyc has not been updated after suspend, it is unknown
whether the clock has reset or not, and sched_clock() could return a
bogus value. Add a suspended flag, and return the pre-suspend epoch_ns
value during this period.

This will have a side effect of causing SoCs that have clocks that
continue to count in suspend to appear to stop counting, reporting the
same sched_clock() value before and after suspend.

Signed-off-by: Colin Cross <ccr...@android.com>
---
arch/arm/kernel/sched_clock.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/arm/kernel/sched_clock.c b/arch/arm/kernel/sched_clock.c
index 27d186a..46c7d32 100644
--- a/arch/arm/kernel/sched_clock.c
+++ b/arch/arm/kernel/sched_clock.c
@@ -21,6 +21,7 @@ struct clock_data {
u32 epoch_cyc_copy;
u32 mult;
u32 shift;
+ bool suspended;
};

static void sched_clock_poll(unsigned long wrap_ticks);
@@ -49,6 +50,9 @@ static unsigned long long cyc_to_sched_clock(u32 cyc, u32 mask)
u64 epoch_ns;
u32 epoch_cyc;

+ if (cd.suspended)
+ return cd.epoch_ns;
+
/*
* Load the epoch_cyc and epoch_ns atomically. We do this by
* ensuring that we always write epoch_cyc, epoch_ns and
@@ -169,11 +173,20 @@ void __init sched_clock_postinit(void)
static int sched_clock_suspend(void)
{
sched_clock_poll(sched_clock_timer.data);
+ cd.suspended = true;
return 0;
}

+static void sched_clock_resume(void)
+{
+ cd.epoch_cyc = read_sched_clock();
+ cd.epoch_cyc_copy = cd.epoch_cyc;
+ cd.suspended = false;
+}
+
static struct syscore_ops sched_clock_ops = {
.suspend = sched_clock_suspend,
+ .resume = sched_clock_resume,
};

static int __init sched_clock_syscore_init(void)
--
1.7.7.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Linus Walleij

unread,
Jul 23, 2012, 3:00:02 PM7/23/12
to
On Wed, Jul 18, 2012 at 1:27 AM, Colin Cross <ccr...@android.com> wrote:

> This will have a side effect of causing SoCs that have clocks that
> continue to count in suspend to appear to stop counting, reporting the
> same sched_clock() value before and after suspend.

So for our platform (ux500) that has a sched clock that *does*
continue to run during suspend,
drivers/clocksource/clksrc-dbx500-prcmu.c
how do we opt out of this behaviour?

Since sched_clock is used for the debug prints, if we have a
crash immediately after resume() it will appear to be at resume
time in the log which kinda sucks. :-(

Isn't the proper way to do this either:

- Assign suspend/resume hooks to the sched_clock code in the
platform and let the code that reads the hardware clock deal with
this

Or

- If it absolutely needs to be in the core code, also have a bool
field indicating whether the clock is going to die during suspend
and add new registration functions for setting that sched_clock
type, e.g. setup_sched_clock_nonsuspendable()

Yours,
Linus Walleij

Colin Cross

unread,
Jul 23, 2012, 3:30:01 PM7/23/12
to
On Mon, Jul 23, 2012 at 11:55 AM, Linus Walleij
<linus....@linaro.org> wrote:
> On Wed, Jul 18, 2012 at 1:27 AM, Colin Cross <ccr...@android.com> wrote:
>
>> This will have a side effect of causing SoCs that have clocks that
>> continue to count in suspend to appear to stop counting, reporting the
>> same sched_clock() value before and after suspend.
>
> So for our platform (ux500) that has a sched clock that *does*
> continue to run during suspend,
> drivers/clocksource/clksrc-dbx500-prcmu.c
> how do we opt out of this behaviour?

Does the clock you use for sched_clock continue to run in all suspend
modes? All the SoC's I've used only have a 32kHz clock in the deepest
suspend mode, which is not ideal for sched_clock. For example, on
Tegra2 the faster 1MHz clock used for sched_clock resets in the
deepest suspend state (LP0) but not the shallowest suspend state
(LP2), and which suspend state the chip hits depends on which hardware
is active. Opting out of this patch would cause Tegra's clock to
sometimes run in suspend, and sometimes not, which seems worse for
debugging than consistently not running in suspend. I'd be surprised
if a similar situation didn't apply to your platform.

> Since sched_clock is used for the debug prints, if we have a
> crash immediately after resume() it will appear to be at resume
> time in the log which kinda sucks. :-(

Most resume implementations I've seen print a resume message very
early, making it fairly easy to distinguish between suspend and resume
crashes, but I can see why an accurate timestamp would be helpful.

> Isn't the proper way to do this either:
>
> - Assign suspend/resume hooks to the sched_clock code in the
> platform and let the code that reads the hardware clock deal with
> this

That's effectively what I've done on 3 different platforms, which is
why I thought it might be better to put it in the core code.

> Or
>
> - If it absolutely needs to be in the core code, also have a bool
> field indicating whether the clock is going to die during suspend
> and add new registration functions for setting that sched_clock
> type, e.g. setup_sched_clock_nonsuspendable()

Sounds reasonable if some platforms need the extra complexity.

Linus Walleij

unread,
Jul 23, 2012, 8:20:01 PM7/23/12
to
On Mon, Jul 23, 2012 at 9:27 PM, Colin Cross <ccr...@android.com> wrote:
> On Mon, Jul 23, 2012 at 11:55 AM, Linus Walleij

> Does the clock you use for sched_clock continue to run in all suspend
> modes? All the SoC's I've used only have a 32kHz clock in the deepest
> suspend mode,

Yes, and yes it is 32kHz.

> which is not ideal for sched_clock.

Not that I know why scheduling with 32kHz is so bad compared to the
default system scheduling granularity which is HZ if you don't have
sched_clock() implemented.

Since this seems to be such an important point, what makes you
want MHz:es for scheduling granularity? To me the biggest impact
is actually the granularity of the timestamps in the printk:s.

(It's not that I doubt your needs, more curiosity.)

> For example, on
> Tegra2 the faster 1MHz clock used for sched_clock resets in the
> deepest suspend state (LP0) but not the shallowest suspend state
> (LP2), and which suspend state the chip hits depends on which hardware
> is active. Opting out of this patch would cause Tegra's clock to
> sometimes run in suspend, and sometimes not, which seems worse for
> debugging than consistently not running in suspend. I'd be surprised
> if a similar situation didn't apply to your platform.

Well being able to switch between different sched_clock() providers
may be the ideal...

>> - If it absolutely needs to be in the core code, also have a bool
>> field indicating whether the clock is going to die during suspend
>> and add new registration functions for setting that sched_clock
>> type, e.g. setup_sched_clock_nonsuspendable()
>
> Sounds reasonable if some platforms need the extra complexity.

OK agreed.

A connecting theme is that of being avle to flag clock sources as
sched_clock providers. If all clocksources were tagged with
rating, and only clocksources were used for sched_clock(), the
kernel could select the highest-rated clock under all circumstances.

But that's quite intrusive, more of an idea. :-P

Yours,
Linus Walleij

Colin Cross

unread,
Jul 23, 2012, 8:30:02 PM7/23/12
to
On Mon, Jul 23, 2012 at 5:14 PM, Linus Walleij <linus....@linaro.org> wrote:
> On Mon, Jul 23, 2012 at 9:27 PM, Colin Cross <ccr...@android.com> wrote:
>> On Mon, Jul 23, 2012 at 11:55 AM, Linus Walleij
>
>> Does the clock you use for sched_clock continue to run in all suspend
>> modes? All the SoC's I've used only have a 32kHz clock in the deepest
>> suspend mode,
>
> Yes, and yes it is 32kHz.
>
>> which is not ideal for sched_clock.
>
> Not that I know why scheduling with 32kHz is so bad compared to the
> default system scheduling granularity which is HZ if you don't have
> sched_clock() implemented.
>
> Since this seems to be such an important point, what makes you
> want MHz:es for scheduling granularity? To me the biggest impact
> is actually the granularity of the timestamps in the printk:s.
>
> (It's not that I doubt your needs, more curiosity.)

There's a comment somewhere about higher resolution sched_clock
providing fairer scheduling. With 32 kHz sched_clock, every runtime
measured by the scheduler will be wrong by up to 31.25 us. Most
systems have a faster clock, and if it's available it seems silly not
to use it.

It's also used for tracing, where 31.25 us resolution is a little low
for function tracing or function graph tracing.

>> For example, on
>> Tegra2 the faster 1MHz clock used for sched_clock resets in the
>> deepest suspend state (LP0) but not the shallowest suspend state
>> (LP2), and which suspend state the chip hits depends on which hardware
>> is active. Opting out of this patch would cause Tegra's clock to
>> sometimes run in suspend, and sometimes not, which seems worse for
>> debugging than consistently not running in suspend. I'd be surprised
>> if a similar situation didn't apply to your platform.
>
> Well being able to switch between different sched_clock() providers
> may be the ideal...
>
>>> - If it absolutely needs to be in the core code, also have a bool
>>> field indicating whether the clock is going to die during suspend
>>> and add new registration functions for setting that sched_clock
>>> type, e.g. setup_sched_clock_nonsuspendable()
>>
>> Sounds reasonable if some platforms need the extra complexity.
>
> OK agreed.
>
> A connecting theme is that of being avle to flag clock sources as
> sched_clock providers. If all clocksources were tagged with
> rating, and only clocksources were used for sched_clock(), the
> kernel could select the highest-rated clock under all circumstances.
>
> But that's quite intrusive, more of an idea. :-P

sched_clock is supposed to be very low overhead compared to ktime_get,
and has some strict requirements if CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
is not set (see kernel/sched/clock.c), but it might be possible.

Barry Song

unread,
Jul 24, 2012, 2:50:02 AM7/24/12
to
2012/7/18 Colin Cross <ccr...@android.com>:
> Many clocks that are used to provide sched_clock will reset during
> suspend. If read_sched_clock returns 0 after suspend, sched_clock will
> appear to jump forward. This patch resets cd.epoch_cyc to the current
> value of read_sched_clock during resume, which causes sched_clock() just
> after suspend to return the same value as sched_clock() just before
> suspend.
>
> In addition, during the window where epoch_ns has been updated before
> suspend, but epoch_cyc has not been updated after suspend, it is unknown
> whether the clock has reset or not, and sched_clock() could return a
> bogus value. Add a suspended flag, and return the pre-suspend epoch_ns
> value during this period.

Acked-by: Barry Song <21c...@gmail.com>

this patch should also fix the issue that:
1. launch some rt threads, rt threads sleep before suspend
2. repeat to suspend/resume
3. after resuming, waking up rt threads

repeat 1-3 again and again, sometimes all rt threads will hang after
resuming due to wrong sched_clock will make sched_rt think rt_time is
much more than rt_runtime (default 950ms in 1s). then rt threads will
lost cpu timeslot to run since the 95% threshold is there.
-barry

Bedia, Vaibhav

unread,
Jul 24, 2012, 5:20:02 AM7/24/12
to
On Tue, Jul 24, 2012 at 05:58:32, Colin Cross wrote:
> On Mon, Jul 23, 2012 at 5:14 PM, Linus Walleij <linus....@linaro.org> wrote:
> > On Mon, Jul 23, 2012 at 9:27 PM, Colin Cross <ccr...@android.com> wrote:
> >> On Mon, Jul 23, 2012 at 11:55 AM, Linus Walleij
> >
> >> Does the clock you use for sched_clock continue to run in all suspend
> >> modes? All the SoC's I've used only have a 32kHz clock in the deepest
> >> suspend mode,
> >
> > Yes, and yes it is 32kHz.
> >

On the platform that I am working on (AM335x) the sched_clock continues to
run and yes even on this it runs @ 32KHz.

> >> which is not ideal for sched_clock.
> >
> > Not that I know why scheduling with 32kHz is so bad compared to the
> > default system scheduling granularity which is HZ if you don't have
> > sched_clock() implemented.
> >
> > Since this seems to be such an important point, what makes you
> > want MHz:es for scheduling granularity? To me the biggest impact
> > is actually the granularity of the timestamps in the printk:s.
> >

I personally find the timestamps in the printks very late in the suspend
or very early in the resume extremely helpful for debugging.
The printk timestamps also help figure out which portions of the platform
suspend code are taking long and can be optimized.

> > (It's not that I doubt your needs, more curiosity.)
>
> There's a comment somewhere about higher resolution sched_clock
> providing fairer scheduling. With 32 kHz sched_clock, every runtime
> measured by the scheduler will be wrong by up to 31.25 us. Most
> systems have a faster clock, and if it's available it seems silly not
> to use it.
>
> It's also used for tracing, where 31.25 us resolution is a little low
> for function tracing or function graph tracing.
>
> >> For example, on
> >> Tegra2 the faster 1MHz clock used for sched_clock resets in the
> >> deepest suspend state (LP0) but not the shallowest suspend state
> >> (LP2), and which suspend state the chip hits depends on which hardware
> >> is active. Opting out of this patch would cause Tegra's clock to
> >> sometimes run in suspend, and sometimes not, which seems worse for
> >> debugging than consistently not running in suspend. I'd be surprised
> >> if a similar situation didn't apply to your platform.
> >

Is there any other slower on Tegra2 which doesn't reset and hence can be used
for sched_clock in LP0 as well as LP2?

There's a trade-off that the end-user might choose to make -
either use a faster clock for sched_clock which stops in suspend
or use a slower, in most cases 32KHz, clock which doesn't.
IMO this aspect should be configurable.

> > Well being able to switch between different sched_clock() providers
> > may be the ideal...
> >

Hmm... not exactly. More below.

> >>> - If it absolutely needs to be in the core code, also have a bool
> >>> field indicating whether the clock is going to die during suspend
> >>> and add new registration functions for setting that sched_clock
> >>> type, e.g. setup_sched_clock_nonsuspendable()
> >>
> >> Sounds reasonable if some platforms need the extra complexity.
> >

I am not sure but I am guessing even on OMAPx the timer used as sched_clock
keeps running.

> > OK agreed.
> >
> > A connecting theme is that of being avle to flag clock sources as
> > sched_clock providers. If all clocksources were tagged with
> > rating, and only clocksources were used for sched_clock(), the
> > kernel could select the highest-rated clock under all circumstances.
> >
> > But that's quite intrusive, more of an idea. :-P
>

Too intrusive I guess ;)

There were some discussions on this in the context of AM335x [1] [2].
Right now only sched_clock can be registered and I guess this restriction
is not going to go away any time soon.

Regards,
Vaibhav B.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-January/080862.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-April/092890.html


> sched_clock is supposed to be very low overhead compared to ktime_get,
> and has some strict requirements if CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
> is not set (see kernel/sched/clock.c), but it might be possible.
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-ar...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Colin Cross

unread,
Jul 24, 2012, 10:50:02 PM7/24/12
to
Many clocks that are used to provide sched_clock will reset during
suspend. If read_sched_clock returns 0 after suspend, sched_clock will
appear to jump forward. This patch resets cd.epoch_cyc to the current
value of read_sched_clock during resume, which causes sched_clock() just
after suspend to return the same value as sched_clock() just before
suspend.

In addition, during the window where epoch_ns has been updated before
suspend, but epoch_cyc has not been updated after suspend, it is unknown
whether the clock has reset or not, and sched_clock() could return a
bogus value. Add a suspended flag, and return the pre-suspend epoch_ns
value during this period.

The new behavior is triggered by calling setup_sched_clock_needs_suspend
instead of setup_sched_clock.

Signed-off-by: Colin Cross <ccr...@android.com>
---
arch/arm/include/asm/sched_clock.h | 2 ++
arch/arm/kernel/sched_clock.c | 24 ++++++++++++++++++++++++
2 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/sched_clock.h b/arch/arm/include/asm/sched_clock.h
index e3f7572..05b8e82 100644
--- a/arch/arm/include/asm/sched_clock.h
+++ b/arch/arm/include/asm/sched_clock.h
@@ -10,5 +10,7 @@

extern void sched_clock_postinit(void);
extern void setup_sched_clock(u32 (*read)(void), int bits, unsigned long rate);
+extern void setup_sched_clock_needs_suspend(u32 (*read)(void), int bits,
+ unsigned long rate);

#endif
diff --git a/arch/arm/kernel/sched_clock.c b/arch/arm/kernel/sched_clock.c
index 27d186a..f451539 100644
--- a/arch/arm/kernel/sched_clock.c
+++ b/arch/arm/kernel/sched_clock.c
@@ -21,6 +21,8 @@ struct clock_data {
u32 epoch_cyc_copy;
u32 mult;
u32 shift;
+ bool suspended;
+ bool needs_suspend;
};

static void sched_clock_poll(unsigned long wrap_ticks);
@@ -49,6 +51,9 @@ static unsigned long long cyc_to_sched_clock(u32 cyc, u32 mask)
u64 epoch_ns;
u32 epoch_cyc;

+ if (cd.suspended)
+ return cd.epoch_ns;
+
/*
* Load the epoch_cyc and epoch_ns atomically. We do this by
* ensuring that we always write epoch_cyc, epoch_ns and
@@ -98,6 +103,13 @@ static void sched_clock_poll(unsigned long wrap_ticks)
update_sched_clock();
}

+void __init setup_sched_clock_needs_suspend(u32 (*read)(void), int bits,
+ unsigned long rate)
+{
+ setup_sched_clock(read, bits, rate);
+ cd.needs_suspend = true;
+}
+
void __init setup_sched_clock(u32 (*read)(void), int bits, unsigned long rate)
{
unsigned long r, w;
@@ -169,11 +181,23 @@ void __init sched_clock_postinit(void)
static int sched_clock_suspend(void)
{
sched_clock_poll(sched_clock_timer.data);
+ if (cd.needs_suspend)
+ cd.suspended = true;
return 0;
}

+static void sched_clock_resume(void)
+{
+ if (cd.needs_suspend) {
+ cd.epoch_cyc = read_sched_clock();
+ cd.epoch_cyc_copy = cd.epoch_cyc;
+ cd.suspended = false;
+ }
+}
+
static struct syscore_ops sched_clock_ops = {
.suspend = sched_clock_suspend,
+ .resume = sched_clock_resume,
};

static int __init sched_clock_syscore_init(void)
--
1.7.7.3

Linus Walleij

unread,
Jul 27, 2012, 6:30:01 PM7/27/12
to
On Tue, Jul 24, 2012 at 11:16 AM, Bedia, Vaibhav <vaibha...@ti.com> wrote:

>> > A connecting theme is that of being avle to flag clock sources as
>> > sched_clock providers. If all clocksources were tagged with
>> > rating, and only clocksources were used for sched_clock(), the
>> > kernel could select the highest-rated clock under all circumstances.
>> >
>> > But that's quite intrusive, more of an idea. :-P
>>
>
> Too intrusive I guess ;)
>
> There were some discussions on this in the context of AM335x [1] [2].
> Right now only sched_clock can be registered and I guess this restriction
> is not going to go away any time soon.

Why do you think that? The restriction to only assign sched_clock() at
compile-time was recently removed so its now runtime assigned.

So yes, a clock source that can die and change frequency is no good
as primary system time, but the abstraction could still be used for
those that do, just add another flag NOT_CONTINUOUS or so, and
make sure the system does not select this for primary system
clock source.

Then modelling sched_clock() on clock sources makes all more
sense: just select the best one. For primary system clock source,
do not select one which is non-continous.

Yours,
Linus Walleij

Russell King - ARM Linux

unread,
Jul 27, 2012, 6:50:02 PM7/27/12
to
On Sat, Jul 28, 2012 at 12:23:05AM +0200, Linus Walleij wrote:
> On Tue, Jul 24, 2012 at 11:16 AM, Bedia, Vaibhav <vaibha...@ti.com> wrote:
>
> >> > A connecting theme is that of being avle to flag clock sources as
> >> > sched_clock providers. If all clocksources were tagged with
> >> > rating, and only clocksources were used for sched_clock(), the
> >> > kernel could select the highest-rated clock under all circumstances.
> >> >
> >> > But that's quite intrusive, more of an idea. :-P
> >>
> >
> > Too intrusive I guess ;)
> >
> > There were some discussions on this in the context of AM335x [1] [2].
> > Right now only sched_clock can be registered and I guess this restriction
> > is not going to go away any time soon.
>
> Why do you think that? The restriction to only assign sched_clock() at
> compile-time was recently removed so its now runtime assigned.
>
> So yes, a clock source that can die and change frequency is no good
> as primary system time, but the abstraction could still be used for
> those that do, just add another flag NOT_CONTINUOUS or so, and
> make sure the system does not select this for primary system
> clock source.
>
> Then modelling sched_clock() on clock sources makes all more
> sense: just select the best one. For primary system clock source,
> do not select one which is non-continous.

Except for the overhead. sched_clock() is supposed to be as _fast_ as
it can possibly be, because it has a direct impact on the performance
of the system.

Linus Walleij

unread,
Jul 27, 2012, 7:40:01 PM7/27/12
to
On Wed, Jul 25, 2012 at 4:49 AM, Colin Cross <ccr...@android.com> wrote:

> Many clocks that are used to provide sched_clock will reset during
> suspend. If read_sched_clock returns 0 after suspend, sched_clock will
> appear to jump forward. This patch resets cd.epoch_cyc to the current
> value of read_sched_clock during resume, which causes sched_clock() just
> after suspend to return the same value as sched_clock() just before
> suspend.
>
> In addition, during the window where epoch_ns has been updated before
> suspend, but epoch_cyc has not been updated after suspend, it is unknown
> whether the clock has reset or not, and sched_clock() could return a
> bogus value. Add a suspended flag, and return the pre-suspend epoch_ns
> value during this period.
>
> The new behavior is triggered by calling setup_sched_clock_needs_suspend
> instead of setup_sched_clock.
>
> Signed-off-by: Colin Cross <ccr...@android.com>

Sweet!
Reviewed-by: Linus Walleij <linus....@linaro.org>

Yours,
Linus Walleij

Russell King - ARM Linux

unread,
Jul 27, 2012, 7:40:01 PM7/27/12
to
On Sat, Jul 28, 2012 at 01:32:50AM +0200, Linus Walleij wrote:
> On Wed, Jul 25, 2012 at 4:49 AM, Colin Cross <ccr...@android.com> wrote:
>
> > Many clocks that are used to provide sched_clock will reset during
> > suspend. If read_sched_clock returns 0 after suspend, sched_clock will
> > appear to jump forward. This patch resets cd.epoch_cyc to the current
> > value of read_sched_clock during resume, which causes sched_clock() just
> > after suspend to return the same value as sched_clock() just before
> > suspend.
> >
> > In addition, during the window where epoch_ns has been updated before
> > suspend, but epoch_cyc has not been updated after suspend, it is unknown
> > whether the clock has reset or not, and sched_clock() could return a
> > bogus value. Add a suspended flag, and return the pre-suspend epoch_ns
> > value during this period.
> >
> > The new behavior is triggered by calling setup_sched_clock_needs_suspend
> > instead of setup_sched_clock.
> >
> > Signed-off-by: Colin Cross <ccr...@android.com>
>
> Sweet!
> Reviewed-by: Linus Walleij <linus....@linaro.org>

Have any of you looked at the patch I originally posted for doing this?
It needs updating but shows the overall principle - which is to ensure
that the epoch is up to date before suspending.

It doesn't deal with resume, because different timers behave differently,
and there's no real way to deal with that properly. The important thing
that this patch does is to ensure sched_clock() doesn't ever go backwards.

arch/arm/kernel/sched_clock.c | 18 ++++++++++++++++++
1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/arch/arm/kernel/sched_clock.c b/arch/arm/kernel/sched_clock.c
index 9a46370..4be4019 100644
--- a/arch/arm/kernel/sched_clock.c
+++ b/arch/arm/kernel/sched_clock.c
@@ -10,6 +10,7 @@
#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <linux/sched.h>
+#include <linux/syscore_ops.h>
#include <linux/timer.h>

#include <asm/sched_clock.h>
@@ -72,3 +73,20 @@ void __init sched_clock_postinit(void)
{
sched_clock_poll(sched_clock_timer.data);
}
+
+static int sched_clock_suspend(void)
+{
+ sched_clock_poll(sched_clock_timer.data);
+ return 0;
+}
+
+static struct syscore_ops sched_clock_ops = {
+ .suspend = sched_clock_suspend,
+};
+
+static int __init sched_clock_syscore_init(void)
+{
+ register_syscore_ops(&sched_clock_ops);
+ return 0;
+}
+device_initcall(sched_clock_syscore_init);

Colin Cross

unread,
Jul 27, 2012, 9:20:02 PM7/27/12
to
That patch was merged in 3.4, and my patch is on top of it. Your
patch updates epoch_cyc and epoch_ns in suspend, but if the first call
to cyc_to_sched_clock after resume gets cyc = 0, cyc - epoch_cyc can
be negative, although it will be cast back to a large positive number.

With my patch, epoch_cyc is updated in resume to whatever
read_sched_clock() returns, and epoch_ns is still set to the suspend
value, so it appears that sched_clock has not changed between
sched_clock_suspend and sched_clock_resume. It will work with any
timer behavior (reset to 0, reset to random value, or continuing
counting). The old setup_sched_clock function maintains the old
behavior to appease those who like their 32kHz sched clock to continue
ticking in suspend, although I think it is more correct for all sched
clocks to be faster than 32 kHz (when possible) and stop in suspend.

Colin Cross

unread,
Jul 27, 2012, 11:40:01 PM7/27/12
to
I think the existing code can cause sched_clock to go backwards if the
register read by the read function resets to 0 in suspend:

In sched_clock_suspend epoch_cyc is updated to 0x00002000.

The first sched_clock call after resume gets cyc = 0, returning
epoch_ns + cyc_to_ns(0xffffe000)

Later, sched_clock gets cyc = 0x5000, returning epoch_ns +
cyc_to_ns(0x3000), which will be much smaller than the previous
sched_clock value.

Bedia, Vaibhav

unread,
Jul 30, 2012, 8:40:01 AM7/30/12
to
On Sat, Jul 28, 2012 at 03:53:05, Linus Walleij wrote:
> On Tue, Jul 24, 2012 at 11:16 AM, Bedia, Vaibhav <vaibha...@ti.com> wrote:
>
> >> > A connecting theme is that of being avle to flag clock sources as
> >> > sched_clock providers. If all clocksources were tagged with
> >> > rating, and only clocksources were used for sched_clock(), the
> >> > kernel could select the highest-rated clock under all circumstances.
> >> >
> >> > But that's quite intrusive, more of an idea. :-P
> >>
> >
> > Too intrusive I guess ;)
> >
> > There were some discussions on this in the context of AM335x [1] [2].
> > Right now only sched_clock can be registered and I guess this restriction
> > is not going to go away any time soon.
>
> Why do you think that? The restriction to only assign sched_clock() at
> compile-time was recently removed so its now runtime assigned.
>

Yes it's runtime assigned but multiple calls to setup_sched_clock()
will trigger a WARN_ON() from arch/arm/kernel/sched_clock.c
unless I missing something basic here.

> So yes, a clock source that can die and change frequency is no good
> as primary system time, but the abstraction could still be used for
> those that do, just add another flag NOT_CONTINUOUS or so, and
> make sure the system does not select this for primary system
> clock source.
>
> Then modelling sched_clock() on clock sources makes all more
> sense: just select the best one. For primary system clock source,
> do not select one which is non-continous.
>

I think what you are suggesting above is similar to what AM335x was trying
to do but Russell pointed out issues with switching of timing sources and
that's what I was trying to highlight.

Regards,
Vaibhav B.

Colin Cross

unread,
Aug 2, 2012, 9:30:02 PM8/2/12
to
Russell, any update on this? Should sched_clock.c handle read
functions that go backwards in suspend, or should I modify the read
function to track an offset in suspend and always return a
monotonically increasing value across suspend?

Linus Walleij

unread,
Aug 4, 2012, 8:20:02 PM8/4/12
to
Colin, I suggest you put this into Russell's patch tracker so
he can keep track of it from there.

Yours,
Linus Walleij

Kevin Hilman

unread,
Oct 19, 2012, 8:00:02 PM10/19/12
to
Barry Song <21c...@gmail.com> writes:

> 2012/7/18 Colin Cross <ccr...@android.com>:
>> Many clocks that are used to provide sched_clock will reset during
>> suspend. If read_sched_clock returns 0 after suspend, sched_clock will
>> appear to jump forward. This patch resets cd.epoch_cyc to the current
>> value of read_sched_clock during resume, which causes sched_clock() just
>> after suspend to return the same value as sched_clock() just before
>> suspend.
>>
>> In addition, during the window where epoch_ns has been updated before
>> suspend, but epoch_cyc has not been updated after suspend, it is unknown
>> whether the clock has reset or not, and sched_clock() could return a
>> bogus value. Add a suspended flag, and return the pre-suspend epoch_ns
>> value during this period.
>
> Acked-by: Barry Song <21c...@gmail.com>
>
> this patch should also fix the issue that:
> 1. launch some rt threads, rt threads sleep before suspend
> 2. repeat to suspend/resume
> 3. after resuming, waking up rt threads
>
> repeat 1-3 again and again, sometimes all rt threads will hang after
> resuming due to wrong sched_clock will make sched_rt think rt_time is
> much more than rt_runtime (default 950ms in 1s). then rt threads will
> lost cpu timeslot to run since the 95% threshold is there.

Re-visiting this in light of a related problem.

I've run into a similar issue where IRQ threads are prevented from
running during resume becase the RT throttling kicks because RT
runtime is accumulated during suspend. Using the 'needs_suspend'
version fixes this problem too.

However, because of the RT throttling issue, it seems like *all*
platforms should be using the 'needs_suspend' version always. But, as
already pointed out, that makes the timed printk output during
suspend/resume rather unhelpful.

Having to choose between useful printk times during suspend/resume and
functioning IRQ threads during suspend/resume isn't a choice I want to
make. I'd rather have both. Any ideas?

Kevin
0 new messages