suspend blockers & Android integration

Ingo Molnar

Jun 3, 2010, 3:40:01 PM

> [...] Not only has the source code been made available, but hundreds of
> engineering hours have been made trying to accommodate the demands of LKML
> --- and LKML has said no to suspend blockers/wakelocks.

I don't think you are being fair here, at all.

Firstly, the suspend-blockers feature is not being rejected (fixing and
extending suspend is a worthwhile goal), it's just that various different
schemes have been proposed by the people who'll eventually have to maintain
that code down the line.

Those reasons seem justified and they are based on practices that have solved
problems similar to the ones Android is trying to solve.

Sadly the response from the Android team has been 100% uncompromising: either
suspend blockers or nothing.

The thing is, if the insertion of 'hundreds of man hours' into discussing a
feature was technical grounds for upstream inclusion then we'd today have a
Linux kernel with:

- STREAMS
- a kABI
- modularized ipv4
- perfmon
- two dozen CPU schedulers
- zero-copy stupidly pushed to all the file APIs

... and IMO we'd be much worse off technically.

Let's realize it, Linux is an engineering effort that has literally cost about
ten thousand man years. That's about _85 million_ man hours. It takes effort
to keep that kind of work valuable!

Also, why did the Android team start its contributions with such a difficult
and controversial kernel feature?

There is absolutely _zero_ technical reason why the Android team should
present this as an all-or-nothing effort. Why not merge hw drivers first
(with suspend blockers commented or stubbed out), to reduce the fork distance?

Really, i myself have controversial kernel features pending all the time. They
don't go upstream for a few kernel releases - over a year sometimes - and
sometimes they never go upstream.

But the fact that some feature of mine is pending doesn't give me the right to
go away sulking, it doesn't mean i will block the whole flow of patches in
retaliation (as you seem to suggest Google will now have the right to do) - i
simply try to work it out.

Let's be reasonable and work it all out, ok?

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Brian Swetland

Jun 3, 2010, 4:00:03 PM

On Thu, Jun 3, 2010 at 12:30 PM, Ingo Molnar <mi...@elte.hu> wrote:
>
> Sadly the response from the Android team has been 100% uncompromising: either
> suspend blockers or nothing.

Well, we're willing to accept something that gives us the same
functionality (thus rewriting the api several times to meet various
objections, current discussions around
constraint-based-implementations / pm-qos, etc). We believe we're
solving a real problem here and have not seen a counter-proposal that
accomplishes the same.

Suggestions such as "just yell at developers for writing bad apps" or
"it's the user's fault if they install a lousy app" or "make your app
marketplace more restrictive" are not helpful. The technical
discussions around alternatives are more so (though I do feel like
we're going in circles in places), which again is why we're still here
talking about this (that and Arve is about a billion times more
patient and persistent than I am).

We're not interested in massively rearchitecting our userspace to
accomplish this (and the "rewrite your userspace!" proposals I've seen
have had race conditions and/or significantly more complexity than the
wakelock model).

...

> Also, why did the Android team start its contributions with such a difficult
> and controversial kernel feature?

We started here because it's possibly the only api level change we
have -- almost everything else is driver or subarch type work or
controversial but entirely self-contained (like the binder, which I
would be shocked to see ever hit mainline). Assertions have been made
that because the "android kernel" (not a term I like -- linux is
linux, we have some assorted patches on top) has this feature it
represents a difficulty for silicon vendors trying to support both
Android projects and OEMs and mainline:

See: http://www.kroah.com/log/linux/android-kernel-problems.html and
various other rants about the evil terrible android forks, etc.

So, we figure, let's sort out the hard problem first and then move on
with our lives.

> There is absolutely _zero_ technical reason why the Android team should
> present this as an all-or-nothing effort. Why not merge hw drivers first
> (with suspend blockers commented or stubbed out), to reduce the fork distance?

If that's the case then there is no problem and people could stop
yelling at us and just submit their drivers. Awesome.

I can't speak for all the nameless silicon vendors Greg represents,
that we apparently are preventing from doing this (how? I don't
know!), etc, but for my team maintaining multiple versions of drivers
is a headache, we'd rather square away the wakelock debate first and
figure something out there, as it just seems like a more logical
approach. Maybe we're crazy.

> Really, i myself have controversial kernel features pending all the time. They
> dont go upstream for a few kernel releases - over a year sometimes - and
> sometimes they never go upstream.
>
> But the fact that some feature of mine is pending doesnt give me the right to
> go away sulking, it doesnt mean i will block the whole flow of patches in
> retaliation (as you seem to suggest Google will now have the right to do) - i
> simply try to work it out.

We're not blocking anything. Hell, if people want drivers we wrote
upstream and we're not fast enough for 'em, we publish everything via
android.git.kernel.org, pretty aggressively rebase to follow latest
mainline, and release everything under GPLv2, ready-to-go. We have to
ship though, and as long as the version we maintain has the features
we need to ship and the mainline version doesn't, we're going to ship
based on our version, but this really shouldn't be surprising to
anyone.

> Lets be reasonable and work it all out, ok?

We're trying.

I do feel like we're suffering from lack of a clear "how do we move
forward" path, and in particular from an environment where every time
we do a bunch of work to address one set of concerns an entirely new
set of people pop up with different concerns (sometimes contradicting
the last round of changes we were asked to make, etc, etc).

Brian

Ingo Molnar

Jun 3, 2010, 7:30:02 PM

* Ingo Molnar <mi...@elte.hu> wrote:

> * ty...@mit.edu <ty...@mit.edu> wrote:
>
> > [...] Not only has the source code been made available, but hundreds of
> > engineering hours have been made trying to accommodate the demands of LKML
> > --- and LKML has said no to suspend blockers/wakelocks.
>
> I don't think you are being fair here, at all.
>
> Firstly, the suspend-blockers feature is not being rejected (fixing and
> extending suspend is a worthwhile goal), it's just that various different
> schemes have been proposed by the people who'll eventually have to maintain
> that code down the line.

Btw., i'd like to summarize the scheduler based suspend scheme proposed by
Thomas Gleixner, Peter Zijlstra and myself. I found no good summary of it in
the big thread, and there are also new elements of the proposal:

- Create a 'deep idle' mode that suspends. This, if all constraints
are met, is triggered by the scheduler automatically: just like the other
idle modes are triggered currently. This approach fixes the wakeup
races because an incoming wakeup event will set need_resched() and
abort the suspend.

( This mode can even use the existing suspend code to bring stuff down,
therefore it also solves the pending timer problem and works even on
PC style x86. )

- Introduce a 'minimum wakeup latency' task attribute (task->latency),
settable via a scheduler syscall. This is an ABI that influences the kernel
how idle the system can go. (i.e. the equivalent of suspend blockers, just
not binary and not system-wide.)

- Solve crappy app confinement via the scheduler:

A first proposal was to use the existing cgroup mechanism, but we found
a different and probably more elegant solution:

We can slightly extend the scheduler and introduce another per task 'minimum
latency other tasks are allowed to run' scheduling attribute
(task->exclude_latency) - set via a scheduler syscall as well. (only
settable by privileged tasks - such as the screensaver.)

This allows a task to 'exclude' other tasks that don't have low-latency
requirements. Crappy apps would have a large latency value, so they'd
be idled out when a privileged task sets the exclusion level low enough.

In the case of Android, this would for example be used by the screensaver
to introduce different levels of runnability/idling.

[ Note that this scheme would also be useful in a completely different
scenario, for real-time tasks as well: it would allow extreme-RT tasks to
quiesce all lower prio tasks in a controlled manner. (even if the RT
task is sleeping) ]

- Controlled auto-suspend: drivers (such as input) could on wakeup
automatically set the 'minimum wakeup latency' value of wakee tasks to a
lower value. This automatically prevents another auto-suspend in the near
future: up to the point the wakee task increases its latency (via the
scheduler syscall) again and allows suspend again.

This means there will be no surprise suspends for a task that may take a
bit longer than usual to finish its work. [ Detail: this would only be done
for tasks that have a non-default (non-infinity) task->latency value - to
prevent the input driver from lowering latency values (and preventing
future suspends) just because some unaware apps are running and using input
drivers. ]
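
To make the two latency attributes above concrete, here is a toy user-space model in Python. It is not kernel code and not part of the proposal: the idle-state table, the exit-latency numbers and both function names are invented for illustration. The only things taken from the text are the ideas that each task declares a minimum wakeup latency, that the system may only go as deep as the strictest task allows, and that a privileged exclusion level idles out tasks with larger latency values.

```python
import math

# Hypothetical idle states as (name, exit latency in microseconds).
# "suspend" stands in for the proposed 'deep idle' mode; numbers are made up.
IDLE_STATES = [("C1", 10), ("C3", 200), ("suspend", 500_000)]

INFINITY = math.inf  # default task latency: the task does not care


def deepest_allowed_state(task_latencies):
    """Return the deepest idle state whose exit latency does not exceed
    the smallest 'minimum wakeup latency' declared by any task."""
    limit = min(task_latencies, default=INFINITY)
    allowed = [name for name, exit_us in IDLE_STATES if exit_us <= limit]
    # None means no idle state is shallow enough for the strictest task.
    return allowed[-1] if allowed else None


def runnable_tasks(tasks, exclude_latency):
    """Model of the exclusion level: tasks whose own latency value is larger
    than the level set by a privileged task are idled out."""
    return {name: lat for name, lat in tasks.items() if lat <= exclude_latency}
```

In this model a single task with a small latency value keeps the whole system out of the suspend state, which is the non-binary analogue of holding a suspend blocker.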

All in all, this scheme allows everything without exception that
suspend-blockers allows and supports all the important usecases:

- allows aggressive auto-idling

- has no wakeup races

- allows crappy-app confinement and other finegrained suspend control

- it should be pretty easy to adopt by Android as well, as it goes
along similar principles of kernel automatisms combined with
user-space controlled task and system attributes.

It's straightforward to adapt and it is also more generic, more clean and more
flexible than suspend-blockers.

Please mention any remaining technical issues that may still be
unaddressed.

Ingo Molnar

Jun 3, 2010, 7:50:02 PM

* Linus Torvalds <torv...@linux-foundation.org> wrote:

> On Fri, 4 Jun 2010, Ingo Molnar wrote:
> >
> > This allows a task to 'exclude' other tasks that don't have low-latency
> > requirements. Crappy apps would have a large latency value, so they'd
> > be idled out when a privileged task sets the exclusion level low enough.
>

> Quite frankly, this sounds fundamentally broken.
>
> Think deadlock. The high-latency task got a lock, and now you're excluding
> it because it scheduled away.

Mail was a bit too long already so i trimmed it at the wrong place :-/

What you say is absolutely true, hence this would be driven via sched_tick() +
TIF notifiers - i.e. only ever treat user-mode tasks as 'idle-able'. This can
be done with no overhead to the regular fastpaths.

The TIF notifier would be the one scheduling to idle - and would thus do it
only to user-mode tasks.

Linus Torvalds

Jun 3, 2010, 7:50:02 PM

On Fri, 4 Jun 2010, Ingo Molnar wrote:
>

> This allows a task to 'exclude' other tasks that don't have low-latency
> requirements. Crappy apps would have a large latency value, so they'd
> be idled out when a privileged task sets the exclusion level low enough.

Quite frankly, this sounds fundamentally broken.

Think deadlock. The high-latency task got a lock, and now you're excluding
it because it scheduled away.

So from my perspective, putting that kind of logic deep in the system
sounds like the _last_ thing we want to do.

I think it's much saner to have a very targeted suspend blocker that only
blocks the opportunistic suspends and has _zero_ interaction with the rest
of the system (certainly none at all with core code like the scheduler).

And if somebody then suspends the traditional way (by an actual suspend
event, not that opportunistic thing), then the suspend blocker does
nothing at all - because it simply doesn't even _exist_ at that level.
It's only about the opportunistic suspends.

(I'd further suggest that disk wait and running in kernel mode disable any
opportunistic suspend anyway - but that's not about suspend blockers as
much as it is about just the opportunistic suspend itself).
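
As a rough illustration of that separation, the following Python toy model keeps the blockers entirely on the opportunistic path. The class and method names are invented here and do not correspond to any kernel interface; the sketch only shows that an explicit suspend never consults the blockers.

```python
class OpportunisticSuspend:
    """Toy model: blockers gate only the opportunistic suspend path."""

    def __init__(self):
        self._blockers = set()  # names of currently held suspend blockers

    def block(self, name):
        self._blockers.add(name)

    def unblock(self, name):
        self._blockers.discard(name)

    def opportunistic_suspend_allowed(self):
        # The opportunistic path suspends only when nobody holds a blocker.
        return not self._blockers

    def explicit_suspend_allowed(self):
        # A traditional, explicitly requested suspend never looks at the
        # blockers: at this level they simply do not exist.
        return True
```

Nothing in the model touches scheduling state, which is the point being argued: the blocker is a small layer of its own.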

Linus

Ingo Molnar

Jun 3, 2010, 8:50:01 PM

* Ingo Molnar <mi...@elte.hu> wrote:

> - Create a 'deep idle' mode that suspends. This, if all constraints
> are met, is triggered by the scheduler automatically: just like the other
> idle modes are triggered currently. This approach fixes the wakeup
> races because an incoming wakeup event will set need_resched() and
> abort the suspend.
>
> ( This mode can even use the existing suspend code to bring stuff down,
> therefore it also solves the pending timer problem and works even on
> PC style x86. )

Note that this does not necessarily have to be implemented as 'execute suspend
from the idle task' code: scheduling from the idle task, while it can certainly
be made to work, is a somewhat recursive concept that we might want to avoid
for robustness reasons.

Instead, the 'deepest idle' (suspend) method could consist of a wakeup of a
kernel thread (or of any of the existing kernel threads such as the migration
thread) - which kernel thread then does a race-free suspend: it offlines all
but one CPU [on platforms that need that] and then initiates the suspend - but
aborts the attempt if there's any sign of wakeup activity.

Linus Torvalds

Jun 3, 2010, 10:30:02 PM

On Fri, 4 Jun 2010, Ingo Molnar wrote:
>
> What you say is absolutely true, hence this would be driven via sched_tick() +
> TIF notifiers - i.e. only ever treat user-mode tasks as 'idle-able'. This can
> be done with no overhead to the regular fastpaths.
>
> The TIF notifier would be the one scheduling to idle - and would thus do it
> only to user-mode tasks.

The thing is, unless there is some _really_ deep other reason to do
something like this, I still think it's total overdesign to push any
knowledge/choices like this into the scheduler. I'd rather keep things way
more independent, less tied to each other and to deep kernel subsystems.

IOW, my personal opinion is that something like a suspend (blocker or not)
decision simply shouldn't be important enough to be tied into the
scheduler. Especially not if it could just be its own layer.

That said, as far as I know, the Android people have mostly been looking
at the suspend angle from a single-core standpoint. And I'm not at all
convinced that they should hijack the existing "/sys/power/state" thing
which is what I think they do now.

And those two things go together. The /sys/power/state thing is a global
suspend - which I don't think is appropriate for an opportunistic thing in
the first place, especially for multi-core.

A well-designed opportunistic suspend should be a two-phase thing: an
opportunistic CPU hotunplug (shutting down cores one by one as the system
is idle), and not a "global" event in the first place. And only when
you've reached single-core state should you then say "do I suspend the
system too".
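
The two-phase idea can be sketched as a small decision function. This is a Python toy model with invented names and return values, not a description of any existing kernel path: cores are taken offline one at a time while the system stays idle, and the suspend question is only asked once a single core is left.

```python
def next_power_step(online_cpus, system_idle, blocker_held):
    """Pick the next step of a hypothetical two-phase opportunistic suspend."""
    if not system_idle:
        return "keep-running"      # there is work to do, change nothing
    if online_cpus > 1:
        return "offline-one-cpu"   # phase 1: shed cores one by one
    if blocker_held:
        return "stay-single-core"  # phase 2 is vetoed by a suspend blocker
    return "suspend"               # phase 2: single core, idle, no blocker
```

Because the blocker is only consulted in the single-core branch, the model never has to reason about what another CPU is about to do at the moment of suspend.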

So I've tried to look a bit at the patches, and my admittedly rough
comments so far are

- I really do prefer the "off to the side" approach that the current
google opportunistic suspend patches have. As mentioned, I don't think
this should be deep in the scheduler. Not at all.

- I do think there are possibly races and CPU idle issues there, but I
think they are mainly for the multi-core thing. And I think that's a
totally separate issue. Or it _should_ be.

- once you're single-core (whether because you never had more cores to
begin with, or because the "opportunistic CPU offlining" has taken down
the other cores), I think the suspend-blocker is fine as a concept, and
certainly shouldn't need any deep scheduler hooks.

so I'd like to see the opportunistic suspend thing think about CPU
offlining, and I'd like to see it disconnect from the existing
/sys/power/state. And I'd really not like to involve deep internal kernel
hooks in it.

But I'll also admit that maybe I'm not seeing some problems. I've frankly
tried to avoid the whole discussion until Andrew pulled me in yesterday.

Linus

Linus Torvalds

Jun 3, 2010, 10:40:01 PM

On Thu, 3 Jun 2010, Linus Torvalds wrote:
>
> so I'd like to see the opportunistic suspend thing think about CPU
> offlining

Side note: one reason for me being somewhat interested in the CPU
offlining is that I think the Android kind of opportunistic suspend is
_not_ likely something I'd like to see on a desktop. But an
"opportunistic CPU offliner"? That might _well_ be useful even outside of
any other suspend activity.

If the system is idle (or almost idle) for long times, I would heartily
recommend actively shutting down unused cores. Some CPU's are hopefully
smart enough to not even need that kind of software management, but I
suspect even the really smart ones might be able to take advantage of the
kernel saying: "I'm shutting you down, you don't have to worry about
latency AT ALL, because I'm keeping another CPU active to do any real
work".

I'd also be interested to see if it could even improve single-thread
performance if we end up doing the whole SMP->UP "lock" prefix rewriting
when the system is idle enough that we'd be better off running just a
single core. I dunno - just throwing that out there.

Anyway, the only reason I think this is related is literally because I
think that if we know there is only a single CPU active, I think the
actual "real" opportunistic suspend is easier. Suddenly you don't have to
worry about what happens on other run-queues etc, and whether another CPU
is just about to create a suspend block etc.

So I think they tie together, although it's mostly tangential. And as
mentioned, I think an opportunistic CPU suspend part is more relevant
outside of Android, and thus perhaps more widely interesting.

Arjan van de Ven

Jun 3, 2010, 11:50:01 PM

On Thu, 3 Jun 2010 19:26:50 -0700 (PDT)
Linus Torvalds <torv...@linux-foundation.org> wrote:

>
> If the system is idle (or almost idle) for long times, I would
> heartily recommend actively shutting down unused cores. Some CPU's
> are hopefully smart enough to not even need that kind of software
> management, but I suspect even the really smart ones might be able to
> take advantage of the kernel saying: "I'm shutting you down, you
> don't have to worry about latency AT ALL, because I'm keeping another
> CPU active to do any real work".

sadly the reality is that "offline" is actually the same as "deepest C
state". At best.

As far as I can see, this is at least true for all Intel and AMD cpus.

And because there's then no power saving (but a performance cost), it's
actually a negative for battery life/total energy.

(lots of experiments inside Intel seem to confirm that, it's not just
theory)

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

Arve Hjønnevåg

Jun 3, 2010, 11:50:02 PM

On Thu, Jun 3, 2010 at 7:16 PM, Linus Torvalds
<torv...@linux-foundation.org> wrote:
>
>
> On Fri, 4 Jun 2010, Ingo Molnar wrote:
>>
>> What you say is absolutely true, hence this would be driven via sched_tick() +
>> TIF notifiers - i.e. only ever treat user-mode tasks as 'idle-able'. This can
>> be done with no overhead to the regular fastpaths.
>>
>> The TIF notifier would be the one scheduling to idle - and would thus do it
>> only to user-mode tasks.
>
> The thing is, unless there is some _really_ deep other reason to do
> something like this, I still think it's total overdesign to push any
> knowledge/choices like this into the scheduler. I'd rather keep things way
> more independent, less tied to each other and to deep kernel subsystems.
>
> IOW, my personal opinion is that something like a suspend (blocker or not)
> decision simply shouldn't be important enough to be tied into the
> scheduler. Especially not if it could just be its own layer.
>
> That said, as far as I know, the Android people have mostly been looking
> at the suspend angle from a single-core standpoint. And I'm not at all
> convinced that they should hijack the existing "/sys/power/state" thing
> which is what I think they do now.
>

While it is true that we have not used this code on a multi-core
system yet, I'm not sure why multiple cores would affect it. We
annotate that work needs to be done before it is safe to suspend, but
we don't care which core does the work (or if multiple cores do pieces
of it).

> And those two things go together. The /sys/power/state thing is a global
> suspend - which I don't think is appropriate for an opportunistic thing in
> the first place, especially for multi-core.
>
> A well-designed opportunistic suspend should be a two-phase thing: an
> opportunistic CPU hotunplug (shutting down cores one by one as the system
> is idle), and not a "global" event in the first place. And only when
> you've reached single-core state should you then say "do I suspend the
> system too".
>

This seems to fit better into the cpuidle and/or frequency scaling framework.

> So I've tried to look a bit at the patches, and my admittedly rough
> comments so far are
>
> - I really do prefer the "off to the side" approach that the current
> google opportunistic suspend patches have. As mentioned, I don't think
> this should be deep in the scheduler. Not at all.
>
> - I do think there are possibly races and CPU idle issues there, but I
> think they are mainly for the multi-core thing. And I think that's a
> totally separate issue. Or it _should_ be.
>

I'm not aware of any races with multi-core systems unless there are
existing problems in suspend. We check if any suspend blockers are
active after disable_nonboot_cpus() has returned.

> - once you're single-core (whether because you never had more cores to
> begin with, or because the "opportunistic CPU offlining" has taken down
> the other cores), I think the suspend-blocker is fine as a concept, and
> certainly shouldn't need any deep scheduler hooks.
>
> so I'd like to see the opportunistic suspend thing think about CPU
> offlining,

I see this as a separate problem. We ignore a single busy CPU for
opportunistic suspend, so why should the number of online CPUs matter?

> and I'd like to see it disconnect from the existing
> /sys/power/state.

The entry point is not important to us. The current interface is what
Rafael wanted instead of the /sys/power/request-state interface which
is what we changed it to last year.

> And I'd really not like to involve deep internal kernel
> hooks in it.
>
> But I'll also admit that maybe I'm not seeing some problems. I've frankly
> tried to avoid the whole discussion until Andrew pulled me in yesterday.
>
> Linus
>

--
Arve Hjønnevåg

Neil Brown

Jun 4, 2010, 12:40:02 AM

On Fri, 4 Jun 2010 01:23:02 +0200
Ingo Molnar <mi...@elte.hu> wrote:

> Btw., i'd like to summarize the scheduler based suspend scheme proposed by
> Thomas Gleixner, Peter Zijlstra and myself. I found no good summary of it in
> the big thread, and there are also new elements of the proposal:

Hi
I would like to summarise the alternate proposal that I and others have
suggested in a variety of different forms.

It starts from the premise that
1/ Android developers actually like the "big hammer" aspect of suspend.
Initiating suspend powers down some devices, puts others in low power
states, freezes all processes and generally puts the device to sleep
with a well defined and easily controlled (at the whole-of-system level)
set of events that will wake from suspend. This is a big part of the
Android approach to power-saving and I'm guessing they are not keen to
depart from it.

2/ The main problem with using suspend as-is is that it is racy.
The purpose of suspend is to put the device to sleep until a wake-event
occurs. When that wake-event occurs at much the same time that suspend is
requested, races can occur. We want a wake-event not only to wake the
device, but also to keep the device awake while the wake-event is being
handled, and to cancel any suspend that was initiated before the wake event
completed.
We need to understand "wake event" in a holistic sense. If a key press is
expected to brighten the screen and make a glyph appear, and if that key
press is considered to be a wake-event, then the glyph appearing must also
be a part of the wake event. For such a holistic wake-event to fully
block/cancel a suspend there must be some mechanism for hand-over of
wake-events from kernel-space to user-space.

Given those premises, google's suspend-blocker approach was to allow a
kernel thread to initiate suspend whenever nothing was stopping it, and to
allow both drivers and user processes to block that suspend while handling
a wake event (or anything else that needed to keep the device awake).
In this case the hand-over is fairly straightforward as the kernel thread
has full knowledge and can easily wait for all sorts of things.

The alternate proposal is simply to have user-space initiate a suspend (as
is already possible), user-space processes can then trivially block that
suspend through any of a number of IPC approaches, and kernel space drivers
can block/abort suspend by explicitly requesting a block.

The variety of alternate proposals comes from a variety of ways to modify
the semantics of "ask for a suspend" in such a way that userspace can
discover when there are kernel-space blocks, and can wait for them to be
released without spinning.

A sample modification (which I think is different to all the ones
mentioned so far, and hopefully pulls out the best of them all) is
to allow userspace to write e.g. "mem_safe" rather than "mem" to
/sys/power/state. The 'safe' implies it is safe from races.

When this is written, the process sleeps in an interruptible state until
all in-kernel suspend blocks have been dropped. If any such suspend blocks
were found, or if a signal is received, the request aborts. Only if there
were no suspend blocks and no pending signals does the suspend progress.

wake-events in the kernel then need to be tracked all the way to user-space,
and the in-kernel lock is only dropped when the event is consumed by
user-space. User-space must take some sort of lock to ensure no new
suspend is requested before consuming any wake-events from the kernel.
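
The intended semantics can be modelled in a few lines of Python. This is only a sketch of the outcome of such a write: "mem_safe" is the hypothetical value suggested above and does not exist in /sys/power/state, the function names are invented, and the real write would sleep until the blocks are dropped, which the model skips.

```python
def mem_safe_write(active_blocks, signal_pending=False):
    """Model the result of writing the hypothetical "mem_safe" value.

    active_blocks is the number of in-kernel suspend blocks found when the
    write starts. If any were found, or a signal arrived, the request aborts
    so that user-space can go back and consume the pending wake-events.
    """
    if active_blocks > 0 or signal_pending:
        return "aborted"
    return "suspended"


def suspend_loop(block_counts):
    """Model a user-space daemon that retries until one write goes through.

    block_counts lists the number of blocks seen at each successive attempt.
    Returns the number of attempts needed, or None if none succeeded.
    """
    for attempt, blocks in enumerate(block_counts, start=1):
        if mem_safe_write(blocks) == "suspended":
            return attempt
    return None
```

The abort-on-any-block rule is what closes the race: a wake-event that arrived while the write was waiting always gets a chance to be handled before the next suspend attempt.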

I believe this is very close to what android has today, only with a much
smaller change to the user-space interface, which I believe to be the thing
that has been found most objectionable.
It does still require a degree of event-tracking within the kernel which
might still be objectionable - I'm not so sure about different people's
positions on that.

Thanks,
NeilBrown

Arve Hjønnevåg

Jun 4, 2010, 1:00:02 AM

On Thu, Jun 3, 2010 at 4:23 PM, Ingo Molnar <mi...@elte.hu> wrote:
...

> - Controlled auto-suspend: drivers (such as input) could on wakeup
> automatically set the 'minimum wakeup latency' value of wakee tasks to a
> lower value. This automatically prevents another auto-suspend in the near
> future: up to the point the wakee task increases its latency (via the
> scheduler syscall) again and allows suspend again.
>

How do you clear the latency value in a safe way? If another wakeup
event happens right after your wakee task is done processing the last
event and decides to increase its latency, auto suspend will be
allowed even though you have an unprocessed wakeup event. Also how do
you know which task will read the event if it is not already waiting
for it?
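
The first race can be replayed with a toy Python model. Both functions and the step names are invented for illustration. The first treats the lowered latency as a single flag that the task clears unconditionally; the second counts unconsumed events instead, which is one possible alternative and an assumption of this sketch, not something the proposal specifies.

```python
def flag_model(steps):
    """The driver lowers the latency on an event, the task raises it again.
    Returns (suspend_blocked, unprocessed_events)."""
    suspend_blocked = False
    pending = 0
    for step in steps:
        if step == "event":
            pending += 1
            suspend_blocked = True   # driver lowers the wakee's latency
        elif step == "consume":
            pending -= 1             # task reads one event
        elif step == "raise_latency":
            suspend_blocked = False  # task re-allows suspend unconditionally
    return suspend_blocked, pending


def counter_model(steps):
    """Suspend stays blocked for as long as any event is unconsumed."""
    pending = 0
    for step in steps:
        if step == "event":
            pending += 1
        elif step == "consume":
            pending -= 1
        # "raise_latency" has no effect here: only consuming events unblocks.
    return pending > 0, pending


# A second event lands just before the task raises its latency again.
RACE = ["event", "consume", "event", "raise_latency"]
```

Replaying RACE through flag_model leaves suspend allowed with one event still unprocessed, while counter_model keeps suspend blocked for the same sequence.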

> This means there will be no surprise suspends for a task that may take a
> bit longer than usual to finish its work. [ Detail: this would only be done
> for tasks that have a non-default (non-infinity) task->latency value - to
> prevent the input driver from lowering latency values (and preventing
> future suspends) just because some unaware apps are running and using input
> drivers. ]

Don't you need two infinity values for this?

--
Arve Hjønnevåg

Linus Torvalds

Jun 4, 2010, 1:00:02 AM

On Thu, 3 Jun 2010, Arjan van de Ven wrote:
>
> And because there's then no power saving (but a performance cost), it's
> actually a negative for battery life/total energy.

Including the UP optimizations we do (ie lock prefix removal)? It's
possible that I'm just biased by benchmarks, and it's true that Intel has
been getting lots better, but the locking costs are very noticeable
performance-wise on some benchmarks.

And several CPU's have been held back from going into deepest sleep states
by stupid firmware and/or platform bugs.

But hey, if it's not going to help, and people have tried it, I guess I'll
have to believe it.

Linus

Ingo Molnar

Jun 4, 2010, 2:30:02 AM

* Linus Torvalds <torv...@linux-foundation.org> wrote:

> [...]

>
> And those two things go together. The /sys/power/state thing is a global
> suspend - which I don't think is appropriate for an opportunistic thing in
> the first place, especially for multi-core.
>
> A well-designed opportunistic suspend should be a two-phase thing: an
> opportunistic CPU hotunplug (shutting down cores one by one as the system is
> idle), and not a "global" event in the first place. And only when you've
> reached single-core state should you then say "do I suspend the system too".

Shutting a core down would be a natural idle level, and when the last one goes
idle we can do the suspend. (it happens as part of suspend anyway)

So on systems that don't want to auto-suspend this would indeed behave like you
suggest: the final core left would run as UP in essence.

Ingo

Ingo Molnar

Jun 4, 2010, 3:20:01 AM

* Arve Hjønnevåg <ar...@android.com> wrote:

> On Thu, Jun 3, 2010 at 4:23 PM, Ingo Molnar <mi...@elte.hu> wrote:
> ...

> >  - Controlled auto-suspend: drivers (such as input) could on wakeup
> >    automatically set the 'minimum wakeup latency' value of wakee tasks to a
> >    lower value. This automatically prevents another auto-suspend in the near
> >    future: up to the point the wakee task increases its latency (via the
> >    scheduler syscall) again and allows suspend again.

> >
>
> How do you clear the latency value in a safe way? If another wakeup event
> happens right after your wakee task is done processing the last event and
> decides to increase its latency, auto suspend will be allowed even though
> you have an unprocessed wakeup event. Also how do you know which task will
> read the event if it is not already waiting for it?

The easiest solution would be to not do any of that initially. (If it's ever a
concern we could subtract/add without destroying the nesting property)

Why do you need to track input wakeups? It's rather fragile and rather
unnecessary - the idle drivers already know very well how not to go into the
deepest idle mode today. We won't hit C8 on laptops when you are using
the desktop.

> >    This means there will be no surprise suspends for a task that may take a
> >    bit longer than usual to finish its work. [ Detail: this would only be done
> >    for tasks that have a non-default (non-infinity) task->latency value - to
> >    prevent the input driver from lowering latency values (and preventing
> >    future suspends) just because some unaware apps are running and using input
> >    drivers. ]

>
> Don't you need two infinity values for this?

Yes - any value above the max idle latency in the system will do.
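
The latency-bound scheme discussed here can be sketched in user-space C. This is only a minimal illustration under stated assumptions: `struct idle_state`, `min_task_latency()`, `pick_idle_state()` and `LATENCY_INFINITY` are hypothetical names invented for this sketch, not kernel APIs, and the real proposal would live in the scheduler and the idle drivers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: "no constraint" is any value above the largest exit latency. */
#define LATENCY_INFINITY UINT32_MAX

struct idle_state {
    const char *name;
    uint32_t exit_latency_us;   /* time needed to leave this state */
};

/* Smallest wakeup latency tolerated by any task; infinity if no task cares. */
static uint32_t min_task_latency(const uint32_t *task_latency_us, size_t ntasks)
{
    uint32_t min = LATENCY_INFINITY;

    for (size_t i = 0; i < ntasks; i++)
        if (task_latency_us[i] < min)
            min = task_latency_us[i];
    return min;
}

/*
 * Pick the deepest state whose exit latency fits the bound.
 * States are assumed sorted from shallowest to deepest, and
 * state 0 is always allowed.
 */
static size_t pick_idle_state(const struct idle_state *states, size_t nstates,
                              uint32_t bound_us)
{
    size_t best = 0;

    assert(nstates > 0);
    for (size_t i = 1; i < nstates; i++)
        if (states[i].exit_latency_us <= bound_us)
            best = i;
    return best;
}
```

With a table such as C1 (1 us), C3 (100 us) and suspend (500000 us), a system where every task keeps the default infinite latency may pick suspend, while a single task asking for 1000 us limits the choice to C3.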

Thanks,

Ingo

Arve Hjønnevåg

Jun 4, 2010, 3:40:02 AM

On Fri, Jun 4, 2010 at 12:13 AM, Ingo Molnar <mi...@elte.hu> wrote:
>
> * Arve Hjønnevåg <ar...@android.com> wrote:
>
>> On Thu, Jun 3, 2010 at 4:23 PM, Ingo Molnar <mi...@elte.hu> wrote:
>> ...
>> >  - Controlled auto-suspend: drivers (such as input) could on wakeup
>> >    automatically set the 'minimum wakeup latency' value of wakee tasks to a
>> >    lower value. This automatically prevents another auto-suspend in the near
>> >    future: up to the point the wakee task increases its latency (via the
>> >    scheduler syscall) again and allows suspend again.
>> >
>>
>> How do you clear the latency value in a safe way? If another wakeup event
>> happens right after your wakee task is done processing the last event and
>> decides to increase its latency, auto suspend will be allowed even though
>> you have an unprocessed wakeup event. Also how do you know which task will
>> read the event if it is not already waiting for it?
>
> The easiest solution would be to not do any of that initially. (If it's ever a
> concern we could subtract/add without destroying the nesting property)
>
> Why do you need to track input wakeups? It's rather fragile and rather

Because we have keys that should always turn the screen on, but the
problem is not specific to input events. If we enabled a wakeup event
it usually means we need this event to always work, not just when the
system is fully awake or fully suspended.

> unnecessary - the idle drivers know it very well how to not go into the
> deepest idle mode already today. We wont hit C8 on laptops when you are using
> the desktop.
>

The whole point is to allow the use of suspend.

>> >    This means there will be no surprise suspends for a task that may take a
>> >    bit longer than usual to finish its work. [ Detail: this would only be done
>> >    for tasks that have a non-default (non-infinity) task->latency value - to
>> >    prevent the input driver from lowering latency values (and preventing
>> >    future suspends) just because some unaware apps are running and using input
>> >    drivers. ]
>>
>> Don't you need two infinity values for this?
>
> Yes - any value above the max idle latency in the system will do.
>
> Thanks,
>
>        Ingo
>

--
Arve Hjønnevåg

Ingo Molnar

Jun 4, 2010, 4:00:03 AM

* Brian Swetland <swet...@google.com> wrote:

> On Thu, Jun 3, 2010 at 12:30 PM, Ingo Molnar <mi...@elte.hu> wrote:
> >
> > Sadly the response from the Android team has been 100% uncompromising: either
> > suspend blockers or nothing.
>
> Well, we're willing to accept something that gives us the same
> functionality (thus rewriting the api several times to meet various
> objections, current discussions around
> constraint-based-implementations / pm-qos, etc). We believe we're
> solving a real problem here and have not seen a counter-proposal that
> accomplishes the same.
>
> Suggestions such as "just yell at developers for writing bad apps" or
> "it's the user's fault if they install a lousy app" or "make your app
> marketplace more restrictive" are not helpful. [...]

Agreed.

> [...] The technical discussions around alternatives are more so (though I
> do feel like we're going in circles in places), [...]

Yep.

> [...] which again is why we're still here talking about this (that and Arve
> is about a billion times more patient and persistent than I am).
>
> We're not interested in massively rearchitecting our userspace to accomplish
> this (and the "rewrite your userspace!" proposals I've seen have had race
> conditions and/or significantly more complexity than the wakelock model).

You'll probably have to prepare for a somewhat different ABI for achieving
things. I doubt it would result in any large-scale, massive rewrites.

> ...
>
> > Also, why did the Android team start its contributions with such a
> > difficult and controversial kernel feature?
>
> We started here because it's possibly the only api level change we have --
> almost everything else is driver or subarch type work or controversial but
> entirely self-contained (like the binder, which I would be shocked to see
> ever hit mainline). [...]

So why arent those bits mainline? It's a 1000 times easier to get drivers and
small improvements and non-ABI changes upstream.

After basically two years of growing your fork (and some attempts to get your
drivers into drivers/staging/ - from where they have meanwhile dropped out
again) you re-started with the worst possible thing to merge: a big and
difficult kernel feature affecting many subsystems. Why?

This is one of the fundamental problems here. People simply dont know you,
because you have not worked with us much - and hence they dont trust you
positively out of box - they are neutral at best.

And believe me, it's hard enough to get difficult features upstream if people
_do_ know you and when they positively _do_ trust you ... Arent you talking to
Andrew Morton about how to do these things properly? This is kernel
contribution 101 really.

> [...] Assertions have been made that because the "android kernel" (not a
> term I like -- linux is linux, we have some assorted patches on top) [...]

I've been tracking android-common and android-msm for a while and i have to
say that it shows a very lackluster attitude towards upstream:

 - The latest branches i can see are v2.6.32 based today. We are in the
   v2.6.35 stabilization cycle and are developing v2.6.36. I.e. your upstream
   base is about a year too old.

 - The last commit is a couple of weeks old AFAICS.

 - The diffstat of android-common/android-2.6.32 is:

     890 files changed, 39962 insertions(+), 6286 deletions(-)

   Those assorted patches have spread over nearly a thousand files. FYI, by
   the looks of it you are facing an exponentially worsening maintenance
   overhead curve here.

Is there perhaps some other tree i should be following? I'm looking at:

  [remote "android-msm"]
        url = git://android.git.kernel.org/kernel/msm.git
        fetch = +refs/heads/*:refs/remotes/android-msm/*
  [remote "android-common"]
        url = git://android.git.kernel.org/kernel/common.git
        fetch = +refs/heads/*:refs/remotes/android-common/*

Btw., the commits i've glanced at looked mostly clean and well structured, so
i see no fundamental reason why this couldn't be done better.

> See: http://www.kroah.com/log/linux/android-kernel-problems.html and various
> other rants about the evil terrible android forks, etc.
>
> So, we figure, let's sort out the hard problem first and then move on with
> our lives.

Well, my suggestion would be to first build up a path towards upstream, build
up trust, reduce your very high cross section to mainline - and do the most
difficult bits last.

Especially 'move on with our lives' suggests that you just want to get rid of
this ABI divergence and continue-as-usual with the pattern of non-cooperation,
hm?

> > There is absolutely _zero_ technical reason why the Android team should
> > present this as an all-or-nothing effort. Why not merge hw drivers
> > first (with suspend blockers commented or stubbed out), to reduce the fork
> > distance?
>
> If that's the case then there is no problem and people could stop yelling at
> us and just submit their drivers. Awesome.
>
> I can't speak for all the nameless silicon vendors Greg represents, that we
> apparently are preventing from doing this (how? I don't know!), etc, but for
> my team maintaining multiple versions of drivers is a headache, we'd rather
> square away the wakelock debate first and figure something out there, as it
> just seems like a more logical approach. Maybe we're crazy.

It's not crazy, it's just IMHO inefficient and very difficult to do it like
that. And you arent the first one to try it like that (people _always_
gravitate towards coming with their most difficult patches first - because
they are very often the most useful patches) - it's a non-trivial learning
curve IMHO.

Ingo

Ingo Molnar

Jun 4, 2010, 4:20:01 AM

* Arjan van de Ven <ar...@infradead.org> wrote:

> On Thu, 3 Jun 2010 19:26:50 -0700 (PDT)
> Linus Torvalds <torv...@linux-foundation.org> wrote:
>
> > If the system is idle (or almost idle) for long times, I would heartily
> > recommend actively shutting down unused cores. Some CPU's are hopefully
> > smart enough to not even need that kind of software management, but I
> > suspect even the really smart ones might be able to take advantage of the
> > kernel saying: "I'm shutting you down, you don't have to worry about
> > latency AT ALL, because I'm keeping another CPU active to do any real
> > work".
>
> sadly the reality is that "offline" is actually the same as "deepest C
> state". At best.
>
> As far as I can see, this is at least true for all Intel and AMD cpus.
>
> And because there's then no power saving (but a performance cost), it's
> actually a negative for battery life/total energy.
>
> (lots of experiments inside Intel seem to confirm that, it's not just
> theory)

Well, the scheme would only be useful if it's _NOT_ just a deep C4 state, but
something that prevents tasks from being woken to that CPU for a good period
of time. Hot-unplugging that CPU achieves that (the runqueues are pulled), so
i think Linus's idea makes sense in principle.

[ Or have you done deep-idle experiments to that effect as well? ]

I suspect it all depends on the cost: and our current hot-unplug and
hot-replug code is anything but cheap ...

Ingo

Ingo Molnar

Jun 4, 2010, 4:20:02 AM

* Linus Torvalds <torv...@linux-foundation.org> wrote:

> On Fri, 4 Jun 2010, Ingo Molnar wrote:
>
> > What you say is absolutely true, hence this would be driven via
> > sched_tick() + TIF notifiers - i.e. only ever treat user-mode tasks as
> > 'idle-able'. This can be done with no overhead to the regular fastpaths.
> >
> > The TIF notifier would be the one scheduling to idle - and would thus do
> > it only to user-mode tasks.
>
> The thing is, unless there is some _really_ deep other reason to do
> something like this, I still think it's total overdesign to push any
> knowledge/choices like this into the scheduler. I'd rather keep things way
> more independent, less tied to each other and to deep kernel subsystems.

Well, the deep reason as i see it is simply the observation that what the
Android auto-suspend code implements via the suspend-blocker patches is an
idle driver and user-space scheduler in disguise. (if you count that as a deep
enough reason)

I dont mind hacks if they are local and if i dont have to maintain them, but
the objection from other folks was that suspend blockers are not that local
and not that maintainable. And if (and that's a big if) we have a global
effect anyway, then we might as well consider implementing it cleanly:

 - A global /sys flag is fundamentally racy and only allows a single
   user-space actor. Not a problem on mobile phones but sure violates
   taste buds.

   Proper per task latency attributes are not racy - we always know the
   maximum/minimum values, without user-space interfering with each other.

 - When done correctly we might win a couple of new features as well around
   the fringes:

    - Useful for power savings on mobile: crappy apps can be idled on an
      intermediate level, even before the system goes totally idle. There's no
      equivalent suspend-blockers feature.

    - Useful for real-time tasks that want to idle lower prio tasks when some
      really important thing is running - even if the real-time task might sleep.
      This is superior to the 'hog the CPU' kind of hacks that have been used
      for this purpose before.

 - The hacks needed to express a race-free suspend/wakeup cycle are unnatural
   and stem from the model being a user-space driven idle manager instead of a
   proper part of task sleep/wakeup.

 - None of this code seems to impact any scheduler hotpath (most of it is just
   a special form of idle driver) - it's all on deeper levels of idle and, at
   most, in off-line return-to-userspace codepaths. So there's no strong
   performance reason _against_ some level of integration. There is indeed
   the coupling effect as you mention, which weighs against.

 - i also think Android's auto-suspend is a strategic feature to Linux: i
   think auto/opportunistic suspend will matter more and more, and my guess
   is that in ten years most of our daily systems will be doing auto-suspend
   and will have proper wakeups from suspend implemented in hardware. Not just
   phones and gadgets but also portable tablets, book readers, TVs - and i
   wouldnt mind a non-portable, table sized tablet either ;-)

   At which point i'd hate to have some hack of a solution ingrained and
   ABI-ized with little chance to move user-space to sanity.

But yes, i definitely agree with you that it all comes down to 'do we care':

 - If we care we should integrate it intelligently where it belongs
   conceptually: the idle drivers and the scheduler.

 - If we dont care then we should isolate the hacks as much as possible - and
   then the current suspend blocker patch-set is definitely a good basis to
   start. (with perhaps the /sys hackery cleaned up a bit, as you suggested)

I dont favor either of the solutions too deeply - so i personally have not
NAK-ed suspend blockers - i just saw half a dozen semi-NAKs flying from
other folks, so tried to help come up with a palatable design.

_If_ most of x86 hardware was able to suspend race-free i think deeper
integration would be a slam-dunk - as we could make it work almost everywhere.
Sadly only a tiny subset of x86 qualifies, so the argument isnt obvious. Maybe
we should pick a variant of suspend blockers and re-examine things in a few
years? It being an ABI makes it difficult tho.

What i would personally find unacceptable is to have _neither_ solution - and
the discussion was heading towards that stage really, with both sides digging
the trenches of non-cooperation. IMHO we just cannot afford to let this drop
on the floor as the feature is immensely useful to Android and thus to Linux
at large.

Anyway, i'm glad that it's up to you ;-)

Ingo

Ingo Molnar

Jun 4, 2010, 4:40:03 AM

* Arve Hjønnevåg <ar...@android.com> wrote:

> > [...]
> >
> > Why do you need to track input wakeups? It's rather fragile and rather
> > unnecessary [...]
>
> Because we have keys that should always turn the screen on, but the problem
> is not specific to input events. If we enabled a wakeup event it usually
> means we need this event to always work, not just when the system is fully
> awake or fully suspended.

Hm, i cannot follow that generic claim. Could you please point out the problem
to me via a specific example? Which task does what, what undesirable thing
happens where, etc.

Thanks,

Ingo

Brian Swetland

Jun 4, 2010, 4:40:02 AM

On Fri, Jun 4, 2010 at 12:57 AM, Ingo Molnar <mi...@elte.hu> wrote:
> * Brian Swetland <swet...@google.com> wrote:
>>
>> We started here because it's possibly the only api level change we have --
>> almost everything else is driver or subarch type work or controversial but
>> entirely self-contained (like the binder, which I would be shocked to see
>> ever hit mainline). [...]
>
> So why arent those bits mainline? It's a 1000 times easier to get drivers and
> small improvements and non-ABI changes upstream.
>
> After basically two years of growing your fork (and some attempts to get your
> drivers into drivers/staging/ - from where they have meanwhile dropped out
> again) you re-started with the worst possible thing to merge: a big and
> difficult kernel feature affecting many subsystems. Why?

Because a large number of our drivers depend on it.

> This is one of the fundamental problems here. People simply dont know you,
> because you have not worked with us much - and hence they dont trust you
> positively out of box - they are neutral at best.
>
> And believe me, it's hard enough to get difficult features upstream if people
> _do_ know you and when they positively _do_ trust you ... Arent you talking to
> Andrew Morton about how to do these things properly? This is kernel
> contribution 101 really.
>
>> [...] Assertions have been made that because the "android kernel" (not a
>> term I like -- linux is linux, we have some assorted patches on top) [...]
>
> I've been tracking android-common and android-msm for a while and i have to
> say that it shows a very lackluster attitude towards upstream:
>
> - The latest branches i can see are v2.6.32 based today. We are in the
> v2.6.35 stabilization cycle and are developing v2.6.36. I.e. your upstream
> base is about a year too old.

We have some branch naming confusion and work going on in
experimental, but our active work right now is against 2.6.34 and
2.6.35-rc. The tegra2 work has been very aggressively following
mainline (rebasing against 2.6.34rc as they were getting underway),
and we've been sending those patches out for review, in hopes of
getting that tree off on a better foot.

>
> - The last commit is a couple of weeks old AFAICS.
>
> - The diffstat of android-common/android-2.6.32 is:
>
> 890 files changed, 39962 insertions(+), 6286 deletions(-)
>
> Those assorted patches have spread over nearly a thousand files. FYI, by
> the looks of it you are facing an exponentially worsening maintenance
> overhead curve here.
>
> Is there perhaps some other tree i should be following? I'm looking at:
>
> [remote "android-msm"]
> url = git://android.git.kernel.org/kernel/msm.git
> fetch = +refs/heads/*:refs/remotes/android-msm/*
> [remote "android-common"]
> url = git://android.git.kernel.org/kernel/common.git
> fetch = +refs/heads/*:refs/remotes/android-common/*
>
> Btw., the commits i've glanced at looked mostly clean and well structured, so
> i see no fundamental reason why this couldn't be done better.

I think the fundamental issue we keep bumping into is the turnaround
time on patch review / inclusion (again we're trying to get things
going much earlier on tegra2 to hopefully have less pain there). We
aim for kernel style compliance (though we're not perfect and we make
our share of mistakes), but previously when I tried sending mach-msm
stuff out, it seemed infeasible to send 30-60+ patches, so we'd start
with 5-10, feedback would trickle in over the course of a week, I'd
respin, etc. After a couple weeks some stuff would get picked up
toward a merge window but the rest would have to wait. And then we
hit crunch to ship, etc, and get behind.

Totally our fault that we're not just constantly pushing patches (and
we're trying to get a fulltime engineer or two just to work on
upstream related stuff), but we rapidly hit the point where what we're
sending up is a drop in the bucket compared to the work we're doing
and things keep diverging, etc.

I'm told this happens to everyone, is common, etc. We're (seriously)
a small team, trying to ship multiple products a year and keep our
head above water here, and unfortunately that means we keep tabling
these projects until we can find some cycles to give it another go and
the delta grows.

>> So, we figure, let's sort out the hard problem first and then move on with
>> our lives.
>
> Well, my suggestion would be to first build up a path towards upstream, build
> up trust, reduce your very high cross section to mainline - and do the most
> difficult bits last.

Having to maintain two versions of about half our driver code because
we depend on an ABI not in mainline is a significant factor for us --
it's difficult to have what's going upstream lag behind our active
work (basically we have to maintain two different trees -- one for
mainline one for ship) already, but having these codelines also be
different makes it worse for us.

> Especially 'move on with our lives' suggests that you just want to get rid of
> this ABI divergence and continue-as-usual with the pattern of non-cooperation,
> hm?

I'd like to make some forward progress either to get something
wakelock-ish in and shift to whatever that api is, or to get a clear
"no not going to happen" and deal with the fallout there.

...

Sadly, for mach-msm, we're now further out due to maintainership
shifts (Daniel stepped up to do msm stuff, is pushing up some hybrid
of our work and Qualcomm's work that doesn't seem to really fit with
either, and I have no idea how to sanely get our stuff to sit on top
of that). I'd love to find some time to sit down, clean up the whole
msm tree for 8x50/7x30 which is (largely) pretty clean, and is
extremely stable and shippable, and try to get it into a patch series
and headed upstream, but we're now colliding with the upstream
mach-msm which has gone off in a different direction, etc.

Anyway, we continue to try to figure out how to make stuff work better
(again, trying some different approaches with tegra2), but so far the
process of getting code upstream has been extremely time intensive and
rather frustrating and it remains unclear who can sign off on what and
how many hoops different people will keep asking us to jump through.

Brian

Ingo Molnar

Jun 4, 2010, 5:00:02 AM

* Brian Swetland <swet...@google.com> wrote:

> On Fri, Jun 4, 2010 at 12:57 AM, Ingo Molnar <mi...@elte.hu> wrote:
> > * Brian Swetland <swet...@google.com> wrote:
> >>
> >> We started here because it's possibly the only api level change we have
> >> -- almost everything else is driver or subarch type work or controversial
> >> but entirely self-contained (like the binder, which I would be shocked to
> >> see ever hit mainline). [...]
> >
> > So why arent those bits mainline? It's a 1000 times easier to get drivers
> > and small improvements and non-ABI changes upstream.
> >
> > After basically two years of growing your fork (and some attempts to get
> > your drivers into drivers/staging/ - from where they have meanwhile
> > dropped out again) you re-started with the worst possible thing to merge:
> > a big and difficult kernel feature affecting many subsystems. Why?
>
> Because a large number of our drivers depend on it.

So why not put in some stub or so? Auto-suspend/suspend-blockers is a feature,
and drivers ought to be able to work without a feature as well. Keep the
suspend-blocker changes in the android tree initially, and get the main body
of changes out first, and establish a flow of timely changes. That reduces
your maintenance burden and increases trust for future changes - a win-win
situation.

In any case, this is not to suggest that the suspend-blocker bits are
'impossible' to merge. I just say that if you start with your most difficult
feature you should not be surprised to be on the receiving end of a 1000+
mails flamewar on lkml ;-)

> > I've been tracking android-common and android-msm for a while and i have
> > to say that it shows a very lackluster attitude towards upstream:
> >
> >  - The latest branches i can see are v2.6.32 based today. We are in the
> >    v2.6.35 stabilization cycle and are developing v2.6.36. I.e. your
> >    upstream base is about a year too old.
>
> We have some branch naming confusion and work going on in
> experimental, but our active work right now is against 2.6.34 and
> 2.6.35-rc. [...]

That's nice!

> [...] The tegra2 work has been very aggressively following mainline
> (rebasing against 2.6.34rc as they were getting underway), and we've been
> sending those patches out for review, in hopes of getting that tree off on a
> better foot.

Ah, googling for 'tegra2' gave me the magic URI:

git remote add android-tegra2 git://android.git.kernel.org/kernel/tegra.git

I generally roam various trees for scheduler patches when i can, seeing what
problems people are facing and trying to prevent more painful forks from
developing. You have these changes there currently:

d82647e: sched: make task dump print all 15 chars of proc comm
5e3e0f1: sched: Enable might_sleep before initializing drivers.

Please submit 5e3e0f1. We can probably do that one even simpler, by turning
__might_sleep_init_called into the only flag that __might_sleep() checks -
i.e. not checking system_state at all.

Also, please submit d82647e, it makes sense too.

Thanks,

Ingo

Arve Hjønnevåg

Jun 4, 2010, 5:00:02 AM

On Fri, Jun 4, 2010 at 1:34 AM, Ingo Molnar <mi...@elte.hu> wrote:
>
> * Arve Hjønnevåg <ar...@android.com> wrote:
>
>> > [...]
>> >
>> > Why do you need to track input wakeups? It's rather fragile and rather
>> > unnecessary [...]
>>
>> Because we have keys that should always turn the screen on, but the problem
>> is not specific to input events. If we enabled a wakeup event it usually
>> means we need this event to always work, not just when the system is fully
>> awake or fully suspended.
>
> Hm, i cannot follow that generic claim. Could you please point out the problem
> to me via a specific example? Which task does what, what undesirable thing
> happens where, etc.
>

We have many wakeup events, and some of them are invisible to the
user. For instance, on the Nexus One we wake up every 10 minutes to monitor
the battery health. If the user presses a key right after this work
has finished and we did not block suspend until userspace could
process this key event, we risk suspending before we could turn the
screen on, which to the user looks like the key did not work. Another
example, the user pressed the power key which turns the screen off and
allows suspend. We initiate suspend and a phone call comes in. If we
don't block suspend until we processed the incoming phone call
notification, the phone may never ring (some devices will send a new
message every few seconds for this, so on those devices it would just
delay the ringing).
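
The two scenarios above share one pattern: an event is delivered, and suspend must be refused until userspace has finished handling it. A minimal user-space sketch of that pattern follows. It is not the wakelock or suspend-blocker implementation; `struct suspend_gate` and the `gate_*()` functions are hypothetical names used only to illustrate the counting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a count of events not yet handled by userspace. */
struct suspend_gate {
    int pending;    /* wakeup events delivered but not yet processed */
};

/* Driver side: called when a wakeup event (key press, call) arrives. */
static void gate_event_arrived(struct suspend_gate *g)
{
    g->pending++;
}

/* Userspace side: called once the event has been fully handled. */
static void gate_event_handled(struct suspend_gate *g)
{
    assert(g->pending > 0);
    g->pending--;
}

/* Suspend path: refuse to suspend while anything is still pending. */
static bool gate_try_suspend(const struct suspend_gate *g)
{
    return g->pending == 0;
}
```

In the battery-monitor example, the key press that arrives just after the monitoring work completes raises the count again, so a suspend attempt made at that moment is refused until the screen has been turned on and the event marked as handled.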

--
Arve Hjønnevåg

Brian Swetland

Jun 4, 2010, 5:10:01 AM

On Fri, Jun 4, 2010 at 1:55 AM, Ingo Molnar <mi...@elte.hu> wrote:
> * Brian Swetland <swet...@google.com> wrote:
>> > After basically two years of growing your fork (and some attempts to get
>> > your drivers into drivers/staging/ - from where they have meanwhile
>> > dropped out again) you re-started with the worst possible thing to merge:
>> > a big and difficult kernel feature affecting many subsystems. Why?
>>
>> Because a large number of our drivers depend on it.
>
> So why not put in some stub or so? Auto-suspend/suspend-blockers is a feature,
> and drivers ought to be able to work without a feature as well. Keep the
> suspend-blocker changes in the android tree initially, and get the main body
> of changes out first, and establish a flow of timely changes. That reduces
> your maintenance burden and increases trust for future changes - a win-win
> situation.

The impression I got from previous discussions was that upstream did
not want things that were built conditionally around APIs that did not
exist in mainline nor stub implementations for things that were not
agreed upon.

We could easily either #if defined(CONFIG_SUSPEND_BLOCKERS) or submit
a suspend_blockers.h that just makes everything a no-op, if that's an
acceptable transition vehicle. I didn't think either were an option
open to us.
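
A rough sketch of what such a no-op header could look like, combining both options mentioned above. The function names and the `struct suspend_blocker` layout are assumptions made for illustration and may not match the posted patch set; only `CONFIG_SUSPEND_BLOCKERS` comes from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a suspend_blockers.h; names are illustrative. */
struct suspend_blocker {
    const char *name;
};

#if defined(CONFIG_SUSPEND_BLOCKERS)
/* Feature present: the real implementation would be provided elsewhere. */
void suspend_blocker_init(struct suspend_blocker *sb, const char *name);
void suspend_block(struct suspend_blocker *sb);
void suspend_unblock(struct suspend_blocker *sb);
bool suspend_blocker_is_active(struct suspend_blocker *sb);
#else
/* Feature absent: every call is an empty inline wrapper. */
static inline void suspend_blocker_init(struct suspend_blocker *sb,
                                        const char *name)
{
    (void)sb;
    (void)name;
}
static inline void suspend_block(struct suspend_blocker *sb) { (void)sb; }
static inline void suspend_unblock(struct suspend_blocker *sb) { (void)sb; }
static inline bool suspend_blocker_is_active(struct suspend_blocker *sb)
{
    (void)sb;
    return false;
}
#endif

/* Example driver code: the source is identical with or without the feature. */
static int example_driver_handle_irq(struct suspend_blocker *sb)
{
    assert(sb != NULL);
    suspend_block(sb);
    /* ... hand the event to userspace ... */
    suspend_unblock(sb);
    return 0;
}
```

The point of the stub variant is that driver code carries no `#ifdef` of its own: the conditional lives in one header, and the driver builds unchanged against either version.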

> In any case, this is not to suggest that the suspend-blocker bits are
> 'impossible' to merge. I just say that if you start with your most difficult
> feature you should not be surprised to be on the receiving end of a 1000+
> mails flamewar on lkml ;-)

Yeah, I do understand that we're not making it easy for ourselves
here. I think we hit the point where Rafael and Matthew signed off on
things and thought "aha, linux-pm maintainers are happy, now we're
getting somewhere" only to realize the light at the end of the tunnel
was a bit further out than we anticipated ^^

Brian

Pekka Enberg

Jun 4, 2010, 5:10:01 AM

On Fri, Jun 4, 2010 at 11:55 AM, Ingo Molnar <mi...@elte.hu> wrote:
> In any case, this is not to suggest that the suspend-blocker bits are
> 'impossible' to merge. I just say that if you start with your most difficult
> feature you should not be surprised to be on the receiving end of a 1000+
> mails flamewar on lkml ;-)

Indeed. This 'all or nothing' approach hasn't worked well in the past
and I highly doubt it will work now. It's much easier to work with
people when you have a track record of getting things merged and
actually maintaining the code.

Pekka

Peter Zijlstra

Jun 4, 2010, 5:50:02 AM

On Fri, 2010-06-04 at 01:23 +0200, Ingo Molnar wrote:
> Btw., i'd like to summarize the scheduler based suspend scheme proposed by
> Thomas Gleixner, Peter Zijlstra and myself. I found no good summary of it in
> the big thread, and there are also new elements of the proposal:

Just to clarify, my proposition doesn't go much further than treating
'suspend' as a genuine idle state (on suitable hardware, which x86 isn't).

> - Create a 'deep idle' mode that suspends. This, if all constraints
> are met, is triggered by the scheduler automatically: just like the other
> idle modes are triggered currently. This approach fixes the wakeup
> races because an incoming wakeup event will set need_resched() and
> abort the suspend.
>

Right, so 'suspend' as idle seems (at least on UP/arm) a very sensible
idea. On SMP current suspend hot-unplugs all but the boot cpu, I'm not
sure we need to do that, since if the system is genuinely idle, what races
are there?

And if it's not idle...
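
The quoted claim that an incoming wakeup sets need_resched() and aborts the suspend can be modelled in a few lines of user-space C. This is a toy model under stated assumptions, not kernel code: `struct cpu_model`, `wakeup_event()` and `deep_idle_suspend()` are hypothetical, and the `wake_at` parameter exists only to inject a wakeup at a chosen step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU entering a suspending 'deep idle' state. */
struct cpu_model {
    bool need_resched;      /* set by any wakeup, as in the proposal */
    int devices_suspended;  /* how far the suspend sequence has progressed */
};

static void wakeup_event(struct cpu_model *cpu)
{
    cpu->need_resched = true;
}

/*
 * Suspend devices one at a time, re-checking need_resched before each step.
 * wake_at is the step at which a wakeup arrives (-1 for never).
 * Returns true if the system suspended, false if the attempt was aborted.
 */
static bool deep_idle_suspend(struct cpu_model *cpu, int ndevices, int wake_at)
{
    assert(ndevices >= 0);
    for (int i = 0; i < ndevices; i++) {
        if (i == wake_at)
            wakeup_event(cpu);
        if (cpu->need_resched) {
            cpu->devices_suspended = 0;  /* roll back: resume everything */
            return false;
        }
        cpu->devices_suspended++;
    }
    return !cpu->need_resched;
}
```

The model leaves out the harder question raised in this thread, namely what happens to pending timers and CLOCK_MONOTONIC while the system is in such a state.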

> ( This mode can even use the existing suspend code to bring stuff down,
> therefore it also solves the pending timer problem and works even on
> PC style x86. )

You cannot solve the pending timer issue from idle, unless you allow idle
to stop clock_monotonic, which would change idle semantics, and that is not
something I can say is a good idea.

You want all idle states to have the same semantics, otherwise things just
get way too confusing.

> - Solve crappy app confinement via the scheduler:
>
> A first proposal was to use the existing cgroup mechanism,

I still believe containment is a cgroup problem. The freeze/snapshot/resume
container folks seem to face many of the same problems. Including the
pending timer one I suspect. Let's solve it there.

> - Controlled auto-suspend: drivers (such as input) could on wakeup
> automatically set the 'minimum wakeup latency' value of wakee tasks to a
> lower value. This automatically prevents another auto-suspend in the near
> future: up to the point the wakee task increases its latency (via the
> scheduler syscall) again and allows suspend again.

I think treating wakeups special like that is a mistake. I also think the
kernel should never adjust a task's QoS attributes, the user set them in
the expectation of them being respected.

I'm not really sure about the interaction between wakeups and untrusted
apps. It seems to me that an untrusted app needs a trusted intermediate
anyway, that intermediate can be responsible for freezing/unfreezing of the
untrusted app.

So either the app asks for suspend blockers through the intermediate, or its
cgroup is managed by the intermediate -- should work out to the same end
result, right?

Peter Zijlstra

Jun 4, 2010, 6:00:03 AM

On Fri, 2010-06-04 at 11:43 +0200, Peter Zijlstra wrote:
> I still believe containment is a cgroup problem. The freeze/snapshot/resume
> container folks seem to face many of the same problems. Including the
> pending timer one I suspect. Let's solve it there.

While talking to Thomas about this, we'd probably need a CLOCK_MONOTONIC
namespace to pull this off, so that resumed apps don't see the jump in
absolute time.

This would also help with locating the relevant timers, since they'd be
on the related timer base.

The only 'interesting' issue I can see here is that if you create 1000
CLOCK_MONOTONIC namespaces, we'd need to have a tree of trees in order to
efficiently find the leftmost timer.
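
To make the lookup problem concrete, here is a small user-space sketch of finding the next expiry across several clock namespaces. It deliberately uses a linear scan over the namespaces instead of the tree of trees mentioned above, so it shows the time translation rather than the efficient data structure. `struct clock_ns`, its fields and `next_global_expiry()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_TIMER UINT64_MAX

/* Hypothetical per-namespace timer base. */
struct clock_ns {
    uint64_t frozen_ns;     /* total time this namespace has spent frozen */
    uint64_t leftmost_ns;   /* earliest expiry in namespace time, or NO_TIMER */
};

/*
 * Earliest expiry over all namespaces, translated to global time.
 * A namespace clock lags the global clock by the time it was frozen,
 * so global expiry = namespace expiry + frozen time.
 */
static uint64_t next_global_expiry(const struct clock_ns *ns, size_t count)
{
    uint64_t best = NO_TIMER;

    assert(ns != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (ns[i].leftmost_ns == NO_TIMER)
            continue;
        uint64_t global = ns[i].leftmost_ns + ns[i].frozen_ns;
        if (global < best)
            best = global;
    }
    return best;
}
```

The scan costs O(n) per lookup in the number of namespaces, which is exactly why an outer tree keyed on each namespace's translated leftmost expiry becomes attractive at 1000 namespaces.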

Ingo Molnar

Jun 4, 2010, 6:00:03 AM

* Brian Swetland <swet...@google.com> wrote:

> On Fri, Jun 4, 2010 at 1:55 AM, Ingo Molnar <mi...@elte.hu> wrote:
> > * Brian Swetland <swet...@google.com> wrote:
> >> > After basically two years of growing your fork (and some attempts to get
> >> > your drivers into drivers/staging/ - from where they have meanwhile
> >> > dropped out again) you re-started with the worst possible thing to merge:
> >> > a big and difficult kernel feature affecting many subsystems. Why?
> >>
> >> Because a large number of our drivers depend on it.
> >
> > So why not put in some stub or so? Auto-suspend/suspend-blockers is a
> > feature, and drivers ought to be able to work without a feature as well.
> > Keep the suspend-blocker changes in the android tree initially, and get
> > the main body of changes out first, and establish a flow of timely
> > changes. That reduces your maintenance burden and increases trust for
> > future changes - a win-win situation.
>
> The impression I got from previous discussions was that upstream did not
> want things that were built conditionally around APIs that did not exist in
> mainline nor stub implementations for things that were not agreed upon.

Well, if it's some ugly #ifdef solution i could imagine light objections on
pure aesthetic micro-grounds.

> We could easily either #if defined(CONFIG_SUSPEND_BLOCKERS) or submit a
> suspend_blockers.h that just makes everything a no-op, if that's an
> acceptable transition vehicle. I didn't think either were an option open to
> us.

You can certainly put in a suspend_blockers.h thing into some Android
directory, and populate it with empty wrappers - as long as you only use it
within Android drivers and not core kernel code or other subsystems you dont
maintain.
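A minimal sketch of what such an empty-wrapper header could look like. Every name here (`struct suspend_blocker`, `suspend_blocker_init()`, `suspend_block()`, `suspend_unblock()`, `suspend_blocker_is_active()`, the include guard) is an assumption made for illustration, not the Android tree's actual API; only `CONFIG_SUSPEND_BLOCKERS` is taken from the message quoted above.

```c
/* suspend_blockers.h: illustrative no-op stubs; all names are hypothetical */
#ifndef ANDROID_SUSPEND_BLOCKERS_H
#define ANDROID_SUSPEND_BLOCKERS_H

#include <assert.h>
#include <stddef.h>

struct suspend_blocker {
	const char *name;	/* kept so a driver can still label its blocker */
};

#ifndef CONFIG_SUSPEND_BLOCKERS
/* Feature not built in: every wrapper compiles down to nothing. */
static inline void suspend_blocker_init(struct suspend_blocker *sb,
					const char *name)
{
	assert(sb != NULL);
	sb->name = name;
}

static inline void suspend_block(struct suspend_blocker *sb)   { (void)sb; }
static inline void suspend_unblock(struct suspend_blocker *sb) { (void)sb; }

/* With no blockers compiled in, nothing can be holding off suspend. */
static inline int suspend_blocker_is_active(struct suspend_blocker *sb)
{
	(void)sb;
	return 0;
}
#endif /* !CONFIG_SUSPEND_BLOCKERS */

#endif /* ANDROID_SUSPEND_BLOCKERS_H */

/* Hypothetical driver path: the source is the same with or without the feature. */
static int example_irq_path(struct suspend_blocker *sb)
{
	suspend_block(sb);
	/* ... handle the event ... */
	suspend_unblock(sb);
	return suspend_blocker_is_active(sb);
}
```

The point of the design is that the driver body stays identical between the two trees; only the header differs.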

It's being done all the time and helpful cleanup patches eliminating the stubs
are frowned upon (unless the stubs are there like for years with no progress
and no maintenance in sight).

Putting empty stubs into include/linux/ would be pushing things i think.

In fact sometimes architectures even jump the gun with major kernel features:
we had a dynticks implementation in ARM for years, we had RTLinux stubs in x86
code for quite some time, and we still have perfmon in IA64 - despite the core
kernel having gone for a different design.

It's certainly not ideal, but it's certainly a solution that is used every now
and then. The less difference there is between trees the easier it becomes to
merge - for both sides, both technically and socially.

> > In any case, this is not to suggest that the suspend-blocker bits are
> > 'impossible' to merge. I just say that if you start with your most
> > difficult feature you should not be surprised to be on the receiving end
> > of a 1000+ mails flamewar on lkml ;-)
>
> Yeah, I do understand that we're not making it easy for ourselves here. I
> think we hit the point where Rafael and Matthew signed off on things and
> thought "aha, linux-pm maintainers are happy, now we're getting somewhere"
> only to realize the light at the end of the tunnel was a bit further out
> than we anticipated ^^

That's a well-known problem on lkml: the light at the end of the tunnel was
the other train ;-)

Anyway, i'm not pessimistic at all: _some_ sort of scheme appears to be
crystalising out today. Everyone seems to agree now that the main usecases are
indeed useful and need handling one way or another - the rest is really just
technological discussions how to achieve the mostly-agreed-upon end goal.

The worst situation are features where one side says 'we dont need this kind
of functionality at all' - IMO auto/opportunistic-suspend isnt in that
situation, fortunately.

Thanks,

Ingo

Peter Zijlstra

Jun 4, 2010, 6:10:01 AM

On Fri, 2010-06-04 at 12:03 +0200, Ingo Molnar wrote:

> > The only 'interesting' issue I can see here is that if you create 1000
> > CLOCK_MONOTONIC namespaces, we'd need to have a tree of trees in order to
> > efficiently find the leftmost timer.
>
> Realistically Android userspace would create just a single such namespace for
> all the untrusted/unknown/uncontrolled apps, right?

Possibly, yeah.

But it might not stop someone else from creating an insane number of them.
So we do need to deal with that, and a linear loop over all timer bases,
which then will be a user controlled quantity, just doesn't sound
right :-)

Ingo Molnar

Jun 4, 2010, 6:10:02 AM

* Peter Zijlstra <pet...@infradead.org> wrote:

> On Fri, 2010-06-04 at 11:43 +0200, Peter Zijlstra wrote:
> > I still believe containment is a cgroup problem. The freeze/snapshot/resume
> > container folks seem to face many of the same problems. Including the
> > pending timer one, I suspect. Let's solve it there.
>
> While talking to Thomas about this, we'd probably need a CLOCK_MONOTONIC
> namespace to pull this off, so that resumed apps don't see the jump in
> absolute time.
>
> This would also help with locating the relevant timers, since they'd be on
> the related timer base.

Ok - this looks workable, and looks technically isolated enough that it can be
pursued as a separate module of this whole topic.

> The only 'interesting' issue I can see here is that if you create 1000
> CLOCK_MONOTONIC namespaces, we'd need to have a tree of trees in order to
> efficiently find the leftmost timer.

Realistically Android userspace would create just a single such namespace for
all the untrusted/unknown/uncontrolled apps, right?

Ingo

Thomas Gleixner

Jun 4, 2010, 6:20:02 AM

On Fri, 4 Jun 2010, Peter Zijlstra wrote:

> On Fri, 2010-06-04 at 11:43 +0200, Peter Zijlstra wrote:
> > I still believe containment is a cgroup problem. The freeze/snapshot/resume
> > container folks seem to face many of the same problems. Including the
> > pending timer one, I suspect. Let's solve it there.
>
> While talking to Thomas about this, we'd probably need a CLOCK_MONOTONIC
> namespace to pull this off, so that resumed apps don't see the jump in
> absolute time.
>
> This would also help with locating the relevant timers, since they'd be
> on the related timer base.
>
> The only 'interesting' issue I can see here is that if you create 1000
> CLOCK_MONOTONIC namespaces, we'd need to have a tree of trees in order to
> efficiently find the leftmost timer.

We can be more clever than that. All CLOCK_MONOTONIC timers can live
in the CLOCK_MONOTONIC rbtree, we just need proper annotation, i.e.:

struct hrtimer {
	ktime_t			expires;
	......
	struct list_head	namespace;
	ktime_t			base_offset;
};

So expires would be on CLOCK_MONOTONIC as seen from the kernel, just
the user space interfaces would take the base_offset into account.

On freeze we remove the timers from the rbtree (they are easy to
find via the namespace list) and on thaw we set the base_offset
accordingly and insert them again. So no surprise for user space and
no tree of trees to walk through.
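The freeze/thaw bookkeeping described above can be modelled in user space roughly as follows. This is only a sketch under assumptions: a `queued` flag stands in for rbtree membership, a singly linked list stands in for the `namespace` list head, and shifting `expires` by the frozen interval is one reading of "set the base_offset accordingly". None of the `ns_*` names exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t ktime_ns;		/* nanoseconds, stand-in for ktime_t */

struct ns_timer {
	ktime_ns expires;		/* expiry on the kernel's CLOCK_MONOTONIC */
	ktime_ns base_offset;		/* kernel time minus namespace time */
	int queued;			/* stand-in for "is in the rbtree" */
	struct ns_timer *ns_next;	/* stand-in for the namespace list */
};

struct clock_ns {
	struct ns_timer *timers;	/* all timers of this namespace */
	ktime_ns offset;		/* total time spent frozen so far */
	ktime_ns frozen_at;		/* kernel time of the last freeze */
};

/* Arm a timer given an expiry expressed in namespace time. */
static void ns_timer_start(struct clock_ns *ns, struct ns_timer *t,
			   ktime_ns ns_expires)
{
	t->base_offset = ns->offset;
	t->expires = ns_expires + ns->offset;
	t->queued = 1;
	t->ns_next = ns->timers;
	ns->timers = t;
}

/* Freeze: dequeue every timer, found via the per-namespace list. */
static void ns_freeze(struct clock_ns *ns, ktime_ns now)
{
	struct ns_timer *t;

	ns->frozen_at = now;
	for (t = ns->timers; t; t = t->ns_next)
		t->queued = 0;
}

/* Thaw: account for the frozen interval, then requeue the timers. */
static void ns_thaw(struct clock_ns *ns, ktime_ns now)
{
	struct ns_timer *t;
	ktime_ns delta = now - ns->frozen_at;

	assert(delta >= 0);
	ns->offset += delta;
	for (t = ns->timers; t; t = t->ns_next) {
		t->expires += delta;		/* kernel-side expiry moves out */
		t->base_offset = ns->offset;	/* user-visible expiry does not */
		t->queued = 1;
	}
}

/* What user space inside the namespace would see. */
static ktime_ns ns_timer_user_expires(const struct ns_timer *t)
{
	return t->expires - t->base_offset;
}

static ktime_ns ns_clock_now(const struct clock_ns *ns, ktime_ns now)
{
	return now - ns->offset;
}
```

In this model a timer armed for namespace time 1000, frozen at kernel time 400 and thawed at 10400, ends up with a kernel expiry of 11000 while the namespace still sees an expiry of 1000 and a current time of 400.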

Thanks,

tglx

Brian Swetland

Jun 4, 2010, 6:20:02 AM

On Fri, Jun 4, 2010 at 3:08 AM, Peter Zijlstra <pet...@infradead.org> wrote:
> On Fri, 2010-06-04 at 12:03 +0200, Ingo Molnar wrote:
>
>> > The only 'interesting' issue I can see here is that if you create 1000
>> > CLOCK_MONOTONIC namespaces, we'd need to have a tree of trees in order to
>> > efficiently find the leftmost timer.
>>
>> Realistically Android userspace would create just a single such namespace for
>> all the untrusted/unknown/uncontrolled apps, right?
>
> Possibly, yeah.

Definitely, at least initially. If we had the ability to do more fine
grained control, we might be tempted to experiment with it, but to
start with we'd feel safer using the big hammer as we are today.

Brian

Brian Swetland

Jun 4, 2010, 6:20:02 AM

On Fri, Jun 4, 2010 at 2:59 AM, Ingo Molnar <mi...@elte.hu> wrote:
>
> You can certainly put in a suspend_blockers.h thing into some Android
> directory, and populate it with empty wrappers - as long as you only use it
> within Android drivers and not core kernel code or other subsystems you dont
> maintain.
>
> It's being done all the time and helpful cleanup patches eliminating the stubs
> are frowned upon (unless the stubs are there like for years with no progress
> and no maintenance in sight).
>
> Putting empty stubs into include/linux/ would be pushing things i think.
>
> In fact sometimes architectures even jump the gun with major kernel features:
> we had a dynticks implementation in ARM for years, we had RTLinux stubs in x86
> code for quite some time, and we still have perfmon in IA64 - despite the core
> kernel having gone for a different design.
>
> It's certainly not ideal, but it's certainly a solution that is used every now
> and then. The less difference there is between trees the easier it becomes to
> merge - for both sides, both technically and socially.

Totally -- our goal would be that as drivers find their way from our
tree to mainline we'd keep them 1:1 between the trees. If we can put a
local suspend_blockers.h somewhere while the long term solution gets
hashed out that'd remove the biggest pain point on a driver level. I'm
not quite sure where the best place to drop such a thing would be --
we'd likely be including it from mach-msm, mach-tegra2, and drivers
for both those architectures in the normal driver places for the tree.
I guess we could just drop it in
arch/arm/mach-{msm,tegra2}/include/mach/ and both the subarch code and
subarch-specific-drivers we've been writing could pick it up via
#include <mach/suspend_blockers.h>

>> Yeah, I do understand that we're not making it easy for ourselves here. I
>> think we hit the point where Rafael and Matthew signed off on things and
>> thought "aha, linux-pm maintainers are happy, now we're getting somewhere"
>> only to realize the light at the end of the tunnel was a bit further out
>> than we anticipated ^^
>
> That's a well-known problem on lkml: the light at the end of the tunnel was
> the other train ;-)
>
> Anyway, i'm not pessimistic at all: _some_ sort of scheme appears to be
> crystalising out today. Everyone seems to agree now that the main usecases are
> indeed useful and need handling one way or another - the rest is really just
> technological discussions how to achieve the mostly-agreed-upon end goal.
>
> The worst situation are features where one side says 'we dont need this kind
> of functionality at all' - IMO auto/opportunistic-suspend isnt in that
> situation, fortunately.

It is encouraging that there's at least some general consensus that
the feature is useful, and as Arve and I have both mentioned, we're
really not religious about names, etc, provided we can solve the
problem we're trying to solve, so if it ends up being qos constraints
or something else entirely but still gets us where we're trying to go,
it's good news.

I think one point of contention remaining may be "just blocking
suspend" vs "halting specific untrusted processes". The latter is
difficult for us to work with because of the overall complexity of
(our) userspace environment. A big hammer where we stop it all and
suspend ends up being less deadlock/inversion-prone. Of course if the
general solution ends up being able to do either, then perhaps
everyone's happy.

Brian

Peter Zijlstra

Jun 4, 2010, 6:20:02 AM

Ah indeed, much nicer.

Andi Kleen

Jun 4, 2010, 6:50:02 AM

Linus Torvalds <torv...@linux-foundation.org> writes:

> On Thu, 3 Jun 2010, Arjan van de Ven wrote:
>>
>> And because there's then no power saving (but a performance cost), it's
>> actually a negative for battery life/total energy.
>
> Including the UP optimizations we do (ie lock prefix removal)? It's

Those only help the kernel and most workloads do not do enough kernel
execution for it to really matter, but spend most of their
time in user space.

Even if as kernel programmers we often have a different view, in most
cases most cycles are in user space :)

-Andi
--
a...@linux.intel.com -- Speaking for myself only.

Peter Zijlstra

Jun 4, 2010, 8:10:02 AM

Right, so in the proposed scheme all these tasks would be executed by
trusted processes, and trusted processes will never get frozen and so
will never be delayed in processing these events.

Only untrusted code will be frozen. And trusted processes are responsible
for thawing the untrusted processes and delivering events to them.

Trusted processes are assumed to be sane and idle when there is nothing
for them to do, allowing the machine to go into deep idle states.

James Bottomley

Jun 4, 2010, 10:30:02 AM

On Fri, 2010-06-04 at 11:59 +0200, Ingo Molnar wrote:
> * Brian Swetland <swet...@google.com> wrote:
> > On Fri, Jun 4, 2010 at 1:55 AM, Ingo Molnar <mi...@elte.hu> wrote:
> > > * Brian Swetland <swet...@google.com> wrote:

[...]

> > > In any case, this is not to suggest that the suspend-blocker bits are
> > > 'impossible' to merge. I just say that if you start with your most
> > > difficult feature you should not be surprised to be on the receiving end
> > > of a 1000+ mails flamewar on lkml ;-)
> >
> > Yeah, I do understand that we're not making it easy for ourselves here. I
> > think we hit the point where Rafael and Matthew signed off on things and
> > thought "aha, linux-pm maintainers are happy, now we're getting somewhere"
> > only to realize the light at the end of the tunnel was a bit further out
> > than we anticipated ^^
>
> That's a well-known problem on lkml: the light at the end of the tunnel was
> the other train ;-)
>
> Anyway, i'm not pessimistic at all: _some_ sort of scheme appears to be
> crystalising out today. Everyone seems to agree now that the main usecases are
> indeed useful and need handling one way or another - the rest is really just
> technological discussions how to achieve the mostly-agreed-upon end goal.

It's still not clear to me whether everyone's revolving around to using
the current suspend block API because it's orthogonal to all other
mechanisms and is therefore separate from the kernel (and can be
compiled out if you don't want it). Or whether re-expressing what the
android drivers want (minimum idle states and suspend block) in pm_qos
terms which others can use is the way to go. I think the latter, but
I'd like to know what other people think (because I'm not wedded to this
preference).

> The worst situation are features where one side says 'we dont need this kind
> of functionality at all' - IMO auto/opportunistic-suspend isnt in that
> situation, fortunately.

Great ... because deprecating the problem has been one of the persistent
memes by some people on this huge thread.

James

Alan Stern

Jun 4, 2010, 11:00:01 AM

On Fri, 4 Jun 2010, Ingo Molnar wrote:

> Note that this does not necessarily have to be implemented as 'execute suspend
> from the idle task' code: scheduling from the idle task, while it can certainly
> be made to work, is a somewhat recursive concept that we might want to avoid
> for robustness reasons.
>
> Instead, the 'deepest idle' (suspend) method could consist of a wakeup of a
> kernel thread (or of any of the existing kernel threads such as the migration
> thread) - which kernel thread then does a race-free suspend: it offlines all
> but one CPU [on platforms that need that] and then initiates the suspend - but
> aborts the attempt if there's any sign of wakeup activity.

Out of morbid curiosity... A typical sign of wakeup activity is a
thread becoming runnable because of expiration of a kernel timer or an
I/O completion interrupt. How would the "race-free suspend" thread
detect this sort of thing? Indeed, isn't the inability to detect these
part of what makes the existing suspend implementation (the freezer in
particular) not race-free?

Alan Stern

Florian Mickler

Jun 4, 2010, 11:10:02 AM

On Fri, 04 Jun 2010 09:24:06 -0500
James Bottomley <James.B...@suse.de> wrote:

> On Fri, 2010-06-04 at 11:59 +0200, Ingo Molnar wrote:
> > Anyway, i'm not pessimistic at all: _some_ sort of scheme appears to be
> > crystalising out today. Everyone seems to agree now that the main usecases are
> > indeed useful and need handling one way or another - the rest is really just
> > technological discussions how to achieve the mostly-agreed-upon end goal.
>
> It's still not clear to me whether everyone's revolving around to using
> the current suspend block API because it's orthogonal to all other
> mechanisms and is therefore separate from the kernel (and can be
> compiled out if you don't want it). Or whether re-expressing what the
> android drivers want (minimum idle states and suspend block) in pm_qos
> terms which others can use is the way to go. I think the latter, but
> I'd like to know what other people think (because I'm not wedded to this
> preference).

I'd like to know that also.
I have a patch to add a pm_qos_add_request_nonblock() function, so it is
possible to register a pm_qos constraint by passing preallocated
memory to it.

Notifying should be possible to do from atomic contexts via
async_schedule()?

The scalability issues of pm_qos can be addressed by using plists for
all pm_qos_class'es. Or by having the different pm_qos_class'es provide
their own implementations for the update and get operations.
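To illustrate the two ideas above (caller-provided memory, and a sorted list so the effective constraint is cheap to read), here is a rough user-space sketch. It is not the patch mentioned above and does not use the kernel's plist API; a plain sorted singly linked list stands in for a plist, and all `qos_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One constraint; the caller owns the memory, so adding never allocates. */
struct qos_request {
	int32_t value;			/* e.g. tolerated latency in usec */
	struct qos_request *next;
};

struct qos_class {
	struct qos_request *head;	/* kept sorted, smallest value first */
	int32_t default_value;		/* used when no request is registered */
};

/* Insert a preallocated request at its sorted position. */
static void qos_add_request(struct qos_class *c, struct qos_request *req,
			    int32_t value)
{
	struct qos_request **pp = &c->head;

	assert(req != NULL);
	req->value = value;
	while (*pp && (*pp)->value <= value)
		pp = &(*pp)->next;
	req->next = *pp;
	*pp = req;
}

/* Unlink a request; a no-op if it is not on the list. */
static void qos_remove_request(struct qos_class *c, struct qos_request *req)
{
	struct qos_request **pp = &c->head;

	while (*pp && *pp != req)
		pp = &(*pp)->next;
	if (*pp)
		*pp = req->next;
}

/* The strictest (smallest) constraint is always at the head. */
static int32_t qos_target(const struct qos_class *c)
{
	return c->head ? c->head->value : c->default_value;
}
```

Because the list stays sorted, reading the current target is a single pointer dereference; the cost moves to insertion, which is the trade-off a sorted structure makes here.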

Cheers,
Flo

Rafael J. Wysocki

Jun 4, 2010, 7:40:01 PM

I kind of agree here, so I'd like to focus a bit on that.

Here's my idea in the very general terms:

(1) Use the cgroup freezer to "suspend" the "untrusted" apps (ie. the ones
that don't use suspend blockers aka wakelocks in the Android world) at the
point Android would normally start opportunistic suspend.

(2) Allow the cpuidle framework to put CPUs into low-power states after the
"trusted" apps (ie. the ones that use suspend blockers in the Android
world) have gone idle.

(3) Teach the cpuidle framework to schedule runtime suspend of I/O devices
before idling the last CPU (*).

(4) Design a mechanism to resume the I/O devices suspended in (3) so that
they are not powered up unnecessarily (that's going to be difficult as far
as I can see).

This way, in principle, we should be able to save (at least almost) as much
energy as the opportunistic suspend currently used by Android, provided that
things will be capable of staying idle for extended periods of time.

(*) That may require per-device PM QoS requirements to be used, in which case
devices may even be suspended earlier if the PM QoS requirements of all
of their users are met.
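Step (1) could be driven from a trusted user-space process along these lines. This is a sketch assuming the cgroup (v1) freezer interface, where a group is frozen or thawed by writing `FROZEN` or `THAWED` to its `freezer.state` file; the helper name `freezer_set_state()` and the directory passed to it are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write "FROZEN" or "THAWED" into <cgroup_dir>/freezer.state.
 * Returns 0 on success, -1 on any error.
 */
static int freezer_set_state(const char *cgroup_dir, int frozen)
{
	char path[512];
	FILE *f;
	int n, rc;

	assert(cgroup_dir != NULL);
	n = snprintf(path, sizeof(path), "%s/freezer.state", cgroup_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;		/* path would have been truncated */

	f = fopen(path, "w");
	if (!f)
		return -1;
	rc = fputs(frozen ? "FROZEN" : "THAWED", f) < 0 ? -1 : 0;
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}
```

A caller would invoke something like `freezer_set_state("/path/to/untrusted-group", 1)` at the point where opportunistic suspend would otherwise start, and thaw the group again when an event needs to be delivered. Freezing is not instantaneous, so a real caller would likely read the state back before relying on it.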

I wonder what people think. Is this realistic and if so, would it be difficult
to implement?

Rafael

Thomas Gleixner

Jun 4, 2010, 8:10:01 PM

On Sat, 5 Jun 2010, Rafael J. Wysocki wrote:
> I kind of agree here, so I'd like to focus a bit on that.
>
> Here's my idea in the very general terms:
>
> (1) Use the cgroup freezer to "suspend" the "untrusted" apps (ie. the ones
> that don't use suspend blockers aka wakelocks in the Android world) at the
> point Android would normally start opportunistic suspend.

There is an additional benefit to this approach:

In the current android world a background task (e.g. download
initiated before the screensaver kicked in) prevents the suspend,
but that also means that the crapplications can still suck power
completely unconfined.

With the cgroup freezer you can "suspend" them right away and
just keep the trusted background task(s) alive which allows us to
go into deeper idle states instead of letting the crapplications
run unconfined until the download finished and the suspend
blocker goes away.

> (2) Allow the cpuidle framework to put CPUs into low-power states after the
> "trusted" apps (ie. the ones that use suspend blockers in the Android
> world) have gone idle.
>
> (3) Teach the cpuidle framework to schedule runtime suspend of I/O devices
> before idling the last CPU (*).
>
> (4) Design a mechanism to resume the I/O devices suspended in (3) so that
> they are not powered up unnecessarily (that's going to be difficult as far
> as I can see).
>
> This way, in principle, we should be able to save (at least almost) as much
> energy as the opportunistic suspend currently used by Android, provided that
> things will be capable of staying idle for extended periods of time.
>
> (*) That may require per-device PM QoS requirements to be used, in which case
> devices may even be suspended earlier if the PM QoS requirements of all
> of their users are met.
>
> I wonder what people think. Is this realistic and if so, would it be difficult
> to implement?

I think it's realistic and not overly complicated to implement.

Thanks,

tglx

Arve Hjønnevåg

Jun 4, 2010, 8:20:02 PM

2010/6/4 Peter Zijlstra <pet...@infradead.org>:

There are many proposed schemes. I assume you mean freezing only
untrusted processes and nothing else.

> Only untrusted code will be frozen. And trusted processes are responsible
> for thawing the untrusted processes and delivering events to them.
>

I have two problems with this. I don't want to funnel all events
through trusted processes, and I also want to freeze trusted processes.

> Trusted processes are assumed to be sane and idle when there is nothing
> for them to do, allowing the machine to go into deep idle states.
>

Neither the kernel nor our trusted user-space code currently meets
these criteria.

--
Arve Hjønnevåg

Arve Hjønnevåg

Jun 4, 2010, 8:40:02 PM

On Fri, Jun 4, 2010 at 5:05 PM, Thomas Gleixner <tg...@linutronix.de> wrote:
> On Sat, 5 Jun 2010, Rafael J. Wysocki wrote:
>> I kind of agree here, so I'd like to focus a bit on that.
>>
>> Here's my idea in the very general terms:
>>
>> (1) Use the cgroup freezer to "suspend" the "untrusted" apps (ie. the ones
>> that don't use suspend blockers aka wakelocks in the Android world) at the
>> point Android would normally start opportunistic suspend.
>
> There is an additional benefit to this approach:
>
> In the current android world a background task (e.g. download
> initiated before the screensaver kicked in) prevents the suspend,
> but that also means that the crapplications can still suck power
> completely unconfined.
>

Yes this can happen. It is usually only a big problem when you combine
a (trusted) application that has a bug that blocks suspend forever
with an application that wakes up too often for us to enter low power
idle modes.

> With the cgroup freezer you can "suspend" them right away and
> just keep the trusted background task(s) alive which allows us to
> go into deeper idle states instead of letting the crapplications
> run unconfined until the download finished and the suspend
> blocker goes away.
>

Yes this would be better, but I want it in addition to suspend, not
instead of it. It is also unclear if our user-space code could easily
make use of it since our trusted code calls into untrusted code.

>> (2) Allow the cpuidle framework to put CPUs into low-power states after the
>> "trusted" apps (ie. the ones that use suspend blockers in the Android
>> world) have gone idle.
>>

As far as I know this is what we already have on hardware that supports it.

>> (3) Teach the cpuidle framework to schedule runtime suspend of I/O devices
>> before idling the last CPU (*).
>>

I don't think we need this for android phones. We already put I/O
devices in low power modes when they are not in use.

>> (4) Design a mechanism to resume the I/O devices suspended in (3) so that
>> they are not powered up unnecessarily (that's going to be difficult as far
>> as I can see).
>>
>> This way, in principle, we should be able to save (at least almost) as much
>> energy as the opportunistic suspend currently used by Android, provided that
>> things will be capable of staying idle for extended periods of time.

The main reason we use suspend is that the system does not stay idle
for extended periods of time. If this gets fixed, and if our user-space
framework can deal with a subset of processes being frozen (this is a
big if) this solution may work, but it does not help us today.

>>
>> (*) That may require per-device PM QoS requirements to be used, in which case
>> devices may even be suspended earlier if the PM QoS requirements of all
>> of their users are met.
>>
>> I wonder what people think. Is this realistic and if so, would it be difficult
>> to implement?
>
> I think it's realistic and not overly complicated to implement.
>

The kernel support can be easily implemented on most arm hardware, I
don't know if it can work on most existing x86 hardware. It does not
give us the same power savings as suspend with existing software, but
it can handle bad apps better (assuming you don't combine
opportunistic suspend and cgroup freezing). The biggest hurdle is how
to handle dependencies between processes that get frozen and
processes that don't get frozen.

--
Arve Hjønnevåg

Matt Helsley

Jun 4, 2010, 9:20:01 PM

On Fri, Jun 04, 2010 at 05:39:17PM -0700, Arve Hjønnevåg wrote:
> On Fri, Jun 4, 2010 at 5:05 PM, Thomas Gleixner <tg...@linutronix.de> wrote:
> > On Sat, 5 Jun 2010, Rafael J. Wysocki wrote:

<snip>

>
> > With the cgroup freezer you can "suspend" them right away and
> > just keep the trusted background task(s) alive which allows us to
> > go into deeper idle states instead of letting the crapplications
> > run unconfined until the download finished and the suspend
> > blocker goes away.
> >
>
> Yes this would be better, but I want it in addition to suspend, not
> instead of it. It is also unclear if our user-space code could easily
> make use of it since our trusted code calls into untrusted code.
>

Perhaps I'm misunderstanding, but suspend and the cgroup freezer
interoperate well today -- you don't have to choose one or the other.
If you've discovered otherwise I'd consider it a bug and would like to
hear more about it.

<snip>

> it can handle bad apps better (assuming you don't combine
> opportunistic suspend and cgroup freezing).

I don't see why that would be a problem. The cgroup freezer works
independently of the suspend freezer -- even with suspend blockers.
So my hunch is this is really the same as the next problem you refer to:

> The biggest hurdle is how
> to handle dependencies between processes that get frozen and
> processes that don't get frozen.

I'm not sure it covers everything you want, but it should be possible to
identify some of those so long as you know which process you're
communicating with.

A trusted app can look up the freezer cgroup of a target app in /proc, then
look at the cgroup's freezer.state file. If it's FREEZING or FROZEN then
you've very likely got a "bad" dependency.

For example, say a trusted app plans on doing a blocking read() to fetch
the output of an untrusted app via a pipe. Assuming we know the untrusted
app's pid we could then check the dependency and determine that we're likely
to block because the untrusted app's freezer cgroup is FREEZING or FROZEN.
(certain to block if we see FROZEN)
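A rough sketch of that check, assuming the `ID:controllers:path` line format of `/proc/<pid>/cgroup` and the `THAWED`/`FREEZING`/`FROZEN` values of `freezer.state`. The function names are hypothetical, and the freezer hierarchy's mount point is passed in by the caller because it varies between systems.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of /proc/<pid>/cgroup ("ID:controllers:path"). If the
 * controller list contains "freezer", copy the path to out and return 1.
 */
static int freezer_path_from_line(const char *line, char *out, size_t outlen)
{
	const char *ctrl = strchr(line, ':');
	const char *path, *p;
	size_t n;

	assert(outlen > 0);
	if (!ctrl)
		return 0;
	ctrl++;
	path = strchr(ctrl, ':');
	if (!path)
		return 0;

	/* Walk the comma-separated controller names. */
	for (p = ctrl; p < path; ) {
		const char *end = memchr(p, ',', (size_t)(path - p));
		size_t len = end ? (size_t)(end - p) : (size_t)(path - p);

		if (len == 7 && memcmp(p, "freezer", 7) == 0)
			goto found;
		p += len + 1;
	}
	return 0;
found:
	path++;
	n = strcspn(path, "\n");
	if (n + 1 > outlen)
		return 0;
	memcpy(out, path, n);
	out[n] = '\0';
	return 1;
}

/* FREEZING means likely to block, FROZEN means certain to block. */
static int freezer_state_blocks(const char *state)
{
	return strncmp(state, "FREEZING", 8) == 0 ||
	       strncmp(state, "FROZEN", 6) == 0;
}

/* Returns 1 if the peer looks frozen, 0 if not, -1 if it cannot tell. */
static int peer_is_frozen(int pid, const char *freezer_mount)
{
	char buf[512], cg[256], state[32];
	FILE *f;
	int found = 0;

	snprintf(buf, sizeof(buf), "/proc/%d/cgroup", pid);
	f = fopen(buf, "r");
	if (!f)
		return -1;
	while (!found && fgets(buf, sizeof(buf), f))
		found = freezer_path_from_line(buf, cg, sizeof(cg));
	fclose(f);
	if (!found)
		return -1;

	snprintf(buf, sizeof(buf), "%s%s/freezer.state", freezer_mount, cg);
	f = fopen(buf, "r");
	if (!f)
		return -1;
	if (!fgets(state, sizeof(state), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return freezer_state_blocks(state);
}
```

The check is inherently racy: the peer can be frozen right after the state is read, which fits its use as a debugging aid rather than a synchronization mechanism.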

That said, it involves quite a few system calls compared to a simple read()
from the pipe. So my guess is it would be a debugging tool at best -- not
something you always have enabled.

It may even be possible to make an lsof-like debugging tool to do that from
outside both apps.

Cheers,
-Matt Helsley

Thomas Gleixner

Jun 4, 2010, 9:40:01 PM

Arve,

On Fri, 4 Jun 2010, Arve Hjønnevåg wrote:

> On Fri, Jun 4, 2010 at 5:05 PM, Thomas Gleixner <tg...@linutronix.de> wrote:
> > On Sat, 5 Jun 2010, Rafael J. Wysocki wrote:
> >> I kind of agree here, so I'd like to focus a bit on that.
> >>
> >> Here's my idea in the very general terms:
> >>
> >> (1) Use the cgroup freezer to "suspend" the "untrusted" apps (ie. the ones
> >> that don't use suspend blockers aka wakelocks in the Android world) at the
> >> point Android would normally start opportunistic suspend.
> >
> > There is an additional benefit to this approach:
> >
> > In the current android world a background task (e.g. download
> > initiated before the screensaver kicked in) prevents the suspend,
> > but that also means that the crapplications can still suck power
> > completely unconfined.
> >
>
> Yes this can happen. It is usually only a big problem when you combine
> a (trusted) application that has a bug that blocks suspend forever
> with an application that wakes up too often for us to enter low power
> idle modes.

Why is it a BUG in the trusted app, when I initiate a download and put
the phone down ?

That download might take a minute or two, but that's not a
justification for the crapplication to run unconfined and prevent
lower power states.

> > With the cgroup freezer you can "suspend" them right away and
> > just keep the trusted background task(s) alive which allows us to
> > go into deeper idle states instead of letting the crapplications
> > run unconfined until the download finished and the suspend
> > blocker goes away.
> >
>
> Yes this would be better, but I want it in addition to suspend, not
> instead of it. It is also unclear if our user-space code could easily
> make use of it since our trusted code calls into untrusted code.

Sorry, that's really the worst argument I saw in this whole
discussion.

You're basically saying, that you have no idea what your user space
stack is doing and you do not care at all as long as your suspend
blocker scheme makes things work somehow.

Up to that point, I really tried hard to step back from my initial
"OMG, promoting crap is a nono" reaction and work with you on a
sensible technical solution to confine crap and make it aligned with
other efforts in this area.

So now, after I spent a reasonable amount of time (as you did) to
understand what your requirements are, you come up with another
restriction which is so outside of any level of sanity, that I'm at
the point of giving up and just going into NAK mode.

Can you please answer the following question:

What is the point of having the distinction of "trusted" and
"untrusted" when you have no way to prevent "trusted" code calling
into "untrusted" code ?

That's violating any sense of abstraction and layering and makes it
entirely clear that the only way you can deal with your own design
failure is a big hammer which you need to force into the kernel.

Sorry, no. I'm perfectly willing to make progress on that, as long as
we walk on a sane ground. But abusing the kernel for fixing basic
engineering problems in the user space side of affairs is completely
out of the question.

> >> I wonder what people think. Is this realistic and if so, would it be difficult
> >> to implement?
> >
> > I think it's realistic and not overly complicated to implement.
> >
>
> The kernel support can be easily implemented on most arm hardware, I
> don't know if it can work on most existing x86 hardware. It does not

It does not matter. Even Intel folks told you more than once, that x86
hardware is going to be fixed pretty soon. Hint: that's crucial to
their business ....

> give us the same power savings as suspend with existing software, but
> it can handle bad apps better (assuming you don't combine
> opportunistic suspend and cgroup freezing). The biggest hurdle is how
> to handle dependencies between processes that get frozen and
> processes that don't get frozen.

See above.

Thanks,

tglx

Arve Hjønnevåg

Jun 5, 2010, 1:30:02 AM

2010/6/4 Thomas Gleixner <tg...@linutronix.de>:
> Arve,
>
> On Fri, 4 Jun 2010, Arve Hjønnevåg wrote:
>
>> On Fri, Jun 4, 2010 at 5:05 PM, Thomas Gleixner <tg...@linutronix.de> wrote:
>> > On Sat, 5 Jun 2010, Rafael J. Wysocki wrote:
>> >> I kind of agree here, so I'd like to focus a bit on that.
>> >>
>> >> Here's my idea in the very general terms:
>> >>
>> >> (1) Use the cgroup freezer to "suspend" the "untrusted" apps (ie. the ones

>> >> that don't use suspend blockers aka wakelocks in the Android world) at the

>> >> point Android would normally start opportunistic suspend.
>> >
>> > There is an additional benefit to this approach:
>> >
>> > In the current android world a background task (e.g. download
>> > initiated before the screensaver kicked in) prevents the suspend,
>> > but that also means that the crapplications can still suck power
>> > completely unconfined.
>> >
>>
>> Yes this can happen. It is usually only a big problem when you combine
>> an (trusted) application that has a bug that blocks suspend forever
>> with an application that wakes up too often for us to enter low power
>> idle modes.
>
> Why is it a BUG in the trusted app, when I initiate a download and put
> the phone down ?
>

It is not, but we have had bugs where a trusted app does not unblock
suspend after some failure case where it is no longer making any
progress.

> That download might take a minute or two, but that's not an
> justification for the crapplication to run unconfined and prevent
> lower power states.
>

I agree, but this is not a simple problem to solve.

>> > With the cgroup freezer you can "suspend" them right away and
>> > just keep the trusted background task(s) alive which allows us to
>> > go into deeper idle states instead of letting the crapplications
>> > run unconfined until the download finished and the suspend
>> > blocker goes away.
>> >
>>
>> Yes this would be better, but I want it in addition to suspend, not
>> instead of it. It is also unclear if our user-space code could easily
>> make use of it since our trusted code calls into untrusted code.
>
> Sorry, that's really the worst argument I saw in this whole
> discussion.
>
> You're basically saying, that you have no idea what your user space
> stack is doing and you do not care at all as long as your suspend
> blocker scheme makes things work somehow.
>

Yes I don't know everything our user-space stack is doing, but I do
know that it makes many calls between processes (and in both
directions). As far as I know it uses timeouts when calling into
untrusted code, so a misbehaving application will cause an error
dialog to pop up asking the user if it should wait longer or
terminate the application.

> Up to that point, I really tried hard to step back from my initial
> "OMG, promoting crap is a nono" reaction and work with you on a
> sensible technical solution to confine crap and make it aligned with
> other efforts in this area.
>
> So now, after I spent a reasonable amount of time (as you did) to
> understand what your requirements are, you come up with another
> restriction which is so outside of any level of sanity, that I'm at
> the point of giving up and just going into NAK mode.
>

I don't think this is a new restriction. Both Brian and I have
mentioned that we have a lot of dependencies between processes.

> Can you please answer the following question:
>
> What is the point of having the distinction of "trusted" and
> "untrusted" when you have no way to prevent "trusted" code calling
> "into "untrusted" code ?
>

Trusted code that calls into untrusted code has to deal with the
untrusted code not responding, but we only want to pop up a message
that the application is not responding if it is misbehaving, not just
because it was frozen through no fault of its own.

> That's violating any sense of abstraction and layering and makes it
> entirely clear that the only way you can deal with your own design
> failure is a big hammer which you need to force into the kernel.
>

How can it be fixed? The user presses the back button, the framework
determines that app A is in the foreground and sends the key to app A,
app A decides that it does not have anything internal to go back to
and tells the framework to switch back to the previous app. If the
user presses the back key again, the framework does not know which app
this key should go to until app A has finished processing the first
key press.

> Sorry, no. I'm perfectly willing to make progress on that, as long as
> we walk on a sane ground. But abusing the kernel for fixing basic
> engineering problems in the user space side of affairs is completely
> out of discussion.
>

>> >> I wonder what people think. Is this realistic and if so, would it be difficult
>> >> to implement?
>> >
>> > I think it's realistic and not overly complicated to implement.
>> >
>>
>> The kernel support can be easily implemented on most arm hardware, I
>> don't know if it can work on most existing x86 hardware. It does not
>
> It does not matter. Even Intel folks told you more than once, that x86

How does it not matter? Are we dropping support for existing x86 hardware
once the new hardware comes out?

> hardware is going to be fixed pretty soon. Hint: that's crucial to
> their business ....
>
>> give us the same power savings as suspend with existing software, but
>> it can handle bad apps better (assuming you don't combine
>> opportunistic suspend and cgroup freezing). The biggest hurdle is how
>> to handle dependencies between processes that gets frozen and
>> processes that don't get frozen.
>
> See above.
>
> Thanks,
>
> tglx

--

Arve Hjønnevåg

Jun 5, 2010, 1:40:02 AM

2010/6/4 Matt Helsley <matt...@us.ibm.com>:

> On Fri, Jun 04, 2010 at 05:39:17PM -0700, Arve Hjønnevåg wrote:
>> On Fri, Jun 4, 2010 at 5:05 PM, Thomas Gleixner <tg...@linutronix.de> wrote:
>> > On Sat, 5 Jun 2010, Rafael J. Wysocki wrote:
>
> <snip>
>
>>
>> > With the cgroup freezer you can "suspend" them right away and
>> > just keep the trusted background task(s) alive which allows us to
>> > go into deeper idle states instead of letting the crapplications
>> > run unconfined until the download finished and the suspend
>> > blocker goes away.
>> >
>>
>> Yes this would be better, but I want it in addition to suspend, not
>> instead of it. It is also unclear if our user-space code could easily
>> make use of it since our trusted code calls into untrusted code.
>>
>
> Perhaps I'm misunderstanding, but suspend and the cgroup freezer
> interoperate well today -- you don't have to choose one or the other.
> If you've discovered otherwise I'd consider it a bug and would like to
> hear more about it.
>

I'm not aware of any bug with combining both, but we cannot use
suspend at all without suspend blockers in the kernel (since wakeup
events may be ignored) and I don't know how we can safely freeze
cgroups without funneling all potential wakeup events through a
process that never gets frozen.

> <snip>
>
>> it can handle bad apps better (assuming you don't combine
>> opportunistic suspend and cgroup freezing).
>
> I don't see why that would be a problem. The cgroup freezer works
> independently of the suspend freezer -- even with suspend blockers.
> So my hunch is this is really the same as the next problem you refer to:
>
>> The biggest hurdle is how
>> to handle dependencies between processes that gets frozen and
>> processes that don't get frozen.
>
> I'm not sure it covers everything you want, but it should be possible to
> identify some of those so long as you know which process you're
> communicating with.
>
> A trusted app can look up the freezer cgroup of a target app in /proc, then
> look at the cgroup's freezer.state file. If it's FREEZING or FROZEN then
> you've very likely got a "bad" dependency.
>

I don't think they are "bad" dependencies. Our framework has to
communicate with apps.

> For example, say a trusted app plans on doing a blocking read() to fetch
> the output of an untrusted app via a pipe. Assuming we know the untrusted
> app's pid we could then check the dependency and determine that we're likely
> to block because the untrusted app's freezer cgroup is FREEZING or FROZEN.
> (certain to block if we see FROZEN)
>
> That said, it involves quite a few system calls compared to a simple read()
> from the pipe. So my guess is it would be a debugging tool at best -- not
> something you always have enabled.
>
> It may even be possible to make an lsof-like debugging tool to do that from
> outside both apps.
>
> Cheers,
>         -Matt Helsley
>

--
Arve Hjønnevåg
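
Matt's suggestion above (look up the target's freezer cgroup in /proc, then read that cgroup's freezer.state) can be sketched roughly as follows. This is a minimal illustration and not code from the thread: the mount point /sys/fs/cgroup/freezer and all helper names are assumptions, and both roots are parameters so the logic can be tried against a fake directory tree.

```python
import os


def freezer_cgroup_of(pid, proc_root="/proc"):
    """Return the freezer cgroup path of `pid`, or None if it has none."""
    with open(os.path.join(proc_root, str(pid), "cgroup")) as f:
        for line in f:
            # Each line has the form "hierarchy-id:controllers:path".
            _, controllers, path = line.rstrip("\n").split(":", 2)
            if "freezer" in controllers.split(","):
                return path
    return None


def freezer_state(pid, proc_root="/proc",
                  freezer_root="/sys/fs/cgroup/freezer"):  # assumed mount point
    """Return the freezer.state string for the cgroup holding `pid`, or None."""
    path = freezer_cgroup_of(pid, proc_root)
    if path is None:
        return None
    state_file = os.path.join(freezer_root, path.lstrip("/"), "freezer.state")
    try:
        with open(state_file) as f:
            return f.read().strip()
    except FileNotFoundError:
        # No state file for this cgroup, so there is nothing to report.
        return None


def likely_to_block(pid, **roots):
    """True if a blocking call into `pid` is likely to stall (certain if FROZEN)."""
    return freezer_state(pid, **roots) in ("FREEZING", "FROZEN")
```

As Matt notes, this costs several system calls per check, so it fits a debugging tool better than a hot path.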

Peter Zijlstra

Jun 5, 2010, 6:00:02 AM

On Fri, 2010-06-04 at 17:10 -0700, Arve Hjønnevåg wrote:
> > Trusted processes are assumed to be sane and idle when there is nothing
> > for them to do, allowing the machine to go into deep idle states.
> >
>
> Neither the kernel nor our trusted user-space code currently meets
> this criteria.

Then both need fixing. Really, that's the only sane approach.

Arjan van de Ven

Jun 5, 2010, 12:30:03 PM

On Sat, 05 Jun 2010 11:54:13 +0200
Peter Zijlstra <pet...@infradead.org> wrote:

> On Fri, 2010-06-04 at 17:10 -0700, Arve Hjønnevåg wrote:
> > > Trusted processes are assumed to be sane and idle when there is
> > > nothing for them to do, allowing the machine to go into deep idle
> > > states.
> > >
> >
> > Neither the kernel nor our trusted user-space code currently meets
> > this criteria.
>
> Then both need fixing. Really, that's the only sane approach.

fwiw... in MeeGo we're seeing quite good idle times (> 1 second)
without really bad hacks.

the kernel has a set of infrastructure already to help here (range
timers, with which you can wakeup-limit untrusted userspace crap),
timer slack for legacy background timers, etc etc.

getting to 10 seconds is not in the range of impossibilities to be
honest... and that's even without doing things like putting untrusted
junk (read: Appstore apps) into a cgroup and do wakeup limiting and cpu
time limiting on a cgroup level....

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

Thomas Gleixner

Jun 5, 2010, 1:00:02 PM

Well, that's simply an application bug which sucks battery with or
without suspend blockers. So it's unrelated to the freezing of
untrusted apps while a trusted app still works in the background
before allowing the machine to suspend.

> > That download might take a minute or two, but that's not an
> > justification for the crapplication to run unconfined and prevent
> > lower power states.
> >
>
> I agree, but this is not a simple problem to solve.

Not with suspend blockers, but with cgroup confinement of crap, it's
straightforward.

> >> > With the cgroup freezer you can "suspend" them right away and
> >> > just keep the trusted background task(s) alive which allows us to
> >> > go into deeper idle states instead of letting the crapplications
> >> > run unconfined until the download finished and the suspend
> >> > blocker goes away.
> >> >
> >>
> >> Yes this would be better, but I want it in addition to suspend, not
> >> instead of it. It is also unclear if our user-space code could easily
> >> make use of it since our trusted code calls into untrusted code.
> >
> > Sorry, that's really the worst argument I saw in this whole
> > discussion.
> >
> > You're basically saying, that you have no idea what your user space
> > stack is doing and you do not care at all as long as your suspend
> > blocker scheme makes things work somehow.
> >
>
> Yes I don't know everything our user-space stack is doing, but I do
> know that it makes many calls between processes (and in both
> directions). As far as I know it uses timeouts when calling into
> untrusted code, so a misbehaving application will cause an error
> dialog to pop up asking if the user if it should wait longer or
> terminate the application.

Sigh, the more I learn about the details of Android and its violation
of all sane engineering principles the more I understand why you
invented a huge nail to push through all layers in order to bring the
system into idle at all. And yes, you need a sledge hammer to drive
that big nail through everything, so you are using the right tool.

Seriously, the cross app call goes through your framework, which
already knows, that the untrusted part is frozen. So it can deal
nicely with it in any way you want including unfreezing.

> > Up to that point, I really tried hard to step back from my initial
> > "OMG, promoting crap is a nono" reaction and work with you on a
> > sensible technical solution to confine crap and make it aligned with
> > other efforts in this area.
> >
> > So now, after I spent a reasonable amount of time (as you did) to
> > understand what your requirements are, you come up with another
> > restriction which is so outside of any level of sanity, that I'm at
> > the point of giving up and just going into NAK mode.
> >
>
> I don't think this is a new restriction. Both Brian and I have
> mentioned that we have a lot of dependencies between processes.
>
> > Can you please answer the following question:
> >
> > What is the point of having the distinction of "trusted" and
> > "untrusted" when you have no way to prevent "trusted" code calling
> > "into "untrusted" code ?
> >
>
> Trusted code that calls into untrusted code has to deal with the
> untrusted code not responding, but we only want to pop up a message
> that the application is not responding if it is misbehaving, not just
> because it was frozen though no fault of its own.

See above.

> > That's violating any sense of abstraction and layering and makes it
> > entirely clear that the only way you can deal with your own design
> > failure is a big hammer which you need to force into the kernel.
> >
>
> How can it be fixed? The user presses the back button, the framework
> determines that app A is in the foreground and send the key to app A,
> app A decides that it it does not have anything internal to go back to
> and tells the framework to switch back to the previous app. If the
> user presses the back key again, the framework does not know which app
> this key should go to until app A has finished processing the first
> key press.

Errm, what has this to do with frozen apps? If your system is
handling input events then there are no frozen apps and even if they
are frozen your framework can unfreeze them _before_ talking to them.

So which unfixable problem are you describing with the above example ?

Thanks,

tglx

Rafael J. Wysocki

Jun 5, 2010, 2:30:01 PM

On Saturday 05 June 2010, Arve Hjønnevåg wrote:
> 2010/6/4 Matt Helsley <matt...@us.ibm.com>:

> > On Fri, Jun 04, 2010 at 05:39:17PM -0700, Arve Hjønnevåg wrote:
> >> On Fri, Jun 4, 2010 at 5:05 PM, Thomas Gleixner <tg...@linutronix.de> wrote:
> >> > On Sat, 5 Jun 2010, Rafael J. Wysocki wrote:
> >
> > <snip>
> >
> >>
> >> > With the cgroup freezer you can "suspend" them right away and
> >> > just keep the trusted background task(s) alive which allows us to
> >> > go into deeper idle states instead of letting the crapplications
> >> > run unconfined until the download finished and the suspend
> >> > blocker goes away.
> >> >
> >>
> >> Yes this would be better, but I want it in addition to suspend, not
> >> instead of it. It is also unclear if our user-space code could easily
> >> make use of it since our trusted code calls into untrusted code.
> >>
> >
> > Perhaps I'm misunderstanding, but suspend and the cgroup freezer
> > interoperate well today -- you don't have to choose one or the other.
> > If you've discovered otherwise I'd consider it a bug and would like to
> > hear more about it.
> >
>
> I'm not aware of any bug with combining both, but we cannot use
> suspend at all without suspend blockers in the kernel (since wakeup
> events may be ignored)

The more I think of it, the more it appears to me that the problem of
lost wakeup events can actually be solved without suspend blockers.
I'll send a bunch of patches to address this issue, probably tomorrow.

> and I don't know how we can safely freeze
> cgroups without funneling all potential wakeup events through a
> process that never gets frozen.

If your untrusted apps get called by the trusted ones, they aren't really
untrusted in the first place.

From what you're saying it follows that you're not really willing to accept
any solution different to your suspend blockers. Is that really the case?

Rafael

Florian Mickler

Jun 5, 2010, 4:40:01 PM

On Thu, 3 Jun 2010 19:16:55 -0700 (PDT)
Linus Torvalds <torv...@linux-foundation.org> wrote:

> The thing is, unless there is some _really_ deep other reason to do
> something like this, I still think it's total overdesign to push any
> knowledge/choices like this into the scheduler. I'd rather keep things way
> more independent, less tied to each other and to deep kernel subsystems.
>
> IOW, my personal opinion is that something like a suspend (blocker or not)
> decision simply shouldn't be important enough to be tied into the
> scheduler. Especially not if it could just be its own layer.
>
> That said, as far as I know, the Android people have mostly been looking
> at the suspend angle from a single-core standpoint. And I'm not at all
> convinced that they should hijack the existing "/sys/power/state" thing
> which is what I think they do now.
>
> And those two things go together. The /sys/power/state thing is a global
> suspend - which I don't think is appropriate for an opportunistic thing in
> the first place, especially for multi-core.
>

This sounds right.

If there is so much need for a better solution, it will emerge. With
merged suspend blockers or not.

Just my 2 cents.

> Linus

Cheers,
Flo

Arve Hjønnevåg

Jun 5, 2010, 5:30:02 PM

On Sat, Jun 5, 2010 at 9:28 AM, Arjan van de Ven <ar...@infradead.org> wrote:
> On Sat, 05 Jun 2010 11:54:13 +0200
> Peter Zijlstra <pet...@infradead.org> wrote:
>

>> On Fri, 2010-06-04 at 17:10 -0700, Arve Hjønnevåg wrote:
>> > > Trusted processes are assumed to be sane and idle when there is
>> > > nothing for them to do, allowing the machine to go into deep idle
>> > > states.
>> > >
>> >
>> > Neither the kernel nor our trusted user-space code currently meets
>> > this criteria.
>>
>> Then both need fixing. Really, that's the only sane approach.
>
> fwiw... in MeeGo we're seeing quite good idle times (> 1 seconds)
> without really bad hacks.
>

We clearly have different standards for what we consider good. We
measure time suspended in minutes or hours, not seconds, and waking up
every second or two causes a noticeable decrease in battery life on
the hardware we have today.

> the kernel has a set of infrastructure already to help here (range
> timers, with which you can wakeup-limit untrusted userspace crap),
> timer slack for legacy background timers, etc etc.

Range timers allow the kernel to align different timers so they don't
each bring the cpu out of idle individually. They do not eliminate
timers or make individual timers fire less often. For example if you
have 10 timers that fire every second on an idle system, without range
timers you will most likely have to bring the cpu out of idle 10 times
a second, but with range timers you have a chance of waking up only
once a second (I say a chance here, since if they are identical they
will just chase each other and never catch up).

>
> getting to 10 seconds is not in the range of impossibilities to be
> honest... and that's even without doing things like putting untrusted

That is still far short of what we get with suspend (in terms of time).

> junk (read: Appstore apps) into a cgroup and do wakeup limiting and cpu
> time limiting on a cgroup level....
>
>
> --
> Arjan van de Ven        Intel Open Source Technology Centre
> For development, discussion and tips for power savings,
> visit http://www.lesswatts.org
>

--
Arve Hjønnevåg
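
The alignment effect described above (10 timers per second costing either 10 wakeups or, with ranges, possibly only one) can be illustrated with a small model. This is an idealised sketch, not how the kernel's timer code works: each timer is reduced to an (earliest, latest) window in milliseconds, and `count_wakeups` is a hypothetical helper that computes the fewest wakeups able to serve every window. It does not model the "identical timers chase each other" caveat.

```python
def count_wakeups(windows):
    """Fewest CPU wakeups that serve all timers given as (earliest, latest)."""
    wakeups = 0
    last_wakeup = None
    # Greedy interval stabbing: handle windows in order of their deadline and
    # wake as late as possible, so one wakeup covers as many timers as it can.
    for earliest, latest in sorted(windows, key=lambda w: w[1]):
        if last_wakeup is None or earliest > last_wakeup:
            last_wakeup = latest
            wakeups += 1
    return wakeups


# Ten timers due at 50 ms, 150 ms, ..., 950 ms within one second.
deadlines = list(range(50, 1000, 100))

exact = [(t, t) for t in deadlines]           # no range: each fires on its own
ranged = [(t, t + 1000) for t in deadlines]   # each tolerates 1 s of delay

print(count_wakeups(exact), count_wakeups(ranged))
```

The model only shows that ranges let wakeups be shared. Whether a given delay is acceptable to the application is the separate question argued in the thread.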

Arve Hjønnevåg

Jun 5, 2010, 5:50:02 PM

2010/6/5 Thomas Gleixner <tg...@linutronix.de>:

It is not unrelated if the trusted app has stopped working but still
blocks suspend. The battery drains when you combine them.

>> > That download might take a minute or two, but that's not an
>> > justification for the crapplication to run unconfined and prevent
>> > lower power states.
>> >
>>
>> I agree, but this is not a simple problem to solve.
>
> Not with suspend blockers, but with cgroup confinement of crap, it's
> straight forward.
>

I don't think it is straightforward. If a process in the frozen
group holds a resource that a process in the unfrozen group needs, how
do you deal with that?

Cross app calls do not go through a central process.

You are claiming that trusted code should not have any dependencies on
untrusted code. I gave you a visible example of such a dependency and
want you to tell me how you can avoid this dependency. Since you are
claiming that our user-space framework is fundamentally broken if it
has to wait for untrusted code, I don't think it is unreasonable for
you to answer this. Or do you think it is valid to communicate with
untrusted code when the screen is on but not when it is off?

Arve Hjønnevåg

Jun 5, 2010, 6:20:02 PM

2010/6/5 Rafael J. Wysocki <r...@sisk.pl>:

> On Saturday 05 June 2010, Arve Hjønnevåg wrote:
>> 2010/6/4 Matt Helsley <matt...@us.ibm.com>:
>> > On Fri, Jun 04, 2010 at 05:39:17PM -0700, Arve Hjønnevåg wrote:
>> >> On Fri, Jun 4, 2010 at 5:05 PM, Thomas Gleixner <tg...@linutronix.de> wrote:
>> >> > On Sat, 5 Jun 2010, Rafael J. Wysocki wrote:
>> >
>> > <snip>
>> >
>> >>
>> >> > With the cgroup freezer you can "suspend" them right away and
>> >> > just keep the trusted background task(s) alive which allows us to
>> >> > go into deeper idle states instead of letting the crapplications
>> >> > run unconfined until the download finished and the suspend
>> >> > blocker goes away.
>> >> >
>> >>
>> >> Yes this would be better, but I want it in addition to suspend, not
>> >> instead of it. It is also unclear if our user-space code could easily
>> >> make use of it since our trusted code calls into untrusted code.
>> >>
>> >
>> > Perhaps I'm misunderstanding, but suspend and the cgroup freezer
>> > interoperate well today -- you don't have to choose one or the other.
>> > If you've discovered otherwise I'd consider it a bug and would like to
>> > hear more about it.
>> >
>>
>> I'm not aware of any bug with combining both, but we cannot use
>> suspend at all without suspend blockers in the kernel (since wakeup
>> events may be ignored)
>
> The more I think of it, the more it appears to me that the problem of
> lost wakeup events can actually be solved without suspend blockers.
> I'll send a bunch of patches to address this issue, probably tomorrow.
>

I know of two ways to prevent lost wakeup events. Reset a timeout
every time you receive a wakeup event or prevent suspend until you
know the event has been fully processed. Does your solution fall into
one of these two categories, or do you have a third way?

>> and I don't know how we can safely freeze
>> cgroups without funneling all potential wakeup events through a
>> process that never gets frozen.
>
> If your untrusted apps get called by the trusted ones, they aren't really
> untrusted in the first place.
>

That is not a correct statement. A trusted app can call into an
untrusted app, it just has to validate the response and handle not
getting a response at all. There are also different levels of trust. I
may have trusted an app to provide contact pictures, but not trusted
it to block suspend. When the phone rings the app will be called to
provide the picture for the incoming call dialog, but if it is frozen
at this point the more trusted app that handles the incoming phone
call will not be able to get the picture.

> From what you're saying it follows that you're not really willing to accept
> any solution different to your suspend blockers. Is that really the case?
>

I don't think that is a fair way to put it. We need to support our
user-space framework and I have not seen an alternative solution that
clearly will work (other than replacing suspend_blockers with pm_qos
constraints that do the same thing).

--
Arve Hjønnevåg
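
The two ways of avoiding lost wakeup events named above can be contrasted with a toy user-space model. This is only a sketch of the two policies, not a kernel interface and not the suspend blocker API: the class and method names are invented for illustration, and time is passed in explicitly as plain numbers.

```python
class TimeoutGate:
    """Policy 1: every wakeup event pushes the earliest suspend time back."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_event = None

    def event_arrived(self, now):
        self.last_event = now

    def may_suspend(self, now):
        # Suspend is allowed only after a quiet period of timeout_s.
        return self.last_event is None or now - self.last_event >= self.timeout_s


class PendingEventGate:
    """Policy 2: suspend is blocked until every event is fully processed."""

    def __init__(self):
        self.pending = 0

    def event_arrived(self):
        self.pending += 1

    def event_processed(self):
        if self.pending == 0:
            raise RuntimeError("no event is pending")
        self.pending -= 1

    def may_suspend(self):
        return self.pending == 0
```

The first policy needs no cooperation from whoever handles the event but has to guess how long handling takes. The second is exact but relies on every handler reporting completion, which is the failure mode discussed earlier in the thread, where a trusted app never unblocks suspend.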

Thomas Gleixner

Jun 5, 2010, 6:20:02 PM

On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
> 2010/6/5 Thomas Gleixner <tg...@linutronix.de>:
> > On Fri, 4 Jun 2010, Arve Hjønnevåg wrote:
> >> > Why is it a BUG in the trusted app, when I initiate a download and put
> >> > the phone down ?
> >> >
> >>
> >> It is not, but we have had bugs where a trusted app does not unblock
> >> suspend after some failure case where it is no longer making any
> >> progress.
> >
> > Well, that's simply an application bug which sucks battery with or
> > without suspend blockers. So it's unrelated to the freezing of
> > untrusted apps while a trusted app still works in the background
> > before allowing the machine to suspend.
> >
>
> It is not unrelated if the trusted app has stopped working but still
> blocks suspend. The battery drains when you combine them.

What you are describing is a problem which is not solvable either way.
If you take the lock and do not release it you're not going to
suspend. I never claimed that any other mechanism resolves this.

But this is not related to the fact that freezing crap while running a
sane background task is going to save you power vs. an approach where
running a sane background task allows crap to consume power unconfined
until it is done.

> >> > That download might take a minute or two, but that's not an
> >> > justification for the crapplication to run unconfined and prevent
> >> > lower power states.
> >> >
> >>
> >> I agree, but this is not a simple problem to solve.
> >
> > Not with suspend blockers, but with cgroup confinement of crap, it's
> > straight forward.
> >
>
> I don't think is is straight forward. If the a process in the frozen
> group holds a resource that a process in the unfrozen group needs, how
> do deal with that?

I'm going to fix the framework which puts the group into freeze state
w/o making sure that there is no held shared resource. Come on it's
not rocket science.

> >> Yes I don't know everything our user-space stack is doing, but I do
> >> know that it makes many calls between processes (and in both
> >> directions). As far as I know it uses timeouts when calling into
> >> untrusted code, so a misbehaving application will cause an error
> >> dialog to pop up asking if the user if it should wait longer or
> >> terminate the application.
> >
> > Sigh, the more I learn about the details of android and it's violation
> > of all sane engineering principles the more I understand why you
> > invented a huge nail to push through all layers in order to bring the
> > system into idle at all. And yes, you need a sledge hammer to drive
> > that big nail through everything, so you are using the right tool.
> >
> > Seriously, the cross app call goes through your framework, which
> > already knows, that the untrusted part is frozen. So it can deal
> > nicely with it in any way you want including unfreezing.
>
> Cross app calls do not go through a central process.

It's not about a central process, it goes through your framework,
which should be able to deal with it. If not, it's a design failure
which needs to be fixed at the place where the failure happened.

> >>
> >> How can it be fixed? The user presses the back button, the framework
> >> determines that app A is in the foreground and send the key to app A,
> >> app A decides that it it does not have anything internal to go back to
> >> and tells the framework to switch back to the previous app. If the
> >> user presses the back key again, the framework does not know which app
> >> this key should go to until app A has finished processing the first
> >> key press.
> >
> > Errm, what has this to do with frozen apps? If your system is
> > handling input events then there are no frozen apps and even if they
> > are frozen your framework can unfreeze them _before_ talking to them.
> >
> > So which unfixable problem are you describing with the above example ?
> >
>
> You are claiming that trusted code should not have any dependencies on
> untrusted code. I gave you a visible example of such a dependency and
> want you to tell me how you can avoid this dependency. Since you are
> claiming that our user-space framework is fundamentally broken if it
> has to wait for untrusted code, I don't think it is unreasonable for
> you to answer this. Or do you think it is valid to communicate with
> untrusted code when the screen is on but not when it is off.

It does not matter whether the screen is off or not. If you need to
call into that untrusted app from your trusted app and you know about
the might-be-frozen state then you can deal with it.

So taking your example:

    Event happens and gets delivered to the framework

    framework selects A because it is in the foreground

    if (A is frozen)
        unfreeze(A)

    deliver_event_to(A)

It's that simple.

If your framework cannot deal with that simple problem then you have a
much more serious problem already.

Thanks,

tglx
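
The pseudocode above maps fairly directly onto the cgroup freezer's freezer.state file. The following is a minimal sketch under stated assumptions: each app is assumed to live in its own freezer cgroup directory, and `dispatch`, `is_frozen`, `thaw` and the `deliver` callback are hypothetical names, not part of Android's framework or of any kernel API.

```python
import os


def is_frozen(cgroup_dir):
    """True if the cgroup at `cgroup_dir` is frozen or on its way there."""
    with open(os.path.join(cgroup_dir, "freezer.state")) as f:
        return f.read().strip() in ("FREEZING", "FROZEN")


def thaw(cgroup_dir):
    """Ask the freezer to thaw every task in the cgroup at `cgroup_dir`."""
    with open(os.path.join(cgroup_dir, "freezer.state"), "w") as f:
        f.write("THAWED")


def dispatch(event, app_cgroup_dir, deliver):
    """Unfreeze the target app if needed, then hand it the event."""
    if is_frozen(app_cgroup_dir):
        thaw(app_cgroup_dir)
    deliver(event)
```

This covers only the case where the framework itself initiates the call. It does not address calls that, as Arve says, do not pass through a central process.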

Arjan van de Ven

Jun 5, 2010, 6:30:01 PM

On Sat, 5 Jun 2010 14:26:14 -0700
Arve Hjønnevåg <ar...@android.com> wrote:

> On Sat, Jun 5, 2010 at 9:28 AM, Arjan van de Ven
> <ar...@infradead.org> wrote:
> > On Sat, 05 Jun 2010 11:54:13 +0200
> > Peter Zijlstra <pet...@infradead.org> wrote:
> >

> >> On Fri, 2010-06-04 at 17:10 -0700, Arve Hjønnevåg wrote:
> >> > > Trusted processes are assumed to be sane and idle when there is
> >> > > nothing for them to do, allowing the machine to go into deep
> >> > > idle states.
> >> > >
> >> >
> >> > Neither the kernel nor our trusted user-space code currently
> >> > meets this criteria.
> >>
> >> Then both need fixing. Really, that's the only sane approach.
> >
> > fwiw... in MeeGo we're seeing quite good idle times (> 1 seconds)
> > without really bad hacks.
> >
>
> We clearly have different standards for what we consider good. We
> measure time suspended in minutes or hours, not seconds, and waking up
> every second or two causes a noticeable decrease in battery life on
> the hardware we have today.

I guess I'm spoiled working with (unreleased) hardware that knows how
to power gate ;-)

>
> > the kernel has a set of infrastructure already to help here (range
> > timers, with which you can wakeup-limit untrusted userspace crap),
> > timer slack for legacy background timers, etc etc.
>
> Range timers allows the kernel to align different timers so they don't
> each bring the cpu out of idle individually. They do not eliminate
> timers or make individual timers fire less often.

you're incorrect.
With range timers you can control the rate at which timers fire just
fine.

For example if the Adobe Flash player puts a timer every 10
milliseconds (yes it does that), and you give it a 3.99 seconds range,
it will fire its timers every 4 seconds.... unless other activity
happens independently, at which point it'll align with that instead.

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
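
The arithmetic behind the Flash example (a 10 millisecond timer given a 3.99 second range firing every 4 seconds) can be checked with a short simulation. It is a sketch under two assumptions that the message does not spell out: the application re-arms its timer relative to the moment it last fired, and nothing else wakes the CPU, so each expiry happens at the very end of its allowed window. `firings` is a hypothetical helper, with times in milliseconds.

```python
def firings(period_ms, range_ms, duration_ms):
    """Firing times of a self-re-arming timer on an otherwise idle system."""
    times = []
    now = 0
    while True:
        # The timer asks for now + period_ms but tolerates range_ms more;
        # with no other activity it fires at the end of that window.
        now += period_ms + range_ms
        if now > duration_ms:
            return times
        times.append(now)


one_minute = 60_000
print(len(firings(10, 0, one_minute)))      # no range
print(len(firings(10, 3990, one_minute)))   # 3.99 s range
print(firings(10, 3990, one_minute)[:3])    # first few firing times
```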
--

Brian Swetland

Jun 5, 2010, 6:30:01 PM

On Sat, Jun 5, 2010 at 3:23 PM, Arjan van de Ven <ar...@infradead.org> wrote:
>>
>> We clearly have different standards for what we consider good. We
>> measure time suspended in minutes or hours, not seconds, and waking up
>> every second or two causes a noticeable decrease in battery life on
>> the hardware we have today.
>
> I guess I'm spoiled working with (unreleased) hardware that knows how
> to power gate ;-)

I'm continually surprised by answers like this. We run on hardware
that power gates very aggressively and draws in the neighborhood of
1-2mA at the battery when in the lowest state (3-5mA while the radio
is connected to the network and paging). Waking up out of that lowest
state and executing code every few seconds or (worse) several times a
second will raise your average power consumption. Being able to stay
parked at the very bottom for minutes or hours at a time when nothing
"interesting" is happening is very useful and can have a significant
impact on overall battery life.

Brian
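
Brian's point about average power can be made concrete with a duty-cycle estimate. Only the 1-2 mA floor comes from the message. The awake current and the time spent awake per wakeup below are made-up illustrative values, and `average_current_ma` is a hypothetical helper.

```python
def average_current_ma(sleep_ma, awake_ma, awake_s, period_s):
    """Average battery current when waking once per `period_s` seconds."""
    if not 0 <= awake_s <= period_s:
        raise ValueError("awake_s must lie between 0 and period_s")
    asleep_s = period_s - awake_s
    # Time-weighted mean of the two current levels over one period.
    return (sleep_ma * asleep_s + awake_ma * awake_s) / period_s


SLEEP_MA = 1.5    # within the 1-2 mA range quoted above
AWAKE_MA = 100.0  # assumption, not a figure from the thread
AWAKE_S = 0.2     # assumption: time spent awake per wakeup

for period_s in (1, 10, 600, 3600):
    avg = average_current_ma(SLEEP_MA, AWAKE_MA, AWAKE_S, period_s)
    print(f"wake every {period_s:>4} s: about {avg:.2f} mA on average")
```

Under these assumed numbers the average only approaches the sleep floor once wakeups are minutes apart, which is the shape of the argument being made, not a measurement.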

Arve Hjønnevåg

Jun 5, 2010, 6:50:01 PM

2010/6/5 Arjan van de Ven <ar...@infradead.org>:

If you do that what you are delivering is nowhere close to what the
app asked for. You don't need range timers for this, you could just as
well add 4 seconds to all normal timers.

--
Arve Hjønnevåg

Rafael J. Wysocki

Jun 5, 2010, 6:50:01 PM

On Saturday 05 June 2010, Arve Hjønnevåg wrote:

That depends a good deal on what you mean by holding a resource.

Generally, however, if your "trusted" processes depend on the processes you
don't trust, then either the former should not be trusted, or the latter should
be trusted.

Well, yeah.

Arve, we're still learning you have some more requirements we had no idea
about before and such that _only_ the suspend blockers (or wakelocks) framework
is suitable to satisfy them. I don't realistically think we can make any
progress this way.

> >> > Up to that point, I really tried hard to step back from my initial
> >> > "OMG, promoting crap is a nono" reaction and work with you on a
> >> > sensible technical solution to confine crap and make it aligned with
> >> > other efforts in this area.
> >> >
> >> > So now, after I spent a reasonable amount of time (as you did) to
> >> > understand what your requirements are, you come up with another
> >> > restriction which is so outside of any level of sanity, that I'm at
> >> > the point of giving up and just going into NAK mode.
> >> >
> >>
> >> I don't think this is a new restriction. Both Brian and I have
> >> mentioned that we have a lot of dependencies between processes.

Which is not the same as "the dependencies are such that they can't be
taken into account in any way other than by using wakelocks (or suspend
blockers)".

> >> > Can you please answer the following question:
> >> >
> >> > What is the point of having the distinction of "trusted" and
> >> > "untrusted" when you have no way to prevent "trusted" code calling
> >> > into "untrusted" code?
> >> >
> >>
> >> Trusted code that calls into untrusted code has to deal with the
> >> untrusted code not responding, but we only want to pop up a message
> >> that the application is not responding if it is misbehaving, not just
> >> because it was frozen through no fault of its own.

When Android starts opportunistic suspend, all applications are frozen,
"trusted" as well as "untrusted", right? So, after they are all frozen, none
of them can do anything to prevent suspend from happening, right?

Now, in my proposed approach the "untrusted" apps are frozen exactly at the
point Android would start opportunistic suspend and they wouldn't be able
to do anything about that anyway. So if one of your "trusted" apps depends
on the "untrusted" ones in a way that you describe, you already have a bug
(the "trusted" app cannot prevent automatic suspend from happening even if it
wants, because it depends on an "untrusted" app that has just been frozen).

> >> > That's violating any sense of abstraction and layering and makes it
> >> > entirely clear that the only way you can deal with your own design
> >> > failure is a big hammer which you need to force into the kernel.
> >> >
> >>
> >> How can it be fixed? The user presses the back button, the framework
> >> determines that app A is in the foreground and sends the key to app A,
> >> app A decides that it does not have anything internal to go back to
> >> and tells the framework to switch back to the previous app. If the
> >> user presses the back key again, the framework does not know which app
> >> this key should go to until app A has finished processing the first
> >> key press.
> >
> > Errm, what has this to do with frozen apps? If your system is
> > handling input events then there are no frozen apps and even if they
> > are frozen your framework can unfreeze them _before_ talking to them.
> >
> > So which unfixable problem are you describing with the above example ?
> >
>
> You are claiming that trusted code should not have any dependencies on
> untrusted code.

Not "any". It shouldn't have dependencies that make a difference between
"trusted" and "untrusted".

Think of security, for example. A root-owned process surely can exchange data
with processes owned by non-root users, but it shouldn't blindly accept any
data these processes give it.

Your wakelock-holding application is a counterpart of the root-owned process
above. It can exchange data with processes that don't take wakelocks, but not
in such a way that would prevent them from taking wakelocks if necessary
(or from dropping wakelocks if no longer needed from their point of view).

If this condition is satisfied, then I claim you won't have any problems with
freezing the "untrusted" apps upfront. If this condition is not satisfied, in
turn, your framework already doesn't work.

Rafael

Arjan van de Ven

Jun 5, 2010, 6:50:02 PM

On Sat, 5 Jun 2010 15:26:36 -0700
Brian Swetland <swet...@google.com> wrote:

>
> I'm continually surprised by answers like this. We run on hardware
> that power gates very aggressively and draws in the neighborhood of
> 1-2mA at the battery when in the lowest state (3-5mA while the radio
> is connected to the network and paging). Waking up out of that lowest
> state and executing code every few seconds or (worse) several times a
> second will raise your average power consumption. Being able to stay
> parked at the very bottom for minutes or hours at a time when nothing
> "interesting" is happening is very useful and can have a significant
> impact on overall battery life.

It's relatively simple math.

If you wake up for a burst of work, you burn power at the higher level
P1 (versus the lower power level P2), for, let's say, an average time T,
with a relatively small T (few milliseconds at most).

If you wake up X times per second (with X being a fractional number, so
can be smaller than 1) the extra power consumption factor is

X * T * P1
-------------------------------
X * T * P1 + (1.0 - X * T) * P2

if you draw a graph of this, for real values of P and T, there's a real
point where you hit diminishing returns.

if say T is 5 milliseconds (that's a high amount), and X is 1
wakeup/second, then there's already a 200:1 ratio in time and power.

If X goes to once every 10 seconds (not unreasonable, especially since
any real device will pull email and stuff in the background), you have
2000:1 time and power ratios...

Unless your "on" power is insanely high (and hopefully it's not, since
you're not turning on the whole device obviously, you do selective
power and clock gating)... that "divide by 200 or 2000" makes the whole
problem go away.. in the "seconds" range for really low power devices.
Not in "hours" range.

On laptops (which have much poorer power management) this point is
around 40 milliseconds or so.. but on phone silicon that I've seen,
both Intel and others, this is in the 1 to 5 seconds range.
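
To put numbers on the formula above, here is a small sketch. The 100:1 ratio between P1 and P2 used in the demo is purely an assumed figure for illustration, not a measurement of any device, and the function names are made up for this example.

```python
def awake_energy_fraction(x, t, p1, p2):
    """Share of total energy that goes into the wakeup bursts.

    x:  wakeups per second (may be below 1)
    t:  seconds spent awake per wakeup
    p1: power while awake, p2: power while idle (same unit)
    """
    busy = x * t  # fraction of each second spent awake
    if not 0.0 <= busy <= 1.0:
        raise ValueError("x * t must lie between 0 and 1")
    awake = busy * p1
    idle = (1.0 - busy) * p2
    return awake / (awake + idle)


def average_power(x, t, p1, p2):
    """Time-weighted average power, in the unit of p1 and p2."""
    busy = x * t
    return busy * p1 + (1.0 - busy) * p2


if __name__ == "__main__":
    T = 0.005             # 5 ms awake per wakeup, as in the text
    P1, P2 = 100.0, 1.0   # assumed 100:1 awake-to-idle power ratio
    for x in (1.0, 0.1):  # once per second, once per 10 seconds
        print(x, average_power(x, T, P1, P2),
              awake_energy_fraction(x, T, P1, P2))
```

With these assumed numbers, one 5 ms wakeup per second puts the average power at roughly 1.5 times the idle level, and one wakeup every 10 seconds at roughly 1.05 times. The time ratio and the power ratio are separate inputs, so the result depends heavily on how far apart P1 and P2 really are.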

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

Rafael J. Wysocki

Jun 5, 2010, 7:00:01 PM

On Sunday 06 June 2010, Brian Swetland wrote:
> On Sat, Jun 5, 2010 at 3:23 PM, Arjan van de Ven <ar...@infradead.org> wrote:
> >>
> >> We clearly have different standards for what we consider good. We
> >> measure time suspended in minutes or hours, not seconds, and waking up
> >> every second or two causes a noticeable decrease in battery life on
> >> the hardware we have today.
> >
> > I guess I'm spoiled working with (unreleased) hardware that knows how
> > to power gate ;-)
>
> I'm continually surprised by answers like this. We run on hardware
> that power gates very aggressively and draws in the neighborhood of
> 1-2mA at the battery when in the lowest state (3-5mA while the radio
> is connected to the network and paging). Waking up out of that lowest
> state and executing code every few seconds or (worse) several times a
> second will raise your average power consumption. Being able to stay
> parked at the very bottom for minutes or hours at a time when nothing
> "interesting" is happening is very useful and can have a significant
> impact on overall battery life.

Yes, and if you look at the approach I proposed in this very thread
(http://lkml.org/lkml/2010/6/4/368), it goes exactly in this direction.

And I think it is superior to the opportunistic suspend framework you have
right now, because, for example, it doesn't require you to carry out full
system resume and full system suspend every once in a while to check battery
status.

And guess what, suspending and resuming the whole system actually uses energy.

Rafael

Rafael J. Wysocki

Jun 5, 2010, 7:10:01 PM

Basically, it involves two mechanisms, detection of wakeup events occurring
right before suspend is started and aborting suspend if wakeup events occur
in the middle of it.

> >> and I don't know how we can safely freeze
> >> cgroups without funneling all potential wakeup events through a
> >> process that never gets frozen.
> >
> > If your untrusted apps get called by the trusted ones, they aren't really
> > untrusted in the first place.
> >
> That is not a correct statement. A trusted app can call into an
> untrusted app, it just has to validate the response and handle not
> getting a response at all. There are also different levels of trust. I
> may have trusted an app to provide a contact picture, but not trusted
> it to block suspend. When the phone rings the app will be called to
> provide the picture for the incoming call dialog, but if it is frozen
> at this point the more trusted app that handles the incoming phone
> call will not be able to get the picture.

It will be able to do that if it causes the frozen part of user space to be
thawed.

I think you have this problem already, though, because you use full system
suspend and all of your apps are frozen by it. So, to handle the situation you
describe above, you need to carry out full system resume that will thaw the
tasks for you. I don't see any fundamental difference between the two cases.

> > From what you're saying it follows that you're not really willing to accept
> > any solution different to your suspend blockers. Is that really the case?
> >
> I don't think that is a fair way to put it. We need to support our
> user-space framework and I have not seen an alternative solution that
> clearly will work (other than replacing suspend_blockers with pm_qos
> constraints that do the same thing).

Then think again of the approach I proposed and explain to me why it won't
work, because I haven't seen any convincing argument on that yet.

Rafael

Arve Hjønnevåg

Jun 5, 2010, 7:30:02 PM

2010/6/5 Thomas Gleixner <tg...@linutronix.de>:

> On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
>> 2010/6/5 Thomas Gleixner <tg...@linutronix.de>:
>> > On Fri, 4 Jun 2010, Arve Hjønnevåg wrote:
>> >> > Why is it a BUG in the trusted app, when I initiate a download and put
>> >> > the phone down ?
>> >> >
>> >>
>> >> It is not, but we have had bugs where a trusted app does not unblock
>> >> suspend after some failure case where it is no longer making any
>> >> progress.
>> >
>> > Well, that's simply an application bug which sucks battery with or
>> > without suspend blockers. So it's unrelated to the freezing of
>> > untrusted apps while a trusted app still works in the background
>> > before allowing the machine to suspend.
>> >
>>
>> It is not unrelated if the trusted app has stopped working but still
>> blocks suspend. The battery drains when you combine them.
>
> What you are describing is a problem which is not solvable either way.
> If you take the lock and do not release it you're not going to
> suspend. I never claimed that any other mechanism resolves this.
>

Whether you claimed it or not, this is the only case where using
cgroups would have a significant power saving over what we get with
suspend. The trusted app is idle and the untrusted app is frozen, so
we enter a low power mode from idle.

> But this is not related to the fact that freezing crap while running a
> sane background task is going to save you power vs. an approach where
> running a sane background task allows crap to consume power unconfined
> until it is done.
>

If the task that is blocking suspend is using the cpu anyway, then the
bad app does not increase the power consumption nearly as much as if
the task that blocked suspend is idle.

>> >> > That download might take a minute or two, but that's not a
>> >> > justification for the crapplication to run unconfined and prevent
>> >> > lower power states.
>> >> >
>> >>
>> >> I agree, but this is not a simple problem to solve.
>> >
>> > Not with suspend blockers, but with cgroup confinement of crap, it's
>> > straight forward.
>> >
>>
>> I don't think it is straightforward. If a process in the frozen
>> group holds a resource that a process in the unfrozen group needs, how
>> do we deal with that?
>
> I'm going to fix the framework which puts the group into freeze state
> w/o making sure that there is no held shared resource. Come on it's
> not rocket science.
>

I'm not sure which framework you are talking about here, but I don't
think there is a single framework that knows about all shared
resources.

That is too simple. You also have to prevent A from being frozen while
it is processing the event or the result would be the same as if it
was frozen beforehand.

> If your framework cannot deal with that simple problem then you have a
> much more serious problem already.
>
> Thanks,
>
>        tglx
>

--
Arve Hjønnevåg

Arjan van de Ven

Jun 5, 2010, 7:40:02 PM

On Sat, 5 Jun 2010 15:39:44 -0700
Arve Hjønnevåg <ar...@android.com> wrote:

> >
> > For example if the Adobe Flash player puts a timer every 10
> > milliseconds (yes it does that), and you give it a 3.99 seconds
> > range, it will fire its timers every 4 seconds.... unless other
> > activity happens independently, at which point it'll align with
> > that instead.
> >
>
> If you do that what you are delivering is nowhere close to what the
> app asked for.

yeah it feels a little bit suspended

> You don't need range timers for this, you could just as
> well add 4 seconds to all normal timers.

.. with the difference that with range timers, you naturally align with
other activity, so if there's system level activity, the AVERAGE service
the app gets is better by a LOT than just adding 4 seconds always.

but you knew that.... just doesn't help your case.

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

Rafael J. Wysocki

Jun 5, 2010, 7:40:03 PM

On Sunday 06 June 2010, Arve Hjønnevåg wrote:
> 2010/6/5 Thomas Gleixner <tg...@linutronix.de>:

> > On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
> >> 2010/6/5 Thomas Gleixner <tg...@linutronix.de>:

> >> > On Fri, 4 Jun 2010, Arve Hjønnevåg wrote:
...

> > So taking your example:
> >
> > Event happens and gets delivered to the framework
> >
> > framework selects A because it is in the foreground
> >
> > if (A is frozen)
> >     unfreeze(A)
> >
> > deliver_event_to(A)
> >
> > It's that simple.
> >
>
> That is too simple. You also have to prevent A from being frozen while
> it is processing the event or the result would be the same as if it
> was frozen beforehand.

Well, the freezing of the "untrusted" part of user space needs to be triggered
somehow in the first place. Whatever mechanism is used for that, there should
be a way to tell it not to freeze the "untrusted" part of user space for a
while. Yes, it is similar to wakelocks, but I think it can be implemented
entirely in user space.

So, in general, the "trusted" app that needs an "untrusted" one to handle stuff
will take a "freeze lock" to prevent the power manager from freezing the
"untrusted" part of user space (that will also cause it to thaw these tasks if
they are frozen at the moment) and will release the "freeze lock" when it's
done with its job. You can use timeouts and whatever you like with that and
the kernel doesn't have to participate in that (except for carrying out the
low-level freezing and thawing of the "untrusted" tasks at the power manager's
request).
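
A minimal user-space sketch of such a "freeze lock", under stated assumptions: FreezeManager and its method names are hypothetical, and the freeze and thaw callbacks stand in for whatever low-level mechanism the power manager would really use (for example the cgroup freezer). Counting holders and doing the check-and-thaw under one mutex is what keeps a task from being frozen while a "trusted" app still needs it.

```python
import threading
from contextlib import contextmanager


class FreezeManager:
    """Hypothetical power-manager helper that counts freeze-lock holders."""

    def __init__(self, freeze, thaw):
        self._mutex = threading.Lock()
        self._holders = 0
        self._frozen = False
        self._freeze = freeze  # callback: freeze the "untrusted" tasks
        self._thaw = thaw      # callback: thaw the "untrusted" tasks

    def acquire(self):
        """Take a freeze lock; thaw the untrusted tasks if they are frozen."""
        with self._mutex:
            self._holders += 1
            if self._frozen:
                self._thaw()
                self._frozen = False

    def release(self):
        with self._mutex:
            if self._holders == 0:
                raise RuntimeError("release() without matching acquire()")
            self._holders -= 1

    def try_freeze(self):
        """Freeze the untrusted tasks unless somebody holds a freeze lock."""
        with self._mutex:
            if self._holders:
                return False
            if not self._frozen:
                self._freeze()
                self._frozen = True
            return True

    @contextmanager
    def freeze_lock(self):
        self.acquire()
        try:
            yield
        finally:
            self.release()


if __name__ == "__main__":
    manager = FreezeManager(lambda: print("freeze"), lambda: print("thaw"))
    manager.try_freeze()
    with manager.freeze_lock():
        # The untrusted tasks are guaranteed to be running here.
        print("talking to an untrusted app")
    manager.try_freeze()
```

Timeouts, if wanted, could be layered on top by having the manager drop a holder that has kept its lock for too long; that policy is left out here.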

Rafael

Thomas Gleixner

Jun 5, 2010, 7:50:01 PM

That's the whole problem. Suspend blockers are a binary all on/off
approach so you waste power just to get the thing back to
"suspend". They unleash the world and some more just to put it back
into oblivion with brute force.

Thanks,

tglx

Arve Hjønnevåg

Jun 5, 2010, 8:00:02 PM

2010/6/5 Rafael J. Wysocki <r...@sisk.pl>:

> On Saturday 05 June 2010, Arve Hjønnevåg wrote:
>> 2010/6/5 Thomas Gleixner <tg...@linutronix.de>:

I'm not sure why that matters. Any resource held by a frozen process
could cause problems whether it is something like being the current
foreground app or a mutex in shared memory.

> Generally, however, if your "trusted" processes depend on the processes you
> don't trust, then either the former should not be trusted, or the latter should
> be trusted.
>

There are different levels of trust. Trusted processes often need to
wait for an untrusted process to release a resource for the untrusted
process to behave correctly, but the trusted process can revoke the
resource if the untrusted process does not comply in time.

>> >> >> >     With the cgroup freezer you can "suspend" them right away and

What new requirement are you talking about? Did you assume all our
user-space ipc calls went through a single process?

Not if you mean when we write to /sys/power/state. Processes are not
frozen until the last suspend blocker is released.

>
> Now, in my proposed approach the "untrusted" apps are frozen exactly at the
> point Android would start opportunistic suspend and they wouldn't be able
> to do anything about that anyway. So if one of your "trusted" apps depends
> on the "untrusted" ones in a way that you describe, you already have a bug
> (the "trusted" app cannot prevent automatic suspend from happening even if it
> wants, because it depends on an "untrusted" app that has just been frozen).
>

I don't think what you said here is correct. If a wakeup event happens
all processes are unfrozen since the driver blocks suspend. The app
that reads this event blocks suspend before reading it. If it was busy
talking to a less trusted app when the event happened it still works
since all apps are running at this point.

The problem is that properly working untrusted apps may get treated as
non-working apps and killed because they were frozen and did not
respond. Also this is not invisible to the user as the system usually
gives the app several seconds to respond.

--
Arve Hjønnevåg

Thomas Gleixner

Jun 5, 2010, 8:10:02 PM

On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
> 2010/6/5 Thomas Gleixner <tg...@linutronix.de>:
> > On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
> >> >> > That download might take a minute or two, but that's not a
> >> >> > justification for the crapplication to run unconfined and prevent
> >> >> > lower power states.
> >> >> >
> >> >>
> >> >> I agree, but this is not a simple problem to solve.
> >> >
> >> > Not with suspend blockers, but with cgroup confinement of crap, it's
> >> > straight forward.
> >> >
> >>
> >> I don't think it is straightforward. If a process in the frozen
> >> group holds a resource that a process in the unfrozen group needs, how
> >> do we deal with that?
> >
> > I'm going to fix the framework which puts the group into freeze state
> > w/o making sure that there is no held shared resource. Come on it's
> > not rocket science.
> >
>
> I'm not sure which framework you are talking about here, but I don't
> think there is a single framework that knows about all shared
> resources.

Damn, it's not me talking about "our framework", you are mentioning
it when it fits your needs.

If you do not have a clearly defined user space framework, then we
talk about a completely random conglomeration of applications which
need to be brought into submission by some global brute force
approach.

I'm tired of this, really. You just use terminology as it fits to
defend the complete design failure of android. But you fail to trick
me :)

Can you please explain in a consistent way how the application stack
and the underlying framework (which exists according to android docs)
is handling events and how the separation of trust level works ?

We need to know that, otherwise we turn in circles forever.

Thanks,

tglx

Arve Hjønnevåg

Jun 5, 2010, 8:10:02 PM

2010/6/5 Arjan van de Ven <ar...@infradead.org>:

> On Sat, 5 Jun 2010 15:39:44 -0700
> Arve Hjønnevåg <ar...@android.com> wrote:
>
>> >
>> > For example if the Adobe Flash player puts a timer every 10
>> > milliseconds (yes it does that), and you give it a 3.99 seconds
>> > range, it will fire its timers every 4 seconds.... unless other
>> > activity happens independently, at which point it'll align with
>> > that instead.
>> >
>>
>> If you do that what you are delivering is nowhere close to what the
>> app asked for.
>
> yeah it feels a little bit suspended
>
>> You don't need range timers for this, you could just as
>> well add 4 seconds to all normal timers.
>
> .. with the difference that with range timers, you naturally align with
> other activity, so if there's system level activity, the AVERAGE service
> the app gets is better by a LOT than just adding 4 seconds always.
>
> but you knew that.... just doesn't help your case.

So you are saying it is safe to use range timers to radically change
the requested timer interval because it does not actually get to the
value that you changed it to. But you are also saying that this will
allow the system to stay idle for that long. Something does not add
up.

--
Arve Hjønnevåg

Thomas Gleixner

Jun 5, 2010, 8:30:01 PM

On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
> 2010/6/5 Thomas Gleixner <tg...@linutronix.de>:

> >> > Well, that's simply an application bug which sucks battery with or
> >> > without suspend blockers. So it's unrelated to the freezing of
> >> > untrusted apps while a trusted app still works in the background
> >> > before allowing the machine to suspend.
> >> >
> >>
> >> It is not unrelated if the trusted app has stopped working but still
> >> blocks suspend. The battery drains when you combine them.
> >
> > What you are describing is a problem which is not solvable either way.
> > If you take the lock and do not release it you're not going to
> > suspend. I never claimed that any other mechanism resolves this.
> >
> Whether you claimed it or not, this is the only case where using
> cgroups would have a significant power saving over what we get with
> suspend. The trusted app is idle and the untrusted app is frozen, so
> we enter a low power mode from idle.

That is exactly what I said, and depending on the usage pattern this
can be significant. You just converted a perfectly sensible technical
argument into a quibble about BUGs in applications which are not
confinable by definition.

> > But this is not related to the fact that freezing crap while running a
> > sane background task is going to save you power vs. an approach where
> > running a sane background task allows crap to consume power unconfined
> > until it is done.
> >
> If the task that is blocking suspend is using the cpu anyway, then the
> bad app does not increase the power consumption nearly as much as if
> the task that blocked suspend is idle.

That's utter bullshit. If the app failed to release the suspend
blocker then your crappy "while(1);" app is killing you in no time,
while the same frozen crappy "while(1);" does no harm at all.

Thanks,

tglx

Arve Hjønnevåg

Jun 5, 2010, 8:40:01 PM

It is a 200:1 ratio in time not in power.

> If X goes to once every 10 seconds (not unreasonable, especially since
> any real device will pull email and stuff in the background), you have
> 2000:1 time and power ratios...
>
> Unless your "on" power is insanely high (and hopefully it's not, since

The absolute "on" power is not relevant to the ratio, the difference
between on and off power is. This can easily be 100:1.

> you're not turning on the whole device obviously, you do selective
> power and clock gating)... that "divide by 200 or 2000" makes the whole
> problem go away.. in the "seconds" range for really low power devices.
> Not in "hours" range.
>

If you improve the low power state, compared to the "on" state wakeup
gets worse, not better, but yes the phone hardware we have now does
not need to stay idle for hours to get good battery life, the msm
hardware at least needs to stay idle for more than a few seconds.

>
> On laptops (which have much poorer power management) this point is
> around 40 milliseconds or so.. but on phone silicon that I've seen,
> both Intel and others, this is in the 1 to 5 seconds range.
>
>
>
>
>
> --
> Arjan van de Ven Intel Open Source Technology Centre
> For development, discussion and tips for power savings,
> visit http://www.lesswatts.org
>

--
Arve Hjønnevåg

Thomas Gleixner

Jun 5, 2010, 8:40:01 PM

On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
> 2010/6/5 Thomas Gleixner <tg...@linutronix.de>:

> > On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
> >> 2010/6/5 Thomas Gleixner <tg...@linutronix.de>:

The framework decides when to freeze the app in the first place (as
your framework does now when it decides to suspend)

So it knows whether the app is frozen or not.

So it knows damned well whether it processed the event or not.

Thanks,

tglx

Arve Hjønnevåg

Jun 5, 2010, 9:10:02 PM

This sounds like the timeout approach which I thought you did not like.

> and aborting suspend if wakeup events occur
> in the middle of it.
>

Aborting suspend is easy, but when do you allow suspend again?

>> >> and I don't know how we can safely freeze
>> >> cgroups without funneling all potential wakeup events through a
>> >> process that never gets frozen.
>> >
>> > If your untrusted apps get called by the trusted ones, they aren't really
>> > untrusted in the first place.
>> >
>> That is not a correct statement. A trusted app can call into an
>> untrusted app, it just has to validate the response and handle not
>> getting a response at all. There are also different levels of trust. I
>> may have trusted an app to provide a contact picture, but not trusted
>> it to block suspend. When the phone rings the app will be called to
>> provide the picture for the incoming call dialog, but if it is frozen
>> at this point the more trusted app that handles the incoming phone
>> call will not be able to get the picture.
>
> It will be able to do that if it causes the frozen part of user space to be
> thawed.
>
> I think you have this problem already, though, because you use full system
> suspend and all of your apps are frozen by it. So, to handle the situation you
> describe above, you need to carry out full system resume that will thaw the
> tasks for you. I don't see any fundamental difference between the two cases.
>

Yes, we can keep all our user space suspend blockers and thaw the
frozen cgroup when any suspend blocker is held, but this would
eliminate any power advantage that freezing a cgroup has over using
suspend to freeze all processes. Without annotating the drivers to
block the cgroup freezing in the same places as we now block suspend,
it also prevents processes in the cgroup that we freeze from directly
consuming wakeup events.

>> > From what you're saying it follows that you're not really willing to accept
>> > any solution different to your suspend blockers. Is that really the case?
>> >
>> I don't think that is a fair way to put it. We need to support our
>> user-space framework and I have not seen an alternative solution that
>> clearly will work (other than replacing suspend_blockers with pm_qos
>> constraints that do the same thing).
>
> Then think again of the approach I proposed and explain to me why it won't
> work, because I haven't seen any convincing argument on that yet.
>

If you are referring to the approach that we don't use suspend but
freeze a cgroup instead, this only solves the problem of bad apps. It
does not help pause timers in trusted user space code and in the
kernel, so it does not lower our average power consumption. And, it
does not solve the problem for systems that enter lower power states
from suspend than they can from idle. The last point may not be relevant
to android anymore, but desktop systems already have auto suspend and
it would be preferable to have a race free kernel api for this.

--
Arve Hjønnevåg

Arve Hjønnevåg

Jun 5, 2010, 9:20:01 PM

2010/6/5 Thomas Gleixner <tg...@linutronix.de>:
> On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
>> 2010/6/5 Thomas Gleixner <tg...@linutronix.de>:
>> > On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
>> >> >> > That download might take a minute or two, but that's not a
>> >> >> > justification for the crapplication to run unconfined and prevent
>> >> >> > lower power states.
>> >> >> >
>> >> >>
>> >> >> I agree, but this is not a simple problem to solve.
>> >> >
>> >> > Not with suspend blockers, but with cgroup confinement of crap, it's
>> >> > straight forward.
>> >> >
>> >>
>> >> I don't think it is straightforward. If a process in the frozen
>> >> group holds a resource that a process in the unfrozen group needs, how
>> >> do we deal with that?
>> >
>> > I'm going to fix the framework which puts the group into freeze state
>> > w/o making sure that there is no held shared resource. Come on it's
>> > not rocket science.
>> >
>>
>> I'm not sure which framework you are talking about here, but I don't
>> think there is a single framework that knows about all shared
>> resources.
>
> Damn, it's not me talking about "our framework", you are mentioning
> it when it fits your needs.

You said you were going to fix the framework. I did not know if you were
talking about the cgroup framework, or the android user-space
frameworks. I don't think either has knowledge about all shared
resources.

>
> If you do not have a clearly defined user space framework, then we
> talk about a completely random conglomeration of applications which
> need to be brought into submission by some global brute force
> approach.
>
> I'm tired of this, really. You just use terminology as it fits to
> defend the complete design failure of android. But you fail to trick
> me :)
>
> Can you please explain in a consistent way how the application stack
> and the underlying framework (which exists according to android docs)
> is handling events and how the separation of trust level works ?
>

I don't think I can, since I only know small parts of it. I know some
events like input events go through a single thread in our system
process, while other events like network packets (which are also
wakeup events) go directly to the app.

> We need to know that, otherwise we turn in circles forever.
>
> Thanks,
>
>        tglx

--

Arve Hjønnevåg

Jun 5, 2010, 9:30:02 PM

2010/6/5 Thomas Gleixner <tg...@linutronix.de>:

This is the bug I described above. If the app that blocked suspend did
not release the suspend blocker and went idle, then another while(1)
app will drain the battery. If the app that blocked suspend only
blocked suspend while it needs to run (which is the typical reason to
block suspend) then the system is not idle anyway and the impact of
the while(1) app is much less severe.

Arve Hjønnevåg

Jun 5, 2010, 9:50:02 PM

Our user-space code is not single-threaded. So just because an app was
not frozen when you checked does not mean it will remain unfrozen. We
can use the same user-space wakelock api we have now to prevent
freezing apps instead of preventing suspend, but we lose any
advantage we get from freezing just a subset of processes this way.

--
Arve Hjønnevåg

Alan Stern

Jun 5, 2010, 10:50:01 PM

On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:

> Yes, we can keep all our user space suspend blockers and thaw the
> frozen cgroup when any suspend blocker is held, but this would
> eliminate any power advantage that freezing a cgroup has over using
> suspend to freeze all processes. Without annotating the drivers to
> block the cgroup freezing in the same places as we now block suspend,
> it also prevents processes in the cgroup that we freeze from directly
> consuming wakeup events.

The driver annotations don't need to block the cgroup freezing. They
just need to keep the system running long enough to awaken a thread
that will handle the wakeup event. (See below.) A pm-qos constraint
is good enough for this.

> If you are referring to the approach that we don't use suspend but
> freeze a cgroup instead, this only solves the problem of bad apps. It
> does not help pause timers in trusted user space code and in the
> kernel, so it does not lower our average power consumption.

You can solve this problem if you restructure your "trusted" apps in
the right way. Require a trusted app to guarantee that whenever it
doesn't hold any suspend blockers, it will do nothing but wait (in a
poll() system call for example) for a wakeup event. When the event
occurs, it must then activate a suspend blocker.

Better yet, make it more fine-grained. Instead of trusted apps, have
trusted threads. Freeze the untrusted threads along with everything
else, and require the trusted threads to satisfy this guarantee.
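The guarantee described above (wait in poll() with no suspend blocker held, take one only once an event arrives) can be sketched roughly as follows. This is a minimal user-space illustration, not Android code: `SuspendBlocker` is a hypothetical in-process stand-in, and `trusted_loop`, `handle` and `max_events` are names made up for the example. It also assumes something else (for example the driver-side constraint mentioned earlier in this message) keeps the system awake between the wakeup event and the moment the blocker is taken.

```python
import os
import select


class SuspendBlocker:
    """Hypothetical stand-in for a user-space suspend blocker (assumption)."""

    def __init__(self, name):
        self.name = name
        self.active = False

    def __enter__(self):
        self.active = True   # "activate a suspend blocker"
        return self

    def __exit__(self, *exc):
        self.active = False  # released as soon as the event is handled
        return False


def trusted_loop(fd, handle, blocker, max_events):
    """Wait in poll() with no blocker held; hold it only while handling."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    handled = 0
    while handled < max_events:
        # No suspend blocker is held here: the thread does nothing but wait.
        poller.poll()
        with blocker:
            data = os.read(fd, 4096)
            if not data:
                break  # the other end was closed, nothing left to wait for
            handle(data, blocker)
        handled += 1
    return handled


if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"wakeup event")
    blocker = SuspendBlocker("demo")
    trusted_loop(r, lambda data, b: print(data, "blocker active:", b.active),
                 blocker, max_events=1)
```

The point of the structure is that no timer is armed and no work is done outside the `with blocker:` section, so a thread following it cannot renew user timers while the system is idle.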

In this way, while the system is idle no user timers will get renewed.
Kernel timers are another matter, but we should be able to handle them.
There's nothing Android-specific about wanting to reduce kernel timer
wakeups while in a low-power mode.

> And, it
> does not solve the problem for systems that enter lower power states
> from suspend than they can from idle. The last point may not be relevant
> to android anymore, but desktop systems already have auto suspend and
> it would be preferable to have a race free kernel api for this.

This is an entirely different matter from the rest of the discussion.
It would be better to consider this separately after Android's current
problems have been addressed.

Alan Stern

Vitaly Wool

Jun 6, 2010, 4:00:02 AM

2010/6/5 Arve Hjønnevåg <ar...@android.com>:

> On Sat, Jun 5, 2010 at 9:28 AM, Arjan van de Ven <ar...@infradead.org> wrote:
>> On Sat, 05 Jun 2010 11:54:13 +0200
>> Peter Zijlstra <pet...@infradead.org> wrote:
>>
>>> On Fri, 2010-06-04 at 17:10 -0700, Arve Hjønnevåg wrote:
>>> > > Trusted processes are assumed to be sane and idle when there is
>>> > > nothing for them to do, allowing the machine to go into deep idle
>>> > > states.
>>> > >
>>> >
>>> > Neither the kernel nor our trusted user-space code currently meets
>>> > this criteria.
>>>
>>> Then both need fixing. Really, that's the only sane approach.
>>
>> fwiw... in MeeGo we're seeing quite good idle times (> 1 seconds)
>> without really bad hacks.
>>
>

> We clearly have different standards for what we consider good. We
> measure time suspended in minutes or hours, not seconds, and waking up
> every second or two causes a noticeable decrease in battery life on
> the hardware we have today.

Are you stating that the existing Android implementation enters the
suspended state for hours for any of the existing designs?

~Vitaly

da...@lang.hm

Jun 6, 2010, 4:10:02 AM

On Fri, 4 Jun 2010, Brian Swetland wrote:

> Yeah, I do understand that we're not making it easy for ourselves
> here. I think we hit the point where Rafael and Matthew signed off on
> things and thought "aha, linux-pm maintainers are happy, now we're
> getting somewhere" only to realize the light at the end of the tunnel
> was a bit further out than we anticipated ^^

What you missed is that the linux-pm maintainers have relatively little
weight in getting things into the kernel. They are gatekeepers, so until
they approve it there is basically no chance of getting in, but even
changes that they develop and push frequently have an uphill battle to get
into the kernel, especially if they would end up touching all drivers.
There have been several proposals by the pm team that have been shot down
much more completely than wakelocks.

David Lang

da...@lang.hm

Jun 6, 2010, 4:20:01 AM

On Thu, 3 Jun 2010, Arjan van de Ven wrote:

> On Thu, 3 Jun 2010 19:26:50 -0700 (PDT)
> Linus Torvalds <torv...@linux-foundation.org> wrote:
>
>>
>> If the system is idle (or almost idle) for long times, I would
>> heartily recommend actively shutting down unused cores. Some CPU's
>> are hopefully smart enough to not even need that kind of software
>> management, but I suspect even the really smart ones might be able to
>> take advantage of the kernel saying: "I'm shutting you down, you
>> don't have to worry about latency AT ALL, because I'm keeping another
>> CPU active to do any real work".
>
> sadly the reality is that "offline" is actually the same as "deepest C
> state". At best.
>
> As far as I can see, this is at least true for all Intel and AMD cpus.
>
> And because there's then no power saving (but a performance cost), it's
> actually a negative for battery life/total energy.

I believe that this assumes you are in the 'race to idle' situation where
when you finish your work you can shutdown. If the work is ongoing you may
never shutdown.

Also, what about the new CPUs where you can ramp up the clockspeed on some
cores if you shut down other cores? that could also benefit individual
threads.

Brian Swetland

Jun 6, 2010, 4:30:01 AM

On Sun, Jun 6, 2010 at 12:52 AM, Vitaly Wool <vital...@gmail.com> wrote:
> 2010/6/5 Arve Hjønnevåg <ar...@android.com>:

>>
>> We clearly have different standards for what we consider good. We
>> measure time suspended in minutes or hours, not seconds, and waking up
>> every second or two causes a noticeable decrease in battery life on
>> the hardware we have today.
>
> Are you stating that the existing Android implementation enters the
> suspended state for hours for any of the existing designs?

It varies depending on device and usage. The battery monitoring on
NexusOne happens every ten minutes, so that's the longest you'll see a
N1 suspended for. On a G1 or Dream/myTouch you can see 20-30 minutes
between wakeups (depending on network issues and background data sync
traffic), and if you have background data sync off those devices can
sit in suspend for days at a time (unless you receive a phone call or
something). In "airplane mode", with no local alarms, a device can
easily sit in the lowest power state for a month or so, until the
battery finally runs out.

Brian

da...@lang.hm

Jun 6, 2010, 4:30:01 AM

On Thu, 3 Jun 2010, Linus Torvalds wrote:

> On Thu, 3 Jun 2010, Linus Torvalds wrote:
>>
>> so I'd like to see the opportunistic suspend thing think about CPU
>> offlining
>
> Side note: one reason for me being somewhat interested in the CPU
> offlining is that I think the Android kind of opportunistic suspend is
> _not_ likely something I'd like to see on a desktop. But an
> "opportunistic CPU offliner"? That might _well_ be useful even outside of
> any other suspend activity.

When the OLPC was first released there was talk that the hardware was well
designed for sleeping (including the ability for the display to keep going
even if the system itself shut down), with the idealistic talk of the
system possibly sleeping between keystrokes.

things didn't end up working (a couple pieces of hardware ended up not
playing well with others), but the concept is still something that could
end up impacting users outside of the mobile phone market, even if not on
your traditional desktop.

David Lang

Vitaly Wool

Jun 6, 2010, 4:40:01 AM

On Sun, Jun 6, 2010 at 10:20 AM, Brian Swetland <swet...@google.com> wrote:
> On Sun, Jun 6, 2010 at 12:52 AM, Vitaly Wool <vital...@gmail.com> wrote:
>> 2010/6/5 Arve Hjønnevåg <ar...@android.com>:
>>>
>>> We clearly have different standards for what we consider good. We
>>> measure time suspended in minutes or hours, not seconds, and waking up
>>> every second or two causes a noticeable decrease in battery life on
>>> the hardware we have today.
>>
>> Are you stating that the existing Android implementation enters the
>> suspended state for hours for any of the existing designs?
>
> It varies depending on device and usage. The battery monitoring on
> NexusOne happens every ten minutes, so that's the longest you'll see a
> N1 suspended for. On a G1 or Dream/myTouch you can see 20-30 minutes
> between wakeups (depending on network issues and background data sync
> traffic), and if you have background data sync off those devices can
> sit in suspend for days at a time (unless you receive a phone call or
> something). In "airplane mode", with no local alarms, a device can
> easily sit in the lowest power state for a month or so, until the
> battery finally runs out.

That only concerns the case when you have just turned on the phone and
left it laying around.
You have to admit that it's not the common case for a smartphone. The
common case is that you've played with it for a bit, turning on things
like BT/WIFI, running some apps and so on. And doing so you'll end up
having wake locks taken from everywhere, so I can hardly see a second
of suspend for Nexus.

E. g. when the wireless is connected to an AP, it takes a wake lock
which is released on 15 minutes touchscreen inactivity timeout, as far
as I can tell. So:

* the system will never hit suspend during this period;
* if the download was ongoing and had not been completed during this
period, it will be terminated.

So the bottom line is: the approach is very inflexible. Of course it
can give you the best power savings if you turn the Airplane mode on
as soon as you switched on the phone, but this is not what a typical
user would do.

~Vitaly

Brian Swetland

Jun 6, 2010, 5:30:02 AM

On Sun, Jun 6, 2010 at 1:32 AM, Vitaly Wool <vital...@gmail.com> wrote:
>>
>> It varies depending on device and usage. The battery monitoring on
>> NexusOne happens every ten minutes, so that's the longest you'll see a
>> N1 suspended for. On a G1 or Dream/myTouch you can see 20-30 minutes
>> between wakeups (depending on network issues and background data sync
>> traffic), and if you have background data sync off those devices can
>> sit in suspend for days at a time (unless you receive a phone call or
>> something). In "airplane mode", with no local alarms, a device can
>> easily sit in the lowest power state for a month or so, until the
>> battery finally runs out.
>
> That only concerns the case when you have just turned on the phone and
> left it laying around.
> You have to admit that it's not the common case for a smartphone. The
> common case is that you've played with it for a bit, turning on things
> like BT/WIFI, running some apps and so on. And doing so you'll end up
> having wake locks taken from everywhere, so I can hardly see a second
> of suspend for Nexus.

The common case for a phone is to be sitting around. Even for heavy
smartphone users, unless they power on, use the device screen-on for 4
hours solid or whatnot and drain the battery straight away, the device
is going to spend a significant portion of its operating time in
screen-off standby modes (conserving power for when you take a call,
browse the web, etc).

For typical users on typical android devices, this means the device
stays suspended for 5-10 minutes at a time, coming up for air when a
network packet (mail sync, im, etc) or alarm (battery monitor) wakes
the device briefly. Obviously with the right combination of bad apps
you will see a device suspending more rarely.

> E. g. when the wireless is connected to an AP, it takes a wake lock
> which is released on 15 minutes touchscreen inactivity timeout, as far
> as I can tell. So:
>
> * the system will never hit suspend during this period;
> * if the download was ongoing and had not been completed during this
> period, it will be terminated.

I'm pretty sure the wifi subsystem does not actually take a wakelock
while it's connected -- it does have an alarm to spin down wifi after
15 minutes (by default, and user disableable) largely due to power
inefficiencies in the wifi solution in some early devices. There's
some room for improvement here, obviously. With a decent wifi chipset
and implementation, depending on local wifi traffic patterns, you can
see power usage competitive to cellular.

> So the bottom line is: the approach is very inflexible. Of course it
> can give you the best power savings if you turn the Airplane mode on
> as soon as you switched on the phone, but this is not what a typical
> user would do.

The savings in airplane mode (apart from preventing data connections,
which saves power by preventing data-hungry background apps from doing
much) is the difference between standby with radio (3-5mA) and without
(1-2mA). I'm not suggesting that airplane mode is a typical case,
just using it as an illustration of the more extreme standby case.

Users do like that to work too -- I recall Arve leaving a device in
his filing cabinet with the radio off while he was out of the country
for three weeks once, and him discovering it was still running with
something like 25% battery remaining when he returned.

In any case, I'm saying that suspending for minutes at a time
(typical, 10s of minutes or more in some cases, hours in others), does
happen and it does represent an improvement over suspending or
otherwise entering your lowest power state for seconds at a time.

Brian

da...@lang.hm

Jun 6, 2010, 6:00:02 AM

On Sun, 6 Jun 2010, Brian Swetland wrote:

> The savings in airplane mode (apart from preventing data connections,
> which saves power by preventing data-hungry background apps from doing
> much) is the difference between standby with radio (3-5mA) and without
> (1-2mA). I'm not suggesting that airplane mode is a typical case,
> just using it as an illustration of the more extreme standby case.

for the sake of discussion, let's say that standby is 5mA and full
operation is 500mA and a minimal wakeup is 0.1 sec. these are probably
fairly pessimistic numbers.

waking up every second would be awake 10% of the time, so in an hour you
would use .9*5mA + .1*500mA = 4.5mA + 50mA = 54.5mAh

waking up every 10 seconds would be awake 1% of the time, so in an hour
you would use .99*5mA + 0.01*500mA = 4.95mA + 5mA = 9.95mAh

waking up every 100 seconds would be awake 0.1% of the time, so in an hour
you would use .999*5mA + 0.001*500mA = 4.995mA + 0.5mA = 5.495mAh

waking up every 1000 seconds would be awake 0.01% of the time so in an
hour you would use .9999*5mA + 0.0001*500mA = 4.9995mA + 0.05mA =
5.0495mAh

now if you have a 1000mAh battery (small, but reasonable for a smartphone)
your standby life would be

.1 second wakeup (on continuously) = 2 hours
1 second wakeup = 18 hours
10 second wakeup = 100 hours
100 second wakeup = 182 hours
1000 second wakeup = 198 hours

if you could shrink the time awake to 0.01 second per wakeup you would
shift this all up a category (and avoiding the need to wake everything up
to service a timer would help do this)
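The estimate above is easy to re-run with other numbers. The sketch below uses only the figures assumed in this message (5 mA standby, 500 mA awake, 0.1 s per wakeup, 1000 mAh battery); the function and parameter names are made up for illustration. Note that 0.1 * 500 mA is 50 mA, so the one-second case works out to 54.5 mAh per hour, or roughly 18 hours.

```python
def average_current_ma(period_s, awake_s=0.1, standby_ma=5.0, active_ma=500.0):
    """Average draw when the system wakes once per period_s for awake_s."""
    duty = min(1.0, awake_s / period_s)  # fraction of time spent awake
    return (1.0 - duty) * standby_ma + duty * active_ma


def standby_hours(period_s, battery_mah=1000.0, **kwargs):
    """Battery life in hours for a given wakeup period."""
    return battery_mah / average_current_ma(period_s, **kwargs)


if __name__ == "__main__":
    for period in (0.1, 1, 10, 100, 1000):
        print(f"{period:>6} s wakeup: "
              f"{average_current_ma(period):8.4f} mA average, "
              f"{standby_hours(period):6.1f} h")
    # Shrinking the awake time to 0.01 s shifts every row up one category:
    print(f"1 s wakeup, 0.01 s awake: "
          f"{standby_hours(1, awake_s=0.01):6.1f} h")
```

Passing `awake_s=0.01` reproduces the "shift this all up a category" observation: a one-second period then costs the same as a ten-second period did with 0.1 s awake.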

this effort very definitely has diminishing returns as you go to larger
sleep periods as the constant standby power draw becomes more and more
dominating. someone mentioned that they were getting the sleep time of
normal systems up past the 1 second mark with the 10 second mark looking
very attainable. that is where you get the most benefit for whatever
changes are needed. getting up to a 2 min sleep time really gives you
about all the benefit that you can get, going from there to 15 min makes
very little difference.

don't let chasing the best possible sleep time prevent you from
considering options that would be good enough in time, but would
drastically reduce the maintenance effort (as things could be upstreamed
more easily), and would be usable on far more systems.

David Lang

Thomas Gleixner

Jun 6, 2010, 6:10:02 AM

Errm. That does not matter whether it's single threaded or not. And
right, you have to prevent that it gets frozen while you are calling
into it.

But that does not change the fact that you can do finer grained power
control even in the case when suspend is impossible because a
background application has work to finish and does that without
requiring interaction with the frozen part.

That's what I pointed out in the first place and you just argue in
circles why it is impossible to do so.

Let me recapitulate:

Full on state: No difference because everything runs
Full suspend state: No difference because everything is down

Screen off, background work active:

Suspend blocker held by the active background work lets
other applications which are unrelated consume CPU cycles
and power.

versus

Frozen apps restrict the CPU cycles and power consumption to
the background work (if there are no interactions with
frozen tasks) and therefore save more power than the on/off
approach.

If your user space stack cannot be disentangled that way, then
it's a problem of your user space stack and does not change the
fact that a well designed system allows you to do that.
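As a rough illustration of the "frozen apps" variant, the sketch below freezes and thaws one group through the cgroup (v1) freezer's `freezer.state` file, which accepts `FROZEN` and `THAWED`. The group path, the `FreezerGroup` class and the `on_screen_change` hook are hypothetical names for this example, and the sketch assumes the freezer controller is mounted and that no unfrozen task is waiting on a resource held inside the group.

```python
from pathlib import Path


class FreezerGroup:
    """Thin wrapper around one cgroup-v1 freezer group directory."""

    def __init__(self, cgroup_dir):
        # e.g. /sys/fs/cgroup/freezer/untrusted (hypothetical path)
        self._state = Path(cgroup_dir) / "freezer.state"

    def freeze(self):
        self._state.write_text("FROZEN")

    def thaw(self):
        self._state.write_text("THAWED")

    def state(self):
        # On a real freezer group this may briefly read "FREEZING".
        return self._state.read_text().strip()


def on_screen_change(untrusted, screen_on):
    """Hypothetical policy hook: confine untrusted apps while the screen is off."""
    if screen_on:
        untrusted.thaw()
    else:
        untrusted.freeze()
```

Background work that lives outside the frozen group keeps running in the screen-off case, which is the difference from the on/off approach being contrasted here.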

Any objections ?

Thanks,

tglx

Vitaly Wool

Jun 6, 2010, 6:10:01 AM

On Sun, Jun 6, 2010 at 11:21 AM, Brian Swetland <swet...@google.com> wrote:
>
> The common case for a phone is to be sitting around. Even for heavy
> smartphone users, unless they power on, use the device screen-on for 4
> hours solid or whatnot and drain the battery straight away, the device
> is going to spend a significant portion of its operating time in
> screen-off standby modes (conserving power for when you take a call,
> browse the web, etc).

Sure, but my point was, some non-trivial (still kind of natural for a
smartphone) activities with the device will prevent it from suspending
for quite some time. Even worse, the suspend wakelock will keep the
whole kernel active, as opposed to powering off unused devices
separately as it's done in runtime PM. Yep, I know about the "early
suspend" type of thing; yet it's excess, not mainlined and lacks
granularity.

> For typical users on typical android devices, this means the device
> stays suspended for 5-10 minutes at a time, coming up for air when a
> network packet (mail sync, im, etc) or alarm (battery monitor) wakes
> the device briefly. Obviously with the right combination of bad apps
> you will see a device suspending more rarely.

Wasn't that you who stated that you so successfully tolerate bad apps
with opportunistic suspend that anything of the kind should not really
be the case? :)

>> E. g. when the wireless is connected to an AP, it takes a wake lock
>> which is released on 15 minutes touchscreen inactivity timeout, as far
>> as I can tell. So:
>>
>> * the system will never hit suspend during this period;
>> * if the download was ongoing and had not been completed during this
>> period, it will be terminated.
>
> I'm pretty sure the wifi subsystem does not actually take a wakelock
> while it's connected -- it does have an alarm to spin down wifi after
> 15 minutes (by default, and user disableable) largely due to power
> inefficiencies in the wifi solution in some early devices.

Oh? How does it make sure it's not powered off while scanning for APs,
for instance?

> Users do like that to work too -- I recall Arve leaving a device in
> his filing cabinet with the radio off while he was out of the country
> for three weeks once, and him discovering it was still running with
> something like 25% battery remaining when he returned.

So what you're actually up to is that a user should restart the phone
and turn the radio off if he wants to find it running when he's back
from a long business trip or something. Nice...

> In any case, I'm saying that suspending for minutes at a time
> (typical, 10s of minutes or more in some cases, hours in others), does
> happen and it does represent an improvement over suspending or
> otherwise entering your lowest power state for seconds at a time.

That's for sure, if _all_ the other parameters *are* *equal*. This is
obviously not the case.

~Vitaly

da...@lang.hm

Jun 6, 2010, 6:20:02 AM

and while it will represent an improvement, is the cost worth the
relatively minor benefit that going from 10s of seconds of sleep to 10s of
minutes of sleep gives you?

a system that wakes up every 10 seconds, but only wakes the portion of the
system needed for the wakeup can easily outlast one that wakes up far less
frequently, but when it's awake is fully awake.

as an example (taken from this thread).

system A needs to wake up to get a battery reading, store it and go
back to sleep, It does so every 10 seconds. But when it does so it only
runs the one process and then goes back to sleep.

system B has the same need, but wakes up every 10 minutes. but when it
does so it fully wakes up and this allows the mail app to power up the
radio, connect to the Internet and start checking for new mail before
opportunistic sleep shuts things down (causing the mail check to fail)

System A will last considerably longer on a battery than System B.

David Lang

Vitaly Wool

Jun 6, 2010, 6:20:02 AM

2010/6/6 <da...@lang.hm>:

> as an example (taken from this thread).
>
> system A needs to wake up to get a battery reading, store it and go back to
> sleep, It does so every 10 seconds. But when it does so it only runs the one
> process and then goes back to sleep.
>
> system B has the same need, but wakes up every 10 minutes. but when it does
> so it fully wakes up and this allows the mail app to power up the radio,
> connect to the Internet and start checking for new mail before opportunistic
> sleep shuts things down (causing the mail check to fail)
>
> System A will last considerably longer on a battery than System B.

Exactly, thanks for pointing out the specific example :)

~Vitaly

Thomas Gleixner

Jun 6, 2010, 6:40:02 AM

On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
> 2010/6/5 Thomas Gleixner <tg...@linutronix.de>:

> > On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
> >> 2010/6/5 Thomas Gleixner <tg...@linutronix.de>:

> >> > On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
> >> >> >> > That download might take a minute or two, but that's not a
> >> >> >> > justification for the crapplication to run unconfined and prevent
> >> >> >> > lower power states.
> >> >> >> >
> >> >> >>
> >> >> >> I agree, but this is not a simple problem to solve.
> >> >> >
> >> >> > Not with suspend blockers, but with cgroup confinement of crap, it's
> >> >> > straight forward.
> >> >> >
> >> >>
> >> >> I don't think it is straightforward. If a process in the frozen
> >> >> group holds a resource that a process in the unfrozen group needs, how
> >> >> do we deal with that?
> >> >
> >> > I'm going to fix the framework which puts the group into freeze state
> >> > w/o making sure that there is no held shared resource. Come on it's
> >> > not rocket science.
> >> >
> >>
> >> I'm not sure which framework you are talking about here, but I don't
> >> think there is a single framework that knows about all shared
> >> resources.
> >
> > Damn, it's not me talking about "our framework", you are mentioning
> > when it fits your needs.
>
> You said you were going to fix the framework. I didn't know if you were
> talking about the cgroup framework, or the android user-space
> frameworks. I don't think either has knowledge about all shared
> resources.

The cgroup freezer makes sure that there are no in kernel resources
blocked. Of course the user space side has to do the same and it's not
rocket science.

> >

> > If you do not have a clearly defined user space framework, then we
> > talk about a completely random conglomeration of applications which
> > need to be brought into submission by some global brute force
> > approach.
> >
> > I'm tired of this, really. You just use terminology as it fits to
> > defend the complete design failure of android. But you fail to trick
> > me :)
> >
> > Can you please explain in a consistent way how the application stack
> > and the underlying framework (which exists according to android docs)
> > is handling events and how the separation of trust level works ?
> >
>
> I don't think I can, since I only know small parts of it. I know some

Sigh. That's the main reason why this discussion goes nowhere.

How in heaven's sake can we make a decision whether suspend blockers
are the right and only way to go, when the people

Florian Mickler

Jun 6, 2010, 6:50:02 AM

On Sun, 6 Jun 2010 12:00:47 +0200
Vitaly Wool <vital...@gmail.com> wrote:

> Even worse, the suspend wakelock will keep the
> whole kernel active, as opposed to powering off unused devices
> separately as it's done in runtime PM.

That is not true. While the kernel is not suspended it does
runtime pm.

> > Users do like that to work too -- I recall Arve leaving a device in
> > his filing cabinet with the radio off while he was out of the country
> > for three weeks once, and him discovering it was still running with
> > something like 25% battery remaining when he returned.
>
> So what you're actually up to is that a user should restart the phone
> and turn the radio off if he wants to find it running when he's back
> from a long business trip or something. Nice...

?

Cheers,
Flo

Florian Mickler

Jun 6, 2010, 7:00:02 AM

On Sun, 6 Jun 2010 12:19:08 +0200
Vitaly Wool <vital...@gmail.com> wrote:

> 2010/6/6 <da...@lang.hm>:
>
> > as an example (taken from this thread).
> >
> > system A needs to wake up to get a battery reading, store it and go back to
> > sleep, It does so every 10 seconds. But when it does so it only runs the one
> > process and then goes back to sleep.
> >
> > system B has the same need, but wakes up every 10 minutes. but when it does
> > so it fully wakes up and this allows the mail app to power up the radio,
> > connect to the Internet and start checking for new mail before opportunistic
> > sleep shuts things down (causing the mail check to fail)
> >
> > System A will last considerably longer on a battery than System B.
>
> Exactly, thanks for pointing out the specific example :)
>
> ~Vitaly

This does not affect suspend_blockers nor does suspend_blockers
interfere with that.

Suspend_blockers allow the system to suspend ("mem">/sys/power/state
suspend), when the userspace decides that the device is not in use.

So implementing suspend_blockers support does not impact any
optimizations done to either system A nor system B.
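The policy described here (suspend once user space decides the device is not in use and nothing blocks it) can be sketched as below. This is a simplified user-space model, not the proposed kernel API: `OpportunisticSuspend`, its methods and the blocker names are hypothetical, and the model ignores the races that a kernel-side interface is meant to close. The only real interface used is writing "mem" to /sys/power/state.

```python
class OpportunisticSuspend:
    """Toy model: suspend when idle is requested and no blocker is held."""

    def __init__(self, suspend):
        self._held = set()       # names of currently held suspend blockers
        self._idle = False       # user-space decision: device not in use
        self._suspend = suspend  # callable that actually suspends

    def block(self, name):
        self._held.add(name)

    def unblock(self, name):
        self._held.discard(name)
        self._maybe_suspend()

    def set_idle(self, idle):
        self._idle = idle
        self._maybe_suspend()

    def _maybe_suspend(self):
        if self._idle and not self._held:
            self._suspend()


def write_mem_to_sysfs():
    """Request suspend-to-RAM; needs root on a real system."""
    with open("/sys/power/state", "w") as f:
        f.write("mem")


if __name__ == "__main__":
    # Use a harmless stand-in instead of write_mem_to_sysfs for the demo.
    policy = OpportunisticSuspend(lambda: print("would suspend now"))
    policy.block("download")
    policy.set_idle(True)       # nothing happens: a blocker is held
    policy.unblock("download")  # last blocker gone, suspend is requested
```

In this model the blockers only ever delay a suspend that user space has already asked for; they never initiate one themselves.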

Cheers,
Flo

Alan Cox

Jun 6, 2010, 7:00:03 AM

On Sun, 6 Jun 2010 12:46:01 +0200
Florian Mickler <flo...@mickler.org> wrote:

> On Sun, 6 Jun 2010 12:00:47 +0200
> Vitaly Wool <vital...@gmail.com> wrote:
>
> > Even worse, the suspend wakelock will keep the
> > whole kernel active, as opposed to powering off unused devices
> > separately as it's done in runtime PM.
>
> That is not true. While the kernel is not suspended it does
> runtime pm.

On several of our platforms runtime PM already includes suspend so a
suspend wakelock does interfere with existing power management at that
level (not to mention the maintenance mess it causes).

This is one of the reasons you want QoS information, it provides
parameters by which the power management code can make a decision.
Suspend blockers simply don't have sufficient variety to manage the
direction of power policy.

If Android chooses to abuse the QoS information for crude suspend
blocking then that is fine, it doesn't interfere with doing the job
'properly' on other systems or its use for realtime work on other boxes.

Alan

Thomas Gleixner

Jun 6, 2010, 7:00:02 AM

On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
> 2010/6/5 Thomas Gleixner <tg...@linutronix.de>:
> >

> > Can you please explain in a consistent way how the application stack
> > and the underlying framework (which exists according to android docs)
> > is handling events and how the separation of trust level works ?
> >
>
> I don't think I can, since I only know small parts of it. I know some

Sigh, that's the whole reason why this discussion goes nowhere.

How in heaven's sake should we be able to decide whether suspend
blockers are the right and only thing which solves a problem, when the
folks advocating suspend blockers are not able to explain the problem
in the first place ?

> events like input events go through a single thread in our system
> process, while other events like network packets (which are also
> wakeup events) go directly to the app.

Yes, we know that already, but that's completely useless information
as it does not describe the full constraints and dependencies.

Lemme summarize:

Android needs suspend blockers, because it works, but cannot explain
why it works and why it only works that way.

A brilliant argument to merge them - NOT.

Thanks,

tglx

Vitaly Wool

Jun 6, 2010, 7:10:02 AM

2010/6/6 Florian Mickler <flo...@mickler.org>:

> Suspend_blockers allow the system to suspend ("mem">/sys/power/state
> suspend), when the userspace decides that the device is not in use.

Sorry. What? Blockers allow the system to suspend?

> So implementing suspend_blockers support does not impact any
> optimizations done to either system A nor system B.

Suspend blockers by themselves are of no use. Completely. So any talks
on suspend blockers separated from the sleep policy are completely
pointless.
The suspend blockers are of use when the userspace tries to blindly
freeze the tasks to enter the suspend state. This way of hammering the
system down obviously impacts everything.

~Vitaly

Felipe Contreras

Jun 6, 2010, 7:20:02 AM

2010/6/6 <da...@lang.hm>:

Not to mention the fact that there's nothing fundamental that prevents
dynamic PM to reach > 15 min idle. It's a matter of time before we
find the tools needed. The amount of work that suspend blockers would
require to implement properly in user-space other than Android just
doesn't match the power savings.

--
Felipe Contreras

Felipe Contreras

Jun 6, 2010, 7:20:01 AM

2010/6/6 Arjan van de Ven <ar...@infradead.org>:
> On Sat, 5 Jun 2010 14:26:14 -0700
> Arve Hjønnevåg <ar...@android.com> wrote:
>> > the kernel has a set of infrastructure already to help here (range
>> > timers, with which you can wakeup-limit untrusted userspace crap),
>> > timer slack for legacy background timers, etc etc.
>>
>> Range timers allows the kernel to align different timers so they don't
>> each bring the cpu out of idle individually. They do not eliminate
>> timers or make individual timers fire less often.
>
> you're incorrect.
> With range timers you can control the rate at which timers fire just
> fine.

I was wondering... Currently GLib user-space aligns itself to fire
burst of work at second boundaries without the need for IPC. But if
you want to align beyond one second you need multi-process alignment.
Say, one application says: wake me up between 30s and 1m. And the
other one says: wake me up between 10m and 20m. They could very well
align at some point if there was a central process keeping track of
all the timers.

Does the kernel provide something to solve that problem already?
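To make the question concrete, here is a small simulation of what such a central process could do with recurring "wake me between X and Y" requests. It is only a user-space sketch under stated assumptions: `RangeTimer` and `simulate` are hypothetical names, each timer's window is measured from its own previous firing, and the coordinator sleeps until the earliest deadline and then fires every timer whose window has already opened.

```python
from dataclasses import dataclass


@dataclass
class RangeTimer:
    name: str
    min_delay: float        # earliest acceptable delay after last firing (s)
    max_delay: float        # latest acceptable delay after last firing (s)
    last_fired: float = 0.0

    @property
    def earliest(self):
        return self.last_fired + self.min_delay

    @property
    def deadline(self):
        return self.last_fired + self.max_delay


def simulate(timers, horizon):
    """Return [(wake_time, [timer names fired])] for wakeups up to horizon."""
    wakeups = []
    while True:
        # Sleep as long as possible: until the first hard deadline.
        t = min(timer.deadline for timer in timers)
        if t > horizon:
            break
        # Fire everything whose window is already open at that moment.
        due = [timer for timer in timers if timer.earliest <= t]
        for timer in due:
            timer.last_fired = t
        wakeups.append((t, [timer.name for timer in due]))
    return wakeups


if __name__ == "__main__":
    timers = [RangeTimer("A", 30, 60), RangeTimer("B", 600, 1200)]
    for when, names in simulate(timers, horizon=1200):
        print(f"t={when:6.0f}s fire {names}")
```

With the two ranges from the example (30 s to 1 min, and 10 min to 20 min), the second timer never causes a wakeup of its own in this model: it rides along with a wakeup the first timer needed anyway.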

da...@lang.hm

Jun 6, 2010, 7:20:02 AM

On Sun, 6 Jun 2010, Florian Mickler wrote:

> On Sun, 6 Jun 2010 12:19:08 +0200
> Vitaly Wool <vital...@gmail.com> wrote:
>
>> 2010/6/6 <da...@lang.hm>:
>>
>>> as an example (taken from this thread).
>>>
>>> system A needs to wake up to get a battery reading, store it and go back to
>>> sleep. It does so every 10 seconds. But when it does so it only runs the one
>>> process and then goes back to sleep.
>>>
>>> system B has the same need, but wakes up every 10 minutes. But when it does
>>> so it fully wakes up and this allows the mail app to power up the radio,
>>> connect to the Internet and start checking for new mail before opportunistic
>>> sleep shuts things down (causing the mail check to fail)
>>>
>>> System A will last considerably longer on a battery than System B.
>>
>> Exactly, thanks for pointing out the specific example :)
>>
>> ~Vitaly
>
> This does not affect suspend_blockers nor does suspend_blockers
> interfere with that.
>
> Suspend_blockers allow the system to suspend ("mem">/sys/power/state
> suspend), when the userspace decides that the device is not in use.
>
> So implementing suspend_blockers support does not impact any
> optimizations done to either system A or system B.

Actually, it does.

System A is what's being proposed by kernel developers, where the
untrusted stuff is in a different cgroup and what puts the system to sleep
is 'normal' power management. It doesn't sleep as long, but when it wakes
up the untrusted stuff is still frozen, so it doesn't stay awake long, or
do very much.

System B is suspend blockers, where you are either awake or asleep, and
when you wake up you wake up fully, but opportunistic sleep can interrupt
untrusted processes at any time. The system sleeps longer (as fewer things
can wake it), but when it wakes up it's fully awake.

David Lang

Alan Stern

Jun 6, 2010, 7:20:02 AM6/6/10

to

On Sat, 5 Jun 2010, Alan Stern wrote:

> > If you are referring to the approach that we don't use suspend but
> > freeze a cgroup instead, this only solves the problem of bad apps. It
> > does not help pause timers in trusted user space code and in the
> > kernel, so it does not lower our average power consumption.
>
> You can solve this problem if you restructure your "trusted" apps in
> the right way. Require a trusted app to guarantee that whenever it
> doesn't hold any suspend blockers, it will do nothing but wait (in a
> poll() system call for example) for a wakeup event. When the event
> occurs, it must then activate a suspend blocker.
>
> Better yet, make it more fine-grained. Instead of trusted apps, have
> trusted threads. Freeze the untrusted threads along with everything
> else, and require the trusted threads to satisfy this guarantee.
>
> In this way, while the system is idle no user timers will get renewed.
> Kernel timers are another matter, but we should be able to handle them.
> There's nothing Android-specific about wanting to reduce kernel timer
> wakeups while in a low-power mode.

In fact it's possible to do this with only minimal changes to
userspace, provided you can specify all your possible hardware wakeup
sources. (On Android this list probably isn't very large -- I
imagine it includes the keypad, the radio link(s), the RTC, and maybe
a few switches, buttons, or other things.)

Here's how you can do it. Extend the userspace suspend-blocker API, so
that each suspend blocker can optionally have an associated wakeup
source.

The power-manager process should keep a list of "active" wakeup
sources. A source gets removed from the list when an associated
suspend blocker is activated.

When the "active" list is empty and no suspend blockers are activated,
the power manager freezes ALL other processes, trusted and untrusted
alike. It then does a big poll() on all the wakeup sources. When the
poll() returns, its output is used to repopulate the "active" list and
processes are unfrozen.

(You can also include some error detection: If a source remains on the
"active" list for too long then something has gone wrong.)

To do all this you don't even need to use cgroups. The existing PM
implementation allows a user process to freeze everything but itself;
that's how swsusp and related programs work.

This is still a big-hammer sort of approach, but it doesn't require any
kernel changes.
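
The scheme above can be sketched in user space roughly as follows. This
is an illustration under stated assumptions, not an existing Android or
kernel API: the PowerManager class and its method names are hypothetical,
wakeup sources are modelled as readable file descriptors, and the freeze
and thaw callbacks stand in for whatever mechanism actually freezes and
unfreezes the other processes.

```python
import select


class PowerManager:
    """Hypothetical power manager tracking "active" wakeup sources."""

    def __init__(self, sources, freeze, thaw):
        self.sources = dict(sources)  # source name -> readable fd
        self.freeze = freeze          # assumed callback: freeze all other processes
        self.thaw = thaw              # assumed callback: unfreeze them again
        self.active = set()           # sources that fired and are not yet claimed
        self.blockers = {}            # blocker name -> {"source", "held"}

    def register_blocker(self, name, source=None):
        # A suspend blocker may optionally be tied to one wakeup source.
        self.blockers[name] = {"source": source, "held": False}

    def activate(self, name):
        blocker = self.blockers[name]
        blocker["held"] = True
        # Activating an associated blocker removes its source from the list.
        if blocker["source"] is not None:
            self.active.discard(blocker["source"])

    def release(self, name):
        self.blockers[name]["held"] = False

    def can_sleep(self):
        held = any(b["held"] for b in self.blockers.values())
        return not self.active and not held

    def sleep_once(self, timeout_ms=None):
        """Freeze everything, poll() all wakeup sources, then thaw."""
        if not self.can_sleep():
            return False
        poller = select.poll()
        fd_to_name = {}
        for name, fd in self.sources.items():
            poller.register(fd, select.POLLIN)
            fd_to_name[fd] = name
        self.freeze()
        try:
            events = poller.poll(timeout_ms)
            # Repopulate the "active" list before anything runs again.
            self.active = {fd_to_name[fd] for fd, _ in events}
        finally:
            self.thaw()
        return True
```

In this sketch a process that owns a blocker tied to, say, the keypad
calls activate() once it starts handling the event, which clears the
keypad from the "active" list, and release() when it is done. The error
detection mentioned above (a source staying "active" for too long) is
left out to keep the example short.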