TL;DR: The process priority manager has accumulated a fair amount of
technical baggage and needs an overhaul. I'm proposing a set of changes
here, along with some historical perspective and the rationale for each.
This area is not widely known, so if you're unfamiliar with it you may
still want to read on to learn how FxOS application priorities are
handled.
Hi everybody,
I've been sitting on some ideas on how to improve our priority
management in FxOS and I think it's time to start discussing the topic
because some choices we made in the original design have proven to be
fragile and we could use some improvements.
A little bit of history first. The process priority manager
(dom/ipc/ProcessPriorityManager.[h|cpp]) was designed in the early days
of the project with two roles in mind: to maximize application
responsiveness by adjusting CPU priorities, and to keep relevant
applications alive by adjusting low-memory killer (LMK) scores.
The earliest design assigned both CPU priorities and LMK scores based on
application visibility (favoring foreground applications) and wakelocks;
neither was affected by the state of other applications.
As we came close to delivering our first version, a number of usability
bugs cropped up. They forced us to account for scenarios in which all
applications but one should be deprioritized (e.g. an incoming call) and
to make exceptions for certain "special" applications.
This led us to partially split CPU priorities from LMK scores (but not
really) and to special-case the homescreen (which has higher priority
than any other background app), the preallocated process (which we
don't want to kill because of memory pressure) and the keyboard (which
is considered in the foreground when visible, but less important than
the app itself). Once we were done, the process priority manager had
become a collection of corner cases and obscure logic that made it hard
to modify or improve (e.g. see the fallout of bug 874353 [1]).
So I'm proposing a few radical changes to reduce the complexity of the
manager and improve some aspects of it that are currently lacking IMHO.
1) First of all I'd like to *really* decouple the LMK scores and CPU
priority levels. This should simplify the logic and spare us the hacks
(and the related assertions) we put in place to ensure we don't exceed
the limit of 6 free-memory thresholds that the kernel imposes on us.
Priority changes should probably also be computed on a global basis so
that we know early on whether we're affecting only one process or
multiple ones.
2) Currently the manager more or less assumes that there's only one
foreground app. If it needs to be special (e.g. the dialer during a
call) it can get higher priority. This doesn't map very well to
scenarios where multiple apps are visible (because of an attention
screen, or simply because we don't set the visibility flag correctly,
see bug 892371 [2] and bug 846850 [3]). If multiple apps are considered
to be in the foreground they all get the same priority, which usually
leads the LMK to kill the largest of them when we run out of memory.
This is almost always the *wrong* choice, as the largest application is
often the one being actively used. To mitigate this issue I'd like to
keep foreground apps in an LRU queue with decreasing priority, like we
do with background apps. This should make us pick the right app to kill
when we run out of memory without requiring further information from
Gaia.
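To make point 2 a bit more concrete, here's a minimal sketch in Python
of what a foreground LRU could look like. This is not Gecko code: the
class name, the base score, the step and the app names are all made up
for illustration, and the only assumption carried over from the LMK is
that a higher score gets killed first.

```python
from collections import OrderedDict


class ForegroundLru:
    """Tracks foreground apps from most to least recently used."""

    def __init__(self, base_score=0, step=1):
        # base_score and step are illustrative values, not real defaults.
        self._apps = OrderedDict()
        self._base_score = base_score
        self._step = step

    def touch(self, app):
        """Mark `app` as the most recently used foreground app."""
        self._apps[app] = True
        self._apps.move_to_end(app, last=False)  # move to the front

    def remove(self, app):
        """Drop `app` when it leaves the foreground or exits."""
        self._apps.pop(app, None)

    def scores(self):
        """Map each app to a kill score; a higher score is killed first."""
        return {
            app: self._base_score + position * self._step
            for position, app in enumerate(self._apps)
        }


lru = ForegroundLru()
lru.touch("gallery")  # visible app
lru.touch("dialer")   # attention screen shown on top of it
kill_scores = lru.scores()
first_victim = max(kill_scores, key=kill_scores.get)
```

With both apps visible, the one touched last ends up with the lowest
score, so the victim is chosen by recency rather than by size.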
3) Remove the special class that makes the homescreen process more
important than other processes and rely on LRU priorities for background
apps. Why? Because the current approach is both suboptimal and leads to
all sorts of problems.
For example, when the device screen is turned off the foreground app
becomes hidden and is thus sent into the background. If memory is tight
the LMK will look for an app to kill and will usually pick the large,
former foreground app since it's now less important than the
homescreen. The user will turn the phone on again and land on the
homescreen with no clue as to why their app was closed.
Similarly, with the new navigation paradigm of swiping between
applications the older applications will be killed before the
homescreen. That's silly: if the user is not using the homescreen to
switch between apps and is just sliding between them, we should keep
the apps open, not the homescreen.
So how do we solve this? By making the homescreen a regular app and
relying on the LRU nature of background priorities to keep it alive. If
the user is using the homescreen to switch between apps, the LRU logic
will always consider it the most important of the background apps and
the LMK will kill the other apps before killing it, effectively
reproducing the current behavior but without special-casing it. If on
the other hand the user is swiping between applications, the homescreen
will drop to the least important priority and be killed, freeing memory
for the apps the user is actively using. Both scenarios should be equal
to or better than what we have now.
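The two scenarios in point 3 can be checked with a toy simulation. The
Python sketch below is only a model of the idea: the function name, the
event lists and the app names are hypothetical, and it assumes the LMK
kills background apps strictly in least-recently-used order.

```python
def background_kill_order(switch_events):
    """Return background apps in the order the LMK would kill them.

    `switch_events` lists apps in the order they came to the foreground.
    The last entry is the current foreground app and is excluded; the
    remaining apps are ordered least recently used first.
    """
    if not switch_events:
        return []
    last_used = {}
    for tick, app in enumerate(switch_events):
        last_used[app] = tick  # later ticks overwrite earlier ones
    foreground = switch_events[-1]
    background = [app for app in last_used if app != foreground]
    return sorted(background, key=lambda app: last_used[app])


# The user goes back to the homescreen between every app launch.
via_homescreen = ["homescreen", "mail", "homescreen", "maps",
                  "homescreen", "camera"]
# The user launches one app, then swipes directly between apps.
via_swiping = ["homescreen", "mail", "maps", "camera",
               "maps", "mail", "camera"]

order_via_homescreen = background_kill_order(via_homescreen)
order_via_swiping = background_kill_order(via_swiping)
```

In the first case the homescreen comes last in the kill order, which
matches today's behavior; in the second it comes first, so the apps the
user is actually swiping between survive longer.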
4) Compute the LMK memory thresholds automatically whenever possible. We
currently configure the levels of free memory at which applications
should be killed in the prefs. This is a very inflexible approach: it
requires vendors to fine-tune the values for their devices or settle for
suboptimal ones (the defaults were meant for 256MiB devices with
~180MiB of memory available to userspace applications). Instead we could
compute those values as a percentage of the available memory and scale
them automatically (using the current values as references for 256MiB
devices since they work well there). We might want to keep the
mechanism to override them via prefs, but in general it would be better
if the defaults were set automatically.
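As a rough illustration of point 4, the scaling could look something
like the Python sketch below. The level names and the threshold values
are placeholders I made up for the example, not the real pref defaults,
and the ~180MiB figure is treated as an approximate reference. Scaling
linearly from the reference device is the same as expressing each
threshold as a fixed percentage of available memory.

```python
# Approximate memory available to userspace on the 256MiB reference
# device, as quoted above. Treat it as a rough figure.
REFERENCE_AVAILABLE_MIB = 180

# Placeholder kill thresholds (KiB of free memory) for the reference
# device. These are NOT the real pref defaults.
REFERENCE_THRESHOLDS_KIB = {
    "foreground": 4096,
    "background_perceivable": 8192,
    "background": 16384,
}


def scaled_thresholds(available_mib, overrides=None):
    """Scale the reference thresholds to a device's available memory.

    `overrides` stands in for values a vendor sets explicitly via
    prefs; they win over the computed defaults.
    """
    if available_mib <= 0:
        raise ValueError("available_mib must be positive")
    scale = available_mib / REFERENCE_AVAILABLE_MIB
    thresholds = {
        level: round(kib * scale)
        for level, kib in REFERENCE_THRESHOLDS_KIB.items()
    }
    thresholds.update(overrides or {})
    return thresholds


# A device with twice the reference amount of available memory.
doubled = scaled_thresholds(360)
# Same device, with one level pinned by a (hypothetical) vendor pref.
pinned = scaled_thresholds(360, overrides={"background": 20000})
```

Keeping the override path means vendors who have already tuned their
values lose nothing, while everyone else gets sensible defaults.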
So this is it, more or less. Sorry for the lengthy e-mail but it's two
years' worth of brain dump. For those interested, we've got a metabug
for this: bug 994518 [4]. Comments are *very* welcome.
Gabriele
[1] Remove CPU wake lock control from ContentParent
https://bugzilla.mozilla.org/show_bug.cgi?id=874353
[2] [Activity] Adjust OOM score of activity opener
https://bugzilla.mozilla.org/show_bug.cgi?id=892371
[3] [Unagi][Camera][Activity][Contacts] App doesn't get
'mozvisibilitychange' event when running as an activity
https://bugzilla.mozilla.org/show_bug.cgi?id=846850
[4] [B2G] Improve priority manager of B2G for better performance
https://bugzilla.mozilla.org/show_bug.cgi?id=994518