
Current state of performance in Gaia


Eli Perelman

Oct 2, 2015, 10:52:03 AM
to dev-...@lists.mozilla.org
Hello fxos,

With deadlines for v2.5 approaching, I thought I would take a couple of minutes and summarize the current state of performance for Gaia. At the outset of v2.5 we captured metrics for v2.2 and have used those as the baseline to determine whether applications have regressed since. Any application whose performance has significantly regressed since v2.2 will need approval not to block the release, as major increases will block v2.5.

Enough of the chatter, here's the data:

Calendar v2.2 cold launch: 1454ms
Calendar current cold launch: 1638ms (~180ms regression)
Calendar v2.2 USS: 14.01MB
Calendar current USS: 13.99MB (good)

Camera v2.2 cold launch: 1492ms
Camera current cold launch: 2090ms (~600ms regression)
Camera v2.2 USS: 13.83MB
Camera current USS: 16.05MB (~2.2MB regression)

Clock v2.2 cold launch: 1232ms
Clock current cold launch: 1260ms (acceptable)
Clock v2.2 USS: 13.98MB
Clock current USS: 14.95MB (~1MB regression)

Contacts v2.2 cold launch: 773ms
Contacts current cold launch: 1246ms (~475ms regression)
Contacts v2.2 USS: 18.26MB
Contacts current USS: 20.04MB (~1.75MB regression)

Dialer v2.2 cold launch: 851ms
Dialer current cold launch: 944ms (~90ms regression, still under 1000ms)
Dialer v2.2 USS: 17.48MB
Dialer current USS: 13.04MB (good!)

Email v2.2 cold launch: 2129ms
Email current cold launch: 606ms (good!)
Email v2.2 USS: 16.17MB
Email current USS: 15.78MB (good)

FM v2.2 cold launch: 604ms
FM current cold launch: 783ms (~175ms regression)
FM v2.2 USS: 10.37MB
FM current USS: 10.51MB (acceptable)

Gallery v2.2 cold launch: 1113ms
Gallery current cold launch: 1207ms (~90ms regression)
Gallery v2.2 USS: 17.71MB
Gallery current USS: 18.98MB (~1.25MB regression)

Music v2.2 cold launch: 1066ms
Music current cold launch: 1717ms (~650ms regression)
Music v2.2 USS: 13.37MB
Music current USS: 29.49MB (~16.12MB regression)

SMS v2.2 cold launch: 1340ms
SMS current cold launch: 1630ms (~290ms regression)
SMS v2.2 USS: 12.86MB
SMS current USS: 19.94MB (~7MB regression)

Settings v2.2 cold launch: 2474ms
Settings current cold launch: 2950ms (~475ms regression)
Settings v2.2 USS: 17.18MB
Settings current USS: 17.54MB (acceptable)

Video v2.2 cold launch: 1115ms
Video current cold launch: 1309ms (~190ms regression)
Video v2.2 USS: 12.13MB
Video current USS: 13MB (acceptable)

TL;DR: there seem to be quite a few serious regressions across many applications, in both cold launch time and USS memory usage. For comparison, the Test Startup Limit app started off in the 880ms range when first captured, spent a good chunk of June and July around 620ms, and is now around 850ms.

If anyone has any questions about the data or needs additional information, please let me know.

Also, kudos to the Email team for the massive improvement in both launch time and memory.

Thanks,

Eli Perelman

Wilson Page

Oct 2, 2015, 11:37:49 AM
to Eli Perelman, James Burke, dev-...@lists.mozilla.org
Wow! Can email share what changes they made to get such a big improvement?

W I L S O N  P A G E

Front-end Developer
Firefox OS (Gaia)
London Office

Twitter: @wilsonpage
IRC: wilsonpage




Gareth Aye

Oct 2, 2015, 11:54:46 AM
to Wilson Page, James Burke, dev-...@lists.mozilla.org, Eli Perelman
muy cache!

Fabrice Desré

Oct 2, 2015, 12:02:10 PM
to dev-...@lists.mozilla.org
All the nga apps (music, contacts, sms) show significant regressions. Is
that only a lack of optimizations in these apps, in the bridge they all
use or design flaws in nga itself?
In any case, we have to stop porting new apps to nga until these
questions are answered.

Fabrice
--
Fabrice Desré
b2g team
Mozilla Corporation

Eli Perelman

Oct 2, 2015, 12:02:38 PM
to Gareth Aye, Wilson Page, James Burke, dev-...@lists.mozilla.org
If you would like to use Raptor to start performance testing your apps to get these numbers down, it's all documented on MDN [1]. These numbers were captured on a Flame-KK with 319MB of memory and the light reference workload, which is the baseline device for v2.2 -> v2.5. Raptor does require Node v0.12 [2], so if you find you need to switch between Node 0.10 and 0.12 for Gaia, I recommend something like "n" [3] to switch between them easily.

[1] https://developer.mozilla.org/en-US/Firefox_OS/Automated_testing/Raptor
[2] https://developer.mozilla.org/en-US/Firefox_OS/Automated_testing/Raptor#Prerequisites
[3] https://www.npmjs.com/package/n

Thanks,

Eli


On Fri, Oct 2, 2015 at 10:54 AM, Gareth Aye <garet...@gmail.com> wrote:
muy cache!

On Fri, Oct 2, 2015 at 11:37 AM, Wilson Page <wp...@mozilla.com> wrote:
Wow! Can email share what changes they made to get such a big improvement?

W I L S O N  P A G E

Front-end Developer
Firefox OS (Gaia)
London Office

Twitter: @wilsonpage
IRC: wilsonpage


Gareth Aye

Oct 2, 2015, 12:22:50 PM
to Fabrice Desré, dev-...@lists.mozilla.org
I don't think there's anything fundamental to our new app design model that would prevent us from starting up very quickly. It's just that we're going to have to work on deferring computation and resource loading until after the first render, like we did before the NGA effort.
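For anyone unfamiliar with that pattern, here is a minimal sketch of the idea; the element ID, file name, and helper names are hypothetical, not taken from a real Gaia app:

// Render the bare minimum first, then defer everything else.
function renderFirstView() {
  document.getElementById('main-view').hidden = false; // hypothetical element
}

function deferredWork() {
  // Anything not needed for the first paint goes here:
  // opening IndexedDB, prefetching secondary views, etc.
  var script = document.createElement('script');
  script.src = 'js/secondary-views.js'; // hypothetical lazy-loaded bundle
  document.head.appendChild(script);
}

renderFirstView();

// requestAnimationFrame + setTimeout lets the first frame reach the
// screen before the heavy work starts.
requestAnimationFrame(function() {
  setTimeout(deferredWork, 0);
});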

On Fri, Oct 2, 2015 at 12:01 PM, Fabrice Desré <fab...@mozilla.com> wrote:
All the nga apps (music, contacts, sms) show significant regressions. Is
that only a lack of optimizations in these apps, in the bridge they all
use or design flaws in nga itself?
In any case, we have to stop porting new apps to nga until these
questions are answered.

        Fabrice
--
Fabrice Desré
b2g team
Mozilla Corporation

Reza Akhavan

Oct 2, 2015, 12:30:45 PM
to Eli Perelman, Wilson Page, James Burke, Gareth Aye, dev-...@lists.mozilla.org
I’ve been using nvm [1] to switch versions of node/npm. Sharing here as an alternative to n.




On Oct 2, 2015, at 9:02 AM, Eli Perelman <eper...@mozilla.com> wrote:

If you would like to use Raptor to start performance testing your apps to get these numbers down, it's all documented on MDN [1]. These numbers were captured on a Flame-KK with 319MB of memory and the light reference workload, which is the baseline device for v2.2 -> v2.5. Raptor does require Node v0.12 [2], so if you find you need to switch between Node 0.10 and 0.12 for Gaia, I recommend something like "n" [3] to switch between them easily.

[1] https://developer.mozilla.org/en-US/Firefox_OS/Automated_testing/Raptor
[2] https://developer.mozilla.org/en-US/Firefox_OS/Automated_testing/Raptor#Prerequisites
[3] https://www.npmjs.com/package/n

Thanks,

Eli

On Fri, Oct 2, 2015 at 10:54 AM, Gareth Aye <garet...@gmail.com> wrote:
muy cache!

On Fri, Oct 2, 2015 at 11:37 AM, Wilson Page <wp...@mozilla.com> wrote:
Wow! Can email share what changes they made to get such a big improvement?

W I L S O N  P A G E

Front-end Developer
Firefox OS (Gaia)
London Office

Twitter: @wilsonpage
IRC: wilsonpage


Justin D'Arcangelo

Oct 2, 2015, 12:31:35 PM
to Fabrice Desré, dev-...@lists.mozilla.org
I feel like this is the 3rd or 4th time I’ve had to give this explanation, but at least in the case of Music NGA, we merely landed a completely new, feature-complete app this week. The optimization phase had not yet begun, hence the increase in the perf numbers. However, prior to landing, I *did* run Raptor every day for the past 2 weeks on Flame. In my Raptor results, Music NGA was coming out ~500ms *faster* than the old app. However, as I noted in the bug, I do not trust those numbers because of the OS-wide perf regression that was causing *both* Music apps to take about 3-4 seconds to launch.

This week, the focus has been mainly on identifying and quickly addressing any bugs that came up after the initial testing of the app. I feel that we have things somewhat under control as far as broken functionality goes. Yesterday, we started working on optimizations. There are several areas where we are completely unoptimized at the moment:

- album art caching/loading
- thumbnail sizes
- script loading
- view caching

All of these items will address memory usage, startup time, or both. So, please do not assume that we spent weeks optimizing the app before landing this week. We merely reached a state of “feature-complete” with a new codebase. We hope to meet or beat the prior app’s numbers before the v2.5 deadline.
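To give a concrete flavor of the first item, here is a rough sketch of a downscale-once-and-cache approach to album art. This is an illustration of the general technique, not the actual Music patch; the size constant and callback shape are assumptions:

var THUMB_SIZE = 128; // assumed thumbnail dimension

// Decode the full-size art once and produce a small blob that can be
// stored (e.g. in IndexedDB) and reused on subsequent launches instead
// of re-decoding the original image every time.
function makeThumbnail(fullSizeBlob, callback) {
  var url = URL.createObjectURL(fullSizeBlob);
  var img = new Image();
  img.onload = function() {
    var canvas = document.createElement('canvas');
    canvas.width = canvas.height = THUMB_SIZE;
    canvas.getContext('2d').drawImage(img, 0, 0, THUMB_SIZE, THUMB_SIZE);
    URL.revokeObjectURL(url);
    canvas.toBlob(callback, 'image/jpeg', 0.7); // small and cheap to re-decode
  };
  img.src = url;
}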

Thanks!

-Justin


> On Oct 2, 2015, at 12:01 PM, Fabrice Desré <fab...@mozilla.com> wrote:
>
> All the nga apps (music, contacts, sms) show significant regressions. Is
> that only a lack of optimizations in these apps, in the bridge they all
> use or design flaws in nga itself?
> In any case, we have to stop porting new apps to nga until these
> questions are answered.
>
> Fabrice
>
> --
> Fabrice Desré
> b2g team
> Mozilla Corporation

James Burke

Oct 2, 2015, 12:33:40 PM
to Gareth Aye, Wilson Page, dev-...@lists.mozilla.org, Eli Perelman
On Fri, Oct 2, 2015 at 8:54 AM, Gareth Aye <garet...@gmail.com> wrote:
> muy cache!
>
> On Fri, Oct 2, 2015 at 11:37 AM, Wilson Page <wp...@mozilla.com> wrote:
>>
>> Wow! Can email share what changes they made to get such a big improvement?

Email does use a startup cache, but it also used one in 2.2. I suspect
some other factor is involved in the high 2.2 number; it should not be
that high, since releases prior to 2.2 were not that high with the same
cache mechanism.

We did change some of the startup logic in the 2.5 timeframe, but that
was more about trying to better reason about startup with the async
setMozMessageHandler entry points than about tuning for performance. The
startup cache in 2.2 was stored in cookies; 2.5 now uses localStorage,
which is slightly slower than a cookie store, but allows us to cache
more entry points and is saner to deal with.
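For those who haven't seen the pattern, here is a minimal sketch of a localStorage-backed startup cache along the lines described above; the key name and entry point are illustrative assumptions, not the real email app code:

var entryPoint = 'message_list'; // hypothetical entry point name
var CACHE_KEY = 'html_cache_' + entryPoint; // one cached blob per entry point

// On launch, inject previously cached markup for near-instant UI.
function restoreFromCache() {
  var cached = localStorage.getItem(CACHE_KEY);
  if (cached) {
    document.body.innerHTML = cached;
    return true;
  }
  return false;
}

// Once the real UI has rendered, refresh the cache so the next cold
// launch can paint without waiting for the app logic to load.
function saveToCache() {
  localStorage.setItem(CACHE_KEY, document.body.innerHTML);
}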

Could it be that the very first startup, which does not have a cache
to use, takes long enough to skew the numbers high on 2.2? Testing
manually just now on a Flame flashed with a 2.2 engineering build, running
at 319MB, the cache is in effect in 2.2 and it feels as fast as master,
with near-instant UI display from cache.

As I recall, the email numbers for 2.2 were in the range of the 2.5
numbers for a while, but at some point they changed. We were not pushing
email changes to 2.2 by then, as we were already focused on 2.5 code,
so we did not feel the urgency to investigate; I figured it was something
that would be corrected on its own via something changing in Gecko, for
example.

If it is important to get to the bottom of the higher number in 2.2 I
can look more, but I felt resources were better spent on the current
master efforts, since manual testing still indicated 2.2 was fast
with its cache implementation, and larger changes to 2.2 seem
unlikely.

James

Fabrice Desré

Oct 2, 2015, 12:37:02 PM
to Justin D'Arcangelo, dev-...@lists.mozilla.org
Think about people dogfooding. It's barely acceptable to suddenly switch
to a much worse version of any app, even if your target is some arbitrary
deadline in the future and you're confident you'll fix the issues. You
would not do that on a live website, right?

Eli Perelman

Oct 2, 2015, 12:37:57 PM
to James Burke, Wilson Page, Gareth Aye, dev-...@lists.mozilla.org
I presume James is correct about the v2.2 launch time of Email: it had previously been fast, but by the time we captured the official v2.2 numbers it had increased significantly, and it has since been fixed. I'm sure changes that may have resolved that issue have been uplifted to v2.2.

For the first startup, that isn't an issue with Raptor. Raptor primes every application before actually capturing any metrics, to combat any variation between the first run of an app and the subsequent runs.

Eli

Justin D'Arcangelo

Oct 2, 2015, 12:41:14 PM
to Fabrice Desré, dev-...@lists.mozilla.org
I understand, but the whole point of doing the switch now was to ensure that there were more people using the app, to make sure there were no showstopper bugs that we weren’t aware of. If we don’t get people using the app for a few weeks, it’s possible that some major bugs could slip through, especially in the case of an app like Music where everyone possesses a wide variety of media files. It would be impossible for any of the Music app devs or QA testers to check every possible media file out there. However, with dogfooders using the app, it’s more likely that more types of media files will get tested.

-Justin

Justin D'Arcangelo

Oct 2, 2015, 12:49:16 PM
to Fabrice Desré, dev-...@lists.mozilla.org
I would also like to add that this policy of immediately pouncing on devs who attempt to try something new that may cause the perf numbers to momentarily dip is part of why we seem to have a culture problem in FxOS dev where everyone is afraid to take any kind of risks. If we are not allowed to have a 2-3 week window to optimize after a huge landing such as this, then how are we supposed to experiment or take risks?

In a worst-case scenario, we have the old Music app in the dev_apps folder that we can switch back to at a moment’s notice. But we should be encouraging devs by giving them time to optimize after taking a huge risk. We are shipping to Mozillians (foxfooders) right now, who presumably understand that we are trying to make FxOS better.

-Justin

Fabrice Desré

Oct 2, 2015, 1:17:12 PM
to Justin D'Arcangelo, dev-...@lists.mozilla.org
On 10/02/2015 09:49 AM, Justin D'Arcangelo wrote:
> I would also like to add that this policy of immediately pouncing on devs who attempt to try something new that may cause the perf numbers to momentarily dip is part of why we seem to have a culture problem in FxOS dev where everyone is afraid to take any kind of risks. If we are not allowed to have a 2-3 week window to optimize after a huge landing such as this, then how are we supposed to experiment or take risks?

You have all the time you want if you don't put dogfooders at risk. No
one is saying that you should not take the risk to try something new
(side note, you spent enough time on spark & flyweb to know that). But
when it comes to shipping there is a minimum bar to meet, and with
basically 2x memory usage we are not meeting it in this app yet,
sorry. Feel free to ship a new app alongside the existing one instead
and ask people to try it, since we can't do A/B testing.

The problem we have is that most people don't care enough about having a
stable nightly, which is why we haven't updated dogfooders for more than
a month now.

Fabrice

Etienne Segonzac

Oct 2, 2015, 1:29:23 PM
to Justin D'Arcangelo, Fabrice Desré, dev-...@lists.mozilla.org

On Fri, Oct 2, 2015 at 6:49 PM, Justin D'Arcangelo <jdarc...@mozilla.com> wrote:
I would also like to add that this policy of immediately pouncing on devs who attempt to try something new

Pouncing on devs should never be acceptable... unless a robot is doing it!

In all seriousness, while I'm fully on the "the way to make a program faster is to never let it get slower" [1] boat, we don't have the infrastructure (yet) to back that claim.
Even extra-cautious devs like Justin (who ran Raptor tests locally) get noisy results from the latest Gecko regression.

My point is, we can cheerfully talk between humans, make compromises and change course.
Or we can have bots doing aggressive backouts / pre-landing checks (here's a nice read for the weekend [2]).

But aggressive email exchanges between team members won't make our phones any better.
PS: Eli's original message was super nice and useful and I know the raptor team is working really hard to automate more of this! <3

Diego Marcos

Oct 2, 2015, 1:41:48 PM
to Fabrice Desré, Justin D'Arcangelo, dev-...@lists.mozilla.org
We should definitely try not to be too sloppy and always have our dev branch in a working state.

At the same time, there should be some tolerance for regressions in non-critical aspects. It’s an essential part of an iterative dev process: move fast and adjust as you go.

If there’s zero tolerance for regressions, we will get into a conservative mindset where people just want to move bits around and not take on any substantial piece of work.

The more ambitious people will work out of separate branches to do what they have to do and dump everything on the main repo as late as possible. New features will be exposed to dogfooders too late in the cycle to uncover bugs and collect proper feedback.

People have to understand that dogfooding carries some risk. I worked at Apple within the iOS team and there were perf and power consumption regressions all the time in the nightly builds. You, of course, keep an eye on the stats and want to hit the numbers before release, but regressions during the dev cycle should not be stigmatized.

Diego.

Justin D'Arcangelo

Oct 2, 2015, 1:51:10 PM
to Fabrice Desré, dev-...@lists.mozilla.org
Here are the latest Raptor numbers, run locally, on Flame:

OGA:

jdarcangelo-20869:gaia Justin$ raptor test coldlaunch --app music-oga --runs 5
[Cold Launch: music-oga.gaiamobile.org] Preparing to start testing...
[Cold Launch: music-oga.gaiamobile.org] Priming application
[Cold Launch: music-oga.gaiamobile.org] Starting run 1
[Cold Launch: music-oga.gaiamobile.org] Run 1 complete
[Cold Launch: music-oga.gaiamobile.org] Starting run 2
[Cold Launch: music-oga.gaiamobile.org] Run 2 complete
[Cold Launch: music-oga.gaiamobile.org] Starting run 3
[Cold Launch: music-oga.gaiamobile.org] Run 3 complete
[Cold Launch: music-oga.gaiamobile.org] Starting run 4
[Cold Launch: music-oga.gaiamobile.org] Run 4 complete
[Cold Launch: music-oga.gaiamobile.org] Starting run 5
[Cold Launch: music-oga.gaiamobile.org] Run 5 complete

| Metric                | Mean     | Median | Min    | Max    | StdDev | p95      |
| --------------------- | -------- | ------ | ------ | ------ | ------ | -------- |
| navigationLoaded      | 737      | 739    | 715    | 752    | 14.071 | 737      |
| navigationInteractive | 900.600  | 893    | 880    | 924    | 16.305 | 900.600  |
| visuallyLoaded        | 1123     | 1121   | 1076   | 1170   | 32.323 | 1123     |
| contentInteractive    | 1123.400 | 1121   | 1077   | 1171   | 32.327 | 1123.400 |
| fullyLoaded           | 2478.400 | 2474   | 2450   | 2517   | 23.466 | 2478.400 |
| uss                   | 14.542   | 14.645 | 14.301 | 14.738 | 0.182  | 14.542   |
| rss                   | 34.377   | 34.480 | 34.133 | 34.574 | 0.185  | 34.377   |
| pss                   | 18.775   | 18.877 | 18.534 | 18.978 | 0.186  | 18.775   |

[Cold Launch: music-oga.gaiamobile.org] Testing complete
jdarcangelo-20869:gaia Justin$ 


NGA:

jdarcangelo-20869:gaia Justin$ raptor test coldlaunch --app music --runs 5
[Cold Launch: music.gaiamobile.org] Preparing to start testing...
[Cold Launch: music.gaiamobile.org] Priming application
[Cold Launch: music.gaiamobile.org] Starting run 1
[Cold Launch: music.gaiamobile.org] Run 1 complete
[Cold Launch: music.gaiamobile.org] Starting run 2
[Cold Launch: music.gaiamobile.org] Run 2 complete
[Cold Launch: music.gaiamobile.org] Starting run 3
[Cold Launch: music.gaiamobile.org] Run 3 complete
[Cold Launch: music.gaiamobile.org] Starting run 4
[Cold Launch: music.gaiamobile.org] Run 4 complete
[Cold Launch: music.gaiamobile.org] Starting run 5
[Cold Launch: music.gaiamobile.org] Run 5 complete
[Cold Launch: music.gaiamobile.org] Results from music.gaiamobile.org

| Metric                | Mean     | Median | Min    | Max    | StdDev | p95      |
| --------------------- | -------- | ------ | ------ | ------ | ------ | -------- |
| navigationLoaded      | 804.200  | 808    | 742    | 858    | 37.151 | 804.200  |
| navigationInteractive | 826.400  | 827    | 773    | 879    | 33.643 | 826.400  |
| visuallyLoaded        | 1655.600 | 1625   | 1597   | 1788   | 68.002 | 1655.600 |
| contentInteractive    | 1656     | 1625   | 1598   | 1789   | 68.220 | 1656     |
| fullyLoaded           | 1656.600 | 1625   | 1598   | 1790   | 68.482 | 1656.600 |
| uss                   | 20.751   | 20.855 | 19.855 | 21.309 | 0.491  | 20.751   |
| pss                   | 24.955   | 25.103 | 24.015 | 25.555 | 0.518  | 24.955   |
| rss                   | 40.479   | 40.594 | 39.578 | 41.035 | 0.493  | 40.479   |

[Cold Launch: music.gaiamobile.org] Testing complete
jdarcangelo-20869:gaia Justin$ 


I’m actually seeing an ~800ms *improvement* with the NGA app in “fullyLoaded”. Maybe Eli can answer this, but is “fullyLoaded” the number I should be looking at? I’m also only seeing a ~6MB regression in USS. All in all, this doesn’t seem “that bad” for what is essentially an unoptimized app. Again, I’m confident that we can improve upon these numbers significantly in the next 2-3 weeks.

It also looks like the Gecko-wide perf regression is now gone (I hope). So, maybe we can start trusting these numbers again.

-Justin

Justin D'Arcangelo

Oct 2, 2015, 1:58:17 PM
to Fabrice Desré, dev-...@lists.mozilla.org
Correction, my bad. Eli just informed me that his numbers earlier were from the 319MB Flame configuration. I was under the impression that we were no longer supporting that config. In addition, I was looking at “fullyLoaded” and not “visuallyLoaded”.

We have a patch on the way that optimizes the way we fetch album art and should drastically cut down on memory usage. Once that has landed, I will re-test under 319MB and see where we stand.

-Justin

James Burke

Oct 2, 2015, 2:48:11 PM
to Fabrice Desré, Justin D'Arcangelo, dev-...@lists.mozilla.org
On Fri, Oct 2, 2015 at 10:16 AM, Fabrice Desré <fab...@mozilla.com> wrote:
> when it comes to shipping there is a minimum bar to meet, and with
> basically a x2 memory usage we are not meeting it in this app yet,
> sorry. Feel free to ship a new app alongside the existing one instead
> and ask people to try it, since we can't do A/B testing.

I am hopeful that we can get to a place where we do have the option to
allow early adopters to try variations in the apps.

Long term I believe that will be possible with the new security model.
We can host signed apps on domains that have access to traditionally
certified APIs. This will allow us to set up app permutations that can
be easy to try out. I could even see longer term allowing apps to have
their own "trains", nightly.email.gaiamobile.org, etc. It would take a
lot of discipline by the app devs to maintain them, and they may need
to test for gecko versions, but it may be feasible.

In the short term, the WebIDE allows installing certified app
variants. In email, I use gaia-dev-zip [1] to alter the name of the app
in the zip file and manifest, so that I can get a capture of the email
app that can be installed via the WebIDE and run alongside the
email app that comes preinstalled in Gaia.

We will likely use this approach for getting early testers for an
email app variant that supports conversations, to work out the bugs
that way before landing on master.

A bit longer term, if the app does not need certified APIs, then
marketplace delivery of the app can work, particularly after bug
1208633 lands (if you want to use custom elements).

James

[1] https://github.com/jrburke/gaia-dev-zip

zbran...@mozilla.com

Oct 2, 2015, 4:30:02 PM
to mozilla-...@lists.mozilla.org
Eli,

I have to say that I'm surprised by the results; it seems that we have regressions in almost all apps.
I'm wondering if there's a chance that it's in Gecko?

Another litmus test for me is the FM app. From what I know there were very, very few changes to the app over the last couple of releases; the only significant change was landed by my team a couple of months ago, and I did extensive performance testing on it locally, monitored the Raptor dashboard, and the results were actually an improvement.

Because of how little activity there was between 2.2 and 2.5, I can actually revert this single change and see if there's any underlying trend that affects all apps (and maybe the Template app is not affected because it doesn't load any JS, CSS, etc.).

I landed a lot of small optimizations for apps, always tested on Flame-kk with Raptor, and the results I got gave me the impression that we should see a startup perf improvement between 2.2 and 2.5.
That's why I'm asking about the possibility that there's some platform regression that impacts us.

Another proposal: could we launch FM 2.2 against Gecko from master and compare to FM 2.2 against Gecko from 2.2?

zb.

Justin D'Arcangelo

Oct 2, 2015, 4:35:40 PM
to zbran...@mozilla.com, mozilla-...@lists.mozilla.org
I’m not sure if this was resolved prior to the results, but this OS-wide performance bug was open for at least a couple of weeks:


-Justin


zbran...@mozilla.com

Oct 2, 2015, 4:38:37 PM
to mozilla-...@lists.mozilla.org
Another app that didn't change much since 2.2 is Clock.

I landed one major patch that actually should have *improved* memory by 200-500KB [0].

Since nothing else major landed, I'm surprised to see a 1MB regression.

Can you tell me how many runs you did, and add stddev to the data? I'm wondering how statistically significant the numbers are.

zb.


[0] https://bugzilla.mozilla.org/show_bug.cgi?id=1207044#c16

zbran...@mozilla.com

Oct 2, 2015, 4:46:43 PM
to mozilla-...@lists.mozilla.org
And another calm app - Video. I landed an L10n/Intl refactor this morning with good perf results [0], but the tests were likely run before that.

If you look at the Video changelog [1] for the recent months, there's really nothing between November 2014 and my change today.

So where do the 200ms and 900KB regressions come from? (Side note: 900KB should not be acceptable unless a significant chunk of new data has been added to the startup path.)

I believe that those three apps - Video, FM and Clock - should see performance/memory improvements between 2.2 and 2.5. Of the three, Clock is the hardest to "revert", but for FM and Video it should be fairly easy to test the same Gaia against two Geckos, and two Gaias against the same Gecko.

And I believe we should do this before we jump to conclusions, because the perf impact on those apps seems to be bigger than for some other apps, and if it's all noise it's a red herring. And if it's a Gecko impact, we should fix it in Gecko.

zb.


[0] https://bugzilla.mozilla.org/show_bug.cgi?id=1197454#c6
[1] https://github.com/mozilla-b2g/gaia/commits/master/apps/video/js/video.js

Eli Perelman

Oct 2, 2015, 5:25:39 PM
to Zbigniew Braniecki (Gandalf), mozilla-...@lists.mozilla.org
There definitely could be a regression in Gecko. As Justin pointed out, we had a pretty serious regression for the past few weeks that was only resolved today, but my results were gathered after the regression resolution.

For Flame performance automation, we test every application against each build we get from PVT. Each application is tested 30 times, and the metrics for every run are stored in a database, which is where the dashboards get their data. That also means we can re-query to slice the data in different ways. The data I provided today represents the 95th percentile of visuallyLoaded and the mean of USS. We typically don't use the mean for launch times, as it is not as informative about the spread of results as a percentile value is. Regardless, here are the current standard deviations of launch times for said apps:

Calendar: 49ms
Camera: 157ms
Clock: 31ms
Contacts: 224ms
Dialer: 22ms
Email: 152ms
FM: 33ms
Gallery: 52ms
Music: 47ms
SMS: 44ms
Settings: 44ms
Video: 41ms
Test Startup Limit: 116ms

Now, I'm not a statistician in the least, but with standard deviation describing the spread of the normal distribution, most apps exhibit a stable launch time within about 40-50ms. The 95th percentile gives us a metric that users can care about: "95% of the time my app loads faster than X milliseconds". Interestingly enough, the Test Startup Limit app, which we use to determine a general launch baseline, exhibits swings of 116ms in its normal distribution, which is concerning even though it hasn't really regressed its performance. Gareth wrote that app and can maybe give us some insight into what it does that other apps don't that might cause instability in launch time even while consistently achieving its goal.
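For anyone who wants to reproduce these numbers from raw runs, here is a quick sketch of how such statistics can be derived. It uses the nearest-rank percentile method and is only an illustration, not Raptor's actual implementation:

// Derive mean, standard deviation, and 95th percentile from a set of
// launch-time samples (in milliseconds).
function stats(runs) {
  var sorted = runs.slice().sort(function(a, b) { return a - b; });
  var mean = runs.reduce(function(sum, v) { return sum + v; }, 0) / runs.length;
  var variance = runs.reduce(function(sum, v) {
    return sum + Math.pow(v - mean, 2);
  }, 0) / runs.length;
  // Nearest-rank p95: the value 95% of runs come in at or under.
  var rank = Math.max(0, Math.ceil(sorted.length * 0.95) - 1);
  return { mean: mean, stdev: Math.sqrt(variance), p95: sorted[rank] };
}

// e.g. stats([1638, 1600, 1655, 1710, 1620]) for five cold-launch runs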

I'm happy to help provide whatever insight into this that is needed. :)

Thanks,

Eli Perelman

Eli Perelman

Oct 2, 2015, 5:39:28 PM
to Justin D'Arcangelo, Fabrice Desré, dev-...@lists.mozilla.org
As mentioned in a previous thread [1], we still use the 319MB Flame configuration as it's the only comparable baseline we have between v2.2 and v2.5. We can't drop using the 319MB Flame for v2.5 unless we build a new baseline with a different device configuration against v2.2.

[1] https://groups.google.com/d/msg/mozilla.dev.fxos/uDrAUHGTq18/79YaFJHQCQAJ

Thanks,

Eli Perelman

On Fri, Oct 2, 2015 at 12:58 PM, Justin D'Arcangelo <jdarc...@mozilla.com> wrote:
Correction, my bad. Eli just informed me that his numbers earlier were from the 319MB Flame configuration. I was under the impression that we were no longer supporting that config. In addition, I was looking at “fullyLoaded” and not “visuallyLoaded”.

We have a patch on the way that should be optimizing the way we fetch album art and should drastically cut down on memory usage. Once that has landed, I will re-test under 319mb and see where we stand.

-Justin




On Oct 2, 2015, at 1:16 PM, Fabrice Desré <fab...@mozilla.com> wrote:

On 10/02/2015 09:49 AM, Justin D'Arcangelo wrote:
I would also like to add that this policy of immediately pouncing on devs who attempt to try something new that may cause the perf numbers to momentarily dip is part of why we seem to have a culture problem in FxOS dev where everyone is afraid to take any kind of risks. If we are not allowed to have a 2-3 week window to optimize after a huge landing such as this, then how are we supposed to experiment or take risks?

You have all the time you want if you don't put dogfooders at risk. No
one is saying that you should not take the risk to try something new
(side note, you spent enough time on spark & flyweb to know that). But
when it comes to shipping there is a minimum bar to meet, and with
basically 2x memory usage we are not meeting it in this app yet,
sorry. Feel free to ship a new app alongside the existing one instead
and ask people to try it, since we can't do A/B testing.

The problem we have is that most people don't care enough about having a
stable nightly, which is why we haven't updated dogfooders for more than
a month now.

Fabrice
--
Fabrice Desré
b2g team
Mozilla Corporation

zbran...@mozilla.com

Oct 2, 2015, 7:32:25 PM
to mozilla-...@lists.mozilla.org
I tried to test my hypothesis using Flame with a build from today (from taskcluster, flame-kk-eng) testing test-startup-limit app (no changes) and FM app (minimal changes):

Here are my results: https://pastebin.mozilla.org/8848224

The bottom line is that I did not reproduce your results for Flame 319MB for FM.

I used Gecko from today and tested FM and test-startup-limit from 2.2 against 2.5.

So, one should read my results as "if we exclude Gecko changes, that's what should have happened".

My results, if anything can be significant, indicate:

1) fullyLoaded improvement in FM on both 319/512 between 2.2 and 2.5
2) visuallyLoaded no change to minimal improvement in FM on both 319/512 between 2.2 and 2.5
3) No noticeable performance change between 2.2 and 2.5 in test-startup-limit
4) Potentially a minor memory win in test-startup-limit in the 319 scenario
5) Significant memory win in FM between 2.2 and 2.5 in both 319 and 512 scenarios

Based on that, I would say that we might be seeing a Gecko regression, because your results for FM differ significantly from mine and the only change is that you used the 2.2 Gecko while I used the 2.5 Gecko.

Where the full tests show a 175ms regression in visuallyLoaded and a 140KB regression in memory usage, my results show a 140ms improvement in visuallyLoaded and an 850KB improvement in memory usage.

Also, while your results for 2.5 match mine, your results for FM 2.2 show a significant difference that can (only?) be attributed to Gecko changes:

Your results (Gecko 2.2 and FM 2.2):
FM v2.2 cold launch: 604ms
FM v2.2 USS: 10.37MB

My results (Gecko 2.5 and FM 2.2):
FM v2.2 cold launch: 782ms (visuallyLoaded, median)
FM v2.2 USS: 11.31MB (USS, median)

The consistency between your results and mine for FM on 2.5 makes me feel a bit more confident in the data. The discrepancy in the 2.2 results makes me worried about Gecko's impact on the results.

If the same app (FM from 2.2) run on two different Geckos (2.2 and 2.5) gives a 180ms difference in results and a 1MB difference in memory usage, it would indicate that quite possibly many of the regressions we observe come from Gecko and not Gaia changes.

zb.

Christopher Lord

Oct 5, 2015, 4:46:38 AM
to zbran...@mozilla.com, mozilla-...@lists.mozilla.org
I'll preface what I say with the hopefully obvious statement that we should always aim for everything to be better. That said, however, I'd take a 2MB memory regression and a half-second startup time regression if it meant the app was polished and performed well.

Obviously I want both, but it'd be unfair to compare an app that's buggy and unpolished but has marginally better memory use and/or startup time to a superior app that provides a more polished experience. Especially if we're talking sub-second startup and minimal memory use as it is. I think this partially applies to the music app, which now looks/feels more polished and snappy than before.

Have you guys used an Android phone recently? Their startup time for apps is generally atrocious compared to ours (even on high-end devices) - we shouldn't drop the ball, but it's not where we compare badly. Given we aren't targeting 256MB devices anymore, I'd gladly have all our apps use double the memory they did in 2.2 if it meant we had a consistent 60Hz update, consistent transitions and snappy response.

I also agree that it's unfair to be jumping on devs over such small differences when this is far from a fair comparison (platform changes will have significant effects on these numbers).

--Chris

Eli Perelman

Oct 5, 2015, 8:29:09 AM
to Christopher Lord, mozilla-...@lists.mozilla.org, Zbigniew Braniecki (Gandalf)
I agree with the sentiment about allowing tradeoffs. Improving startup time can often mean consuming more memory, and it's hard to improve both. But that's why these numbers are here: for app owners and release decision makers to decide what tradeoffs to make for the best experience while still keeping perceived performance the best they can.

Like I said before, it is possible that there are platform changes that have caused performance regressions in apps over the course of the release, and we have tried to file bugs along the way to ensure these were not discovered at the last minute. Maybe some confusion has arisen about the implied assumption between what numbers are published and their cause:

I have provided launch time metrics for each app as they are executed in our automation against regular builds. Just because there is a regression in launch time for an app does not mean the app in question was at fault. Doing due diligence with bisections, etc. will be the determinant of root cause, but these are the numbers nonetheless. :)

Eli Perelman

_______________________________________________
dev-fxos mailing list
dev-...@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-fxos

Fabrice Desré

Oct 5, 2015, 11:19:40 AM
to mozilla-...@lists.mozilla.org
On 10/05/2015 01:46 AM, Christopher Lord wrote:
> I'll preface what I say with the hopefully obvious statement that we
> should always aim for everything to be better. That said, however, I'd
> take a 2MB memory regression and a half-second startup time regression
> if it meant the app was polished and performed well.

Some apps regressed by way more than 2MB. And also, beware of the
boiling frog.

> Have you guys used an Android phone recently? Their startup time for
> apps is generally atrocious compared to ours (even on high-end devices)
> - we shouldn't drop the ball, but it's not where we compare badly. Given
> we aren't targeting 256MB devices anymore, I'd gladly have all our apps
> use double the memory they did in 2.2 if it meant we had a consistent
> 60Hz update, consistent transitions and snappy response.

That's not what I see on a Nexus 4 running CM 12 and on a z3c running L.
They are both super fast and snappy when launching the default apps.
Still better than us on the same hardware.

Alexandre Lissy

Oct 5, 2015, 11:29:31 AM
to dev-...@lists.mozilla.org
I cannot help but emphasize this. Even the KitKat release (which the
currently-in-the-wild port is based on) beats us badly.


Paul

Oct 5, 2015, 2:12:54 PM
to dev-...@lists.mozilla.org

> Some apps regressed by way more than 2MB. And also, beware of the
> boiling frog.
>

/*
Sorry to be completely off-topic but the boiling frog thing is actually
a myth[1] for those unfamiliar with it. I do believe it's a nice
metaphor and it fits right here in the discussion.
[1]https://en.wikipedia.org/wiki/Boiling_frog

Gr, Paul
*/

Dietrich Ayala

Oct 5, 2015, 4:53:53 PM
to Fabrice Desré, mozilla-...@lists.mozilla.org
+1 to the frog metaphor.

History has shown it's *incredibly* hard to claw back from performance regressions. And every moment spent doing so is done *at the cost* of exactly the type of work Chris described - work that actually moves the project *forward*.

I strongly disagree with any acceptance of any performance regression for any reason except emergency security patches. Only a zero tolerance policy for perf regressions will result in performant software in such a large and complex project.

If you have a tension between perf and features, then it's time to cut the slow features, or get some more time.

The polish/bugs problems mentioned are fixed by landing fewer bugs (a culture of detailed automated tests and a project-wide love and acceptance of backouts), not by accepting perf regressions.

Also, I recommend not using any subjective measure to compare app startup times across different platforms. We used tools to do this in the past.

(My first patch ever, in 2006, regressed Firefox startup time and I spent a few days on the hook... until my feature could land with no startup hit. Can you tell it had an impact on me :D)

On Mon, Oct 5, 2015 at 5:19 PM Fabrice Desré <fab...@mozilla.com> wrote:
On 10/05/2015 01:46 AM, Christopher Lord wrote:
> I'll preface what I say with the hopefully obvious statement that we
> should always aim for everything to be better. That said, however, I'd
> take a 2MB memory regression and a half-second startup time regression
> if it meant the app was polished and performed well.

Some apps regressed by way more than 2MB. And also, beware of the
boiling frog.

> Have you guys used an Android phone recently? Their startup time for
> apps is generally atrocious compared to ours (even on high-end devices)
> - we shouldn't drop the ball, but it's not where we compare badly. Given
> we aren't targeting 256MB devices anymore, I'd gladly have all our apps
> use double the memory they did in 2.2 if it meant we had a consistent
> 60Hz update, consistent transitions and snappy response.

That's not what I see on a Nexus 4 running CM 12 and on a z3c running L.
They are both super fast and snappy when launching the default apps.
Still better than us on the same hardware.

Naoki Hirata

Oct 5, 2015, 6:37:13 PM
to Dietrich Ayala, Eli Perelman, mozilla-...@lists.mozilla.org, Fabrice Desré
To add to what Dietrich is saying, I personally am not condemning performance on your own local branch, or doing test runs there...

I think we're condemning anything that lands on master as an experiment, without testing or without running through our Jenkins setup, etc., for testing performance.

I don't want to speak too much for the performance team; I'm not sure what their load is in handling requests, though. Eli, is there an easy way for a dev to set up try runs with performance testing?


Eli Perelman

Oct 5, 2015, 6:47:19 PM
to Naoki Hirata, mozilla-...@lists.mozilla.org, Fabrice Desré, Dietrich Ayala
If by try runs you mean automated performance testing when opening a PR, then no. Right now the best way to ensure performance is up to snuff with your patch is to run Raptor during development. With Raptor installed, put a performance profile on the device by using the same flags as `make raptor`; then testing an app is as easy as:

raptor test coldlaunch --app clock --runs 10
raptor test coldlaunch --app communications --entry-point dialer --runs 20

During development, use a small number of runs for a tighter feedback loop. Before committing/landing, use many runs to ensure a better statistical guarantee. See the Raptor docs for details on getting started [1]. If you need any help getting up and running with Raptor, Rob Wood and I would be happy to lend a hand; just ping us [2].

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Firefox_OS/Automated_testing/Raptor#Getting_Started
[2] We're in #fxos or #raptor
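One note on the app side: as I understand it, the launch metrics in this thread (navigationLoaded, visuallyLoaded, fullyLoaded, etc.) correspond to User Timing marks that the app emits at the matching milestones — treat the exact mark names and mechanism as an assumption and check the docs in [1]. A minimal sketch:

// Emit the launch milestones the harness measures (mark names assumed
// to match the metric tables earlier in this thread).
window.addEventListener('load', function() {
  performance.mark('navigationLoaded');
});

function onAboveTheFoldRendered() {
  // Call when the first view is in its final visual state.
  performance.mark('visuallyLoaded');
}

function onEverythingLoaded() {
  performance.mark('fullyLoaded');
}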

Eli Perelman


rnico...@mozilla.com

Oct 5, 2015, 7:56:28 PM
to mozilla-...@lists.mozilla.org
My first reaction on reading this thread was that there was probably a Gecko regression affecting startup time. This is because I was quite sure there have been no changes to the video app since 2.2 that would affect startup time.

To test my theory, I ran tests comparing the 2.2 video app on the 2.5 Gecko to the 2.5 video app on the 2.5 Gecko. There was virtually no difference in cold startup time. I also ran tests with the 2.2 app on the 2.2 Gecko, which proved to have much faster startup times than the tests with the 2.5 Gecko.

This information, along with the fact that many apps have regressed startup performance since 2.2, suggests to me there is a Gecko startup time regression.

-Russ

Justin D'Arcangelo

Oct 5, 2015, 8:16:13 PM
to Eli Perelman, mozilla-...@lists.mozilla.org, Fabrice Desré, Dietrich Ayala, Naoki Hirata
I'd also like to add that we landed a series of performance patches to Music NGA on Friday which bring our cold launch startup times to within 100ms of the OGA app. We still have several more performance patches forthcoming that should bring that number down even more and also help cut our memory utilization (I actually wouldn't be surprised if we end up beating the old cold launch numbers). So dogfooders should not be seeing any noticeable startup time regressions anymore with today's build.

All the while, we are getting more eyes on the new app and are addressing any new bugs reported that we would not be seeing if we hadn't landed. We waited until we felt we had a stable/feature-complete version of the app before we landed on master. While we knew the app was not fully optimized yet, the version of the app we initially landed should never have felt incomplete, buggy or drastically slower than the old app to anyone dogfooding master.

Additionally, if we thought we would not be able to get performance right, we would not have landed in the first place. Now that we've practically closed the gap in the startup time metric, all that remains is tightening up the memory utilization.

-Justin


Fabrice Desré

Oct 5, 2015, 8:31:03 PM
to dev-...@lists.mozilla.org
On 10/05/2015 04:56 PM, rnico...@mozilla.com wrote:

> This information, along with the fact that many apps have regressed startup performance since 2.2, suggests to me there is a gecko startup time regression.

That's absolutely possible. Can you bisect gecko to find where we regressed?

Julien Wajsberg

Oct 6, 2015, 4:41:24 AM
to dev-...@lists.mozilla.org
Le 02/10/2015 19:16, Fabrice Desré a écrit :
> The problem we have is that most people don't care enough about having a
> stable nightly, which is why we haven't updated dogfooders for more than
> a month now.
>

Memory usage and launch time regressions are not the regressions that
prevented us from updating dogfooders.


Julien Wajsberg

Oct 6, 2015, 4:46:14 AM
to dev-...@lists.mozilla.org
Le 02/10/2015 18:01, Fabrice Desré a écrit :
> All the nga apps (music, contacts, sms) show significant regressions. Is
> that only a lack of optimizations in these apps, in the bridge they all
> use or design flaws in nga itself?

We always knew that using workers would make us use more memory and
impair launch time, but we hoped that splitting views and using
service workers would compensate for this.
Sadly, we can't use the split views because we lack prerendering, and we
can't use service workers because... well, they're not ready.
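For context, the cost being discussed comes from moving app logic off the main thread, roughly as in the sketch below. It uses plain postMessage for illustration; the NGA apps go through the bridge library, so the file name, element ID, and message shape here are assumptions, not its API:

// main.js: the UI thread only renders; the app logic lives in a
// worker, which is what costs extra memory and launch time.
var worker = new Worker('js/app-worker.js'); // hypothetical worker script

worker.onmessage = function(evt) {
  var list = document.getElementById('list'); // hypothetical element
  list.textContent = JSON.stringify(evt.data);
};

worker.postMessage({ type: 'load-conversations' });

// js/app-worker.js: handle requests off the main thread.
onmessage = function(evt) {
  if (evt.data.type === 'load-conversations') {
    postMessage([{ id: 1, title: 'Hello' }]); // stand-in for real data
  }
};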

I don't think this shows design flaws. I'm not saying NGA is perfect,
but the outlined issues are not showing anything, because we haven't
implemented the parts of NGA that actually improve the perceived
performance.

For 2.5 we definitely want to make this better. Yet we also need to
decide what our target device(s) are. Our apps taking some more MB
is not a big deal on the Sony.




Etienne Segonzac

Oct 6, 2015, 4:52:33 AM
to Eli Perelman, mozilla-...@lists.mozilla.org, Fabrice Desré, Dietrich Ayala, Naoki Hirata

On Tue, Oct 6, 2015 at 12:47 AM, Eli Perelman <eper...@mozilla.com> wrote:
If by try runs you mean automated performance testing when opening a PR, then no. Right now the best way to ensure performance is up-to-snuff with your patch is to run Raptor during development. With Raptor installed, use a performance profile on the device by using the same flags as `make raptor`, then testing an app is as easy as:

raptor test coldlaunch --app clock --runs 10
raptor test coldlaunch --app communications --entry-point dialer --runs 20

During development use a small number of runs for a tighter feedback loop. Before committing/landing, use many runs to ensure a better statistical guarantee. See the Raptor docs for details on getting started [1]. If you need any help getting up and running with Raptor, Rob Wood and myself would be happy to lend a hand, just ping us [2].


Fabrice, Dietrich, does your "strong stance" on performance include having Gecko developers run Raptor tests manually before landing too?
Because without significant infrastructure/tooling investment this is just big talk...

Julien Wajsberg

Oct 6, 2015, 4:57:20 AM
to dev-...@lists.mozilla.org
Le 02/10/2015 19:16, Fabrice Desré a écrit :
> On 10/02/2015 09:49 AM, Justin D'Arcangelo wrote:
>> I would also like to add that this policy of immediately pouncing on devs who attempt to try something new that may cause the perf numbers to momentarily dip is part of why we seem to have a culture problem in FxOS dev where everyone is afraid to take any kind of risks. If we are not allowed to have a 2-3 week window to optimize after a huge landing such as this, then how are we supposed to experiment or take risks?
> You have all the time you want if you don't put dogfooders at risk. No
> one is saying that you should not take the risk to try something new
> (side note, you spent enough time on spark & flyweb to know that). But
> when it comes to shipping there is a minimum bar to meet, and with
> basically 2x memory usage we are not meeting it in this app yet,
> sorry. Feel free to ship a new app alongside the existing one instead
> and ask people to try it, since we can't do A/B testing.

Sorry, I disagree here. I don't completely disagree though, so bear with
me :)

I think the best way to find bugs and regressions is exposing the
changes to users. _Of course_ we need to make sure we don't badly break
the phone first. But users of the master branch will have regressions.
That's normal and expected. Any big feature will get at least a handful
of regressions. Our goal is to track them and fix them, before we ship
to less technical/less engaged users. IMO that's why we wanted the
dogfood process in the first place.

If you don't want dogfooders to get the master regressions, then don't
use the master branch for them. BTW, I personally think we should let
dogfooders choose between "master-dogfood" and "aurora-dogfood" branches.

Now, I guess you're afraid that we're losing dogfooders, even those on
master who are aware they can get issues. Big news: they don't leave
the program because an app takes 2x memory. Users don't even see it. You
can look at the list of foxfood bugs [1]; very few bugs have "slow" or
"performance" in their summary.

So please don't mix and confuse topics and concerns. The performance
concern is important, but it's not what puts dogfooders at risk.

[1]
https://bugzilla.mozilla.org/buglist.cgi?keywords=foxfood&keywords_type=allwords&list_id=12593021&resolution=---&query_format=advanced

--
Julien


Dietrich Ayala

Oct 6, 2015, 5:01:46 AM
to Etienne Segonzac, Eli Perelman, mozilla-...@lists.mozilla.org, Fabrice Desré, Naoki Hirata
Yes, absolutely. It's standard practice to run all of Firefox's performance tests against Gecko landings. It should not be any different for FxOS, IMO.

I've always seen unhiding of all types of Gaia tests on mozilla-central as a precursor for solving our cross-tree woes. Until that happens, Gaia will continue to get regressed by Gecko changes, causing noise and churn in the Gaia dev cycle.


Dietrich Ayala

Oct 6, 2015, 5:12:26 AM
to Etienne Segonzac, Eli Perelman, mozilla-...@lists.mozilla.org, Fabrice Desré, Naoki Hirata

To be clear, you mean manually but not locally, right? I am talking about using automation prior to landing on the trees consumed by downstream, like is done with gecko+Firefox with inbound trees, etc. Things like try servers.

Nobody should have to run them locally and our learning there is that locally run test results are not to be trusted.

Etienne Segonzac

Oct 6, 2015, 5:30:01 AM
to Dietrich Ayala, mozilla-...@lists.mozilla.org, Fabrice Desré, Naoki Hirata, Eli Perelman

On Tue, Oct 6, 2015 at 11:12 AM, Dietrich Ayala <auto...@gmail.com> wrote:

To be clear, you mean manually but not locally, right? I am talking about using automation prior to landing on the trees consumed by downstream, like is done with gecko+Firefox with inbound trees, etc. Things like try servers.

Nobody should have to run them locally and our learning there is that locally run test results are not to be trusted.


Interesting. We're in complete agreement about what we need (sorry about the snarkiness of my earlier comment).

But the harsh reality is that raptor tests are still too noisy to be displayed on gaia-try, which means there's a long way to go before gecko try builds warn platform developers about OS performance regressions.

It's a huge task, with many technical challenges (AFAIK we'd need on-device runs to get really good results), and we have 2 developers on it (again, AFAIK).

This is why I'd like to see a bit less individual-developer bashing and a bit more organizational change to show that, as a project, we care about performance.

Dietrich Ayala

Oct 6, 2015, 5:35:39 AM
to Etienne Segonzac, mozilla-...@lists.mozilla.org, Fabrice Desré, Naoki Hirata, Eli Perelman

I don't see any individual bashing in this thread. It's an organizational change, as you say.

Fabrice Desré

Oct 6, 2015, 10:49:30 AM
to Dietrich Ayala, Etienne Segonzac, mozilla-...@lists.mozilla.org, Naoki Hirata, Eli Perelman
Raptor automatically reports performance regressions. If one is due to
gecko (like when someone broke nuwa recently) it needs to be treated the
same way we would treat one in gaia. I see absolutely no difference there.

Note that we don't run Talos performance tests on try either - you get
emails after the fact if your commit is in the range of a regression.
Maybe raptor should also send this kind of email?

Fabrice

On 10/06/2015 02:35 AM, Dietrich Ayala wrote:
> I don't see any individual bashing in this thread. It's an
> organizational change, as you say.
>
>
> On Tue, Oct 6, 2015, 11:29 AM Etienne Segonzac <eti...@mozilla.com
> <mailto:eti...@mozilla.com>> wrote:
>
>
> On Tue, Oct 6, 2015 at 11:12 AM, Dietrich Ayala <auto...@gmail.com
> <mailto:auto...@gmail.com>> wrote:
>
> To be clear, you mean manually but not locally, right? I am
> talking about using automation prior to landing on the trees
> consumed by downstream, like is done with gecko+Firefox with
> inbound trees, etc. Things like try servers.
>
> Nobody should have to run them locally and our learning there is
> that locally run test results are not to be trusted.
>
>
> Interesting. We're in complete agreement about what we need (sorry
> about the snarky-ness of my earlier comment).
>
> But the harsh reality is that raptor tests are still too noisy to be
> displayed on *gaia*-try.
> Which means there's a long way to go before gecko try builds warn
> platform developers about OS performance regressions.
>
> It's a huge task, with many technical challenges (AFAIK we'd need
> on-device runs to get really good results), and we have 2 developers
> on it (again, AFAIK).
>
> This is why I'd like to see a bit less individual-developer bashing
> and a bit more organizational change to show that, as a project, we
> care about performance.
>


Etienne Segonzac

Oct 6, 2015, 11:21:00 AM
to Fabrice Desré, mozilla-...@lists.mozilla.org, Naoki Hirata, Eli Perelman, Dietrich Ayala
On Tue, Oct 6, 2015 at 4:48 PM, Fabrice Desré <fab...@mozilla.com> wrote:
Raptor automatically reports performance regressions. If one is due to
gecko (like when someone broke nuwa recently) it needs to be treated the
same way we would do with gaia. I see absolutely no difference there.

Well this is clearly not happening.
You just asked Russ to bisect gecko to pinpoint a potential performance regression found by manually testing the video app.
What exactly was done automatically in this scenario?
 

Note that we don't run Talos performance tests on try either - you get
emails after the fact if your commit is in the range of a regression.
Maybe raptor should also send this kind of email?

As always there's no disagreement about what we need. But there's a big disconnect with what we have.

Fabrice Desré

Oct 6, 2015, 12:26:52 PM
to Etienne Segonzac, mozilla-...@lists.mozilla.org, Naoki Hirata, Eli Perelman, Dietrich Ayala
On 10/06/2015 08:20 AM, Etienne Segonzac wrote:
>
> On Tue, Oct 6, 2015 at 4:48 PM, Fabrice Desré <fab...@mozilla.com
> <mailto:fab...@mozilla.com>> wrote:
>
> Raptor automatically reports performance regressions. If one is due to
> gecko (like when someone broke nuwa recently) it needs to be treated the
> same way we would treat one in gaia. I see absolutely no difference there.
>
> Well this is clearly not happening.
> You just asked Russ to bisect gecko to pinpoint a potential performance
> regression found by manually testing the video app.
> What exactly was done automatically in this scenario?

Nothing, it seems, and there can be several causes:
- raptor was not ready yet when the regression happened.
- it's not a single regression but a bunch of small ones that individually
were buried in the noise but overall ended up being large. Unfortunately
that will make bisecting hard, so we may have to live with it.

> Note that we don't run Talos performance tests on try either - you get
> emails after the fact if your commit is in the range of a regression.
> Maybe raptor should also send this kind of email?
>
> As always there's no disagreement about what we need. But there's a big
> disconnect with what we have.

Not that much. We only need automatic emailing from raptor. Eli, how
hard does that look?

Fabrice (happy to not be in the Paris office, I feel my head would
probably be on a fork right now).
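
The after-the-fact mail described above would be a thin layer over the regression detection that already exists. A minimal sketch, assuming the nodemailer npm package and placeholder SMTP host and addresses (nothing like this is wired into raptor today):

// notify.js - a sketch only; the host and addresses are placeholders.
var nodemailer = require('nodemailer');

var transporter = nodemailer.createTransport({ host: 'smtp.example.com' });

// Send a Talos-style notice for a detected regression, e.g.
// mailRegression('music', 'coldlaunch', '1066ms', '1717ms', console.log);
function mailRegression(app, metric, before, after, done) {
  transporter.sendMail({
    from: 'raptor-noreply@example.com',
    to: 'dev-fxos@lists.mozilla.org',
    subject: '[raptor] ' + app + ': ' + metric + ' regression',
    text: app + ' ' + metric + ' moved from ' + before + ' to ' + after +
      '. See the Raptor dashboard for the suspect commit range.'
  }, done);
}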

Fabrice Desré

unread,
Oct 6, 2015, 12:28:12 PM10/6/15
to Julien Wajsberg, dev-...@lists.mozilla.org
On 10/06/2015 01:46 AM, Julien Wajsberg wrote:

> I don't think this shows design flaws. I'm not saying NGA is perfect,
> but the outlined issues are not showing anything because we haven't
> implemented the parts of NGA that actually improve the perceived
> performance.

If we never get to a good enough performance because of platform issues,
the risk we took by designing the app architecture before we had the
platform pieces ends up being a design flaw.

I'm not anti-NGA at all. This looks great but maybe we put the cart
before the horse a little bit.

> For 2.5 we definitely want to make this better. Yet we also need to
> decide what is(are) our target device(s). Our apps taking some more MB
> is not a big deal on the Sony.

I keep hearing that some partners may use 2.5, and they will likely not
be on a 2GB device. We can't claim that everything's fine because we
don't exhaust 2GB when running the SMS app ;) (also, tragedy of the
commons, etc.)

Fabrice

Gregor Wagner

unread,
Oct 6, 2015, 12:37:03 PM10/6/15
to Fabrice Desré, Etienne Segonzac, mozilla-...@lists.mozilla.org, Eli Perelman, Dietrich Ayala, Naoki Hirata

>
> Not that much. We only need automatic emailing from raptor. Eli, how
> hard does that look?

We already file bugs in the right component if an app regresses or in the performance component if multiple apps regress. Bobby and Eli follow up on every single bug but once the app owners are notified it should be their job to figure out what happened. The regression detection algorithm might have some flaws but we are trying to get better.
Everything is documented in bugzilla and I don’t think emails are a better alternative.

We are also talking to the TaskCluster team this week about integrating raptor into the UI so it can be sheriffed. No promises on this part yet :)

-Gregor

Fabrice Desré

unread,
Oct 6, 2015, 12:40:26 PM10/6/15
to Julien Wajsberg, dev-...@lists.mozilla.org
On 10/06/2015 01:57 AM, Julien Wajsberg wrote:
> Le 02/10/2015 19:16, Fabrice Desré a écrit :
>> On 10/02/2015 09:49 AM, Justin D'Arcangelo wrote:
>>> I would also like to add that this policy of immediately pouncing on devs who attempt to try something new that may cause the perf numbers to momentarily dip is part of why we seem to have a culture problem in FxOS dev where everyone is afraid to take any kind of risks. If we are not allowed to have a 2-3 week window to optimize after a huge landing such as this, then how are we supposed to experiment or take risks?
>> You have all the time you want if you don't put dogfooders at risk. No
>> one is saying that you should not take the risk to try something new
>> (side note, you spent enough time on spark & flyweb to know that). But
>> when it comes to shipping there is a minimum bar to meet, and with
>> basically a x2 memory usage we are not meeting it in this app yet,
>> sorry. Feel free to ship a new app alongside the existing one instead
>> and ask people to try it, since we can't do A/B testing.
>
> Sorry, I disagree here. I don't completely disagree though, so bear with
> me :)
>
> I think the best way to find bugs and regressions is exposing the
> changes to users. _of course_ we need to make sure we don't badly break
> the phone first. But users of the master branch will have regressions.
> That's normal and expected. Any big feature will get at least a handful
> of regressions. Our goal is to track them and fix them, before we ship
> to less technical/less engaged users. IMO that's why we wanted the
> dogfood process in the first place.

I guess we only disagree on the magnitude of the regressions we are
happy to ship to dogfooders. Your bar seems higher than mine.

> If you want that dogfooders don't get the master regressions, then don't
> use the master branch. BTW I personally think we should let the
> dogfooders choose between "master-dogfood" and "aurora-dogfood" branches.

There's already "dogfood" (with QA sign off) and "dogfood-latest"
(nightlies, use at your own risk).

> Now, I guess you're afraid that we're losing dogfooders, even those on
> master that are aware they can get issues. Big news, they don't leave
> the program because an app takes 2x memory. Users don't even see it. You
> can look at the list of foxfood bugs [1], very few bugs have "slow" or
> "performance" in their summary.
>
> So please don't mix and confuse topics and concerns. The performance
> concern is important, but it's not what puts dogfooders at risk.

Well... we have almost no dogfooders, because we have been unable to
ship updates fixing a bunch of bugs that were submitted at the beginning
of the program. So right now I don't think we can draw any conclusions
from the foxfood feedback unfortunately. And it's not because they won't
notice memory regressions that they are not important. I was merely
pointing out that we have an overall quality issue, and memory/startup
time regressions are part of that.

Justin D'Arcangelo

unread,
Oct 6, 2015, 12:43:05 PM10/6/15
to Dietrich Ayala, mozilla-...@lists.mozilla.org, Fabrice Desré
On Oct 5, 2015, at 4:53 PM, Dietrich Ayala <auto...@gmail.com> wrote:

+1 to the frog metaphor.

History has shown it's *incredibly* hard to claw back from performance regressions. And every moment spent doing so is done *at the cost* of exactly the type of work Chris described - work that actually moves the project *forward*.

I strongly disagree with any acceptance of any performance regression for any reason except emergency security patches. Only a zero tolerance policy for perf regressions will result in performant software in such a large and complex project.

If you have a tension between perf and features, then it's time to cut the slow features, or get some more time.


I feel like most of our apps are severely lacking in the features department already. It would be insane to expect to be able to add new features to our apps and have our perf numbers remain exactly where they are. I’m not saying the sky should be the limit for startup time/memory usage, but we should expect at least *some* increase when new functionality is added. Now, in most cases, new features can be added without affecting the numbers at all, but there are still times when this is simply not possible.


The polish/bugs problems mentioned are fixed by landing fewer bugs (a culture of detailed automated tests and a project-wide love and acceptance of backouts), not by accepting perf regressions.

Also, I recommend not using any subjective measure to compare app startup times across different platforms. We used tools to do this in the past.

(My first patch ever, in 2006, regressed Firefox startup time and I spent a few days on the hook... until my feature could land with no startup hit. Can you tell it had an impact on me :D)
On Mon, Oct 5, 2015 at 5:19 PM Fabrice Desré <fab...@mozilla.com> wrote:
On 10/05/2015 01:46 AM, Christopher Lord wrote:
> I'll preface what I say with the hopefully obvious statement that we
> should always aim for everything to be better. That said, however, I'd
> take a 2mb memory regression and a half-second startup time regression
> if it meant the app was polished and performed well.

Some apps regressed by way more than 2MB. And also, beware of the
boiling frog.

> Have you guys used an Android phone recently? Their startup time for
> apps is generally atrocious compared to ours (even on high-end devices)
> - we shouldn't drop the ball, but it's not where we compare badly. Given
> we aren't targeting 256mb devices anymore, I'd gladly have all our apps
> use double the memory they did in 2.2 if it meant we had a consistent
> 60Hz update, consistent transitions and snappy response.

That's not what I see on a Nexus 4 running CM 12 and on a z3c running L.
They are both super fast and snappy when launching the default apps.
Still better than us on the same hardware.

        Fabrice
--
Fabrice Desré
b2g team
Mozilla Corporation

Fabrice Desré

unread,
Oct 6, 2015, 12:44:40 PM10/6/15
to Gregor Wagner, Etienne Segonzac, mozilla-...@lists.mozilla.org, Eli Perelman, Dietrich Ayala, Naoki Hirata
On 10/06/2015 09:36 AM, Gregor Wagner wrote:
>
>>
>> Not that much. We only need automatic emailing from raptor. Eli, how
>> hard does that look?
>
> We already file bugs in the right component if an app regresses or in the performance component if multiple apps regress. Bobby and Eli follow up on every single bug but once the app owners are notified it should be their job to figure out what happened. The regression detection algorithm might have some flaws but we are trying to get better.
> Everything is documented in bugzilla and I don’t think emails are a better alternative.

Ok, if you feel we don't need more I'm fine with that.

> We are also talking to the task cluster team this week about integrating raptor into the UI so it can be sheriffed. No promises on this part yet :)

\o/

Justin D'Arcangelo

unread,
Oct 6, 2015, 12:58:00 PM10/6/15
to Fabrice Desré, Julien Wajsberg, dev-...@lists.mozilla.org
On Oct 6, 2015, at 12:39 PM, Fabrice Desré <fab...@mozilla.com> wrote:

On 10/06/2015 01:57 AM, Julien Wajsberg wrote:
Le 02/10/2015 19:16, Fabrice Desré a écrit :
On 10/02/2015 09:49 AM, Justin D'Arcangelo wrote:
I would also like to add that this policy of immediately pouncing on devs who attempt to try something new that may cause the perf numbers to momentarily dip is part of why we seem to have a culture problem in FxOS dev where everyone is afraid to take any kind of risks. If we are not allowed to have a 2-3 week window to optimize after a huge landing such as this, then how are we supposed to experiment or take risks?
You have all the time you want if you don't put dogfooders at risk. No
one is saying that you should not take the risk to try something new
(side note, you spent enough time on spark & flyweb to know that). But
when it comes to shipping there is a minimum bar to meet, and with
basically a x2 memory usage we are not meeting it in this app yet,
sorry. Feel free to ship a new app alongside the existing one instead
and ask people to try it, since we can't do A/B testing.

Sorry, I disagree here. I don't completely disagree though, so bear with
me :)

I think the best way to find bugs and regressions is exposing the
changes to users. _of course_ we need to make sure we don't badly break
the phone first. But users of the master branch will have regressions.
That's normal and expected. Any big feature will get at least a handful
of regressions. Our goal is to track them and fix them, before we ship
to less technical/less engaged users. IMO that's why we wanted the
dogfood process in the first place.

I guess we only disagree on the magnitude of the regressions we are
happy to ship to dogfooders. Your bar seems higher than mine.

TBH, if you look at the Raptor dashboards, the Music app startup times were terrible both before and after landing due to the massive NUWA regression. If someone was dogfooding daily builds from master during those two days, they likely would not have even noticed any difference because it was so bad to begin with. But, as others have already mentioned, dogfooders have not seen a new build in a while, so this is kind of a moot point. Also, if you look at the numbers from this week, we are ~100ms away from v2.2 startup times, and RSS memory is within 6MB of the old app now thanks to optimizations that landed in the 5 days after the switch.

I understand that you want to set the bar high; I feel the same way about code quality. But, if you put things into perspective and remove the noise from the massive 2-week NUWA regression, the Music app is actually in really good shape and is continuing to improve even more throughout this week. We also now have the added benefit of better code quality, which helps us move much faster than we could with the old codebase.

So, even if we had been shipping regular builds to dogfooders during this time, and assuming that the NUWA regression didn’t happen, they would *maybe* notice a ~100ms slower startup time from the old app to the new app.


If you want that dogfooders don't get the master regressions, then don't
use the master branch. BTW I personally think we should let the
dogfooders choose between "master-dogfood" and "aurora-dogfood" branches.

There's already "dogfood" (with QA sign off) and "dogfood-latest"
(nightlies, use at your own risk).

Now, I guess you're afraid that we're losing dogfooders, even those on
master that are aware they can get issues. Big news, they don't leave
the program because an app takes 2x memory. Users don't even see it. You
can look at the list of foxfood bugs [1], very few bugs have "slow" or
"performance" in their summary.

So please don't mix and confuse topics and concerns. The performance
concern is important, but it's not what puts dogfooders at risk.

Well... we have almost no dogfooders, because we have been unable to
ship updates fixing a bunch of bugs that were submitted at the beginning
of the program. So right now I don't think we can draw any conclusions
from the foxfood feedback unfortunately. And it's not because they won't
notice memory regressions that they are not important. I was merely
pointing out that we have an overall quality issue, and memory/startup
time regressions are part of that.

Fabrice
-- 
Fabrice Desré
b2g team
Mozilla Corporation

Etienne Segonzac

unread,
Oct 6, 2015, 12:58:34 PM10/6/15
to Fabrice Desré, mozilla-...@lists.mozilla.org, Eli Perelman, Dietrich Ayala, Gregor Wagner, Naoki Hirata
So we're fine with the system that didn't work for 2.5 and we're making no promise for the future.
Nice commitment to performance.

Fabrice Desré

unread,
Oct 6, 2015, 1:06:40 PM10/6/15
to Etienne Segonzac, mozilla-...@lists.mozilla.org, Eli Perelman, Dietrich Ayala, Gregor Wagner, Naoki Hirata
On 10/06/2015 09:58 AM, Etienne Segonzac wrote:

> So we're fine with the system that didn't work for 2.5 and we're making
> no promise for the future.
> Nice commitment to performance.

We have tools to detect regressions and report bugs. We have people
triaging and following up on these bugs. What's left? Locking down devs
to fix issues? If you have suggestions I'm all ears, but I'm out of
politically correct ideas.

Eli Perelman

unread,
Oct 6, 2015, 1:21:51 PM10/6/15
to Etienne Segonzac, mozilla-...@lists.mozilla.org, Fabrice Desré, Dietrich Ayala, Gregor Wagner, Naoki Hirata
On Tue, Oct 6, 2015 at 11:58 AM, Etienne Segonzac <eti...@mozilla.com> wrote:

So we're fine with the system that didn't work for 2.5 and we're making no promise for the future.
Nice commitment to performance.

I'd hardly say that what we've built doesn't have a positive impact on performance. The fact that this conversation can even exist with real data is a testament to how far we've come. The intent of my email was not to back people into corners and play blame games, but just to shine a light on what things look like right now so owners and peers have ammunition to make decisions. Let me repeat what I just said, because it is the crux of the problem:

The current scope of performance automation is not yet in a state where it can be automatically sheriffed. We've built up the tools and infrastructure almost from the ground up, and they have accomplished exactly what they set out to do: to put knowledge of performance and its problems into the hands of owners and peers so they can make their own decisions.

Automating performance is usually not a binary decision like a unit test. It takes analysis and guesswork, and even then it still needs human eyes. Rob and I are working towards making this better, to automate as much as possible, but right now the burden of making the tough calls still lies with those landing patches. We equip you to make those determinations until we have more tooling and automation in place for the sheriffing to actually be an option, because right now it is not.

To be honest, most of the bugs we file against performance see very little activity or are acted on too late. In my eyes, for those components it may just not be as high a priority as other issues. I can't really blame owners for this, because I believe we have a history of making performance a lower priority than it should be. Specifically, I have been involved in almost every conversation about performance requirements since approximately v1.4, and those requirements have been abandoned every single time. Yes, that even includes v2.5.

If there is a problem with the way things are done, let's get them out on the table, and actually stick to our guns when we say that we believe in performance. :)

Eli

Etienne Segonzac

unread,
Oct 6, 2015, 1:55:08 PM10/6/15
to Eli Perelman, mozilla-...@lists.mozilla.org, Fabrice Desré, Dietrich Ayala, Gregor Wagner, Naoki Hirata

On Tue, Oct 6, 2015 at 7:21 PM, Eli Perelman <eper...@mozilla.com> wrote:
On Tue, Oct 6, 2015 at 11:58 AM, Etienne Segonzac <eti...@mozilla.com> wrote:

So we're fine with the system that didn't work for 2.5 and we're making no promise for the future.
Nice commitment to performance.

I'd hardly say that what we've built doesn't have a positive impact on performance. The fact that this conversation can even exist with real data is a testament to how far we've come. The intent of my email was not to back people into corners and play blame games, but just to shine a light on what things look like right now so owners and peers have ammunition to make decisions. Let me repeat what I just said, because it is the crux of the problem:

The current scope of performance automation is not yet in a state where it can be automatically sheriffed. We've built up the tools and infrastructure almost from the ground up, and they have accomplished exactly what they set out to do: to put knowledge of performance and its problems into the hands of owners and peers so they can make their own decisions.

Automating performance is usually not a binary decision like a unit test. It takes analysis and guesswork, and even then it still needs human eyes. Rob and I are working towards making this better, to automate as much as possible, but right now the burden of making the tough calls still lies with those landing patches. We equip you to make those determinations until we have more tooling and automation in place for the sheriffing to actually be an option, because right now it is not.


We're back to my first message on the thread.
We don't have the adequate tooling to achieve our performance goal.

Every release we talk about performance as if the issue were a "developer awareness" issue, and we take a strong stance on how "we should never regress".
But if we meant it we'd have more than 2 people working on the very challenging tooling work required. And believe me I'm fully aware of how challenging it is.

We can't hand every gaia and gecko developer a link to moztrap (manual test case tracker), remove all automated tests, and then be all high-minded about how we should never regress a feature. But that's exactly what we're doing with launch time performance.

Naoki Hirata

unread,
Oct 6, 2015, 2:12:15 PM10/6/15
to Etienne Segonzac, mozilla-...@lists.mozilla.org, Fabrice Desré, Dietrich Ayala, Gregor Wagner, Eli Perelman
We're back to my first message on the thread.
We don't have the adequate tooling to achieve our performance goal.

I think the question lies in answering: how do we resolve this, and who resolves it?  Perhaps having a quarterly goal for someone would help push this through.  I think it's evident that we need someone to work on it and have it as part of their goals, even if it's in parts.

Getting buy-in that devs will actually use the tooling is also important, though.  That in itself is a challenge even if the tooling is there.

Every release we talk about performance as if the issue were a "developer awareness" issue, and we take a strong stance on how "we should never regress".
 
But if we meant it we'd have more than 2 people working on the very challenging tooling work required. And believe me I'm fully aware of how challenging it is.

Completely agree.
 
We can't hand every gaia and gecko developer a link to moztrap (manual test case tracker), remove all automated tests, and then be all high-minded about how we should never regress a feature. But that's exactly what we're doing with launch time performance.

Again, completely agree.

Dietrich Ayala

unread,
Oct 6, 2015, 2:18:33 PM10/6/15
to Etienne Segonzac, Eli Perelman, mozilla-...@lists.mozilla.org, Fabrice Desré, Gregor Wagner, Naoki Hirata
Developer awareness is absolutely part of it, and I don't think we're there yet, based on this thread. But I'm also a perf regression extremist :)

The tooling is the other part, and it sounds like now the only tooling we have is for self-service awareness.

If you require better tooling and people to build the automation you need to ship performant software, then Gaia as a group should demand that from the people who are doing the prioritization and hiring for the project.

From our previous experiences, the ideal automation and tooling almost always comes from the developers involved, or they're at least fully integrated into that project. I recommend as a group you design (and possibly even build) the perf automation you want to see in the world. It will work better than building up a separate team and making it their responsibility.

On Tue, Oct 6, 2015 at 7:55 PM Etienne Segonzac <eti...@mozilla.com> wrote:

On Tue, Oct 6, 2015 at 7:21 PM, Eli Perelman <eper...@mozilla.com> wrote:
On Tue, Oct 6, 2015 at 11:58 AM, Etienne Segonzac <eti...@mozilla.com> wrote:

So we're fine with the system that didn't work for 2.5 and we're making no promise for the future.
Nice commitment to performance.

I'd hardly say that what we've built doesn't have a positive impact on performance. The fact that this conversation can even exist with real data is a testament to how far we've come. The intent of my email was not to back people into corners and play blame games, but just to shine a light on what things look like right now so owners and peers have ammunition to make decisions. Let me repeat what I just said, because it is the crux of the problem:

The current scope of performance automation is not yet in a state where it can be automatically sheriffed. We've built up the tools and infrastructure almost from the ground up, and they have accomplished exactly what they set out to do: to put knowledge of performance and its problems into the hands of owners and peers so they can make their own decisions.

Automating performance is usually not a binary decision like a unit test. It takes analysis and guesswork, and even then it still needs human eyes. Rob and I are working towards making this better, to automate as much as possible, but right now the burden of making the tough calls still lies with those landing patches. We equip you to make those determinations until we have more tooling and automation in place for the sheriffing to actually be an option, because right now it is not.


We're back to my first message on the thread.
We don't have the adequate tooling to achieve our performance goal.

Every release we talk about performance as if the issue were a "developer awareness" issue, and we take a strong stance on how "we should never regress".
But if we meant it we'd have more than 2 people working on the very challenging tooling work required. And believe me I'm fully aware of how challenging it is.

Eli Perelman

unread,
Oct 6, 2015, 2:22:16 PM10/6/15
to Naoki Hirata, Etienne Segonzac, mozilla-...@lists.mozilla.org, Fabrice Desré, Gregor Wagner, Dietrich Ayala

We're back to my first message on the thread.
We don't have the adequate tooling to achieve our performance goal.

Good, then I believe we are all on the same page and I agree with what you said, with one disconnect: there is a difference between "achieving our performance goals" and "achieving our performance goals automatically". We have the tools to make this happen; it just takes more effort than we would like. Yes, having more than 2 people working on this would be nice, but things were slimmed down quite a bit when we dissolved the performance team. As such, "achieving performance goals" is certainly doable as long as everyone is willing to get their hands dirty and run some local tests.


On Tue, Oct 6, 2015 at 1:12 PM, Naoki Hirata <nhi...@mozilla.com> wrote:
I think the question lies in answering: how do we resolve this, and who resolves it?  Perhaps having a quarterly goal for someone would help push this through.  I think it's evident that we need someone to work on it and have it as part of their goals, even if it's in parts.

Are you speaking about automated pre-commit performance testing? Achieving that is in progress but still somewhat distant, and we already have all the tooling in place for surfacing this information up-front before landing, just manually, as you have already stated.

To be fair, it's not my place to say what can land and what can't, what regressions to keep and which to back out. We work on the tooling to make those decisions, but do not make those decisions. It is up to engineering to work with product to make those determinations, and we will support whatever everyone decides. :)

Thanks,

Eli Perelman

Gabriele Svelto

unread,
Oct 6, 2015, 2:26:20 PM10/6/15
to Justin D'Arcangelo, Fabrice Desré, Julien Wajsberg, dev-...@lists.mozilla.org
On 06/10/2015 18:57, Justin D'Arcangelo wrote:
> TBH, if you look at the Raptor dashboards, the Music app startup times
> were terrible both before and after landing due to the massive NUWA
> regression. If someone was dogfooding daily builds from master during
> those two days, they likely would not have even noticed any difference
> because it was so bad to begin with.

For what it's worth I'm among the ones who've seen those changes, as I
dogfood on nightly [1]. From a totally subjective perspective, if I
hadn't seen the music app being entirely replaced in the gaia git log I
wouldn't have known it had happened. I didn't perceive any particular
change WRT performance or the responsiveness of the app after the
change. On the other hand, the Nuwa regression had been a chore.

Gabriele

[1] On a Flame, which is my only phone: not my primary or secondary, but
absolutely the only one I use. I sometimes have the feeling I'm the only
one doing it, but I happen to know at least one other person who does.
BTW it's not so bad, apart from not being able to answer calls from the
lock screen the other day and other minor issues like that which pop up
from time to time.


Andrew Sutherland

unread,
Oct 6, 2015, 2:36:42 PM10/6/15
to dev-...@lists.mozilla.org
On Tue, Oct 6, 2015, at 01:05 PM, Fabrice Desré wrote:
> We have tools to detect regressions and report bugs. We have people
> triaging and following up on these bugs. What's left? Locking down devs
> to fix issues? If you have suggestions I'm all ears, but I'm out of
> politically correct ideas.

I think there are two important things we can do:

1) Reduce developer uncertainty about platform impact on app regressions
by providing a single "b2g performance state of the tree" resource that
is always at hand. For example, TBPL is very nicely in your face about
the state of the branch you're dealing with (powered by
https://treestatus.mozilla.org/ which one can also directly consult).
The sheriffs are on top of the tree state and they make sure you know
the tree state too. It is my impression that our performance team is
likewise on top of things, it's just not as readily reflected to me as a
developer, and it's very easy for me to just assume that the platform
has regressed again. It would be great to have a green banner on
raptor.mozilla.org that says "The b2g trunk is good; all performance
regressions are your own" or a yellow banner that says "platform is
performance-busted, impacting startup time but not memory usage; check
out bug NNNNNN".

For example, it's my impression that the preallocated process mechanism
was broken for ~2.5 weeks on
https://bugzilla.mozilla.org/show_bug.cgi?id=1204837. From the bug,
it's clear that raptor and the performance team were on top of this
immediately. But my hypothetical devil's advocate impression was mainly
that the platform is flakey and I should not bother investigating
performance regressions because they're probably being dealt with by
other people and it's a lot of work for me to figure out the current
state of regressions, etc.
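
As a sketch of what directly consulting such a resource could look like, modeled on the existing treestatus service for the gecko trees (the ?format=json query and the response fields here are assumptions, and no b2g/raptor equivalent exists yet):

// treestate.js - a sketch; endpoint and response shape are assumptions.
var https = require('https');

function fetchTreeState(tree, cb) {
  https.get('https://treestatus.mozilla.org/' + tree + '?format=json',
    function (res) {
      var body = '';
      res.on('data', function (chunk) { body += chunk; });
      res.on('end', function () {
        var info = JSON.parse(body);
        cb(null, info.status, info.reason); // e.g. 'open', 'closed'
      });
    }).on('error', cb);
}

fetchTreeState('mozilla-central', function (err, status, reason) {
  if (err) throw err;
  console.log('tree is ' + status + (reason ? ': ' + reason : ''));
});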

2) Reduce the activation energy to investigating performance regressions
by providing a profiler run for each raptor performance data point.
This is sorta covered by
https://bugzilla.mozilla.org/show_bug.cgi?id=1192746. If I am going to
investigate a performance regression, I potentially need to do a device
"context switch" where I ensure it's on trunk state, maybe do a b2g
build to get symbols, maybe have to update my checkout, etc. It can be
a hassle. And I'm just going to run the profiler myself anyways. If
raptor and its automatically filed bugs let me (a hypothetical
developer) directly click to a profiler run, that makes it much easier
for me to investigate and spot obvious regressions/causes, or just find
things I can improve that aren't actually a regression. In fact (as a
hypothetical busybody developer) I might even do drive-by profiler-run
analyses on apps that aren't my own.

Of course, the profiler run will distort the performance numbers, so it
needs to not be one of the counted runs. Note that I'm not suggesting
automated analysis of the profiler runs. That would be cool, but is
probably two orders of magnitude more work.

Andrew

Jonas Sicking

unread,
Oct 6, 2015, 4:06:58 PM10/6/15
to Etienne Segonzac, Eli Perelman, mozilla-...@lists.mozilla.org, Fabrice Desré, Naoki Hirata, Gregor Wagner, Dietrich Ayala
On Tue, Oct 6, 2015 at 9:58 AM, Etienne Segonzac <eti...@mozilla.com> wrote:
> So we're fine with the system that didn't work for 2.5 and we're making no
> promise for the future.
> Nice commitment to performance.

While the wording here is a bit harsh, I agree with the sentiment.

We've talked a lot about focusing on quality, which to me includes not
regressing performance. However, on an organizational level we still
haven't invested much in our testing infrastructure. Some engineers and
QA people have certainly done a lot of great work, but overall our
investment here is much too low.

If we are really serious about actually improving quality, I think the
first thing we should do is to invest in our continuous-integration
testing infrastructure.

That means getting tools like raptor not just "up and running", but
actually enabled on the mozilla-central main reporting pages.

This is a lot of work. It's not just a matter of getting test runners
working, but also of getting the numbers stable enough, tests not
failing intermittently too often, etc.

But if we don't make this investment at an organizational level, I
don't think that we can argue that we are focusing on quality, nor do
I see a reason to believe that quality will increase.

/ Jonas

Jonas Sicking

unread,
Oct 6, 2015, 4:11:09 PM10/6/15
to Etienne Segonzac, Eli Perelman, mozilla-...@lists.mozilla.org, Fabrice Desré, Naoki Hirata, Gregor Wagner, Dietrich Ayala
On Tue, Oct 6, 2015 at 10:55 AM, Etienne Segonzac <eti...@mozilla.com> wrote:
> We don't have the adequate tooling to achieve our performance goal.
>
> Every release we talk about performance as if the issue were a "developer
> awareness" issue, and we take a strong stance on how "we should never
> regress".
> But if we meant it we'd have more than 2 people working on the very
> challenging tooling work required. And believe me I'm fully aware of how
> challenging it is.

Yes. This is exactly right. I hope we put more people to work on our
testing infrastructure and tools.

It is not a small project. We won't make any progress unless we set up
strict goals and ensure that we get enough people working towards
those goals.

/ Jonas

April Morone

unread,
Oct 6, 2015, 4:39:15 PM10/6/15
to Eli Perelman, Wilson Page, James Burke, dev-...@lists.mozilla.org, Gareth Aye

I would love to, but my Firefox OS Flame reference phone went dead on me and won't come back on, even after I charged the battery to 100%.

On Oct 2, 2015 12:03 PM, "Eli Perelman" <eper...@mozilla.com> wrote:
If you would like to use Raptor to start performance testing your apps to get these numbers down, it's all documented on MDN [1]. These numbers were captured on a Flame-KK with 319MB of memory and the light reference workload, which is the baseline device for v2.2 -> v2.5. Raptor does require Node v0.12 [2], so if you find you need to switch between Node 0.10 and 0.12 for Gaia, I recommend something like "n" [3] to switch between them easily.

[1] https://developer.mozilla.org/en-US/Firefox_OS/Automated_testing/Raptor
[2] https://developer.mozilla.org/en-US/Firefox_OS/Automated_testing/Raptor#Prerequisites
[3] https://www.npmjs.com/package/n
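
A tiny guard along these lines (not part of Raptor itself, just a convenience sketch) can catch the wrong-Node mistake before a test run:

// check-node.js - run before a Raptor session; assumes only that "n" is
// installed if you want the suggested fix to work.
var parts = process.version.replace(/^v/, '').split('.').map(Number);

if (parts[0] === 0 && parts[1] === 12) {
  console.log('Node ' + process.version + ' - OK for Raptor');
} else {
  console.error('Raptor expects Node v0.12.x, found ' + process.version +
    '; run `n 0.12` to switch.');
  process.exit(1);
}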

Thanks,

Eli


On Fri, Oct 2, 2015 at 10:54 AM, Gareth Aye <garet...@gmail.com> wrote:
muy cache!


April Morone

unread,
Oct 6, 2015, 4:43:46 PM10/6/15
to Justin D'Arcangelo, Fabrice Desré, dev-...@lists.mozilla.org

Good points. I agree.

On Oct 2, 2015 12:49 PM, "Justin D'Arcangelo" <jdarc...@mozilla.com> wrote:
I would also like to add that this policy of immediately pouncing on devs who attempt to try something new that may cause the perf numbers to momentarily dip is part of why we seem to have a culture problem in FxOS dev where everyone is afraid to take any kind of risks. If we are not allowed to have a 2-3 week window to optimize after a huge landing such as this, then how are we supposed to experiment or take risks?

In a worst-case scenario, we have the old Music app in the dev_apps folder that we can switch back to at a moment’s notice. But we should be encouraging devs by giving them time to optimize after taking a huge risk. We are shipping to Mozillians (foxfooders) right now, who presumably understand that we are trying to make FxOS better.

-Justin


> On Oct 2, 2015, at 12:41 PM, Justin D'Arcangelo <jdarc...@mozilla.com> wrote:
>
> I understand, but the whole point of doing the switch now was to ensure that there were more people using the app to make sure there were no showstopper bugs that we weren’t aware of. If we don’t get people using the app for a few weeks, it’s possible that some major bugs could slip through. Especially in the case of an app like Music, where everyone possesses a wide variety of media files. It would be impossible for any of the Music app devs or QA testers to check every possible media file out there. However, with dogfooders using the app, it’s more likely that more types of media files will get tested.
>
> -Justin
>
>
>> On Oct 2, 2015, at 12:36 PM, Fabrice Desré <fab...@mozilla.com> wrote:
>>
>> Think about people dogfooding. It's barely acceptable to suddenly switch
>> to a much worse version of any app, even if your target is some arbitrary
>> deadline in the future and you're confident you'll fix the issues. You
>> would not do that on a live website, right?
>>
>> On 10/02/2015 09:31 AM, Justin D'Arcangelo wrote:
>>> I feel like this is the 3rd or 4th time I’ve had to give this explanation, but at least in the case of Music NGA, we merely landed a completely new, feature-complete app this week. The optimization phase of the code had not yet begun, hence the reason for the increase in the perf numbers. However, prior to landing, I *did* run Raptor every day for the past 2 weeks on Flame. In my Raptor results, Music NGA was coming out ~500ms *faster* than the old app. However, as I noted in the bug, I do not trust the numbers because of the OS-wide perf regression that was causing *both* Music apps to take about 3-4 seconds to launch.
>>>
>>> This week, the focus has been mainly on identifying and quickly addressing any bugs that came up after the initial testing of the app. I feel that we have things somewhat under control as far as broken functionality goes. Yesterday, we started working on optimizations. There are several areas where we are completely unoptimized at the moment:
>>>
>>> - album art caching/loading
>>> - thumbnail sizes
>>> - script loading
>>> - view caching
>>>
>>> All of these items will address the memory usage, startup time or both. So, please do not assume that we spent weeks optimizing the app before landing this week. We merely reached a state of “feature-complete” with a new codebase. We hope to meet or beat the prior app’s numbers before the v2.5 deadline.
>>>
>>> Thanks!
>>>
>>> -Justin
>>>
>>>
>>>> On Oct 2, 2015, at 12:01 PM, Fabrice Desré <fab...@mozilla.com> wrote:
>>>>
>>>> All the nga apps (music, contacts, sms) show significant regressions. Is
>>>> that only a lack of optimizations in these apps, in the bridge they all
>>>> use or design flaws in nga itself?
>>>> In any case, we have to stop porting new apps to nga until these
>>>> questions are answered.
>>>>
>>>>    Fabrice
>>>> --
>>>> Fabrice Desré
>>>> b2g team
>>>> Mozilla Corporation
>>>
>>
>>
>> --
>> Fabrice Desré
>> b2g team
>> Mozilla Corporation
>

April Morone

unread,
Oct 6, 2015, 4:44:39 PM10/6/15
to Fabrice Desré, Justin D'Arcangelo, dev-...@lists.mozilla.org

Now, I understand.

On Oct 2, 2015 1:17 PM, "Fabrice Desré" <fab...@mozilla.com> wrote:
On 10/02/2015 09:49 AM, Justin D'Arcangelo wrote:
> I would also like to add that this policy of immediately pouncing on devs who attempt to try something new that may cause the perf numbers to momentarily dip is part of why we seem to have a culture problem in FxOS dev where everyone is afraid to take any kind of risks. If we are not allowed to have a 2-3 week window to optimize after a huge landing such as this, then how are we supposed to experiment or take risks?

You have all the time you want if you don't put dogfooders at risk. No
one is saying that you should not take the risk to try something new
(side note, you spent enough time on spark & flyweb to know that). But
when it comes to shipping there is a minimum bar to meet, and with
basically a x2 memory usage we are not meeting it in this app yet,
sorry. Feel free to ship a new app alongside the existing one instead
and ask people to try it, since we can't do A/B testing.

The problem we have is that most people don't care enough about having a
stable nightly, which is why we haven't updated dogfooders for more than
a month now.

        Fabrice

April Morone

unread,
Oct 6, 2015, 4:49:58 PM10/6/15
to Fabrice Desré, Justin D'Arcangelo, dev-...@lists.mozilla.org

I have two phones: one has boot2gecko 3.0.0.0-prerelease, and the other had version 1.3 that updates brought only up to version 2.0, but no further (not sure why). But how do I continue testing, since I can no longer access and test version 3.0.0.0-prerelease now that the phone with that version on it is dead, and my other FxOS phone that currently has 2.0 on it isn't being updated?

April Morone

unread,
Oct 6, 2015, 5:16:42 PM10/6/15
to Fabrice Desré, Eli Perelman, mozilla-...@lists.mozilla.org, Dietrich Ayala, Etienne Segonzac, Gregor Wagner, Naoki Hirata

It seems that way, because I know one dev who has to do that: his app that worked before on one version no longer works on v2.5 or higher.

On Oct 6, 2015 1:07 PM, "Fabrice Desré" <fab...@mozilla.com> wrote:
On 10/06/2015 09:58 AM, Etienne Segonzac wrote:

> So we're fine with the system that didn't work for 2.5 and we're making
> no promise for the future.
> Nice commitment to performance.

We have tools to detect regressions and report bugs. We have people
triaging and following up on these bugs. What's left? Locking down devs
to fix issues? If you have suggestions I'm all ears, but I'm out of
politically correct ideas.

Gabriele Svelto

unread,
Oct 7, 2015, 5:45:34 AM10/7/15
to April Morone, Fabrice Desré, Justin D'Arcangelo, dev-...@lists.mozilla.org
On 06/10/2015 22:49, April Morone wrote:
> I have two phones: one has boot2gecko 3.0.0.0-prerelease, and the other
> had version 1.3 that updates brought only up to version 2.0, but no
> further (not sure why). But how do I continue testing, since I can no
> longer access and test version 3.0.0.0-prerelease now that the phone
> with that version on it is dead, and my other FxOS phone that currently
> has 2.0 on it isn't being updated?

You probably have the latest stable image flashed on that phone; if you
want to experiment with newer versions you'll need to flash the latest
nightly base image. The instructions for flashing it are available here
(the version you want is v18D_nightly_v4):

https://developer.mozilla.org/en-US/Firefox_OS/Phone_guide/Flame/Updating_your_Flame#Base%20Image

Once you've done that you might also want to flash the latest
engineering image if you want the current nightly build (due to a
problem with updates being too large, the Flame won't update itself past
a certain version). Instructions for doing so are available here:

https://developer.mozilla.org/en-US/Firefox_OS/Phone_guide/Flame/Updating_your_Flame#Base%20Image

As for your other device (the bricked one) as long as you're able to
reboot it into fastboot mode you should be able to flash a new base
image and bring it back to life. See the procedure here on how to enter
fastboot mode:

https://developer.mozilla.org/en-US/Firefox_OS/Phone_guide/Flame/Updating_your_Flame#Fastboot_mode

Gabriele


Jim Porter

unread,
Oct 7, 2015, 4:07:33 PM10/7/15
to mozilla-...@lists.mozilla.org
On 10/02/2015 12:16 PM, Fabrice Desré wrote:
> You have all the time you want if you don't put dogfooders at risk.

I don't believe we put dogfooders "at risk". Our preliminary testing of
the NGA Music app indicated that the memory regression wasn't a
significant issue in practice, unless you were testing on a 319MB Flame,
and even then things weren't awful. We had several folks using the NGA
version on a daily basis, and we only made the switch when a) the app
was feature-complete, and b) we were able to use the app ourselves
without any major problems.

In any case, dogfooders should be expected to suffer. That's the whole
point. If you're eating from the master branch, then you are *testing*
the software. Among other things, dogfooders help test whether a memory
regression in a (fairly-arbitrary) automated test actually impacts
real-world usage in a meaningful way. If this isn't the goal of the
dogfooding program, then we probably shouldn't be feeding them the
master branch.

> The problem we have is that most people don't care enough about
> having a stable nightly, which is why we haven't updated dogfooders
> for more than a month now.

This isn't a question of stability. The music app isn't (very*) broken;
it's just slower. It's only a matter of optimization (granted, excessive
memory usage can lead to OOMs, but we weren't seeing that in our tests).
Given my overall experience with attempting to dogfood, I'd love it if
the worst problem I had was that an app used more memory.

From what you say, it sounds like you're treating the dogfooding program
as more of a beta. Established practice elsewhere in Mozilla (Desktop,
Thunderbird, etc) is that the development and beta trees are *separate*.
If we want to ship beta-quality software to our dogfooders, then the
standard Mozilla way of doing so is to let our patches ride the trains
so they have time to get all the kinks out. I know we've been trying
(and failing) to get Firefox OS to ride the trains for a long time now,
but we don't have to copy Desktop. *Anything* that lets us provide a
frequently-updated channel where patches have had some baking time would
be a boon.

Generally, I think we should be landing stuff to master as quickly as
possible. It helps reduce merge conflicts, makes it easier for QA to
test, and lets the more-adventurous dogfooders see the latest-greatest.
You mentioned that we could have created a second app for people to try.
We did. No one used it**. That also makes development harder, since
switching out the apps will bitrot every in-progress patch.

In the end, we need to have a release management strategy compatible
with our quality expectations for the dogfood channel. From what you've
said, it sounds like master isn't the best branch to be pulling from.

- Jim

* There are some significant bugs, but nothing that's caused me any
serious stress, despite using the music app on a regular basis.

** Aside from the NGA Music developers and a couple of other people who
tried it out.

Fabrice Desré

unread,
Oct 7, 2015, 4:47:34 PM10/7/15
to Jim Porter, mozilla-...@lists.mozilla.org
On 10/07/2015 01:07 PM, Jim Porter wrote:

> In the end, we need to have a release management strategy compatible
> with our quality expectations for the dogfood channel. From what you've
> said, it sounds like master isn't the best branch to be pulling from.

Because master's quality is not where it should be. I've been using
nightlies of Firefox desktop for years and it's vastly better than what
we have on Firefox OS. That's not ok, so we need to fix that. Being able
to update often should be a strength - it's ridiculous that we have a
web-based OS but can't update it with a web-like cadence.

Short term we'll set up another "no smoketest blocker" weekly channel,
but longer term we absolutely need to reach a quality level similar to
desktop.

April Morone

unread,
Oct 7, 2015, 4:48:43 PM10/7/15
to Gabriele Svelto, Fabrice Desré, Justin D'Arcangelo, dev-...@lists.mozilla.org
Okay, cool. ty for this info.  Then, I will see about using Raptor for testing... if I can get this particular FxOS Flame phone to turn back on, again. cus it wont turn back on, again, at all.

Jim Porter

unread,
Oct 7, 2015, 4:56:49 PM10/7/15
to mozilla-...@lists.mozilla.org
On 10/07/2015 03:46 PM, Fabrice Desré wrote:
> Because master's quality is not where it should be. I've been using
> nightlies of Firefox desktop for years and it's vastly better than
> what we have on Firefox OS.

That's not an apples-to-apples comparison, since (as far as I know)
developers land to mozilla-inbound, not mozilla-central.

> it's ridiculous that we have a web-based OS but can't update it with
> a web-like cadence.

How does being "web-based" determine release cadence? The complexity of
the project seems to be a much more reliable indicator of the ideal
release cadence, and I think it's safe to say that Firefox OS is more
complicated than most websites.

> Short term we'll set up another "no smoketest blocker" weekly
> channel, but longer term we absolutely need to reach a quality level
> similar to desktop.

That sounds a lot like an inbound -> central model to me, so I don't see
why that's not a viable long term solution too.

- Jim

Fabrice Desré

unread,
Oct 7, 2015, 5:11:23 PM10/7/15
to Jim Porter, mozilla-...@lists.mozilla.org
On 10/07/2015 01:56 PM, Jim Porter wrote:
> On 10/07/2015 03:46 PM, Fabrice Desré wrote:
>> Because master's quality is not where it should be. I've been using
>> nightlies of Firefox desktop for years and it's vastly better than
>> what we have on Firefox OS.
>
> That's not an apples-to-apples comparison, since (as far as I know)
> developers land to mozilla-inbound, not mozilla-central.

Are you arguing for a gaia-inbound? That was consistently pushed back by
gaia-ers.

>> it's ridiculous that we have a web-based OS but can't update it with
>> a web-like cadence.
>
> How does being "web based" determine release cadence? The complexity of
> the project seems to be a much more reliable indicator of the ideal
> release cadence, and I think it's safe to say that Firefox OS is more
> complicated than most websites.

I disagree. Each app in isolation (except probably the system app) is
not more complex than big websites. We should be able to update them
almost per commit when there's no platform dependency.

>> Short term we'll set up another "no smoketest blocker" weekly
>> channel, but longer term we absolutely need to reach a quality level
>> similar to desktop.
>
> That sounds a lot like an inbound -> central model to me, so I don't see
> why that's not a viable long term solution too.

It's different. A "smoketest blocker only" branch is more like an aurora
branch where we limit the backported fixes. I agree that if it's working
well enough we may keep it longer.

Jim Porter

unread,
Oct 7, 2015, 5:27:20 PM10/7/15
to mozilla-...@lists.mozilla.org
On 10/07/2015 04:10 PM, Fabrice Desré wrote:
> Are you arguing for a gaia-inbound? That was consistently pushed back
> by gaia-ers.

I'd be happy with it (especially if we just designated master to be
gaia-inbound). I'm not really sure what issue people would have with it,
aside from sheriffs/ops having more up-front work.

> I disagree. Each app in isolation (except probably the system app)
> is not more complex that big websites. We should be able to update
> them almost per commit when there's no platform dependency.

The platform is a common issue. Unlike most websites, we're using
bleeding-edge features that often have performance issues at the Gecko
level, not to mention serious bugs that can regress our apps. In fact,
one of the more significant recent issues with the Music app (the
scanning process stalling partway through) was the result of a Gecko
regression.

Because of that, I don't think we can treat each app as independent of
the platform. Nearly all of our apps rely on some platform bit that
doesn't apply to the average website, and those bits are not
coincidentally the ones that we typically have the most problems with.
In some ways, the bugginess in Gecko has just rolled downhill into Gaia
where we don't have the same level of branch management to account for it.

> It's different. A "smoketest blocker only" branch is more like an
> aurora branch where we limit the backported fixes. I agree that if
> it's working well enough we may keep it longer.

An Aurora-like branch would be even better.

- Jim

Fabrice Desré

unread,
Oct 7, 2015, 5:46:52 PM10/7/15
to Jim Porter, mozilla-...@lists.mozilla.org
On 10/07/2015 02:26 PM, Jim Porter wrote:

> Because of that, I don't think we can treat each app as independent of
> the platform. Nearly all of our apps rely on some platform bit that
> doesn't apply to the average website, and those bits are not
> coincidentally the ones that we typically have the most problems with.
> In some ways, the bugginess in Gecko has just rolled downhill into Gaia
> where we don't have the same level of branch management to account for it.

Ok. That's why we must work on the gecko-breaking-gaia problem much more
aggressively. Gecko is stable enough for desktop nightlies; it should
also be stable enough for b2g nightlies. We know that there are many
tests not running on b2g for various reasons and we need to address that.

>> It's different. A "smoketest blocker only" branch is more like an
>> aurora branch where we limit the backported fixes. I agree that if
>> it's working well enough we may keep it longer.
>
> An Aurora-like branch would be even better.

To me this is an aurora-like branch, with weekly trains instead of 6
week trains. Time will tell if that works for us!
0 new messages