telemetry

Graydon Hoare

Aug 6, 2010, 7:03:20 PM

Hi,

I've filed a bug (585196) about building a telemetry system for
gathering more-than-just-crash stats from the field users. Performance
telemetry in particular. I was wondering if anyone had broad opinions on
the matter: whether this is a good or bad idea, whether it's already
being done and I just don't know where, how to go about doing it, who
might be interested in doing which pieces of work, why I'm a fool for
proposing it, etc.

Thanks,

-Graydon

Andrew Sutherland

Aug 6, 2010, 8:24:56 PM

to dev-pl...@lists.mozilla.org

Great idea.

A concern for Thunderbird is our memory footprint and having raw data
rather than random data points which frequently omit whether it's
virtual memory size, working set size, etc. would be quite useful. We
could also hopefully cram in extra-useful data like the number of open
message folder databases, each open folder's message count (or possibly
just its truncated logarithm, possibly histogrammed over open folders),
gloda's effective message indexing rate, etc.
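
To illustrate the "truncated logarithm, histogrammed over open folders" idea: the sketch below reduces each open folder's message count to floor(log2(count)) and tallies how many folders land in each bucket, so only coarse magnitudes would be reported. The function names and the choice of base 2 are assumptions for illustration, not anything Thunderbird is known to implement.

```python
from collections import Counter


def truncated_log2(count: int) -> int:
    """Return floor(log2(count)) for count >= 1; empty folders map to -1."""
    # int.bit_length() gives exact integer results, avoiding float rounding.
    return count.bit_length() - 1 if count > 0 else -1


def folder_size_histogram(message_counts: list[int]) -> dict[int, int]:
    """Map each truncated-log bucket to the number of open folders in it."""
    return dict(Counter(truncated_log2(c) for c in message_counts))


# Hypothetical message counts for five open folders.
histogram = folder_size_histogram([0, 3, 900, 1000, 70000])
```

Folders with 900 and 1000 messages share a bucket here, which is the point: the report says "about a thousand" without revealing exact counts.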

Andrew

Robert Kaiser

Aug 6, 2010, 8:47:52 PM

Andrew Sutherland wrote:

Isn't that what Firefox has Test Pilot for?

I'd surely be interested in seeing that available on a broader scope...

Robert Kaiser

--
Note that any statements of mine - no matter how passionate - are never
meant to be offensive but very often as food for thought or possible
arguments that we as a community need answers to. And most of the time,
I even appreciate irony and fun! :)

Jeff Muizelaar

Aug 6, 2010, 8:50:08 PM

to Graydon Hoare, dev-pl...@lists.mozilla.org

I've wanted this kind of thing for some time. Given the large size of
our user base, I think there's a lot of potential for understanding
and measuring the quality of our product. I'd love to help, but
already feel overcommitted so I hope others will be able to.

-Jeff

Alex Faaborg

Aug 6, 2010, 8:56:19 PM

to dev-pl...@lists.mozilla.org

> Isn't that what Firefox has Test Pilot for?

I might be wrong, but I don't believe Test Pilot is currently reporting back
performance data yet. It's primarily been used with Firefox 4 betas to
deploy surveys and capture interface usage metrics. Adding real world
performance telemetry to Test Pilot would be extremely valuable (especially
when correlated by various extensions and plugins).

-Alex

Graydon Hoare

Aug 6, 2010, 8:57:33 PM

to Robert Kaiser

On 10-08-06 05:47 PM, Robert Kaiser wrote:

> Isn't that what Firefox has Test Pilot for?

I make some mention of this in the bug; I think test pilot is probably a
poor fit here because of certain (sensible-for-it) choices it makes:
limited test sizes, limited-duration tests, fixed "reporting and
analysis" stage, two-stage opt-in, user-facing UI, etc.

I'd be happy to try to extend the test pilot infrastructure to handle
this sort of thing if there looks to be sufficient overlap; I just got
the impression it was for more user-facing "questionnaires" rather than
low-level performance numbers ubiquitously (and continuously) reported.

(Ideally we'd acquire new telemetry numbers any time someone committed
support to trunk for reporting one; no need to have to "set up a new
experiment" or such.)

-Graydon

Graydon Hoare

Aug 6, 2010, 9:03:10 PM

to Andrew Sutherland

On 10-08-06 05:24 PM, Andrew Sutherland wrote:

> A concern for Thunderbird is our memory footprint and having raw data
> rather than random data points which frequently omit whether it's
> virtual memory size, working set size, etc. would be quite useful. We
> could also hopefully cram in extra-useful data like the number of open
> message folder databases, each open folder's message count (or possibly
> just its truncated logarithm, possibly histogrammed over open folders),
> gloda's effective message indexing rate, etc.

Mm, I think while this is possible it's important to focus on a special,
limited kind of number:

- Those for which the data-burden will not be too punishing to
either gather in the client or report to a server.
- Those that are essentially anonymous in nature.
- Those that *aggregate* well across millions of users.

Regarding the last point: I'm not sure there's a simple way of
combining structured histograms in a way that preserves much signal; I
think you get a lot of cross-talk. One of the only reasons crash-stats
can sort by stack signature is that the "top N frames" of a given crash
are frequently the same, just due to the call graph being
mostly-deterministic. Fiddle the order of entries in your histograms
around a bit and I'm not sure sorting or ranking them remains tractable.
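
For contrast, one shape of data that does aggregate cleanly is a histogram whose bucket boundaries are fixed in advance and identical on every client: merging reports is then element-wise addition, and no per-report ordering has to be preserved. The sketch below assumes hypothetical bucket edges for a pause-time measurement in milliseconds; none of these names or values come from the bug.

```python
from bisect import bisect_right

# Hypothetical upper-exclusive bucket edges (ms), shared by every client.
BUCKET_EDGES = [10, 50, 100, 500, 1000]


def bucket_index(value_ms: float) -> int:
    """Index of the bucket holding value_ms; the last bucket is open-ended."""
    return bisect_right(BUCKET_EDGES, value_ms)


def local_histogram(samples_ms: list[float]) -> list[int]:
    """Client side: count samples per bucket; only these counts get reported."""
    counts = [0] * (len(BUCKET_EDGES) + 1)
    for sample in samples_ms:
        counts[bucket_index(sample)] += 1
    return counts


def merge(histograms: list[list[int]]) -> list[int]:
    """Server side: aggregate by summing matching buckets across reports."""
    return [sum(column) for column in zip(*histograms)]


client_a = local_histogram([4, 12, 75, 2000])
client_b = local_histogram([8, 9, 480])
combined = merge([client_a, client_b])
```

Each report is a handful of integers regardless of how many samples the client saw, which also keeps the data burden small on both ends.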

I'm no analytics person though, we could definitely use advice from
someone who does this kind of thing for a living.

-Graydon

Alex Faaborg

Aug 6, 2010, 9:09:52 PM

to Graydon Hoare, dev-pl...@lists.mozilla.org

> rather than low-level performance numbers ubiquitously (and continuously)
> reported.

We could theoretically expand the definition of Test Pilot to include more
ubiquitous and continuous aspects of performance reporting. Some advantages
of building this into test pilot could include:

-Already deployed to every beta user
-Existing community of people who like to analyze complex data sets
-Existing server side infrastructure for capturing the data
-Single extension for beta users to disable in order to turn off all forms
of metrics reporting
-Single extension for non-beta users to proactively opt in to helping us
gather data

But regardless of how we package it, getting access to real world
performance data is incredibly important (for instance it would have raised
a red flag when the eBay Browser Highlighter extension deployed with Skype
caused Firefox to have a 30 second start up time).

-Alex

David Bolter

Aug 7, 2010, 12:54:09 PM

to Alex Faaborg, Graydon Hoare, dev-pl...@lists.mozilla.org

Could this work be gracefully slotted into the Test Pilot roadmap? Are there
people for it? How does this go from good idea to implementation?

Cheers,
D

Robert Kaiser

Aug 7, 2010, 2:27:58 PM

Alex Faaborg wrote:

>> Isn't that what Firefox has Test Pilot for?
>
> I might be wrong, but I don't believe Test Pilot is currently reporting back
> performance data yet. It's primarily been used with Firefox 4 betas to
> deploy surveys and capture interface usage metrics. Adding real world
> performance telemetry to Test Pilot would be extremely valuable (especially
> when correlated by various extensions and plugins).

I meant the real Test Pilot, not the built-in version of it used as Beta
feedback module right now.

Robert Kaiser

Aug 7, 2010, 2:32:58 PM

Graydon Hoare wrote:

> I'd be happy to try to extend the test pilot infrastructure to handle
> this sort of thing if there looks to be sufficient overlap; I just got
> the impression it was for more user-facing "questionnaires" rather than
> low-level performance numbers ubiquitously (and continuously) reported.

Whoa. Continuous and ubiquitous reporting of user data without opt-in
and telling the user exactly what's up?

That's exactly the "phoning home" privacy-invading things we are trying
to kill off the Internet with some parts of the Mozilla mission!

Or have I understood you wrong and you want just to be a bit lighter
than what Test Pilot is doing now? That's surely something worth
thinking about and working on - just keep in mind that there's a lot of
reason behind what has been set up for Test Pilot - things like the tab
study that has been done there, and I think also what you are
suggesting, are ONLY possible if you at the same time put really high
value on participants' privacy.

Asa Dotzler

Aug 7, 2010, 4:05:04 PM

On 8/7/2010 11:32 AM, Robert Kaiser wrote:
> Graydon Hoare wrote:
>> I'd be happy to try to extend the test pilot infrastructure to handle
>> this sort of thing if there looks to be sufficient overlap; I just got
>> the impression it was for more user-facing "questionnaires" rather than
>> low-level performance numbers ubiquitously (and continuously) reported.
>
> Whoa. Continuous and ubiquitous reporting of user data without opt-in
> and telling the user exactly what's up?
>
> That's exactly the "phoning home" privacy-invading things we are trying
> to kill off the Internet with some parts of the Mozilla mission!
>
> Or have I understood you wrong and you want just to be a bit lighter
> than what Test Pilot is doing now?

Yes. You have misunderstood. No one in this project has ever advocated
for anything like "ubiquitous reporting of user data without opt-in".

Perhaps next time your initial comment can be the one that assumes the
best of your colleagues rather than the worst.

If you have some minor doubts about their good intentions, maybe make
that the short caveat at the end just to be sure.

- A

Robert Kaiser

Aug 8, 2010, 9:07:01 AM

Asa Dotzler wrote:

> Yes. You have misunderstood. No one in this project has ever advocated
> for anything like "ubiquitous reporting of user data without opt-in"

For one thing, I didn't even recognize that what Graydon wants is an
in-depth MoCo project, which it looks like you are indicating there.
For another, it's good that I only misunderstood it at first glance; it
was easy to read it that way, and I wanted to clear this up before it
hits the press, which even posts in various newsgroups tend to do these
days - and they don't care if it's a MoCo project or not, they
understand "one Mozilla" better than we do, or something. ;-)

> Perhaps next time your initial comment can be the one that assumes the
> best of your colleagues rather than the worst.

Perhaps next time you won't assume from the first letter on that I want
to attack someone, and will read the signature that I _intentionally_
wrote and placed under those posts.

Nicholas Nethercote

Aug 9, 2010, 12:07:19 AM

to Graydon Hoare, dev-pl...@lists.mozilla.org

Sounds like a wonderful idea to me. Firefox is so complex, and has so
many users doing so many different things, that if we don't do this kind
of thing we only understand a fraction of what's going on. It
might really help us understand those cases where people have really
appalling performance, e.g. startup takes 2 minutes for some strange
reason.

N

Benjamin Smedberg

Aug 9, 2010, 9:49:03 AM

Looking at the data you've proposed to collect, I'm concerned that we
wouldn't be able to turn it into information or act on it. Even if we
discover that a release has large GC/CC pause times, or average memory
usage, that doesn't give us a way to fix the problem.

One of the things that has been painfully clear from crash-stats is that
most of the valuable data comes in the form of correlations, particular
reports, and comments. If we discover that a certain set of users have
abnormal startup time, we're really going to want to know what makes those
systems different: what kinds of disks they have, how much RAM, whether
Firefox started soon after the OS was booted, and other data which can be
used to correlate/diagnose and hopefully fix.
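
A small sketch of the kind of correlation described above: given reports that carry a startup time plus a few system attributes, group by one attribute and compare the median per group. The report fields (`disk`, `ram_gb`, `startup_ms`) and the sample values are invented for illustration.

```python
from collections import defaultdict
from statistics import median

# Invented sample reports; a real system would receive these from clients.
REPORTS = [
    {"disk": "hdd", "ram_gb": 1, "startup_ms": 9000},
    {"disk": "hdd", "ram_gb": 4, "startup_ms": 7000},
    {"disk": "ssd", "ram_gb": 1, "startup_ms": 2500},
    {"disk": "ssd", "ram_gb": 4, "startup_ms": 1500},
    {"disk": "hdd", "ram_gb": 2, "startup_ms": 8000},
]


def median_startup_by(reports: list[dict], attribute: str) -> dict:
    """Group reports by one attribute and return the median startup per group."""
    groups = defaultdict(list)
    for report in reports:
        groups[report[attribute]].append(report["startup_ms"])
    return {key: median(values) for key, values in groups.items()}


by_disk = median_startup_by(REPORTS, "disk")
by_ram = median_startup_by(REPORTS, "ram_gb")
```

Without the `disk` and `ram_gb` fields the same reports would only say that some startups are slow, not which systems they come from.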

I also have privacy and collection concerns: all of our current
data-gathering metrics are either side effects of update features (AUS or
AMO update pings) or are opt-in (crash reports and test pilot). If we
actually collect the data we need to correlate problems, as above, then
there will certainly be potentially identifying information and we need the
reporting to be opt-in.

--BDS

Mike Shaver

Aug 9, 2010, 10:40:16 AM

to Benjamin Smedberg, dev-pl...@lists.mozilla.org

On Mon, Aug 9, 2010 at 9:49 AM, Benjamin Smedberg <benj...@smedbergs.us> wrote:
> I also have privacy and collection concerns: all of our current
> data-gathering metrics are either side effects of update features (AUS or
> AMO update pings) or are opt-in (crash reports and test pilot). If we
> actually collect the data we need to correlate problems, as above, then
> there will certainly be potentially identifying information and we need the
> reporting to be opt-in.

I think for this case we are going to want multiple interactions:

- collect harmless data (startup time, etc.)
- when reporting it in, have the collector signal to the browser that
the user's configuration is "interesting", so that we can ask the user
to opt in to giving more information
- get that additional information
- repeat as necessary
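
The interaction above could be sketched roughly as follows. Everything here is hypothetical: the report fields, the 30-second threshold, and the function names are assumptions for illustration and do not describe socorro or any existing Mozilla service.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Assumed threshold: startups slower than this count as "interesting".
SLOW_STARTUP_MS = 30_000


@dataclass
class CollectorReply:
    interesting: bool
    wanted_fields: list = field(default_factory=list)


def collector_handle(basic_report: dict) -> CollectorReply:
    """Server side: decide whether this configuration deserves a follow-up."""
    if basic_report["startup_ms"] > SLOW_STARTUP_MS:
        return CollectorReply(True, ["extensions", "ram_gb"])
    return CollectorReply(False)


def client_round(
    basic_report: dict,
    details: dict,
    user_opts_in: Callable[[list], bool],
) -> Optional[dict]:
    """Client side: send harmless data; send details only after explicit opt-in."""
    reply = collector_handle(basic_report)
    if not reply.interesting or not user_opts_in(reply.wanted_fields):
        return None  # nothing beyond the harmless report leaves the machine
    # Only the fields the collector asked for are sent, never the whole dict.
    return {name: details[name] for name in reply.wanted_fields}


details = {"extensions": ["ExampleToolbar"], "ram_gb": 2, "hostname": "private"}
sent = client_round({"startup_ms": 45_000}, details, lambda fields: True)
```

The `user_opts_in` callback stands in for whatever prompt the browser would show; "repeat as necessary" would amount to calling `client_round` again with a new list of wanted fields.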

We're starting to see some of this response mechanism evolve in
socorro, for telling users more about their crashes (like "try
updating Flash!" or "try disabling BrowserWigglePlus" or even "hey, we
think this is fixed, want to try the beta?"), so we will probably
learn from that.

Mike

Nicholas Nethercote

Aug 9, 2010, 6:18:55 PM

to Mike Shaver, Benjamin Smedberg, dev-pl...@lists.mozilla.org

On Tue, Aug 10, 2010 at 12:40 AM, Mike Shaver <mike....@gmail.com> wrote:
>
> I think for this case we are going to want multiple interactions:
>
> - collect harmless data (startup time, etc.)
> - when reporting it in, have the collector signal to the browser that
> the user's configuration is "interesting", so that we can ask the user
> to opt in to giving more information
> - get that additional information
> - repeat as necessary

Maybe it would be useful to have a way for a user to self-report if
they're seeing something weird (e.g. if someone has start-up times
of over a minute). Not necessarily something you'd want on a normal
release, but maybe in nightly builds.

N

johnjbarton

Aug 10, 2010, 12:54:19 AM

Gathering information from many users may be a bad idea if your goal is
to investigate unusual problems: by definition these are problems which
do not happen on typical systems. Gathering lots of data will cover up
the unusual problems with the overwhelming number of ordinary results.

Perhaps a better strategy is to develop a field-deployable kit for
specific users who can help. This dramatically lowers the bar for the
kit: the user is opting in, keen, and skilled. It won't help you develop
UIs for less sophisticated users, but that does not seem to be your
goal. As a small point of experience, Firebug ships tracing in our
alpha/beta builds and we have found bugs by having users trace code when
they cannot create test cases. I think some Firefox testers already do
similar things, just expand it by making the tools better.

jjb

Alex Faaborg

Aug 10, 2010, 2:26:39 AM

to johnjbarton, dev-pl...@lists.mozilla.org

> Gathering lots of data will cover up the unusual problems with the
> overwhelming number of ordinary results.

I'm concerned we have a false impression of ordinary, which we reinforce
with Ts scores, fresh profiles, personally curated lists of extensions, and
running Firefox on platforms like OS X and Linux that are less likely to
pick up injected crapware. At the moment we don't really know to what
extent a very long start up time is out of the ordinary. Gathering data
from beta users should be a diverse enough sample to be representative, but
relying purely on support requests might not give us a clear picture of
Firefox in the wild.
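
One way to quantify how far a "very long start up time" is from ordinary, assuming a sample of startup times has been gathered: report the median alongside a high percentile and the share of sessions above a cutoff. The sample values and the 30-second cutoff below are made up.

```python
from statistics import median, quantiles

# Made-up startup times in seconds for ten beta users.
STARTUP_S = [2, 3, 3, 4, 4, 5, 5, 6, 35, 120]


def slow_share(samples: list[float], cutoff_s: float) -> float:
    """Fraction of samples strictly above the cutoff."""
    return sum(1 for s in samples if s > cutoff_s) / len(samples)


typical = median(STARTUP_S)
# quantiles(..., n=10) returns the nine decile cut points; the last is p90.
p90 = quantiles(STARTUP_S, n=10)[-1]
share_over_30s = slow_share(STARTUP_S, 30)
```

In this invented sample the median looks healthy while a fifth of the users wait more than 30 seconds, so a summary that reports only the typical case would hide them and one that reports the tail would not.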

-Alex

Justin Dolske

Aug 10, 2010, 3:55:37 PM

On 8/9/10 6:49 AM, Benjamin Smedberg wrote:

> Looking at the data you've proposed to collect, I'm concerned that we
> wouldn't be able to turn it into information or act on it. Even if we
> discover that a release has large GC/CC pause times, or average memory
> usage, that doesn't give us a way to fix the problem.

I think there's still significant value in being able to measure things,
so we have a more accurate understanding of the scope of the problem,
and how improvements we make impact users.

> I also have privacy and collection concerns

Agreed, I think we should definitely err on the side of conservative
privacy. Test Pilot sounds like a good first step to experiment with
what's reasonable and helpful to collect. From there we can see if it
needs to remain a sensitive service or not.

Justin

Patrick McManus

Aug 10, 2010, 4:13:43 PM

to Graydon Hoare, dev-pl...@lists.mozilla.org

On Fri, 2010-08-06 at 17:57 -0700, Graydon Hoare wrote:
> On 10-08-06 05:47 PM, Robert Kaiser wrote:
>
> > Isn't that what Firefox has Test Pilot for?
>

[..]

>
> I'd be happy to try to extend the test pilot infrastructure to handle
> this sort of thing if there looks to be sufficient overlap;

I am very interested in this kind of application for obtaining a whole
range of information: cache hit rates, cache replacement patterns,
handshake times, persistent connection reuse rates, header sizes,
document sizes and subdocument counts, lifetime of a tab, buffer fill
rates, congestion window sizes, etc.

As a matter of fact I just started dreaming on the topic in a separate
email and Boris pointed to this thread. I also have no expertise in the
telemetry area, but I'd sure like to have the data and am happy to work
on it.

Patrick McManus

Aug 26, 2010, 12:30:40 PM

to Graydon Hoare, dev-pl...@lists.mozilla.org

Regarding statistics, I thought this article makes interesting
background information if you haven't seen it:

http://lwn.net/SubscriberLink/401769/ed3bba19da486a68/

The article is about collecting data in Linux and coordinating with
virtualized guests, but there are corollaries, imo.