Usage of Differential Privacy & RAPPOR


Georg Fritzsche

Aug 21, 2017, 11:56:44 AM
to gover...@lists.mozilla.org, dev-p...@lists.mozilla.org
Hi,

for Firefox we want to better understand how people use our product to
improve their experience. To do that, we are planning to run a new SHIELD
study that tests how we can collect additional data in a privacy-preserving
way. Check out the details below and send me your thoughts.

The problem.

One recurring ask from the Firefox product teams is the ability to collect
more sensitive data, like top sites users visit and how features perform on
specific sites.

Currently we can collect this data when the user opts in, but we don't
have a way to collect unbiased data without explicit consent (opt-out).

Asks for sensitive data center most commonly around knowing something in
relation to which sites a user visits:

- "Which top sites are users visiting?"
- "Which sites using Flash does a user encounter?"
- "Which sites does a user see heavy Jank on?"

In summary, most asks are for occurrences of an event X per domain (more
specifically eTLD+1 [1], e.g. facebook.com or google.co.uk).
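As a rough illustration of the eTLD+1 reduction described above, the sketch below maps a hostname to its registrable domain. It assumes a tiny, hypothetical subset of the Public Suffix List [1]; a real implementation would load the full list, including its wildcard and exception rules, which this sketch ignores.

```python
from typing import Optional

# Hypothetical, tiny subset of the Public Suffix List, for illustration only.
# A real implementation must load the full list, including wildcard ("*.")
# and exception ("!") rules, which this sketch does not handle.
PUBLIC_SUFFIXES = {"com", "org", "uk", "co.uk", "io", "github.io"}


def etld_plus_one(host: str) -> Optional[str]:
    """Return the eTLD+1 (registrable domain) of a hostname, or None if there is none."""
    labels = host.lower().rstrip(".").split(".")
    # Try candidate suffixes from longest to shortest; the first hit is the
    # longest matching public suffix.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            # A bare public suffix (e.g. "co.uk") has no registrable domain.
            return ".".join(labels[i - 1:]) if i > 0 else None
    # No listed suffix matched: treat the last label as the suffix (default rule).
    return ".".join(labels[-2:]) if len(labels) >= 2 else None


for host in ["www.facebook.com", "maps.google.co.uk", "alice.github.io"]:
    print(host, "->", etld_plus_one(host))
```

Note that `alice.github.io` stays distinct from other `github.io` sites because `github.io` is itself listed as a public suffix; that is the difference between eTLD+1 and simply keeping the last two labels.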

The solution.

One solution is the use of differential privacy [2] [3], which allows us to
collect sensitive data without being able to make conclusions about
individual users, thus preserving their privacy.

An attacker that has access to the data a single user submits is not able
to tell whether a specific site was visited by that user or not.
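To make that guarantee concrete: the simplest local differential-privacy mechanism is randomized response, which RAPPOR builds on. The sketch below is an illustration only, not Mozilla's implementation; `epsilon` and the function names are assumptions. It reports whether a single site was visited, but answers falsely with a fixed probability, so whatever one report says, it shifts an observer's odds that the user visited the site by at most a factor of e^epsilon.

```python
import math
import random


def randomized_response(visited: bool, epsilon: float, rng: random.Random) -> bool:
    """Report the true bit with probability e^eps / (1 + e^eps); otherwise report its opposite."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return visited if rng.random() < p_truth else not visited


def likelihood_ratio(epsilon: float) -> float:
    """How much more likely a 'yes' report is when the site WAS visited than when it was not.

    P(yes | visited) / P(yes | not visited) = p_truth / (1 - p_truth) = e^eps,
    so a single report only shifts an observer's odds by a bounded factor.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return p_truth / (1.0 - p_truth)


rng = random.Random(42)
eps = math.log(3)  # illustrative choice: report the truth 75% of the time
print([randomized_response(True, eps, rng) for _ in range(10)])
print(round(likelihood_ratio(eps), 6))  # prints 3.0
```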

The Google Open Source project called RAPPOR [4] [5] is the most widely
known and deployed implementation of differential privacy.

We have been investigating the use of RAPPOR for these kinds of use cases,
with initial simulation results being promising.
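For readers unfamiliar with RAPPOR [4] [5], the sketch below shows the general shape of its client-side encoding as described in the paper: the value (here an eTLD+1) is hashed into a Bloom filter, a permanent randomized response is applied once per value and memoized, and a fresh instantaneous randomized response is applied to every report. The class name, hash construction, and parameter values are illustrative assumptions; they are neither the reference implementation's nor those planned for the study.

```python
import hashlib
import random


class RapporClient:
    """Sketch of RAPPOR-style client-side encoding (after Erlingsson et al., 2014).

    Hash construction and default parameters are illustrative assumptions.
    """

    def __init__(self, num_bits=32, num_hashes=2, f=0.5, p=0.5, q=0.75,
                 cohort=0, seed=None):
        self.k, self.h = num_bits, num_hashes
        self.f, self.p, self.q = f, p, q
        self.cohort = cohort
        self.rng = random.Random(seed)
        self._prr_cache = {}  # value -> memoized permanent randomized response

    def bloom_bits(self, value: str) -> list:
        """Set up to h bits of a k-bit Bloom filter, one per (assumed) SHA-256-based hash."""
        bits = [0] * self.k
        for i in range(self.h):
            digest = hashlib.sha256(f"{self.cohort}:{i}:{value}".encode()).digest()
            bits[int.from_bytes(digest[:4], "big") % self.k] = 1
        return bits

    def permanent_rr(self, value: str) -> list:
        """Permanent randomized response: each bit is forced to 1 with prob f/2,
        to 0 with prob f/2, and kept with prob 1 - f. Computed once per value."""
        if value not in self._prr_cache:
            noisy = []
            for b in self.bloom_bits(value):
                u = self.rng.random()
                noisy.append(1 if u < self.f / 2 else 0 if u < self.f else b)
            self._prr_cache[value] = noisy
        return self._prr_cache[value]

    def report(self, value: str) -> list:
        """Instantaneous randomized response: fresh noise on every report.
        A permanent 1 bit is sent as 1 with prob q, a permanent 0 bit with prob p."""
        return [int(self.rng.random() < (self.q if b else self.p))
                for b in self.permanent_rr(value)]


client = RapporClient(seed=1)
print(client.report("example.com"))
print(client.report("example.com"))  # same permanent bits, new instantaneous noise
```

Only the output of `report()` would leave the client. Recovering population-level counts requires collecting many reports and decoding them against a list of candidate values, which this sketch does not show.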

Our plan.

What we plan to do now is run an opt-out SHIELD study [6] to validate our
implementation of RAPPOR. This study will collect the value for users’ home
page (eTLD+1) for a randomly selected group of our release population. We
are hoping to launch this in mid-September.

This is not the type of data we have collected as opt-out in the past and
is a new approach for Mozilla. As such, we are still experimenting with the
project and wanted to reach out for feedback.

Georg

References:

1: https://en.wikipedia.org/wiki/Public_Suffix_List

2: https://en.wikipedia.org/wiki/Differential_privacy

3: https://robertovitillo.com/2016/07/29/differential-privacy-for-dummies/

4: https://github.com/google/rappor

5: https://arxiv.org/abs/1407.6981

6: https://wiki.mozilla.org/Firefox/Shield/Shield_Studies

djoac...@gmail.com

Aug 22, 2017, 8:40:03 AM
to mozilla-g...@lists.mozilla.org
Hello.

I don't have the necessary information to say whether this is correct, moral, or necessary, but I will say that I believe opt-in is pro-privacy, while opt-out is anti-privacy.

If Firefox is dedicated to preserving privacy, then no opt-out data feature should be added.

Thank you.

omar.a...@gmail.com

Aug 22, 2017, 9:46:26 AM
to mozilla-g...@lists.mozilla.org
What about the fact that I don't want to give my information even in an anonymous and untraceable way? You understand that anonymity is just part of the equation and not the single issue at stake here.

philipp.k...@googlemail.com

Aug 22, 2017, 9:46:26 AM
to mozilla-g...@lists.mozilla.org
If this is implemented, I’ll have to file a complaint with the relevant state and federal data protection commissioners (Landes- und Bundesbeauftragten für Datenschutz) and possibly escalate this to the EU data protection commissioner's office.

I’d prefer if you’d avoid doing this.

lede...@gmail.com

Aug 22, 2017, 9:46:27 AM
to mozilla-g...@lists.mozilla.org
Hi there.

I do not understand the need to know the top 100 sites for improving the "product".

Can you explain?

I see a lot of big issues that should be improved: performance, an overloaded feature set, UI quirks, and several page rendering issues.

None of these would be addressed by accessing, analysing, and storing even anonymously gathered user data.

cmn3...@gmail.com

Aug 22, 2017, 9:46:28 AM
to mozilla-g...@lists.mozilla.org
Wow... I'm not sure I can say it any other way.

Having something like this be opt-out is very anti-privacy.

I'm thoroughly disappointed. I guess, even after having used Firefox since the single digit releases, it's time to look elsewhere.

I wish you the worst of luck in your new venture to infringe even further upon the privacy of your users.


CMN

pik7...@gmail.com

Aug 22, 2017, 9:46:35 AM
to mozilla-g...@lists.mozilla.org
I somehow don't believe in this anymore.
What I notice is that there are more and more bodies interested in this kind of data to provide better ads to 'customers', and it sounds like one of those bodies reached Firefox and showed them how much money they can get for this data.

If you implement this, I will say 'bye bye' to this web browser.

Gervase Markham

Aug 22, 2017, 10:39:36 AM
to Georg Fritzsche
Hello, Redditors...

On 21/08/17 08:56, Georg Fritzsche wrote:
> One solution is the use of differential privacy [2] [3], which allows us to
> collect sensitive data without being able to make conclusions about
> individual users, thus preserving their privacy.

If you are going to comment here, your comment would be more useful if
it showed that you have taken the time to understand differential
privacy and RAPPOR, and explained why you think it's not sufficient (if
that's what you think, after studying it).

Comments which assume that we are proposing to collect browser data with
no privacy protections at all are not helpful, because they assume
things which are not true.

Gerv

siva....@gmail.com

Aug 22, 2017, 10:45:09 AM
to mozilla-g...@lists.mozilla.org
> Asks for sensitive data center most commonly around knowing something in
> relation to which sites a user visits:
>
> - "Which top sites are users visiting?"
> - "Which sites using Flash does a user encounter?"
> - "Which sites does a user see heavy Jank on?"
>
> In summary most asks are for occurrences of an event X per domain (more
> specifically eTLD+1 [1], e.g. facebook.com or google.co.uk).


Hello Georg, three questions:

1. Could you explain exactly what kinds of problems (which are currently a big source of trouble) would be solved easily with the currently proposed plan? And also what kinds of problems cannot be solved with this data, but could be solved with more invasive data collection?

2. What exactly is the problem if the collection is opt-in? Yes the data is "biased", so what? Are you worried that you might miss certain issues faced mostly by users who don't opt in? Is there any justification for this argument, or is it just a hunch?

3. For those users who consider privacy most valuable, would there be an easy way to opt out, which *guarantees* that Mozilla collects *no information* about their browser usage?

Thank you.

Siva

turi...@gmail.com

Aug 22, 2017, 10:47:22 AM
to mozilla-g...@lists.mozilla.org
But the disagreement is not about whether the technology works. It is that, in principle, collecting more data without users having the option to disable it is morally wrong, no matter how trustworthy you are or how useful it is for the product.

Gervase Markham

Aug 22, 2017, 10:49:56 AM
to mozilla-g...@lists.mozilla.org
On 22/08/17 07:45, turi...@gmail.com wrote:
> But the disagreement is not about the idea that the technology does
> not work. But that in principal collecting more data without users
> having the option for disable it is moral wrong no matter how
> trustworthy you are or useful it is for the product.

Users _do_ have the option to disable it.

Gerv

dan.ca...@gmail.com

Aug 22, 2017, 11:02:16 AM
to mozilla-g...@lists.mozilla.org
On Monday, August 21, 2017 at 10:56:44 AM UTC-5, Georg Fritzsche wrote:
> One solution is the use of differential privacy [2] [3], which allows us to
> collect sensitive data without being able to make conclusions about
> individual users, thus preserving their privacy.

Differential privacy is a great tool; however, I'm concerned that even if we do everything *technically* correctly to preserve user privacy, the *optics* associated with this sort of data collection were not addressed in this email.

We attempted to do similarly with User Profile ("UP") / Directory Tiles projects in Content Services, which proposed completely local history analysis for purposes of advertising and content discovery. All of which was done in a way that absolutely protected user privacy (the analysis never left the local machine), but we weren't able to overcome the superficial impression that Firefox was tracking users.

1. How do you propose we address the change in (and mis-)perception of Firefox as a result of this telemetry?

2. Secondly, I'm far more comfortable with data collection that's strictly tied to performance (jank, Flash domains, etc.) than I am with personal data, like homepages or top sites. Would this project be as valuable *without* collecting personalized information like the above?

Best,
Dan

malt...@gmail.com

Aug 22, 2017, 11:02:26 AM
to mozilla-g...@lists.mozilla.org
Why collect on the client side when the server side for the larger sites most definitely collects usage data to much more detail than you would ever do?

Wouldn't Mozilla be in a strong enough position to ask for statistics of user agents from Facebook, Google, etc., and maybe even what hoops their respective engineering departments have to jump through to make the site work on all major platforms?

David Teller

Aug 22, 2017, 11:05:25 AM
to gover...@lists.mozilla.org
Hello Siva,

I'll try and chime in.

1. The main problem that is at stake here is improving Firefox for
websites that our users actually use. We fight a perpetual fight to
improve Firefox for our users, which means that we need to know where to
spend our limited resources. While we can manually or semi-automatically
test a number of websites to find out whether, for instance, Firefox 55
is faster or slower than the previous version, for the moment, we have
to rely upon guesses to determine whether these are websites that our
users actually use.

With more data, we could automatically determine such information and more.

For instance, we could correlate this with crash reports of users who
choose to submit these reports and automatically find out that since
version 60 of Firefox, site foobar.com causes crashes, or maybe that
crashes have decreased on that site since the release of a new graphics
driver. Or we could correlate this with performance reports of users who
opt-in for such reports and automatically find out that since version 60
of Firefox, our performance on foobar.com has improved/decreased.

So, to summarize:

- being able to apply effort to websites that matter to our users;

- being able to automatically detect problems (or improvements) on websites.



2. I don't know the details sufficiently to answer on this point.
However, I can give you my personal thoughts on it.

We have long known (by comparing our data with other available
sources of data such as Alexa) that there is a considerable bias between
users that opt-in for Telemetry and the rest of our users. Users who
opt-in for Telemetry are typically much more technically aware than
other users, but also some countries were largely over-represented.



3. That's a UX question, so that's pretty far from my expertise, but
there is already a section "Firefox Data Collection and Use" in
preferences, which may be used to opt-in/opt-out. I also seem to
remember that Firefox actually asks you upon first installation whether
you are ok with sending data.


I hope this helps,
David


turi...@gmail.com

Aug 22, 2017, 11:12:59 AM
to mozilla-g...@lists.mozilla.org
On Tuesday, August 22, 2017 at 4:49:56 PM UTC+2, Gervase Markham wrote:
> On 22/08/17 07:45,
> > But the disagreement is not about the idea that the technology does
> > not work. But that in principal collecting more data without users
> > having the option for disable it is moral wrong no matter how
> > trustworthy you are or useful it is for the product.
>
> Users _do_ have the option to disable it.
>
> Gerv

Correct me if I am wrong, but this is presented as a solution to collect data without having to get explicit consent. It is not clear whether users will be able to disable it. If they will, please make that clearer, as otherwise it will lead to misunderstandings.

How will this work? Having it enabled by default without making it explicitly clear that it is happening is still morally wrong and anti-privacy. The policy pretty much hopes that people will be either uninformed or too complacent to disable it. Otherwise, what's the difference from asking for explicit consent?

Gervase Markham

Aug 22, 2017, 11:16:28 AM
to turi...@gmail.com
On 22/08/17 08:07, turi...@gmail.com wrote:
> Correct me if i am wrong but this is presented as a solution to
> collect data without having to get explicit consent. It is not clear
> that user will be able to disabled it or not. If this the case then
> please be more clear as it will lead to misunderstandings.

Perhaps it could have been more clear in the initial post, but Georg
definitely says:

"This is not the type of data we have collected as opt-out in the past
and is a new approach for Mozilla."

So this data collection will have an opt-out.

> How will this work? Having it enabled by default without making
> explicitly clear that it is happening is still morally wrong and
> anti-privacy. The policy pretty much hopes that people will be either
> be uninformed or complacent in disabling it. Otherwise whats the
> different to asking for explicit consent?

We have other data we collect which is opt-out, such as how the browser
is performing. The idea of opt-out data collection is not really the
question; the difference here is that the data is potentially more
sensitive. To address that, we want to use differential privacy and
RAPPOR; a good discussion to have, therefore, is whether those tools do
the job or not.

Gerv

jot...@hotmail.com

Aug 22, 2017, 11:28:25 AM
to mozilla-g...@lists.mozilla.org

I made a thoughtful comment and it was rejected with a response that I need to show I'm familiar with Differential Privacy and RAPPOR before commenting. I'll do that before my actual comment.

I'm a computer scientist working in an adjacent field and I've read enough papers on Differential Privacy to understand it.

The objection is not to DP's privacy guarantees, but to the fact that FF will phone home with every website we visit. A neat list of all the websites I visit will be sent to a central location, in chronological order.

A second objection is the users' response, regardless of guarantees. You can't explain DP to everyone. For many users it will amount to "trust us". Microsoft did the same with the Windows 10 telemetry and it resulted in enormous backlash from users, widely reported in tech websites. Consider that before committing.

---

What follows was my actual suggestion, which is orthogonal to DP.

The example questions can be answered with no need for the bulk telemetry that's proposed:

> "Which top sites are users visiting?"

There's enough public data available on what sites are most popular. No need for yet another database on that.

> "Which sites using Flash does a user encounter?"

Mozilla can crawl this information itself, based on the above websites list. It doesn't need to ask users to do it.

> "Which sites does a user see heavy Jank on?"

Slowdowns and similar bad user experiences would better be treated like crash reports.

Offering to send anonymous info on one of these events, through a popup or dropdown hanger (similar to the password manager, security certificates, etc), would fulfill the same objective. A user is inclined to help when his/her favorite website suddenly starts slowing down, or throwing errors. At this point it's also easy to check a box to "always do this from now on".

Rather than authorizing abstract, bulk usage, the user would see the value in sending a report about the current issue, because he/she is experiencing it and wants Mozilla to fix it. I'm sure there would be more reports in this manner, just like there are more than enough crash reports being sent.

---

In conclusion, no telemetry is one of the main reasons for adopting FF over Chrome. Without dismissing the developers' point of view, given the importance of this feature, the onus should be on them to show that the alternatives have been explored and are not feasible, rather than putting the onus on users to show holes in the DP scheme, which is too restrictive for a discussion.

Danny

Aug 22, 2017, 11:28:31 AM
to mozilla-g...@lists.mozilla.org
Hoping to provide constructive feedback.

A little about me as a user first so you can understand:
1) I purposely do not use Chrome
2) I purposely do not use Google and use DuckDuckGo instead as my search engine
3) I purposely do not use Gmail and use FastMail instead
4) I use uBlock, self-destructing cookies, Privacy Badger, etc
5) I use container tabs
6) I opt-out of any data collection
7) The first thing I do with Firefox Focus on iOS is to opt out of data collection
8) I like the idea of Firefox Send being end-to-end encrypted
9) I encrypt my backup locally prior to sending it to the cloud
10) I do not use Dropbox and use Tresorit instead
11) I disabled all telemetry on Windows 10

Now, if this was pushed out, the first thing I would do is still to disable it.
But why is that? RAPPOR is awesome right?

I briefly read the overview for it, so please correct me if I have any misunderstanding.

RAPPOR is kind of like the protection of farting in a crowded elevator. Somebody in that group did it, but we don't know who for sure. Yes, that's better privacy for sure, but is it total privacy? Not to me, because you still know that somebody in that elevator very likely did it. Not a perfect analogy, but hopefully it demonstrates the cracks.

Why do users like me do end-to-end encryption? What does that give me?
It gives me the ability to trust nobody except my end.

RAPPOR does not offer that same level of protection, and I think that's hopefully clear by the elevator example. That's why the first thing I'll do is disable it.

Why do users like me use uBlock and other things? What do those things give us? Total control in our hands. Does RAPPOR measure up to that standard? I think no.

But I very much want Firefox to succeed because the alternative of Chrome or Edge is a sad world. And I very much would like to submit data to Firefox, but not in an automatic and uncontrolled (by me) way.

Why does the choice have to be binary?

If I may suggest, could Mozilla investigate doing a bit more UX work to make data collection palatable to users like me?

And that means putting me in control.

I know that perf data is extremely important. In fact, I was just seeing freezes yesterday and that's kinda frustrating. But I still won't enable automatic data collection. What I think would be nice is if you actually just prompted me "crash reporting" style. Ask me, "hey... we know Firefox was a bit slow for you on such and such site, would you like to let us know?" And then give me the option of "Yes, this one time", "Yes, always on this site", "No".

Of course you still have to anonymize that data, but what this does is you've given me control. See the distinction?

What if you want to know what top sites I'm visiting, or what sites with Flash I encounter? Same thing. Yes, I know the suggestion is eTLD+1, but hopefully the mindset of users like me has been explained at this point to show that it still doesn't measure up. I would love to let you know what sites I visit. Give me a little feedback ability: show me the sites that you're going to send and let me check off anything I don't want. Again, you probably don't want to annoy the user with needless prompts, so maybe add the ability to say "Yes, these sites are always OK to send" or "Always exclude this site".

I'm sure if you proceed as planned to automatically opt users in to this collection, you're going to get more data. But you can bet you'll get none from me.

I want to send you data, so please help me help you.

turi...@gmail.com

Aug 22, 2017, 11:53:37 AM
to mozilla-g...@lists.mozilla.org

>
> The idea of opt-out data collection is not really the
> question; the difference here is that the data is potentially more
> sensitive.
>
> Gerv

Exactly. Because the data is more sensitive, the question of opt-out comes before the question of the technology. If a person thinks that opt-out data collection is wrong, it does not matter how effective the privacy technology is.

This definitely has the potential to hurt the Firefox brand as a product that respects choice and does not try to trick you.

Anyway, since you want a broader discussion of the actual technology, I will stop here. Thank you for the replies.

Aaron Klotz

Aug 22, 2017, 1:21:36 PM
to gover...@lists.mozilla.org
For the purposes of this thread I am not taking a specific position on
the overall issue, but as somebody who has worked on performance I would
like to point something out for discussion:

On 8/22/2017 9:19 AM, jotaf98--- via governance wrote:
>> "Which top sites are users visiting?"
> There's enough public data available on what sites are most popular. No need for yet another database on that.
>
>> "Which sites using Flash does a user encounter?"
> Mozilla can crawl this information itself, based on the above websites list. It doesn't need to ask users to do it.

I don't think it's that simple. Plenty of content on top sites is
tailored to the user in some way. To measure how a browser is performing
on those sites, one would want to measure performance for actual content
that real users are seeing. Throwing some kind of crawler at it is
unlikely to produce representative samples of encounters with such
content, IMHO.

Georg Fritzsche

Aug 22, 2017, 1:44:59 PM
to lede...@gmail.com, mozilla-g...@lists.mozilla.org
Hi,

great question.
We have been getting better at tracking down general performance issues and
breakage, but usually we don't know which sites this is happening on.
I think there are two parts here:
- Understanding which domains things happen on, e.g. on which sites did
feature X break.
- Understanding what the actual top sites are in Firefox to inform testing
and investigations.

Neither of these are settled yet, but were specific asks that came up.
Collecting these - in a privacy-preserving way - would be valuable.
Georg

Irvin Chen

Aug 22, 2017, 1:54:51 PM
to Aaron Klotz, gover...@lists.mozilla.org
I totally support any user research, as long as it follows the rules we
advocate for...

“Individuals’ security and privacy on the Internet are fundamental and must
not be treated as optional.”
https://www.mozilla.org/en-US/about/manifesto/#principle-04

“No surprises
Use and share information in a way that is transparent and benefits the
user.”
https://www.mozilla.org/en-US/privacy/principles/

“Privacy as the default setting: ...privacy must be top of mind. It also
means that strong privacy should always be the ‘by-default setting’.”
https://blog.mozilla.org/netpolicy/2016/05/25/the-countdown-is-on-24-months-to-gdpr-compliance/

“Privacy by Default
Privacy by Default simply means that the strictest privacy settings
automatically apply once a customer acquires a new product or service. In
other words, no manual change to the privacy settings should be required on
the part of the user.”
http://www.eudataprotectionregulation.com/data-protection-design-by-default




gene...@gmail.com

Aug 22, 2017, 3:33:04 PM
to mozilla-g...@lists.mozilla.org
How do you see this study lining up with the data already being collected for the Firefox Health Report? This discussion strikes me as raising similar questions and tradeoffs, which were discussed on this forum and also blogged about by Mitchell and the metrics team.*

Users currently have options under Preferences to "automatically send technical and interaction data to Mozilla" and to "send crash reports to Mozilla." How do you see handling the opt-out for the study from a UX perspective? Is there a way to automatically opt out those of us sending a DNT header? Or respect a user with data sharing turned off?

*https://blog.lizardwrangler.com/2012/09/21/firefox-health-report/ and https://blog.mozilla.org/metrics/2012/09/21/firefox-health-report/

Rubén Martín

Aug 22, 2017, 3:57:32 PM
to Irvin Chen, Aaron Klotz, gover...@lists.mozilla.org
Hi,

I completely agree with Irvin here. This proposal makes me extremely
uncomfortable; even after reading about RAPPOR, I share the same
concerns others have expressed in this thread about privacy and user
expectations.

I think we can be creative if the problem is that we need to understand
which sites are not performing OK on Firefox without compromising our
values. Sending sensitive information as opt-out is not, in my opinion,
the way to go.

Cheers.



--
Rubén Martín [Nukeador]
Mozilla Reps Mentor
http://www.mozilla-hispano.org
http://twitter.com/mozilla_hispano
http://facebook.com/mozillahispano



hen000....@gmail.com

Aug 22, 2017, 7:44:35 PM
to mozilla-g...@lists.mozilla.org
On Monday, 21 August 2017 15:56:44 UTC, Georg Fritzsche wrote:
> "Which sites does a user see heavy Jank on?"


Why can't Firefox display a bar at the top asking the user to report the page for issues instead?

A bar like the one that tells users about non-responsive scripts, etc. That way you are not always collecting private data about the user, and you also don't get the 'bias' of an opt-in system.

morocca...@gmail.com

Aug 22, 2017, 7:44:48 PM
to mozilla-g...@lists.mozilla.org
My initial thought was that RAPPOR is just another data collection system that claims to respect its users' privacy but doesn't really, though after a little research I've found the opposite: it really does respect privacy by combining real data with fake, random data. It reminds me of what WikiLeaks did to prevent its real sources from being discovered. I just hope more sites take this approach to data collection, since it gives the best of both worlds.
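The "fake, random data" here is, more precisely, random noise applied to each user's own report, which only averages out in aggregate. A minimal sketch, assuming plain one-bit randomized response with truth probability `p_truth` (names and numbers are illustrative; RAPPOR's actual decoder is more involved): since E[observed] = (1 - p) + rate * (2p - 1), the true rate can be estimated from the observed "yes" rate.

```python
import random


def randomized_response(truth: bool, p_truth: float, rng: random.Random) -> bool:
    """Report the true bit with probability p_truth, otherwise its opposite."""
    return truth if rng.random() < p_truth else not truth


def estimate_true_rate(reports: list, p_truth: float) -> float:
    """Unbiased estimate of the true 'yes' rate from noisy reports.

    E[observed] = p * rate + (1 - p) * (1 - rate), solved for rate.
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)


rng = random.Random(7)
true_rate, p_truth, n_users = 0.30, 0.75, 100_000
truths = [rng.random() < true_rate for _ in range(n_users)]
reports = [randomized_response(t, p_truth, rng) for t in truths]
print(f"observed yes-rate: {sum(reports) / n_users:.3f}")
print(f"estimated true rate: {estimate_true_rate(reports, p_truth):.3f}")
```

Each individual report is still noisy (here one in four is flipped), which is what gives per-user deniability; the aggregate estimate only becomes accurate with many users.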

patrick...@gmail.com

Aug 22, 2017, 7:45:28 PM
to mozilla-g...@lists.mozilla.org
I think the premise that you need to collect data on the top sites that a user visits may be flawed. Won't you be contributing to the dominance of (already-dominant) top sites by optimizing for them specifically?

It also seems that you could get a reasonably accurate idea of what sites are most popular among Firefox users by looking at the most popular sites overall and optimizing for those. Do you expect Firefox users to be so wildly different that their top sites don't look more or less the same as the top sites overall?

Further, as has been shown again and again, data thought to be untraceable to any particular user has been deanonymized through correlations with other data sets. Something like top visited sites are actually a pretty juicy target as well for state actors, blackmailers, etc.

Finally, the mere act of doing random (from the user's perspective) telemetry is problematic. First, users on limited connections don't need to be using more data than they already are. Second, the mere act of making a request with IP endpoints, even if it sends only a ping, can expose an unprepared user who needs privacy. I understand that Firefox already does some of this, but that's not really a reason to do more.

From a business perspective, a major differentiating factor (arguably the only differentiating factor) of Firefox is that Mozilla isn't Google. The closer you get to that line, the more damage you'll do to the trust users have in Mozilla.

I recommend that you take the high road on this one. I'm not sure what the motivator is here (does having more data give you leverage with partners?), but the stated justification (improving speeds on particular websites) seems too weak to excuse the valid privacy concerns.

Mozilla: we want to trust you. We do trust you. We know it's tough out there. You're playing with the big kids, and they have intel that, admittedly, probably helps them improve their products. But the way you can improve your product is by NOT collecting that intel. Do the Mozilla thing, not the Google thing.

Panos Astithas

Aug 23, 2017, 8:11:03 AM8/23/17
to hen000....@gmail.com, mozilla-g...@lists.mozilla.org
On Tue, Aug 22, 2017 at 9:45 PM, hen000.c.young--- via governance <
gover...@lists.mozilla.org> wrote:

> On Monday, 21 August 2017 15:56:44 UTC, Georg Fritzsche wrote:
> > "Which sites does a user see heavy Jank on?"
>
>
> Why can't FireFox display a bar at the top asking the user to report the
> page for issues instead?
>

Because this is the definition of opt-in data collection ("can we collect
this data? Sure, I'm in!"), which has the data quality issues already
mentioned. Opt-out data collection means that by default we would be
collecting the data, unless the user goes to the preferences panel and opts
out of it (there is also a notification bar for every new installation to
remind users of this policy and how to opt out).

Firefox already has both opt-out data collection (e.g. longest cycle
collection pause) and opt-in data collection (e.g. whether the device
supports touch input). We believe the differential privacy approach of
RAPPOR gives us the mathematical proof that we can collect some
of the more privacy-sensitive data in a way that doesn't reduce user
privacy.

And to reiterate the obvious: we don't collect user data to build user
profiles and we never want to. We are not in the advertising business; we
are a non-profit working for the public benefit. We will always
provide ways to disable data collection, even for the non-personally
identifying information we need to collect in order to improve the browser.
We believe the academic research of the last few years on differential
privacy has figured out ways to collect data without infringing on user
privacy. Apple, Google and others are already using the fruits of this
research in their products. If there are reasons to believe these methods
aren't working well enough, we would very much like to know about them!

Panos

Mike Hommey

Aug 23, 2017, 9:05:52 AM8/23/17
to Panos Astithas, hen000....@gmail.com, mozilla-g...@lists.mozilla.org
On Wed, Aug 23, 2017 at 03:10:31PM +0300, Panos Astithas via governance wrote:
> (...) We are not in the advertising business, (...)

In fairness, we have been, at one moment, to the surprise of many. I can
understand that people could fear that happens again some day.

Mike

oliver....@gmail.com

Aug 23, 2017, 10:19:47 AM8/23/17
to mozilla-g...@lists.mozilla.org
I know it's super-anonymised, but given the controversial nature of the subject, it might help people to understand more about the actual data you're planning to collect.

So what exactly do you plan to collect? How long do you plan to store this data? How will it be stored? Is there a process in place to ensure it isn't kept for any longer than necessary? Who will have access to the data and how?

Thanks,
Olly.

pgne...@gmail.com

Aug 23, 2017, 10:20:31 AM8/23/17
to mozilla-g...@lists.mozilla.org
On Wednesday, August 23, 2017 at 6:05:52 AM UTC-7, Mike Hommey wrote:
> In fairness, we have been, at one moment, to the surprise of many. I can
> understand that people could fear that happens again some day.

Well noted. And then some.

This debate is astonishing, but no longer surprising.

In my book, if a vendor wants to *begin* to establish trust, step *1* is, as Irving above referenced, "Privacy by Default".

OTOH, if they want to ensure that they lose it rapidly, "Opt-Out" is, respectively, a GREAT starting point.

Is there at least going to be an about:config param, ENV var, etc. to set PRIOR to this auto-toggle to Opt-Out? So that the toggle, even if only for moments, is preventable for both existing users, and for new installs? Or do we need to start sleuthing for, and firewalling, telemetry endpoints?

Display name

Aug 23, 2017, 10:20:51 AM8/23/17
to mozilla-g...@lists.mozilla.org
On Monday, August 21, 2017 at 5:56:44 PM UTC+2, Georg Fritzsche wrote:
> for Firefox we want to better understand how people use our product to
> improve their experience.

You already have the tools to do that. It's called a *survey*.
Plus, I have never heard any answer from you to all the suggestions we give in my Computer Group (20+ people). In years, not one single answer...
And now you want to "improve our experience"??? Really???
LOL

Kurt Roeckx

Aug 23, 2017, 10:25:50 AM8/23/17
to mozilla-g...@lists.mozilla.org
On 2017-08-21 17:56, Georg Fritzsche wrote:
> What we plan to do now is run an opt-out SHIELD study [6] to validate our
> implementation of RAPPOR. This study will collect the value for users’ home
> page (eTLD+1) for a randomly selected group of our release population We
> are hoping to launch this in mid-September.

This at least looks confusing to me. Will Firefox have a list of
possible homepages, and then send some answer to Mozilla for a random
sample (or all) of those?

That is at least how I expect it to work, but "collect the value" can be
interpreted in multiple ways. I suggest someone writes a nice
explanation of how this works.


Kurt

Alex Gaynor

Aug 23, 2017, 10:33:48 AM8/23/17
to Kurt Roeckx, mozilla-g...@lists.mozilla.org
I had the same question, but it looks like RAPPOR has gotten significantly
more advanced since I originally learned about the "just boolean questions"
version. https://arxiv.org/pdf/1503.01214.pdf explains how to build privacy
preserving measurements without knowing the values of the population.

I'm personally very excited to see more investment in privacy preserving
telemetry from open source projects.

Alex

David Teller

Aug 23, 2017, 10:44:23 AM8/23/17
to gover...@lists.mozilla.org
Hi Olly,

This is a good question. I am not part of that team. However, I have
followed (admittedly from afar) some of the project, so I'll try to
answer from my limited knowledge.

If I understand correctly, the plan is the following:

- Start with a pre-defined list of the N most visited websites around
the world (I don't know the value of N, but I would guess somewhere
around 1000). A website being "google.com" or "blogspot.com", for
instance, but without any further detail, so no differentiating between
blogs or actual search requests.

- Each copy of Firefox involved in this survey will send once a list of
booleans "yes I have visited this website during the past ... days" ...
except the data will be partially falsified by Firefox to make sure that
nobody (including Mozilla) has accurate data on individual users.

- This data will be sent once (and only once) per copy of Firefox, to
make sure that nobody (including Mozilla) can deduce more detailed data
by observing specific users.


I will let you judge whether this information is privacy-invasive.

I have no information regarding storage and retention policy. I am
nearly certain, however, that the IP is not stored.

I imagine that someone actually working on the project is working on a
more detailed presentation that would answer all your questions.

Best regards,
David
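To make David's description concrete (a fixed candidate list, one yes/no answer per site, each answer partly falsified on the client), here is a minimal Python sketch of plain randomized response. The site list, the function name `randomized_report`, and the noise parameter `flip_prob` are hypothetical; the thread gives neither the real list, its size N, nor the noise level, and the actual RAPPOR client additionally hashes values into a Bloom filter.

```python
import random

# Hypothetical candidate list; the real list and its size N are not given in the thread.
CANDIDATE_SITES = ["google.com", "blogspot.com", "facebook.com", "wikipedia.org"]


def randomized_report(visited, flip_prob, rng=random):
    """Return one noisy yes/no answer per candidate site.

    With probability 1 - flip_prob the true answer is sent; with probability
    flip_prob it is replaced by a fair coin flip, so any single "yes" may be noise.
    """
    report = {}
    for site in CANDIDATE_SITES:
        truth = site in visited
        if rng.random() < flip_prob:
            report[site] = rng.random() < 0.5  # coin flip, unrelated to the truth
        else:
            report[site] = truth
    return report


# A user who visited only google.com, reported with heavy noise.
print(randomized_report({"google.com"}, flip_prob=0.5, rng=random.Random(1)))
```

With `flip_prob=0.5`, a visited site is reported as "yes" with probability 0.75 and an unvisited one with probability 0.25, so a single report shifts the odds but does not reveal the answer.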

Doug Thayer

Aug 23, 2017, 1:37:58 PM8/23/17
to gover...@lists.mozilla.org
(Edit: sending this again because it didn't seem to make it to the archives)

> The objection is not to DP's privacy guarantees, but to the fact that FF
> will phone home with every website we visit. A neat list of all the websites
> I visit will be sent to a central location, in chronological order.

I think this is misleading. What we would be sending is a neat list of jumbled
garbage that is almost indistinguishable from random noise. No conclusions can
be made about what websites you visit from this. With many records, we could
tell that a given site was probably visited X number of times by various
people, but at no point in time will anyone be able to say that you visited a
particular website. Apologies if you already understood this, but I wanted to
make it clear to anyone else reading your comment that it's not as if we're
sending "sketchywebsite.com" back to a central location.

> RAPPOR is kind of like the protection of farting in a crowded elevator.
> Somebody in that group did it, but we don't know who for sure. Yes, that's
> better privacy for sure, but is it total privacy? Not to me. Because you
> still know that somebody in that elevator did it very likely. Not a perfect
> analogy, but hopefully demonstrates the cracks.

Sticking to the farting analogy, it would be more like a methane detector in a
large building. If one person farts, really we couldn't tell since we couldn't
distinguish between one fart and regular fluctuations in the methane content
of the air. However, if lots of people are farting, we should be able to
estimate roughly how many farts are happening in a given time period. I think
it's important to make this distinction, because it means that we can only
observe _common_ behaviors of the crowd, while deviant behaviors of an
individual can _never_ be observed.
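The "methane detector" point, that single reports are noise but crowd totals can be recovered, can be illustrated with a small simulation. This is a simplified one-bit randomized-response model, not RAPPOR's actual decoder (which works over Bloom filter bits); the noise level `f` and the population numbers are hypothetical.

```python
import random


def noisy_bit(truth, f, rng):
    """Send the true bit with probability 1 - f, otherwise a fair coin flip."""
    if rng.random() < f:
        return rng.random() < 0.5
    return truth


def estimate_true_count(reports, f):
    """Unbiased estimate of how many reporters truly had the bit set.

    E[number of "yes" reports] = (1 - f) * true_yes + (f / 2) * n,
    so solving for true_yes gives the estimator below.
    """
    n = len(reports)
    observed_yes = sum(reports)
    return (observed_yes - (f / 2) * n) / (1 - f)


rng = random.Random(42)
f = 0.5
n = 100_000
true_yes = 30_000  # hypothetical: 30% of simulated users visited the site
reports = [noisy_bit(i < true_yes, f, rng) for i in range(n)]
estimate = estimate_true_count(reports, f)
print(round(estimate))  # an estimate of true_yes; no individual report is decoded
```

The estimator only says how many people in the crowd had the bit set; it never tells you which reports were truthful.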

> Offering to send anonymous info on one of these events, through a popup or
> dropdown hanger (similar to the password manager, security certificates,
> etc), would fulfill the same objective. A user is inclined to help when
> his/her favorite website suddenly starts slowing down, or throwing errors.
> At this point it's also easy to check a box to "always do this from now on".

We don't want to annoy users _more_ by asking them to tell us about their
performance issue. Crashes are severe enough and can require detailed enough
information to diagnose that it's worth it in this case, but we would like to
be able to observe information about more minor events without pestering
people. This doesn't justify sacrificing their privacy, but the claim is that
RAPPOR allows us to do this without degrading anyone's privacy, since no
conclusions can be made about individual users or highly uncommon behavior.

> Exactly. Because the data is more sensitive the idea of opt-out comes into
> question before the question of the technology. If a person thinks that
> opt-out data collection is wrong it does not matter how effective the privacy
> technology is.
>
> This definitely has the potential to hurt the Firefox brand as a product
> that respects choice and does not try to trick you.
>
> Anyway since you wish a greater discussion on the actual technology i will
> stop here. Thank you for the replies.

We're focusing on the technology because the claim is that the technology
means that this data is not _actually_ more sensitive than the data we're
already collecting in an opt-out manner. We're not trying to hush users who
can't talk about the technical aspects of RAPPOR, but rather trying to keep it
on the topic of whether RAPPOR satisfies your definition of privacy or not. My
understanding of privacy is that if no one at all (malicious or not) is
capable of making conclusions about me in particular, then my privacy is being
protected. Differential privacy satisfies that definition, but privacy can
mean different things to different people.

Georg Fritzsche

Aug 23, 2017, 1:43:41 PM8/23/17
to siva....@gmail.com, mozilla-g...@lists.mozilla.org
Hi Siva,

I believe David already addressed the first point.

For opt-in bias, this is a real problem. The proportion of users opting
into data collection is low and not representative.

For opting out, user choice and control is important for us. We always give
the user control over their data, so everything can be disabled in the
preferences.

Georg



On Tue, Aug 22, 2017 at 4:44 PM, siva.rk.sw--- via governance <
gover...@lists.mozilla.org> wrote:

> > Asks for sensitive data center most commonly around knowing something in
> > relation to which sites a user visits:
> >
> > - "Which top sites are users visiting?"
> > - "Which sites using Flash does a user encounter?"
> > - "Which sites does a user see heavy Jank on?"
> >
> > In summary most asks are for occurrences of an event X per domain (more
> > specifically eTLD+1 [1], e.g. facebook.com or google.co.uk).
>
>
> Hello Georg, three questions:
>
> 1. Could you explain exactly what kinds of problems (which are currently a
> big source of trouble) would be solved easily with the currently proposed
> plan? And also what kinds of problems cannot be solved with this data, but
> could be solved with more invasive data collection?
>
> 2. What exactly is the problem if the collection is opt-in? Yes the data
> is "biased", so what? Are you worried that you might miss certain issues
> faced mostly by users who don't opt in? Is there any justification for this
> argument, or is it just a hunch?
>
> 3. For those users who consider privacy most valuable, would there be an
> easy way to opt out, which *guarantees* that Mozilla collects *no
> information* about their browser usage?
>
> Thank you.
>
> Siva

Georg Fritzsche

Aug 23, 2017, 1:45:18 PM8/23/17
to jot...@hotmail.com, mozilla-g...@lists.mozilla.org
On Tue, Aug 22, 2017 at 5:19 PM, jotaf98--- via governance <
gover...@lists.mozilla.org> wrote:

> The example questions can be answered with no need for the bulk telemetry
> that's proposed:
>
> > "Which top sites are users visiting?"
>
> There's enough public data available on what sites are most popular. No
> need for yet another database on that.
>
> > "Which sites using Flash does a user encounter?"
>
> Mozilla can crawl this information itself, based on the above websites
> list. It doesn't need to ask users to do it.


For crawling the sites, this will allow us to see how many sites use Flash,
but can't tell us which sites our users encounter it on.

Similarly for the top sites: third-party data is useful, but can't
reliably tell us which sites are actually the top sites for our Firefox users.

Georg

Georg Fritzsche

Aug 23, 2017, 1:47:38 PM8/23/17
to Danny, mozilla-g...@lists.mozilla.org
Hi Danny.

On Tue, Aug 22, 2017 at 5:19 PM, Danny via governance <
gover...@lists.mozilla.org> wrote:

> I know that perf data is extremely important. In fact, I was just seeing
> freezes yesterday and that's kinda frustrating. But I still won't enable
> automatic data collection. What I think would be nice is if you actually
> just prompted me "crash reporting" style. Ask me, "hey... we know Firefox
> was a bit slow for you on such and such site, would you like to let us
> know?" And then give me the option of "Yes, this one time", "Yes, always on
> this site", "No".


This would be subject to opt-in bias.

This works well to answer some kinds of questions, but not generally when we
need representative samples. Submission rates for this kind of
opt-in mechanism are often low, which limits our possible insights.

Georg

Laurențiu Nicola

Aug 23, 2017, 3:02:07 PM8/23/17
to mozilla-g...@lists.mozilla.org
Hi Georg,

I have a couple of questions and/or concerns and they don't seem to be addressed too well in this thread. It's probably going to be rather long, so sorry for that.

> One recurring ask from the Firefox product teams is the ability to collect
> more sensitive data, like top sites users visit and how features perform on
> specific sites.
>
> Currently we can collect this data when the user opts in, but we don't
> have a way to collect unbiased data, without explicit consent (opt-out).

Does this refer to the Firefox Pioneer [1] add-on, or something else?

>> (...) We are not in the advertising business, (...)
> In fairness, we have been, at one moment, to the surprise of many. I can
> understand that people could fear that happens again some day.

Thanks, Mike, for admitting this. I assume it's about the sponsored/suggested tiles functionality, but I'm not convinced that it stopped. Are there still plans to make about:newtab load from the Mozilla servers [2]? Is Activity Stream [3] fundamentally different?

Note that RAPPOR was originally implemented years ago with the intention of being used for this kind of data collection [4] and not for monitoring performance.

I also think it's happening with Pocket.

> This data will be sent once (and only once) per copy of Firefox, to
> make sure that nobody (including Mozilla) can deduce more detailed data
> by observing specific users.

That is the promise for this SHIELD experiment. We don't know how RAPPOR will be used in the future. It might, for example, be expanded to cover whole domains instead of eTLD+1s (that's been considered in the past [5], so it's not just a slippery slope argument).

> What we would be sending is a neat list of jumbled garbage that is almost indistinguishable from random noise. No conclusions can be made about what websites you visit from this.

My (admittedly shallow) understanding of DP is that there is always a risk of data being exposed. This is a parameter of the implementation and can be tuned in one direction or another, but it's always there. DP is not perfect privacy.
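Laurențiu's point that the exposure risk is a tunable parameter can be made concrete. The RAPPOR paper bounds the privacy loss of its permanent randomized response by epsilon = 2h ln((1 - f/2) / (f/2)), where f is the noise level and h the number of Bloom filter hash functions. The sketch below simply evaluates that formula; the parameter values are illustrative, not the ones planned for the SHIELD study.

```python
import math


def rappor_prr_epsilon(f, h):
    """Privacy loss (epsilon) of RAPPOR's permanent randomized response.

    Formula from the RAPPOR paper: eps = 2h * ln((1 - f/2) / (f/2)).
    f is the noise level (0 < f <= 1) and h the number of Bloom filter
    hash functions; smaller epsilon means stronger privacy.
    """
    if not 0 < f <= 1:
        raise ValueError("f must be in (0, 1]")
    if h < 1:
        raise ValueError("h must be at least 1")
    return 2 * h * math.log((1 - f / 2) / (f / 2))


# Illustrative values only: more noise (larger f) gives a smaller epsilon.
for f in (0.25, 0.5, 0.75):
    print(f, round(rappor_prr_epsilon(f, h=2), 3))
```

At f = 1 the reports are pure noise and epsilon is 0 (perfect privacy, useless data); as f shrinks toward 0, epsilon grows without bound. Any useful setting sits in between, which matches the observation that DP is a quantified risk rather than perfect privacy.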

There's also a discussion of client identifiers (FHR/Telemetry ids) being included or not in the data. This is not obviously safe.

>> Offering to send anonymous info on one of these events, through a popup or
>> dropdown hanger (similar to the password manager, security certificates,
>> etc), would fulfill the same objective. A user is inclined to help when
>> his/her favorite website suddenly starts slowing down, or throwing errors.
>> At this point it's also easy to check a box to "always do this from now on".
> We don't want to annoy users _more_ by asking them to tell us about their
> performance issue.

I feel like you're too eager to dismiss suggestions like this. Please don't. Mobile applications on iOS and Android do something similar [6], so users might be familiar with them. Don't ask for a thousand permissions at install time. Ask nicely when you need something and show what you need it for. Allow the user to decide on a site-by-site basis.

> For crawling the sites, this will allow us to see how many sites use Flash,
> but can't tell us which sites our users encounter it on.

If I understand it correctly, RAPPOR needs a pre-defined list of sites. If users encounter Flash applets in a RAPPOR study, you will already know it's on a site in that pre-defined list. It can most likely be found via crawling.

Now you might be interested in how often users interact with Flash on those sites. I admit that's not possible with only crawling, but it's not obvious from your message.

I strongly dislike you giving the example of Flash. It's already dying and we all know that. Adobe will discontinue it in a couple of years. My guess is that the top visited sites are no longer using it. Would any information obtained via RAPPOR change Mozilla's or Adobe's stance on Flash support? Compare this with the XUL add-on situation, where Mozilla already knows exactly what add-ons the users have and what they are installing.

> "Which top sites are users visiting?"

Alexa's top list should be enough, or whatever list you would be preloading into RAPPOR. If Firefox works well on those sites, the users will be happy. There's no reason to believe that Firefox users are interested in completely different sites from other Internet users.

The feeling I got from your first post is that you want to have the mechanism available without a clear idea of how it's going to be used. Myself, I'm really uncomfortable with this.

> Hello, Redditors...

Please don't dismiss posters from high-profile sites like HN and Reddit. They came here because they care. They're the ones that recommend Firefox to their friends. Some of them are the ones who offered constructive feedback related to the issue at hand. And they are the Firefox users, even if they might not care to read a Wikipedia page full of formulae.

I understand how you might be annoyed about thousands of people coming to this thread like it happened before on the one about Pocket. But we aren't the "bad guys" here, just as I think you're not, either.

Laurentiu

[1] https://addons.mozilla.org/en-US/firefox/addon/firefox-pioneer/
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=1176429
[3] https://wiki.mozilla.org/Firefox/Activity_Stream
[4] https://bugzilla.mozilla.org/show_bug.cgi?id=1136461
[5] https://bugzilla.mozilla.org/show_bug.cgi?id=1138022#c17
[6] https://techcrunch.com/2014/04/04/the-right-way-to-ask-users-for-ios-permissions/

Danny

Aug 23, 2017, 3:26:22 PM8/23/17
to mozilla-g...@lists.mozilla.org
Hi Georg,

I'm still not convinced. Does anyone actually have data on this?

We're not talking about opt-in vs opt-out in _general_. In general, you are absolutely right.

I meant to suggest the solution specifically for the perf issues. I meant to suggest to do an in-context, in-the-moment type request.

Like Laurențiu mentioned, this is very common in iOS apps, which I'm very comfortable with. An app wants my contacts? I can decide at the moment of request. An app wants my location? I can decide to allow it only while the app is running, or also in the background.

Apple goes even further and once in a while asks you "this app has been using your location in the background, do you want to continue to allow this?"

Apple received a ton of backlash early on and has implemented mechanisms to protect users' privacy. That level of protection I'm comfortable with.

The stated purpose is to discover top sites that users experience "heavy jank" on. My question is more that if users are experiencing "heavy jank", would they not want to submit a report? Are there desktop apps where such an approach has been taken?

I don't know of any.

It's always the all or nothing approach. Always "key to my house" vs not. To those requests, I always opt out.

Danny

Aug 23, 2017, 4:40:09 PM8/23/17
to mozilla-g...@lists.mozilla.org
On Wednesday, August 23, 2017 at 10:37:58 AM UTC-7, Doug Thayer wrote:
> Sticking to the farting analogy, it would be more like a methane detector in a
> large building. If one person farts, really we couldn't tell since we couldn't
> distinguish between one fart and regular fluctuations in the methane content
> of the air. However, if lots of people are farting, we should be able to
> estimate roughly how many farts are happening in a given time period. I think
> it's important to make this distinction, because it means that we can only
> observe _common_ behaviors of the crowd, while deviant behaviors of an
> individual can _never_ be observed.

Hi Doug,

Thanks for the response.

I definitely wrote that before I had understood RAPPOR well, so I apologize for that quick trigger response.

Reading the RAPPOR paper more, it looks like it does think through the case I was alluding to. The situation I was worried about is multiple collections and over a period of time. Yes, one participation in the methane detection test might not reveal much. But what's being asked is the automatic participation in all subsequent tests.

The RAPPOR paper does talk about this situation and does have cautions needed to accurately mitigate these, especially things like multiple accidental participations (for example, installing both Firefox and Firefox Nightly).

I guess it's impossible for me to actually drill into whether Firefox's implementation would have all the cases covered. But just to say that what's asked is still the automatic trust of all and future behaviors (of the implementation).
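Danny's concern about repeated collections is what the RAPPOR paper addresses with memoization. The sketch below contrasts re-randomizing the same value on every collection (which an observer can average away) with drawing the noisy value once and reusing it. It is a simplified one-bit model with illustrative parameters; real RAPPOR adds a second, per-report randomization on top of the memoized value, and separate installs (such as release plus Nightly) would each memoize independently, which this sketch does not address.

```python
import random


def randomize(truth, f, rng):
    """Fresh noise: keep the true bit with probability 1 - f, else a coin flip."""
    return rng.random() < 0.5 if rng.random() < f else truth


rng = random.Random(7)
f = 0.5
true_bit = True
rounds = 1000

# Without memoization: every collection re-randomizes the same true value.
# An observer who links many reports from one client can average them; the
# mean tends to 1 - f/2 = 0.75 when the bit is set (f/2 = 0.25 when it is not),
# which gives the true value away.
fresh_reports = [randomize(true_bit, f, rng) for _ in range(rounds)]
print("fresh mean:", sum(fresh_reports) / rounds)

# With memoization (RAPPOR's "permanent randomized response"): the noisy value
# is drawn once and reused, so repeating the collection leaks nothing beyond
# that single noisy draw.
memoized_value = randomize(true_bit, f, rng)
memo_reports = [memoized_value for _ in range(rounds)]
print("memoized mean:", sum(memo_reports) / rounds)
```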

sk.gri...@gmail.com

Aug 23, 2017, 8:40:59 PM8/23/17
to mozilla-g...@lists.mozilla.org
Do it with Google, Apple, Windows first

sk.gri...@gmail.com

Aug 23, 2017, 8:40:59 PM8/23/17
to mozilla-g...@lists.mozilla.org
I think it is not a bad decision. Most users who know about telemetry can disable it. Everyone else, which means the great majority of people, has no idea such a thing exists, and would not care either way as long as they get to the site they type in the address bar. Just don't do it in incognito mode.

Kurt Roeckx

Aug 24, 2017, 4:24:21 AM8/24/17
to mozilla-g...@lists.mozilla.org
On 2017-08-23 16:33, Alex Gaynor wrote:
> I had the same question, but it looks like RAPPOR has gotten significantly
> more advanced since I originally learned about the "just boolean questions"
> version. https://arxiv.org/pdf/1503.01214.pdf explains how to build privacy
> preserving measurements without knowing the values of the population.

So if I understand things correctly from the paper, you create a bloom
filter for the URL/hostname you want to send, then randomly change it,
store that. And each time they ask about the URL/hostname you take the
stored version, randomly change it and that's what you send.

What I understand from that is that you don't get to learn the
URL/hostname at all, but can query whether a URL/hostname has been submitted.
You don't get to learn what the population is, but the whole population
can be sent.

Is that accurate?


Kurt
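Kurt's summary matches the client side described in the RAPPOR paper: hash the value into a Bloom filter, apply a permanent randomized response once and store it, then apply a fresh instantaneous randomized response for every report sent. The sketch below is a simplified illustration, not Mozilla's or Google's implementation: the parameter values are illustrative, SHA-256 as the hash is an assumption, and the paper's cohorts are omitted. A collector cannot invert a report back to a hostname; it can only compute `bloom_bits` for candidate strings and test whether those bit positions are over-represented across many reports.

```python
import hashlib
import random

# Illustrative parameters only; not the values used by Mozilla or Google.
NUM_BITS = 32      # k: Bloom filter size
NUM_HASHES = 2     # h: hash functions per value
F = 0.5            # permanent randomized response noise level
P, Q = 0.5, 0.75   # instantaneous randomized response probabilities


def bloom_bits(value):
    """Bit positions set for a value (SHA-256 hashing is an assumption)."""
    positions = set()
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
        positions.add(int.from_bytes(digest[:4], "big") % NUM_BITS)
    return positions


def permanent_rr(bits, rng):
    """Noisy copy of the Bloom filter, computed once per value and stored."""
    out = []
    for j in range(NUM_BITS):
        r = rng.random()
        if r < F / 2:
            out.append(1)  # forced to 1 with probability f/2
        elif r < F:
            out.append(0)  # forced to 0 with probability f/2
        else:
            out.append(1 if j in bits else 0)  # true bit with probability 1 - f
    return out


def instantaneous_rr(prr, rng):
    """Fresh noise for each report: 1 with probability Q if the stored bit is 1, else P."""
    return [int(rng.random() < (Q if b else P)) for b in prr]


class Client:
    def __init__(self, value, seed=None):
        self.rng = random.Random(seed)
        self.prr = permanent_rr(bloom_bits(value), self.rng)  # memoized

    def report(self):
        return instantaneous_rr(self.prr, self.rng)


client = Client("example.com", seed=3)
print(sorted(bloom_bits("example.com")))  # positions a collector could test for
print(client.report())
print(client.report())  # a fresh randomization of the same stored PRR
```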

Georg Fritzsche

Aug 24, 2017, 9:36:39 AM8/24/17
to gene...@gmail.com, mozilla-g...@lists.mozilla.org
Hi,

If users turn off data sharing through the preferences, we will not submit
this data. Firefox will always respect the user's choice.

Georg

hagi...@gmail.com

Aug 24, 2017, 9:12:07 PM8/24/17
to mozilla-g...@lists.mozilla.org
I'm just a regular Firefox user, albeit one who spent over 30 years in IT, including time on sensitive applications like BANKING and HEALTH CARE.

Whether kept as Opt-in or changed to Opt-out this is my feedback on maintaining privacy while monitoring product use:

1. I trust telemetry and dumps are encrypted for transmission.

2. The big thing is being very careful with "PERSONALLY IDENTIFIABLE" information.

The combination of add-ons, computer hardware, antivirus, and other options is UNIQUE ENOUGH to qualify as personally identifiable. That information MUST be kept disconnected from lists of what sites we visit and how often we visit them.

3. Since it is not routine monitoring, I think most people would accept website info being included when there is a dump/error/crash.

It is one thing to give a count of how many cases of syphilis there are in a city, quite another to give a count of how many cases there are in a town of 300. It is one thing to say that credit card spending is up 20%, quite another to say that Bob Smith's credit card spending is up 20%.
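The city-versus-small-town contrast has a counterpart in randomized-response estimates: the noise in an estimated count grows roughly with the square root of the number of reporters, so it is small relative to a large population but large relative to a small one. The sketch below assumes the simplified one-bit model used in the earlier examples, with an illustrative noise level f = 0.5. It bounds the estimator's standard deviation; it does not address the separate concern about keeping identifying data disconnected from browsing data.

```python
import math


def worst_case_sd(n, f):
    """Upper bound on the standard deviation of the estimated count.

    Each noisy report is a Bernoulli variable with variance at most 1/4, so the
    observed "yes" total has SD at most sqrt(n) / 2; the estimator divides by
    (1 - f), which scales the SD by the same factor.
    """
    return math.sqrt(n) / 2 / (1 - f)


# A "town" of 300 reporters versus a "city" of one million, f = 0.5.
for n in (300, 1_000_000):
    print(n, round(worst_case_sd(n, f=0.5), 1))
```

For 300 reporters the uncertainty is on the order of 17 people (several percent of the group), while for a million it is on the order of 1,000 (about 0.1%), so only patterns shared by many users stand out from the noise.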

hagi...@gmail.com

Aug 24, 2017, 9:12:08 PM8/24/17
to mozilla-g...@lists.mozilla.org
It would be better to keep it opt-in as the default, but during installation prompt with the option to opt-in to telemetry, rather than quietly letting the install go with the default as opted-out.

Also please make sure you've got the security precautions mentioned in my earlier post.

1. Keep personally identifiable information (like the combination of browser, add-ons, computer hardware, and antivirus) separate and disconnected from records of what websites were visited.

2. Ensure transmissions of monitoring data are encrypted.

Athena82

Aug 26, 2017, 12:38:52 PM8/26/17
to mozilla-g...@lists.mozilla.org
On Monday, August 21, 2017 17:56:44 UTC+2, Georg Fritzsche wrote:
> Hi,
>
> for Firefox we want to better understand how people use our product to
> improve their experience. To do that, we are planning to run a new SHIELD
> study that tests how we can collect additional data in a privacy preserving
> way. Check out the details below and send me your thoughts.
>
> The problem.
>
> One recurring ask from the Firefox product teams is the ability to collect
> more sensitive data, like top sites users visit and how features perform on
> specific sites.
>
> Currently we can collect this data when the user opts in, but we don't
> have a way to collect unbiased data, without explicit consent (opt-out).
>
> Asks for sensitive data center most commonly around knowing something in
> relation to which sites a user visits:
>
> - "Which top sites are users visiting?"
> - "Which sites using Flash does a user encounter?"
> - "Which sites does a user see heavy Jank on?"
>
> In summary most asks are for occurrences of an event X per domain (more
> specifically eTLD+1 [1], e.g. facebook.com or google.co.uk).
>
> The solution.
>
> One solution is the use of differential privacy [2] [3], which allows us to
> collect sensitive data without being able to make conclusions about
> individual users, thus preserving their privacy.
>
> An attacker that has access to the data a single user submits is not able
> to tell whether a specific site was visited by that user or not.
>
> The Google Open Source project called RAPPOR [4] [5] is the most widely
> known and deployed implementation of differential privacy.
>
> We have been investigating the use of RAPPOR for these kind of use-cases,
> with initial simulation results being promising.
>
> Our plan.
>
> What we plan to do now is run an opt-out SHIELD study [6] to validate our
> implementation of RAPPOR. This study will collect the value for users’ home
> page (eTLD+1) for a randomly selected group of our release population We
> are hoping to launch this in mid-September.
>
> This is not the type of data we have collected as opt-out in the past and
> is a new approach for Mozilla. As such, we are still experimenting with the
> project and wanted to reach out for feedback.
>
> Georg
>
> References:
>
> 1: https://en.wikipedia.org/wiki/Public_Suffix_List
>
> 2: https://en.wikipedia.org/wiki/Differential_privacy
>
> 3: https://robertovitillo.com/2016/07/29/differential-privacy-for-dummies/
>
> 4: https://github.com/google/rappor
> 5: https://arxiv.org/abs/1407.6981
> <https://arxiv.org/abs/1407.6981>6:
> https://wiki.mozilla.org/Firefox/Shield/Shield_Studies

Thank you for reaching out to the community for feedback on this topic!
It is this kind of openness and transparency that makes me trust Mozilla's products more than anything else.

Which brings me to the planned opt-out SHIELD study.
I took the time to read a few things about the mechanism behind differential privacy, and while I believe this technology is promising and could help Firefox anonymize the more sensitive data, I don't think the goal of the study and the technology alone justify collecting this data in an opt-out fashion.

The benefits (eliminating occasional performance issues on popular websites) do not outweigh the drawbacks (the perception that Firefox resorts to techniques that take control away from the user, negative media coverage, declining user trust). I also wonder how this can be compatible with the GDPR's principles of consent.

So may I suggest making this kind of anonymized but sensitive data collection always opt-in, and persuading more users than ever (a UX exercise!) to participate by building trust and explaining the purpose and the technology being used? Remember: trust takes years to build, seconds to break and forever to repair...

David Bruant

Aug 27, 2017, 8:47:26 AM8/27/17
to Georg Fritzsche, gover...@lists.mozilla.org, dev-p...@lists.mozilla.org
Hi Georg,

Some questions inline.

On 21/08/2017 at 17:56, Georg Fritzsche via governance wrote:
> Hi,
>
> for Firefox we want to better understand how people use our product to
> improve their experience. To do that, we are planning to run a new SHIELD
> study that tests how we can collect additional data in a privacy preserving
> way. Check out the details below and send me your thoughts.
>
> The problem.
>
> One recurring ask from the Firefox product teams is the ability to collect
> more sensitive data, like top sites users visit and how features perform on
> specific sites.
>
> Currently we can collect this data when the user opts in, but we don't
> have a way to collect unbiased data, without explicit consent (opt-out).
What is the current percentage of Firefox users opting in?
What are the known biases? How do they affect the study results?

> Asks for sensitive data center most commonly around knowing something in
> relation to which sites a user visits:
>
> -
>
> "Which top sites are users visiting?"
> -
>
> "Which sites using Flash does a user encounter?"
> -
>
> "Which sites does a user see heavy Jank on?"
>
> In summary most asks are for occurrences of an event X per domain (more
> specifically eTLD+1 [1], e.g. facebook.com or google.co.uk).
>
> The solution.
>
> One solution is the use of differential privacy [2] [3], which allows us to
> collect sensitive data without being able to make conclusions about
> individual users, thus preserving their privacy.
>
> An attacker that has access to the data a single user submits is not able
> to tell whether a specific site was visited by that user or not.
Just to be 100% sure I understand: Firefox will lie (or answer randomly)
to the question with probability p. This way, even if an attacker gains
access to Mozilla's servers, they can trust each answer only with
probability 1-p.
There is a trade-off between utility (low p) and stronger privacy (high p).
Could this trade-off be documented, and a hard lower limit on p be decided?
Should each study choose a different p based on the sensitivity of the data?
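The trade-off described here is classic randomized response. A minimal Python sketch, assuming the simplest variant (with probability p the client discards its true yes/no answer and reports a fair coin flip; this is an illustrative assumption, not Mozilla's or RAPPOR's exact scheme), shows both sides: the privacy loss ε as a function of p, and how the aggregate rate is still recoverable by inverting the known noise. In this variant a single report matches the truth with probability 1 - p/2, since half of the random answers happen to be correct.

```python
import math
import random


def randomized_response(truth: bool, p: float, rng: random.Random) -> bool:
    """With probability p, discard the true answer and report a fair coin flip."""
    if rng.random() < p:
        return rng.random() < 0.5
    return truth


def estimate_true_rate(reports: list[bool], p: float) -> float:
    """Invert E[observed yes-rate] = (1 - p) * t + p / 2 to recover the true rate t."""
    observed = sum(reports) / len(reports)
    return (observed - p / 2) / (1 - p)


def epsilon(p: float) -> float:
    """Privacy loss: ln(P(report yes | truth yes) / P(report yes | truth no))."""
    return math.log((1 - p / 2) / (p / 2))


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical population: 30% of 100,000 users truly answer "yes".
    reports = [randomized_response(i < 30_000, 0.5, rng) for i in range(100_000)]
    print(f"estimated rate: {estimate_true_rate(reports, 0.5):.3f}")
    for p in (0.25, 0.5, 0.75):
        print(f"p={p}: epsilon={epsilon(p):.2f}")
```

Raising p lowers ε (ln 7 ≈ 1.95 at p = 0.25, ln 3 ≈ 1.10 at p = 0.5), i.e. stronger privacy, while the variance of the estimate grows roughly as 1/(1 - p)², so more users are needed for the same accuracy. That is the quantity a documented lower limit on p would be bounding.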

> The Google Open Source project called RAPPOR [4] [5] is the most widely
> known and deployed implementation of differential privacy.
>
> We have been investigating the use of RAPPOR for these kind of use-cases,
> with initial simulation results being promising.
>
> Our plan.
>
> What we plan to do now is run an opt-out SHIELD study [6] to validate our
> implementation of RAPPOR. This study will collect the value for users’ home
> page (eTLD+1) for a randomly selected group of our release population We
> are hoping to launch this in mid-September.
>
> This is not the type of data we have collected as opt-out in the past and
> is a new approach for Mozilla. As such, we are still experimenting with the
> project and wanted to reach out for feedback.
While this is running, can you publish somewhere the percentage of users
opting out, and how it evolves?

I may be unfamiliar with Firefox data collection and Shield studies,
but what's the policy regarding data deletion? Should the policy for
opt-out study data be stricter?
The Shield study page suggests a study lasts 7 days but "can last much
longer" [1]. Could there be a strict duration policy for opt-out studies?
Or, if a study needs to run longer, could it ensure that no individual
user keeps the add-on for more than (for instance) 7 days in a row?

Thanks,

David

[1]
https://wiki.mozilla.org/Firefox/Shield/Shield_Studies#How_long_do_Shield_Studies_last.3F

mac...@gmail.com

Aug 28, 2017, 6:06:23 AM8/28/17
to mozilla-g...@lists.mozilla.org
On Monday, 21 August 2017 16:56:44 UTC+1, Georg Fritzsche wrote:

As a Firefox user who teaches computing and computer privacy to seniors, and who is much influenced by Europe's past experience of 'big data' collection, may I raise a small problem with Mozilla Firefox? I (note "I") opt into telemetry, but I always teach 'opt-out', every time. The people I work with (youngsters aged 16 to 25 and seniors aged 55 to 93) divide into two groups: the youngsters are not much concerned about telemetry as long as it makes things easier for them, while the seniors are extremely worried. My recommendation of Firefox over Microsoft browsers and Google Chrome/Chromium would come to naught if you adopted a policy of automatically opting users in. In Europe, with its privacy concerns, Firefox would be on its way to extinction if you adopted an 'opt-out' policy.

The opt-in and the choice in Firefox settings are what set you apart, and in some ways they are the selling point that I, as a user and teacher, can recommend. I would not wish to lose this. The suggestions made by many people more technically competent than I am, and their worries, should be taken note of. Much of my support work on Windows 10 machines is turning off Microsoft's telemetry for my students and teaching them about privacy. I support Firefox as an independent, reliable browser, and privacy always ranks higher than speed. Your present "tell us what happened on a crash" reporting is probably sufficient for you to get user data.

Georg Fritzsche

Aug 29, 2017, 9:50:59 AM8/29/17
to Kurt Roeckx, mozilla-g...@lists.mozilla.org
On Thu, Aug 24, 2017 at 10:23 AM, Kurt Roeckx via governance <
gover...@lists.mozilla.org> wrote:

> On 2017-08-23 16:33, Alex Gaynor wrote:
>
>> I had the same question, but it looks like RAPPOR has gotten significantly
>> more advanced since I originally learned about the "just boolean
>> questions" version. https://arxiv.org/pdf/1503.01214.pdf explains how to
>> build privacy preserving measurements without knowing the values of the
>> population.
>>
>
> So if I understand things correctly from the paper, you create a bloom
> filter for the URL/hostname you want to send, then randomly change it,
> store that. And each time they ask about the URL/hostname you take the
> stored version, randomly change it and that's what you send.
>
> What I understand from that is that you don't get to learn the
> URL/hostname at all, but can query if a URL/hostname has been submitted.
> You don't get to learn what the population is, but the whole population can
> be sent.
>
> Is that accurate?
>

Hi,

through RAPPOR, we can send randomized values for all encountered domain
values.

Then, in analysis, we can test the noisy aggregate data against known
domain values and get an estimate of how frequently they occurred.

This gives immediate insights and we can increase the detail by adding more
sources for known domain values.
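The analysis step described here can be sketched in Python. Because the server knows the noise parameters, it can subtract the expected noise from the aggregated per-bit counts and then check candidate domains against the result. The unbiasing formula follows from the RAPPOR encoding (a reported bit is 1 with probability p + (q - p)(f/2 + (1 - f)b)); the candidate check is a deliberately crude stand-in. The RAPPOR papers fit candidate Bloom patterns with regression (Lasso followed by least squares), which this sketch does not implement. Function names, the parameter values, and the idea that the caller supplies each candidate's Bloom bit indices are all assumptions for illustration.

```python
def estimate_true_bit_counts(
    report_counts: list[float], n: int, p: float, q: float, f: float
) -> list[float]:
    """Unbias aggregated IRR counts per Bloom bit.

    A reported bit is 1 with probability p + (q - p) * (f / 2 + (1 - f) * b),
    where b is the client's true Bloom bit, so the estimated number of clients
    with bit i set is t_i = (c_i - (p + f / 2 * (q - p)) * n) / ((1 - f) * (q - p)).
    """
    baseline = (p + 0.5 * f * (q - p)) * n
    scale = (1 - f) * (q - p)
    return [(c - baseline) / scale for c in report_counts]


def score_candidate(candidate_bits: list[int], true_bit_counts: list[float]) -> float:
    """Crude stand-in for the papers' regression: the smallest estimate among
    the Bloom bits that a known candidate domain would set."""
    return min(true_bit_counts[i] for i in candidate_bits)


# Hypothetical aggregate: 1,000 reports over 2 Bloom bits, p=0.5, q=0.75, f=0.5.
counts = estimate_true_bit_counts([612.5, 562.5], n=1000, p=0.5, q=0.75, f=0.5)
print(counts)  # → [400.0, 0.0]
```

Adding more sources of known domain values simply means scoring more candidates against the same unbiased counts; the collected data does not change.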

Georg

Georg Fritzsche

Aug 29, 2017, 9:52:29 AM8/29/17
to hagi...@gmail.com, mozilla-g...@lists.mozilla.org
On Fri, Aug 25, 2017 at 12:50 AM, hagis6789--- via governance <
gover...@lists.mozilla.org> wrote:

> It is one thing to give a count of how many cases of syphilis there are in
> a city, quite another to give a count of how many cases there are in a town
> of 300. It is one thing to say that credit card spending is up 20%, quite
> another to say that Bob Smith's credit card spending is up 20%.
>

Usage of techniques like RAPPOR allows us to keep individual data private,
regardless of the population size.

The trade-off is that we need a minimum population of a certain size to get
answers out of the aggregate data we collect.

In your specific example, we would not be able to get answers from a
population size of 300 people.
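The minimum-population point can be made concrete with a back-of-the-envelope calculation. Each report's noise is independent, so the noise in an estimated count grows like √n while the true count grows like n. The Python sketch below computes the standard deviation of the unbiased count for one Bloom bit, assuming illustrative parameters p = 0.5, q = 0.75, f = 0.5 and a 10% true rate. These values are assumptions, not the parameters Mozilla plans to use.

```python
import math


def bit_count_std(n: int, p: float, q: float, f: float, true_fraction: float) -> float:
    """Standard deviation of the unbiased count estimate for one Bloom bit.

    Each of n reports sets the bit with probability
    prob_one = p + (q - p) * (f / 2 + (1 - f) * true_fraction); the binomial
    noise sqrt(n * prob_one * (1 - prob_one)) is scaled by 1 / ((1 - f) * (q - p))
    when the known noise is subtracted out.
    """
    prob_one = p + (q - p) * (f / 2 + (1 - f) * true_fraction)
    return math.sqrt(n * prob_one * (1 - prob_one)) / ((1 - f) * (q - p))


# Illustrative parameters only (not Mozilla's): p=0.5, q=0.75, f=0.5, 10% true rate.
for n in (300, 1_000_000):
    signal = 0.1 * n
    noise = bit_count_std(n, 0.5, 0.75, 0.5, 0.1)
    print(f"n={n}: true count {signal:.0f}, noise std {noise:.0f}")
```

With 300 people the noise (roughly 68) is more than twice the true count of 30, so nothing can be concluded. With a million users the noise is roughly 4,000 against a true count of 100,000.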


Georg

Georg Fritzsche

Aug 29, 2017, 9:54:39 AM8/29/17
to hagi...@gmail.com, mozilla-g...@lists.mozilla.org
On Fri, Aug 25, 2017 at 12:53 AM, hagis6789--- via governance <
gover...@lists.mozilla.org> wrote:

> It would be better to keep it opt-in as the default, but during
> installation prompt with the option to opt-in to telemetry, rather than
> quietly letting the install go with the default as opted-out.
>

For opt-out, we inform users about our data collection on first use and
that they can turn it off.


> 1. Keep personally identifiable information (like the combination of
> browser, add-ons, computer hardware, and antivirus) separate and
> disconnected from records of what websites were visited.
>

What is being proposed here will use RAPPOR to submit randomized data on
domains, so we will not be able to see any individual records. We only get
statistical information out of this data when looking at the aggregate of
all collected data.

Georg

Kurt Roeckx

Aug 29, 2017, 11:14:18 AM8/29/17
to mozilla-g...@lists.mozilla.org
The paper has several algorithms in it. The first is described in "II.
BACKGROUND", which does not allow you to learn the dictionary, but you
can check that certain URLs are in it or not.

Then in "III. ESTIMATING JOINT DISTRIBUTIONS" they describe how you can
correlate different answers with each other.

Then in "IV. RAPPOR WITHOUT A KNOWN DICTIONARY" they describe how you
can send some additional data and then use the algorithm from III to
learn something about the dictionary.

Do you intend to use the algorithm from II or from IV?

From what I understand, for the algorithm of II there are various
parameters that affect the noise, and how likely it is someone can learn
something about the data you're sending. I think they at least include:
- The size of the bloom filter
- The number of hashes you use
- probability of the randomization for the PRR (f in the paper)
- probability of the randomization for the IRR (q and p from the paper)

Do you have any idea which you plan to use, and what the effect of that is?
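To show where each of these parameters enters, here is a minimal Python sketch of the client-side encoding from the original RAPPOR paper (arXiv 1407.6981): a k-bit Bloom filter with h hashes, a memoized permanent randomized response (PRR) controlled by f, and a fresh instantaneous randomized response (IRR) per report controlled by p and q. The SHA-256 hashing, the cohort handling, the function names and the parameter values are assumptions for illustration and do not reflect Google's or Mozilla's implementation. The long-term privacy bound uses the paper's formula ε∞ = 2h ln((1 - f/2) / (f/2)).

```python
import hashlib
import math
import random


def bloom_bits(value: str, k: int, h: int, cohort: int = 0) -> list[int]:
    """Set h bits of a k-bit Bloom filter for `value`.

    Hashing (SHA-256 over "cohort:index:value") is an assumption for this sketch.
    """
    bits = [0] * k
    for i in range(h):
        digest = hashlib.sha256(f"{cohort}:{i}:{value}".encode()).digest()
        bits[int.from_bytes(digest[:8], "big") % k] = 1
    return bits


def permanent_rr(bits: list[int], f: float, rng: random.Random) -> list[int]:
    """PRR: each bit becomes 1 w.p. f/2, 0 w.p. f/2, otherwise keeps its value.

    A client computes this once per value and stores (memoizes) the result.
    """
    out = []
    for b in bits:
        u = rng.random()
        if u < f / 2:
            out.append(1)
        elif u < f:
            out.append(0)
        else:
            out.append(b)
    return out


def instantaneous_rr(prr: list[int], p: float, q: float, rng: random.Random) -> list[int]:
    """IRR: report 1 w.p. q if the memoized bit is 1, w.p. p if it is 0; fresh per report."""
    return [int(rng.random() < (q if b else p)) for b in prr]


def epsilon_permanent(h: int, f: float) -> float:
    """Long-term privacy bound of the PRR: 2h * ln((1 - f/2) / (f/2))."""
    return 2 * h * math.log((1 - f / 2) / (f / 2))


if __name__ == "__main__":
    rng = random.Random(7)
    # Illustrative parameters only; not the values Mozilla plans to use.
    k, h, f, p, q = 32, 2, 0.5, 0.5, 0.75
    prr = permanent_rr(bloom_bits("example.com", k, h), f, rng)  # memoized per client
    for _ in range(3):  # each submission re-randomizes the stored PRR
        print("".join(map(str, instantaneous_rr(prr, p, q, rng))))
    print(f"epsilon_inf = {epsilon_permanent(h, f):.2f}")
```

Memoizing the PRR is what bounds the long-term leak: repeated reports of the same value only ever reveal the stored noisy filter, never the original bits. Larger f lowers ε∞ at the cost of noisier aggregates, and more hashes h raise it.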


Kurt

Georg Fritzsche

Aug 29, 2017, 1:25:53 PM8/29/17
to Laurențiu Nicola, mozilla-g...@lists.mozilla.org
Hi,

thanks for the feedback.

On Wed, Aug 23, 2017 at 9:01 PM, Laurențiu Nicola via governance <
gover...@lists.mozilla.org> wrote:

> > One recurring ask from the Firefox product teams is the ability to
> collect
> > more sensitive data, like top sites users visit and how features perform
> on
> > specific sites.
> >
> > Currently we can collect this data when the user opts in, but we don't
> > have a way to collect unbiased data, without explicit consent (opt-out).
>
> Does this refer to the Firefox Pioneer [1] add-on, or something else?
>

Yes, Pioneer is one way to approach this as opt-in.

Georg

Laurențiu Nicola

Aug 29, 2017, 1:32:28 PM8/29/17
to mozilla-g...@lists.mozilla.org
On Tuesday, 29 August 2017 16:54:39 UTC+3, Georg Fritzsche wrote:
> For opt-out, we inform users about our data collection on first use and
> that they can turn it off.
>
> Georg

Just so you know, this alleviates a large part of concerns. Thanks for implementing it.

Georg Fritzsche

Aug 30, 2017, 2:43:46 PM8/30/17