Usage of Differential Privacy & RAPPOR

Georg Fritzsche

Aug 21, 2017, 11:56:44 AM
to gover...@lists.mozilla.org, dev-p...@lists.mozilla.org
Hi,

For Firefox, we want to better understand how people use our product so we
can improve their experience. To do that, we are planning to run a new SHIELD
study that tests how we can collect additional data in a privacy-preserving
way. Check out the details below and send me your thoughts.

The problem.

One recurring ask from the Firefox product teams is the ability to collect
more sensitive data, like top sites users visit and how features perform on
specific sites.

Currently we can collect this data when the user opts in, but we don't
have a way to collect unbiased data without explicit consent (opt-out).

Asks for sensitive data center most commonly around knowing something in
relation to which sites a user visits:

- "Which top sites are users visiting?"
- "Which sites using Flash does a user encounter?"
- "Which sites does a user see heavy Jank on?"

In summary most asks are for occurrences of an event X per domain (more
specifically eTLD+1 [1], e.g. facebook.com or google.co.uk).
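
To make eTLD+1 concrete, here is a minimal sketch in Python. The suffix set is a tiny hand-picked stand-in for the real Public Suffix List [1], which is far larger and also has wildcard and exception rules. The name `etld_plus_one` is made up for this illustration.

```python
from typing import Optional

# Tiny, hand-picked stand-in for the Public Suffix List (assumption: the real
# list is much larger and has wildcard and exception rules not handled here).
PUBLIC_SUFFIXES = {"com", "org", "uk", "co.uk"}


def etld_plus_one(hostname: str) -> Optional[str]:
    """Return the registrable domain (eTLD+1) of hostname, or None."""
    labels = hostname.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest, so that
    # "co.uk" is matched before "uk".
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in PUBLIC_SUFFIXES:
            if i == 0:
                return None  # the hostname is itself a public suffix
            # Keep exactly one label in front of the matched suffix.
            return ".".join(labels[i - 1:])
    return None  # no known suffix matched
```

Under these assumptions, `www.facebook.com` and `facebook.com` both map to `facebook.com`, and `maps.google.co.uk` maps to `google.co.uk`.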

The solution.

One solution is the use of differential privacy [2] [3], which allows us to
collect sensitive data without being able to make conclusions about
individual users, thus preserving their privacy.

An attacker that has access to the data a single user submits is not able
to tell whether a specific site was visited by that user or not.
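
The intuition behind that guarantee can be sketched with classic randomized response, the building block that RAPPOR extends. This is an illustration only and not the implementation under study; the function names and the probability used are assumptions.

```python
import math
import random


def randomized_response(truth: bool, p_truth: float, rng: random.Random) -> bool:
    """Report the true bit with probability p_truth, otherwise its opposite."""
    return truth if rng.random() < p_truth else not truth


def epsilon(p_truth: float) -> float:
    """Privacy loss of one report: log of the worst-case likelihood ratio.

    Valid for 0.5 < p_truth < 1, where P(report | truth) / P(report | not truth)
    is at most p_truth / (1 - p_truth).
    """
    return math.log(p_truth / (1.0 - p_truth))
```

With `p_truth = 0.75`, a "visited" report is at most three times more likely to come from a user who visited the site than from one who did not (epsilon = ln 3), so a single report is weak evidence either way.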

The Google Open Source project called RAPPOR [4] [5] is the most widely
known and deployed implementation of differential privacy.

We have been investigating the use of RAPPOR for these kinds of use cases,
with initial simulation results being promising.
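
As a rough sketch of how a RAPPOR-style client report is built, following the steps described in the paper [5] rather than any actual Firefox code: the value is hashed into a Bloom filter, the bits get a memoized "permanent" randomization, and every report adds fresh "instantaneous" noise on top. The parameter values, hash construction and function names below are illustrative assumptions.

```python
import hashlib
import random

NUM_BITS = 16              # Bloom filter size k (illustrative)
NUM_HASHES = 2             # number of hash functions h (illustrative)
F, P, Q = 0.5, 0.25, 0.75  # noise parameters f, p, q (illustrative)


def bloom_bits(value: str) -> list[int]:
    """Set up to NUM_HASHES positions of a NUM_BITS-wide Bloom filter."""
    bits = [0] * NUM_BITS
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
        bits[int.from_bytes(digest[:4], "big") % NUM_BITS] = 1
    return bits


def permanent_rr(bits: list[int], rng: random.Random) -> list[int]:
    """Permanent randomized response: done once per value, then memoized."""
    out = []
    for b in bits:
        r = rng.random()
        if r < F / 2:
            out.append(1)   # forced to 1 with probability f/2
        elif r < F:
            out.append(0)   # forced to 0 with probability f/2
        else:
            out.append(b)   # true bit kept with probability 1 - f
    return out


def instantaneous_rr(prr: list[int], rng: random.Random) -> list[int]:
    """Instantaneous randomized response: fresh noise for every report."""
    return [1 if rng.random() < (Q if b else P) else 0 for b in prr]
```

A client would send `instantaneous_rr(permanent_rr(bloom_bits(domain), rng), rng)`, reusing the same permanent bits for every later report of that value.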

Our plan.

What we plan to do now is run an opt-out SHIELD study [6] to validate our
implementation of RAPPOR. This study will collect the value for users’ home
page (eTLD+1) for a randomly selected group of our release population. We
are hoping to launch this in mid-September.

This is not the type of data we have collected as opt-out in the past and
is a new approach for Mozilla. As such, we are still experimenting with the
project and wanted to reach out for feedback.

Georg

References:

1: https://en.wikipedia.org/wiki/Public_Suffix_List

2: https://en.wikipedia.org/wiki/Differential_privacy

3: https://robertovitillo.com/2016/07/29/differential-privacy-for-dummies/

4: https://github.com/google/rappor

5: https://arxiv.org/abs/1407.6981

6: https://wiki.mozilla.org/Firefox/Shield/Shield_Studies

djoac...@gmail.com

Aug 22, 2017, 8:40:03 AM
to mozilla-g...@lists.mozilla.org
Hello.

I don't have the necessary information to say whether this is correct, moral, or necessary, but I will say that I believe Opt-in is pro-privacy, while Opt-out is anti-privacy.

If Firefox is dedicated to preserving privacy, then no Opt-out data feature should be added.

Thank you.

omar.a...@gmail.com

Aug 22, 2017, 9:46:26 AM
to mozilla-g...@lists.mozilla.org
What about the fact that I don't want to give my information even in an anonymous and untraceable way? You understand that anonymity is just part of the equation and not the single issue at stake here.

philipp.k...@googlemail.com

Aug 22, 2017, 9:46:26 AM
to mozilla-g...@lists.mozilla.org
If this is implemented, I’ll have to file a complaint with the relevant Landes- and Bundesbeauftragten für Datenschutz, and, possibly, escalate this to the EU Data Privacy commissioner’s office.

I’d prefer if you’d avoid doing this.

lede...@gmail.com

Aug 22, 2017, 9:46:27 AM
to mozilla-g...@lists.mozilla.org
hi there.

i do not understand the need to know the top 100 sites for improving the "product".

can you explain?

i see a lot of big issues which should be improved regarding the performance and an overloaded feature-set, ui-quirks and several page rendering issues.

none of these would be addressed by accessing, analysing and storing even anonymously gathered user data.

cmn3...@gmail.com

Aug 22, 2017, 9:46:28 AM
to mozilla-g...@lists.mozilla.org
Wow... I'm not sure I can say it any other way.

Having something like this be opt-out is very anti-privacy.

I'm thoroughly disappointed. I guess, even after having used Firefox since the single digit releases, it's time to look elsewhere.

I wish you the worst of luck in your new venture to infringe even further upon the privacy of your users.


CMN

pik7...@gmail.com

Aug 22, 2017, 9:46:35 AM
to mozilla-g...@lists.mozilla.org
I somehow don't believe in this anymore.
What I see is that there are more and more bodies interested in that kind of data to provide better ads to 'customers', and it sounds like one of those bodies reached Firefox and showed them how much money they can get for this data.

If you implement this, I will say 'bye bye' to this web browser.

Gervase Markham

Aug 22, 2017, 10:39:36 AM
to Georg Fritzsche
Hello, Redditors...

On 21/08/17 08:56, Georg Fritzsche wrote:
> One solution is the use of differential privacy [2] [3], which allows us to
> collect sensitive data without being able to make conclusions about
> individual users, thus preserving their privacy.

If you are going to comment here, your comment would be more useful if
it showed that you have taken the time to understand differential
privacy and RAPPOR, and explained why you think it's not sufficient (if
that's what you think, after studying it).

Comments which assume that we are proposing to collect browser data with
no privacy protections at all are not helpful, because they assume
things which are not true.

Gerv

siva....@gmail.com

Aug 22, 2017, 10:45:09 AM
to mozilla-g...@lists.mozilla.org
> Asks for sensitive data center most commonly around knowing something in
> relation to which sites a user visits:
>
> - "Which top sites are users visiting?"
> - "Which sites using Flash does a user encounter?"
> - "Which sites does a user see heavy Jank on?"
>
> In summary most asks are for occurrences of an event X per domain (more
> specifically eTLD+1 [1], e.g. facebook.com or google.co.uk).


Hello Georg, three questions:

1. Could you explain exactly what kinds of problems (which are currently a big source of trouble) would be solved easily with the currently proposed plan? And also what kinds of problems cannot be solved with this data, but could be solved with more invasive data collection?

2. What exactly is the problem if the collection is opt-in? Yes the data is "biased", so what? Are you worried that you might miss certain issues faced mostly by users who don't opt in? Is there any justification for this argument, or is it just a hunch?

3. For those users who consider privacy most valuable, would there be an easy way to opt out, which *guarantees* that Mozilla collects *no information* about their browser usage?

Thank you.

Siva

turi...@gmail.com

Aug 22, 2017, 10:47:22 AM
to mozilla-g...@lists.mozilla.org
The disagreement is not about whether the technology works. It is that, in principle, collecting more data without users having the option to disable it is morally wrong, no matter how trustworthy you are or how useful it is for the product.

Gervase Markham

Aug 22, 2017, 10:49:56 AM
to mozilla-g...@lists.mozilla.org
On 22/08/17 07:45, turi...@gmail.com wrote:
> But the disagreement is not about the idea that the technology does
> not work. But that in principal collecting more data without users
> having the option for disable it is moral wrong no matter how
> trustworthy you are or useful it is for the product.

Users _do_ have the option to disable it.

Gerv

dan.ca...@gmail.com

Aug 22, 2017, 11:02:16 AM
to mozilla-g...@lists.mozilla.org
On Monday, August 21, 2017 at 10:56:44 AM UTC-5, Georg Fritzsche wrote:
> One solution is the use of differential privacy [2] [3], which allows us to
> collect sensitive data without being able to make conclusions about
> individual users, thus preserving their privacy.

Differential privacy is a great tool; however, I'm concerned that even if we do everything *technically* correctly to preserve user privacy, the *optics* associated with this sort of data collection were not addressed in this email.

We attempted to do similarly with User Profile ("UP") / Directory Tiles projects in Content Services, which proposed completely local history analysis for purposes of advertising and content discovery. All of which was done in a way that absolutely protected user privacy (the analysis never left the local machine), but we weren't able to overcome the superficial impression that Firefox was tracking users.

1. How do you propose we address the change in (and mis-)perception of Firefox as a result of this telemetry?

2. Secondly, I'm far more comfortable with data collection that's strictly tied to performance (jank, Flash domains, etc.) than I am with personal data, like homepages or top sites. Would this project be as valuable *without* collecting personalized information like the above?

Best,
Dan

malt...@gmail.com

Aug 22, 2017, 11:02:26 AM
to mozilla-g...@lists.mozilla.org
Why collect on the client side when the server side for the larger sites most definitely collects usage data in much more detail than you would ever do?

Wouldn't Mozilla be in a strong enough position to ask for statistics of user agents from Facebook, Google, etc., and maybe even what hoops their respective engineering departments have to jump through to make the site work on all major platforms?

David Teller

Aug 22, 2017, 11:05:25 AM
to gover...@lists.mozilla.org
Hello Siva,

I'll try and chime in.

1. The main problem that is at stake here is improving Firefox for
websites that our users actually use. We fight a perpetual fight to
improve Firefox for our users, which means that we need to know where to
spend our limited resources. While we can manually or semi-automatically
test a number of websites to find out whether, for instance, Firefox 55
is faster or slower than the previous version, for the moment, we have
to rely upon guesses to determine whether these are websites that our
users actually use.

With more data, we could automatically determine such information and more.

For instance, we could correlate this with crash reports of users who
choose to submit these reports and automatically find out that since
version 60 of Firefox, site foobar.com causes crashes, or maybe that
crashes have decreased on that site since the release of a new graphics
driver. Or we could correlate this with performance reports of users who
opt-in for such reports and automatically find out that since version 60
of Firefox, our performance on foobar.com has improved/decreased.

So, to summarize:

- being able to apply effort to websites that matter to our users;

- being able to automatically detect problems (or improvements) on websites.



2. I don't know the details sufficiently to answer on this point.
However, I can give you my personal thoughts on it.

We have long known (by comparing our data with other available
sources of data such as Alexa) that there is a considerable bias between
users who opt in to Telemetry and the rest of our users. Users who
opt in to Telemetry are typically much more technically aware than
other users, and some countries are heavily over-represented.



3. That's a UX question, so that's pretty far from my expertise, but
there is already a section "Firefox Data Collection and Use" in
preferences, which may be used to opt-in/opt-out. I also seem to
remember that Firefox actually asks you upon first installation whether
you are ok with sending data.


I hope this helps,
David

On 22/08/17 16:44, siva.rk.sw--- via governance wrote:
>> Asks for sensitive data center most commonly around knowing something in
>> relation to which sites a user visits:
>>
>> - "Which top sites are users visiting?"
>> - "Which sites using Flash does a user encounter?"
>> - "Which sites does a user see heavy Jank on?"
>>
>> In summary most asks are for occurrences of an event X per domain (more
>> specifically eTLD+1 [1], e.g. facebook.com or google.co.uk).
>
>
> Hello Georg, three questions:
>
> 1. Could you explain exactly what kinds of problems (which are currently a big source of trouble) would be solved easily with the currently proposed plan? And also what kinds of problems cannot be solved with this data, but could be solved with more invasive data collection?
>
> 2. What exactly is the problem if the collection is opt-in? Yes the data is "biased", so what? Are you worried that you might miss certain issues faced mostly by users who don't opt in? Is there any justification for this argument, or is it just a hunch?
>
> 3. For those users who consider privacy most valuable, would there be an easy way to opt out, which *guarantees* that Mozilla collects *no information* about their browser usage?
>
> Thank you.
>
> Siva
> _______________________________________________
> governance mailing list
> gover...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/governance
>

turi...@gmail.com

Aug 22, 2017, 11:12:59 AM
to mozilla-g...@lists.mozilla.org
On Tuesday, August 22, 2017 at 4:49:56 PM UTC+2, Gervase Markham wrote:
> On 22/08/17 07:45,
> > But the disagreement is not about the idea that the technology does
> > not work. But that in principal collecting more data without users
> > having the option for disable it is moral wrong no matter how
> > trustworthy you are or useful it is for the product.
>
> Users _do_ have the option to disable it.
>
> Gerv

Correct me if I am wrong, but this is presented as a solution to collect data without having to get explicit consent. It is not clear whether users will be able to disable it or not. If they can, then please be clearer, as this will lead to misunderstandings.

How will this work? Having it enabled by default without making explicitly clear that it is happening is still morally wrong and anti-privacy. The policy pretty much hopes that people will either be uninformed or complacent about disabling it. Otherwise, what's the difference from asking for explicit consent?

Gervase Markham

Aug 22, 2017, 11:16:28 AM
to turi...@gmail.com
On 22/08/17 08:07, turi...@gmail.com wrote:
> Correct me if i am wrong but this is presented as a solution to
> collect data without having to get explicit consent. It is not clear
> that user will be able to disabled it or not. If this the case then
> please be more clear as it will lead to misunderstandings.

Perhaps it could have been more clear in the initial post, but Georg
definitely says:

"This is not the type of data we have collected as opt-out in the past
and is a new approach for Mozilla."

So this data collection will have an opt-out.

> How will this work? Having it enabled by default without making
> explicitly clear that it is happening is still morally wrong and
> anti-privacy. The policy pretty much hopes that people will be either
> be uninformed or complacent in disabling it. Otherwise whats the
> different to asking for explicit consent?

We have other data we collect which is opt-out, such as how the browser
is performing. The idea of opt-out data collection is not really the
question; the difference here is that the data is potentially more
sensitive. To address that, we want to use differential privacy and
RAPPOR; a good discussion to have, therefore, is whether those tools do
the job or not.

Gerv

jot...@hotmail.com

Aug 22, 2017, 11:28:25 AM
to mozilla-g...@lists.mozilla.org

I made a thoughtful comment and it was rejected with a response that I need to show I'm familiar with Differential Privacy and RAPPOR before commenting. I'll do that before my actual comment.

I'm a computer scientist working in an adjacent field and I've read enough papers on Differential Privacy to understand it.

The objection is not to DP's privacy guarantees, but to the fact that FF will phone home with every website we visit. A neat list of all the websites I visit will be sent to a central location, in chronological order.

A second objection is the users' response, regardless of guarantees. You can't explain DP to everyone. For many users it will amount to "trust us". Microsoft did the same with the Windows 10 telemetry and it resulted in enormous backlash from users, widely reported in tech websites. Consider that before committing.

---

What follows was my actual suggestion, which is orthogonal to DP.

The example questions can be answered with no need for the bulk telemetry that's proposed:

> "Which top sites are users visiting?"

There's enough public data available on what sites are most popular. No need for yet another database on that.

> "Which sites using Flash does a user encounter?"

Mozilla can crawl this information itself, based on the above websites list. It doesn't need to ask users to do it.

> "Which sites does a user see heavy Jank on?"

Slowdowns and similar bad user experiences would better be treated like crash reports.

Offering to send anonymous info on one of these events, through a popup or dropdown hanger (similar to the password manager, security certificates, etc), would fulfill the same objective. A user is inclined to help when his/her favorite website suddenly starts slowing down, or throwing errors. At this point it's also easy to check a box to "always do this from now on".

Rather than authorizing abstract, bulk usage, the user would see the value in sending a report about the current issue, because he/she is experiencing it and wants Mozilla to fix it. I'm sure there would be more reports in this manner, just like there are more than enough crash reports being sent.

---

In conclusion, no telemetry is one of the main reasons for adopting FF over Chrome. Without dismissing the developers' point of view, given the importance of this feature, the onus should be on them to show that the alternatives have been explored and are not feasible, rather than putting the onus on users to show holes in the DP scheme, which is too restrictive for a discussion.

Danny

Aug 22, 2017, 11:28:31 AM
to mozilla-g...@lists.mozilla.org
Hoping to provide constructive feedback.

A little about me as a user first so you can understand:
1) I purposely do not use Chrome
2) I purposely do not use Google and use DuckDuckGo instead as my search engine
3) I purposely do not use Gmail and use FastMail instead
4) I use uBlock, self-destructing cookies, Privacy Badger, etc
5) I use container tabs
6) I opt-out of any data collection
7) The first thing I do with Firefox Focus on iOS is to opt out of data collection
8) I like the idea of Firefox Send being end-to-end encrypted
9) I encrypt my backup locally prior to sending it to the cloud
10) I do not use Dropbox and use Tresorit instead
11) I disabled all telemetry on Windows 10

Now, if this was pushed out, the first thing I would do is still to disable it.
But why is that? RAPPOR is awesome right?

I briefly read the overview for it, so please correct me if I have any misunderstanding.

RAPPOR is kind of like the protection of farting in a crowded elevator. Somebody in that group did it, but we don't know who for sure. Yes, that's better privacy for sure, but is it total privacy? Not to me. Because you still know that somebody in that elevator did it very likely. Not a perfect analogy, but hopefully demonstrates the cracks.

Why do users like me do end-to-end encryption? What does that give me?
It gives me the ability to trust nobody except my end.

RAPPOR does not offer that same level of protection, and I think that's hopefully clear by the elevator example. That's why the first thing I'll do is disable it.

Why do users like me use uBlock and other things? What do those things give us? Total control in our hands. Does RAPPOR measure up to that standard? I think no.

But I very much want Firefox to succeed because the alternative of Chrome or Edge is a sad world. And I very much would like to submit data to Firefox, but not in an automatic and uncontrolled (by me) way.

Why does the choice have to be binary?

If I may suggest, could Mozilla investigate doing a bit more UX work to make data collection palatable to users like me?

And that means putting me in control.

I know that perf data is extremely important. In fact, I was just seeing freezes yesterday and that's kinda frustrating. But I still won't enable automatic data collection. What I think would be nice is if you actually just prompted me "crash reporting" style. Ask me, "hey... we know Firefox was a bit slow for you on such and such site, would you like to let us know?" And then give me the option of "Yes, this one time", "Yes, always on this site", "No".

Of course you still have to anonymize that data, but what this does is you've given me control. See the distinction?

What if you want to know what top sites I'm visiting? Or what sites with Flash I encounter? Same thing. Yes, I know the suggestion is eTLD+1, but hopefully the mindset of users like me has been explained well enough at this point to show that it still does not measure up. I would love to let you know what sites I visit. Give me a little feedback ability, show me the sites that you're going to send and let me check off anything I don't want. Again, you probably don't want to annoy the user with needless prompts, so maybe add the ability to say "Yes, these sites are always ok to send", or "Always exclude this site".

I'm sure if you proceed as planned and automatically opt users in to this collection, you're going to get more data. But you can bet you'll get none from me.

I want to send you data, so please help me help you.

turi...@gmail.com

Aug 22, 2017, 11:53:37 AM
to mozilla-g...@lists.mozilla.org

>
> The idea of opt-out data collection is not really the
> question; the difference here is that the data is potentially more
> sensitive.
>
> Gerv

Exactly. Because the data is more sensitive the idea of opt-out comes into question before the question of the technology. If a person thinks that opt-out data collection is wrong it does not matter how effective the privacy technology is.

This definitely has the potential to hurt the Firefox brand as a product that respects choice and does not try to trick you.

Anyway, since you want more discussion of the actual technology, I will stop here. Thank you for the replies.

Aaron Klotz

Aug 22, 2017, 1:21:36 PM
to gover...@lists.mozilla.org
For the purposes of this thread I am not taking a specific position on
the overall issue, but as somebody who has worked on performance I would
like to point something out for discussion:

On 8/22/2017 9:19 AM, jotaf98--- via governance wrote:
>> "Which top sites are users visiting?"
> There's enough public data available on what sites are most popular. No need for yet another database on that.
>
>> "Which sites using Flash does a user encounter?"
> Mozilla can crawl this information itself, based on the above websites list. It doesn't need to ask users to do it.

I don't think it's that simple. Plenty of content on top sites is
tailored to the user in some way. To measure how a browser is performing
on those sites, one would want to measure performance for actual content
that real users are seeing. Throwing some kind of crawler at it is
unlikely to produce representative samples of encounters with such
content, IMHO.

Georg Fritzsche

Aug 22, 2017, 1:44:59 PM
to lede...@gmail.com, mozilla-g...@lists.mozilla.org
Hi,

great question.
We have been getting better at tracking down general performance issues and
breakage, but usually we don't know which sites this is happening on.
I think there are two parts here:
- Understanding which domains things happen on, e.g. on which sites did
feature X break.
- Understanding what the actual top sites are in Firefox to inform testing
and investigations.

Neither of these is settled yet, but they were specific asks that came up.
Collecting these - in a privacy-preserving way - would be valuable.
Georg

Irvin Chen

Aug 22, 2017, 1:54:51 PM
to Aaron Klotz, gover...@lists.mozilla.org
I totally support any user research, as long as it follows the rules we
advocate for...

“Individuals’ security and privacy on the Internet are fundamental and must
not be treated as optional.”
https://www.mozilla.org/en-US/about/manifesto/#principle-04

“No surprises
Use and share information in a way that is transparent and benefits the
user.”
https://www.mozilla.org/en-US/privacy/principles/

“Privacy as the default setting: ...privacy must be top of mind. It also
means that strong privacy should always be the ‘by-default setting’.”
https://blog.mozilla.org/netpolicy/2016/05/25/the-countdown-is-on-24-months-to-gdpr-compliance/

“Privacy by Default
Privacy by Default simply means that the strictest privacy settings
automatically apply once a customer acquires a new product or service. In
other words, no manual change to the privacy settings should be required on
the part of the user.”
http://www.eudataprotectionregulation.com/data-protection-design-by-default



On Wed, Aug 23, 2017 at 1:21 AM, Aaron Klotz via governance
<gover...@lists.mozilla.org> wrote:

> For the purposes of this thread I am not taking a specific position on
> the overall issue, but as somebody who has worked on performance I would
> like to point something out for discussion:
>
> On 8/22/2017 9:19 AM, jotaf98--- via governance wrote:
> >> "Which top sites are users visiting?"
> > There's enough public data available on what sites are most popular. No
> > need for yet another database on that.
> >
> >> "Which sites using Flash does a user encounter?"
> > Mozilla can crawl this information itself, based on the above websites
> > list. It doesn't need to ask users to do it.
>
> I don't think it's that simple. Plenty of content on top sites is
> tailored to the user in some way. To measure how a browser is performing
> on those sites, one would want to measure performance for actual content
> that real users are seeing. Throwing some kind of crawler at it is
> unlikely to produce representative samples of encounters with such
> content, IMHO.

gene...@gmail.com

Aug 22, 2017, 3:33:04 PM
to mozilla-g...@lists.mozilla.org
How do you see this study lining up with the data already being collected for the Firefox Health Report? This discussion strikes me as raising similar questions and tradeoffs, which were discussed on this forum and also blogged about by Mitchell and the metrics team.*

Users currently have options under Preferences to "automatically send technical and interaction data to Mozilla" and to "send crash reports to Mozilla." How do you see handling the opt-out for the study from a UX perspective? Is there a way to automatically opt out those of us sending a DNT header? Or respect a user with data sharing turned off?

*https://blog.lizardwrangler.com/2012/09/21/firefox-health-report/ and https://blog.mozilla.org/metrics/2012/09/21/firefox-health-report/

Rubén Martín

Aug 22, 2017, 3:57:32 PM
to Irvin Chen, Aaron Klotz, gover...@lists.mozilla.org
Hi,

I completely agree with Irvin here. This proposal is extremely
uncomfortable for me, even after reading about Rappor, I share the same
concerns others have expressed in this topic about privacy and user
expectations.

I think we can be creative if the problem is that we need to understand
which sites are not performing OK on Firefox without compromising our
values. Sending sensitive information as opt-out is not, in my opinion,
the way to go.

Cheers.

On 22/08/17 at 19:54, Irvin Chen via governance wrote:
> I'm totally support for any user research, if it is following the rules we
> advocate for...
>
> “Individuals’ security and privacy on the Internet are fundamental and must
> not be treated as optional.”
> https://www.mozilla.org/en-US/about/manifesto/#principle-04
>
> “No surprises
> Use and share information in a way that is transparent and benefits the
> user.”
> https://www.mozilla.org/en-US/privacy/principles/
>
> “Privacy as the default setting: ...privacy must be top of mind. It also
> means that strong privacy should always be the ‘by-default setting’.”
> https://blog.mozilla.org/netpolicy/2016/05/25/the-countdown-is-on-24-months-to-gdpr-compliance/
>
> “Privacy by Default
> Privacy by Default simply means that the strictest privacy settings
> automatically apply once a customer acquires a new product or service. In
> other words, no manual change to the privacy settings should be required on
> the part of the user.”
> http://www.eudataprotectionregulation.com/data-protection-design-by-default
>
>
>
> On Wed, Aug 23, 2017 at 1:21 AM, Aaron Klotz via governance
> <gover...@lists.mozilla.org> wrote:
>
>> For the purposes of this thread I am not taking a specific position on
>> the overall issue, but as somebody who has worked on performance I would
>> like to point something out for discussion:
>>
>> On 8/22/2017 9:19 AM, jotaf98--- via governance wrote:
>>>> "Which top sites are users visiting?"
>>> There's enough public data available on what sites are most popular. No
>>> need for yet another database on that.
>>>> "Which sites using Flash does a user encounter?"
>>> Mozilla can crawl this information itself, based on the above websites
>>> list. It doesn't need to ask users to do it.
>>
>> I don't think it's that simple. Plenty of content on top sites is
>> tailored to the user in some way. To measure how a browser is performing
>> on those sites, one would want to measure performance for actual content
>> that real users are seeing. Throwing some kind of crawler at it is
>> unlikely to produce representative samples of encounters with such
>> content, IMHO.


--
Rubén Martín [Nukeador]
Mozilla Reps Mentor
http://www.mozilla-hispano.org
http://twitter.com/mozilla_hispano
http://facebook.com/mozillahispano


hen000....@gmail.com

Aug 22, 2017, 7:44:35 PM
to mozilla-g...@lists.mozilla.org
On Monday, 21 August 2017 15:56:44 UTC, Georg Fritzsche wrote:
> "Which sites does a user see heavy Jank on?"


Why can't Firefox display a bar at the top asking the user to report the page for issues instead?

A bar like the one that tells users about non-responsive scripts, etc. That way you are not always collecting private data about the user and you also don't get the 'bias' of an opt-in system.

morocca...@gmail.com

Aug 22, 2017, 7:44:48 PM
to mozilla-g...@lists.mozilla.org
My initial thought was that RAPPOR is just another data collection system that claims to respect its users' privacy but doesn't really. Upon a little research, though, I've found it does the exact opposite: it really does respect privacy by aggregating real data with fake, random data. It reminds me of what Wikileaks did to prevent its real sources from being discovered. I just hope more sites take this approach to data collection, since it gives the best of both worlds.
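
The "real data mixed with random data" idea can be made concrete with a small simulation: every simulated user randomizes their own answer, yet the collector can still estimate the overall share. This is a simplified one-bit randomized response sketch, not the full RAPPOR decoder, and all names and numbers in it are illustrative assumptions.

```python
import random


def noisy_report(truth: bool, p_truth: float, rng: random.Random) -> bool:
    """Send the true answer with probability p_truth, otherwise the opposite."""
    return truth if rng.random() < p_truth else not truth


def estimate_share(reports: list[bool], p_truth: float) -> float:
    """Unbiased estimate of the true share of True answers (needs p_truth != 0.5)."""
    observed = sum(reports) / len(reports)
    # E[observed] = p*t + (1 - p)*(1 - t)  =>  t = (observed - (1 - p)) / (2p - 1)
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


rng = random.Random(42)
true_answers = [i < 3000 for i in range(10000)]  # 30% of users "visited"
reports = [noisy_report(t, 0.75, rng) for t in true_answers]
estimate = estimate_share(reports, 0.75)  # close to 0.30 despite the noise
```

The collector never learns which individual reports were flipped; only the aggregate is recoverable, and its accuracy improves with the number of reports.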

patrick...@gmail.com

Aug 22, 2017, 7:45:28 PM
to mozilla-g...@lists.mozilla.org
I think the premise that you need to collect data on the top sites that a user visits may be flawed. Won't you be contributing to the dominance of (already-dominant) top sites by optimizing for them specifically?

It also seems that you could get a reasonably accurate idea of what sites are most popular among Firefox users by looking at the most popular sites overall and optimizing for those. Do you expect that Firefox users are so wildly different that their top sites don't look more or less the same as the top sites overall?

Further, as has been shown again and again, data thought to be untraceable to any particular user has been deanonymized through correlations with other data sets. Something like top visited sites are actually a pretty juicy target as well for state actors, blackmailers, etc.

Finally, the mere act of doing random (from the user's perspective) telemetry is problematic. First, users on limited connections don't need to be using more data than they already are. Second, the mere act of making a request with IP endpoints, even if it sends only a ping, can expose an unprepared user who needs privacy. I understand that Firefox already does some of this, but that's not really a reason to do more.

From a business perspective, a major differentiating factor (arguably the only differentiating factor) of Firefox is that Mozilla isn't Google. The closer you get to that line, the more damage you'll do to the trust users have in Mozilla.

I recommend that you take the high road on this one. I'm not sure what the motivator is here (does having more data give you leverage with partners?). But the stated justification (improving speeds on particular websites) seems too weak to excuse the valid privacy concerns.

Mozilla: we want to trust you. We do trust you. We know it's tough out there. You're playing with the big kids, and they have intel that, admittedly, probably helps them improve their products. But the way you can improve your product is by NOT collecting that intel. Do the Mozilla thing, not the Google thing.


Panos Astithas

Aug 23, 2017, 8:11:03 AM
to hen000....@gmail.com, mozilla-g...@lists.mozilla.org
On Tue, Aug 22, 2017 at 9:45 PM, hen000.c.young--- via governance <
gover...@lists.mozilla.org> wrote:

> On Monday, 21 August 2017 15:56:44 UTC, Georg Fritzsche wrote:
> > "Which sites does a user see heavy Jank on?"
>
>
> Why can't FireFox display a bar at the top asking the user to report the
> page for issues instead?
>

Because this is the definition of opt-in data collection ("can we collect
this data? Sure, I'm in!"), which has the data quality issues already
mentioned. Opt-out data collection means that by default we would be
collecting the data, unless the user goes to the preferences panel and opts
out of it (there is also a notification bar for every new installation to
remind users of this policy and how to opt out).

There are already both opt-out data collection (e.g. longest cycle
collection pause) and opt-in data collection (e.g. whether the device
supports touch input) going on in Firefox. We believe the differential
privacy approach of RAPPOR gives us a mathematical proof that we can
collect some of the more privacy-sensitive data in a way that doesn't
reduce user privacy.
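
For reference, the guarantee usually meant by "differential privacy" in the local setting can be stated as below. This is the standard textbook definition, not something spelled out in this thread; the mechanism $M$, the privacy parameter $\varepsilon$, and the values $x, x'$ are notation assumed here for illustration, and no concrete $\varepsilon$ for the study is implied.

```latex
\Pr[M(x) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(x') \in S]
\quad \text{for all inputs } x, x' \text{ and all output sets } S
```

Here $x$ and $x'$ are any two values a single user might hold (for example two different home page domains) and $M$ is the randomization applied on the client before anything is sent. A smaller $\varepsilon$ means a report reveals less about which value the user actually had.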

And to reiterate the obvious: we don't collect user data to build user
profiles and we never want to. We are not in the advertising business; we
are a non-profit working for the public benefit. We will always be
providing ways to disable data collection, even for the non-personally
identifying information we need to collect in order to improve the browser.
We believe the academic research of the last few years on differential
privacy has figured out ways to collect data without infringing on user
privacy. Apple, Google and others are already using the fruits of this
research in their products. If there are reasons to believe these methods
aren't working well enough, we would very much like to know about them!

Panos

Mike Hommey

Aug 23, 2017, 9:05:52 AM
to Panos Astithas, hen000....@gmail.com, mozilla-g...@lists.mozilla.org
On Wed, Aug 23, 2017 at 03:10:31PM +0300, Panos Astithas via governance wrote:
> (...) We are not in the advertising business, (...)

In fairness, we have been, at one point, to the surprise of many. I can
understand that people could fear that it happens again some day.

Mike

oliver....@gmail.com

Aug 23, 2017, 10:19:47 AM
to mozilla-g...@lists.mozilla.org
I know it's super-anonymised, but given the controversial nature of the subject, it might help people to understand more about the actual data you're planning to collect.

So what exactly do you plan to collect? How long do you plan to store this data? How will it be stored? Is there a process in place to ensure it isn't kept for any longer than necessary? Who will have access to the data and how?

Thanks,
Olly.

pgne...@gmail.com

Aug 23, 2017, 10:20:31 AM
to mozilla-g...@lists.mozilla.org
On Wednesday, August 23, 2017 at 6:05:52 AM UTC-7, Mike Hommey wrote:
> In fairness, we have been, at one moment, to the surprise of many. I can
> understand that people could fear that happens again some day.

Well noted. And then some.

This debate is astonishing, but no longer surprising.

In my book, if a vendor wants to *begin* to establish trust, step *1* is, as Irving above referenced, "Privacy by Default".

OTOH, if they want to ensure that they lose it rapidly, "Opt-Out" is, respectively, a GREAT starting point.

Is there at least going to be an about:config param, ENV var, etc. to set PRIOR to this auto-toggle to Opt-Out? So that the toggle, even if only for moments, is preventable for both existing users, and for new installs? Or do we need to start sleuthing for, and firewalling, telemetry endpoints?

Display name

Aug 23, 2017, 10:20:51 AM