Usage of Differential Privacy & RAPPOR

Georg Fritzsche

Aug 21, 2017, 11:56:42 AM
to gover...@lists.mozilla.org, dev-p...@lists.mozilla.org
Hi,

for Firefox we want to better understand how people use our product to
improve their experience. To do that, we are planning to run a new SHIELD
study that tests how we can collect additional data in a privacy preserving
way. Check out the details below and send me your thoughts.

The problem.

One recurring ask from the Firefox product teams is the ability to collect
more sensitive data, like top sites users visit and how features perform on
specific sites.

Currently we can collect this data when the user opts in, but we don't
have a way to collect unbiased data, without explicit consent (opt-out).

Asks for sensitive data center most commonly around knowing something in
relation to which sites a user visits:

- "Which top sites are users visiting?"
- "Which sites using Flash does a user encounter?"
- "Which sites does a user see heavy Jank on?"

In summary, most asks are for occurrences of an event X per domain (more
specifically eTLD+1 [1], e.g. facebook.com or google.co.uk).
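
To illustrate what "eTLD+1" means here, a minimal sketch in Python. The
suffix set below is a tiny hypothetical stand-in for the real Public Suffix
List, and the function name is made up for this example; the real list also
has wildcard and exception rules that this sketch ignores.

```python
from typing import Optional

# Hypothetical, tiny stand-in for the Public Suffix List (assumption).
PUBLIC_SUFFIXES = {"com", "org", "uk", "co.uk"}


def etld_plus_one(hostname: str) -> Optional[str]:
    """Return the public suffix plus one more label, or None if there is none."""
    labels = hostname.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest, so that
    # "co.uk" is matched before "uk".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in PUBLIC_SUFFIXES:
            if i == 0:
                return None  # the hostname is itself a public suffix
            return ".".join(labels[i - 1:])
    return None  # no known suffix matched


print(etld_plus_one("www.google.co.uk"))
print(etld_plus_one("m.facebook.com"))
```

With this toy suffix set, both www.google.co.uk and maps.google.co.uk reduce
to the same value, which is the granularity the asks above are about.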

The solution.

One solution is the use of differential privacy [2] [3], which allows us to
collect sensitive data without being able to make conclusions about
individual users, thus preserving their privacy.

An attacker that has access to the data a single user submits is not able
to tell whether a specific site was visited by that user or not.

The Google Open Source project called RAPPOR [4] [5] is the most widely
known and deployed implementation of differential privacy.

We have been investigating the use of RAPPOR for these kinds of use cases,
and initial simulation results are promising.

Our plan.

What we plan to do now is run an opt-out SHIELD study [6] to validate our
implementation of RAPPOR. This study will collect the value for users’ home
page (eTLD+1) for a randomly selected group of our release population. We
are hoping to launch this in mid-September.

This is not the type of data we have collected as opt-out in the past and
is a new approach for Mozilla. As such, we are still experimenting with the
project and wanted to reach out for feedback.

Georg

References:

1: https://en.wikipedia.org/wiki/Public_Suffix_List

2: https://en.wikipedia.org/wiki/Differential_privacy

3: https://robertovitillo.com/2016/07/29/differential-privacy-for-dummies/

4: https://github.com/google/rappor

5: https://arxiv.org/abs/1407.6981

6: https://wiki.mozilla.org/Firefox/Shield/Shield_Studies

David Bruant

Aug 27, 2017, 8:47:25 AM
to Georg Fritzsche, gover...@lists.mozilla.org, dev-p...@lists.mozilla.org
Hi Georg,

Some questions inline.

On 21/08/2017 at 17:56, Georg Fritzsche via governance wrote:
> Hi,
>
> for Firefox we want to better understand how people use our product to
> improve their experience. To do that, we are planning to run a new SHIELD
> study that tests how we can collect additional data in a privacy preserving
> way. Check out the details below and send me your thoughts.
>
> The problem.
>
> One recurring ask from the Firefox product teams is the ability to collect
> more sensitive data, like top sites users visit and how features perform on
> specific sites.
>
> Currently we can collect this data when the user opts in, but we don't
> have a way to collect unbiased data, without explicit consent (opt-out).
What is the current percentage of Firefox users opting in?
What are the known biases? How do they affect the study results?

> Asks for sensitive data center most commonly around knowing something in
> relation to which sites a user visits:
>
> - "Which top sites are users visiting?"
> - "Which sites using Flash does a user encounter?"
> - "Which sites does a user see heavy Jank on?"
>
> In summary most asks are for occurrences of an event X per domain (more
> specifically eTLD+1 [1], e.g. facebook.com or google.co.uk).
>
> The solution.
>
> One solution is the use of differential privacy [2] [3], which allows us to
> collect sensitive data without being able to make conclusions about
> individual users, thus preserving their privacy.
>
> An attacker that has access to the data a single user submits is not able
> to tell whether a specific site was visited by that user or not.
Just to be 100% sure I understand: what will happen is that Firefox will
lie (or answer randomly) to the question with probability p. This way,
even if an attacker gains access to Moz servers, they can trust the answer
only with probability 1-p.
There is a trade-off between utility (low p) and stronger privacy (high p).
Could this trade-off be documented and a hard lower limit be decided?
Should each study decide on a different p based on data sensitivity?
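
To make the trade-off concrete, here is a minimal sketch of plain
randomized response in Python, taking the "answer randomly" variant: with
probability p the client reports a fair coin flip instead of the truth.
This is only an illustration of the idea as I understand it, not RAPPOR
itself (which, per the paper, adds Bloom filters and two layers of
randomization), and the function names and numbers are made up.

```python
import random
from typing import List


def randomized_response(truth: bool, p: float, rng: random.Random) -> bool:
    """Report the truth with probability 1 - p, otherwise a fair coin flip."""
    if rng.random() < p:
        return rng.random() < 0.5
    return truth


def estimate_true_rate(reports: List[bool], p: float) -> float:
    """Undo the noise in aggregate: observed = (1 - p) * true + p / 2."""
    observed = sum(reports) / len(reports)
    return (observed - p / 2) / (1 - p)


rng = random.Random(0)   # fixed seed so the simulation is repeatable
p = 0.5                  # assumed noise level, for illustration only
true_rate = 0.3          # assumed share of users for whom the answer is "yes"

truths = [rng.random() < true_rate for _ in range(100_000)]
reports = [randomized_response(t, p, rng) for t in truths]
print(round(estimate_true_rate(reports, p), 2))
```

A single report says little about its user, but the aggregate estimate
stays close to the true rate. Raising p makes each report less trustworthy
and the estimate noisier for the same number of users.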

> The Google Open Source project called RAPPOR [4] [5] is the most widely
> known and deployed implementation of differential privacy.
>
> We have been investigating the use of RAPPOR for these kind of use-cases,
> with initial simulation results being promising.
>
> Our plan.
>
> What we plan to do now is run an opt-out SHIELD study [6] to validate our
> implementation of RAPPOR. This study will collect the value for users’ home
> page (eTLD+1) for a randomly selected group of our release population We
> are hoping to launch this in mid-September.
>
> This is not the type of data we have collected as opt-out in the past and
> is a new approach for Mozilla. As such, we are still experimenting with the
> project and wanted to reach out for feedback.
Once this is running, can you publish the opt-out percentage, and how it
evolves over time, somewhere?

Maybe I'm unfamiliar with Firefox data collection and Shield studies,
but what's the policy regarding data deletion? Should the one for
opt-out study data be stricter?
The Shield study page suggests a study lasts 7 days but "can last much
longer" [1]. Could there be a strict policy about opt-out studies?
Or, if a study needs to be longer, could it be set up so that no user has
the add-on for more than 7 days (for instance) in a row?

Thanks,

David

[1] https://wiki.mozilla.org/Firefox/Shield/Shield_Studies#How_long_do_Shield_Studies_last.3F
