Potential privacy issues of not showing suggestions in certain contexts

Ed Lee

Apr 23, 2015, 2:23:12 PM

[ For context around Suggested Tiles, please read http://ed.agadak.net/2015/04/whys-and-hows-of-suggested-tiles ]

An important feature for some brands that want to provide content in the new tab page is the ability to not be shown in the context of certain sites. For example, a brand suggesting an upcoming movie trailer wouldn't want that suggestion shown next to a tile for an illegal file sharing site.

We're planning on supporting this feature, but it has some privacy nuances.

Similar to how Firefox has enough data to decide that it's appropriate to show a movie trailer suggestion to users who tend to visit movie related sites, Firefox could have enough data to decide when it's inappropriate. If a given Firefox instance does *not* report back that the movie trailer suggestion was shown, one could try to infer that it was because the user had a visible illegal file sharing site in the new tab page.

Another approach is to temporarily hide the illegal file sharing site so that it's acceptable to show the movie trailer. Once the suggestion no longer needs to be shown, the hidden tile can reappear. However, even in this situation, Firefox reports back the structure of the new tab page (but no URLs), and there could be enough data to infer something was hidden then shown again.

To be clear on both potential privacy leaks, we have aggressive data deletion policies and don't keep any unique identifiers that can be associated with our users. But this ability could be used by malicious entities to learn information about users.

Do people have thoughts on the privacy issues raised here and potential solutions?

Ed Lee

Archaeopteryx

Apr 24, 2015, 4:36:45 AM

-------- Original Message --------
Subject: Potential privacy issues of not showing suggestions in certain contexts
From: Ed Lee <edi...@mozilla.com>
Date: 2015-04-23 20:23

> Similar to how Firefox has enough data to decide that it's
> appropriate to show a movie trailer suggestion to users who tend to
> visit movie related sites, Firefox could have enough data to decide
> when it's inappropriate. If a given Firefox instance does *not*
> report back that the movie trailer suggestion was shown, one could
> try to infer that it was because the user had a visible illegal file
> sharing site in the new tab page.

Is it feasible to only show such paid tiles under the condition that it
only gets shown to X% of the users so not showing the tile doesn't imply
anything?

Archaeopteryx

Ed Lee

Apr 25, 2015, 10:04:15 PM

On Friday, April 24, 2015 at 1:36:45 AM UTC-7, Archaeopteryx wrote:
> Is it feasible to only show such paid tiles under the condition that it
> only gets shown to X% of the users so not showing the tile doesn't imply
> anything?

There is some fuzziness already around when a suggested tile is shown. We only show one suggestion at a time, so if a given tile isn't shown, it could be because another tile is being shown. We also frequency cap suggestions, i.e., stop showing them after some number of views.

We could add to that uncertainty by having Firefox decide to show a suggestion 50% of the time that it could have shown something.

Each of these makes it so that when a tile isn't shown, it's not guaranteed it's because the user has some illegal site visible.
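One way to reason about the uncertainty described above is to ask what an observer could conclude from a missing impression. The following is a minimal Python sketch, not Firefox code: the function names and the 5% prior used in the note below are hypothetical, and it ignores the other sources of fuzziness (competing tiles, frequency caps).

```python
import random


def should_show(has_blocked_site, show_probability=0.5):
    """Hypothetical client-side decision for one eligible suggestion.

    Never show next to a blocked site; otherwise show only some of the
    time, so that "not shown" has more than one possible cause.
    """
    if has_blocked_site:
        return False
    # random.random() returns a float in [0.0, 1.0).
    return random.random() < show_probability


def p_blocked_given_not_shown(prior_blocked, show_probability):
    """Bayes' rule: P(blocked site visible | suggestion not shown).

    prior_blocked is the assumed fraction of users with a blocked site
    visible; it is an input to the sketch, not a measured number.
    """
    # "Not shown" happens for every blocked user, plus the unblocked
    # users for whom the coin flip came up "don't show".
    p_not_shown = prior_blocked + (1.0 - prior_blocked) * (1.0 - show_probability)
    return prior_blocked / p_not_shown
```

With an assumed 5% of users having a blocked site visible, always showing the suggestion when allowed makes a missing impression imply a blocked site with certainty, while a 50% show rate lowers that to about 9.5%.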

Andrew Sutherland

Apr 25, 2015, 10:35:35 PM

to dev-pl...@lists.mozilla.org

On Thu, Apr 23, 2015, at 02:23 PM, Ed Lee wrote:
> Do people have thoughts on the privacy issues raised here and potential
> solutions?

Use a probabilistic mechanism like bloom filters tuned to err on the
side of false positives to determine when to not show the suggested
tile? (And which can be additionally permuted to further increase
false-positives.)

This could also be beneficial because if the brand has a very large list
of sites they don't want to be associated with, all of that information
doesn't need to be downloaded. And the side-effects of (don't show)
false positives is beneficial in that it decreases the information from
a tile not being shown.
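A rough sketch of the idea in Python, under stated assumptions: the class, the function names, and the `.example` domains are illustrative only and do not describe any actual tiles data format. The filter never misses a listed site, and shrinking it relative to the number of entries raises the false-positive rate.

```python
import hashlib
import math


class BloomFilter:
    """Tiny Bloom filter: no false negatives, tunable false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive one bit position per hash by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def expected_false_positive_rate(num_bits, num_hashes, num_items):
    """Standard approximation: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


def may_show_suggestion(visible_sites, negative_filter):
    """Suppress the suggestion if any visible site matches the filter.

    A match may be a false positive, which is the point: suppression
    alone does not prove a listed site was visible.
    """
    return not any(site in negative_filter for site in visible_sites)
```

A brand's list could be loaded with `add()` on the server side and only the bit array shipped to clients; picking a smaller `num_bits` trades impressions for more deniability.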

A possibly good/possibly bad side effect is that this could allow the
brands to not explicitly say which websites they don't want to be
associated with. If this is a desirable characteristic, even Mozilla
potentially need not know what the list of sites was. If this is not a
desirable characteristic, the Mozilla automation could automatically
derive the filter from the total list of domains and make that available
as part of the data.

I would think that letting brands not explicitly reveal the websites
with 100% certainty that they don't want to be associated with is good
and acceptable since it provides parity with server-side solution and
the historical nature of ads. (Just because a company doesn't advertise
in a certain TV show/magazine doesn't mean they have explicitly decided
not to advertise there.) And interested users could still run
brute-forces against the filters and assign probabilities to certain
sites or clusters of sites being intentionally excluded in a similar
fashion to how they could notice what sites a company is not running ad
campaigns on.

Andrew

Ed Lee

Apr 25, 2015, 10:53:58 PM

On Saturday, April 25, 2015 at 7:35:35 PM UTC-7, Andrew Sutherland wrote:
> Use a probabilistic mechanism like bloom filters tuned to err on the
> side of false positives to determine when to not show the suggested
> tile?

Hey, that's pretty clever. ;) I believe what you're getting at is that false positives for "negative matches" end up causing us to show suggestions fewer times than we would have -- potentially leading to less money, but brands are protected and user data isn't leaked because there are so many possible false positives.

We wouldn't be able to use this directly for "positive matches" because a bloom filter matching "web developers" would accidentally include some non-web-developer sites. But if we had some metrics to figure out which false positive sites were leading to low engagement (more blocks / fewer clicks), we could just add those false positives into the "negative matches" bloom filter.

Ed Lee

Eric Rescorla

Apr 26, 2015, 9:30:30 AM

to Andrew Sutherland, dev. planning

On Sat, Apr 25, 2015 at 7:35 PM, Andrew Sutherland <asuth...@asutherland.org> wrote:
> On Thu, Apr 23, 2015, at 02:23 PM, Ed Lee wrote:
> > Do people have thoughts on the privacy issues raised here and potential
> > solutions?
>
> Use a probabilistic mechanism like bloom filters tuned to err on the
> side of false positives to determine when to not show the suggested
> tile? (And which can be additionally permuted to further increase
> false-positives.)

I love me some bloom filters, but if you want false positives, why not
just inject some directly?

-Ekr
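Injecting false positives directly could look something like the following Python sketch. The function name, parameters, and decoy pool are hypothetical; the only idea taken from the message above is padding the real list with deliberately chosen extra entries.

```python
import random


def build_negative_list(blocked_sites, decoy_pool, num_decoys, seed=None):
    """Return the blocked sites plus randomly chosen decoys, shuffled.

    decoy_pool is an assumed list of unrelated sites (for example,
    popular domains) from which false positives are drawn on purpose.
    """
    rng = random.Random(seed)
    blocked = sorted(set(blocked_sites))
    # Only sites that are not already blocked can serve as decoys.
    candidates = sorted(set(decoy_pool) - set(blocked))
    decoys = rng.sample(candidates, min(num_decoys, len(candidates)))
    combined = blocked + decoys
    # Shuffle so list position does not reveal which entries are real.
    rng.shuffle(combined)
    return combined
```

Unlike a bloom filter, a padded list still spells out every site name, so it addresses the inference problem but not the size of the download or the question of shipping the names in clear text.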


edi...@gmail.com

May 1, 2015, 5:58:46 PM

On Saturday, April 25, 2015 at 7:35:35 PM UTC-7, Andrew Sutherland wrote:

> A possibly good/possibly bad side effect is that this could allow the
> brands to not explicitly say which websites they don't want to be
> associated with.

This concept has come up in some discussions around whether it's okay for us to ship a list of adult/porn sites as clear text with Firefox. If we do want to prevent users from having a list of questionable sites on their computer, using a bloom filter can indeed avoid the problem.

Having a list could power other features, such as a porn browser or, conversely, a parental-control browser, although that's not quite the topic of discussion here.

Do people have thoughts on whether it's okay to have a plaintext list of sites either as source or packaged as part of Firefox?

Ed Lee

Eric Rescorla

May 2, 2015, 7:24:17 AM

to edi...@gmail.com, dev. planning

Is there a reason not to use the same techniques we use for safe browsing?

-Ekr

Ed Lee

May 2, 2015, 2:12:46 PM

On Saturday, May 2, 2015 at 4:24:17 AM UTC-7, Eric Rescorla wrote:
> Is there a reason not to use the same techniques we use for safe browsing?

That's an interesting possibility, but we've been trying to reduce risk by keeping code relatively self-contained within new tab modules. I could definitely see the code refactored to make use of safe browsing for blacklisting as well as potentially positive matches for triggering a suggested tile. This would definitely be more involved as there would need to be coordination of multiple services on both server and client pieces.

Another tricky aspect is the longer term plans that don't necessarily involve matching at a site level. For example, we might want to trigger independently from sites on search keywords or page titles or path segments. Potentially we could augment safe browsing to allow for that, but that increases risk for other users of safe browsing, and it would be faster to stick with the current delivery mechanism of tiles data.

Eric Rescorla

May 2, 2015, 4:45:45 PM

to Ed Lee, dev. planning

I was only considering a safe browsing-style mechanism for blacklisting.

Note: I'm not suggesting actually using the safe browsing list per se, just a hash-block mechanism like the one used by safe browsing.

-Ekr
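A hash-prefix blocklist along those lines might be sketched as follows in Python. This is an illustration under assumptions, not the safe browsing implementation: the function names are made up, matching is on bare hostnames, and a prefix hit is treated as "blocked" without the full-hash confirmation step that safe browsing performs.

```python
import hashlib

# Assumption: 4-byte (32-bit) prefixes of a SHA-256 hash.
PREFIX_BYTES = 4


def hash_prefix(host, length=PREFIX_BYTES):
    """Return the first `length` bytes of SHA-256 over the lowercased host."""
    return hashlib.sha256(host.lower().encode("utf-8")).digest()[:length]


def build_prefix_set(blocked_hosts, length=PREFIX_BYTES):
    """Server side: reduce the clear-text list to a set of hash prefixes."""
    return {hash_prefix(host, length) for host in blocked_hosts}


def is_possibly_blocked(host, prefix_set, length=PREFIX_BYTES):
    """Client side: a prefix hit means 'maybe blocked'.

    Unrelated hosts can share a prefix, so a hit is not proof that the
    host was on the original list.
    """
    return hash_prefix(host, length) in prefix_set
```

Only the prefixes would ship with the client, so no readable list of sites sits on the user's machine, and shortening the prefix increases collisions, which behaves like the intentional false positives discussed earlier in the thread.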