Intent-to-prototype: URL Query String Stripping

Tim Huang

Oct 11, 2021, 2:01:24 AM
to dev-pl...@mozilla.org

Summary

Navigational tracking is a common technique for tracking individual users by passing information alongside cross-site navigations. The query string is one of these tracking surfaces: a tracker can append a tracking identifier to the query string, and a tracking script on the destination page can then recognize the user from that identifier.


To combat this, the Anti-Tracking team is building a prototype for URL query string stripping. This prototype would provide an infrastructure which allows Firefox to strip tracking query strings from the URL on top-level navigation, based on a blocklist.


A real example: Facebook appends a query parameter, “fbclid” (the Facebook Click Id), to every outbound link from facebook.com, and its value is unique for each user. So, if a user visiting facebook.com clicks a link to “example.com”, Facebook changes the link to “example.com?fbclid=ABC”. The Facebook tracking script embedded on example.com can then read “fbclid” from the query string and use it to track the user in much the same way as third-party cookie tracking.


The URLQueryStringStripper module will be responsible for taking query strings and returning the stripped query strings. The stripping will be applied to top-level navigations, including:

  • Opening a new tab.

  • Navigation by clicking a link.

  • window.open().

  • Script navigation.

  • Redirects.


To avoid massive web breakage, we will follow certain rules when doing the stripping.

  • Query stripping only applies to top-level navigations.

  • We don’t strip query strings for same-site navigations.
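
The two rules above can be sketched roughly as follows. This is only an illustration in Python, not the URLQueryStringStripper implementation: the function names, the one-entry strip list, and the "last two labels" notion of a site are all assumptions made for the example (a real same-site check would need the Public Suffix List), and whether the real matching is case-sensitive is not stated in this thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical strip list; the initial list has not been confirmed yet.
STRIP_LIST = {"fbclid"}


def naive_site(host: str) -> str:
    """Approximate a site by keeping the last two host labels.

    Assumption for illustration only: this is wrong for suffixes such as
    co.uk, where the Public Suffix List would be needed.
    """
    return ".".join(host.lower().split(".")[-2:])


def strip_query(source_url: str, target_url: str, is_top_level: bool = True) -> str:
    """Return target_url with listed parameters removed if the rules allow it."""
    # Rule 1: only top-level navigations are touched.
    if not is_top_level:
        return target_url
    source, target = urlsplit(source_url), urlsplit(target_url)
    # Rule 2: same-site navigations are left alone.
    if naive_site(source.hostname or "") == naive_site(target.hostname or ""):
        return target_url
    pairs = parse_qsl(target.query, keep_blank_values=True)
    kept = [(name, value) for name, value in pairs if name not in STRIP_LIST]
    if len(kept) == len(pairs):
        return target_url  # nothing to strip, keep the URL byte-for-byte
    # Note: urlencode re-encodes the surviving pairs, which may normalise escaping.
    return urlunsplit(target._replace(query=urlencode(kept)))
```

With these assumptions, a click from https://www.facebook.com/ to https://example.com/?fbclid=ABC&page=2 yields https://example.com/?page=2, while a navigation to https://m.facebook.com/?fbclid=ABC is returned unchanged because it is treated as same-site.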


To stay in control of breakage and web ecosystem impact, we use a list-based approach for specifying the names of the parameters to strip. The list will be served by a pref value and/or Remote Settings.


The prototype landed in Nightly 91 and is disabled by default (prefed off) while we work on confirming an initial list to ship to our Nightly users. People who want to try it out can flip the pref ‘privacy.query_stripping.enabled’ to enable it and add the query parameter names to the pref ‘privacy.query_stripping.strip_list’. Note that the strip list uses a space as its delimiter.
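
To make the space-delimited format concrete, here is a small Python sketch of how the two prefs might be read together. The dictionary standing in for the pref store, the function name, and the second parameter name ("example_id") are hypothetical; only the two pref names and the space delimiter come from the description above.

```python
def active_strip_names(prefs: dict) -> set:
    """Return the parameter names to strip, or an empty set if the feature is off."""
    if not prefs.get("privacy.query_stripping.enabled", False):
        return set()
    raw = prefs.get("privacy.query_stripping.strip_list", "")
    # str.split() with no argument splits on runs of whitespace and
    # drops empty entries, so stray double spaces are harmless.
    return set(raw.split())


# Hypothetical pref values, as one might set them in about:config.
example_prefs = {
    "privacy.query_stripping.enabled": True,
    "privacy.query_stripping.strip_list": "fbclid  example_id",
}
```

Here active_strip_names(example_prefs) gives the two names "fbclid" and "example_id"; with the enabled pref set to False it gives an empty set.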

Standard

None

Platform coverage

Desktop

Preference

privacy.query_stripping.enabled

privacy.query_stripping.strip_list

DevTools bug

N/A

Other browsers

Brave has built a Query String Filter.

Chrome and Safari haven’t implemented this yet.

Web-platform-tests

N/A



--
Tim Huang
Mozilla


Daniel Veditz

Oct 11, 2021, 5:39:37 PM
to Tim Huang, dev-pl...@mozilla.org
This is great for privacy!

I love the way Brave has clearly documented which items they block, and even link to issues so people can try to understand the reasons. That's a bit harder to do with a single-line pref string, and nearly impossible to make visible if it's remote-settings unless we have an explicit documentation page. The more tracking parameters we end up blocking (Brave currently has two dozen) the more likely this could lead to mysterious failures because unrelated sites just happened to use the same abbreviation for something else.

A contrived example, but imagine someone has a fantasy soccer league site. Players could be very frustrated if they're unable to link to their profile because we keep stripping the FootBall CLub ID param. A bit of documentation could help sites avoid using those IDs.
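
The collision can be shown with a few lines of Python. This is a sketch only: the league site URL and the helper name are made up, and the filter is a bare name match rather than Firefox's actual logic. It shows that a name-based filter cannot tell the tracker's parameter from the league's.

```python
from urllib.parse import parse_qsl, urlsplit

STRIP_LIST = {"fbclid"}  # the parameter name used as the example in this thread


def surviving_params(url: str) -> dict:
    """Return the query parameters that a purely name-based filter would keep."""
    query = urlsplit(url).query
    return {name: value for name, value in parse_qsl(query) if name not in STRIP_LIST}


# A tracker-decorated link and a hypothetical fantasy league profile link.
tracker_link = "https://example.com/article?fbclid=ABC"
league_link = "https://league.example/profile?fbclid=42&season=2021"
```

surviving_params(league_link) keeps only the season parameter: the club ID is dropped exactly as the tracking ID is, because the filter sees nothing but the name.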

-Dan Veditz

Chris H-C

Oct 12, 2021, 11:42:02 AM
to Daniel Veditz, Tim Huang, dev-pl...@mozilla.org
Might this (or a future not-quite-list-based version) have an effect on attributed downloads? I know we use attribution in product downloads to determine the efficacy of marketing campaigns (as a non-exhaustive example).

:chutten

--
You received this message because you are subscribed to the Google Groups "dev-pl...@mozilla.org" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dev-platform...@mozilla.org.
To view this discussion on the web visit https://groups.google.com/a/mozilla.org/d/msgid/dev-platform/CADYDTCB8gh7GjVpx9pMLD5pqpDjc%3DgeqkyRJzvcK2KZpwNFSgg%40mail.gmail.com.

Daniel Veditz

Oct 12, 2021, 12:03:55 PM
to Chris H-C, Tim Huang, dev-pl...@mozilla.org
On Tue, Oct 12, 2021 at 8:42 AM Chris H-C <chu...@mozilla.com> wrote:
Might this (or a future not-quite-list-based version) have an effect on attributed downloads? I know we use attribution in product downloads to determine the efficacy of marketing campaigns (as a non-exhaustive example).

Are you tracking users, or campaigns?

Tim did say same-site navigation wouldn't be affected (any *.mozilla.org to another *.mozilla.org, for example) -- do you track cross-site? Are you likely to use fbclid= or other known tracker to do so? The intent is to stop broad cross-web re-identification, an end-run around cookie tracking protection. Firefox downloads or similar specific use-cases aren't going to end up on that radar.

For a rough idea of scope, Brave's list (which can often get away with breaking more sites than we can) can be found at https://github.com/brave/brave-core/blob/master/browser/net/brave_site_hacks_network_delegate_helper.cc#L31-L58

-Dan Veditz

Johann Hofmann

Oct 12, 2021, 12:51:11 PM
to Daniel Veditz, Tim Huang, dev-pl...@mozilla.org
Thank you Dan, I think that's a good suggestion. We're currently working on updating our anti-tracking policy (https://wiki.mozilla.org/Security/Anti_tracking_policy) to cover this specific protection and outline which identifiers qualify for blocklisting. From there, I think it's a good idea to maintain a publicly visible list of stripped identifiers, though I doubt that any non-expert users can utilize it to fix breakage (Brave's list is also simply embedded in its source code).

Johann

Johann Hofmann

Oct 12, 2021, 1:00:32 PM
to Daniel Veditz, Chris H-C, Tim Huang, dev-pl...@mozilla.org
(Re: Are we breaking attributed downloads?)

I'd like to learn more about how attribution for downloads works (and I can reach out to you separately and update this thread) but I doubt that it utilizes a high-entropy token to join user identities across sites, as will be disallowed in our anti-tracking policy.

(Re: Future iterations of this protection)

There is a lot more work to be done until we can move towards a not-quite-list-based version, though I think that this is a desirable end goal. We'll have to figure out how to identify tracking identifiers and, at the same time, allow for the various legit uses of URL parameters that look very much like navigational tracking. The Privacy CG is currently discussing this challenge as part of a standardization effort on navigational tracking mitigations in https://github.com/privacycg/nav-tracking-mitigations. We hope to contribute to and benefit from this effort over time.

Chris H-C

Oct 12, 2021, 4:37:27 PM
to Johann Hofmann, Daniel Veditz, Tim Huang, dev-pl...@mozilla.org
To dveditz's points:

> Are you tracking users, or campaigns?

Me personally? Neither : ) But to my knowledge from helping with the implementation of attribution collection in Firefox Telemetry the answer is campaigns (and other things on that order)[1]. Basically `utm_` params.

> do you track cross-site?

Again, not me personally : ) But we may have partners driving us traffic, and we may host the installer on non-first-party eTLD+1s for, I dunno, CDN reasons? (not sure we do for Desktop, but we don't host our own packages for Android and iOS as you'd imagine. App Stores. Ick.).

> Are you likely to use fbclid= or other known tracker to do so?

I doubt that very much. To my knowledge we're interested in campaign/experiment/branch-level efficacy measurement, not user-level tracking.

Ultimately, I'm bringing this topic of "will this hurt marketing attribution" up on behalf of folks not on dev-platform who might have their marketing spend analyses thrown were this to go awry. But it sounds like their analyses are safe.

To jhofmann:

> I'd like to learn more about how attribution for downloads works (and I can reach out to you separately and update this thread) but I doubt that it utilizes a high-entropy token to join user identities across sites, as will be disallowed in our anti-tracking policy.

All data collection in mozilla projects is documented in public as required by Data Collection Review[2] so the info should all be "available" (still working on making it discoverable). Firefox Telemetry's (for Firefox Desktop) is the `attribution` section of the Environment[1] and is basically `utm_` params that get saved to disk on install and reported to Telemetry on run. Might be platform-specific wrinkles in here, definitely talk to the Desktop Integrations team about further technical detail. For mobile we probably have to mention Adjust and I'm the wrong person to talk to about that (and sadly I'm not sure who _is_ the right person to talk to about that).

:chutten

Daniel Veditz

Oct 12, 2021, 10:00:24 PM
to Johann Hofmann, Tim Huang, dev-pl...@mozilla.org
On Tue, Oct 12, 2021 at 9:51 AM Johann Hofmann <jhof...@mozilla.com> wrote:
a publicly visible list of stripped identifiers, though I doubt that any non-expert users can utilize it to fix breakage

Agreed. Breakage would need to be patched in the website that's using the parameter with a colliding name and that should be expert enough.

(Brave's list is also simply embedded in its source code).

True. What I liked about it is that
1. it's linked from a readable explainer on their wiki (we could use a SUMO KB article)
2. the wiki (or KB article) can't be out of date because the code is truth
3. Its multi-line source format supports comments, which link to more detail about the specific params

We could certainly do the equivalent of all of the above regardless of what delivery format we use. Some are more prone to getting out of date than others.
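
As one possible shape for that, a commented multi-line list could be kept as the source of truth and flattened into the single-line pref value. The sketch below is in Python and the file format (one name per line, "#" starting a comment) is invented here for illustration; it is not an existing Firefox or Brave format.

```python
# Hypothetical annotated list: one parameter per line, '#' starts a comment.
ANNOTATED_LIST = """
# Parameters stripped on cross-site top-level navigations.
fbclid   # Facebook Click Id, unique per user (see the start of this thread)
"""


def to_pref_value(annotated: str) -> str:
    """Flatten the annotated list into a space-delimited pref string."""
    names = []
    for line in annotated.splitlines():
        name = line.split("#", 1)[0].strip()  # drop the comment, if any
        if name:
            names.append(name)
    return " ".join(names)
```

Because the pref string is generated from the annotated list, the human-readable comments and the shipped value cannot drift apart.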

-Dan Veditz

Nick Alexander

Oct 13, 2021, 12:59:53 PM
to Chris H-C, Johann Hofmann, Daniel Veditz, Tim Huang, dev-pl...@mozilla.org
Hi folks,

On Tue, Oct 12, 2021 at 1:37 PM Chris H-C <chu...@mozilla.com> wrote:
To dveditz's points:

> Are you tracking users, or campaigns?

Me personally? Neither : ) But to my knowledge from helping with the implementation of attribution collection in Firefox Telemetry the answer is campaigns (and other things on that order)[1]. Basically `utm_` params.

> do you track cross-site?

Again, not me personally : ) But we may have partners driving us traffic, and we may host the installer on non-first-party eTLD+1s for, I dunno, CDN reasons? (not sure we do for Desktop, but we don't host our own packages for Android and iOS as you'd imagine. App Stores. Ick.).

> Are you likely to use fbclid= or other known tracker to do so?

I doubt that very much. To my knowledge we're interested in campaign/experiment/branch-level efficacy measurement, not user-level tracking.

Many thanks to :chutten for raising this, 'cuz otherwise it would have been me.  The attribution that we're talking about is not user-level, but we do use "well known" identifiers: from https://searchfox.org/mozilla-central/rev/b822a27de3947d3f4898defac6164e52caf1451b/browser/components/attribution/AttributionCode.jsm#45-54, I see:

"source",
"medium",
"campaign",
"content",
"experiment",
"variation",
"ua",
"dltoken",

These are terms that Firefox's client-side attribution code recognizes. It's possible that mozilla.org (bedrock) uses and/or recognizes more. I believe that the first several of those are "industry standard" terms and, while not user-level generally (or at this time), might be blocked were we and others to block more aggressively. The "dltoken" is a per-download identifier.
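
A quick way to watch for this would be to intersect any candidate strip list with the keys above. The Python sketch below assumes, purely for illustration, that these keys would appear verbatim as query parameter names, which this thread does not confirm; the function name and the candidate lists are hypothetical.

```python
# Keys recognized by Firefox's client-side attribution code, as listed above.
ATTRIBUTION_KEYS = {
    "source", "medium", "campaign", "content",
    "experiment", "variation", "ua", "dltoken",
}


def colliding_keys(strip_list: str) -> set:
    """Return attribution keys that a space-delimited strip list would remove."""
    return ATTRIBUTION_KEYS & set(strip_list.split())
```

A list containing only "fbclid" collides with nothing, whereas a hypothetical, more aggressive list such as "fbclid source campaign" would report "source" and "campaign".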

Nick

xintrea

Jun 29, 2022, 2:33:26 AM
to dev-pl...@mozilla.org, nalex...@mozilla.com, Johann Hofmann, dve...@mozilla.com, tih...@mozilla.com, dev-pl...@mozilla.org, Chris H-C
This is a very strange decision. It will be necessary to create a web standard with a list of "forbidden" parameters, so that other projects that happen to use the same parameter names do not suffer.

This update may push mega-corporations to generate parameter names dynamically. Visually they will look like a random string of characters, but inside they will carry an encrypted name with a random initialization vector and a checksum for telling friend from foe.

?vdj1967enxb52p99kiGFskdj785hFyu=kjQGj90sac17E6AJjk8afzmScA

Will you then forbid "unreadable" parameter names? Then they will start composing random names from readable words. Will you limit the length of parameter names next? That way you will destroy the entire Internet.

On Wednesday, October 13, 2021 at 7:59:53 PM UTC+3, nalex...@mozilla.com wrote:

Daniel Serodio

Aug 8, 2022, 1:57:50 PM
to dev-pl...@mozilla.org, dev-pl...@mozilla.org
Is this list documented somewhere? I searched for "privacy.query_stripping site:mozilla.org" and found only developer-centric links (Bugzilla issues, mailing list posts, commits, etc.)

Thanks,
Daniel Serodio

Paul Zühlcke

Aug 8, 2022, 2:14:58 PM
to Daniel Serodio, Tim Huang, dev-pl...@mozilla.org
Hi Daniel!

@Tim Huang did we file bugs for the individual query params we strip?

Best Regards,
Paul

Tim Huang

Aug 9, 2022, 8:30:43 AM
to Paul Zühlcke, Daniel Serodio, dev-pl...@mozilla.org
Yes, we have filed a meta bug that lists all of the query parameters stripped in the release channel when ETP Strict is enabled.

Here is the meta bug.