Intent to Experiment: Speculation Rules (Prefetch)


Jeremy Roman

May 25, 2021, 1:58:55 PM
to blink-dev, Kenji Baheux

Contact emails

jbr...@chromium.org, kenji...@chromium.org


Explainer

https://github.com/jeremyroman/alternate-loading-modes/blob/main/triggers.md


Specification

https://jeremyroman.github.io/alternate-loading-modes/#speculation-rules


Summary

Speculation Rules is a flexible syntax for defining what outgoing links are eligible to be prepared speculatively before navigation. It enables access to additional enhancements, such as use of a private prefetch proxy, where applicable.


Participants in this trial can use this syntax to request prefetching of links they expect the user is likely to visit next.


This experiment is for a limited subset of the eventual behavior we envision for speculation rules, to test if they are useful for partners. The limitations are:

  • We only process rules being added; removal of rule sets is presently ignored.

  • We only accept "prefetch_with_subresources" rules. (We envision supporting other actions in the future.)

  • We only accept list rules. (Eventually we envision also supporting document rules based on selectors and/or URL patterns.)

  • We only accept same-origin URLs and do not follow redirects, except in the case of "anonymous-client-ip-when-cross-origin" noted below.

  • For the purpose of experimentation, rules which require "anonymous-client-ip-when-cross-origin" are accepted only from certain allow-listed Google origins (which are allowed to use Google's private prefetch proxy), and used only when it is enabled in Chrome. (We envision making this feature available to all origins in the future. See below for more details.)
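
As a rough illustration of the limitations listed above, the sketch below builds a rule set containing a single list rule for the one accepted action, and rejects cross-origin URLs unless the rule requires "anonymous-client-ip-when-cross-origin". The JSON key names ("source", "urls", "requires") are taken on the assumption that they match the linked explainer, and the helper names (build_rule_set, origin) are hypothetical. The origin comparison is deliberately simplified and is not how Chrome itself validates rules.

```python
import json
from urllib.parse import urljoin, urlsplit

# Requirement name quoted from the limitations above.
ANON_IP = "anonymous-client-ip-when-cross-origin"


def origin(url: str) -> tuple:
    """Return (scheme, host[:port]) as a simplified stand-in for a URL's origin.

    Simplification: default ports are not normalized, so "example.com" and
    "example.com:443" compare as different.
    """
    parts = urlsplit(url)
    return (parts.scheme, parts.netloc)


def build_rule_set(page_url: str, urls: list, anonymize: bool = False) -> str:
    """Serialize one list rule for the only action accepted in this trial."""
    resolved = [urljoin(page_url, u) for u in urls]
    if not anonymize:
        # Without the anonymization requirement, only same-origin URLs qualify.
        for u in resolved:
            if origin(u) != origin(page_url):
                raise ValueError(f"cross-origin URL needs {ANON_IP!r}: {u}")
    rule = {"source": "list", "urls": resolved}
    if anonymize:
        rule["requires"] = [ANON_IP]
    return json.dumps({"prefetch_with_subresources": [rule]}, indent=2)


if __name__ == "__main__":
    print(build_rule_set("https://example.com/a", ["/b", "/c"]))
```

The resulting JSON would be placed in the page inside a <script type="speculationrules"> element.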


We intend to provide more detailed documentation for both authors who would like to use this feature and site operators who would like to control the behavior of the private prefetch proxy.


We would like to run this experiment from M92 to M94 (inclusive).


In the short term, for reasons explained in the next paragraph, a single partner (a Google property) would be granted experimental access beginning in M91 (through the experiment period mentioned above, enabled retroactively using Finch) in order to experiment with the "anonymous-client-ip-when-cross-origin" option. We expect to gather some earlier feedback about the API and data about the potential benefits to the user experience, which will help give us better direction as we continue to develop this feature.


Because these navigations originate on a Google property, the Google private prefetch proxy, which Chrome uses to anonymize the client IP address, only serves encrypted requests which are already known to Google. Assuming that this experiment confirms that the user experience is greatly improved, we'd like to explore making this capability available more broadly. As such, we would like to proactively discuss with the community a proposal for a trust and opt-in model which would allow more origins to leverage IP anonymization for prefetch traffic. If that sounds interesting to you, please voice your interest and share your thoughts or questions on the proposal.


Blink component

Internals>Preload


TAG review

https://github.com/w3ctag/design-reviews/issues/611


TAG review status

Pending


Risks



Interoperability and Compatibility


Gecko: No signal


WebKit: No signal


Web developers: Past success with <link rel=prefetch> and libraries like QuickLink, and discussion with some partners suggests interest in this space.



Goals for experimentation

To gather feedback about the convenience of the Speculation Rules syntax, and to gather data about performance improvements for navigations that are prefetched, directly and via a private prefetch proxy (subject to the limitations mentioned above).


Ongoing technical constraints

No significant technical constraints anticipated.


Will this feature be supported on all six Blink platforms (Windows, Mac, Linux, Chrome OS, Android, and Android WebView)?

Chrome for Android (non-WebView) only, at present.

Eventually other platforms will be supported.


Is this feature fully tested by web-platform-tests?

Not yet, but we have plans to.


Flag name

The origin trial feature name will be SpeculationRulesPrefetch.

(This will require a cherry-pick to rename the origin trial feature in M91, which has already branched from main.)


Tracking bug

https://bugs.chromium.org/p/chromium/issues/detail?id=1173646


Link to entry on the Chrome Platform Status

https://chromestatus.com/feature/5740655424831488


Links to previous Intent discussions

Intent to prototype: https://groups.google.com/a/chromium.org/g/blink-dev/c/1q7Fp3zpjgQ



This intent message was generated by Chrome Platform Status.

Tom W

May 27, 2021, 3:10:40 PM
to blink-dev, Jeremy Roman, Kenji Baheux

Jeremy, thanks for the heads-up. I've a few questions:

  1. Before the experiment is started, is it possible to address the concerns already raised by the community at https://github.com/buettner/private-prefetch-proxy/issues/7? Providing a side-channel to Google (and only Google) that enables it to learn users’ browsing history even though the user has explicitly opted out of sync seems undesirable. Is it possible to avoid building this side-channel to be consistent with Chrome’s plans to actively clamp down on the use of similar APIs by non-Google websites via privacy sandbox techniques?

  2. Are there any bounds on how many links would be prefetched by the Google properties? I’m concerned that there are no incentives for Google properties to minimize the overhead of prefetching. Normally, that would not be a problem but in this case all the costs for prefetching are paid by the web developer and none by the initiator (aka Google properties). These costs include:

    1. Paying for extra bandwidth and computation costs. It’s incorrect to assume that by default all web developers are willing to pay higher infrastructure costs.

    2. Exposing the web server to significantly more abuse traffic that they have no visibility into due to misleading source IP addresses.

    3. Risk of serving wrongly geolocated content to users and annoying them. This includes serving personalized content based on the user’s location (e.g., which country and province the user is located in).

    4. Serving content in the wrong language because of the misleading IP address, which may appear to come from a different country or province. This is not a niche case. E.g., even google.com serves search results in different languages based on the user's location (and not the accept-language header), and again the languages can vary at the province level.

    5. Incorrectly serving other geolocated content to users which may not be otherwise available to them due to licensing restrictions.  


As part of this experiment, would it be possible to measure how many sites would be negatively affected by this? 

Roberto Clapis

Jun 1, 2021, 10:25:00 AM
to blink-dev, privac...@gmail.com, Jeremy Roman, Kenji Baheux

In the detailed explanation there is a passage that mentions using the user's past activity to speculate on which resources to pre-cache.

I think this will introduce a new side channel that would leak information on user history.

If this is implemented already, we should disable it or at least implement mitigations for it; if it is not, we'd have to find a way to protect the user history from being probed this way.

Jeremy Roman

Jun 1, 2021, 10:49:23 AM
to Roberto Clapis, blink-dev, privac...@gmail.com, Kenji Baheux
On Tue, Jun 1, 2021 at 10:24 AM Roberto Clapis <cl...@google.com> wrote:

> In the detailed explanation there is a passage that mentions using the user's past activity to speculate on which resources to pre-cache.
>
> I think this will introduce a new side channel that would leak information on user history.
>
> If this is implemented already, we should disable it or at least implement mitigations for it; if it is not, we'd have to find a way to protect the user history from being probed this way.

No such heuristic is currently implemented, and that text is there by way of example. At present, requested actions are executed more or less unconditionally (some resource limits may apply).

We're aware of the need to protect user privacy with any such heuristic. For example, the part you quoted specifically mentions activity on the same page which is already available to the origin (e.g., outbound link navigations the origin could have recorded at the time they occurred). Even so, we would conduct a more detailed assessment of the privacy impact before implementing and shipping any such heuristic.

I don't follow your final point: given it is not implemented, how can it be used to probe user history?

Michael Buettner

Jun 2, 2021, 10:25:22 AM
to blink-dev, privac...@gmail.com, Jeremy Roman, Kenji Baheux

Sorry for the slow reply! Most of the team was out Fri - Mon (the latter being a US holiday).


On Thursday, May 27, 2021 at 12:10:40 PM UTC-7 privac...@gmail.com wrote:

> Jeremy, thanks for the heads-up. I've a few questions:
>
>   1. Before the experiment is started, is it possible to address the concerns already raised by the community at https://github.com/buettner/private-prefetch-proxy/issues/7? Providing a side-channel to Google (and only Google) that enables it to learn users’ browsing history even though the user has explicitly opted out of sync seems undesirable. Is it possible to avoid building this side-channel to be consistent with Chrome’s plans to actively clamp down on the use of similar APIs by non-Google websites via privacy sandbox techniques?

Yes, we’ll add an update on that thread with details for the experiment. Note that we do not join proxy logs with other data linked to your Google account.  But before a full launch, we will address the issue in more depth. 

In case it’s not clear, the concern is that doing an uncredentialed prefetch when the user has a cookie (and the response can’t be used) consumes user and publisher network bandwidth but does not provide any performance benefit to the user. From our experiment, we’ll better understand the trade-off between performance improvement and network cost, which will help us design a solution to mitigate the potential side-channel.  E.g., if the added overhead is small, we can just always make the extra prefetches. If that would negatively impact user experience, we’ll need to design something more clever -- e.g., consider if the user has Sync enabled.

>   2. Are there any bounds on how many links would be prefetched by the Google properties? I’m concerned that there are no incentives for Google properties to minimize the overhead of prefetching. Normally, that would not be a problem but in this case all the costs for prefetching are paid by the web developer and none by the initiator (aka Google properties). These costs include:

Yes, there are bounds on how many links will be prefetched by Chrome. The incentive for Chrome is to protect Chrome users and publishers from large bandwidth increases, and more generally wasting bytes on prefetches that won't improve performance. 

We expect to learn a lot more about these trade-offs during the course of our experiment.

>     1. Paying for extra bandwidth and computation costs. It’s incorrect to assume that by default all web developers are willing to pay higher infrastructure costs.
>
>     2. Exposing the web server to significantly more abuse traffic that they have no visibility into due to misleading source IP addresses.

Note that the proxy IP addresses map back to a Google domain via reverse DNS, and publishers can control traffic to their site by adding a traffic-advice file.
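
To make the traffic-advice opt-out mentioned above more concrete, here is a minimal sketch of how a client might interpret such a file. It assumes the format from the separate traffic-advice proposal: a JSON array of entries with "user_agent" and "disallow" fields, served from /.well-known/traffic-advice, with the proxy identified as "prefetch-proxy". None of those details are stated in this thread. The function name and the rule that a specific entry overrides a "*" entry are likewise illustrative assumptions.

```python
import json


def proxy_prefetch_allowed(traffic_advice_body: str, agent: str = "prefetch-proxy") -> bool:
    """Return False if the advice disallows `agent`, directly or via a "*" entry."""
    try:
        entries = json.loads(traffic_advice_body)
    except json.JSONDecodeError:
        return True  # assumption: unparseable advice is treated as if absent
    if not isinstance(entries, list):
        return True  # assumption: anything but an array is ignored
    allowed = True
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        user_agent = entry.get("user_agent")
        if user_agent == agent:
            # An entry naming the agent takes precedence over any wildcard.
            return not entry.get("disallow", False)
        if user_agent == "*":
            allowed = not entry.get("disallow", False)
    return allowed


if __name__ == "__main__":
    advice = '[{"user_agent": "prefetch-proxy", "disallow": true}]'
    print(proxy_prefetch_allowed(advice))
```

A publisher who wanted to opt out would, under these assumptions, serve a single entry with "disallow" set to true for the proxy's agent name.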

>     3. Risk of serving wrongly geolocated content to users and annoying them. This includes serving personalized content based on the user’s location (e.g., which country and province the user is located in).
>
>     4. Serving content in the wrong language because of the misleading IP address, which may appear to come from a different country or province. This is not a niche case. E.g., even google.com serves search results in different languages based on the user's location (and not the accept-language header), and again the languages can vary at the province level.
>
>     5. Incorrectly serving other geolocated content to users which may not be otherwise available to them due to licensing restrictions.

The proxy attempts to fetch from an IP as close to the user as possible. There may be sites that will experience geolocation issues on a small fraction of their page loads during the experiment, but if the user refreshes the page the site will load using the user’s IP. We will be tracking refresh rates and other signals closely during our experiment to detect these sites. With respect to geo-restricted content, note that images and video are not eligible for prefetching. 


> As part of this experiment, would it be possible to measure how many sites would be negatively affected by this?

We will track refresh rates and other metrics to understand the impact on both users and web developers. We would greatly appreciate feedback from publishers that think they may be impacted.
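
One way to picture the refresh-rate signal described above is to compare, per site, how often users refresh after a proxied prefetch with how often they refresh after an ordinary load. The sketch below is purely illustrative: the record shape, the function name flag_sites, and both thresholds are assumptions, and it does not describe the metrics pipeline actually used in the experiment.

```python
from collections import defaultdict


def flag_sites(page_loads, min_loads=100, max_excess=0.05):
    """Flag sites whose refresh rate is notably higher after proxied prefetches.

    page_loads: iterable of (site, was_proxy_prefetched, was_refreshed) tuples.
    Returns a sorted list of (site, excess_refresh_rate) pairs.
    """
    # Per site, per group (prefetched or not): [load count, refresh count].
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for site, prefetched, refreshed in page_loads:
        bucket = counts[site][bool(prefetched)]
        bucket[0] += 1
        bucket[1] += int(bool(refreshed))

    flagged = []
    for site, groups in counts.items():
        (p_loads, p_refreshes), (b_loads, b_refreshes) = groups[True], groups[False]
        if p_loads < min_loads or b_loads < min_loads:
            continue  # too little data in one group to compare
        excess = p_refreshes / p_loads - b_refreshes / b_loads
        if excess > max_excess:
            flagged.append((site, round(excess, 3)))
    return sorted(flagged)
```

A site that appears in the result would be a candidate for closer inspection, for example for geolocation problems. A raised refresh rate alone would not show what went wrong.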

Rick Byers

Jun 2, 2021, 11:29:42 AM
to Michael Buettner, blink-dev, privac...@gmail.com, Jeremy Roman, Kenji Baheux
On Wed, Jun 2, 2021 at 10:25 AM 'Michael Buettner' via blink-dev <blin...@chromium.org> wrote:

> Sorry for the slow reply! Most of the team was out Fri - Mon (the latter being a US holiday).

> On Thursday, May 27, 2021 at 12:10:40 PM UTC-7 privac...@gmail.com wrote:

> > Jeremy, thanks for the heads-up. I've a few questions:
> >
> >   1. Before the experiment is started, is it possible to address the concerns already raised by the community at https://github.com/buettner/private-prefetch-proxy/issues/7? Providing a side-channel to Google (and only Google) that enables it to learn users’ browsing history even though the user has explicitly opted out of sync seems undesirable. Is it possible to avoid building this side-channel to be consistent with Chrome’s plans to actively clamp down on the use of similar APIs by non-Google websites via privacy sandbox techniques?

> Yes, we’ll add an update on that thread with details for the experiment. Note that we do not join proxy logs with other data linked to your Google account. But before a full launch, we will address the issue in more depth.

> In case it’s not clear, the concern is that doing an uncredentialed prefetch when the user has a cookie (and the response can’t be used) consumes user and publisher network bandwidth but does not provide any performance benefit to the user. From our experiment, we’ll better understand the trade-off between performance improvement and network cost, which will help us design a solution to mitigate the potential side-channel. E.g., if the added overhead is small, we can just always make the extra prefetches. If that would negatively impact user experience, we’ll need to design something more clever -- e.g., consider if the user has Sync enabled.

> >   2. Are there any bounds on how many links would be prefetched by the Google properties? I’m concerned that there are no incentives for Google properties to minimize the overhead of prefetching. Normally, that would not be a problem but in this case all the costs for prefetching are paid by the web developer and none by the initiator (aka Google properties). These costs include:

> Yes, there are bounds on how many links will be prefetched by Chrome. The incentive for Chrome is to protect Chrome users and publishers from large bandwidth increases, and more generally wasting bytes on prefetches that won't improve performance.

One additional point worth mentioning, I think, is that even if Chrome over-optimized just for the user experience (and neglected publisher experience - not our intention of course), there would still be significant disincentive to prefetch too much. First, there is at least some contention for network and hardware resources on the majority of the world's computing devices, which are lower-end phones. Fetching something that's ultimately unused always has some negative impact on user experience, and I expect to be able to quantify that with the same metrics that we're using to quantify the user benefit (e.g., CWV).

Secondly, we're well aware that if we fail to make this net-positive for publishers, publishers will opt out en masse. We'll be looking at opt-out rates as a signal into how well we're doing at making this a win for publishers. From our other work on improving loading speed (e.g., BFCache) we know that when pages load faster, users spend more time on the web and complete more transactions. So I'm personally quite optimistic that there is significant publisher upside to this work if we can just balance and tune the tradeoffs properly to find the sweet spot (hence this experiment, and many future ones I'm sure).
--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to blink-dev+...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/c146b31c-05c6-4bb4-92e8-59a48b3520e4n%40chromium.org.

Tom W

Jun 3, 2021, 11:49:34 AM
to blink-dev, rby...@chromium.org, blink-dev, Tom W, Jeremy Roman, Kenji Baheux, Michael Buettner
> if the added overhead is small, we can just always make the extra prefetches. If that would negatively impact user experience, we’ll need to design something more clever -- e.g., consider if the user has Sync enabled.

This is a very reasonable approach. Thank you!

> The incentive for Chrome is to protect Chrome users and publishers from large bandwidth increases, and more generally wasting bytes on prefetches that won't improve performance. 

I think they are different problems with somewhat different solutions. For example, prefetching aggressively on WiFi connections may not incur any data usage for Chrome, but the publisher would still need to pay for all the data costs. In most cases (unless there is a UX component), users would likely even blame higher data costs on the websites they visit rather than on Chrome itself.

> Secondly, we're well aware that if we fail to make this net-positive for publishers, publishers will opt out en masse. We'll be looking at opt-out rates as a signal into how well we're doing at making this a win for publishers.

Thanks. To be clear, I do not doubt the benefit of this feature for the referrers and some publishers. What's unclear is why developers are opted in by default when, as you indicate, developers are proactive enough to make the right choices.

I hope that Blink/Chrome keeps developer interests in mind while designing this and does not make unvalidated default assumptions on behalf of web developers. Different developers have different goals, and there is no single default solution that fits everybody's requirements.


Chris Harrelson

Jun 7, 2021, 6:03:19 PM
to Tom W, blink-dev, rby...@chromium.org, Jeremy Roman, Kenji Baheux, Michael Buettner

Roberto Clapis

Jun 21, 2021, 5:39:23 AM
to Jeremy Roman, blink-dev, privac...@gmail.com, Kenji Baheux
If it's not implemented, there is no concern here.

Mathias Bynens

Jun 21, 2021, 7:40:32 AM
to Jeremy Roman, blink-dev, Kenji Baheux
The Intent* email template includes a “Debuggability” section, which is missing in this case. How will web developers be able to debug this new functionality through DevTools? See https://goo.gle/devtools-checklist for context.



Jeremy Roman

Jun 21, 2021, 5:15:19 PM
to Mathias Bynens, blink-dev, Kenji Baheux
On Mon, Jun 21, 2021 at 7:40 AM Mathias Bynens <mt...@google.com> wrote:
> The Intent* email template includes a “Debuggability” section, which is missing in this case. How will web developers be able to debug this new functionality through DevTools? See https://goo.gle/devtools-checklist for context.

I didn't delete it; this was not included in the template generated by Chrome Platform Status (which I'm told has replaced the previous template). If it should be included, perhaps that application needs to be changed?

None of the categories there directly apply. Debuggability support is presently fairly limited (though I think support for previous prefetching features also was). Ideally we would have more detailed support for surfacing and controlling the contemplated and executed prefetching activity, but that was not prioritized for this experiment.

We do sometimes emit warning messages on certain kinds of errors.

Yang Guo

Jun 22, 2021, 3:11:25 AM
to blink-dev, Jeremy Roman, blink-dev, Kenji Baheux, Mathias Bynens
Are these warning messages emitted to the DevTools console? This is an anti-pattern we are trying to limit. If the warning is something developers should act on, we should implement it as a warning in the Issues panel.

Jeremy Roman

Jun 22, 2021, 4:59:30 PM
to Yang Guo, blink-dev, Kenji Baheux, Mathias Bynens
On Tue, Jun 22, 2021 at 3:11 AM Yang Guo <yan...@google.com> wrote:
> Are these warning messages emitted to the DevTools console? This is an anti-pattern we are trying to limit. If the warning is something developers should act on, we should implement it as a warning in the Issues panel.

Yes. At the moment it simply surfaces JSON syntax errors, and was implemented in direct response to feedback.
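
For readers unfamiliar with this kind of diagnostic, the sketch below shows the general shape of surfacing a JSON syntax error in a rule set as a warning instead of failing silently. The function name and message wording are hypothetical and are not Chrome's actual console text. The top-level object check is an extra assumption beyond the syntax errors mentioned above.

```python
import json


def parse_rule_set(text: str):
    """Return (rules, warnings); rules is None when the rule set is unusable."""
    try:
        rules = json.loads(text)
    except json.JSONDecodeError as err:
        # Report where parsing failed so the author can find the mistake.
        message = (
            "Speculation rules ignored: JSON syntax error at "
            f"line {err.lineno}, column {err.colno}: {err.msg}"
        )
        return None, [message]
    if not isinstance(rules, dict):
        # Assumption: a rule set must be a JSON object at the top level.
        return None, ["Speculation rules ignored: top-level value must be an object"]
    return rules, []


if __name__ == "__main__":
    _, warnings = parse_rule_set('{"prefetch_with_subresources": [}')
    for warning in warnings:
        print(warning)
```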

As far as the issues tab goes, it seemed to me like it was intended for exposing warnings arising less directly from an immediate use of the feature (like suboptimal use of an API, or deprecations), but perhaps I'm misunderstanding. Anecdotally I've seldom noticed the issues tab being somewhere I look when my feature actively isn't working, and the process makes it seem like adding support for the Issues Tab would be a multi-week affair (compared to ordinary console warnings and exception messages).

Is there a more lightweight way to determine what belongs in issues vs console, and to add information to the Issues Tab without requiring so many layers of approval and review?

Mathias Bynens

Jun 23, 2021, 3:46:58 AM
to Jeremy Roman, Sigurd Schneider, Yang Guo, blink-dev, Kenji Baheux

Yang Guo

Jun 23, 2021, 3:50:00 AM
to Mathias Bynens, Jeremy Roman, Sigurd Schneider, blink-dev, Kenji Baheux
If the error is thrown in the form of a JavaScript exception, showing it in the console is a good way to surface it (when the exception is not caught).

We already use the Issues tab for deprecations or wrong API usage. You can find a list of issues here. For example, we warn against usage of the deprecated navigator.userAgent, usage of quirks mode, and a large set of CORS, CORP, and CSP related issues.

Admittedly, it is more involved to create an Issue than to output a line to the console. However, the latter is not particularly developer friendly and is often ignored. So far we have had good experience with developers reacting to reported issues.

> Anecdotally I've seldom noticed the issues tab being somewhere I look when my feature actively isn't working

This is a chicken and egg problem and won't improve if new features bypass the Issues tab. Maybe we can streamline the process of adding a new issue though. I know some things moved in this space to make adding new issues easier.

Cheers,

Yang

Sigurd Schneider

Jun 23, 2021, 6:39:06 AM
to Yang Guo, Mathias Bynens, Jeremy Roman, blink-dev, Kenji Baheux
I agree with Yang here.

We are aware that adding issues has a bit too much overhead at the moment. We have a new hire starting in August, and we plan to have her tackle this problem in the fall. So we expect that in Q4 we will have a solution that makes it more lightweight to add issues in many cases (it depends on the particular thing one wants to surface).

If you find yourself being blocked by having to add issues, please feel free to add a console message for now to get unblocked, and file a bug (component Platform>DevTools) detailing that the message(s) should really be issues.
--
Sigurd Schneider | Software Engineer | Chrome - V8 | sig...@google.com


