Intent to Experiment: "is-cross-site" bit in the HTTP Cache Partitioning Key

Andrew Williams

May 2, 2023, 10:41:24 AM
to blink-dev, Dustin Mitchell, Mike Taylor

Contact emails

awi...@chromium.org


Explainer

This change is not covered by an explainer, but the following are related:

https://github.com/shivanigithub/http-cache-partitioning

https://github.com/MattMenke2/Explainer---Partition-Network-State/blob/main/README.md


Spec

https://fetch.spec.whatwg.org/#http-cache-partitions

https://fetch.spec.whatwg.org/#network-partition-keys


Summary


To protect users from multiple types of cross-site data leak attacks, the HTTP cache was partitioned based on the top-level site and the frame site. In other words, cache entries for a given URL created from third-party contexts (e.g., a ‘b.com’ iframe embedded on ‘a.com’) are stored separately from those from first-party contexts (‘a.com’, or an ‘a.com’ iframe embedded on ‘a.com’) and other third-party contexts (a ‘c.com’ iframe embedded on ‘a.com’).


Other shared network state was recently partitioned as well, but using a different partitioning scheme - instead of using the top-level site and the frame site, network state in third-party contexts is partitioned by the top-level site and whether the frame site is cross-site from the top-level site. Using this "is-cross-site" bit instead of the frame site was chosen as a balance between security and performance after running experiments and measuring the results.


The change proposed here is to replace the frame site in the HTTP cache partitioning scheme with an "is-cross-site" bit to perform similar experimentation. Using the examples above, this change means that a ‘b.com’ iframe embedded on ‘a.com’ would now share an HTTP cache partition with a ‘c.com’ iframe embedded on ‘a.com’, since the “is-cross-site” bit for both would be set to true (and since the top-level site for both is ‘a.com’).
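
As a rough illustration of the two schemes described above, the following Python sketch models a cache partition key as a tuple. It is not Chromium's implementation: the function names and the resource URL are hypothetical, and sites are represented as plain strings rather than as scheme plus registrable domain.

```python
from typing import Tuple

TripleKey = Tuple[str, str, str]
CrossSiteKey = Tuple[str, bool, str]


def triple_key(top_level_site: str, frame_site: str, url: str) -> TripleKey:
    """Existing scheme: partition by top-level site and frame site."""
    return (top_level_site, frame_site, url)


def cross_site_bit_key(top_level_site: str, frame_site: str, url: str) -> CrossSiteKey:
    """Proposed scheme: replace the frame site with an "is-cross-site" bit."""
    is_cross_site = frame_site != top_level_site
    return (top_level_site, is_cross_site, url)


URL = "https://cdn.example/lib.js"  # hypothetical resource, for illustration only

# Existing scheme: b.com and c.com iframes on a.com use separate partitions.
separate_today = triple_key("a.com", "b.com", URL) != triple_key("a.com", "c.com", URL)

# Proposed scheme: both iframes are cross-site under a.com, so the keys match.
shared_in_experiment = (
    cross_site_bit_key("a.com", "b.com", URL)
    == cross_site_bit_key("a.com", "c.com", URL)
)

# First-party frames and other top-level sites remain in separate partitions.
first_party_separate = (
    cross_site_bit_key("a.com", "a.com", URL)
    != cross_site_bit_key("a.com", "b.com", URL)
)
other_top_level_separate = (
    cross_site_bit_key("a.com", "b.com", URL)
    != cross_site_bit_key("d.com", "b.com", URL)
)

print(separate_today, shared_in_experiment, first_party_separate, other_top_level_separate)
```

In this model the only partitions that merge are those of cross-site frames under the same top-level site, which is also the sharing that the Security section below discusses.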


Blink component

Internals>Network>Cache


TAG review

Not requested at this time


TAG review status

N/A


Risks

Interoperability and Compatibility

No interoperability / compatibility risks are expected, since the caching behavior generally only affects site performance, and in this case the change should result in performance improvements.


Gecko: N/A, but their implementation doesn’t partition by frame site or “is-cross-site”

WebKit: N/A, but their implementation doesn’t partition by frame site or “is-cross-site”

Web developers: No signals

Other signals: N/A


WebView application risks

None, since HTTP cache partitioning is not enabled for WebView


Security

One side effect of this change is that a malicious cross-site iframe will now share an HTTP cache partition with other cross-site iframes under the same top-level site. This allows the malicious cross-site iframe to perform data leak attacks against the other cross-site iframes via HTTP cache probing. It’s unclear how useful this attack primitive would be in practice, since the data available to the attacker would still be partitioned by the top-level site, and since construction of a page to take advantage of this weakness (for instance, a phishing page where a victim site is in one cross-site iframe) could be thwarted by X-Frame-Options / CSP’s frame-ancestors option.


Goals for experimentation

This experiment aims to identify what effect the new partitioning scheme has on the Chrome guiding metrics/vital metrics, and on other metrics that are influenced by the performance of the HTTP cache.


Also, one specific side-effect of this change is that iframes with opaque origins (for instance, those created using data: URLs) may now be eligible to have their resources added to the HTTP cache. We aim to measure what changes in performance result from this.


Will this feature be supported on all six Blink platforms (Windows, Mac, Linux, Chrome OS, Android, and Android WebView)?

This feature will be supported on all six Blink platforms, but note that HTTP cache partitioning is only enabled by default for Chrome desktop and mobile platforms (not WebView).


Flag name

--enable-features=EnableCrossSiteFlagNetworkIsolationKey


Proposed experiment timeline

We plan to roll out the experiment as follows:

  • April 27th - 50% of Canary and Dev users (M114)

  • May 4th - 50% of Canary, Dev, and Beta users (M114)

  • May 11th - 1% of Stable users and 50% of Canary, Dev, and Beta users

  • June 15th - End the experiment


Is this feature fully tested by web-platform-tests?

No - for this experiment, testing is only implemented via unit tests and Chrome browser tests


Link to entry on the feature dashboard

https://chromestatus.com/feature/6169233265786880


Yoav Weiss

May 3, 2023, 3:42:22 AM
to Andrew Williams, blink-dev, Dustin Mitchell, Mike Taylor
LGTM to experiment based on the suggested timelines

Thanks for trying to win back some of the performance losses that came with cache partitioning! 


Mike West

May 8, 2023, 3:27:01 PM
to Andrew Williams, blink-dev, Dustin Mitchell, Mike Taylor
Hey Andrew!

On Tue, May 2, 2023 at 4:41 PM Andrew Williams <awi...@chromium.org> wrote:
I'm not sure I agree. There are plenty of interesting frames that are happy with being embedded in a wide variety of sites (YouTube, Maps, Ads, payment providers like Stripe, single sign-on login forms, etc.), and it seems reasonable to assume that being able to gain insight into their state would be a risk they'd rather avoid. The current partitioning scheme does that; changing the model seems like a regression.

I don't object to the experiment, but I'd like to see a little more analysis of risk here so that we can take that into account when making decisions about performance deltas.

-mike
 



Andrew Williams

Sep 19, 2023, 1:55:17 PM
to Mike West, blink-dev, Dustin Mitchell, Mike Taylor
Hey everyone,

TL;DR: We'd like to extend the experiment with some modifications, and are requesting an LGTM for this.

As part of the experiment we found that enabling the HTTP cache for opaque origin contexts (and having those cache partitions be shared among third-party contexts) resulted in the following changes to the FCP (first contentful paint) times for those contexts:
 - decreases in the 25th and 50th percentile time by 28.19% and 18.56% respectively for desktop platforms, with very strong confidence (4-diamonds in our internal dashboard)
 - decreases in the 75th percentile time by 7.39% for desktop platforms, with strong confidence (3-diamonds in our internal dashboard)
 - decreases in the 25th percentile time by 6.14% for Android, with moderate confidence (2-diamonds in our internal dashboard)
 - decreases in the 99th percentile time by 3.09% for Android, with weak confidence (1-diamond in our internal dashboard)

Page loads from opaque origin iframes appear to make up only 1% of all third-party context page loads, though, and the metrics tracking the time until FCP for all third-party contexts showed no statistically significant changes.
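
A back-of-envelope calculation helps show why a large speed-up in a small slice of traffic can disappear in the aggregate. The sketch below is only an approximation under loud assumptions: it treats the reported median decrease as if it were an average decrease, and it assumes opaque origin page loads have the same baseline FCP as other third-party page loads. Percentile changes do not actually combine linearly, so the result indicates an order of magnitude, not a measurement.

```python
# Figures quoted in this thread.
opaque_share = 0.01        # opaque origin iframes: ~1% of third-party page loads
median_decrease = 0.1856   # 18.56% decrease in 50th percentile FCP on desktop

# Assumption (not from the thread): the median decrease stands in for an
# average decrease, and every page load has the same baseline FCP.
blended_change = opaque_share * median_decrease

print(f"Approximate blended FCP change: {blended_change:.2%}")
```

Under these assumptions the blended change is on the order of a fifth of a percent, which is consistent with the aggregate metrics showing no statistically significant movement.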

We also tracked metrics related to cache utilization over the course of the experiment, but we weren’t able to draw any conclusions from the data we collected. Our plan was to take the experiment group, who would effectively be starting from an empty cache due to the cache partitioning scheme change, and compare them to a control group of users with a primed cache. We expected the experiment group to have low cache utilization at first but to reach a steady state after a few weeks, allowing us to make a fair comparison at the end of the experiment. This turned out not to be the case, though, because the experiment itself took a month to fully roll out and the cache likely takes longer on average to fully repopulate.

Also, we thought it'd be interesting to measure the effect that having a shared cache partition for all opaque origin contexts under a given top-level site (but otherwise using triple-keying for everything) would have on performance. As a result, we’d like to repeat the experiment with different groups:
 - New control group that clears the HTTP cache upon starting and begins alongside the others
 - 2.5-keyed with shared cache for opaque origin contexts (same as in Round 1)
 - Triple-keyed with a shared partition for all opaque origin iframes under a given top-level site
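
The three keying behaviours being compared can be sketched as follows. This is a simplified model, not Chromium code: the function names and the `"<opaque>"` marker are hypothetical, `None` stands in for an opaque origin frame (for instance, a data: URL iframe), and the choice to return `None` (no cache entry) for opaque origins under plain triple-keying is an assumption inferred from the statement earlier in this thread that such iframes "may now be eligible" for caching.

```python
from typing import Optional, Tuple

Key = Optional[Tuple[str, object, str]]


def triple_keyed(top: str, frame_site: Optional[str], url: str) -> Key:
    """Plain triple-keying; opaque origin frames are assumed not to be cached."""
    if frame_site is None:
        return None  # assumption: no usable cache partition for opaque origins
    return (top, frame_site, url)


def two_and_a_half_keyed(top: str, frame_site: Optional[str], url: str) -> Key:
    """Top-level site plus is-cross-site bit; opaque origins count as cross-site."""
    is_cross_site = frame_site is None or frame_site != top
    return (top, is_cross_site, url)


def triple_keyed_shared_opaque(top: str, frame_site: Optional[str], url: str) -> Key:
    """Triple-keying, but all opaque origin frames under one top-level site share a slot."""
    slot = "<opaque>" if frame_site is None else frame_site
    return (top, slot, url)


URL = "https://cdn.example/lib.js"  # hypothetical resource, for illustration only

for scheme in (triple_keyed, two_and_a_half_keyed, triple_keyed_shared_opaque):
    print(scheme.__name__, scheme("a.com", None, URL), scheme("a.com", "b.com", URL))
```

In this model the second group lets opaque origin frames share entries with every other cross-site frame under the same top-level site, while the third group confines that sharing to opaque origin frames only.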

Our proposed timeline for this is:
 - September 20th - Enable experiment for Canary and Dev users
 - September 27th - Enable experiment for Canary, Dev, and Beta users, pending I2E LGTM
 - October 4th - Enable experiment for 3% of Stable users (1% in each experiment group mentioned above)
 - November 8th - End the experiment

Also, Mike, thanks for the feedback regarding risk analysis. We will ensure there is a more comprehensive assessment that is made publicly available if we decide to move forward with implementing a change here.

-Andrew

Yoav Weiss

Sep 25, 2023, 2:18:50 AM
to Andrew Williams, Mike West, blink-dev, Dustin Mitchell, Mike Taylor
LGTM to continue experimenting!


I'm sure y'all considered it, but a potentially-naive question: could "interesting" frames opt-in to being triple keyed if they prefer the security benefits over the performance ones? Or somehow choose to share that 3rd key with other origins they trust, to get the best of both worlds?
 

Andrew Williams

Sep 5, 2024, 11:30:19 PM
to Yoav Weiss, Mike West, blink-dev, Dustin Mitchell, Mike Taylor
Hi everyone,

To follow up on this, we ran the updated experiment and the results are largely the same as in the first round. Caching resources corresponding to third-party opaque origin contexts results in speed-ups for these page loads, but the impact isn't large enough to move the needle on broader metrics (likely because opaque origin contexts seem to only make up around 1% of all third-party contexts).

The performance metrics for the new experiment group (triple-keyed with a shared cache partition per top-level site for opaque origin iframes) were largely the same as those from the experiment group that was 2.5-keyed and cached contents from opaque origin iframes. It seems that 2.5-keying has comparable performance to triple-keying in general, and there may be significant performance gains to be had for opaque origin iframes if a way to safely support caching for these contexts can be identified.

Regarding cache utilization, the experiment groups that supported caching for opaque origin contexts had slightly higher cache reuse levels than the control group that had its cache cleared at the beginning of the experiment. Comparing the cache clearing control group to a control group that didn't clear the cache showed that clearing the cache does result in less utilization (although still by a relatively small amount), possibly because it takes longer than the experiment timeframe to fully re-populate the cache after it has been cleared.

If anyone has any questions on this please let me know! Thanks,

-Andrew