Contact emails
Explainer & Design Doc
Summary
Implement stale-while-revalidate processing in the Cache-Control header. Allow stale resources to be served from the cache while asynchronously revalidated.
Motivation
Allowing websites to find a balance between rapid deployment and improved time to load is important. Websites might use JavaScript bootstrapping code that they want to deploy with a short max-age (for rapid deployment) but allow to be served stale for a longer duration. Serving it stale removes the need for the resource to block the page load if the rest of the resources are in the cache. Authors expect that the resource would be revalidated shortly thereafter.
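As a rough illustration of the header this feature reads, the sketch below parses a Cache-Control value into its directives. The function name `parse_cache_control` and the header values (one hour of freshness, one week of stale-while-revalidate) are assumptions chosen for the example. This is not how Chromium parses the header.

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[int]]:
    """Split a Cache-Control value into {directive: seconds, or None if no numeric value}."""
    directives: Dict[str, Optional[int]] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        value = value.strip().strip('"')
        directives[name.strip().lower()] = int(value) if value.isdigit() else None
    return directives


# Illustrative values: short max-age for rapid deployment, long stale window.
directives = parse_cache_control("max-age=3600, stale-while-revalidate=604800")
print(directives)
```

A cache that does not understand stale-while-revalidate simply ignores the extra directive and falls back to max-age alone.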
Risks
Interoperability and Compatibility
Service Worker API will see the revalidation requests. This might complicate some scenarios.
Edge: No signals
Firefox: No signals
Safari: No signals
Web developers: Unknown
Ergonomics
N/A
Activation
An HTTP server change would be needed. Some resources, such as CSS fonts and AMP bootstrap code, are already served with the cache directive.
Debuggability
DevTools will show an additional resource request when a resource is being revalidated.
Will this feature be supported on all six Blink platforms (Windows, Mac, Linux, Chrome OS, Android, and Android WebView)?
Yes
Link to entry on the feature dashboard
https://www.chromestatus.com/feature/5050913014153216
Requesting approval to ship?
No, will conduct origin trial. An Intent to Experiment will be sent once the code has landed.
--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to blink-dev+...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAHXv1wm0iwRY7i5aOT7qCB2aRMy9dR6fU51CasW4Bctqbsh17g%40mail.gmail.com.
> Tag Review not required as this isn't web exposed.
Sounds pretty web exposed to me, even if not through an API. The page will have stale resources, which can affect its behavior; it can know that it has stale resources if there is a mixture, for example.
Thanks Jeffrey / Hi David,
> On 23 Jun 2018, at 6:28 am, Jeffrey Yasskin <jyas...@chromium.org> wrote:
>
> FYI to mnot, the author of https://tools.ietf.org/html/rfc5861, in case any of the details here argue for updating that RFC.
>
> It's definitely worth talking to Anne about how Fetch should integrate this behavior for stale-while-revalidate. Actually getting those changes into the spec might take a lot of elaboration of the existing cache behavior, though.
>
> Jeffrey
>
> On Fri, Jun 22, 2018 at 12:56 PM David Benjamin <davi...@chromium.org> wrote:
> I'm going to ask a question then answer it, since I already know and am happy with the answer, but I feel it should be mentioned in the thread somewhere... :-)
>
> (Wearing my question-asking hat)
>
> In the past, when we've looked at stale-while-revalidate, we had trouble trying to use it sanely in a browser. My sense is that it was originally designed with more a CDN-like use in mind. Typically in a CDN, connectivity to the origin is reliable, there is no difference between an out-of-band and in-band request, and you expect a single cache to serve many clients talking to a site. The CDN and origin also typically have some kind of relationship, so there is much less worry about whether the CDN is willing to make a request at this time.
The original motivation was back-end API caching at Yahoo!, but yes that's closer to CDN than browser.
> In a browser, none of those hold. Out-of-band and in-band requests are quite different. Requests may trigger user interaction (auth prompts), and there is an expectation that a site "stops doing things" when one closes a tab. The client may be offline or have flaky connectivity, so the revalidation may fail. Or perhaps we have some local policy (extension?) that rejects such revalidations.
Just to make sure we're on the same page -- nothing about SwR requires a cache to revalidate in any given situation; not only would that be backwards-incompatible with caches that don't implement it, but it also is against the whole nature of caching as an optimisation. Caches are free to treat it like a hint and use additional information / heuristics to help decide when and how to revalidate.
> Moreover, an async revalidation is inherently predictive. It extends the max-age and stale-while-revalidate window for future requests. If no future request hits that window, the revalidation is useless.
Yes. Effectively, a hit during the SwR window is used as an indication that it's worth trying a revalidation.
> (It is also predictive in a CDN, but as the CDN's cache services many clients, it's a very solid prediction.)
Depending on the nature of the content; while CDN hit rates are generally higher than browsers and forward caches, lots of little-used content goes through CDNs. Regardless, this actually shouldn't matter too much; if the revalidation ends up not getting used, no more requests were issued than would have been without SwR.
> The browser's HTTP cache serves not just the site developer, but also the user, who may be visiting another site or have limited resources. Being predictive, the case for dropping those under load becomes very strong.
As per above, this is *additional* efficiency that the cache can eke out if it decides it wants to drop SwR revalidations -- which would not be available without it.
> All together, this means we must strongly consider revalidations failing on the client. In a naive stale-while-revalidate implementation, a failed revalidation acts as if we had written a larger max-age.
I don't think anyone has implemented it that way; generally people will either immediately consider the object "truly" stale, or retry (with some sort of backoff heuristic).
> If the site author was okay with that, why didn't they set the larger max-age? This is an apparent contradiction. The spec nominally allows for this (it only says revalidation is a SHOULD), but max-age=<1 day>, stale-while-revalidate=<1 week> means very different things depending on whether the revalidations will happen or not. We need clear semantics here between the client and the server.
See above. SwR is designed the way it is so that it's backwards-compatible with caches that don't implement it.
> (Wearing my question-answering hat)
>
> The problem is stale-while-revalidate's semantics are tied to revalidation success.
Can you dig in here a bit? Why do you say this?
> We need to decouple those. The proposed design is to clamp the stale-while-revalidate period on use: Let T be some small grace period, say one minute. If we use a resource in the stale-while-revalidate period, update the end of the period to min(currentEndpoint, now + T). Then return the resource stale, but also request that the upper layers asynchronously revalidate the resource.
>
> This effectively implements a revalidation timeout of T. This preserves the good properties of stale-while-revalidate. If revalidation completes within now+T, cache behavior is better and we behave as a naive stale-while-revalidate implementation in the good case. At the same time, it avoids the bad properties. If revalidation fails, we start requiring a revalidation, gracefully decaying to the pre-stale-while-revalidate behavior. (For completeness, if revalidation completes, but misses the now+T timeout, that is also okay. Requests after the revalidation will still hit the cache.)
>
> This gives clear semantics for stale-while-revalidate.
This is certainly a valid approach (if I understand you correctly). I don't think we can say that it provides clarity about SwR's semantics, given that other implementations have taken other, equally valid approaches.
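The clamping proposal quoted above can be sketched roughly as follows. The `CacheEntry` class, the `use_entry` function, and the field names are hypothetical, and T is set to the "say one minute" value from the quoted text. This is an illustration of the idea, not the Chromium implementation.

```python
from dataclasses import dataclass

GRACE_PERIOD_T = 60.0  # seconds; "say one minute" in the quoted proposal


@dataclass
class CacheEntry:
    fresh_until: float  # end of the max-age window (absolute time, seconds)
    swr_until: float    # current end of the stale-while-revalidate window


def use_entry(entry: CacheEntry, now: float, grace: float = GRACE_PERIOD_T) -> str:
    """Return 'fresh', 'stale' (serve and revalidate in the background), or 'revalidate'."""
    if now < entry.fresh_until:
        return "fresh"
    if now < entry.swr_until:
        # Clamp the window on use: min(currentEndpoint, now + T).
        entry.swr_until = min(entry.swr_until, now + grace)
        return "stale"
    return "revalidate"


# max-age of one hour followed by a one-week stale window (illustrative values).
entry = CacheEntry(fresh_until=3600.0, swr_until=3600.0 + 604800.0)
print(use_entry(entry, 4000.0), entry.swr_until)  # first stale use clamps the window
print(use_entry(entry, 4100.0))                   # past now + T with no successful revalidation
```

In this sketch a successful background revalidation would replace the entry with new windows. If it never completes, requests after the clamped endpoint fall back to blocking revalidation.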
> The author can tune Cache-Control based on their requirements. The semantics of max-age=M and stale-while-revalidate=S are:
>
> If the resource is within M, just use it stale.
If it's within Cache-Control: max-age, how is it stale (assuming that the rest of the freshness algorithm lets it be fresh)?
> I am okay with having to wait M for an update to be rolled out, in exchange for M worth of cache. Past M, if it's still within M+S, I'm dubiously okay with it being used stale. I really want M+S worth of cache because M is too small, but I still want to know updates are rolled out after M. This is impossible, so I will concede waiting M+S for an update to be completely rolled out. But I want the update to be mostly rolled out after M. Thus there must be some bound on using resources in the [M, M+S) range. Finally, past M+S, you must revalidate. M+S is the oldest resource I'm willing to consider.
That is roughly the current semantics of SwR (delta the point about CC: max-age).
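The M and S windows discussed above can be written down as a small classifier. This is a sketch under assumptions: the function name `classify` and its return labels are made up for illustration, and a response is treated as fresh only while its age is below max-age, which is the point raised in the reply about max-age.

```python
def classify(age: int, max_age: int, swr: int) -> str:
    """Classify a cached response by its age in seconds, given max-age=M and stale-while-revalidate=S."""
    if age < max_age:
        return "fresh"                       # within M: use without contacting the server
    if age < max_age + swr:
        return "serve-stale-and-revalidate"  # within [M, M+S): usable, but try to refresh
    return "revalidate-before-use"           # past M+S: too old to serve without revalidating


M = 86400   # one day, as in the max-age=<1 day> example
S = 604800  # one week, as in the stale-while-revalidate=<1 week> example
for age in (0, M, M + S - 1, M + S):
    print(age, classify(age, M, S))
```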
(I've tried to recreate the quoting below; apologies for any mistakes. Many thanks to whoever can hunt down the appropriate PM for Gmail and "persuade" them to change this behaviour).
On 26 Jun 2018, at 1:09 am, David Benjamin <davi...@chromium.org> wrote:
>> Just to make sure we're on the same page -- nothing about SwR requires a cache to revalidate in any given situation; not only would that be backwards-incompatible with caches that don't implement it, but it also is against the whole nature of caching as an optimisation. Caches are free to treat it like a hint and use additional information / heuristics to help decide when and how to revalidate.
>
> The problem is the behavior of stale-while-revalidate is very different depending on whether that revalidation happens at all. As you say, it's the cache's choice whether and how to revalidate. However, the exact behavior here affects the semantics significantly. Consider these two implementations:
>
> A: The cache always serves the resource stale in the SwR window, but the revalidations basically always work because connectivity between cache and backend is solid.
>
> B: The cache always serves the resource stale in the SwR window, but it never bothers to revalidate [or revalidations always fail]. This is fine per spec, as it's merely a SHOULD.
>
> These two behave very differently. (A) is the intuitive semantics one would expect out of SwR. (B) is just a larger max-age, which is presumably not what the author wanted. Now, (B) is rather absurd of an implementation, but the problem is failed revalidations decay to (B), while the author expected (A)'s staleness behavior.
There's also:
C: The cache doesn't receive any requests during the stale window until the very end, where it serves the stale content, makes the async request successfully, but never uses the refreshed response before it becomes stale.
From the standpoint of the author, B and C are very similar, and C isn't that uncommon (even on a CDN or reverse proxy; we have unpopular content too, and we also have connectivity problems to the origin). The author *really* has to be comfortable with that stale content being used for its entire window, even if popular content won't exercise that in practice all the time.
> The effect is amplified when you consider SwR windows wide enough to be of any use for the browser. The RFC's example of Cache-Control: max-age=600, stale-while-revalidate=30 makes (A) and (B) roughly the same. But it will never be hit in a browser where, unlike a backend cache shared by multiple clients, one usually does not expect continuous access of a resource by just one client. More plausible for a browser would be a SwR window measured in weeks or days. Now (A) and (B) are very meaningfully different.
So, this table shows SwR values in the HTTP Archive - first column is the SwR value, second is count of that value seen in the latest run:
https://docs.google.com/spreadsheets/d/1bV32i_KvJ7_ywTPWApxGjQE_Ovb6uRD3-BEAgC5nd90/edit?usp=sharing
The biggest spike there is at seven days. Playing with the query a bit, it seems like almost all of the upper-end values are JS and CSS.
I'm sure some of those values are set with Chrome in mind, but given that SwR is pretty widely supported by CDNs and reverse proxies, I strongly suspect there's a good number targeting intermediary caches specifically.
But yes, the traffic profile for a CDN or reverse proxy is going to be very different from what you see. Do you think what you suggest will work for them equally well?
>> > (It is also predictive in a CDN, but as the CDN's cache services many clients, it's a very solid prediction.)
>>
>> Depending on the nature of the content; while CDN hit rates are generally higher than browsers and forward caches, lots of little-used content goes through CDNs. Regardless, this actually shouldn't matter too much; if the revalidation ends up not getting used, no more requests were issued than would have been without SwR.
>>
>> > The browser's HTTP cache serves not just the site developer, but also the user, who may be visiting another site or have limited resources. Being predictive, the case for dropping those under load becomes very strong.
>>
>> As per above, this is *additional* efficiency that the cache can eke out if it decides it wants to drop SwR revalidations -- which would not be available without it.
>
> I think the "no more requests" reduction is a bit more nuanced. Background and foreground requests are different. Rate limiting is usually accounted based on live requests and live tabs. (This corresponds to a user expectation: closing the tab must make it go away.) The SwR revalidations detach from that and thus need to be separately rate-limited.
I see; that makes sense from a browser standpoint, but isn't really relevant to an intermediary cache or the origin.
>> > We need to decouple those. The proposed design is to clamp the stale-while-revalidate period on use: Let T be some small grace period, say one minute. If we use a resource in the stale-while-revalidate period, update the end of the period to min(currentEndpoint, now + T). Then return the resource stale, but also request that the upper layers asynchronously revalidate the resource.
>> >
>> > This effectively implements a revalidation timeout of T. This preserves the good properties of stale-while-revalidate. If revalidation completes within now+T, cache behavior is better and we behave as a naive stale-while-revalidate implementation in the good case. At the same time, it avoids the bad properties. If revalidation fails, we start requiring a revalidation, gracefully decaying to the pre-stale-while-revalidate behavior. (For completeness, if revalidation completes, but misses the now+T timeout, that is also okay. Requests after the revalidation will still hit the cache.)
>> >
>> > This gives clear semantics for stale-while-revalidate.
>>
>> This is certainly a valid approach (if I understand you correctly). I don't think we can say that it provides clarity about SwR's semantics, given that other implementations have taken other, equally valid approaches.
>
> See above. The specification failed to detail this.
I'm having a hard time seeing this as a significant improvement in clarity; we go from "I'm OK with the response being served stale during this window, while you attempt to revalidate it in the background" to "I'm OK with it being served stale in this window, while an attempt is made to revalidate it in the background, but if that revalidation attempt fails (for some definition of failure), suddenly it's not OK to serve it stale." It introduces a new factor into whether something is allowed to be served stale.
I.e., if there's no traffic for a week and it's OK to serve a response stale, why is it not OK to serve that same response stale at the same time if there was a failed revalidation driven by a previous request sometime during that week? A transient network or server failure is not likely to affect the resource's state.
It seems like what you really want to do here is to specify how stale responses can or cannot be used when a fresh response can't be obtained. By default, HTTP caches are allowed to do this with all storable responses unless CC: must-revalidate or no-cache are present; we introduced CC: stale-if-error to provide some finer granularity there. Perhaps that's what you're looking for?
E.g., a one-week SwR window with the semantics you define might look like:
Cache-Control: max-age=3600, stale-while-revalidate=604800, stale-if-error=60
If that's interesting, I could see updating the spec to clarify the relationship between SwR and SiE.
If you do want to try to add additional constraints to error handling for SwR independently, the discussion really needs to happen on a list like ietf-h...@w3.org, since lots of caches have implemented it and they're going to want to weigh in.
> I'm having a hard time seeing this as a significant improvement in clarity; we go from "I'm OK with the response being served stale during this window, while you attempt to revalidate it in the background" to "I'm OK with it being served stale in this window, while an attempt is made to revalidate it in the background, but if that revalidation attempt fails (for some definition of failure), suddenly it's not OK to serve it stale." It introduces a new factor into whether something is allowed to be served stale.
>
> I.e., if there's no traffic for a week and it's OK to serve a response stale, why is it not OK to serve that same response stale at the same time if there was a failed revalidation driven by a previous request sometime during that week? A transient network or server failure is not likely to affect the resource's state.
The difference is we're in the SwR window where (I'm positing) the server wants some reasonable effort be made to revalidate the resource. In an environment where background requests are treated differently from foreground requests and the background revalidations may fail, that means we need to give up on the background ones at some point. Otherwise we risk degrading SwR to a larger max-age, which violates the server expectations.
> It seems like what you really want to do here is to specify how stale responses can or cannot be used when a fresh response can't be obtained. By default, HTTP caches are allowed to do this with all storable responses unless CC: must-revalidate or no-cache are present; we introduced CC: stale-if-error to provide some finer granularity there. Perhaps that's what you're looking for?
>
> E.g., a one-week SwR window with the semantics you define might look like:
>
> Cache-Control: max-age=3600, stale-while-revalidate=604800, stale-if-error=60
>
> If that's interesting, I could see updating the spec to clarify the relationship between SwR and SiE.
From later in the thread, it sounds like stale-if-error doesn't actually provide this. (If it did, I think it'd be rather poor to separate them because SwR on its own is not meaningful without the bound.)
But, no, what I'm looking for isn't a particular mechanism per se. As you note, HTTP caches are already allowed significant leeway in how they do things. This is great since intermediate caches and clients have different needs. Even different browsers will have different opinions on how to do things. The cost is we need to be clear on the bounds of that flexibility, otherwise headers have no meaning.
> If you do want to try to add additional constraints to error handling for SwR independently, the discussion really needs to happen on a list like ietf-h...@w3.org, since lots of caches have implemented it and they're going to want to weigh in.
Happy to weigh in on ietf-http-wg, but I've found such things work much better after some initial discussion to make sure we're all talking about roughly the same thing. I am still puzzled by your responses on this thread which seem to simultaneously advocate for these semantics while also rejecting them. I'm probably horribly misunderstanding something, so I'd like to get to the bottom of that first. :-)