Intent to implement and ship: HTTP/2 push header validation

Bence Béky

Oct 30, 2017, 4:17:31 PM
to blink-dev
To: blink-dev@ (bcc: net-dev@ for heads up)

Contact emails

b...@chromium.org

Summary

Do not serve requests from HTTP/2 pushed streams that do not match with respect to the Vary header or range parameters.

Motivation

Currently, HTTP/2 pushed streams are matched to requests based on URL only. This can result in an incorrect or corrupt response to a request (for example, an unsupported encoding, a web page corresponding to invalid user credentials, or the wrong range for a range request).
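
To make the failure mode concrete, here is a minimal sketch of the kind of check the summary describes. It is not Chrome's implementation: the function names, the plain-dict header representation (names already lowercased), and the rule that the Range headers must be identical are all assumptions made for illustration.

```python
def vary_fields(response_headers):
    """Return the lowercased field names listed in the response's Vary header."""
    raw = response_headers.get("vary", "")
    return [name.strip().lower() for name in raw.split(",") if name.strip()]


def pushed_stream_matches(url, request_headers, pushed):
    """Decide whether a request may be served from a pushed stream.

    `pushed` is a hypothetical dict with keys "url", "request_headers"
    (taken from the PUSH_PROMISE) and "response_headers". Header names
    are assumed to be lowercased already.
    """
    if url != pushed["url"]:
        return False

    fields = vary_fields(pushed["response_headers"])
    if "*" in fields:
        # "Vary: *" can never be satisfied by comparing header fields.
        return False
    for name in fields:
        if request_headers.get(name) != pushed["request_headers"].get(name):
            return False

    # Assumption: a range request only matches a push promised for the same range.
    return request_headers.get("range") == pushed["request_headers"].get("range")
```

With URL-only matching, a request sending `Accept-Encoding: br` could be handed a gzip-encoded push; under the sketch above, that pair is rejected and the request would go to the network instead.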

Tracking bug


Design document


Relevant specifications

HTTP/2 server push: RFC 7540 Section 8.2.
Vary header: RFC 7231 Section 7.1.4.
Range request: RFC 7233

Interoperability risk

As of May 2017, Firefox and Edge both ignore the Vary header for HTTP/2 pushed streams, as Chrome currently does. Safari already obeys the Vary header, which is the proposed behavior for Chrome. See https://jakearchibald.com/2017/h2-push-tougher-than-i-thought/#items-in-the-push-cache-should-be-matched-using-http-semantics-aside-from-freshness. Also note that Chrome obeys the Vary header for QUIC pushed streams.

Range parameters are currently ignored in Chrome both for HTTP/2 pushed streams and QUIC pushed streams.  I have no information about Range requests in other browsers.

Usage information from UMA

Net.SpdyStreamsPushedPerSession shows that Chrome receives pushed streams on only very few HTTP/2 connections.

Net.SpdySession.PushedBytes shows that on average, a few dozen bytes of data are pushed per connection.

Net.PushedStreamAlreadyHasResponseHeaders shows that about 25% of the time, response headers have not yet been received when a request is matched to a pushed stream. This justifies the need for an async mechanism for matching requests to pushed streams.

New metrics shall be added to monitor the percentage of request--pushed stream pairs with matching URL that are not matched due to Vary header or range parameters.
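
The case where response headers have not yet arrived when a request finds a pushed stream could be handled along the following lines. This is only a sketch using Python's asyncio: the `PushedStream` class and its methods are hypothetical and do not correspond to anything in the Chrome network stack.

```python
import asyncio


class PushedStream:
    """Hypothetical pushed stream whose response headers may arrive later."""

    def __init__(self, request_headers):
        self.request_headers = request_headers
        # Resolved once the headers of the pushed response arrive.
        self._response_headers = asyncio.get_running_loop().create_future()

    def on_response_headers(self, headers):
        self._response_headers.set_result(headers)

    async def accepts(self, request_headers):
        """Wait for the response headers, then compare the Vary-listed fields."""
        response_headers = await self._response_headers
        vary = response_headers.get("vary", "")
        names = [n.strip().lower() for n in vary.split(",") if n.strip()]
        if "*" in names:
            return False
        return all(
            request_headers.get(n) == self.request_headers.get(n) for n in names
        )


async def demo():
    stream = PushedStream({"accept-encoding": "gzip"})
    pending = asyncio.ensure_future(stream.accepts({"accept-encoding": "br"}))
    await asyncio.sleep(0)  # let the match start; it cannot complete yet
    still_waiting = not pending.done()
    stream.on_response_headers({"vary": "accept-encoding"})
    return still_waiting, await pending
```

The request is parked on the stream until the headers arrive, and only then is the Vary decision made, which is why a purely synchronous lookup is not enough.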

Entry on the Chrome Platform Status


Is this feature fully tested by web-platform-tests?

No.  WPT does not support HTTP/2.

Will this feature be supported on all six Blink platforms (Windows, Mac, Linux, Chrome OS, Android, and Android WebView)?

Yes.

Requesting approval to ship?

Yes.

Jeffrey Yasskin

Oct 30, 2017, 4:41:05 PM
to b...@chromium.org, blink-dev
It seems like https://fetch.spec.whatwg.org/ should describe this behavior of fulfilling requests based on previously-pushed resources. Is anyone working on that?

Thanks,
Jeffrey

Yoav Weiss

Nov 1, 2017, 5:40:33 PM
to Jeffrey Yasskin, ann...@annevk.nl, mn...@mnot.net, b...@chromium.org, blink-dev
On Mon, Oct 30, 2017 at 8:41 PM 'Jeffrey Yasskin' via blink-dev <blin...@chromium.org> wrote:
It seems like https://fetch.spec.whatwg.org/ should describe this behavior of fulfilling requests based on previously-pushed resources. Is anyone working on that?

There's an open issue for that definition, but I'm not aware of anyone actively working on it. I'm also not 100% sure that Fetch is the right place for the H2 push cache definition (one could argue that it should be defined as part of an extension to HTTP, as it is relevant for non-browser HTTP clients).

 

On Mon, Oct 30, 2017 at 1:17 PM Bence Béky <b...@chromium.org> wrote:

No.  WPT does not support HTTP/2.

Indeed. We talked in the past about Someone™ creating an HTTP/2 test suite... sigh.
  

Mark Nottingham

Nov 1, 2017, 7:08:33 PM
to Yoav Weiss, Jeffrey Yasskin, Anne van Kesteren, b...@chromium.org, blink-dev
On 2 Nov 2017, at 8:40 am, Yoav Weiss <yo...@yoav.ws> wrote:
>
> +Anne van Kesteren +Mark Nottingham
>
> On Mon, Oct 30, 2017 at 8:41 PM 'Jeffrey Yasskin' via blink-dev <blin...@chromium.org> wrote:
> It seems like https://fetch.spec.whatwg.org/ should describe this behavior of fulfilling requests based on previously-pushed resources. Is anyone working on that?
>
> There's an open issue for that definition, but I'm not aware of anyone actively working on it. I'm also not 100% sure that Fetch is the right place for the H2 push cache definition (one could argue that it should be defined as part of an extension to HTTP, as it is relevant for non-browser HTTP clients).

Yeah. There's still an argument that there doesn't need to be a separate H2 push cache; i.e., the current browser behaviour is unnecessary. Push is currently defined just in terms of HTTP caching, so it might be that the implementations need to change to match the spec, not the other way around.

My understanding is that no one actually knows where the requirement for a separate push cache is coming from -- both the browser network engineers and browser security folks say it came from somewhere else, but upon examination there isn't a well-stated reason for it.

Not sure what the next steps are, but I'm hoping that if we have another HTTP workshop this year, we might have some illuminating discussions there. Lots of people are talking about push and what its future might be.

All of that said -- even if push *is* just an interaction with the HTTP cache, there are still some HTTP caching tests in the fetch directory of WPT, so conceivably it could fit there.



--
Mark Nottingham https://www.mnot.net/

Ryan Hamilton

Nov 1, 2017, 8:44:57 PM
to Mark Nottingham, Yoav Weiss, Jeffrey Yasskin, Anne van Kesteren, Bence Béky, blink-dev
We (chrome network team) talked some time back about pushing directly into the HTTP cache and decided against it because (swapping in context) it lets the server write entries into the HTTP cache that the client would not otherwise create. For example, consider a situation where a page is infected with malware. SafeBrowsing would kick in and prevent the request from going to the network and the infected page would not end up in the HTTP cache. Once the page was cleaned up and removed from SafeBrowsing, requests for that URL would go to the network and into the cache. However, if the server pushed the page into the HTTP cache, then after the page is cleaned up and removed from SafeBrowsing, the client would request the URL and the infected page would be served out of the HTTP cache. This seemed bad.

Though I may be misunderstanding your comment about changing behavior to simply be HTTP caching.

Cheers,

Ryan

Mark Nottingham

Nov 1, 2017, 11:10:43 PM
to Ryan Hamilton, Yoav Weiss, Jeffrey Yasskin, Anne van Kesteren, Bence Béky, blink-dev


> On 2 Nov 2017, at 11:44 am, Ryan Hamilton <r...@chromium.org> wrote:
>
> We (chrome network team) talked some time back about pushing directly into the HTTP cache and decided against it because (swapping in context) it lets the server write entries into the HTTP cache that the client would not otherwise create. For example, consider a situation where a page is infected with malware. SafeBrowsing would kick in and prevent the request from going to the network and the infected page would not end up in the HTTP cache. Once the page was cleaned up and removed from SafeBrowsing, requests for that URL would go to the network and into the cache. However, if the server pushed the page into the HTTP cache, then after the page is cleaned up and removed from SafeBrowsing, the client would request the URL and the infected page would be served out of the HTTP cache. This seemed bad.

If SafeBrowsing detects that a request to https://evil.example.com/ is infected, does the request actually go out on the wire? I ask because pushed requests have to be associated with a stream that's either open or half-closed.

Ryan Hamilton

Nov 1, 2017, 11:46:24 PM
to Mark Nottingham, Yoav Weiss, Jeffrey Yasskin, Anne van Kesteren, Bence Béky, blink-dev
On Wed, Nov 1, 2017 at 11:10 PM, Mark Nottingham <mn...@mnot.net> wrote:
> On 2 Nov 2017, at 11:44 am, Ryan Hamilton <r...@chromium.org> wrote:
>
> We (chrome network team) talked some time back about pushing directly into the HTTP cache and decided against it because (swapping in context) it lets the server write entries into the HTTP cache that the client would not otherwise create. For example, consider a situation where a page is infected with malware. SafeBrowsing would kick in and prevent the request from going to the network and the infected page would not end up in the HTTP cache. Once the page was cleaned up and removed from SafeBrowsing, requests for that URL would go to the network and into the cache. However, if the server pushed the page into the HTTP cache, then after the page is cleaned up and removed from SafeBrowsing, the client would request the URL and the infected page would be served out of the HTTP cache. This seemed bad.

If SafeBrowsing detects that a request to https://evil.example.com/ is infected, does the request actually go out on the wire?

​No.​
 
I ask because pushed requests have to be associated with a stream that's either open or half-closed.

That's true, but imagine that the user requests https://example.com/ which links to https://example.com/other. The request to https://example.com/ would go to the wire and the server could try to push /other (which is currently infected) as associated with /. SafeBrowsing would prevent the request for /other from hitting the wire and hence the pushed (infected) resource would not be pulled up into the HTTP cache. After a short period of time it will expire from the H2 push cache.

​Does that make sense?

Cheers,

Ryan

Anne van Kesteren

Nov 2, 2017, 1:50:42 AM
to Ryan Hamilton, Mark Nottingham, Yoav Weiss, Jeffrey Yasskin, Bence Béky, blink-dev
On Thu, Nov 2, 2017 at 4:46 AM, Ryan Hamilton <r...@chromium.org> wrote:
> That's true, but imagine that the user requests https://example.com/ which
> links to https://example.com/other. The request to https://example.com/
> would go to the wire and the server could try to push /other (which is
> currently infected) as associated with /. SafeBrowsing would prevent the
> request for /other from hitting the wire and hence the pushed (infected)
> resource would not be pulled up into the HTTP cache. After a short period of
> time it will expire from the H2 push cache.
>
> Does that make sense?

If that is the sole reason, why invent a new caching mechanism rather
than perform a SafeBrowsing check before putting pushed resources in
the cache?


--
https://annevankesteren.nl/

Matthew Menke

Nov 2, 2017, 5:26:53 PM
to blink-dev, mn...@mnot.net, yo...@yoav.ws, jyas...@google.com, ann...@annevk.nl, b...@chromium.org


On Wednesday, November 1, 2017 at 11:46:24 PM UTC-4, Ryan Hamilton wrote:
On Wed, Nov 1, 2017 at 11:10 PM, Mark Nottingham <mn...@mnot.net> wrote:
> On 2 Nov 2017, at 11:44 am, Ryan Hamilton <r...@chromium.org> wrote:
>
> We (chrome network team) talked some time back about pushing directly into the HTTP cache and decided against it because (swapping in context) it lets the server write entries into the HTTP cache that the client would not otherwise create. For example, consider a situation where a page is infected with malware. SafeBrowsing would kick in and prevent the request from going to the network and the infected page would not end up in the HTTP cache. Once the page was cleaned up and removed from SafeBrowsing, requests for that URL would go to the network and into the cache. However, if the server pushed the page into the HTTP cache, then after the page is cleaned up and removed from SafeBrowsing, the client would request the URL and the infected page would be served out of the HTTP cache. This seemed bad.

If SafeBrowsing detects that a request to https://evil.example.com/ is infected, does the request actually go out on the wire?

​No.​

Worth noting that it does on Android, where SafeBrowsing checks are done asynchronously, and behavior on desktop is changing with the advent of an out-of-process network stack.

Ryan Hamilton

Nov 2, 2017, 6:12:42 PM
to Anne van Kesteren, Mark Nottingham, Yoav Weiss, Jeffrey Yasskin, Bence Béky, blink-dev
Safe browsing is not the only reason, no. But it's an example (the only one I could remember offhand, admittedly :>) of a class of problems that crop up when things are written to the cache by the net stack without the intervention of higher layers of the code. 

Mark Nottingham

Nov 2, 2017, 6:27:16 PM
to Ryan Hamilton, Anne van Kesteren, Yoav Weiss, Jeffrey Yasskin, Bence Béky, blink-dev
So, I don't see any reason why you can't just patch in before a cache write. That's the most intuitive way to model it, rather than inventing a new cache -- especially when SB isn't part of the standard.

Also, AIUI SB is done on a per-origin level; if so, your example above doesn't hold together.

And, if SB checks are moving to async, it seems like you're going to have cache pollution problems to deal with all over the place, not just from push.

Cheers,

Ryan Hamilton

Nov 2, 2017, 6:35:13 PM
to Mark Nottingham, Anne van Kesteren, Yoav Weiss, Jeffrey Yasskin, Bence Béky, blink-dev
On Thu, Nov 2, 2017 at 3:27 PM, Mark Nottingham <mn...@mnot.net> wrote:
On 3 Nov 2017, at 9:12 am, Ryan Hamilton <r...@chromium.org> wrote:
>
> On Wed, Nov 1, 2017 at 10:50 PM, Anne van Kesteren <ann...@annevk.nl> wrote:
> On Thu, Nov 2, 2017 at 4:46 AM, Ryan Hamilton <r...@chromium.org> wrote:
> > That's true, but imagine that the user requests https://example.com/ which
> > links to https://example.com/other. The request to https://example.com/
> > would go to the wire and the server could try to push /other (which is
> > currently infected) as associated with /. SafeBrowsing would prevent the
> > request for /other from hitting the wire and hence the pushed (infected)
> > resource would not be pulled up into the HTTP cache. After a short period of
> > time it will expire from the H2 push cache.
> >
> > Does that make sense?
>
> If that is the sole reason, why invent a new caching mechanism rather
> than perform a SafeBrowsing check before putting pushed resources in
> the cache?
>
> Safe browsing is not the only reason, no. But it's an example (the only one I could remember offhand, admittedly :>) of a class of problems that crop up when things are written to the cache by the net stack without the intervention of higher layers of the code.

So, I don't see any reason why you can't just patch in before a cache write. That's the most intuitive way to model it, rather than inventing a new cache -- especially when SB isn't part of the standard.

Also, AIUI SB is done on a per-origin level; if so, your example above doesn't hold together.

​I don't believe that is correct. From https://developers.google.com/safe-browsing/v4/urls-hashing

For the host, the client will try at most five different strings. They are:
    • The exact hostname in the URL.
    • Up to four hostnames formed by starting with the last five components and successively removing the leading component. The top-level domain can be skipped. These additional hostnames should not be checked if the host is an IP address.

For the path, the client will try at most six different strings. They are:

    • The exact path of the URL, including query parameters.
    • The exact path of the URL, without query parameters.
    • The four paths formed by starting at the root (/) and successively appending path components, including a trailing slash.
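
The quoted rules can be sketched in code to show why a lookup is finer-grained than per-origin. This is a simplified illustration only: it skips the URL canonicalization and hashing steps that the Safe Browsing documentation also requires, and the function names are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def host_candidates(host):
    """Exact hostname plus up to four suffixes built from the last five components."""
    try:
        ipaddress.ip_address(host)
        return [host]  # no extra hostnames when the host is an IP address
    except ValueError:
        pass
    candidates = [host]
    tail = host.split(".")[-5:]
    # Drop leading components one at a time; stop before the bare top-level domain.
    for start in range(len(tail) - 1):
        suffix = ".".join(tail[start:])
        if suffix not in candidates:
            candidates.append(suffix)
    return candidates


def path_candidates(path, query):
    """Exact path with and without query, plus up to four directory prefixes."""
    candidates = []
    if query:
        candidates.append(path + "?" + query)
    candidates.append(path)
    prefix = "/"
    prefixes = [prefix]
    for directory in path.split("/")[1:-1]:
        prefix += directory + "/"
        prefixes.append(prefix)
    for candidate in prefixes[:4]:
        if candidate not in candidates:
            candidates.append(candidate)
    return candidates


def lookup_expressions(url):
    """All host/path combinations that would be checked for `url`."""
    parts = urlsplit(url)
    hosts = host_candidates(parts.hostname or "")
    paths = path_candidates(parts.path or "/", parts.query)
    return [host + path for host in hosts for path in paths]
```

For https://example.com/other, the expressions include both example.com/other and example.com/, so a list entry can flag /other while leaving / unflagged, which is the situation in the example earlier in the thread.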

Mark Nottingham

Nov 2, 2017, 8:30:42 PM
to Ryan Hamilton, Anne van Kesteren, Yoav Weiss, Jeffrey Yasskin, Bence Béky, blink-dev
Ah, OK - thanks. I think the other points made stand.

Cheers,

Ryan Hamilton

Nov 2, 2017, 10:47:16 PM
to Mark Nottingham, Anne van Kesteren, Yoav Weiss, Jeffrey Yasskin, Bence Béky, blink-dev
To back up, I came into this thread to address the assertion:

My understanding is that no one actually knows where the requirement for a separate push cache is coming from -- both the browser network engineers and browser security folks say it came from somewhere else, but upon examination there isn't a well-stated reason for it.

I was simply attempting to explain why the Chrome network engineers chose to have a separate push cache. Namely, that pushing straight into the cache at the HTTP layer results in entries in the cache for resources that the client may not request for policy reasons. Now sure, it might be possible to up call from the network layer into the loading/rendering layer at cache write time to see if a push really should go into the cache. But that's non-trivially complex, and as it will be asynchronous, you need a place for the pushed data to live while you're waiting for that call to complete. You might call such a waiting area the "push cache" or you might call it the "push map". Or, instead of doing the up-call, you could simply wait for the request to come in. Both cases result in pushed stream data sitting down in the network layer before being pulled up into the cache when instructed by the higher layers. The latter solution is much easier to implement. But either way, you still end up with the same holding area.
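
A rough sketch of the "wait for the request" variant, to make the holding area concrete. Everything here is hypothetical (the class, the `fetch` helper, the TTL value, and modelling the HTTP cache as a dict); it only illustrates the shape of the design, not Chrome's code.

```python
import time


class PushHoldingArea:
    """Hypothetical per-connection store for pushed responses awaiting a request."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds  # arbitrary value chosen for the sketch
        self._clock = clock
        self._entries = {}

    def on_push(self, url, response):
        self._entries[url] = (response, self._clock() + self._ttl)

    def claim(self, url):
        """Hand the pushed response to a request; None if absent or expired."""
        entry = self._entries.pop(url, None)
        if entry is None:
            return None
        response, deadline = entry
        return response if self._clock() < deadline else None


def fetch(url, holding_area, http_cache, allowed_by_policy):
    """Only requests that pass higher-layer policy can pull a push into the cache."""
    if not allowed_by_policy(url):
        return None  # blocked request: the push is never claimed and later expires
    if url in http_cache:
        return http_cache[url]
    response = holding_area.claim(url)
    if response is not None:
        http_cache[url] = response
    return response
```

Because the cache write happens inside `fetch`, a push for a URL that higher layers refuse to request never reaches the HTTP cache, without any up-call at push time. A real `fetch` would go to the network on a miss; that part is left out here.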

In any case, the possible change to async SafeBrowsing and the effects this has on the HTTP cache are an ... active topic of discussion at the moment. :) The WebRequest extension API is another example of such a higher layer interaction.

Cheers,

Ryan

Mark Nottingham

Nov 2, 2017, 11:02:16 PM
to Ryan Hamilton, Anne van Kesteren, Yoav Weiss, Jeffrey Yasskin, Bence Béky, blink-dev


> On 3 Nov 2017, at 1:47 pm, Ryan Hamilton <r...@chromium.org> wrote:
>
> To back up, I came into this thread to address the assertion:
>
> My understanding is that no one actually knows where the requirement for a separate push cache is coming from -- both the browser network engineers and browser security folks say it came from somewhere else, but upon examination there isn't a well-stated reason for it.

Fair enough. Thanks for the information, it's good to get that context.

> I was simply attempting to explain why the Chrome network engineers chose to have a separate push cache. Namely, that pushing straight into the cache at the HTTP layer results in entries in the cache for resources that the client may not request for policy reasons. Now sure, it might be possible to up call from the network layer into the loading/rendering layer at cache write time to see if a push really should go into the cache. But that's non-trivially complex, and as it will be asynchronous, you need a place for the pushed data to live while you're waiting for that call to complete. You might call such a waiting area the "push cache" or you might call it the "push map". Or, instead of doing the up-call, you could simply wait for the request to come in. Both cases result in pushed stream data sitting down in the network layer before being pulled up into the cache when instructed by the higher layers. The latter solution is much easier to implement. But either way, you still end up with the same holding area.

This leaves me wondering why the SafeBrowsing is considered to be at the loading/rendering layer, when it seems so well-suited to be a network-layer function.

An alternative approach would be to mark it "potentially dirty" upon cache insertion, updating the cache with "clean" or "dirty" (i.e., purging it) upon the SafeBrowsing call returning. This is the approach that's taken in similar situations in intermediary caches, and it works reasonably well at that scale. That avoids creating another layer of caching.
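
As a sketch of what I mean (hypothetical names throughout, and an in-memory dict standing in for the real cache), the entry is written immediately but is not served until the check comes back:

```python
class VerifyingCache:
    """Hypothetical cache whose pushed entries stay unusable until checked."""

    def __init__(self):
        self._entries = {}  # url -> [body, verified]

    def insert_pushed(self, url, body):
        self._entries[url] = [body, False]  # "potentially dirty"

    def on_verdict(self, url, is_clean):
        """Called when the (asynchronous) safety check for `url` completes."""
        entry = self._entries.get(url)
        if entry is None:
            return
        if is_clean:
            entry[1] = True
        else:
            del self._entries[url]  # "dirty": purge the entry

    def lookup(self, url):
        entry = self._entries.get(url)
        if entry is None or not entry[1]:
            return None  # missing or not yet verified
        return entry[0]
```

Whether an unverified entry should count as a miss or make the request wait for the verdict is a policy choice; the sketch treats it as a miss.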

If it's not obvious, a lot of the concern about this is because having another layer of caching makes understanding the system behaviour all the more difficult -- especially when that cache's behaviours aren't aligned with the others'.


> In any case, the possible change to async SafeBrowsing and the effects this has on the HTTP cache are an ... active topic of discussion at the moment. :) The WebRequest extension API is another example of such a higher layer interaction.

Interesting. That might be a better illustration than SafeBrowsing.

In VCL-land, we have a separate callback for inspecting/modifying a response before (possible) cache insertion. I can't help but wonder if browsers need something similar. I know ServiceWorker can do it with *its* cache, but while there's still an HTTP cache in the client, it needs to be accounted for too.

Cheers,

Ryan Hamilton

Nov 2, 2017, 11:37:54 PM
to Mark Nottingham, Anne van Kesteren, Yoav Weiss, Jeffrey Yasskin, Bence Béky, blink-dev
On Thu, Nov 2, 2017 at 8:02 PM, Mark Nottingham <mn...@mnot.net> wrote:
> On 3 Nov 2017, at 1:47 pm, Ryan Hamilton <r...@chromium.org> wrote:
> I was simply attempting to explain why the Chrome network engineers chose to have a separate push cache. Namely, that pushing straight into the cache at the HTTP layer results in entries in the cache for resources that the client may not request for policy reasons. Now sure, it might be possible to up call from the network layer into the loading/rendering layer at cache write time to see if a push really should go into the cache. But that's non-trivially complex, and as it will be asynchronous, you need a place for the pushed data to live while you're waiting for that call to complete. You might call such a waiting area the "push cache" or you might call it the "push map". Or, instead of doing the up-call, you could simply wait for the request to come in. Both cases result in pushed stream data sitting down in the network layer before being pulled up into the cache when instructed by the higher layers. The latter solution is much easier to implement. But either way, you still end up with the same holding area.

This leaves me wondering why the SafeBrowsing is considered to be at the loading/rendering layer, when it seems so well-suited to be a network-layer function.

Hm. I don't know the answer to that. I wonder if it might be related to the fact that SafeBrowsing issues requests, and we typically avoid issuing requests from inside the network stack? But I'm just guessing. (Also a quick perusal of the code suggests that it currently has pieces which run on the UI thread and is aware of things like Navigations. I suspect that someone with more knowledge of SafeBrowsing would probably be able to shed some light).

An alternative approach would be to mark it "potentially dirty" upon cache insertion, updating the cache with "clean" or "dirty" (i.e., purging it) upon the SafeBrowsing call returning. This is the approach that's taken in similar situations in intermediary caches, and it works reasonably well at that scale. That avoids creating another layer of caching.

*nod* Though see the issue about push bandwidth below.
 
If it's not obvious, a lot of the concern about this is because having another layer of caching makes understanding the system behaviour all the more difficult -- especially when that cache's behaviours aren't aligned with the others'.

​Agreed!​ Very happy to see effort in this space to ensure that the behaviors are easily understandable. 

> In any case, the possible change to async SafeBrowsing and the effects this has on the HTTP cache are an ... active topic of discussion at the moment. :) The WebRequest extension API is another example of such a higher layer interaction.

Interesting. That might be a better illustration than SafeBrowsing.

​Fair enough. (SB was the only one I could remember initially :>)

I finally found the notes from the meeting we had about this. It also pointed out that pushing straight into the cache raises the potential for a server to consume a virtually unlimited volume of data, even if the user navigates away from a page. When navigating away from a page, any in-progress requests are killed. But if a resource is being pushed into cache without being associated with an active request, it won't be killed and that push will continue to use bandwidth with no way for the user to make it stop. (Clearly the pushed stream was associated with an explicitly requested stream at the H2 layer when it was promised, but that request may well complete before navigating away.)

In VCL-land, we have a separate callback for inspecting/modifying a response before (possible) cache insertion. I can't help but wonder if browsers need something similar. I know ServiceWorker can do it with *its* cache, but while there's still an HTTP cache in the client, it needs to be accounted for too.

​There are caches everywhere! :) ServiceWorker caches, Blink caches, HTTP caches, Push caches.


Cheers,

Ryan

Jeffrey Yasskin

Nov 3, 2017, 12:27:55 PM
to Ryan Hamilton, Mark Nottingham, Anne van Kesteren, Yoav Weiss, b...@chromium.org, blink-dev, Jake Archibald
On Thu, Nov 2, 2017 at 7:47 PM Ryan Hamilton <r...@chromium.org> wrote:
To back up, I came into this thread to address the assertion:

My understanding is that no one actually knows where the requirement for a separate push cache is coming from -- both the browser network engineers and browser security folks say it came from somewhere else, but upon examination there isn't a well-stated reason for it.

I was simply attempting to explain why the Chrome network engineers chose to have a separate push cache. Namely, that pushing straight into the cache at the HTTP layer results in entries in the cache for resources that the client may not request for policy reasons. Now sure, it might be possible to up call from the network layer into the loading/rendering layer at cache write time to see if a push really should go into the cache.

This actually strikes me as a possibility.
<naive-speculation>
Call up, not at cache write time, but when the network layer receives the PUSH_PROMISE (semantically, a Request) that opens the response stream. This PUSH_PROMISE is associated with an existing fetch operation "parent". If "parent" is owned by a JS fetch() call, its observer (https://gist.github.com/slightlyoff/18dc42ae00768c23fbc4c5097400adfb#gistcomment-2227534) receives a cancelable event for the PUSH_PROMISE. If it's canceled, that RST_STREAMs the pushed stream.

Otherwise, "parent"'s page fetches the pushed request, which schedules it to be canceled if the page navigates, and runs the request through any frontend-based SafeBrowsing checks before directing the response stream toward the cache. If "parent" is owned by the browser itself, it just always does the default action to fetch the pushed request.

This default fetch can also be the place to incorporate the Vary-checking in this thread's Intent: if a new request to the pushed request's URL wouldn't match the pushed Vary header, it could cancel instead of actually fetching.

There's still a period where the network stack has to store the response stream before the page has taken ownership, but I think this avoids exposing that complexity to web authors.
</naive-speculation>

What have I missed?

Jeffrey

Ryan Hamilton

Nov 3, 2017, 6:32:05 PM
to Jeffrey Yasskin, Mark Nottingham, Anne van Kesteren, Yoav Weiss, Bence Béky, blink-dev, Jake Archibald
On Fri, Nov 3, 2017 at 9:27 AM, Jeffrey Yasskin <jyas...@google.com> wrote:
On Thu, Nov 2, 2017 at 7:47 PM Ryan Hamilton <r...@chromium.org> wrote:
To back up, I came into this thread to address the assertion:

My understanding is that no one actually knows where the requirement for a separate push cache is coming from -- both the browser network engineers and browser security folks say it came from somewhere else, but upon examination there isn't a well-stated reason for it.

I was simply attempting to explain why the Chrome network engineers chose to have a separate push cache. Namely, that pushing straight into the cache at the HTTP layer results in entries in the cache for resources that the client may not request for policy reasons. Now sure, it might be possible to up call from the network layer into the loading/rendering layer at cache write time to see if a push really should go into the cache.

This actually strikes me as a possibility.
<naive-speculation>
Call up, not at cache write time, but when the network layer receives the PUSH_PROMISE (semantically, a Request) that opens the response stream. This PUSH_PROMISE is associated with an existing fetch operation "parent". If "parent" is owned by a JS fetch() call, its observer (https://gist.github.com/slightlyoff/18dc42ae00768c23fbc4c5097400adfb#gistcomment-2227534) receives a cancelable event for the PUSH_PROMISE. If it's canceled, that RST_STREAMs the pushed stream.

Otherwise, "parent"'s page fetches the pushed request, which schedules it to be canceled if the page navigates, and runs the request through any frontend-based SafeBrowsing checks before directing the response stream toward the cache. If "parent" is owned by the browser itself, it just always does the default action to fetch the pushed request.

This default fetch can also be the place to incorporate the Vary-checking in this thread's Intent: if a new request to the pushed request's URL wouldn't match the pushed Vary header, it could cancel instead of actually fetching.

There's still a period where the network stack has to store the response stream before the page has taken ownership, but I think this avoids exposing that complexity to web authors.
</naive-speculation>

What have I missed?

​Sure, all of that could definitely be done. Or we could simply keep the pushed stream down at the HTTP/2 layer and wait for the request to rendezvous with it. This is much simpler.

Jeffrey Yasskin

Nov 3, 2017, 6:38:34 PM
to Ryan Hamilton, Mark Nottingham, Anne van Kesteren, Yoav Weiss, b...@chromium.org, blink-dev, Jake Archibald
It's simpler for the implementation, but seems to be less simple for web developers (e.g. https://jakearchibald.com/2017/h2-push-tougher-than-i-thought/). In our priority of stakeholders, the developers beat the implementers.

Jeffrey

Ryan Hamilton

Nov 3, 2017, 7:28:49 PM
to Jeffrey Yasskin, Mark Nottingham, Anne van Kesteren, Yoav Weiss, Bence Béky, blink-dev, Jake Archibald
It's not obvious to me that these two implementations have different user-visible behaviors. Can you say more? (I'm probably overlooking something you said)​

Jeffrey Yasskin

Nov 3, 2017, 7:51:28 PM
to Ryan Hamilton, Mark Nottingham, Anne van Kesteren, Yoav Weiss, b...@chromium.org, blink-dev, Jake Archibald
I *think*, although I could be wrong, that passing every push up to the page, which always either fetches it into the HTTP cache or cancels it, eliminates the developer-visible push cache. That in turn gets rid of two troublesome things Jake pointed out:

* If the connection closes, bye bye push cache
* Requests without credentials use a separate connection

Jeffrey

Yoav Weiss

Nov 6, 2017, 8:58:24 AM
to Jeffrey Yasskin, Ryan Hamilton, Mark Nottingham, Anne van Kesteren, b...@chromium.org, blink-dev, Jake Archibald
On Fri, Nov 3, 2017 at 11:51 PM Jeffrey Yasskin <jyas...@google.com> wrote:
On Fri, Nov 3, 2017 at 4:28 PM Ryan Hamilton <r...@chromium.org> wrote:

I *think*, although I could be wrong, that passing every push up to the page, which always either fetches it into the HTTP cache or cancels it, eliminates the developer-visible push cache. That in turn gets rid of two troublesome things Jake pointed out:

* If the connection closes, bye bye push cache
* Requests without credentials use a separate connection

 
While getting rid of the above would indeed be great, I'm not sure the conclusion must be that we need to store pushed resources in the HTTP cache. I also think this is a wider-scoped discussion which doesn't necessarily need to block the work that Bence is proposing, which will lead to arguably better push behavior in the interim.
What I'm missing here is a way to assess the risk involved. While push changes are likely to bear little breakage risk, they can lead to spurious downloads, and can result in perf regressions.

A few points that bother me:
* As I commented in the doc, I think the way Range request support will be implemented will increase spurious pushes. I believe we need to gather data on the presence of such pushes today before making this compromise.
*  It's not clear to me what the use case is for `Vary` headers in pushed responses. Generally, if you need content negotiation, server push is not the right tool for the job (unless that negotiation gets the same result in 99% of the cases, e.g. "Content-Encoding"). Bence, could you comment on which user scenarios are likely to improve as a result of this change?
* Range request push seems useful mostly for media streaming scenarios. Bence, is that the use-case we're targeting here? 

Cheers :)
Yoav 

Jeffrey Yasskin

Nov 6, 2017, 9:14:08 AM11/6/17
to Yoav Weiss, Ryan Hamilton, Mark Nottingham, Anne van Kesteren, b...@chromium.org, blink-dev, Jake Archibald
On Mon, Nov 6, 2017 at 6:58 AM Yoav Weiss <yo...@yoav.ws> wrote:
On Fri, Nov 3, 2017 at 11:51 PM Jeffrey Yasskin <jyas...@google.com> wrote:
On Fri, Nov 3, 2017 at 4:28 PM Ryan Hamilton <r...@chromium.org

I *think*, although I could be wrong, that passing every push up to the page, which always either fetches it into the HTTP cache or cancels it, eliminates the developer-visible push cache. That in turn gets rid of two troublesome things Jake pointed out:

* If the connection closes, bye bye push cache
* Requests without credentials use a separate connection

 
While getting rid of the above would indeed be great, I'm not sure the conclusion must be that we need to store pushed resources in the HTTP cache. I also think this is a wider-scoped discussion which doesn't necessarily need to block the work that Bence is proposing, which will lead to arguably better push behavior in the interim.

I agree. For Bence's work, I just want someone to specify what Chrome is actually doing in response to fetches. It doesn't need to be the long-term consensus behavior.
 
What I'm missing here is a way to assess the risk involved. While push changes are likely to bear little breakage risk, they can lead to spurious downloads, and can result in perf regressions.

A few points that bother me:
* As I commented in the doc, I think the way Range request support will be implemented will increase spurious pushes. I believe we need to gather data on the presence of such pushes today before making this compromise.
*  It's not clear to me what the use case is for `Vary` headers in pushed responses. Generally, if you need content negotiation, server push is not the right tool for the job (unless that negotiation gets the same result in 99% of the cases, e.g. "Content-Encoding"). Bence, could you comment on which user scenarios are likely to improve as a result of this change?

Even if there's no use case for including Vary headers in pushed responses, it seems to make sense to either do what Bence is proposing or drop push responses with Vary headers entirely, to prevent the ecosystem from starting to include them with nonsensical values.

Jeffrey

Bence Béky

Nov 6, 2017, 10:15:51 AM11/6/17
to blink-dev, ckr...@chromium.org, Yoav Weiss, Ryan Hamilton, Mark Nottingham, Anne van Kesteren, Jake Archibald, Jeffrey Yasskin
Hi,

Thank you for the valuable feedback.

I agree with Jeffrey that it is important for the ecosystem not to
match requests to pushed streams with incompatible Vary values. While
I am not aware of any use case, I know that this is properly
implemented for QUIC pushes in Chrome. Cc'ing Buck who implemented
that: Buck, are you aware of any specific use cases?

I can also add a metric to see what percentage of pushed response
headers has a Vary header. If it is exceedingly low, it might be a
good option to reject such pushes for the time being as proposed by
Jeffrey, until a compelling use case arises. However, I would not be
surprised if it actually was pretty high due to Content-Encoding.

Yoav: I was told that game asset loading or DASH streaming could
potentially benefit from non-overlapping Range pushes for the same
URL. Again, if wasted download bandwidth is a concern, measurements
can be made before making this change.

Thank you,

Bence

ckr...@chromium.org

Nov 6, 2017, 1:08:31 PM11/6/17
to blink-dev, ckr...@chromium.org, yo...@yoav.ws, r...@chromium.org, mn...@mnot.net, ann...@annevk.nl, jakear...@google.com, jyas...@google.com


On Monday, November 6, 2017 at 7:15:51 AM UTC-8, Bence Béky wrote:
Hi,

Thank you for the valuable feedback.

I agree with Jeffrey that it is important for the ecosystem not to
match requests to pushed streams with incompatible Vary values.  While
I am not aware of any use case, I know that this is properly
implemented for QUIC pushes in Chrome.  Cc'ing Buck who implemented
that: Buck, are you aware of any specific use cases?


No.  The inclusion in QUIC's implementation wasn't driven by a use case, but rather was based on an e-mail discussion I had with Patrick McManus about RFC 7540.

Yoav Weiss

Nov 6, 2017, 2:59:07 PM11/6/17
to ckr...@chromium.org, blink-dev, r...@chromium.org, mn...@mnot.net, ann...@annevk.nl, jakear...@google.com, jyas...@google.com, Patrick McManus
On Mon, Nov 6, 2017 at 10:08 AM <ckr...@chromium.org> wrote:


On Monday, November 6, 2017 at 7:15:51 AM UTC-8, Bence Béky wrote:
Hi,

Thank you for the valuable feedback.

I agree with Jeffrey that it is important for the ecosystem not to
match requests to pushed streams with incompatible Vary values.  While
I am not aware of any use case, I know that this is properly
implemented for QUIC pushes in Chrome.  Cc'ing Buck who implemented
that: Buck, are you aware of any specific use cases?


No.  The inclusion in QUIC's implementation wasn't driven by a use case, but rather was based on an e-mail discussion I had with Patrick McManus about RFC 7540.


Changing our behavior without a clear use-case is fine. I was just wondering if there was a particular reason you chose to tackle this issue.

Also, does a request from Patrick McManus to change behavior here mean that Firefox will also align on similar behavior? Is there an open issue for them on that front?

(CCing Patrick)

 
I can also add a metric to see what percentage of pushed response
headers has a Vary header.  If it is exceedingly low, it might be a
good option to reject such pushes for the time being as proposed by
Jeffrey, until a compelling use case arises.  However, I would not be
surprised if it actually was pretty high due to Content-Encoding.

Yeah, I would hope Content-Encoding negotiation can happen on the initial request before anything is pushed on the connection, but that may not always be the case. Gathering that data can be interesting, but I don't know if we should block shipping on it.
  

Yoav: I was told that game asset loading or DASH streaming could
potentially benefit from non-overlapping Range pushes for the same
URL.  

That makes sense.
 
Again, if wasted download bandwidth is a concern, measurements
can be made before making this change.

Ideally, I'd love to see Range request support that still RSTs push promises on formerly-pushed URLs which have no range in their request.
That kind of implementation will have no spurious-download concerns while still covering the Range request use-case.
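
To make the idea concrete, here is a minimal sketch of such a filter. It is an illustration only, not Chrome's code: the class name `PushPromiseFilter` and its method are hypothetical, and resetting a promise that repeats an identical range is my own assumption on top of the proposal above.

```python
from typing import Optional, Set, Tuple


class PushPromiseFilter:
    """Hypothetical per-connection filter deciding which push promises to reset."""

    def __init__(self) -> None:
        # (url, range) pairs and plain URLs that were already promised.
        self._seen_pairs: Set[Tuple[str, Optional[str]]] = set()
        self._seen_urls: Set[str] = set()

    def should_reset(self, url: str, range_header: Optional[str]) -> bool:
        """Return True if the client should send RST_STREAM for this promise."""
        if range_header is None:
            # No Range in the promised request: reset if the URL was pushed before.
            duplicate = url in self._seen_urls
        else:
            # Assumption: a ranged promise is only a duplicate if the same
            # range was already promised for this URL.
            duplicate = (url, range_header) in self._seen_pairs
        self._seen_pairs.add((url, range_header))
        self._seen_urls.add(url)
        return duplicate
```

With this shape, a server that pushes common.css on every page has the second promise reset, while non-overlapping ranges of one media URL are all accepted.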

Charles 'Buck' Krasic

Nov 6, 2017, 3:04:20 PM11/6/17
to Yoav Weiss, blink-dev, Ryan Hamilton, Mark Nottingham, ann...@annevk.nl, Jake Archibald, jyas...@google.com, Patrick McManus
On Mon, Nov 6, 2017 at 11:58 AM, Yoav Weiss <yo...@yoav.ws> wrote:



Changing our behavior without a clear use-case is fine. I was just wondering if there was a particular reason you chose to tackle this issue.

Also, does a request from Patrick McManus to change behavior here mean that Firefox will also align on similar behavior? Is there an open issue for them on that front?

(CCing Patrick)

Clarification.  I didn't see it as a change in behavior.  I was doing the initial implementation of server push in chromium QUIC, and I was just trying to understand the spec.
 
 




--
Charles 'Buck' Krasic | Software Engineer | ckr...@google.com | +1 (408) 412-1141

Patrick McManus

Nov 6, 2017, 3:11:49 PM11/6/17
to Yoav Weiss, ckr...@chromium.org, blink-dev, Ryan Hamilton, mnot, Anne van Kesteren, Jake Archibald, jyas...@google.com, Patrick McManus
yes - the firefox push cache should take vary into account.. (I'm not saying someone is typing the code for that atm, but that's where we'll want to be..)

Yoav Weiss

Nov 6, 2017, 3:11:55 PM11/6/17
to Charles 'Buck' Krasic, blink-dev, Ryan Hamilton, Mark Nottingham, ann...@annevk.nl, Jake Archibald, jyas...@google.com, Patrick McManus
On Mon, Nov 6, 2017 at 12:04 PM Charles 'Buck' Krasic <ckr...@google.com> wrote:

Clarification.  I didn't see it as a change in behavior.  I was doing the initial implementation of server push in chromium QUIC, and I was just trying to understand the spec.

OK. I agree it's not a change in behavior for QUIC. I now understand that the discussion with Patrick revolved around the QUIC behavior, so that doesn't necessarily mean Firefox aims to align their H2 behavior. Is that correct?

Yoav Weiss

Nov 6, 2017, 3:21:58 PM11/6/17
to Patrick McManus, ckr...@chromium.org, blink-dev, Ryan Hamilton, mnot, Anne van Kesteren, Jake Archibald, jyas...@google.com
On Mon, Nov 6, 2017 at 12:11 PM Patrick McManus <mcm...@ducksong.com> wrote:
yes - the firefox push cache should take vary into account.. (I'm not saying someone is typing the code for that atm, but that's where we'll want to be..)

Thanks, Patrick. That's valuable input! :) 

Mark Nottingham

Nov 6, 2017, 8:11:38 PM11/6/17
to Yoav Weiss, Jeffrey Yasskin, Ryan Hamilton, Anne van Kesteren, b...@chromium.org, blink-dev, Jake Archibald
On 7 Nov 2017, at 12:58 am, Yoav Weiss <yo...@yoav.ws> wrote:
>
> * As I commented in the doc, I think the way Range request support will be implemented will increase spurious pushes. I believe we need to gather data on the presence of such pushes today before making this compromise.

I agree, although I'd call it unmatched pushes, not spurious (i.e., it's not the pushers' fault). Range allows servers to respond with *more* than what was requested, in case serving that is more efficient; e.g., if the client requests bytes 1-10 and 20-30, the server can respond 1-30. So only using pushes which exactly match can result in less than optimal behaviour.

If pushes updated the cache, this wouldn't be an issue, of course.
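
A small sketch of the difference between exact matching and coverage, using the numbers above. The helper names are hypothetical, the parser only handles explicit `first-last` byte ranges, and I'm assuming the pushed range is taken from the promised request's Range header.

```python
from typing import List, Tuple

ByteRange = Tuple[int, int]  # inclusive (first, last) byte positions


def parse_range(value: str) -> List[ByteRange]:
    """Parse a 'bytes=a-b,c-d' Range value; open-ended ranges are not handled."""
    unit, _, spec = value.partition("=")
    if unit.strip().lower() != "bytes" or not spec:
        raise ValueError(f"unsupported Range value: {value!r}")
    ranges = []
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        ranges.append((int(first), int(last)))
    return ranges


def exact_match(requested: str, pushed: str) -> bool:
    """True only if both values describe the same list of ranges."""
    return parse_range(requested) == parse_range(pushed)


def covers(requested: str, pushed: str) -> bool:
    """True if every requested range lies inside some pushed range."""
    pushed_ranges = parse_range(pushed)
    return all(
        any(p_first <= first and last <= p_last for p_first, p_last in pushed_ranges)
        for first, last in parse_range(requested)
    )
```

Here `exact_match("bytes=1-10,20-30", "bytes=1-30")` is false while `covers` on the same inputs is true, which is the push an exact-match rule would leave unused.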


> * It's not clear to me what the use case is for `Vary` headers in pushed responses. Generally, if you need content negotiation, server push is not the right tool for the job (unless that negotiation gets the same result in 99% of the cases, e.g. "Content-Encoding"). Bence, could you comment on which user scenarios are likely to improve as a result of this change?

Besides Content-Encoding, there's Client Hints, of course; you might want to chat with Ilya about how he sees CH and push interacting.

My understanding is that sending Vary is quite common on pushes; you might want to look at what common implementations like mod_h2 do:
https://github.com/icing/mod_h2/issues/86

Also, bringing this up in the HTTP WG would probably result in some data from other implementations. *nudge, nudge*

Yoav Weiss

Nov 7, 2017, 6:14:10 PM11/7/17
to Mark Nottingham, Jeffrey Yasskin, Ryan Hamilton, Anne van Kesteren, b...@chromium.org, blink-dev, Jake Archibald
On Mon, Nov 6, 2017 at 5:11 PM Mark Nottingham <mn...@mnot.net> wrote:
On 7 Nov 2017, at 12:58 am, Yoav Weiss <yo...@yoav.ws> wrote:
>
> * As I commented in the doc, I think the way Range request support will be implemented will increase spurious pushes. I believe we need to gather data on the presence of such pushes today before making this compromise.

I agree, although I'd call it unmatched pushes, not spurious (i.e., it's not the pushers' fault).

The scenario I'm concerned with is one where the pusher is pushing the same resource multiple times (because there's some logic telling it to push e.g. common.css on all pages, and it doesn't take the fact that it already pushed it into account).
  
Range allows servers to respond with *more* than what was requested, in case serving that is more efficient; e.g., if the client requests bytes 1-10 and 20-30, the server can respond 1-30. So only using pushes which exactly match can result in less than optimal behaviour.

That is a different scenario than what I had in mind, and it's an interesting one as well. Solving that would require implementing range matching in the push cache. (at least in current architecture)
 

If pushes updated the cache, this wouldn't be an issue, of course.


> *  It's not clear to me what the use case is for `Vary` headers in pushed responses. Generally, if you need content negotiation, server push is not the right tool for the job (unless that negotiation gets the same result in 99% of the cases, e.g. "Content-Encoding"). Bence, could you comment on which user scenarios are likely to improve as a result of this change?

Besides Content-Encoding, there's Client Hints, of course; you might want to chat with Ilya about how he sees CH and push interacting.

My understanding is that sending Vary is quite common on pushes; you might want to look at what common implementations like mod_h2 do:
  https://github.com/icing/mod_h2/issues/86

Thanks! I guess that Content-Negotiation based on the initial request makes sense, and then Vary is there to make sure that if conditions change, a different resource would be requested in the future.
 


Also, bringing this up in the HTTP WG would probably result in some data from other implementations. *nudge, nudge*

Will do! :)

Jochen Eisinger

Nov 16, 2017, 12:48:16 PM11/16/17
to Yoav Weiss, msr...@chromium.org, Mark Nottingham, Jeffrey Yasskin, Ryan Hamilton, Anne van Kesteren, b...@chromium.org, blink-dev, Jake Archibald


Bence Béky

May 12, 2018, 2:41:04 PM5/12/18
to Jochen Eisinger, Yoav Weiss, msr...@chromium.org, Mark Nottingham, Jeffrey Yasskin, Ryan Hamilton, Anne van Kesteren, blink-dev, Jake Archibald
Hi,

I just wrapped up the implementation on this and closed https://crrev.com/554220, so I'm circling back to this thread.  Sorry I didn't wait for LGTMs, I remembered that we mostly reached consensus on the important issues in principle, and totally forgot that I need to wait for formal approval.  My apologies, and thanks to Yoav for reminding me.  Owners, please approve retroactively, or voice your concerns.

Vary header matching is now implemented.  There is a histogram that I created early enough that we already have data from Stable channel.  It shows that there is a non-empty Vary header in more than a quarter of the cases (often it's accept-encoding, but in some cases there are other fields in it too).  There is another histogram, currently under review, that will allow us to distinguish between accepted because Vary header matches, accepted because there is no Vary header, rejected because of Vary mismatch, and many other cases.  This will help understand the impact of this project.

As originally proposed, response headers are validated even if they arrive after the client dispatches a request with matching URL and method (a quarter of total matches between client request and pushed stream; in the remaining three quarters, pushed response headers arrive before client request).  However, upon mismatch in the response headers (due to Range or Vary), the pushed stream is reset.  In particular, the case of multiple range requests discussed above is not handled, but the histogram (under review) will have a dedicated bucket for range mismatch, allowing us to see if this happens in the wild.  While I found this use case interesting, I decided to focus on getting the implementation done, requiring exact range match to start with, leaving the option of further complexity open.  As it is, this is definitely an improvement over the previous state, where any pushed stream was matched to a request as long as the URL matched (and certificate was valid etc.).
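
For readers who want the matching rules in code form, here is a rough sketch. It is not the Chromium implementation: `PushOutcome`, its bucket names, and `validate_push` are hypothetical, and URL, method and certificate checks are assumed to have passed already. Treating `Vary: *` as a mismatch follows the HTTP caching rule that `*` never matches.

```python
from enum import Enum
from typing import Dict, Mapping


class PushOutcome(Enum):
    # Hypothetical bucket names modelled on the histogram cases listed above.
    ACCEPTED_NO_VARY = "accepted: no Vary header"
    ACCEPTED_VARY_MATCH = "accepted: Vary header matches"
    REJECTED_VARY_MISMATCH = "rejected: Vary mismatch"
    REJECTED_RANGE_MISMATCH = "rejected: Range mismatch"


def _normalize(headers: Mapping[str, str]) -> Dict[str, str]:
    """Lower-case header names so lookups are case-insensitive."""
    return {name.lower(): value.strip() for name, value in headers.items()}


def validate_push(
    client_request: Mapping[str, str],
    pushed_request: Mapping[str, str],
    pushed_response: Mapping[str, str],
) -> PushOutcome:
    """Decide whether a pushed stream may serve the client request."""
    client = _normalize(client_request)
    promised = _normalize(pushed_request)
    response = _normalize(pushed_response)

    # Range: only an exact match (including "no Range on either side") passes.
    if client.get("range") != promised.get("range"):
        return PushOutcome.REJECTED_RANGE_MISMATCH

    vary = [f.strip().lower() for f in response.get("vary", "").split(",") if f.strip()]
    if not vary:
        return PushOutcome.ACCEPTED_NO_VARY
    if "*" in vary:
        return PushOutcome.REJECTED_VARY_MISMATCH
    # Every request header named in Vary must agree between the two requests.
    for field in vary:
        if client.get(field) != promised.get(field):
            return PushOutcome.REJECTED_VARY_MISMATCH
    return PushOutcome.ACCEPTED_VARY_MATCH
```

Any rejected outcome corresponds to resetting the pushed stream; because the decision needs the response headers, it has to wait for them when they arrive after the client request.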

Cheers,

Bence

Yoav Weiss

May 12, 2018, 3:00:59 PM5/12/18
to Bence Béky, Jochen Eisinger, msr...@chromium.org, Mark Nottingham, Jeffrey Yasskin, Ryan Hamilton, Anne van Kesteren, blink-dev, Jake Archibald
Thanks for updating this thread!

Retroactive LGTM1

On Sat, May 12, 2018 at 8:41 PM Bence Béky <b...@chromium.org> wrote:
Hi,

I just wrapped up the implementation on this and closed https://crrev.com/554220, so I'm circling back to this thread.  Sorry I didn't wait for LGTMs, I remembered that we mostly reached consensus on the important issues in principle, and totally forgot that I need to wait for formal approval.  My apologies, and thanks to Yoav for reminding me.  Owners, please approve retroactively, or voice your concerns.

Vary header matching is now implemented.  There is a histogram that I created early enough that we already have data from Stable channel.  It shows that there is a non-empty Vary header in more than a quarter of the cases (often it's accept-encoding, but in some cases there are other fields in it too).  There is another histogram, currently under review, that will allow us to distinguish between accepted because Vary header matches, accepted because there is no Vary header, rejected because of Vary mismatch, and many other cases.  This will help understand the impact of this project.

As originally proposed, response headers are validated even if they arrive after the client dispatches a request with matching URL and method (a quarter of total matches between client request and pushed stream; in the remaining three quarters, pushed response headers arrive before client request).  However, upon mismatch in the response headers (due to Range or Vary), the pushed stream is reset.

Happy to see the case of late response headers is also covered, making this matching deterministic. Resetting the stream in case of a mismatch makes perfect sense.
 
In particular, the case of multiple range requests discussed above is not handled, but the histogram (under review) will have a dedicated bucket for range mismatch, allowing us to see if this happens in the wild.  While I found this use case interesting, I decided to focus on getting the implementation done, requiring exact range match to start with, leaving the option of further complexity open. 

Implementing simple logic for Range requests now and collecting data that will enable us to add more complex logic in the future as needed makes sense.
 
As it is, this is definitely an improvement over the previous state, where any pushed stream was matched to a request as long as the URL matched (and certificate was valid etc.).

I agree that this is a significant improvement over the previous status quo. Thanks for making it happen!

mk...@chromium.org

May 24, 2018, 6:17:00 AM5/24/18
to blink-dev, b...@chromium.org, joc...@chromium.org, msr...@chromium.org, mn...@mnot.net, jyas...@google.com, r...@chromium.org, ann...@annevk.nl, jakear...@google.com
On Saturday, May 12, 2018 at 9:00:59 PM UTC+2, Yoav Weiss wrote:
Thanks for updating this thread!

Retroactive LGTM1

Equally retroactive LGTM2. The behavior that landed on ToT is better than the status quo: it aligns our network stack with Safari's (and aligns with our own behavior for QUIC), and gives developers more clarity about what to expect when pushing responses. There's some evidence that developers actually ran into this in the wild, so I'm glad to see it fixed.

It's not clear to me, however, what the standards-side of this looks like. Bence, you started a thread on the HTTP WG's mailing list last year (https://lists.w3.org/Archives/Public/ietf-http-wg/2017OctDec/0202.html), and it seems to have gone nowhere. Do you think specification changes are necessary, or do you think this change aligns our behavior with the existing spec?
 

On Sat, May 12, 2018 at 8:41 PM Bence Béky <b...@chromium.org> wrote:
Hi,

I just wrapped up the implementation on this and closed https://crrev.com/554220, so I'm circling back to this thread.  Sorry I didn't wait for LGTMs, I remembered that we mostly reached consensus on the important issues in principle, and totally forgot that I need to wait for formal approval.  My apologies, and thanks to Yoav for reminding me.  Owners, please approve retroactively, or voice your concerns.

For posterity, the link should have been https://bugs.chromium.org/p/chromium/issues/detail?id=554220. :)

Philip Jägenstedt

May 24, 2018, 6:23:55 AM5/24/18
to Mike West, blink-dev, Bence Béky, joc...@chromium.org, msr...@chromium.org, mn...@mnot.net, jyas...@google.com, r...@chromium.org, ann...@annevk.nl, jakear...@google.com
Retroactive rubberstamp LGTM3

Bence Béky

May 24, 2018, 10:46:49 AM5/24/18
to mk...@chromium.org, blink-dev, Jochen Eisinger, msr...@chromium.org, Mark Nottingham, Jeffrey Yasskin, Ryan Hamilton, Anne van Kesteren, Jake Archibald
Hi Mike,

Good question.  I do not think that the current specification prescribes the desired behavior.  RFC 7540 states in Section 8.2.1 that "Server push is semantically equivalent to a server responding to a request", but there is no mention of the three-way relationship between client request, pushed request, and pushed response.  Most developers think about the collection of unmatched pushed resources as an in-memory cache, but for example the RFC explicitly states that "Pushed responses that are not cacheable MUST NOT be stored by any HTTP cache. They MAY be made available to the application separately.", that is, they can be kept in this in-memory cache but not in the regular HTTP cache.  So at least in this respect usual cache semantics do not apply.  And as far as range request goes, all bets are off: one could justify implementing a complex cache that serves the pushed part of a client request from the pushed response, and synthesizes a request for the missing part.  I did the second most simple solution of only serving exact matches (the simplest would have been to reject all pushed range responses), and I argue that this is sane and permitted by the specs.

So I think that this behavior is in compliance with the specification, in large part because the specification is vague.  If one wanted to spell out the desired behavior in detail, I guess probably the best place for this would be the Fetch spec.

Bence

Mark Nottingham

May 24, 2018, 8:21:23 PM5/24/18
to Bence Béky, mk...@chromium.org, blink-dev, Jochen Eisinger, msr...@chromium.org, Jeffrey Yasskin, Ryan Hamilton, Anne van Kesteren, Jake Archibald
Hi Bence,

> On 25 May 2018, at 12:46 am, Bence Béky <b...@chromium.org> wrote:
>
> Hi Mike,
>
> Good question. I do not think that the current specification prescribes the desired behavior. RFC 7540 states in Section 8.2.1 that "Server push is semantically equivalent to a server responding to a request", but there is no mention of the three-way relationship between client request, pushed request, and pushed response.

The intent of that was that a pushed response is stored in a HTTP cache (the separate push cache wasn't contemplated then, and AFAIK we're still not sure why it's necessary; see <https://github.com/whatwg/fetch/issues/354>), and since Vary needs the request headers, the pushed request must "match" it. The current request that you're about to make can then be matched (or not) to the pushed response, using the pushed request to guide the application of Vary.

> Most developers think about the collection of unmatched pushed resources as an in-memory cache, but for example the RFC explicitly states that "Pushed responses that are not cacheable MUST NOT be stored by any HTTP cache. They MAY be made available to the application separately.", that is, they can be kept in this in-memory cache but not in the regular HTTP cache. So at least in this respect usual cache semantics do not apply.

Yes, but I don't think you can infer too much from that; if something's not cacheable, the cache doesn't interact with it.

> And as far as range request goes, all bets are off: one could justify implementing a complex cache that serves the pushed part of a client request from the pushed response

This is not uncommon in intermediary caches, FWIW...

> , and synthesizes a request for the missing part. I did the second most simple solution of only serving exact matches (the simplest would have been to reject all pushed range responses), and I argue that this is sane and permitted by the specs.

Yes. Caching is an optimisation, so it's not sensible to put hard requirements on it (at least in the generic spec) -- doubly so when it's something complex like caching and combining ranges. I have a test suite for this, BTW, and it seems like all browser caches don't do much with ranges.

> So I think that this behavior is in compliance with the specification, in large part because the specification is vague.

I think it's compliant (and wasn't before). H2 is vague about this because it's relying on normal HTTP semantics / requirements.

> If one wanted to spell out the desired behavior in detail, I guess probably the best place for this would be the Fetch spec.

Sure. If we can make it clearer in HTTP without making the specs too much longer, we've opened up the core set of documents (but not h2) for revision, FWIW.

Cheers,

Bence Béky

May 25, 2018, 9:07:01 AM5/25/18
to Mark Nottingham, mk...@chromium.org, blink-dev, Jochen Eisinger, msr...@chromium.org, Jeffrey Yasskin, Ryan Hamilton, Anne van Kesteren, Jake Archibald
Hi Mark,

Thank you for your response.  What you are writing all makes perfect sense to me.  And if HTTP/2 push is meant to inject resources into the HTTP cache, then indeed there's no need to spell out the three-way interaction between client request, pushed request, and pushed response, because relevant cache semantics is already specified elsewhere.

It sounds like it is an artifact that Chrome uses a separate in-memory cache, with the consequence that Vary etc. validation has to be wired up separately.

Just to clarify, with respect to the text "Pushed responses that are not cacheable MUST NOT be stored by any HTTP cache. They MAY be made available to the application separately.", Chrome's in-memory cache is not considered to be an HTTP cache, right?  That is, storing non-cacheable requests in memory for five minutes and potentially matching them up with a client request counts as being "made available to the application separately", correct?

Cheers,

Bence

Mark Nottingham

May 29, 2018, 1:24:18 AM5/29/18
to Bence Béky, mk...@chromium.org, blink-dev, Jochen Eisinger, msr...@chromium.org, Jeffrey Yasskin, Ryan Hamilton, Anne van Kesteren, Jake Archibald
On 25 May 2018, at 11:06 pm, Bence Béky <b...@chromium.org> wrote:
>
> Just to clarify, with respect to the text "Pushed responses that are not cacheable MUST NOT be stored by any HTTP cache. They MAY be made available to the application separately.", Chrome's in-memory cache is not considered to be an HTTP cache, right?

Correct, unless it defines itself to be following HTTP caching rules. Keep in mind that a cacheable response is defined here: <https://httpwg.org/specs/rfc7234.html#response.cacheability> -- e.g., it can be stale.

> That is, storing non-cacheable requests in memory for five minutes and potentially matching them up with a client request counts as being "made available to the application separately", correct?

Well... that's interesting. The intent behind that text was that it could be consumed by other APIs, etc. Matching it up with a HTTP request sounds an awful lot like a HTTP cache...

This seems to take us back to the discussion in <https://github.com/whatwg/fetch/issues/354>.

Cheers,