Intent to implement and ship: HTTP/2 push header validation

Bence Béky

Oct 30, 2017, 4:17:31 PM
to blink-dev
To: blink-dev@ (bcc: net-dev@ for heads up)

Contact emails

b...@chromium.org

Summary

Do not serve requests from HTTP/2 pushed streams that do not match with respect to the Vary header or range parameters.

Motivation

Currently, HTTP/2 pushed streams are matched to requests based on URL only.  This can result in an incorrect or corrupt response to a request (for example, an unsupported encoding, a page corresponding to invalid user credentials, or the wrong range for a range request).
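As a rough illustration of the proposed Vary check (function and parameter names are hypothetical; this is a sketch, not Chrome's actual code), a pushed stream would only be allowed to satisfy a request if every header named in the pushed response's Vary header carries the same value on the original pushed request and the new request:

```python
def vary_headers_match(pushed_request, pushed_response, new_request):
    """Return True if new_request may be served from the pushed stream,
    considering only the Vary header.  Header collections are modeled as
    dicts keyed by lowercase header name."""
    vary = pushed_response.get("vary", "")
    if vary.strip() == "*":
        return False  # "Vary: *" means the response matches no request
    for name in (h.strip().lower() for h in vary.split(",")):
        if name and pushed_request.get(name) != new_request.get(name):
            return False
    return True
```

Per RFC 7231, each field name listed in Vary must be compared between the two requests; a missing header on both sides counts as a match.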

Tracking bug


Design document


Relevant specifications

HTTP/2 server push: RFC 7540 Section 8.2.
Vary header: RFC 7231 Section 7.1.4.
Range request: RFC 7233

Interoperability risk

As of May 2017, Firefox and Edge both ignore the Vary header for HTTP/2 pushed streams, matching the current behavior of Chrome.  Safari already obeys the Vary header, which is the behavior proposed here for Chrome.  See https://jakearchibald.com/2017/h2-push-tougher-than-i-thought/#items-in-the-push-cache-should-be-matched-using-http-semantics-aside-from-freshness.  Note also that Chrome already obeys the Vary header for QUIC pushed streams.

Range parameters are currently ignored in Chrome for both HTTP/2 and QUIC pushed streams.  I have no information about how other browsers handle Range requests.
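A minimal sketch of what range validation could mean (names and the single-range simplification are mine, not Chrome's implementation): a pushed response can only answer a range request if the bytes the client asks for are exactly what was pushed.

```python
def range_matches(request_range, pushed_content_range):
    """Compare a request's Range header (e.g. "bytes=0-499") against the
    pushed response's Content-Range header (e.g. "bytes 0-499/1234")."""
    if request_range is None and pushed_content_range is None:
        return True   # neither side is a range request: trivially compatible
    if request_range is None or pushed_content_range is None:
        return False  # a full response and a range request don't pair up here
    requested = request_range.removeprefix("bytes=")
    # Drop the "/complete-length" suffix from Content-Range before comparing.
    pushed = pushed_content_range.removeprefix("bytes ").split("/")[0]
    return requested == pushed
```

A fuller implementation would need to handle multiple ranges and containment (a pushed range that covers the requested one), but the sketch shows why URL-only matching is insufficient.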

Usage information from UMA

Net.SpdyStreamsPushedPerSession shows that Chrome receives pushed streams on only a very small fraction of HTTP/2 connections.

Net.SpdySession.PushedBytes shows that, on average, a few dozen bytes of data are pushed per connection.

Net.PushedStreamAlreadyHasResponseHeaders shows that about 25% of the time, response headers have not yet been received when a request is matched to a pushed stream.  This justifies the need for an asynchronous mechanism for matching requests to pushed streams.

New metrics will be added to monitor the percentage of request/pushed-stream pairs with matching URLs that are nevertheless not matched due to the Vary header or range parameters.

Entry on the Chrome Platform Status


Is this feature fully tested by web-platform-tests?

No.  WPT does not support HTTP/2.

Will this feature be supported on all six Blink platforms (Windows, Mac,
Linux, Chrome OS, Android, and Android WebView)?

Yes.

Requesting approval to ship?

Yes.

Jeffrey Yasskin

Oct 30, 2017, 4:41:05 PM
to b...@chromium.org, blink-dev
It seems like https://fetch.spec.whatwg.org/ should describe this behavior of fulfilling requests based on previously-pushed resources. Is anyone working on that?

Thanks,
Jeffrey


Yoav Weiss

Nov 1, 2017, 5:40:33 PM
to Jeffrey Yasskin, ann...@annevk.nl, mn...@mnot.net, b...@chromium.org, blink-dev
On Mon, Oct 30, 2017 at 8:41 PM 'Jeffrey Yasskin' via blink-dev <blin...@chromium.org> wrote:
It seems like https://fetch.spec.whatwg.org/ should describe this behavior of fulfilling requests based on previously-pushed resources. Is anyone working on that?

There's an open issue for that definition, but I'm not aware of anyone actively working on it. I'm also not 100% sure that Fetch is the right place for the H2 push cache definition (one could argue that it should be defined as part of an extension to HTTP, as it is relevant for non-browser HTTP clients).

 

On Mon, Oct 30, 2017 at 1:17 PM Bence Béky <b...@chromium.org> wrote:

No.  WPT does not support HTTP/2.

Indeed. We talked in the past about Someone™ creating an HTTP/2 test suite... sigh.
  

Mark Nottingham

Nov 1, 2017, 7:08:33 PM
to Yoav Weiss, Jeffrey Yasskin, Anne van Kesteren, b...@chromium.org, blink-dev
On 2 Nov 2017, at 8:40 am, Yoav Weiss <yo...@yoav.ws> wrote:
>
> +Anne van Kesteren +Mark Nottingham
>
> On Mon, Oct 30, 2017 at 8:41 PM 'Jeffrey Yasskin' via blink-dev <blin...@chromium.org> wrote:
> It seems like https://fetch.spec.whatwg.org/ should describe this behavior of fulfilling requests based on previously-pushed resources. Is anyone working on that?
>
> There's an open issue for that definition, but I'm not aware of anyone actively working on it. I'm also not 100% sure that Fetch is the right place for the H2 push cache definition (one could argue that it should be defined as part of an extension to HTTP, as it is relevant for non-browser HTTP clients).

Yeah. There's still an argument that there doesn't need to be a separate H2 push cache; i.e., the current browser behaviour is unnecessary. Push is currently defined just in terms of HTTP caching, so it might be that the implementations need to change to match the spec, not the other way around.

My understanding is that no one actually knows where the requirement for a separate push cache is coming from -- both the browser network engineers and browser security folks say it came from somewhere else, but upon examination there isn't a well-stated reason for it.

Not sure what the next steps are, but I'm hoping that if we have another HTTP workshop this year, we might have some illuminating discussions there. Lots of people are talking about push and what its future might be.

All of that said -- even if push *is* just an interaction with the HTTP cache, there are still some HTTP caching tests in the fetch directory of WPT, so conceivably it could fit there.



--
Mark Nottingham https://www.mnot.net/

Ryan Hamilton

Nov 1, 2017, 8:44:57 PM
to Mark Nottingham, Yoav Weiss, Jeffrey Yasskin, Anne van Kesteren, Bence Béky, blink-dev
We (chrome network team) talked some time back about pushing directly into the HTTP cache and decided against it because (swapping in context) it lets the server write entries into the HTTP cache that the client would not otherwise create. For example, consider a situation where a page is infected with malware. SafeBrowsing would kick in and prevent the request from going to the network and the infected page would not end up in the HTTP cache. Once the page was cleaned up and removed from SafeBrowsing, requests for that URL would go to the network and into the cache. However, if the server pushed the page into the HTTP cache, then after the page is cleaned up and removed from SafeBrowsing, the client would request the URL and the infected page would be served out of the HTTP cache. This seemed bad.

Though I may be misunderstanding your comment about changing behavior to simply be HTTP caching.

Cheers,

Ryan

Mark Nottingham

Nov 1, 2017, 11:10:43 PM
to Ryan Hamilton, Yoav Weiss, Jeffrey Yasskin, Anne van Kesteren, Bence Béky, blink-dev


> On 2 Nov 2017, at 11:44 am, Ryan Hamilton <r...@chromium.org> wrote:
>
> We (chrome network team) talked some time back about pushing directly into the HTTP cache and decided against it because (swapping in context) it lets the server write entries into the HTTP cache that the client would not otherwise create. For example, consider a situation where a page is infected with malware. SafeBrowsing would kick in and prevent the request from going to the network and the infected page would not end up in the HTTP cache. Once the page was cleaned up and removed from SafeBrowsing, requests for that URL would go to the network and into the cache. However, if the server pushed the page into the HTTP cache, then after the page is cleaned up and removed from SafeBrowsing, the client would request the URL and the infected page would be served out of the HTTP cache. This seemed bad.

If SafeBrowsing detects that a request to https://evil.example.com/ is infected, does the request actually go out on the wire? I ask because pushed requests have to be associated with a stream that's either open or half-closed.

Ryan Hamilton

Nov 1, 2017, 11:46:24 PM
to Mark Nottingham, Yoav Weiss, Jeffrey Yasskin, Anne van Kesteren, Bence Béky, blink-dev
On Wed, Nov 1, 2017 at 11:10 PM, Mark Nottingham <mn...@mnot.net> wrote:
> On 2 Nov 2017, at 11:44 am, Ryan Hamilton <r...@chromium.org> wrote:
>
> We (chrome network team) talked some time back about pushing directly into the HTTP cache and decided against it because (swapping in context) it lets the server write entries into the HTTP cache that the client would not otherwise create. For example, consider a situation where a page is infected with malware. SafeBrowsing would kick in and prevent the request from going to the network and the infected page would not end up in the HTTP cache. Once the page was cleaned up and removed from SafeBrowsing, requests for that URL would go to the network and into the cache. However, if the server pushed the page into the HTTP cache, then after the page is cleaned up and removed from SafeBrowsing, the client would request the URL and the infected page would be served out of the HTTP cache. This seemed bad.

If SafeBrowsing detects that a request to https://evil.example.com/ is infected, does the request actually go out on the wire?

No.
 
I ask because pushed requests have to be associated with a stream that's either open or half-closed.

That's true, but imagine that the user requests https://example.com/ which links to https://example.com/other. The request to ​https://example.com/ would go to the wire and the server could try to push /other (which is currently infected) as associated with /. SafeBrowsing would prevent the request for /other from hitting the wire and hence the pushed (infected) resource would not be pulled up into the HTTP cache. After a short period of time it will expire from the H2 push cache.

Does that make sense?

Cheers,

Ryan

Anne van Kesteren

Nov 2, 2017, 1:50:42 AM
to Ryan Hamilton, Mark Nottingham, Yoav Weiss, Jeffrey Yasskin, Bence Béky, blink-dev
On Thu, Nov 2, 2017 at 4:46 AM, Ryan Hamilton <r...@chromium.org> wrote:
> That's true, but imagine that the user requests https://example.com/ which
> links to https://example.com/other. The request to https://example.com/
> would go to the wire and the server could try to push /other (which is
> currently infected) as associated with /. SafeBrowsing would prevent the
> request for /other from hitting the wire and hence the pushed (infected)
> resource would not be pulled up into the HTTP cache. After a short period of
> time it will expire from the H2 push cache.
>
> Does that make sense?

If that is the sole reason, why invent a new caching mechanism rather
than performing a SafeBrowsing check before putting pushed resources
into the cache?


--
https://annevankesteren.nl/

Matthew Menke

Nov 2, 2017, 5:26:53 PM
to blink-dev, mn...@mnot.net, yo...@yoav.ws, jyas...@google.com, ann...@annevk.nl, b...@chromium.org


On Wednesday, November 1, 2017 at 11:46:24 PM UTC-4, Ryan Hamilton wrote:
On Wed, Nov 1, 2017 at 11:10 PM, Mark Nottingham <mn...@mnot.net> wrote:
> On 2 Nov 2017, at 11:44 am, Ryan Hamilton <r...@chromium.org> wrote:
>
> We (chrome network team) talked some time back about pushing directly into the HTTP cache and decided against it because (swapping in context) it lets the server write entries into the HTTP cache that the client would not otherwise create. For example, consider a situation where a page is infected with malware. SafeBrowsing would kick in and prevent the request from going to the network and the infected page would not end up in the HTTP cache. Once the page was cleaned up and removed from SafeBrowsing, requests for that URL would go to the network and into the cache. However, if the server pushed the page into the HTTP cache, then after the page is cleaned up and removed from SafeBrowsing, the client would request the URL and the infected page would be served out of the HTTP cache. This seemed bad.

If SafeBrowsing detects that a request to https://evil.example.com/ is infected, does the request actually go out on the wire?

No.

Worth noting that it does on Android, where SafeBrowsing checks are done asynchronously, and behavior on desktop is changing with the advent of an out-of-process network stack.

Ryan Hamilton

Nov 2, 2017, 6:12:42 PM
to Anne van Kesteren, Mark Nottingham, Yoav Weiss, Jeffrey Yasskin, Bence Béky, blink-dev
Safe browsing is not the only reason, no. But it's an example (the only one I could remember offhand, admittedly :>) of a class of problems that crop up when things are written to the cache by the net stack without the intervention of higher layers of the code. 

Mark Nottingham

Nov 2, 2017, 6:27:16 PM
to Ryan Hamilton, Anne van Kesteren, Yoav Weiss, Jeffrey Yasskin, Bence Béky, blink-dev
So, I don't see any reason why you can't just patch in before a cache write. That's the most intuitive way to model it, rather than inventing a new cache -- especially when SB isn't part of the standard.

Also, AIUI SB is done on a per-origin level; if so, your example above doesn't hold together.

And, if SB checks are moving to async, it seems like you're going to have cache pollution problems to deal with all over the place, not just from push.

Cheers,

Ryan Hamilton

Nov 2, 2017, 6:35:13 PM
to Mark Nottingham, Anne van Kesteren, Yoav Weiss, Jeffrey Yasskin, Bence Béky, blink-dev
On Thu, Nov 2, 2017 at 3:27 PM, Mark Nottingham <mn...@mnot.net> wrote:
On 3 Nov 2017, at 9:12 am, Ryan Hamilton <r...@chromium.org> wrote:
>
> On Wed, Nov 1, 2017 at 10:50 PM, Anne van Kesteren <ann...@annevk.nl> wrote:
> On Thu, Nov 2, 2017 at 4:46 AM, Ryan Hamilton <r...@chromium.org> wrote:
> > That's true, but imagine that the user requests https://example.com/ which
> > links to https://example.com/other. The request to https://example.com/
> > would go to the wire and the server could try to push /other (which is
> > currently infected) as associated with /. SafeBrowsing would prevent the
> > request for /other from hitting the wire and hence the pushed (infected)
> > resource would not be pulled up into the HTTP cache. After a short period of
> > time it will expire from the H2 push cache.
> >
> > Does that make sense?
>
> If that is the sole reason why invent a new caching mechanism rather
> than perform a SafeBrowsing check before putting pushed resources in
> the cache?
>
> Safe browsing is not the only reason, no. But it's an example (the only one I could remember offhand, admittedly :>) of a class of problems that crop up when things are written to the cache by the net stack without the intervention of higher layers of the code.

So, I don't see any reason why you can't just patch in before a cache write. That's the most intuitive way to model it, rather than inventing a new cache -- especially when SB isn't part of the standard.

Also, AIUI SB is done on a per-origin level; if so, your example above doesn't hold together.

I don't believe that is correct. From https://developers.google.com/safe-browsing/v4/urls-hashing:

For the host, the client will try at most five different strings. They are:
    • The exact hostname in the URL.
    • Up to four hostnames formed by starting with the last five components and successively removing the leading component. The top-level domain can be skipped. These additional hostnames should not be checked if the host is an IP address.

For the path, the client will try at most six different strings. They are:

    • The exact path of the URL, including query parameters.
    • The exact path of the URL, without query parameters.
    • The four paths formed by starting at the root (/) and successively appending path components, including a trailing slash.
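The expansion Ryan quotes can be sketched as follows (function names are mine, and this is a simplified reading of the Safe Browsing v4 documentation that ignores IP-address hosts and other canonicalization steps):

```python
def host_candidates(host):
    """Exact hostname plus up to four suffixes, starting from the last
    five components and dropping the leading one each time (the bare
    top-level domain is skipped)."""
    parts = host.split(".")
    cands = [host]
    for n in range(min(len(parts), 5), 1, -1):
        suffix = ".".join(parts[-n:])
        if suffix != host:
            cands.append(suffix)
    return cands

def path_candidates(path, query=""):
    """Exact path with and without the query string, plus up to four
    prefix paths built from the root, each with a trailing slash."""
    cands = [path + "?" + query] if query else []
    cands.append(path)
    components = [c for c in path.split("/") if c]
    prefixes, acc = ["/"], ""
    for c in components[:-1]:  # directories only; the full path is above
        acc += "/" + c
        prefixes.append(acc + "/")
    cands.extend(p for p in prefixes[:4] if p != path)
    return cands
```

The point for this thread: the lookup key is host-suffix/path-prefix combinations, not just the origin, which is why the per-origin assumption above doesn't hold.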

Mark Nottingham

Nov 2, 2017, 8:30:42 PM
to Ryan Hamilton, Anne van Kesteren, Yoav Weiss, Jeffrey Yasskin, Bence Béky, blink-dev
Ah, OK - thanks. I think the other points made stand.

Cheers,

Ryan Hamilton

Nov 2, 2017, 10:47:16 PM
to Mark Nottingham, Anne van Kesteren, Yoav Weiss, Jeffrey Yasskin, Bence Béky, blink-dev
To back up, I came into this thread to address the assertion:

My understanding is that no one actually knows where the requirement for a separate push cache is coming from -- both the browser network engineers and browser security folks say it came from somewhere else, but upon examination there isn't a well-stated reason for it.

I was simply attempting to explain why the Chrome network engineers chose to have a separate push cache. Namely, pushing straight into the cache at the HTTP layer results in cache entries for resources that the client may not request for policy reasons. Now sure, it might be possible to call up from the network layer into the loading/rendering layer at cache write time to see if a push really should go into the cache. But that's non-trivially complex, and since it would be asynchronous, you need a place for the pushed data to live while you're waiting for that call to complete. You might call such a waiting area the "push cache" or the "push map". Or you could skip the up-call entirely and simply wait for the request to come in. Both cases result in pushed stream data sitting down in the network stack before being pulled up into the cache when instructed by the higher layers. The latter solution is much easier to implement, but either way you end up with the same holding area.
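A toy model of that holding area (names hypothetical; this bears no resemblance to Chrome's real net stack) might look like:

```python
class PushHoldingArea:
    """Pushed streams wait here, keyed by URL, until the higher layers
    issue a matching request; only then is the response promoted toward
    the HTTP cache, after any policy checks such as SafeBrowsing."""

    def __init__(self):
        self._unclaimed = {}

    def on_push_promise(self, url, response):
        # Data from the pushed stream parks in the network stack...
        self._unclaimed[url] = response

    def rendezvous(self, url):
        # ...and leaves only when a real request claims it (or it expires).
        return self._unclaimed.pop(url, None)
```

The "wait for the request" design is exactly this map plus a timeout; the up-call design would add an asynchronous policy check between `on_push_promise` and the data becoming claimable.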

In any case, the possible change to async SafeBrowsing and the effects this has on the HTTP cache are an ... active topic of discussion at the moment. :) The WebRequest extension API is another example of such a higher layer interaction.

Cheers,

Ryan

Mark Nottingham

Nov 2, 2017, 11:02:16 PM
to Ryan Hamilton, Anne van Kesteren, Yoav Weiss, Jeffrey Yasskin, Bence Béky, blink-dev


> On 3 Nov 2017, at 1:47 pm, Ryan Hamilton <r...@chromium.org> wrote:
>
> To back up, I came into this thread to address the assertion:
>
> My understanding is that no one actually knows where the requirement for a separate push cache is coming from -- both the browser network engineers and browser security folks say it came from somewhere else, but upon examination there isn't a well-stated reason for it.

Fair enough. Thanks for the information, it's good to get that context.

> I was simply attempting to explain why the Chrome network engineers chose to have a separate push cache. Namely, that pushing straight into the cache at the HTTP layer results in entries in the cache for resources that the client may not request for policy reasons. Now sure, it might be possible to up call from the network layer into the loading/rendering layer at cache write time to see if a push really should go into the cache. But that's non-trivially complex and as it will be asynchronous, you need a place for the pushed data to live while you're waiting for that call to complete. You might call such a waiting area the "push cache" or you might call it the "push map". Or, instead of doing the up-call, you could instead not do the upcall and simply wait for the request to come in. Both cases result in pushed stream data sitting down in the network before being pulled up into the cache when instructed by the higher layers. The latter solution is much easier to implement. But either way, you still end up with the same holding area.

This leaves me wondering why the SafeBrowsing is considered to be at the loading/rendering layer, when it seems so well-suited to be a network-layer function.

An alternative approach would be to mark the entry "potentially dirty" upon cache insertion, then update it to "clean" or "dirty" (i.e., purge it) when the SafeBrowsing call returns. This is the approach taken in similar situations by intermediary caches, and it works reasonably well at that scale. It avoids creating another layer of caching.

If it's not obvious, a lot of the concern about this is because having another layer of caching makes understanding the system behaviour all the more difficult -- especially when that cache's behaviours aren't aligned with the others'.


> In any case, the possible change to async SafeBrowsing and the effects this has on the HTTP cache are an ... active topic of discussion at the moment. :) The WebRequest extension API is another example of such a higher layer interaction.

Interesting. That might be a better illustration than SafeBrowsing.

In VCL-land, we have a separate callback for inspecting/modifying a response before (possible) cache insertion. I can't help but wonder if browsers need something similar. I know ServiceWorker can do it with *its* cache, but while there's still an HTTP cache in the client, it needs to be accounted for too.

Cheers,

Ryan Hamilton

Nov 2, 2017, 11:37:54 PM
to Mark Nottingham, Anne van Kesteren, Yoav Weiss, Jeffrey Yasskin, Bence Béky, blink-dev
On Thu, Nov 2, 2017 at 8:02 PM, Mark Nottingham <mn...@mnot.net> wrote:
> On 3 Nov 2017, at 1:47 pm, Ryan Hamilton <r...@chromium.org> wrote:
> I was simply attempting to explain why the Chrome network engineers chose to have a separate push cache. Namely, that pushing straight into the cache at the HTTP layer results in entries in the cache for resources that the client may not request for policy reasons. Now sure, it might be possible to up call from the network layer into the loading/rendering layer at cache write time to see if a push really should go into the cache. But that's non-trivially complex and as it will be asynchronous, you need a place for the pushed data to live while you're waiting for that call to complete. You might call such a waiting area the "push cache" or you might call it the "push map". Or, instead of doing the up-call, you could instead not do the upcall and simply wait for the request to come in. Both cases result in pushed stream data sitting down in the network before being pulled up into the cache when instructed by the higher layers. The latter solution is much easier to implement. But either way, you still end up with the same holding area.

This leaves me wondering why the SafeBrowsing is considered to be at the loading/rendering layer, when it seems so well-suited to be a network-layer function.

Hm. I don't know the answer to that. I wonder if it might be related to the fact that SafeBrowsing issues requests, and we typically avoid issuing requests from inside the network stack? But I'm just guessing. (Also a quick perusal of the code suggests that it currently has pieces which run on the UI thread and is aware of things like Navigations. I suspect that someone with more knowledge of SafeBrowsing would probably be able to shed some light).
An alternative approach would be to mark it "potentially dirty" upon cache insertion, updating the cache with "clean" or "dirty" (i.e., purging it) upon the SafeBrowsing call returning. This is the approach that's taken in similar situations in intermediary caches, and it works reasonably well at that scale. That avoids creating another layer of caching.

*nod* Though see the issue about push bandwidth below.
 
If it's not obvious, a lot of the concern about this is because having another layer of caching makes understanding the system behaviour all the more difficult -- especially when that cache's behaviours aren't aligned with the others'.

Agreed! Very happy to see effort in this space to ensure that the behaviors are easily understandable.

> In any case, the possible change to async SafeBrowsing and the effects this has on the HTTP cache are an ... active topic of discussion at the moment. :) The WebRequest extension API is another example of such a higher layer interaction.

Interesting. That might be a better illustration than SafeBrowsing.

Fair enough. (SB was the only one I could remember initially :>)

I finally found the notes from the meeting we had about this. They also pointed out that pushing straight into the cache lets a server consume a virtually unlimited amount of bandwidth, even after the user navigates away from a page. When navigating away, any in-progress requests are killed. But if a resource is being pushed into the cache without being associated with an active request, it won't be killed, and that push will continue to consume bandwidth with no way for the user to stop it. (Clearly the pushed stream was associated with an explicitly requested stream at the H2 layer when it was promised, but that request may well complete before the user navigates away.)
In VCL-land, we have a separate callback for inspecting/modifying a response before (possible) cache insertion. I can't help but wonder if browsers needs something similar. I know ServiceWorker can do it with *its* cache, but while there's still a HTTP cache in the client, it needs to be accounted for too.

There are caches everywhere! :) ServiceWorker caches, Blink caches, HTTP caches, push caches.


Cheers,

Ryan

Jeffrey Yasskin

Nov 3, 2017, 12:27:55 PM
to Ryan Hamilton, Mark Nottingham, Anne van Kesteren, Yoav Weiss, b...@chromium.org, blink-dev, Jake Archibald
On Thu, Nov 2, 2017 at 7:47 PM Ryan Hamilton <r...@chromium.org> wrote:
To back up, I came into this thread to address the assertion:

My understanding is that no one actually knows where the requirement for a separate push cache is coming from -- both the browser network engineers and browser security folks say it came from somewhere else, but upon examination there isn't a well-stated reason for it.

I was simply attempting to explain why the Chrome network engineers chose to have a separate push cache. Namely, that pushing straight into the cache at the HTTP layer results in entries in the cache for resources that the client may not request for policy reasons. Now sure, it might be possible to up call from the network layer into the loading/rendering layer at cache write time to see if a push really should go into the cache.

This actually strikes me as a possibility.
<naive-speculation>
Call up, not at cache write time, but when the network layer receives the PUSH_PROMISE (semantically, a Request) that opens the response stream. This PUSH_PROMISE is associated with an existing fetch operation "parent". If "parent" is owned by a JS fetch() call, its observer (https://gist.github.com/slightlyoff/18dc42ae00768c23fbc4c5097400adfb#gistcomment-2227534) receives a cancelable event for the PUSH_PROMISE. If it's canceled, that RST_STREAMs the pushed stream.

Otherwise, "parent"'s page fetches the pushed request, which schedules it to be canceled if the page navigates, and runs the request through any frontend-based SafeBrowsing checks before directing the response stream toward the cache. If "parent" is owned by the browser itself, it just always does the default action to fetch the pushed request.

This default fetch can also be the place to incorporate the Vary-checking in this thread's Intent: if a new request to the pushed request's URL wouldn't match the pushed Vary header, it could cancel instead of actually fetching.

There's still a period where the network stack has to store the response stream before the page has taken ownership, but I think this avoids exposing that complexity to web authors.
</naive-speculation>

What have I missed?

Jeffrey

Ryan Hamilton

Nov 3, 2017, 6:32:05 PM
to Jeffrey Yasskin, Mark Nottingham, Anne van Kesteren, Yoav Weiss, Bence Béky, blink-dev, Jake Archibald
On Fri, Nov 3, 2017 at 9:27 AM, Jeffrey Yasskin <jyas...@google.com> wrote:
On Thu, Nov 2, 2017 at 7:47 PM Ryan Hamilton <r...@chromium.org> wrote:
To back up, I came into this thread to address the assertion:

My understanding is that no one actually knows where the requirement for a separate push cache is coming from -- both the browser network engineers and browser security folks say it came from somewhere else, but upon examination there isn't a well-stated reason for it.

I was simply attempting to explain why the Chrome network engineers chose to have a separate push cache. Namely, that pushing straight into the cache at the HTTP layer results in entries in the cache for resources that the client may not request for policy reasons. Now sure, it might be possible to up call from the network layer into the loading/rendering layer at cache write time to see if a push really should go into the cache.

This actually strikes me as a possibility.
<naive-speculation>
Call up, not at cache write time, but when the network layer receives the PUSH_PROMISE (semantically, a Request) that opens the response stream. This PUSH_PROMISE is associated with an existing fetch operation "parent". If "parent" is owned by a JS fetch() call, its observer (https://gist.github.com/slightlyoff/18dc42ae00768c23fbc4c5097400adfb#gistcomment-2227534) receives a cancelable event for the PUSH_PROMISE. If it's canceled, that RST_STREAMs the pushed stream.

Otherwise, "parent"'s page fetches the pushed request, which schedules it to be canceled if the page navigates, and runs the request through any frontend-based SafeBrowsing checks before directing the response stream toward the cache. If "parent" is owned by the browser itself, it just always does the default action to fetch the pushed request.

This default fetch can also be the place to incorporate the Vary-checking in this thread's Intent: if a new request to the pushed request's URL wouldn't match the pushed Vary header, it could cancel instead of actually fetching.

There's still a period where the network stack has to store the response stream before the page has taken ownership, but I think this avoids exposing that complexity to web authors.
</naive-speculation>

What have I missed?

Sure, all of that could definitely be done. Or we could simply keep the pushed stream down at the HTTP/2 layer and wait for the request to rendezvous with it. This is much simpler.

Jeffrey Yasskin

Nov 3, 2017, 6:38:34 PM
to Ryan Hamilton, Mark Nottingham, Anne van Kesteren, Yoav Weiss, b...@chromium.org, blink-dev, Jake Archibald
It's simpler for the implementation, but seems to be less simple for web developers (e.g. https://jakearchibald.com/2017/h2-push-tougher-than-i-thought/). In our priority of stakeholders, the developers beat the implementers.

Jeffrey

Ryan Hamilton

Nov 3, 2017, 7:28:49 PM
to Jeffrey Yasskin, Mark Nottingham, Anne van Kesteren, Yoav Weiss, Bence Béky, blink-dev, Jake Archibald
It's not obvious to me that these two implementations have different user-visible behaviors. Can you say more? (I'm probably overlooking something you said)

Jeffrey Yasskin

Nov 3, 2017, 7:51:28 PM
to Ryan Hamilton, Mark Nottingham, Anne van Kesteren, Yoav Weiss, b...@chromium.org, blink-dev, Jake Archibald
I *think*, although I could be wrong, that passing every push up to the page, which always either fetches it into the HTTP cache or cancels it, eliminates the developer-visible push cache. That in turn gets rid of two troublesome things Jake pointed out:

* If the connection closes, bye bye push cache
* Requests without credentials use a separate connection

Jeffrey

Yoav Weiss

Nov 6, 2017, 8:58:24 AM
to Jeffrey Yasskin, Ryan Hamilton, Mark Nottingham, Anne van Kesteren, b...@chromium.org, blink-dev, Jake Archibald
On Fri, Nov 3, 2017 at 11:51 PM Jeffrey Yasskin <jyas...@google.com> wrote:
On Fri, Nov 3, 2017 at 4:28 PM Ryan Hamilton <r...@chromium.org> wrote:

I *think*, although I could be wrong, that passing every push up to the page, which always either fetches it into the HTTP cache or cancels it, eliminates the developer-visible push cache. That in turn gets rid of two troublesome things Jake pointed out:

* If the connection closes, bye bye push cache
* Requests without credentials use a separate connection

 
While getting rid of the above would indeed be great, I'm not sure the conclusion must be that we need to store pushed resources in the HTTP cache. I also think this is a wider-scoped discussion which doesn't necessarily need to block the work that Bence is proposing, which will lead to arguably better push behavior in the interim.
What I'm missing here is a way to assess the risk involved. While push changes are likely to bear little breakage risk, they can lead to spurious downloads, and can result in perf regressions.

A few points that bother me:
* As I commented in the doc, I think the way Range request support will be implemented will increase spurious pushes. I believe we need to gather data on the presence of such pushes today before making this compromise.
*  It's not clear to me what the use case is for `Vary` headers in pushed responses. Generally, if you need content negotiation, server push is not the right tool for the job (unless that negotiation gets the same result in 99% of the cases, e.g. "Content-Encoding"). Bence, could you comment on which user scenarios are likely to improve as a result of this change?
* Range request push seems useful mostly for media streaming scenarios. Bence, is that the use-case we're targeting here? 

Cheers :)
Yoav 

Jeffrey Yasskin

unread,
Nov 6, 2017, 9:14:08 AM11/6/17
to Yoav Weiss, Ryan Hamilton, Mark Nottingham, Anne van Kesteren, b...@chromium.org, blink-dev, Jake Archibald
On Mon, Nov 6, 2017 at 6:58 AM Yoav Weiss <yo...@yoav.ws> wrote:
While getting rid of the above would indeed be great, I'm not sure the conclusion must be that we need to store pushed resources in the HTTP cache. I also think this is a wider-scoped discussion which doesn't necessarily need to block the work that Bence is proposing, which will lead to arguably better push behavior in the interim.

I agree. For Bence's work, I just want someone to specify what Chrome is actually doing in response to fetches. It doesn't need to be the long-term consensus behavior.
 
What I'm missing here is a way to assess the risk involved. While push changes are likely to bear little breakage risk, they can lead to spurious downloads, and can result in perf regressions.

A few points that bother me:
* As I commented in the doc, I think the way Range request support will be implemented will increase spurious pushes. I believe we need to gather data on the presence of such pushes today before making this compromise.
*  It's not clear to me what the use case is for `Vary` headers in pushed responses. Generally, if you need content negotiation, server push is not the right tool for the job (unless that negotiation gets the same result in 99% of the cases, e.g. "Content-Encoding"). Bence, could you comment on which user scenarios are likely to improve as a result of this change?

Even if there's no use case for including Vary headers in pushed responses, it seems to make sense to either do what Bence is proposing or drop push responses with Vary headers entirely, to prevent the ecosystem from starting to include them with nonsensical values.
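For concreteness, the Vary check under discussion amounts to comparing, field by field, the request headers named in the pushed response's Vary value. A minimal sketch of that matching rule, per RFC 7231 Section 7.1.4 (illustrative only; the dict-based signatures are assumptions, not Chromium's actual implementation):

```python
def vary_matches(request_headers, pushed_request_headers, pushed_response_headers):
    """A pushed response matches a request only if every header field named
    in its Vary value has the same value in both the real request and the
    push promise's synthetic request. Names compare case-insensitively."""
    vary = pushed_response_headers.get("vary", "")
    if vary.strip() == "*":
        return False  # "Vary: *" never matches any other request
    for name in (field.strip().lower() for field in vary.split(",") if field.strip()):
        if request_headers.get(name) != pushed_request_headers.get(name):
            return False
    return True
```

Under this rule, a gzip-encoded push carrying `Vary: Accept-Encoding` would not be served to a request advertising only `br`, which is exactly the corrupt-response scenario the intent describes.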

Jeffrey

Bence Béky

unread,
Nov 6, 2017, 10:15:51 AM11/6/17
to blink-dev, ckr...@chromium.org, Yoav Weiss, Ryan Hamilton, Mark Nottingham, Anne van Kesteren, Jake Archibald, Jeffrey Yasskin
Hi,

Thank you for the valuable feedback.

I agree with Jeffrey that it is important for the ecosystem not to
match requests to pushed streams with incompatible Vary values. While
I am not aware of any use case, I know that this property is
implemented for QUIC pushes in Chrome. Cc'ing Buck, who implemented
that: Buck, are you aware of any specific use cases?

I can also add a metric to see what percentage of pushed response
headers has a Vary header. If it is exceedingly low, it might be a
good option to reject such pushes for the time being as proposed by
Jeffrey, until a compelling use case arises. However, I would not be
surprised if it actually was pretty high due to Content-Encoding.

Yoav: I was told that game asset loading or DASH streaming could
potentially benefit from non-overlapping Range pushes for the same
URL. Again, if wasted download bandwidth is a concern, measurements
can be made before making this change.
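To illustrate the non-overlapping-range idea, here is a rough sketch of parsing a single-range Range header value and testing two byte ranges for overlap (hypothetical helper names; multi-range and suffix forms such as `bytes=-500` from RFC 7233 are deliberately omitted):

```python
import re

def parse_single_range(value):
    """Parse a 'bytes=start-end' Range header value; return (start, end),
    or None if the value is absent or not in this simple form."""
    if not value:
        return None
    match = re.fullmatch(r"bytes=(\d+)-(\d+)", value.strip())
    return (int(match.group(1)), int(match.group(2))) if match else None

def ranges_overlap(a, b):
    """Two closed byte ranges overlap iff each one starts at or before
    the other's end."""
    return a is not None and b is not None and a[0] <= b[1] and b[0] <= a[1]
```

With a check like this, a DASH server could push `bytes=0-499` and `bytes=500-999` of the same URL and the two pushes would not be treated as duplicates.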

Thank you,

Bence

ckr...@chromium.org

unread,
Nov 6, 2017, 1:08:31 PM11/6/17
to blink-dev, ckr...@chromium.org, yo...@yoav.ws, r...@chromium.org, mn...@mnot.net, ann...@annevk.nl, jakear...@google.com, jyas...@google.com


On Monday, November 6, 2017 at 7:15:51 AM UTC-8, Bence Béky wrote:
Hi,

Thank you for the valuable feedback.

I agree with Jeffrey that it is important for the ecosystem not to
match requests to pushed streams with incompatible Vary values.  While
I am not aware of any use case, I know that this property is
implemented for QUIC pushes in Chrome.  Cc'ing Buck, who implemented
that: Buck, are you aware of any specific use cases?


No.  The inclusion in QUIC's implementation wasn't driven by a use case, but rather was based on an e-mail discussion I had with Patrick McManus about RFC 7540.

Yoav Weiss

unread,
Nov 6, 2017, 2:59:07 PM11/6/17
to ckr...@chromium.org, blink-dev, r...@chromium.org, mn...@mnot.net, ann...@annevk.nl, jakear...@google.com, jyas...@google.com, Patrick McManus
On Mon, Nov 6, 2017 at 10:08 AM <ckr...@chromium.org> wrote:


On Monday, November 6, 2017 at 7:15:51 AM UTC-8, Bence Béky wrote:
Hi,

Thank you for the valuable feedback.

I agree with Jeffrey that it is important for the ecosystem not to
match requests to pushed streams with incompatible Vary values.  While
I am not aware of any use case, I know that this property is
implemented for QUIC pushes in Chrome.  Cc'ing Buck, who implemented
that: Buck, are you aware of any specific use cases?


No.  The inclusion in QUIC's implementation wasn't driven by a use case, but rather was based on an e-mail discussion I had with Patrick McManus about RFC 7540.


Changing our behavior without a clear use-case is fine. I was just wondering if there was a particular reason you chose to tackle this issue.

Also, does a request from Patrick McManus to change behavior here mean that Firefox will also align on similar behavior? Is there an open issue for them on that front?

(CCing Patrick)

 
I can also add a metric to see what percentage of pushed response
headers has a Vary header.  If it is exceedingly low, it might be a
good option to reject such pushes for the time being as proposed by
Jeffrey, until a compelling use case arises.  However, I would not be
surprised if it actually was pretty high due to Content-Encoding.

Yeah, I would hope Content-Encoding negotiation can happen on the initial request before anything is pushed on the connection, but that may not always be the case. Gathering that data can be interesting, but I don't know if we should block shipping on it.
  

Yoav: I was told that game asset loading or DASH streaming could
potentially benefit from non-overlapping Range pushes for the same
URL.  

That makes sense.
 
Again, if wasted download bandwidth is a concern, measurements
can be made before making this change.

Ideally, I'd love to see Range request support that still RSTs push promises for formerly-pushed URLs whose promised request has no range.
That kind of implementation would have no spurious-download concerns while still covering the Range request use case.
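Yoav's heuristic might be sketched roughly as follows (hypothetical names; this is one reading of the proposal, not shipped behavior in any browser): once a URL has been pushed, a second promise for it is reset unless its promised request carries a Range header.

```python
def should_reset_push(url, promised_request_headers, already_pushed_urls):
    """Reset (RST_STREAM) a duplicate push promise for an already-pushed
    URL unless it is a ranged push, so that only Range-based re-pushes of
    the same URL are allowed through."""
    if url not in already_pushed_urls:
        return False  # first push of this URL: accept as usual
    names = {name.lower() for name in promised_request_headers}
    return "range" not in names
```

This keeps the media-streaming use case (many ranged pushes of one URL) while refusing the duplicate full-resource pushes that would otherwise waste bandwidth.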

Charles 'Buck' Krasic

unread,
Nov 6, 2017, 3:04:20 PM11/6/17
to Yoav Weiss, blink-dev, Ryan Hamilton, Mark Nottingham, ann...@annevk.nl, Jake Archibald, jyas...@google.com, Patrick McManus
On Mon, Nov 6, 2017 at 11:58 AM, Yoav Weiss <yo...@yoav.ws> wrote:
Changing our behavior without a clear use-case is fine. I was just wondering if there was a particular reason you chose to tackle this issue.

Also, does a request from Patrick McManus to change behavior here mean that Firefox will also align on similar behavior? Is there an open issue for them on that front?

(CCing Patrick)

Clarification.  I didn't see it as a change in behavior.  I was doing the initial implementation of server push in chromium QUIC, and I was just trying to understand the spec.
 
 




--
Charles 'Buck' Krasic | Software Engineer | ckr...@google.com | +1 (408) 412-1141

Patrick McManus

unread,
Nov 6, 2017, 3:11:49 PM11/6/17
to Yoav Weiss, ckr...@chromium.org, blink-dev, Ryan Hamilton, mnot, Anne van Kesteren, Jake Archibald, jyas...@google.com, Patrick McManus
yes - the firefox push cache should take vary into account.. (I'm not saying someone is typing the code for that atm, but that's where we'll want to be..)

Yoav Weiss

unread,
Nov 6, 2017, 3:11:55 PM11/6/17
to Charles 'Buck' Krasic, blink-dev, Ryan Hamilton, Mark Nottingham, ann...@annevk.nl, Jake Archibald, jyas...@google.com, Patrick McManus
On Mon, Nov 6, 2017 at 12:04 PM Charles 'Buck' Krasic <ckr...@google.com> wrote:
Clarification.  I didn't see it as a change in behavior.  I was doing the initial implementation of server push in chromium QUIC, and I was just trying to understand the spec.

OK. I agree it's not a change in behavior for QUIC. I now understand that the discussion with Patrick revolved around the QUIC behavior, so that doesn't necessarily mean Firefox aims to align their H2 behavior. Is that correct?

Yoav Weiss

unread,
Nov 6, 2017, 3:21:58 PM11/6/17
to Patrick McManus, ckr...@chromium.org, blink-dev, Ryan Hamilton, mnot, Anne van Kesteren, Jake Archibald, jyas...@google.com
On Mon, Nov 6, 2017 at 12:11 PM Patrick McManus <mcm...@ducksong.com> wrote:
yes - the firefox push cache should take vary into account.. (I'm not saying someone is typing the code for that atm, but that's where we'll want to be..)

Thanks, Patrick. That's valuable input! :)