--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CACMu3toRrnLet8r8NTLTxoZob%2Bj-0ZAYsmgaKorYQe7KfSGAdA%40mail.gmail.com.
It seems like https://fetch.spec.whatwg.org/ should describe this behavior of fulfilling requests based on previously-pushed resources. Is anyone working on that?
On Mon, Oct 30, 2017 at 1:17 PM Bence Béky <b...@chromium.org> wrote:

No. WPT does not support HTTP/2.
> On 2 Nov 2017, at 11:44 am, Ryan Hamilton <r...@chromium.org> wrote:
>
> We (chrome network team) talked some time back about pushing directly into the HTTP cache and decided against it because (swapping in context) it lets the server write entries into the HTTP cache that the client would not otherwise create. For example, consider a situation where a page is infected with malware. SafeBrowsing would kick in and prevent the request from going to the network and the infected page would not end up in the HTTP cache. Once the page was cleaned up and removed from SafeBrowsing, requests for that URL would go to the network and into the cache. However, if the server pushed the page into the HTTP cache, then after the page is cleaned up and removed from SafeBrowsing, the client would request the URL and the infected page would be served out of the HTTP cache. This seemed bad.
If SafeBrowsing detects that a request to https://evil.example.com/ is infected, does the request actually go out on the wire?
I ask because pushed requests have to be associated with a stream that's either open or half-closed.
> If SafeBrowsing detects that a request to https://evil.example.com/ is infected, does the request actually go out on the wire?

No.
So, I don't see any reason why you can't just patch in before a cache write. That's the most intuitive way to model it, rather than inventing a new cache -- especially when SB isn't part of the standard.

> On 3 Nov 2017, at 9:12 am, Ryan Hamilton <r...@chromium.org> wrote:
>
> On Wed, Nov 1, 2017 at 10:50 PM, Anne van Kesteren <ann...@annevk.nl> wrote:
> On Thu, Nov 2, 2017 at 4:46 AM, Ryan Hamilton <r...@chromium.org> wrote:
> > That's true, but imagine that the user requests https://example.com/ which
> > links to https://example.com/other. The request to https://example.com/
> > would go to the wire and the server could try to push /other (which is
> > currently infected) as associated with /. SafeBrowsing would prevent the
> > request for /other from hitting the wire and hence the pushed (infected)
> > resource would not be pulled up into the HTTP cache. After a short period of
> > time it will expire from the H2 push cache.
> >
> > Does that make sense?
>
> If that is the sole reason, why invent a new caching mechanism rather
> than perform a SafeBrowsing check before putting pushed resources in
> the cache?
>
> Safe browsing is not the only reason, no. But it's an example (the only one I could remember offhand, admittedly :>) of a class of problems that crop up when things are written to the cache by the net stack without the intervention of higher layers of the code.
Also, AIUI SB is done on a per-origin level; if so, your example above doesn't hold together.
For the host, the client will try at most five different strings. They are:
For the path, the client will try at most six different strings. They are:
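Those host and path combinations come from the Safe Browsing lookup rules. A rough sketch of my reading of that spec (illustrative code, not Chrome's implementation):

```python
def host_lookups(host):
    """Up to five host strings: the exact host, plus suffixes formed from
    the last five components by successively removing the leading one,
    keeping at least two components."""
    parts = host.split(".")
    lookups = [host]
    tail = parts[-5:]
    for i in range(len(tail) - 1):
        candidate = ".".join(tail[i:])
        if candidate != host:
            lookups.append(candidate)
    return lookups

def path_lookups(path, query=""):
    """Up to six path strings: the exact path with and without the query,
    plus up to four prefixes ("/" and successive directories)."""
    lookups = []
    if query:
        lookups.append(path + "?" + query)
    if path not in lookups:
        lookups.append(path)
    dirs = path.split("/")[1:-1]   # directory components, excluding the leaf
    prefix = "/"
    prefixes = [prefix]
    for d in dirs[:3]:
        prefix += d + "/"
        prefixes.append(prefix)
    for p in prefixes:
        if p not in lookups:
            lookups.append(p)
    return lookups
```

The relevant point for this thread: lookups cover hosts *and* paths, so SB is per-URL-prefix rather than strictly per-origin.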
My understanding is that no one actually knows where the requirement for a separate push cache is coming from -- both the browser network engineers and browser security folks say it came from somewhere else, but upon examination there isn't a well-stated reason for it.
> On 3 Nov 2017, at 1:47 pm, Ryan Hamilton <r...@chromium.org> wrote:
> I was simply attempting to explain why the Chrome network engineers chose to have a separate push cache. Namely, that pushing straight into the cache at the HTTP layer results in entries in the cache for resources that the client may not request for policy reasons. Now sure, it might be possible to up-call from the network layer into the loading/rendering layer at cache write time to see if a push really should go into the cache. But that's non-trivially complex, and as it will be asynchronous, you need a place for the pushed data to live while you're waiting for that call to complete. You might call such a waiting area the "push cache" or you might call it the "push map". Or, instead of doing the up-call, you could simply wait for the request to come in. Both cases result in pushed stream data sitting down in the network layer before being pulled up into the cache when instructed by the higher layers. The latter solution is much easier to implement, but either way, you still end up with the same holding area.
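The "holding area" described above can be pictured as a small map from promised URL to pushed stream (names here are illustrative, not Chrome's actual types):

```python
class PushedStream:
    """Stub for a pushed HTTP/2 stream held in the network layer."""
    def __init__(self, url):
        self.url = url
        self.reset = False
    def rst_stream(self):
        self.reset = True

class PushMap:
    """Pushed responses wait here until a matching request from the
    higher layers pulls them up toward the HTTP cache."""
    def __init__(self):
        self._streams = {}
    def on_push_promise(self, url, stream):
        self._streams[url] = stream
    def claim(self, url):
        # A request adopts the waiting pushed stream, if any; only a
        # claimed response ever reaches the HTTP cache.
        return self._streams.pop(url, None)
    def expire(self, url):
        # Unclaimed pushes are eventually reset and discarded.
        stream = self._streams.pop(url, None)
        if stream is not None:
            stream.rst_stream()
```

Either design in the quoted message (up-call or wait-for-request) ends up gating cache writes on `claim`.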
This leaves me wondering why SafeBrowsing is considered to be at the loading/rendering layer, when it seems so well-suited to be a network-layer function.
An alternative approach would be to mark the pushed response "potentially dirty" upon cache insertion, updating the cache with "clean" or "dirty" (i.e., purging it) upon the SafeBrowsing call returning. This is the approach that's taken in similar situations in intermediary caches, and it works reasonably well at that scale. That avoids creating another layer of caching.
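A minimal sketch of that marking scheme (hypothetical names; not how any browser actually implements its cache):

```python
from enum import Enum

class EntryState(Enum):
    POTENTIALLY_DIRTY = "potentially-dirty"
    CLEAN = "clean"

class HttpCache:
    """Entries inserted by push start out flagged; the async SafeBrowsing
    result either clears the flag or purges the entry."""
    def __init__(self):
        self._entries = {}   # url -> (state, response)

    def insert_pushed(self, url, response):
        # Insert immediately, but mark as unverified until the check returns.
        self._entries[url] = (EntryState.POTENTIALLY_DIRTY, response)

    def on_safebrowsing_result(self, url, is_safe):
        entry = self._entries.get(url)
        if entry is None:
            return
        if is_safe:
            self._entries[url] = (EntryState.CLEAN, entry[1])
        else:
            del self._entries[url]   # "dirty": purge the entry

    def lookup(self, url):
        entry = self._entries.get(url)
        if entry is None or entry[0] is EntryState.POTENTIALLY_DIRTY:
            return None              # don't serve unverified pushed entries
        return entry[1]
```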
If it's not obvious, a lot of the concern about this is because having another layer of caching makes understanding the system behaviour all the more difficult -- especially when that cache's behaviours aren't aligned with the others'.
> In any case, the possible change to async SafeBrowsing and the effects this has on the HTTP cache are an ... active topic of discussion at the moment. :) The WebRequest extension API is another example of such a higher layer interaction.
Interesting. That might be a better illustration than SafeBrowsing.
In VCL-land, we have a separate callback for inspecting/modifying a response before (possible) cache insertion. I can't help but wonder if browsers need something similar. I know ServiceWorker can do it with *its* cache, but while there's still an HTTP cache in the client, it needs to be accounted for too.
To back up, I came into this thread to address the assertion:

> My understanding is that no one actually knows where the requirement for a separate push cache is coming from -- both the browser network engineers and browser security folks say it came from somewhere else, but upon examination there isn't a well-stated reason for it.

I was simply attempting to explain why the Chrome network engineers chose to have a separate push cache. Namely, that pushing straight into the cache at the HTTP layer results in entries in the cache for resources that the client may not request for policy reasons. Now sure, it might be possible to up-call from the network layer into the loading/rendering layer at cache write time to see if a push really should go into the cache.
On Thu, Nov 2, 2017 at 7:47 PM Ryan Hamilton <r...@chromium.org> wrote:

> To back up, I came into this thread to address the assertion that no one actually knows where the requirement for a separate push cache is coming from. I was simply attempting to explain why the Chrome network engineers chose to have a separate push cache. Namely, that pushing straight into the cache at the HTTP layer results in entries in the cache for resources that the client may not request for policy reasons. Now sure, it might be possible to up-call from the network layer into the loading/rendering layer at cache write time to see if a push really should go into the cache.

This actually strikes me as a possibility.

<naive-speculation>

Call up, not at cache write time, but when the network layer receives the PUSH_PROMISE (semantically, a Request) that opens the response stream. This PUSH_PROMISE is associated with an existing fetch operation, "parent". If "parent" is owned by a JS fetch() call, its observer (https://gist.github.com/slightlyoff/18dc42ae00768c23fbc4c5097400adfb#gistcomment-2227534) receives a cancelable event for the PUSH_PROMISE. If the event is canceled, that RST_STREAMs the pushed stream. Otherwise, "parent"'s page fetches the pushed request, which schedules it to be canceled if the page navigates, and runs the request through any frontend-based SafeBrowsing checks before directing the response stream toward the cache.

If "parent" is owned by the browser itself, it always just does the default action of fetching the pushed request. This default fetch can also be the place to incorporate the Vary-checking in this thread's Intent: if a new request to the pushed request's URL wouldn't match the pushed Vary header, it could cancel instead of actually fetching.

There's still a period where the network stack has to store the response stream before the page has taken ownership, but I think this avoids exposing that complexity to web authors.

</naive-speculation>

What have I missed?
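The flow inside the <naive-speculation> above can be sketched roughly as follows (all names hypothetical; the observer API referenced is itself only a proposal):

```python
class PushPromiseEvent:
    """Cancelable event delivered to a fetch()'s observer on PUSH_PROMISE."""
    def __init__(self, request):
        self.request = request
        self.canceled = False
    def cancel(self):
        self.canceled = True

class Stream:
    """Stub for the pushed HTTP/2 stream."""
    def __init__(self):
        self.reset = False
    def rst_stream(self):
        self.reset = True

class Page:
    def __init__(self):
        self.fetched = []
    def fetch(self, request, cancel_on_navigate=True):
        # Would run frontend SafeBrowsing checks before directing the
        # response stream toward the HTTP cache (elided in this sketch).
        self.fetched.append(request)

def on_push_promise(parent_observer, page, pushed_request, stream):
    """If the parent fetch has a JS observer, give it a chance to cancel;
    otherwise (or if not canceled) do the default: fetch the pushed request."""
    if parent_observer is not None:
        event = PushPromiseEvent(pushed_request)
        parent_observer(event)
        if event.canceled:
            stream.rst_stream()   # canceling RST_STREAMs the pushed stream
            return
    page.fetch(pushed_request, cancel_on_navigate=True)
```

A browser-owned "parent" would simply pass `parent_observer=None` and always take the default fetch path.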
On Fri, Nov 3, 2017 at 4:28 PM Ryan Hamilton <r...@chromium.org>
I *think*, although I could be wrong, that passing every push up to the page, which always either fetches it into the HTTP cache or cancels it, eliminates the developer-visible push cache. That in turn gets rid of two troublesome things Jake pointed out:

* If the connection closes, bye bye push cache
* Requests without credentials use a separate connection
On Fri, Nov 3, 2017 at 11:51 PM Jeffrey Yasskin <jyas...@google.com> wrote:

> I *think*, although I could be wrong, that passing every push up to the page, which always either fetches it into the HTTP cache or cancels it, eliminates the developer-visible push cache. That in turn gets rid of two troublesome things Jake pointed out:
>
> * If the connection closes, bye bye push cache
> * Requests without credentials use a separate connection

While getting rid of the above would indeed be great, I'm not sure the conclusion must be that we need to store pushed resources in the HTTP cache. I also think this is a wider-scoped discussion which doesn't necessarily need to block the work that Bence is proposing, which will lead to arguably better push behavior in the interim.

What I'm missing here is a way to assess the risk involved. While push changes are likely to bear little breakage risk, they can lead to spurious downloads and can result in perf regressions.

A few points that bother me:

* As I commented in the doc, I think the way Range request support will be implemented will increase spurious pushes. I believe we need to gather data on the presence of such pushes today before making this compromise.
* It's not clear to me what the use case is for `Vary` headers in pushed responses. Generally, if you need content negotiation, server push is not the right tool for the job (unless that negotiation gets the same result in 99% of the cases, e.g. "Content-Encoding"). Bence, could you comment on which user scenarios are likely to improve as a result of this change?
Hi,
Thank you for the valuable feedback.
I agree with Jeffrey that it is important for the ecosystem not to
match requests to pushed streams with incompatible Vary values. While
I am not aware of any use case, I know that this property is
implemented for QUIC pushes in Chrome. Cc'ing Buck who implemented
that: Buck, are you aware of any specific use cases?
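The Vary matching being discussed follows the usual cache rule (RFC 7234 §4.1, referenced by RFC 7540 §8.2.2 for pushes): a pushed response may satisfy a later request only if every header named in Vary has the same value on both requests. A sketch, assuming lowercase header-name keys:

```python
def vary_matches(pushed_request_headers, pushed_response_headers, new_request_headers):
    """True if the pushed response's Vary header permits it to satisfy
    the new request (header dicts keyed by lowercase field name)."""
    vary = pushed_response_headers.get("vary", "")
    if vary.strip() == "*":
        return False   # Vary: * never matches
    for name in (v.strip().lower() for v in vary.split(",") if v.strip()):
        if pushed_request_headers.get(name) != new_request_headers.get(name):
            return False
    return True
```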
On Monday, November 6, 2017 at 7:15:51 AM UTC-8, Bence Béky wrote:

> Hi,
>
> Thank you for the valuable feedback.
>
> I agree with Jeffrey that it is important for the ecosystem not to match requests to pushed streams with incompatible Vary values. While I am not aware of any use case, I know that this property is implemented for QUIC pushes in Chrome. Cc'ing Buck who implemented that: Buck, are you aware of any specific use cases?
No. The inclusion in QUIC's implementation wasn't driven by a use case, but rather was based on an e-mail discussion I had with Patrick McManus about RFC 7540.
I can also add a metric to see what percentage of pushed responses
have a Vary header. If it is exceedingly low, it might be a
good option to reject such pushes for the time being as proposed by
Jeffrey, until a compelling use case arises. However, I would not be
surprised if it actually was pretty high due to Content-Encoding.
Yoav: I was told that game asset loading or DASH streaming could
potentially benefit from non-overlapping Range pushes for the same
URL.
Again, if wasted download bandwidth is a concern, measurements
can be made before making this change.
On Mon, Nov 6, 2017 at 10:08 AM <ckr...@chromium.org> wrote:

> On Monday, November 6, 2017 at 7:15:51 AM UTC-8, Bence Béky wrote:
>
> > Hi,
> >
> > Thank you for the valuable feedback.
> >
> > I agree with Jeffrey that it is important for the ecosystem not to match requests to pushed streams with incompatible Vary values. While I am not aware of any use case, I know that this property is implemented for QUIC pushes in Chrome. Cc'ing Buck who implemented that: Buck, are you aware of any specific use cases?
>
> No. The inclusion in QUIC's implementation wasn't driven by a use case, but rather was based on an e-mail discussion I had with Patrick McManus about RFC 7540.

Changing our behavior without a clear use case is fine. I was just wondering if there was a particular reason you chose to tackle this issue.

Also, does a request from Patrick McManus to change behavior here mean that Firefox will also align on similar behavior? Is there an open issue for them on that front? (CCing Patrick)

> I can also add a metric to see what percentage of pushed responses have a Vary header. If it is exceedingly low, it might be a good option to reject such pushes for the time being as proposed by Jeffrey, until a compelling use case arises. However, I would not be surprised if it actually was pretty high due to Content-Encoding.

Yeah, I would hope Content-Encoding negotiation can happen on the initial request before anything is pushed on the connection, but that may not always be the case. Gathering that data can be interesting, but I don't know if we should block shipping on it.

> Yoav: I was told that game asset loading or DASH streaming could potentially benefit from non-overlapping Range pushes for the same URL.

That makes sense.

> Again, if wasted download bandwidth is a concern, measurements can be made before making this change.

Ideally, I'd love to see Range request support that still RSTs push promises on formerly-pushed URLs which have no range in their request. That kind of implementation will have no spurious-download concerns while still covering the Range request use case.
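The policy suggested here could be as simple as the following sketch (hypothetical helper; the caller would send RST_STREAM on rejection):

```python
def accept_push_promise(url, request_headers, previously_pushed):
    """Reject a new push promise for an already-pushed URL unless it
    carries a Range header, so non-overlapping Range pushes (game assets,
    DASH segments) still work while duplicate full-resource pushes are
    refused. Header names are assumed lowercase."""
    if url in previously_pushed and "range" not in request_headers:
        return False   # caller RSTs the pushed stream
    previously_pushed.add(url)
    return True
```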
On Mon, Nov 6, 2017 at 11:58 AM, Yoav Weiss <yo...@yoav.ws> wrote:

> Changing our behavior without a clear use case is fine. I was just wondering if there was a particular reason you chose to tackle this issue. Also, does a request from Patrick McManus to change behavior here mean that Firefox will also align on similar behavior? Is there an open issue for them on that front? (CCing Patrick)

Clarification: I didn't see it as a change in behavior. I was doing the initial implementation of server push in chromium QUIC, and I was just trying to understand the spec.
Yes - the Firefox push cache should take Vary into account. (I'm not saying someone is typing the code for that at the moment, but that's where we'll want to be.)