I was about to respond that your conclusion didn't seem to have
any correlation to the question being studied, but when I read it
in light of the SPDY draft 2 use of "server push" then it does.
Web folks use that same term for asynchronous notifications and
pointcast-style content updates, so I found it very confusing.
BTW, the SPDY server push acts contrary to hierarchical caching,
so it would be actively harmful to the commons. No improvement
in page load times could justify it because a cache hit is both
faster for the client and better for the shared infrastructure.
....Roy
On Jun 4, 2010, at 2:41 PM, Matthew Lloyd wrote:
> Does server push have sufficient impact on page load times to justify the relatively large amount of effort required to build and deploy it? I have been doing a number of experiments over the last few months to attempt to answer that question, and have prepared a report summarizing my findings. You can view the full report here:
>
> http://docs.google.com/View?id=d446246_0cc6c6dkr
>
> I'm also attaching a summary of the benchmark results to this email. Please see the report for more details.
>
> Based on the theoretical analysis and experimental results in the report, it seems that the performance gains achievable with server push probably do not justify the substantial difficulties faced in making it work in the real world. If there are no compelling other use cases for server push, we may wish to consider pruning it from the SPDY specification or making it optional.
Sorry - I'm still confused.
Are you suggesting to remove:
the server initiated stream feature part of the framing layer
or just the "Server Push Transactions" part of the HTTP layering over SPDY
section - which is about multiple responses for the same request
or just the "Prefetching/server push" part of the Future work/experiments
or all of the above.
My feeling is that without the server push parts of HTTP there is no
clear use-case for the server initiated streams in the framing layer.
However I don't think the complexity in the framing layer is too
great for this and thus it might be good to keep the capability
in case use-cases do emerge.
cheers
I have a number of questions/comments here.
* Packet loss seems to make a big difference in results. Your email
didn't mention the distribution of the packet loss. I assume you are
just using a uniform distribution. If not, please describe it
further. Also, I think I should note that this probably is not an
accurate model of real packet loss (not that I have a suggestion for a
better model). I thought you may have mentioned before how the packet
loss affects the page load time, but I don't see anything in this
report. Can you provide an explanation from your analysis of the
packet traces?
* cwnd clearly plays a big role here. We've proposed to the IETF that
cwnd be 10, and SPDY is going to 18. How do values like these affect
PLT?
* It'd be nice to see Speed Tracer output, so we can see how the
difference in receipt times of different resources affects the
rendering engine. Obviously this doesn't work in aggregate, but if we
could see a few different runs per setting, it might be enlightening.
* I think I've mentioned to you before that the results should mention
the caveat that these were gathered with chunking turned off. A test
on a chunked page might have a different result. Sending the first
chunk out earlier will result in client ACKs reaching the server
earlier, thus increasing cwnd sooner. Also, I'm not sure how the
packetization will vary between a chunked page and a non-chunked page,
but it could affect the number of client ACKs, which again will affect
cwnd. Also, chunking will clearly affect the various metrics in terms
of first paint / doc load / page load. I'm also curious how it would
affect the rendering engine.
* This report is only with Google Image Search. It'd be nice to see
results on other page types. Others have suggested the Google
Homepage for one.
In response to the final question of whether the performance gains
achievable with server push are worth it, it's not immediately clear
to me why we're not more excited about the 8.5% speedup (cwnd=50).
Does everyone consider that negligible? Or is it that we don't think
that we'll realistically have a cwnd of 50? Or are we worried about
overcoming the other issues (such as caching)? How much of the image
search page is likely to be cached? I wouldn't think that the image
thumbnails would typically be in cache.
I think we don't have enough data to make a decision yet. I think we
should run more experiments.
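Regarding the initcwnd question above, here is a back-of-envelope model (my own toy sketch, not taken from Matthew's report) of why the initial window matters so much: it determines how many round trips slow start needs to deliver a page of a given size.

```python
# Toy slow-start model: round trips needed to deliver `total_bytes`
# when the congestion window doubles each RTT (no loss, no delayed ACKs).
MSS = 1460  # assumed bytes per segment

def round_trips(total_bytes, initcwnd):
    cwnd, sent, rtts = initcwnd, 0, 0
    while sent < total_bytes:
        sent += cwnd * MSS   # one full window delivered per round trip
        cwnd *= 2            # slow start doubles the window each RTT
        rtts += 1
    return rtts

# e.g. a ~300 KB page at the cwnd values discussed in this thread:
for initcwnd in (4, 10, 18, 50):
    print(initcwnd, round_trips(300_000, initcwnd))
```

Under these (idealized) assumptions the jump from cwnd=10 to cwnd=50 saves a couple of round trips on a 300 KB transfer, which is the kind of gap the benchmarks are probing.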
On 5 June 2010 09:48, Matthew Lloyd <mll...@chromium.org> wrote:
> On Fri, Jun 4, 2010 at 7:34 PM, Roy T. Fielding <roy.fi...@gmail.com>
>> I was about to respond that your conclusion didn't seem to have
>> any correlation to the question being studied, but when I read it
>> in light of the SPDY draft 2 use of "server push" then it does.
>> Web folks use that same term for asynchronous notifications and
>> pointcast-style content updates, so I found it very confusing.
>
> Sorry for the confusion - yes, as you correctly concluded, the report is
> about the server-initiated stream feature of the SPDY protocol, which we
> have been referring to both within Google and in the specification as
> "server push".
Matt,
If just the even/odd stream ID is kept, that would leave the door
open for server initiated streams.
I've spent a few days trying to think of another use-case, but the
problem is that there are just no known targets/handlers inside a
browser that can be a destination for any server initiated channels.
The server HTTP push worked because the browser cache is a moderately
well defined target to send streams to. But if there is no benefit
for that, and it violates the cache design anyway (as Roy suggested),
then there are no other well defined targets that I can think of -
other than the DOM itself (which is too scary for me to contemplate).
If the client has to take action (eg register functions) to set up a
target, then it can also open the stream (eg websocket).
cheers
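As a side note, the even/odd stream-ID convention under discussion is simple to state. A sketch of the parity rule as I read it from the draft (illustrative, not normative):

```python
# SPDY assigns odd stream IDs to client-initiated streams and even
# IDs to server-initiated streams, so both sides can open streams
# concurrently without ID collisions.

def is_client_initiated(stream_id: int) -> bool:
    return stream_id % 2 == 1

def is_server_initiated(stream_id: int) -> bool:
    # Stream 0 is not a valid stream ID.
    return stream_id > 0 and stream_id % 2 == 0
```

Keeping this split in the framing layer costs essentially nothing even if the HTTP-level push semantics are dropped, which is the point being made above.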
Matthew
If I understand this correctly, the improvement is due to the server sending
the content in a way that avoids a few client roundtrips and client parsing
delays. Neither requires 'server push' - or protocol changes. You can define a
new content type - say 'application/chunked-mux' - and use the normal
"Accept-Encoding" to advertise that the client supports it.
The server would detect it and send the same packets it would send in the
'server push' case, but as part of the normal request/response. Servers and
caches will see a normal HTTP request - the servlet will need to either encode
on demand, or have pre-encoded responses with the multiplexed content. It
would be similar to a .zip file, but with the streams chunked and mixed.
Creating such a stream that is optimal for rendering would be tricky - just
like it is for server push to decide which resources to send and when - but
the result is the same, and IMHO it's much cleaner than a protocol change. It
can be deployed on existing infrastructure, so the 8% can be realized now,
without waiting for SPDY-aware caches and servers to be widely deployed.
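To make the proposal above concrete, here is one hypothetical wire format for such a bundle. The framing, the helper names, and the 'application/chunked-mux' handling are all illustrative inventions, not anything from a spec: each chunk is tagged with the sub-resource URL it belongs to, so chunks from different resources can be interleaved in a single response body.

```python
import struct

# Hypothetical bundle framing (purely illustrative):
#   [2-byte URL length][URL][4-byte chunk length][chunk data], repeated.
# Chunks for different URLs may be interleaved in any order.

def encode_chunk(url: str, data: bytes) -> bytes:
    u = url.encode("utf-8")
    return struct.pack("!H", len(u)) + u + struct.pack("!I", len(data)) + data

def decode_bundle(payload: bytes) -> dict:
    """Reassemble sub-resources from an interleaved bundle body."""
    resources, i = {}, 0
    while i < len(payload):
        (ulen,) = struct.unpack_from("!H", payload, i); i += 2
        url = payload[i:i + ulen].decode("utf-8"); i += ulen
        (clen,) = struct.unpack_from("!I", payload, i); i += 4
        resources[url] = resources.get(url, b"") + payload[i:i + clen]; i += clen
    return resources

# Chunks of /a.css and /b.js interleaved in one response body:
bundle = (encode_chunk("/a.css", b"body{") +
          encode_chunk("/b.js", b"var x=1;") +
          encode_chunk("/a.css", b"}"))
assert decode_bundle(bundle)["/a.css"] == b"body{}"
```

A real design would also need per-sub-resource headers (content type, cache expiry), which is where much of the debated caching complexity would live.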
On Tue, Jun 8, 2010 at 10:28 AM, Costin Manolache <cos...@gmail.com> wrote:
> If I understand this correctly, the improvement is due to the server sending
> the content in a way that avoids a few client roundtrips and client parsing
> delays. Neither requires 'server push' - or protocol changes. You can define
> a new content type - say 'application/chunked-mux' - and use the normal
> "Accept-Encoding" to advertise that the client supports it.

I think you're just kicking the same work to a different layer, but adding a
new set of oddities which now percolate through the entire system and even up
to the HTML. For instance, what is the ID of the objects sent via
"application/chunked-mux"? Traditionally, the web uses URLs. But in your case,
you'll have a single URL for the bundle, and then some sort of sub-id for the
resources inside of it.

Maybe you intend that the client will crack open the chunk, and then store the
items contained within as URLs and apply the same disk caching policies that
it would for any URL? Does that mean these items are independent and expire
separately? Or do they all have the same cache expiry settings? I think you'll
find that when you run through all the combinations, you've got more
complexity in this new scheme than we did with Server Push, where we could
keep the identity consistent with today's naming.

BTW - there is a resource-bundle proposal from Mozilla which is exactly what
you're describing. A couple of other things:
a) as you can see from Matthew's data, addressing bundling without addressing
cwnd is a performance loser. So the notion that you don't need any protocol
changes for this doesn't seem true yet.
b) you lose priorities to some degree, but maybe that could be mitigated with
per-priority bundles.
On Tue, Jun 8, 2010 at 10:43 AM, Mike Belshe <mbe...@google.com> wrote:
> I think you're just kicking the same work to a different layer, but adding a
> new set of oddities which now percolate through the entire system and even
> up to the HTML. For instance, what is the ID of the objects sent via
> "application/chunked-mux"? Traditionally, the web uses URLs. But in your
> case, you'll have a single URL for the bundle, and then some sort of sub-id
> for the resources inside of it.

Yes, I think a higher layer is more appropriate for this. The ID of the
resource will be the URL - it would require cache servers to support content
negotiation, as with all content using Accept. Inside the bundle you'll have
sub-resources, each starting with a URL - it's no different than server push.

> Maybe you intend that the client will crack open the chunk, and then store
> the items contained within as URLs and apply the same disk caching policies
> that it would for any URL? Does that mean these items are independent and
> expire separately? Or do they all have the same cache expiry settings? I
> think you'll find that when you run through all the combinations, you've
> got more complexity in this new scheme than we did with Server Push where
> we could keep the identity consistent with today's naming.

It's a different kind of complexity :-) - but I think that if you add up all
layers it's less with this scheme. Server, caching proxies, and transport are
not changed - you only need client changes to understand the new content type.
And even on the client it's not a huge change: multipart/related is pretty
common, and this is very similar. The only difference compared with
multipart/related is that the parts are chunked and mixed.

> BTW - there is a resource-bundle proposal from Mozilla which is exactly
> what you're describing. A couple of other things: a) as you can see from
> Matthew's data - addressing bundling without addressing cwnd is a
> performance loser. So the notion that you don't need any protocol changes
> for this doesn't seem true yet.

My point was that SPDY + push is about equivalent with SPDY + a single
mixed/chunked response.
Awesome report Matthew, thanks for putting this together!
Yes, I'd expect packet loss to be bursty.
>
>>
>> I thought you may have mentioned before how the packet
>> loss affects the page load time, but I don't see anything in this
>> report. Can you provide an explanation from your analysis of the
>> packet traces?
>
> The client was running a recent version of Windows (Vista), so the cause of
> the increase in PLT is not a high initial RTO (i.e. the 3 second penalty we
> saw on XP for SYN and first data packet loss). In this case packet loss
> increases PLT because it causes retransmissions and a cut in the size of the
> congestion window whenever a packet is dropped, which decreases effective
> bandwidth.
Sorry, I wasn't clear in my line of questioning here. You definitely
answered the obvious question of how packet loss affects PLT, but I
was more interested in the relative impact of packet loss on PLT
between the SPDY and SPDY + server push cases. The PLT win of server
push decreases significantly when packet loss is 0%. Are the stalls
caused by packet loss giving server push a bigger win since it's
better able to fill the pipe during the stalls?
>>
>> * cwnd clearly plays a big role here. We've proposed to the IETF that
>> cwnd be 10, and SPDY is going to 18. How do values like these affect
>> PLT?
>
> I haven't benchmarked at cwnd=18, but I can do that if there's enough
> interest. The result is likely to be somewhere between the -4% loss and +10%
> win I saw for server push with the default initcwnd and initcwnd=50,
> respectively. Of course exactly where on this spectrum it would lie will
> also depend crucially on the web property.
>
>>
>> * It'd be nice to see Speed Tracer output, so we can see how the
>> difference in receipt times of different resources affects the
>> rendering engine. Obviously this doesn't work in aggregate, but if we
>> could see a few different runs per setting, it might be enlightening.
>
> I've been asked to provide that before and my general take is that it isn't
> enlightening, and in fact is often very misleading, to look at even a
> handful of packet traces. There is just too much variability so it's easy to
> draw the wrong conclusion.
I think that it's probably wrong to draw an overall conclusion from
just a few packet traces. But what I think would be useful is some
illustrations of how the pipe is getting filled in the different
cases. I'm not sure it's totally misleading, as long as one is aware
that this is just one sample from the distribution and keeps in mind
how the overall distribution looks.
>>
>> * I think I've mentioned to you before that the results should mention
>> the caveat that these were gathered with chunking turned off. A test
>> on a chunked page might have a different result. Sending the first
>> chunk out earlier will result in client ACKs reaching the server
>> earlier, thus increasing cwnd sooner. Also, I'm not sure how the
>> packetization will vary between a chunked page and a non-chunked page,
>> but it could affect the number of client ACKs, which again will affect
>> cwnd. Also, chunking will clearly affect the various metrics in terms
>> of first paint / doc load / page load. I'm also curious how it would
>> affect the rendering engine.
>
> By chunking, do you mean that the image search server is able to start
> sending the body content before it knows exactly what the image thumbnail
> URLs will be, by sending the server push headers as a chunk? Yes, it's
> possible this could result in the first response data packet being sent from
> the server sooner than in the current implementation. Arguably this might
> increase cwnd for the portion of the main results page that is after the
> resource reference.
> I can't go into the specifics in this public forum, but we looked at the
> possibility of running that test and decided, for reasons of difficulty of
> implementation in the image search server, that it would require more
> engineering time than it merited. We can of course revisit that decision and
> put engineering resources onto it, but it might be quicker to simulate this in
> the SPDY in-memory server and see whether it makes a difference there.
I was referring to the fact that, for the google image search page, we
send back a significant amount of chunked data (to render the header
of the page), before we actually know the search results. I agree
that implementing this for reals is a non-trivial effort. I just
thought it was worth noting. And I agree that we could simulate it
quite well using the in-memory server.
Can you clarify this further? If you have other data on server push
hurting PLT, then can you publish this as well? If you are making a
case to remove server push or make it optional, it seems like it'd be
useful to include this other data.
> It is possible that the probability the client has the images cached for
> image search is low. However, note the following. (a) The largest image by
> far, and the first to be fetched, is the nav logo, which weighs in at 30KB,
> and is very likely to be cached by any browser that has been to the image
> search property before. (b) Users are likely to repeat their queries within
> a short timeframe (a few weeks), which increases the probability the
> thumbnails are in the cache. (c) Even if the probability that the thumbnail
> is cached is as low as 10%, that is still 10% more thumbnail bandwidth than
> needs to be sent - across the entire image search property, and all
> intermediate proxies, routers, peers, wireless links, etc. - which is
> detrimental to the commons. In practice, we would be forced to implement the
> bloom filter
> approach.
> ... and implementing the bloom filter approach is going to: (a) require us
> to essentially reengineer the entire disk cache and pay a possibly large
> performance penalty across the entire browser, or incur a big runtime
> penalty at request time, and (b) add an additional 0.5-1KB to the initial
> request headers that will further reduce the already small and rare
> performance win we have seen from server push.
> We would go to all that trouble for a small probability of a <1 RTT latency
> improvement in certain limited circumstances? I think there is lower hanging
> fruit.
Just for clarity's sake, there's nothing that says that the server has
to push data, right? For something like the nav logo, I don't see why
the server would choose to push this. And yeah, I don't think it's
likely that the image thumbnails would have been cached.
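For readers unfamiliar with the "bloom filter approach" mentioned above: the idea is for the client to summarize its cache contents in a compact probabilistic digest sent with the request, so the server only pushes resources the client probably lacks. A minimal sketch - the sizing, the hash scheme, and the class name are my own illustrative choices, not from any SPDY draft:

```python
import hashlib

class CacheDigest:
    """Bloom filter over cached URLs; ~1 KB at these sizes, in line with
    the 0.5-1 KB of request-header overhead discussed in the thread."""

    def __init__(self, size_bits=8192, hashes=4):
        self.size, self.hashes = size_bits, hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, url):
        # Derive k bit positions from one SHA-256 of the URL.
        d = hashlib.sha256(url.encode("utf-8")).digest()
        for k in range(self.hashes):
            yield int.from_bytes(d[4 * k:4 * k + 4], "big") % self.size

    def add(self, url):
        for p in self._positions(url):
            self.bits[p // 8] |= 1 << (p % 8)

    def probably_has(self, url):
        # False positives possible (server skips a push it should have
        # made); false negatives are not.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(url))

digest = CacheDigest()
digest.add("http://example.com/nav_logo.png")
# Server side: only push resources the digest does not claim to have.
assert digest.probably_has("http://example.com/nav_logo.png")
```

The trade-off debated above is visible here: the digest adds bytes to every request and requires enumerating the disk cache, in exchange for avoiding redundant pushes of already-cached resources.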
I know this is an old thread, but I just wanted to point out that this is not
a good site for benchmarking server push functionality, because there is no
nesting of content - the browser can request the images as soon as it receives
and parses the HTML returned by the image search. It seems like you need to
come up with a way to test this for a wider variety of sites, as I think those
that are highly nested will show a much bigger improvement.