I was about to respond that your conclusion didn't seem to have
any correlation to the question being studied, but when I read it
in light of the SPDY draft 2 use of "server push" then it does.
Web folks use that same term for asynchronous notifications and
pointcast-style content updates, so I found it very confusing.
BTW, the SPDY server push acts contrary to hierarchical caching,
so it would be actively harmful to the commons. No improvement
in page load times could justify it because a cache hit is both
faster for the client and better for the shared infrastructure.
....Roy
On Jun 4, 2010, at 2:41 PM, Matthew Lloyd wrote:
> Does server push have sufficient impact on page load times to justify the relatively large amount of effort required to build and deploy it? I have been doing a number of experiments over the last few months to attempt to answer that question, and have prepared a report summarizing my findings. You can view the full report here:
>
> http://docs.google.com/View?id=d446246_0cc6c6dkr
>
> I'm also attaching a summary of the benchmark results to this email. Please see the report for more details.
>
> Based on the theoretical analysis and experimental results in the report, it seems that the performance gains achievable with server push probably do not justify the substantial difficulties faced in making it work in the real world. If there are no compelling other use cases for server push, we may wish to consider pruning it from the SPDY specification or making it optional.
Sorry - I'm still confused.
Are you suggesting to remove:
the server initiated stream feature part of the framing layer
or just the "Server Push Transactions" part of the HTTP layering over SPDY
section - which is about multiple responses for the same request
or just the "Prefetching/server push" part of the Future work/experiments
or all of the above.
My feeling is that without the server push parts of HTTP there is no
clear use-case for the server initiated streams in the framing layer.
However I don't think the complexity in the framing layer is too
great for this and thus it might be good to keep the capability
in case use-cases do emerge.
cheers
I have a number of questions/comments here.
* Packet loss seems to make a big difference in results. Your email
didn't mention the distribution of the packet loss. I assume you are
just using a uniform distribution. If not, please describe it
further. Also, I think I should note that this probably is not an
accurate model of real packet loss (not that I have a suggestion for a
better model). I thought you may have mentioned before how the packet
loss affects the page load time, but I don't see anything in this
report. Can you provide an explanation from your analysis of the
packet traces?
* cwnd clearly plays a big role here. We've proposed to the IETF that
cwnd be 10, and SPDY is going to 18. How do values like these affect
PLT?
* It'd be nice to see Speed Tracer output, so we can see how the
difference in receipt times of different resources affects the
rendering engine. Obviously this doesn't work in aggregate, but if we
could see a few different runs per setting, it might be enlightening.
* I think I've mentioned to you before that the results should mention
the caveat that these were gathered with chunking turned off. A test
on a chunked page might have a different result. Sending the first
chunk out earlier will result in client ACKs reaching the server
earlier, thus increasing cwnd sooner. Also, I'm not sure how the
packetization will vary between a chunked page and a non-chunked page,
but it could affect the number of client ACKs, which again will affect
cwnd. Also, chunking will clearly affect the various metrics in terms
of first paint / doc load / page load. I'm also curious how it would
affect the rendering engine.
* This report is only with Google Image Search. It'd be nice to see
results on other page types. Others have suggested the Google
Homepage for one.
In response to the final question of whether the performance gains
achievable with server push are worth it, it's not immediately clear
to me why we're not more excited about the 8.5% speedup (cwnd=50).
Does everyone consider that negligible? Or is it that we don't think
that we'll realistically have a cwnd of 50? Or are we worried about
overcoming the other issues (such as caching)? How much of the image
search page is likely to be cached? I wouldn't think that the image
thumbnails would typically be in cache.
I think we don't have enough data to make a decision yet. I think we
should run more experiments.
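Regarding the initcwnd question above, here is a back-of-envelope model (my own toy sketch, not taken from Matthew's report) of why the initial window matters so much: it determines how many round trips slow start needs to deliver a page of a given size.

```python
# Toy slow-start model: round trips needed to deliver `total_bytes`
# when the congestion window doubles each RTT (no loss, no delayed ACKs).
MSS = 1460  # assumed bytes per segment

def round_trips(total_bytes, initcwnd):
    cwnd, sent, rtts = initcwnd, 0, 0
    while sent < total_bytes:
        sent += cwnd * MSS   # one full window delivered per round trip
        cwnd *= 2            # slow start doubles the window each RTT
        rtts += 1
    return rtts

# e.g. a ~300 KB page at the cwnd values discussed in this thread:
for initcwnd in (4, 10, 18, 50):
    print(initcwnd, round_trips(300_000, initcwnd))
```

Under these (idealized) assumptions the jump from cwnd=10 to cwnd=50 saves a couple of round trips on a 300 KB transfer, which is the kind of gap the benchmarks are probing.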
On 5 June 2010 09:48, Matthew Lloyd <mll...@chromium.org> wrote:
> On Fri, Jun 4, 2010 at 7:34 PM, Roy T. Fielding <roy.fi...@gmail.com>
>> I was about to respond that your conclusion didn't seem to have
>> any correlation to the question being studied, but when I read it
>> in light of the SPDY draft 2 use of "server push" then it does.
>> Web folks use that same term for asynchronous notifications and
>> pointcast-style content updates, so I found it very confusing.
>
> Sorry for the confusion - yes, as you correctly concluded, the report is
> about the server-initiated stream feature of the SPDY protocol, which we
> have been referring to both within Google and in the specification as
> "server push".
Matt,
If just the even/odd stream ID is kept, that would leave the door
open for server initiated streams.
I've spent a few days trying to think of another use-case, but the
problem is that there are just no known targets/handlers inside a
browser that can be a destination for any server initiated channels.
The server HTTP push worked because the browser cache is a moderately
well defined target to send streams to. But if there is no benefit
for that, and it violates the cache design anyway (as Roy suggested),
then there are no other well defined targets that I can think of -
other than the DOM itself (which is too scary for me to contemplate).
If the client has to take action (eg register functions) to set up a
target, then it can also open the stream (eg websocket).
cheers
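As a side note, the even/odd stream-ID convention under discussion is simple to state. A sketch of the parity rule as I read it from the draft (illustrative, not normative):

```python
# SPDY assigns odd stream IDs to client-initiated streams and even
# IDs to server-initiated streams, so both sides can open streams
# concurrently without ID collisions.

def is_client_initiated(stream_id: int) -> bool:
    return stream_id % 2 == 1

def is_server_initiated(stream_id: int) -> bool:
    # Stream 0 is not a valid stream ID.
    return stream_id > 0 and stream_id % 2 == 0
```

Keeping this split in the framing layer costs essentially nothing even if the HTTP-level push semantics are dropped, which is the point being made above.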
Matthew
If I understand this correctly, the improvement is due to the server sending
the content in a way that avoids a few client roundtrips and client parsing
delays. Neither requires 'server push' - or protocol changes. You can define a
new content type - say 'application/chunked-mux' - and use the normal
"Accept-Encoding" to advertise that the client supports it.
The server would detect it and send the same packets it would send in the
'server push' case, but as part of the normal request/response. Servers and
caches will see a normal HTTP request - the servlet will need to either encode
on demand, or have pre-encoded responses with the multiplexed content. It
would be similar to a .zip file, but with the streams chunked and mixed.
Creating such a stream that is optimal for rendering would be tricky - just
like it is for server push to decide which resources to send and when - but
the result is the same, and IMHO it's much cleaner than a protocol change. It
can be deployed on existing infrastructure, so the 8% can be realized now,
without waiting for SPDY-aware caches and servers to be widely deployed.
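To make the proposal above concrete, here is one hypothetical wire format for such a bundle. The framing, the helper names, and the 'application/chunked-mux' handling are all illustrative inventions, not anything from a spec: each chunk is tagged with the sub-resource URL it belongs to, so chunks from different resources can be interleaved in a single response body.

```python
import struct

# Hypothetical bundle framing (purely illustrative):
#   [2-byte URL length][URL][4-byte chunk length][chunk data], repeated.
# Chunks for different URLs may be interleaved in any order.

def encode_chunk(url: str, data: bytes) -> bytes:
    u = url.encode("utf-8")
    return struct.pack("!H", len(u)) + u + struct.pack("!I", len(data)) + data

def decode_bundle(payload: bytes) -> dict:
    """Reassemble sub-resources from an interleaved bundle body."""
    resources, i = {}, 0
    while i < len(payload):
        (ulen,) = struct.unpack_from("!H", payload, i); i += 2
        url = payload[i:i + ulen].decode("utf-8"); i += ulen
        (clen,) = struct.unpack_from("!I", payload, i); i += 4
        resources[url] = resources.get(url, b"") + payload[i:i + clen]; i += clen
    return resources

# Chunks of /a.css and /b.js interleaved in one response body:
bundle = (encode_chunk("/a.css", b"body{") +
          encode_chunk("/b.js", b"var x=1;") +
          encode_chunk("/a.css", b"}"))
assert decode_bundle(bundle)["/a.css"] == b"body{}"
```

A real design would also need per-sub-resource headers (content type, cache expiry), which is where much of the debated caching complexity would live.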
On Tue, Jun 8, 2010 at 10:28 AM, Costin Manolache <cos...@gmail.com> wrote:
> If I understand this correctly, the improvement is due to the server sending
> the content in a way that avoids a few client roundtrips and client parsing
> delays. Neither requires 'server push' - or protocol changes. You can define
> a new content type - say 'application/chunked-mux' - and use the normal
> "Accept-Encoding" to advertise that the client supports it.

I think you're just kicking the same work to a different layer, but adding a
new set of oddities which now percolate through the entire system and even up
to the HTML. For instance, what is the ID of the objects sent via
"application/chunked-mux"? Traditionally, the web uses URLs. But in your case,
you'll have a single URL for the bundle, and then some sort of sub-id for the
resources inside of it.

Maybe you intend that the client will crack open the chunk, and then store the
items contained within as URLs and apply the same disk caching policies that
it would for any URL? Does that mean these items are independent and expire
separately? Or do they all have the same cache expiry settings? I think you'll
find that when you run through all the combinations, you've got more
complexity in this new scheme than we did with Server Push, where we could
keep the identity consistent with today's naming.

BTW - there is a resource-bundle proposal from Mozilla which is exactly what
you're describing. A couple of other things:
a) as you can see from Matthew's data, addressing bundling without addressing
cwnd is a performance loser. So the notion that you don't need any protocol
changes for this doesn't seem true yet.
b) you lose priorities to some degree, but maybe that could be mitigated with
per-priority bundles.
On Tue, Jun 8, 2010 at 10:43 AM, Mike Belshe <mbe...@google.com> wrote:
> I think you're just kicking the same work to a different layer, but adding a
> new set of oddities which now percolate through the entire system and even
> up to the HTML. For instance, what is the ID of the objects sent via
> "application/chunked-mux"? Traditionally, the web uses URLs. But in your
> case, you'll have a single URL for the bundle, and then some sort of sub-id
> for the resources inside of it.

Yes, I think a higher layer is more appropriate for this. The ID of the
resource will be the URL - it would require cache servers to support content
negotiation, as with all content using Accept. Inside the bundle you'll have
sub-resources, each starting with a URL - it's no different than server push.

> Maybe you intend that the client will crack open the chunk, and then store
> the items contained within as URLs and apply the same disk caching policies
> that it would for any URL? Does that mean these items are independent and
> expire separately? Or do they all have the same cache expiry settings? I
> think you'll find that when you run through all the combinations, you've
> got more complexity in this new scheme than we did with Server Push where
> we could keep the identity consistent with today's naming.

It's a different kind of complexity :-) - but I think that if you add up all
layers it's less with this scheme. Server, caching proxies, and transport are
not changed - you only need client changes to understand the new content type.
And even on the client it's not a huge change: multipart/related is pretty
common, and this is very similar. The only difference compared with
multipart/related is that the parts are chunked and mixed.

> BTW - there is a resource-bundle proposal from Mozilla which is exactly
> what you're describing. A couple of other things: a) as you can see from
> Matthew's data - addressing bundling without addressing cwnd is a
> performance loser. So the notion that you don't need any protocol changes
> for this doesn't seem true yet.

My point was that SPDY + push is about equivalent with SPDY + a single
mixed/chunked response.
Awesome report Matthew, thanks for putting this together!
Yes, I'd expect packet loss to be bursty.
>
>>
>> I thought you may have mentioned before how the packet
>> loss affects the page load time, but I don't see anything in this
>> report. Can you provide an explanation from your analysis of the
>> packet traces?
>
> The client was running a recent version of Windows (Vista), so the cause of
> the increase in PLT is not a high initial RTO (i.e. the 3 second penalty we
> saw on XP for SYN and first data packet loss). In this case packet loss
> increases PLT because it causes retransmissions and a cut in the size of the
> congestion window whenever a packet is dropped, which decreases effective
> bandwidth.
Sorry, I wasn't clear in my line of questioning here. You definitely
answered the obvious question of how packet loss affects PLT, but I
was more interested in the relative impact of packet loss on PLT
between the SPDY and SPDY + server push cases. The PLT win of server
push decreases significantly when packet loss is 0%. Are the stalls
caused by packet loss giving server push a bigger win since it's
better able to fill the pipe during the stalls?
>>
>> * cwnd clearly plays a big role here. We've proposed to the IETF that
>> cwnd be 10, and SPDY is going to 18. How do values like these affect
>> PLT?
>
> I haven't benchmarked at cwnd=18, but I can do that if there's enough
> interest. The result is likely to be somewhere between the -4% loss and +10%
> win I saw for server push with the default initcwnd and initcwnd=50,
> respectively. Of course exactly where on this spectrum it would lie will
> also depend crucially on the web property.
>
>>
>> * It'd be nice to see Speed Tracer output, so we can see how the
>> difference in receipt times of different resources affects the
>> rendering engine. Obviously this doesn't work in aggregate, but if we
>> could see a few different runs per setting, it might be enlightening.
>
> I've been asked to provide that before and my general take is that it isn't
> enlightening, and in fact is often very misleading, to look at even a
> handful of packet traces. There is just too much variability so it's easy to
> draw the wrong conclusion.
I think that it's probably wrong to draw an overall conclusion from
just a few packet traces. But what I think would be useful is some
illustrations of how the pipe is getting filled in the different
cases. I'm not sure it's totally misleading, as long as one is aware
that this is just one sample from the distribution and keeps in mind
how the overall distribution looks.
>>
>> * I think I've mentioned to you before that the results should mention
>> the caveat that these were gathered with chunking turned off. A test
>> on a chunked page might have a different result. Sending the first
>> chunk out earlier will result in client ACKs reaching the server
>> earlier, thus increasing cwnd sooner. Also, I'm not sure how the
>> packetization will vary between a chunked page and a non-chunked page,
>> but it could affect the number of client ACKs, which again will affect
>> cwnd. Also, chunking will clearly affect the various metrics in terms
>> of first paint / doc load / page load. I'm also curious how it would
>> affect the rendering engine.
>
> By chunking, do you mean that the image search server is able to start
> sending the body content before it knows exactly what the image thumbnail
> URLs will be, by sending the server push headers as a chunk? Yes, it's
> possible this could result in the first response data packet being sent from
> the server sooner than in the current implementation. Arguably this might
> increase cwnd for the portion of the main results page that is after the
> resource reference.
> I can't go into the specifics in this public forum, but we looked at the
> possibility of running that test and decided, for reasons of difficulty of
> implementation in the image search server, that it would require more
> engineering time than it merited. We can of course revisit that decision and
> put engineering resources onto it, but it might be quicker to simulate this in
> the SPDY in-memory server and see whether it makes a difference there.
I was referring to the fact that, for the google image search page, we
send back a significant amount of chunked data (to render the header
of the page), before we actually know the search results. I agree
that implementing this for reals is a non-trivial effort. I just
thought it was worth noting. And I agree that we could simulate it
quite well using the in-memory server.
Can you clarify this further? If you have other data on server push
hurting PLT, then can you publish this as well? If you are making a
case to remove server push or make it optional, it seems like it'd be
useful to include this other data.
> It is possible that the probability the client has the images cached for
> image search is low. However, note the following. (a) The largest image by
> far, and the first to be fetched, is the nav logo, which weighs in at 30KB,
> and is very likely to be cached by any browser that has been to the image
> search property before. (b) Users are likely to repeat their queries within
> a short timeframe (a few weeks), which increases the probability the
> thumbnails are in the cache. (c) Even if the probability that the thumbnail
> is cached is as low as 10%, that is still 10% more thumbnail bandwidth than
> needs to be sent - across the entire image search property, and all
> intermediate proxies, routers, peers, wireless links, etc. - which is
> detrimental to the commons. In practice, we would be forced to implement the
> bloom filter
> approach.
> ... and implementing the bloom filter approach is going to: (a) require us
> to essentially reengineer the entire disk cache and pay a possibly large
> performance penalty across the entire browser, or incur a big runtime
> penalty at request time, and (b) add an additional 0.5-1KB to the initial
> request headers that will further reduce the already small and rare
> performance win we have seen from server push.
> We would go to all that trouble for a small probability of a <1 RTT latency
> improvement in certain limited circumstances? I think there is lower hanging
> fruit.
Just for clarity's sake, there's nothing that says that the server has
to push data, right? For something like the nav logo, I don't see why
the server would choose to push this. And yeah, I don't think it's
likely that the image thumbnails would have been cached.
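For readers unfamiliar with the "bloom filter approach" mentioned above: the idea is for the client to summarize its cache contents in a compact probabilistic digest sent with the request, so the server only pushes resources the client probably lacks. A minimal sketch - the sizing, the hash scheme, and the class name are my own illustrative choices, not from any SPDY draft:

```python
import hashlib

class CacheDigest:
    """Bloom filter over cached URLs; ~1 KB at these sizes, in line with
    the 0.5-1 KB of request-header overhead discussed in the thread."""

    def __init__(self, size_bits=8192, hashes=4):
        self.size, self.hashes = size_bits, hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, url):
        # Derive k bit positions from one SHA-256 of the URL.
        d = hashlib.sha256(url.encode("utf-8")).digest()
        for k in range(self.hashes):
            yield int.from_bytes(d[4 * k:4 * k + 4], "big") % self.size

    def add(self, url):
        for p in self._positions(url):
            self.bits[p // 8] |= 1 << (p % 8)

    def probably_has(self, url):
        # False positives possible (server skips a push it should have
        # made); false negatives are not.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(url))

digest = CacheDigest()
digest.add("http://example.com/nav_logo.png")
# Server side: only push resources the digest does not claim to have.
assert digest.probably_has("http://example.com/nav_logo.png")
```

The trade-off debated above is visible here: the digest adds bytes to every request and requires enumerating the disk cache, in exchange for avoiding redundant pushes of already-cached resources.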
I know this is an old thread, but I just wanted to point out that this is not
a good site for benchmarking server push functionality, because there is no
nesting of content - the browser can request the images as soon as it receives
and parses the HTML returned by the image search. It seems like you need to
come up with a way to test this for a wider variety of sites, as I think those
that are highly nested will show a much bigger improvement.