Content distribution network (CDN) versus Page Speed Improvements


Paco Tomei

Jul 3, 2010, 10:04:26 AM
to page-speed-discuss
Google is already set up to operate as a content distribution network.

http://en.wikipedia.org/wiki/Content_delivery_network

Why doesn't Google exploit this to its fullest potential?

When someone queries Google, what prevents Google from serving the
page from a cached server closest to the location of the searcher,
rather than from the server where the original page is located?

It seems to me that in many cases, a significant portion of the
performance issues is controlled by the distance between the searcher
and server computers.
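
To put rough numbers on the distance point, here is a minimal sketch of the lower bound that distance alone puts on round-trip time. It assumes signals travel through fiber at roughly 200,000 km/s (about two thirds of the speed of light in a vacuum). The function name and the sample distances are made up for illustration, and real round trips are longer because of routing, queuing, and server time.

```python
# Approximate signal speed in optical fiber (assumption: ~2/3 of c).
FIBER_SPEED_KM_PER_S = 200_000

def min_round_trip_ms(distance_km):
    """Lower bound on round-trip time, in milliseconds, from distance alone."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_S * 1000

# Illustrative distances only; they are not measurements of any real path.
for km in (100, 1_000, 10_000):
    print(f"{km} km -> at least {min_round_trip_ms(km):.1f} ms per round trip")
```

A page that needs several sequential round trips (DNS lookup, TCP handshake, the request itself) pays this cost each time, which is why moving the content closer can help.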

Sam Kerner

Jul 6, 2010, 11:12:44 AM
to page-spee...@googlegroups.com
On Sat, Jul 3, 2010 at 10:04 AM, Paco Tomei <fat...@gmail.com> wrote:
> Google is already set up to operate as a content distribution network.
>
> http://en.wikipedia.org/wiki/Content_delivery_network
>
> Why doesn't Google exploit this to its fullest potential?
>
> When someone queries Google, what prevents Google from serving the
> page from a cached server closest to the location of the searcher,
> rather than from the server where the original page is located?

Paco,
I think you are suggesting that Google keep a copy of the web
pages it crawls, and send users to the copy instead of the original
page. There are a few issues with this. The hardest one is dealing
with user-specific content. Suppose a site uses a cookie to identify
users, and presents different content based on the cookie. The copy
that Google gets doesn't have your cookie, so when you click the link
from search to go to a site, you see a different page than the one you
expect.

Caching proxies solve this problem by examining headers and
caching only content that the web server explicitly marks as safe to
cache. Many ISPs run caching proxies. The only advantage of a Google
proxy over one run by your ISP would be that Google's proxy might have
a larger user population, reducing the odds that you are the first
user to fetch some content. However, it would probably have a higher
round-trip time than a computer at your ISP (because you are more
directly connected to your ISP than to anything else on the internet). So
it is not clear that real users would see much benefit.
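
As a rough illustration of "caching only content that the web server explicitly marks as safe to cache", here is a minimal sketch of the kind of check a shared proxy might make on response headers. It is deliberately more conservative and much simpler than the full HTTP caching rules. The function names are invented for this example, and refusing any response that sets a cookie is my own simplifying assumption, not something the HTTP specification requires.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"')
    return directives

def is_explicitly_cacheable(headers):
    """Return True only if the response opts in to shared caching."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {k.lower(): v for k, v in headers.items()}
    cc = parse_cache_control(lowered.get("cache-control", ""))
    # Any of these directives means a shared cache should not reuse the response.
    if "no-store" in cc or "private" in cc or "no-cache" in cc:
        return False
    # Assumption for this sketch: a response that sets a cookie is user specific.
    if "set-cookie" in lowered:
        return False
    # s-maxage applies to shared caches and takes precedence over max-age.
    for name in ("s-maxage", "max-age"):
        if name in cc:
            try:
                return int(cc[name]) > 0
            except ValueError:
                return False
    return "public" in cc

print(is_explicitly_cacheable({"Cache-Control": "public, max-age=3600"}))
print(is_explicitly_cacheable({"Cache-Control": "private, max-age=3600"}))
print(is_explicitly_cacheable({}))
```

With no caching headers at all, the sketch refuses to cache, which matches the "only when explicitly marked" policy described above.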

Sam


> It seems to me that in many cases, a significant portion of the
> performance issues is controlled by the distance between the searcher
> and server computers.
>


Paco Tomei

Jul 6, 2010, 6:46:40 PM
to page-speed-discuss
Sam,

I will modify and narrow the focus of my question.

What prevents Google from mirroring the sites that display Google
AdSense ads and serving them from the optimal server, as firms like
Akamai do?

Sam Kerner

Jul 8, 2010, 1:23:11 AM
to page-spee...@googlegroups.com
On Tue, Jul 6, 2010 at 6:46 PM, Paco Tomei <fat...@gmail.com> wrote:
> Sam,
>
> I will modify and narrow the focus of my question.
>
> What prevents Google from mirroring the sites that display Google
> AdSense ads and serving them from the optimal server, as firms like
> Akamai do?

Paco,

Here is a site that shows the user's IP address, and serves ads:
http://whatismyipaddress.com/ . If Google crawls this site, it will
have a copy of the page. That copy will have the IP address of the
computer at Google that fetched it. If you click a link in search
results, and Google links to its copy, you will get the wrong IP
address.
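
The failure is easy to reproduce in a toy model. This is only a sketch: `show_ip_page` is a hypothetical stand-in for the site, and both addresses come from the ranges reserved for documentation, not from any real crawler or visitor.

```python
def show_ip_page(client_ip):
    """Hypothetical origin page that echoes the requester's address."""
    return f"Your IP address is {client_ip}"

CRAWLER_IP = "192.0.2.10"    # stands in for the machine that crawled the page
VISITOR_IP = "203.0.113.7"   # stands in for the person clicking the search result

# The crawler fetches the page once and the result is stored.
cached_copy = show_ip_page(CRAWLER_IP)

# Going to the origin gives the visitor the right answer...
served_direct = show_ip_page(VISITOR_IP)
# ...but a link to the stored copy replays what the crawler saw.
served_via_copy = cached_copy

print(served_direct)
print(served_via_copy)
```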

There are many other cases where caching content could break the
functionality of a site. Suppose your page reads statistics out of a
database which gets new data every five minutes. Or your site uses
cookies to show different content to different users. What if you
update the content of your site, but Google doesn't crawl it again for
a week? If you process your web server's logs to figure out how many
users you have (and how much money the ad network owes you), you will
not be happy to discover that your logs have no record of the users
who saw the copy Google cached.
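
Two of these problems can be shown with a few lines of arithmetic and a toy access log. This is a sketch using the intervals from the paragraph above; the `Origin` class and the visitor count of 1000 are invented for illustration.

```python
# Staleness: data changes every five minutes, the crawler returns weekly.
UPDATE_INTERVAL_MIN = 5
RECRAWL_INTERVAL_MIN = 7 * 24 * 60
missed_updates = RECRAWL_INTERVAL_MIN // UPDATE_INTERVAL_MIN

class Origin:
    """Hypothetical origin server that logs every request it handles."""
    def __init__(self):
        self.access_log = []

    def handle(self, client):
        self.access_log.append(client)
        return "page body"

origin = Origin()
mirror_copy = origin.handle("crawler")   # the only request the origin sees

# Visitors served from the mirror never reach the origin's log.
VISITORS = 1000                          # illustrative number
served = [mirror_copy for _ in range(VISITORS)]

print("updates missed between crawls:", missed_updates)
print("visitors served:", len(served), "origin log entries:", len(origin.access_log))
```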

CDNs (such as Akamai) have your consent to serve content. The
creator of the site built and tested it with a CDN in mind, and told
the CDN which resources it should serve. Serving cached content
without a site owner's consent or cooperation, and not breaking any
functionality on the site, is a much harder problem.

Google could act as an HTTP proxy, and only cache resources that
are served with headers that explicitly say caching is allowed. The
second paragraph of my previous message on this thread explains why
your ISP is in a better position than Google to implement this.
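
To make that trade-off concrete, here is a back-of-the-envelope model. It is a sketch under a simple assumption: every request pays the round trip to the proxy, and a cache miss additionally pays the time to fetch from the origin. All of the numbers are invented for illustration and are not measurements of any real proxy.

```python
def expected_fetch_ms(rtt_to_proxy_ms, hit_rate, origin_fetch_ms):
    """Expected time per request through a caching proxy (simplified model)."""
    return rtt_to_proxy_ms + (1 - hit_rate) * origin_fetch_ms

ORIGIN_FETCH_MS = 100  # hypothetical cost of going back to the origin on a miss

# Hypothetical ISP proxy: very close to the user, but a smaller population.
isp = expected_fetch_ms(rtt_to_proxy_ms=10, hit_rate=0.3, origin_fetch_ms=ORIGIN_FETCH_MS)
# Hypothetical larger proxy: better hit rate, but farther away.
big = expected_fetch_ms(rtt_to_proxy_ms=40, hit_rate=0.6, origin_fetch_ms=ORIGIN_FETCH_MS)

print(f"ISP proxy:   {isp:.0f} ms expected")
print(f"Large proxy: {big:.0f} ms expected")
```

With these made-up inputs the higher hit rate is cancelled out by the longer round trip, so neither proxy wins. Different inputs would tip the balance either way.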

Sam
