Paco,
I think you are suggesting that Google keep a copy of the web
pages it crawls, and send users to the copy instead of the original
page. There are a few issues with this. The hardest one is dealing
with user-specific content. Suppose a site uses a cookie to identify
users, and presents different content based on the cookie. The copy
that Google gets doesn't have your cookie, so when you click the link
from search to go to a site, you see a different page than the one you
expect.
Caching proxies solve this problem by examining headers and
caching only content that the web server explicitly marks as safe to
cache. Many ISPs run caching proxies. The only advantage of a Google
proxy over one run by your ISP would be that Google's proxy might have
a larger user population, reducing the odds that you are the first
user to fetch some content. However, it would probably have a higher
round trip time than a computer at your ISP (because you are more
directly connected to your ISP than to anything else on the internet).
So it is not clear that real users would see much benefit.
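
To make the header check concrete, here is a minimal sketch of the
decision a shared caching proxy makes before storing a response. The
function names are hypothetical, and the policy is deliberately more
conservative than the HTTP caching spec requires: it is an assumption
of this sketch that anything with Set-Cookie or no-cache is skipped,
and that an explicit positive lifetime is required.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a dict of lowercase directives."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"')
    return directives


def shared_cache_may_store(response_headers):
    """Return True only if the response explicitly opts in to shared caching.

    Conservative assumptions of this sketch (stricter than the spec):
    responses marked no-store, no-cache, or private are never stored,
    responses that set a cookie are never stored, and a positive
    s-maxage or max-age is required.
    """
    headers = {k.lower(): v for k, v in response_headers.items()}
    cc = parse_cache_control(headers.get("cache-control", ""))
    if any(d in cc for d in ("no-store", "no-cache", "private")):
        return False
    if "set-cookie" in headers:
        return False
    # s-maxage applies to shared caches and takes priority over max-age.
    lifetime = cc.get("s-maxage", cc.get("max-age", ""))
    return lifetime.isdigit() and int(lifetime) > 0
```

A response with "Cache-Control: public, max-age=3600" passes this
check. One with "private", one that sets a cookie, or one with no
caching headers at all does not, so the proxy fetches it from the
origin every time.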
Sam
> It seems to me that in many cases, a significant portion of the
> performance issues is controlled by the distance between the searcher
> and server computers.
Paco,
Here is a site that shows the user's IP address, and serves ads:
http://whatismyipaddress.com/ . If Google crawls this site, it will
have a copy of the page. That copy will have the IP address of the
computer at Google that fetched it. If you click a link in search
results, and Google links to its copy, you will get the wrong IP
address.
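
A toy sketch of that failure, with no networking involved. The
handler name is hypothetical, and both addresses are made-up values
from the reserved documentation ranges:

```python
def render_page(client_ip):
    """Hypothetical handler for a page that echoes the visitor's address."""
    return f"Your IP address is {client_ip}"


CRAWLER_IP = "192.0.2.10"    # stand-in for the crawler's address
VISITOR_IP = "198.51.100.7"  # stand-in for the real user's address

# The crawler fetches the page once and keeps the result.
snapshot = render_page(CRAWLER_IP)

# What the user would see by going to the origin directly.
live = render_page(VISITOR_IP)

# What the user sees if search results link to the stored copy instead.
served_from_copy = snapshot
```

The stored copy still contains the crawler's address, so
served_from_copy and live differ even though the site itself did
nothing wrong.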
There are many other cases where caching content could break the
functionality of a site. Suppose your page reads statistics out of a
database which gets new data every five minutes. Or your site uses
cookies to show different content to different users. What if you
update the content of your site, but Google doesn't crawl it again for
a week? If you process your web server's logs to figure out how many
users you have (and how much money the ad network owes you), you will
not be happy to discover that your logs have no record of the users
who saw the copy Google cached.
CDNs (such as Akamai) have the site owner's consent to serve
content. The creator of the site built and tested it with a CDN in
mind, and told the CDN which resources it should serve. Serving cached
content without a site owner's consent or cooperation, and not
breaking any functionality on the site, is a much harder problem.
Google could act as an HTTP proxy, and only cache resources that
are served with headers that explicitly say caching is allowed. The
second paragraph of my previous message on this thread explains why
your ISP is in a better position than Google to implement this.
Sam