It's a no-brainer to cache static content permanently. Then if
CSS/JS or images change, version them so you're sure to break the cache.
But what about dynamic content?
Here's a discussion:
http://www.webscalingblog.com/performance/caching-http-headers-last-modified-and-etag.html
And a code snippet for a Zend Framework implementation:
http://www.zfsnippets.com/snippets/view/id/67
Curious what your thoughts are?
By hashing the body content, you're sure to know if you have the same
content...but what a shame to not save server resources and still need
to generate the page...you do save bandwidth though.
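To make the trade-off concrete, here is a minimal sketch of body-hash ETags in Python. It is not taken from either link above; the `make_etag` and `respond` names and the tuple return shape are assumptions for illustration. Note that the page is still fully rendered before the hash is taken, so only the transfer is saved.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Return a quoted MD5 hex digest of the rendered body."""
    return '"' + hashlib.md5(body).hexdigest() + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for an already rendered page.

    `if_none_match` is the raw If-None-Match request header, or None.
    """
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # The header may list several tags; a "W/" prefix marks a weak tag.
        candidates = [t.strip() for t in if_none_match.split(",")]
        candidates = [t[2:] if t.startswith("W/") else t for t in candidates]
        if "*" in candidates or etag in candidates:
            # Client copy is current: send headers only, no body.
            return 304, headers, b""
    return 200, headers, body
```

A client that sends back the ETag it received gets a 304 with an empty payload; any change to the body produces a different hash and a normal 200.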
OTOH, the concept and the reality of caching are totally different. How
many times have you had a bug because of too many caching
layers? ESI. Cached database calls. Cached images. etc... It can get
pretty tricky, and it's easy to overlook one.
Personally, I'd shy away from sending those headers on dynamic
content, but I did wonder if any of you have played with it in
practice.
I'd say:
- do not use ETag
- yet you can easily find Last-Modified without doing a lot of work.
Of course on a dynamic site it can depend on anything; it depends on
your features. In a blog it could be the datetime of the last
post (or the last comment on a post page). It can be the maximum
datetime of all dynamically generated parts in your page.
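A minimal Python sketch of that idea, assuming each dynamic part of the page exposes a timezone-aware UTC timestamp. The function names are hypothetical and not from any framework. The cheap part is that the timestamps can be looked up before rendering, so a matching request can skip page generation entirely.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Iterable, Optional


def page_last_modified(timestamps: Iterable[datetime]) -> datetime:
    """Newest timestamp among the page's parts, truncated to whole seconds.

    Assumes every timestamp is aware and uses timezone.utc.
    HTTP dates have one-second resolution, hence the truncation.
    """
    return max(timestamps).replace(microsecond=0)


def last_modified_header(last_modified: datetime) -> str:
    """Format the value for a Last-Modified response header."""
    return format_datetime(last_modified, usegmt=True)


def is_not_modified(last_modified: datetime, if_modified_since: Optional[str]) -> bool:
    """True if the client's If-Modified-Since date covers the page."""
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable header: fall back to a full response
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    return last_modified <= since
```

For a blog post page, `timestamps` would be the post's datetime plus the datetime of its latest comment; when `is_not_modified` returns True the server can answer 304 without building the page.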
--
Julien
> Personally, I'd shy away from sending those headers on dynamic
> content, but I did wonder if any of you have played with it in
> practice.
Django has nice built-in middleware that does ETags/Last-Modified for dynamic content. I've turned this on for some dynamic sites and it's a big win for pages where the content rarely changes. It's great for our CMS.
Code snippets backatcha:
http://docs.djangoproject.com/en/dev/topics/cache/#other-optimizations
http://code.djangoproject.com/browser/django/trunk/django/middleware/common.py#L111
http://code.djangoproject.com/browser/django/trunk/django/middleware/http.py
You got me wondering about exactly how much of a CPU hit I'm taking for this feature, so I wrote a little test program to compare the cost of ETag hashing vs. gzip (at least the way Django does them): https://gist.github.com/665570
Using the same methods as Django's Etag and Gzip features, calculating the MD5 for Etags turned out to be an *order of magnitude faster* than gzip. Given this, I'd have to recommend using it!
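For anyone who wants to repeat the comparison, a rough sketch of this kind of measurement in Python follows. It is not the gist above: it assumes MD5 over the response body for the ETag and the standard library's `gzip.compress` for compression, and the function names and sample page are made up. Numbers will vary with body size, content, and hardware.

```python
import gzip
import hashlib
import timeit
from typing import Tuple


def etag_cost(body: bytes) -> str:
    """The work an MD5-based ETag adds: one hash over the body."""
    return hashlib.md5(body).hexdigest()


def gzip_cost(body: bytes) -> bytes:
    """The work gzip adds: compressing the whole body."""
    return gzip.compress(body)


def compare(body: bytes, repeat: int = 200) -> Tuple[float, float]:
    """Return total seconds spent on (md5, gzip) over `repeat` runs."""
    md5_s = timeit.timeit(lambda: etag_cost(body), number=repeat)
    gzip_s = timeit.timeit(lambda: gzip_cost(body), number=repeat)
    return md5_s, gzip_s


if __name__ == "__main__":
    # Hypothetical page: repeated markup, roughly 60 KB.
    page = b"<div class='post'><p>Hello, cache!</p></div>\n" * 1400
    md5_s, gzip_s = compare(page)
    print(f"md5:  {md5_s:.4f}s")
    print(f"gzip: {gzip_s:.4f}s")
```

Running it against real page bodies from your own site gives a fairer picture than synthetic markup, since compressibility affects the gzip timing.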
I think you could get fancy with last modified headers as well if you stored content freshness dates in your database, but given how cheap Etags are, it might not be worth the complexity.
Do you guys agree with my methodology?
--
Ryan Witt
http://whichloadsfaster.com/
Thanks for the code snippet Ryan.
Sergey, Good point on CPU. Didn't think about that...
Jonathan, so true re: bad css or js file being cached...
--
I tend to be pretty cautious with stuff that brings about a bit of
improvement in performance with the potential cost of major bugs.
So many weird things happen. I once had to dynamically generate zipped
files containing screensavers and the like for Mac and Windows, cross
browser. Then have that cached by 3rd party ESI provider. And through
some arcane caching configuration, it broke in production for some
files and not others. Was so hard to really nail it down.
This causes people to lose track of the simple logic that goes into
caching when they try to go through all the rules across many layers
instead of looking at layers separately. In reality, they are quite
independent and operate with different objects which makes them
relatively easy to plan and manage actually.
Sergey
On Sunday, November 7, 2010, Jonathan Klein <jonathan...@gmail.com> wrote:
> Caching dynamic content is pretty challenging. We built an in-house solution that is similar to Varnish, which basically does HTML output caching. We have a ton of really complex rules around how long pages get cached for, what events trigger invalidation, how the purging happens, etc. Granted this is slightly different from setting cache headers, but it's still along the same topic of caching dynamic pages.
>
>
> One of the big issues is the potential to cache bugs. What if someone pushes out a bad CSS file, or a bad sprite? Normally you would just fix it, version the filename, and you would be all set. If you are caching your HTML you have to worry about how many pages have gotten cached with the bad filename. The only way to be sure that you've fixed it is to purge the entire cache.
Some people use RUM extensively, though the trend seems to be toward scripts and pixels for noscript.
--Ryan
In any case, even if you don't cache page requests, all static files
still need to be aggressively cached with URL versioning for
invalidation (not in the query string but as part of the file/folder
name, since some proxies and caches apply heuristics that refuse to
cache URLs with query strings).
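One way to get such versioned names is to put a short content hash into the filename at build or deploy time. A minimal Python sketch; the `versioned_url` helper and the example paths are hypothetical:

```python
import hashlib
from pathlib import PurePosixPath


def versioned_url(url_path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the file extension.

    The version lives in the file name, not in a query string, so the
    URL changes whenever the content does and never otherwise.
    """
    version = hashlib.md5(content).hexdigest()[:digest_len]
    path = PurePosixPath(url_path)
    return str(path.with_name(f"{path.stem}.{version}{path.suffix}"))
```

For example, `versioned_url("/static/css/site.css", css_bytes)` yields a path of the form `/static/css/site.<hash>.css`, which can then be served with far-future expiry headers. The web server either needs the file written under that name or a rewrite rule that strips the hash.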
--