Today I worked on ticket #580 to make the caching work better with
dependencies on headers. The problem: the current caching doesn't take
many HTTP headers into account and isn't easily configurable for
new headers. So, for example, if your page content is based on cookies,
the cache middleware won't see that and will send out content
from the cache regardless of the cookie value.
Sune Kirkeby did the main work by splitting the CacheMiddleware into
three distinct middlewares: one for gzip encoding, one for HTTP
conditional GET handling and one for caching. Those are already linked
from the user contributed middlewares.
What my patch does is integrate those middlewares into Django core and
rework the decorators to be based fully on the middleware code (so
that there is only one place where cache handling is done).
Additionally I added some helper functions to manage the Vary response
header, as that is what the caching now bases its cache keys on
(before, the key was based on the path and, only partly, on the
Accept-Encoding header, as it only looked for gzip support).
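To illustrate the idea of a cache key that depends on the Vary header, here is a minimal sketch. It is not the code from the patch; the function name learn_cache_key, the key format and the use of MD5 are assumptions made for the example.

```python
import hashlib


def learn_cache_key(path, request_headers, vary_header):
    """Build a cache key from the path plus the request headers named in Vary.

    Hypothetical helper for illustration only, not the patch's implementation.
    """
    # Normalise the header names listed in the response's Vary header.
    names = sorted(h.strip().lower() for h in vary_header.split(",") if h.strip())
    # Look up request headers case-insensitively.
    lowered = {k.lower(): v for k, v in request_headers.items()}
    digest = hashlib.md5()
    for name in names:
        # Mix both the header name and its value into the digest.
        digest.update(name.encode("utf-8"))
        digest.update(b"=")
        digest.update(lowered.get(name, "").encode("utf-8"))
        digest.update(b";")
    return "cache.%s.%s" % (path, digest.hexdigest())
```

With Vary: Cookie, two requests for the same path with different cookies get different keys; with Vary: Accept-Encoding only, the cookie value no longer matters.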
I did some tests with the middlewares and the decorators and it worked
fine. Maybe somebody else wants to have a look at this stuff.
Would be great if this - or something like this - could make it into
trunk, as the current caching will collide heavily with my i18n work.
A nice side-effect of the patch: Django will be one of the first Python
web frameworks that fully interoperates with accelerator proxies like
Squid without degrading them to simple forwarders (like Pragma:
no-cache or Cache-Control: none would do) :-)
> Do you mean that django will be ICP protocol compatible ??
No, but it will send a correct Vary header and set some Cache-Control
headers. Usually frameworks just ignore this and send out headers
like Cache-Control: private or even Pragma: no-cache - in effect
disabling caching completely. Django will send out headers that allow
caches to cache within the given limits, though. And it will do it in a
controlled way that makes it easier to do the right thing :-)
>1) If I want to use all 3 cache middlewares + session middleware, what is
>the correct order now? Is this stack correct:
They can have almost any order you like, but you have to make sure
that the cache middleware is after all middlewares that might change
content depending on headers - for example the SessionMiddleware needs
to come first. I would put them in the following order:
CommonMiddleware (handles the / and stuff and so does redirects)
That way the / handling (redirecting to the / URL if the / is missing)
is done first - it's redirecting and so doesn't need any of the other
middlewares. Sessions are handled early on, because some other
middleware or the view might depend on them - I think that one should go
in as early as possible (for example the i18n middleware will make use
of sessions if it is loaded after the SessionMiddleware).
Then the caching takes place. I moved the ConditionalGetMiddleware
after it, because it depends on headers set in the CacheMiddleware -
and using it before the CacheMiddleware will lead to problems, as those
headers might not be set.
The GZipMiddleware is last and after the cache, because it doesn't
really change the content based on headers, it just changes the
encoding - so the cache can merrily carry the uncompressed content and
that way the cache will _only_ contain uncompressed content. If you
have large pages that need to be compressed and recompressing them
takes too many resources, you might want to move it before the
CacheMiddleware, as then the cache will store both compressed (for
users that request it) and uncompressed pages.
>2) If I want to use GZipMiddleware, do I have to specify that my response
>depends on presence of 'gzip' in HTTP_ACCEPT_ENCODING? Is it done
>automatically? If not, how to specify it?
Yes, it is done automatically: if you use the GZipMiddleware, pages are
compressed if gzip is in the Accept-Encoding header of the request.
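A rough sketch of that behaviour, assuming plain dicts for the headers and a bytes body. The function name maybe_gzip is made up for the example, and it deliberately ignores q-values in Accept-Encoding, so it is a simplification and not the GZipMiddleware code itself.

```python
import gzip


def maybe_gzip(request_headers, response_headers, body):
    """Compress body if the client accepts gzip, and record that in Vary.

    Hypothetical helper; q-values such as "gzip;q=0" are not honoured here.
    """
    # The response depends on Accept-Encoding either way, so note it in Vary.
    vary = [v.strip() for v in response_headers.get("Vary", "").split(",") if v.strip()]
    if "accept-encoding" not in [v.lower() for v in vary]:
        vary.append("Accept-Encoding")
    response_headers["Vary"] = ", ".join(vary)

    # Split the Accept-Encoding list and drop any parameters after ";".
    accept = request_headers.get("Accept-Encoding", "")
    encodings = [e.split(";")[0].strip().lower() for e in accept.split(",")]
    if "gzip" not in encodings:
        return body

    response_headers["Content-Encoding"] = "gzip"
    return gzip.compress(body)
```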
>3) Do I have to define CACHE_MIDDLEWARE_GZIP? Or is it obsolete now?
It is obsolete, now. If you don't want GZipping stuff, just don't load
>4) Is it possible to use GZipMiddleware without cache?
Yes, of course. Every middleware can be used independently of the others.
Just use the middleware you need; if you just need compression, just
use the GZipMiddleware alone.
That even goes for the ConditionalGetMiddleware - but that one needs some
headers to be present, so it might not work as well as you would like without
the CacheMiddleware. The reason is that without the CacheMiddleware,
the view will always have to run fully to produce a response and only
then can the ConditionalGetMiddleware kick in (it needs the ETag or
Last-Modified headers). So if you use it alone without the cache, the
view will still be run, only the transfer is reduced (but that might be
useful, too - for example in tight transfer volume situations).
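The conditional GET check can be sketched roughly like this. It is a simplified illustration, not the ConditionalGetMiddleware code: the function name is made up, headers are plain dicts, and validators are compared as exact strings (no ETag lists, no date parsing).

```python
def conditional_get(request_headers, status, response_headers, body):
    """Return a 304 with an empty body if the client's copy is still current.

    Hypothetical helper; it only runs once the response (and its ETag or
    Last-Modified header) already exists, which is why the view still runs.
    """
    etag = response_headers.get("ETag")
    last_modified = response_headers.get("Last-Modified")

    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When If-None-Match is sent, it takes precedence over If-Modified-Since.
        not_modified = etag is not None and if_none_match == etag
    else:
        since = request_headers.get("If-Modified-Since")
        not_modified = (
            since is not None
            and last_modified is not None
            and since == last_modified
        )

    if not_modified:
        return 304, response_headers, b""
    return status, response_headers, body
```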
All middlewares that produce/change the response based on headers will
add those headers to the Vary response header. That way another cache in
front of the project (like a transparent proxy or the proxy of the
user) will handle caching correctly. So using the GZipMiddleware alone
will make other proxies store pages in the cache based on the URI and
the Accept-Encoding header. Using the SessionMiddleware will add the
Cookie to the list of headers to base storage on.
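Adding a header name to Vary without duplicating existing entries could look roughly like the following. The helper name add_vary and the dict-based headers are assumptions for the example; it is only a sketch of the idea, not the helper from the patch.

```python
def add_vary(response_headers, new_names):
    """Append header names to the Vary header, skipping ones already listed.

    Hypothetical helper; header names are compared case-insensitively.
    """
    existing = [
        v.strip()
        for v in response_headers.get("Vary", "").split(",")
        if v.strip()
    ]
    seen = {v.lower() for v in existing}
    for name in new_names:
        if name.lower() not in seen:
            existing.append(name)
            seen.add(name.lower())
    response_headers["Vary"] = ", ".join(existing)
    return response_headers
```

So a response that passes through both a gzip step and a session step ends up with both Accept-Encoding and Cookie in its Vary header, each listed once.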
BTW: Django authentication is cookie based, so if your pages are
different for each user (and different for anonymous and logged-in
users), you might want to use the @vary_on_cookie decorator on those
views that are different per user to make sure that those views are
cached based on the Cookie header. Of course this manual decorating
only needs to take place when you don't use the SessionMiddleware,
because that already adds Cookie to the list of Vary headers.
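To show what such a decorator does in principle, here is a stand-alone sketch. It is not Django's @vary_on_cookie: the name vary_on_cookie_sketch is made up, and the response is assumed to be a plain dict with a "headers" dict inside.

```python
import functools


def vary_on_cookie_sketch(view):
    """Wrap a view so that its response always lists Cookie in Vary.

    Illustration only; responses are assumed to be dicts with a "headers" key.
    """
    @functools.wraps(view)
    def wrapper(request):
        response = view(request)
        headers = response["headers"]
        names = [v.strip() for v in headers.get("Vary", "").split(",") if v.strip()]
        # Only add Cookie if it is not listed yet (case-insensitive check).
        if "cookie" not in [n.lower() for n in names]:
            names.append("Cookie")
        headers["Vary"] = ", ".join(names)
        return response
    return wrapper


@vary_on_cookie_sketch
def profile_view(request):
    # Hypothetical per-user view used only to demonstrate the decorator.
    return {"headers": {}, "body": "Hello, %s" % request.get("user", "anonymous")}
```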
No, I think you got it right. The reverse order of middleware in the
response phase does make sense, but it is a bit counter-intuitive at
times ;-)
So actually in the list of installed middleware, the cache middleware
should come early on to make sure that it comes last in the response
phase. You should put all middleware that has to react based on the
request before it - like the CommonMiddleware, that does possible
redirects based on the missing "/" bit (and does some shortcutting
In its request phase the cache middleware just checks whether the
requested page is cached and, if it is, shortcuts the request (which it
can only do if it has already learned a cache key for the page). That cache
key is defined in the response phase based on the Vary headers. So it
will only take into account those additions to the Vary header that are
already in there - middleware that's before the CacheMiddleware will
run after the CacheMiddleware in the response phase and so its Vary headers
won't be used in the cache key.
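A tiny simulation makes the effect of the reversed response phase visible. The classes and the runner below are made-up stand-ins, not Django's middleware machinery; the only thing they model is that the response phase walks the list in reverse.

```python
class SessionSketch:
    """Stand-in for a middleware that makes the response depend on cookies."""

    def process_response(self, response):
        response["Vary"].append("Cookie")
        return response


class CacheSketch:
    """Stand-in for the cache: records the Vary header it sees when storing."""

    def __init__(self):
        self.seen_vary = None

    def process_response(self, response):
        self.seen_vary = list(response["Vary"])
        return response


def run_response_phase(middleware):
    """Run process_response in reverse list order, as described above."""
    response = {"Vary": []}
    for mw in reversed(middleware):
        response = mw.process_response(response)
    return response


# Cache listed first: it runs last on the response and sees "Cookie".
cache_first = CacheSketch()
run_response_phase([cache_first, SessionSketch()])

# Cache listed last: it runs first on the response and sees nothing yet.
cache_last = CacheSketch()
run_response_phase([SessionSketch(), cache_last])
```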
This should definitely be clarified in the documentation. Actually I
think it might even make sense to have an additional setting where you
give an explicit order of middleware to use in the response phase -
that way you could even have different order in request and response
phase. Would be interesting what Adrian and Jacob think about this. It
could be implemented in a way that if the user doesn't give an explicit
response order, it will just be the reversed list of middleware - that
way it's like it is now. But if the user wants control, he can set up a
list of response phase handling. The system could even check that both
lists are identical in content and only differ in order, to make sure
the user doesn't forget a middleware in that list.
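The proposal could be sketched like this. Nothing of this exists in Django; the function name and the idea of passing the explicit order as a second list are purely hypothetical.

```python
def resolve_response_order(middleware, response_order=None):
    """Return the order in which process_response should run.

    Hypothetical: with no explicit order, fall back to the reversed
    middleware list (the current behaviour); otherwise check that both
    lists contain exactly the same entries and differ only in order.
    """
    if response_order is None:
        return list(reversed(middleware))
    if sorted(middleware) != sorted(response_order):
        raise ValueError(
            "response order must list exactly the installed middleware"
        )
    return list(response_order)
```

Called with only the installed list it returns that list reversed; called with an explicit second list it returns that list unchanged, or raises ValueError if a middleware was forgotten.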
It would make things much more obvious, I think.
As I wrote in ticket #730: the LocaleMiddleware needs to come in
process_request _after_ the SessionMiddleware, because it needs the
session handling (the language discovery is done in the
process_request). But it modifies the Vary header, so it must come
_after_ the CacheMiddleware to come _before_ it in the
process_response. So it must be:
to make sure that the CacheMiddleware is run last on process_response
(for caching), first on process_request (for shortcutting) and both
session and translation are initialized in the right order.
This description alone should be indication enough that we really need
more obvious ways to order middlewares for process_response than just
"reverse the list of middlewares". It shouldn't take two paragraphs and
one explicit list to tell people how to order middleware with respect
to process_request and process_response ;-)
I think the reversed order is just counter-intuitive - you just don't
think about it when adding middleware and talking about middleware. So
I think at least an optional way to explicitly specify the order would