reworked cache management

hugo

Oct 8, 2005, 4:35:21 PM
to Django developers
Hi,

Today I worked on ticket #580 to make the caching work better with
dependencies on headers. The problem: the current caching doesn't take
many HTTP headers into account and isn't easily configurable for
new headers. So, for example, if your page content is based on cookies,
the cache middleware won't see that and will send out content
from the cache regardless of the cookie value.

Sune Kirkeby did the main work by splitting the CacheMiddleware into
three distinct middlewares: one for gzip encoding, one for HTTP
conditional GET handling and one for caching. Those are already linked
from the user contributed middlewares.

What my patch does is integrate those middlewares into Django core and
rework the decorators to be based fully on the middleware code (so
that there is only one place where cache handling is done).
Additionally I added some helpers to manage the Vary response
header, as that is what the caching now bases its cache keys on
(before, the key was based on the path and, only partly, on the
Accept-Encoding header, as it only looked for gzip support).
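
To illustrate the idea of a Vary-based cache key, here is a minimal sketch. It is not the code from the patch: `make_cache_key`, the dict of request headers and the key layout are all assumptions made for the example.

```python
import hashlib


def make_cache_key(path, request_headers, vary_headers):
    """Build a cache key from the path plus every request header named in Vary.

    Hypothetical helper for illustration only; not the function from the patch.
    """
    # Header names are case-insensitive, so normalise before looking them up.
    normalised = {name.lower(): value for name, value in request_headers.items()}
    digest = hashlib.sha256()
    for name in sorted(h.lower() for h in vary_headers):
        value = normalised.get(name, "")
        digest.update(("%s=%s;" % (name, value)).encode("utf-8"))
    return "cache.%s.%s" % (path, digest.hexdigest())


# Same path, but the response varies on Cookie, so the keys differ.
anonymous = make_cache_key("/news/", {"Accept-Encoding": "gzip"}, ["Cookie"])
logged_in = make_cache_key("/news/", {"Cookie": "sessionid=abc"}, ["Cookie"])
```

With a key like this, two requests for the same path only share a cache entry when all headers listed in Vary have the same value.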

I did some tests with the middlewares and the decorators and it worked
fine. Maybe somebody else wants to have a look at this.

Would be great if this - or something like this - could make it into
trunk, as the current caching will collide heavily with my i18n work.

A nice side effect of the patch: Django will be one of the first Python
web frameworks that fully interoperates with accelerator proxies like
Squid without degrading them to simple forwarders (like Pragma:
no-cache or Cache-Control: none would do) :-)

bye, Georg

Adrian Holovaty

Oct 8, 2005, 5:04:29 PM
to django-d...@googlegroups.com
On 10/8/05, hugo <g...@hugo.westfalen.de> wrote:
> Today I worked on ticket #580 to make the caching work better with
> dependencies on headers. The problem: the current caching doesn't take
> many HTTP headers into account and isn't easily configurable for
> new headers. So, for example, if your page content is based on cookies,
> the cache middleware won't see that and will send out content
> from the cache regardless of the cookie value.

I'm on it!

Adrian

--
Adrian Holovaty
holovaty.com | djangoproject.com | chicagocrime.org

club-internet

Oct 8, 2005, 7:55:11 PM
to django-d...@googlegroups.com
On Saturday, 8 October 2005 at 22:35, hugo wrote:
> A nice side effect of the patch: Django will be one of the first Python
> web frameworks that fully interoperates with accelerator proxies like
> Squid without degrading them to simple forwarders (like Pragma:
> no-cache or Cache-Control: none would do) :-)

Hi,

Do you mean that Django will be compatible with ICP (the Internet Cache Protocol)?

Regards,

Laurent

hugo

Oct 8, 2005, 8:07:34 PM
to Django developers
Hi,

> Do you mean that Django will be compatible with ICP (the Internet Cache Protocol)?

No, but it will send a correct Vary header and set some Cache-Control
headers. Usually frameworks just ignore this and send out something
like Cache-Control: private or even Pragma: no-cache, in effect
disabling caching completely. Django, though, will send out headers that
allow caches to cache within the given limits. And it will do it in a
controlled way that makes it easier to do the right thing :-)

bye, Georg

Adrian Holovaty

Oct 8, 2005, 9:10:40 PM
to django-d...@googlegroups.com
On 10/8/05, hugo <g...@hugo.westfalen.de> wrote:
> No, but it will send a correct Vary header and set some Cache-Control
> headers. Usually frameworks just ignore this and send out something
> like Cache-Control: private or even Pragma: no-cache, in effect
> disabling caching completely. Django, though, will send out headers that
> allow caches to cache within the given limits. And it will do it in a
> controlled way that makes it easier to do the right thing :-)

All set! In [810], I checked in the patch from Hugo and Sune that
takes care of all the cache improvements.

Docs have also been updated:

http://www.djangoproject.com/documentation/cache/
http://www.djangoproject.com/documentation/middleware/

Eugene Lazutkin

Oct 8, 2005, 10:58:39 PM
to django-d...@googlegroups.com
It's very good that Sune's and Georg's additions finally made their way
into Django. They brought more structure to HTTP handling.

Now it is high time to start a Q&A session. Questions:

1) If I want to use all 3 cache middlewares + session middleware, what is
the correct order now? Is this stack correct:

MIDDLEWARE_CLASSES = (
    "django.middleware.sessions.SessionMiddleware",
    "django.middleware.http.ConditionalGetMiddleware",
    "django.middleware.gzip.GZipMiddleware",
    "django.middleware.cache.CacheMiddleware",
    "django.middleware.common.CommonMiddleware",
)

2) If I want to use GZipMiddleware, do I have to specify that my response
depends on the presence of 'gzip' in HTTP_ACCEPT_ENCODING? Is it done
automatically? If not, how do I specify it?

3) Do I have to define CACHE_MIDDLEWARE_GZIP? Or is it obsolete now?

4) Is it possible to use GZipMiddleware without cache?

Thanks,

Eugene


"Adrian Holovaty" <holo...@gmail.com> wrote
in message
news:6464bab0510081810l64b...@mail.gmail.com...

hugo

Oct 9, 2005, 6:01:26 AM
to Django developers
Hi,

>1) If I want to use all 3 cache middlewares + session middleware, what is
>the correct order now? Is this stack correct:

They can have almost any order you like, but you have to make sure
that the cache middleware is after all middleware that might change
content depending on headers - for example the SessionMiddleware needs
to come first. I would put them in the following order:

CommonMiddleware (handles the / and stuff and so does redirects)
SessionMiddleware
CacheMiddleware
ConditionalGetMiddleware
GZipMiddleware

That way the / handling (redirecting to the / URL if the / is missing)
is done first - it's redirecting and so doesn't need any of the other
middlewares. Sessions are handled early on, because some other
middleware or the view might depend on it - I think that one should go
in as early as possible (for example the i18n middleware will make use
of sessions if the middleware is loaded after the SessionMiddleware).

Then the caching takes place. I moved the ConditionalGetMiddleware
after it, because it depends on headers set in the CacheMiddleware -
and using it before the CacheMiddleware will lead to problems, as those
headers might not be set.

The GZipMiddleware is last and after the cache, because it doesn't
really change the content based on headers, it just changes the
encoding - so the cache can merrily carry the uncompressed content, and
that way the cache will _only_ contain uncompressed content. If you
have large pages for which recompressing on every request takes too
many resources, you might want to move it before the
CacheMiddleware, as then the cache will store both compressed (for
users that request it) and uncompressed pages.

>2) If I want to use GZipMiddleware, do I have to specify that my response
>depends on the presence of 'gzip' in HTTP_ACCEPT_ENCODING? Is it done
>automatically? If not, how do I specify it?

Yes, if you use the GZipMiddleware, pages are compressed if gzip is in
the Accept-Encoding header of the request.
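
As a rough sketch of that decision (not the actual GZipMiddleware code), the check could look like the following. `maybe_gzip` and the dict used for response headers are assumptions for the example, and quality values such as `gzip;q=0` are deliberately ignored to keep it short.

```python
import gzip


def maybe_gzip(body, accept_encoding, response_headers):
    """Compress `body` only when the client lists gzip in Accept-Encoding.

    Hypothetical helper; `response_headers` is a plain dict.
    Simplification: q-values (for example "gzip;q=0") are not honoured.
    """
    accepted = [token.split(";")[0].strip().lower()
                for token in accept_encoding.split(",")]
    if "gzip" not in accepted:
        return body  # client did not ask for gzip: send the body unchanged
    response_headers["Content-Encoding"] = "gzip"
    return gzip.compress(body)


headers = {}
compressed = maybe_gzip(b"hello world " * 50, "gzip, deflate", headers)
```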

>3) Do I have to define CACHE_MIDDLEWARE_GZIP? Or is it obsolete now?

It is obsolete now. If you don't want gzipping, just don't load
the middleware.

>4) Is it possible to use GZipMiddleware without cache?

Yes, of course. Every middleware can be used independently of the others.
Just use the middleware you need; if you only need compression,
use the GZipMiddleware alone.

Even the ConditionalGetMiddleware - but that one needs some headers to
be present, so it might not work as well as you would like without
the CacheMiddleware. The reason is that without the CacheMiddleware,
the view will always have to run fully to produce a response, and only
then can the ConditionalGetMiddleware kick in (it needs the ETag or
Last-Modified headers). So if you use it alone without the cache, the
view will still be run; only the transfer is reduced (but that might be
useful, too - for example in situations with tight transfer volume).
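
A minimal sketch of the conditional GET check described above, assuming a response represented as a plain dict; `conditional_get` is a hypothetical name, and the date comparison is a simple string match rather than real HTTP date parsing.

```python
def conditional_get(request_headers, response):
    """Replace a full response with a 304 when the client's copy is still valid.

    Hypothetical sketch: `response` is a dict with "status", "headers", "body".
    Simplification: If-Modified-Since is compared as an exact string.
    """
    etag = response["headers"].get("ETag")
    last_modified = response["headers"].get("Last-Modified")
    etag_matches = etag is not None and request_headers.get("If-None-Match") == etag
    date_matches = (last_modified is not None
                    and request_headers.get("If-Modified-Since") == last_modified)
    if etag_matches or date_matches:
        # The response was already produced at this point; only the body
        # transfer is saved.
        return {"status": 304, "headers": response["headers"], "body": b""}
    return response


full = {"status": 200, "headers": {"ETag": '"abc"'}, "body": b"page"}
not_modified = conditional_get({"If-None-Match": '"abc"'}, full)
```

Note that the function needs a finished response as input, which is why the check alone does not spare the view from running.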

All middlewares that produce/change the response based on headers will
add those headers to the Vary response header. That way another cache in
front of the project (like a transparent proxy or the proxy of the
user) will handle caching correctly. So using the GZipMiddleware alone
will make other proxies store pages in the cache based on the URI and
the Accept-Encoding header. Using the SessionMiddleware will add the
Cookie to the list of headers to base storage on.
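
How several middlewares can each add to the same Vary header may be sketched like this; `patch_vary` and the dict of response headers are assumptions for the example, not the helper from the patch.

```python
def patch_vary(response_headers, new_headers):
    """Add header names to the Vary response header without duplicating entries.

    Hypothetical helper; `response_headers` is a plain dict.
    """
    current = [h.strip() for h in response_headers.get("Vary", "").split(",")
               if h.strip()]
    seen = {h.lower() for h in current}
    for name in new_headers:
        if name.lower() not in seen:  # header names are case-insensitive
            current.append(name)
            seen.add(name.lower())
    response_headers["Vary"] = ", ".join(current)
    return response_headers


vary_demo = {}
patch_vary(vary_demo, ["Accept-Encoding"])  # as a gzip middleware would
patch_vary(vary_demo, ["Cookie"])           # as a session middleware would
patch_vary(vary_demo, ["cookie"])           # already present, so ignored
```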

BTW: Django authentication is cookie based, so if your pages are
different for each user (and different for anonymous and logged-in
users), you might want to use the @vary_on_cookie decorator on those
views that are different per user to make sure that those views are
cached based on the Cookie header. Of course this manual decorating
only needs to take place when you don't use the SessionMiddleware,
because that already adds Cookie to the list of Vary headers.

bye, Georg

Eugene Lazutkin

Oct 9, 2005, 11:25:26 AM
to django-d...@googlegroups.com
Georg,

Thank you for the comprehensive answers. I think the part about the
interaction between different types of middleware should go directly
into Django's documentation.

Thanks,

Eugene

"hugo" <g...@hugo.westfalen.de> wrote in
message news:1128852086....@z14g2000cwz.googlegroups.com...

Luke Plant

Nov 3, 2005, 8:07:45 PM
to django-d...@googlegroups.com
Hi Hugo, everyone,

Sorry for the reply to such an old e-mail - I dug it out after coming
across problems while developing my own middleware.

> They can have almost any order you like, but you have to make sure
> that the cache middleware is after all middleware that might change
> content depending on headers - for example the SessionMiddleware needs
> to come first. I would put them in the following order:
>
> CommonMiddleware (handles the / and stuff and so does redirects)
> SessionMiddleware
> CacheMiddleware
> ConditionalGetMiddleware
> GZipMiddleware
>
> That way the / handling (redirecting to the / URL if the / is missing)
> is done first - it's redirecting and so doesn't need any of the other
> middlewares. Sessions are handled early on, because some other
> middleware or the view might depend on it - I think that one should go
> in as early as possible (for example the i18n middleware will make use
> of sessions if the middleware is loaded after the SessionMiddleware).
>
> Then the caching takes place. I moved the ConditionalGetMiddleware
> after it, because it depends on headers set in the CacheMiddleware -
> and using it before the CacheMiddleware will lead to problems, as
> those headers might not be set.

The first thing is that response middleware is done in the inverse
order to request and view middleware (though this is not documented).

The SessionMiddleware patches the 'Vary' header as part of
process_response(), so with the order given above, the CacheMiddleware
process_response() never sees the fact that it should vary on cookie,
and gets the caching wrong (I think I've seen this behaviour
experimentally, but checking these things can be a bit tricky!).

The documentation here:
http://www.djangoproject.com/documentation/cache/#the-per-site-cache

says this:

> Put the CacheMiddleware after any middlewares that might add something
> to the Vary header. The following middlewares do so:
> SessionMiddleware adds Cookie
> GZipMiddleware adds Accept-Encoding

which means the order ought to be something like:

"django.middleware.common.CommonMiddleware",
"django.middleware.cache.CacheMiddleware",
"django.middleware.gzip.GZipMiddleware",
"django.middleware.sessions.SessionMiddleware",

Am I missing something? This makes quite a big difference, since you
could end up serving private data to the wrong person if you get it
wrong.

Cheers,

Luke

--
"In my opinion, we don't devote nearly enough scientific research to
finding a cure for jerks." (Calvin and Hobbes)

Luke Plant || L.Plant.98 (at) cantab.net || http://lukeplant.me.uk/

hugo

Nov 4, 2005, 3:40:45 PM
to Django developers
>Am I missing something? This makes quite a big difference, since you
>could end up serving private data to the wrong person if you get it
>wrong.

No, I think you got it right. The reverse order of middleware in the
response phase does make sense, but it is a bit counter-intuitive at
times ;-)

So actually in the list of installed middleware, the cache middleware
should come early on to make sure that it comes last in the response
phase. You should put all middleware that has to react based on the
request before it - like the CommonMiddleware, that does possible
redirects based on the missing "/" bit (and does some shortcutting
itself).

In its request phase the cache middleware just checks whether the
requested page is cached, and shortcuts if it is. It uses a learned
cache key for that - that cache key is defined in the response phase
based on the Vary headers. So it will only take into account those
additions to the Vary header that are already in there - middleware
that's listed before the CacheMiddleware will run after the
CacheMiddleware in the response phase, and so its Vary headers
won't be used in the cache key.
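
The effect of the reversed response order can be reproduced with a small toy model. The classes below are assumptions written for this illustration and do not use Django's real middleware API; they only record which Vary entries the cache sees when its response hook runs.

```python
class ToyMiddleware:
    """Base class for the toy middleware below (hypothetical, not Django's API)."""
    name = "base"

    def process_request(self, log):
        log.append("request:" + self.name)

    def process_response(self, log, vary):
        log.append("response:" + self.name)


class ToySession(ToyMiddleware):
    name = "session"

    def process_response(self, log, vary):
        super().process_response(log, vary)
        vary.append("Cookie")  # the session middleware patches Vary here


class ToyCache(ToyMiddleware):
    name = "cache"

    def __init__(self):
        self.vary_seen = None

    def process_response(self, log, vary):
        super().process_response(log, vary)
        self.vary_seen = list(vary)  # the cache key is learned from this snapshot


def run(stack):
    log, vary = [], []
    for mw in stack:            # request phase: top of the list first
        mw.process_request(log)
    for mw in reversed(stack):  # response phase: bottom of the list first
        mw.process_response(log, vary)
    return log


wrong = ToyCache()
wrong_log = run([ToySession(), wrong])  # cache listed after session
right = ToyCache()
right_log = run([right, ToySession()])  # cache listed before session
```

In the first stack the cache's response hook runs before the session middleware has added Cookie, so `wrong.vary_seen` is empty; in the second it contains Cookie.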

This should definitely be clarified in the documentation. Actually I
think it might even make sense to have an additional setting where you
give an explicit order of middleware to use in the response phase -
that way you could even have different order in request and response
phase. Would be interesting what Adrian and Jacob think about this. It
could be implemented in a way that if the user doesn't give an explicit
response order, it will just be the reversed list of middleware - that
way it's like it is now. But if the user wants control, he can set up a
list of response phase handling. The system could even check that both
lists are identical in content and only differ in order, to make sure
the user doesn't forget a middleware in that list.
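
A sketch of how such a setting could be resolved, purely as an illustration of the proposal: the function name and both parameters are hypothetical, and no such setting exists.

```python
def resolve_response_order(middleware, response_order=None):
    """Return the order in which response middleware should run.

    Hypothetical: `response_order` stands for the proposed optional setting.
    """
    if response_order is None:
        # No explicit order given: behave as now, i.e. reversed request order.
        return list(reversed(middleware))
    if sorted(response_order) != sorted(middleware):
        # Both lists must name the same middleware and differ only in order.
        raise ValueError("response order must list exactly the same middleware")
    return list(response_order)


default_order = resolve_response_order(["common", "session", "cache"])
explicit_order = resolve_response_order(
    ["common", "session", "cache"],
    ["session", "common", "cache"],
)
```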

It would make things much more obvious, I think.

bye, Georg

Luke Plant

Nov 4, 2005, 4:39:32 PM
to django-d...@googlegroups.com

> This should definitely be clarified in the documentation. Actually I
> think it might even make sense to have an additional setting where you
> give an explicit order of middleware to use in the response phase -
> that way you could even have different order in request and response
> phase. Would be interesting what Adrian and Jacob think about this. It
> could be implemented in a way that if the user doesn't give an
> explicit response order, it will just be the reversed list of
> middleware - that way it's like it is now. But if the user wants
> control, he can set up a list of response phase handling. The system
> could even check that both lists are identical in content and only
> differ in order, to make sure the user doesn't forget a middleware in
> that list.
>
> It would make things much more obvious, I think.

Yes, I like this idea. As I was working with this I found myself thinking
a number of times that with a couple more middlewares thrown into the
mix you could easily have an impossible set of constraints to
satisfy.

Sune Kirkeby

Nov 5, 2005, 3:37:33 AM
to django-d...@googlegroups.com
On 11/4/05, Luke Plant <luke....@gmail.com> wrote:
> Yes, I like this idea. As I was working with this I found myself thinking
> a number of times that with a couple more middlewares thrown into the
> mix you could easily have an impossible set of constraints to
> satisfy.

Actually, if you have a middleware which needs to run in different
places in the request and response chains, you could just split
it in two. One which has process_request and one which has
process_response.
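
A toy sketch of such a split, with made-up class names and plain dicts standing in for the request and response; it only shows that the two halves can sit at unrelated positions in one list.

```python
class TagRequestMiddleware:
    """Request half (hypothetical): records something on the request."""

    def process_request(self, request):
        request["tag"] = "seen"


class TagResponseMiddleware:
    """Response half (hypothetical): uses what the request half recorded."""

    def process_response(self, request, response):
        response["X-Tag"] = request.get("tag", "missing")
        return response


def handle(stack, request):
    for mw in stack:  # request phase, in list order
        if hasattr(mw, "process_request"):
            mw.process_request(request)
    response = {}
    for mw in reversed(stack):  # response phase, in reverse list order
        if hasattr(mw, "process_response"):
            response = mw.process_response(request, response)
    return response


# The response half is listed first, so it runs last in the response phase.
split_result = handle([TagResponseMiddleware(), TagRequestMiddleware()], {})
```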

Also, can you come up with an actual middleware, which needs
to run in different places in the req. and resp. chains? Otherwise
this whole discussion is a bit academic, and the code to handle
it in django would be cruft.

/s

hugo

Nov 5, 2005, 4:46:09 AM
to Django developers
>Also, can you come up with an actual middleware, which needs
>to run in different places in the req. and resp. chains? Otherwise
>this whole discussion is a bit academic, and the code to handle
>it in django would be cruft.

As I wrote in ticket #730: the LocaleMiddleware needs to come in
process_request _after_ the SessionMiddleware, because it needs the
session handling (the language discovery is done in the
process_request). But it modifies the Vary header, so it must come
_after_ the CacheMiddleware to come _before_ it in the
process_response. So it must be:

CacheMiddleware
SessionMiddleware
LocaleMiddleware

to make sure that the CacheMiddleware is run last on process_response
(for caching), first on process_request (for shortcutting) and both
session and translation are initialized in the right order.

This description alone should be indication enough that we really need
more obvious ways to order middlewares for process_response than just
"reverse the list of middlewares". It shouldn't take two paragraphs and
one explicit list to tell people how to order middleware with respect
to process_request and process_response ;-)

I think the reversed order is just counter-intuitive - you just don't
think about it when adding middleware and talking about middleware. So
I think at least an optional way to explicitly specify the order would
be nice.

bye, Georg

Sune Kirkeby

Nov 5, 2005, 8:31:30 AM
to django-d...@googlegroups.com
On 11/5/05, hugo <g...@hugo.westfalen.de> wrote:
> I think the reversed order is just counter-intuitive - you just don't
> think about it when adding middleware and talking about middleware.

It's perfectly obvious to me. I think of it like a normal call-stack;
the request walks down the middleware-stack, the response
walks back up.

> I think at least an optional way to explicitly specify the order would
> be nice.

It might be; but you still haven't given an example where you
actually need it. All you have shown is a need for documenting
the way things work.

/s

Luke Plant

Nov 5, 2005, 2:18:09 PM
to django-d...@googlegroups.com
I agree that functionality shouldn't be added until we have a use case,
and the concept of walking up and down the stack helps. It's not quite
that simple though - if one of the middleware returns a response
during process_request, the other process_request middleware are
skipped. But all the process_response middleware will still be called.
(I'm getting this from reading handlers/base.py and
handlers/modpython.py)

Doesn't that add complications for CacheMiddleware? That
will cache a version that's been processed by all the response
middleware (since it comes high in the list). When you get a cache hit,
it returns that response right away in process_request and so all the
other middleware and the view itself are skipped. But that response
will then go through all the other process_response middleware. If one
of them changes the response content e.g. adds 'foo' at the end of each
paragraph, then the second time you get it, it will have foo added
twice.

Which is exactly what happens. (Test this without the GZipMiddleware,
since that will stop you doing a search and replace on the contents of
the response.) So I guess there is a use case (and not an entirely
silly one - my CsrfMiddleware inserts some HTML into the response,
which is now inserted n times).
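
The double processing can be reproduced with a toy model of the handler behaviour described above. The classes and `handle` function are assumptions written for this illustration, not the code in handlers/base.py.

```python
class FooMiddleware:
    """Appends a marker to every response (stand-in for an HTML-inserting middleware)."""

    def process_request(self, request):
        return None

    def process_response(self, request, response):
        return response + "foo"


class ToyCacheMiddleware:
    def __init__(self):
        self.store = {}

    def process_request(self, request):
        return self.store.get(request)  # a hit short-circuits the request phase

    def process_response(self, request, response):
        self.store[request] = response  # stores the already-processed response
        return response


def handle(stack, request, view):
    response = None
    for mw in stack:
        response = mw.process_request(request)
        if response is not None:
            break  # remaining request middleware and the view are skipped
    if response is None:
        response = view(request)
    for mw in reversed(stack):  # but every response middleware still runs
        response = mw.process_response(request, response)
    return response


def page_view(request):
    return "<p>hi</p>"


stack = [ToyCacheMiddleware(), FooMiddleware()]
first = handle(stack, "/page/", page_view)   # cache miss
second = handle(stack, "/page/", page_view)  # cache hit, processed again
```

On the miss the marker is added once; on the hit the cached, already-marked response goes through FooMiddleware again, so the marker appears twice.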

Solutions:
1) allow different ordering for request and response middleware

2) call this a bug in the CacheMiddleware, and make process_response
return the one it got out of the cache, not "cache plus middleware"

3) implement the stack of middleware as you described it, which would
solve this problem, but I don't know if it might introduce some others.

Regards,

Luke

--
"I regret I wasn't born with opposable toes." (Calvin and Hobbes)

Sune Kirkeby

Nov 6, 2005, 3:18:30 AM
to django-d...@googlegroups.com
On 11/5/05, Luke Plant <luke....@gmail.com> wrote:
> I agree that functionality shouldn't be added until we have a use case,
> and the concept of walking up and down the stack helps. It's not quite
> that simple though -

Hmm... I thought the handlers did the right thing, damn :-(

> Solutions:
> 1) allow different ordering for request and response middleware

If this is the solution, we should go all the way, and expose the
four different middleware lists directly.

> 2) call this a bug in the CacheMiddleware, and make process_response
> return the one it got out of the cache, not "cache plus middleware"

I don't see how the cache-middleware can be at fault here, unless
you want it to return the cached response in process_response.
That doesn't stop the other response middlewares from running,
though, and I think they shouldn't run at all.

> 3) implement the stack of middleware as you described it, which would
> solve this problem, but I don't know if it might introduce some others.

+1 from me.

/s

Sune Kirkeby

Nov 19, 2005, 3:09:03 AM
to django-d...@googlegroups.com
On 11/5/05, Luke Plant <luke....@gmail.com> wrote:
> 3) implement the stack of middleware as you described it, which would
> solve this problem, but I don't know if it might introduce some others.

4) Change middleware to work like WSGI middleware-applications. A
middleware would be initialized with a callable, which represents the next
downstream middleware, or initially the view callable. And instead
of process_* the middleware is called with a request, and would then
itself call the downstream callable it was initialized with.

The biggest disadvantage I can see is performance,
since the middleware stack has to change for each view; of course
some caching could probably help here.

But, it's conceptually simple and hence easy to understand, which
seems to be worth a lot.
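
A minimal sketch of what option 4 could look like, with hypothetical class names and strings standing in for requests and responses. Each middleware wraps the next callable, so a cache hit returns without anything downstream running.

```python
class AppendMiddleware:
    """Wraps the next callable and post-processes its response (hypothetical)."""

    def __init__(self, get_response, suffix):
        self.get_response = get_response
        self.suffix = suffix

    def __call__(self, request):
        response = self.get_response(request)  # call the downstream callable
        return response + self.suffix


class ShortCircuitCache:
    """Caches whatever the downstream callable returns (hypothetical)."""

    def __init__(self, get_response):
        self.get_response = get_response
        self.store = {}

    def __call__(self, request):
        if request in self.store:
            return self.store[request]  # downstream never runs on a hit
        response = self.get_response(request)
        self.store[request] = response
        return response


def view(request):
    return "<p>hi</p>"


# Outermost first: the cache wraps the appending middleware, which wraps the view.
handler = ShortCircuitCache(AppendMiddleware(view, "foo"))
first_call = handler("/page/")
second_call = handler("/page/")
```

Here both calls return the same content, because the appending middleware sits below the cache and is skipped on a hit.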

/s