Today I worked on ticket #580 to make the caching work better with dependencies on headers. The problem: the current caching ignores many HTTP headers and isn't easily configurable for new headers. So for example if your page content is based on cookies, the cache middleware won't see that and will send out content from the cache regardless of the cookie value.
Sune Kirkeby did the main work by splitting the CacheMiddleware into three distinct middlewares: one for gzip encoding, one for HTTP conditional GET handling and one for caching. Those are already linked from the user contributed middlewares.
What my patch does is integrate those middlewares into Django core and rework the decorators to be based fully on the middleware code (so that there is only one place where cache handling is done). Additionally I added some helper stuff to manage the Vary response header, as that is the beast the caching now bases its cache keys on (before, it was based on the path and, only partly, on the Accept-Encoding header, as it only looked for gzip support).
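To illustrate the idea of basing cache keys on the Vary header, here is a minimal sketch. It is not the code from the patch: the function name `cache_key`, the MD5 hashing and the lower-cased header dictionary are all assumptions made for this example.

```python
import hashlib

def cache_key(path, request_headers, vary):
    """Build a key from the path plus the request headers named in Vary.

    Hypothetical helper: request_headers is assumed to be a dict with
    lower-cased header names, vary is the raw Vary header value.
    """
    parts = [path]
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    for name in names:
        parts.append("%s=%s" % (name, request_headers.get(name, "")))
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()

anonymous = {"cookie": ""}
logged_in = {"cookie": "sessionid=abc123"}

# Without Cookie in Vary, both visitors map to the same cache entry.
same = cache_key("/news/", anonymous, "") == cache_key("/news/", logged_in, "")
# With Cookie in Vary, each cookie value gets its own entry.
different = cache_key("/news/", anonymous, "Cookie") != cache_key("/news/", logged_in, "Cookie")
```

The first comparison is the situation described in the problem statement: a key that ignores the cookie serves the same cached page to everyone.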
I did some tests with the middlewares and the decorators and it worked fine. Maybe somebody else wants to have a look at this stuff.
Would be great if this - or something like this - could make it into trunk, as the current caching will collide heavily with my i18n work.
A nice side-effect of the patch: Django will be one of the first Python web frameworks that fully interoperates with accelerator proxies like squid without degrading them to simple forwarders (like Pragma: no-cache or Cache-Control: none would do) :-)
> I today worked on ticket #580 to make the caching work better with
> dependencies on headers. The problem: the current caching won't take
> into account many HTTP headers and won't be easily configurable for
> new headers. So for example if your page content is based on cookies,
> the cache middleware won't see that and will send out content
> regardless of cookie value from the cache.
I'm on it!
Adrian
--
Adrian Holovaty
holovaty.com | djangoproject.com | chicagocrime.org
> A nice side-effect of the patch: django will be one of the first python
> web frameworks that fully interoperates with accelerator proxies like
> squid without degrading them to simple forwarders (like Pragma:
> no-cache or Cache-Control: none would do) :-)
Hi,
Do you mean that django will be ICP protocol compatible ??
> Do you mean that django will be ICP protocol compatible ??
No, but it will give correct Vary header and set some Cache-control headers. Usually frameworks just ignore this stuff and send out stuff like Cache-control: private or even Pragma: no-cache - in fact disabling caching completely. Django will send out headers that allow caches to cache within the given limits, though. And it will do it in a controlled way that makes it easier to do the right thing :-)
> No, but it will give correct Vary header and set some Cache-control
> headers. Usually frameworks just ignore this stuff and send out stuff
> like Cache-control: private or even Pragma: no-cache - in fact
> disabling caching completely. Django will send out headers that allow
> caches to cache within the given limits, though. And it will do it in a
> controlled way that makes it easier to do the right thing :-)
All set! In [810], I checked in the patch from Hugo and Sune that
takes care of all the cache improvements.
2) If I want to use GZipMiddleware, do I have to specify that my response depends on presence of 'gzip' in HTTP_ACCEPT_ENCODING? Is it done automatically? If not, how to specify it?
3) Do I have to define CACHE_MIDDLEWARE_GZIP? Or is it obsolete now?
4) Is it possible to use GZipMiddleware without cache?
>1) If I want to use all 3 cache middlewares + session middleware, what is
>the correct order now? Is this stack correct:
They can have almost any order you like, but you have to make sure that the cache middleware is after all middlewares that might change content depending on headers - for example the SessionMiddleware needs to come first. I would put them in the following order:
CommonMiddleware (handles the / and stuff and so does redirects)
SessionMiddleware
CacheMiddleware
ConditionalGetMiddleware
GZipMiddleware
That way the / handling (redirecting to the / URL if the / is missing) is done first - it's redirecting and so doesn't need any of the other middlewares. Sessions are handled early on, because some other middleware or the view might depend on them - I think that one should go in as early as possible (for example the i18n middleware will make use of sessions if the middleware is loaded after the SessionMiddleware).
Then the caching takes place. I moved the ConditionalGetMiddleware after it, because it depends on headers set in the CacheMiddleware - and using it before the CacheMiddleware will lead to problems, as those headers might not be set.
The GZipMiddleware is last and after the cache, because it doesn't really change the content based on headers, it just changes the encoding - so the cache can merrily carry the uncompressed content and that way the cache will contain _only_ uncompressed content. If you have large pages that need to be compressed and so the recompressing takes too many resources, you might want to move it before the CacheMiddleware, as then the cache will store both compressed (for users that request it) and uncompressed pages.
>2) If I want to use GZipMiddleware, do I have to specify that my response
>depends on presence of 'gzip' in HTTP_ACCEPT_ENCODING? Is it done
>automatically? If not, how to specify it?
Yes, if you use the GZipMiddleware, pages are compressed if gzip is in the Accept-Encoding header of the request.
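As a rough sketch of that behaviour (not the GZipMiddleware source): the function below compresses a body only when the request advertises gzip, and records the dependency in Vary either way. The name `gzip_response` and the plain-dict headers are assumptions for this example, and the substring check is a simplification of real Accept-Encoding parsing.

```python
import gzip

def gzip_response(request_headers, response_headers, body):
    """Compress body if the client accepts gzip; always note the dependency in Vary.

    Hypothetical helper working on plain dicts and a bytes body.
    """
    vary = response_headers.get("Vary", "")
    if "accept-encoding" not in vary.lower():
        response_headers["Vary"] = (vary + ", Accept-Encoding") if vary else "Accept-Encoding"
    # Simplified check: a real implementation would parse the header properly.
    if "gzip" not in request_headers.get("Accept-Encoding", "").lower():
        return body
    response_headers["Content-Encoding"] = "gzip"
    return gzip.compress(body)
```

Because the Vary header is set even for uncompressed responses, a proxy in front of the site knows not to hand a gzipped copy to a client that did not ask for one.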
>3) Do I have to define CACHE_MIDDLEWARE_GZIP? Or is it obsolete now?
It is obsolete now. If you don't want GZipping stuff, just don't load the middleware.
>4) Is it possible to use GZipMiddleware without cache?
Yes, of course. Every middleware can be used independently of the others. Just use the middleware you need: if you just need compression, use the GZipMiddleware alone.
Even the ConditionalGetMiddleware - but that one needs some headers to be present, so it might not work as well as you would like without the CacheMiddleware. The reason is that without the CacheMiddleware, the view will always have to run fully to produce a response, and only then can the ConditionalGetMiddleware kick in (it needs the ETag or Last-Modified headers). So if you use it alone without the cache, the view will still be run; only the transfer is reduced (but that might be useful, too - for example in tight transfer volume situations).
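A minimal sketch of the conditional GET step, assuming the response has already been produced (by the view or the cache). The function name `conditional_get` and the tuple return value are made up for this example, and the exact string comparison is a simplification of the real HTTP validator rules.

```python
def conditional_get(request_headers, status, response_headers, body):
    """Turn a full response into a 304 if the client's validators match.

    Hypothetical helper: headers are plain dicts, body is bytes.
    """
    etag = response_headers.get("ETag")
    last_modified = response_headers.get("Last-Modified")
    if etag and request_headers.get("If-None-Match") == etag:
        return 304, response_headers, b""
    if last_modified and request_headers.get("If-Modified-Since") == last_modified:
        return 304, response_headers, b""
    # No validator on the response, or no match: send the full body.
    return status, response_headers, body
```

Note that the function can only act once a response with ETag or Last-Modified exists, which is why it saves transfer volume but not the work of producing the response.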
All middlewares that produce/change the response based on headers will add those headers to the Vary response header. That way another cache in front of the project (like a transparent proxy or the proxy of the user) will handle caching correctly. So using the GZipMiddleware alone will make other proxies store pages in the cache based on the URI and the Accept-Encoding header. Using the SessionMiddleware will add the Cookie to the list of headers to base storage on.
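The kind of helper this relies on can be sketched as follows. This is an illustration only, not the helper from the patch; the name `patch_vary` and the plain-dict headers are assumptions.

```python
def patch_vary(response_headers, new_names):
    """Add header names to Vary without duplicating existing ones.

    Hypothetical helper: comparison is case-insensitive, existing
    entries keep their order and spelling.
    """
    existing = [h.strip() for h in response_headers.get("Vary", "").split(",") if h.strip()]
    seen = {h.lower() for h in existing}
    for name in new_names:
        if name.lower() not in seen:
            existing.append(name)
            seen.add(name.lower())
    response_headers["Vary"] = ", ".join(existing)
    return response_headers

headers = {}
patch_vary(headers, ["Accept-Encoding"])            # what a gzip step would add
patch_vary(headers, ["Cookie", "accept-encoding"])  # what a session step would add
```

After both calls the Vary header lists Accept-Encoding and Cookie once each, so each middleware can add its own dependency without knowing about the others.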
BTW: Django authentication is cookie based, so if your pages are different for each user (and different for anonymous and logged-in users), you might want to use the @vary_on_cookie decorator on those views that are different per user to make sure that those views are cached based on the Cookie header. Of course this manual decorating only needs to take place when you don't use the SessionMiddleware, because that already adds Cookie to the list of Vary headers.
Thank you for the comprehensive answers. I think the part about interaction between different types of middleware should go directly into Django's documentation.
> They can have almost any order you like, but you have to make sure
> that the cache middleware is after all middlewares that might change
> content depending on headers - for example the SessionMiddleware needs
> to come first. I would put them in the following order:
> CommonMiddleware (handles the / and stuff and so does redirects)
> SessionMiddleware
> CacheMiddleware
> ConditionalGetMiddleware
> GZipMiddleware
> That way the / handling (redirecting to the / URL if the / is missing)
> is done first - it's redirecting and so doesn't need any of the other
> middlewares. Sessions are handled early on, because some other
> middleware or the view might depend on it - I think that one should go
> in as early as possible (for example the i18n middleware will make use
> of sessions if the middleware is loaded after the SessionMiddleware).
> Then the caching takes place. I moved the ConditionalGetMiddleware
> after it, because it depends on headers set in the CacheMiddleware -
> and using it before the CacheMiddleware will lead to problems, as
> those headers might not be set.
The first thing is that response middleware is done in the inverse
order to request and view middleware (though this is not documented).
The SessionMiddleware patches the 'Vary' header as part of
process_response(), so with the order given above, the CacheMiddleware
process_response() never sees the fact that it should vary on cookie,
and gets the caching wrong (I think I've seen this behaviour
experimentally, but checking these things can be a bit tricky!).
> Put the CacheMiddleware after any middlewares that might add something
> to the Vary header. The following middlewares do so:
> SessionMiddleware adds Cookie
> GZipMiddleware adds Accept-Encoding
>Am I missing something? This makes quite a big difference, since you
>could end up serving private data to the wrong person if you get it
>wrong.
No, I think you got it right. The reverse order of middleware in the response phase does make sense, but it is a bit counter-intuitive at times ;-)
So actually in the list of installed middleware, the cache middleware should come early on to make sure that it comes last in the response phase. You should put all middleware that has to react based on the request before it - like the CommonMiddleware, which does possible redirects based on the missing "/" bit (and does some shortcutting itself).
In its request phase the cache middleware just checks whether the requested page is cached, and shortcuts if it is. It uses a learned cache key for that - that cache key is defined in the response phase based on the Vary headers. So it will only take into account those additions to the Vary header that are already in there - middleware that's listed before the CacheMiddleware will run after the CacheMiddleware in the response phase, and so its Vary headers won't be used in the cache key.
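The effect of the reversed response order can be checked with a small simulation. Nothing here is Django code: the function `vary_seen_by_cache` and the two example orderings are made up for illustration, using only the Vary additions named earlier in the thread (SessionMiddleware adds Cookie, GZipMiddleware adds Accept-Encoding).

```python
def vary_seen_by_cache(middleware_order, vary_added):
    """Return the Vary names already set when the cache's response hook runs.

    Hypothetical model: response middleware runs in reverse list order,
    so only middleware listed after CacheMiddleware has run by then.
    """
    seen = []
    for name in reversed(middleware_order):
        if name == "CacheMiddleware":
            return seen
        seen.extend(vary_added.get(name, []))
    return seen

VARY_ADDED = {
    "SessionMiddleware": ["Cookie"],
    "GZipMiddleware": ["Accept-Encoding"],
}

session_first = ["CommonMiddleware", "SessionMiddleware", "CacheMiddleware", "GZipMiddleware"]
cache_first = ["CommonMiddleware", "CacheMiddleware", "SessionMiddleware", "GZipMiddleware"]

print(vary_seen_by_cache(session_first, VARY_ADDED))  # Cookie is missing here
print(vary_seen_by_cache(cache_first, VARY_ADDED))    # Cookie is included here
```

With the session middleware listed before the cache, the cache key is built without Cookie, which is exactly the case where private data could be served to the wrong person.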
This should definitely be clarified in the documentation. Actually I think it might even make sense to have an additional setting where you give an explicit order of middleware to use in the response phase - that way you could even have a different order in the request and response phases. Would be interesting to hear what Adrian and Jacob think about this. It could be implemented in a way that if the user doesn't give an explicit response order, it will just be the reversed list of middleware - that way it's like it is now. But if the user wants control, he can set up a list for response phase handling. The system could even check that both lists are identical in content and only differ in order, to make sure the user doesn't forget a middleware in that list.
> This should definitely be clarified in the documentation. Actually I
> think it might even make sense to have an additional setting where you
> give an explicit order of middleware to use in the response phase -
> that way you could even have different order in request and response
> phase. Would be interesting what Adrian and Jacob think about this. It
> could be implemented in a way that if the user doesn't give an
> explicit response order, it will just be the reversed list of
> middleware - that way it's like it is now. But if the user wants
> control, he can set up a list of response phase handling. The system
> could even check that both lists are identical in content and only
> differ in order, to make sure the user doesn't forget a middleware in
> that list.
> It would make things much more obvious, I think.
Yes, I like this idea. As I was working with this I found myself thinking
a number of times that with a couple more middlewares thrown into the
mix you could easily have an impossible set of constraints to
satisfy.
Luke
-- "In my opinion, we don't devote nearly enough scientific research to finding a cure for jerks." (Calvin and Hobbes)
On 11/4/05, Luke Plant <luke.pl...@gmail.com> wrote:
> Yes, I like this idea. As I was working with this I found myself thinking
> a number of times that with a couple more middlewares thrown into the
> mix you could easily have an impossible set of constraints to
> satisfy.
Actually, if you have a middleware which needs to run in different
places in the request and response chains, you could just split
it in two. One which has process_request and one which has
process_response.
Also, can you come up with an actual middleware, which needs
to run in different places in the req. and resp. chains? Otherwise
this whole discussion is a bit academic, and the code to handle
it in django would be cruft.
>Also, can you come up with an actual middleware, which needs
>to run in different places in the req. and resp. chains? Otherwise
>this whole discussion is a bit academic, and the code to handle
>it in django would be cruft.
As I wrote in the ticket #730: the LocaleMiddleware needs to come in process_request _after_ the SessionMiddleware, because it needs the session handling (the language discovery is done in the process_request). But it modifies the Vary header, so it must come _after_ the CacheMiddleware to come _before_ it in the process_response. So it must be:
to make sure that the CacheMiddleware is run last on process_response (for caching), first on process_request (for shortcutting) and both session and translation are initialized in the right order.
This description alone should be hint enough that we really need more obvious ways to order middlewares for process_response than just "reverse the list of middlewares". It shouldn't take two paragraphs and one explicit list to tell people how to order middleware with respect to process_request and process_response ;-)
I think the reversed order is just counter-intuitive - you just don't think about it when adding middleware and talking about middleware. So I think at least an optional way to explicitly specify the order would be nice.
On Sat, 5 Nov 2005 14:31:30 +0100 Sune Kirkeby wrote:
> On 11/5/05, hugo <g...@hugo.westfalen.de> wrote:
> > I think the reversed order is just counter-intuitive - you just
> > don't think about it when adding middleware and talking about
> > middleware.
> It's perfectly obvious to me. I think of it like a normal call-stack;
> the request walks down the middleware-stack, the response
> walks back up.
> > I think at least an optional way to explicitly specify the order
> > would be nice.
> It might be; but you still haven't given an example where you
> actually need it. All you have shown is a need for documenting
> the way things work.
I agree that functionality shouldn't be added until we have a use case,
and the concept of walking up and down the stack helps. It's not quite
that simple though - if one of the middleware returns a response
during process_request, the other process_request middleware are
skipped. But all the process_response middleware will still be called.
(I'm getting this from reading handlers/base.py and
handlers/modpython.py)
Doesn't that add complications for CacheMiddleware? That
will cache a version that's been processed by all the response
middleware (since it comes high in the list). When you get a cache hit,
it returns that response right away in process_request and so all the
other middleware and the view itself are skipped. But that response
will then go through all the other process_response middleware. If one
of them changes the response content e.g. adds 'foo' at the end of each
paragraph, then the second time you get it, it will have foo added
twice.
Which is exactly what happens. (test this without GZip middleware,
since that will stop you doing a search and replace on the contents of
the response). So I guess there is a use case (and not an entirely
silly one - my CsrfMiddleware inserts some HTML into the response,
which is now inserted n times).
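The repeated insertion can be reproduced with a toy handler that follows the behaviour described above: a response returned from process_request skips the remaining request middleware and the view, but every process_response hook still runs. All class and function names here (`ToyCache`, `AppendFoo`, `handle`) are invented for the example and responses are plain strings.

```python
class ToyCache:
    """Stores whatever response reaches it and replays it on the next request."""
    def __init__(self):
        self.store = {}
    def process_request(self, request):
        return self.store.get(request)
    def process_response(self, request, response):
        self.store[request] = response
        return response

class AppendFoo:
    """Stand-in for any middleware that modifies the response content."""
    def process_response(self, request, response):
        return response + " foo"

def handle(request, middleware, view):
    response = None
    for mw in middleware:
        if hasattr(mw, "process_request"):
            response = mw.process_request(request)
            if response is not None:
                break  # remaining request middleware and the view are skipped
    if response is None:
        response = view(request)
    for mw in reversed(middleware):  # every response hook still runs
        response = mw.process_response(request, response)
    return response

stack = [ToyCache(), AppendFoo()]
first = handle("/page/", stack, lambda request: "hello")
second = handle("/page/", stack, lambda request: "hello")
print(first)
print(second)
```

The second call starts from the cached, already modified response and passes it through AppendFoo again, so the modification accumulates with every cache hit.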
Solutions:
1) allow different ordering for request and response middleware
2) call this a bug in the CacheMiddleware, and make process_response
return the one it got out of the cache, not "cache plus middleware"
3) implement the stack of middleware as you described it, which would
solve this problem, but I don't know if it might introduce some others.
Regards,
Luke
-- "I regret I wasn't born with opposable toes." (Calvin and Hobbes)
On 11/5/05, Luke Plant <luke.pl...@gmail.com> wrote:
> I agree that functionality shouldn't be added until we have a use case,
> and the concept of walking up and down the stack helps. It's not quite
> that simple though -
Hmm... I thought the handlers did the right thing, damn :-(
> Solutions:
> 1) allow different ordering for request and response middleware
If this is the solution, we should go all the way, and expose the
four different middleware lists directly.
> 2) call this a bug in the CacheMiddleware, and make process_response
> return the one it got out of the cache, not "cache plus middleware"
I don't see how the cache-middleware can be at fault here, unless
you want it to return the cached response in process_response.
This doesn't stop the other response middlewares from running,
which I think they shouldn't at all.
> 3) implement the stack of middleware as you described it, which would
> solve this problem, but I don't know if it might introduce some others.
On 11/5/05, Luke Plant <luke.pl...@gmail.com> wrote:
> 3) implement the stack of middleware as you described it, which would
> solve this problem, but I don't know if it might introduce some others.
4) Change middleware to work like WSGI middleware-applications. A
middleware would be initialized with a callable, which represents the next
downstream middleware, or initially the view callable. And instead
of process_* the middleware is called with a request, and would then
itself call the downstream callable it was initialized with.
The biggest disadvantage I can see that this has is performance,
since the middleware stack has to change for each view; of course
some caching could probably help here.
But, it's conceptually simple and hence easy to understand, which
seems to be worth a lot.
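A rough sketch of what option 4 could look like, with each middleware wrapping the next downstream callable. This is only an illustration of the idea, not a proposal for the actual API and not real WSGI: the names `cache`, `append_foo` and `build` are hypothetical, and requests and responses are plain strings.

```python
def append_foo(next_handler):
    """Middleware that modifies the response from downstream."""
    def handler(request):
        return next_handler(request) + " foo"
    return handler

def cache(next_handler):
    """Middleware that only calls downstream on a cache miss."""
    store = {}
    def handler(request):
        if request not in store:
            store[request] = next_handler(request)
        return store[request]
    return handler

def view(request):
    return "hello"

def build(view, factories):
    """Wrap the view so that the first factory ends up outermost."""
    handler = view
    for factory in reversed(factories):
        handler = factory(handler)
    return handler

app = build(view, [cache, append_foo])
print(app("/page/"))
print(app("/page/"))
```

In this arrangement a cache hit returns without calling anything downstream, so the modification is applied once and the cached copy is served unchanged afterwards.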