Adding a middleware to match cookies

Jeff Willette

Jan 6, 2017, 9:50:56 PM
to Django developers (Contributions to Django itself)
I recently proposed a bad fix (https://code.djangoproject.com/ticket/27686), but I think the problem still remains and I might have a way around it.

I understand that calling is_authenticated on a user will require the session to be accessed and the Vary: Cookie header to be in the response, but if I understand how caching systems work, then this will cause all cookies in the request to be taken into account, correct?

The idea in the ticket about using AJAX requests is good, but I would prefer to keep the page differences in the Django logic and avoid an extra request to the server on every page load.

What if there was an optional middleware early in the request processing that matched cookies based on a regex in settings and then modified the header to only include the matched cookies?

That way, the unauthed users' requests will vary by cookies, but we would have removed all inconsequential cookies, so all unauthed users will have the same set of cookies (likely none), and authed users will have sessionid (or whatever else you wish to match), and everyone will be happily cached correctly.
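A minimal sketch of the proposed middleware, assuming Django's new-style middleware interface (a class constructed with `get_response` and called with the request). The regex `VARY_COOKIE_ALLOWLIST`, the helper `filter_cookie_header` and the class `StripUnusedCookiesMiddleware` are hypothetical names; a real project would more likely read the pattern from `django.conf.settings`. Because it rewrites `request.META["HTTP_COOKIE"]`, it has to run before anything reads `request.COOKIES`, which Django parses once and then caches on the request.

```python
import re

# Hypothetical setting: the cookie names the server actually depends on.
# "sessionid" and "csrftoken" are Django's default SESSION_COOKIE_NAME and
# CSRF_COOKIE_NAME; a real project might read this from django.conf.settings.
VARY_COOKIE_ALLOWLIST = re.compile(r"sessionid|csrftoken")


def filter_cookie_header(header, pattern=VARY_COOKIE_ALLOWLIST):
    """Return a Cookie header keeping only cookies whose name fully matches pattern."""
    kept = []
    for chunk in header.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        name = chunk.partition("=")[0].strip()
        if pattern.fullmatch(name):
            kept.append(chunk)
    return "; ".join(kept)


class StripUnusedCookiesMiddleware:
    """Hypothetical new-style middleware that narrows HTTP_COOKIE in place."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        header = request.META.get("HTTP_COOKIE")
        if header is not None:
            filtered = filter_cookie_header(header)
            if filtered:
                request.META["HTTP_COOKIE"] = filtered
            else:
                # Nothing relevant left: look exactly like a request with no cookies.
                del request.META["HTTP_COOKIE"]
        return self.get_response(request)
```

With this in place, every anonymous visitor whose only cookies are analytics cookies arrives at the cache with no `Cookie` header at all, so they all share one cache key.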

Is there a hole in my thinking anywhere? Would this work as I expect?

Carl Meyer

Jan 6, 2017, 10:02:30 PM
to django-d...@googlegroups.com
Hi Jeff,

On 01/06/2017 06:21 PM, Jeff Willette wrote:
> I understand that calling is_authenticated on a user will require the
> session to be accessed and the Vary: Cookie header to be in the
> response, but if I understand how caching systems work then this will
> cause all cookies in the request to be taken into account, correct?

Yes. HTTP doesn't provide any way to say "vary only on this cookie, not
the others." Be nice if it did!
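To see why, here is a simplified model of how a cache might key a response that carries `Vary`. It is loosely modeled on Django's cache-key generation, which hashes the value of every request header the response varies on, but `vary_cache_key` is an illustrative helper, not Django's code. Because the whole `Cookie` header value goes into the key, two anonymous visitors who differ only in their analytics cookies land in different cache entries.

```python
import hashlib


def vary_cache_key(url, vary_headers, request_headers):
    """Illustrative cache key: the URL plus the full value of each Vary'd header."""
    digest = hashlib.sha256(url.encode())
    for name in vary_headers:
        value = request_headers.get(name)
        if value is not None:
            # The entire header value is hashed; there is no way to say
            # "only the sessionid part of Cookie matters".
            digest.update(f"{name.lower()}:{value}\n".encode())
    return digest.hexdigest()


# Two anonymous visitors, same page, no session, different GA cookies:
alice = vary_cache_key("/", ["Cookie"], {"Cookie": "_ga=GA1.2.111"})
bob = vary_cache_key("/", ["Cookie"], {"Cookie": "_ga=GA1.2.222"})
print(alice == bob)  # False: each visitor gets a separate cache entry
```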

> What if there was an optional middleware early in the request
> processing that matched cookies based on a regex in settings and then
> modified the header to only include the matched cookies?
>
> That way, the unauthed users' requests will vary by cookies, but we
> would have removed all inconsequential cookies, so all unauthed users
> will have the same set of cookies (likely none), and authed users
> will have sessionid (or whatever else you wish to match), and everyone
> will be happily cached correctly.
>
> Is there a hole in my thinking anywhere? Would this work as I
> expect?

I think it could work, yeah. It won't help the efficiency of any other
downstream HTTP caches, but they would still be safe (not serve anyone
the wrong response). And you should be able to help efficiency of
Django's own cache this way, if you strip cookies that Django / your
code doesn't care about before the request ever reaches the caching
middleware. Try it and experiment!
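In practice, "before the request ever reaches the caching middleware" comes down to ordering in the `MIDDLEWARE` setting (Django 1.10+; older projects use `MIDDLEWARE_CLASSES`). A sketch, where `myproject.middleware.StripUnusedCookiesMiddleware` is a hypothetical cookie-filtering class and the `django.*` entries are Django's own per-site cache setup, with `UpdateCacheMiddleware` first and `FetchFromCacheMiddleware` last as Django's cache documentation asks.

```python
# settings.py (sketch). Only the django.* entries are real Django classes;
# "myproject.middleware.StripUnusedCookiesMiddleware" is hypothetical.
MIDDLEWARE = [
    # Django's docs ask for this first, so its response phase runs last and
    # stores the response (keyed on the same, already-filtered request.META).
    "django.middleware.cache.UpdateCacheMiddleware",
    # Hypothetical filter: runs early in the request phase, before the session
    # middleware or the cache lookup reads any cookies.
    "myproject.middleware.StripUnusedCookiesMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    # Django's docs ask for this last; everything above it, including the
    # cookie filter, has already run when the cache lookup happens.
    "django.middleware.cache.FetchFromCacheMiddleware",
]
```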

Carl

Jeff Willette

Jan 7, 2017, 2:26:02 AM
to Django developers (Contributions to Django itself)
Carl, thanks for the reply. 

Why would this not help the efficiency of the downstream caches? Is it because the request has already passed through them with the cookies intact, and when the response comes back through them they have no way to know the cookies were stripped?

Florian Apolloner

Jan 7, 2017, 6:25:10 AM
to Django developers (Contributions to Django itself)
Hi Jeff,


On Saturday, January 7, 2017 at 3:50:56 AM UTC+1, Jeff Willette wrote:
> What if there was an optional middleware early in the request processing that matched cookies based on a regex in settings and then modified the header to only include the matched cookies?

I do not see how this would help -- you'd still have to set "Vary: Cookie" on the response as soon as you are accessing request.user. Or is the goal of this to allow Django's internal page caching stuff to ignore some cookies? That seems doable, but very very dangerous.

This issue reminds me of another issue I came up with (or as Carl puts it: "…presenting the hypothetical case that exposed this bug."), namely https://code.djangoproject.com/ticket/19649 -- Basically as soon as Django accesses __any__ cookie we should set "Vary: Cookie", with all the downsides this entails. I think we finally should fix that and put a fix for it into the BaseHandler.
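A rough sketch of the idea behind ticket 19649: record whether any code read a cookie during the request, and add `Cookie` to `Vary` if so. `TrackingCookies` and `add_vary_cookie_if_accessed` are hypothetical names operating on a plain dict of response headers; this is not how Django's `BaseHandler` is written, and real Django code would more likely call `django.utils.cache.patch_vary_headers(response, ("Cookie",))`.

```python
class TrackingCookies(dict):
    """Cookie dict that remembers whether any code looked at it.

    Only the common read paths (indexing, get, "in") are tracked in this sketch.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.accessed = False

    def __getitem__(self, key):
        self.accessed = True
        return super().__getitem__(key)

    def get(self, key, default=None):
        self.accessed = True
        return super().get(key, default)

    def __contains__(self, key):
        self.accessed = True
        return super().__contains__(key)


def add_vary_cookie_if_accessed(cookies, response_headers):
    """Append Cookie to the Vary header if the request's cookies were read."""
    if not cookies.accessed:
        return response_headers
    vary = [v.strip() for v in response_headers.get("Vary", "").split(",") if v.strip()]
    if "cookie" not in (v.lower() for v in vary):
        vary.append("Cookie")
    response_headers["Vary"] = ", ".join(vary)
    return response_headers
```

The cost Florian mentions follows directly: once any view touches a cookie, every response from it varies on the full `Cookie` header, analytics cookies included.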

What would be great would be an HTTP header which allowed for something à la "Cache: if-request-did-not-have-cookies" -- usually it is pointless to cache __anything__ with cookies anyways. That said, with all the analytics super cookies out there, there are not many pages without cookies anymore :(

Carl Meyer

Jan 7, 2017, 5:30:10 PM
to django-d...@googlegroups.com
On 01/06/2017 11:26 PM, Jeff Willette wrote:
> Why would this not help the efficiency of the downstream caches? Is it
> because the request has already passed through them with the cookies
> intact, and when the response comes back through them they have no way
> to know the cookies were stripped?

That's correct. Stripping cookies from the request in Django is far too
late to have any effect on an external cache. If the request has reached
Django, then it's already passed through any external caching proxies,
with all cookies, and the cache has already decided not to serve a
cached response. (And if the cache holds on to the response, it'll
associate it with the request it saw, which still had all its cookies.)
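A toy simulation of that point, under simplifying assumptions: the proxy keys on the raw `Cookie` header (as `Vary: Cookie` requires), while the origin, standing in for Django's own page cache behind a hypothetical cookie filter, keys on the `sessionid` cookie alone. `Proxy`, `Origin` and `strip_to_session` are illustrative names, not real APIs.

```python
def strip_to_session(cookie_header):
    """Keep only the sessionid cookie (stand-in for a Django-side cookie filter)."""
    chunks = (c.strip() for c in cookie_header.split(";"))
    return "; ".join(c for c in chunks if c.startswith("sessionid="))


class Origin:
    """Django with its own page cache, keyed on the filtered cookies."""

    def __init__(self):
        self.cache = {}
        self.renders = 0

    def handle(self, url, cookie_header):
        key = (url, strip_to_session(cookie_header))
        if key not in self.cache:
            self.renders += 1
            self.cache[key] = f"rendered {url} for {key[1] or 'anonymous'}"
        return self.cache[key]


class Proxy:
    """External cache in front of Django; it only ever sees the original headers."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}
        self.hits = 0

    def handle(self, url, cookie_header):
        # Vary: Cookie means the full, unfiltered header is part of the key.
        key = (url, cookie_header)
        if key in self.cache:
            self.hits += 1
        else:
            self.cache[key] = self.origin.handle(url, cookie_header)
        return self.cache[key]


origin = Origin()
proxy = Proxy(origin)
proxy.handle("/", "_ga=GA1.2.111")
proxy.handle("/", "_ga=GA1.2.222")
print(proxy.hits, origin.renders)  # 0 1: the proxy missed twice, Django rendered once
```

Filtering inside Django saves rendering work, but the proxy still stores one copy per distinct `Cookie` header.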

Carl

Carl Meyer

Jan 7, 2017, 5:41:02 PM
to django-d...@googlegroups.com
On 01/07/2017 03:25 AM, Florian Apolloner wrote:
> On Saturday, January 7, 2017 at 3:50:56 AM UTC+1, Jeff Willette wrote:
>
> What if there was an optional middleware early in the request
> processing that matched cookies based on a regex in settings and
> then modified the header to only include the matched cookies?
>
>
> I do not see how this would help -- you'd still have to set "Vary:
> Cookie" on the response as soon as you are accessing request.user. Or is
> the goal of this to allow Django's internal page caching stuff to ignore
> some cookies? That seems doable, but very very dangerous.

Right, the latter is how I understood it; you'd still use Vary: Cookie,
but strip some cookies before the request reaches the cache middleware.

I don't think it's too dangerous, if you're conservative about the
cookies you strip (e.g. only strip cookies that are known for sure to be
unused on the server, like Google Analytics cookies for instance.)
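This conservative variant inverts the filter: instead of keeping an allowlist, drop only cookies known to be unused by the server. A sketch, assuming the cookie names Google Analytics commonly sets (`_ga`, `_gid`, `_gat` and the legacy `__utm*` cookies); `GA_COOKIE` and `drop_known_unused` are hypothetical, and the pattern should be checked against what a given site actually sets.

```python
import re

# Assumed Google Analytics cookie names; verify against your own site.
GA_COOKIE = re.compile(r"_ga(_\w+)?|_gid|_gat(_\w+)?|__utm[a-z]")


def drop_known_unused(cookie_header, deny=GA_COOKIE):
    """Remove only cookies whose names are known to be irrelevant; keep the rest."""
    kept = []
    for chunk in cookie_header.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        name = chunk.partition("=")[0].strip()
        if not deny.fullmatch(name):
            kept.append(chunk)
    return "; ".join(kept)


print(drop_known_unused("_ga=GA1.2.3; sessionid=abc; __utmz=1; theme=dark"))
# sessionid=abc; theme=dark
```

The trade-off: a denylist never removes a cookie the application might read (here `theme` survives), at the cost that any new tracking cookie fragments the cache again until it is added to the pattern.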

>
> This issue reminds me of another issue I came up with (or as Carl puts
> it: "…presenting the hypothetical case that exposed this bug."), namely
> https://code.djangoproject.com/ticket/19649 -- Basically as soon as
> Django accesses __any__ cookie we should set "Vary: Cookie", with all
> the downsides this entails. I think we finally should fix that and put a
> fix for it into the BaseHandler.

+1

> What would be great would be an HTTP header which allowed for something
> à la "Cache: if-request-did-not-have-cookies" -- usually it is pointless
> to cache __anything__ with cookies anyways. That said, with all the
> analytics super cookies out there, there are not many pages without
> cookies anymore :(

+1. Basically analytics have already effectively broken HTTP caching as
it was designed to work.

Carl

Jeff Willette

Jan 7, 2017, 11:41:49 PM
to Django developers (Contributions to Django itself)
The specific case I am talking about deals with Google Analytics cookies, which are different for every user and are sent with every request. When accessing request.user, I really only care about sessionid and csrftoken, if present. So sending a Vary: Cookie header back will cause all the unauthed/unsessioned users to miss the cache because of the GA cookies.

Since I have no use for these cookies in my code, and they are only used for external requests to GA, eliminating them somewhere (the earlier the better) should improve cache hits, right?
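One way to see the effect on hit rates: count how many distinct cache entries a single URL needs for a batch of anonymous visitors, each carrying unique GA cookies, with and without filtering. The visitor data is synthetic, and `only_session_cookies` is a hypothetical stand-in for a cookie-filtering middleware.

```python
def only_session_cookies(cookie_header, names=("sessionid", "csrftoken")):
    """Hypothetical filter: keep only the cookies the server actually reads."""
    chunks = (c.strip() for c in cookie_header.split(";"))
    return "; ".join(c for c in chunks if c.partition("=")[0] in names)


def distinct_cache_entries(cookie_headers, key_fn):
    """How many separate entries one URL needs when keyed on key_fn(Cookie)."""
    return len({("/", key_fn(header)) for header in cookie_headers})


# 1000 synthetic anonymous visitors, each with unique GA cookies and no session.
anonymous = [f"_ga=GA1.2.{1000 + i}; _gid=GA1.2.{5000 + i}" for i in range(1000)]

print(distinct_cache_entries(anonymous, lambda h: h))           # 1000
print(distinct_cache_entries(anonymous, only_session_cookies))  # 1
```

Keyed on the full header, each anonymous visitor's first request is a guaranteed miss; after filtering they all share one entry, so only the very first anonymous request needs a render (for the cache in Django, not for any proxy in front of it).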

Tobias McNulty

Jan 8, 2017, 2:06:38 AM
to django-developers
On Jan 7, 2017 11:41 PM, "Jeff Willette" <jrwill...@gmail.com> wrote:
> The specific case I am talking about deals with Google Analytics cookies, which are different for every user and are sent with every request. When accessing request.user, I really only care about sessionid and csrftoken, if present. So sending a Vary: Cookie header back will cause all the unauthed/unsessioned users to miss the cache because of the GA cookies.
>
> Since I have no use for these cookies in my code, and they are only used for external requests to GA, eliminating them somewhere (the earlier the better) should improve cache hits, right?

Perhaps, but the place to do that is in your edge cache servers, not Django:


I'm unclear how feasible this is (I've never tried it). It's worth noting the last page isn't even on Fastly's public site anymore.

In any event, I'm not seeing the case for a change to Django proper here. If Django's cache middleware is the only cache you're using, you might be able to accomplish something like the above via middleware, as Carl suggested.

If you're looking for assistance with the middleware implementation, I recommend the django-users list. If you're using another cache in front of Django, you'll need to figure out how to implement this there, or find a simpler route such as never setting the tracking cookies in the first place, or splitting the request in two.

Good luck!

Tobias