Cache and GET parameters

Adrian Holovaty

Dec 6, 2005, 1:28:06 AM
to django-d...@googlegroups.com
Right now, the Django cache system doesn't cache pages that have GET
parameters. This is because GET parameters don't necessarily influence
the output of the page. For example, if the page example.com/foo/ is
cached, anybody could simply add a "?bar=baz" to the URL and Django
wouldn't know whether that was a separate page, or just a bunch of
bogus query string cruft added by a nincompoop. So that's why Django
currently doesn't cache any pages with GET parameters, across the
board.

This is a bad long-term solution, though.

I have a couple of ideas for solutions. The first is to introduce a
NO_GET_PARAMS setting, which would default to False. If it's set to
True, Django would assume that *all* GET parameters (query strings),
sitewide, contain meaningless information, and therefore would not
account for them in creating cache. For example, a request to
example.com/foo/?bar=baz would use the same cache as example.com/foo/.
We might even be able to reuse this setting for other things; I'm not
sure what, yet.
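
As a rough sketch of what the setting would do (NO_GET_PARAMS is only the setting proposed above, and page_cache_key is a made-up name for illustration, not an existing Django function):

```python
from urllib.parse import urlsplit

# Hypothetical setting from the proposal above; not an existing Django setting.
NO_GET_PARAMS = True


def page_cache_key(url, no_get_params=NO_GET_PARAMS):
    """Return a cache key for url, or None if the page should not be cached."""
    parts = urlsplit(url)
    if not parts.query:
        # No query string: the path alone identifies the page.
        return parts.path
    if no_get_params:
        # Query strings are declared meaningless sitewide, so ignore them.
        return parts.path
    # Current behavior: pages with GET parameters are not cached at all.
    return None
```

With the setting on, /foo/?bar=baz and /foo/ map to the same key; with it off, the first is simply not cached.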

Another solution could be to introduce a view decorator that specifies
the view doesn't care about GET parameters. Essentially it'd be the
opposite of the vary_on_headers decorator
(http://www.djangoproject.com/documentation/cache/#controlling-cache-using-vary-headers).
However, it'd be a hassle to have to add that decorator to each view,
particularly if you're like me and rarely use query strings.

Finally, along those lines, we could introduce a vary_on_get
decorator, which, used with the NO_GET_PARAMS setting, would be an
opt-in signifying a view *does* rely on the query string. This could be
for stuff like search engines, which do vary based on the query string
(e.g. /search/?q=foo). In this case, though, it'd be nice to be able
to specify the variables that are valid. For example, with the
decorator @vary_on_get('foo', 'bar'), the cache would store separate
pages for /search/?foo=1 and /search/?bar=1, but it would use the same
cache for /search/?foo=1 and /search/?foo=1&gonzo=2, because "gonzo"
isn't specified in "vary_on_get" and thus would be ignored.
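
The filtering described here could be sketched roughly as follows. The function name cache_key and its signature are assumptions made for illustration; nothing like this exists in Django yet.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url, vary_on_get=()):
    """Build a cache key that only reflects the whitelisted GET parameters."""
    parts = urlsplit(url)
    # Keep only parameters named in vary_on_get; sort them for a stable order.
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in vary_on_get)
    return parts.path + ("?" + urlencode(kept) if kept else "")
```

Here /search/?foo=1 and /search/?foo=1&gonzo=2 produce the same key when vary_on_get is ('foo', 'bar'), while /search/?bar=1 gets its own.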

What do people think of these ideas?

Adrian

--
Adrian Holovaty
holovaty.com | djangoproject.com | chicagocrime.org

Amit Upadhyay

Dec 6, 2005, 1:30:28 AM
to django-d...@googlegroups.com
+1
--
Amit Upadhyay
Blog: http://www.rootshell.be/~upadhyay
+91-9867-359-701

Jacob Kaplan-Moss

Dec 6, 2005, 1:35:20 AM
to django-d...@googlegroups.com
On Dec 6, 2005, at 12:28 AM, Adrian Holovaty wrote:
> Finally, along those lines, we could introduce a vary_on_get
> decorator, which, used with the NO_GET_PARAMS setting, would be an
> opt-in signifying a view *does* rely on query string. This could be
> for stuff like search engines, which do vary based on the query string
> (e.g. /search/?q=foo). In this case, though, it'd be nice to be able
> to specify the variables that are valid. For example, with the
> decorator @vary_on_get('foo', 'bar'), the cache would store separate
> pages for /search/?foo=1 and /search/?bar=1, but it would use the same
> cache for /search/?foo=1 and /search/?foo=1&gonzo=2, because "gonzo"
> isn't specified in "vary_on_get" and thus would be ignored.

This sounds like the right idea to me: explicitly state which GET
params invalidate the cache.

Jacob

Eugene Lazutkin

Dec 6, 2005, 1:47:24 AM
to django-d...@googlegroups.com
"Adrian Holovaty" <holo...@gmail.com> wrote
in message
news:6464bab0512052228r62b...@mail.gmail.com...

>Finally, along those lines, we could introduce a vary_on_get
>decorator, which, used with the NO_GET_PARAMS setting, would be an
>opt-in signifying a view *does* rely on query string. This could be
>for stuff like search engines, which do vary based on the query string
>(e.g. /search/?q=foo). In this case, though, it'd be nice to be able
>to specify the variables that are valid. For example, with the
>decorator @vary_on_get('foo', 'bar'), the cache would store separate
>pages for /search/?foo=1 and /search/?bar=1, but it would use the same
>cache for /search/?foo=1 and /search/?foo=1&gonzo=2, because "gonzo"
>isn't specified in "vary_on_get" and thus would be ignored.

I like @vary_on_get(). IMHO, it covers a lot of real-life scenarios.

Additionally, it would be nice to be able to specify the lifetime of the
cached copy from the view. That way we could dynamically assign a longer
lifetime to the items least likely to change, based on their content.
Example: recent articles can be modified, while old articles are unlikely
to be changed now (see http://code.djangoproject.com/ticket/590).
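
A minimal sketch of the idea, assuming the view can hand a timeout to the cache. The function name and the age thresholds are invented for illustration and are not taken from the ticket.

```python
from datetime import datetime, timedelta


def cache_timeout_for(published, now=None):
    """Pick a cache lifetime in seconds from an article's age."""
    now = now or datetime.now()
    age = now - published
    if age < timedelta(days=1):
        return 60            # recent article: likely to be edited again
    if age < timedelta(days=30):
        return 60 * 60       # settling down: cache for an hour
    return 24 * 60 * 60      # old article: unlikely to change, cache for a day
```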

One more wish: add a "clear the cache" button to the admin. I'm not asking
for sophisticated cache management (it would be nice to have, but...). Even
something simple would help a lot. ;-)

Thanks,

Eugene



Maniac

Dec 6, 2005, 2:15:09 AM
to django-d...@googlegroups.com
Jacob Kaplan-Moss wrote:

> This sounds like the right idea to me: explicitly state which GET
> params invalidate the cache.

So when the view's code changes during development, one has to always
remember to update this list of invalidators. Not very DRY :-(

James Bennett

Dec 6, 2005, 3:02:28 AM
to django-d...@googlegroups.com
On 12/6/05, Maniac <Man...@softwaremaniacs.org> wrote:
> So when the view's code changes during development, one has to always
> remember to update this list of invalidators. Not very DRY :-(

Except it's a decorator, so it's right there with your view code.


--
"May the forces of evil become confused on the way to your house."
-- George Carlin

Maniac

Dec 6, 2005, 3:27:40 AM
to django-d...@googlegroups.com
James Bennett wrote:

>Except it's a decorator, so it's right there with your view code.

But still you have to blindly copy strings from code to decorator
parameters.

Cheng Zhang

Dec 6, 2005, 3:43:04 AM
to django-d...@googlegroups.com
I agree. This is the best one among the three proposals.

hugo

Dec 6, 2005, 4:08:43 AM
to Django developers
>Finally, along those lines, we could introduce a vary_on_get
>decorator, which, used with the NO_GET_PARAMS setting, would be an
>opt-in signifying a view *does* rely on query string. This could be
>for stuff like search engines, which do vary based on the query string
>(e.g. /search/?q=foo). In this case, though, it'd be nice to be able
>to specify the variables that are valid.

+1 for vary_on_get, that fits nicely into the current scheme and just
sounds right to me.

bye, Georg

James Bennett

Dec 6, 2005, 5:11:40 AM
to django-d...@googlegroups.com
On 12/6/05, Maniac <Man...@softwaremaniacs.org> wrote:
> But still you have to blindly copy strings from code to decorator
> parameters.

Any way of implementing this is going to require you to specify
*somewhere* which GET parameters are relevant to caching a particular
view, and it'd be hard to implement that directly in the view syntax
(since not everyone will be using caching). The proposed decorator
does the next best thing to having it directly "in" the view, and
keeps that information bundled with your view code instead of storing
it somewhere else. So it gets a +1 from me.

And while DRY is great, I'm still not convinced that this is a
violation of it, or at least one we need to worry too much about -- if
strictly following DRY means needlessly complicating things, then I
don't think it should be strictly followed.

Maniac

Dec 6, 2005, 5:48:43 AM
to django-d...@googlegroups.com
James Bennett wrote:

>Any way of implementing this is going to require you to specify
>*somewhere* which GET parameters are relevant to caching a particular
>view, and it'd be hard to implement that directly in the view syntax
>(since not everyone will be using caching). The proposed decorator
>does the next best thing to having it directly "in" the view, and
>keeps that information bundled with your view code instead of storing
>it somewhere else. So it gets a +1 from me.
>
>And while DRY is great, I'm still not convinced that this is a
>violation of it, or at least one we need to worry too much about -- if
>strictly following DRY means needlessly complicating things, then I
>don't think it should be strictly followed.

I completely agree. I was just expressing a concern about it; maybe
someone will come up with a better solution. Thinking about the issue, I
too fail to come up with anything fully automatic...

xlex...@gmail.com

Dec 6, 2005, 8:26:55 AM
to Django developers
Don't you want to use my cache algorithm, proposed here:
http://groups.google.fi/group/django-developers/browse_thread/thread/fdc59b0b46502ede
?

It is able to handle GET/POST parameters (by converting these
parameters to an array and then hashing the array to a string, which
is used as a unique identifier).
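
The linked thread is not reproduced here, so the following is only a guess at the general shape of such a scheme, not the algorithm proposed there. The function name and the choice of SHA-256 are assumptions.

```python
import hashlib


def params_identifier(get_params, post_params):
    """Hash GET and POST parameters into one string usable as a cache key."""
    # Flatten both dicts into a sorted list so parameter order does not matter.
    items = sorted(("GET", k, v) for k, v in get_params.items())
    items += sorted(("POST", k, v) for k, v in post_params.items())
    raw = repr(items).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()
```

Note that this keys on every parameter, so it does nothing about the bogus query string problem from the original post.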

Adrian Holovaty

Dec 6, 2005, 9:37:18 AM
to django-d...@googlegroups.com
On 12/6/05, hugo <g...@hugo.westfalen.de> wrote:
> +1 for vary_on_get, that fits nicely into the current scheme and just
> sounds right to me.

Looks like vary_on_get is the most popular choice. So here's how that
might work:

@vary_on_get('id')
def my_view(request):
    id = request.GET.get('id', None)

@vary_on_get('q', 'page')
def search(request):
    q = request.GET.get('q', None)
    page = request.GET.get('page', 1)

In the second example, a request to /search/?foo=bar would use the
cached version of /search/, because "foo" isn't in vary_on_get.
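
One way the decorator itself might be put together, purely as a sketch: it only records the parameter names on the view, and a separate helper (relevant_params, a hypothetical name) shows what a cache layer could then key on. None of this is existing Django code.

```python
import functools


def vary_on_get(*names):
    """Hypothetical decorator: record which GET parameters affect the output."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(request, *args, **kwargs):
            return view(request, *args, **kwargs)
        # Stored on the wrapper so a cache layer can look it up later.
        wrapper.vary_on_get = frozenset(names)
        return wrapper
    return decorator


def relevant_params(view, get_params):
    """Return the subset of GET parameters the cache would key on."""
    names = getattr(view, "vary_on_get", frozenset())
    return {k: v for k, v in get_params.items() if k in names}


@vary_on_get("q", "page")
def search(request):
    return "results"
```

For a request to /search/?foo=bar, relevant_params(search, {"foo": "bar"}) is empty, so the request would share the cache entry for /search/.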

The remaining question is: What's the behavior if vary_on_get() isn't
specified for a particular view? Do we cache everything (including
separate cache entries for any combination of different GET
parameters) or cache nothing (current behavior)?

Amit Upadhyay

Dec 6, 2005, 9:41:11 AM
to django-d...@googlegroups.com
On 12/6/05, Adrian Holovaty <holo...@gmail.com> wrote:

> The remaining question is: What's the behavior if vary_on_get() isn't
> specified for a particular view? Do we cache everything (including
> separate cache entries for any combination of different GET
> parameters) or cache nothing (current behavior)?

Quoting your original post:

> I have a couple of ideas for solutions. The first is to introduce a
> NO_GET_PARAMS setting, which would default to False. If it's set to
> True, Django would assume that *all* GET parameters (query strings),
> sitewide, contain meaningless information, and therefore would not
> account for them in creating cache. For example, a request to
> example.com/foo/?bar=baz would use the same cache as example.com/foo/.
> We might even be able to reuse this setting for other things; I'm not
> sure what, yet.

Sounds fine to me.

PeterK

Nov 1, 2008, 6:30:29 AM
to Adrian Holovaty, django-d...@googlegroups.com
Picking up an old thread (because it is still relevant)

On 6 Dec 2005, 15:37, Adrian Holovaty <holov...@gmail.com> wrote:
>
> The remaining question is: What's the behavior if vary_on_get() isn't
> specified for a particular view? Do we cache everything (including
> separate cache entries for any combination of different GET
> parameters) or cache nothing (current behavior)?
>

URLs should be treated as opaque by default, so there would be a
separate cache entry for each of these:

example.com/list/
example.com/list/?a=1&b=2
example.com/list/?b=2&a=1

However, the developer may know better and could specify which
parameters affect the GET request. This could be provided in a
decorator on the view function, like this:

Vary by the entire URL (should be default behaviour):

@cache_page(60 * 15)
@vary_by_param("*")  # This should not be required to get per-full-URL caching.
def slashdot_this(request):
    ...

Only vary by values for parameter a and b (ignore everything else):

@cache_page(60 * 15)
@vary_by_param(["a", "b"])
def slashdot_this(request):
    ...
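
The difference between the two modes can be sketched like this. key_for is a made-up helper, and vary_by_param is only the decorator proposed above, so treat the whole thing as an assumption about how the key would be derived.

```python
from urllib.parse import parse_qsl, urlsplit


def key_for(url, vary_by_param="*"):
    """Derive a cache key either from the opaque URL or from chosen params."""
    parts = urlsplit(url)
    if vary_by_param == "*":
        # Opaque mode: the raw query string is part of the key, order included.
        return (parts.path, parts.query)
    wanted = set(vary_by_param)
    # Selective mode: keep only the named params, in sorted order.
    kept = tuple(sorted((k, v) for k, v in parse_qsl(parts.query) if k in wanted))
    return (parts.path, kept)
```

In opaque mode /list/?a=1&b=2 and /list/?b=2&a=1 get separate entries; with ["a", "b"] they share one, and any other parameter is ignored.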

I have added this to ticket 4992 [1] as I believe it would be of great
benefit for everyone filtering lists of data by URL parameters (a
common use case).

Kind regards,

Peter Krantz

[1]: http://code.djangoproject.com/ticket/4992

Jeremy Dunck

Nov 1, 2008, 9:52:30 AM
to django-d...@googlegroups.com
On Tue, Dec 6, 2005 at 9:37 AM, Adrian Holovaty <holo...@gmail.com> wrote:
...

> Looks like vary_on_get is the most popular choice. So here's how that
> might work:
>
> @vary_on_get('id')
> def my_view(request):
>     id = request.GET.get('id', None)

To be clear, the generated cache key would still include anything
stated in the HTTP Vary headers, right?

Vary: Cookie combined with @vary_on_get() should still vary on Cookie.
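
To illustrate what is being asked, here is a sketch of a key that combines both sources of variance. The function and its arguments are hypothetical; Django's real key generation is not shown here.

```python
def combined_cache_key(path, headers, get_params, vary_headers=(), vary_on_get=()):
    """Build a key from the path, the Vary header values, and chosen GET params."""
    # Header values named in Vary always take part in the key.
    header_part = tuple((name, headers.get(name, "")) for name in sorted(vary_headers))
    # GET parameters take part only if listed in vary_on_get.
    get_part = tuple(sorted((k, v) for k, v in get_params.items() if k in vary_on_get))
    return (path, header_part, get_part)
```

With vary_headers=("Cookie",) and an empty vary_on_get, two requests with different cookies still get different keys, while differing GET parameters do not matter.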

> The remaining question is: What's the behavior if vary_on_get() isn't
> specified for a particular view? Do we cache everything (including
> separate cache entries for any combination of different GET
> parameters) or cache nothing (current behavior)?

I say cache nothing; doing otherwise is backwards-incompatible. I
realize that means a bunch of decorators on views if you want the
cache-everything behavior.

Assuming vary_on_get() with no parameters means no variance (other
than the HTTP Vary headers), then perhaps we could write a helper to
walk the URLconf and apply a vary_on_get() decorator to indicate
cache-everything. People could opt in this way without having to go
update all their code.

(This does fall down if you're mixing reusable apps that expect
cache-nothing. Hmm.)

SmileyChris

Nov 1, 2008, 9:32:45 PM
to Django developers
On Nov 2, 2:52 am, "Jeremy Dunck" <jdu...@gmail.com> wrote:
> Assuming vary_on_get() with no parameters means no variance (other
> than the HTTP Vary headers), then [...]

That seems confusing - the decorator name seems to imply that it would
vary on any GET parameter (even though this is the default) - at least
that's how I'd read it if I didn't know otherwise.

Jeremy Dunck

Nov 1, 2008, 9:51:06 PM
to django-d...@googlegroups.com

@vary_on_get(None) ? :-)

David Cramer

Nov 2, 2008, 2:00:08 PM
to Django developers
I really like the idea of passing the GET params explicitly, so I'm +1,
especially on solution #3. I had actually never realized it wasn't
caching pages with GET params; luckily, though, the pages where I use
this decorator don't fluctuate like that :)


PeterK

Nov 2, 2008, 2:25:09 PM
to Django developers
On Nov 1, 2:52 pm, "Jeremy Dunck" <jdu...@gmail.com> wrote:
>
> To be clear, the generated cache key would still include anything
> stated in the HTTP Vary headers, right?
>
> Vary: Cookie combined with @vary_on_get() should still vary on Cookie.
>

Yes.

>
> I say cache nothing; doing otherwise is backwards-incompatible.   I
> realize that means a bunch of decorators on views if you want the
> cache-everything behavior.
>

Maybe it's just me, but I find the current behaviour confusing after
reading the introduction to the cache documentation [1]. It says
"Given a URL...", so I expected the cache to use everything that
identifies an object in the URL (path and query, as described in RFC
3986 [2]).

But, it is backwards-incompatible so maybe your suggestion is the
right way to go.

[1]: http://docs.djangoproject.com/en/dev/topics/cache/
[2]: http://labs.apache.org/webarch/uri/rfc/rfc3986.html#components

I attached a patch to ticket #4992 for the behaviour I (and apparently
other people) expected:

http://code.djangoproject.com/attachment/ticket/4992/cache_by_request_full_path.diff

Regards,

Peter