Clearing cache between tests (#11505)

174 views
Skip to first unread message

Jim Dalton

unread,
Oct 29, 2011, 1:07:25 PM10/29/11
to django-d...@googlegroups.com
I noticed that Aymeric Augustin committed a patch last week [17042] that fixed some issues in the cache backends. Apparently the fact that the cache carries over test to test remains a problem.

I've progressed pretty far with an aggressive "fix" for this problem with the patch in #11505. The current patch modifies all the keys set during a test run (so they don't overwrite any existing values), it tracks any values set during each test and it deletes those and only those that were set during that test. In essence, the cache is "restored" to its original state after each test. This is achieved by monkeypatching each cache backend in use and adding some "bookkeeping" code so to speak to track the keys set. The patch includes thorough tests, and as of last week includes a fix for the cache middleware problem as well (where views cached by middleware get carried over test to test so that the view function does not run.). The full Django test suite passes with it.

That all said, for the most part reception has been lukewarm for this patch. I think that's pretty understandable, since there's a fair amount of fiddly stuff taking place that makes people uncomfortable.

Anyhow, this morning I thought of a vastly simpler alternative proposal, which would fix the cache between tests and ensure we did not step on anyone's toes by unknowingly clearing a cache that the developer did not want to be cleared. As an alternative, I propose we:

1. Add a new optional setting under CACHES, CLEAR_BETWEEN_TESTS. The default for this setting is False. To be clear, the value is set independently for each cache defined in CACHES.

2. Between each test, cycle through each cache in CACHES. If CLEAR_BETWEEN_TESTS is True, then clear() that cache. Otherwise, move on.

Since test problems caused by cache values set in the tests are fairly infrequent, the default use case is just to leave this setting alone. However, if your test(s) need it, the option is there. It's also then trivial to e.g. set it for True in your local environment but have it on False in production, if e.g. you wanted to run your tests there but didn't want to clear your entire production cache.

I'd love to move #11505 through the logjam here and get it taken care of. Are any core devs willing to buy in to this alternative proposal? If so, I can create a patch for it with relative ease. Part of me still likes my original patch and I think there are good reasons to take the "thorough" approach. But this works as well and I could really go either way.

Let me know and thanks for listening.

Here are some references for previous conversations that have taken place about this:

Aymeric Augustin

unread,
Oct 29, 2011, 2:14:06 PM10/29/11
to django-d...@googlegroups.com
Hi Jim,

On 29 oct. 2011, at 19:07, Jim Dalton wrote:

> I noticed that Aymeric Augustin committed a patch last week [17042] that fixed some issues in the cache backends. Apparently the fact that the cache carries over test to test remains a problem.

Yes, while working on the cache tests, I came across #11505. But I just fixed the cache tests themselves, not the general issue.

> Anyhow, this morning I thought of a vastly simpler alternative proposal, which would fix the cache between tests and ensure we did not step on anyone's toes by unknowingly clearing a cache that the developer did not want to be cleared.

Note that clearing a cache should never be a catastrophic event.

> As an alternative, I propose we:
>
> 1. Add a new optional setting under CACHES, CLEAR_BETWEEN_TESTS. The default for this setting is False. To be clear, the value is set independently for each cache defined in CACHES.
>
> 2. Between each test, cycle through each cache in CACHES. If CLEAR_BETWEEN_TESTS is True, then clear() that cache. Otherwise, move on.

Here's what cache.clear() does on each cache backend:
- db: it runs a SQL query "DELETE FROM <cache_table>"
- dummy: it's a no-op
- filebased: it calls shutil.rmtree on the cache directory
- locmem: it clears two Python dicts
- memcached: it calls the "flush_all" API method

It's dirt cheap with locmem, who is probably the most popular cache backend for testing. It's a little bit expensive with db, but if you're using the database cache backend, you're clearly not a performance junkie.

The fact that Django doesn't reset cache state between tests is a bug, which means we don't need to be 100% backwards compatible. I'd be comfortable with calling clear() unconditionally on each cache, after each test. I don't feel strongly against the CLEAR_BETWEEN_TESTS flag, but if you add it, I suggest making it True by default.

To sum up: +1 on .clear(), -0 on CLEAR_BETWEEN_TESTS.

> Since test problems caused by cache values set in the tests are fairly infrequent, the default use case is just to leave this setting alone. However, if your test(s) need it, the option is there. It's also then trivial to e.g. set it for True in your local environment but have it on False in production, if e.g. you wanted to run your tests there but didn't want to clear your entire production cache.

Let's not evoke the idea running tests on production machines -- please :) Whoever does that is one typo away from wiping his production database; the cache is the least of his problems…

> I'd love to move #11505 through the logjam here and get it taken care of. Are any core devs willing to buy in to this alternative proposal?

Sure, I like it.

Best regards,

--
Aymeric Augustin.

Jim Dalton

unread,
Oct 29, 2011, 2:36:30 PM10/29/11
to django-d...@googlegroups.com
On Oct 29, 2011, at 11:14 AM, Aymeric Augustin wrote:

Thank Aymeric,

>
>> As an alternative, I propose we:
>>
>> 1. Add a new optional setting under CACHES, CLEAR_BETWEEN_TESTS. The default for this setting is False. To be clear, the value is set independently for each cache defined in CACHES.
>>
>> 2. Between each test, cycle through each cache in CACHES. If CLEAR_BETWEEN_TESTS is True, then clear() that cache. Otherwise, move on.
>
> Here's what cache.clear() does on each cache backend:
> - db: it runs a SQL query "DELETE FROM <cache_table>"
> - dummy: it's a no-op
> - filebased: it calls shutil.rmtree on the cache directory
> - locmem: it clears two Python dicts
> - memcached: it calls the "flush_all" API method
>
> It's dirt cheap with locmem, who is probably the most popular cache backend for testing. It's a little bit expensive with db, but if you're using the database cache backend, you're clearly not a performance junkie.
>
> The fact that Django doesn't reset cache state between tests is a bug, which means we don't need to be 100% backwards compatible. I'd be comfortable with calling clear() unconditionally on each cache, after each test. I don't feel strongly against the CLEAR_BETWEEN_TESTS flag, but if you add it, I suggest making it True by default.
>
> To sum up: +1 on .clear(), -0 on CLEAR_BETWEEN_TESTS.
>

I *believe* that when this conversation has been had in the past, the simple solution of cache.clear() has been sort of a nonstarter. Certainly it's by far the simplest solution, and it's tolerable for most of the caches. The main issue is that, using the memcached backend, clear() flushes out the entire cache….i.e. not just the cache for the tests, not just the cache you might have set with this application, but literally the whole of the cache.

So the challenge has been to find a way to clear the cache between tests without potentially destructive side effects.

If I personally were given the choice between cache.clear() and nothing at all, i'd take cache.clear() and be careful with it. But I'm not sure if the rest of the community is willing to accept such a solution. Hence the CLEAR_BETWEEN_TESTS flag as a compromise.

Cheers,

Jim

Łukasz Rekucki

unread,
Oct 30, 2011, 6:26:06 AM10/30/11
to django-d...@googlegroups.com
> If I personally were given the choice between cache.clear() and nothing at all, i'd take cache.clear() and be careful with it. But I'm not sure if the rest of the community is willing to accept such a solution. Hence the CLEAR_BETWEEN_TESTS flag as a compromise.

But the flag doesn't solve the memcached issue, right? IMHO, it only
adds to the false impression that you can have any control on what
gets flushed by memcached.

I also agree with Aymeric that not clearing cache state is a bug.
Having a flag to fix/restore buggy behaviour seems weird.

--
Łukasz Rekucki

Jim Dalton

unread,
Oct 30, 2011, 11:48:01 AM10/30/11
to django-d...@googlegroups.com
Hi Lukasz,

I agree with this. At the same time, this line is from the Django cache docs:

"Finally, if you want to delete all the keys in the cache, use cache.clear(). Be careful with this; clear() will remove everythingfrom the cache, not just the keys set by your application."

We're warning people about using clear() indiscriminately, and yet we're proposing to do just that in the test suite on every test run. So although I agree that not clearing the cache between tests is a bug, I do feel like the responsible thing to do is to give developers some measure of control over this behavior. If you really really want to run the tests and you really really don't want to have your cache cleared, here's a way. Or if you're more than happy to clear the cache on your dev machines but want to have some measure of protection from a hapless developer coming along and running ./manage.py test on your production machine and thus clearing your *entire* cache, here's a way.

Lukasz, as an alternative you can take a quick look at the patch I put together for #11505. This is an approach that, in a sense, creates a test namespace on each cache during tests and clears any keys set in it during the test. This approach preserves the state of each cache during the test and ensures it is restored at the end. The cost is the added degree of complexity that comes with monkeypatching systems during tests. So though in some sense it's a "better" solution, I'm not sure that the cost is worth the benefit.

Anyhow, I honestly agree with you and Aymeric that a simple cache.clear() is fine, but I thought the CLEAR_BETWEEN_TESTS flag was a good way to answer any objections that we should be cautious about clearing the cache. If there are no such objections, let's just do a simple cache.clear() then. It's literally a single line of code in the test runner, along with a test and a note in the docs. I can do that work as soon as some kind of consensus is reached.

Thanks for your feedback.

Jeremy Dunck

unread,
Oct 30, 2011, 7:19:44 PM10/30/11
to django-d...@googlegroups.com
On Sun, Oct 30, 2011 at 8:48 AM, Jim Dalton <jim.d...@gmail.com> wrote:
...

> Anyhow, I honestly agree with you and Aymeric that a simple cache.clear() is fine, but I thought the CLEAR_BETWEEN_TESTS flag was a good way to answer any objections that we should be cautious about clearing the cache. If there are no such objections, let's just do a simple cache.clear() then. It's literally a single line of code in the test runner, along with a test and a note in the docs. I can do that work as soon as some kind of consensus is reached.
>
> Thanks for your feedback.

I locally have a test-runner class that calls .clear() on setUp.
Works for me. But we run CI, so test against prod. I needed to point
to different CACHES under test to avoid flushing prod cache 20 times
per day.

Another approach that can work is to have the test runner generate a
unique key prefix for each test run. Then the keys will all be misses
on later runs. If invading the available key length isn't acceptable,
then the test runner could also have a custom KEY_FUNCTION to hash the
resulting key. A downside is that this can create significant
pressure on the cache. I think that is less bad than clearing cache,
though, since that is effectively infinite cache pressure.

Jeremy Dunck

unread,
Oct 30, 2011, 8:10:15 PM10/30/11
to django-d...@googlegroups.com

And now I'm reminded that we talked about all this before:
https://groups.google.com/forum/#!topic/django-developers/zlaPsP13dUY/discussion

Sorry for not really adding anything here. :-/

Jim Dalton

unread,
Oct 31, 2011, 8:56:40 AM10/31/11
to django-d...@googlegroups.com
On Oct 30, 2011, at 4:19 PM, Jeremy Dunck wrote:

Hi Jeremy,

> I locally have a test-runner class that calls .clear() on setUp.
> Works for me. But we run CI, so test against prod. I needed to point
> to different CACHES under test to avoid flushing prod cache 20 times
> per day.

Right. You've got a great real world use case where cleaning the cache is valuable, yet it's important for you to ensure it's not getting cleared frequently.


> Another approach that can work is to have the test runner generate a
> unique key prefix for each test run. Then the keys will all be misses
> on later runs. If invading the available key length isn't acceptable,
> then the test runner could also have a custom KEY_FUNCTION to hash the
> resulting key. A downside is that this can create significant
> pressure on the cache. I think that is less bad than clearing cache,
> though, since that is effectively infinite cache pressure.

I think this is indeed a workable alternative and I recall it from the previous discussion. I too had the same concern about the downside, namely that pressure it puts on the cache. If you're someone like you who's running integration tests 20 times a day on production, I do think it would add up. The end result might be cache data falling out of production, though certainly it would be preferable to clearing out the cache.

BTW also in the last conversation, Jacob proposed something similar to what you do now, i.e. a "test" cache(s) that the user could specify, and if one is not defined, fall back to locmem.

So I'm curious what your opinion is about the alternatives we've bandied about here? If you had your personal wish, which implementation would best suit you?

Reply all
Reply to author
Forward
0 new messages