|Resetting cache gracefully at the end of every test?||Jim Dalton||7/1/11 8:54 AM|
This issue came up again for me recently: Because the cache is not reset or flushed after every test, cache values linger there. This can create headaches when writing tests.
I would like to propose that we treat the cache backend(s) like database backends during the test run. Specifically, I propose that we:
* Prefix "test_" before the KEY_PREFIX in each backend during global setup. This is equivalent to adding the test_ prefix to DB names. In doing this we prevent cache calls during test form overwriting possible legitimate values.
* Monkey patch cache.set() (and cache.set_many()) such that we record all keys used to set values during a test.
* Use cache.delete_many() on the list of keys we record being set during test teardown. This will restore the cache to its previous state after each test run. It's better than flush() which clears the whole cache and is too destructive to be run during testing.
I don't think it needs to be more complicated than this.
I checked around to see if anyone had opened a ticket related to this or had previous discussions on the subject but couldn't find anything. I did briefly touch upon this with Russ in a comment thread on https://code.djangoproject.com/ticket/15026 and I noticed he made a comment in this vein on Django Users: https://groups.google.com/forum/#!topic/django-users/OOGO-3MIO_c . If there is something already open on this or a decision has already been made, please do feel free to point me in the right direction.
Otherwise, if there is interest I'd be happy to open a ticket and get a patch started.
|Re: Resetting cache gracefully at the end of every test?||jdunck||7/1/11 10:01 AM|
On Fri, Jul 1, 2011 at 10:54 AM, Jim D. <jim.d...@gmail.com> wrote:
Well, I think you forgot all the other low-level cache options, like
If I were to do this, I wouldn't have cache shared between tests or
Rather than "test_" and bookkeeping on key names, the key prefix was
Or, what if we just had hooks called from setUp and tearDown, either
Book-keeping with monkey-patched methods sounds fiddly to me; consider
|Re: Resetting cache gracefully at the end of every test?||Jim Dalton||7/1/11 10:27 AM|
Awesome feedback, thanks.
On Jul 1, 2011, at 10:01 AM, Jeremy Dunck wrote:
> Well, I think you forgot all the other low-level cache options, like
Yep, forgot those so they would conceivably need to be tracked.
I'm not totally sure if I'm following you on this point?
What I like about this idea is it avoids the bookkeeping and monkey-patching nonsense. What I don't like about it is that it feels like you could potentially be adding a lot of junk to your cache. A test run might end up hitting it with many values, each unique. A requirement I'm sort of working with in my mind here is that the cache returns to the "pristine" state it was in prior to the test run.
I know what you mean about it being fiddly, but I'm kind of envisioning something not all that different from what goes on with Template.render() settings.EMAIL_BACKEND, or the database backends during test setup and teardown. The bookkeeping required would in truth be nothing more than a set() where you add keys. The monkey patching would entail nothing more than a line to add the passed keys to that set and then a call to the original function. I'm just not sure it's much more fiddly than the fiddling we already do for those examples is what I'm saying.
I"m not quite sure what you mean about celery poking keys in. Maybe you can elaborate a bit? To be clear I'm not sure we could keep the actual underling cache entirely pristine if you were using it in different ways from different systems; the goal would really be to ensure calls using the cache backend API are rolled back with every test.
Anyhow thanks once again for the feedback and the critical thoughts.
|Re: Resetting cache gracefully at the end of every test?||jdunck||7/1/11 2:04 PM|
On Fri, Jul 1, 2011 at 12:27 PM, Jim Dalton <jim.d...@gmail.com> wrote:...
What I mean is, the scenario you described doesn't affect me, because
>> Rather than "test_" and bookkeeping on key names, the key prefix was
That's true about collecting garbage. It would matter for filesystem
>> Or, what if we just had hooks called from setUp and tearDown, either...
What I meant to point out is that the test process, which is doing the
|Re: Resetting cache gracefully at the end of every test?||Jim Dalton||7/1/11 2:47 PM|
On Jul 1, 2011, at 2:04 PM, Jeremy Dunck wrote:
> What I mean is, the scenario you described doesn't affect me, because
Thanks for expanding on this point, I understand now. Just to clarify, I don't presume my scenario is what the community wants -- like you I actually don't care that much about calling cache.flush() where I run tests. I just wanted to throw out some concrete ideas for requirements so that there was a starting point for discussion.
That said, I am certainly still arguing for this requirement. I believe as a general principle that a testing framework should always return the system under test to the state it was in prior to the run. Calling cache.flush() violates that principle -- and of principle concern to me, it flushes your *entire* cache. That means if someone executed tests in production it could kill an entire cache, a situation which could be slightly annoying or a horrific headache depending.
Of course, if no one else agrees with me that this is a worthy principle for Django tests then I won't waste anyone's time by undertaking a patch :)
Okay, understood for the most part. It's hard for me to tell without having gotten my hands dirty yet how much that will matter in the context of running tests.
But to me it's kind of like sending email. There are plenty of ways to send mail from Python, and your application might very well send actual email out during a test run if e.g. you use an alternate email API of some kind. Django's test suite makes no promise that *no* email will be sent when you run tests, it only promises that if you use the standard email backend API it'll be replaced with a dummy backend -- and that no email will be sent if you use Django's provided API.
Likewise, I'd be doubtful that we could guarantee with certainty that running the tests would not affect the cache at all. What I would hope to accomplish was that if your application called cache or get_cache, it would get a modified backend that deleted any entries you inserted after each and every test run. If your system touches a cache via other means then you're sort of on your own (which of course is the status quo right now for all of the cache).
Just to reiterate, I really do appreciate your taking the time to get back to me with this feedback. Thanks!
|Re: Resetting cache gracefully at the end of every test?||Souen||7/3/11 11:03 AM|
I believe as a general principle that a testing framework should always return the system under test to the state it was in prior to the run. Calling cache.flush() violates that principle -- and of principle concern to me, it flushes your *entire* cache. That means if someone executed tests in production it could kill an entire cache, a situation which could be slightly annoying or a horrific headache depending.
I'd like to +1 on the whole idea.
Making the testing framework use a special key prefix when saving/getting values from cache during test run (a key prefix such as "test_", which is equivalent to adding the "test_" prefix for DB names, as pointed out by Jim) and reset all records saved in cache during test run would not only ensure that tests are not polluting "real cache" with dummy values and are not altering legitimate values in cache, it would also ensure that tests are run within a pristine cache. That is, it would prevent tests from interacting with some cache values which would have been saved outside the test run (during normal system run or during a previous test run, for instance), i.e. it would prevent tests from being sensitive to cache state.
|Re: Resetting cache gracefully at the end of every test?||Jim Dalton||7/4/11 6:54 AM|
I've created a ticket for this and uploaded a patch: https://code.djangoproject.com/ticket/16401
Please feel free to review and let me know if this is a good idea in the first place, as well as if the patch makes sense and is the right approach.
|Re: Resetting cache gracefully at the end of every test?||Jacob Kaplan-Moss||7/4/11 8:01 AM|
On Mon, Jul 4, 2011 at 8:54 AM, Jim Dalton <jim.d...@gmail.com> wrote:
While I think the feature's a good idea (I've been stung by tests
I have an alternate proposal:
Instead of using the default cache defined in CACHES, use a "test
Now, unlike DB test we can't just prefix "test_" to the database name
I think this simplifies things immensely, and also gives a fair bit of
|Re: Resetting cache gracefully at the end of every test?||Jim Dalton||7/4/11 2:54 PM|
On Jul 4, 2011, at 8:01 AM, Jacob Kaplan-Moss wrote:
> On Mon, Jul 4, 2011 at 8:54 AM, Jim Dalton <jim.d...@gmail.com> wrote:
Yeah, I agree with you in principle there. I'm definitely an advocate of avoiding monkey patching like the plague in production code.
While I also agree that this code looks a bit hairy when you sort of squint your eyes at it, if you actually look closely at what's going on, we're not modifying things in any way that's tricky or complicated. Really, the only two things we are doing are adding a bit of code that that tracks keys and then swapping a few methods out. Pretty similar to what we do already with Template.render -- just x5. The monkey patching stuff is not modifying any arguments or results, nor is it changing the observable behavior of the modified methods. The tests demonstrate this pretty thoroughly.
The only place I've managed to think of where this is different from production is in the length of the key. Because we are appending "_test_" to the start of KEY_PREFIX, you're going to get 6 less characters in your key than you would otherwise. It *is* a difference, so I don't want to downplay it, but at the same time it's a difference where you'll get an error in testing that you wouldn't get in production -- which is better than the opposite.
The only other place where someone might run into trouble is if they themselves were monkey patching the cache and their changes collided somehow. I understand that this is the kind of thing that happens in Ruby a lot, and it strikes me as unpleasant.
Anyhow, that's not to say that there isn't a better solution or that your proposal below isn't preferable (more on that in a sec), but I'd argue that my patch is at least a plausible solution, i.e. it works in lieu of a better alternative.
> I have an alternate proposal:
* Not sure how to handle multiple cache backends. I imagine if you're someone who has implemented multiple cache backends, then you are using them in specific ways for specific reasons. If your tests end up sharing a test backend, that could result in unpredictable behavior. One possible resolution would be to amend your proposal a bit: For each cache backend defined, we swap it out with the locmem backend unless you have yourself defined a test_CACHE_NAME version of that cache. So e.g. fi I have three backends named "default", "file" and "db", I then need to define test_default, test_file and test_db if I want to specify how the test framework is going to set up the test cache. Otherwise, it'll fall back to replacing it with a locmem cache.
Those were my major concerns, though I'm not sure any of them are fatal at least from my perspective.
At present our choices boil down to:
1. Monkey patch cache, keep track of set keys, flush set keys at the end of each test.
Speaking personally, any of 1-3 are better than #4. I also wonder if #3 is worth considering. It's certainly the easiest, and part of me wonders if objections to it are purely hypothetical.
Anyhow, thanks for giving me your feedback Jacob and obviously thanks for your alternative proposal. I'd be happy to pitch in and work on that idea if that's what you guys all lean toward. In the meantime I'd be interested in other opinions as well.