Upcoming Changes to LMS unit tests

Calen Pennington

May 19, 2016, 10:27:32 AM
to edx-...@googlegroups.com, Engineering - All (Employees, Contractors)
Hi, all,

I'm just finishing up the few changes to enable concurrent unit testing for the LMS unit tests in edx-platform. This will be turned on by default, and will let the tests run across all of the available cores on your machine, which should speed them up noticeably.

Nothing is for free, though, so there are some caveats:

1) Concurrency introduces some amount of indeterminacy in the order of the tests. So, test classes may run in different orders than you've seen them before, and might run in different orders between runs of the test suite. This is likely not a problem, unless you've accidentally introduced a dependency between your test classes. (Fixing those historical dependencies is actually what I've spent much of this project working on.)
2) Because we're already getting some re-ordering of tests, I installed and enabled the 'randomly' plugin for nose, which randomizes the order of all of the tests. Again, this isn't a problem unless test classes or test methods have ordering dependencies.

Both of these changes should help root out inter-test dependencies. When tests depend on each other, it's hard to distinguish actual failures from failures that happen purely because earlier tests haven't run, which slows down development. Ensuring that our tests can run in any order will help prevent that.

The new features also add a few options to paver: '--no-randomize' will turn off randomization. '--processes=N' will tell the system to only run N processes (useful if you don't want the tests using all of your CPUs). '--extra_args="--randomly-seed=N"' can be useful to reproduce a particular ordering of tests, in case you are seeing what appears to be an order-dependent failure. You can find the seed used during a run in the test output (when enabled, randomly adds the line 'Using --randomly-seed=XXXXX').
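
For illustration, a run that uses four processes and replays a particular seed might look something like this (the "paver test_system -s lms" entry point and the seed value here are just assumptions for the example):

    paver test_system -s lms --processes=4 --extra_args="--randomly-seed=123456"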

If you do see a failure that seems intermittent, please let me know. As I've been working on this, I've seen a couple myself, but they are often hard to reproduce and fix.

As I've been making these changes, here are a number of the failure modes I've had to deal with:

1) Small-value changes to the number of queries being measured by assertNumQueries: this is usually caused by cache dependencies, particularly of things like LanguagePreferences or EmbargoStatus. Usually, these numbers can just be adjusted.
2) Suddenly-missing data: This is often related to a cache that used to be enabled by default now being disabled. The CacheIsolationTestCase (which is a base class of ModulestoreTestCase) allows you to specifically enable caches for certain tests.
3) Unexpected data: Often this is indicative of a resource being shared between concurrently running tests. In my changes, I've isolated the modulestore, the sqlite database used by django, and specific instances of the filesystem (by separating directories). Typically, the solution involves putting a unique identifier in the name of the resource to prevent collisions between test runs. (A short sketch of items 2 and 3 follows this list.)
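
To make items 2 and 3 concrete, here is a minimal sketch. The CacheIsolationTestCase import path is my assumption, and the directory name is purely illustrative:

    import os
    import shutil
    import tempfile
    import uuid

    from openedx.core.djangolib.testing.utils import CacheIsolationTestCase  # path assumed

    class MyCachedViewTest(CacheIsolationTestCase):
        # Item 2: re-enable the default cache for this test class only.
        ENABLED_CACHES = ['default']

        def setUp(self):
            super(MyCachedViewTest, self).setUp()
            # Item 3: put a unique identifier in the resource name so
            # concurrently running tests don't collide on the filesystem.
            self.export_dir = os.path.join(
                tempfile.gettempdir(), 'export_{}'.format(uuid.uuid4().hex))
            os.makedirs(self.export_dir)
            self.addCleanup(shutil.rmtree, self.export_dir)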

Thanks
-Cale

p.s. If you want to follow the progress of the pull request, it's https://github.com/edx/edx-platform/pull/12386

Calen Pennington

May 19, 2016, 11:56:07 AM
to edx-...@googlegroups.com, Engineering - All (Employees, Contractors)
I've gotten a few questions about how much this actually speeds up the tests. I don't have rigorous numbers, but I've seen this shave ~2-4 minutes off of the actual testing part of the LMS shards on the Jenkins workers (from about 13-18 minutes down to about 10-11 minutes). I also ran it on a much larger box (with 8 cores, instead of the 2 on our typical Jenkins worker), and I was able to run all of the LMS tests in under 10 minutes on a single box.

-Cale

Calen Pennington

May 20, 2016, 9:30:51 AM
to edx-...@googlegroups.com, Engineering - All (Employees, Contractors)
I merged the changes yesterday afternoon, and sadly they started making the master build fail with a bunch of different intermittent failures. So, I turned off randomization and concurrency, and will be making separate PRs to debug each of them independently. You can still enable them yourselves using the --randomize and --processes=-1 flags (but be warned, there may be odd failures).

-Cale

Calen Pennington

Jun 9, 2016, 10:44:59 AM
to edx-...@googlegroups.com, Engineering - All (Employees, Contractors)
At long last, I'm prepared to re-merge test concurrency and randomization (for the LMS tests). I've worked through a largish number of different ways that tests can fail because of different orderings, and I thought I'd share them here, along with the debugging techniques that I've used to find them.

So, first, the list of types of failures:

Problem: Language changes leaking out of tests. The LocaleMiddleware doesn't reset the django translator at the end of a request. This is fine under normal operation, because the next request will set it to the user's language, but in tests it can be a problem if the next test after a language-change test doesn't go through the LocaleMiddleware.

Solution: Call django.utils.translation.deactivate as a test cleanup action.
Example: https://github.com/edx/edx-platform/pull/12693
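
A minimal sketch of that cleanup (the test class and language here are just illustrative):

    from django.test import TestCase
    from django.utils import translation

    class MyLanguageTest(TestCase):
        def setUp(self):
            super(MyLanguageTest, self).setUp()
            # Reset the active translation even if this test switches languages
            # and then fails partway through.
            self.addCleanup(translation.deactivate)

        def test_spanish_page(self):
            translation.activate('es')
            self.assertEqual(translation.get_language(), 'es')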

Problem: Memory state can change the order of dicts and sets. So, if you use a set for uniqueness, but then convert the results to a list (https://github.com/edx/edx-platform/blob/master/lms/djangoapps/ccx/api/v0/views.py#L168), the order of the data on the way out may change.
Solution: Use assertItemsEqual when comparing lists made from sets. (This was a subtle one).
Example: https://github.com/edx/edx-platform/pull/12674
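
For example (the data is made up; the point is asserting on contents rather than order):

    from unittest import TestCase

    class UniqueEmailTest(TestCase):
        def test_unique_emails(self):
            emails = list({'a@example.com', 'b@example.com', 'a@example.com'})
            # The order of a list built from a set can vary between runs,
            # so compare contents instead of relying on a specific ordering.
            self.assertItemsEqual(emails, ['a@example.com', 'b@example.com'])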

Problem: Monkey-patches aren't cleaned up after tests. If you just use Class.method = mock_method, then that patch isn't reset when your test is over, and could break other tests that attempt to use that same method.
Solution: Use the patch function from the mock library.
Example: https://github.com/edx/edx-platform/pull/12673
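
A sketch of the difference (SomeService is a stand-in class defined just for the example):

    from unittest import TestCase
    from mock import patch

    class SomeService(object):
        def fetch(self):
            return 0

    class PatchingTest(TestCase):
        # Assigning SomeService.fetch = fake_fetch directly would leak into every
        # later test; mock.patch undoes itself when the test method returns.
        @patch.object(SomeService, 'fetch', return_value=42)
        def test_fetch(self, mock_fetch):
            self.assertEqual(SomeService().fetch(), 42)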

Problem: Course content is wrong in the middle of a test. This can happen if a test inside a SharedModulestoreTestCase makes modifications to course content. Those modifications aren't reset at the end of the test, so future tests end up using the modified course (without expecting to).
Solution: Use a separate ModulestoreTestCase for the tests that modify the course.
Example: https://github.com/edx/edx-platform/pull/12672
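
Roughly, the fix looks like this (the class and factory names follow the edx-platform test utilities as I remember them, so treat the imports as assumptions):

    from xmodule.modulestore.tests.django_utils import ModuleStoreTestCase
    from xmodule.modulestore.tests.factories import CourseFactory, ItemFactory

    class CourseEditingTest(ModuleStoreTestCase):
        """Tests that mutate course content build their own course per test."""

        def test_add_chapter(self):
            course = CourseFactory.create()
            # Safe to modify: this course isn't shared with any other test.
            ItemFactory.create(parent=course, category='chapter')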

Problem: Django reverse doesn't find the urls that a test expects. The django url configuration is set at import time. Tests which modify settings to change urls.py (directly or indirectly) won't see those changes unless they reset the url configuration.
Solution: Use the UrlResetMixin class to reset the urls.py file so that it is re-imported before and after tests that change it.
Example: https://github.com/edx/edx-platform/pull/12670
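
A sketch of the mixin in use (the UrlResetMixin import path, the setting, and the URL name are all assumptions for the example):

    from django.core.urlresolvers import reverse
    from django.test import TestCase
    from django.test.utils import override_settings

    from util.testing import UrlResetMixin  # import path assumed

    @override_settings(MY_FEATURE_ENABLED=True)  # hypothetical setting checked in urls.py
    class FeatureUrlTest(UrlResetMixin, TestCase):
        def test_reverse(self):
            # urls.py is evaluated at import time; the mixin re-imports it before
            # (and after) this test so the overridden setting is actually visible.
            self.assertTrue(reverse('my_feature_view'))  # hypothetical URL name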

Problem: Test configuration data in global dictionaries is changed by specific tests. If those tests don't use a copy of the dictionary or reset their changes at the end of the test, other tests using the same dictionary will be affected. This can happen even when the dictionary is passed to the override_settings decorator.
Solution: Use a copy of the global dictionary when overriding settings values or passing it into a request.
Examples: https://github.com/edx/edx-platform/pull/12668, https://github.com/edx/edx-platform/pull/12626
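
A minimal sketch, assuming a FEATURES-style dict in settings (the flag name is made up):

    import copy

    from django.conf import settings
    from django.test import TestCase
    from django.test.utils import override_settings

    # Copy the shared dict so the mutation below can't leak into other tests.
    FEATURES_WITH_MY_FLAG = copy.deepcopy(settings.FEATURES)
    FEATURES_WITH_MY_FLAG['ENABLE_MY_FLAG'] = True  # hypothetical flag

    @override_settings(FEATURES=FEATURES_WITH_MY_FLAG)
    class MyFlagTest(TestCase):
        def test_flag_is_set(self):
            self.assertTrue(settings.FEATURES['ENABLE_MY_FLAG'])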

Problem: Uploaded files are left around after tests. When django saves an uploaded file, if the filename already exists, the new file is given a random suffix. If your test relies on the filename, then it may be broken unexpectedly by the suffix.
Solution: Delete any uploaded files at the end of the test.
Example: https://github.com/edx/edx-platform/pull/12629
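
A sketch of that cleanup (the storage path and file contents are illustrative):

    from django.core.files.base import ContentFile
    from django.core.files.storage import default_storage
    from django.test import TestCase

    class UploadTest(TestCase):
        def test_upload(self):
            name = default_storage.save('avatars/profile.png', ContentFile(b'fake png'))
            # Delete the file at the end of the test so the next test (or the next
            # run) gets the original name back instead of a random suffix.
            self.addCleanup(default_storage.delete, name)
            self.assertTrue(default_storage.exists(name))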

Problem: Manually executing middleware process_request methods may change global state that should be cleaned up by running the corresponding process_response method.
Solution: Use the django TestClient to test views, rather than calling the view directly (because the TestClient runs all enabled middleware).
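
For example ('/dashboard' is just an illustrative path):

    from django.test import TestCase

    class DashboardTest(TestCase):
        def test_dashboard(self):
            # Going through the test client runs every enabled middleware's
            # process_request *and* process_response, so no global state is
            # left behind for the next test.
            response = self.client.get('/dashboard')
            self.assertEqual(response.status_code, 200)
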
Debugging Tips:
When the python unit tests fail unexpectedly and you think it might be an ordering problem, try the following. First, attempt to reproduce locally by running the same shard with the same randomized seed. This means using the --attrs=... argument from the jenkins build to specify the subset of tests to run, and then using the --randomly-seed=... argument that is logged at the start of the test run (also in the jenkins build log). Pass those values in the --extra_args="..." parameter to paver when starting your tests. If you see the same failure, it's likely an ordering-based failure. At that point, you can attempt to reproduce it using a smaller test case. I was typically able to find the ordering issue by running the test that failed along with the test that ran immediately before it (although occasionally it was a test that ran significantly ahead of the failing test that left the suite in a bad state).
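
Putting that together, a local reproduction run might look something like this (the paver entry point, attribute expression, and seed are placeholders; use the values from the jenkins build log):

    paver test_system -s lms --extra_args="--attrs='shard_1' --randomly-seed=123456"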


A final note: One problem that I haven't been able to solve is an intermittent segmentation fault in sqlite. It appears to be the same segfault mentioned in issue #24080 on the django tracker, and unfortunately appears to be related simply to a large number of queries getting sqlite into a bad state. It doesn't appear to be something that we can fix purely on the django side, and upgrading sqlite is fraught with difficulty because of the number of packages that depend on it. If you hit that failure, please just re-run your unit tests (via "jenkins run python" in a PR comment), and hopefully you'll get a better ordering the next time around. We'll be monitoring the frequency of these segfaults, and if they get too disruptive, we'll put more weight behind resolving the issue.


Thanks
-Cale

Calen Pennington

Jun 14, 2016, 8:48:23 AM
to edx-...@googlegroups.com, Engineering - All (Employees, Contractors)
The saga of concurrent tests continues. We had to roll back "concurrent-by-default" yesterday for two reasons: 1) it lost track of coverage numbers, so our coverage report dropped to ~60% (I'm going to work on making sure that level of drop fails the build in the future), and 2) it made local debugging more difficult, by preventing breakpoint activation and by preventing you from running a single test method.

To mitigate those, once I fix the coverage issue, I'm going to turn concurrent testing on in Jenkins, but leave it disabled by default locally. If you want to have faster tests, you'll need to explicitly use the -p/--processes argument to paver/nosetests when you run the test suite. (Remember, --processes=-1 uses all available cores).
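
For example (the paver entry point here is an assumption; -1 means one worker per core):

    paver test_system -s lms --processes=-1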

Thanks for your patience on this.

-Cale