Timing of tests

Christian Askeland

Jun 11, 2013, 3:04:20 PM
to catch...@googlegroups.com
Hi all,

One of my colleagues attended Phil's presentation at ACCU in Bristol earlier this year and convinced me to (consider) moving from an old ctest+cppunit framework to the simpler and cooler Catch. It looks promising so far, especially how easy it is to add new tests.

One feature I miss is timing of tests. There is a year-old issue in the tracker describing this, but I still want to ask whether there are any plans to implement it in the near future.

Catch exports JUnit files to our Jenkins server, but all the timings contain "tbd", which shows up as zero timings in the Jenkins web GUI. I use these timings to spot, at a glance, tests that are too slow and should therefore be optimized.

Here is a snippet from our command line, showing the default output from ctest (just as an example of the kind of timing output I find useful):

[shell cmd] ctest -D ExperimentalTest --no-compress-output
[shell ###]    Site: jenkins-Ubuntu-slave
[shell ###]    Build name: Linux-c++
[shell ###] Test project /home/jenkins/slave_fs/workspace/CustusX/working/CustusX3/build_Release
[shell ###]         Start   1: TestSliceComputer::testInitialize
[shell ###]   1/116 Test   #1: TestSliceComputer::testInitialize ....................................   Passed    1.36 sec
[shell ###]         Start   2: TestSliceComputer::testACS
[shell ###]   2/116 Test   #2: TestSliceComputer::testACS ...........................................   Passed    0.49 sec
[shell ###]         Start   3: TestSliceComputer::testACS_FollowTool
[shell ###]   3/116 Test   #3: TestSliceComputer::testACS_FollowTool ................................   Passed    0.55 sec
[shell ###]         Start   4: TestSliceComputer::testAnyPlanes
[shell ###]   4/116 Test   #4: TestSliceComputer::testAnyPlanes .....................................   Passed    0.52 sec
[shell ###]         Start   5: TestVisualRendering::testInitialize
[shell ###]   5/116 Test   #5: TestVisualRendering::testInitialize ..................................   Passed    0.54 sec
[shell ###]         Start   6: TestVisualRendering::testEmptyView
[shell ###]   6/116 Test   #6: TestVisualRendering::testEmptyView ...................................   Passed    1.73 sec
[shell ###]         Start   7: TestVisualRendering::test_3D_Tool
[shell ###]   7/116 Test   #7: TestVisualRendering::test_3D_Tool ....................................   Passed    3.24 sec


Please don't get upset about the amount of time spent on each test - I am well aware that this is a problem.

Best Regards

Christian

Phil Nash

Jun 11, 2013, 5:57:31 PM
to catch...@googlegroups.com
I was just commenting, over on https://github.com/philsquared/Catch/issues/167, that this is definitely still planned and coming very soon.

I've recently built an equivalent system for an alternate (commercial) project and have been able to incorporate adaptive iterations too (so it runs the code in a loop until the timing suitably exceeds the resolution of the timer).

Sorry it's taken a while to get to this - it's quite high on my list now (although my immediate priority is stabilising the public interfaces so I can move to 1.0).

Christian Askeland

Jun 12, 2013, 12:47:27 AM
to catch...@googlegroups.com
Thanks for replying, Phil.

I'm looking forward to it, and good luck with the 1.0 release!

Kosta

Jul 12, 2013, 12:13:11 PM
to catch...@googlegroups.com
On Wednesday, 12 June 2013 at 06:47:27 UTC+2, Christian Askeland wrote:
> Thanks for replying, Phil.
>
> I'm looking forward to it, and good luck with the 1.0 release!

I was just about to request the same feature; same setup here: moved from cppunit to Catch, but now I need to identify slow-running tests. An option for sorting the test output based on actual running time would also be helpful.

Kosta

Jul 13, 2013, 6:23:47 AM
to catch...@googlegroups.com
I am not quite sure I understand the adaptive iterations: is it something like running as many tests as you can within a timeout threshold of 10 seconds or so?

I would also find it useful to have an optional "timeout" command line parameter that fails any test running longer than the given value, in order to enforce the short, fast tests needed to do TDD effectively.

Phil Nash

Jul 13, 2013, 8:29:59 AM
to catch...@googlegroups.com
Hi Kosta,

The idea behind the adaptive iterations is that Catch would first execute the test once, within a timer. If the recorded duration is less than some value (which would be some multiple of the timer resolution, giving some confidence in the timing) then it executes the test 10 times and checks again - then 100 times, then 1000, etc until it takes longer than the threshold value (increasing 10x seems to do the trick, but other multiples could work too).
So eventually it has executed enough repetitions that the timed value is reasonably reliable (assuming background tasks are not significant).
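To make that concrete, here is a minimal sketch of such an adaptive-iteration loop. This is not Catch's actual implementation; the function name, the 0.1 second threshold and the use of std::clock() are assumptions made purely for illustration (std::clock() measures CPU time, so a real implementation would likely prefer a wall-clock timer).

// Sketch only: run the body 1, 10, 100, ... times until the total elapsed time
// comfortably exceeds the timer's resolution, then report the average per call.
#include <ctime>

template <typename Callable>
double timePerCall( Callable body, double minSeconds = 0.1 ) {
    for( unsigned long iterations = 1; ; iterations *= 10 ) {
        std::clock_t start = std::clock();
        for( unsigned long i = 0; i != iterations; ++i )
            body();
        double elapsed = double( std::clock() - start ) / CLOCKS_PER_SEC;
        if( elapsed >= minSeconds )         // long enough to trust the timer
            return elapsed / iterations;    // seconds per single execution
    }
}

The more repetitions accumulate, the smaller the contribution of the timer's resolution to the per-call figure, which is what makes the repeated measurement trustworthy.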

This can be useful if you want to keep an eye on the actual performance of something (as a complement to proper profiling using a dedicated tool, of course).

If you just want to be alerted to long running tests then that's obviously not what you need. You just want to time a single execution. Clearly we need the latter before we can get the former anyway :-)

As for sorting results in duration order, that's an interesting possibility. The problem there is that results are reported progressively, as the tests are running. In order to sort them we'd need to store the results as we run, then sort them before reporting. That's not out of the question (the JUnit reporter does have to store results before writing them out, due to the way it reports failure counts as attributes), but my preference is to keep everything progressively reported where possible.
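For what it's worth, the store-then-sort step itself is simple once durations have been recorded. Here is a hypothetical sketch; the TestTiming struct is invented for illustration and is not part of Catch's reporter interface:

#include <algorithm>
#include <string>
#include <vector>

struct TestTiming {
    std::string name;
    double seconds;
};

// Slowest tests first, so the worst offenders appear at the top of the report.
bool slowerFirst( TestTiming const& lhs, TestTiming const& rhs ) {
    return lhs.seconds > rhs.seconds;
}

void sortBySlowest( std::vector<TestTiming>& results ) {
    std::sort( results.begin(), results.end(), slowerFirst );
}

The real cost is buffering the results until the run finishes, not the sort itself.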

The other alternative you mentioned - setting a threshold timeout - is also interesting. I suspect it's a lot more complex than it sounds, though, as I imagine it would require running each test on a separate thread. As well as the inherent complexity there it is multiplied by doing it portably (in C++98) with no external dependencies! You may achieve a similar effect by running the whole test executable from a script that is capable of killing the process after a timeout (most CI systems have such a feature built in already).

But if you just want to be alerted that a test case is starting to take a bit too long, then I suspect the simplest way to support that in Catch would be to make the duration a success/failure criterion, i.e. if the test takes longer than so many milliseconds to run then the test fails (but is still allowed to run to completion). It does introduce non-determinism into your testing, which I don't like, but it could be useful. It could be made a "soft error", so it gets reported but doesn't fail the overall test run.

Would appreciate your thoughts.

Kosta

Jul 13, 2013, 1:48:42 PM
to catch...@googlegroups.com
Hi Phil,

> The idea behind the adaptive iterations is that Catch would first execute the test once, within a timer. If the recorded duration is less than some value (which would be some multiple of the timer resolution, giving some confidence in the timing) then it executes the test 10 times and checks again - then 100 times, then 1000, etc until it takes longer than the threshold value (increasing 10x seems to do the trick, but other multiples could work too).
> So eventually it has executed enough repetitions that the timed value is reasonably reliable (assuming background tasks are not significant).
>
> This can be useful if you want to keep an eye on the actual performance of something (as a complement to proper profiling using a dedicated tool, of course).

OK, got it. Need a bit to think about where that could be applied in our team/project...

> If you just want to be alerted to long running tests then that's obviously not what you need. You just want to time a single execution. Clearly we need the latter before we can get the former anyway :-)
>
>
> As for sorting results in duration order, that's an interesting possibility. The problem there is that results are reported progressively, as the tests are running. In order to sort them we'd need to store the results as we run, then sort them before reporting. That's not out of the question (the JUnit reporter does have to store results before writing them out, due to the way it reports failure counts as attributes), but my preference is to keep everything progressively reported where possible.

I could live with a custom reporter for this feature; I haven't looked into the (new) reporter interface yet, but maybe a custom reporter could also do the trick for the timeout thresholds (see below).

> The other alternative you mentioned - setting a threshold timeout - is also interesting. I suspect it's a lot more complex than it sounds, though, as I imagine it would require running each test on a separate thread. As well as the inherent complexity there it is multiplied by doing it portably (in C++98) with no external dependencies! You may achieve a similar effect by running the whole test executable from a script that is capable of killing the process after a timeout (most CI systems have such a feature built in already).

Threads would not do the trick here, since you cannot cancel/kill a thread without risk; you would need a separate process. But I was thinking more in the direction you described next:

> But if you just want to be alerted that a test case is starting to take a bit too long, then I suspect the simplest way to support that in Catch would be to make the duration a success/failure criterion, i.e. if the test takes longer than so many milliseconds to run then the test fails (but is still allowed to run to completion). It does introduce non-determinism into your testing, which I don't like, but it could be useful. It could be made a "soft error", so it gets reported but doesn't fail the overall test run.

This is the behavior I am striving for. My idea is to set a timeout threshold of, say, 10ms for the developers, flagging tests that are too slow during the development process (modify-compile-link-test cycle); a false failure here (e.g. due to a temporarily heavy background load) could easily be checked by rerunning the test. The threshold for the CI machines would be set to a much less restrictive value (e.g. 50ms) to avoid reporting false failures, or we would omit the timeout threshold entirely on the CI machines.

I could also imagine using the tagging system to classify the tests into fast, medium, slow, and ultra-slow categories, and using such a threshold to ensure the tests are tagged correctly.

If I needed this for only a subset of tests, I would use a RAII-style Timeout helper class with a CATCH_REQUIRE() check in the destructor, instantiated at the top of the test implementation. But doing this manually for hundreds or thousands of tests... Maybe a custom reporter could help with that?
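For illustration, a rough sketch of such a helper might look like the following. The class name ScopedTimeout is invented; CHECK is used rather than a REQUIRE variant because REQUIRE aborts the test by throwing, which must never happen inside a destructor; and std::clock() measures CPU time, so a real guard would probably want a wall-clock timer.

#include "catch.hpp"
#include <ctime>

class ScopedTimeout {
public:
    explicit ScopedTimeout( double maxMilliseconds )
    :   m_maxMilliseconds( maxMilliseconds ),
        m_start( std::clock() )   // CPU time; assumption made for a C++98-portable sketch
    {}

    ~ScopedTimeout() {
        double elapsedMs = 1000.0 * ( std::clock() - m_start ) / CLOCKS_PER_SEC;
        // CHECK records a failure without aborting the test, so the rest of
        // the test body's results are still reported.
        CHECK( elapsedMs <= m_maxMilliseconds );
    }

private:
    double m_maxMilliseconds;
    std::clock_t m_start;
};

TEST_CASE( "Example/fastEnough", "Fails (softly) if it takes too long" ) {
    ScopedTimeout guard( 10.0 ); // flag the test if it runs longer than ~10 ms
    // ... actual test body ...
}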

Or maybe we could have hooks to register for test-begin/test-end events?

Kosta

Jul 23, 2013, 5:25:06 PM
to catch...@googlegroups.com
I made a first implementation that integrates timings and outputs them via the console and JUnit reporters, in this pull request against the integration branch:

-Kosta
