Trying to speed up some tests at work. We have a bunch of test files like this:
    t/foo/bar/baz.yml
    t/foo/bar/baz.t
So each .yml file has a corresponding .t file. However, each .t file is (more or less) identical. It looks like I can get the test time down from 10 minutes to about three by using one .t file which looks like this:
    use Test::More 'no_plan';

    foreach my $test ( find_tests() ) {
        runtests($test);
    }
That avoids the overhead of reloading perl and the modules multiple times. However, each .yml file defines its own test count and I don't want 'no_plan'. What I really want to do is this:
    use Test::More 'deferred_plan';

    my $plan = 0;
    foreach my $test ( find_tests() ) {
        $plan += runtest($test);    # returns count
    }
    plan $plan;
And if my plan doesn't match tests run, I get an error.
I could get around this by loading all of the YAML files and checking their count, but then I'd have to load them *again* when I run the tests and that defeats the purpose of speeding up the test suite.
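For what it's worth, here is a minimal sketch of the same idea using done_testing(), which Test::More releases from 0.88 onward provide and which accepts an expected count. The %suite table and the bodies of find_tests() and runtest() are hypothetical stand-ins for the real YAML-driven code, only there so the example runs on its own.

```perl
use strict;
use warnings;
use Test::More;    # no plan declared up front

# Hypothetical stand-in for the .yml files: file name => declared test count.
my %suite = (
    't/foo/bar/baz.yml' => 2,
    't/foo/bar/qux.yml' => 3,
);

sub find_tests { return sort keys %suite }

# Runs the tests for one file and returns the count that file declares.
sub runtest {
    my ($file) = @_;
    pass("$file: check $_") for 1 .. $suite{$file};
    return $suite{$file};
}

my $plan = 0;
$plan += runtest($_) for find_tests();

# Emits the trailing plan and fails the run if the number of tests
# actually executed differs from $plan.
done_testing($plan);
```

If a file runs fewer tests than it declares, the totals disagree and done_testing() reports a failure, which is the error asked for above.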
> That avoids the overhead of reloading perl and the modules multiple
> times. However, each .yml file defines its own test count and I
> don't want 'no_plan'. What I really want to do is this:
>
>     use Test::More 'deferred_plan'
>     my $plan = 0;
>     foreach my $test ( find_tests() ) {
>         $plan += runtest($test); # returns count
>     }
>     plan $plan;
[snip]
For this particular case I would just do:
    use Test::More 'no_plan';

    my $builder = Test::More->builder;
    foreach my $test ( find_tests() ) {
        my $initial_count      = $builder->current_test;
        my $expected_num_tests = runtest($test);
        is $builder->current_test, $initial_count + $expected_num_tests,
            "expected $expected_num_tests tests in $test";
    }
> I could get around this by loading all of the YAML files and
> checking their count, but then I'd have to load them *again* when I
> run the tests and that defeats the purpose of speeding up the test
> suite.
> That avoids the overhead of reloading perl and the modules multiple
> times. However, each .yml file defines its own test count and I
> don't want 'no_plan'.
So, you write a custom 'exec' program which starts a daemon the first time it is run (e.g. after checking a lockfile) and then passes the daemon the name of the test file over a socket or FIFO. The daemon has preloaded all of the modules and forks off a new process to run each test. The harness sees multiple tests, so the plans are fine.
Now you can get it down to 1.5min if you have a second CPU.
Some sort of preload support in TAP::Harness would be nice.
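A rough sketch of just the preload-then-fork core of that idea follows. It leaves out the daemon, the lockfile and the socket entirely. run_forked() is a hypothetical name, List::Util merely stands in for the expensive modules, and the two generated test files print raw TAP so the example stays self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use List::Util ();    # stands in for the expensive modules, loaded once

# Run each test file in a forked child of this already-loaded process.
# Returns a hash reference of file name => captured output.
sub run_forked {
    my (@files) = @_;
    my %output;
    for my $file (@files) {
        pipe( my $reader, my $writer ) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {    # child: inherits everything already loaded
            close $reader;
            open STDOUT, '>&', $writer or die "dup failed: $!";
            do $file;         # compile and run the test file in this process
            warn $@ if $@;
            exit 0;
        }
        close $writer;
        local $/;             # slurp the child's whole output
        $output{$file} = <$reader>;
        waitpid( $pid, 0 );
    }
    return \%output;
}

# Two throwaway test files, each with its own plan.
my $dir = tempdir( CLEANUP => 1 );
my @files;
for my $n ( 1, 2 ) {
    my $path = "$dir/t$n.t";
    open my $fh, '>', $path or die "cannot write $path: $!";
    print {$fh} qq{print "1..1\\nok 1 - from file $n\\n";\n};
    close $fh;
    push @files, $path;
}

my $results = run_forked(@files);
print $results->{$_} for @files;
```

Each child produces its own complete TAP stream with its own plan, which is why the per-file test counts survive. Test files that use Test::More would need more care here, since Test::Builder keeps per-process state.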
Also note, with SGI::FAM, a persistent daemon could conceivably have already loaded the new code before you can reach for the hotkey to start the tests.
--Eric
-- 
To a database person, every nail looks like a thumb.
--Jamie Zawinski
http://scratchcomputing.com
On Mon, 2007-11-19 at 13:06 +0000, Andy Armstrong wrote:
> On 19 Nov 2007, at 11:04, Ovid wrote:
> > I could get around this by loading all of the YAML files and
> > checking their count, but then I'd have to load them *again* when I
> > run the tests and that defeats the purpose of speeding up the test
> > suite.
It looks like the con on both of these proposals is lack of backcompat.
My idea is to encode as much of the block information in the current dialect of TAP as possible, and then add extra info in the comments that new harnesses can process (and old harnesses can ignore).
How about:
    <no current-style plan>
    # PLAN 4 BLOCKS
    # {BLOCK 1} 1..2
    ok 1 - BLOCK{1} TEST{1} - and the usual comment
    ok 2 - BLOCK{1} TEST{2}
    # {BLOCK 2} PLAN NO_PLAN
    ok 3 - BLOCK{2} TEST{1}
    # {BLOCK 2} 1..1
    # {BLOCK 3} 1..1
    ...
    # {BLOCK 4} 1..2
    <a total of 6 tests run over 4 blocks>
    1..6
This is fully-backwards compatible with current harnesses, and even provides most of the safety of the above proposals (a bit better than no_plan, since the number at the bottom of the TAP is calculated based on block declarations, not on number of tests run). Blocks can also nest, if you want.
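To make the "calculated based on block declarations" part concrete, here is a small sketch of how a new-style harness might total the block plans hidden in the comments. The comment syntax is the one proposed above, not anything an existing harness understands, and planned_total() is a made-up name.

```perl
use strict;
use warnings;

# Sum the per-block plans declared in comments of the proposed form
# "# {BLOCK n} 1..m". A later declaration for the same block replaces an
# earlier one, so a NO_PLAN block can declare its count once it is known.
sub planned_total {
    my (@lines) = @_;
    my %planned;
    for my $line (@lines) {
        next unless $line =~ /^# \{BLOCK (\d+)\} 1\.\.(\d+)$/;
        $planned{$1} = $2;
    }
    my $total = 0;
    $total += $_ for values %planned;
    return $total;
}

# Only the block-related comment lines from the example stream.
my @tap = (
    '# PLAN 4 BLOCKS',
    '# {BLOCK 1} 1..2',
    '# {BLOCK 2} PLAN NO_PLAN',
    '# {BLOCK 2} 1..1',
    '# {BLOCK 3} 1..1',
    '# {BLOCK 4} 1..2',
);

my $total = planned_total(@tap);
print "1..$total\n";
```

An old harness never sees any of this, since every line involved is a comment. It only sees the ordinary trailing plan.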
One thing I might add is a symbol after the # like:
#@ this is a new-style TAP command
If the @ after the # (without a space separating them) is legal in TAP 1.0, then even Test::More::diag('@ BLOCK{1} 1..2') would still be old-style TAP as far as the new parser is concerned (since it would print "# @ BLOCK..."). Nifty.
> It looks like the con on both of these proposals is lack of backcompat.
Backcompat shouldn't be a problem now that we have T::H 3. That was one of the goals. TAP is now a versioned protocol and anything emitting a specific version of TAP needs to declare the version. Older TAP processors were supposed to ignore anything they didn't understand so none of these should really be a problem.
I guess I'm not seeing why a deferred plan is better than no plan at all. Seems to me the whole point of a plan is that you know up front how many they're gonna be.
On Monday 19 November 2007 14:11:35 Andy Lester wrote:
> I guess I'm not seeing why a deferred plan is better than no plan at
> all. Seems to me the whole point of a plan is that you know up front
> how many they're gonna be.
There's that, and there's that Ovid's tests take too long to run when you time all of the startup costs.
I'm having trouble convincing myself that the right solution to that is to wedge more stuff into TAP though.
* Andy Lester <a...@petdance.com> [2007-11-19 23:17]:
> I guess I'm not seeing why a deferred plan is better than no
> plan at all.
At a minimum, because the harness expects a plan. If you exit prematurely, it can at least detect that no plan was given, whereas if you test without a plan, it knows nothing at all.
And beyond that, you still declare intent and the harness can compare with actual behaviour. A buggy set of tests is more likely to align with the count than it might with an up-front plan, but not with complete certainty – whereas if you test without a plan, the harness, once again, knows nothing at all.
A deferred plan is clearly not as good as a predeclared plan, but is definitely much safer than no plan at all.
* Andy Lester <a...@petdance.com> [2007-11-20 00:10]:
> But what if something blows up before getting to the deferred
> plan? Then you don't know.
How could you *not* know? The TAP stream says “I’m gonna supply a plan at the end, I just don’t know how many tests I’m going to run yet.” How would the harness miss the fact that the promised plan never materialised?
On Mon, 2007-11-19 at 17:08 -0600, Andy Lester wrote:
> On Nov 19, 2007, at 5:04 PM, A. Pagaltzis wrote:
> > A deferred plan is clearly not as good as a predeclared plan,
> > but is definitely much safer than no plan at all.
>
> But what if something blows up before getting to the deferred plan?
> Then you don't know. You've bypassed having a plan.
More information is better than less information.
Consider the case where you want to run n + 10 tests. With blocks in a deferred plan, you can't be entirely sure that n is correct, but you can be sure that the other 10 tests did run. Not perfect, but better than just saying "1..63" at the end and not knowing if the "+ 10" is included in that 63.
Secondly, perhaps it's possible to refactor the test to turn an entire "block" of TAP into a single test. Compare "files_are_valid(@FILES)" to "file_is_valid($_) for @FILES". Same effect, but with the first one you can declare the plan in advance. (OK, bad example because you know how many elements are in @FILES. But the concept still applies.)
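A sketch of that refactoring, under the assumption that validity is something you can check per file: file_is_valid() below is a hypothetical check (the name just has to end in .yml), and files_are_valid() folds the whole list into one test result so the plan can be declared up front.

```perl
use strict;
use warnings;
use Test::More tests => 1;    # fixed plan, however many files there are

# Hypothetical validity check: a file name is valid if it ends in .yml.
sub file_is_valid {
    my ($file) = @_;
    return $file =~ /\.yml\z/ ? 1 : 0;
}

# One test result for the whole list; each bad file is reported via diag().
sub files_are_valid {
    my (@files) = @_;
    my @bad = grep { !file_is_valid($_) } @files;
    diag("invalid: $_") for @bad;
    return ok( !@bad, 'all files are valid' );
}

my @FILES = ( 't/foo/bar/baz.yml', 't/foo/bar/qux.yml' );
files_are_valid(@FILES);
```

The cost is granularity: the harness sees one pass or fail for the block, and the per-file detail only shows up in the diagnostics.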
----- Original Message ----
> From: Andy Lester <a...@petdance.com>
>
> I guess I'm not seeing why a deferred plan is better than no plan at
> all. Seems to me the whole point of a plan is that you know up front
> how many they're gonna be.
I've not explained myself well. Sorry about that.
The reason you have to know up front is because that's how the entire system was designed. We currently have two cases when we should have at least three:
1. We know the test count up front, in which case we declare a leading plan.
2. We don't know the test count up front, in which case Test::Builder supplies a trailing plan which merely tells me how many tests I've run.
This misses the obvious case where I don't know the expected count up front, but I do know the expected count at the end (not guaranteed to be the same as the actual count). It should be trivial to fix Test::Builder and co., so that the programmer supplies the trailing plan instead of Test::Builder.
That being said, I like Adrian's code for this and I'll be stealing it.
----- Original Message ----
> From: chromatic <chroma...@wgz.org>
>
> There's that, and there's that Ovid's tests take too long to run
> when you time all of the startup costs.
The runtime of the tests is completely orthogonal to this problem.
> I'm having trouble convincing myself that the right solution to that
> is to wedge more stuff into TAP though.
There's nothing else being wedged into TAP. This is about the programmer supplying the trailing plan instead of Test::Builder.
> It looks like the con on both of these proposals is lack of backcompat.
No. They're actually both completely backwards compatible. Consider test groups:

    1..3
    ok 1
    1..2 2 a block
    1..3 2.1 another block
    ok 2.1.1
    ok 2.1.2
    ok 2.1.3
    ok 2.1 # end of another block
    ok 2.2
    ok 2 # end of a block
    1..3 3 a third block
    ok 3.1
    ok 3.2
    not ok 3 # end of a third block, planned for 3 but only ran 2 tests

Since older TAP parsers are required to ignore lines they don't recognize, here's what the parser should see:

    1..3
    ok 1
    ok 2 # end of a block
    not ok 3 # end of a third block, planned for 3 but only ran 2 tests

And with test blocks (the version on the Wiki is different and incorrect. I've fixed it below, but not yet on the wiki):

    TAP version 14
    1..4
    ok 1 - testing
    begin 1 Object creation
      1..2
      ok 1 Object created OK
      ok 2 Object isa Flunge::Twizzler
    end 1 Object creation
    ok 2 Clone OK
    begin 3 Methods
      1..4
      ok 1 has twizzle method
      ok 2 has burnish method
      ok 3 has spangle method
      not ok 4 has frob method
    end 3 Methods
    ok 3 another test
    ok 4 Resources released

Here's what an older TAP parser will see:

    1..4
    ok 1 - testing
    ok 2 Clone OK
    ok 3 another test
    ok 4 Resources released

So if your current TAP parser is correct, you shouldn't have a problem. The "breaks backwards compatibility" arguments on the wiki don't seem correct.
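To illustrate the "ignore what you don't recognize" rule, here is a sketch of a deliberately simplified old-style filter applied to the test groups stream. The two patterns are an assumption about what such a parser accepts (a bare plan, or a result line with an integer test number), not the grammar of any real harness, and old_parser_view() is a made-up name.

```perl
use strict;
use warnings;

# Keep only the lines a simplified old-style parser would understand:
# a bare plan ("1..N") or a result line ("ok N" / "not ok N") whose
# test number is a plain integer followed by a space or end of line.
sub old_parser_view {
    my (@lines) = @_;
    return grep {
        $_ =~ /^1\.\.\d+$/ || $_ =~ /^(?:not )?ok \d+(?: |$)/
    } @lines;
}

my @tap = (
    '1..3',
    'ok 1',
    '1..2 2 a block',
    '1..3 2.1 another block',
    'ok 2.1.1',
    'ok 2.1.2',
    'ok 2.1.3',
    'ok 2.1 # end of another block',
    'ok 2.2',
    'ok 2 # end of a block',
    '1..3 3 a third block',
    'ok 3.1',
    'ok 3.2',
    'not ok 3 # end of a third block, planned for 3 but only ran 2 tests',
);

my @seen = old_parser_view(@tap);
print "$_\n" for @seen;
```

Under those assumptions the filter keeps the plan and the three top-level results and drops every nested line.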
> # PLAN 4 BLOCKS
> # {BLOCK 1} 1..2
> ok 1 - BLOCK{1} TEST{1} - and the usual comment
> ok 2 - BLOCK{1} TEST{2}
> # {BLOCK 2} PLAN NO_PLAN
> ok 3 - BLOCK{2} TEST{1}
> # {BLOCK 2} 1..1
> # {BLOCK 3} 1..1
This has much of the same problem as the current 'test groups' proposal: it's ugly and hard to read. However, it seems even harder to read than test groups. TAP should be as terse as possible, and no terser, in order to unequivocally represent intent. Otherwise, why not just switch to XML?
>> I guess I'm not seeing why a deferred plan is better than no plan at
>> all. Seems to me the whole point of a plan is that you know up front
>> how many they're gonna be.
>
> There's that, and there's that Ovid's tests take too long to run
> when you time all of the startup costs.
>
> I'm having trouble convincing myself that the right solution to that
> is to wedge more stuff into TAP though.
I hope the idea of structured TAP is quite generic. It's not just about supporting this case. Is that the proposal you're sceptical about?
* Adrian Howard <adri...@quietstars.com> [2007-11-20 16:25]:
> I don't get this. Why is saying "I know this test script
> outputs 8 test results" at the start better than saying it
> at the end?
I assume that if you knew up front how many tests you are going to run, then you’d just say it.
So you’d defer the plan in cases where the number of tests is predetermined but maybe hard to precompute, or where it’s variable. So in both cases you are calculating the number at run time, which is immediately subject to more bugs than providing a constant.
Additionally, it’s more likely for a bug in the calculation to line up with a bug in the corresponding test code, so that you end up with a plan that matches the number of tests run even though you *intended* to run fewer/more tests.
And lastly, even a runtime-calculated predeclared plan separates the test code and calculation code at least in time (while running) and probably also in space (in the source code). Therefore it seems to me that bugs are somewhat less likely to line up.
So a deferred plan should be used only if you really can’t determine the number of tests ahead of time or it is *very* hard to do so.