Who defines success?

Michael G Schwern

Sep 7, 2010, 4:57:31 AM

to test-mo...@googlegroups.com

I was working on Result objects to make them more flexible and not hard-code
the directives as they are now. I was also thinking about a NoHistory object
which records an O(1) summary of the test history, but no linearly growing
details. Basic facts, like whether there have been any failures. And this
brought up a problem.
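
A minimal sketch of what such a constant-space summary could look like, written in Python purely for illustration. The class name is taken from the message above, but the fields and the `record` method are hypothetical and are not Test::Builder2's actual API.

```python
from dataclasses import dataclass


@dataclass
class NoHistory:
    """Hypothetical O(1) summary of a test run (not TB2's real API)."""

    total: int = 0
    failures: int = 0

    def record(self, is_fail: bool) -> None:
        # Only the counters change; the result itself is not stored,
        # so memory use does not grow with the number of tests.
        self.total += 1
        if is_fail:
            self.failures += 1

    @property
    def has_failures(self) -> bool:
        return self.failures > 0


history = NoHistory()
for failed in (False, True, False):
    history.record(failed)
```

Note that `record` has to be told whether a result counts as a failure, and because the details are thrown away that judgment cannot be revisited later. That is why the summary forces the question of who makes the call.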

In Test::Builder1, Test::Builder decides if an assert is to be interpreted as
a pass or failure [1], taking into account TODO and SKIP and whatever. Mostly
TODO. TB2 has Result objects, so instead of TB2 making the assessment, each
Result says if it's a pass or fail.

Simple asserts are no problem. But throw in modifiers like TODO and things
get fuzzy. A TODO fail is supposed to be treated as not being a failure [2]
and that much is pretty clear, otherwise what's the point? But there was some
argument that a TODO pass should be treated as a failure, because it's
unexpected and thus could cause a bug. I thought the proposed circumstances a
bit of a stretch, but TB2 is about flexibility.
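
To make the two readings of a TODO pass concrete, here is a rough sketch in Python. `Result`, `TodoPassPolicy`, and `is_fail` are names invented for this illustration; they are assumptions, not the real TB2 Result interface.

```python
from dataclasses import dataclass
from enum import Enum


class TodoPassPolicy(Enum):
    OK = "ok"            # an unexpected TODO pass is not a failure
    FAILURE = "failure"  # an unexpected TODO pass counts as a failure


@dataclass(frozen=True)
class Result:
    """Hypothetical result object: raw outcome plus a TODO modifier."""

    passed: bool
    todo: bool = False

    def is_fail(self, todo_pass: TodoPassPolicy = TodoPassPolicy.OK) -> bool:
        if self.todo:
            # A TODO fail is never a failure. A TODO pass is a failure
            # only under the stricter policy.
            return self.passed and todo_pass is TodoPassPolicy.FAILURE
        return not self.passed


unexpected_pass = Result(passed=True, todo=True)
lenient = unexpected_pass.is_fail()
strict = unexpected_pass.is_fail(TodoPassPolicy.FAILURE)
```

The same result gives `lenient == False` and `strict == True`, so whoever picks the policy is effectively deciding what success means.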

And then what about systems other than TAP? For example, the POSIX/Dejagnu
test system has a rather different set of statuses: PASS, FAIL, UNRESOLVED
(needs a human to look at the result), UNTESTED (there's a requirement here,
but no test for it), UNSUPPORTED (kinda like skip), XFAIL/XPASS (like TODO,
but the failure is due to a 3rd party bug), KFAIL/KPASS (like TODO, but the
failure is due to a bug in the software being tested).

I'm finding that the interpretation of success depends on the formatter. This
gives more responsibility to the formatter than was originally intended, as
you might guess from the name. It also raises the problem of what happens if
you switch formatters (feasible given you can replay all the historical
results through the new formatter). Or if you use a formatter that produces
two formats? It makes answering the rather simple question "are we passing?"
ambiguous and that's no good.

I'm a bit flummoxed. Thoughts?

[1] Test::Harness makes its own independent judgment.

[2] I've found it useful sometimes to think of statuses in the negative
particularly because of the new "unknown" status which is neither a pass nor fail.

--
Whip me, beat me, make my code compatible with VMS!

David E. Wheeler

Sep 7, 2010, 11:42:55 AM

to test-mo...@googlegroups.com

On Sep 7, 2010, at 1:57 AM, Michael G Schwern wrote:

> And then what about systems other than TAP? For example, the POSIX/Dejagnu
> test system has a rather different set of statuses: PASS, FAIL, UNRESOLVED
> (needs a human to look at the result), UNTESTED (there's a requirement here,
> but no test for it), UNSUPPORTED (kinda like skip), XFAIL/XPASS (like TODO,
> but the failure is due to a 3rd party bug), KFAIL/KPASS (like TODO, but the
> failure is due to a bug in the software being tested).

Interesting. I could see a use for some of these in Test::Builder[12].

> I'm finding that the interpretation of success depends on the formatter. This
> gives more responsibility to the formatter than was originally intended, as
> you might guess from the name. It also raises the problem of what happens if
> you switch formatters (feasible given you can replay all the historical
> results through the new formatter). Or if you use a formatter that produces
> two formats? It makes answering the rather simple question "are we passing?"
> ambiguous and that's no good.

I disagree. TAP has a spec with very specific meanings. You're parsing TAP. Formatters may format TAP into something else, but only to the extent that it's supported by TAP. So where there is a clear translation (SKIP => UNSUPPORTED), they should make it. But if the format isn't supported by TAP, there is no way for the tests to alert the formatter to do something different. Nor should they.

So formatters should translate the TAP syntax that translates well, and not translate something that, well, isn't in TAP.

That's not to say that it wouldn't be interesting to add something like XFAIL/XPASS to TAP. Once it's in the TAP spec and the parser is updated to add it to the result object *then* the formatter can be updated to translate it.

> I'm a bit flummoxed. Thoughts?

HTH. As for whether a passing TODO should fail, I kind of think it should, but have no strong opinion.

Best,

David

Michael G Schwern

Sep 7, 2010, 3:32:27 PM

to test-mo...@googlegroups.com

On 2010.9.7 8:42 AM, David E. Wheeler wrote:
>> I'm finding that the interpretation of success depends on the formatter. This
>> gives more responsibility to the formatter than was originally intended, as
>> you might guess from the name. It also raises the problem of what happens if
>> you switch formatters (feasible given you can replay all the historical
>> results through the new formatter). Or if you use a formatter that produces
>> two formats? It makes answering the rather simple question "are we passing?"
>> ambiguous and that's no good.
>
> I disagree. TAP has a spec with very specific meanings. You're parsing TAP.
> Formatters may format TAP into something else, but only to the extent that
> it's supported by TAP. So where there is a clear translation (SKIP =>
> UNSUPPORTED), they should make it. But if the format isn't supported by TAP,
> there is no way for the tests to alert the formatter to do something
> different. Nor should they.
>
> So formatters should translate the TAP syntax that translates well, and not
> translate something that, well, isn't in TAP.
>
> That's not to say that it wouldn't be interesting to add something like
> XFAIL/XPASS to TAP. Once it's in the TAP spec and the parser is updated to
> add it to the result object *then* the formatter can be updated to translate
> it.

By your answer, I don't think I explained myself correctly. "if the format
isn't supported by TAP" doesn't make sense because TAP is a format. Maybe I
haven't been clear that Test::Builder2 is no longer tied to TAP. TB2 can
output things other than TAP, such as POSIX.

Formats: TAP, POSIX, XML, Semaphore...
Results: Pass, fail, todo, skip, XFAIL, UNSUPPORTED, etc...

The results are normalized as much as possible so a TAP todo/fail and a POSIX
XFAIL both look the same internally allowing the result to be understood by
different formatters.
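
As a rough illustration of that normalization, the sketch below (in Python, for neutrality) maps a TAP line's ok/directive pair and a POSIX-style status name onto one internal shape. The `Result` fields, `from_tap`, `from_posix`, and the choice to treat UNSUPPORTED as a passing skip are all assumptions made for the example, not TB2's actual design.

```python
from typing import NamedTuple


class Result(NamedTuple):
    """Hypothetical format-independent result."""

    passed: bool
    expected_to_fail: bool = False
    skipped: bool = False


def from_tap(ok: bool, directive: str = "") -> Result:
    # TAP expresses modifiers as a directive after the test description.
    d = directive.upper()
    return Result(passed=ok, expected_to_fail=(d == "TODO"), skipped=(d == "SKIP"))


# Assumed mapping for a few POSIX/DejaGnu-style statuses.
POSIX_STATUSES = {
    "PASS": Result(True),
    "FAIL": Result(False),
    "XFAIL": Result(False, expected_to_fail=True),
    "XPASS": Result(True, expected_to_fail=True),
    "UNSUPPORTED": Result(True, skipped=True),
}


def from_posix(status: str) -> Result:
    return POSIX_STATUSES[status]


same = from_tap(False, "TODO") == from_posix("XFAIL")
```

Here `same` is true: a TAP todo/fail and an XFAIL compare equal once normalized, so any formatter can render either one.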

This isn't about formatting. If a formatter doesn't understand a result it
can't do much about it, yes. This is about who or what decides if a result is
a pass or a fail.

Maybe if you said what you're disagreeing with that would clear things up?

--
There will be snacks.

David E. Wheeler

Sep 7, 2010, 5:14:42 PM

to test-mo...@googlegroups.com

On Sep 7, 2010, at 12:32 PM, Michael G Schwern wrote:

> By your answer, I don't think I explained myself correctly. "if the format
> isn't supported by TAP" doesn't make sense because TAP is a format. Maybe I
> haven't been clear that Test::Builder2 is no longer tied to TAP. TB2 can
> output things other than TAP, such as POSIX.

Oh, right.

> This isn't about formatting. If a formatter doesn't understand a result it
> can't do much about it, yes. This is about who or what decides if a result is
> a pass or a fail.
>
> Maybe if you said what you're disagreeing with that would clear things up?

I don't think formatters should decide what's pass and what's fail. That should be canonical in TB2.
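
One way to picture that split, sketched in Python under stated assumptions: the result carries the canonical pass/fail judgment, and formatters only render it. All names here (`Result`, `format_tap`, `format_posix`, `is_passing`) are hypothetical, and the rule that only a non-TODO failure counts is just one possible canonical choice.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Result:
    name: str
    passed: bool
    todo: bool = False

    @property
    def is_fail(self) -> bool:
        # Canonical rule assumed for this sketch: only a non-TODO
        # failure counts as a failure.
        return not self.passed and not self.todo


def format_tap(r: Result) -> str:
    line = f"{'ok' if r.passed else 'not ok'} - {r.name}"
    return line + " # TODO" if r.todo else line


def format_posix(r: Result) -> str:
    if r.todo:
        status = "XPASS" if r.passed else "XFAIL"
    else:
        status = "PASS" if r.passed else "FAIL"
    return f"{status}: {r.name}"


def is_passing(history: Iterable[Result]) -> bool:
    # The verdict never consults a formatter.
    return not any(r.is_fail for r in history)


history = [Result("connects", True), Result("parses utf8", False, todo=True)]
tap_lines = [format_tap(r) for r in history]
posix_lines = [format_posix(r) for r in history]
```

Replaying the same history through either formatter changes only the text produced. `is_passing(history)` gives the same answer whichever formatter is in use, or if both are.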