Some gripes with uTest


Ben Hutchison

Mar 22, 2014, 8:24:59 PM
to scal...@googlegroups.com
I'm experimenting with uTest for testing, because the ability to run tests on both the JVM and JS, automatically, is IMO a "killer feature".

I'm going to raise some aspects of uTest I don't (currently) like, or perhaps don't understand, here, because I think many uTest users are probably Scala.js users, and I thought I'd ask about my issues first rather than raise GitHub issues right away.

I have 3 main gripes, which I will describe with reference to this uTest output:

> root/test
[info] Compiling 1 Scala source to C:\Users\ben\workspace\prickle\target\scala-2.10\test-classes...
[info] 1/5     prickle.PickleTests.caseclass.encoding           Failure(utest.AssertionError: assert(expected...
[info] 2/5     prickle.PickleTests.caseclass.unpickling         Success
[info] 3/5     prickle.PickleTests.caseclass.toleratesextradata         Success
[info] 4/5     prickle.PickleTests.caseclass            Success
[info] 5/5     prickle.PickleTests.             Success
[info] -----------------------------------Results-----------------------------------
[info] PickleTests$             Success
[info]     caseclass            Success
[info]         encoding         Failure(utest.AssertionError: assert(expected...
[info]         unpickling               Success
[info]         toleratesextradata               Success
[info] Tests: 5
[info] Passed: 4
[info] Failed: 1


1. Missing/truncated output on assertion fails

uTest emits 14 lines of output, but the info that is key to understanding the failure is missing. Surely, *detailing failed assertions* is at the heart of what a test runner should do?

To see the full fail message, do I have to type this awkward command 'test-only -- --trace=true', and then get a *huge* unwanted stack trace?


2. Cannot create test names with whitespace. Why??

For some reason, uTest won't allow me to use descriptive names for my tests, such as 'case class with self-reference'.
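
For context, my suites are structured roughly like this (a simplified sketch, not my actual code, with the uTest API written from memory):

    import utest._

    object PickleTests extends TestSuite {
      val tests = TestSuite {
        "caseclass" - {
          "encoding" - {
            // assert(...) against the pickled output
          }
          // What I'd *like* to be able to write, but can't:
          // "case class with self-reference" - { ... }
        }
      }
    }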


3. Success output for grouping structures that don't test anything 

In the output above, 4 of the 14 lines (copied below) are for non-test grouping structures that contain no assertions. And given that they group a failed leaf node, why report 'Success'?

[info] 4/5     prickle.PickleTests.caseclass            Success
[info] 5/5     prickle.PickleTests.             Success
[info] PickleTests$             Success
[info]     caseclass            Success


Am I missing something about how uTest is supposed to be used?

-Ben

Haoyi Li

Mar 22, 2014, 8:48:06 PM
to Ben Hutchison, scal...@googlegroups.com
No, those are pretty annoying problems. 

1. is a difficult choice; SBT also often provides you with no stack traces when things blow up, asking you to type in an arcane "last blah blah" command to see it. What would your expected behavior be? Trying to find the ideal balance between "doesn't tell me anything" and "floods my screen with spam" is hard, but we can probably do better than what we're doing now.

The test-only -- rubbish is annoying, but I blame SBT for not passing args to the test runner if you only use test =P.

2. Mainly because the mechanism for running tests from the command line relies on the default input parser, which tokenizes stuff on whitespace, and I couldn't figure out a good way of making it work with tests with spaces in their names. Any ideas?

3. The main reason they're there is that grouping structures can contain common setup code, rather than awkwardly chucking it in some helper somewhere else. If the shared setup code blows up, the tests don't get run, but that's not to say the tests failed; it's the grouping structure that failed. And I want to see the stack trace and all that just the same. So the way it's set up, it treats grouping structures as tests themselves, with successes and failures.
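
To make that concrete, I mean something roughly like this (a sketch with made-up names):

    import utest._

    object GroupingExample extends TestSuite {
      val tests = TestSuite {
        "caseclass" - {
          // Shared setup: if this line throws, neither leaf test below runs,
          // and the failure is reported against the "caseclass" group itself.
          val fixture = Map("name" -> "Ben")

          "encoding" - {
            assert(fixture("name") == "Ben")
          }
          "unpickling" - {
            assert(fixture.size == 1)
          }
        }
      }
    }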

One possibility would be not to show success messages, by default, for tests with nested tests (i.e. the "grouping structures" you described), but still show failures.

What do you think? I have felt all these pain points too.



Ben Hutchison

Mar 23, 2014, 7:40:27 AM
to Haoyi Li, scal...@googlegroups.com
On Sun, Mar 23, 2014 at 11:48 AM, Haoyi Li <haoy...@gmail.com> wrote:
> 1. is a difficult choice; SBT also often provides you with no stack traces when things blow up, asking you to type in an arcane "last blah blah" command to see it. What would your expected behavior be? Trying to find the ideal balance between "doesn't tell me anything" and "floods my screen with spam" is hard, but we can probably do better than what we're doing now.

There are at least 2 use cases, I guess... 

For me, the most common one is that I'm incrementally working on 1-2 tests at a time. For the few that are red, I want rich failure data; for the rest that are green, I don't care, a total Passed count would be enough.

But then there are exploratory/summary runs, where I'm running the tests in a less-familiar codebase, and I want to enumerate all the tests that run, red or green. (Or perhaps I want to impress my boss by showing them lots of green tests running over the code :P)

Right now, I feel uTest's output behaviour supports the 2nd, exploratory/summary usage well. But it's not such a good fit for the first case, where I know what tests exist and mainly want to know if/when they are not green.

Personally, I'd like to see the defaults bias the other way, or else some project-level mode settings (sketched below) for:
(a) Show detailed info for Failures?
(b) Hide details for Success?
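
Something like this in build.sbt might already get part-way there, by making the framework arguments a project default rather than something typed per run. Untested, and I'm only guessing that uTest's runner picks up args passed this way:

    // build.sbt: pass the same args as 'test-only --' on every plain 'test' run (sketch)
    testOptions in Test += Tests.Argument("--trace=true")

Though of course --trace=true gives the huge stack traces, rather than the concise assert details I'd really want.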


> The test-only -- rubbish is annoying, but I blame SBT for not passing args to the test runner if you only use test =P.

> 2. Mainly because the mechanism for running tests from the command line relies on the default input parser, which tokenizes stuff on whitespace, and I couldn't figure out a good way of making it work with tests with spaces in their names. Any ideas?

OK, got it, an SBT problem at heart... 

After some googling and hunting through SBT issues, it looks like Eugene Yokota actually addressed the above limitation, although it doesn't seem to be documented: https://github.com/sbt/sbt/pull/396

It allows, e.g.: test-only -- """mytests.MyTestSuite.Test that Zebras only eat green grass"""

> 3. The main reason they're there is that grouping structures can contain common setup code, rather than awkwardly chucking it in some helper somewhere else. If the shared setup code blows up, the tests don't get run, but that's not to say the tests failed; it's the grouping structure that failed. And I want to see the stack trace and all that just the same. So the way it's set up, it treats grouping structures as tests themselves, with successes and failures.

> One possibility would be not to show success messages, by default, for tests with nested tests (i.e. the "grouping structures" you described), but still show failures.

OK, the reasoning makes sense: if setup code fails, I do want to know. But as you suggest, if it doesn't fail, I'd personally like it quiet by default.

-Ben

 

Haoyi Li

Mar 24, 2014, 12:57:54 AM
to Ben Hutchison, scal...@googlegroups.com
Here's another thing to think about: currently, since the grouping constructs are considered tests, they are counted in the "N passed, M failed" summary. I personally think this is kind of weird, but have not come up with a better way of handling it. Do you have any ideas for how we could set this up that would play nicely with the "don't show the Success for the grouping constructs if they don't fail" thing? Currently each Success or Failure adds to that count, which is really simple to understand, even if not quite accurate, since the grouping constructs aren't really tests.

Once we start doing clever things like hiding Successes we don't care about but showing failures, that brings us into unfamiliar territory where the count of Successes + Failures changes depending on how many grouping constructs failed, or #Passed is not the same as #Successes, and other weirdness. There may be an elegant solution, but I haven't managed to come up with it.
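
(For instance, in your output above, "Tests: 5 / Passed: 4" already counts the "caseclass" group and the PickleTests root as two of the five entries, even though only three of them are leaf tests containing assertions.)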

Ben Hutchison

Mar 25, 2014, 10:49:16 PM
to Haoyi Li, scal...@googlegroups.com
On Mon, Mar 24, 2014 at 3:57 PM, Haoyi Li <haoy...@gmail.com> wrote:
> Here's another thing to think about: currently, since the grouping constructs are considered tests, they are counted in the "N passed, M failed" summary. I personally think this is kind of weird, but have not come up with a better way of handling it. Do you have any ideas for how we could set this up that would play nicely with the "don't show the Success for the grouping constructs if they don't fail" thing? Currently each Success or Failure adds to that count, which is really simple to understand, even if not quite accurate, since the grouping constructs aren't really tests.

> Once we start doing clever things like hiding Successes we don't care about but showing failures, that brings us into unfamiliar territory where the count of Successes + Failures changes depending on how many grouping constructs failed, or #Passed is not the same as #Successes, and other weirdness. There may be an elegant solution, but I haven't managed to come up with it.

Various thoughts:

Fails are either (a) assertion failures, or (b) code throwing an exception, right? It'd be nice if they were easy to tell apart. And for non-test grouping nodes (i.e. without asserts), the cause of a fail must be an exception.
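
For concreteness, the two kinds would be something like this inside a suite (sketch, made-up names):

    val tests = TestSuite {
      "encoding" - {
        assert(1 == 2)               // (a) a failed test assertion
      }
      "blowsUp" - {
        throw new Exception("boom")  // (b) code throwing an exception
      }
    }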

Personally, the most useful numbers are #FailedTestAssertions (a) and #ThrownExceptions (b). "Nice to have" is the number of blocks that ran OK (i.e. #Success), and perhaps the proportion of blocks that passed, mainly to sanity-check that you are running the tests you think you are.

If there were a project setting to show me full assertion or thrown-exception failure info for any failures, but hide all success details, giving some summary count of how many things in total were run, I'd probably choose that myself.

-Ben