Option to re-run only tests that failed last time?


Nathan Long

Nov 22, 2017, 3:43:32 PM
to elixir-lang-core
Ruby's Rspec has a handy option, `--only-failures`, which "filters what examples are run so that only those that failed the last time they ran are executed". https://relishapp.com/rspec/rspec-core/docs/command-line/only-failures

I'd love to have this feature in ExUnit. The closest thing I see right now is `--stale`, but if ExUnit can't accurately determine which tests may have been broken by a change, it doesn't work. (I have such an example, but don't want to be long-winded; maybe the utility of this feature is clear enough?)

Louis Pilfold

Nov 22, 2017, 4:54:51 PM
to elixir-l...@googlegroups.com
Hi Nathan

I feel ExUnit's `--stale` should always be able to detect this. Could you share your example, please?

Cheers,
Louis 


Nathan Long

Nov 22, 2017, 5:06:18 PM
to elixir-lang-core
Sure. I have a module called `MyApp.Mixpanel` with functions like `track_event(:user_signup, data_map)`. These are called from various places throughout the codebase. There's a production adapter, which actually sends the event data to Mixpanel for analytics purposes, a dev adapter, which just logs it, and a test adapter, which sends it to `self()` as a message.

Several of my tests say things like "if I POST the info required for a new user signup, I should get a message showing that the correct info would have been sent to Mixpanel." These use `assert_receive`.

I just changed the format of the message built within `MyApp.Mixpanel`. This caused `assert_receive` to fail in tests throughout my app, as expected. But since the tests didn't directly reference `MyApp.Mixpanel`, `--stale` didn't know which ones should be run when the message format changed; I had to run all tests to get them to fail.

This is no big deal, but it would be nice in such situations to run all tests once, then be able to whittle down the failing tests without re-running the whole suite.
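(Nathan's setup is Elixir-specific, but the swappable-adapter pattern he describes translates readily to other languages. A minimal Ruby sketch, with all names hypothetical: the test adapter records each call so tests can assert on what *would* have been sent.)

```ruby
# Hypothetical analytics facade; the active adapter is swapped per environment.
module Analytics
  class << self
    attr_accessor :adapter
  end

  # The facade every caller uses; it delegates to whichever adapter is active.
  def self.track_event(name, data)
    adapter.track_event(name, data)
  end

  # A production adapter would POST to the real service and a dev adapter
  # would log; the test adapter just records each call for later assertions.
  class TestAdapter
    attr_reader :events

    def initialize
      @events = []
    end

    def track_event(name, data)
      @events << [name, data]
    end
  end
end

# In a test: install the recording adapter, exercise the code, then
# assert on the recorded events.
Analytics.adapter = Analytics::TestAdapter.new
Analytics.track_event(:user_signup, email: "a@example.com")
```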

José Valim

Nov 22, 2017, 5:48:14 PM
to elixir-l...@googlegroups.com
To clarify, --stale does not run previously failed tests.

> I just changed the format of the message built within `MyApp.Mixpanel`. This caused `assert_receive` to fail in tests throughout my app, as expected. But since the tests didn't directly reference `MyApp.Mixpanel`, `--stale` didn't know which ones should be run when the message format changed; I had to run all tests to get them to fail.

That feels like a bug. Maybe we are being conservative in how we compute the dependencies. If you can provide a sample app that reproduces the error, I would love to take a look at it.



José Valim
Founder and Director of R&D


Myron Marston

Nov 23, 2017, 3:03:14 AM
to elixir-lang-core
I too would love to see ExUnit support an `--only-failures` flag.  It's one of my favorite features of RSpec and I wish every test framework had it.  I find that it makes a huge difference to my workflow to be able to quickly and easily filter to the tests that failed the last time they ran.

In fact, I love this feature of RSpec so much that I was the one who added it to the framework a couple years back :).  I'd be happy to help see it get added to ExUnit if José and others were amenable.  ExUnit already has most of the building blocks needed for it via tags and filtering.

Myron

José Valim

Nov 23, 2017, 7:03:41 AM
to elixir-l...@googlegroups.com
Thanks everyone!

I believe this would be a good addition. My only question is: where are the failed tests stored? In _build? Also, maybe we can implement it as a special tag called "--only failed" or "--only failures"?




José Valim
Founder and Director of R&D


Myron Marston

Nov 23, 2017, 12:28:07 PM
to elixir-lang-core

> I believe this would be a good addition. My only question is where are the failed tests stored? In _build?

For RSpec we made users configure where this state is stored, via a `config.example_status_persistence_file_path` option. RSpec didn't have an established place to write that state, so we left it up to the user to decide where they wanted it to go. I think for ExUnit, storing it in _build makes sense.

However, note that we are not merely storing a list of failed tests. We store a list of all tests (including ones that were not included in the latest run) that looks like this:

example_id                                                             | status  | run_time        |
---------------------------------------------------------------------- | ------- | --------------- |
./spec/rspec/core/backtrace_formatter_spec.rb[1:1:1]                   | passed  | 0.00115 seconds |
./spec/rspec/core/backtrace_formatter_spec.rb[1:1:2]                   | passed  | 0.00052 seconds |
./spec/rspec/core/backtrace_formatter_spec.rb[1:1:3]                   | unknown |                 |
./spec/rspec/core/backtrace_formatter_spec.rb[1:1:4]                   | passed  | 0.00048 seconds |
./spec/rspec/core/backtrace_formatter_spec.rb[1:2:1:1]                 | passed  | 0.00058 seconds |
./spec/rspec/core/backtrace_formatter_spec.rb[1:2:2:1]                 | failed  | 0.00088 seconds |
./spec/rspec/core/backtrace_formatter_spec.rb[1:2:3:1]                 | passed  | 0.00084 seconds |
./spec/rspec/core/backtrace_formatter_spec.rb[1:3:1]                   | passed  | 0.00052 seconds |
./spec/rspec/core/backtrace_formatter_spec.rb[1:3:2]                   | failed  | 0.00059 seconds |
./spec/rspec/core/backtrace_formatter_spec.rb[1:4:1]                   | pending | 0.00053 seconds |
./spec/rspec/core/bisect/coordinator_spec.rb[1:1]                      | passed  | 0.00366 seconds |
./spec/rspec/core/bisect/coordinator_spec.rb[1:2]                      | passed  | 0.00307 seconds |
./spec/rspec/core/bisect/coordinator_spec.rb[1:3:1]                    | passed  | 0.002 seconds   |
./spec/rspec/core/bisect/coordinator_spec.rb[1:3:2]                    | passed  | 0.00231 seconds |
./spec/rspec/core/bisect/coordinator_spec.rb[1:4:1]                    | passed  | 0.00293 seconds |
./spec/rspec/core/bisect/example_minimizer_spec.rb[1:1]                | passed  | 0.00049 seconds |
./spec/rspec/core/bisect/example_minimizer_spec.rb[1:2]                | passed  | 0.0006 seconds  |

# ...

This is a custom serialization format we designed to be easily scannable by a human (as it's useful information, particularly the run_time). The example_id column uniquely identifies each test (since the other common ways to identify tests, such as description and file location, are not guaranteed to be unique). Every time a test run finishes, we merge the results with the existing contents of this file using a few simple rules.
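(The merge can be sketched as a toy model; this is illustrative, not RSpec's actual code. The key rule is that entries for tests absent from the latest run keep their previous status, while entries whose spec file is gone are dropped:)

```ruby
# Merge one run's results into the persisted manifest.
# Keys are example ids like "./spec/a_spec.rb[1:1]";
# values are hashes such as { status: "failed" }.
def merge_results(old_manifest, new_results, existing_files)
  # Fresh results win; tests not in this run keep their prior status,
  # which is what preserves a "failed" status across partial runs.
  merged = old_manifest.merge(new_results)

  # Drop entries whose spec file no longer exists; they cannot run again.
  merged.select { |id, _| existing_files.include?(id[/\A[^\[]+/]) }
end
```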

We then use this data to automatically add :last_run_status metadata to every test (with values of passed, failed, pending or unknown) when the spec files are loaded, which unlocks the generic ability to filter based on this via the RSpec CLI:

$ rspec --tag last_run_status:failed

This is the equivalent of --only failed like you asked about, José. Whether or not you add an explicit option like --only-failures is up to you, but the explicit option does provide a couple nice advantages for RSpec:

  • It surfaces this extremely useful option in the --help output. Without calling it out, it would not be clear to most users that failure filtering is possible.
  • Since we can easily tell from our persistence file which spec files have failures, when --only-failures is passed, we automatically load only those files. In contrast, --tag filtering doesn’t generally know anything in advance about which files might have specs matching the tag, so --tag last_run_status:failed will load all spec files, and then apply the filtering. This can be significantly slower, particularly if there are files without failures that load a heavyweight dependency (like rails).
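(The loading optimisation in that second point needs nothing more than the persisted statuses. A hypothetical sketch of the selection step, assuming the manifest maps example ids to status strings:)

```ruby
# Given a manifest of example_id => status, pick the spec files worth
# loading when only failures should be re-run.
def files_with_failures(manifest)
  manifest.select { |_id, status| status == "failed" }
          .keys
          .map { |id| id[/\A[^\[]+/] } # strip the "[1:2:...]" suffix
          .uniq
end
```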

One other option we provide (which ExUnit may or may not want to provide) is --next-failure. This is the equivalent of --only-failures --fail-fast --order defined. The idea is that you often want to work through the failures systematically one-by-one. --fail-fast causes RSpec to abort as soon as the first failure is hit and --order defined disables the random ordering so you get the same failed example when you run rspec --next-failure over and over again to help you focus on a specific one. This option is also why we do the merging operation with the status from prior runs: it’s important that we preserve the failed status of tests that weren’t executed in the latest run.

ExUnit certainly doesn’t have to go the same route RSpec went here, but the combination of the perf speed up from avoiding loading files with only passing tests and the usefulness of --next-failure is pretty awesome, IMO.

Myron



Allen Madsen

Nov 23, 2017, 1:27:49 PM
to elixir-l...@googlegroups.com
+1 for --next-failure functionality. My current approach with ExUnit is basically a manual version of that.

José Valim

Nov 23, 2017, 1:46:20 PM
to elixir-l...@googlegroups.com
That's very helpful, thank you Myron.

We already keep several manifests for compiled code, with the function calls, files, and modules. Therefore it should be relatively straightforward to keep one for tests. I think the first step is to build the manifest itself, which will give us the last_run_status information. Is that right?

Implementation-wise, we can probably even use a custom "formatter" to maintain this information. All we need is a path to store this manifest (which is opt-in but mix test can generate one by default in _build and pass to ExUnit).
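(A minimal sketch of the "custom formatter" idea; the callback names here are illustrative, not ExUnit's or RSpec's actual formatter API: record each test's outcome as it finishes, then flush the manifest at the end of the run.)

```ruby
# Hypothetical results-recording "formatter" that maintains the manifest.
class ManifestFormatter
  def initialize(path)
    @path = path
    @statuses = {}
  end

  # Called once per test; ids and fields mirror the manifest columns.
  def test_finished(id, status, run_time)
    @statuses[id] = { status: status, run_time: run_time }
  end

  # Human-scannable rendering, one row per test.
  def render
    @statuses
      .map { |id, r| format("%-40s | %-7s | %s seconds", id, r[:status], r[:run_time]) }
      .join("\n")
  end

  # At the end of the run, flush the manifest to disk.
  def suite_finished
    File.write(@path, render)
  end
end
```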




José Valim
Founder and Director of R&D

Myron Marston

Nov 23, 2017, 4:14:22 PM
to elixir-lang-core

> I think the first step is to build the manifest itself which will give us the last_run_status information. Is that right?

I think there’s a prerequisite you need to get out of the way before you can build the manifest: you need to decide how you plan to uniquely identify each test. Does ExUnit already have something analogous to RSpec’s example ids? If not, you could potentially use either the test name or the test location (e.g. file_name:line_number), but those may not be sufficient (for RSpec they weren’t). For RSpec, the file location is not guaranteed unique, since you can dynamically define multiple tests in a loop, which results in multiple tests sharing the same file location, and this seems like it could be a problem for ExUnit too. Likewise, RSpec does not require that each test description is unique (I think ExUnit might require this, though…is that right?). Even if test descriptions are unique, they have some properties that, IMO, make them undesirable for use here:

  • There’s no easy way to map a test description back to the file the test is defined in, which means it limits the kind of cleanup you can do as part of merging the current results and the old results. At the end of a test run, RSpec cleans up the manifest by removing tests that cannot possibly still exist due to their file no longer existing, which is only possible since the example ids list what file the tests come from.
  • Test descriptions often change when the contents of the test may not. (Likewise, the location of a test can easily change just by the introduction of a helper function, an import or alias at the top of the module, etc).

It’s easiest to explain how RSpec’s example ids work by showing an example:

# foo_spec.rb

RSpec.describe "Group 1" do
  it 'foos' do # foo_spec.rb[1:1]
    # ...
  end

  describe "a nested group" do
    it 'bars' do # foo_spec.rb[1:2:1]
      # ...
    end

    it 'bars again' do # foo_spec.rb[1:2:2]
      # ...
    end
  end

  it 'foos again' do # foo_spec.rb[1:3]

  end
end

RSpec.describe "Group 2" do
  it 'foos' do # foo_spec.rb[2:1]
    # ...
  end
end

Basically, we number each example and example group with a counter that starts over at 1 within each new scope, and use colons to separate the elements that form the “path” to the specific example. A nice thing about the ids is that they are relatively stable even in the sense of further development of the file. Users can change their test descriptions and introduce new things that change the line numbers, and the ids still work to correctly identify the tests.
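(That numbering can be sketched with a toy tree walker; this is my own illustration, not RSpec's code. Groups are represented as nested arrays and examples as plain strings:)

```ruby
# Assign RSpec-style example ids: each group/example gets a 1-based index
# within its scope; the id is the file plus the colon-joined "path".
def assign_ids(nodes, file, prefix = [])
  ids = {}
  nodes.each_with_index do |node, i|
    path = prefix + [i + 1] # counter restarts at 1 within each scope
    if node.is_a?(Array)    # a describe block: recurse into its children
      ids.merge!(assign_ids(node, file, path))
    else                    # an example
      ids[node] = "#{file}[#{path.join(':')}]"
    end
  end
  ids
end

# The structure from the foo_spec.rb example above:
tree = [
  ["foos", ["bars", "bars again"], "foos again"], # Group 1
  ["foos (group 2)"]                              # Group 2
]
ids = assign_ids(tree, "foo_spec.rb")
```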

Would it make sense to introduce something like this for ExUnit? In RSpec we have found these ids to be useful for several other things (including --bisect, deterministic ordering when applying a seed to a subset, etc).

BTW, this is something I’d be happy to take a stab at in ExUnit unless someone else wanted to do it.

Myron



José Valim

Nov 23, 2017, 4:55:31 PM
to elixir-l...@googlegroups.com
Tests are uniquely identified by module+name. It is not quite as powerful an ID system, but it does the job of identifying tests uniquely.



José Valim
Founder and Director of R&D

Myron Marston

Nov 25, 2017, 11:27:13 AM
to elixir-lang-core
In that case, module + name should work just fine, so building the manifest is the first step :).

José Valim

Nov 26, 2017, 4:59:04 PM
to elixir-l...@googlegroups.com
Beautiful.

Nathan, can you please open up an issue about supporting --only-failures? Also, we would love it if you could file a report with a mechanism to reproduce the issues with --stale. ExUnit should have picked up those changes.

Myron, if you would like to get started on the manifest, please do! We would love a PR. Feel free to ping me if you have any questions.



José Valim
Founder and Director of R&D

Onorio Catenacci

Nov 27, 2017, 9:34:24 AM
to elixir-lang-core
While I think this is all great conversation, it seems to me to be slightly missing the point of doing unit tests in the first place.  If I re-run only failed tests what happens if I inadvertently break something that had previously passed?  Because that test isn't getting run, I won't catch the fact that I've broken it. 

I know there's nothing to force anyone to run any test (previously passing or failing), but it seems that adding an `--only-failures` switch is facilitating something that's really not a great practice to begin with.

Please don't misunderstand: I'm not saying you shouldn't do this; I'm just a little concerned about implicitly blessing a bad practice in terms of unit testing.

--
Onorio


Hubert Łępicki

Nov 27, 2017, 10:18:33 AM
to elixir-l...@googlegroups.com
This is a good point from Onorio. In addition to that, in RSpec the common use case I saw was actually abusing the feature so that randomly failing tests on CI could pass.

There is no reason it would be different here; I can think of at least one client of mine whose devs would do it immediately rather than fix the tests' stability / order dependency in the first place. :(


José Valim

Nov 27, 2017, 10:35:25 AM
to elixir-l...@googlegroups.com
> In addition to that, in RSpec the common use case I saw was actually abusing the feature so that randomly failing tests over CI could pass.

How could the feature be used for that? They would run it multiple times until it passes?

> There is no reason it would be different here, I can think of at least one client of mine whose devs would do it immediately rather than fix the tests stability / dependency on order in first place. :(

To be fair, that's a bad argument for not adding something if we agree it is something useful to the majority.

I want everyone to think about what happens when they make a change to their codebase and a group of tests fails. What do you do? Do you always run the full suite, or do you fix each test individually and then do a full run after? This is about automating the part where you run each test individually.



José Valim
Founder and Director of R&D

Myron Marston

Nov 27, 2017, 11:26:21 AM
to elixir-lang-core
> If I re-run only failed tests what happens if I inadvertently break something that had previously passed?

Unless I'm missing something, this criticism applies to any way of running a subset of the full suite, such as passing a single test file at the command line, or using tag filtering.  On that specific run, you won't know the status of tests that were not included in the run.  This feature isn't intended to be used as a replacement for running the whole suite.  Always running the whole suite lengthens the feedback cycle from your tests and hinders productivity.

Personally, I use this option after making a change that breaks tests across multiple test files.  It makes it easy to keep re-running the tests that failed; once I've fixed them, I run the whole suite again.  It just automates the process of running the specific subset I am currently interested in.

> I know there's nothing to force anyone to run any test (previously passing or failing) but it seems that adding a "--only-failures" switch is facilitating something that's really not a great practice to begin with.

I would argue that optimizing your workflow for tight feedback loops is an excellent practice, and something to be encouraged...unless our position is "you should always run the entire suite" (in which case, why does ExUnit already support a variety of ways to run a subset?).

> In addition to that, in RSpec the common use case I saw was actually abusing the feature so that randomly failing tests over CI could pass.

I'm surprised to hear this.  As the author of the original RSpec feature I've never heard of folks using it for this.  So while it may be the most common use case you've encountered, I don't think this is representative of the community at large.

> There is no reason it would be different here, I can think of at least one client of mine whose devs would do it immediately rather than fix the tests stability / dependency on order in first place. :(
 
Honestly, I think the underlying problem here is developers who are OK with flickering tests and don't put effort into fixing them.  Unless we remove all ways of running subsets of ExUnit test suites, there's always going to be a way for developers who don't want to address these problems to work around them.  (And sometimes, that's the right choice!)

Myron


Michał Muskała

Nov 27, 2017, 11:33:59 AM
to elixir-lang-core
Does `--only-failed` imply using the same seed as the last run? If not, I think it can be problematic: the failures would be printed in a different order each time, making it as hard to fix tests one-by-one as it is today.

Michał.

Hubert Łępicki

Nov 27, 2017, 11:40:13 AM
to elixir-l...@googlegroups.com
Yeah, hopefully just one `rspec spec || rspec --only-failures` in the pipeline config (or similar with an if clause).

But I think the feature might be useful even as a logging utility, to see which tests failed without scrolling up: if you have many test suites, as in an umbrella project, for example. Currently it's a lot of scrolling; if the failed tests from each suite were logged by default to text files in the individual apps, it would be much easier to find them.



--
Regards,
Hubert Łępicki
-----------------------------------------------
[ http://hubertlepicki.com ]

Onorio Catenacci

Nov 27, 2017, 11:58:42 AM
to elixir-lang-core
Please don't misunderstand; I'm not saying "don't add the feature".  I'm just saying that this sort of thing can enable a bad practice which diminishes the value of unit testing. 

Those of you who pointed out that running any single test as opposed to running the whole test suite is already a problem--you're completely correct.  I'm not arguing against the feature at all; I'm simply trying to add an important consideration to the discussion for the sake of others who may read this discussion in the future.  I'm trying to make sure that if others see this conversation in the future, they will be aware that using any subset of the tests (including "only failed") is a little bit dangerous and should always be followed as closely as possible by running the entire suite. 



To unsubscribe from this group and all its topics, send an email to elixir-lang-core+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4KvbDoR%2Bf7Ka9m7VbfBtGfzDYoxGGFg5CcbGYUALFKU6Q%40mail.gmail.com.

For more options, visit https://groups.google.com/d/optout.

Xavier Noria

Nov 27, 2017, 12:27:04 PM
to elixir-l...@googlegroups.com
On Mon, Nov 27, 2017 at 5:57 PM, Onorio Catenacci <Cate...@ieee.org> wrote:

> Please don't misunderstand; I'm not saying "don't add the feature".  I'm just saying that this sort of thing can enable a bad practice which diminishes the value of unit testing.

> Those of you who pointed out that running any single test as opposed to running the whole test suite is already a problem--you're completely correct.  I'm not arguing against the feature at all; I'm simply trying to add an important consideration to the discussion for the sake of others who may read this discussion in the future.  I'm trying to make sure that if others see this conversation in the future, they will be aware that using any subset of the tests (including "only failed") is a little bit dangerous and should always be followed as closely as possible by running the entire suite.

That's a fair point.

Depending on the size of the project, I'd say today this best practice has moved to running locally what is obviously affected, and delegating to CI the detection of unexpected breakages. In my case, it's been years since I last ran the entire suite of a non-trivial project locally.

Andrew Dryga

Dec 3, 2017, 7:37:21 AM
to elixir-lang-core
I'd like to add that I think it's a good idea to run `--only-failures` together with `--stale` at all times, because when you fix one issue you can introduce a new one; the goal is not just to fix the failed tests, but also not to add a regression along the way. And if we support this, the `--only` prefix would confuse people.

Another option is to extend which tests are considered stale and run them with the `--stale` flag.