Because when the errors are spread across different files, you're very likely to have to switch back and forth between them. Also, test cases that are close to each other are more likely to involve the same set of files, which reduces both the number of files opened and closed and the scrolling around within a file.
Sorry, I had not tested `--seed 0` until now. I assumed the order would still be random, just deterministic because the same seed is used each time. With `--seed 0`, the output actually seems to be sorted by (path, line), as I wanted; I checked this with 39 failing test cases in multiple files.
This will definitely work for me. Thanks for pointing this out!