And therein lies the problem. We're trying to balance two
good things against not having enough time to do everything!
More inline below.
On 1/2/14, 8:48 PM, Michael Betancourt wrote:
> I'm going to have to cast a vote in favor of our push for more unit tests.
>
> Firstly, "more functionality-less tests" is really a false option. Without tests
> bugs sneak through, causing flawed releases that require a huge amount
> of developer time to investigate, fix, and then update. Not to mention the
> potential breakdown of trust from our users. I say this from experience,
> loosing entire weekends tracking down bugs.
I'm particularly worried about trust. Both in ourselves and
from our users.
The problem I see is that as we crawl up the functionality and modularity
hierarchy from individual mathematical functions defined in stan::math to
the command-line code, it gets harder and harder to unit test. The complete
spec is less clearly defined, and testing little pieces requires extensive
mock-object frameworks. As far as I know, we don't have any tests that the
right samplers are getting called with the right command-line args, and such
tests are complex to write if we want to do it functionally, because we have
to know how each of the samplers behaves w.r.t. some fixed input.
> Secondly, unit tests offer far more than explicit tests. They also offer an
> example for how objects should be instantiated and used. In our
> "deprecated comments is a greater sin than no comments" philosophy
> these examples are critical to understanding how someone else's code
> is supposed to work and hence how it can be integrated/extended. It seems
> like half of our debugging time is spent just trying to deduce another developer's
> intentions.
If you are referring to my distaste for code comments, my preference is
for better structured code (i.e., more modular code) with reasonable naming conventions
so it's easier to follow. For instance, I'm in command.hpp right now, and there
are Fortran-style comments in functions like this:
//////////////////////////////////////////////////
// Initialize Model //
//////////////////////////////////////////////////
to point out that the following block of code initializes the model. I'd far
prefer a function named initialize_model(...) that encapsulated all of that code.
Functions should be short and scannable, not 700 lines of nested loops!
The problem (there's always a problem!) is that the first line is:
Model model(data_var_context, &std::cout);
and that variable "model" is going to be needed outside the loop. So if that
initialization is done internal the function, you have to define a nullary
constructor for Model and pass in a reference to model to set along with
the arguments. It can quickly get ugly. In this case, it makes sense to
define the model outside the initialization code, but I hope the general problem's
clear enough.
Also, I believe very strongly in API comments that indicate what the inputs and
outputs of functions are and how the outputs relate to the inputs. The unit
tests can be used as stand-ins for the API comments, but I think the right way round
is to doc the API, then test vs. the doc. Including boundary conditions. But
this also gets much harder as we crawl up the functionality hierarchy.
Code comments should be reserved, in my opinion, for when you do something non-idiomatic
in the language or when there's a very complex algorithm the steps of which need
to be explained.
> Thirdly, higher-level tests are extremely limited. They can't identify problems
> beyond the tested models (and there's no way we'd be able to fully test the space
> of all models) and even when a problem is identified we have no guidance in
> where the problems are occurring. Yes, a bisect might be able to identify the
> offending commit but it wouldn't identify interactions between the new and old
> code that induce the problems.
Agreed. And unit tests are extremely limited in exactly the opposite way --- almost
by definition, they test only micro behavior, not macro behavior.
> tl;dr Unit tests not only identify bugs they make debugging easier and the
> integration of new code far less difficult. This will not change with the addition of
> higher-level integration tests or statistical tests.
Agreed. The only problem is that we don't seem to get around to writing
the higher-level tests because we're all (or at least I am) buried in the
details of unit testing.
In the end, I think it's a judgment call. I don't think we want to be dogmatic
about demanding tests for every single function at every level of abstraction.
We just don't have the programmer-hours to make progress this way.
Then again, maybe we don't have the programmer hours not to :-)
- Bob