An example of why we want tests for internal functions.

53 views
Skip to first unread message

Daniel Lee

unread,
Sep 13, 2016, 10:00:51 AM9/13/16
to stan-dev mailing list, stan...@mc-stan.org
Hi all,

First off, please don't read this as a statement blame or judgement of any sort. In the past our code base was smaller, there were fewer contributors, fewer users, and much less optimization happening with the code base. As we've evolved as a team, we've added process to reduce the downstream burden of maintenance.

Here's a recent example of when it would have been useful to have tested and documented code. I'm looking at Bob's fix for math issue #370. The pull request looks reasonable: https://github.com/stan-dev/math/pull/392

The place where there's a change in behavior is in stan/math/prim/scal/fun/grad_reg_inc_beta.hpp, but the place where it's actually failing is in the distribution tests for beta_ccdf_log(). If you look at the pull request, that function isn't one of the ones that changed at all.

stan/math/prim/scal/fun/grad_reg_inc_beta.hpp has almost no doc, especially boundary conditions when this function is suppose to work and when it's not supposed to work. There are a few tests in test/unit/math/mix, but not in prim, which it should have.

Over the last two days, I think I've spent 3 hours getting down to an internal function that doesn't have tests, that wasn't touched, and now has different behavior. (I've actually tried to tackle this issue in the past and put probably a good 16 hours without succeeding due to differences in behavior with functions that aren't doced and tested. Bob's much more patient than I am.)

I won't claim that having tests in place would have prevented this from happening. But I will claim that it would have saved hours of time if they were in place. And I'll also claim that it would have taken much less time to have generated the tests when the code was written.

If anyone has spare time, an immediate thing you can help with is writing unit tests and documentation for the Math library for any functions that don't have tests and doc. For the time being, I'm going to continue to suggest strongly that internal functions have tests.


Daniel


Bob Carpenter

unread,
Sep 13, 2016, 10:43:30 AM9/13/16
to stan...@googlegroups.com
Can we trace the inputs that cause beta_ccdf_log() to fail? Then
we can tell if the bug is in my pull request or in the doc/tests
for beta_ccdf_log(). The new pull request raises exceptions and
changes error messages in cases where there used to be either
NaN or -inf returns or different messages. So I'd wager the
problem is in beta_ccdf_log or its tests, not in the lower-level
function, even if that's the proximate cause of failure.

One issue I'm having is that the distribution tests take so long
to run that I can't reasonably run them locally before putting in
a pull request.

- Bob
> --
> You received this message because you are subscribed to the Google Groups "stan development mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-dev+u...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Daniel Lee

unread,
Sep 13, 2016, 10:46:10 AM9/13/16
to stan-dev mailing list
I traced it down by hand. It'll fail with input values of y = 1, alpha = 1, beta = 1 when either alpha or beta are stan::math::var. It won't fail if all three are double.



Daniel


> To unsubscribe from this group and stop receiving emails from it, send an email to stan-dev+unsubscribe@googlegroups.com.

> For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "stan development mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stan-dev+unsubscribe@googlegroups.com.

Daniel Lee

unread,
Sep 13, 2016, 10:49:17 AM9/13/16
to stan-dev mailing list
I put a comment on the pull request with a simplified test.

Bob Carpenter

unread,
Sep 13, 2016, 11:37:31 AM9/13/16
to stan...@googlegroups.com
Thanks for tracking this down. The root of the problem is
how we want to deal with overflow boundaries:

beta_lccdf(1 | 1, 1)

= log1m(beta_cdf(1 | 1, 1))

= log1m(1)

= log(1 - 1)

= -infinity

The change I made causes log1m(1) to throw an overflow exception.
I see two fixes for this:

* not have any of our basic functions throw overflow exceptions
but just let them return -infinity, +infinity or whatever

* have beta_ccdf() throw an overflow exception rather than
returning -infinity

The one reason we might prefer the former is that exp(-infinity) = 0,
and that may be useful.

Let's discuss at the meeting on Thursday and maybe rethink
the basic math lib behavior. Whatever we do, we should be
consistent across all of our distributions w.r.t. what we
do for overflow.

- Bob
> > To unsubscribe from this group and stop receiving emails from it, send an email to stan-dev+u...@googlegroups.com.
> > For more options, visit https://groups.google.com/d/optout.
>
> --
> You received this message because you are subscribed to the Google Groups "stan development mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-dev+u...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
>
> --
> You received this message because you are subscribed to the Google Groups "stan development mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-dev+u...@googlegroups.com.

Bob Carpenter

unread,
Sep 13, 2016, 3:45:29 PM9/13/16
to stan...@googlegroups.com
From another issue:

> beta_lccdf(1 | 1, 1)
>
> = log1m(beta_cdf(1 | 1, 1))
>
> = log1m(1)
>
> = log(1 - 1)
>
> = -infinity


Should we return -infinity or throw an overflow exception
for

* beta_lccdf(1 | a, b)

* log1p(-1)
* log1m(1)

* log(0)

or should we return -infinity and go on?

The reason I ask is that I'm reconfiguring our basic functions
like log1m and log1p to use Boost and need to figure out how
to configure them. We decided to have them throw out-of-domain
exceptions rather than returning NaN, but haven't otherwise decided
what to do for overflow

- Bob

Krzysztof Sakrejda

unread,
Sep 13, 2016, 3:57:31 PM9/13/16
to stan...@googlegroups.com
Throw exceptions.  All of these produce useless results and the sooner the calling code finds out about it the better. -Inf is ambiguous (it's not like we're doing limits here, just using it as a flag). Krzysztof

Daniel Lee

unread,
Sep 13, 2016, 3:58:19 PM9/13/16
to stan-dev mailing list
Thanks for the summary, Bob.

On Tue, Sep 13, 2016 at 3:44 PM, Bob Carpenter <ca...@alias-i.com> wrote:
From another issue:

>  beta_lccdf(1 | 1, 1)
>
>  = log1m(beta_cdf(1 | 1, 1))
>
>  = log1m(1)
>
>  = log(1 - 1)
>
>  = -infinity


Should we return -infinity or throw an overflow exception
for

  * beta_lccdf(1 | a, b)

  * log1p(-1)
  * log1m(1)

  * log(0)

or should we return -infinity and go on?

(I really wish we didn't have to make this decision, but I guess we've got to.)

One option is to match C++11's standard library's behavior to the best of our ability. For the log1p(-1) example, this would return -infinity. See:
If we went with that option, we could eventually remove the dependency on Boost for these calls.

The other sensible option would be to match Boost's behavior.


I think I'm leaning towards the standard library. Maybe I'm just tired of trying to coerce different libraries into doing the consistent thing. Is there any benefit to using the std implementation wherever possible? Like specialized libraries that may not exist for Boost?


Daniel

Bob Carpenter

unread,
Sep 13, 2016, 4:14:29 PM9/13/16
to stan...@googlegroups.com
-infinity isn't totally useless:

exp(-infinity) = 0

-infinity < x for x finite, or x = +infinity

I think 1 / infinity = 0, too, but I'm not 100% sure.

But it'll almost certainly break derivatives when
you get 0 / 0 evaluating a derivative of exp(log(x))
for x = 0.

- Bob
Reply all
Reply to author
Forward
0 new messages