> On Apr 28, 2015, at 6:16 PM, Michael Betancourt <
betan...@gmail.com> wrote:
>
>> Let's separate the intent of the namespace with the general functions
>> in prim from the probability densities.
>>
>> The probability densities in prim are meant to be instantiated with
>> double if you just include prim. If you also include the template
>> specializations in fwd and rev, then you can instantiate with var and
>> fvar.
>>
>> Other functions within the prim namespace work differently, with
>> a double-based implementation (or just using the cmath as in
>> case of exp and log) and then special fwd and rev implementations.
>
> The different behavior for fun and prim is particularly confusing,
You mean the different way that functions in prim/scal/prob vs. prim/scal/fun
are implemented? We could use the approach in prim/prob for
examples in prim/fun. I don't think we need to keep the stan::prob
namespace vs. stan::math namespace distinction.
And I'm not sure about stan::agrad as a namespace. It was meant to
mean "automatic gradients", so it doesn't even make sense for fvar!
I'd be OK merging all of:
* stan::math
* stan::agrad
* stan::prob
It'd make the argument-dependent lookup easier to manage.
> especially with the former requiring rev and fwd specializing and
> the latter not.
Everything in prim requires rev and fwd specializations of at
least part of its implementation, because nothing in prim
should depend on fwd or rev.
> In particular, I think it’s incredibly misleading to say “prim just
> gets instantiated with doubles if you just include prim” when
> you’re not always in control of what gets included by the user.
The point isn't that prim only gets instantiated with double,
but rather that nothing in prim depends on rev or fwd. We should
be able to include just stan/math/prim and get a double-based
math library that works and never includes any fwd or rev functionality.
Whenever you write a template function in C++, you can't control
how it gets instantiated unless you pile on disable_if/enable_if
conditions. If I write
template <typename T>
T id(T x) { return x; }
it'll be the identity function for any type T a user implements,
as long as it supplies a copy constructor.
> All I wanted to hear was “the prob distributions in prim can
> get instantiated with any autodiff type, depending on how they’re
> used”.
That's true if you add that "and I've included all the template
specializations required for fwd and rev" (of the vector views
and operands/partials structures). If you only include
prim, you won't be able to do that.
Sorry if I was being too literal --- it's the danger of a computer
science upbringing :-)
>> It only works for classes, not primitive types, which aren't
>> defined in a namespace. What gets brought in is the namespace
>> where the class is defined. So even if a function uses var
>> as an argument, it won't get brought in unless it's included and
>> the function is defined in namespace agrad.
>
> That is helpful.
>
>> I'd think autodiffing through the functions would be better than
>> nothing. I understand it's not ideal and also understand that it can
>> be worse than not having anything if it's too bad.
>> Have you tested the accuracy of the derivatives?
>
> Let’s consider the motivation behind this particular branch. I
> was building a model with a colleague that required negative
> binomial CDF for very large arguments, arguments large enough
> to strain the naive gradient implementation which looped for the
> max number of iterations before prematurely terminating at
> an inaccurate value. And this was for the gradient — autodiffing
> through this would be even worse.
>
> I am very much in the belief that one of the best features of Stan
> is that it informs the user of any problems and I’d rather support
> this design by not including fragile higher-order derivatives.
I think it should be a judgment call about how fragile they are.
In a very real sense, all of our functions are fragile because floating
point's fragile. Subtraction's a disaster! But even exp() only works for
inputs with absolute value less than roughly 700. That seems very fragile to me,
but it doesn't mean we don't include it. And inverse() for matrices is fragile
if the input matrix is not well conditioned, but we still include it.
(Someone at my talk tonight, who was older than me, told me that his father,
who is now 90, drilled into his head when he was a wee lad that one should
never invert a matrix --- so at least part of my talk resonated!)
So I think it comes down to where to draw the line. That's what I
meant by "be worse than not having anything if it's [the behavior] too
bad". If it's too ill-behaved then I agree we shouldn't include it.
- Bob