Hmm... that statement illuminated my issue a little better. Suppose I don't care about a and b; I only care about z!
Think of this as a noise-filtering problem: I have y, which is a set of noisy measurements of z, and I'd like to recover z. I happen to know that z has a certain form with some unknown parameters involved, and I have some information about those parameters, but I don't really care what the parameters are, and I don't have any strong knowledge of the probability distribution over a and b.
> Now I'd like to specify the information I have about the relationship between y and x. I could do:
>
> exp(-(x-a)/b) ~ normal(y,ysigma);
>
This last statement is equivalent to
y ~ normal(exp((a - x) / b), ysigma);
which is the usual way to write the non-linear regression you provided above. Written
this way, you shouldn't need a Jacobian. It depends on what density you want to define.
(I've also eliminated a negation for efficiency.)
The key thing is to understand the density you're defining and make sure
it's the one you want.
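Spelled out as a full program, that looks something like this (a sketch: the data block and the positivity constraints on b and ysigma are assumptions, since they never appear in the thread):

data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real a;
  real<lower=0> b;
  real<lower=0> ysigma;
}
model {
  // data on the left, a deterministic function of parameters as the
  // location: no change of variables on a parameter, so no Jacobian
  y ~ normal(exp((a - x) / b), ysigma);
}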
On Thursday, August 13, 2015 at 4:58:36 PM UTC-7, Bob Carpenter wrote:
> This last statement is equivalent to
> y ~ normal(exp((a - x) / b), ysigma);
> which is the usual way to write the non-linear regression you provided above. Written
> this way, you shouldn't need a Jacobian. It depends on what density you want to define.
> (I've also eliminated a negation for efficiency.)
I don't need a Jacobian because Stan figures it out and puts it in for me, or I don't need a Jacobian for some other reason?
Stan often spits out warnings about Jacobian corrections, and I can't figure out logically when they're needed for my purposes.
Obviously, you're right that it matters what pdf I'm trying to create, and that's what I'm getting at. The two programs encode different models, but it's not entirely clear that the model WITH the Jacobian *is the model* I want. Furthermore, it's not so clear that "a nonlinear transform on the left-hand side" implies "needs a Jacobian" and that "no nonlinear transform" means no Jacobian.
I guess I'm trying to figure out the meaning of the model with and without the Jacobian so I can figure out which one I want! :-)
I'm particularly puzzled by the bit about putting the nonlinear transform on the right.
First off, normal_log(a, b, c) is mathematically symmetric in its first two arguments, since normal_log(a, b, c) and normal_log(b, a, c) share the kernel -((a - b) / c)^2 / 2, right?
So I don't see how
exp((x - a) / b) ~ normal(y, ysigma);
leads to different mathematics from
y ~ normal(exp((x - a) / b), ysigma);
where I now "don't need a Jacobian".
Also, as Michael pointed out, if I really "care about inference on z" then I should make it a parameter. But it's a parameter on which I'm imposing a certain structure. So I could do it in Stan like this:
parameters {
  real z;
}
...
z ~ normal(exp((x - a) / b), 1e-36);
y ~ normal(z, ysigma);
There's no "nonlinear transformation on the left-hand side", so "I don't need a Jacobian", but z is, with probability essentially 1, so close to exp((x - a) / b) that the difference doesn't matter, and this is terribly hard to sample (my guess). Nevertheless, suppose what I want is the limit of this model as the z standard deviation goes to zero. How do I code it in Stan?
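(My own guess at coding that limit, as a sketch: in the zero-noise limit z is a deterministic function of a and b, so perhaps it belongs in transformed parameters rather than parameters; data declarations and priors elided.)

transformed parameters {
  real z;
  z <- exp((x - a) / b);  // deterministic in the zero-noise limit
}
model {
  y ~ normal(z, ysigma);  // likelihood unchanged; nothing nonlinear on the left
}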
Perhaps it will help you get started if you recognize that the fundamental object is a measure,

p(x) dx,

where p(x) is the density. If you change parameters then you get a new density and a new volume, but the measure has to stay the same,

p(x) dx = p(y) dy,

or

p(x) = p(y) | dy / dx | = p(y(x)) | dy/dx (x) |,

where | dy/dx (x) | is exactly the Jacobian.
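For a concrete instance with the exponential transform from this thread: if y = exp(x), then dy/dx = exp(x), so p(x) = p(exp(x)) exp(x), and on the log scale the Jacobian adjustment is log | dy/dx | = x.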
....
But that is NOT how Stan works. In Stan, these two expressions produce
identical results up to names of parameters in error messages.
I'll reiterate that the thing to do is understand how Stan
statements get translated to log densities.
The key here is that
y ~ foo(theta);
is translated as
increment_log_prob(foo_log(y, theta));
This is just syntactic sugar other than for the fact that the sampling
notation with ~ drops constant terms.
After that, it's up to you to apply whatever Jacobians you need.
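For example (a sketch; f and f_prime here stand for a hypothetical transform and its derivative, not built-in functions):

// either statement adds the same term to the log density
// (up to the constants that ~ drops):
y ~ normal(mu, sigma);
increment_log_prob(normal_log(y, mu, sigma));

// if a sampling statement is meant to induce a density on theta
// through a nonlinear transform f, the correction is user-supplied:
f(theta) ~ normal(y, sigma);
increment_log_prob(log(fabs(f_prime(theta))));  // log |df/dtheta|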
Or maybe I'll just take the Gelman approach and draft such a book
myself, put your name on it, and let you correct all my misconceptions.
But that would be too easy in and of itself. Because in Bayes' theorem
we specify a posterior with two terms: the prior and the likelihood.
The prior is a density and so behaves as above, but the
likelihood is a different object. The likelihood is a conditional
probability distribution over data and when we plug in a particular
measurement it’s just a function with respect to the parameters.
That means likelihood terms _don’t_ get Jacobians.
Usually this is pretty clear, because by definition likelihoods define
how the data are sampled and we’d always write
data ~ distribution_name(theta, hyperparameters).
Because theta is on the right-hand side writing
data ~ distribution_name(f(theta), hyperparameters)
wouldn’t throw a Jacobian warning. Similarly, because data don’t
trigger Jacobian warnings
f(data) ~ distribution_name(theta, hyperparameters)
would also be fine.
What’s happening here is that Daniel Lakeland is abusing the
symmetry of the Gaussian density to write
data ~ normal(f(theta), hyperparameters)
as
f(theta) ~ normal(data, hyperparameters)
The latter doesn’t really make sense and abuses the intended
meaning of the ~ notation (which ultimately is allowed only because
the ~ notation is somewhat ill-defined in the first place). For any
other distribution this switch wouldn’t make sense and not needing
the Jacobian would be obvious.
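A concrete illustration of why (an example of mine, not from the thread): for fixed data y these two statements yield the same posterior over mu and sigma, because the term distinguishing them is the Jacobian of the log transform, which is constant in the parameters.

log(y) ~ normal(mu, sigma);   // transform of the data on the left-hand side
y ~ lognormal(mu, sigma);     // same posterior: the extra -log(y) Jacobian term
                              // is constant with respect to mu and sigma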
My thought was: "if I correct this data in a certain way, it will be equal to the expression for the error term, and then I want that corrected data to be a normal(0, sigma) random variable."
That is, I was doing a nonlinear regression by saying in my head: y - myfunction(x, a, b) = epsilon,
and I was shocked that Stan would tell me to change my model. Then, when I thought about it, the request for a Jacobian wasn't even a well-defined request.
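In Stan, that mental model collapses to a single line, with epsilon never written down:

y ~ normal(myfunction(x, a, b), sigma);  // i.e., y - myfunction(x, a, b) ~ normal(0, sigma)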
> On Aug 16, 2015, at 11:19 PM, Daniel Lakeland <dlak...@street-artists.org> wrote:
>
> But that would be too easy in and of itself. Because in Bayes' theorem
> we specify a posterior with two terms: the prior and the likelihood.
> The prior is a density and so behaves as above, but the
> likelihood is a different object. The likelihood is a conditional
> probability distribution over data and when we plug in a particular
> measurement it’s just a function with respect to the parameters.
> That means likelihood terms _don’t_ get Jacobians.
>
> Right, the data is fixed, there are no neighborhoods involved. Likelihoods are actually weird objects, because we talk about them like they're probabilities,
It's a common misconception, but nobody should be talking about densities as
if they're probabilities --- they're not even constrained to be less than 1!
Probability theory and math stats books are very careful to distinguish
event probabilities from densities. There's even a different notation with a
capital "P" (or "Pr") for event probabilities.
I'll insert an editorial note into the blog post if you remember where
it is.
I do love Andrew and Jennifer's book in part because they provide great
translation keys.
- Bob
the shoe sizes as priors. They're an unknown for sure. But then so is
a prediction for a new data point. "Prior" is usually reserved for
priors on parameters. Stan uses "parameters" for any unobserved variable,
which is admittedly confusing.
- Bob