This isn't a bug in Stan. But it does point out a number
of subtleties in the way Stan behaves w.r.t. constants
and (implicit) Jacobians.
I've updated the manual to try to make it clearer
what's going on. I'll summarize here, with R code
to validate against Stan 1.2.
First, if y is a data variable and lambda a parameter variable,
when you write
y ~ poisson(lambda);
Stan drops the normalizing term for y (because we only
need the log probability up to an additive constant that
doesn't depend on the params).
On the other hand,
lp__ <- lp__ + poisson_log(y, lambda);
does NOT drop the normalizing terms. So the difference
between these two expressions of the model is that the
-log(y!) term is not added to lp__ in the sampling-statement
version. This difference does NOT affect the samples that
are drawn, because it's only an additive constant that
doesn't depend on the parameters.
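To make the dropped constant concrete, here's a quick sketch in Python (my addition, not part of the original R check; the function names are mine) showing that the term the sampling statement drops, -log(y!), is the same for every value of lambda:

```python
import math

def poisson_log_full(y, lam):
    # full log pmf: y*log(lambda) - lambda - log(y!)
    return y * math.log(lam) - lam - math.lgamma(y + 1)

def poisson_log_dropped(y, lam):
    # what the sampling statement contributes: the -log(y!) constant is dropped
    return y * math.log(lam) - lam

y = 3
for lam in (0.5, 2.0, 7.0):
    diff = poisson_log_dropped(y, lam) - poisson_log_full(y, lam)
    # diff is log(3!) = log 6 for every lambda
    print(lam, diff)
```

Because the difference is constant in lambda, the two versions define the same posterior up to an additive constant, so the draws are identical in distribution.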
Second, if the declaration of lambda is constrained (as it
should be here),
parameters {
real<lower=0> lambda;
}
then there's an additional term in the log probability corresponding
to the log absolute derivative of the inverse transform. Stan log
transforms, so HMC/NUTS actually runs over the unconstrained variable
u = log(lambda)
(calling it u here to keep it distinct from the data variable y).
The inverse transform is exp, so the log Jacobian is just
log | d/du exp(u) | = u
= log(lambda)
What I got wrong before was thinking that the log Jacobian
was just lambda. It's just u, which is log(lambda). Daniel Lee
worked it out and set me straight (thanks!).
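Here's a quick numerical sanity check of that identity (a Python sketch of mine, not Stan's internals): the derivative of the inverse transform exp at a point u is exp(u), so its log is just u, i.e. log(lambda).

```python
import math

# change of variables: lambda = exp(u), u unconstrained;
# log Jacobian = log |d lambda / du| = log(exp(u)) = u
def log_jacobian(u):
    return u

# central finite-difference check of |d/du exp(u)| at a few points
for u in (-1.0, 0.3, 2.0):
    h = 1e-6
    deriv = (math.exp(u + h) - math.exp(u - h)) / (2 * h)
    assert abs(math.log(deriv) - log_jacobian(u)) < 1e-8

print("log Jacobian equals u = log(lambda) at all test points")
```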
Here's an example in R, using Stan 1.2, that verifies everything's
working correctly. (I simplified to a single sampling statement --
the issue's the same with more than one.)
poisson.stan
------------
parameters {
real<lower=0> lambda;
}
model {
3 ~ poisson(lambda);
}
Here's an R program to test the behavior,
poisson.R
---------
library(rstan);
fit <- stan('poisson.stan', iter=100, chains=1);
fit_ss <- extract(fit);
fit_lambda <- fit_ss$lambda;
fit_lp <- fit_ss$lp__;
# full Poisson log pmf plus the log Jacobian term log(lambda)
lpfun <- function(lam) { dpois(3, lam, log=TRUE) + log(lam) };
print("lambda, lp__, log_prob_from_R, diff")
for (n in 1:10) {
print(paste(c(fit_lambda[n],
fit_lp[n],
lpfun(fit_lambda[n]),
fit_lp[n] - lpfun(fit_lambda[n]))));
}
Running it in R gives us:
> source('poisson.R')
TRANSLATING MODEL 'poisson' FROM Stan CODE TO C++ CODE NOW.
COMPILING THE C++ CODE FOR MODEL 'poisson' NOW.
SAMPLING FOR MODEL 'poisson' NOW (CHAIN 1).
Iteration: 1 / 100 [ 1%] (Adapting)
Iteration: 10 / 100 [ 10%] (Adapting)
Iteration: 20 / 100 [ 20%] (Adapting)
Iteration: 30 / 100 [ 30%] (Adapting)
Iteration: 40 / 100 [ 40%] (Adapting)
Iteration: 50 / 100 [ 50%] (Adapting)
Iteration: 60 / 100 [ 60%] (Sampling)
Iteration: 70 / 100 [ 70%] (Sampling)
Iteration: 80 / 100 [ 80%] (Sampling)
Iteration: 90 / 100 [ 90%] (Sampling)
Iteration: 100 / 100 [100%] (Sampling)
[1] "lambda, lp__, log_prob_from_R, diff"
[1] "1.78793243119014" "0.536307112310427" "-1.25545235691763" "1.79175946922805"
[1] "0.773767329031482" "-1.7997035653141" "-3.59146303454216" "1.79175946922806"
[1] "3.72632350977326" "1.53536486367832" "-0.256394605549737" "1.79175946922805"
[1] "5.02324278265428" "1.43306000851938" "-0.358699460708681" "1.79175946922806"
[1] "1.92381455035318" "0.693425289837281" "-1.09833417939077" "1.79175946922805"
[1] "5.03502743963486" "1.43064846410193" "-0.361111005126123" "1.79175946922805"
[1] "6.99976027955474" "0.783743331209334" "-1.00801613801872" "1.79175946922806"
[1] "3.9970547866651" "1.54517635966183" "-0.246583109566227" "1.79175946922806"
[1] "3.26998279484947" "1.46915609867612" "-0.322603370551938" "1.79175946922805"
[1] "1.67638108841113" "0.390168335322012" "-1.40159113390604" "1.79175946922806"
>
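The diff column is constant because it's exactly the -log(y!) term the sampling statement dropped: with y = 3, that's log(3!) = log 6. A one-line Python check of the arithmetic (my addition):

```python
import math

# the constant difference in every row above: log(3!) = log 6,
# matching the 1.79175946922805... in the diff column
print(math.log(math.factorial(3)))
```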
So everything lines up once Daniel sorted out the Jacobian.
Thanks again for bringing this up. We were forgetting that the
explicit poisson_log() would behave differently w.r.t. constants
than the sampling statement.
- Bob