calculation of Poisson log-likelihood from STAN does not match dpois in R


atzap

Apr 10, 2013, 4:21:44 PM
to stan-...@googlegroups.com
Hi all

The calculation of the Poisson log-likelihood at each iteration in Stan differs from the dpois function in R.  Presumably there is a straightforward explanation for this that I've just missed?  I note that the difference isn't constant if you plot out the two sets of estimates.

Consider the following model:
m1p1 <- "
data {
 int<lower=0> y[50];
}
parameters {
 real<lower = 0> lambda;
}
model {
 lp__ <- lp__ + poisson_log(y,lambda);
}"

# Generate some data and initial values:
data <- list(y =rpois(50, 100))
init <- function(x){list(list("lambda" = runif(1)))}

# And run a quick test:
tmp1 <- stan(model_code = m1p1, data = data, pars = "lambda",
init = init(), iter = 200, warmup = 100, thin = 1, chains = 1)

# Extract the lambda estimates
estlam <- extract(tmp1, "lambda")[[1]]

# Plug them into dpois
sapply(estlam, function(x){sum(dpois(x = data$y, lambda = x, log = T))})

# Extract the STAN lp__ estimates
# they should differ from the above line
extract(tmp1, "lp__")[[1]]


Many thanks,
Andrew



Marcus Brubaker

Apr 10, 2013, 4:24:52 PM
to stan-...@googlegroups.com
One thing is that Stan automatically drops constants when they're not
needed for estimation, e.g. the sqrt(2*pi) term in a normal
distribution is not included in lp__. This should be easy to
check, as the difference between dpois and lp__ should be a constant.

Cheers,
Marcus

atzap

Apr 10, 2013, 4:29:54 PM
to stan-...@googlegroups.com
Yup, it's not constant.  Differences range between -4.62 and -4.55 in an example I just ran.

Cheers,
Andrew

Bob Carpenter

Apr 11, 2013, 12:14:23 AM
to stan-...@googlegroups.com
HMC only requires log probability functions up
to an additive constant. So Stan automatically drops
all terms in the log probability sum that involve only
constants.

Because

log Poisson(y|lambda)
= y * log(lambda)
- log y!
- lambda

and you call it with y constant (declared as data),
the second term (- log gamma(y+1)) gets dropped.
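
For concreteness, the same bookkeeping can be sketched in Python (an illustrative stand-in for the R/Stan code in this thread; the function names here are made up):

```python
from math import lgamma, log

def pois_logpmf(y, lam):
    # Full Poisson log-pmf: y*log(lam) - lam - log(y!)
    return y * log(lam) - lam - lgamma(y + 1)

def pois_logpmf_dropped(y, lam):
    # Same, with the -log(y!) term dropped: y is data, so log(y!)
    # is a constant in the parameter and cannot affect HMC.
    return y * log(lam) - lam

y = 7
# The two versions differ by exactly log(y!) = lgamma(y+1),
# whatever value lam takes.
diffs = [pois_logpmf_dropped(y, lam) - pois_logpmf(y, lam)
         for lam in (0.5, 1.0, 50.0, 100.0)]
print(diffs, lgamma(y + 1))
```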

- Bob

Ben Goodrich

Apr 11, 2013, 12:44:28 AM
to stan-...@googlegroups.com
On Thursday, April 11, 2013 12:14:23 AM UTC-4, Bob Carpenter wrote:
> HMC only requires log probability functions up to an additive
> constant. So Stan automatically drops all terms in the log
> probability sum that involve only constants. [...]

As Andrew mentioned, the difference isn't a constant, and moreover they aren't equal when you do it with doubles.

library(rstan)

data <- list(y = rpois(50, 100))
lambda <- 100

m1p1 <- "
data {
 int<lower=0> y[50];
}
parameters {
 real<lower = 0> lambda;
}
model {
 lp__ <- lp__ + poisson_log(y,lambda);
}"

tmp1 <- stan(model_code = m1p1, data = data, pars = "lambda", chains = 0)
log_prob(tmp1, lambda) - sum(dpois(data$y, lambda, log = TRUE)) # not zero

Ben

Jiqiang Guo

Apr 11, 2013, 12:10:19 PM
to stan-...@googlegroups.com

On Thu, Apr 11, 2013 at 12:44 AM, Ben Goodrich <goodri...@gmail.com> wrote:
> tmp1 <- stan(model_code = m1p1, data = data, pars = "lambda", chains = 0)
> log_prob(tmp1, lambda) - sum(dpois(data$y, lambda, log = TRUE)) # not zero

I think the issue here is that our lp__ is actually defined on the unconstrained space for the parameters, with the Jacobian adjustment included.  So to compare them we need to work on the unconstrained space for lambda here and account for the difference due to the transformation of the parameters. 

> lambda <- 50
> log_prob(tmp1, log(lambda)) - sum(dpois(data$y, lambda, log = TRUE))
[1] 3.912023
> log(lambda)
[1] 3.912023
> lambda <- 100
> log_prob(tmp1, log(lambda)) - sum(dpois(data$y, lambda, log = TRUE))
[1] 4.60517
> log(lambda)
[1] 4.60517
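
The pattern in those numbers can be checked outside rstan with a quick sketch (Python here so it is self-contained; stan_style_lp is a hand-rolled stand-in for what log_prob does in this particular model, not an rstan function, and the counts are made up):

```python
from math import exp, lgamma, log

def pois_loglik(ys, lam):
    # sum of full Poisson log-pmfs, like sum(dpois(ys, lam, log = TRUE))
    return sum(y * log(lam) - lam - lgamma(y + 1) for y in ys)

def stan_style_lp(ys, u):
    # lp__ on the unconstrained scale: lambda = exp(u), plus the
    # log-Jacobian of the inverse transform, which is just u = log(lambda)
    return pois_loglik(ys, exp(u)) + u

ys = [98, 103, 100, 95]  # made-up counts standing in for data$y
for lam in (50.0, 100.0):
    diff = stan_style_lp(ys, log(lam)) - pois_loglik(ys, lam)
    print(lam, diff, log(lam))  # diff equals log(lambda), as above
```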

--
Jiqiang 



atzap

Apr 11, 2013, 2:21:46 PM
to
Thanks Jiqiang!  

To follow up, why does this work only when the prior is removed from lambda?  Your explanation doesn't work if I add lambda ~ normal(0,100) to the model.  I thought not specifying a prior still amounts to an (improper) uniform prior on lambda over the interval between zero and infinity.

Andrew

Jiqiang Guo

Apr 11, 2013, 2:31:32 PM
to stan-...@googlegroups.com
It works for me.  How did it not work for you? 

> data <- list(y =rpois(50, 100))
> lambda <- 100
> m1p1 <- "
+ data {
+  int<lower=0> y[50];
+  }
+ parameters {
+  real<lower = 0> lambda;
+ }
+ model {
+   y ~ poisson(lambda);
+   lambda ~ normal(0, 10);
+ }"
>
> tmp1 <- stan(model_code = m1p1, data = data, pars = "lambda", chains = 0)

TRANSLATING MODEL 'm1p1' FROM Stan CODE TO C++ CODE NOW.
COMPILING THE C++ CODE FOR MODEL 'm1p1' NOW.
the number of chains is less than 1; sampling not done

> log_prob(tmp1, log(lambda)) - sum(dpois(data$y, lambda, log = TRUE)) - dnorm(lambda, 0, 10, log = TRUE) - log(lambda)
[1] 18272.69
> lambda <- 50
> log_prob(tmp1, log(lambda)) - sum(dpois(data$y, lambda, log = TRUE)) - dnorm(lambda, 0, 10, log = TRUE) - log(lambda)
[1] 18272.69
> lambda <- 51
> log_prob(tmp1, log(lambda)) - sum(dpois(data$y, lambda, log = TRUE)) - dnorm(lambda, 0, 10, log = TRUE) - log(lambda)
[1] 18272.69
> lambda <- 52
> log_prob(tmp1, log(lambda)) - sum(dpois(data$y, lambda, log = TRUE)) - dnorm(lambda, 0, 10, log = TRUE) - log(lambda)
[1] 18272.69

--
Jiqiang 



atzap

Apr 11, 2013, 3:25:48 PM
to stan-...@googlegroups.com
On Thursday, April 11, 2013 2:31:32 PM UTC-4, Jiqiang Guo wrote:
It works for me.  How did it not work for you?  
 
I must be doing something wrong, i.e. I don't fully understand what you've done!  I want to calculate sum(dpois(data$y, lambda, log = T)).  As per your last email, I thought it was log_prob(tmp1, log(lambda)) - log(lambda), and log_prob(tmp1, log(lambda)) - log(lambda) - dnorm(lambda, 0, 10, log = TRUE) when I added the normal prior.  But this isn't working...  Please see below:

> library(rstan)
> data <- list(y =rpois(50, 100))
> m1p1 <- "
+ data {
+   int<lower=0> y[50]; 
+  }
+  parameters {
+   real<lower = 0> lambda;
+  }
+  model {
+   lambda ~ normal(0,10);
+   lp__ <- lp__ + poisson_log(y,lambda);
+  }"
>  
> tmp1 <- stan(model_code = m1p1, data = data, pars = "lambda", chains = 0)

TRANSLATING MODEL 'm1p1' FROM Stan CODE TO C++ CODE NOW.
COMPILING THE C++ CODE FOR MODEL 'm1p1' NOW.
the number of chains is less than 1; sampling not done
> lambda <- 50
> sum(dpois(data$y, lambda, log = T))
[1] -1178.749
> log_prob(tmp1, log(lambda)) - log(lambda) 
[1] -1191.249
> log_prob(tmp1, log(lambda)) - log(lambda) -  dnorm(lambda, 0, 10, log = TRUE)
[1] -1175.527

Thanks,
Andrew

Bob Carpenter

Apr 11, 2013, 4:11:37 PM
to stan-...@googlegroups.com
I'm not sure which version of Stan you're using,
but if it's a recent version from GitHub, Daniel's
just isolated an issue that causes

lp__ <- lp__ + poisson_log(y,lambda);

to yield different results than:

y ~ poisson(lambda);

These two expressions should be equivalent and we'll
fix it so that they are in the upcoming 1.3 release
(hopefully today or tomorrow).

Thanks again for bringing up the issue. When it's
all fixed, I'll provide a working example.

- Bob

atzap

Apr 11, 2013, 4:26:35 PM
to stan-...@googlegroups.com
Thanks Bob.  It is version 1.2.0 (packaged: 2013-03-06 21:47:56 UTC, GitRev: 1cde932d15a3), so I look forward to the fix!

One last thing is that I'm finding similar issues with the negative binomial.  It might be worth having a check with that as well.  I was waiting to sort out the Poisson issue before bringing this up.

Andrew

Jiqiang Guo

Apr 11, 2013, 5:34:27 PM
to stan-...@googlegroups.com

On Thu, Apr 11, 2013 at 3:25 PM, atzap <ata...@gmail.com> wrote:
On Thursday, April 11, 2013 2:31:32 PM UTC-4, Jiqiang Guo wrote:
It works for me.  How did it not work for you?  
 
I must be doing something wrong, i.e. don't fully understand what you've done!

What Bob pointed out is another issue here (y ~ poisson is not equivalent to lp__ <- lp__ + poisson_log).  But in any case in your rstan version, our lp__ just drops an additive constant.  For this model, what I have done is to compare 

log_prob(tmp1, log(lambda))  # get lp__ computed by Stan

with 

sum(dpois(data$y, lambda, log = T)) + log(lambda) + dnorm(lambda, 0, 10, log = TRUE) 

and see if their difference is a *constant* for any lambda.  In the above manual calculation of lp__ in R, we adjust for the transform that Stan applies (the log(lambda) term) and add the prior (the last term). 

> library(rstan)
Loading required package: Rcpp
Loading required package: inline

Attaching package: ‘inline’

The following object is masked from ‘package:Rcpp’:

    registerPlugin

Loading required package: stats4
rstan (Version 1.2.0, packaged: 2013-03-06 21:47:56 UTC, GitRev: 1cde932d15a3)
>
> data <- list(y =rpois(50, 100))
> m1p1 <- "
+ data {
+   int<lower=0> y[50];
+  }
+  parameters {
+   real<lower = 0> lambda;
+  }
+  model {
+   lambda ~ normal(0,10);
+   lp__ <- lp__ + poisson_log(y,lambda);
+  }"
>
> tmp1 <- stan(model_code = m1p1, data = data, pars = "lambda", chains = 0)

TRANSLATING MODEL 'm1p1' FROM Stan CODE TO C++ CODE NOW.
COMPILING THE C++ CODE FOR MODEL 'm1p1' NOW.
the number of chains is less than 1; sampling not done
>
> lambda <- 50
> sum(dpois(data$y, lambda, log = T)) + log(lambda) + dnorm(lambda, 0, 10, log = TRUE) - log_prob(tmp1, log(lambda))
[1] -3.221524
> lambda <- 51
> sum(dpois(data$y, lambda, log = T)) + log(lambda) + dnorm(lambda, 0, 10, log = TRUE) - log_prob(tmp1, log(lambda))
[1] -3.221524
> lambda <- 52
> sum(dpois(data$y, lambda, log = T)) + log(lambda) + dnorm(lambda, 0, 10, log = TRUE) - log_prob(tmp1, log(lambda))
[1] -3.221524


Note the above difference is always -3.221524 for all lambdas I have tried.  Since poisson_log in rstan 1.2 actually does not drop its constant, the above difference is just due to the constant that we dropped when calculating the prior of lambda. 

> log(sqrt(2*pi)*10)
[1] 3.221524
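
That constant is exactly the normalizer of the normal(0, 10) prior, which Stan drops because sigma = 10 is fixed data, not a parameter.  A one-line check (Python sketch of the same arithmetic):

```python
from math import log, pi, sqrt

# log Normal(x | 0, sigma) = -log(sigma * sqrt(2*pi)) - x^2 / (2*sigma^2);
# only the first term is constant in the parameter x, so Stan drops it.
sigma = 10.0
dropped = log(sqrt(2 * pi) * sigma)
print(dropped)  # about 3.221524, Jiqiang's constant
```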

--
Jiqiang 

atzap

Apr 11, 2013, 6:23:26 PM
to stan-...@googlegroups.com
Great, got it now.  This is the part that I was missing:  
> log(sqrt(2*pi)*10)
[1] 3.221524

 
Thank you again for your help.
Andrew

Bob Carpenter

Apr 11, 2013, 6:37:09 PM
to stan-...@googlegroups.com
This isn't a bug in Stan. But it does point out a number
of subtleties in the way Stan behaves w.r.t. constants
and (implicit) Jacobians.

I've updated the manual to try to make it clearer
what's going on. I'll summarize here, with R code
to validate vs. Stan 1.2.

First, if y is a data variable and lambda a parameter variable,
when you write

y ~ poisson(lambda);

Stan drops the normalizing term for y (because we only
need the log probability up to an additive constant that
doesn't depend on the params).

On the other hand,

lp__ <- lp__ + poisson_log(y,lambda);

does NOT drop the normalizing terms. So the difference
between these expressions of the model is that the
-log(y!) term is not added to lp__ in the sampling version.

This difference does NOT affect the samples that are drawn,
because it's only an additive term.

Second, if the declaration of lambda is constrained (as it
should be here),

parameters {
real<lower=0> lambda;
}

then there's an additional term in the log probability corresponding
to the log derivative of the inverse transform. We log transform,
so we actually run HMC/NUTS over u, where

u = log(lambda).

The inverse transform is exp, so the log Jacobian is just

log | d/du exp(u) | = u

= log(lambda)

What I got wrong before was thinking that the log Jacobian
was just lambda. It's u, which is log(lambda). Daniel Lee worked
it out and set me straight (thanks!).

Here's an example in R, using Stan 1.2, that verifies everything's
working correctly (I simplified to a single sample -- the issue's
the same with more than one sampling statement.)

poisson.stan
------------
parameters {
  real<lower=0> lambda;
}
model {
  3 ~ poisson(lambda);
}


Here's an R program to test the behavior,

poisson.R
---------
fit <- stan('poisson.stan', iter = 100, chains = 1);
fit_ss <- extract(fit);
fit_lambda <- fit_ss$lambda;
fit_lp <- fit_ss$lp__;

lpfun <- function(lam) { dpois(3, lam, log = TRUE) + log(lam) };
print("lambda, lp__, log_prob_from_R, diff")
for (n in 1:10) {
  print(paste(c(fit_lambda[n],
                fit_lp[n],
                lpfun(fit_lambda[n]),
                fit_lp[n] - lpfun(fit_lambda[n]))));
}


Running it in R gives us:

> source('poisson.R')

TRANSLATING MODEL 'poisson' FROM Stan CODE TO C++ CODE NOW.
COMPILING THE C++ CODE FOR MODEL 'poisson' NOW.
SAMPLING FOR MODEL 'poisson' NOW (CHAIN 1).
Iteration: 1 / 100 [ 1%] (Adapting)
Iteration: 10 / 100 [ 10%] (Adapting)
Iteration: 20 / 100 [ 20%] (Adapting)
Iteration: 30 / 100 [ 30%] (Adapting)
Iteration: 40 / 100 [ 40%] (Adapting)
Iteration: 50 / 100 [ 50%] (Adapting)
Iteration: 60 / 100 [ 60%] (Sampling)
Iteration: 70 / 100 [ 70%] (Sampling)
Iteration: 80 / 100 [ 80%] (Sampling)
Iteration: 90 / 100 [ 90%] (Sampling)
Iteration: 100 / 100 [100%] (Sampling)

[1] "lambda, lp__, log_prob_from_R, diff"
[1] "1.78793243119014" "0.536307112310427" "-1.25545235691763" "1.79175946922805"
[1] "0.773767329031482" "-1.7997035653141" "-3.59146303454216" "1.79175946922806"
[1] "3.72632350977326" "1.53536486367832" "-0.256394605549737" "1.79175946922805"
[1] "5.02324278265428" "1.43306000851938" "-0.358699460708681" "1.79175946922806"
[1] "1.92381455035318" "0.693425289837281" "-1.09833417939077" "1.79175946922805"
[1] "5.03502743963486" "1.43064846410193" "-0.361111005126123" "1.79175946922805"
[1] "6.99976027955474" "0.783743331209334" "-1.00801613801872" "1.79175946922806"
[1] "3.9970547866651" "1.54517635966183" "-0.246583109566227" "1.79175946922806"
[1] "3.26998279484947" "1.46915609867612" "-0.322603370551938" "1.79175946922805"
[1] "1.67638108841113" "0.390168335322012" "-1.40159113390604" "1.79175946922806"
>

So everything lines up once Daniel sorted out the Jacobian.
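
The constant diff column can also be checked independently (a quick Python sketch, as an aside): with y = 3, the sampling statement drops -log(3!), so lp__ sits above the full-density calculation by log(6):

```python
from math import lgamma, log

# Bob's diff column is constant at 1.79175946922805...
print(log(6), lgamma(3 + 1))  # both equal log(3!), about 1.791759
```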

Thanks again for bringing this up. We were forgetting that the
explicit poisson_log() would behave differently w.r.t. constants
than calling the sampling statement.

- Bob