Wrong results in regression with log link function


Wayne Zhang

Aug 5, 2016, 7:15:43 PM
to Stan users mailing list
I am running a simple regression model with a log link function: y ~ normal(exp(X*beta), sigma). 

For this simple model, Stan seems to give very strange results. In the example below, I simulated data using the coefficients
beta <- c(1, -0.1, 0.5, 0.2, 1.2, -1.2). 

But Stan gives the following estimates (the last parameter is not estimated properly):
> get_posterior_mean(fit, "beta")[, 4]
      beta[1]       beta[2]       beta[3]       beta[4]       beta[5]       beta[6] 
 8.778758e-01  3.400380e-02  5.213096e-01  2.996850e-01  1.313288e+00 -1.136512e+11 

In comparison, the standard glm function gives the right answer:
> coef(glm(y~-1+X, family = gaussian(link = "log")))
X(Intercept)  Xfactor(x)2  Xfactor(x)3  Xfactor(x)4  Xfactor(x)5  Xfactor(x)6 
  0.88147641   0.03564664   0.52207573   0.30118363   1.31056680  -1.38623379 


Any suggestions about what the problem might be and how to fix it? Thanks.


###############
# R code
###############

library(rstan)
set.seed(11)

# true coefficients and residual standard deviation
beta <- c(1, -0.1, 0.5, 0.2, 1.2, -1.2)
sigma <- 0.5
N <- 40
J <- length(beta)

# factor with J levels, 5 consecutive observations per level, recycled to length N
x <- gl(J, k = 5, length = N)
# design matrix: intercept plus treatment-contrast dummies for levels 2..J
X <- model.matrix(~ factor(x))
y <- rnorm(N, exp(X %*% beta), sigma)

da_stan <- list(N = nrow(X), J = ncol(X), 
                y = y,
                X = X)

fit <- stan(file = 'glm.stan', data = da_stan, chains = 3)
get_posterior_mean(fit, "beta")[, 4]



###############
# glm.stan
###############
data {
  int<lower=0> N; // number of observations
  int<lower=0> J; // number of predictors
  vector[N] y;
  matrix[N, J] X;
}
parameters {
  vector[J] beta;
  real<lower=0> sigma;
}

transformed parameters {
  vector[N] mu;
  mu = exp(X * beta);
}

model {
  y ~ normal(mu, sigma);
}

Ben Goodrich

Aug 5, 2016, 7:21:43 PM
to Stan users mailing list
On Friday, August 5, 2016 at 7:15:43 PM UTC-4, Wayne Zhang wrote:
I am running a simple regression model with a log link function: y ~ normal(exp(X*beta), sigma). 

In this case, it is better to model the log of y than to transform the linear predictor, particularly with no prior information on sigma.
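A minimal sketch of that alternative (my sketch, not Ben's code; it assumes all y are positive so the log is defined): modeling y as lognormal is the same as fitting a linear regression to log(y), and it keeps the sampler on a well-behaved unconstrained scale.

```stan
data {
  int<lower=0> N;
  int<lower=0> J;
  vector<lower=0>[N] y;  // must be positive to take the log
  matrix[N, J] X;
}
parameters {
  vector[J] beta;
  real<lower=0> sigma;
}
model {
  // equivalent to: log(y) ~ normal(X * beta, sigma)
  y ~ lognormal(X * beta, sigma);
}
```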

Ben

Wayne Zhang

Aug 5, 2016, 7:26:52 PM
to stan-...@googlegroups.com
Thank you for your reply. I could use lognormal, but that is not what I am after. 

The current regression with a log link is a legitimate model, and I would like to know why Stan does not yield reasonable estimates. Any ideas? 

--
You received this message because you are subscribed to a topic in the Google Groups "Stan users mailing list" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stan-users/9s9yhiHIbpI/unsubscribe.
To unsubscribe from this group and all its topics, send an email to stan-users+unsubscribe@googlegroups.com.
To post to this group, send email to stan-...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Krzysztof Sakrejda

Aug 5, 2016, 8:45:39 PM
to Stan users mailing list
This model produces divergences after warmup and large R-hat values (particularly for beta[6] and sigma), so it shows no sign of having converged, or even of being a workable parameterization for this data set. Why the HMC integrator diverges I can only guess, but it is probably related to (as Ben suggests) the exponential transform combined with very little data. On the beta[6] < 0 side there is not enough data to rule out very small values of exp(beta[6]), so in the unconstrained space the sampler wanders off toward very negative numbers. If you look at the pairs plot for mu[26] (which is exp(beta[1] + beta[6]) for this design) in my run of your script, you can see that the divergences occur when mu is close to zero.

It may be a legitimate model, but it needs to be reparameterized to give HMC (or almost any other sampler) a fair chance of sampling correctly.

Krzysztof

Bob Carpenter

Aug 6, 2016, 3:25:51 PM
to stan-...@googlegroups.com
This is not a problem with Stan; the problem is in the posterior.
Bayesian posteriors aren't just fuzzy maximum likelihood estimates:
they weigh the prior against the likelihood. Your likelihood and
improper flat prior are the problem, and you can see that analytically
without any software.

X[, 6] (the dummy for level 6) is zero everywhere except rows 26:30:

X[1:25, 6] = 0
X[31:40, 6] = 0

And in the rows where X[, 6] equals 1, all of the non-intercept predictors are 0, so mu for those rows is exp(beta[1] + beta[6]):

X[26:30, 2:5] = 0
X[26:30, 6] = 1

And the relevant outcomes are within two standard deviations (sigma = 0.5)
of zero:

> y[26:30]
[1] 0.8223102 0.7249307 0.4358804 0.7082023 0.3269365

Here's where you run into trouble: the prior on beta is improper
and you have an exp() link function. As beta[6] goes to -infinity,
exp(beta[1] + beta[6]) goes to 0, so the likelihood for those five
observations flattens out to a constant. Because your prior is
improper, that essentially flat likelihood tail ends up carrying most
of your probability mass out at very negative values.

Stan can't sample effectively from such nearly improper posteriors,
and it tells you so through the divergence diagnostics and R-hat
values, which clearly indicate non-convergence.

If you add a normal(0, 5) prior on beta, it will pull the posterior
in far enough that the true values land inside the 95% posterior
intervals. But because the likelihood is so flat, the prior will
have a very strong effect.
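For concreteness, that change to the model block of glm.stan would look like this (the normal(0, 5) scale is just the example value above, not a general recommendation):

```stan
model {
  beta ~ normal(0, 5);    // proper prior keeps beta[6] from drifting off to -infinity
  y ~ normal(mu, sigma);  // mu = exp(X * beta) from transformed parameters
}
```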

- Bob

Michael Betancourt

Aug 6, 2016, 6:30:05 PM
to stan-...@googlegroups.com

Bayesian posteriors aren't just fuzzy maximum likelihood estimates.

Wonderful elevator pitch!