Clarification of logNormal sampling function


Tom Wallis

Jan 2, 2013, 10:16:49 AM
to stan-...@googlegroups.com
I'm confused about your parameterisation of the log-normal distribution (manual page 163). As I understand it, this will return densities for y (in linear units), and requires inputs of mu (in log units) and sigma.

For example, if I wanted to specify a log-Normal hyperprior on a variance parameter (which can't be less than zero) with a mean of 2 and a standard deviation of 1 log unit, I would write:

y ~ lognormal(log(2),1)

and this would return y in linear (not log) units. I can then plug y into a lower-level sampling of something like:

expected_value ~ normal(0,y)

Is this correct?

I plugged the formula (p. 163) into R to play around with it, and this gives slightly different values for the mean. Specifically, the mean on semilog axes doesn't always sit at the expected value of 2, but seems to depend on the sigma value such that as sigma gets larger the mean gets smaller. This seems to be caused by the (1 / y) term in the equation you use, which when removed causes the function to behave as I expect (with a mean of two no matter the sigma). What's the reason for this term? Can I use the logNormal function as I outlined above?

Thanks

Tom Wallis

Bob Carpenter

Jan 2, 2013, 1:41:40 PM
to stan-...@googlegroups.com


On 1/2/13 10:16 AM, Tom Wallis wrote:
> I'm confused about your parameterisation of the log-normal distribution (manual page 163).

For the Stan parameterizations, we just followed the appendix of
Gelman et al.'s "Bayesian Data Analysis" book.

It's similar to the parameterization used on the Wikipedia page,
except that it uses the standard deviation (sigma) rather than
the variance (sigma^2) as a parameter:

http://en.wikipedia.org/wiki/Log-normal_distribution

> As I understand it, this will
> return densities for y (in linear units), and requires inputs of mu (in log units) and sigma.

Log normals are also worked through as an example in
section 11.2 of the manual, on changes of variables.

The idea is that y has a lognormal(mu,sigma) distribution
if log(y) has a normal(mu,sigma) distribution.
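That relationship between the two densities can be checked numerically. The following Python sketch (my own illustration, not Stan code; the function names are mine) writes the lognormal density as the normal density at log(y) times the Jacobian factor 1/y, and compares it to the textbook lognormal density written out directly:

```python
import math

# Normal density, written out from the standard formula.
def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Lognormal density via change of variables: the normal density
# evaluated at log(y), times the Jacobian 1/y of y -> log(y).
def lognormal_pdf(y, mu, sigma):
    return normal_pdf(math.log(y), mu, sigma) / y

# Spot check against the lognormal density written out directly.
y, mu, sigma = 1.7, math.log(2.0), 1.0
direct = math.exp(-0.5 * ((math.log(y) - mu) / sigma) ** 2) \
    / (y * sigma * math.sqrt(2.0 * math.pi))
```

The two agree to floating-point precision, which is exactly the (1/y) term the original question asked about.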

> For example, if I wanted to specify a log-Normal hyperprior on a variance parameter (which can't be less than zero) with
> a mean of 2 and a standard deviation of 1 log unit, I would write:
>
> y ~ lognormal(log(2),1)
>
> and this would return y in linear (not log) units.

First a point of clarification. lognormal(log(2),1)
doesn't return anything. It's just the name of a distribution
in Stan. What happens when the above statement is executed is
the total log probability gets incremented as follows:

lp__ += lognormal_log(y,log(2),1);

You can see this in the .cpp file output by stanc.

The language for talking about units here is confusing at
the best of times.

> I can then plug y into a lower-level sampling of something like:
>
> expected_value ~ normal(0,y)

The expected value of normal(0,y) is 0 so I'm not sure what this
notation is supposed to mean.

> Is this correct?
>
> I plugged the formula (p. 163) into R to play around with it, and this gives slightly different values for the mean.
> Specifically, the mean on semilog axes doesn't always sit at the expected value of 2, but seems to depend on the sigma
> value such that as sigma gets larger the mean gets smaller. This seems to be caused by the (1 / y) term in the equation
> you use, which when removed causes the function to behave as I expect (with a mean of two no matter the sigma). What's
> the reason for this term? Can I use the logNormal function as I outlined above?

See the Stan manual about changes of variables or
read the relevant section of any respectable math stats book,
such as DeGroot and Schervish or Larsen and Marx. Or
on the Wikipedia:

http://en.wikipedia.org/wiki/Probability_density_function#Dependent_variables_and_change_of_variables

Also, expectations may not be what you expect because of
the curvature of the log function. In thinking about this issue,
you want to study Jensen's inequality:

http://en.wikipedia.org/wiki/Jensen's_inequality
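A quick Monte Carlo sketch (my own illustration, not from the thread) makes Jensen's inequality concrete here: because log is concave, E[log y] <= log E[y]. For y ~ lognormal(mu, sigma), E[log y] = mu exactly, while log E[y] = mu + sigma^2/2, so the gap grows with sigma:

```python
import math
import random

random.seed(0)
mu, sigma = math.log(2.0), 1.0

# Draw lognormal samples by exponentiating normal draws.
ys = [math.exp(random.gauss(mu, sigma)) for _ in range(200000)]

# Estimates mu (the mean on the log scale).
mean_of_log = sum(math.log(y) for y in ys) / len(ys)

# Estimates mu + sigma^2 / 2 (the log of the mean on the linear scale).
log_of_mean = math.log(sum(ys) / len(ys))
```

This is why the "mean" appeared to drift as sigma grew: the mean of log(y) and the log of mean(y) are different quantities.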

- Bob

Tom Wallis

Jan 2, 2013, 1:56:38 PM
to stan-...@googlegroups.com
Thanks for your detailed reply Bob, that's helpful.

What I meant by the line "expected_value ~ normal(0,y)" was just that the parameter "expected_value" is normally distributed with a mean of 0 and a standard deviation of y. I should have named the parameter something else to avoid the confusion. Equivalently:

beta ~ normal(0, y).

Bob Carpenter

Jan 2, 2013, 2:04:24 PM
to stan-...@googlegroups.com


On 1/2/13 1:56 PM, Tom Wallis wrote:
> Thanks for your detailed reply Bob, that's helpful.
>
> What I meant by the line "expected_value ~ normal(0,y)" was just that the parameter "expected_value" is normally
> distributed with a mean of 0 and a standard deviation of y. I should have named the parameter something else to avoid
> the confusion.

Yeah, I wouldn't recommend "expected_value" as the name of
a parameter (unless it is the expected value of something,
but even then I wouldn't recommend it).

> Equivalently:
>
> beta ~ normal(0, y).

Right -- if this is the only statement in the model containing beta
and y is a constant (not itself a parameter),
then beta will have a posterior mean of 0 and posterior
deviation of y.

- Bob

Matt Hoffman

Jan 2, 2013, 2:10:11 PM
to stan-...@googlegroups.com
Hi Tom,

The mean of a log-normal distribution is a function of both mu and
sigma; specifically E[y] = exp(mu + 0.5*sigma^2), which is only equal
to exp(mu) when sigma goes to 0:
http://en.wikipedia.org/wiki/Log-normal_distribution
The 1/y term comes from transforming from the unconstrained
space of log(y) to the positive space of y. If you get rid of it
then you won't have a properly normalized distribution -- effectively,
you're sampling from a distribution that's proportional to a
log-normal with a different mu.

If you want to specify the mean directly, you can say something like
mu = log(ymean) - 0.5 * sigma * sigma;
where ymean is the desired mean for y. That approach is much less
likely to lead to trouble than random fiddling with density functions.
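A quick numeric check of that adjustment (my own sketch): setting mu = log(ymean) - 0.5*sigma^2 makes the lognormal's linear-scale mean come out at ymean, since E[y] = exp(mu + 0.5*sigma^2):

```python
import math
import random

random.seed(1)
ymean, sigma = 2.0, 1.0

# Matt's adjustment: shift mu so that E[y] = ymean.
mu = math.log(ymean) - 0.5 * sigma * sigma

# Lognormal draws via exponentiated normal draws.
ys = [math.exp(random.gauss(mu, sigma)) for _ in range(500000)]
sample_mean = sum(ys) / len(ys)  # should be close to ymean = 2.0
```

Without the adjustment (mu = log(2) directly), the sample mean would instead come out near 2 * exp(0.5) ≈ 3.30.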

Best,
Matt