On 4/27/13 12:18 PM, Carlos wrote:
> I built rep_matrix after building a bunch of models myself and
> getting frustrated that we didn't have it!
> I'm going to add indexing by array for subslicing soon
> too, and also block operations for vectors.
>
>
> That would be amazing! Also, I recommend implementing the vec and Kronecker operators... :)
>
> We probably won't be getting to discrete variable sampling
> any time soon. We know how to do it inefficiently, but doing
> it efficiently is going to be a lot of work from where we're
> at.
>
>
> I imagined that... I guess it's difficult to implement (sigh)
>
> That's essentially what we suggest in the mixture modeling
> chapter in the manual. Marginalize out the actual binary
> variable and you're often left with just such a probability
> in (0,1) instead of the binary variable. Not sure if that'll
> work for the model you care about. And I don't really have
> time to dig the details out of the papers you cite. If you
> can provide an example of what the model would look like in
> Stan if we did have binary parameters, it should be easy
> to tell.
>
>
> Sooo, let me consider two examples. The first one would be a stochastic search, where you have a bunch of variables,
> and you do not know which variables should enter as regressors in a model, for example. Imagine that we have some
> data, Y, and we consider 3 regressors, X1, X2, and X3, with associated coefficients b1, b2, and b3. Then, we would like to
> model,
>
> |
> Y ~ normal(d1*b1*X1 + d2*b2*X2 + d3*b3*X3, sigma)
> |
This is the one we haven't figured out how to do. It's
easy enough in a low-dimensional case, but the marginalization
grows exponentially with the number of possibly-zero variates.
For example, with:
Y ~ normal(d1 * b1 * X1 + d2 * b2 * X2,
sigma);
you can marginalize out the indicator variables d1 and d2
based on their probabilities of being included, p_d1 and p_d2.
Hang on, it's ugly.
parameters {
  real<lower=0,upper=1> p_d1;  // inclusion probability for X1
  real<lower=0,upper=1> p_d2;  // inclusion probability for X2
  real b1;
  real b2;
  real<lower=0> sigma;
}
model {
  real log_p_12;  // log Pr[d1 = 1, d2 = 1]
  real log_p_1;   // log Pr[d1 = 1, d2 = 0]
  real log_p_2;   // log Pr[d1 = 0, d2 = 1]
  real log_p_0;   // log Pr[d1 = 0, d2 = 0]
  log_p_12 <- log(p_d1) + log(p_d2);
  log_p_1 <- log(p_d1) + log1m(p_d2);
  log_p_2 <- log1m(p_d1) + log(p_d2);
  log_p_0 <- log1m(p_d1) + log1m(p_d2);
  for (n in 1:size(Y)) {
    real log_p[4];
    log_p[1] <- log_p_12 + normal_log(Y[n], b1 * X1[n] + b2 * X2[n], sigma);
    log_p[2] <- log_p_1 + normal_log(Y[n], b1 * X1[n], sigma);
    log_p[3] <- log_p_2 + normal_log(Y[n], b2 * X2[n], sigma);
    log_p[4] <- log_p_0 + normal_log(Y[n], 0, sigma);
    lp__ <- lp__ + log_sum_exp(log_p);
  }
  ...
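To see how this generalizes (and why it blows up), here's a rough Python sketch of the same marginalization for K candidate predictors; it's my own illustration, not Stan, and all the names in it are mine. Each observation's likelihood becomes a mixture over all 2^K inclusion patterns:

```python
import itertools
import math

def normal_logpdf(y, mu, sigma):
    # log density of Normal(y | mu, sigma)
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - 0.5 * ((y - mu) / sigma) ** 2)

def marginal_loglik(Y, X, b, p_d, sigma):
    """Log likelihood with the K binary inclusion indicators summed out.

    X is a list of K predictor columns, b the K coefficients, and
    p_d the K inclusion probabilities.  There are 2^K mixture terms
    per observation, which is why this doesn't scale in K.
    """
    K = len(b)
    total = 0.0
    for n, y in enumerate(Y):
        terms = []
        for d in itertools.product([0, 1], repeat=K):  # all 2^K patterns
            log_prior = sum(math.log(p_d[k]) if d[k] else math.log1p(-p_d[k])
                            for k in range(K))
            mu = sum(d[k] * b[k] * X[k][n] for k in range(K))
            terms.append(log_prior + normal_logpdf(y, mu, sigma))
        # log_sum_exp over the 2^K components
        m = max(terms)
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total
```

With K = 2 the inner loop enumerates exactly the four terms that the Stan fragment above writes out by hand.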
where the built-in function log_sum_exp is defined as:
log_sum_exp(xs) == log(sum(exp(xs)));
It's faster and more arithmetically stable than the direct implementation.
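To see why the stable version matters, here's a quick Python illustration (my own sketch, not Stan's implementation): the direct form underflows for very negative log probabilities, while factoring out the max first does not.

```python
import math

def log_sum_exp(xs):
    # stable version: factor out the max before exponentiating
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

xs = [-1000.0, -1001.0]
# direct form: exp(-1000) underflows to 0.0, so log(sum(...)) = log(0)
naive_sum = sum(math.exp(x) for x in xs)  # 0.0 in double precision
stable = log_sum_exp(xs)                  # approx -1000 + log(1 + e^-1)
```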
There's not a good way to vectorize over the data items Y[n],
so it's really inefficient on a per-iteration basis compared
to doing discrete sampling. Having said that, it should still
work in Stan, and it may wind up giving you higher effective
sample sizes per iteration than sampling the discrete variable.
It would be possible to calculate all of this in a loop, but
I thought it'd be easier to understand blown out. It won't
be any more efficient in a loop.
For some other alternatives, see the following blog post
and discussion:
http://andrewgelman.com/2013/03/18/tibshirani-announces-new-research-result-a-significance-test-for-the-lasso/
> d1, d2 and d3 are binary variables that turn on or off the contribution of the variables X1, X2 and X3, respectively. If
> we model them as Bernoulli variables, then the posterior mean of each di gives us the probability that the
> variable Xi should be included in the model.
>
> The second example is a state-space model. We can consider for example, a stochastic volatility model, with
> |
> Y[n] ~ normal(mu, exp(h[n]/2))
> h[n] = h[n-1] + J*hnoise
> hnoise ~ normal(0, sigmah)
> |
> In this case, J is again a binary variable. If J=0, then h remains the same as in the previous period, while if J=1,
> then (log) volatility changes by the amount hnoise.
This one's much easier because the mixture is at the data item level.
But do you really mean it doesn't move at all if J = 0? If so, then
you get this:
parameters {
  real<lower=0,upper=1> lambda;  // probability of a volatility jump
  real<lower=0> sigmah;
  ...
}
model {
  for (n in 2:N) {
    if (h[n] == h[n-1])
      lp__ <- lp__ + log_sum_exp(log1m(lambda),
                                 log(lambda) + normal_log(h[n], h[n-1], sigmah));
    else
      lp__ <- lp__ + log(lambda) + normal_log(h[n], h[n-1], sigmah);
  }
  ...
}
where log_sum_exp(x,y) = log(exp(x) + exp(y)).
I think that captures the model, but someone should double-check
my work here. The above model has a log likelihood equivalent
to:
log PROD_{n in 2:N} (
  lambda * Normal(h[n] | h[n-1], sigmah)
  + (1 - lambda) * I[h[n] == h[n-1]]
)
where I[.] is 1 if its condition holds and 0 otherwise.
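As a sanity check on that equivalence, here's a small Python sketch (my own, with made-up names) computing the same per-step mixture: a point mass at h[n-1] with probability 1 - lambda, plus a Normal jump with probability lambda, combined in log space exactly like the Stan loop.

```python
import math

def normal_logpdf(y, mu, sigma):
    # log density of Normal(y | mu, sigma)
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - 0.5 * ((y - mu) / sigma) ** 2)

def jump_model_loglik(h, lam, sigmah):
    """Log likelihood of the spike-plus-normal volatility path.

    Each step either stays exactly put (probability 1 - lam) or moves
    by a Normal(0, sigmah) jump (probability lam), mirroring the Stan
    loop above.
    """
    total = 0.0
    for n in range(1, len(h)):
        move = math.log(lam) + normal_logpdf(h[n], h[n - 1], sigmah)
        if h[n] == h[n - 1]:
            # both mixture components have support here: log_sum_exp
            stay = math.log1p(-lam)
            m = max(stay, move)
            total += m + math.log(math.exp(stay - m) + math.exp(move - m))
        else:
            # the point mass at h[n-1] contributes nothing off its support
            total += move
    return total
```

For a path that never moves, each step contributes log((1 - lambda) + lambda * Normal(h[n] | h[n-1], sigmah)), matching the product formula above.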
- Bob