how to master mixture models?


Martin Šmíra

unread,
Sep 17, 2013, 10:41:42 AM9/17/13
to stan-...@googlegroups.com
Hello Stan users, 

I'm bringing another interesting model from EJ's book (https://webfiles.uci.edu/mdlee/BB_Free.pdf). It is the latent mixture model Exam_2 on page 79.
The purpose of the model is to separate exam takers who were just guessing from those who have some knowledge. 

My model (below) is working and doing the job, but I'm not sure about some parts of the code, especially in the generated quantities block - whether I transformed lp__ for the mixture in the right way. The mixing actually differs slightly from the mixing I get in Jags. 

Is there a better way to extract the mixing proportions? And do you have any other suggestions for improving the model? 

I've also encountered one minor error when I tried to put a truncation on the sampling of phi: phi ~ normal(mu,sigma)T[0,1]. It states: no matches for function name="normal_cdf" ... lower bound in truncation type does not match sampled variate in distribution's type. Why is that? 

model ="
# Exam Scores
data { 
  int<lower=0> p;
  int<lower=0> k[p];
  int<lower=0> n;
}
parameters {
  vector<lower=0,upper=1>[p] phi;
  simplex[2] mix[p];
  real<lower=.5,upper=1> mu;
  real<lower=0> lambda;
}
transformed parameters {
  real<lower=0> sigma;
  matrix[p,2] lp_for_z;

  sigma <- 1 / sqrt(lambda);
  for (i in 1:p) {
    lp_for_z[i,1] <- exp(log(mix[i,1]) + binomial_log(k[i],n,phi[i]));
    lp_for_z[i,2] <- exp(log(mix[i,2]) + binomial_log(k[i],n,.5));
  }
}
model {
  mu ~ beta(1,1)T[.5,1];
  lambda ~ gamma(.001,.001);
  phi ~ normal(mu,sigma);  # !!!

  for (i in 1:p) {
    lp__ <- lp__ + log(sum(lp_for_z));
  }
}
generated quantities {
  vector[p] prob;
  int<lower=0,upper=1> z[p];
  vector<lower=0,upper=1>[p] theta;
  
  for (i in 1:p) {
    prob[i] <- lp_for_z[i,1] / (lp_for_z[i,1] + lp_for_z[i,2]);
    z[i] <- bernoulli_rng(prob[i]);
    theta[i] <- (z[i] == 0) * .5 + (z[i] == 1) * phi[i];
  }  
}"

k = c(21,17,21,18,22,31,31,34,34,35,35,36,39,36,35)
p = length(k)  # number of people
n = 40  # number of questions

data = list(p=p, k=k, n=n) 

samples = stan(model_code = model, data = data,
               iter = 10000, chains = 1)

print(samples)

Best regards,

Martin

Bob Carpenter

unread,
Sep 17, 2013, 1:58:05 PM9/17/13
to stan-...@googlegroups.com


On 9/17/13 10:41 AM, Martin Šmíra wrote:
> Hello Stan users,
>
> I'm bringing another interesting model from EJ's book (https://webfiles.uci.edu/mdlee/BB_Free.pdf). It is the latent mixture
> model Exam_2 on page 79.

I've been meaning for ages to write a mixture model like this
for Mechanical Turkers who submit completely random results.
It's a little more complicated in that there's no official answer
key, but not much. It's why I recommended checking out the
Dawid and Skene (1979) model. Maybe Lee and Wagenmakers get to it later.

> The purpose of the model is to separate exam takers who were just guessing from those who have some knowledge.

Why use a (0,1)-truncated normal rather than a Beta
distribution for a chance-of-success parameter?

Even if I did use a Gaussian, I wouldn't recommend
the gamma(0.001,0.001) prior.

> My model (below) is working and doing the job, but I'm not sure about some parts of the code, especially in the generated
> quantities block - whether I transformed lp__ for the mixture in the right way.

You need the Jacobian for the 1/sqrt(x) transform.

> The mixing actually differs slightly from the mixing
> I get in Jags.

Not surprising. You're also missing the index on your log prob
sums. But I'm recommending just putting it in the model and
using our log_sum_exp op. See below.

> Is there a better way to extract the mixing proportions? And do you have any other suggestions for improving the model?
>
> I've also encountered one minor error when I was trying to put truncation on sampling of phi: phi ~
> normal(mu,sigma)T[0,1]. Stating : no matches for function name="normal_cdf" ... lower bound in truncation type does not
> match sampled variate in distribution's type . Why is that?

Truncation doesn't play very nicely with vectorization
in the way we're evaluating. We need a better error message.

Specific comments on the code below.

>
> model ="
> # Exam Scores
> data {
>   int<lower=0> p;
>   int<lower=0> k[p];
>   int<lower=0> n;
> }
> parameters {
>   vector<lower=0,upper=1>[p] phi;
>   simplex[2] mix[p];
>   real<lower=.5,upper=1> mu;
>   real<lower=0> lambda;
> }
> transformed parameters {
>   real<lower=0> sigma;
>   matrix[p,2] lp_for_z;
>
>   sigma <- 1 / sqrt(lambda);
>   for (i in 1:p) {
>     lp_for_z[i,1] <- exp(log(mix[i,1]) + binomial_log(k[i],n,phi[i]));
>     lp_for_z[i,2] <- exp(log(mix[i,2]) + binomial_log(k[i],n,.5));
>   }

See below.


> }
> model {
>   mu ~ beta(1,1)T[.5,1];
>   lambda ~ gamma(.001,.001);

This is where you need the Jacobian adjustment.

log | d/d.lambda 1 / sqrt(lambda) |

= log(1/2 lambda^{-3/2}) // lambda > 0

= log(1/2) - 3/2 log(lambda)

and we can drop constants, so that's

lp__ <- lp__ - 1.5 * log(lambda);
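You can check that algebra numerically if you like. Here's a small Python sketch (separate from the Stan model; the lambda values are arbitrary) comparing the closed form against a finite-difference derivative:

```python
# Numerical check of the Jacobian algebra for sigma = 1/sqrt(lambda):
# log |d/d.lambda lambda^{-1/2}| should equal log(1/2) - 1.5*log(lambda).
import math

def f(lam):                   # the transform: sigma = 1 / sqrt(lambda)
    return lam ** -0.5

def log_abs_fprime(lam):      # closed form derived above
    return math.log(0.5) - 1.5 * math.log(lam)

for lam in (0.3, 1.0, 4.2):
    h = 1e-6
    fd = (f(lam + h) - f(lam - h)) / (2 * h)   # central difference
    assert abs(math.log(abs(fd)) - log_abs_fprime(lam)) < 1e-6
```

The -1.5 * log(lambda) term is what survives after dropping the constant log(1/2).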

But you should really use a different prior altogether
here. The 0.0001 business is just a (failed)
attempt to provide a "non-informative" prior.
The simplest thing to do is just constrain sigma
to a (reasonable) interval (not [0,10000] for this
model, where the sd can't be large because
of the truncation!).

>   phi ~ normal(mu,sigma);  # !!!

Here you need to loop over p.

for (i in 1:p)
  phi[i] ~ normal(mu,sigma) T[0,1];

I think it'd also make sense (actually not --- the whole
use of normal here is funny) to truncate to [0.5,1].

But really, just use a Beta prior!
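For intuition about what T[0,1] would do here: it divides the normal density by the probability mass in [0,1], so the truncated density integrates to one. A quick numeric check in Python (a sketch, not Stan's actual implementation; mu and sigma are arbitrary):

```python
# Truncated normal on [0,1]: p(x) = normal_pdf(x) / (Phi(1) - Phi(0)),
# verified here by midpoint integration over the interval.
import math

def norm_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def norm_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

mu, sigma = 0.75, 0.2
mass = norm_cdf(1, mu, sigma) - norm_cdf(0, mu, sigma)   # normalizing constant
n = 10000
total = sum(norm_pdf((i + 0.5) / n, mu, sigma) for i in range(n)) / n / mass
assert abs(total - 1) < 1e-4   # truncated density integrates to 1
```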

>
>   for (i in 1:p) {
>     lp__ <- lp__ + log(sum(lp_for_z));

You need the index lp_for_z[i] here and it should be right.
I assume this is a typo given that you had the loop.

The transformed params you had are these, but I'd just
get rid of them altogether:

> for (i in 1:p) {
>   lp_for_z[i,1] <- exp(log(mix[i,1]) + binomial_log(k[i],n,phi[i]));
>   lp_for_z[i,2] <- exp(log(mix[i,2]) + binomial_log(k[i],n,.5));
> }

Instead, replace the above loop with

for (i in 1:p)
  lp__ <- lp__ + log_sum_exp(log(mix[i,1]) + binomial_log(k[i],n,phi[i]),
                             log(mix[i,2]) + binomial_log(k[i],n,.5));

It's more arithmetically stable. You can assign the inner terms
to a local variable if you think it's clearer.
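The stability comes from subtracting the maximum before exponentiating. A Python sketch of the same trick (Stan's log_sum_exp is built in; this only illustrates why the naive version fails):

```python
# Naive log(exp(a) + exp(b)) underflows for large negative log probs;
# shifting by the max keeps everything in range.
import math

def log_sum_exp(a, b):
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

lp1, lp2 = -800.0, -801.0     # e.g. log(mix) + binomial log likelihood
# naive: math.log(math.exp(lp1) + math.exp(lp2)) -> log(0) -> domain error,
# because math.exp(-800) underflows to 0.0
print(log_sum_exp(lp1, lp2))  # finite, about -799.687
```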

>   }
> }
> generated quantities {
>   vector[p] prob;
>   int<lower=0,upper=1> z[p];
>   vector<lower=0,upper=1>[p] theta;
>
>   for (i in 1:p) {
>     prob[i] <- lp_for_z[i,1] / (lp_for_z[i,1] + lp_for_z[i,2]);
>     z[i] <- bernoulli_rng(prob[i]);
>     theta[i] <- (z[i] == 0) * .5 + (z[i] == 1) * phi[i];
>   }
> }"

You can do this, but it's more efficient for downstream inference
if you just work with the expectations for z. (See the Rao-Blackwell
theorem.)
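To make the Rao-Blackwell point concrete, here's a small Python sketch. The prob and phi values are made up and held fixed, which is not how posterior draws behave; it just illustrates that the expectation-based estimate has lower Monte Carlo variance than averaging sampled z values:

```python
# Estimating E[theta] two ways: via sampled z ~ Bernoulli(prob), and
# via the expectation (1 - prob)*0.5 + prob*phi directly.
import random
import statistics

random.seed(1)
prob, phi = 0.8, 0.9          # hypothetical mixture prob and success rate

draws_sampled, draws_rb = [], []
for _ in range(2000):
    z = 1 if random.random() < prob else 0
    draws_sampled.append(0.5 if z == 0 else phi)     # uses sampled z
    draws_rb.append((1 - prob) * 0.5 + prob * phi)   # uses expectation of z

# Same target, but the Rao-Blackwellized draws have no extra sampling noise.
print(statistics.pvariance(draws_sampled) > statistics.pvariance(draws_rb))
```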

- Bob

Martin Šmíra

unread,
Sep 17, 2013, 7:10:47 PM9/17/13
to stan-...@googlegroups.com
Thank you for your answer, Bob.

It seems that fixing the typo did a lot; the results are more consistent with Jags now. 


  -  Jacobian - when is it actually used? When a sampled variable is transformed? So every time I did this 1/sqrt(lambda) with gamma. Can I use invSqrtGamma in a simpler way? 
And in this case you wrote that I should subtract: lp__ <- lp__ - 1.5 * log(lambda); but where should I implement it? I tried it on a separate line at the end of the model block, but it resulted just in 10x longer sampling with weird output. And is this Jacobian adjustment done automatically in Jags/BUGS?

  -  Rao-Blackwell - I don't really get it. What change does it suggest I make? Instead of: 

z[i] <- bernoulli_rng(prob[i]);
theta[i] <- (z[i] == 0) * .5 + (z[i] == 1) * phi[i];

do: 

theta[i] <- (1 - prob[i]) * 0.5 + prob[i] * phi[i]  ???  Or should I use the mean from the sample?

  -  Moving the mixing computation into the model block, as you suggested, wouldn't let me gather lp__ for the separate mixtures. That's why I put it in the transformed parameters block and didn't use log_sum_exp. Or could I get the mixing proportions some other way?

  -  About the priors: I'm just trying to implement the models as they are in the book. But I'll forward it to EJ.


Thank you once more!

Martin

Eric-Jan Wagenmakers

unread,
Sep 17, 2013, 7:35:49 PM9/17/13
to stan-...@googlegroups.com
Hi Bob

Of course you are right, the beta is much better. In our defense, we do state that very clearly in the book, and, a few examples later, we have hierarchical mixture models ("malingering" and "Alzheimers") that do use beta distributions. Thanks again for all your help. I hope that some of these examples will be useful for Stan users in the future. At some point it might be nice to develop a compendium of applications. In my experience, it is much easier to learn bugs/jags/stan when you can work from existing examples.

Cheers,
E.J.

Bob Carpenter

unread,
Sep 17, 2013, 8:48:47 PM9/17/13
to stan-...@googlegroups.com
You definitely want to use log_sum_exp, too.
It's much more resistant to underflow and overflow
and also faster.

On 9/17/13 7:10 PM, Martin Šmíra wrote:
> Thank you for your answer Bob.
>
> It seems that the typo did a lot, results are more consistent with Jags now.
>
>
> - Jacobian - when is it actually used? When a sampled variable is transformed?

It needs to get added to the log probability once per
variable transform, not once every time you use it. It's
what lets you do the transform soundly.

> So every time I did this
> 1/sqrt(lambda) with gamma. Can I use invSqrtGamma in a simpler way?

I should actually think more rather than just responding.

Yes, the thing to do is to make tau the parameter,
sample

tau ~ inv_gamma(...);

and then make sigma the derived parameter

sigma <- 1 / sqrt(tau);

Then you don't need the Jacobian.

> And in this case you wrote that I should subtract lp__ <- lp__ - 1.5 * log(lambda); but where should I implement it? I
> tried it on a separate line at the end of the model block, but it resulted just in 10x longer sampling with weird output.

I'd put it in the transformed parameters block. I like
to use variables near where they're declared.

I think I did the Jacobian correctly there, but you may
want to check my work!

> And is this Jacobian adjustment done automatically in Jags/BUGS?

No. They just prohibit transforming on the LHS of a sampling
statement, so you have to do it the other way around.

> - Rao-Blackwell - I don't really get it. What change does it suggest I make? Instead of:
>
> z[i] <- bernoulli_rng(prob[i]);
> theta[i] <- (z[i] == 0) * .5 + (z[i] == 1) * phi[i];
>
> do:
>
> theta[i] <- (1 - prob[i]) * 0.5 + prob[i] * phi[i] ??? Or should I use the mean from the sample?

I meant that when it comes to doing inference, use the prob[i] values,
which are the expected values for z[i], rather than the z[i]
themselves. It's more efficient in an effective sample size sense.

> - Moving the mixing computation into the model block, as you suggested, wouldn't let me gather lp__ for the separate
> mixtures. That's why I put it in the transformed parameters block and didn't use log_sum_exp. Or could I get the mixing
> proportions some other way?

I'm not sure what you mean by wouldn't let you gather lp__.

You definitely want to use log_sum_exp.

>
> - About the priors: I'm just trying to implement the models as they are in the book. But I'll forward it to EJ.

He took it up with Andrew :-) Look for the summary on
Andrew's blog in a month or two (it's on a delay; he writes
so much).

- Bob

Bob Carpenter

unread,
Sep 17, 2013, 8:54:07 PM9/17/13
to stan-...@googlegroups.com


On 9/17/13 7:35 PM, Eric-Jan Wagenmakers wrote:
> Hi Bob
>
> Of course you are right, the beta is much better. In our defense, we do state that very clearly in the book,

I did see that.

> and, a few
> examples later, we have hierarchical mixture models ("malingering" and "Alzheimers") that do use beta distributions.

Having said this, I found the first hierarchical example in
Andrew et al.'s Bayesian Data Analysis to be confusing at
first because of the hierarchical prior for the beta distribution.
(The rat example, not to be confused with BUGS's Rats example.)

> Thanks again for all your help. I hope that some of these examples will be useful for Stan users in the future.

They're much better examples of discrete parameters than we
have elsewhere, which are mostly just mixtures of various
sorts for illustration purposes.

> At some
> point it might be nice to develop a compendium of applications.

We're happy to put links on our web site and in the
manual if you have the source available somewhere.

We'd even be happy to host the examples.

> In my experience, it is much easier to learn
> bugs/jags/stan when you can work from existing examples.

I agree, which is why I'm excited that Martin's doing this!

But that's also why I don't like the idea of putting in odd
models like the normal one, because users will just copy it.
We've already had problems with that in Stan where users copy
what we have in the manuals, where it was really written in
the manual for pedagogical purposes. So we're trying to be
more careful in the models themselves, because lots of users
just skip over all the text.

I really like the examples. Some of them are in areas where
I'm actively working, like data annotation.

- Bob

Martin Šmíra

unread,
Sep 18, 2013, 1:14:40 PM9/18/13
to stan-...@googlegroups.com
Bob,

I've tried it with log_sum_exp, but n_eff seems even smaller and the sampling speed is approximately the same as before. But then there is the issue that I can't estimate theta and z - and that is my goal.
That is the reason I had that part of the code in the transformed parameters block, so I could use the lp_for_z variable for estimating theta and z. The variables mix and phi are not giving me the information I need. 

I don't understand why there is a need for Jacobian adjustment while using:
lambda ~ gamma(0.001,0.001)
sigma <- 1 / sqrt(lambda)

and not in :
lambda ~ inv_gamma(0.001,0.001)
sigma <- sqrt(lambda)

Or is it equivalent?

I tried putting the Jacobian right after the statement for sigma, but it goes wrong.

Actually, the model which I stated in the first post works well (after the one typo is fixed), so I think I'll stick with it. I get the point about Rao-Blackwell, but we need the sampled "z" for an exercise in the book. 

Thanks,

Martin

Michael Betancourt

unread,
Sep 18, 2013, 1:23:20 PM9/18/13
to stan-...@googlegroups.com
Why can't you just sample the z_{i} in the generated quantities block using a categorical RNG? See section 31.5 in the manual.



Bob Carpenter

unread,
Sep 18, 2013, 1:34:54 PM9/18/13
to stan-...@googlegroups.com


On 9/18/13 1:14 PM, Martin Šmíra wrote:
> Bob,
>
> I've tried it with log_sum_exp, but n_eff seems even smaller and the sampling speed is approximately the same as before.

If everything's defined properly, it should produce exactly
the same samples when given the same seed because it's exactly
the same log probability function.

> But then
> there is the thing that I can't estimate theta and z - that is my goal.
> That is the reason I had the part of the code in transformed parameters block, so I could use the lp_for_z variable for
> estimating theta and z. Variables mix and phi are not giving me the information I need.

This is where I was suggesting not recovering z, but doing
inference based on its expectation. But if you need samples for z
to match the model you have, then that's what you have to do.

> I don't understand why there is a need for Jacobian adjustment while using:
> lambda ~ gamma(0.001,0.001)
> sigma <- 1 / sqrt(lambda)

No need for Jacobian here.

> and not in :
> lambda ~ inv_gamma(0.001,0.001)
> sigma <- sqrt(lambda)

No need for Jacobian here, either.

> Or is it equivalent?

Not quite equivalent as it's a different distribution over
lambda. But it induces the same distribution for sigma.
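A quick Monte Carlo sketch of that equivalence in Python (shape and rate values are arbitrary; note random.gammavariate takes shape and scale, so rate b means scale 1/b):

```python
# Route (a): lambda ~ Gamma(a, rate=b),     sigma = 1/sqrt(lambda)
# Route (b): lambda ~ InvGamma(a, scale=b), sigma = sqrt(lambda)
# An InvGamma(a, scale=b) draw is 1 over a Gamma(a, rate=b) draw.
import math
import random

random.seed(7)
a, b, n = 3.0, 2.0, 100_000
sig_a = [random.gammavariate(a, 1 / b) ** -0.5 for _ in range(n)]
sig_b = [(1 / random.gammavariate(a, 1 / b)) ** 0.5 for _ in range(n)]

mean_a = sum(sig_a) / n
mean_b = sum(sig_b) / n
# Exact E[sigma] for route (a): sqrt(b) * Gamma(a - 1/2) / Gamma(a)
exact = math.sqrt(b) * math.gamma(a - 0.5) / math.gamma(a)
assert abs(mean_a - exact) < 0.02 and abs(mean_b - exact) < 0.02
```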

Where you need the Jacobian is when you have a transformed
variable on the left side of a sampling statement.

parameters {
  real<lower=0> sigma;
}
transformed parameters {
  real<lower=0> tau;
  tau <- 1 / (sigma * sigma);
}
model {
  tau ~ gamma(0.001, 0.001);
}

You don't need the Jacobian when you do this:

parameters {
  real<lower=0> tau;
}
transformed parameters {
  real<lower=0> sigma;
  sigma <- 1 / sqrt(tau);
}
model {
  tau ~ gamma(0.001, 0.001);
}

I should work through the inverse gamma example in the manual.
If you do this:

parameters {
  real<lower=0> tau;
}
transformed parameters {
  real<lower=0> phi;
  phi <- 1 / tau;
}
model {
  tau ~ gamma(0.001, 0.001);
}

then phi has an inverse gamma distribution. The other way to
do this is as follows:

parameters {
  real<lower=0> phi;
}
transformed parameters {
  real<lower=0> tau;
  tau <- 1 / phi;
}
model {
  tau ~ gamma(0.001, 0.001);
  lp__ <- lp__ - 2 * log(phi);
}

The Jacobian adjustment is for the transform

f(phi) = 1 / phi = phi^{-1}

f'(phi) = - phi^{-2}

log | f'(phi) | = log phi^{-2} = - 2 * log(phi);

You can also skip the transformed parameters in the above and
just use this:

parameters {
  real<lower=0> phi;
}
model {
  (1 / phi) ~ gamma(0.001, 0.001);
  lp__ <- lp__ - 2 * log(phi);
}

Now you wouldn't ever want to do this specifically, because we
have the inverse gamma distribution. But we don't have all the
possible transforms of all the possible distributions, so we give
users the pieces to do Jacobian adjustments for changes of variables
themselves.
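The adjustment can also be checked directly on the densities: evaluating the gamma density at 1/phi and multiplying by the Jacobian phi^{-2} reproduces the inverse gamma density exactly. A Python sketch (stdlib only; the shape/rate values are arbitrary, 0.001 just echoes the prior above):

```python
# Check: inv_gamma_pdf(phi; a, b) == gamma_pdf(1/phi; a, b) * phi^(-2),
# i.e. the -2*log(phi) term is exactly the log Jacobian of phi -> 1/phi.
import math

def gamma_pdf(x, a, b):        # shape a, rate b
    return b**a * x**(a - 1) * math.exp(-b * x) / math.gamma(a)

def inv_gamma_pdf(x, a, b):    # shape a, scale b
    return b**a * x**(-a - 1) * math.exp(-b / x) / math.gamma(a)

a, b = 0.001, 0.001
for phi in (0.1, 1.0, 3.7):
    lhs = inv_gamma_pdf(phi, a, b)
    rhs = gamma_pdf(1 / phi, a, b) * phi**-2
    assert abs(lhs - rhs) < 1e-12 * max(lhs, 1.0)
```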

> I tried putting the Jacobian right after the statement for sigma, but it goes wrong.
>
> Actually, the model which I stated in the first post works well (after the one typo is fixed), so I think I'll stick with
> it. I get the point about Rao-Blackwell, but we need the sampled "z" for an exercise in the book.

Great. Don't fix it if it's not broken :-)

- Bob

Eric-Jan Wagenmakers

unread,
Sep 19, 2013, 6:45:40 AM9/19/13
to stan-...@googlegroups.com
Hi Bob,

What we'll try to do is create Stan code for every example in our book (we already have the WinBUGS and JAGS code). You can then select the examples that you like (and ignore or change the model code for the examples you do not like). At any rate, when the Stan code is complete we'll post it on www.bayesmodels.com, together with the WinBUGS/JAGS code.


Cheers,
E.J.


Bob Carpenter

unread,
Sep 19, 2013, 1:54:29 PM9/19/13
to stan-...@googlegroups.com




On 9/19/13 6:45 AM, Eric-Jan Wagenmakers wrote:
> Hi Bob,
>
> What we'll try to do is create Stan code for every example in our book (we already have the WinBUGS and JAGS code). You
> can then select the examples that you like (and ignore or change the model code for the examples you do not like). At
> any rate, when the Stan code is complete we'll post it on www.bayesmodels.com, together with the WinBUGS/JAGS code.

That's great.

If you're releasing it, I'd much rather just point to your
site. I'll post about it on Andrew's blog when you let us
know it's ready.

I think it's better to keep the code with the book and
the BUGS/JAGS variants, which can help people learn Stan
who already know BUGS/JAGS.

I really like the book and examples, by the way. Will it continue to
be free? I also really like the BUGS book --- I wish one of
these two were around when I was trying to learn Bayesian stats
and MCMC.

- Bob

Eric-Jan Wagenmakers

unread,
Sep 20, 2013, 9:26:56 AM9/20/13
to stan-...@googlegroups.com
We negotiated with the publisher that the first two parts of the book will remain free. The other parts are not free. Of course, we can still provide the relevant code for those parts. Also, in the current day and age a scanned-in copy can easily find its way onto the internet. Michael and I argued that providing the entire book free of charge online would actually boost sales, but the publisher didn't buy it.
Cheers,
E.J.

Bob Carpenter

unread,
Sep 20, 2013, 12:48:48 PM9/20/13
to stan-...@googlegroups.com

On 9/20/13 9:26 AM, Eric-Jan Wagenmakers wrote:
> We negotiated with the publisher that the first two parts of the book will remain free.

What you've already got out there is great. And I think it
will give people good incentive to buy the rest.

> The other parts are not free. Of
> course, we can still provide the relevant code for those parts.

Great.

> Also, in the current day and age a scanned-in copy can
> easily find its way to the internet.

I think pretty much everything's getting scanned in
and made available online, even if it's a crappy photocopy.

- Bob