model comparison: shifted log normal hierarchical model vs. log normal hierarchical model. WAIC?


Bruno Nicenboim

Apr 5, 2015, 11:30:28 AM4/5/15
to stan-...@googlegroups.com
Hi, I'm continuing this post here as Bob recommended. I want to model RTs, and I suspect that a shifted lognormal hierarchical model is more accurate than just a lognormal hierarchical one. I've managed to fit the shifted lognormal model thanks to Bob, and I was hoping to get some advice regarding the model comparison.

As I said in the original post, I thought about using WAIC (using this code).
I'm extracting the log likelihood in this way:

    log_lik[i] <- normal_log(rt[i]-c[i],log_mu[i], sigma);

There's something wrong for sure, because I get a huge WAIC for the shifted lognormal, even though I simulated the data with a lognormal distribution (in fakedata.Rda). If I understood Bob correctly, it may have to do with having an extra parameter that varies by subject (the shift) in the shifted model, which I don't have in the other lognormal model. I think I may have read somewhere that one shouldn't compare different distributions with WAIC, but I'm not sure if that's right, or whether I really have different distributions here (in the lognormal one, the shift is 0).

Any recommendation on how to compare the models would be much appreciated!
Bruno


logshifted_lmm.stan
lmm.stan
evallmm.R
fakedata.Rda

Bob Carpenter

Apr 5, 2015, 8:51:42 PM4/5/15
to stan-...@googlegroups.com
If you're doing likelihood evaluations, you typically want to marginalize
out any latent parameters associated with data items.

The most general way of doing evaluations is cross-validation. Estimate
the model on part of the data and predict the rest. You want to use full
Bayes for that --- the generated quantities block in Stan is set up to do that.
It can also be tricky with hierarchical data or time series, because you
can't just take a random subset of the data.

- Bob

Andrew Gelman

Apr 5, 2015, 8:56:50 PM4/5/15
to stan-...@googlegroups.com
Leave-one-out cross-validation is closely connected to WAIC and can be approximately implemented in Stan as described in my paper with Aki. I recommend that approach rather than trying to set up your own cross-validation from scratch.
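As a side note, WAIC can be computed directly from the matrix of pointwise log-likelihood draws that Stan's generated quantities block produces. A minimal sketch in Python/NumPy (my own, not the R code from the paper; the toy data and posterior draws are made up):

```python
import numpy as np

def waic(log_lik):
    """WAIC (deviance scale) from an (S draws x n obs) log-likelihood matrix:
    lppd   = sum_i log (1/S) sum_s exp(log_lik[s, i])
    p_waic = sum_i var_s log_lik[s, i]
    WAIC   = -2 * (lppd - p_waic)
    """
    S = log_lik.shape[0]
    # log-sum-exp over draws for numerical stability
    lppd = np.sum(np.logaddexp.reduce(log_lik, axis=0) - np.log(S))
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic)

# toy example: normal likelihood, fake posterior draws of mu only
rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=50)         # fake data
mu = rng.normal(0.0, 0.1, size=1000)      # fake posterior draws
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu[:, None]) ** 2
print(waic(log_lik))
```

In Stan the `log_lik` matrix would come from a `log_lik` vector declared in generated quantities, extracted across all draws.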

Aki Vehtari

Apr 6, 2015, 2:09:43 AM4/6/15
to stan-...@googlegroups.com
What do you mean by huge WAIC? Relative to what?

The difference between the models can be big, as between lmm.stan's
 rt ~ normal(mu, sigma)
and logshifted_lmm.stan's
 (rt-c) ~ lognormal(log_mu, sigma);

where mu and log_mu are both linear functions of covariates. Based on your description, I would have assumed that lmm.stan would use a lognormal, too:
 rt ~ lognormal(log_mu, sigma);

Aki

Bruno Nicenboim

Apr 6, 2015, 6:59:50 AM4/6/15
to stan-...@googlegroups.com
Thanks for the responses. It's a pleasure to get answers in such a short time!

Aki: I meant that the WAIC is 39245 for the shifted lognormal model and 3243 for the lognormal. I used rt ~ normal(mu, sigma), but I log-transformed the data (it should be the same, right?). I also wanted to check what happens with the WAIC when I use a reciprocal transformation of RT (which is fairly common in psycholinguistics). So I'm using the same normal model for data that is either log(RT) or -1000/RT. In any case, I changed the model that I uploaded a bit to include the priors suggested by Bob, and I noticed that a few of the Rhats reach 1.05, so I need to check again what's happening.


Anyway, I'll recheck the model and try to implement the cross-validation approach from the paper, and I'll probably post back with more questions.

Bruno

Aki Vehtari

Apr 6, 2015, 10:24:28 AM4/6/15
to stan-...@googlegroups.com


On Monday, April 6, 2015 at 1:59:50 PM UTC+3, Bruno Nicenboim wrote:
Thanks for the responses. It's a pleasure to get answers in such a short time!

Aki: I meant that the WAIC is 39245 for the shifted lognormal model and 3243 for the lognormal. I used rt ~ normal(mu, sigma), but I log-transformed the data (it should be the same, right?).

It's not the same, because you are making a transformation of the observation space and not taking into account the Jacobian of the transformation. It's easier to use the lognormal for both, or see p. 21 in BDA3 for the transformation of variables.

Aki

Bob Carpenter

Apr 6, 2015, 10:26:46 AM4/6/15
to stan-...@googlegroups.com

> On Apr 6, 2015, at 8:59 PM, Bruno Nicenboim <bruno.n...@gmail.com> wrote:
>
> Thanks for the responses. It's a pleasure to get answers in such a short time!
>
> Aki: I meant that the WAIC is 39245 for the shifted lognormal model and 3243 for the lognormal. I used rt ~ normal(mu, sigma), but I log-transformed the data (it should be the same, right?).

What exactly are you comparing? Could you provide the
actual Stan code?

These:

> rt ~ normal(mu, sigma)
> and in logshifted_lmm.stan
> (rt-c) ~ lognormal(log_mu, sigma);

are not similar. Was there a typo?

The easiest thing to do to see this is to evaluate the densities
using the definition of normal and lognormal. You'll see that

lognormal(y | mu, sigma)

isn't the same as

normal(log(y) | mu, sigma)

The difference is in the Jacobian adjustment for the change of
variables. Sorry if that's obvious.
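A quick numerical check of this point (my own sketch with scipy; mu, sigma, and y are arbitrary values for illustration):

```python
import numpy as np
from scipy.stats import norm, lognorm

mu, sigma, y = 5.0, 0.7, 350.0   # arbitrary values for illustration

# lognormal(y | mu, sigma); scipy parameterizes it as s=sigma, scale=exp(mu)
lp_lognormal = lognorm.logpdf(y, s=sigma, scale=np.exp(mu))

# normal(log(y) | mu, sigma) -- the version without the Jacobian
lp_normal_of_log = norm.logpdf(np.log(y), loc=mu, scale=sigma)

# the gap is exactly the log Jacobian of the log transform, -log(y)
print(lp_lognormal - lp_normal_of_log)  # = -log(350)
```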

> I also wanted to check what happens with the WAIC when I use a reciprocal transformation on RT (which is fairly common in psycholinguistics). So I'm using the same normal model for data which is either log(RT) and -1000/RT.

These are very different models, but you should be able to
compare which is a better fit using either WAIC or cross validation.
The goal is to have better prediction of RT (reaction time?).

- Bob

Bruno Nicenboim

Apr 6, 2015, 2:08:42 PM4/6/15
to stan-...@googlegroups.com


On Monday, April 6, 2015 at 4:26:46 PM UTC+2, Bob Carpenter wrote:

> On Apr 6, 2015, at 8:59 PM, Bruno Nicenboim <bruno.n...@gmail.com> wrote:
>
> Thanks for the responses. It's a pleasure to get answers in such a short time!
>
> Aki: I meant that the WAIC is 39245 for the shifted lognormal model and 3243 for the lognormal. I used rt ~ normal(mu, sigma), but I log-transformed the data (it should be the same, right?).

What exactly are you comparing?  Could you provide the
actual Stan code?  
I've simulated my own data to be shifted-lognormal distributed, but I want to see if I can "discover" that, and whether a model comparison actually shows that a model assuming a shifted lognormal distribution is better than a model assuming a non-shifted lognormal distribution or a reciprocal normal distribution. I attached the models in the first post, but I discovered that I messed up the priors; I'll fix that before I attach them again.

These:

>  rt ~ normal(mu, sigma)
> and in logshifted_lmm.stan
>  (rt-c) ~ lognormal(log_mu, sigma);

are not similar. Was there a typo?  

The easiest thing to do to see this is to evaluate the densities
using the definition of normal and lognormal.  You'll see that

  lognormal(y | mu, sigma)

isn't the same as

  normal(log(y) | mu, sigma)

The difference is in the Jacobian adjustment for the change of
variables.  Sorry if that's obvious.
No, it wasn't obvious to me. I'll change that, thanks for the heads up.

So I should also change the reciprocal transformation. I guess this is wrong: normal((-1000/y) | mu, sigma)

and I should do this:

(-1000/y) ~ normal(mu, sigma);
increment_log_prob(log(1000 * y^(-2)));

if I understood the manual right.


> I also wanted to check what happens with the WAIC when I use a reciprocal transformation on RT (which is fairly common in psycholinguistics). So I'm using the same normal model for data which is either log(RT) and  -1000/RT.

These are very different models, but you should be able to
compare which is a better fit using either WAIC or cross validation.
The goal is to have better prediction of RT (reaction time?).
Yes, a type of reaction time: reading times in a self-paced reading experiment, where people go from word to word by pressing the space bar.

- Bob

Bob Carpenter

Apr 6, 2015, 9:48:43 PM4/6/15
to stan-...@googlegroups.com
If y is data, it's not going to matter, because the Jacobian adjustment
will be constant.

That looks like the right Jacobian adjustment, but I'd write it
as

increment_log_prob(log(1000) + -2 * log(y));

and then you can drop the constant log(1000) term and just use

increment_log_prob(-2 * log(y));
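As a sanity check (my own sketch, with made-up mu and sigma for -1000/rt): the two ways of writing the log Jacobian agree, and with the adjustment included, the implied density of rt integrates to 1:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma = -3.0, 0.5   # made-up parameters for -1000/rt (rt in ms)

# the two ways of writing the log Jacobian are algebraically identical
rt = 400.0
assert np.isclose(np.log(1000 * rt**-2), np.log(1000) - 2 * np.log(rt))

# density of rt implied by -1000/rt ~ normal(mu, sigma), Jacobian included
def pdf_rt(rt):
    return norm.pdf(-1000.0 / rt, mu, sigma) * 1000.0 / rt**2

total, _ = quad(pdf_rt, 50, 5000)  # essentially all the mass lies here
print(total)  # ~ 1.0
```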

- Bob

Aki Vehtari

Apr 7, 2015, 2:10:02 AM4/7/15
to stan-...@googlegroups.com
If y is data, it's not going to matter, because the Jacobian adjustment
will be constant.  

It's not going to matter for sampling, but if you are doing model comparison, then the constant term matters in the log_lik used for LOO or WAIC.
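To illustrate this numerically (my own numpy sketch with fake draws): adding a term that is constant in the parameters to every log_lik entry does not change sampling, but it shifts the elpd part of WAIC by n times that constant, which can easily dominate a model comparison:

```python
import numpy as np

rng = np.random.default_rng(0)
S, n = 2000, 100
log_lik = rng.normal(-1.0, 0.3, size=(S, n))   # fake pointwise log-lik draws

def elpd_waic(ll):
    """elpd_waic = lppd - p_waic (log-likelihood scale)."""
    S = ll.shape[0]
    lppd = np.sum(np.logaddexp.reduce(ll, axis=0) - np.log(S))
    return lppd - np.sum(np.var(ll, axis=0, ddof=1))

# e.g. the Jacobian term log(1000) - 2*log(rt) at rt = 350: it involves no
# parameters, so sampling ignores it, but WAIC does not
c = np.log(1000) - 2 * np.log(350.0)
print(elpd_waic(log_lik + c) - elpd_waic(log_lik))  # = n * c, not 0
```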

Aki

Bob Carpenter

Apr 7, 2015, 2:17:43 AM4/7/15
to stan-...@googlegroups.com
Thanks for that very important correction!

- Bob

Bruno Nicenboim

Apr 8, 2015, 4:46:48 AM4/8/15
to stan-...@googlegroups.com
Thanks for all the comments! I'm attaching all the code now.

So I managed to compare the lognormal model vs the shifted lognormal (v2). The shifted lognormal model recovers the parameters that I simulated (betas = (5, .12, .05, .08), shift = 180, sdev -.7, etc.) quite well, and its WAIC is lower than that of the regular (non-shifted) lognormal model.
I still haven't tried to do the comparison with the cross-validation approach; I'll try it soon.

My problems (that maybe I should post them in a different thread):
* The model that assumes a reciprocal normal distribution is not working at all after I implemented increment_log_prob(log(1000) + -2 * log(y));. It's very likely that I'm overlooking something completely idiotic (sorry if that's the case). (This is the model that I originally had with the -1000/rt transformation of the data; I called that stan file recip - sorry for the confusion.)
* Most of the Rhats are over 1.00, some of them even 1.07 (up to 1.07 in the shifted lognormal (v2) and up to 1.03 in the regular lognormal model). The traceplots of the shifted lognormal (v2) don't look that bad, but I think they aren't mixing completely. Any recommendations? I'm now using priors that I think make sense given the data, but it didn't help.

And a comment (mostly for Bob):
The first shifted lognormal model is what Bob recommended (or at least what I understood), where I make sure that the shift by observation c doesn't go over the minimum RT in the data (see this post). I'm using a function at the beginning of the code that constrains the general shift. In the second version, I modified the code so that I also make sure that the shift by observation c is not lower than 0.
(c = general shift + shift per subject)

If this is too confusing with the new questions, I'll start two threads for the different problems (or I'll continue the old shift lognormal thread and a new one).

Traceplots: log_shifted, log_shifted2, log, recip  (The password is 1 for all of them) 

Thanks!!!

Bruno
evallmm.R
evallmm.Rout
fakedata.Rda
log_lmm.stan
logshifted_lmm.stan
logshifted_lmm_v2.stan
recip_lmm.stan

Bob Carpenter

Apr 9, 2015, 9:09:46 AM4/9/15
to stan-...@googlegroups.com

> On Apr 8, 2015, at 6:46 PM, Bruno Nicenboim <bruno.n...@gmail.com> wrote:
>
> Thanks for all the comments! I'm attaching all the code now.
>
> So I managed to compare the log normal model vs shifted log normal (v2). The shifted log normal model recovers the parameters that I simulated (betas= (5, .12, .05,.08), shift =180, sdev -.7, etc) quite well, and its WAIC is lower than the one of the regular (non shifted) log normal model.
> I still haven't tried to do the comparison with the cross validation approach, I'll try it soon
>
> My problems (that maybe I should post them in a different thread):
> * The model which assumes reciprocal normal distribution is not working at all after I implemented the increment_log_prob(log(1000) + -2 * log(y));


The log Jacobian should be

log abs(d/d.rt -1000 * rt^(-1))
= log(abs(-1000 * (-1 * rt^(-2))))
= log(1000 * rt^(-2))
= log(1000) - 2 * log(rt)

So the Jacobian looks right.

Where does the y come in?

If rt (or if that's what's named y) is just data, the Jacobian
will be constant and won't matter for sampling.

> . It's very likely I'm overlooking something completely idiotic (sorry if that's the case), or that I'm doing something quite idiotic. (This is the model that I originally had with -1000/rt transformation of the data (I called that stan file recip, and the models ir - sorry for the confusion) )

If rt is negative, you have a problem in that log(rt) will fail.

> * Most of the Rhats are over 1.00, some of them even 1.07 (up to 1.07 in the shifted log normal (v2) and up to 1.03 in the regular log normal model). The traceplots of shifted log normal (v2) don't look that bad, but I think they aren't mixing completely. Any recommendations? I'm using now priors that I think make sense given the data, but it didn't help

So you either need to run longer or reparameterize to get better
mixing.

- Bob


Bruno Nicenboim

Apr 13, 2015, 3:21:48 PM4/13/15
to stan-...@googlegroups.com
I'll add some info for completeness.

The reciprocal model wasn't working because I just forgot an index here:
      for (i in 1:N_obs)
        (-1000/rt[i]) ~ normal(ir_mu[i], sigma);
      
      increment_log_prob(log(1000) - 2*log(rt));

And I realized that I wasn't computing the log likelihood correctly; it should be like this:
    log_lik[i] <- normal_log(-1000/rt[i],ir_mu[i], sigma) + log(1000) - 2*log(rt[i]);


With all this, WAIC works really well (in fact, so well that I'm even a bit worried).

If my simulated data is shifted log normal:
shifted log normal: 39242
log normal: 41457
reciprocal: 41158

If my simulated data is reciprocal:
shifted log normal: 43771
log normal: 45136
reciprocal: 43503


And for my real data:
shifted log normal: 47503
log normal: 48975
reciprocal: 48113

Isn't the difference for my real data too big? 48113 - 47503 = 610?

Bruno

Avraham Adler

Apr 13, 2015, 6:00:14 PM4/13/15
to stan-...@googlegroups.com, gel...@stat.columbia.edu
The code linked to by Bruno implements approximate LOO-CV as explained in your paper with Aki; the only difference between that code and the code in your paper is a slightly streamlined version of colVars. It doesn't multiply the result by -2, though, so it isn't on a "deviance" scale but a "log-likelihood" scale, which should be kept in mind (or adjusted in the code) if the Burnham & Anderson rules of thumb are used (as I believe Spiegelhalter does with DIC).
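Concretely (my own sketch; the numbers reuse the WAIC values quoted earlier in the thread as if they were on each scale, purely for illustration):

```python
# log-likelihood (elpd) scale vs deviance scale: just a factor of -2
elpd = -47503 / 2          # hypothetical result on the log-likelihood scale
ic = -2 * elpd             # the same result on the deviance ("IC") scale
print(ic)                  # 47503.0

# differences between models scale the same way, so rules of thumb stated
# on the deviance scale must be halved on the log-likelihood scale
delta_elpd = (-47503 / 2) - (-48113 / 2)
delta_ic = -2 * delta_elpd
print(delta_ic)            # -610.0
```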

Avi