multinomial probit choice models

Amy Shi

Jul 16, 2014, 9:28:10 AM

to stan-...@googlegroups.com

I am currently working on Bayesian analysis of multinomial probit models for discrete choice. The attached paper proposes an MCMC sampler for multinomial probit choice models that uses latent variables via data augmentation. I have tried it and it works, but it exhibits high autocorrelation because the latent variables and the regression coefficients are sampled separately.

So I want to see if Stan would work for this type of model. Is there anyone who has done a multinomial probit model with Stan before?

Thanks,

Amy

McCullochRossi1994.pdf

Andrew Gelman

Jul 16, 2014, 10:13:46 AM

to stan-...@googlegroups.com

Hi, as Bob would say, it’s probably best to model the outcome probabilities directly rather than use latent variables. See Section "5.6. Ordered Logistic and Probit Regression" in the Stan manual.

A

--
You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ben Goodrich

Jul 16, 2014, 12:13:58 PM

to stan-...@googlegroups.com, gel...@stat.columbia.edu

On Wednesday, July 16, 2014 10:13:46 AM UTC-4, Andrew Gelman wrote:

Hi, as Bob would say, it’s probably best to model the outcome probabilities directly rather than use latent variables. See Section "5.6. Ordered Logistic and Probit Regression" in the Stan manual.

As economists would say, the multivariate probit likelihood function for the outcome probabilities entails integrating the multivariate normal PDF, which Stan does not support and which would be brutal on the autodiff if it did.
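To illustrate why the choice probabilities are awkward to compute directly, here is a minimal Python sketch that estimates them by simulation rather than by integration. It assumes independent standard normal errors, which is a simplification (the MNP model allows correlated errors). The function names and the example utilities are hypothetical and do not come from the attached paper or from Stan.

```python
import math
import random

def simulate_choice_probs(systematic, n_draws=20000, seed=1):
    """Estimate P(alternative j has the highest utility) by simulation.

    Assumes utility_j = systematic[j] + e_j with independent N(0, 1) errors,
    which is a simplification of the general MNP error covariance.
    """
    rng = random.Random(seed)
    counts = [0] * len(systematic)
    for _ in range(n_draws):
        utilities = [v + rng.gauss(0.0, 1.0) for v in systematic]
        best = max(range(len(utilities)), key=lambda j: utilities[j])
        counts[best] += 1
    return [c / n_draws for c in counts]

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# With two alternatives the integral is one-dimensional and has a closed form:
# P(choose 0) = Phi((v0 - v1) / sqrt(2)). With J alternatives it becomes a
# (J - 1)-dimensional normal integral, which in general has no closed form.
probs = simulate_choice_probs([0.5, 0.0])
exact = normal_cdf(0.5 / math.sqrt(2.0))
```

In the two-alternative case the simulated `probs[0]` should land close to `exact`; with more alternatives only the simulation route remains in this sketch, which is the integration problem described above.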

Doing it with latent variables in Stan is tedious and may not be any better than a Gibbs sampler, although an LKJ prior will probably work better than a scaled inverse-Wishart. The attached file is probably wrong in some of the details, but it at least parses.

Ben

MNP.stan

Andrew Gelman

Jul 16, 2014, 12:24:18 PM

to stan-...@googlegroups.com

I was assuming an ordered multinomial probit, which can be integrated using the formula. If the outcome is non-ordered, it could make sense to construct a model using a series of ordered comparisons. I’m often skeptical of general multinomial formulations (although of course they can make sense in some contexts).

A


Amy Shi

Jul 18, 2014, 11:43:38 AM

to stan-...@googlegroups.com

Many thanks, Ben, for the Stan code. It seems that you took differences against the chosen alternative. However, the order of alternatives in a choice set matters. The usual practice is to take differences against a particular alternative, either the first or the last one, so that the resulting covariance matrix of the error differences makes sense. I don't know how to interpret the covariance matrix of the error differences in your set-up.

I have been trying to sample the latent utility variables and the regression coefficients at the same time using HMC. But there is one constraint: the utility of the chosen alternative should be the largest. I am trying to use the "bounce-off-the-walls" technique suggested in section 5.5.1.5 of Neal (2011), but I don't know if it will work. Stan uses variable transformations to handle constraints. How should I transform the latent utilities to avoid constraints?

Thanks for your time,

Amy

Michael Betancourt

Jul 18, 2014, 11:58:59 AM

to stan-...@googlegroups.com

I have been trying to sample the latent utility variables and the regression coefficients at the same time using HMC. But there is one constraint: the utility of the chosen alternative should be the largest. I am trying to use the "bounce-off-the-walls" technique suggested in section 5.5.1.5 of Neal (2011), but I don't know if it will work. Stan uses variable transformations to handle constraints. How should I transform the latent utilities to avoid constraints?

Bouncing isn't too hard to implement if you're writing your own HMC implementation, but it won't be in Stan anytime soon because of the difficulty of specifying constraints. By the way, if you're interested in slightly more theory on bouncing, check out Section 3.1 of http://arxiv.org/abs/1005.0157 and Appendix A of http://arxiv.org/abs/1112.4118.
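For a hand-written HMC implementation, the bounce can be sketched roughly as follows. This is a minimal Python illustration for a single coordinate with a simple lower bound, not the MNP constraint itself (where the wall depends on the other utilities), and not anything Stan provides. The function name `leapfrog_with_bounce` and its arguments are hypothetical.

```python
def leapfrog_with_bounce(q, p, grad_U, step, n_steps, lower=0.0):
    """Leapfrog integration for one coordinate constrained to q >= lower.

    grad_U is the gradient of the potential energy (negative log density).
    Whenever a position update crosses the wall, the position is reflected
    back inside and the momentum is negated.
    """
    path = [q]
    for _ in range(n_steps):
        p -= 0.5 * step * grad_U(q)   # half step for the momentum
        q += step * p                 # full step for the position
        if q < lower:                 # crossed the wall: bounce back
            q = lower + (lower - q)
            p = -p
        p -= 0.5 * step * grad_U(q)   # second half step for the momentum
        path.append(q)
    return q, p, path

# Example: standard normal potential U(q) = q^2 / 2 truncated to q >= 0,
# starting with a momentum that points toward the wall.
q_end, p_end, path = leapfrog_with_bounce(0.5, -2.0, lambda q: q, 0.1, 50)
```

The trajectory in `path` never leaves the region `q >= 0`, and the bounce itself leaves the kinetic energy unchanged because it only flips the sign of the momentum.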

Ben Goodrich

Jul 18, 2014, 12:45:07 PM

to stan-...@googlegroups.com

On Friday, July 18, 2014 11:43:38 AM UTC-4, Amy Shi wrote:

Many thanks, Ben, for the Stan code. It seems that you took differences against the chosen alternative. However, the order of alternatives in a choice set matters. The usual practice is to take differences against a particular alternative, either the first or the last one, so that the resulting covariance matrix of the error differences makes sense. I don't know how to interpret the covariance matrix of the error differences in your set-up.

You could do it that way too. The main motivation for defining a reference category is to obtain identification for maximum likelihood; otherwise you could add a constant to all the raw utilities and not affect the utility differences or the observed choices. That way would be harder in Stan I think because you would have to constrain one utility difference to be positive and the rest negative. The way I did it, the utilities are unconstrained except for one positive difference, and the translation indeterminacy is broken by the standard normal priors on all the coefficients.
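The translation indeterminacy mentioned above can be checked with a small Python sketch. The helper names and the example utilities are hypothetical; the point is only that adding a constant to every raw utility changes neither the differences against a reference alternative nor the observed choice.

```python
def chosen_alternative(utilities):
    """Index of the alternative with the highest utility."""
    return max(range(len(utilities)), key=lambda j: utilities[j])

def differences_against(utilities, ref=0):
    """Utility differences against a fixed reference alternative."""
    return [u - utilities[ref] for j, u in enumerate(utilities) if j != ref]

raw = [1.25, -0.5, 0.75]
shifted = [u + 5.0 for u in raw]   # add the same constant to every utility

same_choice = chosen_alternative(raw) == chosen_alternative(shifted)
same_differences = differences_against(raw) == differences_against(shifted)
```

Both `same_choice` and `same_differences` come out true for these values, which is why the data alone cannot pin down the level of the utilities and something else (a reference category, or priors as in the approach described above) has to do so.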

I have been trying to sample the latent utility variables and the regression coefficients at the same time using HMC. But there is one constraint: the utility of the chosen alternative should be the largest. I am trying to use the "bounce-off-the-walls" technique suggested in section 5.5.1.5 of Neal (2011), but I don't know if it will work. Stan uses variable transformations to handle constraints. How should I transform the latent utilities to avoid constraints?

As Michael mentioned, Stan doesn't have a bouncing sampler to enforce the constraint that utility is highest for the chosen alternative. That is why the model block is so convoluted. Basically, the bump parameter is constrained to be positive and can be interpreted as the difference in utility between the best and second-best alternatives. Then, there is a marginal multivariate normal density for the utility of the unchosen alternatives. Finally, there is a truncated univariate normal for the utility of the best alternative conditional on the utility of the unchosen alternatives. So, if you multiply that marginal density and that conditional density together, I think you have the likelihood function implied by the MNP model.
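The role of the bump parameter can be sketched in Python as follows. This is only an illustration of the reparameterization as described in this message, not a transcription of MNP.stan; the function name `utilities_from_bump` and the example numbers are hypothetical, and the densities themselves are left out.

```python
def utilities_from_bump(unchosen, bump, chosen_index):
    """Rebuild the full utility vector from the unchosen utilities and a bump.

    The chosen alternative's utility is set to the largest unchosen utility
    plus a strictly positive bump, so it is the maximum by construction.
    """
    if bump <= 0.0:
        raise ValueError("bump must be strictly positive")
    best = max(unchosen) + bump
    utilities = list(unchosen)
    utilities.insert(chosen_index, best)   # place it at the chosen position
    return utilities

# Three alternatives, the second one (index 1) was chosen.
u = utilities_from_bump([0.25, -1.0], bump=0.5, chosen_index=1)
winner = max(range(len(u)), key=lambda j: u[j])
```

Because the chosen utility is just a shift of the bump for fixed unchosen utilities, a sampler can work with an unconstrained vector plus one positive scalar instead of an ordering constraint across the whole vector.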

But I didn't actually test it.

Ben
