Multivariate Tobit

Andrew O'Reilly-Nugent

May 1, 2017, 2:44:29 AM

to stan-...@googlegroups.com

Hi all,

I've been banging my head against the wall trying to figure out how to extend a Tobit regression in Stan to the multivariate case. I noticed Dr. Goodrich recently pushed a multivariate Probit model to the example models repo and wondered whether someone might have the time to explain the logic behind it. For the life of me, I cannot see how incrementing 'prev' by rows of the Cholesky factor provides a truncated MVN. (That said, Jacobian adjustment is still an enigma to me, so this is not surprising.)

I've attached a data simulation and my attempted model (incrementing the log prob with marginal normals), but it does not properly recover the covariance or correlation structures. For reference, section 3 of Pakman et al., 2013 (https://arxiv.org/pdf/1208.4118.pdf) has the likelihoods for multivariate Probit and Tobit models.

Any advice would be greatly appreciated.

Cheers,

Andrew

P.S. the latest Cholesky factor optimisations are amazing!

multivariate_tobit_cholesky_noncentered.stan

mvtobit_sim_stan.R

Ben Goodrich

May 1, 2017, 3:20:05 AM

to Stan users mailing list

Here you go.

Ben

tMVN.pdf

tMVN.stan

Andrew O'Reilly-Nugent

May 1, 2017, 3:57:22 AM

to Stan users mailing list

Wow, thank you. Stan-users continues to amaze.

On Monday, 1 May 2017 17:20:05 UTC+10, Ben Goodrich wrote:

Here you go.

Ben

Andrew O'Reilly-Nugent

May 2, 2017, 1:12:07 AM

to stan-...@googlegroups.com

Apologies Ben, I'm having trouble shoehorning this into sampling syntax. Following the Tobit log-likelihood from Wikipedia -

where I(y[j]) is an indicator function equal to one if y[j] is greater than a lower bound and zero otherwise. Thus, when y[j] is left-censored at zero, my intuition was:

for(i in 1:N){
  vector[S] z = trunc_mvn(mu[i], L_Sigma, lb, y_cen[i], u)[1]; // Inverse CDF of truncated multivariate normal
  for(j in 1:S){
    if(y[i, j] > lb[j]) {
      target += normal_lpdf(y[i, j] | mu[i, j], L_Sigma[j, j]);
    }
    else {
      target += log1m(Phi(z[j]));
    }
  }
  target += log(trunc_mvn(mu[i], L_Sigma, lb, y_cen[i], u)[2]); // Jacobian adjustments
  // implicit: u ~ uniform(0,1)
}

where z[j] = inv_Phi(u[j]* + (1 - u[j]*)) and u[j]* = Phi((0 - mu[j] + L[j, 1:j-1])/ L_Sigma[j, j]).
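For readers following along, the sequential probability integral transform usually used for a truncated multivariate normal (the GHK-style construction) can be sketched as below. This is a generic statement of the technique, not a reproduction of the attached tMVN.pdf; the symbols b_j (bound), c_j, and u_j (a uniform(0, 1) parameter) are notation assumed here for illustration.

```latex
% Assumed notation: y^* = \mu + L z, with L the Cholesky factor of \Sigma
% and z a vector of independent standard normals.
c_j = \frac{b_j - \mu_j - \sum_{k<j} L_{jk} z_k}{L_{jj}},
\qquad u_j^* = \Phi(c_j)

% Latent value above the bound (y_j^* > b_j):
z_j = \Phi^{-1}\bigl(u_j^* + (1 - u_j^*)\,u_j\bigr),
\qquad \text{target} \mathrel{+}= \log(1 - u_j^*)

% Latent value below the bound (y_j^* \le b_j), as for a left-censored cell:
z_j = \Phi^{-1}\bigl(u_j^*\,u_j\bigr),
\qquad \text{target} \mathrel{+}= \log u_j^*
```

The log term is the Jacobian of the map from u_j to z_j combined with the standard normal density of z_j: the density cancels and only the width of the admissible interval on the uniform scale remains.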

Would you be so kind as to point out where I've gone wrong? Full model attached.

Thanks for your help,

Andrew

multivariate_tobit_cholesky_noncentered.stan

Ben Goodrich

May 2, 2017, 11:26:38 PM

to Stan users mailing list

In the case of a Tobit model, you presumably have enough observed data that you can just follow the recommendations on censored regression models in the Users Manual. You just need the multivariate extension (attached).
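A minimal sketch of what such a multivariate extension might look like follows. It is not the attached file: it assumes a single common lower bound lb, an intercept-only mean, and illustrative priors, and the names N_cens, ii, jj, and y_cens are hypothetical. Censored cells are declared as parameters bounded above by lb and spliced into the data before a single multivariate normal statement. The array syntax is that of recent Stan releases; older versions would write vector[S] y[N].

```stan
data {
  int<lower=1> N;                              // observations
  int<lower=1> S;                              // outcomes per observation
  array[N] vector[S] y;                        // responses; censored cells hold a placeholder
  int<lower=0> N_cens;                         // number of censored cells
  array[N_cens] int<lower=1, upper=N> ii;      // row index of each censored cell
  array[N_cens] int<lower=1, upper=S> jj;      // column index of each censored cell
  real lb;                                     // common lower bound (assumption)
}
parameters {
  vector[S] mu;                                // intercept-only mean (assumption)
  vector<lower=0>[S] sigma;                    // marginal scales
  cholesky_factor_corr[S] L_Omega;             // Cholesky factor of the correlation matrix
  vector<upper=lb>[N_cens] y_cens;             // imputed latent values for censored cells
}
model {
  matrix[S, S] L_Sigma = diag_pre_multiply(sigma, L_Omega);
  array[N] vector[S] y_full = y;

  // Overwrite the placeholders with the imputed latent values.
  for (n in 1:N_cens) {
    y_full[ii[n]][jj[n]] = y_cens[n];
  }

  // Illustrative priors, not taken from the thread.
  mu ~ normal(0, 5);
  sigma ~ normal(0, 2);                        // half-normal via the lower bound
  L_Omega ~ lkj_corr_cholesky(2);

  y_full ~ multi_normal_cholesky(mu, L_Sigma);
}
```

No explicit Jacobian term is needed here because y_cens enters the density directly; the upper bound on the parameter is what restricts it to the censored region.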

Ben

multivariate_tobit_cholesky_noncentered.stan

Andrew O'Reilly-Nugent

May 3, 2017, 3:32:35 AM

to Stan users mailing list

Derp. I got so hung up on finding multivariate CDFs to integrate out censored values that I forgot they could be imputed.
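To spell out why the two views agree (a sketch in notation assumed here: o indexes the observed cells of one row, c the censored cells, b_c their bounds): the likelihood contribution of a partly censored row is the joint normal density integrated over the censored region,

```latex
p(y_o,\; y_c^* \le b_c \mid \mu, \Sigma)
  = \int_{y_c \le b_c} \mathcal{N}\!\left(
      \begin{bmatrix} y_o \\ y_c \end{bmatrix} \,\middle|\, \mu, \Sigma
    \right) \mathrm{d}y_c
  = \mathcal{N}(y_o \mid \mu_o, \Sigma_{oo})\;
    \Pr(y_c^* \le b_c \mid y_o, \mu, \Sigma)
```

Declaring y_c as bounded parameters and sampling them performs that integral by Monte Carlo, so the marginal posterior of mu and Sigma matches the one obtained from the multivariate CDF. It also shows why summing marginal normal terms falls short: the conditional probability on the right depends on y_o through the off-diagonal blocks of Sigma.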

Thank you for taking the time to share your notes; they were extremely helpful for understanding both Jacobian adjustments and probability integral transforms. Readily moving to and from the unconstrained scale looks like it will be useful for many types of problems.