Converting a logit to a probit model

312 views
Skip to first unread message

new2...@gmail.com

unread,
May 10, 2017, 10:57:32 AM5/10/17
to Stan users mailing list
Hi all,

I first have a basic question and then a perhaps trickier question. First, I have set up a basic two-level logistic regression model in STAN, and that works fine. But a co-author is asking for the probit formulation instead. I have looked in the manual and I see that it shouldn't be too hard to make the switch, but I'm having trouble because it seems that the probit (Phi) is introduced as part of the likelihood function (see below) whereas I've been using binomial_logit after the for loop:

y[n] ~ bernoulli(Phi(alpha + beta * x[n]));

How should I go about redoing this as a probit?

The second question is that we will be using WAIC to compare models with different covariates, so I calculated the deviance for each observation with the binomial_logit_lpmf probability mass function. I cannot discern if there is a comparable way to do this for probit models.

Where should I start? Are there example scripts posted that accomplish these modifications?

Many thanks,
Ronaldo
model logit.R

Bob Carpenter

unread,
May 10, 2017, 2:52:14 PM5/10/17
to stan-...@googlegroups.com

> On May 10, 2017, at 10:57 AM, new2...@gmail.com wrote:
>
> Hi all,
>
> I first have a basic question and then a perhaps trickier question. First, I have set up a basic two-level logistic regression model in STAN, and that works fine. But a co-author is asking for the probit formulation instead. I have looked in the manual and I see that it shouldn't be too hard to make the switch, but I'm having trouble because it seems that the probit (Phi) is introduced as part of the likelihood function (see below) whereas I've been using binomial_logit after the for loop:
>
> y[n] ~ bernoulli(Phi(alpha + beta * x[n]));
>
> How should I go about redoing this as a probit?

Is that a trick question? It's already a probit. Probit just means
using Phi rather than inv_logit as the (inverse) link function.

> The second question is that we will be using WAIC to compare models with different covariates, so I calculated the deviance for each observation with the binomial_logit_lpmf probability mass function. I cannot discern if there is a comparable way to do this for probit models.

No difference at all. Literally the only difference is using inv_logit vs. Phi.
We also have a Phi_approx that's almost as efficient as inv_logit, but on the
scale of Phi.

- Bob

new2...@gmail.com

unread,
May 10, 2017, 3:03:21 PM5/10/17
to Stan users mailing list
Sorry for the confusion, Bob. To clarify, I found the Phi in the R manual and realized that I need to insert that in the code somewhere. But the script I've been working on doesn't have it yet (see the attachment), and every time I think I'm doing the replacement correctly, I get an error message.

Meanwhile, are you indicating that I can leave this following code unchanged for a probit model?

generated quantities{

    vector[N] p;

    real dev;

    dev = 0;

    for ( i in 1:N ) {

        p[i] = a + v_j_ID[j_ID[i]] * sigma_j_ID;

    }

    dev = dev + (-2)*binomial_logit_lpmf( y | 1 , p );

Bob Carpenter

unread,
May 10, 2017, 4:10:15 PM5/10/17
to stan-...@googlegroups.com

It's very hard to help people debug fractions of their code.
You want whatever's in generated quantities here to match your
model.

Logit: binomial_logit_lpmf(y | N, alpha);

or: binomial_lpmf(y | N, inv_logit(alpha));

Probit: binomial_lpmf(y | N, Phi(alpha))

- Bob

stan user

unread,
May 11, 2017, 5:08:10 AM5/11/17
to stan-...@googlegroups.com
Okay, I tried some substitution in the original code, which now looks like this:

model_code_0 <- "

data{

    int N;

    int N_j_ID;

    int y[N];

    int j_ID[N];

}


parameters{

    real a;

    vector[N_j_ID] v_j_ID;

    real<lower=0> sigma_j_ID;

}


model{

    vector[N] p;

    a ~ normal( 0 , 5 );

    v_j_ID ~ normal ( 0, 1);

    sigma_j_ID ~ exponential( 1 );


    for ( i in 1:N ) {

        p[i] = a + v_j_ID[j_ID[i]] * sigma_j_ID;

    }

    y ~ bernoulli(Phi( 1 , p ));

}


generated quantities{

    vector[N] p;

    real dev;

    dev = 0;

    for ( i in 1:N ) {

        p[i] = a + v_j_ID[j_ID[i]] * sigma_j_ID;

    }

    dev = dev + (-2)*binomial_lpmf (y | N, Phi(alpha));

}

"


## Short chain for testing

mfit <- stan ( model_code = model_code_0, data = dat_list_0, chains = 1, cores = 1, warmup = 40, iter = 120, control = list(adapt_delta = 0.8))


Unfortunately, that yields the following error and it is not clear why:

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 

  c++ exception (unknown reason)



- Bob

--
You received this message because you are subscribed to a topic in the Google Groups "Stan users mailing list" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stan-users/7ynljqtxLWk/unsubscribe.
To unsubscribe from this group and all its topics, send an email to stan-users+unsubscribe@googlegroups.com.
To post to this group, send email to stan-...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ben Goodrich

unread,
May 11, 2017, 9:17:58 AM5/11/17
to Stan users mailing list
On Thursday, May 11, 2017 at 5:08:10 AM UTC-4, stan user wrote:
Unfortunately, that yields the following error and it is not clear why:

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 

  c++ exception (unknown reason)


We don't known what the unknown reason is, but we do know that the error tends to go away on a Mac if you do

install.packages("rstan", type = "source")


stan user

unread,
May 11, 2017, 10:49:12 AM5/11/17
to stan-...@googlegroups.com
Thanks, Ben. Reinstalling rstan from the source took care of the one issue. After that, it was back to the usual error messages I see. This came up when I tried to run the code I had posted above:

ERROR at line 24

 22:            p[i] = a + v_j_ID[j_ID[i]] * sigma_j_ID;
 23:        }
 24:        y ~ bernoulli(Phi( 1 , p ));
                                      ^
 25:    }

And so now I'm trying some new code, inspired by the rstan manual, but that's producing a new error message:

model_code_0 <- "

data{

    int N;

    int N_j_ID;

    int y[N];

    int j_ID[N];

}


parameters{

    real a;

    vector[N_j_ID] v_j_ID;

    real<lower=0> sigma_j_ID;

}


model{

    vector[N] p;

    a ~ normal( 0 , 5 );

    v_j_ID ~ normal ( 0, 1);

    sigma_j_ID ~ exponential( 1 );


    for ( i in 1:N ) {

        y[i] ~ bernoulli(Phi(a + v_j_ID[j_ID[i]] * sigma_j_ID));

    }

}


generated quantities{

    vector[N] p;

    real dev;

    dev = 0;

    for ( i in 1:N ) {

        p[i] = a + v_j_ID[j_ID[i]] * sigma_j_ID;

    }

    dev = dev + (-2)*binomial_lpmf (y | N, Phi(alpha));

}

"


mfit <- stan ( model_code = model_code_0, data = dat_list_0, chains = 1, cores = 1, warmup = 40, iter = 120, control = list(adapt_delta = 0.8))



ERROR at line 33


 31:            p[i] = a + v_j_ID[j_ID[i]] * sigma_j_ID;

 32:        }

 33:        dev = dev + (-2)*binomial_lpmf (y | N, Phi(alpha));

                                                            ^

 34:    }



This seems like it should be a pretty basic two-level probit model, and I am tempted to look into the brms package to generate model code for this. The only tricky thing is that the deviance calculations are important, and I'm not sure if brms can do that.

Thanks for all of the help!

Saludos,
Ronaldo

Bob Carpenter

unread,
May 11, 2017, 11:35:33 AM5/11/17
to stan-...@googlegroups.com
Phi is a univariate function. No idea what your intent
is here, but what youhave won't work.

In general, it's much easier for us to help if we get
the whole program and the error message you get.

- Bob

stan user

unread,
May 11, 2017, 11:59:56 AM5/11/17
to stan-...@googlegroups.com
My apologies. I am still getting used to the norms of American research culture.

The intention is that I have a working logit model that I would like to estimate instead with a probit link. Here is the working code for the logit model:


#### Define variables from data frame

N <- nrow(df)

N_j_ID <- max(df$j_ID)


dat_list_0 <- list(

N = N,

N_j_ID = N_j_ID,

y = df$y,

j_ID = df$j_ID

)

model_code_0 <- "

data{

    int N;

    int N_j_ID;

    int y[N];

    int j_ID[N];

}


parameters{

    real a;

    vector[N_j_ID] v_j_ID;

    real<lower=0> sigma_j_ID;

}


model{

    vector[N] p;

    a ~ normal( 0 , 5 );

    v_j_ID ~ normal ( 0, 1);

    sigma_j_ID ~ exponential( 1 );


    for ( i in 1:N ) {

        p[i] = a + v_j_ID[j_ID[i]] * sigma_j_ID;

    }

    y ~ binomial_logit( 1 , p );

}


generated quantities{

    vector[N] p;

    real dev;

    dev = 0;

    for ( i in 1:N ) {

        p[i] = a + v_j_ID[j_ID[i]] * sigma_j_ID;

    }

    dev = dev + (-2)*binomial_logit_lpmf( y | 1 , p );

}

"


mfit <- stan ( model_code = model_code_0, data = dat_list_0, chains = 1, cores = 1, warmup = 40, iter = 120, control = list(adapt_delta = 0.8))

Ben Goodrich

unread,
May 11, 2017, 1:30:17 PM5/11/17
to Stan users mailing list
On Thursday, May 11, 2017 at 11:59:56 AM UTC-4, stan user wrote:
    y ~ binomial_logit( 1 , p );

First, it does not make much sense to use binomial_* with a size of 1. It is clearer to use bernoulli_*. In your case, it would be

y ~ bernoulli(Phi(p));

 

    dev = dev + (-2)*binomial_logit_lpmf( y | 1 , p );


Change this line to

dev = dev - 2 * bernoulli_lpmf(y | Phi(p))

But deviance calculations are pretty useless to a Bayesian.

Ben


Bob Carpenter

unread,
May 11, 2017, 3:48:32 PM5/11/17
to stan-...@googlegroups.com
I'm not sure where stan user is confused---I wrote exactly that
a few messages back.

- Bob
> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.

stan user

unread,
May 11, 2017, 6:00:27 PM5/11/17
to stan-...@googlegroups.com
As an update, the code that Ben recommended is running fine and producing sensible estimates. Thanks so much!

Bob, I regard my troubles primarily as ignorance of the syntax for a probit model, but if you have further insights into problems with the model (calculating deviance?), I would appreciate additional comments.

Also, let me mention that what your group is doing is a real blessing in international settings where we cannot easily acquire proprietary software!

Gracias.

On Thu, May 11, 2017 at 3:47 PM, Bob Carpenter <ca...@alias-i.com> wrote:
I'm not sure where stan user is confused---I wrote exactly that
a few messages back.

- Bob

> On May 11, 2017, at 1:30 PM, Ben Goodrich <goodri...@gmail.com> wrote:
>
> On Thursday, May 11, 2017 at 11:59:56 AM UTC-4, stan user wrote:
>     y ~ binomial_logit( 1 , p );
>
> First, it does not make much sense to use binomial_* with a size of 1. It is clearer to use bernoulli_*. In your case, it would be
>
> y ~ bernoulli(Phi(p));
>
>
>     dev = dev + (-2)*binomial_logit_lpmf( y | 1 , p );
>
> Change this line to
>
> dev = dev - 2 * bernoulli_lpmf(y | Phi(p))
>
> But deviance calculations are pretty useless to a Bayesian.
>
> Ben
>
>
>
> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+unsubscribe@googlegroups.com.
> To post to this group, send email to stan-...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

--

You received this message because you are subscribed to a topic in the Google Groups "Stan users mailing list" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stan-users/7ynljqtxLWk/unsubscribe.
To unsubscribe from this group and all its topics, send an email to stan-users+unsubscribe@googlegroups.com.

Bob Carpenter

unread,
May 12, 2017, 12:10:31 AM5/12/17
to stan-...@googlegroups.com
I'd suggest looking into WAIC cross-validation instead
of calculating deviance:

https://arxiv.org/abs/1507.04544

- Bob

> On May 11, 2017, at 6:00 PM, stan user <new2...@gmail.com> wrote:
>
> As an update, the code that Ben recommended is running fine and producing sensible estimates. Thanks so much!
>
> Bob, I regard my troubles primarily as ignorance of the syntax for a probit model, but if you have further insights into problems with the model (calculating deviance?), I would appreciate additional comments.
>
> Also, let me mention that what your group is doing is a real blessing in international settings where we cannot easily acquire proprietary software!
>
> Gracias.
>
> On Thu, May 11, 2017 at 3:47 PM, Bob Carpenter <ca...@alias-i.com> wrote:
> I'm not sure where stan user is confused---I wrote exactly that
> a few messages back.
>
> - Bob
>
> > On May 11, 2017, at 1:30 PM, Ben Goodrich <goodri...@gmail.com> wrote:
> >
> > On Thursday, May 11, 2017 at 11:59:56 AM UTC-4, stan user wrote:
> > y ~ binomial_logit( 1 , p );
> >
> > First, it does not make much sense to use binomial_* with a size of 1. It is clearer to use bernoulli_*. In your case, it would be
> >
> > y ~ bernoulli(Phi(p));
> >
> >
> > dev = dev + (-2)*binomial_logit_lpmf( y | 1 , p );
> >
> > Change this line to
> >
> > dev = dev - 2 * bernoulli_lpmf(y | Phi(p))
> >
> > But deviance calculations are pretty useless to a Bayesian.
> >
> > Ben
> >
> >
> >
> > --
> > You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> > To post to this group, send email to stan-...@googlegroups.com.
> > For more options, visit https://groups.google.com/d/optout.
>
> --
> You received this message because you are subscribed to a topic in the Google Groups "Stan users mailing list" group.
> To unsubscribe from this topic, visit https://groups.google.com/d/topic/stan-users/7ynljqtxLWk/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to stan-users+...@googlegroups.com.
> To post to this group, send email to stan-...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
>
> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.

new2...@gmail.com

unread,
May 12, 2017, 10:12:15 AM5/12/17
to Stan users mailing list
I am evidently farther behind than I realized. My understanding was that I needed to calculate the deviance in order to compute the WAIC. That's why I was intent on solving that part of the model code. I will try to work my way through the linked paper. Thanks!

- Ronaldo

new2...@gmail.com

unread,
May 27, 2017, 4:18:30 PM5/27/17
to Stan users mailing list
Hi again, everyone.

I took Bob's advice and researched WAIC, which looks to be a preferred option for comparing models. I found a package that can calculate WAIC of models if I calculate a "log_lik" for every observation. I thought that the "generated quantities" code below would do this, but I have failed on the syntax. Is there an easy fix to this?

Gracias y saludos,
Ronaldo 


model_code_b <- "

data{

    int N;

    int N_j_ID;

    int y[N];

    int j_ID[N];

}


parameters{

    real a;

    // Random effect for higher-level

    vector[N_j_ID] v_j_ID;

    real<lower=0> sigma_j_ID;

}



model{

    vector[N] p;

    

    //priors

    a ~ normal( 0 , 5 );


    sigma_j_ID ~ exponential (1);



// LIKELIHOOD

    for ( i in 1:N ) {

        

        p[i] = 

        a

       + v_j_ID [ j_ID[i]] * sigma_j_ID

       ;

      }

      

      // update target

      y ~ bernoulli (Phi(p));

      }

      

generated quantities{

vector[N] p;

vector[N] log_lik;

for ( i in 1:N) {

p[i] =

a

       + v_j_ID [ j_ID[i]] * sigma_j_ID;

       log_lik[i] = bernoulli_lpmf (y[i]) | Phi(p) );

Bob Carpenter

unread,
May 27, 2017, 7:04:03 PM5/27/17
to stan-...@googlegroups.com
First you have to get your model to compile.

You want to pull your model into a separate file and then
the line numbers in the syntax error will tell you where
the error is.

Stan provides the loo package for computing cross-validation
and WAIC.

- Bob

P.S. We're closing this list and moving to: http://discourse.mc-stan.org
Reply all
Reply to author
Forward
0 new messages