
May 23, 2017, 12:48:56 PM

to Stan users mailing list

Hey Everyone,

This question is relevant to a reply I made at the end of an earlier thread, but I think it was buried. I am fitting a multivariate logistic regression model in rstanarm wherein performance is low but not at floor. Data is also very sparse; there are two conditions and participants contribute only a single binary response to each. I am trying to fit random intercepts and slopes. I have dealt with logistic regression often in the past, but usually in cases where performance was closer to 50% and/or where data was plentiful. Usually the estimates from the model don’t match with basic proportions, but they are not so far off. In this particular instance, the estimates I get from the model differ quite a bit from the raw proportions, more than I am used to seeing. I wanted to double check whether I am doing something unreasonable with my priors that could account for this.

The model I am fitting is as follows:

```r
mod = stan_glmer(switched ~ cond + order + choices.s + relprefc1.s + ici.s + bore.s + fat.s + (cond | ID),
                 data = dat,
                 family = binomial,
                 adapt_delta = .999,
                 iter = 6000,
                 prior = normal(0, 1.5),          # Regularizing prior
                 prior_intercept = normal(-1, 1)) # Encourages grand mean ~5% - ~73%
```

All predictors are standardized using the scale() function except for cond and order, which are categorical and have been coded as -1 and 1 (following something I read in a post by Andrew). I have applied a mildly informative prior to the coefficients and a prior to the intercept (which should correspond to the grand mean) reflecting our belief that overall performance should fall no lower than ~5%.
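For concreteness, the coding described above might look like this. This is a self-contained sketch with toy data; the real dataset and column values differ:

```r
# Toy stand-in for the real dataset (hypothetical values)
dat_toy <- data.frame(cond    = rep(c("stay", "switch"), each = 4),
                      choices = c(3, 5, 2, 8, 4, 6, 7, 1))

# Effect-code the two-level factor as -1 / 1
dat_toy$cond.c <- ifelse(dat_toy$cond == "stay", -1, 1)

# Standardize a continuous predictor (mean 0, SD 1) with scale()
dat_toy$choices.s <- as.numeric(scale(dat_toy$choices))
```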

In doing this, I expected the intercept to be close to the overall mean proportion, but when I check this:

```r
# Check the grand mean against the intercept
mean(dat$switched)
plogis(mod$coefficients[1]) # Probability at the mean/centre of all variables
```

I instead get .15 for the raw proportion and .07 for the back-transformed intercept. Similarly, estimates I calculate for each condition (using posterior_linpred) are generally much lower than the corresponding proportions. Everything seems shifted sharply towards 0. The prior placed on the intercept helps with this situation somewhat. Is this cause for concern? Or is it just the nature of logistic regression near a boundary and with limited data? Am I using inappropriate priors? I have posted some data and code, in case they are helpful.

Cheers!

Jon

May 23, 2017, 3:39:01 PM

to Stan users mailing list

On Tuesday, May 23, 2017 at 12:48:56 PM UTC-4, Jon Fawcett wrote:

All predictors are standardized using the scale() function except for cond and order, which are categorical and have been coded as -1 and 1 (following something I read in a post by Andrew). I have applied a mildly informative prior to the coefficients and a prior to the intercept (which should correspond to the grand mean) reflecting our belief that overall performance should fall no lower than ~5%.

That was pre-Stan and pre-rstanarm and pre-Andrew-changing-his-mind. Centering is not the worst thing in the world, but (all) the predictors are centered internally by stan_glmer(), so you don't need to do so yourself in order to interpret prior_intercept as pertaining to the centroid observation. Dividing by the predictor's standard deviation is a worse idea and also not necessary because by default normal() specifies autoscale = TRUE, which creatively reinterprets the scale parameter to be in standard-deviation units. Centering and scaling just make it harder to do posterior prediction correctly. So, I would say that the only transformations you should be doing are to give things reasonable units (e.g. thousands of dollars rather than dollars), and even then it is not strictly necessary if you specify the recommended but non-default argument QR = TRUE when there are multiple predictors.
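To see what those defaults amounted to in a given fit, rstanarm's prior_summary() reports the priors actually used, including any rescaling from autoscaling (a sketch; `mod` is assumed to be the stan_glmer() fit from the first message):

```r
# Show the priors rstanarm actually applied, after any autoscaling;
# 'mod' is assumed to be the fitted stan_glmer() object from the thread
prior_summary(mod)
```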

I am not really seeing a problem when QR = TRUE and all the other arguments are left at their default values. A normal(-1,1) prior on the intercept is a lot stronger than the default one, which is fine. But there does not seem to be a lot of information in the data about the global intercept when there are so many cond-specific intercept shifts, so a strong prior on the global intercept shrinks it toward zero a lot, which also affects the posterior distribution of everything else.

Ben

May 23, 2017, 9:02:34 PM

to stan-...@googlegroups.com

Thanks, Ben!

Looks like I need to read up on what rstanarm is doing behind the scenes. Your post was very educational. However, I worry I am still missing something. I tried running the model with QR = TRUE and all other arguments left at their defaults, removing the priors and also switching from standardized back to centred predictors (either at -1/1 or mean centred). This corresponds to the following model:

```r
mod_nopriors = stan_glmer(switched ~ cond + order + choices + relprefc1 + ici + bore + fat + (cond | ID),
                          data = dat,
                          family = binomial,
                          QR = TRUE)
```

However, this produces even more extreme estimates:

```
Estimates:
            Median MAD_SD
(Intercept) -6.6    2.6
cond         1.2    0.9
order        0.2    0.6
choices      5.2    3.0
relprefc1   -0.4    0.3
ici         -4.7    2.4
bore         0.8    0.5
fat          0.5    0.4

Error terms:
 Groups Name        Std.Dev. Corr
 ID     (Intercept) 3.8
        cond        3.8      0.14
Num. levels: ID 96
```

Once back-transformed, the intercept (representing performance at the mean of all predictors) is now .0013. If I re-introduce the prior on my intercept (prior_intercept = normal(-1, 1)), but exclude the prior on my coefficients (I think the help file says you should not use informative priors if QR = TRUE) then it looks a little better:

```
Estimates:
            Median MAD_SD
(Intercept) -2.7    0.4
cond         0.6    0.3
order        0.1    0.3
choices      2.7    1.2
relprefc1   -0.2    0.1
ici         -2.1    0.7
bore         0.3    0.2
fat          0.2    0.1

Error terms:
 Groups Name        Std.Dev. Corr
 ID     (Intercept) 0.82
        cond        0.80     0.14
Num. levels: ID 96
```

Not bad. Still, the intercept seems lower than I would have expected.

Nonetheless, maybe I am asking the wrong question. What I am trying to accomplish here is to use my model to (a) draw judgments as to the strength of association between each predictor and the dependent measure, and, (b) produce counterfactual plots for each variable that I can include in my article to demonstrate the nature of these associations. For (a) I figure I already have this sorted, using the posterior of each coefficient. In this sense, the absolute value of the intercept is less important. For (b), from skimming various texts on regression, it seems the thing to do is to present a plot depicting the effect of each variable at the mean of all other variables – hence the centred predictors. For example, looking at the effect of cond, I would use:

```r
plot_dat = posterior_linpred(mod, re.form = NA, transform = TRUE,
                             newdata = expand.grid(cond = c(-1, 1), order = 0, choices = 0,
                                                   relprefc1 = 0, ici = 0, bore = 0, fat = 0))
```

This produces estimates of .04 and .11 for cond == -1 and cond == 1, respectively, which I would interpret as representing the mean expected performance in each condition for a “typical” participant (due to re.form=NA). However, the estimates for each condition are very low, and in fact the basic proportion for each condition, ignoring all other predictors, would be .08 and .22 – which do not even fall within the 95% HDI of the relevant estimates. Is there a better way to summarize the results graphically?
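As an aside, a matrix of draws like `plot_dat` can be summarized column by column. This sketch uses a simulated matrix in place of the real posterior_linpred() output, so it runs on its own:

```r
# Simulated stand-in for posterior_linpred() draws:
# rows are posterior iterations, columns are the two rows of newdata
set.seed(1)
plot_dat_sim <- cbind(rbeta(4000, 2, 40),   # cond == -1
                      rbeta(4000, 5, 40))   # cond == 1

# Posterior median and central 95% interval for each condition
est <- apply(plot_dat_sim, 2, median)
ci  <- apply(plot_dat_sim, 2, quantile, probs = c(0.025, 0.975))
```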

Cheers!

Jon

May 23, 2017, 10:54:18 PM

to Stan users mailing list

On Tuesday, May 23, 2017 at 9:02:34 PM UTC-4, Jon Fawcett wrote:

Looks like I need to read up on what rstanarm is doing behind the scenes. Your post was very educational. However, I worry I am still missing something. I tried running the model with QR = TRUE and all other arguments left at their defaults, removing the priors and also switching from standardized back to centred predictors (either at -1/1 or mean centred). This corresponds to the following model:

```r
mod_nopriors = stan_glmer(switched ~ cond + order + choices + relprefc1 + ici + bore + fat + (cond | ID),
                          data = dat,
                          family = binomial,
                          QR = TRUE)
```

However, this produces even more extreme estimates:

The posterior distribution of that intercept is given the uncentered predictors, even though the prior on the intercept is given centered predictors. So, you can interpret the posterior distribution as the log-odds of someone who is zero on all other predictors. In the generated quantities block of the underlying Stan program, it shifts the intercept given centered predictors back to the regular intercept so that it is comparable with lme4::glmer.

Nonetheless, maybe I am asking the wrong question. What I am trying to accomplish here is to use my model to (a) draw judgments as to the strength of association between each predictor and the dependent measure, and, (b) produce counterfactual plots for each variable that I can include in my article to demonstrate the nature of these associations. For (a) I figure I already have this sorted, using the posterior of each coefficient. In this sense, the absolute value of the intercept is less important. For (b), from skimming various texts on regression, it seems the thing to do is to present a plot depicting the effect of each variable at the mean of all other variables – hence the centred predictors.

That is standard advice but I don't really like it. What I prefer is to start with a constructed dataset where cond = -1 and the other predictors are at their means given that cond = -1. Get that posterior_linpred() and then do it again changing only cond to 1. If cond were randomly assigned, it wouldn't matter, but in general I like conditional means or medians.
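A sketch of that construction, with names taken from the thread; the handling of ID here (a new level, so the varying effects are drawn from their estimated distribution) is an assumption, not part of the advice above:

```r
# Means of the other predictors among the cond == -1 observations
covs <- c("order", "choices", "relprefc1", "ici", "bore", "fat")
nd_lo <- as.data.frame(lapply(dat[dat$cond == -1, covs], mean))
nd_lo$cond <- -1
nd_lo$ID <- "new_participant"  # assumed: an ID not in the original data

nd_hi <- nd_lo
nd_hi$cond <- 1  # change only cond, keeping the conditional means fixed

p_lo <- posterior_linpred(mod, newdata = nd_lo, transform = TRUE)
p_hi <- posterior_linpred(mod, newdata = nd_hi, transform = TRUE)
```

The draws of `p_hi - p_lo` then describe the condition effect on the probability scale.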

For example, looking at the effect of cond, I would use:

```r
plot_dat = posterior_linpred(mod, re.form = NA, transform = TRUE,
                             newdata = expand.grid(cond = c(-1, 1), order = 0, choices = 0,
                                                   relprefc1 = 0, ici = 0, bore = 0, fat = 0))
```

I also do not like re.form = NA, but a lot of people do it and Andrew seems to think it is acceptable. The reason I don't like it is that the posterior distribution of alpha, beta, Sigma, etc. is jointly determined along with b. And then you are evaluating a function of alpha and beta after setting all the b's to zero without deriving the distribution of alpha, beta | b = 0. I prefer predictions that integrate the b's out (by predicting for a new ID that wasn't in the original dataset) or average the predictions over each level of ID in the data.
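Both alternatives could be sketched as follows (assumes `mod` and `dat` from earlier in the thread; the new-ID label is hypothetical):

```r
# (1) Integrate the b's out: predict for an ID not in the original data,
# so the varying intercept and slope are drawn from their estimated distribution
nd_new <- expand.grid(cond = c(-1, 1), order = 0, choices = 0,
                      relprefc1 = 0, ici = 0, bore = 0, fat = 0,
                      ID = "new_participant")
p_new <- posterior_linpred(mod, newdata = nd_new, transform = TRUE)

# (2) Average predictions over the IDs observed in the data
nd_all <- expand.grid(cond = c(-1, 1), ID = unique(dat$ID))
nd_all[, c("order", "choices", "relprefc1", "ici", "bore", "fat")] <- 0
p_all <- posterior_linpred(mod, newdata = nd_all, transform = TRUE)
# One column of p_all per row of nd_all; average across IDs within condition
p_cond <- sapply(c(-1, 1), function(k) rowMeans(p_all[, nd_all$cond == k]))
```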

Ben

May 24, 2017, 8:11:55 PM

to Stan users mailing list

Thanks Ben.

As always, this has been exceptionally helpful!
