Observation-level random effect variance, Bernoulli outcomes


Eric Payne

Feb 14, 2018, 9:34:19 PM
to brms-users
Dear all,

I am new to the brms package, so please forgive me if this is a simple question.

When including an observation-level random effect (i.e., a random effect for each row in your data frame), is it possible to set the variance of that random effect to a fixed value within brms? I ask because for binomial data in which trial size is one (i.e., Bernoulli), the residual variance cannot be estimated (see Hadfield's MCMCglmm Course Notes, chapter 2, section 6, starting page 48, for some discussion of this issue). In the rethinking and MCMCglmm packages, one can set this variance to a specific value, but I am unsure of how to do so within the brms syntax. Below, I show a general model in the rethinking and MCMCglmm syntax, hopefully further explaining what I am trying to do. Thanks much!

Rethinking model (line 6 is the point of the question)

line 1:    model <- map2stan(
line 2:    alist(
line 3:    bernoulli_outcome ~ dbinom(1, p), # trial size is set to 1, with probability of success p
line 4:    logit(p) <- a + a_olre[olre], # a_olre is a random intercept per observation, i.e., each row of the data (observation-level random effect, olre)
line 5:    a ~ dnorm(0, 10), # prior for the fixed intercept a
line 6:    a_olre[olre] ~ dnorm(0, 1)), # prior for the olre ***
line 7:    data = list(your data), start = list(starting values), iter = some number, warmup = some number)


***At line 6, because the olre variance cannot be estimated for binomial data in which trial size is 1, we set the standard deviation of the olre prior to "1" (instead of "a_olre[olre] ~ dnorm(0, sigma_olre)", with another prior for sigma_olre). Is there a way to set this olre standard deviation within brms? Or does brms handle this internally and set it to zero (as lme4 does) or some other value?


Likewise, for a Bernoulli model (family = "categorical") in MCMCglmm, in which there are no random effects except for the observation-level intercept, one could specify a prior as follows (only showing the olre part of the prior):

prior = list(R = list(V = 1, fix = 1)). This prior fixes the olre variance to the value specified by V, in this case 1.
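Put together, a minimal MCMCglmm sketch of this setup might look like the following (the data frame dat and the response infected are hypothetical placeholders):

    library(MCMCglmm)

    # Fix the residual (olre) variance at V = 1 instead of estimating it.
    fixed_prior <- list(R = list(V = 1, fix = 1))

    model <- MCMCglmm(infected ~ 1, family = "categorical",
                      prior = fixed_prior, data = dat, verbose = FALSE)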


Within brms, I would imagine setting a fixed olre prior somewhat like the following:

First, assuming the formula looks something like: formula = outcome ~ 1 + (1 | olre), data = data, family = bernoulli("logit")

The prior for the olre could be:

prior = c(set_prior("normal(1, 0)", class = "sd", group = "olre"))


But under this method, a test model I have run still estimates the olre variance, rather than fixing it to the chosen value.
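For reference, newer versions of brms provide a constant() prior that fixes a parameter to a given value; a minimal sketch under that assumption (the data frame dat and variable names are hypothetical):

    library(brms)

    dat$olre <- factor(seq_len(nrow(dat)))  # one olre level per row

    fit <- brm(outcome ~ 1 + (1 | olre),
               data = dat, family = bernoulli("logit"),
               prior = prior(constant(1), class = "sd", group = "olre"))

With the olre standard deviation fixed at 1, only the remaining parameters are sampled.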

I'd greatly appreciate any help or thoughts you may have!


Thanks,


Eric

Paul Buerkner

Feb 15, 2018, 2:27:54 AM
to Eric Payne, brms-users
Hi Eric,

brms currently provides no interface to fix parameters of random effects.

However, I wonder: why do you need this random effect in the first place? After all, it is basically unidentifiable. What is the advantage of having it?

Best,
Paul


Eric Payne

Feb 15, 2018, 6:38:55 PM
to brms-users
Hi Paul,

In my context, it's primarily useful for calculating repeatability in overdispersed data (also called the ICC or, in another post within the brms group, the VPC). I study parasites and non-human animal personality, and I use repeatability (variance among individuals divided by [variance among individuals plus residual variance]) as a metric to evaluate whether a population exhibits consistent among-individual variation in behavior and parasite infection. The infection data are binary (infected: yes or no).

To deal with additive overdispersion in these data, I include an observation-level random effect (olre) in the binomial model. If I aggregate the data by individual, so that each individual has only one row in the data, then trial size exceeds 1 and, as far as I understand, the olre variance is estimable. But with only one row per individual, it doesn't make sense to include an individual-level random intercept. Instead, I do not aggregate the data, so that trial size is one. Here, I still include an olre, but do not estimate its variance, instead setting it to an arbitrary value as per Hadfield's recommendation. In this case, repeatability for a logit-link Bernoulli model is (among-individual variance) / (among-individual variance + olre variance (set to, e.g., 1) + pi^2/3) (though see Nakagawa et al. 2017, Journal of the Royal Society Interface, for a slightly different formulation regarding pi^2/3). I suppose I could use a beta-binomial outcome approach to deal with the overdispersion multiplicatively, but I do not know how to derive the repeatability equation for such a distribution.
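As a concrete sketch, assuming a fitted brms model fit with an individual-level intercept and the olre variance fixed at 1 (the grouping-factor name individual is hypothetical), repeatability could be computed from the posterior draws like this:

    post <- posterior_samples(fit)                  # one row per posterior draw
    var_id <- post[["sd_individual__Intercept"]]^2  # among-individual variance
    R <- var_id / (var_id + 1 + pi^2 / 3)           # olre variance fixed at 1
    quantile(R, probs = c(0.025, 0.5, 0.975))       # median and 95% interval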

I could also of course assume the absence of overdispersion, but Hadfield argues that we should assume its presence rather than its absence, and indeed his "categorical," "poisson," and "ordinal" family models automatically include an olre. However, I admit that I don't have the statistical expertise to evaluate whether doing so is appropriate. From a biological standpoint, it seems reasonable to me that infection data would be overdispersed. As for other variables, say ordinal behavior data (which I have), I am unsure whether it is reasonable to assume overdispersion. But it's certainly true in my specific dataset that most individuals are timid while a few are hyper-aggressive, which could be analogous to overdispersion.

I hope this clarifies why I was asking about olre variance.

Cheers,

Eric


Paul Buerkner

Feb 15, 2018, 8:09:08 PM
to Eric Payne, brms-users
Your understanding is correct, but I wonder why we don't just set the olre variance to zero in this case, since it is unidentifiable anyway for binary data.

Let's move away from the GLMM for a second and think of the binomial model with a multiplicative overdispersion parameter omega (rather than an additive one). For such a model, we could define

ICC = sigma_a^2 / (sigma_a^2 + omega * pi^2/3)

and, for binary data, we would always set omega = 1. I believe this is also what Nakagawa says somewhere in "Repeatability for Gaussian and non-Gaussian data: a practical guide for biologists". The equivalent additive version of omega = 1 (to obtain the same ICC) is sigma_e = 0.
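A quick numeric check of that equivalence, with illustrative numbers only:

    sigma_a2 <- 0.9                                   # among-individual variance
    icc_mult <- sigma_a2 / (sigma_a2 + 1 * pi^2 / 3)  # multiplicative, omega = 1
    icc_add  <- sigma_a2 / (sigma_a2 + 0 + pi^2 / 3)  # additive, sigma_e = 0
    stopifnot(all.equal(icc_mult, icc_add))           # identical ICCs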

I haven't been working with ICCs in these biological cases, so I won't present this as the only valid way, but at least it appears naively reasonable to me.



Paul Buerkner

Feb 16, 2018, 2:06:20 AM
to brms-users
Please keep brms-users in CC.

Here is what Eric wrote:

Right, I see in the 2010 paper that N&S discuss beta-binomial outcomes and repeatability. I had forgotten. Thanks for pointing that out.

I understand your point that setting omega = 1 is the same as sigma_e = 0 in an additive model, and that we could do so for binary data, since omega and sigma_e are unidentifiable in such data. For me, the question of why we should do so depends on the word unidentifiable. It seems that if we set omega = 1 (or sigma_e = 0), then aren't we assuming that there is no overdispersion? But in fact there may be; we just cannot estimate it for binary data. By setting omega > 1 or sigma_e > 0, we instead assume that there is overdispersion, acknowledge that we cannot actually estimate it, and set it to a user-chosen value.

I sound as if I'm taking a strong stand in favor of setting sigma_e > 0. But this is an ongoing question for me. I don't have a great reason for choosing sigma_e > 0 as opposed to sigma_e = 0, other than the subjective argument above.

-Eric





Paul Buerkner

Feb 16, 2018, 2:16:29 AM
to brms-users
For me, the only non-arbitrary value is sigma_e = 0 in this case.

Considering the ICC, we can make it arbitrarily small just by using a very large value of sigma_e > 0. So the question is: what justifies your sigma_e = 1? Why not, say, sigma_e = 2?
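To illustrate with arbitrary numbers, the ICC shrinks steadily as the fixed olre variance grows:

    sigma_a2 <- 0.9
    sapply(c(0, 1, 2, 10), function(sigma_e2)
      sigma_a2 / (sigma_a2 + sigma_e2 + pi^2 / 3))
    # ~0.21, ~0.17, ~0.15, ~0.06: any positive choice is arbitrary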

Also think of the case where we have summed over the trials of a person, so that we now have a binomial response with multiple trials, but only one observation per person.
Then, we still *cannot* estimate sigma_a and sigma_e separately, because the variance between persons is the same as the residual variance.

For both of these to be estimable at the same time, we need multiple trials per observation *and* multiple observations per person.
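A small simulation makes the confounding concrete (illustrative numbers only):

    set.seed(42)
    n <- 1e5
    eta_split <- rnorm(n, 0, 0.8) + rnorm(n, 0, 0.6)  # person effect + olre
    eta_comb  <- rnorm(n, 0, sqrt(0.8^2 + 0.6^2))     # single combined effect
    c(sd(eta_split), sd(eta_comb))                    # both ~1: the data cannot
                                                      # separate sigma_a and sigma_e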