Help with an unbalanced mixed model with repeated measures

Lauren Pincus

unread,

Oct 29, 2015, 7:24:30 AM10/29/15

to Davis R Users' Group

Hi DRUG,

I have a mixed model conundrum that I’m wondering if someone can help me understand. For my study I grew plants under 9 different fertilizer conditions and measured yield across many farmers’ plots in two different seasons. Because some farmers left the study and others joined, I have an unbalanced design with some sites only participating in Year 1 (7), others only participating in Year 2 (14), and still others who participated both years (24). This leads to the majority of sites violating the assumption of independence and requiring a repeated measure.

Beyond having some sites that participated twice, I also have a mixed design: my farm sites are random (to extend the conclusions to the general region), but my treatments are fixed (my treatments are discrete conditions and generalizing outside of them would not make sense). Year is also fixed because I only have two years and do not want to expend the conclusions to other years.

My question is: is this model using R’s nlme package appropriate to analyze a mixed model with repeated measures? How does the lme command take into account the unbalanced design and repeated measure? If it does! This is based off of examples I found online, but I haven’t found the online resources super helpful. Does anyone out there have a superior knowledge of nlme than me who could tell me if this model is appropriate?

mod <- lme(yield ~ treatment + season, random= ~1|site, data=df)

Thanks in advance!

Lauren

Jaime Ashander

unread,

Oct 29, 2015, 12:43:12 PM10/29/15

to davi...@googlegroups.com

Hi Lauren,

I agree the model you propose is likely the best one for your data*. In the model you specify, nlme estimates for each site a random variation drawn from a common distribution. These random site effects are consistent across years and treatments. The package authors Pinheiro and Bates wrote a book on nlme (2000 https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=pinheiro+bates ) that explains how to use the package to fit mixed models, and describes some of the theory behind it.

Briefly, nlme estimates random effects using regression models fit via likelihood methods that not subject to the balancing assumption of classic ANOVA. It does make assumptions -- that the remaining ("within-group") error is iid normal as are the random effects. I encourage you to look at Pinheiro and Bates, particularly section 4.3 on examining the fit of your model and checking if these assumptions are met.

For a general discussion of mixed models and designs in easy-to-access form see the paper "Nested by Design" by Schielzeth and Nakagawa. (2013. Methods in Ecology and Evolution http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210x.2012.00251.x/abstract ). These authors briefly discuss repeated measures, but assume a situation where more within-site measurements are taken*.

- Jaime

*If there were several measures in time for each site, you'd want to allow for different temporal effects within sites

mod_many <- lme(yield ~ treatment + season, random = ~ season | site, data=df_many)

But, since you have two seasons, there almost certainly isn't enough data to do this and it is likely less of an issue.

--
Check out our R resources at http://www.noamross.net/davis-r-users-group.html
---
You received this message because you are subscribed to the Google Groups "Davis R Users' Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to davis-rug+...@googlegroups.com.
Visit this group at http://groups.google.com/group/davis-rug.
For more options, visit https://groups.google.com/d/optout.

Bonnie Dixon

unread,

Oct 29, 2015, 12:43:39 PM10/29/15

to davi...@googlegroups.com

I think your model is correct, Lauren. Or, alternatively, you could set it up this way:

mod <- lme(yield ~ treatment + season, random= ~ treatment + season |site, data=df)

Including treatment and season in the "random" argument would allow the effects of treatment and season to vary between different sites. But this is optional and would make the model more complex.

But, am I remembering correctly from talking with you before that you have weather data too? If it were me, I would include the weather in the model instead of the season/year. (Season would still be the repeats in your repeated measures design even if it is not modeled explicitly as a variable.) My thinking is that what really matters to crop yield is not what calendar year it was, but what the weather was that year.

I hope that helps.

Bonnie

On Thu, Oct 29, 2015 at 4:24 AM, Lauren Pincus <lpi...@gmail.com> wrote:

--

Jaime Ashander

unread,

Oct 29, 2015, 3:51:47 PM10/29/15

to davi...@googlegroups.com

Bonnie's answer made me realize I didn't think carefully about the repeated measures in your study and should clarify my final comment.

The full model Bonnie suggests would be ideal, but because you're most interested in the effect of treatment, and you're repeatedly measuring the treatment effect across two years, a model with treatment in the random term would be better than just random intercepts or a random slope on season (i.e., mod <- lme(yield ~ treatment + season, random= ~ treatment | site, data=df) rather than the model in my final comment below).

But, I still think data limitations (specifically the low number of repeated measurements within each site) mean you'll have trouble fitting (either the model won't converge or if it does the estimated random slopes and intercepts will have very high correlation and shouldn't be trusted).

I agree with Bonnie that its worth thinking about weather covariates instead of season as a factor

Lauren Pincus

unread,

Nov 5, 2015, 3:12:44 PM11/5/15

to Davis R Users' Group

Thanks so much Bonnie and Jaime! Yes, Bonnie, we talked about this last year maybe – I’m dealing with comments now and one comment is about the repeated measures part of my design. I remember thinking about that as I set up this model, but now I don’t remember how nmle deals with repeated measures. Bonnie, would you mind explaining what you meant when you said “Season would still be the repeats in your repeated measures design even if it is not modeled explicitly as a variable?” Like Jaime said, I only have two repeated measurements per site. I don't know if that means running the model like I did [ mod <- lme(yield ~ treatment + season, random= ~1|site, data=df) ] means the estimated slopes and intercepts (for seasons only?) are highly correlated and shouldn't be trusted.

I went with seasons in the model instead of using the rainfall data I had since rainfall actually does NOT have a significant effect, but season is significant. So there are other things that happened that season that made a difference and I discuss what those could be.

Thanks again!

Jaime Ashander

unread,

Nov 5, 2015, 3:21:54 PM11/5/15

to davi...@googlegroups.com

Glad to help! I added a clarifying comment below

Thanks so much Bonnie and Jaime! Yes, Bonnie, we talked about this last year maybe – I’m dealing with comments now and one comment is about the repeated measures part of my design. I remember thinking about that as I set up this model, but now I don’t remember how nmle deals with repeated measures. Bonnie, would you mind explaining what you meant when you said “Season would still be the repeats in your repeated measures design even if it is not modeled explicitly as a variable?” Like Jaime said, I only have two repeated measurements per site. I don't know if that means running the model like I did [ mod <- lme(yield ~ treatment + season, random= ~1|site, data=df) ] means the estimated slopes and intercepts (for seasons only?) are highly correlated and shouldn't be trusted.

The correlation issue that I mentioned occurs when fitting a model with both random slopes and intercepts [mod <- lme(yield ~ treatment + season, random= ~ treatment | site, data=df) ] to data with few elements of each group.

The model you just listed above shouldn't have issues with correlation in the random effects -- they are only estimated at one level so no correlation to worry about.

I went with seasons in the model instead of using the rainfall data I had since rainfall actually does NOT have a significant effect, but season is significant. So there are other things that happened that season that made a difference and I discuss what those could be.

Thanks again!

--

Bonnie Dixon

unread,

Nov 9, 2015, 9:24:03 PM11/9/15

to davi...@googlegroups.com

Hi Lauren.

Apologies for the delayed reply. This has been a crazy week for me.

Here is an attempt at an explanation:

As I'm sure you know, in repeated measures data, the observations are not all independent and unrelated to each other. Instead there are groups of observations that are related to each other, and expected to be correlated with one another, because they are repeated measurements taken from the same sampling units at different times. This is why repeated measures data is sometimes referred to as "clustered" or "nested" data. In the case of your data, the observations came from different farm sites, and repeated observations from the same site would be expected to correlate. So the sites give a natural grouping structure to the data. When you put "|site" in the "random" argument of the lme() command, what you are doing is grouping the data by site. So, in your model, all of the observations from one site are treated as a group. lme() creates a separate regression model for each group (in this case each farm site). The overall model results that are presented are the expected results for a typical group (i.e. site). (My understanding is that the overall model is basically a weighted average of all of the group models.)

So, when I said that “season would still be the repeats in your repeated measures design even if it is not modeled explicitly as a variable", I just meant that even without including season as a variable in the model, all the observations from both seasons would still be included in the analysis, and grouped appropriately by site. That said, if the weather data you have is not a significant predictor and season is, then I think you made the right choice to include season as a variable.

I hope that helps. (Maybe someone else can explain it better.) Perhaps you just need to explain this to your reviewer.

Bonnie

Lauren Pincus

unread,

Nov 11, 2015, 6:47:09 AM11/11/15

to Davis R Users' Group

Hi Bonnie,

That is very clear and helps me understand what lme() is doing. Thanks for taking the time to explain that!

Lauren

Zachary Pierce

unread,

Mar 5, 2016, 2:16:09 AM3/5/16

to Davis R Users' Group

Hello all!

I have a similar question, and didn't want to make a new post, so I figured someone will see it and get it.

I have a dataset (don't we all?) and I'm testing the effects of breastfeeding status (dichotomous) on changes in the concentration of an immune cell marker (neopterin) within and across times (6, 11, 15 wks). I have 4 variables to my set: infant (or subject, and I'm not sure if this is necessary), bf status (ebf or non-ebf) and week. So, is neopterin different between groups at each time, and do neopterin concentrations change across times?

time is a random effect, no? I'm not entirely sure how to approach my code. Here is what I have:

neopterin$MainPlot = with(neopterin,(interaction(wk,infant)))

mixed=lmer(neo~bfstat + (1+wk|infant),neopterin)

my anova returns:

Error: number of observations (=134) <= number of random effects (=2209) for term (1 + wk | infant); the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable

then:

Analysis of Variance Table of type III with Satterthwaite

approximation for degrees of freedom

Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)

bfstat 215550 215550 1 0

infant 22690500 504233 45 0 0.71132

so what happened to my degrees of freedom!!!??? And everything else?!

Any thoughts?

Reply all

Reply to author

Forward