[R] multiple comparisons for GAMs

767 views
Skip to first unread message

Bird_Girl

unread,
Aug 2, 2012, 3:09:46 PM8/2/12
to r-h...@r-project.org
Hi,

I have a question regarding whether it is possible to do post hoc tests on a
model fit with GAM {mgcv}. My response variable is abundance (no.
individuals per plot), and I have one continuous predictor (light) and one
factor (height) which includes 7 levels.

> mod2=gam(log_abundance~s(light)+height+te(light,by=height)+s(long)+s(lat))

The relationship between log_abundance and light at the seven levels of
height all differ significantly from the overall relationship between
log_abundance and light, and relationships at most of the 7 levels are not
linear. I would like to do some kind of multiple comparison or post hoc
test to determine whether the relationship between log_abundance and light
differs significantly among the different levels of height (i.e., is the
relationship at 200 m different from that at 400 m)? Is there any way to do
this?

Thanks in advance, and I apologize if this is a stupid question – I am new
to R.


--
View this message in context: http://r.789695.n4.nabble.com/multiple-comparisons-for-GAMs-tp4638935.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

peter dalgaard

unread,
Aug 3, 2012, 4:39:33 AM8/3/12
to Bird_Girl, r-h...@r-project.org

On Aug 2, 2012, at 21:09 , Bird_Girl wrote:

> Hi,
>
> I have a question regarding whether it is possible to do post hoc tests on a
> model fit with GAM {mgcv}. My response variable is abundance (no.
> individuals per plot), and I have one continuous predictor (light) and one
> factor (height) which includes 7 levels.
>
>> mod2=gam(log_abundance~s(light)+height+te(light,by=height)+s(long)+s(lat))
>
> The relationship between log_abundance and light at the seven levels of
> height all differ significantly from the overall relationship between
> log_abundance and light, and relationships at most of the 7 levels are not
> linear. I would like to do some kind of multiple comparison or post hoc
> test to determine whether the relationship between log_abundance and light
> differs significantly among the different levels of height (i.e., is the
> relationship at 200 m different from that at 400 m)? Is there any way to do
> this?
>
> Thanks in advance, and I apologize if this is a stupid question – I am new
> to R.

Nothing stupid about it, but maybe difficult to give a complete answer to in email (a simplified, reproducible example would help so that readers can try out suggestions).

I would expect that library(multcomp) is your friend. It should work since gam objects have coef() and vcov() methods. Look into glht() and mcp(height="Tukey").

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk Priv: PDa...@gmail.com

Bird_Girl

unread,
Aug 7, 2012, 12:48:01 PM8/7/12
to r-h...@r-project.org
I have looked into using glht (‘multcomp’ package) to do multiple comparisons
for a model fit with GAM {mgcv} but after reading the description of the
‘multcomp’ package, I believe this method only applies to parametric models
and linear hypotheses. When I ran the code glht(model,linfct….) I got an
error message (see below), but I am not sure if that was the result of my
data being input in the incorrect format or the result of the test not
working on the type of model I used. Does anyone know if there is an
equivalent procedure for non-parametric models, or one that can be used to
test for significant differences in curves modeled using GAMs?

I have included a highly simplified, mini dataset that approximates the
shape of the relationships exhibited by my own much larger data set, as well
as the code I used. In a nutshell, I would like to test whether the shape
of the relationship between light and log_abundance at a height of 200m is
significantly different from the relationship between light and
log_abundance at a height of 400m.


> require (lattice)
> require (mgcv)
> require(multcomp)
> species=read.csv("Book1.csv", header=TRUE, sep=",", quote="\"",fill=TRUE)
> attach(species)
> names(species)
> species
height light log_abundance
1 200m 0.15 0.20
2 200m 0.23 0.28
3 200m 0.38 0.30
4 200m 0.41 0.47
5 200m 0.52 0.48
6 200m 0.63 0.42
7 200m 0.71 0.37
8 200m 0.85 0.35
9 200m 0.96 0.40
10 200m 1.05 0.45
11 200m 1.16 0.37
12 200m 1.23 0.30
13 200m 1.39 0.26
14 200m 1.47 0.15
15 400m 0.12 0.10
16 400m 0.25 0.09
17 400m 0.36 0.12
18 400m 0.42 0.14
19 400m 0.60 0.24
20 400m 0.65 0.28
21 400m 0.74 0.37
22 400m 0.86 0.40
23 400m 0.93 0.35
24 400m 1.06 0.25
25 400m 1.15 0.15
26 400m 1.24 0.18
27 400m 1.37 0.40
28 400m 1.48 0.57

>coplot(log_abundance~light|height)
> model1=gam(log_abundance~s(light)+height+te(light,by=height,k=6))
> plot(model1,trans=function(x)exp(x)/(1+exp(x)),shade=T,pages=1)
> summary (model1)
> glht(model1,linfct=mcp(height="Tukey"))

Error in linfct[[nm]] %*% C :
requires numeric/complex matrix/vector arguments

--
View this message in context: http://r.789695.n4.nabble.com/multiple-comparisons-for-GAMs-tp4638935p4639429.html


Sent from the R help mailing list archive at Nabble.com.

______________________________________________

peter dalgaard

unread,
Aug 7, 2012, 4:20:04 PM8/7/12
to Bird_Girl, r-h...@r-project.org
Thanks for the example. The direct cause of the error is that model.matrix.gam doesn't return the contrasts that you need for the symbolic trickery of mcp(height="Tukey") so the C term in the above is NULL.

You can do it by hand:

> L <- matrix(c(0,1,rep(0,19)),nrow=1)
> summary(glht(model1,linfct=L))

Simultaneous Tests for General Linear Hypotheses

Fit: gam(formula = log_abundance ~ s(light) + height + te(light, by = height,
k = 6))

Linear Hypotheses:
Estimate Std. Error z value Pr(>|z|)
1 == 0 -0.08557 0.01622 -5.275 1.33e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)

Now, I suspect that wasn't actually what you wanted... It sounds rather like you want to test the significance of the te() part. For two levels, I suppose this can be done with

> model2 <- update(model1, ~ s(light) + height)
> anova(model1,model2, test="F")
Analysis of Deviance Table

Model 1: log_abundance ~ s(light) + height + te(light, by = height, k = 6)
Model 2: log_abundance ~ s(light) + height
Resid. Df Resid. Dev Df Deviance F Pr(>F)
1 17.123 0.031427
2 23.624 0.282765 -6.5011 -0.25134 21.064 3.885e-07 ***

(These F tests with smooth terms are a bit elusive. GAM experts please step up and tell me if I'm off my rocker here!)

However, the multiple comparisons come in when you want to do this with pairs of seven height levels, and that's not something that falls within the glht() framework. The best idea that I can come up with is to generate the 21 comparisons "manually" (e.g. by using version of "height" in which pairs of levels have been fused in the te() term) and then apply a generic p-value adjustment to the pairwise comparisons).

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk Priv: PDa...@gmail.com

Reply all
Reply to author
Forward
0 new messages