Re: [R] package mgcv - predict with bam: Error in X[ind, ] : subscript out of bounds

105 views
Skip to first unread message

Katharina May

unread,
Feb 2, 2014, 12:52:03 PM2/2/14
to r-h...@r-project.org
Hi Simon,

thank you for your reply, I really appreciate any help to understand
the problem here...
Unluckily the package upgrade didn't help with this issue.
An example reproducing the error, and a current sessionInfo() Output
can be found below.

Many thanks once again,

Katharina


R Code Example
<snip>
library(RCurl)
library(mgcv)
#retrieve xylemRohWeekXnn2011 test data frame
eval( expr = parse( text =
getURL("https://webdisk.ads.mwn.de/Handlers/AnonymousDownload.ashx?folder=1a7cbaa4&path=xylemRohWeekXnn2011.R")
))

xylemRohWeekXnn.fit.bam <- bam(sensor1 ~ sensor2 + s(site, bs="re")
+ s(site, NthSampling, bs="re") , data=xylemRohWeekXnn2011,
na.action=na.omit)

#subset data containing gaps for predicting
gapData <- xylemRohWeekXnn2011[is.na(xylemRohWeekXnn2011[,2]) &
!is.na(xylemRohWeekXnn2011[,11]),c(2:3,6:7, 11)]

xylemRohWeekXnnSite.fit <-
predict.gam(xylemRohWeekXnn.fit.bam,gapData, type="response", se=F)
</snap>



My current Session Information (sessionInfo() Output - also confirming
that the problem exists on both Windows and Mac OS X):
<snip>
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] mgcv_1.7-28 nlme_3.1-113 RCurl_1.95-4.1 bitops_1.0-6

loaded via a namespace (and not attached):
[1] grid_3.0.2 lattice_0.20-24 Matrix_1.1-2 tools_3.0.2
</snap>




On 31/01/14 12:57, Simon Wood wrote:
>
>Hi Katharina,
>
>Could you try upgrading to mgcv_1.7-28, please? There was an occasional
>problem to do with matching factor levels, which is fixed, but I'm not
>very confident that is what is going on.
>
>If upgrading doesn't work, is there any chance you could send me a small
>example dataset and code that produces the error, and I'll look at it?
>
>best,
>Simon
>
>--
>Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
>+44 (0)1225 386603 http://people.bath.ac.uk/sw283

______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius

unread,
Feb 2, 2014, 5:07:54 PM2/2/14
to Katharina May, r-h...@r-project.org

On Feb 2, 2014, at 9:52 AM, Katharina May wrote:

> Hi Simon,
>
> thank you for your reply, I really appreciate any help to understand
> the problem here...
> Unluckily the package upgrade didn't help with this issue.
> An example reproducing the error, and a current sessionInfo() Output
> can be found below.

I suppose there may be an error of sorts, but have you considered the fact that solving the error might not gain you admittance into heaven? Look at the RHS of the model:

sensor2 + s(site, bs = "re")

... and think about the fact that you are "smoothing" a factor variable.

> str(gapData)
'data.frame': 2304 obs. of 5 variables:
$ sensor1 : num NA NA NA NA NA NA NA NA NA NA ...
$ site : Factor w/ 9 levels "KRB","NP.FOR",..: 3 3 3 3 3 3 3 3 3 3 ...
$ NthSampling: int 7489 7490 7491 7492 7493 7494 7495 7496 7497 7498 ...
$ YDay : num 53 53 53 53 53 53 53 53 53 53 ...
$ sensor2 : num 0.567 0.566 0.567 0.567 0.569 ...

I'm having trouble making any sense of how that might work. It is, of course, possible to just do this:

xylemRohWeekXnnSite.fit <-
predict.gam(xylemRohWeekXnn.fit.bam, type="response", se=F)

That gives predictions for the original dataset.

But I think the error might be helpful in alerting one to the problems with the model.

--
David.
David Winsemius
Alameda, CA, USA

Simon Wood

unread,
Feb 3, 2014, 5:19:21 AM2/3/14
to David Winsemius, Katharina May, r-h...@r-project.org

> I suppose there may be an error of sorts, but have you considered
> the fact that solving the error might not gain you admittance into
> heaven? Look at the RHS of the model:
>
> sensor2 + s(site, bs = "re")
>
> ... and think about the fact that you are "smoothing" a factor
> variable.
- Actually this is ok. mgcv exploits the duality between quadratically
penalized smooths and Gaussian random effects to allow random effects to
be specified this way. bs="re" specifies a Gaussian random effect with
corresponding model matrix given by model.matrix(~site-1). (More
generally s(x,y,z,bs="re") specifies a gaussian random effect with model
matrix given by model.matrix(~x:y:z-1), with obvious generalization to
more or fewer variables). See mgcv help file ?random.effects for more.

best,
Simon



>> str(gapData)
> 'data.frame': 2304 obs. of 5 variables: $ sensor1 : num NA NA NA
> NA NA NA NA NA NA NA ... $ site : Factor w/ 9 levels
> "KRB","NP.FOR",..: 3 3 3 3 3 3 3 3 3 3 ... $ NthSampling: int 7489
> 7490 7491 7492 7493 7494 7495 7496 7497 7498 ... $ YDay : num
> 53 53 53 53 53 53 53 53 53 53 ... $ sensor2 : num 0.567 0.566
> 0.567 0.567 0.569 ...
>
> I'm having trouble making any sense of how that might work. It is, of
> course, possible to just do this:
>
> xylemRohWeekXnnSite.fit <- predict.gam(xylemRohWeekXnn.fit.bam,
> type="response", se=F)
>
> That gives predictions for the original dataset.
>
> But I think the error might be helpful in alerting one to the
> problems with the model.
>


--

Simon Wood

unread,
Feb 3, 2014, 6:42:23 AM2/3/14
to Katharina May, r-h...@r-project.org
Hi Katharina,

Thanks for sending this.

The problem is that the prediction data for site contain levels not
available in the (useable non-NA) fit data...

> levels(m$model$site)
[1] "KRB" "NP.FOR" "WKS.FRE" "WKS.KRE" "WKS.RIE" "WKS.WUE"
> levels(gapData$site)
[1] "KRB" "NP.FOR" "RIE.2" "WKS.BBR" "WKS.FRE" "WKS.HOE" "WKS.KRE"
[8] "WKS.RIE" "WKS.WUE"

predict.lm has a check for this, and so fails with a rather more
informative error message. e.g.

m0 <- lm(sensor1 ~ sensor2 + site + site:NthSampling,
data=xylemRohWeekXnn2011,na.action=na.omit)
predict(m0,gapData)
... factor site has new levels RIE.2, WKS.BBR

I'll add a better check to predict.gam.

best,
Simon

ps. if you want predictions with the random effects for site set to zero
then one trick is to use terms like s(site,bs="re",by=dum) in fitting
with dum set to 1. Then in prediction you can set 'site' to any existing
level, and dum to zero, in order to get a prediction for the missing
level, with the 'site' effect set to zero.

Katharina May

unread,
Feb 3, 2014, 6:59:57 AM2/3/14
to Simon Wood, r-h...@r-project.org
Hi Simon,

many thanks for looking into this and making me understand the problem!
I'll adjust my factor levels right away...

Best, Katharina
--
Katharina May
BSc. Forest Science and Resource Management
IT Specialist (CCI)

Prinz-Ludwig-Str. 7
85354 Freising
Germany

Mobile: +49-176-24031809
www: m3y.de
Reply all
Reply to author
Forward
0 new messages