[R] MRF smoothers in MGCV - specifying neighbours

131 views
Skip to first unread message

Mark Payne

unread,
May 7, 2014, 8:58:54 PM5/7/14
to r-h...@r-project.org
Hi,

Does anyone have an example of a Markov Random Field smoother (MRF) in MGCV
where they have specified the neighbourhood directly, rather than supplying
polygons? Does anyone understand how the rules should be? Based on the
columb example, I have setup my data set and neighbourhood like so:

> head(nb.l)
$`10/10`
[1] 135 155 153

$`10/2`
[1] 27 8 6

$`10/3`
[1] 48 7 28 26

$`10/4`
[1] 69 27 49 47

$`10/5`
[1] 48 70 68

$`10/7`
[1] 115 95 93

> head(obs)
x y truth x.idx y.idx xy.idx
24 1.4835147 0.8026673 2.3605204 13 10 13/10
26 1.0452111 0.4673685 1.8316741 11 8 11/8
43 2.1514977 -0.2640058 -2.8812026 17 5 17/5
46 2.8473951 0.5445714 3.6347799 20 9 20/9
53 1.7983253 -0.6905912 -2.5473984 15 3 15/3
86 -0.1839814 -0.7824026 -0.5776616 5 2 5/2
>

but get the following error:

> mdl <- gam(truth ~
s(xy.idx,bs="mrf",xt=list(nb=nb.l)),data=obs,method="REML")
Error in smooth.construct.mrf.smooth.spec(object, dk$data, dk$knots) :
mismatch between nb/polys supplied area names and data area names

However, there is a perfect match between the nb list names and the data
area names:
> all(levels(obs$xy.idx) %in% names(nb.l))
[1] TRUE
>

Any suggestions where to start?

Mark

[[alternative HTML version deleted]]

______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Roger Bivand

unread,
May 8, 2014, 7:10:05 AM5/8/14
to r-h...@stat.math.ethz.ch
Mark Payne <markpayneatwork <at> gmail.com> writes:

>
> Hi,
>
> Does anyone have an example of a Markov Random Field smoother (MRF) in MGCV
> where they have specified the neighbourhood directly, rather than supplying
> polygons? Does anyone understand how the rules should be? Based on the
> columb example, I have setup my data set and neighbourhood like so:
>
> > head(nb.l)
> $`10/10`
> [1] 135 155 153
>
...
>
> > head(obs)
> x y truth x.idx y.idx xy.idx
> 24 1.4835147 0.8026673 2.3605204 13 10 13/10
...
>
> but get the following error:
>
> > mdl <- gam(truth ~
> s(xy.idx,bs="mrf",xt=list(nb=nb.l)),data=obs,method="REML")
> Error in smooth.construct.mrf.smooth.spec(object, dk$data, dk$knots) :
> mismatch between nb/polys supplied area names and data area names

I could reconstruct your problem with:

library(mgcv)
library(spdep)
example(columbus)
nb <- poly2nb(columbus)
names(nb) <- attr(nb, "region.id")
columbus$ID <- names(nb)
bnb <- gam(CRIME ~ s(ID, bs="mrf", xt=list(nb=nb)), data=columbus,
method="REML")

#Error in ExtractData(object, data, knots) :
# 'names' attribute [1] must be the same length as the vector [0]

It appears that the ID variable is happier as integer:

columbus$ID <- as.integer(names(nb))
bnb <- gam(CRIME ~ s(ID, bs="mrf", xt=list(nb=nb)), data=columbus,
method="REML")
summary(bnb)

Using another representation:

nb <- poly2nb(columbus, row.names=columbus$NEIGNO)
names(nb) <- attr(nb, "region.id")
bnb <- gam(CRIME ~ s(NEIGNO, bs="mrf", xt=list(nb=nb)), data=columbus,
method="REML")

also works, but NEIGNO is numeric.

It appears that the ID variable should be numeric (or integer), but that the
names of the list of neighbour vector should be character, but otherwise
match. The call tree is not simple; the error seems to occur in
mgcv:::ExtractData.

Hope this helps,

Roger

Simon Wood

unread,
May 8, 2014, 9:15:02 AM5/8/14
to Mark Payne, r-h...@r-project.org
Hi Mark,

I'm not sure what is happening here - there is no chance that nb.l
contains a neighbourhood not in the levels of obs$xy.idx, I suppose?
i.e. is

all(names(nb.l)%in%levels(obs$xy.idx))

also TRUE? Here is some code illustrating what nb should look like (and
in response to Roger Bivand's suggestion I also tried this replacing all
the labels with things like "x/y", but it still works).


## example mrf fit using polygons....
library(mgcv)
## Load Columbus Ohio crime data (see ?columbus for details and credits)
data(columb) ## data frame
data(columb.polys) ## district shapes list
xt <- list(polys=columb.polys) ## neighbourhood structure info for MRF
par(mfrow=c(2,2))
## First a full rank MRF...
b0 <- gam(crime ~ s(district,bs="mrf",xt=xt),data=columb,method="REML")

## same fit based on direct neighbour spec...
nb <- mgcv:::pol2nb(columb.polys)$nb
xt <- list(nb=nb)
b <- gam(crime ~ s(district,bs="mrf",xt=xt),data=columb,method="REML")

best,
Simon
--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603 http://people.bath.ac.uk/sw283

Mark Payne

unread,
May 8, 2014, 10:17:00 AM5/8/14
to Simon Wood, r-h...@r-project.org
Hi Roger and Simon,

Thanks for the replies. Simon's suggestion of an isolated or missing
neighbourhood doesn't hold either.

I've attached the code below - its my attempt to solve the FELSPLINE
sausage using mrf rather than a soap smoother. Its a bit convoluted, but
should run ok. I thought this would be a good starting example to get a
GMRF running, but then hit the problem mentioned. My attempts to track the
bug suggest that there is something wierd in the knots argument that is
being supplied to smooth.construct.mrf.smooth.spec() - but haven't come so
much further than that.

Code follows.

Mark

#GMRF Example
#Solves the classic FELSPINE problem using
#GMRF in mgcv. This example is a modified version of the
#example from smooth.construct.so.smooth.spec()
#in the mgcv package

rm(list=ls())
library(mgcv)

#Extract boundary
fsb <- fs.boundary()

#Create an underlying grid and evaluate the function on it
#Based on mgcv::fs.boundary() example
dx<-0.2;dy<-0.2 #Grid steps
id.fmt <- "%i/%i"
x.vals <- seq(-1,4,by=dx)
y.vals<-seq(-1,1,by=dy)
grd <- expand.grid(x=x.vals,y=y.vals)
tru.mat <- matrix(fs.test(grd$x,grd$y),length(x.vals),length(y.vals))
grd$truth <- as.vector(tru.mat)
grd$x.idx <- as.numeric(factor(grd$x,x.vals))
grd$y.idx <- as.numeric(factor(grd$y,y.vals))
grd$xy.idx <- sprintf(id.fmt,grd$x.idx,grd$y.idx)
grd <- subset(grd,!is.na(truth))

## Simulate some fitting data, inside boundary...
n.samps<-250
x <- runif(n.samps)*5-1
y <- runif(n.samps)*2-1
obs <- data.frame(x=x,y=y)
obs$truth <- fs.test(obs$x,obs$y,b=1)
obs$z <- obs$truth + rnorm(n.samps)*.3 ## add noise
pt.inside <- inSide(fsb,x=x,y=y) ## remove outsiders

## Associate observation with grid cell
obs$x.rnd <- round(obs$x/dx)*dx
obs$y.rnd <- round(obs$y/dy)*dy
obs$x.idx <- as.numeric(factor(obs$x.rnd,x.vals))
obs$y.idx <- as.numeric(factor(obs$y.rnd,y.vals))
obs$xy.idx <- sprintf(id.fmt,obs$x.idx,obs$y.idx)
obs$xy.idx <- factor(obs$xy.idx,levels=grd$xy.idx)

#Filter observations that are outside or don't have an associated grid cell
obs <- subset(obs,pt.inside & xy.idx %in% grd$xy.idx )

## plot boundary with truth and data locations
par(mfrow=c(1,2))
image(x.vals,y.vals,tru.mat,col=heat.colors(100),xlab="x",ylab="y")
contour(x.vals,y.vals,tru.mat,levels=seq(-5,5,by=.25),add=TRUE)
lines(fsb$x,fsb$y);
points(obs$x,obs$y,pch=3);

#Plot grid
plot(y~x,grd)
lines(fsb$x,fsb$y);

#Setup neighbourhood adjancey
nb <- grd[,c("x.idx","y.idx","xy.idx")]
nb$N <- factor(sprintf(id.fmt,nb$x.idx,nb$y.idx+1),levels=nb$xy.idx)
nb$S <- factor(sprintf(id.fmt,nb$x.idx,nb$y.idx-1),levels=nb$xy.idx)
nb$E <- factor(sprintf(id.fmt,nb$x.idx+1,nb$y.idx),levels=nb$xy.idx)
nb$W <- factor(sprintf(id.fmt,nb$x.idx-1,nb$y.idx),levels=nb$xy.idx)
nb.mat <- sapply(nb[,c("N","S","E","W")],as.numeric)
nb.l <- lapply(split(nb.mat,nb$xy.idx),function(x) x[!is.na(x)])

#Fit MRF gam
mdl <- gam(z ~ s(xy.idx,bs="mrf",xt=list(nb=nb.l)),data=obs,method="REML")

Simon Wood

unread,
May 8, 2014, 11:21:41 AM5/8/14
to Mark Payne, r-h...@r-project.org
Hi Mark,

The problem here is that the constructor expects there to be at least
one observation per location. The nb.l list has neighbourhood
information for 166 locations, while the 'obs' data contains
observations for only 99 of them (unique(obs$xy.idx)).

The solution probably requires more complicated construction of nb.l.
You can't just drop locations from the existing nb.l because that messes
up the internal indexing of nb.l. You could add a dummy observation with
zero weight for each of the extra locations, but I guess that isn't what
you really want to do for this application, as presumably the
neighbourhood structure is not supposed to lead to smoothing across the
gap between the two arms of the sausage...

best,
Simon
> grd <- subset(grd,!is.na <http://is.na>(truth))
> <http://is.na>(x)])
>


> #Fit MRF gam
> mdl <- gam(z ~ s(xy.idx,bs="mrf",xt=list(nb=nb.l)),data=obs,method="REML")
>
>
>
>
>
>
>
>
>
>
> On 8 May 2014 15:15, Simon Wood <s.w...@bath.ac.uk
> <mailto:s.w...@bath.ac.uk>> wrote:
>
> Hi Mark,
>
> I'm not sure what is happening here - there is no chance that nb.l
> contains a neighbourhood not in the levels of obs$xy.idx, I suppose?
> i.e. is
>
> all(names(nb.l)%in%levels(obs$__xy.idx))
>
> also TRUE? Here is some code illustrating what nb should look like
> (and in response to Roger Bivand's suggestion I also tried this
> replacing all the labels with things like "x/y", but it still works).
>
>
> ## example mrf fit using polygons....
> library(mgcv)
> ## Load Columbus Ohio crime data (see ?columbus for details and credits)
> data(columb) ## data frame
> data(columb.polys) ## district shapes list
> xt <- list(polys=columb.polys) ## neighbourhood structure info for MRF
> par(mfrow=c(2,2))
> ## First a full rank MRF...
> b0 <- gam(crime ~
> s(district,bs="mrf",xt=xt),__data=columb,method="REML")
>
> ## same fit based on direct neighbour spec...
> nb <- mgcv:::pol2nb(columb.polys)$nb
> xt <- list(nb=nb)
> b <- gam(crime ~ s(district,bs="mrf",xt=xt),__data=columb,method="REML")
> 53 1.7983253 <tel:53%20%201.7983253> -0.6905912 -2.5473984 15
> <tel:2.5473984%20%20%20%2015> 3 15/3
> 86 -0.1839814 -0.7824026 -0.5776616 5 2 5/2
>
>
>
> but get the following error:
>
> mdl <- gam(truth ~
>
> s(xy.idx,bs="mrf",xt=list(nb=__nb.l)),data=obs,method="REML")
> Error in smooth.construct.mrf.smooth.__spec(object, dk$data,
> dk$knots) :
> mismatch between nb/polys supplied area names and data area
> names
>
> However, there is a perfect match between the nb list names and
> the data
> area names:
>
> all(levels(obs$xy.idx) %in% names(nb.l))
>
> [1] TRUE
>
>
>
> Any suggestions where to start?
>
> Mark
>
> [[alternative HTML version deleted]]
>
> ________________________________________________
> R-h...@r-project.org <mailto:R-h...@r-project.org> mailing list
> https://stat.ethz.ch/mailman/__listinfo/r-help
> <https://stat.ethz.ch/mailman/listinfo/r-help>
> PLEASE do read the posting guide
> http://www.R-project.org/__posting-guide.html
> <http://www.R-project.org/posting-guide.html>
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
> +44 (0)1225 386603 <tel:%2B44%20%280%291225%20386603>
> http://people.bath.ac.uk/sw283
>
>


--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603 http://people.bath.ac.uk/sw283

Reply all
Reply to author
Forward
0 new messages