Problems with AR1 in simulated spatiotemporal binomial data

296 views
Skip to first unread message

Mark Myer

unread,
Mar 8, 2018, 1:26:24 PM3/8/18
to R-inla discussion group
Hello, 

I'm trying to simulate a binary response in space and time with an AR1 process and some linear predictors, and I'm having an issue where the model predicts the spatial component, zero inflation, and predictors well but does not come close to recovering the AR1 parameter, rho. Even when I use a small degree of zero inflation and a large rho, the INLA estimates rho at near zero. 

Here are my inputs:

AR1 parameter (rho): 0.95
range: ~5000
zero-inflation probability: 0.1

Below are the coefficients for the predictors:
rlag99 : (0, 0.2, 0.4, 0.6), this is a categorical predictor with values of 0, 1, 2, or 3
ppl_FldPer: 0.4
SwgPerc: -0.4
KarstPerc: -0.3
x2010mmha_Huc10: 0.1

Here are my results: 
Call:
c("inla(formula = formula, family = \"zeroinflated.binomial.1\", data = inla.stack.data(stack), ",  "    control.predictor = list(compute = TRUE, A = inla.stack.A(stack), ",  "        link = 1), control.family = list(link = \"logit\", hyper = list(theta = prec.prior)), ",  "    control.inla = list(int.strategy = \"eb\"), control.fixed = list(expand.factor.strategy = \"inla\"))" )

Time used:
 Pre-processing    Running inla Post-processing           Total 
         2.5569        508.6335          6.0287        517.2191 

Fixed effects:
                   mean     sd 0.025quant 0.5quant 0.975quant    mode kld
rlag99rlag990    0.0711 0.0145     0.0427   0.0711     0.0996  0.0711   0
rlag99rlag991    0.2256 0.0218     0.1829   0.2255     0.2684  0.2255   0
rlag99rlag992    0.4147 0.0193     0.3769   0.4147     0.4526  0.4146   0
rlag99rlag993    0.6034 0.0316     0.5417   0.6033     0.6656  0.6031   0
ppl_FldPer       0.3554 0.0117     0.3325   0.3554     0.3785  0.3553   0
SwgPerc         -0.3284 0.0102    -0.3485  -0.3283    -0.3084 -0.3283   0
KarstPerc       -0.2196 0.0195    -0.2580  -0.2196    -0.1813 -0.2196   0
X2010mmha_Huc10  0.0405 0.0206     0.0001   0.0405     0.0809  0.0404   0

Random effects:
Name   Model
 spatial   SPDE2 model 
wk   AR1 model 

Model hyperparameters:
                                                             mean        sd 0.025quant  0.5quant 0.975quant      mode
zero-probability parameter for zero-inflated binomial_1 1.125e-01 2.240e-02     0.0718 1.116e-01  1.593e-01    0.1105
Range for spatial                                       2.025e+04 2.089e+04  3778.8640 1.398e+04  7.463e+04 8018.9230
Stdev for spatial                                       6.308e-01 6.627e-01     0.1085 4.322e-01  2.352e+00    0.2412
Precision for wk                                        1.901e+04 1.856e+04  1374.2573 1.358e+04  6.787e+04 3786.2079
Rho for wk                                              1.780e-02 6.974e-01    -0.9874 3.870e-02  9.873e-01   -0.9977

Expected number of effective parameters(std dev): 11.54(0.00)
Number of equivalent replicates : 5358.54 

Marginal log-Likelihood:  -41259.05 
Posterior marginals for linear predictor and fitted values computed

The estimates for all my inputs are reasonable, except for rho which is far from my input. I can't figure out why!

The following is the code that I am using to simulate the data. My first thought was that I had somehow simulated the AR process incorrectly, but I am following the process outlined in the SPDE tutorial, Section 5.1.1 with the addition of a zero-inflated binomial response rather than a continuous one. 

library
(readr)
library
(INLA)
library
(rgeos)
library
(rgdal)
library
(maptools)
library
(splancs)
library
(dplyr)
library
(lubridate)


#Import the dataset
setwd
("")
MergedPRvars <- read_csv("")


#Import the workspace image with the mesh and locations already made
load
("")

k
<- 847 #set the number of time points in weeks
coords
<- as.matrix(loc@coords) #Get the 73 location coordinates


#Make k independent realizations of the spatial mesh for each coordinate
rspde
<- function(coords, kappa, variance=1, alpha=2, n=1, mesh,
                  verbose
=FALSE, seed, return.attributes=FALSE) {
  t0
<- Sys.time()
  theta
<- c(-0.5*log(4*pi*variance*kappa^2), log(kappa))
 
if (verbose) cat('theta =', theta, '\n')
 
if (missing(mesh)) {
    mesh
.pars <- c(0.5, 1, 0.1, 0.5, 1)*sqrt(alpha-ncol(coords)/2)/kappa
   
if (verbose) cat('mesh.pars =', mesh.pars, '\n')
    attributes
<- list(
      mesh
=inla.mesh.2d(,
                        coords
[chull(coords), ], max.edge=mesh.pars[1:2],
                        cutoff
=mesh.pars[3], offset=mesh.pars[4:5]))
   
if (verbose) cat('n.mesh =', attributes$mesh$n, '\n')
 
}
 
else attributes <- list(mesh=mesh)
  attributes$spde
<- inla.spde2.matern(attributes$mesh, alpha=alpha)
  attributes$Q
<- inla.spde2.precision(attributes$spde, theta=theta)
  attributes$A
<- inla.mesh.project(mesh=attributes$mesh, loc=coords)$A
 
if (n==1)
    result
<- drop(attributes$A%*%inla.qsample(
      Q
=attributes$Q,
      constr
=attributes$spde$f$extraconstr))
  t1
<- Sys.time()
  result
<- inla.qsample(n, attributes$Q,
                         seed
=ifelse(missing(seed), 0, seed),
                         constr
=attributes$spde$f$extraconstr)
 
if (nrow(result)<nrow(attributes$A)) {
    result
<- rbind(result, matrix(
      NA
, nrow(attributes$A)-nrow(result), ncol(result)))
    dimnames
(result)[[1]] <- paste('x', 1:nrow(result), sep='')
   
for (j in 1:ncol(result))
      result
[, j] <- drop(attributes$A%*%
                            result
[1:ncol(attributes$A),j])
 
}
 
else {
   
for (j in 1:ncol(result))
      result
[1:nrow(attributes$A), j] <-
        drop
(attributes$A%*%result[,j])
    result
<- result[1:nrow(attributes$A), ]
 
}
  t2
<- Sys.time()
  attributes$cpu
<- c(prep=t1-t0, sample=t2-t1, total=t2-t0)
 
if (return.attributes)
    attributes
(result) <- c(attributes(result), attributes)
 
return(drop(result))
}


params <- c(variance=1, kappa=sqrt(8*1)/5000) #kappa = sqrt(8*nu)/range, where range = 5 km in this case and nu is 1 since it's alpha/dimensions of coords, which are both 2


x
.k <- rspde(coords, kappa=params[2], variance=params[1], n=k,
             mesh
=mesh, return.attributes=TRUE)
dim
(x.k)


#Define autoregressive parameter
rho
<- 0.95


#Get correlated sample over time
x
<- x.k
for (j in 2:k)
  x
[,j] <- rho*x[,j-1] + sqrt(1-rho^2)*x.k[,j]


n
<- nrow(coords) #number of observations sites


#Get the covariates
vars
<- data.frame(
rlag99
= as.factor(MergedPRvars$rlag99),
ppl_FldPer
= (MergedPRvars$ppl_FldPer),
SwgPerc = (MergedPRvars$SwgPerc),
KarstPerc = (MergedPRvars$KarstPerc),
X2010mmha_Huc10
= (MergedPRvars$X2010mmha_Huc10), #variable names can't start with a number so add an X
wk
= rep(1:k, times=n))


vars
[,2:5] <- scale(vars[,2:5])


#Set beta coefficient values for each of the predictors
b1
= c(0,0.2, 0.4, 0.6) #Since this is a factor, for simplicity we will multiply and assume that each add'l rain event is equal in magnitude, not a realistic assumption!
b2
= 0.4
b3
= -0.4
b4
= -0.3
b5
= 0.1


sd
.y <-0.1 #set a small standard deviation for each observation, random chance
y
<- b1[unclass(vars$rlag99)] + b2*vars$ppl_FldPer + b3*vars$SwgPerc + b4*vars$KarstPerc + b5*vars$X2010mmha_Huc10 + x + rnorm(n*k, 0, sd.y)
#Change the response to a binomial rather than in logits, using ifelse() to set zero inflation to 0.7
y
[] <- apply(y, c(1,2), FUN= function(x) ifelse(rbinom(n=1, size=1, prob = 0.1) > 0, 0, rbinom(n=1, prob=exp(x)/(1+exp(x)), size = 1)))


#####Create the INLA SPDE and run the model
min
.width <- min(c(diff(range(mesh$loc[, 1])), diff(range(mesh$loc[, 2])))) #Minimum width of the study area in km


spde
<- inla.spde2.pcmatern(
  mesh
=mesh, alpha=2, ### mesh and smoothness parameter
  prior
.range=c(min.width/2, 0.5), ### P(range<(half the minimum width of study area in km)=0.5
  prior
.sigma=c(1, 0.5) ### P(sigma>1)=0.5, note that we know there is no spatial effect present
)




#Create the projector matrix without week grouping
obs
.locations <- as.matrix(rep(loc@coords[,1], each=847),rep(loc@coords[,2], each=847))
A
<- inla.spde.make.A(mesh, loc=obs.locations)


#Create the stack
stack
<- inla.stack(data=list(y=as.vector(y)), #This takes the responses from the matrix y defined in the simulation earlier
                    A
=list(A,1),
                    effect
=list(list(spatial = 1:spde$n.spde),data.frame(vars)))


#Define prior for AR1
h
.spec <- list(theta=list(prior='pccor1', param=c(0.5, 0.5))) #P(cor > 0.5) = 0.5


#Define the formula
formula
<- y ~ 0 + rlag99 + ppl_FldPer + SwgPerc + KarstPerc + X2010mmha_Huc10 +
 f
(spatial, model=spde) +
 f
(wk, model = 'ar1', hyper=h.spec)




#Define prior for precision
prec
.prior <- list(prior='pc.prec', param=c(1, 0.01))


result
<- inla(formula,
              family
="zeroinflated.binomial.1", #We use Type 1 because we have "true" zero values for Y
              data
=inla.stack.data(stack),
              control
.predictor=list(compute=TRUE, A=inla.stack.A(stack), link = 1), #link = 1 scales the fitted values with the logit link
              control
.family=list(link='logit',hyper=list(theta=prec.prior)),
              control
.inla=list(int.strategy='eb'),
              control
.fixed=list(expand.factor.strategy ='inla'))


#Improve estimation of hyperparameters by rerunning the model, using the previous run's estimates as the starting point
#inla.rerun(result)
summary
(result)


Any insight would be greatly appreciated. Thank you very much. 

Sincerely, 
Mark Myer

Helpdesk

unread,
Mar 8, 2018, 2:14:41 PM3/8/18
to Mark Myer, R-inla discussion group
Hi

the first I would do is to replace binary data with a N(, prec=1) data
[fixed precision], just to check that the code/model was correct. If
so, so you have the same problem then?

you might want to concentrate the prior around rho=1
[inla.doc("cor1")], in case you think of this as a flxexible extention
of a constant-in-time effect. if you concentrate this around rho=0
[inla.doc("cor0")], its like a dependent `random'-effect.

H
> --
> You received this message because you are subscribed to the Google
> Groups "R-inla discussion group" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to r-inla-discussion...@googlegroups.com
> .
> To post to this group, send email to r-inla-discussion-group@googlegr
> oups.com.
> Visit this group at https://groups.google.com/group/r-inla-discussion
> -group.
> For more options, visit https://groups.google.com/d/optout.

--
Håvard Rue
Helpdesk
he...@r-inla.org

Mark Myer

unread,
Mar 8, 2018, 2:43:16 PM3/8/18
to R-inla discussion group
Rerunning the same code with fixed-precision Gaussian data like you said produced the same effect, so I investigated further. 

It turns out that the covariates I was passing were structured as the transpose of the simulated data, because I read the output of rspde() wrong!

Simply adding x <- t(x) fixed the problem. What a silly mistake. Thank you for your help!

---Mark
Reply all
Reply to author
Forward
0 new messages