Problem using predict in occu with categorical raster data

956 views
Skip to first unread message

Amy J

unread,
Jul 14, 2016, 5:49:11 PM7/14/16
to unmarked
Hi unmarked users,

I'm having trouble with the predict function in an occupancy model and I believe it is because of what I found on this post from 2013: (https://groups.google.com/forum/#!searchin/unmarked/predict$20raster$20type$3D"state"/unmarked/QTPK3ujKVEc/iNuwSR2HOZs)
which says that the predict function can't handle raster stacks that include factors. There doesn't seem to be anything more recent so I'm wondering if this has changed?

I tried 'un-rasterizing' my raster stack as suggested by Chris Sutherland in that post but am still getting the same error message
Error: ncol(coefficients) == length(obj@estimates) is not TRUE

Here is some of my code to give you an idea of where I might be going wrong here:
height and NDVI are standardized continuous variables while Cover and Soil are factors with various levels

#best model
fm <- occu(~CL ~height + NDVI + Cover + Soil, data=umf)

#resampling rasters to a common extent and projection
ndvi.raster <- resample(ndvi.raster , s, method='ngb')
height.raster<- resample(height.raster, s, method='ngb')
cover.raster<- resample(cover.raster, s, method='ngb')
soil.raster <- resample(soil.raster , s, method='ngb')

#standardizing continuous rasters using original data
ndvi.s <-(ndvi.raster-NDVI.mean)/NDVI.sd)
height.s <-(height.raster-height.mean)/height.sd)

#stacking all rasters together
ef<-stack(ndvi.s, height.s, cover.raster, soil.raster)
names(ef)<-c("NDVI", "height", "Cover", "Soil")

#taking the anti-logit scale and puts in a binary scale
(beta <- coef(fm, type="state"))

logit.psi <- beta[1] + beta[2]*height.s + beta[3]*ndvi.s

psi <- exp(logit.psi)/(1 + exp(logit.psi))

plot(spplot(psi, col=terrain.colors(100)))
print(spplot(psi, col.regions=terrain.colors(100)))

E.psi <- predict(fm, type="state", newdata=ef)
Error: ncol(coefficients) == length(obj@estimates) is not TRUE

# I tried what Chris Sutherland suggested here and converted my raster stack into a dataframe
efDF<-as.data.frame(values(ef))
E.psi <- predict(fm, type="state", newdata=efDF)
Error: ncol(coefficients) == length(obj@estimates) is not TRUE

#tried something different and got the same error
efDF<-as.data.frame(ef, stringsAsFactors=T)
E.psi <- predict(fm, type="state", newdata=efDF)
Error: ncol(coefficients) == length(obj@estimates) is not TRUE

I appreciate any advice you all may have.  Thanks so much for all of your help.
Amy




Dan Linden

unread,
Jul 14, 2016, 7:50:02 PM7/14/16
to unmarked
Hi Amy,

Are you sure the factors are actually factors?  Also, are all levels of each factor consistent from model fitting to prediction?  Might need to post the model summary and results of str() applied to some of your data objects to help with troubleshooting here.

Amy J

unread,
Jul 15, 2016, 9:28:45 AM7/15/16
to unmarked
Thanks Dan,

I see where the problem might be now thanks to your comment.  When I import my rasters into R they automatically use the VALUE column and are not consistent with the names in the model (ie. 'Forest' in the model = 141, 142 and 143 in the raster).  Do I need to change all of the names in my csv file to the values that match up with the raster values in order for predict to work?

Either way, here is the model summary. 

Occupancy:
                                                Estimate      SE       z  P(>|z|)
(Intercept)                             -3.379   0.780 -4.3305 1.49e-05
height                                   -0.764   0.225 -3.3983 6.78e-04*
CoverForest                         -2.786   0.987 -2.8230 4.76e-03*
CoverGrassland                    0.643   0.696  0.9238 3.56e-01
CoverHigh Development  -14.310 675.974 -0.0212 9.83e-01
CoverLow Development       -1.402   0.859 -1.6332 1.02e-01
CoverOther                          -6.726  45.849 -0.1467 8.83e-01
CoverWater                          -0.954   0.995 -0.9587 3.38e-01
CoverWetland                     -7.094  43.300 -0.1638 8.70e-01
Soil1                                      0.939   0.474  1.9788 4.78e-02*
NDVI                                    -0.111   0.291 -0.3828 7.02e-01

Detection:
                                                      Estimate    SE     z P(>|z|)
(Intercept)                                 -0.793 0.515 -1.54  0.1233
CL                                              1.700 0.724  2.35  0.0188
AIC: 273.802


On Thursday, July 14, 2016 at 7:50:02 PM UTC-4, Dan Linden wrote:

Dan Linden

unread,
Jul 15, 2016, 11:40:27 AM7/15/16
to unmarked
It would probably be easier to just code those name changes in your R script to make sure things line up.

Also, your modeling would benefit from grouping some land cover categories.  The SEs on some of those coefficients are huge, likely because those cover categories are rare among your sampled sites, and/or the sites in those categories had no observations >0.  Unfortunately, these models have trouble when the occupancy probability is essentially zero for a given factor level.

Amy J

unread,
Jul 15, 2016, 12:05:41 PM7/15/16
to unmarked
Thanks for your advice, Dan.  I will work on your recommendations and see if I can improve the model and hopefully get the predict function to work.
Amy

Amy J

unread,
Jul 20, 2016, 4:38:54 PM7/20/16
to unmarked
Using Dan's recommendations, I reduced the land cover categories to 4 which helped the model a lot.  I also recoded the data so that each class is a value (ie. Grassland = 4, Forest = 3) in both the raster and the model so that they would match up for the prediction.
Unfortunately I'm still getting the same error message when I use the predict function. 
Error: ncol(coefficients) == length(obj@estimates) is not TRUE
However, if I run a model with no categorical variables (without Cover or Soil), the predict function works fine. If anyone has any experience with this issue, I'd love some help on this.

Here is some sample code:
#Load csv
data <- read.csv("data.csv")

#makes sure that categorical data are recognized as a factors
data$Cover<-as.factor(data$Cover)
data$Cover=factor(data$Cover,levels=c("1","2","3","4"))
data$GEOL<-as.factor(data$GEOL)
data$GEOL=factor(data$GEOL,levels=c("1","2","3","4"))

##save the original mean and sd of each continuous variable to scale the rasters later
distfor.mean <- mean(data$distfor)
distfor.sd <- sd(data$distfor)
dist_occ.mean<- mean(data$dist_occ)
dist_occ.sd  <- sd(data$dist_occ)

#create the unmarked dataframe
umf <-unmarkedFrameOccu(
  y=CH,
  siteCovs=data[,c("Cover","GEOL","distfor", "dist_occ")],
  obsCovs=list(NM=NM, CL=CL)
)

#standardize numerical covariates to optimize model fit during MLE
sco <- scale(obsCovs(umf))
oc <-(siteCovs(umf))
scs <- cbind(oc[,1:2],scale(oc[,3:4])) #cannot scale categorical covariates 1:2

obsCovs(umf) <- sco
siteCovs(umf) <- scs

#best model
best.model <-occu(~CL ~dist_occ+Cover, data=umf)

(best.model)
#Occupancy:
#  Estimate    SE     z  P(>|z|)
#(Intercept)   -0.373 0.229  -1.63 1.04e-01
#dist_occ      -3.307 0.312 -10.59 3.31e-26*
#Cover2        -0.383 0.246  -1.56 1.20e-01
#Cover3        -1.012 0.295  -3.43 6.08e-04*
#Cover4         1.523 0.556   2.74 6.17e-03*
#
#Detection:
#  Estimate     SE     z  P(>|z|)
#(Intercept)   -0.309 0.0775 -3.98 6.84e-05
#CL             0.179 0.0631  2.84 4.54e-03*
#
#AIC: 3137.741

#bring in rasters
(cover.raster<-raster("rasters/cover.bil"))
(occu.raster<-raster("rasters/occu.bil"))

#make sure categorical rasters are recognized as factors
cover.raster<-as.factor(cover.raster)

#use the original means and SDs of continous covariates to scale the rasters and join them in a rasterstack
(occu.s <-(occu.raster-dist_occ.mean)/dist_occ.sd)

#stack the raster in prep for predicting occupancy
ef<-stack(cover.raster, occu.s)
names(ef)<-c("Cover", "dist_occ")

E.psi <- predict(best.model, type="state", newdata=ef)
#Error: ncol(coefficients) == length(obj@estimates) is not TRUE

Dan Linden

unread,
Jul 20, 2016, 11:08:43 PM7/20/16
to unmarked
Hi Amy, can you report back the levels for the cover raster?

levels(cover.raster)

Amy J

unread,
Jul 21, 2016, 8:18:45 AM7/21/16
to unmarked
Hi Dan, here are the levels:

> levels(cover.raster) [[1]] ID 1 1 2 2 3 3 4 4

Dan Linden

unread,
Jul 21, 2016, 9:48:18 AM7/21/16
to unmarked
Okay, I think the problem relates to how a raster stores factors and values.  And it might be that the raster predict function doesn't work well with model objects, even though it should.  The fix is to combine the rasters into a dataframe (remove the NAs), use the dataframe for prediction, and then create a new raster from the predictions and the original coordinates.  Try the following before you had stacked the rasters:

ef <- data.frame(coordinates(cover.raster),
                 Cover=values(cover.raster),
                 dist_occ=values(occu.s))
ef <- ef[which(!is.na(ef$Cover)),]


E.psi <- predict(best.model, type="state", newdata=ef)

E.psi.r <- rasterFromXYZ(cbind(ef[,c("x","y")],E.psi[,"Predicted"]))



Dan Linden

unread,
Jul 21, 2016, 10:10:15 AM7/21/16
to unmarked
Of course, I forgot the important part which is to make sure Cover is a proper factor, after creating ef:

ef$Cover <- as.factor(ef$Cover)

Amy J

unread,
Jul 21, 2016, 10:44:46 AM7/21/16
to unmarked
Thank you, Dan.  At first I got the same error message but then I added a line where I converted Cover from a number to a factor and it works! Thank you for all of your help! ~Amy

The final code looks like this:
ef <- data.frame(coordinates(cover.raster),
                 Cover=values(cover.raster),
                 dist_occ=values(occu.s))
ef <- ef[which(!is.na(ef$Cover)),]
#make sure cover is recognized as a factor
ef$Cover<-as.factor(ef$Cover)
E.psi <- predict(best.model, type="state", newdata=ef)

E.psi.r <- rasterFromXYZ(cbind(ef[,c("x","y")],E.psi[,"Predicted"]))

Amy J

unread,
Jul 21, 2016, 10:45:37 AM7/21/16
to unmarked
Oops! Just saw this!  Thanks again!
Reply all
Reply to author
Forward
0 new messages