Coding categorical covariates

1,212 views
Skip to first unread message

Holly Booker

unread,
Feb 19, 2016, 9:39:34 AM2/19/16
to unmarked
Hi everyone, 
I hope someone can help me as I am fairly sure there is a simple solution that I just can't figure out. I have detection histories for 4 camps: horm, zns, man, and km20. I want to include the camps (aka sites in my data frame) as a categorical covariate for occupancy modelling. Is it sufficient to just have them in my sitCovs data frame like this? I have 80 sample sites total. 
site transect quadrat distance aguada
km20 1 1 355 1700
horm 1 2 700 455
man 1 3 1100 800
zns 1 4 1450 1200

Would I then use as.factor to specify that the variable is a factor with 4 levels? 
hab$site<-as.factor(hab$site)

is.factor(hab$site)

hab$site=factor(hab$site,levels=c("horm","zns","man","km20"),ordered=TRUE)


Or do I need to some sort of dummy coding? 

If site is important for influencing occupancy I would like to be able to look at for example, occupancy estimates for a range for aguada at each of the camps (sites).


Thank you very much,

Holly 

Jeffrey Royle

unread,
Feb 19, 2016, 11:21:06 AM2/19/16
to unma...@googlegroups.com
hi Holly,
 If that is what your siteCovs dataframe looks like, what happens if you try to fit an occupancy model using "site" as a covariate on occupancy?  
 regards
andy


--
You received this message because you are subscribed to the Google Groups "unmarked" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unmarked+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Holly Booker

unread,
Feb 19, 2016, 2:05:25 PM2/19/16
to unmarked
Hi Andy,
It appears to work just fine but it was found in all of my top models based on AICc. When I try to use predict(); however, I get psi estimates of 1. For example for mod33 "p(.)psi(site+aguada+preyo)"


This appears to work ok.

newdata<-data.frame(aguada=0, preyo=0,site=factor(c("horm","zns","man","km20"))) 

predict(mod33,newdata=newdata,type="state")


  Predicted         SE       lower     upper

1 0.06354625 0.06229819 0.008644156 0.3455915

2 0.46750745 0.33205091 0.060402270 0.9230209

3 0.25294942 0.27023560 0.020117569 0.8481227

4 0.05039270 0.07766775 0.002199549 0.5609199


When I try to predict for aguada at each site the estimate is 1 for every site. 

newdata<-data.frame(preyo=0,site=factor("zns", levels=c("horm","zns","man","km20")),aguada=seq(0,3500,by=50)) 

Epsi1<-predict(mod33,type="state",newdata=newdata,appendData=TRUE)

Holly Booker

unread,
Feb 22, 2016, 9:34:45 AM2/22/16
to unmarked
Sorry to post again, but is there anyone that can tell me if this is okay for a categorical variable with 4 "levels" or do I need to do some sort of dummy coding? I am not sure why I am getting psi estimates of 1 using predict this way. 
Thank you very much.
Holly

Dan Linden

unread,
Feb 22, 2016, 12:17:30 PM2/22/16
to unmarked
Hi Holly, if you attach a reproducible example (R script and data) we might be able to better help.  What does the output from your model look like?  Did you happen to standardize your covariates?  You shouldn't need to use any dummy coding with unmarked, as factors will automatically be interpreted correctly.

Holly Booker

unread,
Feb 22, 2016, 2:45:51 PM2/22/16
to unmarked
Hi Dan, 
Thank you so much for responding. I think I figured it out as I did standardize the covariates but did not include the standardized limits in the predict script. I also changed "by=50" to "length=20" as I have 20 "zns" sites out of my 80 total. I now have this and the results make sense:


newdata<-data.frame(site=factor("zns", levels=c("horm","zns","man","km20")),aguada=seq(-1.3407662,1.8854890,length=20)) 

Epsi1<-predict(mod11,type="state",newdata=newdata,appendData=TRUE)

Epsi1


Does this make sense to you as a solution? Also, is there a way to view the occupancy estimates for each individual sampling site from a model? I want to use the occupancy estimates from my top model for brocket deer as a covariate for occupancy for my predators. Predict gives me the estimates over a range of data but I would like the estimates for the actual data at each sampling site. Right now I am just using a density index of brocket deer based on the detection histories I have rather than an actual psi estimate. 
Thank you,
Holly

Dan Linden

unread,
Feb 22, 2016, 3:06:29 PM2/22/16
to unmarked
Yes, that makes sense.  To get the latent occurrence states you can use the ranef() function.

Holly Booker

unread,
Feb 22, 2016, 3:17:15 PM2/22/16
to unmarked
Excellent, thank you again Dan.
Holly

Lauren Nicole

unread,
Nov 7, 2017, 3:12:33 PM11/7/17
to unmarked
Holly,

Did you ever learn if there was something wrong with a psi estimate of 1?  I get similar results when I try to predict or backtransform my top models for a psi or p probability estimate.

Thanks,

Lauren


On Friday, February 19, 2016 at 2:05:25 PM UTC-5, Holly Booker wrote:
Reply all
Reply to author
Forward
0 new messages