psi estimates for colext() in yr1 when all detection NA for site


Jamie M. Kass

Aug 18, 2016, 12:43:26 AM
to unmarked
Hello. I am running colext() on a multiyear detection/non-detection dataset with covariates for psi, col, and ext. In an effort to understand better how everything is calculated, I've been examining my outputs, and I noticed something initially strange.

Over the years, the number of sampled sites has increased, so year1 has fewer sites sampled than year2, and so on. Sites in year1 with NA for detection still have a prediction for psi in the output (fm@smoothed). I thought that the value for sites with NA detection might be derived by applying the full model formula (estimated via all the years' data) to the site covariate values and back-transforming, but the model estimates are different. Further, if I run the same data for year1 through occu(), the sites with NA detection are, reasonably, discarded. What is colext() doing differently?

If anyone can help me out, I'd greatly appreciate it. Thanks!

-Jamie

Jim Baldwin

Aug 18, 2016, 5:44:06 PM
to unma...@googlegroups.com
The colext vignette in the unmarked documentation gives hints as to what's going on (about smoothed and projected estimates), but more details (as suggested by that document) are found in

Weir, L., I.J. Fiske, and J.A. Royle. 2009. Trends in anuran occupancy from northeastern
states of the North American Amphibian Monitoring Program. Herpetological
Conservation and Biology 4:389-402.

and found online at


While that document does not explicitly cover site/year combinations with all NA values, I think that it still deals with the question you have.  (I'm not presenting any equations here because it does take more than a few lines to explain it.)

Jim


--
You received this message because you are subscribed to the Google Groups "unmarked" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unmarked+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jamie M. Kass

Aug 29, 2016, 11:58:10 AM
to unmarked
Thanks for your comments, Jim. I read through Weir et al. 2009, and it was illuminating regarding the smoothed and projected estimates for colext(). Although I now have a cursory understanding of these differences, it's still hard to see how they relate to my particular problem. I am still having trouble understanding how I get estimates for psi in the first year for sites that have NA samples for all visits. If the first year's occupancy is indeed calculated just like a single-season occupancy model, whereas following years are calculated using the previous year's occupancy and the colonization and extinction parameters, why would I have values for NA sites in colext() while they are dropped from occu()? Would one occupancy estimate give values for psi for sites with NA samples, while the other would not?

I really appreciate the help! Thanks.

-Jamie

Kery Marc

Aug 29, 2016, 3:24:25 PM
to unma...@googlegroups.com
Dear all,

I wanted to try out these differences in an experimental setting (i.e., with simulated data, where we know truth) and put together the attached code. The upshot is that I don't understand the differences either. I would have thought that the smoothed estimates from colext correspond to what we get from ranef(fm), where fm is a fitted model object from occu, and that the projected estimates from colext correspond to what we get from predict(fm) for a model fit with occu. But this is not the case; indeed, there seem to be substantial differences. As an aside, I assume that the site-specific estimates from predict(fm) or ranef(fm) (for the static model) are shown in the same order as the sites appear in the data file.

In addition, a model fit with occu that had entire sites with missing values but non-missing values of an observational covariate produced an error; see line 412.

Marc






Check some stuff with estimation of occupancy at unsurveyed sites in functions colext and occu.docx

Jim Baldwin

Aug 30, 2016, 1:40:30 AM
to unma...@googlegroups.com
I'm wondering if there might be really two issues:  (1) How are the conditional estimates for sites determined when visit information is missing for one or more years and (2) Can one construct estimates when the visit information is missing?

The second question is a definite "yes" as it is just about predictions based on the estimated coefficients and the observed data.  One can certainly make predictions for sites not visited at all so I'm not seeing a problem with such predictions.  And getting an estimate for a year with no visits doesn't mean that site/year combination provides any information for the estimators.

If a site has all NAs in a single-year dataset fit with occu, then, as you note, that site is not used in the estimation of the coefficients.  But one can still make a prediction for that site.  With colext, only years with visits for a site are used to construct estimates.
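
As a rough illustration of how an unvisited site can still get a prediction, here is a minimal base-R sketch. The coefficient and covariate values are invented for illustration and do not come from any model fitted in this thread; with a real occu fit one would take the estimates from the model object instead.

```r
# Hypothetical logit-scale estimates for occupancy (assumed values, not fitted)
beta <- c(intercept = -0.5, slope = 1.2)

# Covariate values at three sites that were never visited (made up)
x_unvisited <- c(-1, 0, 2)

# Back-transform the linear predictor to the probability scale
psi_hat <- plogis(beta[["intercept"]] + beta[["slope"]] * x_unvisited)
psi_hat
```

The unvisited sites contribute nothing to the coefficient estimates; they only receive a value once the coefficients are known.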

But maybe I've completely missed the point.

If I have time tomorrow, I'll attach an explicit example as to how the conditional estimates are calculated.

Jim



Jim Baldwin

Aug 30, 2016, 5:30:40 PM
to unma...@googlegroups.com
So the "smoothed" site estimates of the probability of presence for each year are conditional probabilities, conditioned on the observed status.  Sites with some years missing can still have these conditional probabilities estimated.  I've attached a PDF document outlining the rationale for when there are just two seasons (and to make it simpler, no covariates).  I've also attached some R code that produces those conditional probabilities "by hand" and shows that they match perfectly with what is produced by slot(fm,"smoothed").  The journal article mentioned earlier has, of course, more details and covers the more general case.

Jim
colext example.r
explanation.pdf
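
The attachments are not reproduced in this archive. As a rough stand-in for the two-season, no-covariate case described above, the following base-R sketch computes the conditional ("smoothed") occupancy probabilities by enumerating the four possible state pairs. It is not the code from colext example.r and not unmarked's internal implementation; obs_lik and smoothed_two_season are hypothetical helper names, and the parameter values are made up.

```r
# Likelihood of one season's visit history given the true state z (0 or 1).
# NA visits are skipped, so an unvisited season contributes a likelihood of 1.
obs_lik <- function(y, z, p) {
  y <- y[!is.na(y)]
  if (z == 1) prod(p^y * (1 - p)^(1 - y)) else as.numeric(all(y == 0))
}

# P(z_t = 1 | all data) for t = 1, 2 at a single site.
smoothed_two_season <- function(y1, y2, psi, gam, eps, p) {
  prior <- c(1 - psi, psi)             # P(z1 = 0), P(z1 = 1)
  trans <- rbind(c(1 - gam, gam),      # transitions from z1 = 0
                 c(eps, 1 - eps))      # transitions from z1 = 1
  l1 <- c(obs_lik(y1, 0, p), obs_lik(y1, 1, p))
  l2 <- c(obs_lik(y2, 0, p), obs_lik(y2, 1, p))
  # joint[i, j] is proportional to P(z1 = i - 1, z2 = j - 1, y1, y2)
  joint <- outer(prior * l1, l2) * trans
  joint <- joint / sum(joint)
  c(year1 = sum(joint[2, ]), year2 = sum(joint[, 2]))
}

na3 <- c(NA, NA, NA)
# Never visited in either year: year 1 is just psi
none <- smoothed_two_season(na3, na3, psi = 0.4, gam = 0.2, eps = 0.3, p = 0.5)
# Unvisited in year 1 but detected in year 2: year 1 is pulled away from psi
late <- smoothed_two_season(na3, c(1, 0, 0), psi = 0.4, gam = 0.2, eps = 0.3, p = 0.5)
none
late
```

Under these assumptions, a site with no visits in year 1 still gets a year-1 value, and that value differs from psi whenever the site's later years carry information. This is one way to see why such sites show a different number than a plain back-transformed psi.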

Jim Baldwin

Aug 31, 2016, 3:02:42 AM
to unma...@googlegroups.com
Marc,

As you state, using ranef(fm2) gives the same conditional probabilities for each site as the "smoothed" estimates from colext, except that the sites with no visits are dropped (those sites provide no information for estimating the parameters, although maybe a future version of occu might want to "skip" those sites rather than dropping them).  If the site has at least one detection, then the conditional probability of presence is 1.  If the site has no detections in v visits, then the conditional probability is psi*(1-p)^v/(1-psi+psi*(1-p)^v).  If the site has no visits, then the conditional probability of presence is psi.

The conditional probabilities of presence for all of the sites (with the covariates not missing) are found and placed into a variable named psi1 as follows:

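# Assumes the first two fitted parameters are the psi intercept and slope.
psi.mle = plogis(fm2@opt$par[1] + fm2@opt$par[2]*data$Xpsi1)
p.mle = backTransform(fm2, "det")@estimate
# This determination of observed status works only because if one visit is
# missing, all visits are missing.
status = pmin(1, rowSums(yy[,1:3]))
# Default case: no detections in 3 visits
psi1 = psi.mle*(1-p.mle)^3/(1-psi.mle+psi.mle*(1-p.mle)^3)
# No visits: the conditional probability is just psi
psi1[is.na(status)] = psi.mle[is.na(status)]
# At least one detection; which() skips the NA entries of status
psi1[which(status == 1)] = 1
psi1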
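
The status calculation above relies on a site's visits being either all present or all missing. A minimal base-R sketch of the same three cases that also tolerates partially missing visit histories might look like the following. Here cond_occ is a hypothetical helper, detection probability is assumed constant across visits, and the data and parameter values are made up for illustration.

```r
# Conditional probability of presence per site, given a detection matrix y
# (sites in rows, visits in columns, NA = visit not made).
cond_occ <- function(y, psi, p) {
  v <- rowSums(!is.na(y))                    # visits actually made
  detected <- rowSums(y, na.rm = TRUE) > 0   # at least one detection
  # With v = 0 this expression reduces to psi, so unvisited sites
  # need no special case.
  out <- psi * (1 - p)^v / (1 - psi + psi * (1 - p)^v)
  out[detected] <- 1
  out
}

y <- rbind(c(1, 0, NA),     # detected once, one visit missing
           c(0, 0, 0),      # three visits, no detections
           c(0, NA, NA),    # one visit, no detection
           c(NA, NA, NA))   # never visited
cond <- cond_occ(y, psi = 0.4, p = 0.5)
cond
```

A site-specific psi (for example, from a covariate model) can be passed as a vector with one element per row of y.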

Jim



Jamie M. Kass

Sep 2, 2016, 3:41:25 PM
to unmarked
Marc and Jim,

Thank you both very much for the detailed answers.

Jim, after going through your code and explanation in the PDF, I have a much better understanding of how NA years are handled in the calculations. To sum it up, these years are not used to model initial occupancy, but are given predicted occupancy values after all other years are considered via the equations you outlined, right?

One last question -- is it always the case that the smoothed estimates (or the output of ranef) will be equal to 1 if at least one of the visits in that year has a detection, regardless of any other non-detections or NAs? For my data, I see that some sites for some years have a smoothed value below 1 even though they have a detection for that year. I might be misunderstanding something here.

-Jamie

Kery Marc

Sep 2, 2016, 4:33:22 PM
to unma...@googlegroups.com
Dear Jamie,

yes, the NA sites are not used to estimate the parameter(s) for initial occupancy, but one may still estimate occupancy for them (given these parameters that are estimated from the other sites).

And yes, sites where a species is detected in a given year will always have a conditional occupancy of 1, regardless of any NA's or zeroes in a given year.

Best regards --- Marc



