Hi all,
This Group has been a huge help to me as I’ve attempted to learn unmarked—thanks to everyone who devotes time to helping folks like me. Here’s the problem that finally spurred me to post a question:
I have distance sampling data for breeding birds from 140 point count stations that were surveyed four times during the breeding season. Temporary emigration models (Chandler et al. 2011) seemed ideal for these data, but after a lot of attempts, I’ve decided to not model availability. I have a good idea of true density from intensive territory mapping for one species of interest, but model estimates for that species are more than twice the “true” density. The reason for this, I believe, is that replicate counts in this system (desert grasslands) vary widely in detections, which results in low estimates of phi (mean <0.5) despite any survey-level (“yearly site-level” in gdistsamp) covariates that I include.
Instead, my approach until recently was to sum the detections from all surveys for each distance class and include an offset of the number of surveys in distsamp models. Although I’ve never been confident about this approach, the estimates seem reasonable, and I have good model fit for most species. After reading pertinent chapters in the new Kery and Royle and playing with the yellow wagtail data, I decided to try stacking my data and using gdistsamp on unmarked frames with no yearlySiteCovs. This is appealing because it would allow me to try negative binomial mixtures and model survey-level covariates on the detection function. I suspected that if I fit two models with the same abundance and detection covariates: 1) a distsamp model with summed data and an offset, and 2) a gdistsamp model with stacked data and no yearlySiteCovs, estimates would be very similar. Sure enough, these two models returned identical parameter estimates and SEs. The problem arises, however, when I run GOF tests on these two models. Fit statistics for the distsamp model look good, but for the gdistsamp model, results are abysmal.
So this brings me to two questions:
1) Are there any obvious problems with my approach that are glaringly obvious?
2) If not, why are the fit statistics from two models with identical parameter estimates and SEs so different?
Hopefully, I’m just being dense and missing something obvious, but figuring it out seems to be beyond me at this point. Below are some simplified code and outputs that (maybe) make this more clear. I tried to alter the wagtail data a bit to reproduce my problem, but it seems that I can’t get a Poisson mixture with only site-level covariates (which I need to demonstrate my distsamp/offset approach) to return decent fit statistics.
Any help would be highly appreciated!
> # distance data from four surveys, binned into 8 distance classes (22m each)
> yDat2 <- formatDistData(ptct, distCol="meters", transectNameCol="plot", dist.breaks=seq(0,176,22), occasionCol="ct.num")
> str(yDat2)
int [1:140, 1:32] 0 0 0 0 0 0 0 0 0 0 ... - attr(*, "dimnames")=List of 2 ..$ : chr [1:140] "aw1" "aw2" "aw4" "aw400" ... ..$ : chr [1:32] "[0,22]1" "(22,44]1" "(44,66]1" "(66,88]1" ...>
> # sum distance data from each survey by distance classes (sum of all detections in each class )
> yDat <- formatDistData(ptct, distCol="meters", transectNameCol="plot", dist.breaks=seq(0,176,22))
> head(yDat)
[0,22] (22,44] (44,66] (66,88] (88,110] (110,132] (132,154] (154,176]aw1 0 2 1 1 1 1 2 0aw2 0 1 3 0 3 1 1 0aw4 1 1 3 2 1 1 1 0aw400 0 0 3 3 3 2 1 0aw402 0 0 0 0 0 1 1 1aw403 0 0 1 1 1 0 1 0>
> # create umf with three site covariates: 2 continuous (scaled) and 1 to be used as an offset (number surveys = 4; not scaled)
> umf.offset <- unmarkedFrameDS(y=as.matrix(yDat),survey="point",dist.breaks=seq(0,176,22),unitsIn="m",
+ siteCovs=data.frame(sc.scaled$cv.tr.mean,sc.scaled$gr.nat.nn.RA.dvrsty.PC1,sc.scaled$surveys))
> str(umf.offset)
Formal class 'unmarkedFrameDS' [package "unmarked"] with 9 slots ..@ dist.breaks: num [1:9] 0 22 44 66 88 110 132 154 176 ..@ tlength : num [1:140] NA NA NA NA NA NA NA NA NA NA ... ..@ survey : chr "point" ..@ unitsIn : chr "m" ..@ y : int [1:140, 1:8] 0 0 1 0 0 0 0 0 0 1 ... .. ..- attr(*, "dimnames")=List of 2 .. .. ..$ : chr [1:140] "aw1" "aw2" "aw4" "aw400" ... .. .. ..$ : chr [1:8] "[0,22]" "(22,44]" "(44,66]" "(66,88]" ... ..@ obsCovs : NULL ..@ siteCovs :'data.frame': 140 obs. of 3 variables: .. ..$ sc.scaled.cv.tr.mean : num [1:140] -0.85 -0.85 -0.85 -0.56 -0.353 ... .. ..$ sc.scaled.gr.nat.nn.RA.dvrsty.PC1: num [1:140] -0.116 -1.122 -1.321 -0.238 0.938 ... .. ..$ sc.scaled.surveys : num [1:140] 4 4 4 4 4 4 4 4 4 4 ... ..@ mapInfo : NULL ..@ obsToY : num [1, 1:8] 1 1 1 1 1 1 1 1>
> # stack data
> ystacked<- rbind(yDat2[,1:8], yDat2[,9:16], yDat2[,17:24], yDat2[,25:32])
> str(ystacked)
int [1:560, 1:8] 0 0 0 0 0 0 0 0 0 0 ... - attr(*, "dimnames")=List of 2 ..$ : chr [1:560] "aw1" "aw2" "aw4" "aw400" ... ..$ : chr [1:8] "[0,22]1" "(22,44]1" "(44,66]1" "(66,88]1" ...>
> # umf with stacked data, no yearly site covariates, site covariates repeated 4 times
> cv.tr.mean <- rep(sc.scaled$cv.tr.mean,4)
> gr.nat.nn.RA.dvrsty.PC1 <- rep(sc.scaled$gr.nat.nn.RA.dvrsty.PC1,4)
> umf.stacked <- unmarkedFrameGDS(y = ystacked, survey="point", unitsIn="m",
+ dist.breaks=seq(0,176,22), numPrimary=1,
+ siteCovs=data.frame(cv.tr.mean, gr.nat.nn.RA.dvrsty.PC1))
> str(umf.stacked)
Formal class 'unmarkedFrameGDS' [package "unmarked"] with 11 slots ..@ dist.breaks : num [1:9] 0 22 44 66 88 110 132 154 176 ..@ tlength : num [1:560] 1 1 1 1 1 1 1 1 1 1 ... ..@ survey : chr "point" ..@ unitsIn : chr "m" ..@ numPrimary : num 1 ..@ yearlySiteCovs: NULL ..@ y : int [1:560, 1:8] 0 0 0 0 0 0 0 0 0 0 ... .. ..- attr(*, "dimnames")=List of 2 .. .. ..$ : chr [1:560] "aw1" "aw2" "aw4" "aw400" ... .. .. ..$ : chr [1:8] "[0,22]1" "(22,44]1" "(44,66]1" "(66,88]1" ... ..@ obsCovs : NULL ..@ siteCovs :'data.frame': 560 obs. of 2 variables: .. ..$ cv.tr.mean : num [1:560] -0.85 -0.85 -0.85 -0.56 -0.353 ... .. ..$ gr.nat.nn.RA.dvrsty.PC1: num [1:560] -0.116 -1.122 -1.321 -0.238 0.938 ... ..@ mapInfo : NULL ..@ obsToY : num [1, 1:8] 1 1 1 1 1 1 1 1 > ### fit distsamp model with offset
> (distMod <- distsamp(~cv.tr.mean
+ ~ offset(log(surveys))+cv.tr.mean+I(cv.tr.mean^2)+gr.nat.nn.RA.dvrsty.PC1,
+ umf, keyfun = "hazard", output = "density",unitsOut = "ha"))
Call:distsamp(formula = ~cv.tr.mean ~ offset(log(surveys)) + cv.tr.mean + I(cv.tr.mean^2) + gr.nat.nn.RA.dvrsty.PC1, data = umf, keyfun = "hazard", output = "density", unitsOut = "ha") Density: Estimate SE z P(>|z|)(Intercept) -0.5934 0.0543 -10.92 8.83e-28cv.tr.mean 0.3168 0.0509 6.22 4.93e-10I(cv.tr.mean^2) -0.0631 0.0284 -2.22 2.64e-02gr.nat.nn.RA.dvrsty.PC1 0.0754 0.0268 2.82 4.87e-03 Detection: Estimate SE z P(>|z|)shape(Intercept) 4.625 0.0363 127.46 0.00e+00shapecv.tr.mean -0.156 0.0232 -6.71 1.97e-11 Hazard-rate(scale): Estimate SE z P(>|z|) 1.24 0.08 15.6 1.5e-54 AIC: 3049.603 >
> ### fit gistsamp model with same covariates (no offset, no phi)
> (gdistMod <- gdistsamp(~ cv.tr.mean+I(cv.tr.mean^2)+gr.nat.nn.RA.dvrsty.PC1,
+ ~ 1,
+ ~ cv.tr.mean,
+ data = umf.stacked, keyfun = "hazard", output = "density",unitsOut = "ha",
+ mixture = "P", K = 100, se = TRUE))
Call:gdistsamp(lambdaformula = ~cv.tr.mean + I(cv.tr.mean^2) + gr.nat.nn.RA.dvrsty.PC1, phiformula = ~1, pformula = ~cv.tr.mean, data = umf.stacked, keyfun = "hazard", output = "density", unitsOut = "ha", mixture = "P", K = 100, se = TRUE) Abundance: Estimate SE z P(>|z|)(Intercept) -0.5934 0.0543 -10.92 8.78e-28cv.tr.mean 0.3169 0.0509 6.22 4.90e-10I(cv.tr.mean^2) -0.0631 0.0284 -2.22 2.64e-02gr.nat.nn.RA.dvrsty.PC1 0.0754 0.0268 2.82 4.87e-03 Detection: Estimate SE z P(>|z|)(Intercept) 4.625 0.0363 127.47 0.00e+00cv.tr.mean -0.156 0.0232 -6.71 1.95e-11 Hazard-rate(scale): Estimate SE z P(>|z|) 1.24 0.08 15.6 1.5e-54 AIC: 5750.533 > ### all coefficients and SEs identical
> ### Goodness of fit: distsamp model
> fitstatsDS <- function(distMod) {
+ observed <- getY(distMod@data)
+ expected <- fitted(distMod)
+ resids <- residuals(distMod)
+ sse <- sum(resids^2)
+ chisq <- sum((observed - expected)^2 / expected)
+ freeTuke <- sum((sqrt(observed) - sqrt(expected))^2)
+ out <- c(SSE=sse, Chisq=chisq, freemanTukey=freeTuke)
+ return(out)
+ }
>
> (pbDS <- parboot(distMod, fitstatsDS, nsim=50, report=10))
Call: parboot(object = distMod, statistic = fitstatsDS, nsim = 50, report = 10) Parametric Bootstrap Statistics: t0 mean(t0 - t_B) StdDev(t0 - t_B) Pr(t_B > t0)SSE 1367 -35.63 76.5 0.686Chisq 1075 -38.94 47.8 0.843freemanTukey 467 1.99 16.2 0.451 t_B quantiles: 0% 2.5% 25% 50% 75% 97.5% 100%SSE 1191 1252 1343 1433 1457 1510 1537Chisq 965 1027 1085 1112 1142 1205 1239freemanTukey 405 437 456 466 474 489 508 t0 = Original statistic compuated from datat_B = Vector of bootstrap samples> ### distsamp with offset: pretty good fit > ### Goodness of fit: gdistsamp model
> fitstatsGDS <- function(gdistMod) {
+ observed <- getY(gdistMod@data)
+ expected <- fitted(gdistMod)
+ resids <- residuals(gdistMod)
+ sse <- sum(resids^2)
+ chisq <- sum((observed - expected)^2 / expected)
+ freeTuke <- sum((sqrt(observed) - sqrt(expected))^2)
+ out <- c(SSE=sse, Chisq=chisq, freemanTukey=freeTuke)
+ return(out)
+ }
>
> (pbGDS <- parboot(gdistMod, fitstatsGDS, nsim=50, report=1))
Call: parboot(object = gdistMod, statistic = fitstatsGDS, nsim = 50, report = 1) Parametric Bootstrap Statistics: t0 mean(t0 - t_B) StdDev(t0 - t_B) Pr(t_B > t0)SSE 1682 291 44.8 0.00Chisq 49155 -2981 1188.3 0.98freemanTukey 1051 175 18.5 0.00 t_B quantiles: 0% 2.5% 25% 50% 75% 97.5% 100%SSE 1293 1298 1356 1400 1423 1467 1471Chisq 49524 50060 51111 52365 52971 54279 55035freemanTukey 839 842 862 876 890 908 923 t0 = Original statistic compuated from datat_B = Vector of bootstrap samples> ### gdistsamp with stacked data: poor fit
--
You received this message because you are subscribed to the Google Groups "unmarked" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unmarked+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
> head(observed <- getY(distMod@data))
[0,22] (22,44] (44,66] (66,88] (88,110] (110,132] (132,154] (154,176] aw1 0 2 1 1 1 1 2 0 aw2 0 1 3 0 3 1 1 0 aw4 1 1 3 2 1 1 1 0 aw400 0 0 3 3 3 2 1 0 aw402 0 0 0 0 0 1 1 1 aw403 0 0 1 1 1 0 1 0
> head(observed <- getY(gdistMod@data)) [0,22]1 (22,44]1 (44,66]1 (66,88]1 (88,110]1 (110,132]1 (132,154]1 (154,176]1 aw1 0 1 1 1 0 0 1 0 aw2 0 0 0 0 0 0 0 0 aw4 0 0 2 1 0 1 1 0 aw400 0 0 2 2 0 1 0 0 aw402 0 0 0 0 0 0 1 0 aw403 0 0 0 0 0 0 0 0 > head(expected <- fitted(distMod)) [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [1,] 0.2430734 0.7292201 1.215255 1.661171 1.793406 1.557631 1.227865 0.9443200 [2,] 0.2253136 0.6759408 1.126464 1.539800 1.662374 1.443825 1.138153 0.8753247 [3,] 0.2219546 0.6658639 1.109671 1.516845 1.637591 1.422301 1.121185 0.8622754 [4,] 0.2709168 0.8127504 1.354182 1.823019 1.879913 1.570436 1.210270 0.9191614 [5,] 0.3199107 0.9597321 1.598557 2.120951 2.112475 1.718878 1.305408 0.9835162 [6,] 0.2522307 0.7566920 1.261037 1.723752 1.860969 1.616312 1.274122 0.9798953 > head(expected <- fitted(gdistMod)) [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [1,] 0.006244105 0.01873232 0.03121765 0.04267284 0.04607126 0.04001549 0.03154427 0.02426006 [2,] 0.005787910 0.01736373 0.02893689 0.03955515 0.04270529 0.03709195 0.02923964 0.02248762 [3,] 0.005701628 0.01710488 0.02850552 0.03896549 0.04206867 0.03653901 0.02880375 0.02215239 [4,] 0.006959499 0.02087850 0.03478717 0.04683155 0.04829447 0.04034491 0.03109246 0.02361382 [5,] 0.008218166 0.02465450 0.04106524 0.05448567 0.05426928 0.04415850 0.03353655 0.02526704 [6,] 0.006479328 0.01943798 0.03239366 0.04428037 0.04780682 0.04152292 0.03273258 0.02517397 > head(resids <- residuals(distMod)) [0,22] (22,44] (44,66] (66,88] (88,110] (110,132] (132,154] (154,176] aw1 -0.2430734 1.2707799 -0.2152549 -0.6611709 -0.7934058 -0.5576310 0.7721352 -0.94431998 aw2 -0.2253136 0.3240592 1.8735357 -1.5398001 1.3376265 -0.4438252 -0.1381528 -0.87532475 aw4 0.7780454 0.3341361 1.8903291 0.4831554 -0.6375907 -0.4223006 -0.1211852 -0.86227535 aw400 -0.2709168 -0.8127504 1.6458184 1.1769806 1.1200875 0.4295644 -0.2102700 -0.91916144 aw402 -0.3199107 -0.9597321 -1.5985565 -2.1209508 -2.1124748 -0.7188783 -0.3054082 0.01648382 aw403 -0.2522307 -0.7566920 -0.2610372 -0.7237522 -0.8609688 -1.6163117 -0.2741222 -0.97989535 > head(resids <- residuals(gdistMod)) [0,22]1 (22,44]1 (44,66]1 (66,88]1 (88,110]1 (110,132]1 (132,154]1 (154,176]1 aw1 -0.006244105 0.98126768 0.96878235 0.95732716 -0.04607126 -0.04001549 0.96845573 -0.02426006 aw2 -0.005787910 -0.01736373 -0.02893689 -0.03955515 -0.04270529 -0.03709195 -0.02923964 -0.02248762 aw4 -0.005701628 -0.01710488 1.97149448 0.96103451 -0.04206867 0.96346099 0.97119625 -0.02215239 aw400 -0.006959499 -0.02087850 1.96521283 1.95316845 -0.04829447 0.95965509 -0.03109246 -0.02361382 aw402 -0.008218166 -0.02465450 -0.04106524 -0.05448567 -0.05426928 -0.04415850 0.96646345 -0.02526704 aw403 -0.006479328 -0.01943798 -0.03239366 -0.04428037 -0.04780682 -0.04152292 -0.03273258 -0.02517397
> d.off <- distsamp(~1 ~ 1+ offset(log(surveys)), + umf.offset, keyfun = "hazard", output = "density",unitsOut = "ha") > d.no.off <- distsamp(~1 ~ 1, + umf.offset, keyfun = "hazard", output = "density",unitsOut = "ha") > backTransform(d.no.off, type="state") Backtransformed linear combination(s) of Density estimate(s) Estimate SE LinComb (Intercept) 2.18 0.113 0.78 1 Transformation: exp > backTransform(d.off, type="state") Backtransformed linear combination(s) of Density estimate(s) Estimate SE LinComb (Intercept) 0.545 0.0283 -0.606 1 Transformation: exp > mean(predict(d.no.off, type="state")$Predicted) [1] 2.1816 > mean(predict(d.off, type="state")$Predicted) [1] 2.181601 > mean(predict(d.off, type="state",newdata=data.frame(surveys=1))$Predicted) [1] 0.5454003