Hello everyone, I would greatly appreciate any insight into the issues I am having with nmixTTD models, specifically the detection submodel.
I am working with an inherited dataset looking at the impacts of controlled burns on passerine abundance in wet shrublands. Four sites with 32 point count locations per site were surveyed each year (BACI design). Point counts had a 100 meter radius and were (usually) surveyed 4 times each June in the years when surveys took place. I have abundance covariates related to vegetation structure, vegetation composition, time since the most recent burn, burn severity, and the season in which the controlled burn occurred. I also have detection covariates at both the survey and point-count level.
My initial thought on seeing the dataset and considering analytical frameworks was that the closure assumption was often being violated and/or birds were often unavailable for detection (not singing during the survey) because of the short (6 minute) duration of the point counts. In addition to repeated surveys, initial distance to each individual and individual tracking data (the 6 minute search was broken into three 2-minute segments, and every individual was recorded as detected or not detected in each of the 3 segments) were collected. I had worries about different components of each of these three sources of metadata. There were often large differences in counts of a single species between surveys of one point count within a single season. For more abundant species it was not unusual for repeated counts to vary by 3-5 individuals. Under the closure assumption, false absence was common for less abundant species.
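As an aside, that within-season count variation is easy to quantify. This is only a minimal sketch with invented counts and column names (the real data use different names):

```r
# Hypothetical check of the closure problem described above: for each point
# count, the range of repeated within-season counts of one species.
# Data and column names are invented for illustration.
counts <- data.frame(
  point  = rep(1:3, each = 4),
  survey = rep(1:4, times = 3),
  n_alfl = c(0, 3, 1, 4,   2, 3, 2, 2,   0, 0, 4, 0)
)
within_range <- tapply(counts$n_alfl, counts$point, function(x) max(x) - min(x))
within_range   # points 1 and 3 vary by 4 birds across surveys
```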
My first thought for analysis was a method that could account for detection and temporary emigration. Given the availability of tracking data, Chandler et al. 2011 came to mind first. My worry was the accuracy of the tracking data. For most six minute point counts, 8-15 individual birds of 4-8 species were observed. From my point of view (a very experienced field ornithologist who was not present during data collection for this study), accurately tracking that many individuals across 2-minute segments seems very difficult for even the most experienced observer, and the technicians collecting data turned over from year to year in this study.
This led me to Strebel et al. 2021 and TTD N-mixture models. Turning the individual tracking data into species-level time-to-detection data felt like a good way to account for imperfect detection and temporary emigration with a likely more accurate metric.
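For what it's worth, here is a minimal sketch of the conversion I mean, with invented segment data for one species at one point count (the real tracking data are structured differently):

```r
# Hypothetical 2-minute segment tracking data for one species at one point
# count: each row is an individual, each seg column is detected (1) or not (0).
tracks <- data.frame(
  individual = 1:4,
  seg1 = c(0, 0, 0, 0),   # minutes 0-2
  seg2 = c(1, 0, 0, 0),   # minutes 2-4
  seg3 = c(0, 1, 0, 1)    # minutes 4-6
)
Tmax <- 6
# Species-level TTD: the first segment in which ANY individual was detected,
# recorded at the segment endpoint (interval-censored at 2-minute resolution)
first_seg <- which(colSums(tracks[, c("seg1", "seg2", "seg3")]) > 0)[1]
censored  <- is.na(first_seg)              # TRUE if the species was never detected
ttd       <- if (censored) Tmax else first_seg * 2
ttd   # 4 here: the species was first detected in the 2-4 minute segment
```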
Attached is my data and below is an example of my code for one species.
library(unmarked)
BetaBirdsDataV2.4 <- read.csv("D:/Fire/BetaBirdsDataV2.4.csv")
View(BetaBirdsDataV2.4)
N <- 542       # not used below
Tmax <- 6      # point count duration in minutes
nrep <- 4      # surveys per season (not used below)
AbundanceP <- BetaBirdsDataV2.4[, 2:11]   # site-level abundance covariates
# Survey-level detection covariates; TotalStemHeight is site-level, so its
# single column (24) is repeated once per survey
DetectionP <- list(Time = BetaBirdsDataV2.4[, 12:15],
                   Observer = BetaBirdsDataV2.4[, 16:19],
                   DOS = BetaBirdsDataV2.4[, 20:23],
                   TotalStemHeight = BetaBirdsDataV2.4[c(24, 24, 24, 24)])
yALFL <- BetaBirdsDataV2.4[c(26, 28, 30, 32)]   # ALFL time-to-detection columns
zALFL <- BetaBirdsDataV2.4[c(25, 27, 29, 31)]   # read but not used below
umfALFL <- unmarkedFrameOccuTTD(y = yALFL, surveyLength = Tmax,
                                siteCovs = AbundanceP, obsCovs = DetectionP)
After a model selection process, I fit this model for Alder Flycatchers (ALFL).
fitALFL7 <- nmixTTD(stateformula = ~Year + I(Stem.Density^2) + PC3 + Growing.Seasons.Post.Burn + Season.of.Last.Burn, detformula = ~DOS + Time, data = umfALFL, K = 100, mixture = "P", ttdDist = "weibull")
Call:
nmixTTD(stateformula = ~Year + I(Stem.Density^2) + PC3 + Growing.Seasons.Post.Burn +
Season.of.Last.Burn, detformula = ~DOS + Time, data = umfALFL,
K = 100, mixture = "P", ttdDist = "weibull")
Abundance:
Estimate SE z P(>|z|)
(Intercept) 2.0689 0.1335 15.50 3.59e-54
Year2017 -0.2080 0.0753 -2.76 5.73e-03
Year2018 -0.0993 0.0865 -1.15 2.51e-01
Year2019 -0.1329 0.1041 -1.28 2.02e-01
Year2020 -0.4602 0.1077 -4.27 1.94e-05
Year2022 -0.3921 0.0960 -4.09 4.40e-05
I(Stem.Density^2) -0.0461 0.0145 -3.17 1.53e-03
PC3 -0.0659 0.0308 -2.14 3.26e-02
Growing.Seasons.Post.Burn 0.1369 0.0311 4.40 1.10e-05
Season.of.Last.BurnFall 0.3192 0.0889 3.59 3.30e-04
Season.of.Last.BurnSummer 0.1438 0.1046 1.38 1.69e-01
Detection:
Estimate SE z P(>|z|)
(Intercept) -2.852 0.1311 -21.75 6.54e-105
DOS -0.267 0.0239 -11.20 3.98e-29
Time -0.209 0.0227 -9.19 3.75e-20
Weibull shape:
Estimate SE z P(>|z|)
0.198 0.0213 9.3 1.34e-20
AIC: 6697.52
In general, the model selection process I used for all species produced models whose abundance parameters and parameter estimates made sense at first glance and fit my hypotheses and expectations.
Obviously something was going wrong on the detection side of the model. The fact that nmixTTD is not compatible with most unmarked helper and model selection packages made things slow going, and with my limited coding experience it was difficult to examine the model. Using the model to make predictions made me fairly sure that the models I built were operating under the closure assumption and not accounting for temporary emigration, but I cannot make predictions of detectability for any species, which makes me worry that I have also made a simple error.
Predictions of abundance are far higher than the data warrant (the maximum number of Alder Flycatchers recorded at a point is 4 or 5), with large amounts of uncertainty, which makes me think detectability is being estimated as very low; but predictions of detectability return NA.
impnewdataTime <- data.frame(Year = "2019",
                             PC3 = mean(umfALFL@siteCovs$PC3),
                             Stem.Density = mean(umfALFL@siteCovs$Stem.Density),
                             Growing.Seasons.Post.Burn = 1:18,
                             Season.of.Last.Burn = "aSpring",
                             DOS = mean(umfALFL@obsCovs$DOS),
                             Time = mean(umfALFL@obsCovs$Time))
ALFLTimeS <- predict(fitALFL7, type = "state", newdata = impnewdataTime)
ALFLTimeS
Predicted SE lower upper
1 7.947459 1.219470 5.883269 10.73589
2 9.113179 1.488245 6.617026 12.55096
3 10.449884 1.861477 7.370276 14.81628
4 11.982656 2.358235 8.147665 17.62272
5 13.740251 2.999963 8.956710 21.07856
6 15.755647 3.812256 9.805697 25.31594
7 18.066659 4.826148 10.702773 30.49716
8 20.716645 6.079175 11.655730 36.82132
9 23.755328 7.616461 12.672076 44.53221
10 27.239718 9.491946 13.759198 53.92772
11 31.235194 11.769831 14.924517 65.37145
12 35.816718 14.526307 16.175627 79.30681
13 41.070253 17.851583 17.520415 96.27430
14 47.094368 21.852294 18.967159 116.93261
15 54.002089 26.654319 20.524614 142.08431
16 61.923023 32.406097 22.202093 172.70717
17 71.005786 39.282500 24.009542 209.99241
18 81.420793 47.489380 25.957608 255.39123
ALFLTimeD <- predict(fitALFL7, type = "det", newdata = impnewdataTime)
ALFLTimeD
Predicted SE lower upper
1 NA NA NA NA
2 NA NA NA NA
3 NA NA NA NA
4 NA NA NA NA
5 NA NA NA NA
6 NA NA NA NA
7 NA NA NA NA
8 NA NA NA NA
9 NA NA NA NA
10 NA NA NA NA
11 NA NA NA NA
12 NA NA NA NA
13 NA NA NA NA
14 NA NA NA NA
15 NA NA NA NA
16 NA NA NA NA
17 NA NA NA NA
18 NA NA NA NA
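As a workaround while predict(type = "det") returns NA, I tried back-transforming the fitted coefficients by hand. This is only a sketch under my assumptions: that the detection estimates and the reported Weibull shape are on the log scale, and that the survival function is S(t) = exp(-(lambda*t)^k) as in Strebel et al. 2021; I may have the parameterization wrong, so please check against the source.

```r
# Hand calculation of per-individual detection probability within a 6-minute
# count at mean (~0 if scaled) covariate values, using the estimates printed
# above. ASSUMES S(t) = exp(-(lambda*t)^k) with a log link on both lambda
# and the shape k; verify against the unmarked source before trusting this.
Tmax   <- 6
lambda <- exp(-2.852)          # detection intercept, back-transformed
k      <- exp(0.198)           # Weibull shape, assuming log scale
p_ind  <- 1 - exp(-(lambda * Tmax)^k)
p_ind                          # roughly 0.24 under these assumptions
```

If that number is in the right ballpark, it would also reconcile the high abundance predictions: with N of about 8 and a per-individual p of about 0.24, the expected count per survey is only about 2 birds.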
My assumption at this point is that the model is working under the closure assumption and assigning species very low detectability.
I am also worried about how I formatted obsCovs. My first hint that this model was not accounting for temporary emigration was that I could not enter site-level detection covariates as a single column.
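My understanding (worth verifying in the unmarked documentation) is that every obsCov must be an M x J table with one column per survey, so a site-level covariate has to be recycled across the J = 4 columns, which is what my column repetition above does. A base-R sketch with made-up values:

```r
# Recycling a single site-level column into the M x J shape that obsCovs
# expects; the values and covariate name are invented for illustration.
nrep <- 4
site_cov <- data.frame(TotalStemHeight = c(1.2, 0.8, 2.1))
obs_cov  <- site_cov[, rep(1, nrep)]   # same column repeated once per survey
dim(obs_cov)   # 3 rows x 4 columns
```

I believe unmarked also lets covariates in siteCovs be referenced directly in detformula, which would avoid the duplication, but I have not verified that this works for nmixTTD.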
My first question is: have I made an obvious error in my coding? My second question is: why are my predictions of detectability returning NA, and is there an obvious solution? Third, does code exist for a multi-season version of this model that accounts for temporary emigration in unmarked? I have found some resources associated with the paper's supplementary materials saying that code for a temporary-emigration model in unmarked did not exist as of 2021.
I have more model-specific questions that probably go beyond the scope of discussion on this board (the unmarked package). I have been in contact with the authors of the paper about the model's ability to account for temporary emigration and its relationship to the closure assumption (in relation to what is stated in Priyadarshani et al. 2024). I am also confused about how my current models weight the two different sources of information on detectability, and whether the closure assumption exists in the current nmixTTD modelling framework. The repeated counts, under the closure assumption, say detectability is low for all species, while the time-to-detection data say detectability is relatively high for all species. Is the dissonance between these two inputs causing the issues in the detection submodel? The high abundance estimates make me think the model is weighting the repeated counts more heavily than the time-to-detection data.
I am on a tight timeline given the duration of my postdoc and am considering switching to an analysis based on Chandler et al. 2011 if there are no available solutions for multi-season N-mixture TTD models in unmarked. Any thoughts or assistance would be greatly appreciated.