Difficulty with nmixTTD


Tom Stevens

May 21, 2024, 3:55:20 PM
to unmarked
Hello everyone, I would greatly appreciate any insight into the issues I am having with nmixTTD models, specifically the detection submodel.

I am working with an inherited dataset looking at the impacts of controlled burns on passerine abundance in wet shrublands. Four sites with 32 point count locations per site were surveyed each year (BACI design). Point counts had a 100 meter radius and were (usually) searched 4 times in June of each year when surveys took place. I have abundance covariates related to vegetation structure, vegetation composition, time since the most recent burn, severity of burn, and the season in which the controlled burn occurred. I also have detection covariates at both the survey and point count level.

My initial thought on seeing the dataset and considering analytical frameworks was that the closure assumption was often being violated and/or birds were often unavailable for detection (not singing during the survey) because of the short (6 minute) duration of point counts. In addition to repeated surveys, initial distance to individuals and tracking data on individuals were recorded (the 6 minute search was broken into three two-minute segments and every individual was tracked as detected or not detected in all 3 segments). I had worries about different components of each of these three sources of metadata. There were often large differences in counts of a single species between surveys of one point count within a single season; for more abundant species it was not unusual for repeated counts to vary by 3-5 individuals. Under the closure assumption, false absence was common for less abundant species.

My first thought for analysis was a method that could account for imperfect detection and temporary emigration. Given the availability of tracking data, Chandler et al. 2011 was my first candidate. My worry was with the accuracy of the tracking data. For most six-minute point counts, 8-15 individual birds of 4-8 species were observed. From my point of view (a very experienced field ornithologist who was not present during data collection for this study), accurately tracking this many individuals across two-minute segments seems very difficult for even the most experienced observer, and the technicians collecting data turned over from year to year in this study.

This led me to Strebel et al. 2021 and TTD N-mixture models. Turning the individual tracking data into species-level time-to-detection data felt like a good way to account for imperfect detection and temporary emigration with a likely more accurate metric.
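To give a sense of the conversion, here is a minimal sketch of one way to collapse the per-segment tracking records into a single time-to-detection value per species, point count, and survey. The function and its inputs are hypothetical, and I am assuming non-detections are censored at the survey length and that the detection time is taken as the midpoint of the first two-minute segment with a detection.

# seg1, seg2, seg3: 0/1 indicators for whether the species was detected in
# minutes 0-2, 2-4, and 4-6 of the count (hypothetical inputs)
Tmax <- 6                       # survey length in minutes
seg_midpoints <- c(1, 3, 5)     # midpoint minute of each two-minute segment

ttd_from_segments <- function(seg1, seg2, seg3) {
  segs <- c(seg1, seg2, seg3)
  if (any(segs == 1)) {
    seg_midpoints[which(segs == 1)[1]]   # time of first segment with a detection
  } else {
    Tmax                                 # never detected: censored at survey length
  }
}

ttd_from_segments(0, 1, 1)   # detected first in segment 2 -> 3
ttd_from_segments(0, 0, 0)   # not detected -> 6 (censored)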

Attached is my data and below is an example of my code for one species.

library(unmarked)

BetaBirdsDataV2.4 <- read.csv("D:/Fire/BetaBirdsDataV2.4.csv")
View(BetaBirdsDataV2.4)

N <- 542      # (defined but not used below)
Tmax <- 6     # survey length in minutes
nrep <- 4     # repeat surveys per season (defined but not used below)

# Site-level (abundance) covariates and observation-level (detection) covariates;
# TotalStemHeight is a site-level covariate repeated once per survey occasion
AbundanceP <- BetaBirdsDataV2.4[,2:11]
DetectionP <- list(Time = BetaBirdsDataV2.4[,12:15], Observer = BetaBirdsDataV2.4[,16:19], DOS = BetaBirdsDataV2.4[,20:23], TotalStemHeight = BetaBirdsDataV2.4[c(24,24,24,24)])

# Time-to-detection columns for ALFL; zALFL is read in but not used below
yALFL <- BetaBirdsDataV2.4[c(26,28,30,32)]
zALFL <- BetaBirdsDataV2.4[c(25,27,29,31)]

umfALFL <- unmarkedFrameOccuTTD(y = yALFL, surveyLength = Tmax, siteCovs = AbundanceP, obsCovs = DetectionP)

After a model selection process I fit this model for Alder Flycatchers.

fitALFL7 <- nmixTTD(stateformula = ~Year + I(Stem.Density^2) + PC3 + Growing.Seasons.Post.Burn + Season.of.Last.Burn, detformula = ~DOS + Time, data = umfALFL, K = 100, mixture = "P", ttdDist = "weibull")

Call:
nmixTTD(stateformula = ~Year + I(Stem.Density^2) + PC3 + Growing.Seasons.Post.Burn +
    Season.of.Last.Burn, detformula = ~DOS + Time, data = umfALFL,
    K = 100, mixture = "P", ttdDist = "weibull")

Abundance:
                          Estimate     SE     z  P(>|z|)
(Intercept)                 2.0689 0.1335 15.50 3.59e-54
Year2017                   -0.2080 0.0753 -2.76 5.73e-03
Year2018                   -0.0993 0.0865 -1.15 2.51e-01
Year2019                   -0.1329 0.1041 -1.28 2.02e-01
Year2020                   -0.4602 0.1077 -4.27 1.94e-05
Year2022                   -0.3921 0.0960 -4.09 4.40e-05
I(Stem.Density^2)          -0.0461 0.0145 -3.17 1.53e-03
PC3                        -0.0659 0.0308 -2.14 3.26e-02
Growing.Seasons.Post.Burn   0.1369 0.0311  4.40 1.10e-05
Season.of.Last.BurnFall     0.3192 0.0889  3.59 3.30e-04
Season.of.Last.BurnSummer   0.1438 0.1046  1.38 1.69e-01

Detection:
            Estimate     SE      z   P(>|z|)
(Intercept)   -2.852 0.1311 -21.75 6.54e-105
DOS           -0.267 0.0239 -11.20  3.98e-29
Time          -0.209 0.0227  -9.19  3.75e-20

Weibull shape:
 Estimate     SE   z  P(>|z|)
    0.198 0.0213 9.3 1.34e-20

AIC: 6697.52 

In general, the model selection process I used for all species produced models whose abundance covariates and parameter estimates made sense at first glance and fit my hypotheses and expectations.

Obviously something was going wrong on the detection side of the model. The fact that nmixTTD is not compatible with most unmarked-associated model selection packages made things slow going, and with my limited coding experience it was difficult to examine the model. Using the model to make predictions made me fairly sure that the models I built were operating under the closure assumption and not accounting for temporary emigration, but I cannot make predictions of detectability for any species, which makes me worry that I may also have made a simple error.

Predictions of abundance are far higher than the data warrant (the maximum number of Alder Flycatchers recorded on a plot is 4 or 5) and come with large amounts of uncertainty, making me think detectability is being estimated as very low, but predictions of detectability return NA.

impnewdataTime <- data.frame(Year = "2019", PC3 = mean(umfALFL@siteCovs$PC3), Stem.Density = mean(umfALFL@siteCovs$Stem.Density), Growing.Seasons.Post.Burn = c(1:18), Season.of.Last.Burn = "aSpring", DOS = mean(umfALFL@obsCovs$DOS), Time = mean(umfALFL@obsCovs$Time))

ALFLTimeS <- predict(fitALFL7, type = "state", newdata = impnewdataTime)
ALFLTimeS

Predicted        SE     lower     upper
1   7.947459  1.219470  5.883269  10.73589
2   9.113179  1.488245  6.617026  12.55096
3  10.449884  1.861477  7.370276  14.81628
4  11.982656  2.358235  8.147665  17.62272
5  13.740251  2.999963  8.956710  21.07856
6  15.755647  3.812256  9.805697  25.31594
7  18.066659  4.826148 10.702773  30.49716
8  20.716645  6.079175 11.655730  36.82132
9  23.755328  7.616461 12.672076  44.53221
10 27.239718  9.491946 13.759198  53.92772
11 31.235194 11.769831 14.924517  65.37145
12 35.816718 14.526307 16.175627  79.30681
13 41.070253 17.851583 17.520415  96.27430
14 47.094368 21.852294 18.967159 116.93261
15 54.002089 26.654319 20.524614 142.08431
16 61.923023 32.406097 22.202093 172.70717
17 71.005786 39.282500 24.009542 209.99241
18 81.420793 47.489380 25.957608 255.39123

ALFLTimeD <- predict(fitALFL7, type = "det", newdata = impnewdataTime)
ALFLTimeD

Predicted SE lower upper
1         NA NA    NA    NA
2         NA NA    NA    NA
3         NA NA    NA    NA
4         NA NA    NA    NA
5         NA NA    NA    NA
6         NA NA    NA    NA
7         NA NA    NA    NA
8         NA NA    NA    NA
9         NA NA    NA    NA
10        NA NA    NA    NA
11        NA NA    NA    NA
12        NA NA    NA    NA
13        NA NA    NA    NA
14        NA NA    NA    NA
15        NA NA    NA    NA
16        NA NA    NA    NA
17        NA NA    NA    NA
18        NA NA    NA    NA

My assumption at this point is that the model is working under the closure assumption and assigning species very low detectability. 

I am also worried about how I formatted obsCovs. My first hint that this model was not accounting for temporary emigration was that I could not enter site level detection covariates as a single column. 
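Concretely, the only way I found to include a site-level detection covariate was to repeat the single column once per survey occasion so its dimensions match the true observation-level covariates, as in the TotalStemHeight piece of my code above. A sketch of that step:

# site-level covariate repeated across the 4 survey occasions for obsCovs
TotalStemHeightObs <- BetaBirdsDataV2.4[rep(24, 4)]   # equivalent to the c(24,24,24,24) indexing above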

My first question is: have I made an obvious error in my coding? My second question is: why are my predictions of detectability returning NA, and is there an obvious solution? Third, does code exist in unmarked for a multi-season version of this model that accounts for temporary emigration? I have found some resources in the supplementary materials of the paper that say code for a temporary emigration model in unmarked did not exist as of 2021.

I have more model-specific questions that probably go beyond the scope of discussion on this board (the unmarked package). I have been in contact with the authors of the paper about the model's ability to account for temporary emigration and its relationship to the closure assumption (in relation to what is stated in Priyadarshani et al. 2024). I am also confused about how my current models are weighting the two different sources of information on detectability, and whether the closure assumption applies in the current modelling framework of nmixTTD. Repeated counts under the closure assumption say detectability is low for all species, while the time-to-detection data say detectability is relatively high for all species. Is the dissonance between these two inputs causing the issues in the detection submodel? The high abundance estimates make me think the model is weighting the repeated counts more strongly than the time-to-detection data.

I am on a tight timeline given the duration of my postdoc and am considering switching to an analysis based on Chandler et al. 2011 if there are no available solutions for multi-season N-mixture TTD models in unmarked. Any thoughts or assistance would be greatly appreciated.


Ken Kellner

May 21, 2024, 4:10:03 PM
to unmarked
Hi,

The short answer is no, nmixTTD does not currently support multi-period data or estimate temporary emigration. It's just using any replicate samples to improve estimates of detection probability. This is noted in the mini-tutorial provided in the paper supplements. Unfortunately the actual wording in the paper about the availability of the TE variant model in unmarked is ambiguous, so apologies for that. The supplemental material does provide JAGS code for the temporary emigration model if you are interested in going that route.

As far as the NAs in the output from predict, I think this may be due to a coding mistake but I can't tell for sure. First, you should include just the detection covariates in your newdata (DOS and Time) to avoid potential issues. Second, I suspect that your observation covariates contain some missing values, so when you calculate the mean values of DOS and/or Time for the newdata data frame, the resulting means are NA as well. If you predict with NA values in the covariates, the output will also be NA. You should be able to fix this by using mean(..., na.rm=TRUE).
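For example, a quick sketch using your covariate and object names (untested, just to illustrate):

# newdata for detection predictions: only the detection covariates,
# with NAs dropped when taking the means
det_newdata <- data.frame(
  DOS  = mean(umfALFL@obsCovs$DOS,  na.rm = TRUE),
  Time = mean(umfALFL@obsCovs$Time, na.rm = TRUE)
)
predict(fitALFL7, type = "det", newdata = det_newdata)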

Ken

Tom Stevens

May 22, 2024, 8:47:11 AM
to unmarked
Thanks for the reply Ken, I appreciate it. Adding na.rm=TRUE fixed the NA issue.

> impnewdataTime <- data.frame(Year = "2019", PC3 = mean(umfALFL@siteCovs$PC3,na.rm=TRUE), Stem.Density = mean(umfALFL@siteCovs$Stem.Density,na.rm=TRUE), Growing.Seasons.Post.Burn = c(1:18), Season.of.Last.Burn = "aSpring", DOS = mean(umfALFL@obsCovs$DOS,na.rm=TRUE), Time = mean(umfALFL@obsCovs$Time,na.rm=TRUE))
> ALFLTimeS <- predict(fitALFL7, type = "det", newdata = impnewdataTime)
> ALFLTimeS

    Predicted          SE     lower      upper
1  0.05775013 0.007570706 0.0446648 0.07466904
2  0.05775013 0.007570706 0.0446648 0.07466904
3  0.05775013 0.007570706 0.0446648 0.07466904
4  0.05775013 0.007570706 0.0446648 0.07466904
5  0.05775013 0.007570706 0.0446648 0.07466904
6  0.05775013 0.007570706 0.0446648 0.07466904
7  0.05775013 0.007570706 0.0446648 0.07466904
8  0.05775013 0.007570706 0.0446648 0.07466904
9  0.05775013 0.007570706 0.0446648 0.07466904
10 0.05775013 0.007570706 0.0446648 0.07466904
11 0.05775013 0.007570706 0.0446648 0.07466904
12 0.05775013 0.007570706 0.0446648 0.07466904
13 0.05775013 0.007570706 0.0446648 0.07466904
14 0.05775013 0.007570706 0.0446648 0.07466904
15 0.05775013 0.007570706 0.0446648 0.07466904
16 0.05775013 0.007570706 0.0446648 0.07466904
17 0.05775013 0.007570706 0.0446648 0.07466904
18 0.05775013 0.007570706 0.0446648 0.07466904

Detectability being this low makes me think I have coded/formatted something incorrectly in terms of the time to detection data.

I have my time-to-detection values formatted this way. In point count #1 there were four searches that year, at least one ALFL was found in every search, and the initial detection times were 1, 3, 1, and 1 minutes. In point count #4 no ALFLs were detected in searches 2 and 4. In point counts #3, 8, 13, etc., there was only one replicate count that year. Any thoughts would be appreciated. Thanks again for your reply and your work on the TTD models.

Tmax
[1] 6

yALFL
   TTD_ALFL_1 TTD_ALFL_2 TTD_ALFL_3 TTD_ALFL_4
1    1   3   1   1
2    5   1   1   3
3    1  NA  NA  NA
4    1   6   1   6
5    6   6   6   6
6    6   6   1   6
7    1   5   3   6
8    1  NA  NA  NA
9    6   1   1   6
10   6   6   1   6
11   1   6   1   1
12   1   1   1   1
13   1  NA  NA  NA
14   1   1   1   1
15   6   1   6   3
16   1   1   1   3
17   1   3   6   1
18   6  NA  NA  NA
19   1   1   1   1
20   1   1   1   3
21   1   1   5   1
22   1  NA  NA  NA
23   1   6   5   6
24   3   1   6   1
25   1   6   6   3
26   3  NA  NA  NA
27   1   5   6   6
28   6   6   3   6
29   1   1   6   6
30   1   1   1   6
31   3  NA  NA  NA
32   1   6   1   1
33   1   6   1  NA
34   1   1   1   1
35   1   1   6   3
36   1  NA  NA  NA
37   1   3   1   6
38   1   1   6   1
39   6   3   1   1
40   1   1   1   1
41   1   6   1   1
42   1   5   1   1
43   1   6   6   5
44   1   1   3   1
45   6   6   1   6
46   1   6   1   1
47   1   1   1   6
48   3   3   1   1
49   1   1   1   1
50   1   1   3   1
...

Ken Kellner

May 22, 2024, 9:48:28 AM
to unmarked
Are you sure you're interpreting detection correctly? As per the docs, what unmarked is reporting from predict(..., type='det') is an estimate of the Weibull rate parameter, not a probability of detection. Also, it's the individual-animal level rate parameter, not the "overall" rate parameter for all animals present (which would be higher).

With the code below and your fitted model output, I estimated the individual-level probability of detection to be about 0.24. Normally you could use the getP() method in unmarked to calculate this from the model output, but it looks like it is not working correctly for this model type right now. I need to fix it.

# from your model
shape <- exp(0.198)
rate <- exp(-2.85)
scale <- 1/rate
tmax <- 6

times <- seq(0.01, 80, by=0.01)
dens <- dweibull(times, shape = shape, scale = scale)

plot(times, dens, type='l', xlab='time-to-detection')
abline(v=tmax, lty=2, col='red')
legend('topright', lty=2, col='red', legend='survey length')

times_sub <- times[times < tmax]
dens_sub <- dens[times < tmax]
polygon(c(times_sub, rev(times_sub)), c(dens_sub, rep(0, length(dens_sub))),
        col='grey90', border=NA)

# area under the curve = individual probability of detection
pweibull(tmax, shape = shape, scale = scale, lower.tail=TRUE)
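The same conversion works for the values you got from predict(..., type = "det") earlier, since those are rates rather than probabilities. A quick sketch using the predicted value from your output above:

# turn a predicted individual-level rate (~0.0578 in your det predictions)
# into a per-survey detection probability
rate_pred <- 0.0578
pweibull(tmax, shape = shape, scale = 1/rate_pred)   # about 0.24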

Tom Stevens

May 24, 2024, 3:01:40 PM
to unmarked
Thanks again for your help. Here is one way I may be confusing myself.

When I run this line of code, I get these results:

impnewdataTime <- data.frame(Year = "2019", PC3 = mean(umfALFL@siteCovs$PC3), Stem.Density = mean(umfALFL@siteCovs$Stem.Density), Growing.Seasons.Post.Burn = c(1:18), Season.of.Last.Burn = "aSpring", DOS = mean(umfALFL@obsCovs$DOS), Time = mean(umfALFL@obsCovs$Time))

ALFLTimeS <- predict(fitALFL7, type = "state", newdata = impnewdataTime)
ALFLTimeS

Predicted        SE     lower     upper
1   7.947459  1.219470  5.883269  10.73589
2   9.113179  1.488245  6.617026  12.55096
3  10.449884  1.861477  7.370276  14.81628
4  11.982656  2.358235  8.147665  17.62272
5  13.740251  2.999963  8.956710  21.07856
6  15.755647  3.812256  9.805697  25.31594
7  18.066659  4.826148 10.702773  30.49716
8  20.716645  6.079175 11.655730  36.82132
9  23.755328  7.616461 12.672076  44.53221
10 27.239718  9.491946 13.759198  53.92772
11 31.235194 11.769831 14.924517  65.37145
12 35.816718 14.526307 16.175627  79.30681
13 41.070253 17.851583 17.520415  96.27430
14 47.094368 21.852294 18.967159 116.93261
15 54.002089 26.654319 20.524614 142.08431
16 61.923023 32.406097 22.202093 172.70717
17 71.005786 39.282500 24.009542 209.99241
18 81.420793 47.489380 25.957608 255.39123

1 through 18 are the numbers of growing seasons post fire that exist in our data.

BetaBirdsDataV2.4[,10]
  [1] 2 3 1 2 5 2 3 1 2 5 2 3 1 2 5 2 3
 [18] 1 2 2 3 1 2 2 3 1 2 5 2 3 1 2 5 2
 [35] 3 1 2 5 2 3 2 3 2 3 5 2 3 2 3 5 2
 [52] 3 2 3 2 3 2 3 4 5 8 2 3 4 5 8 2 3
 [69] 2 3 4 5 8 2 3 4 2 3 4 5 8 2 3 4 5
 [86] 8 2 3 4 5 8 2 3 1 2 3 1 4 2 3 1 4
[103] 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2
[120] 3 1 12 13 15 12 13 15 18 12 13 12 13 12 13 15 18
[137] 12 13 12 13 12 13 15 18 12 13 14 1 4 12 13 14 1
[154] 4 12 13 14 1 12 13 14 1 4 12 13 14 1 4 12 13
[171] 14 1 4 12 13 14 1 4 12 13 14 1 13 13 13 13 15
[188] 18 13 15 18 13 15 18 13 15 13 12 13 1 2 12 13 1
[205] 2 5 12 13 1 2 12 13 1 2 5 12 13 1 2 12 13
[222] 1 2 12 13 1 2 5 12 13 1 2 5 7 8 1 7 8
[239] 1 3 5 7 8 1 3 5 7 8 1 3 5 7 8 1 7
[256] 8 1 3 5 7 8 1 3 5 7 1 2 4 6 7 1 2
[273] 4 6 7 1 2 4 6 7 1 2 4 7 1 2 4 6 7
[290] 1 2 4 6 7 1 2 4 7 1 2 4 6 7 1 2 4
[307] 6 7 1 2 4 6 7 1 2 4 6 7 1 2 4 6 7
[324] 1 2 4 6 7 1 2 4 6 7 1 2 4 6 7 1 2
[341] 4 6 7 8 9 11 13 7 8 9 11 13 7 8 9 11 13
[358] 7 8 9 11 13 7 8 9 11 13 7 8 9 11 13 7 8
[375] 9 11 13 7 8 9 11 13 7 8 1 3 5 7 8 1 3
[392] 5 7 8 1 3 5 7 8 1 3 5 7 8 1 3 5 7
[409] 8 1 3 5 7 8 1 3 5 7 8 1 3 5 7 8 1
[426] 3 5 6 7 1 3 5 6 7 1 3 5 6 7 1 3 5
[443] 6 7 1 3 5 6 7 6 7 1 3 5 6 7 1 3 5
[460] 6 7 1 3 5 6 1 2 4 6 6 1 2 4 6 6 1
[477] 2 4 6 6 1 2 4 6 6 1 2 4 6 6 1 2 4
[494] 6 6 1 2 4 6 6 1 2 4 6 6 7 8 10 12 6
[511] 7 8 10 12 6 7 8 10 12 6 7 8 10 12 6 7 8
[528] 12 6 7 8 12 6 7 8 10 12 6 7 8 10 12

I scaled all variables prior to fitting the model.

umfALFL@siteCovs$Growing.Seasons.Post.Burn <- scale(umfALFL@siteCovs$Growing.Seasons.Post.Burn)

Is predict interpreting 1 through 18 on the new, scaled range of the variable?

range(umfALFL@siteCovs$Growing.Seasons.Post.Burn)
[1] -1.162538  2.890291

Ken Kellner

May 24, 2024, 3:16:20 PM
to unmarked
If you have manually scaled a variable in the data, you need to scale it the same way in the newdata yourself (unmarked has no way of knowing what the original mean/sd were).

If you do not scale the data yourself, but instead use the scale() function in your formulas, e.g.

~scale(x) + scale(x2)  instead of  ~x + x2

then unmarked will handle all the scaling automatically, including with predict. This is usually easier. There's a number of past discussions/examples of this in the archives if you search 'scale covariate newdata' etc.
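If you keep the manual scaling, here is a minimal sketch of matching the newdata to the original scaling (assuming the unscaled values are still in the raw csv column you printed above):

# scale the prediction values with the same mean/sd used on the fitted data
gsb_raw <- BetaBirdsDataV2.4[, 10]   # unscaled Growing.Seasons.Post.Burn
gsb_new <- (1:18 - mean(gsb_raw, na.rm = TRUE)) / sd(gsb_raw, na.rm = TRUE)
impnewdataTime$Growing.Seasons.Post.Burn <- gsb_new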

Ken