Backward elimination model selection with pcount


MP

Aug 8, 2014, 1:29:45 AM
to unma...@googlegroups.com
Hi,

I apologize if this question ends up being more about R than unmarked.  I'm still learning my way around unmarked and R.

GOAL:
I am trying to write a program to do backwards selection for covariates to enter a pcount model. I want to specify a starting model, e.g.:

m00 <- pcount(~ a + b + c + d ~ 1, umf, K=300, mixture="P", engine="C")

...then look at the unmarkedFitPCount object and review the P(>|z|) for covariates a, b, c, and d.  If the p-val of one or more covariates exceeds alpha, remove the covariate with the largest p-val, run another model (m01) without it, and repeat until either no covariates remain or all are significant.  Then take the model with the final set of detection covariates and run it through the same backwards-selection algorithm with a set of abundance covariates, so you end up with something like:

m05 <- pcount(~ a + d ~ a + c + e, umf, K=300, mixture="P", engine="C")

The ultimate model will be sent on to parboot.  I have >150 of these models to run, so doing it 'by hand' will take forever and introduce potential human error.
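For what it's worth, the covariate-dropping step described above can be sketched in plain R. This is only an illustration of the selection logic with made-up covariate names and p-values; in the real loop each round would refit with pcount and re-read the p-values:

```r
# One backward-elimination step: given named p-values for the current
# covariates, drop the least significant term if it exceeds alpha.
# (Hypothetical helper; each round you would refit with pcount.)
drop_worst <- function(pvals, alpha = 0.05) {
  if (all(pvals <= alpha)) return(names(pvals))  # all significant: stop
  names(pvals)[-which.max(pvals)]                # drop the largest p-value
}

# 'b' has the largest p-value above alpha, so it is removed first:
pv <- c(a = 0.01, b = 0.40, c = 0.03, d = 0.12)
drop_worst(pv)   # "a" "c" "d" -- refit pcount with these and repeat
```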

QUESTION:
Hopefully this question isn't too stupid: how can I extract the p-vals and covariate names from the unmarkedFitPCount object?  Are the p-vals even in there?

I can pull out the estimates for detection covariates...
m00@estimates@estimates$det@estimates[[2]]

...but I'm having trouble getting (or locating) the covariate names and p-vals from the models.


Thanks for all your help.  I've learned a lot already from this group.
Michael


Kery Marc

Aug 8, 2014, 3:42:31 AM
to unma...@googlegroups.com
Hi Mike,

One way of getting the p-values is this (using the mallard example at the bottom of the help page for pcount):
tmp <- summary(fm.mallard)
str(tmp)
List of 2
 $ state:'data.frame':  4 obs. of  4 variables:
  ..$ Estimate: num [1:4] -1.989 -0.413 -1.507 -0.707
  ..$ SE      : num [1:4] 0.245 0.134 0.247 0.162
  ..$ z       : num [1:4] -8.14 -3.07 -6.11 -4.37
  ..$ P(>|z|) : num [1:4] 4.10e-16 2.14e-03 1.01e-09 1.22e-05
 $ det  :'data.frame':  4 obs. of  4 variables:
  ..$ Estimate: num [1:4] 0.25516 0.29778 -0.36894 0.00908
  ..$ SE      : num [1:4] 0.224 0.1773 0.1521 0.0886
  ..$ z       : num [1:4] 1.139 1.68 -2.425 0.103
  ..$ P(>|z|) : num [1:4] 0.2546 0.093 0.0153 0.9183
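To get the p-values by name, note that the row names of the det data frame carry the covariate labels. Here is a standalone illustration that rebuilds the data frame from the str() output above; the row names are my assumption, based on the detection formula in the mallard example:

```r
# Rebuild summary(fm.mallard)$det by hand from the str() output, then
# extract named p-values. Row names are assumed, not from the thread.
det <- data.frame(
  Estimate = c(0.25516, 0.29778, -0.36894, 0.00908),
  SE       = c(0.2240, 0.1773, 0.1521, 0.0886),
  z        = c(1.139, 1.680, -2.425, 0.103),
  check.names = FALSE
)
det[["P(>|z|)"]] <- c(0.2546, 0.0930, 0.0153, 0.9183)
rownames(det) <- c("(Intercept)", "ivel", "date", "I(date^2)")

pvals <- setNames(det[["P(>|z|)"]], rownames(det))
names(pvals)[pvals > 0.05]   # terms failing the z-test at alpha = 0.05
```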

There are many comments one could make about your model selection. Obviously, some say that one should not use automated stepwise approaches at all. I think that when you have lots of data and few a priori hypotheses, some degree of data dredging is OK. Saying "dredge" reminds me of the function of that name, I believe in the MuMIn package, which may do this for you in a more automated way; perhaps check that out.

The other thing is that in GLMs, the significance of a term in the linear model is best tested NOT by the simple z-test, but by a likelihood ratio test comparing a model that contains the term with one that does not.
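The likelihood ratio test Marc mentions is easy to compute from the log-likelihoods of two nested fits. The numbers below are illustrative, not from the thread; with real unmarked fits you would take logLik() of each model:

```r
# Likelihood-ratio test from two nested models' log-likelihoods.
# df = number of terms dropped between the two models.
ll_full    <- -454.5   # hypothetical log-likelihood, model WITH the term
ll_reduced <- -456.9   # hypothetical log-likelihood, model WITHOUT it
df <- 1

chisq <- 2 * (ll_full - ll_reduced)                  # LRT statistic: 4.8
pval  <- pchisq(chisq, df = df, lower.tail = FALSE)  # approx 0.028 here
```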

Kind regards  --  Marc


--
You received this message because you are subscribed to the Google Groups "unmarked" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unmarked+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Richard Schuster

Aug 9, 2014, 12:15:17 AM
to unma...@googlegroups.com
Hi Michael,

Along the lines of Marc's comments, you could also use AIC to select
your model(s). If you don't want to bother writing custom code for
this, the dredge function from the MuMIn package does it for you, as it
supports unmarked's fitting functions (or at least several of them). It
further allows you to take the subset of models within a certain AIC
threshold and do model averaging on that subset. I know that's a bit
different from what you were asking, but it is a more accepted, or at
least more widely used, method at present.
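A hedged sketch of that route, using MuMIn's actual function names but a placeholder model ('umf' and the covariates are stand-ins, not Michael's data):

```r
# All-subsets selection and model averaging with MuMIn (sketch only).
library(unmarked)
library(MuMIn)

# Placeholder global model: ~detection covariates ~abundance covariates
global <- pcount(~ a + b ~ c + d, umf, K = 100, mixture = "P")

ms  <- dredge(global, rank = "AICc")        # ranked all-subsets table
top <- get.models(ms, subset = delta < 2)   # models within 2 AICc units
model.avg(top)                              # average over that subset
```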

Cheers,
Richard

MP

Aug 20, 2014, 2:01:14 PM
to unma...@googlegroups.com
Marc, Richard,
Thanks for the suggestions; you both had good points.  I ended up using pcount with pdredge in MuMIn.  My model selection approach first used backwards stepwise selection to work down from a set of 11 detection covariates to 8, then compared all subsets of the remaining 8 to arrive at a final set of detection covariates.  I then kept those detection covariates in a model with my full set of potential abundance covariates and ran all subsets through pdredge again.  At each stage, I selected the best model according to AICc.

This seemed to produce fairly reasonable models for some of my species in terms of abundance estimates, though they all seem a bit high.  For my Indigo Bunting models, however, I am getting absurdly large abundance estimates (25 birds / 50 m count circle) and low probability of detection (0.06), despite this species being the one with the most detections.  I have tried being more selective with my potential covariates, limiting them to 3 or 4 for abundance and detection, but pdredge selects out everything, leaving me with a null model.  I also tried model averaging to combine all models with delta AIC < 2 (I even tried delta AIC < 4), and my estimates are nearly the same.

I'm sort of banging my head against the wall here.  Any suggestions on where to go with this next?


Indigo Bunting model selection using pdredge

> head(x,10)
Global model call: pcount(formula = ~ST + PCTSHRB + USTY_P ~ yyyy + grnd_g + pctshrb + 
    usty_p, data = umf, K = 35, mixture = "P", engine = "C")
---
Model selection table 
   lam(Int) p(Int) lam(grn_g)  lam(pct) lam(yyy)     p(PCT)    p(ST)   p(UST_P)     adjR^2 df   logLik  AICc delta weight
1     2.980 -2.728                                                               0.000e+00  2 -454.823 913.8  0.00  0.275
65    3.001 -2.696                                                   -6.108e-05  5.394e-03  3 -454.523 915.3  1.51  0.129
2     3.044 -2.724  -0.001023                                                    1.232e-03  3 -454.754 915.7  1.98  0.102
33    2.979 -2.636                                          -0.01182             3.942e-04  3 -454.801 915.8  2.07  0.098
9     2.988 -2.728                      -0.01888                                 2.903e-04  3 -454.807 915.8  2.08  0.097
3     2.976 -2.724            7.349e-05                                          2.091e-06  3 -454.823 915.9  2.11  0.096
17    2.981 -2.729                               -1.133e-06                     -2.351e-07  3 -454.823 915.9  2.11  0.096
34    3.047 -2.616  -0.001055                               -0.01399             1.737e-03  4 -454.726 917.8  4.07  0.036
10    3.063 -2.730  -0.001070           -0.02377                                 1.691e-03  4 -454.729 917.8  4.08  0.036
18    3.050 -2.726  -0.001035                    -6.644e-04                      1.477e-03  4 -454.741 917.9  4.10  0.035
>



My ydat object for Indigo Buntings:

    row.names visit1 visit2 visit3
1      V00851      1      0      1
2      V00853      3      1      1
3      V00942      2      1      3
4      V00945      3      1      1
5      V01041      2      0      0
6      V01101      2      3      1
7      V01102      2      1      2
8      V01175      0      1      1
9      V01675      0      0      1
10     V01676      1      2      2
11     V01722      1      2      1
12     V01883      1      1      1
13     V01886      2      1      0
14     V01901      2      0      1
15     V01905      2      0      2
16     V02541      1      1      1
17     V02546      2      0      1
18     V02631      2      2      1
19     V02635      1      2      0
20     V02682      2      3      0
21     V02686      3      0      0
22     V03561      3      1      3
23     V03566      0      2      2
24     V03581      1      1      2
25     V03583      1      2      2
26     V03593      2      1      1
27     V03595      0      0      2
28     V03642      0      1      2
29     V03646      1      0      1
30     V03652      0      1      1
31     V03654      2      1      1
32     V03693      0      2      2
33     V03694      1      2      1
34     V03702      0      1      2
35     V03703      0      0      2
36     V03742      1      2      2
37     V03744      0      1      2
38     V04501      2      0      2
39     V04503      1      1      4
40     V04511      2      1      1
41     V04513      0      1      1
42     V04521      0      1      1
43     V04523      1      1      1
44     V05212      3      1      2
45     V05216      3      1      0
46     V05601      0      1      2
47     V05604      0      1      0
48     V05621      2      1      0
49     V05622      0      2      3
50     V05651      1      2      0
51     V05654      2      3      2
52     V05921      2      1      1
53     V06414      1      1      2
54     V06542      1      0      2
55     V06543      1      0      1
56     V06573      3      0      1
57     V06773      0      1      2
58     V06776      3      0      1
59     V06861      3      0      1
60     V06865      1      1      0
61     V06972      0      1      0
62      X0395      1      2      1
63      X0396      0      3      1
64      X0644      1      1      3
65      X0645      1      4      4
66      X0705      1      3      2
67      X0706      0      3      0
68      X0835      1      2      0
69      X0836      1      2      1
70      X0864      1      2      1
71      X0865      0      1      0
72      X0891      2      1      1
73      X0892      2      0      3
74      X1125      0      2      0
75      X1126      1      0      1
76      X1235      2      2      1
77      X1236      3      2      2
78      X1295      1      1      1
79      X1296      1      1      2
80      X1301      2      1      0
81      X1302      2      2      0
82      X1514      1      1      1
83      X1516      1      2      0
84      X2395      1      0      3
85      X2396      1      1      4
86      X2425      3      2      1
87      X2426      0      1      4
88      X2525      1      0      2
89      X2526      2      1      3
90      X2535      1      0      1
91      X2536      1      0      1
92      X2585      0      2      1
93      X2755      1      2      0
94      X2756      1      0      0
95      X4055      0      1      3
96      X4056      4      0      2
97      X4493      0      0      2
98      X4494      1      2      3
99      X4543      1      3      0
100     X4544      1      1      0
101     X5043      0      1      1
102     X5045      2      1      0
103     X7155      2      1      2
104     X7156      1      1      2
105     X7245      2      4      3
106     X7246      2      1      0
107     X7524      0      2      0
108     X7525      0      1      0
109    V01043      0      0      0
110    V01173      0      0      0
111    V06976      0      0      0

Jeffrey Royle

Aug 20, 2014, 8:36:42 PM
to unma...@googlegroups.com
hi Mike,
 That's interesting -- the data set looks reasonable, and I'm surprised the estimated p is so low and vice versa for log(lambda).
 I recommend checking the results by fitting the model by itself (not through dredge) and varying the starting values to see whether the estimates are sensitive to them.

 Also, K = 35 is probably too low, especially if you're getting estimates of E(N) around 25. Try 100 or so just to make sure there's no problem there (if E(N) keeps increasing, it suggests an identifiability problem, although I'd be surprised if that were the case here).
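Andy's point about K can be seen directly in the N-mixture likelihood, which sums over candidate true abundances N up to K. A small standalone illustration (Poisson mixture, one site; the lambda and p values echo the INBU estimates from this thread but are otherwise arbitrary):

```r
# Single-site N-mixture likelihood: sum over candidate abundances N <= K of
# P(counts | N, p) * P(N | lambda). Truncating at a K that is too small
# relative to lambda cuts off real probability mass.
site_lik <- function(y, lambda, p, K) {
  Ns <- max(y):K
  sum(sapply(Ns, function(N) prod(dbinom(y, N, p)) * dpois(N, lambda)))
}

# With lambda = 25 and p = 0.06, K = 35 omits mass that K = 100 retains:
site_lik(c(1, 0, 1), lambda = 25, p = 0.06, K = 35) <
  site_lik(c(1, 0, 1), lambda = 25, p = 0.06, K = 100)   # TRUE
```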

andy



Dan Linden

Aug 21, 2014, 10:23:31 AM
to unma...@googlegroups.com

If you plot the data, it appears that quite a few individuals are missed in any given survey.  (Here the sites are sorted by maximum count and then by total count.)  So I would say it is not surprising for the detection probability to be estimated so low, but I would guess that it's due to severe closure violations more than to detectability issues.  For songbirds, a 50-m radius plot can have low individual availability over a standard point-count duration, depending on movement ecology.


MP

Aug 25, 2014, 10:55:40 PM
to unma...@googlegroups.com
I tried varying K for my model.  I actually tried K=35, 300, and 500.  My Indigo Bunting p dropped lower and my abundance estimates went much higher.

At K=35, the best model included no covariates, estimated mean abundance was ~20 and p(detection) was 0.0613.
At K=300, the best model included one abundance covariate, estimated mean abundance was ~158 and p(detection) was 0.0076.
At K=500, the best model included the same abundance covariate, estimated mean abundance was ~270 and p(detection) was 0.0045.

So that's no good.  I'm now concerned that there was some sort of closure violation, as Dan mentioned.  Year one spanned ~70 days, while year two spanned ~40 days.  We surveyed any given site 3 times, but only during one breeding season.

Or perhaps we had birds moving toward / away from observers.  Our sites had very dense vegetation, so there was some non-zero level of disturbance as observers approached the count stations.

I ran three other species through at the different K values.  Two species models did not change with K, while the third jumped a little (abundance went from 5.2 to 12 and p(det) went from 0.87 to 0.79) somewhere between K=35 and K=300, but then stabilized.

Should I be considering switching to pcountOpen or occu?  My study spanned two years, but any given site was surveyed (3 times) in only one of the years, so that would be 1 primary period and 3 secondary periods, right?  Would that even work with pcountOpen?

Richard Chandler

Aug 26, 2014, 8:53:38 AM
to Unmarked package
Hi Michael,

I don't think you are going to have much luck with any of these models when p is so low. However, I would think that some of the variation in p could be explained by covariates. I see that you have included some, but it doesn't look like you standardized them. I wonder how your results would change if you did so.
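In case it helps, standardizing can be done in base R before building the unmarkedFrame. The covariate names below are from Michael's models, but the values are made up:

```r
# Center and scale site covariates so each has mean 0 and sd 1.
sc <- data.frame(GRND_G = c(10, 30, 50), PCTSHRB = c(5, 20, 35))
sc_std <- as.data.frame(scale(sc))   # scale() centers and divides by sd

round(colMeans(sc_std), 10)   # both columns: 0
apply(sc_std, 2, sd)          # both columns: 1
```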

Also, you could try some of the open population models, but they work best when you have robust design data. 

Richard






--
Richard Chandler
Assistant Professor
Warnell School of Forestry and Natural Resources
University of Georgia

MP

Aug 28, 2014, 10:25:37 PM
to unma...@googlegroups.com, rcha...@warnell.uga.edu
Hi Richard,

I gave standardizing a try, but I didn't see significant improvement in the INBU model.  My better models actually got a lot worse when I did it.  Not sure why that would be the case, but the p(detection) for two species that had previously been around 0.80 dropped to around 0.10-0.15.  Strange.

I'd like to give pcountOpen a try.  If I'm operating on the assumption that the populations were not closed between surveys, would I be justified in calling each of my 3 visits per season a primary period (with no secondary periods)?  If I understand correctly, primary periods are periods between which the population is open, while secondary periods are periods between which the population is closed.  In my case, I only surveyed during a single season per site, but it is looking like perhaps there was some violation of the closure assumption.  Sound reasonable?
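If it's useful, the data setup for that design might look like the sketch below, under the assumption that each of the 3 visits is treated as its own primary period (so the population is open between visits). 'y' stands for the site-by-visit count matrix and is not defined here:

```r
# Treat each visit as a primary period for pcountOpen (sketch only;
# 'y' is a placeholder for the site-by-visit count matrix).
library(unmarked)

umf_pco <- unmarkedFramePCO(y = y, numPrimary = 3)   # 1 visit per primary period
fit <- pcountOpen(~1, ~1, ~1, ~1, data = umf_pco, K = 50)
```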

Thanks,
Michael

Richard Chandler

Aug 29, 2014, 8:39:50 AM
to Unmarked package
Hi Michael,

When you say that p dropped from 0.8 to 0.1, are you referring to the intercept? With covariates in the model, p isn't a single value. If the intercept decreased after standardizing your covariates, that isn't a bad thing. The more important question is whether or not the estimates became less sensitive to K after standardization.
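As a scale check here: the detection intercepts in these outputs are on the logit scale, so they need back-transforming before comparing to a probability. Using the intercept from the earlier K = 35 dredge table:

```r
# Detection intercept from the K = 35 dredge table (logit scale).
beta0 <- -2.728
p_hat <- plogis(beta0)   # inverse-logit back-transform
round(p_hat, 3)          # 0.061, matching the reported p(detection) ~ 0.0613
```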

I don't think a more complicated model will help in this situation, but you could give it a try.

Richard



MP

Sep 4, 2014, 2:23:00 PM
to unma...@googlegroups.com, rcha...@warnell.uga.edu
Hi Richard,

Not sure if this will give you (or anyone else) any other ideas, but I figured I'd throw it out there....


Earlier, I was referring to the mean estimate for p(detection).  I calculated p(det) using the covariate levels at each station and averaged across stations.  Anyway, my pcount estimates were a mess.

After all that, I tried to get things going in pcountOpen ('PCO'), reasoning that many of the problems I was experiencing in pcount might be due (as others suggested) to a closure violation.  I treated each visit as a primary period.  I think my PCO models look a lot more realistic compared to the pcount models.  I didn't add any covariates for recruitment or survival, potentially that could improve things some more.

My Indigo Bunting / clearcuts PCO model is giving me NaNs, but none of the other models I selected (using forward selection) are doing that. 

Here are some preliminary results for one habitat type.  These are mean estimates averaged across count stations.
The pcountOpen results are on top (INBU is still a mess, and recruitment/survival seem off for a few others), but the abundances make better sense.
The next table gives, for comparison, the average across stations of either the mean count/station or the max count/station.
The final one shows the pcount estimates that seemed so wildly wrong.

 

Mean estimates from pcountOpen models (relaxed assumption of closure between visits)

Species  Abundance  Density (birds/ha)  Recruitment  Survival  p(detection)
INBU     1.69       2.15                1.72         0.00      0.71
YBCH     1.79       2.29                0.00         0.63      0.58
PRAW     3.53       4.49                0.01         0.65      0.23
COYE     0.77       0.98                0.57         0.09      0.63

Mean estimates based on raw counts (max count or mean count / station)

Species  Abundance (max)  Abundance (mean)  Density (max)  Density (mean)
INBU     2.07             1.21              2.64           1.54
YBCH     1.09             0.66              1.39           0.84
PRAW     1.14             0.57              1.45           0.72
COYE     0.94             0.44              1.19           0.55

Mean estimates from pcount models (assumed closure between visits)

Species  Abundance  Density (birds/ha)  Pct sites birds seen  p(detection)
INBU     19.68      25.06               97.3%                 0.06
YBCH     2.90       3.69                75.7%                 0.78
PRAW     5.18       6.59                74.8%                 0.87
COYE     3.78       4.81                64.9%                 0.20


None of the best models selected an abundance covariate at alpha=0.10.  Some of my other models for different birds/habitats did.

pcountOpen Models:

*********************************
 SELECTED HAB/SPP:  CC  /  INBU  
*********************************
> best
 
Call:
pcountOpen(lambdaformula = ~1, gammaformula = ~1, omegaformula = ~1, 
    pformula = ~1, data = UMFPCO, mixture = "P", K = 50)
 
Abundance:
 Estimate  SE   z P(>|z|)
    0.522 NaN NaN     NaN
 
Recruitment:
 Estimate  SE   z P(>|z|)
    0.541 NaN NaN     NaN
 
Apparent Survival:
 Estimate   SE      z P(>|z|)
    -5.67 10.4 -0.547   0.585
 
Detection:
 Estimate  SE   z P(>|z|)
    0.875 NaN NaN     NaN
 
AIC: 916.3025 
 
 
 
*********************************
 SELECTED HAB/SPP:  CC  /  YBCH  
*********************************
> best
 
Call:
pcountOpen(lambdaformula = ~1, gammaformula = ~1, omegaformula = ~1, 
    pformula = ~JD, data = UMFPCO, mixture = "P", K = 50)
 
Abundance:
 Estimate    SE    z  P(>|z|)
    0.585 0.162 3.61 0.000302
 
Recruitment:
 Estimate   SE      z P(>|z|)
    -6.27 8.89 -0.705   0.481
 
Apparent Survival:
 Estimate    SE    z P(>|z|)
    0.545 0.336 1.62   0.104
 
Detection:
            Estimate     SE     z P(>|z|)
(Intercept)  -4.3534 2.3921 -1.82  0.0688
JD            0.0304 0.0167  1.82  0.0691
 
AIC: 649.7416 

*********************************
 SELECTED HAB/SPP:  CC  /  PRAW  
*********************************
> best
 
Call:
pcountOpen(lambdaformula = ~1, gammaformula = ~1, omegaformula = ~1, 
    pformula = ~GRND_G + PCTSHRB, data = UMFPCO, mixture = "P", 
    K = 50)
 
Abundance:
 Estimate    SE    z  P(>|z|)
     1.26 0.315 3.99 6.47e-05
 
Recruitment:
 Estimate   SE      z P(>|z|)
    -4.58 12.8 -0.358    0.72
 
Apparent Survival:
 Estimate    SE    z P(>|z|)
    0.634 0.316 2.01  0.0446
 
Detection:
            Estimate      SE     z  P(>|z|)
(Intercept)  -2.7159 0.51990 -5.22 1.75e-07
GRND_G        0.0193 0.00623  3.10 1.94e-03
PCTSHRB       0.0303 0.01236  2.45 1.44e-02
 
AIC: 632.1969 
 
 
*********************************
 SELECTED HAB/SPP:  CC  /  COYE  
*********************************
> best
 
Call:
pcountOpen(lambdaformula = ~1, gammaformula = ~1, omegaformula = ~1, 
    pformula = ~GRND_W + PCTSHRB + USTY_P, data = UMFPCO, mixture = "P", 
    K = 50)
 
Abundance:
 Estimate   SE     z P(>|z|)
   -0.261 0.19 -1.37   0.171
 
Recruitment:
 Estimate    SE     z P(>|z|)
   -0.554 0.204 -2.71 0.00668
 
Apparent Survival:
 Estimate   SE     z P(>|z|)
     -2.3 1.13 -2.03  0.0428
 
Detection:
            Estimate       SE     z  P(>|z|)
(Intercept)  0.14580 0.811767  0.18 8.57e-01
GRND_W      -0.09349 0.031377 -2.98 2.89e-03
PCTSHRB      0.10024 0.046195  2.17 3.00e-02
USTY_P       0.00177 0.000435  4.08 4.58e-05
 
AIC: 560.5799 



INBU models in my other habitat types did better:

SMZ habitat

Mean estimates from pcountOpen models (relaxed assumption of closure between visits)

Species  Abundance  Density (birds/ha)  Recruitment  Survival  p(detection)
INBU     1.54       1.96                0.54         0.15      0.50

Mean estimates based on raw counts (max count or mean count / station)

Species  Abundance (max)  Abundance (mean)  Density (max)  Density (mean)
INBU     0.86             0.45              1.10           0.57

Mean estimates from pcount models (assumed closure between visits)

Species  Abundance  Density (birds/ha)  Pct sites birds seen  p(detection)
INBU     14.27      18.17               56.2%                 0.06


Stringer habitat

Mean estimates from pcountOpen models (relaxed assumption of closure between visits)

Species  Abundance  Density (birds/ha)  Recruitment  Survival  p(detection)
INBU     2.94       3.75                1.76         0.21      0.29

Mean estimates based on raw counts (max count or mean count / station)

Species  Abundance (max)  Abundance (mean)  Density (max)  Density (mean)
INBU     1.43             0.74              1.81           0.94

Mean estimates from pcount models (assumed closure between visits)

Species  Abundance  Density (birds/ha)  Pct sites birds seen  p(detection)
INBU     11.65      14.84               86.2%                 0.23


pcountOpen models:

*********************************
 SELECTED HAB/SPP:  SMZ  /  INBU  
*********************************
> best
Call:
pcountOpen(lambdaformula = ~osty_hw, gammaformula = ~1, omegaformula = ~1, 
    pformula = ~MSTY_HW + JD + USTY_P, data = UMFPCO, mixture = "P", 
    K = 50)
 
Abundance:
            Estimate      SE     z P(>|z|)
(Intercept)  0.68961 0.28944  2.38  0.0172
osty_hw     -0.00376 0.00226 -1.66  0.0961
 
Recruitment:
 Estimate    SE     z P(>|z|)
   -0.615 0.254 -2.42  0.0156
 
Apparent Survival:
 Estimate    SE     z P(>|z|)
    -1.74 0.885 -1.97  0.0489
 
Detection:
            Estimate      SE     z  P(>|z|)
(Intercept) -9.00238 2.52398 -3.57 3.61e-04
MSTY_HW      0.00738 0.00160  4.60 4.17e-06
JD           0.03875 0.01641  2.36 1.82e-02
USTY_P       0.00245 0.00103  2.37 1.79e-02
 
AIC: 445.2098 



*********************************
 SELECTED HAB/SPP:  STR  /  INBU  
*********************************
> best
 
Call:
pcountOpen(lambdaformula = ~1, gammaformula = ~1, omegaformula = ~1, 
    pformula = ~1, data = UMFPCO, mixture = "P", K = 50)
 
Abundance:
 Estimate   SE     z P(>|z|)
     1.08 3.75 0.288   0.773
 
Recruitment:
 Estimate   SE     z P(>|z|)
    0.566 2.62 0.216   0.829
 
Apparent Survival:
 Estimate   SE      z P(>|z|)
    -1.34 4.95 -0.271   0.786
 
Detection:
 Estimate   SE      z P(>|z|)
   -0.888 5.33 -0.167   0.868
 
AIC: 602.7579 




