Backward elimination model selection with pcount


MP

Aug 8, 2014, 1:29:45 AM
to unma...@googlegroups.com
Hi,

I apologize if this question ends up being more about R than unmarked.  I'm still learning my way around unmarked and R.

GOAL:
I am trying to write a program to do backwards selection for covariates to enter a pcount model. I want to specify a starting model, e.g.:

m00 <- pcount(~ a + b + c + d ~ 1, umf, K=300, mixture="P", engine="C")

...then look at the unmarkedFitPCount object and review the P(>|z|) for covariates a, b, c, and d.  If the p-val of one or more covariates exceeds alpha, remove the covariate with the largest p-val, run another model (m01) without it, and repeat until either no covariates remain or all are significant.  Then take the model with the final set of detection covariates and run it through the same backwards-selection algorithm with a set of abundance covariates, so you end up with something like:

m05 <- pcount(~ a + d ~ a + c + e, umf, K=300, mixture="P", engine="C")

The ultimate model will be sent on to parboot.  I have >150 of these models to run, so doing it 'by hand' will take forever and introduce potential human error.
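For what it's worth, the covariate-dropping step described above can be sketched in plain R. This is only an illustration of the selection logic with made-up covariate names and p-values; in the real loop each round would refit with pcount and re-read the p-values:

```r
# One backward-elimination step: given named p-values for the current
# covariates, drop the least significant term if it exceeds alpha.
# (Hypothetical helper; each round you would refit with pcount.)
drop_worst <- function(pvals, alpha = 0.05) {
  if (all(pvals <= alpha)) return(names(pvals))  # all significant: stop
  names(pvals)[-which.max(pvals)]                # drop the largest p-value
}

# 'b' has the largest p-value above alpha, so it is removed first:
pv <- c(a = 0.01, b = 0.40, c = 0.03, d = 0.12)
drop_worst(pv)   # "a" "c" "d" -- refit pcount with these and repeat
```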

QUESTION:
Hopefully this question isn't too stupid: how can I extract the p-vals and covariate names from the unmarkedFitPCount object?  Are the p-vals even in there?

I can pull out the estimates for detection covariates...
m00@estimates@estimates$det@estimates[[2]]

...but I'm having trouble getting (or locating) the covariate names and p-vals from the models.


Thanks for all your help.  I've learned a lot already from this group.
Michael


Kery Marc

Aug 8, 2014, 3:42:31 AM
to unma...@googlegroups.com
Hi Mike,

One way of getting the p-values is this (using the mallard example at the bottom of the help page for pcount):
tmp <- summary(fm.mallard)
str(tmp)
List of 2
 $ state:'data.frame':  4 obs. of  4 variables:
  ..$ Estimate: num [1:4] -1.989 -0.413 -1.507 -0.707
  ..$ SE      : num [1:4] 0.245 0.134 0.247 0.162
  ..$ z       : num [1:4] -8.14 -3.07 -6.11 -4.37
  ..$ P(>|z|) : num [1:4] 4.10e-16 2.14e-03 1.01e-09 1.22e-05
 $ det  :'data.frame':  4 obs. of  4 variables:
  ..$ Estimate: num [1:4] 0.25516 0.29778 -0.36894 0.00908
  ..$ SE      : num [1:4] 0.224 0.1773 0.1521 0.0886
  ..$ z       : num [1:4] 1.139 1.68 -2.425 0.103
  ..$ P(>|z|) : num [1:4] 0.2546 0.093 0.0153 0.9183
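To get the p-values by name, note that the row names of the det data frame carry the covariate labels. Here is a standalone illustration that rebuilds the data frame from the str() output above; the row names are my assumption, based on the detection formula in the mallard example:

```r
# Rebuild summary(fm.mallard)$det by hand from the str() output, then
# extract named p-values. Row names are assumed, not from the thread.
det <- data.frame(
  Estimate = c(0.25516, 0.29778, -0.36894, 0.00908),
  SE       = c(0.2240, 0.1773, 0.1521, 0.0886),
  z        = c(1.139, 1.680, -2.425, 0.103),
  check.names = FALSE
)
det[["P(>|z|)"]] <- c(0.2546, 0.0930, 0.0153, 0.9183)
rownames(det) <- c("(Intercept)", "ivel", "date", "I(date^2)")

pvals <- setNames(det[["P(>|z|)"]], rownames(det))
names(pvals)[pvals > 0.05]   # terms failing the z-test at alpha = 0.05
```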

There are many comments one could make about your model selection. Obviously, some say that one should not use automated stepwise approaches at all. I think that when you have lots of data and few a priori hypotheses, some degree of data dredging is OK. Saying "dredge" reminds me of the function of that name, I believe in the MuMIn package, which may do this for you in a more automated way; perhaps check that out.

The other thing is that in GLMs, the significance of a term in the linear model is best tested NOT by the simple z-test, but by a likelihood ratio test comparing a model that contains the term with one that does not.
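The likelihood ratio test Marc mentions is easy to compute from the log-likelihoods of two nested fits. The numbers below are illustrative, not from the thread; with real unmarked fits you would take logLik() of each model:

```r
# Likelihood-ratio test from two nested models' log-likelihoods.
# df = number of terms dropped between the two models.
ll_full    <- -454.5   # hypothetical log-likelihood, model WITH the term
ll_reduced <- -456.9   # hypothetical log-likelihood, model WITHOUT it
df <- 1

chisq <- 2 * (ll_full - ll_reduced)                  # LRT statistic: 4.8
pval  <- pchisq(chisq, df = df, lower.tail = FALSE)  # approx 0.028 here
```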

Kind regards  --  Marc


--
You received this message because you are subscribed to the Google Groups "unmarked" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unmarked+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Richard Schuster

Aug 9, 2014, 12:15:17 AM
to unma...@googlegroups.com
Hi Michael,

Along the lines of Marc's comments, you could also use AIC to select
your model(s). If you don't want to bother writing custom code for
this, the dredge function from the MuMIn package does it for you, as it
supports unmarked's fitting functions (or at least several of them). It
further allows you to take the subset of models within a certain AIC
threshold and do model averaging on that subset. I know that's a bit
different from what you were asking, but it is a more accepted, or at
least more widely used, method at present.
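A hedged sketch of that route, using MuMIn's actual function names but a placeholder model ('umf' and the covariates are stand-ins, not Michael's data):

```r
# All-subsets selection and model averaging with MuMIn (sketch only).
library(unmarked)
library(MuMIn)

# Placeholder global model: ~detection covariates ~abundance covariates
global <- pcount(~ a + b ~ c + d, umf, K = 100, mixture = "P")

ms  <- dredge(global, rank = "AICc")        # ranked all-subsets table
top <- get.models(ms, subset = delta < 2)   # models within 2 AICc units
model.avg(top)                              # average over that subset
```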

Cheers,
Richard

MP

Aug 20, 2014, 2:01:14 PM
to unma...@googlegroups.com
Marc, Richard,
Thanks for the suggestions; you both had good points.  I ended up using pcount with pdredge in MuMIn.  My model selection approach first used backwards stepwise selection to work down from a set of 11 detection covariates to 8, then compared all subsets of the remaining 8 to arrive at a final set of detection covariates.  I then kept those detection covariates in a model with my full set of potential abundance covariates and ran all subsets through pdredge again.  At each stage, I selected the best model according to AICc.

This seemed to produce fairly reasonable models for some of my species in terms of abundance estimates, though they all seem a bit high.  For my Indigo Bunting models, however, I am getting absurdly large abundance estimates (25 birds / 50 m count circle) and low probability of detection (0.06), despite this species being the one with the most detections.  I have tried being more selective with my potential covariates, limiting them to 3 or 4 for abundance and detection, but pdredge selects out everything, leaving me with a null model.  I also tried model averaging to combine all models with delta AIC < 2 (I even tried delta AIC < 4), and my estimates are nearly the same.

I'm sort of banging my head against the wall here.  Any suggestions on where to go with this next?


Indigo Bunting model selection using pdredge

> head(x,10)
Global model call: pcount(formula = ~ST + PCTSHRB + USTY_P ~ yyyy + grnd_g + pctshrb + 
    usty_p, data = umf, K = 35, mixture = "P", engine = "C")
---
Model selection table 
   lam(Int) p(Int) lam(grn_g)  lam(pct) lam(yyy)     p(PCT)    p(ST)   p(UST_P)     adjR^2 df   logLik  AICc delta weight
1     2.980 -2.728                                                               0.000e+00  2 -454.823 913.8  0.00  0.275
65    3.001 -2.696                                                   -6.108e-05  5.394e-03  3 -454.523 915.3  1.51  0.129
2     3.044 -2.724  -0.001023                                                    1.232e-03  3 -454.754 915.7  1.98  0.102
33    2.979 -2.636                                          -0.01182             3.942e-04  3 -454.801 915.8  2.07  0.098
9     2.988 -2.728                      -0.01888                                 2.903e-04  3 -454.807 915.8  2.08  0.097
3     2.976 -2.724            7.349e-05                                          2.091e-06  3 -454.823 915.9  2.11  0.096
17    2.981 -2.729                               -1.133e-06                     -2.351e-07  3 -454.823 915.9  2.11  0.096
34    3.047 -2.616  -0.001055                               -0.01399             1.737e-03  4 -454.726 917.8  4.07  0.036
10    3.063 -2.730  -0.001070           -0.02377                                 1.691e-03  4 -454.729 917.8  4.08  0.036
18    3.050 -2.726  -0.001035                    -6.644e-04                      1.477e-03  4 -454.741 917.9  4.10  0.035
>



My ydat object for Indigo Buntings:

    row.names visit1 visit2 visit3
1      V00851      1      0      1
2      V00853      3      1      1
3      V00942      2      1      3
4      V00945      3      1      1
5      V01041      2      0      0
6      V01101      2      3      1
7      V01102      2      1      2
8      V01175      0      1      1
9      V01675      0      0      1
10     V01676      1      2      2
11     V01722      1      2      1
12     V01883      1      1      1
13     V01886      2      1      0
14     V01901      2      0      1
15     V01905      2      0      2
16     V02541      1      1      1
17     V02546      2      0      1
18     V02631      2      2      1
19     V02635      1      2      0
20     V02682      2      3      0
21     V02686      3      0      0
22     V03561      3      1      3
23     V03566      0      2      2
24     V03581      1      1      2
25     V03583      1      2      2
26     V03593      2      1      1
27     V03595      0      0      2
28     V03642      0      1      2
29     V03646      1      0      1
30     V03652      0      1      1
31     V03654      2      1      1
32     V03693      0      2      2
33     V03694      1      2      1
34     V03702      0      1      2
35     V03703      0      0      2
36     V03742      1      2      2
37     V03744      0      1      2
38     V04501      2      0      2
39     V04503      1      1      4
40     V04511      2      1      1
41     V04513      0      1      1
42     V04521      0      1      1
43     V04523      1      1      1
44     V05212      3      1      2
45     V05216      3      1      0
46     V05601      0      1      2
47     V05604      0      1      0
48     V05621      2      1      0
49     V05622      0      2      3
50     V05651      1      2      0
51     V05654      2      3      2
52     V05921      2      1      1
53     V06414      1      1      2
54     V06542      1      0      2
55     V06543      1      0      1
56     V06573      3      0      1
57     V06773      0      1      2
58     V06776      3      0      1
59     V06861      3      0      1
60     V06865      1      1      0
61     V06972      0      1      0
62      X0395      1      2      1
63      X0396      0      3      1
64      X0644      1      1      3
65      X0645      1      4      4
66      X0705      1      3      2
67      X0706      0      3      0
68      X0835      1      2      0
69      X0836      1      2      1
70      X0864      1      2      1
71      X0865      0      1      0
72      X0891      2      1      1
73      X0892      2      0      3
74      X1125      0      2      0
75      X1126      1      0      1
76      X1235      2      2      1
77      X1236      3      2      2
78      X1295      1      1      1
79      X1296      1      1      2
80      X1301      2      1      0
81      X1302      2      2      0
82      X1514      1      1      1
83      X1516      1      2      0
84      X2395      1      0      3
85      X2396      1      1      4
86      X2425      3      2      1
87      X2426      0      1      4
88      X2525      1      0      2
89      X2526      2      1      3
90      X2535      1      0      1
91      X2536      1      0      1
92      X2585      0      2      1
93      X2755      1      2      0
94      X2756      1      0      0
95      X4055      0      1      3
96      X4056      4      0      2
97      X4493      0      0      2
98      X4494      1      2      3
99      X4543      1      3      0
100     X4544      1      1      0
101     X5043      0      1      1
102     X5045      2      1      0
103     X7155      2      1      2
104     X7156      1      1      2
105     X7245      2      4      3
106     X7246      2      1      0
107     X7524      0      2      0
108     X7525      0      1      0
109    V01043      0      0      0
110    V01173      0      0      0
111    V06976      0      0      0

Jeffrey Royle

Aug 20, 2014, 8:36:42 PM
to unma...@googlegroups.com
hi Mike,
 That's interesting -- the data set looks reasonable, and I'm surprised the estimated p is so low and vice versa for log(lambda).
 I recommend checking the results by fitting the model by itself (not through dredge) and varying the starting values to see whether the estimates are sensitive to them.

 Also, K = 35 is probably too low, especially if you're getting estimates of E(N) around 25. Try 100 or so just to make sure there's no problem there (if E(N) keeps increasing, it suggests an identifiability problem, although I'd be surprised if that were the case here).
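Andy's point about K can be seen directly in the N-mixture likelihood, which sums over candidate true abundances N up to K. A small standalone illustration (Poisson mixture, one site; the lambda and p values echo the INBU estimates from this thread but are otherwise arbitrary):

```r
# Single-site N-mixture likelihood: sum over candidate abundances N <= K of
# P(counts | N, p) * P(N | lambda). Truncating at a K that is too small
# relative to lambda cuts off real probability mass.
site_lik <- function(y, lambda, p, K) {
  Ns <- max(y):K
  sum(sapply(Ns, function(N) prod(dbinom(y, N, p)) * dpois(N, lambda)))
}

# With lambda = 25 and p = 0.06, K = 35 omits mass that K = 100 retains:
site_lik(c(1, 0, 1), lambda = 25, p = 0.06, K = 35) <
  site_lik(c(1, 0, 1), lambda = 25, p = 0.06, K = 100)   # TRUE
```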

andy



Dan Linden

Aug 21, 2014, 10:23:31 AM
to unma...@googlegroups.com

If you plot the data, it appears that quite a few individuals are missed in any given survey.  (Here the sites are sorted by maximum count and then by total count.)  So I would say it is not surprising for the detection probability to be estimated so low, but I would guess that it's due to severe closure violations more than to detectability issues.  For songbirds, a 50-m radius plot can have low individual availability over a standard point-count duration, depending on movement ecology.


MP

Aug 25, 2014, 10:55:40 PM
to unma...@googlegroups.com
I tried varying K for my model.  I actually tried K=35, 300, and 500.  My Indigo Bunting p dropped lower and my abundance estimates went much higher.

At K=35, the best model included no covariates, estimated mean abundance was ~20 and p(detection) was 0.0613.
At K=300, the best model included one abundance covariate, estimated mean abundance was ~158 and p(detection) was 0.0076.
At K=500, the best model included the same abundance covariate, estimated mean abundance was ~270 and p(detection) was 0.0045.

So that's no good.  I'm now concerned that there was some sort of closure violation, as Dan mentioned.  Year one spanned ~70 days, while year two spanned ~40 days.  We surveyed any given site 3 times, but only during one breeding season.

Or perhaps we had birds moving toward / away from observers.  Our sites had very dense vegetation, so there was some non-zero level of disturbance as observers approached the count stations.

I ran three other species through at the different K values.  Two species models did not change with K, while the third jumped a little (abundance went from 5.2 to 12 and p(det) went from 0.87 to 0.79) somewhere between K=35 and K=300, but then stabilized.

Should I be considering switching to pcountOpen or occu?  My study spanned two years, but any given site was surveyed (3 times) in only one of the years, so that would be 1 primary period and 3 secondary periods, right?  Would that even work with pcountOpen?

Richard Chandler

Aug 26, 2014, 8:53:38 AM
to Unmarked package
Hi Michael,

I don't think you are going to have much luck with any of these models when p is so low. However, I would think that some of the variation in p could be explained by covariates. I see that you have included some, but it doesn't look like you standardized them. I wonder how your results would change if you did so.
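In case it helps, standardizing can be done in base R before building the unmarkedFrame. The covariate names below are from Michael's models, but the values are made up:

```r
# Center and scale site covariates so each has mean 0 and sd 1.
sc <- data.frame(GRND_G = c(10, 30, 50), PCTSHRB = c(5, 20, 35))
sc_std <- as.data.frame(scale(sc))   # scale() centers and divides by sd

round(colMeans(sc_std), 10)   # both columns: 0
apply(sc_std, 2, sd)          # both columns: 1
```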

Also, you could try some of the open population models, but they work best when you have robust design data. 

Richard






--
Richard Chandler
Assistant Professor
Warnell School of Forestry and Natural Resources
University of Georgia

MP

Aug 28, 2014, 10:25:37 PM
to unma...@googlegroups.com, rcha...@warnell.uga.edu
Hi Richard,

I gave standardizing a try, but I didn't see significant improvement in the INBU model.  My better models actually got a lot worse when I did it.  Not sure why that would be the case, but the p(detection) for two species that had previously been around 0.80 dropped to around 0.10-0.15.  Strange.

I'd like to give pcountOpen a try.  If I'm operating on the assumption that the populations were not closed between surveys, would I be justified in calling each of my 3 visits per season a primary period (with no secondary periods)?  If I understand correctly, primary periods are periods between which the population is open, while secondary periods are periods between which the population is closed.  In my case, I only surveyed during a single season per site, but it is looking like perhaps there was some violation of the closure assumption.  Sound reasonable?
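If it's useful, the data setup for that design might look like the sketch below, under the assumption that each of the 3 visits is treated as its own primary period (so the population is open between visits). 'y' stands for the site-by-visit count matrix and is not defined here:

```r
# Treat each visit as a primary period for pcountOpen (sketch only;
# 'y' is a placeholder for the site-by-visit count matrix).
library(unmarked)

umf_pco <- unmarkedFramePCO(y = y, numPrimary = 3)   # 1 visit per primary period
fit <- pcountOpen(~1, ~1, ~1, ~1, data = umf_pco, K = 50)
```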

Thanks,
Michael

Richard Chandler

Aug 29, 2014, 8:39:50 AM
to Unmarked package
Hi Michael,

When you say that p dropped from 0.8 to 0.1, are you referring to the intercept? With covariates in the model, p isn't a single value. If the intercept decreased after standardizing your covariates, that isn't a bad thing. The more important question is whether or not the estimates became less sensitive to K after standardization.
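As a scale check here: the detection intercepts in these outputs are on the logit scale, so they need back-transforming before comparing to a probability. Using the intercept from the earlier K = 35 dredge table:

```r
# Detection intercept from the K = 35 dredge table (logit scale).
beta0 <- -2.728
p_hat <- plogis(beta0)   # inverse-logit back-transform
round(p_hat, 3)          # 0.061, matching the reported p(detection) ~ 0.0613
```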

I don't think a more complicated model will help in this situation, but you could give it a try.

Richard



MP

Sep 4, 2014, 2:23:00 PM
to unma...@googlegroups.com, rcha...@warnell.uga.edu
Hi Richard,

Not sure if this will give you (or anyone else) any other ideas, but I figured I'd throw it out there....


Earlier, I was referring to the mean estimate for p(detection).  I calculated p(det) using the covariate levels at each station and averaged across stations.  Anyway, my pcount estimates were a mess.

After all that, I tried to get things going in pcountOpen ('PCO'), reasoning that many of the problems I was experiencing in pcount might be due (as others suggested) to a closure violation.  I treated each visit as a primary period.  I think my PCO models look a lot more realistic compared to the pcount models.  I didn't add any covariates for recruitment or survival, potentially that could improve things some more.

My Indigo Bunting / clearcuts PCO model is giving me NaNs, but none of the other models I selected (using forward selection) are doing that. 

Here are some preliminary results for one habitat type.  These are mean estimates averaged across count stations.
The pcountOpen results are on top (INBU is still a mess, and recruitment/survival seem off for a few others), but the abundances make better sense.
The next table gives, for comparison, the average across stations of either the mean count/station or the max count/station.
The final one shows the pcount estimates that seemed so wildly wrong.

 

Mean estimates from pcountOpen models (relaxed assumption of closure between visits)

Species  Abundance  Density (birds/ha)  Recruitment  Survival  p(detection)
INBU     1.69       2.15                1.72         0.00      0.71
YBCH     1.79       2.29                0.00         0.63      0.58
PRAW     3.53       4.49                0.01         0.65      0.23
COYE     0.77       0.98                0.57         0.09      0.63

Mean estimates based on raw counts (max count or mean count / station)

Species  Abundance (max)  Abundance (mean)  Density (max)  Density (mean)
INBU     2.07             1.21              2.64           1.54
YBCH     1.09             0.66              1.39           0.84
PRAW     1.14             0.57              1.45           0.72
COYE     0.94             0.44              1.19           0.55

Mean estimates from pcount models (assumed closure between visits)

Species  Abundance  Density (birds/ha)  Pct sites birds seen  p(detection)
INBU     19.68      25.06               97.3%                 0.06
YBCH     2.90       3.69                75.7%                 0.78
PRAW     5.18       6.59                74.8%                 0.87
COYE     3.78       4.81                64.9%                 0.20


None of the best models selected an abundance covariate at alpha=0.10.  Some of my other models for different birds/habitats did.

pcountOpen Models:

*********************************
 SELECTED HAB/SPP:  CC  /  INBU  
*********************************
> best
 
Call:
pcountOpen(lambdaformula = ~1, gammaformula = ~1, omegaformula = ~1, 
    pformula = ~1, data = UMFPCO, mixture = "P", K = 50)
 
Abundance:
 Estimate  SE   z P(>|z|)
    0.522 NaN NaN     NaN
 
Recruitment:
 Estimate  SE   z P(>|z|)
    0.541 NaN NaN     NaN
 
Apparent Survival:
 Estimate   SE      z P(>|z|)
    -5.67 10.4 -0.547   0.585
 
Detection:
 Estimate  SE   z P(>|z|)
    0.875 NaN NaN     NaN
 
AIC: 916.3025 
 
 
 
*********************************
 SELECTED HAB/SPP:  CC  /  YBCH  
*********************************
> best
 
Call:
pcountOpen(lambdaformula = ~1, gammaformula = ~1, omegaformula = ~1, 
    pformula = ~JD, data = UMFPCO, mixture = "P", K = 50)
 
Abundance:
 Estimate    SE    z  P(>|z|)
    0.585 0.162 3.61 0.000302
 
Recruitment:
 Estimate   SE      z P(>|z|)
    -6.27 8.89 -0.705   0.481
 
Apparent Survival:
 Estimate    SE    z P(>|z|)
    0.545 0.336 1.62   0.104
 
Detection:
            Estimate     SE     z P(>|z|)
(Intercept)  -4.3534 2.3921 -1.82  0.0688
JD            0.0304 0.0167  1.82  0.0691
 
AIC: 649.7416 

*********************************
 SELECTED HAB/SPP:  CC  /  PRAW  
*********************************
> best
 
Call:
pcountOpen(lambdaformula = ~1, gammaformula = ~1, omegaformula = ~1, 
    pformula = ~GRND_G + PCTSHRB, data = UMFPCO, mixture = "P", 
    K = 50)
 
Abundance:
 Estimate    SE    z  P(>|z|)
     1.26 0.315 3.99 6.47e-05
 
Recruitment:
 Estimate   SE      z P(>|z|)
    -4.58 12.8 -0.358    0.72
 
Apparent Survival:
 Estimate    SE    z P(>|z|)
    0.634 0.316 2.01  0.0446
 
Detection:
            Estimate      SE     z  P(>|z|)
(Intercept)  -2.7159 0.51990 -5.22 1.75e-07
GRND_G        0.0193 0.00623  3.10 1.94e-03
PCTSHRB       0.0303 0.01236  2.45 1.44e-02
 
AIC: 632.1969 
 
 
*********************************
 SELECTED HAB/SPP:  CC  /  COYE  
*********************************
> best
 
Call:
pcountOpen(lambdaformula = ~1, gammaformula = ~1, omegaformula = ~1, 
    pformula = ~GRND_W + PCTSHRB + USTY_P, data = UMFPCO, mixture = "P", 
    K = 50)
 
Abundance:
 Estimate   SE     z P(>|z|)
   -0.261 0.19 -1.37   0.171
 
Recruitment:
 Estimate    SE     z P(>|z|)
   -0.554 0.204 -2.71 0.00668
 
Apparent Survival:
 Estimate   SE     z P(>|z|)
     -2.3 1.13 -2.03  0.0428
 
Detection:
            Estimate       SE     z  P(>|z|)
(Intercept)  0.14580 0.811767  0.18 8.57e-01
GRND_W      -0.09349 0.031377 -2.98 2.89e-03
PCTSHRB      0.10024 0.046195  2.17 3.00e-02
USTY_P       0.00177 0.000435  4.08 4.58e-05
 
AIC: 560.5799 



INBU models in my other habitat types did better:

SMZ habitat

Mean estimates from pcountOpen models (relaxed assumption of closure between visits)

Species  Abundance  Density (birds/ha)  Recruitment  Survival  p(detection)
INBU     1.54       1.96                0.54         0.15      0.50

Mean estimates based on raw counts (max count or mean count / station)

Species  Abundance (max)  Abundance (mean)  Density (max)  Density (mean)
INBU     0.86             0.45              1.10           0.57

Mean estimates from pcount models (assumed closure between visits)

Species  Abundance  Density (birds/ha)  Pct sites birds seen  p(detection)
INBU     14.27      18.17               56.2%                 0.06


Stringer habitat

Mean estimates from pcountOpen models (relaxed assumption of closure between visits)

Species  Abundance  Density (birds/ha)  Recruitment  Survival  p(detection)
INBU     2.94       3.75                1.76         0.21      0.29

Mean estimates based on raw counts (max count or mean count / station)

Species  Abundance (max)  Abundance (mean)  Density (max)  Density (mean)
INBU     1.43             0.74              1.81           0.94

Mean estimates from pcount models (assumed closure between visits)

Species  Abundance  Density (birds/ha)  Pct sites birds seen  p(detection)
INBU     11.65      14.84               86.2%                 0.23


pcountOpen models:

*********************************
 SELECTED HAB/SPP:  SMZ  /  INBU  
*********************************
> best
Call:
pcountOpen(lambdaformula = ~osty_hw, gammaformula = ~1, omegaformula = ~1, 
    pformula = ~MSTY_HW + JD + USTY_P, data = UMFPCO, mixture = "P", 
    K = 50)
 
Abundance:
            Estimate      SE     z P(>|z|)
(Intercept)  0.68961 0.28944  2.38  0.0172
osty_hw     -0.00376 0.00226 -1.66  0.0961
 
Recruitment:
 Estimate    SE     z P(>|z|)
   -0.615 0.254 -2.42  0.0156
 
Apparent Survival:
 Estimate    SE     z P(>|z|)
    -1.74 0.885 -1.97  0.0489
 
Detection:
            Estimate      SE     z  P(>|z|)
(Intercept) -9.00238 2.52398 -3.57 3.61e-04
MSTY_HW      0.00738 0.00160  4.60 4.17e-06
JD           0.03875 0.01641  2.36 1.82e-02
USTY_P       0.00245 0.00103  2.37 1.79e-02
 
AIC: 445.2098 



*********************************
 SELECTED HAB/SPP:  STR  /  INBU  
*********************************
> best
 
Call:
pcountOpen(lambdaformula = ~1, gammaformula = ~1, omegaformula = ~1, 
    pformula = ~1, data = UMFPCO, mixture = "P", K = 50)
 
Abundance:
 Estimate   SE     z P(>|z|)
     1.08 3.75 0.288   0.773
 
Recruitment:
 Estimate   SE     z P(>|z|)
    0.566 2.62 0.216   0.829
 
Apparent Survival:
 Estimate   SE      z P(>|z|)
    -1.34 4.95 -0.271   0.786
 
Detection:
 Estimate   SE      z P(>|z|)
   -0.888 5.33 -0.167   0.868
 
AIC: 602.7579 




