Setting K in pcount


rak

Jan 11, 2011, 2:47:46 PM1/11/11
to unma...@googlegroups.com

I have an avian point count data set in which the maximum count (primarily auditory) over all visits was 5.  The birds are highly territorial at the time of year when the surveys were performed, and the maximum number that could have been heard in one site visit (200 m observation radius) is estimated to be 10 with maximal territory packing.  pcount's default K is defined as

K <- max(y, na.rm = TRUE) + 20

so K=25 is the default in this case.  In the literature the equations show the likelihood summed over abundance to infinity, but 100 or 200 is commonly used as the upper bound in practice.  However, Royle (2004) says that with this model, "the marginal likelihood of the counts is obtained by integrating the binomial likelihood for the observed counts over possible values of abundance for each site" (emphasis mine).  Setting a large K is understandable for nonterritorial species, but in this case should K be set to 10?
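(Editorial aside: for intuition, here is a minimal sketch of the truncated marginal likelihood being discussed. This is not pcount's internal code; the function name site_lik and the example numbers are purely illustrative.)

```r
# Sketch of one site's marginal likelihood in an N-mixture model,
# truncating the (in principle infinite) sum over abundance N at K.
# Not pcount's internals; site_lik and the numbers are illustrative.
site_lik <- function(y, lambda, p, K) {
  n <- max(y, na.rm = TRUE):K  # candidate abundances consistent with the counts
  # P(y | N = n) across visits, weighted by the Poisson prior on N
  sum(sapply(n, function(k) prod(dbinom(y, k, p))) * dpois(n, lambda))
}

# Once K is well past the plausible range of N, enlarging it changes nothing:
site_lik(c(2, 3, 1), lambda = 3, p = 0.5, K = 25)
site_lik(c(2, 3, 1), lambda = 3, p = 0.5, K = 100)  # essentially identical
```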

Richard Chandler

Jan 11, 2011, 9:23:41 PM1/11/11
to unma...@googlegroups.com
Richard-

If there really is a near zero probability that abundance could be > 10, then the MLEs should be the same for any value of K >= 10. Have you tried different values of K?

-Richard

Jeffrey Royle

Jan 12, 2011, 8:17:41 AM1/12/11
to unma...@googlegroups.com
My belief is that you probably want to set K high enough that
there's no effect at all on the MLE -- when I wrote "possible values"
what I meant was mathematically possible, i.e., the support of the
random variable N, not biologically possible.

An interesting example arises when you consider the negative binomial
model. For some reason ecologists are obsessed with fitting negative
binomial models (there are now dozens of papers). The NB model is
extremely sensitive to the value of K owing, essentially, to the
very long right tail of the negative binomial. You will find in many
practical situations that you need to use K = several hundred (even
1000) to see no effect on the MLEs. So NB models imply support for N
at extremely large and biologically unreasonable values.

Choosing a value of K that truncates the likelihood is effectively
like using an informative prior distribution, which seems to be
generally frowned upon in the literature. You would have to argue
that N <= 10 makes sense (or whatever), and someone else might think
N <= 8 makes sense, and you'll find that everyone has their own answer!

regards
andy



rak

Jan 12, 2011, 9:59:07 AM1/12/11
to unmarked

Many thanks for your replies Richard and Andy. I composed the post
below before I saw Andy's, but it is very helpful as well. RK

Yes I have tried varying K with a variety of covariates and mixture
distribution models. For any particular combination of covariates the
NB model is always better than Poisson by AIC and by parboot (although
I’ve applied parboot to fewer models than I’ve looked at AIC).
Parboot GOF results don't become even marginally acceptable until
K is about 15. Models with still larger K do better by parboot, and
AIC continues to decrease as K is set larger (I’ve set K up to 500
for at least one model), but the disturbing thing is that MLE
parameter estimates never seem to settle down even though AIC and NLL
may change little with further increases in K. The abundance estimate
(judged by inspecting the abundance intercept because all abundance
covariates are scaled to mean 0) just keeps getting larger—
ridiculously so—as K is set higher. Model convergence always looks OK
(the convergence index of the model output is 0). Just looking at the
data, I don’t see anything particularly odd about the counts that
would suggest a cause for such pathological behavior. I may need to
pay closer attention to model condition number than I have been (I’ve
checked it for a couple of models, and it looked OK).

It is a fairly large data set that I posted about once previously—6
years, 3 count surveys per year, 166 survey points established to
provide independent counts at any particular time. Maybe just the
size of it contributes somehow to the strange behavior. To apply
pcount as an exploratory tool, I arranged the data with each year’s 3
counts from a site on a row, and year was included as a site covariate
so that year’s “effect” on both abundance and detection (along with
other covariates) could be checked. I’ve also been applying more
appropriate “open population” models with R code kindly provided by
David Dail. That code can also do closed models and it uses the same
basic machinery as pcount (R’s optim function and method = “BFGS” as
the default—although I’ve tried other method options too with similar
results); but the input data are structured somewhat differently than
for unmarked and the models take much longer to run in my 32-bit
environment, so I have been interested in checking how results from
pcount with year as covariate compare to the open-population model
results for these data. It turns out that with Dail’s code I’m
getting the same disturbing effect of K on the model results as with
pcount.

The reason for my original post is that I was thinking that if I can’t
“solve” the K effect problem, then perhaps I could just set K at some
value for which the pcount parboot GOF test is passed and for which
the model condition number is OK (say K = 15), then use that model to
assess relative effects of covariates (including year) on abundance
even though the absolute estimate of abundance implied by the model
may not be particularly trustworthy. To do this convincingly I
suppose that I would need to do parallel analyses with different K’s
and show that the implied relative effects of year and other
covariates are the same regardless of K.

Many thanks for replying to my first post and for any further
comments.

Richard Chandler

Jan 12, 2011, 11:16:33 AM1/12/11
to unma...@googlegroups.com
What are your estimates of lambda, p, and alpha from the "null" negative binomial model? I might expect strange behavior if p is very low and alpha is very high. Notice that K doesn't have much influence on "good" data once K is high enough.




# Simulate an R-site x J-visit count matrix: N_i ~ Poisson(lambda) or
# NB(mu = lambda, size = disp), and y_ij | N_i ~ Binomial(N_i, p).
sim1 <- function(R=100, J=5, lambda=2, p=0.3, mix="P", disp=1) {
    N <- switch(mix,
        P = rpois(R, lambda),
        NB = rnbinom(R, mu=lambda, size=disp))
    y <- matrix(NA, R, J)
    for(i in 1:R) {
        y[i,] <- rbinom(J, N[i], p)
    }
    return(y)
}


> set.seed(7)
>
> umf1 <- unmarkedFramePCount(y=sim1(mix="NB", disp=0.7))
> m1 <- pcount(~1 ~1, umf1, K=10)
> coef(m1)
  lam(Int)     p(Int)
 0.4783041 -0.4713292
>
> m2 <- pcount(~1 ~1, umf1, K=100)
> coef(m2)
  lam(Int)     p(Int)
 0.4790952 -0.4725918
>
>
> m3 <- pcount(~1 ~1, umf1, K=10, mixture="NB")
> coef(m3)
    lam(Int)       p(Int) alpha(alpha)
   0.7168886   -0.8344119   -0.1940775
>
> m4 <- pcount(~1 ~1, umf1, K=100, mixture="NB")
> coef(m4)
    lam(Int)       p(Int) alpha(alpha)
   1.0292313   -1.2568697   -0.4097785
>
> m5 <- pcount(~1 ~1, umf1, K=500, mixture="NB")
> coef(m5)
    lam(Int)       p(Int) alpha(alpha)
   1.0292313   -1.2568697   -0.4097785
>



BTW, simulation tests like this and other quality control checks are part of the package. You can find them in the "R/library/unmarked/inst/unitTests" directory. I'll add this one now.


Richard

Andy Royle

Jan 12, 2011, 11:06:43 PM1/12/11
to unma...@googlegroups.com, unmarked

hi Richard,
 just a couple of comments here:
Sometimes the NB model is very unstable, when either the data set is relatively sparse or when extreme over-dispersion is indicated. We had some cases of this in the Kery et al. 2005 paper (Ecol. Apps). What happens, I think, is that excessive over-dispersion can look like really low encounter probability (among other things), and this induces a near-nonidentifiability problem -- and MLEs that occur near p = 0, which causes E[N] to be huge as well. Caution is in order whenever fitting the NB model.
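(Editorial aside: one small way to see why low p trades off against high abundance; the numbers here are made up. For a single visit, the marginal count distribution is exactly Poisson(lambda * p), so very different (lambda, p) pairs with the same product are indistinguishable; only the replication across visits separates them, and weakly so when p is small.)

```r
# Thinning a Poisson: if N ~ Poisson(lambda) and y | N ~ Binomial(N, p),
# then marginally y ~ Poisson(lambda * p) for a single visit.
lambda1 <- 5;  p1 <- 0.10   # illustrative values only
lambda2 <- 50; p2 <- 0.01   # very different pair, same product lambda * p = 0.5
dpois(0:5, lambda1 * p1)
dpois(0:5, lambda2 * p2)    # identical: one visit cannot tell these apart
```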
 
Richard Chandler wrote some functions in unmarked that fit the Dail-Madsen model, and I think he will provide them to you -- Richard's version has the advantage that it works with the unmarked data formats and summary functions and, who knows, maybe it's even faster!  (Maybe he already has that in the current release of unmarked.)
 
regards,
andy

 

Richard Chandler

Jan 13, 2011, 7:31:05 AM1/13/11
to unma...@googlegroups.com
Hopefully next week I will be able to release a new version of unmarked with the Dail-Madsen model.

Richard

rak

Jan 13, 2011, 8:55:52 AM1/13/11
to unmarked
Thanks again for the comments Richard and Andy.

Below are the MLEs from the null NB model. They seem to indicate that
the data are "good"; I just need to make K large enough.

K    Lam        p          alpha       AIC
10   0.4707985  -1.180340  -0.1900092  4401.524
25   1.0489732  -1.884234  -0.4423648  4368.535
50   1.5074432  -2.396683  -0.5305943  4364.168
100  1.7593279  -2.668474  -0.5606742  4363.755
200  1.7661927  -2.675797  -0.5617480  4363.755
500  1.7661927  -2.675797  -0.5617480  4363.755
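(Editorial aside: the long right tail Andy described can be checked directly from the K = 200 MLEs above, back-transformed from the log scale; a rough sketch assuming the reported alpha is log(size), as in unmarked.)

```r
# Upper-tail mass of the NB abundance distribution implied by the K = 200 MLEs
mu   <- exp(1.7661927)    # abundance intercept, back-transformed
size <- exp(-0.5617480)   # dispersion (alpha assumed to be log(size))
1 - pnbinom(25,  mu = mu, size = size)  # mass above K = 25: non-negligible
1 - pnbinom(100, mu = mu, size = size)  # mass above K = 100: tiny but nonzero
```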

As in your example, the MLEs from the null P model are immediately
stable:
K    Lam        p           AIC
10   0.1798548  -0.7741432  4584.089
25   0.1798724  -0.7741687  4584.088
etc.

However, with covariates even the Poisson model MLEs take a while to
settle down. I thought yesterday that I had a usable Poisson model
with covariates slightly pared down from the full set, but it failed
the parboot test miserably. My NB models with the full set of
covariates seem to behave just as Andy described, but I’m somewhat
hopeful now that with some paring of the covariates I may be able to
identify an NB model that settles to reasonable estimates with
sufficient K and that passes the parboot test.

Looking forward to the next unmarked release. All the best, RK

Richard Chandler

Jan 13, 2011, 9:21:03 AM1/13/11
to unma...@googlegroups.com
Thanks for providing the nice example. I see a couple of issues. First, the NB model indicates that detection probability is only 0.06:

> plogis(-2.67)
[1] 0.06476697

Is that reasonable?

Second, take a look at the probability mass functions for the NB and the P distributions:

> plot(dnbinom(0:20, mu=exp(1.766), size=exp(-0.5617)), type="h", ylab="density", xlab="N") # tail is much longer than shown
> plot(dpois(0:20, lambda=exp(0.18)), type="h", ylab="density", xlab="N")

I think caution is definitely advised with the NB model in this case, as Andy suggested. Another option would be to try a zero-inflated Poisson, but that isn't implemented in unmarked yet. Alternatively, if you think the expectation of the Poisson model is correct but the variance is underestimated, you could consider a non-parametric bootstrap:

newmodel <- nonparboot(yourPoissonModel, B=someHighNumber)
vcov(newmodel, method="nonparboot")

Or there are all sorts of post-hoc adjustments that people do.


Richard

Jeffrey Royle

Jan 13, 2011, 10:06:41 AM1/13/11
to unma...@googlegroups.com
Maybe we should change the default value of K for these models to be a
higher value and let the user decide otherwise.

Richard Chandler

Jan 13, 2011, 10:25:12 AM1/13/11
to unma...@googlegroups.com

rak

Jan 13, 2011, 11:14:04 AM1/13/11
to unmarked
You're right that 0.06 seems too low for detection p. I was hoping
that with covariates, NB might be better. Here's the model results
with some scaled covariates.

Abundance:
                 Estimate      SE       z   P(>|z|)
(Intercept)         0.714  0.1924    3.71  2.05e-04
scale(yr)           0.442  0.1491    2.96  3.06e-03
scale(Y)            0.622  0.1311    4.74  2.10e-06
scale(edgedist)     0.473  0.0449   10.54  5.90e-26

Detection:
                 Estimate      SE        z   P(>|z|)
(Intercept)      -1.96386  0.2203  -8.9136  4.94e-19
scale(yr)        -0.62892  0.1722  -3.6529  2.59e-04
scale(Y)         -0.18306  0.1474  -1.2422  2.14e-01
scale(lbrn)      -0.44686  0.0580  -7.7048  1.31e-14
scale(gbrn)      -0.14130  0.0536  -2.6340  8.44e-03
scale(da)        -0.08268  0.0402  -2.0592  3.95e-02
scale(st)        -0.52749  0.0512 -10.2930  7.57e-25
scale(rnyr)       0.00196  0.0604   0.0325  9.74e-01
scale(rn01)       0.08563  0.0482   1.7761  7.57e-02

Dispersion:
 Estimate     SE     z   P(>|z|)
    0.755  0.214  3.53  0.000413

AIC: 3969.257

and detection p is better but maybe still too low: plogis(-1.96386) =
0.1230499

The probability mass plot is interesting, though, because it just
about reaches zero beyond 10, the inferred maximum per site.

plot(dnbinom(0:20, mu=exp(0.714), size=exp(0.755)), type="h",
ylab="density", xlab="N")

I'll look into the nonparametric bootstrap next week, thanks for that
suggestion.

RK

Jenny Hofmeister

Oct 13, 2015, 7:54:18 PM10/13/15
to unmarked, richard...@myfwc.com
Hi, 

I know this is an older post, but I am having the exact same issues as RK except my lambda and p values won't stabilize no matter how big I set K. I have point count values of octopuses in 24 sites, each site visited 3 times. My occupancy is 1, but abundance does vary across visits within each site. I have tried with the null model, with covariates, Poisson, Negative Binomial, and Zero-Inflated Poisson and nothing has solved the issue. Both my p values and my lambda values increase with increasing K. I have tried up to K = 500 with no stabilization of parameters. Not to mention, it is ecologically impossible to have 500 octopuses in a 100 m^2 area. I don't know if it is because my number of sites is fairly low or there is something else I am missing. 

I am very new to unmarked and binomial mixture modeling, so any advice you can provide would be greatly appreciated! 

Thanks, 
Jenny

Jeffrey Royle

Oct 13, 2015, 8:03:27 PM10/13/15
to unma...@googlegroups.com
hi Jenny,

It is possible to have a basic identifiability problem in this class of models if there is too much heterogeneity within or among sites. I'm guessing this is the situation you have.
Some relevant discussion appears in this paper:

A good diagnostic of this problem is what you're observing: as you increase K you do not obtain stable MLEs. 
regards,
andy



Jenny Hofmeister

Oct 13, 2015, 8:03:44 PM10/13/15
to unmarked, richard...@myfwc.com
To clarify, my detection probability decreases with increasing K, and my lambda increases with increasing K, with no stabilization in either. Also, with increasing K, correlations between my covariates and the "true abundance" (using the ranef() function) gets stronger. 

Thanks, 
Jenny