Open model


steeves.b...@gmail.com

unread,
Aug 24, 2013, 2:35:49 PM8/24/13
to unma...@googlegroups.com

Hey guys,

I have a quick and simple question: I have 9 quadrats (16m x 16m) in which I have been monitoring gecko abundance and habitat selectivity four times per month for 12 months. Could I use the open model on a monthly basis? If not, any recommendations?

Any answers would be very much appreciated.

Thanks

Stvs

Jeffrey Royle

unread,
Aug 24, 2013, 3:24:14 PM8/24/13
to unma...@googlegroups.com
seems reasonable to me
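
[Editorial sketch, not from the thread: one way the monthly setup described above might look in unmarked. The count matrix `y` is hypothetical; pcountOpen() treats each month as a primary period, open between months and closed within.]

```r
# Hedged sketch: monthly primary periods in pcountOpen().
# Assumes a hypothetical 9 x 48 count matrix `y` (9 quadrats; columns ordered
# month1_visit1..visit4, month2_visit1..visit4, ..., month12_visit4).
library(unmarked)

umf <- unmarkedFramePCO(y = y, numPrimary = 12)

# Null dynamic model: constant initial abundance (lambda), recruitment (gamma),
# apparent survival (omega), and detection (p); the population is open between
# months but assumed closed across each month's four visits.
fm <- pcountOpen(lambdaformula = ~1, gammaformula = ~1,
                 omegaformula = ~1, pformula = ~1,
                 data = umf, K = 50)
summary(fm)
```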


--
You received this message because you are subscribed to the Google Groups "unmarked" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unmarked+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Alex Anderson

unread,
Aug 26, 2013, 8:30:57 AM8/26/13
to unma...@googlegroups.com
Hi All,
after some timely and much appreciated assistance from the group, I have had success modelling some rainforest bird abundances using pcount().  My data come from ~6-15 years of monitoring (depending on site), across a broad environmental gradient.  I am now working on introducing a temporal component into these models, and am unsure how to proceed.  My aim is to slice the data into two time-steps and look for change in population size between them.  A typical species' data frame looks like this:

> summary(spp.umf)
unmarkedFrame Object

692 sites
Maximum number of observations per site: 21
Mean number of observations per site: 3.72
Sites with at least one detection: 532

Tabulation of y observations:
    0     1     2     3     4     5     6     7     8     9  <NA>
  588   400   646   499   282   114    29     9     3     2 11960

Site-level covariates:
    point_ID_2       MATemp.V1            accu04.V1       annual_pptn12.V1        slope.V1     
 AU10A1  :  1   Min.   :-2.3154688   Min.   :-3.521833   Min.   :-1.656607   Min.   :-1.064214 
 AU10A151:  1   1st Qu.:-0.8061028   1st Qu.:-0.631898   1st Qu.:-0.616339   1st Qu.:-0.754615 
 AU10A2  :  1   Median :-0.0218144   Median : 0.122412   Median :-0.262914   Median :-0.255946 
 AU10A3  :  1   Mean   : 0.0000000   Mean   : 0.000000   Mean   : 0.000000   Mean   : 0.000000 
 AU10A4  :  1   3rd Qu.: 0.7314643   3rd Qu.: 0.719766   3rd Qu.: 0.479222   3rd Qu.: 0.490195 
 AU10A5  :  1   Max.   : 2.3763567   Max.   : 2.547358   Max.   : 5.127368   Max.   : 4.875478 
 (Other) :686                                                                                  
      aspect.V1          dem_250             MATemp2.V1   
 Min.   :-1.4213289   Min.   :   3.229   Min.   :0.000017 
 1st Qu.:-0.9828337   1st Qu.: 434.676   1st Qu.:0.128789 
 Median : 0.0189237   Median : 757.465   Median :0.597676 
 Mean   : 0.0000000   Mean   : 697.022   Mean   :0.998555 
 3rd Qu.: 0.7595556   3rd Qu.: 923.235   3rd Qu.:1.545021 
 Max.   : 1.7828212   Max.   :1524.078   Max.   :5.647071 
                                                          

Observation-level covariates:
      wet             wind       temp_anomaly.V1      month            start            season      time_step2 
 Min.   :0.000   Min.   :0.000   Min.   :-4.241   Min.   : 1.000   Min.   : 4.750   Min.   :1.000   T1  : 1397 
 1st Qu.:1.000   1st Qu.:0.000   1st Qu.:-0.577   1st Qu.: 3.000   1st Qu.: 6.500   1st Qu.:1.000   T2  : 1175 
 Median :1.000   Median :0.500   Median : 0.215   Median : 6.000   Median : 7.070   Median :2.000   NA's:11960 
 Mean   :0.826   Mean   :1.091   Mean   : 0.000   Mean   : 6.322   Mean   : 7.307   Mean   :1.583              
 3rd Qu.:1.000   3rd Qu.:2.000   3rd Qu.: 0.657   3rd Qu.:10.000   3rd Qu.: 7.670   3rd Qu.:2.000              
 Max.   :1.000   Max.   :5.000   Max.   : 2.775   Max.   :12.000   Max.   :18.250   Max.   :2.000              
 NA's   :11960   NA's   :11960   NA's   :11960    NA's   :11960    NA's   :11960    NA's   :11960 
 

I have found a previous thread with a reply where Jeffrey Royle suggests that:

"unmarked can fit multi-year models using either the pcount() function, where you could model year as a polynomial trend or in relation to fixed covariates, or you can use pcountOpen which fits a fully dynamic model."

I am currently looking into these two options.  My data include some sites that were visited in only one period or the other, which makes me think that pcountOpen will struggle to fit reasonable models, and that for simplicity it would be best to assume closure within these two periods.  That said, I have yet to understand how pcount() can handle a "time_step" covariate.  If I build a model with this in the obsCovs object, is that not assuming that time_step influences detectability, rather than producing some trend in abundance?  e.g.:

Call:
pcount(formula = ~wet + season + temp_anomaly + time_step2 ~
    MATemp + MATemp2 + slope, data = spp.umf, K = 100, mixture = "ZIP",
    starts = c(1, -1, 0, 0, 0, 0, 0, 0, 0, 0), engine = "C")

Abundance:
            Estimate     SE     z   P(>|z|)
(Intercept)   2.0871 0.0681 30.63 5.23e-206
MATemp        0.0345 0.0235  1.47  1.42e-01
MATemp2      -0.0248 0.0238 -1.04  2.97e-01
slope        -0.0518 0.0231 -2.24  2.52e-02

Detection:
             Estimate     SE       z  P(>|z|)
(Intercept)    -1.566 0.1122 -13.954 2.98e-44
wet             0.288 0.0571   5.044 4.55e-07
season          0.232 0.0462   5.030 4.91e-07
temp_anomaly    0.025 0.0245   1.017 3.09e-01
time_step2T2   -0.010 0.0428  -0.234 8.15e-01

Zero-inflation:
 Estimate    SE   z  P(>|z|)
    -1.62 0.125 -13 1.13e-38

AIC: 8278.007

In this case I am not sure I understand how to "model year as a polynomial trend or in relation to fixed covariates".  Any thoughts would be much appreciated.
Cheers
A


Jeffrey Royle

unread,
Aug 26, 2013, 10:29:21 AM8/26/13
to unma...@googlegroups.com
hi Alex,
 When I wrote that, I think I had in mind a model based on stacking data from multiple years on top of each other and basically treating "year x site" combinations as "effective sites". E.g., if you have 2 years of data on 4 sites, then treat that as 8 effective sites and structure your data like this:
 
 site1year1  obs1 obs2 obs3
 site2year1  obs1 obs2 obs3
 site3year1 ....
 site4year1 ....
 site1year2 ...
  ...
 ...
  site4year2  ....
 
 
Now you can use year as a site covariate and build a simple kind of open model in which N[site,year] is completely independent of N[site,year+1].
 
regards
andy
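
[Editorial sketch of the stacking step above, with hypothetical object names; assumes a 4-site, 3-visit, 2-year count array `y.arr`, matching Andy's example.]

```r
# Hedged sketch (not from the thread): stack a sites x visits x years count
# array into a (sites*years) x visits matrix so that year becomes a
# site-level covariate of the "effective sites".
library(unmarked)

n.sites <- dim(y.arr)[1]
n.years <- dim(y.arr)[3]

# rbind the per-year slices: rows 1-4 are year 1, rows 5-8 are year 2
y.stack <- do.call(rbind, lapply(1:n.years, function(t) y.arr[, , t]))

site.covs <- data.frame(
  site = factor(rep(1:n.sites, times = n.years)),
  year = factor(rep(1:n.years, each  = n.sites))
)

umf <- unmarkedFramePCount(y = y.stack, siteCovs = site.covs)

# Year as a site covariate: N[site, year] is independent of N[site, year+1]
fm <- pcount(~ 1 ~ year, data = umf, K = 100)
```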
 


Alex Anderson

unread,
Oct 10, 2013, 11:42:51 AM10/10/13
to unma...@googlegroups.com
Hi All,
Thanks so much for your replies Andy, Marc, Richard, and others.  I have since had some success constructing a simple kind of open model using your suggestion, Andy, of "stacking" time-steps and designating site + time-step combinations as unique "sites".  For example, below I plot abundance data for one species, showing model fits for the two time-steps (superimposed in yellow and orange, with their 95% CIs).


load(file = paste(outfolder,spp_folder,"spp.umf", sep = ""))

(mod2 <- pcount(~ wet + temp_anomaly ~ MATemp+MATemp2 + time_step3, spp.umf, starts = c(1,-1,0,0,0,0,0,0),
                K = 100,mixture = "ZIP", engine  = "C"))



I think I may have come up against a limitation, though, in that using time-step as a covariate in this model only allows changes in the scale of the fit, not its location.  For example, if I construct the same models for two separate unmarkedFrame objects, one for each time-step, there is a clear shift, between time-steps, in the location of the optimum value for this species:

load(file = paste(outfolder,spp_folder,"spp.umf1", sep = ""))

(mod2 <- pcount(~ wet + temp_anomaly ~ MATemp+MATemp2 , spp.umf, starts = c(1,-1,0,0,0,0,0),
                K = 100,mixture = "ZIP", engine  = "C"))

load(file = paste(outfolder,spp_folder,"spp.umf2", sep = ""))

(mod2 <- pcount(~ wet + temp_anomaly ~ MATemp+MATemp2 , spp.umf, starts = c(1,-1,0,0,0,0,0),
                K = 100,mixture = "ZIP", engine  = "C"))



As these types of shift are the main focus of my work, I am keen to streamline this process, and do so in the most robust and repeatable way.  Any thoughts on ways to make this contrast without resorting to two separate unmarkedFrame objects and model sets?   Is there a way to allow the location to also vary with a time covariate?
thanks in advance.
A

PS: My code for generating the predictions and confidence intervals is as follows (it requires the object newData2, which contains all the necessary covariates).  I used predictSE.zip() from the package "AICcmodavg", as suggested by Marc Kery.

library("AICcmodavg")

E.psi = predictSE.zip(mod2, newdata =newData2, se.fit = TRUE, parm.type = "lambda",
                      type = "response", print.matrix = FALSE)

E.psi = as.data.frame(E.psi)
E.psi$upper = E.psi$fit + (1.96 * E.psi$se.fit )
E.psi$lower = E.psi$fit - (1.96 * E.psi$se.fit )

Kery Marc

unread,
Oct 10, 2013, 11:48:48 AM10/10/13
to unma...@googlegroups.com
Hi Alex,

as far as I understand, doing something like the following should do the trick:

(mod2 <- pcount(~ wet + temp_anomaly ~ (MATemp+MATemp2) * time_step3, spp.umf, starts = c(1,-1,0,0,0,0,0,0,0,0),
                K = 100, mixture = "ZIP", engine = "C"))


It is useful to think about these models in exactly the same way as any other regression model: you have categorical explanatory variables (factors) and continuous explanatory variables, and you can combine them in a "main effects" manner or in an "interaction effects" manner. So far, for abundance, you have fitted a model with main effects only of Temp, Temp2 and Year (time_step3). Specifying the interaction in the R formula language means replacing the plus sign with a multiplication sign.

Kind regards  --  Marc
 
______________________________________________________________
 
Marc Kéry
Tel. ++41 41 462 97 93
marc...@vogelwarte.ch
www.vogelwarte.ch
 
Swiss Ornithological Institute | Seerose 1 | CH-6204 Sempach | Switzerland
______________________________________________________________
 
*** Introduction to Bayesian statistical modeling: Kéry (2010), Introduction to WinBUGS for Ecologists, Academic Press; see www.mbr-pwrc.usgs.gov/pubanalysis/kerybook 
*** Book on Bayesian statistical modeling: Kéry & Schaub (2012), Bayesian Population Analysis using WinBUGS, Academic Press; see
www.vogelwarte.ch/bpa
*** Upcoming workshops: www.phidot.org/forum/viewforum.php?f=8


From: unma...@googlegroups.com [unma...@googlegroups.com] on behalf of Alex Anderson [alejandro....@gmail.com]
Sent: 10 October 2013 17:42
To: unma...@googlegroups.com
Subject: Re: [unmarked] N-mixture models and temporal trends in abundance: pcount versus pcountOpen

Alex Anderson

unread,
Oct 10, 2013, 12:21:59 PM10/10/13
to unma...@googlegroups.com
Hi Marc,
Thanks very much for your quick reply.  That is exactly what I am after.  This may be a simple question, but I am not sure how to make predictSE.zip() in AICcmodavg recognise the interaction term in my new model.  Thus far it is rejecting my newdata object...

  > E.psi2 = predictSE.zip(mod2, newdata =newData2, se.fit = TRUE, parm.type = "lambda",
+                        type = "state", print.matrix = FALSE)
Error in `[.data.frame`(site.cov.data, , var.names) :
  undefined columns selected

The predict() function in unmarked has no trouble, though.  Previously I got around a similar recognition problem in predictSE.zip(), with the quadratic term for "MATemp", by creating a new vector of this variable squared.  Is there a similar solution here?
best regards
A

Kery Marc

unread,
Oct 10, 2013, 1:30:06 PM10/10/13
to unma...@googlegroups.com
Hi Alex,

glad this helped.

Re. the model averaging business: unfortunately, I don't know.

Kind regards  --  Marc



Marc J. Mazerolle

unread,
Oct 10, 2013, 1:36:11 PM10/10/13
to unma...@googlegroups.com, alejandro....@gmail.com
Hi Alex,

For the time being, predictSE.zip() does not handle interaction terms
such as those you specified below. The workaround consists of creating
your interaction terms "manually" and then adding them to the model.
For instance, you would add additional columns to your original data
set to code for the interactions. Keep in mind that to code for an
interaction term, you need to compute the product of the two variables
involved in the interaction.

In your case, because time_step3 is a factor, you would first convert it to a
0/1 dummy variable and then compute the products:

> ts3 <- as.numeric(time_step3) - 1   # 0 for the first level, 1 for the second
> MATemp.time_step3 <- MATemp * ts3
> MATemp2.time_step3 <- MATemp2 * ts3

This translates into your model formula as:

> pcount(~ wet + temp_anomaly ~ MATemp + MATemp2 + time_step3 +
    MATemp.time_step3 + MATemp2.time_step3, spp.umf, starts =
    c(1,-1,0,0,0,0,0,0,0,0), K = 100, mixture = "ZIP", engine = "C")


Best,

Marc
--
____________________________________
Marc J. Mazerolle
Centre d'étude de la forêt
Université du Québec en Abitibi-Témiscamingue
445 boulevard de l'Université
Rouyn-Noranda, Québec J9X 5E4, Canada
Tel: (819) 762-0971 ext. 2458
Email: marc.ma...@uqat.ca



Kery Marc

unread,
Oct 10, 2013, 2:03:46 PM10/10/13
to unma...@googlegroups.com, alejandro....@gmail.com
Hi Marc and hello all,

thanks for helping.

One thing I would like to point out is the usefulness of the model.matrix function in R. If you know how to write a model in the linear model definition language in R (as you do in lm() or glm() and, to a lesser extent, in lmer() etc.), you can use the model.matrix function to show you exactly what the linear predictor of that model looks like, i.e., which single-degree-of-freedom terms appear in the model.

In the case of Alejandro, if you write the following inside of an R workspace which contains the variables MATemp and MATemp2 and the factor time_step3

model.matrix(~ (MATemp + MATemp2) * time_step3)

then, you will see exactly what terms you have to fit if you want to specify that linear model "by hand", as you have to for doing model averaging as pointed out by Marc.
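
[A tiny reproducible demonstration with toy data, not from the thread:]

```r
# Toy data: two continuous covariates and a two-level factor
MATemp     <- c(-1, 0, 1, 2)
MATemp2    <- MATemp^2
time_step3 <- factor(c("T1", "T1", "T2", "T2"))

# The design matrix spells out every single-degree-of-freedom term
colnames(model.matrix(~ (MATemp + MATemp2) * time_step3))
# "(Intercept)" "MATemp" "MATemp2" "time_step3T2"
# "MATemp:time_step3T2" "MATemp2:time_step3T2"
```

The two interaction columns at the end are exactly the products that have to be supplied "by hand" for predictSE.zip().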

Kind regards - Marc



PS: Sorry for the confusing number of Marcs here....


Alex Anderson

unread,
Oct 23, 2013, 7:01:09 AM10/23/13
to unma...@googlegroups.com
Hi All,
With much help from the group I have been able to model abundance of about 70 species of rainforest bird using pcount() in unmarked.  I have used the method suggested by Andy Royle of creating "effective sites", rather than using pcountOpen, though testing with pcountOpen yields similar results.  Using parboot(), model fits are generally good, with a few exceptions.  I am finding, though, that for many species abundance estimates can be very high, and increasingly so as K increases (sometimes well beyond 80 individuals for species with a maximum survey count of 10 individuals).  Using ZIP rather than NB as the abundance mixture can help in some cases, but I have also tried using lower K values as a sort of "informed prior", as suggested (but also advised against!) in previous threads on this topic.

"choosing a value of N that truncates the likelihood is effectively
like using an informative prior distribution which seems to be
generally frowned about in the literature.  You would have to argue
that N=10 makes sense (or whatever), and someone else might think N=8
makes sense, and you find that everyone has their own answer!"
(from Andy Royle)

The high estimates may be the result of many species having lots of zero counts and a few large counts when larger groups were detected (over-dispersion, and possibly a violation of pcount's closure assumption?).  Using values of about N(max) + 20 reduces my estimates to plausible values for most species (though I have not yet checked the resulting fits with parboot()).  One reason I am so keen to derive plausible absolute values, and not just relative estimates, is that I plan to use them in conjunction with distance-sampling-based estimates of effective strip width, calculated in unmarked using the subset of surveys in which distance measurements were taken.  My question is thus two-fold:

1: Is it valid to use reduced K values to truncate the tails of the assumed error distributions and bring my estimated N values into a plausible range?

2:  Given the above, is it then valid to use these estimates, multiplied by independently derived effective strip widths, to estimate density, or is this "mixing" two aspects of detectability process in a way that is not valid?

Thanks for all help so far, and any thoughts or suggestions would be much appreciated.

Alex
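
[Editorial sketch, reusing the model and object names from earlier posts: question 1 can at least be probed empirically by refitting the same model over a grid of K values and checking whether the estimates and log-likelihood have stabilised.]

```r
# Hedged sketch: sensitivity of the fit to the choice of K. If K is large
# enough, the estimates and log-likelihood should barely change as K grows;
# continued drift indicates the likelihood is being truncated.
library(unmarked)

K.vals <- c(30, 50, 100, 150)
fits <- lapply(K.vals, function(k)
  pcount(~ wet + temp_anomaly ~ MATemp + MATemp2 + time_step3,
         data = spp.umf, K = k, mixture = "ZIP", engine = "C"))

# Abundance intercept (natural scale) and log-likelihood for each K
data.frame(K          = K.vals,
           lambda.int = sapply(fits, function(f) exp(coef(f, type = "state")[1])),
           logLik     = sapply(fits, function(f) as.numeric(logLik(f))))
```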

#################################################

Richard Chandler

unread,
Oct 24, 2013, 9:20:03 PM10/24/13
to Unmarked package
Hi Alex,

If abundance estimates don't stabilize as K increases, it suggests that detection probability is very low. There aren't any methods (occupancy, capture-recapture, etc.) that work very well in this situation. Your best bet is to find "good" covariates of detection and stick to a simple abundance model, such as the Poisson. If this doesn't help, you may need to forget about estimating detection probability and just model the counts as an index of abundance. Or you could go the Bayesian route and find some prior information about the parameters.

Richard
Richard Chandler
Assistant Professor
Warnell School of Forestry and Natural Resources
University of Georgia

Alex W.

unread,
Jun 12, 2015, 11:08:23 AM6/12/15
to unma...@googlegroups.com
Hi all,

I have a dataset stretching over 14 sampling years (22 calendar years), with relatively continual sampling throughout each year.  I originally tried to set this up as a robust design in pcountOpen, but was coming up with estimates of detection ~0.05 or less, regardless of how I divided the seasons into secondary periods (I've tried lumping the data into just 3 surveys and dividing it into as many as 21 surveys within a season).  I ran into the issue mentioned by Richard below of abundance estimates increasing with K, regardless of mixture, starting values, modeling detection on obsCovs, etc., which I assume is because of the low detection.  I have "stacked" the data into site-years, as Andy suggested, and now get detection estimates 0.2 < p < 0.25 using the Poisson mixture.  The NB mixture has much better AIC scores (~2000 AIC lower for NB vs Poisson models), but the NB models still have p ~0.07 vs p ~0.2 for Poisson models (and higher abundance estimates, ~29 for NB vs ~9 for Poisson), whether using a null model or modeling detection on obsCovs.

My main questions: 1) Does stacking the data into site-years violate the assumption of independent sites?  2) How does stacking the sites change the interpretation of results?

If anyone has any other suggestions for dealing with my data, I'm all ears!  I've included a UMF summary below for reference.

Many thanks,
Alex Wolf

> summary(SCLA.umf)
unmarkedFrame Object

1512 sites
Maximum number of observations per site: 3 
Mean number of observations per site: 3 
Sites with at least one detection: 1420 

Tabulation of y observations:
   0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   33 <NA> 
1295 1022  765  481  310  229  132   97   67   37   31   25   14   11    5    6    2    3    1    2    1    0 

Site-level covariates:
 Block         Year        ELT           Landform        Slope             NEness             Aspect    SoilRest  SoilDrain  
 b1:504   y1992  :108   elt17:756   Backslope:1106   Min.   :-2.2156   Min.   :-1.39050   W      :266   No :812   mwse :588  
 b2:504   y1993  :108   elt18:756   Bench    : 182   1st Qu.:-0.5791   1st Qu.:-0.94633   NE     :224   Yes:700   mww  : 28  
 b3:504   y1994  :108               Footslope:  14   Median : 0.1730   Median :-0.09327   E      :210             se   :224  
          y1995  :108               Shoulder : 196   Mean   : 0.0000   Mean   : 0.00000   NW     :196             w    :182  
          y1998  :108               Summit   :  14   3rd Qu.: 0.7294   3rd Qu.: 1.04722   SE     :196             wmwse: 14  
          y1999  :108                                Max.   : 2.3273   Max.   : 1.38962   SW     :182             wse  :476  
          (Other):864                                                                     (Other):238                        
      Flow              PondD             StreamD                          Harvest          TSH                                 HrvstTime   
 Min.   :-0.39176   Min.   :-0.67204   Min.   :-1.70894   Clearcut             :  45   Min.   : 0.00   Leave.                        :1102  
 1st Qu.:-0.34521   1st Qu.:-0.55085   1st Qu.:-0.88223   Group opening        :  37   1st Qu.: 0.00   Single-tree selection.recent  : 123  
 Median :-0.20558   Median :-0.36704   Median :-0.03105   Intermediate thin    : 118   Median : 0.00   Single-tree selection.longterm:  87  
 Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.00000   Leave                :1102   Mean   : 1.95   Intermediate thin.recent      :  67  
 3rd Qu.:-0.06594   3rd Qu.:-0.00316   3rd Qu.: 0.75768   Single-tree selection: 210   3rd Qu.: 1.00   Intermediate thin.longterm    :  51  
 Max.   : 9.05703   Max.   : 2.95899   Max.   : 2.37165                                Max.   :17.00   Clearcut.recent               :  27  
                                                                                                       (Other)                       :  55  
    TimeClass   
 longterm: 177  
 none    :1102  
 recent  : 233 

mike worland

unread,
Jun 22, 2015, 9:30:15 PM6/22/15
to unma...@googlegroups.com
Alex,
 
I stacked my data as well and wondered about these same issues.  I think this post might help you as it did me:
 
 
Best,
 
Mike Worland
 
 

Alex W.

unread,
Jun 23, 2015, 10:13:29 PM6/23/15
to unma...@googlegroups.com
Mike,

Thanks so much for your response!  That thread is really helpful.  I tried searching for past posts on the forum but clearly missed some good ones.

Thanks again!

-Alex