Open model


steeves.b...@gmail.com

unread,
Aug 24, 2013, 2:35:49 PM8/24/13
to unma...@googlegroups.com

Hey guys,

I have a quick and simple question: I have 9 quadrats (16m x 16m) in which I have been monitoring gecko abundance and habitat selectivity four times per month for 12 months. Could I use the open model on a monthly basis? If not, any recommendations?

Any answers would be very much appreciated.

Thanks

Stvs

Jeffrey Royle

unread,
Aug 24, 2013, 3:24:14 PM8/24/13
to unma...@googlegroups.com
seems reasonable to me
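
[Editorial sketch, not from the thread: one way the monthly setup described above might look in unmarked. The count matrix `y` is hypothetical; pcountOpen() treats each month as a primary period, open between months and closed within.]

```r
# Hedged sketch: monthly primary periods in pcountOpen().
# Assumes a hypothetical 9 x 48 count matrix `y` (9 quadrats; columns ordered
# month1_visit1..visit4, month2_visit1..visit4, ..., month12_visit4).
library(unmarked)

umf <- unmarkedFramePCO(y = y, numPrimary = 12)

# Null dynamic model: constant initial abundance (lambda), recruitment (gamma),
# apparent survival (omega), and detection (p); the population is open between
# months but assumed closed across each month's four visits.
fm <- pcountOpen(lambdaformula = ~1, gammaformula = ~1,
                 omegaformula = ~1, pformula = ~1,
                 data = umf, K = 50)
summary(fm)
```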


--
You received this message because you are subscribed to the Google Groups "unmarked" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unmarked+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Alex Anderson

unread,
Aug 26, 2013, 8:30:57 AM8/26/13
to unma...@googlegroups.com
Hi All,
after some timely and much appreciated assistance from the group, I have had success modelling some rainforest bird abundances using pcount().  My data come from ~6-15 years of monitoring (depending on site), across a broad environmental gradient.  I am now working on introducing a temporal component into these models, and am unsure how to proceed.  My aim is to slice the data into two time-steps and look for change in population size between them.  A typical species' data frame looks like this:

> summary(spp.umf)
unmarkedFrame Object

692 sites
Maximum number of observations per site: 21
Mean number of observations per site: 3.72
Sites with at least one detection: 532

Tabulation of y observations:
    0     1     2     3     4     5     6     7     8     9  <NA>
  588   400   646   499   282   114    29     9     3     2 11960

Site-level covariates:
    point_ID_2       MATemp.V1            accu04.V1       annual_pptn12.V1        slope.V1     
 AU10A1  :  1   Min.   :-2.3154688   Min.   :-3.521833   Min.   :-1.656607   Min.   :-1.064214 
 AU10A151:  1   1st Qu.:-0.8061028   1st Qu.:-0.631898   1st Qu.:-0.616339   1st Qu.:-0.754615 
 AU10A2  :  1   Median :-0.0218144   Median : 0.122412   Median :-0.262914   Median :-0.255946 
 AU10A3  :  1   Mean   : 0.0000000   Mean   : 0.000000   Mean   : 0.000000   Mean   : 0.000000 
 AU10A4  :  1   3rd Qu.: 0.7314643   3rd Qu.: 0.719766   3rd Qu.: 0.479222   3rd Qu.: 0.490195 
 AU10A5  :  1   Max.   : 2.3763567   Max.   : 2.547358   Max.   : 5.127368   Max.   : 4.875478 
 (Other) :686                                                                                  
      aspect.V1          dem_250             MATemp2.V1   
 Min.   :-1.4213289   Min.   :   3.229   Min.   :0.000017 
 1st Qu.:-0.9828337   1st Qu.: 434.676   1st Qu.:0.128789 
 Median : 0.0189237   Median : 757.465   Median :0.597676 
 Mean   : 0.0000000   Mean   : 697.022   Mean   :0.998555 
 3rd Qu.: 0.7595556   3rd Qu.: 923.235   3rd Qu.:1.545021 
 Max.   : 1.7828212   Max.   :1524.078   Max.   :5.647071 
                                                          

Observation-level covariates:
      wet             wind       temp_anomaly.V1      month            start            season      time_step2 
 Min.   :0.000   Min.   :0.000   Min.   :-4.241   Min.   : 1.000   Min.   : 4.750   Min.   :1.000   T1  : 1397 
 1st Qu.:1.000   1st Qu.:0.000   1st Qu.:-0.577   1st Qu.: 3.000   1st Qu.: 6.500   1st Qu.:1.000   T2  : 1175 
 Median :1.000   Median :0.500   Median : 0.215   Median : 6.000   Median : 7.070   Median :2.000   NA's:11960 
 Mean   :0.826   Mean   :1.091   Mean   : 0.000   Mean   : 6.322   Mean   : 7.307   Mean   :1.583              
 3rd Qu.:1.000   3rd Qu.:2.000   3rd Qu.: 0.657   3rd Qu.:10.000   3rd Qu.: 7.670   3rd Qu.:2.000              
 Max.   :1.000   Max.   :5.000   Max.   : 2.775   Max.   :12.000   Max.   :18.250   Max.   :2.000              
 NA's   :11960   NA's   :11960   NA's   :11960    NA's   :11960    NA's   :11960    NA's   :11960 
 

I have found a previous thread with a reply where Jeffrey Royle suggests that:

"unmarked can fit multi-year models using either the pcount() function, where you could model year as a polynomial trend or in relation to fixed covariates, or you can use pcountOpen which fits a fully dynamic model."

I am currently looking into these two options.  My data include some sites that were visited in only one period or the other, which makes me think that pcountOpen will struggle to fit reasonable models, and that for simplicity it would be best to assume closure within these two periods.  That said, I have yet to understand how pcount() can handle a "time_step" covariate.  If I build a model with this in the obsCovs object, is that not assuming that time_step influences detectability, rather than producing some trend in abundance?  e.g.:

Call:
pcount(formula = ~wet + season + temp_anomaly + time_step2 ~
    MATemp + MATemp2 + slope, data = spp.umf, K = 100, mixture = "ZIP",
    starts = c(1, -1, 0, 0, 0, 0, 0, 0, 0, 0), engine = "C")

Abundance:
            Estimate     SE     z   P(>|z|)
(Intercept)   2.0871 0.0681 30.63 5.23e-206
MATemp        0.0345 0.0235  1.47  1.42e-01
MATemp2      -0.0248 0.0238 -1.04  2.97e-01
slope        -0.0518 0.0231 -2.24  2.52e-02

Detection:
             Estimate     SE       z  P(>|z|)
(Intercept)    -1.566 0.1122 -13.954 2.98e-44
wet             0.288 0.0571   5.044 4.55e-07
season          0.232 0.0462   5.030 4.91e-07
temp_anomaly    0.025 0.0245   1.017 3.09e-01
time_step2T2   -0.010 0.0428  -0.234 8.15e-01

Zero-inflation:
 Estimate    SE   z  P(>|z|)
    -1.62 0.125 -13 1.13e-38

AIC: 8278.007

In this case I am not sure I understand how to "model year as a polynomial trend or in relation to fixed covariates".  Any thoughts would be much appreciated.
Cheers
A


Jeffrey Royle

unread,
Aug 26, 2013, 10:29:21 AM8/26/13
to unma...@googlegroups.com
hi Alex,
 When I wrote that, I think I had in mind a model based on stacking data from multiple years on top of each other and basically treating "year x site" combinations as "effective sites". E.g., if you have 2 years of data on 4 sites, then treat that as 8 effective sites and structure your data like this:
 
 site1year1  obs1 obs2 obs3
 site2year1  obs1 obs2 obs3
 site3year1 ....
 site4year1 ....
 site1year2 ...
  ...
 ...
  site4year2  ....
 
 
Now you can use year as a site covariate and build a simple kind of open model in which N[site,year] is completely independent of N[site,year+1].
 
regards
andy
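
[Editorial sketch of the stacking step above, with hypothetical object names; assumes a 4-site, 3-visit, 2-year count array `y.arr`, matching Andy's example.]

```r
# Hedged sketch (not from the thread): stack a sites x visits x years count
# array into a (sites*years) x visits matrix so that year becomes a
# site-level covariate of the "effective sites".
library(unmarked)

n.sites <- dim(y.arr)[1]
n.years <- dim(y.arr)[3]

# rbind the per-year slices: rows 1-4 are year 1, rows 5-8 are year 2
y.stack <- do.call(rbind, lapply(1:n.years, function(t) y.arr[, , t]))

site.covs <- data.frame(
  site = factor(rep(1:n.sites, times = n.years)),
  year = factor(rep(1:n.years, each  = n.sites))
)

umf <- unmarkedFramePCount(y = y.stack, siteCovs = site.covs)

# Year as a site covariate: N[site, year] is independent of N[site, year+1]
fm <- pcount(~ 1 ~ year, data = umf, K = 100)
```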
 


Alex Anderson

unread,
Oct 10, 2013, 11:42:51 AM10/10/13
to unma...@googlegroups.com
Hi All,
Thanks so much for your replies Andy, Marc, Richard, and others.  I have since had some success constructing a simple kind of open model using your suggestion, Andy, of "stacking" time-steps and designating site + time-step combinations as unique "sites".  For example, below I plot abundance data for one species, showing model fits for the two time-steps (superimposed in yellow and orange, with their 95% CIs).


load(file = paste(outfolder,spp_folder,"spp.umf", sep = ""))

(mod2 <- pcount(~ wet + temp_anomaly ~ MATemp+MATemp2 + time_step3, spp.umf, starts = c(1,-1,0,0,0,0,0,0),
                K = 100,mixture = "ZIP", engine  = "C"))



I think I may have come up against a limitation, though, in that using time-step as a covariate in this model only allows changes in the scale of the fit, not its location.  For example, if I construct the same models for two separate unmarkedFrame objects, one for each time-step, there is a clear shift, between time-steps, in the location of the optimum value for this species:

load(file = paste(outfolder,spp_folder,"spp.umf1", sep = ""))

(mod2 <- pcount(~ wet + temp_anomaly ~ MATemp+MATemp2 , spp.umf, starts = c(1,-1,0,0,0,0,0),
                K = 100,mixture = "ZIP", engine  = "C"))

load(file = paste(outfolder,spp_folder,"spp.umf2", sep = ""))

(mod2 <- pcount(~ wet + temp_anomaly ~ MATemp+MATemp2 , spp.umf, starts = c(1,-1,0,0,0,0,0),
                K = 100,mixture = "ZIP", engine  = "C"))



As these types of shift are the main focus of my work, I am keen to streamline this process, and do so in the most robust and repeatable way.  Any thoughts on ways to make this contrast without resorting to two separate unmarkedFrame objects and model sets?   Is there a way to allow the location to also vary with a time covariate?
thanks in advance.
A

PS: My code for generating the predictions and confidence intervals is as follows (it requires the object newData2, which contains all the necessary covariates).  I used predictSE.zip() from the package "AICcmodavg", as suggested by Marc Kery.

library("AICcmodavg")

E.psi = predictSE.zip(mod2, newdata =newData2, se.fit = TRUE, parm.type = "lambda",
                      type = "response", print.matrix = FALSE)

E.psi = as.data.frame(E.psi)
E.psi$upper = E.psi$fit + (1.96 * E.psi$se.fit )
E.psi$lower = E.psi$fit - (1.96 * E.psi$se.fit )

Kery Marc

unread,
Oct 10, 2013, 11:48:48 AM10/10/13
to unma...@googlegroups.com
Hi Alex,

as far as I understand, doing something like the following should do the trick:

(mod2 <- pcount(~ wet + temp_anomaly ~ (MATemp+MATemp2) * time_step3, spp.umf, starts = c(1,-1,0,0,0,0,0,0,0,0),
                K = 100, mixture = "ZIP", engine = "C"))


It is useful to think about these models in exactly the same way as any other regression model: you have categorical explanatory variables (factors) and continuous explanatory variables, and you can combine them in a "main effects" manner or in an "interaction effects" manner. So far, for abundance, you have fitted a model with main effects only of Temp, Temp2 and Year (time_step3). Specifying the interaction in the R formula language means replacing the plus sign with a multiplication sign.

Kind regards  --  Marc
 
______________________________________________________________
 
Marc Kéry
Tel. ++41 41 462 97 93
marc...@vogelwarte.ch
www.vogelwarte.ch
 
Swiss Ornithological Institute | Seerose 1 | CH-6204 Sempach | Switzerland
______________________________________________________________
 
*** Introduction to Bayesian statistical modeling: Kéry (2010), Introduction to WinBUGS for Ecologists, Academic Press; see www.mbr-pwrc.usgs.gov/pubanalysis/kerybook 
*** Book on Bayesian statistical modeling: Kéry & Schaub (2012), Bayesian Population Analysis using WinBUGS, Academic Press; see
www.vogelwarte.ch/bpa
*** Upcoming workshops: www.phidot.org/forum/viewforum.php?f=8


From: unma...@googlegroups.com [unma...@googlegroups.com] on behalf of Alex Anderson [alejandro....@gmail.com]
Sent: 10 October 2013 17:42
To: unma...@googlegroups.com
Subject: Re: [unmarked] N-mixture models and temporal trends in abundance: pcount versus pcountOpen

Alex Anderson

unread,
Oct 10, 2013, 12:21:59 PM10/10/13
to unma...@googlegroups.com
Hi Marc,
Thanks very much for your quick reply.  That is exactly what I am after.  This may be a simple question, but I am not sure how to make predictSE.zip() in AICcmodavg recognise the interaction term in my new model.  Thus far it is rejecting my newdata object...

  > E.psi2 = predictSE.zip(mod2, newdata =newData2, se.fit = TRUE, parm.type = "lambda",
+                        type = "state", print.matrix = FALSE)
Error in `[.data.frame`(site.cov.data, , var.names) :
  undefined columns selected

The predict() function in unmarked has no trouble, though.  Previously I got around a similar recognition problem in predictSE.zip(), with the quadratic term for "MATemp", by creating a new vector of this variable squared.  Is there a similar solution here?
best regards
A

Kery Marc

unread,
Oct 10, 2013, 1:30:06 PM10/10/13
to unma...@googlegroups.com
Hi Alex,

glad this helped.

Re. the model averaging business: unfortunately, I don't know.

Kind regards  --  Marc



Marc J. Mazerolle

unread,
Oct 10, 2013, 1:36:11 PM10/10/13
to unma...@googlegroups.com, alejandro....@gmail.com
Hi Alex,

For the time being, predictSE.zip() does not handle interaction terms
such as those you specified below. The workaround consists of creating
your interaction terms "manually" and then adding them to the model.
For instance, you would add additional columns to your original data
set to code for the interactions. Keep in mind that to code for an
interaction term, you need to compute the product of the two variables
involved in the interaction.

In your case, because time_step3 is a factor, you would first convert it to a
0/1 dummy variable and then compute the products:

> ts3 <- as.numeric(time_step3) - 1   # 0 for the first level, 1 for the second
> MATemp.time_step3 <- MATemp * ts3
> MATemp2.time_step3 <- MATemp2 * ts3

This translates into your model formula as:

> pcount(~ wet + temp_anomaly ~ MATemp + MATemp2 + time_step3 +
    MATemp.time_step3 + MATemp2.time_step3, spp.umf, starts =
    c(1,-1,0,0,0,0,0,0,0,0), K = 100, mixture = "ZIP", engine = "C")


Best,

Marc
--
____________________________________
Marc J. Mazerolle
Centre d'étude de la forêt
Université du Québec en Abitibi-Témiscamingue
445 boulevard de l'Université
Rouyn-Noranda, Québec J9X 5E4, Canada
Tel: (819) 762-0971 ext. 2458
Email: marc.ma...@uqat.ca



Kery Marc

unread,
Oct 10, 2013, 2:03:46 PM10/10/13
to unma...@googlegroups.com, alejandro....@gmail.com
Hi Marc and hello all,

thanks for helping.

One thing I would like to point out is the usefulness of the model.matrix function in R. If you know how to write a model in the linear model definition language in R (as you do in lm() or glm() and, to a lesser extent, in lmer() etc.), you can use the model.matrix function to show you exactly what the linear predictor of that model looks like, i.e., which single-degree-of-freedom terms appear in the model.

In the case of Alejandro, if you write the following inside of an R workspace which contains the variables MATemp and MATemp2 and the factor time_step3

model.matrix(~ (MATemp + MATemp2) * time_step3)

then, you will see exactly what terms you have to fit if you want to specify that linear model "by hand", as you have to for doing model averaging as pointed out by Marc.
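
[A tiny reproducible demonstration with toy data, not from the thread:]

```r
# Toy data: two continuous covariates and a two-level factor
MATemp     <- c(-1, 0, 1, 2)
MATemp2    <- MATemp^2
time_step3 <- factor(c("T1", "T1", "T2", "T2"))

# The design matrix spells out every single-degree-of-freedom term
colnames(model.matrix(~ (MATemp + MATemp2) * time_step3))
# "(Intercept)" "MATemp" "MATemp2" "time_step3T2"
# "MATemp:time_step3T2" "MATemp2:time_step3T2"
```

The two interaction columns at the end are exactly the products that have to be supplied "by hand" for predictSE.zip().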

Kind regards - Marc



PS: Sorry for the confusing number of Marcs here....


Alex Anderson

unread,
Oct 23, 2013, 7:01:09 AM10/23/13
to unma...@googlegroups.com
Hi All,
With much help from the group I have been able to model abundance of about 70 species of rainforest bird using pcount() in unmarked.  I have used the method suggested by Andy Royle of creating "effective sites", rather than using pcountOpen, though testing with pcountOpen yields similar results.  Using parboot(), model fits are generally good, with a few exceptions.  I am finding, though, that for many species abundance estimates can be very high, and increasingly so as K increases (sometimes well beyond 80 individuals for species with a maximum survey count of 10 individuals).  Using ZIP rather than NB as the abundance mixture can help in some cases, but I have also tried using lower K values as a sort of "informed prior", as suggested (but also advised against!) in previous threads on this topic.

"choosing a value of N that truncates the likelihood is effectively
like using an informative prior distribution which seems to be
generally frowned about in the literature.  You would have to argue
that N=10 makes sense (or whatever), and someone else might think N=8
makes sense, and you find that everyone has their own answer!"
(from Andy Royle)

The high estimates may be the result of many species having lots of zero counts and a few large counts when larger groups were detected (over-dispersion, and possibly a violation of pcount's closure assumption?).  Using values of about N(max) + 20 reduces my estimates to plausible values for most species (though I have not yet checked the resulting fits with parboot()).  One reason I am so keen to derive plausible absolute values, and not just relative estimates, is that I plan to use them in conjunction with distance-sampling-based estimates of effective strip width, calculated in unmarked using the subset of surveys in which distance measurements were taken.  My question is thus two-fold:

1: Is it valid to use reduced K values to truncate the tails of the assumed error distributions and bring my estimated N values into a plausible range?

2:  Given the above, is it then valid to use these estimates, multiplied by independently derived effective strip widths, to estimate density, or is this "mixing" two aspects of detectability process in a way that is not valid?

Thanks for all help so far, and any thoughts or suggestions would be much appreciated.

Alex
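
[Editorial sketch, reusing the model and object names from earlier posts: question 1 can at least be probed empirically by refitting the same model over a grid of K values and checking whether the estimates and log-likelihood have stabilised.]

```r
# Hedged sketch: sensitivity of the fit to the choice of K. If K is large
# enough, the estimates and log-likelihood should barely change as K grows;
# continued drift indicates the likelihood is being truncated.
library(unmarked)

K.vals <- c(30, 50, 100, 150)
fits <- lapply(K.vals, function(k)
  pcount(~ wet + temp_anomaly ~ MATemp + MATemp2 + time_step3,
         data = spp.umf, K = k, mixture = "ZIP", engine = "C"))

# Abundance intercept (natural scale) and log-likelihood for each K
data.frame(K          = K.vals,
           lambda.int = sapply(fits, function(f) exp(coef(f, type = "state")[1])),
           logLik     = sapply(fits, function(f) as.numeric(logLik(f))))
```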

#################################################

Richard Chandler

unread,
Oct 24, 2013, 9:20:03 PM10/24/13
to Unmarked package
Hi Alex,

If abundance estimates don't stabilize as K increases, it suggests that detection probability is very low. There aren't any methods (occupancy, capture-recapture, etc.) that work very well in this situation. Your best bet is to find "good" covariates of detection and stick to a simple abundance model, such as the Poisson. If this doesn't help, you may need to forget about estimating detection probability and just model the counts as an index of abundance. Or you could go the Bayesian route and find some prior information about the parameters.

Richard
Richard Chandler
Assistant Professor
Warnell School of Forestry and Natural Resources
University of Georgia

Alex W.

unread,
Jun 12, 2015, 11:08:23 AM6/12/15
to unma...@googlegroups.com
Hi all,

I have a dataset stretching over 14 sampling years (22 calendar years), with relatively continual sampling throughout each year.  I originally tried to set this up as a robust design in pcountOpen, but was coming up with estimates of detection ~0.05 or less, regardless of how I divided the seasons into secondary periods (I've tried lumping the data into just 3 surveys and dividing it into as many as 21 surveys within a season).  I ran into the issue mentioned by Richard below of abundance estimates increasing with K, regardless of mixture, starting values, modeling detection on obsCovs, etc., which I assume is because of the low detection.  I have "stacked" the data into site-years, as Andy suggested, and now get detection estimates 0.2 < p < 0.25 using the Poisson mixture.  The NB mixture has much better AIC scores (~2000 AIC lower for NB vs Poisson models), but the NB models still have p ~0.07 vs p ~0.2 for Poisson models (and higher abundance estimates, ~29 for NB vs ~9 for Poisson), whether using a null model or modeling detection on obsCovs.

My main questions: 1) Does stacking the data into site-years violate the assumption of independent sites?  2) How does stacking the sites change the interpretation of results?

If anyone has any other suggestions for dealing with my data, I'm all ears!  I've included a UMF summary below for reference.

Many thanks,
Alex Wolf

> summary(SCLA.umf)
unmarkedFrame Object

1512 sites
Maximum number of observations per site: 3 
Mean number of observations per site: 3 
Sites with at least one detection: 1420 

Tabulation of y observations:
   0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   33 <NA> 
1295 1022  765  481  310  229  132   97   67   37   31   25   14   11    5    6    2    3    1    2    1    0 

Site-level covariates:
 Block         Year        ELT           Landform        Slope             NEness             Aspect    SoilRest  SoilDrain  
 b1:504   y1992  :108   elt17:756   Backslope:1106   Min.   :-2.2156   Min.   :-1.39050   W      :266   No :812   mwse :588  
 b2:504   y1993  :108   elt18:756   Bench    : 182   1st Qu.:-0.5791   1st Qu.:-0.94633   NE     :224   Yes:700   mww  : 28  
 b3:504   y1994  :108               Footslope:  14   Median : 0.1730   Median :-0.09327   E      :210             se   :224  
          y1995  :108               Shoulder : 196   Mean   : 0.0000   Mean   : 0.00000   NW     :196             w    :182  
          y1998  :108               Summit   :  14   3rd Qu.: 0.7294   3rd Qu.: 1.04722   SE     :196             wmwse: 14  
          y1999  :108                                Max.   : 2.3273   Max.   : 1.38962   SW     :182             wse  :476  
          (Other):864                                                                     (Other):238                        
      Flow              PondD             StreamD                          Harvest          TSH                                 HrvstTime   
 Min.   :-0.39176   Min.   :-0.67204   Min.   :-1.70894   Clearcut             :  45   Min.   : 0.00   Leave.                        :1102  
 1st Qu.:-0.34521   1st Qu.:-0.55085   1st Qu.:-0.88223   Group opening        :  37   1st Qu.: 0.00   Single-tree selection.recent  : 123  
 Median :-0.20558   Median :-0.36704   Median :-0.03105   Intermediate thin    : 118   Median : 0.00   Single-tree selection.longterm:  87  
 Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.00000   Leave                :1102   Mean   : 1.95   Intermediate thin.recent      :  67  
 3rd Qu.:-0.06594   3rd Qu.:-0.00316   3rd Qu.: 0.75768   Single-tree selection: 210   3rd Qu.: 1.00   Intermediate thin.longterm    :  51  
 Max.   : 9.05703   Max.   : 2.95899   Max.   : 2.37165                                Max.   :17.00   Clearcut.recent               :  27  
                                                                                                       (Other)                       :  55  
    TimeClass   
 longterm: 177  
 none    :1102  
 recent  : 233 

mike worland

unread,
Jun 22, 2015, 9:30:15 PM6/22/15
to unma...@googlegroups.com
Alex,
 
I stacked my data as well and wondered about these same issues.  I think this post might help you as it did me:
 
 
Best,
 
Mike Worland
 
 

Alex W.

unread,
Jun 23, 2015, 10:13:29 PM6/23/15
to unma...@googlegroups.com
Mike,

Thanks so much for your response!  That thread is really helpful.  I tried searching for past posts on the forum but clearly missed some good ones.

Thanks again!

-Alex