Multicollinearity, model selection and offset


Julien Piquet

Feb 10, 2025, 11:44:36 AM
to spOccupancy and spAbundance users
Hi all,

I am running N-mixture models in spAbundance and several questions have come up that I have not found answers to.
  1. spAbundance allows the use of offset terms, which, according to the package description, are particularly useful for including area. This is precisely my case. I want to use an offset in my abund.formula, but I do not know how to specify it. When I include the term 'offset' or 'area' in the formula I get "Error in model.frame.default(formula = abund.formula, data = data$abund.covs,  :  invalid type (closure) for variable 'offset'", which typically appears when the function does not recognize the variable or the variable is missing.
  2. Is there a way to test for multicollinearity in NMix and related functions? The function vif() does not work, and I wonder whether this is supported in the package or whether I should test it outside of it.
  3. Since I have multiple detection and abundance covariates I would like to run model selection, but dredge() does not work for NMix and related functions. Is there another way to run it automatically, or should I proceed manually?
  4. I have a fixed (non-random) factor (cloudiness) that is coded as character in my data. Is it OK to leave it that way, or should I convert it to numeric, as is done for random effects, to ensure better results?
Thank you very much in advance for answering all these questions.

Best

Julien

Jeffrey Doser

Feb 11, 2025, 5:34:57 AM
to Julien Piquet, spOccupancy and spAbundance users
Hi Julien,

Thanks for the question. Here are some thoughts on each of your points:
  1. The offset should be supplied separately in the "data" argument as opposed to including in the formula. When you create the data list for "data", the offset should be specified as "offset" as another component of the list. It can either be a single value, a vector with a different offset value for each site, or a matrix with rows corresponding to sites and columns corresponding to visits if the offset varies by both site and visits. If "offset" is included in the "data" list, then it will automatically be incorporated into the model, and you don't need to specify anything in the model formulas.
  2. No, for any of the spAbundance/spOccupancy functions, tests for multicollinearity have to be done manually.
  3. No, there is no dredge-like function for doing model selection with spAbundance/spOccupancy. Model selection will need to be done manually.
  4. It's totally fine to include a non-random categorical variable in the model if that is what you think makes the most sense. Of course, if there are a lot of levels to that categorical variable, it will be harder to estimate than a random effect. But, if there are only a few (e.g., 2-4) levels, a non-random factor makes more sense.
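
For the manual multicollinearity check mentioned in point 2, one option is to compute variance inflation factors directly from the covariate table with base R. This is only a minimal sketch under assumptions: the covariate names (elev, forest, temp) and the simulated values are hypothetical, the data frame stands in for data$abund.covs, and all covariates are assumed to be numeric.

```r
# Simulated site-level covariates (hypothetical names), standing in for data$abund.covs
set.seed(1)
n <- 105
abund.covs <- data.frame(elev = rnorm(n), forest = rnorm(n))
# temp is built to be strongly (negatively) correlated with elev
abund.covs$temp <- -0.9 * abund.covs$elev + rnorm(n, sd = 0.3)

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
# covariate j on all the other covariates (numeric covariates only)
manual_vif <- function(covs) {
  sapply(names(covs), function(v) {
    others <- setdiff(names(covs), v)
    fit <- lm(reformulate(others, response = v), data = covs)
    1 / (1 - summary(fit)$r.squared)
  })
}

# Pairwise correlations are a quick first look
print(round(cor(abund.covs), 2))

# Named vector with one VIF per covariate
vifs <- manual_vif(abund.covs)
print(round(vifs, 2))
```

The same function could be applied to site-level detection covariates. Which VIF value counts as too high is a judgment call and is not addressed in this thread.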
Hope that helps,

Jeff

--
Jeffrey W. Doser, Ph.D.
Assistant Professor
Department of Forestry and Environmental Resources
North Carolina State University
Pronouns: he/him/his
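
For the manual model selection mentioned in point 3 of the reply above, the candidate formulas can at least be generated automatically. The sketch below uses only base R and builds every subset of a set of abundance and detection covariates. The covariate names and the helper all_subsets() are hypothetical. Each candidate pair would then be fitted with NMix() and compared with an information criterion (for example WAIC), which is left as a comment here because it depends on your data and MCMC settings.

```r
abund.terms <- c("elev", "forest")      # hypothetical abundance covariates
det.terms   <- c("cloudiness", "wind")  # hypothetical detection covariates

# Build one-sided formulas for every subset of `terms`,
# including the intercept-only model
all_subsets <- function(terms) {
  subsets <- list(character(0))
  for (k in seq_along(terms)) {
    subsets <- c(subsets, combn(terms, k, simplify = FALSE))
  }
  lapply(subsets, function(s) {
    if (length(s) == 0) {
      as.formula("~ 1")
    } else {
      reformulate(s)
    }
  })
}

abund.formulas <- all_subsets(abund.terms)
det.formulas   <- all_subsets(det.terms)

# Every combination of an abundance formula with a detection formula
candidates <- expand.grid(abund = seq_along(abund.formulas),
                          det   = seq_along(det.formulas))
print(nrow(candidates))

# For each row i, one would fit a model using
#   abund.formulas[[candidates$abund[i]]] and det.formulas[[candidates$det[i]]]
# via NMix(), store a fit criterion, and rank the candidates afterwards.
```

With many covariates the number of candidates grows quickly, so a smaller, hypothesis-driven candidate set may be more practical than all subsets.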

Julien Piquet

Feb 12, 2025, 8:01:52 AM
to spOccupancy and spAbundance users
Hi Jeff,

Thank you for providing guidance on all these questions.

  1. Regarding the offset, I tried specifying a matrix with J rows and as many columns as there are occasions, and I get the error: data$offset must be of length 1 or 105. Is there a way to circumvent this other than going back to a single-column vector that averages the offset across occasions? BTW, the offset raised another question. When specifying area (in my case) as an offset, are the results expressed in terms of density, or should I report abundance as number of individuals per transect? I know other packages for N-mixture models allow you to express results as density with some extra processing.
  2. OK!
  3. OK!
  4. What I meant is whether I should convert my fixed factor cloudiness to a numerical factor or I can leave it the way it is now (e.g., 0_no_clouds, 1_partially_covered,...).
Again, thank you for any help you can provide.

Best

Julien

Jeffrey Doser

Feb 13, 2025, 5:08:16 AM
to Julien Piquet, spOccupancy and spAbundance users
Hi Julien,

Sorry about that, I gave an incorrect response to your first question. It is only possible to have an offset in the spAbundance N-mixture models that is constant or varies across sites (so it should be supplied as either a single value or a vector of length equal to the number of rows in data$y). So, you'll have to supply an offset that is constant across the replicates at each site. If there is large variation in survey size across replicates, you may wish to try to account for some of that variation with some covariate effect on the detection part of the model (where things are allowed to vary by both site and replicate). When an area offset is supplied, the estimates returned from the model will all be returned in terms of density. However, you don't need to change how you're supplying "data$y" to the model (i.e., data$y would be the same regardless of whether you did or did not include an area offset).
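
As a rough illustration of the setup described above, the data list could be assembled as follows. This is a sketch with simulated values: the number of visits (K = 3), the covariate names (elev, area.visit), and averaging the area across visits are assumptions for illustration. Only the 105 sites and the list component name "offset" come from this thread.

```r
set.seed(42)
J <- 105  # number of sites (the error message in this thread implies 105)
K <- 3    # number of repeat visits (assumed)

# Simulated counts, sites in rows and visits in columns
y <- matrix(rpois(J * K, lambda = 2), nrow = J, ncol = K)

# Hypothetical surveyed area for each site and visit
area.by.visit <- matrix(runif(J * K, 0.5, 1.5), nrow = J, ncol = K)

# Collapse to one value per site, since the offset cannot vary by visit
area.site <- rowMeans(area.by.visit)

data.list <- list(
  y = y,
  abund.covs = data.frame(elev = rnorm(J)),
  # visit-level variation in area can instead enter the detection model
  det.covs = list(area.visit = area.by.visit),
  # scalar or vector of length nrow(y); not referenced in any formula
  offset = area.site
)

# Same length rule that the package's error message reports
stopifnot(length(data.list$offset) %in% c(1, nrow(data.list$y)))
str(data.list, max.level = 1)
```

The list would then be passed as the data argument of NMix(), with formulas such as ~ elev for abundance and ~ area.visit for detection. Neither formula mentions the offset.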

For question 4: yes, you should be able to include the fixed factor in its current form and won't have to convert it to numeric.
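
To see why the character coding is not a problem in ordinary R model formulas, the sketch below compares the design matrix built from a character column with the one built from the same column converted to a factor. It uses base R's model.matrix() only. That spAbundance treats character covariates in the same way internally is an assumption here, and the level "2_overcast" is a hypothetical third level.

```r
# Cloudiness coded as character, as in the question
cloud <- c("0_no_clouds", "1_partially_covered", "0_no_clouds", "2_overcast")
det.df <- data.frame(cloudiness = cloud, stringsAsFactors = FALSE)

# Design matrix built directly from the character column
X.chr <- model.matrix(~ cloudiness, data = det.df)

# Design matrix after an explicit conversion to factor
det.df$cloudiness <- factor(det.df$cloudiness)
X.fac <- model.matrix(~ cloudiness, data = det.df)

# Both give an intercept plus one dummy column per non-reference level
print(colnames(X.chr))
print(all(X.chr == X.fac))
```

The first level in alphabetical order ("0_no_clouds" here) becomes the reference level. To change it, convert to a factor and use relevel() before fitting.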

Jeff

Julien Piquet

Feb 14, 2025, 4:03:47 AM
to spOccupancy and spAbundance users
Hi Jeff,

Ok! Thank you for your quick answer. All good now ;).

Best

Julien