Number of covariate combinations

Luís Santiago

Nov 29, 2021, 6:36:38 AM
to unmarked
Hi there,

After testing my variables for multicollinearity, I ended up with a group of, say, six habitat variables for occupancy.
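A minimal base-R sketch of one common multicollinearity screen: list every pair of covariates whose absolute Pearson correlation exceeds a cutoff. The data frame and the 0.7 cutoff are illustrative assumptions, not values from the thread. The function only flags pairs; what to do with them is a modelling decision.

```r
# Hypothetical site covariates; replace with your own data frame
set.seed(42)
covs <- data.frame(FA = rnorm(50), StreamDens = rnorm(50), Elevation = rnorm(50))
# Make one covariate strongly correlated with FA, for illustration only
covs$RoadDens <- 0.9 * covs$FA + rnorm(50, sd = 0.2)

# Return each pair of columns whose absolute correlation exceeds `cutoff`
high_cor_pairs <- function(df, cutoff = 0.7) {
  r <- cor(df)
  # upper.tri() keeps each pair once and skips the diagonal
  idx <- which(abs(r) > cutoff & upper.tri(r), arr.ind = TRUE)
  data.frame(var1 = rownames(r)[idx[, 1]],
             var2 = colnames(r)[idx[, 2]],
             r = r[idx],
             stringsAsFactors = FALSE)
}

high_cor_pairs(covs)
```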

Say that I want to use Forested Area (FA, a numeric variable) in my occupancy model (for the sake of this example, let's assume that detection is constant). I have seen that this can be fitted as occu(~1 ~ FA, data = umf), where umf is my unmarkedFrameOccu object.
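A minimal sketch of that single-covariate model on simulated data. The detection histories and FA values are random numbers standing in for a real unmarkedFrameOccu object. The formula has two right-hand sides: detection first, occupancy second.

```r
library(unmarked)

# Simulated detection histories: 50 sites x 3 occasions (illustrative only)
set.seed(1)
y <- matrix(rbinom(150, 1, 0.3), nrow = 50, ncol = 3)
umf <- unmarkedFrameOccu(y = y, siteCovs = data.frame(FA = runif(50)))

# Constant detection (~1), occupancy depending on FA
fm_FA <- occu(~1 ~ FA, data = umf)
summary(fm_FA)
```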

Now, say that I want to fit my occupancy model with every possible combination of my 6 variables. I could write them out manually, but that would take a while. So my question is whether there is a way to do it automatically, for example by providing a list of my variables.
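A base-R sketch of building the formulas automatically from a vector of variable names with combn(). The six names are hypothetical placeholders. Detection is held constant, and the empty subset becomes the intercept-only (null) occupancy model, so six variables give 2^6 = 64 formulas.

```r
# Hypothetical names for six occupancy covariates (replace with your own)
vars <- c("FA", "StreamDens", "RoadDens", "AgriPast", "MajorWater", "Elevation")

# One double right-hand-side formula per subset of `vars`
all_subset_formulas <- function(vars) {
  rhs <- "1"  # empty subset -> intercept-only occupancy
  for (k in seq_along(vars)) {
    # Every combination of k variables, pasted into "a + b + ..."
    rhs <- c(rhs, combn(vars, k, FUN = paste, collapse = " + "))
  }
  lapply(rhs, function(r) as.formula(paste("~1 ~", r)))
}

forms <- all_subset_formulas(vars)
length(forms)  # 64
forms[[2]]
```

Each formula can then be passed to occu(f, data = umf) in a loop or lapply().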

Thanks in advance!!

Marc Kery

Nov 29, 2021, 7:32:39 AM
to unmarked

Dear Luis,

 

I think there is a function called “dredge” in the package MuMIn, but I am not sure whether it works with unmarked fitting functions.

 

Best regards  --- Marc
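A minimal sketch of dredge() applied to an occu() fit, assuming MuMIn's support for unmarked fits (MuMIn lists unmarkedFit among its supported model classes). The data are simulated. The global model holds detection constant and puts all covariates on occupancy; dredge() then refits and ranks every subset of its terms, by AICc by default.

```r
library(unmarked)
library(MuMIn)

# Illustrative data: 50 sites x 3 occasions, three site covariates
set.seed(2)
y <- matrix(rbinom(150, 1, 0.3), nrow = 50)
umf <- unmarkedFrameOccu(y = y,
  siteCovs = data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50)))

# Global model: constant detection, all covariates on occupancy
global <- occu(~1 ~ a + b + c, data = umf)

# MuMIn's documentation recommends na.fail so every sub-model uses the same data
options(na.action = "na.fail")

# Fit and rank every subset of the global model's terms
dd <- dredge(global)
print(dd)
```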

--
You received this message because you are subscribed to the Google Groups "unmarked" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unmarked+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/unmarked/def5d546-8763-4fc9-bcba-731ccffc944bn%40googlegroups.com.

Jim Baldwin

Nov 29, 2021, 1:57:27 PM
to unma...@googlegroups.com
Using a canned (and thoroughly tested) function, such as the one Marc suggested, is usually the way to go.

However, here is one (not thoroughly tested) approach for running the analysis for all possible subsets:

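# Create some fake data (50 sites, 2 occasions, 3 site covariates)
  library(unmarked)
  set.seed(123)
  y = matrix(rbinom(100,1,0.2), ncol=2)
  umf = unmarkedFrameOccu(y=y,
    siteCovs=data.frame(a=rnorm(50), b=rnorm(50), c=rnorm(50)))

# Generate all possible models (detection held constant)
  variables = c("a", "b", "c")
  n <- length(variables)
  results = list()
  for (i in 0:(2^n-1)) {
      # The first n bits of i flag which covariates enter this model;
      # i = 0 gives the intercept-only (null) occupancy model
      v <- as.logical(intToBits(i))[1:n]
      rhs <- if (any(v)) paste(variables[v], collapse=" + ") else "1"
      occu.formula = as.formula(paste("~ 1 ~", rhs))
      results[[rhs]] = occu(occu.formula, umf)
  }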

Then you can process the results object (ranking by AIC, etc.).
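A minimal sketch of the ranking step, self-contained on simulated data. The small named list of fits stands in for a results list like the one above; do.call() passes each fit to unmarked's fitList() as a named argument, and modSel() builds an AIC table. The last lines pull the AIC stored in each fit's @AIC slot directly.

```r
library(unmarked)

set.seed(3)
y <- matrix(rbinom(100, 1, 0.2), ncol = 2)
umf <- unmarkedFrameOccu(y = y,
  siteCovs = data.frame(a = rnorm(50), b = rnorm(50)))

# A small named list of fits (names describe the occupancy covariates)
results <- list(
  "1"     = occu(~1 ~ 1, umf),
  "a"     = occu(~1 ~ a, umf),
  "b"     = occu(~1 ~ b, umf),
  "a + b" = occu(~1 ~ a + b, umf)
)

# Bundle the fits and build an AIC model-selection table
fl <- do.call(fitList, results)
ms <- modSel(fl)
ms

# The same AIC values by hand, sorted from best to worst
aic <- sapply(results, function(m) m@AIC)
sort(aic)
```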

I also don't want to let the opportunity pass without commenting on "testing my variables for multicollinearity". While it is important to know your data intimately, throwing out highly correlated variables really just "ignores" multicollinearity; it doesn't fix the issue. Removing correlated variables might certainly be necessary if there are numerical convergence problems. But if you just need predictions, or the variables are simply what a decision maker has available, then unless you have zillions of predictors, you might not want to toss any when fitting.

But if the objective is learning about the effects of the predictors (as opposed to prediction being the main objective), then your knowledge of the system is essential in deciding how to proceed (as opposed to automatically tossing highly correlated variables). If you don't have that knowledge (beyond what's in the data), then highly correlated variables are a serious problem for such an objective.

Jim
 

--

Benedikt Schmidt

Nov 30, 2021, 1:36:14 AM
to unma...@googlegroups.com
The use of the dredge function was discussed some years ago: https://groups.google.com/g/unmarked/c/d0p_c9W_kRo
 
Best wishes, Beni
 
 
 

Luís Santiago

Nov 30, 2021, 9:24:07 AM
to unma...@googlegroups.com
Thank you all.

The dredge function seems to be useful for my case. And Jim, I really appreciate your input too, but as you said, I do not know most of the sites I am working with, so in this case I really have to use variables with little correlation.


Luís Santiago

Nov 30, 2021, 10:29:30 AM
to unma...@googlegroups.com
Thank you all.

The dredge function seems to be useful for my case. And Jim, I really appreciate your input too, but as you said, I do not know most of the sites I am working with, so in this case I really have to use variables with little correlation.

Another question has arisen, though, and I have not found an answer to it in the group. The dredge function seems to be working fine. I took a look at the output for a fairly common rodent in one of my study areas (partially attached below):
[Image attachment: partial dredge output]

I compared the AICc values of these models with those of the models that have constant detection and occupancy. Is there a way to include the latter in my code so that their results are part of the output as well, or should this be done separately?
(my lines of code below)
# Global model: detection ~ TrapEffort + HumanUseSites; occupancy ~ six habitat covariates
mGlobal_ABSETTLEMENT.DasLep <- occu(~TrapEffort + HumanUseSites
                                    ~StreamDens + RoadDens + AgriPast + MajorWater + AvgAnnualRain + Elevation,
                                    data = UFO_ABSETTLEMENT.DasyproctaLeporina)


Thanks again


Luís Santiago

Nov 30, 2021, 10:39:22 AM
to unma...@googlegroups.com
Sorry, I forgot to add that I also tried adding 1 to the detection and occupancy formulas (occu(~1 + TrapEffort + HumanUseSites ~1 + StreamDens + ...)), since ~1 ~1 means constant detection and occupancy, but I doubt this is the way to go, as the results are the same as before.
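A base-R sketch of why adding 1 changes nothing: R model formulas already include an intercept, so `~1 + x` and `~x` produce the same design matrix. This assumes unmarked builds its design matrices with standard R formula semantics; the small data frame is hypothetical.

```r
# Hypothetical covariate values for four sites
d <- data.frame(TrapEffort = c(3, 5, 2, 4), HumanUseSites = c(0, 1, 1, 0))

# The intercept is implicit, so "1 +" adds nothing
X1 <- model.matrix(~ TrapEffort + HumanUseSites, d)
X2 <- model.matrix(~ 1 + TrapEffort + HumanUseSites, d)
all.equal(X1, X2)  # TRUE

# Dropping the intercept needs "- 1" (or "0 +") instead
colnames(model.matrix(~ TrapEffort - 1, d))
```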

Luís Santiago

Dec 20, 2021, 2:28:53 PM
to unmarked
Hi again,

I found the answer to my last question and will post it here in case someone comes across the same thing in the future.
It had crossed my mind, and I then double-checked it with a statistician at my university who also uses the unmarked package regularly.
The answer is that, first, the dredge function has to be run separately three times:
1) global model;
2) ~1 ~(all the variables);
3) ~(all the variables) ~1.

Then, the delta AIC used to select which model(s) to consider has to be calculated manually from all the AIC values.
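A base-R sketch of that manual step, assuming the AIC (or AICc) values from the separate runs have been collected as named vectors. The model labels and numbers are made up for illustration. The same model (for example the null model) can appear in more than one run, so duplicates are dropped by name before computing delta values and Akaike weights.

```r
# Hypothetical AICc values from three separate runs, named by model
run1 <- c("p(.)psi(.)" = 210.4, "p(TrapEffort)psi(Elevation)" = 201.3)
run2 <- c("p(.)psi(.)" = 210.4, "p(.)psi(Elevation)" = 203.9)
run3 <- c("p(.)psi(.)" = 210.4, "p(TrapEffort)psi(.)" = 207.0)

# Combine runs, keeping one copy of each model name
combine_ic <- function(...) {
  ic <- c(...)
  ic[!duplicated(names(ic))]
}

# Delta values and Akaike weights over the combined model set
ic_table <- function(ic) {
  delta <- ic - min(ic)
  w <- exp(-delta / 2) / sum(exp(-delta / 2))
  out <- data.frame(model = names(ic), IC = unname(ic),
                    delta = unname(delta), weight = unname(w),
                    stringsAsFactors = FALSE)
  out[order(out$delta), ]
}

tab <- ic_table(combine_ic(run1, run2, run3))
tab
```

AIC values are only comparable across runs if every model was fitted to the same detection data, for example when all runs start from the same unmarkedFrameOccu object.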

Best to all of you and once again thank you very much.
Luís
