Make predictions on the logit scale in unmarked


Joshua Jones

Jul 19, 2021, 3:55:02 PM
to unmarked
I have occupancy predictions produced with predict() in the unmarked package. However, for the next stage of my analysis I need these predictions on the logit scale, whereas predict() returns them on the probability scale. Does anyone know how to make predictions from an occupancy model on the logit scale?

Ken Kellner

Jul 19, 2021, 3:56:39 PM
to unmarked
Set the backTransform argument to FALSE.

predict(..., backTransform=FALSE)
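Expanding that into a fuller sketch (here `fm` and `nd` are hypothetical names for a fitted occu() model and the newdata data frame; substitute your own objects):

```r
library(unmarked)

# `fm` is a fitted occupancy model from occu(); `nd` is the newdata
# data.frame used for the original predictions (both placeholder names)
pr_logit <- predict(fm, type = "state", newdata = nd, backTransform = FALSE)

# sanity check: applying the inverse-logit to these values should
# recover the probability-scale predictions from the default call
head(plogis(pr_logit$Predicted))
```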

Joshua Jones

Jul 19, 2021, 4:09:49 PM
to unmarked
Worked perfectly, thank you! One follow-up: is there any reason why the predictions unmarked produces on the logit scale are three orders of magnitude larger than those produced by a binary GLM fit to the naive occupancy values?

Ken Kellner

Jul 19, 2021, 4:15:49 PM
to unmarked
I'm guessing you have very low estimated detection probabilities, resulting in estimated occupancy being much higher than naive occupancy (on both scales). It could also be a result of the model-fit problems you mentioned in a previous post (indicated by NaNs), which looked like they were related to the occupancy covariate. It's hard to say without seeing the model results and/or the input data.

Ken

Joshua Jones

Jul 19, 2021, 5:14:00 PM
to unmarked
Hi Ken,

Just to see if my models are usable, here is the output for the unmarked model:

Head
   Predicted       SE     lower    upper acd
1   5.323980 2.494605 0.4346445 10.21332   4
2   6.757680 3.034477 0.8102155 12.70515   5
3   8.191380 3.579660 1.1753754 15.20739   6
4   9.625081 4.128052 1.5342480 17.71591   7
5  11.058781 4.678523 1.8890441 20.22852   8
6  12.492481 5.230418 2.2410505 22.74391   9

Tail
    Predicted       SE    lower    upper acd
296  428.2655 166.7728 101.3968 755.1342 299
297  429.6992 167.3300 101.7384 757.6600 300
298  431.1329 167.8872 102.0801 760.1857 301
299  432.5666 168.4444 102.4217 762.7115 302
300  434.0003 169.0016 102.7633 765.2373 303
301  435.4340 169.5587 103.1050 767.7630 304

And here they are for the binary GLM:

Head
  acd      pred        se   upperCI   lowerCI      family
1   4 -3.327853 0.6340353 -2.085144 -4.570562 Buprestidae
2   5 -3.321068 0.6306797 -2.084935 -4.557200 Buprestidae
3   6 -3.314282 0.6273319 -2.084711 -4.543853 Buprestidae
4   7 -3.307496 0.6239923 -2.084472 -4.530521 Buprestidae
5   8 -3.300711 0.6206609 -2.084216 -4.517206 Buprestidae
6   9 -3.293925 0.6173377 -2.083943 -4.503907 Buprestidae

Tail
    acd      pred        se   upperCI   lowerCI      family
296 299 -1.326107 0.7475176 0.1390279 -2.791241 Buprestidae
297 300 -1.319321 0.7510805 0.1527968 -2.791439 Buprestidae
298 301 -1.312535 0.7546481 0.1665749 -2.791646 Buprestidae
299 302 -1.305750 0.7582204 0.1803621 -2.791862 Buprestidae
300 303 -1.298964 0.7617972 0.1941582 -2.792087 Buprestidae
301 304 -1.292179 0.7653785 0.2079632 -2.792321 Buprestidae

Looking at the models, the occupancy isn't higher overall, but it is more spread out. I'd assume this could be due to low detection probabilities? And if so, would I be correct in saying the unmarked models are a truer reflection of the actual data than the naive binary GLMs?

Ken Kellner

Jul 21, 2021, 10:29:12 AM
to unmarked
To make a more informed assessment I'd also need to see the summary output (parameter estimates) from your final fitted model; I'm not sure if you are using the one from your previous post or a new one. But the short answer is no: based solely on this output, I wouldn't use the model.

Even the smallest linear predictor you've generated with predict (~5) corresponds to an occupancy probability of ~1. When that's the case I would not be very confident in the parameter estimates associated with the occupancy model: even though the linear predictor ranges from about 5 to 435, the range in actual occupancy is nonexistent, because every site is effectively 1. The model is going to really struggle to estimate the effect of acd when it thinks every site is occupied.
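You can see this by applying R's inverse-logit function, plogis(), to the endpoints of your predicted range:

```r
# inverse-logit of the smallest and largest predicted values
plogis(5.32)   # already about 0.995
plogis(435.43) # indistinguishable from 1 at machine precision
```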

I'm guessing the SEs of the parameter estimates are very large as well, which would explain the huge errors around the predictions here. That suggests to me that there is some kind of issue with the input data, either the response or the covariate. I would take three steps here. First, fit the model without a covariate on occupancy (i.e. without acd) and see if the occupancy estimates you get back from predict() are more reasonable. Then fit the model with acd again, but scale it to a Z-score first: acd takes on some very large absolute values, which might be why occu() struggles to get a good parameter estimate.
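A sketch of those first two steps, assuming `umf` is the (hypothetical) unmarkedFrameOccu object used to fit the original model:

```r
library(unmarked)

# step 1: intercept-only model (detection formula first, then occupancy)
fm0 <- occu(~1 ~1, data = umf)
backTransform(fm0, type = "state")   # overall occupancy estimate

# step 2: refit with acd standardized to a Z-score
siteCovs(umf)$acd_z <- as.numeric(scale(siteCovs(umf)$acd))
fm1 <- occu(~1 ~acd_z, data = umf)
summary(fm1)
```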

Finally, I would take another look at the response (y) data. Is there at least one detection at every site? Based on your naive model I'm guessing the answer is no, but I wanted to double-check.
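One quick way to check, again assuming your data are in the hypothetical `umf` object:

```r
# detection histories: one row per site, one column per visit
y <- getY(umf)

# number of sites with zero detections across all visits
sum(rowSums(y, na.rm = TRUE) == 0)
```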

Ken