Hello everyone !
I recently posted a question related to a difficulty understanding unmarked’s occupancy output when my categorical predictor (habitat) had one category with 0 observations.
https://groups.google.com/forum/#!topic/unmarked/ho6o8UIk-bo
Turns out I am not supposed to interpret parameters with huge SE. I’m okey with that, I assume that I can still say that, in my case, elephants are less present in reserves M and RW than in reserve R (reference category) thanks to my raw data.
Call:occu(formula = ~Camera.type ~ Reserve, data = Eleph) Occupancy: Estimate SE z P(>|z|)(Intercept) -1.69 0.335 -5.0408 4.63e-07ReserveM -17.46 1398.101 -0.0125 9.90e-01ReserveRW -1.68 0.657 -2.5543 1.06e-02 Detection: Estimate SE z P(>|z|)(Intercept) -2.418 0.277 -8.73 2.66e-18Camera.typeCuddeback -0.686 0.654 -1.05 2.95e-01 AIC: 320.9485
Raw data tells me that there are 0 observations at reserve M, 40 observations at reserve R and 7 observations at reserve RW.
My current best model (selected based on AIC) looks like this:
> best2=occu(~1~Altitude_s+Habitat+Dist.Water.Main.Secon_s+Dist.Main.Secondary.Roads_s+Slope_s+DistPark_s+Nom.Reserve,Eleph)
> best2
Call:occu(formula = ~1 ~ Altitude_s + Habitat + Dist.Water.Main.Secon_s + Dist.Main.Secondary.Roads_s + Slope_s + Distpark_s + Nom.Reserve, data = Eleph) Occupancy: Estimate SE z P(>|z|)(Intercept) 0.851 1.506 0.56510 0.5720Altitude_s 0.813 0.598 1.36078 0.1736Habitatclosed woodland 3.318 2.450 1.35439 0.1756Habitatopen woodland -1.712 1.216 -1.40760 0.1592Habitatwooded grassland -11.666 353.071 -0.03304 0.9736Dist.Water.Main.Secon_s -0.569 0.525 -1.08362 0.2785Dist.Main.Secondary.Roads_s 2.087 0.723 2.88611 0.0039Slope_s -0.969 0.628 -1.54347 0.1227DistPark_s 3.347 1.319 2.53709 0.0112Nom.ReserveM -19.757 4843.165 -0.00408 0.9967Nom.ReserveRW -10.016 3.630 -2.75878 0.0058 Detection: Estimate SE z P(>|z|) -2.49 0.251 -9.92 3.47e-23 AIC: 292.7279
I would like to use this model to make a prediction map but before doing that I tried to predict my categorical variables when the other variables are kept constant. These are my predicted values for reserve:
> newData=data.frame(Nom.Reserve=c("M","R","RW"),DistPark_s=rep(0,3),Dist.Water.Main.Secon_s=c(0,0,0),Altitude_s=c(0,0,0),Dist.Main.Secondary.Roads_s=rep(0,3),Pente_s=rep(0,3))
> pred=predict(best3,type="state",newdata=newData,appendData=T)
> pred
|
Predicted |
SE |
lower |
upper |
Nom.Reserve |
Distpark_s |
Dist.Water.Main.Secon_s |
|
4.24E-01 |
2.73E-01 |
7.60E-02 |
0.86777228 |
M |
0 |
0 |
|
1.48E-07 |
8.23E-05 |
0.00E+00 |
1 |
R |
0 |
0 |
|
5.99E-05 |
1.43E-04 |
5.64E-07 |
0.00632038 |
RW |
0 |
0 |
In this prediction, reserve M has the highest occupancy eventhough we saw 0 elephants there and the other 2 reserves have a close to 0 occupancy. This has to be wrong! Even if I model reserve alone, I get similar results.
Does this
mean that it is impossible to predict scores when we have too few observations in
our categories?
Why the model tells me that occupancy decreases (negative estimate) when we pass from reserve R (reference category) to RW and then it predicts the opposite?? I have observed the same phenomenon for other species. Example with Panthera Pardus:
> best3=occu(~Camera.type~Dist.Water.Main.Secon_s+Nom.Reserve,Pardus)
> best3
Call:occu(formula = ~Camera.type ~ Dist.Water.Main.Secon_s + Nom.Reserve, data = Pardus)
Occupancy: Estimate SE z P(>|z|)(Intercept) -2.065 0.644 -3.21 0.00134Dist.Water.Main.Secon_s -0.883 0.347 -2.54 0.01101Nom.ReserveM 2.031 0.878 2.31 0.02068Nom.ReserveR 1.264 0.768 1.64 0.10007 Detection: Estimate SE z P(>|z|)(Intercept) -3.896 0.410 -9.5 2.09e-21Camera.typeCuddeback -0.705 0.392 -1.8 7.20e-02 AIC: 453.7909
> newData=data.frame(Nom.Reserve=c("M","R","RW"),Dist.Water.Main.Secon_s=c(0,0,0))
> pred=predict(best3,type="state",newdata=newData,appendData=T)
> pred
|
Predicted |
SE |
lower |
upper |
Nom.Reserve |
Dist.Water.Main.Secon_s |
|
0.1125661 |
0.0643181 |
0.03466546 |
0.3094148 |
M |
0 |
|
0.4916469 |
0.1997519 |
0.16799854 |
0.8224519 |
R |
0 |
|
0.3097557 |
0.1496749 |
0.10217138 |
0.6389486 |
RW |
0 |
For this case I would have assumed that Reserve M had a higher occupancy than reference reserve R. When I use predict, it looks like the opposite. This reasoning works for other species.
I only get these confusing results for species that were observed less than 50 times across 440 sites (21 days survey). Also, I only have problems with categorical predictors (habitat too).
What should I do?
Thanks in advance!
Gabriel Valdez
Dear Marc,
Thank you for your fast reply once again.
I did as you said and used my model to predict occupancy across every pixel of my study area (using ArcGIS). However, the problem persists.
> best2=occu(~1~Altitude_s+Habitat+Dist.Water.Main.Secon_s+Dist.Main.Secondary.Roads_s+Slope_s+Nom.Reserve,Eleph)
> best2
Call:occu(formula = ~1 ~ Altitude_s + Habitat + Dist.Water.Main.Secon_s + Dist.Main.Secondary.Roads_s + Slope_s + Nom.Reserve, data = Eleph) Occupancy: Estimate SE z P(>|z|)(Intercept) -3.545 1.088 -3.2576 0.00112Altitude_s 0.857 0.556 1.5400 0.12357Habitatclosed woodland 3.997 1.609 2.4845 0.01297Habitatopen grassland 2.149 1.304 1.6483 0.09929Habitatwooded grassland -10.073 294.202 -0.0342 0.97269Dist.Water.Main.Secon_s -0.561 0.497 -1.1287 0.25901Dist.Main.Secondary.Roads_s 2.427 0.798 3.0437 0.00234Slope_s -0.843 0.469 -1.7979 0.07219Nom.ReserveM -13.494 392.619 -0.0344 0.97258Nom.ReserveRW -3.308 1.731 -1.9110 0.05601 Detection: Estimate SE z P(>|z|) -2.5 0.253 -9.88 4.93e-23 AIC: 300.0922
> Eleph.Predmap=predict(best2, type="state", newdata=Predmap, appendData=T)
> aggregate(Eleph.Predmap[,1:4], list(Eleph.Predmap$Nom.Reserve), mean)
Group.1 Predicted SE lower upper1 M 8.329646e-02 0.0913347325 0.007149494 0.49723392 R 2.340644e-06 0.0009189598 0.000000000 1.00000003 RW 6.753009e-02 0.0592211875 0.007940877 0.2765730
I still get a lower occupancy prediction in the reserve where I observed most elephants.
I controlled for multicollinearity using VIF and it seems to me to be okey (no VIF higher than 5).
> vif(lm(Model_ss$Eleph~Model_ss$Altitude_s+Model_ss$Habitat+Model_ss$Dist.Water.Main_s+Model_ss$Dist.Main.Secondary.Roads_s+Model_ss$Slope_s+Model_ss$Nom.Reserve))
GVIF Df GVIF^(1/(2*Df))Model_ss$Altitude_s 2.980656 1 1.726458Model_ss$Habitat 1.259629 3 1.039219Model_ss$Dist.Water.Main_s 1.085870 1 1.042051Model_ss$Dist.Main.Secondary.Roads_s 1.714767 1 1.309491Model_ss$Slope_s 1.644706 1 1.282461Model_ss$Nom.Reserve 4.757057 2 1.476844
Also, in
the model we can see that parameter ReserveRW has a negative estimate. Shouldn’t
this mean that reserve RW has a lower “occupancy” than R (reference category)?
I really don't know if I should throw my model to the garbage or not.
Thanks again!
Gabriel
> head(Predmap) FID Join_Count Habitat Water_FC Dist_water Dist_Villa Villa_FC Dist_culti 1 0 1 open woodland Water\\Rivers major 11322.00 23331.37 Infrastructures\\Villages 16042.56 2 1 1 open woodland Water\\Rivers major 10975.45 23324.18 Infrastructures\\Villages 16294.47 3 2 1 open woodland Water\\Rivers major 11346.64 23824.22 Infrastructures\\Villages 16725.02 4 3 1 open woodland Water\\Rivers major 11727.41 24324.27 Infrastructures\\Villages 17159.34 5 4 1 open woodland Water\\Rivers major 12116.86 24824.31 Infrastructures\\Villages 17597.15 6 5 1 open woodland Water\\Rivers major 12514.17 25324.36 Infrastructures\\Villages 18038.20 Dist_Park Dist_GameR Dist_road Dist_escar Slope Altitude Nom.Reserve DistWatMS WatMS_Fc 1 33390.10 34129.73 791.6553 8708.317 0.7220203 1291.973 M 11322.00 Water\\Rivers major 2 32890.93 33635.16 1268.3798 8869.866 0.8075468 1288.703 M 10975.45 Water\\Rivers major 3 32864.96 33564.47 1107.7780 8402.630 0.5225660 1290.012 M 11346.64 Water\\Rivers major 4 32846.59 33501.09 943.6142 7939.391 1.1464645 1294.530 M 11727.41 Water\\Rivers major 5 32835.82 33445.07 779.4499 7480.893 1.3533006 1302.229 M 12116.86 Water\\Rivers major 6 32832.67 33396.44 615.2854 7028.064 1.3432235 1310.627 M 12514.17 Water\\Rivers major Dist.Water.Main_s DistVillage_s Dist.cultivation_s DistPark_s Dist.Game.Reserve_s Dist.Main.Secondary.Roads_s 1 0.8522358 -0.27133053 -0.6373598 0.06679590 2.130527 -1.986802 2 0.8016014 -0.27203736 -0.6094975 0.04647859 2.106300 -1.815373 3 0.8558062 -0.22313092 -0.5623724 0.04541753 2.102822 -1.868811 4 0.9104960 -0.17473497 -0.5154452 0.04466649 2.099702 -1.927613 5 0.9655208 -0.12683387 -0.4687381 0.04422625 2.096941 -1.991794 6 1.0207530 -0.07941273 -0.4222696 0.04409727 2.094542 -2.063174 Dist.Escarpement_s Slope_s Altitude_s Dist.Water.Main.Secon_s 1 0.7735670 -0.15394253 0.3302090 1.671498 2 0.7980627 -0.08783263 0.3092900 1.612651 3 0.7265855 -0.32534568 0.3176652 1.675647 4 0.6537298 0.14474215 0.3465511 1.739207 5 0.5794949 0.26986292 0.3956533 1.803156 6 0.5039094 0.26399864 0.4490529 1.867346
<w
Dear Gabriel,
I am a little puzzled, but from looking at your pardus results reported earlier (and somewhere at the bottom of this mail), I get this hunch: I think that in R, the levels of a factor must appear in exactly the same order in the factor that you use to fit the model and in the predict statement. Look at your pardus results again:
> best3=occu(~Camera.type~Dist.Water.Main.Secon_s+Nom.Reserve,Pardus)
> best3
Call:
occu(formula = ~Camera.type ~ Dist.Water.Main.Secon_s + Nom.Reserve,
data = Pardus)
Occupancy:
Estimate SE z P(>|z|)
(Intercept) -2.065 0.644 -3.21 0.00134
Dist.Water.Main.Secon_s -0.883 0.347 -2.54 0.01101
Nom.ReserveM 2.031 0.878 2.31 0.02068
Nom.ReserveR 1.264 0.768 1.64 0.10007
[.....]
You have different orders of the factor levels. In the fitted model (i.e., in the factor ‘Nom.Reserve’ used to fit the model), the intercept and hence the first (reference) level seems to be RW and then the 2nd and 3rd levels are M and R. In contrast, the ordering of the factor levels in ‘Nom.Reserve’ in the predict command is M, R, RW. It seems that predict produces predictions of the reserved IN THE SAME order as in the fitted model, i.e., first comes RW, then M and finally R. Look at this:
> plogis(-2.065)
[1] 0.1125455
This is essentially identical for the prediction for the first level (RW), which, however, there is erroneously labelled M.
Next, the second prediction is that for the 2nd level IN THE FITTED MODEL (i.e., M)
> plogis(-2.065 + 2.031)
[1] 0.4915008
And, finally, this seems to be the prediction for level R.
> plogis(-2.065 + 1.264)
[1] 0.3098116
Hence, I would try to list the names of the reserves in the Nom.Reserve factor in the predict command in the same order as in the factor used to fit the model and I guess things come out right.
Best regards --- Marc
Call:
occu(formula = ~Camera.type ~ Dist.Water.Main.Secon_s + Nom.Reserve,
data = Pardus)
Occupancy:
Estimate SE z P(>|z|)
(Intercept) -2.065 0.644 -3.21 0.00134
Dist.Water.Main.Secon_s -0.883 0.347 -2.54 0.01101
Nom.ReserveM 2.031 0.878 2.31 0.02068
Nom.ReserveRW 1.264 0.768 1.64 0.10007Predicted SE lower upper Nom.Reserve Dist.Water.Main.Secon_s 1 0.4916469 0.1997519 0.16799854 0.8224519 R 0 2 0.1125661 0.0643181 0.03466546 0.3094148 M 0 3 0.3097557 0.1496749 0.10217138 0.6389486 RW 0
OK, that beats me then. But I think the fact that predict seems to give the right predictions in the wrong order seems significant. Anybody else any ideas ?
newData$Nom.Reserve=relevel(newData$Nom.Reserve,ref="Rukwa")--
You received this message because you are subscribed to the Google Groups "unmarked" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unmarked+unsubscribe@googlegroups.com.
Thanks, Jim, for the clarifications !
Best regards --- Marc
Von: unma...@googlegroups.com [mailto:unma...@googlegroups.com]
Im Auftrag von Jim Baldwin
Gesendet: Mittwoch, 19. Juli 2017 17:04
An: unma...@googlegroups.com
Betreff: Re: [unmarked] Wrong prediction output? Is it possible to predict occupancy when a predictor's category has 0 observations?
I just wanted to follow up on your comment "Turns out I am not supposed to interpret parameters with huge SE." That's an incorrect interpretation of what Marc Kery told you. He told you exactly what what the issue is.
--
To unsubscribe from this group and stop receiving emails from it, send an email to unmarked+u...@googlegroups.com.
For more options, visit
https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups "unmarked" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unmarked+u...@googlegroups.com.