Different results with same dataset in communityModel

Jarrad Barnes

Apr 26, 2024, 6:43:07 AM
to camtrapR
Hi Juergen, 

I've been trying to reorder some of my covariates so that the model reads them in the order I want. The easiest way I found was to recode them as dummy variables (e.g., A, B, C, etc.), which are then automatically ordered alphabetically. Having done this, I reran my model and the occupancy values for ALL covariates changed.

The two images below show occupancy estimates for species in a community based on the covariate "ruggedness (scaled)". I'm using this as an example because it is a continuous variable, so nothing about it was changed, yet the values are clearly different. The first figure shows the values for this covariate with the original dataset, the second with the dummy dataset. I should stress that I've checked and double-checked and run this multiple times, and I get the same result. Other continuous variables are similarly affected, and categorical variables indicate different relative probabilities to one another (i.e., if level A showed the highest occupancy probability in the original dataset, it doesn't necessarily show the highest probability in the dummy dataset).

Does changing the alphabetical order of categorical variables, and therefore the order in which their respective levels are included, really change the model THIS much, even (especially) for variables that aren't recoded?

Thanks for any insight,
Jarrad

[Attachments: Rplot.png (occupancy estimates, original dataset), Rplot01.png (occupancy estimates, dummy dataset)]

Jürgen Niedballa

May 15, 2024, 12:51:17 AM
to Jarrad Barnes, camtrapR
Hi Jarrad,
would you mind sharing your workflow, please, to help me better understand what you did that led to these differences? That would also help me reproduce it. Ideally, if you could make a small reproducible example and share it as an R workspace (.RData), it would be much appreciated (e.g. based on the AHMbook::simComm() function, as in vignette 5). Otherwise just the code would help a lot too.
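For example, something along these lines could serve as a starting point (the settings are just placeholders; argument names follow the AHMbook documentation):

# Simulate a small community dataset and save it as a shareable workspace
library(AHMbook)
set.seed(123)                                   # make the simulation repeatable
sim <- simComm(type = "det/nondet", nsites = 50, nreps = 3,
               nspecies = 5, show.plot = FALSE)
save(sim, file = "communityModel_repro.RData")  # the .RData file to share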
I also don't quite understand what you mean by:
I've been trying to reorder some of my covariates so the model reads them in the order I want
Please elaborate on what exactly the original problem was and how you addressed it.

Generally I would not expect such dramatic differences depending on how covariates are specified, so it might be an internal problem.
Thank you,
Jürgen

Jarrad Barnes

Jul 3, 2024, 5:31:11 PM
to camtrapR
Hi Juergen,

Apologies for the late reply - I've been pretty busy and let this stagnate.

To your question about reordering: I've been trying to set the levels of factors so that everything is compared to a reference level that I choose, rather than the default alphabetical order. I ran the code with the levels of each of my categorical covariates as recorded (e.g. "aspect" as N, NE, E, etc.) and then again with the levels re-coded to my specifications (e.g., for "aspect", N = A, NE = B, E = C, etc.). Does that make sense? When doing that, the data doesn't change, just the way each level is coded, yet it gives wildly different occupancy estimates not only for the re-coded covariates but also for continuous variables, which aren't changed. Hopefully the provided code and data make this clearer ...
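To illustrate with the "aspect" covariate (the data frame name `covs` is a stand-in for my actual covariate table):

# What I did: recode the levels so alphabetical order matches my preferred order
covs$aspect_dummy <- factor(c(N = "A", NE = "B", E = "C", SE = "D",
                              S = "E", SW = "F", W = "G", NW = "H")[as.character(covs$aspect)])

# Alternative without recoding: set the level order explicitly ...
covs$aspect <- factor(covs$aspect,
                      levels = c("N", "NE", "E", "SE", "S", "SW", "W", "NW"))

# ... or change only the reference level
covs$aspect <- relevel(factor(covs$aspect), ref = "N")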

Please see link to script, workspace and relevant sample data. I've limited the database to two species for simplicity - the problem persists.

https://file.io/gKpJm4BrpkQ7 (link valid for 2 weeks from today)

Thanks for any light you might be able to shed on this.

Jarrad

Jürgen Niedballa

Jul 17, 2024, 2:22:18 AM
to Jarrad Barnes, camtrapR
Hi,
yes, the order of factor levels is important. The first factor level serves as the reference, and its mean and SD are fixed at 0 (the effect of that level is already absorbed into the intercept). Hence the estimated effects of the other factor levels depend on the choice of reference level. Normally this should only affect the coefficient values; the ecological parameters (e.g., the occupancy estimates themselves) should remain the same, except for some minor variation due to the MCMC algorithm. It does affect plots, though, where factor levels are ordered as they appear in the data.
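As a generic illustration of this point outside camtrapR (a plain lm() with the default treatment contrasts, which uses the same reference-level coding):

set.seed(1)
d <- data.frame(aspect = factor(rep(c("N", "E", "S"), each = 10)))
d$y <- rnorm(30, mean = as.numeric(d$aspect))

m1 <- lm(y ~ aspect, data = d)                      # reference level "E" (alphabetical)
m2 <- lm(y ~ relevel(aspect, ref = "N"), data = d)  # reference level "N"

coef(m1); coef(m2)                   # the coefficients differ ...
all.equal(fitted(m1), fitted(m2))    # ... but the fitted values are identical (TRUE)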

For the model results, the order of covariates should not matter, since the effects of all covariates are simply summed in the linear predictor.

When I change the order of factor levels and run the models to convergence (crucial for making comparisons!), the results are still very different from those of the model with the original data, and similar to those of the model where you re-leveled the data.
This is just to let you know that I don't fully understand this yet and will investigate further. Sorry I don't have a proper answer yet.

In the meantime, can you confirm that the differences persist when you run the models with all species to full convergence?
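For example, assuming the fitted model output is an mcmc.list (as in vignette 5), the Gelman-Rubin statistic from the coda package is a quick check:

library(coda)
gd <- gelman.diag(out, multivariate = FALSE)  # `out` = the mcmc.list from the fitted model
max(gd$psrf[, "Point est."], na.rm = TRUE)    # should be close to 1 (e.g. < 1.1)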

Best,
J


On Wed, 17 Jul 2024 at 13:42, Jarrad Barnes <jarrad...@gmail.com> wrote:
Hi Juergen,

Given that communityModel is hierarchical, can something as simple as changing the order of the levels of a factor cause this issue?
I could understand different outcomes if the input order of entire covariates changed, but I wouldn't have thought it should happen when the order of covariates is kept the same and only the order of levels changes.

Jarrad Barnes

Jul 17, 2024, 2:42:53 AM
to camtrapR
Thanks Juergen. I figured that would be the case, as with any coefficient comparison; I just wasn't sure why it should affect the whole model. At least I'm not alone in that at the moment!
I haven't explicitly looked at convergence, but the same differences still exist when I run the model with all species and higher MCMC settings (25,000 iterations, 5,000 burn-in, 3 chains).

Jürgen Niedballa

Jul 19, 2024, 8:02:00 PM
to Jarrad Barnes, camtrapR
Did you check the model predictions in the full model runs, e.g. maps of predicted occupancy and species richness, or predictions for the camera trap stations themselves? That would help show whether and where the models differ, and whether all species are affected similarly.

I can't quite tell from the small toy dataset, but it could also be that using different factor levels as reference levels in the two models leads to differences in the responses to individual covariates, while the combined effects of all covariates match again.
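Something along these lines would show that (`pred_orig` and `pred_dummy` are placeholders for the station x species matrices of posterior-mean occupancy from the two fits):

diff_psi <- pred_orig - pred_dummy            # requires identical dimensions: stations x species
summary(as.vector(diff_psi))                  # centered near 0 if the two fits agree
which(abs(diff_psi) > 0.1, arr.ind = TRUE)    # stations/species where the predictions diverge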

Jarrad Barnes

Jul 22, 2024, 1:37:50 AM
to camtrapR
Hi Juergen,

I haven't done it by camera station, but I have run predictions from both models for all species across all potential combinations of covariates, and the predictions are definitely different. Sometimes close, but often way off. Something I have noticed, though, is that the output prediction tables are not of equal size, but I'll have to work out where the problem is there. Same record database, and same covariate matrix (except for the re-coding, which I've triple-checked), so maybe there's something there that might fix this if I can work out what's happening.
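For reference, building both prediction tables from a single shared grid should rule out a grid mismatch (the covariate names and values here are stand-ins for mine):

newdata <- expand.grid(
  aspect     = c("A", "B", "C"),              # every factor level once
  ruggedness = seq(-2, 2, length.out = 50)    # scaled continuous covariate
)
nrow(newdata)   # identical grid -> identically sized prediction tables for both models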

Jarrad Barnes

Jul 23, 2024, 3:01:37 AM
to camtrapR
Hi Juergen, I worked out the issue with my predictions datasets, but the problem persists.
The overall *pattern* of the estimates looks the same, but the values for any given combination are different (similar to what I see in the marginal effects plots).

Just to give two examples using the first and last line of each dataset:

[Attachment: dataset.png — first and last rows of the prediction tables from both models]

Jarrad Barnes

Jul 26, 2024, 1:41:14 AM
to camtrapR
Hi again Juergen,

For what it's worth, I've had a look back through some previous iterations of my data, and this issue doesn't seem to have any real effect on the *effect sizes* (which are what I'm primarily reporting on in my manuscript). They differ a bit because of changes to my data, the number of iterations, etc., but the patterns are consistent. So it appears there's just something weird going on with the marginal effects/species-specific occupancy estimates. I don't know if that helps narrow down what might be going on.

Jarrad
