New R package "ubms"


kkellner

Jan 25, 2021, 10:09:36 AM
to hmecology: Hierarchical Modeling in Ecology
Hi all,

My R package "ubms" is now available on CRAN[1]. The package fits a variety of hierarchical models (single-season and dynamic occupancy, N-mixture, Royle-Nichols, distance sampling, etc.) in a Bayesian framework with Stan [2], using a formula-based interface almost identical to package "unmarked". You can specify random effects on model parameters as well using the same syntax as package "lme4". 
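To make the interface concrete, here is a minimal sketch of fitting a single-season occupancy model with ubms. The data are simulated for illustration (the covariate names `forest` and `region` are invented here, not from the package); the double formula follows unmarked's `occu()` convention of detection first, then occupancy, and the `(1|region)` term is the lme4-style random intercept mentioned above:

```r
library(unmarked)  # provides unmarkedFrameOccu
library(ubms)

set.seed(123)
M <- 100  # sites
J <- 4    # repeat visits per site

# Simulated site covariates and detection histories (illustrative only)
site_covs <- data.frame(forest = rnorm(M),
                        region = factor(sample(letters[1:5], M, replace = TRUE)))
y <- matrix(rbinom(M * J, 1, 0.4), M, J)

umf <- unmarkedFrameOccu(y = y, siteCovs = site_covs)

# Detection ~ 1, occupancy ~ forest + random intercept by region
fit <- stan_occu(~1 ~forest + (1|region), data = umf,
                 chains = 3, iter = 1000)
fit
```

The only change from the equivalent `unmarked::occu()` call is the `stan_` prefix, the MCMC settings, and the random-effect term.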

Also included are tools for assessing model fit, calculating WAIC and leave-one-out cross validation (LOO), model selection, prediction, plotting residuals and covariate effects, and extracting posterior distributions of latent states (e.g. site abundance/occupancy). And of course you get the full MCMC sample output from Stan so you can apply any other tools that interact with Stan.
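As a rough sketch of that post-fitting workflow (function names as given in the ubms reference; `fit` is a fitted model as above and the `draws` value is illustrative, so check the package documentation for details):

```r
gof(fit, draws = 100)             # posterior predictive goodness-of-fit check
waic(fit)                         # WAIC
loo(fit)                          # leave-one-out cross validation (LOO)
predict(fit, "state")             # posterior summaries of occupancy probability
traceplot(fit)                    # inspect the underlying Stan MCMC chains
```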

I see ubms as an intermediate step along the continuum from unmarked/MARK/PRESENCE to writing custom models in BUGS/JAGS/Nimble/Stan. It is not meant to replace either approach but rather to supplement them, for situations when you need a Bayesian framework and “off-the-shelf” model structures are adequate. In such situations ubms should be a fast and easy solution.

If you'd like to see more details, the package website is here [3], an extended demo of the package features here [4], and a comparison with JAGS here [5]. Feel free to contact me with questions or issues, or submit bug reports on Github [6]. I hope you find the package useful!

Ken

Kery Marc

Jan 25, 2021, 10:22:36 AM
to kkellner, hmecology: Hierarchical Modeling in Ecology
Dear Ken,

That sounds like an unbelievably useful package. What an achievement! Thank you very much.

Best regards  --- Marc
--
*** Three hierarchical modeling email lists ***
(1) unmarked: for questions specific to the R package unmarked
(2) SCR: for design and Bayesian or non-Bayesian analysis of spatial capture-recapture
(3) HMecology (this list): for everything else, especially material covered in the books by Royle & Dorazio (2008), Kéry & Schaub (2012), Kéry & Royle (2016, 2020)

Thành Văn Nguyễn

Feb 10, 2021, 5:39:31 AM
to hmecology: Hierarchical Modeling in Ecology
Dear Ken,

I am using the ubms package to estimate occupancy patterns of rare species (i.e., data with low detection rates), and the package is working very well. I am super happy with it.

When I compare candidate models using modSel(), a question came up. I searched around but could not find an answer, so I hope you can help.

As you may know, in the unmarked package we have ΔAIC for each candidate model. Models with ΔAIC < 2 are considered to have support similar to the top model (the one with the lowest AIC), so I carry all models with ΔAIC < 2 forward for further analysis.
Meanwhile, the ubms package ranks candidate models by elpd, and you pointed out that comparing elpd_diff with se_diff helps indicate whether models with higher elpd perform better. My question arises at this step: which models should I select for further analysis? All of my models have an elpd_diff at least 2 or 3 times larger than the se_diff, and I cannot take them all into the next step.

So my question is: is there any kind of ΔAIC < 2 "threshold" for the ubms package?

Please let me know if my phrasing is not clear, or if you need more information from me.

Thank you and best regards,
Thanh

Ken Kellner

Feb 10, 2021, 8:13:16 AM
to hmecology: Hierarchical Modeling in Ecology
Hi Thanh,

It's a bit hard to discuss this without an example selection table, so I've pasted one I made below:

              elpd nparam elpd_diff se_diff weight
sfit_opt -1163.452 25.225     0.000   0.000   0.95
sfit     -1206.496  3.949   -43.044   9.197   0.05
sfit_2   -1257.392  1.860   -93.940  12.716   0.00

The "top" ranked model (with the highest/least negative elpd) is first in the table, and all other models are compared pairwise to the top model. Both 'sfit' and 'sfit_2' have an absolute difference in elpd much larger than the standard error of this difference. As far as I know there isn't a widely accepted rule of thumb like delta AIC < 2 for this situation, but one approach is to check whether elpd_diff +/- 1 SE overlaps 0; if not, the model with the higher/more positive elpd is considered better (see e.g. https://meridian.allenpress.com/jfwm/article/10/2/691/433057/Bayesian-Model-Selection-in-Fisheries-Management). Following that rule, 'sfit_opt' is expected to have better predictive performance than either 'sfit' or 'sfit_2', so I would move forward using only 'sfit_opt'. In this case model selection made the correct choice, since I simulated the dataset with the same model as 'sfit_opt'.
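The rule of thumb above is easy to code up yourself. Here is a small helper (`elpd_rule` is my own hypothetical name, not a ubms function) that labels each row of a selection table using the elpd_diff +/- 1 SE criterion, applied to the example table:

```r
# Classify each model relative to the top model: the comparison is
# "worse than top" only when elpd_diff + 1*se_diff stays below 0,
# i.e. the interval elpd_diff +/- se_diff does not overlap 0.
elpd_rule <- function(elpd_diff, se_diff) {
  ifelse(se_diff == 0, "top model",
         ifelse(elpd_diff + se_diff < 0, "worse than top", "similar to top"))
}

# Rows from the selection table above (sfit_opt, sfit, sfit_2)
elpd_rule(c(0, -43.044, -93.940), c(0, 9.197, 12.716))
# "top model" "worse than top" "worse than top"
```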

When you say "all of my models have an elpd_diff at least 2 or 3 times larger than the se_diff", are you describing a situation like my example here? If so, it sounds like your top model is probably much better than any other model, since abs(elpd_diff) +/- se_diff for the other models would not overlap 0.

On the other hand, if one model's abs(elpd_diff) is much smaller than the corresponding se_diff, as in the fake example below:

elpd_diff se_diff
    0.000   0.000
   -4.044   9.197
  -93.940  12.716

That would imply similar support for the top and 2nd-ranked models, but not the third. If all models had abs(elpd_diff) < se_diff, then model selection would have failed in the same way it would if every model you fit had delta AIC < 2.

Ken

Thành Văn Nguyễn

Feb 10, 2021, 11:05:09 AM
to hmecology: Hierarchical Modeling in Ecology
Hi Ken,

Thank you for the clear explanation. It clarified my understanding of model selection in ubms.

Your examples match my current situation well. Below is the output for one of my species:

Model       elpd  elpd_diff  se_diff  Rhat
ADM_02  -344.571      0.000    0.000     1
ADM_01  -355.648    -11.077    7.283     1
ADM_00  -371.794    -27.222    9.387     1

In this case, ADM_02 is clearly the best-supported model.


And for my 2nd species, the model selection output is:

Model       elpd  elpd_diff  se_diff  Rhat
ASR_02  -419.753      0.000    0.000     1
ASR_00  -423.587     -3.834    4.728     1
ASR_01  -425.236     -5.483    4.035     1

This means ASR_02 and ASR_00 have similar support, so I would carry both into the next steps and might drop ASR_01.


Thanks again,
Thanh