Modify existing distributions


Matthijs Hollanders

May 4, 2021, 11:15:26 PM
to nimble-users
Hello all,

I'm working on a multistate capture-mark-recapture model where surveys were performed under the robust design at multiple sites. I don't think I can use the dHMM distributions in nimbleEcology, because it appears they don't accommodate multiple secondary sessions or multiple sites. Since I don't feel at all comfortable writing custom distributions myself, is it possible to access/view/modify the distributions contained in nimbleEcology? I'm sure I'd have a better chance of success if I could start from those as a template.

Kind regards,
Matt

Perry de Valpine

May 4, 2021, 11:47:46 PM
to Matthijs Hollanders, nimble-users
Absolutely! That is a great way to get into writing your own custom distributions. The source code is on GitHub. You can copy and paste a nimbleFunction into your own file and use it from there. Here is the source code for dDHMM: https://github.com/nimble-dev/nimbleEcology/blob/master/R/dDHMM.R. Let us know if you have further questions.

-Perry 


Matthijs Hollanders

May 5, 2021, 6:15:34 AM
to nimble-users
Thanks Perry! 

I'm interested in trying this. One more quick question: I was re-reading Turek et al. 2017, "Efficient MCMC sampling...", and saw in the discussion that the inclusion of latent states is necessary for individual-specific covariates. Does that mean I can't use the custom ecology distributions if I have covariates like infection load or body weight per individual at each primary survey?

Matt

Daniel Turek

May 5, 2021, 7:23:58 AM
to Matthijs Hollanders, nimble-users
Matt, thanks for your interest in the paper, the nimbleEcology package, and also your willingness to try your hand at writing your own custom distributions. Briefly, addressing that comment from the paper ("the inclusion of latent states is necessary for individual-specific covariates"): I believe the motivation for that statement was that the distributions as written for and used in the paper (provided at the time of publication at https://github.com/danielturek/HMM-MCMC, since superseded by the nimbleEcology package) did not allow for the inclusion of individual-level covariates. All that said, if you're up for writing your own model-specific distribution, tailored to your particular set of covariates, then there would be nothing stopping you from providing individual covariates, e.g.:

observationData[i] ~ dMyCustomDistribution(modelParam1, modelParam2, individualCovariate1[i], individualCovariate2[i])

The implementation of dMyCustomDistribution would then need to use the individual covariate values correctly in the underlying state transition and observation probability calculations. I think that would be fine. My same caveat, however, is that this distribution would now be fairly "model specific" to the particular covariates that you're using. But that's fine, there's no problem with that.
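
To make that concrete, here is a minimal sketch of what such a distribution could look like for a simple CJS-style case, marginalizing over the alive/dead states with the standard forward recursion. The function and argument names follow the example above and are hypothetical; a real version for a robust-design multistate model would need the full transition and observation structure.

library(nimble)

dMyCustomDistribution <- nimbleFunction(
  run = function(x = double(1),            # one individual's capture history; x[1] == 1 at first capture
                 modelParam1 = double(0),  # survival intercept (logit scale)
                 modelParam2 = double(0),  # detection intercept (logit scale)
                 individualCovariate1 = double(0),
                 individualCovariate2 = double(0),
                 log = integer(0, default = 0)) {
    returnType(double(0))
    # individual-specific survival and detection computed from the covariates
    phi <- ilogit(modelParam1 + individualCovariate1)
    p <- ilogit(modelParam2 + individualCovariate2)
    # CJS forward recursion: probAlive tracks P(alive at t | data up to t)
    logLik <- 0
    probAlive <- 1
    for(t in 2:length(x)) {
      probAliveNow <- probAlive * phi
      if(x[t] == 1) {
        logLik <- logLik + log(probAliveNow * p)
        probAlive <- 1
      } else {
        probNoDetect <- probAliveNow * (1 - p) + (1 - probAliveNow)
        logLik <- logLik + log(probNoDetect)
        probAlive <- probAliveNow * (1 - p) / probNoDetect
      }
    }
    if(log) return(logLik) else return(exp(logLik))
  })

NIMBLE should register this automatically when it appears in model code; a matching rMyCustomDistribution is only needed if you want to simulate from the distribution.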

I hope this helps.  Keep at it, I'm happy to see you going down this path.

Daniel


Perry de Valpine

May 5, 2021, 10:04:55 AM
to Matthijs Hollanders, nimble-users
I don't think so, but I may be forgetting the context of that point from the 2017 paper. The nimbleEcology distributions marginalize over the capture (or detection) history of a single individual (or site). So if necessary the other inputs could be indexed by individual, with their values calculated as needed using individual-level covariates. For many individuals, it could become a large model, but that's a different topic.
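
For example, here is a hedged sketch with a simple two-state (alive/dead) setup and hypothetical names (indCov, nInd, T; init supplied as data, e.g. c(1, 0) to condition on being alive at the first occasion); check the dHMM help page for the exact argument conventions:

library(nimbleEcology)

code <- nimbleCode({
  beta0 ~ dnorm(0, sd = 2)
  betaCov ~ dnorm(0, sd = 2)
  p ~ dunif(0, 1)
  # observation matrix: rows are states (1 = alive, 2 = dead),
  # columns are observations (1 = detected, 2 = not detected)
  probObs[1, 1] <- p
  probObs[1, 2] <- 1 - p
  probObs[2, 1] <- 0
  probObs[2, 2] <- 1
  for(i in 1:nInd) {
    logit(phi[i]) <- beta0 + betaCov * indCov[i]   # individual-level covariate
    # this individual's transition matrix
    probTrans[1, 1, i] <- phi[i]
    probTrans[1, 2, i] <- 1 - phi[i]
    probTrans[2, 1, i] <- 0
    probTrans[2, 2, i] <- 1
    y[i, 1:T] ~ dHMM(init = init[1:2], probObs = probObs[1:2, 1:2],
                     probTrans = probTrans[1:2, 1:2, i], len = T)
  }
})

Note that this creates several deterministic probTrans nodes per individual, which is part of why the model can become large.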

One limitation is that if you use the marginalized distributions, you won't be able to save posteriors for the individual latent states, because they aren't sampled by MCMC. They can be determined later (after the MCMC is done), but we haven't yet set that up nicely in nimbleEcology, and we would welcome anyone taking an interest in contributing that.

Another point in that paper was consolidating identical capture histories to reduce redundant computation. But that approach can't be used if each individual has different covariates.

-Perry

Stephen Gregory

Jul 21, 2021, 11:45:02 AM
to nimble-users
Hello Daniel and Perry,

I'm exploring nimbleEcology as a means to speed up (perhaps by reducing redundant computations) an HMM for the survival of individual fish, accounting for imperfect detection, as a function of their size, i.e., an individual covariate.
I have a model working for simulated data in JAGS (1300 individual fish over 7 rivers for between 4 and 20 years), but it takes some time (~48 hrs) to recover the generating parameter values (although it does :)).
I was wondering whether I could use the dHMM or perhaps the dDHMM distributions in nimbleEcology to marginalise over states and/or weight the likelihood for individual fish with identical capture histories.

Things to note about the data (larger than the simulated data):
- I have individual covariates (size)
- There are many individual fish (~40K)
- My JAGS model currently includes random terms for year (up to 17 years) and river (7 rivers)

Do you have any view(s) on whether the problem is tractable within a reasonable time-frame and whether it can be simplified, perhaps using dDHMM?
As per Perry's last point above, can I do anything to reduce redundant computations when I have an individual covariate?

Many thanks in advance,
Stephen

Perry de Valpine

Jul 21, 2021, 12:19:12 PM
to Stephen Gregory, nimble-users
Hi Stephen,

Thanks for posting to the list.

In general it is hard to predict just what is feasible or with how much computational effort.  I think you can do it, but it's hard to say for sure.  Here are some thoughts on this problem.

If it is size-dependent survival with imperfect detection, would dCJS be sufficient?  That would compute faster than an HMM.  The survival and/or detection probabilities can be filled with values calculated from the size-dependence and year random effects.
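
For instance, a rough sketch of that structure using dCJS_vs (vector survival probabilities, scalar detection probability), with hypothetical names (size, first, nYears); see the dCJS help page for the exact conventions, and note that individuals first captured in the final year would need to be excluded:

library(nimbleEcology)

code <- nimbleCode({
  beta0 ~ dnorm(0, sd = 2)      # survival intercept (logit scale)
  betaSize ~ dnorm(0, sd = 2)   # size effect on survival
  sigmaYear ~ dunif(0, 5)
  for(t in 1:(nYears - 1)) {
    epsYear[t] ~ dnorm(0, sd = sigmaYear)   # year random effect
  }
  p ~ dunif(0, 1)               # detection probability (constant here for simplicity)
  for(i in 1:nInd) {
    for(t in first[i]:(nYears - 1)) {
      logit(phi[i, t]) <- beta0 + betaSize * size[i] + epsYear[t]
    }
    # marginalized CJS likelihood over this fish's history from first capture onward
    y[i, first[i]:nYears] ~ dCJS_vs(probSurvive = phi[i, first[i]:(nYears - 1)],
                                    probCapture = p,
                                    len = nYears - first[i] + 1)
  }
})

River random effects could enter the linear predictor in the same way as epsYear.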

From a purely computational perspective, the year random effects partially, but not entirely, undercut the benefit of the marginalization. That is because when a year random effect is updated by MCMC, the dependent calculations will include the detection history of every fish (possibly) alive during that year, and those calculations cover the entire life of each such fish. The marginalization still has the benefit of removing the discrete latent states after the last time a fish is known to be alive.

In the case of using individual latent states, a potentially very useful trick is to set the latent states for non-detection years that fall between detection years as *data*. The animal must have been alive between detections, and providing this information as data avoids the wasted computation of MCMC sampling states whose values can never change.
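
In R, this can be set up with a small helper along the lines of the standard known-state function from Kéry and Schaub's BPA book (a sketch; y is the 0/1 capture history matrix, and the state at first capture is typically handled separately in the model):

knownStates <- function(y) {
  z <- matrix(NA, nrow(y), ncol(y))
  for(i in 1:nrow(y)) {
    det <- which(y[i, ] == 1)
    if(length(det) > 0) {
      z[i, det[1]:det[length(det)]] <- 1   # must be alive between first and last detection
      z[i, det[1]] <- NA                   # first-capture state handled by the model
    }
  }
  z
}

Supplying z = knownStates(y) as data means the MCMC only samples the latent states that remain NA.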

The idea of using a weighted likelihood for individuals with identical capture histories is a good one, but it would not work if the individuals have different sizes or lived in different years. If you can use it, the trick was reported in Turek et al. 2016 and was recently given as a worked example in the capture-recapture workshop materials put together by Olivier Gimenez (see "Class 8 live demo"). Also, do you have river effects?
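
For completeness, a hedged sketch of how that trick can look (the wrapper name is hypothetical, and dCJS_vs here stands in for whatever base distribution you use): collapse identical histories in R, then multiply each unique history's log-likelihood by its count.

library(nimbleEcology)

dCJSWeighted <- nimbleFunction(
  run = function(x = double(1), probSurvive = double(1), probCapture = double(0),
                 mult = double(0), len = double(0, default = 0),
                 log = integer(0, default = 0)) {
    returnType(double(0))
    # log-likelihood of one unique history, weighted by its multiplicity
    ll <- mult * dCJS_vs(x, probSurvive, probCapture, len = len, log = TRUE)
    if(log) return(ll) else return(exp(ll))
  })

# collapse identical capture histories before building the model
histKey <- apply(y, 1, paste, collapse = "")
yUnique <- y[!duplicated(histKey), , drop = FALSE]
counts <- as.numeric(table(histKey)[unique(histKey)])  # multiplicity of each unique history

Each row of yUnique would then get a dCJSWeighted declaration with mult = counts[j]. Again, this only works when the pooled individuals share covariate values.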

Block sampling could be useful, perhaps for coefficients (do you have more covariates than size?), perhaps for sets of adjacent year effects.
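
In NIMBLE that is just a sampler assignment at MCMC configuration time, e.g. (node names hypothetical):

conf <- configureMCMC(model)
conf$removeSamplers(c("beta0", "betaSize"))
conf$addSampler(target = c("beta0", "betaSize"), type = "RW_block")
mcmc <- buildMCMC(conf)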

Sometimes it is effective to cut the matrices or arrays of transition or detection probabilities out of the model. For example, when using dCJS, dHMM (or dDHMM), or simply dcat, a common scheme is to use deterministic declarations to fill the entries of a large matrix or array (indexed by individual, time, and/or stage), rows or slices of which are then used in dCJS, dHMM, or dcat. That can create a very large number of nodes in a case like yours with 40K fish. An alternative is to write a customized version of dCJS (or one of the others) that takes as input the underlying parameters and/or covariates used to calculate the entries of the large matrix or array. Often many of the values are 0s and there are very few actual inputs, so the dCJS (or other) steps can be written directly in terms of those inputs, and the large matrix or array never needs to be formed, either in the model or in the customized distribution. However, this approach could run into limitations for a large number of covariates.
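
In model code the difference is just what gets passed to the distribution. Echoing Daniel's earlier example (names hypothetical), instead of filling a large probability array with deterministic nodes and passing slices of it, you would declare something like

y[i, 1:T] ~ dMyCustomCJS(beta0, betaSize, size[i], epsYear[1:(T - 1)])

and have dMyCustomCJS compute the survival and detection probabilities internally, so the large array never becomes a set of model nodes.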

I hope that helps!

-Perry
