reasonable initial values recalculated to impossible values in distance sampling model


SimonGE

Jun 12, 2024, 7:21:26 PM
to nimble-users
Hello nimble users,

I have been struggling with a distance sampling model for a couple of weeks and am out of ideas. The model is meant to estimate abundances (N) among species and sites (see the attached sample data).

I took care to initialize this model with what I think are reasonable starting values for every parameter in this model but am getting the familiar "NaN values in logProb of model variable y".

y is drawn from a binomial distribution of size N with probability pcap. The initial values for N are reasonable (I think), and so are those for pcap. But when I inspect the model object after building it, the values of pcap I supplied are gone.

All the values for pcap have been replaced with 1's. I also noticed that if I skip the summation step on line 44, `pcap[s,j] <- sum( fc[s,j, 1:n.bins] )`, and replace it with, say, `pcap[s,j] <- fc[s,j, 1]` (or any index from 1 to n.bins in the 3rd position), this doesn't happen. I don't understand why, given that the initial values for fc along that 3rd index sum to 0.75.
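For reference, the relevant fragment of the model looks roughly like this (a sketch only; the loop bounds n.species and n.sites are placeholder names, and the full code is in the attached test.R):

for(s in 1:n.species) {
  for(j in 1:n.sites) {
    pcap[s, j] <- sum(fc[s, j, 1:n.bins])   # line 44 of test.R
    y[s, j] ~ dbin(pcap[s, j], N[s, j])
  }
}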

I think this must be the problem, but (a) I don't understand why it is happening, and (b) I don't know what to do about it.

Any guidance would be much appreciated. Thanks in advance!
constants.rda
data.rda
inits.rda
test.R

Perry de Valpine

Jun 12, 2024, 8:18:34 PM
to SimonGE, nimble-users
Dear Simon,

Thanks for the question.

nimbleModel by default does a model$calculate() as one of its last steps. (You can skip this by passing calculate=FALSE, but that's not your problem.)
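For example (a sketch; the object names here are placeholders for whatever you pass in):

model <- nimbleModel(code, constants = constants, data = data,
                     inits = inits, calculate = FALSE)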

model$calculate() does the following: in topologically sorted order (i.e. parent nodes before descendant nodes), it calculates deterministic values for deterministic nodes ("<-") and log probabilities for stochastic nodes ("~").

Therefore, it is typically not very useful to provide initial values for deterministic nodes, as you have done. They will be overwritten by values calculated in the model as soon as model$calculate() runs. (In your case, if you use calculate=FALSE in nimbleModel, you will see your initial values for fc, which addresses part of your question but not really the underlying problem.) 
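You can watch the overwriting happen (a sketch, reusing the model built with calculate = FALSE above):

model$fc[1, 1, ]   # still the initial values you supplied
model$calculate()  # deterministic nodes recomputed, logProbs filled in
model$fc[1, 1, ]   # now the values the model itself calculates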

For your model, when model$calculate() operates within nimbleModel (or when you run it yourself), the deterministic calculations really do result in values of pcap that are all 1s. Roughly what I see from the code is that the pcap scalars depend on fc vectors, which depend on f vectors, which depend on f_0 scalars and p vectors, which depend on up and low vectors, which depend on sigma scalars, which depend on p_sp scalars, which are the stochastic parents of that entire chain of calculations. One could follow all the calculations directly in R as well. I can't be sure, but I suspect there is a mistake in your math. Without checking the details, it looks like you have discretized a half-normal distribution into a set of bins, adjusted those bins for two dimensions, and summed the total probability, resulting (correctly?) in values of 1. I wonder if a component of the modeling ideas is missing.
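One way to see that chain explicitly is getDependencies, which by default returns the deterministic descendants, in topologically sorted order, down to the first stochastic nodes ("p_sp[1]" here is just an example node name):

model$getDependencies("p_sp[1]")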

In fact, the situation is slightly worse than having all pcap values be exactly 1. That alone would result in -Inf log probabilities, because for y ~ dbin(p, N) with p == 1, the probability is 0 whenever y is not equal to N (as in your data). In your case, however, numerical imprecision in the calculations producing pcap leaves all the values slightly > 1, which results in NaN log probabilities, and that is what is happening.
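You can reproduce both failure modes directly in R with dbinom and made-up numbers:

dbinom(5, size = 10, prob = 1, log = TRUE)          # -Inf: y != N while p == 1
dbinom(5, size = 10, prob = 1 + 1e-12, log = TRUE)  # NaN (with a warning): p > 1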

If you want to gain more insight into model$calculate(), you can do:
for(node in model$getNodeNames()) model$calculate(node)
Your model has >73,000 nodes, so you might prefer to use more of a toy example for exploration.
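A slightly extended version of that loop stops at the first node whose log probability is not finite (a sketch using only the calls shown above):

for(node in model$getNodeNames()) {
  lp <- model$calculate(node)
  if(is.nan(lp) || is.infinite(lp)) { print(node); break }  # first bad node
}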

Finally, I note that Michael Scroggie has some tools for distance sampling models in nimble here. I don't know their state of development (I hope it's ok to point them out!).

HTH
Perry



SimonGE

Jun 12, 2024, 10:41:16 PM
to nimble-users
Amazing! Thank you so much for such a quick and thorough reply.

I am disappointed that the problem was exactly the one I spent so much time trying (and failing!) to avoid, i.e. providing good initial values. You're correct that I was wasting my time with the inits for the deterministic nodes, and you correctly identified p_sp as the stochastic parent causing the problem. The topological sorting explains a lot. The initial values for that parameter were too large, hence pcap > 1.

For posterity, this model is based on Sollmann et al. 2015. I had also looked at Michael Scroggie's GitHub but had the same question about its state of development.

Anyhow, you've solved my problem and made my day. Thanks again, Perry.