Hello Nimble community,
I'm running a capture-recapture model estimating survival of first-year geese, using the difference between the date of peak plant quality and each bird's hatching date (i.e., trophic mismatch) as an individual-level covariate on survival.
Because the hatching date of most juveniles (~20 000 of 22 000) is unknown, I'm estimating the age of these birds from the relationship between the length of the 9th primary feather and age, which predicts age to within ~1-2 days, which is quite good.
I wanted to do this cleanly and propagate the uncertainty in the goslings' estimated ages into the estimated effect of the mismatch covariate on survival, so I'm running both analyses (individual age prediction + CR survival) jointly, in a single nimble model.
My issue lies in calculating the WAIC, or some other measure I can use to compare models. Because my dataset is quite large (22 000 juveniles + 30 000 adults) and the model is complex, computation time and memory use are already almost prohibitive, which makes cross-validation techniques difficult; that is why I opted for WAIC. Even using the marginalized distributions from nimbleEcology, and having optimized my model as much as my programming skills allow, it still takes about a week to run a bit under 30 000 iterations on the fastest cluster I have access to.

Consequently, I'm running each chain separately and saving the MCMC samples every 1 000 iterations, restarting the MCMC without resetting the model and MCMC state (reset = FALSE) but resetting the stored model values (resetMV = TRUE) to keep memory use reasonable (I'm still using around 60 GB as things stand).

I'm calculating the WAIC on the fly, as the model runs (enableWAIC = TRUE, then mcmc$getWAIC()). But reading into how WAIC, and specifically pWAIC2, is calculated, I'm realizing (I think...) that because I discard the stored samples every 1 000 iterations, the effective number of parameters is estimated from the sample variance of the last 1 000 iterations rather than the full chain (obviously, since I removed the previous samples from the model). That is likely problematic, and would explain why I'm getting quite different WAIC and pWAIC values between runs and between chains.

I thought I could instead calculate the WAIC post hoc, after running everything and pooling all my samples. Unfortunately, I'm now realizing that this requires monitoring all stochastic parent nodes of the data nodes. That is also a problem, since I'm not monitoring the estimated ages of the 20 000 juveniles: doing so would also require far too much memory.
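For context, pWAIC2 is a sum of posterior variances of pointwise log-likelihoods, pWAIC2 = sum_i Var_post[log p(y_i | theta)], so a variance taken over only the retained chunk of samples need not match one taken over the whole chain. Here is a minimal, self-contained sketch of the chunked-run setup described above, using a toy normal-mean model rather than the actual goose survival model (all object and file names are placeholders):

```r
library(nimble)

## Toy stand-in for the real model, just to illustrate the mechanics.
code <- nimbleCode({
  mu ~ dnorm(0, sd = 10)
  for (i in 1:N) y[i] ~ dnorm(mu, sd = 1)
})
model  <- nimbleModel(code, constants = list(N = 50),
                      data = list(y = rnorm(50, 2)), inits = list(mu = 0))
conf   <- configureMCMC(model, enableWAIC = TRUE)
mcmc   <- buildMCMC(conf)
cmodel <- compileNimble(model)
cmcmc  <- compileNimble(mcmc, project = model)

## Chunked running: reset = FALSE preserves sampler and model state
## across calls; resetMV = TRUE discards previously stored samples,
## so only the current chunk stays in memory.
for (k in 1:5) {
  cmcmc$run(1000, reset = (k == 1), resetMV = TRUE)
  saveRDS(as.matrix(cmcmc$mvSamples),
          sprintf("samples_chunk_%02d.rds", k))
}

## The concern raised above: whether the pWAIC2 reported here reflects
## the full run or only the most recently retained samples.
print(cmcmc$getWAIC())
```

This is only meant to make the workflow concrete; the toy model runs in seconds, so it can also be used to check empirically whether getWAIC() after chunked runs matches getWAIC() after one uninterrupted run of the same length.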
So I'm wondering what my options are at this point. Since the ages are estimated in the same way in all the models I aim to compare, I think I could simply leave them out of the pWAIC calculation, but I'm not sure of the consequences of doing that, or whether the post-hoc WAIC calculation method even allows it. Has anyone run into a similar problem before and/or has ideas on how I could proceed?
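For the post-hoc route, a sketch of what I understand the offline calculation to look like, using nimble's calculateWAIC() on a pooled sample matrix (the file pattern and cmodel, the compiled model object, are placeholders standing in for the per-chunk sample files saved every 1 000 iterations and the real compiled model):

```r
library(nimble)

## Recombine the per-chunk sample matrices saved to disk during the run
## (file names are hypothetical).
files       <- sort(list.files(pattern = "^samples_chunk_.*\\.rds$"))
all_samples <- do.call(rbind, lapply(files, readRDS))

## Offline WAIC: the sample matrix must contain every stochastic parent
## of the data nodes -- in the model described above, that would include
## the ~20 000 estimated ages, hence the memory problem.
waic <- calculateWAIC(all_samples, cmodel)
waic
```

The sticking point is exactly the comment in the middle: whether there is a supported way to call this with the age nodes excluded, or whether their absence from the monitors makes the post-hoc computation invalid outright.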
Thanks for the help!
Cheers,
Fred L.
On Aug 24, 2023, at 4:58 PM, Frédéric LeTourneux <frederic....@gmail.com> wrote:
--
You received this message because you are subscribed to the Google Groups "nimble-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nimble-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nimble-users/c6b9e243-abb6-466c-b71d-7608fc8f832bn%40googlegroups.com.