Data with random size


Rémi

May 4, 2021, 9:01:08 AM
to nimble-users
Hi,

I wonder whether it is possible to create a model using data whose size is random.
The context is capture-recapture with misidentifications that create false histories. I want to use the array x of latent histories (where 0 = not seen, 1 = seen and correctly identified, 2 = seen but misidentified) to generate a Markov chain and obtain the posterior distribution of the parameters. The problem is that the size of x will change (histories are added and/or removed).

I'm not proficient with BUGS-based syntax, but I would write it this way:

p ~ dbeta(1.0, 1.0)      # capture probability
a ~ dbeta(1.0, 1.0)      # correct-identification probability
Ncont ~ dunif(0, 1e6)    # continuous prior on population size
N <- round(Ncont)
for (i in 1:N) {
    x[i, 1:S] ~ dlatentHisto(identification = a, capture = p)
}

Is this something that can be done?

Also, assuming that works (or in a different context where x would be multinomial with a changing total, x[] ~ dmulti(p[], N)), is there a better way to deal with N?

Thank you in advance,
Rémi

Perry de Valpine

May 5, 2021, 12:02:36 AM
to Rémi, nimble-users
Hi Rémi,

Thanks for the question. I don't think I'm totally grasping what you want to do, but let me give it a try.

The short answer is that the formal dimension of the model can't change during MCMC. What can change is which parts of the model are actually used: the dimension is formally constant, but in practice some parts are inactive. Often this is controlled with an indicator-variable scheme, using variables that are 0 or 1 and multiply other parts of the model, so that a 0 has the effect of "turning off" part of the model.
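
As a generic sketch of that pattern (not your model: psi, beta, sigma, covariate, and Nmax here are made-up placeholders), it could look like this:

# z[i] = 0 "turns off" the i-th contribution; z[i] = 1 keeps it active.
for (i in 1:Nmax) {
  z[i] ~ dbern(psi)
  mu[i] <- z[i] * beta * covariate[i]
  y[i] ~ dnorm(mu[i], sd = sigma)
}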

If the misidentification you are talking about is misclassification of the state of an individual, with the individual correctly identified, that can be handled by a Hidden Markov Model. The vignette for nimbleEcology can help you get started on that.

But I think what you are describing is that individual A might have been recorded as individual B. In that case, a capture history with (unknown) misidentification, say (1, 0, 1), might really correspond to two correct capture histories, (1, 0, 0) and (0, 0, 1), and so on. I am not sure off the top of my head whether there is a very clever way to implement this purely in the model language and have the MCMC sampling work out ok. If you can write it formally as a distribution, you should be able to write that distribution as a nimbleFunction, I hope.
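
As a rough sketch of what a user-defined distribution looks like (the body here is just a placeholder, not your actual density calculation):

library(nimble)

dlatentHisto <- nimbleFunction(
  run = function(x = double(1), identification = double(0),
                 capture = double(0), log = integer(0, default = 0)) {
    returnType(double(0))
    ## placeholder: compute the log-probability of latent history x
    ## from the identification and capture parameters
    logProb <- 0
    if (log) return(logProb) else return(exp(logProb))
  }
)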

Another option: I think you could implement this with a custom MCMC sampler that makes propose-accept-reject (Metropolis-Hastings) moves on which identification goes with which individual, while respecting the constraint that one observation can belong to one and only one individual at a time in the chain. This may need to involve a data augmentation scheme for never-observed individuals. So I think it would be non-trivial but could be done.
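
A bare-bones skeleton of such a sampler (the actual proposal move is left as a placeholder) might look like:

sampler_identification <- nimbleFunction(
  contains = sampler_BASE,
  setup = function(model, mvSaved, target, control) {
    calcNodes <- model$getDependencies(target)
  },
  run = function() {
    lp_initial <- model$getLogProb(calcNodes)
    ## placeholder: propose re-assigning an identification between rows of x,
    ## respecting the one-observation-one-individual constraint
    lp_proposed <- model$calculate(calcNodes)
    logMHR <- lp_proposed - lp_initial
    jump <- decide(logMHR)
    if (jump)
      copy(from = model, to = mvSaved, row = 1, nodes = calcNodes, logProb = TRUE)
    else
      copy(from = mvSaved, to = model, row = 1, nodes = calcNodes, logProb = TRUE)
  },
  methods = list(reset = function() {})
)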

I hope that helps.  If I've missed the boat, please follow up.

-Perry



Rémi

May 5, 2021, 3:26:30 AM
to nimble-users
Thanks for the answer!

I consider the case where misidentification causes the real capture history (101) to be split into two observed ones, (100) and (001), and I work with a latent history (in this case it would be (102), if the misidentification occurs at t = 3). I wrote a distribution for the latent histories (the dlatentHisto in my first post). And I have an algorithm which adds and removes latent histories to sample a new set of latent histories (keeping the latent set always compatible with the observations).
I tried to implement the chain in NIMBLE, but I saw that getDependencies("x") would always return the same parts ("x[1, 1:3]", ..., "x[200, 1:3]") even when I thought I had changed x and N properly in the sampler. That is what made me ask whether it really is possible to implement such a thing with the number of rows in x changing as we sample a new x.

Rémi

Perry de Valpine

May 5, 2021, 9:56:00 AM
to Rémi, nimble-users
Would the following idea work?

You could set up a vector of indicators, say z[i].  If z[i] is 0, x[i, 1:S] should not be included in model calculations.  If z[i] is 1, x[i, 1:S] should be included.

Then modify the declaration so that you have

x[i, 1:S] ~ dlatentHisto(identification = a, capture = p, include = z[i])

Inside dlatentHisto, use the value of the include argument as needed. Perhaps it should return a log probability of 0 when z[i] = 0; you'd have to determine the correct interpretation and resulting value.
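
For instance, extending the sketch from my previous message with the include argument (the real density calculation is again left as a placeholder):

dlatentHisto <- nimbleFunction(
  run = function(x = double(1), identification = double(0),
                 capture = double(0), include = double(0),
                 log = integer(0, default = 0)) {
    returnType(double(0))
    if (include == 0) {
      ## row is "turned off": contribute nothing to the model's log probability
      if (log) return(0) else return(1)
    }
    logProb <- 0  ## placeholder for the real calculation
    if (log) return(logProb) else return(exp(logProb))
  }
)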

And then also modify your custom sampler (very nice to hear about that) to also update values of z[i] as it puts rows of x in and out of the model.

Does it make sense?  The idea is to have large enough objects to contain the largest possible model and turn parts of them on and off in model calculations.  I've seen this kind of approach work well in other problems.

-Perry

