Fitting distributions to modified vectors in Nimble

97 views
Skip to first unread message

Jaume Adria Badia Boher

unread,
Nov 17, 2021, 9:21:05 AM11/17/21
to nimble-users
Hi all!

I have a question related to fitting distributions to vectors of data in Nimble, and I wonder if somebody could help me with a couple of things I am struggling with. I will try to provide it as an understandable example!

My aim here is to see how parameters of a distribution vary after I bias a vector of data in particular ways. I start by fitting a distribution (gamma) to a vector that is a combination of observed and unobserved (NA) values.

for(i in 1:nvec1){
vec1 ~ dgamma(shape1,rate1)
}
shape1 ~ dgamma(1,1)
rate1 ~ dgamma(1,1)

Here the model imputes values to the NAs, which I want. My second step here is to remove some values of this vector (or turning them into NAs again, whichever is easiest) following a probability that varies depending on the values of the vector themselves, but for the ease of this example, let's assume it is a fixed probability. I thought about doing it this way. First, create a vector of 0s and 1s (removed) and then multiply it with vec1.

for(i in 1:nvec1){
removed[i] ~ dbern(0.5) #Either remove or not following a probability
vec2[i] <- vec1[i] * removed[i] #The ones to remove well become a 0
}

Now, I expect that vec2 is a combination of 0s and values originally obtained at vec1. My aim as I said before is to now either get rid of these 0s or turn them into NAs, because I would like to fit a gamma distribution to it again and see how the parameters of the distribution have changed after removing these values (that is, after "biasing" the data in particular ways, as I said above). Here is where problems starts, as I do not figure out any way to do it.

First, by using a nimbleFunction, I guess it would be easy to do:
vec3 <- vec2[vec2>0]

Or

vec2[vec2 == 0] <- NA

And then, fit a distribution to this vector:

vec3 (or vec2)[i] ~ dgamma(shape2, rate2)

shape2 ~ dgamma(1,1)
shape3 ~ dgamma(1,1)

The issue here is: I am getting an undefined node error, and I guess it is due to the fact that either way I try to implement this solution, my vector appears twice in the left side of a relationship:

Either as vec3 <- (...) or vec2 <- (...).
And also either as vec2 ~ or vec3 ~ dgamma(...)

Does anybody have a suggestion on how to deal with this, or any other type of approach to try? I hope I explained myself clearly enough!

Thanks in advance,

Jaume

Perry de Valpine

unread,
Nov 17, 2021, 10:04:21 AM11/17/21
to Jaume Adria Badia Boher, nimble-users
Hi Jaume,

Here is one idea.

vec1[i] ~ dgamma_toggle(shape1, rate1, removed[i])

dgamma_toggle <- nimbleFunction(
 run = function(x = double(), shape = double(), rate = double(), removed = double(), log = integer(0, default = 0)) {
   if(removed == 0) 
       logProb <- 0
   else
        logProb <- dgamma(x, shape = shape, rate = rate, log = TRUE)
  
  if(log)
     return(logProb). ## From a model, log will always be TRUE
   else
      return(exp(logProb)) 

   returnType(double())
})

This replaces dgamma with a custom version of dgamma that treats removed nodes as unimportant (by always giving them 0 log likelihood).  Then you would not need to create vec2 or vec3, and you can change the entries in removed[] to change which elements of vec1 contribute to the likelihood for shape1 and rate1.

If it is preferable to create vec2, you can use the following trick:

dummy[i] ~ dgamma_dummy(vec2[i], shape1, rate1, removed[i])

and write a nimbleFunction for dgamma_dummy.  Here the dummy[i] values need to be provided as data and will be provided as "x" to dgamma_dummy.  However, dgamma_dummy can ignore them and return dgamma(vec2[i], shape1, rate = rate1, log = TRUE) (when removed[i] is FALSE).  This is a way to place a stochastic node (dummy[i]) in the model graph but ignore its value and instead calculate a log probability contribution from other values (vec2[i] in this case).

There is also a way you could do this by turning samplers on and off, but the above seems easier unless the scale of the problem is a computational bottleneck.  

Also note that any vec1 nodes with NA values will be assigned samplers in the MCMC configuration step.  If you later give them non-NA values intended as data, after building the MCMC (and/or compiling), those samplers will still be there, so the nodes will be sampled rather than treated like data.  The samplers do not dynamically update what is treated as missing data (NA) or not after that has been determined during MCMC configuration and building.  And vice-versa, if you set a node from non-NA to NA after those steps, it will not dynamically be assigned a sampler.  (What you can do is ensure that all nodes have samplers and then omit the ones you don't want by leaving them out of the sampler order vector. But the above ideas seem easier.)

-Perry
 

--
You received this message because you are subscribed to the Google Groups "nimble-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nimble-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nimble-users/aeda6279-56bc-47b0-b262-ce8d97ecf91en%40googlegroups.com.

Jaume Adria Badia Boher

unread,
Nov 25, 2021, 2:38:51 PM11/25/21
to nimble-users
Hi Perry,

Thank you very much and sorry for my late reply. This has been extremely useful, thank you for the time writing this! I am already working on it.

Best,

El dia dimecres, 17 de novembre de 2021 a les 16:04:21 UTC+1, pdevalpine va escriure:
Reply all
Reply to author
Forward
0 new messages