Problem with integer vectors in user defined distribution

Dave Ramsey

unread,

Aug 3, 2018, 1:27:31 AM8/3/18

to nimble-users

Hi Nimble team,

I have a user defined distribution written in Nimble code that basically calculates the binomial likelihood for a vector of data (and parameters), following an idea I got from one of the Nimble Wikis (great examples there by the way!). It looks like this.

#----------------------------------------------------------------

# Binomial distribution for for vectors with scalar K

dbin_by_row <- nimbleFunction(

run = function(x = double(1), P = double(1), K = integer(0), log = integer(0, default = 0)) {

J <- length(x)

ans <- 0.0

for(j in 1:J)

ans <- ans + dbinom(x[j], K, P[j], 1)

returnType(double())

if(log) return(ans)

else return(exp(ans))

})

rbin_by_row <- nimbleFunction(

run = function(n = integer(), P = double(1), K = integer(0)) {

J <- length(P)

ans<- numeric(J)

for(j in 1:J)

ans[j] <- rbinom(1, K, P[j])

returnType(double(1))

return(ans)

})

registerDistributions(list(

dbin_by_row = list(

BUGSdist = "dbin_by_row(P, K)",

Rdist = "dbin_by_row(P, K)",

range = c(0, Inf),

types = c('value = double(1)', 'P = double(1)', 'K = integer(0)'))

))

This version takes a vector of data x, probabilities P and scalar number of trials K. This version work fine. However, if I want to move to a vector of (integer) trials then I presumably want this version of the distribution

#-----------------------------------------------------------

# Now try Vector K

dbin_by_row <- nimbleFunction(

run = function(x = double(1), P = double(1), K = integer(1), log = integer(0, default = 0)) {

J <- length(x)

ans <- 0.0

for(j in 1:J)

ans <- ans + dbinom(x[j], K[j], P[j], 1)

returnType(double())

if(log) return(ans)

else return(exp(ans))

})

rbin_by_row <- nimbleFunction(

run = function(n = integer(), P = double(1), K = integer(1)) {

J <- length(P)

ans<- numeric(J)

for(j in 1:J)

ans[j] <- rbinom(1, K[j], P[j])

returnType(double(1))

return(ans)

})

registerDistributions(list(

dbin_by_row = list(

BUGSdist = "dbin_by_row(P, K)",

Rdist = "dbin_by_row(P, K)",

range = c(0, Inf),

types = c('value = double(1)', 'P = double(1)', 'K = integer(1)'))

))

For this version I now specify a vector K in the data (or constants?). However, this version throws a compiler error complaining about type conversion.

error: invalid initialization of reference of type 'NimArr<1, int>&' from expression of type 'NimArr<1, double>'

The vector K specified in the data is an integer vector so I can't see where the type error is coming from. I have attached a minimal working example if that helps

Any help much appreciated!

regards

Dave

test code.R

Perry de Valpine

unread,

Aug 3, 2018, 11:32:09 AM8/3/18

to Dave Ramsey, nimble-users

Hi Dave,

Models store everything as doubles anyway, so you might as well pass it as a double(1) and then simply use its elements as integers.

We'll have to think about how to handle or at least document the situation better.

-Perry

--
You received this message because you are subscribed to the Google Groups "nimble-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nimble-users...@googlegroups.com.
To post to this group, send email to nimble...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nimble-users/1a4c2293-5353-44b4-b3d0-aa9b08f3beb0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dave Ramsey

unread,

Aug 3, 2018, 6:43:50 PM8/3/18

to nimble-users

Hi Perry, Thanks for the reply. Yes the version I am actually using has all the function arguments specified as double(). I was just curious why scalar integer() works fine but vector integer(1) throws an error. Is this worth posting as an issue?

cheers

Dave

Perry de Valpine

unread,

Aug 8, 2018, 6:22:31 PM8/8/18

to Dave Ramsey, nimble-users

At the moment I think it's more of a documentation and error-trapping issue than a functionality issue. Eventually we'd like it to work better, but that will be part of some deeper redesign work, I hope. I just filed a quick GitHub issue on it. Please do feel free to comment there if you want to add something. Thanks for bringing it to our attention.

To view this discussion on the web visit https://groups.google.com/d/msgid/nimble-users/0591e301-f7dd-430d-913d-780bd9ac1cb9%40googlegroups.com.

Andrei Akhmetzhanov

unread,

Sep 19, 2018, 10:29:03 PM9/19/18

to nimble-users

Dear all,

I have tried to move a bit further with Dave's approach and implement also vectorized version of dmulti combined with dbin_by_row. However, if I use both versions of the functions in my code, I get warning messages about non-integer values.

Specifically, I defined following two distributions:

# vectorized binomial

dbin_by_row <- nimbleFunction(
  run = function(x = double(1), P = double(1), K = integer(0), log = integer(0, default = 0)) {
    J <- length(x)
    ans <- 0.0
    for(j in 1:J)
      ans <- ans + dbinom(x[j], K, P[j], 1)
    returnType(double())
    if(log) return(ans)
    else return(exp(ans))
  })

rbin_by_row  <- nimbleFunction(
  run = function(n = integer(), P = double(1), K = integer(0)) {
    J <- length(P)
    ans<- numeric(J)
    for(j in 1:J)
      ans[j] <- rbinom(1, K, P[j])
    returnType(double(1))
    return(ans)
  })

registerDistributions(list(
  dbin_by_row = list(
    BUGSdist = "dbin_by_row(P, K)",
    Rdist = "dbin_by_row(P, K)",

discrete = TRUE,

    range = c(0, Inf),
    types = c('value = double(1)', 'P = double(1)', 'K = integer(0)'))
))

# vectorized multinomial
dvmulti <- nimbleFunction(
  run = function(x = double(2), P = double(1), N = double(1), log = integer(0, default = 0)) {
        J <- length(N)
        ans <- 0
        for (k in 1:J) {
            ans <- ans + dmulti(x[k,1:J], N[k], P, 1)

        }
        returnType(double())
        if(log) return(ans)
        else return(exp(ans))
    })

rvmulti  <- nimbleFunction(
    run = function(n = integer(), P = double(1), N = double(1)) {
        declare(ans, double(2))
        J <- length(P)
        setSize(ans, J, J)
        for (k in 1:J) {
            ans[k,1:J] <- rmulti(1, N[k], P)
        }
        returnType(double(2))
        return(ans)
    })

registerDistributions(list(
    dvmulti = list(
        BUGSdist = "dvmulti(P, N)",
        Rdist = "dvmulti(P, N)",
        discrete = TRUE,
        range = c(0, Inf),
        types = c('value = double(2)', 'P = double(1)', 'N = double(1)'))
    ))

Now I use the following code:

nimCode = nimbleCode({
    for(k in 1:K) {
        probs[k] ~ dbeta(1,1)
        x[k] ~ dbin(probs[k],xMax)
    }
#     x[1:K] ~ dbin_by_row(probs[1:K],xMax) #<- if I use this line instead of the one above, I will get warnings about non-integer
    y[1:K,1:K] ~ dvmulti(P[1:K],x[1:K])
    observed[1:K] <- nimRowSums(y[1:K,1:K])
})

nimRowSums = nimbleFunction(
    run = function(a = double(2)) {
        nrows <- dim(a)[1]
        ncols <- dim(a)[2]
        ans <- numeric(ncols)
        for(j in 1:ncols) {
            ans[j] <- sum(a[1:nrows,j])
        }
        return(ans)
        returnType(double(1))
    }
)

while running simulations with vectorized binomial gives the message:

Warning message in compiledModel$nimMCMC$run(niter = Niter): "non-integer x = 2.970471"

and so on.

I don't know whether I need simply ignore those warning messages, but I think something seems not correct.

Thank you very much for your time in advance. Please also see the attached script.

Best,

Andrei

20180920 verifying dmulti.r

Chris Paciorek

unread,

Sep 21, 2018, 2:29:27 PM9/21/18

to akhmet...@gmail.com, nimble-users

Hi Andrei,

When you switch from using scalar x[1], x[2], ... variables to the
vector x[1:K], the sampler(s) assigned by NIMBLE change from slice
samplers on each x[i] to a random walk block sampler. Our slice
sampler recognizes that x[i] are discrete and proposes new values
appropriately. Our random walk block sampler is not designed for
discrete multivariate random variables and so proposes non-integer
values and you get the warnings you do. A simple fix is to configure
the MCMC so that each x[i] is sampled using a slice sampler. But that
presumably defeats the purpose of creating the vectorized binomial
density since each scalar sampler will recompute the likelihood for
the full vector x[1:K].

We don't have a sampler in our provided samplers that would handle
this discrete multivariate case, so if you wanted to continue on this
path, you would probably need to write your own multivariate sampler
for x[1:K]. This could be a version of our RW_block sampler (you can
copy and then modify the sampler_RW_block code from the file
MCMC_samplers.R in https://github.com/nimble-devnimble) that proposes
integer-valued proposals perhaps via some sort of rounding. There is
discussion of user-defined samplers in Section 5 of chapter 15 of the
NIMBLE manual.

All that being said, there is a bigger issue here. In the form you
write your model, NIMBLE is not constraining y[1:K,1:K] such that the
rowSums are equal to the values of observed. To impose such a
constraint, you need to use the dconstraint() syntax discussed in
Section 5.2.7.3 of the NIMBLE manual. And once you do impose the
constraint, MCMC sampling won't work very well (actually it probably
won't accept any proposals at all) because it will be very hard for a
sampler to propose values of 'y' that are consistent with the fixed
values of 'observed'. I think you would have to come up with your own
user-defined sampler that proposed new 'y' values that are consistent
with 'observed'. E.g., within a row of 'y', you could propose to
increase one element by one and decrease another element by one so the
sum stays fixed.

Chris

> To view this discussion on the web visit https://groups.google.com/d/msgid/nimble-users/7b471014-d6ba-4e83-9b73-f4e1248dc242%40googlegroups.com.

Andrei Akhmetzhanov

unread,

Sep 26, 2018, 11:27:34 PM9/26/18

to nimble-users

Dear Chris,

Thank you for your detailed response. It looks complicated, but I will think about it. :-) For now, I would like to avoid doing very technical coding, e.g. writing samplers, so I will try to figure out if there is a simpler workaround (It looks like there is).

Actually, my original motivation was to check PMHH for particle filtering, but I realized yesterday while reading the manual, that NIMBLE still offers only markovian filters (see an example of non-markovian https://arxiv.org/pdf/1601.07622.pdf)