Dynamic indexing

369 views
Skip to first unread message

Zoe Luo

unread,
Jul 9, 2022, 3:49:27 AM7/9/22
to nimble-users
Hi everyone,

I'm new to nimble and trying to fit a linear mixed model y=x*beta+residual with missing x. The part of code for imputing missing data starts from line 97 of the attached R code and involves a nimbleFunction called imputation which is defined at the beginning. I got the following error message:

Defining model
Error in getSymbolicParentNodesRecurse(x, constNames, indexNames, nimbleFunctionNames,  :
  getSymbolicParentNodesRecurse: dynamic indexing of constants is not allowed in FID[rowx, 1]. Try adding the dynamically-indexed constant as data instead (using the data argument of nimbleModel).
Calls: nimbleModel ... getSymbolicParentNodesRecurse -> lapply -> FUN -> getSymbolicParentNodesRecurse

I thought it would solve the problem by defining
colx <- col_idx[i]
rowx <- row_idx[i]
and having col_idx and row_idx in the list of constants, or adding 'nimbleOptions(allowDynamicIndexing = TRUE)' at the beginning, but they don't make a difference and I'm not sure where are the problems now. 

Thank you for your attention, and any help would be much appreciated!

Zoe
sim04(N1200)2ydep.RData
nimble_lmm.R

Perry de Valpine

unread,
Jul 9, 2022, 1:14:35 PM7/9/22
to Zoe Luo, nimble-users
Hi Zoe,
Thanks for posting the question and sharing a reproducible example.
I haven't yet run through your code and will instead give a quick reply.  If that doesn't get you up and running, we can keep trying.
The message "dynamic indexing of constants is not allowed in FID[rowx, 1]. Try adding the dynamically-indexed constant as data instead (using the data argument of nimbleModel)" is asking you to try moving FID from the constants list to the data list.
(Like a lot of R error trapping, our attempt at an informative error message is embedded in R's message about where the error was caught in the code, and that can be distracting.)
Fingers crossed that making that change will get you going.
Perry


--
You received this message because you are subscribed to the Google Groups "nimble-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nimble-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nimble-users/a007bd56-59d2-42ae-9692-e7eac855c160n%40googlegroups.com.

Zoe Luo

unread,
Jul 9, 2022, 6:08:35 PM7/9/22
to nimble-users
Hi Perry,

I got the following error message when I moved FID from the constants list to the data list.

Defining model
Error in BUGSdecl$replacementsEnv[[indexNamePieces]] :
  wrong arguments for subsetting an environment

I had this error before, and I saw the post on GitHub that says you would get this undecipherable error if you forget to supply a constant representing group membership to the constants list to nimbleModel(). I'm not sure which component should be in the constants list in my case, but apparently it's not FID.

P.s. just a minor correction of my original post, 'missing x' actually means there are missing values in x, not the whole x is missing.
Many thanks for your help!

Zoe

Perry de Valpine

unread,
Jul 9, 2022, 7:17:59 PM7/9/22
to Zoe Luo, nimble-users
Hi Zoe,

OK, I'm taking a more thorough look and running your code.  It looks like there's something of a misconception about the model language.  The BUGS language is unusual in that it is declarative, not imperative as most languages are.  Each line of code in BUGS creates one or more node relationships in a graphical model.  Once a node has been declared, the same node can't be declared again.  The order of code does not matter, because the lines declare nodes and relationships rather than steps to follow in order (as most code does).  

This means that the following won't work:
  for (i in 1:n_middle) {
    idx <- middle[i]
    R[idx] ~ dbinom(size=1, prob=frac)
  }
Once idx is declared once, it has its graphical relationship established and can't be declared again in another iteration of the for loop.  This is not a for loop for steps to be taken in order repeatedly as in imperative programming, but rather a for loop to declare multiple nodes.  You would want instead:
  for (i in 1:n_middle) {
    R[middle[i] ] ~ dbinom(size=1, prob=frac)
  }
where the vector middle should not have any repeated values.

Another example that will not work is this:
  for (i in 1:n_rest) {

    colx <- col_idx[i]
    rowx <- row_idx[i]
    indicator[1:4] <- c(R[FID[rowx,1]],R[FID[rowx,2]],R[FID[rowx,3]],R[FID[rowx,4]])
    out[1:4] <- imputation(geno=x_raw[],FID=FID[,],indicator=indicator[1:4],rowx=rowx,colx=colx,maf=maf)
    idx <- rest[i]
    x[idx] <- out[1]
    x[idx] ~ dcat(prob=out[2:4])
  }

I think you could for example do what you want like this:
  for (i in 1:n_rest) {
    indicator[i, 1:4] <- c(R[FID[row_idx[i],1]],R[FID[row_idx[i],2]],R[FID[rowx,3]],R[FID[row_idx[i],4]])
    out[i, 1:4] <- imputation(geno=x_raw[],FID=FID[,],indicator=indicator[i, 1:4],rowx=row_idx[i],colx=col_idx[i],maf=maf)
    x[rest[i]] <- out[1]
    x[rest[i]] ~ dcat(prob=out[2:4])
  }
I've removed some of the nodes you were trying to re-declare, and for others I've added an i index to make each one unique.
However I might not be following the intent of your model entirely, so I can't be sure I've captured it.

And a last example is that you can't do this:
    yf[1:4] <- c(y[members[1]],y[members[2]],y[members[3]],y[members[4]])
    yf ~ dmnorm(mu[1:4],cov=Xi[1:4,1:4])
because you can't declare the same node as deterministic and stochastic (and it should be "y[1:4]" if the second line is wanted).  Often when people have a need to do that, it arises from flawed probability logic.  Sometimes it really is what one wants, and then there are tricks to make it happen. However, I am thinking that you will benefit from finding some more introductory material on the BUGS language, either examples from our web site and user manual and/or material from BUGS and JAGS.  Once you are more oriented with how to use the language, I would be happy to help with this trick if you need it.

Also there were a few issues with your imputation nimbleFunction (but see below).  I tried compiling it separately:
Cimputation <- compileNimble(imputation)
This revealed that the argument to rcat should be "prob", not "probs", that p and out need to be created before assigning into them using indexing (this is true in R as well, and it is helpful to try using a nimbleFunction uncompiled, which will use R execution), and that you are using a j that is not defined anywhere.  For p and out, you can create them by:
p <- numeric(length = 3)
out <- numeric(length = 4)

For out you could also do
out <- c(x,p)
instead of 
out[1:4] <- c(x,p)
because when creating a whole object, it can be created by assignment.

A final and potentially vital comment is that you may encounter strange behavior having deterministic nodes (e.g. out[i, 1:4], where I've inserted i so you have unique nodes) assigned from a function that makes random draws (your imputation function).  If you aren't sure you know what you're doing in nimble, that could break nimble's MCMC system because the MCMC system caches values of intermediate calculations.  If those calculations are stochastic, what is cached may change, and the logic of the MCMC might break. I would have to grasp more of what you are trying to do to be sure, but my guess is it won't work correctly.  However, what we usually see when people want to impute missing values is they let those be handled by nimble's MCMC engine.  The stochastic declarations are included in the model code, and if the data is provided as NA, it is treated as missing and the node is sampled by MCMC.   If you need a different kind of imputation to happen, it might be helpful to dig deeper into how things are working in nimble, and we're happy to answer questions if we can.

-Perry

Zoe Luo

unread,
Jul 9, 2022, 10:14:53 PM7/9/22
to nimble-users
Hi Perry,

In fact, I have tried 'R[middle[i]] ~ dbinom(size=1, prob=frac)'  but nimble returns an error messages

Defining model
Error : Error processing model: something is wrong with the index middle[i].
Indexing in model code requires this syntax: '(start expression):(end expression)'.
Error: Error occurred defining  R[middle[i]]

As a node can't be deterministic and stochastic at the same time, I suppose I can't have the following as well?

 x[rest[i]] <- out[1]
 x[rest[i]] ~ dcat(prob=out[2:4])

If my understanding is correct, I could let the imputation function only returns the vector 'p', and keep the second line 'x[rest[i]] ~ dcat(prob=p)', then I won't have a deterministic node assigned from a function that makes random draws. However, I'm not sure if the calculation of mean in the modelling part (mu[j] <- beta_0 + beta_1*x[members[j]]) would have a problem when x is a stochastic node at [members[j]]. 

Basically, I'm trying to fit a LMM to each nuclear family (which contains two parents and two children). There are some individuals whose y (phenotype) and x (genotype) are always observed, and some individuals whose y is observed, but x is missing with probability 1-frac (this is described in the sampling part). The variable x is a categorical variable with three possible outcomes (0,1,2 which are equivalent to 1,2,3 in dcat). For a family member with missing genotype, the probability of its outcome depends on its relatives if the genotype of any relatives is observed, otherwise it depends on the constant maf. The imputation function returns the genotype probability of the i-th individual under different conditions, e.g. if i is a child, the functions returns the probability under conditions if  genotype of mother/father is observed, or genotype of both parents are observed or genotype of its sibling is observed or genotypes of all relatives are missing. I hope this makes sense and thank you very much for looking into the code.

Best,
Zoe

Perry de Valpine

unread,
Jul 11, 2022, 10:08:37 AM7/11/22
to Zoe Luo, nimble-users
Hi Zoe,

Thanks for explaining.  It looks like you're getting up to speed on nimble for the first, and welcome to the community.  A good strategy for a model with lots of pieces and things confusingly not working is to build up toy models bit by bit to get the mechanics working even if the models make no statistical sense until the end.  I started to try that as follows:

code1 <- nimbleCode({

  for (i in 1:n_middle) {
    R[middle[i]] ~ dbinom(size=1, prob=frac) # sampling indicator of the non-extreme individuals
  }
})
m1 <- nimbleModel(code = code1, constants = constants, data = data, inits = inits)

I see a problem here, and it is that you have provided middle as data instead of constants.  data are values that could potentially change. constants are values used to define model structure and will not change. In this case, indexing nodes declared as indexed elements of your R variable must not change, so they must be in constants.

constants$middle <- data$middle
data$middle <- NULL
m1 <- nimbleModel(code = code1, constants = constants, data = data, inits = inits) # WORKS!

Next I tried building up a bit more:
code2 <- nimbleCode({
  beta_0 ~ dnorm(50, sd = 10)
  beta_1 ~ dnorm(0, sd = 10)
  sigma_g2 ~ dunif(0, 1000)
  sigma_e2 ~ dunif(0, 1000)

  for (i in 1:n_middle) {
    R[middle[i]] ~ dbinom(size=1, prob=frac) # sampling indicator of the non-extreme individuals
  }
  for (i in 1:n_sample) {
    x[sample[i]] <- x_raw[sample[i]]
  }
})
constants$middle <- data$middle
constants$sample <- data$sample
constants$rest <- data$rest # I can see the same issue ahead for "rest"
data$middle <- NULL
data$sample <- NULL
data$rest <- NULL
m2 <- nimbleModel(code = code2, constants = constants, data = data, inits = inits) # WORKS

Next I ran into issues where you'll have to rearrange some of your indexing scheme so I can't really check things
but I hope I can point in the right direction.  I think you want something like this:

code3 <- nimbleCode({
  beta_0 ~ dnorm(50, sd = 10)
  beta_1 ~ dnorm(0, sd = 10)
  sigma_g2 ~ dunif(0, 1000)
  sigma_e2 ~ dunif(0, 1000)

  for (i in 1:n_middle) {
    R[middle[i]] ~ dbinom(size=1, prob=frac) # sampling indicator of the non-extreme individuals
  }
  for (i in 1:n_sample) {
    x[sample[i]] <- x_raw[sample[i]]

  }
  for (i in 1:n_rest) {
    x[rest[i]] ~ dnorm(0, 1) # not the model of interest, but will let you work on later
  }

  # I think you'll want something like this.
  # I don't really think I got this right, but I hope it's a start.
  for(j in 1:4) {
    for(k in 1:4) {
      Xi[i,j] <- kinship_cov[j,k] + (j==1)*sigma_g2 + (j==k)*sigma_e2
    }
  }
  for (i in 1:nf) {
    y[i, 1:4] ~ dmnorm(mu[1:4],cov=Xi[1:4,1:4]) # rearrange y in this order
    #y[start[i]:end[i]] ~ dmnorm(mu[1:4],cov=Xi[1:4,1:4]) # Alternative if you make y a vector
  }
})

In piece-by-piece mode, code3 skips the meaningful content for the imputed values and uses a toy placeholder.  I'm
focusing here on the data arrangement for dmnorm.  For a vector node, the values must be contiguous, so you will
have to rearrange your indexing scheme for y to make one of the above lines for contiguous groups of four elements of y possible.

I looked at your "rel" matrix ("kinship" in your model code) and it looks like it is a huge matrix that provides just the
same information of (1, .5, .5, .5) within each family.  If that's the case, is it correct that you could form a single
Xi matrix and use it for all families?  That's what I tried.  Also (whether the same for all or not), I think rel and FID are both 
constant, so you could set up a variable resulting from processing the nested indexing (rel indexed by FID elements, but you'll have to rearrange for the new y 
format) outside the model.  I have assumed here you can set up kinship_cov and then add the pieces that depend on nodes
sigma_g2 and sigma_e2, but I'm not really sure I got it just right.

Finally, to get the imputation arranged, I am thinking you want something like:

code4 <- nimbleCode({
  beta_0 ~ dnorm(50, sd = 10)
  beta_1 ~ dnorm(0, sd = 10)
  sigma_g2 ~ dunif(0, 1000)
  sigma_e2 ~ dunif(0, 1000)

  for (i in 1:n_middle) {
    R[middle[i]] ~ dbinom(size=1, prob=frac) # sampling indicator of the non-extreme individuals
  }
  for (i in 1:n_sample) {
    x[sample[i]] <- x_raw[sample[i]]

  }
  for (i in 1:n_rest) {
    out[i, 1:4] <- imputation(<rearrange inputs and outputs to return the vector of prob>)
    x[rest[i]] ~ dcat(prob=out[i, 2:4])
  }

  # I think you'll want something like this.
  # I don't really think I got this right, but I hope it's a start.
  for(j in 1:4) {
    for(k in 1:4) {
      X[i,j] <- kinship_cov[j,k] + (j==1)*sigma_g2 + (j==k)*sigma_e2
    }
  }
  for (i in 1:nf) {
    y[i, 1:4] ~ dmnorm(mu[1:4],cov=Xi[1:4,1:4]) # rearrange y in this order
    #y[start[i]:end[i]] ~ dmnorm(mu[1:4],cov=Xi[1:4,1:4]) # Alternative if you make y a vector
  }
})

The idea is to use a nimbleFunction (imputation) to return the vector of probabilities and use the dcat in the model to get MCMC working.  Provide those values of X as data with NA values, and provide valid inits for them (with NAs where there really are data).  In the mode of trying things bit by bit, you might even want to try this in its own two model until you get it working.

I hope this helps you get going.  I'm sorry to say I won't have time soon for another deep dive at this level into your code.  We have lots of training material here: https://github.com/orgs/nimble-training/repositories.  If you want to work through extended material, the most recent multi-day workshop is the one labeled lisbon-2022.  Please feel free to ask more questions, but it does look like you would get a lot out of working through some of our training and documentation first.

Perry


Zoe Luo

unread,
Jul 13, 2022, 3:04:21 AM7/13/22
to nimble-users
Hi Perry,

Thanks for the explanation and link to the trainning material, they are very helpful and I have managed to make the code work (the results also looks correct)! 

I just have one more question about the output. 'samples <- runMCMC(CglmmMCMC, niter = 10000)' only gives me the four parameters, but not the missing values in x. Here are my lists

data <- list(y=y,x=x,R=R)
constants <- list(x_raw=x_raw,kinship=rel,middle=middle,sample=sample,rest=rest,FID=FID,
                  maf=0.2,nf=nf,frac=(N/2-length(extreme))/length(middle),n_middle=length(middle),
                  n_sample=n_sample,n_rest=n_rest,col_idx=col_idx,row_idx=row_idx)
inits <- list(beta_0=50, beta_1=0, sigma_g2=2, sigma_e2=2,x=sim$genotype+1)

Is it possible to get the samples for x or just the missing value in x?

Many thanks,
Zoe
LMM_nimble.RData
nimble_lmm.R

Perry de Valpine

unread,
Jul 15, 2022, 1:19:35 PM7/15/22
to Zoe Luo, nimble-users
Hi Zoe,

In the MCMC configuration step (configureMCMC and/or buildMCMC) you can change the monitors, which is the set of variables that will be recorded in the output.  Monitoring is done by variable name, so you have to record all of x.  You can't record just parts of it.  There is also a second set of monitored variables, monitors2, which has its own thinning interval, thin2.  In cases where a variable of interest such as a set of latent states / random effects is much larger than the coefficients / model parameters of interest, it can work well to monitor the large variable(s) in monitors2 with a higher thinning interval.

The "cheat sheet" on our documentation page is another resource that might be helpful.

HTH

Perry


Reply all
Reply to author
Forward
0 new messages