Measurement error model in INLA

robi

unread,

Dec 10, 2012, 5:14:09 PM12/10/12

to r-inla-disc...@googlegroups.com

Dear list,

I am trying to use R-INLA for a measurement error problem. However, I am struggling to formulate it properly, so I would appreciate any help you could give.

The model is something like this:

log(y) = a + b*log(x)

However, we don't have measurements of x itself. Instead, we have 3 approximations of x with error (w). So w_i = x + eps_i.

So this is a basic measurement error problem. The only additional complexity is that both y and x are count data.

I would like to simultaneously estimate x and then use that estimate in the equation above to calculate a and b

My first thought was to use the copy function. I defined the formula as

formula = cbind(x, y)~ 0 + trt + f(j, model='iid', hyper=list(theta=list(fixed=FALSE))) +
                    f(j2, copy='j', hyper=list(theta=list(fixed=FALSE))),

where j and j2 are indices for the replicates of x and y, and are NA when x and y are NA. trt is a label for whether each datapoint is an x or a y.

The problem I see here is that I think this is fitting the equation
log(y) = a + b*log(v)
where v[j] = log(x[j]) - mean(log(x))

Is there anyway of fitting this sort of model in inla? It seems like something it could do, but I can't figure it out. My knowledge of the tools to manipulate models is still fairly poor. Apologies if this is trivial.

I have code that simulates sthe sort of data we have (it's a little more complex than the description above, but not much). I have attached this in case it is useful.

Many thanks in advance for any help.

best wishes
Robi

## Below is a simulation of the type of data we have.

## There are 3 measurements of x, w_i (i=1:3) for each of 36 replicates,(j=1:36)

## For each replicate, we measured y under k treatments to provide y_jk.
## For simplicities sake, let us assume that k = 2,

## I have simulated different effects of density under the two treatments
## For one, x has no effect (b=1) but for the other x as a strongeffect (b=0.8)

## here is the simulation:
## a function that takes lambda, the expecation of x,
## also a and b to simulate y data
## this simulates x_i and y_i
## Note that x_i for another call of sdlgen with the same
## lambda will give a 'measurement' of x, called w_i
sdlgen <- function(lambda, a, b, model='nb'){
## (default) negative binomial with parameters size=1, p=1/(1+lambda)
if(model=='nb')
    n0=rnbinom(n=length(lambda), mu=lambda, size=1) ## simulated x_ij
else
    ## otherwise poisson with parameters lambda=lambda
    n0=rpois(n=length(lambda), lambda)

## now simulate y as y_ij = a*x_ij^b
n1 <- rbinom(length(lambda), size=n0, prob=a*n0^(b-1))
return(cbind(n0,n1))                  # return both the N0 and the N1
}
## set parameters for the simulation
a <- 0.1; b1 <- 1; b2 <- 0.8 ## b=effect of x
                                        # b1 and b2 are two treatments,
                                        # b1 has no effect,
                                        # b2 has a small one
j <- 1:36                ## vector of replicates, j
lambda <- exp(rnorm(36, mean=5)) ## expected x for each station, lambda_j;

## looking at the expected curves
curve(a*x^b1, from=1, to=1000)
curve(a*x^b2, from=1, to=1000, add=T, lty=2) ## note b2 gives lower y at high x

## now simulate the "observed" data
## simulate x 3 times to get w_ij
## note for w_ij we take the n0 instead of n1 from the simulations
w <- Reduce(function(x, y) return(c(x, y)),
            x=replicate(3, sdlgen(lambda=lambda, a=1, b=1)[,'n0'], simplify=F))

y1 <- sdlgen(lambda=lambda, a=a, b=b.a) ## first y, with NO effect of x
y2 <- sdlgen(lambda=lambda, a=a, b=b.c) ## second y with effect of x
colnames(y1) <- c('x1', 'y1') ## note we keep the 'real' x for comparison
colnames(y2) <- c('x2', 'y2')
## overplot these on the curves
points(y1, pch=1)
points(y2, pch=2)

## put all the data into a dataframe for analysis with INLA - padding with NAs as appropriate
dat.t <- data.frame(j=rep(j, 5), lambda=rep(lambda, 5),
                    trt=factor(c(rep('w', 36*3), rep('y1', 36), rep('y2', 36)),
                      levels=c('w', 'y1', 'y2')),
                    w=c(w, rep(NA, 36*2)),
                    y1=c(rep(NA, 36*3), y1[,2], rep(NA, 36)),
                    y2=c(rep(NA, 36*4), y2[,2]))
## NOTE trt=T: the seeds in adjacent seed traps (3 per station, j)
##      trt=A: recruits in seedling plots when there is no effect of density (1 per station, j)
##      trt=C: recruits in seedling plots when there is (1 per station, j)

dat.t$j1 <- c(rep(j, 3), rep(NA, 36*2))     ## index for replicate for w
dat.t$j2 <- c(rep(NA, 36*3), j, rep(NA, 36)) ## index for replicate for trt 1
dat.t$j3 <- c(rep(NA, 36*3), rep(NA, 36), j) ## index for replicate for trt 2

## Now fit the (wrong in this case) model
library(INLA)

m.inla <- inla(cbind(w, y1, y2)~trt + f(j1, model='iid', hyper=list(theta=list(fixed=F))) +
                      f(j2, copy='j1', hyper=list(theta=list(fixed=FALSE))) +
                      f(j3, copy='j1', hyper=list(theta=list(fixed=FALSE))),
                      family=rep('nbinomial', 3), data=dat.t, control.compute=list(dic=TRUE))
m.inla$summary.hyper

robi

unread,

Dec 18, 2012, 6:04:38 AM12/18/12

to r-inla-disc...@googlegroups.com

Dear list,

Since I first sent this mail, Håvard has given me a lot of help off-list on this topic. A model can be defined in INLA to address this sort of analysis, but first a redefinition of the problem might be helpful as my initial post was a little confusing.

we assume that x is unobserved but w is and is related to x by

w ~ Pois(exp(x))

However, we are primarily interested in the relationship between x and another variable poisson distributed variable y, and want to fit

log(y) = a + bx

so

y ~ Pois(exp(a + bx))

The model is then easily fitted with the following code from Håvard:

## estimate 'a' and 'b'

## w ~ Pois(exp(x))

## y ~ Pois(exp(a + b*x))

n = 1000

x = rnorm(n)

a = 1.1

b = 2.2

w = rpois(n, lambda = exp(x))

y = rpois(n, lambda = exp(a + b*x))

Y = c(w, y)

X = c(1:n, rep(NA, n))

XX = c(rep(NA, n), 1:n)

A = c(rep(NA, n), rep(1, n))

formula = Y ~ -1 + A +

f(X, model="iid") +

f(XX, copy="X", hyper = list(theta = list(fixed=FALSE)))

r = inla(formula,

data = data.frame(Y, X, XX, A),

family = "poisson")

summary(r)

This works beautifully, giving good estimates of w, a and b. Thanks to Håvard for all the help,

Hope this is useful to someone. I'm happy to share the code for a more complex simulation with anyone who is interested.

Best wishes

Robi

muff....@gmail.com

unread,

Feb 14, 2013, 2:13:01 PM2/14/13

to r-inla-disc...@googlegroups.com

Hi Robi

We have just submitted a manuscript about error modelling with INLA, see Muff et al. (2013). Perhaps this is also if interest to you, or to other members of the list.

Best wishes
Steffi

Reply all

Reply to author

Forward