robi
unread,Dec 10, 2012, 5:14:09 PM12/10/12Sign in to reply to author
Sign in to forward
You do not have permission to delete messages in this group
Either email addresses are anonymous for this group or you need the view member email addresses permission to view the original message
to r-inla-disc...@googlegroups.com
Dear list,
I
am trying to use R-INLA for a measurement error problem. However, I am
struggling to formulate it properly, so I would appreciate any help you
could give.
The model is something like this:
log(y) = a + b*log(x)
However, we don't have measurements of x itself. Instead, we have 3 approximations of x with error (w). So w_i = x + eps_i.
So this is a basic measurement error problem. The only additional complexity is that both y and x are count data.
I would like to simultaneously estimate x and then use that estimate in the equation above to calculate a and b
My first thought was to use the copy function. I defined the formula as
formula = cbind(x, y)~ 0 + trt + f(j, model='iid', hyper=list(theta=list(fixed=FALSE))) +
f(j2, copy='j', hyper=list(theta=list(fixed=FALSE))),
where
j and j2 are indices for the replicates of x and y, and are NA when x
and y are NA. trt is a label for whether each datapoint is an x or a y.
The problem I see here is that I think this is fitting the equation
log(y) = a + b*log(v)
where v[j] = log(x[j]) - mean(log(x))
Is
there anyway of fitting this sort of model in inla? It seems like
something it could do, but I can't figure it out. My knowledge of the
tools to manipulate models is still fairly poor. Apologies if this is
trivial.
I have code that simulates sthe sort of data we have (it's a little more complex than the description above, but not much). I have attached this in case it is useful.
Many thanks in advance for any help.
best wishes
Robi
## Below is a simulation of the type of data we have.
## There are 3 measurements of x, w_i (i=1:3) for each of 36 replicates,(j=1:36)
## For each replicate, we measured y under k treatments to provide y_jk.
## For simplicities sake, let us assume that k = 2,
## I have simulated different effects of density under the two treatments
## For one, x has no effect (b=1) but for the other x as a strongeffect (b=0.8)
## here is the simulation:
## a function that takes lambda, the expecation of x,
## also a and b to simulate y data
## this simulates x_i and y_i
## Note that x_i for another call of sdlgen with the same
## lambda will give a 'measurement' of x, called w_i
sdlgen <- function(lambda, a, b, model='nb'){
## (default) negative binomial with parameters size=1, p=1/(1+lambda)
if(model=='nb')
n0=rnbinom(n=length(lambda), mu=lambda, size=1) ## simulated x_ij
else
## otherwise poisson with parameters lambda=lambda
n0=rpois(n=length(lambda), lambda)
## now simulate y as y_ij = a*x_ij^b
n1 <- rbinom(length(lambda), size=n0, prob=a*n0^(b-1))
return(cbind(n0,n1)) # return both the N0 and the N1
}
## set parameters for the simulation
a <- 0.1; b1 <- 1; b2 <- 0.8 ## b=effect of x
# b1 and b2 are two treatments,
# b1 has no effect,
# b2 has a small one
j <- 1:36 ## vector of replicates, j
lambda <- exp(rnorm(36, mean=5)) ## expected x for each station, lambda_j;
## looking at the expected curves
curve(a*x^b1, from=1, to=1000)
curve(a*x^b2, from=1, to=1000, add=T, lty=2) ## note b2 gives lower y at high x
## now simulate the "observed" data
## simulate x 3 times to get w_ij
## note for w_ij we take the n0 instead of n1 from the simulations
w <- Reduce(function(x, y) return(c(x, y)),
x=replicate(3, sdlgen(lambda=lambda, a=1, b=1)[,'n0'], simplify=F))
y1 <- sdlgen(lambda=lambda, a=a, b=b.a) ## first y, with NO effect of x
y2 <- sdlgen(lambda=lambda, a=a, b=b.c) ## second y with effect of x
colnames(y1) <- c('x1', 'y1') ## note we keep the 'real' x for comparison
colnames(y2) <- c('x2', 'y2')
## overplot these on the curves
points(y1, pch=1)
points(y2, pch=2)
## put all the data into a dataframe for analysis with INLA - padding with NAs as appropriate
dat.t <- data.frame(j=rep(j, 5), lambda=rep(lambda, 5),
trt=factor(c(rep('w', 36*3), rep('y1', 36), rep('y2', 36)),
levels=c('w', 'y1', 'y2')),
w=c(w, rep(NA, 36*2)),
y1=c(rep(NA, 36*3), y1[,2], rep(NA, 36)),
y2=c(rep(NA, 36*4), y2[,2]))
## NOTE trt=T: the seeds in adjacent seed traps (3 per station, j)
## trt=A: recruits in seedling plots when there is no effect of density (1 per station, j)
## trt=C: recruits in seedling plots when there is (1 per station, j)
dat.t$j1 <- c(rep(j, 3), rep(NA, 36*2)) ## index for replicate for w
dat.t$j2 <- c(rep(NA, 36*3), j, rep(NA, 36)) ## index for replicate for trt 1
dat.t$j3 <- c(rep(NA, 36*3), rep(NA, 36), j) ## index for replicate for trt 2
## Now fit the (wrong in this case) model
library(INLA)
m.inla <- inla(cbind(w, y1, y2)~trt + f(j1, model='iid', hyper=list(theta=list(fixed=F))) +
f(j2, copy='j1', hyper=list(theta=list(fixed=FALSE))) +
f(j3, copy='j1', hyper=list(theta=list(fixed=FALSE))),
family=rep('nbinomial', 3), data=dat.t, control.compute=list(dic=TRUE))
m.inla$summary.hyper