John et al.,
Thank you for your advice below. It was sloppy of me not to verify my reproducible code. I have tried a few of your suggestions and wrapped the working code into the function pl2 below. The function lands on the right model parameters when I use either optim or nlminb (for nlminb I had to increase the maximum number of iterations).
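For what it is worth, raising the limit for nlminb was only a matter of its control argument, along these lines inside pl2 (the numbers below are just example values):
opt <- nlminb(startVal, fn, control = list(iter.max = 500, eval.max = 1000))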
The function is enormously slow. At first, I created the object rr1 with two calls to sapply(). This works, but creates an extremely large matrix at each iteration.
library(statmod)
## simulated 0/1 response data: 2000 persons by 20 items
dat <- replicate(20, sample(c(0, 1), 2000, replace = TRUE))
a <- b <- rep(1, 20)
## quadrature nodes and weights for a standard normal latent trait
Q <- 10
qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma = 1)
nds <- qq$nodes
wts <- qq$weights
## Q x nrow(dat) matrix: weighted likelihood of each response pattern at each node
rr1 <- sapply(1:nrow(dat), function(j)
  sapply(1:Q, function(i)
    exp(sum(dbinom(dat[j, ], 1, 1/(1 + exp(-1.7 * a * (nds[i] - b))), log = TRUE))) * wts[i]))
So, to reduce memory a bit, I did it this way instead, which is equivalent (each element of rr1 is just the column sum of the matrix above), does not create such a large matrix, and uses an explicit loop. Both approaches are still equally slow.
rr1 <- numeric(nrow(dat))
for(j in 1:length(rr1)){
  rr1[j] <- sum(sapply(1:Q, function(i)
    exp(sum(dbinom(dat[j, ], 1, 1/(1 + exp(-1.7 * a * (nds[i] - b))), log = TRUE))) * wts[i]))
}
As you noted, my likelihood is not complex; in fact, I have another program that uses Newton-Raphson with the analytic first and second derivatives because they are so easy to find, and in that program the model converges very quickly. My purpose in using numerical differentiation is experiential in some respects, and I am hoping to apply this to problems for which the analytic derivatives might not be so easy to come by.
I think the basic idea for improving speed here is to supply the gradient, which I understand to be the vector of first derivatives of my likelihood function; is that right? If so, I am not sure how to think about the gradient in a multi-parameter problem. Since I am maximizing with respect to a and b (the parameters of the model), I would have a vector of first partials for a and another for b, so I do not conceptually understand what the gradient would be in this instance. Some clarification would be helpful; my guess is sketched below.
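To make my confusion concrete, here is my guess at what this might look like inside pl2, in place of the current call to the optimizer, computing the gradient numerically with the numDeriv package (please correct me if this is the wrong idea):
library(numDeriv)
## my guess: the gradient is a single vector of length 2 * ncol(dat),
## the partials w.r.t. a followed by the partials w.r.t. b, in the same
## order as the parameter vector that fn receives
gr <- function(params) grad(fn, params)
opt <- optim(startVal, fn, gr = gr, method = "BFGS", hessian = TRUE)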
Below is the working function, which, as I noted, is enormously slow. Any advice on speed improvements here would be helpful. Thank you.
pl2 <- function(dat, Q, startVal = NULL, ...){
  ## a and b are estimated for every item, so there are 2 * ncol(dat) parameters
  if(!is.null(startVal) && length(startVal) != 2 * ncol(dat)){
    stop("Length of argument startVal not equal to the number of parameters estimated")
  }
  if(is.null(startVal)){
    p <- colMeans(dat)
    startValA <- rep(1, ncol(dat))
    startValB <- as.vector(log((1 - p)/p))
    startVal <- c(startValA, startValB)
  }
  rr1 <- numeric(nrow(dat))
  qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma = 1)
  nds <- qq$nodes
  wts <- qq$weights
  dat <- as.matrix(dat)
  ## negative marginal log-likelihood, integrating over the latent trait by quadrature
  fn <- function(params){
    a <- params[1:ncol(dat)]
    b <- params[(ncol(dat) + 1):(2 * ncol(dat))]
    for(j in 1:length(rr1)){
      rr1[j] <- sum(sapply(1:Q, function(i)
        exp(sum(dbinom(dat[j, ], 1, 1/(1 + exp(-1.7 * a * (nds[i] - b))), log = TRUE))) * wts[i]))
    }
    -sum(log(rr1))
  }
  #opt <- optim(startVal, fn, method = "BFGS", hessian = TRUE)
  opt <- nlminb(startVal, fn)
  #opt <- Rcgmin(startVal, fn)
  opt
  #list("coefficients" = opt$par, "LogLik" = -opt$value, "Std.Error" = sqrt(diag(solve(opt$hessian))))
}
dat <- replicate(20, sample(c(0,1), 2000, replace = T))
r2 <- pl2(dat, Q = 10)