Fitting SCR models using RTMB


Rishika Chopara

Jun 30, 2025, 1:39:46 AM
to TMB Users

I am currently trying to use RTMB to fit SCR (spatial capture-recapture) models. I am using the following objective function: 

library(RTMB)   # DataEval(), ADREPORT()
library(rlang)  # env_get()

objective.fn = function(params) {

  # --------------------------------------

  # Pulling in 'data.vec': 'data.env' is an environment created outside this
  # function, and DataEval() fetches its current contents when the tape is
  # evaluated, which is what lets the data be swapped between fits
  getDat = function(null) {
    print("Get data")
    env_get(data.env, "data")
  }
  # Zero-length vector built from a parameter (an AD object) to pass to DataEval()
  empty = rep(params$beta0, 0)
  data.vec = DataEval(getDat, empty)

  # --------------------------------------

  # Unpacking parameters
  beta0 = params[["beta0"]]
  logit.g0 = params[["logit.g0"]]
  log.sigma = params[["log.sigma"]]
  g0 = exp(logit.g0)/(1 + exp(logit.g0))
  sigma = exp(log.sigma)

  # Unpacking data ('data' is the fixed data list found in the enclosing
  # environment; it does not change between fits)
  nocc = data[["n.occasions"]]
  mask.dists = data[["mask.dists"]]
  n.traps = ncol(mask.dists)
  n.mask = nrow(mask.dists)
  mask = data[["mask"]]
  a = attr(mask, "area")

  # Unpacking 'data.vec' ('M' is the augmentation size, also taken from the
  # enclosing environment)
  animal.indicator = data.vec[1:M]
  capt.hist = matrix(data.vec[-(1:M)], ncol = n.traps)
  n = sum(animal.indicator)   # number of observed animals

  # --------------------------------------

  ## Negative log-likelihood calculations

  # Density at each mask point (homogeneous density)
  D = rep(exp(beta0), n.mask)

  # Detection probability calculations
  mask.probs = g0 * exp( -mask.dists^2 / (2 * sigma^2) )
  # Probability an animal at each mask point avoids every trap on all occasions
  p.avoid = apply(1 - mask.probs, 1, prod)^nocc
  p.det = 1 - p.avoid

  # E[n]: expected number of detected animals (cell area x density x detection
  # probability, summed over the mask)
  esa.mult.D = sum(a * D * p.det)

  # Contribution of each capture history to the log-likelihood
  mask.integrand = exp(log(mask.probs) %*% t(capt.hist) +
                       log(1 - mask.probs) %*% t(nocc - capt.hist) +
                       log(D) - log(esa.mult.D))
  log.lik.contr = log(colSums(mask.integrand))

  # Zero out contributions from the artificial (unobserved) animals, then sum
  log.f.capt = sum(log.lik.contr * animal.indicator)

  # Log-likelihood contribution for n
  log.f.n = dpois(n, esa.mult.D, log=TRUE)

  # Overall log-likelihood
  ll = log.f.n + log.f.capt

  # Calculating the negative log-likelihood
  nllike = -ll

  # --------------------------------------

  ADREPORT(g0)
  ADREPORT(sigma)
  nllike
}

I have written this objective function so that I can reuse the output from MakeADFun() to fit my model to multiple data sets. I understand that, when updating data on an existing tape, I must work with a numeric vector of fixed length. However, the data I want to update (the capture history matrix) often changes dimension: for each data set, the number of observed animals determines the number of rows in the matrix.

To get around this, I select a value of M large enough that the probability of observing more than M animals is essentially 0. I then fit my model to a capture history matrix with M rows, where the last M-n rows represent artificial, unobserved animals that I add to the data manually.

Specifically, the first M elements of the vector 'data.vec' form an indicator vector, and the remaining elements hold the corresponding capture history matrix. This matrix has M rows, where the last M-n rows are all-zero capture histories. Rows representing observed animals correspond to a value of 1 in the indicator vector, and the artificial rows correspond to 0. I use the indicator vector in my calculations so that the unobserved animals contribute nothing to the final value of the log-likelihood.
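For example, the packing and one-off setup could look something like this (a minimal sketch rather than my exact code; 'capt' stands for a simulated capture history with n rows and n.traps columns, 'params.start' for starting values, and pack.data is a hypothetical helper):

library(RTMB)

# Hypothetical helper: pad a capture history ('capt', n rows x n.traps columns)
# out to M rows with all-zero histories, and prepend the indicator vector
# (1 = observed animal, 0 = artificial animal)
pack.data = function(capt, M) {
  n = nrow(capt)
  capt.aug = rbind(capt, matrix(0, nrow = M - n, ncol = ncol(capt)))
  # as.numeric() flattens column by column, matching how the objective
  # function rebuilds the matrix with matrix(..., ncol = n.traps)
  c(rep(1, n), rep(0, M - n), as.numeric(capt.aug))
}

# Environment that the DataEval() call in the objective function reads from
data.env = new.env()
assign("data", pack.data(capt, M), envir = data.env)

# Tape once; the fixed data list 'data' and the augmentation size 'M' are
# picked up from the enclosing environment by the objective function
obj = MakeADFun(objective.fn, params.start)
opt = nlminb(obj$par, obj$fn, obj$gr)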


I have two main questions:

1) Are there ways I can make my objective function more efficient?

2) I would like to fit my SCR model to many simulated data sets. The approach described above is my best attempt at providing the data as a vector of fixed length, so that I do not have to re-run MakeADFun() for every data set. Does this seem like a reasonable approach, or are there other recommended solutions?


Let me know if I can clarify anything. Thank you in advance! 

Mollie Brooks

Aug 8, 2025, 8:23:30 AM
to Rishika Chopara, TMB Users
Hi Rishika,

Sorry this question slipped by over the summer. Maybe you fixed it already.

I might be missing some of the complexity that is needed, but it seems like things could be simplified. I made some comments below.



On 30 Jun 2025, at 07.39, 'Rishika Chopara' via TMB Users <tmb-...@googlegroups.com> wrote:


I am currently trying to use RTMB to fit SCR (spatial capture-recapture) models. I am using the following objective function: 

objective.fn = function(params) {

Instead of the unpacking parts below, why not store all the objects in lists named "dat" and "par" and then use getAll(dat, par) near the top of the objective function? That seems to be the standard approach from what I've seen.

  # --------------------------------------

  # Pulling in 'data.vec'
  getDat = function(null) {
    print("Get data")
    env_get(data.env, "data")
  }
  empty = rep(params$beta0, 0)
  data.vec = DataEval(getDat, empty)

  # --------------------------------------

  # Unpacking parameters
  beta0 = params[["beta0"]]
  logit.g0 = params[["logit.g0"]]
  log.sigma = params[["log.sigma"]]
  g0 = exp(logit.g0)/(1 + exp(logit.g0))
  sigma = exp(log.sigma)

  # Unpacking data
  nocc = data[["n.occasions"]]
  mask.dists = data[["mask.dists"]]
  n.traps = ncol(mask.dists)
  n.mask = nrow(mask.dists)
  mask = data[["mask"]]
  a = attr(mask, "area")

Would it be possible to organize the data outside of the objective function, so that animal.indicator and capt.hist are separate objects in the "dat" list I suggested above?
I looked at my own project where I use the same objective function for many data sets, and it looks like I always rerun MakeADFun with every new data set. Maybe someone else will chime in and suggest something else.
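For example, the same likelihood could be written roughly like this (an untested sketch based on your function, not something I have run; 'dat' would hold n.occasions, mask.dists, mask, capt.hist, and animal.indicator, and 'par' the three parameters):

library(RTMB)

objective.fn = function(par, dat) {
  getAll(dat, par, warn = FALSE)   # list elements become local variables

  # Transformed parameters
  g0 = exp(logit.g0) / (1 + exp(logit.g0))
  sigma = exp(log.sigma)

  # Dimensions and mask area taken straight from the data objects
  n.traps = ncol(mask.dists)
  n.mask = nrow(mask.dists)
  a = attr(mask, "area")

  # Homogeneous density at each mask point
  D = rep(exp(beta0), n.mask)

  # Detection probabilities
  mask.probs = g0 * exp(-mask.dists^2 / (2 * sigma^2))
  p.det = 1 - apply(1 - mask.probs, 1, prod)^n.occasions

  # E[n]
  esa.mult.D = sum(a * D * p.det)

  # Capture-history contributions; artificial rows are zeroed via the indicator
  mask.integrand = exp(log(mask.probs) %*% t(capt.hist) +
                       log(1 - mask.probs) %*% t(n.occasions - capt.hist) +
                       log(D) - log(esa.mult.D))
  log.f.capt = sum(log(colSums(mask.integrand)) * animal.indicator)

  # Contribution for n
  log.f.n = dpois(sum(animal.indicator), esa.mult.D, log = TRUE)

  ADREPORT(g0)
  ADREPORT(sigma)
  -(log.f.n + log.f.capt)
}

# Re-taping for each data set, which is what I do in my own projects:
obj = MakeADFun(function(p) objective.fn(p, dat), par)
opt = nlminb(obj$par, obj$fn, obj$gr)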

Best wishes,
Mollie





Rishika Chopara

Aug 12, 2025, 5:29:22 AM
to TMB Users
Thank you for your reply, I appreciate it! 

So far, I haven't organised the data outside of the objective function with 'animal.indicator' and 'capt.hist' as separate objects, because it seems that any data I want to update when reusing the tape must be provided as a numeric vector of fixed length. Each time I fit the model, I want to update both 'animal.indicator' and 'capt.hist', so I provide them together in the same fixed-length vector 'data.vec'. Feel free to let me know if I can clarify this explanation :)
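Concretely, the reuse I am hoping for looks something like this (a minimal sketch; pack.data is the hypothetical packing helper from my earlier sketch, and sim.capt.histories is a placeholder list of simulated capture histories):

# Reuse the same 'obj' (and tape) for every simulated data set: only the
# contents of 'data.env' change between fits, so MakeADFun() is not re-run
fits = lapply(sim.capt.histories, function(capt) {
  assign("data", pack.data(capt, M), envir = data.env)
  nlminb(obj$par, obj$fn, obj$gr)
})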

Mollie Brooks

Aug 18, 2025, 6:26:24 AM
to Rishika Chopara, TMB Users
Hi Rishika,

I changed an existing example to show that the data input can be a list with vectors of different sizes (see below). Was this the limiting factor in how you organised your data? The only thing I couldn't put in the list in my experience so far was a formula (but I found a different way around that... a separate topic).

cheers,
Mollie

library(RTMB)
data(ChickWeight)

indx1 = 1:500
dat1 = list(weight = ChickWeight[indx1,]$weight,
            Time = ChickWeight[indx1,]$Time,
            Chick = ChickWeight[indx1,]$Chick,
            covariate = 5) # list item with different length just to show flexibility
indx2 = 20:578 # another list with objects of different length from above
dat2 = list(weight = ChickWeight[indx2,]$weight,
            Time = ChickWeight[indx2,]$Time,
            Chick = ChickWeight[indx2,]$Chick,
            covariate = 4) # list item with different length just to show flexibility
parameters = list(
  mua = 0,          ## Mean slope
  sda = 1,          ## Std of slopes
  mub = 0,          ## Mean intercept
  sdb = 1,          ## Std of intercepts
  sdeps = 1,        ## Residual Std
  a = rep(0, 50),   ## Random slope by chick
  b = rep(0, 50),   ## Random intercept by chick
  beta = 0          ## Coefficient on covariate
)

objective.fn = function(parms, dat) {
  getAll(dat, parms, warn = FALSE)
  ## Optional (enables extra RTMB features)
  weight <- OBS(weight)
  ## Initialize joint negative log likelihood
  nll <- 0
  ## Random slopes
  nll <- nll - sum(dnorm(a, mean = mua, sd = sda, log = TRUE))
  ## Random intercepts
  nll <- nll - sum(dnorm(b, mean = mub, sd = sdb, log = TRUE))
  ## Data
  predWeight <- a[Chick] * Time + b[Chick] + beta * covariate
  nll <- nll - sum(dnorm(weight, predWeight, sd = sdeps, log = TRUE))
  ## Get predicted weight uncertainties
  ADREPORT(predWeight)
  ## Return
  nll
}

cmb = function(f, d) function(p) f(p, d)

obj = MakeADFun(cmb(objective.fn, dat1), parameters, random = c("a", "b"))
opt = nlminb(obj$par, obj$fn, obj$gr)
opt$par

obj = MakeADFun(cmb(objective.fn, dat2), parameters, random = c("a", "b"))
opt = nlminb(obj$par, obj$fn, obj$gr)
opt$par

Mollie Brooks

Aug 18, 2025, 6:43:14 AM
to Rishika Chopara, TMB Users
P.S. Since your problem has a matrix in the data, here’s a version with that.


# (library(RTMB), data(ChickWeight), and 'parameters' as in the previous message)

indx1 = 1:500
dat1 = list(weight = ChickWeight[indx1,]$weight,
            Time = ChickWeight[indx1,]$Time,
            Chick = ChickWeight[indx1,]$Chick,
            covariate = 5, # list item with different length just to show flexibility
            mat = matrix(1, nrow = 5, ncol = 4)
)
indx2 = 20:578 # another list with objects of different length from above
dat2 = list(weight = ChickWeight[indx2,]$weight,
            Time = ChickWeight[indx2,]$Time,
            Chick = ChickWeight[indx2,]$Chick,
            covariate = 4, # list item with different length just to show flexibility
            mat = matrix(1, nrow = 4, ncol = 4)
)

objective.fn = function(parms, dat) {
  getAll(dat, parms, warn = FALSE)
  ## Optional (enables extra RTMB features)
  weight <- OBS(weight)
  ## Initialize joint negative log likelihood
  nll <- 0
  ## Random slopes
  nll <- nll - sum(dnorm(a, mean = mua, sd = sda, log = TRUE))
  ## Random intercepts
  nll <- nll - sum(dnorm(b, mean = mub, sd = sdb, log = TRUE))
  ## Data
  predWeight <- a[Chick] * Time + b[Chick] + beta * covariate + mat[1, 1]
  nll <- nll - sum(dnorm(weight, predWeight, sd = sdeps, log = TRUE))
  ## Get predicted weight uncertainties
  ADREPORT(predWeight)
  ## Return
  nll
}