Dear all,
I'm trying to fit a multinomial model (using the Poisson trick) that varies spatio-temporally. With this model, I am trying to predict the probability of a data point being a particular 'authority' (think region/country) given its location and covariates. From this, I will e.g. predict the probability of a particular location being an authority given all other authorities (i.e. normalise across the predicted rates to get a probability).
For each data point I've replicated the rows across the number of alternative Authorities that the data point could be attributed to. A new column (obs) is created that is either 1 or 0, with 1 denoting which of the replicated rows is the observed authority. All other replicated rows within that 'strata' are given a value of 0.
Each observed data point has an associated count - the greater the count, the more confident that this location is assigned to a particular authority. So ideally, I want to weight the likelihood to take this into account. Currently, I have this count logged and included as an offset but I'm not sure if this is the correct approach?
Spatially, the data points denoted to a specific 'authority' are concentrated within a particular part of Britain. For example, one authority is concentrated in the south-west; one in the north-east, etc. Data points with different authorities might overlap, e.g. where two data points are similar spatially but have different authorities, but they will be concentrated largely separately (i.e. there will be a 'buffer' between authorities with a decay from each authority centre).
Temporally, authorities can 'appear' and 'disappear'. For example, one authority might be present between two dates. Then it will no longer be present. Another will be present/not present at different times. I presume that when multiple authorities are present they will 'compete' with one another, i.e. they won't be independent, with a 'buffer' appearing between authorities.
I have been able to model this spatially, but not when including the temporal element (the model fitting fails):
fit_data <- readRDS("M_data_sample.rds")
mesh <- readRDS("M_mesh.rds")
spde <- readRDS("M_spde.rds")
components <- ~
Authority(Authority_new, model = "factor_full") +
Elevation(elevation_km, model = "linear") +
River(rivers_1_dist2_km, model = "linear") +
field(geometry, model = spde, replicate= Authority_new) +
unique_id(location_id, model="iid", fixed = T)
# field, replicate is expected to fit a surface to each Authority_new, assuming that they have different spatial fields. to take into account that they are 'present' in different locations of the country.
fit_poi <- bru(
components = components,
formula = obs ~ -1 + unique_id + field + Authority + Elevation + River + offset(log_count),
family = "poisson",
data = fit_data,
options = list(control.compute = list(dic = FALSE, waic = FALSE, cpo = FALSE)))
# include offset(log_count) to take into account the count associated with each data point. I'm unsure if this is correct, though?
For the spatio-temporal model, I do:
fit_data <- readRDS("M_data_sample.rds")
mesh <- readRDS("M_mesh.rds")
spde <- readRDS("M_spde.rds")
unique_times <- seq(-125.5, 37.5, by = 0.5) # do this so i can then predict authority extents at unobserved time stamps.
time_map <- setNames(seq_along(unique_times), unique_times)
fit_data$time_idx <- time_map[as.character(fit_data$Mid.Date)]
components <- ~
Authority(Authority_new, model = "iid") +
Elevation(elevation_km, model = "linear") +
River(rivers_1_dist2_km, model = "linear") +
field(geometry,
model = spde,
replicate = Authority_new,
group = time_idx,
control.group = list(model = "ar1"), hyper = prec.prior) +
unique_id(location_id, model = "iid", fixed = TRUE)
# for spde, allow it to be grouped by time_idx using an ar1 model, replicated across each Authority.
prec.prior <- list(prec = list(param = c(0.001, 0.001)))
fit_poi5 <- bru(
components = components,
formula = obs ~ -1 + unique_id + field + Authority + Elevation + River + offset(log_count),
family = "poisson",
data = fit_data,
options = list(control.compute = list(dic = FALSE, waic = FALSE, cpo = FALSE), verbose = TRUE))
A summary of my data/full dataset (mesh, spde etc) is attached.
Any help would be greatly appreciated.
Thank you.
Joe