Hi all,
I preface
this with: I am totally new to using INLA and inlabru.
I have built a random forest model that has predicted the presence/absence of
species across 16 different marine sites. The sites are spread over around 3
degrees of longitude and latitude, and each site is around 40-70km away from
another site.
Although there are 16 sites, daily presence absence has been predicted for each
site (binary presence/absence data). Thus, for instance, for the first year of
data I have around 477 rows of data (some days missing for some sites)
including presence/absence, latitude, longitude.
I am interested in using R-INLA and inlabru to understand two things:
Firstly,
I'm interested in understanding whether there is spatial autocorrelation in the
data. Meaning if I have a presence at a site, do am I also more likely to have
a presence at a nearby site? I have already used joint-count statistics
for this, but I think using INLA would be even better.
Secondly, I
was starting to think I could use INLA/inlabru to expand the point data to the
region I'm working on to predict where the species is likely to be present/absent. I could combine the point data with environmental data
(chlorophyll,temperature,oxygen etc.) to predict the density surface of the
species across time (I have daily environmental variables thus I expect you
could predict daily density surfaces for the species).
There are a number of things that are unclear to me. For instance, given that I
have daily data, do I need to fit many different models? Also, Although the spatial
lat/lon for daily data for each site differ slightly, they are basically
representative of the same location, does this mean I should just average
these? Even the structure of the model formula and all of the hyperparameters are eluding me at the moment.
For the spatial autocorrelation I have done the following, and I'm not even
sure this is right:
# script to test for spatial autocorrelation
# converting to sf object
df = st_as_sf(df, coords = c("longitude","latitude"))
# ensuring CRS is correct
st_crs(df) <- "EPSG:4326"
# iporting map
map2 = ne_countries("large", country = "Maldives",
returnclass = "sf")
# cropping map
map2 = st_crop(map2, xmin = 72, xmax = 74, ymin = -2, ymax = 2)
# checking projection
st_crs(df)
st_crs(map2)
# mesh creation
coords <- st_coordinates(df)
# Extract longitude values from the geometry column
longitudes <- st_coordinates(df)[, 1]
# Calculate the median longitude
median_lon <- median(longitudes)
# Determine the UTM zone and construct the UTM CRS string
utm_zone <- 30 + floor((median_lon + 6) / 6)
utm_crs <- paste0("EPSG:32", ifelse(median_lon > 0, "6",
"7"), utm_zone)
# Reproject your data to the UTM CRS
# changes from latlon
data_utm <- st_transform(df, crs = utm_crs)
# extracting coordinstes
coords <- st_coordinates(data_utm)
# creating inla mesh
mesh = inla.mesh.2d(loc = coords, max.edge = c(50000, 50000))
# mapping spatial locations in the mesho to the observed datapoints
# A matrix maps the Gaussian Markov Random Field (GMRF) from the mesh nodes to
the n observation location
A <- inla.spde.make.A(mesh=mesh,loc=as.matrix(coords));dim(A)
# creating spatial structure (SPDE object) - leave alpha as 2
spde <- inla.spde2.matern(mesh, alpha=2)
# creating required indexes
iset<-inla.spde.make.index("spatial.field", spde$n.spde)
# creating inla stack, I need to check if all this is correct
# this includes variables of interest
# the GMRF stack
stk <- inla.stack(
data=list(y=df$binary_presence), # adding binary variable to stack
A=list(A,1),
effects=list(iset,
list(year = df$year)),
tag='dat')
# model formula
formula <- y ~ + 1 + f(spatial.field, model=spde) + year
# fitting model
model1 <- inla(formula, data=inla.stack.data(stk,spde=spde),
family= 'binomial',
control.predictor=list(A=inla.stack.A(stk),compute=TRUE),
control.compute =
list(dic = TRUE, waic = TRUE, config = TRUE),
verbose = FALSE)
# Having a look at the spatial field
gproj <- inla.mesh.projector(mesh, dims = c(300, 300))
g.mean <- inla.mesh.project(gproj, model1$summary.random$spatial.field$mean)
g.sd <- inla.mesh.project(gproj, model1$summary.random$spatial.field$sd)
# what does this representation of the spatial field actually show me?
grid.arrange(levelplot(g.mean, scales=list(draw=F), xlab='', ylab='',
main='mean',col.regions = heat.colors(16)),
levelplot(g.sd,
scal=list(draw=F), xla='', yla='', main='sd' ,col.regions = heat.colors(16)),
nrow=1)
If someone could help with both questions it would be amazing. I am sure that
the packages are able to do exactly what I need to do, but I just need a little
help.
Cheers,
Davide