Modelling of point data (presence/absence) and spatial autocorrelation

210 views

Skip to first unread message

Davide

unread,

Apr 10, 2024, 10:00:27 AM4/10/24

to R-inla discussion group

Hi all,

I preface this with: I am totally new to using INLA and inlabru.

I have built a random forest model that has predicted the presence/absence of species across 16 different marine sites. The sites are spread over around 3 degrees of longitude and latitude, and each site is around 40-70km away from another site.

Although there are 16 sites, daily presence absence has been predicted for each site (binary presence/absence data). Thus, for instance, for the first year of data I have around 477 rows of data (some days missing for some sites) including presence/absence, latitude, longitude.

I am interested in using R-INLA and inlabru to understand two things:

Firstly, I'm interested in understanding whether there is spatial autocorrelation in the data. Meaning if I have a presence at a site, do am I also more likely to have a presence at a nearby site? I have already used joint-count statistics for this, but I think using INLA would be even better.

Secondly, I was starting to think I could use INLA/inlabru to expand the point data to the region I'm working on to predict where the species is likely to be present/absent. I could combine the point data with environmental data (chlorophyll,temperature,oxygen etc.) to predict the density surface of the species across time (I have daily environmental variables thus I expect you could predict daily density surfaces for the species).

There are a number of things that are unclear to me. For instance, given that I have daily data, do I need to fit many different models? Also, Although the spatial lat/lon for daily data for each site differ slightly, they are basically representative of the same location, does this mean I should just average these? Even the structure of the model formula and all of the hyperparameters are eluding me at the moment.

For the spatial autocorrelation I have done the following, and I'm not even sure this is right:

# script to test for spatial autocorrelation

# converting to sf object
df = st_as_sf(df, coords = c("longitude","latitude"))

# ensuring CRS is correct
st_crs(df) <- "EPSG:4326"

# iporting map
map2 = ne_countries("large", country = "Maldives", returnclass = "sf")

# cropping map
map2 = st_crop(map2, xmin = 72, xmax = 74, ymin = -2, ymax = 2)

# checking projection
st_crs(df)
st_crs(map2)

# mesh creation
coords <- st_coordinates(df)

# Extract longitude values from the geometry column
longitudes <- st_coordinates(df)[, 1]

# Calculate the median longitude
median_lon <- median(longitudes)

# Determine the UTM zone and construct the UTM CRS string
utm_zone <- 30 + floor((median_lon + 6) / 6)
utm_crs <- paste0("EPSG:32", ifelse(median_lon > 0, "6", "7"), utm_zone)

# Reproject your data to the UTM CRS
# changes from latlon
data_utm <- st_transform(df, crs = utm_crs)

# extracting coordinstes
coords <- st_coordinates(data_utm)

# creating inla mesh
mesh = inla.mesh.2d(loc = coords, max.edge = c(50000, 50000))

# mapping spatial locations in the mesho to the observed datapoints
# A matrix maps the Gaussian Markov Random Field (GMRF) from the mesh nodes to the n observation location
A <- inla.spde.make.A(mesh=mesh,loc=as.matrix(coords));dim(A)

# creating spatial structure (SPDE object) - leave alpha as 2
spde <- inla.spde2.matern(mesh, alpha=2)

# creating required indexes
iset<-inla.spde.make.index("spatial.field", spde$n.spde)

# creating inla stack, I need to check if all this is correct
# this includes variables of interest
# the GMRF stack
stk <- inla.stack(
data=list(y=df$binary_presence), # adding binary variable to stack
A=list(A,1),
effects=list(iset,
list(year = df$year)), tag='dat')

# model formula
formula <- y ~ + 1 + f(spatial.field, model=spde) + year

# fitting model
model1 <- inla(formula, data=inla.stack.data(stk,spde=spde),
family= 'binomial',
control.predictor=list(A=inla.stack.A(stk),compute=TRUE),
control.compute = list(dic = TRUE, waic = TRUE, config = TRUE),
verbose = FALSE)

# Having a look at the spatial field
gproj <- inla.mesh.projector(mesh, dims = c(300, 300))
g.mean <- inla.mesh.project(gproj, model1$summary.random$spatial.field$mean)
g.sd <- inla.mesh.project(gproj, model1$summary.random$spatial.field$sd)

# what does this representation of the spatial field actually show me?
grid.arrange(levelplot(g.mean, scales=list(draw=F), xlab='', ylab='', main='mean',col.regions = heat.colors(16)),
levelplot(g.sd, scal=list(draw=F), xla='', yla='', main='sd' ,col.regions = heat.colors(16)), nrow=1)

If someone could help with both questions it would be amazing. I am sure that the packages are able to do exactly what I need to do, but I just need a little help.

Cheers,