(When) Is it worth going beyond 15 NNGP ?


Marc Kéry

Apr 11, 2025, 5:56:07 AM
to spOccupancy and spAbundance users
Dear Jeff,

we're fitting some models with spPGOcc and have what appear to be decent results with 5 and 15 NNGP neighbours. As I understand it, the latter is preferable since it is less of an approximation to the full GP model. But when would you explore higher numbers of neighbours for the NNGP, or would you even try to run the model with the full GP?

Thanks for your advice, best regards  --- Marc

Jeffrey Doser

Apr 15, 2025, 7:52:22 AM
to spOccupancy and spAbundance users
Hi Marc, 

Apologies for the delay. As you might expect, the optimal number of neighbors to use for an NNGP approximation depends on a variety of factors. As you increase the number of neighbors in the approximation, the NNGP gets closer and closer to the full GP. The reason the NNGP works well in approximating spatial patterns is related to what Michael Stein called the "screening effect" back in 2002: most of the information used in estimating a GP spatial random effect at a given location comes from the locations closest to it. Of course, how well this performs depends on the underlying complexity of the residual spatial autocorrelation and the distribution of the observed point locations in space.

In short, you will need more neighbors to get a good approximation of a GP when the spatial autocorrelation is very fine-scale and/or when the design of the observed locations is "complex" (e.g., a highly clustered sampling design, or large missing chunks of the study area). If the spatial autocorrelation is relatively simple (e.g., broad-scale or long-range), then fewer neighbors (e.g., 5) often do quite well. 15 neighbors is by no means a magic number; it was just shown in the original Datta et al. (2016) paper to be a point beyond which adding more neighbors brought minimal benefit.

There are also computational considerations. If the data set is very large (e.g., tens of thousands of locations or more), then using 15 neighbors can be fairly slow and one may have to resort to fewer, especially for models more complex than the basic single-species spatial occupancy model. For a data set with a few hundred locations, I would always recommend at least starting out with 15 neighbors. If you use 15 neighbors and the estimated spatial autocorrelation turns out to be very broad-scale (e.g., the phi parameter is very small), that may indicate that you can reduce the number of neighbors without degrading the approximation.
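
To make the link between a small phi and "broad-scale" autocorrelation concrete: assuming an exponential covariance function (an assumption here, since the thread does not say which covariance model was used), the correlation between two sites a distance d apart is exp(-phi * d), so it falls to 0.05 at a distance of about 3 / phi. A minimal Python sketch, where the phi values and the km units are hypothetical:

```python
import math

def effective_range(phi, threshold=0.05):
    """Distance at which exp(-phi * d) drops to `threshold`.

    Assumes an exponential correlation function; other covariance
    models need a different formula.
    """
    return -math.log(threshold) / phi

# Hypothetical phi values, with coordinates assumed to be in km.
for phi in (0.005, 0.05, 0.5):
    print(f"phi = {phi}: effective range ~ {effective_range(phi):.1f} km")
```

Comparing this distance with the extent of the study area and the typical spacing between sites gives a rough sense of whether the estimated autocorrelation is broad-scale or fine-scale.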
Additionally, if you predict using an NNGP model and notice bizarre patterns in the predicted spatial random effect surface (e.g., abrupt changes in the spatial random effect that do not appear spatially smooth), that is an indicator that you need more neighbors for a better approximation.

Hope that helps. While there has been some work done in the spatial stats literature on this, I certainly think there is some room here for an applied simulation study to get some more concrete recommendations under different scenarios. 
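
As a toy illustration of the screening effect described above (not a substitute for the kind of simulation study mentioned, and not how spOccupancy is implemented internally), the following Python sketch compares the conditional variance of a GP at each site given only its m nearest earlier sites with the variance given all earlier sites. A ratio of 1 means the m neighbors carry all the information the full set does. The exponential covariance, the ordering by x-coordinate, the 40 random sites, and phi = 5 are all assumptions made for the example.

```python
import math
import random

def exp_cov(a, b, phi, sigma2=1.0):
    """Exponential covariance between 2-D points a and b."""
    return sigma2 * math.exp(-phi * math.dist(a, b))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def cond_var(target, cond_pts, phi):
    """Variance of the GP at `target` given its values at `cond_pts`."""
    C = [[exp_cov(p, q, phi) for q in cond_pts] for p in cond_pts]
    c = [exp_cov(target, p, phi) for p in cond_pts]
    w = solve(C, c)  # kriging weights
    return exp_cov(target, target, phi) - sum(wi * ci for wi, ci in zip(w, c))

def mean_variance_ratio(points, m, phi):
    """Mean of var(site | m nearest earlier sites) / var(site | all earlier sites)."""
    ratios = []
    for i in range(1, len(points)):
        earlier = points[:i]
        nearest = sorted(earlier, key=lambda p: math.dist(p, points[i]))[:m]
        ratios.append(cond_var(points[i], nearest, phi)
                      / cond_var(points[i], earlier, phi))
    return sum(ratios) / len(ratios)

random.seed(1)
# Hypothetical design: 40 random sites in the unit square, ordered by x.
pts = sorted((random.random(), random.random()) for _ in range(40))
phi = 5.0  # hypothetical decay parameter
for m in (1, 5, 10, 15):
    print(f"m = {m:2d}: mean variance ratio = {mean_variance_ratio(pts, m, phi):.4f}")
```

Because the neighbor sets are nested, the ratio can only move toward 1 as m grows; how quickly it gets there for a clustered design or a different phi can be explored by changing `pts` and `phi`.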

Jeff

Marc Kéry

Apr 16, 2025, 3:30:05 AM
to Jeffrey Doser, spOccupancy and spAbundance users
Dear Jeff,

thanks for this extremely helpful information. Would the following reasoning make sense too? If the change in parameter estimates is minimal as one goes from, say, 15 to 30 neighbours, then that is enough and we don't need to try any higher number (and would presumably present the results of the model with 30)?

Best regards  --- Marc




Jeffrey Doser

Apr 16, 2025, 8:09:06 AM
to spOccupancy and spAbundance users
Hi Marc, 

Yes, that reasoning certainly makes sense. You could of course also compare WAIC in addition to the parameter estimates.

Cheers,

Jeff

Marc Kéry

Apr 16, 2025, 2:47:30 PM
to Jeffrey Doser, spOccupancy and spAbundance users
Dear Jeff,

thank you very much.

Interesting suggestion to use WAIC. Naively, I had assumed that it would not make sense to use WAIC to compare different approximations to the full GP, i.e., NNGPs with different numbers of neighbours. I had thought that since they are all approximations, the higher the number of neighbours, the better the approximation, but the more costly it is in terms of computation time. Hence, we ought to choose the "cheapest" (in terms of computation time) that is good enough. I had then thought that one way of deciding when the approximation is good enough would be by comparison with a model run with a lower number of neighbours. This is similar in spirit to an optimization, where we say we have arrived at the optimum when some difference criterion computed from two successive steps in the iterative search is lower than some chosen threshold.

Best regards  --- Marc
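
The convergence-style rule described in this message could be sketched roughly as follows in Python. Everything here is hypothetical: the posterior summaries are made-up numbers rather than output from any fitted model, the function names are invented for the example, and the tolerance of 0.1 posterior standard deviations is an arbitrary choice, not a recommendation from the thread.

```python
def max_standardized_shift(fit_a, fit_b):
    """Largest |mean_b - mean_a| / sd_b over the parameters shared by two fits.

    Each fit is a dict: parameter name -> (posterior mean, posterior sd).
    """
    shared = fit_a.keys() & fit_b.keys()
    return max(abs(fit_b[k][0] - fit_a[k][0]) / fit_b[k][1] for k in shared)

def choose_neighbors(fits, tol=0.1):
    """Smallest m whose estimates differ from the next larger m by < tol SDs.

    `fits` maps number of neighbors -> fit summary.
    Falls back to the largest m tried if no pair is within tolerance.
    """
    ms = sorted(fits)
    for m, m_next in zip(ms, ms[1:]):
        if max_standardized_shift(fits[m], fits[m_next]) < tol:
            return m
    return ms[-1]

# Made-up posterior summaries (mean, sd), for illustration only.
fits = {
    5:  {"beta0": (0.40, 0.20), "beta1": (1.10, 0.30), "phi": (2.00, 0.80)},
    15: {"beta0": (0.52, 0.21), "beta1": (1.02, 0.30), "phi": (2.60, 0.90)},
    30: {"beta0": (0.53, 0.21), "beta1": (1.01, 0.30), "phi": (2.65, 0.90)},
}
print(choose_neighbors(fits))
```

Scaling each shift by the posterior standard deviation keeps the threshold comparable across parameters on different scales; with MCMC output, the tolerance should also be large enough not to react to Monte Carlo error between runs.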
