Hi Marc,
Apologies for the delay. As you might expect, the optimal number of neighbors to use for an NNGP approximation is dependent on a variety of patterns. As you increase the number of neighbors in the approximation, the NNGP gets closer and closer to the full GP. The reason why the NNGP works well in approximating spatial patterns is related to what Michael Stein called the "screening effect" back in 2002, which is that most of the information used in estimating a GP spatial random effect at a given location comes from the locations closest to it. Of course, how will this performs depends on the underlying complexity of the residual spatial autocorrelation and distribution of the observed point locations in space. In short, you will need more neighbors to get a good approximation of a GP when the spatial autocorrelation is very fine-scale and/or when the design of the observed locations is "complex" (e.g., highly clustered sampling design, or large missing chunks of the study area). If the spatial autocorrelation is relatively simple (e.g., broad-scale or long range) then often fewer neighbors does quite well (e.g., 5). 15 neighbors is by no means a magic number, it was just shown in the original Datta et al. (2016) paper to be a point at which minimal benefits were obtained by adding more neighbors. Of course, there are also computational considerations to think about as well. If the data set is in fact very large (e.g., tens of thousands or more), then using 15 neighbors can be fairly slow and one may have to resort to fewer, especially for more complex models than the basic single-species spatial occupancy model. For a data set with a few hundred locations, then I would always recommend at least starting out with 15 neighbors. If you use 15 neighbors and see that the estimated spatial autocorrelation is found to be very broad scale (e.g., the phi parameter is very small), then that may indicate that you can reduce the number of neighbors without losing complexity. Additionally, if you predict using an NNGP model and notice some unusually bizarre patterns in the predicted spatial random effect surface (e.g., there are abrupt changes in the spatial random effect that do not appear spatially smooth), that is an indicator that you need more neighbors for a better approximation.
Hope that helps. While there has been some work done in the spatial stats literature on this, I certainly think there is some room here for an applied simulation study to get some more concrete recommendations under different scenarios.
Jeff