feature request list

Mike D

Mar 2, 2012, 4:04:46 PM

to HyperNiche and NPMR

Hi Bruce, all,
I just took Heather's workshop this week at Hatfield Marine Science
Center (great job, Heather!), and am very interested in using NPMR in
my work. Specifically, what I'm trying to do is:
1) quantify species-habitat relationships for select benthic fishes
surveyed visually using a Remotely Operated Vehicle (ROV) against
environmental variables derived from high-resolution seafloor
elevation data (e.g., depth, slope, aspect at a range of spatial
scales), and use the resulting model to predict either the probability
of occurrence or density at unsampled locations within the same reef
complex.
2) quantify species-habitat relationships for giant kelp (canopies)
surveyed using aerial infrared photography against environmental
variables derived from high-resolution seafloor elevation data (e.g.,
depth, slope, aspect at a range of spatial scales). The response
variable is a composite measure of canopy persistence (# of
observations per cell).
Both datasets are GIS grids with ~2 m cells, show spatial
autocorrelation (especially the kelp data), and contain tens of
thousands to hundreds of thousands of records.

I used a subset of the ROV dataset (30k records, 3 spp., presence-
absence response, 11 predictor variables) and tried to run a model in
NPMR but nothing registered on the progress bar after 15 minutes, so I
aborted. I then reduced the dataset step by step until 850 records
remained, at which point the analysis took ~5 min to run. When I opened
Task Manager during the run to check memory and processor use, I
noticed that NPMR was using only 1 of the 12 CPU cores. So my feature
request is that the software be adapted to run on multiple cores to
speed up processing of large datasets.
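Part of why NPMR-style fitting should lend itself to multiple cores: in a leave-one-out local model, the prediction for each target point depends only on the data, not on the other predictions, so blocks of target points can be handed to separate workers. The sketch below is not HyperNiche's code. It assumes a Gaussian-weighted local mean (a product of Gaussian kernels, with a hypothetical tolerance `tol[j]` per predictor), NumPy, and a thread pool, relying on NumPy releasing the GIL during large array operations; a process pool would be the more general choice. The function names are illustrative only.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def loo_local_mean_chunk(X, y, tol, idx):
    """Leave-one-out Gaussian-weighted local means for the target rows in idx.

    X: (n, p) predictors, y: (n,) response, tol: (p,) kernel widths (assumed).
    """
    # Sum of squared standardized distances, accumulated one predictor at a
    # time so memory stays at len(idx) x n rather than len(idx) x n x p.
    sq = np.zeros((len(idx), X.shape[0]))
    for j in range(X.shape[1]):
        sq += ((X[idx, j][:, None] - X[:, j][None, :]) / tol[j]) ** 2
    w = np.exp(-0.5 * sq)              # product of per-predictor Gaussian kernels
    w[np.arange(len(idx)), idx] = 0.0  # drop each target point's own record
    # Weighted average of the remaining records (NaN if every weight underflows)
    return (w @ y) / w.sum(axis=1)


def loo_local_mean_parallel(X, y, tol, n_workers=4, chunk_size=256):
    """Split the target points into blocks and evaluate the blocks concurrently."""
    n = X.shape[0]
    blocks = [np.arange(s, min(s + chunk_size, n)) for s in range(0, n, chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda idx: loo_local_mean_chunk(X, y, tol, idx), blocks)
        return np.concatenate(list(parts))
```

Because the blocks are independent, the parallel result is identical to evaluating all rows in one call; on a 12-core machine one might try `n_workers=12`, though how well threads scale depends on the NumPy build.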

I also noticed a discussion thread in this group about spatial
autocorrelation. Some publications (e.g., Lichstein et al. 2002,
"Spatial autocorrelation and autoregressive models in ecology,"
Ecological Monographs 72(3): 445-463) recommend explicitly modeling
such autocorrelation in order to evaluate the predictor variables
"independently" of any contamination due to contagious processes such
as schooling behavior, dispersal limitation, etc. Of course, hypothesis
testing has more rigorous requirements about data independence than
exploratory analysis, but the underlying desire to be able to model
both with and without spatial autocorrelation is the same. I think it
would be an extremely valuable feature to include in NPMR (perhaps by
including a geographic distance matrix for the sampling locations?).
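For reference, building a geographic distance matrix and using it to measure autocorrelation takes only a few lines. This is a minimal sketch assuming NumPy and Moran's I with binary neighbor weights (w_ij = 1 when 0 < d_ij <= `max_dist`, a hypothetical cutoff), a common diagnostic that is not part of NPMR; it could be applied to a response or to model residuals. A full n x n float64 matrix for 30k records needs about 7 GB of memory, so this is only practical for subsets.

```python
import numpy as np


def distance_matrix(coords):
    """Euclidean distances between all pairs of (x, y) sampling locations."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=2))


def morans_i(values, coords, max_dist):
    """Moran's I using binary weights: w_ij = 1 if 0 < d_ij <= max_dist, else 0."""
    z = values - values.mean()
    d = distance_matrix(coords)
    w = ((d > 0) & (d <= max_dist)).astype(float)  # d > 0 excludes self-pairs
    return (len(z) / w.sum()) * (z @ w @ z) / (z @ z)
```

Values well above the no-autocorrelation expectation of -1/(n-1) mean nearby locations tend to be similar; values near -1 mean neighbors tend to differ.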
cheers,
Mike

Bruce McCune

Mar 2, 2012, 10:37:40 PM

to hyper...@googlegroups.com

Hi Mike,

Thanks for your interesting contributions. I have a few points in response:

On multiple processors: point taken. This is on our list for future
versions. So far we have been focusing on building various kinds of
functionality but postponing the speed issue. It sounds like you are
making some progress by sampling your data -- which is good. I might
add that you can actually strengthen the models some by using
stratified random sampling to make emphasis more uniform across your
predictor space, as opposed to wasting a bunch of time on the
oversampled parts of the space. You have probably discovered this.
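One simple way to do the stratified subsampling described above, offered as a sketch rather than a HyperNiche feature: divide each predictor's range into equal-width bins, treat each combination of bins as a stratum, and draw at most `per_stratum` records at random from each. Equal-width (not quantile) bins matter here, since quantile bins would just reproduce the uneven sampling density. `n_bins` and `per_stratum` are hypothetical tuning parameters; the number of cells grows as n_bins^p, so with many predictors one might stratify on only a few key ones.

```python
import numpy as np


def stratified_subsample(X, n_bins=5, per_stratum=20, seed=None):
    """Row indices of a subsample with at most per_stratum rows drawn at random
    from each cell of an equal-width grid over the predictor space X (n, p)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero for constant columns
    # Equal-width bin number (0 .. n_bins-1) for every value of every predictor
    bins = np.minimum(((X - lo) / span * n_bins).astype(int), n_bins - 1)
    # One integer label per grid cell; raises ValueError if n_bins**p is too large
    cells = np.ravel_multi_index(tuple(bins.T), (n_bins,) * X.shape[1])
    # Group row indices by cell, then sample without replacement within each cell
    order = np.argsort(cells, kind="stable")
    _, starts = np.unique(cells[order], return_index=True)
    keep = [
        rng.choice(rows, size=min(per_stratum, rows.size), replace=False)
        for rows in np.split(order, starts[1:])
    ]
    return np.sort(np.concatenate(keep))
```

For example, `X[stratified_subsample(X, n_bins=4, per_stratum=50)]` would cap every occupied cell at 50 records while keeping every record in cells that have fewer than 50.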

On the autocorrelation issue: Because NPMR is using a local model,
you can already model the autocorrelation with NPMR by just including
coordinates as predictors -- you don't need to calculate a distance
matrix with a local model. You can visualize this easily by fitting a
response variable to x and y as predictors, then looking at a contour
map of the surface. That surface is built on the autocorrelation --
if it wasn't there, the surface would just be flat or irregularly
bumpy. The difficulty as I see it is that many of the interesting
drivers are themselves autocorrelated, which means you risk throwing
the baby out with the bath water.
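The idea of fitting a response to x and y alone can be sketched as follows. This is a generic Gaussian-weighted local mean, assumed here to stand in for NPMR's local mean estimator; the synthetic data, the tolerance of 1.0 and the function name are all hypothetical. One response has spatial structure plus noise and the other is pure noise, so the first fitted surface should vary much more across the grid than the second, which stays close to flat.

```python
import numpy as np


def local_mean_surface(xy, y, grid_xy, tol):
    """Gaussian-weighted local mean of y at each grid location, using only the
    sample coordinates xy (n, 2) as predictors; tol holds the x and y widths."""
    sq = np.zeros((grid_xy.shape[0], xy.shape[0]))
    for j in range(2):
        sq += ((grid_xy[:, j][:, None] - xy[:, j][None, :]) / tol[j]) ** 2
    w = np.exp(-0.5 * sq)
    return (w @ y) / w.sum(axis=1)


# Hypothetical example data on a 10 x 10 area
rng = np.random.default_rng(1)
xy = rng.uniform(0, 10, size=(400, 2))
patchy = np.sin(xy[:, 0] / 2) + np.cos(xy[:, 1] / 2) + rng.normal(0, 0.3, 400)
noise = rng.normal(0, 1, 400)  # no spatial structure at all

gx, gy = np.meshgrid(np.linspace(0, 10, 25), np.linspace(0, 10, 25))
grid = np.column_stack([gx.ravel(), gy.ravel()])
tol = np.array([1.0, 1.0])
surf_patchy = local_mean_surface(xy, patchy, grid, tol).reshape(gx.shape)
surf_noise = local_mean_surface(xy, noise, grid, tol).reshape(gx.shape)
# Either surface could be drawn with matplotlib.pyplot.contourf(gx, gy, surf_patchy)
```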

I believe that this built-in modeling of autocorrelation is a feature
of NPMR that hasn't been explored in the literature. It would be
interesting to compare different modeling strategies for dealing with
autocorrelation with NPMR. Please post a link to the pdf here if you
or someone else does this!