#28: Part 2, Spatial statistical models for streams: applications and inference


Dan Isaak

Jun 19, 2012, 4:42:36 AM
to ClimateAquaticsBlog
Using the right tools is even better…

Hi Everyone,
Last time out it was argued that Ver Hoef and Peterson’s spatial
statistical network models are a fundamentally better tool for
analyzing many types of stream attributes, particularly when the
locations of samples are characterized by non-randomness and spatial
clustering, as will often be the case with aggregated databases. This
time we're highlighting some applications of the spatial models to show
the sorts of improved information they can provide about the attributes
of stream networks. The easiest way to do this is simply to step through
an example, because the map graphics convey the information far more
efficiently than I can in writing. Before
starting, however, let me re-emphasize the fact that there’s an
untapped goldmine of data out there to learn from if/when it’s
organized into functional databases. There are thousands upon
thousands of stream sites that have been sampled to determine the
occurrence and abundance of species (graphic 1), there are rapidly
growing databases of genetic attributes for these species (graphic 2),
there are thousands of sites where regulatory agencies monitor water
quality attributes (graphic 3), and of course, tens of thousands of
sites with stream temperature measurements (graphic 4, blog #25). Each
of those individual samples is ultimately just a local representation
of much broader spatial patterns when viewed at the stream or river
network scale. The Ver Hoef & Peterson models simply allow us to
describe these patterns more accurately, and sometimes in ways that
were previously impossible.

So in the example, we’ll use a temperature database compiled from
several state and federal agencies across a 7,000 km2 mountain river
basin in central Idaho (graphic 5). In this basin, there were almost
800 summers of data available across a stream network of 2,500
kilometers, so autocorrelation & spatial redundancy among some of
these measurements was a strong possibility. These data were fit with
two models: a traditional, non-spatial multiple regression model
(graphic 6, upper panel) & the spatial statistical stream regression
model (lower panel). The same set of predictors was used in each
model, but notice that we get different parameter estimates describing
the relationships to stream temperature in each model. That’s because
the non-spatial model estimates were biased by the autocorrelation in
the database. Moreover, this bias has consequences when we use the
models to make predictions. Predictions from the non-spatial model
deviate systematically from the 1:1 line, in this case underpredicting
temperatures by a few degrees in warm streams and overpredicting in
cold streams. That bias is largely eliminated by the
spatial models, which also have the advantage of considerably greater
predictive power & precision (R2 improves from 0.68 to 0.93; RMSE
decreases from 1.54 °C to 0.74 °C).
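
To give a flavor of what a fit like this looks like in practice, here's a
minimal R sketch patterned on Ver Hoef & Peterson's forthcoming SSN
package. The file path, response, and predictor names are hypothetical
placeholders, and the interface details may differ once the package is
released, so treat this as illustrative rather than a recipe:

```r
library(SSN)  # Ver Hoef & Peterson's spatial stream-network package

# Import a .ssn object (network topology, observed sites, prediction
# sites) assembled beforehand in GIS. The path and all variable names
# below are hypothetical placeholders.
ssn <- importSSN("boise.ssn", predpts = "preds")
createDistMat(ssn, predpts = "preds", o.write = TRUE, amongpreds = TRUE)

# Traditional non-spatial multiple regression: same predictors, but
# independent errors (no spatial covariance).
m.nonsp <- glmssn(AUG_TEMP ~ ELEV + RADIATION + FLOW, ssn.object = ssn,
                  CorModels = NULL, use.nugget = TRUE)

# Spatial stream-network model: tail-up, tail-down, & Euclidean
# covariance components, with flow weights from a watershed-area column.
m.sp <- glmssn(AUG_TEMP ~ ELEV + RADIATION + FLOW, ssn.object = ssn,
               CorModels = c("Exponential.tailup", "Exponential.taildown",
                             "Exponential.Euclid"),
               addfunccol = "afvArea")

summary(m.nonsp)                      # compare the parameter estimates...
summary(m.sp)
InfoCritCompare(list(m.nonsp, m.sp))  # ...and fit/prediction diagnostics
```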

As a bit of an aside, I’ve now been involved in projects to fit the
spatial stream models to 3 different temperature databases that were
composites from multiple agencies, & some interesting patterns are
beginning to emerge when comparing spatial and non-spatial regression
estimates. If, for example, we look at the
parameter estimate for elevation across those 3 datasets (graphic 7),
we see a lot of variability in the answers that the non-spatial models
provide (-0.0036 to -0.0064 °C/meter) and more consistency from the
spatial models (-0.0034 to -0.0045 °C/meter). Thus, a meta-estimate
for this parameter averaged across the 3 datasets would have a
standard error that is more than 50% smaller using the spatial models
than the non-spatial models and an overall mean that is also less
biased (graphic 7, bottom panel). It again highlights some of the
dangers associated with autocorrelation if it’s not properly accounted
for. In this case the apparent variation in the relationship between
stream temperature & elevation would have been much greater than the
reality & we’d have been misled to some extent by biased model
results.
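
To make the meta-estimate arithmetic concrete, here's a toy calculation
in R. Only the endpoints of each range are quoted above, so the middle
value in each vector is an invented stand-in and the numbers are purely
illustrative:

```r
# Hypothetical elevation slopes (deg C per meter) for the 3 datasets;
# endpoints match the quoted ranges, middle values are invented.
b.nonspatial <- c(-0.0036, -0.0050, -0.0064)
b.spatial    <- c(-0.0034, -0.0040, -0.0045)

meta <- function(b) c(mean = mean(b), se = sd(b) / sqrt(length(b)))
meta(b.nonspatial)
meta(b.spatial)  # smaller SE because the spatial estimates agree closely
```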

So with the spatial stream models, we now have a flexible analytical
structure for accurately describing patterns in many datasets
collected on networks & that’s a really powerful scientific tool. If
this tool is coupled with good ecological theory and insightful a
priori hypotheses, we'll be able to describe new relationships and
test or refine many old hypotheses to increase the rigor of our
science (graphic 8). That, in turn, will fundamentally improve what we
know about streams & should also improve our ability to manage &
conserve them. The attached paper by McIntire & Fajardo, “Beyond
description: the active and effective way to infer processes from
spatial patterns” is a great one for discussing the potential
interplay between spatial patterns, hypothesis formulation, and
inference regarding underlying processes.

Once we’ve accounted for the spatial autocorrelation in our
temperature dataset & have an accurate model, it can be used for many
purposes that include: 1) making predictions at unsampled locations to
develop those “smart maps” we need for prioritizing conservation
efforts across river networks (graphic 9; blog #26), 2) quantifying
the effects of climate change on stream temperatures (graphic 10; blog
#7), and 3) translating stream temperature increases to species-
specific maps of thermal habitat (graphic 11; blog #7). Those are the
standard temperature model applications that may often be useful, but
the spatial models also provide a suite of new applications that will
be interesting to explore in future years. These include: 1) designing
efficient temperature monitoring strategies using information
regarding autocorrelation distance to ensure that monitoring sites are
not redundant (graphic 12); 2) developing spatially explicit maps of
uncertainty in temperature predictions that could also aid in adaptive
monitoring strategies or be used in decision support tools (graphic
13); and 3) block-kriging estimates of stream temperature parameters
within subsections of a river network that are of particular interest
(graphic 14). And remember, although this example is based on a stream
temperature dataset, these same basic analyses & inferences are
possible for many of the attributes we commonly sample on streams
because the Ver Hoef and Peterson models are generalizable to the
standard set of Gaussian, Poisson, and binomial response variable
types (graphic 15). For more on additional applications of the spatial
stream models, graphic 16 contains a short bibliography.
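
Continuing the hypothetical sketch from earlier, here's roughly how the
fitted spatial model could drive the prediction maps, the monitoring-
design idea, and the block-kriging application just described. The
function calls follow the SSN-package style and the identifiers are
illustrative assumptions:

```r
# Continuing the sketch above with the fitted spatial model m.sp:

# 1) Predictions (with site-specific standard errors) at unsampled
#    locations -- the raw material for the "smart maps" in graphic 9.
preds <- predict(m.sp, predpointsID = "preds")

# 2) An empirical Torgegram: autocorrelation plotted against hydrologic
#    distance for flow-connected & flow-unconnected site pairs, which
#    helps set non-redundant spacing for monitoring sites.
tg <- Torgegram(ssn, ResponseName = "AUG_TEMP")
plot(tg)

# 3) Block kriging: the mean temperature over a network subsection
#    represented by its own set of prediction points.
blk <- BlockPredict(m.sp, predpointsID = "reach_of_interest")
```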

For all the benefits the spatial models provide, there are no free
lunches in life and so here, too, there are a few downsides. First,
there are more parameters to estimate in these models because of the
complex stream covariance structure (blog #27), which means we need
more data, and a good general rule of thumb regarding a minimum sample
size is probably around 100 sites. There also needs to be some spatial
clustering among those sites and autocorrelation in the dataset if the
spatial models are going to provide performance enhancements relative
to non-spatial models. Second, the spatial models are not for the
quantitatively faint of heart. They require relatively advanced GIS
skills to develop the spatial data that describe stream network
topology and the spatial relationships among samples taken on those
networks, plus a working knowledge of the R statistical program; some
graduate-level training in statistics is also handy for fitting
sensible models and interpreting the results. It will often be the
case, therefore, that using the spatial models requires small teams of
people with complementary skill sets. Third, fitting the spatial models
in the past required special R code and GIS tools that have not been
widely available and aren’t going to appear any time soon in
commercial statistical programs like SAS or SyStat. This hurdle is
close to being removed, however, as Erin Peterson and Jay Ver Hoef are
putting the finishing touches on a set of freeware GIS tools, an R
statistical package, example datasets, and extensive tutorials that
will be distributed through a new website (more on that later…).
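
As a footnote to the first of those downsides, a quick way to screen a
dataset for residual autocorrelation before investing in the full
spatial machinery is Moran's I on the non-spatial model residuals. Below
is a minimal sketch with simulated stand-in data using the ape package;
note it uses straight-line distances only, so for stream data, where
autocorrelation follows hydrologic distance, it's just a coarse screen:

```r
library(ape)  # provides Moran.I

# Simulated stand-ins: n sites with coordinates (x, y) and residuals
# from some non-spatial model fit. Replace with your own data.
set.seed(1)
n <- 100
x <- runif(n); y <- runif(n)
res <- rnorm(n)

# Inverse Euclidean-distance weights; only a coarse screen for streams.
w <- 1 / as.matrix(dist(cbind(x, y)))
diag(w) <- 0
Moran.I(res, w)  # small p-value suggests residual spatial autocorrelation
```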

So in some regards the spatial models may be less convenient than many
traditional analyses, but there are big payoffs, including the ability
to: 1) use data aggregated across multiple agencies that used a
variety of sampling designs without worries about spatial
autocorrelation, 2) extract massive amounts of new information, and
more accurate information, from existing databases, and 3) map
information back to real-world coordinates so that it’s in a format
accessible to those making on-the-ground decisions and choices about
where to prioritize conservation efforts. In many ways, the spatial
models have the potential to bring people together as we work to
manage and conserve aquatic resources this century. And so even as
budgets shrink & pressures on natural resources continue to grow,
there’s a real possibility that not only will we be able to do more
with less, but we may be able to do much more.

Until next time, best regards,
Dan