to smooth or not to smooth...


Thomas Wolbers

Aug 12, 2008, 3:54:03 PM
to mvpa-t...@googlegroups.com
Dear all,

In the context of a traditional GLM analysis, people usually smooth
their data to ensure the validity of random field theory and to account
for residual anatomical variability.
I guess the latter is the reason why some pattern classification studies
have smoothed their data as well (e.g., Polyn et al., 2005), at least if
one's goal is to show group data such as averaged importance maps. Is
this correct, or are there underlying statistical considerations that I
am missing here?

Best,
Thomas Wolbers

Sam Gershman

Aug 13, 2008, 6:30:03 AM
to Princeton MVPA Toolbox for Matlab
There is another reason (from signal processing): it's called the
matched filter theorem.
See here: http://imaging.mrc-cbu.cam.ac.uk/imaging/PrinciplesSmoothing
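To make that concrete: the matched filter idea says SNR is maximized
when the smoothing kernel matches the spatial extent of the signal you
expect. A rough Matlab sketch (the voxel size, FWHM, and variable names
are placeholders, not anyone's actual pipeline):

  % Smooth one X-by-Y-by-Z volume with a Gaussian whose FWHM matches the
  % expected spatial extent of the signal (matched filter idea).
  vox_mm  = 3;                                        % voxel size in mm (assumed)
  fwhm_mm = 8;                                        % expected signal extent in mm (assumed)
  sigma   = (fwhm_mm / vox_mm) / (2*sqrt(2*log(2)));  % FWHM -> standard deviation, in voxels
  ksize   = 2*ceil(3*sigma) + 1;                      % odd kernel size covering ~ +/- 3 SD
  vol_sm  = smooth3(vol, 'gaussian', ksize, sigma);   % vol: raw volume, vol_sm: smoothed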

But note that the assumptions about the spatial distribution of the
signal implied, for example, by the commonly-used Gaussian kernel, may
be inappropriate for certain kinds of MVPA models. In particular,
models that look for sparse signals in a region of interest generally
assume a non-Gaussian spatial distribution. It's best to think of
smoothing as a part of the statistical model itself; in fact there are
ways to incorporate adaptive smoothing as part of the parameterization
of the model. See, for example, the work by Flandin & Penny in the
context of mass-univariate GLMs (they don't do smoothing as part of
pre-processing, but incorporate it as a spatial prior):
http://www.fil.ion.ucl.ac.uk/spm/doc/papers/gf_sparse_vb.pdf

These sorts of spatial priors would be a very interesting direction to
take the MVPA toolbox ...

Sam

Thomas Wolbers

Aug 13, 2008, 2:28:52 PM
to mvpa-t...@googlegroups.com
Sam,

thanks for reminding me of the matched filter theorem. I use a spherical
searchlight in my current dataset, and I have reason to assume that the
spatial distribution of the information I want to classify is indeed
Gaussian. So I guess it makes sense to smooth the data to increase SNR
and to roughly match the FWHM of the smoothing kernel to the size of the
searchlight, which of course reflects the size of the expected effects.
Any objections?
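Concretely, I was thinking of nothing fancier than this (the numbers are
placeholders, not my actual protocol):

  % Rough match between searchlight size and smoothing kernel.
  vox_mm     = 3;                              % voxel size in mm (placeholder)
  radius_vox = 2;                              % searchlight radius in voxels (placeholder)
  diam_mm    = (2*radius_vox + 1) * vox_mm;    % searchlight diameter in mm
  fwhm_mm    = diam_mm;                        % choose kernel FWHM ~ searchlight diameter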

Best,
Thomas


Jesse Rissman

Aug 13, 2008, 3:01:37 PM
to mvpa-t...@googlegroups.com
Hi Thomas,

For a spherical searchlight analysis, if you smooth your data with a FWHM kernel that's approximately the same size as your spheres, you run the risk of reducing the total amount of information contained within the sphere, since all of its voxels will have highly correlated values after smoothing.  Thus, it seems the classifier will largely base its classifications on the mean activity of the sphere, rather than on the distributed activation pattern within the sphere.  As a result, the spheres that are best able to discriminate your two conditions will be those whose mean is consistently higher in one condition than in the other.  When I generate spherical searchlight maps, I run the analysis on unsmoothed data, but then smooth the resulting searchlight maps by 4mm or 8mm before averaging them across subjects, which makes the group maps more robust to slight anatomical/functional differences across subjects.
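In case it's useful, that post-hoc step is nothing more than something
like this (assuming SPM is on your path; the filenames and subject count
are made up):

  % Smooth each subject's searchlight map (computed from unsmoothed data)
  % before averaging the maps across subjects. spm_smooth is from SPM.
  fwhm      = 6;                                            % mm; anywhere in the 4-8mm range
  nSubjects = 12;                                           % placeholder
  for s = 1:nSubjects
      in_img  = sprintf('searchlight_map_subj%02d.img', s);
      out_img = sprintf('s%dmm_searchlight_map_subj%02d.img', fwhm, s);
      spm_smooth(in_img, out_img, fwhm);                    % write the smoothed map
  end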

-- Jesse

Sam Gershman

Aug 14, 2008, 8:00:43 AM
to Princeton MVPA Toolbox for Matlab
Jesse and Thomas,

The different strategies that you proposed correspond to different
statistical models of the data. I can make this explicit by pointing
out a connection between spherical searchlights and Gaussian
smoothing.

I'm going to base my argument on the original implementation of the
spherical searchlight in Kriegeskorte et al. (2006), but I realize that
there are other ways to do it. In that implementation they used the
Mahalanobis distance between the activity patterns for the two
conditions in each searchlight as their information measure. To better
understand this measure, consider P voxels in a spherical searchlight.
We partition the
timepoints into 2 conditions: A and B. Now imagine the P-dimensional
space spanned by the voxel activity, and that the activity on each
condition is distributed according to a multivariate Gaussian in this
space. According to the maximum likelihood estimate, we put the mean
of each Gaussian at the sample mean. Now what about the covariance?
Kriegeskorte and Bandettini use a 'shrinkage' estimate that biases the
covariance matrix to be as diagonal as possible. This actually
corresponds to an "empirical Bayes" estimate with a particular kind of
prior. Specifically, it is a prior that embodies the belief that the
multivariate Gaussian can be decomposed into the product of
(independent) univariate Gaussians, one for each voxel. This estimator
'shrinks' the sample covariance towards the prior covariance. One
implication of this prior is that voxels within the searchlight share
very little information, according to the Mahalanobis distance. When
the covariance matrix is diagonal, the Mahalanobis distance measures
the Euclidean distance between the two means, normalized by the
per-voxel variances; if the response to each condition is correlated
on average across voxels, this distance will be small (because the
normalization will push it towards zero).
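For concreteness, the diagonal-shrinkage version boils down to something
like this (a sketch, not the toolbox's or the paper's actual code;
lambda is an arbitrary shrinkage weight):

  % A, B: (timepoints x P) voxel activity in one searchlight for conditions A and B.
  muA     = mean(A, 1);                                   % 1 x P condition means
  muB     = mean(B, 1);
  S       = cov([A - muA; B - muB]);                      % pooled P x P sample covariance
  lambda  = 0.5;                                          % shrinkage weight (placeholder)
  Sshrunk = (1 - lambda)*S + lambda*diag(diag(S));        % shrink towards a diagonal target
  d       = muA - muB;
  D2      = d / Sshrunk * d';                             % squared Mahalanobis distance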

Let's consider a different prior that embodies different assumptions
about the data. Suppose that, contrary to the diagonal prior, I
believe that voxels within the searchlight are fairly homogeneous in
their response to the different conditions. More realistically, I
might suppose that the degree of similarity between the responses of
two voxels decreases as the physical distance between them increases.
I could encode these beliefs in a non-diagonal prior whose covariance
is a function of physical distance (note that in this case the
diagonal entries would still be the largest). In terms of
the Mahalanobis distance, now in addition to penalizing the Euclidean
distance according to how correlated a voxel on condition A is with
itself on condition B, we are also penalizing according to how
correlated that voxel is with other voxels on condition B. Now here is
the thrust of my argument: using this prior has the same effect on the
ensuing results as smoothing the data with a Gaussian kernel and then
using the shrinkage estimate. The Gaussian kernel is enforcing prior
beliefs about spatial covariance on your estimates. I would need to
work through the math to ascertain the precise quantitative
relationship, but I'm pretty sure that this holds qualitatively.
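To make that concrete, the only change relative to the diagonal-target
sketch above is the target matrix, e.g. (coords would be a P x 3 matrix
of voxel coordinates in mm; pdist/squareform are from the Statistics
Toolbox; the squared-exponential form and the length scale are
assumptions on my part):

  % Distance-dependent shrinkage target, reusing S and lambda from above.
  ell     = 6;                                            % length scale in mm (assumed)
  Dist    = squareform(pdist(coords));                    % P x P inter-voxel distances
  K       = exp(-Dist.^2 ./ (2*ell^2));                   % similarity falls off with distance
  sd      = sqrt(diag(S));                                % P x 1 per-voxel standard deviations
  T       = (sd * sd') .* K;                              % prior covariance; diagonal still largest
  Sshrunk = (1 - lambda)*S + lambda*T;                    % shrink towards the spatial prior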

So my point is this: both strategies are reasonable under different
assumptions about the data. What are your assumptions?

Sam
