[SciPy-User] moving window (2D) correlation coefficient


Vincent Schut

Feb 10, 2010, 10:45:41 AM
to scipy...@scipy.org
Hi,

I need to calculate the correlation coefficient of a (simultaneously)
moving window over 2 images, such that the resulting pixel (x, y) (the
center of the window) is corrcoef((a 5x5 window around (x, y) in image
A), (a 5x5 window around (x, y) in image B)).
Currently I just loop over y and x, and call corrcoef for each (x, y)
window. Would there be a better way, other than converting the loop to
cython?


For clarity (or not), the relevant part from my code:


for y in range(d500shape[2]):
    for x in range(d500shape[3]):
        if valid500[d, y, x]:
            window = spectral500Bordered[d, :, y:y+5, x:x+5].reshape(7, -1)
            for b in range(5):
                nonzeroMask = (window[0] > 0)
                b01corr[0, b, y, x] = numpy.corrcoef(window[0][nonzeroMask],
                                                     window[b+2][nonzeroMask])[0, 1]
                b01corr[1, b, y, x] = numpy.corrcoef(window[1][nonzeroMask],
                                                     window[b+2][nonzeroMask])[0, 1]


Forget the 'if valid500' and 'nonzeroMask'; those are there to skip
pixels that don't need to be calculated, and to feed only valid pixels
from the window into corrcoef.
spectral500Bordered is essentially a [d dates, 7 images, ysize, xsize]
array. I work per date (d), then calculate the corrcoef for images[0]
versus images[2:], and for images[1] versus images[2:]

Thanks,
Vincent.

_______________________________________________
SciPy-User mailing list
SciPy...@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user

Zachary Pincus

Feb 10, 2010, 11:42:19 AM
to SciPy Users List
I bet that you could construct an array with shape (x,y,5,5), where
array[i,j,:,:] would give the 5x5 window around (i,j), as some sort of
mind-bending view on an array of shape (x,y), using a positive offset
and some dimensions having negative strides. Then you could compute
the correlation coefficient between the two arrays directly. Maybe?
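For what it's worth, the view Zach describes can be built with numpy's stride tricks. A rough sketch (the helper name is mine, not from the thread) — it gets by with plain positive strides by anchoring each window at its top-left corner, so centring a window on (i, j) just means shifting the indices by size//2:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def sliding_windows(a, size=5):
    """All size x size windows of a 2-D array as a (rows, cols, size, size)
    view -- no data is copied, so it costs no extra memory."""
    rows = a.shape[0] - size + 1
    cols = a.shape[1] - size + 1
    # Repeating the strides makes w[i, j, k, l] == a[i + k, j + l].
    return as_strided(a, shape=(rows, cols, size, size),
                      strides=a.strides + a.strides)

img = np.arange(36, dtype=float).reshape(6, 6)
w = sliding_windows(img, 3)
assert w.shape == (4, 4, 3, 3)
assert np.array_equal(w[1, 2], img[1:4, 2:5])  # window with top-left at (1, 2)
```

Caveat: as_strided does no bounds checking, so a wrong shape silently reads garbage, and writing through the view is equally dangerous.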

Probably cython would be more maintainable...

Zach

josef...@gmail.com

Feb 10, 2010, 11:53:28 AM
to SciPy Users List

I wrote a moving correlation for time series last November (scipy-user,
and a preceding discussion on the numpy mailing list).
I don't work with pictures, so I don't know if this can be extended to
your case. Since signal.correlate or convolve work in all directions,
it might be possible.

def yxcov(self):
    xys = signal.correlate(self.x*self.y, self.kern,
                           mode='same')[self.sslice]
    return xys/self.n - self.ymean*self.xmean

Josef

Vincent Schut

Feb 10, 2010, 2:31:16 PM
to scipy...@scipy.org
On 02/10/2010 05:42 PM, Zachary Pincus wrote:
> I bet that you could construct an array with shape (x,y,5,5), where
> array[i,j,:,:] would give the 5x5 window around (i,j), as some sort of
> mind-bending view on an array of shape (x,y), using a positive offset
> and some dimensions having negative strides.

I tried that, but less cleverly (just reshaping and transposing); it
made a copy, and I ended up with an array that took 5*5 times as much
space as my initial array... and it took a lot of time.
But I don't see how this reshape would gain me much speed?

Then you could compute
> the correlation coefficient between the two arrays directly. Maybe?

Ah, like that. I thought numpy.corrcoef(x,y) only worked on flat x and
flat y. There is no axis keyword... Is there another function that would
work on nd arrays?

>
> Probably cython would be more maintainable...

Well, speed is more important than readability this time :-) It's
terabytes of data I'll need to push through this function...
What I understood till now (just starting to look at cython) is that in
this kind of thing, the repeated array sub-indexing (and its bounds
checking etc.) is the main bottleneck. I presume the corrcoef function
itself is pretty well optimized? So I thought that if I could do the
loop in cython and implement some clever code for the window indexing
(only replacing the new elements), I'd relatively simply get the major
speed gain. Does that sound correct?
I still have to find out how to call the corrcoef function from
cython...

Thanks for your thoughts on this,
Vincent.

Vincent Schut

Feb 10, 2010, 2:36:44 PM
to scipy...@scipy.org

I saw that when searching on this topic, but didn't think it would work
for me, as I supposed it was purely 1-dimensional, and I thought that in
your implementation, though the window moves, the kernel stays the same
all the time? I'm no signal processing pro (alas), so please correct me
if I'm wrong. I'll try to find the discussion you mentioned tomorrow.
Damn time zones... ;-)

josef...@gmail.com

Feb 10, 2010, 2:48:23 PM
to SciPy Users List

correlation is just sum(x*y)/sqrt(sum(x*x))/sqrt(sum(y*y)), but with
the moving mean subtracted first, which makes it slightly tricky.

the moving sum can be done with any nd convolve or correlate (ndimage
or signal), which work in all directions.

the window is just np.ones((windowsize, windowsize)) for a symmetric
picture, or, if nan handling is not necessary, then the averaging
window can be used directly.

I'm pretty sure now it works, with 5 or so intermediate arrays of the
same size as the original array
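Spelled out in 2-D, that recipe might look as follows. This is a sketch only: the function name is mine, scipy.ndimage.uniform_filter stands in for the convolution with the averaging window, and the NaN/invalid-pixel handling from Vincent's code is ignored.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def moving_corrcoef(x, y, size=5):
    """Pearson correlation of x and y over a moving size x size window,
    computed from five moving averages instead of a per-pixel loop."""
    mx = uniform_filter(x, size)        # moving mean of x
    my = uniform_filter(y, size)        # moving mean of y
    mxy = uniform_filter(x * y, size)   # moving mean of x*y
    mxx = uniform_filter(x * x, size)   # moving mean of x^2
    myy = uniform_filter(y * y, size)   # moving mean of y^2
    cov = mxy - mx * my                 # E[xy] - E[x]E[y]
    varx = mxx - mx * mx
    vary = myy - my * my
    return cov / np.sqrt(varx * vary)
```

For a pixel far enough from the border this agrees with calling numpy.corrcoef on the two flattened windows; near the border uniform_filter's default reflected boundary takes over, so edge pixels will differ from the explicit-loop version.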

Josef

Zachary Pincus

Feb 10, 2010, 3:02:22 PM
to SciPy Users List
> I tried that, but then less clever (just reshaping and transposing);
> it
> made a copy, and I ended up with an array that took 5*5 times as much
> space as my initial array... And it took a lot of time.
> But I don't see how this reshape would gain me much speed?

Well, the fiendishly-clever view (which might not even be possible; I
haven't thought it through super-clearly) would at least use no more
memory than the original array.

> Then you could compute
>> the correlation coefficient between the two arrays directly. Maybe?
>
> Ah, like that. I thought numpy.corrcoef(x,y) only worked on flat x and
> flat y. There is no axis keyword... Is there another function that
> would
> work on nd arrays?

I'd follow Josef's lead here and implement your own corrcoef that (in
the above case) works on (x,y,n,n)-shape arrays, or, in the cython
case, is just a cdef function that computes it directly.
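Such an axis-aware corrcoef is only a few lines of broadcasting in plain numpy. A sketch (the name is hypothetical): given two stacks of flattened windows with shape (rows, cols, n*n), it returns a (rows, cols) map of coefficients:

```python
import numpy as np

def corrcoef_lastaxis(a, b):
    """Pearson correlation along the last axis of two equal-shaped arrays."""
    am = a - a.mean(axis=-1, keepdims=True)   # centre each window
    bm = b - b.mean(axis=-1, keepdims=True)
    num = (am * bm).sum(axis=-1)
    den = np.sqrt((am * am).sum(axis=-1) * (bm * bm).sum(axis=-1))
    return num / den

a = np.random.random((4, 4, 25))   # e.g. 4x4 grid of flattened 5x5 windows
b = np.random.random((4, 4, 25))
r = corrcoef_lastaxis(a, b)
assert abs(r[0, 0] - np.corrcoef(a[0, 0], b[0, 0])[0, 1]) < 1e-12
```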

> Well, speed is more important than readability this time :-) It's
> terabytes of data I'll need to push through this function...

OK, then cython is likely preferable here. Just make sure to properly
declare the array and the index variables, e.g.:

cdef np.ndarray[np.float32_t, ndim=2, negative_indices=False,
                mode='c'] a = np.asarray(a_in, dtype=np.float32, order='C')

(or you could use fortran-order if that's what the arrays are
natively), and make sure, after debugging, to @cython.boundscheck(False)
the function.

With these steps, your indexing will be maximally fast.

> What I understood till now (just starting to look at cython) is that
> in
> this kind of thing, the repeated array sub-indexing (and its bounds
> checking etc.) is the main bottleneck. I presume the corrcoef function
> is pretty much optimized? So I thought if I could do the loop in
> cython
> and implement some clever code for the window indexing (only replace
> the
> new elements), I'd relatively simple get the major speed gain. Does
> that
> sound correct?
> Only still have to find out now how to call the corrcoef function from
> cython...

Definitely implement your own corrcoef in cython in this case. Or just
inline it in the inner loop of the function you're writing. The killer
overhead will be in calling a python function on every pixel and
converting the arguments and return value otherwise.

Zach

David Baddeley

Feb 10, 2010, 4:24:53 PM
to SciPy Users List
I think Josef's nailed it - if you can live with subtracting a moving mean from each pixel [calculated using a correlation/convolution with (1.0/(windowsize*windowsize))*ones((windowsize, windowsize))] rather than taking the mean across the actual window used for the correlation, you'll be home free. Not only do you save yourself the loops, but you're also going to be doing significantly fewer multiplications than if you calculated the correlation separately for each window.

I suspect that this approach would even be considerably faster than coding your loops and the windowed correlation up in cython.

cheers,
David

Stéfan van der Walt

Feb 11, 2010, 1:35:55 AM
to SciPy Users List
On 10 February 2010 22:02, Zachary Pincus <zachary...@yale.edu> wrote:
>> Well, speed is more important than readability this time :-) It's
>> terabytes of data I'll need to push through this function...
>
> OK, then cython is likely preferable here.

I'd say the Cython version is bound to be more readable than any
vectorised form one could contrive. As an example of how to calculate
correlation coefficients in Cython using Summed Area Tables, have a
look at

http://dip.sun.ac.za/~stefan/code/supreme.git/?p=stefan/supreme.git;a=blob;f=supreme/register/ncorr.pyx;hb=HEAD#l21

(sorry for the horrible URL)

Further down, the function "ncc" implements normalised cross-correlation.

Regards
Stéfan

Vincent Schut

Feb 11, 2010, 4:40:41 AM
to scipy...@scipy.org
On 02/10/2010 10:24 PM, David Baddeley wrote:
> I think Josef's nailed it - if you can live with subtracting a moving mean from each pixel [calculated using a correlation/convolution with (1.0/(windowsize*windowsize))*ones((windowsize, windowsize))] rather than taking the mean across the actual window used for the correlation, you'll be home free. Not only do you save yourself the loops, but you're also going to be doing significantly fewer multiplications than if you calculated the correlation separately for each window.
>
> I suspect that this approach would even be considerably faster than coding your loops and the windowed correlation up in cython.
>
> cheers,
> David
>

Guys, thanks a lot! It works well, and execution times have decreased
by approximately a factor of 6. Pity I still don't get to learn cython,
but I gained some insight. Be assured I'd buy you a beer if we lived a
bit closer :-)

thanks again,
Vincent.
