I need to calculate the correlation coefficient of a (simultaneously)
moving window over 2 images, such that the resulting pixel (x, y) (the
center of the window) is corrcoef(the 5x5 window around (x, y) in image A,
the 5x5 window around (x, y) in image B).
Currently, I just loop over y, x, and then call corrcoef for each x,y
window. Would there be a better way, other than converting the loop to
cython?
For clarity (or not), the relevant part from my code:
for y in range(d500shape[2]):
    for x in range(d500shape[3]):
        if valid500[d, y, x]:
            window = spectral500Bordered[d, :, y:y+5, x:x+5].reshape(7, -1)
            for b in range(5):
                nonzeroMask = (window[0] > 0)
                b01corr[0, b, y, x] = numpy.corrcoef(window[0][nonzeroMask], window[b+2][nonzeroMask])[0, 1]
                b01corr[1, b, y, x] = numpy.corrcoef(window[1][nonzeroMask], window[b+2][nonzeroMask])[0, 1]
Forget the 'if valid500' and 'nonzeroMask' parts; those are there to skip
pixels that don't need to be calculated, and to feed only valid pixels
from the window into corrcoef.
spectral500Bordered is essentially a [d dates, 7 images, ysize, xsize]
array. I work per date (d), then calculate the corrcoef for images[0]
versus images[2:], and for images[1] versus images[2:]
Thanks,
Vincent.
_______________________________________________
SciPy-User mailing list
SciPy...@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
Probably cython would be more maintainable...
Zach
I wrote a moving correlation for time series last November (scipy-user,
and a preceding discussion on the numpy mailing list).
I don't work with pictures, so I don't know if this can be extended to
your case. Since signal.correlate or convolve work in all directions,
it might be possible.
def yxcov(self):
    xys = signal.correlate(self.x*self.y, self.kern, mode='same')[self.sslice]
    return xys/self.n - self.ymean*self.xmean
Josef
I tried that, but less cleverly (just reshaping and transposing); it
made a copy, and I ended up with an array that took 5*5 times as much
space as my initial array... And it took a lot of time.
But I don't see how this reshape would gain me much speed?
> Then you could compute
> the correlation coefficient between the two arrays directly. Maybe?
Ah, like that. I thought numpy.corrcoef(x,y) only worked on flat x and
flat y. There is no axis keyword... Is there another function that would
work on nd arrays?
>
> Probably cython would be more maintainable...
Well, speed is more important than readability this time :-) It's
terabytes of data I'll need to push through this function...
What I understood till now (just starting to look at cython) is that in
this kind of thing, the repeated array sub-indexing (and its bounds
checking etc.) is the main bottleneck. I presume the corrcoef function
is pretty much optimized? So I thought if I could do the loop in cython
and implement some clever code for the window indexing (only replacing the
new elements), I'd relatively simply get the major speed gain. Does that
sound correct?
Only still have to find out now how to call the corrcoef function from
cython...
Thanks for your thoughts on this,
Vincent.
I saw that when searching on this topic, but didn't think it would work
for me as I supposed it was purely 1-dimensional, and I thought that in
your implementation, though the window moves, the kernel is the same all
the time? I'm no signal processing pro (alas) so please correct me if
I'm wrong. I'll try to find the discussion you mentioned tomorrow. Damn
time zones... ;-)
Correlation is just sum(x*y)/sqrt(sum(x*x))/sqrt(sum(y*y)), but with the
moving mean subtracted first, which makes it slightly tricky.
The moving sum can be done with any nd convolve or correlate (ndimage
or signal), which works in all directions.
The window kernel is just np.ones((windowsize, windowsize)) for a
symmetric window; or, if nan handling is not necessary, the averaging
window can be used directly.
I'm pretty sure now it works, with 5 or so intermediate arrays in the
same size as the original array
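The moving-sum approach above can be sketched roughly like this (a minimal sketch, not the original implementation: scipy.ndimage.uniform_filter stands in for the signal.correlate moving sum, and nan/border handling is ignored; the function name is made up):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def windowed_corrcoef(a, b, size=5):
    """Pearson r of a size x size moving window over two images."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # Moving means of x, y, x*y, x*x, y*y via a box filter
    # (the moving sum divided by the window area).
    ma = uniform_filter(a, size)
    mb = uniform_filter(b, size)
    cov = uniform_filter(a * b, size) - ma * mb
    va = uniform_filter(a * a, size) - ma * ma
    vb = uniform_filter(b * b, size) - mb * mb
    return cov / np.sqrt(va * vb)
```

Only a handful of intermediate arrays the size of the input are needed (matching the "5 or so" above); pixels near the border depend on the filter's boundary mode.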
Josef
Well, the fiendishly-clever view (which might not even be possible, I
haven't thought it through super-clearly) would at least use no more
memory than the original array.
>> Then you could compute
>> the correlation coefficient between the two arrays directly. Maybe?
>
> Ah, like that. I thought numpy.corrcoef(x,y) only worked on flat x and
> flat y. There is no axis keyword... Is there another function that
> would
> work on nd arrays?
I'd follow Josef's lead here and implement your own corrcoef that (in
the above case) works on (x,y,n,n)-shape arrays, or, in the cython
case, is just a cdef function that computes it directly.
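Such a hand-rolled corrcoef over the last two axes could look something like this (a sketch; the function name and the (..., n, n) layout are just for illustration):

```python
import numpy as np

def corrcoef_windows(x, y):
    """Pearson r over the last two axes of two equally shaped arrays."""
    xm = x - x.mean(axis=(-2, -1), keepdims=True)
    ym = y - y.mean(axis=(-2, -1), keepdims=True)
    num = (xm * ym).sum(axis=(-2, -1))
    den = np.sqrt((xm * xm).sum(axis=(-2, -1)) * (ym * ym).sum(axis=(-2, -1)))
    return num / den
```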
> Well, speed is more important than readability this time :-) It's
> terabytes of data I'll need to push through this function...
OK, then cython is likely preferable here. Just make sure to properly
declare the array and the index variables, e.g.:
cdef np.ndarray[np.float32_t, ndim=2, negative_indices=False, mode='c'] a = \
    np.asarray(a_in, dtype=np.float32, order='C')
(or could use fortran-order if that's what the arrays are natively)
and make sure after debugging to @cython.boundscheck(False) the
function.
With these steps, your indexing will be maximally fast.
> What I understood till now (just starting to look at cython) is that in
> this kind of thing, the repeated array sub-indexing (and its bounds
> checking etc.) is the main bottleneck. I presume the corrcoef function
> is pretty much optimized? So I thought if I could do the loop in cython
> and implement some clever code for the window indexing (only replacing
> the new elements), I'd relatively simply get the major speed gain. Does
> that sound correct?
> Only still have to find out now how to call the corrcoef function from
> cython...
Definitely implement your own corrcoef in cython in this case. Or just
inline it in the inner loop of the function you're writing. The killer
overhead will be in calling a python function on every pixel and
converting the arguments and return value otherwise.
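In plain Python, the per-window computation you'd inline looks like this, written with raw sums so it translates directly into simple loops in Cython (the helper name is made up):

```python
import numpy as np

def corr_from_sums(x, y):
    """Pearson r from raw sums -- the form that inlines as plain loops."""
    n = x.size
    sx, sy = x.sum(), y.sum()
    sxy = (x * y).sum()
    sxx = (x * x).sum()
    syy = (y * y).sum()
    return (n * sxy - sx * sy) / np.sqrt(
        (n * sxx - sx * sx) * (n * syy - sy * sy))
```

In Cython each of those sums becomes one pass over the 25 window values, so no Python function is called per pixel.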
Zach
I suspect that this approach would even be considerably faster than coding your loops and the windowed correlation up in cython.
cheers,
David
I'd say the Cython version is bound to be more readable than any
vectorised form one could contrive. As an example of how to calculate
correlation coefficients in Cython using Summed Area Tables, have a
look at
(sorry for the horrible URL)
Further down, the function "ncc" computes normalised cross-correlation.
Regards
Stéfan
Guys, thanks a lot! It works well, and execution times have decreased
by approximately a factor of 6. Pity I still don't get to learn cython,
but I gained some insight. Be assured I'd buy you a beer if we lived a
bit closer :-)
thanks again,
Vincent.