[SciPy-User] Understanding the cross-correlation function numpy.correlate & how to use it properly with real and synthetic data

Rob Newman

Dec 14, 2011, 3:14:45 PM
to SciPy Users List
Hi SciPy gurus,

First up - I am not a physicist, so please be gentle!

I have an array of real data and an array of synthetic data. I am trying to determine the cross-correlation of the two signals and the timeshift that needs to be applied to the real data to best match the synthetic data. I also want to use the real data later in the script only if the cross-correlation result is above some level of confidence.

I have read the man page on numpy.correlate, but I am not entirely sure of what that function returns to me, and how I should use it. I have looked at James Battat's website that has a useful script on the discrete correlation function of two functions (https://www.cfa.harvard.edu/~jbattat/computer/python/science/#correlation) but I think his example is more complicated than my needs.

I understand that the correlate function returns an array whose length is twice that of the input arrays minus 1 (when using mode='full'), but what do I need to do to that resulting array to get the correlation value (if there is indeed a single value to be returned) and the timeshift that needs to be applied to the real data to match the synthetic data?

Thanks in advance,
- Rob
_______________________________________________
SciPy-User mailing list
SciPy...@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user

Kevin Gullikson

Dec 14, 2011, 3:49:30 PM
to SciPy Users List
Rob,

> I understand that the correlate function returns an array that is twice the size of both the input arrays minus 1 (when using mode='full'), but what do I need to do to that resulting array to get the correlation value (if there is indeed a value to be returned) and the timeshift that needs to be applied to the real data to match the synthetic data.

numpy.correlate returns an array of correlation values, one for each relative shift (lag) between the two inputs, so you don't need to do anything more to get those. Getting the timeshift is the somewhat tricky part. Here is some code that I use (I stole it from somewhere, but don't remember where...):

import numpy

# Do the correlation. x and y are the x and y components of your data (so I guess x is time and y is whatever you are modeling); template is what you are cross-correlating with
ycorr = numpy.correlate(y, template, mode="full")

# Generate an x axis
xcorr = numpy.arange(ycorr.size)

# Convert this into lag units, but still not really physical.
# Zero lag sits at index template.size - 1 (the same as y.size - 1 when both arrays have equal length).
lags = xcorr - (template.size - 1)
distancePerLag = (x[-1] - x[0]) / float(x.size - 1)  # This is just the x-spacing (or for you, the timestep) in your data

# Convert your lags into physical units
offsets = -lags * distancePerLag


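You can then use numpy.argmax() to find the index in ycorr that has the highest cross-correlation value. The entry of offsets at that same index is the time shift to apply to your data, and the entry of ycorr at that index is the peak correlation value, which you can use however you want.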
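
The argmax step, together with a value that can be compared against a confidence threshold, might look roughly like the sketch below. It is not part of the code above: the helper name best_shift, the mean removal, and the normalization by the signal energies are assumptions added for illustration, as are the synthetic Gaussian pulses. It assumes evenly sampled data with timestep dt.

```python
import numpy as np

def best_shift(y, template, dt):
    """Hypothetical helper: return (shift, peak).

    shift is the time offset to add to y's time axis so that y lines up
    with template; peak is the normalized correlation at that lag
    (close to 1 for a good match, between -1 and 1 in general).
    """
    # Remove the means so a constant offset does not dominate the result.
    y0 = y - y.mean()
    t0 = template - template.mean()

    ycorr = np.correlate(y0, t0, mode="full")
    # Zero lag sits at index len(template) - 1 in the 'full' output.
    lags = np.arange(ycorr.size) - (t0.size - 1)

    best = np.argmax(ycorr)
    # Normalize by the signal energies so the peak is unit-free.
    norm = np.sqrt(np.sum(y0 ** 2) * np.sum(t0 ** 2))
    return -lags[best] * dt, ycorr[best] / norm

# Synthetic check: the "real" pulse arrives 0.5 time units after the template.
dt = 0.01
t = np.arange(1000) * dt
template = np.exp(-((t - 4.0) ** 2) / 0.1)
y = np.exp(-((t - 4.5) ** 2) / 0.1)

shift, peak = best_shift(y, template, dt)
print(shift, peak)
```

A threshold on peak (for example, only using the real data when peak exceeds some value you choose) is one simple way to express the "level of confidence" test; what threshold is appropriate depends on the data.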

Cheers,
Kevin Gullikson

Rob Newman

Dec 15, 2011, 11:38:48 AM
to SciPy Users List
Hi Kevin,

Thanks for that chunk of code and the explanation - it's a great help.

Happy holidays.
- Rob Newman

Vincent Schut

Dec 16, 2011, 4:45:11 AM
to scipy...@scipy.org
Hi,

If you just need to find the time shift, another approach could be FFT
phase correlation. I have successfully used that to co-register
satellite images, but I suppose it would apply in the 1-D case as well.
Unfortunately I don't have any code ready, but you might want to look
up some information on the subject to see if it would fit your needs.
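
For reference, a minimal 1-D sketch of phase correlation is given below. It is not code from this thread: the function name phase_correlation_shift is hypothetical, and the sketch assumes both arrays have the same length, are evenly sampled, and that the shift can be treated as circular (wrap-around).

```python
import numpy as np

def phase_correlation_shift(y, template):
    """Hypothetical helper: estimate by how many samples y is delayed
    relative to template, treating the shift as circular.
    Assumes y and template have the same length."""
    Y = np.fft.fft(y)
    T = np.fft.fft(template)

    # Cross-power spectrum, keeping only the phase information.
    cross = Y * np.conj(T)
    cross /= np.abs(cross) + 1e-12  # small epsilon avoids division by zero

    # The inverse transform peaks at the delay (in samples).
    r = np.fft.ifft(cross).real
    k = int(np.argmax(r))

    # Indices past the midpoint correspond to negative delays.
    if k > y.size // 2:
        k -= y.size
    return k

# Synthetic check with a known circular shift.
rng = np.random.default_rng(0)
template = rng.standard_normal(512)
y = np.roll(template, 37)  # y is template delayed by 37 samples

delay = phase_correlation_shift(y, template)
print(delay)
```

A positive result means y lags the template, so y would need to be shifted back by that many samples (multiplied by the timestep for physical units) to line up with it.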

Best,
Vincent.
