Cross-correlation: rfft() VS fft() VS xcorr() performances


CrocoDuck O'Ducks

Apr 7, 2016, 4:37:18 PM
to julia-users
Hey there cool people!

Seems like this topic will be pretty similar to this one.

So, I am trying to write my own version of alignsignals. In a few words: I calculate the cross-correlation, find its peak, compute the peak's shift from the origin of time, and use that shift to align the signals. I have three versions of a function that achieves this, using xcorr, fft and rfft respectively. When run on two signals one minute long, sampled at 48 kHz, they benchmark as follows:

xcorr() version:

================ Benchmark Results ========================
Time per evaluation: 832.16 ms [663.40 ms, 1.00 s]
Proportion of time in GC: 15.94% [14.32%, 17.57%]
Memory allocated: 460.01 mb
Number of allocations: 239 allocations
Number of samples: 10
Number of evaluations: 10
Time spent benchmarking: 9.73 s

rfft() version:

================ Benchmark Results ========================
Time per evaluation: 1.01 s [654.63 ms, 1.36 s]
Proportion of time in GC: 12.11% [8.30%, 15.91%]
Memory allocated: 306.21 mb
Number of allocations: 344 allocations
Number of samples: 6
Number of evaluations: 6
Time spent benchmarking: 7.58 s

fft() version:

================ Benchmark Results ========================
Time per evaluation: 715.45 ms [536.86 ms, 894.04 ms]
Proportion of time in GC: 20.37% [16.49%, 24.24%]
Memory allocated: 569.87 mb
Number of allocations: 276 allocations
Number of samples: 13
Number of evaluations: 13
Time spent benchmarking: 10.56 s

It seems that the fft() version is the fastest, while the rfft() version allocates the least memory. According to my understanding, rfft() should also be faster than fft(). Unfortunately, the FFTW plan in the post I linked above does not seem to work. Can you suggest how to plan rfft() to make it faster? I will have to align many data collections sampled at 96 kHz minimum, usually at least 30 s long. I plan to do it in a loop, so the quickest execution and lowest memory usage would be a great advantage.

Here is the code defining the functions:

function alignsignals{T <: Real}(x::Array{T, 1}, u::Array{T, 1})

# Delay as lag of cross-correlation peak from origin of time.
lₓ = length(x)
lᵤ = length(u)
lₛ = lₓ + lᵤ - 1

sₓᵤ::Array{T, 1} = xcorr(x, u)

if mod(lₛ, 2) == 0
    ct_idx = fld(lₛ, 2)
else
    ct_idx = fld(lₛ, 2) + 1
end

_::Array{T, 1}, pk_idx::Array{Int, 1} = findmax(sₓᵤ, 1)

δ::Int = ct_idx - pk_idx[1]

# Align:
y = zeros(T, lᵤ)

if δ > 0
    y[1:(lᵤ - δ)] = u[(δ + 1):lᵤ]
else
    y[(-δ + 1):lᵤ] = u[1:(lᵤ + δ)]
end

return y, δ

end

function rfftalignsignals{T <: Real}(x::Array{T, 1}, u::Array{T, 1}; ncores::Int = 1)

lₓ = length(x)
lᵤ = length(u)

FFTW.set_num_threads(ncores)

x_nfft = nextprod(primes(10), lₓ)
u_nfft = nextprod(primes(10), lᵤ)
nfft = max(x_nfft, u_nfft)

X = rfft([x; zeros(nfft - lₓ)])
U = rfft([u; zeros(nfft - lᵤ)])

Sₓᵤ = conj(X) .* U

sₓᵤ = flipdim(fftshift(irfft(Sₓᵤ, 2 * length(Sₓᵤ) - 1)), 1)

lₛ = length(sₓᵤ)

if mod(lₛ, 2) == 0
    ct_idx = fld(lₛ, 2)
else
    ct_idx = fld(lₛ, 2) + 1
end

_::Array{T, 1}, pk_idx::Array{Int, 1} = findmax(sₓᵤ, 1)

δ::Int = ct_idx - pk_idx[1]

# Align:
y = zeros(T, lᵤ)

if δ > 0
    y[1:(lᵤ - δ)] = u[(δ + 1):lᵤ]
else
    y[(-δ + 1):lᵤ] = u[1:(lᵤ + δ)]
end

return y, δ

end

function fftalignsignals{T <: Real}(x::Array{T, 1}, u::Array{T, 1}; ncores::Int = 1)

lₓ = length(x)
lᵤ = length(u)

FFTW.set_num_threads(ncores)

x_nfft = nextprod(primes(10), lₓ)
u_nfft = nextprod(primes(10), lᵤ)
nfft = max(x_nfft, u_nfft)

# Pad both signals to the same length, so the spectra can be multiplied
# element-wise even when x and u differ in length.
X = fft([x; zeros(nfft - lₓ)])
U = fft([u; zeros(nfft - lᵤ)])

Sₓᵤ = conj(X) .* U

sₓᵤ = abs(flipdim(fftshift(ifft(Sₓᵤ)), 1))

lₛ = length(sₓᵤ)

if mod(lₛ, 2) == 0
    ct_idx = fld(lₛ, 2)
else
    ct_idx = fld(lₛ, 2) + 1
end

_::Array{T, 1}, pk_idx::Array{Int, 1} = findmax(sₓᵤ, 1)

δ::Int = ct_idx - pk_idx[1]

# Align:
y = zeros(T, lᵤ)

if δ > 0
    y[1:(lᵤ - δ)] = u[(δ + 1):lᵤ]
else
    y[(-δ + 1):lᵤ] = u[1:(lᵤ + δ)]
end

return y, δ

end
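For cross-checking the frequency-domain math above, the same rfft-based lag estimate can be sketched in NumPy (Python is used here purely for illustration; estimate_lag is a hypothetical helper, not code from this thread):

```python
import numpy as np

def estimate_lag(x, u, nfft=None):
    """Estimate how many samples u lags x via an rfft cross-correlation."""
    n = nfft or (len(x) + len(u) - 1)      # pad to the full linear-xcorr length
    X = np.fft.rfft(x, n)
    U = np.fft.rfft(u, n)
    s = np.fft.irfft(np.conj(X) * U, n)    # s[k] = sum_n x[n] * u[n + k] (mod n)
    k = int(np.argmax(s))
    # Lags 0, 1, 2, ... sit at the front; negative lags wrap to the back.
    return k if k <= n // 2 else k - n

rng = np.random.default_rng(0)
x = rng.standard_normal(48_000)            # 1 s of noise at 48 kHz
u = np.roll(x, 100)                        # u is x delayed by 100 samples
print(estimate_lag(x, u))                  # → 100
```

Padding both inputs to the same transform length before the forward transforms is what makes the element-wise product of the spectra valid when the two signals differ in length.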



DNF

Apr 8, 2016, 3:18:17 AM
to julia-users
The xcorr function calls the conv function, which in turn uses fft. If you know the general structure and length of your signals ahead of time, you can probably gain some performance by planning the FFTs beforehand. I don't know why it doesn't work for you, but you could have a look at conv in dsp.jl.

If you really want to speed things up, though, you might implement your own xcorr. xcorr dominates the runtime in your function, and if you know an upper bound on the signal lags, you can implement xcorr with a limited number of lags. By default xcorr calculates all lags (in your case that's 2*48000*60 - 1 ≈ 6 million lags). If you know that the max lag is 1 second, you can save ~98% of the runtime of xcorr.
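That limited-lag idea can be sketched in a few lines (Python/NumPy here just to show the shape of it; xcorr_maxlag is a hypothetical name, not DSP.jl's implementation):

```python
import numpy as np

def xcorr_maxlag(x, u, maxlag):
    """Time-domain cross-correlation restricted to lags in [-maxlag, maxlag].

    Returns (lags, r) with r[i] = sum_n x[n] * u[n + lags[i]], so the cost
    is O(L * N) for L lags instead of computing all 2N - 1 of them.
    """
    n = min(len(x), len(u))
    lags = np.arange(-maxlag, maxlag + 1)
    r = np.empty(len(lags))
    for i, k in enumerate(lags):
        if k >= 0:
            r[i] = np.dot(x[:n - k], u[k:n])   # u shifted left by k samples
        else:
            r[i] = np.dot(x[-k:n], u[:n + k])  # u shifted right by -k samples
    return lags, r

x = np.array([0.0, 0.0, 1.0, 0.0])
u = np.array([0.0, 0.0, 0.0, 1.0])             # u is x delayed by one sample
lags, r = xcorr_maxlag(x, u, 2)
print(lags[np.argmax(r)])                      # → 1
```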

A couple of other remarks:
* There's no need to put type annotations next to the function outputs; it's just visual noise.
* Use ct_idx = cld(lₛ, 2) and forget about the mod.
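The cld tip is easy to sanity-check (Python for illustration; cld(a, 2) in Julia is ceiling division, written -(-a // 2) below):

```python
def ct_index(l):
    """Centre index via the mod-based branch used in the thread's functions."""
    return l // 2 if l % 2 == 0 else l // 2 + 1

def ct_index_cld(l):
    """Same centre index as a single ceiling division, i.e. cld(l, 2) in Julia."""
    return -(-l // 2)

assert all(ct_index(l) == ct_index_cld(l) for l in range(1, 1000))
print("equivalent")                        # → equivalent
```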

CrocoDuck O'Ducks

Apr 8, 2016, 3:59:31 AM
to julia-users
Oh cool, thanks for the tips. I'll dive in and be back!

DNF

Apr 8, 2016, 7:31:54 AM
to julia-users
Hmm. Thinking a bit more about it, it might be hard to beat the FFT when your signals are that long.

FFT is O(N*logN), while time domain is O(L*logN) where L is the number of lags. log2(60*48000) is less than 22(!) so the FFT will be very beneficial. Here you can have a look at a discussion of conv speed in Julia: https://github.com/JuliaLang/julia/issues/13996, including a heuristic for choosing between FFT and time-domain processing.

DNF

Apr 8, 2016, 7:33:01 AM
to julia-users


On Friday, April 8, 2016 at 1:31:54 PM UTC+2, DNF wrote:
FFT is O(N*logN), while time domain is O(L*logN) where L is the number of lags.

I meant, of course, that the time domain xcorr is O(L*N). 

Stefan Karpinski

Apr 8, 2016, 9:49:59 AM
to julia...@googlegroups.com
Would it make sense to have a maxlag keyword option on xcorr to limit how big the lags it considers are?

CrocoDuck O'Ducks

Apr 8, 2016, 10:11:38 AM
to julia-users
I think so. The MATLAB xcorr implementation accepts a maxlag argument, which is probably handy for many people.

I don't think I will write my own xcorr version. I will probably take a dirty shortcut: calculate the cross-correlation only on windowed segments of the signals. It should give the correct lag anyway.
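That windowed shortcut might look something like this (a NumPy sketch under the assumption that the true lag is smaller than the window width; lag_from_window is a hypothetical name):

```python
import numpy as np

def lag_from_window(x, u, start, width):
    """Estimate how many samples u lags x from one windowed segment.

    Cross-correlates u[start:start + width] against a wider stretch of x,
    assuming |true lag| < width so the matching samples are fully covered.
    """
    lo = max(start - width, 0)
    hi = min(start + 2 * width, len(x))
    seg_x = x[lo:hi]
    seg_u = u[start:start + width]
    r = np.correlate(seg_x, seg_u, mode="valid")   # slide seg_u along seg_x
    # Best match starts at index lo + argmax(r) in x; u lags x by how far
    # the window position in u sits past that point.
    return start - (lo + int(np.argmax(r)))

rng = np.random.default_rng(1)
x = rng.standard_normal(10_000)
u = np.concatenate([np.zeros(30), x])[:10_000]     # u is x delayed by 30 samples
print(lag_from_window(x, u, start=5_000, width=200))   # → 30
```

This touches only O(width) samples per estimate instead of the full signals, which is the saving the shortcut is after.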