ztoc ctoz performance and multiple fft processes

Laurent Noudohounsi

Mar 31, 2017, 7:35:04 AM

to perfoptimi...@lists.apple.com

Hi, I'm currently developing an audio application, and I have to do some FFT computations to transform real data into complex data.

I have a few questions about the vDSP library, and unfortunately I cannot find any answers on the web.

I understand the concept of vDSP_ztoc and vDSP_ctoz, but since I also have to work with C++ matrix libraries that store std::complex data interleaved, I end up calling these functions a lot.
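For reference, the two layouts differ only in where the imaginary parts live: an interleaved array (the layout of a std::complex<float> array, or of vDSP's DSPComplex) stores re0, im0, re1, im1, ..., while a DSPSplitComplex points to separate realp and imagp arrays. The portable C sketch below mirrors what vDSP_ctoz and vDSP_ztoc do in the common case of an interleaved stride of 2 and a split stride of 1. The function names are hypothetical and the code does not call Accelerate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical portable equivalent of vDSP_ctoz with an interleaved stride
   of 2 and a split stride of 1: copies n complex values from the layout
   {re0, im0, re1, im1, ...} into separate real and imaginary arrays. */
void interleaved_to_split(const float *interleaved, float *realp,
                          float *imagp, size_t n)
{
    assert(n == 0 || (interleaved && realp && imagp));
    for (size_t i = 0; i < n; ++i) {
        realp[i] = interleaved[2 * i];     /* real part of element i */
        imagp[i] = interleaved[2 * i + 1]; /* imaginary part of element i */
    }
}

/* Hypothetical portable equivalent of vDSP_ztoc for the same strides:
   the inverse copy, from split arrays back to the interleaved layout. */
void split_to_interleaved(const float *realp, const float *imagp,
                          float *interleaved, size_t n)
{
    assert(n == 0 || (interleaved && realp && imagp));
    for (size_t i = 0; i < n; ++i) {
        interleaved[2 * i] = realp[i];
        interleaved[2 * i + 1] = imagp[i];
    }
}
```

Each conversion is a full pass over the data, which is the cost that code keeping its buffers in one layout throughout avoids.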

1) My first question: is it absolutely necessary to use vDSP_fft_zrip to compute a real-to-complex FFT? Would it be the same if I used a complex-to-complex FFT with vDSP_fft_zip? In that case I would not need vDSP_ctoz, since all my input data would go in the realp part of the DSPSplitComplex.

2) Why does vDSP use vDSP_ztoc and vDSP_ctoz? Is it really just to save one DSPSplitComplex element, by moving the Nyquist component into the imaginary part of the first element of the DSPSplitComplex? If so, should I conclude that assigning a value and calling vDSP_ctoz take the same time?

3) I would like to perform multiple FFTs in a single call, so I have to use vDSP_fftm_zrip. The problem is that I don't really understand how to organize my data. vDSP_fftm_zrip takes a single DSPSplitComplex. Say I have a real array S that is the concatenation of two signals, S1 and S2, each of length N: S = S1[0] S1[1] ... S1[N-1] S2[0] S2[1] ... S2[N-1]. Do I have to create two DSPSplitComplex structures, call vDSP_ctoz on each, and concatenate all of their realp/imagp arrays into one big DSPSplitComplex to pass to vDSP_fftm_zrip? Is that good practice for vDSP_fftm_zrip?
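Questions 1) and 2) both rest on a property of real input: for N real samples, the DFT satisfies X[N-k] = conj(X[k]), and the DC bin X[0] and the Nyquist bin X[N/2] are purely real. A complex-to-complex transform computes all N bins anyway, while the N/2 + 1 distinct bins fit exactly in N floats if the real Nyquist value is stored in imagp[0], the packing described in question 2). The sketch below checks both facts with an exact 4-point DFT (chosen so the twiddle factors are 1, -i, -1, i) and a hypothetical `pack_half_spectrum` helper. It is not vDSP code, it is not an FFT, and it ignores any scale factor vDSP applies to real-FFT results.

```c
#include <assert.h>
#include <stddef.h>

/* Plain complex struct, so the sketch needs neither <complex.h> nor libm. */
typedef struct { double re, im; } cplx;

/* Exact 4-point DFT of real input: X[k] = sum_j x[j] * w^(j*k), where
   w = exp(-2*pi*i/4) = -i has powers 1, -i, -1, i. Illustration only. */
void dft4_real(const double x[4], cplx X[4])
{
    static const cplx w_pow[4] = { {1, 0}, {0, -1}, {-1, 0}, {0, 1} };
    assert(x != NULL && X != NULL);
    for (int k = 0; k < 4; ++k) {
        X[k].re = 0.0;
        X[k].im = 0.0;
        for (int j = 0; j < 4; ++j) {
            cplx w = w_pow[(j * k) % 4];
            X[k].re += x[j] * w.re;
            X[k].im += x[j] * w.im;
        }
    }
}

/* Hypothetical helper: packs bins X[0..n/2] of a real signal's DFT (n even)
   into n/2 split elements. X[0] (DC) and X[n/2] (Nyquist) are purely real,
   so the Nyquist value goes into imagp[0]. */
void pack_half_spectrum(const cplx *X, size_t n, double *realp, double *imagp)
{
    assert(n >= 2 && n % 2 == 0);
    size_t half = n / 2;
    realp[0] = X[0].re;    /* DC */
    imagp[0] = X[half].re; /* Nyquist */
    for (size_t k = 1; k < half; ++k) {
        realp[k] = X[k].re;
        imagp[k] = X[k].im;
    }
}
```

With x = {1, 2, 3, 4}, X[3] equals conj(X[1]) = -2 - 2i, so it carries no new information, and after packing imagp[0] holds the Nyquist value X[2] = -2.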

Thanks for your help.

Laurent Noudohounsi

Apr 1, 2017, 2:57:29 PM

to Eric Postpischil, perfoptimi...@lists.apple.com

Thank you so much, Eric, for your fantastic answer. I also didn’t expect such a quick reply; thank you!

It really helps me better understand this fantastic library. Sometimes one might think a function would be better to use, but it ends up being slower, as in the vDSP_fft_zip versus vDSP_fft_zrip question.

Indeed, the legacy explanation for the choice of split complex makes sense now.

I will restructure my program to be more efficient and avoid the calls to vDSP_ctoz and vDSP_ztoc.
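One possible way to skip a separate vDSP_ctoz pass, sketched under the assumption that real samples arrive one at a time (for example from an audio callback): write each sample straight into the split layout a real-to-complex FFT reads, with even-indexed samples in realp and odd-indexed samples in imagp. The `SplitBuffer` type and `split_buffer_push` function are hypothetical, not part of vDSP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer holding one n-sample real signal (n even) in the
   even/odd split layout that vDSP_ctoz produces with strides 2 and 1. */
typedef struct {
    float *realp;  /* samples 0, 2, 4, ... (n/2 floats) */
    float *imagp;  /* samples 1, 3, 5, ... (n/2 floats) */
    size_t n;      /* signal length in real samples */
    size_t count;  /* samples written so far */
} SplitBuffer;

/* Appends one real sample; returns 1 once the buffer holds n samples. */
int split_buffer_push(SplitBuffer *b, float sample)
{
    assert(b != NULL && b->count < b->n);
    size_t i = b->count++;
    if (i % 2 == 0)
        b->realp[i / 2] = sample;
    else
        b->imagp[i / 2] = sample;
    return b->count == b->n;
}
```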

Your help with vDSP_fftm_zrip was very useful. In fact, I tried this during my first attempt, but it didn’t work. Now I understand why: Apple’s documentation for this function seems confusing. Here is what it says about the fourth parameter:

"The number of elements between the first element of one input signal and the first element of the next (which is also the length of each input signal, measured in elements)."

To me, "input signal" means the array of real data, hence a length of N. But I guess it means the length of the input data as stored in a DSPSplitComplex, hence N/2. I had been thinking of the DSPSplitComplex as the output data.
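The N/2 reading can be checked with a small sketch, assuming each signal length N is even: because a ctoz-style deinterleave pairs consecutive samples, deinterleaving the whole concatenation S in one pass yields the same realp/imagp arrays as converting S1 and S2 separately and concatenating the results. Signal m then starts at split element m*N/2. `deinterleave` and `signal_offset` are hypothetical portable stand-ins, not vDSP functions.

```c
#include <assert.h>
#include <stddef.h>

/* Portable stand-in for vDSP_ctoz with strides 2 and 1: even-indexed
   samples go to realp, odd-indexed samples to imagp. count = number of
   pairs written. */
void deinterleave(const float *src, float *realp, float *imagp, size_t count)
{
    assert(src != NULL && realp != NULL && imagp != NULL);
    for (size_t i = 0; i < count; ++i) {
        realp[i] = src[2 * i];
        imagp[i] = src[2 * i + 1];
    }
}

/* Offset, in split elements, of the first element of signal m when real
   signals of n samples each (n even) are packed back to back. */
size_t signal_offset(size_t m, size_t n)
{
    assert(n % 2 == 0);
    return m * (n / 2);
}
```

In vDSP terms this suggests that a single vDSP_ctoz call over all 2N samples produces the layout vDSP_fftm_zrip would read with a signal-to-signal stride of N/2; that is an inference from the memory layout, not something verified here against Accelerate.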

By the way, I found a website (http://stpeterandpaul.ca/tiger/documentation/Performance/Conceptual/vDSP/ref_chap/chapter_4.1_section_53.html) that seems to describe the old (Mac OS X Tiger) version of vDSP_fftm_zrip, with two stride parameters instead of one:

"fftStride: The number of elements between the first element of one input signal and the first element of the next (which is also to length of each input signal, measured in elements)." Here I understand a length of N.

"rfftStride: The number of elements between the first element of one result vector and the next in the output vector result." Here I understand a length of N/2.

So is this a documentation issue (in which case it’s time to use the bug-reporting link you gave me), or am I the one misunderstanding?

Eric Postpischil

Apr 3, 2017, 11:30:25 AM

to Laurent Noudohounsi, perfoptimi...@lists.apple.com

On Apr 1, 2017, at 14:57, Laurent Noudohounsi <laurent.n...@gmail.com> wrote:

> Your help with vDSP_fftm_zrip was very useful. In fact, I tried this during my first attempt, but it didn’t work. Now I understand why: Apple’s documentation for this function seems confusing. Here is what it says about the fourth parameter:
> "The number of elements between the first element of one input signal and the first element of the next (which is also the length of each input signal, measured in elements)."
> To me, "input signal" means the array of real data, hence a length of N. But I guess it means the length of the input data as stored in a DSPSplitComplex, hence N/2. I had been thinking of the DSPSplitComplex as the output data.

Yes, that is a mistake in the documentation, on two counts. One, for real-to-complex signals, each array contains only half the elements for a signal, so the stride between signals packed into memory with no padding would be N/2 elements. Two, the stride is largely arbitrary; users can put padding between signals or even supply negative strides to traverse the signals backward. So it is not generally true that the stride between signals is the length of a signal. Thanks for pointing it out. I entered a bug report about this. (Negative strides are not generally useful in this case, since the FFT operations on the signals are purely parallel, but they can be useful in other operations.)
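A sketch of the address arithmetic described above, using hypothetical names: element j of signal m sits at offset m*fftStride + j*signalStride from the base of realp (and likewise imagp). Real signals of N samples packed back to back give fftStride = N/2; padding each signal with P unused elements gives N/2 + P; and a negative fftStride, measured from a base at the last signal in memory, visits the signals in reverse. Strides are signed here, as vDSP's vDSP_Stride type is.

```c
#include <assert.h>
#include <stddef.h>

/* Offset (in split elements, relative to a base pointer) of element j of
   signal m, given the stride between elements within one signal and the
   stride between the first elements of successive signals. */
ptrdiff_t element_offset(ptrdiff_t m, ptrdiff_t j,
                         ptrdiff_t signal_stride, ptrdiff_t fft_stride)
{
    return m * fft_stride + j * signal_stride;
}

/* Signal-to-signal stride for real signals of n samples (n even) stored
   with `pad` unused split elements after each signal. */
ptrdiff_t packed_fft_stride(ptrdiff_t n, ptrdiff_t pad)
{
    assert(n > 0 && n % 2 == 0 && pad >= 0);
    return n / 2 + pad;
}
```

For example, with three 8-sample signals packed back to back and a base at offset 8 (the last signal), a stride of -4 reaches signal 1 at offset 4 and signal 0 at offset 0.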

> By the way, I found a website (http://stpeterandpaul.ca/tiger/documentation/Performance/Conceptual/vDSP/ref_chap/chapter_4.1_section_53.html) that seems to describe the old (Mac OS X Tiger) version of vDSP_fftm_zrip, with two stride parameters instead of one:
>
> "fftStride: The number of elements between the first element of one input signal and the first element of the next (which is also to length of each input signal, measured in elements)." Here I understand a length of N.
>
> "rfftStride: The number of elements between the first element of one result vector and the next in the output vector result." Here I understand a length of N/2.

That is overzealous copying and pasting. Notice there is no rfftStride in the function prototype shown there. It looks like somebody copied and pasted the descriptive text from vDSP_fftm_zrop (which is an out-of-place routine and has extra parameters to describe the output data, including rfftStride) without deleting the parts that do not apply to vDSP_fftm_zrip.

(There are actually two stride parameters in vDSP_fftm_zrip: one for the stride between elements within a signal and one for the stride between the first elements of successive signals. The descriptive text on that web page includes three: signalStride (the stride between elements within a signal), fftStride (the stride between successive signals), and rfftStride (the stride between elements within the output “results” array, which is not applicable to vDSP_fftm_zrip).)
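Because the two strides are independent, layouts other than back-to-back signals can be described. For example, if M signals are interleaved sample by sample (element j of signal m at position j*M + m, like a table with one signal per column), the stride within a signal is M and the stride between signals is 1. The hypothetical helper below gathers one signal from an array using exactly those two strides; it illustrates the addressing only and is not a vDSP call.

```c
#include <assert.h>
#include <stddef.h>

/* Copies signal m (len elements) out of an array whose layout is described
   by signal_stride (between elements of one signal) and fft_stride
   (between the first elements of successive signals). */
void gather_signal(const float *base, size_t m, size_t len,
                   size_t signal_stride, size_t fft_stride, float *out)
{
    assert(base != NULL && out != NULL);
    for (size_t j = 0; j < len; ++j)
        out[j] = base[m * fft_stride + j * signal_stride];
}
```

The same function reads back-to-back signals of length L with signal_stride = 1 and fft_stride = L.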

When you have trouble with Apple’s developer documentation for vDSP, one alternative is to examine the header, vDSP.h. It is deep within the SDK; for macOS, it is typically at a path such as:

/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/System/Library/Frameworks/Accelerate.framework/Frameworks/vecLib.framework/Headers/vDSP.h

/Applications/Xcode.app may change if you install Xcode somewhere else, and the platform and SDK names will change for iOS or other versions of macOS.

Inside vDSP.h, I have written comments documenting the routines. Be sure to read the “Documentation conventions” section at the beginning. In particular, to separate the description of the mathematics from the description of memory layout, many of the routines are described using mathematical vectors that disregard strides. E.g., an array’s elements may be referred to as C[i], treating C as a simple mathematical vector with elements C[0], C[1], C[2],…, when in fact the user may supply a stride other than one, so the actual elements in memory are at C[i*IC], where IC is the stride. For routines with more involved strides, this default mapping of a mathematical vector to memory layout is not used, and the documentation for the routine shows the strides explicitly.
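As an illustration of that convention, the hypothetical routine below is specified the way such a comment describes the math, C[i] = A[i] + B[i] for 0 <= i < N, while its loop applies the strides IA, IB, and IC to reach the actual memory elements. It is modeled loosely on an element-wise vector add and is not Apple's code; for simplicity it accepts positive strides only.

```c
#include <assert.h>
#include <stddef.h>

/* Mathematically: C[i] = A[i] + B[i] for 0 <= i < n.
   In memory: element i of each vector is at index i * (its stride). */
void strided_add(const float *A, size_t IA, const float *B, size_t IB,
                 float *C, size_t IC, size_t n)
{
    assert(IA > 0 && IB > 0 && IC > 0);
    for (size_t i = 0; i < n; ++i)
        C[i * IC] = A[i * IA] + B[i * IB];
}
```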

—edp
