vDSP_normalize()

Luigi Castelli

Nov 11, 2015, 10:33:26 AM

to PerfOpt-Dev

Hi there,

I am coding in C and I have to implement scalar and vectorized versions of the same algorithm.

For the vectorized version I chose to use the Accelerate framework and more often than not I find myself using the vDSP library.

Now I am working on an algorithm to calculate the mean and standard deviation of an array of values.

It just so happens that vDSP has a function to do that very elegantly: vDSP_normalize()

For the scalar version of my algorithm I am not so concerned about speed; however, I want to make sure I get consistent results across both versions. To do that, I would like to match the numerical stability/error of the vDSP implementation.

Can someone shed some light on the numerical properties of the algorithm implemented in vDSP_normalize()?

What method does vDSP_normalize() use under the hood? (Welford, direct method, etc.)

Thanks for any help.

- Luigi

Eric Postpischil

Nov 11, 2015, 2:02:40 PM

to Luigi Castelli, PerfOpt-Dev

On Nov 11, 2015, at 07:29, Luigi Castelli <super...@yahoo.com> wrote:

For the scalar version of my algorithm I am not so concerned about speed; however, I want to make sure I get consistent results across both versions. To do that, I would like to match the numerical stability/error of the vDSP implementation.

Can someone shed some light on the numerical properties of the algorithm implemented in vDSP_normalize()?
What method does vDSP_normalize() use under the hood? (Welford, direct method, etc.)

vDSP facilitates performance at the expense of allowing leeway in accuracy. Part of the license granted to vDSP is that it may use different methods depending on processor model, stride, alignment, and other factors, and the methods may change from version to version of the Accelerate framework. So one cannot expect to exactly match the numerical error of vDSP.

In the current implementation of vDSP_normalize, when the data has unit stride, the sum and the sum of squares are computed directly, using one, two, or eight partial sums that accumulate in parallel. The number of partial sums depends on the processor architecture vDSP_normalize executes on. In addition, a few elements at the beginning of the array may be processed before the implementation begins using vector instructions; how many depends on the alignment of the array. Some elements at the end of the array are likewise processed separately, depending on the alignment of the end of the array.
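As a rough scalar illustration of the direct method with interleaved partial sums, here is a minimal sketch in portable C. It is not vDSP's code. The function name mean_variance_direct, the lanes parameter, and the MAX_LANES limit are made up for the example, and dividing by N (population variance) is an assumption. It also ignores the separately handled leading and trailing elements.

```c
#include <stddef.h>

#define MAX_LANES 8  /* hypothetical upper bound on partial sums */

/* Hypothetical helper, not part of vDSP.
 * Computes the mean and population variance of a[0..n) with the direct
 * method: accumulate sum(x) and sum(x*x) in `lanes` interleaved partial
 * sums, combine them, then use variance = sum(x*x)/n - mean*mean.
 * Returns 0 on success, -1 on invalid arguments. */
static int mean_variance_direct(const float *a, size_t n, size_t lanes,
                                float *mean, float *variance)
{
    float sum[MAX_LANES] = {0.0f};
    float sumsq[MAX_LANES] = {0.0f};

    if (a == NULL || n == 0 || lanes == 0 || lanes > MAX_LANES)
        return -1;

    /* Element i goes to partial sum i % lanes, as a vector unit would. */
    for (size_t i = 0; i < n; ++i) {
        size_t k = i % lanes;
        sum[k] += a[i];
        sumsq[k] += a[i] * a[i];
    }

    /* Combine the partial sums in lane order. */
    float s = 0.0f, ss = 0.0f;
    for (size_t k = 0; k < lanes; ++k) {
        s += sum[k];
        ss += sumsq[k];
    }

    float m = s / (float)n;
    float v = ss / (float)n - m * m;
    if (v < 0.0f)
        v = 0.0f;  /* rounding can push a tiny variance below zero */

    *mean = m;
    *variance = v;
    return 0;
}
```

Calling it with lanes set to 1, 2, or 8 mimics the three accumulation patterns mentioned above; the standard deviation would be sqrtf() of the returned variance. For well-conditioned data the three results agree to within a few units in the last place, but they are not guaranteed to be bit-identical.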

This means you could pass two arrays with identical contents but different addresses to vDSP_normalize and get slightly different results, even when executing with the same software version on the same processor.
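To see why the starting alignment alone can change the answer, here is a small sketch in plain C, again not vDSP's code. The helper sum_peeled and its head parameter are invented for the illustration: head stands in for the number of leading elements that would be handled one at a time before the vector loop starts, and two interleaved partial sums stand in for the vector lanes.

```c
#include <stddef.h>

/* Hypothetical helper, not part of vDSP.
 * Sums a[0..n) in three phases:
 *   1. `head` leading elements, added one at a time;
 *   2. the middle, in two interleaved partial sums;
 *   3. any leftover trailing element, added one at a time.
 * Changing `head` changes the order of the additions, and floating-point
 * addition is not associative, so the rounded result can change too. */
static float sum_peeled(const float *a, size_t n, size_t head)
{
    float total = 0.0f, lane0 = 0.0f, lane1 = 0.0f;
    size_t i = 0;

    if (head > n)
        head = n;

    for (; i < head; ++i)          /* scalar prologue */
        total += a[i];

    for (; i + 1 < n; i += 2) {    /* two-lane main loop */
        lane0 += a[i];
        lane1 += a[i + 1];
    }

    for (; i < n; ++i)             /* scalar epilogue */
        total += a[i];

    return total + lane0 + lane1;
}
```

With the four values {1e8f, 1.0f, -1e8f, 1.0f}, whose exact sum is 2, and IEEE 754 single-precision arithmetic evaluated without excess precision, sum_peeled(a, 4, 0) gives 2 while sum_peeled(a, 4, 1) gives 0, because in the second case each 1.0f is absorbed when it is added to 1e8f. The data is identical; only the split between the prologue and the lanes differs. This example is deliberately extreme, and real data usually shows differences only in the last few bits.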