vDSP trades some leeway in accuracy for performance. Part of the license granted to vDSP is that it may use different methods depending on processor model, stride, alignment, and other factors, and those methods may change from version to version of the Accelerate framework. So one cannot expect to exactly reproduce the rounding errors of vDSP.
In the current implementation of vDSP_normalize, when the data has unit stride, the sum and the sum of squares are computed directly using one, two, or eight partial sums accumulating in parallel. The number depends on the processor architecture vDSP_normalize executes on. Also, a few elements at the beginning of the array may be processed before the implementation begins using vector instructions; how many depends on the alignment of the array. Some elements at the end of the array are likewise processed separately, depending on the alignment of the end of the array.
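To see why the number of partial sums matters, here is a minimal sketch in plain Python (doubles, no Accelerate). The function lane_sum is a hypothetical stand-in, not vDSP's code: it assumes elements are dealt to the partial sums round-robin and that the partial sums are combined left to right at the end, neither of which is documented for vDSP. The data is contrived to exaggerate the effect; with ordinary data the differences are usually confined to the last few bits.

```python
import math

def lane_sum(values, lanes):
    """Sum `values` using `lanes` interleaved partial sums (hypothetical model)."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v      # element i goes to lane i mod lanes
    total = 0.0
    for p in partial:                # assumed: lanes combined left to right
        total += p
    return total

# Contrived input: the small terms are lost whenever they are added to 1e16.
data = [1.0, 1e16, 1.0, -1e16]

print(math.fsum(data))    # correctly rounded sum: 2.0
print(lane_sum(data, 1))  # 0.0
print(lane_sum(data, 2))  # 2.0
print(lane_sum(data, 8))  # 0.0
```

The same four numbers give 0.0 or 2.0 depending only on how many accumulators are used, because floating-point addition is not associative.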
This means you could pass two arrays with identical contents but different addresses to vDSP_normalize and get slightly different results, even when executing with the same software version on the same processor.
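The alignment effect can be modeled the same way. In the sketch below, the hypothetical parameter head stands in for the number of leading elements handled before the vector loop starts, which in the real routine would follow from the array's address. The function name, the two-lane body, and the order in which the pieces are combined are all assumptions made for illustration, and the data is again contrived to make the difference obvious.

```python
def sum_with_prologue(values, head, lanes=2):
    """Model a vectorized sum: scalar prologue, lane body, scalar epilogue."""
    scalar = 0.0
    for v in values[:head]:                  # leading elements, done one by one
        scalar += v
    body_end = head + (len(values) - head) // lanes * lanes
    partial = [0.0] * lanes
    for i in range(head, body_end):          # "vector" body, whole groups only
        partial[(i - head) % lanes] += values[i]
    for v in values[body_end:]:              # leftover elements at the end
        scalar += v
    total = scalar                           # assumed combination order
    for p in partial:
        total += p
    return total

contents = [1.0, 1e16, 1.0, -1e16]

# Same contents, different simulated alignment of the start of the array.
print(sum_with_prologue(contents, head=0))  # 2.0
print(sum_with_prologue(contents, head=1))  # 1.0
```

Shifting the start of the vector loop by one element changes which values meet in which accumulator, and so changes where rounding occurs.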
—edp (Eric Postpischil)