I am interested to see how Kahan's algorithm might conflict with a "smart"
compiler so this is my testing:
Chasing round-off errors with SUM can lead to a number of options with limited
practical gains.
* Kahan is one of them.
* Using a higher precision accumulator is another approach (probably simpler).
I have written a simple test of Kahan's algorithm using real*4 precision
(listed below) and compared this to a real*8 accumulator.
I have used gFortran 6.1.0 to see what smart compiler conflicts there may be
and also tested a number of different compiler options:
set options=-O1 -mavx
set options=-O2 -mavx
set options=-O3 -mavx
set options=-O3 -mavx -ffast-math
set options=-O3 -mavx -ffast-math -funroll-loops --param max-unroll-times=2
My .bat test loop is:
del %1.exe
del %1.o
gfortran %1.f90 %options% -o %1.exe
set options >> %1.log
%1 >> %1.log
My test tried 4 approaches at using SUM of a list of real*4 values.
si = SUM intrinsic
s4 = real*4 DO loop
k4 = Kahan's algorithm for real*4
s8 = real*8 accumulator
The results I found were
-ffast-math messed with Kahan's algorithm, but
changing optimisation with -Ox doesn't (no smarts identified)
From this test case, using a real*8 accumulator appears to be the most robust
approach.
In the past when using 8086/8087 there was the uncertainty of what values were
retained in 80-bit registers.
These latest tests with -mavx appear to show no mixing of 4-byte and 8-byte
registers.
When considering other cases, the results are very sensitive to the type of
accumulated rounding error that is occurring. Claims of error analysis in the
Wiki article look wrong to me as I find that cases where you may want to
improve accuracy are where the error accumulation is unusual.
Another problem becomes as to what you should make of the result, given the
accuracy of the original data.
The test case I used is summing a set of +ve random numbers in the range 0,1.
Overflow swamps the result at about 10^7 values.
Others may be interested in using the following test or adapting to a different
set of values with different round-off characteristics.
The test code I used is:
! program to test KahanSum
!
real*4 function KahanSum (values, n)
!
https://en.wikipedia.org/wiki/Kahan_summation_algorithm
integer*4 n
real*4 values(n)
!
integer*4 i
real*4 sum, c, y, t
!
c = 0
sum = 0
do i = 1,n
y = values(i) - c ! So far, so good : c is zero
t = sum + y ! Alas, sum is big, Y small, so low-order digits of y are lost
c = (t - sum) - y ! (t-sum) cancels the high-order part of y; subtracting y recovers negative (low part of y)
sum = t ! Algebraically, c should always be zero. Beware overly-aggressive optimizing compilers !
end do ! Next time around, the low part will be added to y in a fresh attempt.
!
KahanSum = sum
end function KahanSum
real*4 function Sum8 (values, n)
integer*4 n
real*4 values(n)
!
integer*4 i
real*8 sum, y
!
sum = 0
do i = 1,n
y = values(i)
sum = sum + y
end do
!
Sum8 = sum
end function Sum8
real*4 function Sum4 (values, n)
integer*4 n
real*4 values(n)
!
integer*4 i
real*4 sum, y
!
sum = 0
do i = 1,n
y = values(i)
sum = sum + y
end do
!
Sum4 = sum
end function Sum4
Program Kahan_test
!
integer*4 n, k
real*4, allocatable :: values(:)
real*4 sum4, sum8, KahanSum, s8, k4, s4, si
!
call report_options
!
do k = 2,30
n = 2**k
allocate ( values(n) )
call random_number ( values )
s8 = sum8 ( values, n )
k4 = KahanSum ( values, n )
s4 = sum4 ( values, n )
si = sum ( values )
write (*,fmt='(i10,7es11.3)') n, si, s4, k4, s8, abs(si-s8) , abs(s4-s8), abs(k4-s8)
deallocate (values)
end do
!
end
subroutine report_options
use ISO_FORTRAN_ENV
write (*,*) 'Compiler: ',compiler_version ()
write (*,*) 'Options : ',compiler_options ()
end subroutine report_options