optimization oddities: sincos and sincosf

James

unread,

Jul 11, 2013, 8:39:23 AM7/11/13

to

Hi,

I have been surprised by an significant performance difference between the ifort and gfortran compilers and any insights would be most welcome.

A key bottleneck in a code I work on is the evaluation of exp(ix) inside a tight loop. This can be evaluated using either exp(cmplx(0.0,x)) or cmplx(cos(x),sin(x)): both approaches should be equivalent. However, I have found that gfortran and ifort treat these constructs very differently.

Using the following test programs:

program test_exp

implicit none
integer, parameter :: dp = selected_real_kind(15,307)
complex(dp) :: s
real(dp) :: x
real :: start, finish
integer :: i

s = cmplx(0.0_dp, 0.0_dp)
call cpu_time(start)
do i = 1, 100000000
call random_number(x)
s = s + exp(cmplx(0.0_dp,x))
end do
call cpu_time(finish)
write (6,*) s, finish-start

end program test_exp

program test_sincos

implicit none
integer, parameter :: dp = selected_real_kind(15,307)
complex(dp) :: s
real(dp) :: x
real :: start, finish
integer :: i

s = cmplx(0.0_dp, 0.0_dp)
call cpu_time(start)
do i = 1, 100000000
call random_number(x)
s = s + cmplx(cos(x), sin(x))
end do
call cpu_time(finish)
write (6,*) s, finish-start

end program test_sincos

ifort and gfortran give rather different timings:

ifort 12.1.4.319 test_exp : (84148188.8383174,45967108.5120851) 3.188199
ifort 12.1.4.319 test_sincos: (84148188.8380664,45967108.5207939) 2.528158
gfortran 4.8.1 test_exp : ( 84147209.604948997 , 45969226.660132274 ) 3.2002001
gfortran 4.8.1 test_sincos : ( 84147209.633834898 , 45969226.716904908 ) 5.0843172

I can understand (to an extent) the difference in optimization behaviour: ifort links to libm_sse2_sincos for test_sincos but sinf and cos for test_exp whereas gfortran links to glibc's sincos and sincosf respectively. sincosf is single precision and so faster than sincos (hence gfortran's behaviour) and calculating sin and cos simultaneously (both in double precision) is faster than calculating sin (in single precision) and cos (in double precision) separately (hence ifort's behaviour).

What I don't understand is: why are (some) calls in the exp version converted to single precision and why does gfortran convert both sin and cos to single precision in the exp version whereas ifort only converts the sin call?

If I drop x and s to single precision (which I don't want to do for production calculations), then gfortran gives the same time for both cases (both of which call glibc's sincosf). ifort resolutely sticks to using separate cos (still double precision despite changing the 0 to single precision) and sinf calls for the exp version and libm_sse2_sincosf for the sin,cos version.

It's somewhat frustrating that the substantially faster version is different in each compiler...

Thanks,

James

Gordon Sande

unread,

Jul 11, 2013, 9:38:50 AM7/11/13

to

You have been had by the gotcha of cmplx type specification. See below.

> If I drop x and s to single precision (which I don't want to do for
> production calculations), then gfortran gives the same time for both
> cases (both of which call glibc's sincosf). ifort resolutely sticks to
> using separate cos (still double precision despite changing the 0 to
> single precision) and sinf calls for the exp version and
> libm_sse2_sincosf for the sin,cos version.
>
> It's somewhat frustrating that the substantially faster version is
> different in each compiler...
>
> Thanks,
>
> James

Without addressing your stated problem there are two issues that make
your problem
"not well formed" as they say about mathematical formulae.

1. cmplx has a serious gotcha concerning precision comversions. Read
all its fine print
very carefully several times. The awkward specification is there for
historical
continuity. It causes trouble and has been complained about here
several times.
At the very least use explicit kind specifications. Notice that your
question is
about spurious type conversions so the gotcha should not be very
surprising at
this point.

2. Different vendors use different pseudo random number generators so
timing of trivial
randomized tests can be badly mislead both by the cost and quality
of the pseudo
random number generators. The question is what fraction of the
running time goes to
the PRNG when the rest of the computation is trivial? What effect
does the differnt
sequence of PRNs have on the cost of the test?

P.S. There is something curious about your line endings that makes the
pasted text
hard to follow.

robin....@gmail.com

unread,

Jul 11, 2013, 11:10:26 AM7/11/13

to

The fault is in your code.

In both versions, cmplx returns a single precision result.
You must specify the kind when you expect a double-precision
result from CMPLX, e.g.,
s = s + cmplx(cos(x), sin(x), kind=dp)

When F90 was designed, CMPLX wasn't designed as a generic function.
It defaults to single precision result, regardless of the kind(s)
of the arguments.

Moreover, the committee didn't understand generic functions.
Consequently, when the first argument is complex, the committee
didn't understand that if a second argument is supplied, then
it must be the kind.
That could have been handled simply through the generic mechanism.

Returning to your examples, s = cmplx(0.0_dp, ...
can be replaced with
s = 0

Zaak

unread,

Jul 11, 2013, 11:15:30 AM7/11/13

to

On Thursday, July 11, 2013 9:38:50 AM UTC-4, Gordon Sande wrote:
> P.S. There is something curious about your line endings that makes the
> pasted text
> hard to follow.

His line endings look fine for me....

Richard Maine

unread,

Jul 11, 2013, 12:10:10 PM7/11/13

to

<robin....@gmail.com> wrote:

> When F90 was designed, CMPLX wasn't designed as a generic function.

...

> Moreover, the committee didn't understand generic functions.

We've been down this path before, so I won't (re)argue about it past
this one note for others who might not have seen or recall the previous
bout.

Robin has his own definition of "generic", which is not particularly
close to that of the standard. When he says that CMPLX is not generic,
he means that it doesn't fit Robin's definition of generic. It does fit
the definition in the standard.

And "didn't understand" is Robin-speak for didn't do things the way that
Robin would have liked. The committee understood exactly what they were
doing in this case, and the issues involved in the requirements for
genericity are *FAR* more subtle than are worth bringing up here, when
even the trivial aspects result in claims like those quoted above.

I'd also have preferred that things be different with CMPLX, but I don't
like to use disinformation to argue for them. My personal preference
would be to leave CMPLX alone for compatibility with old code, but to
add a separate intrinsic (the name COMPLEX comes to mind as an obvious
one) without the same wart.

(Used to have him on my killfile before I started anew on this computer;
I suppose he's likely to end up back on it eventually).

--
Richard Maine
email: last name at domain . net
domain: summer-triangle

glen herrmannsfeldt

unread,

Jul 11, 2013, 2:13:43 PM7/11/13

to

James <james.s...@gmail.com> wrote:

> A key bottleneck in a code I work on is the evaluation of
> exp(ix) inside a tight loop. This can be evaluated using
> either exp(cmplx(0.0,x)) or cmplx(cos(x),sin(x)): both
> approaches should be equivalent. However, I have found
> that gfortran and ifort treat these constructs very differently.

Besides the question on the KIND of CMPLX, and besides the usual
comments on premature optimization, I would use the second form.

Note that the first form, at least potentially and reasonably
likely in practice, multiplies some numbers by EXP(0.0).

While one should expect optimizing compilers to optimize
the expressions you write, it isn't so obvious that they
expand what you write and optimize inside intrinsic functions.

exp(cmplx(0.0,x)), as written, generates the complex number
and calls the complex exp() routine. That routine may test
for the real part 0.0, or it may pass it to the real exp()
routine which might test it and return 1.0.

Note that likely the most popular Fortran routine to use
complex values, FFT routines, normally don't use complex
variables or intrinsic functions. The required data movement
tends to be easier with pairs of real variables, and also
avoid some unneeded multiplication by zero.

-- glen

FX

unread,

Jul 11, 2013, 5:40:41 PM7/11/13

to

> Note that likely the most popular Fortran routine to use complex
> values, FFT routines, normally don't use complex variables or intrinsic
> functions.

I beg to differ. What I consider to be the two most-used FFT libraries,
FFTW and fftpack, both actually use complex arrays rather than pair of
arrays (or arrays of pairs):

-- http://www.fftw.org/doc/Fortran-Examples.html
-- http://www.netlib.org/fftpack/doc

--
FX

Dan Nagle

unread,

Jul 11, 2013, 6:01:14 PM7/11/13

to

Hi,

On 2013-07-11 16:10:10 +0000, Richard Maine said:

> I'd also have preferred that things be different with CMPLX, but I don't
> like to use disinformation to argue for them. My personal preference
> would be to leave CMPLX alone for compatibility with old code, but to
> add a separate intrinsic (the name COMPLEX comes to mind as an obvious
> one) without the same wart.

This was proposed for f08, and didn't make the cut because
it was viewed as too easy for the user to fix using CMPLX
combined with a desire to keep the work list as small as possible.
That is, bang for the buck was wrong at that time.

--
Cheers!

Dan Nagle

glen herrmannsfeldt

unread,

Jul 11, 2013, 6:45:20 PM7/11/13

to

I carefully didn't say the most popular implemenations, just that
FFT was the most popular use for complex. (I am not sure how to
weight the uses of older ones.)

There have been many FFT routines over the years.

Does fftpack do actual complex arithmetic on them, or just
extract and restore real and imaginary parts?

I still remember one that used nested DO loops to do the bit
reversal moves. It then used appropriate IF and GOTO to go into
and out of (at the end) each level of DO depending on the size
of the transform. I am sure violating the standard on the use
of DO loops, but it must have worked.

-- glen

John Harper

unread,

Jul 11, 2013, 10:30:37 PM7/11/13

to

glen herrmannsfeldt wrote:

> James <james.s...@gmail.com> wrote:
>
>> A key bottleneck in a code I work on is the evaluation of
>> exp(ix) inside a tight loop. This can be evaluated using
>> either exp(cmplx(0.0,x)) or cmplx(cos(x),sin(x)): both
>> approaches should be equivalent. However, I have found
>> that gfortran and ifort treat these constructs very differently.
>
> Besides the question on the KIND of CMPLX, and besides the usual
> comments on premature optimization, I would use the second form.

I tried amalgamating the OP's 2 programs, using a third method inside the
loop, fixing the CMPLX problem, and also calculating a geometric progression
instead of using pseudorandom numbers so the "exact" (really quad precision)
mathematical result could be compared with the computed results. I used
ifort Version 12.1 Build 20120212 and gfortran version 4.7.1 on the same
Linux x86_64 machine. It seems that gfortran's arithmetic can give different
answers with different optimisation levels, that with the -Ofast option it
was faster _for this program_ than ifort (I have had ifort twice as fast as
gfortran when dealing with lots of double precision real arrays), and that
the speed varies with the algorithm (surprise surprise). Using ifort -O2
made no difference to the results and at most 0.025 seconds to the time in a
program taking it 8 or 11 seconds. FWIW gfortran compiled much faster than
ifort: useful when trying possibilities, unimportant for production.

The program:

program test_exp_cos_sin
implicit none
complex(selected_real_kind(33)) :: iquad = (0,1), w
character(*),parameter:: fmt = "(' (',F21.18,',',F21.18,' ) ',A)"
integer,parameter :: ilast = 10**8
character(7) time
call timing(1)
call timing(2)
call timing(3)
w = exp(iquad)
write (6,fmt) w*(1-w**ilast)/(1-w),'"exact"'
contains

subroutine timing(iwhich)
integer,intent(in) :: iwhich

integer, parameter :: dp = selected_real_kind(15,307)

complex(dp),parameter :: dpi = (0._dp, 1._dp)
complex(dp) :: s
real(dp) :: x
integer :: i
real :: start, finish
s = cmplx(0.0_dp, 0.0_dp)
x = 0.0_dp
call cpu_time(start)
do i = 1,ilast
x = x + 1.0_dp
if (iwhich==1) then
s = s + exp( dpi*x )
else if (iwhich==2) then
s = s + exp(cmplx(0.0_dp,x,dp))
else
s = s + cmplx(cos(x),sin(x),dp)
end if
end do
call cpu_time(finish)
write(time,"(F7.3)") finish-start
write (6,fmt) s, time
end subroutine timing

end program test_exp_cos_sin

Output with ifort:
( 0.170984355418128453, 1.713649346570515153 ) 11.015
( 0.170984355418128453, 1.713649346570515153 ) 8.026
( 0.170984355418128453, 1.713649346570515153 ) 8.013
( 0.170984355418398518, 1.713649346570575846 ) "exact"
Output with gfortran (no options requested):
( 0.170984355418201228, 1.713649346570279786 ) 13.416
( 0.170984355418201228, 1.713649346570279786 ) 5.879
( 0.170984355417932610, 1.713649346570545795 ) 43.033
( 0.170984355418398518, 1.713649346570575846 ) "exact"
Output with gfortran -Ofast:
( 0.170984355418201228, 1.713649346570279786 ) 5.548
( 0.170984355418201228, 1.713649346570279786 ) 5.554
( 0.170984355418201228, 1.713649346570279786 ) 5.458
( 0.170984355418398518, 1.713649346570575846 ) "exact"

--
John Harper

robin....@gmail.com

unread,

Jul 11, 2013, 10:50:41 PM7/11/13

to

On Friday, July 12, 2013 12:30:37 PM UTC+10, jfh wrote:
> glen herrmannsfeldt wrote:

You didn't fix all the CMPLX problems. Here is one that uses
kind(dp) arguments, but then converts them to single precision,
and finally for assignment, converts the single complex result
back to kind(dp).

Nasser M. Abbasi

unread,

Jul 12, 2013, 12:52:55 AM7/12/13

to

On 7/11/2013 9:50 PM, robin....@gmail.com wrote:

>>
>> s = cmplx(0.0_dp, 0.0_dp)
>

> You didn't fix all the CMPLX problems. Here is one that uses
> kind(dp) arguments, but then converts them to single precision,
> and finally for assignment, converts the single complex result
> back to kind(dp).
>

I was wondering if there a way to catch these things at compile
time? The following did not generate any warnings:

-----------------------------------------------
gfortran -Wextra -Wall -pedantic -fcheck=all -fcheck=do
-fwhole-file -funroll-loops -ftree-vectorize -Wsurprising
-Wconversion test_exp_cos_sin.f90
---------------------------------------------

>gfortran -v
Target: i686-linux-gnu
gcc version 4.7.3 (Ubuntu/Linaro 4.7.3-1ubuntu1)

-----------------
Syntax:
C = CMPLX(X[,Y,KIND])
Arguments:

X The type may be INTEGER(*), REAL(*), or COMPLEX(*).
Y Optional, allowed if X is not COMPLEX(*). May be INTEGER(*) or REAL(*).
KIND Optional scaler integer initialization expression.
--------------------------------------

thanks,
--Nasser

FX

unread,

Jul 12, 2013, 3:50:56 AM7/12/13

to

> I carefully didn't say the most popular implemenations, just that FFT
> was the most popular use for complex. (I am not sure how to weight the
> uses of older ones.)

On that I agree: I think FFT is the most popular use for complex.

However, the end of your sentence was: "FFT routines normally don't use
complex variables or intrinsic functions". I disagree with that
"normally", hence my message.

--
FX

robin....@gmail.com

unread,

Jul 12, 2013, 5:04:24 AM7/12/13

to

On Friday, July 12, 2013 2:52:55 PM UTC+10, Nasser M. Abbasi wrote:

The syntax for the above CMPLX is incorrect.
It implies that either X is supplied, or, if y is supplied,
then kind must be also.

robin....@gmail.com

unread,

Jul 12, 2013, 5:23:46 AM7/12/13

to

On Friday, July 12, 2013 2:10:10 AM UTC+10, Richard Maine wrote:

> <o.....@gmail.com> wrote:
>
> > When F90 was designed, CMPLX wasn't designed as a generic function.
>
> ...
>
> > Moreover, the committee didn't understand generic functions.
>
>
>
> We've been down this path before, so I won't (re)argue about it past
>
> this one note for others who might not have seen or recall the previous
>
> bout.
>
>
>
> Robin has his own definition of "generic", which is not particularly
>
> close to that of the standard. When he says that CMPLX is not generic,
>
> he means that it doesn't fit Robin's definition of generic. It does fit
>
> the definition in the standard.
>
>
>
> And "didn't understand" is Robin-speak for didn't do things the way that
>
> Robin would have liked.

Don't talk rot.
CMPLX is not generic by any definition of the word.

Let me spell it out for you, Richard.
The committee could have provided (via two hidden) explicit names
to deal with the case when the first argument is complex.
In one case, the second argument is omitted.
In the second case, the second argument is supplied,
in which case it must be the kind.

Incidentally, that change could still be done.

> The committee understood exactly what they were
> doing in this case,

We will have to disagree on that. I just spelled it out for you
above.

> and the issues involved in the requirements for
> genericity are *FAR* more subtle than are worth bringing up here, when
> even the trivial aspects result in claims like those quoted above.

The fact that an experienced programmer made the same mistake
with CMPLX even after I pointed it out emphasizes the point
that CMPLX is NOT generic.

> I'd also have preferred that things be different with CMPLX, but I don't
>
> like to use disinformation to argue for them. My personal preference
>
> would be to leave CMPLX alone for compatibility with old code, but to
>
> add a separate intrinsic (the name COMPLEX comes to mind as an obvious
>
> one) without the same wart.

warts (plural).

glen herrmannsfeldt

unread,

Jul 12, 2013, 1:23:50 PM7/12/13

to

It has been many years since I actually looked at the source
for FFT routines, but in the past I did look at many different ones.

Consider an unweighted average over all FFT routines written
in the past 48 years.

-- glen

-- glen

Richard Maine

unread,

Jul 12, 2013, 1:36:58 PM7/12/13

to

If you are talking about that long ago, consider that a substantial bar
to doing FFTs (or many things) using complex was that there was no
standard way to do double precision complex prior to f90. There were two
fairly common nonstandard forms, spelled "double complex" and
"complex*16". But there were compilers that accepted all 4 combinations
of those two forms: one, the other, both, or neither. I recall using
compilers of all 4 varieties.

I'm not counting the kind syntax introduced in f90.

I also recall at least one fairly popular library (though I forget which
one) that used both forms of declaration. That seemed awfully sloppy
coding discipline to me, managing to minimize the set of compilers it
would work on as is. Changing he declarations wasn't particularly hard
(unless one was using one of the compilers that accepted none of the
forms), but it was a bit of a maintenance and porting nuisance.

glen herrmannsfeldt

unread,

Jul 12, 2013, 3:14:26 PM7/12/13

to

Richard Maine <nos...@see.signature> wrote:
> glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
>> FX <cou...@alussinan.org> wrote:

(snip)

>> > However, the end of your sentence was: "FFT routines normally
>> > don't use complex variables or intrinsic functions".
>> > I disagree with that "normally", hence my message.

>> It has been many years since I actually looked at the source
>> for FFT routines, but in the past I did look at many different ones.

>> Consider an unweighted average over all FFT routines written
>> in the past 48 years.

> If you are talking about that long ago, consider that a substantial
> bar to doing FFTs (or many things) using complex was that there
> was no standard way to do double precision complex prior to f90.

I almost wrote 50 years, but 48 years is the Cooley-Tukey paper.

Yes, the lack of double precision complex would have been another
reason. But I believe the inner loop of the FFT is just a little
easier to do, and expect good generate code, in real.

> There were two fairly common nonstandard forms, spelled
> "double complex" and "complex*16". But there were compilers
> that accepted all 4 combinations of those two forms: one,
> the other, both, or neither. I recall using compilers of
> all 4 varieties.

As to the OP, how good are compilers at recognizing a multiply
by (0.0,1.0) without doing any multiply instructions? As well
as I know them, many compilers over the years would call the
complex multiply library routine even for that one.

That would have been another reason to do it with REAL.

-- glen

Dick Hendrickson

unread,

Jul 12, 2013, 4:35:30 PM7/12/13

to

Also, it's trivial to do with statement functions.

complex(x,y) = cmplx(x,y,kind=8)

Clear, concise, easy to understand, and the optimizers will get it right!

Dick Hendrickson

robin....@gmail.com

unread,

Jul 12, 2013, 8:09:22 PM7/12/13

to

On Saturday, July 13, 2013 6:35:30 AM UTC+10, Dick Hendrickson wrote:

> Also, it's trivial to do with statement functions.
>
> complex(x,y) = cmplx(x,y,kind=8)
>
> Clear, concise, easy to understand, and the optimizers will get it right!

Except that that isn't portable.

robin....@gmail.com

unread,

Jul 12, 2013, 8:39:10 PM7/12/13

to

On Saturday, July 13, 2013 3:36:58 AM UTC+10, Richard Maine wrote:
> glen herrmannsfeldt <g......@ugcs.caltech.edu> wrote:

>
> > FX <c......@alussinan.org> wrote:
>
> > >> I carefully didn't say the most popular implemenations, just that FFT
> > >> was the most popular use for complex. (I am not sure how to weight the
> > >> uses of older ones.)
>
> > > On that I agree: I think FFT is the most popular use for complex.
> > > However, the end of your sentence was: "FFT routines normally don't use
> > > complex variables or intrinsic functions". I disagree with that
> > > "normally", hence my message.
>
> > It has been many years since I actually looked at the source
> > for FFT routines, but in the past I did look at many different ones.
> > Consider an unweighted average over all FFT routines written
> > in the past 48 years.
>
> If you are talking about that long ago, consider that a substantial bar
> to doing FFTs (or many things) using complex was that there was no
> standard way to do double precision complex prior to f90.

There was, and that was in PL/I, 24 years before.

robin....@gmail.com

unread,

Jul 12, 2013, 8:44:49 PM7/12/13

to

On Saturday, July 13, 2013 5:14:26 AM UTC+10, glen herrmannsfeldt wrote:
> Richard Maine <nos...@see.signature> wrote:
>

> > glen herrmannsfeldt <g.....@ugcs.caltech.edu> wrote:

>
> >> FX <c......@alussinan.org> wrote:
>
>
>
> (snip)
>
> >> > However, the end of your sentence was: "FFT routines normally
>
> >> > don't use complex variables or intrinsic functions".
>
> >> > I disagree with that "normally", hence my message.
>
>
>
> >> It has been many years since I actually looked at the source
>
> >> for FFT routines, but in the past I did look at many different ones.
>
>
>
> >> Consider an unweighted average over all FFT routines written
>
> >> in the past 48 years.
>
>
>
> > If you are talking about that long ago, consider that a substantial
>
> > bar to doing FFTs (or many things) using complex was that there
>
> > was no standard way to do double precision complex prior to f90.
>
>
>
> I almost wrote 50 years, but 48 years is the Cooley-Tukey paper.
> Yes, the lack of double precision complex would have been another
> reason. But I believe the inner loop of the FFT is just a little
> easier to do, and expect good generate code, in real.

Au contraire, complex FFT is much shorter than real FFT.
As well as that, complex runs about 10% faster --
simply because the compiler has more to work with,
namely, pairs of values, and can do a better job of
optimisation.

Terence

unread,

Jul 12, 2013, 9:39:53 PM7/12/13

to

Line endings of OP looked OK with my reader!
But Gordon's OWN lines were broken up
into oddly-arranged pieces. Like this one..

Terence

unread,

Jul 12, 2013, 9:52:52 PM7/12/13

to

If the experienced Fortran programmer (I don't include myself any more)
gets confused with any current or past 'standard'; then the committee who
made that decision didn't do a thorough job.

Fortran was SUPPOSED to be intuitive to a mathematician.

Remember the adage of designing a horse by committee?
I also am reminded of Shakespeare's plotters, (who, wanting to start over),
"..first let's kill the lawyers".

Any chance we can go back a few 'standards' (say to F95) and start over?

Dick Hendrickson

unread,

Jul 12, 2013, 10:13:00 PM7/12/13

to

I forgot the IMPLICIT TONGUE_IN_CHEEK line ;)
Obviously, it should be "kind=dp" to be portable ;).

Dick Hendrickson

Richard Maine

unread,

Jul 12, 2013, 10:49:25 PM7/12/13

to

I really ought to just killfile Terence for being consistently
irrelevant and repeating this pet notion of his at every opportunity, or
not even bothering to wait for an opportunity where it is actually
relevant.

But just to note why it is irrelevant for the particular case in
question....

Since the exact full syntax in question was introduced in f90, going
back to f95 (or even f90) wouldn't have much to do with removing any
confusion relating to it, now would it?

Even further, as noted here multiple times, the complete and whole
reason for the confusing wart in CMPLX was for compatibility with f77,
which specified that CMPLX with double precision real arguments gave a
single precision complex result. So going back to f77 isn't going to fix
that either. In fact, if one was really intent on fixing it in f90,
throwing out f77 compatibility would have done the trick. Not that I'd
have advised that, but if avoiding the wart was the overriding
objective, throwing out f77 compatibility would have been the simplest
fix.

So if you are going to go back a few standards to avoid that particular
confusion, you'll have to go back to f66.

James Van Buskirk

unread,

Jul 13, 2013, 1:33:17 AM7/13/13

to

<robin....@gmail.com> wrote in message
news:272eae7e-855c-4095...@googlegroups.com...

It is easier to express the code for a full complex FFT than for a
real-half complex FFT, and this in turn may make it fit in icache
more readily that real-half complex FFT, but the latter definitely
takes fewer operations when performed directly, as when the
ultimate goal is real convolution, whether via the convolution
theorem or one of its slightly more efficient anologs.

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end

glen herrmannsfeldt

unread,

Jul 13, 2013, 3:18:03 AM7/13/13

to

James Van Buskirk <not_...@comcast.net> wrote:
> <robin....@gmail.com> wrote in message

(snip about FFT and real or complex variables)

>> Au contraire, complex FFT is much shorter than real FFT.
>> As well as that, complex runs about 10% faster --
>> simply because the compiler has more to work with,
>> namely, pairs of values, and can do a better job of
>> optimisation.

> It is easier to express the code for a full complex FFT than for a
> real-half complex FFT, and this in turn may make it fit in icache
> more readily that real-half complex FFT, but the latter definitely
> takes fewer operations when performed directly, as when the
> ultimate goal is real convolution, whether via the convolution
> theorem or one of its slightly more efficient anologs.

The usual way, as far as I know, to do the real transform is to
transform to a N/2 element complex array, transform that, and then
fix it up as the real transform would be.

But what I actually meant was doing the complex transform with
pairs of real variables.

It isn't completely obvious that complex array actual to an assumed
size dummy real array comes out twice as long, with alternating real
and imaginary elements, but it usually does.

A fairly common operation inside the FFT is multiplying a complex
number by (0.,1.), which normally requires four real multiplications.
A slightly smart compiler will figure out that two are always zero.
An even smarter one will figure out that the other two aren't
needed at all.

As far as I can tell, gfortran without any -O does two multiplies
and with -O3 doesn't do any. In the past, many did complex multiply
by subroutine call.

-- glen

robin....@gmail.com

unread,

Jul 13, 2013, 4:59:53 AM7/13/13

to

On Saturday, July 13, 2013 12:49:25 PM UTC+10, Richard Maine wrote:

> Even further, as noted here multiple times, the complete and whole
> reason for the confusing wart in CMPLX was for compatibility with f77,
> which specified that CMPLX with double precision real arguments gave a
> single precision complex result.

No it wasn't.
First, generic CMPLX could have included (from F90) a 2-argument
form where the first dummy argument is complex and the second
dummy argument is the kind (without the need for the explicit
appearance of the keyword KIND).

Second, either:
(1) a compiler option could have enabled the compiler to use
F77 rules for CMPLX (just as is done for minimum 1-trip DO); or
(2) a specification statement could have been provided to
use a new form of CMPLX that behaves like a generic function.

> So going back to f77 isn't going to fix
> that either. In fact, if one was really intent on fixing it in f90,
> throwing out f77 compatibility would have done the trick. Not that I'd
> have advised that, but if avoiding the wart was the overriding
> objective, throwing out f77 compatibility would have been the simplest
> fix.

See above.

FX

unread,

Jul 13, 2013, 8:32:03 AM7/13/13

to

> As far as I can tell, gfortran without any -O does two multiplies
> and with -O3 doesn't do any. In the past, many did complex multiply
> by subroutine call.

With -Ofast, gfortran avoids multiplications altogether. The reason it
doesn't do it already at lower optimization levels, I think, is to
account for special cases where the complex argument might have
infinities or NaN's in its components.

--
FX

Thomas Koenig

unread,

Jul 13, 2013, 3:18:44 PM7/13/13

to

glen herrmannsfeldt <g...@ugcs.caltech.edu> schrieb:

> As to the OP, how good are compilers at recognizing a multiply
> by (0.0,1.0) without doing any multiply instructions?

That depends. If you want full IEEE sematics including NaN and
Infinity handling, you will need multiplies. If you indicate that
you don't want it (for example by using -ffast-math with gfortran)
then you will get the direct assignment.

FX

unread,

Jul 13, 2013, 4:00:30 PM7/13/13

to

> If you want full IEEE sematics including NaN and Infinity handling, you
> will need multiplies.

On this topic, the reading of
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1399.htm left me amazed
by the number of possible corner cases.

--
FX

John Harper

unread,

Jul 14, 2013, 7:30:15 PM7/14/13

to

Apologies - Robin is right. I should of course have said either s =
cmplx(0.0_dp, 0.0_dp, dp) or s = (0.0_dp,0.0_dp) or s = 0.0_dp or just s = 0
but as it happens, both compilers appeared to convert kind(1.0) complex zero
to kind(1d0) zero without putting in the garbage they might have.

On using s = 0.0_dp my numerical results were unchanged and none of the run
times changed by as much as 2%. As the machine has various users the changes
that did happen were insignificant.

--
John Harper

John Harper

unread,

Jul 14, 2013, 8:29:12 PM7/14/13

to

Several fishhooks are waiting there for young players.

1. Last time I used the NAG compiler without the -kind=byte option, kind=8
would not have worked. Kind=2 or (better for portability) kind=kind(1d0) or
kind=selected_real_kind(15) would have.

2. Even when that's fixed it's not quite as trivial as Dick's one-liner. The
name complex must also be declared with a suitable complex type, and if x
and y are not elsewhere declared then they are default real, which gets us
back to the unappealing feature of cmplx we started with.

3. There are various things you can't do with a statement function, e.g.
declaring it in a module but not inside a module subprogram.

--
John Harper

James

unread,

Jul 16, 2013, 1:58:31 PM7/16/13

to

Hi all,

Sorry for the slow reply: I was unexpectedly offline for a few days. Thanks to all for the interesting discussion (I didn't expect it to explode like it did!) and everyone who pointed out the error of my ways.

I find it very surprising that the default behaviour of cmplx and real differ: calling real on a complex number does not result in the loss of precision...

As asked by Nasser, are there any compilers which warn about these kinds of issues at compile time?

Thanks,

James

ps I realise that the difference in the implementations of the random number generator make it invalid to compare between compilers; I was more interested in comparing between versions with the same compiler...

Richard Maine

unread,

Jul 16, 2013, 2:57:35 PM7/16/13

to

James <james.s...@gmail.com> wrote:

> I find it very surprising that the default behaviour of cmplx and real
> differ: calling real on a complex number does not result in the loss of
> precision...

Why should the REAL intrinsic do such a strange thing? You might have
missed the explanation of why the cmplx intrinsic behaves in the
admitedly strange way that it does (possibly because the fairly simple
explanation got burried in misinformation from a few posters). But that
reason would not extend to anything else. It really is specific only to
the cmplx intrinsic. Namely...

Standard Fortran 77 (and 66 before it) had no double precision complex.
The only complex in the language was single precision. Therefore, cmplx
with double precision real arguments returned a complex of the only
precision that existed, namely single precision. The only other option
at that time would have been to disallow double precsion arguments in
cmplx at all (an option that I could see at least some sense to, but
that's getting into history nearly 40 years old).

With f90, double precision complex was added to the standard. If things
were redone from scratch, the CMPLX intrinsic would certainly have
returned a double precision complex result from double precision real
arguments. I am absolutely 100% confident of that. Nobody on the
committee would have actually thought it desirable behavior to return a
single precision complex result. The *ONLY* reason that CMPLX returns a
single precision result in that case is that f77 did that (as explained
above), and changing it would have broken standard-conforming f77 codes.

It probably would not have broken a lot of codes, as the extra precision
would not have hurt anything in most contexts. But there are a few cases
where it the change would actually have caused formerly
standard-conforming codes to fail to compile and link. One might well
argue whether breaking those few cases might have been an acceptable
price to pay. I decline to debate the point. (When there are 2 posters
in the thread who seem inclined to argue with simple matters of fact, it
is pointless to try to debate opinions about tradeoffs like that).
Regardless of whether one thinks that was a good choice or not, the
*ONLY* reason for that choice was to avoid breaking previously existing
Fortran 77 codes.

That reason does not apply to the REAL intrinsic with complex arguments.
So it doesn't do anything so strange. I know of nobody who would have
thought it a reasonable way for the REAL intrinsic to act. I suppose I
could imagine an argument along the lines of "cmplx is screwed up, so
let's screw up real in a simillar way just for consistency." But I don't
think I've never heard anyone seriously propose such a position.

glen herrmannsfeldt

unread,

Jul 16, 2013, 3:23:07 PM7/16/13

to

James <james.s...@gmail.com> wrote:

(snip)

> Sorry for the slow reply: I was unexpectedly offline for a few
> days. Thanks to all for the interesting discussion
> (I didn't expect it to explode like it did!) and everyone who
> pointed out the error of my ways.

> I find it very surprising that the default behaviour of cmplx
> and real differ: calling real on a complex number does not
> result in the loss of precision...

Personally, though maybe no-one else agrees, I believe that they
shouldn't have given two uses to the REAL function. As always, it
gives the real part of a complex value, but now also converts to
the appropriate KIND of type REAL.

Now, if all other types had functions with the same name as the
type then I might agree, but there is the INT function for converting
to INTEGER type (or the specified KIND), and, as noted CMPLX for
converting to COMPLEX type. Why not INTEGER and COMPLEX functions
instead?

As usual, it is way too late, but leaving CMPLX without the KIND
argument, and supplying COMPLEX with KIND would have solved this
problem. Supplying INTEGER along side INT allows for consistency.

As previouly noted, the reason for this strange behavior is that
in versions of Fortran, I believe through 77, that didn't have
a double precision complex type the CMPLX function would allow
two DOUBLE PRECISION values to convert to (the only) COMPLEX type.

I am not so sure how many program that change would have broken,
but it would have been too many.

-- glen

Dick Hendrickson

unread,

Jul 16, 2013, 4:45:41 PM7/16/13

to

On 7/16/13 2:23 PM, glen herrmannsfeldt wrote:
> James<james.s...@gmail.com> wrote:
>
> (snip)
>> Sorry for the slow reply: I was unexpectedly offline for a few
>> days. Thanks to all for the interesting discussion
>> (I didn't expect it to explode like it did!) and everyone who
>> pointed out the error of my ways.
>
>> I find it very surprising that the default behaviour of cmplx
>> and real differ: calling real on a complex number does not
>> result in the loss of precision...
>
> Personally, though maybe no-one else agrees, I believe that they
> shouldn't have given two uses to the REAL function. As always, it
> gives the real part of a complex value, but now also converts to
> the appropriate KIND of type REAL.

But, the F66 REAL did more than extract the real part of a complex
value, it also converted integers to real.

>
> Now, if all other types had functions with the same name as the
> type then I might agree, but there is the INT function for converting
> to INTEGER type (or the specified KIND), and, as noted CMPLX for
> converting to COMPLEX type. Why not INTEGER and COMPLEX functions
> instead?

Because FORTRAN 77 was limited to 6 character names.

>
> As usual, it is way too late, but leaving CMPLX without the KIND
> argument, and supplying COMPLEX with KIND would have solved this
> problem. Supplying INTEGER along side INT allows for consistency.

Yes, probably true, but there's not much need for INTEGER when INT does
pretty much what INTEGER does in the normal case. In the F77/90 time
frame double length integers were not commonly supported.

Dick Hendrickson

glen herrmannsfeldt

unread,

Jul 16, 2013, 6:12:48 PM7/16/13

to

Dick Hendrickson <dick.hen...@att.net> wrote:

(snip on CMPLX)

>> Personally, though maybe no-one else agrees, I believe that they
>> shouldn't have given two uses to the REAL function. As always, it
>> gives the real part of a complex value, but now also converts to
>> the appropriate KIND of type REAL.

> But, the F66 REAL did more than extract the real part of a complex
> value, it also converted integers to real.

No, that is FLOAT. F66 doesn't have generics, so a function argument
can only have one possible type. For REAL that is COMPLEX.

>> Now, if all other types had functions with the same name as the
>> type then I might agree, but there is the INT function for converting
>> to INTEGER type (or the specified KIND), and, as noted CMPLX for
>> converting to COMPLEX type. Why not INTEGER and COMPLEX functions
>> instead?

> Because FORTRAN 77 was limited to 6 character names.

Yes, but F90 is when KIND appeared, and when the need for KIND
conversion appeared.

>> As usual, it is way too late, but leaving CMPLX without the KIND
>> argument, and supplying COMPLEX with KIND would have solved this
>> problem. Supplying INTEGER along side INT allows for consistency.

> Yes, probably true, but there's not much need for INTEGER when
> INT does pretty much what INTEGER does in the normal case.
> In the F77/90 time frame double length integers were not
> commonly supported.

I suppose, but having the names match the types seems reasonable.

(snip)

>> I am not so sure how many program that change would have broken,
>> but it would have been too many.

Most of the time it wouldn't matter. If the return value is used
as an actual argument for a subroutine or function, it could fail
in bad ways.

-- glen

James

unread,

Jul 16, 2013, 6:22:33 PM7/16/13

to

> > I find it very surprising that the default behaviour of cmplx and real
> > differ: calling real on a complex number does not result in the loss of
> > precision...
>
> Why should the REAL intrinsic do such a strange thing? You might have
> missed the explanation of why the cmplx intrinsic behaves in the
> admitedly strange way that it does (possibly because the fairly simple
> explanation got burried in misinformation from a few posters). But that
> reason would not extend to anything else. It really is specific only to
> the cmplx intrinsic. Namely...

REAL should not do such a strange thing! My point was really the opposite: I had extrapolated how the cmplx intrinsic would work based upon how the real intrinsic (on a complex number) worked, rather than reading the documentation. Thanks for the history lesson--I now understand (even if I don't like) why this behaviour came about...

James