ELEMENTAL functions and performance

deltaquattro

Jul 29, 2008, 12:57:50 PM

Hi,

I would like to know whether the ELEMENTAL attribute on a function or
subroutine can hurt performance.

I am going to code a lot of small functions, for things like Jacobi
polynomial evaluation, Gauss point generation, etc., to be used in
a "huge" code. I would like the functions to work for scalar as well
as for rank-1 array input/output. I'd prefer to write them as
ELEMENTAL, rather than forcing the user to copy a scalar into and out
of a rank-1 array of size 1 each time the routine is needed for
scalars.
However, I have sometimes heard that some "new" features of the
language may have a hit on performance, because they make optimization
more difficult for the compiler. I don't know if this is the case with
ELEMENTAL, but these functions will be called many times, so
it's not a good idea if they are slow. The code will run on parallel
vector supercomputers. Given this information, what would
you suggest I do?
Thanks,

Best Regards,

deltaquattro

rusi_pathan

Jul 30, 2008, 12:24:53 AM

I think one of the purposes of elemental functions was to let
compilers optimize them better (for example by auto-parallelization
when used on whole arrays). What compilers actually do in practice is a
different matter altogether.

Daniel Kraft

Jul 30, 2008, 3:37:56 AM

Hi,

I can't tell what your compiler does, but if you call the ELEMENTAL
function on scalars, I think it quite probably does not matter whether
it is ELEMENTAL or not (except that, from your point of view, ELEMENTAL
procedures have to be PURE).

When called with an array argument, your ELEMENTAL procedure is
conceptually invoked once per element, whereas an array-valued procedure
receives the whole array in a single call, so there is a possibility that
the performance changes. I don't think you can really know in advance
whether it improves or degrades, or by how much; that probably depends on
your code and how your compiler handles it.
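As a minimal sketch of the two call forms discussed here: the function
name legendre_p2, the module name, and the sample values below are
hypothetical, chosen only for illustration. The explicit loop shows what
the array call means conceptually; a compiler is free to inline or
vectorize it differently.

```fortran
module poly_mod
  implicit none
  private
  public :: legendre_p2
  integer, parameter, public :: dp = kind(1.0d0)
contains
  ! Hypothetical example: Legendre polynomial P2(x) = (3x^2 - 1)/2.
  ! An ELEMENTAL procedure takes scalar dummy arguments and, without
  ! further qualification, must also satisfy the rules for PURE.
  elemental function legendre_p2(x) result(p)
    real(dp), intent(in) :: x
    real(dp) :: p
    p = 0.5_dp * (3.0_dp * x * x - 1.0_dp)
  end function legendre_p2
end module poly_mod

program demo_elemental
  use poly_mod, only: dp, legendre_p2
  implicit none
  real(dp) :: xs(3), ys(3), zs(3)
  integer :: i

  xs = [0.0_dp, 0.5_dp, 1.0_dp]

  ! Scalar call: looks like any ordinary function reference.
  print *, legendre_p2(0.5_dp)

  ! Array call: the function is applied to every element of xs.
  ys = legendre_p2(xs)

  ! Conceptually equivalent explicit loop over the elements.
  do i = 1, size(xs)
    zs(i) = legendre_p2(xs(i))
  end do

  ! Compare the two results within a small tolerance.
  print *, all(abs(ys - zs) <= epsilon(1.0_dp))
end program demo_elemental
```

Placing the function in a module gives the caller an explicit interface,
which an elemental reference with array arguments requires.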

Yours,
Daniel


Tobias Burnus

Jul 30, 2008, 5:10:06 AM

On Jul 30, 9:37 am, Daniel Kraft <d...@domob.eu> wrote:
> When called with an array argument, your ELEMENTAL procedure will be
> called for each element as opposed to when your procedure accepts the
> whole array for one call, and thus here's the possibility that the
> performance changes; but I don't think you can really know in advance if
> it improves or degrades and how much; that probably depends on your code
> and how your compiler interprets it.

If the compiler calls the elemental function once per array
element, performance largely depends on whether the compiler can inline
the function. If it can, it should be quite fast; if it cannot, I would
expect that the program might slow down a lot, especially if you have
a large number of array elements and the calculation in the elemental
procedure is quick (i.e. the procedure-calling overhead is larger than
the calculations in the procedure). On the other hand, if you have
only an array-valued procedure, constructing an array and doing
array assignments in the procedure can also take some time, which
means that for scalars an elemental procedure should be (a tiny bit,
presumably negligibly) faster.

I believe that in most cases the performance loss due to
elemental procedures is negligible, and that worrying about the
algorithm or other things is much more important.

Thus, unless you are sure that the procedure is called very often with
array arguments, I would not worry about it. If it is called a lot,
you should profile the code and verify that it is indeed a hot spot
for the compilers you use.

An alternative to an ELEMENTAL function is to have a generic procedure
containing a scalar and an array specific procedure. That gives a
convenient interface to the user and is fast - for both scalars and
arrays - but you replicate code with all its disadvantages.
(Therefore, you really should check whether it makes a difference for
your compiler; having readable code is more important than
having a tiny performance gain. Especially as compilers make progress
and different compilers behave differently, some changes might not be
needed or, worse, might make the program even slower.)
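A minimal sketch of that generic-procedure alternative, assuming a
hypothetical generic name p2 with specifics p2_scalar and p2_array (none
of these names come from the thread). Note how the formula has to be
written twice, which is the code replication mentioned above.

```fortran
module p2_generic_mod
  implicit none
  private
  public :: p2
  integer, parameter, public :: dp = kind(1.0d0)

  ! One generic name; the compiler selects the specific procedure
  ! from the rank of the actual argument.
  interface p2
    module procedure p2_scalar, p2_array
  end interface p2
contains
  ! Scalar specific (hypothetical example formula).
  pure function p2_scalar(x) result(p)
    real(dp), intent(in) :: x
    real(dp) :: p
    p = 0.5_dp * (3.0_dp * x * x - 1.0_dp)
  end function p2_scalar

  ! Rank-1 specific: the same formula written with whole-array syntax.
  pure function p2_array(x) result(p)
    real(dp), intent(in) :: x(:)
    real(dp) :: p(size(x))
    p = 0.5_dp * (3.0_dp * x * x - 1.0_dp)
  end function p2_array
end module p2_generic_mod

program demo_generic
  use p2_generic_mod, only: dp, p2
  implicit none
  real(dp) :: xs(3)

  xs = [0.0_dp, 0.5_dp, 1.0_dp]
  print *, p2(0.5_dp)   ! resolves to p2_scalar
  print *, p2(xs)       ! resolves to p2_array
end program demo_generic
```

The caller sees a single name either way, so switching between this and
an ELEMENTAL version later should not require changes at the call sites
for scalar and rank-1 arguments.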

Tobias

Mark Westwood

Jul 30, 2008, 7:49:18 AM

Hi

I'd like to know too -- any chance of you writing some variant codes,
comparing them and posting your results?

Regards

Mark

John Harper

Jul 30, 2008, 5:19:00 PM

In article <f833019e-602e-4340...@m3g2000hsc.googlegroups.com>,

deltaquattro <deltaq...@gmail.com> wrote:
>Hi,
>
>I would like to know if the attribute ELEMENTAL for a function/
>subroutine can negatively affect performance or not.

...

>However, sometimes I heard that some "new" features of the language
>may have a hit on performance, because they make optimization more
>difficult for the compiler. I don't know if this is the case with
>ELEMENTAL, however these functions will be called a lot of times, so
>it's not a good idea if they are slow. The code will run on parallel
>vector supercomputers. Given this information, what would
>you suggest I do?

Try it both ways with a test program, and tell us which is better on
the machine(s) and compiler(s) you used.
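A rough sketch of such a test program, under stated assumptions: the
kernel (a small polynomial), the array size n, and the repeat count nrep
are all hypothetical placeholders, and SYSTEM_CLOCK wall-clock timing is
only a coarse measure. An optimizer may also transform these loops, so
the numbers are indicative at best.

```fortran
module kernels
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Elemental variant of a deliberately tiny kernel (hypothetical).
  elemental function f_elem(x) result(y)
    real(dp), intent(in) :: x
    real(dp) :: y
    y = 0.5_dp * (3.0_dp * x * x - 1.0_dp)
  end function f_elem

  ! Array-valued variant of the same kernel.
  pure function f_arr(x) result(y)
    real(dp), intent(in) :: x(:)
    real(dp) :: y(size(x))
    y = 0.5_dp * (3.0_dp * x * x - 1.0_dp)
  end function f_arr
end module kernels

program time_both
  use, intrinsic :: iso_fortran_env, only: int64
  use kernels
  implicit none
  integer, parameter :: n = 100000, nrep = 1000   ! assumed sizes
  real(dp), allocatable :: x(:), y(:)
  integer(int64) :: t0, t1, rate
  real(dp) :: acc
  integer :: i, k

  allocate(x(n), y(n))
  do i = 1, n
    x(i) = real(i, dp) / real(n, dp)
  end do

  ! Time the elemental version; acc keeps the results "used".
  acc = 0.0_dp
  call system_clock(t0, rate)
  do k = 1, nrep
    y = f_elem(x)
    acc = acc + y(mod(k, n) + 1)
  end do
  call system_clock(t1)
  print *, 'elemental    (s):', real(t1 - t0, dp) / real(rate, dp)

  ! Time the array-valued version in the same way.
  call system_clock(t0)
  do k = 1, nrep
    y = f_arr(x)
    acc = acc + y(mod(k, n) + 1)
  end do
  call system_clock(t1)
  print *, 'array-valued (s):', real(t1 - t0, dp) / real(rate, dp)

  print *, 'checksum:', acc
end program time_both
```

Building it at several optimization levels gives a first impression;
with gfortran, for instance, one could compare -O2 against
-O2 -fno-inline to see how much the result depends on inlining.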

-- John Harper, School of Mathematics, Statistics and Computer Science,
Victoria University, PO Box 600, Wellington 6140, New Zealand
e-mail john....@vuw.ac.nz phone (+64)(4)463 6780 fax (+64)(4)463 5045

Dick Hendrickson

Jul 30, 2008, 9:28:27 PM

The problem with "functions" is that they can be big or little.
For big ones, calling sequence details tend not to be
important. For little ones, calling sequence time can dominate
function execution time.

So, there are two things to try. Search out the compiler's options
for inlining and try it with and without inlining, both for
computationally intensive functions and for simple ones.

I think your original post said you were making a package for
others to use. If so, will they be able to compile the package,
or will you give them precompiled .obj and .mod files? They
probably can't get inlining with a precompiled version; having
to recompile your code every time to get the benefit of inlining
is a pain.

Dick Hendrickson

Gary Scott

Jul 30, 2008, 9:52:20 PM

I just bought a new IO card. In theory, it can read/write discretes
from host memory within 500ns of host command (as measured on a similar
but not identical system to mine). Due to function call overhead on
WinXP, it can't do it faster than 7usec. They've given me the source
for the device driver so that I can modify it to bypass some of the call
overhead :(. I haven't gotten into the details, but that seems like a
gigantic difference that they claim just hard coding the function codes
in the driver will fix.

>
> So, there are two things to try. Search out the compiler's options
> for inlining and try it with and without inlining, both for
> computationally intensive functions and for simple ones.
>
> I think your original post said you were making a package for
> others to use. If so, will they be able to compile the package,
> or will you give them precompiled .obj and .mod files? They
> probably can't get inlining with a precompiled version; having
> to recompile your code every time to get the benefit of inlining
> is a pain.
>
> Dick Hendrickson

Gary Scott

deltaquattro

Jul 31, 2008, 10:22:30 AM

Hi,

thanks to all of you guys. I have no time to do comparisons. I chose
not to use ELEMENTAL since my functions are very small and I believe
calling time would dominate.

Best Regards

deltaquattro