efficiency of allocatable array into derived type

stefan...@gmail.com

unread,

Sep 18, 2012, 6:35:17 AM9/18/12

to

Hi all,
I have a serious problem with a derived into which is defined an allocatable array.

The derived type is defined as follows

type ineff
real, allocatable:: x(:)
real:: y
endtype ineff

The problem arises with the re-definition of the operators like "=", "+", "*"... (I report only the subroutines between two "ineff-type" variables, omitting the subroutines between ineff variables and other types):

interface assignment (=)
...
module procedure assign_ineff
...
endinterface

interface operator (+)
...
module procedure ineff_sum_ineff
...
endinterface

...

where the subroutines are like:

elemental subroutine assign_ineff(ineff1,ineff2)
implicit none
type(ineff), intent(INOUT):: ineff1
type(ineff), intent(IN):: ineff2
if (allocated(ineff1%r).and.allocated(ineff2%r)) ineff1%x = ineff2%x
ineff1%y = ineff2%y
return
endsubroutine assign_ineff

elemental function ineff_sum_ineff(ineff1,ineff2) result(summ)
implicit none
type(ineff), intent(IN):: ineff1
type(ineff), intent(IN):: ineff2
type(ineff):: summ
if (.not.allocated(summ%x)) allocate(summ%x(1:size(ineff1%x)))
summ%x = ineff1%x + ineff2%x
summ%y = ineff1%y + ineff2%y
endfunction ineff_sum_ineff

The problems, I guess, arise from the checking of the allocation status of the "x" array and its eventual allocation (in the case of sum "summ%x" needs always a new allocation. This is very inefficient for large arrays computation like the following:

...
type(ineff):: array1(1:1000), array2(1:1000),array3(1:1000)
...

do i=1,1000
if (.not.allocated(array1(i)%x)) allocate(array1(i)%x(1:20))
if (.not.allocated(array2(i)%x)) allocate(array2(i)%x(1:20))
if (.not.allocated(array3(i)%x)) allocate(array3(i)%x(1:20))
array1(i)%x = 1. ; array1(i)%y = 1.
array2(i)%x = 2. ; array2(i)%y = 2.
array3(i)%x = 3. ; array3(i)%y = 3.
array3(i) = array1(i) + array2(i)
enddo

Can anyone suggest me how to improve the efficiency of such a data type? The allocation status checking is very onerous. Nevertheless it is necessary: I have reported only a small snippet, but in the original version this kind of type is used to achieve a sort of polymorphism...

The static version of the derived type (with "x" defined as a static array) is much more 10 times faster than this one. I guess that the parameterized derived types can solve this inefficiency, but gfortran 4.6 and intel fortran 12.0 seems to not support this kind of derived type.

Thank you all

Paul van Delst

unread,

Sep 18, 2012, 10:00:10 AM9/18/12

to

(My newsreader mangled your original message... sorry)

I wouldn't write an assignment procedure. Let it be done
"intrinsically". The necessary allocations will occur.

If you assume the + operator is only valid for defined arguments, how
about removing the allocation check from the procedure?

You could do the following:

elemental function ineff_sum_ineff(ineff1,ineff2) result(summ)
implicit none
type(ineff), intent(IN):: ineff1
type(ineff), intent(IN):: ineff2
type(ineff):: summ

summ = ineff1 ! Summ is de/allocated as necessary
summ%x = summ%x + ineff2%x
summ%y = summ%y + ineff2%y
end function ineff_sum_ineff

Then with a function like,

ELEMENTAL FUNCTION ineff_allocated( ineff ) RESULT( Status )
TYPE(ineff_type), INTENT(IN) :: ineff
LOGICAL :: Status
Status = ALLOCATED(ineff%x)
END FUNCTION ineff_allocated

you could do:

IF ( ALL( ineff_allocated(array1)) .AND. &
ALL( ineff_allocated(array2)) ) THEN
array3 = array1 + array2
ELSE
...issue error for unallocated component
END IF

No loop!

(It also means you only have to modify a single procedure for allocation
checking if/when you add additional allocatable components to the
"ineff" type)

I guess the basic message is to do as much work as possible outside the
loop (allocating[*] as well as checking it).

Now, having said all that, I always put an allocation checker in my
summation procedures. I suck up the performance hit for the safety and
convenience. Doing the allocation check outside the routine make it a
less useful procedure (IMO), but if the performance is suffering then I
guess the pragmatic choice would be to take it out.

Also, some profiling+timing results with and without the allocation
checkers would be useful too. Are you really sure the performance issues
are where you think they are? (I'm sure you're right, but it's always
worth mentioning)

cheers,

paulv

[*] An elemental "constructor"

ELEMENTAL SUBROUTINE ineff_create(ineff, n)
TYPE(ineff_type), INTENT(OUT) :: ineff ! INTENT is important!
INTEGER, INTENT(IN) :: n
ALLOCATE(ineff%x(n))
END SUBROUTINE ineff_create

That you could use to do:

CALL ineff_create(array1,n)
CALL ineff_create(array2,n)

to do the allocations prior to the loop.

Note that the INTENT(OUT) above can also be the cause of a performance
hit because the structure is ALWAYS being deallocated, then allocated.
If the structure is big enough already, there's no need to reallocate.
All of the above techniques I use are for general, basic structures that
are part of a library. They might not apply (well) for structures more
tightly integrated into an application.

Stefano Zaghi

unread,

Sep 18, 2012, 11:04:07 AM9/18/12

to

Hi Paul,
thank you very much for your replay.

> I wouldn't write an assignment procedure. Let it be done
> "intrinsically". The necessary allocations will occur.

This sentence is something obscure for my understanding. The allocation status check and the eventual allocation in the subroutine ineff_sum_ineff is necessary because the result variable summ always has %x not allocated (from this point of view the "if (allocated..)" check is not necessary): the real efficiency hit is the allocation of %x always when "+" operation is done. In my understanding this is due to the use of function for overloading the "+" operator instead of a subroutine where it is possible to define summ as intent(INOUT) variable. In a few words, suppose "array1", "array2" and "array3" are already allocated; when you do, for example,

array1(7) = array2(5) + array3(6)

the corresponding array %x of "array1(7)" is always reallocated even if all components of array1 have been correctly initialized.

> If you assume the + operator is only valid for defined arguments, how
> about removing the allocation check from the procedure?
>
> You could do the following:
>
> elemental function ineff_sum_ineff(ineff1,ineff2) result(summ)
> implicit none
> type(ineff), intent(IN):: ineff1
> type(ineff), intent(IN):: ineff2
> type(ineff):: summ
> summ = ineff1 ! Summ is de/allocated as necessary
> summ%x = summ%x + ineff2%x
> summ%y = summ%y + ineff2%y
> end function ineff_sum_ineff

Yes, as I said before this avoid the check, but not the allocation of summ%x, that I guess is my problem.

> Then with a function like,
> ELEMENTAL FUNCTION ineff_allocated( ineff ) RESULT( Status )
> TYPE(ineff_type), INTENT(IN) :: ineff
> LOGICAL :: Status
> Status = ALLOCATED(ineff%x)
> END FUNCTION ineff_allocated
>
> you could do:
>
> IF ( ALL( ineff_allocated(array1)) .AND. &
> ALL( ineff_allocated(array2)) ) THEN
> array3 = array1 + array2
> ELSE
> ...issue error for unallocated component
> END IF
> No loop!
>

You are right, but again my issue is not on the main loop, but in the necessity of reallocation of memory already allocated. The structure is used for a very complex fluid dynamic computation in a parallel code: the efficiency is important. The derived type is used to simplify the implementation of the physical formula. The operations in the main loop are complex and cannot be reduced to a simply summation... what I need is to avoid the reallocation, but keep the operators overloading.

> (It also means you only have to modify a single procedure for allocation
> checking if/when you add additional allocatable components to the
> "ineff" type)

Yes, you are right, but my above observations are still valid.

> I guess the basic message is to do as much work as possible outside the
> loop (allocating[*] as well as checking it).
>
> Now, having said all that, I always put an allocation checker in my
> summation procedures. I suck up the performance hit for the safety and
> convenience. Doing the allocation check outside the routine make it a
> less useful procedure (IMO), but if the performance is suffering then I
> guess the pragmatic choice would be to take it out.

You are right, and it was my first (paranoiac) attempt, but the performance is too bad.

> Also, some profiling+timing results with and without the allocation
> checkers would be useful too. Are you really sure the performance issues
> are where you think they are? (I'm sure you're right, but it's always
> worth mentioning)

Again you are right. The comparison I reported is simply a "time" output, a deeper profiling is mandatory.

> cheers,
> paulv
>
> [*] An elemental "constructor"
>
> ELEMENTAL SUBROUTINE ineff_create(ineff, n)
> TYPE(ineff_type), INTENT(OUT) :: ineff ! INTENT is important!
> INTEGER, INTENT(IN) :: n
> ALLOCATE(ineff%x(n))
> END SUBROUTINE ineff_create
>
> That you could use to do:
> CALL ineff_create(array1,n)
> CALL ineff_create(array2,n)
> to do the allocations prior to the loop.
>
> Note that the INTENT(OUT) above can also be the cause of a performance
> hit because the structure is ALWAYS being deallocated, then allocated.
> If the structure is big enough already, there's no need to reallocate.

Yes, this is clear.

For summarizing, I need to find a way for using operators overloading without the necessity to perform a reallocation of variables that, indeed, are already allocated.

Thank you again

Ian Harvey

unread,

Sep 18, 2012, 8:38:49 PM9/18/12

to

On 2012-09-19 1:04 AM, Stefano Zaghi wrote:
>
> Hi Paul,
> thank you very much for your replay.
>
>> I wouldn't write an assignment procedure. Let it be done
>> "intrinsically". The necessary allocations will occur.
>
> This sentence is something obscure for my understanding. The allocation status check and the eventual allocation in the subroutine ineff_sum_ineff is necessary because the result variable summ always has %x not allocated (from this point of view the "if (allocated..)" check is not necessary): the real efficiency hit is the allocation of %x always when "+" operation is done. In my understanding this is due to the use of function for overloading the "+" operator instead of a subroutine where it is possible to define summ as intent(INOUT) variable. In a few words, suppose "array1", "array2" and "array3" are already allocated; when you do, for example,
>
> array1(7) = array2(5) + array3(6)
>
> the corresponding array %x of "array1(7)" is always reallocated even if all components of array1 have been correctly initialized.

I think the simple answer to your original post is that you are
expecting optimisations that may not be supported by the current crop of
processors. That said, a few aspects of your code might be working
against optimisation, or at least confuse the issue.

(If you want more control over allocation etc, the best path forward is
to not use things like defined assignment and defined operations, but
explicitly code each action using subroutines. That also lets you play
tricks like using move_alloc to move arrays from one object to another,
which, if the array is large, is much, much quicker than allocating,
copying and deallocating. Assuming that's not an option:)

Subject to implementation detail, the compiler most likely creates a
temporary for the right hand side of that assignment. This temporary
(which the function result variable is a proxy for) is then assigned to
array1.

In your original post you had:

elemental function ineff_sum_ineff(ineff1,ineff2) result(summ)
implicit none
type(ineff), intent(IN):: ineff1
type(ineff), intent(IN):: ineff2
type(ineff):: summ

if (.not.allocated(summ%x)) allocate(summ%x(1:size(ineff1%x)))
summ%x = ineff1%x + ineff2%x

The "not allocated" test is redundant. The function result summ
conceptually didn't exist prior to the specification part of that block
of code. When it comes into existance, all its components are not
allocated by the rules of the language. The very first executable
statement after it comes into existance is then the allocation test -
there isn't a possibility for summ%x to be anything but unallocated (in
the absence of compiler bugs, and if you have compiler bugs like that
floating around then optimisation is the least of your worries.)

Further (with appropriate command line switches), the allocate child
statement is redundant in F2003 - if not allocated, then summ%x will be
automatically allocated to the size of the ineff1%x + ineff2%x expression.

Because you have defined assignment, the compiler must call the defined
assignment procedure when it hits an assignment statement. The compiler
doesn't neccessarily know what the overall effect of your defined
assignment is going to be, so it can't (unless it is very, very clever)
play games like elimination of temporaries. The compiler pretty much
has to transform

type(ineff) :: array1, array2, array3
...

array3 = array1 + array2

into something like (not real code):

type(ineff) :: array1, array2, array3
...
type(ineff) :: tmp
! The addition function transformed into something like a
! subroutine call with hidden intent(out) argument for the
! result.
call pseudo_ineff_sum_ineff(array1, array2, tmp)
! The defined assignment.
call assign_ineff(array3, tmp)

and all the allocation and copying associated with the existance of that
temporary (checking of allocation is probably inconsequential in terms
of execution time - it is the allocation and copying of data that hurts).

Without defined assignment, the compiler may well arrange for the left
hand side of the assignment statement to be the function result directly.

call pseudo_ineff_sum_ineff(array1, array2, array3)

(noting that because the hidden argument for array3 is intent(out), any
operations on array3 prior to the relevant real statement become
irrelevant.)

So do you really need the defined assignment?

(For types with value semantics (which your type sort of seems to want
to have) I'd always have the the argument for the left hand side of the
assignment in a procedure implementing defined assignment as
intent(out). In the code *that you posted* you aren't saving anything
by having the argument inout, but you are potentially further
obfuscating the overall intent of your code.)

glen herrmannsfeldt

unread,

Sep 18, 2012, 10:59:48 PM9/18/12

to

stefan...@gmail.com wrote:

> I have a serious problem with a derived into which is
> defined an allocatable array.

> The derived type is defined as follows

> type ineff
> real, allocatable:: x(:)
> real:: y
> endtype ineff

Best guess is that allocatables, pointers, and automatic local
variables are slightly less efficient than static (local or not)
variables. Small enough not to worry about most of the time.

> The problem arises with the re-definition of the operators
> like "=", "+", "*"... (I report only the subroutines between
> two "ineff-type" variables, omitting the subroutines
> between ineff variables and other types):

Separate from accessing allocatable variables, the actual
allocation has some inefficiency. It is best to avoid it when
possible if you are worried about efficiency. Don't reallocate
each time through a loop, especially if the size didn't change.

> interface assignment (=)
> ...
> module procedure assign_ineff
> ...
> endinterface

It is even more difficult for compilers to optimize through
such defined operators. They usually are implemented as a
procedure call, which is hard to optimize across. Even more,
it might require allocation of temporary arrays.

If you are worried about efficiency, then try to avoid it, at least
inside deeply nested loops. (Or shallow ones executed millions or
billions of times.)

You might find that the C preprocessor allows you to write
code in an easy to understand way, and yet still allow for
the usual optimizations.

-- glen

Stefano Zaghi

unread,

Sep 19, 2012, 2:34:51 AM9/19/12

to

Hi all,
thank you for yours replay.

@Ian

You are right and I already know what you said. But as you argued I was trying to do something that is not possible.

My data has a semantic and it is very nice to write formula like the following:

ineff1(i) = (0.5*ineff2(i+1)-0.25*ineff3(i-1))/ineff4(2*i)

Unlikely the overloaded operators must use function procedure and the result variables (summ, mult, divv, ecc...) are always with component %x not allocated (a large use of temporary arrays). I just wondering if there is a way to do a truly semantic operation on derived data variables but at the same not without compromising the efficiency with re-allocation of temporary variables. As you said this is not possible, I have to leave the overloaded operators in favor of components operations (less elegant but more efficient).

Thank you very much.

Stefano Zaghi

unread,

Sep 19, 2012, 2:37:16 AM9/19/12

to

Hi Glen,

I already (ab)use "C preprocessor". What do you mean with

"You might find that the C preprocessor allows you to write
code in an easy to understand way, and yet still allow for
the usual optimizations."

How "C preprocessor" can help me to make the overloaded operators efficient?

Wolfgang Kilian

unread,

Sep 19, 2012, 4:12:46 AM9/19/12

to

On 09/18/2012 12:35 PM, stefan...@gmail.com wrote:
> Hi all,
> I have a serious problem with a derived into which is defined an allocatable array.
>
> The derived type is defined as follows
>
> type ineff
> real, allocatable:: x(:)
> real:: y
> endtype ineff
>
> The problem arises with the re-definition of the operators like "=", "+", "*"... (I report only the subroutines between two "ineff-type" variables, omitting the subroutines between ineff variables and other types):
>
> interface assignment (=)
> ...
> module procedure assign_ineff
> ...
> endinterface
>
> interface operator (+)
> ...
> module procedure ineff_sum_ineff
> ...
> endinterface
>

> [...]

Assuming that you want to keep the operator-overloading concept for
readability while looking for an efficient algorithm, the obvious
solution (subroutines that operate on some workspace) is ruled out.

If unnecessary allocating and copying arrays is the actual cause of the
inefficiency, working in-place for all operations would resolve the
issue. However, working in-place is not straightforward here, since
operator overloading implies specific INTENT for the arguments of the
defined assignment and the defined operator. This rules out a solution
that uses MOVE_ALLOC.

However, you might be able to do this with old-fashioned pointer arrays,
see my code below. There are caveats:

1) The pointer_add function has a side effect. It modifies the array
which is pointed add by its first argument. This is no problem if the
first argument is a temporary. If it is a variable, make sure that this
variable is not used a second time after inserting it into the function.

2) The module procedures are impure. That is reasonable, they do have
side effects. If your compiler supports IMPURE ELEMENTAL, you can write
elemental procedures. Without IMPURE ELEMENTAL, you have to implement
specific versions for arrays, as I did below.

3) It may be tricky to avoid memory leaks.

4) Be sure to keep the code well-documented, because future
modifications have to live with the side effects.

I hope this helps.

-- Wolfgang

---------------------------------------------------------------------
module defined

implicit none

type :: wrapper_t
integer, dimension(:), pointer :: array => null ()
end type wrapper_t

interface assignment(=)
module procedure pointer_assignment
end interface

interface operator(+)
module procedure pointer_add
end interface

contains

subroutine pointer_assignment (a, b)
type(wrapper_t), dimension(:), intent(out) :: a
type(wrapper_t), dimension(:), intent(in) :: b
integer :: i
do i = 1, size (a)
a(i)%array => b(i)%array
end do
end subroutine pointer_assignment

function pointer_add (a, b) result (c)
type(wrapper_t), dimension(:), intent(in) :: a, b
type(wrapper_t), dimension(size(a)) :: c
integer :: i
do i = 1, size (a)
c(i)%array => a(i)%array
if (associated (c(i)%array) .and. associated (b(i)%array)) &
c(i)%array = c(i)%array + b(i)%array
end do
end function pointer_add

end module defined

program main

use defined

type(wrapper_t), dimension(:), allocatable :: a, b, c

allocate (a(1), b(1), c(1))
allocate (a(1)%array (3), b(1)%array(3))

a(1)%array = [1, 2, 3]
b(1)%array = [4, 5, 6]

c = a + b

print *, c(1)%array

end program main
-------------------------------------------------------------------------

--
E-mail: firstnameini...@domain.de
Domain: yahoo

glen herrmannsfeldt

unread,

Sep 19, 2012, 4:27:04 AM9/19/12

to

Stefano Zaghi <stefan...@gmail.com> wrote:

> I already (ab)use "C preprocessor". What do you mean with

(after I wrote)

> "You might find that the C preprocessor allows you to write
> code in an easy to understand way, and yet still allow for
> the usual optimizations."

> How "C preprocessor" can help me to make the overloaded operators efficient?

Not directly operator overloading, but maybe more readable than the
full expansion of the operations.

I do presume you have something more complicated than adding and
assigning two member structures. (If that is all, use COMPLEX instead.)

Otherwise, you could have macros like assign_ineff(x,y)
and add_ineff(x,y,z) such that the expansions generated the
appropriate assignment. Maybe not quite as readable, but maybe enough.

-- glen

Stefano Zaghi

unread,

Sep 19, 2012, 4:36:59 AM9/19/12

to

Dear Wolfgang,
thank very much for your suggestion. I am not sure I have understand all, but I will study your code.

>1) The pointer_add function has a side effect. It modifies the array
>which is pointed add by its first argument. This is no problem if the
>first argument is a temporary. If it is a variable, make sure that this
>variable is not used a second time after inserting it into the function.

I don't understand. Is the trick to use pointer side-effect to avoid re-allocation?

>2) The module procedures are impure. That is reasonable, they do have
>side effects. If your compiler supports IMPURE ELEMENTAL, you can write
>elemental procedures. Without IMPURE ELEMENTAL, you have to implement
>specific versions for arrays, as I did below.

Ok, this make sense for me.

>3) It may be tricky to avoid memory leaks.

I am almost sure that my efficiency problem is due to re-allocation and temporary arrays abuse rather than memory leaks.

>4) Be sure to keep the code well-documented, because future
>modifications have to live with the side effects.

You are right, the documentation is one of my paranoiac activity.

Thank you very much, I am going to study your caveats.

Wolfgang Kilian

unread,

Sep 19, 2012, 4:54:47 AM9/19/12

to

On 09/19/2012 10:36 AM, Stefano Zaghi wrote:
> Dear Wolfgang,
> thank very much for your suggestion. I am not sure I have understand all, but I will study your code.
>
>> 1) The pointer_add function has a side effect. It modifies the array
>> which is pointed add by its first argument. This is no problem if the
>> first argument is a temporary. If it is a variable, make sure that this
>> variable is not used a second time after inserting it into the function.
>
> I don't understand. Is the trick to use pointer side-effect to avoid re-allocation?

Yes. But this implies that one has to be careful. After

c = a + b

the contents of a are not unchanged, but equal to the contents of b.
Therefore, after

c = a + b

d = a + b

d and c will not be equal. Don't use 'a' a second time.

-- Wolfgang

Wolfgang Kilian

unread,

Sep 19, 2012, 4:56:49 AM9/19/12

to

Correction:

> ... the contents of a are not unchanged, but equal to the contents of c.

Wolfgang Kilian

unread,

Sep 19, 2012, 4:58:49 AM9/19/12

to

and another correction, I was too fast:

> c = a + b
> d = a + b
>

d and c will become equal, namely equal to a + 2*b. a, c and d all
point to the same array.

Arjen Markus

unread,

Sep 19, 2012, 5:56:36 AM9/19/12

to

Why not use the ASSOCIATE construct?

associate( ineff1 => ineff1(i)%x, ineff2 => ineff2(i+1)%x,
ineff3 => ineff3(i-1)%x, ineff4 => ineff4(2*i) )
ineff1 = (0.5*ineff2-0.25*ineff3)/ineff4
end associate

It works as a sort of macro substitution and it lets you use these
expressions in the way you want.

Regards,

Arjen

Stefano Zaghi

unread,

Sep 19, 2012, 6:49:08 AM9/19/12

to

Dear Arjen,
your suggestion is interesting, but I am not sure it solves my problem. If have understood your trick when I have to perform a "semantic operation" I have to build an associate block, instead of using overloaded operators. Is it correct? In that case it seems only slightly more elegant/readable than the direct use of %x components.

Now I think that the only way to still use overloaded operators and preserving the efficiency (that is my choice) is to (ab)use pointers components (maybe with the side-effect described above).

Thank you Arjen.

Arjen Markus

unread,

Sep 19, 2012, 7:01:36 AM9/19/12

to

You can use more than one statement in an ASSOCIATE block and you can
still use the original names. It "merely" defines aliases for the variables
or components of variables. Depending on how many such alias you introduce,
it may or may not help. I may be completely mistaken, but it should be
a textual substitution to the compiler and all operations work out as if
you had used the full text yourself. No need to define or use your own operators.

Regards,

Arjen

Richard Maine

unread,

Sep 19, 2012, 11:11:47 AM9/19/12

to

Wolfgang Kilian <see...@domain.invalid> wrote:

> and another correction, I was too fast:
>
> > c = a + b
> > d = a + b
> >
> d and c will become equal, namely equal to a + 2*b. a, c and d all
> point to the same array.

And the advantage of this approach is that it helps clarity? No thanks.
I'd advise good old subroutine calls over this any day.

--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain

Stefano Zaghi

unread,

Sep 20, 2012, 3:14:32 AM9/20/12

to

On Wednesday, September 19, 2012 5:11:48 PM UTC+2, Richard Maine wrote:
> And the advantage of this approach is that it helps clarity? No thanks.
> I'd advise good old subroutine calls over this any day.

Hi Richard,
thank you for your replay, but I think that all of us know how efficient are "good old subroutine calls"... the question here is how make efficient "overloaded operators" applied to "dynamic derived type" variables...

I am experimenting some changes of the Walfgang's pointers tricks. I am trying to "limit" the side effect. I will post some comments soon.

See you soon.

Richard Maine

unread,

Sep 20, 2012, 4:14:06 AM9/20/12

to

Note that my post did not include the word "efficiency" or any variant
thereof. That's because I was saying nothing about efficiency. I did
comment about clarity.

Atempting to improve efficiency at the expense of clarity is one of the
most classic mistakes in program design, and it so often turns out not
to really improve efficiency anyway. Much has been written on the
subject.

To my knowledge, the main reason for using overloaded operators is
improvement in clarity. If there has ever been a suggestion that such
overloading somehow improved efficiency, I've never seen it.

It does vaguely occur to me that perhaps you might be using "efficiency"
to refer to something like efficiency of expression, which is related to
clarity. I'd think that a confusing usage in this context, but that's
the only way I can see that you might read my post as talking about
efficiency.

My comment was to the effect that such pointer issues detract from the
clarity more than the overloaded operators are likely to improve it.

Stefano Zaghi

unread,

Sep 20, 2012, 4:52:41 AM9/20/12

to

Hi Richard,
my question evidently was not clear.

I hope to use overloaded operators within "semantic" derived type to improve clarity, but my attempt was frustrated by efficiency issue. I think that an operation like

ineff3(i) = 0.5*ineff1(i-1)+1.5*ineff2(i+1)

is more clear (if one have defined ineff# as a semantic type variable) than to perform the same operation on all components separately.

As you said the main reason for using overloaded operators is improvement in clarity. In this sense I like to use overloaded operators. Nevertheless I cannot accept the efficiency drop. Thus there is confusing usage of efficiency word: I say "efficiency" meaning speed to perform the sum and "clarity" meaning the semantic sense of the formula.

As you argued the side-effect of pointer tricks degrades the clarity, but improve the efficiency of the overloaded operators. Nevertheless the side effect is intolerable for me and I am now trying to eliminate it, but I think that is not possible (i.e. it is just the side-effect that, avoiding a new allocation, improves the performances of the sum). My conclusion is to avoid operators overloading until parameterized derived type will be implemented.

Later today I will post some comparison between "static", "allocatable" and "pointer" derived with overloaded derived type just to improve the clarity of this discussion :-)

See you soon.

Stefano Zaghi

unread,

Sep 20, 2012, 6:21:29 AM9/20/12

to

Hi all,
I have made a simple comparison between three different derived types: 1) with static array, 2) with allocatable array, 3) pointer array.

For pointer array I avoided the side effect, thus it is very very similar to the allocatable one (but the performances are not so identical).

The code used is the following:

module derived_static
implicit none

type:: wrapper_s
real(8), dimension(3):: array
integer:: scalar
endtype wrapper_s

interface assignment(=)
module procedure static_assignment
endinterface

interface operator(+)
module procedure static_add
endinterface

contains
subroutine static_assignment (a, b)
implicit none
type(wrapper_s), intent(inout) :: a
type(wrapper_s), intent(in) :: b
a%array = b%array
a%scalar = b%scalar
return
endsubroutine static_assignment

function static_add (a, b) result (c)
implicit none
type(wrapper_s), intent(in):: a, b
type(wrapper_s):: c
c%array = a%array + b%array
c%scalar = a%scalar + b%scalar
return
endfunction static_add
endmodule derived_static

module derived_allocatable
implicit none

type:: wrapper_a
real(8), dimension(:), allocatable:: array
integer:: scalar
endtype wrapper_a

interface assignment(=)
module procedure allocatable_assignment
endinterface

interface operator(+)
module procedure allocatable_add
endinterface

contains
subroutine allocatable_assignment (a, b)
implicit none
type(wrapper_a), intent(inout) :: a
type(wrapper_a), intent(in) :: b
a%array = b%array
a%scalar = b%scalar
return
endsubroutine allocatable_assignment

function allocatable_add (a, b) result (c)
implicit none
type(wrapper_a), intent(in):: a, b
type(wrapper_a):: c
allocate(c%array(size(a%array)))
c%array = a%array + b%array
c%scalar = a%scalar + b%scalar
return
endfunction allocatable_add
endmodule derived_allocatable

module derived_pointer
implicit none

type:: wrapper_p
real(8), dimension(:), pointer:: array => null ()
integer:: scalar
endtype wrapper_p

interface assignment(=)
module procedure pointer_assignment
endinterface

interface operator(+)
module procedure pointer_add
endinterface

contains
subroutine pointer_assignment (a, b)

implicit none
type(wrapper_p), intent(inout) :: a
type(wrapper_p), intent(in) :: b
a%array = b%array
a%scalar = b%scalar
return
endsubroutine pointer_assignment

function pointer_add (a, b) result (c)

implicit none
type(wrapper_p), intent(in):: a, b
type(wrapper_p):: c
allocate(c%array(size(a%array)))
c%array = a%array + b%array
c%scalar = a%scalar + b%scalar
return
endfunction pointer_add
endmodule derived_pointer

program main
use derived_static
use derived_allocatable
use derived_pointer
implicit none
type(wrapper_s), dimension(:), allocatable :: sa, sb, sc
type(wrapper_a), dimension(:), allocatable :: aa, ab, ac
type(wrapper_p), dimension(:), allocatable :: pa, pb, pc
real:: partial(1:2)
integer, parameter:: N=100000,M=3,R=500
integer:: i,j,k

! initializing
allocate (sa(N), sb(N), sc(N))
allocate (aa(N), ab(N), ac(N))
allocate (pa(N), pb(N), pc(N))

do i=1,N
allocate (aa(i)%array(M),ab(i)%array(M),ac(i)%array(M))
allocate (pa(i)%array(M),pb(i)%array(M),pc(i)%array(M))
! array
sa(i)%array = [1._8*i, 2._8*i, 3._8*i] ; sb(i)%array = [4._8*i, 5._8*i, 6._8*i]
aa(i)%array = [1._8*i, 2._8*i, 3._8*i] ; ab(i)%array = [4._8*i, 5._8*i, 6._8*i]
pa(i)%array = [1._8*i, 2._8*i, 3._8*i] ; pb(i)%array = [4._8*i, 5._8*i, 6._8*i]
! scalar
sa(i)%scalar = i ; sb(i)%scalar = 2*i
aa(i)%scalar = i ; ab(i)%scalar = 2*i
pa(i)%scalar = i ; pb(i)%scalar = 2*i
enddo

! testing static without overloaded operators
call CPU_TIME(partial(1))
do j=1,R
do i=1,N
sc(i)%array = sa(i)%array + sb(i)%array
sc(i)%scalar = sa(i)%scalar + sb(i)%scalar
enddo
enddo
call CPU_TIME(partial(2))
print '(A)', 'Static without overloaded operators'
print '(E13.6)', (partial(2)-partial(1))/R
print '(3F8.1,1X,I1)', sa(1)%array, sa(1)%scalar
print '(3F8.1,1X,I1)', sb(1)%array, sb(1)%scalar
print '(3F8.1,1X,I1)', sc(1)%array, sc(1)%scalar

! testing static
call CPU_TIME(partial(1))
do j=1,R
do i=1,N
sc(i) = sa(i) + sb(i)
enddo
enddo
call CPU_TIME(partial(2))
print '(A)', 'Static'
print '(E13.6)', (partial(2)-partial(1))/R
print '(3F8.1,1X,I1)', sa(1)%array, sa(1)%scalar
print '(3F8.1,1X,I1)', sb(1)%array, sb(1)%scalar
print '(3F8.1,1X,I1)', sc(1)%array, sc(1)%scalar

! testing allocatable
call CPU_TIME(partial(1))
do j=1,R
do i=1,N
ac(i) = aa(i) + ab(i)
enddo
enddo
call CPU_TIME(partial(2))
print '(A)', 'Allocatable'
print '(E13.6)', (partial(2)-partial(1))/R
print '(3F8.1,1X,I1)', aa(1)%array, aa(1)%scalar
print '(3F8.1,1X,I1)', ab(1)%array, ab(1)%scalar
print '(3F8.1,1X,I1)', ac(1)%array, ac(1)%scalar

! testing pointer
call CPU_TIME(partial(1))
do j=1,R
do i=1,N
pc(i) = pa(i) + pb(i)
enddo
enddo
call CPU_TIME(partial(2))
print '(A)', 'Pointer'
print '(E13.6)', (partial(2)-partial(1))/R
print '(3F8.1,1X,I1)', pa(1)%array, pa(1)%scalar
print '(3F8.1,1X,I1)', pb(1)%array, pb(1)%scalar
print '(3F8.1,1X,I1)', pc(1)%array, pc(1)%scalar

stop
end program main

I have tested Intel fortran 12.1.5 and GNU gfortran 4.6.3. The results on my workstation are (I omit the output for the side-effect check, reporting only the time values):

Intel compiler

Intel (-O0 compiling):
Static without overloaded operators
0.276817E-02
Static
0.384024E-02
Allocatable
0.183371E-01
Pointer
0.207293E-01

Intel (-O3 compiling):
Static without overloaded operators
0.000000E+00
Static
0.888056E-03
Allocatable
0.839252E-02
Pointer
0.101606E-01

It seems that the optimization, at least for a generic -O3, performs (slightly) better with allocatable than with pointer array. The static version is always one order faster than the others and when the overhead of using overloaded operators is eliminated the sum is so faster that the loop is not enough long to measure its time with CPU_TIME.

For the GNU compiler the comments are almost identical:

gfortran (-O0):
Static without overloaded operators
0.140009E-02
Static
0.255216E-02
Allocatable
0.103927E-01
Pointer
0.116887E-01

gfortran (-O3):
Static without overloaded operators
0.848054E-03
Static
0.880056E-03
Allocatable
0.696043E-02
Pointer
0.705644E-02
Note that the -O3 optimization of the gfortran is not so effective as the ifort one for the static without overloaded operators case.

My conclusions are:

1) using the side-effect the pointer array into derived type is efficient, but it makes the code confusing and unsafe;

2) static and pointer (without side-effect) array into derived type are at least one order slower than the static version;

3) even with the static array derived type the overloaded operators have a not negligible overhead.

I am now thinking to completely avoid overloaded operators in favor of explicitly components computations: less clear but more efficient computations.

See you soon.

glen herrmannsfeldt

unread,

Sep 20, 2012, 1:06:20 PM9/20/12

to

Stefano Zaghi <stefan...@gmail.com> wrote:

(snip)

> my question evidently was not clear.

> I hope to use overloaded operators within "semantic" derived
> type to improve clarity, but my attempt was frustrated by
> efficiency issue. I think that an operation like

> ineff3(i) = 0.5*ineff1(i-1)+1.5*ineff2(i+1)

> is more clear (if one have defined ineff# as a semantic
> type variable) than to perform the same operation on all
> components separately.

It isn't so obvious what operation you need to perform on
these structures, but I will mention, for comparison purposes
only, a PL/I feature that Fortran doesn't have.

Structure expressions.

Like array expressions, you can use the usual operators for
member by member operation between two structures.
There are two ways to do it. The default way is by position
in the structure. A+B adds the first members, the second, etc.

The second way is with the BY NAME option, which operates only
on similar named structure members, whatever their position,
and ignores those that don't have a common name.

Last I knew, PL/I didn't have operator overloading, which
is a pretty nice feature, but somewhat overkill if you just
want to perform the same operation on some structures.

-- glen

Wolfgang Kilian

unread,

Sep 21, 2012, 3:04:17 AM9/21/12

to

On 09/19/2012 05:11 PM, Richard Maine wrote:
> Wolfgang Kilian<see...@domain.invalid> wrote:
>
>> and another correction, I was too fast:
>>
>>> c = a + b
>>> d = a + b
>>>
>> d and c will become equal, namely equal to a + 2*b. a, c and d all
>> point to the same array.
>
> And the advantage of this approach is that it helps clarity? No thanks.
> I'd advise good old subroutine calls over this any day.

The OP asked a specific question and I tried to find an answer to
precisely that question.

From his other posts I learn that, in the actual implementation, the
pointer solution is not really helpful. This eliminates another reason
to use pointer arrays in Fortran, in favor of allocatables (or static,
where possible).

Regarding overloaded operators: as attractive as they are, I find them
quite misleading for the programmer. It is no surprise that one can get
headaches when trying to use them in complex situations. Addition, for
instance, is a commutative and associative operation, and for ordinary
numbers the rules of Fortran allow the optimizer to exploit this
property. Overloaded addition needs not have either of those
properties, so a + b + c and c + b + a could have completely different
results. This is also the case here, because of the in-place operations.

[I'd consider type-bound overloaded operators as even worse, type-bound
addition (with PASSed argument) for polymorphic objects is actually
forced to be asymmetric in its two arguments. After some experience, I
would no longer recommend to use this feature of F2003. Type-bound
assignment, however, is very useful.]

-- Wolfgang

Ian Harvey

unread,

Sep 21, 2012, 8:48:55 AM9/21/12

to

"forced to be asymmetric"? What do you mean?

Wolfgang Kilian

unread,

Sep 21, 2012, 11:04:36 AM9/21/12

to

On 09/21/2012 02:48 PM, Ian Harvey wrote:
> On 2012-09-21 5:04 PM, Wolfgang Kilian wrote:

> "forced to be asymmetric"? What do you mean?

One argument to the operator is the passed argument, the other is an
ordinary argument. The argument/type matching rules for the two are not
identical. This may lead to complications (I think I had an example,
but I'd have to dig it up.)