Static array versus allocatable array in derived type

fj

unread,

Jun 28, 2009, 2:44:20 PM6/28/09

to

In a previous post, I gave an example demonstrating that using assumed
shape array instead of explicit shape array has a cost :
- an assume shape array is much more efficient if copy-in/copy-out is
required for the explicit shape variant
- but without copy-in/copy-out, the explicit shape is always a little
bit faster, despite the urban legend saying the contrary ...

I also promised an actual example about the cost of allocatable arrays
versus static arrays in derived types. I cannot post the full example
here because of the size of the module var_dep (VARiable DEPendencies)
managing variables associated to partial derivatives (> 1000 lines).

The module itself may be download from my website (which is mainly the
website of my wife, especially for the perfume part) :

http://parfum-echecs.chez-alice.fr/download/var_dep.f90
http://parfum-echecs.chez-alice.fr/download/var_dep_alloc.f90
http://parfum-echecs.chez-alice.fr/download/test.f90

Two variants are implemented : static arrays (var_dep.f90) and
allocatable arrays (var_dep_alloc.f90). These variants are very
similar (try a diff command to see the difference). The differences
are located :
- in the definition of the derides type vardep
- in the routine AllocDeriv

Here is the test program I propose which partially computes a very
simple 1D gaseous flow :

PROGRAM main

USE var_dep
IMPLICIT NONE

CALL vardep_test(1000,4,1000)

CONTAINS

SUBROUTINE vardep_test(nx,ng,niter)

INTEGER,INTENT(in) :: nx ! number of meshes
INTEGER,INTENT(in) :: ng ! number of ideal gases
INTEGER,INTENT(in) :: niter ! number of iterations to increase the
CPU

TYPE(vardep),ALLOCATABLE :: PP(:,:),P(:),T(:),n(:),rho(:),v(:),q(:)
REAL(r8) ::
M=18.e-3_r8,Vol,R=8.314_r8,S=2_r8,fric=30_r8
REAL(r8) ,ALLOCATABLE :: sp(:),st(:),sn(:),srho(:),sv(:),sq(:)
INTEGER :: i,k
REAL :: t0,t1,t2,t3,t4,t5

Vol=50._r8/nx ! volume of a mesh

ALLOCATE(pp(ng,nx),p(nx),t(nx),n(nx),rho(nx),v(nx-1),q(nx-1))
ALLOCATE(sp(nx),st(nx),sn(nx),srho(nx),sv(nx-1),sq(nx-1))

! Initializing main variables (pressures and temperatures)

P=0
DO i=1,nx
DO k=1,ng
CALL newMainVar(PP(k,i),1.e5_r8*(5-DBLE(i+k)/nx),(ng+1)*
(i-1)+k)
P(i)=P(i)+PP(k,i) ! total pressure
ENDDO
CALL newMainVar(T(i),1000._r8*(1+DBLE(i)/nx),(ng+1)*i)
sp(i)=p(i)%value
st(i)=t(i)%value
ENDDO

! Allocating main results only once (for the ALLOCATABLE version)

n=(P/T)*(Vol/R) ! number of moles
rho=n*(M/Vol) ! density in assuming an average
molar mass
v=(P(1:nx-1)-P(2:nx))/fric ! velocity
q=v*(rho(1:nx-1)+rho(2:nx))*(S/2) ! mass flow rate

! Use of subroutines

CALL cpu_time(t0)
DO k=1,niter
DO i=1,nx
CALL newVar(n(i),P(i))
CALL divVar(n(i),T(i))
CALL multVar(n(i),Vol/R)
CALL newVar(rho(i),n(i))
CALL multVar(rho(i),M/Vol)
IF(i < nx) THEN
CALL newVar (v(i),P(i))
CALL minusVar(v(i),P(i+1))
CALL divVar (v(i),fric)
CALL newVar (q(i),rho(i))
CALL addVar (q(i),rho(i+1))
CALL multVar (q(i),v(i))
CALL multVar (q(i),S/2)
ENDIF
ENDDO
ENDDO
write(*,10) 'NR ',n(nx)%value,rho(nx)%value,v(1)%value,q(1)%value
CALL cpu_time(t1)

! Use of elemental subroutines

DO k=1,niter
CALL newVar(n,P)
CALL divVar(n,T)
CALL multVar(n,Vol/R)
CALL newVar(rho,n)
CALL multVar(rho,M/Vol)
CALL newVar (v,P(1:nx-1))
CALL minusVar(v,P(2:nx))
CALL divVar (v,fric)
CALL newVar (q,rho(1:nx-1))
CALL addVar (q,rho(2:nx))
CALL multVar (q,v)
CALL multVar (q,S/2)
ENDDO
write(*,10) 'ER ',n(nx)%value,rho(nx)%value,v(1)%value,q(1)%value

! Use of functions (operators)

CALL cpu_time(t2)
DO k=1,niter
DO i=1,nx
n(i)=(P(i)/T(i))*(Vol/R)
rho(i)=n(i)*(M/Vol)
IF(i < nx) THEN
v(i)=(P(i)-P(i+1))/fric
q(i)=v(i)*(rho(i)+rho(i+1))*(S/2)
ENDIF
ENDDO
ENDDO
write(*,10) 'NF ',n(nx)%value,rho(nx)%value,v(1)%value,q(1)%value

! Use of elemental functions (vector operators)

CALL cpu_time(t3)
DO k=1,niter
n=(P/T)*(Vol/R)
rho=n*(M/Vol)
v=(P(1:nx-1)-P(2:nx))/fric
q=v*(rho(1:nx-1)+rho(2:nx))*(S/2)
ENDDO
write(*,10) 'EF ',n(nx)%value,rho(nx)%value,v(1)%value,q(1)%value

! Scalar compupation

CALL cpu_time(t4)
DO k=1,niter
sn=(sp/st)*(Vol/R)
srho=sn*(M/Vol)
sv=(sp(1:nx-1)-sp(2:nx))/fric
sq=sv*(srho(1:nx-1)+srho(2:nx))*(S/2)
ENDDO
write(*,10) 'SC ',sn(nx),srho(nx),sv(1),sq(1)
CALL cpu_time(t5)

WRITE(*,*) 'NR ',t1-t0
WRITE(*,*) 'ER ',t2-t1
WRITE(*,*) 'NF ',t3-t2
WRITE(*,*) 'EF ',t4-t3
WRITE(*,*) 'SC ',t5-t4

10 FORMAT(1x,a,10(1x,1pe11.4))

END SUBROUTINE

END PROGRAM

This test checks several uses of the var_dep module :
- NR : Normal Routines
- ER : Elemental Routines
- NF : Normal Functions (scalar operators)
- EF : Elemental functions (vector operators)
- SC : Scalar Computation

The goal is not to test ALLOCATE and DEALLOCATE of main variables
because, in my actual applications, the derivatives are known since
the early beginning of each transient computation. This is the reason
why these variables are allocated before the first "CALL cpu_time".
But intermediate variables often need to be created (with ALLOCATE and
DEALLOCATE for the "alloc" version of the module), especially in the
cases NF and EF (operators).

Results with several compilers (I excluded intermediate print out used
to check whether each block was really computing the same things) :

ifort 11.0 20090131
===================

ifort -O3 var_dep.f90 test.f90

NR 2.349644
ER 1.994697
NF 2.515616
EF 2.290652
SC 2.0997047E-02

ifort -O3 var_dep_alloc.f90 test.f90

NR 4.655292
ER 4.499316
NF 10.83235
EF 11.66223
SC 3.0994415E-02

gfortran GNU Fortran (GCC) 4.3.2
================================

gfortran -O3 var_dep.f90 test.f90

NR 2.5366139
ER 2.2576568
NF 4.0553827
EF 3.5034685
SC 5.39922714E-02

gfortran -O3 var_dep_alloc.f90 test.f90

NR 5.5641537
ER 4.6752887
NF 11.471256
EF 11.313282
SC 5.59899292E-02

G95 (GCC 4.0.3 (g95 0.92!) Sep 16 2008)
=======================================

g95 -O3 var_dep.f90 test.f90

NR 9.769514
ER 8.754668
NF 12.080164
EF 12.181149
SC 0.061992645

g95 -O3 var_dep_alloc.f90 test.f90

NR 35.06567
ER 31.30524
NF 51.816124
EF 79.474915
SC 0.06299805

My conclusions :
================

- using allocatable arrays instead of static arrays multiplies the CPU
by at least 2 even if a minimum number of allocate/deallocate
statements occur (NR and ER),
- using operators instead of routines also involves an increase of CPU
which is acceptable with static arrays (+ 20/50 % ) but not with
allocatable arrays (+ 100 %)
- the most elegant solution EF becomes 4 or 5 times slower with
allocatable arrays than with static arrays.

Of course, I have perhaps missed something which makes allocatable
arrays inefficient in my programming. So I wait for your comments.

Post Scriptum :
===============

Compared to scalar computations, var_dep are rather expensive (at
least 100 times slower). Of course, one needs to take into account in
addition the cost of partial derivative computations but I expected
initially a factor 20 for such a test case. If the factor 100 cannot
be reduced, then a solution with pure numerical derivatives could be
more efficient. Indeed it is possible to derive numerically versus
several independent variables together (two variables are independent
if they do not influence the same balance equations); and I checked
recently, for 1D and 3D two phase flows (finite volume), that it was
generally possible to share the state variables into about 50 lists of
independent variables : so potentially, the cost of the computation of
the balance equations and the associated jacobian matrix could be
equal to 51 times the cost the balance equations alone, i.e. twice
less than the fasted vardep solution ! But this idea must be checked
on a real case ...

Richard Maine

unread,

Jun 28, 2009, 3:55:16 PM6/28/09

to

fj <franco...@irsn.fr> wrote:

> In a previous post, I gave an example demonstrating that using assumed
> shape array instead of explicit shape array has a cost :
> - an assume shape array is much more efficient if copy-in/copy-out is
> required for the explicit shape variant
> - but without copy-in/copy-out, the explicit shape is always a little
> bit faster, despite the urban legend saying the contrary ...

That's more in accordance with rather than contrary to what I would call
the usual "urban legend" (except that being largely true probably
disqualifies it as being a legend). I don't recall hearing much of
anyone claim that assumed shape should be faster (except for the
copy-in/copy-out case). It has plenty of benefits, but faster is not
generally cited as one of them; I'm not sure where you would have picked
that up. Perhaps you are extrapolating from a general recommendation to
use assumed-shape. That recommendation is common; but the recommendation
does not tend to be based on an expectation of being faster (except for
the copy-in/copy-out case).

I'll also note that the posted data do show at least one exception to
the above "always". Quoting from my prior post

>> In G95 with no optimization, I see the reverse of the expected
>> behavior; the assumed-shape case is about 30% faster than explicit
>> shape.

But I agree that is an exception, and one which I'm a little at loss to
explain. I also posted NAG results that showed only about a 1% penalty,
which seems plausible. I suppose that technically fits your comment
about explicit shape being "a little bit faster", to the extent that 1%
counts as a little bit, but I would have said that "about the same" was
a more accurate description.

> I also promised an actual example about the cost of allocatable arrays
> versus static arrays in derived types.

[code and links elided]

I didn't dig into this one as much, as it is more than I feel like
working on it right now. (My daughter has come in twice this morning
while I was doing clf stuff and left after concluding that I was doing
boring things, which was no doubt true). But a quick skim suggests
support for my previous speculation as to the likely culprit.

There are lots of implicit allocations and deallocations here. I'd lay
quite high odds that all those allocations and deallocations are the
source of your performance problems.

In particular, whenever you have an INTENT(OUT) dummy argument or a
function result variable that has an allocatable component, that
component is implicitly deallocated on entry. That's consistent with the
way intent(out) works for anything - it does not use any value that
might or might not have existed before the call. The allocation of any
allocatable components is part of the "value" of a derived type object.

And function results don't even have anything to take an initial value
from. Recall that even in a trivial case like x=f(y), the x is not the
function result variable; instead the function result is computed and
then assigned to x. Optimizers might well shortcut some steps, but there
is no reason why any initial state of x should have anything to do with
what happens while evaluating the function.

After those implicit deallocations, you probably get implicit
allocations later (possibly on assignment); I didn't trace that through
carefully enough to be sure, though the fact that you didn't get errors
from trying to use elements of unallocated arrays suggests that they
probably got allocated somewhere.

> But intermediate variables often need to be created (with ALLOCATE and
> DEALLOCATE for the "alloc" version of the module), especially in the
> cases NF and EF (operators).

I think you will find that there are a lot more allocations and
deallocations than just the allocate and deallocate statements; see
above.

> My conclusions :
> ================
>
> - using allocatable arrays instead of static arrays multiplies the CPU
> by at least 2 even if a minimum number of allocate/deallocate
> statements occur (NR and ER),
> - using operators instead of routines also involves an increase of CPU
> which is acceptable with static arrays (+ 20/50 % ) but not with
> allocatable arrays (+ 100 %)
> - the most elegant solution EF becomes 4 or 5 times slower with
> allocatable arrays than with static arrays.
>
> Of course, I have perhaps missed something which makes allocatable
> arrays inefficient in my programming. So I wait for your comments.

My conclusion is very different. My conclusion is that when using
allocatable things (components or not), you need to be aware of when
allocations and deallocations are going to occur because those
allocations and deallocations can be significant performance drags (and
also because they can make for bugs if you expected a variable to be
allocated when it turns out not to be). The allocations and
deallocations can occur in places that might not be obvious to you at
first.

I am almost certain that your performance drag is because of all the
allocations and deallocations rather than anything inherent to operating
on allocatable components. If one is doing things that don't involve so
many allocations and deallocations, I can think of very little reason
why allocatable components would be slower than fixed-size ones. Your
application appears to do a lot of such allocations and deallocations,
but that isn't necessarily so in general.

So I do see reason for caution here. And perhaps the fixed-size
components might be more appropriate for your case; I haven't tried to
evaluate that. But I think you far over-generalize in attributing this
as a general property of using allocatable components. I would say it is
more a property of using allocatable components in ways where they get
allocated and deallocated a lot.

--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain

fj

unread,

Jun 28, 2009, 4:52:13 PM6/28/09

to

On 28 juin, 21:55, nos...@see.signature (Richard Maine) wrote:

> fj <francois.j...@irsn.fr> wrote:
> > In a previous post, I gave an example demonstrating that using assumed
> > shape array instead of explicit shape array has a cost :
> > - an assume shape array is much more efficient if copy-in/copy-out is
> > required for the explicit shape variant
> > - but without copy-in/copy-out, the explicit shape is always a little
> > bit faster, despite the urban legend saying the contrary ...
>
> That's more in accordance with rather than contrary to what I would call
> the usual "urban legend" (except that being largely true probably
> disqualifies it as being a legend). I don't recall hearing much of
> anyone claim that assumed shape should be faster (except for the
> copy-in/copy-out case). It has plenty of benefits, but faster is not
> generally cited as one of them; I'm not sure where you would have picked
> that up. Perhaps you are extrapolating from a general recommendation to
> use assumed-shape. That recommendation is common; but the recommendation
> does not tend to be based on an expectation of being faster (except for
> the copy-in/copy-out case).

Amusing : the expression "urban legend" was extracted from a comment
of somebody who answered my previous post but who's name was not
"Richard Main" !

>
> I'll also note that the posted data do show at least one exception to
> the above "always". Quoting from my prior post
>
> >> In G95 with no optimization, I see the reverse of the expected
> >> behavior; the assumed-shape case is about 30% faster than explicit
> >> shape.
>
> But I agree that is an exception, and one which I'm a little at loss to
> explain. I also posted NAG results that showed only about a 1% penalty,
> which seems plausible. I suppose that technically fits your comment
> about explicit shape being "a little bit faster", to the extent that 1%
> counts as a little bit, but I would have said that "about the same" was
> a more accurate description.
>
> > I also promised an actual example about the cost of allocatable arrays
> > versus static arrays in derived types.
>
> [code and links elided]
>
> I didn't dig into this one as much, as it is more than I feel like
> working on it right now. (My daughter has come in twice this morning
> while I was doing clf stuff and left after concluding that I was doing
> boring things, which was no doubt true). But a quick skim suggests
> support for my previous speculation as to the likely culprit.

Yes, I agree. I read attentively yours speculations about the
statistics I gave. They are probably true. But how to avoid allocation/
deallocation with allocatable components ?

>
> There are lots of implicit allocations and deallocations here. I'd lay
> quite high odds that all those allocations and deallocations are the
> source of your performance problems.

Perhaps. But in the cases where subroutines are used instead of
operators, the number of allocations is not so high because arguments
are declared in or inout (rarely out). Unfortunately, when adding or
multiplying two vardep variables, it is necessary to create an
intermediate vardep and this is also true for the "static array
version".

>
> In particular, whenever you have an INTENT(OUT) dummy argument or a
> function result variable that has an allocatable component, that
> component is implicitly deallocated on entry. That's consistent with the
> way intent(out) works for anything - it does not use any value that
> might or might not have existed before the call. The allocation of any
> allocatable components is part of the "value" of a derived type object.

I agree but they are very few INTENT(OUT) variables in my module ...
OK I will check that attentively. I possibly missed something here.

>
> And function results don't even have anything to take an initial value
> from. Recall that even in a trivial case like x=f(y), the x is not the
> function result variable; instead the function result is computed and
> then assigned to x. Optimizers might well shortcut some steps, but there
> is no reason why any initial state of x should have anything to do with
> what happens while evaluating the function.

OK but how to take advantage of operators in that case ?

OK you are certainly right even if I don't see clearly them when I
examine the solutions NR and ER (Normal or Elemental Routines). For
operators (Normal Functions and Elemental Functions), I see them very
well (and the CPU too).

I never wanted to generalize but just to show, with an actual example,
that allocatable components are sometimes not easy to manage
efficiently. And my conclusions do not concern allocatable arrays in
derived types but only allocatable arrays in VARDEP derived types !

If you have understood something else, I am sorry ! this was not my
intention. In particular, I use allocatable components in many parts
of the codes I develop ... but not in the var_dep module !

Richard Maine

unread,

Jun 28, 2009, 5:14:44 PM6/28/09

to

fj <franco...@irsn.fr> wrote:

> On 28 juin, 21:55, nos...@see.signature (Richard Maine) wrote:
> > fj <francois.j...@irsn.fr> wrote:
> > > - but without copy-in/copy-out, the explicit shape is always a little
> > > bit faster, despite the urban legend saying the contrary ...
> >
> > That's more in accordance with rather than contrary to what I would call
> > the usual "urban legend" (except that being largely true probably
> > disqualifies it as being a legend). I don't recall hearing much of
> > anyone claim that assumed shape should be faster (except for the
> > copy-in/copy-out case). It has plenty of benefits, but faster is not
> > generally cited as one of them; I'm not sure where you would have picked
> > that up. Perhaps you are extrapolating from a general recommendation to
> > use assumed-shape. That recommendation is common; but the recommendation
> > does not tend to be based on an expectation of being faster (except for
> > the copy-in/copy-out case).
>
> Amusing : the expression "urban legend" was extracted from a comment
> of somebody who answered my previous post but who's name was not
> "Richard Main" !

But you'll note that Gordon used "urban legend" in the opposite sense
from what you did, which was most of my point above. He commented that
it was an "urban legend" that explicit shape was faster. You then said
that "explicit shape is always a little faster, despite the urban legend
to the contrary." It is that "despite" and "to the contrary" that seem
backwards. It would have fit more to say that your data tended to
support the "urban legend" - not that it was contrary to it.

(I don't have anything else useful to say about the allocatable
component stuff; looks like we probably largely agree on it and I think
that was a useful example.)

fj

unread,

Jun 28, 2009, 5:21:35 PM6/28/09

to

On 28 juin, 21:55, nos...@see.signature (Richard Maine) wrote:

I replaced all remaining INTENT(out) into INTENT(inout) ... without
any change on the performances I measured previously. Anyway I have
updated the versions of the var_dep module on my website.

fj

unread,

Jun 28, 2009, 5:30:18 PM6/28/09

to

Because I estimated that he was propagating himself an urban legend :
the fact that all FORTRAN compilers exhibit, for a long time, the same
performance for explicit and assume shape arrays ;-)

fj

unread,

Jun 29, 2009, 6:08:50 AM6/29/09

to

On 28 juin, 20:44, fj <francois.j...@irsn.fr> wrote:
> In a previous post, I gave an example demonstrating that using assumed
> shape array instead of explicit shape array has a cost :
> - an assume shape array is much more efficient if copy-in/copy-out is
> required for the explicit shape variant
> - but without copy-in/copy-out, the explicit shape is always a little
> bit faster, despite the urban legend saying the contrary ...
>
> I also promised an actual example about the cost of allocatable arrays
> versus static arrays in derived types. I cannot post the full example
> here because of the size of the module var_dep (VARiable DEPendencies)
> managing variables associated to partial derivatives (> 1000 lines).
>
> The module itself may be download from my website (which is mainly the
> website of my wife, especially for the perfume part) :
>

> http://parfum-echecs.chez-alice.fr/download/var_dep.f90http://parfum-echecs.chez-alice.fr/download/var_dep_alloc.f90http://parfum-echecs.chez-alice.fr/download/test.f90

>
> Two variants are implemented : static arrays (var_dep.f90) and
> allocatable arrays (var_dep_alloc.f90). These variants are very
> similar (try a diff command to see the difference). The differences
> are located :
> - in the definition of the derides type vardep
> - in the routine AllocDeriv
>
> Here is the test program I propose which partially computes a very
> simple 1D gaseous flow :

A new version for the three files has been installed on my website. So
the main program shown on the forum is not valid anymore.

This new version tries to count the number of allocate statements ...
but such a task is not easy because pure procedures and operator
functions do not accept to modify a global variable.

So I was obliged to lie and to deceive the compilers ... sorry for
this non standard conforming program !

New results on a faster computer (core 2 duo 2.3 Ghz)

ifort var_dep

NR 1.300081 3997000
ER 1.156072 3997000
NF 1.656104 11992000
EF 1.572098 11992000
SC 1.6001225E-02 0

ifort var_dep_alloc

NR 2.460154 3997000
ER 2.368148 3997000
NF 6.228389 15989000
EF 5.720356 15989000
SC 1.6002655E-02 0

gfortran var_dep

NR 1.5240949 3997000
ER 1.4440911 3997000
NF 2.8921797 11992000
EF 2.6921697 11992000
SC 3.60012054E-02 0

gfortran var_dep_alloc

NR 2.7801740 3997000
ER 2.5321581 3997000
NF 5.1603231 15989000
EF 5.6003494 15989000
SC 3.60031128E-02 0

g95 var_dep

NR 6.1803856 0
ER 5.472342 0
NF 8.076505 0
EF 8.212513 0
SC 0.040002823 0

g95 var_dep_alloc
segfault

Damned : a bug somewhere, probably the same that Tobias Burnus found
this morning.

robin

unread,

Jun 29, 2009, 12:04:43 PM6/29/09

to

"fj" <franco...@irsn.fr> wrote in message
news:ca6e7a24-e44c-4419...@b14g2000yqd.googlegroups.com...

>I replaced all remaining INTENT(out) into INTENT(inout) ... without
>any change on the performances I measured previously. Anyway I have
>updated the versions of the var_dep module on my website.

One of the purposes of INTENT(OUT) is to convey to the reader
that no value is passed into the procedure via that dummy argument,
and that the procedure assigns (in one way or another) a value to
that dummy argument.

Tobias Burnus

unread,

Jun 30, 2009, 3:29:42 AM6/30/09

to

On 29 Jun., 18:04, "robin" <robi...@bigpond.com> wrote:
> >I replaced all remaining INTENT(out) into INTENT(inout) ... without
> >any change on the performances I measured previously. Anyway I have
> >updated the versions of the var_dep module on my website.
>
> One of the purposes of INTENT(OUT) is to convey to the reader
> that no value is passed into the procedure via that dummy argument,
> and that the procedure assigns (in one way or another) a value to
> that dummy argument.

Well, the information is also used by the compiler

a) For diagnostics, i.e. you cannot assign to a INTENT(IN) variable or
pass a literal/parameter to a INTENT(OUT)/INTENT(INOUT).

b) For optimization, e.g., if an array is not contiguous but the dummy
argument requires this (e.g. assumed size), the data has* to be copied
into a temporary variable - and after the procedure call it has to be
copied back into the original array. The last step can be omitted if
the compiler knows that the dummy variable is INTENT(IN). (There are
some other optimization possibilities.)

Thus: Never lie to the compiler that a variable is INTENT(IN) or INTENT
(OUT) if it isn't. (Using modules or internal procedures reduces the
likelihood of having inconsistent intents. But one has still to be
careful when using procedures as actual argument as most compilers do
not check whether the interface of the actual argument matches the one
of the dummy procedure.)

Tobias

(* That what happens in most (?) implementations; the standard just
says what has to work not how it has to be implemented.)

fj

unread,

Jun 30, 2009, 8:20:55 AM6/30/09

to

Finally, this bug looks like a compiler trouble. I came back to the
last stable version of g95 (g95 0.91!) Feb 26 2008 :

> g95 var_dep_alloc

NR 15.148946 0
ER 14.464904 0
NF 35.68223 0
EF 51.483215 0
SC 0.040000916 0

Notice that g95 accepts the trick to get a count of the allocatable
statements but the result is 0 as if the call to the routine "count"
was simply ignored, surely because this one has no argument (it
modifies a common variable) and therefore cannot serve to anything in
a true pure routine. Well done Andy !

Of course, I verified that the trick was not responsible of the crash
with the most recent g95 compiler. So I am preparing a bug report for
Andy (my vengeance for my trick falling down into the trap of Andy's
optimization).

robin

unread,

Jun 30, 2009, 9:10:32 PM6/30/09

to

"Tobias Burnus" <bur...@net-b.de> wrote in message
news:0d6bf265-580c-4e8f...@x17g2000yqd.googlegroups.com...

> On 29 Jun., 18:04, "robin" <robi...@bigpond.com> wrote:
> > >I replaced all remaining INTENT(out) into INTENT(inout) ... without
> > >any change on the performances I measured previously. Anyway I have
> > >updated the versions of the var_dep module on my website.
> >
> > One of the purposes of INTENT(OUT) is to convey to the reader
> > that no value is passed into the procedure via that dummy argument,
> > and that the procedure assigns (in one way or another) a value to
> > that dummy argument.
>
> Well, the information is also used by the compiler

I know that, and I expect that the OP does also.
I was commenting in the fact that the OP changed his
INTENT(OUT) dummy arguments to INOUT,
which was not a good step.

> a) For diagnostics, i.e. you cannot assign to a INTENT(IN) variable or
> pass a literal/parameter to a INTENT(OUT)/INTENT(INOUT).
>
> b) For optimization, e.g., if an array is not contiguous but the dummy
> argument requires this (e.g. assumed size), the data has* to be copied
> into a temporary variable - and after the procedure call it has to be
> copied back into the original array. The last step can be omitted if
> the compiler knows that the dummy variable is INTENT(IN). (There are
> some other optimization possibilities.)

And vice versa if the dummy argument is INTENT (OUT).

fj

unread,

Jul 1, 2009, 3:30:27 AM7/1/09

to

On 1 juil, 03:10, "robin" <robi...@bigpond.com> wrote:
> "Tobias Burnus" <bur...@net-b.de> wrote in message
>
> news:0d6bf265-580c-4e8f...@x17g2000yqd.googlegroups.com...
>
> > On 29 Jun., 18:04, "robin" <robi...@bigpond.com> wrote:
> > > >I replaced all remaining INTENT(out) into INTENT(inout) ... without
> > > >any change on the performances I measured previously. Anyway I have
> > > >updated the versions of the var_dep module on my website.
>
> > > One of the purposes of INTENT(OUT) is to convey to the reader
> > > that no value is passed into the procedure via that dummy argument,
> > > and that the procedure assigns (in one way or another) a value to
> > > that dummy argument.
>
> > Well, the information is also used by the compiler
>
> I know that, and I expect that the OP does also.
> I was commenting in the fact that the OP changed his
> INTENT(OUT) dummy arguments to INOUT,
> which was not a good step.

I know that very well but when a derided type variable contains
allocatable or pointer components, it is often more efficient (as
underlined by R. Maine) to take care about deallocation and
reallocation which are "automatic" in case of INTENT(OUT) : an
allocatable array already allocated is immediately deallocated when
entering the routine whereas a pointer may be nullified. So, when I
changed few variables from INTENT(OUT) to INTENT(INOUT), this was
because allocations may be "IN" whereas values are always "OUT".

And as I was aware about the possible performance decrease due to too
many allocate/deallocate statements, most "out" variables where
already declared "inout" and I programed a specific routine
(AllocDeriv) to check whether allocatable arrays are already allocated
at a sufficient size. So, after the advice of Richard, I just checked
again the existence of remaining INTENT(out) variables (I wrote the
module "var_dep" two year ago) and found few (exactly 5). To avoid
performance troubles, I changed them from OUT to INOUT (in verifying
rapidly that this change was possible without danger) but without real
hope because this work was already done previously.

fj

unread,

Jul 1, 2009, 3:30:45 AM7/1/09

to

On 1 juil, 03:10, "robin" <robi...@bigpond.com> wrote:

> "Tobias Burnus" <bur...@net-b.de> wrote in message
>
> news:0d6bf265-580c-4e8f...@x17g2000yqd.googlegroups.com...
>
> > On 29 Jun., 18:04, "robin" <robi...@bigpond.com> wrote:
> > > >I replaced all remaining INTENT(out) into INTENT(inout) ... without
> > > >any change on the performances I measured previously. Anyway I have
> > > >updated the versions of the var_dep module on my website.
>
> > > One of the purposes of INTENT(OUT) is to convey to the reader
> > > that no value is passed into the procedure via that dummy argument,
> > > and that the procedure assigns (in one way or another) a value to
> > > that dummy argument.
>
> > Well, the information is also used by the compiler
>
> I know that, and I expect that the OP does also.
> I was commenting in the fact that the OP changed his
> INTENT(OUT) dummy arguments to INOUT,
> which was not a good step.

I know that very well but when a derided type variable contains

allocatable or pointer components, it is often more efficient (as
underlined by R. Maine) to take care about deallocation and
reallocation which are "automatic" in case of INTENT(OUT) : an
allocatable array already allocated is immediately deallocated when
entering the routine whereas a pointer may be nullified. So, when I
changed few variables from INTENT(OUT) to INTENT(INOUT), this was
because allocations may be "IN" whereas values are always "OUT".

And as I was aware about the possible performance decrease due to too
many allocate/deallocate statements, most "out" variables where
already declared "inout" and I programed a specific routine
(AllocDeriv) to check whether allocatable arrays are already allocated
at a sufficient size. So, after the advice of Richard, I just checked
again the existence of remaining INTENT(out) variables (I wrote the
module "var_dep" two year ago) and found few (exactly 5). To avoid
performance troubles, I changed them from OUT to INOUT (in verifying
rapidly that this change was possible without danger) but without real
hope because this work was already done previously.
>

fj

unread,

Jul 1, 2009, 4:35:24 AM7/1/09

to

On 1 juil, 03:10, "robin" <robi...@bigpond.com> wrote:

> "Tobias Burnus" <bur...@net-b.de> wrote in message
>
> news:0d6bf265-580c-4e8f...@x17g2000yqd.googlegroups.com...
>
> > On 29 Jun., 18:04, "robin" <robi...@bigpond.com> wrote:
> > > >I replaced all remaining INTENT(out) into INTENT(inout) ... without
> > > >any change on the performances I measured previously. Anyway I have
> > > >updated the versions of the var_dep module on my website.
>
> > > One of the purposes of INTENT(OUT) is to convey to the reader
> > > that no value is passed into the procedure via that dummy argument,
> > > and that the procedure assigns (in one way or another) a value to
> > > that dummy argument.
>
> > Well, the information is also used by the compiler
>
> I know that, and I expect that the OP does also.
> I was commenting in the fact that the OP changed his
> INTENT(OUT) dummy arguments to INOUT,
> which was not a good step.

I know that very well but when a derided type variable contains

allocatable or pointer components, it is often more efficient (as
underlined by R. Maine) to take care about deallocation and
reallocation which are "automatic" in case of INTENT(OUT) : an
allocatable array already allocated is immediately deallocated when
entering the routine whereas a pointer may be nullified. So, when I
changed few variables from INTENT(OUT) to INTENT(INOUT), this was
because allocations may be "IN" whereas values are always "OUT".

And as I was aware about the possible performance decrease due to too
many allocate/deallocate statements, most "out" variables where
already declared "inout" and I programed a specific routine
(AllocDeriv) to check whether allocatable arrays are already allocated
at a sufficient size. So, after the advice of Richard, I just checked
again the existence of remaining INTENT(out) variables (I wrote the
module "var_dep" two year ago) and found few (exactly 5). To avoid
performance troubles, I changed them from OUT to INOUT (in verifying
rapidly that this change was possible without danger) but without real
hope because this work was already done previously.
>

robin

unread,

Jul 2, 2009, 9:00:27 PM7/2/09

to

"fj" <franco...@irsn.fr> wrote in message

news:bc1eaf5b-141e-479c...@q11g2000yqi.googlegroups.com...

And did changing those 5 INTENTs make any difference?

BTW, did you do a profile on the code to find where the time is being spent?

Also, did you try another compiler (as I suggested)?
In my test on your code, there was no difference in time between your codes.

fj

unread,

Jul 2, 2009, 9:30:24 PM7/2/09

to

On 3 juil, 03:00, "robin" <robi...@bigpond.com> wrote:
> "fj" <francois.j...@irsn.fr> wrote in message

No

>
> BTW, did you do a profile on the code to find where the time is being spent?

Yes/No : I profiled the code in counting for allocate/deallocate and
in measuring the CPU-TIME of different parts. But I did not use a
specific profiling tool like gprof because the routines I want to
profile are too fast (the overhead of a profile tool would be too
high).

>
> Also, did you try another compiler (as I suggested)?

I tried 3 different compilers (ifort, g95, gfortran). Did you look at
the results ? The cpu time measurement for the three compilers is
provided in that thread !

> In my test on your code, there was no difference in time between your codes.

No : you never tested that program as far as I know. I cannot believe
that you could get the same results with static arrays and allocatable
arrays in that particular test.

You just indicated that you tested the first program I proposed in a
previous post (comparison between explicit shape and assume shape
arguments), but without providing neither the compiler name you used,
nor the kind of CPU you have, the optimization options you have
chosen, the results you got ...

Richard Maine got effectively almost the same CPU time for the two
versions of the first program, with the Nag compiler, in activating
specific compiling options asking the compiler for assuming contiguous
memory in case of assume shape arrays. Without these options, he got
about the same results (+10%) I got with ifort or g95 (for gfortran,
the difference was larger).

robin

unread,

Jul 4, 2009, 12:15:43 PM7/4/09

to

"fj" <franco...@irsn.fr> wrote in message

news:612ffe3f-4687-4eaf...@t13g2000yqt.googlegroups.com...

On 3 juil, 03:00, "robin" <robi...@bigpond.com> wrote:
>> "fj" <francois.j...@irsn.fr> wrote in message
>
>> news:bc1eaf5b-141e-479c...@q11g2000yqi.googlegroups.com...
>

>> > I know that very well but when a derided type variable contains

>> > allocatable or pointer components, it is often more efficient (as
>> > underlined by R. Maine) to take care about deallocation and
>> > reallocation which are "automatic" in case of INTENT(OUT) : an
>> > allocatable array already allocated is immediately deallocated when
>> > entering the routine whereas a pointer may be nullified. So, when I
>> > changed few variables from INTENT(OUT) to INTENT(INOUT), this was
>> > because allocations may be "IN" whereas values are always "OUT".
>
>> > And as I was aware about the possible performance decrease due to too
>> > many allocate/deallocate statements, most "out" variables where
>> > already declared "inout" and I programed a specific routine
>> > (AllocDeriv) to check whether allocatable arrays are already allocated
>> > at a sufficient size. So, after the advice of Richard, I just checked
>> > again the existence of remaining INTENT(out) variables (I wrote the
>> > module "var_dep" two year ago) and found few (exactly 5). To avoid
>> > performance troubles, I changed them from OUT to INOUT (in verifying
>> > rapidly that this change was possible without danger) but without real
>> > hope because this work was already done previously.
>
>> And did changing those 5 INTENTs make any difference?

>No

Point made, then.

>> BTW, did you do a profile on the code to find where the time is being spent?

>Yes/No : I profiled the code in counting for allocate/deallocate and
>in measuring the CPU-TIME of different parts.

The purpose of profiling is to determine where significant CPU time is spent,
and then to use that information to investigate improvements.
Counting a few specific statements is unlikely to do that.

> But I did not use a
>specific profiling tool like gprof because the routines I want to
>profile are too fast (the overhead of a profile tool would be too
>high).

Nonsense.
A profiling tool gives the execution time spent in each procedure,
at the very least.
The Silverfrost FTN95 compiler has one built in ( /TIMING option).
The extra time taken is 0.2%
(not counting the extra 10 seconds to calibrate the timer).

>> Also, did you try another compiler (as I suggested)?

>I tried 3 different compilers (ifort, g95, gfortran). Did you look at
>the results ?

Because I replied to your post containing that information
(and included yiour post), you can be certain that I saw the results.

It's because of your tests that I suggested using another compiler.

>The cpu time measurement for the three compilers is
>provided in that thread !

I know. That's why I suggested using another compiler.

>> In my test on your code, there was no difference in time between your codes.

>No : you never tested that program as far as I know.

I posted that I had run your code, and that there was no difference
between the two times. Did you read my reply?

> I cannot believe
>that you could get the same results with static arrays and allocatable
>arrays in that particular test.

It's true. You can do the test too.

>You just indicated that you tested the first program I proposed in a
>previous post (comparison between explicit shape and assume shape
>arguments), but without providing neither the compiler name you used,
>nor the kind of CPU you have, the optimization options you have
>chosen, the results you got ...

I told you the results. I told you the compilation options
(namely, no optimisation and full optimisation).