f90: Forte Developer 7 Fortran 95 7.0 Patch 111714-09 2003/10/15
I compile with use of the "-fast" option.
A critical routine in my code has a heading similar to this:
| subroutine foo (n, x, w)
| integer, intent (in) :: n
| real (kind=dp), intent (in) :: x(0:)
| real (kind=dp), intent (out) :: w(0:)
The expected size of w is n, and at the top of my code I test
| if (size(w).ne.n) then
| stop '...'
| endif
Profiling tells me that this subroutine executes three times faster
if the declaration of w is replaced by
| real (kind=dp), intent (out) :: w(0:n-1)
Is this kind of behavior a known feature? I'm quite annoyed with
it, because the replacement declaration is much more sensitive to
errors. (A routine that calls foo could pass to w an array of any
size at least n and of any dimension, and all is legal.)
Bas Braams
One reason code for explicit-shape arrays can run faster than code for
assumed-shape arrays is that explicit-shape arrays are known to be
contiguous. While the optimizer can guess that it is worthwhile to
test for contiguity at the top and clone different versions of the
code for the contiguous and discontiguous cases, it can't always
guess correctly.
Sincerely,
Bob Corbett
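The "test for contiguity at the top and clone" idea can also be written by hand. The following is only a sketch, assuming a Fortran 2008 compiler (the IS_CONTIGUOUS intrinsic did not exist in the compilers discussed in this thread). The names foo_mod, foo_contig and clone_demo, and the loop body w(i) = 2*x(i), are hypothetical stand-ins for the real routine, and it assumes size(x) >= n.

```fortran
module foo_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Public entry point: keeps the safe assumed-shape interface.
  subroutine foo(n, x, w)
    integer, intent(in) :: n
    real(kind=dp), intent(in)  :: x(0:)
    real(kind=dp), intent(out) :: w(0:)
    integer :: i
    if (size(w) /= n) stop 'foo: size(w) /= n'
    if (is_contiguous(x) .and. is_contiguous(w)) then
      ! Fast path: hand the data to an explicit-shape clone.
      call foo_contig(n, x, w)
    else
      ! General path: the compiler must allow for arbitrary strides.
      do i = 0, n-1
        w(i) = 2.0_dp * x(i)      ! hypothetical body
      end do
    end if
  end subroutine foo

  ! Clone with explicit-shape dummies; not meant to be called directly.
  subroutine foo_contig(n, x, w)
    integer, intent(in) :: n
    real(kind=dp), intent(in)  :: x(0:n-1)
    real(kind=dp), intent(out) :: w(0:n-1)
    integer :: i
    do i = 0, n-1
      w(i) = 2.0_dp * x(i)        ! same hypothetical body
    end do
  end subroutine foo_contig
end module foo_mod

program clone_demo
  use foo_mod
  implicit none
  real(kind=dp) :: x(0:9), w(0:9)
  x = 1.0_dp
  w = 0.0_dp
  call foo(10, x, w)                  ! contiguous actuals: fast path
  call foo(5, x(0:9:2), w(0:9:2))     ! strided actuals: general path
  print *, w
end program clone_demo
```

Callers still get the size check of the assumed-shape interface, and only the wrapper ever sees the explicit-shape clone.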
can you translate out of standardese?
here is what a human sees:
w(0:) ! lower bound declared
w(0:n-1) ! upper and lower bound declared
A human wants the compiler to be able to check accesses to w so that
it is never indexed by more than 'n-1'. It is quite a reasonable idea.
Now "explicit shape" and "assumed shape"---what the hell do those
words mean here? Nobody said squat about 'shape'.
To an average human, the shape of both appears to be one dimensional,
and only the bounds declaration has changed. One has an explicit upper
bound, the other does not.
Quite. But with assumed-shape - the type of declaration containing a
colon - the actual argument to the dummy argument given above might be
x(37, 0:2*n-2:2), for instance. And the called routine has to be able
to handle this case properly. You either have the caller "compact" the
actual to a temporary and call the subroutine with that and "un-compact"
(for a dummy not declared INTENT(IN)) afterward, or the callee has to
inspect the dope vector associated with the argument to decide whether
it can use the code for contiguous arrays or the code for non-contiguous
arrays. Ain't that simple...
Jan
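To make the section example concrete, here is a small sketch. The array a, its extents, the fill values and the routine name show are all made up for illustration. A row section of a rank-2 array is passed to an assumed-shape dummy, and consecutive elements of w then sit 2*40 = 80 reals apart in a.

```fortran
program section_demo
  implicit none
  integer, parameter :: n = 4
  real :: a(40, 0:2*n-2)          ! hypothetical 40 x 7 array
  integer :: i, j

  ! Give every element a value that encodes its position.
  do j = 0, 2*n-2
    do i = 1, 40
      a(i, j) = real(100*j + i)
    end do
  end do

  ! Row 37, every second column: n elements, far from contiguous.
  call show(a(37, 0:2*n-2:2))
contains
  subroutine show(w)
    real, intent(in) :: w(0:)     ! assumed shape: extent and stride come
    integer :: k                  ! from the descriptor ("dope vector")
    if (size(w) /= n) error stop 'unexpected size'
    do k = 0, size(w) - 1
      print *, k, w(k)            ! w(k) is a(37, 2*k)
    end do
  end subroutine show
end program section_demo
```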
Which one of the above does that correspond to? I see colons
in both w(0:) and w(0:n-1).
> And the called routine has to be able
> to handle this case properly. You either have the caller "compact" the
> actual to a temporary and call the subroutine with that and "un-compact"
> (for a dummy not declared INTENT(IN)) afterward, or the callee has to
> inspect the dope vector associated with the argument to decide whether
> it can use the code for contiguous arrays or the code for non-contiguous
> arrays. Ain't that simple...
OK, but the user knows nothing about this, and only that one has
an upper bound and the other does not. There was no change to
contiguity in either case that the user ever intended.
Why not have a "contiguous" specifier like
subroutine foo(n,w)
integer, intent(in) :: n
real, intent(in), contiguous :: w(0:n-1)
Hence, the *callers* would have to manufacture contiguous
arrays if necessary.
And remember, the user only wanted an upper bound!
I still don't understand why adding an explicit upper bound should
suddenly induce contiguity issues.
> Jan
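For what it is worth, Fortran 2008 later added a CONTIGUOUS attribute, but it applies to assumed-shape dummies and array pointers rather than to explicit-shape ones as in the proposal above. A minimal sketch, assuming a Fortran 2008 compiler (the program and routine names are made up):

```fortran
program contiguous_demo
  implicit none
  integer, parameter :: n = 16
  real :: x(0:n-1)

  x = 1.0
  ! The actual argument is strided. Because the dummy is CONTIGUOUS,
  ! the compiler has to arrange contiguous storage for the call,
  ! typically by copy-in/copy-out on the caller's side.
  call fill(x(0:n-1:3))
  write (*, '(20f5.1)') x
contains
  subroutine fill(w)
    real, intent(in out), contiguous :: w(0:)   ! still assumed shape
    if (.not. is_contiguous(w)) error stop 'w is not contiguous'
    w = 2.0                                     ! size(w) is still checked
  end subroutine fill                           ! against the actual
end program contiguous_demo
```

The dummy keeps the shape information of an assumed-shape array while letting the body be compiled for unit stride.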
Huh? What about:
program blah
integer, parameter :: n = 16
real, dimension(0:n-1) :: x
x = real(1)
call foo( n/3, x(0:n-1:3 ) )
write( *, '( 20f5.1 )' ) x
contains
subroutine foo( n, x )
integer, intent(in) :: n
real, dimension(0:n-1), intent(in out) :: x
x = real(2)
end subroutine foo
end program blah
the explicit shape dummy arg array x in subroutine foo isn't contiguous (assuming I know
what you mean by contiguous). All 3 of pgf90 5.2, ifort v8 and lf95 v6.2 print
2.0 1.0 1.0 2.0 1.0 1.0 2.0 1.0 1.0 2.0 1.0 1.0 2.0 1.0 1.0 1.0
when the code is run.
cheers,
paulv
p.s. I did the intent(in out) on the dummy arg x so that it wouldn't lose its values of
1.0 on entry to the subroutine. I get the same result if the intent is simply intent(out)
but that doesn't seem right. I think the absoft compiler would complain about that.
> p.s. I did the intent(in out) on the dummy arg x so that it wouldn't lose its values of
> 1.0 on entry to the subroutine. I get the same result if the intent is simply intent(out)
> but that doesn't seem right. I think the absoft compiler would complain about that.
Try printing the values returned by LOC for the sequence of elements.
Sincerely,
Bob Corbett
Ahh! I modified the code to:
program blah
integer, parameter :: n = 16
real, dimension(0:n-1) :: x
integer :: i
x = real(1)
print *, 'Main x locs:'
do i = 0,n-1
print *, loc(x(i))
end do
call foo( n/3, x(0:n-1:3 ) )
write( *, '( 20f5.1 )' ) x
contains
subroutine foo( n, x )
integer, intent(in) :: n
real, dimension(0:n-1), intent(in out) :: x
integer :: i
x = real(2)
print *, 'Sub x locs:'
do i = 0,size(x)-1
print *, loc(x(i))
end do
end subroutine foo
end program blah
and got:
Main x locs:
134522528
134522532
134522536
134522540
134522544
134522548
134522552
134522556
134522560
134522564
134522568
134522572
134522576
134522580
134522584
134522588
Sub x locs:
134522592
134522596
134522600
134522604
134522608
2.0 1.0 1.0 2.0 1.0 1.0 2.0 1.0 1.0 2.0 1.0 1.0 2.0 1.0 1.0 1.0
When I changed the foo sub dummy arg x decl from
real, dimension(0:n-1), intent(in out) :: x
to
real, dimension(0:), intent(in out) :: x
I then got:
Main x locs:
134522388
134522392
134522396
134522400
134522404
134522408
134522412
134522416
134522420
134522424
134522428
134522432
134522436
134522440
134522444
134522448
Sub x locs:
134522388
134522400
134522412
134522424
134522436
134522448
2.0 1.0 1.0 2.0 1.0 1.0 2.0 1.0 1.0 2.0 1.0 1.0 2.0 1.0 1.0 2.0
So use of an explicit shape array guarantees that a copy of the argument will be made (at
least on the compilers I tested, pgf90 5.2, ifort 8.0, and lf95 v6.2 on linux). Is that a
Fortran Standard detail or an implementation one?
I probably should have known about this already, but this little exercise means I won't
forget it now. Cool. Thanks.
cheers,
paulv
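LOC is a vendor extension, so the experiment above may not compile everywhere. A variant using the standard ISO_C_BINDING module might look like the sketch below. It assumes a Fortran 2003 or later compiler, adds the TARGET attribute that C_LOC needs, and uses a made-up program name. Whether the addresses in foo match those in the main program still depends on the compiler.

```fortran
program loc_portable
  use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t
  implicit none
  integer, parameter :: n = 16
  real, target :: x(0:n-1)
  integer :: i

  x = 1.0
  print *, 'Main x addresses (every third element):'
  do i = 0, n-1, 3
    ! Reinterpret the C address as an integer so it can be printed.
    print *, transfer(c_loc(x(i)), 0_c_intptr_t)
  end do
  call foo(n/3, x(0:n-1:3))
  write (*, '(20f5.1)') x
contains
  subroutine foo(n, x)
    integer, intent(in) :: n
    real, dimension(0:n-1), target, intent(in out) :: x
    integer :: i
    x = 2.0
    print *, 'Sub x addresses:'
    do i = 0, n-1
      print *, transfer(c_loc(x(i)), 0_c_intptr_t)
    end do
  end subroutine foo
end program loc_portable
```

If the addresses printed in foo are 4 bytes apart and differ from those printed in the main program, a contiguous temporary was passed.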
It's an implementation detail. Nothing in the standard
requires that a copy of an argument be made. And that
includes passing vector valued sections. There is one
(I think) obscure case that effectively disallows passing
a copy when enough pointers are used. But the general rule
applies: if your program can detect whether or not copies
are being passed, you need to fix your code ;).
Dick Hendrickson
O.k., that's what I suspected. The original statement by Bob Corbett (that got me involved
in this discussion) was:
"One reason code for explicit-shape arrays can run faster than code for
assumed-shape arrays is that explicit-shape arrays are known to be
contiguous."
I interpreted this as a general statement about Fortran, but I see now he was really just
referring to the Sun f90 compiler and how it implemented explicit-shape array dummy args.
> There is one
> (I think) obscure case that effectively disallows passing
> a copy when enough pointers are used. But the general rule
> applies: if your program can detect whether or not copies
> are being passed, you need to fix your code ;).
Well, even in the original case it wasn't the code that detected that a copy was being
made - it was the OP! He noticed a large difference in execution speed between using
assumed- or explicit-shape dummy arguments. Doesn't that imply the compiler should be
"fixed" (to better handle the assumed-shape dummy argument case) ?
It's all very well to say that you shouldn't need to worry about what a compiler does
under the hood -- I say it to people all the time -- but when you notice a large
performance hit like the OP did, you'd be nuts *not* to modify the code to eliminate the
cause of the performance degradation. That gets you immediate speed. Complaining to the
vendor to get their product to optimise code better seems like a windmill jousting type of
proposition (in my experience at least ... although Sun has always been the more open and
accessible of the Big Three, Sun/SGI/IBM).
That's another reason I like linux over a vendor like Sun/IBM/SGI. There's a wide variety
of compilers to immediately choose from if any particular one sucks in one way or another.
And the licensing is oh so much cheaper! [*]
cheers,
paulv
[*] Note for readers (Gerry..you still out there?) who think when I say "cheaper" I mean
"free". I don't. :o)
Paul Van Delst wrote:
> Dick Hendrickson wrote:
>
>>
>>
>> Paul Van Delst wrote:
>>
> [snip]
>
>>>
>>> So use of an explicit shape array guarantees that a copy of the
>>> argument will be made (at least on the compilers I tested, pgf90 5.2,
>>> ifort 8.0, and lf95 v6.2 on linux). Is that a Fortran Standard detail
>>> or an implementation one?
>>
>>
>>
>> It's an implementation detail. Nothing in the standard
>> requires that a copy of an argument be made. And that
>> includes passing vector valued sections.
>
>
> O.k., that's what I suspected. The original statement by Bob Corbett
> (that got me involved in this discussion) was:
>
> "One reason code for explicit-shape arrays can run faster than code for
> assumed-shape arrays is that explicit-shape arrays are known to be
> contiguous."
>
> I interpreted this as a general statement about fortran, but I see now
> he was really just referring to the Sun f90 compiler and how it
> implemented explicit-shape arrays dummy args.
>
Right, the old FORTRAN 77 way of passing an array was to
just pass the base address, not a dope vector, because
there was no F77 way to be discontiguous. I was being
overly academic (or something ;) ) when I said nothing
"requires" it. I sort of meant that F77 compilers
could have passed dope vectors as well as base addresses,
but none of them did.
>
>> There is one
>> (I think) obscure case that effectively disallows passing
>> a copy when enough pointers are used. But the general rule
>> applies: if your program can detect whether or not copies
>> are being passed, you need to fix your code ;).
>
>
> Well, even in the original case it wasn't the code that detected that a
> copy was being made - it was the OP! He noticed a large difference in
> execution speed between using assumed- or explicit-shape dummy
> arguments. Doesn't that imply the compiler should be "fixed" (to better
> handle the assumed-shape dummy argument case) ?
Yes, you're right again. When I said "program", I meant
the code, not the guy waiting for it to finish. I was
making more of a standards answer, not a totally practical
answer.
Dick Hendrickson
(snip)
> It's an implementation detail. Nothing in the standard
> requires that a copy of an argument be made. And that
> includes passing vector valued sections. There is one
> (I think) obscure case that effectively disallows passing
> a copy when enough pointers are used. But the general rule
> applies: if your program can detect whether or not copies
> are being passed, you need to fix your code ;).
I believe that in some cases the dummy argument can be a
one dimensional array, when the actual argument has more than
one dimension. In that case, there is an assumption that
the array is in contiguous storage.
It might be possible to arrange it such that array elements
were properly mapped from a non-contiguous subarray, but a
copy is simpler.
Depending on how much the array is used, and the access pattern,
it may end up faster to make the copy, and copy back before
returning.
-- glen
glen herrmannsfeldt wrote:
> Dick Hendrickson wrote:
>
> (snip)
>
>> It's an implementation detail. Nothing in the standard
>> requires that a copy of an argument be made. And that
>> includes passing vector valued sections. There is one
>> (I think) obscure case that effectively disallows passing
>> a copy when enough pointers are used. But the general rule
>> applies: if your program can detect whether or not copies
>> are being passed, you need to fix your code ;).
>
>
> I believe that in some cases the dummy argument can be a
> one dimensional array, when the actual argument has more than
> one dimension. In that case, there is an assumption that
> the array is in contiguous storage.
Actually, neither rank needs to be one. It's possible to
pass any N dimensional array to any M dimensional one.
(for fun, you can even pass character arrays with LEN = J
to ones with LEN=K). The rule about what must happen is
clearly spelled out in the standard, and the compiler must
do the right thing. But, from a purely academic view, it
doesn't have to have things in contiguous memory. But,
that's the only practical way to get argument passing and
EQUIVALENCE to work efficiently. Compilers have always
been free to pass dope vectors describing the storage
layouts.
>
> It might be possible to arrange it such that array elements
> were properly mapped from a non-contiguous subarray, but a
> copy is simpler.
Absolutely!
>
> Depending on how much the array is used, and the access pattern,
> it may end up faster to make the copy, and copy back before
> returning.
Absolutely, that's why compiler optimizers get paid millions
of dollars a year!
Dick Hendrickson
>
> -- glen
>
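The rank-mismatch rule mentioned above (sequence association) can be seen in a few lines. This is only an illustrative sketch with made-up names: a rank-2 actual argument is passed to a rank-1 explicit-shape dummy, and the dummy sees the elements in array element (column-major) order.

```fortran
program seq_assoc_demo
  implicit none
  real :: a(3, 4)
  integer :: i, j

  do j = 1, 4
    do i = 1, 3
      a(i, j) = real(10*i + j)    ! a(i,j) encodes its own indices
    end do
  end do

  call flat(size(a), a)           ! rank 2 actual, rank 1 dummy: legal
contains
  subroutine flat(n, v)
    integer, intent(in) :: n
    real, intent(in) :: v(n)      ! explicit shape, so rank need not match
    ! Column-major order: v(4) is a(1,2), v(12) is a(3,4).
    if (v(4) /= a(1, 2)) error stop 'unexpected element order'
    if (v(n) /= a(3, 4)) error stop 'unexpected last element'
    print '(12f6.1)', v
  end subroutine flat
end program seq_assoc_demo
```

Changing the dummy to assumed shape, v(:), would make the call a compile-time error, because assumed-shape dummies require matching rank.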
> So use of an explicit shape array guarantees that a copy of the argument will be made (at
> least on the compilers I tested, pgf90 5.2, ifort 8.0, and lf95 v6.2 on linux). Is that a
> Fortran Standard detail or an implementation one?
The Fortran 95 standard does not require that behavior, but the
standard and the existing code base strongly favor it. I believe
every existing Fortran 90/95 compiler works that way.
An implementation could use array descriptors to pass actual arguments
in all cases where the corresponding dummy argument is not known to be
a scalar. Such an implementation would have a performance advantage if
discontiguous arrays were routinely passed to dummy arguments that are
explicit-shape or assumed-size (not assumed-shape) arrays. Since such
codes are rare, the cost of using descriptors for explicit-shape and
assumed-size is rarely justified. Since the cost of copying arrays is
high, users avoid writing codes that pass discontiguous arrays to
explicit-shape or assumed-size arrays. The circle is complete.
> I probably should have known about this already, but this little exercise means I won't
> forget it now. Cool. Thanks.
You're welcome.
Sincerely,
Bob Corbett
> > So use of an explicit shape array guarantees that a copy of the argument
> > will be made (at least on the compilers I tested, pgf90 5.2, ifort 8.0,
> > and lf95 v6.2 on linux). Is that a Fortran Standard detail or an
> > implementation one?
>
> It's an implementation detail. Nothing in the standard
> requires that a copy of an argument be made. And that
> includes passing vector valued sections. There is one
> (I think) obscure case that effectively disallows passing
> a copy when enough pointers are used.
Which case do you have in mind? The only cases I know where
a copy cannot be used are when the dummy argument is scalar
or is an assumed-shape or deferred-shape array, and some
other conditions apply. I can't think of a case where a
copy of an array actual argument cannot be passed to an
explicit-shape dummy argument.
Sincerely,
Bob Corbett
> glen herrmannsfeldt wrote:
(snip)
>> I believe that in some cases the dummy argument can be a
>> one dimensional array, when the actual argument has more than
>> one dimension. In that case, there is an assumption that
>> the array is in contiguous storage.
> Actually, neither rank needs to be one. It's possible to
> pass any N dimensional array to any M dimensional one.
> (for fun, you can even pass character arrays with LEN = J
> to ones with LEN=K). The rule about what must happen is
> clearly spelled out in the standard, and the compiler must
> do the right thing. But, from a purely academic view, it
> doesn't have to have things in contiguous memory. But,
> that's the only practical way to get argument passing and
> EQUIVALENCE to work efficiently. Compilers have always
> been free to pass dope vectors describing the storage
> layouts.
From a discussion here, I found out about one case, from many
years ago, that I never knew about. VAX/VMS passes CHARACTER
constants by descriptor, but at LINK time they are
converted to addresses if the dummy argument is not
a CHARACTER variable. (Fortran 66 compatibility.)
That is the only time I have known argument types to
be checked or corrected at link time.
(snip)
>> Depending on how much the array is used, and the access pattern,
>> it may end up faster to make the copy, and copy back before
>> returning.
> Absolutely, that's why compiler optimizers get paid millions
> of dollars a year!
I haven't known optimizers to detect how often code will
be executed, other than to assume that inner loops will be
executed more often than outer loops. If the copy is
done by the caller, it is especially unlikely to know
how often the array will be referenced in the subroutine.
The important point being that because of the way the
cache works, it could easily be faster to copy the array
to contiguous storage.
-- glen
> > There is one
> > (I think) obscure case that effectively disallows passing
> > a copy when enough pointers are used. But the general rule
> > applies: if your program can detect whether or not copies
> > are being passed, you need to fix your code ;).
>
> Well, even in the original case it wasn't the code that detected that a copy was being
> made - it was the OP! He noticed a large difference in execution speed between using
> assumed- or explicit-shape dummy arguments. Doesn't that imply the compiler should be
> "fixed" (to better handle the assumed-shape dummy argument case) ?
In the original case, where the dummy argument is an assumed-shape array,
no copy is made. My guess is that the actual argument passed to the routine
is contiguous and so no copy is made in the second case where the dummy
argument is an explicit-shape array. The reason the code runs slower
when the dummy argument is an assumed-shape array than when the dummy
argument is an explicit-shape array is that the code that accesses the
assumed-shape array must assume that the array might be discontiguous
while the code that accesses the explicit-shape array can assume that the
array is contiguous.
As I said earlier in this thread, the routine with the assumed-shape
dummy argument could test to see if the array is contiguous at the top of
the routine and branch to a clone of the body of the routine that takes
advantage of contiguity in that case. Given the cost in code size, an
optimizer is likely to be reluctant to do that optimization.
> It's all very well to say that you shouldn't need to worry about what a compiler does
> under the hood -- I say it to people all the time -- but when you notice a large
> performance hit like the OP did, you'd be nuts *not* to modify the code to eliminate the
> cause of the performance degradation.
Fortran 90/95 made the situation much worse than it was in FORTRAN 77.
For FORTRAN 77, pretty much all of the major vendors implemented the
same set of optimizations. Fortran 90/95's array expressions create far
more problems for optimizers. The number of possible optimizations is
staggering, but there are no known algorithms that implement more than a
tiny portion of the possible optimizations. As a result, the amount of
code in Fortran 90/95 compilers devoted to optimizing array expressions
is huge, but the chance that a compiler implements a given optimization
that someone might think it should is small. Furthermore, each compiler
implements a different set of optimizations.
Sincerely,
Bob Corbett