I have a module function like so
ELEMENTAL FUNCTION Cloud_Add( cld1, cld2 ) RESULT( cldsum )
TYPE(Cloud_type), INTENT(IN) :: cld1, cld2
TYPE(Cloud_type) :: cldsum
....
END FUNCTION Cloud_Add
that is overloaded with OPERATOR(+),
INTERFACE OPERATOR(+)
MODULE PROCEDURE Cloud_Add
END INTERFACE OPERATOR(+)
The "cloud" structure itself is part of the "atmosphere" structure,
TYPE :: Atmosphere_type
....
TYPE(Cloud_type), ALLOCATABLE :: Cloud(:)
....
END TYPE Atmosphere_type
for which I also have an "add" function, which includes a call to the Cloud_Add() function
via the + operator:
ELEMENTAL FUNCTION Atmosphere_Add( atm1, atm2 ) RESULT( atmsum )
TYPE(Atmosphere_type), INTENT(IN) :: atm1, atm2
TYPE(Atmosphere_type) :: atmsum
....
atmsum = atm1
....
IF ( atm1%n_Clouds > 0 ) THEN
nc = atm1%n_Clouds
atmsum%Cloud(1:nc) = atmsum%Cloud(1:nc) + atm2%Cloud(1:nc)
END IF
....
END FUNCTION Atmosphere_Add
When invoked, the line
atmsum%Cloud(1:nc) = atmsum%Cloud(1:nc) + atm2%Cloud(1:nc)
causes a segfault. When I change it to the more usual
atmsum%Cloud = atmsum%Cloud + atm2%Cloud
everything is fine. No segfault and results are correct.
Obviously my use of array slices when the function result is also one of the arguments is
causing a problem. Regular clf readers will know that treating an array slice like an
array will one day BYITA - today appears to be my turn.
First off, is the array-slice version legal? If it is, can anyone put forward a possible
explanation as to what the mechanism of that error might be. E.g. something is being
copied and the order one of the operations is important and/or...? I just want to provide
some sort of cogent explanation of the problem to others so they don't do what I did.
Any insights appreciated.
cheers,
paulv
> IF ( atm1%n_Clouds > 0 ) THEN
> nc = atm1%n_Clouds
> atmsum%Cloud(1:nc) = atmsum%Cloud(1:nc) + atm2%Cloud(1:nc)
> END IF
(snip)
> When invoked, the line
> atmsum%Cloud(1:nc) = atmsum%Cloud(1:nc) + atm2%Cloud(1:nc)
> causes a segfault. When I change it to the more usual
> atmsum%Cloud = atmsum%Cloud + atm2%Cloud
> everything is fine. No segfault and results are correct.
What is the value of nc at that point? If it is nowhere near
the value it should have, then segfault is likely. If it is,
then I would wonder about compiler bugs.
I notice that nc comes from atm1, but atm2 is used in the sum.
I would print nc as a debugging aid until the problem was fixed.
-- glen
Well, I can't print it out since the function is elemental, but your point is well taken -
the array slice may be a red herring.
I will check the values in a debugger.
BTW, the problem only appears to occur on an IBM system using xlf95. On a linux system
(RHE 5.0, gfortran 4.4) everything looks to be o.k.
Thanks,
paulv
> BTW, the problem only appears to occur on an IBM system using xlf95. On a linux system
> (RHE 5.0, gfortran 4.4) everything looks to be o.k.
Which xlf version? Some older versions had problems with allocatable
derived type components, but recent versions of xlf2003 (e.g. v12.1)
have improved a lot.
>>> everything is fine. No segfault and results are correct.
>> What is the value of nc at that point? If it is nowhere near
>> the value it should have, then segfault is likely. If it is,
>> then I would wonder about compiler bugs.
>> I notice that nc comes from atm1, but atm2 is used in the sum.
>> I would print nc as a debugging aid until the problem was fixed.
> Well, I can't print it out since the function is elemental,
> but your point is well taken - the array slice may be a red herring.
I had noticed that Cloud_Add was elemental, but I didn't notice
the other one. I hope they remove this restriction. It does
make some sense, but it makes debugging hard sometimes.
> I will check the values in a debugger.
> BTW, the problem only appears to occur on an IBM system using xlf95.
> On a linux system
> (RHE 5.0, gfortran 4.4) everything looks to be o.k.
Well, if it was caused by an uninitialized value, it could be
very different on a different system.
It wouldn't seem a bad idea to check atm2%n_clouds also.
if(atm1%n_clouds.ne.atm2%n_clouds) STOP 123
It would have been nice to allow a variable on the STOP statement.
This is back to the question asked some weeks ago about how much
checking one should do on function arguments.
Also, note that the array add not using slice notation depends
on the two arrays having the same size.
(I presume STOP is allowed in elemental functions, but maybe not.)
-- glen
> When invoked, the line
>
> atmsum%Cloud(1:nc) = atmsum%Cloud(1:nc) + atm2%Cloud(1:nc)
>
> causes a segfault. When I change it to the more usual
>
> atmsum%Cloud = atmsum%Cloud + atm2%Cloud
>
> everything is fine. No segfault and results are correct.
>
> Obviously my use of array slices when the function result is also one of
> the arguments is causing a problem. Regular clf readers will know that
> treating an array slice like an array will one day BYITA - today appears
> to be my turn.
>
> First off, is the array-slice version legal? If it is, can anyone put
> forward a possible explanation as to what the mechanism of that error
> might be. E.g. something is being copied and the order one of the
> operations is important and/or...? I just want to provide some sort of
> cogent explanation of the problem to others so they don't do what I did.
I don't see anything obviously wrong with either version. Being a style
that I don't normally personally use doesn't count as wrong. The nature
of the problem is not obvious to me.
But I do want to correct one things that is an error - not in the code,
but in your description of it. The function result is *NOT* one of the
arguments to the function - not even close. You are making what is a
very common mistake. Let me simplify to a trivial function invocation,
without any array issues or operator overloads. Consider
y = f(x)
The y here is *NOT* the function result. That's not the way assignment
works. Conceptually, the y=f(x) is done in two steps. (One can break it
down further, but these two are the relevant ones here).
1. Evaluate f(x). In doing that, you don't look at any other context. It
isn't relevant that it is in an assignment statement at all, much less
what is on the left-hand side of the assignment. Just evaluate f(x) as
it stands. Store the result in some unnamed temporary place if needed.
2. Then do the assignment. Take the value computed in step 1 and assign
it to y.
Compilers are quite likely to optimize away the temporary in some cases,
combining the two steps into one. But you still need to start with this
2-step model as the basic concept and consider anything else to be just
an optimization.
When you are making extensive use of things like defined operations, it
is particularly important to keep the two-step model in mind. The
assignment part might well not be trivial. In particular, it could be a
defined assignment in some cases. If you have a defined assignment, the
compiler is less likely to be able to optimize things to get rid of it
as a separate step.
You could possibly be running into bugs in the compiler's attempt to
optimize things here. The fact that the variable on the left-hand side
of the assignment is also used on the right-hand side might preclude
such optimization unless the compiler is particularly smart about how to
manage such things. That probably means you have a temporary variable
with an allocatable component that needs to get appropriately allocated
and deallocated behind your back. That's all supposed to work, but it is
at least fertile ground for comipler bugs. (Of course, it is also
fertile ground for user errors to interact badly with non-obvious but
correct compiler behavior).
No, I don't know why it would be different when you use array slices.
--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain
> (I presume STOP is allowed in elemental functions, but maybe not.)
Not.
F2003:
C1276 A pure subprogram shall not contain a stop-stmt.
What are you supposed to use STOP for? In subroutines and functions
it is usually a last resort when everything is wrong and you just
want to get out of there. (I believe some systems have kernel panic
when there is nothing else to do.)
It seems to me that whoever wrote the specs for ELEMENTAL never
worked on real code.
Oh well.
-- glen
$ xlf2003 -qversion
IBM XL Fortran for AIX, V12.1
Version: 12.01.0000.0001
:o(
I redid all the tests using the xlf2003 invocation rather than xlf95 (grasping at straws
here), but got the same problem.
cheers,
paulv
Thanks, Richard. You're right - my thinking about it was wrong. It's one of those
incorrect beliefs one acquires early on and, well, just seems to stick. :o(
> When you are making extensive use of things like defined operations, it
> is particularly important to keep the two-step model in mind. The
> assignment part might well not be trivial. In particular, it could be a
> defined assignment in some cases. If you have a defined assignment, the
> compiler is less likely to be able to optimize things to get rid of it
> as a separate step.
No, I've moved away from defined assignments every since I discovered (I believe through
one of your replies to a post of mine on that particular subject) that the "intrinsic"
assignment handles all the allocations.
> You could possibly be running into bugs in the compiler's attempt to
> optimize things here. The fact that the variable on the left-hand side
> of the assignment is also used on the right-hand side might preclude
> such optimization unless the compiler is particularly smart about how to
> manage such things. That probably means you have a temporary variable
> with an allocatable component that needs to get appropriately allocated
> and deallocated behind your back. That's all supposed to work, but it is
> at least fertile ground for comipler bugs. (Of course, it is also
> fertile ground for user errors to interact badly with non-obvious but
> correct compiler behavior).
Even though I'm a firm believer in the 99/100 rule-of-thumb[*] regarding compiler errors
(mostly through personal experience), I think this might be one. I just cannot get the
code to break on my linux system. Bummer, :o(
Thanks & cheers,
paulv
[*] Whenever a programmer "discovers" a compiler bug, 99 times out of 100 it's their own
code that's buggy.
You can if you use a technique I call "lying." Declare the function
to be elemental
in the code that calls it, but do not declare it elemental in the code
that defines
it. You might need to break the function out into a file of its own
to compile it.
Of course, you should do this only for debugging, never for production
code.
Bob Corbett
Just to follow up:
Down below is a (relatively) simple working example that indicates the issue.
If I compile and run through the debugger I get:
$ xlf2003 -qversion
IBM XL Fortran for AIX, V12.1
Version: 12.01.0000.0001
$ xlf2003 -qdbg test_elementalslice.f90
** b_define === End of Compilation 1 ===
** a_define === End of Compilation 2 ===
** test_elementalslice === End of Compilation 3 ===
1501-510 Compilation successful for file test_elementalslice.f90.
$ dbx a.out
Type 'help' for help.
reading symbolic information ...
(dbx) run
n_a = 1
a%n = 4
a%x = 3.141590118 3.141590118 3.141590118 3.141590118
n_b = 1
b%n = 4
b%x = 0.6931470037 0.6931470037 0.6931470037 0.6931470037
n_b = 2
b%n = 4
b%x = 0.6931470037 0.6931470037 0.6931470037 0.6931470037
n_b = 3
b%n = 4
b%x = 0.6931470037 0.6931470037 0.6931470037 0.6931470037
n_a = 2
a%n = 4
a%x = 3.141590118 3.141590118 3.141590118 3.141590118
n_b = 1
b%n = 4
b%x = 0.6931470037 0.6931470037 0.6931470037 0.6931470037
n_b = 2
b%n = 4
b%x = 0.6931470037 0.6931470037 0.6931470037 0.6931470037
n_b = 3
b%n = 4
b%x = 0.6931470037 0.6931470037 0.6931470037 0.6931470037
Segmentation fault in . at 0xf458
0x000000000000f458 f8e30008 std r7,0x8(r3)
(dbx) up
__a_define_NMOD_a_add(??, ??, ??), line 126 in "test_elementalslice.f90"
(dbx) quit
The line in question is
asum%b(1:n_b) = asum%b(1:n_b) + a2%b(1:n_b)
The next commented out line
! asum%b = asum%b + a2%b
gets around the problem, but this doesn't allow for the structure arrays to be allocated
to a larger dimension, and then used for a smaller "n_b" -- for example, if I want to
allocate the "b" array component of "a" to some maximum size and then only operate on a
subset.
I've submitted the test case to our local IBM folks for their assessment.
cheers,
paulv
<-----begin----->
module b_define
implicit none
private
public :: operator(+)
public :: b_type
public :: b_destroy
public :: b_create
public :: b_inspect
interface operator(+)
module procedure b_add
end interface operator(+)
type :: b_type
integer :: n
real, allocatable :: x(:)
end type
contains
! Public
elemental subroutine b_destroy(b)
type(b_type), intent(out) :: b
end subroutine b_destroy
elemental subroutine b_create(b,n)
type(b_type), intent(out) :: b
integer, intent(in) :: n
allocate(b%x(n))
b%n = n
b%x = 0.0
end subroutine b_create
subroutine b_inspect(b)
type(b_type), intent(in) :: b
print *, ' b%n = ',b%n
print *, ' b%x = ',b%x
end subroutine b_inspect
! Private
elemental function b_add(b1,b2) result(bsum)
type(b_type), intent(in) :: b1, b2
type(b_type) :: bsum
integer :: n
if ( b1%n /= b2%n ) return
bsum = b1
n = b1%n
bsum%x(1:n) = bsum%x(1:n) + b2%x(1:n)
end function b_add
end module b_define
module a_define
use b_define
implicit none
private
public :: operator(+)
public :: a_type
public :: a_destroy
public :: a_create
public :: a_inspect
interface operator(+)
module procedure a_add
end interface operator(+)
type :: a_type
integer :: n, n_b
real, allocatable :: x(:)
type(b_type), allocatable :: b(:)
end type
contains
! Public
elemental subroutine a_destroy(a)
type(a_type), intent(out) :: a
end subroutine a_destroy
elemental subroutine a_create(a,n,n_b)
type(a_type), intent(out) :: a
integer, intent(in) :: n, n_b
allocate(a%x(n))
if ( n_b > 0 ) then
allocate(a%b(n_b))
call b_create(a%b,n)
end if
a%n = n
a%n_b = n_b
a%x = 0.0
end subroutine a_create
subroutine a_inspect(a)
type(a_type), intent(in) :: a
integer :: i
print *, 'a%n = ',a%n
print *, 'a%x = ',a%x
if ( a%n_b > 0 ) then
do i = 1, a%n_b
print *, ' n_b = ',i
call b_inspect(a%b(i))
end do
end if
end subroutine a_inspect
! Private
elemental function a_add(a1,a2) result(asum)
type(a_type), intent(in) :: a1, a2
type(a_type) :: asum
integer :: n, n_b
if ( a1%n /= a2%n ) return
asum = a1
n = a1%n
asum%x(1:n) = asum%x(1:n) + a2%x(1:n)
if ( a1%n_b > 0 ) then
n_b = a1%n_b
asum%b(1:n_b) = asum%b(1:n_b) + a2%b(1:n_b)
! asum%b = asum%b + a2%b
end if
end function a_add
end module a_define
program test_elementalslice
use a_define
integer, parameter :: n = 4
integer, parameter :: n_a = 2, n_b = 3
integer :: i, j
type(a_type) :: a(n_a), a2(n_a)
call a_create(a,n,n_b)
do i = 1, n_a
a(i)%x = 3.14159
do j = 1, a(i)%n_b
a(i)%b(j)%x = 0.693147
end do
print *, 'n_a = ',i
call a_inspect(a(i))
end do
a2 = a
a = a + a2
print *, '*** After summation ***'
do i = 1, n_a
print *, 'n_a = ',i
call a_inspect(a(i))
end do
call a_destroy(a)
call a_destroy(a2)
end program test_elementalslice
<-----end----->
I would also print out the register contents at this point.
It is not so easy to tell sometimes, but if r3 has a really
strange value then sometimes it is obvious. A few instructions
before and after, which I presume would be a loop, would also help.
I had thought we were discussing IA32, as that is what most
people use. It seems, though, to be PowerPC. That likely means
that the compiler hasn't been tested as much.
> The line in question is
> asum%b(1:n_b) = asum%b(1:n_b) + a2%b(1:n_b)
> The next commented out line
> ! asum%b = asum%b + a2%b
You could also compile just the one function with -S and
post the generated assembly code. Most likely, though, IBM
will find it.
(snip of the whole program listing)
-- glen
> I had thought we were discussing IA32, as that is what most
> people use.
For XLF?
> It seems, though, to be PowerPC. That likely means
> that the compiler hasn't been tested as much.
That seems an odd conclusion. XLF has a history going back... gee... I
don't recall how far, but well longer than some of the IA32 compilers
have existed. I don't think that the popularity of the platform is a
very good measure of compiler robustness. There have sure been plenty of
junk compilers on popular platforms. I think I'll avoid the distraction
of mentioning compiler names.
No, xlf. I didn't point it out in my original post, but I did in a followup.
Linux+gfortran works fine. AIX+xlf does not.
>> The line in question is
>> asum%b(1:n_b) = asum%b(1:n_b) + a2%b(1:n_b)
>
>> The next commented out line
>> ! asum%b = asum%b + a2%b
>
> You could also compile just the one function with -S and
> post the generated assembly code. Most likely, though, IBM
> will find it.
It is being submitted to the IBM compiler people. I was told it will likely be a low
priority since there is a simple workaround, i.e. do something like
do i = 1, n_b
asum%b(i) = asum%b(i) + a2%b(i)
end do
rather than
asum%b(1:n_b) = asum%b(1:n_b) + a2%b(1:n_b)
cheers,
paulv
No, not robustness but usage. While testing is good and important,
there is nothing better than use by many people compiling many
different programs for finding bugs. That would be especially
true if it required a combination of not so commonly used features
to expose the problem.
There are stories of compiler testing by feeding in cards from
the recycling bin. Probably better at detecting parser bugs
than code generator bugs, though.
-- glen
> Richard Maine <nos...@see.signature> wrote:
> (snip)
>
> > That seems an odd conclusion. XLF has a history going back... gee... I
> > don't recall how far, but well longer than some of the IA32 compilers
> > have existed. I don't think that the popularity of the platform is a
> > very good measure of compiler robustness.
>
> No, not robustness but usage. While testing is good and important,
> there is nothing better than use by many people compiling many
> different programs for finding bugs.
Then MS powerstation ought to have been great. :-) Well, I suppose there
is that step after finding bugs - the one about fixing them.
In any case, I think you'll find that XLF has a history of being pretty
robust. While there might be nothing better than use by many people for
finding bugs, that's not all there is to making a quality compiler. I
think that actual experience on the particular compiler is a lot better
than any such theorizing. By that measure, XLF has generaly fared pretty
well.... which doesn't mean, of course, that it is bug free.
Also fails on:
IBM XL Fortran for AIX, V12.1
Version: 12.01.0000.0006
Works with nagfor 5.2.668, although I get warnings during
compilation which appear harmless:
NAG Fortran Compiler Release 5.2(686)
Warning: test_elementalslice.f90, line 26: INTENT(OUT) dummy argument
B partly default initialised but otherwise unused
detected at B_DESTROY@<end-of-statement>
Warning: test_elementalslice.f90, line 83: INTENT(OUT) dummy argument
A partly default initialised but otherwise unused
detected at A_DESTROY@<end-of-statement>
[NAG Fortran Compiler normal termination, 2 warnings]
Fortunately my previous post did not mention fixing.
> In any case, I think you'll find that XLF has a history of being pretty
> robust. While there might be nothing better than use by many people for
> finding bugs, that's not all there is to making a quality compiler. I
> think that actual experience on the particular compiler is a lot better
> than any such theorizing. By that measure, XLF has generaly fared pretty
> well.... which doesn't mean, of course, that it is bug free.
Yes. IBM has a pretty good history of caring about their products.
When I said 'testing' in the previous post, I meant testing
by actual users in addition to what the maintainers do.
When you 'know' how a program is supposed to work, you might
be less likely to try using it a different way.
I used to be pretty good at finding bugs in program, by trying
the more unusual cases. Sometimes just to see that it works.
That hasn't happened quite as much lately, maybe programs are
getting better, or maybe I don't try as hard.
-- glen
Thanks for the info! I've passed it onto the sysadmins here.
> Works with nagfor 5.2.668, although I get warnings during
> compilation which appear harmless:
>
> NAG Fortran Compiler Release 5.2(686)
> Warning: test_elementalslice.f90, line 26: INTENT(OUT) dummy argument
> B partly default initialised but otherwise unused
> detected at B_DESTROY@<end-of-statement>
> Warning: test_elementalslice.f90, line 83: INTENT(OUT) dummy argument
> A partly default initialised but otherwise unused
> detected at A_DESTROY@<end-of-statement>
> [NAG Fortran Compiler normal termination, 2 warnings]
Yeah I get a similar warning from gfortran (unused dummy argument). It's annoying and, I
believe, incorrect to issue these warnings since the INTENT(OUT)-ness of the dummy means
it *is* being "used" -- it's being reinitialised.
I do wonder why your warning messages say "partly" default initialised? I would hope it
would be wholly default initialised.
cheers,
paulv
> Harald Anlauf wrote:
> > Warning: test_elementalslice.f90, line 26: INTENT(OUT) dummy argument
> > B partly default initialised but otherwise unused
> > detected at B_DESTROY@<end-of-statement>
> I do wonder why your warning messages say "partly" default initialised? I
> would hope it would be wholly default initialised.
I'll ignore the elided questions (as I also find it a slightly strange
message), but the "partly" mentioned above at least makes sense. I think
you are misreading the English of the message. It isn't that only part
of the default initialization is done. It is that only part of the type
has "default initialization". In particular, the component n is not
default initialized.
I find it slightly more amusing that the compiler seems to agree with a
viewpoint of mine that isn't reflected in the standard.
Per the standard, there is no default initialization in any of the
(elided) code unless my eyes are failing to see it.
I personally view allocatable components as being default initialized to
an unallocated status, based on the "waddles like a duck and quacks like
a duck" theory. That uninitialized status gets set in the same
situations where default initialization happens. I wish the standard
would just call it default initialization, since that's what it acts
like. More substantitively, I wish that the user was allowed to confirm
the default initialization by explicit declaration, which could help
clarity. As is, it is a bit like an implicit default initialization that
you can't make explicit.
Sounds to me like the NAG compiler must be viewing it the sam eway as I
do. Otherwise, I don't see what default initialization has to do with
the code shown.