
Confusing seg fault in pointer-assignment-stmt of the form "ptr => ptr(index)%component"


Connor B

Dec 9, 2022, 11:08:59 AM
I'm having trouble figuring out whether the seg fault I'm observing is a compiler bug or the result of a non-standard-conforming statement. Having done a decent amount of debugging and reading of the standard, I think this is a compiler bug, but I could well be misreading or misinterpreting something, so I'm hoping someone can confirm or clarify what I've found.

My apologies if this is too much info up front, confusing, or not the right way to go about this, but I figured I'd show the important points I came across so all the context is there. I've tried to break the info into sections with title separators to make it more readable, but I don't see formatting options in the posting window, so it might be easier to read if you copy and paste it into a text editor (especially the code fragments).

Lastly, we do have a fix for now (using auxiliary pointers for the pointer assignment), but it seems needlessly cumbersome and, either way, I'd like to know whether we are doing something wrong or whether this is an edge case that slipped through the cracks.


!############################# Compilers Involved ##############################


-- gfortran 6.3.1 (NO seg fault)
-- gfortran 8.5.0 (WITH seg fault)
-- gfortran 9.3.1 (WITH seg fault)



I'll start with a snippet that distills the offending lines as much as possible to highlight what I think the focus should be:

!################# Snippet (full reproducible example below): ##################

type myType
type(myType), pointer, dimension(:) :: pChild => null()
end type myType

type(myType), pointer, dimension(:) :: child

child => child(1)%pChild ! <---- Seg fault


!################################# Overview: ###################################

As can be seen from this snippet, there is a derived type that is effectively a linked list, however, our next element is stored within the first element of a pointer-to-array. This structure of the derived type comes from a large project and it cannot change.

The project has existed with this derived type and many statements of the form shown above without problems for quite some time, only recently causing issues with trying out the newer compiler versions (I realize this doesn't mean it's standard compliant/valid).

I've tracked down why the seg fault occurs by looking at the disassembly; I'm just not sure where the blame lies. First I'll show some code to reproduce it, then discuss what I've observed and read.

Also, I've put the seg fault message, followed by the disassembly snippets, at the bottom.


!######################### Full Reproducible Example: ##########################

 1  program main
 2     implicit none
 3
 4     type myType
 5        type(myType), pointer, dimension(:) :: pChild => null()
 6     end type myType
 7
 8     type(myType), pointer, dimension(:) :: root, child
 9
10     nullify(root, child)
11     call allocateMyType(root)
12
13     call allocateMyType(root(1)%pChild)
14     call allocateMyType(root(1)%pChild(1)%pChild)
15
16     child => root(1)%pChild ! <----- Does NOT cause a seg fault
17     do while (associated(child))
18        child => child(1)%pChild ! <----- Causes a seg fault
19     enddo
20
21     call deallocateMyType(root) ! Details omitted, seg fault happens above
22
23  contains
24     subroutine allocateMyType(pMyType)
25        implicit none
26        type(myType), pointer, dimension(:), intent(inout) :: pMyType
27
28        if (associated(pMyType)) then
29           stop -1
30        endif
31
32        allocate(pMyType(2))
33        if (.not. associated(pMyType)) then
34           stop -2
35        endif
36
37        if (associated(pMyType(1)%pChild) .or. associated(pMyType(2)%pChild)) then
38           stop -3
39        endif
40     end subroutine allocateMyType
41  end program main


!############################## Code Discussion: ###############################

The two lines highlighted above (with the trailing comments), Line 16 (L16) and Line 18 (L18), demonstrate what I believe the issue is. I've provided the disassembly at the bottom - there are four bits of disassembly: L16 in the old and new compilers and L18 in the old and new compilers.

In particular, L18 causes a seg fault due to a two-step (in the new compilers) process of copying the pointer data to the LHS.

In the older compiler (where pointers seem to be 48 bytes), the pointer metadata is copied over in one sweep of RHS->registers->LHS.

However, in the newer compilers (where pointers seem to be 64 bytes instead), we have the same sweep of copying RHS->registers->LHS, but then an additional set of instructions is executed to populate a field at offset +0x20 of the pointer data (the actual pointer object, not the target). I'm not really sure why it does this, as I believe this data was already copied during the initial RHS->registers->LHS copy. Either way, it does so by redoing the index and offset calculations that locate the final component of the RHS. If the parent variable of the RHS is the same as the variable on the LHS, that calculation now indexes off a different object (the LHS has already been overwritten), which can cause a seg fault. Hopefully the comments I put in the disassembly below make this clearer.
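
For reference, the auxiliary-pointer workaround mentioned at the top can be sketched roughly as follows. This is only a minimal illustration, not the project's code: the pointer name "next", the counter "depth", and the program name are made up for the example, and the allocations are inlined rather than done through allocateMyType. The idea is that the RHS is read into a separate pointer while "child" is still intact, so the LHS never appears on the RHS of the same pointer assignment.

```fortran
program aux_pointer_workaround
   implicit none

   type myType
      type(myType), pointer, dimension(:) :: pChild => null()
   end type myType

   ! "next" is the hypothetical auxiliary pointer
   type(myType), pointer, dimension(:) :: root, child, next
   integer :: depth

   nullify(root, child, next)

   ! Build the same three-level chain as the reproducer
   allocate(root(2))
   allocate(root(1)%pChild(2))
   allocate(root(1)%pChild(1)%pChild(2))

   depth = 0
   child => root(1)%pChild
   do while (associated(child))
      depth = depth + 1
      next  => child(1)%pChild   ! read the RHS while child is untouched
      child => next              ! then repoint child in a separate statement
   end do

   print *, 'levels visited below root:', depth

   ! Release the chain from the innermost level outwards
   deallocate(root(1)%pChild(1)%pChild)
   deallocate(root(1)%pChild)
   deallocate(root)
end program aux_pointer_workaround
```

In a standard-conforming execution the loop visits two levels and stops when it reaches the null pChild at the end of the chain.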


!########################### Reading the Standard: ############################

I tried looking at the standard to see if our pointer-assignment-stmt was an invalid form, but it seems to be perfectly valid as far as I can tell.

The standard I'm going off of is the Fortran 2008 committee draft located here: https://wg5-fortran.org/f2008.html

My apologies if this is hard to follow, but here I include the sections I found relevant, comment on my interpretation, then tie it all together towards the end.


To start,

* R733 pointer-assignment-stmt is data-pointer-object [ (bounds-spec-list) ] => data-target
* R737 data-target is variable
* C724 (R737) A variable shall have either the TARGET or POINTER attribute, and shall not be an array section with a vector subscript

R733 and R737 tell us the RHS of a pointer-assignment-stmt must be a variable, but
C724 says it is restricted by disallowing a variable that is also "an array section with a vector subscript".

* R602 variable is designator or expr
* R601 designator is object-name or array-element or ... or structure-component

R602 and R601 say that both an array-element and a structure-component are designators, which satisfy the definition of a variable.

* R613 structure-component is data-ref
* R617 array-element is data-ref
* R618 array-section is data-ref [ ( substring-range ) ]
* NOTE 6.4
* Examples of structure components are:
* SCALAR_PARENT%SCALAR_FIELD scalar component of scalar parent
* ARRAY_PARENT(J)%SCALAR_FIELD component of array element parent
* ARRAY_PARENT(1:N)%SCALAR_FIELD component of array section parent

R613 and NOTE 6.4 show the various forms a structure-component can take. In our seg faulting statement, we have a structure-component of the form "ARRAY_PARENT(J)%SCALAR_FIELD component of array element parent".

It's unclear to me how the various "component of XXX parent" would be classified (i.e., component, scalar, array element, etc.), but in any case, according to C724, the only possibly disallowed structure-component form in a pointer-assignment-stmt would be one of the form "ARRAY_PARENT(1:N)%SCALAR_FIELD".

For completeness of these definitions:

* R611 data-ref is part-ref [ % part-ref ]
* R612 part-ref is part-name [ ( section-subscript-list ) ] [ image-selector ]

and a helpful example to confirm my reading:

* NOTE 6.1
* For example, ... P % AGE, and A (1:1) are all variables


Altogether,

-- "child(1)" is a (R611) data-ref of the (R612) part-ref form.
-- "child(1)" is an (R617) array-element, which is distinct from an (R618) array-section
-- C724 only disallows an (R618) array-section, but seemingly does not disallow an (R617) array-element
-- "child(1)%pChild" is a (R613) structure-component of an (R617) array-element which makes it a (R601) designator, and thus a (R602) variable

That was probably an unnecessary construction of what makes a valid RHS for a pointer-assignment-stmt, but it was to show that it is at least not of a disallowed form. Furthermore, in finding those rules, I did not see any rule disallowing the RHS from having a part-ref that the LHS also uses.
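
To make the distinction concrete, here is a small sketch of the data-target forms discussed above. The type and variable names (node, items, nodes, vals, p) are invented for the illustration and have nothing to do with the project code. The first two pointer assignments use a structure component of an array element and an array section with a subscript triplet; the commented-out line is the vector-subscript form that C724 excludes.

```fortran
program data_target_forms
   implicit none

   type node
      integer, pointer :: items(:) => null()
   end type node

   type(node)       :: nodes(2)
   integer, target  :: vals(5)
   integer, pointer :: p(:)
   integer          :: i

   vals = [(10*i, i = 1, 5)]
   nodes(2)%items => vals

   ! Structure component of an array element (same shape as child(1)%pChild)
   p => nodes(2)%items
   if (size(p) /= 5) error stop 'unexpected size'

   ! Array section with a subscript triplet
   p => vals(2:4)
   if (p(1) /= 20) error stop 'unexpected value'

   ! Array section with a vector subscript: not permitted by C724
   ! p => vals([1, 3])

   print *, p
end program data_target_forms
```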


Additionally (I realize this is a different context as it relates to an assignment-stmt), but I know statements of the form:

* NOTE 7.37
* For example, in the character intrinsic assignment statement:
* STRING (2:5) = STRING (1:4)

are valid/safe, with the assignment not clobbering the RHS during evaluation, because every assignment done in this statement has the LHS take the corresponding value the RHS had before the statement began execution.
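
A quick way to see the NOTE 7.37 behavior is the sketch below (the program name and the initial value 'abcdef' are made up for the example). The overlapping assignment behaves as if the RHS were fully evaluated before any character of the LHS is stored.

```fortran
program overlap_assignment
   implicit none
   character(len=6) :: string

   string = 'abcdef'
   string(2:5) = string(1:4)   ! LHS and RHS overlap in positions 2 to 4

   ! Positions 2..5 receive the old 'abcd'; positions 1 and 6 are untouched
   if (string /= 'aabcdf') error stop 'overlapping assignment misbehaved'
   print '(a)', string
end program overlap_assignment
```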



!################################ Extra Info: #################################

Here are some more pieces that went into testing. Importantly, all permutations tested produced the same behavior (either executing fine on the older compiler or seg faulting on the newer ones):

-- Compilation:
---- gfortran -glevel -ggdblevel -O0 -Wall -Wextra -fbounds-check -pedantic-errors -c main.f90
------ level = 0, 1, 2, and 3
------ -Og was also used instead of -O0 at times (same error, although the instruction sets produced were different)

-- Software used:
---- Running the executable on its own, or with:
------ Valgrind 3.15.0
------ GNU gdb (GDB) Red Hat Enterprise Linux 8.3-3.el7
-------- Note: these were not the only valgrind/gdb versions used, just the main ones (but the other versions observed the same).



!############################## Seg Fault Message ##############################

In all cases that resulted in a seg fault, the message was always of the following form:

Use of uninitialised value of size 8
at 0x40179A: MAIN__ (main.f90:18)
by 0x401D59: main (main.f90:21)

Uninitialised value was created by a stack allocation
at 0x401824: allocatemytype.3897 (main.f90:24)

Use of uninitialised value of size 8
at 0x4017F3: MAIN__ (main.f90:18)
by 0x401D59: main (main.f90:21)
Uninitialised value was created by a stack allocation
at 0x401828: allocatemytype.3897 (main.f90:24)

Invalid read of size 8
at 0x4017F3: MAIN__ (main.f90:18)
by 0x401D59: main (main.f90:21)
Address 0x20 is not stack'd, malloc'd or (recently) free'd


!################################# Disassembly #################################

Below is the full set of instructions associated with the indicated "ptr => ptr(index)%component" statement, as produced by the indicated compiler versions.

The disassembly was obtained by running the program with a vgdb valgrind session, running gdb in a separate terminal, connecting to the valgrind process, stepping into the main program, and running
(gdb) $ disassemble /s

I've added comments giving the overview of related instructions, as I understand them, although there's a chance I misinterpreted things. Please let me know if I have.


!###############################################################################
!################### Disassembly of child => ROOT(1)%pChild ####################

!####################### (NON-seg fault) gfortran 6.3.1 ########################
L16 child => root(1)%pChild
; ... L16+0 through L16+21 are bounds check instructions
L16+21 mov -0x60(%rbp),%rcx ; Step 1:
L16+22 mov -0x58(%rbp),%rdx ; Copy ROOT's pointer metadata
L16+23 mov -0x48(%rbp),%rax ; into registers rax, rcx, rdx
L16+24 add %rax,%rdx ;
L16+25 mov %rdx,%rax ; Step 2:
L16+26 shl $0x3,%rax ; Use metadata to offset
L16+27 sub %rdx,%rax ; from address of root(1)
L16+28 shl $0x3,%rax ; to address of root(1)%pChild
L16+29 add %rcx,%rax ;
L16+30 mov (%rax),%rdx ;
L16+31 mov %rdx,-0x30(%rbp) ;
L16+32 mov 0x8(%rax),%rdx ; Step 3:
L16+33 mov %rdx,-0x28(%rbp) ; Copy the pointer metadata
L16+34 mov 0x10(%rax),%rdx ; of root(1)%pChild, via the
L16+35 mov %rdx,-0x20(%rbp) ; registers into the address
L16+36 mov 0x18(%rax),%rdx ; space occupied by the pointer
L16+37 mov %rdx,-0x18(%rbp) ; representing child
L16+38 mov 0x20(%rax),%rdx ; (i.e., the actual pointer
L16+39 mov %rdx,-0x10(%rbp) ; object, like a C-struct)
L16+40 mov 0x28(%rax),%rax ;
L16+41 mov %rax,-0x8(%rbp) ;

!################## (NON-seg fault) gfortran 8.5.0 and 9.3.1 ###################
L16 child => root(1)%pChild
; ... L16+0 through L16+21 are bounds check instructions
L16+21 mov -0x90(%rbp),%rax ; Step 1:
L16+22 mov -0x88(%rbp),%rcx ; Copy ROOT's pointer metadata
L16+23 mov -0x68(%rbp),%rdx ; into registers rax, rcx, rdx
L16+24 add %rdx,%rcx ; Step 2:
L16+25 mov -0x70(%rbp),%rdx ; Use metadata to offset
L16+26 imul %rcx,%rdx ; from address of root(1)
L16+27 add %rdx,%rax ; to address of root(1)%pChild
L16+28 mov (%rax),%rcx ;
L16+29 mov 0x8(%rax),%rbx ;
L16+30 mov %rcx,-0x50(%rbp) ;
L16+31 mov %rbx,-0x48(%rbp) ;
L16+32 mov 0x10(%rax),%rcx ; Step 3:
L16+33 mov 0x18(%rax),%rbx ; Copy the pointer metadata
L16+34 mov %rcx,-0x40(%rbp) ; of root(1)%pChild, via
L16+35 mov %rbx,-0x38(%rbp) ; the registers through
L16+36 mov 0x20(%rax),%rcx ; offset/dereferencing rax
L16+37 mov 0x28(%rax),%rbx ;
L16+38 mov %rcx,-0x30(%rbp) ;
L16+39 mov %rbx,-0x28(%rbp) ;
L16+40 mov 0x38(%rax),%rdx ;
L16+41 mov 0x30(%rax),%rax ;
L16+42 mov %rax,-0x20(%rbp) ;
L16+43 mov %rdx,-0x18(%rbp) ;
L16+44 mov -0x90(%rbp),%rax ; Repeat L16+21
L16+45 mov -0x88(%rbp),%rcx ; Repeat L16+22
L16+46 mov -0x68(%rbp),%rdx ; Repeat L16+23
L16+47 add %rdx,%rcx ; Repeat L16+24
L16+48 mov -0x70(%rbp),%rdx ; Repeat L16+25
L16+49 imul %rcx,%rdx ; Repeat L16+26
L16+50 add %rdx,%rax ; Repeat L16+27
L16+51 mov 0x20(%rax),%rax ; Repeat L16+36 ? I'm not sure what this field is, but it often appears to be sizeof(myType)
L16+52 mov %rax,-0x30(%rbp) ; Repeat L16+38



!###############################################################################
!################### Disassembly of child => CHILD(1)%pChild ###################

!####################### (NON-seg fault) gfortran 6.3.1 ########################
L18 child => child(1)%pChild
; ... L18+0 through L18+21 are bounds check instructions
L18+21 mov -0x30(%rbp),%rcx ; Step 1:
L18+22 mov -0x28(%rbp),%rdx ; Copy CHILD's pointer metadata
L18+23 mov -0x18(%rbp),%rax ; into registers rax, rcx, rdx
L18+24 add %rax,%rdx ;
L18+25 mov %rdx,%rax ; Step 2:
L18+26 shl $0x3,%rax ; Use metadata to offset
L18+27 sub %rdx,%rax ; from address of child(1)
L18+28 shl $0x3,%rax ; to address of child(1)%pChild
L18+29 add %rcx,%rax ;
L18+30 mov (%rax),%rdx ;
L18+31 mov %rdx,-0x30(%rbp) ;
L18+32 mov 0x8(%rax),%rdx ; Step 3:
L18+33 mov %rdx,-0x28(%rbp) ; Copy the pointer metadata
L18+34 mov 0x10(%rax),%rdx ; of child(1)%pChild, via the
L18+35 mov %rdx,-0x20(%rbp) ; registers into the address
L18+36 mov 0x18(%rax),%rdx ; space occupied by the pointer
L18+37 mov %rdx,-0x18(%rbp) ; representing child
L18+38 mov 0x20(%rax),%rdx ; (i.e., the actual pointer
L18+39 mov %rdx,-0x10(%rbp) ; object, like a C-struct)
L18+40 mov 0x28(%rax),%rax ;
L18+41 mov %rax,-0x8(%rbp) ;


!#################### (SEG FAULT) gfortran 8.5.0 and 9.3.1 #####################
L18 child => child(1)%pChild
; ... L18+0 through L18+21 are bounds check instructions
L18+21 mov -0x50(%rbp),%rax ; Step 1:
L18+22 mov -0x48(%rbp),%rcx ; Copy CHILD's pointer metadata
L18+23 mov -0x28(%rbp),%rdx ; into registers rax, rcx, rdx
L18+24 add %rdx,%rcx ; Step 2:
L18+25 mov -0x30(%rbp),%rdx ; Use metadata to offset
L18+26 imul %rcx,%rdx ; from address of child(1)
L18+27 add %rdx,%rax ; to address of child(1)%pChild
L18+28 mov (%rax),%rcx ;
L18+29 mov 0x8(%rax),%rbx ;
L18+30 mov %rcx,-0x50(%rbp) ;
L18+31 mov %rbx,-0x48(%rbp) ;
L18+32 mov 0x10(%rax),%rcx ; Step 3:
L18+33 mov 0x18(%rax),%rbx ; Copy the pointer metadata
L18+34 mov %rcx,-0x40(%rbp) ; of child(1)%pChild, via
L18+35 mov %rbx,-0x38(%rbp) ; the registers through
L18+36 mov 0x20(%rax),%rcx ; offset/dereferencing rax
L18+37 mov 0x28(%rax),%rbx ;
L18+38 mov %rcx,-0x30(%rbp) ;
L18+39 mov %rbx,-0x28(%rbp) ;
L18+40 mov 0x38(%rax),%rdx ;
L18+41 mov 0x30(%rax),%rax ;
L18+42 mov %rax,-0x20(%rbp) ;
L18+43 mov %rdx,-0x18(%rbp) ;
L18+44 mov -0x50(%rbp),%rax ; Repeat L18+21 (but now with new values, could be 0x0 if target was null)
L18+45 mov -0x48(%rbp),%rcx ; Repeat L18+22
L18+46 mov -0x28(%rbp),%rdx ; Repeat L18+23
L18+47 add %rdx,%rcx ; Repeat L18+24
L18+48 mov -0x30(%rbp),%rdx ; Repeat L18+25
L18+49 imul %rcx,%rdx ; Repeat L18+26
L18+50 add %rdx,%rax ; Repeat L18+27
L18+51 mov 0x20(%rax),%rax ; <---- Seg fault occurs here as it offsets from 0x0 by 0x20 and dereferences
L18+52 mov %rax,-0x30(%rbp) ;

FortranFan

Dec 9, 2022, 11:49:44 AM
On Friday, December 9, 2022 at 11:08:59 AM UTC-5, Connor B wrote:

> I'm having trouble figuring out if the seg fault I'm observing is a compiler bug or the result of a non-standard compliant statement. Having done a decent amount of debugging and standard reading, I kinda think this is a compiler bug, but I definitely could be reading something wrong/misinterpreting things so I'm hoping someone is able to confirm or clarify what I've found. ..

@Connor B,

I can confirm the seg fault using gfortran version as part of "gcc version 12.0.1 20220123 (experimental) (GCC)".

My hunch looking at your code with a "mental compiler" is that it is a compiler bug also. And if it is of any help, Intel Fortran runs your program and completes with no faults or exceptions.

You may know you can work with the Intel Fortran compiler now *without* needing to procure a commercial license, see this link and notice the option on the top section, right-hand side for "Download the Stand-Alone Version A stand-alone download of the Intel Fortran Compiler is available. You can download binaries from Intel or choose your preferred repository."

https://www.intel.com/content/www/us/en/developer/tools/oneapi/fortran-compiler.html#gs.k82xd4

By the way, you can reference a proxy document toward the current Fortran standard i.e., 2018 here:
https://j3-fortran.org/doc/year/18/18-007r1.pdf

gah4

Dec 9, 2022, 6:02:17 PM
On Friday, December 9, 2022 at 8:08:59 AM UTC-8, Connor B wrote:

(snip)

> !############################## Seg Fault Message ##############################
>
> In all cases that resulted in a seg fault, the message was always of the following form:
>
> Use of uninitialised value of size 8
> at 0x40179A: MAIN__ (main.f90:18)
> by 0x401D59: main (main.f90:21)
>
> Uninitialised value was created by a stack allocation
> at 0x401824: allocatemytype.3897 (main.f90:24)
>
> Use of uninitialised value of size 8
> at 0x4017F3: MAIN__ (main.f90:18)
> by 0x401D59: main (main.f90:21)
> Uninitialised value was created by a stack allocation
> at 0x401828: allocatemytype.3897 (main.f90:24)
>
> Invalid read of size 8
> at 0x4017F3: MAIN__ (main.f90:18)
> by 0x401D59: main (main.f90:21)
> Address 0x20 is not stack'd, malloc'd or (recently) free'd

Doing C programming, I do get used to pointers, but it isn't so easy to follow.
I notice in the above messages:

First, they say stack allocation.
I would expect ALLOCATE not to be stack allocation.
It might be, though, that dummy argument copies of the pointer
are on the stack.

Also, I notice that the messages mention line 21:
The deallocate call that you say to ignore because the problem occurs earlier.

Or maybe I don't understand line numbers.


Connor B

Dec 9, 2022, 8:35:23 PM
Also coming from C, I've had my expectation of pointers subverted before by Fortran, hence why I'm unsure if I'm interpreting things correctly here.

Initially I thought the warnings about the error coming from a stack allocation were a Valgrind misnomer, but after the debugging I've done I believe the message is accurate. I think it is saying that the offending value comes from the pointer metadata, not from the dynamic memory the pointer is associated with.

As far as I can tell, the layout of a pointer is as follows:

0x00: 0x0000000000000001 0x0000000000000002
0x10: 0x0000000000000003 0x0000000000000004
0x20: 0x0000000000000005 0x0000000000000006
0x30: 0x0000000000000007 0x0000000000000008

Note, the numbers in the margin are address offsets from the start of the pointer object and the 0x0000000000000001-0x0000000000000008 hex values are each simply 8-byte fields (int64) that I have numbered so I can refer to them below.

That is, a pointer is essentially a C-struct with eight long-int-sized fields* that exists as a stack-allocated object and contains the information necessary to interact with the dynamic memory it is (possibly) associated with.


*The one caveat to the layout above/description below is the pointers in gfortran 6.3.1 seem to be missing two of those fields. In particular, the "0x20: 0x0000000000000005 0x0000000000000006" fields seem to only exist in gfortran 8.5.0/gfortran 9.3.1. So, in gfortran 6.3.1, the layout appears as follows:

0x00: 0x0000000000000001 0x0000000000000002
0x10: 0x0000000000000003 0x0000000000000004
0x20: 0x0000000000000007 0x0000000000000008

Note that I've used the same hex values to identify the fields to show which fields I think are missing between the two representations.



As for what the fields mean: I'm fairly confident about some, have a guess for others, and am really not sure about one or two. Here's what I think they mean (I put ?? after fields I'm unsure of):

0x00: 0x0000000000000001
=> The address of the pointer's target

0x08: 0x0000000000000002 ??
=> An offset value that, when added to the lbound, converts the index to a memory offset

0x10: 0x0000000000000003
=> The size of the datatype of the target

0x18: 0x0000000000000004 ????
=> No idea. For many of the debug runs I did, I often saw a static value of 0x0000050100000000, regardless of ways I shaped/targeted the pointer

0x20: 0x0000000000000005 ??
=> Also the size of the datatype of the target. This is the field that gets copied again, causing the seg fault. As far as I remember, every time I inspected this field it matched the value of "0x10: 0x0000000000000003" and, again, it seems to be missing from gfortran 6.3.1

0x28: 0x0000000000000006 ????
=> Also not sure. I think I wrote down a likely candidate at some point but don't see it in my notes currently. I think it might have to do with the rank? but I'm not sure (I would often see a low valued number, but I don't think it was constant)

0x30: 0x0000000000000007
=> Lower bound of the array range

0x38: 0x0000000000000008
=> Upper bound of the array range


I believe the reason the error message says the data involved was created by a stack allocation is that it comes from an instruction that uses the metadata (which lives on the stack) to do indexing/component offsetting. I've confirmed that the line I highlighted in the disassembly is the one responsible for the seg fault, at least in debug mode (the seg fault still occurs from the same line of code when not in debug mode, but I wouldn't know how to identify the offending instruction without a debugger).



As for the message referring to the later line (the deallocateMyType call), I believe that _is_ a misnomer. I put that line in to indicate I knew to be responsible with memory, but the seg fault occurs before the line is reached. In the original project, the seg fault occurs in a procedure without a deallocate call in its body, and stepping through the demonstration program I provided shows that the deallocateMyType call is never reached. This should be confirmable by putting print statements in the program, running it, and seeing what gets printed before the crash.

I believe the line number referenced in the error message is actually trying to locate the end of the function, but since there is no return statement, it picks the last executable statement in the scope. I saw similar messages in other contexts and some disassemblies showed the reason for this is possibly because of how stack allocation/deallocation works. I think there are some cleanup instructions that need to be executed after a procedure completes but before it removes itself from the stack, and the way this happens is it inserts these instructions onto the stack frame after the procedure is put on the stack, but before its data is allocated on the stack. That is, it's like a hook that injects itself between the call to the procedure (in this case, the MAIN program) and the execution of the body, hence why it appears in the error report, but since it isn't a thing that actually exists within your source code, the line number associated with it ends up being the last line of the procedure.

Ev. Drikos

Dec 10, 2022, 12:28:47 AM
On 09/12/2022 18:08, Connor B wrote:

> !######################### Full Reproducible Example: ##########################

Where is the subroutine *deallocateMyType* defined?

gah4

Dec 10, 2022, 5:17:36 AM
On Friday, December 9, 2022 at 5:35:23 PM UTC-8, Connor B wrote:

(snip)

> Also coming from C, I've had my expectation of pointers subverted before by Fortran,
> hence why I'm unsure if I'm interpreting things correctly here.

This one:

https://thinkingeek.com/2017/01/14/gfortran-array-descriptor/

has a description of the descriptor in 2017.

That seems slightly different from yours.
For one, it keeps the rank, data type, and data element size, in one entry.
(64 bit in your case.)

But also, it keeps a stride along with lower and upper bound.

Other systems I know keep the stride times the element size,
to save one multiply on each access.

Reminds me, the VAX array descriptor keeps both the address
of the beginning of the array and the address where element zero
would be. I believe the latter is origin+offset in your description.
Or maybe it is origin+offset*elementsize.
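
As a rough illustration of the "address where element zero would be" idea, the sketch below computes such a virtual origin for a small array and checks that origin + index*elementsize lands on each element. The names (a, base, origin, elem, addr) and the bounds 3:7 are arbitrary choices for the example, it assumes 8-bit bytes, and it says nothing about how any particular compiler actually lays out its descriptor.

```fortran
program virtual_origin
   use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t, c_double
   implicit none

   real(c_double), target :: a(3:7)
   integer(c_intptr_t)    :: base, origin, elem, addr
   integer                :: i

   elem   = storage_size(a) / 8            ! element size in bytes (8-bit bytes assumed)
   base   = transfer(c_loc(a(3)), base)    ! address of the first element, a(3)
   origin = base - lbound(a, 1) * elem     ! address where a(0) would live

   ! With a precomputed origin, no lower-bound subtraction is needed per access
   do i = lbound(a, 1), ubound(a, 1)
      addr = transfer(c_loc(a(i)), addr)
      if (addr /= origin + i * elem) error stop 'address mismatch'
   end do

   print *, 'origin + i*elem matches the address of a(i) for every i'
end program virtual_origin
```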


Connor B

Dec 10, 2022, 9:28:40 AM
Sorry about the wall of info, but I'm trying to give as much info as I can to explain how I've interpreted things in the hopes that someone can either point out my mistake or help confirm this may be a compiler bug. I've tried to section off the extra information in case you only care to see the thing you asked about.

Here's the definition of "deallocateMyType" I was using:

 1  program main
 2  ! ... L2 through L40 as they appeared in the original post
41     recursive subroutine deallocateMyType(pMyType)
42        implicit none
43        type(myType), pointer, dimension(:), intent(inout) :: pMyType
44        integer :: idx
45
46        if (.not. associated(pMyType)) then
47           return
48        endif
49
50        do idx = lbound(pMyType,1), ubound(pMyType,1)
51           call deallocateMyType(pMyType(idx)%pChild)
52        enddo
53
54        deallocate(pMyType)
55        nullify(pMyType) ! I think this is redundant following deallocate, but did it for my sanity
56     end subroutine deallocateMyType
57  end program main
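
On the comment at line 55: a successful DEALLOCATE of a pointer leaves it disassociated, so the NULLIFY is indeed redundant (though harmless). A minimal check, with a made-up program and variable name:

```fortran
program dealloc_disassociates
   implicit none
   integer, pointer :: p(:) => null()

   allocate(p(2))
   if (.not. associated(p)) error stop 'p not associated after allocate'

   deallocate(p)
   ! No nullify here: deallocate alone should leave p disassociated
   if (associated(p)) error stop 'p still associated after deallocate'

   print *, 'associated(p) after deallocate:', associated(p)
end program dealloc_disassociates
```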

There was also a longer version of this demo program where I checked the association status of the pointers at every step and called "stop <error_code>" if it ever differed from what I expected. That reduced the readability of the core of the program, so I cut those statements out and reran the debug tools to get the error message I provided. The behavior I observed was constant through all iterations of the demonstration program*. That is, the seg fault always occurred on the same source line and the corresponding disassembly line, and the error message was always a pair of messages complaining of "Use of uninitialised value of size 8...Uninitialised value was created by a stack allocation at 0x####..." followed by "Invalid read of size 8...Address 0x## is not stack'd, malloc'd or (recently) free'd".

*strictly speaking, the failure behavior was the same throughout all iterations, but I can get different address violations through permutations of filling myType with more fields, changing what index the linked list is through, etc.



!##### EXTRA INFO

I realize what follows is somewhat tangential to the question, but I'll explain a little more on what I've observed of the error to highlight what I think is happening (although I'm not sure _why_ it happens):


Having stepped through the instructions to watch the seg fault, I gather the error message "Address 0x## is not stack'd, malloc'd or (recently) free'd" results from the extra set of instructions the new compilers emit, which run after the descriptor of "pChild" has already been copied into the descriptor of "child". In the case of "Address 0x20 is not stack'd...", I found that "pChild" was a null pointer prior to the execution of the pointer-assignment-stmt. When this is the case, the additional indexing tries to read the value at "&(child(1)%pChild)+0x20", but since "child" is now a null pointer, the instructions that compute "&(child(1)%pChild)" yield "0x0", so the read is from "0x0+0x20", causing a seg fault. I've put some of the disassembly below with more detailed comments that explain my interpretation in-line:

!#################### (SEG FAULT) gfortran 8.5.0 and 9.3.1 #####################
L18 child => child(1)%pChild ; Source code line that the following disassembly comes from
; ... L18+0 through L18+42 omitted (see original post for what they are)
; ... The L18+43 here corresponds to the _last_ instruction that appears in the non-segfaulting gfortran 6.3.1 disassembly (there, L18+41; see original post)
L18+43 mov %rdx,-0x18(%rbp)
L18+44 mov -0x50(%rbp),%rax
; ... L18+44 through L18+50
; ... are _exact_ copies of L18+21 through L18+27, which take care of loading &(child(1)%pChild)
; ... into %rax so that the pointer descriptor of child(1)%pChild can be loaded into the pointer descriptor child
; ...
; ... At the end of this part (after executing L18+50), the presumed address in %rax is the start of the descriptor of the
; ... _old_ "child(1)%pChild" (i.e, before this source line, L18, began executing), however, since "child" was modified
; ... by L18+28 through L18+43, we have offset to a different address
L18+50 add %rdx,%rax
; ... The following line, L18+51, is where the seg fault happens, in particular this occurs if "child(1)%pChild" (at the start of L18)
; ... was null, which results in the offset that happens on L18+51 attempting to dereference an address such as "0x20"
L18+51 mov 0x20(%rax),%rax

The explanation of what L18+44 through L18+50 does was confirmed (for me) by comparing the disassembly of "L18 child => child(1)%pChild" with the disassembly of "L16 child => root(1)%pChild" (also provided in the original post), as well as stepping through both lines instruction-by-instruction in gdb. If you look at L16+44 through L16+50 (i.e., the lines for "child => root(1)%pChild"), you'll see that there the offset is using -0x90(%rbp), which is the start address of "root", hence there is no issue as it copies data that was valid prior to executing the line.

Lastly, to reiterate my confusion over the inclusion of this additional offset-copy that happens in L18+44 through L18+52, if you look at the disassembly, you'll see that the value that is attempting to be copied is exactly the value that was already copied on line L18+36, so these extra instructions don't seem to have any actual purpose. Again, this was confirmed (for me) by comparing with the disassembly of "L16 child => root(1)%pChild", as well as stepping through the instructions of L16/L18 and watching what values are being copied.

Connor B

Dec 10, 2022, 2:32:09 PM
!#################### Thoughts from link on pointer layout #####################

Thanks for providing the link discussing the pointer descriptors in Fortran. Whenever I tried to look up info like this, I mostly just found forum posts with people asking about how pointers work in Fortran and no real information about what's under the hood so I resorted to debugging probes and changing values to glean what the representation was (I demonstrate an example towards the end of this message).

The content in that link clarifies some things I was uncertain of: for example, that the "0x501" value I kept seeing apparently indicates a rank-1 derived type, and why I seemingly saw a duplication of "sizeof(myType)" (stored at +0x10 as the size of myType, and stored "again" at +0x20 as the stride for the first dimension).

Although the link details the pointer descriptor layout, it mentions the layout changes between standards and since I've also observed differing pointer sizes/layouts in my debugging, I'm not currently 100% sure what the 0x20 offset involved in the seg fault corresponds to - I'll have to look in more detail later. For now, I think it's the stride or, at the very least, a field involved in indexing, and I'm not sure its actual meaning would change the discussion that follows.

With this interpretation, I thought about what the extra copying of the "0x20" field might be doing and thought it was weird that it would need to get some value from the RHS after the LHS was just given a copy of that info. That is, it seems that redoing the same index/offset calculation after copying the RHS descriptor is needless/redundant.

I then considered that my demo is probably too simple to demonstrate the use case for which these instructions become necessary (since my demo has the same shape/bounds on the LHS and RHS) so I figured there must be a more complicated scenario it's trying to accommodate.

My first thought was that a reshaping of the layout between the LHS and RHS might require the RHS descriptor to perform the reshaping after the descriptor had been copied over. After confirming with the standard that reshaping is allowed in a pointer-assignment-stmt, I turned to my demo program to test this and inspect the disassembly.

I provide the results of this investigation below, but first I'll revisit my uncertainty in whether or not "child => child(1)%pChild" is an ill-formed statement (i.e., non-standard conforming and open to seg faults).


!##################### Reconsidering the original question #####################

Returning to the original statement, "child => child(1)%pChild", I see it as nothing more than simply "ptr1 => ptr2". How we arrive at "ptr1" and "ptr2" seems inconsequential to me; at least, according to my reading of the standard, since I don't see any rules/clauses that restrict a statement of the form "child => child(1)%pChild". All that needs to happen is we find the location of descriptor "ptr1", the location of descriptor "ptr2", and copy the descriptor of "ptr2" into "ptr1". That is, the contents of descriptor "ptr1" should not affect the value obtained in evaluating "ptr2" throughout the execution of the statement.

My reasoning for this interpretation comes from the following snippets of the standard (again, I'm reading https://wg5-fortran.org/f2008.html but maybe I should be reading a different document such as the one FortranFan links to).


I haven't seen anything directly regarding how to resolve the LHS and RHS of a pointer-assignment-stmt, so I'll construct the rule by combining various rules/clauses.

To start, I'll note the comments on the similar scenario of an (intrinsic) assignment-stmt:

* 7.2.1.3 Interpretation of intrinsic assignments
** 4 Both variable and expr may contain references to any portion of the variable
*** NOTE 7.37
*** For example, in the character intrinsic assignment statement:
*** STRING (2:5) = STRING (1:4)
*** the assignment of the first character of STRING to the second character does not affect the evaluation of
*** STRING (1:4). If the value of STRING prior to the assignment was ’ABCDEF’, the value following the
*** assignment is ’AABCDF’.
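
As a quick check of NOTE 7.37, here is a minimal sketch of that overlapping assignment as a complete program. The program and variable names are only illustrative; the statement and the expected result are the ones quoted in the note.

```fortran
program overlap_demo
  implicit none
  character(len=6) :: string

  string = 'ABCDEF'
  ! In effect, the RHS string(1:4) is evaluated before any character
  ! of the LHS section string(2:5) is stored.
  string(2:5) = string(1:4)
  print '(a)', string   ! NOTE 7.37 gives the result as AABCDF
end program overlap_demo
```

The point of the analogy is that the processor has to behave as if the right-hand side were captured before the left-hand side is modified.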

Again, I realize this is referring to an assignment-stmt, not a pointer-assignment-stmt, but pulling from other sections, I find this rule for pointer-assignment-stmt's by construction:

* 7.2.2 Pointer assignment
** R733 pointer-assignment-stmt is data-pointer-object [ (bounds-spec-list) ] => data-target
** R737 data-target is variable
** R602 variable is designator or expr
* 7.1.2 Form of an expression
** R701 primary [expr] is ... or designator ... or ( expr )

Thus, I read that the RHS can be considered an expression.

Reading the rules on expressions:

* 7 Expressions and assignment
** 7.1 Expressions
*** 1 An expression represents either a data reference... An expression is formed from operands, operators, and parentheses.
*** 3 Evaluation of an expression produces a value, which has a type, type parameters (if appropriate), and a shape (7.1.9)...

So, it would seem to me that the RHS is an expression which is used as the operand of the "=>" operator, so it needs to be evaluated and that value is used to execute the pointer-assignment-stmt. That is, it seems the standard requires the RHS of a pointer-assignment-stmt to be evaluated at the start of the statement, and it should retain this value throughout the statement (this mirrors the above note on assignment-stmt).

There is a possible gap in this reasoning due to the fact that I don't see anywhere that the symbol "=>" is actually called an "operator" (I don't see "=" referred to as an operator either), but turning to the similar section on assignment-stmt (the closest I could find), I find:

* 7.2.1 Assignment statement
** R732 assignment-stmt is variable = expr
* 7.2.1.3 Interpretation of intrinsic assignments
** 1 Execution of an intrinsic assignment causes, in effect, the evaluation of the expression expr and all expressions within variable (7.1)


This simplistic scenario of "child => child(1)%pChild" does change slightly when we move to a more complicated pointer-assignment-stmt, such as reshaping during the pointer-assignment-stmt; e.g.:

L20: 20 dummy_child(1:1,1:1) => child(1:2)

because, here, we know that we cannot simply copy the descriptor from the RHS to the LHS without doing some processing to define the remapping of the layout of the data. But, as I read the standard, this shouldn't be a problem because the RHS should retain its value throughout execution of the statement.
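
To make the remapping case concrete, here is a small sketch of a bounds-remapping pointer assignment on a plain integer array. The names "flat" and "grid" are hypothetical and not part of my demo program; the only assumption is a compiler that supports the Fortran 2003/2008 bounds-remapping syntax.

```fortran
program remap_demo
  implicit none
  integer, target  :: flat(6)
  integer, pointer :: grid(:,:)
  integer :: i

  flat = [(i, i = 1, 6)]

  ! Rank-1 target viewed through a rank-2 pointer: the descriptor of
  ! "grid" gets new bounds and strides, but no data is copied.
  grid(1:2, 1:3) => flat

  print *, shape(grid)   ! extents 2 and 3
  print *, grid(2, 3)    ! column-major order, so this is flat(6)
end program remap_demo
```

Here the LHS descriptor cannot be a plain copy of the RHS descriptor, which is the situation I wanted to look at in the disassembly.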

With that in mind, I move on to inspecting the disassembly of a pointer-assignment-stmt that involves reshaping:


!############## Exploring reshaping in a pointer-assignment-stmt ###############

I modified my demo program to look like the following:

1 program main
2 implicit none
3
4 type myType
5 type(myType), pointer, dimension(:) :: pChild => null()
6 end type myType
7
8 type(myType), pointer, dimension(:) :: root, child
9 type(myType), pointer, dimension(:,:) :: dummy_child
10
11 nullify(root)
12 nullify(child)
13 nullify(dummy_child)
14 call allocateMyType(root)
15
16 call allocateMyType(root(1)%pChild)
17 call allocateMyType(root(1)%pChild(1)%pChild)
18
19 child => root(1)%pChild
20 dummy_child(1:1,1:1) => child(1:2)
! ... the rest omitted as I just wanted to see the disassembly of L20

and the (notable) disassembly is:

; ...
L11 nullify(root)
L11+00 movq $0x0,-0xf0(%rbp)
; ...
L12 nullify(child)
L12+00 movq $0x0,-0x50(%rbp)
; ...
L13 nullify(dummy_child)
L13+00 movq $0x0,-0xb0(%rbp)
; ...
L20 dummy_child(1:1,1:1) => child(1:2)
; ... L20+0 through L20+47 appear to be bounds check lines
L20+48 mov -0x30(%rbp),%rax
L20+49 mov %rax,-0x2f0(%rbp)
L20+50 movq $0x0,-0x300(%rbp)
L20+51 movq $0x0,-0x2f8(%rbp)
L20+52 movq $0x40,-0x300(%rbp)
L20+53 movb $0x1,-0x2f4(%rbp)
L20+54 movb $0x5,-0x2f3(%rbp)
L20+55 mov -0x28(%rbp),%rax
L20+56 movq $0x1,-0x2e0(%rbp)
L20+57 movq $0x2,-0x2d8(%rbp)
L20+58 mov %rax,-0x2e8(%rbp)
L20+59 mov -0x50(%rbp),%rdx
L20+60 mov -0x20(%rbp),%rcx
L20+61 mov $0x1,%esi
L20+62 sub %rcx,%rsi
L20+63 mov %rsi,%rcx
L20+64 imul %rax,%rcx
L20+65 shl $0x6,%rcx
L20+66 add %rcx,%rdx
L20+67 mov %rdx,-0x310(%rbp)
L20+68 neg %rax
L20+69 mov %rax,-0x308(%rbp)
L20+70 movq $0x0,-0xa0(%rbp)
L20+71 movq $0x0,-0x98(%rbp)
L20+72 movq $0x40,-0xa0(%rbp)
L20+73 movb $0x2,-0x94(%rbp)
L20+74 movb $0x5,-0x93(%rbp)
L20+75 mov -0x310(%rbp),%rax
L20+76 mov %rax,-0xb0(%rbp)
L20+77 movq $0x40,-0x90(%rbp)
L20+78 mov -0x308(%rbp),%rdx
L20+79 mov -0x2e8(%rbp),%rcx
L20+80 mov -0x2e0(%rbp),%rax
L20+81 imul %rcx,%rax
L20+82 add %rdx,%rax
L20+83 mov %rax,-0xa8(%rbp)
L20+84 movq $0x1,-0x80(%rbp)
L20+85 movq $0x1,-0x78(%rbp)
L20+86 mov -0x2e8(%rbp),%rax
L20+87 mov %rax,-0x88(%rbp)
L20+88 mov -0xa8(%rbp),%rdx
L20+89 sub %rax,%rdx
L20+90 mov %rdx,-0xa8(%rbp)
L20+91 movq $0x1,-0x68(%rbp)
L20+92 movq $0x1,-0x60(%rbp)
L20+93 mov %rax,-0x70(%rbp)
L20+94 mov -0xa8(%rbp),%rdx
L20+95 sub %rax,%rdx
L20+96 mov %rdx,%rax
L20+97 mov %rax,-0xa8(%rbp)
; ... L20+98 through L20+129 also appear to be bounds check lines
; ... This is probably due to compiling with -fbounds-check and these
; ... checks are probably validating the reshaping, but I'd have to
; ... play around with this more to be more certain


I haven't had a lot of time to pore over this more carefully, so I could be misreading things, but I have some interpretation of it at a glance based on the rest of the debugging I've done.

To start I note that the descriptor of "dummy_child" is stored at (%rbp-0xb0) and the descriptor of "child" is at (%rbp-0x50). It's important to keep these addresses in mind, and note the stack starts at some offset before %rbp and grows toward %rbp (notice that "root" is stored at (%rbp-0xf0), preceding "child" and "dummy_child").

If we look at lines L20+48, L20+55, ..., we see that they are taking values from the "child" descriptor (RHS), which seems to occupy (%rbp-0x50) to (%rbp-0x10).

If we look at L20+52 (movq sizeof(myType)), L20+53:54 (copying the 0x501 I've noted previously), etc., it seems what is happening is a copy of the descriptor data necessary to define a pointer descriptor. This claim that it's constructing "the descriptor data necessary to define a pointer descriptor" isn't conceptually much different than what we saw in the original disassembly (in both cases, we just need to define the descriptor of the LHS), but there is clearly a difference here.

First, notice the addresses used in the first portion of the instructions: L20+48 through L20+69
-- The source values (the first argument of the mov instructions) all come from either literal constants or values read in from the "child" descriptor.
-- The destination for these values is an address range we haven't seen before, starting at (%rbp-0x310) (I think). I believe this is a portion of the stack set aside to keep temporary results, but I could be mistaken.

Next, notice the addresses used in the second portion of the instructions: L20+70 through L20+97
-- Now, the source values are either constant literals or the values that have been placed/manipulated in the temp value address space ~(%rbp-0x310)
-- And the destination is the addresses of the "dummy_child" descriptor

Looking at this disassembly, which should be the safer version since the LHS and RHS _don't_ use the same pointer object in their expressions, we see that it moves the data from the RHS descriptor into a scratch address space, makes the required modifications to morph it into the form that will be needed on the LHS, then moves the values over once those operations are done. Furthermore, once the data starts being copied into the "dummy_child" descriptor, the RHS seemingly never gets accessed again (accessing it again was the exact problem causing the seg fault in my original post).

Recalling my interpretation of the standard and my quick analysis of this disassembly, it seems to me that "child => child(1)%pChild" should be a perfectly valid statement, and the extra lines which caused the seg fault either shouldn't be there, or the copying of the data should be done into an auxiliary address space until the RHS no longer needs to be accessed, thus preventing corruption during execution of the pointer-assignment-stmt.

I definitely could be overlooking/misinterpreting something though (especially with this new analysis) so I'd love to hear other people's thoughts on the matter.
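
For reference, the auxiliary-pointer fix I mentioned at the start of the thread looks roughly like the sketch below. The pointer "next" and the counter "depth" are names made up for this illustration, and the allocations are inlined instead of going through allocateMyType; treat it as a sketch of the idea rather than our production code.

```fortran
program aux_pointer_demo
  implicit none

  type myType
    type(myType), pointer, dimension(:) :: pChild => null()
  end type myType

  ! "next" is the auxiliary pointer (hypothetical name)
  type(myType), pointer, dimension(:) :: root, child, next
  integer :: depth

  nullify(root, child, next)
  allocate(root(2))
  allocate(root(1)%pChild(2))
  allocate(root(1)%pChild(1)%pChild(2))

  depth = 0
  child => root(1)%pChild
  do while (associated(child))
    ! Two steps, so the LHS pointer never appears in the RHS designator
    next  => child(1)%pChild
    child => next
    depth = depth + 1
  end do

  print *, 'levels walked:', depth
end program aux_pointer_demo
```

The design choice is simply that the descriptor being overwritten is never the one used to locate the target, so it does not matter in which order the compiler copies the descriptor fields.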


!########################## Details Referenced Above: ##########################

!################# How I originally determined pointer layout ##################

To determine the addresses that pointers were stored at, I'd write some variables/nullifies as follows and inspect the addresses used in the disassembly (I did something similar to locate their offset from a parent object when they appeared as a component in a derived type):


1 program main
! ...
9 type(myType), pointer, dimension(:) :: root, child, dummy_child
10
11 nullify(root)
12 nullify(child)
13 nullify(dummy_child)

result in the following disassemblies (the definition of myType can be modified but these addresses seem to remain the same):

!####################### (NON-seg fault) gfortran 6.3.1 ########################

9 type(myType), pointer, dimension(:) :: root, child, dummy_child

11 nullify(root)
L11+0 mov 0x0,-0x90(%rbp) ; Start address of "root" descriptor in gfortran 6.3.1

12 nullify(child)
L12+0 mov 0x0,-0x30(%rbp) ; Start address of "child" descriptor in gfortran 6.3.1

13 nullify(dummy_child)
L13+0 mov 0x0,-0x60(%rbp) ; Start address of "dummy_child" descriptor in gfortran 6.3.1

!######################### (SEG fault) gfortran 9.3.1 ##########################

9 type(myType), pointer, dimension(:) :: root, child, dummy_child

11 nullify(root)
L11+0 mov 0x0,-0xd0(%rbp) ; Start address of "root" descriptor in gfortran 9.3.1

12 nullify(child)
L12+0 mov 0x0,-0x50(%rbp) ; Start address of "child" descriptor in gfortran 9.3.1

13 nullify(dummy_child)
L13+0 mov 0x0,-0x90(%rbp) ; Start address of "dummy_child" descriptor in gfortran 9.3.1


I did similar things to determine things like pointer size, etc. Additionally, it was through probes such as this I came to the conclusion that the message "...stack allocation..." was probably genuine.



Here's the full disassembly of the reshape-on-target statement at L20:

!####################### (newer compiler) gfortran 9.3.1 #######################

20 dummy_child(1:1,1:1) => child(1:2)
L20+0 mov -0x20(%rbp),%rax
L20+1 cmp $0x1,%rax
L20+2 jle 0x401800 <MAIN__+1082>
L20+3 mov -0x18(%rbp),%rdx
L20+4 mov -0x20(%rbp),%rax
L20+5 mov %rdx,%r8
L20+6 mov %rax,%rcx
L20+7 mov $0x1,%edx
L20+8 mov $0x403288,%esi
L20+9 mov $0x4032d8,%edi
L20+10 mov $0x0,%eax
L20+11 callq 0x401030 <_gfortran_runtime_error_at@plt>
L20+12 mov -0x18(%rbp),%rax
L20+13 test %rax,%rax
L20+14 jg 0x401830 <MAIN__+1130>
L20+15 mov -0x18(%rbp),%rdx
L20+16 mov -0x20(%rbp),%rax
L20+17 mov %rdx,%r8
L20+18 mov %rax,%rcx
L20+19 mov $0x1,%edx
L20+20 mov $0x403288,%esi
L20+21 mov $0x4032d8,%edi
L20+22 mov $0x0,%eax
L20+23 callq 0x401030 <_gfortran_runtime_error_at@plt>
L20+24 mov -0x20(%rbp),%rax
L20+25 cmp $0x2,%rax
L20+26 jle 0x401861 <MAIN__+1179>
L20+27 mov -0x20(%rbp),%rdx
L20+28 mov -0x18(%rbp),%rax
L20+29 mov %rdx,%r8
L20+30 mov %rax,%rcx
L20+31 mov $0x2,%edx
L20+32 mov $0x403288,%esi
L20+33 mov $0x4032d8,%edi
L20+34 mov $0x0,%eax
L20+35 callq 0x401030 <_gfortran_runtime_error_at@plt>
L20+36 mov -0x18(%rbp),%rax
L20+37 cmp $0x1,%rax
L20+38 jg 0x401892 <MAIN__+1228>
L20+39 mov -0x20(%rbp),%rdx
L20+40 mov -0x18(%rbp),%rax
L20+41 mov %rdx,%r8
L20+42 mov %rax,%rcx
L20+43 mov $0x2,%edx
L20+44 mov $0x403288,%esi
L20+45 mov $0x4032d8,%edi
L20+46 mov $0x0,%eax
L20+47 callq 0x401030 <_gfortran_runtime_error_at@plt>
L20+48 mov -0x30(%rbp),%rax
L20+49 mov %rax,-0x2f0(%rbp)
L20+50 movq $0x0,-0x300(%rbp)
L20+51 movq $0x0,-0x2f8(%rbp)
L20+52 movq $0x40,-0x300(%rbp)
L20+53 movb $0x1,-0x2f4(%rbp)
L20+54 movb $0x5,-0x2f3(%rbp)
L20+55 mov -0x28(%rbp),%rax
L20+56 movq $0x1,-0x2e0(%rbp)
L20+57 movq $0x2,-0x2d8(%rbp)
L20+58 mov %rax,-0x2e8(%rbp)
L20+59 mov -0x50(%rbp),%rdx
L20+60 mov -0x20(%rbp),%rcx
L20+61 mov $0x1,%esi
L20+62 sub %rcx,%rsi
L20+63 mov %rsi,%rcx
L20+64 imul %rax,%rcx
L20+65 shl $0x6,%rcx
L20+66 add %rcx,%rdx
L20+67 mov %rdx,-0x310(%rbp)
L20+68 neg %rax
L20+69 mov %rax,-0x308(%rbp)
L20+70 movq $0x0,-0xa0(%rbp)
L20+71 movq $0x0,-0x98(%rbp)
L20+72 movq $0x40,-0xa0(%rbp)
L20+73 movb $0x2,-0x94(%rbp)
L20+74 movb $0x5,-0x93(%rbp)
L20+75 mov -0x310(%rbp),%rax
L20+76 mov %rax,-0xb0(%rbp)
L20+77 movq $0x40,-0x90(%rbp)
L20+78 mov -0x308(%rbp),%rdx
L20+79 mov -0x2e8(%rbp),%rcx
L20+80 mov -0x2e0(%rbp),%rax
L20+81 imul %rcx,%rax
L20+82 add %rdx,%rax
L20+83 mov %rax,-0xa8(%rbp)
L20+84 movq $0x1,-0x80(%rbp)
L20+85 movq $0x1,-0x78(%rbp)
L20+86 mov -0x2e8(%rbp),%rax
L20+87 mov %rax,-0x88(%rbp)
L20+88 mov -0xa8(%rbp),%rdx
L20+89 sub %rax,%rdx
L20+90 mov %rdx,-0xa8(%rbp)
L20+91 movq $0x1,-0x68(%rbp)
L20+92 movq $0x1,-0x60(%rbp)
L20+93 mov %rax,-0x70(%rbp)
L20+94 mov -0xa8(%rbp),%rdx
L20+95 sub %rax,%rdx
L20+96 mov %rdx,%rax
L20+97 mov %rax,-0xa8(%rbp)
L20+98 mov -0x78(%rbp),%rdx
L20+99 mov -0x80(%rbp),%rax
L20+100 sub %rax,%rdx
L20+101 mov %rdx,%rax
L20+102 mov $0xffffffffffffffff,%rdx
L20+103 test %rax,%rax
L20+104 cmovs %rdx,%rax
L20+105 lea 0x1(%rax),%rcx
L20+106 mov -0x60(%rbp),%rdx
L20+107 mov -0x68(%rbp),%rax
L20+108 sub %rax,%rdx
L20+109 mov %rdx,%rax
L20+110 mov $0xffffffffffffffff,%rdx
L20+111 test %rax,%rax
L20+112 cmovs %rdx,%rax
L20+113 add $0x1,%rax
L20+114 imul %rcx,%rax
L20+115 mov -0x2d8(%rbp),%rcx
L20+116 mov -0x2e0(%rbp),%rdx
L20+117 sub %rdx,%rcx
L20+118 mov %rcx,%rdx
L20+119 mov $0xffffffffffffffff,%rcx
L20+120 test %rdx,%rdx
L20+121 cmovs %rcx,%rdx
L20+122 add $0x1,%rdx
L20+123 cmp %rax,%rdx
L20+124 jge 0x401a64 <MAIN__+1694>
L20+125 mov %rax,%rcx
L20+126 mov $0x4032f8,%esi
L20+127 mov $0x4032d8,%edi
L20+128 mov $0x0,%eax
L20+129 callq 0x401030 <_gfortran_runtime_error_at@plt>

Connor B

Dec 10, 2022, 2:44:34 PM
Sorry I skipped over replying to your message.

Thank you for the link for more resources to study the standard. It doesn't always seem straightforward to get an electronic copy of it so the more the merrier.

As for using Intel Fortran - We do also use Intel Fortran, but I omitted reference to this because our internal tests were passing but we had a client report a crash that, at a glance, _might_ be related to the one discussed here. Unfortunately, I haven't had time to investigate the Intel side of things and might not have time for a little bit. This post came at the conclusion of my investigation on the gfortran side of things, so I figured I'd get some people's thoughts on the matter, but thank you for the good info on Intel Fortran as well!

It still isn't clear to me whether the issue is the statement being illegal or this being an edge case overlooked by gfortran's tests, but the more I reiterate over the information, the more I suspect it's a compiler issue. Any chance you know what the next steps would be if I feel this is the case at the end of this discussion thread? I.e., who would I contact to confirm this statement is standard compliant, where would I report this potential bug, etc.?

FortranFan

Dec 10, 2022, 4:58:15 PM
On Saturday, December 10, 2022 at 2:44:34 PM UTC-5, Connor B wrote:

@Connor B,

> It still isn't clear to me whether the issue is the statement being illegal or this being an edge case overlooked by gfortran's tests, but the more I reiterate over the information, the more I suspect it's a compiler issue. Any chance you know what the next steps would be if I feel this is the case at the end of this discussion thread? I.e., who would I contact to confirm this statement is standard compliant, where would I report this potential bug, etc.?

As I mentioned upthread, your code appears to me to conform to the standard, Intel Fortran works fine with it, and the segmentation fault is, as you think, a likely compiler bug.

As to your next steps, my suggestion is as follows:

1. See this link for reporting problems with `gfortran` https://gcc.gnu.org/wiki/GFortran#Reporting_bugs. Follow the steps to initiate an inquiry at the GCC Bugzilla site and see how it goes. You will notice `gfortran` is currently looking for its user community to join the GCC FOSS development and contribute to `gfortran` as there are few developers currently active with it. Perhaps you will join `gfortran` development given your strong background in C, in which case you can even fix the problem yourself. As noted here and elsewhere, prior gfortran developers were themselves users of `gfortran` who have worked very hard to advance the open-source compiler.

2. You can try other compilers such as NAG Fortran if you have or can get its license. Or, you can ask here for some other users who have access to NAG Fortran to give you feedback on the reproducer in the original post. You will know NAG Fortran is reputed to be a good compiler for conformance validation vis-a-vis the Fortran standard. The compiler response using NAG Fortran can possibly guide you more firmly in terms of standard conforming behavior and how to evaluate the gfortran response with your example that is different from Intel Fortran.

3. You can also post at the Fortran Discourse site to gain broader feedback on your issue: https://fortran-lang.discourse.group/

Neil

Dec 11, 2022, 11:29:24 AM
> 2. You can try other compilers such as NAG Fortran if you have or
> can get its license. Or, you can ask here for some other users who
> have access to NAG Fortran to give you feedback on the reproducer in
> the original post. You will know NAG Fortran is reputed to be a
> good compiler for conformance validation vis-a-vis the Fortran
> standard. The compiler response using NAG Fortran can possibly
> guide you more firmly in terms of standard conforming behavior and
> how to evaluate the gfortran response with your example that is
> different from Intel Fortran.


There don't seem to be any problems when NAG Fortran is used (other
than easily fixed complaints about the negative stop labels and the
call to deallocateMyType):

pyb075000011:~> cat main.f90
program main
implicit none

type myType
type(myType), pointer, dimension(:) :: pChild => null()
end type myType

type(myType), pointer, dimension(:) :: root, child

nullify(root, child)
call allocateMyType(root)

call allocateMyType(root(1)%pChild)
call allocateMyType(root(1)%pChild(1)%pChild)

child => root(1)%pChild ! <----- Does NOT cause a seg fault
do while (associated(child))
child => child(1)%pChild ! <----- Causes a seg fault
enddo

! call deallocateMyType(root) ! Details omitted, seg fault happens above

contains
subroutine allocateMyType(pMyType)
implicit none
type(myType), pointer, dimension(:), intent(inout) :: pMyType

if (associated(pMyType)) then
stop 1
endif

allocate(pMyType(2))
if (.not. associated(pMyType)) then
stop 2
endif

if (associated(pMyType(1)%pChild) .or. associated(pMyType(2)%pChild)) then
stop 3
endif
end subroutine allocateMyType
end program main
pyb075000011:~> nagfor -C=all -C=undefined -colour -gline -info -mtrace=off -nan -O0 -strict95 -u -v main.f90
NAG Fortran Compiler Release 7.1(Hanzomon) Build 7101
main.f90:
[NAG Fortran Compiler normal termination]
Loading...
pyb075000011:~> ./a.out
pyb075000011:~>

FortranFan

Dec 11, 2022, 11:46:24 PM
On Sunday, December 11, 2022 at 11:29:24 AM UTC-5, nddtwe...@gmail.com wrote:

> ..
> There don't seem to be any problems when NAG Fortran is used (other
> than easily fixed complaints about the negative stop labels and the
> call to deallocateMyType)


@nddtwe...@gmail.com, that is good feedback which should be very helpful to OP:

Ev. Drikos

Dec 12, 2022, 1:06:00 AM
On 09/12/2022 18:08, Connor B wrote:
Hello,

Having read several replies to your question, I understand that
two commercial compilers successfully run your code.

When I tried to define an intrinsic assignment and replace the
'=>' with a '=' in line 18, I noticed that GNU Fortran doesn't treat
lines 5 and 8 as having type equivalent declarations (unless I make
some mistake here).

Maybe you can try to file a bug report and see what is the answer.

---------------------------------------------------------------------

/usr/local/bin/gfortran -fplugin=dragonegg -g -O0 -finit-derived
ftest.f90 -o ftest
ftest.f90:21.17:

18 child = child(1)%pChild ! <----- Causes a seg fault
1
Error: Actual argument for 'from' must be a pointer at (1)
make: *** [all] Error 1

---------------------------------------------------------------------
program main
2 implicit none

4 type myType
5 type(myType), pointer, dimension(:) :: pChild => null()
6 end type myType
interface assignment(=)
procedure my_assignment
end interface

8 type(myType), pointer, dimension(:) :: root, child

10 nullify(root, child)
11 call allocateMyType(root)

13 call allocateMyType(root(1)%pChild)
14 call allocateMyType(root(1)%pChild(1)%pChild)

16 child => root ! <----- Does NOT cause a seg fault
17 do while (associated(child))
18 child = child(1)%pChild ! <----- Causes a seg fault
19 enddo
!20
21 call deallocateMyType(root) ! Details omitted, seg fault happens above
!22
23 contains
subroutine my_assignment(to, from)
implicit none
type(myType), pointer, dimension(:), intent(out) :: to
type(myType), pointer, dimension(:), intent(in) :: from
allocate( to , source = from)
end subroutine

24 subroutine allocateMyType(pMyType)
25 implicit none
26 type(myType), pointer, dimension(:), intent(inout) :: pMyType

28 if (associated(pMyType)) then
29 stop -1
30 endif

32 allocate(pMyType(2))
33 if (.not. associated(pMyType)) then
34 stop -2
35 endif

37 if (associated(pMyType(1)%pChild) .or. associated(pMyType(2)%pChild)) then
38 stop -3
39 endif
40 end subroutine allocateMyType
41 recursive subroutine deallocateMyType(pMyType)
42 implicit none
43 type(myType), pointer, dimension(:), intent(inout) :: pMyType
44 integer :: idx

46 if (.not. associated(pMyType)) then
47 return
48 endif

50 do idx = lbound(pMyType,1), ubound(pMyType,1)
51 call deallocateMyType(pMyType(idx)%pChild)
52 enddo

54 deallocate(pMyType)
55 nullify(pMyType) ! I think this is redundant following deallocate, but did it for my sanity
56 end subroutine deallocateMyType
57 end program main

Ev. Drikos

Dec 12, 2022, 1:53:09 AM
On 12/12/2022 08:05, Ev. Drikos wrote:
> Hello,
>
> Having read several replies to your question, I understand that
> two commercial compilers successfully run your code.
>
> When I tried to define an intrinsic assignment and replace the
> '=>' with a '=' in line 18, I noticed that GNU Fortran doesn't treat
> lines 5 and 8 as having type equivalent declarations (unless I make
> some mistake here).
>
> Maybe you can try to file a bug report and see what is the answer.

If we wrap instead the pointer assignment statement (line 18) in an
associate construct, then it works as expected. Likely, a bug?

16 child => root ! <----- Does NOT cause a seg fault
17 do while (associated(child))
associate ( next => child(1)%pChild)
18 child => next ! <----- Now does NOT cause a seg fault
end associate
19 enddo

Connor B

Dec 12, 2022, 7:45:20 AM
On Saturday, December 10, 2022 at 1:58:15 PM UTC-8, FortranFan wrote:

> As I mentioned upthread, your code appears to me to conform to the standard, Intel Fortran works fine with it, and the segmentation fault is, as you think, a likely compiler bug.
>
> As to your next steps, my suggestion is as follows:
>
> 1. See this link for reporting problems with `gfortran` https://gcc.gnu.org/wiki/GFortran#Reporting_bugs. Follow the steps to initiate an inquiry at the GCC Bugzilla site and see how it goes. You will notice `gfortran` is currently looking for its user community to join the GCC FOSS development and contribute to `gfortran` as there are few developers currently active with it. Perhaps you will join `gfortran` development given your strong background in C, in which case you can even fix the problem yourself. As noted here and elsewhere, prior gfortran developers were themselves users of `gfortran` who have worked very hard to advance the open-source compiler.
>
> 2. You can try other compilers such as NAG Fortran if you have or can get its license. Or, you can ask here for some other users who have access to NAG Fortran to give you feedback on the reproducer in the original post. You will know NAG Fortran is reputed to be a good compiler for conformance validation vis-a-vis the Fortran standard. The compiler response using NAG Fortran can possibly guide you more firmly in terms of standard conforming behavior and how to evaluate the gfortran response with your example that is different from Intel Fortran.
>
> 3. You can also post at the Fortran Discourse site to gain broader feedback on your issue: https://fortran-lang.discourse.group/

Sounds good. I'll be looking into reporting this bug in the proper channels and, (hopefully) if I find time, I'd love to help contribute to gfortran to solve this problem (as well as help with contributions in general). Thank you for the good information and your thoughts on the matter!

Connor B

Dec 12, 2022, 7:50:40 AM
On Sunday, December 11, 2022 at 8:29:24 AM UTC-8, nddtwe...(at) gmail.com wrote:

> There don't seem to be any problems when NAG Fortran is used (other
> than easily fixed complaints about the negative stop labels and the
> call to deallocateMyType):

Thank you for checking out NAG for me. I don't believe I have access to a copy of NAG so this was very helpful and much appreciated!

As a side note, I didn't realize negative stop codes weren't standard-compliant! I assumed they'd be perfectly fine since, as far as I'm aware, Fortran doesn't (or, at least, didn't) have unsigned int values. The more you know... Thanks again for your message!

FortranFan

Dec 12, 2022, 8:39:42 AM
On Monday, December 12, 2022 at 7:50:40 AM UTC-5, Connor B wrote:

> ..
> As a side note, I didn't realize negative stop codes weren't standard-compliant! I assumed they'd be perfectly fine since, as far as I'm aware, Fortran doesn't (or, at least, didn't) have unsigned int values. The more you know... Thanks again for your message!

@Connor B and @nddtwe...(at) gmail.com,

Please note the Fortran standard does NOT state anything about the stop codes having to be nonnegative or any such. In fact, the current standard allows the stop codes to be either a scalar-int-expr of default kind or a scalar character expression of default kind.

The negative stop codes with the code in the original post are conformant, no problems on that count vis-a-vis the standard. Perhaps there are some Fortran processors out there that may not be able to interpret the values so check with the processor limitations in that case.

--- begin blurb from the standard regarding STOP and ERROR STOP statements ---
11.4 STOP and ERROR STOP statements

R1162 stop-stmt is STOP [ stop-code ] [ , QUIET = scalar-logical-expr ]

R1163 error-stop-stmt is ERROR STOP [ stop-code ] [ , QUIET = scalar-logical-expr ]

R1164 stop-code is scalar-default-char-expr
or scalar-int-expr

C1175 (R1164) The scalar-int-expr shall be of default kind.

1 Execution of a STOP statement initiates normal termination of execution. Execution of an ERROR STOP statement initiates error termination of execution.

2 When an image is terminated by a STOP or ERROR STOP statement, its stop code, if any, is made available in a processor-dependent manner. If the stop-code is an integer, it is recommended that the value be used as the process exit status, if the processor supports that concept. If the stop-code in a STOP statement is of type character or does not appear, or if an end-program-stmt is executed, it is recommended that the value zero be supplied as the process exit status, if the processor supports that concept. If the stop-code in an ERROR STOP statement is of type character or does not appear, it is recommended that a processor-dependent nonzero value be supplied as the process exit status, if the processor supports that concept.

3 If QUIET= is omitted or the scalar-logical-expr has the value false:
• if any exception (17) is signaling on that image, the processor shall issue a warning indicating which exceptions are signaling, and this warning shall be on the unit identified by the named constant ERROR_UNIT from the intrinsic module ISO_FORTRAN_ENV (16.10.2.9);
• if a stop code is specified, it is recommended that it be made available by formatted output to the same unit.

4 If QUIET= appears and the scalar-logical-expr has the value true, no output of signaling exceptions or stop code shall be produced.

NOTE 1
When normal termination occurs on more than one image, it is expected that a processor-dependent summary
of any stop codes and signaling exceptions will be made available.

NOTE 2
If the integer stop-code is used as the process exit status, the processor might be able to interpret only values
within a limited range, or only a limited portion of the integer value (for example, only the least-significant 8 bits).
--- end blurb ---
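
A minimal sketch of the two stop-code forms described in the blurb, assuming a compiler that accepts a negative integer stop code. The program name and the `failed` flag are made up for illustration and do not come from the original post.

```fortran
program stop_code_forms
  implicit none
  logical :: failed

  failed = .true.   ! hypothetical flag; set to .false. to reach the STOP below

  if (failed) then
    ! Integer stop code: a scalar-int-expr of default kind; a negative value is conformant.
    error stop -3
  end if

  ! Character stop code: a scalar character expression of default kind.
  stop "finished normally"
end program stop_code_forms
```

How the value -3 appears as a process exit status, if at all, is processor dependent (see NOTE 2 in the blurb).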

Connor B

Dec 12, 2022, 9:15:54 AM12/12/22
to
On Sunday, December 11, 2022 at 10:06:00 PM UTC-8, Ev. Drikos wrote:
> Having read several replies to your question, I understand that
> two commercial compilers successfully run your code.
>
> When I tried to define an intrinsic assignment and replace the
> '=>' with a '=' in line 18, I noticed that GNU Fortran doesn't treat
> lines 5 and 8 as having type equivalent declarations (unless I make
> some mistake here).
>
> Maybe you can try to file a bug report and see what is the answer.
>
> ---------------------------------------------------------------------
>
> /usr/local/bin/gfortran -fplugin=dragonegg -g -O0 -finit-derived
> ftest.f90 -o ftest
> ftest.f90:21.17:
> 18 child = child(1)%pChild ! <----- Causes a seg fault
> 1
> Error: Actual argument for 'from' must be a pointer at (1)
>
> 4 type myType
> 5 type(myType), pointer, dimension(:) :: pChild => null()
> 6 end type myType
> 23 contains
> subroutine my_assignment(to, from)
> implicit none
> type(myType), pointer, dimension(:), intent(out) :: to
> type(myType), pointer, dimension(:), intent(in) :: from

That is very interesting. I hadn't thought of checking what defined assignment results in, but I figured it shouldn't cause an error since the thing that leads to the seg fault is the pointer descriptor being modified during execution of the statement.

This error reminds me of a similar error I observed when trying to write a generic interface to report the associated/allocated status of a given pointer/allocatable. I found that, if I wrote the procedures with "intent(in)" on the dummy arguments, the interface failed to compile with an error complaining it was ambiguous. However, I could change the intent to "intent(inout)" and everything worked just fine.

I haven't had time to look into it further, but I think it has to do with optimization of argument passing. Presumably, given I've observed scalar-pointers represented as just an address (with no metadata), the compiler uses a scenario like "intent(in)" to pass minimal information about the target, knowing the rest won't be needed, which causes "pointer" and "allocatable" to have the same signature resulting in ambiguity. Not sure if it's related, but I found it interesting so figured I'd mention it here (I provided a quick demo to show what I mean at the bottom).
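
One possible alternative explanation, offered here as an assumption rather than something established in this thread: since Fortran 2008 a non-pointer actual argument with the TARGET attribute may be passed to a POINTER, INTENT(IN) dummy, so an allocatable target could match both specific procedures. The Fortran 2018 distinguishability rule accordingly only counts ALLOCATABLE versus POINTER when the pointer dummy does not have INTENT(IN). A minimal sketch of that argument association (module, procedure, and variable names are hypothetical):

```fortran
module show_mod
  implicit none
contains
  subroutine show(p)
    ! Pointer dummy with INTENT(IN): the association status may not be changed here.
    integer, pointer, dimension(:), intent(in) :: p
    write(*,*) "associated(p) = ", associated(p), " size(p) = ", size(p)
  end subroutine show
end module show_mod

program main
  use show_mod
  implicit none
  integer, allocatable, target, dimension(:) :: a

  allocate(a(3))
  a = 0

  ! The actual argument is an allocatable target, not a pointer.
  ! Under Fortran 2008 rules the dummy pointer becomes associated with it.
  call show(a)

  deallocate(a)
end program main
```

If this compiles, a generic interface containing both a POINTER, INTENT(IN) specific and an ALLOCATABLE specific could not tell which one `call show(a)` refers to, which would be consistent with the ambiguity error shown at the bottom.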

I tried to mess with the intent specification after duplicating your "my_assignment" definition to see if that was at play here, but I ran into the issue of defined assignment having requirements on the intent specifications so it possibly isn't related...

However, I did find some additional interesting behavior playing around with defined/intrinsic assignment.

Interestingly, when I first duplicated your "my_assignment" definition, I only changed the "child => root(1)%pChild" line (just above the do-while loop) to assignment and this compiled just fine. It wasn't until I also changed "child => child(1)%pChild" that the error you reported emerged.

Even more interestingly, I removed the definition of "my_assignment" to let intrinsic assignment generate the definition, and this compiled/ran "just fine"... (tested using gfortran 9.3.1)

I say "just fine" because there were warnings and a seg fault, but they make sense. For the warnings, there were a bunch produced (truncated below), but they seem to be reported using the internal representation of a pointer! As for execution, there was a seg fault in the while loop (at L18), but that isn't surprising because "child" (the pointer) was never associated (only a subobject was defined).

Whatever is happening here, it's all very interesting indeed and possibly another clue as to what is going on in my original post. I'll be sure to carry this information forward, thanks for pointing it out!

!############## Using Compiler Generated Derived-type Assignment ###############
! ...
16 child = root ! <----- Does NOT cause a seg fault
17 do while (associated(child))
18 child = child(1)%pChild ! <----- Causes a seg fault
! ...

$ gfortran -g3 -ggdb3 -O0 -Wall -Wextra -fbounds-check -pedantic-errors main.f90

main.f90:29:0:

18 | child = child(1)%pChild
|

Warning: 'child.offset' may be used uninitialized in this function [-Wmaybe-uninitialized]
main.f90:29:0: Warning: 'child.dim[0].stride' may be used uninitialized in this function [-Wmaybe-uninitialized]
main.f90:29:0: Warning: 'child.dim[0].stride' may be used uninitialized in this function [-Wmaybe-uninitialized]
main.f90:29:0: Warning: 'child.offset' may be used uninitialized in this function [-Wmaybe-uninitialized]
main.f90:29:0: Warning: 'child.dim[0].stride' may be used uninitialized in this function [-Wmaybe-uninitialized]

$ ./a.out

At line 23 of file main.f90

Fortran runtime error: Array bound mismatch for dimension 1 of array 'child' (0/2)

Error termination. Backtrace:
#0 0x2ad42c438d8a in ???
#1 0x2ad42c439925 in ???
#2 0x2ad42c439cf7 in ???
#3 0x401c4b in MAIN__



!###################### (Ambiguous) Generic Interface(s) #######################

!############## Scenario #1 - "intent(in)" -> ambiguous interface ##############

module ambiguousInterfaceMOD
interface AmbiguousGenericInterface
module procedure myPointerRoutine
module procedure myAllocatableRoutine
end interface AmbiguousGenericInterface
contains
subroutine myPointerRoutine(myPointerIn)
implicit none
integer, pointer, dimension(:), intent(in) :: myPointerIn
end subroutine myPointerRoutine
!----------------------------------------------------------------------------
subroutine myAllocatableRoutine(myAllocatableIn)
implicit none
integer, allocatable, dimension(:), intent(in) :: myAllocatableIn
end subroutine myAllocatableRoutine
end module ambiguousInterfaceMOD

$ gfortran --version
GNU Fortran (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)

$ gfortran -Wall -Wextra -pedantic-errors -c ambiguousInterfaceMOD.f90
ambiguousInterfaceMOD.f90:7:29:

7 | subroutine myPointerRoutine(myPointerIn)
| 1
......
12 | subroutine myAllocatableRoutine(myAllocatableIn)
| 2
Error: Ambiguous interfaces in generic interface 'ambiguousgenericinterface' for 'mypointerroutine' at (1) and 'myallocatableroutine' at (2)



!############## Scenario #2 - "intent(inout)" -> valid interface ###############

module ambiguousInterfaceMOD
interface AmbiguousGenericInterface
module procedure myPointerRoutine
module procedure myAllocatableRoutine
end interface AmbiguousGenericInterface
contains
subroutine myPointerRoutine(myPointerIn)
implicit none
integer, pointer, dimension(:), intent(inout) :: myPointerIn
write(*,*) "associated(myPointerIn) = ",associated(myPointerIn)
end subroutine myPointerRoutine
!----------------------------------------------------------------------------
subroutine myAllocatableRoutine(myAllocatableIn)
implicit none
integer, allocatable, dimension(:), intent(inout) :: myAllocatableIn
write(*,*) "allocated(myAllocatableIn) = ",allocated(myAllocatableIn)
end subroutine myAllocatableRoutine
end module ambiguousInterfaceMOD

program main
use ambiguousInterfaceMOD
implicit none
integer, pointer, dimension(:) :: myPointer
integer, allocatable, dimension(:) :: myAllocatable
write(*,*) "Hello, World"
call AmbiguousGenericInterface(myPointer)
call AmbiguousGenericInterface(myAllocatable)
write(*,*) "Goodbye, World"
end program main

$ gfortran --version
GNU Fortran (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)

$ gfortran -Wall -Wextra -pedantic-errors ambiguousInterfaceMOD.f90

$ ./a.out
Hello, World
associated(myPointerIn) = T
allocated(myAllocatableIn) = F
Goodbye, World

Connor B

Dec 12, 2022, 9:21:22 AM12/12/22
to
On Sunday, December 11, 2022 at 10:53:09 PM UTC-8, Ev. Drikos wrote:
> If we wrap instead the pointer assignment statement (line 18) in an
> associate construct, then it works as expected. Likely, a bug?
> 16 child => root ! <----- Does NOT cause a seg fault
> 17 do while (associated(child))
> associate ( next => child(1)%pChild)
> 18 child => next ! <----- now not Causes a seg fault
> end associate
> 19 enddo

I think what you've stumbled on here is incidental (at least, the use of the associate construct being the fix).

Looking at the snippet you provide, your pointer-assignment-stmt is now "next => child(1)%pChild" which is more or less the same form as "child => root(1)%pChild" which hasn't been causing issues. That is, the LHS and RHS of this statement no longer share a reference to the same pointer object, so there is no issue with corruption of the pointer descriptor during the statement's execution.
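
A minimal sketch of the auxiliary-pointer form of the loop, assuming the same "myType" definition as in the original post. The "next" pointer, the "depth" counter, and the inline allocations are made up for illustration; the point is only that no single pointer-assignment-stmt has the same pointer on both sides.

```fortran
program aux_pointer_walk
  implicit none

  type myType
    type(myType), pointer, dimension(:) :: pChild => null()
  end type myType

  type(myType), pointer, dimension(:) :: root, child, next
  integer :: depth

  ! Build a three-level chain: root -> pChild -> pChild
  allocate(root(1))
  allocate(root(1)%pChild(1))
  allocate(root(1)%pChild(1)%pChild(1))

  depth = 0
  child => root
  do while (associated(child))
    depth = depth + 1
    next  => child(1)%pChild   ! RHS refers to child, LHS is a different pointer
    child => next              ! LHS is child, RHS no longer refers to child
  end do
  write(*,*) "levels visited: ", depth

  ! Release the chain from the innermost level outward
  deallocate(root(1)%pChild(1)%pChild)
  deallocate(root(1)%pChild)
  deallocate(root)
end program aux_pointer_walk
```

With the three allocations above, the loop visits three levels before "child" becomes disassociated.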

Thank you for trying out different permutations of the code and reporting what you're finding though! The more information that is gained about this situation, the better!

Connor B

Dec 12, 2022, 9:27:45 AM12/12/22
to
On Monday, December 12, 2022 at 5:39:42 AM UTC-8, FortranFan wrote:
> Please note the Fortran standard does NOT state anything about the stop codes having to be nonnegative or any such. In fact, the current standard allows the stop codes to be either a scalar-int-expr of default kind or a scalar character expression of default kind.
>
> The negative stop codes with the code in the original post are conformant, no problems on that count vis-a-vis the standard. Perhaps there are some Fortran processors out there that may not be able to interpret the values so check with the processor limitations in that case.
>
> --- begin blurb from the standard regarding STOP and ERROR STOP statements ---
> 12 11.4 STOP and ERROR STOP statements
>
> NOTE 2
> If the integer stop-code is used as the process exit status, the processor might be able to interpret only values
> within a limited range, or only a limited portion of the integer value (for example, only the least-significant 8 bits).
> --- end blurb ---

Ahh, thank you for catching that and providing the relevant snippets from the standard. It is very much appreciated!

Ron Shepard

Dec 12, 2022, 1:00:04 PM12/12/22
to
On 12/12/22 7:39 AM, FortranFan wrote:
[...]
> 20 2 When an image is terminated by a STOP or ERROR STOP statement, its stop code, if any, is made available
> 21 in a processor-dependent manner.
[...]

Here is some text I found online regarding POSIX stop codes:

"An exit code, or sometimes known as a return code, is the code returned
to a parent process by an executable. On POSIX systems the standard exit
code is 0 for success and any number from 1 to 255 for anything else.
Exit codes can be interpreted by machine scripts to adapt in the event
of successes of failures."

Here is a test case with gfortran on MacOS, which is POSIX compliant:

$ cat stop.f90
write(*,*) "Hello World"
error stop -huge(0)
end
$ gfortran -g stop.f90 && a.out
Hello World
ERROR STOP -2147483647

Error termination. Backtrace:
#0 0x103c797e2 in ???
#1 0x103c7a4b5 in ???
#2 0x103c7b90e in ???
#3 0x103c6fe93 in MAIN__
at shepard/misc/work/stop.f90:2
#4 0x103c6fecc in main
at shepard/misc/work/stop.f90:3
$ echo $?
1

As you can see, gfortran prints out the correct value (and all other
values that I tried), but the actual return code that is seen by the
processor is the low-order 8 bits of that value treated as an unsigned
byte. The dangerous situation occurs when the stop code mod 256 is zero.
In those cases, testing the return status $? will not catch the error
that is generated. In all other cases, $? would show an error, but
perhaps not the value that was intended.

So for portable code, it is probably best to stick with values in the
range 0 to 255.
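
A small sketch of that truncation, assuming the operating system keeps only the low-order 8 bits of the stop code as an unsigned byte. The program name and the list of sample codes are chosen for illustration; MODULO with a positive divisor always returns a nonnegative result, so it models the unsigned byte for negative codes too.

```fortran
program exit_status_demo
  implicit none
  integer, parameter :: codes(5) = [1, 256, 512, -1, -huge(0)]
  integer :: i

  do i = 1, size(codes)
    ! modulo(code, 256) is the value a shell would report in $? under the
    ! low-8-bits assumption; 0 means the error would go unnoticed.
    write(*,'(a,i0,a,i0)') "stop code ", codes(i), &
                           " -> exit status ", modulo(codes(i), 256)
  end do
end program exit_status_demo
```

For these inputs the sketch reports 1, 0, 0, 255, and 1 respectively, so stop codes 256 and 512 would look like success to a script that tests $?.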

$.02 -Ron Shepard



jfh

Dec 12, 2022, 3:22:30 PM12/12/22
to
A long time ago I had access to the NAG Fortran compiler. I was surprised that if I specified an integer as a stop code it was printed in base 8. I don't know if NAG still does that.

Robin Vowels

Dec 12, 2022, 6:00:29 PM12/12/22
to
On Tuesday, December 13, 2022 at 7:22:30 AM UTC+11, jfh wrote:
.
> A long time ago I had access to the NAG Fortran compiler. I was surprised that if I specified an integer as a stop code it was printed in base 8. I don't know if NAG still does that.
.
Should be OK for integers in the range 0-7.

gah4

Dec 12, 2022, 7:06:54 PM12/12/22
to
By Fortran 2018, there doesn't seem to be anything wrong with negative values.

It is suggested that the value be used as the return code, or exit code, depending
on how the OS does that. Many systems allow an 8 bit code, taking the low
bits of the value. That works fine for negative values, too.

It can also be a character expression, in which case it is not used
for the exit code.

Older Fortran standards might have had more restrictions.
Compilers might be slow at adapting to the new version.

Unix uses an 8 bit return code, as do many other OS that I know about.
Tradition is low 8 bits of the integer (usually two's complement) value.

VMS has a fun system, where it prints an error message depending
on the code specified. IBM OS, from S/360 through its descendants,
use 0 for success, 4 for warning, and 8 for error. I had a program that
used 8 for its exit code.

Run on VMS, it prints ACCVIO access violation ...
when it ends. But there is no access violation, that is just error
number 8 in the table of errors.



Ev. Drikos

Dec 12, 2022, 11:37:16 PM12/12/22
to
On 12/12/2022 16:21, Connor B wrote:
> On Sunday, December 11, 2022 at 10:53:09 PM UTC-8, Ev. Drikos wrote:
>> If we wrap instead the pointer assignment statement (line 18) in an
>> associate construct, then it works as expected. Likely, a bug?
>> 16 child => root ! <----- Does NOT cause a seg fault
>> 17 do while (associated(child))
>> associate ( next => child(1)%pChild)
>> 18 child => next ! <----- now not Causes a seg fault
>> end associate
>> 19 enddo
>
> I think what you've stumbled on here is incidental (at least, the use of the associate construct being the fix).

Of course it may not be the fix. Yet it allows, e.g., gfortran8 to see the
POINTER attribute and compile my previous example as valid.


$ gfortran8 ftest.f90 -o ftest && ./ftest
$
-------------------- ftest.f90 -------------------------
program main
2 implicit none

4 type myType
5 type(myType), pointer, dimension(:) :: pChild => null()
6 end type myType
interface assignment(=)
procedure my_assignment
end interface

8 type(myType), pointer, dimension(:) :: root, child


10 nullify(root, child)
11 call allocateMyType(root)

13 call allocateMyType(root(1)%pChild)
14 call allocateMyType(root(1)%pChild(1)%pChild)

16 child => root ! <----- Does NOT cause a seg fault
17 do while (associated(child))
print *, "looping"
18 associate( next => child(1)%pChild);
print *, "associated"
child=next;
print *, "looped"
end associate
19 enddo
!20
21 call deallocateMyType(root) ! Details omitted, seg fault happens above
!22
23 contains
subroutine my_assignment(to, from)
implicit none
type(myType), pointer, dimension(:), intent(out) :: to
type(myType), pointer, dimension(:), intent(in) :: from
to => from

jfh

Dec 13, 2022, 3:36:14 PM12/13/22
to
I hadn't thought it necessary to say that the difference was detected by specifying a number larger than 7. I never tried a negative stop code.

Robin Vowels

Dec 13, 2022, 7:32:32 PM12/13/22
to
.
my reply was intended to be a joke.

gah4

Dec 14, 2022, 3:49:33 AM12/14/22
to
On Tuesday, December 13, 2022 at 4:32:32 PM UTC-8, Robin Vowels wrote:

(snip)

> my reply was intended to be a joke.

So many of your answers don't seem like jokes.

If they are intended to be jokes, put a :) on them.

Ev. Drikos

Dec 14, 2022, 8:26:42 AM12/14/22
to
On 13/12/2022 06:37, Ev. Drikos wrote:
> On 12/12/2022 16:21, Connor B wrote:
>> On Sunday, December 11, 2022 at 10:53:09 PM UTC-8, Ev. Drikos wrote:
>>> If we wrap instead the pointer assignment statement (line 18) in an
>>> associate construct, then it works as expected. Likely, a bug?
>>> 16 child => root ! <----- Does NOT cause a seg fault
>>> 17 do while (associated(child))
>>> associate ( next => child(1)%pChild)
>>> 18 child => next ! <----- now not Causes a seg fault
>>> end associate
>>> 19 enddo
>>
>> I think what you've stumbled on here is incidental (at least, the use
>> of the associate construct being the fix).
>
> Of course it may not be the fix. Yet, it allows ie gfortran8 to see the
> POINTER attribute and compile my previous example as valid.

Acknowledged; it may well be incidental.

Ev. Drikos