Need debugging help!

Paul van Delst

Oct 23, 2012, 4:21:08 PM

Hi all,

I'm looking for some debugging help as I've pretty much run out of
ideas. Here's the situation:

Compiler: ifort v12.0.4
System: Linux cluster

We have some code that crashes in a unit test with the error
forrtl: error (65): floating invalid
when compiled with any of the following:
-O3 -traceback -fpconstant -real-size 32 -integer-size 32 -openmp
-O3 -traceback -fpconstant -real-size 64 -integer-size 64 -openmp
-O3 -traceback -fpconstant -real-size 64 -integer-size 32 -openmp

The offending line -- well, the line indicated in the traceback output
-- is line #224 below:

218 DO K=1,KM
219 IBO(K)=IBI(K)
220 DO N=1,NO
221 LO(N,K)=WO(N,K).GE.PMP*NB4
222 IF(LO(N,K)) THEN
223 ! print*,'in loop ',n,k,GO(N,K),WO(N,K)
224 GO(N,K)=GO(N,K)/WO(N,K)
225 ELSE
226 IBO(K)=1
227 GO(N,K)=0.
228 ENDIF
229 ENDDO
230 ENDDO

the associated definitions are:

99 INTEGER, INTENT(IN ) :: IBI(KM), KM
...
101 INTEGER, INTENT( OUT) :: IBO(KM), NO
102C
...
104 LOGICAL*1, INTENT( OUT) :: LO(MO,KM)
...
107 REAL, INTENT( OUT) :: GO(MO,KM)
...
112 INTEGER :: K, N
113 INTEGER :: NB, NB1, NB2, NB3, NB4
...
115 REAL, ALLOCATABLE :: CROT(:),SROT(:),WO(:,:)
116 REAL :: PMP

- If the print statement at line #223 is UNcommented, the code runs to
completion.
- If the compiler flag "-fpe0" is used (should cause execution to stop
on divide-by-zero), the code runs to completion.
- If debug flag "-g" is included in the compile, the code runs to
completion.

Seems to me to be a classic Heisenbug.

The issue we have now is that as soon as we try anything to figure out
where the problem is (print statements, or using a debugger) the code
runs to completion.

So I'm appealing to the clf group for any debugging ideas. I realise the
above is not a lot of information to go on -- there are a bunch of other
avenues we are looking into, e.g. linked in libraries being the cause,
or catalyst -- but I'm nearing my wits' end. I'm hoping there's some tip
and/or trick that I've overlooked.

Thanks in advance for any info.

cheers,

paulv

michael...@compuserve.com

Oct 23, 2012, 5:41:00 PM

Paul,
What's the relationship between NO and MO?

Regards,

Mike Metcalf

mecej4

Oct 23, 2012, 5:47:21 PM

It is possible that the invalid floating point value(s) was generated in
another portion of your code than the small segment that you showed; in
particular, places where the variables GO and WO were set, or were
clobbered because of nearby arrays being run out of bound.
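
One way to test this without a debugger is to scan the operands just before line 218. The following is a minimal sketch, not the code from the library in question: the module and function names are hypothetical, assumed-shape arguments stand in for the real GO and WO, and the IEEE_ARITHMETIC intrinsic module is assumed to be supported. Like the PRINT at line 223, adding such a call may itself change the behavior.

```fortran
module division_precheck
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan, ieee_is_finite
  implicit none
contains
  ! Counts the elements of the leading no-by-km block for which go/wo
  ! would be an IEEE invalid operation: a NaN operand, 0/0 or Inf/Inf.
  integer function count_suspect(go, wo, no, km) result(nbad)
    real, intent(in)    :: go(:,:), wo(:,:)
    integer, intent(in) :: no, km
    integer :: n, k
    nbad = 0
    do k = 1, km
      do n = 1, no
        if (ieee_is_nan(go(n,k)) .or. ieee_is_nan(wo(n,k))) then
          nbad = nbad + 1                      ! NaN operand
        else if (go(n,k) == 0.0 .and. wo(n,k) == 0.0) then
          nbad = nbad + 1                      ! 0/0
        else if (.not. ieee_is_finite(go(n,k)) .and. &
                 .not. ieee_is_finite(wo(n,k))) then
          nbad = nbad + 1                      ! Inf/Inf
        end if
      end do
    end do
  end function count_suspect
end module division_precheck
```

A result greater than zero from count_suspect(GO, WO, NO, KM) would point at values set (or clobbered) before the loop rather than at the division itself.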

To establish a compiler bug on the basis of a small segment of code, one
may need to examine the compiled code at the assembler level, and the
existence of such bugs is tough to prove since using a symbolic debugger
(for which one may need to compile with symbols ON, using -g or /Zi) may
make the bug go away or hide.

-- mecej4

Dick Hendrickson

Oct 23, 2012, 5:55:54 PM

On 10/23/12 3:21 PM, Paul van Delst wrote:
> Hi all,
>
> I'm looking for some debugging help as I've pretty much run out of
> ideas. Here's the situation:
>
> Compiler: ifort v12.0.4
> System: Linux cluster
>
> We have some code that crashes in a unit test with the error
> forrtl: error (65): floating invalid
> when compiled with any of the following:
> -O3 -traceback -fpconstant -real-size 32 -integer-size 32 -openmp
> -O3 -traceback -fpconstant -real-size 64 -integer-size 64 -openmp
> -O3 -traceback -fpconstant -real-size 64 -integer-size 32 -openmp
>
> The offending line -- well, the line indicated in the traceback output
> -- is line #224 below:
>
> 218 DO K=1,KM
> 219 IBO(K)=IBI(K)
> 220 DO N=1,NO
> 221 LO(N,K)=WO(N,K).GE.PMP*NB4
> 222 IF(LO(N,K)) THEN
> 223 ! print*,'in loop ',n,k,GO(N,K),WO(N,K)
> 224 GO(N,K)=GO(N,K)/WO(N,K)

Is it possible that the compiler is starting the divide before it
completes the test for LO(N,K)? They aren't supposed to, but sometimes
aggressive optimizers get carried away with code motion. Putting in
the PRINT will surely prevent code motion. That's a pretty weak guess
from me. You'd have to look at the code (and possibly the hardware
reference manual) to tell. It's also possible the hardware is doing
some sort of out of order speculative execution while the IF test is
going on.
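
If early evaluation of the division is suspected, one hedged workaround is to make the division safe for every element, so that it no longer matters whether it runs before the test. A minimal sketch with hypothetical names (normalize, and thresh standing in for PMP*NB4), using whole-array syntax instead of the original loops; the IBO update is left out and thresh is assumed to be positive:

```fortran
module guarded_divide
  implicit none
contains
  ! Divides go by wo where wo >= thresh and sets go to zero elsewhere.
  subroutine normalize(go, wo, thresh, lo)
    real, intent(inout)  :: go(:,:)
    real, intent(in)     :: wo(:,:), thresh
    logical, intent(out) :: lo(:,:)
    real :: safe_wo(size(wo,1), size(wo,2))
    lo = wo >= thresh
    ! Replace rejected divisors by 1.0 so that evaluating the division
    ! for every element, as a vectorized loop may do, is harmless.
    safe_wo = merge(wo, 1.0, lo)
    go = merge(go / safe_wo, 0.0, lo)
  end subroutine normalize
end module guarded_divide
```

With a positive threshold the divisor is never zero in either branch, so a 0/0 cannot occur even if the compiler evaluates the quotient unconditionally.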

Dick Hendrickson

Robin Vowels

Oct 23, 2012, 6:59:17 PM

The typical cause of your symptoms is a subscript out of range.
In this case, the value(s) being stored would be destroying the code.
You need to have each and every run-time check enabled.

You might have an uninitialized variable.
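
For the uninitialized-variable case, one cheap trick is to fill work arrays with NaN right after allocation, so that any element read before being assigned shows up in the results. A minimal sketch, assuming the compiler supports the IEEE_ARITHMETIC intrinsic module; the module and procedure names are hypothetical:

```fortran
module poison_fill
  use, intrinsic :: ieee_arithmetic
  implicit none
contains
  ! Fills a with quiet NaN so that unassigned elements are easy to spot.
  subroutine poison(a)
    real, intent(out) :: a(:,:)
    a = ieee_value(1.0, ieee_quiet_nan)
  end subroutine poison

  ! Returns .true. if any element of a is still NaN.
  logical function any_poison(a)
    real, intent(in) :: a(:,:)
    any_poison = any(ieee_is_nan(a))
  end function any_poison
end module poison_fill
```

Calling poison(WO) directly after the ALLOCATE and any_poison(WO) just before line 218 would show whether every element was actually set. A quiet NaN is used here; whether a signaling NaN traps at first use depends on the processor and the compiler options.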

Gordon Sande

Oct 23, 2012, 6:59:19 PM

On 2012-10-23 17:21:08 -0300, Paul van Delst said:

> Hi all,
>
> I'm looking for some debugging help as I've pretty much run out of
> ideas. Here's the situation:
>
> Compiler: ifort v12.0.4
> System: Linux cluster
>
> We have some code that crashes in a unit test with the error
> forrtl: error (65): floating invalid
> when compiled with any of the following:
> -O3 -traceback -fpconstant -real-size 32 -integer-size 32 -openmp
> -O3 -traceback -fpconstant -real-size 64 -integer-size 64 -openmp
> -O3 -traceback -fpconstant -real-size 64 -integer-size 32 -openmp

What happens if you lower the optimization level? And use the rest of
the debugging options like subscript checking?

If the problem is optimization level dependent then ask the vendor.
You can even give them the before and after of a change of just
the optimization. But they may not be too happy if the code is of
any size.

glen herrmannsfeldt

Oct 23, 2012, 7:24:02 PM

Paul van Delst <paul.v...@noaa.gov> wrote:

> I'm looking for some debugging help as I've pretty much run out of
> ideas. Here's the situation:

> Compiler: ifort v12.0.4
> System: Linux cluster

> We have some code that crashes in a unit test with the error
> forrtl: error (65): floating invalid

The causes of "floating invalid" for the intel x87 processor are
supposed to be:

• SNaN operand in any floating-point operation or math function call
• Division of zeroes: (+/-0.0)/(+/-0.0)
• Sum of Infinities having different signs: Infinity + (-Infinity)
• Difference of Infinities having the same sign: (+/-Infinity) - (+/-Infinity)
• Product of signed Infinities with zero: (+/-Inf) * 0
• Math Function Domain Errors: log(negative), sqrt(negative), asin(|x|>1)

If you google for "forrtl: error (65): floating invalid" you find
some other optimization related reports.

Some of those could be related to values coming in from previous
expressions, and being tested differently by different
optimizations.

Note that divide by zero, for non-zero dividend, does not
generate this exception.
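
The distinction between 0/0 and x/0 can be checked with the standard IEEE exception flags. This is a minimal sketch, assuming a compiler that supports the IEEE_EXCEPTIONS intrinsic module and a build in which halting on these exceptions is disabled; the module and function names are hypothetical:

```fortran
module fp_flag_demo
  use, intrinsic :: ieee_exceptions
  implicit none
contains
  ! Returns .true. if computing num/den raises the given IEEE flag.
  logical function division_raises(num, den, flag)
    real, intent(in) :: num, den
    type(ieee_flag_type), intent(in) :: flag
    ! Volatile copies keep the division from being folded at compile
    ! time or moved ahead of the call that clears the flag.
    real, volatile :: a, b, q
    a = num
    b = den
    call ieee_set_flag(flag, .false.)
    q = a / b
    call ieee_get_flag(flag, division_raises)
  end function division_raises
end module fp_flag_demo
```

On IEEE-conforming hardware, division_raises(0.0, 0.0, ieee_invalid) should be true, while 1.0/0.0 should raise ieee_divide_by_zero and leave ieee_invalid clear.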

-- glen

glen herrmannsfeldt

Oct 23, 2012, 7:30:27 PM

Paul van Delst <paul.v...@noaa.gov> wrote:

> I'm looking for some debugging help as I've pretty much run out of
> ideas. Here's the situation:

> Compiler: ifort v12.0.4
> System: Linux cluster

> We have some code that crashes in a unit test with the error
> forrtl: error (65): floating invalid

In addition to the ones I previously posted, it seems that this
might also result from problems in the floating point stack.

The ifort compiler options: -fpstkchk (Linux* and Mac OS* X)
or /Qfpstkchk (Windows) have the compiler check the stack
on function call and/or return.

The problem comes when a function return type is wrong.
Floating point return values are on the floating point
stack, and are popped by the caller. If the caller declares
the wrong type, then it doesn't get popped, causing stack
overflow at an unexpected point.
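
A return-type mismatch of this kind cannot happen when the function lives in a module, because the caller then takes the result type from the module instead of declaring it. A minimal sketch with a hypothetical function (scale_factor is an invented name; PMP and NB4 are borrowed from the posted declarations only for illustration):

```fortran
module scale_mod
  implicit none
contains
  ! The result type is part of the explicit interface, so a caller
  ! that USEs this module cannot declare it differently.
  real function scale_factor(pmp, nb4)
    real, intent(in)    :: pmp
    integer, intent(in) :: nb4
    scale_factor = pmp * real(nb4)
  end function scale_factor
end module scale_mod
```

A caller that has "use scale_mod" and also tries to declare scale_factor with another type gets a compile-time error rather than a corrupted floating-point stack at run time.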

-- glen

Robert Miles

Oct 24, 2012, 12:21:12 AM

On Tuesday, October 23, 2012 3:21:09 PM UTC-5, Paul van Delst wrote:
> Hi all,
>
> I'm looking for some debugging help as I've pretty much run out of
> ideas. Here's the situation:
>
> Compiler: ifort v12.0.4
> System: Linux cluster
>
> We have some code that crashes in a unit test with the error
> forrtl: error (65): floating invalid
> when compiled with any of the following:
> -O3 -traceback -fpconstant -real-size 32 -integer-size 32 -openmp
> -O3 -traceback -fpconstant -real-size 64 -integer-size 64 -openmp
> -O3 -traceback -fpconstant -real-size 64 -integer-size 32 -openmp
>
> The offending line -- well, the line indicated in the traceback output
> -- is line #224 below:
>
> 218 DO K=1,KM
> 219 IBO(K)=IBI(K)
> 220 DO N=1,NO
> 221 LO(N,K)=WO(N,K).GE.PMP*NB4
> 222 IF(LO(N,K)) THEN
> 223 ! print*,'in loop ',n,k,GO(N,K),WO(N,K)

You might check whether that compiler considers n to be another name for N, and
k to be another name for K.

Also, are all the variables in that line (and the line above, and the line below) initialized to some known values?

Richard Maine

Oct 24, 2012, 12:42:04 AM

Robert Miles <robertm...@gmail.com> wrote:

> You might check whether that compiler considers n to be another name for
> N, and k to be another name for K.

As of f90, that is required by the standard. I'm quite confident that
ifort gets that right.

It was fairly typical practice prior to f90, although not 100%
universal. F90 does not require that the compiler allow lower case at
all, but if it is allowed, the standard requires that lower case letters
be considered equivalent to upper case ones except in a character
context.

As of f2003, the standard stopped waffling about lower case and just
plain required it.

--
Richard Maine
email: last name at domain . net
domain: summer-triangle

Ian Harvey

Oct 24, 2012, 1:09:11 AM

On 2012-10-24 7:21 AM, Paul van Delst wrote:
> Hi all,
>
> I'm looking for some debugging help as I've pretty much run out of
> ideas. Here's the situation:
>
> Compiler: ifort v12.0.4
> System: Linux cluster
>
> We have some code that crashes in a unit test with the error
> forrtl: error (65): floating invalid
> when compiled with any of the following:
> -O3 -traceback -fpconstant -real-size 32 -integer-size 32 -openmp
> -O3 -traceback -fpconstant -real-size 64 -integer-size 64 -openmp
> -O3 -traceback -fpconstant -real-size 64 -integer-size 32 -openmp
>

What happens if you add -fp-model source [and further, -fp-model except]
to your command line options?

Phillip Helbig---undress to reply

Oct 24, 2012, 1:12:30 AM

In article <1ksftev.1vh8bhp156t0c2N%nos...@see.signature>,

nos...@see.signature (Richard Maine) writes:

> > You might check whether that compiler considers n to be another name for
> > N, and k to be another name for K.
>
> As of f90, that is required by the standard. I'm quite confident that
> ifort gets that right.

Right.

> It was fairly typical practice prior to f90, although not 100%
> universal. F90 does not require that the compiler allow lower case at
> all, but if it is allowed, the standard requires that lower case letters
> be considered equivalent to upper case ones except in a character
> context.

Right.

> As of f2003, the standard stopped waffling about lower case and just
> plain required it.

Right, but requires it to be equivalent to upper case. There was never
any version of the standard which allowed n and N to be different.
Before F90, everything was uppercase, so lower case was non-standard.
Maybe some compiler treated n and N differently, but it was non-standard.

Richard Maine

Oct 24, 2012, 1:58:46 AM

Phillip Helbig---undress to reply <hel...@astro.multiCLOTHESvax.de>
wrote:

> There was never
> any version of the standard which allowed n and N to be different.
> Before F90, everything was uppercase, so lower case was non-standard.
> Maybe some compiler treated n and N differently, but it was non-standard.

Well, right, but allow me to clarify what seems to me a little ambiguous
in the way you stated that. When talking about whether something is
nonstandard, we sometimes have to be careful to say whether we are
talking about a program or a compiler being nonstandard.

Prior to f90, a program that used both upper and lower case was
nonstandard. But allowing both upper and lower case did not make a
compiler nonstandard. That would count as an extension rather than a
violation of the standard by the compiler.

It was a fairly common extension and most compilers with such an
extension treated upper and lower case as equivalent outside of
character contexts. There were, however, a few exceptions. I couldn't
cite specific compilers at the moment, but I do recall that some
existed.

Ron Shepard

Oct 24, 2012, 2:55:57 AM

In article <1ksfwuc.6adnsy1ghoz0wN%nos...@see.signature>,

nos...@see.signature (Richard Maine) wrote:

> It was a fairly common extension and most compilers with such an
> extension treated upper and lower case as equivalent outside of
> character contexts. There were, however, a few exceptions. I couldn't
> cite specific compilers at the moment, but I do recall that some
> existed.

There were one or two f77 compilers for the Macintosh that treated
upper and lower case as distinct by default (e.g. like C), but there
were compiler options that folded the case so that it acted like
everyone else's extension. These would be the Language Systems and
maybe the ABSOFT compilers. When f90 was supported, the default was
switched to be standard conforming (although by then, I think ABSOFT
had bought out the other compiler company).

$.02 -Ron Shepard

Ian Bush

Oct 24, 2012, 4:19:19 AM

That was one of my thoughts. I remember getting caught out on a Cray 1
in a similar way with something like

Real Function j0( x )

Real x

If( x .eq. 0.0 ) Then
j0 = 1.0
Else
j0 = Sin( x ) / x
End If

End

when x was zero

I assume Paul has tried bounds checking, so my other thought is an
uninitialized variable - Paul, do you have nagfor available which can
check for that?

Ian

glen herrmannsfeldt

Oct 24, 2012, 4:32:19 AM

Phillip Helbig---undress to reply <hel...@astro.multiclothesvax.de> wrote:

(snip, someone wrote)

>> As of f2003, the standard stopped waffling about lower case and just
>> plain required it.

> Right, but requires it to be equivalent to upper case. There was never
> any version of the standard which allowed n and N to be different.
> Before F90, everything was uppercase, so lower case was non-standard.
> Maybe some compiler treated n and N differently, but it was non-standard.

Fortran 66 specifically gives the upper case alphabet, 10 digits,
and 11 special characters. As far as I can tell, there is no
restriction against extensions to the character set.

The Fortran 66 character set includes $, though with no defined use.
Some IBM compilers treat $ as the 27th letter, an extension that
doesn't seem to violate the standard.

So, yes it would be an extension, and so non-standard, but I don't
see a restriction against it.

-- glen

Louis Krupp

Oct 24, 2012, 6:31:13 AM

On Tue, 23 Oct 2012 16:21:08 -0400, Paul van Delst
<paul.v...@noaa.gov> wrote:

<snip>

>- If debug flag "-g" is included in the compile, the code runs to
>completion.

<snip>

You can run the program in the debugger even if it wasn't compiled for
debug. It's a little harder to guess what's going on, but sometimes
it's the only way.

Louis

Paul van Delst

Oct 24, 2012, 9:12:26 AM

Hello,

On 10/23/12 17:47, mecej4 wrote:
> On 10/23/2012 3:21 PM, Paul van Delst wrote:
>> Hi all,
>>
>> I'm looking for some debugging help as I've pretty much run out of
>> ideas. Here's the situation:
>>

[details snipped]

>>
> It is possible that the invalid floating point value(s) was generated in
> another portion of your code than the small segment that you showed; in
> particular, places where the variables GO and WO were set, or were
> clobbered because of nearby arrays being run out of bound.

I'm pretty sure that (or similar) is what is happening. But, as with the
other usual techniques, as soon as we compile with -C (i.e. check array
bounds during run time) the code runs to completion.

> To establish a compiler bug on the basis of a small segment of code, one
> may need to examine the compiled code at the assembler level,

Urg. That's what I was afraid of. Maybe we have some consultant types
around here to do that....

Thanks,

paulv

dpb

Oct 24, 2012, 9:49:41 AM

On 10/24/2012 8:12 AM, Paul van Delst wrote:
> On 10/23/12 17:47, mecej4 wrote:

...

>> It is possible that the invalid floating point value(s) was generated in
>> another portion of your code than the small segment that you showed; in
>> particular, places where the variables GO and WO were set, or were
>> clobbered because of nearby arrays being run out of bound.
>
> I'm pretty sure that (or similar) is what is happening. But, as with the
> other usual techniques, as soon as we compile with -C (i.e. check array
> bounds during run time) the code runs to completion.
>
>> To establish a compiler bug on the basis of a small segment of code, one
>> may need to examine the compiled code at the assembler level,
>
> Urg. That's what I was afraid of. Maybe we have some consultant types
> around here to do that....

...

Just to be certain it's not overlooked...have you

a) changed the (fp in particular) optimization level/coprocessor
storage/register-holding temp flags, and

b) done a complete rebuild of all source including libraries to ensure
it's not an out-of-date routine mismatch?

--

Gordon Sande

Oct 24, 2012, 9:57:42 AM

On 2012-10-24 10:12:26 -0300, Paul van Delst said:

> Hello,
>
> On 10/23/12 17:47, mecej4 wrote:
>> On 10/23/2012 3:21 PM, Paul van Delst wrote:
>>> Hi all,
>>>
>>> I'm looking for some debugging help as I've pretty much run out of
>>> ideas. Here's the situation:
>>>
> [details snipped]
>>>
>> It is possible that the invalid floating point value(s) was generated in
>> another portion of your code than the small segment that you showed; in
>> particular, places where the variables GO and WO were set, or were
>> clobbered because of nearby arrays being run out of bound.
>
> I'm pretty sure that (or similar) is what is happening. But, as with
> the other usual techniques, as soon as we compile with -C (i.e. check
> array bounds during run time) the code runs to completion.

The other common cause of segment faults is mismatched arguments on calls.
Are all your calls under control of explicit interfaces in F90? Hopefully by
having everything inside modules.

Clobbering the object code comes and goes with the most minor of changes
in where the object is located. If all subscripts check out then the
calls are the next usual suspects to be rounded up. When you say that
you have checked subscripts, are there any object libraries that have escaped
being checked? Just asking as they are all too easy to forget about.

If you are not using external object libraries you could try borrowing a
Windows box and running under Silverfrost. They do their own checking of
calls even without F90 explicit interfaces when you enable their debugging.
Of course they are not object compatible with anything but themselves in
that mode. I think Lahey/Fujitsu have enhanced debugging in one of their
commercial Linux versions. (Explain to your accountants how quickly you are
burning expensive staff time compared to the compiler price!) Again no
external object code if I recall correctly.

>> To establish a compiler bug on the basis of a small segment of code, one
>> may need to examine the compiled code at the assembler level,
>
> Urg. That's what I was afraid of. Maybe we have some consultant types
> around here to do that....

Spend your money on other good debugging tools (even if it means using
other than the preferred system) before you resort to looking at machine
level code. If the graduate students (aka captive slaves) who look at
machine code come for free then my advice does not matter and it will
purify their souls in any case!

> Thanks,
>
> paulv

Gordon Sande

Oct 24, 2012, 10:19:37 AM

On 2012-10-24 10:12:26 -0300, Paul van Delst said:

A different followup as I forgot that I had seen -openmp in the earlier
postings.

Try running with the maximum thread count set to one. That may check for race
conditions in openmp if you have missed putting in enough critical sections.
I do not expect this is going to solve things if the subscripts check OK but
the extra overhead might be changing/curing race issues so it is another straw
to try grasping at. ;-)

Paul van Delst

Oct 24, 2012, 12:48:37 PM

Hello,

On 10/24/12 09:57, Gordon Sande wrote:
> On 2012-10-24 10:12:26 -0300, Paul van Delst said:
>
>> Hello,
>>
>> On 10/23/12 17:47, mecej4 wrote:
>>> On 10/23/2012 3:21 PM, Paul van Delst wrote:
>>>> Hi all,
>>>>
>>>> I'm looking for some debugging help as I've pretty much run out of
>>>> ideas. Here's the situation:
>>>>
>> [details snipped]
>>>>
>>> It is possible that the invalid floating point value(s) was generated in
>>> another portion of your code than the small segment that you showed; in
>>> particular, places where the variables GO and WO were set, or were
>>> clobbered because of nearby arrays being run out of bound.
>>
>> I'm pretty sure that (or similar) is what is happening. But, as with
>> the other usual techniques, as soon as we compile with -C (i.e. check
>> array bounds during run time) the code runs to completion.
>
> The other common cause of segment faults is mismatched arguments on calls.
> Are all your calls under control of explicit interfaces in F90? Hopefully by
> having everything inside modules.

Yep - we've looked at that too. The code is older f77-style with
additions of f90 features over the years, but no modules. However, we
have used the Intel compiler's capability to automatically generate
interface blocks (which we can then USE) to check the argument mismatch
scenario.

All was well in that regard.

> Clobbering the object code comes and goes with the most minor of changes
> in where the object is located. If all subscripts check out then the
> calls are the next usual suspects to be rounded up. When you say that
> you have checked subscripts, are there any object libraries that have
> escaped being checked? Just asking as they are all too easy to forget about.

No no, I agree. Good to ask. The makefiles for the library in question
are actually from our production environment which, by default, clobber
all the *.o's and *.mod's before compilation. That is, *every* library
build is a from-scratch build to avoid exactly the problem you mention.

>
> If you are not using external object libraries you could try borrowing a
> Windows box and running under Silverfrost. They do their own checking of
> calls even without F90 explicit interfaces when you enable their debugging.
> Of course they are not object compatible with anything but themselves in
> that mode. I think Lahey/Fujitsu have enhanced debugging in one of their
> commercial Linux versions. (Explain to your accountants how quickly you are
> burning expensive staff time compared to the compiler price!) Again no
> external object code if I recall correctly.

True, but do note my email address ends in ".gov" :o)

I am looking at this exercise as a learning process regarding the use of
the intel compiler on our new supercomputer (intel-based linux cluster
after 10+ years of AIX Power machines).

:o)

>
>>> To establish a compiler bug on the basis of a small segment of code, one
>>> may need to examine the compiled code at the assembler level,
>>
>> Urg. That's what I was afraid of. Maybe we have some consultant types
>> around here to do that....
>
> Spend your money on other good debugging tools (even if it means using
> other than the preferred system) before you resort to looking at machine
> level code. If the graduate students (aka captive slaves) who look at
> machine code come for free then my advice does not matter and it will
> purify their souls in any case!

Ha! No students.

thanks,

paulv

Paul van Delst

Oct 24, 2012, 12:51:03 PM

Hello,

On 10/24/12 09:49, dpb wrote:
> On 10/24/2012 8:12 AM, Paul van Delst wrote:
>> On 10/23/12 17:47, mecej4 wrote:
> ...
>
>>> It is possible that the invalid floating point value(s) was generated in
>>> another portion of your code than the small segment that you showed; in
>>> particular, places where the variables GO and WO were set, or were
>>> clobbered because of nearby arrays being run out of bound.
>>
>> I'm pretty sure that (or similar) is what is happening. But, as with the
>> other usual techniques, as soon as we compile with -C (i.e. check array
>> bounds during run time) the code runs to completion.
>>
>>> To establish a compiler bug on the basis of a small segment of code, one
>>> may need to examine the compiled code at the assembler level,
>>
>> Urg. That's what I was afraid of. Maybe we have some consultant types
>> around here to do that....
> ...
>
> Just to be certain it's not overlooked...have you
>
> a) changed the (fp in particular) optimization level/coprocessor
> storage/register-holding temp flags, and

I'm actually re-running the regression tests now to cover exactly these
sorts of things.

> b) done a complete rebuild of all source including libraries to ensure
> it's not an out-of-date routine mismatch?

yep. See my post elsethread regarding this: the makefiles we are using
are taken from our production environment and they by default clobber
everything before a build.

cheers,

paulv

Paul van Delst

Oct 24, 2012, 12:52:54 PM

Hello,

yep. Our current threaded regression test uses just one thread -- since
there didn't seem any point increasing the number of threads when even
one didn't work.

cheers,

paulv


gmail-unlp

Oct 24, 2012, 1:57:45 PM

On Oct 24, 1:52 pm, Paul van Delst <paul.vande...@noaa.gov> wrote:
> Hello,
>
> On 10/24/12 10:19, Gordon Sande wrote:
>

[snip]

>
> > A different followup as I forgot that I had seen -openmp in the earlier
> > postings.
>
> > Try running with the maximum thread count set to one. That may check for
> > race conditions in openmp if you have missed putting in enough critical
> > sections. I do not expect this is going to solve things if the subscripts
> > check OK but the extra overhead might be changing/curing race issues so
> > it is another straw to try grasping at. ;-)
>
> yep. Our current threaded regression test uses just one thread -- since
> there didn't seem any point increasing the number of threads when even
> one didn't work.
>
> cheers,
>
> paulv
>

Hmmm... this (errors when using only one OpenMP thread) usually leads
me to find some uninitialized/copied in thread local data. I'm trying
to "reconstruct" the subroutine (I'm assuming
a subroutine, but it could be a function, I think):

SUBROUTINE SOMETHING(IBI, KM, IBO, NO, LO, GO [, SOMETHING ELSE])
...

99 INTEGER, INTENT(IN ) :: IBI(KM), KM
...
101 INTEGER, INTENT( OUT) :: IBO(KM), NO

...
104 LOGICAL*1, INTENT( OUT) :: LO(MO,KM)
...
107 REAL, INTENT( OUT) :: GO(MO,KM)
...
112 INTEGER :: K, N
113 INTEGER :: NB, NB1, NB2, NB3, NB4
...
115 REAL, ALLOCATABLE :: CROT(:),SROT(:),WO(:,:)
116 REAL :: PMP

...
...

218 DO K=1,KM
219 IBO(K)=IBI(K)
220 DO N=1,NO
221 LO(N,K)=WO(N,K).GE.PMP*NB4
222 IF(LO(N,K)) THEN
223 ! print*,'in loop ',n,k,GO(N,K),WO(N,K)
224 GO(N,K)=GO(N,K)/WO(N,K)

225 ELSE
226 IBO(K)=1
227 GO(N,K)=0.
228 ENDIF
229 ENDDO
230 ENDDO

And I just have questions, so far:
1) (A silly one, but just to confirm I understood something...):
variables declared in lines 99-107 are dummy arguments, and those
declared in lines 112-116 are data local to the subroutine, right?
2) Did you forget to include MO as a dummy argument with at least
INTENT(IN)?
3) Also to confirm:
3.1) variables GO and WO have been assigned values before line
218, right? (and WO has been previously allocated at least for NOxKM
elements)
3.2) NO is less than or equal to MO, right?
4) Where does the (OpenMP) parallel region start? If it starts
somewhere between lines 116 and 218, would you add the relevant
allocation and assignments pre- and post- parallel region start?
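
Questions 3.1 and 3.2 can also be turned into a run-time check placed just before line 218. A minimal sketch, with hypothetical module and function names, assuming WO has default lower bounds of 1 and that NO (declared INTENT(OUT) in the posted declarations) has already been set inside the routine by that point:

```fortran
module loop_preconditions
  implicit none
contains
  ! Returns .true. when the loop bounds 1..no and 1..km fit inside
  ! the declared extent mo and inside the allocated array wo.
  logical function bounds_ok(no, mo, km, wo)
    integer, intent(in) :: no, mo, km
    real, allocatable, intent(in) :: wo(:,:)
    bounds_ok = .false.
    if (no < 0 .or. no > mo) return
    if (.not. allocated(wo)) return
    if (size(wo, 1) < no .or. size(wo, 2) < km) return
    bounds_ok = .true.
  end function bounds_ok
end module loop_preconditions
```

Unlike -C, this checks only the three conditions listed, so it perturbs the generated code less than full bounds checking, though it may still be enough to hide the failure.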

Fernando.

Gordon Sande

Oct 24, 2012, 2:23:31 PM

I know and was gently teasing. ;-) You can get Silverfrost for free for
personal use. There are emulators that are free that will run Windows on
Linux/etc if I have not grossly misread the bumf. You will still need
a Windows license but I think the cheapest one is around $100. Maybe the
newer conditions will allow it to be run on a few personal machines (but
not all at once).

You can still make a decent business case of having the other compilers for
their debugging, portability checking and general software engineering good
practices. It helps keep the internal processes cleaner and makes the next
machine change easier. And is generally just good professional practice. Don't
give up so easily but then don't make a career out of chasing that particular
squirrel.

Dick Hendrickson

Oct 24, 2012, 2:45:59 PM

I'd be surprised if you got caught on that one, since the Cray 1
compilers tended not to do multi-block optimizations on IF blocks.
Almost for sure ( ;) ) you got caught on something like

X = 0.0
DO 10 I = 1, 0
Y(I) = 1.0/X
10 CONTINUE

It took us a while to teach the code-motion thing just how far it could
safely move loop invariant operations. :(
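
Hoisting the invariant by hand shows the condition the optimizer has to respect: the division may only be moved out of the loop if the loop runs at least once. A minimal sketch with hypothetical names, not the Cray compiler's actual transformation:

```fortran
module hoist_demo
  implicit none
contains
  ! Sets y(1:n) to 1.0/x, evaluating the division once and only
  ! when the loop has a nonzero trip count.
  subroutine fill_reciprocal(y, n, x)
    real, intent(inout) :: y(:)
    integer, intent(in) :: n
    real, intent(in)    :: x
    real :: r
    integer :: i
    if (n >= 1) then
      r = 1.0 / x          ! hoisted loop-invariant operation
      do i = 1, n
        y(i) = r
      end do
    end if
  end subroutine fill_reciprocal
end module hoist_demo
```

With n = 0 the division is never evaluated, so a zero x cannot raise an exception for a loop that does no work.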

Dick Hendrickson

Thomas Koenig

Oct 24, 2012, 2:57:07 PM

On 2012-10-23, Paul van Delst <paul.v...@noaa.gov> wrote:

> I'm looking for some debugging help as I've pretty much run out of
> ideas.

Have you tried another compiler, e.g. gfortran?

Can you run the program under valgrind?

Can you reduce this to a small, self-contained example which people
can compile and use on their own? If you manage to do that, chances
are you will already have found the bug in your own program :-)

Gordon Sande

Oct 24, 2012, 3:01:19 PM

If the gov is like other big bureaucracies then there may well be a
use-it-or-lose-it time for the budget when it is real handy to have
quality projects ready to go. The bosses may even appreciate the good
foresight in having such things around. Good planning will have had a
contingency line and if it opens up then be the guy with the best
suggestion. The surest way to not get something is to never ask. :-(

glen herrmannsfeldt

Oct 24, 2012, 3:26:48 PM

Dick Hendrickson <dick.hen...@att.net> wrote:

(snip)

> I'd be surprised if you got caught on that one, since the Cray 1
> compilers tended not to do multi-block optimizations on IF blocks.
> Almost for sure ( ;) ) you got caught on something like

> X = 0.0
> DO 10 I = 1, 0
> Y(I) = 1.0/X
> 10 CONTINUE

> It took us a while to teach the code-motion thing just how far it could
> safely move loop invariant operations. :(

As I understand it, 1./0. doesn't generate floating invalid,
I believe it generates +Inf.

But 0./0. (plus or minus) does.

And also any operation with SNaN.

Writing over data, or outside arrays, could generate SNaN.

-- glen

Paul van Delst

Oct 24, 2012, 4:07:00 PM

Hello,

On 10/24/12 14:57, Thomas Koenig wrote:
> On 2012-10-23, Paul van Delst<paul.v...@noaa.gov> wrote:
>
>> I'm looking for some debugging help as I've pretty much run out of
>> ideas.
>
> Have you tried another compiler, e.g. gfortran?

Not yet. :o)

> Can you run the program under valgrind?

Again, not yet. We're transitioning (everything! all operations) to a
new computer and our test system doesn't have the usual set of tools.
And the sysadmin-types are horrifically busy with the transition.

> Can you reduce this to a small, self-contained example which people
> can compile and use on their own? If you manage to do that, chances
> are you will already have found the bug in your own program :-)

A colleague has actually started doing that. The libraries in question
contain 100's of routines, but when he isolated just the routines in
which the failures were occurring, everything ran to successful
completion regardless of compiler switches, optimisation levels etc.

So now he's started adding in routines one by one to see at what point
the failures start reappearing. He deserves a medal!

Once (if?) we get back into failure mode with minimal dependencies (see
next paragraph), at that point I think we'll introduce a gfortran
compile to the mix.

The complicating factor is that some of the routines require linking in
a different library (which may be the cause... we've contacted that
developer also) which in turn uses another library again.

But I reckon we're getting there... the input from the clf crowd has
been very helpful, and not just via the re-invigoration one gets when
others take an interest. :o)

cheers,

paulv

Steve Lionel

Oct 24, 2012, 5:09:33 PM

On 10/24/2012 4:07 PM, Paul van Delst wrote:

> Once (if?) we get back into failure mode with minimal dependencies (see
> next paragraph), at that point I think we'll introduce a gfortran
> compile to the mix.
>
> The complicating factor is that some of the routines require linking in
> a different library (which may be the cause... we've contacted that
> developer also) which in turn uses another library again.
>
> But I reckon we're getting there... the input from the clf crowd has
> been very helpful, and not just via the re-invigoration one gets when
> others take an interest. :o)

Please feel free to contact Intel Premier Support for help with this.
See the support link in my signature below. You may want to add

-fp-model precise

to a test build and see if the behavior changes. This disables some
optimizations and will slow down the code. It may also "move" or hide
the problem but not really eliminate it. But the compiler should not
optimize the code you showed in a way that creates exceptions where none
should occur. If it does, that's a bug and we'd like to know about it.

--
Steve Lionel
Developer Products Division
Intel Corporation
Merrimack, NH

For email address, replace "invalid" with "com"

User communities for Intel Software Development Products
http://software.intel.com/en-us/forums/
Intel Software Development Products Support
http://software.intel.com/sites/support/
My Fortran blog
http://www.intel.com/software/drfortran

Refer to http://software.intel.com/en-us/articles/optimization-notice
for more information regarding performance and optimization choices in
Intel software products.

Gib Bogle

Oct 24, 2012, 6:17:10 PM

On 25/10/2012 9:07 a.m., Paul van Delst wrote:
...

> But I reckon we're getting there... the input from the clf crowd has
> been very helpful, and not just via the re-invigoration one gets when
> others take an interest. :o)

There's nothing like a Heisenbug hunt to get the Fortran juices flowing.
:-)

Robin Vowels

Oct 24, 2012, 7:07:55 PM

On Oct 25, 4:57 am, gmail-unlp <ftine...@gmail.com> wrote:
> On Oct 24, 1:52 pm, Paul van Delst <paul.vande...@noaa.gov> wrote:

> Hmmm... this (errors when using only one OpenMP thread) usually leads
> me to find some uninitialized/copied in thread local data. I'm trying
> to "reconstruct" the subroutine (I'm assuming
> a subroutine, but it could be a function, I think):
>
> SUBROUTINE SOMETHING(IBI, KM, IBO, NO, LO, GO [, SOMETHING
> ELSE])

Probably MO needs to be in the list of dummy arguments.
That has been raised before.

Robin Vowels

Oct 24, 2012, 7:09:40 PM

On Oct 24, 7:21 am, Paul van Delst <paul.vande...@noaa.gov> wrote:
> Hi all,

>
> I'm looking for some debugging help as I've pretty much run out of

> ideas. Here's the situation:
>
> Compiler: ifort v12.0.4
> System: Linux cluster
>
> We have some code that crashes in a unit test with the error
> forrtl: error (65): floating invalid
> when compiled with any of the following:
> -O3 -traceback -fpconstant -real-size 32 -integer-size 32 -openmp
> -O3 -traceback -fpconstant -real-size 64 -integer-size 64 -openmp
> -O3 -traceback -fpconstant -real-size 64 -integer-size 32 -openmp
>
> The offending line -- well, the line indicated in the traceback output
> -- is line #224 below:
>

> 218 DO K=1,KM
> 219 IBO(K)=IBI(K)
> 220 DO N=1,NO
> 221 LO(N,K)=WO(N,K).GE.PMP*NB4
> 222 IF(LO(N,K)) THEN
> 223 ! print*,'in loop ',n,k,GO(N,K),WO(N,K)
> 224 GO(N,K)=GO(N,K)/WO(N,K)
> 225 ELSE
> 226 IBO(K)=1
> 227 GO(N,K)=0.
> 228 ENDIF
> 229 ENDDO
> 230 ENDDO
>

> the associated definitions are:

Where is the definition for MO ?
What is the procedure statement?

> 99 INTEGER, INTENT(IN ) :: IBI(KM), KM
> ...
> 101 INTEGER, INTENT( OUT) :: IBO(KM), NO

> 102C

Dick Hendrickson

Oct 24, 2012, 7:49:23 PM

On 10/24/12 2:26 PM, glen herrmannsfeldt wrote:
> Dick Hendrickson<dick.hen...@att.net> wrote:
>
> (snip)
>> I'd be surprised if you got caught on that one, since the Cray 1
>> compilers tended not to do multi-block optimizations on IF blocks.
>> Almost for sure ( ;) ) you got caught on something like
>
>> X = 0.0
>> DO 10 I = 1, 0
>> Y(I) = 1.0/X
>> 10 CONTINUE
>
>> It took us a while to teach the code-motion thing just how far it could
>> safely move loop invariant operations. :(
>
> As I understand it, 1./0. doesn't generate floating invalid,
> I believe it generates +Inf.

Since Ian's original comment referred to the Cray 1, I'd be surprised if
the IEEE rules applied. More to the point, the standard places no
absolute requirements on what a processor MUST do for arithmetic
operations it doesn't particularly like. If the processor claims to
support some level of IEEE arithmetic, it must support that level. If
it doesn't, then it doesn't have to. ;)

Dick Hendrickson

glen herrmannsfeldt

Oct 24, 2012, 8:32:30 PM

Dick Hendrickson <dick.hen...@att.net> wrote:

(snip)
>>> I'd be surprised if you got caught on that one, since the Cray 1
>>> compilers tended not to do multi-block optimizations on IF blocks.
>>> Almost for sure ( ;) ) you got caught on something like

>>> X = 0.0
>>> DO 10 I = 1, 0
>>> Y(I) = 1.0/X
>>> 10 CONTINUE

>>> It took us a while to teach the code-motion thing just how far it could
>>> safely move loop invariant operations. :(

>> As I understand it, 1./0. doesn't generate floating invalid,
>> I believe it generates +Inf.

> Since Ian's original comment referred to the Cray 1, I'd be surprised if
> the IEEE rules applied.

I thought he was generalizing to what optimizers might do.

The specific message: "forrtl: error (65): floating invalid"
comes from ifort, I believe in response to a specific exception
detected by the floating point processor. So it is tied not to
IEEE in general, but to one specific implementation.

> More to the point, the standard places no absolute requirements
> on what a processor MUST do for arithmetic operations it
> doesn't particularly like. If the processor claims to
> support some level of IEEE arithmetic, it must support that level.
> If it doesn't, then it doesn't have to. ;)

Yes, what matters is what the specific implementation
does, but that can be known, except for undiscovered
hardware bugs.

A Google search shows that a similar problem showed up here
about four years ago. Personally, I think floating point stack
problems are the most likely, and will tend to show up nowhere
near the actual cause.

The x87 stack has eight entries. The original design was
for a virtual stack that would spill to RAM on overflow,
and back on underflow. It was only after the 8087 was built
that someone tried to write the software interrupt handler
to do it, and found that it wasn't possible. Not all the
needed state was available. For that reason, compilers
limit the stack depth that they use to eight.

If somewhere else in the program something is left on the
stack, it won't show up until somewhere else where the
full stack depth is needed. That may happen a long way
from the cause. The easiest way to mess up the stack is
to call a REAL function, declaring it some other type
in the calling routine. The caller won't pop the
return value, which then stays on the stack.

The return value seen by the caller will be wrong, but
that might not be noticed. Could even be deep in some
library routine written by someone else. (Many systems
don't have problems with mismatched return types,
so it might go unnoticed for a long time.)

-- glen

mecej4

Oct 24, 2012, 9:15:16 PM

On 10/24/2012 3:07 PM, Paul van Delst wrote:
> Hello,
>
<---CUT--->

> The complicating factor is that some of the routines require linking in
> a different library (which may be the cause... we've contacted that
> developer also) which in turn uses another library again.
>

> paulv

It can be tough hunting for the kind of bug discussed in this thread if
there is a dependency on libraries that are not available in source form.

Mysterious failures can be caused by not preserving SSE2 MXCSR (or 80X87
CW) flags as required by these libraries. For an instance of a failure
of an IMSL routine when called from Fortran code compiled with the /fast
option of Intel Fortran, see

<http://forums.roguewave.com/showthread.php?1427-Strange-bug-%28-%29-in-FNL7-routine-IVOAM>

-- mecej4
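
One way to guard a call against that kind of state leakage, sketched
under the assumption that the compiler provides the intrinsic
IEEE_EXCEPTIONS module: save the floating-point status before the call
and restore it afterwards. The routine third_party_stub is a
hypothetical stand-in for a binary-only library routine, and how much of
the hardware control word the saved status covers is processor
dependent.

```fortran
program guard_fp_state
  use, intrinsic :: ieee_exceptions
  implicit none
  type(ieee_status_type) :: saved
  real :: x

  ! Remember the current floating-point environment (modes and flags).
  call ieee_get_status(saved)

  call third_party_stub(x)      ! stands in for the library call

  ! Put the environment back the way it was before the call.
  call ieee_set_status(saved)
  print *, x

contains

  ! Hypothetical stand-in for a routine available only as a binary.
  subroutine third_party_stub(y)
    real, intent(out) :: y
    y = 1.0
  end subroutine third_party_stub

end program guard_fp_state
```

This only protects the caller from changes made inside the call; it does
not help if the library itself needs a different state on entry.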

Ian Bush

Oct 25, 2012, 3:50:27 AM

To be honest I was sure it was the above! But then again, whilst not as
old as some here, I'm definitely at the stage in life where I forget more
than I remember, so it may be what you say. 'Twas many years ago.

Ian

gmail-unlp

Oct 25, 2012, 6:35:43 AM

On Oct 24, 8:07 pm, Robin Vowels <robin.vow...@gmail.com> wrote:
> On Oct 25, 4:57 am, gmail-unlp <ftine...@gmail.com> wrote:
>
> > On Oct 24, 1:52 pm, Paul van Delst <paul.vande...@noaa.gov> wrote:
> > Hmmm... this (errors when using only one OpenMP thread) usually leads
> > me to find some uninitialized/copied in thread local data. I'm trying
> > to "reconstruct" the subroutine (I'm assuming
> > a subroutine, but it could be a function, I think):
>
> > SUBROUTINE SOMETHING(IBI, KM, IBO, NO, LO, GO [, SOMETHING
> > ELSE])
>
> Probably MO needs to be in the list of dummy arguments.
> That has been raised before.

I was asking the OP about it...

Fernando.

[snip]

Paul van Delst

Oct 25, 2012, 10:51:00 AM

Hello,

On 10/24/12 13:57, gmail-unlp wrote:
>
> Hmmm... this (errors when using only one OpenMP thread) usually leads
> me to find some uninitialized/copied in thread local data. I'm trying
> to "reconstruct" the subroutine (I'm assuming
> a subroutine, but it could be a function, I think):

Well a test with four threads fails as well.

[post-composition preface: What follows is a looooong synopsis of where
we're at. So, if you're still inclined to keep reading, grab a coffee
and get comfy]

BTW, responding to other posts as well, here is the actual subroutine
statement:

SUBROUTINE POLATES6(IPOPT,KGDSI,KGDSO,MI,MO,KM,IBI,LI,GI,
& NO,RLAT,RLON,IBO,LO,GO,IRET)

along with all of the declarations:

IMPLICIT NONE
C
INTEGER, INTENT(IN ) :: IBI(KM), IPOPT(20), KM, MI, MO
INTEGER, INTENT(IN ) :: KGDSI(200), KGDSO(200)
INTEGER, INTENT( OUT) :: IBO(KM), IRET, NO
C
LOGICAL*1, INTENT(IN ) :: LI(MI,KM)
LOGICAL*1, INTENT( OUT) :: LO(MO,KM)
C
REAL, INTENT(IN ) :: GI(MI,KM)
REAL, INTENT( OUT) :: GO(MO,KM), RLAT(MO), RLON(MO)
C
REAL, PARAMETER :: FILL=-9999.
C
INTEGER :: IB, I1, IJKGDS1, IJKGDSA(20)
INTEGER :: JB, J1, K, LB, LSW, MP, N
INTEGER :: N11(MO), NB, NB1, NB2, NB3, NB4, NV
C

REAL, ALLOCATABLE :: CROT(:),SROT(:),WO(:,:)

REAL :: PMP,RLOB(MO),RLAB(MO)
REAL :: WB, XI, YI
REAL :: XPTB(MO),YPTB(MO),XPTS(MO),YPTS(MO)

The stripping out of the (supposedly) offending routines from the
library gets us to the point where the entire library, now CONTAINed as
internal subprograms in the main test, works!

Argh.

We've switched our focus to how the routine is called. Last night I was
digging a little deeper into how users actually call these routines and
I saw:

Subroutine interp, with definitions,

integer*4 :: i1
integer :: ip, ipopt(20), output_kgds(200)
integer :: km, ibi, mi, iret, i, j
integer :: i_output, j_output, mo, no, ibo

logical*1, allocatable :: output_bitmap(:,:)

real, allocatable :: output_rlat(:,:), output_rlon(:,:)
real, allocatable :: output_data(:,:)

calls "ipolates" thusly:

km = 1 ! number of fields to interpolate
mi = i_input * j_input ! dimension of input grids

mo = i_output * j_output

allocate (output_rlat(i_output,j_output))
allocate (output_rlon(i_output,j_output))
allocate (output_data(i_output,j_output))
allocate (output_bitmap(i_output,j_output))

call ipolates(ip, ipopt, input_kgds, output_kgds, mi, mo,&
km, ibi, input_bitmap, input_data, &
no, output_rlat, output_rlon, ibo, &
output_bitmap, output_data, iret)

And subroutine ipolates,

SUBROUTINE IPOLATES(IP,IPOPT,KGDSI,KGDSO,MI,MO,KM,IBI,LI,GI,
& NO,RLAT,RLON,IBO,LO,GO,IRET)
IMPLICIT NONE
C
INTEGER, INTENT(IN ) :: IP, IPOPT(20), KM, MI, MO
INTEGER, INTENT(IN ) :: IBI(KM), KGDSI(200), KGDSO(200)
INTEGER, INTENT(INOUT) :: NO
INTEGER, INTENT( OUT) :: IRET, IBO(KM)
C
LOGICAL*1, INTENT(IN ) :: LI(MI,KM)
LOGICAL*1, INTENT( OUT) :: LO(MO,KM)
C
REAL, INTENT(IN ) :: GI(MI,KM)
REAL, INTENT(INOUT) :: RLAT(MO),RLON(MO)
REAL, INTENT( OUT) :: GO(MO,KM)
C
INTEGER :: K, N

calls the various "polatesX" routines like so

C NEIGHBOR-BUDGET INTERPOLATION
ELSEIF(IP.EQ.6) THEN
CALL POLATES6(IPOPT,KGDSI,KGDSO,MI,MO,KM,IBI,LI,GI,
& NO,RLAT,RLON,IBO,LO,GO,IRET)

So the "ipolates" driver routine has arguments dimensioned just like the
"polates6" routine.

The calling subroutine has, for some of the arrays, quite different
dimension sizes, but the same rank and total number of elements.

The big question I have now is regarding the IBI dummy argument - it is
declared as IBI(KM) in the library codes. In our test, KM is set to one.
In the main test driver (interp), ibi is declared as a scalar.

(I think this wasn't picked up by the ifort interface-block-generation
switch because it's the *test* driver, not a library member. Argh.)

So, after reading Louis Krupp's post, I decided to try the combination
of "-O3 + -g" to get a debugger-friendly failing result. And it worked!
Nobody had tried that before, I guess assuming -O3 meant no -g. Anyhoo...

Inspecting the failure in the debugger shows that the IPOLATES
subroutine dummy argument values beyond the IBI argument are corrupted!

I'd *never* seen code before where an actual argument dimensioned
argument(N_I, N_J)
was passed in to a routine that was expecting
argument(N_I*N_J, 1)

Uff da.

So now we're dimensioning everything consistently to see if that is the
cause of the segfaulting.

There is a wrinkle in that the main IPOLATES calls are

C BILINEAR INTERPOLATION
IF(IP.EQ.0) THEN
CALL POLATES0(IPOPT,KGDSI,KGDSO,MI,MO,KM,IBI,LI,GI,
& NO,RLAT,RLON,IBO,LO,GO,IRET)
C - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
C BICUBIC INTERPOLATION
ELSEIF(IP.EQ.1) THEN
CALL POLATES1(IPOPT,KGDSI,KGDSO,MI,MO,KM,IBI,LI,GI,
& NO,RLAT,RLON,IBO,LO,GO,IRET)
C - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
C NEIGHBOR INTERPOLATION
ELSEIF(IP.EQ.2) THEN
CALL POLATES2(IPOPT,KGDSI,KGDSO,MI,MO,KM,IBI,LI,GI,
& NO,RLAT,RLON,IBO,LO,GO,IRET)
C - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
C BUDGET INTERPOLATION
ELSEIF(IP.EQ.3) THEN
CALL POLATES3(IPOPT,KGDSI,KGDSO,MI,MO,KM,IBI,LI,GI,
& NO,RLAT,RLON,IBO,LO,GO,IRET)
C - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
C SPECTRAL INTERPOLATION
ELSEIF(IP.EQ.4) THEN
CALL POLATES4(IPOPT,KGDSI,KGDSO,MI,MO,KM,IBI,LI,GI,
& NO,RLAT,RLON,IBO,LO,GO,IRET)
C - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
C NEIGHBOR-BUDGET INTERPOLATION
ELSEIF(IP.EQ.6) THEN
CALL POLATES6(IPOPT,KGDSI,KGDSO,MI,MO,KM,IBI,LI,GI,
& NO,RLAT,RLON,IBO,LO,GO,IRET)
C - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
C UNRECOGNIZED INTERPOLATION METHOD
ELSE
....handle invalid method...

and:
a) the arguments are all dimensioned and called in the same weird way
b) only the call to POLATES6 segfaults. 0-4 complete normally.

Seeing as this is mostly f77-era, external subroutine code, is that type
of weird argument dimensioning illegal?

I am also told, after expressing horror at the argument dimensioning
shenanigans, that this "trick" is used in innumerable places in various
people's codes. The same codes that ran apparently fine for years
(decades?) on IBM systems.

So, there you go. A comprehensive update of where we're at.

Any info about the legality of the dimensioning would be appreciated.

And, again, thanks to all who replied. Just getting other people's ideas
made a big difference.

cheers,

paulv

Gordon Sande

Oct 25, 2012, 11:12:45 AM

Did not read details but offer comment on subscript checking. The usual
form treats the local declaration as correct but does not check the
local declaration against the storage provided by the caller. This is
for F77 semantics of fixed size and assumed size. The serious debugging
compilers (Silverfrost and Lahey/Fujitsu) will also check that the local
declaration fits inside the supplied storage. Or at least that is how I
understand their descriptions. To do this they need to pass descriptors
much like the F90 assumed shape (":" stuff) but they do it for their
debugging purposes. So subscript checking has a weakness as usually done
for F77 semantics that is not present in the full press debugging
systems. Note that F90 assumed shape does not allow this form of user
(white?) lie so does not have the problem.
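
For illustration only, a small sketch of the assumed-shape form, with
made-up names (fill_mod, fill). The dummy takes its extent from the
actual argument, so the callee has no separate size declaration that
could disagree with the storage the caller supplied.

```fortran
module fill_mod
  implicit none
contains
  ! Assumed-shape dummy: the extent comes from the actual argument,
  ! so the callee cannot claim more storage than the caller supplied.
  subroutine fill(x, val)
    real, intent(out) :: x(:)
    real, intent(in)  :: val
    integer :: i
    do i = 1, size(x)
      x(i) = val
    end do
  end subroutine fill
end module fill_mod

program assumed_shape_demo
  use fill_mod, only: fill
  implicit none
  real :: a(5)
  call fill(a, 3.0)
  print *, size(a), a
end program assumed_shape_demo
```

Assumed shape requires an explicit interface, which the module provides
here.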

dpb

Oct 25, 2012, 3:33:08 PM

On 10/25/2012 9:51 AM, Paul van Delst wrote:
...

> The big question I have now is regarding the IBI dummy argument - it is
> declared as IBI(KM) in the library codes. In our test, KM is set to one.
> In the main test driver (interp), ibi is declared as a scalar.
>
> (I think wasn't picked up by the ifort interface-block-generation switch
> because it's the *test* driver, not a library member. Argh.)
>
> So, after reading Louis Krupp's post, I decided to try the combination
> of "-O3 + -g" do get a debugger friendly failing results. And it worked!
> Nobody had tried that before I guess assuming -O3 meant no -g. Anyhoo...
>
> Inspecting the failure in the debugger shows that the IPOLATES
> subroutine dummy argument values beyond the IBI argument are corrupted!
>
> I'd *never* seen code before where an actual argument dimensioned
> argument(N_I, N_J)
> was passed in to a routine that was expecting
> argument(N_I*N_J, 1)
>
> Uff da.
>
> So now we're dimensioning everything consistently to see if that is the
> cause of the segfaulting.

...

Shouldn't be a problem unless the values run over the total size
allocated since would be passed by address association. As

glen herrmannsfeldt

Oct 25, 2012, 4:25:15 PM

Paul van Delst <paul.v...@noaa.gov> wrote:
>

> On 10/24/12 13:57, gmail-unlp wrote:

>> Hmmm... this (errors when using only one OpenMP thread) usually leads
>> me to find some uninitialized/copied in thread local data. I'm trying
>> to "reconstruct" the subroutine (I'm assuming
>> a subroutine, but it could be a function, I think):

(snip)

> BTW, responding to other posts as well, here is the actual subroutine
> statement:

> SUBROUTINE POLATES6(IPOPT,KGDSI,KGDSO,MI,MO,KM,IBI,LI,GI,
> & NO,RLAT,RLON,IBO,LO,GO,IRET)

(snip)

> along with all of the declarations:

> INTEGER, INTENT(IN ) :: IBI(KM), IPOPT(20), KM, MI, MO

(snip)

> We've switched our focus to how the routine is called. Last night I was
> digging a little deeper into how users actually call these routines and
> I saw:

> Subroutine interp, with definitions,

(snip)

> C NEIGHBOR-BUDGET INTERPOLATION
> ELSEIF(IP.EQ.6) THEN
> CALL POLATES6(IPOPT,KGDSI,KGDSO,MI,MO,KM,IBI,LI,GI,
> & NO,RLAT,RLON,IBO,LO,GO,IRET)

> So the "ipolates" driver routine has arguments dimensioned just like the
> "polates6" routine.

> The calling subroutine has, for some of the arrays, quite different
> dimension sizes, but the same rank and total number of elements.

For assumed size or explicit shape, that is fine.

> The big question I have now is regarding the IBI dummy argument - it is
> declared as IBI(KM) in the library codes. In our test, KM is set to one.
> In the main test driver (interp), ibi is declared as a scalar.

I believe that is non-standard, but often works. I don't know the
conditions on which it might fail.

One that I can think of, if one tried to check array bounds through
a subroutine call, it might require passing hidden arguments.
The other way around (array element to scalar dummy) has to work,
as does array element actual argument to array dummy (yes it
seems strange, but it has to work).

> (I think wasn't picked up by the ifort interface-block-generation switch
> because it's the *test* driver, not a library member. Argh.)

> So, after reading Louis Krupp's post, I decided to try the combination
> of "-O3 + -g" do get a debugger friendly failing results. And it worked!
> Nobody had tried that before I guess assuming -O3 meant no -g. Anyhoo...

> Inspecting the failure in the debugger shows that the IPOLATES
> subroutine dummy argument values beyond the IBI argument are corrupted!

> I'd *never* seen code before where an actual argument dimensioned
> argument(N_I, N_J)
> was passed in to a routine that was expecting
> argument(N_I*N_J, 1)

As above, legal still in Fortran 2008, and as far as I know, no plans
to change that.

> Uff da.

> So now we're dimensioning everything consistently to see if that is the
> cause of the segfaulting.

I thought the problem was floating invalid. Is it now segfault?

> There is a wrinkle in that the main IPOLATES calls are

> C BILINEAR INTERPOLATION
> IF(IP.EQ.0) THEN
> CALL POLATES0(IPOPT,KGDSI,KGDSO,MI,MO,KM,IBI,LI,GI,
> & NO,RLAT,RLON,IBO,LO,GO,IRET)

(snip)

> and:
> a) the arguments are all dimensioned and called in the same weird way
> b) only the call to POLATES6 segfaults. 0-4 complete normally.

> Seeing as this is mostly f77-era, external subroutine code, is that type
> of weird argument dimensioning illegal?

> I am also told, after expressing horror at the argument dimensioning
> shenanigans, that this "trick" is used in innumerable places in various
> peoples codes. The same codes that ran apparently fine for years
> (decades?) on IBM systems.

Passing an array actual argument to a different rank (except
assumed shape) dummy is legal. You have to know how they are
arranged in memory, as required by the standard, but yes
it does work and is still (after almost 60 years) legal.

> So, there you go. A comprehensive update of where we're at.

> Any info about the legality of the dimensioning would be appreciated.

The only one illegal that I see is passing a scalar to an array dummy.

-- glen

dpb

Oct 25, 2012, 6:19:44 PM

On 10/25/2012 9:51 AM, Paul van Delst wrote:
...

Sorry about that; I thought I had trashed the other draft and sent this
one instead of the (now obvious) obverse. Not a whole lot to add to
what's already been said now, but since I wrote it... :)

> The calling subroutine has, for some of the arrays, quite different
> dimension sizes, but the same rank and total number of elements.
>
> The big question I have now is regarding the IBI dummy argument - it is
> declared as IBI(KM) in the library codes. In our test, KM is set to one.
> In the main test driver (interp), ibi is declared as a scalar.
>
> (I think wasn't picked up by the ifort interface-block-generation switch
> because it's the *test* driver, not a library member. Argh.)
>

...

>
> Inspecting the failure in the debugger shows that the IPOLATES
> subroutine dummy argument values beyond the IBI argument are corrupted!

I'm not quite sure how to interpret your meaning of "beyond the IBI
argument" here. Can you be more explicit?

> I'd *never* seen code before where an actual argument dimensioned
> argument(N_I, N_J)
> was passed in to a routine that was expecting
> argument(N_I*N_J, 1)
>
> Uff da.
>
> So now we're dimensioning everything consistently to see if that is the
> cause of the segfaulting.

...

The dimensions themselves, no. The dummy argument is associated w/ the
actual argument by sequence association so as long as the actual
allocation of space in the caller is large enough to match (or exceed)
the sizes passed to the subroutine there is sufficient space.

Now, it's always possible there's a bug in the subroutine and the
addressing computation is fouled up and exceeds the bounds, but that's
not fundamentally because of the treatment of the 2D array as 1D vector
per se.

And, of course, it's always possible the corruption even if it is in
this array or past is actually from some one or more of the other
routines clobbering it instead.

Indeed, such stuff was used greatly altho lots of time it was sorta'
backwards of this where all memory of the application was in one long 1D
array and that was parceled out in various ways to working routines
along the way--the poor man's dynamic memory allocation.
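
A rough sketch of that older idiom, with hypothetical names (work,
split_work): one long 1D array is carved up by passing array elements as
the starting points of separate dummy arrays. The two pieces must not
overlap, since both are written.

```fortran
program work_array_demo
  implicit none
  integer, parameter :: m = 4
  real :: work(2*m)

  work = 0.0
  ! Hand out the first m elements as A and the next m as B.
  call split_work(work(1), work(m+1), m)
  print *, work

contains

  ! Each dummy is sequence associated with a slice of the caller's array.
  subroutine split_work(a, b, n)
    integer, intent(in)  :: n
    real,    intent(out) :: a(n), b(n)
    a = 1.0
    b = 2.0
  end subroutine split_work

end program work_array_demo
```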

--

By

Ron Shepard

Oct 26, 2012, 2:02:13 AM

In article <k6bjkh$ds9$1...@speranza.aioe.org>,

Paul van Delst <paul.v...@noaa.gov> wrote:

> I'd *never* seen code before where an actual argument dimensioned
> argument(N_I, N_J)
> was passed in to a routine that was expecting
> argument(N_I*N_J, 1)

This is fairly common stuff, and it is legal. It depends on storage
sequence order, and it was common to see in f77 code for things like
setting a 2D array to zero, or scaling an array by some constant
value, adding two arrays, element-by-element multiplication, and
things like that. Now with array syntax, this is not as useful as
it was 30 years ago.
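
A small sketch of that usage, with made-up names (scale_flat, ni, nj): a
2D actual argument is passed to an explicit-shape 1D dummy, which walks
the elements in storage sequence order.

```fortran
program seq_assoc_demo
  implicit none
  integer, parameter :: ni = 3, nj = 4
  real :: a(ni, nj)

  a = 1.0
  ! The rank-2 array is viewed as ni*nj contiguous elements by the callee.
  call scale_flat(a, ni*nj, 2.0)
  print *, all(a == 2.0)

contains

  ! Explicit-shape dummy: sequence association makes the rank mismatch legal.
  subroutine scale_flat(x, n, factor)
    integer, intent(in)    :: n
    real,    intent(inout) :: x(n)
    real,    intent(in)    :: factor
    integer :: i
    do i = 1, n
      x(i) = factor * x(i)
    end do
  end subroutine scale_flat

end program seq_assoc_demo
```

With array syntax the whole call collapses to a = 2.0 * a, which is the
point about it being less useful today.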

However, passing a scalar actual argument and associating it to an
array, even with size 1, is an actual error. It works for compilers
that pass everything by address, but even in f77 days there were
many compilers that passed scalars and arrays in different ways
(e.g. using registers), and this would fail.

$.02 -Ron Shepard

glen herrmannsfeldt

Oct 26, 2012, 3:19:30 AM

Ron Shepard <ron-s...@nospam.comcast.net> wrote:

(snip)

> However, passing a scalar actual argument and associating it to an
> array, even with size 1, is an actual error. It works for compilers
> that pass everything by address, but even in f77 days there were
> many compilers that passed scalars and arrays in different ways
> (e.g. using registers), and this would fail.

Yes, but note that it is also legal to use an array element
as an actual argument for an array dummy, in which case
the callee can access all elements from that to the
end of the array.

So, yes scalars can be passed differently, but it has to be
consistent with the above.
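
As an illustration of the array element case (names are hypothetical),
the call below passes v(4) and the callee treats it as the start of a
three-element array, so it touches v(4), v(5) and v(6):

```fortran
program elem_to_array_demo
  implicit none
  integer :: v(6), i

  v = [(i, i = 1, 6)]
  ! Array element actual, array dummy: the dummy starts at v(4).
  call zero_tail(v(4), 3)
  print *, v

contains

  subroutine zero_tail(x, n)
    integer, intent(in)    :: n
    integer, intent(inout) :: x(n)
    x = 0
  end subroutine zero_tail

end program elem_to_array_demo
```

After the call the first three elements of v are unchanged and the last
three are zero.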

-- glen

gmail-unlp

Oct 26, 2012, 6:36:44 AM

On Oct 25, 11:51 am, Paul van Delst <paul.vande...@noaa.gov> wrote:
> Hello,
>
> On 10/24/12 13:57, gmail-unlp wrote:
>
>
>
> > Hmmm... this (errors when using only one OpenMP thread) usually leads
> > me to find some uninitialized/copied in thread local data. I'm trying
> > to "reconstruct" the subroutine (I'm assuming
> > a subroutine, but it could be a function, I think):
>
> Well a test with four threads fails as well.

Sure, if one thread does not have properly initialized local data, the
problem only multiplies with more threads.
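
To show the kind of mistake meant here (this is a generic sketch, not
taken from the code in question, and the names are made up): a variable
listed as PRIVATE starts out undefined in every thread, whereas
FIRSTPRIVATE copies in the value it had before the parallel region.

```fortran
program omp_private_demo
  implicit none
  integer :: i
  real :: factor, y(8)

  factor = 2.0

  ! FIRSTPRIVATE gives each thread its own copy initialized to 2.0.
  ! With PRIVATE(factor) instead, each copy would be undefined and the
  ! products below could be garbage, possibly including NaNs.
  !$omp parallel do firstprivate(factor)
  do i = 1, 8
    y(i) = factor * real(i)
  end do
  !$omp end parallel do

  print *, y
end program omp_private_demo
```

The directives only take effect when the compiler's OpenMP option is
switched on; otherwise they are treated as comments and the loop runs
serially.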

I didn't have time to check the details, but do you mean that in

> CALL POLATES6(IPOPT,KGDSI,KGDSO,MI,MO,KM,IBI,LI,GI,
> & NO,RLAT,RLON,IBO,LO,GO,IRET)

values for LI,GI,NO,RLAT,RLON,IBO,LO,GO,IRET are not what they should be?

I still would like to know if the subroutine is called in a parallel
region (e.g. one created in ipolates or in interp or "before") or a
parallel region is created inside the routine.

Fernando.

[snip]

Ron Shepard

Oct 26, 2012, 11:47:39 AM

In article <k6ddi2$qem$1...@dont-email.me>,

The machines I knew about that did this were the DEC PDP-10 and
DECSYSTEM-20 machines. In this case, the registers were addressable
through both memory instructions and with immediate instructions, so
copies of scalars would often be passed efficiently through
registers with their register address, while arrays (and array
elements, for the reason you state above) would be passed through
their original memory location address.

I remember the situation where this caught me. I had an array
actual argument that was passed through several layers of
subroutines to eventually an array dummy argument. In one of those
intermediate levels, one of the dummy arguments was declared as a
scalar. With whatever optimization flags I happened to be using,
the compiler decided to make a local register copy and pass it
through the register as a scalar. This was well-tested code that
ran on a variety of machines, and I did not believe at first that it
was a code error in that part of the program, so I spent a lot of
time looking at the wrong places. But when I found the error, I did
realize what was the mistake and why the code seemed to have been
working correctly for such a long time elsewhere.

This was in the 1970's, well before f90 and explicit interfaces,
which today would catch this kind of mistake at compile time. Which
brings me to the point that I wanted to make to the original poster.
If you turn your old f77 style code into f90 code by moving the
routines into modules (which will give the compiler explicit
interfaces to work with), then the compiler will help a lot with all
kinds of errors like this. This particular error is a "rank"
mismatch, but f90 will also catch "type" and "kind" mismatches. This
reduces this kind of debugging effort from days or weeks down to
seconds.
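
A minimal sketch of what that looks like, using hypothetical names
(interp_mod, count_flags) rather than the actual library routines. Once
the routine lives in a module, the caller sees an explicit interface, and
passing a scalar where the dummy is declared as an array is rejected at
compile time.

```fortran
module interp_mod
  implicit none
contains
  ! The dummy IBI is an array of extent KM, as in the library routines.
  subroutine count_flags(km, ibi, total)
    integer, intent(in)  :: km
    integer, intent(in)  :: ibi(km)
    integer, intent(out) :: total
    total = sum(ibi)
  end subroutine count_flags
end module interp_mod

program caller
  use interp_mod, only: count_flags
  implicit none
  integer, parameter :: km = 1
  integer :: ibi(km)   ! array of size 1, matching the dummy's rank
  integer :: total

  ! Declaring ibi as a plain scalar and passing it here would be a
  ! rank mismatch that the compiler can now diagnose.
  ibi = 1
  call count_flags(km, ibi, total)
  print *, total
end program caller
```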

$.02 -Ron Shepard

glen herrmannsfeldt

Oct 26, 2012, 1:29:28 PM

Ron Shepard <ron-s...@nospam.comcast.net> wrote:
> In article <k6ddi2$qem$1...@dont-email.me>,

(snip)

>> Ron Shepard <ron-s...@nospam.comcast.net> wrote:

>> (snip)
>> > However, passing a scalar actual argument and associating it to an
>> > array, even with size 1, is an actual error. It works for compilers
>> > that pass everything by address, but even in f77 days there were
>> > many compilers that passed scalars and arrays in different ways
>> > (e.g. using registers), and this would fail.

(snip, then I wrote)

>> Yes, but note that it is also legal to use an array element
>> as an actual argument for an array dummy, in which case
>> the callee can access all elements from that to the
>> end of the array.

>> So, yes scalars can be passed differently, but it has to be
>> consistent with the above.

> The machines I knew about that did this were the DEC PDP-10 and
> DECSYSTEM-20 machines. In this case, the registers were addressable
> through both memory instructions and with immediate instructions, so
> copies of scalars would often be passed efficiently through
> registers with their register address, while arrays (and array
> elements, for the reason you state above) would be passed through
> their original memory location address.

The other interesting, and related to argument passing, feature
of the PDP-10 is the indirect bit. Instructions and addresses
are stored in 36 bit words, each of which has an indirect bit.
Addresses in argument lists (lists of addresses to the arguments
of a called routine) will have the indirect bit off if they
address the data directly, on if they address the address of
the data. An indirect load of that address, then, gets you the
data, and an indirect store changes it. Indirect is recursive,
so the called routine doesn't need to know how deep it goes.

> I remember the situation where this caught me. I had an array
> actual argument that was passed through several layers of
> subroutines to eventually an array dummy argument. In one of those
> intermediate levels, one of the dummy arguments was declared as a
> scalar. With whatever optimization flags I happened to be using,
> the compiler decided to make a local register copy and pass it
> through the register as a scalar. This was well-tested code that
> ran on a variety of machines, and I did not believe at first that it
> was a code error in that part of the program, so I spent a lot of
> time looking at the wrong places. But when I found the error, I did
> realize what was the mistake and why the code seemed to have been
> working correctly for such a long time elsewhere.

The OS/360 Fortran compilers make local copies of scalars.
They don't have an indirect bit, so it takes one additional
instruction for each level of indirect addressing every time
you reference it. A local copy, plus the instructions to
load and restore it, takes a constant number of instructions.

This means it will also likely fail if the called routine has
too many dummy arguments, even if the program doesn't reference
one of them.

> This was in the 1970's, well before f90 and explicit interfaces,
> which today would catch this kind of mistake at compile time. Which
> brings me to the point that I wanted to make to the original poster.
> If you turn your old f77 style code into f90 code by moving the
> routines into modules (which will give the compiler explicit
> interfaces to work with), then the compiler will help a lot with all
> kinds of errors like this. This particular error is a "rank"
> mismatch, but f90 will also catch "type" and "kind" mismatches. This
> reduces this kind of debugging effort from days or weeks down to
> seconds.

It has to allow for the legal rank mismatch for assumed shape
arrays, though.

-- glen

Richard Maine

Oct 26, 2012, 2:55:29 PM

glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

> It has to allow for the legal rank mismatch for assumed shape
> arrays, though.

I'm (fairly) sure that you mean assumed size. It is not legal for
assumed shape arrays to have mismatched ranks.
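
For readers following along, here is a minimal sketch of the legal case (assumed size, where sequence association applies). The module, function and variable names are made up for illustration; nothing here comes from the code under discussion.

```fortran
module seq_mod
  implicit none
contains
  ! Assumed-size dummy: sequence association applies, so the actual
  ! argument may have a different rank or be a single array element.
  real function sum_first(x, n)
    integer, intent(in) :: n
    real,    intent(in) :: x(*)
    sum_first = sum(x(1:n))
  end function sum_first
end module seq_mod

program seq_demo
  use seq_mod, only: sum_first
  implicit none
  real :: a(2,3)
  a = reshape([1., 2., 3., 4., 5., 6.], [2,3])
  ! Rank-2 actual, rank-1 assumed-size dummy: legal.
  print *, sum_first(a, 6)
  ! Array element actual: the dummy sees a(1,2) through a(2,3).
  print *, sum_first(a(1,2), 4)
end program seq_demo
```

Changing the dummy to assumed shape, x(:), should make the compiler reject both calls, since the rank then has to match.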

--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain

glen herrmannsfeldt

Oct 26, 2012, 4:26:42 PM

Richard Maine <nos...@see.signature> wrote:
> glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
>
>> It has to allow for the legal rank mismatch for assumed shape
>> arrays, though.
>
> I'm (fairly) sure that you mean assumed size. It is not legal for
> assumed shape arrays to have mismatched ranks.

Oops. Once in a while, I think one thing and my hands write
another.

Well, it used to be that I got them wrong because the names seemed
wrong, but that doesn't happen anymore.

Yes, I meant assumed size.

I suppose some will give a warning, though.

-- glen

Ron Shepard

Oct 27, 2012, 2:34:17 AM

In article <k6erm1$epa$2...@dont-email.me>,

This is a good point that I should have mentioned. To take
advantage of the argument mismatch checking that modern fortran
does, you do need to replace all of the fixed-dimensioned (a(m),
a(m,n), etc.) and assumed size (a(*), a(m,*), etc.) arrays with
assumed shape declarations (a(:), a(:,:), etc.). The former fall
back to the storage sequence association conventions of f77, where
not only do compilers typically not warn the programmer about rank
or dimension mismatches, but such things are perfectly valid and
legal. As mentioned before, these sometimes trigger copy-in/copy-out
argument association, which is another reason to avoid them when
possible. With assumed shape dummy declarations, the
copy-in/copy-out operation is usually avoided, and the compiler can
do its TKR (type, kind, rank) argument checking at compile time.
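
A small sketch of what that buys you, assuming the routine lives in a module so that the interface is explicit. All names are hypothetical, and the two commented-out calls are the ones a compiler is expected to reject.

```fortran
module scale_mod
  implicit none
contains
  ! Assumed-shape dummy: type, kind and rank are all part of the
  ! explicit interface that the module gives the compiler.
  subroutine scale_all(a, factor)
    real, intent(inout) :: a(:,:)
    real, intent(in)    :: factor
    a = a * factor
  end subroutine scale_all
end module scale_mod

program tkr_demo
  use scale_mod, only: scale_all
  implicit none
  real :: grid(3,4), vec(12)
  grid = 1.0
  vec  = 1.0
  call scale_all(grid, 2.0)            ! rank 2 actual, rank 2 dummy: accepted
  ! call scale_all(vec, 2.0)           ! rank mismatch: compile-time error
  ! call scale_all(grid(1,1), 2.0)     ! scalar to array dummy: compile-time error
  print *, sum(grid)
end program tkr_demo
```

With the f77-style declaration a(m,n) or a(m,*) instead, both commented-out calls would compile without complaint, which is exactly the kind of mistake that then takes days to find.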

If you are upgrading old code, then this is sometimes kind of
tedious to do. What I generally do is start at the bottom of the
calling tree and do one subroutine at a time and work my way up to
the top.

$.02 -Ron Shepard

Dan Nagle

Oct 27, 2012, 9:02:11 AM

Hi,

On 2012-10-27 06:34:17 +0000, Ron Shepard said:
> As mentioned before, these sometimes trigger copy-in/copy-out
> argument association, which is another reason to avoid them when
> possible. With assumed shape dummy declarations, the
> copy-in/copy-out operation is usually avoided, and the compiler can
> do its TKR (type, kind, rank) argument checking at compile time.

Where you can and your compilers support it, you can use
the contiguous attribute to improve the efficiency of assumed size.

--
Cheers!

Dan Nagle

Ron Shepard

Oct 28, 2012, 1:31:20 PM

In article <k6gm0e$mfs$1...@dont-email.me>,

The CONTIGUOUS attribute is related to this issue, but for
efficiency it can cut both ways. First, I think CONTIGUOUS applies
only to assumed shape arrays (and pointer arrays, which look like
assumed shape in their declarations), it does not apply to assumed
size or arrays with fixed dimensions which the compiler already
knows are storage order contiguous.

If a CONTIGUOUS assumed shape dummy is associated with a
noncontiguous actual argument, then the compiler must either make a
copy of that array to guarantee that it is storage sequence
contiguous, or it must somehow determine by other means that it is
storage sequence contiguous (e.g. by testing memory locations).
Thus the use of CONTIGUOUS might actually trigger a copy that would
not have otherwise occurred.

The utility of CONTIGUOUS comes in when for optimization purposes
the compiler would like to work with a storage order contiguous
array, in which case it would normally make a local copy, operate on
that copy, and then copy the results back to the original. In this
situation, the CONTIGUOUS attribute tells the compiler that these
additional copies are not required.

So if you are writing a subroutine that will be called with both
contiguous and noncontiguous actual arguments, and you want your
assumed shape dummy array to use the original array in both cases
without copies, then you do NOT want to use the CONTIGUOUS attribute
in the declaration. Omitting it does not force the compiler to avoid
the copies; it is still free to make them if it wants (e.g. for cache
locality reasons), but it gives it a hint that the programmer does
not think this is necessary.

On the other hand, if you have a group of subroutines that call each
other with array arguments, and some of those subroutines require
contiguous storage arrays and others don't care, then the compiler
might normally make numerous copies in those cases where storage
sequence order is required. The use of the CONTIGUOUS attribute
could be used to eliminate all of the copies (except possibly for
one at the beginning of the sequence of calls). A common example of
this is when using external subroutines with implicit interfaces;
here the compiler must assume that the actual array arguments need
to be storage order contiguous. Once the arrays have the CONTIGUOUS
attribute, then no redundant copies need to be made, the compiler
can always just pass the address of the first element the way that
f77 did.
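
As a rough sketch of the declaration being discussed (f2008 syntax, so it needs a compiler that supports the attribute; the routine and array names are invented for the example):

```fortran
module contig_mod
  implicit none
contains
  ! CONTIGUOUS on an assumed-shape dummy: inside the routine the
  ! compiler may assume unit stride for x.
  subroutine double_it(x)
    real, contiguous, intent(inout) :: x(:)
    x = 2.0 * x
  end subroutine double_it
end module contig_mod

program contig_attr_demo
  use contig_mod, only: double_it
  implicit none
  real :: a(3,3)
  a = 1.0
  call double_it(a(:,1))   ! contiguous actual: can be passed as is
  call double_it(a(2,:))   ! strided actual: the compiler may copy in and out
  print *, a
end program contig_attr_demo
```

The second call is the case where the attribute can cost a copy that a plain assumed-shape dummy would not have needed.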

Also, there is the associated IS_CONTIGUOUS() intrinsic which allows
the programmer to test the status of a particular array. Presumably
this is the same test that the compiler must do anyway while
associating actual and dummy arguments, and it allows the programmer
to make algorithm choices based on the status of the array.
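
A minimal sketch of that kind of run-time test, again assuming an f2008 compiler. The routine name total and the two code paths are only placeholders for whatever algorithm choice is being made.

```fortran
module pick_mod
  implicit none
contains
  ! Hypothetical routine that picks a code path from the run-time
  ! contiguity of its assumed-shape argument.
  subroutine total(x, s)
    real, intent(in)  :: x(:)
    real, intent(out) :: s
    integer :: i
    if (is_contiguous(x)) then
      s = sum(x)               ! unit-stride path
    else
      s = 0.0
      do i = 1, size(x)        ! strided path
        s = s + x(i)
      end do
    end if
  end subroutine total
end module pick_mod

program contig_query
  use pick_mod, only: total
  implicit none
  real :: a(4,4), s
  a = 1.0
  print *, is_contiguous(a(:,2))   ! a whole column of a column-major array
  print *, is_contiguous(a(2,:))   ! a row: elements are four reals apart
  call total(a(2,:), s)
  print *, s
end program contig_query
```

Which branch is taken inside total for the row section depends on whether the compiler passes the section by descriptor or makes a temporary copy first.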

I have not actually used this attribute yet in my code, but I have
read about it here in c.l.f. and in various documentation. So I'm
not certain the above is correct, but that is my understanding. I
think this is actually quite complicated, and it takes several
readings of the documentation in order to understand what is
supposed to be happening with this attribute.

$.02 -Ron Shepard

glen herrmannsfeldt

Oct 28, 2012, 5:55:26 PM

Ron Shepard <ron-s...@nospam.comcast.net> wrote:

(snip)

>> Where you can and your compilers support it, you can use
>> the contiguous attribute to improve the efficiency of assumed size.

> The CONTIGUOUS attribute is related to this issue, but for
> efficiency it can cut both ways. First, I think CONTIGUOUS applies
> only to assumed shape arrays (and pointer arrays, which look like
> assumed shape in their declarations), it does not apply to assumed
> size or arrays with fixed dimensions which the compiler already
> knows are storage order contiguous.

Contiguous, or close to contiguous, also makes more efficient
use of the cache on modern processors.

-- glen

Dan Nagle

Oct 28, 2012, 6:51:22 PM

Hi,

On 2012-10-28 17:31:20 +0000, Ron Shepard said:

> I have not actually used this attribute yet in my code, but I have
> read about it here in c.l.f. and in various documentation. So I'm
> not certain the above is correct, but that is my understanding. I
> think this is actually quite complicated, and it takes several
> readings of the documentation in order to understand what is
> supposed to be happening with this attribute.

Yes, it's hard to fully understand. (If you check the work list
for f08, there's two work items, something like "contiguous" and
"more contiguous". That's because it was a lot harder than anyone
thought it would be.)

There's simply contiguous, which means compile-time-provable contiguous.
And then there's contiguous, which is run-time contiguous.

Using the is_contiguous() intrinsic, with perhaps different versions
(yes, I know) of key subprograms, you can likely help assumed-shape
array performance. At least, that was the intention.

--
Cheers!

Dan Nagle

Paul van Delst

Nov 1, 2012, 11:01:53 AM

Hey there folks,

Just an update... we moved to a new machine. The previous one was our
pre-supacomputer-delivery test machine that replicated the environment
of the actual final machine.

We ran the tests on the actual to-be-operational machine and everything
passed, regardless of optimisation level, fp switches, etc.

You coulda knocked us down with a feather.

It's my understanding the machines are themselves physically similar
(e.g. same chips etc).

The one difference we know of: compiler version. The previous machine
used ifort v12.0.4 and v12.1.0. The new machine has v12.1.5 (20120612).

Note that we altered the tests to ensure actual and dummy argument
shapes matched in the failing routines, re-ran on both machines and
still got the fail on one, pass on the other.

So we're sorta scratching our heads a bit now. We're pretty much
convinced the problem still exists in our code and that somehow the
updated compiler helped "fix" it. But the wind has been removed from our
"find that bug" sail and the effort has been moved onto a back-burner.
Mostly because when the new machine goes operational and everyone starts
piling on there using the libraries in question we have a better chance
of isolating a failure case (which we can add to our tests).

So to all who chipped in with advice, thanks very much. It was much
appreciated. When we figure out this nut of a problem, I'll be sure to post.

cheers,

paulv
