eispack tests

Wolfman Jack

unread,

Dec 11, 2018, 10:03:14 AM12/11/18

to

Dear All,

Is anybody here familiar with the tests of eispack? I downloaded eispack source from here:
http://ftp.uni-bayreuth.de/math/old/eispack/lib/deispack.f

and its tests from here:
http://ftp.uni-bayreuth.de/math/old/eispack/test/

when compiling and running dch, I get errors depending on optimization level with gfortran 5.4.0. If optimization is switched on, the E2 variable of HTRIDI seems not to be calculated correctly when called by ZH, and that propagates, and the resulting errors are large.

With gfortran 7.3.0 and -O2 (maybe also -O3 ) the results seem to be OK.
(Also, if I change dch to never use the same variable for the offdiagonal elements and their squares, which, in principle is allowed if the squares are not needed, according to the documentation), it works again.

Is this a gfortran bug, or eispack?

Wolfman Jack

unread,

Dec 11, 2018, 10:37:14 AM12/11/18

to

Sorry, by ZH I meant CH, just my local version is renamed to distinguish double precision complex and single precision complex versions. In Eispack, DP and SP routines have the same name, you can only link one to your executable.

robin....@gmail.com

unread,

Dec 12, 2018, 7:51:08 AM12/12/18

to

With code as old as that, relying on adjustable dimensions,
and no IMPLICIT NONE anywhere, consider that there could be
uninitialized variable(s) (including array elements).
That you get different results depending on optimizing level,
is one symptom of uninitialized variables.
Consider also the likelihood of subscript errors.

BTW, what data are you using?

Wolfman Jack

unread,

Dec 12, 2018, 8:32:12 AM12/12/18

to

The tests include data, the one in question uses FILE4{3,4}. Why I did not really think it is subscript error, or uninitialized variables, is that these codes have seen a lot of testing.

What I saw is that CH transform the matrix into HTRIDI, and it works well with that particular gfortran version as well if I
a) lower the optimization level to -O1, or
b) if I just print the elements of the variable E(J) somewhere below where it is set after the DO 260 J=1,L line,
c) change the test routine dch.f not to call CH with the same argument for both E and E2 at a few places.

I checked, and actually the htridi subroutine does compile with -fimplicit-none, and at least with another gfortran version, it also runs the test correctly (at least with plausible results).

Wolfman Jack

unread,

Dec 12, 2018, 8:35:24 AM12/12/18

to

Sorry, I wrote things a bit messy. What I wanted to write was, that what CH does first, is transform the matrix to a tridiagonal form using HTRIDI, and after HTRIDI, the array of the diagonal elements (variable D of HTRIDI) agrees on the computer where it works, and on the one, where it fails, and its variable E does not. So that's why I was looking at HTRIDI.

Now, in HTRIDI, I started printing out intermediate results, and if I just printed out E(J) at the right place (hope I remember correctly where), it just started working.

mecej4

unread,

Dec 12, 2018, 9:57:46 AM12/12/18

to

Inserting WRITE statements often has the effect of inhibiting certain
optimizations. Based on the evidence so far, I suspect that you ran into
a Gfortran optimizer bug in 5.4, a bug that was probably fixed in some
Gfortran release between 6.3 (which I have and gives an error with -O2)
and 7.3, which you have tested and found to work fine.

-- mecej4

mecej...@nospam.invalid
(Replace four by 4, nospam by gmail, invalid by com,
and remove all underscores)

robin....@gmail.com

unread,

Dec 12, 2018, 9:59:15 AM12/12/18

to

That symptom is further evidence that there's an error in the program,
probably unintialised variable or subscript error, as afore suggested.

Wolfman Jack

unread,

Dec 12, 2018, 10:36:04 AM12/12/18

to

So far, I've been unable to find it. Quite annoyingly, compiling the HTRIDI subroutine with -Wall -Wextra -pedantic still gives no warning.

However, it may trick the compiler into weird stuff, because if I use the same output variables to E and E2, then if the optimization changes the order the two are set, then the correct results in E may get overwritten (a kind of pointer aliasing).

Probably an easy fix is to disregard that part of the documentation where it is said that HTRIDI may be used with E and E2 being the same variable in the calling program, if the values in E2 are not needed. Still, I find it quite annoying that I do not know if the program is incorrect and working by sheer luck, or correct and fails due to a compiler bug.

mecej4

unread,

Dec 13, 2018, 5:53:38 AM12/13/18

to

On 12/12/2018 9:36 AM, Wolfman Jack wrote:
> So far, I've been unable to find it. Quite annoyingly, compiling the HTRIDI subroutine with -Wall -Wextra -pedantic still gives no warning.
>
> However, it may trick the compiler into weird stuff, because if I use the same output variables to E and E2, then if the optimization changes the order the two are set, then the correct results in E may get overwritten (a kind of pointer aliasing).
>
> Probably an easy fix is to disregard that part of the documentation where it is said that HTRIDI may be used with E and E2 being the same variable in the calling program, if the values in E2 are not needed. Still, I find it quite annoying that I do not know if the program is incorrect and working by sheer luck, or correct and fails due to a compiler bug.
>

I believe that your conjecture regarding aliasing is correct. I found
that making the following changes to HTRIDI overcomes the problem. This
is just a temporary workaround, and there may be other similar bugs.

Subroutine HTRIDI has two array arguments E and E2. There are two calls
to HTRIDI from DCH where the same array E is used as the actual argument
for these. That is the cause for the aliasing.

The workaround: In HTRIDI, add an automatic array, Etmp(N). At the
beginning of HTRIDI, copy E into Etmp. Wherever you see E( within the
subroutine, replace by Etmp(. Just before returning from HTRIDI, copy
Etmp back into E.

With these changes, I was able to obtain correct output with Gfortran
6.3 with option -O2.

Why not use Lapack in your application?

Wolfman Jack

unread,

Dec 13, 2018, 7:05:24 AM12/13/18

to

And, hopefully, gfortran will optimize away Etmp :)

Thank you very much. Actually, I am not developing a new application, I just wanted to run someone else's old code, it needed EISPACK, so I installed it, tried the test on two computers...and it failed on one.

The I wanted to know, if this is a bug that has survived 40 years in EISPACK (note, that the name of the project developing it, had TESTING in its name), or a compiler bug.

Now I tend to think it is a compiler bug, but HTRIDI demands quite a lot from the compiler.

steve kargl

unread,

Dec 13, 2018, 11:26:35 AM12/13/18

to

Wolfman Jack wrote:

> Now I tend to think it is a compiler bug, but HTRIDI demands quite a lot from the compiler.

If an array is used as the actual argument of two different dummy arguments,
and those dummy are changed, then you have aliasing. That is a programming
error. This is definitely not a compiler bug. Compilers have been allowed to do
aggressive optimizations because things like aliasing are disallowed by the
Fortran standard.

--
steve

robin....@gmail.com

unread,

Dec 13, 2018, 4:06:32 PM12/13/18

to

On Thursday, December 13, 2018 at 11:05:24 PM UTC+11, Wolfman Jack wrote:
> And, hopefully, gfortran will optimize away Etmp :)
>
> Thank you very much. Actually, I am not developing a new application, I just wanted to run someone else's old code, it needed EISPACK, so I installed it, tried the test on two computers...and it failed on one.
>
> The I wanted to know, if this is a bug that has survived 40 years in EISPACK

Yes, it is.

> (note, that the name of the project developing it, had TESTING in its name), or a compiler bug.
>
> Now I tend to think it is a compiler bug, but HTRIDI demands quite a lot from the compiler.

It is a programming error, not a compiler bug.

The relevant code in HTRIDI is the following 3 lines:

E2(I) = SCALE * SCALE * H
G = DSQRT(H)
E(I) = SCALE * G

What do you think happens when E and E2 have been passed the same
argument?

And here's another pair of lines:
130 E(I) = 0.0D0 E2(I) = 0.0D0

Wolfman Jack

unread,

Dec 13, 2018, 6:03:49 PM12/13/18

to

What I am wondering if the standard allows such optimizations to take place, where the meaning of the code is changed by changing the order of the first and the third line of the example you're showing.
Reading the documentation part, the writer clearly wanted to destroy E2 if E and E2 are the same.

At least we understand why the test is failing. Luckily, the code I wanted to run does not use this "feature".

robin....@gmail.com

unread,

Dec 13, 2018, 8:45:44 PM12/13/18

to

When HTRIDI is called, it will execute one or the other of the
two code sequences I posted previously.

Either way, the code is incorrect (that is, it's a programming error).

It is well to note that there are other similar routines in this particular
program (e.g., HTRID3) that do the same thing, and they also are in error
if called the same way.

robin....@gmail.com

unread,

Dec 13, 2018, 8:47:34 PM12/13/18

to

On Friday, December 14, 2018 at 10:03:49 AM UTC+11, Wolfman Jack wrote:
> On Thursday, December 13, 2018 at 10:06:32 PM UTC+1, robin....@gmail.com wrote:
> > On Thursday, December 13, 2018 at 11:05:24 PM UTC+11, Wolfman Jack wrote:
> > > And, hopefully, gfortran will optimize away Etmp :)
> > >
> > > Thank you very much. Actually, I am not developing a new application, I just wanted to run someone else's old code, it needed EISPACK, so I installed it, tried the test on two computers...and it failed on one.
> > >
> > > The I wanted to know, if this is a bug that has survived 40 years in EISPACK
> >
> > Yes, it is.
> >
> > > (note, that the name of the project developing it, had TESTING in its name), or a compiler bug.
> > >
> > > Now I tend to think it is a compiler bug, but HTRIDI demands quite a lot from the compiler.
> >
> > It is a programming error, not a compiler bug.
> >
> > The relevant code in HTRIDI is the following 3 lines:
> >
> > E2(I) = SCALE * SCALE * H
> > G = DSQRT(H)
> > E(I) = SCALE * G
> >
> > What do you think happens when E and E2 have been passed the same
> > argument?
> >
> > And here's another pair of lines:
> > 130 E(I) = 0.0D0
> > E2(I) = 0.0D0
>
> What I am wondering if the standard allows such optimizations to take place, where the meaning of the code is changed by changing the order of the first and the third line of the example you're showing.

The meaning of the code is irrelevant, because the program contains errors.
Anything can happen with a wrong program.

Wolfman Jack

unread,

Dec 14, 2018, 5:29:42 AM12/14/18

to

OK, I understand. So a proper fix would be to remove that line from the documentation that says that if the values in E2 are not needed, then the two arguments can be the same (because they NEVER can be), and fix dch.f (the test) which calls HTRIDI that way.
(And similarly with the documentation of other similar routines, and their tests. I do not know if there are others, DCH is the only test that fails.)

It is annoyingly hard to catch this type of mistake though. I am usually compiling stuff with the -Wall option, and I do not get a warning where DCH calls CH with aliased variables.
(Of course the compiler would have to guess if it is input or output, but there is nothing before assigning a value to it, so it's either output or an uninitialised variable.)

mec...@gmail.com

unread,

Dec 15, 2018, 6:12:16 AM12/15/18

to

On Friday, December 14, 2018 at 4:29:42 AM UTC-6, Wolfman Jack wrote:
> OK, I understand. So a proper fix would be to remove that line from the documentation that says that if the values in E2 are not needed, then the two arguments can be the same (because they NEVER can be), and fix dch.f (the test) which calls HTRIDI that way.

Yes, somebody could fix the documentation/comments in the code, but I doubt that anyone would do that. Eispack is not maintained, as it has been replaced by Lapack.

> (And similarly with the documentation of other similar routines, and their tests. I do not know if there are others, DCH is the only test that fails.)
>
> It is annoyingly hard to catch this type of mistake though. I am usually compiling stuff with the -Wall option, and I do not get a warning where DCH calls CH with aliased variables.
> (Of course the compiler would have to guess if it is input or output, but there is nothing before assigning a value to it, so it's either output or an uninitialised variable.)
>

The error occurs in the CALL, not in the subroutine. Therefore, one cannot expect the compiler (with options such as -Wall, etc.) to issue warnings when compiling the subroutine. Calling with improper arguments (of which aliased arguments is one case) is something that will have to be checked for at run time, not compile time. The subroutine may reside in an object library with no source code available, in which case it is not possible to ask for additional checking of the subroutine code.

Few compilers can help with checking and catching aliased arguments (Lahey-Fujitsu for Linux and NAG Fortran do, to some extent). On the other hand, compilers such as Intel Fortran do provide compiler options to emit code for the caller that avoids optimizations that rely on non-aliasing.

Many old Fortran programs such as Eispack, MATH77, etc., are high quality packages, but will need adaptation for use with modern hardware and compilers, for a number of reasons such as:

1. Hardwired machine constants in the source code

2. Assumption that all variables are saved and initialized to zero, if no explicit initialization is given

3. Equivalenced variables of different types

4. Mismatched subprogram argument lists

Wolfman Jack

unread,

Dec 15, 2018, 6:58:55 AM12/15/18

to

I compiled the calling routine (DCH) with -Wall too. I'll check if there is something like -Waliasing.

mec...@gmail.com

unread,

Dec 15, 2018, 7:11:31 AM12/15/18

to

On Saturday, December 15, 2018 at 5:58:55 AM UTC-6, Wolfman Jack wrote:
> I compiled the calling routine (DCH) with -Wall too. I'll check if there is something like -Waliasing.
>

That would not help much, because the compiler would not know which arguments are liable to be altered in the subroutine being called. If an explicit interface to HTRIDI were made available, with INTENTs known for the aliased arguments, a compiler may emit warnings. But then, these features (intents and interfaces) were definitely not available in Fortran 66.

dpb

unread,

Dec 15, 2018, 9:42:07 AM12/15/18

to

On 12/15/2018 5:58 AM, Wolfman Jack wrote:
> I compiled the calling routine (DCH) with -Wall too. I'll check if there is something like -Waliasing.

...

There's the remotest of outside chances the compiler _might_ catch
something if you compile both calling routine and the subroutine
together by doing some additional global checks but it's purely a
vendor-added feature of the compiler if it were to do so...

--

Dick Hendrickson

unread,

Dec 17, 2018, 12:01:01 PM12/17/18

to

In the general case it's not checkable at compile time.

Something like
subroutine sub(a,b)
a = a + 1
if (a > 0) b = 42
end
Is fine if used like
a = -3
call sub (a, a)
call sub (a, 1)
But both calls are in error if a >= 0

It's the same as
if ( x > 0 ) x = sqrt (x)
it depends on the run time flow of control.

Dick Hendrickson

Louis Krupp

unread,

Dec 17, 2018, 9:04:38 PM12/17/18

to

Stupid question: Is "if ( x > 0 ) x = sqrt (x)" problematic?

Louis

ga...@u.washington.edu

unread,

Dec 18, 2018, 2:12:06 AM12/18/18

to

On Monday, December 17, 2018 at 6:04:38 PM UTC-8, Louis Krupp wrote:

(snip)

> Stupid question: Is "if ( x > 0 ) x = sqrt (x)" problematic?

There is an IBM example for the H Extended compiler
which goes like this:

DO 11 1=1,10
DO 12 J=1,10
9 IF (B(I).LT.O) GO TO 11
12 C(J)=SQRT(B(I)
11 CONTINUE

It seems that the compiler is good enough to figure out
that the SQRT is invariant inside the J loop, and so move
the computation outside the loop. The IF statement stays
in the loop.

Of course the compiler shouldn't do that, but that was
optimizer technology in 1972. I think it is actually
older, moved into H Extended from the H compiler.

Dick Hendrickson

unread,

Dec 18, 2018, 11:03:21 AM12/18/18

to

Maybe if X has some sort of funny IEEE exceptional value. But, in
general, the compiler is not supposed to evaluate the SQRT if X <= 0.
Actually, I think there's an "as if" clause that covers all execution.
The compiler is allowed to evaluate SQRT(X) speculatively, but it has to
hide any of the effects if the logical expression is false.

Dick Hendrickson

>
> Louis
>

Ron Shepard

unread,

Dec 18, 2018, 12:02:15 PM12/18/18

to

On 12/18/18 10:03 AM, Dick Hendrickson wrote:
> Maybe if X has some sort of funny IEEE exceptional value. But, in
> general, the compiler is not supposed to evaluate the SQRT if X <= 0.
> Actually, I think there's an "as if" clause that covers all execution.
> The compiler is allowed to evaluate SQRT(X) speculatively, but it has to
> hide any of the effects if the logical expression is false.

One would want those exceptions to be raised at the point of assignment
of the illegal value, even though they actually occurred earlier in the
execution stream. But this is only a quality of implementation issue,
there is no requirement in the language standard that this occur. The
IEEE exceptions were designed with this in mind, but actual
implementation is slow coming.

$.02 -Ron Shepard

mecejfour

unread,

Dec 25, 2018, 7:58:58 AM12/25/18

to

> On 12/12/2018 9:36 AM, Wolfman Jack wrote:
>> So far, I've been unable to find it. Quite annoyingly, compiling the
>> HTRIDI subroutine with -Wall -Wextra -pedantic still gives no warning.
>>

I have spent some more time looking at the aliasing problems in Eispack. I
find that the aliasing occurs only in the test drivers, not in the library
itself. In most cases, an easy fix is to provide an otherwise not-in-use
scratch array as a replacement for one of the actual arguments, such as
using 'e' and 'e2' in place of 'e' and 'e'. In a few other cases, in the
call surround an aliased scalar argument by '(' and ')'.

In addition to the aliasing problem discussed in this thread, there exists
a defective code section in many of the test problems.

There is a code segment that computes the mean of the absolute values of
the elements of a double precision array, and then scans the array to find
the first element that is less than the just computed mean. The comment in
the code says that the DO loop used for the scan will never complete.
Unfortunately, this is not true on my Windows PC and the few Fortran
compilers on it; the DO loop goes to completion, and the next memory
location after the end of the array gets accessed and the undefined value
there is used subsequently.

Here is the typical code fragment, which you can locate in the "*test.f"
files in the Netlib Eispack/ex directory by searching for the string "THIS
LOOP WILL NEVER BE COMPLETED".

sumz = 0.0d0
do l = 1, n
sumz = sumz + abs(z(l,i))
end do

nrm(i) = sumz
if (sumz==0.0d0) go to 100
! ..........THIS LOOP WILL NEVER BE COMPLETED SINCE THERE
! WILL ALWAYS EXIST AN ELEMENT IN THE VECTOR Z(I)
! LARGER THAN !!Z(I)!!/N..........
do l = 1, n
if (abs(z(l,i))>=nrm(i)/n) go to 80
end do
! NOT in EISPACK tests: print *, 'The impossible happened'
80 tnorm = sign(nrm(i), z(l,i))

Here is a test program to disprove the claim regarding the DO loop never
being completed.

program tst
implicit none
double precision x(10), y
integer :: i, n = 10, iy(2), ix(2)
equivalence (iy,y),(ix,x)
!
y=0.0d0
do i=1,n
x(i) = -sqrt(0.5d0)
y = y+abs(x(i))
end do
y = y/n
print 10,ix(2),ix(1),'item-1',iy(2),iy(1),'mean'
10 format(1x,2Z8.8,2x,A)
end program

Because of rounding, the mean of the array, whose elements are all equal
to -sqrt(2d0), is larger than that value in the last bit, as you can see
by running the program.

-- mecej4

> mecej...@nospam.invalid (Replace four by 4, nospam by gmail, invalid
> by com,
> and remove all underscores)

--
S:\mecej4.sig

mecejfour

unread,

Dec 25, 2018, 8:16:55 AM12/25/18

to

On Tue, 25 Dec 2018 12:58:56 +0000, mecejfour wrote:

> ...the array, whose elements are all equal to -sqrt(2d0)
^1/
That is, -sqrt(0.5d0)

--
S:\mecej4.sig

dpb

unread,

Dec 25, 2018, 10:17:45 AM12/25/18

to

On 12/25/2018 7:16 AM, mecejfour wrote:
> On Tue, 25 Dec 2018 12:58:56 +0000, mecejfour wrote:
>
>> ...the array, whose elements are all equal to -sqrt(2d0)
> ^1/
> That is, -sqrt(0.5d0)

Good on ya' for uncovering the edge case.

A Christmas present similar to a lump of coal, maybe?? :)

--

mecejfour

unread,

Dec 25, 2018, 10:25:52 AM12/25/18

to

Thanks! Make mine anthracite, with no radioactivity, please.

S:\mecej4.sig

ga...@u.washington.edu

unread,

Dec 25, 2018, 12:50:23 PM12/25/18

to

On Tuesday, December 25, 2018 at 4:58:58 AM UTC-8, mecejfour wrote:

> In addition to the aliasing problem discussed in this thread, there exists
> a defective code section in many of the test problems.

> There is a code segment that computes the mean of the absolute values of
> the elements of a double precision array, and then scans the array to find
> the first element that is less than the just computed mean. The comment in
> the code says that the DO loop used for the scan will never complete.
> Unfortunately, this is not true on my Windows PC and the few Fortran
> compilers on it; the DO loop goes to completion, and the next memory
> location after the end of the array gets accessed and the undefined value
> there is used subsequently.

(snip)

Fortunately, and likely also why it wasn't found before now,
this pretty much never happens with real data.

I have found a similar problem in linear regression where
it gets a sqrt of a negative number, computing r. The cases
I have had are when the data points are exactly on a line,
and likely integers also. Nice for tests, but never happens
with real data.

> Here is a test program to disprove the claim regarding the DO loop never
> being completed.

> program tst
> implicit none
> double precision x(10), y
> integer :: i, n = 10, iy(2), ix(2)
> equivalence (iy,y),(ix,x)
> !
> y=0.0d0
> do i=1,n
> x(i) = -sqrt(0.5d0)
> y = y+abs(x(i))
> end do
> y = y/n
> print 10,ix(2),ix(1),'item-1',iy(2),iy(1),'mean'
> 10 format(1x,2Z8.8,2x,A)
> end program
>
> Because of rounding, the mean of the array, whose elements are all equal
> to -sqrt(2d0), is larger than that value in the last bit, as you can see
> by running the program.

Can you find a case where all the numbers are not equal?

mecejfour

unread,

Dec 25, 2018, 3:42:20 PM12/25/18

to

On Tue, 25 Dec 2018 09:50:21 -0800, gah4 wrote:

> (snip)

>
> I have found a similar problem in linear regression where it gets a sqrt
> of a negative number, computing r.

Excel had such a problem, see <https://support.microsoft.com/en-us/
help/829249/you-will-receive-an-incorrect-r-squared-value-in-the-chart-
tool-in-exc>

> Can you find a case where all the numbers are not equal?

Sure. Take an odd number of equally spaced points on a sloped straight
line, y(x). Compute the mean of y and compare every point on the line to
the mean. The median point should coincide with the mean.

-----------
program tst
implicit none
double precision x(19), avg, s
integer :: i, n = 19
character(2) :: cmp(19)
!
read *, s ! slope
x = [(sqrt(0.5d0) + s*(i-10), i = 1, n)]
avg = sum(x)/n ! mean(x)
print 10, avg
10 format(1x,'Mean = ',F18.15)
! compare element of x to avg (10th should equal avg)
where (x < y)
cmp = 'LT'
else where (x == y)
cmp = 'EQ'
else where
cmp = 'GT'
end where
print 30, (i, x(i), cmp(i), y, i = 9, 11)
30 format('x(',I2,') = ',F18.15,2x,A2,2x,F18.15)
end program
-------------

With s = 0.1, the output agrees with mathematical expectations:

Mean = 0.707106781186548
x( 9) = 0.607106781186548 LT 0.707106781186548
x(10) = 0.707106781186548 EQ 0.707106781186548
x(11) = 0.807106781186548 GT 0.707106781186548

With s = 1, rounding error becomes noticeable, with the centroid lying
below the line:

Mean = 0.707106781186547
x( 9) = -0.292893218813452 LT 0.707106781186547
x(10) = 0.707106781186548 GT 0.707106781186547
x(11) = 1.707106781186547 GT 0.707106781186547

With s = 10, the centroid appears to rise above the line:

Mean = 0.707106781186553
x( 9) = -9.292893218813451 LT 0.707106781186553
x(10) = 0.707106781186548 LT 0.707106781186553
x(11) = 10.707106781186548 GT 0.707106781186553