
running the gfortran testsuite with various compilers


Janus

Jan 22, 2018, 5:42:08 PM
Hi all,

I recently engaged in the exercise of checking various compilers on the gfortran testsuite, partly inspired by Steve's recent post on the flang-dev mailing list (http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2018-January/000008.html), partly triggered by frustration about numerous ifort regressions.

The gfortran testsuite is the largest public collection of Fortran test cases that I'm aware of, and it is routinely used to check for regressions in the gfortran compiler when new features are added or bugs are fixed. It essentially grows with every modification of the compiler and every bug that is fixed.

It is certainly an interesting question how well other compilers are able to cope with this collection of test cases. To a certain extent it can be regarded as a measure of quality of implementation and coverage of Fortran standard features in any compiler (however, with a gfortran-centric bias, since the set of test cases is constructed such that it specifically includes only cases that gfortran is able to handle well - test cases for open/unfixed bugs are not included, yet).

Technically, not all test cases are suited to being run with other compilers. In particular, many of them are compile-time tests that check the compiler's response to wrong code (with gfortran-specific wording of error messages etc.). And even among the runtime tests, which are supposed to be valid Fortran code, some require specific compiler options (which are not trivially translated to other compilers) etc.
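
To illustrate, such a compile-time test is marked up with DejaGnu directives in Fortran comments, schematically like this (a made-up sketch, not an actual testsuite file; the quoted message pattern mimics gfortran's wording):

! { dg-do compile }
program conv_err
  integer :: i
  i = "text" ! { dg-error "Cannot convert" }
end program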

To solve all this, I ended up writing a small cmake/ctest script that selects suitable cases, compiles them with a given compiler, and runs the resulting executables; see here:

https://gist.github.com/janusw/17a294125d6956bea736a20c409e7881

ctest helps to do all of this in an orderly fashion, even in parallel if desired, and presents a nice overview of the results. I chose cmake/ctest over dejagnu (which is used originally in the gcc testsuite) just because I'm more comfortable with it and found it easier to handle.

I'll show some results here, starting with various versions of gfortran:

gfortran trunk: 100% tests passed, 0 tests failed out of 1553
gfortran 7.2: 96% tests passed, 64 tests failed out of 1553
gfortran 6.3: 93% tests passed, 113 tests failed out of 1553
gfortran 5.4: 90% tests passed, 152 tests failed out of 1553
gfortran 4.9.4: 86% tests passed, 217 tests failed out of 1553
gfortran 4.8.5: 84% tests passed, 251 tests failed out of 1553
gfortran 4.7.4: 80% tests passed, 303 tests failed out of 1553
gfortran 4.6.4: 77% tests passed, 363 tests failed out of 1553
gfortran 4.4.7: 61% tests passed, 600 tests failed out of 1553

It's not surprising that the current trunk of gfortran has a 100% success rate, since I used the testsuite from gfortran trunk, which contains all tests that are known to work with this version of the compiler. Going from the older versions to the newer ones, you can nicely see the progress, with about 600 relevant runtime tests added to the testsuite since version 4.4.

Now, more interestingly, I also tried a few other compilers:

ifort 18.0.1: 82% tests passed, 281 tests failed out of 1553
sunf95 12.6: 70% tests passed, 472 tests failed out of 1553
flang: 63% tests passed, 570 tests failed out of 1553
pgfortran 17.10: 63% tests passed, 575 tests failed out of 1553

Actually it is quite simple to try this: you just have to check out the latest version of GCC (e.g. via svn or git), have access to some Fortran compiler, and have cmake, which is ubiquitous these days. It only takes a few minutes to run all tests, and I hope people can contribute results from other compilers (NAG? IBM? Absoft?).

In any case, the results presented above should be taken with a grain of salt. They certainly cannot be taken as an indication that "ifort is 20% inferior to gfortran" or something like that. First of all, the set of tests is biased: There are apparently 281 tests for which gfortran works while ifort fails, but there may well be 300 other tests (in the non-public ifort testsuite?) where ifort works and gfortran fails. Unfortunately we don't know.

Apart from that bias, there are technical reasons why not all test cases are suitable for being run with all compilers. I tried to select only the 'universally applicable' test cases, but that selection is probably not perfect, and I'd be happy to receive feedback on my methodology and suggestions for improvement.

Finally, it might well be that several test cases use non-standard extensions of the GNU compiler, which are not implemented in other compilers (I have not checked this in detail yet). In that sense, the above numbers might rather indicate "compatibility with gfortran" than "compatibility with the Fortran standard".

To go further, one could do the same thing also with other sets of test cases. E.g. the github repo of flang also has a good number of test cases (almost 1000 by now). I haven't tried really hard to run those with other compilers, but maybe someone else is inclined to try this? Or maybe someone is aware of other public collections of Fortran test cases ...?

Cheers,
Janus

urba...@comcast.net

Jan 22, 2018, 9:09:10 PM
Perhaps the gfortran compiler itself can be used to reduce the tests into subsets more appropriate for testing against other compilers by selecting only tests from the set that pass when using -std=f95|f2003|f2008 ?

Janus

Jan 23, 2018, 10:12:44 AM
On Tuesday, January 23, 2018 at 3:09:10 AM UTC+1, urba...@comcast.net wrote:
> Perhaps the gfortran compiler itself can be used to reduce the tests into subsets more appropriate for testing against other compilers by selecting only tests from the set that pass when using -std=f95|f2003|f2008 ?

That's a good point, but there's a catch.

I tried compiling all tests with gfortran trunk and the following additional flags:
1) -std=f2018
2) -std=f2018 -fall-intrinsics

The first disallows all kinds of non-standard extensions; the second additionally permits non-standard intrinsics (but no other extensions).

The results are:
1) 11% tests passed, 1377 tests failed out of 1553
2) 98% tests passed, 33 tests failed out of 1553

So: Almost all of the tests use non-standard intrinsics. I think by far the most popular one is "call ABORT()", which is frequently used to abort if wrong run-time results are detected. Luckily this one is also accepted by most other compilers.
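
To illustrate the pattern, a typical runtime check looks schematically like this (again a made-up sketch, not an actual testsuite file):

program runtime_check
  integer :: a(3)
  a = [1, 2, 3]
  ! ABORT is a GNU intrinsic: rejected under plain -std=f2018,
  ! accepted again with -fall-intrinsics.
  if (sum (a) /= 6) call abort ()
end program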

When allowing non-std intrinsics, the number of failures gets quite small, which means: apart from non-std intrinsics, almost all tests are standard conforming (according to gfortran).

The remaining question is: How many non-standard intrinsics (other than ABORT) are used in the tests?

The usage of non-std subroutines should usually show up as an undefined reference. For the case of ifort 18 there are only a handful of those:

$ grep -w 'undefined reference' Testing/Temporary/LastTest.log
associate_28.f90:(.text+0x7b7): undefined reference to `testmod_c_mp_setpt_'
backtrace_1.f90:(.text+0x37): undefined reference to `backtrace_'
complex_intrinsic_1.f90:(.text+0x41): undefined reference to `complex_'
complex_intrinsic_1.f90:(.text+0x57): undefined reference to `complex_'
complex_intrinsic_1.f90:(.text+0x7e): undefined reference to `complex_'
complex_intrinsic_1.f90:(.text+0x94): undefined reference to `complex_'
submodule_1.f08:(.text+0xc5): undefined reference to `foo_interface_mp_greet_'
submodule_1.f08:(.text+0xe2a): undefined reference to `foo_interface_mp_farewell_'



Cheers,
Janus

Janus

Jan 24, 2018, 5:43:34 PM
Hi everyone,

I'll give a quick update of what I did in the meantime:
1) Updated the gcc trunk, giving me one more relevant test case.
2) Updated the cmake script from revision 3 to revision 5, with these changes:
a) Fix a small issue with ifort and certain coarray tests.
b) Split each test into a compilation and an execution test, to allow for a more fine-grained analysis.

You can see the details of the changes on github (in case you're interested).

With these updates I re-ran the tests. I can now report two numbers for each compiler:
1) the percentage of test cases that can be successfully compiled ('CMP')
2) the percentage of test cases that can be successfully executed ('EXE')

The second number is essentially what I reported before. The first number is always larger than the second one (since a test can only be executed if it was successfully compiled).

The results:

ifort 18.0.1:
CMP: 92% tests passed, 128 tests failed out of 1554
EXE: 82% tests passed, 277 tests failed out of 1554

sunf95 12.6:
CMP: 76% tests passed, 377 tests failed out of 1554
EXE: 70% tests passed, 473 tests failed out of 1554

flang:
CMP: 78% tests passed, 339 tests failed out of 1554
EXE: 63% tests passed, 571 tests failed out of 1554

pgfortran 17.10:
CMP: 78% tests passed, 342 tests failed out of 1554
EXE: 63% tests passed, 581 tests failed out of 1554


You can see that the percentage of compilation success is often significantly higher than the execution success. In particular flang & PGI win in this area, so that their CMP metric is comparable to Oracle's (or even a bit better), which supposedly means that the support of syntax constructs is on a similar level in these three compilers. But then PGI/flang seem to have many more runtime bugs than Oracle, unfortunately. (Note that this is just my naive immediate interpretation, without looking at the actual test failures very closely.)

As before, I'm open to criticism of my methodology. Obviously I'm trying to establish some standard metric here, by which a developer can assess the quality of any given Fortran compiler (without having to believe in catchy marketing phrases like "full Fortran 2018 support"). Some vendors may be happy about this highly effective way of bug reporting, some may not be so flattered about the transparency it brings, but I hope it benefits the Fortran community at large and helps improve compiler quality.

Also, I repeat my call for participation: If you have access to a compiler not listed above, please take five minutes of your time and CPU power to run the script from https://gist.github.com/janusw/17a294125d6956bea736a20c409e7881 (as described in its header) and report back here with the results you obtain.

Cheers,
Janus

Lynn McGuire

Jan 24, 2018, 6:56:28 PM
A comparison of the total CMP time and the total EXE time would be
interesting also.

Lynn


FortranFan

Jan 24, 2018, 8:00:03 PM
On Wednesday, January 24, 2018 at 5:43:34 PM UTC-5, Janus wrote:

> .. I hope it benefits the Fortran community at large and helps improve compiler quality ..


@Janus,

Truly great effort; this can prove very helpful to those who need to work with different Fortran compilers.

FYI, my interest is mainly toward the possibility of a separate *part* of the ISO IEC standard on Fortran, say 1539-3, that constitutes test cases toward every aspect addressed by the standard, particularly the constraints. My vision is for a series of standard test cases which become both validation sets for compiler implementations wanting to claim "full Fortran XXXX support" as well as a standard regression suite for future revisions to said implementations.

Toward this, I think it will be useful if the Fortran user community can come together for a *separate* GitHub initiative on development of test cases - positive and negative, compilation and execution - toward the standard facilities in Fortran. For all the facilities introduced starting with the landmark 2003 revision, the test cases would preferably be classified into sections based on the "What is new in Fortran XXXX" documents that WG5 provides (by John Reid until now; e.g., see this thread https://groups.google.com/d/msg/comp.lang.fortran/CRHeQ65MAr4/iHTjLwm3AAAJ), which are also being used by Fortranplus to provide feature implementation comparisons at their website: https://www.fortranplus.co.uk/app/download/23704631/fortran_2003_2008_compiler_support.pdf.

Consider the thread on "Parameterized Derived Types make first appearance in gfortran 8.0.0" on this forum:
https://groups.google.com/d/msg/comp.lang.fortran/NDE6JKTFbNU/WjLYYqVcAwAJ

You will see a number of cases posted in that thread in an attempt to check the gfortran implementation relative to the Fortran standard.

Same with this thread on "User Defined Derived-Type IO in gfortran"
https://groups.google.com/d/msg/comp.lang.fortran/dNkR9JC4us8/RiBGjlqRAwAJ

So for each feature - whether PDTs or coarrays or submodules or enhanced interoperability with C or whatever - say a number of cases are put together via a collaborative effort by developers to *codify* the standard specifications for the feature; an implementation can then strive to pass such tests before being able to claim support for the feature.

And if the cases can eventually be 'standardized', meaning reviewed and validated by the standards committee, and captured in an ISO IEC part, it can help institutionalize the work process to enhance the reliability of standard Fortran implementations.

Janus

Jan 25, 2018, 11:44:16 AM
Hi Lynn,

On Thursday, January 25, 2018 at 12:56:28 AM UTC+1, Lynn McGuire wrote:
>
> A comparison of the total CMP time and the total EXE time would be
> interesting also.

Yes, I thought so as well, and ctest by default reports timings for the tests anyway. However, they are not quite as enlightening as one might hope. In any case, here they are:


gfortran trunk:
CMP: 100% tests passed, 0 tests failed out of 1554, Total Test time (real) = 32.88 sec
EXE: 100% tests passed, 0 tests failed out of 1554, Total Test time (real) = 3.44 sec

ifort 18.0.1:
CMP: 92% tests passed, 128 tests failed out of 1554, Total Test time (real) = 78.17 sec
EXE: 82% tests passed, 277 tests failed out of 1554, Total Test time (real) = 15.44 sec

sunf95 12.6:
CMP: 76% tests passed, 377 tests failed out of 1554, Total Test time (real) = 37.01 sec
EXE: 70% tests passed, 473 tests failed out of 1554, Total Test time (real) = 11.48 sec

flang:
CMP: 78% tests passed, 339 tests failed out of 1554, Total Test time (real) = 27.55 sec
EXE: 63% tests passed, 571 tests failed out of 1554, Total Test time (real) = 18.48 sec

pgf95 17.10:
CMP: 78% tests passed, 336 tests failed out of 1554, Total Test time (real) = 54.83 sec
EXE: 63% tests passed, 576 tests failed out of 1554, Total Test time (real) = 18.72 sec


The reason why they are not very meaningful is mostly that all of the test cases are very short, and most of them do not do any significant amount of work at runtime (the execution times are even shorter than the compile times). Basically none of those tests were ever aimed at being performance tests. They're merely compiler correctness tests. Also, I'm not passing any optimization flags to the compilers, so if and how they optimize by default is up to them (and may be different for each).

So, I would not recommend drawing any conclusions from the timing numbers above. They vary quite a bit among the compilers, though. I'm quite surprised that the execution time with gfortran is so much shorter than with the other compilers. But that certainly does not mean that gfortran will exceed the performance of other compilers on large numerical codes by the same amount (actually ifort is usually slightly faster in my experience).

In summary, these figures are not an appropriate metric of runtime performance, and they're certainly not what you would expect to see in real-world codes. I'm only showing them because you asked me to ;)

Cheers,
Janus

Neil

Jan 26, 2018, 10:07:23 AM
Janus,

On Wednesday, January 24, 2018 at 3:43:34 PM UTC-7, Janus wrote:
> As before, I'm open to criticism of my methodology. Obviously I'm trying to establish some standard metric here, by which a developer can assess the quality of any given Fortran compiler (without having to believe in catchy marketing phrases like "full Fortran 2018 support"). Some vendors may be happy about this highly effecting way of bug reporting, some may not be so flattered about the transparency it brings, but I hope it benefits the Fortran community at large and helps improve compiler quality.

You may find https://github.com/nncarlson/fortran-compiler-tests of interest. I share your frustration with the state of Fortran compilers. An obvious shortcoming of using the gfortran test suite as the metric (as I'm sure you already know) is that it only includes tests (apparently) that gfortran passes. There are loads (and loads) of test cases that it fails. But it is a good start.

-Neil

Janus

Jan 26, 2018, 3:21:20 PM
Hi Neil,
Indeed. Thanks for the link! Wasn't aware of that.

I hope you don't mind a few contributions from my side? I'm starting to grow a nice little collection of ifort bugs.

Btw, the folder 'nag-bugs' seems to indicate that you have access to the NAG compiler. Could I ask you the favor of trying my script on the gfortran testsuite with NAG? I'm kind of curious how it scores ...


> I share your frustration with the state of Fortran compilers.

I'm not entirely frustrated. gfortran is very stable for my applications, but all recent ifort releases are a complete mess.

When I entered the Fortran scene ten years back (starting my diploma thesis), I was like: I urgently need an open compiler to develop on my laptop, where I don't have the commercial licenses from the university cluster. That fortunately worked out quite nicely. These days it's more like the open compiler is the only thing that works reliably.


> An obvious shortcoming of using the gfortran test suite as the metric (as I'm sure you are already know) is that it only includes tests (apparently) that gfortran passes.

I tried to stress that in the disclaimer when starting off this thread.

It's mostly its extent that makes it so useful IMHO.

And: While it's not a really 'fair' metric for comparing a given compiler's quality to gfortran, I think it's actually quite fair for comparing the non-gfortran compilers among each other. E.g. if ifort scores 20 percentage points better than PGI, then I think that reflects the actual quality difference quite well.

Cheers,
Janus

michael siehl

Jan 26, 2018, 5:11:05 PM

https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/754827

Sick behaviour. I would recommend to apologize.

campbel...@gmail.com

Jan 27, 2018, 1:03:34 AM
On Saturday, January 27, 2018 at 9:11:05 AM UTC+11, michael siehl wrote:
> https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/754827
>
> Sick behaviour. I would recommend to apologize.

As a compiler user, I don't understand.

If there is a significant set of test codes out there and the compiler fails with 20% of them, I would hope there was an interest in finding out why all the failures occur.
They may be a skewed set, based on GCC development, but they are a significant set of recent Fortran codes all the same.

Do the tests and let the user base know if the 20% is something to worry about.
Taking the legal route to denial is not encouraging.

Neil

Jan 27, 2018, 9:34:25 AM
On Friday, January 26, 2018 at 1:21:20 PM UTC-7, Janus wrote:
> Could I ask you the favor to try my script on the gfortran testsuite with NAG? I'm kind of curious how it scores ...

Looks like this is a no-go, because the vast majority of tests appear to use
an abort intrinsic that the NAG compiler doesn't recognize. The NAG compiler
has a "purist" bent and eschews most non-standard language extensions. If you
have any ideas for a workaround let me know; I'm curious myself how it does.

-Neil

Neil

Jan 27, 2018, 9:57:40 AM
On Friday, January 26, 2018 at 3:11:05 PM UTC-7, michael siehl wrote:
> https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/754827
>
> Sick behaviour. I would recommend to apologize.

Thanks for the link; a very interesting read.

I was completely unsurprised by Intel's responses, however. One needs to
look at things from their perspective to understand. Despite the technical
merits of the matter, there is the legal liability risk which is potentially
huge (and lawyers are going to err on the conservative side). The viral
design of the GPL license, which I assume the test suite falls under, makes
source code licensed under it untouchable to closed-source software vendors.

Janus

Jan 27, 2018, 10:43:21 AM
Well, yeah, I was kinda waiting for that to happen, and it's certainly no surprise that NAG is the first one to hit that problem, given its reputation as the pickiest compiler alive.

Regarding solutions, two things come to mind:
1) Fixing the gfortran testsuite to be more standard-conforming, e.g. by replacing "call abort" by "exit 1" (or something similar). This problem affects a lot of tests, but the replacement could be done fully mechanically.
2) Provide an explicit implementation of the subroutine "abort" to NAG, like so:

subroutine abort
stop 1
end subroutine

Put that in a small source file and build it together with each test case.


In principle both solutions should be feasible. Option #2 would be more straightforward, but in the long run I think #1 would be good to have as well. I never had a good feeling about this massive structural usage of non-std intrinsics in gfortran's test suite.

Cheers,
Janus

Thomas Koenig

Jan 27, 2018, 10:49:55 AM
Neil <neil.n....@gmail.com> wrote:

> I was completely unsurprised by Intel's responses, however. One needs to
> look at things from their perspective to understand. Despite the technical
> merits of the matter, there is the legal liability risk which is potentially
> huge (and lawyers are going to err on the conservative side). The viral
> design of the GPL license, which I assume the test suite falls under, makes
> source code licensed under it untouchable to closed-source software vendors.

For test cases, this argument is strange. There is nothing in
the GPL which prevents you from using the test cases internally.
I am assuming that Intel does not distribute test cases to customers.

And distributing a compiled source code which does nothing but not
call abort would sort of defeat the purpose, and even distribution
of the source would be fine.

Thomas Koenig

Jan 27, 2018, 11:25:59 AM
Janus <ja...@gcc.gnu.org> wrote:

> 1) Fixing the gfortran testsuite to be more standard-conforming,
> e.g. by replacing "call abort" by "exit 1" (or something
> similar). This problem affects a lot of tests, but the replacement
> could be done fully mechanically.

Replacing 'call abort' with 'stop 1' could be done automatically,
especially since 'stop 1' is shorter than 'call abort'. Otherwise,
automatic replacements could have led to trouble with line lengths.

Sounds like something for gcc 9.

Neil

Jan 27, 2018, 11:30:08 AM
On Saturday, January 27, 2018 at 8:43:21 AM UTC-7, Janus wrote:
> 1) Fixing the gfortran testsuite to be more standard-conforming, e.g. by replacing "call abort" by "exit 1" (or something similar). This problem affects a lot of tests, but the replacement could be done fully mechanically.

I just did this, replacing "call abort" with the standard "stop 1".

Got lots of errors with the CMP suite. However, we need to be very careful
in interpreting what this means. For example, the very first "failure" I
investigated was with the associate_23.f90 test, and the failure was due to
illegal code that gfortran accepted:

[...]
character(len = 15) :: char_var, char_var_dim (3)
[...]
ASSOCIATE(should_work=>char_var_dim(1:2))
should_work = ["test SUCCESFUL", "test_SUCCESFUL", "test.SUCCESFUL"]

$ nagfor associate_23.f90
NAG Fortran Compiler Release 6.1(Tozai) Build 6144
Error: associate_23.f90, line 30: Different vector lengths (2 and 3)

This should have registered as a failure on all of the other compilers too.

I'm going to continue to pick through the failures to see what I find.
Where there are errors in the gfortran test suite it would be nice to see
those fixed. Perhaps this would be best done off-list.

-Neil


Neil

Jan 27, 2018, 11:47:19 AM
On Saturday, January 27, 2018 at 8:49:55 AM UTC-7, Thomas Koenig wrote:
> Neil schrieb:
I don't disagree with you. It's just that corporations like Intel are
hypersensitive. They've got hundreds (thousands?) of developers doing
stuff with the compiler, and all it takes is one of them to do something
wrong just once to create all kinds of problems (with the GPL). I was
responsible for a GPL-licensed code that a corporate collaborator
wanted to use and modify internally, but never had any intention of
distributing outside the company, and their IP lawyer just couldn't see
his way clear to allowing it, despite legitimate arguments similar to
yours. In the end we arranged for a special non-GPL license. BTW, we
subsequently changed the code license to BSD, partly due to that experience.

Ron Shepard

Jan 27, 2018, 12:11:19 PM
On 1/27/18 9:43 AM, Janus wrote:
> 2) Provide an explicit implementation of the subroutine "abort" to NAG, like so:
>
> subroutine abort
> stop 1
> end subroutine

There is now also an ERROR STOP statement in the standard. I forget when
this was introduced, but it was some time after f90 and f95, so it is
recent from that perspective. I think the difference between STOP and
ERROR STOP is in how signals, such as floating point exceptions, are
communicated back to the operating system. Perhaps ERROR STOP can also
print a traceback or other debugging information to the error unit.
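
A minimal sketch of the two statements (the exact exit status and any traceback output are processor dependent):

program stop_demo
  integer :: answer
  answer = 6 * 7
  ! Error termination (F2008); on a coarray program this also
  ! brings down all other images.
  if (answer /= 42) error stop "wrong result"
  ! Normal termination.
  stop
end program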

$.02 -Ron Shepard

michael siehl

Jan 27, 2018, 2:24:04 PM
|There is now also an ERROR STOP statement in the standard.

It was introduced in Fortran 2008 and is coarray related: when executed on one coarray image, it initiates error termination on all other coarray images as well.
Cheers

FortranFan

Jan 27, 2018, 4:45:35 PM
On Friday, January 26, 2018 at 10:07:23 AM UTC-5, Neil wrote:

> .. I share your frustration with the state of Fortran compilers. An obvious shortcoming of using the gfortran test suite as the metric (as I'm sure you are already know) is that it only includes tests (apparently) that gfortran passes. There are loads (and loads) of test cases that it fails. ..


OP made a series of posts at the Intel Fortran forum too, some of which a reader publicly criticized as objectionable (https://groups.google.com/d/msg/comp.lang.fortran/AIHRQ2kJv3c/Q66l9ApRBQAJ). However, a larger issue is the strong suggestion, in the original post as well as in subsequent posts by OP, to venture into compiler comparisons, notwithstanding all the disclaimers.

To paraphrase the Bard, things are "deeply rotten in the state of" Fortran! There are issues galore with reliability of implementations, constrained as they are with limited resources.

OP has already been informed, "No objective, general-purpose comparison is possible between compilers given the state they are all in at the moment, it's all relative to one's needs and interests."
https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/754685#new

The original post finds:
"gfortran trunk: 100% tests passed, 0 tests failed out of 1553" and
"It's not surprising that the current trunk of gfortran has 100% success rate, since I used the testsuite from gfortran trunk, which contains all tests that are known to work with this version of the compiler"

Yet take the first test I looked at based on the regression with Intel Fortran compiler 18.0: "aliasing_array_result_1.f90" at GitHub:
https://github.com/gcc-mirror/gcc/blob/master/gcc/testsuite/gfortran.dg/aliasing_array_result_1.f90

Now consider a reduced test of that case with one modification to the CONTAINed function, i.e., to use the RESULT keyword with the function. WHOOF, the test with gfortran fails!! See below:

--- begin console output ---

C:\Temp>type p.f90
use, intrinsic :: iso_fortran_env, only : compiler_version
integer, parameter :: ONE = 1
integer, parameter :: TEN = 10
integer, parameter :: FIVE = 5
integer :: check(ONE) = TEN
integer :: a(ONE)
print *, "Compiler Version: ", compiler_version()
a = FIVE
! This aliases 'a' by host association
a = f()
if ( any(a /= check) ) print *, "Test failed."
contains
function f() result(r) !<-- Note the keyword
integer :: r(ONE)
r = -FIVE
r = a - r
end function f
end


C:\Temp>gfortran p.f90 -o p.exe

C:\Temp>p.exe
Compiler Version: GCC version 8.0.0 20171217 (experimental)
Test failed.

C:\Temp>
--- end console output ---

So my point is one can read little into "% NN passed" reports, not to mention it can be off-putting to quite a few people while being misleading to some. There is a general problem with reliability of Fortran implementations**. However any suggestion to any commercial entity to do anything with any GPL content is understandably problematic.

I continue to lean toward the ISO IEC framework to provide a helping hand toward improving the reliability of Fortran implementations, say with a part 2 or 3 of the Fortran standard e.g., 1539-3, which can constitute test cases toward every aspect addressed by the standard, particularly the constraints.

My view remains such a part to ISO IEC 1539 can provide a universally accessible set of standard test cases for both commercial as well as FOSS implementations. These then become both validation sets for compiler implementations wanting to claim "full Fortran XXXX support", where XXXX is some ISO IEC standard revision number (e.g., 2018), as well as serve as a standard regression suite for future revisions to said implementations.


---------------------------------------------------
** But no, this does NOT mean Fortranners can simply look to transport themselves back to supposedly simpler times with FORTRAN 77 or like, as suggested by some on this forum.

Tim Prince

Jan 27, 2018, 8:23:25 PM
Here is the best gfortran Windows (cygwin) result I have seen posted on
the gcc results site in recent years:

=== gfortran Summary ===

# of expected passes 46116
# of unexpected failures 19
# of unexpected successes 6
# of expected failures 79
# of unresolved testcases 13
# of unsupported tests 170
..../gcc/testsuite/gfortran/../../gfortran
version 8.0.0 20180102 (experimental) (GCC)

The "unexpected successes" are due to assuming that 64-bit Windows will
have the same default floating point mode settings as 32-bit Windows.
No one has ever been willing to explain why the gfortran test suite doesn't
itself set the expected modes. As gfortran has supported IEEE_ARITHMETIC
for some years now, that should not be as difficult to accomplish as it
may have been in the past. The rounding modes and gradual vs abrupt
underflow are under control of IEEE_ARITHMETIC, leaving the x87 modes
out of reach of Fortran (but relevant only to real(10)). I think the
most severe case which gfortran deals with is the different settings
presented by the freebsd vs linux systems. People should note that
compilers other than gfortran will not set modes compatible with
gfortran test suite. Also, of course, the real(10) tests will fail on
compilers which don't support that.
There are also 4 failures in the libgomp Fortran Win64 test suite.

You are entitled to submit individual problem reports against ifort,
quoting the source of the test and the license (is the gfortran testsuite
explicit as to its license, or are various licenses involved?).

Having retired from Intel, I can confirm there was a rule that no one
who had access to beta releases of Intel compiler builds (not source
code) was permitted even to submit patches for gnu compiler or binutils
bugs. The Intel beta tests were opened to gfortran developers, but
Intel personnel whose job required their use in effect had their open
source paperwork revoked during their time at Intel.
As you have seen, Intel has made a conscientious effort to prevent
inclusion of open source in proprietary products contrary to the open
source licenses, one of the possible penalties for violation being the
loss of closed source status. Intel's business has depended on quality
of support of both proprietary and open source tools.
I jumped through all the hoops to get one classical test suite approved
for use on ifort, including receiving explicit approval from all the
original authors 2 decades after the fact. Then little advantage was
taken of my effort, although I have seen C++ translations of my work
creep into some of the Intel webinars.
Needless to say, I don't speak for Intel or any past employee, nor is
there any chance of my understanding for example how Intel can support
llvm compilers in preference to gnu ones.

--
Tim Prince

Janus

Jan 29, 2018, 2:47:29 AM
On Saturday, January 27, 2018 at 5:30:08 PM UTC+1, Neil wrote:
> On Saturday, January 27, 2018 at 8:43:21 AM UTC-7, Janus wrote:
> > 1) Fixing the gfortran testsuite to be more standard-conforming, e.g. by replacing "call abort" by "exit 1" (or something similar). This problem affects a lot of tests, but the replacement could be done fully mechanically.
>
> I just did this, replacing "call abort" with the standard "stop 1".

... as seen in https://github.com/nncarlson/gfortran.dg/commit/82f3582391402bd01c1f8393c72ce20522ebc62c



> Got lots of errors with the CMP suite. [..]
> Where there are errors in the gfortran test suite it would be nice to see
> those fixed.

I noticed https://github.com/nncarlson/gfortran.dg/issues. Thanks for doing that! I also put a "meta-bug" for this into the GCC bugzilla:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84094

Cheers,
Janus

Ron Shepard

Jan 29, 2018, 11:08:58 AM
On 1/29/18 1:47 AM, Janus wrote:
> On Saturday, January 27, 2018 at 5:30:08 PM UTC+1, Neil wrote:
>> On Saturday, January 27, 2018 at 8:43:21 AM UTC-7, Janus wrote:
>>> 1) Fixing the gfortran testsuite to be more standard-conforming, e.g. by replacing "call abort" by "exit 1" (or something similar). This problem affects a lot of tests, but the replacement could be done fully mechanically.
>> I just did this, replacing "call abort" with the standard "stop 1".
> ... as seen in https://github.com/nncarlson/gfortran.dg/commit/82f3582391402bd01c1f8393c72ce20522ebc62c

An alternative is to add an ABORT() subroutine to your code. Inside that
subroutine, you can generate tracebacks, write to error units, shut down
communication channels, and do whatever else might be important in
addition to the STOP 1 statement which typically sets the return code.
That way, there are no changes required anywhere else in the program.
And if you decide in the future you want to do something a little
differently when the code aborts, you only need to change one place, not
dozens of places.
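
A minimal sketch of such an ABORT() (what exactly it reports before stopping is up to the application):

subroutine abort
  use, intrinsic :: iso_fortran_env, only: error_unit
  ! One central place for diagnostics and cleanup before termination.
  write (error_unit, '(a)') 'abort called: test failed'
  stop 1
end subroutine abort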

$.02 -Ron Shepard

FortranFan

Jan 30, 2018, 11:09:49 AM
On Saturday, January 27, 2018 at 4:45:35 PM UTC-5, FortranFan wrote:

> ..
>
> The original post finds:
> "gfortran trunk: 100% tests passed, 0 tests failed out of 1553" and
> ..
>
> Yet take the first test I looked at based on the regression with Intel Fortran compiler 18.0: "aliasing_array_result_1.f90" at GitHub:
> https://github.com/gcc-mirror/gcc/blob/master/gcc/testsuite/gfortran.dg/aliasing_array_result_1.f90
>
> Now consider a reduced test of that case with one modification to the CONTAINed function i.e., to use the RESULT keyword with the function. WHOOF, the test with gfortran fails!! ..
>
> So my point is one can read little into "% NN passed" reports, ..

So previously I pointed out an issue with compilers regarding the test involving aliasing of a function result with a host-associated variable.

Now consider the case in gfortran testsuite named, "alloc_comp_assign_1":
https://github.com/gcc-mirror/gcc/blob/master/gcc/testsuite/gfortran.dg/alloc_comp_assign_1.f90

My observation is this test itself has issues and requires review and validation.

See a reduced version of this case which shows the gfortran compiler failing the test:

--- begin console output using gfortran ---
C:\Temp>type p.f90
use, intrinsic :: iso_fortran_env, only : compiler_version

type :: t
character(len=1), allocatable :: c(:)
end type

type(t) :: x(3)
type(t) :: y(3)

print *, "Compiler Version: ", compiler_version()

x(1)%c = [ "h","e","l","l","o" ]
x(2)%c = [ "g","'","d","a","y" ]
x(3)%c = [ "g","o","d","a","g" ]

y(2:1:-1) = x(1:2)

print *, "x(1)%c = ", x(1)%c, "; expected = ", [ "h","e","l","l","o" ]
print *, "y(1)%c = ", y(1)%c, "; expected = ", [ "h","e","l","l","o" ]

if (any (y(1)%c /= [ "h","e","l","l","o" ]) ) then
print *, "Test failed."
else
print *, "Successful test."
end if

stop

end

C:\Temp>gfortran p.f90 -o p.exe

C:\Temp>p.exe
Compiler Version: GCC version 8.0.0 20171217 (experimental)
x(1)%c = hello; expected = hello
y(1)%c = g'day; expected = hello
Test failed.

C:\Temp>
--- end console output ---

whereas it passes with Intel Fortran compiler:
--- begin console output with Intel Fortran ---

C:\Temp>type p.f90
use, intrinsic :: iso_fortran_env, only : compiler_version

type :: t
character(len=1), allocatable :: c(:)
end type

type(t) :: x(3)
type(t) :: y(3)

print *, "Compiler Version: ", compiler_version()

x(1)%c = [ "h","e","l","l","o" ]
x(2)%c = [ "g","'","d","a","y" ]
x(3)%c = [ "g","o","d","a","g" ]

y(2:1:-1) = x(1:2)

print *, "x(1)%c = ", x(1)%c, "; expected = ", [ "h","e","l","l","o" ]
print *, "y(1)%c = ", y(1)%c, "; expected = ", [ "h","e","l","l","o" ]

if (any (y(1)%c /= [ "h","e","l","l","o" ]) ) then
print *, "Test failed."
else
print *, "Successful test."
end if

stop

end

C:\Temp>ifort /standard-semantics /warn:all /check:all p.f90 -o p.exe
Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R
) 64, Version 18.0.1.156 Build 20171018
Copyright (C) 1985-2017 Intel Corporation. All rights reserved.

Microsoft (R) Incremental Linker Version 14.12.25835.0
Copyright (C) Microsoft Corporation. All rights reserved.

-out:p.exe
-subsystem:console
p.obj

C:\Temp>p.exe
Compiler Version:
Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(

R) 64, Version 18.0.1.156 Build 20171018

x(1)%c = hello; expected = hello
y(1)%c = hello; expected = hello
Successful test.

C:\Temp>
--- end console output ---

So as far as "alloc_comp_assign_1" is concerned, the situation is the opposite of what is indicated in the original post.

Again, I think what the Fortran ecosystem can benefit from is a test suite that is thoroughly validated and in my opinion one that is also standardized.

Janus

Jan 30, 2018, 1:23:32 PM
On Tuesday, January 30, 2018 at 5:09:49 PM UTC+1, FortranFan wrote:
> Now consider the case in gfortran testsuite named, "alloc_comp_assign_1":
> https://github.com/gcc-mirror/gcc/blob/master/gcc/testsuite/gfortran.dg/alloc_comp_assign_1.f90
>
> My observation is this test itself has issues and requires review and validation.
>
> [...]
>
> So as far as "alloc_comp_assign_1" is concerned, the situation is opposite to what is indicated in the original post.

I have not looked into what you have modified and how it fails (simply because I don't have the time).

If you think you found a bug, either in gfortran itself or in the testsuite, the recommended procedure is to open a bug report in the GCC bugzilla database.


> Again, I think what the Fortran ecosystem can benefit from is a test suite that is thoroughly validated and in my opinion one that is also standardized.

I don't disagree here, but I think the crucial point is that it takes someone to go ahead and make a start, otherwise it's not going to happen. It's always easy to expect others to do something about the problems in the world, but it's usually more effective to just go ahead and do what you think needs to be done.

Cheers,
Janus

Janus

Feb 2, 2018, 4:26:04 PM
Let me amend this with another result that was sent to me privately by a friendly person who was able to test drive a beta version of IBM XL Fortran for Linux V16.1:

xlf2008_r:
CMP: 86% tests passed, 225 tests failed out of 1555
EXE: 76% tests passed, 374 tests failed out of 1555

This falls in between ifort and sunf95.

After all the criticism I shoved towards Intel, I guess one has to give them some credit for coming out of this exercise with the best results of any non-gfortran compiler.

As mentioned earlier, these results should not be used for comparisons with gfortran in particular, and in general they should not be taken *too* seriously, but I still think they serve as a good first-order approximation of compiler quality (with certain limitations).

Cheers,
Janus

FortranFan

Feb 2, 2018, 10:40:07 PM
On Friday, February 2, 2018 at 4:26:04 PM UTC-5, Janus wrote:

> ..
> After all the criticism I shoved towards Intel, I guess one has to give them some credit for coming out of this exercise with the best results of any non-gfortran compiler. ..
>

You don't have to "guess": Intel Fortran is indeed a premier toolset for Fortranners that provides high performance overall, vast set of features, portability across 4 of the most widely used platforms, online service center, valuable user forums, and so forth.

> a beta version of IBM XL Fortran for Linux V16.1:
> .. falls in between ifort and sunf95. ..
> As mentioned earlier, these results should not be used for comparisons with gfortran in particular, and in general they should not be taken *too* seriously, but I still think they serve as a good first-order approximation of compiler quality (with certain limitations). ..

This is really talking from both sides of the mouth. Moreover, "within certain limitations" is quite an understatement.

Yet another test case I picked at random - typebound_operator_15.f90 - has an issue, this time with respect to standard conformance. That is now 3 of 3 in terms of the cases I have looked at, all of which have turned out questionable. The testsuite requires a serious review before any further posts or discussions about it in the context of any compiler evaluations.

The particular test case, typebound_operator_15.f90, involves overriding a type-bound procedure that has the PRIVATE attribute in an extension type which is part of another module; this is not permitted by the standard. Either the base type and the extended type should be part of the same module, or the procedure in question must have the PUBLIC attribute for it to be standard conforming. Do either of the modifications and Intel Fortran works as expected. So an option to modify the test would be:
! { dg-do run }
!
! PR fortran/53255
!
! Contributed by Reinhold Bader.
!
! Before TYPE(ext)'s .tr. wrongly called the base type's trace
! instead of ext's trace_ext.
!
module mod_base
implicit none
private
integer, public :: base_cnt = 0
type, public :: base
private
real :: r(2,2) = reshape( (/ 1.0, 2.0, 3.0, 4.0 /), (/ 2, 2 /))
contains
procedure :: trace !<- note private attribute removed
generic :: operator(.tr.) => trace
end type base
contains
complex function trace(this)
class(base), intent(in) :: this
base_cnt = base_cnt + 1
! write(*,*) 'executing base'
trace = this%r(1,1) + this%r(2,2)
end function trace
end module mod_base

module mod_ext
use mod_base
implicit none
private
integer, public :: ext_cnt = 0
public :: base, base_cnt
type, public, extends(base) :: ext
private
real :: i(2,2) = reshape( (/ 1.0, 1.0, 1.0, 1.5 /), (/ 2, 2 /))
contains
procedure :: trace => trace_ext !<- note private attribute removed
end type ext
contains
complex function trace_ext(this)
class(ext), intent(in) :: this

! the following should be executed through invoking .tr. p below
! write(*,*) 'executing override'
ext_cnt = ext_cnt + 1
trace_ext = .tr. this%base + (0.0, 1.0) * ( this%i(1,1) + this%i(2,2) )
end function trace_ext

end module mod_ext
program test_override
use mod_ext
implicit none
type(base) :: o
type(ext) :: p
real :: r

! Note: ext's ".tr." (trace_ext) calls also base's "trace"

! write(*,*) .tr. o
! write(*,*) .tr. p
if (base_cnt /= 0 .or. ext_cnt /= 0) call abort ()
r = .tr. o
if (base_cnt /= 1 .or. ext_cnt /= 0) call abort ()
r = .tr. p
if (base_cnt /= 2 .or. ext_cnt /= 1) call abort ()

if (abs(.tr. o - 5.0 ) < 1.0e-6 .and. abs( .tr. p - (5.0,2.5)) < 1.0e-6) &
then
if (base_cnt /= 4 .or. ext_cnt /= 2) call abort ()
! write(*,*) 'OK'
else
call abort()
! write(*,*) 'FAIL'
end if
end program test_override


FortranFan

Feb 3, 2018, 2:16:34 PM
On Monday, January 22, 2018 at 5:42:08 PM UTC-5, Janus wrote:

> ..
> I recently engaged in the exercise of checking various compilers on the gfortran testsuite..
> gfortran trunk: 100% tests passed, 0 tests failed out of 1553
..
> ifort 18.0.1: 82% tests passed, 281 tests failed out of 1553 ..


Continuing with my previous posts on this thread, yet another case I looked at out of curiosity with object-oriented facilities in Fortran shows issues. The case is class_to_type_4.f90. This particular test also involves intrinsic assignment where the right-hand side evaluates to a dynamic type that is not compatible with the left-hand side. In other words, the statement of concern in the test can be captured in a snippet like so:

--- begin snippet ---
type :: t
end type

type, extends(t) :: e
end type

type(t) :: foo
class(t), allocatable :: bar

allocate( e :: bar )

foo = bar

end
--- end snippet ---

Intel Fortran is correct in throwing a run-time exception with the assignment on line 12 whereas gfortran trunk lets it pass to the detriment of its users:

forrtl: severe (189): LHS and RHS of an assignment statement have incompatible types
Image PC Routine Line Source

p.exe 000000013FD38EE4 Unknown Unknown Unknown
p.exe 000000013FD3127A MAIN__ 12 p.f90

Neil

Feb 5, 2018, 1:34:48 AM
On Saturday, January 27, 2018 at 9:30:08 AM UTC-7, Neil wrote:
> [...] I'm going to continue to pick through the failures to see what I find.

I want to report on what I found with the NAG compiler (6.2). I didn't want to just report the passing percentage, because I think that is really misleading due to the nature of the testsuite (even comparing between non-gfortran compilers). So I spent a great amount of time the last week and a half digging into each of the failures of the CMP tests, and diagnosing the causes.

First some observations about the tests.
* Significant use of non-standard intrinsic functions. Abort() was ubiquitous, but that was fixed by replacing it with "STOP 1" everywhere. But there remained many others.
* Use of language extensions, for either directly testing the extension or in testing something else entirely. 16-byte integers is one example.
* Reliance on implementation behavior specific to gfortran. The big offender here was non-portable type declarations like REAL(4). These were fixed for NAG by using its "-kind=byte" compile flag (a portable alternative is sketched after this list). But other cases remained, such as how logical values are represented.
* More advanced language features were often used in tests for less advanced features. For example, tests using assumed rank arrays (F18) to test some middle-of-the-road F03 feature. The consequence is "spurious" failures, or tests that end up testing something other than what they were intended to test.
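
Regarding the REAL(4) item above, a portable alternative is to use the named kind constants from iso_fortran_env (F2008); a sketch:

program portable_kinds
  use, intrinsic :: iso_fortran_env, only: real32, real64, int32
  real(real32)   :: x ! instead of real(4)
  real(real64)   :: y ! instead of real(8)
  integer(int32) :: n ! instead of integer(4)
  x = 1.0_real32
  y = 2.0_real64
  n = 3
  print *, x, y, n
end program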

As a gfortran testsuite, all that is completely reasonable. It only becomes a problem if one wants to use another compiler. And in the end I don't think it is such a great tool for testing other compilers, as you may get a lot of false failures for every true failure. To make it a decent general test suite would require a significant amount of work and on-going discipline from gfortran developers, and the extra burden is hard to justify I think.

All that said, here are the results for NAG 6.2 (just CMP; I didn't do EXE yet):

87% tests passed, 197 tests failed out of 1555

Of the 197 failures:
18 were due to invalid code rejected by NAG that gfortran allowed (bug)
78 were due to use of extensions not supported by NAG
8 were due to specific gfortran implementation behavior not shared by NAG
37 were due to use of unsupported F08 or later features
30 were coarray tests (I didn't assess the cause of these)
12 were due to use of unsupported F03 features
19 were actual NAG bugs

NAG 6.2 has some support for (single image) coarrays, but I'm uncertain of the level of support and I don't know that part of Fortran at all.

I've put up my patched version of the testsuite at https://github.com/nncarlson/gfortran.dg and invalid/illegal code is documented in the issue tracker there.

Themos Tsikas

Feb 5, 2018, 10:47:09 AM
Hello Neil,


Thank you for doing this. Do you think you could send me the 19 failure cases, please?

Themos Tsikas, NAG Ltd

Janus

Feb 7, 2018, 11:08:40 AM
On Monday, February 5, 2018 at 7:34:48 AM UTC+1, Neil wrote:
> All that said here are the results for NAG 6.2 (just CMP I didn't do EXE yet):
>
> 87% tests passed, 197 tests failed out of 1555

Thanks for the detailed analysis, Neil.


> Of the 197 failures:
> 18 were due to invalid code rejected by NAG that gfortran allowed (bug)

Four of these are already fixed, thanks to Dominique d'Humieres:

https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=257364

I hope I'll be able to help fix some of the rest (right now I'm extremely short on spare time, but maybe this will improve in the coming weeks).


> 30 were coarray tests (I didn't assess the cause of these)
>
> NAG 6.2 has some support for (single image) coarrays, but I'm uncertain of the level of support and I don't know that part of Fortran at all.

I have not checked, but it might be that NAG requires extra flags for CAF? (gfortran & ifort also do)


> 37 were due to use of unsupported F08 or later features
> 12 were due to use of unsupported F03 features
> 19 were actual NAG bugs

These are issues on the side of NAG (apparently we already got their attention, and I hope some of this can be fixed).


> 78 were due to use of extensions not supported by NAG
> 8 were due to specific gfortran implementation behavior not shared by NAG

This, unfortunately, is the grey area where the testsuite rather measures 'gfortran compatibility' :/

In any case, it's nice to see that all compilers can benefit from this exercise ...

Cheers,
Janus

Janus

Feb 7, 2018, 11:38:12 AM
On Saturday, February 3, 2018 at 4:40:07 AM UTC+1, FortranFan wrote:
> > After all the criticism I shoved towards Intel, I guess one has to give them some credit for coming out of this exercise with the best results of any non-gfortran compiler. ..
> >
>
> You don't have to "guess": Intel Fortran is indeed a premier toolset for Fortranners that provides high performance overall, vast set of features, portability across 4 of the most widely used platforms, online service center, valuable user forums, and so forth.

Thanks for sharing your infinite wisdom. I feel so much better now.


> Yet another test case I picked at random - typebound_operator_15.f90 - has an issue, this time with respect to standard conformance.

In fact it would be much more helpful if you could report the problems you found here:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84094

Or here:

https://github.com/nncarlson/gfortran.dg/issues

Otherwise I'm afraid they'll just drown in the vast depths of this newsgroup.


> That is now 3 of 3 in terms of the cases I have looked at, all of which have turned out questionable. The testsuite requires a serious review before any further posts or discussions about it in the context of any compiler evaluations.

Apparently you're trying to give the impression that absolutely nothing in the gfortran testsuite can be trusted.

Fact is: That's not the case. Please stop pretending you're picking things 'at random'. What you are picking are the problematic cases. It is certainly no surprise that such cases exist, but they're a few percent (and not 100), and I hope we can further reduce them.

Cheers,
Janus

FortranFan

Feb 7, 2018, 2:21:44 PM
On Wednesday, February 7, 2018 at 11:38:12 AM UTC-5, Janus wrote:

> ..
>
> Thanks for sharing your infinite wisdom. I feel so much better now.
> ..


@Janus,

No, my 'wisdom' is definitely not infinite, it only surpasses your level of "humor and sarcasm"!

> Apparently you're trying to give the impression that absolutely nothing in the gfortran testsuite can be trusted. ..

No, all I've been suggesting is that any testsuite needs to be reviewed closely before one can get into "% failed" comparisons and so forth.

> .. pretending you're picking things 'at random'. ..

No again, I was not; but perhaps you're inclined to suspect as much because you picked on a compiler seemingly 'at random'.

> In fact it would be much more helpful if you could report the problems..
> .. Otherwise I'm afraid they'll just drown in the vast depths of this newsgroup.

Good suggestions, but there are practical limitations: Bugzilla has previously not worked for me because of some email issue or other; at first it was MIME content, but later it failed with plain text emails too, and then I was no longer keen to retry with GCC. With that GitHub project site, first I don't know if it's open for all. And secondly, I don't know whether the changes I suggested are correct, especially given how my assertion regarding the 'class_to_type_4.f90' case is entirely wrong according to the J3 committee, as noted on your Intel Forum thread. I'd rather paste my code on forums first and let it be ripped apart to shreds instead of adding to a project only to be cleaned up later.

Cheers,

Neil

Feb 8, 2018, 10:07:01 AM
On Wednesday, February 7, 2018 at 9:08:40 AM UTC-7, Janus wrote:
> > Of the 197 failures:
> > 18 were due to invalid code rejected by NAG that gfortran allowed (bug)
>
> Four of these are already fixed, thanks to Dominique d'Humieres:
>
> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=257364
>
> I hope I'll be able to help fix some of the rest (right now I'm extremely short on spare time, but maybe this will improve in the coming weeks).

In a few cases I submitted a PR for the underlying problem (but not the test suite code itself); links are in the issues at https://github.com/nncarlson/gfortran.dg/issues. But otherwise, I'm handing them off to you gfortran guys to handle as you see best. Thanks for already starting that.

> I have not checked, but it might be that NAG requires extra flags for CAF? (gfortran & ifort also do)

No, I looked and it actually doesn't. When I get more time I intend to finally read up on coarrays, but at this point I don't know enough to say anything about those failures.

> > 37 were due to use of unsupported F08 or later features
> > 12 were due to use of unsupported F03 features
> > 19 were actual NAG bugs
>
> These are issues on the side of NAG (apparently we already got their attention, and I hope some of this can be fixed).

I already pushed test cases that expose the underlying problems up to NAG (for the bugs; they are fully aware of the unsupported features). You can see them at https://github.com/nncarlson/fortran-compiler-tests if anyone is interested (most of those are dated in Feb). I fully expect NAG will fix them, and fairly quickly.

Incidentally, the unsupported features I encountered were: function pointer function results, recursive allocatable components, submodules, new pointer initialization stuff from F08, assumed rank, and internal procedures as specifics for a generic from F08.
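
As a sketch of that last item (my own example, not taken from the testsuite), F2008 allows internal procedures to be named as specifics of a generic interface:

program generic_internal
  interface swap
    ! internal procedures as specifics (F2008)
    procedure :: swap_int, swap_real
  end interface
  integer :: i = 1, j = 2
  real :: x = 1.0, y = 2.0
  call swap (i, j)
  call swap (x, y)
  print *, i, j, x, y
contains
  subroutine swap_int (a, b)
    integer, intent(inout) :: a, b
    integer :: t
    t = a; a = b; b = t
  end subroutine
  subroutine swap_real (a, b)
    real, intent(inout) :: a, b
    real :: t
    t = a; a = b; b = t
  end subroutine
end program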

> In any case, it's nice to see that all compilers can benefit from this exercise ...

But only if someone takes the time to separate the wheat from the chaff, which is quite time consuming; I haven't even looked at the EXE tests yet :-) I did find it quite a useful personal exercise though, as it required me to think carefully about stuff I wasn't that familiar with. Thanks for putting together the CMakeLists.txt file; without that none of this would have happened.

-Neil

rbader

Feb 8, 2018, 12:36:45 PM
Since my name appears on this particular test, I would like to comment that it was published *prior* to an interpretation processed by J3 that changed the semantics of private TBPs quite significantly. So validating a test suite is surely something requiring regular review ...
Regards
Reinhold

paul.rich...@gmail.com

Feb 8, 2018, 2:10:24 PM
Dear Reinhold,

The contribution that you have made through posting gfortran bugs is of inestimable help to us. We like to give credit via the "contributed by" in the testcases. We should, perhaps, add a disclaimer that any bollox up that follows is entirely the responsibility of the gfortran maintainers....

Paul

rbader

Feb 10, 2018, 4:13:40 AM
Hello Paul,

I don't think that will be necessary, also considering that a contributor cannot entirely be absolved from responsibility (and this is consistent with giving credit, by the way!).

In any case, listing regression test results with failure percentages is somewhat deceptive as a measure of compiler quality, so at the very least one needs to keep in mind:

* there may be a problem with the test case
* there may be a problem with the compiler
* there may be a problem with both the test and the compiler
* there may be a problem with the standard itself

Intel Fortran, by the way, only implemented the updated private TBP semantics in its most recent release.

Dick Hendrickson

Feb 10, 2018, 11:19:50 AM
Another problem with failure percentages is that a single compiler bug
can trigger failures in several different tests. It's not always obvious
that multiple failures come from the same root cause.

Dick Hendrickson