Fortran to C/C++ translation: a running example.

Rock Brentwood

May 16, 2022, 3:53:09 PM

The classic text-based computer game Zork / dungeon was originally devised on
MIT computers in a LISP-offshoot (MDL), and translated to Fortran 77 by an
"Anonymous" author. Some time later an enterprising soul converted a version
of the Fortran edition of Zork into C ... pre-ANSI C ... with the aid of an
earlier version of "f2c", but left no detailed paper trail behind on the
actual translation process and stages.

I think this is the kind of project our moderator would really like.

It's been retranslated from Fortran (with the aid of a later version of "f2c")
here:

https://github.com/LydiaMarieWilliamson/zork-fortran

Every intermediate stage of the process is archived in the history log and
commit history. This was carried out in tandem with a revision of the Fortran
source itself (as Fortran 2018 no longer supports all of Fortran 77), and an
upward revision of the 1991 translation into C99. Both the newer C
translation, from 2021, and the 2021 revision of the older 1991 C translation
have converged onto the same result.

A key issue that arises, which led to a later revision in the Fortran standard,
is the lack of information required to distinguish between parameters that are
input-only, output-only, or input/output. That has to be inferred, which requires
either transparency of library functions (here: the functions in the f2c
library or whatever is written in its place) or I/O specifications in the
library functions. So, a "strength reduction" step is required to lift
input/output parameters (the default) to input-only or output-only.
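A minimal sketch of the three cases in C. The function names are hypothetical and not taken from the Zork sources. The first form is what a mechanical, f2c-style translation tends to look like, with every argument passed by pointer; the second is what the inference step might produce once it is known which arguments the callee only reads.

```c
/* Mechanical translation: every argument is a pointer, whatever its role. */
int scale_raw(int *n, int *factor, int *result, int *calls)
{
    *result = *n * *factor;  /* n and factor are only read; result is only written */
    *calls += 1;             /* calls is both read and written */
    return 0;
}

/* After inference: inputs are passed by value, while the output and the
   input/output argument remain pointers. */
void scale_refined(int n, int factor, int *result, int *calls)
{
    *result = n * factor;
    *calls += 1;
}
```

The refined signature can only be adopted once every caller and the callee body are visible, which is why opaque library functions get in the way.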

A similar issue arises with locals, which are "static", by default, in Fortran
(or the Fortran equivalent of "static"). A "strength reduction" step is
required to lift non-static locals to bona fide "auto" locals.
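As a rough illustration in C, with hypothetical function names: the first local has to stay static because its value is carried from one call to the next, while the second is always assigned before it is read and can safely become an ordinary automatic variable.

```c
/* The value must survive between calls, so this local stays static. */
int next_id(void)
{
    static int last = 0;
    last += 1;
    return last;
}

/* Always assigned before it is read: safe as an ordinary auto local,
   even if the mechanical translation declared it static. */
int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}
```

Deciding which case applies means checking, for each local, whether any path reads it before writing it.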

Another key issue is the aliasing that goes on with "equivalence" constructs.
There is no good uniform translation for this into C ... it actually fits C++
better, where you have reference types available. There's really no good
reason why those have been left out of C, when other things which appeared
first in C++, like "const", "bool" or function prototypes, found their way
into C.

However, a substantial chunk of use-cases for equivalence constructs can be
carved out as "enum" types, so there was a strength reduction step for this,
too.

Perhaps the moderator will have more to say about the intricacies of Fortran
translation. In the meantime, another project has already been staged for
conversion to C++: LAPACK

https://github.com/LydiaMarieWilliamson/lapack

but is in a holding pattern for now. This one will more heavily involve the
synthesis of "template" types. To date, ongoing attempts, elsewhere, have been
mostly limited to creating C or C++ shells for the Fortran core, rather than a
conversion of the core, itself.
[It's been at least 20 years since I've done any sort of Fortran translation
so for this maze of twisty little passages, I'm afraid you're on your own.
I'm always surprised in translation exercises how many ways that languages
that look superficially the same are different in ways that make the translation
much harder. -John]

Ian Lance Taylor

May 16, 2022, 7:08:05 PM

> From: Rock Brentwood <rockbr...@gmail.com>

> The classic text-based computer game Zork / dungeon was originally devised on
> MIT computers in a LISP-offshoot (MDL), and translated to Fortran 77 by an
> "Anonymous" author. Some time later an enterprising soul converted a version
> of the Fortran edition of Zork into C ... pre-ANSI C ... with the aid of an
> earlier version of "f2c", but left no detailed paper trail behind on the
> actual translation process and stages.

Just FYI I was the enterprising soul who translated the code from Fortran
to C. I still have at least some of the intermediate files. Happy to
answer any questions.

That said, most of the work was manually rewriting the f2c output into
something more C-like. For me this wasn't an exercise in translation
between languages, it was an exercise in making a version of Zork more
available. So I probably don't have anything useful to add that is
relevant to the compilers list.

In particular I changed the format of the data file. I wrote a translation
program between the old format, a new format, and a text format. The text
format let me make minor changes to things like the leaflet text.

Ian

Thomas Koenig

May 17, 2022, 11:41:08 AM

Rock Brentwood <rockbr...@gmail.com> wrote:
[...]

> A key issue that arise, which led to later revision in the Fortran standard,
> is the lack of information required to distinguish between parameters that are
> input-only, output-only, input/output.

Nit: In Fortran, "parameters" are what you would call "constants"
in another language. Arguments to functions or subroutines are
called "dummy arguments", which are then associated with "actual
arguments" on the caller's side.

> That has to be inferred, which requires
> either transparency of library functions (here: the functions in the f2c
> library or whatever is written in its place) or I/O specifications in the
> library functions. So, a "strength reduction" step is required to lift
> input/output parameters (the default) to input-only or output-only.

"Strength reduction" is a term normally used for something else,
for example when replacing multiplication (as in a loop for array
processing) by addition.
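For reference, a small C sketch of strength reduction in that usual sense. The function names are made up for illustration; both versions store the same values, but the second replaces the per-iteration multiplication with a running addition.

```c
#include <stddef.h>

/* Before: one multiplication per iteration to compute the index. */
void fill_mul(int *a, size_t count, size_t stride, int value)
{
    for (size_t i = 0; i < count; i++)
        a[i * stride] = value;
}

/* After strength reduction: the index is advanced by addition instead. */
void fill_add(int *a, size_t count, size_t stride, int value)
{
    size_t idx = 0;
    for (size_t i = 0; i < count; i++) {
        a[idx] = value;
        idx += stride;
    }
}
```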

It's a question of the semantics of the code. For something like
(C side)

    aux_var = 5;
    foo (&aux_var);

you can almost certainly rewrite foo to take a value argument.

> A similar issue arises with locals, which are "static", by default, in Fortran
> (or the Fortran equivalent of "static"). A "strength reduction" step is
> required to lift non-static locals to bona fide "auto" locals.

The FORTRAN language never guaranteed that variables would keep their
data unless SAVE was specified, but many compilers did it anyway, so the
code may indeed assume so.

Some experimentation on the Fortran side can help there. Compiling
the code with -frecursive and/or with one of the -finit-integer
and -finit-real options (I'm talking gfortran options here, but
other compilers have similar) will help you find trouble spots.
If you happen to have access to nagfor, they have a -C=all option
which will find very many bugs in code that people think correct,
even more with -C=undefined.

> Another key issue the aliasing that goes on with "equivalence" constructs.

> There is no good uniform translation for this into C ...

The question is - what is equivalence used for? Something sane?

Generally, C's unions are a good match for Fortran's equivalence,
with the same problem of undefined behavior if the unions are
used for type punning.

> it actually better
> fits C++, where you have reference types available. There's really no good
> reason why those have been left out of C, when other things which appeared
> first in C++, like "const", "bool" or function prototypes, found their way
> into C.
>
> However, a substantial chunk of use-cases for equivalence constructs can be
> carved out as "enum" types, so there was a strength reduction step for this,
> too.
>
> Perhaps the moderator will have more to say about the intricacies of Fortran
> translation. In the meanwhile, another project has already been staged for
> conversion to C++ - LAPACK
>
> https://github.com/LydiaMarieWilliamson/lapack
>
> but is in a holding pattern for now. This one will more heavily involve the
> synthesis of "template" types. To date, ongoing attempts, elsewhere, have been
> mostly limited to creating C or C++ shells for the Fortran core, rather than a
> conversion of the core, itself.

Fortran has guarantees on the semantics which are quite well tuned for
optimization. Converting it into C or C++ may well lose execution
speed.

Lydia Marie Williamson

May 21, 2022, 11:54:47 AM

On Monday, May 16, 2022 at 2:53:09 PM UTC-5, Rock Brentwood wrote:
> Another key issue the aliasing that goes on with "equivalence" constructs.
> There is no good uniform translation for this into C ... it actually better
> fits C++, where you have reference types available. There's really no good
> reason why those have been left out of C, when other things which appeared
> first in C++, like "const", "bool" or function prototypes, found their way
> into C.
>
> However, a substantial chunk of use-cases for equivalence constructs can be
> carved out as "enum" types, so there was a strength reduction step for this,
> too.

This is not exactly correct. It's "common blocks" that were handled in this
way.

In the Fortran source of Zork/dungeon, the "equivalence" statements and
"common blocks" were used together, so it's easy to get the issue confused. I
don't know if their being used together is something that always happened in
Fortran, or if it was just particular to this program.

> In the meanwhile, another project has already been staged for
> conversion to C++ - LAPACK
>
> https://github.com/LydiaMarieWilliamson/lapack
>
> but is in a holding pattern for now.

There were several stages to the translation, one of which involved
regularizing and normalizing the Fortran, itself.
This is also on the local machines here.
But while that was happening, LAPACK came back alive, and is out on GitHub and
being actively maintained again.
Originally, it was (mostly) inert.

> [It's been at least 20 years since I've done any sort of Fortran translation
> so for this maze of twisty little passages, I'm afraid you're on your own.
> I'm always surprised in translation exercises how many ways that languages
> that look superficially the same are different in ways that make the
> translation much harder. -John]

Things would be easier going into C++, instead of C, since it already has
aliasing, operator overloading, re-defineable array indexing, and
call-by-reference. This inclusion of more Fortran-friendly features into C++
was apparently done intentionally.
[It was not unusual to use common and equivalence together, particularly when memory
was tight. But equivalence is like a union, not an enum. -John]

Lydia Marie Williamson

May 21, 2022, 11:55:16 AM

On Monday, May 16, 2022 at 6:08:05 PM UTC-5, Ian Lance Taylor wrote:
> Just FYI I was the enterprising soul who translated the code from Fortran
> to C. I still have at least some of the intermediate files. Happy to
> answer any questions.

I can interleave your in-between states into my GitHub sequence as a parallel
side-branch, if you wish. That will significantly help close out another loose
end that I hadn't yet fully resolved.

> In particular I changed the format of the data file. I wrote a translation
> program between the old format, a new format, and a text format. The text
> format let me make minor changes to things like the leaflet text.

I reverted back from character-based to a compromise between streaming and
records - something that's also friendly to Fortran 2018.

It just so happened that later Fortran versions also included a story
and index file compiler, one that's similar to what I have. That will be
integrated into the GitHub sequence as well.

Most of the issues with translation (as you'll see in the history log I kept,
in the repository) were dealing with the I/O functions - with quite a few
changes made on the Fortran side, even before translating to C. Fortran has
streaming I/O now, which helps tremendously.

gah4

May 21, 2022, 1:26:02 PM

On Saturday, May 21, 2022 at 8:54:47 AM UTC-7, Lydia Marie Williamson wrote:

(snip on COMMON and EQUIVALENCE)

> This is not exactly correct. It's "common blocks" that were handled in this
> way.

> In the Fortran source of Zork/dungeon, the "equivalence" statements and
> "common blocks" were used together, so it's easy to get the issue confused. I
> don't know if their being used together is something that always happened in
> Fortran, or if it was just particular to this program.

COMMON and EQUIVALENCE are closely related in the Fortran standard,
and in the implementation by compilers. A variable equivalenced to a
variable in common is also in common. Such a variable can extend the
length of the common block, but only at the end, not the beginning.

It used to be that compilers would print out a variable map, with the
address, or offset, of each variable, and its length and type. That was
often useful to be sure that the compiler did what you thought it did.
Also, it would include the length of each common block, again good
to check to be sure they agree with what you expect.
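On the C side of a translation, a similar check can be approximated with offsetof and sizeof. This is only a sketch: the struct below is a hypothetical stand-in for a common block, not something from the Zork sources, and the printed layout depends on the compiler and target.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a common block translated to a struct. */
struct demo_block {
    int    count;
    double scale;
    char   tag[4];
};

/* Print one line per member (name, offset, size) plus the total length.
   Returns the number of member lines written. */
int print_block_map(FILE *out)
{
    int lines = 0;

    if (fprintf(out, "count  offset %zu size %zu\n",
                offsetof(struct demo_block, count), sizeof(int)) > 0)
        lines++;
    if (fprintf(out, "scale  offset %zu size %zu\n",
                offsetof(struct demo_block, scale), sizeof(double)) > 0)
        lines++;
    if (fprintf(out, "tag    offset %zu size %zu\n",
                offsetof(struct demo_block, tag), sizeof(char[4])) > 0)
        lines++;
    fprintf(out, "total length %zu\n", sizeof(struct demo_block));
    return lines;
}
```

Calling print_block_map(stdout) and comparing the result against the layout the Fortran side expects shows up any padding the C compiler inserted.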

The Fortran standard has a C interoperability feature that explains
how Fortran features and C features work together.

Thomas Koenig

May 21, 2022, 1:26:57 PM

Lydia Marie Williamson <lydiamarie...@gmail.com> wrote:

> On Monday, May 16, 2022 at 2:53:09 PM UTC-5, Rock Brentwood wrote:
>> Another key issue the aliasing that goes on with "equivalence" constructs.
>> There is no good uniform translation for this into C ... it actually better
>> fits C++, where you have reference types available. There's really no good
>> reason why those have been left out of C, when other things which appeared
>> first in C++, like "const", "bool" or function prototypes, found their way
>> into C.
>>
>> However, a substantial chunk of use-cases for equivalence constructs can be
>> carved out as "enum" types, so there was a strength reduction step for this,
>> too.
>
> This is not exactly correct. It's "common blocks" that were handled in this way.
>
> In the Fortran source of Zork/dungeon, the "equivalence" statements and
> "common blocks" were used together, so it's easy to get the issue confused. I
> don't know if their being used together is something that always happened in
> Fortran, or if it was just particular to this program.

Fortran has the concept of storage association - under certain
circumstances, the ordering of variables is prescribed by the
standard.

COMMON blocks are one example of this. Taking an example from the
original Fortran source code:

      COMMON /SYNTAX/ VFLAG,DOBJ,DFL1,DFL2,DFW1,DFW2,
     &                IOBJ,IFL1,IFL2,IFW1,IFW2

This declares a common block /SYNTAX/ with 11 named variables
(all of them integers due to an IMPLICIT INTEGER (A-Z) earlier in
all files), which have to be contiguous in memory.

The next line

      INTEGER SYN(11)

declares an integer array with 11 elements.

Finally, the statement

      EQUIVALENCE (VFLAG, SYN)

tells the compiler that the address of the (first element of) SYN
and VFLAG are the same.

So, you can now use SYN(1) to refer to VFLAG, SYN(2) to DOBJ and so on.
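One way this overlay could be mirrored in C is a union of a struct and an array. This is a sketch only: the type names are hypothetical, and it relies on the struct having no padding between its int members and on C's rules for reading a union member other than the one last written (C++ is stricter here).

```c
/* The 11 named variables of COMMON /SYNTAX/. */
struct syntax_vars {
    int vflag, dobj, dfl1, dfl2, dfw1, dfw2;
    int iobj, ifl1, ifl2, ifw1, ifw2;
};

/* Overlay corresponding to EQUIVALENCE (VFLAG, SYN). Fortran indexes from 1,
   C from 0, so SYN(1) is syn[0], SYN(2) is syn[1], and so on. */
union syntax_overlay {
    struct syntax_vars v;
    int syn[11];
};

/* Returns nonzero when the struct is laid out as 11 contiguous ints,
   which is the assumption the overlay depends on. */
int syntax_layout_ok(void)
{
    return sizeof(struct syntax_vars) == sizeof(int[11]);
}
```

With this in place, writing s.v.dobj and reading s.syn[1] touch the same storage, as VFLAG/DOBJ and SYN do in the Fortran.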

Why is this done? I see only one use case, in np3.for

      DO 10 I=1,11
C !CLEAR SYNTAX.
      SYN(I)=0
   10 CONTINUE

simply to create a shortcut for clearing the syntax.

This is a benign (and standard-conforming) way of using COMMON
and EQUIVALENCE. Equivalent C code might create a 'struct syntax'
and clear it with a memset, or have 11 individual variables and
zero them individually.
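A minimal sketch of the struct-and-memset variant. The field names follow the COMMON block above, in lower case; the function name clear_syntax is made up for illustration.

```c
#include <string.h>

/* C counterpart of COMMON /SYNTAX/, without the SYN alias. */
struct syntax {
    int vflag, dobj, dfl1, dfl2, dfw1, dfw2;
    int iobj, ifl1, ifl2, ifw1, ifw2;
};

/* Replaces the DO loop over SYN(1..11): zero every field at once. */
void clear_syntax(struct syntax *s)
{
    memset(s, 0, sizeof *s);
}
```

Since clearing was the only use of SYN, the alias itself does not need a C counterpart in this variant.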