Forum: >>> Magnum BBS <<<

Fortran to C/C++ translation: a running example.

From Rock Brentwood@21:1/5 to All on Mon May 16 12:27:30 2022

The classic text-based computer game Zork / dungeon was originally devised on MIT computers in a LISP offshoot (MDL), and translated to Fortran 77 by an "Anonymous" author. Some time later an enterprising soul converted a version of the Fortran edition of Zork into C ... pre-ANSI C ... with the aid of an earlier version of "f2c", but left no detailed paper trail behind on the actual translation process and stages.

I think this is the kind of project our moderator would really like.

It's been retranslated from Fortran (with the aid of a later version of "f2c") here:

https://github.com/LydiaMarieWilliamson/zork-fortran

Every intermediate stage of the process is archived in the history log and commit history. This was carried out in tandem with a revision of the Fortran source itself (as Fortran 2018 no longer supports all of Fortran 77), and an upward revision of the 1991 translation into C99. Both the newer C translation, from 2021, and the 2021 revision of the older 1991 C translation have converged on the same result.

A key issue that arises, which led to a later revision of the Fortran standard, is the lack of information required to distinguish between parameters that are input-only, output-only, or input/output. That has to be inferred, which requires either transparency of library functions (here: the functions in the f2c library or whatever is written in its place) or I/O specifications in the library functions. So, a "strength reduction" step is required to lift input/output parameters (the default) to input-only or output-only.

A similar issue arises with locals, which are "static", by default, in Fortran (or the Fortran equivalent of "static"). A "strength reduction" step is required to lift non-static locals to bona fide "auto" locals.

Another key issue is the aliasing that goes on with "equivalence" constructs. There is no good uniform translation for this into C ... it actually better fits C++, where you have reference types available. There's really no good reason why those have been left out of C, when other things which appeared first in C++, like "const", "bool" or function prototypes, found their way into C.

However, a substantial chunk of use-cases for equivalence constructs can be carved out as "enum" types, so there was a strength reduction step for this, too.

Perhaps the moderator will have more to say about the intricacies of Fortran translation. In the meanwhile, another project has already been staged for conversion to C++ - LAPACK

https://github.com/LydiaMarieWilliamson/lapack

but is in a holding pattern for now. This one will more heavily involve the synthesis of "template" types. To date, ongoing attempts, elsewhere, have been mostly limited to creating C or C++ shells for the Fortran core, rather than a conversion of the core, itself.
[It's been at least 20 years since I've done any sort of Fortran translation
so for this maze of twisty little passages, I'm afraid you're on your own.
I'm always surprised in translation exercises how many ways that languages
that look superficially the same are different in ways that make the translation
much harder. -John]

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Thomas Koenig@21:1/5 to Rock Brentwood on Tue May 17 14:59:15 2022

Rock Brentwood wrote:
[...]

> A key issue that arises, which led to a later revision of the Fortran standard,
> is the lack of information required to distinguish between parameters that are
> input-only, output-only, or input/output.

Nit: In Fortran, "parameters" are what you would call "constants"
in another language. Arguments to functions or subroutines are
called "dummy arguments", which are then associated with "actual
arguments" on the caller's side.

> That has to be inferred, which requires either transparency of library
> functions (here: the functions in the f2c library or whatever is written in
> its place) or I/O specifications in the library functions. So, a "strength
> reduction" step is required to lift input/output parameters (the default) to
> input-only or output-only.

"Strength reduction" is a term normally used for something else,
for example when replacing multiplication (as in a loop for array
processing) by addition.

It's a question of the semantics of the code. For something like
(C side)

    aux_var = 5;
    foo (&aux_var);

you can almost certainly rewrite foo to take a value argument.

> A similar issue arises with locals, which are "static", by default, in Fortran
> (or the Fortran equivalent of "static"). A "strength reduction" step is
> required to lift non-static locals to bona fide "auto" locals.

The FORTRAN language never guaranteed that variables would keep their
data unless SAVE was specified, but many compilers did it anyway, so the
code may indeed assume so.

Some experimentation on the Fortran side can help there. Compiling
the code with -frecursive and/or with one of the -finit-integer
and -finit-real options (I'm talking gfortran options here, but
other compilers have similar) will help you find trouble spots.
If you happen to have access to nagfor, they have a -C=all option
which will find very many bugs in code that people think correct,
even more with -C=undefined.

> Another key issue is the aliasing that goes on with "equivalence" constructs.
> There is no good uniform translation for this into C ...

The question is - what is equivalence used for? Something sane?

Generally, C's unions are a good match for Fortran's equivalence,
with the same problem of undefined behavior if the unions are
used for type punning.

> it actually better fits C++, where you have reference types available.
> There's really no good reason why those have been left out of C, when other
> things which appeared first in C++, like "const", "bool" or function
> prototypes, found their way into C.
>
> However, a substantial chunk of use-cases for equivalence constructs can be
> carved out as "enum" types, so there was a strength reduction step for this,
> too.
>
> Perhaps the moderator will have more to say about the intricacies of Fortran
> translation. In the meanwhile, another project has already been staged for
> conversion to C++ - LAPACK
>
> https://github.com/LydiaMarieWilliamson/lapack
>
> but is in a holding pattern for now. This one will more heavily involve the
> synthesis of "template" types. To date, ongoing attempts, elsewhere, have been
> mostly limited to creating C or C++ shells for the Fortran core, rather than a
> conversion of the core, itself.

Fortran has guarantees on the semantics which are quite well tuned for
optimization. Converting it into C or C++ may well lose execution speed.


From Lydia Marie Williamson@21:1/5 to Rock Brentwood on Fri May 20 16:34:48 2022

On Monday, May 16, 2022 at 2:53:09 PM UTC-5, Rock Brentwood wrote:

> Another key issue is the aliasing that goes on with "equivalence" constructs.
> There is no good uniform translation for this into C ... it actually better
> fits C++, where you have reference types available. There's really no good
> reason why those have been left out of C, when other things which appeared
> first in C++, like "const", "bool" or function prototypes, found their way
> into C.
>
> However, a substantial chunk of use-cases for equivalence constructs can be
> carved out as "enum" types, so there was a strength reduction step for this,
> too.

This is not exactly correct. It's "common blocks" that were handled in this way.

In the Fortran source of Zork/dungeon, the "equivalence" statements and
"common blocks" were used together, so it's easy to get the issue confused. I don't know if their being used together is something that always happened in Fortran, or if it was just particular to this program.

> In the meanwhile, another project has already been staged for
> conversion to C++ - LAPACK
>
> https://github.com/LydiaMarieWilliamson/lapack
>
> but is in a holding pattern for now.

There were several stages to the translation, one of which involved regularizing and normalizing the Fortran, itself.
This is also on the local machines here.
But while that was happening, LAPACK came back alive, and is out on GitHub and being actively maintained again.
Originally, it was (mostly) inert.

> [It's been at least 20 years since I've done any sort of Fortran translation
> so for this maze of twisty little passages, I'm afraid you're on your own.
> I'm always surprised in translation exercises how many ways that languages
> that look superficially the same are different in ways that make the
> translation much harder. -John]

Things would be easier going into C++, instead of C, since it already has aliasing, operator overloading, redefinable array indexing, and call-by-reference. This inclusion of more Fortran-friendly features into C++ was apparently done intentionally.
[It was not unusual to use common and equivalence together, particularly when memory
was tight. But equivalence is like a union, not an enum. -John]


From gah4@21:1/5 to Lydia Marie Williamson on Sat May 21 09:31:45 2022

On Saturday, May 21, 2022 at 8:54:47 AM UTC-7, Lydia Marie Williamson wrote:

(snip on COMMON and EQUIVALENCE)

> This is not exactly correct. It's "common blocks" that were handled in this
> way.
>
> In the Fortran source of Zork/dungeon, the "equivalence" statements and
> "common blocks" were used together, so it's easy to get the issue confused.
> I don't know if their being used together is something that always happened
> in Fortran, or if it was just particular to this program.

COMMON and EQUIVALENCE are closely related in the Fortran standard,
and in the implementation by compilers. A variable equivalenced to a
variable in common is also in common. Such a variable can extend the
length of the common block, but only at the end, not the beginning.

It used to be that compilers would print out a variable map, with the
address, or offset, of each variable, and its length and type. That was
often useful to be sure that the compiler did what you thought it did.
Also, it would include the length of each common block, again good
to check to be sure they agree with what you expect.

The Fortran standard has a C interoperability feature that explains
how Fortran features and C features work together.


From Thomas Koenig@21:1/5 to Lydia Marie Williamson on Sat May 21 17:24:37 2022

Lydia Marie Williamson wrote:

> On Monday, May 16, 2022 at 2:53:09 PM UTC-5, Rock Brentwood wrote:
>
>> Another key issue is the aliasing that goes on with "equivalence" constructs.
>> There is no good uniform translation for this into C ... it actually better
>> fits C++, where you have reference types available. There's really no good
>> reason why those have been left out of C, when other things which appeared
>> first in C++, like "const", "bool" or function prototypes, found their way
>> into C.
>
>> However, a substantial chunk of use-cases for equivalence constructs can be
>> carved out as "enum" types, so there was a strength reduction step for this,
>> too.
>
> This is not exactly correct. It's "common blocks" that were handled in this
> way.
>
> In the Fortran source of Zork/dungeon, the "equivalence" statements and
> "common blocks" were used together, so it's easy to get the issue confused.
> I don't know if their being used together is something that always happened
> in Fortran, or if it was just particular to this program.

Fortran has the concept of storage association - under certain
circumstances, the ordering of variables is prescribed by the
standard.

COMMON blocks are one example of this. Taking an example from the
original Fortran source code:

      COMMON /SYNTAX/ VFLAG,DOBJ,DFL1,DFL2,DFW1,DFW2,
     &                IOBJ,IFL1,IFL2,IFW1,IFW2

This declares a common block /SYNTAX/ with 11 named variables
(all of them integers due to an IMPLICIT INTEGER (A-Z) earlier in
all files), which have to be contiguous in memory.

The next line

      INTEGER SYN(11)

declares an integer array with 11 elements.

Finally, the statement

      EQUIVALENCE (VFLAG, SYN)

tells the compiler that VFLAG and (the first element of) SYN
share the same address.

So, you can now use SYN(1) to refer to VFLAG, SYN(2) to DOBJ and so on.

Why is this done? I see only one use case, in np3.for

      DO 10 I=1,11
C                                               !CLEAR SYNTAX.
        SYN(I)=0
10    CONTINUE

simply to create a shortcut for clearing the syntax.

This is a benign (and standard-conforming) way of using COMMON
and EQUIVALENCE. Equivalent C code might create a 'struct syntax'
and clear it with a memset, or have 11 individual variables and
zero them individually.

