• Fortran to C/C++ translation: a running example.

    From Rock Brentwood@21:1/5 to All on Mon May 16 12:27:30 2022
    The classic text-based computer game Zork / dungeon was originally devised on MIT computers in a LISP-offshoot (MDL), and translated to Fortran 77 by an "Anonymous" author. Some time later an enterprising soul converted a version
    of the Fortran edition of Zork into C ... pre-ANSI C ... with the aid of an earlier version of "f2c", but left no detailed paper trail behind on the
    actual translation process and stages.

    I think this is the kind of project our moderator would really like.

    It's been retranslated from Fortran (with the aid of a later version of "f2c") here:

    https://github.com/LydiaMarieWilliamson/zork-fortran

    Every intermediate stage of the process is archived in the history log and commit history. This was carried out in tandem with a revision of the Fortran source itself (as Fortran 2018 no longer supports all of Fortran 77), and an upward revision of the 1991 translation into C99. Both the newer C
    translation, from 2021, and the 2021 revision of the older 1991 C translation have converged on the same result.

    A key issue that arises, which led to later revision in the Fortran standard,
    is the lack of information required to distinguish between parameters that are input-only, output-only, or input/output. That has to be inferred, which requires either transparency of library functions (here: the functions in the f2c library or whatever is written in its place) or I/O specifications in the library functions. So, a "strength reduction" step is required to lift input/output parameters (the default) to input-only or output-only.

    A similar issue arises with locals, which are "static", by default, in Fortran (or the Fortran equivalent of "static"). A "strength reduction" step is required to lift non-static locals to bona fide "auto" locals.

    Another key issue is the aliasing that goes on with "equivalence" constructs. There is no good uniform translation for this into C ... it actually better fits C++, where you have reference types available. There's really no good reason why those have been left out of C, when other things which appeared first in C++, like "const", "bool" or function prototypes, found their way
    into C.

    However, a substantial chunk of use-cases for equivalence constructs can be carved out as "enum" types, so there was a strength reduction step for this, too.

    Perhaps the moderator will have more to say about the intricacies of Fortran translation. In the meanwhile, another project has already been staged for conversion to C++ - LAPACK

    https://github.com/LydiaMarieWilliamson/lapack

    but is in a holding pattern for now. This one will more heavily involve the synthesis of "template" types. To date, ongoing attempts, elsewhere, have been mostly limited to creating C or C++ shells for the Fortran core, rather than a conversion of the core, itself.
    [It's been at least 20 years since I've done any sort of Fortran translation
    so for this maze of twisty little passages, I'm afraid you're on your own.
    I'm always surprised in translation exercises how many ways that languages
    that look superficially the same are different in ways that make the translation
    much harder. -John]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to Rock Brentwood on Tue May 17 14:59:15 2022
    Rock Brentwood wrote:
    [...]

    > A key issue that arises, which led to later revision in the Fortran standard, is the lack of information required to distinguish between parameters that are
    > input-only, output-only, or input/output.

    Nit: In Fortran, "parameters" are what you would call "constants"
    in another language. Arguments to functions or subroutines are
    called "dummy arguments", which are then associated with "actual
    arguments" on the caller's side.

    > That has to be inferred, which requires
    > either transparency of library functions (here: the functions in the f2c library or whatever is written in its place) or I/O specifications in the library functions. So, a "strength reduction" step is required to lift input/output parameters (the default) to input-only or output-only.

    "Strength reduction" is a term normally used for something else,
    for example when replacing multiplication (as in a loop for array
    processing) by addition.
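
    For reference, a small C sketch of strength reduction in that usual sense. The function names are made up for illustration, and an optimizing compiler will typically perform this transformation on its own.

```c
#include <stddef.h>

/* Naive form: one multiplication per iteration to compute the index. */
long sum_stride_mul(const int *a, size_t n, size_t stride)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i * stride];
    return sum;
}

/* Strength-reduced form: the multiplication is replaced by a running
   addition that maintains idx == i * stride. */
long sum_stride_add(const int *a, size_t n, size_t stride)
{
    long sum = 0;
    size_t idx = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[idx];
        idx += stride;
    }
    return sum;
}
```

    Both functions visit the same n elements and return the same sum; only the index arithmetic differs.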

    It's a question of the semantics of the code. For something like
    (C side)

    aux_var = 5;
    foo (&aux_var);

    you can almost certainly rewrite foo to take a value argument.
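
    A minimal sketch of that kind of rewrite. The names scale_f2c and scale are hypothetical (they are not taken from the Zork sources), and the first function only imitates the general shape of a by-reference translation, not f2c's exact output.

```c
/* By-reference shape: every Fortran dummy argument arrives as a pointer,
   whether it is read, written, or both. */
int scale_f2c(int *x, int *factor, int *result)
{
    *result = *x * *factor;   /* x and factor are only read, result is only written */
    return 0;
}

/* After inferring intent: the inputs are passed by value and the
   single output becomes the return value. */
int scale(int x, int factor)
{
    return x * factor;
}
```

    The second form is only safe once every caller has been checked: if any call site relies on the callee modifying x or factor, the argument has to stay a pointer.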

    > A similar issue arises with locals, which are "static", by default, in Fortran
    > (or the Fortran equivalent of "static"). A "strength reduction" step is required to lift non-static locals to bona fide "auto" locals.

    The FORTRAN language never guaranteed that variables would keep their
    data unless SAVE was specified, but many compilers did it anyway, so the
    code may indeed assume so.

    Some experimentation on the Fortran side can help there. Compiling
    the code with -frecursive and/or with one of the -finit-integer
    and -finit-real options (I'm talking gfortran options here, but
    other compilers have similar) will help you find trouble spots.
    If you happen to have access to nagfor, they have a -C=all option
    which will find very many bugs in code that people think correct,
    even more with -C=undefined.
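
    On the C side, the difference between a local that has to persist and one that does not might look like the sketch below. Both functions and their names are hypothetical illustrations, not code from the translation.

```c
/* A local that carries state from one call to the next must stay static,
   which corresponds to a Fortran variable that needs SAVE. */
int next_id(void)
{
    static int counter = 0;   /* value persists across calls */
    counter += 1;
    return counter;
}

/* A local that is always assigned before it is read never depends on its
   previous value, so it can safely become an ordinary automatic variable. */
int square_plus_one(int x)
{
    int tmp;                  /* would be "static int tmp;" in a literal translation */
    tmp = x * x;
    return tmp + 1;
}
```

    Deciding which case applies requires checking that no path reads the variable before writing it, which is the kind of thing the compiler options above help to find.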

    > Another key issue is the aliasing that goes on with "equivalence" constructs.
    > There is no good uniform translation for this into C ...

    The question is - what is equivalence used for? Something sane?

    Generally, C's unions are a good match for Fortran's equivalence,
    with the same problem with undefined behavior if the unions are
    used for type punning.
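
    As an illustration, here is a sketch of a union standing in for an EQUIVALENCE that is used only to share storage. The Fortran declaration in the comment and all names are assumptions made up for this example.

```c
/* Hypothetical Fortran being modelled:
       REAL RBUF(4)
       INTEGER IBUF(4)
       EQUIVALENCE (RBUF, IBUF)
   Both arrays occupy the same storage; only one is in use at a time. */
union workbuf {
    float rbuf[4];
    int   ibuf[4];
};

/* Use the shared storage as integers: write ibuf, then read ibuf. */
int sum_as_ints(union workbuf *w)
{
    int sum = 0;
    for (int i = 0; i < 4; i++) {
        w->ibuf[i] = i + 1;
        sum += w->ibuf[i];
    }
    return sum;
}

/* Reuse the same storage as reals: write rbuf, then read rbuf. */
float sum_as_reals(union workbuf *w)
{
    float sum = 0.0f;
    for (int i = 0; i < 4; i++) {
        w->rbuf[i] = 0.5f * (float)(i + 1);
        sum += w->rbuf[i];
    }
    return sum;
}
```

    Each function reads only the member it has just written, so no value is ever reinterpreted as a different type.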

    > it actually better
    > fits C++, where you have reference types available. There's really no good reason why those have been left out of C, when other things which appeared first in C++, like "const", "bool" or function prototypes, found their way into C.
    >
    > However, a substantial chunk of use-cases for equivalence constructs can be carved out as "enum" types, so there was a strength reduction step for this, too.
    >
    > Perhaps the moderator will have more to say about the intricacies of Fortran translation. In the meanwhile, another project has already been staged for conversion to C++ - LAPACK
    >
    > https://github.com/LydiaMarieWilliamson/lapack
    >
    > but is in a holding pattern for now. This one will more heavily involve the synthesis of "template" types. To date, ongoing attempts, elsewhere, have been
    > mostly limited to creating C or C++ shells for the Fortran core, rather than a
    > conversion of the core, itself.

    Fortran has guarantees on the semantics which are quite well tuned for optimization. Converting it into C or C++ may well lose execution
    speed.

  • From Lydia Marie Williamson@21:1/5 to Rock Brentwood on Fri May 20 16:34:48 2022
    On Monday, May 16, 2022 at 2:53:09 PM UTC-5, Rock Brentwood wrote:
    > Another key issue is the aliasing that goes on with "equivalence" constructs. There is no good uniform translation for this into C ... it actually better fits C++, where you have reference types available. There's really no good reason why those have been left out of C, when other things which appeared first in C++, like "const", "bool" or function prototypes, found their way into C.
    >
    > However, a substantial chunk of use-cases for equivalence constructs can be carved out as "enum" types, so there was a strength reduction step for this, too.

    This is not exactly correct. It's "common blocks" that were handled in this way.

    In the Fortran source of Zork/dungeon, the "equivalence" statements and
    "common blocks" were used together, so it's easy to get the issue confused. I don't know if their being used together is something that always happened in Fortran, or if it was just particular to this program.

    > In the meanwhile, another project has already been staged for
    > conversion to C++ - LAPACK
    >
    > https://github.com/LydiaMarieWilliamson/lapack
    >
    > but is in a holding pattern for now.

    There were several stages to the translation, one of which involved regularizing and normalizing the Fortran, itself.
    This is also on the local machines here.
    But while that was happening, LAPACK came back alive, and is out on GitHub and being actively maintained again.
    Originally, it was (mostly) inert.

    > [It's been at least 20 years since I've done any sort of Fortran translation so for this maze of twisty little passages, I'm afraid you're on your own. I'm always surprised in translation exercises how many ways that languages that look superficially the same are different in ways that make the
    > translation much harder. -John]

    Things would be easier going into C++, instead of C, since it already has aliasing, operator overloading, redefinable array indexing, and call-by-reference. This inclusion of more Fortran-friendly features into C++ was apparently done intentionally.
    [It was not unusual to use common and equivalence together, particularly when memory
    was tight. But equivalence is like a union, not an enum. -John]

  • From gah4@21:1/5 to Lydia Marie Williamson on Sat May 21 09:31:45 2022
    On Saturday, May 21, 2022 at 8:54:47 AM UTC-7, Lydia Marie Williamson wrote:

    (snip on COMMON and EQUIVALENCE)

    > This is not exactly correct. It's "common blocks" that were handled in this way.
    >
    > In the Fortran source of Zork/dungeon, the "equivalence" statements and "common blocks" were used together, so it's easy to get the issue confused. I don't know if their being used together is something that always happened in Fortran, or if it was just particular to this program.

    COMMON and EQUIVALENCE are closely related in the Fortran standard,
    and in the implementation by compilers. A variable equivalenced to a
    variable in common is also in common. Such a variable can extend the
    length of the common block, but only at the end, not the beginning.

    It used to be that compilers would print out a variable map, with the
    address, or offset, of each variable, and its length and type. That was
    often useful to be sure that the compiler did what you thought it did.
    Also, it would include the length of each common block, again good
    to check to be sure they agree with what you expect.

    The Fortran standard has had a C interoperability feature since
    Fortran 2003 (the ISO_C_BINDING intrinsic module and the BIND(C)
    attribute), which specifies how Fortran features and C features
    work together.

  • From Thomas Koenig@21:1/5 to Lydia Marie Williamson on Sat May 21 17:24:37 2022
    Lydia Marie Williamson wrote:
    > On Monday, May 16, 2022 at 2:53:09 PM UTC-5, Rock Brentwood wrote:
    >> Another key issue is the aliasing that goes on with "equivalence" constructs.
    >> There is no good uniform translation for this into C ... it actually better
    >> fits C++, where you have reference types available. There's really no good
    >> reason why those have been left out of C, when other things which appeared
    >> first in C++, like "const", "bool" or function prototypes, found their way
    >> into C.
    >>
    >> However, a substantial chunk of use-cases for equivalence constructs can be
    >> carved out as "enum" types, so there was a strength reduction step for this,
    >> too.
    >
    > This is not exactly correct. It's "common blocks" that were handled in this way.
    >
    > In the Fortran source of Zork/dungeon, the "equivalence" statements and "common blocks" were used together, so it's easy to get the issue confused. I don't know if their being used together is something that always happened in Fortran, or if it was just particular to this program.

    Fortran has the concept of storage association - under certain
    circumstances, the ordering of variables is prescribed by the
    standard.

    COMMON blocks are one example of this. Taking an example from the
    original Fortran source code:

          COMMON /SYNTAX/ VFLAG,DOBJ,DFL1,DFL2,DFW1,DFW2,
         &                IOBJ,IFL1,IFL2,IFW1,IFW2

    This declares a common block /SYNTAX/ with 11 named variables
    (all of them integers due to an IMPLICIT INTEGER (A-Z) earlier in
    all files), which have to be contiguous in memory.

    The next line

          INTEGER SYN(11)

    declares an integer array with 11 elements.

    Finally, the statement

          EQUIVALENCE (VFLAG, SYN)

    tells the compiler that the address of the (first element of) SYN
    and VFLAG are the same.

    So, you can now use SYN(1) to refer to VFLAG, SYN(2) to DOBJ and so on.
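
    One way this overlay could be pictured in C is a union of a struct and an array. This is only a sketch: the type names and the syn_get helper are invented here, and it assumes the struct has no padding, which the static assertion checks at compile time (C11).

```c
/* One int per variable in COMMON /SYNTAX/, in declaration order. */
struct syntax_vars {
    int vflag, dobj, dfl1, dfl2, dfw1, dfw2;
    int iobj, ifl1, ifl2, ifw1, ifw2;
};

/* Overlay mirroring EQUIVALENCE (VFLAG, SYN): the named members and
   the 11-element array share the same storage. */
union syntax {
    struct syntax_vars v;
    int syn[11];
};

/* The two views only line up if the struct is exactly 11 ints long. */
_Static_assert(sizeof(struct syntax_vars) == 11 * sizeof(int),
               "unexpected padding in struct syntax_vars");

/* Fortran's SYN(i) is 1-based, the C array is 0-based. */
int syn_get(const union syntax *s, int i)
{
    return s->syn[i - 1];
}
```

    With this layout, syn_get(&s, 1) reads the same storage as s.v.vflag and syn_get(&s, 2) the same as s.v.dobj. Both views hold plain int, so nothing is reinterpreted as a different type.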

    Why is this done? I see only one use case, in np3.for

          DO 10 I=1,11
    C     !CLEAR SYNTAX.
            SYN(I)=0
    10    CONTINUE

    simply to create a shortcut for clearing the syntax.

    This is a benign (and standard-conforming) way of using COMMON
    and EQUIVALENCE. Equivalent C code might create a 'struct syntax'
    and clear it with a memset, or have 11 individual variables and
    zero them individually.
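
    The struct-plus-memset variant could be sketched as follows. The member names simply lower-case the Fortran variable names, and clear_syntax is a hypothetical helper name, not a function from either C translation.

```c
#include <string.h>

/* One member per variable in COMMON /SYNTAX/, in declaration order. */
struct syntax {
    int vflag, dobj, dfl1, dfl2, dfw1, dfw2;
    int iobj, ifl1, ifl2, ifw1, ifw2;
};

/* Counterpart of the DO 10 loop: zero every member in one call. */
void clear_syntax(struct syntax *s)
{
    memset(s, 0, sizeof *s);
}
```

    Since every member is an int, setting all bytes to zero sets every member to 0, and no array alias is needed just to clear the block.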
