• Improved accuracy in diagnostics. Is it worthwhile?

    From Ev. Drikos@21:1/5 to All on Fri Mar 18 07:25:40 2022
    Hello,

    This is mainly a parsing question, but it's also Fortran-related.

    When I check the syntax of the code below with the command 'fcheck',
    the error message doesn't list '(' among the expected tokens. This
    happens because of default actions, even though the parser is basically
    LALR. A pure LALR parser wouldn't make reductions without examining the
    lookahead.

    Default actions are useful because they save a lot of space in parsing
    tables, at the cost of missing expected tokens in the error messages
    printed by the command 'fcheck'. This is the relevant BNF rule for the
    example given at the end of this message:

    implicit-stmt ::=
    IMPLICIT implicit-spec-list
    | IMPLICIT NONE [ ( [ implicit-none-spec-list ] ) ]
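    To make the mechanism concrete, here is a minimal Python sketch of why
    '(' drops out of the message for the example at the end of this
    message. The two states, their names, and the simplified "reduce =
    continue in the next state" step are assumptions for illustration,
    not fcheck's actual tables. With a default reduction, the state after
    IMPLICIT NONE reduces on '?' without checking it, so the error is only
    detected in the next state, where ';' is the sole explicit entry.

```python
from typing import Dict, List, Optional, Tuple

# Toy model, NOT fcheck's real tables: each state maps to
# (explicit actions keyed by lookahead, optional default action).
# An action is ("shift" | "reduce", next_state); a reduction is simplified to
# "continue in the state reached after the reduction and its goto".
Action = Tuple[str, str]
State = Tuple[Dict[str, Action], Optional[Action]]


def make_tables(use_default: bool) -> Dict[str, State]:
    """Hypothetical states for the input "IMPLICIT NONE ? ..."."""
    after_none: Dict[str, Action] = {"(": ("shift", "after_lparen")}
    default: Optional[Action] = None
    reduce_stmt: Action = ("reduce", "after_implicit_stmt")
    if use_default:
        default = reduce_stmt  # reduce without looking at the lookahead
    else:
        after_none[";"] = reduce_stmt  # pure LALR: reduce only on ';'
    return {
        "after_none": (after_none, default),
        "after_implicit_stmt": ({";": ("shift", "stmt_end")}, None),
    }


def expected_on_error(tables: Dict[str, State], state: str,
                      lookahead: str) -> Optional[List[str]]:
    """Return the expected tokens reported for `lookahead`, or None if accepted."""
    while True:
        actions, default = tables[state]
        action = actions.get(lookahead, default)  # explicit entry wins over default
        if action is None:
            # Error detected here: only this state's explicit entries are known.
            return sorted(actions)
        kind, target = action
        if kind == "shift":
            return None
        state = target  # follow the reduction and retry the same lookahead


for use_default in (False, True):
    print(use_default, expected_on_error(make_tables(use_default), "after_none", "?"))
# False ['(', ';']
# True [';']
```

    The second line corresponds to the `Expected: ";"` that fcheck reports.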


    Disabling default actions for the command 'fcheck' is fairly simple,
    just a button click in Syntaxis, but at the moment I can't estimate
    how many error messages would improve, whereas a 50% increase in the
    size of the parsing tables would be guaranteed. The command 'fcheck' can be found at https://github.com/drikosev/Fortran
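    The space saving comes from row compression: typically the most
    frequent reduction in a state becomes that state's default, and its
    explicit entries are dropped. A minimal Python sketch of the idea; the
    row and the rule and state numbers are made up, and real generators
    such as yacc also pack the remaining entries into shared arrays, which
    this ignores:

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Action = Tuple[str, int]  # ("shift", state) or ("reduce", rule)


def compress_row(row: Dict[str, Action]) -> Tuple[Dict[str, Action], Optional[Action]]:
    """Make the most frequent reduction the row's default and drop its entries."""
    reductions = Counter(a for a in row.values() if a[0] == "reduce")
    if not reductions:
        return dict(row), None  # shift-only rows keep every entry
    default, _ = reductions.most_common(1)[0]
    kept = {tok: a for tok, a in row.items() if a != default}
    return kept, default


def table_entries(rows: List[Dict[str, Action]]) -> Tuple[int, int]:
    """Count explicit entries before and after compression (defaults stored apart)."""
    full = sum(len(r) for r in rows)
    compressed = sum(len(compress_row(r)[0]) for r in rows)
    return full, compressed


# Hypothetical row: reduce by rule 7 on three lookaheads, shift on '('.
row = {";": ("reduce", 7), ",": ("reduce", 7), ")": ("reduce", 7), "(": ("shift", 12)}
print(compress_row(row))     # ({'(': ('shift', 12)}, ('reduce', 7))
print(table_entries([row]))  # (4, 1)
```

    Turning defaults off keeps all four entries in such a row, which is
    where the table growth comes from; the 50% figure above is a property
    of the Fortran grammar and can't be derived from a toy row.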

    So far, my approach has been that improved diagnostics shouldn't slow
    down the processing of correct programs. Is it worthwhile to improve diagnostics by disabling default actions in a LALR parser?


    Thanks,
    Ev. Drikos

    ----------------------------------------------------------------------
    $ cat default-actions.f90 && fcheck default-actions.f90
    IMPLICIT NONE ? (type, external)
    PRINT *, "Only ';', not a '(', in the expected tokens in diagnostics."
    END

    default-actions.f90:1: error: syntax:Unexpected: '?'. Expected: ";".

    Parsed with Errors: default-actions.f90
    $
    [When yacc was new and everything had to fit in 64K, small parse tables
    were important. Today when people include a megabyte library to get
    a four line routine, not so much. -John]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Kaz Kylheku@21:1/5 to Ev. Drikos on Fri Mar 18 16:47:47 2022
    On 2022-03-18, Ev. Drikos wrote:
    > [...] This happens due to default actions, although the parser is
    > basically LALR. A pure LALR parser wouldn't make reductions without
    > examining the lookahead.

    I think you mean default reductions?

    In the case of Yacc, the action is the body { $$ = $1; }

    :)

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

  • From Thomas Koenig@21:1/5 to Ev. Drikos on Fri Mar 18 18:12:15 2022
    Ev. Drikos wrote:

    > This is mainly a parsing question, but it's also Fortran-related.
    >
    > [...]
    >
    > So far, my approach has been that improved diagnostics shouldn't slow
    > down the processing of correct programs.

    With today's computer speeds, this is likely not a very important
    consideration any more.

    If you are compiling, it is usually a small fraction of time that
    is spent in the parsing, and much more in optimization and code
    generation. An example: Compiling a 50k-line Fortran program with
    "gfortran -O2" takes 17.4 seconds on the computer I type this on.
    Checking with "gfortran -fsyntax-only" takes 4.2 seconds. (For
    those who want to reproduce: aermod.f90 from the Polyhedron suite).

    50k lines in a single source file is already quite a lot (much
    longer than most source files for modular programs are likely to
    be), and throwing a bit more CPU time at the problem to reduce user
    confusion by emitting better error messages is extremely likely
    to be a win for the user. Just be careful to avoid anything
    worse than O(n log n) in the size of the source, or somebody will come
    along with a test case that takes _really_ long.

    (Take the above with a grain of salt for C++ headers.)


    > Is it worthwhile to improve
    > diagnostics by disabling default actions in a LALR parser?

    I would presume so. Run a few benchmarks and find out.
    [In my experience, lexing and optimization take most of the
    time, and parsing is insignificant. -John]
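    One way to "run a few benchmarks" is to time two builds of the checker
    on the same large file, such as aermod.f90. A minimal Python sketch,
    assuming two hypothetical executables, `fcheck-default` and
    `fcheck-nodefault`, built with and without default reductions (neither
    name comes from the fcheck repository):

```python
import subprocess
import sys
import time
from typing import List, Sequence

# Trivial command, used only to check that the harness itself works.
NOOP = [sys.executable, "-c", "pass"]


def time_command(cmd: Sequence[str], repeats: int = 5) -> float:
    """Run `cmd` `repeats` times; return the fastest wall-clock time in seconds."""
    times: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Output is discarded; a non-zero exit (e.g. a syntax error) is not fatal.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                       check=False)
        times.append(time.perf_counter() - start)
    return min(times)  # the minimum usually filters out most scheduling noise


def relative_overhead(baseline: Sequence[str], candidate: Sequence[str],
                      repeats: int = 5) -> float:
    """Fractional slowdown of `candidate` versus `baseline` (0.05 means 5%)."""
    base = time_command(baseline, repeats)
    return (time_command(candidate, repeats) - base) / base
```

    For example, `relative_overhead(["fcheck-default", "aermod.f90"],
    ["fcheck-nodefault", "aermod.f90"])` would return the fractional cost
    of disabling default reductions on that file; both executable names
    are placeholders.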

  • From Ev. Drikos@21:1/5 to Thomas Koenig on Sat Mar 19 19:58:19 2022
    On 18/03/2022 20:12, Thomas Koenig wrote:
    > ...
    > If you are compiling, it is usually a small fraction of time that
    > is spent in the parsing, and much more in optimization and code
    > generation. An example: Compiling a 50k-line Fortran program with
    > "gfortran -O2" takes 17.4 seconds on the computer I type this on.
    > Checking with "gfortran -fsyntax-only" takes 4.2 seconds. (For
    > those who want to reproduce: aermod.f90 from the Polyhedron suite).
    > ...

    Thanks. Just tested this large file and the runtime overhead seems
    to be negligible.

    I'll likely try the change, but it took me a while to find another
    case, this one with enumerators (which currently also lack error
    recovery). My trial changes added messages for 43 states, but some of
    them are useless, so this approach seems useful mainly for BNF rules
    with an optional tail.

    Unavoidably, a parser or front end has to do some guessing on error,
    and that doesn't change easily. So any improvement from disabling
    default state reductions (hello Kaz) will be limited, as in the code below:


    -----------------------------------------------------------------------

    miniserver:errors suser$ cat enum-1.f90 && fcheck enum-1.f90
    ENUM, BIND(C)
    ENUMERATOR :: RED => 4, BLUE => 9
    ENUMERATOR YELLOW
    END ENUM
    END
    enum-1.f90:2: error: syntax:Unexpected: '=>'. Expected: ",", ";", or "=".

    Parsed with Errors: enum-1.f90
    miniserver:errors suser$
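    The expected-token list is accurate for the state the parser is in;
    what it cannot do is infer that '=>' was meant as '='. A front end can
    layer such a guess on top of the list. A minimal Python sketch: the
    message format copies the fcheck output above, while the "Did you
    mean" suggestion, based on Python's difflib, is a hypothetical
    heuristic that fcheck is not known to implement, and the two-token
    wording is an assumption.

```python
import difflib
from typing import Iterable


def format_expected(tokens: Iterable[str]) -> str:
    """Quote and join tokens as "a", "b", or "c" (style of the fcheck output)."""
    quoted = [f'"{t}"' for t in sorted(tokens)]
    if len(quoted) <= 1:
        return "".join(quoted)
    if len(quoted) == 2:
        return f"{quoted[0]} or {quoted[1]}"  # assumed two-token form
    return ", ".join(quoted[:-1]) + ", or " + quoted[-1]


def syntax_error(unexpected: str, expected: Iterable[str]) -> str:
    """Build the message and append a guessed fix (hypothetical heuristic)."""
    candidates = list(expected)
    msg = f"syntax:Unexpected: '{unexpected}'. Expected: {format_expected(candidates)}."
    # Suggest the expected token that is textually closest, if one is close enough.
    guess = difflib.get_close_matches(unexpected, candidates, n=1)
    if guess:
        msg += f" Did you mean '{guess[0]}'?"
    return msg


print(syntax_error("?", {";"}))
# syntax:Unexpected: '?'. Expected: ";".
print(syntax_error("=>", {",", ";", "="}))
# syntax:Unexpected: '=>'. Expected: ",", ";", or "=". Did you mean '='?
```

    A textual heuristic like this helps only when the mistake resembles a
    valid token, which is one reason guessing on error stays limited.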
