• Compilation Quotient (CQ): A Metric for the Compilation Hardness of Pro

    From John R Levine@21:1/5 to All on Mon Jun 10 14:21:36 2024
    This preprint from TU Delft and ETH Zurich generates small programs from
    the grammars of several popular programs, and calculates CQ, which is
    roughly the percentage (0-100) that compile, intended as a proxy for how
    hard the languages are to write. C has a CQ of 48, Rust barely above
    zero.
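To make the metric concrete, here is a minimal Python sketch of a CQ-style measurement. Everything in it is an assumption for illustration. The toy grammar, the depth cap and the `compiles` stand-in are hypothetical; `compiles` only checks that identifiers are declared once and before use. None of this reproduces the paper's grammars or compilers.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to a list of alternatives,
# each alternative is a list of symbols; any symbol that is not a key is a terminal.
GRAMMAR = {
    "<program>": [["<stmt>"], ["<stmt>", "<program>"]],
    "<stmt>": [["int", "<id>", ";"], ["<id>", "=", "<expr>", ";"]],
    "<expr>": [["<id>"], ["1"], ["<expr>", "+", "<expr>"]],
    "<id>": [["a"], ["b"]],
}
IDENTIFIERS = {"a", "b"}


def sample(symbol, rng, depth=0, max_depth=8):
    """Expand `symbol` by choosing alternatives uniformly at random."""
    if symbol not in GRAMMAR:
        return [symbol]
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth cap, take the shortest alternative; in this grammar
        # that alternative is never recursive, so expansion terminates.
        alternatives = [min(alternatives, key=len)]
    tokens = []
    for sym in rng.choice(alternatives):
        tokens.extend(sample(sym, rng, depth + 1, max_depth))
    return tokens


def compiles(tokens):
    """Stand-in for a compiler: reject redeclarations and uses of undeclared names."""
    declared = set()
    stmt = []
    for tok in tokens:
        if tok != ";":
            stmt.append(tok)
            continue
        if stmt[0] == "int":
            if stmt[1] in declared:
                return False
            declared.add(stmt[1])
        elif any(t in IDENTIFIERS and t not in declared for t in stmt):
            return False
        stmt = []
    return True


def cq(n=10_000, seed=0):
    """Percentage (0-100) of sampled programs that pass `compiles`."""
    rng = random.Random(seed)
    passed = sum(compiles(sample("<program>", rng)) for _ in range(n))
    return 100.0 * passed / n


print(f"toy CQ: {cq():.1f}")
```

With a real toolchain, `compiles` would instead write each sample to a file and run the language's compiler on it.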

    In the discussion at the end they say "A programmer's task is to write
    programs that compile." which I think summarizes the basic problem with
    the paper. Take a look.

    https://arxiv.org/abs/2406.04778

    Regards,
    John Levine, [email protected], Taughannock Networks, Trumansburg NY
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Derek@21:1/5 to All on Mon Jun 10 20:30:37 2024
    John,

    > This preprint from TU Delft and ETH Zurich generates small programs from
    > the grammars of several popular programs, and calculates CQ, which is
    > roughly the percentage (0-100) that compile, intended as a proxy for how
    > hard the languages are to write. C has a CQ of 48, Rust barely above
    > zero.

    The paper "Programming Languages vs. Fat Fingers"
    (https://www2.dmst.aueb.gr/dds/blog/20121205/index.html)
    made small changes to existing code in various languages,
    and then measured how many of the modified programs compiled, ran,
    and produced the correct output.
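The mutation idea can be sketched in a few lines of Python. This is not the study's procedure. The sample program, the single-character edits and the use of Python's built-in `compile()` are all assumptions chosen for illustration. Note that `compile()` checks syntax only, so it says nothing about whether a mutant runs or produces correct output.

```python
import random
import string

# Hypothetical sample program to mutate.
SOURCE = """\
def mean(xs):
    total = 0
    for x in xs:
        total += x
    return total / len(xs)
"""

CHARS = string.ascii_letters + string.digits + string.punctuation + " "


def mutate(src, rng):
    """Apply one random single-character edit: delete, insert, or replace."""
    i = rng.randrange(len(src))
    op = rng.choice(("delete", "insert", "replace"))
    c = rng.choice(CHARS)
    if op == "delete":
        return src[:i] + src[i + 1:]
    if op == "insert":
        return src[:i] + c + src[i:]
    return src[:i] + c + src[i + 1:]


def still_compiles(src):
    """True if Python accepts the source; this checks syntax only."""
    try:
        compile(src, "<mutant>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def survival_rate(n=2000, seed=1):
    """Percentage of single-character mutants that still compile."""
    rng = random.Random(seed)
    return 100.0 * sum(still_compiles(mutate(SOURCE, rng)) for _ in range(n)) / n


print(f"{survival_rate():.1f}% of mutants still compile")
```

Because `compile()` stops at syntax, a mutant that turns `total` into `totl` on the loop line still "compiles" and only fails when run. That is the gap between compiling and working that later posts in this thread discuss.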

  • From Derek@21:1/5 to All on Tue Jun 11 00:28:18 2024
    John,

    > [I had two other thoughts. One was that you can tell C was written when
    > parsing was still hard enough that you didn't want to bulk the parsers
    > up with semantic stuff. The other was that in the languages where it is
    > hard to write a valid program, how much more likely is it that the
    > program actually works once you get it to compile? -John]

    C was created after Algol 68, whose two-level grammar specified both
    syntax and semantics. Algol 68 programs automatically generated from the
    language grammar should compile just fine. I suspect that programs
    producing output would be rare, because the generator would seldom emit
    the code needed to produce output: reaching it would be the end result
    of a drunkard's walk.

    C had a more or less conventional grammar, whereas Algol 68's grammar
    is certainly not conventional (it might even be unique).
    [I never heard of any other language using VW-grammars. In C's
    defense, the early compilers -John]

  • From [email protected]@21:1/5 to All on Tue Jun 11 07:57:46 2024
    John Levine:
    > [I had two other thoughts. One was that you can tell C was written when
    > parsing was still hard enough that you didn't want to bulk the parsers
    > up with semantic stuff.

    To me it looks the other way 'round: syntax specification formalisms
    such as BNF inspired programming language designers to put a lot of
    stuff in syntax, because syntax was the part that could be specified
    formally. E.g., Algol 60
    differentiates between booleans and other values on the syntax level.
    Algol 68 introduced Van Wijngaarden grammars to specify the type
    system and the syntax in one syntactic formalism.
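As an illustration of putting the Boolean/arithmetic distinction into syntax, here is a hypothetical toy grammar with a recursive-descent recognizer in Python. It is loosely in the spirit of Algol 60's separate Boolean and arithmetic expression nonterminals, not the Revised Report's actual rules. To keep it unambiguous, a bare identifier is only allowed in arithmetic position. Real Algol 60 also allows Boolean variables, which a context-free grammar alone cannot tell apart from arithmetic ones without their declarations.

```python
import re

# Hypothetical toy grammar, loosely modeled on Algol 60's split between
# arithmetic and Boolean expressions:
#   if_stmt    ::= "if" bool_expr "then" IDENT
#   bool_expr  ::= "true" | "false" | arith_expr RELOP arith_expr
#   arith_expr ::= term { ("+" | "-") term }
#   term       ::= NUMBER | IDENT
KEYWORDS = {"if", "then", "true", "false"}
RELOPS = {"<", ">", "=", "<=", ">="}


def tokenize(src):
    return re.findall(r"<=|>=|[<>=+\-]|\w+", src)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, ok):
        tok = self.peek()
        if tok is None or not ok(tok):
            raise SyntaxError(f"unexpected token {tok!r} at position {self.pos}")
        self.pos += 1

    def term(self):
        self.expect(lambda t: t.isdigit() or (t.isidentifier() and t not in KEYWORDS))

    def arith_expr(self):
        self.term()
        while self.peek() in ("+", "-"):
            self.pos += 1
            self.term()

    def bool_expr(self):
        if self.peek() in ("true", "false"):
            self.pos += 1
            return
        # An arithmetic expression on its own is not a Boolean expression:
        # the grammar demands a relational operator here.
        self.arith_expr()
        self.expect(lambda t: t in RELOPS)
        self.arith_expr()

    def if_stmt(self):
        self.expect(lambda t: t == "if")
        self.bool_expr()
        self.expect(lambda t: t == "then")
        self.expect(lambda t: t.isidentifier() and t not in KEYWORDS)
        if self.peek() is not None:
            raise SyntaxError("trailing tokens")


def parses(src):
    try:
        Parser(tokenize(src)).if_stmt()
        return True
    except SyntaxError:
        return False


print(parses("if x < 1 then y"))  # True
print(parses("if x + 1 then y"))  # False: rejected by the grammar, not a type checker
```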

    Other, later languages have reduced the scope of syntax (often only
    slightly), and specify the type system as a separate entity.
    Interestingly, I am not aware of a widely successful formalism for
    type systems, even though many programming languages specify static
    type systems and their implementations have to perform static type
    checking (plus there is also dynamic type checking).
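A minimal sketch of a type system specified as a separate entity, as an implementation might do it. A parser would happily build the tree for `1 + true`, and a separate pass then rejects it. The node types and typing rules below describe a hypothetical toy language in Python, not any particular language's specification.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class Bool:
    value: bool


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


@dataclass
class If:
    cond: "Expr"
    then: "Expr"
    orelse: "Expr"


Expr = Union[Num, Bool, Var, BinOp, If]

# Typing rules for binary operators: op -> (left type, right type, result type).
OPS = {
    "+": ("int", "int", "int"),
    "*": ("int", "int", "int"),
    "<": ("int", "int", "bool"),
    "and": ("bool", "bool", "bool"),
}


class StaticTypeError(Exception):
    pass


def check(e, env):
    """Return the static type of `e` given variable types in `env`, or raise."""
    if isinstance(e, Num):
        return "int"
    if isinstance(e, Bool):
        return "bool"
    if isinstance(e, Var):
        if e.name not in env:
            raise StaticTypeError(f"undeclared variable {e.name!r}")
        return env[e.name]
    if isinstance(e, BinOp):
        if e.op not in OPS:
            raise StaticTypeError(f"unknown operator {e.op!r}")
        want_l, want_r, result = OPS[e.op]
        got_l, got_r = check(e.left, env), check(e.right, env)
        if (got_l, got_r) != (want_l, want_r):
            raise StaticTypeError(f"{e.op!r} expects {want_l}, {want_r}; got {got_l}, {got_r}")
        return result
    if isinstance(e, If):
        if check(e.cond, env) != "bool":
            raise StaticTypeError("condition must be bool")
        t_then, t_else = check(e.then, env), check(e.orelse, env)
        if t_then != t_else:
            raise StaticTypeError(f"branches differ: {t_then} vs {t_else}")
        return t_then
    raise StaticTypeError(f"unknown node {e!r}")


print(check(If(BinOp("<", Var("x"), Num(3)), Num(1), Num(0)), {"x": "int"}))  # int
```

Calling `check(BinOp("+", Num(1), Bool(True)), {})` raises `StaticTypeError`, even though the tree itself is well-formed.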

    > The other was that in the languages where it is
    > hard to write a valid program, how much more likely is it that the program
    > actually works once you get it to compile? -John]

    That is the promise of programming languages that make it hard to get
    a program to compile: get it to compile, and it is usually correct. I
    am not aware of any empirical evidence that supports this promise.

    - anton
    --
    M. Anton Ertl
    [email protected]
    http://www.complang.tuwien.ac.at/anton/

  • From Derek@21:1/5 to All on Tue Jun 11 22:45:30 2024
    John, Anton,

    >> The other was that in the languages where it is
    >> hard to write a valid program, how much more likely is it that the program
    >> actually works once you get it to compile? -John]
    >
    > That is the promise of programming languages that make it hard to get
    > a program to compile: get it to compile, and it is usually correct. I
    > am not aware of any empirical evidence that supports this promise.

    Requiring that variables are defined before use
    decreases incorrectness (which is not a marketable term).
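A misspelled variable name shows the kind of incorrectness such a rule removes. The Python sketch below interprets a hypothetical toy language of assignments whose right-hand sides are sums. In lenient mode, a plain assignment silently creates a new variable, as in languages without declarations. Strict mode requires a `var` declaration first.

```python
def run(program, strict):
    """Interpret lines like 'var x = 1' or 'x = y + 2'; return the final variables."""
    env = {}
    for lineno, line in enumerate(program, start=1):
        declare = line.startswith("var ")
        if declare:
            line = line[len("var "):]
        name, expr = (part.strip() for part in line.split("=", 1))
        if strict and not declare and name not in env:
            raise NameError(f"line {lineno}: {name!r} assigned before declaration")
        value = 0
        for term in (t.strip() for t in expr.split("+")):
            if term.isdigit():
                value += int(term)
            elif term in env:
                value += env[term]
            else:
                raise NameError(f"line {lineno}: {term!r} used before definition")
        env[name] = value
    return env


PROGRAM = [
    "var total = 0",
    "totl = total + 1",  # typo: meant 'total'
]

print(run(PROGRAM, strict=False))  # {'total': 0, 'totl': 1}: runs, but total is wrong
try:
    run(PROGRAM, strict=True)
except NameError as err:
    print("rejected:", err)
```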

    There is a tiny amount of evidence that strong typing may
    be a benefit https://shape-of-code.com/2014/08/27/evidence-for-the-benefits-of-strong-typing-where-is-it/

    The cost-effectiveness of these benefits is a question that
    researchers avoid (it smacks of grubby usefulness).

    If you are interested in evidence, check out
    my book, Evidence-based Software Engineering, which
    discusses what is currently known about software engineering,
    based on an analysis of all the publicly available data.
    The PDF, code, and all data are freely available here:
    http://knosof.co.uk/ESEUR/

    If you know of any interesting software engineering
    data that I don't have, please tell me about it.

  • From Hans-Peter Diettrich@21:1/5 to John R Levine on Wed Jun 12 11:27:21 2024
    On 6/10/24 2:21 PM, John R Levine wrote:
    > generates small programs from
    > the grammars of several popular programs,
    I think that the *syntactic grammar* of programming *languages* is meant:
    >>
    The key idea is to measure the compilation success rates of programs
    sampled from context-free grammars.
    <<

    Then I wonder how valid random programs can ever be generated for
    languages that require an identifier to be declared before use, which
    is clearly a *semantic* issue. A CQ of 48 for C suggests to me that
    certain semantic rules have been built into the program generator.

    Or what have I misunderstood?

    DoDi
    [The paper describes the grammars they use. The C grammar requires declarations to precede other statements, so that's easy to get right. -John]
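The thread leaves open how a purely context-free generator copes with declare-before-use. One hypothetical factor, not taken from the paper, is the size of the identifier pool. If identifiers are drawn from a small fixed pool, random uses will often happen to match earlier declarations. The Python sketch below estimates that chance for a C-like block whose declarations come first. The pool sizes, the block shape and the checks are assumptions.

```python
import random


def sample_block(rng, pool_size, n_decls=2, n_uses=3):
    """Sample a C-like block: declarations first, then statements that use names.

    Names are drawn uniformly and independently from a fixed pool, as a purely
    context-free generator would; nothing ties uses to earlier declarations.
    """
    pool = [f"v{i}" for i in range(pool_size)]
    decls = [rng.choice(pool) for _ in range(n_decls)]
    uses = [rng.choice(pool) for _ in range(n_uses)]
    return decls, uses


def passes_semantic_checks(decls, uses):
    """No name declared twice, and every used name declared."""
    return len(set(decls)) == len(decls) and set(uses) <= set(decls)


def pass_rate(pool_size, trials=20_000, seed=0):
    rng = random.Random(seed)
    passed = sum(passes_semantic_checks(*sample_block(rng, pool_size)) for _ in range(trials))
    return passed / trials


for k in (2, 4, 16):
    print(f"pool of {k} names: {pass_rate(k):.3f}")
```

With two declarations and three uses, the exact pass probability is (1 - 1/k) * (2/k)^3 for a pool of k >= 2 names. That works out to 0.5, about 0.094 and about 0.002 for the pool sizes above, so the estimates should land close to those values. Under this assumption, a small vocabulary alone can keep a semantic check from driving the pass rate to zero.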

  • From [email protected]@21:1/5 to Derek on Fri Jun 14 16:00:06 2024
    Derek <[email protected]> writes:
    >> That is the promise of programming languages that make it hard to get
    >> a program to compile: get it to compile, and it is usually correct. I
    >> am not aware of any empirical evidence that supports this promise.
    >
    > Requiring that variables are defined before use
    > decreases incorrectness (which is not a marketable term).

    It's not hard to get a program to compile if the compiler requires
    definition before use.

    The languages for which I have heard the claim the most are Haskell
    and Rust.

    I remember talking at a conference to someone who worked on the
    register allocator of (IIRC) SML/NJ (ML is an eager language on which
    the syntax and type system of Haskell are based AFAICT), and it did
    not sound like the promise had been achieved. I also wonder how all
    the correctness criteria of a register allocator could be modeled as
    Haskell or Rust types.
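A concrete example of one such correctness criterion: variables whose live ranges overlap must not share a register. The Python sketch below checks that property at run time for a given allocation. The half-open live-range representation and all names are hypothetical. It covers only this one criterion; spilling, calling conventions and the liveness analysis itself are out of scope. The open question in the post is whether properties like this can be stated as types instead of being checked this way.

```python
def interfering_same_register(live_ranges, assignment):
    """Return pairs of variables whose live ranges overlap but share a register.

    live_ranges maps each variable to a half-open interval [start, end) of
    instruction indices; assignment maps each variable to a register name.
    """
    names = sorted(live_ranges)
    conflicts = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (s1, e1), (s2, e2) = live_ranges[a], live_ranges[b]
            overlaps = s1 < e2 and s2 < e1
            if overlaps and assignment[a] == assignment[b]:
                conflicts.append((a, b))
    return conflicts


ranges = {"x": (0, 4), "y": (2, 6), "z": (5, 8)}
print(interfering_same_register(ranges, {"x": "r1", "y": "r2", "z": "r1"}))  # []
print(interfering_same_register(ranges, {"x": "r1", "y": "r1", "z": "r2"}))  # [('x', 'y')]
```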

    > If you are interested in evidence, check out
    > my book, Evidence-based Software Engineering, which
    > discusses what is currently known about software engineering,
    > based on an analysis of all the publicly available data.
    > The PDF, code, and all data are freely available here:
    > http://knosof.co.uk/ESEUR/

    Cool book. If only I had more time to read all the interesting books.

    - anton
    --
    M. Anton Ertl
    [email protected]
    http://www.complang.tuwien.ac.at/anton/
