• C23 thoughts and opinions

    From David Brown@21:1/5 to All on Wed May 22 18:55:36 2024
    In an attempt to bring some topicality to the group, has anyone started
    using, or considering, C23 ? There's quite a lot of change in it,
    especially compared to the minor changes in C17.

    <https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf>
    <https://en.wikipedia.org/wiki/C23_(C_standard_revision)>
    <https://en.cppreference.com/w/c/23>

    I like that it tidies up a lot of old stuff - it is neater to have
    things like "bool", "static_assert", etc., as part of the language
    rather than needing a half-dozen includes for such basic stuff.

    I like that it standardises several useful extensions that have been
    in gcc and clang (and possibly other compilers) for many years.

    I'm not sure it will make a big difference to my own programming - when
    I want "typeof" or "ckd_add()", I already use them in gcc. But for
    people restricted to standard C, there's more new to enjoy. And I
    prefer to use standard syntax when possible.

    "constexpr" is something I think I will find helpful, in at least some circumstances.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Malcolm McLean on Wed May 22 21:50:51 2024
    On 22/05/2024 21:10, Malcolm McLean wrote:
    On 22/05/2024 17:55, David Brown wrote:
    In an attempt to bring some topicality to the group, has anyone
    started using, or considering, C23 ?  There's quite a lot of change in
    it, especially compared to the minor changes in C17.

    <https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf>
    <https://en.wikipedia.org/wiki/C23_(C_standard_revision)>
    <https://en.cppreference.com/w/c/23>

    I like that it tidies up a lot of old stuff - it is neater to have
    things like "bool", "static_assert", etc., as part of the language
    rather than needing a half-dozen includes for such basic stuff.

    I like that it standardises several useful extensions that have been
    in gcc and clang (and possibly other compilers) for many years.

    I'm not sure it will make a big difference to my own programming -
    when I want "typeof" or "ckd_add()", I already use them in gcc.  But
    for people restricted to standard C, there's more new to enjoy.  And I
    prefer to use standard syntax when possible.

    "constexpr" is something I think I will find helpful, in at least some
    circumstances.


    So I'm currently writing some code (you can follow my progress on
    github, it is a new branch in the Baby X resource compiler project). And
    it's just standard well understood algorithm code to manipulate XML
    trees. And I certainly don't feel the need for static_assert.

    I use static assertions everywhere I can. I used them long before C11
    added them to the language, using a somewhat messy macro to force an
    error if the assertion fails. They catch mistakes, they document
    assumptions, they make code clearer to the reader. And they do so with
    zero cost in code space or run-time, and no more effort than writing a
    comment. I find it hard to understand why anyone would actively choose
    not to use them.

    But even
    boolean type and const.

    Bool is much more than "int 0" and "int 1". And it is significantly
    clearer in code. (Sometimes, of course, a specific enumerated type is
    clearer than bool or int.)

    Const documents the code, makes the action of a function clearer to the
    reader, and helps catch mistakes.

    These are all things that make the language better, and have done so for
    the past 25 years.

    Of course quite a lot of the functions don't
    actually change the structures they are passed. But is littering the
    code with const going to help? And why do you really need a boolean when
    an int can hold either a zero or non-zero value?

    And don't you just want a pared down, clean language?


    I want a language with the features I need and that help me to write
    good clear code. Minimal is not helpful, any more than needlessly
    complex is helpful.

  • From David Brown@21:1/5 to Thiago Adams on Wed May 22 22:11:44 2024
    On 22/05/2024 19:42, Thiago Adams wrote:
    On 22/05/2024 13:55, David Brown wrote:
    In an attempt to bring some topicality to the group, has anyone
    started using, or considering, C23 ?  There's quite a lot of change in
    it, especially compared to the minor changes in C17.

    <https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf>
    <https://en.wikipedia.org/wiki/C23_(C_standard_revision)>
    <https://en.cppreference.com/w/c/23>

    I like that it tidies up a lot of old stuff - it is neater to have
    things like "bool", "static_assert", etc., as part of the language
    rather than needing a half-dozen includes for such basic stuff.

    I like that it standardises several useful extensions that have been
    in gcc and clang (and possibly other compilers) for many years.

    I'm not sure it will make a big difference to my own programming -
    when I want "typeof" or "ckd_add()", I already use them in gcc.  But
    for people restricted to standard C, there's more new to enjoy.  And I
    prefer to use standard syntax when possible.

    "constexpr" is something I think I will find helpful, in at least some
    circumstances.

    I am waiting for MSVC support. There are a lot of simple features MSVC could implement and deliver in small increments. But it is very slow.

    MSVC is primarily a C++ compiler - the C support is more of a leftover
    from the previous century, with a few post-C90 features as an
    afterthought. Surely for C development on Windows, rather than C++,
    you'd look for something better?


    I would use these today if I had them:

     - #warning
     - [[nodiscard]]
     - typeof
     - digit separators
     - bool true, false


    I use these today in C, except the digit separators (I use them in C++).
    But as I say, it's nice to see them as standard rather than just
    common extensions.

    I am not planning to use:

     - enum with specific types.

    I haven't found a use for these in C++, and I'm not sure I'll need them
    in C either. I sometimes have ordinary enum types in bitfields for
    specific sizes.

     - #elifdef

    This will slightly neaten some of my pre-processor handling. My strong preference for preprocessor symbols for conditional compilation and the
    like is to have symbols that are always defined, but to different
    values, and use "#if" checks rather than "#ifdef" - when combined with
    gcc warnings, it makes it far easier to catch spelling mistakes, and it
    makes it easy to jump in the code to where the symbol is defined. But
    #ifdef checks do turn up, and this will give marginally neater code.

     - nullptr

    I am fond of nullptr in C++, and will use it in C. Like most of the C23 changes, it's not a big issue - after all, you get a lot of the same
    effect with "#define nullptr (void*)(0)" or similar. But it means your
    code has a visual distinction between the integer 0 and a null pointer,
    and also lets the compiler or other static checking system check better
    than using NULL would. (And I don't like NULL - I dislike all-caps
    identifiers in general.)

     - auto

    I use that occasionally in gcc, as __auto_type. It can be helpful in
    macros. I might use it more when it is standardised. (I use auto in
    C++ a bit more often.)

     - constexpr

    I will definitely use that. Sometimes I want a constant expression for
    things like array sizes or static initialisers, and want to calculate
    it. constexpr gives you that without having to resort to macros. (I'd
    perhaps be even happier if I could just use const, as I can in C++.)


    Not sure
     - empty initializer


    I don't see that one being a big hit, at least for me. But I see little benefit in /not/ allowing it in the language, so it seems a sensible
    addition.

  • From Lawrence D'Oliveiro@21:1/5 to Malcolm McLean on Thu May 23 02:47:20 2024
    On Wed, 22 May 2024 20:10:51 +0100, Malcolm McLean wrote:

    And why do you really need a boolean when
    an int can hold either a zero or non-zero value?

    Knowing that there is a range containing just two possible values allows
    you to use the type as an array index.

  • From Lawrence D'Oliveiro@21:1/5 to Malcolm McLean on Thu May 23 02:46:25 2024
    On Wed, 22 May 2024 20:10:51 +0100, Malcolm McLean wrote:

    And don't you just want a pared down, clean language?

    The train for BCPL and BLISS now leaving on platform 1980 ...

  • From Lawrence D'Oliveiro@21:1/5 to Malcolm McLean on Thu May 23 02:48:50 2024
    On Wed, 22 May 2024 21:39:17 +0100, Malcolm McLean wrote:

    static int haserror(LEXER *lex)
    {
    return lex->error[0] ? 1 : 0;
    }

    static bool has_error(LEXER * lex)
    {
    return lex->error[0] != 0;
    } /*has_error*/

  • From Lawrence D'Oliveiro@21:1/5 to Thiago Adams on Thu May 23 02:49:37 2024
    On Wed, 22 May 2024 14:42:58 -0300, Thiago Adams wrote:

    I am waiting for MSVC support. There are a lot of simple features MSVC could implement and deliver in small increments. But it is very slow.

    And they wonder why developers are deserting the Windows platform for
    Linux.

  • From Lawrence D'Oliveiro@21:1/5 to Thiago Adams on Thu May 23 02:59:28 2024
    On Wed, 22 May 2024 22:23:26 -0300, Thiago Adams wrote:

    I like the idea of embed ...

    We’ve discussed this before. It just seems like a sop to those stuck with antiquated, crippled build systems. In which case, how would they get an up-to-date compiler that supports it?

  • From Lawrence D'Oliveiro@21:1/5 to David Brown on Thu May 23 03:13:50 2024
    On Wed, 22 May 2024 18:55:36 +0200, David Brown wrote:

    <https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf>

    Unicode identifiers!

    typedef int
    typėdef;

  • From Lawrence D'Oliveiro@21:1/5 to Keith Thompson on Thu May 23 04:47:44 2024
    On Wed, 22 May 2024 21:30:34 -0700, Keith Thompson wrote:

    ... code will be written to use it.

    Funny, isn’t it, that when I post code using other features of C (like iso646.h), I get piled on by people who don’t like it, with quite the opposite argument.

  • From Lawrence D'Oliveiro@21:1/5 to Keith Thompson on Thu May 23 04:20:24 2024
    On Wed, 22 May 2024 21:08:54 -0700, Keith Thompson wrote:

    Lawrence D'Oliveiro writes:

    On Wed, 22 May 2024 22:23:26 -0300, Thiago Adams wrote:

    I like the idea of embed ...

    We’ve discussed this before. It just seems like a sop to those stuck
    with antiquated, crippled build systems. In which case, how would they
    get an up-to-date compiler that supports it?

    Presumably by waiting until compilers support it, like any new feature.

    Time/effort would be better spent investing in a more versatile build
    system. Which would have the added advantage of supporting other languages besides C.

  • From Richard Kettlewell@21:1/5 to Malcolm McLean on Thu May 23 09:07:23 2024
    Malcolm McLean writes:
    static int haserror(LEXER *lex)
    {
    return lex->error[0] ? 1 : 0;
    }

    error is a character buffer which holds the error message if an error
    has been encountered. And for convenience it is placed in the
    lexer. If there is no error, it holds the empty string. However it's
    not entirely obvious that testing the message directly is the way you
    should be testing for an error condition, so I wrote that little
    function to make things clearer.

    It's easy enough to make it return a boolean, of course. But I don't
    see a real benefit.

    Possible benefits:

    1) It conveys information to the reader about the nature of the
    function. In this particular case the name also conveys that
    information well enough, so there’s not actually much to be gained
    here, but in other contexts there may be more of an advantage.

    2) It conveys information to the compiler that may be exploited by the
    optimizer (depending on the compilation model, the capabilities of
    the target platform and optimizer, etc).

    We are gradually migrating functions with boolean sense to returning
    bool, albeit not very systematically, mainly for reason #1.

    --
    https://www.greenend.org.uk/rjk/

  • From bart@21:1/5 to Thiago Adams on Thu May 23 13:11:16 2024
    On 23/05/2024 02:21, Thiago Adams wrote:
    Em 5/22/2024 7:53 PM, Keith Thompson escreveu:

    But const doesn't mean constant.  It means read-only.
    `const int r = rand();` is perfectly valid.

    I dislike the C++ hack of making N a constant expression given
    `const int N = 42;`; constexpr made that unnecessary.  C23 makes the
    same (IMHO) mistake.

    If I had a time machine, I'd spell "const" as "readonly" and make
    "const" mean what "constexpr" now means (evaluated at compile time).

    [...]

    Everything is a mess: const in C++, the differences from const in C,
    etc. constexpr in C23 just makes the mess bigger.

    auto is a mess as well, not well specified for pointers. Not sure if we
    had this topic here, but auto * p in C is not specified.

    I would remove from C23
    - nullptr
    - auto
    - constexpr
    - embed

    I like the idea of embed but there is no implementation in production so
    this is crazy!

    'embed' was discussed a few months ago. I disagreed with the poor way it
    was to be implemented: 'embed' notionally generates a list of
    comma-separated numbers as tokens, where you have to take care of any
    trailing zero yourself if needed. It would also be hopelessly
    inefficient if actually implemented like that.

    I compared it to the scheme in my own language, which could import text
    files, but binary ones didn't really work.

    Since then embedding has been considerably improved, so that it works
    like this:

    []char str = sinclude("hello.c")
    []byte data = binclude("hello.exe")

    The file-embedding is done by sinclude or binclude. The former adds a
    zero terminator to the embedded file data (expected to be a text file), otherwise they are the same.

    binclude can initialise any kind of array, including a 2D array of any
    element type, although the data in the file needs to be suitable.

    C23's 'embed' was claimed to be more flexible, as you can have
    consecutive 'embed' directives initialising the same array. I can do the
    same:

    []byte file = binclude("hello.exe") + binclude("/cx/big/sql.exe")

    proc main=
    println file.len
    end

    This generates an executable of 1077248 bytes, and displays 1050112 when
    run, the combined size of those two embedded binaries. Compiling this
    took 50ms.

    ("+" here is a compile-time operator that can concatenate constant
    strings or also binary data like this.)

    Basically, you are right that the ad hoc features of C23 are messy.

    I suspect that ones like 'embed' have been derived from C++ which always
    likes to make things too wide-ranging and much harder to use and
    implement than necessary.

  • From Michael S@21:1/5 to David Brown on Thu May 23 15:02:26 2024
    On Wed, 22 May 2024 18:55:36 +0200
    David Brown wrote:

    In an attempt to bring some topicality to the group, has anyone
    started using, or considering, C23 ? There's quite a lot of change
    in it, especially compared to the minor changes in C17.

    <https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf>
    <https://en.wikipedia.org/wiki/C23_(C_standard_revision)>
    <https://en.cppreference.com/w/c/23>

    I like that it tidies up a lot of old stuff - it is neater to have
    things like "bool", "static_assert", etc., as part of the language
    rather than needing a half-dozen includes for such basic stuff.

    I like that it standardises several useful extensions that have
    been in gcc and clang (and possibly other compilers) for many years.

    I'm not sure it will make a big difference to my own programming -
    when I want "typeof" or "ckd_add()", I already use them in gcc. But
    for people restricted to standard C, there's more new to enjoy. And
    I prefer to use standard syntax when possible.

    "constexpr" is something I think I will find helpful, in at least
    some circumstances.



    Removed
    1) Old-style function declarations and definitions
    2) Representations for signed integers other than two's complement
    3) Permission that u/U-prefixed character constants and string
    literals may be not UTF-16/32
    4) Mixed wide string literal concatenation
    5) Support for calling realloc() with zero size (the behavior becomes
    undefined)
    6) __alignof_is_defined and __alignas_is_defined
    7) static_assert is not provided as a macro defined in <assert.h>
    (becomes a keyword)
    8) thread_local is not provided as a macro defined in <threads.h>
    (becomes a keyword)

    1) good
    2) good, but insufficient. The next logical step is to make both left
    and right shift of negative integers by count that does not exceed #
    of bits in respective type fully defined
    3) IDNC
    4) IDNC
    5) IDNC
    6) IDNC
    7) bad. Breaks existing code for weak reason
    8) bad. Breaks existing code for weak reason


    Deprecated
    1) <stdnoreturn.h>
    2) Old feature-test macros
    __STDC_IEC_559__
    __STDC_IEC_559_COMPLEX__
    3) _Noreturn function specifier
    4) _Noreturn attribute token
    5) asctime()
    6) ctime()
    7) DECIMAL_DIG (use the appropriate type-specific macro
    (FLT_DECIMAL_DIG, etc) instead)
    8) Definition of following numeric limit macros in <math.h> (they
    should be used via <float.h>)
    INFINITY
    DEC_INFINITY
    NAN
    DEC_NAN
    9) __bool_true_false_are_defined

    No opinion on most of those.
    W.r.t. 5 and 6.
    IMHO, all old-UNIX-style APIs that return pointers to static
    objects within library or rely on presence of static object within
    library for purpose of preserving state for subsequent calls should be systematically deprecated and for majority of them there should be
    provided thread-safe alternatives akin to ctime_s().
    That is, with exception of family of functions that uses FILE*. Not
    that I like them very much, but they are ingrained too deeply.
    So, picking just asctime and ctime out of the long list of problematic
    APIs does not appear particularly consistent. If they were asking me
    where to start, I'd start with rand().

    With regard to new features, the list is too long to comment on in one
    post. I just want to say that the strfrom* family is long overdue, but
    still appears incomplete. The guiding principle should be that all format
    specifiers available in printf(), with the sole exception of %s, should
    be provided by strfrom* as well.

  • From David Brown@21:1/5 to Keith Thompson on Thu May 23 14:32:46 2024
    On 23/05/2024 00:53, Keith Thompson wrote:
    David Brown writes:
    On 22/05/2024 19:42, Thiago Adams wrote:
    [...]
     - nullptr

    I am fond of nullptr in C++, and will use it in C. Like most of the
    C23 changes, it's not a big issue - after all, you get a lot of the
    same effect with "#define nullptr (void*)(0)" or similar. But it
    means your code has a visual distinction between the integer 0 and a
    null pointer, and also lets the compiler or other static checking
    system check better than using NULL would. (And I don't like NULL - I
    dislike all-caps identifiers in general.)

    Quibble: That should be

    #define nullptr ((void*)0)


    Indeed.

    For example, this doesn't produce a syntax error for `sizeof nullptr`.

    Better:

    #if __STDC_VERSION__ < 202311L
    #define nullptr ((void*)0)
    #endif

    C23's nullptr is of type nullptr_t, not void*. But you'd probably have
    to go out of your way for that to be an issue (e.g., using nullptr in a generic selection).


    The use of generics can be an advantage of nullptr here. The use in
    templates was a prime motivation of introducing nullptr to C++, though I
    think it is fair to say that templates are very much more popular in C++
    than _Generic is in C. But I haven't thought of a real-world use-case yet!


    [...]

     - constexpr

    I will definitely use that. Sometimes I want a constant expression
    for things like array sizes or static initialisers, and want to
    calculate it. constexpr gives you that without having to resort to
    macros. (I'd perhaps be even happier if I could just use const, as I
    can in C++.)

    But const doesn't mean constant. It means read-only.
    `const int r = rand();` is perfectly valid.

    Yes - which is why "constexpr" can be useful.


    I dislike the C++ hack of making N a constant expression given
    `const int N = 42;`; constexpr made that unnecessary.

    I find that "hack" convenient at times. But I see what you mean that it
    is a "hack", and I agree that "constexpr" makes such a hack unnecessary.
    (Ideally, the languages would have used terms such as "read_only" and "constant" rather than "const" and "constexpr", but that boat sailed
    long ago.)

    C23 makes the
    same (IMHO) mistake.

    I don't think so - as far as I can see, it avoids that mistake (if you
    feel the "hack" was a mistake). C23 can't fix the choice of names -
    that was from C90.


    If I had a time machine, I'd spell "const" as "readonly" and make
    "const" mean what "constexpr" now means (evaluated at compile time).

    [...]


  • From Michael S@21:1/5 to Lawrence D'Oliveiro on Thu May 23 15:36:03 2024
    On Thu, 23 May 2024 02:49:37 -0000 (UTC)
    Lawrence D'Oliveiro wrote:

    On Wed, 22 May 2024 14:42:58 -0300, Thiago Adams wrote:

    I am waiting for MSVC support. There are a lot of simple features MSVC
    could implement and deliver in small increments. But it is very
    slow.

    And they wonder why developers are deserting the Windows platform for
    Linux.

    In practice, on my old home Windows PC (11 y.o. installation of 14 y.o.
    OS) today, 2024-05-23, I can easily install and use gcc14.1.0 alongside clang18.1.5 alongside one of the newest versions of MSVC (not sure
    which one) alongside latest Intel ICC alongside any older version of
    MSVC and ICC and with a little more effort and disk space alongside
    older versions of clang and gcc at least as long back as gcc4.9. I can
    use all of those either simultaneously or interchangeably.
    I very much doubt that I could get a similar variety of compiler versions
    on a Linux installation of similar age, or even on one that is 5 years
    younger. Even on the most up-to-date Linux distros, to get such a compiler
    zoo I'd probably have to fight against the package manager rather than be
    assisted by it.

  • From David Brown@21:1/5 to Thiago Adams on Thu May 23 14:17:53 2024
    On 22/05/2024 22:26, Thiago Adams wrote:
    On 22/05/2024 17:11, David Brown wrote:
    On 22/05/2024 19:42, Thiago Adams wrote:
    On 22/05/2024 13:55, David Brown wrote:
    In an attempt to bring some topicality to the group, has anyone
    started using, or considering, C23 ?  There's quite a lot of change
    in it, especially compared to the minor changes in C17.

    <https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf>
    <https://en.wikipedia.org/wiki/C23_(C_standard_revision)>
    <https://en.cppreference.com/w/c/23>



      - constexpr

    I will definitely use that.  Sometimes I want a constant expression
    for things like array sizes or static initialisers, and want to
    calculate it.  constexpr gives you that without having to resort to
    macros.  (I'd perhaps be even happier if I could just use const, as I
    can in C++.)

    I am curious about that. Do you have a sample?


    If I try to be precise about the terms "constant expression", "integer
    constant expression", etc., I suspect I will get the details wrong
    unless I spend a lot of time checking carefully. So I hope it is good
    enough for me to be a bit lazy and quote the error messages from gcc
    (with "-std=c23 -Wpedantic").


    With this code, compilation fails "initialiser element is not a
    constant" for y.

    int x = 100;
    int y = x / 20;
    int zs[y];


    With this code, compilation fails because the zs is actually a VLA, and "variably modified 'zs' at file scope" is not allowed.

    const int x = 100;
    const int y = x / 20;
    int zs[y];


    This code, however, is fine:

    constexpr int x = 100;
    constexpr int y = x / 20;
    int zs[y];


    This also works, even for older standards:

    enum { x = 100 };
    enum { y = x / 20 };
    int zs[y];


    But constexpr works for other types, not just "int" which is the type of
    all enumeration constants. (And "enum" constants are a somewhat weird
    way to get this effect - "constexpr" looks neater.)

    And in general, I like to be able to say, to the compiler and to people
    reading the code, "this thing is really fixed and constant, and stop
    compiling if you think I am wrong" rather than just "I promise I won't
    change this thing - or if I do, I don't mind the nasal daemons".





    Not sure
      - empty initializer


    I don't see that one being a big hit, at least for me.  But I see
    little benefit in /not/ allowing it in the language, so it seems a
    sensible addition.

    This is what I use
    struct X x = {0};
    But I can do a find-replace and change everything to {}


    You could, but I don't really see the point of such a change. But in
    new code it would be fine to write "= {}" rather than "= { 0 }".

    When I create samples, I use new features like nullptr and {}.
    The problem I see is using these features in real code and creating a
    mess of styles.


    I think you need significant motivation to justify changing style in
    existing code, and I don't see anything here that would make me want to
    change existing C17 code to C23 code. But when writing new code, I'd
    use the new features.

  • From Michael S@21:1/5 to David Brown on Thu May 23 15:43:31 2024
    On Wed, 22 May 2024 22:11:44 +0200
    David Brown wrote:


    I will definitely use that. Sometimes I want a constant expression
    for things like array sizes or static initialisers, and want to
    calculate it. constexpr gives you that without having to resort to
    macros.

    I don't say that everything that can be done with C23 constexpr can be
    done with enum, but for uses like ones you mentioned above, 90%
    probably can be done with enum.

  • From David Brown@21:1/5 to Thiago Adams on Thu May 23 15:11:45 2024
    On 23/05/2024 14:38, Thiago Adams wrote:
    On 23/05/2024 09:17, David Brown wrote:
    If I try to be precise about the terms "constant expression", "integer
    constant expression", etc., I suspect I will get the details wrong
    unless I spend a lot of time checking carefully.  So I hope it is good
    enough for me to be a bit lazy and quote the error messages from gcc
    (with "-std=c23 -Wpedantic").


    With this code, compilation fails "initialiser element is not a
    constant" for y.

         int x = 100;
         int y = x / 20;
         int zs[y];


    With this code, compilation fails because the zs is actually a VLA,
    and "variably modified 'zs' at file scope" is not allowed.

         const int x = 100;
         const int y = x / 20;
         int zs[y];


    This code, however, is fine:

         constexpr int x = 100;
         constexpr int y = x / 20;
         int zs[y];


    This also works, even for older standards:

         enum { x = 100 };
         enum { y = x / 20 };
         int zs[y];


    But constexpr works for other types, not just "int" which is the type
    of all enumeration constants.  (And "enum" constants are a somewhat
    weird way to get this effect - "constexpr" looks neater.)

    And in general, I like to be able to say, to the compiler and to
    people reading the code, "this thing is really fixed and constant, and
    stop compiling if you think I am wrong" rather than just "I promise I
    won't change this thing - or if I do, I don't mind the nasal daemons".

    We can write:

    #define X 100
    #define Y ((X) / 20)
    int zs[Y];

    I cannot see a good justification for constexpr.

    Clearer code, better checking along the way, better typing. I don't
    think constexpr lets you do things you couldn't do before, but it lets
    you do those things in a neater way. (IMHO.)

    I already see bad usages of constexpr in C++ code. It was used in cases
    where we know for sure that the value is NOT computed at compile time.
    This just makes review harder ("why did someone put this here?"); the
    conclusion was that it was totally unnecessary and ignored by the
    compiler. The programmer was trying to add something extra, like "magic",
    hoping for something that would never happen.


    IME poor or confusing uses of "constexpr" are for functions, not
    objects, and C23 does not support "constexpr" for functions.

    I think it is better to think of constexpr functions in C++ as "pure"
    functions - confusingly called __attribute__((const)) functions in gcc
    and [[unsequenced]] functions in C23. That is, functions that don't
    affect anything around them, are not affected by anything external, have
    no side effects, always give the same results for the same parameters,
    and can be called more or fewer times without affecting the program's observable behaviour. (It's not exactly like that in C++ - a
    "constexpr" function is implicitly inline and needs a local definition.
    But I think that is how it could have been handled.)

    The whole thing - in C and C++ - suffers somewhat from being addons over
    time rather than part of the original design of the language. But
    that's inevitable as an old language evolves.

  • From David Brown@21:1/5 to bart on Thu May 23 15:25:43 2024
    On 23/05/2024 14:11, bart wrote:
    On 23/05/2024 02:21, Thiago Adams wrote:
    Em 5/22/2024 7:53 PM, Keith Thompson escreveu:

    But const doesn't mean constant.  It means read-only.
    `const int r = rand();` is perfectly valid.

    I dislike the C++ hack of making N a constant expression given
    `const int N = 42;`; constexpr made that unnecessary.  C23 makes the
    same (IMHO) mistake.

    If I had a time machine, I'd spell "const" as "readonly" and make
    "const" mean what "constexpr" now means (evaluated at compile time).

    [...]

    Everything is a mess: const in C++, the differences from const in C,
    etc. constexpr in C23 just makes the mess bigger.

    auto is a mess as well, not well specified for pointers. Not sure if we
    had this topic here, but auto * p in C is not specified.

    I would remove from C23
    - nullptr
    - auto
    - constexpr
    - embed

    I like the idea of embed but there is no implementation in production
    so this is crazy!

    'embed' was discussed a few months ago. I disagreed with the poor way it
    was to be implemented: 'embed' notionally generates a list of
    comma-separated numbers as tokens, where you have to take care of any
    trailing zero yourself if needed. It would also be hopelessly
    inefficient if actually implemented like that.

    Fortunately, it is /not/ actually implemented like that - it is only
    implemented "as if" it were like that. Real prototype implementations
    (for gcc and clang - I don't know about other tools) are extremely
    efficient at handling #embed. And the comma-separated numbers can be
    more flexible in less common use-cases.

    (That was also made clear in the previous discussion. It's been a while
    since you posted much here - it's nice to see you back on form :-) )
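
    A minimal sketch of the "as if" model, assuming a compiler that
    provides __has_embed: the directive behaves like a comma-separated list
    of byte values, so a trailing zero has to be appended by hand. To stay
    self-contained the example embeds its own source file via __FILE__; the
    names HAVE_SELF_EMBED, blob and blob_len are invented here, and
    compilers without #embed take the hand-written fallback branch.

```c
#include <stddef.h>

/* Only use #embed when the compiler knows it and can find this file. */
#if defined(__has_embed) && defined(__STDC_EMBED_FOUND__)
#  if __has_embed(__FILE__) == __STDC_EMBED_FOUND__
#    define HAVE_SELF_EMBED 1
#  endif
#endif

static const unsigned char blob[] = {
#ifdef HAVE_SELF_EMBED
    /* Acts as if it expanded to "35, 105, 110, ..." (one value per byte). */
#  embed __FILE__
    , 0             /* the terminator is not added for you */
#else
    'C', '2', '3', 0 /* fallback data for compilers without #embed */
#endif
};

/* Number of payload bytes, excluding the hand-written terminator. */
static const size_t blob_len = sizeof blob - 1;
```

    In real use the operand would be a data file rather than __FILE__, and
    the compiler is free to fill the array directly without ever producing
    the token list.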

  • From David Brown@21:1/5 to Lawrence D'Oliveiro on Thu May 23 15:42:29 2024
    On 23/05/2024 05:13, Lawrence D'Oliveiro wrote:
    On Wed, 22 May 2024 18:55:36 +0200, David Brown wrote:

    <https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf>

    Unicode identifiers!

    typedef int
    typėdef;

    These have been around since C99...

    There are a couple of minor tweaks to the characters supported, I think,
    but nothing anyone is likely to notice in practice.

  • From David Brown@21:1/5 to Chris M. Thomasson on Thu May 23 15:35:21 2024
    On 22/05/2024 23:24, Chris M. Thomasson wrote:
    On 5/22/2024 9:55 AM, David Brown wrote:
    In an attempt to bring some topicality to the group, has anyone
    started using, or considering, C23 ?  There's quite a lot of change in
    it, especially compared to the minor changes in C17.

    Love the way std::vectors respect alignas... C++20, iirc?

    [...]

    I have no idea what you are talking about.

    But did you notice that this is c.l.c, not c.l.c++, and the topic is
    C23, not C++23 ? Discussing comparisons or compatibility with C++ is
    fair enough, but talking about pure C++ matters (such as std::vector<>)
    is unlikely to be helpful.

  • From David Brown@21:1/5 to Michael S on Thu May 23 15:31:19 2024
    On 23/05/2024 14:43, Michael S wrote:
    On Wed, 22 May 2024 22:11:44 +0200
    David Brown <[email protected]> wrote:


    I will definitely use that. Sometimes I want a constant expression
    for things like array sizes or static initialisers, and want to
    calculate it. constexpr gives you that without having to resort to
    macros.

    I don't say that everything that can be done with C23 constexpr can be
    done with enum, but for uses like ones you mentioned above, 90%
    probably can be done with enum.


    I realise that, and use enum for such things today. But IMHO constexpr
    is neater and it also covers the other 10%.

    I think most of the new features of C23 neaten up the language a bit.
    They are not game-changers - I doubt that any of them will significantly
    change the way anyone writes their code (especially for those already
    happy with gcc or clang extensions). But there are several things here
    that can make code a little nicer.

    So yes, I /could/ use enum constants for things that are not
    enumerations. I /did/ use them for that. But going forward with C23,
    I'll use constexpr instead.

  • From David Brown@21:1/5 to Michael S on Thu May 23 15:56:39 2024
    On 23/05/2024 14:02, Michael S wrote:
    On Wed, 22 May 2024 18:55:36 +0200
    David Brown <[email protected]> wrote:

    In an attempt to bring some topicality to the group, has anyone
    started using, or considering, C23 ? There's quite a lot of change
    in it, especially compared to the minor changes in C17.

    <https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf>
    <https://en.wikipedia.org/wiki/C23_(C_standard_revision)>
    <https://en.cppreference.com/w/c/23>

    I like that it tidies up a lot of old stuff - it is neater to have
    things like "bool", "static_assert", etc., as part of the language
    rather than needing a half-dozen includes for such basic stuff.

    I like that it standardises a several useful extensions that have
    been in gcc and clang (and possibly other compilers) for many years.

    I'm not sure it will make a big difference to my own programming -
    when I want "typeof" or "chk_add()", I already use them in gcc. But
    for people restricted to standard C, there's more new to enjoy. And
    I prefer to use standard syntax when possible.

    "constexpr" is something I think I will find helpful, in at least
    some circumstances.



    Removed
    1) Old-style function declarations and definitions
    2) Representations for signed integers other than two's complement
    3) Permission that u/U-prefixed character constants and string
    literals may be not UTF-16/32
    4) Mixed wide string literal concatenation
    5) Support for calling realloc() with zero size (the behavior becomes
    undefined)
    6) __alignof_is_defined and __alignas_is_defined
    7) static_assert is not provided as a macro defined in <assert.h>
    (becomes a keyword)
    8) thread_local is not provided as a macro defined
    in <threads.h> (becomes a keyword)

    1) good

    Yes, at long last.

    2) good, but insufficient. The next logical step is to make both left
    and right shift of negative integers by count that does not exceed #
    of bits in respective type fully defined

    Agreed.

    3) IDNC
    4) IDNC
    5) IDNC
    6) IDNC
    7) bad. Breaks existing code for weak reason
    8) bad. Breaks existing code for weak reason


    I am of the opinion that people should specify the standard they use as
    part of their build procedures. (I'd have liked a standard way to
    specify the C standard version code uses, so that it could be fixed in
    source code files.) I don't think people should take random code for
    Cxx and assume blindly that it will work for Cyy.

    Yes, these will break some code. But I don't think it will break much,
    and it will be nice to cut down on some of these headers. I have some
    very old code that defines static_assert as a macro involving typedefs
    with structs that can have positive or negative sizes, for C90 and C99.
    I don't expect to compile these as C23 without testing - backwards
    compatibility is vital, but excessive backwards compatibility restricts
    improvements to the language.

    Still, it's a valid complaint. No change is going to please everyone!
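
    For readers who have not met the pre-C11 trick mentioned above, here is
    a minimal sketch of one common variant. It is an assumption about the
    general technique, not the actual macro from that old code: it uses an
    array typedef whose size is 1 or -1, and the names OLD_STATIC_ASSERT
    and SA_CONCAT are hypothetical. It deliberately avoids the name
    static_assert, which is a keyword in C23.

```c
/* Two-step concatenation so that __LINE__ is expanded before pasting. */
#define SA_CONCAT_(a, b) a##b
#define SA_CONCAT(a, b) SA_CONCAT_(a, b)

/* If cond is false the array size is -1, which is a constraint violation
   and stops compilation; if true, this is a harmless unused typedef. */
#define OLD_STATIC_ASSERT(cond) \
    typedef char SA_CONCAT(old_static_assert_line_, __LINE__)[(cond) ? 1 : -1]

OLD_STATIC_ASSERT(sizeof(char) == 1);
OLD_STATIC_ASSERT(sizeof(long) >= sizeof(int));
```

    The diagnostic for a failed check is whatever the compiler says about a
    negative array size, which is one reason the built-in static_assert is
    nicer.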

  • From Michael S@21:1/5 to David Brown on Thu May 23 17:19:11 2024
    On Wed, 22 May 2024 18:55:36 +0200
    David Brown <[email protected]> wrote:

    In an attempt to bring some topicality to the group, has anyone
    started using, or considering, C23 ? There's quite a lot of change
    in it, especially compared to the minor changes in C17.


    Why C Standard Committee, while being recently quite liberal in field
    of introducing new keywords (too liberal for my liking, many new things
    do not really deserve keywords not prefixed by __) is so conservative
    in introduction of program control constructs? I don't remember any
    new program control introduced under Committee regime.
    And I want at least one.


    Another area that was mostly unchanged since 1st edition of K&R is
    storage classes. Even such obvious thing as removal of 'auto' class
    took too long. If I am not mistaken, totally obsolete 'register' class
    is still allowed. And I don't remember any additions.
    Personally I can think about at least two useful backward-compatible
    additions in that area.

  • From Scott Lurndal@21:1/5 to Michael S on Thu May 23 16:40:09 2024
    Michael S <[email protected]> writes:
    On Thu, 23 May 2024 02:49:37 -0000 (UTC)
    Lawrence D'Oliveiro <[email protected]d> wrote:

    On Wed, 22 May 2024 14:42:58 -0300, Thiago Adams wrote:

    I am waiting MSVC support. There are a lot of simple features MSVC
    could implement and deliver in small increments. But it is very
    slow.

    And they wonder why developers are deserting the Windows platform for
    Linux.

    In practice, on my old home Windows PC (11 y.o. installation of 14 y.o.
    OS) today, 2024-05-23, I can easily install and use gcc14.1.0 alongside
    clang18.1.5 alongside one of the newest versions of MSVC (not sure
    which one) alongside latest Intel ICC alongside any older version of
    MSVC and ICC and with a little more effort and disk space alongside
    older versions of clang and gcc at least as long back as gcc4.9. I can
    use all of those either simultaneously or interchangeably.
    I very much doubt that I can get similar variety of compiler versions
    on Linux of similar age or even on one that is 5 years younger. Even on
    most up to date Linux distros, in order to get such compilers zoo, I'd
    probably have to fight against package manager rather than be assisted
    by it.

    While it is not likely that the set of pre-built packages available from
    the vendor for a particular distribution will include more than two
    versions each of gcc and clang, with a simple script, one can easily
    build all the versions of gcc or clang that one could ever want. Our
    linux systems have gcc4,5,6,7,8,9,10,11,12 and 13 on them as well
    as several versions of clang. If our customers were interested
    in ICC (which is unlikely as they're mainly ARM based), linux could
    accommodate them as well.

    And given the extensive use of gcc extensions, msvc is out of the question.

  • From Scott Lurndal@21:1/5 to Thiago Adams on Thu May 23 17:08:03 2024
    Thiago Adams <[email protected]> writes:
    On 23/05/2024 09:17, David Brown wrote:
    If I try to be precise about the terms "constant expression", "integer
    constant expression", etc., I suspect I will get the details wrong
    unless I spend a lot of time checking carefully.  So I hope it is good
    enough for me to be a bit lazy and quote the error messages from gcc
    (with "-std=c23 -Wpedantic").


    With this code, compilation fails "initialiser element is not a
    constant" for y.

        int x = 100;
        int y = x / 20;
        int zs[y];


    With this code, compilation fails because the zs is actually a VLA, and
    "variably modified 'zs' at file scope" is not allowed.

        const int x = 100;
        const int y = x / 20;
        int zs[y];


    This code, however, is fine:

        constexpr int x = 100;
        constexpr int y = x / 20;
        int zs[y];


    This also works, even for older standards:

        enum { x = 100 };
        enum { y = x / 20 };
        int zs[y];


    But constexpr works for other types, not just "int" which is the type of
    all enumeration constants.  (And "enum" constants are a somewhat weird
    way to get this effect - "constexpr" looks neater.)

    And in general, I like to be able to say, to the compiler and to people
    reading the code, "this thing is really fixed and constant, and stop
    compiling if you think I am wrong" rather than just "I promise I won't
    change this thing - or if I do, I don't mind the nasal daemons".

    We can write:

    #define X 100
    #define Y ((X) / 20)

    Neither of which convey type information.

    int zs[Y];

    I cannot see a good justification for constexpr.

    Which does convey type information, and thus would
    be superior to untyped macro definitions.
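
    To make the typed-versus-untyped point concrete, here is a small
    sketch. All names (frame_count, batch_count, gain, batches, scaled,
    batches_len) are invented for illustration, and the version check is an
    assumption about how a C23 compiler identifies itself; older compilers
    take the enum/macro branch instead.

```c
#include <stddef.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
/* C23: named constants with a declared type. */
constexpr int    frame_count = 100;
constexpr int    batch_count = frame_count / 20;
constexpr double gain        = 2.5;   /* no enum equivalent for this one */
#else
/* Pre-C23 fallback: enum constants are int, the macro carries no type. */
enum { frame_count = 100, batch_count = frame_count / 20 };
#define gain 2.5
#endif

/* A file-scope array needs an integer constant expression for its size. */
static int batches[batch_count];

/* A static initialiser needs a constant expression as well. */
static const double scaled = gain * 2;

static size_t batches_len(void)
{
    return sizeof batches / sizeof batches[0];
}
```

    In the C23 branch a mistake such as initialising frame_count from a
    function call is rejected at the declaration itself, rather than at the
    first place a constant expression is required.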

  • From Scott Lurndal@21:1/5 to Michael S on Thu May 23 17:10:38 2024
    Michael S <[email protected]> writes:
    On Wed, 22 May 2024 22:11:44 +0200
    David Brown <[email protected]> wrote:


    I will definitely use that. Sometimes I want a constant expression
    for things like array sizes or static initialisers, and want to
    calculate it. constexpr gives you that without having to resort to
    macros.

    I don't say that everything that can be done with C23 constexpr can be
    done with enum, but for uses like ones you mentioned above, 90%
    probably can be done with enum.

    Are C23 enums signed? or unsigned? What is the supported enum range?

  • From Michael S@21:1/5 to Scott Lurndal on Thu May 23 20:31:59 2024
    On Thu, 23 May 2024 17:10:38 GMT
    [email protected] (Scott Lurndal) wrote:

    Michael S <[email protected]> writes:
    On Wed, 22 May 2024 22:11:44 +0200
    David Brown <[email protected]> wrote:


    I will definitely use that. Sometimes I want a constant expression
    for things like array sizes or static initialisers, and want to
    calculate it. constexpr gives you that without having to resort to
    macros.

    I don't say that everything that can be done with C23 constexpr can
    be done with enum, but for uses like ones you mentioned above, 90%
    probably can be done with enum.

    Are C23 enums signed? or unsigned? What is the supported enum
    range?


    I never read the standard, so below is *according to my understanding*,
    rather than the fact.
    Before C23 - signed, at least as wide as int, but wider ranges are not
    prohibited and can be provided by implementation.
    C23 - enums without a type specifier are the same as before. Enums with
    a type specifier have the range of their underlying type.

  • From James Kuyper@21:1/5 to Michael S on Thu May 23 14:35:57 2024
    On 5/23/24 10:19, Michael S wrote:
    ...
    Another area that was mostly unchanged since 1st edition of K&R is
    storage classes. Even such obvious thing as removal of 'auto' class
    took too long. If I am not mistaken, totally obsolete 'register' class
    is still allowed. And I don't remember any additions.

    constexpr and thread_local have both been added to the list of
    Storage-class specifiers since K&R.

  • From David Brown@21:1/5 to Michael S on Thu May 23 22:10:22 2024
    On 23/05/2024 16:19, Michael S wrote:
    On Wed, 22 May 2024 18:55:36 +0200
    David Brown <[email protected]> wrote:

    In an attempt to bring some topicality to the group, has anyone
    started using, or considering, C23 ? There's quite a lot of change
    in it, especially compared to the minor changes in C17.


    Why C Standard Committee, while being recently quite liberal in field
    of introducing new keywords (too liberal for my liking, many new things
    do not really deserve keywords not prefixed by __) is so conservative
    in introduction of program control constructs? I don't remember any
    new program control introduced under Committee regime.
    And I want at least one.


    What program control construct would you like?



    Another area that was mostly unchanged since 1st edition of K&R is
    storage classes. Even such obvious thing as removal of 'auto' class
    took too long. If I am not mistaken, totally obsolete 'register' class
    is still allowed.

    "register" is still in C23. (Some compilers pay attention to it. gcc
    with optimisation disabled puts local variables on the stack, except for
    those marked "register" that get put in registers.) It was deprecated
    in C++11, when "auto" was re-purposed, and removed in C++17, with the
    keyword "register" kept for future use. I would not have objected to
    the same thing happening in C23.

    And I don't remember any additions.

    _Thread_local was added in C11, with the alias thread_local in C23.

    What would you like to see here?

    Personally I can think about at least two useful backward-compatible
    additions in that area.


  • From Kaz Kylheku@21:1/5 to David Brown on Thu May 23 20:23:15 2024
    On 2024-05-23, David Brown <[email protected]> wrote:
    So yes, I /could/ use enum constants for things that are not
    enumerations. I /did/ use them for that. But going forward with C23,
    I'll use constexpr instead.

    The value of an enum is:

    1. Compiler warns of incomplete switch cases.

    2. In a debugger when you examine an enum-valued expression or
    variable, you get the symbolic name.

    3. Safety (with C++ enum rules: no implicit
    conversion from ordinary integer type to enum).

    Historically, C code bases have abused enums to define constants
    like "enum { bufsize = 1024 }" for understandable reasons, but it is a
    cringe-inducing hack, which is also incomplete and inflexible; e.g. what
    if we want a floating-point constant?

    I've benefited from (3) in C programs that were contrived
    to be compilable as C++. (That practice, though, tends to increasingly
    hamper your dialect choice, as the languages diverge and make
    only small steps here and there to become closer.)

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @[email protected]

  • From Michael S@21:1/5 to Keith Thompson on Fri May 24 00:48:02 2024
    On Thu, 23 May 2024 14:38:23 -0700
    Keith Thompson <[email protected]> wrote:

    Michael S <[email protected]> writes:
    On Wed, 22 May 2024 18:55:36 +0200
    David Brown <[email protected]> wrote:
    In an attempt to bring some topicality to the group, has anyone
    started using, or considering, C23 ? There's quite a lot of change
    in it, especially compared to the minor changes in C17.

    Why C Standard Committee, while being recently quite liberal in
    field of introducing new keywords (too liberal for my liking, many
    new things do not really deserve keywords not prefixed by __) is so
    conservative in introduction of program control constructs? I don't
    remember any new program control introduced under Committee regime.
    And I want at least one.

    Which is?

    New keywords are typically prefixed by an underscore and an upper case
    letter, such as C11's "_Generic". There are no (standard) keywords
    starting with "__".


    You are right. I confused Standard C language with
    implementation-defined extensions.

    But the point stands: in recent times Committee is (to my liking) not
    sufficiently conservative in adding keywords that do not start with the
    underscore followed by uppercase.

  • From Michael S@21:1/5 to David Brown on Fri May 24 00:34:24 2024
    On Thu, 23 May 2024 22:10:22 +0200
    David Brown <[email protected]> wrote:

    On 23/05/2024 16:19, Michael S wrote:
    On Wed, 22 May 2024 18:55:36 +0200
    David Brown <[email protected]> wrote:

    In an attempt to bring some topicality to the group, has anyone
    started using, or considering, C23 ? There's quite a lot of change
    in it, especially compared to the minor changes in C17.


    Why C Standard Committee, while being recently quite liberal in
    field of introducing new keywords (too liberal for my liking, many
    new things do not really deserve keywords not prefixed by __) is so
    conservative in introduction of program control constructs? I don't
    remember any new program control introduced under Committee regime.
    And I want at least one.


    What program control construct would you like?


    Ability to break from nested loops. Ability to "continue" outer loops
    would be nice too, but less important.
    I am not sure what syntax I want for this feature, never considered
    myself a competent language designer.
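
    For reference, the usual workaround in current C is a goto to a label
    just past the outer loop. A minimal sketch, with a hypothetical
    function find_in_grid() that searches a row-major grid:

```c
#include <stddef.h>

/* Searches a rows x cols grid stored row by row. On success stores the
   position of the first match in *row_out / *col_out and returns 1;
   returns 0 (leaving the outputs untouched) if needle is absent. */
static int find_in_grid(const int *grid, size_t rows, size_t cols,
                        int needle, size_t *row_out, size_t *col_out)
{
    int found = 0;

    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            if (grid[r * cols + c] == needle) {
                *row_out = r;
                *col_out = c;
                found = 1;
                goto done;      /* leaves both loops at once */
            }
        }
    }

done:
    return found;
}
```

    A plain break here would only leave the inner loop, so without goto the
    outer loop needs an extra flag test on every iteration.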



    Another area that was mostly unchanged since 1st edition of K&R is
    storage classes. Even such obvious thing as removal of 'auto' class
    took too long. If I am not mistaken, totally obsolete 'register'
    class is still allowed.

    "register" is still in C23. (Some compilers pay attention to it.
    gcc with optimisation disabled puts local variables on the stack,
    except for those marked "register" that get put in registers.) It
    was deprecated in C++11, when "auto" was re-purposed, and removed in
    C++17, with the keyword "register" kept for future use. I would not
    have objected to the same thing happening in C23.

    And I don't remember any additions.

    _Thread_local was added in C11, with the alias thread_local in C23.


    _Thread_local is a special-purpose thing, probably not applicable at
    all for programming of small embedded systems, which nowadays is the
    only type of programming in C that I do for money rather than as hobby.
    With regard to constexpr, mentioned above by James Kuyper, my feeling
    about it is that it belongs to metaprogramming so I would not consider
    it a real storage class.


    What would you like to see here?


    Instead of solutions, let's talk about problems that I want to solve:

    1. global objects, declared in header files and included several times.
    Where defined?
    For some linkers, mostly unixy linkers, in case of non-initialized
    objects (implicitly initialized to zero) it somehow works.
    For linkers used on embedded systems it requires additional effort.
    I think, for initialized globals it takes additional effort even with
    unixy linkers.
    I want it to "just work" everywhere. I think that the best way to get
    it without breaking existing semantics is a new storage class.

    2. Reversing defaults for visibility of objects and functions at file
    scope.
    Something like:
    #pragma export_by_default(off).
    When this pragma is in effect, we need a way to make objects and
    functions globally visible. I think that it's done best with new
    storage class.

    Personally I can think about at least two useful backward-compatible
    additions in that area.





  • From Tim Rentsch@21:1/5 to All on Thu May 23 17:37:39 2024
    Michael S <[email protected]> writes:

    [comments on various new features in C23]

    Overall I am quite disappointed by C23. IMO it's a step
    backwards rather than forwards.

    W.r.t. [asctime() and ctime() being removed]
    IMHO, all old-UNIX-style APIs that return pointers to static
    objects within library or rely on presence of static object within
    library for purpose of preserving state for subsequent calls
    should be systematically deprecated and for majority of them there
    should be provided thread-safe alternatives akin to ctime_s().

    That is, with exception of family of functions that uses FILE*.
    Not that I like them very much, but they are ingrained too deeply.
    So, picking just asctime and ctime out of long list of problematic
    APIs does not appear particularly consistent. If they were asking
    me where to start, I'd start with rand().

    I agree with the suggestion that restartable versions of "dirty"
    functions be added to the C standard. I strongly disagree that
    the old ones should be taken out. If compilers choose to give
    warnings, that's fine, but these functions should not be removed
    just because some people think they are clunky.
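
    As a sketch of what a caller-buffer alternative looks like with
    functions that are already standard, strftime() writes into storage the
    caller supplies instead of a static object inside the library. The
    wrapper name format_timestamp() and the chosen format string are
    assumptions made for this example.

```c
#include <stddef.h>
#include <time.h>

/* Formats *tm as "YYYY-MM-DD HH:MM:SS" into buf. Returns the number of
   characters written (excluding the terminator), or 0 if buf is too
   small. No static storage is involved, so concurrent calls with
   different buffers do not interfere with each other. */
static size_t format_timestamp(const struct tm *tm, char *buf, size_t size)
{
    return strftime(buf, size, "%Y-%m-%d %H:%M:%S", tm);
}
```

    Note that obtaining the struct tm in the first place via gmtime() or
    localtime() has the same static-object problem, which is part of the
    complaint being discussed.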

    [...] Just want to say that strfrom* family is long overdue, but
    still appears incomplete. The guiding principle should be that all
    format specifiers available in printf() with sole exception of %s
    should be provided as strfrom* as well.

    What's the motivation for having separate functions? To me this
    looks like creeping featuritis.

  • From Lawrence D'Oliveiro@21:1/5 to Michael S on Fri May 24 01:06:42 2024
    On Fri, 24 May 2024 00:34:24 +0300, Michael S wrote:

    On Thu, 23 May 2024 22:10:22 +0200 David Brown
    <[email protected]> wrote:

    What program control construct would you like?

    Ability to break from nested loops.

    At least 90% of the time, when I want to exit from an inner loop in C,
    there will be some kind of cleanup I need to do in the outer loop before
    that can exit too. So the ability to jump straight out will rarely be
    used.

  • From Tim Rentsch@21:1/5 to Michael S on Thu May 23 18:35:58 2024
    Michael S <[email protected]> writes:

    [what new language features would you like?]

    Ability to break from nested loops. Ability to "continue" outer
    loops would be nice too, but less important. [...]

    1. global objects, declared in header files and included
    several times. Where defined? [...] I want it to "just work"
    everywhere. [...]

    Both of these features seem like frills. Neither one is either
    necessary or common; they would make the language bigger but
    not especially any better. Adding them would be the proverbial tail
    wagging the dog.


    2. Reversing defaults for visibility of objects and functions
    at file scope.
    Something like:
    #pragma export_by_default(off).

    Not sure taking a half-and-half approach on this is a good idea,
    but if so I think it's better to have the choice be a per-TU
    compilation option rather than a #pragma.

    When this pragma is in effect, we need a way to make objects and
    functions globally visible. I think that it's done best with new
    storage class.

    Just use extern. No new storage class needed.


    With regard to constexpr, mentioned above by James Kuyper, my
    feeling about it is that it belongs to metaprogramming so I
    would not consider it a real storage class.

    Having 'constexpr' be classified as a storage class illustrates
    how poorly thought out it is.

  • From Tim Rentsch@21:1/5 to Keith Thompson on Thu May 23 20:28:29 2024
    Keith Thompson <[email protected]> writes:

    Tim Rentsch <[email protected]> writes:
    [...]

    Having 'constexpr' be classified as a storage class illustrates
    how poorly thought out it is.

    constexpr is not classified as a storage class. N3220, like earlier
    editions, says there are four storage durations: static, thread,
    automatic, and allocated.

    Obviously I was talking about the syntactic classification.
    Don't be obtuse.

  • From Lawrence D'Oliveiro@21:1/5 to Malcolm McLean on Fri May 24 05:42:41 2024
    On Fri, 24 May 2024 06:38:18 +0100, Malcolm McLean wrote:

    On 24/05/2024 02:06, Lawrence D'Oliveiro wrote:

    On Fri, 24 May 2024 00:34:24 +0300, Michael S wrote:

    On Thu, 23 May 2024 22:10:22 +0200 David Brown
    <[email protected]> wrote:

    What program control construct would you like?

    Ability to break from nested loops.

    At least 90% of the time, when I want to exit from an inner loop in C,
    there will be some kind of cleanup I need to do in the outer loop
    before that can exit too. So the ability to jump straight out will
    rarely be used.

    goto gives you the functionality you require.

    I avoid those. I structure my code like a Nassi-Shneiderman diagram, where
    each block has one entrance at the top, and one exit at the bottom. Easier
    to keep track of error conditions and cleanups that way.

  • From jak@21:1/5 to All on Fri May 24 09:32:38 2024
    Bonita Montero ha scritto:
    Am 23.05.2024 um 21:49 schrieb Thiago Adams:
    On 23/05/2024 16:25, Bonita Montero wrote:
    I ask myself what the point is in further developing a language
    like this that can actually no longer be saved.
    do you mean C++?


    No, C.

    I think you have a lot of confusion about programming languages. C and
    C++ are not comparable languages. It can be C and assembler, or C++ and
    Java. If you really want to compare C to C++, then C++ is to C as Rust
    is to C++. I'm pretty convinced that C++ will be abandoned long before
    C. Just for one example, C++ would have been abandoned years ago if C#
    didn't produce only CLI code, because C# lacks nothing important that
    C++ has and the learning curve is much steeper (it also benefits from
    reflection).

  • From Michael S@21:1/5 to Malcolm McLean on Fri May 24 11:42:24 2024
    On Fri, 24 May 2024 06:38:18 +0100
    Malcolm McLean <[email protected]> wrote:

    On 24/05/2024 02:06, Lawrence D'Oliveiro wrote:
    On Fri, 24 May 2024 00:34:24 +0300, Michael S wrote:

    On Thu, 23 May 2024 22:10:22 +0200 David Brown
    <[email protected]> wrote:

    What program control construct would you like?

    Ability to break from nested loops.

    At least 90% of the time, when I want to exit from an inner loop in
    C, there will be some kind of cleanup I need to do in the outer
    loop before that can exit too. So the ability to jump straight out
    will rarely be used.

    goto gives you the functionality you require.


    Sure, me too. Because that's what I have.
    If they hadn't given me {, }, else, while, for, and do then I would
    use goto to simulate all those as well. It gives functionality I
    require, don't it?

    I usually use goto for handling malloc() failures. So if an
    allocation fails within a deeply nested loop, I will jump to code at
    the end of the function, free up any half-constructed objects, and
    return an error condition.
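
    A minimal sketch of that pattern, with hypothetical names
    table_create() and table_destroy(). It uses a single loop rather than a
    deeply nested one to keep the example short, and it assumes non-zero
    rows and cols, since calloc() may legitimately return NULL for a zero
    size.

```c
#include <stdlib.h>

/* Builds a rows x cols table as an array of row pointers, all elements
   zeroed. Returns NULL on allocation failure, with nothing leaked. */
static double **table_create(size_t rows, size_t cols)
{
    double **table = calloc(rows, sizeof *table);
    if (table == NULL)
        return NULL;

    for (size_t r = 0; r < rows; r++) {
        table[r] = calloc(cols, sizeof **table);
        if (table[r] == NULL)
            goto fail;          /* single exit path for every failure */
    }
    return table;

fail:
    /* Rows not yet allocated are still NULL (calloc zero-filled the
       pointer array on typical platforms), and free(NULL) is a no-op. */
    for (size_t r = 0; r < rows; r++)
        free(table[r]);
    free(table);
    return NULL;
}

/* Releases a table returned by table_create(). */
static void table_destroy(double **table, size_t rows)
{
    if (table == NULL)
        return;
    for (size_t r = 0; r < rows; r++)
        free(table[r]);
    free(table);
}
```

    The benefit grows with the number of resources: each new allocation
    adds one more check that jumps to the same label, instead of a
    hand-written unwind at every failure point.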


    I do a similar thing too, but that's just a habit that I can't overcome.
    It has no practical sense in environments that I work in today. I could
    just as well return immediately, without cleaning up; it would make
    zero practical difference except that my code would be shorter and would
    look cleaner.

  • From David Brown@21:1/5 to Keith Thompson on Fri May 24 11:03:43 2024
    On 23/05/2024 23:49, Keith Thompson wrote:
    Thiago Adams <[email protected]> writes:
    On 23/05/2024 10:11, David Brown wrote:
    On 23/05/2024 14:38, Thiago Adams wrote:
    [...]
    I already see bad usages of constexpr in C++ code. It was used in
    cases where we know for sure that is NOT compile time. This just
    make review harder "why did someone put this here?" conclusion was
    it was totally unnecessary and ignored by the compiler. The
    programmer was trying to add something extra, like "magic" hoping
    for something that would never happen.

    IME poor or confusing uses of "constexpr" are for functions, not
    objects, and C23 does not support "constexpr" for functions.

    The sample C++ was something like

    constexpr char * s[] = {"a", "b"};
    for (int i = 0; i < sizeof(s); i++)
    {
    //using s[i]
    }

    I checked in C, it is an error.

    Apparently C23 has stricter rules for constexpr than C++ does. I can
    imagine those rules being relaxed in future editions of the C standard.


    The proposal for "constexpr" in C23,
    <https://open-std.org/JTC1/SC22/WG14/www/docs/n3018.htm>, says:

    """
    There are some restrictions on the type of an object that can be
    declared with constexpr storage duration. There is a limited number
    of constructs that are not allowed:

    pointer types:
    allowing these to use non-trivial addresses would delay the
    deduction of the concrete value from translation to link-time. For most
    of the use cases, such a feature can already be coded by using a static
    and const qualified pointer object, we don’t need constexpr for that.
    Therefore we only allow pointer types if the initializer value is null.
    """


    I'm not sure (and haven't looked at all the discussions involved, so I
    could be completely wrong), but I think there is concern that constexpr
    pointers, other than null pointers, might need more features in the
    linker than C currently requires. C++ already has more demands of
    linkers to handle things like inline variables and statics in templates.
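
    A minimal sketch of the restriction quoted above, assuming a compiler
    that reports a C23 __STDC_VERSION__ when it supports constexpr; on older
    compilers the fallback branch is used so the file still builds. All the
    names (buf_size, scale, no_target, names) are made up for illustration.

```c
#include <stddef.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
/* C23: arithmetic constexpr objects are fine. */
constexpr int buf_size = 1024;
constexpr double scale = 0.5;
/* A constexpr pointer is accepted only with a null initializer. */
constexpr int *no_target = nullptr;
/* constexpr char *s[] = {"a", "b"};    rejected: non-null pointer values */
#else
/* Pre-C23 fallback so that the sketch still compiles. */
enum { buf_size = 1024 };
static const double scale = 0.5;
static int *const no_target = NULL;
#endif

/* The alternative the proposal points to: a static, const-qualified
 * pointer object, valid in every C version. */
static const char *const names[] = { "a", "b" };

/* Element count; plain sizeof(names) would give a byte count instead. */
size_t names_count(void)
{
    return sizeof names / sizeof names[0];
}

/* buf_size is an integer constant expression in both branches,
 * so it can size an ordinary (non-VLA) array. */
size_t scratch_size(void)
{
    char scratch[buf_size];
    return sizeof scratch;
}

double scaled(double x)
{
    return x * scale;
}

int has_target(void)
{
    return no_target != NULL;
}
```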

  • From Michael S@21:1/5 to Tim Rentsch on Fri May 24 12:05:44 2024
    On Thu, 23 May 2024 17:37:39 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    [comments on various new features in C23]

    Overall I am quite disappointed by C23. IMO it's a step
    backwards rather than forwards.

    W.r.t. [asctime() and ctime() being removed]
    IMHO, all old-UNIX-style APIs that return pointers to static
    objects within the library, or rely on the presence of a static object
    within the library to preserve state for subsequent calls,
    should be systematically deprecated, and for the majority of them
    thread-safe alternatives akin to ctime_s() should be provided.

    That is, with the exception of the family of functions that use FILE*.
    Not that I like them very much, but they are ingrained too deeply.
    So, picking just asctime and ctime out of a long list of problematic
    APIs does not appear particularly consistent. If they were asking
    me where to start, I'd start with rand().

    I agree with the suggestion that restartable versions of "dirty"
    functions be added to the C standard. I strongly disagree that
    the old ones should be taken out. If compilers choose to give
    warnings, that's fine, but these functions should not be removed
    just because some people think they are clunky.

    [...] Just want to say that strfrom* family is long overdue, but
    still appears incomplete. The guiding principle should be that all
    format specifiers available in printf(), with the sole exception of %s,
    should be provided as strfrom* as well.

    What's the motivation for having separate functions? To me this
    looks like creeping featuritis.

    My practical motivation is space-constrained environments, where I
    possibly want one or two or three formatters. sprintf() gives me
    all or nothing and all can be too expensive. Many embedded environments
    have big and small variants of sprintf that can be chosen at link time,
    but what's in small variant does not necessarily match a set that I
    want in my specific project. And is not necessarily well documented.

    My aesthetic motivation is a symmetry between strto* and strfrom*.

    My esoteric motivation is : sprintf() is historically associated with
    "standard I/O". Functionality in question has no relationship to I/O.
    But let's leave it aside, it's not important.
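
    The strto*/strfrom* symmetry mentioned above looks roughly like this.
    It is a sketch that assumes a C library providing strfromd() (C23, and
    earlier TS 18661-1; glibc exposes it before C23 when the feature-test
    macro below is defined). The helper names are hypothetical.

```c
/* Asks glibc to declare strfromd() in pre-C23 modes; harmless elsewhere. */
#define __STDC_WANT_IEC_60559_BFP_EXT__ 1
#include <stdlib.h>
#include <stddef.h>

/* Formats value in %e style with three digits after the point.
 * Like snprintf, returns the length the full result needs. */
int format_reading(char *buf, size_t size, double value)
{
    /* strfromd takes exactly one conversion: %, optional precision,
     * then one of a, A, e, E, f, F, g, G. */
    return strfromd(buf, size, "%.3e", value);
}

/* The inverse direction, using the long-established strtod(). */
double parse_reading(const char *text)
{
    return strtod(text, NULL);
}
```

    Whether linking strfromd() alone really costs less than a full
    sprintf() depends on the library implementation.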

  • From Tim Rentsch@21:1/5 to Michael S on Fri May 24 06:54:35 2024
    Michael S <[email protected]> writes:

    On Thu, 23 May 2024 17:37:39 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    [...] Just want to say that strfrom* family is long overdue, but
    still appears incomplete. The guiding principle should be that all
    format specifiers available in printf(), with the sole exception of %s,
    should be provided as strfrom* as well.

    What's the motivation for having separate functions? To me this
    looks like creeping featuritis.

    My practical motivation is space-constrained environments, where I
    possibly want one or two or three formatters. sprintf() gives me all
    or nothing and all can be too expensive. Many embedded environments
    have big and small variants of sprintf that can be chosen at link
    time, but what's in small variant does not necessarily match a set
    that I want in my specific project. And is not necessarily well
    documented.

    Okay, I see now where you're coming from, although I'm not sure that
    the strfrom*() functions will give you what you want (in terms of
    memory footprint, etc). But I get your motivation.

    Question: which of the four formats (%A, %E, %F, %G) are ones you
    expect to use? Also I'm curious: do all of your target platforms
    use IEEE floating point, or do some use other representations?

  • From David Brown@21:1/5 to Keith Thompson on Fri May 24 15:45:52 2024
    On 23/05/2024 22:06, Keith Thompson wrote:
    David Brown <[email protected]> writes:
    On 23/05/2024 14:11, bart wrote:
    [...]
    'embed' was discussed a few months ago. I disagreed with the poor
    way it was to be implemented: 'embed' notionally generates a list of
    comma-separated numbers as tokens, where you have to take care of
    any trailing zero yourself if needed. It would also be hopelessly
    inefficient if actually implemented like that.

    Fortunately, it is /not/ actually implemented like that - it is only
    implemented "as if" it were like that. Real prototype implementations
    (for gcc and clang - I don't know about other tools) are extremely
    efficient at handling #embed. And the comma-separated numbers can be
    more flexible in less common use-cases.
    [...]
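
    The "as if" model discussed above, with the hand-written trailing zero,
    can be sketched like this. Embedding the source file itself via __FILE__
    is purely an assumption made so that the example needs no separate data
    file; compilers without #embed take the fallback branch. The names blob,
    blob_size and blob_data are hypothetical.

```c
#include <stddef.h>

/* __has_embed exists only on compilers that implement #embed. */
#ifdef __has_embed
#  if __has_embed(__FILE__) == __STDC_EMBED_FOUND__
#    define HAVE_SELF_EMBED 1
#  endif
#endif

/* The directive behaves as if it expanded to "b0, b1, b2, ...", so it
 * can sit inside an ordinary initializer list next to other elements. */
static const unsigned char blob[] = {
#ifdef HAVE_SELF_EMBED
#embed __FILE__
    ,
#else
    'n', 'o', ' ', 'e', 'm', 'b', 'e', 'd',   /* stand-in data */
#endif
    0   /* terminator appended by hand; #embed does not add one */
};

size_t blob_size(void)
{
    return sizeof blob;
}

const unsigned char *blob_data(void)
{
    return blob;
}
```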

    I'm aware of a proposed implementation for clang:

    https://github.com/llvm/llvm-project/pull/68620 https://github.com/ThePhD/llvm-project

    I'm currently cloning the git repo, with the aim of building it so I can
    try it out and test some corner cases. It will take a while.

    I'm not aware of any prototype implementation for gcc. If you are, I'd
    be very interested in trying it out.


    I haven't seen anything concrete, but I believe I read about it in one
    of the papers discussing #embed. It may have been just some tests and
    proofs-of-concept, and not a development branch or proposed
    implementation.

    (And thanks for starting this thread!)


    It's not easy to find a topic that is entirely about C, hasn't been
    discussed to death already, has enough controversial aspects for a
    serious discussion but not so many that it leads to fights and flames,
    and is not so esoteric that it causes most readers' eyes to glaze over!

  • From David Brown@21:1/5 to Kaz Kylheku on Fri May 24 16:25:20 2024
    On 23/05/2024 22:23, Kaz Kylheku wrote:
    On 2024-05-23, David Brown <[email protected]> wrote:
    So yes, I /could/ use enum constants for things that are not
    enumerations. I /did/ use them for that. But going forward with C23,
    I'll use constexpr instead.

    The value of an enum is:

    1. Compiler warns of incomplete switch cases.

    (gcc -Wswitch or -Wswitch-enum)

    To be clear - I will, without doubt, continue to use "enum" for
    enumerations and enumerated types. For enumerations, enum gives all the
    advantages you mention and more (such as automatic choice of values).

    But I'd rather use "constexpr" for constant expressions that are not
    enumerations.


    2. In a debugger when you examine an enum-valued expression or
    variable, you get the symbolic name:

    3. Safety (with C++ enum rules: no implicit
    conversion from ordinary integer type to enum).

    (gcc -Wenum-compare -Wenum-conversion -Wenum-int-mismatch)


    Historically, C code bases have abused enums to define constants
    like "enum { bufsize = 1024 }" for understandable reasons, but it is a
    cringe-inducing hack, which is also incomplete and inflexible; e.g. what
    if we want a floating-point constant?

    I've benefited from (3) in C programs that were contrived
    to be compilable as C++. (That practice, though, tends to increasingly
    hamper your dialect choice, as the languages diverge and make
    only small steps here and there to become closer.)
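
    The two uses of enum contrasted in this post can be put side by side.
    This is a sketch with made-up names (color, color_name, buffer_bytes);
    only "bufsize" comes from the text above.

```c
/* A genuine enumeration: values are chosen automatically (0, 1, 2). */
enum color { COLOR_RED, COLOR_GREEN, COLOR_BLUE };

const char *color_name(enum color c)
{
    /* No default label, so gcc -Wswitch can report an enumerator
     * that is added later but not handled here. */
    switch (c) {
    case COLOR_RED:   return "red";
    case COLOR_GREEN: return "green";
    case COLOR_BLUE:  return "blue";
    }
    return "unknown";
}

/* The "enum hack": an anonymous enum used only to name an int constant.
 * It cannot express a non-integer value such as 0.5. */
enum { bufsize = 1024 };

unsigned buffer_bytes(void)
{
    char buf[bufsize];   /* an ordinary array, not a VLA */
    return (unsigned)sizeof buf;
}
```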


  • From David Brown@21:1/5 to Keith Thompson on Fri May 24 16:19:49 2024
    On 23/05/2024 18:43, Keith Thompson wrote:
    bart <[email protected]> writes:
    [...]
    I suspect that ones like 'embed' have been derived from C++ which
    always likes to make things too wide-ranging and much harder to use
    and implement than necessary.

    No, C++ doesn't have #embed. (If it did, many C compilers would already
    have it, since C and C++ commonly share the preprocessor
    implementation.)


    C++ has proposals for both #embed and std::embed<>, but AFAIK these are
    not yet accepted. I expect #embed to make it (since the big tools will
    support it for C anyway). std::embed<> is more powerful but has
    additional complications.

  • From David Brown@21:1/5 to Malcolm McLean on Fri May 24 16:39:02 2024
    On 24/05/2024 01:06, Malcolm McLean wrote:
    On 22/05/2024 20:50, David Brown wrote:
    On 22/05/2024 21:10, Malcolm McLean wrote:

    But even boolean type and const.

    Const documents the code, makes the action of a function clearer to
    the reader, and helps catch mistakes.

    These are all things that make the language better, and have done so
    for the past 25 years.

    Of course quite a lot of the functions don't actually change the
    structures they are passed. But is littering the code with const
    going to help? And why do you really need a boolean when an int can
    hold either a zero or non-zero value?

    And don't you just want a pared down, clean language?


    I want a language with the features I need and that help me to write
    good clear code.  Minimal is not helpful, any more than needlessly
    complex is helpful.


    So the code I'm working on at the moment.

    It's an implementation of XPath (a subset, of course). XPath is a sort of
    query language for XML. You pass a query string like
    "/bookstore/book//title" and that selects all children of
    [root]/bookstore/book with the element tag "title".

    Now querying the document shouldn't change it. So in C++ it should
    be passed in as an XMLDOC const &. In C, declaring the pointer a const
    XMLDOC * conveys the intention, but doesn't actually achieve the safety
    you want and get with C++.

    The safety is the same in C and C++ (unless your C++ code provides const
    and non-const overloads for the function). References in C++ don't let
    you pass a null pointer, but you can "cast away" the const in a const
    reference as easily as you can remove the const in a const pointer:

    void naughty(const int & x) {
        int & y = (int &) x;
        y++;
    }

    In both cases, the "const" is a promise to the reader and a promise to
    the compiler, but you can break that promise if you do so explicitly.
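
    The C counterpart of the C++ snippet above, as a sketch. The function
    name is reused from the post; the caller is assumed to pass a pointer to
    an object that was not itself defined const, since modifying a genuinely
    const object is undefined behaviour.

```c
/* The parameter type promises not to modify *x through this pointer... */
void naughty(const int *x)
{
    /* *x = 0;              would be rejected by the compiler */
    int *y = (int *)x;   /* ...but an explicit cast removes the const */
    (*y)++;              /* defined only if the pointed-to object is not const */
}

/* An honest const user for comparison: reads only. */
int doubled(const int *x)
{
    return *x * 2;
}
```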


    However the algorithm I have just moved to needs a bit associated with
    each node it can turn on and off. Now in fact I did this via a hash
    table. But it is very tempting and far more efficient to simply add a
    hacky field to the XMLNODE structure - after all, I wrote the XML
    parser. And in C++ "mutable" is designed for just this. But in C,
    we're either const or not. And isn't it maybe better to leave the
    const qualifier off the document pointer?

    "mutable" is just a kosher way of breaking your const promises. In
    cases where "mutable" might be useful, I generally prefer to
    differentiate between the part of the structure that is fixed and
    unchanging, and the part that holds more volatile status. (This can also
    be better from the viewpoint of cache friendliness, if that is of
    concern.) But if I had a situation where C++ "mutable" would be the
    best choice, and I had to implement it in C without "mutable", I am not
    sure that casting to non-const in the implementation function would be
    much worse.
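
    One way to separate the fixed part from the volatile status in C, as a
    sketch. The types here (node, document, query_state) are hypothetical
    stand-ins, not the XMLNODE or XMLDOC from the quoted post, and a flat
    flag array indexed by node number is just one possible layout.

```c
#include <stdbool.h>
#include <stddef.h>

/* Immutable part: built once by the parser, never changed by queries. */
struct node {
    const char *tag;
    size_t index;          /* position in the document's node array */
};

struct document {
    const struct node *nodes;
    size_t node_count;
};

/* Volatile part: per-query scratch state kept outside the document. */
struct query_state {
    bool *visited;         /* one flag per node, indexed by node.index */
};

/* Marks a node. Returns true if it was newly marked, false if it was
 * already marked or is out of range. The document stays const. */
bool mark_node(const struct document *doc, struct query_state *st,
               const struct node *n)
{
    if (n->index >= doc->node_count || st->visited[n->index])
        return false;
    st->visited[n->index] = true;
    return true;
}
```

    With this split the query function can take a const document pointer
    without any cast, at the cost of one extra allocation per query.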


    In fact, wouldn't we just be better off without const?

    No.

    We'd be better off having everything const - /really/ constant - by
    default, and having to explicitly declare the few things that actually
    have to be changed after initialisation. That's how many modern
    programming languages do it.

    After all, you
    need to read the function specifications anyway, and they should say
    that querying for a path will not alter the document.


    /Never/ write things in comments or documentation if you can express the
    same thing in code.

  • From David Brown@21:1/5 to Chris M. Thomasson on Fri May 24 16:50:28 2024
    On 24/05/2024 01:05, Chris M. Thomasson wrote:
    On 5/23/2024 6:35 AM, David Brown wrote:
    On 22/05/2024 23:24, Chris M. Thomasson wrote:
    On 5/22/2024 9:55 AM, David Brown wrote:
    In an attempt to bring some topicality to the group, has anyone
    started using, or considering, C23 ?  There's quite a lot of change
    in it, especially compared to the minor changes in C17.

    Love the way std::vectors respect alignas... C++20, iirc?

    [...]

    I have no idea what you are talking about.

    std::vector actually respects alignas, on MSVC at least. I did not know
    this worked until I tried it. Iirc, Bonita was the one that sparked my
    test. It aligned itself on the proper boundaries. Very nice.

    But did you notice that this is c.l.c, not c.l.c++, and the topic is
    C23, not C++23 ?  Discussing comparisons or compatibility with C++ is
    fair enough, but talking about pure C++ matters (such as
    std::vector<>) is unlikely to be helpful.

    C has it as well... Very useful!

    I know C has alignas (now as a keyword in C23, instead of just _Alignas
    from C11).

    I know C++ has alignas (from C++11 onwards).

    What I don't understand is why you think std::vector<> "respects
    alignas" in C++20 - alignment for std::vector<> works like alignment for
    any other class in C++, and always has done.

    And what I /really/ don't understand is why you think it is remotely
    relevant here? Even "alignas" in C is not particularly relevant to this
    thread, except that it has become a keyword in C23 instead of a macro
    defined to _Alignas in <stdalign.h>.

    Perhaps I should just be grateful for the small mercy of there being no
    random youtube link in your post.

  • From David Brown@21:1/5 to Keith Thompson on Fri May 24 17:10:31 2024
    On 23/05/2024 18:40, Keith Thompson wrote:
    Michael S <[email protected]> writes:
    [...]
    Removed
    [...]
    7) static_assert is not provided as a macro defined in <assert.h>
    (becomes a keyword)
    8) thread_local is not provided as a macro defined in <threads.h>
    (becomes a keyword)

    [...]
    7) bad. Breaks existing code for weak reason
    8) bad. Breaks existing code for weak reason

    In pre-C23, _Static_assert and _Thread_local are keywords, and
    static_assert and thread_local are macros that expand to those keywords.

    In C23, _Static_assert, _Thread_local, static_assert, and thread_local
    are all keywords. Code that simply uses the old ugly keywords would not
    break.

    Code that does something like "#ifdef static_assert" could break. I
    suppose the headers could have retained the old macro definitions.

    #define static_assert static_assert
    #define thread_local thread_local


    The sort of code that could theoretically break is when you have
    definitions like this:

    #define STATIC_ASSERT_NAME_(line) STATIC_ASSERT_NAME2_(line)
    #define STATIC_ASSERT_NAME2_(line) assertion_failed_at_line_##line
    #define static_assert(claim, warning) \
        typedef struct { \
            char STATIC_ASSERT_NAME_(__COUNTER__) [(claim) ? 2 : -2]; \
        } STATIC_ASSERT_NAME_(__COUNTER__)

    That works in any C version, until C23, almost as well as
    _Static_assert. I used this when C11 support was rare in the tools I
    used.

    While using #define for a C keyword is undefined behaviour, in practice
    I think you'd have a hard time finding code and a compiler that used
    such a macro and which did not work just as well in C23 mode.

    (I don't know if anyone is in the habit of declaring macros named
    "thread_local".)
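
    A variant that sidesteps the keyword clash is to give the wrapper macro
    its own name. This is only a sketch; MY_STATIC_ASSERT and its helper
    macros are hypothetical names, and the fallback uses __LINE__, so it
    allows at most one assertion per source line.

```c
#include <limits.h>

/* Pick a mechanism without redefining the C23 keyword static_assert. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#  define MY_STATIC_ASSERT(claim, msg) _Static_assert(claim, msg)
#else
   /* Pre-C11 fallback: a negative array size forces a compile error. */
#  define MY_SA_CAT2_(a, b) a##b
#  define MY_SA_CAT_(a, b) MY_SA_CAT2_(a, b)
#  define MY_STATIC_ASSERT(claim, msg) \
       typedef char MY_SA_CAT_(my_static_assert_line_, __LINE__)[(claim) ? 1 : -1]
#endif

MY_STATIC_ASSERT(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");
MY_STATIC_ASSERT(sizeof(int) >= 2, "int must be at least 16 bits");

/* Something for callers to use; relies on the assertions above. */
int bits_in_int(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}
```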

  • From David Brown@21:1/5 to Michael S on Fri May 24 17:57:35 2024
    On 23/05/2024 23:34, Michael S wrote:
    On Thu, 23 May 2024 22:10:22 +0200
    David Brown <[email protected]> wrote:

    On 23/05/2024 16:19, Michael S wrote:
    On Wed, 22 May 2024 18:55:36 +0200
    David Brown <[email protected]> wrote:

    In an attempt to bring some topicality to the group, has anyone
    started using, or considering, C23 ? There's quite a lot of change
    in it, especially compared to the minor changes in C17.


    Why C Standard Committee, while being recently quite liberal in
    field of introducing new keywords (too liberal for my liking, many
    new things do not really deserve keywords not prefixed by __) is so
    conservative in introduction of program control constructs? I don't
    remember any new program control introduced under Committee regime.
    And I want at least one.


    What program control construct would you like?


    Ability to break from nested loops. Ability to "continue" outer loops
    would be nice too, but less important.
    I am not sure what syntax I want for this feature, never considered
    myself a competent language designer.

    I've heard people request this before. I can't say I've ever felt it
    was something I'd have use for, but there's lots of things in C that I
    never need.

    There is a proposal for adding it to C:

    <https://open-std.org/JTC1/SC22/WG14/www/docs/n3195.htm>
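
    For comparison, this is how leaving nested loops is usually written in
    current standard C, with goto. It is a sketch with a hypothetical
    function name (find_first), not code from the proposal.

```c
#include <stdbool.h>
#include <stddef.h>

/* Searches a rows x cols matrix stored row-major for target.
 * On success stores the position and returns true. */
bool find_first(const int *m, size_t rows, size_t cols, int target,
                size_t *row_out, size_t *col_out)
{
    bool found = false;
    size_t r = 0, c = 0;

    for (r = 0; r < rows; r++) {
        for (c = 0; c < cols; c++) {
            if (m[r * cols + c] == target) {
                found = true;
                goto done;      /* leaves both loops at once */
            }
        }
    }
done:
    if (found) {
        *row_out = r;
        *col_out = c;
    }
    return found;
}
```

    The alternatives without goto are an extra flag tested in both loop
    conditions, or moving the loops into a helper function and using return.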





    Another area that was mostly unchanged since 1st edition of K&R is
    storage classes. Even such obvious thing as removal of 'auto' class
    took too long. If I am not mistaken, totally obsolete 'register'
    class is still allowed.

    "register" is still in C23. (Some compilers pay attention to it.
    gcc with optimisation disabled puts local variables on the stack,
    except for those marked "register" that get put in registers.) It
    got dropped from C++ when "auto" was re-purposed in C++11, but with
    the keyword "register" kept for future use. I would not have
    objected to the same thing happening in C23.

    And I don't remember any additions.

    _Thread_local was added in C11, with the alias thread_local in C23.


    _Thread_local is a special-purpose thing, probably not applicable at
    all for programming of small embedded systems, which nowadays is the
    only type of programming in C that I do for money rather than as hobby.

    I have never seen the point of it either. Why would anyone want a
    variable that exists for /all/ threads in a program, but independently
    per thread? The only use I can think of is for errno (which is, IMHO, a
    horror unto itself) but since that is defined by the implementation, it
    does not need to use _Thread_local. (Indeed, thread-local errno macros
    existed long before C11.)

    You and I (as small embedded systems programmers) are perhaps biased in
    being allergic to the wasted bytes of ram a thread-local variable would
    likely use!

    With regard to constexpr, mentioned above by James Kuyper, my feeling
    about it is that it belongs to metaprogramming so I would not consider
    it a real storage class.


    The term "storage-class specifier" is a bit of a misnomer, in that it is
    more of a syntactic term than referring just to the storage duration or
    placement of objects. "typedef" is also a storage-class specifier, for
    example.


    What would you like to see here?


    Instead of solutions, let's talk about problems that I want to solve:

    Good idea.


    1. global objects, declared in header files and included several times.
    Where defined?

    In C, they must be defined in exactly one translation unit.

    For some linkers, mostly unixy linkers, in case of non-initialized
    objects (implicitly initialized to zero) it somehow works.

    The use of "int global_x;" in headers is undefined behaviour (AFAIK) in
    C, and its support is a hangover from linker support for Fortran common
    blocks. And it is the source of many odd errors for people who are not
    careful enough in their coding. If your compiler supports this
    misfeature (such as "gcc -fcommon"), and you accidentally declare two
    uninitialised non-static variables with the same name in two files, you
    have no detection or protection from the chaos that ensues. With
    compilers that don't support this ("gcc -fno-common"), you get a
    link-time error showing your problem. (gcc made "-fno-common" the
    default in version 10. That's 9 major versions late, IMHO, but better
    late than never.)

    (To the extent that it "works", it is handled by putting the symbol name
    and data space reservation in a "common" section. At link time, common
    symbols with the same name are merged - whether that makes sense for the
    code or not.)


    For linkers used on embedded systems it requires additional effort.

    I can't say I have ever seen it as an effort. Almost all my C "modules"
    come in pairs - "file.h" and "file.c". All non-local variables (and all
    functions) are either static and declared only in "file.c", or they are
    externally linked and have an "extern" declaration in "file.h" and a
    definition (with or without initialisation) in "file.c" (which #includes
    "file.h"). It is a very simple and clean arrangement, easily checked by
    gcc warnings, and there are never any undetected conflicts.

    (And probably 90% or more of current small-systems embedded development
    uses gcc and binutils linker.)
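
    The header/source pairing described above, shown in a single listing so
    that it compiles as one unit. The file names and identifiers (counter.h,
    counter_total, counter_add, counter_calls) are hypothetical; in a real
    project the two halves would be separate files.

```c
/* ---- counter.h: the interface, included by every user ---- */
#ifndef COUNTER_H
#define COUNTER_H
extern int counter_total;       /* declaration only; reserves no storage */
void counter_add(int amount);
int counter_calls(void);
#endif

/* ---- counter.c: would start with #include "counter.h" ---- */
int counter_total = 0;          /* the one and only definition */

static int calls;               /* file-local; invisible to other units */

void counter_add(int amount)
{
    calls++;
    counter_total += amount;
}

int counter_calls(void)
{
    return calls;
}
```

    Because the defining file also sees the extern declaration, the
    compiler can diagnose any mismatch between the two.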

    I think, for initialized globals it takes additional effort even with
    unixy linkers.
    I want it to "just work" everywhere. I think that the best way to get
    it without breaking existing semantics is a new storage class.

    This is all very much a non-issue for well-structured code.


    The only time it can matter is if you want to write "header-only"
    modules. This is popular in C++, but not in C. In C++ it relies on the
    linker merging the same symbols for inline declarations such as inline
    functions, template functions, template class and function statics, and
    - since C++17 - inline variables. So you can write "inline int
    global_x;" or "inline int global_y = 123;" in a header, and it will be
    created once and only once (if it is used somewhere). The
    compiler/linker can't check for consistency of initialiser, unless you
    are using link-time optimisation.



    2. Reversing defaults for visibility of objects and functions at file
    scope.
    Something like:
    #pragma export_by_default(off).
    When this pragma is in effect, we need a way to make objects and
    functions globally visible. I think that it's done best with new
    storage class.


    I would much prefer if file-level variables and functions were "static"
    by default and required explicit exporting. But that ship sailed 50
    years ago.

    What you can do - what /I/ do - is be rigid in making sure all your
    exported variables and functions are declared as "extern" in a header
    that is also included by the defining C file. Use "gcc
    -Werror=missing-declarations -Werror=missing-variable-declarations" to
    enforce these rules. It is not quite as good as going back in time and
    fixing C at the start, but it's close!

  • From Michael S@21:1/5 to Tim Rentsch on Fri May 24 18:46:23 2024
    On Fri, 24 May 2024 06:54:35 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Thu, 23 May 2024 17:37:39 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    [...] Just want to say that strfrom* family is long overdue, but
    still appears incomplete. The guiding principle should be that all
    format specifiers available in printf(), with the sole exception of %s,
    should be provided as strfrom* as well.

    What's the motivation for having separate functions? To me this
    looks like creeping featuritis.

    My practical motivation is space-constrained environments, where I
    possibly want one or two or three formatters. sprintf() gives me
    all or nothing and all can be too expensive. Many embedded
    environments have big and small variants of sprintf that can be
    chosen at link time, but what's in small variant does not
    necessarily match a set that I want in my specific project. And is
    not necessarily well documented.

    Okay, I see now where you're coming from, although I'm not sure that
    the strfrom*() functions will give you what you want (in terms of
    memory footprint, etc). But I get your motivation.

    Question: which of the four formats (%A, %E, %F, %G) are ones you
    expect to use?

    Rarely: any of those, mostly for debugging.
    In production code: %e is most likely, but %f could happen.
    But it's not just floating point. "Small" variants of sprintf() on
    32-bit platforms are often unable to handle %lld and %llu.

    Also I'm curious: do all of your target platforms
    use IEEE floating point, or do some use other representations?

    Currently, only IEEE. In the past, there were others, but that was quite
    a long time ago. Back when I had just started my pro programming career,
    after a few years in another field, I spent a couple of years doing
    mostly TMS320C30. I don't remember for sure, but it is likely that I
    never used formatted FP output there; our boards were probably too
    short of memory for that.

  • From Scott Lurndal@21:1/5 to David Brown on Fri May 24 16:16:27 2024
    David Brown <[email protected]> writes:
    On 23/05/2024 23:34, Michael S wrote:
    On Thu, 23 May 2024 22:10:22 +0200
    David Brown <[email protected]> wrote:

    _Thread_local is a special-purpose thing, probably not applicable at
    all for programming of small embedded systems, which nowadays is the
    only type of programming in C that I do for money rather than as hobby.

    I have never seen the point of it either. Why would anyone want a
    variable that exists for /all/ threads in a program, but independently
    per thread?

    Very common in kernel programming (e.g. the use of '%gs' in x86_linux)
    as a pointer to the 'per-cpu' data structure.

    We use thread local to implement 'self' methods in certain
    classes (so rather than passing pointers around, one can
    simply call class::self() to get a pointer to the
    class for each thread).

    class c_processor {
    ...
    /**
    * Per-thread value of the processor object.
    */
    static __thread c_processor *p_this;
    ...

    public:
    c_processor(c_system *, c_logger *, processor_number_t, bool);
    ~c_processor(void);

    static c_processor *self(void) { return p_this; }

    ...




    c_processor *pp = c_processor::self();
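
    The same per-thread "self" idea can be expressed in plain C with
    _Thread_local (C11; C23 also accepts thread_local as a keyword). This
    is a sketch: struct processor, processor_bind and processor_self are
    hypothetical names standing in for the C++ class above.

```c
#include <stddef.h>

/* Hypothetical per-thread context; stands in for c_processor. */
struct processor {
    int number;
};

/* One pointer per thread. Each thread sees its own copy, which starts
 * out as NULL until that thread binds a context. */
static _Thread_local struct processor *p_this = NULL;

/* Called once by each thread, typically at the top of its entry function. */
void processor_bind(struct processor *p)
{
    p_this = p;
}

/* Returns the calling thread's context without any pointer being passed. */
struct processor *processor_self(void)
{
    return p_this;
}
```

    Each thread that calls processor_bind() with its own object will get
    that same object back from processor_self(), independently of what other
    threads have bound.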

  • From Michael S@21:1/5 to David Brown on Fri May 24 19:22:56 2024
    On Fri, 24 May 2024 17:57:35 +0200
    David Brown <[email protected]> wrote:


    I can't say I have ever seen it as an effort. Almost all my C
    "modules" come in pairs - "file.h" and "file.c". All non-local
    variables (and all functions) are either static and declared only in
    "file.c", or they are externally linked and have an "extern"
    declaration in "file.h" and a definition (with or without
    initialisation) in "file.c" (which #includes "file.h"). It is a very
    simple and clean arrangement, easily checked by gcc warnings, and
    there are never any undetected conflicts.


    A declaration/definition pair is repeating yourself, which is not a good
    thing.
    Of course, the same applies to declaration/definition of externally
    visible functions, but somehow in the case of functions I am more tolerant
    of repetition than in the case of variables. Probably a psychological
    phenomenon - I feel that functions are less trivial, so repetition is
    less wasteful.
    But I'd like to get rid of these repetitions too; I just have not figured
    out a way to do it that does not compromise the even more important concern
    of separation between interface and implementation (yes, I dislike Java
    for that reason too).

  • From bart@21:1/5 to Michael S on Fri May 24 19:38:16 2024
    On 24/05/2024 17:22, Michael S wrote:
    On Fri, 24 May 2024 17:57:35 +0200
    David Brown <[email protected]> wrote:


    I can't say I have ever seen it as an effort. Almost all my C
    "modules" come in pairs - "file.h" and "file.c". All non-local
    variables (and all functions) are either static and declared only in
    "file.c", or they are externally linked and have an "extern"
    declaration in "file.h" and a definition (with or without
    initialisation) in "file.c" (which #includes "file.h"). It is a very
    simple and clean arrangement, easily checked by gcc warnings, and
    there are never any undetected conflicts.


    A declaration/definition pair is repeating yourself, which is not a good
    thing.
    Of course, the same applies to declaration/definition of externally
    visible functions, but somehow in the case of functions I am more tolerant
    of repetition than in the case of variables. Probably a psychological
    phenomenon - I feel that functions are less trivial, so repetition is
    less wasteful.
    But I'd like to get rid of these repetitions too; I just have not figured
    out a way to do it that does not compromise the even more important concern
    of separation between interface and implementation (yes, I dislike Java
    for that reason too).



    I normally use a private systems language which some here have claimed
    is just C with a different syntax.

    Nevertheless, this particular problem has been solved:

    * There is only a single definition of any function, variable, type,
    struct, enum or macro

    * No separate declarations are needed. Definitions can appear in any order

    * There are no header files

    * Exported definitions have a 'global' attribute

    It also has a module scheme so that, for example, only the lead module
    of a program needs to be submitted to the compiler.

    C's facilities for this stuff are quite crude. It would be difficult to
    retro-fit a scheme like mine.

    At best a separate preprocessing pass can be done before normal
    compilation, which can produce the necessary declarations. Or perhaps a
    clever IDE can generate some of this stuff.

    But working only with the raw language as it is now, the tidiest
    solution is the .h/.c file pairs that have been mentioned.

  • From bart@21:1/5 to Keith Thompson on Fri May 24 21:20:52 2024
    On 24/05/2024 21:06, Keith Thompson wrote:
    bart <[email protected]> writes:
    [...]
    I normally use a private systems language which some here have claimed
    is just C with a different syntax.
    [...]

    I don't recall anyone claiming that.


    When it was last discussed it was in a different group but the same
    people who post here.

  • From Lawrence D'Oliveiro@21:1/5 to David Brown on Fri May 24 23:51:20 2024
    On Fri, 24 May 2024 16:50:28 +0200, David Brown wrote:

    I know C has alignas ...

    Just for a moment, I wondered “what is an aligna?” ...

  • From Lawrence D'Oliveiro@21:1/5 to Malcolm McLean on Sat May 25 00:31:03 2024
    On Fri, 24 May 2024 07:47:48 +0100, Malcolm McLean wrote:

    I virtually always use goto for memory allocation failure.

    It does mean that, strictly, the function is no longer a "structured" subroutine. But reality is usually that memory allocation failure will
    mean program termination pretty soon.

    Hmm, there may be a point in that. Consider also that Linux systems are typically configured to overcommit memory allocations: they never say
    “no”, but when they start running low, then they start killing the big memory hogs.

    However, there are other dynamic checks that may need to be done. For
    example, trying to load an image, and discovering that your decoder cannot handle it, possibly because it is corrupted or the wrong format
    altogether. It would be nice to recover gracefully from this sort of
    situation. And not have the decoder crash or leak memory.
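
    A minimal sketch of the goto-based cleanup pattern mentioned above,
    applied to a made-up decoder. The function name, the 'I' magic byte
    and the buffer layout are assumptions for illustration only; the point
    is that every failure path, whether allocation or validation, leaves
    through one label that frees whatever was acquired.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decoder: returns 0 on success, -1 on any failure.
   On success *out owns a malloc'd copy of the payload (len - 1 bytes). */
int decode_image(const unsigned char *data, size_t len, unsigned char **out)
{
    int status = -1;
    unsigned char *scratch = NULL;
    unsigned char *pixels = NULL;

    assert(out != NULL);
    *out = NULL;

    scratch = malloc(len ? len : 1);
    if (scratch == NULL)
        goto cleanup;

    /* A dynamic check unrelated to memory: reject unknown formats. */
    if (data == NULL || len < 2 || data[0] != 'I')
        goto cleanup;

    pixels = malloc(len - 1);
    if (pixels == NULL)
        goto cleanup;

    memcpy(scratch, data, len);
    memcpy(pixels, scratch + 1, len - 1);

    *out = pixels;
    pixels = NULL;              /* ownership has passed to the caller */
    status = 0;

cleanup:
    free(pixels);               /* free(NULL) is a harmless no-op */
    free(scratch);
    return status;
}
```

    The caller frees *out after a successful call; after a failed call
    *out is NULL and nothing has leaked.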

  • From Lawrence D'Oliveiro@21:1/5 to David Brown on Sat May 25 00:40:10 2024
    On Fri, 24 May 2024 17:57:35 +0200, David Brown wrote:

    Why would anyone want a variable that exists for /all/ threads in a
    program, but independently per thread? The only use I can think of is
    for errno (which is, IMHO, a horror unto itself) but since that is
    defined by the implementation, it does not need to use _Thread_local.

    errno is indeed the example that immediately comes to mind for the use of
    this feature. It is supposed to have the semantics of an assignable
    variable, so how else would you implement it, if not by some (possibly implementation-specific or special-case equivalent of) the _Thread_local mechanism?

    I am in two minds over whether errno is a hack or not. On the one hand, it makes more sense for system calls (and library ones, too) to return an
    error status directly; on the other hand, sometimes maybe you want to “accumulate” an error status after a series of calls, and errno is a convenient way of doing this.

    As for other uses of thread-local, I think most of them have to do with optimizations, like threading itself. For example, imagine a bunch of
    threads all contributing increments to a common counter: instead of
    continually blocking on access to that counter, they could each have their
    own thread-local counter, which periodically has its current value added
    to the global counter and then zeroed.
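
    The counter idea above can be sketched roughly as follows. The names
    and the batch size of 1000 are assumptions, and thread creation is
    left out so the listing needs no threading library; each worker thread
    would call counter_add() freely and counter_flush() once before it
    exits.

```c
#include <stdatomic.h>

#define FLUSH_EVERY 1000                  /* assumed batch size */

static atomic_long global_count;          /* shared by all threads */
static _Thread_local long local_count;    /* one instance per thread */

/* Move this thread's pending increments into the shared total. */
void counter_flush(void)
{
    if (local_count != 0) {
        atomic_fetch_add(&global_count, local_count);
        local_count = 0;
    }
}

/* Cheap in the common case: touches only thread-local storage. */
void counter_add(long n)
{
    local_count += n;
    if (local_count >= FLUSH_EVERY)
        counter_flush();
}

/* May lag behind by whatever other threads have not yet flushed. */
long counter_total(void)
{
    return atomic_load(&global_count);
}
```

    The trade-off is that counter_total() is only approximate until every
    thread has flushed.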

  • From Lawrence D'Oliveiro@21:1/5 to Michael S on Sat May 25 00:32:08 2024
    On Fri, 24 May 2024 19:22:56 +0300, Michael S wrote:

    Declaration/definition pair is repeating yourself, which is not a good [thing].

    But it is standard practice in all languages with a decent module system
    which has separation of interface and implementation.

  • From Tim Rentsch@21:1/5 to Michael S on Sat May 25 03:01:07 2024
    Michael S <[email protected]> writes:

    On Fri, 24 May 2024 06:54:35 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Thu, 23 May 2024 17:37:39 -0700
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    [...] Just want to say that strfrom* family is long overdue, but
    still appear incomplete. The guiding principle should be that all
    format specifiers available in printf() with sole exception of %s
    should be provided as strfrom* as well.

    What's the motivation for having separate functions? To me this
    looks like creeping featuritis.

    My practical motivation is space-constrained environments, where I
    possibly want one or two or three formatters. sprintf() gives me
    all or nothing and all can be too expensive. Many embedded
    environments have big and small variants of sprintf that can be
    chosen at link time, but what's in small variant does not
    necessarily match a set that I want in my specific project. And is
    not necessarily well documented.

    Okay, I see now where you're coming from, although I'm not sure that
    the strfrom*() functions will give you what you want (in terms of
    memory footprint, etc). But I get your motivation.

    Question: which of the four formats (%A, %E, %F, %G) are ones you
    expect to use?

    Rarely: any of those, mostly for debugging.
    In production code: %e is most likely, but %f could happen.

    If you can get by without %g, I recommend writing your own. The
    effort needed isn't trivial but it isn't impossibly large either.
    (If you really need %g that's a whole other kettle of fish... and
    really old smelly fish at that. :)

    But it's not just floating point. "Small" variants of sprintf()
    on 32-bit platforms are often unable to handle %lld and %llu.

    Here again, just write them. Easy as falling off a log.
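
    To give an idea of the effort involved, here is a rough sketch of a
    stand-alone replacement for "%lld". The name fmt_lld and its
    return convention are made up for this example, and it assumes
    long long is no wider than 64 bits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes v in decimal into buf, whose capacity cap
   includes the terminating '\0'. Returns the number of characters
   written, or 0 if the buffer is too small. */
size_t fmt_lld(char *buf, size_t cap, long long v)
{
    char tmp[24];                /* 19 digits plus sign fit easily */
    size_t n = 0;
    /* Take the magnitude in unsigned arithmetic so that the most
       negative value does not overflow. */
    unsigned long long u =
        v < 0 ? 0ULL - (unsigned long long)v : (unsigned long long)v;

    assert(buf != NULL);

    do {                         /* digits come out least significant first */
        tmp[n++] = (char)('0' + u % 10);
        u /= 10;
    } while (u != 0);
    if (v < 0)
        tmp[n++] = '-';

    if (n + 1 > cap)
        return 0;
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i]; /* reverse into the caller's buffer */
    buf[n] = '\0';
    return n;
}
```

    A "%llu" version is the same loop without the sign handling.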


    Also I'm curious: do all of your target platforms
    use IEEE floating point, or do some use other representations?

    Currently, only IEEE. [...]

    My comments above are predicated on being able to count on
    floating point being in IEEE format.

    Oh, if you want more information about this, please feel free
    to email me.

  • From David Brown@21:1/5 to Keith Thompson on Sat May 25 13:11:37 2024
    On 25/05/2024 03:29, Keith Thompson wrote:
    Keith Thompson <[email protected]> writes:
    David Brown <[email protected]> writes:
    On 23/05/2024 14:11, bart wrote:
    [...]
    'embed' was discussed a few months ago. I disagreed with the poor
    way it was to be implemented: 'embed' notionally generates a list of
    comma-separated numbers as tokens, where you have to take care of
    any trailing zero yourself if needed. It would also be hopelessly
    inefficient if actually implemented like that.

    Fortunately, it is /not/ actually implemented like that - it is only
    implemented "as if" it were like that. Real prototype implementations
    (for gcc and clang - I don't know about other tools) are extremely
    efficient at handling #embed. And the comma-separated numbers can be
    more flexible in less common use-cases.
    [...]

    I'm aware of a proposed implementation for clang:

    https://github.com/llvm/llvm-project/pull/68620
    https://github.com/ThePhD/llvm-project

    I'm currently cloning the git repo, with the aim of building it so I can
    try it out and test some corner cases. It will take a while.

    I'm not aware of any prototype implementation for gcc. If you are, I'd
    be very interested in trying it out.

    (And thanks for starting this thread!)

    I've built this from source, and it mostly works. I haven't seen it do
    any optimization; the `#embed` directive expands to a sequence of comma-separated integer constants.

    Which means that this:

    #include <stdio.h>
    int main(void) {
        struct foo {
            unsigned char a;
            unsigned short b;
            unsigned int c;
            double d;
        };
        struct foo obj = {
    #embed "foo.dat"
        };
        printf("a=%d b=%d c=%d d=%f\n", obj.a, obj.b, obj.c, obj.d);
    }

    given "foo.dat" containing bytes with values 1, 2, 3, and 4, produces
    this output:

    a=1 b=2 c=3 d=4.000000


    That is what you would expect by the way #embed is specified. You would
    not expect to see any "optimisation", since optimisations should not
    change the results (apart from choosing between alternative valid
    results).

    Where you will see the optimisation difference is between :

    const int xs[] = {
    #embed "x.dat"
    };

    and

    const int xs[] = {
    #include "x.csv"
    };


    where "x.dat" is a large binary file, and "x.csv" is the same data as comma-separated values. The #embed version will compile very much
    faster, using far less memory. /That/ is the optimisation.

  • From David Brown@21:1/5 to Thiago Adams on Sat May 25 13:05:42 2024
    On 25/05/2024 02:27, Thiago Adams wrote:
    Em 5/24/2024 5:19 PM, Keith Thompson escreveu:
    Thiago Adams <[email protected]> writes:
    On 24/05/2024 16:45, Keith Thompson wrote:
    Thiago Adams <[email protected]> writes:
    On 23/05/2024 18:49, Keith Thompson wrote:
    error: 'constexpr' pointer initializer is not null
    5 |     constexpr char * s[] = {"a", "b"};


    Then we were asking why constexpr was used in that case.
    Why not?

    When I see a constexpr I ask if the compiler is able to compute
    everything at compile time. If not immediately, it is a bad usage in my
    view.
    I don't understand.  Do you object because it's not *immediately
    obvious* that everything can be computed at compile time?  If so, why
    should it have to be?

    My understanding is that constexpr is a tip for the compiler. It does not
    ensure anything unless you use it where a constant expression is required.
    So I don't like to see constexpr where I know it is not a constant
    expression.

    Your understanding is incorrect.  "constexpr" is not a mere hint.
    I think I can explain a little better

    Let's consider we have a compile time array of integers and a loop.

    https://godbolt.org/z/e8cM1KGWT

    #include <stdio.h>
    #include <stdlib.h>
    int main() {
        constexpr int a[] = {1, 2, 3, 4, 5, 6, 7, 8};
        for (int i = 0 ; i < sizeof(a)/sizeof(a[0]); i++)
        {
            printf("%d", a[i]);
        }
    }

    What did the programmer expect from using a constant array in a loop?
    The loop runs at run time, unless the compiler expands the loop into 8
    calls using constant expressions. But this is not the case.
    This was the usage of constexpr I saw, but with string literals.
    So the array a is not used as a constant even though it is constexpr.


    The array /is/ constant. It never changes. The compiler can use that.
    I would expect the array to be fixed in the code section of the binary,
    along with any other read-only data in the program, rather than put on
    the stack.

    In this particular case, the constexpr makes little difference because
    the compiler knows everything about what happens to the array "a", since
    its address does not "escape" from the current translation unit. The
    compiler will generate the same code regardless of whether the array is declared "constexpr int", "const int", or plain "int". (But it can
    check for accidental modification better with "const" or "constexpr" in
    case the programmer made a mistake.)

    I am not entirely sure of the specifications for printf, but the
    compiler may even be able to turn this into:

    int main() {
        printf("12345678");
    }

    It is /certainly/ allowed to turn it into :

    int main() {
        for (int i = 1; i < 9; i++) {
            printf("%d", i);
        }
    }

    In C (not C++), defining an object as "constexpr" gives you two things
    compared to defining it as "const". One is that its value can be used
    when you need a constant expression according to the rules of the
    language (such as for the size of an array in a struct). The other is
    that it gives a compile-time error if its initialiser is not itself a
    constant expression - and that means an extra check and protection
    against some kinds of programmer errors, and extra information to people reading the code.

    I don't expect it to make a difference in generated code from an
    optimising compiler, in comparison to objects declared with "const".
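
    The two effects described above can be sketched like this. The names
    slot_count and struct table are invented for the example, and the
    enum fallback is only there so the listing also compiles in pre-C23
    modes; it is not part of the original argument.

```c
#include <assert.h>   /* supplies static_assert as a macro before C23 */

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
constexpr int slot_count = 8;      /* C23: a named constant */
/* constexpr int bad = rand();        would be a compile-time error,
                                      because the initialiser is not a
                                      constant expression */
#else
enum { slot_count = 8 };           /* fallback for older language modes */
#endif

struct table {
    int slots[slot_count];         /* a struct member array needs a
                                      constant expression for its size */
};

/* A plain "const int n = 8;" could not be used for slots[] in C. */

static_assert(sizeof(struct table) == 8 * sizeof(int),
              "slot_count is expected to be 8");

int table_capacity(void)
{
    return (int)(sizeof(((struct table *)0)->slots) / sizeof(int));
}
```

    Neither form is expected to change the generated code; the difference
    is in what the compiler checks and what the reader is told.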

  • From David Brown@21:1/5 to Chris M. Thomasson on Sat May 25 13:22:24 2024
    On 24/05/2024 20:08, Chris M. Thomasson wrote:
    On 5/24/2024 7:50 AM, David Brown wrote:
    On 24/05/2024 01:05, Chris M. Thomasson wrote:
    On 5/23/2024 6:35 AM, David Brown wrote:
    On 22/05/2024 23:24, Chris M. Thomasson wrote:
    On 5/22/2024 9:55 AM, David Brown wrote:
    In an attempt to bring some topicality to the group, has anyone
    started using, or considering, C23 ?  There's quite a lot of
    change in it, especially compared to the minor changes in C17.

    Love the way std::vectors respect alignas... C++20, iirc?

    [...]

    I have no idea what you are talking about.

    std::vector actually respects alignas, on MSVC at least. I did not
    know this worked until I tried it. Iirc, Bonita was the one that
    sparked my test. It aligned itself on the proper boundaries. Very nice.

    But did you notice that this is c.l.c, not c.l.c++, and the topic is
    C23, not C++23 ?  Discussing comparisons or compatibility with C++
    is fair enough, but talking about pure C++ matters (such as
    std::vector<>) is unlikely to be helpful.

    C has it as well... Very useful!

    I know C has alignas (now as a keyword in C23, instead of just
    _Alignas from C11).

    I know C++ has alignas (from C++11 onwards).

    What I don't understand is why you think std::vector<> "respects
    alignas" in C++20 - alignment for std::vector<> works like alignment
    for any other class in C++, and always has done.

    And what I /really/ don't understand is why you think it is remotely
    relevant here?  Even "alignas" in C is not particularly relevant to this
    thread, except that it has become a keyword in C23 instead of a macro
    defined to _Alignas in <stdalign.h>.

    alignas is very nice because it can help me make a 100% portable version
    of some of my old exotic lock-free memory allocators that use rounding
    to get down to a header. Any point in the region can be rounded down to
    get at the header for the block. It involves aligning the main region on
    a large boundary, say 8192 bytes. This is a little trick for high
    performance lock-free allocators.
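
    A rough sketch of the rounding trick described above, for a single
    statically allocated region. All names are hypothetical. Note that the
    pointer-to-integer conversion is implementation-defined, so this is
    portable in practice on common platforms rather than guaranteed by
    the standard, and support for an 8192-byte alignment on a static
    object depends on the toolchain.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define REGION_SIZE 8192u          /* must be a power of two */

struct region_header {
    size_t live_blocks;
};

struct region {
    struct region_header hdr;      /* sits at offset 0 of the region */
    unsigned char payload[REGION_SIZE - sizeof(struct region_header)];
};

/* The whole region starts on a REGION_SIZE boundary. */
static alignas(REGION_SIZE) struct region the_region;

/* Round any address inside the region down to the region's start. */
struct region_header *header_of(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    addr &= ~(uintptr_t)(REGION_SIZE - 1);
    return (struct region_header *)addr;
}

/* Small accessors so callers can exercise header_of(). */
void *region_payload_at(size_t offset)
{
    assert(offset < sizeof the_region.payload);
    return &the_region.payload[offset];
}

struct region_header *region_base_header(void)
{
    return &the_region.hdr;
}
```

    Any pointer into the payload maps back to the same header with one
    mask operation and no lookup table.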

    I know what "alignas" can do, and have made use of it.



    Iirc, I can make std::vector align its elements to say, L2 cachelines,
    and I can make std::vector align itself on a large boundary say 8192
    bytes. All in std C++! That is nice.

    I would be astounded to hear that std::vector could somehow disobey
    alignas specifiers. I can appreciate that it could be useful to use
    alignas with std::vector's, but it is nothing special or dramatic.

    And it has nothing to do with c.l.c., or with this thread. Nor have
    your allocators.

    If you want to talk about your allocators, or good uses of alignas,
    start a new thread in c.l.c or c.l.c++, according to which is
    appropriate. Both groups could do with new topical threads. But if you
    want to be a positive contribution to these groups, please consider some
    basic rules - No links (to code sites or pointless videos), no
    out-of-the-blue topic changes, and no pantomime arguments with Bonita or olcott.

    But stick to this thread if you want to talk about C23 and the changes
    it makes, whether you think they are good or bad.

  • From David Brown@21:1/5 to Keith Thompson on Sat May 25 13:29:00 2024
    On 24/05/2024 21:29, Keith Thompson wrote:
    David Brown <[email protected]> writes:
    On 23/05/2024 18:40, Keith Thompson wrote:
    Michael S <[email protected]> writes:
    [...]
    Removed
    [...]
    7) static_assert is not provided as a macro defined in <assert.h>
    (becomes a keyword)
    8) thread_local is not provided as a macro defined in <threads.h>
    (becomes a keyword)

    [...]
    7) bad. Breaks existing code for weak reason
    8) bad. Breaks existing code for weak reason
    In pre-C23, _Static_assert and _Thread_local are keywords, and
    static_assert and thread_local are macros that expand to those keywords.
    In C23, _Static_assert, _Thread_local, static_assert, and thread_local
    are all keywords. Code that simply uses the old ugly keywords would not
    break.
    Code that does something like "#ifdef static_assert". I suppose the
    headers could have retained the old macro definitions.
    #define static_assert static_assert
    #define thread_local thread_local


    The sort of code that could theoretically break is when you have
    definitions like this:

    #define STATIC_ASSERT_NAME_(line) STATIC_ASSERT_NAME2_(line)
    #define STATIC_ASSERT_NAME2_(line) assertion_failed_at_line_##line
    #define static_assert(claim, warning) \
    typedef struct { \
    char STATIC_ASSERT_NAME_(__COUNTER__) [(claim) ? 2 : -2]; \
    } STATIC_ASSERT_NAME_(__COUNTER__)

    That works in any C version, until C23, almost as well as
    _static_assert. I used this when C11 support was rare in the tools I
    used.

    You mean _Static_assert.

    I meant either "static_assert" or "_Static_assert", rather than a
    mixture of the two! (I consider "static_assert" part of C11 even though
    it needs a header.)


    While using #define for a C keyword is undefined behaviour, in
    practice I think you'd have a hard time finding code and a compiler
    that used such a macro and which did not work just as well in C23
    mode.

    (I don't know if anyone is in the habit of declaring macros named
    "thread_local".)

    "static_assert" is already a macro defined in <assert.h> starting in
    C11. The above code is valid in pre-C23, but will break in C11 and C17
    if it includes <assert.h> directly or indirectly.

    Yes. But including <assert.h> is optional.

    You can fix it by
    adding "#undef static_assert" or by picking a different name, or by
    making your macro definition conditional on __STDC_VERSION__ >= 202311L.


    The actual code I use had a number of conditional checks for different C standards and C++, so that it does not define a static_assert macro for
    C++ (my C++ usage for the code was always at least C++11), and for C11
    onwards it was defined to _Static_assert. (I specifically did not want
    to include <assert.h>.)
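
    A sketch of what such a conditional definition might look like. It is
    not the actual code referred to above: it uses __LINE__ instead of
    __COUNTER__ to stay within the standard, and it includes <assert.h>
    purely to show that the definitions coexist with the header, which the
    original deliberately avoided.

```c
#include <assert.h>
#include <limits.h>

#if defined(__cplusplus) || \
    (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L)
/* static_assert is a keyword here: define nothing. */
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#ifndef static_assert                  /* <assert.h> may already supply it */
#define static_assert _Static_assert
#endif
#else
/* Pre-C11 fallback: a negative array size forces a compile-time error. */
#undef static_assert
#define STATIC_ASSERT_NAME2_(line) assertion_failed_at_line_##line
#define STATIC_ASSERT_NAME_(line) STATIC_ASSERT_NAME2_(line)
#define static_assert(claim, warning) \
    typedef char STATIC_ASSERT_NAME_(__LINE__)[(claim) ? 2 : -2]
#endif

/* Always pass a message so the same line works in every branch. */
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
static_assert(sizeof(long long) >= 8, "long long must hold 64 bits");

int checked_char_bits(void)
{
    return CHAR_BIT;
}
```

    With __LINE__, two assertions on the same source line would clash in
    the fallback branch, which is one reason to prefer __COUNTER__ where
    it is available.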

  • From David Brown@21:1/5 to Scott Lurndal on Sat May 25 16:41:31 2024
    On 24/05/2024 18:16, Scott Lurndal wrote:
    David Brown <[email protected]> writes:
    On 23/05/2024 23:34, Michael S wrote:
    On Thu, 23 May 2024 22:10:22 +0200
    David Brown <[email protected]> wrote:

    _Thread_local is a special-purpose thing, probably not applicable at
    all for programming of small embedded systems, which nowadays is the
    only type of programming in C that I do for money rather than as hobby.

    I have never seen the point of it either. Why would anyone want a
    variable that exists for /all/ threads in a program, but independently
    per thread?

    Very common in kernel programming (e.g. the use of '%gs' in x86_linux)
    as a pointer to the 'per-cpu' data structure.

    We use thread local to implement 'self' methods in certain
    classes (so rather than passing pointers around, one can
    simply call class::self() to get a pointer to the
    class for each thread).

    class c_processor {
    ...
    /**
    * Per-thread value of the processor object.
    */
    static __thread c_processor *p_this;
    ...

    public:
    c_processor(c_system *, c_logger *, processor_number_t, bool);
    ~c_processor(void);

    static c_processor *self(void) { return p_this; }

    ...




    c_processor *pp = c_processor::self();

    I can see that. But you only want a few of these, and it is typically
    in very low-level code that is full of compiler-specific or
    target-specific stuff anyway. Such things could be compiler extensions
    or other implementation-specific features.

    After all, "thread_local" is useless for the vast majority of OS's
    (counting the number of OS's, not the number of users). You can't use it
    unless the C (or in this case, C++) implementation has support for the
    OS in the library and compiler.

  • From David Brown@21:1/5 to Thiago Adams on Sat May 25 16:51:59 2024
    On 25/05/2024 13:33, Thiago Adams wrote:
    Em 5/24/2024 9:46 PM, Keith Thompson escreveu:
    Thiago Adams <[email protected]> writes:
    Em 5/24/2024 5:19 PM, Keith Thompson escreveu:
    Thiago Adams <[email protected]> writes:
    On 24/05/2024 16:45, Keith Thompson wrote:
    Thiago Adams <[email protected]> writes:
    On 23/05/2024 18:49, Keith Thompson wrote:
    error: 'constexpr' pointer initializer is not null
    5 |     constexpr char * s[] = {"a", "b"};


    Then we were asking why constexpr was used in that case.
    Why not?

    When I see a constexpr I ask if the compiler is able to compute
    everything at compile time. If not immediately, it is a bad usage in my
    view.
    I don't understand.  Do you object because it's not *immediately
    obvious* that everything can be computed at compile time?  If so, why
    should it have to be?

    My understanding is that constexpr is a tip for the compiler. It does not
    ensure anything unless you use it where a constant expression is required.
    So I don't like to see constexpr where I know it is not a constant
    expression.

    Your understanding is incorrect.  "constexpr" is not a mere hint.
    I think I can explain a little better

    Let's consider we have a compile time array of integers and a loop.

    https://godbolt.org/z/e8cM1KGWT

    #include <stdio.h>
    #include <stdlib.h>
    int main() {
         constexpr int a[] = {1, 2, 3, 4, 5, 6, 7, 8};
         for (int i = 0 ; i < sizeof(a)/sizeof(a[0]); i++)
         {
             printf("%d", a[i]);
         }
    }

    What did the programmer expect from using a constant array in a loop?
    The loop runs at run time, unless the compiler expands the loop into 8
    calls using constant expressions. But this is not the case.
    This was the usage of constexpr I saw, but with string literals.
    So the array a is not used as a constant even though it is constexpr.

    What do you mean by "used as constant"?


    Something used to produce a constant expression.
    In the loop the compiler would have to get the value in runtime from
    array, or unroll the loop.

    I just checked, trying to extract a constant value from the array


    https://godbolt.org/z/v33Pqd7W8

    #include <stdio.h>
    #include <stdlib.h>
    int main() {
        constexpr int a[] = {1, 2, 3, 4, 5, 6, 7, 8};
        static_assert(a[0] ==1 );

    }

    I was expecting this to work!

    But gcc says

    <source>:5:24: error: expression in static assertion is not constant
        5 |     static_assert(a[0] ==1 );
          |


    That is disappointing. I too would have expected that to work in C23.
    My guess is that it is the implicit pointer dereference that is the
    problem. But I hope this is something that gets fixed shortly.

    The mess is even bigger than I thought.

    In c++ it works
    https://godbolt.org/z/qG6vGhEMj



  • From David Brown@21:1/5 to Thiago Adams on Sat May 25 17:14:18 2024
    On 25/05/2024 13:19, Thiago Adams wrote:
    Em 5/25/2024 8:05 AM, David Brown escreveu:


    In C (not C++), defining an object as "constexpr" gives you two things
    compared to defining it as "const".  One is that its value can be used
    when you need a constant expression according to the rules of the
    language (such as for the size of an array in a struct).  The other is
    that it gives a compile-time error if its initialiser is not itself a
    constant expression - and that means an extra check and protection
    against some kinds of programmer errors, and extra information to
    people reading the code.

    I don't expect it to make a difference in generated code from an
    optimising compiler, in comparison to objects declared with "const".



    In my view, for this sample constexpr generates noise.

    I don't share that opinion, but I understand it.

    It can also make
    the compilation slower; otherwise, why not make everything constexpr by default?

    That claim, on the other hand, is very strange. Making everything
    constexpr by default would be a massive change to the language that
    would break all but the most negligible of existing code. And I can
    think of no particular reason why constexpr would slow down compilation,
    at least to any measurable degree.

    I still haven't found a use for constexpr that would compensate for
    the mess created with const, constexpr.

    I don't need a feature to "compensate" for anything to be useful. I
    don't need it to be perfect to be useful. There are a few things about
    constexpr in C23 that I think are poor decisions, unreasonable
    restrictions, or suboptimal integration with other language features
    (like static_assert) - such as the array limitations you've found. That
    will mean I can't use constexpr as much as I'd like, or as much as I do
    in C++. But even if there is just one situation where I think using
    constexpr is neater or clearer than using enum, #define, or some other technique, then I will use constexpr in that one situation. Why are you
    so insistent on throwing it out completely just because it doesn't do everything you might want?


    I already saw (I don't have it
    now) proposals to make const more like constexpr in C. In C++ const is already a constant expression!

    No, it is not - but sometimes a const object with particular
    characteristics can be used in situations where you would otherwise need
    a constant expression. I mentioned earlier that I find this convenient
    in C++ - Keith said it was inconsistent, which is also true. I think
    that to a large extent, if C "const" had acquired the additional
    features of C++ "const" (excluding the different linkage for file-scope
    "const" objects, since that would be a breaking change) then it would
    have done everything C23 "constexpr" does today. I personally would
    have been fine with that as a solution. But I fully appreciate that it
    would have been inconsistent and perhaps hard to specify - you would
    have the situation that /some/ const objects could be used for things
    like static initialisers, while others could not.

    The justification for C was VLA. They should consider VLA not VLA if it
    has a constant expression. In other words, better break this than create
    a mess.
    #define does the job of constexpr.


    #define is one way to make named items that can be used in constant expressions, yes. But if it can be done using #define or constexpr, I
    think constexpr is the neater choice. Opinions can vary - that's my
    opinion.

  • From David Brown@21:1/5 to Michael S on Sat May 25 17:28:25 2024
    On 24/05/2024 18:22, Michael S wrote:
    On Fri, 24 May 2024 17:57:35 +0200
    David Brown <[email protected]> wrote:


    I can't say I have ever seen it as an effort. Almost all my C
    "modules" come in pairs - "file.h" and "file.c". All non-local
    variables (and all functions) are either static and declared only in
    "file.c", or they are externally linked and have an "extern"
    declaration in "file.h" and a definition (with or without
    initialisation) in "file.c" (which #includes "file.h"). It is a very
    simple and clean arrangement, easily checked by gcc warnings, and
    there are never any undetected conflicts.


    Declaration/definition pair is repeating yourself, which is not a good
    thing.

    It is a good thing when you are doing different things for different
    purposes.

    Of course, the same applies to declaration/definition of externally
    visible functions, but somehow in case of functions I am more tolerant
    to repetitions than in case of variable. Probably, a psychological
    phenomenon - I feel that functions are less trivial, so repetition is
    less wasteful.

    I don't see the difference.

    A header describes the interface, in code and documenting comments that describe how to use the features of the module, for the benefit of
    programmers working on other modules. The source file gives the
    definitions, along with comments describing how and why it is
    implemented this way, for the benefit of programmers working on /this/
    module. They are different files, but closely correlated.

    But I'd like to get rid of these repetitions too; I just have not figured
    out a way to do it that does not compromise the even more important concern
    of separation between interface and implementation (yes, I dislike Java
    for that reason too).



  • From David Brown@21:1/5 to Lawrence D'Oliveiro on Sat May 25 17:47:48 2024
    On 25/05/2024 02:40, Lawrence D'Oliveiro wrote:
    On Fri, 24 May 2024 17:57:35 +0200, David Brown wrote:

    Why would anyone want a variable that exists for /all/ threads in a
    program, but independently per thread? The only use I can think of is
    for errno (which is, IMHO, a horror unto itself) but since that is
    defined by the implementation, it does not need to use _Thread_local.

    errno is indeed the example that immediately comes to mind for the use of this feature. It is supposed to have the semantics of an assignable
    variable, so how else would you implement it, if not by some (possibly implementation-specific or special-case equivalent of) the _Thread_local mechanism?

    The normal way for multi-threaded systems is to implement it as a macro.
    It might be, for example :

    #define errno __thread_data->_errno

    or

    #define errno *errno()

    That is precisely why it is specified in the C standards as a macro, not
    an external linkage object with static or thread-local storage duration.
    (The use of errno in multi-threading C code long predates C11 and _Thread_local.)
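
    A minimal sketch of the "*function()" form, using made-up names so it
    does not collide with the real errno. For brevity the per-thread
    storage here is itself declared with _Thread_local; an implementation
    that predates C11 would reach its per-thread data some other way, but
    the macro seen by user code would have the same shape.

```c
/* Hypothetical names throughout; a real library's errno differs in detail. */
static _Thread_local int my_errno_storage;    /* one int per thread */

/* Returns the address of the calling thread's error slot. */
int *my_errno_location(void)
{
    return &my_errno_storage;
}

/* Parenthesised so the macro behaves like a plain assignable variable
   in any expression. */
#define my_errno (*my_errno_location())

/* Example of a function reporting failure through the "hidden" channel. */
int half_of_even(int n)
{
    if (n % 2 != 0) {
        my_errno = 33;         /* arbitrary illustrative error code */
        return 0;
    }
    return n / 2;
}
```

    User code can read and assign my_errno as if it were an ordinary
    variable, which is all the standard requires of errno.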



    I am in two minds over whether errno is a hack or not. On the one hand, it makes more sense for system calls (and library ones, too) to return an
    error status directly; on the other hand, sometimes maybe you want to “accumulate” an error status after a series of calls, and errno is a convenient way of doing this.

    I understand its purpose (and I assume that some people find it useful),
    but I much prefer a clearer flow of return values where possible - I
    don't like "hidden" return values. It is particularly bad, IMHO, when
    setting errno is optional for many library functions - it means that
    otherwise "pure" functions, such as many from <math.h>, might have side-effects, but you can't rely on them accumulating an error status.
    It's a lose-lose situation.


    As for other uses of thread-local, I think most of them have to do with optimizations, like threading itself. For example, imagine a bunch of
    threads all contributing increments to a common counter: instead of continually blocking on access to that counter, they could each have their own thread-local counter, which periodically has its current value added
    to the global counter and then zeroed.
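
    The batching idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: FLUSH_THRESHOLD, the function names and the batch size are hypothetical and do not come from the thread.

```c
#include <stdatomic.h>

#define FLUSH_THRESHOLD 1000u   /* hypothetical batch size */

static atomic_ulong global_count;            /* shared by all threads */
static _Thread_local unsigned local_count;   /* one instance per thread */

/* Publish this thread's pending increments and zero the local counter. */
void flush_local_count(void)
{
    atomic_fetch_add(&global_count, local_count);
    local_count = 0;
}

/* Count one event; touch the shared counter only once per batch. */
void count_event(void)
{
    if (++local_count >= FLUSH_THRESHOLD)
        flush_local_count();
}

/* Read the shared total (excludes increments not yet flushed). */
unsigned long read_global_count(void)
{
    return atomic_load(&global_count);
}
```

    Each thread would call flush_local_count() once more before it exits, so that its remaining increments are not lost.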

    I fully appreciate that sometimes you want data local to a thread,
    including to different instances of the same thread function. But I
    don't see many situations where you'd want the same object to be
    available per thread in /all/ threads, as you have with thread_local
    data. You want all your counter threads to have their own local
    "counter" object? That's fine. But you don't want all your other
    threads to have that "counter" object that they never use. I admit this
    may be more of a concern for those like myself that work on
    small embedded systems, but the whole concept feels wrong to me.

    The only data that really belongs to /all/ threads is for implementing
    the threading system, or for making the standard library work in
    multi-threaded programs (such as errno, handles for standard streams,
    malloc heaps, and the like). And all that is part of the
    implementation, not user code.

    (And for those working with OS's that are written as user code, rather
    than implementation code, _Thread_local is useless - there's no standard
    way to integrate it with your OS code.)

  • From Kaz Kylheku@21:1/5 to jak on Sat May 25 21:24:10 2024
    On 2024-05-24, jak <[email protected]> wrote:
    Bonita Montero ha scritto:
    Am 23.05.2024 um 21:49 schrieb Thiago Adams:
    On 23/05/2024 16:25, Bonita Montero wrote:
    I ask myself what the point is in further developing a language
    like this that can actually no longer be saved.
    do you mean C++?


    No, C.

    I think you have a lot of confusion about programming languages. C and
    C++ are not comparable languages.

    Except for observations like that we can write useful, production
    software that compiles as C or C++, but go on ...

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @[email protected]

  • From bart@21:1/5 to David Brown on Sun May 26 02:09:13 2024
    On 25/05/2024 16:14, David Brown wrote:
    On 25/05/2024 13:19, Thiago Adams wrote:

    The justification for C was VLA. They should consider VLA not VLA if
    it has a constant expression. In other words, better break this than
    create a mess.
    #define makes the job of constexpr.


    #define is one way to make named items that can be used in constant expressions, yes.  But if it can be done using #define or constexpr, I
    think constexpr is the neater choice.  Opinions can vary - that's my opinion.

    Before 'constexpr' (and it still is 'before' as implementations are
    rare), there were three disparate ways of emulating named constants in C:

    #define A 100

    enum {B = 200};

    int const C = 300;

    None of them fully do the job of the named constant feature I've used in
    my own languages (and which I also briefly had in my C compiler).

    With 'constexpr' there are now 4 ways of doing it:

    constexpr int D = 400;

    Here are some characteristics of true named constants and how those
    methods fare:

                        #define  enum  const  constexpr

    Scope rules            N      Y      Y       Y
    No & addr-of           Y      Y      N       N?
    Any type               Y?     N      Y       Y       Any int/float
    Non-VLA bounds         Y      Y      N       Y?
    Switch-case?           Y      Y      N       Y?
    Reduce                 Y      Y      ?       Y?      2+3 => 5
    Can't Mod value        Y      Y      N       N?      By any means
    Not Context sens       N      Y      Y       Y       Value may vary by context
    Single reeval          N      Y      Y       Y       Expr processed once
    Lower case OK          N?     Y      Y       Y

    Ideally a column would have all Ys. None of these manage that, but
    'enum' comes nearest. However it has a problem: it wasn't designed for
    this task, which is just a useful by-product. So it looks odd.

    With const/constexpr, even if the language can't stop attempts to change
    the value, sometimes those attempts are trapped (via read-only mem etc).
    That's not ideal either.

    Regarding 'Not context sensitive', consider:

    ----------------------
    #include <stdio.h>

    enum {a = 100};

    #define M (a+1)

    enum {b = M};

    int main(void) {
        enum {a=777};

        printf("b = %d\n", b);
        printf("M = %d\n", M);
    }
    ----------------------

    The output is 101 and 778. The value of M is 101 when used to define
    `b`, and 778 later on.

    'Single reevaluation' refers to the fact that the expansion of a #define
    macro will be repeated at each invocation site, so parsing, evaluation
    and reduction of the expression will be done multiple times. It's just
    inefficient.

    It might also vary, not just because of the last point, but because
    there aren't enough parentheses or something, so it combines differently
    with the surrounding context.
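
    The missing-parentheses hazard mentioned above can be shown with a small sketch. The names UNSAFE_SUM, SAFE_SUM and ENUM_SUM are made up for illustration; they are not from the post.

```c
#define UNSAFE_SUM 2 + 3        /* macro body pasted as tokens, no parentheses */
#define SAFE_SUM   (2 + 3)      /* parenthesised macro */
enum { ENUM_SUM = 2 + 3 };      /* evaluated once, to the value 5 */

int twice_unsafe(void) { return UNSAFE_SUM * 2; }  /* expands to 2 + 3 * 2 */
int twice_safe(void)   { return SAFE_SUM * 2; }    /* expands to (2 + 3) * 2 */
int twice_enum(void)   { return ENUM_SUM * 2; }    /* 5 * 2 */
```

    Here twice_unsafe() yields 8 while the other two yield 10, because the unparenthesised macro body binds to the neighbouring operator after expansion.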

  • From jak@21:1/5 to All on Sun May 26 08:44:12 2024
    Kaz Kylheku ha scritto:
    On 2024-05-24, jak <[email protected]> wrote:
    Bonita Montero ha scritto:
    Am 23.05.2024 um 21:49 schrieb Thiago Adams:
    On 23/05/2024 16:25, Bonita Montero wrote:
    I ask myself what the point is in further developing a language
    like this that can actually no longer be saved.
    do you mean C++?


    No, C.

    I think you have a lot of confusion about programming languages. C and
    C++ are not comparable languages.

    Except for observations like that we can write useful, production
    software that compiles as C or C++, but go on ...


    ... one last thing: I would ask you not to change the context of the
    discussion by cutting some parts of it to justify your comment.

  • From jak@21:1/5 to All on Sun May 26 08:32:15 2024
    Kaz Kylheku ha scritto:
    On 2024-05-24, jak <[email protected]> wrote:
    Bonita Montero ha scritto:
    Am 23.05.2024 um 21:49 schrieb Thiago Adams:
    On 23/05/2024 16:25, Bonita Montero wrote:
    I ask myself what the point is in further developing a language
    like this that can actually no longer be saved.
    do you mean C++?


    No, C.

    I think you have a lot of confusion about programming languages. C and
    C++ are not comparable languages.

    Except for observations like that we can write useful, production
    software that compiles as C or C++, but go on ...


    Indeed there are C++ compilers which, if used to compile C code, could
    decide to call the C compiler to do the work, but if something in the
    code is not strictly C, then the compilation will be in C++, the size of
    the executable will increase significantly and it will need an internal
    or external runtime to work. If they were the same thing you would not
    get different results.

  • From jak@21:1/5 to All on Sun May 26 09:13:51 2024
    Bonita Montero ha scritto:
    Am 24.05.2024 um 09:32 schrieb jak:
    Bonita Montero ha scritto:
    Am 23.05.2024 um 21:49 schrieb Thiago Adams:
    On 23/05/2024 16:25, Bonita Montero wrote:
    I ask myself what the point is in further developing a language
    like this that can actually no longer be saved.
    do you mean C++?


    No, C.

    I think you have a lot of confusion about programming languages. C and
    C++ are not comparable languages.

    C and C++ have a lot in common since 95% of what you can do you can do
    in C++ also in the same way. But C++ puts 500% on top of that to solve
    your tasks with a fraction of the code and if you use that the code
    looks totally different than C.


    About this I only agree partially because it depends a lot on the
    context in which it is used. Moreover, I would not know how to indicate
    an optimal programming language for all seasons.

    I'm pretty convinced that c++ will be abandoned long before c.

    Maybe, but for sure not in favour of C.


    I absolutely agree with you.

    Just for one example, C++ would have been abandoned years ago if C#
    didn't produce only CLI code, because C# lacks nothing important
    compared with C++ and the learning curve is much steeper (it also
    benefits from reflection).

    Being a good C++ programmer needs a lot of experience, but if you've
    done that you get a magnitude more productivity. And often you decide
    for simple approaches in C because complex approaches are a lot of work. Often this complex and more efficient approach is easy to handle in C++
    if you managed to understand the language.



    What you describe is the greatest inconvenience of C++. To give only one
    example, when they decided to rewrite the FB platform to speed it up,
    they thought of migrating from PHP to C++ and ran short of staff suited
    to the work, so they thought of relying on a compiler that translated
    the PHP into C++. Many of the new languages were born to try to remedy
    its complexity.

  • From David Brown@21:1/5 to Keith Thompson on Sun May 26 13:09:36 2024
    On 26/05/2024 00:58, Keith Thompson wrote:
    David Brown <[email protected]> writes:
    On 25/05/2024 03:29, Keith Thompson wrote:
    Keith Thompson <[email protected]> writes:
    David Brown <[email protected]> writes:
    On 23/05/2024 14:11, bart wrote:
    [...]
    'embed' was discussed a few months ago. I disagreed with the poor
    way it was to be implemented: 'embed' notionally generates a list of
    comma-separated numbers as tokens, where you have to take care of
    any trailing zero yourself if needed. It would also be hopelessly
    inefficient if actually implemented like that.

    Fortunately, it is /not/ actually implemented like that - it is only
    implemented "as if" it were like that. Real prototype implementations
    (for gcc and clang - I don't know about other tools) are extremely
    efficient at handling #embed. And the comma-separated numbers can be
    more flexible in less common use-cases.
    [...]

    I'm aware of a proposed implementation for clang:

    https://github.com/llvm/llvm-project/pull/68620
    https://github.com/ThePhD/llvm-project

    I'm currently cloning the git repo, with the aim of building it so I can
    try it out and test some corner cases. It will take a while.

    I'm not aware of any prototype implementation for gcc. If you are, I'd
    be very interested in trying it out.

    (And thanks for starting this thread!)

    I've built this from source, and it mostly works. I haven't seen it do
    any optimization; the `#embed` directive expands to a sequence of
    comma-separated integer constants.

    Which means that this:

    #include <stdio.h>
    int main(void) {
        struct foo {
            unsigned char a;
            unsigned short b;
            unsigned int c;
            double d;
        };
        struct foo obj = {
        #embed "foo.dat"
        };
        printf("a=%d b=%d c=%u d=%f\n", obj.a, obj.b, obj.c, obj.d);
    }

    given "foo.dat" containing bytes with values 1, 2, 3, and 4, produces
    this output:

    a=1 b=2 c=3 d=4.000000

    That is what you would expect by the way #embed is specified. You
    would not expect to see any "optimisation", since optimisations should
    not change the results (apart from choosing between alternative
    valid results).

    Where you will see the optimisation difference is between :

    const int xs[] = {
    #embed "x.dat"
    };

    and

    const int xs[] = {
    #include "x.csv"
    };


    where "x.dat" is a large binary file, and "x.csv" is the same data as
    comma-separated values. The #embed version will compile very much
    faster, using far less memory. /That/ is the optimisation.

    Why would it compile faster? #embed expands to something similar to
    CSV, which still has to be parsed.

    No, it does /not/. That's the /whole/ point of #embed, and the main
    motivation for its existence. People have always managed to embed
    binary source files into their binary output files - using linker
    tricks, or using xxd or other tools (common or specialised) to turn
    binary files into initialisers for constant arrays (or structs). I've
    done so myself on many projects, all integrated together in makefiles.

    #embed has two purposes. One is to save you from using external tools
    for that kind of thing. The other is to do it more efficiently for big
    files.

    There are two ways this is done for examples like this. One is
    that the compiler does /not/ turn each byte into a series of ASCII
    digits for the number, then parse that number to get back to a byte. It
    jumps straight from byte in to byte out, possibly after expanding to a
    bigger type size if necessary. Secondly, compilers typically track lots
    more information about each initialiser - such as the file, line and
    column number so that it can give you helpful messages if there is a
    value out of range, or too many or too few initialisers. With #embed,
    the compiler doesn't have to do any of that.

    The compiler will generate results /as if/ it had expanded the file to a
    list of numbers and parsed them. But it will not do that in practice.
    (At least, not for more serious implementations - simple solutions might
    do so to get support implemented quickly.)
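
    To make the "as if" expansion concrete, here is a minimal sketch of the naive route: turning raw bytes into the comma-separated text that a later phase would then have to parse back into bytes. The function name bytes_to_initializer_text is hypothetical, and this is not how any particular compiler is claimed to work.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Write the bytes in data[0..n-1] as a comma-separated list of decimal
 * integer constants, e.g. "1, 2, 3, 4". Returns the number of characters
 * written (excluding the terminating null), or -1 if out is too small.
 */
int bytes_to_initializer_text(const unsigned char *data, size_t n,
                              char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        /* Every element after the first is preceded by ", ". */
        int len = snprintf(out + used, cap - used, "%s%u",
                           i ? ", " : "", (unsigned)data[i]);
        if (len < 0 || (size_t)len >= cap - used)
            return -1;
        used += (size_t)len;
    }
    return (int)used;
}
```

    Each input byte becomes up to five characters of text plus a token for the parser, which is the overhead a direct byte-to-byte path avoids.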



    Reference: <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf> 6.10.4.

    The first one will probably initialize each int element of xs to a
    single byte value extracted from x.dat. Is that what you intended?

    Yes, if that's what the programmer wrote - though I agree that character
    types will be more common and will be the prime target for optimisation.

    #embed works best with arrays of unsigned char.

    Sure, that will be a very common use.


    If you mean that the #embed will expand to something other than the
    sequence of integer constants, how does it know to do that in this
    context?

    It knows because the compiler writers are actually quite smart. The C standards may describe the translation process in a series of distinct
    and independent phases, but that's not how it is done in practice. The
    key point is that the compiler knows how the sequence of integers is
    going to be used before it gets that far in the preprocessing.

    I'd expect implementations to have extremely fast implementations for initialising arrays of character types, and probably also for other
    arrays of scalar types. More complicated examples - such as parameters
    in a macro or function call - would probably use a fall-back of
    generating naïve lists of integer constants.


    If you have a binary file containing a sequence of int values, you can
    use #embed to initialize an unsigned char array that's aliased with or
    copied to the int array.
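
    The "copied to the int array" route might look like the sketch below. The helper name ints_from_bytes is an assumption for illustration; in C23 code the raw buffer would typically be an unsigned char array initialised with #embed, which is left out here so the sketch does not depend on #embed support.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy as many whole int32_t values as fit from a raw byte buffer into out.
 * The bytes are interpreted in the byte order of the machine running the
 * program, so the data file must have been written with that byte order.
 * Returns the number of values copied.
 */
size_t ints_from_bytes(const unsigned char *raw, size_t nbytes,
                       int32_t *out, size_t max_values)
{
    size_t count = nbytes / sizeof(int32_t);   /* ignore trailing partial value */

    if (count > max_values)
        count = max_values;
    memcpy(out, raw, count * sizeof(int32_t)); /* copying avoids aliasing issues */
    return count;
}
```

    Copying with memcpy sidesteps the alignment and aliasing questions that arise when the byte array is accessed directly through an int pointer.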

    The *embed element width* is typically going to be CHAR_BIT bits by
    default. It can only be changed by an *implementation-defined* embed parameter. It seems odd that there's no standard way to specify the
    element width.

    It seems even more odd that the embed element width is
    implementation defined and not set to CHAR_BIT by default.

    I agree. But it may be left flexible for situations where the host and
    target have different ideas about CHAR_BIT. (Targets with CHAR_BIT
    other than 8 are very rare, hosts with CHAR_BIT other than 8 are
    non-existent, but C remains flexible.)

    A conforming implementation could set the embed element width to,
    say, 4*CHAR_BIT and then not provide an implementation-defined embed parameter to specify a different width, making #embed unusable for
    unsigned char arrays. (N3220 is a draft, not the final C23 standard,
    but I haven't heard about any changes in this area.)

    The kind of optimization I was thinking about was having #embed, in some cases, expand to something other than the specified sequence of comma-separated integer constants. Such an optimization would be
    intended to improve compile-time speed and memory usage, not run-time performance.

    With a straightforward implementation, the preprocessor has to generate
    a sequence of integer constants as text, and then later compiler phases
    have to parse that text sequence and generate the corresponding code.

    Given:

    const unsigned char data[4] = {
    #embed "four_bytes.dat"
    };

    That 4 byte data file is translated to something like "1, 2, 3, 4", then converted into a stream of tokens, then those tokens are parsed, then,
    given the context, the original 4-byte sequence is written into the
    generated object file.

    For a very large file, that could be a significant burden. (I don't
    have any numbers on that.)

    I do :

    <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#design-efficiency-metrics>

    (That's from a proposal for #embed for C and C++. Generating the
    numbers and parsing them is akin to using xxd.)

    More useful links:

    <https://thephd.dev/embed-the-details#results> <https://thephd.dev/implementing-embed-c-and-c++>

    (These are from someone who did a lot of the work for the proposals, and prototype implementations, as far as I understand it.)



    Note that I can't say how much of a difference this will make in real
    life. I don't know how often people need to include multi-megabyte
    files in their code. It certainly is not at a level where I would
    change any of my existing projects from external generator scripts to
    using #embed, but I might use it in future projects.



    An optimized version might have the preprocessor generate some compiler-specific binary output, say something like "@rawdata N"
    followed by N bytes of raw data. Later compiler phases recognize the "@rawdata" construct and directly dump the data into the object file in
    the right place. Making #embed generate @rawdata is only part of the solution; the compiler has to implement @rawdata in a way that allows it
    to be used inside an initializer, or perhaps in any other appropriate context.

    That's the idea. In theory, C pre-processors and C compilers are
    independent programs with a standardised format between them - in
    practice, they are often part of the same binary, and almost invariably
    come from the same developers. The "cpp" program may have to generate
    standard preprocessed output, and the "cc" program may have to accept
    standard preprocessed output, but there is nothing to stop the pair of
    programs supporting extended formats that are more efficient.


    This could be substantially more efficient for something like:

    static const unsigned char data[] = {
    #embed "bigfile.dat"
    };

    Of course it wouldn't handle my test case above. But #embed can take parameters, so it could generate the standard sequence by default and "@rawdata" if you ask for it.

    I don't know whether this kind of optimization is worthwhile, i.e.,
    whether the straightforward implementation really imposes significant compile-time performance penalties that @rawdata or equivalent can
    solve. I also don't know whether existing implementations will
    implement this kind of optimization (so far they haven't implemented
    #embed at all).


    Prototypes have been made, and they do have such optimisations. How
    things end up in real tools remains to be seen, of course.

  • From jak@21:1/5 to All on Sun May 26 13:44:32 2024
    Keith Thompson ha scritto:
    jak <[email protected]> writes:
    Kaz Kylheku ha scritto:
    On 2024-05-24, jak <[email protected]> wrote:
    Bonita Montero ha scritto:
    Am 23.05.2024 um 21:49 schrieb Thiago Adams:
    On 23/05/2024 16:25, Bonita Montero wrote:
    I ask myself what the point is in further developing a language
    like this that can actually no longer be saved.
    do you mean C++?


    No, C.

    I think you have a lot of confusion about programming languages. C and
    C++ are not comparable languages.
    Except for observations like that we can write useful, production
    software that compiles as C or C++, but go on ...

    Indeed there are c++ compilers who, if used to compile c code, could
    decide to call the c compiler to do the work, but if something in the
    code is not strictly c, then the compilation will be in c++, the size
    of the executable will increase significantly and will need of an
    internal or external runtimer to work. If it were the same thing you
    would not get different things.

    Oh? Do you know of a C++ compiler that actually behaves this way?
    I've never heard of such a thing.

    C and C++ are closely related, and C and C++ compilers often share
    backends, but the two languages have different grammars. The gcc
    command, for example, can invoke either a C or C++ compiler, but it
    knows which language it's compiling based on the source file name or
    command line options, before it's even seen the content.

    There are programs that are valid C and valid C++ but with different behavior. How would a compiler that behaves as you describe cope with
    that?
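
    One well-known instance of code that is valid in both languages but behaves differently is the size of a character constant. The sketch below uses hypothetical function names and assumes a platform where sizeof(int) is greater than 1.

```c
#include <stddef.h>

/*
 * In C a character constant such as 'a' has type int; in C++ it has type
 * char. The same expression therefore yields different values depending on
 * which language the translation unit is compiled as.
 */
size_t char_constant_size(void)
{
    return sizeof 'a';
}

/* Returns 1 when built as C on platforms where sizeof(int) > 1, else 0. */
int built_as_c(void)
{
    return char_constant_size() == sizeof(int) && sizeof(int) > 1;
}
```

    Compiled as C++, the same source would make char_constant_size() return 1, so a compiler could not pick the language by checking whether the file merely parses.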


    For example, g++ does something similar: if you pass it a .C file it
    compiles the C code, but if the file (.C) contains C++ code then it
    compiles it as C++.

  • From bart@21:1/5 to David Brown on Sun May 26 12:51:12 2024
    On 26/05/2024 12:09, David Brown wrote:
    On 26/05/2024 00:58, Keith Thompson wrote:

    For a very large file, that could be a significant burden.  (I don't
    have any numbers on that.)

    I do :

    <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#design-efficiency-metrics>

    (That's from a proposal for #embed for C and C++.  Generating the
    numbers and parsing them is akin to using xxd.)

    More useful links:

    <https://thephd.dev/embed-the-details#results> <https://thephd.dev/implementing-embed-c-and-c++>

    (These are from someone who did a lot of the work for the proposals, and prototype implementations, as far as I understand it.)



    Note that I can't say how much of a difference this will make in real
    life.  I don't know how often people need to include multi-megabyte
    files in their code.  It certainly is not at a level where I would
    change any of my existing projects from external generator scripts to
    using #embed, but I might use it in future projects.

    I've just done my own quick test (not in C, using embed in my language):

    []byte clangexe = binclude("f:/llvm/bin/clang.exe")

    proc main=
    fprintln "clang.exe is # bytes", clangexe.len
    end


    This embeds the Clang C compiler which is 119MB. It took 1.3 seconds to
    compile (note my compiler is not optimised).

    If I tried it using text: a 121M-line include file, with one number per
    line, it took 144 seconds (I believe it used more RAM than was
    available: each line will have occupied a 64-byte AST node, so nearly
    8GB, on a machine with only 6GB available RAM, much of which was occupied).

    The figures at your link say it took 1 second for a 40MB test file, on
    an Intel i7 with 24GB.

    My compiler took just over 1.3 seconds (now annoyingly taking 1.4
    seconds for a retest) for a file nearly 3 times bigger, on a much more
    lowly machine (second cheapest PC in the shop), with 8GB.

    So my implementation sounds faster. Of course, those 120M data bytes
    haven't been optimised!

    As for usage, this would be a tidy way of bundling a program like a C
    compiler if your program required it, although there are a number of alternatives in that case: the binary here doesn't need to exist in the application's data space.

  • From Michael S@21:1/5 to jak on Sun May 26 15:39:13 2024
    On Sun, 26 May 2024 13:44:32 +0200
    jak <[email protected]> wrote:

    Keith Thompson ha scritto:
    jak <[email protected]> writes:
    Kaz Kylheku ha scritto:
    On 2024-05-24, jak <[email protected]> wrote:
    Bonita Montero ha scritto:
    Am 23.05.2024 um 21:49 schrieb Thiago Adams:
    On 23/05/2024 16:25, Bonita Montero wrote:
    I ask myself what the point is in further developing a
    language like this that can actually no longer be saved.
    do you mean C++?


    No, C.

    I think you have a lot of confusion about programming languages.
    C and C++ are not comparable languages.
    Except for observations like that we can write useful, production
    software that compiles as C or C++, but go on ...

    Indeed there are c++ compilers who, if used to compile c code,
    could decide to call the c compiler to do the work, but if
    something in the code is not strictly c, then the compilation will
    be in c++, the size of the executable will increase significantly
    and will need of an internal or external runtimer to work. If it
    were the same thing you would not get different things.

    Oh? Do you know of a C++ compiler that actually behaves this way?
    I've never heard of such a thing.

    C and C++ are closely related, and C and C++ compilers often share backends, but the two languages have different grammars. The gcc
    command, for example, can invoke either a C or C++ compiler, but it
    knows which language it's compiling based on the source file name or command line options, before it's even seen the content.

    There are programs that are valid C and valid C++ but with different behavior. How would a compiler that behaves as you describe cope
    with that?


    For example g++ makes something similar: if you pass a file .C it
    compile the C code but if the file (.C) contains C++ code then
    compile C++.


    No, it does not.
    g++ compiles as C++ unless you tell it to compile as C with '-x c'
    option.

  • From Michael S@21:1/5 to bart on Sun May 26 16:18:32 2024
    On Sun, 26 May 2024 12:51:12 +0100
    bart <[email protected]> wrote:

    On 26/05/2024 12:09, David Brown wrote:
    On 26/05/2024 00:58, Keith Thompson wrote:

    For a very large file, that could be a significant burden.  (I
    don't have any numbers on that.)

    I do :

    <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#design-efficiency-metrics>

    (That's from a proposal for #embed for C and C++.  Generating the
    numbers and parsing them is akin to using xxd.)

    More useful links:

    <https://thephd.dev/embed-the-details#results> <https://thephd.dev/implementing-embed-c-and-c++>

    (These are from someone who did a lot of the work for the
    proposals, and prototype implementations, as far as I understand
    it.)



    Note that I can't say how much of a difference this will make in
    real life.  I don't know how often people need to include
    multi-megabyte files in their code.  It certainly is not at a level
    where I would change any of my existing projects from external
    generator scripts to using #embed, but I might use it in future
    projects.

    I've just done my own quick test (not in C, using embed in my
    language):

    []byte clangexe = binclude("f:/llvm/bin/clang.exe")

    proc main=
    fprintln "clang.exe is # bytes", clangexe.len
    end


    This embeds the Clang C compiler which is 119MB. It took 1.3 seconds
    to compile (note my compiler is not optimised).

    If I tried it using text: a 121M-line include file, with one number
    per line, it took 144 seconds (I believe it used more RAM than was available: each line will have occupied a 64-byte AST node, so nearly
    8GB, on a machine with only 6GB available RAM, much of which was
    occupied).

    On my old PC, which was not the cheapest box in the shop but is more
    than 10 years old, compilation speed for similarly organized (but much
    smaller) text files is as follows:
    MSVC 18.00.31101 (VS 2013) - 1950 KB/sec
    MSVC 19.16.27032 (VS 2017) - 1180 KB/sec
    MSVC 19.20.27500 (VS 2019) - 1180 KB/sec
    clang 17.0.6 - 547 KB/sec (somewhat better with hex text)
    gcc 13.2.0 - 580 KB/sec

    So, MSVC compilers, esp. an old one, are somewhat faster than yours.
    But if there was swapping involved it's not comparable. How much time
    does it take for your compiler to produce 5MB byte array from text?


    The figures at your link say it took 1 second for a 40MB test file,
    on an Intel i7 with 24GB.

    My compiler took just over 1.3 seconds (now annoyingly taking 1.4
    seconds for a retest) for a file nearly 3 times bigger, on a much
    more lowly machine (second cheapest PC in the shop), with 8GB.

    So my implementation sounds faster. Of course, those 120M data bytes
    haven't been optimised!


    But both are much faster than compiling through text. Even "slow"
    40MB/3 is 6-7 times faster than the fastest of compilers in my tests.

    As for usage, this would be a tidy way of bundling a program like a C compiler if your program required it, although there are a number of alternatives in that case: the binary here doesn't need to exist in
    the application's data space.


  • From jak@21:1/5 to All on Sun May 26 15:46:33 2024
    Michael S ha scritto:
    On Sun, 26 May 2024 13:44:32 +0200
    jak <[email protected]> wrote:

    Keith Thompson ha scritto:
    jak <[email protected]> writes:
    Kaz Kylheku ha scritto:
    On 2024-05-24, jak <[email protected]> wrote:
    Bonita Montero ha scritto:
    Am 23.05.2024 um 21:49 schrieb Thiago Adams:
    On 23/05/2024 16:25, Bonita Montero wrote:
    I ask myself what the point is in further developing a
    language like this that can actually no longer be saved.
    do you mean C++?


    No, C.

    I think you have a lot of confusion about programming languages.
    C and C++ are not comparable languages.
    Except for observations like that we can write useful, production
    software that compiles as C or C++, but go on ...

    Indeed there are c++ compilers who, if used to compile c code,
    could decide to call the c compiler to do the work, but if
    something in the code is not strictly c, then the compilation will
    be in c++, the size of the executable will increase significantly
    and will need of an internal or external runtimer to work. If it
    were the same thing you would not get different things.

    Oh? Do you know of a C++ compiler that actually behaves this way?
    I've never heard of such a thing.

    C and C++ are closely related, and C and C++ compilers often share
    backends, but the two languages have different grammars. The gcc
    command, for example, can invoke either a C or C++ compiler, but it
    knows which language it's compiling based on the source file name or
    command line options, before it's even seen the content.

    There are programs that are valid C and valid C++ but with different
    behavior. How would a compiler that behaves as you describe cope
    with that?


    For example g++ makes something similar: if you pass a file .C it
    compile the C code but if the file (.C) contains C++ code then
    compile C++.


    No, it does not.
    g++ compiles as C++ unless you tell it to compile as C with '-x c'
    option.




    You didn't read carefully, or I didn't express myself well. I wrote that
    g++ compiles C++ even if it is written inside a .c file.
    However, being in doubt, I preferred to try it. If I pass g++ a .c file
    that contains C code, it compiles it without any option, perhaps because
    it reads it as if it were C++, but in any case it compiles it.

  • From David Brown@21:1/5 to Keith Thompson on Sun May 26 16:18:17 2024
    On 26/05/2024 01:45, Keith Thompson wrote:
    David Brown <[email protected]> writes:
    [...]
    The normal way for multi-threaded systems is to implement it as a
    macro. It might be, for example :

    #define errno __thread_data->_errno

    or

    #define errno *errno()

    Both of those need more parentheses -- and I'm uncomfortable using the
    same identifier for the macro and the function.


    The second example was from the footnote in the C standard's section
    on <errno.h>, so it can't be /that/ bad!

    But I agree with your discomfort.

    That is precisely why it is specified in the C standards as a macro,
    not an external linkage object with static or thread-local storage
    duration. (The use of errno in multi-threading C code long predates
    C11 and _Thread_local.)
    [...]

    glibc and musl both have :

    # define errno (*__errno_location ())

    newlib (used on Cygwin) has something similar :

    #define errno (*__errno())
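
    A minimal sketch of how a definition of this shape behaves, assuming a
    C11 compiler with _Thread_local support. The names my_errno,
    my_errno_location and my_errno_storage are hypothetical, chosen so as
    not to clash with the real <errno.h>; this is not a claim about how any
    particular libc implements it.

```c
#include <assert.h>

/* Hypothetical names throughout; nothing here is a real libc symbol. */
static _Thread_local int my_errno_storage;   /* one copy per thread (C11) */

static int *my_errno_location(void)
{
    return &my_errno_storage;
}

/* The outer parentheses keep the expansion a single primary expression. */
#define my_errno (*my_errno_location())

static int my_errno_demo(void)
{
    my_errno = 5;    /* expands to (*my_errno_location()) = 5; */
    my_errno++;      /* would not compile as *my_errno_location()++ */
    assert(&my_errno == my_errno_location());
    return my_errno;
}
```

    The outer parentheses are what make expressions such as my_errno++
    parse as intended, which is the point about the two unparenthesised
    examples needing more parentheses.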


  • From Michael S@21:1/5 to jak on Sun May 26 17:20:30 2024
    On Sun, 26 May 2024 15:46:33 +0200
    jak <[email protected]> wrote:

    Michael S wrote:
    On Sun, 26 May 2024 13:44:32 +0200
    jak <[email protected]> wrote:

    Keith Thompson wrote:
    jak <[email protected]> writes:
    Kaz Kylheku wrote:
    On 2024-05-24, jak <[email protected]> wrote:
    Bonita Montero wrote:
    On 23.05.2024 at 21:49, Thiago Adams wrote:
    On 23/05/2024 16:25, Bonita Montero wrote:
    I ask myself what the point is in further developing a
    language like this that can actually no longer be saved.
    do you mean C++?


    No, C.

    I think you have a lot of confusion about programming
    languages. C and C++ are not comparable languages.
    Except for observations like that we can write useful,
    production software that compiles as C or C++, but go on ...

    Indeed there are c++ compilers who, if used to compile c code,
    could decide to call the c compiler to do the work, but if
    something in the code is not strictly c, then the compilation
    will be in c++, the size of the executable will increase
    significantly and will need of an internal or external runtimer
    to work. If it were the same thing you would not get different
    things.

    Oh? Do you know of a C++ compiler that actually behaves this way?
    I've never heard of such a thing.

    C and C++ are closely related, and C and C++ compilers often share
    backends, but the two languages have different grammars. The gcc
    command, for example, can invoke either a C or C++ compiler, but
    it knows which language it's compiling based on the source file
    name or command line options, before it's even seen the content.

    There are programs that are valid C and valid C++ but with
    different behavior. How would a compiler that behaves as you
    describe cope with that?


    For example g++ makes something similar: if you pass a file .C it
    compile the C code but if the file (.C) contains C++ code then
    compile C++.


    No, it does not.
    g++ compiles as C++ unless you tell it to compile as C with '-x c'
    option.




    You didn't read carefully or I didn't express myself well. I wrote
    that the g++ compile c++ even if it is written inside a .c file.
    However in doubt I preferred to try. If I pass to g++ a .c file that
    contains c code, it compiles without any option, perhaps because it
    reads as if it were c++ but in any case compiles it.


    It is easy to see that it was compiled as C++ rather than as c.
    Look at the content of the generated object with 'objdump -d'.
    You will see that the names of global functions and variables are
    mangled.

  • From David Brown@21:1/5 to Keith Thompson on Sun May 26 16:15:41 2024
    On 26/05/2024 01:21, Keith Thompson wrote:
    David Brown <[email protected]> writes:
    On 24/05/2024 21:29, Keith Thompson wrote:
    [...]
    "static_assert" is already a macro defined in <assert.h> starting in
    C11. The above code is valid in pre-C23, but will break in C11 and C17
    if it includes <assert.h> directly or indirectly.

    Yes. But including <assert.h> is optional.

    Your header that defines your own "static_assert" macro might depend on
    some other header outside your control. A future version of that other
    header might add a "#include <assert.h>", breaking your code.


    I believe - but am not entirely sure - that the standard library headers
    are not allowed to include each other, precisely so that there will not
    be conflicts between user-defined identifiers and standard library
    identifiers from headers that you did not explicitly include.

    I appreciate what you are saying, and it can often make sense for other
    people. But in /my/ code, there is no possibility of future versions of
    headers having other includes. In my projects, I consider the entire
    toolchain to be part of the project, along with any other libraries or
    SDKs. Surprises like that don't happen when I am working on a project
    - nor when I take the same project out of archives and rebuild it 20
    years later to get exactly the same binary, nor when anyone else does
    that. Reproducible builds are vital to my work.

    Of course, if I re-use the same code in a different project with
    different toolchains or libraries, such issues could crop up - but they
    are easily spotted and handled at the time.

    There are solutions (check "#ifdef static_assert" for the macro and
    __STDC_VERSION__ for the keyword, etc.)


    Indeed.

    Perhaps it's not an issue for you, but it's a corner case to keep in
    mind.


    It is not an issue for me, no - but I agree that it can be an issue for
    some people, and I agree it is worth keeping in mind. I am not
    suggesting that defining your own static_assert macro is a good idea for
    general use - I was merely saying that /I/ had used it as a temporary
    measure before C11 (and C++11) became practical for the majority of work
    I did, and that it could have compatibility issues when moving to C23.
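
    The checks mentioned above can be sketched roughly as follows. This is
    only an illustration, not the macro actually used in the projects
    described: MY_SA_CAT, MY_SA_CAT2 and int_width_bits are hypothetical
    names, and the pre-C11 fallback (a typedef with a negative array size)
    is one common trick among several.

```c
#include <assert.h>   /* in C11 and C17 this defines the static_assert macro */
#include <limits.h>

#if defined(__cplusplus) \
    || (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L) \
    || defined(static_assert)
/* Keyword (C23, C++11) or <assert.h> macro (C11/C17): leave it alone. */
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
/* C11/C17 without <assert.h>: map onto the _Static_assert keyword. */
#define static_assert _Static_assert
#else
/* Pre-C11 fallback: a negative array size forces a diagnostic. */
#define MY_SA_CAT2(a, b) a##b
#define MY_SA_CAT(a, b) MY_SA_CAT2(a, b)
#define static_assert(cond, msg) \
    typedef char MY_SA_CAT(my_static_assert_at_line_, __LINE__)[(cond) ? 1 : -1]
#endif

static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

/* Width of int in bits, relying on the assumption checked above. */
static int int_width_bits(void)
{
    return (int)(sizeof(int) * 8);
}
```

    Because the fallback is only defined when neither the keyword nor the
    <assert.h> macro is present, the same header should keep working
    whether or not some other header pulls in <assert.h>.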

  • From David Brown@21:1/5 to jak on Sun May 26 16:29:35 2024
    On 26/05/2024 15:46, jak wrote:
    Michael S wrote:
    On Sun, 26 May 2024 13:44:32 +0200
    jak <[email protected]> wrote:

    Keith Thompson wrote:
    jak <[email protected]> writes:
    Kaz Kylheku wrote:
    On 2024-05-24, jak <[email protected]> wrote:
    Bonita Montero wrote:
    On 23.05.2024 at 21:49, Thiago Adams wrote:
    On 23/05/2024 16:25, Bonita Montero wrote:
    I ask myself what the point is in further developing a
    language like this that can actually no longer be saved.
    do you mean C++?

    No, C.

    I think you have a lot of confusion about programming languages.
    C and C++ are not comparable languages.
    Except for observations like that we can write useful, production
    software that compiles as C or C++, but go on ...

    Indeed there are c++ compilers who, if used to compile c code,
    could decide to call the c compiler to do the work, but if
    something in the code is not strictly c, then the compilation will
    be in c++, the size of the executable will increase significantly
    and will need of an internal or external runtimer to work. If it
    were the same thing you would not get different things.

    Oh?  Do you know of a C++ compiler that actually behaves this way?
    I've never heard of such a thing.

    C and C++ are closely related, and C and C++ compilers often share
    backends, but the two languages have different grammars.  The gcc
    command, for example, can invoke either a C or C++ compiler, but it
    knows which language it's compiling based on the source file name or
    command line options, before it's even seen the content.

    There are programs that are valid C and valid C++ but with different
    behavior.  How would a compiler that behaves as you describe cope
    with that?

    For example g++ makes something similar: if you pass a file .C it
    compile the C code but if the file (.C) contains C++ code then
    compile C++.


    No.


    No, it does not.
    g++ compiles as C++ unless you tell it to compile as C with '-x c'
    option.


    No.




    You didn't read carefully or I didn't express myself well. I wrote that
    the g++ compile c++ even if it is written inside a .c file.
    However in doubt I preferred to try. If I pass to g++ a .c file that
    contains c code, it compiles without any option, perhaps because it
    reads as if it were c++ but in any case compiles it.


    No.


    The way gcc handles all this is actually quite straightforward.

    First, there is no difference between the commands "gcc" and "g++" in
    the languages supported, or the way the language is determined. The
    only difference between these two is the standard libraries linked by
    default when generating a final executable - "g++" automatically
    includes the C++ standard libraries, while "gcc" only has the C
    standard libraries.

    In neither case does "gcc" or "g++" actually handle the compilation -
    these are driver front-ends that pass things on to the actual
    compilers, assemblers and linkers (and any other bits and pieces
    required).

    The front-ends determine the language to use primarily from the suffix
    of the source file it is given. ".c" files are compiled as C. ".cpp",
    ".c++", ".cc", ".C" (note the capital C), ".cp", ".cxx", and ".CPP" are
    compiled as C++. (There are many other extensions supported for
    different languages.)

    The language choice can be overridden by using the "-x" switch, such as
    "-x c" or "-x c++". The standard can be specified with "-std=".

    There is no automatic detection of C or C++ based on the /content/ of
    the files.


    <https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html>

  • From Michael S@21:1/5 to David Brown on Sun May 26 18:05:31 2024
    On Sun, 26 May 2024 16:29:35 +0200
    David Brown <[email protected]> wrote:

    On 26/05/2024 15:46, jak wrote:
    Michael S wrote:
    On Sun, 26 May 2024 13:44:32 +0200
    jak <[email protected]> wrote:

    Keith Thompson wrote:
    jak <[email protected]> writes:
    Kaz Kylheku wrote:
    On 2024-05-24, jak <[email protected]> wrote:
    Bonita Montero wrote:
    On 23.05.2024 at 21:49, Thiago Adams wrote:
    On 23/05/2024 16:25, Bonita Montero wrote:
    I ask myself what the point is in further developing a
    language like this that can actually no longer be saved.
    do you mean C++?

    No, C.

    I think you have a lot of confusion about programming
    languages. C and C++ are not comparable languages.
    Except for observations like that we can write useful,
    production software that compiles as C or C++, but go on ...

    Indeed there are c++ compilers who, if used to compile c code,
    could decide to call the c compiler to do the work, but if
    something in the code is not strictly c, then the compilation
    will be in c++, the size of the executable will increase
    significantly and will need of an internal or external runtimer
    to work. If it were the same thing you would not get different
    things.

    Oh?  Do you know of a C++ compiler that actually behaves this
    way? I've never heard of such a thing.

    C and C++ are closely related, and C and C++ compilers often
    share backends, but the two languages have different grammars.
    The gcc command, for example, can invoke either a C or C++
    compiler, but it knows which language it's compiling based on
    the source file name or command line options, before it's even
    seen the content.

    There are programs that are valid C and valid C++ but with
    different behavior.  How would a compiler that behaves as you
    describe cope with that?

    For example g++ makes something similar: if you pass a file .C it
    compile the C code but if the file (.C) contains C++ code then
    compile C++.


    No.


    No, it does not.
    g++ compiles as C++ unless you tell it to compile as C with '-x c'
    option.


    No.




    You didn't read carefully or I didn't express myself well. I wrote
    that the g++ compile c++ even if it is written inside a .c file.
    However in doubt I preferred to try. If I pass to g++ a .c file that
    contains c code, it compiles without any option, perhaps because it
    reads as if it were c++ but in any case compiles it.


    No.


    The way gcc handles all this is actually quite straightforward.

    First, there is no difference between the commands "gcc" and "g++" in
    the languages supported, or the way the language is determined. The
    only difference between these two is the standard libraries linked by
    default when generating a final executable - "g++" automatically
    includes the C++ standard libraries, while "gcc" only has the C
    standard libraries.

    In neither case does "gcc" or "g++" actually handle the compilation -
    these are driver front-ends that pass things on to the actual
    compilers, assemblers and linkers (and any other bits and pieces
    required).


    I don't know how it works in your environment.
    I am 100% sure that it works like I wrote above in my environment. Specifically:
    'g++ -c foo.c' calls binary cc1plus.exe
    'g++ -c -x c foo.c' calls binary cc1.exe
    'gcc -c foo.c' calls binary cc1.exe
    'gcc -c foo.cpp' calls binary cc1plus.exe
    'gcc -c foo.C' calls binary cc1plus.exe


    The front-ends determine the language to use primarily from the
    suffix of the source file it is given. ".c" files are compiled as C.
    ".cpp", ".c++", ".cc", ".C" (note the capital C), ".cp", ".cxx", and
    ".CPP" are compiled as C++. (There are many other extensions
    supported for different languages.)


    In my environment that applies to gcc, but not to g++.
    To make my g++ compile a file as another language, you have to tell
    it so explicitly.

    The language choice can be overridden by using the "-x" switch, such
    as "-x c" or "-x c++". The standard can be specified with "-std=".


    Yes, of course.

    There is no automatic detection of C or C++ based on the /content/ of
    the files.


    Yes, of course.


    <https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html>


  • From jak@21:1/5 to All on Sun May 26 17:10:01 2024
    David Brown wrote:
    On 26/05/2024 15:46, jak wrote:
    Michael S wrote:
    On Sun, 26 May 2024 13:44:32 +0200
    jak <[email protected]> wrote:

    Keith Thompson wrote:
    jak <[email protected]> writes:
    Kaz Kylheku wrote:
    On 2024-05-24, jak <[email protected]> wrote:
    Bonita Montero wrote:
    On 23.05.2024 at 21:49, Thiago Adams wrote:
    On 23/05/2024 16:25, Bonita Montero wrote:
    I ask myself what the point is in further developing a
    language like this that can actually no longer be saved.
    do you mean C++?

    No, C.

    I think you have a lot of confusion about programming languages.
    C and C++ are not comparable languages.
    Except for observations like that we can write useful, production
    software that compiles as C or C++, but go on ...

    Indeed there are c++ compilers who, if used to compile c code,
    could decide to call the c compiler to do the work, but if
    something in the code is not strictly c, then the compilation will
    be in c++, the size of the executable will increase significantly
    and will need of an internal or external runtimer to work. If it
    were the same thing you would not get different things.

    Oh?  Do you know of a C++ compiler that actually behaves this way?
    I've never heard of such a thing.

    C and C++ are closely related, and C and C++ compilers often share
    backends, but the two languages have different grammars.  The gcc
    command, for example, can invoke either a C or C++ compiler, but it
    knows which language it's compiling based on the source file name or
    command line options, before it's even seen the content.

    There are programs that are valid C and valid C++ but with different
    behavior. How would a compiler that behaves as you describe cope
    with that?

    For example g++ makes something similar: if you pass a file .C it
    compile the C code but if the file (.C) contains C++ code then
    compile C++.


    No.


    No, it does not.
    g++ compiles as C++ unless you tell it to compile as C with '-x c'
    option.


    No.




    You didn't read carefully or I didn't express myself well. I wrote that
    the g++ compile c++ even if it is written inside a .c file.
    However in doubt I preferred to try. If I pass to g++ a .c file that
    contains c code, it compiles without any option, perhaps because it
    reads as if it were c++ but in any case compiles it.


    No.


    The way gcc handles all this is actually quite straightforward.

    First, there is no difference between the commands "gcc" and "g++" in
    the languages supported, or the way the language is determined.  The
    only difference between these two is the standard libraries linked by
    default when generating a final executable - "g++" automatically
    includes the C++ standard libraries, while "gcc" only has the C
    standard libraries.

    In neither case does "gcc" or "g++" actually handle the compilation -
    these are driver front-ends that pass things on to the actual
    compilers, assemblers and linkers (and any other bits and pieces
    required).

    The front-ends determine the language to use primarily from the suffix
    of the source file it is given. ".c" files are compiled as C. ".cpp",
    ".c++", ".cc", ".C" (note the capital C), ".cp", ".cxx", and ".CPP" are
    compiled as C++. (There are many other extensions supported for
    different languages.)

    The language choice can be overridden by using the "-x" switch, such as
    "-x c" or "-x c++".  The standard can be specified with "-std=".

    There is no automatic detection of C or C++ based on the /content/ of
    the files.


    <https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html>



    ?
    I really wrote that something similar (similar != equal) did g++ and
    that, if you write c++ code in a file with the .c extension, the g++
    compile it. I never wrote that it was automatically recognized.
    In addition, you just explained why g++ compile a .c that contains c++
    code. I don't understand: no what?

  • From Michael S@21:1/5 to jak on Sun May 26 18:23:48 2024
    On Sun, 26 May 2024 17:10:01 +0200
    jak <[email protected]> wrote:


    ?
    I really wrote that something similar (similar != equal) did g++ and
    that, if you write c++ code in a file with the .c extension, the g++
    compile it. I never wrote that it was automatically recognized.
    In addition, you just explained why g++ compile a .c that contains c++
    code. I don't understand: no what?


    Your English is already harder to understand than mine.
    Congratulations, that is no small feat. But you still have far to
    go. Keep practising.

  • From bart@21:1/5 to Michael S on Sun May 26 16:25:51 2024
    On 26/05/2024 14:18, Michael S wrote:
    On Sun, 26 May 2024 12:51:12 +0100
    bart <[email protected]> wrote:

    On 26/05/2024 12:09, David Brown wrote:
    On 26/05/2024 00:58, Keith Thompson wrote:

    For a very large file, that could be a significant burden.  (I
    don't have any numbers on that.)

    I do :

    <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#design-efficiency-metrics>

    (That's from a proposal for #embed for C and C++.  Generating the
    numbers and parsing them is akin to using xxd.)

    More useful links:

    <https://thephd.dev/embed-the-details#results>
    <https://thephd.dev/implementing-embed-c-and-c++>

    (These are from someone who did a lot of the work for the
    proposals, and prototype implementations, as far as I understand
    it.)



    Note that I can't say how much of a difference this will make in
    real life.  I don't know how often people need to include
    multi-megabyte files in their code.  It certainly is not at a level
    where I would change any of my existing projects from external
    generator scripts to using #embed, but I might use it in future
    projects.

    I've just done my own quick test (not in C, using embed in my
    language):

    []byte clangexe = binclude("f:/llvm/bin/clang.exe")

    proc main=
    fprintln "clang.exe is # bytes", clangexe.len
    end


    This embeds the Clang C compiler which is 119MB. It took 1.3 seconds
    to compile (note my compiler is not optimised).

    If I tried it using text: a 121M-line include file, with one number
    per line, it took 144 seconds (I believe it used more RAM than was
    available: each line will have occupied a 64-byte AST node, so nearly
    8GB, on a machine with only 6GB available RAM, much of which was
    occupied).

    On my old PC, which was not the cheapest box in the shop but is more
    than 10 years old, compilation speed for similarly organized (but much
    smaller) text files is as follows:
    MSVC 18.00.31101 (VS 2013) - 1950 KB/sec
    MSVC 19.16.27032 (VS 2017) - 1180 KB/sec
    MSVC 19.20.27500 (VS 2019) - 1180 KB/sec
    clang 17.0.6 - 547 KB/sec (somewhat better with hex text)
    gcc 13.2.0 - 580 KB/sec

    So, MSVC compilers, esp. an old one, are somewhat faster than yours.
    But if there was swapping involved it's not comparable. How much time
    does it take for your compiler to produce 5MB byte array from text?

    Are you talking about a 5MB array initialised like this:

    unsigned char data[] = {
    45,
    67,
    17,
    ... // 5M-3 more rows
    };

    The timing for 120M entries was challenging as it exceeded physical
    memory. However that test I can also do with C compilers. Results for
    120 million lines of data are:

    DMC        - Out of memory
    Tiny C     - Silently stopped after 13 seconds (I thought it had
                 finished, but no)
    lccwin32   - Insufficient memory
    gcc 10.x.x - Out of memory after 80 seconds
    mcc        - (My product) Memory failure after 27 seconds
    Clang      - Crashed after 5 minutes
    MM         - 144s (Compiler for my language)

    So the compiler for my language did quite well, considering!


    Back to the 5MB test:

    Tiny C     1.7s    2.9MB/sec   (Tcc doesn't use any IR)
    mcc        3.7s    1.3MB/sec   (my product; uses intermediate ASM)
    DMC        --      --          (Out of memory; 32-bit compiler)
    lccwin32   3.9s    1.3MB/sec
    gcc 10.x   10.6s   0.5MB/sec
    clang      7.4s    0.7MB/sec   (to object file only)
    MM         1.4s    3.6MB/sec   (compiler for my language)
    MM         0.7s    7.1MB/sec   (MM optimised via C and gcc-O3)

    As a reminder, when using my version of 'embed' in my language,
    embedding a 120MB binary file took 1.3 seconds, about 90MB/second.


    But both are much faster than compiling through text. Even "slow"
    40MB/3 is 6-7 times faster than the fastest of compilers in my tests.

    Do you have a C compiler that supports #embed?

    It's generally understood that processing text is slow when it
    represents data one byte at a time. If byte arrays could be represented
    as sequences of i64 constants, it would improve matters. That could be
    done in C, but awkwardly, by aliasing a byte-array with an i64-array.
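
    A rough sketch of that aliasing idea, assuming C11, 8-bit bytes and a
    little-endian host (on a big-endian machine the bytes within each
    constant would come out reversed). The names packed_words,
    packed_bytes, packed_size and host_is_little_endian are hypothetical,
    and the two constants stand in for what a generator script would emit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static_assert(sizeof(uint64_t) == 8, "sketch assumes 8-bit bytes");

/* Eight data bytes packed into each constant. On a little-endian host the
   lowest-order byte of a constant is the first byte in memory. */
static const uint64_t packed_words[] = {
    UINT64_C(0x0807060504030201),
    UINT64_C(0x100F0E0D0C0B0A09),
};

/* Reading any object through an unsigned char pointer is permitted
   by the aliasing rules, so this view of the data is well defined. */
static const unsigned char *packed_bytes(void)
{
    return (const unsigned char *)packed_words;
}

static size_t packed_size(void)
{
    return sizeof packed_words;
}

/* Run-time check of the byte-order assumption made above. */
static int host_is_little_endian(void)
{
    const uint64_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```

    The compiler then parses one token per eight bytes instead of one per
    byte, at the cost of an endianness dependency and of padding the data
    to a multiple of eight bytes.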

  • From David Brown@21:1/5 to BGB on Sun May 26 18:12:17 2024
    On 26/05/2024 16:48, BGB wrote:
    On 5/26/2024 9:18 AM, David Brown wrote:
    On 26/05/2024 01:45, Keith Thompson wrote:
    David Brown <[email protected]> writes:
    [...]
    The normal way for multi-threaded systems is to implement it as a
    macro.   It might be, for example :

        #define errno __thread_data->_errno

    or

        #define errno *errno()

    Both of those need more parentheses -- and I'm uncomfortable using the
    same identifier for the macro and the function.


    The second example was from the footnote in the C standard's section
    on <errno.h>, so it can't be /that/ bad!

    But I agree with your discomfort.


    I would expect it to immediately explode, because AFAIK the usual
    preprocessor behavior is to keep expanding macros in a line until there
    is nothing left to expand.

    Well, granted, it is possible I could have misinterpreted how it was
    supposed to work and had never noticed...



    I think you did misinterpret. Macros in C are not recursive: while a
    macro is being expanded, its own name is not expanded again inside the
    result. That stops them exploding, but also means there's a lot you
    can't do with the preprocessor.
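
    A small sketch of that rule, using the same identifier for a function
    and a macro as in the footnote's example. The names counter,
    counter_storage and counter_demo are hypothetical.

```c
#include <assert.h>

static int counter_storage;

/* The function has to be defined before the macro of the same name,
   otherwise its own declaration would be macro-expanded. */
static int *counter(void)
{
    return &counter_storage;
}

/* Inside its own replacement list "counter" is not expanded again,
   so the expansion stops after one step. */
#define counter (*counter())

static int counter_demo(void)
{
    counter = 41;    /* becomes (*counter()) = 41; */
    counter += 1;
    assert(&counter == &counter_storage);
    return counter;
}
```

    After the #define, every use of counter is replaced exactly once and
    the inner counter() is left as an ordinary function call.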

  • From David Brown@21:1/5 to Michael S on Sun May 26 18:26:49 2024
    On 26/05/2024 17:05, Michael S wrote:
    On Sun, 26 May 2024 16:29:35 +0200
    David Brown <[email protected]> wrote:

    On 26/05/2024 15:46, jak wrote:
    Michael S wrote:
    On Sun, 26 May 2024 13:44:32 +0200
    jak <[email protected]> wrote:

    Keith Thompson wrote:
    jak <[email protected]> writes:
    Kaz Kylheku wrote:
    On 2024-05-24, jak <[email protected]> wrote:
    Bonita Montero wrote:
    On 23.05.2024 at 21:49, Thiago Adams wrote:
    On 23/05/2024 16:25, Bonita Montero wrote:
    I ask myself what the point is in further developing a
    language like this that can actually no longer be saved.
    do you mean C++?

    No, C.

    I think you have a lot of confusion about programming
    languages. C and C++ are not comparable languages.
    Except for observations like that we can write useful,
    production software that compiles as C or C++, but go on ...

    Indeed there are c++ compilers who, if used to compile c code,
    could decide to call the c compiler to do the work, but if
    something in the code is not strictly c, then the compilation
    will be in c++, the size of the executable will increase
    significantly and will need of an internal or external runtimer
    to work. If it were the same thing you would not get different
    things.

    Oh?  Do you know of a C++ compiler that actually behaves this
    way? I've never heard of such a thing.

    C and C++ are closely related, and C and C++ compilers often
    share backends, but the two languages have different grammars.
    The gcc command, for example, can invoke either a C or C++
    compiler, but it knows which language it's compiling based on
    the source file name or command line options, before it's even
    seen the content.

    There are programs that are valid C and valid C++ but with
    different behavior.  How would a compiler that behaves as you
    describe cope with that?

    For example g++ makes something similar: if you pass a file .C it
    compile the C code but if the file (.C) contains C++ code then
    compile C++.


    No.


    No, it does not.
    g++ compiles as C++ unless you tell it to compile as C with '-x c'
    option.


    No.




    You didn't read carefully or I didn't express myself well. I wrote
    that the g++ compile c++ even if it is written inside a .c file.
    However in doubt I preferred to try. If I pass to g++ a .c file that
    contains c code, it compiles without any option, perhaps because it
    reads as if it were c++ but in any case compiles it.


    No.


    The way gcc handles all this is actually quite straightforward.

    First, there is no difference between the commands "gcc" and "g++" in
    the languages supported, or the way the language is determined. The
    only difference between these two is the standard libraries linked by
    default when generating a final executable - "g++" automatically
    includes the C++ standard libraries, while "gcc" only has the C
    standard libraries.

    In neither case does "gcc" or "g++" actually handle the compilation -
    these are driver front-ends that pass things on to the actual
    compilers, assemblers and linkers (and any other bits and pieces
    required).


    I don't know how it works in your environment.
    I am 100% sure that it works like I wrote above in my environment. Specifically:
    'g++ -c foo.c' calls binary cc1plus.exe

    My apologies - you are correct.

    g++ does indeed treat ".c" (and ".h" and ".i") files as C++, unless
    overridden. (This applies only to those file extensions - Fortran, Ada,
    Assembly, linker, etc., files are treated just like with gcc.)

    'g++ -c -x c foo.c' calls binary cc1.exe
    'gcc -c foo.c' calls binary cc1.exe
    'gcc -c foo.cpp' calls binary cc1plus.exe
    'gcc -c foo.C' calls binary cc1plus.exe


    Yes, of course.


    The front-ends determine the language to use primarily from the
    suffix of the source file it is given. ".c" files are compiled as C.
    ".cpp", ".c++", ".cc", ".C" (note the capital C), ".cp", ".cxx", and
    ".CPP" are compiled as C++. (There are many other extensions
    supported for different languages.)


    In my environment that applies to gcc, but not to g++.
    To make my g++ compile a file as another language, you have to tell
    it so explicitly.

    No, g++ treats extensions other than ".c" the same way as gcc. (I
    tested to be sure this time!) Try :

    touch foo.f
    gcc foo.f
    g++ foo.f

    You'll get the same complaint - either from missing Fortran support or a
    failure to build the Fortran program. Even "g++ foo.m" tries to compile
    as Objective-C, not Objective-C++.

    <https://gcc.gnu.org/onlinedocs/gcc/Invoking-G_002b_002b.html>
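
    One way to see from inside a source file which language it was
    compiled as is sketched below. The code is valid as both C and C++ but
    behaves differently in each; the function names are hypothetical.
    Given the behaviour described above, building the same ".c" file once
    with gcc and once with g++ would be expected to take different
    branches.

```c
#include <assert.h>
#include <stddef.h>

/* __cplusplus is predefined only when the file is compiled as C++. */
static const char *compiled_as(void)
{
#ifdef __cplusplus
    return "C++";
#else
    return "C";
#endif
}

/* A character constant has type int in C but type char in C++,
   so the same expression yields different values in the two languages. */
static size_t char_constant_size(void)
{
    return sizeof 'a';
}

/* Cross-check the two observations against each other. */
static void check_language_consistency(void)
{
    if (compiled_as()[1] == '\0')       /* the string is "C" */
        assert(char_constant_size() == sizeof(int));
    else                                /* the string is "C++" */
        assert(char_constant_size() == 1);
}
```

    This is also a small instance of a program that is valid in both
    languages but gives different results in each.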


    The language choice can be overridden by using the "-x" switch, such
    as "-x c" or "-x c++". The standard can be specified with "-std=".


    Yes, of course.

    There is no automatic detection of C or C++ based on the /content/ of
    the files.


    Yes, of course.


    <https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html>






  • From David Brown@21:1/5 to jak on Sun May 26 18:36:16 2024
    On 26/05/2024 17:10, jak wrote:
    David Brown wrote:
    On 26/05/2024 15:46, jak wrote:
    Michael S wrote:
    On Sun, 26 May 2024 13:44:32 +0200
    jak <[email protected]> wrote:

    Keith Thompson wrote:
    jak <[email protected]> writes:
    Kaz Kylheku wrote:
    On 2024-05-24, jak <[email protected]> wrote:
    Bonita Montero wrote:
    On 23.05.2024 at 21:49, Thiago Adams wrote:
    On 23/05/2024 16:25, Bonita Montero wrote:
    I ask myself what the point is in further developing a
    language like this that can actually no longer be saved.
    do you mean C++?

    No, C.

    I think you have a lot of confusion about programming languages.
    C and C++ are not comparable languages.
    Except for observations like that we can write useful, production
    software that compiles as C or C++, but go on ...

    Indeed there are c++ compilers who, if used to compile c code,
    could decide to call the c compiler to do the work, but if
    something in the code is not strictly c, then the compilation will
    be in c++, the size of the executable will increase significantly
    and will need of an internal or external runtimer to work. If it
    were the same thing you would not get different things.

    Oh?  Do you know of a C++ compiler that actually behaves this way? >>>>>> I've never heard of such a thing.

    C and C++ are closely related, and C and C++ compilers often share >>>>>> backends, but the two languages have different grammars.  The gcc >>>>>> command, for example, can invoke either a C or C++ compiler, but it >>>>>> knows which language it's compiling based on the source file name or >>>>>> command line options, before it's even seen the content.

    There are programs that are valid C and valid C++ but with different >>>>>> behavior.  How would a compiler that behaves as you describe cope >>>>>> with that?

    For example g++ makes something similar: if you pass a file .C it
    compile the C code but if the file (.C) contains C++ code then
    compile C++.


    No.


    No, it does not.
    g++ compiles as C++ unless you tell it to compile as C with '-x c'
    option.


    No.




    You didn't read carefully or I didn't express myself well. I wrote that
    the g++ compile c++ even if it is written inside a .c file.
    However in doubt I preferred to try. If I pass to g++ a .c file that
    contains c code, it compiles without any option, perhaps because it
    reads as if it were c++ but in any case compiles it.


    No.


    The way gcc handles all this is actually quite straightforward.

    First, there is no difference between the commands "gcc" and "g++" in
    the languages supported, or the way the language is determined.  The
    only difference between these two is the standard libraries linked by
    default when generating a final executable - "g++" automatically
    includes the C++ standard libraries, while "gcc" only has the C
    standard libraries.

    In neither case does "gcc" or "g++" actually handle the compilation -
    these are driver front-ends that pass things on to the actual
    compilers, assemblers and linkers (and any other bits and pieces
    required).

    The front-ends determine the language to use primarily from the suffix
    of the source file it is given.  ".c" files are compiled as C.
    ".cpp", ".c++", ".cc", ".C" (note the capital C), ".cp", ".cxx", and
    ".CPP" are compiled as C++.  (There are many other extensions
    supported for different languages.)

    The language choice can be overridden by using the "-x" switch, such
    as "-x c" or "-x c++".  The standard can be specified with "-std=".

    There is no automatic detection of C or C++ based on the /content/ of
    the files.


    <https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html>



    ?
    I really wrote that something similar (similar != equal) did g++ and
    that, if you write c++ code in a file with the .c extension, the g++
    compile it. I never wrote that it was automatically recognized.
    In addition, you just explained why g++ compile a .c that contains c++
    code. I don't understand: no what?


    I made an error here - "g++ foo.c" /will/ treat the file as C++. I
    apologise for that, as it made things a lot more confusing.

    But that is not what you wrote. Perhaps you didn't write what you
    intended to write. You said that g++ somehow determines whether to
    compile code as C or C++ based on the /contents/ of the file, not the
    filename suffix. And that is completely wrong.

    You also mixed up ".c" and ".C". gcc considers ".c" to be C code, while
    ".C" (with a capital C) is considered C++.

  • From Michael S@21:1/5 to bart on Sun May 26 19:35:49 2024
    On Sun, 26 May 2024 16:25:51 +0100
    bart <[email protected]> wrote:

    On 26/05/2024 14:18, Michael S wrote:
    On Sun, 26 May 2024 12:51:12 +0100
    bart <[email protected]> wrote:

    On 26/05/2024 12:09, David Brown wrote:
    On 26/05/2024 00:58, Keith Thompson wrote:

    For a very large file, that could be a significant burden.  (I
    don't have any numbers on that.)

    I do :

    <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#design-efficiency-metrics>

    (That's from a proposal for #embed for C and C++.  Generating the
    numbers and parsing them is akin to using xxd.)

    More useful links:

    <https://thephd.dev/embed-the-details#results>
    <https://thephd.dev/implementing-embed-c-and-c++>

    (These are from someone who did a lot of the work for the
    proposals, and prototype implementations, as far as I understand
    it.)



    Note that I can't say how much of a difference this will make in
    real life.  I don't know how often people need to include
    multi-megabyte files in their code.  It certainly is not at a
    level where I would change any of my existing projects from
    external generator scripts to using #embed, but I might use it in
    future projects.

    I've just done my own quick test (not in C, using embed in my
    language):

    []byte clangexe = binclude("f:/llvm/bin/clang.exe")

    proc main=
    fprintln "clang.exe is # bytes", clangexe.len
    end


    This embeds the Clang C compiler which is 119MB. It took 1.3
    seconds to compile (note my compiler is not optimised).

    If I tried it using text: a 121M-line include file, with one number
    per line, it took 144 seconds (I believe it used more RAM than was
    available: each line will have occupied a 64-byte AST node, so
    nearly 8GB, on a machine with only 6GB available RAM, much of
    which was occupied).

    On my old PC, which was not the cheapest box in the shop but is more
    than 10 y.o., compilation speed for similarly organized (but much
    smaller) text files is as follows:
    MSVC 18.00.31101 (VS 2013) - 1950 KB/sec
    MSVC 19.16.27032 (VS 2017) - 1180 KB/sec
    MSVC 19.20.27500 (VS 2019) - 1180 KB/sec
    clang 17.0.6 - 547 KB/sec (somewhat better with hex text)
    gcc 13.2.0 - 580 KB/sec

    So, MSVC compilers, esp. an old one, are somewhat faster than yours.
    But if there was swapping involved it's not comparable. How much
    time does it take for your compiler to produce 5MB byte array from
    text?

    Are you talking about a 5MB array initialised like this:

    unsigned char data[] = {
    45,
    67,
    17,
    ... // 5M-3 more rows
    };


    Yes.

    The timing for 120M entries was challenging as it exceeded physical
    memory. However that test I can also do with C compilers. Results for
    120 million lines of data are:

    DMC - Out-of-memory

    Tiny C - Silently stopped after 13 second (I thought it
    had finished but no)

    lccwin32 - Insufficient memory

    gcc 10.x.x - Out of memory after 80 seconds

    mcc - (My product) Memory failure after 27 seconds

    Clang - (Crashed after 5 minutes)

    MM 144s (Compiler for my language)

    So the compiler for my language did quite well, considering!


    That's an interesting test as well, but I don't want to run it on my HW
    right now. May be, at night.


    Back to the 5MB test:

    Tiny C 1.7s 2.9MB/sec (Tcc doesn't use any IR)

    mcc 3.7s 1.3MB/sec (my product; uses intermediate ASM)

    Faster than new MSVC, but slower than old MSVC.


    DMC -- -- (Out of memory; 32-bit compiler)

    lccwin32 3.9s 1.3MB/sec

    gcc 10.x 10.6s 0.5MB/sec

    clang 7.4s 0.7MB/sec (to object file only)

    MM 1.4s 3.6MB/sec (compiler for my language)

    MM 0.7 7.1MB/sec (MM optimised via C and gcc-O3)


    That's quite impressive.
    Does it generate object files or go directly to exe?
    Even if the latter, it's still impressive.

    As a reminder, when using my version of 'embed' in my language,
    embedding a 120MB binary file took 1.3 seconds, about 90MB/second.


    But both are much faster than compiling through text. Even "slow"
    40MB/3 is 6-7 times faster than the fastest of compilers in my
    tests.

    Do you have a C compiler that supports #embed?


    No, I just blindly believe the paper.
    But it probably would be available in clang this year and in gcc around
    start of the next year. At least I hope so.

    It's generally understood that processing text is slow, if
    representing byte-at-a-time data. If byte arrays could be represented
    as sequences of i64 constants, it would improve matters. That could
    be done in C, but awkwardly, by aliasing a byte-array with an
    i64-array.


    I don't think that conversion from text to binary is a significant
    bottleneck here. To get a feel for it, I wrote a tiny program that
    converts a comma-separated list of integers to a binary file. It is
    quite similar to 'xxd -r', but with an input format that better fits
    our requirements. It does not cover the full requirements, of course:
    my utility can't handle comments and probably a few other things that
    are allowed in C sources, but the conversion part is pretty much the
    same.
    It runs at 6.7 MB/s with decimal input and at 9.1 MB/s with hex input.
    That is with a SATA SSD of the sort that went out of fashion before 2020.

    So, it seems that at least in the case of gcc the conversion part
    constitutes less than 10% of the total run time.

    If you want to play with it yourself, here is my source:

    -- list_to_bin.c
    -- takes textual input from standard input
    -- writes output to binary file
    -- Usage:
    -- list_to_bin outfile.bin < inp_file.txt
    --
    #include <stdio.h>
    #include <stdlib.h>
    #include <ctype.h>

    int main(int argz, char** argv)
    {
        if (argz > 1) {
            FILE* fp = fopen(argv[1], "wb");
            if (fp) {
                char buf[2048];
                _Bool look_for_comma = 0;
                for (;;) {
                    if (fgets(buf, sizeof(buf), stdin) != buf)
                        break;

                    char* p = buf;
                    for (;;) {
                        char c = *p;
                        if (isgraph(c)) {
                            if (look_for_comma) {
                                if (c == ',') {
                                    look_for_comma = 0;
                                    ++p;
                                } else {
                                    goto done;
                                }
                            } else {
                                char* endp;
                                long val = strtol(p, &endp, 0);
                                if (endp==p) // not a number
                                    goto done;
                                fputc((unsigned char)val, fp);
                                p = endp;
                                look_for_comma = 1;
                            }
                        } else {
                            if (c == 0)
                                break; // end of line
                            ++p; // skip space or control character
                        }
                    }
                }
                done:
                fclose(fp);
            } else {
                perror(argv[1]);
                return 1;
            }
        }
        return 0;
    }










  • From Michael S@21:1/5 to David Brown on Sun May 26 19:50:40 2024
    On Sun, 26 May 2024 18:26:49 +0200
    David Brown <[email protected]> wrote:

    On 26/05/2024 17:05, Michael S wrote:

    In my environment it applies to gcc, but not to g++.
    In order to force my g++ to compile for other language you have to
    tell it so explicitly.

    No, g++ treats extensions other than ".c" the same way as gcc. (I
    tested to be sure this time!) Try :

    touch foo.f
    gcc foo.f
    g++ foo.f

    You'll get the same complaint - either from missing Fortran support
    or a failure to build the Fortran program. Even "g++ foo.m" tries to
    compile as Objective-C, not Objective-C++.


    Yes, I noticed that gcc and g++ behave identically for suffix .f (and
    probably for .ada), but only after I posted my response.

    BTW, it seems to me that here the behavior of gcc/g++ is different from
    gfortran. If I am not mistaken, gfortran by default treats extension .f
    as "old FORTRAN" and extension .f90 as "new Fortran". But I could be
    wrong about it: new Fortran is not something I compile regularly, and
    old FORTRAN is not something that I ever compile.

  • From jak@21:1/5 to All on Sun May 26 19:11:31 2024
    David Brown wrote:
    On 26/05/2024 17:10, jak wrote:
    David Brown wrote:
    On 26/05/2024 15:46, jak wrote:
    Michael S wrote:
    On Sun, 26 May 2024 13:44:32 +0200
    jak <[email protected]> wrote:

    Keith Thompson wrote:
    jak <[email protected]> writes:
    Kaz Kylheku wrote:
    On 2024-05-24, jak <[email protected]> wrote:
    Bonita Montero wrote:
    On 23.05.2024 at 21:49, Thiago Adams wrote:
    On 23/05/2024 16:25, Bonita Montero wrote:
    I ask myself what the point is in further developing a
    language like this that can actually no longer be saved.
    do you mean C++?

    No, C.

    I think you have a lot of confusion about programming languages.
    C and C++ are not comparable languages.
    Except for observations like that we can write useful, production
    software that compiles as C or C++, but go on ...

    Indeed there are c++ compilers who, if used to compile c code,
    could decide to call the c compiler to do the work, but if
    something in the code is not strictly c, then the compilation will
    be in c++, the size of the executable will increase significantly
    and will need of an internal or external runtimer to work. If it
    were the same thing you would not get different things.

    Oh?  Do you know of a C++ compiler that actually behaves this way?
    I've never heard of such a thing.

    C and C++ are closely related, and C and C++ compilers often share
    backends, but the two languages have different grammars.  The gcc
    command, for example, can invoke either a C or C++ compiler, but it
    knows which language it's compiling based on the source file name or
    command line options, before it's even seen the content.

    There are programs that are valid C and valid C++ but with different
    behavior.  How would a compiler that behaves as you describe cope
    with that?

    For example g++ makes something similar: if you pass a file .C it
    compile the C code but if the file (.C) contains C++ code then
    compile C++.


    No.


    No, it does not.
    g++ compiles as C++ unless you tell it to compile as C with '-x c'
    option.


    No.




    You didn't read carefully or I didn't express myself well. I wrote that
    the g++ compile c++ even if it is written inside a .c file.
    However in doubt I preferred to try. If I pass to g++ a .c file that
    contains c code, it compiles without any option, perhaps because it
    reads as if it were c++ but in any case compiles it.


    No.


    The way gcc handles all this is actually quite straightforward.

    First, there is no difference between the commands "gcc" and "g++" in
    the languages supported, or the way the language is determined.  The
    only difference between these two is the standard libraries linked by
    default when generating a final executable - "g++" automatically
    includes the C++ standard libraries, while "gcc" only has the C
    standard libraries.

    In neither case does "gcc" or "g++" actually handle the compilation -
    these are driver front-ends that pass things on to the actual
    compilers, assemblers and linkers (and any other bits and pieces
    required).

    The front-ends determine the language to use primarily from the
    suffix of the source file it is given.  ".c" files are compiled as C.
    ".cpp", ".c++", ".cc", ".C" (note the capital C), ".cp", ".cxx", and
    ".CPP" are compiled as C++.  (There are many other extensions
    supported for different languages.)

    The language choice can be overridden by using the "-x" switch, such
    as "-x c" or "-x c++".  The standard can be specified with "-std=".

    There is no automatic detection of C or C++ based on the /content/ of
    the files.


    <https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html>



    ?
    I really wrote that something similar (similar != equal) did g++ and
    that, if you write c++ code in a file with the .c extension, the g++
    compile it. I never wrote that it was automatically recognized.
    In addition, you just explained why g++ compile a .c that contains c++
    code. I don't understand: no what?


    I made an error here - "g++ foo.c" /will/ treat the file as C++.  I apologise for that, as it made things a lot more confusing.

    But that is not what you wrote.  Perhaps you didn't write what you
    intended to write.  You said that g++ somehow determines whether to
    compile code as C or C++ based on the /contents/ of the file, not the filename suffix.  And that is completely wrong.

    You also mixed up ".c" and ".C".  gcc considers ".c" to be C code, while ".C" (with a capital C) is considered C++.



    Sorry but no. I wrote that there are compilers who do it and when they
    replied, bringing the gcc as an example, I replied that the g++ does
    something similar.

    and no, I have not confused the .c with the .C:

    $ cat foo.c
    #include <iostream>

    int main()
    {
    std::cout << "hello" << std::endl;
    return 0;
    }
    $ g++ -Wall -pedantic foo.c -o foo
    $ ./foo
    hello
    $

  • From jak@21:1/5 to All on Sun May 26 19:23:09 2024
    Michael S wrote:
    On Sun, 26 May 2024 17:10:01 +0200
    jak <[email protected]> wrote:


    ?
    I really wrote that something similar (similar != equal) did g++ and
    that, if you write c++ code in a file with the .c extension, the g++
    compile it. I never wrote that it was automatically recognized.
    In addition, you just explained why g++ compile a .c that contains c++
    code. I don't understand: no what?


    Your English is already harder to understand than mine.
    Congratulations, that is not a small feat. But you still have far to
    pursue. Keep exercising.



    Instead, we could arrange to meet on a Usenet group where the native
    speaker is me. :-)

  • From bart@21:1/5 to Michael S on Sun May 26 19:01:21 2024
    On 26/05/2024 17:35, Michael S wrote:
    On Sun, 26 May 2024 16:25:51 +0100
    bart <[email protected]> wrote:

    On 26/05/2024 14:18, Michael S wrote:

    Are you talking about a 5MB array initialised like this:

    unsigned char data[] = {
    45,
    67,
    17,
    ... // 5M-3 more rows
    };


    Yes.

    The timing for 120M entries was challenging as it exceeded physical
    memory. However that test I can also do with C compilers. Results for
    120 million lines of data are:

    DMC - Out-of-memory

    Tiny C - Silently stopped after 13 second (I thought it
    had finished but no)

    lccwin32 - Insufficient memory

    gcc 10.x.x - Out of memory after 80 seconds

    mcc - (My product) Memory failure after 27 seconds

    Clang - (Crashed after 5 minutes)

    MM 144s (Compiler for my language)

    So the compiler for my language did quite well, considering!


    That's an interesting test as well, but I don't want to run it on my HW
    right now. May be, at night.


    Back to the 5MB test:

    Tiny C 1.7s 2.9MB/sec (Tcc doesn't use any IR)

    mcc 3.7s 1.3MB/sec (my product; uses intermediate ASM)

    Faster than new MSVC, but slower than old MSVC.

    My mcc is never going to be fast, because it uses ASM, which itself will generate a text file several times larger than the C (so the line "123,"
    in C ends up as " db 123" in the ASM file).

    However I've looked at a possible way of speeding this up in general,
    see below.



    DMC -- -- (Out of memory; 32-bit compiler)

    lccwin32 3.9s 1.3MB/sec

    gcc 10.x 10.6s 0.5MB/sec

    clang 7.4s 0.7MB/sec (to object file only)

    MM 1.4s 3.6MB/sec (compiler for my language)

    MM 0.7 7.1MB/sec (MM optimised via C and gcc-O3)


    That's quite impressive.
    Does it generate object files or go directly to exe?

    All produce EXE files, via linkers if necessary, except Clang (its hefty
    LLVM installation doesn't come with standard C headers, nor a linker; it depends on MS tools, but never manages to sync with them).

    My MM product directly generates EXE files with no intermediate OBJ files.

    Even if the latter, it's still impressive.


    So, it's more impressive if it first generates an OBJ file then invokes
    a linker? I'd have thought that eliminating that pointless intermediate
    step would be more impressive!

    Anyway, I thought of a way of speeding up initialisation of byte-arrays
    which is, instead of parsing each value into its own AST node, to
    directly parse successive numeric values into a special data-string
    object (similar to normal strings, and identical to the data-strings
    used for embedded data).

    Then there is only one AST node containing one 'string' value, instead
    of 5M or 120M nodes.

    This produced a timing, for 5M lines, of 0.34s (0.28s optimised), a
    throughput of 15-18MB/sec.

    When I applied this to the 120M line data (which is a 0.6GB source
    file), it finished in 6.5 seconds (5.5 optimised), or 18-21MB/sec.
    Previously that took 144 seconds.

    However I can't keep that experimental code, since if it turns out not
    all values are constant expressions, it has to revert to normal
    processing, which is tricky to do; it may already have read 1M numbers
    and needs to backtrack). This was just to see how fast it could be.

    Processing 120MB as binary rather than text is still faster; that works
    at up to 110MB/sec with an optimised compiler.


    As a reminder, when using my version of 'embed' in my language,
    embedding a 120MB binary file took 1.3 seconds, about 90MB/second.


    But both are much faster than compiling through text. Even "slow"
    40MB/3 is 6-7 times faster than the fastest of compilers in my
    tests.

    Do you have a C compiler that supports #embed?


    No, I just blindly believe the paper.

    Funny that no one else has access to an implementation! Those figures
    have been around for a while.

    But it probably would be available in clang this year and in gcc around
    start of the next year. At least I hope so.

    It's generally understood that processing text is slow, if
    representing byte-at-a-time data. If byte arrays could be represented
    as sequences of i64 constants, it would improve matters. That could
    be done in C, but awkwardly, by aliasing a byte-array with an
    i64-array.


    I don't think that conversion from text to binary is a significant
    bottleneck here.

    That's not quite what I meant. That conversion is the lexical part of processing source code, it can be very fast.

    It is parsing, and especially constructing a list of 5M or 120M AST
    nodes, each containing one expression, and the subsequent type-checking
    and code generation that takes the time.

    However your benchmark looks intriguing and I'll have a closer look later.

  • From Michael S@21:1/5 to Malcolm McLean on Sun May 26 23:06:47 2024
    On Sun, 26 May 2024 19:19:59 +0100
    Malcolm McLean <[email protected]> wrote:


    The Baby X resource compiler has a 'binary' tag to embed binary data.
    The biggest file in my documents folder was a 33 mb boost zipped
    image. And the resource compiler, built in debug mode, took five
    seconds to convert that to a C source file with an array of unsigned
    chars.

    It then took gcc about 20 seconds to compile it to an object file.


    If '33 mb' means 33 MB and 'about 20 seconds' means 20 seconds then
    your gcc compiles at 1.65 MB/s. That's 2.8x faster than
    gcc on my old test machine and 1.7 times faster than gcc 13.2.0 on a much
    faster machine with a quite good PCIe-attached SSD. Sounds interesting.
    What are your HW, OS and environment?
    Can you show us an example of your output format?

    The output file was 218 mb. It goes straight in the bin.


  • From Kaz Kylheku@21:1/5 to jak on Sun May 26 21:03:17 2024
    On 2024-05-26, jak <[email protected]> wrote:
    Indeed there are c++ compilers who, if used to compile c code, could
    decide to call the c compiler to do the work, but if something in the
    code is not strictly c, then the compilation will be in c++, the size of

    Compilers? As in two or more?

    Name them, or there aren't.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @[email protected]

  • From Michael S@21:1/5 to bart on Sun May 26 23:26:51 2024
    On Sun, 26 May 2024 19:01:21 +0100
    bart <[email protected]> wrote:

    On 26/05/2024 17:35, Michael S wrote:
    On Sun, 26 May 2024 16:25:51 +0100
    bart <[email protected]> wrote:



    Back to the 5MB test:

    Tiny C 1.7s 2.9MB/sec (Tcc doesn't use any IR)

    mcc 3.7s 1.3MB/sec (my product; uses intermediate
    ASM)

    Faster than new MSVC, but slower than old MSVC.

    My mcc is never going to be fast, because it uses ASM, which itself
    will generate a text file several times larger than the C (so the
    line "123," in C ends up as " db 123" in the ASM file).


    Generation of asm at 7-8 MB/s sounds feasible even on a slow computer.
    And once you have asm in the right format, 'gnu as' processes it quite fast.
    On a faster computer I have seen ~30 MB/s. I'd guess the slower one
    should be able to do it at 15 MB/s. So, generation+assembling together
    could run at ~5 MB/s. The trick here is to use the format that 'gnu as'
    was optimized for. To know what it is, look at the output of gcc -S.

  • From Michael S@21:1/5 to Michael S on Sun May 26 23:59:55 2024
    On Sun, 26 May 2024 19:35:49 +0300
    Michael S <[email protected]> wrote:

    On Sun, 26 May 2024 16:25:51 +0100
    bart <[email protected]> wrote:

    On 26/05/2024 14:18, Michael S wrote:

    The timing for 120M entries was challenging as it exceeded physical
    memory. However that test I can also do with C compilers. Results
    for 120 million lines of data are:

    DMC - Out-of-memory

    Tiny C - Silently stopped after 13 second (I thought it
    had finished but no)

    lccwin32 - Insufficient memory

    gcc 10.x.x - Out of memory after 80 seconds

    mcc - (My product) Memory failure after 27 seconds

    Clang - (Crashed after 5 minutes)

    MM 144s (Compiler for my language)

    So the compiler for my language did quite well, considering!


    That's an interesting test as well, but I don't want to run it on my
    HW right now. May be, at night.


    Done.
    On bigger gear it was not as bad as I expected.
    Input file: 155,488,672 bytes
    C source (decimal, one number per line): 641,236,315 bytes
    gcc compilation time: 3m54.635s
    peak memory consumption by compiler: ~27 GB

    0.66 MB/s, a rate only 25-30% slower than for the 5 MB input on the same HW.
    That is slow, but not the sky-is-falling sort of slow.

  • From Kaz Kylheku@21:1/5 to jak on Sun May 26 21:16:00 2024
    On 2024-05-26, jak <[email protected]> wrote:
    Keith Thompson wrote:
    Indeed there are c++ compilers who, if used to compile c code, could
    decide to call the c compiler to do the work, but if something in the
    code is not strictly c, then the compilation will be in c++, the size
    of the executable will increase significantly and will need of an
    internal or external runtimer to work. If it were the same thing you
    would not get different things.

    Oh? Do you know of a C++ compiler that actually behaves this way?
    I've never heard of such a thing.

    For example g++ makes something similar: if you pass a file .C it
    compile the C code but if the file (.C) contains C++ code then
    compile C++.

    1. The file suffix is not "something /in the code/ that is not strictly C".
    The front end of a compiler collection selecting a compiler based
    on file suffix is not an example of switching language based
    on syntax in the file.

    2. g++ does not behave this way.

    In fact .C (capital C) is one of the conventions for C++ files. I
    seem to remember that the convention was used at AT&T and in fact you
    can find examples of it in the source code of Cfront (the historic
    C++ to C transpiler originally developed by B. Stroustrup).

    For g++ to assume that a .C file is C and not C++ would be insanely
    poor.

    The g++ command even assumes that .c files are C++!

    Conversely, when you use the gcc driver command on a .C file,
    you get the C++ compiler!

    Since you're posting to Usenet, you're obviously connected to the same
    Internet as the rest of us, so it's amazing you're not able to check
    your facts. You know about g++, so presumably you have an installation of
    it somewhere, where you could run a 30 second experiment.


    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @[email protected]

  • From bart@21:1/5 to Michael S on Sun May 26 22:27:15 2024
    On 26/05/2024 21:26, Michael S wrote:
    On Sun, 26 May 2024 19:01:21 +0100
    bart <[email protected]> wrote:

    On 26/05/2024 17:35, Michael S wrote:
    On Sun, 26 May 2024 16:25:51 +0100
    bart <[email protected]> wrote:



    Back to the 5MB test:

    Tiny C 1.7s 2.9MB/sec (Tcc doesn't use any IR)

    mcc 3.7s 1.3MB/sec (my product; uses intermediate
    ASM)

    Faster than new MSVC, but slower than old MSVC.

    My mcc is never going to be fast, because it uses ASM, which itself
    will generate a text file several times larger than the C (so the
    line "123," in C ends up as " db 123" in the ASM file).


    Generation of asm at 7-8 MB/s sounds feasible even on slow computer.
    And once you have asm in right format,

    If I take the 5M-line data file, and use `gcc -S` on it, it produces an ASM
    file where the bytes are combined into strings. Is that the 'trick'?

    Then processing that ASM file can be faster.

    However my ASM o/p doesn't create strings like that, and the ASM file is therefore five times the size.

    Still, my assembler can turn my 72MB ASM file into a 5MB executable in
    0.74 seconds (which is 100MB/sec).

    'as' can turn its much smaller 15MB ASM (.s) file into an executable in
    0.56 seconds (27MB/sec).

    'gnu as' processes it quite fast.

    Given the same input (ie. same set of instructions), my assembler is
    faster than 'as'. See this survey of assembler speeds here:

    https://www.reddit.com/r/Compilers/comments/1c41y6d/assembler_survey/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

    Mine is the 'AA' assembler.

    The bottleneck here is writing the ASM file. But I don't care about
    that, since 'mcc' is not my primary compiler. My primary one doesn't use
    ASM.

    But even with that bottleneck, mcc compiles this data file to EXE three
    times as fast as gcc.

    My MM compiler can do so 17 times as fast as gcc. And with the
    optimisation I mentioned in a previous post (similar to as's trick), it
    could do so 35-40 times faster than gcc.

  • From bart@21:1/5 to Michael S on Sun May 26 22:52:25 2024
    On 26/05/2024 17:35, Michael S wrote:

    #include <stdio.h>
    #include <stdlib.h>
    #include <ctype.h>

    int main(int argz, char** argv)
    {
        if (argz > 1) {
            FILE* fp = fopen(argv[1], "wb");
            if (fp) {
                char buf[2048];
                _Bool look_for_comma = 0;
                for (;;) {
                    if (fgets(buf, sizeof(buf), stdin) != buf)
                        break;

                    char* p = buf;
                    for (;;) {
                        char c = *p;
                        if (isgraph(c)) {
                            if (look_for_comma) {
                                if (c == ',') {
                                    look_for_comma = 0;
                                    ++p;
                                } else {
                                    goto done;
                                }
                            } else {
                                char* endp;
                                long val = strtol(p, &endp, 0);
                                if (endp==p) // not a number
                                    goto done;
                                fputc((unsigned char)val, fp);
                                p = endp;
                                look_for_comma = 1;
                            }
                        } else {
                            if (c == 0)
                                break; // end of line
                            ++p; // skip space or control character
                        }
                    }
                }
                done:
                fclose(fp);
            } else {
                perror(argv[1]);
                return 1;
            }
        }
        return 0;
    }

    I tried this on my 600MB data like this:

    C:\c>c fred.exe <data

    C:\c>fred --version
    clang version 18.1.0rc
    Target: x86_64-pc-windows-msvc
    Thread model: posix
    InstalledDir: C:\c

    Since those bytes represent the contents of the clang compiler, I was
    able to run it afterwards.

    All versions across compilers/optimise levels seemed to give a constant
    time of 17-18 seconds. This is good compared with my initial 144 seconds
    (most compilers failed; you reported a similar test took several minutes).

    However, what's involved with a compiler is much more elaborate than such
    a program. There's syntax, type-checking, code-generation...

    Still, I reported earlier an experimental change to my non-C compiler,
    which translated this same input to a program with that embedded binary
    (not just the binary itself) in under 6 seconds.

    That's three times as fast as the above result:

    C:\mapps>tm \mx2\mm -ext test2 # tm is timing tool
    Compiling test2.m to test2.exe
    TM: 5.86 # (timings vary)

    C:\mapps>test2
    data is 119571969 bytes

    C:\mapps>type test2.m
    []byte data = (
    include "data"
    0)

    proc main=
    fprintln "data is # bytes", data.len
    end

  • From James Kuyper@21:1/5 to BGB on Sun May 26 18:59:09 2024
    On 5/26/24 10:48, BGB wrote:
    On 5/26/2024 9:18 AM, David Brown wrote:
    On 26/05/2024 01:45, Keith Thompson wrote:
    David Brown <[email protected]> writes:
    ...
        #define errno *errno()

    Both of those need more parentheses -- and I'm uncomfortable using the
    same identifier for the macro and the function.


    The second example was from the footnote in the C standard's section on
    <errno.h>, so it can't be /that/ bad!

    But I agree with your discomfort.


    I would expect it to immediately explode, because AFAIK the usual preprocessor behavior is to keep expanding macros in a line until there
    is nothing left to expand.

    No, C macros are not recursive:
    "... The resulting preprocessing token sequence is then rescanned, along
    with all subsequent preprocessing tokens of the source file, for more
    macro names to replace.
    If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file’s
    preprocessing tokens), it is not replaced. Furthermore, if any nested replacements encounter the name of the macro being replaced, it is not replaced. These nonreplaced macro name preprocessing tokens are no
    longer available for further replacement even if they are later
    (re)examined in contexts in which that macro name preprocessing token
    would otherwise have been replaced." (6.10.4.4p1,2)

  • From Lawrence D'Oliveiro@21:1/5 to Michael S on Mon May 27 00:48:05 2024
    On Sun, 26 May 2024 19:35:49 +0300, Michael S wrote:

    Faster than new MSVC, but slower than old MSVC.

    New MSVC is slower than old MSVC?!? Say it isn’t so!

  • From Lawrence D'Oliveiro@21:1/5 to Michael S on Mon May 27 00:49:24 2024
    On Sun, 26 May 2024 23:06:47 +0300, Michael S wrote:

    On Sun, 26 May 2024 19:19:59 +0100 Malcolm McLean <[email protected]> wrote:

    ... was a 33 mb boost zipped image.

    If '33 mb' means 33 MB ...

    Yeah, I wondered about that. Never saw anybody measure things in “millibits” before ...

  • From Lawrence D'Oliveiro@21:1/5 to David Brown on Mon May 27 00:44:21 2024
    On Sun, 26 May 2024 13:09:36 +0200, David Brown wrote:

    People have always managed to embed
    binary source files into their binary output files - using linker
    tricks, or using xxd or other tools (common or specialised) to turn
    binary files into initialisers for constant arrays (or structs).

    Don’t call them “tricks”. Call them “linker scripts” and “build procedures”. They can do some quite complex things.

    #embed has two purposes. One is to save you from using external tools
    for that kind of thing.

    But it can only be a partial solution to that. It cannot replace the
    procedures needed to construct the binary data format. It only solves the
    easy part: including that binary data in the build. And only in a certain
    way.

    That’s why I think it’s a waste of time.

  • From Lawrence D'Oliveiro@21:1/5 to Bonita Montero on Mon May 27 00:55:28 2024
    On Sun, 26 May 2024 13:23:21 +0200, Bonita Montero wrote:

    C++ is the wrong language for web applications.
    I like Java more for that.

    Java is too clunky and verbose. For asynchronous programming (for
    WebSockets etc), it requires you to use threads.

    I like Python, because it has frameworks based on ASGI for web
    applications.

  • From bart@21:1/5 to Lawrence D'Oliveiro on Mon May 27 01:55:24 2024
    On 27/05/2024 01:44, Lawrence D'Oliveiro wrote:
    On Sun, 26 May 2024 13:09:36 +0200, David Brown wrote:

    People have always managed to embed
    binary source files into their binary output files - using linker
    tricks, or using xxd or other tools (common or specialised) to turn
    binary files into initialisers for constant arrays (or structs).

    Don’t call them “tricks”. Call them “linker scripts” and “build procedures”. They can do some quite complex things.

    #embed has two purposes. One is to save you from using external tools
    for that kind of thing.

    But it can only be a partial solution to that. It cannot replace the procedures needed to construct the binary data format.

    The binary data already exists, or has been created.

    The problem is getting it into your program as ready-to-use data rather
    than have to bundle an unwieldy collection of files in a folder
    somewhere and then have assorted routines to read them into memory.

    It only solves the
    easy part: including that binary data in the build.

    Apparently that is not so easy as you seem to think. Or maybe you think
    that 'embedding a file' just means adding it to a zip file?

    That’s why I think it’s a waste of time.

    Embedding applies also to text files not just binaries.

    I used that extensively so that I could build in the sources of the C
    standard libraries into my C compiler. The result is a single executable
    with zero dependencies or support files.

    How would you have implemented that? How maintainable would it have been?

  • From Lawrence D'Oliveiro@21:1/5 to Keith Thompson on Mon May 27 00:53:58 2024
    On Sun, 26 May 2024 02:48:40 -0700, Keith Thompson wrote:

    The gcc command, for example, can invoke either a C or C++ compiler ...

    It can also handle Fortran, Go, D, Ada, assembler and object files. Oh,
    and Objective C as well.

  • From Lawrence D'Oliveiro@21:1/5 to bart on Mon May 27 02:48:50 2024
    On Mon, 27 May 2024 01:55:24 +0100, bart wrote:

    On 27/05/2024 01:44, Lawrence D'Oliveiro wrote:
    On Sun, 26 May 2024 13:09:36 +0200, David Brown wrote:

    People have always managed to embed binary source files into their
    binary output files - using linker tricks, or using xxd or other tools
    (common or specialised) to turn binary files into initialisers for
    constant arrays (or structs).

    Don’t call them “tricks”. Call them “linker scripts” and “build procedures”. They can do some quite complex things.

    #embed has two purposes. One is to save you from using external tools
    for that kind of thing.

    But it can only be a partial solution to that. It cannot replace the
    procedures needed to construct the binary data format.

    The binary data already exists, or has been created.

    It might have to be created as part of the build process.

    The problem is getting it into your program as ready-to-use data rather
    than have to bundle an unwieldy collection of files in a folder
    somewhere and then have assorted routines to read them into memory.

    Nothing “unwieldy” about it. It’s a bunch of temporary intermediate build products, generated from suitable source files like everything else in the build.

    It only solves the easy part: including that binary data in the build.

    Apparently that is not so easy as you seem to think.

    Yes, it is as easy as I think. I’ve done this sort of thing, using
    suitable build scripts.

    Or maybe you think
    that 'embedding a file' just means adding it to a zip file?

    It’s whatever “including it in the build” means. It might indeed be a zip component, as with resources for an Android app. Or it might be converted
    into an object file with a tool like objcopy, to be integrated into the executable.

    Embedding applies also to text files not just binaries.

    Same principle applies.

  • From jak@21:1/5 to All on Mon May 27 07:14:30 2024
    Kaz Kylheku ha scritto:
    On 2024-05-26, jak <[email protected]> wrote:
    Keith Thompson ha scritto:
    Indeed there are c++ compilers who, if used to compile c code, could
    decide to call the c compiler to do the work, but if something in the
    code is not strictly c, then the compilation will be in c++, the size
    of the executable will increase significantly and will need of an
    internal or external runtimer to work. If it were the same thing you
    would not get different things.

    Oh? Do you know of a C++ compiler that actually behaves this way?
    I've never heard of such a thing.

    For example g++ makes something similar: if you pass a file .C it
    compile the C code but if the file (.C) contains C++ code then
    compile C++.

    1. The file suffix is not "something /in the code/ that is not strictly C".
    The front end of a compiler collection selecting a compiler based
    on file suffix is not an example of switching language based
    on syntax in the file.

    2. g++ does not behave this way.

    In fact .C (capital C) is one of the conventions for C++ files. I
    seem to remember that the convention was used at AT&T and in fact you
    can find examples of it in the source code of Cfront (the historic
    C++ to C transpiler originally developed by B. Stroustrup).

    For g++ to assume that a .C file is C and not C++ would be insanely
    poor.

    The g++ command even assumes that .c files are C++!

    Conversely, when you use the gcc driver command on a .C file,
    you get the C++ compiler!

    Since you're posting to Usenet, you're obviously connected to the same
    Internet as the rest of us, so it's amazing you're not able to check
    your facts. You know about g++, so presumably you have an installation of
    it somewhere, where you could run a 30 second experiment.



    Regarding what you are talking about, I must apologize for one thing: in
    the message of mine that you quote, '.c' is indeed written with a capital
    letter. Unfortunately, Google Translate capitalizes anything that looks
    like a brand name or a very short text (c, c++, g++, ...), and I did not
    notice this at first. I hope to be forgiven, because I rewrite every
    sentence several times to find a translation as close as possible to
    what I want to say. As for the test you ask for, I would like to point
    out that I posted one on Sun 26 May 2024 at 19:11:31 +0200; had you
    seen it, this post might have been avoided.

    '.c' in GT -> '. C'
    (c, c++, g++, ...) in GT -> (C, C ++, G ++, ...)

  • From Michael S@21:1/5 to Lawrence D'Oliveiro on Mon May 27 11:05:47 2024
    On Mon, 27 May 2024 00:48:05 -0000 (UTC)
    Lawrence D'Oliveiro <[email protected]> wrote:

    On Sun, 26 May 2024 19:35:49 +0300, Michael S wrote:

    Faster than new MSVC, but slower than old MSVC.

    New MSVC is slower than old MSVC?!? Say it isn’t so!

    Is that not the case for just about any compiler that has a long history
    of development?
    Compilers become slower over time. In return they support newer dialects
    of input language and generate better diagnostics. They also try to
    produce faster code, with very varying levels of success.
    This trend was most easily seen during first decade of LLVM/clang.

  • From David Brown@21:1/5 to jak on Mon May 27 10:45:28 2024
    On 26/05/2024 19:11, jak wrote:
    David Brown ha scritto:
    On 26/05/2024 17:10, jak wrote:

    ?
    I really wrote that something similar (similar != equal) did g++ and
    that, if you write c++ code in a file with the .c extension, the g++
    compile it. I never wrote that it was automatically recognized.
    In addition, you just explained why g++ compile a .c that contains c++
    code. I don't understand: no what?


    I made an error here - "g++ foo.c" /will/ treat the file as C++.  I
    apologise for that, as it made things a lot more confusing.

    But that is not what you wrote.  Perhaps you didn't write what you
    intended to write.  You said that g++ somehow determines whether to
    compile code as C or C++ based on the /contents/ of the file, not the
    filename suffix.  And that is completely wrong.

    You also mixed up ".c" and ".C".  gcc considers ".c" to be C code,
    while ".C" (with a capital C) is considered C++.



    Sorry but no. I wrote that there are compilers who do it and when they replied, bringing the gcc as an example, I replied that the g++ does something similar.

    and no, I have not confused the .c with the .C:


    You /did/ mix these things up - the Usenet posts are there for you, me,
    or anyone else to read. But there seems little doubt now that you
    understand the difference between "gcc" and "g++", and between ".c" and
    ".C". So I assume the mixup was a language issue - I fully understand
    that it's not always easy to communicate accurately in a different
    language, and even when you are as good as you are in English, sometimes
    there are miscommunications.

    Whichever compiler you use, I strongly recommend using only ".c" for C
    files, and only ".cpp" for C++ files. There are several other
    extensions used for C++, but IME ".cpp" is the most commonly used and
    supported by all C++ tools on all platforms. ".C" (capital C) is a poor
    choice - it's hard to distinguish from ".c" (small C), and it will drive Windows users crazy. And if you use gcc, then unless you can stick to a
    pure C++ setup and never use C, I recommend using "gcc" rather than
    "g++" for everything except the final linking stage (and even that is optional). The "gcc" driver program does the right thing.

  • From David Brown@21:1/5 to Lawrence D'Oliveiro on Mon May 27 11:10:22 2024
    On 27/05/2024 02:49, Lawrence D'Oliveiro wrote:
    On Sun, 26 May 2024 23:06:47 +0300, Michael S wrote:

    On Sun, 26 May 2024 19:19:59 +0100 Malcolm McLean
    <[email protected]> wrote:

    ... was a 33 mb boost zipped image.

    If '33 mb' means 33 MB ...

    Yeah, I wondered about that. Never saw anybody measure things in “millibits” before ...

    I've seen communication systems that had transfer speeds measured in
    mbps - millibits per second.

  • From David Brown@21:1/5 to Keith Thompson on Mon May 27 13:42:28 2024
    On 27/05/2024 01:17, Keith Thompson wrote:
    David Brown <[email protected]> writes:
    On 26/05/2024 00:58, Keith Thompson wrote:
    David Brown <[email protected]> writes:
    On 25/05/2024 03:29, Keith Thompson wrote:
    Keith Thompson <[email protected]> writes:
    David Brown <[email protected]> writes:
    On 23/05/2024 14:11, bart wrote:
    [...]



    The compiler will generate results /as if/ it had expanded the file to
    a list of numbers and parsed them. But it will not do that in
    practice. (At least, not for more serious implementations - simple
    solutions might do so to get support implemented quickly.)

    I'll start by acknowledging that the prototype implementation apparently
    *does* optimize #embed when it can. I was mistaken on that point.

    #embed *must* expand to the standard-defined comma-delimited sequence in *some* cases.

    Which means that the piece of the compiler that implements #embed has to recognize when it must generate that sequence, and when it can do
    something more efficient.

    Yes, exactly.


    I'd expect implementations to have extremely fast implementations for
    initialising arrays of character types, and probably also for other
    arrays of scalar types. More complicated examples - such as
    parameters in a macro or function call - would probably use a
    fall-back of generating naïve lists of integer constants.

    My problem is not just with how the compiler can figure out when it can optimize, but how programmers are supposed to understand whatever rules
    it uses. Can I rely on the optimization being performed if I use a
    typedef for unsigned char, or if I use an enumeration type whose
    underlying type is unsigned char, or if I have initialization elements
    before and after the #embed directive?

    I don't know if that is something the programmer should need to
    consider, at least for most cases. Generally as a programmer you don't consider the compilation speed when writing code. You simply expect
    that compiler writers try to make their tools as fast as reasonably
    possible without sacrificing features. Sometimes there can be
    particular use-cases where the programmer has to look at the compiler
    manuals and adapt the code or build procedures to suit. I think that
    will be the case here too - compiler manuals should document what types
    of #embed usage they optimise. But I think it is unlikely that people
    writing portable code will do anything other than initialising a const
    (or constexpr) array of unsigned char if they have big enough files for optimisation to be relevant. Any compiler that does any #embed
    optimisation will handle this case. And even simple #embed
    implementations will likely be better than any alternatives (such as
    using xxd).


    Effective use of #embed requires too much "magic" for my taste -- particularly having the preprocessor rely on information from later
    phases. The semantics of #embed don't rely on that information, but efficient use for large files does.


    It is a violation of the neat layered (or pipeline) view of C
    compilation. But you could argue that this has been broken for decades
    - you have _Pragma that is syntactically an operator but duplicates preprocessor work, you have compiler pragmas that duplicate command-line
    flags (and command-line flags that duplicate preprocessor defines), you
    have pre-compiled headers, you have LTO that passes data multiple times
    through different parts of the pipeline.

    If you have a binary file containing a sequence of int values, you can
    use #embed to initialize an unsigned char array that's aliased with or
    copied to the int array.
    The *embed element width* is typically going to be CHAR_BIT bits by
    default. It can only be changed by an *implementation-defined* embed
    parameter. It seems odd that there's no standard way to specify the
    element width.
    It seems even more odd that the embed element width is
    implementation defined and not set to CHAR_BIT by default.

    I agree. But it may be left flexible for situations where the host
    and target have different ideas about CHAR_BIT. (Targets with
    CHAR_BIT other than 8 are very rare, hosts with CHAR_BIT other than 8
    are non-existent, but C remains flexible.)

    I would think that you'd want the element width to match CHAR_BIT *on
    the target* (which is the only CHAR_BIT that's relevant or available).
    If you're cross-compiling, you'd probably want to embed a file that
    could have been used on the target system.

    Yes, I think so.


    And if I'm not doing that kind of exotic cross-compiling, I can't rely
    on the element width being CHAR_BIT *or* on any standard way to specify
    that I want it to be CHAR_BIT.

    Requiring the default width to be CHAR_BIT would, I'm guessing, solve
    99% of cases. Allowing it to be specified by a parameter would solve
    the remaining 1%. And I expect it *will* be CHAR_BIT in most or all implementations, and programmers will rely on that assumption. I think
    the standard should guarantee that.


    I agree with you. I'm just trying to think of why the standards might
    not make that guarantee.

    For a very large file, that could be a significant burden. (I don't
    have any numbers on that.)

    I do :

    <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#design-efficiency-metrics>

    (That's from a proposal for #embed for C and C++. Generating the
    numbers and parsing them is akin to using xxd.)

    More useful links:

    <https://thephd.dev/embed-the-details#results>
    <https://thephd.dev/implementing-embed-c-and-c++>

    (These are from someone who did a lot of the work for the proposals,
    and prototype implementations, as far as I understand it.)

    That second link does have a lot of good information. I think I had
    seen it before, but I hadn't read it thoroughly. It refers to prototype implementations for both gcc and clang. I've built the prototype on my system, and godbolt.org has it, but the gcc prototype (for which the
    article provides good performance data) doesn't seem to be available anywhere.


    You are putting a lot more effort into this testing than I have. For my
    work, I am generally dependent on "official" toolchain builds - provided
    by the manufacturers of the microcontrollers we use, or at least by the manufacturers of the cpu cores. I like to keep track of what's coming -
    future versions of C or C++, future versions of compilers, etc. But
    details such as implementation efficiency (rather than features) don't
    matter much to me until they are available as part of these pre-built toolchains. (Sometimes it's fun to try things earlier, and I enjoy
    playing with newer compilers on godbolt.org, but I don't see testing the
    speed of #embed to be /so/ much fun that I'd bother building a compiler
    for it!)

    But it's nice to see you've done some independent testing. I have no
    particular reason to doubt "thephd.dev", but no particular reason to
    consider it authoritative either.

    My experiments with the clang prototype have been a bit confusing. I
    assumed that `clang -E` would give me meaningful results, but it always produces the comma-delimited sequence of integer constants, and even
    that output is inconsistent. It looks like "-E" synthesizes naive and
    not entirely correct output. Feeding that output to clang produces
    warnings that I don't get without "-E". Some of this might be the
    result of user error on my part.

    I did some tests with 100MB file, both with #embed and with #include
    using the output of "xxd". #embed *is* much faster.

    According to <https://thephd.dev/implementing-embed-c-and-c++>, it
    internally generates __builtin_pp_embed, which takes as arguments the expected type (always unsigned char for now), the filename as a string literal, and the data encoded as a base64 string literal. That's not
    going to be as fast as a hypothetical pure binary blob, but apparently
    it's still much faster than parsing a comma-delimited sequence.

    I haven't been able to get "clang -E" in the prototype to generate __builtin_pp_embed, or to get clang to recognize it. There are internal things going on that I don't understand.

    The author points out that using binary blobs would break tools that
    work with -E preprocessed source files. If you could assume that the preprocessed output will be processed only by the same compiler, that wouldn't be an issue, but apparently that's not a safe assumption.

    The author acknowledges that the prototype implementation doesn't handle
    all cases correctly.

    That's all good testing results - thanks for reporting them.


    Prototypes have been made, and they do have such optimisations. How
    things end up in real tools remains to be seen, of course.

    Here's how I personally would have preferred for #embed to be specified:

    - As in current C23 drafts, #embed with no parameters must operate *as
    if* it expanded to a comma-delimited list of integer constant
    expressions.
    - With no parameters, both the common cases (initializing an array of
    characters) and odd cases (e.g., initializing a struct object with
    varying types and sizes of members) must work as specified.
    - A standard-defined parameter allows control over optimization.

    The parameter can be "optimize(true)" or "optimize(false)".

    "optimize(false)" has no formal effect, but the compiler *should*
    generate the canonical sequence of constants.

    "optimize(true)" causes undefined behavior if #embed is used in a
    context other than the initialization of an array of character type.


    I disagree here. I want the compiler to generate the "as if" results regardless of any optimisation, working as currently specified. And
    /if/ the compiler is able to optimise the #embed, then I want it to do
    so automatically - I see no situation in which I would ever want "optimize(false)".

    What would be nice is an optional warning if the #embed size is over a
    certain limit and it is unable to optimise it - a message telling the
    user that an array of "unsigned char" would be faster than an array of
    "signed char", or whatever, would be helpful. But that kind of thing is definitely implementation-specific.

    I'd also like a pre-processor command-line option (again this is clearly implementation-specific) to force non-optimised output from #embed, for
    use with "gcc -E" (or "clang -E") and third-party tools.

    A naive compiler can quietly ignore the optimize() parameter and always generate the comma-delimited sequence. An exceedingly clever compiler
    could ignore it and always make a correct decision about whether to
    optimize #embed.

    Without the optimize parameter, typical compilers are expected to
    optimize #embed depending on the context in which it's used, and should produce the correct results in all cases. The parameter can be used to override the compiler's judgement.

    Another possibility might have been to specify that #embed can *only* be
    used to initialize an array of character type, and any other use either
    has undefined behavior or is a constraint violation. That would avoid
    all the complication of determining from context whether it can be
    optimized, and would probably cover 99% of cases. But it's probably too
    late for that.


    Agreed.

    As it is, #embed is complicated because it covers more than the simple
    case of initialising a const array of unsigned char. But it can't cover anything like all cases of embedding external data in C programs. (I
    have programs with internal web servers - they need to embed all files
    in a directory, and create an indexing structure. This is currently all automated by a python script called from the makefile - switching to
    #embed only would involve manual source changes when files are added or removed.)

  • From bart@21:1/5 to Lawrence D'Oliveiro on Mon May 27 14:03:16 2024
    On 27/05/2024 03:48, Lawrence D'Oliveiro wrote:
    On Mon, 27 May 2024 01:55:24 +0100, bart wrote:

    On 27/05/2024 01:44, Lawrence D'Oliveiro wrote:

    Nothing “unwieldy” about it. It’s a bunch of temporary intermediate build
    products, generated from suitable source files like everything else in the build.

    It only solves the easy part: including that binary data in the build.

    Apparently that is not so easy as you seem to think.

    Yes, it is as easy as I think. I’ve done this sort of thing, using
    suitable build scripts.

    Show me.

    This is how I show help text, which is maintained in an ordinary text
    file, from within my C compiler:

    println sinclude("help.txt")

    Just one line directly in the source code. The text is baked in to the executable so there is no discrete file in the installation.

    What would it look like in your build system, and what does it look like
    in the source code of your app?

    I mean, it's not as though this stuff is impossible without such a
    feature; the idea is to make it much simpler to do.

    If your method is simpler, I'll get rid of my feature and use your way.

    BTW here is the entire build process for the compiler:

    C:\cx>mm cc
    Compiling cc.m to cc.exe

    'cc' is cc.m, the lead module. It incorporates 42 embedded files in all.
    Your method can't be any more elaborate than that.

    I don't use build scripts; I don't need them.

    Here is another example using C (using an older compiler that supported embedded text files; this is not standard C, but it could be, and I
    think will be using #embed).

    It is a program posted by Michael S, but with an extra 'puts' line at
    the beginning so that it first prints out its own source code.

    It works by embedding the text for itself within the binary. I'd be
    interested in how your build process manages this.


    ----------------------------------

    #include <stdio.h>
    #include <stdlib.h>
    #include <ctype.h>

    int main(int argz, char** argv)
    {
      puts(strinclude(__FILE__));

      if (argz > 1) {
        FILE* fp = fopen(argv[1], "wb");
        if (fp) {
          char buf[2048];
          _Bool look_for_comma = 0;
          for (;;) {
            if (fgets(buf, sizeof(buf), stdin) != buf)
              break;

            char* p = buf;
            for (;;) {
              char c = *p;
              if (isgraph(c)) {
                if (look_for_comma) {
                  if (c == ',') {
                    look_for_comma = 0;
                    ++p;
                  } else {
                    goto done;
                  }
                } else {
                  char* endp;
                  long val = strtol(p, &endp, 0);
                  if (endp==p) // not a number
                    goto done;
                  fputc((unsigned char)val, fp);
                  p = endp;
                  look_for_comma = 1;
                }
              } else {
                if (c == 0)
                  break; // end of line
                ++p; // skip space or control character
              }
            }
          }
          done:
          fclose(fp);
        } else {
          perror(argv[1]);
          return 1;
        }
      }
      return 0;
    }

    ----------------------------------

    C:\c>bcc c.c
    Compiling c.c to c.exe

    C:\c>c
    #include <stdio.h>
    #include <stdlib.h>
    #include <ctype.h>

    int main(int argz, char** argv)
    {
      puts(strinclude(__FILE__));

      if (argz > 1) {
        FILE* fp = fopen(argv[1], "wb");
        if (fp) {
    .....

    Or maybe you think
    that 'embedding a file' just means adding it to a zip file?

    It’s whatever “including it in the build” means. It might indeed be a zip
    component, as with resources for an Android app. Or it might be converted into an object file with a tool like objcopy, to be integrated into the executable.

    Embedding applies also to text files not just binaries.

    Same principle applies.

  • From Scott Lurndal@21:1/5 to Keith Thompson on Tue May 28 00:20:30 2024
    Keith Thompson <[email protected]> writes:
    David Brown <[email protected]> writes:
    On 26/05/2024 00:58, Keith Thompson wrote:


    It knows because the compiler writers are actually quite smart. The C
    standards may describe the translation process in a series of distinct
    and independent phases, but that's not how it is done in practice.
    The key point is that the compiler knows how the sequence of integers
    is going to be used before it gets that far in the preprocessing.

    I'd expect implementations to have extremely fast implementations for
    initialising arrays of character types, and probably also for other
    arrays of scalar types. More complicated examples - such as
    parameters in a macro or function call - would probably use a
    fall-back of generating naïve lists of integer constants.

    My problem is not just with how the compiler can figure out when it can
    optimize, but how programmers are supposed to understand whatever rules
    it uses. Can I rely on the optimization being performed if I use a
    typedef for unsigned char, or if I use an enumeration type whose
    underlying type is unsigned char, or if I have initialization elements
    before and after the #embed directive?

    A typical use case for me would be to build a binary file
    with a bespoke application. I would expect the #embed of that
    file to _maintain the binary layout in memory exactly the
    same as in the file_. It would be the #embed user's
    responsibility to ensure that the binary file would be identical
    to the binary data expected by the declaration of the data structure
    being embedded.

    E.g. if the embedded file contained an array of some structure,
    the binary format of the embedded file must match the binary format
    that would be expected by the compiler (field sizes, alignment etc)
    for an array of said structure.

    The spec does say that the data in memory must match the data in the
    file. So it seems that the preprocessor can simply add a private
    attribute (e.g. just pass the #embed to the compiler a la #line or #file)
    and the compiler will tag the symbol table entry for the symbol associated
    with the #embed and the code generator can just open the file and
    copy the data byte-for-byte to the object file.



  • From Lawrence D'Oliveiro@21:1/5 to David Brown on Tue May 28 02:48:28 2024
    On Sun, 26 May 2024 18:12:17 +0200, David Brown wrote:

    Macros in C are not recursive. That stops them exploding, but also means there's a lot you can't do with the preprocessor.

    String-based macros + recursive substitution = recipe for trouble.

  • From Lawrence D'Oliveiro@21:1/5 to bart on Tue May 28 02:45:48 2024
    On Mon, 27 May 2024 14:03:16 +0100, bart wrote:

    On 27/05/2024 03:48, Lawrence D'Oliveiro wrote:

    Apparently that is not so easy as you seem to think.

    Yes, it is as easy as I think. I’ve done this sort of thing, using
    suitable build scripts.

    Show me.

    Here <https://github.com/ldo/unicode_browser_android> is an old
    example, from when I was trying to learn Android programming. It lets
    you browse the Unicode code-point database, and do incremental
    searches by partial matching on code-point names: e.g. you can type
    “right arrow” and see candidate matches such as “U+219B RIGHTWARDS
    ARROW WITH STROKE”, “U+219D RIGHTWARDS WAVE ARROW”, “U+21A0 RIGHTWARDS TWO HEADED ARROW” etc.

    In the “util” subdirectory, you will find a Python script called “get_codes”. This processes a NamesList.txt file as downloaded from Unicode.org, and encodes the database as a binary blob with a specially-constructed header to allow quick loading and extraction of code-point information, including names, categories, related entries
    etc. This blob gets built as a “resource file” into the .apk file,
    where the Java code can find it.

  • From Lawrence D'Oliveiro@21:1/5 to Michael S on Tue May 28 05:41:12 2024
    On Sun, 26 May 2024 19:50:40 +0300, Michael S wrote:

    If I am not mistaken, gfortran by default treats extension .f
    as "old FORTRAN" and extension .f90 as "new Fortran".

    The full list of recognized file extensions and their treatment is here <https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gfortran/GNU-Fortran-and-GCC.html>.

  • From Lawrence D'Oliveiro@21:1/5 to David Brown on Tue May 28 05:45:02 2024
    On Mon, 27 May 2024 10:45:28 +0200, David Brown wrote:

    Whichever compiler you use, I strongly recommend using only ".c" for C
    files, and only ".cpp" for C++ files.

    Some use .cc for C++ code as well. Looking at the Blender source tree, for example, I see a mix of .cpp and .cc.

  • From Michael S@21:1/5 to Lawrence D'Oliveiro on Tue May 28 10:46:32 2024
    On Tue, 28 May 2024 05:41:12 -0000 (UTC)
    Lawrence D'Oliveiro <[email protected]d> wrote:

    On Sun, 26 May 2024 19:50:40 +0300, Michael S wrote:

    If I am not mistaken, gfortran by default treats extension .f
    as "old FORTRAN" and extension .f90 as "new Fortran".

    The full list of recognized file extensions and their treatment is
    here <https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gfortran/GNU-Fortran-and-GCC.html>.

    Thank you.
    So I remembered correctly, but did not realize that both old and new
    variants (a.k.a fixed form and free form) are processed by the same
    front end, f951. The driver programs (gcc, g++, gfortran) pass
    dialect information to f951 as a command-line parameter. Fixed form is
    chosen by -ffixed-form; free form appears to be the default.

  • From bart@21:1/5 to Lawrence D'Oliveiro on Tue May 28 11:30:05 2024
    On 28/05/2024 03:45, Lawrence D'Oliveiro wrote:
    On Mon, 27 May 2024 14:03:16 +0100, bart wrote:

    On 27/05/2024 03:48, Lawrence D'Oliveiro wrote:

    Apparently that is not so easy as you seem to think.

    Yes, it is as easy as I think. I’ve done this sort of thing, using
    suitable build scripts.

    Show me.

    Here <https://github.com/ldo/unicode_browser_android> is an old
    example, from when I was trying to learn Android programming. It lets
    you browse the Unicode code-point database, and do incremental
    searches by partial matching on code-point names: e.g. you can type
    “right arrow” and see candidate matches such as “U+219B RIGHTWARDS ARROW WITH STROKE”, “U+219D RIGHTWARDS WAVE ARROW”, “U+21A0 RIGHTWARDS
    TWO HEADED ARROW” etc.

    In the “util” subdirectory, you will find a Python script called “get_codes”. This processes a NamesList.txt file as downloaded from Unicode.org, and encodes the database as a binary blob with a specially-constructed header to allow quick loading and extraction of code-point information, including names, categories, related entries
    etc. This blob gets built as a “resource file” into the .apk file,
    where the Java code can find it.

    OK, so basically this writes a file. Or, part of a file?

    Where is the bit in the Java code that embeds it. Or is writing it as
    part of the .apk what you consider embedding?

    This is like saying that there's no point in anyone doing:

    #embed "clang.exe"

    because building that program is so much more complicated. (Or would be
    if somebody hadn't already done it.)

    The point is this: /once you already have those discrete files/, how do
    you painlessly embed them into your application?

  • From David Brown@21:1/5 to Keith Thompson on Tue May 28 13:52:34 2024
    On 28/05/2024 02:33, Keith Thompson wrote:
    David Brown <[email protected]> writes:
    On 27/05/2024 01:17, Keith Thompson wrote:
    [...]
    Here's how I personally would have preferred for #embed to be
    specified:
    - As in current C23 drafts, #embed with no parameters must operate
    *as if* it expanded to a comma-delimited list of integer constant
    expressions.
    - With no parameters, both the common cases (initializing an array of
    characters) and odd cases (e.g., initializing a struct object with
    varying types and sizes of members) must work as specified.
    - A standard-defined parameter allows control over optimization.
    The parameter can be "optimize(true)" or "optimize(false)".
    "optimize(false)" has no formal effect, but the compiler *should*
    generate the canonical sequence of constants.
    "optimize(true)" causes undefined behavior if #embed is used in a
    context other than the initialization of an array of character type.

    I disagree here. I want the compiler to generate the "as if" results
    regardless of any optimisation, working as currently specified. And
    /if/ the compiler is able to optimise the #embed, then I want it to do
    so automatically - I see no situation in which I would ever want
    "optimize(false)".

    The issue I'm trying to address (very prematurely, no doubt) is
    that the decision of whether to optimize #embed vs. generating the
    naive comma-separated sequence is difficult to formalize, and easy
    to get wrong in corner cases.

    That's probably true. I would expect compiler implementations to
    optimise #embed only in cases where it is very clear (and at the very
    least, initialising a const array of char will fall into that category),
    and only when the preprocessor and compiler can coordinate it. Fallback
    will be using integer literal constants. I can't see any reason why
    that fallback should be slower than using xxd (or similar) and #include,
    so #embed should always be no slower than existing methods but sometimes
    very much faster.

    If optimisation was controlled or specified by something in the standard
    (such as your suggested "optimize()" parameter), then it would have to
    be formalized - leaving it to the implementation, which can document it
    as "best effort", entirely avoids the difficulty of specifying it. The
    only formalization needed is to say that it will always act "as if" it generated a comma-separated sequence.

    "restrict" is another performance
    hint whose only formal effect is to introduce undefined behavior
    if you use it incorrectly.

    Yes, it is. (And I believe C23 has re-written some of the description
    of "restrict" - not to change its behaviour, but to make it clearer. I
    have not looked at that bit as yet.) But again, I can't see how any
    discussion of optimisation of #embed affects the behaviour and therefore
    any UB. The result is /always/ the same - it's just the compile time
    that may differ.


    Let's say I define an array of a 1-byte enumeration type, initialized
    with #embed for a very large binary file. Maybe one compiler recognizes
    this as a case where it can perform the optimization, and another
    doesn't.

    Yes, that may be the case.

    If I can tell the compiler "trust me, I'm using this to
    initialize raw byte data, and I'll take responsibility if I get it
    wrong", I can see that being useful.

    What do you mean by "wrong" here? Both compilers will give identical
    results. The only difference is that one will do so faster than the other.


    And maybe "optimize" isn't the best name. Perhaps "raw_bytes"?

    "raw_bytes" makes no sense to me. I can see that "optimize" might be
    confusing - normally the word refers to the speed (and/or memory usage)
    of the generated code, while here it refers to the speed (and/or memory
    usage) of the compilation.


    Without some kind of programmer control, I'm concerned that the rules
    for defining an array so #embed will be correctly optimized will be
    spread as lore rather than being specified anywhere.


    They might, but I really do not think that is so important, since they
    will not affect the generated results.

  • From David Brown@21:1/5 to Lawrence D'Oliveiro on Tue May 28 13:56:41 2024
    On 28/05/2024 04:48, Lawrence D'Oliveiro wrote:
    On Sun, 26 May 2024 18:12:17 +0200, David Brown wrote:

    Macros in C are not recursive. That stops them exploding, but also means
    there's a lot you can't do with the preprocessor.

    String-based macros + recursive substitution = recipe for trouble.

    I'm not sure I'd go /that/ far - but it is a recipe for complications,
    which can include useful things and new ways to do really bad things.
    It's quite possible to crash (or cause it to halt with an error message)
    a C pre-processor without recursive macros - it just takes slightly longer.
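
    As a small illustration of the "not recursive" rule discussed above: a macro whose replacement list mentions its own name is expanded once, and the inner occurrence is left alone. This is only a sketch; the names depth, read_depth and check_read_depth are invented for the example.

```c
#include <assert.h>

int depth = 1;               /* an ordinary object named depth */

/* Self-referential macro: while `depth` is being replaced, the `depth`
   inside its own replacement list is not expanded again. */
#define depth (4 + depth)

int read_depth(void)
{
    return depth;            /* expands exactly once, to (4 + depth) */
}

void check_read_depth(void)
{
    /* 4 plus the value of the object, not an endless expansion */
    assert(read_depth() == 5);
}
```

    The expansion stops after one step, which is what keeps such definitions from "exploding", at the cost of ruling out genuinely recursive macros.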

  • From Scott Lurndal@21:1/5 to Malcolm McLean on Tue May 28 15:53:20 2024
    Malcolm McLean <[email protected]> writes:
    On 28/05/2024 01:20, Scott Lurndal wrote:
    Keith Thompson <[email protected]> writes:
    David Brown <[email protected]> writes:
    On 26/05/2024 00:58, Keith Thompson wrote:


    It knows because the compiler writers are actually quite smart. The C
    standards may describe the translation process in a series of distinct
    and independent phases, but that's not how it is done in practice.
    The key point is that the compiler knows how the sequence of integers
    is going to be used before it gets that far in the preprocessing.

    I'd expect implementations to have extremely fast implementations for
    initialising arrays of character types, and probably also for other
    arrays of scalar types. More complicated examples - such as
    parameters in a macro or function call - would probably use a
    fall-back of generating naïve lists of integer constants.

    My problem is not just with how the compiler can figure out when it can
    optimize, but how programmers are supposed to understand whatever rules
    it uses. Can I rely on the optimization being performed if I use a
    typedef for unsigned char, or if I use an enumeration type whose
    underlying type is unsigned char, or if I have initialization elements
    before and after the #embed directive?

    A typical use case for me would be to build a binary file
    with a bespoke application. I would expect the #embed of that
    file to _maintain the binary layout in memory exactly the
    same as in the file_. It would be the #embed user's
    responsibility to ensure that the binary file would be identical
    to the binary data expected by the declaration of the data structure
    being embedded.

    E.g. if the embedded file contained an array of some structure,
    the binary format of the embedded file must match the binary format
    that would be expected by the compiler (field sizes, alignment etc)
    for an array of said structure.

    The spec does say that the data in memory must match the data in the
    file. So it seems that the preprocessor can simply add a private
    attribute (e.g. just pass the #embed to the compiler a la #line or #file)
    and the compiler will tag the symbol table entry for the symbol associated
    with the #embed and the code generator can just open the file and
    copy the data byte-for-byte to the object file.

    You need the Baby X resource compiler.

    No, I'll use mmap() to map the binary into the application at run time.
    For various reasons, #embed wouldn't be the proper solution
    for this application since the data being mapped in varies depending
    on the run-time configuration of the application.
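
    For readers unfamiliar with the run-time alternative mentioned here, a minimal POSIX-only sketch of mapping a file read-only might look like the following. The function names map_file_readonly and unmap_file are hypothetical, and this is not the poster's actual code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps the whole file read-only. Returns NULL on failure or for an
   empty file; on success stores the length in *len_out. */
const unsigned char *map_file_readonly(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return base;
}

/* Releases a mapping obtained from map_file_readonly(). */
void unmap_file(const unsigned char *base, size_t len)
{
    if (base != NULL)
        munmap((void *)base, len);
}
```

    Unlike #embed, the file name here can be chosen at run time, which matches the requirement that the data depends on the run-time configuration.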

  • From Scott Lurndal@21:1/5 to Keith Thompson on Tue May 28 15:42:48 2024
    Keith Thompson <[email protected]> writes:
    [email protected] (Scott Lurndal) writes:
    Keith Thompson <[email protected]> writes:
    David Brown <[email protected]> writes:
    On 26/05/2024 00:58, Keith Thompson wrote:
    It knows because the compiler writers are actually quite smart. The C
    standards may describe the translation process in a series of distinct
    and independent phases, but that's not how it is done in practice.
    The key point is that the compiler knows how the sequence of integers
    is going to be used before it gets that far in the preprocessing.

    I'd expect implementations to have extremely fast implementations for
    initialising arrays of character types, and probably also for other
    arrays of scalar types. More complicated examples - such as
    parameters in a macro or function call - would probably use a
    fall-back of generating naïve lists of integer constants.

    My problem is not just with how the compiler can figure out when it can
    optimize, but how programmers are supposed to understand whatever rules
    it uses. Can I rely on the optimization being performed if I use a
    typedef for unsigned char, or if I use an enumeration type whose
    underlying type is unsigned char, or if I have initialization elements
    before and after the #embed directive?

    A typical use case for me would be to build a binary file
    with a bespoke application. I would expect the #embed of that
    file to _maintain the binary layout in memory exactly the
    same as in the file_.

    I'm not sure why you'd expect that given the way #embed is specified --
    *unless* you're using it to initialize an array of characters.

    It would be the #embed user's
    responsibility to ensure that the binary file would be identical
    to the binary data expected by the declaration of the data structure
    being embedded.

    E.g. if the embedded file contained an array of some structure,
    the binary format of the embedded file must match the binary format
    that would be expected by the compiler (field sizes, alignment etc)
    for an array of said structure.

    The spec does say that the data in memory must match the data in the
    file.

    Where does it say that?

    See <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf>
    6.10.4. (N3220 is a C26 draft, but it's very close to C23.)

    The spec says that #embed expands to a comma-delimited sequence of
    integer constant expressions (and like anything, optimizations that
    don't violate the specified behavior are allowed). If the
    implementation-defined *embed element width* is CHAR_BIT (which is not
    guaranteed), then you can expect the same data layout *if* you use it to
    initialize an array of characters, preferably unsigned char.


    "Implementations should take into account translation-time bit and
    byte orders as well as execution-time bit and byte orders to more
    appropriately represent the resource's binary data from the directive.
    This maximizes the chance that, if the resource referenced at translation
    time through the #embed directive is the same one accessed through
    execution-time means, the data that is e.g. fread or similar into contiguous
    storage will compare bit-for-bit equal to an array of character type initialized
    from an #embed directive's expanded contents."

    p. 172 n3220.pdf.
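
    To make the disagreement above concrete, here is a hedged sketch. The macro REC_BIN_BYTES is a hypothetical stand-in for what `#embed "rec.bin"` would expand to for a four-byte file; struct rec and the other names are likewise invented. It shows that the list initializes one element per value, so only a character array reproduces the file layout, and that a structure can then be filled from those bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the expansion of `#embed "rec.bin"`,
   assuming rec.bin holds the four bytes 34 12 78 56. */
#define REC_BIN_BYTES 0x34, 0x12, 0x78, 0x56

/* One list element per array element: four uint16_t values, eight bytes.
   This does NOT reproduce the file's layout. */
const uint16_t as_words[] = { REC_BIN_BYTES };

/* One list element per byte: the memory image matches the file. */
const unsigned char as_bytes[] = { REC_BIN_BYTES };

struct rec {
    uint16_t id;
    uint16_t len;
};

static_assert(sizeof(struct rec) == sizeof as_bytes,
              "record layout must match the file size");

/* Copies the raw bytes into the structure; the values seen depend on
   the byte order of the machine running the code. */
struct rec rec_from_bytes(void)
{
    struct rec r;
    memcpy(&r, as_bytes, sizeof r);
    return r;
}
```

    On a little-endian machine rec_from_bytes() yields id 0x1234 and len 0x5678; matching field sizes, alignment and byte order remains the responsibility of whoever produces the file.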

  • From Michael S@21:1/5 to Keith Thompson on Tue May 28 23:37:18 2024
    On Tue, 28 May 2024 13:21:26 -0700
    Keith Thompson <[email protected]> wrote:

    David Brown <[email protected]> writes:
    On 28/05/2024 02:33, Keith Thompson wrote:
    [...]
    Without some kind of programmer control, I'm concerned that the
    rules for defining an array so #embed will be correctly optimized
    will be spread as lore rather than being specified anywhere.

    They might, but I really do not think that is so important, since
    they will not affect the generated results.

    Right, it won't affect the generated results (assuming I use it
    correctly). Unless I use `#embed optimize(true)` to initialize
    a struct with varying member sizes, but that's my fault because I
    asked for it.

    The point is compile-time performance, and perhaps even the ability
    to compile at all.

    I'm thinking about hypothetical cases where I want to embed a
    *very* large file and parsing the comma-delimited sequence could
    have unacceptable compile-time performance, perhaps even causing
    a compile-time stack overflow depending on how the parser works.
    Every time the compiler sees #embed, it has to decide whether to
    optimize it or not, and the decision criteria are not specified
    anywhere (not at all in the standard, perhaps not clearly in the
    compiler's documentation).


    What about Scott Lurndal's suggestion?
    The preprocessor emits an implementation-defined directive as a prefix to
    the CSV table. The directive tells the compiler to temporarily switch
    into a specialized parsing mode, probably keeping all converted numbers
    in a single node in the parser's result tree.
    As demonstrated in several posts below, parsing by itself, as well as
    text-to-number conversion by itself, is not too bad. It's the tree
    management that is problematic.

  • From Lawrence D'Oliveiro@21:1/5 to bart on Wed May 29 04:17:58 2024
    On Tue, 28 May 2024 11:30:05 +0100, bart wrote:

    OK, so basically this writes a file. Or, part of a file?

    It converts a file into a quick-loading and easily-searchable format.

    Where is the bit in the Java code that embeds it.

    See the TableReader class.

    The point is this: /once you already have those discrete files/, how do
    you painlessly embed them into your application?

    That’s what the build tools are for.

  • From David Brown@21:1/5 to Keith Thompson on Wed May 29 10:02:49 2024
    On 28/05/2024 22:21, Keith Thompson wrote:
    David Brown <[email protected]> writes:
    On 28/05/2024 02:33, Keith Thompson wrote:
    [...]
    Without some kind of programmer control, I'm concerned that the rules
    for defining an array so #embed will be correctly optimized will be
    spread as lore rather than being specified anywhere.

    They might, but I really do not think that is so important, since they
    will not affect the generated results.

    Right, it won't affect the generated results (assuming I use it
    correctly). Unless I use `#embed optimize(true)` to initialize
    a struct with varying member sizes, but that's my fault because I
    asked for it.


    I am still not understanding your point. (I am confident that you have
    a point, even if I don't get it.)

    I cannot see why there would be any need or use of manually adding
    optimisation hints or controls in the source code. I cannot see why
    there is any possibility of getting incorrect results in any way.

    The point is compile-time performance, and perhaps even the ability
    to compile at all.

    I'm thinking about hypothetical cases where I want to embed a
    *very* large file and parsing the comma-delimited sequence could
    have unacceptable compile-time performance, perhaps even causing
    a compile-time stack overflow depending on how the parser works.
    Every time the compiler sees #embed, it has to decide whether to
    optimize it or not, and the decision criteria are not specified
    anywhere (not at all in the standard, perhaps not clearly in the
    compiler's documentation).


    Yes, I agree with that. And this is how it should be - this is not
    something that should be specified. The C standards give minimum
    requirements for things like the number of identifiers or the length of
    lines. But pretty much all compilers, for most of the "translation
    limits", say they are "limited by the memory of the host computer". The
    same will apply to #embed. And some compilers will cope better than
    others with huge #embed's, some will be faster, some more memory
    efficient. Some will change from version to version. This is not
    something that can sensibly be specified or formalized - like pretty
    much everything in regard to compilation time, each compiler does the
    best it can without any specifications. I'd expect compiler reference
    manuals might have hints, such as saying #embed is fastest with unsigned
    char arrays (or whatever), but no more than that.

    But again - I see no reason for manual optimisation hints, and no reason
    for any possible errors.

    Let me outline a possible strategy for a compiler like gcc. (I have not
    looked at the prototype implementations from thephd, nor any gcc
    developer discussions.)

    gcc splits the C pre-processor and the compiler itself, and (currently) communicates dataflow in only one direction, via a temporary file or a
    pipe. But the "gcc" (or "g++", according to preference) driver program
    calls and coordinates the two programs.

    If the pre-processor is called stand-alone, then it will generate a comma-separated list of integers, helpfully split over multiple lines of reasonable size. This will clearly always be correct, and always work,
    within limits of a compiler's translation limits.

    But when the gcc driver calls it, it will have a flag indicating that
    the target compiler is gcc and supports an extended pre-processed syntax
    (and also that the source is C23 - after all, the C pre-processor can be
    used as a macro processor for other files with no relation to C). Now
    the pre-processor has a lot more freedom. Whenever it meets an #embed directive, it can generate a line :

    #embed_data 123456

    followed in the file by 123456 (or whatever) bytes of binary data. The
    C compiler, when parsing this file, will pull that in as a single blob.
    Then it is up to the C compiler - which knows how the #embed data will
    be used - to tell if these bytes should be used as parameters to a
    macro, initialisation for a char array, or whatever. And it can use
    them as efficiently as practically possible. (It is probably only worth
    using this for #embed data over a certain size - smaller #embed's could
    just generate the integer sequences.)



    Nowhere in this is there any call for manual optimisation hints, nor any
    risk of incorrect results.
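
    The consuming side of the strategy outlined above could be sketched roughly as follows. The `#embed_data` line is the hypothetical internal format from this post, not anything gcc actually emits, and read_embed_blob is an invented name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads one "#embed_data <count>" line followed by <count> raw bytes.
   Returns a malloc'd buffer (caller frees) and stores the byte count in
   *count_out, or returns NULL if the input does not have that shape. */
unsigned char *read_embed_blob(FILE *in, size_t *count_out)
{
    assert(in != NULL && count_out != NULL);

    unsigned long count = 0;
    /* The header is ordinary text; the payload that follows is not. */
    if (fscanf(in, "#embed_data %lu", &count) != 1)
        return NULL;
    if (fgetc(in) != '\n')      /* exactly one newline ends the header */
        return NULL;

    unsigned char *blob = malloc(count ? count : 1);
    if (blob == NULL)
        return NULL;

    /* Pull the payload in as a single blob, with no per-byte parsing. */
    if (fread(blob, 1, count, in) != count) {
        free(blob);
        return NULL;
    }
    *count_out = count;
    return blob;
}
```

    The point of the sketch is that the expensive step, turning text tokens back into numbers, disappears entirely; the compiler can still fall back to treating the bytes as a list of integers where the context requires it.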

  • From James Kuyper@21:1/5 to BGB on Wed May 29 11:27:04 2024
    On 5/28/24 02:31, BGB wrote:
    On 5/27/2024 9:48 PM, Lawrence D'Oliveiro wrote:
    On Sun, 26 May 2024 18:12:17 +0200, David Brown wrote:

    Macros in C are not recursive. That stops them exploding, but also means
    there's a lot you can't do with the preprocessor.
    ...
    It seems the preprocessor in BGBCC is likely not entirely conformant
    in this case...

    If given a recursive macro, it will most likely just explode and
    probably crash the compiler...

    Mostly as it handles macro-expansion by looping over the line and
    performing macro-substitutions until no more substitutions are seen,
    at which point it emits the line to the output buffer and moves on to
    the next line.

    That definitely fails to conform to the requirements in 6.10.4.4.

  • From Lawrence D'Oliveiro@21:1/5 to Bonita Montero on Thu May 30 02:32:03 2024
    On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:

    I've got a small command-line tool that makes a const'd char-array
    from any binary file.

    It seems to me it would be more efficient to use objcopy to turn that
    binary file directly into an object file with symbols accessible from C
    code defining its beginning and ending points. Then just link it into the executable.

  • From Michael S@21:1/5 to Lawrence D'Oliveiro on Thu May 30 11:09:05 2024
    On Thu, 30 May 2024 02:32:03 -0000 (UTC)
    Lawrence D'Oliveiro <[email protected]d> wrote:

    On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:

    I've got a small command-line tool that makes a const'd char-array
    from any binary file.

    It seems to me it would be more efficient to use objcopy to turn that
    binary file directly into an object file with symbols accessible from
    C code defining its beginning and ending points. Then just link it
    into the executable.

    Of course, it is more efficient.
    But:
    - it covers fewer use cases.
    - it exposes the array's name and size as global symbols, which is not
    always desirable
    - it feels too much like magic. It would feel less like magic if
    done by the compiler rather than by an extra tool. Even better if done
    by the compiler in a standardized manner.

    But yes, in real life, in an embedded software project, that's what I'd do.

  • From David Brown@21:1/5 to Michael S on Thu May 30 13:43:25 2024
    On 30/05/2024 10:09, Michael S wrote:
    On Thu, 30 May 2024 02:32:03 -0000 (UTC)
    Lawrence D'Oliveiro <[email protected]d> wrote:

    On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:

    I've got a small command-line tool that makes a const'd char-array
    from any binary file.

    It seems to me it would be more efficient to use objcopy to turn that
    binary file directly into an object file with symbols accessible from
    C code defining its beginning and ending points. Then just link it
    into the executable.

    Of course, it is more efficient.
    But:
    - it covers fewer use cases.
    - it exposes the array's name and size as global symbols, which is not
    always desirable
    - it feels too much like magic. It would feel less like magic if
    done by the compiler rather than by an extra tool. Even better if done
    by the compiler in a standardized manner.

    But yes, in real life, in an embedded software project, that's what I'd do.


    In real life, in embedded software projects, I'd use xxd or a few lines
    of Python and have an initialised const array in the code. Why would
    you use something that "feels like magic" (i.e., may be mystical and
    hard to understand for other developers) and is more limited? To save
    three seconds of build time on the rare occasions when the source binary changes?

    Writing C (or C++) programs to generate these initialiser files can be
    fun and instructive - I think we've all learned a little about where the
    bottlenecks really are as a result of your code. But it is not for
    real-life usage.

    #embed will be convenient for many common cases. For anything that
    #embed can't handle, I'd need a project-specific script anyway (such as
    for collecting all the files in a directory and building structures to
    access them). And then it is all about developer convenience - spending
    hours extra on the code to spare a few seconds of build time makes no sense.
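
    For reference, the kind of generator being discussed (the job xxd -i or "a few lines of Python" would do) can be sketched in C as below. The function name emit_c_array and the exact output layout are assumptions made for this example, not the format of any tool mentioned in the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Writes the bytes of `in` to `out` as an initialised const array named
   `name`, twelve values per line, plus a NAME_size constant.
   Returns the number of bytes converted, or -1 on an I/O error.
   Note: an empty input yields an empty initialiser, which is only
   valid C from C23 onwards. */
long emit_c_array(FILE *in, FILE *out, const char *name)
{
    assert(in != NULL && out != NULL && name != NULL);

    long count = 0;
    int c;

    fprintf(out, "const unsigned char %s[] = {", name);
    while ((c = fgetc(in)) != EOF) {
        fprintf(out, "%s%s0x%02x",
                count ? "," : "",               /* separator after the first */
                count % 12 ? " " : "\n    ",    /* start a new line every 12 */
                (unsigned)c);
        ++count;
    }
    fprintf(out, "\n};\nconst unsigned long %s_size = %ld;\n", name, count);

    return (ferror(in) || ferror(out)) ? -1 : count;
}
```

    A build step would run this over the binary whenever it changes and #include the result; with #embed that step, and the generated file, go away for the simple cases.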

  • From Michael S@21:1/5 to bart on Thu May 30 17:08:36 2024
    On Thu, 30 May 2024 14:34:00 +0100
    bart <[email protected]> wrote:

    On 30/05/2024 03:32, Lawrence D'Oliveiro wrote:
    On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:

    I've got a small commandline-tool that makes a const'd char
    -array from any binary file.

    It seems to me it would be more efficient to use objcopy to turn
    that binary file directly into an object file with symbols
    accessible from C code defining its beginning and ending points.
    Then just link it into the executable.

    None of my compilers, whether for C or anything else, generate object
    files.

    However, suppose I wanted to link a file called 'logo.bmp' say, into
    my program, which consisted of a file called main.c.

    What is the entire process using your suggestion? What do I put into
    main.c? Assume the data is represented by a char-array.


    extern unsigned char _binary_logo_bmp_start[];
    extern unsigned char _binary_logo_bmp_size[];

    The first symbol is an array itself.
    The second symbol holds the length of the array. You use it in a somewhat
    non-intuitive way:
    size_t my_size = (size_t)_binary_logo_bmp_size;

    Pay attention that I never used this method myself, just took a look at
    the output of objcopy with 'objdump -t', so please don't take my words
    as a sure thing.

    BTW, options in this case are rather simple:
    objcopy -I binary -O elf32-little logo.bmp logo_bmp.o
    Replace elf32-little with the relevant format for your software. However, I
    am not sure that it would work for non-ELF output formats.

    This command puts the variable into the .data section. If one wants it
    in a different section, e.g. .rwdata, then things could indeed
    become less obvious.
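
    A minimal sketch of how the boundary symbols could be wrapped on the C
    side. It uses a start/end pair instead of the size symbol, so the length
    is ordinary pointer arithmetic. The type and function names (blob_view,
    blob_from_range) are invented for this example, and the extern lines in
    the trailing comment assume the symbol names objcopy derives from
    logo.bmp.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view type: a pointer to the embedded bytes plus their count. */
struct blob_view {
    const unsigned char *data;
    size_t len;
};

/* Build a view from a [start, end) pair such as the linker symbols
   _binary_logo_bmp_start and _binary_logo_bmp_end. */
struct blob_view blob_from_range(const unsigned char *start,
                                 const unsigned char *end)
{
    assert(start != NULL && end >= start);
    struct blob_view v = { start, (size_t)(end - start) };
    return v;
}

/* Intended use once logo_bmp.o is linked in (left as a comment so that
   this sketch compiles without the object file):

       extern const unsigned char _binary_logo_bmp_start[];
       extern const unsigned char _binary_logo_bmp_end[];
       struct blob_view logo = blob_from_range(_binary_logo_bmp_start,
                                               _binary_logo_bmp_end);
*/
```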

  • From bart@21:1/5 to Lawrence D'Oliveiro on Thu May 30 14:34:00 2024
    On 30/05/2024 03:32, Lawrence D'Oliveiro wrote:
    On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:

    I've got a small commandline-tool that makes a const'd char
    -array from any binary file.

    It seems to me it would be more efficient to use objcopy to turn that
    binary file directly into an object file with symbols accessible from C
    code defining its beginning and ending points. Then just link it into the executable.

    None of my compilers, whether for C or anything else, generate object files.

    However, suppose I wanted to link a file called 'logo.bmp' say, into my program, which consisted of a file called main.c.

    What is the entire process using your suggestion? What do I put into
    main.c? Assume the data is represented by a char-array.


    In my language, it would simply be this:

    []byte logobmp = binclude("logo.bmp")

    Using my C extension, it might be this:

    uint8_t logobmp[] = strinclude("logo.bmp");

    (I believe this will cope with embedded zeros, and the file size is
    obtainable with 'sizeof(logobmp)'.)

    With the new feature it might be this (I forget the exact syntax):

    uint8_t logobmp[] = {
    #embed "logo.bmp"
    };

    Nothing else is needed; just compile as normal.

    The point of the feature is to avoid the palaver with 'objcopy', which is a
    utility with 100 different options, or messing with ones like xxd.
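
    A slightly fuller sketch of the C23 form. The __has_embed guard and the
    two-byte placeholder are additions made here so that the fragment still
    compiles when logo.bmp is absent or the compiler predates #embed; the
    macro name HAVE_LOGO_BMP and the variable logobmp_size are likewise
    invented for the example.

```c
#include <stddef.h>

/* Check for the file only when the compiler knows __has_embed, so that
   older compilers skip the whole test instead of rejecting it. */
#ifdef __has_embed
#  if __has_embed("logo.bmp") == __STDC_EMBED_FOUND__
#    define HAVE_LOGO_BMP 1
#  endif
#endif

static const unsigned char logobmp[] = {
#ifdef HAVE_LOGO_BMP
#  embed "logo.bmp"
#else
    /* Placeholder (the two-byte "BM" signature) so the sketch still builds
       when logo.bmp or #embed support is missing. */
    0x42, 0x4D
#endif
};

/* sizeof gives the byte count directly; no separate size symbol is needed. */
static const size_t logobmp_size = sizeof logobmp;
```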

  • From Michael S@21:1/5 to Michael S on Thu May 30 17:51:07 2024
    On Thu, 30 May 2024 17:08:36 +0300
    Michael S <[email protected]> wrote:

    On Thu, 30 May 2024 14:34:00 +0100
    bart <[email protected]> wrote:

    On 30/05/2024 03:32, Lawrence D'Oliveiro wrote:
    On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:

    I've got a small commandline-tool that makes a const'd char
    -array from any binary file.

    It seems to me it would be more efficient to use objcopy to turn
    that binary file directly into an object file with symbols
    accessible from C code defining its beginning and ending points.
    Then just link it into the executable.

    None of my compilers, whether for C or anything else, generate
    object files.

    However, suppose I wanted to link a file called 'logo.bmp' say, into
    my program, which consisted of a file called main.c.

    What is the entire process using your suggestion? What do I put
    into main.c? Assume the data is represented by a char-array.


    extern unsigned char _binary_logo_bmp_start[];
    extern unsigned char _binary_logo_bmp_size[];

    The first symbol is an array itself.
    The second symbol holds the length of the array. You use it in a
    somewhat non-intuitive way:
    size_t my_size = (size_t)_binary_logo_bmp_size;

    Pay attention that I never used this method myself, just took a look
    at the output of objcopy with 'objdump -t', so please don't take my
    words as a sure thing.

    BTW, options in this case are rather simple:
    objcopy -I binary -O elf32-little logo.bmp logo_bmp.o
    Replace elf32-little with relevant format for your software. However I
    am not sure that it would work for none-elf output formats.


    Tested it.
    On msys2 it can produce the correct pe-x86-64 format, but does it in a
    counter-intuitive way: you have to ask for elf64-x86-64 instead of
    pe-x86-64.
    I don't know why it works like that.

    The rest of what I wrote above was correct.

  • From Michael S@21:1/5 to bart on Thu May 30 18:03:45 2024
    On Thu, 30 May 2024 15:48:39 +0100
    bart <[email protected]> wrote:


    Where do the _binary_logo_bmp_start and ...-size symbols come from?
    That is, how do they get into the object file.


    objcopy generates the names of the symbols from the name of the input binary
    file. I would think that it is possible to change these symbols to
    something else, but I am not sure that it is possible within the same
    invocation of objcopy. It certainly is possible with a second pass.
    Lawrence can probably give a more authoritative answer.
    Or as a last resort you can RTFM.

  • From bart@21:1/5 to Michael S on Thu May 30 15:48:39 2024
    On 30/05/2024 15:08, Michael S wrote:
    On Thu, 30 May 2024 14:34:00 +0100
    bart <[email protected]> wrote:

    On 30/05/2024 03:32, Lawrence D'Oliveiro wrote:
    On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:

    I've got a small commandline-tool that makes a const'd char
    -array from any binary file.

    It seems to me it would be more efficient to use objcopy to turn
    that binary file directly into an object file with symbols
    accessible from C code defining its beginning and ending points.
    Then just link it into the executable.

    None of my compilers, whether for C or anything else, generate object
    files.

    However, suppose I wanted to link a file called 'logo.bmp' say, into
    my program, which consisted of a file called main.c.

    What is the entire process using your suggestion? What do I put into
    main.c? Assume the data is represented by a char-array.


    extern unsigned char _binary_logo_bmp_start[];
    extern unsigned char _binary_logo_bmp_size[];

    The first symbol is an array itself.
    The second symbol holds the length of the array. You use it in a somewhat
    non-intuitive way:
    size_t my_size = (size_t)_binary_logo_bmp_size;

    Pay attention that I never used this method myself, just took a look at
    the output of objcopy with 'objdump -t', so please don't take my words
    as a sure thing.

    BTW, options in this case are rather simple:
    objcopy -I binary -O elf32-little logo.bmp logo_bmp.o

    Where do the _binary_logo_bmp_start and ...-size symbols come from? That
    is, how do they get into the object file.

    Replace elf32-little with relevant format for your software. However I
    am not sure that it would work for none-elf output formats.


    There appears to be an objcopy utility that runs under Windows.

  • From David Brown@21:1/5 to bart on Fri May 31 09:24:48 2024
    On 30/05/2024 16:48, bart wrote:
    On 30/05/2024 15:08, Michael S wrote:

    Replace elf32-little with relevant format for your software. However I
    am not sure that it would work for none-elf output formats.


    There appears to be an objcopy utility that runs under Windows.


    objcopy can handle lots of formats, as source or target, and can run on
    any general OS host. So the question is not if you can get objcopy that
    runs on Windows, it is whether you can use this kind of
    blob-to-object-file conversion with the output in the Windows object
    file format in the same way as you can for elf formats. You know vastly
    more about the Windows object file formats than I do, so maybe you can
    answer this yourself.

  • From Michael S@21:1/5 to David Brown on Fri May 31 13:39:49 2024
    On Fri, 31 May 2024 09:24:48 +0200
    David Brown <[email protected]> wrote:

    On 30/05/2024 16:48, bart wrote:
    On 30/05/2024 15:08, Michael S wrote:

    Replace elf32-little with relevant format for your software.
    However I am not sure that it would work for none-elf output
    formats.


    There appears to be an objcopy utility that runs under Windows.


    objcopy can handle lots of formats, as source or target, and can run
    on any general OS host. So the question is not if you can get
    objcopy that runs on Windows, it is whether you can use this kind of
    blob-to-object-file conversion with the output in the Windows object
    file format in the same way as you can for elf formats.

    That's quite a strange question.
    You mean, you are able to imagine an object file format incapable of
    representing an initialized data array?

    You know
    vastly more about the Windows object file formats than I do, so maybe
    you can answer this yourself.


    The objcopy supplied with msys2 appears to have a bug in -O selection handling,
    but fortunately there exists an easy workaround. Read my post below if
    you are interested.

  • From David Brown@21:1/5 to Michael S on Fri May 31 13:31:09 2024
    On 31/05/2024 12:39, Michael S wrote:
    On Fri, 31 May 2024 09:24:48 +0200
    David Brown <[email protected]> wrote:

    On 30/05/2024 16:48, bart wrote:
    On 30/05/2024 15:08, Michael S wrote:

    Replace elf32-little with relevant format for your software.
    However I am not sure that it would work for none-elf output
    formats.


    There appears to be an objcopy utility that runs under Windows.


    objcopy can handle lots of formats, as source or target, and can run
    on any general OS host. So the question is not if you can get
    objcopy that runs on Windows, it is whether you can use this kind of
    blob-to-object-file conversion with the output in the Windows object
    file format in the same way as you can for elf formats.

    That's quite a strange question.
    You mean, you are able to imagine an object file format incapable of
    representing an initialized data array?


    I'm sure I could imagine such a format, but I suppose it is quite unlikely!

    You know
    vastly more about the Windows object file formats than I do, so maybe
    you can answer this yourself.


    The objcopy supplied with msys2 appears to have a bug in -O selection handling,
    but fortunately there exists an easy workaround. Read my post below if
    you are interested.


    OK, a bug in a particular version or build of objcopy sounds a lot more
    likely than a perversely restricted object code format.

  • From bart@21:1/5 to Michael S on Fri May 31 13:55:33 2024
    On 30/05/2024 16:03, Michael S wrote:
    On Thu, 30 May 2024 15:48:39 +0100
    bart <[email protected]> wrote:


    Where do the _binary_logo_bmp_start and ...-size symbols come from?
    That is, how do they get into the object file.


    objcopy generates the names of the symbols from the name of the input binary
    file. I would think that it is possible to change these symbols to
    something else, but I am not sure that it is possible within the same
    invocation of objcopy. It certainly is possible with a second pass.
    Lawrence can probably give a more authoritative answer.
    Or as a last resort you can RTFM.

    I gave myself the simple task of incorporating the source text of
    hello.c into a program, and printing it out.

    My C program looked like this to start, as an initial test (ignoring
    declaring the size as an array, unless I had to):

    #include <stdio.h>
    typedef unsigned char byte;

    extern byte _binary_hello_c_start[];
    extern int _binary_hello_c_size;

    int main(void) {
    printf("%d\n", _binary_hello_c_size);
    }

    One small matter is those ugly, long identifiers. A bigger one in this
    case is that I really want that embedded text to be zero terminated;
    here it's unlikely to be.

    However I still have to create the object file with the data. I tried this:

    objcopy -I binary -O pe-x86-64 hello.c hello.obj

    The contents looked about right when I looked inside.

    Now to build my program. Because my C compiler can't link object files
    itself, I have to get it to generate an object file for the program,
    then use an external linker:

    C:\c>mcc -c c.c
    Compiling c.c to c.obj

    C:\c>gcc c.obj hello.obj
    hello.obj: file not recognized: file format not recognized
    collect2.exe: error: ld returned 1 exit status

    Unfortunately gcc/ld doesn't recognise the output of objcopy. Even
    though it accepts the output of mcc which is the same COFF format.

    But even if it worked, you can see it would be a bit of a palaver.

    Here's how builtin embedding worked using a feature of my older C compiler:

    #include <stdio.h>
    #include <string.h>

    char hello[] = strinclude("hello.c");

    int main(void) {
    printf("hello =\n%s\n", hello);
    printf("strlen(hello) = %zu\n", strlen(hello));
    printf("sizeof(hello) = %zu\n", sizeof(hello));
    }


    I build it and run it like this:

    C:\c>bcc c
    Compiling c.c to c.exe

    C:\c>c
    hello =
    #include "stdio.h"

    int main(void) {
    printf("Hello, World!\n");
    }

    strlen(hello) = 70
    sizeof(hello) = 71

    C:\c>dir hello.c
    31/05/2024 13:48 70 hello.c


    It just works; no messing about with objcopy parameters; no long
    unwieldy names; no link errors due to unsupported file formats; no
    problems with missing terminators for embedded text files imported as
    strings; no funny ways of getting size info.

  • From Michael S@21:1/5 to Michael S on Fri May 31 16:28:11 2024
    On Fri, 31 May 2024 16:19:37 +0300
    Michael S <[email protected]> wrote:


    No, it does not work like that.
    First, copy *exactly* what I said in my previous post.
    Only after you reproduced, start to be smart.
    _binary_hello_c_size is a link symbol rather than a variable.

    Declaration:
    extern char _binary_hello_c_size[];

    Usage:
    printf("%zu\n", (size_t)_binary_hello_c_size);


    Thinking about it, I could be wrong.
    I should test more, with a larger program.

  • From Michael S@21:1/5 to bart on Fri May 31 16:19:37 2024
    On Fri, 31 May 2024 13:55:33 +0100
    bart <[email protected]> wrote:

    On 30/05/2024 16:03, Michael S wrote:
    On Thu, 30 May 2024 15:48:39 +0100
    bart <[email protected]> wrote:


    Where do the _binary_logo_bmp_start and ...-size symbols come from?
    That is, how do they get into the object file.


    objcopy generates the names of the symbols from the name of the input binary
    file. I would think that it is possible to change these symbols to
    something else, but I am not sure that it is possible within the same
    invocation of objcopy. It certainly is possible with a second pass.
    Lawrence can probably give a more authoritative answer.
    Or as a last resort you can RTFM.

    I gave myself the simple task of incorporating the source text of
    hello.c into a program, and printing it out.

    My C program looked like this to start, as an initial test (ignoring declaring the size as an array, unless I had to):

    #include <stdio.h>
    typedef unsigned char byte;

    extern byte _binary_hello_c_start[];
    extern int _binary_hello_c_size;

    int main(void) {
    printf("%d\n", _binary_hello_c_size);
    }


    No, it does not work like that.
    First, copy *exactly* what I said in my previous post.
    Only after you have reproduced it, start to be smart.
    _binary_hello_c_size is a link symbol rather than a variable.

    Declaration:
    extern char _binary_hello_c_size[];

    Usage:
    printf("%zu\n", (size_t)_binary_hello_c_size);

    One small matter is those ugly, long identifiers. A bigger one in
    this case is that I really want that embedded text to be zero
    terminated; here it's unlikely to be.


    The tool is not made specifically for ASCII strings, it is more generic.
    I don't want it zero-terminated, the same as I don't want output of
    fread() zero-terminated. I want it exactly like it is in the
    input file.

    However I still have to create the object file with the data. I tried
    this:

    objcopy -I binary -O pe-x86-64 hello.c hello.obj

    The contents looked about right when I looked inside.

    Now to build my program. Because my C compiler can't link object
    files itself, I have to get it to generate an object file for the
    program, then use an external linker:

    C:\c>mcc -c c.c
    Compiling c.c to c.obj

    C:\c>gcc c.obj hello.obj
    hello.obj: file not recognized: file format not recognized
    collect2.exe: error: ld returned 1 exit status

    Unfortunately gcc/ld doesn't recognise the output of objcopy. Even
    though it accepts the output of mcc which is the same COFF format.


    It recognizes it if you lie to objcopy about the format.
    Specify elf64-x86-64 instead of pe-x86-64 and everything suddenly
    works.
    That was all said in my posts from yesterday. It does not sound like you
    had read them.

    But even if it worked, you can see it would be a bit of a palaver.

    Here's how builtin embedding worked using a feature of my older C
    compiler:

    #include <stdio.h>
    #include <string.h>

    char hello[] = strinclude("hello.c");

    int main(void) {
    printf("hello =\n%s\n", hello);
    printf("strlen(hello) = %zu\n", strlen(hello));
    printf("sizeof(hello) = %zu\n", sizeof(hello));
    }


    I build it and run it like this:

    C:\c>bcc c
    Compiling c.c to c.exe

    C:\c>c
    hello =
    #include "stdio.h"

    int main(void) {
    printf("Hello, World!\n");
    }

    strlen(hello) = 70
    sizeof(hello) = 71

    C:\c>dir hello.c
    31/05/2024 13:48 70 hello.c


    It just works; no messing about with objcopy parameters; no long
    unwieldy names; no link errors due to unsupported file formats; no
    problems with missing terminators for embedded text files imported as strings; no funny ways of getting size info.


  • From bart@21:1/5 to Michael S on Fri May 31 15:04:46 2024
    On 31/05/2024 14:48, Michael S wrote:
    On Fri, 31 May 2024 16:28:11 +0300
    Michael S <[email protected]> wrote:

    On Fri, 31 May 2024 16:19:37 +0300
    Michael S <[email protected]> wrote:


    No, it does not work like that.
    First, copy *exactly* what I said in my previous post.
    Only after you reproduced, start to be smart.
    _binary_hello_c_size is a link symbol rather than a variable.

    Declaration:
    extern char _binary_hello_c_size[];

    Usage:
    printf("%zu\n", (size_t)_binary_hello_c_size);


    Thinking about it, I could be wrong.
    I should test more, with a larger program.


    I tested with a bigger program, and it still works.
    So what is written above is correct.


    Can you show the full program and the full process?

  • From Michael S@21:1/5 to Michael S on Fri May 31 16:48:35 2024
    On Fri, 31 May 2024 16:28:11 +0300
    Michael S <[email protected]> wrote:

    On Fri, 31 May 2024 16:19:37 +0300
    Michael S <[email protected]> wrote:


    No, it does not work like that.
    First, copy *exactly* what I said in my previous post.
    Only after you reproduced, start to be smart.
    _binary_hello_c_size is a link symbol rather than a variable.

    Declaration:
    extern char _binary_hello_c_size[];

    Usage:
    printf("%zu\n", (size_t)_binary_hello_c_size);


    Thinking about it, I could be wrong.
    I should test more, with a larger program.


    I tested with a bigger program, and it still works.
    So what is written above is correct.

  • From bart@21:1/5 to Michael S on Fri May 31 15:03:55 2024
    On 31/05/2024 14:19, Michael S wrote:
    On Fri, 31 May 2024 13:55:33 +0100

    No, it does not work like that.
    First, copy *exactly* what I said in my previous post.
    Only after you reproduced, start to be smart.
    _binary_hello_c_size is a link symbol rather than a variable.

    Declaration:
    extern char _binary_hello_c_size[];

    Usage:
    printf("%zu\n", (size_t)_binary_hello_c_size);

    I've now tried all sorts of combinations. While I can display the
    address of _binary_hello_c_size, it will crash if I try to dereference it.

    The value of that symbol looks like this:

    00007ff678b90046

    It is clearly not the size of the data. But as I said, I can't get
    inside it. Neither is it simply the end address of the data (they differ
    by about 2**30).


    One small matter is those ugly, long identifiers. A bigger one in
    this case is that I really want that embedded text to be zero
    terminated; here it's unlikely to be.


    The tool is not made specifically for ASCII strings, it is more generic.

    There are two possibilities I'm interested in:

    * Having the data zero-terminated, for when you want to embed a text
    file as a zero-terminated string
    * Everything else, where you just want the binary blob as-is
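
    The first of these two cases is what the suffix parameter of C23's
    #embed is for: it appends extra tokens after the file's bytes, which can
    supply the terminator. A sketch, assuming hello.c sits where the
    compiler can find it. The __has_embed guard, the fallback text, and the
    names HAVE_HELLO_C and hello_len are additions made here so the fragment
    compiles even without the file or without #embed support.

```c
#include <stddef.h>

#ifdef __has_embed
#  if __has_embed("hello.c") == __STDC_EMBED_FOUND__
#    define HAVE_HELLO_C 1
#  endif
#endif

/* Text file embedded as a NUL-terminated string. */
static const char hello[] = {
#ifdef HAVE_HELLO_C
    /* suffix(, 0) appends ", 0" after the file's bytes. */
#  embed "hello.c" suffix(, 0)
#else
    /* Fallback text, used when hello.c or #embed support is missing. */
    'n', 'o', ' ', 'f', 'i', 'l', 'e', '\n', 0
#endif
};

/* Length without the terminator; equals strlen(hello) as long as the
   file itself contains no zero bytes. */
static const size_t hello_len = sizeof hello - 1;
```

    The second case is the plain form without suffix. Because suffix only
    applies to a non-empty file, an empty input would additionally need the
    if_empty parameter; the guard above sidesteps that by requiring a
    non-empty file.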

    Unfortunately gcc/ld doesn't recognise the output of objcopy. Even
    though it accepts the output of mcc which is the same COFF format.


    It recognizes it if you lie to objcopy about the format.
    Specify elf64-x86-64 instead of pe-x86-64 and everything suddenly
    works.
    That was all said in my posts from yesterday. It does not sound like you
    had read them.

    You said RTFM; I did. Nowhere did it say you have to use ELF object
    format /on Windows/.

    So, since I know gcc/ld on Windows understands PE, why didn't it work?

    You can see there is problem after problem and a number of quirks.

    But if you have a fully working demo (you can forget the string
    requirement, that was just an easy way of showing it had the right
    data), then I will look at it again.

    Even if it works, however, I don't like it and now don't trust it.

    This is why I prefer language-supported solutions, and fortunately in my languages I can make it as simple and intuitive as I like.

  • From Michael S@21:1/5 to bart on Fri May 31 17:34:37 2024
    On Fri, 31 May 2024 15:04:46 +0100
    bart <[email protected]> wrote:

    On 31/05/2024 14:48, Michael S wrote:
    On Fri, 31 May 2024 16:28:11 +0300
    Michael S <[email protected]> wrote:

    On Fri, 31 May 2024 16:19:37 +0300
    Michael S <[email protected]> wrote:


    No, it does not work like that.
    First, copy *exactly* what I said in my previous post.
    Only after you reproduced, start to be smart.
    _binary_hello_c_size is a link symbol rather than a variable.

    Declaration:
    extern char _binary_hello_c_size[];

    Usage:
    printf("%zu\n", (size_t)_binary_hello_c_size);


    Thinking about it, I could be wrong.
    I should test more, with a larger program.


    I tested with a bigger program, and it still works.
    So what is written above is correct.


    Can you show the full program and the full process?

    test_objcopy.c:
    #include <stdio.h>

    int data1[42] = { 1,2,3 ,4,5};
    extern unsigned char _binary_test_bi_start[];
    extern unsigned char _binary_test_bi_end[];
    extern unsigned char _binary_test_bi_size[];

    extern unsigned char _binary_bin_to_list_c_start[];
    extern unsigned char _binary_bin_to_list_c_end[];
    extern unsigned char _binary_bin_to_list_c_size[];

    int main(void)
    {
    /* %p expects a void pointer and %zu matches size_t */
    printf("%-40s %p %zu\n", "_binary_test_bi_start",
    (void *)_binary_test_bi_start, (size_t)_binary_test_bi_start);
    printf("%-40s %p %zu\n", "_binary_test_bi_end",
    (void *)_binary_test_bi_end, (size_t)_binary_test_bi_end);
    printf("%-40s %p %zu\n", "_binary_test_bi_size",
    (void *)_binary_test_bi_size, (size_t)_binary_test_bi_size);
    printf("%-40s %p %zu\n", "_binary_bin_to_list_c_start",
    (void *)_binary_bin_to_list_c_start, (size_t)_binary_bin_to_list_c_start);
    printf("%-40s %p %zu\n", "_binary_bin_to_list_c_end",
    (void *)_binary_bin_to_list_c_end, (size_t)_binary_bin_to_list_c_end);
    printf("%-40s %p %zu\n", "_binary_bin_to_list_c_size",
    (void *)_binary_bin_to_list_c_size, (size_t)_binary_bin_to_list_c_size);
    return 0;
    }

    Test files: test.bi and bin_to_list.c.
    Conversion to objects:
    objcopy -I binary -O elf64-x86-64 test.bi test_bi.o
    objcopy -I binary -O elf64-x86-64 bin_to_list.c test_c.o

    Compilation:
    gcc -s -Wall -Oz test_objcopy.c test_bi.o test_c.o

    I compiled with the additional option -Xlinker -Map=test_objcopy.map
    to make sure that the *_size symbols are indeed pure symbols that
    have no memory allocated underneath.

  • From Kaz Kylheku@21:1/5 to bart on Fri May 31 15:34:29 2024
    On 2024-05-31, bart <[email protected]> wrote:
    Here's how builtin embedding worked using a feature of my older C compiler:

    #include <stdio.h>
    #include <string.h>

    char hello[] = strinclude("hello.c");

    int main(void) {
    printf("hello =\n%s\n", hello);
    printf("strlen(hello) = %zu\n", strlen(hello));
    printf("sizeof(hello) = %zu\n", sizeof(hello));
    }

    Lisp:

    $ cat strincl.tl
    (defmacro strinclude (path)
    (put-line `including @path`)
    (file-get-string path))

    (defun test()
    (strinclude "/etc/hostname"))

    When we run it interpreted we see from the debug put-line that /etc/hostname is included at macro-expansion time before we run the test function:

    $ txr -i strincl.tl
    including /etc/hostname
    This TTY may be recorded for privacy-violating and evidence-gathering purposes.
    (test)
    "sun-go\n"

    Now compile the file: the file is pulled in at compile time. Twice. :)
    A double expansion took place due to certain complexities of compiling.

    $ txr --compile=strincl.tl
    including /etc/hostname
    including /etc/hostname

    Now when we load the compiled file, the diagnostic trace
    "including /etc/hostname" no longer appears: the string is part of the test function as a literal:

    $ txr -i strincl
    TXR is enteric coated to release over 24 hours of lasting relief.
    (test)
    "sun-go\n"

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @[email protected]

  • From bart@21:1/5 to Michael S on Fri May 31 19:03:10 2024
    On 31/05/2024 15:34, Michael S wrote:
    On Fri, 31 May 2024 15:04:46 +0100
    bart <[email protected]> wrote:

    Can you show the full program and the full process?

    test_objcopy.c:
    #include <stdio.h>

    int data1[42] = { 1,2,3 ,4,5};
    extern unsigned char _binary_test_bi_start[];
    extern unsigned char _binary_test_bi_end[];
    extern unsigned char _binary_test_bi_size[];

    extern unsigned char _binary_bin_to_list_c_start[];
    extern unsigned char _binary_bin_to_list_c_end[];
    extern unsigned char _binary_bin_to_list_c_size[];

    int main()
    {
    printf("%-40s %p %zd\n", "_binary_test_bi_start",
    _binary_test_bi_start, (size_t)_binary_test_bi_start);
    printf("%-40s %p %zd\n", "_binary_test_bi_end",
    _binary_test_bi_end, (size_t)_binary_test_bi_end);
    printf("%-40s %p %zd\n", "_binary_test_bi_size",
    _binary_test_bi_size, (size_t)_binary_test_bi_size);
    printf("%-40s %p %zd\n", "_binary_bin_to_list_c_start",
    _binary_bin_to_list_c_start, (size_t)_binary_bin_to_list_c_start);
    printf("%-40s %p %zd\n", "_binary_bin_to_list_c_end",
    _binary_bin_to_list_c_end, (size_t)_binary_bin_to_list_c_end);
    printf("%-40s %p %zd\n", "_binary_bin_to_list_c_size",
    _binary_bin_to_list_c_size, (size_t)_binary_bin_to_list_c_size);
    return 0;
    }

    Test files: test.bi and bin_to_list.c.
    Conversion to objects:
    objcopy -I binary -O elf64-x86-64 test.bi test_bi.o
    objcopy -I binary -O elf64-x86-64 bin_to_list.c test_c.o

    Compilation:
    gcc -s -Wall -Oz test_objcopy.c test_bi.o test_c.o

    OK, thanks. But I forgot to ask what results you got from running the
    program. Because if I try your code, using hello.c and hello.exe as test
    binary/source data, I get this output:

    _binary_test_bi_start                    00007ff6497620e0 140695771160800
    _binary_test_bi_end                      00007ff649762ae0 140695771163360
    _binary_test_bi_size                     00007ff509750a00 140690402380288
    _binary_bin_to_list_c_start              00007ff649762ae0 140695771163360
    _binary_bin_to_list_c_end                00007ff649762b26 140695771163430
    _binary_bin_to_list_c_size               00007ff509750046 140690402377798

    The sizes should have been 2560 and 70 respectively; those values are a
    bit bigger than that.

    However I see that you also have start and end addresses, which sounds a
    much better way of determining the size. (In that case, what are those
    *size symbols for?).
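
    Checking the arithmetic on the printed addresses: the end/start
    differences do give the expected sizes, and the low 16 bits of the two
    *_size values match them as well. The second observation suggests, as a
    guess rather than a confirmed diagnosis, that the absolute size symbols
    had a base address added to them somewhere between linking and loading.
    The helper names span and low16 are invented for this check.

```c
#include <stdint.h>

/* Number of bytes between two addresses from the printed output. */
static uint64_t span(uint64_t start, uint64_t end)
{
    return end - start;
}

/* Low 16 bits of an address, to compare against the expected sizes. */
static uint64_t low16(uint64_t addr)
{
    return addr & 0xFFFFu;
}

/* Values copied from the program output quoted in the post. */
static const uint64_t bi_start = 0x00007ff6497620e0ULL;
static const uint64_t bi_end   = 0x00007ff649762ae0ULL;
static const uint64_t bi_size  = 0x00007ff509750a00ULL;
static const uint64_t c_start  = 0x00007ff649762ae0ULL;
static const uint64_t c_end    = 0x00007ff649762b26ULL;
static const uint64_t c_size   = 0x00007ff509750046ULL;
```

    span(bi_start, bi_end) is 2560 and span(c_start, c_end) is 70, the
    expected file sizes; low16(bi_size) and low16(c_size) give the same two
    numbers.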

    So I can put together a working test:

    ---------------------------------
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    extern unsigned char _binary_hello_c_start[];
    extern unsigned char _binary_hello_c_end[];

    char* makestr(const unsigned char* start, const unsigned char* end) {
    size_t length = (size_t)(end-start);
    char* s = malloc(length+1);
    memcpy(s, start, length);
    *(s+length) = 0;
    return s;
    }

    int main() {
    char* str = makestr(_binary_hello_c_start, _binary_hello_c_end);

    printf("Hello = \n%s", str);
    }
    ---------------------------------

    I can build it like this:

    ---------------------------------
    C:\c>mcc -c c
    Compiling c.c to c.obj

    C:\c>objcopy -I binary -O elf64-x86-64 hello.c hello.obj

    C:\c>gcc c.c hello.obj
    ---------------------------------

    And run it like this:
    ---------------------------------
    C:\c>a
    Hello =
    #include "stdio.h"

    int main(void) {
    printf("Hello, World!\n");
    }
    ---------------------------------

    Instead of one compiler, here I used two compilers, a tool 'objcopy'
    (which bizarrely needs to generate ELF format files) and lots of extra
    ugly code. I also need to disregard whatever the hell _binary_..._size does.

    But it works.

  • From Scott Lurndal@21:1/5 to bart on Fri May 31 18:36:14 2024
    bart <[email protected]> writes:
    On 31/05/2024 15:34, Michael S wrote:
    On Fri, 31 May 2024 15:04:46 +0100
    bart <[email protected]> wrote:

    Instead of one compiler, here I used two compilers, a tool 'objcopy'
    (which bizarrely needs to generate ELF format files) and lots of extra
    ugly code. I also need to disregard whatever the hell _binary_..._size does.

    $ objcopy -I binary -O elf64-x86-64 main.cpp /tmp/test.o

    $ objdump -x /tmp/test.o

    /tmp/test.o: file format elf64-little
    /tmp/test.o
    architecture: UNKNOWN!, flags 0x00000010:
    HAS_SYMS
    start address 0x0000000000000000

    Sections:
    Idx Name Size VMA LMA File off Algn
    0 .data 000030e2 0000000000000000 0000000000000000 00000040 2**0
    CONTENTS, ALLOC, LOAD, DATA
    SYMBOL TABLE:
    0000000000000000 l d .data 0000000000000000 .data
    0000000000000000 g .data 0000000000000000 _binary_main_cpp_start
    00000000000030e2 g .data 0000000000000000 _binary_main_cpp_end
    00000000000030e2 g *ABS* 0000000000000000 _binary_main_cpp_size

    $ ls -l main.cpp
    -rw-rw-r--. 1 scott scott 12514 May 9 2022 main.cpp
    $ printf '%u\n' $(( 0x30e2 ))
    12514

    The value of the symbol _binary_main_cpp_size is the
    number of bytes in the file.

    (in other words,

    _binary_main_cpp_size = _binary_main_cpp_end - _binary_main_cpp_start

    )

    In C code:

    extern uint8_t _binary_main_cpp_size;

    const size_t embed_size = (size_t)&_binary_main_cpp_size;

  • From jak@21:1/5 to All on Fri May 31 21:42:35 2024
    bart ha scritto:
    On 31/05/2024 15:34, Michael S wrote:
    On Fri, 31 May 2024 15:04:46 +0100
    bart <[email protected]> wrote:

    Can you show the full program and the full process?

    test_objcopy.c:
    #include <stdio.h>

    int data1[42] = { 1,2,3 ,4,5};
    extern unsigned char _binary_test_bi_start[];
    extern unsigned char _binary_test_bi_end[];
    extern unsigned char _binary_test_bi_size[];

    extern unsigned char _binary_bin_to_list_c_start[];
    extern unsigned char _binary_bin_to_list_c_end[];
    extern unsigned char _binary_bin_to_list_c_size[];

    int main()
    {
       printf("%-40s %p %zd\n", "_binary_test_bi_start",
         _binary_test_bi_start, (size_t)_binary_test_bi_start);
       printf("%-40s %p %zd\n", "_binary_test_bi_end",
         _binary_test_bi_end, (size_t)_binary_test_bi_end);
       printf("%-40s %p %zd\n", "_binary_test_bi_size",
         _binary_test_bi_size, (size_t)_binary_test_bi_size);
       printf("%-40s %p %zd\n", "_binary_bin_to_list_c_start",
         _binary_bin_to_list_c_start, (size_t)_binary_bin_to_list_c_start);
       printf("%-40s %p %zd\n", "_binary_bin_to_list_c_end",
         _binary_bin_to_list_c_end, (size_t)_binary_bin_to_list_c_end);
       printf("%-40s %p %zd\n", "_binary_bin_to_list_c_size",
         _binary_bin_to_list_c_size, (size_t)_binary_bin_to_list_c_size);
       return 0;
    }

    Test files: test.bi and bin_to_list.c.
    Conversion to objects:
    objcopy -I binary -O elf64-x86-64 test.bi test_bi.o
    objcopy -I binary -O elf64-x86-64 bin_to_list.c test_c.o

    Compilation:
    gcc -s -Wall -Oz test_objcopy.c test_bi.o test_c.o

    OK, thanks. But I forgot to ask what results you got from running the
    program. Because if I try your code, using hello.c and hello.exe as test
    binary/source data, I get this output:

    _binary_test_bi_start                    00007ff6497620e0 140695771160800
    _binary_test_bi_end                      00007ff649762ae0 140695771163360
    _binary_test_bi_size                     00007ff509750a00 140690402380288
    _binary_bin_to_list_c_start              00007ff649762ae0 140695771163360
    _binary_bin_to_list_c_end                00007ff649762b26 140695771163430
    _binary_bin_to_list_c_size               00007ff509750046 140690402377798

    The sizes should have been 2560 and 70 respectively; those values are
    a bit bigger than that.

    However I see that you also have start and end addresses, which sounds a
    much better way of determining the size. (In that case, what are those
    *size symbols for?).

    So I can put together a working test:

    ---------------------------------
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    extern unsigned char _binary_hello_c_start[];
    extern unsigned char _binary_hello_c_end[];

    char* makestr(char* start, char* end) {
        int length = end-start;
        char* s = malloc(length+1);
        memcpy(s, start, length);
        *(s+length) = 0;
        return s;
    }

    int main() {
        char* str = makestr(_binary_hello_c_start, _binary_hello_c_end);

        printf("Hello = \n%s", str);
    }
    ---------------------------------

    I can build it like this:

    ---------------------------------
    C:\c>mcc -c c
    Compiling c.c to c.obj

    C:\c>objcopy -I binary -O elf64-x86-64 hello.c hello.obj

    C:\c>gcc c.c hello.obj
    ---------------------------------

    And run it like this:
    ---------------------------------
    C:\c>a
    Hello =
    #include "stdio.h"

    int main(void) {
        printf("Hello, World!\n");
    }
    ---------------------------------

    Instead of one compiler, here I used two compilers, a tool 'objcopy'
    (which bizarrely needs to generate ELF format files) and lots of extra
    ugly code. I also need to disregard whatever the hell _binary_..._size
    does.

    But it works.



    You could use the pe-x86-64 format instead of the elf64-x86-64 to reduce
    the size of the object.

  • From Scott Lurndal@21:1/5 to jak on Fri May 31 21:11:26 2024
    jak <[email protected]> writes:
    bart ha scritto:
    On 31/05/2024 15:34, Michael S wrote:
    On Fri, 31 May 2024 15:04:46 +0100
    bart <[email protected]> wrote:



    <snip>


    Instead of one compiler, here I used two compilers, a tool 'objcopy'
    (which bizarrely needs to generate ELF format files) and lots of extra
    ugly code. I also need to disregard whatever the hell _binary_..._size
    does.

    But it works.



    You could use the pe-x86-64 format instead of the elf64-x86-64 to reduce
    the size of the object.

    By a half dozen bytes, perhaps, and only if your binutils have been
    built to support pe-x86-64:

    $ objcopy -I binary -O pe-x86-64 main.cpp /tmp/test1.o
    objcopy:/tmp/test1.o: Invalid bfd target

    The ELF64 format has a 64 byte header, the string table and the
    symbol table, and the remainder is the binary
    data. The PE header may save a few bytes by using 32-bit fields in
    the PE COFF header and symbol table.

    Note, you might want to trim your posts when replying with a one-sentence reply.

  • From bart@21:1/5 to jak on Fri May 31 22:17:54 2024
    On 31/05/2024 20:42, jak wrote:
    bart ha scritto:

    C:\c>objcopy -I binary -O elf64-x86-64 hello.c hello.obj

    You could use the pe-x86-64 format instead of the elf64-x86-64 to reduce
    the size of the object.

    The PE format doesn't work; gcc's ld linker has a problem with it, when
    it is generated by 'objcopy'.

    Actually there is a LOT wrong with this whole approach.

  • From bart@21:1/5 to Scott Lurndal on Fri May 31 22:15:54 2024
    On 31/05/2024 19:36, Scott Lurndal wrote:
    bart <[email protected]> writes:
    On 31/05/2024 15:34, Michael S wrote:
    On Fri, 31 May 2024 15:04:46 +0100
    bart <[email protected]> wrote:

    Instead of one compiler, here I used two compilers, a tool 'objcopy'
    (which bizarrely needs to generate ELF format files) and lots of extra
    ugly code. I also need to disregard whatever the hell _binary_..._size does.

    $ objcopy -I binary -O elf64-x86-64 main.cpp /tmp/test.o

    $ objdump -x /tmp/test.o

    /tmp/test.o: file format elf64-little
    /tmp/test.o
    architecture: UNKNOWN!, flags 0x00000010:
    HAS_SYMS
    start address 0x0000000000000000

    Sections:
    Idx Name Size VMA LMA File off Algn
    0 .data 000030e2 0000000000000000 0000000000000000 00000040 2**0
    CONTENTS, ALLOC, LOAD, DATA
    SYMBOL TABLE:
    0000000000000000 l d .data 0000000000000000 .data
    0000000000000000 g .data 0000000000000000 _binary_main_cpp_start
    00000000000030e2 g .data 0000000000000000 _binary_main_cpp_end
    00000000000030e2 g *ABS* 0000000000000000 _binary_main_cpp_size

    $ ls -l main.cpp
    -rw-rw-r--. 1 scott scott 12514 May 9 2022 main.cpp
    $ printf '%u\n' $(( 0x30e2 ))
    12514

    The value of the symbol _binary_main_cpp_size is the
    number of bytes in the file.

    (in other words,

    _binary_main_cpp_size = _binary_main_cpp_end - _binary_main_cpp_start

    )

    In C code:

    extern uint8_t _binary_main_cpp_size;

    const size_t embed_size = &_binary_main_cpp_size;

    Did you see the output from my version of Michael S's program? The size
    is just an address. If I do what you do:

    extern unsigned char _binary_hello_c_size;

    ....
    size_t size = &_binary_hello_c_size;
    printf("size: %zu\n", size);

    It produces:

    size: 140697695027270

    Little of this seems to work, sorry. You guys keep saying, do this, do
    that, no do it that way, go RTFM, but nobody has shown a complete
    program that correctly shows the -size symbol to be giving anything
    meaningful.

    If I run this:

    printf("%p\n", &_binary_hello_c_start);
    printf("%p\n", &_binary_hello_c_end);
    printf("%p\n", &_binary_hello_c_size);

    I get:

    00007ff6ef252010
    00007ff6ef252056
    00007ff5af240046

    I can see that the first two can be subtracted to give the sizes of the
    data, which is 70 or 0x46. 0x46 is the last byte of the address of
    _size, so what's happening there? What's with the crap in bits 16-47?

    I can extract the size using:

    printf("%d\n", (unsigned short)&_binary_hello_c_size);

    But something is not right. I've also asked what is the point of the
    -size symbol if you can just do -end - -start, but nobody has explained.
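
    The arithmetic on those three addresses can be checked directly. The
    sketch below hard-codes the values printed above (the macro and function
    names are made up for illustration) and shows that end - start gives 70,
    and that the cast to unsigned short simply keeps the low 16 bits. That
    trick would silently truncate for any file of 64 KiB or more, and it only
    gives the right answer here because the extra high bits happen to leave
    the low 16 bits untouched.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Addresses copied from the output shown in the post. */
    #define ADDR_START UINT64_C(0x00007ff6ef252010)
    #define ADDR_END   UINT64_C(0x00007ff6ef252056)
    #define ADDR_SIZE  UINT64_C(0x00007ff5af240046)

    /* Size as the distance between the end and start symbols. */
    uint64_t size_from_range(uint64_t start, uint64_t end)
    {
        assert(end >= start);
        return end - start;
    }

    /* What a cast to a 16-bit unsigned type keeps: the low 16 bits only. */
    uint64_t low16(uint64_t address)
    {
        return address & UINT64_C(0xFFFF);
    }
    ```

    Both size_from_range(ADDR_START, ADDR_END) and low16(ADDR_SIZE) come out
    as 0x46, but only the first is independent of how large the file is.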

  • From Scott Lurndal@21:1/5 to bart on Sat Jun 1 01:25:07 2024
    bart <[email protected]> writes:
    On 31/05/2024 19:36, Scott Lurndal wrote:
    bart <[email protected]> writes:
    On 31/05/2024 15:34, Michael S wrote:
    On Fri, 31 May 2024 15:04:46 +0100
    bart <[email protected]> wrote:

    Instead of one compiler, here I used two compilers, a tool 'objcopy'
    (which bizarrely needs to generate ELF format files) and lots of extra
    ugly code. I also need to disregard whatever the hell _binary_..._size does.

    $ objcopy -I binary -O elf64-x86-64 main.cpp /tmp/test.o

    $ objdump -x /tmp/test.o

    /tmp/test.o: file format elf64-little
    /tmp/test.o
    architecture: UNKNOWN!, flags 0x00000010:
    HAS_SYMS
    start address 0x0000000000000000

    Sections:
    Idx Name Size VMA LMA File off Algn
    0 .data 000030e2 0000000000000000 0000000000000000 00000040 2**0
    CONTENTS, ALLOC, LOAD, DATA
    SYMBOL TABLE:
    0000000000000000 l d .data 0000000000000000 .data
    0000000000000000 g .data 0000000000000000 _binary_main_cpp_start
    00000000000030e2 g .data 0000000000000000 _binary_main_cpp_end
    00000000000030e2 g *ABS* 0000000000000000 _binary_main_cpp_size

    $ ls -l main.cpp
    -rw-rw-r--. 1 scott scott 12514 May 9 2022 main.cpp
    $ printf '%u\n' $(( 0x30e2 ))
    12514

    The value of the symbol _binary_main_cpp_size is the
    number of bytes in the file.

    (in other words,

    _binary_main_cpp_size = _binary_main_cpp_end - _binary_main_cpp_start

    )

    In C code:

    extern uint8_t _binary_main_cpp_size;

    const size_t embed_size = &_binary_main_cpp_size;

    Did you see the output from my version of Michael S's program? The size
    is just an address. If I do what you do:

    extern unsigned char _binary_hello_c_size;

    ....
    size_t size = &_binary_hello_c_size;
    printf("size: %zu\n", size);

    It produces:

    size: 140697695027270

    Little of this seems to work, sorry. You guys keep saying, do this, do
    that, no do it that way, go RTFM, but nobody has shown a complete
    program that correctly shows the -size symbol to be giving anything
    meaningful.

    If I run this:

    printf("%p\n", &_binary_hello_c_start);
    printf("%p\n", &_binary_hello_c_end);
    printf("%p\n", &_binary_hello_c_size);

    I get:

    00007ff6ef252010
    00007ff6ef252056
    00007ff5af240046

    I can see that the first two can be subtracted to give the sizes of the
    data, which is 70 or 0x46. 0x46 is the last byte of the address of
    _size, so what's happening there? What's with the crap in bits 16-47?

    I can extract the size using:

    printf("%d\n", (unsigned short)&_binary_hello_c_size);

    But something is not right. I've also asked what is the point of the
    -size symbol if you can just do -end - -start, but nobody has explained.

    $ cat /tmp/m.c
    #include <stdio.h>
    #include <stdint.h>

    extern uint64_t _binary_main_cpp_size;
    extern uint8_t *_binary_main_cpp_start;
    extern uint8_t *_binary_main_cpp_end;

    int main()
    {
        printf("%p\n", &_binary_main_cpp_size);
        printf("%p\n", &_binary_main_cpp_start);
        printf("%p\n", &_binary_main_cpp_end);
        return 0;
    }
    $ objcopy -I binary -B i386 -O elf64-x86-64 main.cpp /tmp/test.o
    $ cc -o /tmp/m /tmp/m.c /tmp/test.o
    $ /tmp/m
    0x30e2
    0x601034
    0x604116
    $ nm /tmp/m | grep _binary_main
    0000000000604116 D _binary_main_cpp_end
    00000000000030e2 A _binary_main_cpp_size
    0000000000601034 D _binary_main_cpp_start
    $ wc -c main.cpp
    12514 main.cpp
    $ printf 0x%x\\n 12514
    0x30e2

    The size symbol requires no space in the resulting
    executable memory image, and it's more convenient than
    having to do the math (at run time, since the compiler
    can't know the actual values).
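
    If the run-time subtraction is acceptable, a sketch of it follows. It
    assumes the symbols are declared as arrays rather than pointers: the
    pointer declarations in m.c above are fine for printing addresses with &,
    but indexing through them would read the first bytes of the file as if
    they were a pointer value. The helper name embedded_size and the stand_in
    array are hypothetical.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Length in bytes of the data delimited by the two symbols.
       Both pointers must refer to the same object, with end >= start. */
    size_t embedded_size(const unsigned char *start, const unsigned char *end)
    {
        assert(start != NULL && end >= start);
        return (size_t)(end - start);
    }

    /* In a real build the symbols come from the objcopy-generated object:
           extern const unsigned char _binary_main_cpp_start[];
           extern const unsigned char _binary_main_cpp_end[];
       and the call is
           embedded_size(_binary_main_cpp_start, _binary_main_cpp_end).
       The array below is a stand-in so that the sketch links on its own. */
    static const unsigned char stand_in[] = { 0x01, 0x02, 0x03, 0x04, 0x05 };
    ```

    The subtraction costs one instruction at run time and does not depend on
    how the linker or loader treats the absolute _size symbol.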

  • From Scott Lurndal@21:1/5 to Lynn McGuire on Sat Jun 1 01:27:41 2024
    Lynn McGuire <[email protected]> writes:
    On 5/26/2024 6:23 AM, Bonita Montero wrote:
    Am 26.05.2024 um 09:13 schrieb jak:

    About this I only agree partially because it depends a lot on the
    context in which it is used. Moreover, I would not know how to indicate
    an optimal programming language for all seasons.

    C++ is in almost any case the better C.

    What you describe is the greatest inconvenience of C++. To give only one
    example, when they decided to rewrite the FB platform to accelerate it,
    they thought of migrating from PHP to C++ and they had a collapse of the
    staff suitable for the work, so they thought of relying on a compiler that
    translated the PHP into C++, and many of the new languages were born to
    try to remedy its complexity.

    C++ is the wrong language for web applications.
    I like Java more for that.

    C++ is the wrong language for real time apps.

    That's an incorrect statement.

    No memory allocation allowed.

    It is trivially easy to write C++ code that doesn't
    allocate memory dynamically.


    I use C++ for my server side apps on my webserver. Works great.

    I use C++ for operating systems (you can't get more real-time
    than that) and bare-metal hypervisors.

  • From jak@21:1/5 to All on Sat Jun 1 03:37:04 2024
    bart ha scritto:
    I can see that the first two can be subtracted to give the sizes of the
    data, which is 70 or 0x46. 0x46 is the last byte of the address of
    _size, so what's happening there? What's with the crap in bits 16-47?

    I can extract the size using:

       printf("%d\n", (unsigned short)&_binary_hello_c_size);

    But something is not right. I've also asked what is the point of the
    -size symbol if you can just do -end - -start, but nobody has explained.

    typedef unsigned char uchar;
    extern uchar _binary_hello_c_size[];
    long hello_c_size = _binary_hello_c_size - (uchar *)0;
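
    Subtracting a null pointer is not something the C standard defines
    (pointer subtraction is only defined within one array object), so a
    conversion through uintptr_t is the more conventional way to get at the
    number. A minimal sketch, with symbol_value as a made-up name; the
    pointer-to-integer mapping is implementation-defined, and the result is
    whatever address the linker and loader ended up assigning, so it would
    include any extra high bits like the ones shown earlier in the thread.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* This sketch assumes a converted pointer fits in size_t, which holds on
       the usual 32-bit and 64-bit flat-address platforms. */
    static_assert(sizeof(size_t) >= sizeof(uintptr_t),
                  "size_t must be able to hold a converted pointer");

    /* Return the numeric value carried by a symbol's address. */
    size_t symbol_value(const void *symbol_address)
    {
        return (size_t)(uintptr_t)symbol_address;
    }
    ```

    With the declaration above it would be called as
    symbol_value(_binary_hello_c_size).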

  • From Lawrence D'Oliveiro@21:1/5 to bart on Sat Jun 1 01:39:40 2024
    On Thu, 30 May 2024 14:34:00 +0100, bart wrote:

    On 30/05/2024 03:32, Lawrence D'Oliveiro wrote:

    On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:

    I've got a small commandline-tool that makes a const'd char -array
    from any binary file.

    It seems to me it would be more efficient to use objcopy to turn that
    binary file directly into an object file with symbols accessible from C
    code defining its beginning and ending points. Then just link it into
    the executable.

    None of my compilers, whether for C or anything else, generate object
    files.

    That’s too bad. All the good compilers, for languages like C and others
    which are meant to execute efficiently, do.

  • From Lawrence D'Oliveiro@21:1/5 to Michael S on Sat Jun 1 01:45:51 2024
    On Thu, 30 May 2024 11:09:05 +0300, Michael S wrote:

    On Thu, 30 May 2024 02:32:03 -0000 (UTC)
    Lawrence D'Oliveiro <[email protected]d> wrote:

    On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:

    I've got a small commandline-tool that makes a const'd char -array
    from any binary file.

    It seems to me it would be more efficient to use objcopy to turn that
    binary file directly into an object file with symbols accessible from C
    code defining its beginning and ending points. Then just link it into
    the executable.

    Of course, it is more efficient.
    But:
    - it covers fewer use cases.

    There are many ways of embedding a binary blob in a software project. This
    is just one tool for that; there are other tools for other cases (see the
    Unicode Browser for Android example that I mentioned elsewhere).

    - it exposes array's name and size as global symbols which is not
    always desirable

    Lots of other things already need to be global symbols, I don’t see why a couple more make a difference to anything.

    Look at how large projects like the Linux kernel deal with this sort of
    thing.

    - it feels too much like a magic. It would feel less like a magic if
    done by compiler rather than by extra tool. Even better if done by
    compiler in standardized manner.

    I don’t understand this at all. I never had the assumption, in any
    real-world build system, that all the generated code had to come from
    some “official” compiler for some “official” language.

  • From bart@21:1/5 to jak on Sat Jun 1 11:09:25 2024
    On 01/06/2024 02:37, jak wrote:
    bart ha scritto:
    I can see that the first two can be subtracted to give the sizes of
    the data, which is 70 or 0x46. 0x46 is the last byte of the address of
    _size, so what's happening there? What's with the crap in bits 16-47?

    I can extract the size using:

        printf("%d\n", (unsigned short)&_binary_hello_c_size);

    But something is not right. I've also asked what is the point of the
    -size symbol if you can just do -end - -start, but nobody has explained.

        typedef unsigned char uchar;
        extern uchar _binary_hello_c_size[];
        long hello_c_size = _binary_hello_c_size - (uchar *)0;

    What result for the size did you get when you ran this?

    It seems people are just guessing what might be the right code and
    posting random fragments!

  • From bart@21:1/5 to Lawrence D'Oliveiro on Sat Jun 1 11:37:45 2024
    On 01/06/2024 02:39, Lawrence D'Oliveiro wrote:
    On Thu, 30 May 2024 14:34:00 +0100, bart wrote:

    On 30/05/2024 03:32, Lawrence D'Oliveiro wrote:

    On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:

    I've got a small commandline-tool that makes a const'd char -array
    from any binary file.

    It seems to me it would be more efficient to use objcopy to turn that
    binary file directly into an object file with symbols accessible from C
    code defining its beginning and ending points. Then just link it into
    the executable.

    None of my compilers, whether for C or anything else, generate object
    files.

    That’s too bad. All the good compilers, for languages like C and others which are meant to execute efficiently, do.

    What do you mean by 'are meant to execute efficiently'? Is that
    build-time or run-time of the resulting program?

    In the latter case, whether it uses object files is irrelevant.

    For build-time, pointlessly generating a discrete object file will slow
    things down.

    My compilers don't routinely generate object files, which would also
    need an external dependency (a linker), but they can do if necessary
    (eg. to statically link my code into another program with another compiler).

    The compiler for my main language is a whole-program one. If it were to
    create an object file, it would be a single file; there would be no
    others to link to!

    And here, makefiles also assume independent compilation of modules.

    So it is makefiles that appear to be holding back advancement in this
    area, by requiring traditional module-at-a-time building, and requiring
    object file intermediates.

    C:\qx52>mm -obj qq
    Compiling qq.m to qq.obj

    C:\qx52>dir qq.obj
    01/06/2024 11:34 787,788 qq.obj

    C:\qx52>gcc qq.obj -oqq # 'link'

    C:\qx52>qq
    Q5.2 Interpreter
    Usage:
    qq filename[.q]

  • From bart@21:1/5 to Scott Lurndal on Sat Jun 1 11:24:20 2024
    On 01/06/2024 02:25, Scott Lurndal wrote:
    bart <[email protected]> writes:

    Little of this seems to work, sorry. You guys keep saying, do this, do
    that, no do it that way, go RTFM, but nobody has shown a complete
    program that correctly shows the -size symbol to be giving anything
    meaningful.

    If I run this:

    printf("%p\n", &_binary_hello_c_start);
    printf("%p\n", &_binary_hello_c_end);
    printf("%p\n", &_binary_hello_c_size);

    I get:

    00007ff6ef252010
    00007ff6ef252056
    00007ff5af240046

    I can see that the first two can be subtracted to give the sizes of the
    data, which is 70 or 0x46. 0x46 is the last byte of the address of
    _size, so what's happening there? What's with the crap in bits 16-47?

    I can extract the size using:

    printf("%d\n", (unsigned short)&_binary_hello_c_size);

    But something is not right. I've also asked what is the point of the
    -size symbol if you can just do -end - -start, but nobody has explained.

    $ cat /tmp/m.c
    #include <stdio.h>
    #include <stdint.h>

    extern uint64_t _binary_main_cpp_size;
    extern uint8_t *_binary_main_cpp_start;
    extern uint8_t *_binary_main_cpp_end;

    int main()
    {
        printf("%p\n", &_binary_main_cpp_size);
        printf("%p\n", &_binary_main_cpp_start);
        printf("%p\n", &_binary_main_cpp_end);
        return 0;
    }
    $ objcopy -I binary -B i386 -O elf64-x86-64 main.cpp /tmp/test.o
    $ cc -o /tmp/m /tmp/m.c /tmp/test.o
    $ /tmp/m
    0x30e2
    0x601034
    0x604116
    $ nm /tmp/m | grep _binary_main
    0000000000604116 D _binary_main_cpp_end
    00000000000030e2 A _binary_main_cpp_size
    0000000000601034 D _binary_main_cpp_start
    $ wc -c main.cpp
    12514 main.cpp
    $ printf 0x%x\\n 12514
    0x30e2

    The size symbol requires no space in the resulting
    executable memory image, and it's more convenient than
    having to do the math (at run time, since the compiler
    can't know the actual values).

    Here's my transcript:

    -------------------------------------
    C:\c>copy hello.c main.cpp # create main.cpp, here it's 70 bytes
    1 file(s) copied.

    C:\c>type m.c # exact same code as yours
    #include <stdio.h>
    #include <stdint.h>

    extern uint64_t _binary_main_cpp_size;
    extern uint8_t *_binary_main_cpp_start;
    extern uint8_t *_binary_main_cpp_end;

    int main()
    {
        printf("%p\n", &_binary_main_cpp_size);
        printf("%p\n", &_binary_main_cpp_start);
        printf("%p\n", &_binary_main_cpp_end);
        return 0;
    }

    C:\c>objcopy -I binary -O elf64-x86-64 main.cpp test.o # make test.o

    C:\c>gcc m.c test.o -o m.exe # build m executable

    C:\c>m # run m.exe
    00007ff5d5480046 # and the size is ...
    00007ff715492010
    00007ff715492056
    -------------------------------------

    Maybe Windows is at fault? I'll try it under WSL:

    -------------------------------------
    root@DESKTOP-11:/mnt/c/c# objcopy -I binary -O elf64-x86-64 main.cpp test.o
    root@DESKTOP-11:/mnt/c/c# gcc m.c test.o -o m
    root@DESKTOP-11:/mnt/c/c# ./m
    0x55effc9f2046
    0x55effc9f6010
    0x55effc9f6056
    -------------------------------------

    Nope, same thing. This doesn't inspire much confidence. With values
    shown, the actual size IS contained within the _size value, but only as
    the last 16 bits of the value.

    gcc versions were 10.3.0 and 9.4.0 respectively; the latter is what is
    provided by Windows 11.

    You also brought up the fact that the size is not known to the compiler
    anyway, which means a few things are not possible, like using the size
    in a static context.
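
    To illustrate the static-context point: when the bytes are part of the
    translation unit (an array initialiser, which is also what #embed or a
    generated table gives you), sizeof is a constant expression. The sketch
    below uses a hand-written string as stand-in data, and all names in it
    are hypothetical. With link-time symbols, end - start could not be used
    in any of the three marked places.

    ```c
    #include <assert.h>
    #include <string.h>

    /* Stand-in for data that is visible to the compiler. */
    static const char hello_text[] = "Hello, World!\n";

    /* (1) an integer constant expression; excludes the trailing NUL */
    enum { HELLO_LEN = sizeof hello_text - 1 };

    /* (2) a compile-time check */
    static_assert(HELLO_LEN == 14, "hello_text has an unexpected length");

    /* (3) the bound of a static array */
    static char scratch[sizeof hello_text];

    /* Copy the text, including its NUL, into the fixed-size buffer. */
    const char *copy_into_scratch(void)
    {
        memcpy(scratch, hello_text, sizeof hello_text);
        return scratch;
    }
    ```

    The objcopy route trades this away: the data never passes through the
    compiler, so its size only exists as a link-time value.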

  • From bart@21:1/5 to Malcolm McLean on Sat Jun 1 11:53:15 2024
    On 01/06/2024 01:53, Malcolm McLean wrote:
    On 31/05/2024 13:55, bart wrote:
    On 30/05/2024 16:03, Michael S wrote:
    On Thu, 30 May 2024 15:48:39 +0100
    bart <[email protected]> wrote:


    Where do the _binary_logo_bmp_start and ...-size symbols come from?
    That is, how do they get into the object file.


    objcopy generates names of the symbols from the name of input binary
    file. I would think that it is possible to change these symbols to
    something else, but I am not sure that it is possible within the same
    invocation of objcopy. It certainly is possible with a second pass.
    Lawrence probably can give more authoritative answer.
    Or as a last resort you can RTFM.

    I gave myself the simple task of incorporating the source text of
    hello.c into a program, and printing it out.

    Here's how builtin embedding worked using a feature of my older C
    compiler:

       #include <stdio.h>
       #include <string.h>

       char hello[] = strinclude("hello.c");

       int main(void) {
           printf("hello =\n%s\n", hello);
           printf("strlen(hello) = %zu\n", strlen(hello));
           printf("sizeof(hello) = %zu\n", sizeof(hello));
       }


    I build it and run it like this:

       C:\c>bcc c
       Compiling c.c to c.exe

       C:\c>c
       hello =
       #include "stdio.h"

       int main(void) {
           printf("Hello, World!\n");
       }

       strlen(hello) = 70
       sizeof(hello) = 71

       C:\c>dir hello.c
       31/05/2024  13:48                70 hello.c


    It just works; no messing about with objcopy parameters; no long
    unwieldy names; no link errors due to unsupported file formats; no
    problems with missing terminators for embedded text files imported as
    strings; no funny ways of getting size info.

    Here's my solution. It's a bit more complicated.


    int bbx_write_source (const char *source_xml, char *path, const char *source_xml_file, const char *source_xml_name)
    {
        XMLDOC *doc = 0;
        char error[1024];
        char buff[1024];
        XMLNODE *root;
        XMLNODE *node;
        const char *name;
        FILE *fpout = 0;
        FILE *fpin = 0;
        int ch;

        doc = xmldocfromstring(source_xml, error, 1024);
        if (!doc)
        {
            fprintf(stderr, "%s\n", error);
            return -1;
        }
        root = xml_getroot(doc);
        if (strcmp(xml_gettag(root), "FileSystem"))
            return -1;

        if (!root->child)
            return -1;
        if (strcmp(xml_gettag(root->child), "directory"))
            return -1;

        for (node = root->child->child; node != NULL; node = node->next)
        {
            if (!strcmp(xml_gettag(node), "file"))
            {
                name = xml_getattribute(node, "name");
                snprintf(buff, 1024, "%s%s", path, name);
                fpout = fopen(buff, "w");
                if (!fpout)
                    break;
                fpin = file_fopen(node);
                if (!fpin)
                    break;
                if (!strcmp(name, source_xml_file))
                {
                    char *escaped = texttostring(source_xml);
                    if (!escaped)
                        break;
                    fprintf(fpout, "char %s[] = %s;\n", source_xml_name,
                            escaped);
                    free(escaped);
                }
                else
                {
                   while ((ch = fgetc(fpin)) != EOF)
                       fputc(ch, fpout);
                }
                fclose(fpout);
                fclose(fpin);
                fpout = 0;
                fpin = 0;
            }
        }
        if (fpin || fpout)
        {
            if (fpin)
                fclose(fpin);
            if (fpout)
                fclose(fpout);
            return -1;
        }

        return 0;

    }

    It's leveraging the Baby X resource compiler, the XML parser, and my
    filesystem programs. You can't include the source of a program in the
    program as a C string, because then the source changes to include that
    string. So what you do is this.

    You first place a placeholder C source file containing a short dummy
    string.
    Then you convert the source to an XML file, and turn it into a string
    with the Baby X Resource compiler. Then you drop the source into the
    file, removing the placeholder.

    Then the program walks the file list, detects that file, and replaces it
    with the XML string it has been passed.

    And this system works, and it's an easy way of adding source output to
    programs. Of course the function now needs to be modified to walk the
    entire tree recursively and I will need a makedirectory function. I've
    got it to work for flat source directories.

    Sorry, I don't understand what that does; what is the input and what is
    the output?

    In the case of a very simple requirement of incorporating a text file
    into a C program as data, usually string data (which I have to say is
    much more common for me than doing anything with XML), how would a BBX
    solution work?

    This doesn't work:

    char strdata[] = {
    #include "file.txt"
    }

    Because the contents of file.txt, which let's say are:

    one
    two
    three

    are interpreted as C source code ('one' is a syntax error, or it might
    be the name of some identifier).

    Some process is needed to either turn that file into:

    "one\ntwo\nthree\n"

    or into a bunch of numbers: '111, 110, 101, ...'. I think this is what
    'xxd' does.
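
    A rough sketch of the first kind of conversion, bytes to a C string
    literal, is below. It is not what xxd does and is not taken from any
    existing tool; to_c_string_literal is a hypothetical helper. Non-printable
    bytes become 3-digit octal escapes so that a following digit cannot be
    absorbed into the escape.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Render `len` bytes of `data` as a double-quoted C string literal in
       `out`. Returns the number of characters written (excluding the NUL),
       or 0 if `out_size` is too small for the worst case. */
    size_t to_c_string_literal(const unsigned char *data, size_t len,
                               char *out, size_t out_size)
    {
        size_t pos = 0;

        assert(out != NULL && (data != NULL || len == 0));

        /* Worst case is 4 characters per byte, plus two quotes and a NUL. */
        if (out_size < 3 || len > (out_size - 3) / 4)
            return 0;

        out[pos++] = '"';
        for (size_t i = 0; i < len; i++) {
            unsigned char c = data[i];
            if (c == '"' || c == '\\') {
                out[pos++] = '\\';
                out[pos++] = (char)c;
            } else if (c == '\n') {
                out[pos++] = '\\';
                out[pos++] = 'n';
            } else if (c >= 0x20 && c < 0x7f) {
                out[pos++] = (char)c;            /* printable ASCII */
            } else {
                /* always three octal digits, e.g. \000 or \377 */
                pos += (size_t)snprintf(out + pos, out_size - pos,
                                        "\\%03o", (unsigned)c);
            }
        }
        out[pos++] = '"';
        out[pos] = '\0';
        return pos;
    }
    ```

    Fed the three-line file.txt above, it produces the literal shown, quotes
    included, which could then be written into a generated .c or .h file.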

    In the case of binary files, the process of embedding is usually blind
    to the actual format, or meaning, of the file. It is just a blob of data.

    So here, I understand that the BBXRC solution goes much further. If I
    wanted to include a JPG file, then either #embed or my strinclude()
    would just incorporate the raw bytes. I would still need a JPEG decoder
    to use that data.

    Whereas BBXRC, AIUI, does the decoding for you, and incorporates the
    data as a raw table of pixel values that can be directly used.

    So it is at a different level from what is being discussed. But
    sometimes there is also a need for that cruder form of embedding: maybe
    that JPG just needs to be written out again; no need to get inside it.

  • From jak@21:1/5 to All on Sat Jun 1 13:59:32 2024
    bart ha scritto:
    On 01/06/2024 02:37, jak wrote:
    bart ha scritto:
    I can see that the first two can be subtracted to give the sizes of
    the data, which is 70 or 0x46. 0x46 is the last byte of the address
    of _size, so what's happening there? What's with the crap in bits 16-47?
    I can extract the size using:

        printf("%d\n", (unsigned short)&_binary_hello_c_size);

    But something is not right. I've also asked what is the point of the
    -size symbol if you can just do -end - -start, but nobody has explained.

         typedef unsigned char uchar;
         extern uchar _binary_hello_c_size[];
         long hello_c_size = _binary_hello_c_size - (uchar *)0;

    What result for the size did you get when you ran this?

    It seems people are just guessing what might be the right code and
    posting random fragments!


    I wrote it that way precisely because I believed it was the clearest
    way. With the extern you can retrieve the relative values that in the
    case of _start and _end correspond to the initial and final address of
    the object; in fact you can get the length of the object by subtracting
    the starting address from the final one:

    extern char _binary_hello_c_start[];
    extern char _binary_hello_c_end[];

    long len = _binary_hello_c_end - _binary_hello_c_start;
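
    The subtraction can be wrapped in a small helper. This is only a
    sketch: blob_size is a hypothetical name, and strictly speaking the C
    standard defines pointer subtraction only within a single object, so
    the code relies on the toolchain treating the two linker symbols as
    the bounds of one blob.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size in bytes of the region [start, end).
 * Both pointers are expected to bound the same embedded blob. */
size_t blob_size(const unsigned char *start, const unsigned char *end)
{
    assert(start != NULL && end != NULL && end >= start);
    return (size_t)(end - start);
}

/* With the objcopy symbols from this thread, declared as arrays so that
 * their names decay to the addresses themselves, the call would be:
 *
 *     extern const unsigned char _binary_hello_c_start[];
 *     extern const unsigned char _binary_hello_c_end[];
 *     size_t len = blob_size(_binary_hello_c_start, _binary_hello_c_end);
 */
```

    Declaring the symbols as arrays rather than as pointers matters here:
    with a pointer declaration the compiler would load the first bytes of
    the blob as if they were an address.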

    Unfortunately, _size is provided in the same way as the _start and _end
    addresses, even though it is not an address but a length. In C:
    Address +/- Value = Address
    Address - Address = Value
    so, to retrieve this length, which the program sees as an address,
    it is sufficient to subtract the starting address, which in the case of a
    length is zero.

    extern char _binary_hello_c_size[];

    long len = _binary_hello_c_size - (char *)0;

    surely you can also recover the value with a cast:

    long len = (long)_binary_hello_c_size;

    but the example I sent you seemed more explanatory, while the cast
    seems to me a heavy-handed approach.
    Here nobody invents anything. I'm sorry you think this.
    /*
    * example:
    * file to embed:
    * --- start file.txt ---
    * line number 1
    * line number 2
    * line number 3
    * line number 4
    * line number 5
    * line number 6
    * line number 7
    * line number 8
    * line number 9
    * line number 10
    * line number 11
    * line number 12
    * line number 13
    * line number 14
    * line number 15
    * line number 16
    * line number 17
    * line number 18
    * line number 19
    * line number 20
    * --- end file.txt ---
    * objcopy --input-target binary --output-target pe-x86-64 --binary-architecture i386 file.txt file.txt.o
    * gcc embed.c file.txt.o -o embed
    */

    #include <stdio.h>

    int main()
    {
        typedef unsigned char uchar;
        extern uchar _binary_file_txt_start[];
        extern uchar _binary_file_txt_size[];
        long file_txt_size = _binary_file_txt_size - (uchar *)0;

        for(long i = 0; i < file_txt_size; i++)
            putchar(_binary_file_txt_start[i]);

        return 0;
    }
    output: show file.txt content

  • From Tim Rentsch@21:1/5 to bart on Sat Jun 1 05:17:12 2024
    bart <[email protected]> writes:

    On 01/06/2024 02:25, Scott Lurndal wrote:

    bart <[email protected]> writes:

    Little of this seems to work, sorry. You guys keep saying, do this, do
    that, no do it that way, go RTFM, but nobody has shown a complete
    program that correctly shows the -size symbol to be giving anything
    meaningful.

    If I run this: [attempt to reproduce example]

    $ cat /tmp/m.c
    #include <stdio.h>
    #include <stdint.h>

    extern uint64_t _binary_main_cpp_size;
    extern uint8_t *_binary_main_cpp_start;
    extern uint8_t *_binary_main_cpp_end;

    int main()
    {
        printf("%p\n", &_binary_main_cpp_size);
        printf("%p\n", &_binary_main_cpp_start);
        printf("%p\n", &_binary_main_cpp_end);
        return 0;
    }
    $ objcopy -I binary -B i386 -O elf64-x86-64 main.cpp /tmp/test.o
    $ cc -o /tmp/m /tmp/m.c /tmp/test.o
    $ /tmp/m
    0x30e2
    0x601034
    0x604116
    $ nm /tmp/m | grep _binary_main
    0000000000604116 D _binary_main_cpp_end
    00000000000030e2 A _binary_main_cpp_size
    0000000000601034 D _binary_main_cpp_start
    $ wc -c main.cpp
    12514 main.cpp
    $ printf 0x%x\\n 12514
    0x30e2

    The size symbol requires no space in the resulting
    executable memory image, and it's more convenient than
    having to do the math (at run time, since the compiler
    can't know the actual values).

    Here's my transcript:

    -------------------------------------
    C:\c>copy hello.c main.cpp # create main.cpp, here it's 70 bytes
    1 file(s) copied.

    C:\c>type m.c # exact same code as yours
    #include <stdio.h>
    #include <stdint.h>

    extern uint64_t _binary_main_cpp_size;
    extern uint8_t *_binary_main_cpp_start;
    extern uint8_t *_binary_main_cpp_end;

    int main()
    {
        printf("%p\n", &_binary_main_cpp_size);
        printf("%p\n", &_binary_main_cpp_start);
        printf("%p\n", &_binary_main_cpp_end);
        return 0;
    }

    C:\c>objcopy -I binary -O elf64-x86-64 main.cpp test.o # make test.o

    C:\c>gcc m.c test.o -o m.exe # build m executable

    C:\c>m # run m.exe
    00007ff5d5480046 # and the size is ...
    00007ff715492010
    00007ff715492056

    [similar results under WSL]

    For what it's worth I see the same behavior running on linux.
    It looks like the culprit is gcc, which apparently relocates
    the symbol even though it is marked with an A type. After
    running around in circles for a goodly amount of time, it
    occurred to me to try compiling using clang, and that worked.

    I suppose it's good to know about the &_binary_main_cpp_size
    trick, but it's kind of the worst of both worlds: the size
    is baked into the executable (or half-baked I might say), but
    the value can't be used at compile time. Bleah. If I wanted
    to use the objcopy method of inserting raw text into a C
    program, I would either do a run-time subtraction to find out
    what the size is, or simply add an extra step to the makefile
    to extract the size out of the 'nm' output and produce a .h
    file with a (named) value that could be used at compile time.
    And both of these methods work under gcc as well as clang.
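
    The makefile step could be done with any text tool; as a sketch in C,
    the hypothetical helper below (nm_line_to_define is an invented name)
    turns one line of 'nm' output, in the "value type name" layout shown
    in the transcripts above, into a #define that a generated .h file
    could contain. The macro name is chosen by the caller and is likewise
    an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: given one line of `nm` output such as
 *   "00000000000030e2 A _binary_main_cpp_size"
 * write "#define <macro> <decimal value>u" into `out`.
 * Returns 1 on success, 0 if the line is not an absolute ('A') symbol
 * whose name ends in "_size", or if `out` is too small. */
int nm_line_to_define(const char *line, const char *macro,
                      char *out, size_t outsize)
{
    unsigned long long value;
    char type;
    char name[256];
    size_t namelen;
    int n;

    assert(line != NULL && macro != NULL && out != NULL);
    if (sscanf(line, "%llx %c %255s", &value, &type, name) != 3)
        return 0;
    namelen = strlen(name);
    if (type != 'A' || namelen < 5 ||
        strcmp(name + namelen - 5, "_size") != 0)
        return 0;
    n = snprintf(out, outsize, "#define %s %lluu\n", macro, value);
    return n > 0 && (size_t)n < outsize;
}
```

    For the line "00000000000030e2 A _binary_main_cpp_size" and the macro
    name MAIN_CPP_SIZE, the generated text is "#define MAIN_CPP_SIZE
    12514u", which is usable in array declarations and static assertions.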

  • From David Brown@21:1/5 to Lynn McGuire on Sat Jun 1 15:28:02 2024
    On 01/06/2024 01:34, Lynn McGuire wrote:
    On 5/26/2024 6:23 AM, Bonita Montero wrote:
    Am 26.05.2024 um 09:13 schrieb jak:

    About this I only agree partially because it depends a lot on the
    context in which it is used. Moreover, I would not know how to indicate
    an optimal programming language for all seasons.

    C++ is in almost any case the better C.

    What you describe is the greatest inconvenience of c++. To make only one
    example, when they decided to rewrite the FB platform to accelerate it,
    they thought of migrating from php to c++ and they had a collapse of the
    staff suitable for work, so they thought of relying on a compiler that
    translated the php into c++ and many of the new languages were born to
    try to remedy its complexity.

    C++ is the wrong language for web applications.
    I like Java more for that.

    C++ is the wrong language for real time apps.  No memory allocation
    allowed.


    I use C++ for real-time apps. You don't have to have dynamic memory
    allocation just because you are writing in C++ !

  • From David Brown@21:1/5 to Lynn McGuire on Sat Jun 1 15:30:56 2024
    On 01/06/2024 00:55, Lynn McGuire wrote:
    On 5/23/2024 2:25 PM, Bonita Montero wrote:
    Am 22.05.2024 um 18:55 schrieb David Brown:
    In an attempt to bring some topicality to the group, has anyone
    started using, or considering, C23 ?  There's quite a lot of change
    in it, especially compared to the minor changes in C17.

    <https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf>
    <https://en.wikipedia.org/wiki/C23_(C_standard_revision)>
    <https://en.cppreference.com/w/c/23>

    I like that it tidies up a lot of old stuff - it is neater to have
    things like "bool", "static_assert", etc., as part of the language
    rather than needing a half-dozen includes for such basic stuff.

    I like that it standardises a several useful extensions that have
    been in gcc and clang (and possibly other compilers) for many years.

    I'm not sure it will make a big difference to my own programming -
    when I want "typeof" or "chk_add()", I already use them in gcc.  But
    for people restricted to standard C, there's more new to enjoy.  And
    I prefer to use standard syntax when possible.

    "constexpr" is something I think I will find helpful, in at least
    some circumstances.


    I ask myself what the point is in further developing a language
    like this that can actually no longer be saved.

    There is way more code written in C than C++.  For instance, just about
    all real time systems such as device and engine management are written
    in C.

    These days, I believe engine management code is more likely to be
    written in C++.


    One of my friends writes the device code for a NAS manufacturer.  The
    code starts off with:
       while (1)
       {
          ...  a bunch of code
       }


    Hey! He's copied from me!

    Pretty much /every/ embedded system has that loop at its heart - either
    once (for bare metal), or in the RTOS and also once per thread.

  • From David Brown@21:1/5 to bart on Sat Jun 1 15:24:55 2024
    On 01/06/2024 12:24, bart wrote:
    On 01/06/2024 02:25, Scott Lurndal wrote:
    bart <[email protected]> writes:

    Little of this seems to work, sorry. You guys keep saying, do this, do
    that, no do it that way, go RTFM, but nobody has shown a complete
    program that correctly shows the -size symbol to be giving anything
    meaningful.



    But something is not right. I've also asked what is the point of the
    -size symbol if you can just do -end - -start, but nobody has explained.

    $ cat /tmp/m.c
    #include <stdio.h>
    #include <stdint.h>

    extern uint64_t _binary_main_cpp_size;
    extern uint8_t *_binary_main_cpp_start;
    extern uint8_t *_binary_main_cpp_end;

    int main()
    {
         printf("%p\n", &_binary_main_cpp_size);
         printf("%p\n", &_binary_main_cpp_start);
         printf("%p\n", &_binary_main_cpp_end);
         return 0;
    }
    $ objcopy -I binary -B i386 -O elf64-x86-64 main.cpp /tmp/test.o
    $ cc -o /tmp/m /tmp/m.c /tmp/test.o
    $ /tmp/m
    0x30e2
    0x601034
    0x604116
    $ nm /tmp/m | grep _binary_main
    0000000000604116 D _binary_main_cpp_end
    00000000000030e2 A _binary_main_cpp_size
    0000000000601034 D _binary_main_cpp_start


    When I tried it on my Linux system, I get an error "relocation
    R_X86_64_PC32 against absolute symbol `_binary_main_cpp_size' in section
    `.text' is disallowed". This is, I think, the correct response - from C
    you do not have direct access to absolute linker symbols. There is no
    space allocated for it in the executable, and it only exists as a
    constant in the link stage. Without an allocated address, declaring it
    as "extern" makes no sense. It makes even less sense to try to look at
    the /address/ of the symbol and think that it holds useful information.
    It's not often I say this, but I think Scott has got things muddled here.

    The only workable use of the symbol "_binary_main_cpp_size" would be in
    a linker script. If you want the size of the binary blob in your code,
    use the obvious solution :

    size_t size = &_binary_main_cpp_end - &_binary_main_cpp_start;

    Sometimes it would be useful to have the size of the blob as a constant
    at compile time - such as for declaring another array of the same size,
    or using static assertions to check the size. You can't do that with
    the objcopy blob inclusion. But you /can/ do it using "xxd -i" (or
    similar scripts), or with #embed.
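
    A small sketch of what that buys you, assuming an array in the style
    that "xxd -i file.txt" generates (the name file_txt follows xxd's
    naming convention; the bytes are the three-line "one/two/three"
    example from earlier in the thread, and copy_blob_to_scratch is a
    hypothetical function added only for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for generated "xxd -i" output. */
static const unsigned char file_txt[] = {
    0x6f, 0x6e, 0x65, 0x0a,             /* "one\n"   */
    0x74, 0x77, 0x6f, 0x0a,             /* "two\n"   */
    0x74, 0x68, 0x72, 0x65, 0x65, 0x0a  /* "three\n" */
};

/* sizeof is an integer constant expression, so it can be checked at
 * compile time... */
static_assert(sizeof file_txt == 14, "unexpected blob size");

/* ...and used to declare another array of the same size. */
static unsigned char scratch[sizeof file_txt];

/* Copy the blob into the same-sized scratch array; returns its size. */
size_t copy_blob_to_scratch(void)
{
    memcpy(scratch, file_txt, sizeof file_txt);
    return sizeof scratch;
}
```

    Neither the static assertion nor the second array declaration is
    possible when the size only exists as a link-time symbol.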

    I am at a loss to see any advantages of the objcopy method in practical
    use for blob embedding.

  • From Scott Lurndal@21:1/5 to Tim Rentsch on Sat Jun 1 15:08:46 2024
    Tim Rentsch <[email protected]> writes:
    bart <[email protected]> writes:

    On 01/06/2024 02:25, Scott Lurndal wrote:

    bart <[email protected]> writes:

    Little of this seems to work, sorry. You guys keep saying, do this, do
    that, no do it that way, go RTFM, but nobody has shown a complete
    program that correctly shows the -size symbol to be giving anything
    meaningful.

    If I run this: [attempt to reproduce example]

    $ cat /tmp/m.c
    #include <stdio.h>
    #include <stdint.h>

    extern uint64_t _binary_main_cpp_size;
    extern uint8_t *_binary_main_cpp_start;
    extern uint8_t *_binary_main_cpp_end;

    int main()
    {
        printf("%p\n", &_binary_main_cpp_size);
        printf("%p\n", &_binary_main_cpp_start);
        printf("%p\n", &_binary_main_cpp_end);
        return 0;
    }
    $ objcopy -I binary -B i386 -O elf64-x86-64 main.cpp /tmp/test.o
    $ cc -o /tmp/m /tmp/m.c /tmp/test.o
    $ /tmp/m
    0x30e2
    0x601034
    0x604116
    $ nm /tmp/m | grep _binary_main
    0000000000604116 D _binary_main_cpp_end
    00000000000030e2 A _binary_main_cpp_size
    0000000000601034 D _binary_main_cpp_start
    $ wc -c main.cpp
    12514 main.cpp
    $ printf 0x%x\\n 12514
    0x30e2

    The size symbol requires no space in the resulting
    executable memory image, and it's more convenient than
    having to do the math (at run time, since the compiler
    can't know the actual values).

    Here's my transcript:

    -------------------------------------
    C:\c>copy hello.c main.cpp # create main.cpp, here it's 70 bytes
    1 file(s) copied.

    C:\c>type m.c # exact same code as yours
    #include <stdio.h>
    #include <stdint.h>

    extern uint64_t _binary_main_cpp_size;
    extern uint8_t *_binary_main_cpp_start;
    extern uint8_t *_binary_main_cpp_end;

    int main()
    {
        printf("%p\n", &_binary_main_cpp_size);
        printf("%p\n", &_binary_main_cpp_start);
        printf("%p\n", &_binary_main_cpp_end);
        return 0;
    }

    C:\c>objcopy -I binary -O elf64-x86-64 main.cpp test.o # make test.o

    C:\c>gcc m.c test.o -o m.exe # build m executable

    C:\c>m # run m.exe
    00007ff5d5480046 # and the size is ...
    00007ff715492010
    00007ff715492056

    [similar results under WSL]

    For what it's worth I see the same behavior running on linux.

    Which versions? It works fine on my linux system (FC20, GCC 4.8.3)

    It looks like the culprit is gcc, which apparently relocates
    the symbol even though it is marked with an A type.

    gcc doesn't do 'relocations'. If you have a problem, it's
    likely with binutils (i.e. ld(1)).

  • From Michael S@21:1/5 to bart on Sat Jun 1 21:11:31 2024
    On Fri, 31 May 2024 19:03:10 +0100
    bart <[email protected]> wrote:

    OK, thanks. But I forgot to ask what results you got from running the
    program. Because if I try your code, using hello.c and hello.exe as
    test binary/source data, I get this output:

    _binary_test_bi_start       00007ff6497620e0 140695771160800
    _binary_test_bi_end         00007ff649762ae0 140695771163360
    _binary_test_bi_size        00007ff509750a00 140690402380288
    _binary_bin_to_list_c_start 00007ff649762ae0 140695771163360
    _binary_bin_to_list_c_end   00007ff649762b26 140695771163430
    _binary_bin_to_list_c_size  00007ff509750046 140690402377798

    The sizes should have been 2560 and 70 respectively; those values are
    a bit bigger than that.


    That's strange. I got expected results:
    _binary_test_bi_start       000000013FDD30C0 5366427840
    _binary_test_bi_end         000000013FDD67AC 5366441900
    _binary_test_bi_size        00000000000036EC 14060
    _binary_bin_to_list_c_start 000000013FDD67AC 5366441900
    _binary_bin_to_list_c_end   000000013FDD711F 5366444319
    _binary_bin_to_list_c_size  0000000000000973 2419

    However I see that you also have start and end addresses, which
    sounds a much better way of determining the size. (In that case, what
    are those *size symbols for?).


    I'd guess *_size is here for the benefit of less smart compilers that
    cannot figure out that *_end - *_start is a constant expression
    and cannot compile code like:

    static ptrdiff_t bar = _binary_test_bi_end - _binary_test_bi_start;


    But that is just a guess. For better answer you can ask authors of
    objcopy.

  • From bart@21:1/5 to bart on Sat Jun 1 19:59:19 2024
    On 01/06/2024 11:24, bart wrote:
    On 01/06/2024 02:25, Scott Lurndal wrote:

    [objcopy]


    Nope, same thing. This doesn't inspire much confidence. With values
    shown, the actual size IS contained within the _size value, but only as
    the last 16 bits of the value.

    gcc versions were 10.3.0 and 9.4.0 respectively; the latter is what is
    provided by Windows 11.

    You also brought up the fact that the size is not known to the compiler
    anyway, which means a few things are not possible, like using the size
    in a static context.

    I thought I'd dash off my own version of 'objcopy' to see if I could do
    any better. This version does binary/text to COFF only. The input file
    here was a .wav file:

    C:\qapps>qq objcopy test.wav # running my objcopy
    Compiling test.m to test.obj
    Written test.obj # haven't settled on naming schemes yet
    char[] name is: test_wav
    u64 size name is: test_wav_len

    C:\qapps>gcc demo.c test.obj

    C:\qapps>a
    Size = 14355
    Data = 52 49 46 ...

    The demo.c file is this:

    #include <stdio.h>

    extern char test_wav[];
    extern long long test_wav_len;

    int main(void) {
        printf("Size = %lld\n", test_wav_len);
        printf("Data = %02x %02x %02x ...\n", test_wav[0], test_wav[1], test_wav[2]);
    }


    And this is info about the binary to show the right data has got into
    the C program:

    C:\qapps>dir test.wav
    01/11/1996 04:05 14,354 test.wav

    C:\qapps>dump test.wav
    Dump of test.wav; Size = 14354 bytes
    0000: 52 49 46 46 0A 38 00 00 57 41 56 45 66 6D 74 20 RIFF.8..WAVEfmt

    There is one slight discrepancy: the size from the C file is one byte
    bigger; that's because I'm using 'strinclude' (in the code compiled
    during the process), which adds a terminator. I can fix that easily, or
    allow the option.
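
    The same one-byte difference shows up in plain C. The sketch below is
    an analogy only, not the implementation of 'strinclude'; the array and
    function names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* An array initialised from a string literal gets one extra element for
 * the terminating '\0'. */
static const char as_string[] = "RIFF";

/* Listing the bytes individually stores exactly the data, no terminator. */
static const char as_bytes[] = { 'R', 'I', 'F', 'F' };

size_t string_size(void)  { return sizeof as_string; }      /* data + 1 */
size_t bytes_size(void)   { return sizeof as_bytes; }       /* data only */
size_t payload_size(void) { return sizeof as_string - 1; }  /* drop '\0' */
```

    Here string_size() is 5 while bytes_size() and payload_size() are
    both 4, so either subtracting one or emitting raw bytes removes the
    discrepancy.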

    The script used to implement my 'objcopy' is shown below. It writes out
    a 2-line program which is compiled by my systems language into an object
    file.



    ------------------------------------------------

    proc main=
        if ncmdparams<1 then
            println "Usage:"
            println " qq objcopy filename [name]"
            stop
        fi

        infile:=cmdparams[1]
        basename:=extractbasefile(infile)
        mfile:=basename+".m"
        objfile:=basename+".obj"
        if infile in (mfile, objfile) then abort("Name clash") fi

        name:=basename+"_"+extractext(infile)
        if ncmdparams>1 then
            name:=cmdparams[2]
        fi

        writetextfile(mfile, (
            sfprint("export []byte # = strinclude(""#"")",name,
                infile),
            sfprint("export int #_len = #.len", name, name)
            )
        )

        if system("mm -obj "+mfile)<>0 then
            abort("Compile error on "+mfile)
        else
            println "Written", objfile
            println "char[] name is:",name
            println "u64 size name is:",name+"_len"
        fi
    end

  • From Michael S@21:1/5 to Tim Rentsch on Sat Jun 1 22:51:09 2024
    On Sat, 01 Jun 2024 05:17:12 -0700
    Tim Rentsch <[email protected]> wrote:

    For what it's worth I see the same behavior running on linux.
    It looks like the culprit is gcc, which apparently relocates
    the symbol even though it is marked with an A type. After
    running around in circles for a goodly amount of time, it
    occurred to me to try compiling using clang, and that worked.


    It works on Windows/msys2 with gcc 13.2.0

  • From Michael S@21:1/5 to bart on Sun Jun 2 01:11:35 2024
    On Fri, 31 May 2024 22:15:54 +0100
    bart <[email protected]> wrote:

    If I run this:

    printf("%p\n", &_binary_hello_c_start);
    printf("%p\n", &_binary_hello_c_end);
    printf("%p\n", &_binary_hello_c_size);

    I get:

    00007ff6ef252010
    00007ff6ef252056
    00007ff5af240046

    I can see that the first two can be subtracted to give the sizes of
    the data, which is 70 or 0x46. 0x46 is the last byte of the address
    of _size, so what's happening there? What's with the crap in bits
    16-47?


    It looks like ASLR. I don't see it because I test on Win7.

  • From Michael S@21:1/5 to bart on Sun Jun 2 03:06:04 2024
    On Sun, 2 Jun 2024 00:39:39 +0100
    bart <[email protected]> wrote:

    On 01/06/2024 23:11, Michael S wrote:
    On Fri, 31 May 2024 22:15:54 +0100
    bart <[email protected]> wrote:

    If I run this:

    printf("%p\n", &_binary_hello_c_start);
    printf("%p\n", &_binary_hello_c_end);
    printf("%p\n", &_binary_hello_c_size);

    I get:

    00007ff6ef252010
    00007ff6ef252056
    00007ff5af240046

    I can see that the first two can be subtracted to give the sizes of
    the data, which is 70 or 0x46. 0x46 is the last byte of the address
    of _size, so what's happening there? What's with the crap in bits
    16-47?


    It looks like ASLR. I don't see it because I test on Win7.


    I understand those are high-loading addresses. I was asking what they
    were doing as part of the size.

    Apparently, that size value is wrongly relocated by some versions of
    gcc-ld. Since allocations work on 64KB blocks, that explains why the
    bottom 16 bits are unaffected.


    gnu-ld just erroneously marks it as relocatable.
    Then the Windows loader relocates it; I'd guess the Linux loader does too.

    So such a size value could still be used for objects up to 64KB-1, but
    it sounds dodgy.


    For embedded bare-metal use, it will work o.k.

  • From bart@21:1/5 to Michael S on Sun Jun 2 00:39:39 2024
    On 01/06/2024 23:11, Michael S wrote:
    On Fri, 31 May 2024 22:15:54 +0100
    bart <[email protected]> wrote:

    If I run this:

    printf("%p\n", &_binary_hello_c_start);
    printf("%p\n", &_binary_hello_c_end);
    printf("%p\n", &_binary_hello_c_size);

    I get:

    00007ff6ef252010
    00007ff6ef252056
    00007ff5af240046

    I can see that the first two can be subtracted to give the sizes of
    the data, which is 70 or 0x46. 0x46 is the last byte of the address
    of _size, so what's happening there? What's with the crap in bits
    16-47?


    It looks like ASLR. I don't see it because I test on Win7.


    I understand those are high-loading addresses. I was asking what they
    were doing as part of the size.

    Apparently, that size value is wrongly relocated by some versions of
    gcc-ld. Since allocations work on 64KB blocks, that explains why the
    bottom 16 bits are unaffected.

    So such a size value could still be used for objects up to 64KB-1, but it
    sounds dodgy.
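
    The arithmetic behind those observations can be checked directly. The
    sketch below uses two hypothetical helpers, low16 and span, with the
    addresses printed above; it is only a worked example of the numbers,
    not a recommended way to obtain the size:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: keep only the low 16 bits of a (wrongly relocated)
 * symbol value, as the (unsigned short) cast in the thread does. */
uint16_t low16(uint64_t value)
{
    return (uint16_t)(value & 0xFFFFu);
}

/* Hypothetical helper: difference of the _end and _start addresses. */
uint64_t span(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return end - start;
}
```

    For instance, low16(0x00007ff5af240046) is 70, matching
    span(0x00007ff6ef252010, 0x00007ff6ef252056). For an assumed
    70000-byte blob shifted by a multiple of 64KB, however, the low 16
    bits would read 4464, which is why the trick stops working at 64KB.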

  • From Tim Rentsch@21:1/5 to Scott Lurndal on Sat Jun 1 17:22:57 2024
    [email protected] (Scott Lurndal) writes:

    Tim Rentsch <[email protected]> writes:

    bart <[email protected]> writes:

    On 01/06/2024 02:25, Scott Lurndal wrote:

    bart <[email protected]> writes:

    Little of this seems to work, sorry. You guys keep saying, do this, do
    that, no do it that way, go RTFM, but nobody has shown a complete
    program that correctly shows the -size symbol to be giving anything
    meaningful.

    If I run this: [attempt to reproduce example]

    $ cat /tmp/m.c
    #include <stdio.h>
    #include <stdint.h>

    extern uint64_t _binary_main_cpp_size;
    extern uint8_t *_binary_main_cpp_start;
    extern uint8_t *_binary_main_cpp_end;

    int main()
    {
        printf("%p\n", &_binary_main_cpp_size);
        printf("%p\n", &_binary_main_cpp_start);
        printf("%p\n", &_binary_main_cpp_end);
        return 0;
    }
    $ objcopy -I binary -B i386 -O elf64-x86-64 main.cpp /tmp/test.o
    $ cc -o /tmp/m /tmp/m.c /tmp/test.o
    $ /tmp/m
    0x30e2
    0x601034
    0x604116
    $ nm /tmp/m | grep _binary_main
    0000000000604116 D _binary_main_cpp_end
    00000000000030e2 A _binary_main_cpp_size
    0000000000601034 D _binary_main_cpp_start
    $ wc -c main.cpp
    12514 main.cpp
    $ printf 0x%x\\n 12514
    0x30e2

    The size symbol requires no space in the resulting
    executable memory image, and it's more convenient than
    having to do the math (at run time, since the compiler
    can't know the actual values).

    Here's my transcript:

    -------------------------------------
    C:\c>copy hello.c main.cpp # create main.cpp, here it's 70 bytes
    1 file(s) copied.

    C:\c>type m.c # exact same code as yours
    #include <stdio.h>
    #include <stdint.h>

    extern uint64_t _binary_main_cpp_size;
    extern uint8_t *_binary_main_cpp_start;
    extern uint8_t *_binary_main_cpp_end;

    int main()
    {
        printf("%p\n", &_binary_main_cpp_size);
        printf("%p\n", &_binary_main_cpp_start);
        printf("%p\n", &_binary_main_cpp_end);
        return 0;
    }

    C:\c>objcopy -I binary -O elf64-x86-64 main.cpp test.o # make test.o

    C:\c>gcc m.c test.o -o m.exe # build m executable

    C:\c>m # run m.exe
    00007ff5d5480046 # and the size is ...
    00007ff715492010
    00007ff715492056

    [similar results under WSL]

    For what it's worth I see the same behavior running on linux.

    Which versions? It works fine on my linux system (FC20, GCC 4.8.3)

    gcc --version gives 'gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0'

    It looks like the culprit is gcc, which apparently relocates
    the symbol even though it is marked with an A type.

    gcc doesn't do 'relocations'. If you have a problem, it's
    likely with binutils (i.e. ld(1)).

    I expect you are right. I run ld directly only rarely, and
    certainly am no expert. In my tests I was simply blindly
    following the example shown in your posting (with some variations
    after my attempts gave the wrong answer, trying to get it to
    work). It didn't occur to me to consider ld.

    Using clang for the final link step always gave the right answer,
    if I remember correctly.

  • From Tim Rentsch@21:1/5 to jak on Sat Jun 1 17:26:41 2024
    jak <[email protected]> writes:

    bart ha scritto:

    On 01/06/2024 02:37, jak wrote:

    bart ha scritto:

    I can see that the first two can be subtracted to give the sizes
    of the data, which is 70 or 0x46. 0x46 is the last byte of the
    address of _size, so what's happening there? What's with the crap
    in bits 16-47?

    I can extract the size using:

    printf("%d\n", (unsigned short)&_binary_hello_c_size);

    But something is not right. I've also asked what is the point of
    the -size symbol if you can just do -end - -start, but nobody has
    explained.

    typedef unsigned char uchar;
    extern uchar _binary_hello_c_size[];
    long hello_c_size = _binary_hello_c_size - (uchar *)0;

    What result for the size did you get when you ran this?

    It seems people are just guessing what might be the right code and
    posting random fragments!

    I wrote it that way precisely because I believed it was the clearest
    way. [...]

    What is most clear is that the expression used has undefined
    behavior.
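
    For comparison, the cast-based alternative mentioned earlier in the
    thread can be written through uintptr_t, which avoids subtracting a
    null pointer. This is a sketch under assumptions: address_value and
    address_distance are hypothetical names, and the pointer-to-integer
    conversion is implementation-defined, so the result is only the plain
    address on platforms with a flat address space.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: numeric value of an address, obtained by an
 * implementation-defined pointer-to-integer conversion. */
uintptr_t address_value(const void *p)
{
    return (uintptr_t)p;
}

/* Hypothetical helper: byte distance between two addresses, computed on
 * the integer values rather than by pointer subtraction. */
uintptr_t address_distance(const void *from, const void *to)
{
    assert(address_value(to) >= address_value(from));
    return address_value(to) - address_value(from);
}

/* With the objcopy symbol from this thread the call would be:
 *
 *     extern const unsigned char _binary_hello_c_size[];
 *     uintptr_t len = address_value(_binary_hello_c_size);
 *
 * This only gives the size when the linker and loader leave the absolute
 * symbol unrelocated, which other posts here show is not always the case.
 */
```

    Using uintptr_t rather than long also avoids truncation on systems
    where long is narrower than a pointer, such as 64-bit Windows.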

  • From Tim Rentsch@21:1/5 to Michael S on Sat Jun 1 17:47:31 2024
    Michael S <[email protected]> writes:

    On Fri, 31 May 2024 19:03:10 +0100
    bart <[email protected]> wrote:

    OK, thanks. But I forgot to ask what results you got from running the
    program. Because if I try your code, using hello.c and hello.exe as
    test binary/source data, I get this output:

    _binary_test_bi_start       00007ff6497620e0 140695771160800
    _binary_test_bi_end         00007ff649762ae0 140695771163360
    _binary_test_bi_size        00007ff509750a00 140690402380288
    _binary_bin_to_list_c_start 00007ff649762ae0 140695771163360
    _binary_bin_to_list_c_end   00007ff649762b26 140695771163430
    _binary_bin_to_list_c_size  00007ff509750046 140690402377798

    The sizes should have been 2560 and 70 respectively; those values are
    a bit bigger than that.

    That's strange. I got expected results:
    _binary_test_bi_start       000000013FDD30C0 5366427840
    _binary_test_bi_end         000000013FDD67AC 5366441900
    _binary_test_bi_size        00000000000036EC 14060
    _binary_bin_to_list_c_start 000000013FDD67AC 5366441900
    _binary_bin_to_list_c_end   000000013FDD711F 5366444319
    _binary_bin_to_list_c_size  0000000000000973 2419

    However I see that you also have start and end addresses, which
    sounds a much better way of determining the size. (In that case, what
    are those *size symbols for?).

    I'd guess *_size is here for the benefit of less smart compilers that
    cannot figure out that *_end - *_start is a constant expression
    and cannot compile code like:

    static ptrdiff_t bar = _binary_test_bi_end - _binary_test_bi_start;

    I wouldn't expect any C compiler to accept that. It's not a constant
    expression under the rules of the C standard, and there is no plausible
    way to generate code for it in the context of compiling one translation
    unit. Neither gcc nor clang accepts it, even run without asking for
    any standard compliance. An implementation could allow it as an
    extension, but there seems little reason to do so, because it would be
    a lot of work to implement, and offers very little utility.

  • From Lawrence D'Oliveiro@21:1/5 to Malcolm McLean on Sun Jun 2 03:28:32 2024
    On Sat, 1 Jun 2024 16:33:29 +0100, Malcolm McLean wrote:

    ... I like strings which you
    can pass about (though to actually use the contents you need to convert
    to char *, otherwise it is hopeless) ...

    That’s a bit sad, isn’t it, that such a feature is so obviously a bag
    stuck on the side of the original C core.

  • From Lawrence D'Oliveiro@21:1/5 to bart on Sun Jun 2 03:27:20 2024
    On Sat, 1 Jun 2024 11:37:45 +0100, bart wrote:

    My compilers don't routinely generate object files, which would also
    need an external dependency (a linker), but they can do if necessary
    (eg. to statically link my code into another program with another
    compiler).

    Modular code design would indicate that there is no point in the
    compiler duplicating functionality available in the linker.

  • From Lawrence D'Oliveiro@21:1/5 to Lynn McGuire on Sun Jun 2 03:29:13 2024
    On Fri, 31 May 2024 17:55:13 -0500, Lynn McGuire wrote:

    while (1)

    Why not

    while (true)

    or even

    for (;;)

    ?

  • From Michael S@21:1/5 to Scott Lurndal on Sun Jun 2 11:02:13 2024
    On Sat, 01 Jun 2024 01:27:41 GMT
    [email protected] (Scott Lurndal) wrote:

    Lynn McGuire <[email protected]> writes:
    On 5/26/2024 6:23 AM, Bonita Montero wrote:
    Am 26.05.2024 um 09:13 schrieb jak:

    About this I only agree partially because it depends a lot on the
    context in which it is used. Moreover, I would not know how to
    indicate an optimal programming language for all seasons.

    C++ is in almost any case the better C.

    What you describe is the greatest inconvenience of c++. To make
    only one example, when they decided to rewrite the FB platform to
    accelerate it, they thought of migrating from php to c++ and they
    had a collapse of the staff suitable for work, so they thought of
    relying on a compiler that translated the php into c++ and many of
    the new languages were born to try to remedy its complexity.

    C++ is the wrong language for web applications.
    I like Java more for that.

    C++ is the wrong language for real time apps.

    That's an incorrect statement.

    No memory allocation allowed.

    It is trivially easy to write C++ code that doesn't
    allocate memory dynamically.


    I use C++ for my server side apps on my webserver. Works great.

    I use C++ for operating systems (you can't get more real-time
    than that)

    Engine control is FAR more real-time than an OS, to list just one example
    out of many.
    Of course, nowadays most of these things are no longer done on
    general-purpose CPUs or even MCUs.


    and bare-metal hypervisors.

    It is hard to believe that you don't have at least one co-worker that
    is begging to switch all new development to C approximately every week.
    And couple of folks that beg for Rust.

  • From bart@21:1/5 to Lawrence D'Oliveiro on Sun Jun 2 10:37:55 2024
    On 02/06/2024 04:27, Lawrence D'Oliveiro wrote:
    On Sat, 1 Jun 2024 11:37:45 +0100, bart wrote:

    My compilers don't routinely generate object files, which would also
    need an external dependency (a linker), but they can do if necessary
    (eg. to statically link my code into another program with another
    compiler).

    Modular code design would indicate that there is no point the compiler duplicating functionality available in the linker.

    Python uses modules and yet doesn't have a linker. How on earth does it
    manage?

    Lots of languages get by without linkers. Or without having to
    pointlessly write out lots of discrete files, with a lot of useful info
    lost, then having to read them all in again. (Look at the mess that
    'objcopy' gets into.)

    Quite a few compilers give the impression that they also do the job of
    linking:

    gcc x.c y.c z.c

    produces an executable. Does it really matter here whether the 'linking'
    is done by a separate program on discrete files, or completely internally?

    Having all modules in-memory gives you the opportunity for whole-program
    optimisation, with all useful info intact, without having to invent the
    far hairier and unwieldy concept of LTO.

    Here is also my assembler in action given modules x.asm y.asm z.asm
    produced by my C compiler:

    aa x y z

    It does the job of 'linking' but working from .asm files straight to
    .exe or .dll. What's the effing point of a separate linker here?

    Personally I first designed out a traditional linker sometime around
    1983. The special Loader I wrote to combine my object files into a
    single binary took seconds, even on floppies. A traditional linker would
    have taken minutes. God knows what they were doing.

  • From David Brown@21:1/5 to Michael S on Sun Jun 2 14:03:30 2024
    On 02/06/2024 10:02, Michael S wrote:
    On Sat, 01 Jun 2024 01:27:41 GMT
    [email protected] (Scott Lurndal) wrote:

    Lynn McGuire <[email protected]> writes:
    On 5/26/2024 6:23 AM, Bonita Montero wrote:
    Am 26.05.2024 um 09:13 schrieb jak:

    About this I only agree partially because it depends a lot on the
    context in which it is used. Moreover, I would not know how to
    indicate an optimal programming language for all seasons.

    C++ is in almost any case the better C.

    What you describe is the greatest inconvenience of c++. To make
    only one example, when they decided to rewrite the FB platform to
    accelerate it, they thought of migrating from php to c++ and they
    had a collapse of the staff suitable for work, so they thought of
    relying on a compiler that translated the php into c++ and many of
    the new languages were born to try to remedy its complexity.

    C++ is the wrong language for web applications.
    I like Java more for that.

    C++ is the wrong language for real time apps.

    That's an incorrect statement.

    No memory allocation allowed.

    It is trivially easy to write C++ code that doesn't
    allocate memory dynamically.


    I use C++ for my server side apps on my webserver. Works great.

    I use C++ for operating systems (you can't get more real-time
    than that)

    Engine control is FAR more real-time than an OS, to list just one example
    out of many.

    Most engine control software runs on an RTOS - so you have at least as
    tough real-time requirements for the OS as for the application. The OS
    stuff Scott works with, AFAIK, is real-time OS's for specific tasks such
    as high-end network equipment. It is not general-purpose or desktop
    OS's (which I agree are not particularly real-time).

    Of course, nowadays most of these things are no longer done on general-purpose CPUs or even MCUs.


    I think you have got that backwards.

    Most engine control /is/ done with general purpose microcontrollers, or
    at least specific variants of them. They will use ARM Cortex-R or
    Cortex-M cores rather than Cortex-A cores (i.e., the "real-time" cores
    or "microcontroller" cores rather than the "application" cores you see
    in telephones, Macs, and ARM servers), but they are standard cores.
    Another common choice is the PowerPC cores used in NXP's engine controllers.

    It used to be the case that engine control and other critical hard
    real-time work was done with DSPs or FPGAs, but those days are long past.


    and bare-metal hypervisors.

    It is hard to believe that you don't have at least one co-worker that
    is begging to switch all new development to C approximately every week.
    And couple of folks that beg for Rust.


    It's possible that he has newbies amongst his co-workers, yes.

  • From Michael S@21:1/5 to David Brown on Sun Jun 2 16:29:14 2024
    On Sun, 2 Jun 2024 14:03:30 +0200
    David Brown <[email protected]> wrote:

    On 02/06/2024 10:02, Michael S wrote:
    On Sat, 01 Jun 2024 01:27:41 GMT
    [email protected] (Scott Lurndal) wrote:

    Lynn McGuire <[email protected]> writes:
    On 5/26/2024 6:23 AM, Bonita Montero wrote:
    Am 26.05.2024 um 09:13 schrieb jak:

    About this I only agree partially because it depends a lot on
    the context in which it is used. Moreover, I would not know how
    to indicate an optimal programming language for all seasons.

    C++ is in almost any case the better C.

    What you describe is the greatest inconvenience of c++. To make
    only one example, when they decided to rewrite the FB platform
    to accelerate it, they thought of migrating from php to c++ and
    they had a collapse of the staff suitable for work, so they
    thought of relying on a compiler that translated the php into c++
    and many of the new languages were born to try to remedy its
    complexity.

    C++ is the wrong language for web applications.
    I like Java more for that.

    C++ is the wrong language for real time apps.

    That's an incorrect statement.

    No memory allocation allowed.

    It is trivially easy to write C++ code that doesn't
    allocate memory dynamically.


    I use C++ for my server side apps on my webserver. Works great.

    I use C++ for operating systems (you can't get more real-time
    than that)

    Engine control is FAR more real-time than an OS, to list just one
    example out of many.

    Most engine control software runs on an RTOS - so you have at least
    as tough real-time requirements for the OS as for the application.

    From what I read about this stuff (admittedly, a long time ago), even
    when there is an RTOS, the important part runs alongside the RTOS rather
    than "on" the RTOS.
    I.e. there is a high-priority interrupt that is never ever masked by the
    OS in the region that is anywhere close to the expected time, and all
    time-sensitive work is done by the ISR, with no RTOS calls of any sort.

    The OS stuff Scott works with, AFAIK, is real-time OS's for specific
    tasks such as high-end network equipment. It is not general-purpose
    or desktop OS's (which I agree are not particularly real-time).

    I'd characterize the software running within a high-end NIC as very
    soft real-time. You only care for buffers to not overflow. And if they
    overflow, it's not too bad either. The flow is very much unidirectional
    or bi-directional with directions almost independent of each other.
    There are dependencies between directions, e.g. TCP acks, but they are
    weak dependencies timing-wise.
    Hard real time is about closed loops, most often closed control loops,
    but not only those.


    Of course, nowadays most of these things are no longer done on general-purpose CPUs or even MCUs.


    I think you have got that backwards.

    Most engine control /is/ done with general purpose microcontrollers,
    or at least specific variants of them. They will use ARM Cortex-R or Cortex-M cores rather than Cortex-A cores (i.e., the "real-time"
    cores or "microcontroller" cores rather than the "application" cores
    you see in telephones, Macs, and ARM servers), but they are standard
    cores. Another common choice is the PowerPC cores used in NXP's
    engine controllers.

    It used to be the case that engine control and other critical hard
    real-time work was done with DSPs or FPGAs, but those days are long
    past.


    Are you sure?
    It's much simpler and far more reliable to do such a task with a $5 PLD
    (which today means an FPGA that boots from internal flash, rather than
    an old-style PLD) than with an MCU, regardless of the price of the MCU.
    Even if the MCU is $4.99 cheaper, the difference is noise relative to
    the price of the engine.


    and bare-metal hypervisors.

    It is hard to believe that you don't have at least one co-worker
    that is begging to switch all new development to C approximately
    every week. And couple of folks that beg for Rust.


    It's possible that he has newbies amongst his co-workers, yes.


    Well, Linus is not on his team, but if he were, he would say the same
    thing. But probably at a much higher rate than weekly.

  • From Kenny McCormack@21:1/5 to [email protected] on Sun Jun 2 13:24:23 2024
    In article <v3gou9$36n61$[email protected]>,
    Lawrence D'Oliveiro <[email protected]d> wrote:
    On Fri, 31 May 2024 17:55:13 -0500, Lynn McGuire wrote:

    while (1)

    Why not

    while (true)

    or even

    for (;;)

    ?

    Or even:

    :loop
    ....
    goto loop

    --
    "The party of Lincoln has become the party of John Wilkes Booth."

    - Carlos Alazraqui -

  • From Lew Pitcher@21:1/5 to Kenny McCormack on Sun Jun 2 16:51:15 2024
    On Sun, 02 Jun 2024 13:24:23 +0000, Kenny McCormack wrote:

    In article <v3gou9$36n61$[email protected]>,
    Lawrence D'Oliveiro <[email protected]d> wrote:
    On Fri, 31 May 2024 17:55:13 -0500, Lynn McGuire wrote:

    while (1)

    Why not

    while (true)

    or even

    for (;;)

    ?

    I've always considered
    for (;;)
    preferable over
    while (1)
    as the for (;;) expression does not require the compiler to expand
    and evaluate a condition expression.

    For the for (;;), the compiler sees the token stream <LPAREN>
    <SEMICOLON> <SEMICOLON> <RPAREN>, and emits a closed loop, but
    with while (1), the compiler sees <LPAREN> <CONSTANT> <RPAREN>,
    and has to evaluate (either at compile time or at execution
    time) the value of the <CONSTANT> to determine whether or
    not to emit the closed loop logic.


    Or even:

    :loop
    ....
    goto loop

    ITYM

    loop:
    /*Stuff happens here */
    goto loop;
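
    As a sketch, here are the three spellings side by side (the function
    names are made up for this example). Each counts up to a limit and
    leaves the loop explicitly. An optimising compiler would normally be
    expected to fold the constant 1 at compile time and emit the same loop
    for all three, though that is an observation about typical
    implementations rather than something the standard promises.

```c
#include <assert.h>

/* for (;;) : no controlling expression at all. */
static int count_with_for(int limit)
{
    int n = 0;
    assert(limit >= 0);
    for (;;) {
        if (n == limit)
            break;
        ++n;
    }
    return n;
}

/* while (1) : a constant controlling expression. */
static int count_with_while(int limit)
{
    int n = 0;
    assert(limit >= 0);
    while (1) {
        if (n == limit)
            break;
        ++n;
    }
    return n;
}

/* Label plus goto : the loop spelled out by hand. */
static int count_with_goto(int limit)
{
    int n = 0;
    assert(limit >= 0);
loop:
    if (n == limit)
        goto done;
    ++n;
    goto loop;
done:
    return n;
}
```

    All three return the same value for the same limit; the choice between
    them is a matter of style, not of behaviour.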



    --
    Lew Pitcher
    "In Skills We Trust"

  • From Scott Lurndal@21:1/5 to Michael S on Sun Jun 2 19:23:55 2024
    Michael S <[email protected]> writes:
    On Sun, 2 Jun 2024 14:03:30 +0200
    David Brown <[email protected]> wrote:


    The OS stuff Scott works with, AFAIK, is real-time OS's for specific
    tasks such as high-end network equipment. It is not general-purpose
    or desktop OS's (which I agree are not particularly real-time).

    I'd characterize the software running within a high-end NIC as very
    soft real-time. You only care for buffers to not overflow. And if they
    overflow, it's not too bad either. The flow is very much unidirectional
    or bi-directional with directions almost independent of each other.

    A high-end network controller needs to be able to support line-rate
    on multiple high speed (100Gb/s+) ports while routing, encapsulating, prioritizing, decrypting/encrypting, and/or applying various protocol transformations to the network traffic. Much of that pipeline is
    implemented in gates, but at various points in flow, the CPU will need
    to be involved and must not cause packet loss, so interrupt latency is
    very important (and having enough cores to handle the required levels
    of traffic). That's one of the reasons that DPDK and ODP stacks run
    in user-mode, with direct access to the hardware - to avoid the overhead
    of kernel mode switches. The hardware is virtualized (using SR-IOV
    on PCIe) so the 'function' exposed to usermode code is isolated from
    other networking resources and traffic flows.

  • From Scott Lurndal@21:1/5 to Michael S on Sun Jun 2 19:15:29 2024
    Michael S <[email protected]> writes:
    On Sat, 01 Jun 2024 01:27:41 GMT
    [email protected] (Scott Lurndal) wrote:

    Lynn McGuire <[email protected]> writes:
    On 5/26/2024 6:23 AM, Bonita Montero wrote:
    Am 26.05.2024 um 09:13 schrieb jak:

    About this I only agree partially because it depends a lot on the
    context in which it is used. Moreover, I would not know how to
    indicate an optimal programming language for all seasons.

    C++ is in almost any case the better C.

    What you describe is the greatest inconvenience of c++. To make
    only one example, when they decided to rewrite the FB platform to
    accelerate it, they thought of migrating from php to c++ and they
    had a collapse of the staff suitable for work, so they thought of
    relying on a compiler that translated the php into c++ and many of
    the new languages were born to try to remedy its complexity.

    C++ is the wrong language for web applications.
    I like Java more for that.

    C++ is the wrong language for real time apps.

    That's an incorrect statement.

    No memory allocation allowed.

    It is trivially easy to write C++ code that doesn't
    allocate memory dynamically.


    I use C++ for my server side apps on my webserver. Works great.

    I use C++ for operating systems (you can't get more real-time
    than that)

    Engine control is FAR more real-time than an OS, to list just one example
    out of many.

    Actually there are real-time operating systems that support
    those applications.

    Of course, nowadays most of these things are no longer done on
    general-purpose CPUs or even MCUs.


    and bare-metal hypervisors.

    It is hard to believe that you don't have at least one co-worker that
    is begging to switch all new development to C approximately every week.

    Language choice is based on the needs of the project. Linux driver
    work necessarily needs to be done in C (perhaps also Rust in the near
    future).

    I've not heard any complaints about language. Discussions about what
    features of a language are useful in our code, yes, those occur, primarily because we need to support older toolchains compatible with third-party toolsets (e.g. verilog environments) which, for example, limits us to
    C++11 features.

  • From Kaz Kylheku@21:1/5 to Lew Pitcher on Sun Jun 2 19:52:22 2024
    On 2024-06-02, Lew Pitcher <[email protected]> wrote:
    I've always considered
    for (;;)
    preferable over
    while (1)

    Of course it is preferable. The idiom constitutes the language's direct
    support for unconditional looping, not requiring that to be requested by
    an extraneous always-true expression.

    Using while (1) or while (true) is like i = i + 1 instead
    of ++i, or while (*dst++ = *src++); instead of strcpy.

    When Dennis Ritchie (if it was indeed he) chose for to be the construct
    in which the guard expression may be omitted, so that it may express
    unconditional looping, he expressed the intent that it be henceforth used
    for that purpose.

    To continue to use while (1) after the proper utensil is provided is
    like to eat with your hands instead of a fork.
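
    To make the strcpy analogy concrete, a small sketch (the name
    copy_by_hand is invented here). The hand-written loop and the library
    call do the same job; the library call just says so directly.

```c
#include <assert.h>
#include <string.h>

/* Hand-written copy: the assignment yields the character just copied,
   so the loop ends right after the terminating '\0' has been stored.
   Like strcpy, it assumes dst is large enough and the objects do not
   overlap. */
static char *copy_by_hand(char *dst, const char *src)
{
    char *ret = dst;
    assert(dst != NULL && src != NULL);
    while ((*dst++ = *src++) != '\0')
        ;
    return ret;
}
```

    Calling copy_by_hand(buf, "for") leaves buf holding the same bytes
    that strcpy(buf, "for") would.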

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @[email protected]

  • From David Brown@21:1/5 to Michael S on Sun Jun 2 21:44:01 2024
    On 02/06/2024 15:29, Michael S wrote:
    On Sun, 2 Jun 2024 14:03:30 +0200
    David Brown <[email protected]> wrote:

    On 02/06/2024 10:02, Michael S wrote:
    On Sat, 01 Jun 2024 01:27:41 GMT
    [email protected] (Scott Lurndal) wrote:

    Lynn McGuire <[email protected]> writes:
    On 5/26/2024 6:23 AM, Bonita Montero wrote:
    Am 26.05.2024 um 09:13 schrieb jak:

    About this I only agree partially because it depends a lot on
    the context in which it is used. Moreover, I would not know how
    to indicate an optimal programming language for all seasons.

    C++ is in almost any case the better C.

    What you describe is the greatest inconvenience of c++. To make
    only one example, when they decided to rewrite the FB platform
    to accelerate it, they thought of migrating from php to c++ and
    they had a collapse of the staff suitable for work, so they
    thought of relying on a compiler that translated the php into c++
    and many of the new languages were born to try to remedy its
    complexity.

    C++ is the wrong language for web applications.
    I like Java more for that.

    C++ is the wrong language for real time apps.

    That's an incorrect statement.

    No memory allocation allowed.

    It is trivially easy to write C++ code that doesn't
    allocate memory dynamically.


    I use C++ for my server side apps on my webserver. Works great.

    I use C++ for operating systems (you can't get more real-time
    than that)

    Engine control is FAR more real-time than an OS, to list just one
    example out of many.

    Most engine control software runs on an RTOS - so you have at least
    as tough real-time requirements for the OS as for the application.

    From what I read about this stuff (admittedly, a long time ago), even
    when there is an RTOS, the important part runs alongside the RTOS rather
    than "on" the RTOS.
    I.e. there is a high-priority interrupt that is never ever masked by the
    OS in the region that is anywhere close to the expected time, and all
    time-sensitive work is done by the ISR, with no RTOS calls of any sort.

    That's sort-of right. To be precise for something like this, we'd have
    to say what exactly we mean by "engine controller". There are many
    kinds of engine or motor, and many types of control that are needed for
    them. Generally, there is a hierarchy of simpler but more time-critical
    parts up to more complex but more flexible parts of the system.

    As an example of a system of motor control that I've worked on (electric
    motors rather than combustion engines), the most timing-critical signal generation and safety (emergency stop, overload protection, etc.) are
    all in hardware - typically dedicated peripherals in the
    microcontroller. Some safety parts might also be implemented in
    non-maskable interrupt functions that the RTOS can never disable.

    The low-level control of the motors is typically run by timer interrupt functions. These may be disabled by the RTOS, but will only be disabled
    for a very short (and predictable) time - interrupt disabling is usually essential to the way locks and inter-process communication works,
    including communication between these timer functions and the rest of
    the code. Higher level control runs as RTOS tasks of various
    priorities, and communication with other boards is usually a lower
    priority task. Clearly these real-time tasks cannot be more "real-time"
    than the RTOS itself. Other boards might have high level non-realtime
    system determining things like path finding, or user interfaces.
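
    A rough sketch of the hand-off between a timer interrupt and a task,
    to show why the masked window can be kept short and predictable.
    Everything named here is hypothetical: irq_disable / irq_enable stand
    in for whatever the target or RTOS provides (here they only count
    nesting so the fragment runs on a hosted system), and the sample
    layout is invented.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the target's interrupt mask primitives. */
static int irq_mask_depth = 0;
static void irq_disable(void) { ++irq_mask_depth; }
static void irq_enable(void)  { assert(irq_mask_depth > 0); --irq_mask_depth; }

/* State shared between the timer interrupt and the control task. */
struct motor_sample {
    int32_t position;
    int32_t current_mA;
};
static volatile struct motor_sample latest;

/* Would run in the timer interrupt: publish the newest measurement. */
static void timer_isr(int32_t position, int32_t current_mA)
{
    latest.position   = position;
    latest.current_mA = current_mA;
}

/* Runs in a task: interrupts are masked only for the two-word copy, so
   the task never sees a half-updated sample and the masked time is
   short and bounded. */
static struct motor_sample read_latest(void)
{
    struct motor_sample copy;
    irq_disable();
    copy.position   = latest.position;
    copy.current_mA = latest.current_mA;
    irq_enable();
    return copy;
}
```

    The task does its slower processing on the private copy, outside the
    masked region.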

    And until you get to the highest level stuff, there is no reason why C++
    is not suitable. But whether you use C++, C, Assembly, or Ada for the low-level and more real-time critical code, you avoid dynamic memory, exceptions, and other techniques that can have unpredictable failure
    modes and unexpected delays. (The high-level stuff can be written in
    any language.)
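
    One common way of avoiding dynamic memory, sketched in C under made-up
    names (POOL_SLOTS, struct message, msg_alloc, msg_free): reserve a
    fixed pool at build time and hand out slots from it. The worst-case
    cost of an allocation is a scan of POOL_SLOTS entries, and running out
    is reported explicitly rather than through heap exhaustion.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS 4

struct message {
    int id;
    int payload;
};

/* All storage is reserved statically; nothing here calls malloc. */
static struct message pool[POOL_SLOTS];
static unsigned char  in_use[POOL_SLOTS];

/* Returns a free slot, or NULL when the pool is exhausted. */
static struct message *msg_alloc(void)
{
    for (size_t i = 0; i < POOL_SLOTS; ++i) {
        if (!in_use[i]) {
            in_use[i] = 1;
            return &pool[i];
        }
    }
    return NULL;
}

/* Returns a slot to the pool; the pointer must come from msg_alloc. */
static void msg_free(struct message *m)
{
    assert(m >= pool && m < pool + POOL_SLOTS);
    in_use[m - pool] = 0;
}
```

    This sketch is not safe against concurrent callers; in a real system
    the scan would need the same kind of short critical section as any
    other shared state.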


    The OS stuff Scott works with, AFAIK, is real-time OS's for specific
    tasks such as high-end network equipment. It is not general-purpose
    or desktop OS's (which I agree are not particularly real-time).

    I'd characterize the software running within a high-end NIC as very
    soft real-time.

    I'd characterize it as whatever Scott says it is - he's the expert
    there, not you or me.

    You only care for buffers to not overflow. And if they
    overflow, it's not too bad either.

    That is true for some things, but most certainly not for all usage.

    The flow is very much unidirectional
    or bi-directional with directions almost independent of each other.
    There are dependencies between directions, e.g. TCP acks, but they are
    weak dependencies timing-wise.

    There is a lot of networking that is not TCP/IP.

    High-speed network interfaces are used for two purposes - to get high throughput, or to get low latencies. Throughput is not as sensitive to
    timing and can tolerate some variation as long as the traffic is
    independent, but latency is a different matter.

    Hard real time is about closed loops, most often closed control loops,
    but not only those.


    Of course, nowadays most of these things are no longer done on
    general-purpose CPUs or even MCUs.


    I think you have got that backwards.

    Most engine control /is/ done with general purpose microcontrollers,
    or at least specific variants of them. They will use ARM Cortex-R or
    Cortex-M cores rather than Cortex-A cores (i.e., the "real-time"
    cores or "microcontroller" cores rather than the "application" cores
    you see in telephones, Macs, and ARM servers), but they are standard
    cores. Another common choice is the PowerPC cores used in NXP's
    engine controllers.

    It used to be the case that engine control and other critical hard
    real-time work was done with DSPs or FPGAs, but those days are long
    past.


    Are you sure?

    Pretty sure, yes.

    It's much simpler and far more reliable to do such a task with a $5 PLD
    (which today means an FPGA that boots from internal flash, rather than
    an old-style PLD) than with an MCU, regardless of the price of the MCU.

    No, it is not simpler or more reliable. Programmable logic is rarely
    used for engine or motor control. You use microcontrollers with
    appropriate peripherals, such as sophisticated PWM units and encoder interfaces, and advanced timers.

    Even if the MCU is $4.99 cheaper, the difference is noise relative to
    the price of the engine.

    That part is true.



    and bare-metal hypervisors.

    It is hard to believe that you don't have at least one co-worker
    that is begging to switch all new development to C approximately
    every week. And couple of folks that beg for Rust.


    It's possible that he has newbies amongst his co-workers, yes.


    Well, Linus is not on his team, but if he were, he would say the same
    thing. But probably at a much higher rate than weekly.


    Yes, but Linus Torvalds knows shit about C++. He knows a lot about C,
    and many other things.

    He also - not unreasonably - believes that if C++ was used in the Linux
    kernel, lots of others who know nothing about using C++ in OS's and
    low-level work would make a complete mess of things. You don't want
    someone to randomly add std::vector<> or the like into kernel code. You
    don't want people who take delight in smart-arse coding, such as some
    regulars in c.l.c++, anywhere near the kernel.

    But other OS's are not the Linux kernel - it has particularly unique challenges. If you have an appropriate team, C++ is vastly better for
    writing RTOS kernels than C.

  • From Lawrence D'Oliveiro@21:1/5 to bart on Mon Jun 3 01:16:39 2024
    On Sun, 2 Jun 2024 10:37:55 +0100, bart wrote:

    On 02/06/2024 04:27, Lawrence D'Oliveiro wrote:

    On Sat, 1 Jun 2024 11:37:45 +0100, bart wrote:

    My compilers don't routinely generate object files, which would also
    need an external dependency (a linker), but they can do if necessary
    (eg. to statically link my code into another program with another
    compiler).

    Modular code design would indicate that there is no point the compiler
    duplicating functionality available in the linker.

    Python uses modules and yet doesn't have a linker.

    What is importlib, then, if not something that links everything together?

    And guess what: it’s a module.

  • From Lawrence D'Oliveiro@21:1/5 to Michael S on Mon Jun 3 03:21:01 2024
    On Sun, 2 Jun 2024 11:02:13 +0300, Michael S wrote:

    Engine control is FAR more real-time than an OS, to list just one example
    out of many.

    Speaking of (internal-combustion) engines, I wondered when we can get to
    the point where the controller is operating down at the level of
    individual spark plug activations and valve openings -- getting rid of
    cams and timing belts, in other words.

    With that level of control, could you get down to an idling speed of 0
    rpm? That is, could the engine get itself going from absolute rest?

  • From Michael S@21:1/5 to Lawrence D'Oliveiro on Mon Jun 3 11:16:15 2024
    On Mon, 3 Jun 2024 01:16:39 -0000 (UTC)
    Lawrence D'Oliveiro <[email protected]d> wrote:

    On Sun, 2 Jun 2024 10:37:55 +0100, bart wrote:

    On 02/06/2024 04:27, Lawrence D'Oliveiro wrote:

    On Sat, 1 Jun 2024 11:37:45 +0100, bart wrote:

    My compilers don't routinely generate object files, which would
    also need an external dependency (a linker), but they can do if
    necessary (eg. to statically link my code into another program
    with another compiler).

    Modular code design would indicate that there is no point the
    compiler duplicating functionality available in the linker.

    Python uses modules and yet doesn't have a linker.

    What is importlib, then, if not something that links everything
    together?

    And guess what: it’s a module.

    Bart is very obviously correct. When all sources are available, a linker
    is merely an implementation detail. A much less necessary implementation
    detail, too, in the world of big RAM and of not particularly big apps.
    LTCG is sort of an admission of this fact.
    Even in the old days of small RAMs, the super-popular Turbo Pascal suite
    had modules, but I don't think that it had a linker.

  • From Michael S@21:1/5 to Kaz Kylheku on Mon Jun 3 12:01:48 2024
    On Sun, 2 Jun 2024 19:52:22 -0000 (UTC)
    Kaz Kylheku <[email protected]> wrote:

    On 2024-06-02, Lew Pitcher <[email protected]> wrote:
    I've always considered
    for (;;)
    preferable over
    while (1)

    Of course it is preferable. The idiom constitutes the language's
    direct support for unconditional looping, not requiring that to be
    requested by an extraneous always-true expression.

    Using while (1) or while (true) is like i = i + 1 instead
    of ++i, or while (*dst++ = *src++); instead of strcpy.

    When Dennis Ritchie (if it was indeed he) chose for to be the
    construct in which the guard expression may be omitted, so that it
    may express unconditional looping, he expressed the intent that it be
    henceforth used for that purpose.

    To continue to use while (1) after the proper utensil is provided is
    like to eat with your hands instead of a fork.


    The former is becoming increasingly popular.

  • From Michael S@21:1/5 to David Brown on Mon Jun 3 12:00:43 2024
    On Sun, 2 Jun 2024 21:44:01 +0200
    David Brown <[email protected]> wrote:

    On 02/06/2024 15:29, Michael S wrote:
    On Sun, 2 Jun 2024 14:03:30 +0200
    David Brown <[email protected]> wrote:

    On 02/06/2024 10:02, Michael S wrote:
    On Sat, 01 Jun 2024 01:27:41 GMT
    [email protected] (Scott Lurndal) wrote:

    Lynn McGuire <[email protected]> writes:
    On 5/26/2024 6:23 AM, Bonita Montero wrote:
    Am 26.05.2024 um 09:13 schrieb jak:

    About this I only agree partially because it depends a lot on
    the context in which it is used. Moreover, I would not know
    how to indicate an optimal programming language for all
    seasons.

    C++ is in almost any case the better C.

    What you describe is the greatest inconvenience of c++. To
    make only one example, when they decided to rewrite the FB
    platform to accelerate it, they thought of migrating from php
    to c++ and they had a collapse of the staff suitable for
    work, so they thought of relying on a compiler that translated
    the php into c++ and many of the new languages were born to
    try to remedy its complexity.

    C++ is the wrong language for web applications.
    I like Java more for that.

    C++ is the wrong language for real time apps.

    That's an incorrect statement.

    No memory allocation allowed.

    It is trivially easy to write C++ code that doesn't
    allocate memory dynamically.


    I use C++ for my server side apps on my webserver. Works
    great.

    I use C++ for operating systems (you can't get more real-time
    than that)

    Engine control is FAR more real-time than an OS, to list just one
    example out of many.

    Most engine control software runs on an RTOS - so you have at least
    as tough real-time requirements for the OS as for the application.


    From what I read about this stuff (admittedly, long time ago) even
    when there is a RTOS, the important part runs alongside RTOS rather
    than "on" RTOS.
    I.e. there is a high-priority interrupt that is never masked by the
    OS in the region that is anywhere close to the expected time, and all
    time-sensitive work is done by the ISR, with no RTOS calls of any sort.

    That's sort-of right. To be precise for something like this, we'd
    have to say what exactly we mean by "engine controller". There are
    many kinds of engine or motor, and many types of control that are
    needed for them. Generally, there is a hierarchy of simpler but more
    time-critical parts up to more complex but more flexible parts of the
    system.

    As an example of a system of motor control that I've worked on
    (electric motors rather than combustion engines), the most
    timing-critical signal generation and safety (emergency stop,
    overload protection, etc.) are all in hardware - typically dedicated
    peripherals in the microcontroller. Some safety parts might also be
    implemented in non-maskable interrupt functions that the RTOS can
    never disable.

    The low-level control of the motors is typically run by timer
    interrupt functions. These may be disabled by the RTOS, but will
    only be disabled for a very short (and predictable) time - interrupt
    disabling is usually essential to the way locks and inter-process
    communication works, including communication between these timer
    functions and the rest of the code. Higher level control runs as
    RTOS tasks of various priorities, and communication with other boards
    is usually a lower priority task. Clearly these real-time tasks
    cannot be more "real-time" than the RTOS itself. Other boards might
    have high level non-realtime system determining things like path
    finding, or user interfaces.

    And until you get to the highest level stuff, there is no reason why
    C++ is not suitable. But whether you use C++, C, Assembly, or Ada
    for the low-level and more real-time critical code, you avoid dynamic
    memory, exceptions, and other techniques that can have unpredictable
    failure modes and unexpected delays. (The high-level stuff can be
    written in any language.)


    The OS stuff Scott works with, AFAIK, is real-time OS's for
    specific tasks such as high-end network equipment. It is not
    general-purpose or desktop OS's (which I agree are not
    particularly real-time).

    I'd characterize the software running within a high-end NIC as
    very soft real-time.

    I'd characterize it as whatever Scott says it is - he's the expert
    there, not you or me.

    You only care for buffers to not overflow. And if they
    overflow, it's not too bad either.

    That is true for some things, but most certainly not for all usage.

    The flow is very much unidirectional,
    or bi-directional with the directions almost independent of each other.
    There are dependencies between directions, e.g. TCP acks, but they are
    weak dependencies timing-wise.

    There is a lot of networking that is not TCP/IP.

    High-speed network interfaces are used for two purposes - to get high
    throughput, or to get low latencies. Throughput is not as sensitive
    to timing and can tolerate some variation as long as the traffic is
    independent, but latency is a different matter.


    I think, nearly all work in high-end NIC is concentrated on throughput.
    For low latency, the best you can do with high end NIC is to disable
    all high-end features and to hope that in disabled state they do not
    hurt you too badly.
    It would be probably better to use specialized "dumb" NIC. I don't know
    if such things exist, but considering that high-frequency trading is
    still legal (IMHO, it shouldn't be) I would guess that they do.

    Hard real time is about closed loops, most often closed control
    loops, but not only those.


    Of course, nowadays most of these things are no longer done on
    general-purpose CPUs or even MCUs.


    I think you have got that backwards.

    Most engine control /is/ done with general purpose
    microcontrollers, or at least specific variants of them. They
    will use ARM Cortex-R or Cortex-M cores rather than Cortex-A cores
    (i.e., the "real-time" cores or "microcontroller" cores rather
    than the "application" cores you see in telephones, Macs, and ARM
    servers), but they are standard cores. Another common choice is
    the PowerPC cores used in NXP's engine controllers.

    It used to be the case that engine control and other critical hard
    real-time work was done with DSPs or FPGAs, but those days are long
    past.


    Are you sure?

    Pretty sure, yes.

    It's much simpler and far more reliable to do such task with $5 PLD
    (which today means FPGA that boots from internal flash, rather than
    old day's PLD) than with MCU, regardless of price of MCU.

    No, it is not simpler or more reliable. Programmable logic is rarely
    used for engine or motor control. You use microcontrollers with
    appropriate peripherals, such as sophisticated PWM units and encoder
    interfaces, and advanced timers.


    I was not talking about electric motors.

    Even if the MCU is $4.99 cheaper, the difference is noise relative
    to the price of the engine.

    That part is true.



    and bare-metal hypervisors.

    It is hard to believe that you don't have at least one co-worker
    that is begging to switch all new development to C approximately
    every week. And couple of folks that beg for Rust.


    It's possible that he has newbies amongst his co-workers, yes.


    Well, Linus is not on his team, but if he was, he would say the same
    thing. But probably at much higher rate than weekly.


    Yes, but Linus Torvalds knows shit about C++. He knows a lot about
    C, and many other things.

    He also - not unreasonably - believes that if C++ was used in the
    Linux kernel, lots of others who know nothing about using C++ in OS's
    and low-level work would make a complete mess of things. You don't
    want someone to randomly add std::vector<> or the like into kernel
    code. You don't want people who take delight in smart-arse coding,
    such as some regulars in c.l.c++, anywhere near the kernel.


    Or maybe he understands that, for a kernel, the proclaimed advantages
    of C++ do not matter or matter too little. And the disadvantage, that
    it is harder to see quickly what's going on, is real.

    It is interesting to mention that experienced 46 y.o. Dave Cutler and
    young student Linus Torvalds independently came to the same conclusion
    w.r.t. kernel language choice. That despite Cutler's employer being
    very C++-oriented at that moment and despite most of the decisions
    being taken during the peak years of OO hype.
    Unlike Torvalds, Cutler was not in a position to fully disable
    development of 3-rd party kernel modules in C++, but he did his best to
    discourage this practice.

    But other OS's are not the Linux kernel - it has particularly unique
    challenges. If you have an appropriate team, C++ is vastly better
    for writing RTOS kernels than C.


    I find your statement unproven.
    How many surviving and proliferating RTOS kernels are written in
    each language?

  • From bart@21:1/5 to Lawrence D'Oliveiro on Mon Jun 3 11:13:32 2024
    On 03/06/2024 02:16, Lawrence D'Oliveiro wrote:
    On Sun, 2 Jun 2024 10:37:55 +0100, bart wrote:

    On 02/06/2024 04:27, Lawrence D'Oliveiro wrote:

    On Sat, 1 Jun 2024 11:37:45 +0100, bart wrote:

    My compilers don't routinely generate object files, which would also
    need an external dependency (a linker), but they can do if necessary
    (eg. to statically link my code into another program with another
    compiler).

    Modular code design would indicate that there is no point the compiler
    duplicating functionality available in the linker.

    Python uses modules and yet doesn't have a linker.

    What is importlib, then, if not something that links everything together?

    It seems to provide an API to the mechanisms behind 'import'.

    And guess what: it’s a module.

    So, you use it like this:

    import importlib

    maybe? So how does importlib manage to import importlib before importlib
    itself is imported?

    There is NO ahead-of-time linking of modules in Python as it is
    understood in traditional compiled languages.

    Besides, all such statements are executed at runtime, and can be
    conditional.

    There are of course mechanisms to collate symbols across different
    modules, which are executed at runtime and on demand. There are
    similarities to the methods used to maintain the global symbol table in
    my whole-program compilers, or in my assembler.

  • From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Mon Jun 3 14:16:41 2024
    Lawrence D'Oliveiro <[email protected]d> writes:
    On Sun, 2 Jun 2024 11:02:13 +0300, Michael S wrote:

    Engine control is FAR more real-time than an OS, to list just one example
    out of many.

    Speaking of (internal-combustion) engines, I wondered when we can get to
    the point where the controller is operating down at the level of
    individual spark plug activations and valve openings -- getting rid of
    cams and timing belts, in other words.

    It's pretty clear that the ICE is becoming a dinosaur.

    And no, I don't see the engine controller working at that level;
    one wonders how it would open the valves without the mechanical
    linkages (individual actuators? $$$).


    With that level of control, could you get down to an idling speed of 0
    rpm? That is, could the engine get itself going from absolute rest?

    Tesla.

  • From David Brown@21:1/5 to Michael S on Mon Jun 3 18:34:16 2024
    On 03/06/2024 11:00, Michael S wrote:
    On Sun, 2 Jun 2024 21:44:01 +0200
    David Brown <[email protected]> wrote:

    On 02/06/2024 15:29, Michael S wrote:
    On Sun, 2 Jun 2024 14:03:30 +0200
    David Brown <[email protected]> wrote:


    There is a lot of networking that is not TCP/IP.

    High-speed network interfaces are used for two purposes - to get high
    throughput, or to get low latencies. Throughput is not as sensitive
    to timing and can tolerate some variation as long as the traffic is
    independent, but latency is a different matter.


    I think, nearly all work in high-end NIC is concentrated on throughput.
    For low latency, the best you can do with high end NIC is to disable
    all high-end features and to hope that in disabled state they do not
    hurt you too badly.
    It would be probably better to use specialized "dumb" NIC. I don't know
    if such things exist, but considering that high-frequency trading is
    still legal (IMHO, it shouldn't be) I would guess that they do.

    I think Scott can answer the high-end NIC questions a lot better than I
    could.


    Hard real time is about closed loops, most often closed control
    loops, but not only those.


    Of course, nowadays most of these things are no longer done on
    general-purpose CPUs or even MCUs.


    I think you have got that backwards.

    Most engine control /is/ done with general purpose
    microcontrollers, or at least specific variants of them. They
    will use ARM Cortex-R or Cortex-M cores rather than Cortex-A cores
    (i.e., the "real-time" cores or "microcontroller" cores rather
    than the "application" cores you see in telephones, Macs, and ARM
    servers), but they are standard cores. Another common choice is
    the PowerPC cores used in NXP's engine controllers.

    It used to be the case that engine control and other critical hard
    real-time work was done with DSPs or FPGAs, but those days are long
    past.


    Are you sure?

    Pretty sure, yes.

    It's much simpler and far more reliable to do such task with $5 PLD
    (which today means FPGA that boots from internal flash, rather than
    old day's PLD) than with MCU, regardless of price of MCU.

    No, it is not simpler or more reliable. Programmable logic is rarely
    used for engine or motor control. You use microcontrollers with
    appropriate peripherals, such as sophisticated PWM units and encoder
    interfaces, and advanced timers.


    I was not talking about electric motors.

    Petrol and diesel engines have far less demanding requirements for the
    timing of their control systems. The fastest control loops you need to
    control them are a fraction of the speed of those used for high-end
    electric motor control, and the corresponding acceptable jitter levels
    are much less fussy. And they are invariably controlled by
    microcontrollers, and have been for decades. (The microcontrollers you
    use typically have some specialised timing peripherals.)

    However, I would not be surprised to see programmable logic in the
    controllers for jet engines, if that is what you are talking about. The
    markets there are too small, and the control details too different
    between different models, for there to be microcontrollers with
    jet-engine peripherals. But regardless, it is all still hierarchical in
    the same way, with a RTOS and real-time software tasks sitting above the
    dedicated hardware and below the high-level control software.


    Well, Linus is not on his team, but if he was, he would say the same
    thing. But probably at much higher rate than weekly.


    Yes, but Linus Torvalds knows shit about C++. He knows a lot about
    C, and many other things.

    He also - not unreasonably - believes that if C++ was used in the
    Linux kernel, lots of others who know nothing about using C++ in OS's
    and low-level work would make a complete mess of things. You don't
    want someone to randomly add std::vector<> or the like into kernel
    code. You don't want people who take delight in smart-arse coding,
    such as some regulars in c.l.c++, anywhere near the kernel.


    Or maybe he understands that, for a kernel, the proclaimed advantages
    of C++ do not matter or matter too little. And the disadvantage, that
    it is harder to see quickly what's going on, is real.

    It is interesting to mention that experienced 46 y.o. Dave Cutler and
    young student Linus Torvalds independently came to the same conclusion
    w.r.t. kernel language choice.

    You /do/ understand that these decisions were made some 30 years ago?
    The languages, developers, compilers, targets, and many other things
    have changed in that time.

    That despite Cutler's employer being
    very C++-oriented at that moment and despite most of the decisions
    taken during the peak years of OO hype.
    Unlike Torvalds, Cutler was not in a position to fully disable
    development of 3-rd party kernel modules in C++, but he did his best to
    discourage this practice.

    But other OS's are not the Linux kernel - it has particularly unique
    challenges. If you have an appropriate team, C++ is vastly better
    for writing RTOS kernels than C.


    I find your statement unproven.
    How many surviving and proliferating RTOS kernels are written in
    each language?


    Oh, there's little doubt that most publicly available RTOS kernels are
    in C, not C++. That does not mean C is in any way /better/ for the
    task. There are multiple reasons for C being the language of choice here:

    1. Most well-known RTOS kernels have a history stretching back to the
    previous century. C++ was not nearly as viable an option at that time,
    for a great many reasons.

    2. If you write your kernel in C++, you pretty much have to use C++ for
    the application code unless you also write a C API for it. If you write
    your kernel in C, you can use almost any language for the application code.

    3. Most well-known RTOS's are for microcontrollers, often including
    small CISC devices and other microcontrollers for which toolchain
    support was traditionally poor, expensive, and barely classifiable as
    C90 never mind C++ or even C99. If you want to support these devices,
    C90 is the only way to go.

    4. There is a bizarre attitude in a lot of the embedded world that "ANSI
    C" (meaning C89/C90) is somehow magical and "the standard". Marketing
    departments see it as a "feature" that the code is written to this long
    out-dated and inferior language standard.

    5. Lots of embedded programmers are not great programmers, or not
    educated as programmers - they are hardware or electronics engineers
    that have moved into software. C90 is often all they know, and
    certainly they have never learned more than basic C++.

    So there are plenty of reasons why C (especially C90) is dominant in
    RTOS's. Note that none of these are technical reasons - C90 is never
    chosen because it is a /better/ language than C++ (or even C99). It is
    chosen /despite/ being a weaker language. (Some non-technical reasons
    can be good arguments, of course - but in most cases they are not.)

    After all, there is virtually nothing that you can write in C90 that you
    cannot use directly in modern C++. Barring a few cases where you need
    casts in C++ but not in C (and such casts are typically mandated by
    embedded coding standards anyway), you can compile the same code as C++.
    Since you can do almost everything with modern C++ that you can with
    C90 or C99, and you can do vastly more with modern C++ - resulting in
    /much/ safer coding, as long as the programmers are competent - it is
    obvious that an appropriately restricted subset of C++ is a technically
    better choice of language.
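
    A minimal sketch of the "casts in C++ but not in C" case mentioned
    above, assuming a callback that receives its state through a void
    pointer. The names struct counter and on_tick are hypothetical and
    chosen only for illustration. C converts void * to an object pointer
    implicitly, while C++ requires the explicit cast, so with the cast in
    place the same source should compile as either language.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state block for a periodic callback. */
struct counter {
    int ticks;
};

/* Hypothetical callback: the caller passes its state as a void pointer.
   The cast is redundant in C but mandatory in C++, so writing it makes
   this function valid in both languages. */
void on_tick(void *context)
{
    struct counter *c = (struct counter *)context;

    assert(context != NULL);   /* a null state pointer is a caller bug */
    c->ticks++;
}
```

    Calling on_tick twice on a zero-initialised struct counter leaves its
    ticks member at 2, whichever compiler mode was used.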

  • From Scott Lurndal@21:1/5 to David Brown on Mon Jun 3 16:50:50 2024
    David Brown <[email protected]> writes:
    On 03/06/2024 11:00, Michael S wrote:
    On Sun, 2 Jun 2024 21:44:01 +0200
    David Brown <[email protected]> wrote:

    On 02/06/2024 15:29, Michael S wrote:
    On Sun, 2 Jun 2024 14:03:30 +0200
    David Brown <[email protected]> wrote:


    There is a lot of networking that is not TCP/IP.

    High-speed network interfaces are used for two purposes - to get high
    throughput, or to get low latencies. Throughput is not as sensitive
    to timing and can tolerate some variation as long as the traffic is
    independent, but latency is a different matter.


    I think, nearly all work in high-end NIC is concentrated on throughput.
    For low latency, the best you can do with high end NIC is to disable
    all high-end features and to hope that in disabled state they do not
    hurt you too badly.
    It would be probably better to use specialized "dumb" NIC. I don't know
    if such things exist, but considering that high-frequency trading is
    still legal (IMHO, it shouldn't be) I would guess that they do.

    I think Scott can answer the high-end NIC questions a lot better than I
    could.


    Hard real time is about closed loops, most often closed control
    loops, but not only those.


    Of course, nowadays most of these things are no longer done on
    general-purpose CPUs or even MCUs.


    I think you have got that backwards.

    Most engine control /is/ done with general purpose
    microcontrollers, or at least specific variants of them. They
    will use ARM Cortex-R or Cortex-M cores rather than Cortex-A cores
    (i.e., the "real-time" cores or "microcontroller" cores rather
    than the "application" cores you see in telephones, Macs, and ARM
    servers), but they are standard cores. Another common choice is
    the PowerPC cores used in NXP's engine controllers.

    It used to be the case that engine control and other critical hard
    real-time work was done with DSPs or FPGAs, but those days are long
    past.


    Are you sure?

    Pretty sure, yes.

    It's much simpler and far more reliable to do such task with $5 PLD
    (which today means FPGA that boots from internal flash, rather than
    old day's PLD) than with MCU, regardless of price of MCU.

    No, it is not simpler or more reliable. Programmable logic is rarely
    used for engine or motor control. You use microcontrollers with
    appropriate peripherals, such as sophisticated PWM units and encoder
    interfaces, and advanced timers.


    I was not talking about electric motors.

    Petrol and diesel engines have far less demanding requirements for the
    timing of their control systems. The fastest control loops you need to
    control them are a fraction of the speed of those used for high-end
    electric motor control, and the corresponding acceptable jitter levels
    are much less fussy. And they are invariably controlled by
    microcontrollers, and have been for decades. (The microcontrollers you
    use typically have some specialised timing peripherals.)

    However, I would not be surprised to see programmable logic in the
    controllers for jet engines, if that is what you are talking about. The
    markets there are too small, and the control details too different
    between different models, for there to be microcontrollers with
    jet-engine peripherals. But regardless, it is all still hierarchical in
    the same way, with a RTOS and real-time software tasks sitting above the
    dedicated hardware and below the high-level control software.

    https://en.wikipedia.org/wiki/FADEC


    It is interesting to mention that experienced 46 y.o. Dave Cutler and
    young student Linus Torvalds independently came to the same conclusion
    w.r.t. kernel language choice.

    You /do/ understand that these decisions were made some 30 years ago?

    And it's not particularly interesting, either.

    Cutler would have been happy with Macro-32 and Bliss-32 if he
    could have wrestled them from DEC.

    Linus has always had significant misconceptions about C++
    (as do many who think C++ is defined by the standard C++ library).

    The languages, developers, compilers, targets, and many other things
    have changed in that time.

    We were writing a large unix compatible operating system in C++
    before Linus released the first Linux.


    That despite Cutler's employer being
    very C++-oriented at that moment and despite most of the decisions
    taken during the peak years of OO hype.
    Unlike Torvalds, Cutler was not in a position to fully disable
    development of 3-rd party kernel modules in C++, but he did his best to
    discourage this practice.

    But other OS's are not the Linux kernel - it has particularly unique
    challenges. If you have an appropriate team, C++ is vastly better
    for writing RTOS kernels than C.


    I find your statement unproven.
    How many surviving and proliferating RTOS kernels are written in
    each language?


    Oh, there's little doubt that most publicly available RTOS kernels are
    in C, not C++. That does not mean C is in any way /better/ for the
    task. There are multiple reasons for C being the language of choice here:

    1. Most well-known RTOS kernels have a history stretching back to the
    previous century. C++ was not nearly as viable an option at that time,
    for a great many reasons.

    I would disagree with this. The Chorus microkernel (Chorus Systemes,
    later purchased by Sun) was started in the late 1980's and was
    written in C++ (with a small set of assembler functions). This was
    using Cfront (2.1 and later 3.0). I'm pretty sure it is still in
    use. This was long before templates, exceptions or the standard library.


    2. If you write your kernel in C++, you pretty much have to use C++ for
    the application code unless you also write a C API for it.

    Clearly one can use C interfaces from C++ code. And one can develop
    C++ wrapper around C-type functionality.

    Our C++ kernels supported standard unix-style APIs between user
    mode software and the kernel.


    If you write
    your kernel in C, you can use almost any language for the application code.

    If you write your kernel in _any_ language, you can use _any_ language
    for the application code, or the kernel isn't much use to anyone.

  • From David Brown@21:1/5 to Scott Lurndal on Mon Jun 3 21:05:17 2024
    On 03/06/2024 18:50, Scott Lurndal wrote:
    David Brown <[email protected]> writes:
    On 03/06/2024 11:00, Michael S wrote:
    On Sun, 2 Jun 2024 21:44:01 +0200
    David Brown <[email protected]> wrote:

    Oh, there's little doubt that most publicly available RTOS kernels are
    in C, not C++. That does not mean C is in any way /better/ for the
    task. There are multiple reasons for C being the language of choice here:

    1. Most well-known RTOS kernels have a history stretching back to the
    previous century. C++ was not nearly as viable an option at that time,
    for a great many reasons.

    I would disagree with this. The Chorus microkernel (Chorus Systemes,
    later purchased by Sun) was started in the late 1980's and was
    written in C++ (with a small set of assembler functions). This was
    using Cfront (2.1 and later 3.0). I'm pretty sure it is still in
    use. This was long before templates, exceptions or the standard library.


    C++ was viable for the kind of systems you were working with (clearly
    that is true, since you worked on an OS written in C++ at that time).

    I have been specifically referring to "well-known" RTOS's - the sort
    that would have a Wikipedia page, or whose name has a chance of being
    recognised by many embedded programmers. (I realise this is a very
    vague and subjective classification.) I am quite confident that the
    majority of RTOS's ever written are proprietary, with little if any
    public information. Some of these will be written in C, others in
    Assembly, C++, Ada, and perhaps other languages. I think the share of
    C++ in these will be a lot higher than in more commonly used RTOS's,
    because the team involved in developing and using them will be smaller
    and more controlled, negating many of the reasons for using C.



    2. If you write your kernel in C++, you pretty much have to use C++ for
    the application code unless you also write a C API for it.

    Clearly one can use C interfaces from C++ code. And one can develop
    C++ wrapper around C-type functionality.

    Our C++ kernels supported standard unix-style APIs between user
    mode software and the kernel.


    If you write
    your kernel in C, you can use almost any language for the application code.

    If you write your kernel in _any_ language, you can use _any_ language
    for the application code, or the kernel isn't much use to anyone.


    Many - I think most - RTOS's are linked as libraries, rather than
    separately linked applications.

  • From David Brown@21:1/5 to Lawrence D'Oliveiro on Mon Jun 3 21:14:08 2024
    On 03/06/2024 05:21, Lawrence D'Oliveiro wrote:
    On Sun, 2 Jun 2024 11:02:13 +0300, Michael S wrote:

    Engine control is FAR more real-time than an OS, to list just one example
    out of many.

    Speaking of (internal-combustion) engines, I wondered when we can get to
    the point where the controller is operating down at the level of
    individual spark plug activations and valve openings -- getting rid of
    cams and timing belts, in other words.


    You need the mechanics for the valves anyway, so cam shafts are
    convenient. But AFAIK everything that can be tunable timing is
    controlled by software (microcontrollers with advanced timer
    peripherals) these days, and some engines use electric solenoids for the
    valves.

    But I can't claim to know much about how these work. I know plenty
    about some of the microcontrollers used in the automotive industry,
    including engine controllers, because we use some of these kinds of
    devices (for other purposes).

    With that level of control, could you get down to an idling speed of 0
    rpm? That is, could the engine get itself going from absolute rest?

    There are physical limitations that make it difficult to idle an ICE at
    very low revs - it makes more sense to simply stop the engine when it is
    not needed.

  • From Scott Lurndal@21:1/5 to David Brown on Mon Jun 3 19:38:14 2024
    David Brown <[email protected]> writes:
    On 03/06/2024 18:50, Scott Lurndal wrote:
    If you write your kernel in C, you can use almost any language for the
    application code.

    If you write your kernel in _any_ language, you can use _any_ language
    for the application code, or the kernel isn't much use to anyone.


    Many - I think most - RTOS's are linked as libraries, rather than
    separately linked applications.

    Some, I suspect, load code from some form of flash dynamically.

    Regardless, interlanguage linking has been available for half a century.
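
    As a rough illustration of the interlanguage linking point, the
    sketch below shows the usual way a kernel-style interface is given C
    linkage so that application code in C, C++, or anything else with a
    C foreign-function interface can call it. The single-slot mailbox and
    the names k_mailbox_put and k_mailbox_get are hypothetical and stand
    in for whatever services a real kernel would export. The
    implementation here is plain C; the same declarations could equally
    front an implementation written in C++.

```c
#include <assert.h>
#include <stddef.h>

/* Public interface (hypothetical names). The guards give the functions
   C linkage when this is read by a C++ compiler and vanish under C. */
#ifdef __cplusplus
extern "C" {
#endif

int k_mailbox_put(int value);   /* returns 0 on success, -1 if full  */
int k_mailbox_get(int *out);    /* returns 0 on success, -1 if empty */

#ifdef __cplusplus
}
#endif

/* Toy implementation: one statically allocated slot, no dynamic memory. */
static int slot_value;
static int slot_full;

int k_mailbox_put(int value)
{
    if (slot_full) {
        return -1;              /* previous message not yet consumed */
    }
    slot_value = value;
    slot_full = 1;
    return 0;
}

int k_mailbox_get(int *out)
{
    assert(out != NULL);        /* the caller must supply storage */
    if (!slot_full) {
        return -1;              /* nothing to read */
    }
    *out = slot_value;
    slot_full = 0;
    return 0;
}
```

    The sketch ignores concurrency entirely; a real kernel would protect
    the slot with whatever locking or interrupt masking it uses.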

  • From Michael S@21:1/5 to Scott Lurndal on Mon Jun 3 22:58:56 2024
    On Mon, 03 Jun 2024 16:50:50 GMT
    [email protected] (Scott Lurndal) wrote:

    David Brown <[email protected]> writes:

    1. Most well-known RTOS kernels have a history stretching back to
    the previous century. C++ was not nearly as viable an option at
    that time, for a great many reasons.

    I would disagree with this. The Chorus microkernel (Chorus Systemes,
    later purchased by Sun) was started in the late 1980's and was
    written in C++ (with a small set of assembler functions). This was
    using Cfront (2.1 and later 3.0). I'm pretty sure it is still in
    use. This was long before templates, exceptions or the standard
    library.


    If Chorus is your idea of well-known then I wonder what you call
    obscure.

  • From Tim Rentsch@21:1/5 to Lew Pitcher on Mon Jun 3 13:29:25 2024
    Lew Pitcher <[email protected]> writes:

    On Sun, 02 Jun 2024 13:24:23 +0000, Kenny McCormack wrote:

    In article <v3gou9$36n61$[email protected]>,
    Lawrence D'Oliveiro <[email protected]d> wrote:

    On Fri, 31 May 2024 17:55:13 -0500, Lynn McGuire wrote:

    while (1)

    Why not

    while (true)

    or even

    for (;;)

    ?

    I've always considered
    for (;;)
    preferable over
    while (1)
    as the for (;;) expression does not require the compiler to expand
    and evaluate a condition expression.

    For the for (;;), the compiler sees the token stream <LPAREN>
    <SEMICOLON> <SEMICOLON> <RPAREN>, and emits a closed loop, but
    with while (1), the compiler sees <LPAREN> <CONSTANT> <RPAREN>,

    But the 'for (;;)' tokens need to be matched to a much more
    complicated syntax, with three optional expression (one of
    which might be a declaration) before assigning semantics.
    There is actually a lot more to do when 'for (;;)' is used.

    and has to evaluate (either at compile time or at execution
    time) the value of the <CONSTANT> to determine whether or
    not to emit the closed loop logic.

    Both gcc and clang turn 'while (1)' into simple loops even
    under -O0. So it can't be that hard.
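
    For anyone who wants to compare the two spellings directly, here is
    a small sketch with the same loop written both ways. The function
    names are hypothetical. Both functions count up to a limit and leave
    the loop with a break, so they should behave identically; compiling
    the file with the -S option at -O0 lets you inspect the generated
    code for each and check the claim above on your own compiler.

```c
#include <assert.h>

/* Unconditional loop spelled with an always-true condition. */
int count_with_while(int limit)
{
    int n = 0;

    assert(limit >= 0);
    while (1) {
        if (n >= limit) {
            break;              /* the only way out of the loop */
        }
        n++;
    }
    return n;
}

/* The same loop spelled with all three for-clauses omitted. */
int count_with_for(int limit)
{
    int n = 0;

    assert(limit >= 0);
    for (;;) {
        if (n >= limit) {
            break;              /* the only way out of the loop */
        }
        n++;
    }
    return n;
}
```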

  • From Tim Rentsch@21:1/5 to Scott Lurndal on Mon Jun 3 13:23:37 2024
    [email protected] (Scott Lurndal) writes:

    [ ... (internal-combustion) engines, ... ]

    It's pretty clear that the ICE is becoming a dinosaur.

    Kind of makes it full circle, doesn't it? ;)

  • From Tim Rentsch@21:1/5 to Kaz Kylheku on Mon Jun 3 13:31:38 2024
    Kaz Kylheku <[email protected]> writes:

    On 2024-06-02, Lew Pitcher <[email protected]> wrote:

    I've always considered
    for (;;)
    preferable over
    while (1)

    Of course it is preferable. The idiom constitutes the language's
    direct support for unconditional looping, not requiring that to be
    requested by an extraneous always-true expression.

    Using while (1) or while (true) is like i = i + 1 instead
    of ++i, or while (*dst++ = *src++); instead of strcpy. [...]

    Using for (;;) for an infinite loop is an abomination. Anyone
    who advocates following that rule is an instrument of Satan.

  • From Scott Lurndal@21:1/5 to Michael S on Mon Jun 3 21:22:07 2024
    Michael S <[email protected]> writes:
    On Mon, 03 Jun 2024 16:50:50 GMT
    [email protected] (Scott Lurndal) wrote:

    David Brown <[email protected]> writes:

    1. Most well-known RTOS kernels have a history stretching back to
    the previous century. C++ was not nearly as viable an option at
    that time, for a great many reasons.

    I would disagree with this. The Chorus microkernel (Chorus Systemes,
    later purchased by Sun) was started in the late 1980's and was
    written in C++ (with a small set of assembler functions). This was
    using Cfront (2.1 and later 3.0). I'm pretty sure it is still in
    use. This was long before templates, exceptions or the standard
    library.


    If Chorus is your idea of well-known then I wonder what you call
    obscure.

    I was, of course, addressing the second sentence in David's point (1).

    At the time, in the OS research community, Chorus was, indeed well-known.

  • From Kenny McCormack@21:1/5 to Chris M. Thomasson on Mon Jun 3 21:48:23 2024
    In article <v3lb0u$2452$[email protected]>,
    Chris M. Thomasson <[email protected]> wrote:
    On 6/3/2024 1:31 PM, Tim Rentsch wrote:
    Kaz Kylheku <[email protected]> writes:

    On 2024-06-02, Lew Pitcher <[email protected]> wrote:

    I've always considered
    for (;;)
    preferable over
    while (1)

    Of course it is preferable. The idiom constitutes the language's
    direct support for unconditional looping, not requiring that to be
    requested by an extraneous always-true expression.

    Using while (1) or while (true) is like i = i + 1 instead
    of ++i, or while (*dst++ = *src++); instead of strcpy. [...]

    Using for (;;) for an infinite loop is an abomination. Anyone
    who advocates following that rule is an instrument of Satan.

    Better than goto? ;^D

    I can't believe we're still having this conversation.

    Surely, on any reasonably modern compiler, all three forms will generate exactly the same code.

    --
    You are again heaping damnation upon your own head by your statements.

    - Rick C Hodgin -

  • From bart@21:1/5 to Kaz Kylheku on Mon Jun 3 23:43:00 2024
    On 02/06/2024 20:52, Kaz Kylheku wrote:
    On 2024-06-02, Lew Pitcher <[email protected]> wrote:
    I've always considered
    for (;;)
    preferable over
    while (1)

    Of course it is preferable. The idiom constitutes the language's
    direct support for unconditional looping, not requiring that to be
    requested by an extraneous always-true expression.

    Using while (1) or while (true) is like i = i + 1 instead
    of ++i, or while (*dst++ = *src++); instead of strcpy.

    When Dennis Ritchie (if it was indeed he) chose for to be the construct
    in which the guard expression may be omitted, so that it may express
    unconditional looping, he expressed the intent that it be henceforth
    used for that purpose.

    To continue to use while (1) after the proper utensil is provided is
    like to eat with your hands instead of a fork.


    To me they're both an abomination.

    I classify loops like this: Endless, Repeat-N-times, While, Iteration
    (over ranges or values).

    Few languages have a special form for the first two, but mine always
    have done.

    'while' is no good for endless loops because there is no actual
    condition to check. You have to provide one just for it to be eliminated.

    'for' is even worse because there is no sort of iteration going on.
    Actually, somebody could write a loop like this:

    for(int i=0;;++i)

    Is that an endless loop or not? Like every other for-loop, you have to
    analyse it to deduce the intent. Here, you can't even be sure that the
    empty condition wasn't an oversight.

    At this point someone will suggest a macro like this:

    #define forever for(;;)

    All that suggests to me is that the language *needs* an explicit endless
    loop!

  • From Lawrence D'Oliveiro@21:1/5 to bart on Tue Jun 4 02:12:52 2024
    On Mon, 3 Jun 2024 11:13:32 +0100, bart wrote:

    So how does importlib manage to import importlib before importlib
    itself is imported?

    I guess the same way a linker manages to link itself.

    There is NO ahead-of-time linking of modules in Python as it is
    understood in traditional compiled languages.

    Python is a compiled language.

    Besides, all such statements are executed at runtime, and can be
    conditional.

    It’s not the only object-code language with that property.

  • From Lawrence D'Oliveiro@21:1/5 to Michael S on Tue Jun 4 02:10:55 2024
    On Mon, 3 Jun 2024 11:16:15 +0300, Michael S wrote:

    When all sources are available, linker is merely an implementation
    detail.

    That’s assuming all the code is written in the same language, compilable
    with the same compiler.

    For typical non-trivial open-source projects, this is usually not true.

    And consider, even with C, the meaning of top-level “static” and the implications for compiling the source in separate pieces versus all at
    once.

    Even in the old days of small RAMs, the super-popular TurboPascal suite
    had modules, but I don't think that it had a linker.

    The programs it built had sizes in, say, the tens of thousands of lines at most.

  • From Lawrence D'Oliveiro@21:1/5 to bart on Tue Jun 4 02:20:43 2024
    On Mon, 3 Jun 2024 23:43:00 +0100, bart wrote:

    All that suggests to me is that the language *needs* an explicit endless loop!

    I agree. Also it is common for a loop to have multiple exits, and I don’t like treating one of them as a special “termination condition” above the others, so I like to use “break” for all of them.

    The “for” form not only caters for this, it allows handy initialization of local variables that keep their value between loop iterations. E.g.

    for (unsigned int i = length_of(array);;)
    {
        if (i == 0)
        {
            ... not found ...
            break;
        } /*if*/
        --i;
        if (... array[i] matches what I want ...)
        {
            ... found ...
            break;
        } /*if*/
    } /*for*/
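
    A compilable rendering of the loop above might look like the following sketch. The function name find_last, the int element type, the equality test, and the convention of returning the length for "not found" are all assumptions made for illustration; the original post leaves those parts elided.

```c
#include <stddef.h>

/* Hypothetical helper: scans `array` from the back and returns the index
   of the last element equal to `wanted`, or `length` if none matches. */
size_t find_last(const int *array, size_t length, int wanted)
{
    size_t result = length;         /* assumed "not found" sentinel */

    for (size_t i = length;;)       /* no condition: both exits use break */
    {
        if (i == 0)
        {
            break;                  /* ran off the front: not found */
        }
        --i;
        if (array[i] == wanted)
        {
            result = i;             /* found */
            break;
        }
    }
    return result;
}
```

    Because the index is unsigned, testing for zero before the decrement avoids wrapping around, which is why the loop cannot simply be written with an `i >= 0` condition.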

  • From Lawrence D'Oliveiro@21:1/5 to bart on Tue Jun 4 04:00:58 2024
    On Sat, 1 Jun 2024 19:59:19 +0100, bart wrote:

    This version does binary/text to COFF only.

    objcopy uses BFD to handle all the object-format details; why not do the
    same?

  • From Kaz Kylheku@21:1/5 to Michael S on Tue Jun 4 05:12:29 2024
    On 2024-06-03, Michael S <[email protected]> wrote:
    On Mon, 03 Jun 2024 16:50:50 GMT
    [email protected] (Scott Lurndal) wrote:

    David Brown <[email protected]> writes:

    1. Most well-known RTOS kernels have a history stretching back to
    the previous century. C++ was not nearly as viable an option at
    that time, for a great many reasons.

    I would disagree with this. The Chorus microkernel (Chorus Systemes,
    later purchased by Sun) was started in the late 1980's and was
    written in C++ (with a small set of assembler functions). This was
    using Cfront (2.1 and later 3.0). I'm pretty sure it is still in
    use. This was long before templates, exceptions or the standard
    library.


    If Chorus is your idea of well-known then I wonder what you call
    obscure.

    I also know about Chorus. However, not from actual work exposure to it.
    I remember it from operating systems courses at school; i.e. academia.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @[email protected]

  • From Kaz Kylheku@21:1/5 to Scott Lurndal on Tue Jun 4 05:17:13 2024
    On 2024-06-03, Scott Lurndal <[email protected]> wrote:
    At the time, in the OS research community, Chorus was, indeed well-known.

    If Chorus at least doesn't vaguely ring a bell, you must have your head
    up your ass as even a bachelor-level computer scientist.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @[email protected]

  • From Kaz Kylheku@21:1/5 to bart on Tue Jun 4 05:25:22 2024
    On 2024-06-03, bart <[email protected]> wrote:
    Actually, somebody could write a loop like this:

    for(int i=0;;++i)

    Is that an endless loop or not?

    Some compilers may recognize it as assertion which says "this code shall
    not be reached".

    At this point someone will suggest a macro like this:

    #define forever for(;;)

    All that suggests to me is that the language *needs* an explicit endless loop!

    Nope!

    #define ev
    #define e
    #define r

    for(ev;e;r) ...

    :)

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @[email protected]

  • From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Tue Jun 4 06:55:04 2024
    On Mon, 03 Jun 2024 16:50:50 GMT, Scott Lurndal wrote:

    We were writing a large unix compatible operating system in C++
    before Linus released the first Linux.

    Where is that now?

  • From Michael S@21:1/5 to Kaz Kylheku on Tue Jun 4 11:23:56 2024
    On Tue, 4 Jun 2024 05:17:13 -0000 (UTC)
    Kaz Kylheku <[email protected]> wrote:

    On 2024-06-03, Scott Lurndal <[email protected]> wrote:
    At the time, in the OS research community, Chorus was, indeed
    well-known.

    If Chorus at least doesn't vaguely ring a bell, you must have your
    head up your ass as even a bachelor-level computer scientist.


    The closest I ever got to a CS department was, maybe, passing by in the
    same building. But even of that I am unsure.

    Chorus certainly was never mentioned as a candidate kernel for any real
    project when I was in the room.

    The majority of the stuff that was mentioned or used around me in the 90s
    was from little guys, like MTOS, pSoS, VxWorks (later acquired by a big guy).
    The only two that I remember from big guys were iRMX/iRMK and VAXEln.
    W.r.t. uKernels, back then I heard about QNX, but not in the context
    of a proposal to use it in our own project.

  • From David Brown@21:1/5 to Kaz Kylheku on Tue Jun 4 10:25:32 2024
    On 04/06/2024 07:17, Kaz Kylheku wrote:
    On 2024-06-03, Scott Lurndal <[email protected]> wrote:
    At the time, in the OS research community, Chorus was, indeed well-known.

    If Chorus at least doesn't vaguely ring a bell, you must have your head
    up your ass as even a bachelor-level computer scientist.


    I think that is putting it a bit strongly - it is a /long/ time since
    Chorus was relevant even in academic circles. And while it was
    influential, I don't know that it was ever widely used (Scott will know
    more about that, I guess).

    Maybe it would be discussed in some computer science degrees, if you go
    back long enough and had detailed enough courses in operating systems or
    perhaps computing history. I don't remember that it ever turned up in
    my courses, some 30-odd years ago, but I don't remember /all/ the
    details from all my courses! I knew about it mainly because I am
    interested in OS's, and history, and spend far too much time on
    Wikipedia and countless technical sites - not because it was on my
    syllabus at university.

  • From David Brown@21:1/5 to Kenny McCormack on Tue Jun 4 10:36:00 2024
    On 03/06/2024 23:48, Kenny McCormack wrote:
    In article <v3lb0u$2452$[email protected]>,
    Chris M. Thomasson <[email protected]> wrote:
    On 6/3/2024 1:31 PM, Tim Rentsch wrote:
    Kaz Kylheku <[email protected]> writes:

    On 2024-06-02, Lew Pitcher <[email protected]> wrote:

    I've always considered
    for (;;)
    preferable over
    while (1)

    Of course it is preferable. The idiom constitutes the language's
    direct support for unconditional looping, not requiring that to be
    requested by an extraneous always-true expression.

    Using while (1) or while (true) is like i = i + 1 instead
    of ++i, or while (*dst++ = *src++); instead of strcpy. [...]

    Using for (;;) for an infinite loop is an abomination. Anyone
    who advocates following that rule is an instrument of Satan.

    Better than goto? ;^D

    I can't believe we're still having this conversation.

    Surely, on any reasonably modern compiler, all three forms will generate exactly the same code.


    I would think so, yes. (I've used toolchains where that was not true,
    but they are firmly in my past.)

    But conversations - arguments - about style of source code /never/ get
    out of date!

    Personally, I'm in the "while (true) { ... }" camp. To me, "for (;;)"
    looks like a weird smiley, and I do not fall for any appeals to Dennis
    Ritchie's authority.


    But we are missing another option:

    void mainloop() {
        // do something
        mainloop();
    }

    That should be fine with an optimising compiler.
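
    To make the tail-call idea concrete, here is a small sketch with an exit condition added so that it terminates; the function name sum_to and the accumulator parameter are invented for illustration. The recursive call is the last thing the function does, which is the shape an optimising compiler can turn into a jump. The C standard does not require that transformation, so without optimisation each call may still consume stack.

```c
/* Hypothetical helper: adds n + (n-1) + ... + 1 to `acc` by calling
   itself in tail position instead of using a loop statement. */
unsigned long sum_to(unsigned n, unsigned long acc)
{
    if (n == 0)
        return acc;                 /* the exit that an endless main loop lacks */
    return sum_to(n - 1, acc + n);  /* tail call: nothing remains to do after it */
}
```

    Calling sum_to(10, 0) yields 55. With the exit test removed, the function has the same structure as the mainloop() above.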

  • From David Brown@21:1/5 to Keith Thompson on Tue Jun 4 10:47:15 2024
    On 04/06/2024 01:23, Keith Thompson wrote:
    bart <[email protected]> writes:
    [...]


    All that suggests to me is that the language *needs* an explicit
    endless loop!

    No, it doesn't.


    Indeed - it suggests that the language already has perfectly good,
    workable ways to specify endless loops. It's fine to have additional
    language (or library) features for very common tasks, or for tasks where
    the new feature adds clear benefit. I don't see what benefit a language
    keyword "forever" would have compared to "while (true)" or one of the
    other common idioms.

    I suspect some of the people in this thread saying that one form
    is obviously better than the others are joking.


    Yes. People usually have their own preferences and habits for what they
    write themselves, but dislike for the alternatives is usually
    exaggerated. (Unless someone uses a goto loop - then the source code
    should be burned and the programmer forced to copy out the paper "Go to considered harmful" for the rest of the working week.)

  • From David Brown@21:1/5 to Lawrence D'Oliveiro on Tue Jun 4 10:47:58 2024
    On 04/06/2024 04:20, Lawrence D'Oliveiro wrote:
    On Mon, 3 Jun 2024 23:43:00 +0100, bart wrote:

    All that suggests to me is that the language *needs* an explicit endless
    loop!

    I agree. Also it is common for a loop to have multiple exits, and I don’t like treating one of them as a special “termination condition” above the others, so I like to use “break” for all of them.

    The “for” form not only caters for this, it allows handy initialization of
    local variables that keep their value between loop iterations. E.g.

    for (unsigned int i = length_of(array);;)
    {
        if (i == 0)
        {
            ... not found ...
            break;
        } /*if*/
        --i;
        if (... array[i] matches what I want ...)
        {
            ... found ...
            break;
        } /*if*/
    } /*for*/

    Now we know Keith was right that people are joking!

  • From bart@21:1/5 to Lawrence D'Oliveiro on Tue Jun 4 12:28:41 2024
    On 04/06/2024 03:10, Lawrence D'Oliveiro wrote:
    On Mon, 3 Jun 2024 11:16:15 +0300, Michael S wrote:

    When all sources are available, linker is merely an implementation
    detail.

    That’s assuming all the code is written in the same language, compilable with the same compiler.

    Why, how many C compilers do you use for the same project?

    Yes, if you want to build projects where you only have a binary object
    file of some library, then you need a tool that can process that.

    But I nearly always use DLLs.


    For typical non-trivial open-source projects, this is usually not true.

    And consider, even with C, the meaning of top-level “static” and the implications for compiling the source in separate pieces versus all at
    once.

    Even in the old days of small RAMs, the super-popular TurboPascal suite
    had modules, but I don't think that it had a linker.

    The programs it built had sizes in, say, the tens of thousands of lines at most.

    I can build programs of 100s of thousands of lines with no linker. Why
    wouldn't it be scalable? You would anyway expect larger programs to be
    split into different binaries such as dynamic libraries.

  • From bart@21:1/5 to Lawrence D'Oliveiro on Tue Jun 4 12:35:43 2024
    On 04/06/2024 03:12, Lawrence D'Oliveiro wrote:
    On Mon, 3 Jun 2024 11:13:32 +0100, bart wrote:

    So how does importlib manage to import importlib before importlib
    itself is imported?

    I guess the same way a linker manages to link itself.

    There is NO ahead-of-time linking of modules in Python as it is
    understood in traditional compiled languages.

    Python is a compiled language.

    CPython does ahead-of-time compilation to bytecode, of individual
    modules on-demand. There is no AOT compilation of all modules to binary bytecode files which need a linking process before execution starts.

    It is utterly different from the linkers used with typical C code.

    In the past I have written interpreters that had a discrete bytecode
    compiler that produced individual files per module containing binary
    bytecode.

    The loading process was handled by the interpreter, a separate program,
    that fixed things up to allow the program to be run immediately; the
    output was not another monolithic binary file.

    Again this is very different from a traditional linker that combines .o,
    .a, .lib and .dll files into a single executable.

    (With .dll files, they are only used to build import tables of the
    executable; the library stays separate.)

  • From Scott Lurndal@21:1/5 to David Brown on Tue Jun 4 13:30:18 2024
    David Brown <[email protected]> writes:
    On 04/06/2024 07:17, Kaz Kylheku wrote:
    On 2024-06-03, Scott Lurndal <[email protected]> wrote:
    At the time, in the OS research community, Chorus was, indeed well-known.
    If Chorus at least doesn't vaguely ring a bell, you must have your head
    up your ass as even a bachelor-level computer scientist.


    I think that is putting it a bit strongly - it is a /long/ time since
    Chorus was relevant even in academic circles. And while it was
    influential, I don't know that it was ever widely used (Scott will know
    more about that, I guess).

    Unisys used it in the 90's as the basis of a large distributed MPP
    machine[*] (which used the Intel Paragon supercomputer backplane) based
    on the Pentium Pro. Provided a single system image to the programmer
    across a collection of hardware resources (without cache coherency
    but with page-level coherency).

    I spent almost a decade working on the operating system which used
    the Chorus microkernel (and got a couple nice trips to Paris :-).

    The Chorus microkernel (and the Unisys Unix SVR4.2ES/MP compatible
    subsystem) was also part of the European Amadeus project
    (ICL, Unisys, USL, Fujitsu et alia) that was developing a distributed
    operating system.

    Sun eventually bought Chorus.

    [*] internally known as OPUS, externally as the SPP (Scalable Parallel Processor). They primarily ran decision support software (Oracle
    Parallel server, Informix, Redbrick). All of them were retired by 2010.

  • From Scott Lurndal@21:1/5 to BGB on Tue Jun 4 19:21:34 2024
    BGB <[email protected]> writes:
    On 6/3/2024 3:23 PM, Tim Rentsch wrote:
    [email protected] (Scott Lurndal) writes:

    [ ... (internal-combustion) engines, ... ]

    It's pretty clear that the ICE is becoming a dinosaur.

    Kind of makes it full circle, doesn't it? ;)

    Though, annoyingly, there isn't a great alternative in some use cases:
    Batteries: Lower energy density and require charging (slow);

    Both of which are an order of magnitude better than just a
    decade ago - and both energy density and charge time are
    a subject of intense research (both in the automotive
    and aircraft industries). I fully expect that energy density
    per kilogram will be more than doubled in the next decade.

    Fuel Cells: More expensive and finicky.

    And if you're going to use renewable energy to crack water
    into H2, why not just use the electricity itself (concentrate
    on better storage technology rather than H2 (gas or liquid)
    fuel cells).

  • From Scott Lurndal@21:1/5 to BGB on Tue Jun 4 19:17:50 2024
    BGB <[email protected]> writes:
    On 6/4/2024 12:17 AM, Kaz Kylheku wrote:
    On 2024-06-03, Scott Lurndal <[email protected]> wrote:
    At the time, in the OS research community, Chorus was, indeed well-known.
    If Chorus at least doesn't vaguely ring a bell, you must have your head
    up your ass as even a bachelor-level computer scientist.


    FWIW: When I was going to college for a CS major, the emphasis was
    mostly on Microsoft technologies, and a lot of the classes were taught
    in C#. I mostly stuck with C for my own uses though (and IIRC did write
    one class project in C++/CLI).

    A CS major should concentrate on the theory (operating system principles,
    compiler principles, data structures, algorithmic complexity,
    security, fundamentals of programming independent of language,
    and a survey of useful programming languages), and perhaps a look at the
    history of computing.

    It sounds like your CS department let you down.

  • From Lawrence D'Oliveiro@21:1/5 to bart on Wed Jun 5 01:50:38 2024
    On Tue, 4 Jun 2024 12:35:43 +0100, bart wrote:

    It is utterly different from the linkers used with typical C code.

    It does symbol resolution, just like any linker. It handles dependencies
    (both direct and transitive), just like any linker.

  • From Lawrence D'Oliveiro@21:1/5 to bart on Wed Jun 5 01:51:41 2024
    On Tue, 4 Jun 2024 12:28:41 +0100, bart wrote:

    On 04/06/2024 03:10, Lawrence D'Oliveiro wrote:

    On Mon, 3 Jun 2024 11:16:15 +0300, Michael S wrote:

    When all sources are available, linker is merely an implementation
    detail.

    That’s assuming all the code is written in the same language,
    compilable with the same compiler.

    Why, how many C compilers do you use for the same project?

    It’s not just C.

    And consider, even with C, the meaning of top-level “static” and the implications for compiling the source in separate pieces versus all at
    once.

  • From Tim Rentsch@21:1/5 to bart on Tue Jun 4 19:45:30 2024
    bart <[email protected]> writes:

    On 04/06/2024 03:10, Lawrence D'Oliveiro wrote:

    On Mon, 3 Jun 2024 11:16:15 +0300, Michael S wrote:

    When all sources are available, linker is merely an
    implementation detail.

    That's assuming all the code is written in the same language,
    compilable with the same compiler.

    Why, how many C compilers do you use for the same project?

    Depends on the project.

  • From Paul@21:1/5 to BGB on Tue Jun 4 23:59:31 2024
    On 6/4/2024 9:44 PM, BGB wrote:
    On 6/4/2024 2:21 PM, Scott Lurndal wrote:
    BGB <[email protected]> writes:
    On 6/3/2024 3:23 PM, Tim Rentsch wrote:
    [email protected] (Scott Lurndal) writes:

    [ ... (internal-combustion) engines, ... ]

    It's pretty clear that the ICE is becoming a dinosaur.

    Kind of makes it full circle, doesn't it?  ;)

    Though, annoyingly, there isn't a great alternative in some use cases:
       Batteries: Lower energy density and require charging (slow);

    Both of which are an order of magnitude better than just a
    decade ago - and both energy density and charge time are
    a subject of intense research (both in the automotive
    and aircraft industries).  I fully expect that energy density
    per kilogram will be more than doubled in the next decade.


    Still pretty tough to catch up with Ethanol or Gasoline, where it is also many orders of magnitude faster to refill a fuel tank than to charge a battery, ...

    IIRC, there aren't many battery technologies that can manage a charge rate much over 1C to 3C (so, getting a recharge time much under ~ 20 minutes or so is unlikely).



    Vs, say, refilling something like a car in ~ 25 seconds or so at a fuel pump (but, could potentially be made faster if needed). Though, there are likely to be limits here short of redesigning the mechanical interface.

    Say, it could be possible to refill a gas tank in around 3 seconds or so with enough pressure and active sensing, but whether this could be done reliably without undue risk of causing fuel tanks to rupture or similar is unclear (say, rather than pumping the fuel at 10 gal/min, they pump it at 90 gal/min, and effectively pressure-washing the inside of the fuel-tank).

    Also would need a fairly strong fuel hose as well (likely steel reinforced to deal with the pressure within the hose).


    The main traditional disadvantage of liquid fuel (and ICE's) vs batteries and electric motors, is the comparably low conversion efficiency. Liquid fuel would be stronger here if better conversion efficiencies were achieved (an ICE losing much of its potential energy as noise and heat).

    So, ideally, need some sort of semi-efficient fuel to electricity conversion (possibly using a more modest size battery pack as a buffer stage).


    Well, also some potential application areas, like human-scale robots, are hindered by not having any good way to power them (both ICE's and batteries sucking in this application area).



       Fuel Cells: More expensive and finicky.

    And if you're going to use renewable energy to crack water
    into H2, why not just use the electricity itself (concentrate
    on better storage technology rather than H2 (gas or liquid)
    fuel cells).


    Yeah, H2 just kinda sucks.

    Ethanol is much better as a fuel in most regards.

    But, effectively running fuel cells on Ethanol (rather than H2) is a more complex problem. Methanol is a little easier here, but still not great (also methanol poses a risk due to its high toxicity).


    But, yeah, not really a good way to convert electricity into Ethanol or similar.


    Methanol could be produced using electricity assuming one can scavenge enough CO2 (with water as an additional input, leaving O2 as a waste product).


    Could in theory produce methanol simply using air and electricity as inputs (scavenging both H2O and CO2), but the conversion efficiency would likely be dismal (most of the energy use would be spent running an air compressor, though an air-motor could recover some of this on the output side).

    Say:
    Compress air into a big tank;
    Collect water that accumulates in tank;
    Bubble compressed air through an amine solution (this collects CO2 into the solution);
    Pump amine through another tank where heat is applied to extract CO2 from the solution (it is then cooled and pumped back through the former tank, to collect more CO2);
    Collected water is subjected to a momentary pressure drop (to remove dissolved CO2), and then sent in to an electrolysis stage (to get H2 gas), with the H2 and CO2 being pumped into a heated high-pressure reaction chamber (to produce water and methanol, say, 250C and 75bar), with the resulting water and methanol being collected, then fed through a distillation phase (likely dropping the pressure by a controlled amount so that the methanol vaporizes but leaving the water behind); the water is then pumped back into the electrolysis step (which also serves to remove oxygen).

    Likely, things like heat control/recovery would be needed to have any semblance of efficiency (as well, one would need to recover what energy they can when the waste products are returned to atmospheric pressure).


    Pumping (followed by electrolysis) are likely to be the main energy uses, potentially much of the heating and cooling needed could be achieved through the compression and expansion stages (so potentially wouldn't need any additional energy input).

    Would need to process a fairly large volume of air relative to any methanol produced though (so, I would expect mechanical losses in the compression and expansion stages would be where most of the energy loss would occur, such as due to friction in the pumps and similar).

    There are whizzy solutions. But they're not practical for consumer automotive.

    The battery chemistries have "spider diagrams" which evaluate the battery on
    a number of factors. And that is how we've settled on what is inside cars today.
    The lifetime of the battery pack, received a high priority. A ballpark number is 5000 charging cycles. Real cars, some of them it might be closer to 1800. And leaving a car sitting in the hot sun, might contribute to some of the difference there. The charging history is part of it, but exposing the automobile
    to harsh conditions, can also impact the battery a bit.

    The solid state batteries in the lab, are around 1000 charge cycles currently. These have no liquid electrolyte like the currently-shipped batteries.

    There have been announcements of lab demos of battery chemistries
    with extremely short charge time. But then, the number of charge cycles is
    a joke.

    There was even a vehicle that went 1000 miles on a battery. Why ? Because
    the battery was not rechargeable at all. The battery, at destination,
    needed to be sent to a recycler. But that particular experiment was intended
    to prove they could build a battery trip with more range than your bladder.

    These are all examples of spider diagrams, where multiple points on the
    diagram are a compromise. And if they don't compare favorably to a 5000 charge cycle battery, you won't see them in a car. Not with an 8-year warranty.

    Paul

  • From Lawrence D'Oliveiro@21:1/5 to BGB on Wed Jun 5 07:14:37 2024
    On Tue, 4 Jun 2024 12:48:31 -0500, BGB wrote:

    Though, I do have some questionable experimental features, like a
    compiler option to cause arrays and pointers to be bounds-checked, which
    is sometimes useful in debugging (but in some cases can add bugs of its
    own; also makes the binaries bigger and negatively affects performance,
    ...).

    I remember some research being done into this, back in the days of Pascal.

    Remember that, in Pascal (and Ada), subranges exist as types in their own right, not just as bounds of arrays. And this allows the compiler to
    optimize array-bounds checks, and sometimes get rid of them altogether.
    E.g.

    type
    boundstype = 1 .. 10;
    var
    myarr : array [boundstype] of elttype;
    index : boundstype;

    With these definitions, an expression like

    myarr[index]

    doesn’t require any bounds-checking. Of course, assignments to index may require bounds-checking, depending on the types of values involved.
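
    A rough C analogue of the idea above, as a sketch only. C has no subrange types, so the bounds_index wrapper and its helper names are hypothetical. The range check happens once, when a plain int is converted, and the array access itself is then unchecked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the Pascal subrange type 1 .. 10. */
enum { BOUNDS_LO = 1, BOUNDS_HI = 10 };

/* Invariant (by convention only): BOUNDS_LO <= value <= BOUNDS_HI. */
typedef struct { int value; } bounds_index;

/* The only place a range check happens: converting a plain int. */
bool bounds_index_from_int(int raw, bounds_index *out)
{
    if (raw < BOUNDS_LO || raw > BOUNDS_HI)
        return false;
    out->value = raw;
    return true;
}

/* No check here, provided every bounds_index came from the
   checked conversion above. */
int myarr_get(const int myarr[BOUNDS_HI - BOUNDS_LO + 1], bounds_index index)
{
    return myarr[index.value - BOUNDS_LO];
}
```

    Unlike Pascal, nothing stops C code from writing to the struct member directly, so this only documents the invariant; the compiler does not enforce it.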

  • From Lawrence D'Oliveiro@21:1/5 to BGB-Alt on Wed Jun 5 07:22:27 2024
    On Tue, 4 Jun 2024 17:32:50 -0500, BGB-Alt wrote:

    Linux was talked about to some extent in one of the classes (but, more
    in a high-level introductory sense). ...

    But, at the time, the then new OS was Windows Vista ...

    See, there was an opportunity missed, wasn’t there, to compare the different design approaches in the two. Consider GUIs, for example: notice how the
    GUI is commingled inextricably into the kernel in Windows, versus being a separate modular, replaceable (and removable) layer in Linux.

    Particularly relevant in the Vista situation was the big trouble over the
    3D effects in “Aero Glass”: Microsoft could only manage these on (for the time) more expensive, higher-end hardware. This led to a lot of user
    confusion over what “Vista-capable” and “Vista-ready” meant, culminating
    in lawsuits.

    Meanwhile, my modest little Asus Eee 701 PC with its single-core 900-MHz Celeron could run KDE 4 Plasma, with my choice of fun 3D effects, without missing a beat.

  • From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Wed Jun 5 07:15:31 2024
    On Mon, 03 Jun 2024 21:22:07 GMT, Scott Lurndal wrote:

    At the time, in the OS research community, Chorus was, indeed
    well-known.

    As soon as you hear “microkernel”, you know it’s essentially a museum-piece now.

  • From bart@21:1/5 to Lawrence D'Oliveiro on Wed Jun 5 09:10:34 2024
    On 05/06/2024 02:50, Lawrence D'Oliveiro wrote:
    On Tue, 4 Jun 2024 12:35:43 +0100, bart wrote:

    It is utterly different from the linkers used with typical C code.

    It does symbol resolution, just like any linker. It handles dependencies (both direct and transitive), just like any linker.

    In that case I've written dozens of linkers across 40 years.

    Any language that has 'module' objects, even within the same source
    file, is a linker. Any whole-program compiler, even if its output is IR
    or assembly, is a linker. Any smart code editor could be a linker.

    The term usually refers to a program that takes independently compiled
    binaries containing native code, and produces a monolithic binary
    executable.

  • From David Brown@21:1/5 to Thiago Adams on Wed Jun 5 15:09:31 2024
    On 05/06/2024 14:23, Thiago Adams wrote:
    (
    I am interested in the C23 subject, but I find it almost impossible to
    follow such a big thread. For me, it is even more difficult without the Google Groups interface, and it consumes a lot of time.
    My suggestion is to split the C23 topics into smaller ones for specific
    items like embed etc...
    )


    I think the "embed" part has run its course - or at least, we have
    covered everything that could usefully be said about it until we start
    seeing real-world implementations and real-world usage. (It spawned a
    lot of nice discussions about implementations of xxd alternatives, but
    that is not really topical about C23 any more.)

    Having separate threads for different C23 features would be a fine idea.
    Pick your favourite - or least favourite - and start a new thread.
    But let's pick something other than #embed !

  • From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Wed Jun 5 13:32:38 2024
    Lawrence D'Oliveiro <[email protected]d> writes:
    On Mon, 03 Jun 2024 21:22:07 GMT, Scott Lurndal wrote:

    At the time, in the OS research community, Chorus was, indeed
    well-known.

    As soon as you hear “microkernel”, you know it’s essentially a museum-piece now.

    You're really a piece of work.

    A modern hypervisor can be considered a microkernel.

  • From Scott Lurndal@21:1/5 to BGB on Wed Jun 5 13:29:35 2024
    BGB <[email protected]> writes:
    On 6/4/2024 2:21 PM, Scott Lurndal wrote:
    BGB <[email protected]> writes:
    On 6/3/2024 3:23 PM, Tim Rentsch wrote:
    [email protected] (Scott Lurndal) writes:

    [ ... (internal-combustion) engines, ... ]

    It's pretty clear that the ICE is becoming a dinosaur.

    Kind of makes it full circle, doesn't it? ;)

    Though, annoyingly, there isn't a great alternative in some use cases:
    Batteries: Lower energy density and require charging (slow);

    Both of which are an order of magnitude better than just a
    decade ago - and both energy density and charge time are
    a subject of intense research (both in the automotive
    and aircraft industries). I fully expect that energy density
    per kilogram will be more than doubled in the next decade.


    Still pretty far though to catch up with Ethanol or Gasoline, where it is
    also many orders of magnitude faster to refill a fuel tank than to
    charge a battery, ...

    Many orders of magnitude? 5 minutes (petrol) vs. 20 minutes
    (supercharger to 80%)? And you can expect the latter to decrease
    with time while the former has actually increased in the past few
    decades (it used to be faster before self-service pumps were
    developed).


    IIRC, there aren't many battery technologies that can manage a charge
    rate much over 1C to 3C (so, getting a recharge time much under ~ 20
    minutes or so is unlikely).
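
    The C-rate figures above can be turned into times with one division. A minimal sketch, assuming an idealised constant-rate charge from empty to full with no taper or losses; the function name is made up for illustration.

```c
#include <assert.h>

/* Idealised full-charge time in minutes at a constant C-rate.
   1C means one full battery capacity per hour.
   Real chargers taper near full, so this is a lower bound at best. */
double charge_minutes(double c_rate)
{
    assert(c_rate > 0.0);
    return 60.0 / c_rate;
}
```

    With this model, 1C gives 60 minutes and 3C gives 20 minutes, which is where the "~ 20 minutes" figure comes from. A 5-minute full charge would need a sustained 12C.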

    Actually, here's where we're going on charge time:

    https://www.fastcompany.com/91016543/scientists-just-invented-an-ev-battery-that-can-fully-charge-in-5-minutes




    Vs, say, refilling something like a car in ~ 25 seconds or so at a fuel
    pump (but, could potentially be made faster if needed). Though, there
    are likely to be limits here short of redesigning the mechanical interface.

    Come now, it's more like 5 minutes rather than 25 seconds (that is
    almost a gallon a second). Not likely to find those types of speeds
    in a self-serve gas station, just for safety's sake if nothing else.

  • From Dan Cross@21:1/5 to Scott Lurndal on Wed Jun 5 13:59:35 2024
    In article <WNZ7O.9283$nd%[email protected]>,
    Scott Lurndal <[email protected]> wrote:
    Lawrence D'Oliveiro <[email protected]d> writes:
    On Mon, 03 Jun 2024 21:22:07 GMT, Scott Lurndal wrote:

    At the time, in the OS research community, Chorus was, indeed
    well-known.

    As soon as you hear “microkernel”, you know it’s essentially a museum-piece now.

    You're really a piece of work.

    A modern hypervisor can be considered a microkernel.

    I don't understand why people still engage with this clown.
    Lawrence is obviously a troll.

    - Dan C.

  • From Lawrence D'Oliveiro@21:1/5 to bart on Thu Jun 6 02:12:26 2024
    On Wed, 5 Jun 2024 09:10:34 +0100, bart wrote:

    Any language that has 'module' objects, even within the same source
    file, is a linker.

    I did point out that linking involved pulling multiple files together, did
    I not?

    The term usually refers to a program that takes independently compiled binaries containing native code ...

    Binaries of some form, yes. Remember “native” is just as relative a term
    as “hardware” is.

  • From Michael S@21:1/5 to Michael S on Thu Jun 6 14:43:25 2024
    On Sun, 2 Jun 2024 01:11:35 +0300
    Michael S <[email protected]> wrote:

    On Fri, 31 May 2024 22:15:54 +0100
    bart <[email protected]> wrote:

    If I run this:

    printf("%p\n", &_binary_hello_c_start);
    printf("%p\n", &_binary_hello_c_end);
    printf("%p\n", &_binary_hello_c_size);

    I get:

    00007ff6ef252010
    00007ff6ef252056
    00007ff5af240046

    I can see that the first two can be subtracted to give the sizes of
    the data, which is 70 or 0x46. 0x46 is the last byte of the address
    of _size, so what's happening there? What's with the crap in bits
    16-47?


    It looks like ASLR. I don't see it because I test on Win7.



    I tried it on versions of Windows that have ASLR and saw no problems.
    I see *_start and *_end at high addresses and sometimes changing
    between invocations, which means that ASLR is certainly in effect, but
    *_size always prints correct result.
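
    The three addresses quoted above can be checked with plain arithmetic. This is only a sketch: the helper names are made up, and the two image bases used in the last step (a preferred base of 0x140000000 and an actual load address of 0x7ff6ef240000) are assumptions, not values reported in this thread. Under those assumptions, an absolute symbol with value 0x46 that gets rebased along with everything else would print exactly the third address.

```c
#include <assert.h>
#include <stdint.h>

/* Byte count between two addresses, as with _end minus _start. */
uint64_t span_bytes(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return end - start;
}

/* Low 16 bits of an address-sized value. */
uint64_t low16(uint64_t value)
{
    return value & UINT64_C(0xFFFF);
}

/* Where a value ends up if the loader adds the rebasing delta to it.
   Both base arguments are assumptions supplied by the caller. */
uint64_t rebased(uint64_t value, uint64_t preferred_base, uint64_t actual_base)
{
    return value + (actual_base - preferred_base);
}
```

    span_bytes(0x00007ff6ef252010, 0x00007ff6ef252056) gives 70, low16(0x00007ff5af240046) gives 0x46, and rebased(0x46, 0x140000000, 0x7ff6ef240000) gives 0x00007ff5af240046. That is one possible explanation for the odd upper bits, not a confirmed one.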

  • From bart@21:1/5 to Lawrence D'Oliveiro on Thu Jun 6 19:38:08 2024
    On 06/06/2024 03:12, Lawrence D'Oliveiro wrote:
    On Wed, 5 Jun 2024 09:10:34 +0100, bart wrote:

    Any language that has 'module' objects, even within the same source
    file, is a linker.

    I did point out that linking involved pulling multiple files together, did
    I not?

    The term usually refers to a program that takes independently compiled
    binaries containing native code ...

    Binaries of some form, yes. Remember “native” is just as relative a term as “hardware” is.

    Sorry, I've lost track of what it is you are trying to prove. That
    everything is a linker?

    Of course, mine is the narrow viewpoint of someone who has implemented
    multiple assemblers, compilers, interpreters and linkers (that I call
    loaders) of various kinds across decades.

    I thought I had long eliminated the need for traditional 'linking' from
    my tools, but apparently not; I've been writing linkers without being
    aware of it! How about that?

    But if you want the last word, then you're welcome. You're obviously
    100% right and I'm 100% wrong; happy?

  • From Scott Lurndal@21:1/5 to BGB-Alt on Thu Jun 6 21:38:27 2024
    BGB-Alt <[email protected]> writes:
    On 5/31/2024 4:11 PM, Scott Lurndal wrote:
    jak <[email protected]> writes:
    bart ha scritto:
    On 31/05/2024 15:34, Michael S wrote:
    On Fri, 31 May 2024 15:04:46 +0100
    bart <[email protected]> wrote:



    <snip>


    Instead of one compiler, here I used two compilers, a tool 'objcopy'
    (which bizarrely needs to generate ELF format files) and lots of extra
    ugly code. I also need to disregard whatever the hell _binary_..._size
    does.

    But it works.



    You could use the pe-x86-64 format instead of the elf64-x86-64 to reduce
    the size of the object.

    By a half dozen bytes, perhaps, and only if your binutils have been
    built to support pe-x86-64:

    $ objcopy -I binary -O pe-x86-64 main.cpp /tmp/test1.o
    objcopy:/tmp/test1.o: Invalid bfd target

    The ELF64 format has a 64 byte header, the string table and the
    symbol table, and the remainder is the binary
    data. The PE header may save a few bytes by using 32-bit fields in
    the PE COFF header and symbol table.

    Note, you might want to trim your posts when replying with a one-sentence reply.


    While I can't say much for using objcopy here (it is likely to be
    hindered by however the program was compiled and linked, in any case),
    in some other contexts PE/COFF can save more significant amounts of
    space vs ELF.


    In particular:

    PE/COFF typically only stores symbols for imports and exports, rather
    than for every symbol in the binary (though, IIRC, GCC+LD does tend to
    generate PE/COFF output with every symbol present, *1, so this advantage
    is mostly N/A if using GCC).

    $ man 1 strip


    The PE/COFF base relocation format is more compact than the ELF64
    relocation formats:
    ELF64 tends to spend 24 bytes for every symbol, and 24 bytes for each
    reloc; along with an ASCII string for every symbol.

    Use ELF32 then.
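
    The per-record sizes being discussed can be checked at compile time. A sketch only: the struct names are local stand-ins that mirror the symbol and relocation record layouts from the ELF specification, not the system <elf.h> types, and the helper functions are hypothetical. The string table is not counted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the ELF record layouts (names are hypothetical). */
struct sketch_elf64_sym {
    uint32_t st_name;
    uint8_t  st_info;
    uint8_t  st_other;
    uint16_t st_shndx;
    uint64_t st_value;
    uint64_t st_size;
};

struct sketch_elf64_rela {
    uint64_t r_offset;
    uint64_t r_info;
    int64_t  r_addend;
};

struct sketch_elf32_sym {
    uint32_t st_name;
    uint32_t st_value;
    uint32_t st_size;
    uint8_t  st_info;
    uint8_t  st_other;
    uint16_t st_shndx;
};

struct sketch_elf32_rela {
    uint32_t r_offset;
    uint32_t r_info;
    int32_t  r_addend;
};

static_assert(sizeof(struct sketch_elf64_sym) == 24, "ELF64 symbol is 24 bytes");
static_assert(sizeof(struct sketch_elf64_rela) == 24, "ELF64 rela is 24 bytes");
static_assert(sizeof(struct sketch_elf32_sym) == 16, "ELF32 symbol is 16 bytes");
static_assert(sizeof(struct sketch_elf32_rela) == 12, "ELF32 rela is 12 bytes");

/* Bytes spent on symbol and relocation records, excluding strings. */
size_t elf64_table_bytes(size_t nsyms, size_t nrelocs)
{
    return nsyms * sizeof(struct sketch_elf64_sym)
         + nrelocs * sizeof(struct sketch_elf64_rela);
}

size_t elf32_table_bytes(size_t nsyms, size_t nrelocs)
{
    return nsyms * sizeof(struct sketch_elf32_sym)
         + nrelocs * sizeof(struct sketch_elf32_rela);
}
```

    For 100 symbols and 200 relocations that works out to 7200 bytes in the 64-bit layout against 4000 bytes in the 32-bit one. Whether a target uses the Rela or the shorter Rel form depends on the ABI.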


    It also tends to redirect most calls and loads/stores for global
    variables through the GOT, rather than using PC-relative / RIP-relative
    addressing (or fixed displacements relative to a Global Pointer),
    causing the generated code to be larger (along with the size of the GOT).

    That has nothing to do with ELF, per se. The ELF format supports
    dynamic linking. It does not require it.

  • From Lawrence D'Oliveiro@21:1/5 to BGB-Alt on Fri Jun 7 00:53:45 2024
    On Thu, 6 Jun 2024 15:38:21 -0500, BGB-Alt wrote:

    *2: Seemingly the main way I am aware of to get small binaries is to use
    an older version of MSVC (such as 6.0 to 9.0), as the binary-bloat
    started to get much more obvious around Visual Studio 2010, but is less
    of an issue with VS2005 or VS2008.

    Newer version of proprietary compiler generates worse code than older version?!?

  • From Lawrence D'Oliveiro@21:1/5 to bart on Fri Jun 7 00:55:00 2024
    On Thu, 6 Jun 2024 19:38:08 +0100, bart wrote:

    Sorry, I've lost track of what it is you are trying to prove. That
    everything is a linker?

    You were the one trying to prove that linkerless programming was a good
    idea, or something. And then you tried to distract attention from the
    weakness of your arguments by bringing Python into it.

    Not working out so well now, is it?

  • From Lawrence D'Oliveiro@21:1/5 to BGB on Fri Jun 7 00:57:51 2024
    On Wed, 5 Jun 2024 04:01:28 -0500, BGB wrote:

    For my bounds-checking in C, there are no syntactic changes to C.

    But how efficient is it? Those research papers I mentioned reported being
    able to get the execution overhead in Pascal down to something like 5-10%.

  • From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Fri Jun 7 00:59:09 2024
    On Wed, 05 Jun 2024 13:32:38 GMT, Scott Lurndal wrote:

    A modern hypervisor can be considered a microkernel.

    Oh, look who’s desperately trying to keep the “microkernel” brand alive.

  • From Lawrence D'Oliveiro@21:1/5 to BGB on Fri Jun 7 09:04:43 2024
    On Fri, 7 Jun 2024 00:51:22 -0500, BGB wrote:

    Generally, using ELF32 on 64-bit targets isn't a thing...

    It might have been, if Intel’s promotion of an “X32” ABI (keeping addresses at 32 bits, but using the extra instructions for the increased register set) for AMD64 had taken off. Even the Linux kernel supported it,
    at one point. But nobody seemed to care.

  • From bart@21:1/5 to Lawrence D'Oliveiro on Fri Jun 7 22:23:43 2024
    On 07/06/2024 01:55, Lawrence D'Oliveiro wrote:
    On Thu, 6 Jun 2024 19:38:08 +0100, bart wrote:

    Sorry, I've lost track of what it is you are trying to prove. That
    everything is a linker?

    You were the one trying to prove that linkerless programming was a good
    idea, or something. And then you tried to distract attention from the weakness of your arguments by bringing Python into it.

    Not working out so well now, is it?

    For me it's working brilliantly. I have two native code compilers that
    don't use a traditional linker:

    * A C compiler with independently compiled modules
    * A non-C compiler with whole-program compilation

    It's you who can't get your head around the idea that someone could do
    away with a 'linker'.

  • From Kaz Kylheku@21:1/5 to bart on Sat Jun 8 00:39:18 2024
    On 2024-06-07, bart <[email protected]> wrote:
    It's you who can't get your head around the idea that someone could do
    away with a 'linker'.

    You can do away with linkers and linking.

    But it's pretty helpful when

    1. the same library is reused for many programs.

    2. you're selling a library, and would like to ship a binary image of
    that library.

    Without linkage, you don't have a library ecosystem.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @[email protected]

  • From bart@21:1/5 to Kaz Kylheku on Sat Jun 8 02:14:37 2024
    On 08/06/2024 01:39, Kaz Kylheku wrote:
    On 2024-06-07, bart <[email protected]> wrote:
    It's you who can't get your head around the idea that someone could do
    away with a 'linker'.

    You can do away with linkers and linking.

    But it's pretty helpful when

    1. the same library is reused for many programs.


    You use a shared library.

    2. you're selling a library, and would like to ship a binary image of
    that library.

    You ship a shared library.

    Without linkage, you don't have a library ecosystem.


    Of course you do. Eg. a program depends on the vast WinAPI; but you
    don't have to ship copies of all its DLLs, neither do you have to
    statically link them.

    There are some fixups involved even when using dynamic linking. That's
    taken care of by the OS loader. But the code involved isn't extensive.
    Here is an 800-line C program:

    https://github.com/sal55/langs/blob/master/runmx.c

    that loads a private executable format of mine; it loads any dynamic
    libraries also in the same format (but it doesn't share multiple instances
    with other processes); and it resolves any symbols from dependent DLLs.
    Then it runs the program.

    (Here written using WinAPI calls.)

  • From Lawrence D'Oliveiro@21:1/5 to BGB-Alt on Sat Jun 8 03:08:06 2024
    On Fri, 7 Jun 2024 16:58:08 -0500, BGB-Alt wrote:

    I think code generation went in the bulky direction when they started
    adding auto-vectorization, and not really any option to be like "Yes, I
    want SIMD instructions enabled, but, no, don't autovectorize."

    Sometimes vectorization makes things faster, sometimes not, but one
    thing it does do, is make the generated binaries bigger.

    And MSVC is the compiler that Microsoft use to build Windows itself, isn’t it?

    Unless they’ve turned to GCC now ...

  • From Kaz Kylheku@21:1/5 to bart on Sat Jun 8 03:55:19 2024
    On 2024-06-08, bart <[email protected]> wrote:
    On 08/06/2024 01:39, Kaz Kylheku wrote:
    On 2024-06-07, bart <[email protected]> wrote:
    It's you who can't get your head around the idea that someone could do
    away with a 'linker'.

    You can do away with linkers and linking.

    But it's pretty helpful when

    1. the same library is reused for many programs.

    You use a shared library.

    That's linking.

    Static linking is the same thing as dynamic except it's being
    precomputed: the libs are dynamically processed, but then rather
    than the program being run, its image is dumped into an executable.
    That executable then no longer needs to repeat that library processing
    when started; everything is integrated. (There are ways to optimize
    linking so not all the material must be present in memory all at once
    as I describe it above.)

    2. you're selling a library, and would like to ship a binary image of
    that library.

    You ship a shared library.

    No, not always. There is such thing as selling static libraries.

    Numerical code, crypto, codecs.

    A few times in my career I worked with purchased static libs.

    There are some advantages to it, like that static calls can be
    faster than dynamic, and unused parts of static libs can be
    removed at link time.

    Another aspect is that it's possible for static libs to be platform-independent, to an extent, because some of the
    object formats like COFF are widely recognized. Whereas
    shared libs tend to be very OS specific. The vendor has to make
    them separately for Windows, Linux, Solaris, BSD, Mac, ...

    This gruntwork is a pain in the ass that is removed from
    the core value of your code.

    The integrator who buys your static lib can turn it into a
    shared lib for their target system, if they are so inclined.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @[email protected]

  • From Lawrence D'Oliveiro@21:1/5 to BGB on Sat Jun 8 08:27:22 2024
    On Sat, 8 Jun 2024 00:04:02 -0500, BGB wrote:

    Can also note that the compilers handle debugging info differently:
    GCC tends to put debug data in the binary itself (as DWARF or STABS);

    Doesn’t have to be that way. Distros that offer precompiled binaries (e.g. Debian and derivatives) tend to have debug symbols in separate packages
    that you don’t have to install. They’re only needed if you want to run a debugger on the actual binary from the package, rather than from your own build.

    But, in general, I suspect MS doesn't care if the EXE and DLL files are
    bulky and if their compiler doesn't win the performance game.

    Yes, but isn’t this impacting the performance of everything they build
    with it, including their own OS?

  • From bart@21:1/5 to Kaz Kylheku on Sat Jun 8 11:14:45 2024
    On 08/06/2024 04:55, Kaz Kylheku wrote:
    On 2024-06-08, bart <[email protected]> wrote:
    On 08/06/2024 01:39, Kaz Kylheku wrote:
    On 2024-06-07, bart <[email protected]> wrote:
    It's you who can't get your head around the idea that someone could do
    away with a 'linker'.

    You can do away with linkers and linking.

    But it's pretty helpful when

    1. the same library is reused for many programs.

    You use a shared library.

    That's linking.

    Static linking is the same thing as dynamic except it's being
    precomputed: the libs are dynamically processed, but then rather
    than the program being run, its image is dumped into an executable.
    That executable then no longer needs to repeat that library processing
    when started; everything is integrated. (There are ways to optimize
    linking so not all the material must be present in memory all at once
    as I describe it above.)

    The actual process of linking is a fairly trivial matter, as I showed in
    my 0.8Kloc C program which not only loads and relocates an executable
    file (in my format), but loads, relocates and does symbol fixups of any
    dynamic libraries. (Plus fixes up DLL dependencies too! But the
    relocation of those is done within OS routines at my request.)

    What is different in formats like PE and ELF is their tremendous
    complexity; my formats are considerably simpler.

    What I mean by 'doing away with linkers and linking' is removing the
    need to run a discrete program that might be called 'ld' or 'link' or indirectly via 'gcc', from a language implementation.

    Primarily by using whole-program compilation, where any inter-module
    references are sorted out early on within the compiler via the global
    symbol table. The compiler directly generates EXE/DLL from source files.

    For C, the language requires independent compilation. Here, I generate
    ASM files. But while traditionally those are assembled to object files
    and linked, I use a special assembler where ASM files are directly
    turned into EXE or DLL files.

    The linking process is again done by manipulating a global symbol table.
    There are no object files, and no separate discrete link step.
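
    As an illustration of what "manipulating a global symbol table" can amount to, here is a minimal sketch. It is not taken from the tools described above; the struct and function names are hypothetical, and real implementations would use a hash table rather than a linear search.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A defined symbol: a name and the address it ended up at. */
struct symbol { const char *name; size_t address; };

/* A reference to be patched: which name, and where to write its address. */
struct fixup { const char *name; size_t *slot; };

/* Patch every reference whose name is defined in the table.
   Returns the number of references left unresolved. */
size_t resolve_fixups(const struct symbol *syms, size_t nsyms,
                      const struct fixup *fixups, size_t nfixups)
{
    size_t unresolved = 0;
    for (size_t i = 0; i < nfixups; i++) {
        const struct symbol *found = NULL;
        for (size_t j = 0; j < nsyms; j++) {
            if (strcmp(syms[j].name, fixups[i].name) == 0) {
                found = &syms[j];
                break;
            }
        }
        if (found)
            *fixups[i].slot = found->address;
        else
            unresolved++;
    }
    return unresolved;
}
```

    Whether this runs inside a compiler, an assembler or a separate 'ld' program, the lookup-and-patch step is the same; the difference is in where the table comes from.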

    2. you're selling a library, and would like to ship a binary image of
    that library.

    You ship a shared library.

    No, not always. There is such thing as selling static libraries.

    Numerical code, crypto, codecs.

    A few times in my career I worked with purchased static libs.

    If you obtain a static library in the form of an object file or archive,
    then yes you will need a program that can process that file and combine
    it with the rest of your application: a linker.

    But if /I/ were to write a linker, even to process PE/COFF files, it
    would be a 50Kloc application. (There is already such a product, not
    mine, which is 47KB, but it has some peculiarities.)


    There are some advantages to it, like that static calls can be
    faster than dynamic,

    If you do your own fixups (you generate an executable where DLL
    dependences are resolved via your initialisation code rather than
    getting the OS to do it), you can arrange it so that calls to imported
    routines are direct.

    But I don't think it's worth the trouble. You generally know that calls
    across FFI boundaries are going to be a tiny bit slower. That is, by
    needing to execute one extra indirect and probably fully predicted jump
    per call. So usually insignificant.
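
    The "one extra indirect jump" can be pictured in C as a call through a function pointer that a loader fills in. A sketch only: import_slot stands in for a one-entry import table and bind_import for the loader's fixup step, both names being hypothetical.

```c
#include <assert.h>

int add_one(int x) { return x + 1; }

/* Direct call: the target is fixed when the executable is produced. */
int call_direct(int x) { return add_one(x); }

/* Import-table style: the call goes through a slot filled in at load time. */
typedef int (*import_fn)(int);
import_fn import_slot = 0;   /* hypothetical one-entry import table */

/* Stand-in for the loader resolving the import. */
void bind_import(import_fn target) { import_slot = target; }

int call_via_import(int x)
{
    assert(import_slot != 0);   /* must be bound before first use */
    return import_slot(x);
}
```

    Both paths return the same result; the second just loads the target address from memory first, which a branch predictor usually handles well.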



    and unused parts of static libs can be
    removed at link time.

    Another aspect is that it's possible for static libs to be platform-independent, to an extent, because some of the
    object formats like COFF are widely recognized. Whereas
    shared libs tend to be very OS specific. The vendor has to make
    them separately for Windows, Linux, Solaris, BSD, Mac, ...


    Windows tends to use PE (which includes COFF). Linux tends to use ELF.

    The thing about my private formats (MX/ML) is they would have been cross-platform.

    This gruntwork is a pain in the ass that is removed from
    the core value of your code.

    The integrator who buys your static lib can turn it into a
    shared lib for their target system, if they are so inclined.

    Sure. My tools can generate OBJ files if necessary. But then it'll be
    somebody else who needs to invoke a linker. Not me.

    But if I were to supply a binary, it would be in the form of a DLL.
    There are roundabout ways of bundling it into a EXE if necessary (my ML
    format would be better for such a purpose).

  • From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Sat Jun 8 13:09:18 2024
    Lawrence D'Oliveiro <[email protected]d> writes:
    On Fri, 7 Jun 2024 16:58:08 -0500, BGB-Alt wrote:

    I think code generation went in the bulky direction when they started
    adding auto-vectorization, and not really any option to be like "Yes, I
    want SIMD instructions enabled, but, no, don't autovectorize."

    Sometimes vectorization makes things faster, sometimes not, but one
    thing it does do, is make the generated binaries bigger.

    And MSVC is the compiler that Microsoft use to build Windows itself, isn’t it?

    Last time I built NT, it used the command line compiler 'cl.exe', IIRC.

    Granted that was 1998.

  • From Lawrence D'Oliveiro@21:1/5 to Malcolm McLean on Sun Jun 9 00:45:55 2024
    On Sat, 8 Jun 2024 19:28:47 +0100, Malcolm McLean wrote:

    On 07/06/2024 01:53, Lawrence D'Oliveiro wrote:

    On Thu, 6 Jun 2024 15:38:21 -0500, BGB-Alt wrote:

    *2: Seemingly the main way I am aware of to get small binaries is to
    use an older version of MSVC (such as 6.0 to 9.0), as the binary-bloat
    started to get much more obvious around Visual Studio 2010, but is
    less of an issue with VS2005 or VS2008.

    Newer version of proprietary compiler generates worse code than older
    version?!?

    If the code is calling extern gunctions that do IO, we woul expect these
    to be massively more sophisticated on a modern ststem Witha little
    comouter, pribtf just wtites acharacter raster and utimalthe he Os picks
    the up and flushes it out to a pixel raster. And that' aal it's doing.
    Whilst on a modrern syste, stdout can do whole lot of intricate things.

    Nothing to do with the compiler, though.

  • From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Sun Jun 9 00:46:37 2024
    On Sat, 08 Jun 2024 13:09:18 GMT, Scott Lurndal wrote:

    Lawrence D'Oliveiro <[email protected]d> writes:

    On Fri, 7 Jun 2024 16:58:08 -0500, BGB-Alt wrote:

    I think code generation went in the bulky direction when they started
    adding auto-vectorization, and not really any option to be like "Yes,
    I want SIMD instructions enabled, but, no, don't autovectorize."

    Sometimes vectorization makes things faster, sometimes not, but one
    thing it does do, is make the generated binaries bigger.

    And MSVC is the compiler that Microsoft use to build Windows itself, isn’t it?

    Last time I built NT, it used the command line compiler 'cl.exe', IIRC.

    Granted that was 1998.

    Is that supposed to be an entirely different compiler? I would assume it
    was simply a different way of invoking the same basic compiler engine.

  • From Michael S@21:1/5 to Scott Lurndal on Sun Jun 9 11:19:53 2024
    On Sat, 08 Jun 2024 13:09:18 GMT
    [email protected] (Scott Lurndal) wrote:

    Lawrence D'Oliveiro <[email protected]d> writes:
    On Fri, 7 Jun 2024 16:58:08 -0500, BGB-Alt wrote:

    I think code generation went in the bulky direction when they
    started adding auto-vectorization, and not really any option to be
    like "Yes, I want SIMD instructions enabled, but, no, don't
    autovectorize."

    Sometimes vectorization makes things faster, sometimes not, but one
    thing it does do, is make the generated binaries bigger.

    And MSVC is the compiler that Microsoft use to build Windows itself, isn’t it?

    Last time I built NT, it used the command line compiler 'cl.exe',
    IIRC.

    Granted that was 1998.

    MSVC is a common informal moniker. It seems that after it was adopted by godbolt.org it became even more common than before.
    cl.exe has been the name of the executable for a very long time. I don't know how
    long exactly, but would guess more than 30 years.

    Versions of the compiler that were officially approved for building
    kernel modules (supplied with the DDK) were historically not the same
    versions that were sold for user-mode application development as part
    of the Visual Studio package or the Windows SDK package. Not that they
    were radically different, just frozen at different points in time.
    Nowadays, it seems, the versions are more in sync.

  • From Michael S@21:1/5 to BGB on Sun Jun 9 12:40:32 2024
    On Sat, 8 Jun 2024 14:52:26 -0500
    BGB <[email protected]> wrote:

    On 6/8/2024 1:28 PM, Malcolm McLean wrote:
    On 07/06/2024 01:53, Lawrence D'Oliveiro wrote:
    On Thu, 6 Jun 2024 15:38:21 -0500, BGB-Alt wrote:

    *2: Seemingly the main way I am aware of to get small binaries is
    to use an older version of MSVC (such as 6.0 to 9.0), as the
    binary-bloat started to get much more obvious around Visual
    Studio 2010, but is less of an issue with VS2005 or VS2008.

    Newer version of proprietary compiler generates worse code than
    older version?!?
    If the code is calling extern gunctions that do IO, we woul expect
    these to be massively more sophisticated on a modern ststem Witha
    little comouter, pribtf just wtites acharacter raster and utimalthe
    he Os picks the up and flushes it out to a pixel raster. And that'
    aal it's doing. Whilst on a modrern syste, stdout can do whole lot
    of intricate things.

    That is a whole lot of typos...


    But, even if it is built calling MSVCRT as a DLL (rather than static
    linked), modern MSVC is still the worst of the bunch in this area.

    A build as RISC-V + PIE with a static-linked C library still manages
    to be smaller than an x64 build via MSVC with entirely dynamic-linked libraries.

    And, around 72% bigger than the same program built as a
    dynamic-linked binary with "GCC -O3" (while also often still being
    around 40% slower).


    GCC on Windows or on Linux?
    In my experience, gcc on Windows (ucrt64 variant, other gcc variants
    are worse) very consistently produces bigger (stripped) exe than even
    latest MSVCs which, as you correctly stated, are not as good as older
    versions at producing small code.

    The size of 'Hello, world' program (x86-64, dynamically linked C RTL)
    vs2013 - 6,144 bytes
    vs2019 - 9,216 bytes
    gcc (Debian Linux, -no-pie) - 14,400 bytes
    gcc (Debian Linux) - 14,472 bytes
    gcc (ucrt64 DLL) - 18,432 bytes
    gcc (old DLL) - 42,496 bytes

    MSVC compilation flags: -O1 -MD
    gcc compilation flags: -Oz -s

    Contrast, VS2008 can build programs with binary sizes closer to those
    of GCC.


  • From bart@21:1/5 to Michael S on Sun Jun 9 11:20:11 2024
    On 09/06/2024 10:40, Michael S wrote:
    On Sat, 8 Jun 2024 14:52:26 -0500
    BGB <[email protected]> wrote:

    On 6/8/2024 1:28 PM, Malcolm McLean wrote:
    On 07/06/2024 01:53, Lawrence D'Oliveiro wrote:
    On Thu, 6 Jun 2024 15:38:21 -0500, BGB-Alt wrote:

    *2: Seemingly the main way I am aware of to get small binaries is
    to use an older version of MSVC (such as 6.0 to 9.0), as the
    binary-bloat started to get much more obvious around Visual
    Studio 2010, but is less of an issue with VS2005 or VS2008.

    Newer version of proprietary compiler generates worse code than
    older version?!?
    If the code is calling extern gunctions that do IO, we woul expect
    these to be massively more sophisticated on a modern ststem Witha
    little comouter, pribtf just wtites acharacter raster and utimalthe
    he Os picks the up and flushes it out to a pixel raster. And that'
    aal it's doing. Whilst on a modrern syste, stdout can do whole lot
    of intricate things.

    That is a whole lot of typos...


    But, even if it is built calling MSVCRT as a DLL (rather than static
    linked), modern MSVC is still the worst of the bunch in this area.

    A build as RISC-V + PIE with a static-linked C library still manages
    to be smaller than an x64 build via MSVC with entirely dynamic-linked
    libraries.

    And, around 72% bigger than the same program built as a
    dynamic-linked binary with "GCC -O3" (while also often still being
    around 40% slower).


    GCC on Windows or on Linux?
    In my experience, gcc on Windows (ucrt64 variant, other gcc variants
    are worse) very consistently produces bigger (stripped) exe than even
    latest MSVCs which, as you correctly stated, are not as good as older versions at producing small code.

    The size of 'Hello, world' program (x86-64, dynamically linked C RTL)
    vs2013 - 6,144 bytes
    vs2019 - 9,216 bytes
    gcc (Debian Linux, -no-pie) - 14,400 bytes
    gcc (Debian Linux) - 14,472 bytes
    gcc (ucrt64 DLL) - 18,432 bytes
    gcc (old DLL) - 42,496 bytes

    I get a lot worse than that:

    C:\c>gcc hello.c

    C:\c>dir a.exe
    09/06/2024 11:04 367,349 a.exe

    C:\c>gcc hello.c -s -Os

    C:\c>dir a.exe
    09/06/2024 11:04 88,064 a.exe

    (It didn't like -Oz; did you mean something other than -Os?)

    Both import msvcrt.dll. gcc is version 10.3.0.

    tcc gives 2KB, and mcc gives 2.5KB.

    (With the latter, I know it is because it comprises 5 blocks of data,
    each of which is at least 512 bytes: 2 for header stuff, plus always
    3 segments. The minimum hello.exe size I think is 700 bytes if a few
    corners are cut.)
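
    The arithmetic behind the 2.5KB figure can be sketched in C. This is
    only an illustration of padding blocks to 512-byte units: the function
    names and the payload sizes in the usage note are made up, not taken
    from mcc or from the PE format documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (align must be non-zero). */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0);
    return (n + align - 1) / align * align;
}

/* Total file size when every block is padded to a multiple of align.
   A block with no payload still occupies one full unit, matching the
   "at least 512 bytes" description (an assumption of this sketch). */
size_t padded_total(const size_t *block_sizes, size_t count, size_t align)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        size_t padded = round_up(block_sizes[i], align);
        total += padded ? padded : align;
    }
    return total;
}
```

    With five hypothetical payloads that are each smaller than 512 bytes,
    padded_total(sizes, 5, 512) comes to 5 * 512 = 2560 bytes, i.e. 2.5KB.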

    367KB sounds astonishing, but the first time I tried Dart, it gave me a
    5MB executable for 'hello.dart'.

  • From Michael S@21:1/5 to bart on Sun Jun 9 14:12:39 2024
    On Sun, 9 Jun 2024 11:20:11 +0100
    bart <[email protected]> wrote:

    On 09/06/2024 10:40, Michael S wrote:
    On Sat, 8 Jun 2024 14:52:26 -0500
    BGB <[email protected]> wrote:

    On 6/8/2024 1:28 PM, Malcolm McLean wrote:
    On 07/06/2024 01:53, Lawrence D'Oliveiro wrote:
    On Thu, 6 Jun 2024 15:38:21 -0500, BGB-Alt wrote:

    *2: Seemingly the main way I am aware of to get small binaries
    is to use an older version of MSVC (such as 6.0 to 9.0), as the
    binary-bloat started to get much more obvious around Visual
    Studio 2010, but is less of an issue with VS2005 or VS2008.

    Newer version of proprietary compiler generates worse code than
    older version?!?
    If the code is calling extern gunctions that do IO, we woul expect
    these to be massively more sophisticated on a modern ststem Witha
    little comouter, pribtf just wtites acharacter raster and
    utimalthe he Os picks the up and flushes it out to a pixel
    raster. And that' aal it's doing. Whilst on a modrern syste,
    stdout can do whole lot of intricate things.

    That is a whole lot of typos...


    But, even if it is built calling MSVCRT as a DLL (rather than
    static linked), modern MSVC is still the worst of the bunch in
    this area.

    A build as RISC-V + PIE with a static-linked C library still
    manages to be smaller than an x64 build via MSVC with entirely
    dynamic-linked libraries.

    And, around 72% bigger than the same program built as a
    dynamic-linked binary with "GCC -O3" (while also often still being
    around 40% slower).


    GCC on Windows or on Linux?
    In my experience, gcc on Windows (ucrt64 variant, other gcc variants
    are worse) very consistently produces bigger (stripped) exe than
    even latest MSVCs which, as you correctly stated, are not as good
    as older versions at producing small code.

    The size of 'Hello, world' program (x86-64, dynamically linked C RTL)
    vs2013 - 6,144 bytes
    vs2019 - 9,216 bytes
    gcc (Debian Linux, -no-pie) - 14,400 bytes
    gcc (Debian Linux) - 14,472 bytes
    gcc (ucrt64 DLL) - 18,432 bytes
    gcc (old DLL) - 42,496 bytes

    I get a lot worse than that:

    C:\c>gcc hello.c

    C:\c>dir a.exe
    09/06/2024 11:04 367,349 a.exe

    C:\c>gcc hello.c -s -Os

    C:\c>dir a.exe
    09/06/2024 11:04 88,064 a.exe

    (It didn't like -Oz; did you mean something other than -Os?)


    No, I meant -Oz.
    It was invented by clang, but newer gcc versions understand it.
    I don't know exactly what the difference is, but -Oz output tends to
    be a little smaller.
    In a program as trivial as this, there should be no difference.

    Both import msvcrt.dll. gcc is version 10.3.0.


    My gcc variants are from msys2.
    Where did you get yours?

    tcc gives 2KB, and mcc gives 2.5KB.


    x86-64 or i386?
    I think on i386 VC5 can come close, but cannot match it.
    I don't have VC5 right now. Last time I tried to find it, it was
    surprisingly hard.
    Well, I probably still have it on one very old PC that I haven't
    powered up for many years. I don't know if it is still alive.

    (With the latter, I know it is because it comprises 5 blocks of data,
    each of which is at least 512 bytes: 2 for header stuff, plus always
    3 segments. The minimum hello.exe size I think is 700 bytes if a few
    corners are cut.)

    367KB sounds astonishing, but the first time I tried Dart, it gave me
    a 5MB executable for 'hello.dart'.

    golang tends to start at >1.5MB, but then it grows very slowly. It
    appears to generate *very* self-contained executables. At least I
    personally have never encountered a case where simply copying the exe
    to a new computer was insufficient.
    Considering that Go needs much more run-time support than Dart, I
    can't find any reason for 5MB except "they don't care".

  • From Michael S@21:1/5 to Michael S on Sun Jun 9 14:44:27 2024
    On Sun, 9 Jun 2024 14:12:39 +0300
    Michael S <[email protected]> wrote:

    On Sun, 9 Jun 2024 11:20:11 +0100
    bart <[email protected]> wrote:


    367KB sounds astonishing, but the first time I tried Dart, it gave
    me a 5MB executable for 'hello.dart'.

    golang tends to start at >1.5MB, but then it grows very slowly. It
    appears to generate *very* self-contained executables. At least I
    personally have never encountered a case where simply copying the exe
    to a new computer was insufficient.
    Considering that Go needs much more run-time support than Dart, I
    can't find any reason for 5MB except "they don't care".


    Since we have started talking about the size of statically linked
    binaries: in this field [on x86-64] the advantage of Windows/MSVC over
    Linux/gcc appears quite large.

    MSVC 2013 - 84,480 bytes
    MSVC 2019 - 119,808 bytes
    gcc (Debian Linux) - 682,688 bytes

    By old standards, the MSVC binary is bloated beyond reason, but
    compared to gcc/Linux it looks almost lean.

    I can't say that I care deeply, but I can't say that I don't care at
    all either. Statically linked binaries are the only way by which I was
    able to copy programs compiled on a relatively new Debian to an
    Ubuntu LTS that was not that much older (2-3 years). I fully believe
    that other methods exist, but they are above my skills and above the
    skills of my co-workers.

  • From Michael S@21:1/5 to bart on Sun Jun 9 20:00:14 2024
    On Sun, 9 Jun 2024 17:32:40 +0100
    bart <[email protected]> wrote:

    On 09/06/2024 12:12, Michael S wrote:
    On Sun, 9 Jun 2024 11:20:11 +0100
    bart <[email protected]> wrote:


    GCC on Windows or on Linux?
    In my experience, gcc on Windows (ucrt64 variant, other gcc
    variants are worse) very consistently produces bigger (stripped)
    exe than even latest MSVCs which, as you correctly stated, are
    not as good as older versions at producing small code.

    The size of 'Hello, world' program (x86-64, dynamically linked C RTL)
    vs2013 - 6,144 bytes
    vs2019 - 9,216 bytes
    gcc (Debian Linux, -no-pie) - 14,400 bytes
    gcc (Debian Linux) - 14,472 bytes
    gcc (ucrt64 DLL) - 18,432 bytes
    gcc (old DLL) - 42,496 bytes

    I get a lot worse than that:

    C:\c>gcc hello.c

    C:\c>dir a.exe
    09/06/2024 11:04 367,349 a.exe

    C:\c>gcc hello.c -s -Os

    C:\c>dir a.exe
    09/06/2024 11:04 88,064 a.exe

    (It didn't like -Oz; did you mean something other than -Os?)


    No, I meant -Oz.
    It was invented by clang, but newer gcc understand it.
    I don't know what is a difference exactly, but -Oz tends to be a
    little smaller.
    In program as trivial as this, there should be no difference.

    Both import msvcrt.dll. gcc is version 10.3.0.


    My gcc variants are from msys2.
    Where did you get yours?

    It's gcc/TDM.

    I never heard about TDM except from you.

    Anything else, I can spend 10 minutes following links
    to a mingw download, only to end up back where I started from.
    gcc/TDM is a much simpler installation.


    Somehow, I have installed msys2 many times, using 2 or 3 different
    methods, and it worked every single time. It's a huge download, but it
    works.
    There were cases where I had problems installing additional packages on
    top of msys2, but they were always caused by idiotic policies of
    corporate IT. On my personal systems it was always flawless.

    This page appears to give correct, up-to-date instructions:
    https://www.msys2.org/#installation


    tcc gives 2KB, and mcc gives 2.5KB.


    x86-64 or i386?

    All were for x64.

    gcc's stdio.h header defines `printf` (which my hello.c uses) as an
    inlined wrapper based around `__mingw_vasprintf()`. So there might
    be further inlined stuff or that is statically linked, before it
    finally ends up calling the real `printf`.


    The size you mentioned in the previous post is suspiciously similar to
    the size of the VS2013 statically linked binary.

    With gcc, I get 39.9KB for -m32 -Os -s.


    That is smaller than the statically linked 32-bit VS2013 binary
    (73,216 bytes).
    But a lot bigger than the 6,144-byte DLL-based VS2013 32-bit binary.

    If I use 'puts' instead, and -m32, then it gets down to 14KB.



  • From bart@21:1/5 to Michael S on Sun Jun 9 17:32:40 2024
    On 09/06/2024 12:12, Michael S wrote:
    On Sun, 9 Jun 2024 11:20:11 +0100
    bart <[email protected]> wrote:


    GCC on Windows or on Linux?
    In my experience, gcc on Windows (ucrt64 variant, other gcc variants
    are worse) very consistently produces bigger (stripped) exe than
    even latest MSVCs which, as you correctly stated, are not as good
    as older versions at producing small code.

    The size of 'Hello, world' program (x86-64, dynamically linked C RTL)
    vs2013 - 6,144 bytes
    vs2019 - 9,216 bytes
    gcc (Debian Linux, -no-pie) - 14,400 bytes
    gcc (Debian Linux) - 14,472 bytes
    gcc (ucrt64 DLL) - 18,432 bytes
    gcc (old DLL) - 42,496 bytes

    I get a lot worse than that:

    C:\c>gcc hello.c

    C:\c>dir a.exe
    09/06/2024 11:04 367,349 a.exe

    C:\c>gcc hello.c -s -Os

    C:\c>dir a.exe
    09/06/2024 11:04 88,064 a.exe

    (It didn't like -Oz; did you mean something other than -Os?)


    No, I meant -Oz.
    It was invented by clang, but newer gcc understand it.
    I don't know what is a difference exactly, but -Oz tends to be a little smaller.
    In program as trivial as this, there should be no difference.

    Both import msvcrt.dll. gcc is version 10.3.0.


    My gcc variants are from msys2.
    Where did you get yours?

    It's gcc/TDM. Anything else, I can spend 10 minutes following links to a
    mingw download, only to end up back where I started from. gcc/TDM is a
    much simpler installation.

    tcc gives 2KB, and mcc gives 2.5KB.


    x86-64 or i386?

    All were for x64.

    gcc's stdio.h header defines `printf` (which my hello.c uses) as an
    inlined wrapper based around `__mingw_vasprintf()`. So there might be
    further inlined stuff or that is statically linked, before it finally
    ends up calling the real `printf`.

    With gcc, I get 39.9KB for -m32 -Os -s.

    If I use 'puts' instead, and -m32, then it gets down to 14KB.

  • From bart@21:1/5 to Michael S on Sun Jun 9 21:06:00 2024
    On 09/06/2024 18:00, Michael S wrote:
    On Sun, 9 Jun 2024 17:32:40 +0100
    bart <[email protected]> wrote:

    On 09/06/2024 12:12, Michael S wrote:
    On Sun, 9 Jun 2024 11:20:11 +0100
    bart <[email protected]> wrote:


    GCC on Windows or on Linux?
    In my experience, gcc on Windows (ucrt64 variant, other gcc
    variants are worse) very consistently produces bigger (stripped)
    exe than even latest MSVCs which, as you correctly stated, are
    not as good as older versions at producing small code.

    The size of 'Hello, world' program (x86-64, dynamically linked C RTL)
    vs2013 - 6,144 bytes
    vs2019 - 9,216 bytes
    gcc (Debian Linux, -no-pie) - 14,400 bytes
    gcc (Debian Linux) - 14,472 bytes
    gcc (ucrt64 DLL) - 18,432 bytes
    gcc (old DLL) - 42,496 bytes

    I get a lot worse than that:

    C:\c>gcc hello.c

    C:\c>dir a.exe
    09/06/2024 11:04 367,349 a.exe

    C:\c>gcc hello.c -s -Os

    C:\c>dir a.exe
    09/06/2024 11:04 88,064 a.exe

    (It didn't like -Oz; did you mean something other than -Os?)


    No, I meant -Oz.
    It was invented by clang, but newer gcc understand it.
    I don't know what is a difference exactly, but -Oz tends to be a
    little smaller.
    In program as trivial as this, there should be no difference.

    Both import msvcrt.dll. gcc is version 10.3.0.


    My gcc variants are from msys2.
    Where did you get yours?

    It's gcc/TDM.

    I never heard about TDM except from you.

    Anything else, I can spend 10 minutes following links
    to a mingw download, only to end up back where I started from.
    gcc/TDM is a much simpler installation.


    Somehow, I installed msys2 many times, using 2 or 3 different methods
    and it worked every single time. It's huge download, but it works.
    There were cases where I had problems installing additional packages on
    top of msys2, but they were always caused by idiotic policies of
    corporate IT. At my personal systems it was always flawless.

    I'm not talking about MSYS2. I'm not even sure what it is. msys2.org
    describes it as:

    "MSYS2 is software distribution and a building platform for Windows. It provides a Unix-like environment, a command-line interface and a
    software repository making it easier to install, use, build and port
    software on Windows. That means Bash, Autotools, Make, Git, GCC, GDB...,
    all easily installable through Pacman, a fully-featured package manager."

    Um, I only want an optimising C compiler, nothing else! And especially I
    do NOT want a 'Unix-like' environment; I think it is entirely
    unnecessary for a tool that simply converts .c files into .exe files.


    This page appear to give correct up to date instructions https://www.msys2.org/#installation


    Today I tried once more to install mingw gcc. One hit gave me this page:

    https://www.naukri.com/code360/library/gcc-compiler-for-windows

    Step 1 tells me to click here:

    https://sourceforge.net/projects/mingw-w64/

    It says: "A complete runtime environment for gcc"; hmm; it doesn't sound
    like a compiler! But I'm just following the instructions.

    After 10 minutes I had a 110B installation with 6000 files, but none was
    the 85KB EXE file mentioned in step 3, which isn't even part of the ZIP according to the screen shot. Where does that file come from?

    So I tried a different tack; that took me here:

    https://sourceforge.net/projects/mingw/

    This one turns out to be that 85KB file that was missing before! OK,
    let's do it. It shows a list of things to install, including MSYS2 (no
    thanks) and compilers for Ada, C++, Fortran, Objective-C, but no C,
    unless it is the 'base' package? I have really no idea.

    I click that, but then what? There is no Install, Proceed, Get, or OK
    button! But under a pulldown menu, there is Apply Changes. Now it's
    doing something. At the end there was no specific message, but it said somewhere: This package has not been installed;...

    But I tried it anyway (notice this is from a normal command line):

    C:\c>gcc --version
    gcc (MinGW.org GCC-6.3.0-1) 6.3.0

    So it's version 6 of gcc! Nowhere do I remember seeing that mentioned.

    I don't normally waste my time going down these futile rabbitholes, but sometimes it can be fun as you get to see some appallingly bad
    installation processes.

    Of course, people will go to any lengths to defend these very complex
    products, and will explain to you why it is a good idea to separate out
    compiler, headers, assembler, linker, library into lots of different
    pieces, all with names that are subtle variations of mingw and w64.

    This is why I prefer TDM:

    https://jmeubank.github.io/tdm-gcc/

    Click on the version you want.

    Other Windows C compilers are even simpler, and smaller. (TDM is 0.5GB,
    Tiny C is under 0.002GB, and my own 'bcc' is 0.001GB. My own non-C
    compiler is 0.0004GB. Both my products are single EXE files.)

  • From Michael S@21:1/5 to All on Sun Jun 9 23:40:07 2024
    I can only tell you what works well for me. I can't force you to use it.
    Also, I can't prevent you from trying to use something that no longer
    works well due to absence of support, i.e. old msys/mingw.

  • From bart@21:1/5 to Michael S on Sun Jun 9 22:49:39 2024
    On 09/06/2024 21:40, Michael S wrote:
    I can only tell you what works well for me. I can't force you to use it. Also, I can't prevent you from trying to use something that no longer
    works well due to absence of support, i.e. old msys/mingw.


    I was trying to install the LATEST version of gcc on Windows! That would
    be 13.x, which I've done before, perhaps hitting on the right link by
    chance.

    'gcc' /can/ be run from a pure Windows command line, as I've been using versions of it for years.

    But they don't make it easy, as gcc is perceived to be tied to WSL MSYS2
    MINGW CYGWIN.

    I've had another go at this elusive compiler, this time apparently
    successful. Here are the steps I used:

    * Start from mingw-w64.com. Ignore where it says it's a 'complete
    runtime environment for gcc'. There is also an actual compiler at the
    end of the process!

    * Click on Downloads on the left

    * There is a list of prebuilt toolchains. The promising ones are
    w64devkit, MingW-W64-builds, and possibly WinLibs.com?
    I clicked on MinGW-W64-builds.

    * That takes you down the page to MingW-Builds, but this is where I had
    a bit of luck: as this is a one-line entry, I missed it and started
    reading about WinLibs.com instead. But where are the downloads? The
    link is in the small print on the last line of that section.

    * It takes you to winlibs.com. This looks disconcertingly like a 1990s
    website. It surely can't be the right place? Just don't click on
    MinGW-w64 as that just takes you back to square one.

    * Scroll down to Downloads. There are 16 to choose from for each
    version. I clicked (by mistake - I think) on the version /with/ LLVM
    etc, but I don't know what the difference is. I chose the MSVCRT
    version.

    The end result was a 1.4GB installation of gcc 14.1.0. Using 'gcc
    hello.c -Os -s' gives an executable of 48KB (with 10.3 it was 88KB).
    It still imports msvcrt.dll, but not printf (it does import vfprintf).

  • From Michael S@21:1/5 to bart on Mon Jun 10 01:06:53 2024
    On Sun, 9 Jun 2024 22:49:39 +0100
    bart <[email protected]> wrote:

    On 09/06/2024 21:40, Michael S wrote:
    I can only tell you what works well for me. I can't force you to
    use it. Also, I can't prevent you from trying to use something that
    no longer works well due to absence of support, i.e. old msys/mingw.


    I was trying to install the LATEST version of gcc on Windows! That
    would be 13.x, which I've done before, perhaps hitting on the right
    link by chance.

    'gcc' /can/ be run from a pure Windows command line, as I've been
    using versions of it for years.

    But they don't make it easy, as gcc is perceived to be tied to WSL
    MSYS2 MINGW CYGWIN.

    I've had another go at this elusive compiler, this time apparently successful. Here are the steps I used:

    * Start from mingw-w64.com. Ignore where it says it's a 'complete
    runtime environment for gcc'. There is also an actual compiler at
    the end of the process!

    * Click on Downloads on the left

    * There is a list of prebuilt toolchains. The promising ones are
    w64devkit, MingW-W64-builds, and possibly WinLibs.com?
    I clicked on MinGW-W64-builds.

    * That takes you down the page to MingW-Builds, but this is where I
    had a bit of luck: as this is a one-line entry, I missed it and
    started reading about WinLibs.com instead. But where are the
    downloads? The link is in the small print on the last line of that
    section.

    * It takes you to winlibs.com. This looks disconcertingly like a 1990s
    website. It surely can't be the right place? Just don't click on
    MinGW-w64 as that just takes you back to square one.

    * Scroll down to Downloads. There are 16 to choose from for each
    version. I clicked (by mistake - I think) on the version /with/
    LLVM etc, but I don't know what the difference is. I chose the MSVCRT
    version.

    The end result was a 1.4GB installation of gcc 14.1.0. Using 'gcc
    hello.c -Os -s' gives an executable of 48KB (with 10.3 it was 88KB).
    It still imports msvcrt.dll, but not printf (it does import vfprintf).


    It sounds like you ended up with a gcc distro based on a 12-year-old
    Microsoft DLL that does not support the majority of C11 library
    features and likely does not support a few C99 library features either.
    If you were a little less stubborn, in 10 minutes you could have had a
    distro based on the new ucrt DLL, which is closer to the new C standard
    and generates smaller binaries.
    And likely occupies less than 1.4 GB.

    BTW, I don't understand why MSVC produces smaller binaries with the old
    MS C RTL DLL while gcc produces smaller binaries with the new MS C RTL
    DLL. But that's an undeniable fact.

  • From bart@21:1/5 to Michael S on Mon Jun 10 01:26:23 2024
    On 09/06/2024 23:06, Michael S wrote:
    On Sun, 9 Jun 2024 22:49:39 +0100
    bart <[email protected]> wrote:

    On 09/06/2024 21:40, Michael S wrote:
    I can only tell you what works well for me. I can't force you to
    use it. Also, I can't prevent you from trying to use something that
    no longer works well due to absence of support, i.e. old msys/mingw.


    I was trying to install the LATEST version of gcc on Windows! That
    would be 13.x, which I've done before, perhaps hitting on the right
    link by chance.

    'gcc' /can/ be run from a pure Windows command line, as I've been
    using versions of it for years.

    But they don't make it easy, as gcc is perceived to be tied to WSL
    MSYS2 MINGW CYGWIN.

    I've had another go at this elusive compiler, this time apparently
    successful. Here are the steps I used:
    ...

    The end result was a 1.4GB installation of gcc 14.1.0. Using 'gcc
    hello.c -Os -s' gives an executable of 48KB (with 10.3 it was 88KB).
    It still imports msvcrt.dll, but not printf (it does import vfprintf).


    It sounds like you ended up with a gcc distro based on a 12-year-old
    Microsoft DLL that does not support the majority of C11 library
    features and likely does not support a few C99 library features either.
    If you were a little less stubborn, in 10 minutes you could have had a
    distro based on the new ucrt DLL, which is closer to the new C standard
    and generates smaller binaries.
    And likely occupies less than 1.4 GB.

    I downloaded a different 14.1 version that was 'only' 0.8GB. (Compared
    to 1.4GB; it's still 2000 times bigger than my main compiler!)

    That uses UCRT, but the size difference is probably due to not including LLVM/Clang stuff. (Which didn't work anyway; I think clang triggered my AV.)

    This now gives a hello.c executable of 22KB; it was 48KB with the 1.4GB download, and 88KB with 10.3.0.


    BTW, I don't understand why MSVC produces smaller binaries with the old
    MS C RTL DLL while gcc produces smaller binaries with the new MS C RTL
    DLL. But that's an undeniable fact.

    I think the sizes of the runtime libraries are irrelevant if they are
    both dynamically linked. It's what the compiler puts directly into the executable that makes the difference. And here they are just too diverse
    in how they work. It can't be the 20 bytes of code for hello.c that
    affects it.

  • From Lawrence D'Oliveiro@21:1/5 to Michael S on Tue Jun 11 08:33:45 2024
    On Sun, 9 Jun 2024 14:12:39 +0300, Michael S wrote:

    I don't know what is a difference exactly, but -Oz tends to be a little smaller.

    Some kind of “wizard” optimization, no doubt ...

  • From Lawrence D'Oliveiro@21:1/5 to BGB on Fri Jun 14 03:20:37 2024
    On Fri, 7 Jun 2024 02:52:56 -0500, BGB wrote:

    On 6/6/2024 7:57 PM, Lawrence D'Oliveiro wrote:

    On Wed, 5 Jun 2024 04:01:28 -0500, BGB wrote:

    For my bounds-checking in C, there are no syntactic changes to C.

    But how efficient is it? Those research papers I mentioned reported
    being able to get the execution overhead in Pascal down to something
    like 5-10%.

    Also somewhere around a 10% slowdown in this case, but this was with dedicated ISA level support and various specialized helper instructions
    (to check/set/adjust the pointer bounds bits).

    Yeah, see, Pascal was able to do it without all that.

  • From bart@21:1/5 to Keith Thompson on Fri Jun 14 23:39:18 2024
    On 14/06/2024 22:30, Keith Thompson wrote:

    Now that it's too late to change the definition, I've thought of
    something that I think would have been a better way to specify #embed.

    Define a new kind of string literal, with a "uc" prefix. `uc"foo"` is
    of type `unsigned char[3]`. (Or `const unsigned char[3]`, if that's not
    too radical.) Unlike other string literals, there is no implicit
    terminating '\0'. Arbitrary byte values can of course be specified in hexadecimal: uc"\x01\x02\x03\x04". Since there's no terminating null character and C doesn't support zero-sized objects, uc"" is a syntax
    error.
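
    For comparison, current C already gets close to this for fixed-size
    data: when a character array is declared with an explicit size equal to
    the number of characters in the literal, no terminating '\0' is stored.
    A minimal sketch of that existing rule follows; the names magic,
    magic_size and magic_at are invented for illustration and are not part
    of the proposal.

```c
#include <assert.h>
#include <stddef.h>

/* The array size (4) matches the number of characters in the literal,
   so C stores no terminating '\0'. (C++ rejects this initialization.) */
static const unsigned char magic[4] = "\x01\x02\x03\x04";

/* Number of bytes in the table: 4, with no hidden terminator. */
size_t magic_size(void)
{
    return sizeof magic;
}

/* Read one byte of the table, checking the index first. */
unsigned char magic_at(size_t i)
{
    assert(i < sizeof magic);
    return magic[i];
}
```

    The difference from the proposed uc"..." form is that here the length
    has to be written out by hand; an ordinary literal such as "foo" on its
    own still has size 4, not 3.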

    uc"..." string literals might be made even simpler, for example allowing
    only hex digits and not requiring \x (uc"01020304" rather than uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
    could be useful in other contexts, and programmers will want
    flexibility. Maybe something like hex"01020304" (embedded spaces could
    be ignored) could be defined in addition to uc"\x01\x02\x03\x04".

    That's something I added to string literals in my language within the
    last few months. Nothing to do with embedding (but it can make hex
    sequences within strings more efficient, if that approach was used).

    Writing byte-at-a-time hex data was always a bit fiddly:

    0x12, 0x34, 0xAB, ...
    "\x12\x34\xAB...

    It was made worse by my preference for `x` being in lower case, and the
    hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look wrong.

    What I did was create a new, variable-length string escape sequence
    that looks like this:

    "ABC\h1234AB...\nopq" // hex sequence between ABC & nopq

    Hex digits after \h or \H are read in pairs. White space is allowed
    between pairs:

    "ABC\H 12 34 AB ...\nopq"

    The only thing I wasn't sure about was the closing backslash, which
    looks at first like another escape code. But I think it is sound,
    although it can still be tweaked.

  • From David Brown@21:1/5 to Keith Thompson on Sat Jun 15 17:58:22 2024
    On 14/06/2024 23:30, Keith Thompson wrote:
    David Brown <[email protected]> writes:
    On 28/05/2024 22:21, Keith Thompson wrote:
    David Brown <[email protected]> writes:
    On 28/05/2024 02:33, Keith Thompson wrote:
    [...]
    Without some kind of programmer control, I'm concerned that the rules
    for defining an array so #embed will be correctly optimized will be
    spread as lore rather than being specified anywhere.

    They might, but I really do not think that is so important, since they
    will not affect the generated results.
    Right, it won't affect the generated results (assuming I use it
    correctly). Unless I use `#embed optimize(true)` to initialize
    a struct with varying member sizes, but that's my fault because I
    asked for it.

    I am still not understanding your point. (I am confident that you
    have a point, even if I don't get it.)

    I cannot see why there would be any need or use of manually adding
    optimisation hints or controls in the source code. I cannot see why
    there is any possibility of getting incorrect results in any way.

    The point is compile-time performance, and perhaps even the ability
    to compile at all.
    I'm thinking about hypothetical cases where I want to embed a
    *very* large file and parsing the comma-delimited sequence could
    have unacceptable compile-time performance, perhaps even causing
    a compile-time stack overflow depending on how the parser works.
    Every time the compiler sees #embed, it has to decide whether to
    optimize it or not, and the decision criteria are not specified
    anywhere (not at all in the standard, perhaps not clearly in the
    compiler's documentation).


    Yes, I agree with that. And this is how it should be - this is not
    something that should be specified. The C standards give minimum
    requirements for things like the number of identifiers or the length
    of lines. But pretty much all compilers, for most of the "translation
    limits", say they are "limited by the memory of the host computer".
    The same will apply to #embed. And some compilers will cope better
    than others with huge #embed's, some will be faster, some more memory
    efficient. Some will change from version to version. This is not
    something that can sensibly be specified or formalized - like pretty
    much everything in regard to compilation time, each compiler does the
    best it can without any specifications. I'd expect compiler reference
    manuals might have hints, such as saying #embed is fastest with
    unsigned char arrays (or whatever), but no more than that.

    But again - I see no reason for manual optimisation hints, and no
    reason for any possible errors.

    Let me outline a possible strategy for a compiler like gcc. (I have
    not looked at the prototype implementations from thephd, nor any gcc
    developer discussions.)

    gcc splits the C pre-processor and the compiler itself, and
    (currently) communicates dataflow in only one direction, via a
    temporary file or a pipe. But the "gcc" (or "g++", according to
    preference) driver program calls and coordinates the two programs.

    If the pre-processor is called stand-alone, then it will generate a
    comma-separated list of integers, helpfully split over multiple lines
    of reasonable size. This will clearly always be correct, and always
    work, within limits of a compiler's translation limits.

    But when the gcc driver calls it, it will have a flag indicating that
    the target compiler is gcc and supports an extended pre-processed
    syntax (and also that the source is C23 - after all, the C
    pre-processor can be used as a macro processor for other files with no
    relation to C). Now the pre-processor has a lot more freedom.
    Whenever it meets an #embed directive, it can generate a line :

    #embed_data 123456

    followed in the file by 123456 (or whatever) bytes of binary data.
    The C compiler, when parsing this file, will pull that in as a single
    blob. Then it is up to the C compiler - which knows how the #embed
    data will be used - to tell if these bytes should be used as
    parameters to a macro, initialisation for a char array, or whatever.
    And it can use them as efficiently as practically possible. (It is
    probably only worth using this for #embed data over a certain size -
    smaller #embed's could just generate the integer sequences.)
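
    The compiler side of the strategy above can be sketched as follows.
    This is only an illustration under stated assumptions: "#embed_data"
    is the hypothetical marker from this post, not anything gcc is known
    to emit, and the names embed_blob and read_embed_blob are invented
    here. The point is that the payload is located by its length and
    handed over as one block, without tokenising a list of integers.

```c
#include <stddef.h>
#include <string.h>

/* One hypothetical "#embed_data <count>" record found in the input. */
struct embed_blob {
    const unsigned char *data;  /* first payload byte, NULL on failure */
    size_t               size;  /* number of payload bytes */
};

/*
 * Parse "#embed_data <count>\n" followed by <count> raw bytes.
 * `in` points at the '#', `len` is the number of bytes available.
 * The payload is returned by reference; nothing is copied or tokenised.
 * (Overflow of a huge <count> is not checked in this sketch.)
 */
struct embed_blob read_embed_blob(const unsigned char *in, size_t len)
{
    static const char tag[] = "#embed_data ";
    const size_t tag_len = sizeof tag - 1;
    const struct embed_blob none = { NULL, 0 };
    size_t pos, count = 0;

    if (len < tag_len || memcmp(in, tag, tag_len) != 0)
        return none;

    pos = tag_len;
    if (pos >= len || in[pos] < '0' || in[pos] > '9')
        return none;                      /* need at least one digit */
    while (pos < len && in[pos] >= '0' && in[pos] <= '9') {
        count = count * 10 + (size_t)(in[pos] - '0');
        pos++;
    }
    if (pos >= len || in[pos] != '\n')
        return none;                      /* header must end the line */
    pos++;
    if (len - pos < count)
        return none;                      /* truncated payload */

    return (struct embed_blob){ in + pos, count };
}
```

    Because the payload length is known up front, commas and newlines
    inside the data need no escaping, which is what makes the binary
    form cheaper than a textual list.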

    Nowhere in this is there any call for manual optimisation hints, nor
    any risk of incorrect results.

    I've kept this on the back burner for a couple of weeks. I'm finally
    getting around to posting a followup.

    I'm not particularly concerned about compilers processing #embed
    incorrectly. It's conceivable that a compiler could incorrectly decide
    that it can optimize a particular #embed directive, but I expect
    compilers to be conservative, falling back to the specified behavior if
    they can't *prove* that an optimization is safe.


    I'd expect that too. (Of course there's always the risk of bugs with
    weird use-cases.)

    I see two conceptual problems with #embed as it's currently defined in
    N3220.

    First, there's a possible compile-time performance issue for very large
    embedded files. The (draft) standard calls for #embed to expand to a
    comma-separated list of integer constant expressions. (I'm not sure why
    it didn't specify integer constants.)

    My objection is based on the possibility that #embed for a *very* large
    file might result in unacceptable time and memory usage during compile
    time. I haven't looked into how existing compilers handle large
    initializers, but I can imagine that parsing such a list might consume
    more than O(N) time and/or memory, or at least O(N) with a large
    constant. (If parsing long lists of integer constants is expensive for
    some compiler, this could be a motivation to optimize that particular
    case.)

    The point of #embed is to get O(N) scaling - or at least, much closer to
    that than compilers do today with an #include of a list of numbers (or
    even a string literal). There is little doubt that a big enough #embed
    file will consume time and memory that is unacceptable, at least for
    some people - all you need is to pick a file bigger than your computer's
    memory, and you can be reasonably confident that it will be problematic.
    But it also seems reasonable to expect that if a file is big enough to
    cause trouble for #embed, then any other method of including it in a C
    file will be at least as bad and probably /much/ worse.

    At worst, #embed is going to be no less efficient than today's solution,
    and at best it will be significantly more efficient. I don't think it
    is fair to object to it because a given implementation might not reach
    theoretical optimum efficiencies.


    The intent of #embed is to copy the contents of a file at compile time
    into an array of unsigned char -- but it's specified in a roundabout way
    that requires bizarre usages to work "correctly".

    That is one expected use, and will probably be the biggest use by a fair
    way, but it is not the only possible use. The specification lets you
    have more flexibility. For example, I have a project where I include a
    number of files in a structure with a number of unsigned char arrays,
    amongst other data - a simpler #embed solution that forced you to have
    an unsigned char array might not work with that. (The project predates
    #embed and uses a Python script to generate the data.)

    I expect at least
    some compilers to optimize #embed for better compile-time performance,
    but that requires them to determine when optimization is permitted with
    no advice from the standard about how to do that. That's going to be
    moderately difficult for compiler implementers; I'm not too concerned
    about that. But it also imposes a burden on programmers, who will have
    to use trial and error to determine how to ensure a #embed is optimized.


    I am entirely confident that major compiler vendors will optimise the
    case of initialising char arrays. For anything else, who cares? It is
    unlikely that you'd use #embed for other purposes with files that are
    big enough for unoptimised implementations to be unreasonably slow. And
    if that does turn out to be a problem in practice, then you /know/ you
    have huge files and are doing something weird, and you can use something
    other than #embed for the purpose in the same way you do today.

    Of prime importance is /correctness/ - #embed should give the results
    you expect, and I can't see that being a problem. Outside that, #embed
    is always going to be at least as efficient as existing solutions, and
    usually much faster for cases that matter.

    This all assumes that a naive #embed implementation is going to
    cause real problems for very large embedded files (compile-time
    stack overflows, unreasonably long compile times, or just using so
    much memory that system performance is affected). If it turns out
    that this isn't the case, then that objection is mostly addressed.

    I don't believe "very large" embedded files are of any real-world use in
    the first place.

    And I don't believe there will be any naïve implementations of any
    significance. gcc and clang are the only two C compilers with a
    realistic future for serious C work with newer standards. Even MS
    expect people to use clang for C, as far as I understand it. A number
    of other toolchains in the embedded world have switched over, or plan to
    do so - it is simply not worth the development effort. Niche C
    compilers will continue to exist, but it's unlikely they will bother
    with C23.


    My other objection is that it's conceptually messy. The expected use
    case is in an initializer for an array of unsigned char, but there are
    no restrictions on where it can be used.

    That is the point.

    As a programmer, I want to
    copy a file verbatim into an unsigned char array, but at least
    conceptually #embed translates the file contents into a long sequence of
    expressions which are then processed as C code to recreate the raw data.
    There are bizarre cases (like my previous example initializing a struct
    with members of various types) that are required to work. #embed is a
    preprocessor directive, but determining whether it can be optimized
    requires feedback from later compiler phases. It's doable, but it's
    *ugly*.
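
    To make the "bizarre case" concrete, here is a small sketch. It does
    not use #embed itself, so it compiles on pre-C23 compilers: the macro
    EMBEDDED_VALUES is a stand-in invented here for the comma-separated
    list that the directive is specified to expand to, and the struct and
    its member names are likewise made up for illustration.

```c
/* A struct whose members have different sizes. */
struct file_header {
    unsigned char  magic[2];
    unsigned short version;
    unsigned char  flags;
};

/* Stand-in for the expansion of an #embed of a four-byte file. */
#define EMBEDDED_VALUES 0x4D, 0x5A, 0x90, 0x03

/*
 * Each value initialises one scalar in declaration order, whatever that
 * scalar's size: two go to magic[], one to version, one to flags.
 * This is ordinary initialisation, not a byte-for-byte copy of the file.
 */
const struct file_header header = { EMBEDDED_VALUES };
```

    The third file byte ends up as the whole of `version` rather than as
    half of it, which is why a compiler cannot treat every #embed as a
    plain block copy without first looking at what is being initialised.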


    I have discussed in previous posts why I don't think there is an issue
    there.

    And I think alternative ways to achieve the effect would have their own
    problems and complications. (I believe there is a proposal for C++ that
    includes a std::embed() function that can use a constexpr string.)

    Now that it's too late to change the definition, I've thought of
    something that I think would have been a better way to specify #embed.

    Define a new kind of string literal, with a "uc" prefix. `uc"foo"` is
    of type `unsigned char[3]`. (Or `const unsigned char[3]`, if that's not
    too radical.) Unlike other string literals, there is no implicit
    terminating '\0'. Arbitrary byte values can of course be specified in
    hexadecimal: uc"\x01\x02\x03\x04". Since there's no terminating null
    character and C doesn't support zero-sized objects, uc"" is a syntax
    error.

    If you are worried about ugly, few things are uglier than a C string
    literal with escaped hex characters. Well, escaped octal characters are
    worse.
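
    For comparison, existing C can already get close to the proposed
    uc"..." behaviour when the array size is written out by hand. The
    sketch below uses only standard C; the object and function names are
    invented for illustration. What the proposal would add is the
    unsigned char type on the literal itself and a size that follows the
    contents automatically.

```c
#include <stddef.h>

/* Exactly three elements: the terminating '\0' does not fit and is
   dropped. (Valid C; C++ rejects this, and some compilers warn.) */
const unsigned char foo3[3] = "foo";

/* Four bytes written with hex escapes, again with no terminator. */
const unsigned char bytes4[4] = "\x01\x02\x03\x04";

/* Without an explicit size the terminator is kept: four elements. */
const unsigned char foo_nul[] = "foo";

size_t foo3_size(void)    { return sizeof foo3; }
size_t bytes4_size(void)  { return sizeof bytes4; }
size_t foo_nul_size(void) { return sizeof foo_nul; }
```

    The drawback is that the size has to be counted by hand, and a wrong
    count either fails to compile (too small) or silently pads with zero
    bytes (too large).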


    uc"..." string literals might be made even simpler, for example allowing
    only hex digits and not requiring \x (uc"01020304" rather than
    uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
    could be useful in other contexts, and programmers will want
    flexibility. Maybe something like hex"01020304" (embedded spaces could
    be ignored) could be defined in addition to uc"\x01\x02\x03\x04".

    Specify that #embed expands to a sequence of one or more uc string
    literals (or hex string literals if that's added), separated by
    whitespace. If the embedded file might be empty, use the existing
    is_empty() embed parameter. Without is_empty, #embed of an empty file
    will expand to uc"", a syntax error.

    Since a string literal is a single token, parsing it is likely to be
    more efficient than parsing a sequence of integer constant expressions,
    even with concatenation of multiple literals. Since a uc"..." string
    literal is specifically of type unsigned char[], it can *only* be used
    to initialize an unsigned char[] or unsigned char* object, addressing
    the conceptual mess. If you want to use #embed to initialize an
    array of some other type, you can use a union or some other form of
    type-punning.

    A conforming C23 implementation could even implement this by providing
    uc"..." (and perhaps hex"...") literals as an extension and adding an
    implementation-defined embed parameter that generates them.


    I am at a loss to see how this would be any improvement.

    The efficiency gains of #embed are not because a list of integers is
    inherently less efficient than a string literal of some kind. It is
    because existing compilers store more information about each element,
    and do more checking on each of them (such as for range). With
    #embed-generated integer lists the compiler would not need to store this
    extra information or do the extra checks. Even for "non-optimised"
    #embed, I cannot see it being beaten by any kind of string literal
    solution by any non-negligible degree.

  • From David Brown@21:1/5 to bart on Sat Jun 15 19:17:23 2024
    On 15/06/2024 00:39, bart wrote:
    On 14/06/2024 22:30, Keith Thompson wrote:

    Now that it's too late to change the definition, I've thought of
    something that I think would have been a better way to specify #embed.

    Define a new kind of string literal, with a "uc" prefix.  `uc"foo"` is
    of type `unsigned char[3]`.  (Or `const unsigned char[3]`, if that's not
    too radical.)  Unlike other string literals, there is no implicit
    terminating '\0'.  Arbitrary byte values can of course be specified in
    hexadecimal: uc"\x01\x02\x03\x04".  Since there's no terminating null
    character and C doesn't support zero-sized objects, uc"" is a syntax
    error.

    uc"..." string literals might be made even simpler, for example allowing
    only hex digits and not requiring \x (uc"01020304" rather than
    uc"\x01\x02\x03\x04").  That's probably overkill.  uc"..."  literals
    could be useful in other contexts, and programmers will want
    flexibility.  Maybe something like hex"01020304" (embedded spaces could
    be ignored) could be defined in addition to uc"\x01\x02\x03\x04".

    That's something I added to string literals in my language within the
    last few months. Nothing to do with embedding (but it can make hex
    sequences within strings more efficient, if that approach was used).

    Writing byte-at-a-time hex data was always a bit fiddly:

        0x12, 0x34, 0xAB, ...
        "\x12\x34\xAB...

    It was made worse by my preference for `x` being in lower case, and the
    hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look wrong.

    What I did was create a new, variable-length string escape sequence
    that looks like this:

      "ABC\h1234AB...\nopq"     // hex sequence between ABC & nopq

    Hex digits after \h or \H are read in pairs. White space is allowed
    between pairs:

      "ABC\H 12 34 AB ...\nopq"

    The only thing I wasn't sure about was the closing backslash, which
    looks at first like another escape code. But I think it is sound,
    although it can still be tweaked.



    How often would something like that be useful? I would have thought
    that it is rare to see something that is basically text but has enough
    odd non-printing characters (other than the common \n, \t, \e) to make
    it worth the fuss. If you want to have binary data in something that
    looks like a string literal, then just use straight-up two hex digits
    per character - "4142431234ab". It's simpler to generate and parse. I
    don't see the benefit of something that mixes binary and text data.
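
    A minimal sketch of how simple the "two hex digits per byte" form is
    to parse. The function names hex_digit_value and hex_decode are
    invented for this illustration, and it decodes at run time, whereas
    a hex"..." literal would be decoded by the compiler. Spaces between
    pairs are skipped, as suggested earlier in the thread.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
int hex_digit_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decode pairs of hex digits from `text` into `out` (capacity `cap`).
 * Spaces between pairs are skipped. Returns the number of bytes written,
 * or (size_t)-1 on a stray character, an odd digit count, or overflow.
 */
size_t hex_decode(const char *text, unsigned char *out, size_t cap)
{
    size_t n = 0;

    while (*text != '\0') {
        if (*text == ' ') {
            text++;
            continue;
        }
        int hi = hex_digit_value((unsigned char)text[0]);
        /* Only look at the second digit if the first one was valid. */
        int lo = hi < 0 ? -1 : hex_digit_value((unsigned char)text[1]);
        if (hi < 0 || lo < 0 || n == cap)
            return (size_t)-1;
        out[n++] = (unsigned char)(hi * 16 + lo);
        text += 2;
    }
    return n;
}
```

    With this, "4142431234ab" decodes to the six bytes 41 42 43 12 34 AB,
    the first three of which are the characters "ABC".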

  • From bart@21:1/5 to David Brown on Sat Jun 15 20:27:41 2024
    On 15/06/2024 18:17, David Brown wrote:
    On 15/06/2024 00:39, bart wrote:
    On 14/06/2024 22:30, Keith Thompson wrote:

    Now that it's too late to change the definition, I've thought of
    something that I think would have been a better way to specify #embed.

    Define a new kind of string literal, with a "uc" prefix.  `uc"foo"` is
    of type `unsigned char[3]`.  (Or `const unsigned char[3]`, if that's not
    too radical.)  Unlike other string literals, there is no implicit
    terminating '\0'.  Arbitrary byte values can of course be specified in
    hexadecimal: uc"\x01\x02\x03\x04".  Since there's no terminating null
    character and C doesn't support zero-sized objects, uc"" is a syntax
    error.

    uc"..." string literals might be made even simpler, for example allowing
    only hex digits and not requiring \x (uc"01020304" rather than
    uc"\x01\x02\x03\x04").  That's probably overkill.  uc"..."  literals
    could be useful in other contexts, and programmers will want
    flexibility.  Maybe something like hex"01020304" (embedded spaces could
    be ignored) could be defined in addition to uc"\x01\x02\x03\x04".

    That's something I added to string literals in my language within the
    last few months. Nothing to do with embedding (but it can make hex
    sequences within strings more efficient, if that approach was used).

    Writing byte-at-a-time hex data was always a bit fiddly:

         0x12, 0x34, 0xAB, ...
         "\x12\x34\xAB...

    It was made worse by my preference for `x` being in lower case, and
    the hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look wrong.

    What I did was create a new, variable-length string escape sequence
    that looks like this:

       "ABC\h1234AB...\nopq"     // hex sequence between ABC & nopq

    Hex digits after \h or \H are read in pairs. White space is allowed
    between pairs:

       "ABC\H 12 34 AB ...\nopq"

    The only thing I wasn't sure about was the closing backslash, which
    looks at first like another escape code. But I think it is sound,
    although it can still be tweaked.



    How often would something like that be useful?  I would have thought
    that it is rare to see something that is basically text but has enough
    odd non-printing characters (other than the common \n, \t, \e) to make
    it worth the fuss.  If you want to have binary data in something that
    looks like a string literal, then just use straight-up two hex digits
    per character - "4142431234ab".  It's simpler to generate and parse.  I
    don't see the benefit of something that mixes binary and text data.

    That's not the same thing. That sequence "...1234..." occupies 4 bytes
    (with values 49 50 51 52), not two bytes (with values 0x12 and 0x34, or
    18 and 52).

    Here's an example of wanting to print '€4.99', first in C (note that my
    editor doesn't support Unicode so this stuff is needed):

    puts("\xE2\x82\xAC" "4.99");

    The euro symbol occupies three bytes in UTF8. It's awkward to type: it
    has loads of backslashes, it keeps switching case and it needs more
    concentration.

    Plus I had to split the string since apparently \x doesn't stop at two
    hex digits, it keeps going: it would have read \xAC4, which overflows
    the 8-bit width of a character anyway, so I don't know what the point is
    of reading more than 2 hex characters.
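
    The behaviour described here follows from standard C: a \x escape
    takes as many hex digits as follow it, while an octal escape takes at
    most three. The sketch below shows the split-literal workaround and
    an octal alternative that needs no split. The object and function
    names are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Escapes are converted before adjacent literals are joined, so the
   \xAC escape ends at the closing quote and "4.99" stays plain text. */
const char euro_price[] = "\xE2\x82\xAC" "4.99";

/* Octal escapes stop after three digits, so no split is needed:
   \342 \202 \254 are the same bytes as \xE2 \x82 \xAC. */
const char euro_price_octal[] = "\342\202\2544.99";

/* Three bytes of UTF-8 plus the four characters of "4.99". */
size_t euro_price_length(void) { return strlen(euro_price); }
```

    Writing "\xE2\x82\xAC4.99" as a single literal would instead make the
    compiler read \xAC4 as one escape, which is out of range for a char.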

    Using my feature, it looks like this:

    println "\H E2 82 AC\4.99"

    There must be loads of examples of wanting to write many byte values
    within strings, which in C can also be used to initialise byte arrays (a
    useful feature I've now adopted; see below).

    Here's another example, in my language, which is the first 128 bytes of
    an EXE file which is constant. It is currently defined like this,
    probably created with a script:

    []byte stubdata = (
    0x4D, 0x5A, 0x90, 0x00, 0x03, 0x00, 0x00, 0x00,
    0x04, 0x00, 0x00, 0x00, 0xFF, 0xFF, 0x00, 0x00,
    ...

    Using the new escape, I can just copy&paste a dump, and use a text
    editor to put in the string context needed, which took under a minute:

    []byte stubdata=
    b"\H 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00\"+
    b"\H B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00\"+
    b"\H 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00\"+
    b"\H 00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 00\"+
    b"\H 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68\"+
    b"\H 69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F\"+
    b"\H 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20\"+
    b"\H 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00\"+
    b"\H 50 45 00 00 64 86 04 00 00 00 00 00 00 00 00 00\"

    (The 's'/'b' prefixes are needed for strings to have a type of (in C
    terms) char[] rather than char*, a detail that C glosses over via some
    magic. 's' gives you a zero terminator, 'b' as used here doesn't. The
    "+" is used for compile-time string/data-string concatenation.)

    In short, more is possible without needing to resort to tools. You can
    directly work from a hex dump.

  • From Lawrence D'Oliveiro@21:1/5 to David Brown on Sat Jun 15 22:37:59 2024
    On Sat, 15 Jun 2024 17:58:22 +0200, David Brown wrote:

    But it also seems reasonable to expect that if a file is big enough to
    cause trouble for #embed, then any other method of including it in a C
    file will be at least as bad and probably /much/ worse.

    But if you redefine the problem as “any method of including it in a C
    *program*”, then you realize that there are better techniques that do not
    involve C extensions.

  • From Lawrence D'Oliveiro@21:1/5 to bart on Sat Jun 15 22:39:50 2024
    On Sat, 15 Jun 2024 20:27:41 +0100, bart wrote:

    The "+" is used for compile-time string/data-string concatenation.)

    Why didn’t you follow the C convention of implicit concatenation, just by
    placing literals next to each other?

  • From Lawrence D'Oliveiro@21:1/5 to BGB on Sat Jun 15 22:42:46 2024
    On Fri, 14 Jun 2024 03:13:32 -0500, BGB wrote:

    On 6/14/2024 1:53 AM, Bonita Montero wrote:

    Am 13.06.2024 um 21:07 schrieb BGB:

    One possible justification (albeit a weak one) is that if one
    recompiles the program with optimizations turned on, in many cases
    this may subtly change the behavior of the program (particularly in
    relation to things like the contents of uninitialized variables and
    dangling pointers, etc...). ...

    If you rely on that you're misusing the language anyway.

    It is a poor practice, but seemingly does occur in the wild (intentional
    or not).

    It seems to me that kind of thing does tend to get flushed out of open-
    source code. Because such code is often compiled with different compilers,
    on different architectures, using different tool chains etc. And
    assumptions like these tend not to survive such treatment.

  • From bart@21:1/5 to Lawrence D'Oliveiro on Sun Jun 16 00:20:34 2024
    On 15/06/2024 23:39, Lawrence D'Oliveiro wrote:
    On Sat, 15 Jun 2024 20:27:41 +0100, bart wrote:

    The "+" is used for compile-time string/data-string concatenation.)

    Why didn’t you follow the C convention of implicit concatenation, just by
    placing literals next to each other?

    Why is that better?

    I did actually have that, but it wasn't as useful. It could only work at
    the lexical level with actual string literals, for a start.
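
    The same limitation exists in C itself, and a short sketch shows
    where the line falls. The macro and object names are invented for
    illustration. Adjacent-literal concatenation works on tokens, so a
    macro that expands to a literal takes part in it, while a named
    object does not.

```c
#include <stddef.h>
#include <string.h>

#define GREETING "abc"

/* Works: after macro expansion the two literals are adjacent tokens. */
const char joined[] = GREETING "def";

/* Does not work: a named object is not a literal, so the translator
   has nothing to paste together.
   const char *x = "abc";
   const char both[] = x "def";      // syntax error
*/

size_t joined_length(void) { return strlen(joined); }
```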

    As it is now I can do this:

    const x = "abc"
    const y = "def"
    const z = x + y # "abcdef"

    These are named constants with proper scope, which are only resolved in
    a later pass. It also applies to strings created by an embedded file:

    s := "(" + sinclude("help.txt") + ")"

    I can use parentheses and it will still work:

    const cond = ...

    print (cond | "abc" | "def") + "xyz"

    It will display 'abcxyz' or 'defxyz' depending on 'cond', which is known
    at compile-time.

    I could choose to implement "*" also ...

    (I've just spent 10 minutes doing that)

    ... so that I can do this, where having proper operators comes in useful:

    "A" + "B" * 5 ABBBBB
    ("A" + "B") * 5 ABABABABAB

    Here is a use-case:

    const cols = 80
    println "-" * cols # output divider line

    This is in a lower-level language where strings are not first-class types.

    How C does it is a hack that was fine for 1972.

  • From Lawrence D'Oliveiro@21:1/5 to bart on Sun Jun 16 01:16:57 2024
    On Sun, 16 Jun 2024 00:20:34 +0100, bart wrote:

    On 15/06/2024 23:39, Lawrence D'Oliveiro wrote:
    On Sat, 15 Jun 2024 20:27:41 +0100, bart wrote:

    The "+" is used for compile-time string/data-string concatenation.)

    Why didn’t you follow the C convention of implicit concatenation, just
    by placing literals next to each other?

    Why is that better?

    Less typing. Surprising that few other languages that otherwise copy
    things from C include that feature. But Python does. E.g.

    toc.write \
    (
    "// Total length: %(total_length)s\n"
    "CD_DA\n"
    "CD_TEXT\n"
    " {\n"
    " LANGUAGE_MAP { 0 : EN }\n"
    " LANGUAGE 0\n"
    " {\n"
    " TITLE \"%(disc_title)s\"\n"
    " PERFORMER \"\"\n"
    # get around off-by-one performer assignment bug in cdrdao
    " }\n"
    " }\n"
    %
    {
    "disc_title" : title_data["disc_title"],
    "total_length" : format_cd_time(title_data["total_nr_frames"], True),
    }
    )


    I did actually have that, but it wasn't as useful. It could only work at
    the lexical level with actual string literals, for a start.

    Of course.

  • From Lawrence D'Oliveiro@21:1/5 to BGB on Sun Jun 16 03:15:58 2024
    On Sat, 15 Jun 2024 20:42:47 -0500, BGB wrote:

    I fairly promptly fixed this bug once discovered, and am then just left
    to wonder how exactly it managed to work in the first place (or didn't
    break already).

    Been there, done that.

    When I set about to revive the then-moribund NCSA Telnet code for
    Macintosh, back around 1990, it looked like it had been abandoned because
    nobody had the stomach to do the porting from MPW C v2 (created for Apple
    by Green Hills) to v3 (developed by Apple itself, thoroughly ANSI-
    compliant).

    Besides all the compile-time errors, there were dozens, maybe hundreds, of
    places to be checked for the different, incompatible definition of the
    Pascal-equivalent “Str255” type, to ensure there were no lurking bugs. I
    think I got them all.

    Then I started up my build, got as far as opening a terminal session,
    closed it again, quit ... and the app crashed.

    I discovered that the shutdown loop to ensure all open sessions were
    closed before quitting had an off-by-1 error in its termination condition:
    it was accessing an element in the array of open sessions that didn’t
    exist. Somehow this never manifested a problem with the old C compiler,
    but it did with the new one.
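
    The actual NCSA Telnet code is not shown in this post, so the
    following is only a generic sketch of that class of bug, with a
    session type and function name invented for illustration. The
    corrected loop stops at `i < count`; the comment marks what the
    faulty termination condition would have touched.

```c
#include <stddef.h>

#define MAX_SESSIONS 4

/* Hypothetical session record: only the open/closed state matters here. */
struct session {
    int open;
};

/* Close every open session; returns how many were closed. */
size_t close_all_sessions(struct session *sessions, size_t count)
{
    size_t closed = 0;

    /* A condition of `i <= count` would also touch sessions[count],
       one element past the end of the array. */
    for (size_t i = 0; i < count; i++) {
        if (sessions[i].open) {
            sessions[i].open = 0;
            closed++;
        }
    }
    return closed;
}
```

    Reading one element past the end is undefined behaviour, which is
    consistent with it appearing to work under one compiler and crashing
    under another.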

    By the way, that MPW C v3 compiler had some quite amusing error messages.
    Various people (including myself) independently posted lists of them (and
    once I got accused of plagiarizing the list from someone else--as though
    someone had made them up). Quite a few people couldn’t believe such
    messages were real.

  • From David Brown@21:1/5 to Lawrence D'Oliveiro on Sun Jun 16 16:55:51 2024
    On 16/06/2024 00:37, Lawrence D'Oliveiro wrote:
    On Sat, 15 Jun 2024 17:58:22 +0200, David Brown wrote:

    But it also seems reasonable to expect that if a file is big enough to
    cause trouble for #embed, then any other method of including it in a C
    file will be at least as bad and probably /much/ worse.

    But if you redefine the problem as “any method of including it in a C
    *program*”, then you realize that there are better techniques that do not
    involve C extensions.

    Yes, as I said.

  • From David Brown@21:1/5 to bart on Sun Jun 16 16:54:53 2024
    On 15/06/2024 21:27, bart wrote:
    On 15/06/2024 18:17, David Brown wrote:
    On 15/06/2024 00:39, bart wrote:
    On 14/06/2024 22:30, Keith Thompson wrote:

    Now that it's too late to change the definition, I've thought of
    something that I think would have been a better way to specify #embed.

    Define a new kind of string literal, with a "uc" prefix.  `uc"foo"` is
    of type `unsigned char[3]`.  (Or `const unsigned char[3]`, if that's not
    too radical.)  Unlike other string literals, there is no implicit
    terminating '\0'.  Arbitrary byte values can of course be specified in
    hexadecimal: uc"\x01\x02\x03\x04".  Since there's no terminating null
    character and C doesn't support zero-sized objects, uc"" is a syntax
    error.

    uc"..." string literals might be made even simpler, for example allowing
    only hex digits and not requiring \x (uc"01020304" rather than
    uc"\x01\x02\x03\x04").  That's probably overkill.  uc"..."  literals
    could be useful in other contexts, and programmers will want
    flexibility.  Maybe something like hex"01020304" (embedded spaces could
    be ignored) could be defined in addition to uc"\x01\x02\x03\x04".

    That's something I added to string literals in my language within the
    last few months. Nothing to do with embedding (but it can make hex
    sequences within strings more efficient, if that approach was used).

    Writing byte-at-a-time hex data was always a bit fiddly:

         0x12, 0x34, 0xAB, ...
         "\x12\x34\xAB...

    It was made worse by my preference for `x` being in lower case, and
    the hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look
    wrong.

    What I did was create a new, variable-length string escape sequence
    that looks like this:

       "ABC\h1234AB...\nopq"     // hex sequence between ABC & nopq

    Hex digits after \h or \H are read in pairs. White space is allowed
    between pairs:

       "ABC\H 12 34 AB ...\nopq"

    The only thing I wasn't sure about was the closing backslash, which
    looks at first like another escape code. But I think it is sound,
    although it can still be tweaked.



    How often would something like that be useful?  I would have thought
    that it is rare to see something that is basically text but has enough
    odd non-printing characters (other than the common \n, \t, \e) to make
    it worth the fuss.  If you want to have binary data in something that
    looks like a string literal, then just use straight-up two hex digits
    per character - "4142431234ab".  It's simpler to generate and parse.
    I don't see the benefit of something that mixes binary and text data.

    That's not the same thing. That sequence "...1234..." occupies 4 bytes
    (with values 49 50 51 52), not two bytes (with values 0x12 and 0x34, or
    18 and 52).
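
    To make the byte counts concrete, here is a small C sketch of the two
    spellings. The names `as_text` and `as_bytes` are made up for
    illustration; the checks run at compile time.

```c
#include <assert.h>

/* Four characters '1' '2' '3' '4', i.e. byte values 49 50 51 52 in ASCII. */
static const char as_text[] = "1234";

/* Two bytes with values 0x12 and 0x34 (18 and 52 in decimal). */
static const char as_bytes[] = "\x12\x34";

/* sizeof counts the terminating null as well, hence the "+ 1". */
static_assert(sizeof as_text == 4 + 1, "text form is four bytes plus NUL");
static_assert(sizeof as_bytes == 2 + 1, "escaped form is two bytes plus NUL");
```

    Both are ordinary C string literals; only the second one stores the
    values 0x12 and 0x34.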

    Here's an example of wanting to print '€4.99', first in C (note that my editor doesn't support Unicode so this stuff is needed):

       puts("\xE2\x82\xAC" "4.99");

    The euro symbol occupies three bytes in UTF8. It's awkward to type: it
    has loads of backslashes, it keeps switching case and it needs more concentration.

    Plus I had to split the string since apparently \x doesn't stop at two
    hex digits, it keeps going: it would have read \xAC4, which overflows
    the 8-bit width of a character anyway, so I don't know what the point is
    of reading more than 2 hex characters.
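
    As a sketch of the splitting described above (the names `price` and
    `price_length` are invented for illustration): a \x escape takes as
    many hex digits as follow it, so the first literal is ended right
    after \xAC, and the compiler joins the two adjacent literals once the
    escapes have been processed.

```c
#include <assert.h>
#include <string.h>

/* The euro sign is the three UTF-8 bytes E2 82 AC. Closing the first
   literal after \xAC stops the escape from also consuming the '4'. */
static const char price[] = "\xE2\x82\xAC" "4.99";

/* 3 UTF-8 bytes + 4 ASCII characters + the terminating null. */
static_assert(sizeof price == 8, "unexpected size for the price string");

/* Number of bytes before the terminating null. */
size_t price_length(void)
{
    return strlen(price);
}
```

    Passing `price` to puts() should show the euro sign only if the
    terminal interprets the output as UTF-8.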

    Using my feature, it looks like this:

        println "\H E2 82 AC\4.99"


    I don't see any improvement of significance. The improvement, if any,
    is very minor.

    (I gather you have other conveniences for your language's printing
    features when converting various types, but that's a different matter.)

    The obvious answer to writing this kind of thing is simply to switch to
    an editor that supports UTF-8. That has been the obvious answer for a
    couple of decades.

    There must be loads of examples of wanting to write many byte values
    within strings, which in C can also be used to initialise byte arrays (a useful feature I've now adopted; see below).

    Here's another example, in my language, which is the first 128 bytes of
    an EXE file which is constant. It is currently defined like this,
    probably created with a script:

      []byte stubdata = (
        0x4D, 0x5A, 0x90, 0x00, 0x03, 0x00, 0x00, 0x00,
        0x04, 0x00, 0x00, 0x00, 0xFF, 0xFF, 0x00, 0x00,
        ...

    Using the new escape, I can just copy&paste a dump, and use a text
    editor to put in the string context needed, which took under a minute:

    []byte stubdata=
      b"\H 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00\"+
      b"\H B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00\"+
      b"\H 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00\"+
      b"\H 00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 00\"+
      b"\H 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68\"+
      b"\H 69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F\"+
      b"\H 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20\"+
      b"\H 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00\"+
      b"\H 50 45 00 00 64 86 04 00 00 00 00 00 00 00 00 00\"

    Why bother with the \H stuff? That's my point - use hex data for data,
    and text for text. Mixing these is not common enough to make it worth
    the extra fuss you have to give such negligible extra convenience.

    My suggestion is that it could be helpful to have binary blobs written
    as hex digits without escapes anywhere, because it is /just/ binary
    data. I don't object to having optional spaces - that's a fine idea.
    But just write :

    b"4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00"
    b"B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00"

    The extra "\H" adds nothing useful.





    (The 's'/'b' prefixes are needed for strings to have a type of (in C
    terms) char[] rather than char*, a detail that C glosses over via some
    magic. 's' gives you a zero terminator, 'b' as used here doesn't. The
    "+" is used for compile-time string/data-string concatenation.)

    In short, more is possible without needing to resort to tools. You can
    directly work from a hex dump.

  • From bart@21:1/5 to David Brown on Sun Jun 16 20:00:45 2024
    On 16/06/2024 15:54, David Brown wrote:
    On 15/06/2024 21:27, bart wrote:
    On 15/06/2024 18:17, David Brown wrote:
    On 15/06/2024 00:39, bart wrote:
    On 14/06/2024 22:30, Keith Thompson wrote:

    Now that it's too late to change the definition, I've thought of
    something that I think would have been a better way to specify #embed.

    Define a new kind of string literal, with a "uc" prefix.  `uc"foo"` is
    of type `unsigned char[3]`.  (Or `const unsigned char[3]`, if
    that's not too radical.)  Unlike other string literals, there is no
    implicit terminating '\0'.  Arbitrary byte values can of course be
    specified in hexadecimal: uc"\x01\x02\x03\x04".  Since there's no
    terminating null character and C doesn't support zero-sized objects,
    uc"" is a syntax error.

    uc"..." string literals might be made even simpler, for example
    allowing only hex digits and not requiring \x (uc"01020304" rather than
    uc"\x01\x02\x03\x04").  That's probably overkill.  uc"..."  literals
    could be useful in other contexts, and programmers will want
    flexibility.  Maybe something like hex"01020304" (embedded spaces
    could be ignored) could be defined in addition to
    uc"\x01\x02\x03\x04".

    That's something I added to string literals in my language within
    the last few months. Nothing to do with embedding (but it can make hex
    sequences within strings more efficient, if that approach was used).

    Writing byte-at-a-time hex data was always a bit fiddly:

         0x12, 0x34, 0xAB, ...
         "\x12\x34\xAB...

    It was made worse by my preference for `x` being in lower case, and
    the hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look
    wrong.

    What I did was create a new, variable-length string escape sequence
    that looks like this:

       "ABC\h1234AB...\nopq"     // hex sequence between ABC & nopq

    Hex digits after \h or \H are read in pairs. White space is allowed
    between pairs:

       "ABC\H 12 34 AB ...\nopq"

    The only thing I wasn't sure about was the closing backslash, which
    looks at first like another escape code. But I think it is sound,
    although it can still be tweaked.



    How often would something like that be useful?  I would have thought
    that it is rare to see something that is basically text but has
    enough odd non-printing characters (other than the common \n, \t, \e)
    to make it worth the fuss.  If you want to have binary data in
    something that looks like a string literal, then just use straight-up
    two hex digits per character - "4142431234ab".  It's simpler to
    generate and parse. I don't see the benefit of something that mixes
    binary and text data.

    That's not the same thing. That sequence "...1234..." occupies 4 bytes
    (with values 49 50 51 52), not two bytes (with values 0x12 and 0x34,
    or 18 and 52).

    Here's an example of wanting to print '€4.99', first in C (note that
    my editor doesn't support Unicode so this stuff is needed):

        puts("\xE2\x82\xAC" "4.99");

    The euro symbol occupies three bytes in UTF8. It's awkward to type: it
    has loads of backslashes, it keeps switching case and it needs more
    concentration.

    Plus I had to split the string since apparently \x doesn't stop at two
    hex digits, it keeps going: it would have read \xAC4, which overflows
    the 8-bit width of a character anyway, so I don't know what the point
    is of reading more than 2 hex characters.

    Using my feature, it looks like this:

         println "\H E2 82 AC\4.99"


    I don't see any improvement of significance.  The improvement, if any,
    is very minor.

    The difference is that it can be typed fluently without that annoying \x between every number. Plus I can add white space for grouping without it affecting the data.


    (I gather you have other conveniences for your language's printing
    features when converting various types, but that's a different matter.)

    The obvious answer to writing this kind of thing is simply to switch to
    an editor that supports UTF-8.

    It never happens that you want to type a bunch of hex byte values to
    initialise a byte array? OK.

    Why bother with the \H stuff?  That's my point - use hex data for data,
    and text for text.  Mixing these is not common enough to make it worth
    the extra fuss you have to give such negligible extra convenience.

    My suggestion is that it could be helpful to have binary blobs written
    as hex digits without escapes anywhere, because it is /just/ binary
    data.  I don't object to having optional spaces - that's a fine idea.
    But just write :

        b"4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00"
        b"B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00"

    The extra "\H" adds nothing useful.

    Is this a separate feature using 'b'? Because in my scheme, \H is just
    another string escape code, which can be used in ordinary strings, and
    b"" strings define char[] data which can include normal text data too.

    So my example could have been written as b"MZ\h 90 00 03 ..."

    I did look at having a separate feature, but I didn't want that. I ended
    up with this scheme for data-strings, here expressed using C types:

                        Can initialise:

       "abcd"           char* only
      s"abcd"           char*, char[] or any T[]; zero-terminated
      b"abcd"           char*, char[] or any T[]

      sinclude"file"    char*, char[] or any T[]; zero-terminated
      binclude"file"    char*, char[] or any T[]

    The first 3 can include any string escapes including \H...\

    The last two embed file data, binary or text. But if a normal C-style
    string is needed with no embedded zeros except at the end, sinclude
    should be used with a text file.
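
    For comparison, a rough C sketch of the char* versus char[] cases in
    the table above. The variable names are illustrative only, and this
    shows plain C behaviour, not the s"" or b"" prefixes themselves.

```c
#include <assert.h>

/* Pointer to a string literal; the literal itself has type char[5]. */
static const char *as_pointer = "abcd";

/* Array sized by its initialiser: 4 characters plus the terminating null. */
static const char with_nul[] = "abcd";

/* Array with an explicit size of exactly 4: C (unlike C++) accepts this
   and drops the terminating null, so the result is not a C string. */
static const char without_nul[4] = "abcd";

static_assert(sizeof with_nul == 5, "terminator included");
static_assert(sizeof without_nul == 4, "terminator dropped");
```

    The third form is the closest standard C gets to a literal with no
    zero terminator, and it only works when the array size is written
    out by hand.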







    (The 's'/'b' prefixes are needed for strings to have a type of (in C
    terms) char[] rather than char*, a detail that C glosses over via some
    magic. 's' gives you a zero terminator, 'b' as used here doesn't. The
    "+" is used for compile-time string/data-string concatenation.)

    In short, more is possible without needing to resort to tools. You can
    directly work from a hex dump.


  • From Lawrence D'Oliveiro@21:1/5 to Chris M. Thomasson on Mon Jun 17 00:03:49 2024
    On Sun, 16 Jun 2024 12:31:13 -0700, Chris M. Thomasson wrote:

    Code up an L-System for fun:

    Been done ... lots. Particularly fun when it works within my favourite 3D
    app:

    <https://github.com/krljg/lsystem>
    <https://blendermarket.com/products/lsystem>

    Just a couple that I found with a quick search <https://www.google.com/search?q=blender+addon+lsystem>.

  • From David Brown@21:1/5 to bart on Mon Jun 17 10:49:04 2024
    On 16/06/2024 21:00, bart wrote:
    On 16/06/2024 15:54, David Brown wrote:
    On 15/06/2024 21:27, bart wrote:
    On 15/06/2024 18:17, David Brown wrote:
    On 15/06/2024 00:39, bart wrote:
    On 14/06/2024 22:30, Keith Thompson wrote:

    Now that it's too late to change the definition, I've thought of
    something that I think would have been a better way to specify
    #embed.

    Define a new kind of string literal, with a "uc" prefix.
    `uc"foo"` is of type `unsigned char[3]`.
    (Or `const unsigned char[3]`, if that's not too radical.)  Unlike
    other string literals, there is no implicit terminating '\0'.
    Arbitrary byte values can of course be specified in hexadecimal:
    uc"\x01\x02\x03\x04".  Since there's no terminating null character
    and C doesn't support zero-sized objects, uc"" is a syntax error.

    uc"..." string literals might be made even simpler, for example
    allowing only hex digits and not requiring \x (uc"01020304" rather than
    uc"\x01\x02\x03\x04").  That's probably overkill.  uc"..."  literals
    could be useful in other contexts, and programmers will want
    flexibility.  Maybe something like hex"01020304" (embedded spaces
    could be ignored) could be defined in addition to
    uc"\x01\x02\x03\x04".

    That's something I added to string literals in my language within
    the last few months. Nothing to do with embedding (but it can make hex
    sequences within strings more efficient, if that approach was used).

    Writing byte-at-a-time hex data was always a bit fiddly:

         0x12, 0x34, 0xAB, ...
         "\x12\x34\xAB...

    It was made worse by my preference for `x` being in lower case, and
    the hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look
    wrong.

    What I did was create a new, variable-length string escape
    sequence that looks like this:

       "ABC\h1234AB...\nopq"     // hex sequence between ABC & nopq

    Hex digits after \h or \H are read in pairs. White space is allowed
    between pairs:

       "ABC\H 12 34 AB ...\nopq"

    The only thing I wasn't sure about was the closing backslash, which
    looks at first like another escape code. But I think it is sound,
    although it can still be tweaked.



    How often would something like that be useful?  I would have thought
    that it is rare to see something that is basically text but has
    enough odd non-printing characters (other than the common \n, \t,
    \e) to make it worth the fuss.  If you want to have binary data in
    something that looks like a string literal, then just use
    straight-up two hex digits per character - "4142431234ab".  It's
    simpler to generate and parse. I don't see the benefit of something
    that mixes binary and text data.

    That's not the same thing. That sequence "...1234..." occupies 4
    bytes (with values 49 50 51 52), not two bytes (with values 0x12 and
    0x34, or 18 and 52).

    Here's an example of wanting to print '€4.99', first in C (note that
    my editor doesn't support Unicode so this stuff is needed):

        puts("\xE2\x82\xAC" "4.99");

    The euro symbol occupies three bytes in UTF8. It's awkward to type:
    it has loads of backslashes, it keeps switching case and it needs
    more concentration.

    Plus I had to split the string since apparently \x doesn't stop at
    two hex digits, it keeps going: it would have read \xAC4, which
    overflows the 8-bit width of a character anyway, so I don't know what
    the point is of reading more than 2 hex characters.

    Using my feature, it looks like this:

         println "\H E2 82 AC\4.99"


    I don't see any improvement of significance.  The improvement, if any,
    is very minor.

    The difference is that it can be typed fluently without that annoying \x between every number. Plus I can add white space for grouping without it affecting the data.


    I realise you think your system is much nicer - otherwise you would not
    have implemented it! /I/ don't think it is a big improvement. It is
    certainly not big enough to be worth the effort of changing real
    languages or tools used by lots of people rather than just a single
    person. And I think the termination using "\" is a step backwards - now
    "\" is no longer an escape character, but has different purposes in
    different places. One and a half steps forward, one step back, is not
    worth the effort - especially when you can so easily go several steps
    forward with the format I suggested.


    (I gather you have other conveniences for your language's printing
    features when converting various types, but that's a different matter.)

    The obvious answer to writing this kind of thing is simply to switch
    to an editor that supports UTF-8.

    It never happens that you want to type a bunch of hex byte values to initialise a byte array? OK.

    It /does/ happen. In such cases, I type a bunch of hex values.

    What doesn't happen is that I have a UTF-8 text and I choose to write
    that using hex values. I much prefer to write the UTF-8 text using an
    editor that supports UTF-8 and tools that work with UTF-8.


    Why bother with the \H stuff?  That's my point - use hex data for
    data, and text for text.  Mixing these is not common enough to make it
    worth the extra fuss you have to give such negligible extra convenience.

    My suggestion is that it could be helpful to have binary blobs written
    as hex digits without escapes anywhere, because it is /just/ binary
    data.  I don't object to having optional spaces - that's a fine idea.
    But just write :

         b"4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00"
         b"B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00"

    The extra "\H" adds nothing useful.

    Is this a separate feature using 'b'?

    Yes - that's the point. It would be for expressing binary blob data in
    a compact form as a string of hex digits, with or without spaces, and convenient for copy-and-paste from hex editors and other such sources.
    You could happily use h"..." rather than b"..." if you prefer. And I
    suppose it could be extended to support lumps bigger than 8 bits, but
    then endian issues complicate matters and I suspect it is not worth the
    effort.
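
    One way to picture the b"..." idea in today's C is a small run-time
    decoder. This is only a sketch under assumptions: `decode_hex_blob`
    and `hex_value` are hypothetical names, the letter ranges assume an
    ASCII-like character set, and a real b"..." literal would be handled
    by the compiler and not at run time.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit.
   Assumes 'a'..'f' and 'A'..'F' are contiguous, as in ASCII. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decode text such as "4D 5A 90 00" into out[].
   White space is allowed between byte pairs but not inside a pair.
   Returns the number of bytes written, or -1 if the input is malformed
   or would need more than cap bytes. */
long decode_hex_blob(const char *text, unsigned char *out, size_t cap)
{
    size_t n = 0;

    for (;;) {
        /* Skip optional white space before the next pair. */
        while (isspace((unsigned char)*text))
            text++;
        if (*text == '\0')
            return (long)n;

        int hi = hex_value((unsigned char)text[0]);
        int lo = hi < 0 ? -1 : hex_value((unsigned char)text[1]);
        if (lo < 0 || n == cap)
            return -1;

        out[n++] = (unsigned char)(hi * 16 + lo);
        text += 2;
    }
}
```

    With this, both "4D 5A 90 00" and "4142431234ab" decode to plain
    byte arrays, which is the copy-and-paste-from-a-hex-dump use case.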

    Because in my scheme, \H is just
    another string escape code, which can be used in ordinary strings,

    That is what I would want to avoid. Being able to mix such data is a disadvantage, not an advantage. (IMHO, of course.)

    and
    b"" strings define char[] data which can include normal text data too.

    So my example could have been written as b"MZ\h 90 00 03 ..."

    And that kind of monstrosity is what I was trying to get away from.


    I did look at having a separate feature, but I didn't want that. I ended
    up with this scheme for data-strings, here expressed using C types:

                        Can initialise:

       "abcd"           char* only
      s"abcd"           char*, char[] or any T[]; zero-terminated
      b"abcd"           char*, char[] or any T[]

      sinclude"file"    char*, char[] or any T[]; zero-terminated
      binclude"file"    char*, char[] or any T[]


    It is a mistake to have too many similar-looking alternatives with
    different rules as to when and where they can be used.

    Changing existing languages is always difficult, or even impossible.
    But my suggestion here is that there should be two different kinds of
    literals:

    "Hello, world!"

    and

    b"00 12 34"

    The former is always a string, always UTF-8, in whatever format the
    language uses for strings (zero-terminated, Pascal style, or whatever).
    The latter is a compact way of writing binary blobs in hex when needed,
    and is always a constant array of bytes.


    The first 3 can include any string escapes including \H...\

    The last two embed file data, binary or text. But if a normal C-style
    string is needed with no embedded zeros except at the end, sinclude
    should be used with a text file.







    (The 's'/'b' prefixes are needed for strings to have a type of (in C
    terms) char[] rather than char*, a detail that C glosses over via
    some magic. 's' gives you a zero terminator, 'b' as used here
    doesn't. The "+" is used for compile-time string/data-string
    concatenation.)

    In short, more is possible without needing to resort to tools. You can
    directly work from a hex dump.



  • From Michael S@21:1/5 to bart on Mon Jun 17 13:18:00 2024
    On Sun, 16 Jun 2024 20:00:45 +0100
    bart <[email protected]> wrote:


    I don't see any improvement of significance.  The improvement, if
    any, is very minor.

    The difference is that it can be typed fluently without that annoying
    \x between every number.

    It does not sound like a big obstacle. If you are typing something
    long, just type '-' instead of '\x' and do a find&replace after you
    have finished.
