Forum: >>> Magnum BBS <<<

C23 thoughts and opinions

From David Brown@21:1/5 to All on Wed May 22 18:55:36 2024

In an attempt to bring some topicality to the group, has anyone started
using, or considering, C23 ? There's quite a lot of change in it,
especially compared to the minor changes in C17.

<https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf>
<https://en.wikipedia.org/wiki/C23_(C_standard_revision)>
<https://en.cppreference.com/w/c/23>

I like that it tidies up a lot of old stuff - it is neater to have
things like "bool", "static_assert", etc., as part of the language
rather than needing a half-dozen includes for such basic stuff.

I like that it standardises several useful extensions that have been
in gcc and clang (and possibly other compilers) for many years.

I'm not sure it will make a big difference to my own programming - when
I want "typeof" or "ckd_add()", I already use them in gcc. But for
people restricted to standard C, there's more new to enjoy. And I
prefer to use standard syntax when possible.

"constexpr" is something I think I will find helpful, in at least some circumstances.
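For readers who have not met them: C23's <stdckdint.h> provides ckd_add(), ckd_sub() and ckd_mul(), which store the (possibly wrapped) result through their first argument and return true if the true result did not fit. The sketch below is an illustration only; it assumes gcc or clang for the pre-C23 fallback, __builtin_add_overflow, which takes the result pointer last, and the names add_checked and safe_sum are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
#include <stdckdint.h>
/* C23: ckd_add(result_ptr, a, b) returns true if the sum did not fit. */
#define add_checked(r, a, b) ckd_add((r), (a), (b))
#else
/* gcc/clang extension: the result pointer comes last. */
#define add_checked(r, a, b) __builtin_add_overflow((a), (b), (r))
#endif

/* Hypothetical helper: returns true and stores a + b if it fits in an int,
   returns false on overflow instead of invoking undefined behaviour. */
static bool safe_sum(int a, int b, int *out)
{
    return !add_checked(out, a, b);
}
```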

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Brown@21:1/5 to Malcolm McLean on Wed May 22 21:50:51 2024

On 22/05/2024 21:10, Malcolm McLean wrote:

On 22/05/2024 17:55, David Brown wrote:

In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change in
it, especially compared to the minor changes in C17.

[...]

So I'm currently writing some code (you can follow my progress on
github, it is a new branch in the Baby X resource compiler project). And
it's just standard well understood algorithm code to manipulate XML
trees. And I certainly don't feel the need for static_assert.

I use static assertions everywhere I can. I used them long before C11
added them to the language, using a somewhat messy macro to force an
error if the assertion fails. They catch mistakes, they document
assumptions, they make code clearer to the reader. And they do so with
zero cost in code space or run-time, and no more effort than writing a
comment. I find it hard to understand why anyone would actively choose
not to use them.
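The pre-C11 trick mentioned here is usually some variant of a macro that declares an array type with a negative size when the condition is false. A sketch, assuming that common typedef-based form (the macro name STATIC_ASSERT_OLD and the helper pack_bytes are hypothetical), shown next to the C11 static_assert, which needs <assert.h> before C23 and is a keyword in C23:

```c
#include <assert.h>   /* provides the static_assert macro in C11/C17 */
#include <limits.h>
#include <stdint.h>

/* Pre-C11 style: an array type of size -1 is a compile-time error. */
#define STATIC_ASSERT_OLD(cond, name) typedef char sa_check_##name[(cond) ? 1 : -1]

STATIC_ASSERT_OLD(CHAR_BIT == 8, char_is_8_bits);

/* C11 and later: the assertion is part of the language. */
static_assert(sizeof(uint32_t) == 4, "uint32_t must be 4 bytes");

/* The assertions above document the assumptions this function relies on. */
static uint32_t pack_bytes(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return (uint32_t)a << 24 | (uint32_t)b << 16 | (uint32_t)c << 8 | d;
}
```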

But even boolean type and const.

Bool is much more than "int 0" and "int 1". And it is significantly
clearer in code. (Sometimes, of course, a specific enumerated type is
clearer than bool or int.)
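One concrete difference from an int that happens to hold 0 or 1: converting any non-zero value to bool yields 1, whereas converting it to a narrower integer type keeps only the low-order bits. A small sketch with hypothetical helper names:

```c
#include <stdbool.h>   /* needed before C23; harmless in C23 */

/* Any non-zero value converts to true (1). */
static bool to_flag(long v) { return v; }

/* A narrow integer used as a "flag" keeps only the low 8 bits,
   so 256 would read as false. */
static unsigned char to_byte_flag(long v) { return (unsigned char)v; }
```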

Const documents the code, makes the action of a function clearer to the
reader, and helps catch mistakes.
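A minimal illustration of const as documentation and as a check: the parameter type tells the reader that the function only reads the string, and an accidental write through the pointer is a compile-time error. count_char is a hypothetical example function.

```c
#include <stddef.h>

/* The const tells callers (and the compiler) that s is only read. */
static size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (*s == c)
            n++;
        /* *s = 'x';  would be rejected: assignment of read-only location */
    }
    return n;
}
```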

These are all things that make the language better, and have done so for
the past 25 years.

Of course quite a lot of the functions don't
actually change the structures they are passed. But is littering the
code with const going to help? And why do you really need a boolean when
an int can hold either a zero or non-zero value?

And don't you just want a pared down, clean language?

I want a language with the features I need and that help me to write
good clear code. Minimal is not helpful, any more than needlessly
complex is helpful.

From David Brown@21:1/5 to Thiago Adams on Wed May 22 22:11:44 2024

On 22/05/2024 19:42, Thiago Adams wrote:

On 22/05/2024 13:55, David Brown wrote:

In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change in
it, especially compared to the minor changes in C17.

[...]

I am waiting for MSVC support. There are a lot of simple features MSVC could implement and deliver in small increments. But it is very slow.

MSVC is primarily a C++ compiler - the C support is more of a leftover
from the previous century, with a few post-C90 features as an
afterthought. Surely for C development on Windows, rather than C++,
you'd look for something better?

I would use these today if I had them:

- #warning
- [[nodiscard]]
- typeof
- digit separators
- bool true, false

I use these today in C, except the digit separators (I use them in C++).
But as I say, it's nice to see them as standard rather than just
common extensions.

I am not planning to use:

- enum with specific types.

I haven't found a use for these in C++, and I'm not sure I'll need them
in C either. I sometimes have ordinary enum types in bitfields for
specific sizes.

- #elifdef

That will slightly neaten some of my pre-processor handling. My strong preference for preprocessor symbols for conditional compilation and the
like is to have symbols that are always defined, but to different
values, and use "#if" checks rather than "#ifdef" - when combined with
gcc warnings, it makes it far easier to catch spelling mistakes, and it
makes it easy to jump in the code to where the symbol is defined. But
#ifdef checks do turn up, and this will give marginally neater code.
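A sketch of this always-defined style, using a hypothetical symbol USE_FAST_PATH that is defined to 0 or 1 and tested with #if. With gcc's -Wundef, a misspelling such as "#if USE_FAST_PTH" is reported, whereas "#ifdef USE_FAST_PTH" would quietly be false. The function names are also hypothetical.

```c
/* Always defined; may be overridden on the command line (-DUSE_FAST_PATH=0). */
#ifndef USE_FAST_PATH
#define USE_FAST_PATH 1
#endif

#if USE_FAST_PATH
/* Fast path: multiply by shifting. */
static unsigned times_eight(unsigned x) { return x << 3; }
#else
/* Reference path. */
static unsigned times_eight(unsigned x) { return x * 8u; }
#endif

/* Reports which branch was compiled in. */
static int fast_path_enabled(void) { return USE_FAST_PATH; }
```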

- nullptr

I am fond of nullptr in C++, and will use it in C. Like most of the C23 changes, it's not a big issue - after all, you get a lot of the same
effect with "#define nullptr (void*)(0)" or similar. But it means your
code has a visual distinction between the integer 0 and a null pointer,
and also lets the compiler or other static checking system check better
than using NULL would. (And I don't like NULL - I dislike all-caps
identifiers in general.)

- auto

I use that occasionally in gcc, as __auto_type. It can be helpful in
macros. I might use it more when it is standardised. (I use auto in
C++ a bit more often.)
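The typical macro use of __auto_type is a "max" that evaluates each argument exactly once. A sketch, assuming gcc or clang: besides __auto_type it relies on the statement-expression extension ({ ... }), which is not standard C in any revision. In C23, auto could stand in for __auto_type in the declarations.

```c
/* gcc/clang extensions: __auto_type and the ({ ... }) statement expression.
   Each argument is evaluated once, so MAX(i++, 3) increments i only once. */
#define MAX(a, b) ({            \
    __auto_type a_ = (a);       \
    __auto_type b_ = (b);       \
    a_ > b_ ? a_ : b_;          \
})
```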

- constexpr

I will definitely use that. Sometimes I want a constant expression for
things like array sizes or static initialisers, and want to calculate
it. constexpr gives you that without having to resort to macros. (I'd
perhaps be even happier if I could just use const, as I can in C++.)

Not sure
- empty initializer

I don't see that one being a big hit, at least for me. But I see little benefit in /not/ allowing it in the language, so it seems a sensible
addition.

From Lawrence D'Oliveiro@21:1/5 to Malcolm McLean on Thu May 23 02:47:20 2024

On Wed, 22 May 2024 20:10:51 +0100, Malcolm McLean wrote:

And why do you really need a boolean when
an int can hold either a zero or non-zero value?

Knowing that there is a range containing just two possible values allows
you to use the type as an array index.
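A small illustration of this point, assuming the index really is a bool, so it can only be 0 or 1 (yes_no is a hypothetical helper):

```c
#include <stdbool.h>

/* A bool is always 0 or 1, so it is always a valid index into a 2-element table. */
static const char *yes_no(bool b)
{
    static const char *const names[2] = { "no", "yes" };
    return names[b];
}
```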

From Lawrence D'Oliveiro@21:1/5 to Malcolm McLean on Thu May 23 02:46:25 2024

On Wed, 22 May 2024 20:10:51 +0100, Malcolm McLean wrote:

And don't you just want a pared down, clean language?

The train for BCPL and BLISS now leaving on platform 1980 ...

From Lawrence D'Oliveiro@21:1/5 to Malcolm McLean on Thu May 23 02:48:50 2024

On Wed, 22 May 2024 21:39:17 +0100, Malcolm McLean wrote:

static int haserror(LEXER *lex)
{
    return lex->error[0] ? 1 : 0;
}

static bool has_error(LEXER * lex)
{
    return lex->error[0] != 0;
} /*has_error*/
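A self-contained version of the bool-returning style, assuming a minimal LEXER with a fixed-size error buffer (the thread does not show the real type, so the struct layout and set_error are hypothetical). The parameter is also made const, since the function only reads the lexer:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical minimal lexer: error[] holds a message, or "" if no error. */
typedef struct {
    char error[128];
} LEXER;

static bool has_error(const LEXER *lex)
{
    return lex->error[0] != '\0';
}

/* Hypothetical helper: record an error message, truncating if needed. */
static void set_error(LEXER *lex, const char *msg)
{
    strncpy(lex->error, msg, sizeof lex->error - 1);
    lex->error[sizeof lex->error - 1] = '\0';
}
```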

From Lawrence D'Oliveiro@21:1/5 to Thiago Adams on Thu May 23 02:49:37 2024

On Wed, 22 May 2024 14:42:58 -0300, Thiago Adams wrote:

I am waiting for MSVC support. There are a lot of simple features MSVC could implement and deliver in small increments. But it is very slow.

And they wonder why developers are deserting the Windows platform for
Linux.

From Lawrence D'Oliveiro@21:1/5 to Thiago Adams on Thu May 23 02:59:28 2024

On Wed, 22 May 2024 22:23:26 -0300, Thiago Adams wrote:

I like the idea of embed ...

We’ve discussed this before. It just seems like a sop to those stuck with antiquated, crippled build systems. In which case, how would they get an up-to-date compiler that supports it?

From Lawrence D'Oliveiro@21:1/5 to David Brown on Thu May 23 03:13:50 2024

On Wed, 22 May 2024 18:55:36 +0200, David Brown wrote:

<https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf>

Unicode identifiers!

typedef int
typėdef;

From Lawrence D'Oliveiro@21:1/5 to Keith Thompson on Thu May 23 04:47:44 2024

On Wed, 22 May 2024 21:30:34 -0700, Keith Thompson wrote:

... code will be written to use it.

Funny, isn’t it, that when I post code using other features of C (like iso646.h), I get piled on by people who don’t like it, with quite the opposite argument.

From Lawrence D'Oliveiro@21:1/5 to Keith Thompson on Thu May 23 04:20:24 2024

On Wed, 22 May 2024 21:08:54 -0700, Keith Thompson wrote:

Lawrence D'Oliveiro writes:

On Wed, 22 May 2024 22:23:26 -0300, Thiago Adams wrote:

I like the idea of embed ...

We’ve discussed this before. It just seems like a sop to those stuck
with antiquated, crippled build systems. In which case, how would they
get an up-to-date compiler that supports it?

Presumably by waiting until compilers support it, like any new feature.

Time/effort would be better spent investing in a more versatile build
system. Which would have the added advantage of supporting other languages besides C.

From Richard Kettlewell@21:1/5 to Malcolm McLean on Thu May 23 09:07:23 2024

Malcolm McLean writes:

static int haserror(LEXER *lex)
{
    return lex->error[0] ? 1 : 0;
}

error is a character buffer which holds the error message if an error
has been encountered. And for convenience it is placed in the
lexer. If there is no error, it holds the empty string. However it's
not entirely obvious that testing the message directly is the way you
should be testing for an error condition, so I wrote that little
function to make things clearer.

It's easy enough to make it return a boolean, of course. But I don't
see a real benefit.

Possible benefits:

1) It conveys information to the reader about the nature of the
function. In this particular case the name also conveys that
information well enough, so there’s not actually much to be gained
here, but in other contexts there may be more of an advantage.

2) It conveys information to the compiler that may be exploited by the
optimizer (depending on the compilation model, the capabilities of
the target platform and optimizer, etc).

We are gradually migrating functions with boolean sense to returning
bool, albeit not very systematically, mainly for reason #1.

--
https://www.greenend.org.uk/rjk/

From bart@21:1/5 to Thiago Adams on Thu May 23 13:11:16 2024

On 23/05/2024 02:21, Thiago Adams wrote:

On 5/22/2024 7:53 PM, Keith Thompson wrote:

But const doesn't mean constant. It means read-only.
`const int r = rand();` is perfectly valid.

I dislike the C++ hack of making N a constant expression given
`const int N = 42;`; constexpr made that unnecessary. C23 makes the
same (IMHO) mistake.

If I had a time machine, I'd spell "const" as "readonly" and make
"const" mean what "constexpr" now means (evaluated at compile time).

[...]

Everything is a mess: const in C++, the differences from const in C,
etc. constexpr in C23 just makes the mess bigger.

auto is a mess as well; it is not well specified for pointers. Not sure if
we have had this topic here, but auto * p in C is not specified.

I would remove from C23
- nullptr
- auto
- constexpr
- embed

I like the idea of embed but there is no implementation in production so
this is crazy!

'embed' was discussed a few months ago. I disagreed with the poor way it
was to be implemented: 'embed' notionally generates a list of
comma-separated numbers as tokens, where you have to take care of any
trailing zero yourself if needed. It would also be hopelessly
inefficient if actually implemented like that.

I compared it to the scheme in my own language, which could import text
files, but binary ones didn't really work.

Since then embedding has been considerably improved, so that it works
like this:

[]char str = sinclude("hello.c")
[]byte data = binclude("hello.exe")

The file-embedding is done by sinclude or binclude. The former adds a
zero terminator to the embedded file data (expected to be a text file), otherwise they are the same.

binclude can initialise any kind of array, including a 2D array of any
element type, although the data in the file needs to be suitable.

C23's 'embed' was claimed to be more flexible, as you can have
consecutive 'embed' directives initialising the same array. I can do the
same:

[]byte file = binclude("hello.exe") + binclude("/cx/big/sql.exe")

proc main=
println file.len
end

This generates an executable of 1077248 bytes, and displays 1050112 when
run, the combined size of those two embedded binaries. Compiling this
took 50ms.

("+" here is a compile-time operator that can concatenate constant
strings or also binary data like this.)

Basically, you are right that the ad hoc features of C23 are messy.

I suspect that ones like 'embed' have been derived from C++ which always
likes to make things too wide-ranging and much harder to use and
implement than necessary.

From Michael S@21:1/5 to David Brown on Thu May 23 15:02:26 2024

On Wed, 22 May 2024 18:55:36 +0200
David Brown wrote:

In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change
in it, especially compared to the minor changes in C17.

[...]

Removed
1) Old-style function declarations and definitions
2) Representations for signed integers other than two's complement
3) Permission that u/U-prefixed character constants and string
literals may not be UTF-16/32
4) Mixed wide string literal concatenation
5) Support for calling realloc() with zero size (the behavior becomes undefined)
6) __alignof_is_defined and __alignas_is_defined
7) static_assert is not provided as a macro defined in <assert.h>
(becomes a keyword)
8) thread_local is not provided as a macro defined in <threads.h>
(becomes a keyword)

1) good
2) good, but insufficient. The next logical step is to make both left
and right shifts of negative integers, by a count that does not exceed
the number of bits in the respective type, fully defined
3) IDNC
4) IDNC
5) IDNC
6) IDNC
7) bad. Breaks existing code for weak reason
8) bad. Breaks existing code for weak reason
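For context on item 2: even with two's complement mandated, C23 still leaves right-shifting a negative value implementation-defined and left-shifting one undefined. A sketch of a portable arithmetic right shift that only ever shifts non-negative values (asr32 is a hypothetical name, and n is assumed to be less than 32):

```c
#include <stdint.h>

/* Sign-propagating right shift without relying on implementation-defined
   behaviour. Requires n < 32. */
static int32_t asr32(int32_t x, unsigned n)
{
    if (x < 0)
        return ~(~x >> n);   /* ~x is non-negative, so this shift is fully defined */
    return x >> n;
}
```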

Deprecated
1) <stdnoreturn.h>
2) Old feature-test macros
__STDC_IEC_559__
__STDC_IEC_559_COMPLEX__
3) _Noreturn function specifier
4) _Noreturn attribute token
5) asctime()
6) ctime()
7) DECIMAL_DIG (use the appropriate type-specific macro
(FLT_DECIMAL_DIG, etc) instead)
8) Definition of following numeric limit macros in <math.h> (they
should be used via <float.h>)
INFINITY
DEC_INFINITY
NAN
DEC_NAN
9) __bool_true_false_are_defined
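As an example of the type-specific replacement for DECIMAL_DIG: printing a double with DBL_DECIMAL_DIG significant digits (from <float.h> since C11) should be enough for strtod to recover the identical value, assuming correctly rounded conversions as in common C libraries. roundtrip is a hypothetical helper name.

```c
#include <float.h>
#include <stdio.h>
#include <stdlib.h>

/* Format x with enough significant digits to round-trip, then parse it back. */
static double roundtrip(double x)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%.*g", DBL_DECIMAL_DIG, x);
    return strtod(buf, NULL);
}
```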

No opinion on most of those.
W.r.t. 5 and 6:
IMHO, all old-UNIX-style APIs that return pointers to static objects
within the library, or rely on the presence of a static object within the
library to preserve state between calls, should be systematically
deprecated, and for the majority of them thread-safe alternatives akin to
ctime_s() should be provided.
That is, with the exception of the family of functions that use FILE*. Not
that I like them very much, but they are ingrained too deeply.
So, picking just asctime and ctime out of a long list of problematic
APIs does not appear particularly consistent. If they were asking me
where to start, I'd start with rand().
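As an illustration of the alternative style for APIs like rand(): keep the generator state in an object the caller owns, so that separate threads can each use their own. This is only a sketch; it swaps in Marsaglia's xorshift32 as a simple generator, and the names rng32 and rng32_next are hypothetical, not a proposed standard API.

```c
#include <stdint.h>

/* All state lives in the caller's object, unlike rand(),
   which keeps hidden state inside the library. */
typedef struct {
    uint32_t state;   /* must be non-zero */
} rng32;

/* One step of xorshift32. */
static uint32_t rng32_next(rng32 *r)
{
    uint32_t x = r->state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    r->state = x;
    return x;
}
```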

With regard to new features, the list is too long to comment on in one post.
I just want to say that the strfrom* family is long overdue, but still appears
incomplete. The guiding principle should be that all format specifiers
available in printf(), with the sole exception of %s, should be provided as
strfrom* as well.

From David Brown@21:1/5 to Keith Thompson on Thu May 23 14:32:46 2024

On 23/05/2024 00:53, Keith Thompson wrote:

David Brown writes:

On 22/05/2024 19:42, Thiago Adams wrote:

[...]

- nullptr

I am fond of nullptr in C++, and will use it in C. Like most of the
C23 changes, it's not a big issue - after all, you get a lot of the
same effect with "#define nullptr (void*)(0)" or similar. But it
means your code has a visual distinction between the integer 0 and a
null pointer, and also lets the compiler or other static checking
system check better than using NULL would. (And I don't like NULL - I
dislike all-caps identifiers in general.)

Quibble: That should be

#define nullptr ((void*)0)

Indeed.

For example, this doesn't produce a syntax error for `sizeof nullptr`.

Better:

#if __STDC_VERSION__ < 202311L
#define nullptr ((void*)0)
#endif

C23's nullptr is of type nullptr_t, not void*. But you'd probably have
to go out of your way for that to be an issue (e.g., using nullptr in a generic selection).

The use of generics can be an advantage of nullptr here. The use in
templates was a prime motivation of introducing nullptr to C++, though I
think it is fair to say that templates are very much more popular in C++
than _Generic is in C. But I haven't thought of a real-world use-case yet!
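Putting Keith's suggestion together: a fallback for pre-C23 compilers, with the outer parentheses that keep "sizeof nullptr" well-formed. A sketch only; the fallback has type void *, not nullptr_t, so it would behave differently in a _Generic selection. find_byte is a hypothetical example function.

```c
#include <stddef.h>

#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 202311L
/* Fallback for pre-C23 compilers; C23 has the nullptr keyword. */
#define nullptr ((void*)0)
#endif

/* Return a pointer to the first byte equal to c, or nullptr if there is none. */
static const unsigned char *find_byte(const unsigned char *p, size_t n, unsigned char c)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] == c)
            return p + i;
    return nullptr;
}
```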

[...]

- constexpr

I will definitely use that. Sometimes I want a constant expression
for things like array sizes or static initialisers, and want to
calculate it. constexpr gives you that without having to resort to
macros. (I'd perhaps be even happier if I could just use const, as I
can in C++.)

But const doesn't mean constant. It means read-only.
`const int r = rand();` is perfectly valid.

Yes - which is why "constexpr" can be useful.

I dislike the C++ hack of making N a constant expression given
`const int N = 42;`; constexpr made that unnecessary.

I find that "hack" convenient at times. But I see what you mean that it
is a "hack", and I agree that "constexpr" makes such a hack unnecessary.
(Ideally, the languages would have used terms such as "read_only" and "constant" rather than "const" and "constexpr", but that boat sailed
long ago.)

C23 makes the
same (IMHO) mistake.

I don't think so - as far as I can see, it avoids that mistake (if you
feel the "hack" was a mistake). C23 can't fix the choice of names -
that was from C90.

If I had a time machine, I'd spell "const" as "readonly" and make
"const" mean what "constexpr" now means (evaluated at compile time).

[...]

From Michael S@21:1/5 to Lawrence D'Oliveiro on Thu May 23 15:36:03 2024

On Thu, 23 May 2024 02:49:37 -0000 (UTC)
Lawrence D'Oliveiro wrote:

On Wed, 22 May 2024 14:42:58 -0300, Thiago Adams wrote:

I am waiting for MSVC support. There are a lot of simple features MSVC
could implement and deliver in small increments. But it is very
slow.

And they wonder why developers are deserting the Windows platform for
Linux.

In practice, on my old home Windows PC (an 11-year-old installation of a
14-year-old OS) today, 2024-05-23, I can easily install and use gcc 14.1.0
alongside clang 18.1.5 alongside one of the newest versions of MSVC (not
sure which one) alongside the latest Intel ICC alongside any older version
of MSVC and ICC, and, with a little more effort and disk space, alongside
older versions of clang and gcc going back at least as far as gcc 4.9. I can
use all of those either simultaneously or interchangeably.
I very much doubt that I could get a similar variety of compiler versions
on a Linux installation of similar age, or even on one that is 5 years
younger. Even on the most up-to-date Linux distros, in order to get such a
compiler zoo, I'd probably have to fight against the package manager rather
than be assisted by it.

From David Brown@21:1/5 to Thiago Adams on Thu May 23 14:17:53 2024

On 22/05/2024 22:26, Thiago Adams wrote:

On 22/05/2024 17:11, David Brown wrote:

On 22/05/2024 19:42, Thiago Adams wrote:

On 22/05/2024 13:55, David Brown wrote:

In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change
in it, especially compared to the minor changes in C17.

<https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf>
<https://en.wikipedia.org/wiki/C23_(C_standard_revision)>
<https://en.cppreference.com/w/c/23>

- constexpr

I will definitely use that. Sometimes I want a constant expression
for things like array sizes or static initialisers, and want to
calculate it. constexpr gives you that without having to resort to
macros. (I'd perhaps be even happier if I could just use const, as I
can in C++.)

I am curious about that. Do you have a sample?

If I try to be precise about the terms "constant expression", "integer
constant expression", etc., I suspect I will get the details wrong
unless I spend a lot of time checking carefully. So I hope it is good
enough for me to be a bit lazy and quote the error messages from gcc
(with "-std=c23 -Wpedantic").

With this code, compilation fails with "initialiser element is not a
constant" for y.

    int x = 100;
    int y = x / 20;
    int zs[y];

With this code, compilation fails because zs is actually a VLA, and
"variably modified 'zs' at file scope" is not allowed.

    const int x = 100;
    const int y = x / 20;
    int zs[y];

This code, however, is fine:

    constexpr int x = 100;
    constexpr int y = x / 20;
    int zs[y];

This also works, even for older standards:

    enum { x = 100 };
    enum { y = x / 20 };
    int zs[y];

But constexpr works for other types, not just "int" which is the type of
all enumeration constants. (And "enum" constants are a somewhat weird
way to get this effect - "constexpr" looks neater.)

And in general, I like to be able to say, to the compiler and to people
reading the code, "this thing is really fixed and constant, and stop
compiling if you think I am wrong" rather than just "I promise I won't
change this thing - or if I do, I don't mind the nasal daemons".
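A version of these examples that compiles both ways, assuming a C23 compiler for the constexpr branch and falling back to enum constants (always int) plus a macro for the non-integer constant otherwise. The names zs_base, zs_len, zs_scale and scaled_len are hypothetical:

```c
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
/* C23: typed named constants usable in constant expressions. */
constexpr int    zs_base  = 100;
constexpr int    zs_len   = zs_base / 20;
constexpr double zs_scale = 0.5;
#else
/* Older standards: enum constants (always int) and a macro for the double. */
enum { zs_base = 100 };
enum { zs_len = zs_base / 20 };
#define zs_scale 0.5
#endif

/* File-scope array: its size must be an integer constant expression. */
static int zs[zs_len];

/* Scale the element count of zs by zs_scale. */
static double scaled_len(void)
{
    return (double)(sizeof zs / sizeof zs[0]) * zs_scale;
}
```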

Not sure
- empty initializer

I don't see that one being a big hit, at least for me. But I see
little benefit in /not/ allowing it in the language, so it seems a
sensible addition.

This is what I use
struct X x = {0};
But I can do a find-replace and change everything to {}

You could, but I don't really see the point of such a change. But in
new code it would be fine to write "= {}" rather than "= { 0 }".

When I create samples, I use new features like nullptr and {}.
The problem I see is using these features in real code and creating a
mess of styles.

I think you need significant motivation to justify changing style in
existing code, and I don't see anything here that would make me want to
change existing C17 code to C23 code. But when writing new code, I'd
use the new features.
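A sketch of the two spellings side by side, selected by __STDC_VERSION__; both zero every member, including the pointer. The struct and function names are hypothetical:

```c
/* Hypothetical struct with mixed member types. */
struct settings {
    int    count;
    double ratio;
    char  *name;
};

static struct settings default_settings(void)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
    struct settings s = {};     /* C23 empty initializer: every member is zeroed */
#else
    struct settings s = { 0 };  /* traditional spelling with the same effect */
#endif
    return s;
}
```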

From Michael S@21:1/5 to David Brown on Thu May 23 15:43:31 2024

On Wed, 22 May 2024 22:11:44 +0200
David Brown wrote:

I will definitely use that. Sometimes I want a constant expression
for things like array sizes or static initialisers, and want to
calculate it. constexpr gives you that without having to resort to
macros.

I don't say that everything that can be done with C23 constexpr can be
done with enum, but for uses like ones you mentioned above, 90%
probably can be done with enum.

From David Brown@21:1/5 to Thiago Adams on Thu May 23 15:11:45 2024

On 23/05/2024 14:38, Thiago Adams wrote:

On 23/05/2024 09:17, David Brown wrote:

If I try to be precise about the terms "constant expression", "integer
constant expression", etc., I suspect I will get the details wrong
unless I spend a lot of time checking carefully. So I hope it is good
enough for me to be a bit lazy and quote the error messages from gcc
(with "-std=c23 -Wpedantic").

With this code, compilation fails with "initialiser element is not a
constant" for y.

    int x = 100;
    int y = x / 20;
    int zs[y];

With this code, compilation fails because zs is actually a VLA,
and "variably modified 'zs' at file scope" is not allowed.

    const int x = 100;
    const int y = x / 20;
    int zs[y];

This code, however, is fine:

    constexpr int x = 100;
    constexpr int y = x / 20;
    int zs[y];

This also works, even for older standards:

    enum { x = 100 };
    enum { y = x / 20 };
    int zs[y];

But constexpr works for other types, not just "int" which is the type
of all enumeration constants. (And "enum" constants are a somewhat
weird way to get this effect - "constexpr" looks neater.)

And in general, I like to be able to say, to the compiler and to
people reading the code, "this thing is really fixed and constant, and
stop compiling if you think I am wrong" rather than just "I promise I
won't change this thing - or if I do, I don't mind the nasal daemons".

We can write:

#define X 100
#define Y ((X) / 20)
int zs[Y];

I cannot see a good justification for constexpr.

Clearer code, better checking along the way, better typing. I don't
think constexpr lets you do things you couldn't do before, but it lets
you do those things in a neater way. (IMHO.)

I already see bad usages of constexpr in C++ code. It was used in cases
where we know for sure that it is NOT compile time. This just makes review
harder: "why did someone put this here?" The conclusion was that it was
totally unnecessary and ignored by the compiler. The programmer was trying to
add something extra, like "magic", hoping for something that would never happen.

IME poor or confusing uses of "constexpr" are for functions, not
objects, and C23 does not support "constexpr" for functions.

I think it is better to think of constexpr functions in C++ as "pure"
functions - confusingly called __attribute__((const)) functions in gcc
and [[unsequenced]] functions in C23. That is, functions that don't
affect anything around them, are not affected by anything external, have
no side effects, always give the same results for the same parameters,
and can be called more or fewer times without affecting the program's observable behaviour. (It's not exactly like that in C++ - a
"constexpr" function is implicitly inline and needs a local definition.
But I think that is how it could have been handled.)
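A sketch of how such a "pure" promise might be expressed portably today, assuming gcc or clang for __attribute__((const)) and an empty fallback elsewhere. C23's [[unsequenced]] expresses a similar promise, but compiler support varies, so it is not used here. square and sum_of_squares are hypothetical:

```c
/* The result depends only on the arguments and there are no side effects,
   so the compiler may merge or drop repeated calls. */
#if defined(__GNUC__)
#define CONST_FN __attribute__((const))
#else
#define CONST_FN
#endif

/* The declaration carries the attribute. */
static int square(int x) CONST_FN;

static int square(int x)
{
    return x * x;
}

static int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}
```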

The whole thing - in C and C++ - suffers somewhat from being addons over
time rather than part of the original design of the language. But
that's inevitable as an old language evolves.

From David Brown@21:1/5 to bart on Thu May 23 15:25:43 2024

On 23/05/2024 14:11, bart wrote:

[...]

'embed' was discussed a few months ago. I disagreed with the poor way it
was to be implemented: 'embed' notionally generates a list of
comma-separated numbers as tokens, where you have to take care of any trailing zero yourself if needed. It would also be hopelessly
inefficient if actually implemented like that.

Fortunately, it is /not/ actually implemented like that - it is only implemented "as if" it were like that. Real prototype implementations
(for gcc and clang - I don't know about other tools) are extremely
efficient at handling #embed. And the comma-separated numbers can be
more flexible in less common use-cases.

(That was also made clear in the previous discussion. It's been a while
since you posted much here - it's nice to see you back on form :-) )
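To make the "as if" model concrete: for a hypothetical non-empty file hello.txt containing the five bytes hello, an #embed directive inside an initializer behaves as if it were replaced by the list of byte values. The sketch writes that expansion out by hand, roughly as xxd -i would, so it compiles without C23 support, and shows the terminating 0 that has to be added explicitly if the data is to be used as a string:

```c
/* With C23 (and assuming hello.txt is not empty) this could be written as:
 *     static const unsigned char hello[] = {
 *     #embed "hello.txt"
 *         , 0
 *     };
 * Here the expansion for the hypothetical contents "hello" is spelled out. */
static const unsigned char hello[] = {
    0x68, 0x65, 0x6c, 0x6c, 0x6f,   /* 'h' 'e' 'l' 'l' 'o' */
    0                               /* terminator added by hand, not by #embed */
};

static const char *hello_text(void)
{
    return (const char *)hello;
}
```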

From David Brown@21:1/5 to Lawrence D'Oliveiro on Thu May 23 15:42:29 2024

On 23/05/2024 05:13, Lawrence D'Oliveiro wrote:

On Wed, 22 May 2024 18:55:36 +0200, David Brown wrote:

<https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf>

Unicode identifiers!

typedef int
typėdef;

These have been around since C99...

There are a couple of minor tweaks to the characters supported, I think,
but nothing anyone is likely to notice in practice.

From David Brown@21:1/5 to Chris M. Thomasson on Thu May 23 15:35:21 2024

On 22/05/2024 23:24, Chris M. Thomasson wrote:

On 5/22/2024 9:55 AM, David Brown wrote:

In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change in
it, especially compared to the minor changes in C17.

Love the way std::vectors respect alignas... C++20, iirc?

[...]

I have no idea what you are talking about.

But did you notice that this is c.l.c, not c.l.c++, and the topic is
C23, not C++23 ? Discussing comparisons or compatibility with C++ is
fair enough, but talking about pure C++ matters (such as std::vector<>)
is unlikely to be helpful.

From David Brown@21:1/5 to Michael S on Thu May 23 15:31:19 2024

On 23/05/2024 14:43, Michael S wrote:

On Wed, 22 May 2024 22:11:44 +0200
David Brown <[email protected]> wrote:

I will definitely use that. Sometimes I want a constant expression
for things like array sizes or static initialisers, and want to
calculate it. constexpr gives you that without having to resort to
macros.

I don't say that everything that can be done with C23 constexpr can be
done with enum, but for uses like ones you mentioned above, 90%
probably can be done with enum.

I realise that, and use enum for such things today. But IMHO constexpr
is neater and it also covers the other 10%.

I think most of the new features of C23 neaten up the language a bit.
They are not game-changers - I doubt that any of them will significantly
change the way anyone writes their code (especially for those already
happy with gcc or clang extensions). But there are several things here
that can make code a little nicer.

So yes, I /could/ use enum constants for things that are not
enumerations. I /did/ use them for that. But going forward with C23,
I'll use constexpr instead.

From David Brown@21:1/5 to Michael S on Thu May 23 15:56:39 2024

On 23/05/2024 14:02, Michael S wrote:

On Wed, 22 May 2024 18:55:36 +0200
David Brown <[email protected]> wrote:

In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change
in it, especially compared to the minor changes in C17.

<https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf>
<https://en.wikipedia.org/wiki/C23_(C_standard_revision)>
<https://en.cppreference.com/w/c/23>

I like that it tidies up a lot of old stuff - it is neater to have
things like "bool", "static_assert", etc., as part of the language
rather than needing a half-dozen includes for such basic stuff.

I like that it standardises a several useful extensions that have
been in gcc and clang (and possibly other compilers) for many years.

I'm not sure it will make a big difference to my own programming -
when I want "typeof" or "chk_add()", I already use them in gcc. But
for people restricted to standard C, there's more new to enjoy. And
I prefer to use standard syntax when possible.

"constexpr" is something I think I will find helpful, in at least
some circumstances.

Removed
1) Old-style function declarations and definitions
2) Representations for signed integers other than two's complement
3) Permission that u/U-prefixed character constants and string
literals may be not UTF-16/32
4) Mixed wide string literal concatenation
5) Support for calling realloc() with zero size (the behavior becomes undefined)
6) __alignof_is_defined and __alignas_is_defined
7) static_assert is not provided as a macro defined in <assert.h>
(becomes a keyword)
8) thread_local is not provided as a macro defined
in <threads.h> (becomes a keyword)

1) good

Yes, at long last.

2) good, but insufficient. The next logical step is to make both left
and right shift of negative integers by count that does not exceed #
of bits in respective type fully defined

Agreed.

3) IDNC
4) IDNC
5) IDNC
6) IDNC
7) bad. Breaks existing code for weak reason
8) bad. Breaks existing code for weak reason
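Item 1 is shown in the minimal sketch below. The function names are made up for illustration. C23 rejects K&R-style identifier-list definitions, so the old form appears only in a comment. An empty parameter list now means "no parameters", as it does in C++.

```c
/* Pre-C23 K&R-style definition, no longer valid in C23:
 *
 *     int scale(x, factor)
 *         int x;
 *         int factor;
 *     {
 *         return x * factor;
 *     }
 */

/* Prototype-style definition, valid in every standard since C89. */
int scale(int x, int factor)
{
    return x * factor;
}

/* In C23 "int answer()" declares a function taking no arguments,
 * exactly like "int answer(void)". In older standards "()" meant
 * "unspecified arguments". Writing (void) is correct in all of them. */
int answer(void)
{
    return 42;
}
```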

I am of the opinion that people should specify the standard they use as
part of their build procedures. (I'd have liked a standard way to
specify the C standard version code uses, so that it could be fixed in
source code files.) I don't think people should take random code for
Cxx and assume blindly that it will work for Cyy.

Yes, these will break some code. But I don't think it will break much,
and it will be nice to cut down on some of these headers. I have some
very old code that defines static_assert as a macro involving typedefs
with structs that can have positive or negative sizes, for C90 and C99.
I don't expect to compile these as C23 without testing - backwards
compatibility is vital, but excessive backwards compatibility restricts
improvements to the language.
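The sketch below shows one way to keep such an old macro alongside the new keyword, choosing between them by `__STDC_VERSION__`. The macro name `COMPILE_TIME_CHECK` and its helpers are hypothetical. The pre-C11 branch uses the negative-array-size typedef trick described above. The C23 branch uses the new single-argument form of `static_assert`.

```c
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
/* C23: static_assert is a keyword and the message is optional. */
#define COMPILE_TIME_CHECK(cond) static_assert(cond)
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
/* C11/C17: _Static_assert is a keyword; the message is required. */
#define COMPILE_TIME_CHECK(cond) _Static_assert(cond, #cond)
#else
/* C90/C99: a struct typedef whose array size is -1 (a constraint
   violation, so compilation fails) when cond is false. */
#define CTC_CAT_(a, b) a##b
#define CTC_CAT(a, b) CTC_CAT_(a, b)
#define COMPILE_TIME_CHECK(cond) \
    typedef struct { int ctc_dummy[(cond) ? 1 : -1]; } CTC_CAT(ctc_line_, __LINE__)
#endif

COMPILE_TIME_CHECK(sizeof(int) >= 2);
COMPILE_TIME_CHECK(sizeof(long) >= sizeof(int));
```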

Still, it's a valid complaint. No change is going to please everyone!

From Michael S@21:1/5 to David Brown on Thu May 23 17:19:11 2024

On Wed, 22 May 2024 18:55:36 +0200
David Brown <[email protected]> wrote:

In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change
in it, especially compared to the minor changes in C17.

Why C Standard Committee, while being recently quite liberal in field
of introducing new keywords (too liberal for my liking, many new things
do not really deserve keywords not prefixed by __) is so conservative
in introduction of program control constructs? I don't remember any
new program control introduced under Committee regime.
And I want at least one.

Another area that was mostly unchanged since 1st edition of K&R is
storage classes. Even such obvious thing as removal of 'auto' class
took too long. If I am not mistaken, totally obsolete 'register' class
is still allowed. And I don't remember any additions.
Personally I can think about at least two useful backward-compatible
additions in that area.

From Scott Lurndal@21:1/5 to Michael S on Thu May 23 16:40:09 2024

Michael S <[email protected]> writes:

On Thu, 23 May 2024 02:49:37 -0000 (UTC)
Lawrence D'Oliveiro <[email protected]> wrote:

On Wed, 22 May 2024 14:42:58 -0300, Thiago Adams wrote:

I am waiting for MSVC support. There are a lot of simple features MSVC
could implement and deliver in small increments. But it is very
slow.

And they wonder why developers are deserting the Windows platform for
Linux.

In practice, on my old home Windows PC (11 y.o. installation of 14 y.o.
OS) today, 2024-05-23, I can easily install and use gcc 14.1.0 alongside
clang 18.1.5 alongside one of the newest versions of MSVC (not sure
which one) alongside the latest Intel ICC alongside any older version of
MSVC and ICC and, with a little more effort and disk space, alongside
older versions of clang and gcc at least as far back as gcc 4.9. I can
use all of those either simultaneously or interchangeably.
I very much doubt that I can get a similar variety of compiler versions
on a Linux installation of similar age, or even on one that is 5 years
younger. Even on the most up-to-date Linux distros, in order to get such
a compiler zoo, I'd probably have to fight against the package manager
rather than be assisted by it.

While it is not likely that the set of pre-built packages available from the vendor for a particular distribution will include more than two versions
each of gcc and clang, with a simple script, one can easily build
all the versions of gcc or clang that one could ever want. Our
linux systems have gcc4,5,6,7,8,9,10,11,12 and 13 on them as well
as several versions of clang. If our customers were interested
in ICC (which is unlikely as they're mainly ARM based), linux could
accommodate them as well.

And given the extensive use of gcc extensions, msvc is out of the question.

From Scott Lurndal@21:1/5 to Thiago Adams on Thu May 23 17:08:03 2024

Thiago Adams <[email protected]> writes:

On 23/05/2024 09:17, David Brown wrote:

If I try to be precise about the terms "constant expression", "integer
constant expression", etc., I suspect I will get the details wrong
unless I spend a lot of time checking carefully. So I hope it is good
enough for me to be a bit lazy and quote the error messages from gcc
(with "-std=c23 -Wpedantic").

With this code, compilation fails "initialiser element is not a
constant" for y.

    int x = 100;
    int y = x / 20;
    int zs[y];

With this code, compilation fails because zs is actually a VLA, and
"variably modified 'zs' at file scope" is not allowed.

    const int x = 100;
    const int y = x / 20;
    int zs[y];

This code, however, is fine:

    constexpr int x = 100;
    constexpr int y = x / 20;
    int zs[y];

This also works, even for older standards:

    enum { x = 100 };
    enum { y = x / 20 };
    int zs[y];

But constexpr works for other types, not just "int" which is the type of
all enumeration constants. (And "enum" constants are a somewhat weird
way to get this effect - "constexpr" looks neater.)
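The sketch below illustrates the "other types" point. It assumes a C23 compiler for the `constexpr` branch and falls back to `static const` otherwise. Neither constant could be an enumeration constant before C23: one is a `double`, and the other does not fit in `int`. Only the `constexpr` versions can be used where a constant expression is required, such as the initialiser of `sample_period`. All names are illustrative.

```c
#include <stdint.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
/* C23: typed named constants, usable in constant expressions. */
constexpr double   sample_rate = 48000.0;
constexpr uint64_t big_size    = UINT64_C(1) << 40;  /* too big for a pre-C23 enum constant */
static const double sample_period = 1.0 / sample_rate; /* named constant allowed here */
#else
/* Fallback: typed, but these are not constant expressions. */
static const double   sample_rate = 48000.0;
static const uint64_t big_size    = UINT64_C(1) << 40;
static const double   sample_period = 1.0 / 48000.0;  /* must repeat the literal */
#endif
```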

And in general, I like to be able to say, to the compiler and to people
reading the code, "this thing is really fixed and constant, and stop
compiling if you think I am wrong" rather than just "I promise I won't
change this thing - or if I do, I don't mind the nasal daemons".

We can write:

#define X 100
#define Y ((X) / 20)

Neither of which convey type information.

int zs[Y];

I cannot see a good justification for constexpr.

Which does convey type information, and thus would
be superior to untyped macro definitions.

From Scott Lurndal@21:1/5 to Michael S on Thu May 23 17:10:38 2024

Michael S <[email protected]> writes:

On Wed, 22 May 2024 22:11:44 +0200
David Brown <[email protected]> wrote:

I will definitely use that. Sometimes I want a constant expression
for things like array sizes or static initialisers, and want to
calculate it. constexpr gives you that without having to resort to
macros.

I don't say that everything that can be done with C23 constexpr can be
done with enum, but for uses like ones you mentioned above, 90%
probably can be done with enum.

Are C23 enums signed? or unsigned? What is the supported enum range?

From Michael S@21:1/5 to Scott Lurndal on Thu May 23 20:31:59 2024

On Thu, 23 May 2024 17:10:38 GMT
[email protected] (Scott Lurndal) wrote:

Michael S <[email protected]> writes:

On Wed, 22 May 2024 22:11:44 +0200
David Brown <[email protected]> wrote:

I will definitely use that. Sometimes I want a constant expression
for things like array sizes or static initialisers, and want to
calculate it. constexpr gives you that without having to resort to
macros.

I don't say that everything that can be done with C23 constexpr can
be done with enum, but for uses like ones you mentioned above, 90%
probably can be done with enum.

Are C23 enums signed? or unsigned? What is the supported enum
range?

I have never read the standard, so what follows is *my understanding*
rather than established fact.
Before C23: signed, at least as wide as int, but wider ranges are not prohibited and can be provided by the implementation.
C23: enums without a type specifier are the same as before; enums with a type specifier have the range of their underlying type.
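For reference, here is a small sketch of the C23 syntax for an enumeration with a fixed underlying type. It is guarded so that it still builds under older standards, and the `colour` type is made up. With the fixed type, the enum has the size and range of `unsigned char`. Without it, the compatible integer type the compiler picks is implementation-defined.

```c
#include <limits.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
/* C23: underlying type fixed as unsigned char. */
enum colour : unsigned char { COLOUR_RED, COLOUR_GREEN, COLOUR_BLUE = UCHAR_MAX };
enum { COLOUR_FIXED_TYPE = 1 };
#else
/* Pre-C23: the constants have type int; the enum's compatible
   integer type is implementation-defined. */
enum colour { COLOUR_RED, COLOUR_GREEN, COLOUR_BLUE = UCHAR_MAX };
enum { COLOUR_FIXED_TYPE = 0 };
#endif
```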

From James Kuyper@21:1/5 to Michael S on Thu May 23 14:35:57 2024

On 5/23/24 10:19, Michael S wrote:
...

Another area that was mostly unchanged since 1st edition of K&R is
storage classes. Even such obvious thing as removal of 'auto' class
took too long. If I am not mistaken, totally obsolete 'register' class
is still allowed. And I don't remember any additions.

constexpr and thread_local have both been added to the list of
storage-class specifiers since K&R.

From David Brown@21:1/5 to Michael S on Thu May 23 22:10:22 2024

On 23/05/2024 16:19, Michael S wrote:

On Wed, 22 May 2024 18:55:36 +0200
David Brown <[email protected]> wrote:

In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change
in it, especially compared to the minor changes in C17.

Why C Standard Committee, while being recently quite liberal in field
of introducing new keywords (too liberal for my liking, many new things
do not really deserve keywords not prefixed by __) is so conservative
in introduction of program control constructs? I don't remember any
new program control introduced under Committee regime.
And I want at least one.

What program control construct would you like?

Another area that was mostly unchanged since 1st edition of K&R is
storage classes. Even such obvious thing as removal of 'auto' class
took too long. If I am not mistaken, totally obsolete 'register' class
is still allowed.

"register" is still in C23. (Some compilers pay attention to it. gcc
with optimisation disabled puts local variables on the stack, except for
those marked "register" that get put in registers.) It got dropped from
C++ when "auto" was re-purposed in C++11, but with the keyword
"register" kept for future use. I would not have objected to the same
thing happening in C23.

And I don't remember any additions.

_Thread_local was added in C11, with the alias thread_local in C23.

What would you like to see here?

Personally I can think about at least two useful backward-compatible additions in that area.

From Kaz Kylheku@21:1/5 to David Brown on Thu May 23 20:23:15 2024

On 2024-05-23, David Brown <[email protected]> wrote:

So yes, I /could/ use enum constants for things that are not
enumerations. I /did/ use them for that. But going forward with C23,
I'll use constexpr instead.

The value of an enum is:

1. Compiler warns of incomplete switch cases.

2. In a debugger when you examine an enum-valued expression or
variable, you get the symbolic name.

3. Safety (with C++ enum rules: no implicit
conversion from ordinary integer type to enum).

Historically, C code bases have abused enums to define constants
like "enum { bufsize = 1024 }" for understandable reasons, but it is a cringe-inducing hack, which is also incomplete and inflexible; e.g. what
if we want a floating-point constant?

I've benefited from (3) in C programs that were contrived
to be compilable as C++. (That practice, though, tends to increasingly
hamper your dialect choice, as the languages diverge and make
only small steps here and there to become closer.)
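Point 1 is shown in the small sketch below; the type and function names are illustrative. The switch has no `default` label. As a result, gcc warns under `-Wall` (via `-Wswitch`), and clang warns by default, if an enumerator is added to `enum shape` but not handled here. A `#define` or a `constexpr int` gives no such check.

```c
enum shape { SHAPE_CIRCLE, SHAPE_SQUARE, SHAPE_TRIANGLE };

/* No default label on purpose: if a new enumerator is added and not
   handled here, -Wswitch reports it. */
int shape_sides(enum shape s)
{
    switch (s) {
    case SHAPE_CIRCLE:   return 0;
    case SHAPE_SQUARE:   return 4;
    case SHAPE_TRIANGLE: return 3;
    }
    return -1;   /* only reached for out-of-range values */
}
```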

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @[email protected]

From Michael S@21:1/5 to Keith Thompson on Fri May 24 00:48:02 2024

On Thu, 23 May 2024 14:38:23 -0700
Keith Thompson <[email protected]> wrote:

Michael S <[email protected]> writes:

On Wed, 22 May 2024 18:55:36 +0200
David Brown <[email protected]> wrote:

In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change
in it, especially compared to the minor changes in C17.

Why C Standard Committee, while being recently quite liberal in
field of introducing new keywords (too liberal for my liking, many
new things do not really deserve keywords not prefixed by __) is so conservative in introduction of program control constructs? I don't remember any new program control introduced under Committee regime.
And I want at least one.

Which is?

New keywords are typically prefixed by an underscore and an upper case letter, such as C11's "_Generic". There are no (standard) keywords
starting with "__".

You are right. I confused Standard C language with
implementation-defined extensions.

But the point stands: in recent times Committee is (to my liking) not sufficiently conservative in adding keywords that do not start with the underscore followed by uppercase.

From Michael S@21:1/5 to David Brown on Fri May 24 00:34:24 2024

On Thu, 23 May 2024 22:10:22 +0200
David Brown <[email protected]> wrote:

On 23/05/2024 16:19, Michael S wrote:

On Wed, 22 May 2024 18:55:36 +0200
David Brown <[email protected]> wrote:

In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change
in it, especially compared to the minor changes in C17.

Why C Standard Committee, while being recently quite liberal in
field of introducing new keywords (too liberal for my liking, many
new things do not really deserve keywords not prefixed by __) is so conservative in introduction of program control constructs? I don't remember any new program control introduced under Committee regime.
And I want at least one.

What program control construct would you like?

Ability to break from nested loops. Ability to "continue" outer loops
would be nice too, but less important.
I am not sure what syntax I want for this feature, never considered
myself a competent language designer.
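For comparison, the usual way to do this in current C is a `goto` to a label just after the outer loop. A minimal sketch follows; the function and what it computes are made up for illustration:

```c
#include <stddef.h>

/* Sums a rows x cols table (flat, row-major) up to the first element
   equal to stop, which is not included in the sum. */
long sum_until(const int *grid, size_t rows, size_t cols, int stop)
{
    long sum = 0;

    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            int v = grid[r * cols + c];
            if (v == stop)
                goto done;          /* leaves both loops at once */
            sum += v;
        }
    }
done:
    /* execution continues here whether or not stop was found */
    return sum;
}
```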

Another area that was mostly unchanged since 1st edition of K&R is
storage classes. Even such obvious thing as removal of 'auto' class
took too long. If I am not mistaken, totally obsolete 'register'
class is still allowed.

"register" is still in C23. (Some compilers pay attention to it.
gcc with optimisation disabled puts local variables on the stack,
except for those marked "register" that get put in registers.) It
got dropped from C++ when "auto" was re-purposed in C++11, but with
the keyword "register" kept for future use. I would not have
objected to the same thing happening in C23.

And I don't remember any additions.

_Thread_local was added in C11, with the alias thread_local in C23.

_Thread_local is a special-purpose thing, probably not applicable at
all for programming of small embedded systems, which nowadays is the
only type of programming in C that I do for money rather than as hobby.
With regard to constexpr, mentioned above by James Kuyper, my feeling
about it is that it belongs to metaprogramming so I would not consider
it a real storage class.

What would you like to see here?

Instead of solutions, let's talk about problems that I want to solve:

1. global objects, declared in header files and included several times.
Where defined?
For some linkers, mostly unixy linkers, in the case of uninitialized
objects (implicitly initialized to zero) it somehow works.
For linkers used on embedded systems it requires additional effort.
I think for initialized globals it takes additional effort even with
unixy linkers.
I want it to "just work" everywhere. I think that the best way to get
it without breaking existing semantics is a new storage class.
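For comparison, here is the idiom that works with any conforming toolchain today. Declare the object `extern` (with no initialiser) in the header. Define it, with its initialiser, in exactly one .c file. The file names and objects below are hypothetical. Both parts are shown in one translation unit so that the sketch compiles on its own.

```c
/* ---- settings.h (hypothetical header, included everywhere) ---- */
extern int g_verbosity;          /* declaration only: no storage allocated */
extern const char g_app_name[];  /* incomplete array type is fine here */

/* ---- settings.c (the one translation unit that defines them) ---- */
int g_verbosity = 1;             /* the single definition, with initialiser */
const char g_app_name[] = "demo";

/* Relying instead on several files each containing a tentative
   definition "int g_verbosity;" is the "common symbol" behaviour that
   only some (mostly Unix) linkers accept. Since gcc 10 it is also off
   by default (-fno-common). */
```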

2. Reversing defaults for visibility of objects and functions at file
scope.
Something like:
#pragma export_by_default(off).
When this pragma is in effect, we need a way to make objects and
functions globally visible. I think that it's done best with new
storage class.

Personally I can think about at least two useful backward-compatible additions in that area.

From Tim Rentsch@21:1/5 to All on Thu May 23 17:37:39 2024

Michael S <[email protected]> writes:

[comments on various new features in C23]

Overall I am quite disappointed by C23. IMO it's a step
backwards rather than forwards.

W.r.t. [asctime() and ctime() being removed]
IMHO, all old-UNIX-style APIs that return pointers to static
objects within library or rely on presence of static object within
library for purpose of preserving state for subsequent calls
should be systematically deprecated and for majority of them there
should be provided thread-safe alternatives akin to ctime_s().

That is, with exception of family of functions that uses FILE*.
Not that I like them very much, but they are ingrained too deeply.
So, picking just asctime and ctime out of a long list of problematic
APIs does not appear particularly consistent. If they were asking
me where to start, I'd start with rand().

I agree with the suggestion that restartable versions of "dirty"
functions be added to the C standard. I strongly disagree that
the old ones should be taken out. If compilers choose to give
warnings, that's fine, but these functions should not be removed
just because some people think they are clunky.
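As an example of the reentrant style, `strftime` already writes into a caller-supplied buffer. It can therefore replace `asctime()` without any static state. A minimal sketch follows. The wrapper name and the ISO-style format are illustrative, and the output layout differs from asctime's fixed format:

```c
#include <stddef.h>
#include <time.h>

/* Like asctime(), but writes into a caller-supplied buffer instead of a
   static one. Returns the number of characters written (excluding the
   '\0'), or 0 if the result does not fit in len bytes. */
size_t format_tm(const struct tm *tp, char *buf, size_t len)
{
    return strftime(buf, len, "%Y-%m-%d %H:%M:%S", tp);
}
```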

[...] Just want to say that strfrom* family is long overdue, but
still appear incomplete. The guiding principle should be that all
format specifiers available in printf() with sole exception of %s
should be provided as strfrom* as well.

What's the motivation for having separate functions? To me this
looks like creeping featuritis.

From Lawrence D'Oliveiro@21:1/5 to Michael S on Fri May 24 01:06:42 2024

On Fri, 24 May 2024 00:34:24 +0300, Michael S wrote:

On Thu, 23 May 2024 22:10:22 +0200 David Brown
<[email protected]> wrote:

What program control construct would you like?

Ability to break from nested loops.

At least 90% of the time, when I want to exit from an inner loop in C,
there will be some kind of cleanup I need to do in the outer loop before
that can exit too. So the ability to jump straight out will rarely be
used.

From Tim Rentsch@21:1/5 to Michael S on Thu May 23 18:35:58 2024

Michael S <[email protected]> writes:

[what new language features would you like?]

Ability to break from nested loops. Ability to "continue" outer
loops would be nice too, but less important. [...]

1. global objects, declared in header files and included
several times. Where defined? [...] I want it to "just work"
everywhere. [...]

Both of these features seem like frills. Neither one is either
necessary or common; they would make the language bigger but
not especially any better. Adding them would be the proverbial tail
wagging the dog.

2. Reversing defaults for visibility of objects and functions
at file scope.
Something like:
#pragma export_by_default(off).

Not sure taking a half-and-half approach on this is a good idea,
but if so I think it's better to have the choice be a per-TU
compilation option rather than a #pragma.

When this pragma is in effect, we need a way to make objects and
functions globally visible. I think that it's done best with new
storage class.

Just use extern. No new storage class needed.

With regard to constexpr, mentioned above by James Kuyper, my
feeling about it is that it belongs to metaprogramming so I
would not consider it a real storage class.

Having 'constexpr' be classified as a storage class illustrates
how poorly thought out it is.

From Tim Rentsch@21:1/5 to Keith Thompson on Thu May 23 20:28:29 2024

Keith Thompson <[email protected]> writes:

Tim Rentsch <[email protected]> writes:
[...]

Having 'constexpr' be classified as a storage class illustrates
how poorly thought out it is.

constexpr is not classified as a storage class. N3022, like earlier editions, says there are four storage durations: static, thread,
automatic, and allocated.

Obviously I was talking about the syntactic classification.
Don't be obtuse.

From Lawrence D'Oliveiro@21:1/5 to Malcolm McLean on Fri May 24 05:42:41 2024

On Fri, 24 May 2024 06:38:18 +0100, Malcolm McLean wrote:

On 24/05/2024 02:06, Lawrence D'Oliveiro wrote:

On Fri, 24 May 2024 00:34:24 +0300, Michael S wrote:

On Thu, 23 May 2024 22:10:22 +0200 David Brown
<[email protected]> wrote:

What program control construct would you like?

Ability to break from nested loops.

At least 90% of the time, when I want to exit from an inner loop in C,
there will be some kind of cleanup I need to do in the outer loop
before that can exit too. So the ability to jump straight out will
rarely be used.

goto gives you the functionality you require.

I avoid those. I structure my code like a Nassi-Shneiderman diagram, where
each block has one entrance at the top, and one exit at the bottom. Easier
to keep track of error conditions and cleanups that way.
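Here is a sketch of that one-entry, one-exit style applied to searching a two-dimensional table. A flag in both loop conditions replaces break or goto, and any cleanup would sit at the single exit. The function is hypothetical, and the table is passed as a flat row-major array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns the row-major index of the first element equal to target,
   or -1 if there is none. Each loop has one entry and one exit: the
   found flag is part of both loop conditions. */
long index_of(const int *grid, size_t rows, size_t cols, int target)
{
    long result = -1;
    bool found = false;

    for (size_t r = 0; r < rows && !found; r++) {
        for (size_t c = 0; c < cols && !found; c++) {
            if (grid[r * cols + c] == target) {
                result = (long)(r * cols + c);
                found = true;
            }
        }
    }
    /* single exit: any cleanup or logging would go here */
    return result;
}
```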

From jak@21:1/5 to All on Fri May 24 09:32:38 2024

Bonita Montero wrote:

On 23.05.2024 at 21:49, Thiago Adams wrote:

On 23/05/2024 16:25, Bonita Montero wrote:

I ask myself what the point is in further developing a language
like this that can actually no longer be saved.

do you mean C++?

No, C.

I think you are quite confused about programming languages. C and
C++ are not comparable languages; comparable pairs would be C and
assembler, or C++ and Java. If you really want to compare C to C++, then
C++ is to C as Rust is to C++. I'm pretty convinced that C++ will be
abandoned long before C. Just for one example, C++ would have been
abandoned years ago if C# did not produce only CLI code, because C#
lacks nothing important compared to C++, and C++'s learning curve is
much steeper (C# also benefits from reflection).

From Michael S@21:1/5 to Malcolm McLean on Fri May 24 11:42:24 2024

On Fri, 24 May 2024 06:38:18 +0100
Malcolm McLean <[email protected]> wrote:

On 24/05/2024 02:06, Lawrence D'Oliveiro wrote:

On Fri, 24 May 2024 00:34:24 +0300, Michael S wrote:

On Thu, 23 May 2024 22:10:22 +0200 David Brown
<[email protected]> wrote:

What program control construct would you like?

Ability to break from nested loops.

At least 90% of the time, when I want to exit from an inner loop in
C, there will be some kind of cleanup I need to do in the outer
loop before that can exit too. So the ability to jump straight out
will rarely be used.

goto gives you the functionality you require.

Sure, me too. Because that's what I have.
If they hadn't given me {, }, else, while, for, and do then I would
use goto to simulate all those as well. It gives functionality I
require, don't it?

I usually use goto for handling malloc() failures. So if an
allocation fails within a deeply nested loop, I will jump to code at
the end of the function, free up any half-constructed objects, and
return an error condition.

I do a similar thing too, but that's just a habit that I can't overcome.
It makes no practical sense in the environments I work in today. I could
just as well return immediately, without cleaning up; it would make
zero practical difference, except that my code would be shorter and
look cleaner.
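The allocation-failure pattern described above is shown in the minimal sketch below. The `matrix` type and function names are hypothetical. Each failure jumps to the point in a single cleanup sequence that frees exactly what has been allocated so far.

```c
#include <stdlib.h>

/* Hypothetical example type: a rows x cols matrix held as row pointers. */
struct matrix {
    size_t rows, cols;
    double **row;
};

/* Frees a fully constructed matrix (NULL is allowed). */
void matrix_free(struct matrix *m)
{
    if (m == NULL)
        return;
    for (size_t i = 0; i < m->rows; i++)
        free(m->row[i]);
    free(m->row);
    free(m);
}

/* Returns a new matrix with calloc'd elements, or NULL if any allocation
   fails. rows and cols are assumed non-zero (calloc(0, ...) may return NULL). */
struct matrix *matrix_new(size_t rows, size_t cols)
{
    size_t done = 0;                      /* rows successfully allocated */
    struct matrix *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;                      /* nothing to clean up yet */

    m->rows = rows;
    m->cols = cols;
    m->row = calloc(rows, sizeof *m->row);
    if (m->row == NULL)
        goto free_struct;

    for (done = 0; done < rows; done++) {
        m->row[done] = calloc(cols, sizeof *m->row[done]);
        if (m->row[done] == NULL)
            goto free_rows;               /* rows [0, done) must be freed */
    }
    return m;

free_rows:
    while (done > 0)
        free(m->row[--done]);
    free(m->row);
free_struct:
    free(m);
    return NULL;
}
```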

From David Brown@21:1/5 to Keith Thompson on Fri May 24 11:03:43 2024

On 23/05/2024 23:49, Keith Thompson wrote:

Thiago Adams <[email protected]> writes:

On 23/05/2024 10:11, David Brown wrote:

On 23/05/2024 14:38, Thiago Adams wrote:

[...]

I already see bad usages of constexpr in C++ code. It was used in
cases where we know for sure that is NOT compile time. This just
make review harder "why did someone put this here?" conclusion was
it was totally unnecessary and ignored by the compiler. The
programmer was trying to add something extra, like "magic" hoping
for something that would never happen.

IME poor or confusing uses of "constexpr" are for functions, not
objects, and C23 does not support "constexpr" for functions.

The sample C++ was something like

constexpr const char * s[] = {"a", "b"};
for (size_t i = 0; i < sizeof(s) / sizeof(s[0]); i++)
{
    //using s[i]
}
I checked in C, it is an error.

Apparently C23 has stricter rules for constexpr than C++ does. I can
imagine those rules being relaxed in future editions of the C standard.

From the proposal for "constexpr" in C23, <https://open-std.org/JTC1/SC22/WG14/www/docs/n3018.htm>, it says:

"""
There are some restrictions on the type of an object that can be
declared with constexpr storage duration. There is a limited number of constructs that are not allowed:

pointer types:
allowing these to use non-trivial addresses would delay the
deduction of the concrete value from translation to link-time. For most
of the use cases, such a feature can already be coded by using a static
and const qualified pointer object, we don’t need constexpr for that. Therefore we only allow pointer types if the initializer value is null.
"""

I'm not sure (and haven't looked at all the discussions involved, so I
could be completely wrong), but I think there is concern that constexpr pointers, other than null pointers, might need more features in the
linker than C currently requires. C++ already has more demands of
linkers to handle things like inline variables and statics in templates.
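The sketch below shows what that restriction means in practice. It assumes a C23 compiler for the first branch, and the names are illustrative. A `constexpr` pointer may only be null. The suggested alternative for real addresses is a `static` object with `const` on both the pointers and what they point to.

```c
#include <stddef.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
constexpr int *no_buffer = nullptr;   /* OK: null is the only permitted value */
/* constexpr const char *bad = "a";      would be rejected in C23 */
#else
static int *const no_buffer = NULL;
#endif

/* The alternative the proposal suggests for non-null pointers:
   static storage, const pointers to const data. */
static const char *const names[] = { "a", "b" };
static const size_t names_count = sizeof names / sizeof names[0];
```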

From Michael S@21:1/5 to Tim Rentsch on Fri May 24 12:05:44 2024

On Thu, 23 May 2024 17:37:39 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

[comments on various new features in C23]

Overall I am quite disappointed by C23. IMO it's a step
backwards rather than forwards.

W.r.t. [asctime() and ctime() being removed]
IMHO, all old-UNIX-style APIs that return pointers to static
objects within library or rely on presence of static object within
library for purpose of preserving state for subsequent calls
should be systematically deprecated and for majority of them there
should be provided thread-safe alternatives akin to ctime_s().

That is, with exception of family of functions that uses FILE*.
Not that I like them very much, but they are ingrained too deeply.
So, picking just asctime and ctime out of a long list of problematic
APIs does not appear particularly consistent. If they were asking
me where to start, I'd start with rand().

I agree with the suggestion that restartable versions of "dirty"
functions be added to the C standard. I strongly disagree that
the old ones should be taken out. If compilers choose to give
warnings, that's fine, but these functions should not be removed
just because some people think they are clunky.

[...] Just want to say that strfrom* family is long overdue, but
still appear incomplete. The guiding principle should be that all
format specifiers available in printf() with sole exception of %s
should be provided as strfrom* as well.

What's the motivation for having separate functions? To me this
looks like creeping featuritis.

My practical motivation is space-constrained environments, where I
possibly want one or two or three formatters. sprintf() gives me
all or nothing and all can be too expensive. Many embedded environments
have big and small variants of sprintf that can be chosen at link time,
but what's in small variant does not necessarily match a set that I
want in my specific project. And is not necessarily well documented.

My aesthetic motivation is a symmetry between strto* and strfrom*.

My esoteric motivation is : sprintf() is historically associated with
"standard I/O". Functionality in question has no relationship to I/O.
But let's leave it aside, it's not important.
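To illustrate the footprint argument, here is a single-purpose formatter that pulls in none of the printf machinery. It handles unsigned decimal only. This is a hypothetical hand-rolled helper, not one of the C23 `strfrom*` functions. As standardised, those cover only floating types (`strfromd`, `strfromf`, `strfroml`, plus decimal floating variants).

```c
#include <limits.h>
#include <stddef.h>

/* Writes the decimal form of v into buf (capacity n, including the
   terminating '\0'). Returns the number of digits written, or 0 if buf
   is too small. */
size_t utoa_dec(unsigned long v, char *buf, size_t n)
{
    char tmp[CHAR_BIT * sizeof v];   /* never more digits than bits */
    size_t len = 0;

    do {
        tmp[len++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);

    if (len + 1 > n)
        return 0;
    for (size_t i = 0; i < len; i++)
        buf[i] = tmp[len - 1 - i];   /* digits were produced in reverse */
    buf[len] = '\0';
    return len;
}
```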

From Tim Rentsch@21:1/5 to Michael S on Fri May 24 06:54:35 2024

Michael S <[email protected]> writes:

On Thu, 23 May 2024 17:37:39 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

[...] Just want to say that strfrom* family is long overdue, but
still appear incomplete. The guiding principle should be that all
format specifiers available in printf() with sole exception of %s
should be provided as strfrom* as well.

What's the motivation for having separate functions? To me this
looks like creeping featuritis.

My practical motivation is space-constrained environments, where I
possibly want one or two or three formatters. sprintf() gives me all
or nothing and all can be too expensive. Many embedded environments
have big and small variants of sprintf that can be chosen at link
time, but what's in small variant does not necessarily match a set
that I want in my specific project. And is not necessarily well
documented.

Okay, I see now where you're coming from, although I'm not sure that
the strfrom*() functions will give you what you want (in terms of
memory footprint, etc). But I get your motivation.

Question: which of the four formats (%A, %E, %F, %G) are ones you
expect to use? Also I'm curious: do all of your target platforms
use IEEE floating point, or do some use other representations?

From David Brown@21:1/5 to Keith Thompson on Fri May 24 15:45:52 2024

On 23/05/2024 22:06, Keith Thompson wrote:

David Brown <[email protected]> writes:

On 23/05/2024 14:11, bart wrote:

[...]

'embed' was discussed a few months ago. I disagreed with the poor
way it was to be implemented: 'embed' notionally generates a list of
comma-separated numbers as tokens, where you have to take care of
any trailing zero yourself if needed. It would also be hopelessly
inefficient if actually implemented like that.

Fortunately, it is /not/ actually implemented like that - it is only
implemented "as if" it were like that. Real prototype implementations
(for gcc and clang - I don't know about other tools) are extremely
efficient at handling #embed. And the comma-separated numbers can be
more flexible in less common use-cases.

[...]

I'm aware of a proposed implementation for clang:

https://github.com/llvm/llvm-project/pull/68620
https://github.com/ThePhD/llvm-project

I'm currently cloning the git repo, with the aim of building it so I can
try it out and test some corner cases. It will take a while.

I'm not aware of any prototype implementation for gcc. If you are, I'd
be very interested in trying it out.

I haven't seen anything concrete, but I believe I read about it in one
of the papers discussing #embed. It may have been just some tests and proofs-of-concept, and not a development branch or proposed implementation.

(And thanks for starting this thread!)

It's not easy to find a topic that is entirely about C, hasn't been
discussed to death already, has enough controversial aspects for a
serious discussion but not so many that it leads to fights and flames,
and is not so esoteric that it causes most readers' eyes to glaze over!

From David Brown@21:1/5 to Kaz Kylheku on Fri May 24 16:25:20 2024

On 23/05/2024 22:23, Kaz Kylheku wrote:

On 2024-05-23, David Brown <[email protected]> wrote:

So yes, I /could/ use enum constants for things that are not
enumerations. I /did/ use them for that. But going forward with C23,
I'll use constexpr instead.

The value of an enum is:

1. Compiler warns of incomplete switch cases.

(gcc -Wswitch or -Wswitch-enum)

To be clear - I will, without doubt, continue to use "enum" for
enumerations and enumerated types. For enumerations, enum gives all the advantages you mention and more (such as automatic choice of values).

But I'd rather use "constexpr" for constant expressions that are not enumerations.

2. In a debugger when you examine an enum-valued expression or
variable, you get the symbolic name:

3. Safety (with C++ enum rules: no implicit
conversion from ordinary integer type to enum).

(gcc -Wenum-compare -Wenum-conversion -Wenum-int-mismatch)

Historically, C code bases have abused enums to define constants
like "enum { bufsize = 1024 }" for understandable reasons, but it is a
cringe-inducing hack, which is also incomplete and inflexible; e.g. what
if we want a floating-point constant?

I've benefited from (3) in C programs that were contrived
to be compilable as C++. (That practice tends to increasingly
hamper your dialect choice, though, as the languages diverge and make
only small steps here and there to become closer.)
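A small illustration of the split being discussed, assuming gcc-style warnings: enum stays the tool for real enumerations, where -Wswitch (part of -Wall) can flag a missing case, while C23 constexpr handles named constants of any arithmetic type, including the floating-point case the enum hack cannot express. The names colour, bufsize and scale are made up for the example, and the #if fallback exists only so the sketch also builds in C17 mode:

```c
#include <assert.h>
#include <string.h>

/* A real enumeration: keep using enum.  With no default label, gcc's
   -Wswitch warns if one of the enumerators is not handled. */
enum colour { RED, GREEN, BLUE };

static const char *colour_name(enum colour c)
{
    switch (c) {
    case RED:   return "red";
    case GREEN: return "green";
    case BLUE:  return "blue";
    }
    return "unknown";   /* only reached for out-of-range values */
}

/* Named constants that are not an enumeration.  C23 constexpr covers
   integer and floating types alike; the fallback shows the old enum hack,
   which has no way to express the double. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
constexpr int    bufsize = 1024;
constexpr double scale   = 0.5;
#else
enum { bufsize = 1024 };
static const double scale = 0.5;   /* an object, not a constant expression */
#endif

/* bufsize is an integer constant expression in both branches, so it can
   size an ordinary (non-VLA) array. */
static char scratch[bufsize];
```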

From David Brown@21:1/5 to Keith Thompson on Fri May 24 16:19:49 2024

On 23/05/2024 18:43, Keith Thompson wrote:

bart <[email protected]> writes:
[...]

I suspect that ones like 'embed' have been derived from C++ which
always likes to make things too wide-ranging and much harder to use
and implement than necessary.

No, C++ doesn't have #embed. (If it did, many C compilers would already
have it, since C and C++ commonly share the preprocessor
implementation.)

C++ has proposals for both #embed and std::embed<>, but AFAIK these are
not yet accepted. I expect #embed to make it (since the big tools will
support it for C anyway). std::embed<> is more powerful but has
additional complications.

From David Brown@21:1/5 to Malcolm McLean on Fri May 24 16:39:02 2024

On 24/05/2024 01:06, Malcolm McLean wrote:

On 22/05/2024 20:50, David Brown wrote:

On 22/05/2024 21:10, Malcolm McLean wrote:

But even boolean type and const.

Const documents the code, makes the action of a function clearer to
the reader, and helps catch mistakes.

These are all things that make the language better, and have done so
for the past 25 years.

Of course quite a lot of the functions don't actually change the
structures they are passed. But is littering the code with const
going to help? And why do you really need a boolean when an int can
hold either a zero or non-zero value?

And don't you just want a pared down, clean language?

I want a language with the features I need and that help me to write
good clear code. Minimal is not helpful, any more than needlessly
complex is helpful.

So the code I'm working on at the moment.

It's an implementation of XPath (a subset, of course). XPath is a sort of
query language for XML. You pass a query string like
"/bookstore/book//title", and that selects all descendants of [root]/bookstore/book with the element tag "title".

Now, querying the document shouldn't change it. So in C++ it should
be passed in as an XMLDOC const &. In C, declaring the pointer a const
XMLDOC * conveys the intention, but doesn't actually achieve the safety
you want and get with C++.

The safety is the same in C and C++ (unless your C++ code provides const
and non-const overloads for the function). References in C++ don't let
you pass a null pointer, but you can "cast away" the const in a const
reference as easily as you can remove the const in a const pointer:

void naughty(const int & x) {
    int & y = (int &) x;
    y++;
}

In both cases, the "const" is a promise to the reader and a promise to
the compiler, but you can break that promise if you do so explicitly.

However, the algorithm I have just moved to needs a bit associated with
each node that it can turn on and off. Now in fact I did this via a hash
table. But it is very tempting and far more efficient to simply add a
hacky field to the XMLNODE structure - after all, I wrote the XML
parser. And in C++, "mutable" is designed for just this. But in C,
we're either const or not. And isn't it maybe better to leave the
const qualifier off the document pointer?

"mutable" is just a kosher way of breaking your const promises. In
cases where "mutable" might be useful, I generally prefer to
differentiate between the part of structure that is fixed and
unchanging, and the part that is more volatile status. (This can also
be better from the viewpoint of cache friendliness, if that is of
concern.) But if I had a situation where C++ "mutable" would be the
best choice, and I had to implement it in C without "mutable", I am not
sure that casting to non-const in the implementation function would be
much worse.
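One way to do the split described above, sketched with hypothetical types rather than Malcolm's actual XMLNODE: the parsed tree stays const, each node carries a small index assigned by the parser (an assumption of this sketch), and the per-query flags live in a separate array owned by the query. It is an array-based alternative to the hash table Malcolm mentions, not his code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node type: built by the parser, then only ever read. */
struct xml_node {
    const char *tag;
    size_t id;                          /* 0 .. node_count-1, set by the parser */
    const struct xml_node *first_child;
    const struct xml_node *next_sibling;
};

/* The changeable status, kept outside the document: one flag per node. */
struct node_marks {
    bool *flag;
    size_t count;
};

static bool marks_init(struct node_marks *m, size_t node_count)
{
    m->flag = calloc(node_count, sizeof *m->flag);   /* all false */
    m->count = node_count;
    return m->flag != NULL;
}

static void marks_free(struct node_marks *m)
{
    free(m->flag);
    m->flag = NULL;
}

/* The document is only ever seen through const pointers; the query
   changes its own mark table instead. */
static void mark(struct node_marks *m, const struct xml_node *n)
{
    assert(n->id < m->count);
    m->flag[n->id] = true;
}

static bool is_marked(const struct node_marks *m, const struct xml_node *n)
{
    assert(n->id < m->count);
    return m->flag[n->id];
}
```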

In fact, wouldn't we just be better off without const?

No.

We'd be better off having everything const - /really/ constant - by
default, and having to explicitly declare the few things that actually
have to be changed after initialisation. That's how many modern
programming languages do it.

After all, you
need to read the function specifications anyway, and they should say
that querying for a path will not alter the document.

/Never/ write things in comments or documentation if you can express the
same thing in code.

From David Brown@21:1/5 to Chris M. Thomasson on Fri May 24 16:50:28 2024

On 24/05/2024 01:05, Chris M. Thomasson wrote:

On 5/23/2024 6:35 AM, David Brown wrote:

On 22/05/2024 23:24, Chris M. Thomasson wrote:

On 5/22/2024 9:55 AM, David Brown wrote:

In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change
in it, especially compared to the minor changes in C17.

Love the way std::vectors respect alignas... C++20, iirc?

[...]

I have no idea what you are talking about.

std::vector actually respects alignas, on MSVC at least. I did not know
this worked until I tried it. Iirc, Bonita was the one that sparked my
test. It aligned itself on the proper boundaries. Very nice.

But did you notice that this is c.l.c, not c.l.c++, and the topic is
C23, not C++23 ? Discussing comparisons or compatibility with C++ is
fair enough, but talking about pure C++ matters (such as
std::vector<>) is unlikely to be helpful.

C has it as well... Very useful!

I know C has alignas (now as a keyword in C23, instead of just _Alignas
from C11).

I know C++ has alignas (from C++11 onwards).

What I don't understand is why you think std::vector<> "respects
alignas" in C++20 - alignment for std::vector<> works like alignment for
any other class in C++, and always has done.

And what I /really/ don't understand is why you think it is remotely
relevant here? Even "alignas" in C is not particularly relevant to this
thread, except that it has become a keyword in C23 instead of a macro
defined to _Alignas in <stdalign.h>.

Perhaps I should just be grateful for the small mercy of there being no
random youtube link in your post.
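For completeness, code written with <stdalign.h> reads the same before and after the keyword change: in C11 and C17 the header supplies alignas and alignof as macros, and in C23 they are keywords while the header is still there to include. A minimal sketch; the 64-byte figure and the names are arbitrary:

```c
#include <assert.h>     /* static_assert macro before C23 */
#include <stdalign.h>   /* alignas/alignof macros before C23; keywords in C23 */
#include <stdint.h>

/* A cache-line aligned buffer, spelled the same way in C11, C17 and C23. */
struct line_buffer {
    alignas(64) unsigned char data[64];
};

static_assert(alignof(struct line_buffer) == 64, "expected 64-byte alignment");

/* True if p lies on an n-byte boundary (n must be a power of two). */
static int is_aligned(const void *p, uintptr_t n)
{
    return ((uintptr_t)p & (n - 1)) == 0;
}
```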

From David Brown@21:1/5 to Keith Thompson on Fri May 24 17:10:31 2024

On 23/05/2024 18:40, Keith Thompson wrote:

Michael S <[email protected]> writes:
[...]

Removed

[...]

7) static_assert is not provided as a macro defined in <assert.h>
(becomes a keyword)
8) thread_local is not provided as a macro defined in <threads.h>
(becomes a keyword)

[...]

7) bad. Breaks existing code for weak reason
8) bad. Breaks existing code for weak reason

In pre-C23, _Static_assert and _Thread_local are keywords, and
static_assert and thread_local are macros that expand to those keywords.

In C23, _Static_assert, _Thread_local, static_assert, and thread_local
are all keywords. Code that simply uses the old ugly keywords would not break.

Code that does something like "#ifdef static_assert" could break. I suppose the
headers could have retained the old macro definitions:

#define static_assert static_assert
#define thread_local thread_local

The sort of code that could theoretically break is when you have
definitions like this:

#define STATIC_ASSERT_NAME_(line) STATIC_ASSERT_NAME2_(line)
#define STATIC_ASSERT_NAME2_(line) assertion_failed_at_line_##line
#define static_assert(claim, warning) \
    typedef struct { \
        char STATIC_ASSERT_NAME_(__COUNTER__) [(claim) ? 2 : -2]; \
    } STATIC_ASSERT_NAME_(__COUNTER__)

That works in any C version, until C23, almost as well as
_Static_assert. I used this when C11 support was rare in the tools I used.

While using #define for a C keyword is undefined behaviour, in practice
I think you'd have a hard time finding code and a compiler that used
such a macro and which did not work just as well in C23 mode.

(I don't know if anyone is in the habit of declaring macros named "thread_local".)
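If code has to build both before and after C23, one way to avoid the clash is to keep the home-made fallback under a project-specific name instead of redefining static_assert itself. A minimal sketch; COMPAT_STATIC_ASSERT and its helpers are hypothetical names, and the fallback uses __LINE__ rather than the non-standard __COUNTER__, so two assertions on the same line would collide:

```c
#include <assert.h>   /* static_assert: a macro in C11/C17, a keyword in C23 */

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#  define COMPAT_STATIC_ASSERT(cond, msg) static_assert(cond, msg)
#else
   /* C90/C99 fallback: a negative array size is a compile-time error.
      The message is not shown; the typedef name hints at the line. */
#  define COMPAT_SA_NAME_(line)  COMPAT_SA_NAME2_(line)
#  define COMPAT_SA_NAME2_(line) compat_static_assert_at_line_##line
#  define COMPAT_STATIC_ASSERT(cond, msg) \
       typedef char COMPAT_SA_NAME_(__LINE__)[(cond) ? 1 : -1]
#endif

COMPAT_STATIC_ASSERT(sizeof(long) >= sizeof(int), "long narrower than int");
```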

From David Brown@21:1/5 to Michael S on Fri May 24 17:57:35 2024

On 23/05/2024 23:34, Michael S wrote:

On Thu, 23 May 2024 22:10:22 +0200
David Brown <[email protected]> wrote:

On 23/05/2024 16:19, Michael S wrote:

On Wed, 22 May 2024 18:55:36 +0200
David Brown <[email protected]> wrote:

In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change
in it, especially compared to the minor changes in C17.

Why is the C Standard Committee, while recently being quite liberal in
the field of introducing new keywords (too liberal for my liking; many
new things do not really deserve keywords not prefixed by __), so
conservative in introducing program control constructs? I don't
remember any new program control construct introduced under the Committee regime.
And I want at least one.

What program control construct would you like?

Ability to break from nested loops. Ability to "continue" outer loops
would be nice too, but less important.
I am not sure what syntax I want for this feature; I never considered
myself a competent language designer.

I've heard people request this before. I can't say I've ever felt it
was something I'd have use for, but there's lots of things in C that I
never need.

There is a proposal for adding it to C:

<https://open-std.org/JTC1/SC22/WG14/www/docs/n3195.htm>
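Until something like that proposal is adopted, the usual C workaround for breaking out of nested loops is a goto to a label just past the outer loop. A minimal sketch; find_in_grid() is a hypothetical example, with the grid stored row by row in a flat array:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Search a rows x cols grid for target.  On success, store the position
   and return true.  C has no "break 2", so a goto to a label after both
   loops does the job. */
static bool find_in_grid(const int *grid, size_t rows, size_t cols,
                         int target, size_t *row_out, size_t *col_out)
{
    size_t r, c;

    assert(grid != NULL || rows * cols == 0);
    for (r = 0; r < rows; r++) {
        for (c = 0; c < cols; c++) {
            if (grid[r * cols + c] == target)
                goto found;            /* leaves both loops at once */
        }
    }
    return false;                      /* both loops ran to completion */

found:
    *row_out = r;
    *col_out = c;
    return true;
}
```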

Another area that has been mostly unchanged since the 1st edition of K&R is
storage classes. Even such an obvious thing as removal of the 'auto' class
took too long. If I am not mistaken, the totally obsolete 'register'
class is still allowed.

"register" is still in C23. (Some compilers pay attention to it.
gcc with optimisation disabled puts local variables on the stack,
except for those marked "register" that get put in registers.) It
got dropped from C++ when "auto" was re-purposed in C++11, but with
the keyword "register" kept for future use. I would not have
objected to the same thing happening in C23.

And I don't remember any additions.

_Thread_local was added in C11, with the alias thread_local in C23.

_Thread_local is a special-purpose thing, probably not applicable at
all for programming of small embedded systems, which nowadays is the
only type of programming in C that I do for money rather than as hobby.

I have never seen the point of it either. Why would anyone want a
variable that exists for /all/ threads in a program, but independently
per thread? The only use I can think of is for errno (which is, IMHO, a
horror unto itself) but since that is defined by the implementation, it
does not need to use _Thread_local. (Indeed, thread-local errno macros
existed long before C11.)

You and I (as small embedded systems programmers) are perhaps biased in
being allergic to the wasted bytes of ram a thread-local variable would
likely use!

With regard to constexpr, mentioned above by James Kuyper, my feeling
about it is that it belongs to metaprogramming so I would not consider
it a real storage class.

The term "storage-class specifier" is a bit of a misnomer, in that it is
more of a syntactic term than referring just to the storage duration or placement of objects. "typedef" is also a storage-class specifier, for example.

What would you like to see here?

Instead of solutions, let's talk about problems that I want to solve:

Good idea.

1. global objects, declared in header files and included several times.
Where defined?

In C, they must be defined in exactly one translation unit.

For some linkers, mostly unixy linkers, in the case of non-initialized
objects (implicitly initialized to zero) it somehow works.

The use of "int global_x;" in headers is undefined behaviour (AFAIK) in
C, and its support is a hangover from linker support for Fortran common
blocks. And it is the source of many odd errors for people who are not
careful enough in their coding. If your compiler supports this
misfeature (such as "gcc -fcommon"), and you accidentally declare two uninitialised non-static variables with the same name in two files, you
have no detection or protection from the chaos that ensues. With
compilers that don't support this ("gcc -fno-common"), you get a
link-time error showing your problem. (gcc made "-fno-common" the
default in version 10. That's 9 major versions late, IMHO, but better
late than never.)

(To the extent that it "works", it is handled by putting the symbol name
and data space reservation in a "common" section. At link time, common
symbols with the same name are merged - whether that makes sense for the
code or not.)

For linkers used on embedded systems it requires additional effort.

I can't say I have ever seen it as an effort. Almost all my C "modules"
come in pairs - "file.h" and "file.c". All non-local variables (and all functions) are either static and declared only in "file.c", or they are externally linked and have an "extern" declaration in "file.h" and a
definition (with or without initialisation) in "file.c" (which #includes "file.h"). It is a very simple and clean arrangement, easily checked by
gcc warnings, and there are never any undetected conflicts.

(And probably 90% or more of current small-systems embedded development
uses gcc and binutils linker.)
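The .h/.c arrangement described above, condensed into one file so it can be compiled on its own. The module name counter is hypothetical; the comments mark where the real file boundary would be:

```c
/* ---- counter.h: the module's interface ---- */
#ifndef COUNTER_H
#define COUNTER_H
extern int counter_total;        /* declaration only: no storage here */
void counter_add(int n);
int counter_calls(void);
#endif

/* ---- counter.c: the implementation.  In a real project this part is a
   separate file that starts with #include "counter.h", so the compiler
   sees declaration and definition together and rejects any mismatch. ---- */
#include <assert.h>

int counter_total = 0;           /* the single definition */
static int add_calls;            /* internal linkage: invisible to other files */

void counter_add(int n)
{
    assert(n >= 0);
    counter_total += n;
    add_calls++;
}

int counter_calls(void)
{
    return add_calls;
}
```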

I think, for initialized globals it takes additional effort even with
unixy linkers.
I want it to "just work" everywhere. I think that the best way to get
it without breaking existing semantics is a new storage class.

This is all very much a non-issue for well-structured code.

The only time it can matter is if you want to write "header-only"
modules. This is popular in C++, but not in C. In C++ it relies on the
linker merging the same symbols for inline declarations such as inline functions, template functions, template class and function statics, and
- since C++17 - inline variables. So you can write "inline int
global_x;" or "inline int global_y = 123;" in a header, and it will be
created once and only once (if it is used somewhere). The
compiler/linker can't check for consistency of initialiser, unless you
are using link-time optimisation.

2. Reversing defaults for visibility of objects and functions at file
scope.
Something like:
#pragma export_by_default(off).
When this pragma is in effect, we need a way to make objects and
functions globally visible. I think that it's done best with new
storage class.

I would much prefer if file-level variables and functions were "static"
by default and required explicit exporting. But that ship sailed 50
years ago.

What you can do - what /I/ do - is be rigid in making sure all your
exported variables and functions are declared as "extern" in a header
that is also included by the defining C file. Use "gcc -Werror=missing-declarations -Werror=missing-variable-declarations" to
enforce these rules. It is not quite as good as going back in time and
fixing C at the start, but it's close!

From Michael S@21:1/5 to Tim Rentsch on Fri May 24 18:46:23 2024

On Fri, 24 May 2024 06:54:35 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Thu, 23 May 2024 17:37:39 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

[...] Just want to say that the strfrom* family is long overdue, but
still appears incomplete. The guiding principle should be that all
format specifiers available in printf() with the sole exception of %s
should be provided as strfrom* as well.

What's the motivation for having separate functions? To me this
looks like creeping featuritis.

My practical motivation is space-constrained environments, where I
possibly want one or two or three formatters. sprintf() gives me
all or nothing, and all can be too expensive. Many embedded
environments have big and small variants of sprintf that can be
chosen at link time, but what's in the small variant does not
necessarily match the set that I want in my specific project. And it is
not necessarily well documented.

Okay, I see now where you're coming from, although I'm not sure that
the strfrom*() functions will give you what you want (in terms of
memory footprint, etc). But I get your motivation.

Question: which of the four formats (%A, %E, %F, %G) are ones you
expect to use?

Rarely: any of those, mostly for debugging.
In production code: %e is most likely, but %f could happen.
But it's not just floating point. "Small" variants of sprintf() on
32-bit platforms are often unable to handle %lld and %llu.

Also I'm curious: do all of your target platforms
use IEEE floating point, or do some use other representations?

Currently, only IEEE. In the past, there were others, but that was quite
a long time ago. Back when, after a few years in another field, I had just
started my professional programming career, I spent a couple of years doing
mostly TMS320C30. I don't remember for sure, but it is likely that I
never used formatted FP output there; our boards were probably too
short of memory for that.

From Scott Lurndal@21:1/5 to David Brown on Fri May 24 16:16:27 2024

David Brown <[email protected]> writes:

On 23/05/2024 23:34, Michael S wrote:

On Thu, 23 May 2024 22:10:22 +0200
David Brown <[email protected]> wrote:

_Thread_local is a special-purpose thing, probably not applicable at
all for programming of small embedded systems, which nowadays is the
only type of programming in C that I do for money rather than as hobby.

I have never seen the point of it either. Why would anyone want a
variable that exists for /all/ threads in a program, but independently
per thread?

Very common in kernel programming (e.g. the use of '%gs' in x86_linux)
as a pointer to the 'per-cpu' data structure.

We use thread local to implement 'self' methods in certain
classes (so rather than passing pointers around, one can
simply call class::self() to get a pointer to the
class object for each thread).

class c_processor {
    ...
    /**
     * Per-thread value of the processor object.
     */
    static __thread c_processor *p_this;
    ...

public:
    c_processor(c_system *, c_logger *, processor_number_t, bool);
    ~c_processor(void);

    static c_processor *self(void) { return p_this; }

    ...

c_processor *pp = c_processor::self();
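The same "self" idiom works in C with _Thread_local (C11; C23 adds thread_local as a keyword); gcc's __thread used above is the older GNU spelling of the same storage class. A minimal sketch; struct processor and the two functions are hypothetical stand-ins for c_processor and its self():

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread context, standing in for c_processor. */
struct processor {
    int number;
};

/* One pointer per thread; every thread starts with its own null copy. */
static _Thread_local struct processor *current;

/* Each worker thread calls this once, early, with its own object. */
static void processor_bind(struct processor *p)
{
    assert(p != NULL);
    current = p;
}

/* Anywhere later in the same thread: no pointer needs passing around. */
static struct processor *processor_self(void)
{
    return current;
}
```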

From Michael S@21:1/5 to David Brown on Fri May 24 19:22:56 2024

On Fri, 24 May 2024 17:57:35 +0200
David Brown <[email protected]> wrote:

I can't say I have ever seen it as an effort. Almost all my C
"modules" come in pairs - "file.h" and "file.c". All non-local
variables (and all functions) are either static and declared only in "file.c", or they are externally linked and have an "extern"
declaration in "file.h" and a definition (with or without
initialisation) in "file.c" (which #includes "file.h"). It is a very
simple and clean arrangement, easily checked by gcc warnings, and
there are never any undetected conflicts.

Declaration/definition pair is repeating yourself, which is not a good
thing.
Of course, the same applies to declaration/definition of externally
visible functions, but somehow in the case of functions I am more tolerant
of repetition than in the case of variables. Probably a psychological
phenomenon - I feel that functions are less trivial, so repetition is
less wasteful.
But I'd like to get rid of these repetitions too; I just have not figured
out a way to do it that does not compromise the even more important concern
of separation between interface and implementation (yes, I dislike Java
for that reason too).

From bart@21:1/5 to Michael S on Fri May 24 19:38:16 2024

On 24/05/2024 17:22, Michael S wrote:

On Fri, 24 May 2024 17:57:35 +0200
David Brown <[email protected]> wrote:

I can't say I have ever seen it as an effort. Almost all my C
"modules" come in pairs - "file.h" and "file.c". All non-local
variables (and all functions) are either static and declared only in
"file.c", or they are externally linked and have an "extern"
declaration in "file.h" and a definition (with or without
initialisation) in "file.c" (which #includes "file.h"). It is a very
simple and clean arrangement, easily checked by gcc warnings, and
there are never any undetected conflicts.

Declaration/definition pair is repeating yourself, which is not a good
thing.
Of course, the same applies to declaration/definition of externally
visible functions, but somehow in the case of functions I am more tolerant
of repetition than in the case of variables. Probably a psychological
phenomenon - I feel that functions are less trivial, so repetition is
less wasteful.
But I'd like to get rid of these repetitions too; I just have not figured
out a way to do it that does not compromise the even more important concern
of separation between interface and implementation (yes, I dislike Java
for that reason too).

I normally use a private systems language which some here have claimed
is just C with a different syntax.

Nevertheless, this particular problem has been solved:

* There is only a single definition of any function, variable, type,
struct, enum or macro

* No separate declarations are needed. Definitions can appear in any order

* There are no header files

* Exported definitions have a 'global' attribute

It also has a module scheme so that, for example, only the lead module
of a program needs to be submitted to the compiler.

C's facilities for this stuff are quite crude. It would be difficult to retro-fit a scheme like mine.

At best, a separate preprocessing pass can be done before normal
compilation, which can produce the necessary declarations. Or perhaps a
clever IDE can generate some of this stuff.

But working only with the raw language as it is now, the tidiest
solution is the .h/.c file pairs that have been mentioned.

From bart@21:1/5 to Keith Thompson on Fri May 24 21:20:52 2024

On 24/05/2024 21:06, Keith Thompson wrote:

bart <[email protected]> writes:
[...]

I normally use a private systems language which some here have claimed
is just C with a different syntax.

[...]

I don't recall anyone claiming that.

When it was last discussed it was in a different group but the same
people who post here.

From Lawrence D'Oliveiro@21:1/5 to David Brown on Fri May 24 23:51:20 2024

On Fri, 24 May 2024 16:50:28 +0200, David Brown wrote:

I know C has alignas ...

Just for a moment, I wondered “what is an aligna?” ...

From Lawrence D'Oliveiro@21:1/5 to Malcolm McLean on Sat May 25 00:31:03 2024

On Fri, 24 May 2024 07:47:48 +0100, Malcolm McLean wrote:

I virtually always use goto for memory allocation failure.

It does mean that, strictly, the function is no longer a "structured" subroutine. But reality is usually that memory allocation failure will
mean program termination pretty soon.

Hmm, there may be a point in that. Consider also that Linux systems are typically configured to overcommit memory allocations: they never say
“no”, but when they start running low, then they start killing the big memory hogs.

However, there are other dynamic checks that may need to be done. For
example, trying to load an image, and discovering that your decoder cannot handle it, possibly because it is corrupted or the wrong format
altogether. It would be nice to recover gracefully from this sort of
situation. And not have the decoder crash or leak memory.
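One common shape for that kind of graceful recovery is the single cleanup label Malcolm describes, used for every failure rather than just allocation. A minimal sketch with an invented toy format (width, height and palette-size bytes, then an RGB palette, then one palette index per pixel); decode_image() and struct image are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decoded image. */
struct image {
    size_t width, height, ncolours;
    unsigned char *palette;     /* ncolours * 3 bytes, RGB */
    unsigned char *pixels;      /* width * height palette indices */
};

/* Returns 0 and fills *out on success.  Returns -1 on truncated or corrupt
   input or on allocation failure; nothing leaks and *out is untouched. */
static int decode_image(const unsigned char *data, size_t len, struct image *out)
{
    unsigned char *palette = NULL, *pixels = NULL;
    size_t w, h, n, pal_bytes, npix, i;

    assert(data != NULL && out != NULL);
    if (len < 3)
        goto fail;                      /* truncated header */
    w = data[0];
    h = data[1];
    n = data[2];
    pal_bytes = n * 3;
    npix = w * h;
    if (n == 0 || npix == 0 || len - 3 < pal_bytes + npix)
        goto fail;                      /* wrong size: corrupt or not this format */

    palette = malloc(pal_bytes);
    if (palette == NULL)
        goto fail;                      /* allocation failure */
    memcpy(palette, data + 3, pal_bytes);

    pixels = malloc(npix);
    if (pixels == NULL)
        goto fail;
    for (i = 0; i < npix; i++) {
        pixels[i] = data[3 + pal_bytes + i];
        if (pixels[i] >= n)
            goto fail;                  /* index outside the palette */
    }

    out->width = w;
    out->height = h;
    out->ncolours = n;
    out->palette = palette;
    out->pixels = pixels;
    return 0;

fail:
    /* Single exit for every error.  free(NULL) does nothing, so it does
       not matter how far decoding got before it failed. */
    free(pixels);
    free(palette);
    return -1;
}
```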

From Lawrence D'Oliveiro@21:1/5 to David Brown on Sat May 25 00:40:10 2024

On Fri, 24 May 2024 17:57:35 +0200, David Brown wrote:

Why would anyone want a variable that exists for /all/ threads in a
program, but independently per thread? The only use I can think of is
for errno (which is, IMHO, a horror unto itself) but since that is
defined by the implementation, it does not need to use _Thread_local.

errno is indeed the example that immediately comes to mind for the use of
this feature. It is supposed to have the semantics of an assignable
variable, so how else would you implement it, if not by the _Thread_local mechanism or some implementation-specific, special-case equivalent of it?
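As an example of such a special-case equivalent: glibc's <errno.h> defines errno as (*__errno_location ()), a function returning a pointer to the calling thread's variable, which keeps errno an assignable lvalue. A user-level sketch of the same shape with _Thread_local; my_errno, checked_div() and the error value 1 are hypothetical:

```c
#include <assert.h>

/* Storage: one int per thread. */
static _Thread_local int my_errno_storage;

/* Returning an address lets the macro below be an lvalue, so callers can
   both read it and assign to it, exactly as with errno. */
static int *my_errno_location(void)
{
    return &my_errno_storage;
}

#define my_errno (*my_errno_location())

/* Example function that reports failure through my_errno. */
static int checked_div(int a, int b)
{
    if (b == 0) {
        my_errno = 1;   /* hypothetical error code */
        return 0;
    }
    return a / b;
}
```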

I am in two minds over whether errno is a hack or not. On the one hand, it makes more sense for system calls (and library ones, too) to return an
error status directly; on the other hand, sometimes maybe you want to “accumulate” an error status after a series of calls, and errno is a convenient way of doing this.

As for other uses of thread-local, I think most of them have to do with optimizations, like threading itself. For example, imagine a bunch of
threads all contributing increments to a common counter: instead of
continually blocking on access to that counter, they could each have their
own thread-local counter, which periodically has its current value added
to the global counter and then zeroed.
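The batched counter described above, as a minimal sketch. Assumptions: an atomic add (C11 <stdatomic.h>) rather than a lock for the shared total, a made-up batch size of 64, and each thread calling flush_events() before it exits so that its remainder is not lost:

```c
#include <assert.h>
#include <stdatomic.h>

#define FLUSH_EVERY 64   /* hypothetical batch size */

static atomic_ulong total_events;                  /* shared, updated rarely */
static _Thread_local unsigned long local_events;   /* private to each thread */

/* Move this thread's pending count into the shared total and reset it.
   A thread must call this before exiting, or its last few counts are lost. */
static void flush_events(void)
{
    if (local_events != 0) {
        atomic_fetch_add(&total_events, local_events);
        local_events = 0;
    }
}

/* Hot path: touches only thread-local data except once per batch. */
static void count_event(void)
{
    if (++local_events >= FLUSH_EVERY)
        flush_events();
}
```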

From Lawrence D'Oliveiro@21:1/5 to Michael S on Sat May 25 00:32:08 2024

On Fri, 24 May 2024 19:22:56 +0300, Michael S wrote:

Declaration/definition pair is repeating yourself, which is not a good [thing].

But it is standard practice in all languages with a decent module system
which has separation of interface and implementation.

From Tim Rentsch@21:1/5 to Michael S on Sat May 25 03:01:07 2024

Michael S <[email protected]> writes:

On Fri, 24 May 2024 06:54:35 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

On Thu, 23 May 2024 17:37:39 -0700
Tim Rentsch <[email protected]> wrote:

Michael S <[email protected]> writes:

[...] Just want to say that the strfrom* family is long overdue, but
still appears incomplete. The guiding principle should be that all
format specifiers available in printf() with the sole exception of %s
should be provided as strfrom* as well.

What's the motivation for having separate functions? To me this
looks like creeping featuritis.

My practical motivation is space-constrained environments, where I
possibly want one or two or three formatters. sprintf() gives me
all or nothing, and all can be too expensive. Many embedded
environments have big and small variants of sprintf that can be
chosen at link time, but what's in the small variant does not
necessarily match the set that I want in my specific project. And it is
not necessarily well documented.

Okay, I see now where you're coming from, although I'm not sure that
the strfrom*() functions will give you what you want (in terms of
memory footprint, etc). But I get your motivation.

Question: which of the four formats (%A, %E, %F, %G) are ones you
expect to use?

Rarely: any of those, mostly for debugging.
In production code: %e is most likely, but %f could happen.

If you can get by without %g, I recommend writing your own. The
effort needed isn't trivial but it isn't impossibly large either.
(If you really need %g that's a whole other kettle of fish... and
really old smelly fish at that. :)

But it's not just floating point. "Small" variants of sprintf()
on 32-bit platforms are often unable to handle %lld and %llu.

Here again, just write them. Easy as falling off a log.

Also I'm curious: do all of your target platforms
use IEEE floating point, or do some use other representations?

Currently, only IEEE. [...]

My comments above are predicated on being able to count on
floating point being in IEEE format.

Oh, if you want more information about this, please feel free
to email me.
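Tim's "just write them" for %lld and %llu, as a minimal sketch: repeated division by 10, with the signed version negating in unsigned arithmetic so LLONG_MIN needs no special case. fmt_ull() and fmt_ll() are hypothetical names. On a 32-bit target the 64-bit division still calls a compiler support routine, but that is usually a much smaller dependency than a full printf:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write v in decimal to buf, which holds size bytes including the
   terminating null.  Returns the number of characters written, or 0
   (leaving buf untouched) if the result would not fit. */
static size_t fmt_ull(char *buf, size_t size, unsigned long long v)
{
    char tmp[3 * sizeof v];     /* ample: 64 bits need at most 20 digits */
    size_t n = 0;

    assert(buf != NULL);
    do {                        /* digits come out least significant first */
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);

    if (n + 1 > size)
        return 0;
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];    /* reverse into the caller's buffer */
    buf[n] = '\0';
    return n;
}

/* Signed variant: format the magnitude after a '-' sign.  0ULL - (unsigned)v
   is well defined for every negative v, including LLONG_MIN. */
static size_t fmt_ll(char *buf, size_t size, long long v)
{
    if (v >= 0)
        return fmt_ull(buf, size, (unsigned long long)v);
    if (size < 2)
        return 0;
    size_t n = fmt_ull(buf + 1, size - 1, 0ULL - (unsigned long long)v);
    if (n == 0)
        return 0;
    buf[0] = '-';
    return n + 1;
}
```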

From David Brown@21:1/5 to Keith Thompson on Sat May 25 13:11:37 2024

On 25/05/2024 03:29, Keith Thompson wrote:

Keith Thompson <[email protected]> writes:

David Brown <[email protected]> writes:

On 23/05/2024 14:11, bart wrote:

[...]

'embed' was discussed a few months ago. I disagreed with the poor
way it was to be implemented: 'embed' notionally generates a list of
comma-separated numbers as tokens, where you have to take care of
any trailing zero yourself if needed. It would also be hopelessly
inefficient if actually implemented like that.

Fortunately, it is /not/ actually implemented like that - it is only
implemented "as if" it were like that. Real prototype implementations
(for gcc and clang - I don't know about other tools) are extremely
efficient at handling #embed. And the comma-separated numbers can be
more flexible in less common use-cases.

[...]

I'm aware of a proposed implementation for clang:

https://github.com/llvm/llvm-project/pull/68620
https://github.com/ThePhD/llvm-project

I'm currently cloning the git repo, with the aim of building it so I can
try it out and test some corner cases. It will take a while.

I'm not aware of any prototype implementation for gcc. If you are, I'd
be very interested in trying it out.

(And thanks for starting this thread!)

I've built this from source, and it mostly works. I haven't seen it do
any optimization; the `#embed` directive expands to a sequence of comma-separated integer constants.

Which means that this:

#include <stdio.h>
int main(void) {
    struct foo {
        unsigned char a;
        unsigned short b;
        unsigned int c;
        double d;
    };
    struct foo obj = {
#embed "foo.dat"
    };
    printf("a=%d b=%d c=%d d=%f\n", obj.a, obj.b, obj.c, obj.d);
}

given "foo.dat" containing bytes with values 1, 2, 3, and 4, produces
this output:

a=1 b=2 c=3 d=4.000000

That is what you would expect by the way #embed is specified. You would
not expect to see any "optimisation", since optimisations should not
change the results (apart from choosing between alternative valid
results).

Where you will see the optimisation difference is between :

const int xs[] = {
#embed "x.dat"
};

and

const int xs[] = {
#include "x.csv"
};

where "x.dat" is a large binary file, and "x.csv" is the same data as comma-separated values. The #embed version will compile very much
faster, using far less memory. /That/ is the optimisation.

From David Brown@21:1/5 to Thiago Adams on Sat May 25 13:05:42 2024

On 25/05/2024 02:27, Thiago Adams wrote:

Em 5/24/2024 5:19 PM, Keith Thompson escreveu:

Thiago Adams <[email protected]> writes:

On 24/05/2024 16:45, Keith Thompson wrote:

Thiago Adams <[email protected]> writes:

On 23/05/2024 18:49, Keith Thompson wrote:

error: 'constexpr' pointer initializer is not null
5 |     constexpr char * s[] = {"a", "b"};

Then we were asking why constexpr was used in that case.

Why not?

When I see a constexpr I ask if the compiler is able to compute
everything at compile time. If not immediately, it is a bad usage in my view.

I don't understand. Do you object because it's not *immediately
obvious* that everything can be computed at compile time? If so, why
should it have to be?

My understanding is that constexpr is a hint for the compiler. It does not
ensure anything, unless you use it where a constant expression is required.
So I don't like to see constexpr where I know it is not a constant
expression.

Your understanding is incorrect. "constexpr" is not a mere hint.

I think I can explain a little better.

Let's consider a compile-time array of integers and a loop.

https://godbolt.org/z/e8cM1KGWT

#include <stdio.h>
#include <stdlib.h>
int main() {
    constexpr int a[] = {1, 2, 3, 4, 5, 6, 7, 8};
    for (int i = 0 ; i < sizeof(a)/sizeof(a[0]); i++)
    {
        printf("%d", a[i]);
    }
}

What did the programmer expect from using a constant array in a loop?
The loop runs at runtime, unless the compiler expands the loop into 8
calls using constant expressions. But that is not the case here.
This is the usage of constexpr I saw, but with literal strings.
So the array a is not used as a constant even though it is constexpr.

The array /is/ constant. It never changes. The compiler can use that.
I would expect the array to be fixed in the code section of the binary,
along with any other read-only data in the program, rather than put on
the stack.

In this particular case, the constexpr makes little difference because
the compiler knows everything about what happens to the array "a", since
its address does not "escape" from the current translation unit. The
compiler will generate the same code regardless of whether the array is declared "constexpr int", "const int", or plain "int". (But it can
check for accidental modification better with "const" or "constexpr" in
case the programmer made a mistake.)

I am not entirely sure of the specifications for printf, but the
compiler may even be able to turn this into:

int main() {
    printf("12345678");
}

It is /certainly/ allowed to turn it into :

int main() {
    for (int i = 1; i < 9; i++) {
        printf("%d", i);
    }
}

In C (not C++), defining an object as "constexpr" gives you two things
compared to defining it as "const". One is that its value can be used
when you need a constant expression according to the rules of the
language (such as for the size of an array in a struct). The other is
that it gives a compile-time error if its initialiser is not itself a
constant expression - and that means an extra check and protection
against some kinds of programmer errors, and extra information to people reading the code.

I don't expect it to make a difference in generated code from an
optimising compiler, in comparison to objects declared with "const".

From David Brown@21:1/5 to Chris M. Thomasson on Sat May 25 13:22:24 2024

On 24/05/2024 20:08, Chris M. Thomasson wrote:

On 5/24/2024 7:50 AM, David Brown wrote:

On 24/05/2024 01:05, Chris M. Thomasson wrote:

On 5/23/2024 6:35 AM, David Brown wrote:

On 22/05/2024 23:24, Chris M. Thomasson wrote:

On 5/22/2024 9:55 AM, David Brown wrote:

In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of
change in it, especially compared to the minor changes in C17.

Love the way std::vectors respect alignas... C++20, iirc?

[...]

I have no idea what you are talking about.

std::vector actually respects alignas, on MSVC at least. I did not
know this worked until I tried it. Iirc, Bonita was the one that
sparked my test. It aligned itself on the proper boundaries. Very nice.

But did you notice that this is c.l.c, not c.l.c++, and the topic is
C23, not C++23 ? Discussing comparisons or compatibility with C++
is fair enough, but talking about pure C++ matters (such as
std::vector<>) is unlikely to be helpful.

C has it as well... Very useful!

I know C has alignas (now as a keyword in C23, instead of just
_Alignas from C11).

I know C++ has alignas (from C++11 onwards).

What I don't understand is why you think std::vector<> "respects
alignas" in C++20 - alignment for std::vector<> works like alignment
for any other class in C++, and always has done.

And what I /really/ don't understand is why you think it is remotely
relevant here? Even "alignas" in C is not particularly relevant to this
thread, except that it has become a keyword in C23 instead of a macro
defined to _Alignas in <stdalign.h>.

alignas is very nice because it can help me make a 100% portable version
of some of my old exotic lock-free memory allocators that use rounding
to get down to a header. Any point in the region can be rounded down to
get at the header for the block. It involves aligning the main region on
a large boundary, say 8192 bytes. This is a little trick for high
performance lock-free allocators.

I know what "alignas" can do, and have made use of it.

Iirc, I can make std::vector align its elements to say, L2 cachelines,
and I can make std::vector align itself on a large boundary say 8192
bytes. All in std C++! That is nice.

I would be astounded to hear that std::vector could somehow disobey
alignas specifiers. I can appreciate that it could be useful to use
alignas with std::vector's, but it is nothing special or dramatic.

And it has nothing to do with c.l.c., or with this thread. Nor have
your allocators.

If you want to talk about your allocators, or good uses of alignas,
start a new thread in c.l.c or c.l.c++, according to which is
appropriate. Both groups could do with new topical threads. But if you
want to make a positive contribution to these groups, please consider some
basic rules - No links (to code sites or pointless videos), no
out-of-the-blue topic changes, and no pantomime arguments with Bonita or olcott.

But stick to this thread if you want to talk about C23 and the changes
it makes, whether you think they are good or bad.

From David Brown@21:1/5 to Keith Thompson on Sat May 25 13:29:00 2024

On 24/05/2024 21:29, Keith Thompson wrote:

David Brown <[email protected]> writes:

On 23/05/2024 18:40, Keith Thompson wrote:

Michael S <[email protected]> writes:
[...]

Removed

[...]

7) static_assert is not provided as a macro defined in <assert.h>
(becomes a keyword)
8) thread_local is not provided as a macro defined in <threads.h>
(becomes a keyword)

[...]

7) bad. Breaks existing code for weak reason
8) bad. Breaks existing code for weak reason

In pre-C23, _Static_assert and _Thread_local are keywords, and
static_assert and thread_local are macros that expand to those keywords.
In C23, _Static_assert, _Thread_local, static_assert, and thread_local
are all keywords. Code that simply uses the old ugly keywords would not
break.
Code that does something like "#ifdef static_assert" might break. I
suppose the headers could have retained the old macro definitions:
#define static_assert static_assert
#define thread_local thread_local

The sort of code that could theoretically break is when you have
definitions like this:

#define STATIC_ASSERT_NAME_(line) STATIC_ASSERT_NAME2_(line)
#define STATIC_ASSERT_NAME2_(line) assertion_failed_at_line_##line
#define static_assert(claim, warning) \
typedef struct { \
char STATIC_ASSERT_NAME_(__COUNTER__) [(claim) ? 2 : -2]; \
} STATIC_ASSERT_NAME_(__COUNTER__)

That works in any C version, until C23, almost as well as
_static_assert. I used this when C11 support was rare in the tools I
used.

You mean _Static_assert.

I meant either "static_assert" or "_Static_assert", rather than a
mixture of the two! (I consider "static_assert" part of C11 even though
it needs a header.)

While using #define for a C keyword is undefined behaviour, in
practice I think you'd have a hard time finding code and a compiler
that used such a macro and which did not work just as well in C23
mode.

(I don't know if anyone is in the habit of declaring macros named
"thread_local".)

"static_assert" is already a macro defined in <assert.h> starting in
C11. The above code is valid in pre-C23, but will break in C11 and C17
if it includes <assert.h> directly or indirectly.

Yes. But including <assert.h> is optional.

You can fix it by
adding "#undef static_assert" or by picking a different name, or by
making your macro definition conditional on __STDC_VERSION__ >= 202311L.

The actual code I use had a number of conditional checks for different C standards and C++, so that it does not define a static_assert macro for
C++ (my C++ usage for the code was always at least C++11), and for C11
onwards it was defined to _Static_assert. (I specifically did not want
to include <assert.h>.)

From David Brown@21:1/5 to Scott Lurndal on Sat May 25 16:41:31 2024

On 24/05/2024 18:16, Scott Lurndal wrote:

David Brown <[email protected]> writes:

On 23/05/2024 23:34, Michael S wrote:

On Thu, 23 May 2024 22:10:22 +0200
David Brown <[email protected]> wrote:

_Thread_local is a special-purpose thing, probably not applicable at
all for programming of small embedded systems, which nowadays is the
only type of programming in C that I do for money rather than as hobby.

I have never seen the point of it either. Why would anyone want a
variable that exists for /all/ threads in a program, but independently
per thread?

Very common in kernel programming (e.g. the use of '%gs' in x86_linux)
as a pointer to the 'per-cpu' data structure.

We use thread local to implement 'self' methods in certain
classes (so rather than passing pointers around, one can
simply call class::self() to get a pointer to the
class for each thread).

class c_processor {
    ...
    /**
     * Per-thread value of the processor object.
     */
    static __thread c_processor *p_this;
    ...

public:
    c_processor(c_system *, c_logger *, processor_number_t, bool);
    ~c_processor(void);

    static c_processor *self(void) { return p_this; }

    ...

c_processor *pp = c_processor::self();

I can see that. But you only want a few of these, and it is typically
in very low-level code that is full of compiler-specific or
target-specific stuff anyway. Such things could be compiler extensions
or other implementation-specific features.

After all, "thread_local" is useless for the vast majority of OS's
(counting number of OS's, not number of users). You can't use it
unless the C (or in this case, C++) implementation has support for the
OS in the library and compiler.

From David Brown@21:1/5 to Thiago Adams on Sat May 25 16:51:59 2024

On 25/05/2024 13:33, Thiago Adams wrote:

Em 5/24/2024 9:46 PM, Keith Thompson escreveu:

Thiago Adams <[email protected]> writes:

Em 5/24/2024 5:19 PM, Keith Thompson escreveu:

Thiago Adams <[email protected]> writes:

On 24/05/2024 16:45, Keith Thompson wrote:

Thiago Adams <[email protected]> writes:

On 23/05/2024 18:49, Keith Thompson wrote:

error: 'constexpr' pointer initializer is not null
5 |     constexpr char * s[] = {"a", "b"};

Then we were asking why constexpr was used in that case.

Why not?

When I see a constexpr I ask if the compiler is able to compute
everything at compile time. If not immediately, it is a bad usage in my
view.

I don't understand. Do you object because it's not *immediately
obvious* that everything can be computed at compile time? If so, why
should it have to be?

My understanding is that constexpr is a hint for the compiler. It does not
ensure anything, unless you use it where a constant expression is required.
So I don't like to see constexpr where I know it is not a constant
expression.

Your understanding is incorrect. "constexpr" is not a mere hint.

I think I can explain a little better.

Let's consider a compile-time array of integers and a loop.

https://godbolt.org/z/e8cM1KGWT

#include <stdio.h>
#include <stdlib.h>
int main() {
     constexpr int a[] = {1, 2, 3, 4, 5, 6, 7, 8};
     for (int i = 0 ; i < sizeof(a)/sizeof(a[0]); i++)
     {
         printf("%d", a[i]);
     }
}

What did the programmer expect from using a constant array in a loop?
The loop runs at runtime, unless the compiler expands the loop into 8
calls using constant expressions. But that is not the case here.
This is the usage of constexpr I saw, but with literal strings.
So the array a is not used as a constant even though it is constexpr.

What do you mean by "used as constant"?

Something used to produce a constant expression.
In the loop, the compiler would have to get the value from the array at
runtime, or unroll the loop.

I just checked, trying to extract a constant value from the array

https://godbolt.org/z/v33Pqd7W8

#include <stdio.h>
#include <stdlib.h>
int main() {
    constexpr int a[] = {1, 2, 3, 4, 5, 6, 7, 8};
    static_assert(a[0] ==1 );

}

I was expecting this to work!

But gcc says

<source>:5:24: error: expression in static assertion is not constant
    5 |     static_assert(a[0] ==1 );
      |

That is disappointing. I too would have expected that to work in C23.
My guess is that it is the implicit pointer dereference that is the
problem. But I hope this is something that gets fixed shortly.

The mess is even bigger than I thought.

In c++ it works
https://godbolt.org/z/qG6vGhEMj

From David Brown@21:1/5 to Thiago Adams on Sat May 25 17:14:18 2024

On 25/05/2024 13:19, Thiago Adams wrote:

Em 5/25/2024 8:05 AM, David Brown escreveu:

In C (not C++), defining an object as "constexpr" gives you two things
compared to defining it as "const". One is that its value can be used
when you need a constant expression according to the rules of the
language (such as for the size of an array in a struct). The other is
that it gives a compile-time error if its initialiser is not itself a
constant expression - and that means an extra check and protection
against some kinds of programmer errors, and extra information to
people reading the code.

I don't expect it to make a difference in generated code from an
optimising compiler, in comparison to objects declared with "const".

In my view, for this sample, constexpr generates noise.

I don't share that opinion, but I understand it.

It also can make
the compilation slower; otherwise, why not make everything constexpr by default?

That claim, on the other hand, is very strange. Making everything
constexpr by default would be a massive change to the language that
would break all but the most negligible of existing code. And I can
think of no particular reason why constexpr would slow down compilation,
at least to any measurable degree.

I still haven't found a use for constexpr that would compensate for
the mess created by having both const and constexpr.

I don't need a feature to "compensate" for anything to be useful. I
don't need it to be perfect to be useful. There are a few things about
constexpr in C23 that I think are poor decisions, unreasonable
restrictions, or suboptimal integration with other language features
(like static_assert) - such as the array limitations you've found. That
will mean I can't use constexpr as much as I'd like, or as much as I do
in C++. But even if there is just one situation where I think using
constexpr is neater or clearer than using enum, #define, or some other technique, then I will use constexpr in that one situation. Why are you
so insistent on throwing it out completely just because it doesn't do everything you might want?

I already saw (I don't have them
now) proposals to make const more like constexpr in C. In C++, const is already a constant expression!

No, it is not - but sometimes a const object with particular
characteristics can be used in situations where you would otherwise need
a constant expression. I mentioned earlier that I find this convenient
in C++ - Keith said it was inconsistent, which is also true. I think
that to a large extent, if C "const" had acquired the additional
features of C++ "const" (excluding the different linkage for file-scope
"const" objects, since that would be a breaking change) then it would
have done everything C23 "constexpr" does today. I personally would
have been fine with that as a solution. But I fully appreciate that it
would have been inconsistent and perhaps hard to specify - you would
have the situation that /some/ const objects could be used for things
like static initialisers, while others could not.

The justification for C was VLAs. They should have treated an array whose
size is a constant expression as not being a VLA. In other words, better
to break this than to create a mess.
#define does the job of constexpr.

#define is one way to make named items that can be used in constant expressions, yes. But if it can be done using #define or constexpr, I
think constexpr is the neater choice. Opinions can vary - that's my
opinion.

From David Brown@21:1/5 to Michael S on Sat May 25 17:28:25 2024

On 24/05/2024 18:22, Michael S wrote:

On Fri, 24 May 2024 17:57:35 +0200
David Brown <[email protected]> wrote:

I can't say I have ever seen it as an effort. Almost all my C
"modules" come in pairs - "file.h" and "file.c". All non-local
variables (and all functions) are either static and declared only in
"file.c", or they are externally linked and have an "extern"
declaration in "file.h" and a definition (with or without
initialisation) in "file.c" (which #includes "file.h"). It is a very
simple and clean arrangement, easily checked by gcc warnings, and
there are never any undetected conflicts.

Declaration/definition pair is repeating yourself, which is not a good
thing.

It is a good thing when you are doing different things for different
purposes.

Of course, the same applies to declaration/definition of externally
visible functions, but somehow in the case of functions I am more tolerant
of repetition than in the case of variables. Probably, a psychological
phenomenon - I feel that functions are less trivial, so repetition is
less wasteful.

I don't see the difference.

A header describes the interface, in code and documenting comments that describe how to use the features of the module, for the benefit of
programmers working on other modules. The source file gives the
definitions, along with comments describing how and why it is
implemented this way, for the benefit of programmers working on /this/
module. They are different files, but closely correlated.

But I'd like to get rid of these repetitions too; I just have not figured
out a way to do it that does not compromise the even more important concern
of separation between interface and implementation (yes, I dislike Java
for that reason too).

From David Brown@21:1/5 to Lawrence D'Oliveiro on Sat May 25 17:47:48 2024

On 25/05/2024 02:40, Lawrence D'Oliveiro wrote:

On Fri, 24 May 2024 17:57:35 +0200, David Brown wrote:

Why would anyone want a variable that exists for /all/ threads in a
program, but independently per thread? The only use I can think of is
for errno (which is, IMHO, a horror unto itself) but since that is
defined by the implementation, it does not need to use _Thread_local.

errno is indeed the example that immediately comes to mind for the use of this feature. It is supposed to have the semantics of an assignable
variable, so how else would you implement it, if not by some (possibly implementation-specific or special-case equivalent of) the _Thread_local mechanism?

The normal way for multi-threaded systems is to implement it as a macro.
It might be, for example :

#define errno __thread_data->_errno

or

#define errno (*errno())

That is precisely why it is specified in the C standards as a macro, not
an external linkage object with static or thread-local storage duration.
(The use of errno in multi-threading C code long predates C11 and _Thread_local.)

I am in two minds over whether errno is a hack or not. On the one hand, it makes more sense for system calls (and library ones, too) to return an
error status directly; on the other hand, sometimes maybe you want to “accumulate” an error status after a series of calls, and errno is a convenient way of doing this.

I understand its purpose (and I assume that some people find it useful),
but I much prefer a clearer flow of return values where possible - I
don't like "hidden" return values. It is particularly bad, IMHO, when
setting errno is optional for many library functions - it means that
otherwise "pure" functions, such as many from <math.h>, might have side-effects, but you can't rely on them accumulating an error status.
It's a lose-lose situation.

As for other uses of thread-local, I think most of them have to do with optimizations, like threading itself. For example, imagine a bunch of
threads all contributing increments to a common counter: instead of continually blocking on access to that counter, they could each have their own thread-local counter, which periodically has its current value added
to the global counter and then zeroed.

I fully appreciate that sometimes you want data local to a thread,
including to different instances of the same thread function. But I
don't see many situations where you'd want the same object to be
available per thread in /all/ threads, as you have with thread_local
data. You want all your counter threads to have their own local
"counter" object? That's fine. But you don't want all your other
threads to have that "counter" object that they never use. I admit this
may be more of a concern for those like myself who work on
small embedded systems, but the whole concept feels wrong to me.

The only data that really belongs to /all/ threads is for implementing
the threading system, or for making the standard library work in
multi-threaded programs (such as errno, handles for standard streams,
malloc heaps, and the like). And all that is part of the
implementation, not user code.

(And for those working with OS's that are written as user code, rather
than implementation code, _Thread_local is useless - there's no standard
way to integrate it with your OS code.)

From Kaz Kylheku@21:1/5 to jak on Sat May 25 21:24:10 2024

On 2024-05-24, jak <[email protected]> wrote:

Bonita Montero ha scritto:

Am 23.05.2024 um 21:49 schrieb Thiago Adams:

On 23/05/2024 16:25, Bonita Montero wrote:

I ask myself what the point is in further developing a language
like this that can actually no longer be saved.

do you mean C++?

No, C.

I think you have a lot of confusion about programming languages. C and
C++ are not comparable languages.

Except for observations like that we can write useful, production
software that compiles as C or C++, but go on ...

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @[email protected]

From bart@21:1/5 to David Brown on Sun May 26 02:09:13 2024

On 25/05/2024 16:14, David Brown wrote:

On 25/05/2024 13:19, Thiago Adams wrote:

The justification for C was VLAs. They should have treated an array
whose size is a constant expression as not being a VLA. In other words,
better to break this than to create a mess.
#define does the job of constexpr.

#define is one way to make named items that can be used in constant expressions, yes. But if it can be done using #define or constexpr, I
think constexpr is the neater choice. Opinions can vary - that's my opinion.

Before 'constexpr' (and it still is 'before' as implementations are
rare), there were three disparate ways of emulating named constants in C:

#define A 100

enum {B = 200};

int const C = 300;

None of them fully do the job of the named constant feature I've used in
my own languages (and which I also briefly had in my C compiler).

With 'constexpr' there are now 4 ways of doing it:

constexpr int D = 400;

Here are some characteristics of true named constants and how those
methods fare:

                    #define  enum  const  constexpr  Notes
Scope rules            N      Y      Y        Y
No & addr-of           Y      Y      N        N?
Any type               Y?     N      Y        Y      Any int/float
Non-VLA bounds         Y      Y      N        Y?
Switch-case?           Y      Y      N        Y?
Reduce                 Y      Y      ?        Y?     2+3 => 5
Can't Mod value        Y      Y      N        N?     By any means
Not Context sens       N      Y      Y        Y      Value may vary by context
Single reeval          N      Y      Y        Y      Expr processed once
Lower case OK          N?     Y      Y        Y

Ideally a column would have all Ys. None of these manage that, but
'enum' comes nearest. However it has a problem: it wasn't designed for
this task, which is just a useful by-product. So it looks odd.

With const/constexpr, even if the language can't stop attempts to change
the value, sometimes those attempts are trapped (via read-only mem etc).
That's not ideal either.

Regarding 'Not context sensitive', consider:

----------------------
#include <stdio.h>

enum {a = 100};

#define M (a+1)

enum {b = M};

int main(void) {
    enum {a=777};

    printf("b = %d\n", b);
    printf("M = %d\n", M);
}
----------------------

The output is 101 and 778. The value of M is 101 when used to define
`b`, and 778 later on.

'Single reevaluation' refers to the fact that the expansion of a #define
macro will be repeated at each invocation site, so parsing, evaluation
and reduction of the expression will be done multiple times. It's just inefficient.

It might also vary, not just because of the last point, but because
there aren't enough parentheses or something, so the expansion combines
differently with the surrounding context.

From jak@21:1/5 to All on Sun May 26 08:44:12 2024

Kaz Kylheku ha scritto:

On 2024-05-24, jak <[email protected]> wrote:

Bonita Montero ha scritto:

Am 23.05.2024 um 21:49 schrieb Thiago Adams:

On 23/05/2024 16:25, Bonita Montero wrote:

I ask myself what the point is in further developing a language
like this that can actually no longer be saved.

do you mean C++?

No, C.

I think you have a lot of confusion about programming languages. C and
C++ are not comparable languages.

Except for observations like that we can write useful, production
software that compiles as C or C++, but go on ...

... one last thing: I would ask you not to change the context of the
discussion by cutting some parts of it to justify your comment.

From jak@21:1/5 to All on Sun May 26 08:32:15 2024

Kaz Kylheku ha scritto:

On 2024-05-24, jak <[email protected]> wrote:

Bonita Montero ha scritto:

Am 23.05.2024 um 21:49 schrieb Thiago Adams:

On 23/05/2024 16:25, Bonita Montero wrote:

I ask myself what the point is in further developing a language
like this that can actually no longer be saved.

do you mean C++?

No, C.

I think you have a lot of confusion about programming languages. C and
C++ are not comparable languages.

Except for observations like that we can write useful, production
software that compiles as C or C++, but go on ...

Indeed there are c++ compilers that, if used to compile c code, could
decide to call the c compiler to do the work, but if something in the
code is not strictly c, then the compilation will be in c++, the size of
the executable will increase significantly and it will need an internal
or external runtime to work. If they were the same thing, you would not
get different results.

From jak@21:1/5 to All on Sun May 26 09:13:51 2024

Bonita Montero ha scritto:

Am 24.05.2024 um 09:32 schrieb jak:

Bonita Montero ha scritto:

Am 23.05.2024 um 21:49 schrieb Thiago Adams:

On 23/05/2024 16:25, Bonita Montero wrote:

I ask myself what the point is in further developing a language
like this that can actually no longer be saved.

do you mean C++?

No, C.

I think you have a lot of confusion about programming languages. C and
C++ are not comparable languages.

C and C++ have a lot in common, since 95% of what you can do in C you can
also do in C++ in the same way. But C++ puts 500% on top of that to solve
your tasks with a fraction of the code, and if you use that, the code
looks totally different from C.

I only partially agree with this, because it depends a lot on the
context in which it is used. Moreover, I would not know how to name
an optimal programming language for all seasons.

I'm pretty convinced that c++ will be abandoned long before c.

Maybe, but for sure not in favour of C.

I absolutely agree with you.

Just for one example, c++ would have been abandoned years ago if c# didn't
produce only CLI code, because C# lacks nothing important compared to C++,
and the C++ learning curve is much steeper (C# also benefits from
reflection).

Being a good C++ programmer needs a lot of experience, but once you have
it you get an order of magnitude more productivity. In C you often settle
for simple approaches because complex approaches are a lot of work. Often
the more complex and more efficient approach is easy to handle in C++
if you have managed to understand the language.

What you describe is the greatest drawback of c++. To give just one example, when they decided to rewrite the FB platform to speed it up,
they thought of migrating from php to c++, but they ran short of staff
suited to the work, so they decided to rely on a compiler that
translated the php into c++. And many of the new languages were born to
try to remedy its complexity.

From David Brown@21:1/5 to Keith Thompson on Sun May 26 13:09:36 2024

On 26/05/2024 00:58, Keith Thompson wrote:

David Brown <[email protected]> writes:

On 25/05/2024 03:29, Keith Thompson wrote:

Keith Thompson <[email protected]> writes:

David Brown <[email protected]> writes:

On 23/05/2024 14:11, bart wrote:

[...]

'embed' was discussed a few months ago. I disagreed with the poor
way it was to be implemented: 'embed' notionally generates a list of
comma-separated numbers as tokens, where you have to take care of
any trailing zero yourself if needed. It would also be hopelessly
inefficient if actually implemented like that.

Fortunately, it is /not/ actually implemented like that - it is only
implemented "as if" it were like that. Real prototype implementations
(for gcc and clang - I don't know about other tools) are extremely
efficient at handling #embed. And the comma-separated numbers can be
more flexible in less common use-cases.

[...]

I'm aware of a proposed implementation for clang:

https://github.com/llvm/llvm-project/pull/68620
https://github.com/ThePhD/llvm-project

I'm currently cloning the git repo, with the aim of building it so I can
try it out and test some corner cases. It will take a while.

I'm not aware of any prototype implementation for gcc. If you are, I'd
be very interested in trying it out.

(And thanks for starting this thread!)

I've built this from source, and it mostly works. I haven't seen it do
any optimization; the `#embed` directive expands to a sequence of
comma-separated integer constants.

Which means that this:

#include <stdio.h>
int main(void) {
    struct foo {
        unsigned char a;
        unsigned short b;
        unsigned int c;
        double d;
    };
    struct foo obj = {
#embed "foo.dat"
    };
    printf("a=%d b=%d c=%u d=%f\n", obj.a, obj.b, obj.c, obj.d);
}

given "foo.dat" containing bytes with values 1, 2, 3, and 4, produces
this output:

a=1 b=2 c=3 d=4.000000

That is what you would expect from the way #embed is specified. You
would not expect to see any "optimisation", since optimisations should
not change the results (apart from choosing between alternative
valid results).
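To make the "as if" semantics concrete: with "foo.dat" holding the four bytes 1, 2, 3 and 4, the #embed in Keith's test is specified to behave like the hand-written brace list below, each value initialising the next member in order. A minimal sketch, assuming exactly that four-byte file; make_foo and print_foo are hypothetical helper names, not anything from the thread.

```c
#include <stdio.h>

struct foo {
    unsigned char a;
    unsigned short b;
    unsigned int c;
    double d;
};

/* Hypothetical helper: what `struct foo obj = { #embed "foo.dat" };`
   means when foo.dat holds the bytes 1, 2, 3, 4. Each integer
   constant initialises the next member, converted to its type. */
struct foo make_foo(void)
{
    struct foo obj = { 1, 2, 3, 4 };
    return obj;
}

/* a and b promote to int (%d), c is unsigned (%u), d is a double (%f). */
void print_foo(const struct foo *p)
{
    printf("a=%d b=%d c=%u d=%f\n", p->a, p->b, p->c, p->d);
}
```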

Where you will see the optimisation difference is between :

const int xs[] = {
#embed "x.dat"
};

and

const int xs[] = {
#include "x.csv"
};

where "x.dat" is a large binary file, and "x.csv" is the same data as
comma-separated values. The #embed version will compile very much
faster, using far less memory. /That/ is the optimisation.

Why would it compile faster? #embed expands to something similar to
CSV, which still has to be parsed.

No, it does /not/. That's the /whole/ point of #embed, and the main
motivation for its existence. People have always managed to embed
binary source files into their binary output files - using linker
tricks, or using xxd or other tools (common or specialised) to turn
binary files into initialisers for constant arrays (or structs). I've
done so myself on many projects, all integrated together in makefiles.
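For comparison, the "external tool" route amounts to something like this minimal sketch of an xxd-style converter (roughly what "xxd -i" does, though not its exact output format). The function name bytes_to_initializer is hypothetical. The generated text would be pasted or #included between the braces of an array initialiser, and it is exactly this text that the compiler then has to re-parse.

```c
#include <stdio.h>

/* Read a binary stream and write its bytes as a comma-separated list
   of decimal integer constants, 12 per line. Returns the number of
   bytes converted, or -1 if a read error occurred. */
long bytes_to_initializer(FILE *in, FILE *out)
{
    long count = 0;
    int ch;

    while ((ch = getc(in)) != EOF) {
        /* Separator before every value except the first. */
        if (count > 0)
            fputs(count % 12 == 0 ? ",\n" : ", ", out);
        fprintf(out, "%d", ch);
        count++;
    }
    if (count > 0)
        fputc('\n', out);
    return ferror(in) ? -1 : count;
}
```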

#embed has two purposes. One is to save you from using external tools
for that kind of thing. The other is to do it more efficiently for big
files.

There are two ways this is done for examples like this. One is that
the compiler does /not/ turn each byte into a series of ASCII
digits for the number, then parse that number to get back to a byte. It
jumps straight from byte in to byte out, possibly after expanding to a
bigger type size if necessary. Secondly, compilers typically track lots
more information about each initialiser - such as the file, line and
column number so that it can give you helpful messages if there is a
value out of range, or too many or too few initialisers. With #embed,
the compiler doesn't have to do any of that.

The compiler will generate results /as if/ it had expanded the file to a
list of numbers and parsed them. But it will not do that in practice.
(At least, not for more serious implementations - simple solutions might
do so to get support implemented quickly.)

Reference: <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf> 6.10.4.

The first one will probably initialize each int element of xs to a
single byte value extracted from x.dat. Is that what you intended?

Yes, if that's what the programmer wrote - though I agree that character
types will be more common and will be the prime target for optimisation.

#embed works best with arrays of unsigned char.

Sure, that will be a very common use.

If you mean that the #embed will expand to something other than the
sequence of integer constants, how does it know to do that in this
context?

It knows because the compiler writers are actually quite smart. The C standards may describe the translation process in a series of distinct
and independent phases, but that's not how it is done in practice. The
key point is that the compiler knows how the sequence of integers is
going to be used before it gets that far in the preprocessing.

I'd expect implementations to have extremely fast implementations for initialising arrays of character types, and probably also for other
arrays of scalar types. More complicated examples - such as parameters
in a macro or function call - would probably use a fall-back of
generating naïve lists of integer constants.

If you have a binary file containing a sequence of int values, you can
use #embed to initialize an unsigned char array that's aliased with or
copied to the int array.
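Keith's suggestion of embedding into an unsigned char array and then copying into an int array could be sketched as below. This assumes the file was written with the same int size and byte order as the target; bytes_to_ints and "ints.bin" are hypothetical names. memcpy is used rather than casting the char array to int *, which would risk alignment and aliasing problems.

```c
#include <stddef.h>
#include <string.h>

/* Copy whole ints from a byte buffer (e.g. one filled by #embed) into
   an int array. Any trailing partial int is ignored, and at most
   dst_count ints are copied. Returns the number of ints copied.
   No byte-order or int-size checking is done. */
size_t bytes_to_ints(int *dst, size_t dst_count,
                     const unsigned char *src, size_t src_len)
{
    size_t n = src_len / sizeof(int);
    if (n > dst_count)
        n = dst_count;
    memcpy(dst, src, n * sizeof(int));
    return n;
}

/* Typical use, with a hypothetical data file:

       static const unsigned char raw[] = {
       #embed "ints.bin"
       };
       int values[sizeof raw / sizeof(int)];
       bytes_to_ints(values, sizeof values / sizeof values[0],
                     raw, sizeof raw);
*/
```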

The *embed element width* is typically going to be CHAR_BIT bits by
default. It can only be changed by an *implementation-defined* embed parameter. It seems odd that there's no standard way to specify the
element width.

It seems even more odd that the embed element width is
implementation defined and not set to CHAR_BIT by default.

I agree. But it may be left flexible for situations where the host and
target have different ideas about CHAR_BIT. (Targets with CHAR_BIT
other than 8 are very rare, hosts with CHAR_BIT other than 8 are
non-existent, but C remains flexible.)

A conforming implementation could set the embed element width to,
say, 4*CHAR_BIT and then not provide an implementation-defined embed parameter to specify a different width, making #embed unusable for
unsigned char arrays. (N3220 is a draft, not the final C23 standard,
but I haven't heard about any changes in this area.)

The kind of optimization I was thinking about was having #embed, in some cases, expand to something other than the specified sequence of comma-separated integer constants. Such an optimization would be
intended to improve compile-time speed and memory usage, not run-time performance.

With a straightforward implementation, the preprocessor has to generate
a sequence of integer constants as text, and then later compiler phases
have to parse that text sequence and generate the corresponding code.

Given:

const unsigned char data[4] = {
#embed "four_bytes.dat"
};

That 4 byte data file is translated to something like "1, 2, 3, 4", then converted into a stream of tokens, then those tokens are parsed, then,
given the context, the original 4-byte sequence is written into the
generated object file.

For a very large file, that could be a significant burden. (I don't
have any numbers on that.)

I do :

<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#design-efficiency-metrics>

(That's from a proposal for #embed for C and C++. Generating the
numbers and parsing them is akin to using xxd.)

More useful links:

<https://thephd.dev/embed-the-details#results> <https://thephd.dev/implementing-embed-c-and-c++>

(These are from someone who did a lot of the work for the proposals, and prototype implementations, as far as I understand it.)

Note that I can't say how much of a difference this will make in real
life. I don't know how often people need to include multi-megabyte
files in their code. It certainly is not at a level where I would
change any of my existing projects from external generator scripts to
using #embed, but I might use it in future projects.

An optimized version might have the preprocessor generate some compiler-specific binary output, say something like "@rawdata N"
followed by N bytes of raw data. Later compiler phases recognize the "@rawdata" construct and directly dump the data into the object file in
the right place. Making #embed generate @rawdata is only part of the solution; the compiler has to implement @rawdata in a way that allows it
to be used inside an initializer, or perhaps in any other appropriate context.

That's the idea. In theory, C pre-processors and C compilers are
independent programs with a standardised format between them - in
practice, they are often part of the same binary, and almost invariably
come from the same developers. The "cpp" program may have to generate
standard preprocessed output, and the "cc" program may have to accept
standard preprocessed output, but there is nothing to stop the pair of
programs supporting extended formats that are more efficient.
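To illustrate the cost difference behind the "@rawdata" idea, here is a toy sketch contrasting the two paths. The "@rawdata N" format is Keith's hypothetical, not something gcc or clang is known to emit, and real compilers do considerably more work per value than this (tokens, source locations, diagnostics), so it understates the gap.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Path 1: text such as "1, 2, 3, 4", as a straightforward #embed hands
   to later phases, scanned and converted value by value.
   Returns the number of bytes stored, or -1 on bad input. */
long parse_csv_bytes(const char *text, unsigned char *out, size_t cap)
{
    size_t n = 0;
    for (;;) {
        /* Skip separators between values. */
        while (isspace((unsigned char)*text) || *text == ',')
            text++;
        if (*text == '\0')
            break;
        char *end;
        unsigned long v = strtoul(text, &end, 10);
        if (end == text || v > 255 || n == cap)
            return -1;
        out[n++] = (unsigned char)v;
        text = end;
    }
    return (long)n;
}

/* Path 2: the hypothetical "@rawdata N\n" header followed by N raw
   bytes. After the header, the payload is copied in one go.
   Returns N, or -1 if the header is malformed, the payload is
   truncated, or it does not fit in out. */
long read_rawdata(const char *buf, size_t len, unsigned char *out, size_t cap)
{
    static const char tag[] = "@rawdata ";
    const size_t taglen = sizeof tag - 1;
    size_t i = taglen, count = 0;

    if (len < taglen || memcmp(buf, tag, taglen) != 0)
        return -1;
    while (i < len && buf[i] >= '0' && buf[i] <= '9')
        count = count * 10 + (size_t)(buf[i++] - '0');
    if (i == taglen || i >= len || buf[i] != '\n')
        return -1;
    i++;                                 /* skip the newline */
    if (count > len - i || count > cap)
        return -1;
    memcpy(out, buf + i, count);         /* bulk copy, no per-value parsing */
    return (long)count;
}
```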

This could be substantially more efficient for something like:

static const unsigned char data[] = {
#embed "bigfile.dat"
};

Of course it wouldn't handle my test case above. But #embed can take parameters, so it could generate the standard sequence by default and "@rawdata" if you ask for it.

I don't know whether this kind of optimization is worthwhile, i.e.,
whether the straightforward implementation really imposes significant compile-time performance penalties that @rawdata or equivalent can
solve. I also don't know whether existing implementations will
implement this kind of optimization (so far they haven't implemented
#embed at all).

Prototypes have been made, and they do have such optimisations. How
things end up in real tools remains to be seen, of course.


From jak@21:1/5 to All on Sun May 26 13:44:32 2024

Keith Thompson wrote:

jak <[email protected]> writes:

Kaz Kylheku wrote:

On 2024-05-24, jak <[email protected]> wrote:

Bonita Montero wrote:

On 23.05.2024 at 21:49, Thiago Adams wrote:

On 23/05/2024 16:25, Bonita Montero wrote:

I ask myself what the point is in further developing a language
like this that can actually no longer be saved.

do you mean C++?

No, C.

I think you have a lot of confusion about programming languages. C and
C++ are not comparable languages.

Except for observations like that we can write useful, production
software that compiles as C or C++, but go on ...

Indeed, there are C++ compilers that, if used to compile C code, could
decide to call the C compiler to do the work, but if something in the
code is not strictly C, then the compilation will be done as C++, the
size of the executable will increase significantly, and it will need an
internal or external runtime to work. If they were the same thing, you
would not get different results.

Oh? Do you know of a C++ compiler that actually behaves this way?
I've never heard of such a thing.

C and C++ are closely related, and C and C++ compilers often share
backends, but the two languages have different grammars. The gcc
command, for example, can invoke either a C or C++ compiler, but it
knows which language it's compiling based on the source file name or
command line options, before it's even seen the content.

There are programs that are valid C and valid C++ but with different behavior. How would a compiler that behaves as you describe cope with
that?

For example, g++ does something similar: if you pass it a .C file, it
compiles it as C code, but if the file (.C) contains C++ code, then it
compiles it as C++.
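Keith's point about programs that are valid in both languages but behave differently is easy to demonstrate. One standard example, shown here as a minimal sketch (char_constant_size is a hypothetical name): a character constant has type int in C but type char in C++, so the same source yields different values depending on which front end compiles it.

```c
#include <stddef.h>

/* Valid as both C and C++, but with different results:
   in C, 'a' has type int, so this returns sizeof(int);
   in C++, 'a' has type char, so this returns 1. */
size_t char_constant_size(void)
{
    return sizeof('a');
}
```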


From bart@21:1/5 to David Brown on Sun May 26 12:51:12 2024

On 26/05/2024 12:09, David Brown wrote:

On 26/05/2024 00:58, Keith Thompson wrote:

For a very large file, that could be a significant burden. (I don't
have any numbers on that.)

I do :

<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#design-efficiency-metrics>

(That's from a proposal for #embed for C and C++. Generating the
numbers and parsing them is akin to using xxd.)

More useful links:

<https://thephd.dev/embed-the-details#results> <https://thephd.dev/implementing-embed-c-and-c++>

(These are from someone who did a lot of the work for the proposals, and prototype implementations, as far as I understand it.)

Note that I can't say how much of a difference this will make in real
life. I don't know how often people need to include multi-megabyte
files in their code. It certainly is not at a level where I would
change any of my existing projects from external generator scripts to
using #embed, but I might use it in future projects.

I've just done my own quick test (not in C, using embed in my language):

[]byte clangexe = binclude("f:/llvm/bin/clang.exe")

proc main=
fprintln "clang.exe is # bytes", clangexe.len
end

This embeds the Clang C compiler which is 119MB. It took 1.3 seconds to
compile (note my compiler is not optimised).

If I tried it using text: a 121M-line include file, with one number per
line, it took 144 seconds (I believe it used more RAM than was
available: each line will have occupied a 64-byte AST node, so nearly
8GB, on a machine with only 6GB available RAM, much of which was occupied).

The figures at your link say it took 1 second for a 40MB test file, on
an Intel i7 with 24GB.

My compiler took just over 1.3 seconds (now annoyingly taking 1.4
seconds for a retest) for a file nearly 3 times bigger, on a much more
lowly machine (second cheapest PC in the shop), with 8GB.

So my implementation sounds faster. Of course, those 120M data bytes
haven't been optimised!

As for usage, this would be a tidy way of bundling a program like a C
compiler if your program required it, although there are a number of alternatives in that case: the binary here doesn't need to exist in the application's data space.


From Michael S@21:1/5 to jak on Sun May 26 15:39:13 2024

On Sun, 26 May 2024 13:44:32 +0200
jak <[email protected]> wrote:

Keith Thompson wrote:

jak <[email protected]> writes:

Kaz Kylheku wrote:

On 2024-05-24, jak <[email protected]> wrote:

Bonita Montero wrote:

On 23.05.2024 at 21:49, Thiago Adams wrote:

On 23/05/2024 16:25, Bonita Montero wrote:

I ask myself what the point is in further developing a
language like this that can actually no longer be saved.

do you mean C++?

No, C.

I think you have a lot of confusion about programming languages.
C and C++ are not comparable languages.

Except for observations like that we can write useful, production
software that compiles as C or C++, but go on ...

Indeed, there are C++ compilers that, if used to compile C code,
could decide to call the C compiler to do the work, but if
something in the code is not strictly C, then the compilation will
be done as C++, the size of the executable will increase significantly,
and it will need an internal or external runtime to work. If they
were the same thing, you would not get different results.

Oh? Do you know of a C++ compiler that actually behaves this way?
I've never heard of such a thing.

C and C++ are closely related, and C and C++ compilers often share backends, but the two languages have different grammars. The gcc
command, for example, can invoke either a C or C++ compiler, but it
knows which language it's compiling based on the source file name or command line options, before it's even seen the content.

There are programs that are valid C and valid C++ but with different behavior. How would a compiler that behaves as you describe cope
with that?

For example, g++ does something similar: if you pass it a .C file, it
compiles it as C code, but if the file (.C) contains C++ code, then it
compiles it as C++.

No, it does not.
g++ compiles as C++ unless you tell it to compile as C with '-x c'
option.


From Michael S@21:1/5 to bart on Sun May 26 16:18:32 2024

On Sun, 26 May 2024 12:51:12 +0100
bart <[email protected]> wrote:

On 26/05/2024 12:09, David Brown wrote:

On 26/05/2024 00:58, Keith Thompson wrote:

For a very large file, that could be a significant burden. (I
don't have any numbers on that.)

I do :

<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#design-efficiency-metrics>

(That's from a proposal for #embed for C and C++. Generating the
numbers and parsing them is akin to using xxd.)

More useful links:

<https://thephd.dev/embed-the-details#results> <https://thephd.dev/implementing-embed-c-and-c++>

(These are from someone who did a lot of the work for the
proposals, and prototype implementations, as far as I understand
it.)

Note that I can't say how much of a difference this will make in
real life. I don't know how often people need to include
multi-megabyte files in their code. It certainly is not at a level
where I would change any of my existing projects from external
generator scripts to using #embed, but I might use it in future
projects.

I've just done my own quick test (not in C, using embed in my
language):

[]byte clangexe = binclude("f:/llvm/bin/clang.exe")

proc main=
fprintln "clang.exe is # bytes", clangexe.len
end

This embeds the Clang C compiler which is 119MB. It took 1.3 seconds
to compile (note my compiler is not optimised).

If I tried it using text: a 121M-line include file, with one number
per line, it took 144 seconds (I believe it used more RAM than was available: each line will have occupied a 64-byte AST node, so nearly
8GB, on a machine with only 6GB available RAM, much of which was
occupied).

On my old PC, which was not the cheapest box in the shop but is more
than 10 years old, compilation speed for similarly organized (but much
smaller) text files is as follows:
MSVC 18.00.31101 (VS 2013) - 1950 KB/sec
MSVC 19.16.27032 (VS 2017) - 1180 KB/sec
MSVC 19.20.27500 (VS 2019) - 1180 KB/sec
clang 17.0.6 - 547 KB/sec (somewhat better with hex text)
gcc 13.2.0 - 580 KB/sec

So MSVC compilers, especially the old one, are somewhat faster than
yours. But if there was swapping involved, it's not comparable. How much
time does it take for your compiler to produce a 5MB byte array from text?

The figures at your link say it took 1 second for a 40MB test file,
on an Intel i7 with 24GB.

My compiler took just over 1.3 seconds (now annoyingly taking 1.4
seconds for a retest) for a file nearly 3 times bigger, on a much
more lowly machine (second cheapest PC in the shop), with 8GB.

So my implementation sounds faster. Of course, those 120M data bytes
haven't been optimised!

But both are much faster than compiling through text. Even "slow"
40MB/3 is 6-7 times faster than the fastest of compilers in my tests.

As for usage, this would be a tidy way of bundling a program like a C compiler if your program required it, although there are a number of alternatives in that case: the binary here doesn't need to exist in
the application's data space.


From jak@21:1/5 to All on Sun May 26 15:46:33 2024

Michael S wrote:

On Sun, 26 May 2024 13:44:32 +0200
jak <[email protected]> wrote:

Keith Thompson wrote:

jak <[email protected]> writes:

Kaz Kylheku wrote:

On 2024-05-24, jak <[email protected]> wrote:

Bonita Montero wrote:

On 23.05.2024 at 21:49, Thiago Adams wrote:

On 23/05/2024 16:25, Bonita Montero wrote:

I ask myself what the point is in further developing a
language like this that can actually no longer be saved.

do you mean C++?

No, C.

I think you have a lot of confusion about programming languages.
C and C++ are not comparable languages.

Except for observations like that we can write useful, production
software that compiles as C or C++, but go on ...

Indeed, there are C++ compilers that, if used to compile C code,
could decide to call the C compiler to do the work, but if
something in the code is not strictly C, then the compilation will
be done as C++, the size of the executable will increase significantly,
and it will need an internal or external runtime to work. If they
were the same thing, you would not get different results.

Oh? Do you know of a C++ compiler that actually behaves this way?
I've never heard of such a thing.

C and C++ are closely related, and C and C++ compilers often share
backends, but the two languages have different grammars. The gcc
command, for example, can invoke either a C or C++ compiler, but it
knows which language it's compiling based on the source file name or
command line options, before it's even seen the content.

There are programs that are valid C and valid C++ but with different
behavior. How would a compiler that behaves as you describe cope
with that?

For example, g++ does something similar: if you pass it a .C file, it
compiles it as C code, but if the file (.C) contains C++ code, then it
compiles it as C++.

No, it does not.
g++ compiles as C++ unless you tell it to compile as C with '-x c'
option.

You didn't read carefully, or I didn't express myself well. I wrote that
g++ compiles C++ even if it is written inside a .c file.
However, being in doubt, I preferred to try it. If I pass g++ a .c file
that contains C code, it compiles it without any option, perhaps because
it reads it as if it were C++, but in any case it compiles it.


From David Brown@21:1/5 to Keith Thompson on Sun May 26 16:18:17 2024

On 26/05/2024 01:45, Keith Thompson wrote:

David Brown <[email protected]> writes:
[...]

The normal way for multi-threaded systems is to implement it as a
macro. It might be, for example :

#define errno __thread_data->_errno

or

#define errno *errno()

Both of those need more parentheses -- and I'm uncomfortable using the
same identifier for the macro and the function.

The second example was from the footnote in the C standard's section on <errno.h>, so it can't be /that/ bad!

But I agree with your discomfort.

That is precisely why it is specified in the C standards as a macro,
not an external linkage object with static or thread-local storage
duration. (The use of errno in multi-threading C code long predates
C11 and _Thread_local.)

[...]

glibc and musl both have :

# define errno (*__errno_location ())

newlib (used on Cygwin) has something similar :

#define errno (*__errno())
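The pattern used by glibc, musl and newlib can be sketched in portable C11 as below, with _Thread_local storage behind a function that returns a pointer. The names my_errno and my_errno_location are hypothetical, chosen to avoid clashing with the real errno; the actual libraries arrange their per-thread storage in their own ways.

```c
/* One object per thread (C11 _Thread_local). */
static _Thread_local int my_errno_value;

/* Returns the address of the calling thread's object. */
int *my_errno_location(void)
{
    return &my_errno_value;
}

/* Fully parenthesised, so that uses such as `my_errno = 0`,
   `my_errno == EINVAL` or `&my_errno` parse as intended. */
#define my_errno (*my_errno_location())
```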


From Michael S@21:1/5 to jak on Sun May 26 17:20:30 2024

On Sun, 26 May 2024 15:46:33 +0200
jak <[email protected]> wrote:

Michael S wrote:

On Sun, 26 May 2024 13:44:32 +0200
jak <[email protected]> wrote:

Keith Thompson wrote:

jak <[email protected]> writes:

Kaz Kylheku wrote:

On 2024-05-24, jak <[email protected]> wrote:

Bonita Montero wrote:

On 23.05.2024 at 21:49, Thiago Adams wrote:

On 23/05/2024 16:25, Bonita Montero wrote:

I ask myself what the point is in further developing a
language like this that can actually no longer be saved.

do you mean C++?

No, C.

I think you have a lot of confusion about programming
languages. C and C++ are not comparable languages.

Except for observations like that we can write useful,
production software that compiles as C or C++, but go on ...

Indeed, there are C++ compilers that, if used to compile C code,
could decide to call the C compiler to do the work, but if
something in the code is not strictly C, then the compilation
will be done as C++, the size of the executable will increase
significantly, and it will need an internal or external runtime
to work. If they were the same thing, you would not get different
results.

Oh? Do you know of a C++ compiler that actually behaves this way?
I've never heard of such a thing.

C and C++ are closely related, and C and C++ compilers often share
backends, but the two languages have different grammars. The gcc
command, for example, can invoke either a C or C++ compiler, but
it knows which language it's compiling based on the source file
name or command line options, before it's even seen the content.

There are programs that are valid C and valid C++ but with
different behavior. How would a compiler that behaves as you
describe cope with that?

For example, g++ does something similar: if you pass it a .C file, it
compiles it as C code, but if the file (.C) contains C++ code, then it
compiles it as C++.

No, it does not.
g++ compiles as C++ unless you tell it to compile as C with '-x c'
option.

You didn't read carefully, or I didn't express myself well. I wrote
that g++ compiles C++ even if it is written inside a .c file.
However, being in doubt, I preferred to try it. If I pass g++ a .c file
that contains C code, it compiles it without any option, perhaps because
it reads it as if it were C++, but in any case it compiles it.

It is easy to see that it was compiled as C++ rather than as C.
Look at the content of the generated object with 'objdump -d'.
You will see that the names of global functions and variables are
mangled.
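A complementary check that doesn't need objdump: __cplusplus is predefined only when a file is compiled as C++. The sketch below (compiled_as is a hypothetical name) also shows the usual extern "C" guard that stops a C++ compiler from mangling a C-compatible name.

```c
#ifdef __cplusplus
extern "C" {
#endif

/* Reports which language this translation unit was compiled as.
   Compiled as C++ without the extern "C" block, gcc would typically
   emit this symbol as _Z11compiled_asv (Itanium C++ ABI mangling);
   with the block, or when compiled as C, it is plain compiled_as. */
const char *compiled_as(void)
{
#ifdef __cplusplus
    return "C++";
#else
    return "C";
#endif
}

#ifdef __cplusplus
}
#endif
```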


From David Brown@21:1/5 to Keith Thompson on Sun May 26 16:15:41 2024

On 26/05/2024 01:21, Keith Thompson wrote:

David Brown <[email protected]> writes:

On 24/05/2024 21:29, Keith Thompson wrote:

[...]

"static_assert" is already a macro defined in <assert.h> starting in
C11. The above code is valid in pre-C23, but will break in C11 and C17
if it includes <assert.h> directly or indirectly.

Yes. But including <assert.h> is optional.

Your header that defines your own "static_assert" macro might depend on
some other header outside your control. A future version of that other header might add a "#include <assert.h>", breaking your code.

I believe - but am not entirely sure - that the standard library headers
are not allowed to include each other, precisely so that there will not
be conflicts between user-defined identifiers and standard library
identifiers from headers that you did not explicitly include.

I appreciate what you are saying, and it can often make sense for other
people. But in /my/ code, there is no possibility of future versions of headers having other includes. In my projects, I consider the entire
toolchain to be part of the project, along with any other libraries or
SDK's. Surprises like that don't happen when I am working on a project
- nor when I take the same project out of archives and rebuild it 20
years later to get exactly the same binary, nor when anyone else does
that. Reproducible builds are vital to my work.

Of course, if I re-use the same code in a different project with
different toolchains or libraries, such issues could crop up - but they
are easily spotted and handled at the time.

There are solutions (check "#ifdef static_assert" for the macro and __STDC_VERSION__ for the keyword, etc.)

Indeed.

Perhaps it's not an issue for you, but it's a corner case to keep in
mind.

It is not an issue for me, no - but I agree that it can be an issue for
some people, and I agree it is worth keeping in mind. I am not
suggesting that defining your own static_assert macro is a good idea for general use - I was merely saying that /I/ had used it as a temporary
measure before C11 (and C++11) became practical for the majority of work
I did, and that it could have compatibility issues when moving to C23.
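The guard Keith mentions, checking __STDC_VERSION__ for the C23 keyword and "#ifdef static_assert" for the C11/C17 macro, might look like the sketch below, with a negative-array-size typedef as the kind of pre-C11 fallback David describes using. SA_CAT and SA_CAT_ are hypothetical helper names; this is one way to arrange it, not the only one.

```c
#include <assert.h>
#include <limits.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
    /* C23: static_assert is a keyword; nothing to define. */
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
    /* C11/C17: <assert.h> normally defines the macro; be defensive. */
#   ifndef static_assert
#       define static_assert _Static_assert
#   endif
#elif !defined(static_assert)
    /* Pre-C11 fallback: a negative array size forces a compile error.
       __LINE__ keeps the typedef names unique. */
#   define SA_CAT_(a, b) a##b
#   define SA_CAT(a, b) SA_CAT_(a, b)
#   define static_assert(cond, msg) \
        typedef char SA_CAT(static_assert_line_, __LINE__)[(cond) ? 1 : -1]
#endif

/* Usage at file scope. */
static_assert(CHAR_BIT >= 8, "CHAR_BIT must be at least 8");
```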


From David Brown@21:1/5 to jak on Sun May 26 16:29:35 2024

On 26/05/2024 15:46, jak wrote:

Michael S wrote:

On Sun, 26 May 2024 13:44:32 +0200
jak <[email protected]> wrote:

Keith Thompson wrote:

jak <[email protected]> writes:

Kaz Kylheku wrote:

On 2024-05-24, jak <[email protected]> wrote:

Bonita Montero wrote:

On 23.05.2024 at 21:49, Thiago Adams wrote:

On 23/05/2024 16:25, Bonita Montero wrote:

I ask myself what the point is in further developing a
language like this that can actually no longer be saved.

do you mean C++?

No, C.

I think you have a lot of confusion about programming languages.
C and C++ are not comparable languages.

Except for observations like that we can write useful, production
software that compiles as C or C++, but go on ...

Indeed, there are C++ compilers that, if used to compile C code,
could decide to call the C compiler to do the work, but if
something in the code is not strictly C, then the compilation will
be done as C++, the size of the executable will increase significantly,
and it will need an internal or external runtime to work. If they
were the same thing, you would not get different results.

Oh? Do you know of a C++ compiler that actually behaves this way?
I've never heard of such a thing.

C and C++ are closely related, and C and C++ compilers often share
backends, but the two languages have different grammars. The gcc
command, for example, can invoke either a C or C++ compiler, but it
knows which language it's compiling based on the source file name or
command line options, before it's even seen the content.

There are programs that are valid C and valid C++ but with different
behavior. How would a compiler that behaves as you describe cope
with that?

For example, g++ does something similar: if you pass it a .C file, it
compiles it as C code, but if the file (.C) contains C++ code, then it
compiles it as C++.

No.

No, it does not.
g++ compiles as C++ unless you tell it to compile as C with '-x c'
option.

No.

You didn't read carefully, or I didn't express myself well. I wrote that
g++ compiles C++ even if it is written inside a .c file.
However, being in doubt, I preferred to try it. If I pass g++ a .c file
that contains C code, it compiles it without any option, perhaps because
it reads it as if it were C++, but in any case it compiles it.

No.

The way gcc handles all this is actually quite straightforward.

First, there is no difference between the commands "gcc" and "g++" in
the languages supported, or the way the language is determined. The
only difference between these two is the standard libraries linked by
default when generating a final executable - "g++" automatically
includes the C++ standard libraries, while "gcc" only has the C standard libraries.

In neither case does "gcc" or "g++" actually handle the compilation -
these are driver front-ends that pass things on to the actual compilers, assemblers and linkers (and any other bits and pieces required).

The front-ends determine the language to use primarily from the suffix
of the source file it is given. ".c" files are compiled as C. ".cpp",
".c++", ".cc", ".C" (note the capital C), ".cp", ".cxx", and ".CPP" are compiled as C++. (There are many other extensions supported for
different languages.)

The language choice can be overridden by using the "-x" switch, such as
"-x c" or "-x c++". The standard can be specified with "-std=".

There is no automatic detection of C or C++ based on the /content/ of
the files.

<https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html>


From Michael S@21:1/5 to David Brown on Sun May 26 18:05:31 2024

On Sun, 26 May 2024 16:29:35 +0200
David Brown <[email protected]> wrote:

On 26/05/2024 15:46, jak wrote:

Michael S wrote:

On Sun, 26 May 2024 13:44:32 +0200
jak <[email protected]> wrote:

Keith Thompson wrote:

jak <[email protected]> writes:

Kaz Kylheku wrote:

On 2024-05-24, jak <[email protected]> wrote:

Bonita Montero wrote:

On 23.05.2024 at 21:49, Thiago Adams wrote:

On 23/05/2024 16:25, Bonita Montero wrote:

I ask myself what the point is in further developing a
language like this that can actually no longer be saved.

do you mean C++?

No, C.

I think you have a lot of confusion about programming
languages. C and C++ are not comparable languages.

Except for observations like that we can write useful,
production software that compiles as C or C++, but go on ...

Indeed, there are C++ compilers that, if used to compile C code,
could decide to call the C compiler to do the work, but if
something in the code is not strictly C, then the compilation
will be done as C++, the size of the executable will increase
significantly, and it will need an internal or external runtime
to work. If they were the same thing, you would not get different
results.

Oh? Do you know of a C++ compiler that actually behaves this
way? I've never heard of such a thing.

C and C++ are closely related, and C and C++ compilers often
share backends, but the two languages have different grammars.
The gcc command, for example, can invoke either a C or C++
compiler, but it knows which language it's compiling based on
the source file name or command line options, before it's even
seen the content.

There are programs that are valid C and valid C++ but with
different behavior. How would a compiler that behaves as you
describe cope with that?

For example g++ makes something similar: if you pass a file .C it
compile the C code but if the file (.C) contains C++ code then
compile C++.

No.

No, it does not.
g++ compiles as C++ unless you tell it to compile as C with '-x c'
option.

No.

You didn't read carefully or I didn't express myself well. I wrote
that the g++ compile c++ even if it is written inside a .c file.
However in doubt I preferred to try. If I pass to g++ a .c file that contains c code, it compiles without any option, perhaps because it
reads as if it were c++ but in any case compiles it.

No.

The way gcc handles all this is actually quite straightforward.

First, there is no difference between the commands "gcc" and "g++" in
the languages supported, or the way the language is determined. The
only difference between these two is the standard libraries linked by default when generating a final executable - "g++" automatically
includes the C++ standard libraries, while "gcc" only has the C
standard libraries.

In neither case does "gcc" or "g++" actually handle the compilation -
these are driver front-ends that pass things on to the actual
compilers, assemblers and linkers (and any other bits and pieces
required).

I don't know how it works in your environment.
I am 100% sure that it works like I wrote above in my environment. Specifically:
'g++ -c foo.c' calls binary cc1plus.exe
'g++ -c -x c foo.c' calls binary cc1.exe
'gcc -c foo.c' calls binary cc1.exe
'gcc -c foo.cpp' calls binary cc1plus.exe
'gcc -c foo.C' calls binary cc1plus.exe

The front-ends determine the language to use primarily from the
suffix of the source file it is given. ".c" files are compiled as C.
".cpp", ".c++", ".cc", ".C" (note the capital C), ".cp", ".cxx", and
".CPP" are compiled as C++. (There are many other extensions
supported for different languages.)

In my environment it applies to gcc, but not to g++.
In order to force my g++ to compile for other language you have to tell
it so explicitly.

The language choice can be overridden by using the "-x" switch, such
as "-x c" or "-x c++". The standard can be specified with "-std=".

Yes, of course.

There is no automatic detection of C or C++ based on the /content/ of
the files.

Yes, of course.

<https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html>

From jak@21:1/5 to All on Sun May 26 17:10:01 2024

David Brown wrote:

On 26/05/2024 15:46, jak wrote:

Michael S wrote:

On Sun, 26 May 2024 13:44:32 +0200
jak <[email protected]> wrote:

Keith Thompson wrote:

jak <[email protected]> writes:

Kaz Kylheku wrote:

On 2024-05-24, jak <[email protected]> wrote:

Bonita Montero wrote:

On 23.05.2024 at 21:49, Thiago Adams wrote:

On 23/05/2024 16:25, Bonita Montero wrote:

I ask myself what the point is in further developing a
language like this that can actually no longer be saved.

do you mean C++?

No, C.

I think you have a lot of confusion about programming languages.
C and C++ are not comparable languages.

Except for observations like that we can write useful, production
software that compiles as C or C++, but go on ...

Indeed there are c++ compilers who, if used to compile c code,
could decide to call the c compiler to do the work, but if
something in the code is not strictly c, then the compilation will
be in c++, the size of the executable will increase significantly
and will need of an internal or external runtimer to work. If it
were the same thing you would not get different things.

Oh? Do you know of a C++ compiler that actually behaves this way?
I've never heard of such a thing.

C and C++ are closely related, and C and C++ compilers often share
backends, but the two languages have different grammars. The gcc
command, for example, can invoke either a C or C++ compiler, but it
knows which language it's compiling based on the source file name or
command line options, before it's even seen the content.

There are programs that are valid C and valid C++ but with different
behavior. How would a compiler that behaves as you describe cope
with that?

For example g++ makes something similar: if you pass a file .C it
compile the C code but if the file (.C) contains C++ code then
compile C++.

No.

No, it does not.
g++ compiles as C++ unless you tell it to compile as C with '-x c'
option.

No.

You didn't read carefully or I didn't express myself well. I wrote that
the g++ compile c++ even if it is written inside a .c file.
However in doubt I preferred to try. If I pass to g++ a .c file that
contains c code, it compiles without any option, perhaps because it
reads as if it were c++ but in any case compiles it.

No.

The way gcc handles all this is actually quite straightforward.

First, there is no difference between the commands "gcc" and "g++" in
the languages supported, or the way the language is determined. The
only difference between these two is the standard libraries linked by
default when generating a final executable - "g++" automatically
includes the C++ standard libraries, while "gcc" only has the C standard libraries.

In neither case does "gcc" or "g++" actually handle the compilation -
these are driver front-ends that pass things on to the actual compilers, assemblers and linkers (and any other bits and pieces required).

The front-ends determine the language to use primarily from the suffix
of the source file it is given. ".c" files are compiled as C. ".cpp", ".c++", ".cc", ".C" (note the capital C), ".cp", ".cxx", and ".CPP" are compiled as C++. (There are many other extensions supported for
different languages.)

The language choice can be overridden by using the "-x" switch, such as
"-x c" or "-x c++". The standard can be specified with "-std=".

There is no automatic detection of C or C++ based on the /content/ of
the files.

<https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html>

?
I really wrote that something similar (similar != equal) did g++ and
that, if you write c++ code in a file with the .c extension, the g++
compile it. I never wrote that it was automatically recognized.
In addition, you just explained why g++ compile a .c that contains c++
code. I don't understand: no what?

From Michael S@21:1/5 to jak on Sun May 26 18:23:48 2024

On Sun, 26 May 2024 17:10:01 +0200
jak <[email protected]> wrote:

?
I really wrote that something similar (similar != equal) did g++ and
that, if you write c++ code in a file with the .c extension, the g++
compile it. I never wrote that it was automatically recognized.
In addition, you just explained why g++ compile a .c that contains c++
code. I don't understand: no what?

Your English is already harder to understand than mine.
Congratulations, that is no small feat. But you still have far to go.
Keep exercising.

From bart@21:1/5 to Michael S on Sun May 26 16:25:51 2024

On 26/05/2024 14:18, Michael S wrote:

On Sun, 26 May 2024 12:51:12 +0100
bart <[email protected]> wrote:

On 26/05/2024 12:09, David Brown wrote:

On 26/05/2024 00:58, Keith Thompson wrote:

For a very large file, that could be a significant burden. (I
don't have any numbers on that.)

I do :

<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#design-efficiency-metrics>

(That's from a proposal for #embed for C and C++. Generating the
numbers and parsing them is akin to using xxd.)

More useful links:

<https://thephd.dev/embed-the-details#results>
<https://thephd.dev/implementing-embed-c-and-c++>

(These are from someone who did a lot of the work for the
proposals, and prototype implementations, as far as I understand
it.)

Note that I can't say how much of a difference this will make in
real life. I don't know how often people need to include
multi-megabyte files in their code. It certainly is not at a level
where I would change any of my existing projects from external
generator scripts to using #embed, but I might use it in future
projects.

I've just done my own quick test (not in C, using embed in my
language):

[]byte clangexe = binclude("f:/llvm/bin/clang.exe")

proc main=
fprintln "clang.exe is # bytes", clangexe.len
end

This embeds the Clang C compiler which is 119MB. It took 1.3 seconds
to compile (note my compiler is not optimised).

If I tried it using text: a 121M-line include file, with one number
per line, it took 144 seconds (I believe it used more RAM than was
available: each line will have occupied a 64-byte AST node, so nearly
8GB, on a machine with only 6GB available RAM, much of which was
occupied).

On my old PC, which was not the cheapest box in the shop but is more than
10 years old, compilation speed for similarly organized (but much smaller)
text files is as follows:
MSVC 18.00.31101 (VS 2013) - 1950 KB/sec
MSVC 19.16.27032 (VS 2017) - 1180 KB/sec
MSVC 19.20.27500 (VS 2019) - 1180 KB/sec
clang 17.0.6 - 547 KB/sec (somewhat better with hex text)
gcc 13.2.0 - 580 KB/sec

So, MSVC compilers, esp. an old one, are somewhat faster than yours.
But if there was swapping involved it's not comparable. How much time
does it take for your compiler to produce 5MB byte array from text?

Are you talking about a 5MB array initialised like this:

unsigned char data[] = {
45,
67,
17,
... // 5M-3 more rows
};

The timing for 120M entries was challenging as it exceeded physical
memory. However that test I can also do with C compilers. Results for
120 million lines of data are:

DMC - Out-of-memory

Tiny C - Silently stopped after 13 seconds (I thought it had
finished but no)

lccwin32 - Insufficient memory

gcc 10.x.x - Out of memory after 80 seconds

mcc - (My product) Memory failure after 27 seconds

Clang - (Crashed after 5 minutes)

MM 144s (Compiler for my language)

So the compiler for my language did quite well, considering!

Back to the 5MB test:

Tiny C 1.7s 2.9MB/sec (Tcc doesn't use any IR)

mcc 3.7s 1.3MB/sec (my product; uses intermediate ASM)

DMC -- -- (Out of memory; 32-bit compiler)

lccwin32 3.9s 1.3MB/sec

gcc 10.x 10.6s 0.5MB/sec

clang 7.4s 0.7MB/sec (to object file only)

MM 1.4s 3.6MB/sec (compiler for my language)

MM 0.7s 7.1MB/sec (MM optimised via C and gcc-O3)

As a reminder, when using my version of 'embed' in my language,
embedding a 120MB binary file took 1.3 seconds, about 90MB/second.

But both are much faster than compiling through text. Even "slow"
40MB/3 is 6-7 times faster than the fastest of compilers in my tests.

Do you have a C compiler that supports #embed?

It's generally understood that processing text is slow, if representing byte-at-a-time data. If byte arrays could be represented as sequences of
i64 constants, it would improve matters. That could be done in C, but awkwardly, by aliasing a byte-array with an i64-array.

From David Brown@21:1/5 to BGB on Sun May 26 18:12:17 2024

On 26/05/2024 16:48, BGB wrote:

On 5/26/2024 9:18 AM, David Brown wrote:

On 26/05/2024 01:45, Keith Thompson wrote:

David Brown <[email protected]> writes:
[...]

The normal way for multi-threaded systems is to implement it as a
macro.   It might be, for example :

    #define errno __thread_data->_errno

or

    #define errno *errno()

Both of those need more parentheses -- and I'm uncomfortable using the
same identifier for the macro and the function.

The second example was from the footnote in the C standard's section
on <errno.h>, so it can't be /that/ bad!

But I agree with your discomfort.

I would expect it to immediately explode, because AFAIK the usual preprocessor behavior is to keep expanding macros in a line until there
is nothing left to expand.

Well, granted, it is possible I could have misinterpreted how it was
supposed to work and had never noticed...

I think you did misinterpret. Macros in C are not recursive. That
stops them exploding, but also means there's a lot you can't do with the preprocessor.

From David Brown@21:1/5 to Michael S on Sun May 26 18:26:49 2024

On 26/05/2024 17:05, Michael S wrote:

On Sun, 26 May 2024 16:29:35 +0200
David Brown <[email protected]> wrote:

On 26/05/2024 15:46, jak wrote:

Michael S wrote:

On Sun, 26 May 2024 13:44:32 +0200
jak <[email protected]> wrote:

Keith Thompson wrote:

jak <[email protected]> writes:

Kaz Kylheku wrote:

On 2024-05-24, jak <[email protected]> wrote:

Bonita Montero wrote:

On 23.05.2024 at 21:49, Thiago Adams wrote:

On 23/05/2024 16:25, Bonita Montero wrote:

I ask myself what the point is in further developing a
language like this that can actually no longer be saved.

do you mean C++?

No, C.

I think you have a lot of confusion about programming
languages. C and C++ are not comparable languages.

Except for observations like that we can write useful,
production software that compiles as C or C++, but go on ...

Indeed there are c++ compilers who, if used to compile c code,
could decide to call the c compiler to do the work, but if
something in the code is not strictly c, then the compilation
will be in c++, the size of the executable will increase
significantly and will need of an internal or external runtimer
to work. If it were the same thing you would not get different
things.

Oh? Do you know of a C++ compiler that actually behaves this
way? I've never heard of such a thing.

C and C++ are closely related, and C and C++ compilers often
share backends, but the two languages have different grammars.
The gcc command, for example, can invoke either a C or C++
compiler, but it knows which language it's compiling based on
the source file name or command line options, before it's even
seen the content.

There are programs that are valid C and valid C++ but with
different behavior. How would a compiler that behaves as you
describe cope with that?

For example g++ makes something similar: if you pass a file .C it
compile the C code but if the file (.C) contains C++ code then
compile C++.

No.

No, it does not.
g++ compiles as C++ unless you tell it to compile as C with '-x c'
option.

No.

You didn't read carefully or I didn't express myself well. I wrote
that the g++ compile c++ even if it is written inside a .c file.
However in doubt I preferred to try. If I pass to g++ a .c file that
contains c code, it compiles without any option, perhaps because it
reads as if it were c++ but in any case compiles it.

No.

The way gcc handles all this is actually quite straightforward.

First, there is no difference between the commands "gcc" and "g++" in
the languages supported, or the way the language is determined. The
only difference between these two is the standard libraries linked by
default when generating a final executable - "g++" automatically
includes the C++ standard libraries, while "gcc" only has the C
standard libraries.

In neither case does "gcc" or "g++" actually handle the compilation -
these are driver front-ends that pass things on to the actual
compilers, assemblers and linkers (and any other bits and pieces
required).

I don't know how it works in your environment.
I am 100% sure that it works like I wrote above in my environment. Specifically:
'g++ -c foo.c' calls binary cc1plus.exe

My apologies - you are correct.

g++ does indeed treat ".c" (and ".h" and ".i") files as C++, unless
overridden. (This applies only to those file extensions - Fortran, Ada, Assembly, linker, etc., files are treated just like with gcc.)

'g++ -c -x c foo.c' calls binary cc1.exe
'gcc -c foo.c' calls binary cc1.exe
'gcc -c foo.cpp' calls binary cc1plus.exe
'gcc -c foo.C' calls binary cc1plus.exe

Yes, of course.

The front-ends determine the language to use primarily from the
suffix of the source file it is given. ".c" files are compiled as C.
".cpp", ".c++", ".cc", ".C" (note the capital C), ".cp", ".cxx", and
".CPP" are compiled as C++. (There are many other extensions
supported for different languages.)

In my environment it applies to gcc, but not to g++.
In order to force my g++ to compile for other language you have to tell
it so explicitly.

No, g++ treats extensions other than ".c" the same way as gcc. (I
tested to be sure this time!) Try :

touch foo.f
gcc foo.f
g++ foo.f

You'll get the same complaint - either from missing Fortran support or a failure to build the Fortran program. Even "g++ foo.m" tries to compile
as Objective-C, not Objective-C++.

<https://gcc.gnu.org/onlinedocs/gcc/Invoking-G_002b_002b.html>

The language choice can be overridden by using the "-x" switch, such
as "-x c" or "-x c++". The standard can be specified with "-std=".

Yes, of course.

There is no automatic detection of C or C++ based on the /content/ of
the files.

Yes, of course.

<https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html>

From David Brown@21:1/5 to jak on Sun May 26 18:36:16 2024

On 26/05/2024 17:10, jak wrote:

David Brown wrote:

On 26/05/2024 15:46, jak wrote:

Michael S wrote:

On Sun, 26 May 2024 13:44:32 +0200
jak <[email protected]> wrote:

Keith Thompson wrote:

jak <[email protected]> writes:

Kaz Kylheku wrote:

On 2024-05-24, jak <[email protected]> wrote:

Bonita Montero wrote:

On 23.05.2024 at 21:49, Thiago Adams wrote:

On 23/05/2024 16:25, Bonita Montero wrote:

I ask myself what the point is in further developing a
language like this that can actually no longer be saved.

do you mean C++?

No, C.

I think you have a lot of confusion about programming languages.
C and C++ are not comparable languages.

Except for observations like that we can write useful, production
software that compiles as C or C++, but go on ...

Indeed there are c++ compilers who, if used to compile c code,
could decide to call the c compiler to do the work, but if
something in the code is not strictly c, then the compilation will
be in c++, the size of the executable will increase significantly
and will need of an internal or external runtimer to work. If it
were the same thing you would not get different things.

Oh? Do you know of a C++ compiler that actually behaves this way?
I've never heard of such a thing.

C and C++ are closely related, and C and C++ compilers often share
backends, but the two languages have different grammars. The gcc
command, for example, can invoke either a C or C++ compiler, but it
knows which language it's compiling based on the source file name or
command line options, before it's even seen the content.

There are programs that are valid C and valid C++ but with different
behavior. How would a compiler that behaves as you describe cope
with that?

For example g++ makes something similar: if you pass a file .C it
compile the C code but if the file (.C) contains C++ code then
compile C++.

No.

No, it does not.
g++ compiles as C++ unless you tell it to compile as C with '-x c'
option.

No.

You didn't read carefully or I didn't express myself well. I wrote that
the g++ compile c++ even if it is written inside a .c file.
However in doubt I preferred to try. If I pass to g++ a .c file that
contains c code, it compiles without any option, perhaps because it
reads as if it were c++ but in any case compiles it.

No.

The way gcc handles all this is actually quite straightforward.

First, there is no difference between the commands "gcc" and "g++" in
the languages supported, or the way the language is determined. The
only difference between these two is the standard libraries linked by
default when generating a final executable - "g++" automatically
includes the C++ standard libraries, while "gcc" only has the C
standard libraries.

In neither case does "gcc" or "g++" actually handle the compilation -
these are driver front-ends that pass things on to the actual
compilers, assemblers and linkers (and any other bits and pieces
required).

The front-ends determine the language to use primarily from the suffix
of the source file it is given. ".c" files are compiled as C.
".cpp", ".c++", ".cc", ".C" (note the capital C), ".cp", ".cxx", and
".CPP" are compiled as C++. (There are many other extensions
supported for different languages.)

The language choice can be overridden by using the "-x" switch, such
as "-x c" or "-x c++". The standard can be specified with "-std=".

There is no automatic detection of C or C++ based on the /content/ of
the files.

<https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html>

?
I really wrote that something similar (similar != equal) did g++ and
that, if you write c++ code in a file with the .c extension, the g++
compile it. I never wrote that it was automatically recognized.
In addition, you just explained why g++ compile a .c that contains c++
code. I don't understand: no what?

I made an error here - "g++ foo.c" /will/ treat the file as C++. I
apologise for that, as it made things a lot more confusing.

But that is not what you wrote. Perhaps you didn't write what you
intended to write. You said that g++ somehow determines whether to
compile code as C or C++ based on the /contents/ of the file, not the
filename suffix. And that is completely wrong.

You also mixed up ".c" and ".C". gcc considers ".c" to be C code, while
".C" (with a capital C) is considered C++.

From Michael S@21:1/5 to bart on Sun May 26 19:35:49 2024

On Sun, 26 May 2024 16:25:51 +0100
bart <[email protected]> wrote:

On 26/05/2024 14:18, Michael S wrote:

On Sun, 26 May 2024 12:51:12 +0100
bart <[email protected]> wrote:

On 26/05/2024 12:09, David Brown wrote:

On 26/05/2024 00:58, Keith Thompson wrote:

For a very large file, that could be a significant burden. (I
don't have any numbers on that.)

I do :

<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#design-efficiency-metrics>

(That's from a proposal for #embed for C and C++. Generating the
numbers and parsing them is akin to using xxd.)

More useful links:

<https://thephd.dev/embed-the-details#results>
<https://thephd.dev/implementing-embed-c-and-c++>

(These are from someone who did a lot of the work for the
proposals, and prototype implementations, as far as I understand
it.)

Note that I can't say how much of a difference this will make in
real life. I don't know how often people need to include
multi-megabyte files in their code. It certainly is not at a
level where I would change any of my existing projects from
external generator scripts to using #embed, but I might use it in
future projects.

I've just done my own quick test (not in C, using embed in my
language):

[]byte clangexe = binclude("f:/llvm/bin/clang.exe")

proc main=
fprintln "clang.exe is # bytes", clangexe.len
end

This embeds the Clang C compiler which is 119MB. It took 1.3
seconds to compile (note my compiler is not optimised).

If I tried it using text: a 121M-line include file, with one number
per line, it took 144 seconds (I believe it used more RAM than was
available: each line will have occupied a 64-byte AST node, so
nearly 8GB, on a machine with only 6GB available RAM, much of
which was occupied).

On my old PC, which was not the cheapest box in the shop but is more
than 10 years old, compilation speed for similarly organized (but much
smaller) text files is as follows:
MSVC 18.00.31101 (VS 2013) - 1950 KB/sec
MSVC 19.16.27032 (VS 2017) - 1180 KB/sec
MSVC 19.20.27500 (VS 2019) - 1180 KB/sec
clang 17.0.6 - 547 KB/sec (somewhat better with hex text)
gcc 13.2.0 - 580 KB/sec

So, MSVC compilers, esp. an old one, are somewhat faster than yours.
But if there was swapping involved it's not comparable. How much
time does it take for your compiler to produce 5MB byte array from
text?

Are you talking about a 5MB array initialised like this:

unsigned char data[] = {
45,
67,
17,
... // 5M-3 more rows
};

Yes.

The timing for 120M entries was challenging as it exceeded physical
memory. However that test I can also do with C compilers. Results for
120 million lines of data are:

DMC - Out-of-memory

Tiny C - Silently stopped after 13 seconds (I thought it
had finished but no)

lccwin32 - Insufficient memory

gcc 10.x.x - Out of memory after 80 seconds

mcc - (My product) Memory failure after 27 seconds

Clang - (Crashed after 5 minutes)

MM 144s (Compiler for my language)

So the compiler for my language did quite well, considering!

That's an interesting test as well, but I don't want to run it on my HW
right now. Maybe at night.

Back to the 5MB test:

Tiny C 1.7s 2.9MB/sec (Tcc doesn't use any IR)

mcc 3.7s 1.3MB/sec (my product; uses intermediate ASM)

Faster than new MSVC, but slower than old MSVC.

DMC -- -- (Out of memory; 32-bit compiler)

lccwin32 3.9s 1.3MB/sec

gcc 10.x 10.6s 0.5MB/sec

clang 7.4s 0.7MB/sec (to object file only)

MM 1.4s 3.6MB/sec (compiler for my language)

MM 0.7s 7.1MB/sec (MM optimised via C and gcc-O3)

That's quite impressive.
Does it generate object files or go directly to an exe?
Even if the latter, it's still impressive.

As a reminder, when using my version of 'embed' in my language,
embedding a 120MB binary file took 1.3 seconds, about 90MB/second.

But both are much faster than compiling through text. Even "slow"
40MB/3 is 6-7 times faster than the fastest of compilers in my
tests.

Do you have a C compiler that supports #embed?

No, I just blindly believe the paper.
But it will probably be available in clang this year and in gcc around
the start of next year. At least I hope so.

It's generally understood that processing text is slow, if
representing byte-at-a-time data. If byte arrays could be represented
as sequences of i64 constants, it would improve matters. That could
be done in C, but awkwardly, by aliasing a byte-array with an
i64-array.

I don't think that conversion from text to binary is a significant
bottleneck here. To get a feel for things, I wrote a tiny program that
converts a comma-separated list of integers to a binary file. It is quite
similar to 'xxd -r', but with an input format better suited to our
requirements. It does not meet the full requirements, of course: my
utility can't handle comments and probably a few other things that are
allowed in C sources, but the conversion part is pretty much the same.
It runs at 6.7 MB/s with decimal input and at 9.1 MB/s with hex input,
on a SATA SSD of the sort that went out of fashion before 2020.

So it seems that, at least in the case of gcc (580 KB/s in the table
above versus 6.7 MB/s here, roughly 9%), the conversion part constitutes
less than 10% of the total run time.

If you want to play with it yourself, here is my source:

// list_to_bin.c
// Takes textual input (a comma-separated list of integers) from
// standard input and writes one byte per value to a binary file.
// Values may be decimal, hex (0x..) or octal, as accepted by strtol().
// Usage:
//   list_to_bin outfile.bin < inp_file.txt
//
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

int main(int argz, char** argv)
{
    if (argz > 1) {
        FILE* fp = fopen(argv[1], "wb");
        if (fp) {
            char buf[2048]; // a number straddling a 2047-char chunk boundary would be split
            _Bool look_for_comma = 0;
            for (;;) {
                if (fgets(buf, sizeof(buf), stdin) != buf)
                    break; // end of input (or read error)

                char* p = buf;
                for (;;) {
                    // isgraph() requires a value representable as unsigned char
                    unsigned char c = (unsigned char)*p;
                    if (isgraph(c)) {
                        if (look_for_comma) {
                            if (c == ',') {
                                look_for_comma = 0;
                                ++p;
                            } else {
                                goto done; // two values without a comma between them
                            }
                        } else {
                            char* endp;
                            long val = strtol(p, &endp, 0);
                            if (endp == p) // not a number
                                goto done;
                            fputc((unsigned char)val, fp); // value reduced modulo 256
                            p = endp;
                            look_for_comma = 1;
                        }
                    } else {
                        if (c == 0)
                            break; // end of line
                        ++p; // skip space or control character
                    }
                }
            }
done:
            fclose(fp);
        } else {
            perror(argv[1]);
            return 1;
        }
    }
    return 0;
}

From Michael S@21:1/5 to David Brown on Sun May 26 19:50:40 2024

On Sun, 26 May 2024 18:26:49 +0200
David Brown <[email protected]> wrote:

On 26/05/2024 17:05, Michael S wrote:

In my environment it applies to gcc, but not to g++.
In order to force my g++ to compile for other language you have to
tell it so explicitly.

No, g++ treats extensions other than ".c" the same way as gcc. (I
tested to be sure this time!) Try :

touch foo.f
gcc foo.f
g++ foo.f

You'll get the same complaint - either from missing Fortran support
or a failure to build the Fortran program. Even "g++ foo.m" tries to
compile as Objective-C, not Objective-C++.

Yes, I only noticed after posting my response that for suffix .f (and
probably for .ada) gcc and g++ behave identically.

BTW, it seems to me that here behavior of gcc/g++ is different from
gfortran. If I am not mistaken, gfortran by default treats extension .f
as "old FORTRAN" and extension .f90 as "new Fortran". But I can be
wrong about it, New Fortran is not something I compile regularly and
old FORTRAN is not something that I compile ever.

From jak@21:1/5 to All on Sun May 26 19:11:31 2024

David Brown wrote:

On 26/05/2024 17:10, jak wrote:

David Brown wrote:

On 26/05/2024 15:46, jak wrote:

Michael S wrote:

On Sun, 26 May 2024 13:44:32 +0200
jak <[email protected]> wrote:

Keith Thompson wrote:

jak <[email protected]> writes:

Kaz Kylheku wrote:

On 2024-05-24, jak <[email protected]> wrote:

Bonita Montero wrote:

On 23.05.2024 at 21:49, Thiago Adams wrote:

On 23/05/2024 16:25, Bonita Montero wrote:

I ask myself what the point is in further developing a
language like this that can actually no longer be saved.

do you mean C++?

No, C.

I think you have a lot of confusion about programming languages.
C and C++ are not comparable languages.

Except for observations like that we can write useful, production
software that compiles as C or C++, but go on ...

Indeed there are c++ compilers who, if used to compile c code,
could decide to call the c compiler to do the work, but if
something in the code is not strictly c, then the compilation will
be in c++, the size of the executable will increase significantly
and will need of an internal or external runtimer to work. If it
were the same thing you would not get different things.

Oh? Do you know of a C++ compiler that actually behaves this way?
I've never heard of such a thing.

C and C++ are closely related, and C and C++ compilers often share
backends, but the two languages have different grammars. The gcc
command, for example, can invoke either a C or C++ compiler, but it
knows which language it's compiling based on the source file name or
command line options, before it's even seen the content.

There are programs that are valid C and valid C++ but with different
behavior. How would a compiler that behaves as you describe cope
with that?

For example g++ makes something similar: if you pass a file .C it
compile the C code but if the file (.C) contains C++ code then
compile C++.

No.

No, it does not.
g++ compiles as C++ unless you tell it to compile as C with '-x c'
option.

No.

You didn't read carefully or I didn't express myself well. I wrote that
the g++ compile c++ even if it is written inside a .c file.
However in doubt I preferred to try. If I pass to g++ a .c file that
contains c code, it compiles without any option, perhaps because it
reads as if it were c++ but in any case compiles it.

No.

The way gcc handles all this is actually quite straightforward.

First, there is no difference between the commands "gcc" and "g++" in
the languages supported, or the way the language is determined. The
only difference between these two is the standard libraries linked by
default when generating a final executable - "g++" automatically
includes the C++ standard libraries, while "gcc" only has the C
standard libraries.

In neither case does "gcc" or "g++" actually handle the compilation -
these are driver front-ends that pass things on to the actual
compilers, assemblers and linkers (and any other bits and pieces
required).

The front-ends determine the language to use primarily from the
suffix of the source file it is given. ".c" files are compiled as C.
".cpp", ".c++", ".cc", ".C" (note the capital C), ".cp", ".cxx", and
".CPP" are compiled as C++. (There are many other extensions
supported for different languages.)

The language choice can be overridden by using the "-x" switch, such
as "-x c" or "-x c++". The standard can be specified with "-std=".

There is no automatic detection of C or C++ based on the /content/ of
the files.

<https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html>

?
I really wrote that something similar (similar != equal) did g++ and
that, if you write c++ code in a file with the .c extension, the g++
compile it. I never wrote that it was automatically recognized.
In addition, you just explained why g++ compile a .c that contains c++
code. I don't understand: no what?

I made an error here - "g++ foo.c" /will/ treat the file as C++. I apologise for that, as it made things a lot more confusing.

But that is not what you wrote. Perhaps you didn't write what you
intended to write. You said that g++ somehow determines whether to
compile code as C or C++ based on the /contents/ of the file, not the filename suffix. And that is completely wrong.

You also mixed up ".c" and ".C". gcc considers ".c" to be C code, while ".C" (with a capital C) is considered C++.

Sorry but no. I wrote that there are compilers who do it and when they
replied, bringing the gcc as an example, I replied that the g++ does
something similar.

and no, I have not confused the .c with the .C:

$ cat foo.c
#include <iostream>

int main()
{
    std::cout << "hello" << std::endl;
    return 0;
}
$ g++ -Wall -pedantic foo.c -o foo
$ ./foo
hello
$

From jak@21:1/5 to All on Sun May 26 19:23:09 2024

Michael S ha scritto:

On Sun, 26 May 2024 17:10:01 +0200
jak <[email protected]> wrote:

?
I really wrote that something similar (similar != equal) did g++ and
that, if you write c++ code in a file with the .c extension, the g++
compile it. I never wrote that it was automatically recognized.
In addition, you just explained why g++ compile a .c that contains c++
code. I don't understand: no what?

Your English is already harder to understand than mine.
Congratulations, that is no small feat. But you still have a way to
go. Keep practising.

Or we could meet on a Usenet group where I am the native
speaker. :-)

From bart@21:1/5 to Michael S on Sun May 26 19:01:21 2024

On 26/05/2024 17:35, Michael S wrote:

On Sun, 26 May 2024 16:25:51 +0100
bart <[email protected]> wrote:

On 26/05/2024 14:18, Michael S wrote:

Are you talking about a 5MB array initialised like this:

unsigned char data[] = {
45,
67,
17,
... // 5M-3 more rows
};

Yes.

The timing for 120M entries was challenging as it exceeded physical
memory. However that test I can also do with C compilers. Results for
120 million lines of data are:

DMC - Out-of-memory

Tiny C - Silently stopped after 13 seconds (I thought it had finished, but no)

lccwin32 - Insufficient memory

gcc 10.x.x - Out of memory after 80 seconds

mcc - (My product) Memory failure after 27 seconds

Clang - (Crashed after 5 minutes)

MM 144s (Compiler for my language)

So the compiler for my language did quite well, considering!

That's an interesting test as well, but I don't want to run it on my HW
right now. May be, at night.

Back to the 5MB test:

Tiny C 1.7s 2.9MB/sec (Tcc doesn't use any IR)

mcc 3.7s 1.3MB/sec (my product; uses intermediate ASM)

Faster than new MSVC, but slower than old MSVC.

My mcc is never going to be fast, because it uses ASM, which itself will generate a text file several times larger than the C (so the line "123,"
in C ends up as " db 123" in the ASM file).

However I've looked at a possible way of speeding this up in general,
see below.

DMC        --     --         (Out of memory; 32-bit compiler)
lccwin32   3.9s   1.3MB/sec
gcc 10.x   10.6s  0.5MB/sec
clang      7.4s   0.7MB/sec  (to object file only)
MM         1.4s   3.6MB/sec  (compiler for my language)
MM         0.7s   7.1MB/sec  (MM optimised via C and gcc-O3)

That's quite impressive.
Does it generate object files or go directly to EXE?

All produce EXE files, via linkers if necessary, except Clang (its hefty
LLVM installation doesn't come with standard C headers, nor a linker; it depends on MS tools, but never manages to sync with them).

My MM product directly generates EXE files with no intermediate OBJ files.

Even if later, it's still impressive.

So, it's more impressive if it first generates an OBJ file then invokes
a linker? I'd have thought that eliminating that pointless intermediate
step would be more impressive!

Anyway, I thought of a way of speeding up initialisation of byte-arrays
which is, instead of parsing each value into its own AST node, to
directly parse successive numeric values into a special data-string
object (similar to normal strings, and identical to the data-strings
used for embedded data).

Then there is only one AST node containing one 'string' value, instead
of 5M or 120M nodes.

This produced a timing, for 5M lines, of 0.34s (0.28s optimised), a
throughput of 15-18MB/sec.

When I applied this to the 120M line data (which is a 0.6GB source
file), it finished in 6.5 seconds (5.5 optimised), or 18-21MB/sec.
Previously that took 144 seconds.

However I can't keep that experimental code: if it turns out that not all
values are constant expressions, it has to revert to normal processing,
which is tricky to do (it may already have read 1M numbers and would need
to backtrack). This was just to see how fast it could be.

Processing 120MB as binary rather than text is still faster; that works
at up to 110MB/sec with an optimised compiler.

As a reminder, when using my version of 'embed' in my language,
embedding a 120MB binary file took 1.3 seconds, about 90MB/second.

But both are much faster than compiling through text. Even "slow"
40MB/3 is 6-7 times faster than the fastest of compilers in my
tests.

Do you have a C compiler that supports #embed?

No, I just blindly believe the paper.

Funny that no one else has access to an implementation! Those figures
have been around for a while.

But it will probably be available in clang this year and in gcc around
the start of next year. At least I hope so.

It's generally understood that processing text is slow, if
representing byte-at-a-time data. If byte arrays could be represented
as sequences of i64 constants, it would improve matters. That could
be done in C, but awkwardly, by aliasing a byte-array with an
i64-array.

I don't think that conversion from text to binary is a significant
bottleneck here.

That's not quite what I meant. That conversion is the lexical part of processing source code; it can be very fast.

It is parsing, and especially constructing a list of 5M or 120M AST
nodes, each containing one expression, and the subsequent type-checking
and code generation that takes the time.

However your benchmark looks intriguing and I'll have a closer look later.

From Michael S@21:1/5 to Malcolm McLean on Sun May 26 23:06:47 2024

On Sun, 26 May 2024 19:19:59 +0100
Malcolm McLean <[email protected]> wrote:

The Baby X resource compiler has a 'binary' tag to embed binary data.
The biggest file in my documents folder was a 33 mb boost zipped
image. And the resource compiler, built in debug mode, took five
seconds to convert that to a C source file with an array of unsigned
chars.

It then took gcc about 20 seconds to compile it to an object file.

If '33 mb' means 33 MB and 'about 20 seconds' means 20 seconds then
your gcc compiles at 1.65 MB/s. That's 2.8x faster than
gcc on my old test machine and 1.7 times faster than gcc 13.2.0 on a much
faster machine with a quite good PCIe-attached SSD. Sounds interesting.
What are your HW, OS and environment?
Can you show us an example of your output format?

The output file was 218 mb. It goes straight in the bin.

From Kaz Kylheku@21:1/5 to jak on Sun May 26 21:03:17 2024

On 2024-05-26, jak <[email protected]> wrote:

Indeed there are c++ compilers who, if used to compile c code, could
decide to call the c compiler to do the work, but if something in the
code is not strictly c, then the compilation will be in c++, the size of

Compilers? As in two or more?

Name them, or there aren't any.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @[email protected]

From Michael S@21:1/5 to bart on Sun May 26 23:26:51 2024

On Sun, 26 May 2024 19:01:21 +0100
bart <[email protected]> wrote:

On 26/05/2024 17:35, Michael S wrote:

On Sun, 26 May 2024 16:25:51 +0100
bart <[email protected]> wrote:

Back to the 5MB test:

Tiny C 1.7s 2.9MB/sec (Tcc doesn't use any IR)

mcc 3.7s 1.3MB/sec (my product; uses intermediate
ASM)

Faster than new MSVC, but slower than old MSVC.

My mcc is never going to be fast, because it uses ASM, which itself
will generate a text file several times larger than the C (so the
line "123," in C ends up as " db 123" in the ASM file).

Generation of asm at 7-8 MB/s sounds feasible even on a slow computer.
And once you have asm in the right format, 'gnu as' processes it quite fast.
On a faster computer I have seen ~30 MB/s. I'd guess the slower one
should be able to do it at 15 MB/s. So generation and assembly together
could run at ~5 MB/s. The trick here is to use the format that 'gnu as' was optimized for. To find out what it is, look at the output of gcc -S.

From Michael S@21:1/5 to Michael S on Sun May 26 23:59:55 2024

On Sun, 26 May 2024 19:35:49 +0300
Michael S <[email protected]> wrote:

On Sun, 26 May 2024 16:25:51 +0100
bart <[email protected]> wrote:

On 26/05/2024 14:18, Michael S wrote:

The timing for 120M entries was challenging as it exceeded physical
memory. However that test I can also do with C compilers. Results
for 120 million lines of data are:

DMC - Out-of-memory

Tiny C - Silently stopped after 13 seconds (I thought it had finished, but no)

lccwin32 - Insufficient memory

gcc 10.x.x - Out of memory after 80 seconds

mcc - (My product) Memory failure after 27 seconds

Clang - (Crashed after 5 minutes)

MM 144s (Compiler for my language)

So the compiler for my language did quite well, considering!

That's an interesting test as well, but I don't want to run it on my
HW right now. May be, at night.

Done.
On bigger gear it was not as bad as I expected.
Input file: 155,488,672 bytes
C source (decimal, one number per line): 641,236,315 bytes
gcc compilation time: 3m54.635s
peak memory consumption by compiler: ~27 GB

0.66 MB/s, only 25-30% slower rate than 5 MB input on the same HW
That is, slow, but not a sky-is-falling sort of slow.
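Working through the figures above with 3m54.635s = 234.635 s and MB = 10^6 bytes shows that the 0.66 MB/s rate refers to the binary input, not to the generated C source:

```latex
\frac{155{,}488{,}672\ \text{B}}{234.635\ \text{s}} \approx 0.663\ \text{MB/s (binary input)},
\qquad
\frac{641{,}236{,}315\ \text{B}}{234.635\ \text{s}} \approx 2.73\ \text{MB/s (C source)},
\qquad
\frac{641{,}236{,}315}{155{,}488{,}672} \approx 4.12\ \text{bytes of C text per input byte}
```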

From Kaz Kylheku@21:1/5 to jak on Sun May 26 21:16:00 2024

On 2024-05-26, jak <[email protected]> wrote:

Keith Thompson ha scritto:

Indeed there are c++ compilers who, if used to compile c code, could
decide to call the c compiler to do the work, but if something in the
code is not strictly c, then the compilation will be in c++, the size
of the executable will increase significantly and will need of an
internal or external runtimer to work. If it were the same thing you
would not get different things.

Oh? Do you know of a C++ compiler that actually behaves this way?
I've never heard of such a thing.

For example g++ makes something similar: if you pass a file .C it
compile the C code but if the file (.C) contains C++ code then
compile C++.

1. The file suffix is not "something /in the code/ that is not strictly C".
The front end of a compiler collection selecting a compiler based
on file suffix is not an example of switching language based
on syntax in the file.

2. g++ does not behave this way.

In fact .C (capital C) is one of the conventions for C++ files. I
seem to remember that the convention was used at AT&T and in fact you
can find examples of it in the source code of Cfront (the historic
C++ to C transpiler originally developed by B. Stroustrup).

For g++ to assume that a .C file is C and not C++ would be insanely
poor.

The g++ command even assumes that .c files are C++!

Conversely, when you use the gcc driver command on a .C file,
you get the C++ compiler!

Since you're posting to Usenet, you're obviously connected to the same
Internet as the rest of us, so it's amazing you're not able to check
your facts. You know about g++, so presumably you have an installation of
it somewhere, where you could run a 30 second experiment.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @[email protected]

From bart@21:1/5 to Michael S on Sun May 26 22:27:15 2024

On 26/05/2024 21:26, Michael S wrote:

On Sun, 26 May 2024 19:01:21 +0100
bart <[email protected]> wrote:

On 26/05/2024 17:35, Michael S wrote:

On Sun, 26 May 2024 16:25:51 +0100
bart <[email protected]> wrote:

Back to the 5MB test:

Tiny C 1.7s 2.9MB/sec (Tcc doesn't use any IR)

mcc 3.7s 1.3MB/sec (my product; uses intermediate
ASM)

Faster than new MSVC, but slower than old MSVC.

My mcc is never going to be fast, because it uses ASM, which itself
will generate a text file several times larger than the C (so the
line "123," in C ends up as " db 123" in the ASM file).

Generation of asm at 7-8 MB/s sounds feasible even on slow computer.
And once you have asm in right format,

If I take the 5M-line data file and use `gcc -S` on it, it produces an ASM
file where the bytes are combined into strings. Is that the 'trick'?

Then processing that ASM file can be faster.

However my ASM o/p doesn't create strings like that, and the ASM file is therefore five times the size.

Still, my assembler can turn my 72MB ASM file into a 5MB executable in
0.74 seconds (which is 100MB/sec).

'as' can turn its much smaller 15MB ASM (.s) file into an executable in
0.56 seconds (27MB/sec).

'gnu as' processes it quite fast.

Given the same input (ie. same set of instructions), my assembler is
faster than 'as'. See this survey of assembler speeds here:

https://www.reddit.com/r/Compilers/comments/1c41y6d/assembler_survey/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

Mine is the 'AA' assembler.

The bottleneck here is writing the ASM file. But I don't care about
that, since 'mcc' is not my primary compiler. My primary one doesn't use
ASM.

But even with that bottleneck, mcc compiles this data file to EXE three
times as fast as gcc.

My MM compiler can do so 17 times as fast as gcc. And with the
optimisation I mentioned in a previous post (similar to as's trick), it
could do so 35-40 times faster than gcc.

From bart@21:1/5 to Michael S on Sun May 26 22:52:25 2024

On 26/05/2024 17:35, Michael S wrote:

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

int main(int argz, char** argv)
{
    if (argz > 1) {
        FILE* fp = fopen(argv[1], "wb");
        if (fp) {
            char buf[2048];
            _Bool look_for_comma = 0;
            for (;;) {
                if (fgets(buf, sizeof(buf), stdin) != buf)
                    break;

                char* p = buf;
                for (;;) {
                    char c = *p;
                    // ctype functions need a value representable as unsigned char
                    if (isgraph((unsigned char)c)) {
                        if (look_for_comma) {
                            if (c == ',') {
                                look_for_comma = 0;
                                ++p;
                            } else {
                                goto done;
                            }
                        } else {
                            char* endp;
                            long val = strtol(p, &endp, 0);
                            if (endp == p) // not a number
                                goto done;
                            fputc((unsigned char)val, fp);
                            p = endp;
                            look_for_comma = 1;
                        }
                    } else {
                        if (c == 0)
                            break; // end of line
                        ++p; // skip space or control character
                    }
                }
            }
          done:
            fclose(fp);
        } else {
            perror(argv[1]);
            return 1;
        }
    }
    return 0;
}

I tried this on my 600MB data like this:

C:\c>c fred.exe <data

C:\c>fred --version
clang version 18.1.0rc
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\c

Since those bytes represent the contents of the clang compiler, I was
able to run it afterwards.

All versions across compilers/optimise levels seemed to give a constant
time of 17-18 seconds. This is good compared with my initial 144 seconds
(most compilers failed; you reported a similar test took several minutes).

However, what's involved with a compiler is much more elaborate than such a
program. There's syntax, type-checking, code-generation...

Still, I reported earlier an experimental change to my non-C compiler,
which translated this same input to a program with that embedded binary
(not just the binary itself) in under 6 seconds.

That's three times as fast as the above result:

C:\mapps>tm \mx2\mm -ext test2 # tm is timing tool
Compiling test2.m to test2.exe
TM: 5.86 # (timings vary)

C:\mapps>test2
data is 119571969 bytes

C:\mapps>type test2.m
[]byte data = (
include "data"
0)

proc main=
fprintln "data is # bytes", data.len
end

From James Kuyper@21:1/5 to BGB on Sun May 26 18:59:09 2024

On 5/26/24 10:48, BGB wrote:

On 5/26/2024 9:18 AM, David Brown wrote:

On 26/05/2024 01:45, Keith Thompson wrote:

David Brown <[email protected]> writes:

...

#define errno *errno()

Both of those need more parentheses -- and I'm uncomfortable using the
same identifier for the macro and the function.

The second example was from the footnote in the C standard's section on
<errno.h>, so it can't be /that/ bad!

But I agree with your discomfort.

I would expect it to immediately explode, because AFAIK the usual preprocessor behavior is to keep expanding macros in a line until there
is nothing left to expand.

No, C macros are not recursive:
"... The resulting preprocessing token sequence is then rescanned, along
with all subsequent preprocessing tokens of the source file, for more
macro names to replace.
If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file’s
preprocessing tokens), it is not replaced. Furthermore, if any nested replacements encounter the name of the macro being replaced, it is not replaced. These nonreplaced macro name preprocessing tokens are no
longer available for further replacement even if they are later
(re)examined in contexts in which that macro name preprocessing token
would otherwise have been replaced." (6.10.4.4p1,2)

From Lawrence D'Oliveiro@21:1/5 to Michael S on Mon May 27 00:48:05 2024

On Sun, 26 May 2024 19:35:49 +0300, Michael S wrote:

Faster than new MSVC, but slower than old MSVC.

New MSVC is slower than old MSVC?!? Say it isn’t so!

From Lawrence D'Oliveiro@21:1/5 to Michael S on Mon May 27 00:49:24 2024

On Sun, 26 May 2024 23:06:47 +0300, Michael S wrote:

On Sun, 26 May 2024 19:19:59 +0100 Malcolm McLean <[email protected]> wrote:

... was a 33 mb boost zipped image.

If '33 mb' means 33 MB ...

Yeah, I wondered about that. Never saw anybody measure things in “millibits” before ...

From Lawrence D'Oliveiro@21:1/5 to David Brown on Mon May 27 00:44:21 2024

On Sun, 26 May 2024 13:09:36 +0200, David Brown wrote:

People have always managed to embed
binary source files into their binary output files - using linker
tricks, or using xxd or other tools (common or specialised) to turn
binary files into initialisers for constant arrays (or structs).

Don’t call them “tricks”. Call them “linker scripts” and “build procedures”. They can do some quite complex things.

#embed has two purposes. One is to save you from using external tools
for that kind of thing.

But it can only be a partial solution to that. It cannot replace the
procedures needed to construct the binary data format. It only solves the
easy part: including that binary data in the build. And only in a certain
way.

That’s why I think it’s a waste of time.

From Lawrence D'Oliveiro@21:1/5 to Bonita Montero on Mon May 27 00:55:28 2024

On Sun, 26 May 2024 13:23:21 +0200, Bonita Montero wrote:

C++ is the wrong language for web applications.
I like Java more for that.

Java is too clunky and verbose. For asynchronous programming (for
WebSockets etc), it requires you to use threads.

I like Python, because it has frameworks based on ASGI for web
applications.

From bart@21:1/5 to Lawrence D'Oliveiro on Mon May 27 01:55:24 2024

On 27/05/2024 01:44, Lawrence D'Oliveiro wrote:

On Sun, 26 May 2024 13:09:36 +0200, David Brown wrote:

People have always managed to embed
binary source files into their binary output files - using linker
tricks, or using xxd or other tools (common or specialised) to turn
binary files into initialisers for constant arrays (or structs).

Don’t call them “tricks”. Call them “linker scripts” and “build procedures”. They can do some quite complex things.

#embed has two purposes. One is to save you from using external tools
for that kind of thing.

But it can only be a partial solution to that. It cannot replace the procedures needed to construct the binary data format.

The binary data already exists, or has been created.

The problem is getting it into your program as ready-to-use data rather
than have to bundle an unwieldy collection of files in a folder
somewhere and then have assorted routines to read them into memory.

It only solves the
easy part: including that binary data in the build.

Apparently that is not so easy as you seem to think. Or maybe you think
that 'embedding a file' just means adding it to a zip file?

That’s why I think it’s a waste of time.

Embedding applies also to text files not just binaries.

I used that extensively so that I could build in the sources of the C
standard libraries into my C compiler. The result is a single executable
with zero dependencies or support files.

How would you have implemented that? How maintainable would it have been?

From Lawrence D'Oliveiro@21:1/5 to Keith Thompson on Mon May 27 00:53:58 2024

On Sun, 26 May 2024 02:48:40 -0700, Keith Thompson wrote:

The gcc command, for example, can invoke either a C or C++ compiler ...

It can also handle Fortran, Go, D, Ada, assembler and object files. Oh,
and Objective C as well.

From Lawrence D'Oliveiro@21:1/5 to bart on Mon May 27 02:48:50 2024

On Mon, 27 May 2024 01:55:24 +0100, bart wrote:

On 27/05/2024 01:44, Lawrence D'Oliveiro wrote:

On Sun, 26 May 2024 13:09:36 +0200, David Brown wrote:

People have always managed to embed binary source files into their
binary output files - using linker tricks, or using xxd or other tools
(common or specialised) to turn binary files into initialisers for
constant arrays (or structs).

Don’t call them “tricks”. Call them “linker scripts” and “build procedures”. They can do some quite complex things.

#embed has two purposes. One is to save you from using external tools
for that kind of thing.

But it can only be a partial solution to that. It cannot replace the
procedures needed to construct the binary data format.

The binary data already exists, or has been created.

It might have to be created as part of the build process.

The problem is getting it into your program as ready-to-use data rather
than have to bundle an unwieldy collection of files in a folder
somewhere and then have assorted routines to read them into memory.

Nothing “unwieldy” about it. It’s a bunch of temporary intermediate build products, generated from suitable source files like everything else in the build.

It only solves the easy part: including that binary data in the build.

Apparently that is not so easy as you seem to think.

Yes, it is as easy as I think. I’ve done this sort of thing, using
suitable build scripts.

Or maybe you think
that 'embedding a file' just means adding it to a zip file?

It’s whatever “including it in the build” means. It might indeed be a zip component, as with resources for an Android app. Or it might be converted
into an object file with a tool like objcopy, to be integrated into the executable.

Embedding applies also to text files not just binaries.

Same principle applies.

From jak@21:1/5 to All on Mon May 27 07:14:30 2024

Kaz Kylheku ha scritto:

On 2024-05-26, jak <[email protected]> wrote:

Keith Thompson ha scritto:

Indeed there are c++ compilers who, if used to compile c code, could
decide to call the c compiler to do the work, but if something in the
code is not strictly c, then the compilation will be in c++, the size
of the executable will increase significantly and will need of an
internal or external runtimer to work. If it were the same thing you
would not get different things.

Oh? Do you know of a C++ compiler that actually behaves this way?
I've never heard of such a thing.

For example g++ makes something similar: if you pass a file .C it
compile the C code but if the file (.C) contains C++ code then
compile C++.

1. The file suffix is not "something /in the code/ that is not strictly C".
The front end of a compiler collection selecting a compiler based
on file suffix is not an example of switching language based
on syntax in the file.

2. g++ does not behave this way.

In fact .C (capital C) is one of the conventions for C++ files. I
seem to remember that the convention was used at AT&T and in fact you
can find examples of it in the source code of Cfront (the historic
C++ to C transpiler originally developed by B. Stroustrup).

For g++ to assume that a .C file is C and not C++ would be insanely
poor.

The g++ command even assumes that .c files are C++!

Conversely, when you use the gcc driver command on a .C file,
you get the C++ compiler!

Since you're posting to Usenet, you're obviously connected to the same Internet as the rest of us, so it's amazing you're not able to check
your facts. You know about g++, so presumably you have an installation of
it somewhere, where you could run a 30 second experiment.

Regarding what you are talking about, I must apologize for one thing: in
my message that you quote, '.c' appears in capital letters. Unfortunately,
Google Translate changes anything that looks like a brand name or a very
short text (c, c++, g++, ...), and at first I did not notice this. I hope
you will excuse me: I rewrite every sentence several times to get a
translation as close as possible to what I want to say. As for the test you
ask for, I would point out that I already posted one on
Sun-26-May-2024-19:11:31+0200; if you had seen it, perhaps it would have
avoided this post.

'.c' in GT -> '. C'
(c, c++, g++, ...) in GT -> (C, C ++, G ++, ...)

From Michael S@21:1/5 to Lawrence D'Oliveiro on Mon May 27 11:05:47 2024

On Mon, 27 May 2024 00:48:05 -0000 (UTC)
Lawrence D'Oliveiro <[email protected]d> wrote:

On Sun, 26 May 2024 19:35:49 +0300, Michael S wrote:

Faster than new MSVC, but slower than old MSVC.

New MSVC is slower than old MSVC?!? Say it isn’t so!

Isn't that the case for just about any compiler that has a long history
of development?
Compilers become slower over time. In return they support newer dialects
of input language and generate better diagnostics. They also try to
produce faster code, with very varying levels of success.
This trend was most easily seen during first decade of LLVM/clang.

From David Brown@21:1/5 to jak on Mon May 27 10:45:28 2024

On 26/05/2024 19:11, jak wrote:

David Brown ha scritto:

On 26/05/2024 17:10, jak wrote:

?
I really wrote that something similar (similar != equal) did g++ and
that, if you write c++ code in a file with the .c extension, the g++
compile it. I never wrote that it was automatically recognized.
In addition, you just explained why g++ compile a .c that contains c++
code. I don't understand: no what?

I made an error here - "g++ foo.c" /will/ treat the file as C++. I
apologise for that, as it made things a lot more confusing.

But that is not what you wrote. Perhaps you didn't write what you
intended to write. You said that g++ somehow determines whether to
compile code as C or C++ based on the /contents/ of the file, not the
filename suffix. And that is completely wrong.

You also mixed up ".c" and ".C". gcc considers ".c" to be C code,
while ".C" (with a capital C) is considered C++.

Sorry but no. I wrote that there are compilers who do it and when they replied, bringing the gcc as an example, I replied that the g++ does something similar.

and no, I have not confused the .c with the .C:

You /did/ mix these things up - the Usenet posts are there for you, me,
or anyone else to read. But there seems little doubt now that you
understand the difference between "gcc" and "g++", and between ".c" and
".C". So I assume the mixup was a language issue - I fully understand
that it's not always easy to communicate accurately in a different
language, and even when you are as good as you are in English, sometimes
there are miscommunications.

Whichever compiler you use, I strongly recommend using only ".c" for C
files, and only ".cpp" for C++ files. There are several other
extensions used for C++, but IME ".cpp" is the most commonly used and
supported by all C++ tools on all platforms. ".C" (capital C) is a poor
choice - it's hard to distinguish from ".c" (small C), and it will drive Windows users crazy. And if you use gcc, then unless you can stick to a
pure C++ setup and never use C, I recommend using "gcc" rather than
"g++" for everything except the final linking stage (and even that is optional). The "gcc" driver program does the right thing.

From David Brown@21:1/5 to Lawrence D'Oliveiro on Mon May 27 11:10:22 2024

On 27/05/2024 02:49, Lawrence D'Oliveiro wrote:

On Sun, 26 May 2024 23:06:47 +0300, Michael S wrote:

On Sun, 26 May 2024 19:19:59 +0100 Malcolm McLean
<[email protected]> wrote:

... was a 33 mb boost zipped image.

If '33 mb' means 33 MB ...

Yeah, I wondered about that. Never saw anybody measure things in “millibits” before ...

I've seen communication systems that had transfer speeds measured in
mbps - millibits per second.

From David Brown@21:1/5 to Keith Thompson on Mon May 27 13:42:28 2024

On 27/05/2024 01:17, Keith Thompson wrote:

David Brown <[email protected]> writes:

On 26/05/2024 00:58, Keith Thompson wrote:

David Brown <[email protected]> writes:

On 25/05/2024 03:29, Keith Thompson wrote:

Keith Thompson <[email protected]> writes:

David Brown <[email protected]> writes:

On 23/05/2024 14:11, bart wrote:

[...]

The compiler will generate results /as if/ it had expanded the file to
a list of numbers and parsed them. But it will not do that in
practice. (At least, not for more serious implementations - simple
solutions might do so to get support implemented quickly.)

I'll start by acknowledging that the prototype implementation apparently
*does* optimize #embed when it can. I was mistaken on that point.

#embed *must* expand to the standard-defined comma-delimited sequence in *some* cases.

Which means that the piece of the compiler that implements #embed has to recognize when it must generate that sequence, and when it can do
something more efficient.

Yes, exactly.

I'd expect implementations to have extremely fast implementations for
initialising arrays of character types, and probably also for other
arrays of scalar types. More complicated examples - such as
parameters in a macro or function call - would probably use a
fall-back of generating naïve lists of integer constants.

My problem is not just with how the compiler can figure out when it can optimize, but how programmers are supposed to understand whatever rules
it uses. Can I rely on the optimization being performed if I use a
typedef for unsigned char, or if I use an enumeration type whose
underlying type is unsigned char, or if I have initialization elements
before and after the #embed directive?

I don't know if that is something the programmer should need to
consider, at least for most cases. Generally as a programmer you don't consider the compilation speed when writing code. You simply expect
that compiler writers try to make their tools as fast as reasonably
possible without sacrificing features. Sometimes there can be
particular use-cases where the programmer has to look at the compiler
manuals and adapt the code or build procedures to suit. I think that
will be the case here too - compiler manuals should document what types
of #embed usage they optimise. But I think it is unlikely that people
writing portable code will do anything other than initialising a const
(or constexpr) array of unsigned char if they have big enough files for optimisation to be relevant. Any compiler that does any #embed
optimisation will handle this case. And even simple #embed
implementations will likely be better than any alternatives (such as
using xxd).

Effective use of #embed requires too much "magic" for my taste -- particularly having the preprocessor rely on information from later
phases. The semantics of #embed don't rely on that information, but efficient use for large files does.

It is a violation of the neat layered (or pipeline) view of C
compilation. But you could argue that this has been broken for decades
- you have _Pragma that is syntactically an operator but duplicates preprocessor work, you have compiler pragmas that duplicate command-line
flags (and command-line flags that duplicate preprocessor defines), you
have pre-compiled headers, you have LTO that passes data multiple times
through different parts of the pipeline.

If you have a binary file containing a sequence of int values, you can
use #embed to initialize an unsigned char array that's aliased with or
copied to the int array.
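
Keith's copy-into-an-int-array approach might look like the following minimal sketch. It assumes that the file was written on the same platform (same int size and byte order) and that bytes are 8 bits. `INTS_BIN_STANDIN` is a hypothetical stand-in for what `#embed "ints.bin"` would expand to. Copying with memcpy sidesteps the alignment and effective-type problems that casting the pointer would raise.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the expansion of #embed "ints.bin":
   the raw bytes of a file written by the same platform. */
#define INTS_BIN_STANDIN 0x01, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00

_Static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

static const unsigned char raw[] = { INTS_BIN_STANDIN };

_Static_assert(sizeof raw % sizeof(int) == 0,
               "file size must be a multiple of sizeof(int)");

static int values[sizeof raw / sizeof(int)];

/* Copy the embedded bytes into properly typed and aligned storage. */
static void load_values(void)
{
    memcpy(values, raw, sizeof raw);
}

int main(void)
{
    load_values();
    /* The printed values depend on the host's int size and byte order. */
    for (size_t i = 0; i < sizeof values / sizeof values[0]; i++)
        printf("%d\n", values[i]);
    return 0;
}
```
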
The *embed element width* is typically going to be CHAR_BIT bits by
default. It can only be changed by an *implementation-defined* embed
parameter. It seems odd that there's no standard way to specify the
element width.

It seems even more odd that the embed element width is
implementation defined and not set to CHAR_BIT by default.

I agree. But it may be left flexible for situations where the host
and target have different ideas about CHAR_BIT. (Targets with
CHAR_BIT other than 8 are very rare, hosts with CHAR_BIT other than 8
are non-existent, but C remains flexible.)

I would think that you'd want the element width to match CHAR_BIT *on
the target* (which is the only CHAR_BIT that's relevant or available).
If you're cross-compiling, you'd probably want to embed a file that
could have been used on the target system.

Yes, I think so.

And if I'm not doing that kind of exotic cross-compiling, I can't rely
on the element width being CHAR_BIT *or* on any standard way to specify
that I want it to be CHAR_BIT.

Requiring the default width to be CHAR_BIT would, I'm guessing, solve
99% of cases. Allowing it to be specified by a parameter would solve
the remaining 1%. And I expect it *will* be CHAR_BIT in most or all implementations, and programmers will rely on that assumption. I think
the standard should guarantee that.

I agree with you. I'm just trying to think of why the standards might
not make that guarantee.

For a very large file, that could be a significant burden. (I don't
have any numbers on that.)

I do :

<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#design-efficiency-metrics>

(That's from a proposal for #embed for C and C++. Generating the
numbers and parsing them is akin to using xxd.)

More useful links:

<https://thephd.dev/embed-the-details#results>
<https://thephd.dev/implementing-embed-c-and-c++>

(These are from someone who did a lot of the work for the proposals,
and prototype implementations, as far as I understand it.)

That second link does have a lot of good information. I think I had
seen it before, but I hadn't read it thoroughly. It refers to prototype implementations for both gcc and clang. I've built the prototype on my system, and godbolt.org has it, but the gcc prototype (for which the
article provides good performance data) doesn't seem to be available anywhere.

You are putting a lot more effort into this testing than I have. For my
work, I am generally dependent on "official" toolchain builds - provided
by the manufacturers of the microcontrollers we use, or at least by the manufacturers of the cpu cores. I like to keep track of what's coming -
future versions of C or C++, future versions of compilers, etc. But
details such as implementation efficiency (rather than features) don't
matter much to me until they are available as part of these pre-built toolchains. (Sometimes it's fun to try things earlier, and I enjoy
playing with newer compilers on godbolt.org, but I don't see testing the
speed of #embed to be /so/ much fun that I'd bother building a compiler
for it!)

But it's nice to see you've done some independent testing. I have no particular reason to doubt "thephd.dev", but no particular reason to
consider it authoritative either.

My experiments with the clang prototype have been a bit confusing. I
assumed that `clang -E` would give me meaningful results, but it always produces the comma-delimited sequence of integer constants, and even
that output is inconsistent. It looks like "-E" synthesizes naive and
not entirely correct output. Feeding that output to clang produces
warnings that I don't get without "-E". Some of this might be the
result of user error on my part.

I did some tests with 100MB file, both with #embed and with #include
using the output of "xxd". #embed *is* much faster.

According to <https://thephd.dev/implementing-embed-c-and-c++>, it
internally generates __builtin_pp_embed, which takes as arguments the expected type (always unsigned char for now), the filename as a string literal, and the data encoded as a base64 string literal. That's not
going to be as fast as a hypothetical pure binary blob, but apparently
it's still much faster than parsing a comma-delimited sequence.

I haven't been able to get "clang -E" in the prototype to generate __builtin_pp_embed, or to get clang to recognize it. There are internal things going on that I don't understand.

The author points out that using binary blobs would break tools that
work with -E preprocessed source files. If you could assume that the preprocessed output will be processed only by the same compiler, that wouldn't be an issue, but apparently that's not a safe assumption.

The author acknowledges that the prototype implementation doesn't handle
all cases correctly.

That's all good testing results - thanks for reporting them.

Prototypes have been made, and they do have such optimisations. How
things end up in real tools remains to be seen, of course.

Here's how I personally would have preferred for #embed to be specified:

- As in current C23 drafts, #embed with no parameters must operate *as
if* it expanded to a comma-delimited list of integer constant
expressions.
- With no parameters, both the common cases (initializing an array of
characters) and odd cases (e.g., initializing a struct object with
varying types and sizes of members) must work as specified.
- A standard-defined parameter allows control over optimization.

The parameter can be "optimize(true)" or "optimize(false)".

"optimize(false)" has no formal effect, but the compiler *should*
generate the canonical sequence of constants.

"optimize(true)" causes undefined behavior if #embed is used in a
context other than the initialization of an array of character type.

I disagree here. I want the compiler to generate the "as if" results regardless of any optimisation, working as currently specified. And
/if/ the compiler is able to optimise the #embed, then I want it to do
so automatically - I see no situation in which I would ever want "optimize(false)".

What would be nice is an optional warning if the #embed size is over a
certain limit and it is unable to optimise it - a message telling the
user that an array of "unsigned char" would be faster than an array of
"signed char", or whatever, would be helpful. But that kind of thing is definitely implementation-specific.

I'd also like a pre-processor command-line option (again this is clearly implementation-specific) to force non-optimised output from #embed, for
use with "gcc -E" (or "clang -E") and third-party tools.

A naive compiler can quietly ignore the optimize() parameter and always generate the comma-delimited sequence. An exceedingly clever compiler
could ignore it and always make a correct decision about whether to
optimize #embed.

Without the optimize parameter, typical compilers are expected to
optimize #embed depending on the context in which it's used, and should produce the correct results in all cases. The parameter can be used to override the compiler's judgement.

Another possibility might have been to specify that #embed can *only* be
used to initialize an array of character type, and any other use either
has undefined behavior or is a constraint violation. That would avoid
all the complication of determining from context whether it can be
optimized, and would probably cover 99% of cases. But it's probably too
late for that.

Agreed.

As it is, #embed is complicated because it covers more than the simple
case of initialising a const array of unsigned char. But it can't cover anything like all cases of embedding external data in C programs. (I
have programs with internal web servers - they need to embed all files
in a directory, and create an indexing structure. This is currently all automated by a python script called from the makefile - switching to
#embed only would involve manual source changes when files are added or removed.)

From bart@21:1/5 to Lawrence D'Oliveiro on Mon May 27 14:03:16 2024

On 27/05/2024 03:48, Lawrence D'Oliveiro wrote:

On Mon, 27 May 2024 01:55:24 +0100, bart wrote:

On 27/05/2024 01:44, Lawrence D'Oliveiro wrote:

Nothing “unwieldy” about it. It’s a bunch of temporary intermediate build
products, generated from suitable source files like everything else in the build.

It only solves the easy part: including that binary data in the build.

Apparently that is not so easy as you seem to think.

Yes, it is as easy as I think. I’ve done this sort of thing, using
suitable build scripts.

Show me.

This is how I show help text, which is maintained in an ordinary text
file, from within my C compiler:

println sinclude("help.txt")

Just one line directly in the source code. The text is baked into the executable so there is no discrete file in the installation.

What would it look like in your build system, and what does it look like
in the source code of your app?

I mean, it's not as though this stuff is impossible without such a
feature; the idea is to make it much simpler to do.

If your method is simpler, I'll get rid of my feature and use your way.

BTW here is the entire build process for the compiler:

C:\cx>mm cc
Compiling cc.m to cc.exe

'cc' is cc.m, the lead module. It incorporates 42 embedded files in all.
Your method can't be any more elaborate than that.

I don't use build scripts; I don't need them.

Here is another example in C (using an older compiler that supported embedded text files; this is not standard C, but it could be, and I
think it will be once #embed is available).

It is a program posted by Michael S, but with an extra 'puts' line at
the beginning so that it first prints out its own source code.

It works by embedding its own source text in the binary. I'd be
interested in how your build process manages this.

----------------------------------

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

int main(int argz, char** argv)
{
    puts(strinclude(__FILE__));

    if (argz > 1) {
        FILE* fp = fopen(argv[1], "wb");
        if (fp) {
            char buf[2048];
            _Bool look_for_comma = 0;
            for (;;) {
                if (fgets(buf, sizeof(buf), stdin) != buf)
                    break;

                char* p = buf;
                for (;;) {
                    char c = *p;
                    // cast: isgraph() needs a value representable as unsigned char
                    if (isgraph((unsigned char)c)) {
                        if (look_for_comma) {
                            if (c == ',') {
                                look_for_comma = 0;
                                ++p;
                            } else {
                                goto done;
                            }
                        } else {
                            char* endp;
                            long val = strtol(p, &endp, 0);
                            if (endp==p) // not a number
                                goto done;
                            fputc((unsigned char)val, fp);
                            p = endp;
                            look_for_comma = 1;
                        }
                    } else {
                        if (c == 0)
                            break; // end of line
                        ++p; // skip space or control character
                    }
                }
            }
          done:
            fclose(fp);
        } else {
            perror(argv[1]);
            return 1;
        }
    }
    return 0;
}

----------------------------------

C:\c>bcc c.c
Compiling c.c to c.exe

C:\c>c
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

int main(int argz, char** argv)
{
    puts(strinclude(__FILE__));

    if (argz > 1) {
        FILE* fp = fopen(argv[1], "wb");
        if (fp) {
.....

Or maybe you think that 'embedding a file' just means adding it to a zip file?

It’s whatever “including it in the build” means. It might indeed be a zip
component, as with resources for an Android app. Or it might be converted into an object file with a tool like objcopy, to be integrated into the executable.

Embedding applies also to text files not just binaries.

Same principle applies.

From Scott Lurndal@21:1/5 to Keith Thompson on Tue May 28 00:20:30 2024

Keith Thompson <[email protected]> writes:

David Brown <[email protected]> writes:

On 26/05/2024 00:58, Keith Thompson wrote:

It knows because the compiler writers are actually quite smart. The C
standards may describe the translation process in a series of distinct
and independent phases, but that's not how it is done in practice.
The key point is that the compiler knows how the sequence of integers
is going to be used before it gets that far in the preprocessing.

I'd expect implementations to have extremely fast implementations for
initialising arrays of character types, and probably also for other
arrays of scalar types. More complicated examples - such as
parameters in a macro or function call - would probably use a
fall-back of generating naïve lists of integer constants.

My problem is not just with how the compiler can figure out when it can
optimize, but how programmers are supposed to understand whatever rules
it uses. Can I rely on the optimization being performed if I use a
typedef for unsigned char, or if I use an enumeration type whose
underlying type is unsigned char, or if I have initialization elements
before and after the #embed directive?

A typical use case for me would be to build a binary file
with a bespoke application. I would expect the #embed of that
file to _maintain the binary layout in memory exactly the
same as in the file_. It would be the #embed user's
responsibility to ensure that the binary file would be identical
to the binary data expected by the declaration of the data structure
being embedded.

E.g. if the embedded file contained an array of some structure,
the binary format of the embedded file must match the binary format
that would be expected by the compiler (field sizes, alignment etc)
for an array of said structure.

The spec does say that the data in memory must match the data in the
file. So it seems that the preprocessor can simply add a private
attribute (e.g. just pass the #embed to the compiler a la #line or #file)
and the compiler will tag the symbol table entry for the symbol associated
with the #embed and the code generator can just open the file and
copy the data byte-for-byte to the object file.

From Lawrence D'Oliveiro@21:1/5 to David Brown on Tue May 28 02:48:28 2024

On Sun, 26 May 2024 18:12:17 +0200, David Brown wrote:

Macros in C are not recursive. That stops them exploding, but also means there's a lot you can't do with the preprocessor.

String-based macros + recursive substitution = recipe for trouble.

From Lawrence D'Oliveiro@21:1/5 to bart on Tue May 28 02:45:48 2024

On Mon, 27 May 2024 14:03:16 +0100, bart wrote:

On 27/05/2024 03:48, Lawrence D'Oliveiro wrote:

Apparently that is not so easy as you seem to think.

Yes, it is as easy as I think. I’ve done this sort of thing, using
suitable build scripts.

Show me.

Here <https://github.com/ldo/unicode_browser_android> is an old
example, from when I was trying to learn Android programming. It lets
you browse the Unicode code-point database, and do incremental
searches by partial matching on code-point names: e.g. you can type
“right arrow” and see candidate matches such as “U+219B RIGHTWARDS
ARROW WITH STROKE”, “U+219D RIGHTWARDS WAVE ARROW”, “U+21A0 RIGHTWARDS TWO HEADED ARROW” etc.

In the “util” subdirectory, you will find a Python script called “get_codes”. This processes a NamesList.txt file as downloaded from Unicode.org, and encodes the database as a binary blob with a specially-constructed header to allow quick loading and extraction of code-point information, including names, categories, related entries
etc. This blob gets built as a “resource file” into the .apk file,
where the Java code can find it.

From Lawrence D'Oliveiro@21:1/5 to Michael S on Tue May 28 05:41:12 2024

On Sun, 26 May 2024 19:50:40 +0300, Michael S wrote:

If I am not mistaken, gfortran by default treats extension .f
as "old FORTRAN" and extension .f90 as "new Fortran".

The full list of recognized file extensions and their treatment is here <https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gfortran/GNU-Fortran-and-GCC.html>.

From Lawrence D'Oliveiro@21:1/5 to David Brown on Tue May 28 05:45:02 2024

On Mon, 27 May 2024 10:45:28 +0200, David Brown wrote:

Whichever compiler you use, I strongly recommend using only ".c" for C
files, and only ".cpp" for C++ files.

Some use .cc for C++ code as well. Looking at the Blender source tree, for example, I see a mix of .cpp and .cc.

From Michael S@21:1/5 to Lawrence D'Oliveiro on Tue May 28 10:46:32 2024

On Tue, 28 May 2024 05:41:12 -0000 (UTC)
Lawrence D'Oliveiro <[email protected]d> wrote:

On Sun, 26 May 2024 19:50:40 +0300, Michael S wrote:

If I am not mistaken, gfortran by default treats extension .f
as "old FORTRAN" and extension .f90 as "new Fortran".

The full list of recognized file extensions and their treatment is
here <https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gfortran/GNU-Fortran-and-GCC.html>.

Thank you.
So I remembered correctly, but did not realize that both old and new
variants (a.k.a. fixed form and free form) are processed by the same
front end, f951. The driver programs (gcc, g++, gfortran) pass
dialect information to f951 as a command-line parameter. Fixed form is
chosen by -ffixed-form, free form appears to be the default.

From bart@21:1/5 to Lawrence D'Oliveiro on Tue May 28 11:30:05 2024

On 28/05/2024 03:45, Lawrence D'Oliveiro wrote:

On Mon, 27 May 2024 14:03:16 +0100, bart wrote:

On 27/05/2024 03:48, Lawrence D'Oliveiro wrote:

Apparently that is not so easy as you seem to think.

Yes, it is as easy as I think. I’ve done this sort of thing, using
suitable build scripts.

Show me.

Here <https://github.com/ldo/unicode_browser_android> is an old
example, from when I was trying to learn Android programming. It lets
you browse the Unicode code-point database, and do incremental
searches by partial matching on code-point names: e.g. you can type
“right arrow” and see candidate matches such as “U+219B RIGHTWARDS ARROW WITH STROKE”, “U+219D RIGHTWARDS WAVE ARROW”, “U+21A0 RIGHTWARDS
TWO HEADED ARROW” etc.

In the “util” subdirectory, you will find a Python script called “get_codes”. This processes a NamesList.txt file as downloaded from Unicode.org, and encodes the database as a binary blob with a specially-constructed header to allow quick loading and extraction of code-point information, including names, categories, related entries
etc. This blob gets built as a “resource file” into the .apk file,
where the Java code can find it.

OK, so basically this writes a file. Or, part of a file?

Where is the bit in the Java code that embeds it? Or is writing it as
part of the .apk what you consider embedding?

This is like saying that there's no point in anyone doing:

#embed "clang.exe"

because building that program is so much more complicated. (Or would be
if somebody hadn't already done it.)

The point is this: /once you already have those discrete files/, how do
you painlessly embed them into your application?

From David Brown@21:1/5 to Keith Thompson on Tue May 28 13:52:34 2024

On 28/05/2024 02:33, Keith Thompson wrote:

David Brown <[email protected]> writes:

On 27/05/2024 01:17, Keith Thompson wrote:

[...]

Here's how I personally would have preferred for #embed to be specified:

- As in current C23 drafts, #embed with no parameters must operate *as
if* it expanded to a comma-delimited list of integer constant
expressions.
- With no parameters, both the common cases (initializing an array of
characters) and odd cases (e.g., initializing a struct object with
varying types and sizes of members) must work as specified.
- A standard-defined parameter allows control over optimization.

The parameter can be "optimize(true)" or "optimize(false)".

"optimize(false)" has no formal effect, but the compiler *should*
generate the canonical sequence of constants.

"optimize(true)" causes undefined behavior if #embed is used in a
context other than the initialization of an array of character type.

I disagree here. I want the compiler to generate the "as if" results
regardless of any optimisation, working as currently specified. And
/if/ the compiler is able to optimise the #embed, then I want it to do
so automatically - I see no situation in which I would ever want
"optimize(false)".

The issue I'm trying to address (very prematurely, no doubt) is
that the decision of whether to optimize #embed vs. generating the
naive comma-separated sequence is difficult to formalize, and easy
to get wrong in corner cases.

That's probably true. I would expect compiler implementations to
optimise #embed only in cases where it is very clear (and at the very
least, initialising a const array of char will fall into that category),
and only when the preprocessor and compiler can coordinate it. Fallback
will be using integer literal constants. I can't see any reason why
that fallback should be slower than using xxd (or similar) and #include,
so #embed should always be no slower than existing methods but sometimes
very much faster.

If optimisation was controlled or specified by something in the standard
(such as your suggested "optimize()" parameter), then it would have to
be formalized - leaving it to the implementation, which can document it
as "best effort", entirely avoids the difficulty of specifying it. The
only formalization needed is to say that it will always act "as if" it generated a comma-separated sequence.

"restrict" is another performance
hint whose only formal effect is to introduce undefined behavior
if you use it incorrectly.

Yes, it is. (And I believe C23 has re-written some of the description
of "restrict" - not to change its behaviour, but to make it clearer. I
have not looked at that bit as yet.) But again, I can't see how any
discussion of optimisation of #embed affects the behaviour and therefore
any UB. The result is /always/ the same - it's just the compile time
that may differ.

Let's say I define an array of a 1-byte enumeration type, initialized
with #embed for a very large binary file. Maybe one compiler recognizes
this as a case where it can perform the optimization, and another
doesn't.

Yes, that may be the case.

If I can tell the compiler "trust me, I'm using this to
initialize raw byte data, and I'll take responsibility if I get it
wrong", I can see that being useful.

What do you mean by "wrong" here? Both compilers will give identical
results. The only difference is that one will do so faster than the other.

And maybe "optimize" isn't the best name. Perhaps "raw_bytes"?

"raw_bytes" makes no sense to me. I can see that "optimize" might be
confusing - normally the word refers to the speed (and/or memory usage)
of the generated code, while here it refers to the speed (and/or memory
usage) of the compilation.

Without some kind of programmer control, I'm concerned that the rules
for defining an array so #embed will be correctly optimized will be
spread as lore rather than being specified anywhere.

They might, but I really do not think that is so important, since they
will not affect the generated results.

From David Brown@21:1/5 to Lawrence D'Oliveiro on Tue May 28 13:56:41 2024

On 28/05/2024 04:48, Lawrence D'Oliveiro wrote:

On Sun, 26 May 2024 18:12:17 +0200, David Brown wrote:

Macros in C are not recursive. That stops them exploding, but also means
there's a lot you can't do with the preprocessor.

String-based macros + recursive substitution = recipe for trouble.

I'm not sure I'd go /that/ far - but it is a recipe for complications,
which can include useful things and new ways to do really bad things.
It's quite possible to crash (or cause it to halt with an error message)
a C pre-processor without recursive macros - it just takes slightly longer.

From Scott Lurndal@21:1/5 to Malcolm McLean on Tue May 28 15:53:20 2024

Malcolm McLean <[email protected]> writes:

On 28/05/2024 01:20, Scott Lurndal wrote:

Keith Thompson <[email protected]> writes:

David Brown <[email protected]> writes:

On 26/05/2024 00:58, Keith Thompson wrote:

It knows because the compiler writers are actually quite smart. The C
standards may describe the translation process in a series of distinct
and independent phases, but that's not how it is done in practice.
The key point is that the compiler knows how the sequence of integers
is going to be used before it gets that far in the preprocessing.

I'd expect implementations to have extremely fast implementations for
initialising arrays of character types, and probably also for other
arrays of scalar types. More complicated examples - such as
parameters in a macro or function call - would probably use a
fall-back of generating naïve lists of integer constants.

My problem is not just with how the compiler can figure out when it can
optimize, but how programmers are supposed to understand whatever rules
it uses. Can I rely on the optimization being performed if I use a
typedef for unsigned char, or if I use an enumeration type whose
underlying type is unsigned char, or if I have initialization elements
before and after the #embed directive?

A typical use case for me would be to build a binary file
with a bespoke application. I would expect the #embed of that
file to _maintain the binary layout in memory exactly the
same as in the file_. It would be the #embed user's
responsibility to ensure that the binary file would be identical
to the binary data expected by the declaration of the data structure
being embedded.

E.g. if the embedded file contained an array of some structure,
the binary format of the embedded file must match the binary format
that would be expected by the compiler (field sizes, alignment etc)
for an array of said structure.

The spec does say that the data in memory must match the data in the
file. So it seems that the preprocessor can simply add a private
attribute (e.g. just pass the #embed to the compiler a la #line or #file)
and the compiler will tag the symbol table entry for the symbol associated
with the #embed and the code generator can just open the file and
copy the data byte-for-byte to the object file.

You need the Baby X resource compiler.

No, I'll use mmap() to map the binary into the application at run time.
For various reasons, #embed wouldn't be the proper solution
for this application since the data being mapped in varies depending
on the run-time configuration of the application.

From Scott Lurndal@21:1/5 to Keith Thompson on Tue May 28 15:42:48 2024

Keith Thompson <[email protected]> writes:

[email protected] (Scott Lurndal) writes:

Keith Thompson <[email protected]> writes:

David Brown <[email protected]> writes:

On 26/05/2024 00:58, Keith Thompson wrote:
It knows because the compiler writers are actually quite smart. The C
standards may describe the translation process in a series of distinct
and independent phases, but that's not how it is done in practice.
The key point is that the compiler knows how the sequence of integers
is going to be used before it gets that far in the preprocessing.

I'd expect implementations to have extremely fast implementations for
initialising arrays of character types, and probably also for other
arrays of scalar types. More complicated examples - such as
parameters in a macro or function call - would probably use a
fall-back of generating naïve lists of integer constants.

My problem is not just with how the compiler can figure out when it can
optimize, but how programmers are supposed to understand whatever rules
it uses. Can I rely on the optimization being performed if I use a
typedef for unsigned char, or if I use an enumeration type whose
underlying type is unsigned char, or if I have initialization elements
before and after the #embed directive?

A typical use case for me would be to build a binary file
with a bespoke application. I would expect the #embed of that
file to _maintain the binary layout in memory exactly the
same as in the file_.

I'm not sure why you'd expect that given the way #embed is specified --
*unless* you're using it to initialize an array of characters.

It would be the #embed user's
responsibility to ensure that the binary file would be identical
to the binary data expected by the declaration of the data structure
being embedded.

E.g. if the embedded file contained an array of some structure,
the binary format of the embedded file must match the binary format
that would be expected by the compiler (field sizes, alignment etc)
for an array of said structure.

The spec does say that the data in memory must match the data in the
file.

Where does it say that?

See <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf>
6.10.4. (N3220 is a C26 draft, but it's very close to C23.)

The spec says that #embed expands to a comma-delimited sequence of
integer constant expressions (and like anything, optimizations that
don't violate the specified behavior are allowed). If the implementation-defined *embed element width* is CHAR_BIT (which is not guaranteed), then you can expect the same data layout *if* you use it to initialize an array of characters, preferably unsigned char.
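To make the layout point concrete, here is a minimal sketch assuming a compiler with C23 `#embed` support, using this source file (`__FILE__`) as the resource and a hard-coded stand-in otherwise. The same four bytes initialise an `unsigned char` array and an `int` array. Every embedded byte becomes one element either way, so the `int` array holds the same values but not the file's byte layout.

```c
#include <stddef.h>
#include <string.h>

#if defined(__has_embed)
#  if __has_embed(__FILE__) == __STDC_EMBED_FOUND__
#    define HAVE_SELF_EMBED 1
#  endif
#endif

#ifdef HAVE_SELF_EMBED
static const unsigned char as_bytes[] = {
#embed __FILE__ limit(4)
};
static const int as_ints[] = {
#embed __FILE__ limit(4)
};
#else
/* Stand-ins for compilers without #embed. */
static const unsigned char as_bytes[] = { '#', 'i', 'n', 'c' };
static const int as_ints[] = { '#', 'i', 'n', 'c' };
#endif

/* Each element holds one byte value, whatever the element type. */
static int same_values(void)
{
    for (size_t i = 0; i < sizeof as_bytes; i++)
        if (as_ints[i] != as_bytes[i])
            return 0;
    return 1;
}

/* Only the character array is a byte-for-byte copy of the file data;
   the int array is sizeof(int) times larger. */
static int int_array_is_byte_copy(void)
{
    return sizeof as_ints == sizeof as_bytes
        && memcmp(as_ints, as_bytes, sizeof as_bytes) == 0;
}
```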

"Implementations should take into account translation-time bit and
byte orders as well as execution-time bit and byte orders to more
appropriately represent the resource's binary data from the directive.
This maximizes the chance that, if the resource referenced at translation
time through the #embed directive is the same one accessed through
execution-time means, the data that is e.g. fread or similar into contiguous
storage will compare bit-for-bit equal to an array of character type initialized
from an #embed directive's expanded contents."

p. 172 n3220.pdf.

From Michael S@21:1/5 to Keith Thompson on Tue May 28 23:37:18 2024

On Tue, 28 May 2024 13:21:26 -0700
Keith Thompson wrote:

David Brown writes:

On 28/05/2024 02:33, Keith Thompson wrote:

[...]

Without some kind of programmer control, I'm concerned that the
rules for defining an array so #embed will be correctly optimized
will be spread as lore rather than being specified anywhere.

They might, but I really do not think that is so important, since
they will not affect the generated results.

Right, it won't affect the generated results (assuming I use it
correctly). Unless I use `#embed optimize(true)` to initialize
a struct with varying member sizes, but that's my fault because I
asked for it.

The point is compile-time performance, and perhaps even the ability
to compile at all.

I'm thinking about hypothetical cases where I want to embed a
*very* large file and parsing the comma-delimited sequence could
have unacceptable compile-time performance, perhaps even causing
a compile-time stack overflow depending on how the parser works.
Every time the compiler sees #embed, it has to decide whether to
optimize it or not, and the decision criteria are not specified
anywhere (not at all in the standard, perhaps not clearly in the
compiler's documentation).

What about Scott Lurndal's suggestion?
The preprocessor emits an implementation-defined directive as a prefix to
the CSV table. The directive tells the compiler to temporarily switch
into a specialized parsing mode, probably keeping all the converted
numbers in a single node of the parser's result tree.
As demonstrated in several posts below, parsing by itself, as well as
text-to-number conversion by itself, is not too bad. It's the tree
management that is problematic.
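A rough sketch of what such a specialised parsing mode might do, assuming the preprocessor output is a plain comma-separated list of decimal byte values: convert the text in a single pass into a flat byte buffer instead of building one tree node per integer. The function name and its error convention are made up for illustration; this is not how any particular compiler implements it.

```c
#include <ctype.h>
#include <stddef.h>

/* Parse "71, 73,\n70, ..." straight into `out`: one pass, no per-element
   nodes. Whitespace (including newlines) around values is skipped and a
   trailing comma is accepted. Returns the number of bytes written, or
   (size_t)-1 on malformed input, a value above 255, or a full buffer. */
static size_t parse_byte_list(const char *text, unsigned char *out, size_t cap)
{
    size_t n = 0;
    const char *p = text;

    for (;;) {
        while (isspace((unsigned char)*p)) p++;
        if (*p == '\0') break;                   /* end of input */
        if (!isdigit((unsigned char)*p)) return (size_t)-1;

        unsigned value = 0;
        while (isdigit((unsigned char)*p)) {
            value = value * 10 + (unsigned)(*p - '0');
            if (value > 255) return (size_t)-1;  /* not a byte value */
            p++;
        }
        if (n == cap) return (size_t)-1;         /* buffer too small */
        out[n++] = (unsigned char)value;

        while (isspace((unsigned char)*p)) p++;
        if (*p == ',') p++;                      /* separator */
        else if (*p != '\0') return (size_t)-1;  /* e.g. "1 2" */
    }
    return n;
}
```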

From Lawrence D'Oliveiro@21:1/5 to bart on Wed May 29 04:17:58 2024

On Tue, 28 May 2024 11:30:05 +0100, bart wrote:

OK, so basically this writes a file. Or, part of a file?

It converts a file into a quick-loading and easily-searchable format.

Where is the bit in the Java code that embeds it?

See the TableReader class.

The point is this: /once you already have those discrete files/, how do
you painlessly embed them into your application?

That’s what the build tools are for.

From David Brown@21:1/5 to Keith Thompson on Wed May 29 10:02:49 2024

On 28/05/2024 22:21, Keith Thompson wrote:

David Brown writes:

On 28/05/2024 02:33, Keith Thompson wrote:

[...]

Without some kind of programmer control, I'm concerned that the rules
for defining an array so #embed will be correctly optimized will be
spread as lore rather than being specified anywhere.

They might, but I really do not think that is so important, since they
will not affect the generated results.

Right, it won't affect the generated results (assuming I use it
correctly). Unless I use `#embed optimize(true)` to initialize
a struct with varying member sizes, but that's my fault because I
asked for it.

I am still not understanding your point. (I am confident that you have
a point, even if I don't get it.)

I cannot see why there would be any need or use of manually adding
optimisation hints or controls in the source code. I cannot see why
there is any possibility of getting incorrect results in any way.

The point is compile-time performance, and perhaps even the ability
to compile at all.

I'm thinking about hypothetical cases where I want to embed a
*very* large file and parsing the comma-delimited sequence could
have unacceptable compile-time performance, perhaps even causing
a compile-time stack overflow depending on how the parser works.
Every time the compiler sees #embed, it has to decide whether to
optimize it or not, and the decision criteria are not specified
anywhere (not at all in the standard, perhaps not clearly in the
compiler's documentation).

Yes, I agree with that. And this is how it should be - this is not
something that should be specified. The C standards give minimum
requirements for things like the number of identifiers or the length of
lines. But pretty much all compilers, for most of the "translation
limits", say they are "limited by the memory of the host computer". The
same will apply to #embed. And some compilers will cope better than
others with huge #embed's, some will be faster, some more memory
efficient. Some will change from version to version. This is not
something that can sensibly be specified or formalized - like pretty
much everything in regard to compilation time, each compiler does the
best it can without any specifications. I'd expect compiler reference
manuals might have hints, such as saying #embed is fastest with unsigned
char arrays (or whatever), but no more than that.

But again - I see no reason for manual optimisation hints, and no reason
for any possible errors.

Let me outline a possible strategy for a compiler like gcc. (I have not
looked at the prototype implementations from thephd, nor any gcc
developer discussions.)

gcc splits the C pre-processor and the compiler itself into separate programs, and (currently) data flows between them in only one direction, via a temporary file or a
pipe. But the "gcc" (or "g++", according to preference) driver program
calls and coordinates the two programs.

If the pre-processor is called stand-alone, then it will generate a comma-separated list of integers, helpfully split over multiple lines of reasonable size. This will clearly always be correct, and always work,
within the compiler's translation limits.
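For the stand-alone case the fallback output is easy to produce. A minimal sketch (the function name and the values-per-line choice are illustrative assumptions, not gcc's actual code) that formats bytes as a comma-separated list of decimal integers with a fixed number of values per line, returning the full length like `snprintf` does:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format `len` bytes as "72,105,\n10,..." with a line break after every
   `per_line` values (per_line must be > 0). Writes at most `cap` chars
   including the terminating NUL, and returns the length the complete
   output would have. */
static size_t format_byte_list(const unsigned char *data, size_t len,
                               size_t per_line, char *out, size_t cap)
{
    size_t used = 0;
    for (size_t i = 0; i < len; i++) {
        /* No separator after the last value; ",\n" after every per_line-th. */
        const char *sep = (i + 1 == len) ? ""
                        : ((i + 1) % per_line == 0) ? ",\n" : ",";
        char item[16];                      /* value <= 255: at most "255,\n" */
        size_t n = (size_t)snprintf(item, sizeof item, "%u%s",
                                    (unsigned)data[i], sep);
        if (used + n < cap)
            memcpy(out + used, item, n);                  /* fits, NUL still fits */
        else if (used + 1 < cap)
            memcpy(out + used, item, cap - 1 - used);     /* truncated tail */
        used += n;
    }
    if (cap > 0)
        out[used < cap ? used : cap - 1] = '\0';
    return used;
}
```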

But when the gcc driver calls it, it will have a flag indicating that
the target compiler is gcc and supports an extended pre-processed syntax
(and also that the source is C23 - after all, the C pre-processor can be
used as a macro processor for other files with no relation to C). Now
the pre-processor has a lot more freedom. Whenever it meets an #embed directive, it can generate a line:

#embed_data 123456

followed in the file by 123456 (or whatever) bytes of binary data. The
C compiler, when parsing this file, will pull that in as a single blob.
Then it is up to the C compiler - which knows how the #embed data will
be used - to tell whether these bytes should be used as parameters to a
macro, initialisation for a char array, or whatever. And it can use
them as efficiently as practically possible. (It is probably only worth
using this for #embed data over a certain size - smaller #embed's could
just generate the integer sequences.)
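To make the hypothetical `#embed_data` scheme concrete, here is a sketch of how the compiler side might pick up such a record from its input buffer. Everything in it (the directive spelling, the decimal count, the single newline before the raw data, the function name) is an assumption taken from the outline above, not from any real gcc implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Expect "#embed_data <count>\n" followed by exactly <count> raw bytes.
   On success, store the blob pointer and length and return a pointer just
   past the blob; return NULL on malformed or truncated input. */
static const unsigned char *read_embed_data(const unsigned char *p,
                                            const unsigned char *end,
                                            const unsigned char **blob,
                                            size_t *blob_len)
{
    static const char tag[] = "#embed_data ";
    const size_t tag_len = sizeof tag - 1;

    if ((size_t)(end - p) < tag_len || memcmp(p, tag, tag_len) != 0)
        return NULL;
    p += tag_len;

    /* Decimal byte count, terminated by a newline. */
    const unsigned char *digits = p;
    size_t count = 0;
    while (p < end && *p >= '0' && *p <= '9') {
        if (count > (SIZE_MAX - 9) / 10)
            return NULL;                    /* count would overflow */
        count = count * 10 + (size_t)(*p++ - '0');
    }
    if (p == digits || p == end || *p != '\n')
        return NULL;
    p++;                                    /* skip the newline */

    if ((size_t)(end - p) < count)
        return NULL;                        /* blob shorter than announced */
    *blob = p;
    *blob_len = count;
    return p + count;                       /* ordinary C tokens resume here */
}
```

The raw bytes may contain anything, including zeros and newlines, which is the point: the compiler never tokenises them.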

Nowhere in this is there any call for manual optimisation hints, nor any
risk of incorrect results.

From James Kuyper@21:1/5 to BGB on Wed May 29 11:27:04 2024

On 5/28/24 02:31, BGB wrote:

On 5/27/2024 9:48 PM, Lawrence D'Oliveiro wrote:

On Sun, 26 May 2024 18:12:17 +0200, David Brown wrote:

Macros in C are not recursive. That stops them exploding, but also means there's a lot you can't do with the preprocessor.

...

It seems the preprocessor in BGBCC is likely not entirely conformant
in this case...

If given a recursive macro, it will most likely just explode and
probably crash the compiler...

Mostly as it handles macro-expansion by looping over the line and
performing macro-substitutions until no more substitutions are seen,
at which point it emits the line to the output buffer and moves on to
the next line.

That definitely fails to conform to the requirements in 6.10.4.4.
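A minimal illustration of the rule in question (the names are arbitrary): a macro whose replacement mentions its own name. A conforming preprocessor replaces it once and leaves the inner occurrence alone, so the inner name refers to the variable; a preprocessor that kept substituting until nothing changed would never terminate on it.

```c
/* The object and the macro deliberately share a name. */
static int counter = 10;

/* While this macro's replacement is rescanned, `counter` is not replaced
   again, so the inner occurrence names the variable above. */
#define counter (counter + 1)

static int read_counter(void)
{
    return counter;   /* expands once, to (counter + 1) */
}
```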

From Lawrence D'Oliveiro@21:1/5 to Bonita Montero on Thu May 30 02:32:03 2024

On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:

I've got a small command-line tool that makes a const'd char-array
from any binary file.

It seems to me it would be more efficient to use objcopy to turn that
binary file directly into an object file with symbols accessible from C
code defining its beginning and ending points. Then just link it into the executable.

From Michael S@21:1/5 to Lawrence D'Oliveiro on Thu May 30 11:09:05 2024

On Thu, 30 May 2024 02:32:03 -0000 (UTC)
Lawrence D'Oliveiro wrote:

On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:

I've got a small command-line tool that makes a const'd char-array
from any binary file.

It seems to me it would be more efficient to use objcopy to turn that
binary file directly into an object file with symbols accessible from
C code defining its beginning and ending points. Then just link it
into the executable.

Of course, it is more efficient.
But:
- it covers fewer use cases.
- it exposes the array's name and size as global symbols, which is not
always desirable
- it feels too much like magic. It would feel less like magic if
done by the compiler rather than by an extra tool. Even better if done by
the compiler in a standardized manner.

But yes, in real life, in embedded software project, that's what I'd do.

From David Brown@21:1/5 to Michael S on Thu May 30 13:43:25 2024

On 30/05/2024 10:09, Michael S wrote:

On Thu, 30 May 2024 02:32:03 -0000 (UTC)
Lawrence D'Oliveiro wrote:

On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:

I've got a small command-line tool that makes a const'd char-array
from any binary file.

It seems to me it would be more efficient to use objcopy to turn that
binary file directly into an object file with symbols accessible from
C code defining its beginning and ending points. Then just link it
into the executable.

Of course, it is more efficient.
But:
- it covers fewer use cases.
- it exposes the array's name and size as global symbols, which is not
always desirable
- it feels too much like magic. It would feel less like magic if
done by the compiler rather than by an extra tool. Even better if done by
the compiler in a standardized manner.

But yes, in real life, in embedded software project, that's what I'd do.

In real life, in embedded software projects, I'd use xxd or a few lines
of Python and have an initialised const array in the code. Why would
you use something that "feels like magic" (i.e., may be mystical and
hard to understand for other developers) and is more limited? To save
three seconds of build time on the rare occasions when the source binary changes?

Writing C (or C++) programs to generate these initialisation files can be
fun and instructive - I think we've all learned a little about where the bottlenecks really are as a result of your code. But it is not for
real-life usage.

#embed will be convenient for many common cases. For anything that
#embed can't handle, I'd need a project-specific script anyway (such as
for collecting all the files in a directory and building structures to
access them). And then it is all about developer convenience - spending
hours extra on the code to spare a few seconds of build time makes no sense.
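A rough sketch of the kind of C such a project-specific script might generate (the file names, byte values and the `find_embedded` helper are all hypothetical): one array per file plus an index that can be searched by name. With C23, each array body could instead be an `#embed` of the corresponding file.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical generator output: one array per embedded file. */
static const unsigned char file_readme_txt[] = { 'h', 'i', '\n' };
static const unsigned char file_logo_bmp[]   = { 0x42, 0x4d, 0x00, 0x01 };

struct embedded_file {
    const char *name;
    const unsigned char *data;
    size_t size;
};

/* Index of all embedded files, also produced by the script. */
static const struct embedded_file embedded_files[] = {
    { "readme.txt", file_readme_txt, sizeof file_readme_txt },
    { "logo.bmp",   file_logo_bmp,   sizeof file_logo_bmp   },
};

/* Linear search is fine for a handful of files. */
static const struct embedded_file *find_embedded(const char *name)
{
    for (size_t i = 0; i < sizeof embedded_files / sizeof embedded_files[0]; i++)
        if (strcmp(embedded_files[i].name, name) == 0)
            return &embedded_files[i];
    return NULL;
}
```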

From Michael S@21:1/5 to bart on Thu May 30 17:08:36 2024

On Thu, 30 May 2024 14:34:00 +0100
bart wrote:

On 30/05/2024 03:32, Lawrence D'Oliveiro wrote:

On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:

I've got a small command-line tool that makes a const'd char-array
from any binary file.

It seems to me it would be more efficient to use objcopy to turn
that binary file directly into an object file with symbols
accessible from C code defining its beginning and ending points.
Then just link it into the executable.

None of my compilers, whether for C or anything else, generate object
files.

However, suppose I wanted to link a file called 'logo.bmp' say, into
my program, which consisted of a file called main.c.

What is the entire process using your suggestion? What do I put into
main.c? Assume the data is represented by a char-array.

extern unsigned char _binary_logo_bmp_start[];
extern unsigned char _binary_logo_bmp_size[];

The first symbol is an array itself.
The second symbol contains the length of the array. You use it in a somewhat non-intuitive way:
size_t my_size = (size_t)_binary_logo_bmp_size;

Note that I have never used this method myself; I just looked at
the output of objcopy with 'objdump -t', so please don't take my words
as a sure thing.

BTW, options in this case are rather simple:
objcopy -I binary -O elf32-little logo.bmp logo_bmp.o
Replace elf32-little with the relevant format for your software. However, I
am not sure that it would work for non-ELF output formats.

This command puts the variable into the .data section. If one wants it
in a different section, e.g. .rwdata, then things could indeed
become less obvious.

From bart@21:1/5 to Lawrence D'Oliveiro on Thu May 30 14:34:00 2024

On 30/05/2024 03:32, Lawrence D'Oliveiro wrote:

On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:

I've got a small command-line tool that makes a const'd char-array
from any binary file.

It seems to me it would be more efficient to use objcopy to turn that
binary file directly into an object file with symbols accessible from C
code defining its beginning and ending points. Then just link it into the executable.

None of my compilers, whether for C or anything else, generate object files.

However, suppose I wanted to link a file called 'logo.bmp' say, into my program, which consisted of a file called main.c.

What is the entire process using your suggestion? What do I put into
main.c? Assume the data is represented by a char-array.

In my language, it would simply be this:

[]byte logobmp = binclude("logo.bmp")

Using my C extension, it might be this:

uint8_t logobmp[] = strinclude("logo.bmp");

(I believe this will cope with embedded zeros, and the file size is
obtainable with 'sizeof(logobmp)'.)

With the new feature it might be this (I forget the exact syntax):

uint8_t logobmp[] = {
#embed "logo.bmp"
};

Nothing else is needed; just compile as normal.

The point of the feature is to avoid the palaver with 'objcopy', which is a utility with 100 different options, or messing with ones like xxd.

From Michael S@21:1/5 to Michael S on Thu May 30 17:51:07 2024

On Thu, 30 May 2024 17:08:36 +0300
Michael S wrote:

On Thu, 30 May 2024 14:34:00 +0100
bart <[email protected]> wrote:

On 30/05/2024 03:32, Lawrence D'Oliveiro wrote:

On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:

I've got a small command-line tool that makes a const'd char-array
from any binary file.

It seems to me it would be more efficient to use objcopy to turn
that binary file directly into an object file with symbols
accessible from C code defining its beginning and ending points.
Then just link it into the executable.

None of my compilers, whether for C or anything else, generate
object files.

However, suppose I wanted to link a file called 'logo.bmp' say, into
my program, which consisted of a file called main.c.

What is the entire process using your suggestion? What do I put
into main.c? Assume the data is represented by a char-array.

extern unsigned char _binary_logo_bmp_start[];
extern unsigned char _binary_logo_bmp_size[];

The first symbol is an array itself.
The second symbol contains the length of the array. You use it in
a somewhat non-intuitive way:
size_t my_size = (size_t)_binary_logo_bmp_size;

Note that I have never used this method myself; I just looked
at the output of objcopy with 'objdump -t', so please don't take my
words as a sure thing.

BTW, options in this case are rather simple:
objcopy -I binary -O elf32-little logo.bmp logo_bmp.o
Replace elf32-little with the relevant format for your software. However, I
am not sure that it would work for non-ELF output formats.

Tested it.
On msys2 it can produce the correct pe-x86-64 format, but does it in a counter-intuitive way: you have to ask for elf64-x86-64 instead of
pe-x86-64.
I don't know why it works like that.

The rest of what I wrote above was correct.

From Michael S@21:1/5 to bart on Thu May 30 18:03:45 2024

On Thu, 30 May 2024 15:48:39 +0100
bart wrote:

Where do the _binary_logo_bmp_start and ...-size symbols come from?
That is, how do they get into the object file.

objcopy generates the symbol names from the name of the input binary
file. I would think that it is possible to change these symbols to
something else, but I am not sure that it is possible within the same invocation of objcopy. It certainly is possible with a second pass.
Lawrence probably can give more authoritative answer.
Or as a last resort you can RTFM.
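A small sketch of the naming rule, assuming GNU objcopy's behaviour for `-I binary` input: the file name as given on the command line, with every character that is not a letter or digit replaced by `_`, wrapped as `_binary_<name>_start`, `_end` and `_size`. The helper name is made up; it is only meant to predict what the `extern` declarations should be.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write "_binary_<mangled file name>_<suffix>" into `out` (snprintf-style
   truncation to `cap`), where mangling turns every non-alphanumeric
   character of the file name into '_'. Returns the untruncated length. */
static size_t objcopy_symbol(const char *file, const char *suffix,
                             char *out, size_t cap)
{
    int n = snprintf(out, cap, "_binary_%s_%s", file, suffix);
    if (n < 0)
        return 0;
    /* Mangle only the characters that came from the file name. */
    size_t first = sizeof "_binary_" - 1;
    size_t last = first + strlen(file);
    for (size_t i = first; i < last && i + 1 < cap; i++)
        if (!isalnum((unsigned char)out[i]))
            out[i] = '_';
    return (size_t)n;
}
```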

From bart@21:1/5 to Michael S on Thu May 30 15:48:39 2024

On 30/05/2024 15:08, Michael S wrote:

On Thu, 30 May 2024 14:34:00 +0100
bart wrote:

On 30/05/2024 03:32, Lawrence D'Oliveiro wrote:

On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:

I've got a small command-line tool that makes a const'd char-array
from any binary file.

It seems to me it would be more efficient to use objcopy to turn
that binary file directly into an object file with symbols
accessible from C code defining its beginning and ending points.
Then just link it into the executable.

None of my compilers, whether for C or anything else, generate object
files.

However, suppose I wanted to link a file called 'logo.bmp' say, into
my program, which consisted of a file called main.c.

What is the entire process using your suggestion? What do I put into
main.c? Assume the data is represented by a char-array.

extern unsigned char _binary_logo_bmp_start[];
extern unsigned char _binary_logo_bmp_size[];

The first symbol is an array itself.
The second symbol contains the length of the array. You use it in a somewhat non-intuitive way:
size_t my_size = (size_t)_binary_logo_bmp_size;

Note that I have never used this method myself; I just looked at
the output of objcopy with 'objdump -t', so please don't take my words
as a sure thing.

BTW, options in this case are rather simple:
objcopy -I binary -O elf32-little logo.bmp logo_bmp.o

Where do the _binary_logo_bmp_start and ...-size symbols come from? That
is, how do they get into the object file.

Replace elf32-little with the relevant format for your software. However, I
am not sure that it would work for non-ELF output formats.

There appears to be an objcopy utility that runs under Windows.

From David Brown@21:1/5 to bart on Fri May 31 09:24:48 2024

On 30/05/2024 16:48, bart wrote:

On 30/05/2024 15:08, Michael S wrote:

Replace elf32-little with the relevant format for your software. However, I
am not sure that it would work for non-ELF output formats.

There appears to be an objcopy utility that runs under Windows.

objcopy can handle lots of formats, as source or target, and can run on
any general OS host. So the question is not if you can get objcopy that
runs on Windows, it is whether you can use this kind of
blob-to-object-file conversion with the output in the Windows object
file format in the same way as you can for elf formats. You know vastly
more about the Windows object file formats than I do, so maybe you can
answer this yourself.

From Michael S@21:1/5 to David Brown on Fri May 31 13:39:49 2024

On Fri, 31 May 2024 09:24:48 +0200
David Brown wrote:

On 30/05/2024 16:48, bart wrote:

On 30/05/2024 15:08, Michael S wrote:

Replace elf32-little with the relevant format for your software.
However, I am not sure that it would work for non-ELF output
formats.

There appears to be an objcopy utility that runs under Windows.

objcopy can handle lots of formats, as source or target, and can run
on any general OS host. So the question is not if you can get
objcopy that runs on Windows, it is whether you can use this kind of blob-to-object-file conversion with the output in the Windows object
file format in the same way as you can for elf formats.

That's quite a strange question.
Do you mean you can imagine an object file format incapable of
representing an initialized data array?

You know
vastly more about the Windows object file formats than I do, so maybe
you can answer this yourself.

The objcopy supplied with msys2 appears to have a bug in its -O selection handling,
but fortunately there exists an easy workaround. Read my post below if
you are interested.

From David Brown@21:1/5 to Michael S on Fri May 31 13:31:09 2024

On 31/05/2024 12:39, Michael S wrote:

On Fri, 31 May 2024 09:24:48 +0200
David Brown wrote:

On 30/05/2024 16:48, bart wrote:

On 30/05/2024 15:08, Michael S wrote:

Replace elf32-little with the relevant format for your software.
However, I am not sure that it would work for non-ELF output
formats.

There appears to be an objcopy utility that runs under Windows.

objcopy can handle lots of formats, as source or target, and can run
on any general OS host. So the question is not if you can get
objcopy that runs on Windows, it is whether you can use this kind of
blob-to-object-file conversion with the output in the Windows object
file format in the same way as you can for elf formats.

That's quite a strange question.
Do you mean you can imagine an object file format incapable of
representing an initialized data array?

I'm sure I could imagine such a format, but I suppose it is quite unlikely!

You know
vastly more about the Windows object file formats than I do, so maybe
you can answer this yourself.

The objcopy supplied with msys2 appears to have a bug in its -O selection handling,
but fortunately there exists an easy workaround. Read my post below if
you are interested.

OK, a bug in a particular version or build of objcopy sounds a lot more
likely than a perversely restricted object code format.

From bart@21:1/5 to Michael S on Fri May 31 13:55:33 2024

On 30/05/2024 16:03, Michael S wrote:

On Thu, 30 May 2024 15:48:39 +0100
bart wrote:

Where do the _binary_logo_bmp_start and ...-size symbols come from?
That is, how do they get into the object file.

objcopy generates the symbol names from the name of the input binary
file. I would think that it is possible to change these symbols to
something else, but I am not sure that it is possible within the same invocation of objcopy. It certainly is possible with a second pass.
Lawrence probably can give more authoritative answer.
Or as a last resort you can RTFM.

I gave myself the simple task of incorporating the source text of
hello.c into a program, and printing it out.

My C program looked like this to start, as an initial test (ignoring
declaring the size as an array, unless I had to):

#include <stdio.h>
typedef unsigned char byte;

extern byte _binary_hello_c_start[];
extern int _binary_hello_c_size;

int main(void) {
printf("%d\n", _binary_hello_c_size);
}

One small matter is those ugly, long identifiers. A bigger one in this
case is that I really want that embedded text to be zero terminated;
here it's unlikely to be.

However I still have to create the object file with the data. I tried this:

objcopy -I binary -O pe-x86-64 hello.c hello.obj

The contents looked about right when I looked inside.

Now to build my program. Because my C compiler can't link object files
itself, I have to get it to generate an object file for the program,
then use an external linker:

C:\c>mcc -c c.c
Compiling c.c to c.obj

C:\c>gcc c.obj hello.obj
hello.obj: file not recognized: file format not recognized
collect2.exe: error: ld returned 1 exit status

Unfortunately gcc/ld doesn't recognise the output of objcopy. Even
though it accepts the output of mcc which is the same COFF format.

But even if it worked, you can see it would be a bit of a palaver.

Here's how builtin embedding worked using a feature of my older C compiler:

#include <stdio.h>
#include <string.h>

char hello[] = strinclude("hello.c");

int main(void) {
printf("hello =\n%s\n", hello);
printf("strlen(hello) = %zu\n", strlen(hello));
printf("sizeof(hello) = %zu\n", sizeof(hello));
}

I build it and run it like this:

C:\c>bcc c
Compiling c.c to c.exe

C:\c>c
hello =
#include "stdio.h"

int main(void) {
printf("Hello, World!\n");
}

strlen(hello) = 70
sizeof(hello) = 71

C:\c>dir hello.c
31/05/2024 13:48 70 hello.c

It just works; no messing about with objcopy parameters; no long
unwieldy names; no link errors due to unsupported file formats; no
problems with missing terminators for embedded text files imported as
strings; no funny ways of getting size info.

From Michael S@21:1/5 to Michael S on Fri May 31 16:28:11 2024

On Fri, 31 May 2024 16:19:37 +0300
Michael S wrote:

No, it does not work like that.
First, copy *exactly* what I said in my previous post.
Only after you reproduced, start to be smart.
_binary_hello_c_size is a link symbol rather than a variable.

Declaration:
extern char _binary_hello_c_size[];

Usage:
printf("%zu\n", (size_t)_binary_hello_c_size);

Thinking about it, I could be wrong.
I should test more, with a less trivial program.

From Michael S@21:1/5 to bart on Fri May 31 16:19:37 2024

On Fri, 31 May 2024 13:55:33 +0100
bart wrote:

On 30/05/2024 16:03, Michael S wrote:

On Thu, 30 May 2024 15:48:39 +0100
bart wrote:

Where do the _binary_logo_bmp_start and ...-size symbols come from?
That is, how do they get into the object file.

objcopy generates the symbol names from the name of the input binary
file. I would think that it is possible to change these symbols to something else, but I am not sure that it is possible within the
same invocation of objcopy. It certainly is possible with a second
pass. Lawrence probably can give more authoritative answer.
Or as a last resort you can RTFM.

I gave myself the simple task of incorporating the source text of
hello.c into a program, and printing it out.

My C program looked like this to start, as an initial test (ignoring declaring the size as an array, unless I had to):

#include <stdio.h>
typedef unsigned char byte;

extern byte _binary_hello_c_start[];
extern int _binary_hello_c_size;

int main(void) {
printf("%d\n", _binary_hello_c_size);
}

No, it does not work like that.
First, copy *exactly* what I said in my previous post.
Only after you reproduced, start to be smart.
_binary_hello_c_size is a link symbol rather than a variable.

Declaration:
extern char _binary_hello_c_size[];

Usage:
printf("%zu\n", (size_t)_binary_hello_c_size);

One small matter is those ugly, long identifiers. A bigger one in
this case is that I really want that embedded text to be zero
terminated; here it's unlikely to be.

The tool is not made specifically for ASCII strings; it is more generic.
I don't want it zero-terminated, the same as I don't want the output of
fread() zero-terminated. I want it exactly as it is in the
input file.

However I still have to create the object file with the data. I tried
this:

objcopy -I binary -O pe-x86-64 hello.c hello.obj

The contents looked about right when I looked inside.

Now to build my program. Because my C compiler can't link object
files itself, I have to get it to generate an object file for the
program, then use an external linker:

C:\c>mcc -c c.c
Compiling c.c to c.obj

C:\c>gcc c.obj hello.obj
hello.obj: file not recognized: file format not recognized
collect2.exe: error: ld returned 1 exit status

Unfortunately gcc/ld doesn't recognise the output of objcopy. Even
though it accepts the output of mcc which is the same COFF format.

It recognizes it if you lie to objcopy about the format.
Specify elf64-x86-64 instead of pe-x86-64 and everything suddenly
works.
It was all said in my posts from yesterday. It doesn't sound like you
have read them.

But even if it worked, you can see it would be a bit of a palaver.

Here's how builtin embedding worked using a feature of my older C
compiler:

#include <stdio.h>
#include <string.h>

char hello[] = strinclude("hello.c");

int main(void) {
printf("hello =\n%s\n", hello);
printf("strlen(hello) = %zu\n", strlen(hello));
printf("sizeof(hello) = %zu\n", sizeof(hello));
}

I build it and run it like this:

C:\c>bcc c
Compiling c.c to c.exe

C:\c>c
hello =
#include "stdio.h"

int main(void) {
printf("Hello, World!\n");
}

strlen(hello) = 70
sizeof(hello) = 71

C:\c>dir hello.c
31/05/2024 13:48 70 hello.c

It just works; no messing about with objcopy parameters; no long
unwieldy names; no link errors due to unsupported file formats; no
problems with missing terminators for embedded text files imported as strings; no funny ways of getting size info.

From bart@21:1/5 to Michael S on Fri May 31 15:04:46 2024

On 31/05/2024 14:48, Michael S wrote:

On Fri, 31 May 2024 16:28:11 +0300
Michael S wrote:

On Fri, 31 May 2024 16:19:37 +0300
Michael S wrote:

No, it does not work like that.
First, copy *exactly* what I said in my previous post.
Only after you reproduced, start to be smart.
_binary_hello_c_size is a link symbol rather than a variable.

Declaration:
extern char _binary_hello_c_size[];

Usage:
printf("%zu\n", (size_t)_binary_hello_c_size);

Thinking about it, I could be wrong.
I should test more, with a less trivial program.

I tested with a bigger program, and it still works.
So what I wrote above is correct.

Can you show the full program and the full process?

From Michael S@21:1/5 to Michael S on Fri May 31 16:48:35 2024

On Fri, 31 May 2024 16:28:11 +0300
Michael S wrote:

On Fri, 31 May 2024 16:19:37 +0300
Michael S wrote:

No, it does not work like that.
First, copy *exactly* what I said in my previous post.
Only after you reproduced, start to be smart.
_binary_hello_c_size is a link symbol rather than a variable.

Declaration:
extern char _binary_hello_c_size[];

Usage:
printf("%zu\n", (size_t)_binary_hello_c_size);

Thinking about it, I could be wrong.
I should test more, with a less trivial program.

I tested with a bigger program, and it still works.
So what I wrote above is correct.

From bart@21:1/5 to Michael S on Fri May 31 15:03:55 2024

On 31/05/2024 14:19, Michael S wrote:

On Fri, 31 May 2024 13:55:33 +0100

No, it does not work like that.
First, copy *exactly* what I said in my previous post.
Only after you reproduced, start to be smart.
_binary_hello_c_size is a linker symbol rather than a variable.

Declaration:
extern char _binary_hello_c_size[];

Usage:
printf("%zd\n", (size_t)_binary_hello_c_size);

I've now tried all sorts of combinations. While I can display the
address of _binary_hello_c_size, it will crash if I try to dereference it.

The value of that symbol looks like this:

00007ff678b90046

It is clearly not the size of the data. But as I said, I can't get
inside it. Neither is it simply the end address of the data (they differ
by about 2**30).

One small matter is those ugly, long identifiers. A bigger one in
this case is that I really want that embedded text to be zero
terminated; here it's unlikely to be.

The tool is not made specifically for ASCII strings, it is more generic.

There are two possibilities I'm interested in:

* Having the data zero-terminated, for when you want to embed a text
file as a zero-terminated string
* Everything else, where you just want the binary blob as-is
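Both cases can be served by the generator-side approach (the kind of bin2c tool Bonita Montero describes later in the thread, which turns any file into a const char array): an optional trailing 0x00 makes the array usable as a zero-terminated string, and leaving it off keeps the blob as-is. A minimal sketch, assuming hypothetical helper names format_c_array and append; it formats an in-memory buffer rather than reading a file:

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Append printf-style text at dst + *used without writing past cap.
 * *used keeps counting after truncation, like snprintf's return value. */
static void append(char *dst, size_t cap, size_t *used, const char *fmt, ...)
{
    size_t room = *used < cap ? cap - *used : 0;
    va_list ap;

    va_start(ap, fmt);
    int w = vsnprintf(room ? dst + *used : NULL, room, fmt, ap);
    va_end(ap);
    if (w > 0)
        *used += (size_t)w;
}

/* Hypothetical bin2c-style formatter: writes
 *   const unsigned char NAME[] = {0x..,0x..};
 * into dst. With add_nul, one extra 0x00 byte is emitted so the array is
 * also a valid C string. Returns the length the complete text needs; if
 * that is >= cap the output was truncated (but is still terminated when
 * cap > 0). Do not call it with n == 0 and add_nul false: a zero-length
 * array is not valid C. */
size_t format_c_array(char *dst, size_t cap, const char *name,
                      const unsigned char *data, size_t n, bool add_nul)
{
    size_t used = 0;
    size_t total = n + (add_nul ? 1u : 0u);

    append(dst, cap, &used, "const unsigned char %s[] = {", name);
    for (size_t i = 0; i < total; i++) {
        unsigned byte = i < n ? data[i] : 0u; /* index n is the terminator */
        append(dst, cap, &used, i ? ",0x%02x" : "0x%02x", byte);
    }
    append(dst, cap, &used, "};");
    return used;
}
```

For the bytes "hi" with add_nul set, this produces const unsigned char greeting[] = {0x68,0x69,0x00}; and the size is then known to the compiler via sizeof. A real tool would read the file (for example with fread) and write the text into a generated .c file.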

Unfortunately gcc/ld doesn't recognise the output of objcopy. Even
though it accepts the output of mcc which is the same COFF format.

It recognizes it if you lie to objcopy about the format.
Specify elf64-x86-64 instead of pe-x86-64 and everything suddenly
works.
It was all said in my posts from yesterday. It does not sound like you
have read them.

You said RTFM; I did. Nowhere did it say you have to use ELF object
format /on Windows/.

So, since I know gcc/ld on Windows understands PE, why didn't it work?

You can see there is problem after problem and a number of quirks.

But if you have a fully working demo (you can forget the string
requirement, that was just an easy way of showing it had the right
data), then I will look at it again.

Even if it works, however, I don't like it and now don't trust it.

This is why I prefer language-supported solutions, and fortunately in my languages I can make it as simple and intuitive as I like.

From Michael S@21:1/5 to bart on Fri May 31 17:34:37 2024

On Fri, 31 May 2024 15:04:46 +0100
bart <[email protected]> wrote:

On 31/05/2024 14:48, Michael S wrote:

On Fri, 31 May 2024 16:28:11 +0300
Michael S <[email protected]> wrote:

On Fri, 31 May 2024 16:19:37 +0300
Michael S <[email protected]> wrote:

No, it does not work like that.
First, copy *exactly* what I said in my previous post.
Only after you reproduced, start to be smart.
_binary_hello_c_size is a linker symbol rather than a variable.

Declaration:
extern char _binary_hello_c_size[];

Usage:
printf("%zd\n", (size_t)_binary_hello_c_size);

Thinking about it, I could be wrong.
I should test more, with a less trivial program.

I tested with a bigger program, and it still works.
So what I wrote above is correct.

Can you show the full program and the full process?

test_objcopy.c:
#include <stdio.h>

int data1[42] = { 1,2,3 ,4,5};
extern unsigned char _binary_test_bi_start[];
extern unsigned char _binary_test_bi_end[];
extern unsigned char _binary_test_bi_size[];

extern unsigned char _binary_bin_to_list_c_start[];
extern unsigned char _binary_bin_to_list_c_end[];
extern unsigned char _binary_bin_to_list_c_size[];

int main()
{
printf("%-40s %p %zd\n", "_binary_test_bi_start",
_binary_test_bi_start, (size_t)_binary_test_bi_start);
printf("%-40s %p %zd\n", "_binary_test_bi_end",
_binary_test_bi_end, (size_t)_binary_test_bi_end);
printf("%-40s %p %zd\n", "_binary_test_bi_size",
_binary_test_bi_size, (size_t)_binary_test_bi_size);
printf("%-40s %p %zd\n", "_binary_bin_to_list_c_start",
_binary_bin_to_list_c_start, (size_t)_binary_bin_to_list_c_start);
printf("%-40s %p %zd\n", "_binary_bin_to_list_c_end",
_binary_bin_to_list_c_end, (size_t)_binary_bin_to_list_c_end);
printf("%-40s %p %zd\n", "_binary_bin_to_list_c_size",
_binary_bin_to_list_c_size, (size_t)_binary_bin_to_list_c_size);
return 0;
}

Test files: test.bi and bin_to_list.c.
Conversion to objects:
objcopy -I binary -O elf64-x86-64 test.bi test_bi.o
objcopy -I binary -O elf64-x86-64 bin_to_list.c test_c.o

Compilation:
gcc -s -Wall -Oz test_objcopy.c test_bi.o test_c.o

I compiled with the additional option -Xlinker -Map=test_objcopy.map
in order to make sure that the *_size symbols are indeed pure symbols
with no memory allocated underneath.

From Kaz Kylheku@21:1/5 to bart on Fri May 31 15:34:29 2024

On 2024-05-31, bart <[email protected]> wrote:

Here's how builtin embedding worked using a feature of my older C compiler:

#include <stdio.h>
#include <string.h>

char hello[] = strinclude("hello.c");

int main(void) {
printf("hello =\n%s\n", hello);
printf("strlen(hello) = %zu\n", strlen(hello));
printf("sizeof(hello) = %zu\n", sizeof(hello));
}

Lisp:

$ cat strincl.tl
(defmacro strinclude (path)
(put-line `including @path`)
(file-get-string path))

(defun test()
(strinclude "/etc/hostname"))

When we run it interpreted we see from the debug put-line that /etc/hostname is included at macro-expansion time before we run the test function:

$ txr -i strincl.tl
including /etc/hostname
This TTY may be recorded for privacy-violating and evidence-gathering purposes.

(test)

"sun-go\n"

Now compile the file: the file is pulled in at compile time. Twice. :)
A double expansion took place due to certain complexities of compiling.

$ txr --compile=strincl.tl
including /etc/hostname
including /etc/hostname

Now when we load the compiled file, the diagnostic trace
"including /etc/hostname" no longer appears: the string is part of the test function as a literal:

$ txr -i strincl
TXR is enteric coated to release over 24 hours of lasting relief.

(test)

"sun-go\n"

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @[email protected]

From bart@21:1/5 to Michael S on Fri May 31 19:03:10 2024

On 31/05/2024 15:34, Michael S wrote:

On Fri, 31 May 2024 15:04:46 +0100
bart <[email protected]> wrote:

Can you show the full program and the full process?

test_objcopy.c:
#include <stdio.h>

int data1[42] = { 1,2,3 ,4,5};
extern unsigned char _binary_test_bi_start[];
extern unsigned char _binary_test_bi_end[];
extern unsigned char _binary_test_bi_size[];

extern unsigned char _binary_bin_to_list_c_start[];
extern unsigned char _binary_bin_to_list_c_end[];
extern unsigned char _binary_bin_to_list_c_size[];

int main()
{
printf("%-40s %p %zd\n", "_binary_test_bi_start",
_binary_test_bi_start, (size_t)_binary_test_bi_start);
printf("%-40s %p %zd\n", "_binary_test_bi_end",
_binary_test_bi_end, (size_t)_binary_test_bi_end);
printf("%-40s %p %zd\n", "_binary_test_bi_size",
_binary_test_bi_size, (size_t)_binary_test_bi_size);
printf("%-40s %p %zd\n", "_binary_bin_to_list_c_start",
_binary_bin_to_list_c_start, (size_t)_binary_bin_to_list_c_start);
printf("%-40s %p %zd\n", "_binary_bin_to_list_c_end",
_binary_bin_to_list_c_end, (size_t)_binary_bin_to_list_c_end);
printf("%-40s %p %zd\n", "_binary_bin_to_list_c_size",
_binary_bin_to_list_c_size, (size_t)_binary_bin_to_list_c_size);
return 0;
}

Test files: test.bi and bin_to_list.c.
Conversion to objects:
objcopy -I binary -O elf64-x86-64 test.bi test_bi.o
objcopy -I binary -O elf64-x86-64 bin_to_list.c test_c.o

Compilation:
gcc -s -Wall -Oz test_objcopy.c test_bi.o test_c.o

OK, thanks. But I forgot to ask what results you got from running the
program. Because if I try your code, using hello.c and hello.exe as test binary/source data, I get this output:

_binary_test_bi_start                    00007ff6497620e0 140695771160800
_binary_test_bi_end                      00007ff649762ae0 140695771163360
_binary_test_bi_size                     00007ff509750a00 140690402380288
_binary_bin_to_list_c_start              00007ff649762ae0 140695771163360
_binary_bin_to_list_c_end                00007ff649762b26 140695771163430
_binary_bin_to_list_c_size               00007ff509750046 140690402377798

The sizes should have been 2560 and 70 respectively; those values are
a bit bigger than that.

However I see that you also have start and end addresses, which sounds a
much better way of determining the size. (In that case, what are those
*size symbols for?).
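The start/end pair is enough on its own to recover the size. A minimal sketch, assuming a hypothetical struct blob wrapper (nothing objcopy itself provides), that derives the size from the two addresses and checks whether the data happens to end in a 0 byte:

```c
#include <stddef.h>

/* A byte range described the way objcopy -I binary describes one:
 * start is the first byte, end is one past the last byte. */
struct blob {
    const unsigned char *start;
    const unsigned char *end;
};

/* Size in bytes, computed at run time from the two addresses. */
static size_t blob_size(struct blob b)
{
    return (size_t)(b.end - b.start);
}

/* True if the final byte is 0, so the data can be used directly as a C
 * string. objcopy copies the file verbatim, so for an ordinary text file
 * this is normally false and a copy with an added terminator is needed. */
static int blob_is_nul_terminated(struct blob b)
{
    return b.end > b.start && b.end[-1] == 0;
}

/* With the objcopy object linked in, the range would be built from the
 * symbols used in this thread:
 *   extern const unsigned char _binary_hello_c_start[], _binary_hello_c_end[];
 *   struct blob hello = { _binary_hello_c_start, _binary_hello_c_end };
 */
```

The subtraction happens at run time because only the linker knows the two addresses.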

So I can put together a working test:

---------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

extern unsigned char _binary_hello_c_start[];
extern unsigned char _binary_hello_c_end[];

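/* Copy the bytes in [start, end) into a new zero-terminated string. */
char* makestr(const unsigned char* start, const unsigned char* end) {
size_t length = end - start;
char* s = malloc(length + 1);
if (s == NULL)
return NULL;
memcpy(s, start, length);
s[length] = 0;
return s;
}

int main(void) {
char* str = makestr(_binary_hello_c_start, _binary_hello_c_end);
if (str == NULL)
return 1;

printf("Hello = \n%s", str);
free(str);
return 0;
}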
---------------------------------

I can build it like this:

---------------------------------
C:\c>mcc -c c
Compiling c.c to c.obj

C:\c>objcopy -I binary -O elf64-x86-64 hello.c hello.obj

C:\c>gcc c.c hello.obj
---------------------------------

And run it like this:
---------------------------------
C:\c>a
Hello =
#include "stdio.h"

int main(void) {
printf("Hello, World!\n");
}
---------------------------------

Instead of one compiler, here I used two compilers, a tool 'objcopy'
(which bizarrely needs to generate ELF format files) and lots of extra
ugly code. I also need to disregard whatever the hell _binary_..._size does.

But it works.

From Scott Lurndal@21:1/5 to bart on Fri May 31 18:36:14 2024

bart <[email protected]> writes:

On 31/05/2024 15:34, Michael S wrote:

On Fri, 31 May 2024 15:04:46 +0100
bart <[email protected]> wrote:

Instead of one compiler, here I used two compilers, a tool 'objcopy'
(which bizarrely needs to generate ELF format files) and lots of extra
ugly code. I also need to disregard whatever the hell _binary_..._size does.

$ objcopy -I binary -O elf64-x86-64 main.cpp /tmp/test.o

$ objdump -x /tmp/test.o

/tmp/test.o: file format elf64-little
/tmp/test.o
architecture: UNKNOWN!, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000

Sections:
Idx Name Size VMA LMA File off Algn
0 .data 000030e2 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 g .data 0000000000000000 _binary_main_cpp_start
00000000000030e2 g .data 0000000000000000 _binary_main_cpp_end
00000000000030e2 g *ABS* 0000000000000000 _binary_main_cpp_size

$ ls -l main.cpp
-rw-rw-r--. 1 scott scott 12514 May 9 2022 main.cpp
$ printf '%u\n' $(( 0x30e2 ))
12514

The value of the symbol _binary_main_cpp_size is the
number of bytes in the file.

(in other words,

_binary_main_cpp_size = _binary_main_cpp_end - _binary_main_cpp_start

)

In C code:

extern uint8_t _binary_main_cpp_size;

const size_t embed_size = &_binary_main_cpp_size;

From jak@21:1/5 to All on Fri May 31 21:42:35 2024

bart ha scritto:

On 31/05/2024 15:34, Michael S wrote:

On Fri, 31 May 2024 15:04:46 +0100
bart <[email protected]> wrote:

Can you show the full program and the full process?

test_objcopy.c:
#include <stdio.h>

int data1[42] = { 1,2,3 ,4,5};
extern unsigned char _binary_test_bi_start[];
extern unsigned char _binary_test_bi_end[];
extern unsigned char _binary_test_bi_size[];

extern unsigned char _binary_bin_to_list_c_start[];
extern unsigned char _binary_bin_to_list_c_end[];
extern unsigned char _binary_bin_to_list_c_size[];

int main()
{
   printf("%-40s %p %zd\n", "_binary_test_bi_start",
     _binary_test_bi_start, (size_t)_binary_test_bi_start);
   printf("%-40s %p %zd\n", "_binary_test_bi_end",
     _binary_test_bi_end, (size_t)_binary_test_bi_end);
   printf("%-40s %p %zd\n", "_binary_test_bi_size",
     _binary_test_bi_size, (size_t)_binary_test_bi_size);
   printf("%-40s %p %zd\n", "_binary_bin_to_list_c_start",
     _binary_bin_to_list_c_start, (size_t)_binary_bin_to_list_c_start);
   printf("%-40s %p %zd\n", "_binary_bin_to_list_c_end",
     _binary_bin_to_list_c_end, (size_t)_binary_bin_to_list_c_end);
   printf("%-40s %p %zd\n", "_binary_bin_to_list_c_size",
     _binary_bin_to_list_c_size, (size_t)_binary_bin_to_list_c_size);
   return 0;
}

Test files: test.bi and bin_to_list.c.
Conversion to objects:
objcopy -I binary -O elf64-x86-64 test.bi test_bi.o
objcopy -I binary -O elf64-x86-64 bin_to_list.c test_c.o

Compilation:
gcc -s -Wall -Oz test_objcopy.c test_bi.o test_c.o

OK, thanks. But I forgot to ask what results you got from running the program. Because if I try your code, using hello.c and hello.exe as test binary/source data, I get this output:

_binary_test_bi_start                    00007ff6497620e0 140695771160800
_binary_test_bi_end                      00007ff649762ae0 140695771163360
_binary_test_bi_size                     00007ff509750a00 140690402380288
_binary_bin_to_list_c_start              00007ff649762ae0 140695771163360
_binary_bin_to_list_c_end                00007ff649762b26 140695771163430
_binary_bin_to_list_c_size               00007ff509750046 140690402377798

The sizes should have been 2560 and 70 respectively; those values are
a bit bigger than that.

However I see that you also have start and end addresses, which sounds a
much better way of determining the size. (In that case, what are those
*size symbols for?).

So I can put together a working test:

---------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

extern unsigned char _binary_hello_c_start[];
extern unsigned char _binary_hello_c_end[];

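/* Copy the bytes in [start, end) into a new zero-terminated string. */
char* makestr(const unsigned char* start, const unsigned char* end) {
    size_t length = end - start;
    char* s = malloc(length + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, start, length);
    s[length] = 0;
    return s;
}

int main(void) {
    char* str = makestr(_binary_hello_c_start, _binary_hello_c_end);
    if (str == NULL)
        return 1;

    printf("Hello = \n%s", str);
    free(str);
    return 0;
}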
---------------------------------

I can build it like this:

---------------------------------
C:\c>mcc -c c
Compiling c.c to c.obj

C:\c>objcopy -I binary -O elf64-x86-64 hello.c hello.obj

C:\c>gcc c.c hello.obj
---------------------------------

And run it like this:
---------------------------------
C:\c>a
Hello =
#include "stdio.h"

int main(void) {
    printf("Hello, World!\n");
}
---------------------------------

Instead of one compiler, here I used two compilers, a tool 'objcopy'
(which bizarrely needs to generate ELF format files) and lots of extra
ugly code. I also need to disregard whatever the hell _binary_..._size
does.

But it works.

You could use the pe-x86-64 format instead of the elf64-x86-64 to reduce
the size of the object.

From Scott Lurndal@21:1/5 to jak on Fri May 31 21:11:26 2024

jak <[email protected]> writes:

bart ha scritto:

On 31/05/2024 15:34, Michael S wrote:

On Fri, 31 May 2024 15:04:46 +0100
bart <[email protected]> wrote:

<snip>

Instead of one compiler, here I used two compilers, a tool 'objcopy'
(which bizarrely needs to generate ELF format files) and lots of extra
ugly code. I also need to disregard whatever the hell _binary_..._size
does.

But it works.

You could use the pe-x86-64 format instead of the elf64-x86-64 to reduce
the size of the object.

By a half dozen bytes, perhaps, and only if your binutils have been
built to support pe-x86-64:

$ objcopy -I binary -O pe-x86-64 main.cpp /tmp/test1.o
objcopy:/tmp/test1.o: Invalid bfd target

The ELF64 format has a 64 byte header, the string table and the
symbol table, and the remainder is the binary
data. The PE header may save a few bytes by using 32-bit fields in
the PE COFF header and symbol table.

Note, you might want to trim your posts when replying with a one-sentence reply.

From bart@21:1/5 to jak on Fri May 31 22:17:54 2024

On 31/05/2024 20:42, jak wrote:

bart ha scritto:

C:\c>objcopy -I binary -O elf64-x86-64 hello.c hello.obj

You could use the pe-x86-64 format instead of the elf64-x86-64 to reduce
the size of the object.

The PE format doesn't work; gcc's ld linker has a problem with it, when
it is generated by 'objcopy'.

Actually there is a LOT wrong with this whole approach.

From bart@21:1/5 to Scott Lurndal on Fri May 31 22:15:54 2024

On 31/05/2024 19:36, Scott Lurndal wrote:

bart <[email protected]> writes:

On 31/05/2024 15:34, Michael S wrote:

On Fri, 31 May 2024 15:04:46 +0100
bart <[email protected]> wrote:

Instead of one compiler, here I used two compilers, a tool 'objcopy'
(which bizarrely needs to generate ELF format files) and lots of extra
ugly code. I also need to disregard whatever the hell _binary_..._size does.

$ objcopy -I binary -O elf64-x86-64 main.cpp /tmp/test.o

$ objdump -x /tmp/test.o

/tmp/test.o: file format elf64-little
/tmp/test.o
architecture: UNKNOWN!, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000

Sections:
Idx Name Size VMA LMA File off Algn
0 .data 000030e2 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 g .data 0000000000000000 _binary_main_cpp_start
00000000000030e2 g .data 0000000000000000 _binary_main_cpp_end
00000000000030e2 g *ABS* 0000000000000000 _binary_main_cpp_size

$ ls -l main.cpp
-rw-rw-r--. 1 scott scott 12514 May 9 2022 main.cpp
$ printf '%u\n' $(( 0x30e2 ))
12514

The value of the symbol _binary_main_cpp_size is the
number of bytes in the file.

(in other words,

_binary_main_cpp_size = _binary_main_cpp_end - _binary_main_cpp_start

)

In C code:

extern uint8_t _binary_main_cpp_size;

const size_t embed_size = &_binary_main_cpp_size;

Did you see the output from my version of Michael S's program? The size
is just an address. If I do what you do:

extern unsigned char _binary_hello_c_size;

....
size_t size = &_binary_hello_c_size;
printf("size: %zu\n", size);

It produces:

size: 140697695027270

Little of this seems to work, sorry. You guys keep saying, do this, do
that, no do it that way, go RTFM, but nobody has shown a complete
program that correctly shows the -size symbol to be giving anything
meaningful.

If I run this:

printf("%p\n", &_binary_hello_c_start);
printf("%p\n", &_binary_hello_c_end);
printf("%p\n", &_binary_hello_c_size);

I get:

00007ff6ef252010
00007ff6ef252056
00007ff5af240046

I can see that the first two can be subtracted to give the sizes of the
data, which is 70 or 0x46. 0x46 is the last byte of the address of
_size, so what's happening there? What's with the crap in bits 16-47?

I can extract the size using:

printf("%d\n", (unsigned short)&_binary_hello_c_size);

But something is not right. I've also asked what is the point of the
-size symbol if you can just do -end - -start, but nobody has explained.
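The posted numbers can be checked with plain integer arithmetic. A sketch using the Windows addresses above and the WSL addresses posted later in the thread as hard-coded data (the helper names are hypothetical): end - start gives 0x46 = 70 in both runs, while the printed _size value is 70 plus an extra offset. That offset is a multiple of 64 KiB in the Windows run (0x7ff5af240000) but only of 4 KiB in the WSL run (0x55effc9f2000), so masking the low 16 bits recovers 70 on Windows but yields 0x2046 on WSL. The figures are consistent with the absolute symbol having been shifted by a load-time offset, but that cause is an inference, not something the thread demonstrates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reliable: the size is the distance from the _start to the _end address. */
static uint64_t size_from_range(uint64_t start_addr, uint64_t end_addr)
{
    return end_addr - start_addr;
}

/* What casting &_binary_..._size to unsigned short does on these x86-64
 * targets (16-bit unsigned short): keep only the low 16 bits. */
static uint64_t size_from_low16(uint64_t size_sym_addr)
{
    return size_sym_addr & 0xFFFFu;
}

/* The part of the printed _size value that is not the size itself. */
static uint64_t extra_offset(uint64_t size_sym_addr, uint64_t size)
{
    return size_sym_addr - size;
}

/* The 16-bit trick returns the true size exactly when the size is below
 * 65536 and the extra offset is a multiple of 64 KiB (0x10000). */
static bool low16_trick_works(uint64_t size_sym_addr, uint64_t size)
{
    return size < 0x10000u && extra_offset(size_sym_addr, size) % 0x10000u == 0;
}
```

With the posted values, size_from_range gives 70 for both runs, low16_trick_works is true for 0x7ff5af240046 and false for 0x55effc9f2046, so end - start is the form that does not depend on how the image happened to be loaded.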

From Scott Lurndal@21:1/5 to bart on Sat Jun 1 01:25:07 2024

bart <[email protected]> writes:

On 31/05/2024 19:36, Scott Lurndal wrote:

bart <[email protected]> writes:

On 31/05/2024 15:34, Michael S wrote:

On Fri, 31 May 2024 15:04:46 +0100
bart <[email protected]> wrote:

Instead of one compiler, here I used two compilers, a tool 'objcopy'
(which bizarrely needs to generate ELF format files) and lots of extra
ugly code. I also need to disregard whatever the hell _binary_..._size does.

$ objcopy -I binary -O elf64-x86-64 main.cpp /tmp/test.o

$ objdump -x /tmp/test.o

/tmp/test.o: file format elf64-little
/tmp/test.o
architecture: UNKNOWN!, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000

Sections:
Idx Name Size VMA LMA File off Algn
0 .data 000030e2 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 g .data 0000000000000000 _binary_main_cpp_start
00000000000030e2 g .data 0000000000000000 _binary_main_cpp_end
00000000000030e2 g *ABS* 0000000000000000 _binary_main_cpp_size

$ ls -l main.cpp
-rw-rw-r--. 1 scott scott 12514 May 9 2022 main.cpp
$ printf '%u\n' $(( 0x30e2 ))
12514

The value of the symbol _binary_main_cpp_size is the
number of bytes in the file.

(in other words,

_binary_main_cpp_size = _binary_main_cpp_end - _binary_main_cpp_start
)

In C code:

extern uint8_t _binary_main_cpp_size;

const size_t embed_size = &_binary_main_cpp_size;

Did you see the output from my version of Michael S's program? The size
is just an address. If I do what you do:

extern unsigned char _binary_hello_c_size;

....
size_t size = &_binary_hello_c_size;
printf("size: %zu\n", size);

It produces:

size: 140697695027270

Little of this seems to work, sorry. You guys keep saying, do this, do
that, no do it that way, go RTFM, but nobody has shown a complete
program that correctly shows the -size symbol to be giving anything
meaningful.

If I run this:

printf("%p\n", &_binary_hello_c_start);
printf("%p\n", &_binary_hello_c_end);
printf("%p\n", &_binary_hello_c_size);

I get:

00007ff6ef252010
00007ff6ef252056
00007ff5af240046

I can see that the first two can be subtracted to give the sizes of the
data, which is 70 or 0x46. 0x46 is the last byte of the address of
_size, so what's happening there? What's with the crap in bits 16-47?

I can extract the size using:

printf("%d\n", (unsigned short)&_binary_hello_c_size);

But something is not right. I've also asked what is the point of the
-size symbol if you can just do -end - -start, but nobody has explained.

$ cat /tmp/m.c
#include <stdio.h>
#include <stdint.h>

extern uint64_t _binary_main_cpp_size;
extern uint8_t *_binary_main_cpp_start;
extern uint8_t *_binary_main_cpp_end;

int main()
{
printf("%p\n", &_binary_main_cpp_size);
printf("%p\n", &_binary_main_cpp_start);
printf("%p\n", &_binary_main_cpp_end);
return 0;
}
$ objcopy -I binary -B i386 -O elf64-x86-64 main.cpp /tmp/test.o
$ cc -o /tmp/m /tmp/m.c /tmp/test.o
$ /tmp/m
0x30e2
0x601034
0x604116
$ nm /tmp/m | grep _binary_main
0000000000604116 D _binary_main_cpp_end
00000000000030e2 A _binary_main_cpp_size
0000000000601034 D _binary_main_cpp_start
$ wc -c main.cpp
12514 main.cpp
$ printf 0x%x\\n 12514
0x30e2

The size symbol requires no space in the resulting
executable memory image, and it's more convenient than
having to do the math (at run time, since the compiler
can't know the actual values).

From Scott Lurndal@21:1/5 to Lynn McGuire on Sat Jun 1 01:27:41 2024

Lynn McGuire <[email protected]> writes:

On 5/26/2024 6:23 AM, Bonita Montero wrote:

Am 26.05.2024 um 09:13 schrieb jak:

About this I only agree partially because it depends a lot on the
context in which it is used. Moreover, I would not know how to indicate
an optimal programming language for all seasons.

C++ is in almost any case the better C.

What you describe is the greatest inconvenience of c++. To make only one
example, when they decided to rewrite the FB platform to accelerate it,
they thought of migrating from php to c++ and they had a collapse of the
staff suitable for the work, so they thought of relying on a compiler that
translated the php into c++, and many of the new languages were born to
try to remedy its complexity.

C++ is the wrong language for web applications.
I like Java more for that.

C++ is the wrong language for real time apps.

That's an incorrect statement.

No memory allocation allowed.

It is trivially easy to write C++ code that doesn't
allocate memory dynamically.

I use C++ for my server side apps on my webserver. Works great.

I use C++ for operating systems (you can't get more real-time
than that) and bare-metal hypervisors.

From jak@21:1/5 to All on Sat Jun 1 03:37:04 2024

bart ha scritto:

I can see that the first two can be subtracted to give the sizes of the
data, which is 70 or 0x46. 0x46 is the last byte of the address of
_size, so what's happening there? What's with the crap in bits 16-47?

I can extract the size using:

printf("%d\n", (unsigned short)&_binary_hello_c_size);

But something is not right. I've also asked what is the point of the
-size symbol if you can just do -end - -start, but nobody has explained.

typedef unsigned char uchar;
extern uchar _binary_hello_c_size[];
long hello_c_size = _binary_hello_c_size - (uchar *)0;

From Lawrence D'Oliveiro@21:1/5 to bart on Sat Jun 1 01:39:40 2024

On Thu, 30 May 2024 14:34:00 +0100, bart wrote:

On 30/05/2024 03:32, Lawrence D'Oliveiro wrote:

On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:

I've got a small commandline-tool that makes a const'd char -array
from any binary file.

It seems to me it would be more efficient to use objcopy to turn that
binary file directly into an object file with symbols accessible from C
code defining its beginning and ending points. Then just link it into
the executable.

None of my compilers, whether for C or anything else, generate object
files.

That’s too bad. All the good compilers, for languages like C and others
which are meant to execute efficiently, do.

From Lawrence D'Oliveiro@21:1/5 to Michael S on Sat Jun 1 01:45:51 2024

On Thu, 30 May 2024 11:09:05 +0300, Michael S wrote:

On Thu, 30 May 2024 02:32:03 -0000 (UTC)
Lawrence D'Oliveiro <[email protected]d> wrote:

On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:

I've got a small commandline-tool that makes a const'd char -array
from any binary file.

It seems to me it would be more efficient to use objcopy to turn that
binary file directly into an object file with symbols accessible from C
code defining its beginning and ending points. Then just link it into
the executable.

Of course, it is more efficient.
But:
- it covers fewer use cases.

There are many ways of embedding a binary blob in a software project. This
is just one tool for that; there are other tools for other cases (see the Unicode Browser for Android example that I mentioned elsewhere).

- it exposes array's name and size as global symbols which is not
always desirable

Lots of other things already need to be global symbols, I don’t see why a couple more make a difference to anything.

Look at how large projects like the Linux kernel deal with this sort of
thing.

- it feels too much like magic. It would feel less like magic if
done by compiler rather than by extra tool. Even better if done by
compiler in standardized manner.

I don’t understand this at all. I never had the assumption, in any real-world build system, that all the generated code had to come from some “official” compiler for some “official” language.

From bart@21:1/5 to jak on Sat Jun 1 11:09:25 2024

On 01/06/2024 02:37, jak wrote:

bart ha scritto:

I can see that the first two can be subtracted to give the sizes of
the data, which is 70 or 0x46. 0x46 is the last byte of the address of
_size, so what's happening there? What's with the crap in bits 16-47?

I can extract the size using:

    printf("%d\n", (unsigned short)&_binary_hello_c_size);

But something is not right. I've also asked what is the point of the
-size symbol if you can just do -end - -start, but nobody has explained.

    typedef unsigned char uchar;
    extern uchar _binary_hello_c_size[];
    long hello_c_size = _binary_hello_c_size - (uchar *)0;

What result for the size did you get when you ran this?

It seems people are just guessing what might be the right code and
posting random fragments!

From bart@21:1/5 to Lawrence D'Oliveiro on Sat Jun 1 11:37:45 2024

On 01/06/2024 02:39, Lawrence D'Oliveiro wrote:

On Thu, 30 May 2024 14:34:00 +0100, bart wrote:

On 30/05/2024 03:32, Lawrence D'Oliveiro wrote:

On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:

I've got a small commandline-tool that makes a const'd char -array
from any binary file.

It seems to me it would be more efficient to use objcopy to turn that
binary file directly into an object file with symbols accessible from C
code defining its beginning and ending points. Then just link it into
the executable.

None of my compilers, whether for C or anything else, generate object
files.

That’s too bad. All the good compilers, for languages like C and others which are meant to execute efficiently, do.

What do you mean by 'are meant to execute efficiently'? Is that
build-time or run-time of the resulting program?

In the latter case, whether it uses object files is irrelevant.

For build-time, pointlessly generating a discrete object file will slow
things down.

My compilers don't routinely generate object files, which would also
need an external dependency (a linker), but they can do if necessary
(eg. to statically link my code into another program with another compiler).

The compiler for my main language is a whole-program one. If it were to
create an object file, it would be a single file; there would be no
others to link to!

And here, makefiles also assume independent compilation of modules.

So it is makefiles that appear to be holding back advancement in this
area, by requiring traditional module-at-a-time building, and requiring
object file intermediates.

C:\qx52>mm -obj qq
Compiling qq.m to qq.obj

C:\qx52>dir qq.obj
01/06/2024 11:34 787,788 qq.obj

C:\qx52>gcc qq.obj -oqq # 'link'

C:\qx52>qq
Q5.2 Interpreter
Usage:
qq filename[.q]

From bart@21:1/5 to Scott Lurndal on Sat Jun 1 11:24:20 2024

On 01/06/2024 02:25, Scott Lurndal wrote:

bart <[email protected]> writes:

Little of this seems to work, sorry. You guys keep saying, do this, do
that, no do it that way, go RTFM, but nobody has shown a complete
program that correctly shows the -size symbol to be giving anything
meaningful.

If I run this:

printf("%p\n", &_binary_hello_c_start);
printf("%p\n", &_binary_hello_c_end);
printf("%p\n", &_binary_hello_c_size);

I get:

00007ff6ef252010
00007ff6ef252056
00007ff5af240046

I can see that the first two can be subtracted to give the sizes of the
data, which is 70 or 0x46. 0x46 is the last byte of the address of
_size, so what's happening there? What's with the crap in bits 16-47?

I can extract the size using:

printf("%d\n", (unsigned short)&_binary_hello_c_size);

But something is not right. I've also asked what is the point of the
-size symbol if you can just do -end - -start, but nobody has explained.

$ cat /tmp/m.c
#include <stdio.h>
#include <stdint.h>

extern uint64_t _binary_main_cpp_size;
extern uint8_t *_binary_main_cpp_start;
extern uint8_t *_binary_main_cpp_end;

int main()
{
printf("%p\n", &_binary_main_cpp_size);
printf("%p\n", &_binary_main_cpp_start);
printf("%p\n", &_binary_main_cpp_end);
return 0;
}
$ objcopy -I binary -B i386 -O elf64-x86-64 main.cpp /tmp/test.o
$ cc -o /tmp/m /tmp/m.c /tmp/test.o
$ /tmp/m
0x30e2
0x601034
0x604116
$ nm /tmp/m | grep _binary_main
0000000000604116 D _binary_main_cpp_end
00000000000030e2 A _binary_main_cpp_size
0000000000601034 D _binary_main_cpp_start
$ wc -c main.cpp
12514 main.cpp
$ printf 0x%x\\n 12514
0x30e2

The size symbol requires no space in the resulting
executable memory image, and it's more convenient than
having to do the math (at run time, since the compiler
can't know the actual values).

Here's my transcript:

-------------------------------------
C:\c>copy hello.c main.cpp # create main.cpp, here it's 70 bytes
1 file(s) copied.

C:\c>type m.c # exact same code as yours
#include <stdio.h>
#include <stdint.h>

extern uint64_t _binary_main_cpp_size;
extern uint8_t *_binary_main_cpp_start;
extern uint8_t *_binary_main_cpp_end;

int main()
{
printf("%p\n", &_binary_main_cpp_size);
printf("%p\n", &_binary_main_cpp_start);
printf("%p\n", &_binary_main_cpp_end);
return 0;
}

C:\c>objcopy -I binary -O elf64-x86-64 main.cpp test.o # make test.o

C:\c>gcc m.c test.o -o m.exe # build m executable

C:\c>m # run m.exe
00007ff5d5480046 # and the size is ...
00007ff715492010
00007ff715492056
-------------------------------------

Maybe Windows is at fault? I'll try it under WSL:

-------------------------------------
root@DESKTOP-11:/mnt/c/c# objcopy -I binary -O elf64-x86-64 main.cpp test.o
root@DESKTOP-11:/mnt/c/c# gcc m.c test.o -o m
root@DESKTOP-11:/mnt/c/c# ./m
0x55effc9f2046
0x55effc9f6010
0x55effc9f6056
-------------------------------------

Nope, same thing. This doesn't inspire much confidence. With values
shown, the actual size IS contained within the _size value, but only as
the last 16 bits of the value.

gcc versions were 10.3.0 and 9.4.0 respectively; the latter is what is
provided by Windows 11.

You also brought up the fact that the size is not known to the compiler
anyway, which means a few things are not possible, like using the size
in a static context.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From bart@21:1/5 to Malcolm McLean on Sat Jun 1 11:53:15 2024

On 01/06/2024 01:53, Malcolm McLean wrote:

On 31/05/2024 13:55, bart wrote:

On 30/05/2024 16:03, Michael S wrote:

On Thu, 30 May 2024 15:48:39 +0100
bart <[email protected]> wrote:

Where do the _binary_logo_bmp_start and ...-size symbols come from?
That is, how do they get into the object file.

objcopy generates names of the symbols from the name of input binary
file. I would think that it is possible to change these symbols to
something else, but I am not sure that it is possible withing the same
invocation of objcopy. It certainly is possible with a second pass.
Lawrence probably can give more authoritative answer.
Or as a last resort you can RTFM.

I gave myself the simple task of incorporating the source text of
hello.c into a program, and printing it out.

Here's how builtin embedding worked using a feature of my older C
compiler:

   #include <stdio.h>
   #include <string.h>

   char hello[] = strinclude("hello.c");

   int main(void) {
       printf("hello =\n%s\n", hello);
       printf("strlen(hello) = %zu\n", strlen(hello));
       printf("sizeof(hello) = %zu\n", sizeof(hello));
   }

I build it and run it like this:

   C:\c>bcc c
   Compiling c.c to c.exe

   C:\c>c
   hello =
   #include "stdio.h"

   int main(void) {
       printf("Hello, World!\n");
   }

   strlen(hello) = 70
   sizeof(hello) = 71

   C:\c>dir hello.c
   31/05/2024 13:48                70 hello.c

It just works; no messing about with objcopy parameters; no long
unwieldy names; no link errors due to unsupported file formats; no
problems with missing terminators for embedded text files imported as
strings; no funny ways of getting size info.

Here's my solution. It's a bit more complicated.

int bbx_write_source (const char *source_xml, char *path, const char *source_xml_file, const char *source_xml_name)
{
    XMLDOC *doc = 0;
    char error[1024];
    char buff[1024];
    XMLNODE *root;
    XMLNODE *node;
    const char *name;
    FILE *fpout = NULL;
    FILE *fpin = NULL;
    int ch;

    doc = xmldocfromstring(source_xml, error, 1024);
    if (!doc)
    {
        fprintf(stderr, "%s\n", error);
        return -1;
    }
    root = xml_getroot(doc);
    if (strcmp(xml_gettag(root), "FileSystem"))
        return -1;

    if (!root->child)
        return -1;
    if (strcmp(xml_gettag(root->child), "directory"))
        return -1;

    for (node = root->child->child; node != NULL; node = node->next)
    {
        if (!strcmp(xml_gettag(node), "file"))
        {
            name = xml_getattribute(node, "name");
            snprintf(buff, 1024, "%s%s", path, name);
            fpout = fopen(buff, "w");
            if (!fpout)
                break;
            fpin = file_fopen(node);
            if (!fpin)
                break;
            if (!strcmp(name, source_xml_file))
            {
                char *escaped = texttostring(source_xml);
                if (!escaped)
                    break;
                fprintf(fpout, "char %s[] = %s;\n", source_xml_name,
                        escaped);
                free(escaped);
            }
            else
            {
               while ((ch = fgetc(fpin)) != EOF)
                   fputc(ch, fpout);
            }
            fclose(fpout);
            fclose(fpin);
            fpout = 0;
            fpin = 0;
        }
    }
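    if (fpin || fpout)
    {
        /* A break above left a file open: close only what is open,
           since fclose(NULL) is undefined. */
        if (fpin)
            fclose(fpin);
        if (fpout)
            fclose(fpout);
        return -1;
    }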

    return 0;

}

It's leveraging the Baby X resource compiler, the xmparser, and my
filesystem programs. You can't include the source of a program in the
program as a C string, because then the source changes to include that string. So what you do is this.

You first place a placeholder C source file containing a short dummy
string.
Then you convert the source to an XML file, and turn it into a string
with the Baby X Resource compiler. Then you drop the source into the
file, removing the placeholder.

Then the program walks the file list, detects that file, and replaces it
with the xml string it has been passed.

And this system works, and it's an easy way of adding source output to programs. Of course the function now needs to be modified to walk the
entire tree recursively and I will need a makedirectory function. I've
got it to work for flat source directories.

Sorry, I don't understand what that does; what is the input and what is
the output?

In the case of a very simple requirement of incorporating a text file
into a C program as data, usually string data (which I have to say is
much more common for me than doing anything with XML), how would a BBX
solution work?

This doesn't work:

char strdata[] = {
#include "file.txt"
};

Because the contents of file.txt, which let's say are:

one
two
three

are interpreted as C source code ('one' is a syntax error, or it might
be the name of some identifier).

Some process is needed to either turn that file into:

"one\ntwo\nthree\n"

or into a bunch of numbers: '111, 110, 101, ...'. I think this is what
'xxd' does.
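The "some process" step can be sketched in a few lines of C. This is a minimal sketch under stated assumptions: `text_to_c_literal` and `put_str` are hypothetical names (not from any library or from the BBX tools), the input is assumed to be NUL-terminated text already read into memory, and non-printable bytes become three-digit octal escapes, which (unlike hex escapes) can never swallow a following digit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append s to out if it fits, but always count it, so the caller
   learns the full length needed (the same contract as snprintf). */
static void put_str(char *out, size_t outsz, size_t *n, const char *s)
{
    for (; *s != '\0'; s++) {
        if (*n + 1 < outsz)          /* leave room for the final NUL */
            out[*n] = *s;
        (*n)++;
    }
}

/* Convert NUL-terminated text into a C string literal, quotes included,
   e.g. "one\ntwo\nthree\n" becomes "\"one\\ntwo\\nthree\\n\"".
   Writes at most outsz bytes, NUL-terminated when outsz > 0, and returns
   the length the complete literal needs. */
static size_t text_to_c_literal(const char *in, char *out, size_t outsz)
{
    assert(in != NULL && (out != NULL || outsz == 0));
    size_t n = 0;
    put_str(out, outsz, &n, "\"");
    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; p++) {
        char tmp[5];
        switch (*p) {
        case '\n': put_str(out, outsz, &n, "\\n");  break;
        case '\t': put_str(out, outsz, &n, "\\t");  break;
        case '\\': put_str(out, outsz, &n, "\\\\"); break;
        case '"':  put_str(out, outsz, &n, "\\\""); break;
        default:
            if (*p < 0x20 || *p >= 0x7f) {
                /* Three-digit octal escape, e.g. byte 1 -> \001 */
                snprintf(tmp, sizeof tmp, "\\%03o", (unsigned)*p);
            } else {
                tmp[0] = (char)*p;
                tmp[1] = '\0';
            }
            put_str(out, outsz, &n, tmp);
        }
    }
    put_str(out, outsz, &n, "\"");
    if (outsz > 0)
        out[n < outsz ? n : outsz - 1] = '\0';
    return n;
}
```

The result can then be written into a generated .c file as `char strdata[] = <literal>;`, which keeps the terminator and makes `sizeof strdata` a compile-time constant.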

In the case of binary files, the process of embedding is usually blind
to the actual format, or meaning, of the file. It is just a blob of data.

So here, I understand that the BBXRC solution goes much further. If I
wanted to include a JPG file, then either #embed or my strinclude()
would just incorporate the raw bytes. I would still need a JPEG decoder
to use that data.

Whereas BBXRC, AIUI, does the decoding for you, and incorporates the
data as a raw table of pixel values that can be directly used.

So it is at a different level from what is being discussed. But
sometimes there is also a need for that cruder form of embedding: maybe
that JPG just needs to be written out again; no need to get inside it.

From jak@21:1/5 to All on Sat Jun 1 13:59:32 2024

bart ha scritto:

On 01/06/2024 02:37, jak wrote:

bart ha scritto:

I can see that the first two can be subtracted to give the sizes of
the data, which is 70 or 0x46. 0x46 is the last byte of the address
of _size, so what's happening there? What's with the crap in bits 16-47?
I can extract the size using:

    printf("%d\n", (unsigned short)&_binary_hello_c_size);

But something is not right. I've also asked what is the point of the
-size symbol if you can just do -end - -start, but nobody has explained.

     typedef unsigned char uchar;
     extern uchar _binary_hello_c_size[];
     long hello_c_size = _binary_hello_c_size - (uchar *)0;

What result for the size did you get when you ran this?

It seems people are just guessing what might be the right code and
posting random fragments!

I wrote it that way precisely because I believed it was the clearest
way. With the extern declarations you can retrieve the values, which in
the case of _start and _end are the initial and final addresses of the
object; in fact, you can get the length of the object by subtracting
the starting address from the final one:

extern char _binary_hello_c_start[];
extern char _binary_hello_c_end[];

long len = _binary_hello_c_end - _binary_hello_c_start;

Unfortunately, _size is provided in the same way as the _start and _end
addresses, even though it is a length rather than an address. In C:
Address +/- Value = Address
Address - Address = Value
so, to retrieve this length, which the program sees as an address, it
is sufficient to subtract the starting address, which in the case of a
length is zero.

extern char _binary_hello_c_size[];

long len = _binary_hello_c_size - (char *)0;

surely you can also recover the value with a cast:

long len = (long)_binary_hello_c_size;

but the example I sent you seemed more explanatory to me, while the
cast seems rather heavy-handed.
Nobody here is making things up. I'm sorry you think so.
/*
* example:
* file to embed:
* --- start file.txt ---
* line number 1
* line number 2
* line number 3
* line number 4
* line number 5
* line number 6
* line number 7
* line number 8
* line number 9
* line number 10
* line number 11
* line number 12
* line number 13
* line number 14
* line number 15
* line number 16
* line number 17
* line number 18
* line number 19
* line number 20
* --- end file.txt ---
* objcopy --input-target binary --output-target pe-x86-64 --binary-architecture i386 file.txt file.txt.o
* gcc embed.c file.txt.o -o embed
*/

#include <stdio.h>

int main()
{
typedef unsigned char uchar;
extern uchar _binary_file_txt_start[];
extern uchar _binary_file_txt_size[];
long file_txt_size = _binary_file_txt_size - (uchar *)0;

for(long i = 0; i < file_txt_size; i++)
putchar(_binary_file_txt_start[i]);

return 0;
}
Output: the contents of file.txt.

From Tim Rentsch@21:1/5 to bart on Sat Jun 1 05:17:12 2024

bart <[email protected]> writes:

On 01/06/2024 02:25, Scott Lurndal wrote:

bart <[email protected]> writes:

Little of this seems to work, sorry. You guys keep saying, do this, do
that, no do it that way, go RTFM, but nobody has shown a complete
program that correctly shows the -size symbol to be giving anything
meaningful.

If I run this: [attempt to reproduce example]

$ cat /tmp/m.c
#include <stdio.h>
#include <stdint.h>

extern uint64_t _binary_main_cpp_size;
extern uint8_t *_binary_main_cpp_start;
extern uint8_t *_binary_main_cpp_end;

int main()
{
printf("%p\n", &_binary_main_cpp_size);
printf("%p\n", &_binary_main_cpp_start);
printf("%p\n", &_binary_main_cpp_end);
return 0;
}
$ objcopy -I binary -B i386 -O elf64-x86-64 main.cpp /tmp/test.o
$ cc -o /tmp/m /tmp/m.c /tmp/test.o
$ /tmp/m
0x30e2
0x601034
0x604116
$ nm /tmp/m | grep _binary_main
0000000000604116 D _binary_main_cpp_end
00000000000030e2 A _binary_main_cpp_size
0000000000601034 D _binary_main_cpp_start
$ wc -c main.cpp
12514 main.cpp
$ printf 0x%x\\n 12514
0x30e2

The size symbol requires no space in the resulting
executable memory image, and it's more convenient than
having to do the math (at run time, since the compiler
can't know the actual values).

Here's my transcript:

-------------------------------------
C:\c>copy hello.c main.cpp # create main.cpp, here it's 70 bytes
1 file(s) copied.

C:\c>type m.c # exact same code as yours
#include <stdio.h>
#include <stdint.h>

extern uint64_t _binary_main_cpp_size;
extern uint8_t *_binary_main_cpp_start;
extern uint8_t *_binary_main_cpp_end;

int main()
{
printf("%p\n", &_binary_main_cpp_size);
printf("%p\n", &_binary_main_cpp_start);
printf("%p\n", &_binary_main_cpp_end);
return 0;
}

C:\c>objcopy -I binary -O elf64-x86-64 main.cpp test.o # make test.o

C:\c>gcc m.c test.o -o m.exe # build m executable

C:\c>m # run m.exe
00007ff5d5480046 # and the size is ...
00007ff715492010
00007ff715492056

[similar results under WSL]

For what it's worth I see the same behavior running on linux.
It looks like the culprit is gcc, which apparently relocates
the symbol even though it is marked with an A type. After
running around in circles for a goodly amount of time, it
occurred to me to try compiling using clang, and that worked.

I suppose it's good to know about the &_binary_main_cpp_size
trick, but it's kind of the worst of both worlds: the size
is baked into the executable (or half-baked I might say), but
the value can't be used at compile time. Bleah. If I wanted
to use the objcopy method of inserting raw text into a C
program, I would either do a run-time subtraction to find out
what the size is, or simply add an extra step to the makefile
to extract the size out of the 'nm' output and produce a .h
file with a (named) value that could be used at compile time.
And both of these methods work under gcc as well as clang.
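The makefile step Tim describes could be sketched as a small filter that turns one line of `nm` output into a header definition. This is a hypothetical helper (`nm_line_to_define` is not an existing tool), assuming GNU `nm`'s default "value type name" line format and that the size symbol is reported with type `A`, as in Scott's transcript; the macro naming (drop `_binary_`, upper-case the rest) is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Parse a line such as "00000000000030e2 A _binary_main_cpp_size" and,
   if it is an absolute _binary_*_size symbol, write
   "#define MAIN_CPP_SIZE 12514" into out.
   Returns 0 on success, -1 if the line does not match or out is too small. */
static int nm_line_to_define(const char *line, char *out, size_t outsz)
{
    assert(line != NULL && out != NULL);
    static const char prefix[] = "_binary_";
    static const char suffix[] = "_size";
    const size_t plen = sizeof prefix - 1, slen = sizeof suffix - 1;
    unsigned long long value;
    char type;
    char name[256];

    /* nm prints the value in hex, then the type letter, then the name. */
    if (sscanf(line, "%llx %c %255s", &value, &type, name) != 3)
        return -1;

    size_t len = strlen(name);
    if (type != 'A' || len <= plen + slen
        || strncmp(name, prefix, plen) != 0
        || strcmp(name + len - slen, suffix) != 0)
        return -1;

    /* Upper-case everything after "_binary_" to form the macro name. */
    char macro[256];
    size_t m = 0;
    for (const char *p = name + plen; *p != '\0'; p++)
        macro[m++] = (char)toupper((unsigned char)*p);
    macro[m] = '\0';

    int n = snprintf(out, outsz, "#define %s %llu", macro, value);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

A makefile rule could run the object file through `nm`, feed each line to a tiny program built around this function, and redirect the result into a header, so the size becomes a genuine compile-time constant; that plumbing is left out here.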

From David Brown@21:1/5 to Lynn McGuire on Sat Jun 1 15:28:02 2024

On 01/06/2024 01:34, Lynn McGuire wrote:

On 5/26/2024 6:23 AM, Bonita Montero wrote:

Am 26.05.2024 um 09:13 schrieb jak:

About this I only agree partially because it depends a lot on the
context in which it is used. Moreover, I would not know how to indicate
an optimal programming language for all seasons.

C++ is in almost any case the better C.

What you describe is the greatest drawback of c++. To give just one
example: when they decided to rewrite the FB platform to speed it up,
they considered migrating from php to c++, but the number of staff able
to do the work collapsed, so they relied instead on a compiler that
translated the php into c++; and many of the new languages were born to
try to remedy its complexity.

C++ is the wrong language for web applications.
I like Java more for that.

C++ is the wrong language for real time apps. No memory allocation
allowed.

I use C++ for real-time apps. You don't have to have dynamic memory
allocation just because you are writing in C++ !

From David Brown@21:1/5 to Lynn McGuire on Sat Jun 1 15:30:56 2024

On 01/06/2024 00:55, Lynn McGuire wrote:

On 5/23/2024 2:25 PM, Bonita Montero wrote:

Am 22.05.2024 um 18:55 schrieb David Brown:

In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change
in it, especially compared to the minor changes in C17.

<https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf>
<https://en.wikipedia.org/wiki/C23_(C_standard_revision)>
<https://en.cppreference.com/w/c/23>

I like that it tidies up a lot of old stuff - it is neater to have
things like "bool", "static_assert", etc., as part of the language
rather than needing a half-dozen includes for such basic stuff.

I like that it standardises a several useful extensions that have
been in gcc and clang (and possibly other compilers) for many years.

I'm not sure it will make a big difference to my own programming -
when I want "typeof" or "chk_add()", I already use them in gcc. But
for people restricted to standard C, there's more new to enjoy. And
I prefer to use standard syntax when possible.

"constexpr" is something I think I will find helpful, in at least
some circumstances.

I ask myself what the point is in further developing a language
like this that can actually no longer be saved.

There is way more code written in C than C++. For instance, just about
all real time systems such as device and engine management are written
in C.

These days, I believe engine management code is more likely to be
written in C++.

One of my friends writes the device code for a NAS manufacturer. The
code starts off with:
   while (1)
   {
      ... a bunch of code
   }

Hey! He's copied from me!

Pretty much /every/ embedded system has that loop at its heart - either
once (for bare metal), or in the RTOS and also once per thread.

From David Brown@21:1/5 to bart on Sat Jun 1 15:24:55 2024

On 01/06/2024 12:24, bart wrote:

On 01/06/2024 02:25, Scott Lurndal wrote:

bart <[email protected]> writes:

Little of this seems to work, sorry. You guys keep saying, do this, do
that, no do it that way, go RTFM, but nobody has shown a complete
program that correctly shows the -size symbol to be giving anything
meaningful.

But something is not right. I've also asked what is the point of the
-size symbol if you can just do -end - -start, but nobody has explained.

$ cat /tmp/m.c
#include <stdio.h>
#include <stdint.h>

extern uint64_t _binary_main_cpp_size;
extern uint8_t *_binary_main_cpp_start;
extern uint8_t *_binary_main_cpp_end;

int main()
{
     printf("%p\n", &_binary_main_cpp_size);
     printf("%p\n", &_binary_main_cpp_start);
     printf("%p\n", &_binary_main_cpp_end);
     return 0;
}
$ objcopy -I binary -B i386 -O elf64-x86-64 main.cpp /tmp/test.o
$ cc -o /tmp/m /tmp/m.c /tmp/test.o
$ /tmp/m
0x30e2
0x601034
0x604116
$ nm /tmp/m | grep _binary_main
0000000000604116 D _binary_main_cpp_end
00000000000030e2 A _binary_main_cpp_size
0000000000601034 D _binary_main_cpp_start

When I tried it on my Linux system, I get an error "relocation
R_X86_64_PC32 against absolute symbol `_binary_main_cpp_size' in section `.text' is disallowed". This is, I think, the correct response - from C
you do not have direct access to absolute linker symbols. There is no
space allocated for it in the executable, and it only exists as a
constant in the link stage. Without an allocated address, declaring it
as "extern" makes no sense. It makes even less sense to try to look at
the /address/ of the symbol and think that it holds useful information.
It's not often I say this, but I think Scott has got things muddled here.

The only workable use of the symbol "_binary_main_cpp_size" would be in
a linker script. If you want the size of the binary blob in your code,
use the obvious solution :

size_t size = (const uint8_t *)&_binary_main_cpp_end - (const uint8_t *)&_binary_main_cpp_start;

Sometimes it would be useful to have the size of the blob as a constant
at compile time - such as for declaring another array of the same size,
or using static assertions to check the size. You can't do that with
the objcopy blob inclusion. But you /can/ do it using "xxd -i" (or
similar scripts), or with #embed.
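As a concrete illustration of the compile-time advantage: `xxd -i file.txt` writes a definition along the lines of the one below (layout details vary between xxd versions; the data is the three-line file.txt used as an example earlier in the thread, and `copy_buffer`/`copy_blob` are hypothetical names). Because the bytes are an ordinary array, `sizeof file_txt` is an integer constant expression, so it can size another array or appear in a static assertion.

```c
#include <assert.h>
#include <string.h>

/* Roughly what `xxd -i file.txt` produces for "one\ntwo\nthree\n". */
unsigned char file_txt[] = {
    0x6f, 0x6e, 0x65, 0x0a, 0x74, 0x77, 0x6f, 0x0a,
    0x74, 0x68, 0x72, 0x65, 0x65, 0x0a
};
unsigned int file_txt_len = 14;

/* sizeof is known at compile time, so both of these are legal. */
static unsigned char copy_buffer[sizeof file_txt];
static_assert(sizeof file_txt == 14, "file.txt changed size");

/* Copy the embedded data into the statically sized buffer. */
static size_t copy_blob(void)
{
    memcpy(copy_buffer, file_txt, sizeof file_txt);
    return sizeof copy_buffer;
}
```

The array is not NUL-terminated, so it cannot be passed to `printf("%s")` as-is. With a C23 compiler that supports it, an array initializer containing `#embed "file.txt"` should give the same compile-time size without the separate xxd step.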

I am at a loss to see any advantages of the objcopy method in practical
use for blob embedding.

From Scott Lurndal@21:1/5 to Tim Rentsch on Sat Jun 1 15:08:46 2024

Tim Rentsch <[email protected]> writes:

bart <[email protected]> writes:

On 01/06/2024 02:25, Scott Lurndal wrote:

bart <[email protected]> writes:

Little of this seems to work, sorry. You guys keep saying, do this, do
that, no do it that way, go RTFM, but nobody has shown a complete
program that correctly shows the -size symbol to be giving anything
meaningful.

If I run this: [attempt to reproduce example]

$ cat /tmp/m.c
#include <stdio.h>
#include <stdint.h>

extern uint64_t _binary_main_cpp_size;
extern uint8_t *_binary_main_cpp_start;
extern uint8_t *_binary_main_cpp_end;

int main()
{
printf("%p\n", &_binary_main_cpp_size);
printf("%p\n", &_binary_main_cpp_start);
printf("%p\n", &_binary_main_cpp_end);
return 0;
}
$ objcopy -I binary -B i386 -O elf64-x86-64 main.cpp /tmp/test.o
$ cc -o /tmp/m /tmp/m.c /tmp/test.o
$ /tmp/m
0x30e2
0x601034
0x604116
$ nm /tmp/m | grep _binary_main
0000000000604116 D _binary_main_cpp_end
00000000000030e2 A _binary_main_cpp_size
0000000000601034 D _binary_main_cpp_start
$ wc -c main.cpp
12514 main.cpp
$ printf 0x%x\\n 12514
0x30e2

The size symbol requires no space in the resulting
executable memory image, and it's more convenient than
having to do the math (at run time, since the compiler
can't know the actual values).

Here's my transcript:

-------------------------------------
C:\c>copy hello.c main.cpp # create main.cpp, here it's 70 bytes
1 file(s) copied.

C:\c>type m.c # exact same code as yours
#include <stdio.h>
#include <stdint.h>

extern uint64_t _binary_main_cpp_size;
extern uint8_t *_binary_main_cpp_start;
extern uint8_t *_binary_main_cpp_end;

int main()
{
printf("%p\n", &_binary_main_cpp_size);
printf("%p\n", &_binary_main_cpp_start);
printf("%p\n", &_binary_main_cpp_end);
return 0;
}

C:\c>objcopy -I binary -O elf64-x86-64 main.cpp test.o # make test.o

C:\c>gcc m.c test.o -o m.exe # build m executable

C:\c>m # run m.exe
00007ff5d5480046 # and the size is ...
00007ff715492010
00007ff715492056

[similar results under WSL]

For what it's worth I see the same behavior running on linux.

Which versions? It works fine on my linux system (FC20, GCC 4.8.3)

It looks like the culprit is gcc, which apparently relocates
the symbol even though it is marked with an A type.

gcc doesn't do 'relocations'. If you have a problem, it's
likely with binutils (i.e. ld(1)).

From Michael S@21:1/5 to bart on Sat Jun 1 21:11:31 2024

On Fri, 31 May 2024 19:03:10 +0100
bart <[email protected]> wrote:

OK, thanks. But I forget to ask what results you got from running the program. Because if I try your code, using hello.c and hello.exe as
test binary/source data, I get this output:

_binary_test_bi_start 00007ff6497620e0 140695771160800
_binary_test_bi_end 00007ff649762ae0 140695771163360
_binary_test_bi_size 00007ff509750a00 140690402380288
_binary_bin_to_list_c_start 00007ff649762ae0 140695771163360
_binary_bin_to_list_c_end 00007ff649762b26 140695771163430
_binary_bin_to_list_c_size 00007ff509750046 140690402377798

The sizes should have been 2560 and 70 respectively; those values are
a bit bigger than that.

That's strange. I got expected results:
_binary_test_bi_start 000000013FDD30C0 5366427840
_binary_test_bi_end 000000013FDD67AC 5366441900
_binary_test_bi_size 00000000000036EC 14060
_binary_bin_to_list_c_start 000000013FDD67AC 5366441900
_binary_bin_to_list_c_end 000000013FDD711F 5366444319
_binary_bin_to_list_c_size 0000000000000973 2419

However I see that you also have start and end addresses, which
sounds a much better way of determining the size. (In that case, what
are those *size symbols for?).

I'd guess *_size is here for the benefit of less smart compilers that
cannot figure out that *_end - *_start is a const expression
and cannot compile code like:

static ptrdiff_t bar = _binary_test_bi_end - _binary_test_bi_start;

But that is just a guess. For a better answer you could ask the authors
of objcopy.

From bart@21:1/5 to bart on Sat Jun 1 19:59:19 2024

On 01/06/2024 11:24, bart wrote:

On 01/06/2024 02:25, Scott Lurndal wrote:

[objcopy]

Nope, same thing. This doesn't inspire much confidence. With values
shown, the actual size IS contained within the _size value, but only as
the last 16 bits of the value.

gcc versions were 10.3.0 and 9.4.0 respectively; the latter is what is provided by Windows 11.

You also brought up the fact that the size is not known to the compiler anyway, which means a few things are not possible, like using the size
in a static context.

I thought I'd dash off my own version of 'objcopy' to see if I could do
any better. This version does binary/text to COFF only. The input file
here was a .wav file:

C:\qapps>qq objcopy test.wav # running my objcopy
Compiling test.m to test.obj
Written test.obj # haven't settled on naming schemes yet
char[] name is: test_wav
u64 size name is: test_wav_len

C:\qapps>gcc demo.c test.obj

C:\qapps>a
Size = 14355
Data = 52 49 46 ...

The demo.c file is this:

#include <stdio.h>

extern unsigned char test_wav[];
extern long long test_wav_len;

int main(void) {
printf("Size = %lld\n", test_wav_len);
printf("Data = %02x %02x %02x ...\n", test_wav[0], test_wav[1], test_wav[2]);
}

And this is info about the binary to show the right data has got into
the C program:

C:\qapps>dir test.wav
01/11/1996 04:05 14,354 test.wav

C:\qapps>dump test.wav
Dump of test.wav; Size = 14354 bytes
0000: 52 49 46 46 0A 38 00 00 57 41 56 45 66 6D 74 20 RIFF.8..WAVEfmt

There is one slight discrepancy: the size from the C file is one byte
bigger; that's because I'm using 'strinclude' (in the code compiled
during the process), which adds a terminator. I can fix that easily, or
allow the option.

The script used to implement my 'objcopy' is shown below. It writes out
a 2-line program which is compiled by my systems language into an object
file.

------------------------------------------------

proc main=
if ncmdparams<1 then
println "Usage:"
println " qq objcopy filename [name]"
stop
fi

infile:=cmdparams[1]
basename:=extractbasefile(infile)
mfile:=basename+".m"
objfile:=basename+".obj"
if infile in (mfile, objfile) then abort("Name clash") fi

name:=basename+"_"+extractext(infile)
if ncmdparams>1 then
name:=cmdparams[2]
fi

writetextfile(mfile, (
sfprint("export []byte # = strinclude(""#"")",name,
infile),
sfprint("export int #_len = #.len", name, name)
)
)

if system("mm -obj "+mfile)<>0 then
abort("Compile error on "+mfile)
else
println "Written", objfile
println "char[] name is:",name
println "u64 size name is:",name+"_len"
fi
end
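For comparison, the same "generate a tiny source file and compile it" approach can be written in C, emitting C source instead of M. This is a minimal sketch: `emit_c_blob` is a hypothetical name, and the output layout (twelve bytes per line, an `unsigned long long` length with a `_len` suffix like bart's) is an arbitrary choice loosely modelled on `xxd -i`. Unlike the `strinclude` route, it adds no terminator, so the length matches the file size.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write C source defining `name[]` (the raw bytes) and `name_len`
   (their count) to out. Returns 0 on success, -1 on bad input or a
   write error. Empty input is rejected because a zero-length array
   is not valid standard C. */
static int emit_c_blob(FILE *out, const char *name,
                       const unsigned char *data, size_t len)
{
    assert(out != NULL && name != NULL);
    if (data == NULL || len == 0)
        return -1;

    fprintf(out, "const unsigned char %s[] = {", name);
    for (size_t i = 0; i < len; i++) {
        /* Twelve bytes per line, separated by ", " or ",\n    ". */
        const char *sep = (i == 0) ? "\n    " : (i % 12 ? ", " : ",\n    ");
        fprintf(out, "%s0x%02x", sep, (unsigned)data[i]);
    }
    fprintf(out, "\n};\nconst unsigned long long %s_len = %zu;\n", name, len);
    return ferror(out) ? -1 : 0;
}
```

A program using the generated file would declare `extern const unsigned char test_wav[];` and `extern const unsigned long long test_wav_len;` to match; in C, a `const` object at file scope has external linkage by default.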

From Michael S@21:1/5 to Tim Rentsch on Sat Jun 1 22:51:09 2024

On Sat, 01 Jun 2024 05:17:12 -0700
Tim Rentsch <[email protected]> wrote:

For what it's worth I see the same behavior running on linux.
It looks like the culprit is gcc, which apparently relocates
the symbol even though it is marked with an A type. After
running around in circles for a goodly amount of time, it
occurred to me to try compiling using clang, and that worked.

It works on Windows/msys2 with gcc 13.2.0

From Michael S@21:1/5 to bart on Sun Jun 2 01:11:35 2024

On Fri, 31 May 2024 22:15:54 +0100
bart <[email protected]> wrote:

If I run this:

printf("%p\n", &_binary_hello_c_start);
printf("%p\n", &_binary_hello_c_end);
printf("%p\n", &_binary_hello_c_size);

I get:

00007ff6ef252010
00007ff6ef252056
00007ff5af240046

I can see that the first two can be subtracted to give the sizes of
the data, which is 70 or 0x46. 0x46 is the last byte of the address
of _size, so what's happening there? What's with the crap in bits
16-47?

It looks like ASLR. I don't see it because I test on Win7.

From Michael S@21:1/5 to bart on Sun Jun 2 03:06:04 2024

On Sun, 2 Jun 2024 00:39:39 +0100
bart <[email protected]> wrote:

On 01/06/2024 23:11, Michael S wrote:

On Fri, 31 May 2024 22:15:54 +0100
bart <[email protected]> wrote:

If I run this:

printf("%p\n", &_binary_hello_c_start);
printf("%p\n", &_binary_hello_c_end);
printf("%p\n", &_binary_hello_c_size);

I get:

00007ff6ef252010
00007ff6ef252056
00007ff5af240046

I can see that the first two can be subtracted to give the sizes of
the data, which is 70 or 0x46. 0x46 is the last byte of the address
of _size, so what's happening there? What's with the crap in bits
16-47?

It looks like ASLR. I don't see it because I test on Win7.

I understand those are high-loading addresses. I was asking what they
were doing as part of the size.

Apparently, that size value is wrongly relocated by some versions of
gcc-ld. Since allocations work on 64KB blocks, that explains why the
bottom 16 bits are unaffected.

GNU ld just erroneously marks it as relocatable.
Then the Windows loader relocates it; I'd guess the Linux loader does too.

So such a size value could still be used for objects up to 64KB-1, but
it sounds dodgy.

For embedded bare-metal use, it will work o.k.

From bart@21:1/5 to Michael S on Sun Jun 2 00:39:39 2024

On 01/06/2024 23:11, Michael S wrote:

On Fri, 31 May 2024 22:15:54 +0100
bart <[email protected]> wrote:

If I run this:

printf("%p\n", &_binary_hello_c_start);
printf("%p\n", &_binary_hello_c_end);
printf("%p\n", &_binary_hello_c_size);

I get:

00007ff6ef252010
00007ff6ef252056
00007ff5af240046

I can see that the first two can be subtracted to give the sizes of
the data, which is 70 or 0x46. 0x46 is the last byte of the address
of _size, so what's happening there? What's with the crap in bits
16-47?

It looks like ASLR. I don't see it because I test on Win7.

I understand those are high-loading addresses. I was asking what they
were doing as part of the size.

Apparently, that size value is wrongly relocated by some versions of
gcc-ld. Since allocations work on 64KB blocks, that explains why the
bottom 16 bits are unaffected.

So such a size value could still be used for objects up to 64KB-1, but it
sounds dodgy.

From Tim Rentsch@21:1/5 to Scott Lurndal on Sat Jun 1 17:22:57 2024

[email protected] (Scott Lurndal) writes:

Tim Rentsch <[email protected]> writes:

bart <[email protected]> writes:

On 01/06/2024 02:25, Scott Lurndal wrote:

bart <[email protected]> writes:

Little of this seems to work, sorry. You guys keep saying, do this, do
that, no do it that way, go RTFM, but nobody has shown a complete
program that correctly shows the -size symbol to be giving anything
meaningful.

If I run this: [attempt to reproduce example]

$ cat /tmp/m.c
#include <stdio.h>
#include <stdint.h>

extern uint64_t _binary_main_cpp_size;
extern uint8_t *_binary_main_cpp_start;
extern uint8_t *_binary_main_cpp_end;

int main()
{
printf("%p\n", &_binary_main_cpp_size);
printf("%p\n", &_binary_main_cpp_start);
printf("%p\n", &_binary_main_cpp_end);
return 0;
}
$ objcopy -I binary -B i386 -O elf64-x86-64 main.cpp /tmp/test.o
$ cc -o /tmp/m /tmp/m.c /tmp/test.o
$ /tmp/m
0x30e2
0x601034
0x604116
$ nm /tmp/m | grep _binary_main
0000000000604116 D _binary_main_cpp_end
00000000000030e2 A _binary_main_cpp_size
0000000000601034 D _binary_main_cpp_start
$ wc -c main.cpp
12514 main.cpp
$ printf 0x%x\\n 12514
0x30e2

The size symbol requires no space in the resulting
executable memory image, and it's more convenient than
having to do the math (at run time, since the compiler
can't know the actual values).

Here's my transcript:

-------------------------------------
C:\c>copy hello.c main.cpp # create main.cpp, here it's 70 bytes
1 file(s) copied.

C:\c>type m.c # exact same code as yours
#include <stdio.h>
#include <stdint.h>

extern uint64_t _binary_main_cpp_size;
extern uint8_t *_binary_main_cpp_start;
extern uint8_t *_binary_main_cpp_end;

int main()
{
printf("%p\n", &_binary_main_cpp_size);
printf("%p\n", &_binary_main_cpp_start);
printf("%p\n", &_binary_main_cpp_end);
return 0;
}

C:\c>objcopy -I binary -O elf64-x86-64 main.cpp test.o # make test.o

C:\c>gcc m.c test.o -o m.exe # build m executable

C:\c>m # run m.exe
00007ff5d5480046 # and the size is ...
00007ff715492010
00007ff715492056

[similar results under WSL]

For what it's worth I see the same behavior running on linux.

Which versions? It works fine on my linux system (FC20, GCC 4.8.3)

gcc --version gives 'gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0'

It looks like the culprit is gcc, which apparently relocates
the symbol even though it is marked with an A type.

gcc doesn't do 'relocations'. If you have a problem, it's
likely with binutils (i.e. ld(1)).

I expect you are right. I run ld directly only rarely, and
certainly am no expert. In my tests I was simply blindly
following the example shown in your posting (with some variations
after my attempts gave the wrong answer, trying to get it to
work). It didn't occur to me to consider ld.

Using clang for the final link step always gave the right answer,
if I remember correctly.

From Tim Rentsch@21:1/5 to jak on Sat Jun 1 17:26:41 2024

jak <[email protected]> writes:

bart ha scritto:

On 01/06/2024 02:37, jak wrote:

bart ha scritto:

I can see that the first two can be subtracted to give the sizes
of the data, which is 70 or 0x46. 0x46 is the last byte of the
address of _size, so what's happening there? What's with the crap
in bits 16-47?

I can extract the size using:

printf("%d\n", (unsigned short)&_binary_hello_c_size);

But something is not right. I've also asked what is the point of
the -size symbol if you can just do -end - -start, but nobody has
explained.

typedef unsigned char uchar;
extern uchar _binary_hello_c_size[];
long hello_c_size = _binary_hello_c_size - (uchar *)0;

What result for the size did you get when you ran this?

It seems people are just guessing what might be the right code and
posting random fragments!

I wrote it that way precisely because I believed it was the clearest
way. [...]

What is most clear is that the expression used has undefined
behavior.
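A sketch of one alternative with implementation-defined rather than undefined behaviour: read the value of an absolute symbol such as _binary_hello_c_size by converting its address to uintptr_t, instead of subtracting a null pointer. abs_symbol_value is a hypothetical helper name. This does nothing about the relocation problem discussed earlier in the thread. If a base address has already been added to the symbol, the converted value includes it.

```c
#include <stddef.h>
#include <stdint.h>

/* With the real symbol:
       extern const unsigned char _binary_hello_c_size[];
       size_t n = abs_symbol_value(_binary_hello_c_size);              */

/* Pointer-to-integer conversion is implementation-defined (C17 6.3.2.3p6),
   whereas  sym - (unsigned char *)0  subtracts pointers that do not point
   into the same array object, which is undefined (C17 6.5.6p9). */
static size_t abs_symbol_value(const unsigned char *sym)
{
    return (size_t)(uintptr_t)sym;
}
```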

From Tim Rentsch@21:1/5 to Michael S on Sat Jun 1 17:47:31 2024

Michael S <[email protected]> writes:

On Fri, 31 May 2024 19:03:10 +0100
bart <[email protected]> wrote:

OK, thanks. But I forgot to ask what results you got from running the
program. Because if I try your code, using hello.c and hello.exe as
test binary/source data, I get this output:

_binary_test_bi_start        00007ff6497620e0 140695771160800
_binary_test_bi_end          00007ff649762ae0 140695771163360
_binary_test_bi_size         00007ff509750a00 140690402380288
_binary_bin_to_list_c_start  00007ff649762ae0 140695771163360
_binary_bin_to_list_c_end    00007ff649762b26 140695771163430
_binary_bin_to_list_c_size   00007ff509750046 140690402377798

The sizes should have been 2560 and 70 respectively; those values are
a bit bigger than that.

That's strange. I got expected results:
_binary_test_bi_start        000000013FDD30C0 5366427840
_binary_test_bi_end          000000013FDD67AC 5366441900
_binary_test_bi_size         00000000000036EC 14060
_binary_bin_to_list_c_start  000000013FDD67AC 5366441900
_binary_bin_to_list_c_end    000000013FDD711F 5366444319
_binary_bin_to_list_c_size   0000000000000973 2419

However I see that you also have start and end addresses, which
sounds a much better way of determining the size. (In that case, what
are those *size symbols for?).

I'd guess, *_size is here for the benefit of less smart compilers that
can not figure out that *_end - *_start is a const expression
and can not compile code like:

static ptrdiff_t bar = _binary_test_bi_end - _binary_test_bi_start;

I wouldn't expect any C compiler to accept that. It's not a constant expression under the rules of the C standard, and there is no plausible
way to generate code for it in the context of compiling one translation
unit. Neither gcc nor clang accepts it, even run without asking for
any standard compliance. An implementation could allow it as an
extension, but there seems little reason to do so, because it would be
a lot of work to implement, and offers very little utility.
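If something is needed in a static initializer, one option stays within the constant-expression rules. Store the two address constants, which are permitted in static initializers, and compute the difference at run time. The sketch below does this; the struct blob type and the blob_len helper are hypothetical, and a local array stands in for the objcopy markers.

```c
#include <stddef.h>

struct blob {
    const unsigned char *start;   /* address constant: OK in a static initializer */
    const unsigned char *end;     /* address constant + integer constant: also OK */
};

/* Hypothetical stand-in for the embedded data. */
static const unsigned char demo_data[] = { 'h', 'e', 'l', 'l', 'o' };

/* With objcopy output this would be
       { _binary_test_bi_start, _binary_test_bi_end }
   using array declarations of the two symbols. */
static const struct blob demo = { demo_data, demo_data + sizeof demo_data };

/* The subtraction happens here, at run time, not in the initializer. */
static ptrdiff_t blob_len(const struct blob *b)
{
    return b->end - b->start;
}
```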

From Lawrence D'Oliveiro@21:1/5 to Malcolm McLean on Sun Jun 2 03:28:32 2024

On Sat, 1 Jun 2024 16:33:29 +0100, Malcolm McLean wrote:

... I like strings which you
can pass about (though to actually use the contents you need to convert
to char *, otherwise it is hopeless) ...

That’s a bit sad, isn’t it, that such a feature is so obviously a bag
stuck on the side of the original C core.

From Lawrence D'Oliveiro@21:1/5 to bart on Sun Jun 2 03:27:20 2024

On Sat, 1 Jun 2024 11:37:45 +0100, bart wrote:

My compilers don't routinely generate object files, which would also
need an external dependency (a linker), but they can do if necessary
(eg. to statically link my code into another program with another
compiler).

Modular code design would indicate that there is no point in the compiler duplicating functionality available in the linker.

From Lawrence D'Oliveiro@21:1/5 to Lynn McGuire on Sun Jun 2 03:29:13 2024

On Fri, 31 May 2024 17:55:13 -0500, Lynn McGuire wrote:

while (1)

Why not

while (true)

or even

for (;;)

?

From Michael S@21:1/5 to Scott Lurndal on Sun Jun 2 11:02:13 2024

On Sat, 01 Jun 2024 01:27:41 GMT
[email protected] (Scott Lurndal) wrote:

Lynn McGuire <[email protected]> writes:

On 5/26/2024 6:23 AM, Bonita Montero wrote:

Am 26.05.2024 um 09:13 schrieb jak:

About this I only agree partially because it depends a lot on the
context in which it is used. Moreover, I would not know how to
indicate an optimal programming language for all seasons.

C++ is in almost any case the better C.

What you describe is the greatest inconvenience of c++. To make
only one example, when they decided to rewrite the FB platform to
accelerate it, they thought of migrating from php to c++ and they
had a collapse of the staff suitable for work, so they thought of
relying on a compiler that translated the php into c++ and many of
the new languages were born to try to remedy its complexity.

C++ is the wrong language for web applications.
I like Java more for that.

C++ is the wrong language for real time apps.

That's an incorrect statement.

No memory allocation allowed.

It is trivially easy to write C++ code that doesn't
allocate memory dynamically.

I use C++ for my server side apps on my webserver. Works great.

I use C++ for operating systems (you can't get more real-time
than that)

Engine control is FAR more real-time than an OS, to list just one example
out of many.
Of course, nowadays most of these things are no longer done on
general-purpose CPUs or even MCUs.

and bare-metal hypervisors.

It is hard to believe that you don't have at least one co-worker that
is begging to switch all new development to C approximately every week.
And a couple of folks that beg for Rust.

From bart@21:1/5 to Lawrence D'Oliveiro on Sun Jun 2 10:37:55 2024

On 02/06/2024 04:27, Lawrence D'Oliveiro wrote:

On Sat, 1 Jun 2024 11:37:45 +0100, bart wrote:

My compilers don't routinely generate object files, which would also
need an external dependency (a linker), but they can do if necessary
(eg. to statically link my code into another program with another
compiler).

Modular code design would indicate that there is no point in the compiler duplicating functionality available in the linker.

Python uses modules and yet doesn't have a linker. How on earth does it
manage?

Lots of languages get by without linkers. Or without having to
pointlessly write out lots of discrete files, with a lot of useful info
lost, then having to read them all in again. (Look at the mess that
'objcopy' gets into.)

Quite a few compilers give the impression that they also do the job of
linking:

gcc x.c y.c z.c

produces an executable. Does it really matter here whether the 'linking'
is done by a separate program on discrete files, or completely internally?

Having all modules in memory gives you the opportunity for whole-program
optimisation, with all useful info intact, without having to invent the
far hairier and unwieldy concept of LTO.

Here is also my assembler in action given modules x.asm y.asm z.asm
produced by my C compiler:

aa x y z

It does the job of 'linking' but working from .asm files straight to
.exe or .dll. What's the effing point of a separate linker here?

Personally I first designed out a traditional linker sometime around
1983. The special Loader I wrote to combine my object files into a
single binary took seconds, even on floppies. A traditional linker would
have taken minutes. God knows what they were doing.

From David Brown@21:1/5 to Michael S on Sun Jun 2 14:03:30 2024

On 02/06/2024 10:02, Michael S wrote:

On Sat, 01 Jun 2024 01:27:41 GMT
[email protected] (Scott Lurndal) wrote:

Lynn McGuire <[email protected]> writes:

On 5/26/2024 6:23 AM, Bonita Montero wrote:

Am 26.05.2024 um 09:13 schrieb jak:

About this I only agree partially because it depends a lot on the
context in which it is used. Moreover, I would not know how to
indicate an optimal programming language for all seasons.

C++ is in almost any case the better C.

What you describe is the greatest inconvenience of c++. To make
only one example, when they decided to rewrite the FB platform to
accelerate it, they thought of migrating from php to c++ and they
had a collapse of the staff suitable for work, so they thought of
relying on a compiler that translated the php into c++ and many of
the new languages were born to try to remedy its complexity.

C++ is the wrong language for web applications.
I like Java more for that.

C++ is the wrong language for real time apps.

That's an incorrect statement.

No memory allocation allowed.

It is trivially easy to write C++ code that doesn't
allocate memory dynamically.

I use C++ for my server side apps on my webserver. Works great.

I use C++ for operating systems (you can't get more real-time
than that)

Engine control is FAR more real-time than an OS, to list just one example
out of many.

Most engine control software runs on an RTOS - so you have at least as
tough real-time requirements for the OS as for the application. The OS
stuff Scott works with, AFAIK, is real-time OS's for specific tasks such
as high-end network equipment. It is not general-purpose or desktop
OS's (which I agree are not particularly real-time).

Of course, nowadays most of these things are no longer done on general-purpose CPUs or even MCUs.

I think you have got that backwards.

Most engine control /is/ done with general purpose microcontrollers, or
at least specific variants of them. They will use ARM Cortex-R or
Cortex-M cores rather than Cortex-A cores (i.e., the "real-time" cores
or "microcontroller" cores rather than the "application" cores you see
in telephones, Macs, and ARM servers), but they are standard cores.
Another common choice is the PowerPC cores used in NXP's engine controllers.

It used to be the case that engine control and other critical hard
real-time work was done with DSPs or FPGAs, but those days are long past.

and bare-metal hypervisors.

It is hard to believe that you don't have at least one co-worker that
is begging to switch all new development to C approximately every week.
And a couple of folks that beg for Rust.

It's possible that he has newbies amongst his co-workers, yes.

From Michael S@21:1/5 to David Brown on Sun Jun 2 16:29:14 2024

On Sun, 2 Jun 2024 14:03:30 +0200
David Brown <[email protected]> wrote:

On 02/06/2024 10:02, Michael S wrote:

On Sat, 01 Jun 2024 01:27:41 GMT
[email protected] (Scott Lurndal) wrote:

Lynn McGuire <[email protected]> writes:

On 5/26/2024 6:23 AM, Bonita Montero wrote:

Am 26.05.2024 um 09:13 schrieb jak:

About this I only agree partially because it depends a lot on
the context in which it is used. Moreover, I would not know how
to indicate an optimal programming language for all seasons.

C++ is in almost any case the better C.

What you describe is the greatest inconvenience of c++. To make
only one example, when they decided to rewrite the FB platform
to accelerate it, they thought of migrating from php to c++ and
they had a collapse of the staff suitable for work, so they
thought of relying on a compiler that translated the php into c++
and many of the new languages were born to try to remedy its
complexity.

C++ is the wrong language for web applications.
I like Java more for that.

C++ is the wrong language for real time apps.

That's an incorrect statement.

No memory allocation allowed.

It is trivially easy to write C++ code that doesn't
allocate memory dynamically.

I use C++ for my server side apps on my webserver. Works great.

I use C++ for operating systems (you can't get more real-time
than that)

Engine control is FAR more real-time than an OS, to list just one
example out of many.

Most engine control software runs on an RTOS - so you have at least
as tough real-time requirements for the OS as for the application.

From what I read about this stuff (admittedly, long time ago) even
when there is a RTOS, the important part runs alongside RTOS rather than
"on" RTOS.
I.e. there is a high-priority interrupt that is never ever masked by the OS in
the region that is anywhere close to expected time and all
time-sensitive work is done by ISR, with no sort of RTOS calls.

The OS stuff Scott works with, AFAIK, is real-time OS's for specific
tasks such as high-end network equipment. It is not general-purpose
or desktop OS's (which I agree are not particularly real-time).

I'd characterize the software running within a high-end NIC as very
soft real-time. You only care that buffers don't overflow. And if they
overflow, it's not too bad either. The flow is very much unidirectional
or bi-directional with directions almost independent of each other.
There are dependencies between directions, e.g. TCP acks, but they are
weak dependencies timing-wise.
Hard real time is about closed loops, most often closed control loops,
but not only those.

Of course, nowadays most of these things are no longer done on general-purpose CPUs or even MCUs.

I think you have got that backwards.

Most engine control /is/ done with general purpose microcontrollers,
or at least specific variants of them. They will use ARM Cortex-R or Cortex-M cores rather than Cortex-A cores (i.e., the "real-time"
cores or "microcontroller" cores rather than the "application" cores
you see in telephones, Macs, and ARM servers), but they are standard
cores. Another common choice is the PowerPC cores used in NXP's
engine controllers.

It used to be the case that engine control and other critical hard
real-time work was done with DSPs or FPGAs, but those days are long
past.

Are you sure?
It's much simpler and far more reliable to do such a task with a $5 PLD
(which today means an FPGA that boots from internal flash, rather than
the old days' PLD) than with an MCU, regardless of the price of the MCU.
Even if the MCU is $4.99 cheaper, the difference is noise relative to
the price of the engine.

and bare-metal hypervisors.

It is hard to believe that you don't have at least one co-worker
that is begging to switch all new development to C approximately
every week. And a couple of folks that beg for Rust.

It's possible that he has newbies amongst his co-workers, yes.

Well, Linus is not on his team, but if he was, he would say the same
thing. But probably at much higher rate than weekly.

From Kenny McCormack@21:1/5 to [email protected] on Sun Jun 2 13:24:23 2024

In article <v3gou9$36n61$[email protected]>,
Lawrence D'Oliveiro <[email protected]d> wrote:

On Fri, 31 May 2024 17:55:13 -0500, Lynn McGuire wrote:

while (1)

Why not

while (true)

or even

for (;;)

?

Or even:

:loop
....
goto loop

--
"The party of Lincoln has become the party of John Wilkes Booth."

- Carlos Alazraqui -

From Lew Pitcher@21:1/5 to Kenny McCormack on Sun Jun 2 16:51:15 2024

On Sun, 02 Jun 2024 13:24:23 +0000, Kenny McCormack wrote:

In article <v3gou9$36n61$[email protected]>,
Lawrence D'Oliveiro <[email protected]d> wrote:

On Fri, 31 May 2024 17:55:13 -0500, Lynn McGuire wrote:

while (1)

Why not

while (true)

or even

for (;;)

?

I've always considered
for (;;)
preferable over
while (1)
as the for (;;) expression does not require the compiler to expand
and evaluate a condition expression.

For the for (;;), the compiler sees the token stream <LPAREN>
<SEMICOLON> <SEMICOLON> <RPAREN>, and emits a closed loop, but
with while (1), the compiler sees <LPAREN> <CONSTANT> <RPAREN>,
and has to evaluate (either at compile time or at execution
time) the value of the <CONSTANT> to determine whether or
not to emit the closed loop logic.

Or even:

:loop
....
goto loop

ITYM

loop:
/*Stuff happens here */
goto loop;
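Here is a small sketch showing that the forms discussed all express the same unconditional loop: for (;;), while (1), while (true) and a labelled goto. Each hypothetical count_* function loops until it leaves via break or goto after n passes. true is a keyword in C23; <stdbool.h> is included so the sketch also builds as C99/C11/C17. In practice mainstream compilers such as gcc and clang fold the constant condition, so the generated loops are expected to be the same. The choice is largely one of style, although the standard itself says nothing about code generation.

```c
#include <stdbool.h>   /* 'true' is a keyword in C23; needed before that */

static int count_for(int n)
{
    int i = 0;
    for (;;) {              /* no controlling expression at all */
        if (i == n) break;
        ++i;
    }
    return i;
}

static int count_while1(int n)
{
    int i = 0;
    while (1) {             /* constant controlling expression */
        if (i == n) break;
        ++i;
    }
    return i;
}

static int count_while_true(int n)
{
    int i = 0;
    while (true) {
        if (i == n) break;
        ++i;
    }
    return i;
}

static int count_goto(int n)
{
    int i = 0;
loop:                       /* C label syntax is "name:" before a statement */
    if (i == n) goto done;
    ++i;
    goto loop;
done:
    return i;
}
```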

--
Lew Pitcher
"In Skills We Trust"

From Scott Lurndal@21:1/5 to Michael S on Sun Jun 2 19:23:55 2024

Michael S <[email protected]> writes:

On Sun, 2 Jun 2024 14:03:30 +0200
David Brown <[email protected]> wrote:

The OS stuff Scott works with, AFAIK, is real-time OS's for specific
tasks such as high-end network equipment. It is not general-purpose
or desktop OS's (which I agree are not particularly real-time).

I'd characterize the software running within a high-end NIC as very
soft real-time. You only care that buffers don't overflow. And if they
overflow, it's not too bad either. The flow is very much unidirectional
or bi-directional with directions almost independent of each other.

A high-end network controller needs to be able to support line-rate
on multiple high speed (100Gb/s+) ports while routing, encapsulating, prioritizing, decrypting/encrypting, and/or applying various protocol transformations to the network traffic. Much of that pipeline is
implemented in gates, but at various points in the flow, the CPU will need
to be involved and must not cause packet loss, so interrupt latency is
very important (as is having enough cores to handle the required levels
of traffic). That's one of the reasons that DPDK and ODP stacks run
in user-mode, with direct access to the hardware - to avoid the overhead
of kernel mode switches. The hardware is virtualized (using SR-IOV
on PCIe) so the 'function' exposed to usermode code is isolated from
other networking resources and traffic flows.
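The concern about buffers that must not overflow can be made concrete with a single-producer/single-consumer ring buffer that counts drops instead of blocking. This is a generic C11 sketch, not DPDK's rte_ring or ODP's queue API. The names, the power-of-two size and the drop-on-full policy are all assumptions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8u   /* must be a power of two so masking wraps correctly */

struct ring {
    void *slot[RING_SIZE];
    _Atomic size_t head;   /* free-running; written only by the producer */
    _Atomic size_t tail;   /* free-running; written only by the consumer */
    size_t dropped;        /* producer-side count of rejected items */
};

/* Producer: returns false and counts a drop if the ring is full. */
static bool ring_push(struct ring *r, void *item)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE) {     /* unsigned wrap gives the fill level */
        r->dropped++;
        return false;
    }
    r->slot[head & (RING_SIZE - 1)] = item;
    /* release: the slot write is visible before the new head */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer: returns NULL if the ring is empty. */
static void *ring_pop(struct ring *r)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return NULL;
    void *item = r->slot[tail & (RING_SIZE - 1)];
    /* release: the slot read completes before the producer may reuse it */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return item;
}
```

A zero-initialised struct ring (for example one with static storage duration) is an empty ring.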

From Scott Lurndal@21:1/5 to Michael S on Sun Jun 2 19:15:29 2024

Michael S <[email protected]> writes:

On Sat, 01 Jun 2024 01:27:41 GMT
[email protected] (Scott Lurndal) wrote:

Lynn McGuire <[email protected]> writes:

On 5/26/2024 6:23 AM, Bonita Montero wrote:

Am 26.05.2024 um 09:13 schrieb jak:

About this I only agree partially because it depends a lot on the
context in which it is used. Moreover, I would not know how to
indicate an optimal programming language for all seasons.

C++ is in almost any case the better C.

What you describe is the greatest inconvenience of c++. To make
only one example, when they decided to rewrite the FB platform to
accelerate it, they thought of migrating from php to c++ and they
had a collapse of the staff suitable for work, so they thought of
relying a compiler that translated the php into c++ and many of
the new languages were born to try to remedy hits complexity.

C++ is the wrong language for web applications.
I like Java more for that.

C++ is the wrong language for real time apps.

That's an incorrect statement.

No memory allocation allowed.

It is trivially easy to write C++ code that doesn't
allocate memory dynamically.

I use C++ for my server side apps on my webserver. Works great.

I use C++ for operating systems (you can't get more real-time
than that)

Engine control is FAR more real-time than an OS, to list just one example
out of many.

Actually there are real-time operating systems that support
those applications.

Of course, nowadays most of these things are no longer done on general-purpose CPUs or even MCUs.

and bare-metal hypervisors.

It is hard to believe that you don't have at least one co-worker that
is begging to switch all new development to C approximately every week.

Language choice is based on the needs of the project. Linux driver
work necessarily needs to be done in C (perhaps also Rust in the near
future).

I've not heard any complaints about language. Discussions about what
features of a language are useful in our code, yes, those occur, primarily because we need to support older toolchains compatible with third-party toolsets (e.g. verilog environments) which, for example, limits us to
C++11 features.

From Kaz Kylheku@21:1/5 to Lew Pitcher on Sun Jun 2 19:52:22 2024

On 2024-06-02, Lew Pitcher <[email protected]> wrote:

I've always considered
for (;;)
preferable over
while (1)

Of course it is preferable. The idiom constitutes the language's direct
support for unconditional looping, not requiring that to be requested by
an extraneous always-true expression.

Using while (1) or while (true) is like i = i + 1 instead
of ++i, or while (*dst++ = *src++); instead of strcpy.

When Dennis Ritchie (if it was indeed he) chose for to be the construct
in which the guard expression may be omitted, so that it may express unconditional looping, he expressed the intent that it be henceforth used
for that purpose.

To continue to use while (1) after the proper utensil is provided is
like to eat with your hands instead of a fork.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @[email protected]

From David Brown@21:1/5 to Michael S on Sun Jun 2 21:44:01 2024

On 02/06/2024 15:29, Michael S wrote:

On Sun, 2 Jun 2024 14:03:30 +0200
David Brown <[email protected]> wrote:

On 02/06/2024 10:02, Michael S wrote:

On Sat, 01 Jun 2024 01:27:41 GMT
[email protected] (Scott Lurndal) wrote:

Lynn McGuire <[email protected]> writes:

On 5/26/2024 6:23 AM, Bonita Montero wrote:

Am 26.05.2024 um 09:13 schrieb jak:

About this I only agree partially because it depends a lot on
the context in which it is used. Moreover, I would not know how
to indicate an optimal programming language for all seasons.

C++ is in almost any case the better C.

What you describe is the greatest inconvenience of c++. To make
only one example, when they decided to rewrite the FB platform
to accelerate it, they thought of migrating from php to c++ and
they had a collapse of the staff suitable for work, so they
thought of relying on a compiler that translated the php into c++
and many of the new languages were born to try to remedy its
complexity.

C++ is the wrong language for web applications.
I like Java more for that.

C++ is the wrong language for real time apps.

That's an incorrect statement.

No memory allocation allowed.

It is trivially easy to write C++ code that doesn't
allocate memory dynamically.

I use C++ for my server side apps on my webserver. Works great.

I use C++ for operating systems (you can't get more real-time
than that)

Engine control is FAR more real-time than an OS, to list just one
example out of many.

Most engine control software runs on an RTOS - so you have at least
as tough real-time requirements for the OS as for the application.

From what I read about this stuff (admittedly, long time ago) even
when there is a RTOS, the important part runs alongside RTOS rather than
"on" RTOS.
I.e. there is a high-priority interrupt that is never ever masked by the OS in
the region that is anywhere close to expected time and all
time-sensitive work is done by ISR, with no sort of RTOS calls.

That's sort-of right. To be precise for something like this, we'd have
to say what exactly we mean by "engine controller". There are many
kinds of engine or motor, and many types of control that are needed for
them. Generally, there is a hierarchy of simpler but more time-critical
parts up to more complex but more flexible parts of the system.

As an example of a system of motor control that I've worked on (electric
motors rather than combustion engines), the most timing-critical signal generation and safety (emergency stop, overload protection, etc.) are
all in hardware - typically dedicated peripherals in the
microcontroller. Some safety parts might also be implemented in
non-maskable interrupt functions that the RTOS can never disable.

The low-level control of the motors is typically run by timer interrupt functions. These may be disabled by the RTOS, but will only be disabled
for a very short (and predictable) time - interrupt disabling is usually essential to the way locks and inter-process communication works,
including communication between these timer functions and the rest of
the code. Higher level control runs as RTOS tasks of various
priorities, and communication with other boards is usually a lower
priority task. Clearly these real-time tasks cannot be more "real-time"
than the RTOS itself. Other boards might have high level non-realtime
system determining things like path finding, or user interfaces.
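Here is a minimal sketch of the timer-interrupt / RTOS-task split described above. A task hands a new setpoint to a timer interrupt function under a short, bounded interrupt-disabled section. irq_save() and irq_restore() are hypothetical placeholders for whatever the target provides (for example CMSIS __disable_irq() on Cortex-M, or an RTOS critical-section macro). Here they are stubs so the sketch builds on a host, and the PWM output is simulated by a plain variable.

```c
#include <stdint.h>

/* Hypothetical stand-ins for target-specific interrupt masking. On real
   hardware these would save+disable and then restore the interrupt state. */
static uint32_t irq_save(void) { return 0; }
static void irq_restore(uint32_t state) { (void)state; }

/* Shared between the RTOS task and the timer ISR. */
struct setpoint {
    int32_t speed;
    int32_t accel;
};
static volatile struct setpoint shared_sp;

static int32_t pwm_duty;   /* simulated output, written only by the ISR */

/* Called from a lower-priority RTOS task: both fields are updated with
   interrupts masked for a few instructions, so on a single-core MCU the
   ISR sees either the old pair or the new pair, never a mix. */
static void task_set_setpoint(int32_t speed, int32_t accel)
{
    uint32_t s = irq_save();
    shared_sp.speed = speed;
    shared_sp.accel = accel;
    irq_restore(s);
}

/* Periodic timer ISR: ramps the output towards the target by at most
   'accel' per tick, without making any RTOS calls. */
static void timer_isr(void)
{
    int32_t target = shared_sp.speed;
    int32_t step = shared_sp.accel;
    if (pwm_duty < target)
        pwm_duty = (target - pwm_duty < step) ? target : pwm_duty + step;
    else if (pwm_duty > target)
        pwm_duty = (pwm_duty - target < step) ? target : pwm_duty - step;
}
```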

And until you get to the highest level stuff, there is no reason why C++
is not suitable. But whether you use C++, C, Assembly, or Ada for the low-level and more real-time critical code, you avoid dynamic memory, exceptions, and other techniques that can have unpredictable failure
modes and unexpected delays. (The high-level stuff can be written in
any language.)
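As an illustration of avoiding dynamic memory in C, here is a sketch of a fixed-size block pool carved out of static storage at build time. Allocation and release are O(1) and cannot fragment, and running out is an ordinary, testable return value rather than an unpredictable failure. The names and sizes are hypothetical. A real RTOS build would also need to protect the free list against concurrent use, for example with a critical section or an RTOS mutex.

```c
#include <stddef.h>

#define POOL_BLOCKS     4
#define POOL_BLOCK_SIZE 32

/* A block is either handed out (data) or linked into the free list (next). */
union pool_block {
    union pool_block *next;
    unsigned char data[POOL_BLOCK_SIZE];
};

static union pool_block pool_storage[POOL_BLOCKS];
static union pool_block *pool_free;

/* Build the free list once at start-up. */
static void pool_init(void)
{
    pool_free = NULL;
    for (size_t i = 0; i < POOL_BLOCKS; ++i) {
        pool_storage[i].next = pool_free;
        pool_free = &pool_storage[i];
    }
}

/* O(1); returns NULL when the pool is exhausted. */
static void *pool_alloc(void)
{
    union pool_block *b = pool_free;
    if (b != NULL)
        pool_free = b->next;
    return b;
}

/* O(1); p must have come from pool_alloc(). */
static void pool_free_block(void *p)
{
    union pool_block *b = p;
    b->next = pool_free;
    pool_free = b;
}
```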

The OS stuff Scott works with, AFAIK, is real-time OS's for specific
tasks such as high-end network equipment. It is not general-purpose
or desktop OS's (which I agree are not particularly real-time).

I'd characterize the software running within a high-end NIC as very
soft real-time.

I'd characterize it as whatever Scott says it is - he's the expert
there, not you or me.

You only care that buffers don't overflow. And if they
overflow, it's not too bad either.

That is true for some things, but most certainly not for all usage.

The flow is very much unidirectional
or bi-directional with directions almost independent of each other.
There are dependencies between directions, e.g. TCP acks, but they are
weak dependencies timing-wise.

There is a lot of networking that is not TCP/IP.

High-speed network interfaces are used for two purposes - to get high throughput, or to get low latencies. Throughput is not as sensitive to
timing and can tolerate some variation as long as the traffic is
independent, but latency is a different matter.

Hard real time is about closed loops, most often closed control loops,
but not only those.

Of course, nowadays most of these things are no longer done on
general-purpose CPUs or even MCUs.

I think you have got that backwards.

Most engine control /is/ done with general purpose microcontrollers,
or at least specific variants of them. They will use ARM Cortex-R or
Cortex-M cores rather than Cortex-A cores (i.e., the "real-time"
cores or "microcontroller" cores rather than the "application" cores
you see in telephones, Macs, and ARM servers), but they are standard
cores. Another common choice is the PowerPC cores used in NXP's
engine controllers.

It used to be the case that engine control and other critical hard
real-time work was done with DSPs or FPGAs, but those days are long
past.

Are you sure?

Pretty sure, yes.

It's much simpler and far more reliable to do such a task with a $5 PLD
(which today means an FPGA that boots from internal flash, rather than
the old days' PLD) than with an MCU, regardless of the price of the MCU.

No, it is not simpler or more reliable. Programmable logic is rarely
used for engine or motor control. You use microcontrollers with
appropriate peripherals, such as sophisticated PWM units and encoder interfaces, and advanced timers.

Even if the MCU is $4.99 cheaper, the difference is noise relative to
the price of the engine.

That part is true.

and bare-metal hypervisors.

It is hard to believe that you don't have at least one co-worker
that is begging to switch all new development to C approximately
every week. And a couple of folks that beg for Rust.

It's possible that he has newbies amongst his co-workers, yes.

Well, Linus is not on his team, but if he was, he would say the same
thing. But probably at much higher rate than weekly.

Yes, but Linus Torvalds knows shit about C++. He knows a lot about C,
and many other things.

He also - not unreasonably - believes that if C++ was used in the Linux
kernel, lots of others who know nothing about using C++ in OS's and
low-level work would make a complete mess of things. You don't want
someone to randomly add std::vector<> or the like into kernel code. You
don't want people who take delight in smart-arse coding, such as some
regulars in c.l.c++, anywhere near the kernel.

But other OS's are not the Linux kernel - it has particularly unique challenges. If you have an appropriate team, C++ is vastly better for
writing RTOS kernels than C.

From Lawrence D'Oliveiro@21:1/5 to bart on Mon Jun 3 01:16:39 2024

On Sun, 2 Jun 2024 10:37:55 +0100, bart wrote:

On 02/06/2024 04:27, Lawrence D'Oliveiro wrote:

On Sat, 1 Jun 2024 11:37:45 +0100, bart wrote:

My compilers don't routinely generate object files, which would also
need an external dependency (a linker), but they can do if necessary
(eg. to statically link my code into another program with another
compiler).

Modular code design would indicate that there is no point in the compiler
duplicating functionality available in the linker.

Python uses modules and yet doesn't have a linker.

What is importlib, then, if not something that links everything together?

And guess what: it’s a module.

From Lawrence D'Oliveiro@21:1/5 to Michael S on Mon Jun 3 03:21:01 2024

On Sun, 2 Jun 2024 11:02:13 +0300, Michael S wrote:

Engine control is FAR more real-time than an OS, to list just one example
out of many.

Speaking of (internal-combustion) engines, I wondered when we can get to
the point where the controller is operating down at the level of
individual spark plug activations and valve openings -- getting rid of
cams and timing belts, in other words.

With that level of control, could you get down to an idling speed of 0
rpm? That is, could the engine get itself going from absolute rest?

From Michael S@21:1/5 to Lawrence D'Oliveiro on Mon Jun 3 11:16:15 2024

On Mon, 3 Jun 2024 01:16:39 -0000 (UTC)
Lawrence D'Oliveiro <[email protected]d> wrote:

On Sun, 2 Jun 2024 10:37:55 +0100, bart wrote:

On 02/06/2024 04:27, Lawrence D'Oliveiro wrote:

On Sat, 1 Jun 2024 11:37:45 +0100, bart wrote:

My compilers don't routinely generate object files, which would
also need an external dependency (a linker), but they can do if
necessary (eg. to statically link my code into another program
with another compiler).

Modular code design would indicate that there is no point in the
compiler duplicating functionality available in the linker.

Python uses modules and yet doesn't have a linker.

What is importlib, then, if not something that links everything
together?

And guess what: it’s a module.

Bart is very obviously correct. When all sources are available, the linker
is merely an implementation detail. A much less necessary implementation
detail, too, in the world of big RAM and of not particularly big apps.
LTCG is a sort of admission of this fact.
Even in the old days of small RAMs, the super-popular Turbo Pascal suite had
modules, but I don't think that it had a linker.

From Michael S@21:1/5 to Kaz Kylheku on Mon Jun 3 12:01:48 2024

On Sun, 2 Jun 2024 19:52:22 -0000 (UTC)
Kaz Kylheku <[email protected]> wrote:

On 2024-06-02, Lew Pitcher <[email protected]> wrote:

I've always considered
for (;;)
preferable over
while (1)

Of course it is preferable. The idiom constitutes the language's
direct support for unconditional looping, not requiring that to be
requested by an extraneous always-true expression.

Using while (1) or while (true) is like i = i + 1 instead
of ++i, or while (*dst++ = *src++); instead of strcpy.

When Dennis Ritchie (if it was indeed he) chose for to be the
construct in which the guard expression may be omitted, so that it
may express unconditional looping, he expressed the intent that it be henceforth used for that purpose.

To continue to use while (1) after the proper utensil is provided is
like to eat with your hands instead of a fork.

The former is becoming increasingly popular.


From Michael S@21:1/5 to David Brown on Mon Jun 3 12:00:43 2024

On Sun, 2 Jun 2024 21:44:01 +0200
David Brown <[email protected]> wrote:

On 02/06/2024 15:29, Michael S wrote:

On Sun, 2 Jun 2024 14:03:30 +0200
David Brown <[email protected]> wrote:

On 02/06/2024 10:02, Michael S wrote:

On Sat, 01 Jun 2024 01:27:41 GMT
[email protected] (Scott Lurndal) wrote:

Lynn McGuire <[email protected]> writes:

On 5/26/2024 6:23 AM, Bonita Montero wrote:

Am 26.05.2024 um 09:13 schrieb jak:

I only partially agree with this, because it depends a lot on
the context in which it is used. Moreover, I would not know
how to pick an optimal programming language for all
seasons.

C++ is in almost any case the better C.

What you describe is the greatest inconvenience of C++. To
give just one example, when they decided to rewrite the FB
platform to speed it up, they thought of migrating from PHP
to C++, and the pool of staff suited to the work collapsed,
so they resorted to a compiler that translated the PHP into
C++, and many of the new languages were born to try to
remedy its complexity.

C++ is the wrong language for web applications.
I like Java more for that.

C++ is the wrong language for real time apps.

That's an incorrect statement.

No memory allocation allowed.

It is trivially easy to write C++ code that doesn't
allocate memory dynamically.

I use C++ for my server side apps on my webserver. Works
great.

I use C++ for operating systems (you can't get more real-time
than that)

Engine control is FAR more real-time than an OS, to list just one
example out of many.

Most engine control software runs on an RTOS - so you have at least
as tough real-time requirements for the OS as for the application.

From what I read about this stuff (admittedly, a long time ago), even
when there is an RTOS, the important part runs alongside the RTOS rather
than "on" the RTOS.
I.e. there is a high-priority interrupt that is never ever masked by the
OS anywhere close to the expected time, and all time-sensitive work is
done by the ISR, with no RTOS calls of any sort.

That's sort-of right. To be precise for something like this, we'd
have to say what exactly we mean by "engine controller". There are
many kinds of engine or motor, and many types of control that are
needed for them. Generally, there is a hierarchy of simpler but more time-critical parts up to more complex but more flexible parts of the
system.

As an example of a system of motor control that I've worked on
(electric motors rather than combustion engines), the most
timing-critical signal generation and safety (emergency stop,
overload protection, etc.) are all in hardware - typically dedicated peripherals in the microcontroller. Some safety parts might also be implemented in non-maskable interrupt functions that the RTOS can
never disable.

The low-level control of the motors is typically run by timer
interrupt functions. These may be disabled by the RTOS, but will
only be disabled for a very short (and predictable) time - interrupt disabling is usually essential to the way locks and inter-process communication works, including communication between these timer
functions and the rest of the code. Higher level control runs as
RTOS tasks of various priorities, and communication with other boards
is usually a lower priority task. Clearly these real-time tasks
cannot be more "real-time" than the RTOS itself. Other boards might
have high-level, non-real-time systems determining things like path
finding, or user interfaces.
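Here is a minimal sketch of the short, bounded interrupt disabling described above. It assumes a single-core microcontroller where a timer ISR writes two related values and an RTOS task reads them. irq_save() and irq_restore() are hypothetical hooks. On a real target they would wrap the core's interrupt-masking intrinsics (on Cortex-M, for instance, CMSIS provides __disable_irq() and __enable_irq()). Here they only record state, so the sketch compiles and runs on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical interrupt-masking hooks. On a real target these would wrap
   the core's intrinsics; here they only track state so the sketch runs on
   a host machine. */
static volatile int irq_masked = 0;

static int irq_save(void)            /* mask interrupts, return old state */
{
    int old = irq_masked;
    irq_masked = 1;
    return old;
}

static void irq_restore(int old)     /* put back the previous mask state */
{
    irq_masked = old;
}

/* Data shared between a timer ISR (writer) and an RTOS task (reader). */
typedef struct {
    int32_t position;    /* e.g. encoder position */
    int32_t velocity;    /* e.g. derived speed */
} motor_state;

static volatile motor_state shared_state;

/* Timer interrupt side: on a single core the task cannot run while the
   ISR is executing, so no masking is needed here. */
void timer_isr_update(int32_t pos, int32_t vel)
{
    shared_state.position = pos;
    shared_state.velocity = vel;
}

/* Task side: interrupts are masked only for the few loads needed to take
   a consistent snapshot of both fields. */
motor_state task_read_state(void)
{
    motor_state snap;
    int old = irq_save();
    snap.position = shared_state.position;
    snap.velocity = shared_state.velocity;
    irq_restore(old);
    return snap;
}
```

Saving and restoring the previous mask state keeps the helper safe to call from code that already has interrupts masked, which unconditionally re-enabling would not. The masked window is only a couple of loads, so the added interrupt latency stays small and predictable.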

And until you get to the highest level stuff, there is no reason why
C++ is not suitable. But whether you use C++, C, Assembly, or Ada
for the low-level and more real-time critical code, you avoid dynamic
memory, exceptions, and other techniques that can have unpredictable
failure modes and unexpected delays. (The high-level stuff can be
written in any language.)
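To illustrate avoiding dynamic memory in time-critical code, here is a minimal fixed-block pool sketch. All storage is reserved statically and allocation and release are O(1). Exhaustion shows up as a NULL return rather than as fragmentation or an unbounded search. The sizes and names are assumptions for illustration. The functions are not safe to call from both an ISR and a task unless they are protected by the kind of interrupt masking mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Fixed-size block pool: storage is reserved at compile time, so allocation
   never calls malloc(), cannot fragment, and alloc/free run in O(1). */
#define BLOCK_SIZE  32u   /* bytes per block (illustrative value) */
#define BLOCK_COUNT 8u    /* number of blocks (illustrative value) */

typedef union block {
    union block *next;              /* link while the block is free */
    unsigned char data[BLOCK_SIZE]; /* payload while the block is in use */
} block;

static block pool[BLOCK_COUNT];
static block *free_list;

void pool_init(void)
{
    for (size_t i = 0; i + 1 < BLOCK_COUNT; i++)
        pool[i].next = &pool[i + 1];
    pool[BLOCK_COUNT - 1].next = NULL;
    free_list = &pool[0];
}

/* Returns NULL when the pool is exhausted: a predictable failure mode. */
void *pool_alloc(void)
{
    block *b = free_list;
    if (b != NULL)
        free_list = b->next;
    return b;
}

void pool_free(void *p)
{
    block *b = p;
    /* Only pointers previously handed out by pool_alloc() are valid. */
    assert(b >= &pool[0] && b < &pool[BLOCK_COUNT]);
    b->next = free_list;
    free_list = b;
}
```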

The OS stuff Scott works with, AFAIK, is real-time OS's for
specific tasks such as high-end network equipment. It is not
general-purpose or desktop OS's (which I agree are not
particularly real-time).

I'd characterize the software running within a high-end NIC as
very soft real-time.

I'd characterize it as whatever Scott says it is - he's the expert
there, not you or me.

You only care that buffers do not overflow. And if they
overflow, it's not too bad either.
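The "overflow is tolerable" attitude described here can be sketched as a fixed-size FIFO that drops new items when it is full and simply counts the drops. This is a hypothetical single-threaded sketch with an illustrative capacity, not how any particular NIC firmware works. With a real producer and consumer running in different contexts, the index updates would need atomics or interrupt masking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fixed-capacity FIFO that drops new items when full and counts the drops:
   overflow is tolerated and merely recorded. */
#define RING_SIZE 4u   /* illustrative; must be a power of two */

typedef struct {
    uint32_t buf[RING_SIZE];
    uint32_t head;     /* next slot to write (free-running counter) */
    uint32_t tail;     /* next slot to read (free-running counter) */
    uint32_t dropped;  /* items discarded because the ring was full */
} ring;

bool ring_push(ring *r, uint32_t v)
{
    if (r->head - r->tail == RING_SIZE) {   /* full: drop and count */
        r->dropped++;
        return false;
    }
    r->buf[r->head & (RING_SIZE - 1)] = v;
    r->head++;
    return true;
}

bool ring_pop(ring *r, uint32_t *out)
{
    if (r->head == r->tail)                 /* empty */
        return false;
    *out = r->buf[r->tail & (RING_SIZE - 1)];
    r->tail++;
    return true;
}
```

Because head and tail are free-running unsigned counters, head - tail gives the fill level even after they wrap, provided the capacity is a power of two.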

That is true for some things, but most certainly not for all usage.

The flow is very much unidirectional, or bi-directional with the two
directions almost independent of each other.
There are dependencies between directions, e.g. TCP acks, but they are
weak dependencies timing-wise.

There is a lot of networking that is not TCP/IP.

High-speed network interfaces are used for two purposes - to get high throughput, or to get low latencies. Throughput is not as sensitive
to timing and can tolerate some variation as long as the traffic is independent, but latency is a different matter.

I think nearly all work in high-end NICs is concentrated on throughput.
For low latency, the best you can do with a high-end NIC is to disable
all high-end features and to hope that in the disabled state they do not
hurt you too badly.
It would probably be better to use a specialized "dumb" NIC. I don't know
if such things exist, but considering that high-frequency trading is
still legal (IMHO, it shouldn't be) I would guess that they do.

Hard real time is about closed loops, most often closed control
loops, but not only those.
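Here is a concrete example of a closed control loop of this kind. It is a minimal sketch of one step of a fixed-point PI controller, meant to be called at a fixed rate (typically from a timer interrupt). The Q8 scaling, the field names, and the simple anti-windup clamp are illustrative assumptions. The arithmetic also assumes that the errors and gains are small enough for the 32-bit products not to overflow.

```c
#include <assert.h>
#include <stdint.h>

/* One step of a discrete PI controller, called at a fixed rate. Integer
   maths only, so execution time is short and predictable. Gains are Q8
   fixed point (256 == 1.0). */
typedef struct {
    int32_t kp_q8;      /* proportional gain, Q8 */
    int32_t ki_q8;      /* integral gain per step, Q8 */
    int32_t integ;      /* accumulated integral term, output units in Q8 */
    int32_t out_min;    /* actuator limits */
    int32_t out_max;
} pi_ctrl;

static int32_t clamp(int32_t v, int32_t lo, int32_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

int32_t pi_step(pi_ctrl *c, int32_t setpoint, int32_t measured)
{
    int32_t err = setpoint - measured;

    /* Integrate, clamping the integrator to the output range (simple
       anti-windup: it cannot grow without bound while saturated). */
    c->integ = clamp(c->integ + c->ki_q8 * err,
                     c->out_min * 256, c->out_max * 256);

    int32_t out_q8 = c->kp_q8 * err + c->integ;
    return clamp(out_q8 / 256, c->out_min, c->out_max);
}
```

With kp_q8 = 256 (a gain of 1.0) and ki_q8 = 0, the output is just the error clamped to the actuator limits. With a pure integral gain of 128 (0.5 per step) and a constant error of 10, the output rises 5, 10, 15, and so on, until it reaches the limit.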

Of course, nowadays most of these things are no longer done on
general-purpose CPUs or even MCUs.

I think you have got that backwards.

Most engine control /is/ done with general purpose
microcontrollers, or at least specific variants of them. They
will use ARM Cortex-R or Cortex-M cores rather than Cortex-A cores
(i.e., the "real-time" cores or "microcontroller" cores rather
than the "application" cores you see in telephones, Macs, and ARM
servers), but they are standard cores. Another common choice is
the PowerPC cores used in NXP's engine controllers.

It used to be the case that engine control and other critical hard
real-time work was done with DSPs or FPGAs, but those days are long
past.

Are you sure?

Pretty sure, yes.

It's much simpler and far more reliable to do such a task with a $5 PLD
(which today means an FPGA that boots from internal flash, rather than
the PLDs of the old days) than with an MCU, regardless of the price of the MCU.

No, it is not simpler or more reliable. Programmable logic is rarely
used for engine or motor control. You use microcontrollers with
appropriate peripherals, such as sophisticated PWM units and encoder interfaces, and advanced timers.

I was not talking about electric motors.

Even if the MCU is $4.99 cheaper, the difference is noise relative
to the price of an engine.

That part is true.

and bare-metal hypervisors.

It is hard to believe that you don't have at least one co-worker
who begs to switch all new development to C approximately
every week. And a couple of folks who beg for Rust.

It's possible that he has newbies amongst his co-workers, yes.

Well, Linus is not on his team, but if he was, he would say the same
thing. But probably at much higher rate than weekly.

Yes, but Linus Torvalds knows shit about C++. He knows a lot about
C, and many other things.

He also - not unreasonably - believes that if C++ was used in the
Linux kernel, lots of others who know nothing about using C++ in OS's
and low-level work would make a complete mess of things. You don't
want someone to randomly add std::vector<> or the like into kernel
code. You don't want people who take delight in smart-arse coding,
such as some regulars in c.l.c++, anywhere near the kernel.

Or maybe he understands that, for a kernel, the proclaimed advantages of
C++ do not matter, or matter too little, while the disadvantage, that it
is harder to see quickly what's going on, is real.

It is interesting to mention that the experienced 46-year-old Dave Cutler
and the young student Linus Torvalds independently came to the same
conclusion w.r.t. kernel language choice. That is despite Cutler's employer
being very C++-oriented at that moment, and despite the decisions being
taken during the peak years of OO hype.
Unlike Torvalds, Cutler was not in a position to fully prohibit
development of third-party kernel modules in C++, but he did his best to
discourage this practice.

But other OS's are not the Linux kernel - it has particularly unique challenges. If you have an appropriate team, C++ is vastly better
for writing RTOS kernels than C.

I find your statement unproven.
How many surviving and proliferating RTOS kernels are written in
each language?


From bart@21:1/5 to Lawrence D'Oliveiro on Mon Jun 3 11:13:32 2024

On 03/06/2024 02:16, Lawrence D'Oliveiro wrote:

On Sun, 2 Jun 2024 10:37:55 +0100, bart wrote:

On 02/06/2024 04:27, Lawrence D'Oliveiro wrote:

On Sat, 1 Jun 2024 11:37:45 +0100, bart wrote:

My compilers don't routinely generate object files, which would also
need an external dependency (a linker), but they can do if necessary
(eg. to statically link my code into another program with another
compiler).

Modular code design would indicate that there is no point the compiler
duplicating functionality available in the linker.

Python uses modules and yet doesn't have a linker.

What is importlib, then, if not something that links everything together?

It seems to provide an API to the mechanisms behind 'import'.

And guess what: it’s a module.

So, you use it like this:

import importlib

maybe? So how does importlib manage to import importlib before importlib
itself is imported?

There is NO ahead-of-time linking of modules in Python as it is
understood in traditional compiled languages.

Besides, all such statements are executed at runtime, and can be
conditional.

There are of course mechanisms to collate symbols across different
modules, which are executed at runtime and on demand. There are
similarities to the methods used to maintain the global symbol table in
my whole-program compilers, or in my assembler.
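Here is a toy C sketch of the kind of runtime symbol collation described above. "Modules" register named entry points when they initialise, and callers resolve names on demand, with no separate link step. Every name here is hypothetical. This is only an analogy. It is not how CPython's import machinery, or bart's compilers, are actually implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy global symbol table: entries are added at run time and looked up by
   name on demand. All names are hypothetical, for illustration only. */
typedef int (*symbol_fn)(int);

#define MAX_SYMBOLS 16

static struct {
    const char *name;
    symbol_fn   fn;
} symtab[MAX_SYMBOLS];
static size_t nsyms;

int sym_register(const char *name, symbol_fn fn)
{
    if (nsyms == MAX_SYMBOLS)
        return -1;                    /* table full */
    symtab[nsyms].name = name;
    symtab[nsyms].fn = fn;
    nsyms++;
    return 0;
}

symbol_fn sym_lookup(const char *name)   /* NULL if not (yet) defined */
{
    for (size_t i = 0; i < nsyms; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return symtab[i].fn;
    return NULL;
}

/* Two example "module" functions and the module's init hook. */
static int twice(int x)  { return 2 * x; }
static int square(int x) { return x * x; }

void demo_module_init(void)
{
    sym_register("twice", twice);
    sym_register("square", square);
}
```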


From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Mon Jun 3 14:16:41 2024

Lawrence D'Oliveiro <[email protected]d> writes:

On Sun, 2 Jun 2024 11:02:13 +0300, Michael S wrote:

Engine control is FAR more real-time than an OS, to list just one example
out of many.

Speaking of (internal-combustion) engines, I wondered when we can get to
the point where the controller is operating down at the level of
individual spark plug activations and valve openings -- getting rid of
cams and timing belts, in other words.

It's pretty clear that the ICE is becoming a dinosaur.

And no, I don't see the engine controller working at that level;
one wonders how it would open the valves without the mechanical
linkages (individual actuators? $$$).

With that level of control, could you get down to an idling speed of 0
rpm? That is, could the engine get itself going from absolute rest?

Tesla.


From David Brown@21:1/5 to Michael S on Mon Jun 3 18:34:16 2024

On 03/06/2024 11:00, Michael S wrote:

On Sun, 2 Jun 2024 21:44:01 +0200
David Brown <[email protected]> wrote:

On 02/06/2024 15:29, Michael S wrote:

On Sun, 2 Jun 2024 14:03:30 +0200
David Brown <[email protected]> wrote:

There is a lot of networking that is not TCP/IP.

High-speed network interfaces are used for two purposes - to get high
throughput, or to get low latencies. Throughput is not as sensitive
to timing and can tolerate some variation as long as the traffic is
independent, but latency is a different matter.

I think nearly all work in high-end NICs is concentrated on throughput.
For low latency, the best you can do with a high-end NIC is to disable
all high-end features and to hope that in the disabled state they do not
hurt you too badly.
It would probably be better to use a specialized "dumb" NIC. I don't know
if such things exist, but considering that high-frequency trading is
still legal (IMHO, it shouldn't be) I would guess that they do.

I think Scott can answer the high-end NIC questions a lot better than I
could.

Hard real time is about closed loops, most often closed control
loops, but not only those.

Of course, nowadays most of these things are no longer done on
general-purpose CPUs or even MCUs.

I think you have got that backwards.

Most engine control /is/ done with general purpose
microcontrollers, or at least specific variants of them. They
will use ARM Cortex-R or Cortex-M cores rather than Cortex-A cores
(i.e., the "real-time" cores or "microcontroller" cores rather
than the "application" cores you see in telephones, Macs, and ARM
servers), but they are standard cores. Another common choice is
the PowerPC cores used in NXP's engine controllers.

It used to be the case that engine control and other critical hard
real-time work was done with DSPs or FPGAs, but those days are long
past.

Are you sure?

Pretty sure, yes.

It's much simpler and far more reliable to do such a task with a $5 PLD
(which today means an FPGA that boots from internal flash, rather than
the PLDs of the old days) than with an MCU, regardless of the price of the MCU.

No, it is not simpler or more reliable. Programmable logic is rarely
used for engine or motor control. You use microcontrollers with
appropriate peripherals, such as sophisticated PWM units and encoder
interfaces, and advanced timers.

I was not talking about electric motors.

Petrol and diesel engines have far less demanding requirements for the
timing of their control systems. The fastest control loops you need to
control them are a fraction of the speed of those used for high-end
electric motor control, and the corresponding acceptable jitter levels
are much less fussy. And they are invariably controlled by
microcontrollers, and have been for decades. (The microcontrollers you
use typically have some specialised timing peripherals.)

However, I would not be surprised to see programmable logic in the
controllers for jet engines, if that is what you are talking about. The markets there are too small, and the control details too different
between different models, for there to be microcontrollers with
jet-engine peripherals. But regardless, it is all still hierarchical in
the same way, with a RTOS and real-time software tasks sitting above the dedicated hardware and below the high-level control software.

Well, Linus is not on his team, but if he was, he would say the same
thing. But probably at much higher rate than weekly.

Yes, but Linus Torvalds knows shit about C++. He knows a lot about
C, and many other things.

He also - not unreasonably - believes that if C++ was used in the
Linux kernel, lots of others who know nothing about using C++ in OS's
and low-level work would make a complete mess of things. You don't
want someone to randomly add std::vector<> or the like into kernel
code. You don't want people who take delight in smart-arse coding,
such as some regulars in c.l.c++, anywhere near the kernel.

Or maybe he understands that, for a kernel, the proclaimed advantages of
C++ do not matter, or matter too little, while the disadvantage, that it
is harder to see quickly what's going on, is real.

It is interesting to mention that the experienced 46-year-old Dave Cutler
and the young student Linus Torvalds independently came to the same
conclusion w.r.t. kernel language choice.

You /do/ understand that these decisions were made some 30 years ago?
The languages, developers, compilers, targets, and many other things
have changed in that time.

That is despite Cutler's employer being
very C++-oriented at that moment, and despite the decisions being
taken during the peak years of OO hype.
Unlike Torvalds, Cutler was not in a position to fully prohibit
development of third-party kernel modules in C++, but he did his best to
discourage this practice.

But other OS's are not the Linux kernel - it has particularly unique
challenges. If you have an appropriate team, C++ is vastly better
for writing RTOS kernels than C.

I find your statement unproven.
How many surviving and proliferating RTOS kernels are written in
each language?

Oh, there's little doubt that most publicly available RTOS kernels are
in C, not C++. That does not mean C is in any way /better/ for the
task. There are multiple reasons for C being the language of choice here:

1. Most well-known RTOS kernels have a history stretching back to the
previous century. C++ was not nearly as viable an option at that time,
for a great many reasons.

2. If you write your kernel in C++, you pretty much have to use C++ for
the application code unless you also write a C API for it. If you write
your kernel in C, you can use almost any language for the application code.

3. Most well-known RTOS's are for microcontrollers, often including
small CISC devices and other microcontrollers for which toolchain
support was traditionally poor, expensive, and barely classifiable as
C90 never mind C++ or even C99. If you want to support these devices,
C90 is the only way to go.

4. There is a bizarre attitude in a lot of the embedded world that "ANSI
C" (meaning C89/C90) is somehow magical and "the standard". Marketing departments see it as a "feature" that the code is written to this long out-dated and inferior language standard.

5. Lots of embedded programmers are not great programmers, or not
educated as programmers - they are hardware or electronics engineers
that have moved into software. C90 is often all they know, and
certainly they have never learned more than basic C++.

So there are plenty of reasons why C (especially C90) is dominant in
RTOS's. Note that none of these are technical reasons - C90 is never
chosen because it is a /better/ language than C++ (or even C99). It is
chosen /despite/ being a weaker language. (Some non-technical reasons
can be good arguments, of course - but in most cases they are not.)

After all, there is virtually nothing that you can write in C90 that you
cannot use directly in modern C++. Barring a few cases where you need
casts in C++ but not in C (and such casts are typically mandated by
embedded coding standards anyway), you can compile the same code as C++.
Since you can do almost everything with modern C++ that you can with
C90 or C99, and you can do vastly more with modern C++ - resulting in
/much/ safer coding, as long as the programmers are competent - it is
obvious that an appropriately restricted subset of C++ is a technically
better choice of language.
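To illustrate the point about the common subset, here is a minimal sketch of a function that compiles unchanged as either C or C++. The only concession to C++ is the explicit cast from const void *, which C performs implicitly. The function itself, a simple byte checksum, is just a hypothetical example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum the bytes of an arbitrary object. Written in the common subset of
   C and C++, so the same source builds with either compiler. */
uint32_t checksum(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data; /* cast required in C++, optional in C */
    uint32_t sum = 0;
    while (len--)
        sum += *p++;
    return sum;
}
```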


From Scott Lurndal@21:1/5 to David Brown on Mon Jun 3 16:50:50 2024

David Brown <[email protected]> writes:

On 03/06/2024 11:00, Michael S wrote:

On Sun, 2 Jun 2024 21:44:01 +0200
David Brown <[email protected]> wrote:

On 02/06/2024 15:29, Michael S wrote:

On Sun, 2 Jun 2024 14:03:30 +0200
David Brown <[email protected]> wrote:

There is a lot of networking that is not TCP/IP.

High-speed network interfaces are used for two purposes - to get high
throughput, or to get low latencies. Throughput is not as sensitive
to timing and can tolerate some variation as long as the traffic is
independent, but latency is a different matter.

I think nearly all work in high-end NICs is concentrated on throughput.
For low latency, the best you can do with a high-end NIC is to disable
all high-end features and to hope that in the disabled state they do not
hurt you too badly.
It would probably be better to use a specialized "dumb" NIC. I don't know
if such things exist, but considering that high-frequency trading is
still legal (IMHO, it shouldn't be) I would guess that they do.

I think Scott can answer the high-end NIC questions a lot better than I
could.

Hard real time is about closed loops, most often closed control
loops, but not only those.

Of course, nowadays most of these things are no longer done on
general-purpose CPUs or even MCUs.

I think you have got that backwards.

Most engine control /is/ done with general purpose
microcontrollers, or at least specific variants of them. They
will use ARM Cortex-R or Cortex-M cores rather than Cortex-A cores
(i.e., the "real-time" cores or "microcontroller" cores rather
than the "application" cores you see in telephones, Macs, and ARM
servers), but they are standard cores. Another common choice is
the PowerPC cores used in NXP's engine controllers.

It used to be the case that engine control and other critical hard
real-time work was done with DSPs or FPGAs, but those days are long
past.

Are you sure?

Pretty sure, yes.

It's much simpler and far more reliable to do such a task with a $5 PLD
(which today means an FPGA that boots from internal flash, rather than
the PLDs of the old days) than with an MCU, regardless of the price of the MCU.

No, it is not simpler or more reliable. Programmable logic is rarely
used for engine or motor control. You use microcontrollers with
appropriate peripherals, such as sophisticated PWM units and encoder
interfaces, and advanced timers.

I was not talking about electric motors.

Petrol and diesel engines have far less demanding requirements for the
timing of their control systems. The fastest control loops you need to
control them are a fraction of the speed of those used for high-end
electric motor control, and the corresponding acceptable jitter levels
are much less fussy. And they are invariably controlled by
microcontrollers, and have been for decades. (The microcontrollers you
use typically have some specialised timing peripherals.)

However, I would not be surprised to see programmable logic in the
controllers for jet engines, if that is what you are talking about. The
markets there are too small, and the control details too different
between different models, for there to be microcontrollers with
jet-engine peripherals. But regardless, it is all still hierarchical in
the same way, with a RTOS and real-time software tasks sitting above the
dedicated hardware and below the high-level control software.

https://en.wikipedia.org/wiki/FADEC

It is interesting to mention that the experienced 46-year-old Dave Cutler
and the young student Linus Torvalds independently came to the same
conclusion w.r.t. kernel language choice.

You /do/ understand that these decisions were made some 30 years ago?

And it's not particularly interesting, either.

Cutler would have been happy with Macro-32 and Bliss-32 if he
could have wrestled them from DEC.

Linus has always had significant misconceptions about C++
(as do many who think C++ is defined by the standard C++ library).

The languages, developers, compilers, targets, and many other things
have changed in that time.

We were writing a large Unix-compatible operating system in C++
before Linus released the first Linux.

That is despite Cutler's employer being
very C++-oriented at that moment, and despite the decisions being
taken during the peak years of OO hype.
Unlike Torvalds, Cutler was not in a position to fully prohibit
development of third-party kernel modules in C++, but he did his best to
discourage this practice.

But other OS's are not the Linux kernel - it has particularly unique
challenges. If you have an appropriate team, C++ is vastly better
for writing RTOS kernels than C.

I find your statement unproven.
How many surviving and proliferating RTOS kernels are written in
each language?

Oh, there's little doubt that most publicly available RTOS kernels are
in C, not C++. That does not mean C is in any way /better/ for the
task. There are multiple reasons for C being the language of choice here:

1. Most well-known RTOS kernels have a history stretching back to the
previous century. C++ was not nearly as viable an option at that time,
for a great many reasons.

I would disagree with this. The Chorus microkernel (Chorus Systemes,
later purchased by Sun) was started in the late 1980's and was
written in C++ (with a small set of assembler functions). This was
using Cfront (2.1 and later 3.0). I'm pretty sure it is still in
use. This was long before templates, exceptions or the standard library.

2. If you write your kernel in C++, you pretty much have to use C++ for
the application code unless you also write a C API for it.

Clearly one can use C interfaces from C++ code. And one can develop
C++ wrappers around C-type functionality.
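Here is a minimal sketch of the usual way a C interface is made callable from C++. The declarations are wrapped in extern "C" guards, so that they get C linkage when the header is compiled as C++. The API name rtos_task_create and its behaviour are hypothetical stand-ins, not any real kernel's interface. The toy implementation simply runs the entry point once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* --- What would normally live in a kernel header. The guards give the
   function C linkage when included from C++, so C++ application code can
   call a C kernel (and a C++ kernel could export it the same way). --- */
#ifdef __cplusplus
extern "C" {
#endif

int32_t rtos_task_create(void (*entry)(void *), void *arg, uint8_t priority);

#ifdef __cplusplus
}
#endif

/* --- Toy implementation side, not a real scheduler. --- */
static uint8_t task_count;

int32_t rtos_task_create(void (*entry)(void *), void *arg, uint8_t priority)
{
    (void)priority;           /* a real kernel would queue the task */
    if (entry == NULL)
        return -1;
    entry(arg);               /* toy: run the entry point once, immediately */
    return task_count++;      /* hypothetical task id */
}

/* Example task entry point: increments the counter passed as arg. */
static void example_task(void *arg)
{
    int *counter = (int *)arg;
    ++*counter;
}
```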

Our C++ kernels supported standard unix-style APIs between user
mode software and the kernel.

If you write
your kernel in C, you can use almost any language for the application code.

If you write your kernel in _any_ language, you can use _any_ language
for the application code, or the kernel isn't much use to anyone.


From David Brown@21:1/5 to Scott Lurndal on Mon Jun 3 21:05:17 2024

On 03/06/2024 18:50, Scott Lurndal wrote:

David Brown <[email protected]> writes:

On 03/06/2024 11:00, Michael S wrote:

On Sun, 2 Jun 2024 21:44:01 +0200
David Brown <[email protected]> wrote:

Oh, there's little doubt that most publicly available RTOS kernels are
in C, not C++. That does not mean C is in any way /better/ for the
task. There are multiple reasons for C being the language of choice here:
1. Most well-known RTOS kernels have a history stretching back to the
previous century. C++ was not nearly as viable an option at that time,
for a great many reasons.

I would disagree with this. The Chorus microkernel (Chorus Systemes,
later purchased by Sun) was started in the late 1980's and was
written in C++ (with a small set of assembler functions). This was
using Cfront (2.1 and later 3.0). I'm pretty sure it is still in
use. This was long before templates, exceptions or the standard library.

C++ was viable for the kind of systems you were working with (clearly
that is true, since you worked on an OS written in C++ at that time).

I have been specifically referring to "well-known" RTOS's - the sort
that would have a Wikipedia page, or whose name has a chance of being recognised by many embedded programmers. (I realise this is a very
vague and subjective classification.) I am quite confident that the
majority of RTOS's ever written are proprietary, with little if any
public information. Some of these will be written in C, others in
Assembly, C++, Ada, and perhaps other languages. I think the share of
C++ in these will be a lot higher than in more commonly used RTOS's,
because the team involved in developing and using them will be smaller
and more controlled, negating many of the reasons for using C.

2. If you write your kernel in C++, you pretty much have to use C++ for
the application code unless you also write a C API for it.

Clearly one can use C interfaces from C++ code. And one can develop
C++ wrappers around C-type functionality.

Our C++ kernels supported standard unix-style APIs between user
mode software and the kernel.

If you write
your kernel in C, you can use almost any language for the application code.

If you write your kernel in _any_ language, you can use _any_ language
for the application code, or the kernel isn't much use to anyone.

Many - I think most - RTOS's are linked as libraries, rather than
separately linked applications.


From David Brown@21:1/5 to Lawrence D'Oliveiro on Mon Jun 3 21:14:08 2024

On 03/06/2024 05:21, Lawrence D'Oliveiro wrote:

On Sun, 2 Jun 2024 11:02:13 +0300, Michael S wrote:

Engine control is FAR more real-time than an OS, to list just one example
out of many.

Speaking of (internal-combustion) engines, I wondered when we can get to
the point where the controller is operating down at the level of
individual spark plug activations and valve openings -- getting rid of
cams and timing belts, in other words.

You need the mechanics for the valves anyway, so cam shafts are
convenient. But AFAIK everything that can be tunable timing is
controlled by software (microcontrollers with advanced timer
peripherals) these days, and some engines use electric solenoids for the valves.
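Here is an example of the arithmetic behind software-scheduled ignition timing: a sketch that converts a crank angle into a timer delay. At N rpm the crankshaft turns 6N degrees per second, so one degree lasts 1e6 / (6N) microseconds. At 6000 rpm that is about 27.8 microseconds. The function name and the use of tenths of a degree as the unit are assumptions for illustration. Real controllers lean on the specialised timer peripherals mentioned above rather than a plain division like this.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a crank angle (tenths of a degree) into a delay in microseconds
   at a given engine speed. At N rpm the crank turns 6*N degrees per second,
   so 0.1 degree lasts 1e6 / (60*N) microseconds. A 64-bit intermediate
   avoids overflow; the result is truncated toward zero. */
uint32_t crank_angle_to_us(uint32_t angle_tenths_deg, uint32_t rpm)
{
    if (rpm == 0)
        return UINT32_MAX;   /* engine stopped: no meaningful delay */
    return (uint32_t)(((uint64_t)angle_tenths_deg * 1000000u)
                      / (60u * (uint64_t)rpm));
}
```

For example, a full revolution (3600 tenths) at 6000 rpm gives 10000 microseconds, i.e. 10 ms, as expected for 100 revolutions per second.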

But I can't claim to know much about how these work. I know plenty
about some of the microcontrollers used in the automotive industry,
including engine controllers, because we use some of these kinds of
devices (for other purposes).

With that level of control, could you get down to an idling speed of 0
rpm? That is, could the engine get itself going from absolute rest?

There are physical limitations that make it difficult to idle an ICE at
very low revs - it makes more sense to simply stop the engine when it is
not needed.


From Scott Lurndal@21:1/5 to David Brown on Mon Jun 3 19:38:14 2024

David Brown <[email protected]> writes:

On 03/06/2024 18:50, Scott Lurndal wrote:

If you write your kernel in C, you can use almost any language for the application code.

If you write your kernel in _any_ language, you can use _any_ language
for the application code, or the kernel isn't much use to anyone.

Many - I think most - RTOS's are linked as libraries, rather than
separately linked applications.

Some, I suspect, load code from some form of flash dynamically.

Regardless, interlanguage linking has been available for half a century.


From Michael S@21:1/5 to Scott Lurndal on Mon Jun 3 22:58:56 2024

On Mon, 03 Jun 2024 16:50:50 GMT
[email protected] (Scott Lurndal) wrote:

David Brown <[email protected]> writes:

1. Most well-known RTOS kernels have a history stretching back to
the previous century. C++ was not nearly as viable an option at
that time, for a great many reasons.

I would disagree with this. The Chorus microkernel (Chorus Systemes,
later purchased by Sun) was started in the late 1980's and was
written in C++ (with a small set of assembler functions). This was
using Cfront (2.1 and later 3.0). I'm pretty sure it is still in
use. This was long before templates, exceptions or the standard
library.

If Chorus is your idea of well-known then I wonder what you call
obscure.


From Tim Rentsch@21:1/5 to Lew Pitcher on Mon Jun 3 13:29:25 2024

Lew Pitcher <[email protected]> writes:

On Sun, 02 Jun 2024 13:24:23 +0000, Kenny McCormack wrote:

In article <v3gou9$36n61$[email protected]>,
Lawrence D'Oliveiro <[email protected]d> wrote:

On Fri, 31 May 2024 17:55:13 -0500, Lynn McGuire wrote:

while (1)

Why not

while (true)

or even

for (;;)

?

I've always considered
for (;;)
preferable over
while (1)
as the for (;;) expression does not require the compiler to expand
and evaluate a condition expression.

For the for (;;), the compiler sees the token stream <LPAREN>
<SEMICOLON> <SEMICOLON> <RPAREN>, and emits a closed loop, but
with while (1), the compiler sees <LPAREN> <CONSTANT> <RPAREN>,

But the 'for (;;)' tokens need to be matched to a much more
complicated syntax, with three optional expressions (one of
which might be a declaration) before assigning semantics.
There is actually a lot more to do when 'for (;;)' is used.

and has to evaluate (either at compile time or at execution
time) the value of the <CONSTANT> to determine whether or
not to emit the closed loop logic.

Both gcc and clang turn 'while (1)' into simple loops even
under -O0. So it can't be that hard.
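Whichever spelling one prefers, the two loops mean exactly the same thing. C11 6.8.5.3 says that an omitted controlling expression in a for statement is replaced by a nonzero constant. Below is a minimal sketch with both forms side by side (the function names are hypothetical). Comparing the output of something like gcc -S -O0 for the two functions is an easy way to check the point about the generated code.

```c
#include <assert.h>

/* Two spellings of an unconditional loop with identical semantics:
   the only way out of either is break (or return). */
int count_with_for(int limit)
{
    int n = 0;
    for (;;) {           /* omitted condition == nonzero constant */
        if (n == limit)
            break;
        n++;
    }
    return n;
}

int count_with_while(int limit)
{
    int n = 0;
    while (1) {          /* while (true) is the same with <stdbool.h> or C23 */
        if (n == limit)
            break;
        n++;
    }
    return n;
}
```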


From Tim Rentsch@21:1/5 to Scott Lurndal on Mon Jun 3 13:23:37 2024

[email protected] (Scott Lurndal) writes:

[ ... (internal-combustion) engines, ... ]

It's pretty clear that the ICE is becoming a dinosaur.

Kind of makes it full circle, doesn't it? ;)


From Tim Rentsch@21:1/5 to Kaz Kylheku on Mon Jun 3 13:31:38 2024

Kaz Kylheku <[email protected]> writes:

On 2024-06-02, Lew Pitcher <[email protected]> wrote:

I've always considered
for (;;)
preferable over
while (1)

Of course it is preferable. The idiom constitutes the language's direct support for unconditional looping, not requiring that to be requested by
an extraneous always-true expression.

Using while (1) or while (true) is like i = i + 1 instead
of ++i, or while (*dst++ = *src++); instead of strcpy. [...]

Using for (;;) for an infinite loop is an abomination. Anyone
who advocates following that rule is an instrument of Satan.

From Scott Lurndal@21:1/5 to Michael S on Mon Jun 3 21:22:07 2024

Michael S <[email protected]> writes:

On Mon, 03 Jun 2024 16:50:50 GMT
[email protected] (Scott Lurndal) wrote:

David Brown <[email protected]> writes:

1. Most well-known RTOS kernels have a history stretching back to
the previous century. C++ was not nearly as viable an option at
that time, for a great many reasons.

I would disagree with this. The Chorus microkernel (Chorus Systemes,
later purchased by Sun) was started in the late 1980's and was
written in C++ (with a small set of assembler functions). This was
using Cfront (2.1 and later 3.0). I'm pretty sure it is still in
use. This was long before templates, exceptions or the standard
library.

If Chorus is your idea of well-known then I wonder what you call
obscure.

I was, of course, addressing the second sentence in David's point (1).

At the time, in the OS research community, Chorus was, indeed well-known.

From Kenny McCormack@21:1/5 to Chris M. Thomasson on Mon Jun 3 21:48:23 2024

In article <v3lb0u$2452$[email protected]>,
Chris M. Thomasson <[email protected]> wrote:

On 6/3/2024 1:31 PM, Tim Rentsch wrote:

Kaz Kylheku <[email protected]> writes:

On 2024-06-02, Lew Pitcher <[email protected]> wrote:

I've always considered
for (;;)
preferable over
while (1)

Of course it is preferable. The idiom constitutes the language's direct support for unconditional looping, not requiring that to be requested by an extraneous always-true expression.

Using while (1) or while (true) is like i = i + 1 instead
of ++i, or while (*dst++ = *src++); instead of strcpy. [...]

Using for (;;) for an infinite loop is an abomination. Anyone
who advocates following that rule is an instrument of Satan.

Better than goto? ;^D

I can't believe we're still having this conversation.

Surely, on any reasonably modern compiler, all three forms will generate exactly the same code.

--
You are again heaping damnation upon your own head by your statements.

- Rick C Hodgin -

From bart@21:1/5 to Kaz Kylheku on Mon Jun 3 23:43:00 2024

On 02/06/2024 20:52, Kaz Kylheku wrote:

On 2024-06-02, Lew Pitcher <[email protected]> wrote:

I've always considered
for (;;)
preferable over
while (1)

Of course it is preferable. The idiom constitutes the language's direct support for unconditional looping, not requiring that to be requested by
an extraneous always-true expression.

Using while (1) or while (true) is like i = i + 1 instead
of ++i, or while (*dst++ = *src++); instead of strcpy.

When Dennis Ritchie (if it was indeed he) chose for to be the construct
in which the guard expression may be omitted, so that it may express unconditional looping, he expressed the intent that it be henceforth used
for that purpose.

To continue to use while (1) after the proper utensil is provided is
like to eat with your hands instead of a fork.

To me they're both an abomination.

I classify loops like this: Endless, Repeat-N-times, While, Iteration
(over ranges or values).

Few languages have a special form for the first two, but mine always
have done.

'while' is no good for endless loops because there is no actual
condition to check. You have to provide one just for it to be eliminated.

'for' is even worse because there is no sort of iteration going on.
Actually, somebody could write a loop like this:

for(int i=0;;++i)

Is that an endless loop or not? Like every other for-loop, you have to
analyse it to deduce the intent. Here, you can't even be sure that the
empty condition wasn't an oversight.

At this point someone will suggest a macro like this:

#define forever for(;;)

All that suggests to me is that the language *needs* an explicit endless
loop!

From Lawrence D'Oliveiro@21:1/5 to bart on Tue Jun 4 02:12:52 2024

On Mon, 3 Jun 2024 11:13:32 +0100, bart wrote:

So how does importlib manage to import importlib before importlib
itself is imported?

I guess the same way a linker manages to link itself.

There is NO ahead-of-time linking of modules in Python as it is
understood in traditional compiled languages.

Python is a compiled language.

Besides, all such statements are executed at runtime, and can be
conditional.

It’s not the only object-code language with that property.

From Lawrence D'Oliveiro@21:1/5 to Michael S on Tue Jun 4 02:10:55 2024

On Mon, 3 Jun 2024 11:16:15 +0300, Michael S wrote:

When all sources are available, linker is merely an implementation
detail.

That’s assuming all the code is written in the same language, compilable
with the same compiler.

For typical non-trivial open-source projects, this is usually not true.

And consider, even with C, the meaning of top-level “static” and the implications for compiling the source in separate pieces versus all at
once.

Even in the old days of small RAMs, the super-popular Turbo Pascal suite had
modules, but I don't think that it had linker.

The programs it built had sizes in, say, the tens of thousands of lines at most.

From Lawrence D'Oliveiro@21:1/5 to bart on Tue Jun 4 02:20:43 2024

On Mon, 3 Jun 2024 23:43:00 +0100, bart wrote:

All that suggests to me is that the language *needs* an explicit endless loop!

I agree. Also it is common for a loop to have multiple exits, and I don’t like treating one of them as a special “termination condition” above the others, so I like to use “break” for all of them.

The “for” form not only caters for this, it allows handy initialization of local variables that keep their value between loop iterations. E.g.

for (unsigned int i = length_of(array);;)
{
    if (i == 0)
    {
        ... not found ...
        break;
    } /*if*/
    --i;
    if (... array[i] matches what I want ...)
    {
        ... found ...
        break;
    } /*if*/
} /*for*/

From Lawrence D'Oliveiro@21:1/5 to bart on Tue Jun 4 04:00:58 2024

On Sat, 1 Jun 2024 19:59:19 +0100, bart wrote:

This version does binary/text to COFF only.

objcopy uses BFD to handle all the object-format details; why not do the
same?

From Kaz Kylheku@21:1/5 to Michael S on Tue Jun 4 05:12:29 2024

On 2024-06-03, Michael S <[email protected]> wrote:

On Mon, 03 Jun 2024 16:50:50 GMT
[email protected] (Scott Lurndal) wrote:

David Brown <[email protected]> writes:

1. Most well-known RTOS kernels have a history stretching back to
the previous century. C++ was not nearly as viable an option at
that time, for a great many reasons.

I would disagree with this. The Chorus microkernel (Chorus Systemes,
later purchased by Sun) was started in the late 1980's and was
written in C++ (with a small set of assembler functions). This was
using Cfront (2.1 and later 3.0). I'm pretty sure it is still in
use. This was long before templates, exceptions or the standard
library.

If Chorus is your idea of well-known then I wonder what you call
obscure.

I also know about Chorus. However, not from actual work exposure to it.
I remember it from operating systems courses at school; i.e. academia.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @[email protected]

From Kaz Kylheku@21:1/5 to Scott Lurndal on Tue Jun 4 05:17:13 2024

On 2024-06-03, Scott Lurndal <[email protected]> wrote:

At the time, in the OS research community, Chorus was, indeed well-known.

If Chorus at least doesn't vaguely ring a bell, you must have your head
up your ass as even a bachelor-level computer scientist.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @[email protected]

From Kaz Kylheku@21:1/5 to bart on Tue Jun 4 05:25:22 2024

On 2024-06-03, bart <[email protected]> wrote:

Actually, somebody could write a loop like this:

for(int i=0;;++i)

Is that an endless loop or not?

Some compilers may recognize it as an assertion which says "this code shall
not be reached".

At this point someone will suggest a macro like this:

#define forever for(;;)

All that suggests to me is that the language *needs* an explicit endless loop!

Nope!

#define ev
#define e
#define r

for(ev;e;r) ...

:)

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @[email protected]

From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Tue Jun 4 06:55:04 2024

On Mon, 03 Jun 2024 16:50:50 GMT, Scott Lurndal wrote:

We were writing a large unix compatible operating system in C++
before Linus released the first Linux.

Where is that now?

From Michael S@21:1/5 to Kaz Kylheku on Tue Jun 4 11:23:56 2024

On Tue, 4 Jun 2024 05:17:13 -0000 (UTC)
Kaz Kylheku <[email protected]> wrote:

On 2024-06-03, Scott Lurndal <[email protected]> wrote:

At the time, in the OS research community, Chorus was, indeed
well-known.

If Chorus at least doesn't vaguely ring a bell, you must have your
head up your ass as even a bachelor-level computer scientist.

The closest I came to a CS department was, maybe, passing by it in the same building. But even that I'm unsure of.

Chorus certainly was never mentioned as a candidate kernel for any real
project when I was in the room.

The majority of the stuff that was mentioned or used around me in the 90s was
from little guys, like MTOS, pSoS, VxWorks (later acquired by a big guy).
The only two that I remember from big guys were iRMX/iRMK and VAXEln.
W.r.t. uKernels, back then I heard about QNX, but not in the context
of a proposal to use it in our own project.

From David Brown@21:1/5 to Kaz Kylheku on Tue Jun 4 10:25:32 2024

On 04/06/2024 07:17, Kaz Kylheku wrote:

On 2024-06-03, Scott Lurndal <[email protected]> wrote:

At the time, in the OS research community, Chorus was, indeed well-known.

If Chorus at least doesn't vaguely ring a bell, you must have your head
up your ass as even a bachelor-level computer scientist.

I think that is putting it a bit strongly - it is a /long/ time since
Chorus was relevant even in academic circles. And while it was
influential, I don't know that it was ever widely used (Scott will know
more about that, I guess).

Maybe it would be discussed in some computer science degrees, if you go
back long enough and had detailed enough courses in operating systems or perhaps computing history. I don't remember it ever turning up in
my courses, some 30-odd years ago, but I don't remember /all/ the
details from all my courses! I knew about it mainly because I am
interested in OS's, and history, and spend far too much time on
Wikipedia and countless technical sites - not because it was on my
syllabus at university.

From David Brown@21:1/5 to Kenny McCormack on Tue Jun 4 10:36:00 2024

On 03/06/2024 23:48, Kenny McCormack wrote:

In article <v3lb0u$2452$[email protected]>,
Chris M. Thomasson <[email protected]> wrote:

On 6/3/2024 1:31 PM, Tim Rentsch wrote:

Kaz Kylheku <[email protected]> writes:

On 2024-06-02, Lew Pitcher <[email protected]> wrote:

I've always considered
for (;;)
preferable over
while (1)

Of course it is preferable. The idiom constitutes the language's direct support for unconditional looping, not requiring that to be requested by an extraneous always-true expression.

Using while (1) or while (true) is like i = i + 1 instead
of ++i, or while (*dst++ = *src++); instead of strcpy. [...]

Using for (;;) for an infinite loop is an abomination. Anyone
who advocates following that rule is an instrument of Satan.

Better than goto? ;^D

I can't believe we're still having this conversation.

Surely, on any reasonably modern compiler, all three forms will generate exactly the same code.

I would think so, yes. (I've used toolchains where that was not true,
but they are firmly in my past.)

But conversations - arguments - about style of source code /never/ get
out of date!

Personally, I'm in the "while (true) { ... }" camp. To me, "for (;;)"
looks like a weird smiley, and I do not fall for any appeals to Dennis Ritchie's authority.

But we are missing another option:

void mainloop() {
    // do something
    mainloop();
}

That should be fine with an optimising compiler.

From David Brown@21:1/5 to Keith Thompson on Tue Jun 4 10:47:15 2024

On 04/06/2024 01:23, Keith Thompson wrote:

bart <[email protected]> writes:
[...]

All that suggests to me is that the language *needs* an explicit
endless loop!

No, it doesn't.

Indeed - it suggests that the language already has perfectly good,
workable ways to specify endless loops. It's fine to have additional
language (or library) features for very common tasks, or for tasks where
the new feature adds clear benefit. I don't see what benefit a language keyword "forever" would give compared to "while (true)" or one of the other common
idioms.

I suspect some of the people in this thread saying that one form
is obviously better than the others are joking.

Yes. People usually have their own preferences and habits for what they
write themselves, but dislike for the alternatives is usually
exaggerated. (Unless someone uses a goto loop - then the source code
should be burned and the programmer forced to copy out the paper "Go to considered harmful" for the rest of the working week.)

From David Brown@21:1/5 to Lawrence D'Oliveiro on Tue Jun 4 10:47:58 2024

On 04/06/2024 04:20, Lawrence D'Oliveiro wrote:

On Mon, 3 Jun 2024 23:43:00 +0100, bart wrote:

All that suggests to me is that the language *needs* an explicit endless
loop!

I agree. Also it is common for a loop to have multiple exits, and I don’t like treating one of them as a special “termination condition” above the others, so I like to use “break” for all of them.

The “for” form not only caters for this, it allows handy initialization of
local variables that keep their value between loop iterations. E.g.

for (unsigned int i = length_of(array);;)
{
    if (i == 0)
    {
        ... not found ...
        break;
    } /*if*/
    --i;
    if (... array[i] matches what I want ...)
    {
        ... found ...
        break;
    } /*if*/
} /*for*/

Now we know Keith was right that people are joking!

From bart@21:1/5 to Lawrence D'Oliveiro on Tue Jun 4 12:28:41 2024

On 04/06/2024 03:10, Lawrence D'Oliveiro wrote:

On Mon, 3 Jun 2024 11:16:15 +0300, Michael S wrote:

When all sources are available, linker is merely an implementation
detail.

That’s assuming all the code is written in the same language, compilable with the same compiler.

Why, how many C compilers do you use for the same project?

Yes, if you want to build projects where you only have a binary object
file of some library, then you need a tool that can process that.

But I nearly always use DLLs.

For typical non-trivial open-source projects, this is usually not true.

And consider, even with C, the meaning of top-level “static” and the implications for compiling the source in separate pieces versus all at
once.

Even in the old days of small RAMs, the super-popular Turbo Pascal suite had
modules, but I don't think that it had linker.

The programs it built had sizes in, say, the tens of thousands of lines at most.

I can build programs of 100s of thousands of lines with no linker. Why
wouldn't it be scalable? You would anyway expect larger programs to be
split into different binaries such as dynamic libraries.

From bart@21:1/5 to Lawrence D'Oliveiro on Tue Jun 4 12:35:43 2024

On 04/06/2024 03:12, Lawrence D'Oliveiro wrote:

On Mon, 3 Jun 2024 11:13:32 +0100, bart wrote:

So how does importlib manage to import importlib before importlib
itself is imported?

I guess the same way a linker manages to link itself.

There is NO ahead-of-time linking of modules in Python as it is
understood in traditional compiled languages.

Python is a compiled language.

CPython does ahead-of-time compilation to bytecode, of individual
modules on-demand. There is no AOT compilation of all modules to binary bytecode files which need a linking process before execution starts.

It is utterly different from the linkers used with typical C code.

In the past I have written interpreters that had a discrete bytecode
compiler that produced individual files per module containing binary
bytecode.

The loading process was handled by the interpreter, a separate program,
that fixed things up to allow the program to be run immediately; the
output was not another monolithic binary file.

Again this is very different from a traditional linker that combines .o,
.a, .lib and .dll files into a single executable.

(With .dll files, they are only used to build import tables of the
executable; the library stays separate.)

From Scott Lurndal@21:1/5 to David Brown on Tue Jun 4 13:30:18 2024

David Brown <[email protected]> writes:

On 04/06/2024 07:17, Kaz Kylheku wrote:

On 2024-06-03, Scott Lurndal <[email protected]> wrote:

At the time, in the OS research community, Chorus was, indeed well-known.

If Chorus at least doesn't vaguely ring a bell, you must have your head
up your ass as even a bachelor-level computer scientist.

I think that is putting it a bit strongly - it is a /long/ time since
Chorus was relevant even in academic circles. And while it was
influential, I don't know that it was ever widely used (Scott will know
more about that, I guess).

Unisys used it in the 90's as the basis of a large distributed MPP
machine[*] (which used the Intel Paragon supercomputer backplane) based
on the Pentium Pro. Provided a single system image to the programmer
across a collection of hardware resources (without cache coherency
but with page-level coherency).

I spent almost a decade working on the operating system which used
the Chorus microkernel (and got a couple nice trips to Paris :-).

The Chorus microkernel (and the Unisys unix SVR4.2ES/MP compatible
subsystem) was also part of the European Amadeus project
(ICL, Unisys, USL, Fujitsu et alia) that was developing a distributed
operating system.

Sun eventually bought Chorus.

[*] internally known as OPUS, externally as the SPP (Scalable Parallel Processor). They primarily ran decision support software (Oracle
Parallel server, Informix, Redbrick). All of them were retired by 2010.

From Scott Lurndal@21:1/5 to BGB on Tue Jun 4 19:21:34 2024

BGB <[email protected]> writes:

On 6/3/2024 3:23 PM, Tim Rentsch wrote:

[email protected] (Scott Lurndal) writes:

[ ... (internal-combustion) engines, ... ]

It's pretty clear that the ICE is becoming a dinosaur.

Kind of makes it full circle, doesn't it? ;)

Though, annoyingly, there isn't a great alternative in some use cases:
Batteries: Lower energy density and require charging (slow);

Both of which are an order of magnitude better than just a
decade ago - and both energy density and charge time are
a subject of intense research (both in the automotive
and aircraft industries). I fully expect that energy density
per kilogram will be more than doubled in the next decade.

Fuel Cells: More expensive and finicky.

And if you're going to use renewable energy to crack water
into H2, why not just use the electricity itself (concentrate
on better storage technology rather than H2 (gas or liquid)
fuel cells).

From Scott Lurndal@21:1/5 to BGB on Tue Jun 4 19:17:50 2024

BGB <[email protected]> writes:

On 6/4/2024 12:17 AM, Kaz Kylheku wrote:

On 2024-06-03, Scott Lurndal <[email protected]> wrote:

At the time, in the OS research community, Chorus was, indeed well-known.

If Chorus at least doesn't vaguely ring a bell, you must have your head
up your ass as even a bachelor-level computer scientist.

FWIW: When I was going to college for a CS major, the emphasis was
mostly on Microsoft technologies, and a lot of the classes were taught
in C#. I mostly stuck with C for my own uses though (and IIRC did write
one class project in C++/CLI).

A CS major should concentrate on the theory (operating system principles, compiler principles, data structures, algorithmic complexity,
security, fundamentals of programming independent of language,
and a survey of useful programming languages), and perhaps a look at the history of computing.

It sounds like your CS department let you down.

From Lawrence D'Oliveiro@21:1/5 to bart on Wed Jun 5 01:50:38 2024

On Tue, 4 Jun 2024 12:35:43 +0100, bart wrote:

It is utterly different from the linkers used with typical C code.

It does symbol resolution, just like any linker. It handles dependencies
(both direct and transitive), just like any linker.

From Lawrence D'Oliveiro@21:1/5 to bart on Wed Jun 5 01:51:41 2024

On Tue, 4 Jun 2024 12:28:41 +0100, bart wrote:

On 04/06/2024 03:10, Lawrence D'Oliveiro wrote:

On Mon, 3 Jun 2024 11:16:15 +0300, Michael S wrote:

When all sources are available, linker is merely an implementation
detail.

That’s assuming all the code is written in the same language,
compilable with the same compiler.

Why, how many C compilers do you use for the same project?

It’s not just C.

And consider, even with C, the meaning of top-level “static” and the implications for compiling the source in separate pieces versus all at
once.

From Tim Rentsch@21:1/5 to bart on Tue Jun 4 19:45:30 2024

bart <[email protected]> writes:

On 04/06/2024 03:10, Lawrence D'Oliveiro wrote:

On Mon, 3 Jun 2024 11:16:15 +0300, Michael S wrote:

When all sources are available, linker is merely an
implementation detail.

That's assuming all the code is written in the same language,
compilable with the same compiler.

Why, how many C compilers do you use for the same project?

Depends on the project.

From Paul@21:1/5 to BGB on Tue Jun 4 23:59:31 2024

On 6/4/2024 9:44 PM, BGB wrote:

On 6/4/2024 2:21 PM, Scott Lurndal wrote:

BGB <[email protected]> writes:

On 6/3/2024 3:23 PM, Tim Rentsch wrote:

[email protected] (Scott Lurndal) writes:

[ ... (internal-combustion) engines, ... ]

It's pretty clear that the ICE is becoming a dinosaur.

Kind of makes it full circle, doesn't it? ;)

Though, annoyingly, there isn't a great alternative in some use cases:
Batteries: Lower energy density and require charging (slow);

Both of which are an order of magnitude better than just a
decade ago - and both energy density and charge time are
a subject of intense research (both in the automotive
and aircraft industries). I fully expect that energy density
per kilogram will be more than doubled in the next decade.

Still pretty tough to catch up with Ethanol or Gasoline, where it is also many orders of magnitude faster to refill a fuel tank than to charge a battery, ...

IIRC, there aren't many battery technologies that can manage a charge rate much over 1C to 3C (so, getting a recharge time much under ~ 20 minutes or so is unlikely).
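The arithmetic behind the ~20 minute figure, assuming the usual definition of C-rate (a rate of xC is the current that would deliver the battery's rated capacity in 1/x hours) and ignoring the slower taper near full charge:

```latex
t \approx \frac{1\ \text{h}}{x}
\quad\Rightarrow\quad
t_{1C} \approx 60\ \text{min}, \qquad
t_{3C} \approx \frac{60\ \text{min}}{3} = 20\ \text{min}
```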

Vs, say, refilling something like a car in ~ 25 seconds or so at a fuel pump (but, could potentially be made faster if needed). Though, there are likely to be limits here short of redesigning the mechanical interface.

Say, it could be possible to refill a gas tank in around 3 seconds or so with enough pressure and active sensing, but whether this could be done reliably without undue risk of causing fuel tanks to rupture or similar is unclear (say, rather than pumping the fuel at 10 gal/min, they pump it at 90 gal/min, and effectively pressure-washing the inside of the fuel-tank).

Also would need a fairly strong fuel hose as well (likely steel reinforced to deal with the pressure within the hose).

The main traditional disadvantage of liquid fuel (and ICE's) vs batteries and electric motors, is the comparably low conversion efficiency. Liquid fuel would be stronger here if better conversion efficiencies were achieved (an ICE losing much of its potential energy as noise and heat).

So, ideally, need some sort of semi-efficient fuel to electricity conversion (possibly using a more modest size battery pack as a buffer stage).

Well, also some potential application areas, like human-scale robots, are hindered by not having any good way to power them (both ICE's and batteries sucking in this application area).

Fuel Cells: More expensive and finicky.

And if you're going to use renewable energy to crack water
into H2, why not just use the electricity itself (concentrate
on better storage technology rather than H2 (gas or liquid)
fuel cells).

Yeah, H2 just kinda sucks.

Ethanol is much better as a fuel in most regards.

But, effectively running fuel cells on Ethanol (rather than H2) is a more complex problem. Methanol is a little easier here, but still not great (also methanol poses a risk due to its high toxicity).

But, yeah, not really a good way to convert electricity into Ethanol or similar.

Methanol could be produced using electricity assuming one can scavenge enough CO2 (with water as an additional input, leaving O2 as a waste product).
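Written out as reactions (standard water electrolysis and CO2 hydrogenation; catalysts, operating conditions and losses left out), the scheme balances as follows:

```latex
\begin{aligned}
3\,\mathrm{H_2O} &\rightarrow 3\,\mathrm{H_2} + \tfrac{3}{2}\,\mathrm{O_2} && \text{(electrolysis)}\\
\mathrm{CO_2} + 3\,\mathrm{H_2} &\rightarrow \mathrm{CH_3OH} + \mathrm{H_2O} && \text{(methanol synthesis)}\\
\mathrm{CO_2} + 2\,\mathrm{H_2O} &\rightarrow \mathrm{CH_3OH} + \tfrac{3}{2}\,\mathrm{O_2} && \text{(net)}
\end{aligned}
```

One of the three water molecules split by electrolysis comes back out of the synthesis step, which is the water that gets recycled into electrolysis in the process sketched below.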

Could in theory produce methanol simply using air and electricity as inputs (scavenging both H2O and CO2), but the conversion efficiency would likely be dismal (most of the energy use would be spent running an air compressor, though an air-motor could recover some of this on the output side).

Say:
Compress air into a big tank;
Collect water that accumulates in tank;
Bubble compressed air through an amine solution (this collects CO2 into the solution);
Pump amine through another tank where heat is applied to extract CO2 from the solution (it is then cooled and pumped back through the former tank, to collect more CO2);
Collected water is subjected to a momentary pressure drop (to remove dissolved CO2), and then sent into an electrolysis stage (to get H2 gas), with the H2 and CO2 being pumped into a heated high-pressure reaction chamber (to produce water and methanol, say, 250 °C and 75 bar), with the resulting water and methanol being collected, then fed through a distillation phase (likely dropping the pressure by a controlled amount so that the methanol vaporizes but leaving the water behind); the water is then pumped back into the electrolysis step (which also serves to remove oxygen).

Likely, things like heat control/recovery would be needed to have any semblance of efficiency (as well, one would need to recover what energy they can when the waste products are returned to atmospheric pressure).

Pumping (followed by electrolysis) are likely to be the main energy uses, potentially much of the heating and cooling needed could be achieved through the compression and expansion stages (so potentially wouldn't need any additional energy input).

Would need to process a fairly large volume of air relative to any methanol produced though (so, I would expect mechanical losses in the compression and expansion stages would be where most of the energy loss would occur, such as due to friction in the pumps and similar).

There are whizzy solutions. But they're not practical for consumer automotive.

The battery chemistries have "spider diagrams" which evaluate the battery on
a number of factors. And that is how we've settled on what is inside cars today.
The lifetime of the battery pack received a high priority. A ballpark number is 5000 charging cycles. For some real cars, it might be closer to 1800. And leaving a car sitting in the hot sun might contribute to some of the difference there. The charging history is part of it, but exposing the automobile
to harsh conditions can also impact the battery a bit.

The solid-state batteries in the lab are around 1000 charge cycles currently. These have no liquid electrolyte like the currently-shipped batteries.

There have been announcements of lab demos of battery chemistries
with extremely short charge time. But then, the number of charge cycles is
a joke.

There was even a vehicle that went 1000 miles on a battery. Why ? Because
the battery was not rechargeable at all. The battery, at destination,
needed to be sent to a recycler. But that particular experiment was intended
to prove they could build a battery trip with more range than your bladder.

These are all examples of spider diagrams, where multiple points on the
diagram are a compromise. And if they don't compare favorably to a 5000 charge cycle battery, you won't see them in a car. Not with an 8-year warranty.

Paul

From Lawrence D'Oliveiro@21:1/5 to BGB on Wed Jun 5 07:14:37 2024

On Tue, 4 Jun 2024 12:48:31 -0500, BGB wrote:

Though, I do have some questionable experimental features, like a
compiler option to cause arrays and pointers to be bounds-checked, which
is sometimes useful in debugging (but in some cases can add bugs of its
own; also makes the binaries bigger and negatively effects performance,
...).

I remember some research being done into this, back in the days of Pascal.

Remember that, in Pascal (and Ada), subranges exist as types in their own right, not just as bounds of arrays. And this allows the compiler to
optimize array-bounds checks, and sometimes get rid of them altogether.
E.g.

type
  boundstype = 1 .. 10;
var
  myarr : array [boundstype] of elttype;
  index : boundstype;

With these definitions, an expression like

myarr[index]

doesn’t require any bounds-checking. Of course, assignments to index may require bounds-checking, depending on the types of values involved.

From Lawrence D'Oliveiro@21:1/5 to BGB-Alt on Wed Jun 5 07:22:27 2024

On Tue, 4 Jun 2024 17:32:50 -0500, BGB-Alt wrote:

Linux was talked about to some extent in one of the classes (but, more
in a high-level introductory sense). ...

But, at the time, the then new OS was Windows Vista ...

See, there was an opportunity missed, wasn’t it, to compare the different design approaches in the two. Consider GUIs, for example: notice how the
GUI is commingled inextricably into the kernel in Windows, versus being a separate modular, replaceable (and removable) layer in Linux.

Particularly relevant in the Vista situation was the big trouble over the
3D effects in “Aero Glass”: Microsoft could only manage these on (for the time) more expensive, higher-end hardware. This led to a lot of user
confusion over what “Vista-capable” and “Vista-ready” meant, culminating
in lawsuits.

Meanwhile, my modest little Asus Eee 701 PC with its single-core 900-MHz Celeron could run KDE 4 Plasma, with my choice of fun 3D effects, without missing a beat.

From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Wed Jun 5 07:15:31 2024

On Mon, 03 Jun 2024 21:22:07 GMT, Scott Lurndal wrote:

At the time, in the OS research community, Chorus was, indeed
well-known.

As soon as you hear “microkernel”, you know it’s essentially a museum-piece now.

From bart@21:1/5 to Lawrence D'Oliveiro on Wed Jun 5 09:10:34 2024

On 05/06/2024 02:50, Lawrence D'Oliveiro wrote:

On Tue, 4 Jun 2024 12:35:43 +0100, bart wrote:

It is utterly different from the linkers used with typical C code.

It does symbol resolution, just like any linker. It handles dependencies (both direct and transitive), just like any linker.

In that case I've written dozens of linkers across 40 years.

Any language that has 'module' objects, even within the same source
file, is a linker. Any whole-program compiler, even if its output is IR
or assembly, is a linker. Any smart code editor could be a linker.

The term usually refers to a program that takes independently compiled
binaries containing native code, and produces a monolithic binary
executable.

From David Brown@21:1/5 to Thiago Adams on Wed Jun 5 15:09:31 2024

On 05/06/2024 14:23, Thiago Adams wrote:

(
I am interested in the C23 subject, but I found it almost impossible to
follow such a big thread. For me, it is even more difficult without the Google Groups interface, and it consumes a lot of time.
My suggestion is to split C23 topics into smaller ones for specific
items like embed, etc.
)

I think the "embed" part has run its course - or at least, we have
covered everything that could usefully be said about it until we start
seeing real-world implementations and real-world usage. (It spawned a
lot of nice discussions about implementations of xxd alternatives, but
that is not really topical about C23 any more.)

Having separate threads for different C23 features would be a fine idea.
Pick your favourite - or least favourite - and start a new thread.
But let's pick something other than #embed !

From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Wed Jun 5 13:32:38 2024

Lawrence D'Oliveiro <[email protected]d> writes:

On Mon, 03 Jun 2024 21:22:07 GMT, Scott Lurndal wrote:

At the time, in the OS research community, Chorus was, indeed
well-known.

As soon as you hear “microkernel”, you know it’s essentially a museum-piece now.

You're really a piece of work.

A modern hypervisor can be considered a microkernel.

From Scott Lurndal@21:1/5 to BGB on Wed Jun 5 13:29:35 2024

BGB <[email protected]> writes:

On 6/4/2024 2:21 PM, Scott Lurndal wrote:

BGB <[email protected]> writes:

On 6/3/2024 3:23 PM, Tim Rentsch wrote:

[email protected] (Scott Lurndal) writes:

[ ... (internal-combustion) engines, ... ]

It's pretty clear that the ICE is becoming a dinosaur.

Kind of makes it full circle, doesn't it? ;)

Though, annoyingly, there isn't a great alternative in some use cases:
Batteries: Lower energy density and require charging (slow);

Both of which are an order of magnitude better than just a
decade ago - and both energy density and charge time are
a subject of intense research (both in the automotive
and aircraft industries). I fully expect that energy density
per kilogram will be more than doubled in the next decade.

Still pretty tough to catch up with Ethanol or Gasoline, where it is
also many orders of magnitude faster to refill a fuel tank than to
charge a battery, ...

Many orders of magnitude? 5 minutes (petrol) vs. 20 minutes
(supercharger to 80%)? And you can expect the latter to decrease
with time while the former has actually increased in the past few
decades (it used to be faster before self-service pumps were
developed).

IIRC, there aren't many battery technologies that can manage a charge
rate much over 1C to 3C (so, getting a recharge time much under ~ 20
minutes or so is unlikely).

Actually, here's where we're going on charge time:

https://www.fastcompany.com/91016543/scientists-just-invented-an-ev-battery-that-can-fully-charge-in-5-minutes

Vs, say, refilling something like a car in ~ 25 seconds or so at a fuel
pump (but, could potentially be made faster if needed). Though, there
are likely to be limits here short of redesigning the mechanical interface.

Come now, it's more like 5 minutes rather than 25 seconds (that is
almost a gallon a second). Not likely to find those types of speeds
in a self-serve gas station, if only for safety's sake.

From Dan Cross@21:1/5 to Scott Lurndal on Wed Jun 5 13:59:35 2024

In article <WNZ7O.9283$nd%[email protected]>,
Scott Lurndal <[email protected]> wrote:

Lawrence D'Oliveiro <[email protected]d> writes:

On Mon, 03 Jun 2024 21:22:07 GMT, Scott Lurndal wrote:

At the time, in the OS research community, Chorus was, indeed
well-known.

As soon as you hear “microkernel”, you know it’s essentially a museum-piece now.

You're really a piece of work.

A modern hypervisor can be considered a microkernel.

I don't understand why people still engage with this clown.
Lawrence is obviously a troll.

- Dan C.

From Lawrence D'Oliveiro@21:1/5 to bart on Thu Jun 6 02:12:26 2024

On Wed, 5 Jun 2024 09:10:34 +0100, bart wrote:

Any language that has 'module' objects, even within the same source
file, is a linker.

I did point out that linking involved pulling multiple files together, did
I not?

The term usually refers to a program that takes independently compiled binaries containing native code ...

Binaries of some form, yes. Remember “native” is just as relative a term
as “hardware” is.

From Michael S@21:1/5 to Michael S on Thu Jun 6 14:43:25 2024

On Sun, 2 Jun 2024 01:11:35 +0300
Michael S <[email protected]> wrote:

On Fri, 31 May 2024 22:15:54 +0100
bart <[email protected]> wrote:

If I run this:

printf("%p\n", (void *)&_binary_hello_c_start);
printf("%p\n", (void *)&_binary_hello_c_end);
printf("%p\n", (void *)&_binary_hello_c_size);

I get:

00007ff6ef252010
00007ff6ef252056
00007ff5af240046

I can see that the first two can be subtracted to give the size of
the data, which is 70 or 0x46. 0x46 is the last byte of the address
of _size, so what's happening there? What's with the crap in bits
16-47?

It looks like ASLR. I don't see it because I test on Win7.

I tried it on versions of Windows that have ASLR. Had seen no problems.
I see *_start and *_end at high addresses and sometimes changing
between invocations, which means that ASLR is certainly in effect, but
*_size always prints correct result.

From bart@21:1/5 to Lawrence D'Oliveiro on Thu Jun 6 19:38:08 2024

On 06/06/2024 03:12, Lawrence D'Oliveiro wrote:

On Wed, 5 Jun 2024 09:10:34 +0100, bart wrote:

Any language that has 'module' objects, even within the same source
file, is a linker.

I did point out that linking involved pulling multiple files together, did
I not?

The term usually refers to a program that takes independently compiled
binaries containing native code ...

Binaries of some form, yes. Remember “native” is just as relative a term as “hardware” is.

Sorry, I've lost track of what it is you are trying to prove. That
everything is a linker?

Of course, mine is the narrow viewpoint of someone who has implemented
multiple assemblers, compilers, interpreters and linkers (that I call
loaders) of various kinds across decades.

I thought I had long eliminated the need for traditional 'linking' from
my tools, but apparently not; I've been writing linkers without being
aware of it! How about that?

But if you want the last word, then you're welcome. You're obviously
100% right and I'm 100% wrong; happy?

From Scott Lurndal@21:1/5 to BGB-Alt on Thu Jun 6 21:38:27 2024

BGB-Alt <[email protected]> writes:

On 5/31/2024 4:11 PM, Scott Lurndal wrote:

jak <[email protected]> writes:

bart ha scritto:

On 31/05/2024 15:34, Michael S wrote:

On Fri, 31 May 2024 15:04:46 +0100
bart <[email protected]> wrote:

<snip>

Instead of one compiler, here I used two compilers, a tool 'objcopy'
(which bizarrely needs to generate ELF format files) and lots of extra ugly code. I also need to disregard whatever the hell _binary_..._size does.

But it works.

You could use the pe-x86-64 format instead of the elf64-x86-64 to reduce the size of the object.

By a half dozen bytes, perhaps, and only if your binutils have been
built to support pe-x86-64:

$ objcopy -I binary -O pe-x86-64 main.cpp /tmp/test1.o
objcopy:/tmp/test1.o: Invalid bfd target

The ELF64 format has a 64 byte header, the string table and the
symbol table, and the remainder is the binary
data. The PE header may save a few bytes by using 32-bit fields in
the PE COFF header and symbol table.

Note, you might want to trim your posts when replying with a one-sentence reply.

While I can't say much for using objcopy here (it is likely to be
hindered by however the program was compiled and linked, in any case),
in some other contexts PE/COFF can save more significant amounts of
space vs ELF.

In particular:

PE/COFF typically only stores symbols for imports and exports, rather
than for every symbol in the binary (though, IIRC, GCC+LD does tend to generate PE/COFF output with every symbol present, *1, so this advantage
is mostly N/A if using GCC).

$ man 1 strip

The PE/COFF base relocation format is more compact than the ELF64
relocation formats:
ELF64 tends to spend 24 bytes for every symbol, and 24 bytes for each
reloc; along with an ASCII string for every symbol.

Use ELF32 then.

It also tends to redirect most calls and loads/stores for global
variables through the GOT, rather than using PC-relative / RIP-relative addressing (or fixed displacements relative to a Global Pointer),
causing the generated code to be larger (along with the size of the GOT).

That has nothing to do with ELF, per se. The ELF format supports
dynamic linking. It does not require it.

From Lawrence D'Oliveiro@21:1/5 to BGB-Alt on Fri Jun 7 00:53:45 2024

On Thu, 6 Jun 2024 15:38:21 -0500, BGB-Alt wrote:

*2: Seemingly the main way I am aware of to get small binaries is to use
an older version of MSVC (such as 6.0 to 9.0), as the binary-bloat
started to get much more obvious around Visual Studio 2010, but is less
of an issue with VS2005 or VS2008.

Newer version of proprietary compiler generates worse code than older version?!?

From Lawrence D'Oliveiro@21:1/5 to bart on Fri Jun 7 00:55:00 2024

On Thu, 6 Jun 2024 19:38:08 +0100, bart wrote:

Sorry, I've lost track of what it is you are trying to prove. That
everything is a linker?

You were the one trying to prove that linkerless programming was a good
idea, or something. And then you tried to distract attention from the
weakness of your arguments by bringing Python into it.

Not working out so well now, is it?

From Lawrence D'Oliveiro@21:1/5 to BGB on Fri Jun 7 00:57:51 2024

On Wed, 5 Jun 2024 04:01:28 -0500, BGB wrote:

For my bounds-checking in C, there are no syntactic changes to C.

But how efficient is it? Those research papers I mentioned reported being
able to get the execution overhead in Pascal down to something like 5-10%.

From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Fri Jun 7 00:59:09 2024

On Wed, 05 Jun 2024 13:32:38 GMT, Scott Lurndal wrote:

A modern hypervisor can be considered a microkernel.

Oh, look who’s desperately trying to keep the “microkernel” brand alive.

From Lawrence D'Oliveiro@21:1/5 to BGB on Fri Jun 7 09:04:43 2024

On Fri, 7 Jun 2024 00:51:22 -0500, BGB wrote:

Generally, using ELF32 on 64-bit targets isn't a thing...

It might have been, if Intel’s promotion of an “X32” ABI (keeping addresses at 32 bits, but using the extra instructions for the increased register set) for AMD64 had taken off. Even the Linux kernel supported it,
at one point. But nobody seemed to care.

From bart@21:1/5 to Lawrence D'Oliveiro on Fri Jun 7 22:23:43 2024

On 07/06/2024 01:55, Lawrence D'Oliveiro wrote:

On Thu, 6 Jun 2024 19:38:08 +0100, bart wrote:

Sorry, I've lost track of what it is you are trying to prove. That
everything is a linker?

You were the one trying to prove that linkerless programming was a good
idea, or something. And then you tried to distract attention from the weakness of your arguments by bringing Python into it.

Not working out so well now, is it?

For me it's working brilliantly. I have two native code compilers that
don't use a traditional linker:

* A C compiler with independently compiled modules
* A non-C compiler with whole-program compilation

It's you who can't get your head around the idea that someone could do
away with a 'linker'.

From Kaz Kylheku@21:1/5 to bart on Sat Jun 8 00:39:18 2024

On 2024-06-07, bart <[email protected]> wrote:

It's you who can't get your head around the idea that someone could do
away with a 'linker'.

You can do away with linkers and linking.

But it's pretty helpful when

1. the same library is reused for many programs.

2. you're selling a library, and would like to ship a binary image of
that library.

Without linkage, you don't have a library ecosystem.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @[email protected]

From bart@21:1/5 to Kaz Kylheku on Sat Jun 8 02:14:37 2024

On 08/06/2024 01:39, Kaz Kylheku wrote:

On 2024-06-07, bart <[email protected]> wrote:

It's you who can't get your head around the idea that someone could do
away with a 'linker'.

You can do away with linkers and linking.

But it's pretty helpful when

1. the same library is reused for many programs.

You use a shared library.

2. you're selling a library, and would like to ship a binary image of
that library.

You ship a shared library.

Without linkage, you don't have a library ecosystem.

Of course you do. Eg. a program depends on the vast WinAPI; but you
don't have to ship copies of all its DLLs, neither do you have to
statically link them.

There are some fixups involved even when using dynamic linking. That's
taken care of by the OS loader. But the code involved isn't extensive.
Here is an 800-line C program:

https://github.com/sal55/langs/blob/master/runmx.c

that loads a private executable format of mine; it loads any dynamic
libraries also in the same format (but it doesn't multiple instances
with other processes); and it resolves any symbols from dependent DLLs.
Then it runs the program.

(Here written using WinAPI calls.)

From Lawrence D'Oliveiro@21:1/5 to BGB-Alt on Sat Jun 8 03:08:06 2024

On Fri, 7 Jun 2024 16:58:08 -0500, BGB-Alt wrote:

I think code generation went in the bulky direction when they started
adding auto-vectorization, and not really any option to be like "Yes, I
want SIMD instructions enabled, but, no, don't autovectorize."

Sometimes vectorization makes things faster, sometimes not, but one
thing it does do, is make the generated binaries bigger.

And MSVC is the compiler that Microsoft use to build Windows itself, isn’t it?

Unless they’ve turned to GCC now ...

From Kaz Kylheku@21:1/5 to bart on Sat Jun 8 03:55:19 2024

On 2024-06-08, bart <[email protected]> wrote:

On 08/06/2024 01:39, Kaz Kylheku wrote:

On 2024-06-07, bart <[email protected]> wrote:

It's you who can't get your head around the idea that someone could do
away with a 'linker'.

You can do away with linkers and linking.

But it's pretty helpful when

1. the same library is reused for many programs.

You use a shared library.

That's linking.

Static linking is the same thing as dynamic except it's being
precomputed: the libs are dynamically processed, but then rather
than the program being run, its image is dumped into an executable.
That executable no then longer needs to repeat that library processing
when started; everything is integrated. (There are ways to optimize
linking so not all the material must be present in memory all at once
as I describe it above.)

2. you're selling a library, and would like to ship a binary image of
that library.

You ship a shared library.

No, not always. There is such thing as selling static libraries.

Numerical code, crypto, codecs.

A few times in my career I worked with purchased static libs.

There are some advantages to it, like that static calls can be
faster than dynamic, and unused parts of static libs can be
removed at link time.

Another aspect is that it's possible for static libs to be platform-independent, to an extent, because some of the
object formats like COFF are widely recognized. Whereas
shared libs tend to be very OS specific. The vendor has to make
them separately for Windows, Linux, Solaris, BSD, Mac, ...

This gruntwork is a pain in the ass that is removed from
the core value of your code.

The integrator who buys your static lib can turn it into a
shared lib for their target system, if they are so inclined.

From Lawrence D'Oliveiro@21:1/5 to BGB on Sat Jun 8 08:27:22 2024

On Sat, 8 Jun 2024 00:04:02 -0500, BGB wrote:

Can also note that the compilers handle debugging info differently:
GCC tends to put debug data in the binary itself (as DWARF or STABS);

Doesn’t have to be that way. Distros that offer precompiled binaries (e.g. Debian and derivatives) tend to have debug symbols in separate packages
that you don’t have to install. They’re only needed if you want to run a debugger on the actual binary from the package, rather than from your own build.

But, in general, I suspect MS doesn't care if the EXE and DLL files are
bulky and if their compiler doesn't win the performance game.

Yes, but isn’t this impacting the performance of everything they build
with it, including their own OS?

From bart@21:1/5 to Kaz Kylheku on Sat Jun 8 11:14:45 2024

On 08/06/2024 04:55, Kaz Kylheku wrote:

On 2024-06-08, bart <[email protected]> wrote:

On 08/06/2024 01:39, Kaz Kylheku wrote:

On 2024-06-07, bart <[email protected]> wrote:

It's you who can't get your head around the idea that someone could be >>>> away with a 'linker'.

You can do away with linkers and linking.

But it's pretty helpful when

1. the same library is reused for many programs.

You use a shared library.

That's linking.

Static linking is the same thing as dynamic except it's being
precomputed: the libs are dynamically processed, but then rather
than the program being run, its image is dumped into an executable.
That executable then no longer needs to repeat that library processing
when started; everything is integrated. (There are ways to optimize
linking so not all the material must be present in memory all at once
as I describe it above.)

The actual process of linking is a fairly trivial matter, as I showed in
my 0.8Kloc C program which not only loads and relocates an executable
file (in my format), but loads, relocates and does symbol fixups of any
dynamic libraries. (Plus fixes up DLL dependencies too! But the
relocation of those is done within OS routines at my request.)

What is different in formats like PE and ELF is their tremendous
complexity; my formats are considerably simpler.

What I mean by 'doing away with linkers and linking' is removing, from a
language implementation, the need to run a discrete program that might be called 'ld' or 'link', or be invoked indirectly via 'gcc'.

Primarily by using whole-program compilation, where any inter-module
references are sorted out early on within the compiler via the global
symbol table. The compiler directly generates EXE/DLL from source files.

For C, the language requires independent compilation. Here, I generate
ASM files. But while traditionally those are assembled to object files
and linked, I use a special assembler where ASM files are directly
turned into EXE or DLL files.

The linking process is again done by manipulating a global symbol table.
There are no object files, and no separate discrete link step.

2. you're selling a library, and would like to ship a binary image of
that library.

You ship a shared library.

No, not always. There is such thing as selling static libraries.

Numerical code, crypto, codecs.

A few times in my career I worked with purchased static libs.

If you obtain a static library in the form of an object file or archive,
then yes you will need a program that can process that file and combine
it with the rest of your application: a linker.

But if /I/ were to write a linker, even to process PE/COFF files, it
would be a 50Kloc application. (There is already such a product, not
mine, which is 47KB, but it has some peculiarities.)

There are some advantages to it, like that static calls can be
faster than dynamic,

If you do your own fixups (you generate an executable where DLL
dependencies are resolved via your initialisation code rather than
getting the OS to do it), you can arrange it so that calls to imported
routines are direct.

But I don't think it's worth the trouble. You generally know that calls
across FFI boundaries are going to be a tiny bit slower, typically by
one extra indirect (and probably fully predicted) jump per call. So the
cost is usually insignificant.

and unused parts of static libs can be removed at link time.

Another aspect is that it's possible for static libs to be platform-independent, to an extent, because some of the
object formats like COFF are widely recognized. Whereas
shared libs tend to be very OS specific. The vendor has to make
them separately for Windows, Linux, Solaris, BSD, Mac, ...

Windows tends to use PE (which includes COFF). Linux tends to use ELF.

The thing about my private formats (MX/ML) is they would have been cross-platform.

This gruntwork is a pain in the ass that is removed from
the core value of your code.

The integrator who buys your static lib can turn it into a
shared lib for their target system, if they are so inclined.

Sure. My tools can generate OBJ files if necessary. But then it'll be
somebody else who needs to invoke a linker. Not me.

But if I were to supply a binary, it would be in the form of a DLL.
There are roundabout ways of bundling it into an EXE if necessary (my ML
format would be better for such a purpose).

From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Sat Jun 8 13:09:18 2024

Lawrence D'Oliveiro <[email protected]d> writes:

On Fri, 7 Jun 2024 16:58:08 -0500, BGB-Alt wrote:

I think code generation went in the bulky direction when they started
adding auto-vectorization, and not really any option to be like "Yes, I
want SIMD instructions enabled, but, no, don't autovectorize."

Sometimes vectorization makes things faster, sometimes not, but one
thing it does do, is make the generated binaries bigger.

And MSVC is the compiler that Microsoft use to build Windows itself, isn’t it?

Last time I built NT, it used the command line compiler 'cl.exe', IIRC.

Granted that was 1998.

From Lawrence D'Oliveiro@21:1/5 to Malcolm McLean on Sun Jun 9 00:45:55 2024

On Sat, 8 Jun 2024 19:28:47 +0100, Malcolm McLean wrote:

On 07/06/2024 01:53, Lawrence D'Oliveiro wrote:

On Thu, 6 Jun 2024 15:38:21 -0500, BGB-Alt wrote:

*2: Seemingly the main way I am aware of to get small binaries is to
use an older version of MSVC (such as 6.0 to 9.0), as the binary-bloat
started to get much more obvious around Visual Studio 2010, but is
less of an issue with VS2005 or VS2008.

Newer version of proprietary compiler generates worse code than older
version?!?

If the code is calling extern gunctions that do IO, we woul expect these
to be massively more sophisticated on a modern ststem Witha little
comouter, pribtf just wtites acharacter raster and utimalthe he Os picks
the up and flushes it out to a pixel raster. And that' aal it's doing.
Whilst on a modrern syste, stdout can do whole lot of intricate things.

Nothing to do with the compiler, though.

From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Sun Jun 9 00:46:37 2024

On Sat, 08 Jun 2024 13:09:18 GMT, Scott Lurndal wrote:

Lawrence D'Oliveiro <[email protected]d> writes:

On Fri, 7 Jun 2024 16:58:08 -0500, BGB-Alt wrote:

I think code generation went in the bulky direction when they started
adding auto-vectorization, and not really any option to be like "Yes,
I want SIMD instructions enabled, but, no, don't autovectorize."

Sometimes vectorization makes things faster, sometimes not, but one
thing it does do, is make the generated binaries bigger.

And MSVC is the compiler that Microsoft use to build Windows itself, isn’t it?

Last time I built NT, it used the command line compiler 'cl.exe', IIRC.

Granted that was 1998.

Is that supposed to be an entirely different compiler? I would assume it
was simply a different way of invoking the same basic compiler engine.

From Michael S@21:1/5 to Scott Lurndal on Sun Jun 9 11:19:53 2024

On Sat, 08 Jun 2024 13:09:18 GMT
[email protected] (Scott Lurndal) wrote:

Lawrence D'Oliveiro <[email protected]d> writes:

On Fri, 7 Jun 2024 16:58:08 -0500, BGB-Alt wrote:

I think code generation went in the bulky direction when they
started adding auto-vectorization, and not really any option to be
like "Yes, I want SIMD instructions enabled, but, no, don't
autovectorize."

Sometimes vectorization makes things faster, sometimes not, but one
thing it does do, is make the generated binaries bigger.

And MSVC is the compiler that Microsoft use to build Windows itself, isn’t it?

Last time I built NT, it used the command line compiler 'cl.exe',
IIRC.

Granted that was 1998.

MSVC is a common informal moniker. It seems that after it was adopted by godbolt.org, it became even more common than before.
cl.exe has been the name of the executable for a very long time. I don't know how
long exactly, but I would guess more than 30 years.

Versions of the compiler that were officially approved to build kernel
modules (supplied with the DDK) were historically not the same versions
that were sold for user-mode application development as part of the Visual
Studio package or of the Windows SDK package. Not that they were radically different, just frozen at different points in time.
Nowadays, it seems, the versions are more in sync.

From Michael S@21:1/5 to BGB on Sun Jun 9 12:40:32 2024

On Sat, 8 Jun 2024 14:52:26 -0500
BGB <[email protected]> wrote:

On 6/8/2024 1:28 PM, Malcolm McLean wrote:

On 07/06/2024 01:53, Lawrence D'Oliveiro wrote:

On Thu, 6 Jun 2024 15:38:21 -0500, BGB-Alt wrote:

*2: Seemingly the main way I am aware of to get small binaries is
to use an older version of MSVC (such as 6.0 to 9.0), as the
binary-bloat started to get much more obvious around Visual
Studio 2010, but is less of an issue with VS2005 or VS2008.

Newer version of proprietary compiler generates worse code than
older version?!?

If the code is calling extern gunctions that do IO, we woul expect
these to be massively more sophisticated on a modern ststem Witha
little comouter, pribtf just wtites acharacter raster and utimalthe
he Os picks the up and flushes it out to a pixel raster. And that'
aal it's doing. Whilst on a modrern syste, stdout can do whole lot
of intricate things.

That is a whole lot of typos...

But, even if it is built calling MSVCRT as a DLL (rather than static
linked), modern MSVC is still the worst of the bunch in this area.

A build as RISC-V + PIE with a static-linked C library still manages
to be smaller than an x64 build via MSVC with entirely dynamic-linked libraries.

And, around 72% bigger than the same program built as a
dynamic-linked binary with "GCC -O3" (while also often still being
around 40% slower).

GCC on Windows or on Linux?
In my experience, gcc on Windows (ucrt64 variant, other gcc variants
are worse) very consistently produces bigger (stripped) exe than even
latest MSVCs which, as you correctly stated, are not as good as older
versions at producing small code.

The size of 'Hello, world' program (x86-64, dynamically linked C RTL)
vs2013 - 6,144 bytes
vs2019 - 9,216 bytes
gcc (Debian Linux, -no-pie) - 14,400 bytes
gcc (Debian Linux) - 14,472 bytes
gcc (ucrt64 DLL) - 18,432 bytes
gcc (old DLL) - 42,496 bytes

MSVC compilation flags: -O1 -MD
gcc compilation flags: -Oz -s

By contrast, VS2008 can build programs with binary sizes closer to those
of GCC.

From bart@21:1/5 to Michael S on Sun Jun 9 11:20:11 2024

On 09/06/2024 10:40, Michael S wrote:

On Sat, 8 Jun 2024 14:52:26 -0500
BGB <[email protected]> wrote:

On 6/8/2024 1:28 PM, Malcolm McLean wrote:

On 07/06/2024 01:53, Lawrence D'Oliveiro wrote:

On Thu, 6 Jun 2024 15:38:21 -0500, BGB-Alt wrote:

*2: Seemingly the main way I am aware of to get small binaries is
to use an older version of MSVC (such as 6.0 to 9.0), as the
binary-bloat started to get much more obvious around Visual
Studio 2010, but is less of an issue with VS2005 or VS2008.

Newer version of proprietary compiler generates worse code than
older version?!?

If the code is calling extern gunctions that do IO, we woul expect
these to be massively more sophisticated on a modern ststem Witha
little comouter, pribtf just wtites acharacter raster and utimalthe
he Os picks the up and flushes it out to a pixel raster. And that'
aal it's doing. Whilst on a modrern syste, stdout can do whole lot
of intricate things.

That is a whole lot of typos...

But, even if it is built calling MSVCRT as a DLL (rather than static
linked), modern MSVC is still the worst of the bunch in this area.

A build as RISC-V + PIE with a static-linked C library still manages
to be smaller than an x64 build via MSVC with entirely dynamic-linked
libraries.

And, around 72% bigger than the same program built as a
dynamic-linked binary with "GCC -O3" (while also often still being
around 40% slower).

GCC on Windows or on Linux?
In my experience, gcc on Windows (ucrt64 variant, other gcc variants
are worse) very consistently produces bigger (stripped) exe than even
latest MSVCs which, as you correctly stated, are not as good as older versions at producing small code.

The size of 'Hello, world' program (x86-64, dynamically linked C RTL)
vs2013 - 6,144 bytes
vs2019 - 9,216 bytes
gcc (Debian Linux, -no-pie) - 14,400 bytes
gcc (Debian Linux) - 14,472 bytes
gcc (ucrt64 DLL) - 18,432 bytes
gcc (old DLL) - 42,496 bytes

I get a lot worse than that:

C:\c>gcc hello.c

C:\c>dir a.exe
09/06/2024 11:04 367,349 a.exe

C:\c>gcc hello.c -s -Os

C:\c>dir a.exe
09/06/2024 11:04 88,064 a.exe

(It didn't like -Oz; did you mean something other than -Os?)

Both import msvcrt.dll. gcc is version 10.3.0.

tcc gives 2KB, and mcc gives 2.5KB.

(With the latter, I know it is because it comprises 5 blocks of
data, each of which is at least 512 bytes: 2 for header stuff, plus
always 3 segments. The minimum hello.exe size, I think, is 700 bytes if a
few corners are cut.)

367KB sounds astonishing, but the first time I tried Dart, it gave me a
5MB executable for 'hello.dart'.

From Michael S@21:1/5 to bart on Sun Jun 9 14:12:39 2024

On Sun, 9 Jun 2024 11:20:11 +0100
bart wrote:

On 09/06/2024 10:40, Michael S wrote:

On Sat, 8 Jun 2024 14:52:26 -0500
BGB wrote:

On 6/8/2024 1:28 PM, Malcolm McLean wrote:

On 07/06/2024 01:53, Lawrence D'Oliveiro wrote:

On Thu, 6 Jun 2024 15:38:21 -0500, BGB-Alt wrote:

*2: Seemingly the main way I am aware of to get small binaries
is to use an older version of MSVC (such as 6.0 to 9.0), as the
binary-bloat started to get much more obvious around Visual
Studio 2010, but is less of an issue with VS2005 or VS2008.

Newer version of proprietary compiler generates worse code than
older version?!?

If the code is calling extern gunctions that do IO, we woul expect
these to be massively more sophisticated on a modern ststem Witha
little comouter, pribtf just wtites acharacter raster and
utimalthe he Os picks the up and flushes it out to a pixel
raster. And that' aal it's doing. Whilst on a modrern syste,
stdout can do whole lot of intricate things.

That is a whole lot of typos...

But, even if it is built calling MSVCRT as a DLL (rather than
static linked), modern MSVC is still the worst of the bunch in
this area.

A build as RISC-V + PIE with a static-linked C library still
manages to be smaller than an x64 build via MSVC with entirely
dynamic-linked libraries.

And, around 72% bigger than the same program built as a
dynamic-linked binary with "GCC -O3" (while also often still being
around 40% slower).

GCC on Windows or on Linux?
In my experience, gcc on Windows (ucrt64 variant, other gcc variants
are worse) very consistently produces bigger (stripped) executables than
even the latest MSVC versions, which, as you correctly stated, are not as good
as older versions at producing small code.

The size of a 'Hello, world' program (x86-64, dynamically linked C RTL):
vs2013 - 6,144 bytes
vs2019 - 9,216 bytes
gcc (Debian Linux, -no-pie) - 14,400 bytes
gcc (Debian Linux) - 14,472 bytes
gcc (ucrt64 DLL) - 18,432 bytes
gcc (old DLL) - 42,496 bytes

I get a lot worse than that:

C:\c>gcc hello.c

C:\c>dir a.exe
09/06/2024 11:04 367,349 a.exe

C:\c>gcc hello.c -s -Os

C:\c>dir a.exe
09/06/2024 11:04 88,064 a.exe

(It didn't like -Oz; did you mean something other than -Os?)

No, I meant -Oz.
It was invented by clang, but newer gcc versions understand it.
I don't know exactly what the difference is, but -Oz tends to produce slightly smaller code.
In a program as trivial as this, there should be no difference.

Both import msvcrt.dll. gcc is version 10.3.0.

My gcc variants are from msys2.
Where did you get yours?

tcc gives 2KB, and mcc gives 2.5KB.

x86-64 or i386?
I think on i386 VC5 can come close, but cannot match it.
I don't have VC5 right now. The last time I tried to find it, it was
surprisingly hard.
Well, I probably still have it on one very old PC that I haven't powered up
for many years. I don't know if it is still alive.

(With the latter, I know it is because it comprises 5 blocks
of data, each of which is at least 512 bytes: 2 for header stuff, plus
always 3 segments. The minimum hello.exe size, I think, is 700 bytes if
a few corners are cut.)

367KB sounds astonishing, but the first time I tried Dart, it gave me
a 5MB executable for 'hello.dart'.

golang tends to start at >1.5MB, but then it grows very slowly. It
appears to generate *very* self-contained executables. At least, I
personally have never encountered a case where a simple copy of the exe to a new
computer was insufficient.
Considering that go needs much more run-time support than dart, I
can't find any reason for 5MB except "they don't care".

From Michael S@21:1/5 to Michael S on Sun Jun 9 14:44:27 2024

On Sun, 9 Jun 2024 14:12:39 +0300
Michael S wrote:

On Sun, 9 Jun 2024 11:20:11 +0100
bart wrote:

367KB sounds astonishing, but the first time I tried Dart, it gave
me a 5MB executable for 'hello.dart'.

golang tends to start at >1.5MB, but then it grows very slowly. It
appears to generate *very* self-contained executables. At least, I
personally have never encountered a case where a simple copy of the exe to a new
computer was insufficient.
Considering that go needs much more run-time support than dart, I
can't find any reason for 5MB except "they don't care".

If we start talking about the size of statically linked binaries, then in this
field [on x86-64] the advantage of Windows/MSVC over Linux/gcc appears
quite huge.

MSVC 2013 - 84,480 bytes
MSVC 2019 - 119,808 bytes
gcc (Debian Linux) - 682,688 bytes

By old standards, the MSVC binary is bloated beyond reason, but
compared to gcc/Linux it looks almost lean.

I can't say that I care deeply, but I can't say that I don't care at all
either. Statically linked binaries are the only way I have been able
to copy programs compiled on a relatively new Debian to an Ubuntu LTS that
was not much older (2-3 years). I fully believe that other methods
exist, but they are beyond my skills and the skills of my
co-workers.

From Michael S@21:1/5 to bart on Sun Jun 9 20:00:14 2024

On Sun, 9 Jun 2024 17:32:40 +0100
bart wrote:

On 09/06/2024 12:12, Michael S wrote:

On Sun, 9 Jun 2024 11:20:11 +0100
bart wrote:

GCC on Windows or on Linux?
In my experience, gcc on Windows (ucrt64 variant, other gcc
variants are worse) very consistently produces bigger (stripped)
executables than even the latest MSVC versions, which, as you correctly stated, are
not as good as older versions at producing small code.

The size of a 'Hello, world' program (x86-64, dynamically linked C RTL):
vs2013 - 6,144 bytes
vs2019 - 9,216 bytes
gcc (Debian Linux, -no-pie) - 14,400 bytes
gcc (Debian Linux) - 14,472 bytes
gcc (ucrt64 DLL) - 18,432 bytes
gcc (old DLL) - 42,496 bytes

I get a lot worse than that:

C:\c>gcc hello.c

C:\c>dir a.exe
09/06/2024 11:04 367,349 a.exe

C:\c>gcc hello.c -s -Os

C:\c>dir a.exe
09/06/2024 11:04 88,064 a.exe

(It didn't like -Oz; did you mean something other than -Os?)

No, I meant -Oz.
It was invented by clang, but newer gcc versions understand it.
I don't know exactly what the difference is, but -Oz tends to produce
slightly smaller code.
In a program as trivial as this, there should be no difference.

Both import msvcrt.dll. gcc is version 10.3.0.

My gcc variants are from msys2.
Where did you get yours?

It's gcc/TDM.

I have never heard of TDM except from you.

With anything else, I can spend 10 minutes following links
to a mingw download, only to end up back where I started from.
gcc/TDM is a much simpler installation.

Somehow, I installed msys2 many times, using 2 or 3 different methods,
and it worked every single time. It's a huge download, but it works.
There were cases where I had problems installing additional packages on
top of msys2, but they were always caused by idiotic policies of
corporate IT. On my personal systems it was always flawless.

This page appears to give correct, up-to-date instructions: https://www.msys2.org/#installation

tcc gives 2KB, and mcc gives 2.5KB.

x86-64 or i386?

All were for x64.

gcc's stdio.h header defines `printf` (which my hello.c uses) as an
inlined wrapper based around `__mingw_vasprintf()`. So there might
be further inlined code, or code that is statically linked, before it
finally ends up calling the real `printf`.

The size you mentioned in the previous post is suspiciously similar to
the size of the VS2013 statically linked binary.

With gcc, I get 39.9KB for -m32 -Os -s.

That is smaller than the statically linked 32-bit VS2013 binary (73,216 bytes).
But a lot bigger than the 6,144-byte DLL-based VS2013 32-bit binary.

If I use 'puts' instead, and -m32, then it gets down to 14KB.

From bart@21:1/5 to Michael S on Sun Jun 9 17:32:40 2024

On 09/06/2024 12:12, Michael S wrote:

On Sun, 9 Jun 2024 11:20:11 +0100
bart wrote:

GCC on Windows or on Linux?
In my experience, gcc on Windows (ucrt64 variant, other gcc variants
are worse) very consistently produces bigger (stripped) executables than
even the latest MSVC versions, which, as you correctly stated, are not as good
as older versions at producing small code.

The size of a 'Hello, world' program (x86-64, dynamically linked C RTL):
vs2013 - 6,144 bytes
vs2019 - 9,216 bytes
gcc (Debian Linux, -no-pie) - 14,400 bytes
gcc (Debian Linux) - 14,472 bytes
gcc (ucrt64 DLL) - 18,432 bytes
gcc (old DLL) - 42,496 bytes

I get a lot worse than that:

C:\c>gcc hello.c

C:\c>dir a.exe
09/06/2024 11:04 367,349 a.exe

C:\c>gcc hello.c -s -Os

C:\c>dir a.exe
09/06/2024 11:04 88,064 a.exe

(It didn't like -Oz; did you mean something other than -Os?)

No, I meant -Oz.
It was invented by clang, but newer gcc versions understand it.
I don't know exactly what the difference is, but -Oz tends to produce slightly smaller code.
In a program as trivial as this, there should be no difference.

Both import msvcrt.dll. gcc is version 10.3.0.

My gcc variants are from msys2.
Where did you get yours?

It's gcc/TDM. With anything else, I can spend 10 minutes following links to a
mingw download, only to end up back where I started from. gcc/TDM is a
much simpler installation.

tcc gives 2KB, and mcc gives 2.5KB.

x86-64 or i386?

All were for x64.

gcc's stdio.h header defines `printf` (which my hello.c uses) as an
inlined wrapper based around `__mingw_vasprintf()`. So there might be
further inlined code, or code that is statically linked, before it finally
ends up calling the real `printf`.

With gcc, I get 39.9KB for -m32 -Os -s.

If I use 'puts' instead, and -m32, then it gets down to 14KB.

From bart@21:1/5 to Michael S on Sun Jun 9 21:06:00 2024

On 09/06/2024 18:00, Michael S wrote:

On Sun, 9 Jun 2024 17:32:40 +0100
bart wrote:

On 09/06/2024 12:12, Michael S wrote:

On Sun, 9 Jun 2024 11:20:11 +0100
bart wrote:

GCC on Windows or on Linux?
In my experience, gcc on Windows (ucrt64 variant, other gcc
variants are worse) very consistently produces bigger (stripped)
executables than even the latest MSVC versions, which, as you correctly stated, are
not as good as older versions at producing small code.

The size of a 'Hello, world' program (x86-64, dynamically linked C RTL):
vs2013 - 6,144 bytes
vs2019 - 9,216 bytes
gcc (Debian Linux, -no-pie) - 14,400 bytes
gcc (Debian Linux) - 14,472 bytes
gcc (ucrt64 DLL) - 18,432 bytes
gcc (old DLL) - 42,496 bytes

I get a lot worse than that:

C:\c>gcc hello.c

C:\c>dir a.exe
09/06/2024 11:04 367,349 a.exe

C:\c>gcc hello.c -s -Os

C:\c>dir a.exe
09/06/2024 11:04 88,064 a.exe

(It didn't like -Oz; did you mean something other than -Os?)

No, I meant -Oz.
It was invented by clang, but newer gcc versions understand it.
I don't know exactly what the difference is, but -Oz tends to produce
slightly smaller code.
In a program as trivial as this, there should be no difference.

Both import msvcrt.dll. gcc is version 10.3.0.

My gcc variants are from msys2.
Where did you get yours?

It's gcc/TDM.

I have never heard of TDM except from you.

With anything else, I can spend 10 minutes following links
to a mingw download, only to end up back where I started from.
gcc/TDM is a much simpler installation.

Somehow, I installed msys2 many times, using 2 or 3 different methods,
and it worked every single time. It's a huge download, but it works.
There were cases where I had problems installing additional packages on
top of msys2, but they were always caused by idiotic policies of
corporate IT. On my personal systems it was always flawless.

I'm not talking about MSYS2. I'm not even sure what it is. msys2.org
describes it as:

"MSYS2 is software distribution and a building platform for Windows. It provides a Unix-like environment, a command-line interface and a
software repository making it easier to install, use, build and port
software on Windows. That means Bash, Autotools, Make, Git, GCC, GDB...,
all easily installable through Pacman, a fully-featured package manager."

Um, I only want an optimising C compiler, nothing else! And especially I
do NOT want a 'Unix-like' environment; I think it is entirely
unnecessary for a tool that simply converts .c files into .exe files.

This page appears to give correct, up-to-date instructions: https://www.msys2.org/#installation

Today I tried once more to install mingw gcc. One hit gave me this page:

https://www.naukri.com/code360/library/gcc-compiler-for-windows

Step 1 tells me to click here:

https://sourceforge.net/projects/mingw-w64/

It says: "A complete runtime environment for gcc"; hmm; it doesn't sound
like a compiler! But I'm just following the instructions.

After 10 minutes I had a 110B installation with 6000 files, but none was
the 85KB EXE file mentioned in step 3, which isn't even part of the ZIP according to the screen shot. Where does that file come from?

So I tried a different tack; that took me here:

https://sourceforge.net/projects/mingw/

This one turns out to be that 85KB file that was missing before! OK,
let's do it. It shows a list of things to install, including MSYS2 (no
thanks) and compilers for Ada, C++, Fortran, Objective-C, but no C,
unless it is the 'base' package? I have really no idea.

I click that, but then what? There is no Install, Proceed, Get, or OK
button! But under a pulldown menu, there is Apply Changes. Now it's
doing something. At the end there was no specific message, but it said somewhere: This package has not been installed;...

But I tried it anyway (notice this is from a normal command line):

C:\c>gcc --version
gcc (MinGW.org GCC-6.3.0-1) 6.3.0

So it's version 6 of gcc! Nowhere do I remember seeing that mentioned.

I don't normally waste my time going down these futile rabbitholes, but sometimes it can be fun as you get to see some appallingly bad
installation processes.

Of course, people will go to any lengths to defend these very complex
products, and will explain to you why it is a good idea to separate out
compiler, headers, assembler, linker, library into lots of different
pieces, all with names that are subtle variations of mingw and w64.

This is why I prefer TDM:

https://jmeubank.github.io/tdm-gcc/

Click on the version you want.

Other Windows C compilers are even simpler, and smaller. (TDM is 0.5GB,
Tiny C is under 0.002GB, and my own 'bcc' is 0.001GB. My own non-C compiler
is 0.0004GB. Both my products are single EXE files.)

From Michael S@21:1/5 to All on Sun Jun 9 23:40:07 2024

I can only tell you what works well for me. I can't force you to use it.
Also, I can't prevent you from trying to use something that no longer
works well due to absence of support, i.e. old msys/mingw.

From bart@21:1/5 to Michael S on Sun Jun 9 22:49:39 2024

On 09/06/2024 21:40, Michael S wrote:

I can only tell you what works well for me. I can't force you to use it. Also, I can't prevent you from trying to use something that no longer
works well due to absence of support, i.e. old msys/mingw.

I was trying to install the LATEST version of gcc on Windows! That would
be 13.x, which I've done before, perhaps hitting on the right link by chance.

'gcc' /can/ be run from a pure Windows command line, as I've been using versions of it for years.

But they don't make it easy, as gcc is perceived to be tied to WSL MSYS2
MINGW CYGWIN.

I've had another go at this elusive compiler, this time apparently
successful. Here are the steps I used:

* Start from mingw-w64.com. Ignore where it says it's a 'complete
runtime environment for gcc'. There is also an actual compiler at the
end of the process!

* Click on Downloads on the left

* There is a list of prebuilt toolchains. The promising ones are
w64devkit, MingW-W64-builds, and possibly WinLibs.com?
I clicked on MinGW-W64-builds.

* That takes you down the page to MingW-Builds, but this is where I had
a bit of luck: as this is a one-line entry, I missed it and started
reading about WinLibs.com instead. But where are the downloads? The
link is in the small print on the last line of that section.

* That link takes you to winlibs.com. This looks disconcertingly like a 1990s
website. It surely can't be the right place? Just don't click on
MinGW-w64 as that just takes you back to square one.

* Scroll down to Downloads. There are 16 to choose from for each
version. I clicked (by mistake - I think) on the version /with/ LLVM
etc, but I don't know what the difference is. I chose the MSVCRT
version.

The end result was a 1.4GB installation of gcc 14.1.0. Using 'gcc
hello.c -Os -s' gives 48KB (with 10.3 it was 88KB). It still imports msvcrt.dll, but not printf (it does import vfprintf).

From Michael S@21:1/5 to bart on Mon Jun 10 01:06:53 2024

On Sun, 9 Jun 2024 22:49:39 +0100
bart wrote:

On 09/06/2024 21:40, Michael S wrote:

I can only tell you what works well for me. I can't force you to
use it. Also, I can't prevent you from trying to use something that
no longer works well due to absence of support, i.e. old msys/mingw.

I was trying to install the LATEST version of gcc on Windows! That
would be 13.x, which I've done before, perhaps hitting on the right link
by chance.

'gcc' /can/ be run from a pure Windows command line, as I've been
using versions of it for years.

But they don't make it easy, as gcc is perceived to be tied to WSL
MSYS2 MINGW CYGWIN.

I've had another go at this elusive compiler, this time apparently successful. Here are the steps I used:

* Start from mingw-w64.com. Ignore where it says it's a 'complete
runtime environment for gcc'. There is also an actual compiler at
the end of the process!

* Click on Downloads on the left

* There is a list of prebuilt toolchains. The promising ones are
w64devkit, MingW-W64-builds, and possibly WinLibs.com?
I clicked on MinGW-W64-builds.

* That takes you down the page to MingW-Builds, but this is where I
had a bit of luck: as this is a one-line entry, I missed it and
started reading about WinLibs.com instead. But where are the
downloads? The link is in the small print on the last line of that
section.

* That link takes you to winlibs.com. This looks disconcertingly like a 1990s
website. It surely can't be the right place? Just don't click on
MinGW-w64 as that just takes you back to square one.

* Scroll down to Downloads. There are 16 to choose from for each
version. I clicked (by mistake - I think) on the version /with/
LLVM etc, but I don't know what the difference is. I chose the MSVCRT
version.

The end result was a 1.4GB installation of gcc 14.1.0. Using 'gcc
hello.c -Os -s' gives 48KB (with 10.3 it was 88KB). It still
imports msvcrt.dll, but not printf (it does import vfprintf).

It sounds like you ended up with a gcc distro based on a 12-year-old Microsoft
DLL that does not support the majority of C11 library features and likely
does not support a few C99 library features either.
If you were a little less stubborn, in 10 minutes you could have had a
distro based on the new UCRT DLL that is closer to the new C standard and
generates smaller binaries.
And it likely occupies less than 1.4 GB.

BTW, I don't understand why MSVC produces smaller binaries with the old MS C
RTL DLL while gcc produces smaller binaries with the new MS C RTL DLL.
But that's an undeniable fact.

From bart@21:1/5 to Michael S on Mon Jun 10 01:26:23 2024

On 09/06/2024 23:06, Michael S wrote:

On Sun, 9 Jun 2024 22:49:39 +0100
bart wrote:

On 09/06/2024 21:40, Michael S wrote:

I can only tell you what works well for me. I can't force you to
use it. Also, I can't prevent you from trying to use something that
no longer works well due to absence of support, i.e. old msys/mingw.

I was trying to install the LATEST version of gcc on Windows! That
would be 13.x, which I've done before, perhaps hitting on the right link
by chance.

'gcc' /can/ be run from a pure Windows command line, as I've been
using versions of it for years.

But they don't make it easy, as gcc is perceived to be tied to WSL
MSYS2 MINGW CYGWIN.

I've had another go at this elusive compiler, this time apparently
successful. Here are the steps I used:

...

The end result was a 1.4GB installation of gcc 14.1.0. Using 'gcc
hello.c -Os -s' gives 48KB (with 10.3 it was 88KB). It still
imports msvcrt.dll, but not printf (it does import vfprintf).

It sounds like you ended up with a gcc distro based on a 12-year-old Microsoft
DLL that does not support the majority of C11 library features and likely
does not support a few C99 library features either.
If you were a little less stubborn, in 10 minutes you could have had a
distro based on the new UCRT DLL that is closer to the new C standard and
generates smaller binaries.
And it likely occupies less than 1.4 GB.

I downloaded a different 14.1 version that was 'only' 0.8GB. (Compared
to 1.4GB; it's still 2000 times bigger than my main compiler!)

That uses UCRT, but the size difference is probably due to not including LLVM/Clang stuff. (Which didn't work anyway; I think clang triggered my AV.)

This now gives a hello.c executable of 22KB; it was 48KB with the 1.4GB download, and 88KB with 10.3.0.

BTW, I don't understand why MSVC produces smaller binaries with the old MS C
RTL DLL while gcc produces smaller binaries with the new MS C RTL DLL.
But that's an undeniable fact.

I think the sizes of the runtime libraries are irrelevant if they are
both dynamically linked. It's what the compiler puts directly into the executable that makes the difference. And here they are just too diverse
in how they work. It can't be the 20 bytes of code for hello.c that
affects it.

From Lawrence D'Oliveiro@21:1/5 to Michael S on Tue Jun 11 08:33:45 2024

On Sun, 9 Jun 2024 14:12:39 +0300, Michael S wrote:

I don't know exactly what the difference is, but -Oz tends to produce slightly smaller code.

Some kind of “wizard” optimization, no doubt ...

From Lawrence D'Oliveiro@21:1/5 to BGB on Fri Jun 14 03:20:37 2024

On Fri, 7 Jun 2024 02:52:56 -0500, BGB wrote:

On 6/6/2024 7:57 PM, Lawrence D'Oliveiro wrote:

On Wed, 5 Jun 2024 04:01:28 -0500, BGB wrote:

For my bounds-checking in C, there are no syntactic changes to C.

But how efficient is it? Those research papers I mentioned reported
being able to get the execution overhead in Pascal down to something
like 5-10%.

Also somewhere around a 10% slowdown in this case, but this was with dedicated ISA level support and various specialized helper instructions
(to check/set/adjust the pointer bounds bits).

Yeah, see, Pascal was able to do it without all that.
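
For context on where an overhead of around 5-10% comes from: a software bounds check adds a compare and a branch to each indexed access. Below is a minimal C sketch of that general idea. It uses a hypothetical "fat pointer" type (int_span) that carries its own length. It is not BGB's ISA-assisted scheme, nor what any particular Pascal compiler emits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": a base address that carries its own length. */
typedef struct {
    int   *base;
    size_t len;
} int_span;

/* Checked read. The comparison against s.len is the extra per-access
   work that software bounds checking adds. Because the index is
   unsigned, a negative value converted from a signed type becomes huge
   and fails the same single test. Returns false instead of reading out
   of range; *value is left unchanged in that case. */
static bool span_get(int_span s, size_t i, int *value)
{
    assert(value != NULL);
    if (i >= s.len)
        return false;
    *value = s.base[i];
    return true;
}
```

A compiler that can prove i < len, for example in a loop running from 0 to len - 1, can drop the check entirely. This is one reason measured overheads vary so much between implementations.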

From bart@21:1/5 to Keith Thompson on Fri Jun 14 23:39:18 2024

On 14/06/2024 22:30, Keith Thompson wrote:

Now that it's too late to change the definition, I've thought of
something that I think would have been a better way to specify #embed.

Define a new kind of string literal, with a "uc" prefix. `uc"foo"` is
of type `unsigned char[3]`. (Or `const unsigned char[3]`, if that's not
too radical.) Unlike other string literals, there is no implicit
terminating '\0'. Arbitrary byte values can of course be specified in hexadecimal: uc"\x01\x02\x03\x04". Since there's no terminating null character and C doesn't support zero-sized objects, uc"" is a syntax
error.

uc"..." string literals might be made even simpler, for example allowing
only hex digits and not requiring \x (uc"01020304" rather than uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
could be useful in other contexts, and programmers will want
flexibility. Maybe something like hex"01020304" (embedded spaces could
be ignored) could be defined in addition to uc"\x01\x02\x03\x04".
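
C has no uc"..." literal, but part of its effect is already available. When a character array is declared with exactly as many elements as the literal has characters, the terminating '\0' is simply not stored. C allows this; C++ does not. Below is a minimal sketch with made-up array names. Unlike the proposed uc"...", the size has to be written by hand.

```c
#include <assert.h>

/* Exactly 3 elements for 3 characters: C stores 'f','o','o' and drops
   the terminating '\0' because there is no room for it. */
static const unsigned char foo_bytes[3] = "foo";

/* Hex escapes give arbitrary byte values; again no terminator is kept. */
static const unsigned char hex_bytes[4] = "\x01\x02\x03\x04";

static_assert(sizeof foo_bytes == 3, "no terminating null stored");
static_assert(sizeof hex_bytes == 4, "four bytes, no terminator");
```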

That's something I added to string literals in my language within the
last few months. It has nothing to do with embedding (but it can make hex
sequences within strings more efficient, if that approach was used).

Writing byte-at-a-time hex data was always a bit fiddly:

0x12, 0x34, 0xAB, ...
"\x12\x34\xAB...

It was made worse by my preference for `x` being in lower case, and the
hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look wrong.

What I did was create a new, variable-length string escape sequence
that looks like this:

"ABC\h1234AB...\nopq" // hex sequence between ABC & nopq

Hex digits after \h or \H are read in pairs. White space is allowed
between pairs:

"ABC\H 12 34 AB ...\nopq"

The only thing I wasn't sure about was the closing backslash, which
looks at first like another escape code. But I think it is sound,
although it can still be tweaked.
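
Below is a minimal C sketch of how the \h rules described above might be decoded: digits are read in pairs, whitespace may separate pairs, and a backslash ends the run. The function names are hypothetical. This is only one reading of the description, not the actual implementation, which lives in bart's own language.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode the body of a hypothetical \h escape. `src` points just past
   the "\h". Digits are read in pairs, whitespace may separate pairs,
   and a backslash closes the sequence. Decoded bytes go to `out`
   (capacity `cap`). Returns the number of bytes written, or -1 on a
   malformed sequence; on success *end points just past the closing
   backslash. */
static long decode_hex_run(const char *src, unsigned char *out, size_t cap,
                           const char **end)
{
    size_t n = 0;
    assert(src != NULL && out != NULL && end != NULL);
    for (;;) {
        while (isspace((unsigned char)*src))  /* whitespace between pairs */
            src++;
        if (*src == '\\') {                    /* closing backslash */
            *end = src + 1;
            return (long)n;
        }
        int hi = hex_value((unsigned char)*src);
        int lo = hi < 0 ? -1 : hex_value((unsigned char)src[1]);
        if (hi < 0 || lo < 0 || n == cap)
            return -1;  /* bad digit, odd digit count, end of text, or no room */
        out[n++] = (unsigned char)(hi * 16 + lo);
        src += 2;
    }
}
```

Given the characters `12 34 AB\nopq`, this stores 0x12, 0x34 and 0xAB and leaves end pointing at `nopq`. The backslash acts purely as a terminator here, so the following `n` is not treated as a newline escape.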

From David Brown@21:1/5 to Keith Thompson on Sat Jun 15 17:58:22 2024

On 14/06/2024 23:30, Keith Thompson wrote:

David Brown writes:

On 28/05/2024 22:21, Keith Thompson wrote:

David Brown writes:

On 28/05/2024 02:33, Keith Thompson wrote:

[...]

Without some kind of programmer control, I'm concerned that the rules for defining an array so #embed will be correctly optimized will be
spread as lore rather than being specified anywhere.

They might, but I really do not think that is so important, since they will not affect the generated results.

Right, it won't affect the generated results (assuming I use it
correctly). Unless I use `#embed optimize(true)` to initialize
a struct with varying member sizes, but that's my fault because I
asked for it.

I am still not understanding your point. (I am confident that you
have a point, even if I don't get it.)

I cannot see why there would be any need or use for manually adding
optimisation hints or controls in the source code. I cannot see how
there is any possibility of getting incorrect results in any way.

The point is compile-time performance, and perhaps even the ability
to compile at all.
I'm thinking about hypothetical cases where I want to embed a
*very* large file and parsing the comma-delimited sequence could
have unacceptable compile-time performance, perhaps even causing
a compile-time stack overflow depending on how the parser works.
Every time the compiler sees #embed, it has to decide whether to
optimize it or not, and the decision criteria are not specified
anywhere (not at all in the standard, perhaps not clearly in the
compiler's documentation).

Yes, I agree with that. And this is how it should be - this is not
something that should be specified. The C standards give minimum
requirements for things like the number of identifiers or the length
of lines. But pretty much all compilers, for most of the "translation
limits", say they are "limited by the memory of the host computer".
The same will apply to #embed. And some compilers will cope better
than others with huge #embed's, some will be faster, some more memory
efficient. Some will change from version to version. This is not
something that can sensibly be specified or formalized - like pretty
much everything in regard to compilation time, each compiler does the
best it can without any specifications. I'd expect compiler reference
manuals might have hints, such as saying #embed is fastest with
unsigned char arrays (or whatever), but no more than that.

But again - I see no reason for manual optimisation hints, and no
reason for any possible errors.

Let me outline a possible strategy for a compiler like gcc. (I have
not looked at the prototype implementations from thephd, nor any gcc
developer discussions.)

gcc splits the C pre-processor and the compiler itself, and
(currently) communicates dataflow in only one direction, via a
temporary file or a pipe. But the "gcc" (or "g++", according to
preference) driver program calls and coordinates the two programs.

If the pre-processor is called stand-alone, then it will generate a
comma-separated list of integers, helpfully split over multiple lines
of reasonable size. This will clearly always be correct, and always
work, within the compiler's translation limits.
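
The stand-alone fallback described above amounts to printing each byte as a decimal integer, with a few values per line. Below is a minimal sketch of that output step. The helper name and the values-per-line setting are hypothetical choices, and this is not gcc's actual preprocessor code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write `n` bytes as a comma-separated list of decimal integers, the
   form #embed is specified to expand to, with `per_line` values per
   output line. Returns the length written (excluding the terminating
   null), or -1 if `out` (capacity `cap`) is too small. */
static long format_embed_list(const unsigned char *data, size_t n,
                              size_t per_line, char *out, size_t cap)
{
    size_t len = 0;
    assert(per_line > 0 && out != NULL && cap > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        /* ", " between values on a line, ",\n" at each line break */
        const char *sep = (i == 0) ? "" : (i % per_line == 0) ? ",\n" : ", ";
        int w = snprintf(out + len, cap - len, "%s%u", sep, (unsigned)data[i]);
        if (w < 0 || (size_t)w >= cap - len)
            return -1;
        len += (size_t)w;
    }
    return (long)len;
}
```

With the bytes {65, 66, 67} and two values per line, the output is "65, 66," on one line followed by "67" on the next.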

But when the gcc driver calls it, it will have a flag indicating that
the target compiler is gcc and supports an extended pre-processed
syntax (and also that the source is C23 - after all, the C
pre-processor can be used as a macro processor for other files with no
relation to C). Now the pre-processor has a lot more freedom.
Whenever it meets an #embed directive, it can generate a line :

#embed_data 123456

followed in the file by 123456 (or whatever) bytes of binary data.
The C compiler, when parsing this file, will pull that in as a single
blob. Then it is up to the C compiler - which knows how the #embed
data will be used - to tell whether these bytes should be used as
parameters to a macro, initialisation for a char array, or whatever.
And it can use them as efficiently as practically possible. (It is
probably only worth using this for #embed data over a certain size -
smaller #embed's could just generate the integer sequences.)

Nowhere in this is there any call for manual optimisation hints, nor
any risk of incorrect results.

I've kept this on the back burner for a couple of weeks. I'm finally
getting around to posting a followup.

I'm not particularly concerned about compilers processing #embed
incorrectly. It's conceivable that a compiler could incorrectly decide
that it can optimize a particular #embed directive, but I expect
compilers to be conservative, falling back to the specified behavior if
they can't *prove* that an optimization is safe.

I'd expect that too. (Of course there's always the risk of bugs with
weird use-cases.)

I see two conceptual problems with #embed as it's currently defined in
N3220.

First, there's a possible compile-time performance issue for very large embedded files. The (draft) standard calls for #embed to expand to a comma-separated list of integer constant expressions. (I'm not sure why
it didn't specify integer constants.)

My objection is based on the possibility that #embed for a *very* large
file might result in unacceptable time and memory usage during compile
time. I haven't looked into how existing compilers handle large initializers, but I can imagine that parsing such a list might consume
more than O(N) time and/or memory, or at least O(N) with a large
constant. (If parsing long lists of integer constants is expensive for
some compiler, this could be a motivation to optimize that particular
case.)

The point of #embed is to get O(N) scaling - or at least, much closer to
that than compilers do today with an #include of a list of numbers (or
even a string literal). There is little doubt that a big enough #embed
file will consume time and memory that is unacceptable, at least for
some people - all you need is to pick a file bigger than your computer's memory, and you can be reasonably confident that it will be problematic.
But it also seems reasonable to expect that if a file is big enough to
cause trouble for #embed, then any other method of including it in a C
file will be at least as bad and probably /much/ worse.

At worst, #embed is going to be no less efficient than today's solution,
and at best it will be significantly more efficient. I don't think it
is fair to object to it because a given implementation might not reach theoretical optimum efficiencies.

The intent of #embed is to copy the contents of a file at compile time
into an array of unsigned char -- but it's specified in a roundabout way
that requires bizarre usages to work "correctly".

That is one expected use, and will probably be the biggest use by a fair
way, but it is not the only possible use. The specification lets you
have more flexibility. For example, I have a project where I include a
number of files in a structure with a number of unsigned char arrays,
amongst other data - a simpler #embed solution that forced you to have
an unsigned char array might not work with that. (The project predates
#embed and uses a Python script to generate the data.)

I expect at least
some compilers to optimize #embed for better compile-time performance,
but that requires them to determine when optimization is permitted with
no advice from the standard about how to do that. That's going to be moderately difficult for compiler implementers; I'm not too concerned
about that. But it also imposes a burden on programmers, who will have
to use trial and error to determine how to ensure a #embed is optimized.

I am entirely confident that major compiler vendors will optimise the
case of initialising char arrays. For anything else, who cares? It is unlikely that you'd use #embed for other purposes with files that are
big enough for unoptimised implementations to be unreasonably slow. And
if that does turn out to be a problem in practice, then you /know/ you
have huge files and are doing something weird, and you can use something
other than #embed for the purpose in the same way you do today.

Of prime importance is /correctness/ - #embed should give the results
you expect, and I can't see that being a problem. Outside that, #embed
is always going to be at least as efficient as existing solutions, and
usually much faster for cases that matter.

This all assumes that a naive #embed implementation is going to
cause real problems for very large embedded files (compile-time
stack overflows, unreasonably long compile times, or just using so
much memory that system performance is affected). If it turns out
that this isn't the case, then that objection is mostly addressed.

I don't believe "very large" embedded files are of any real-world use in
the first place.

And I don't believe there will be any naïve implementations of any significance. gcc and clang are the only two C compilers with a
realistic future for serious C work with newer standards. Even MS
expect people to use clang for C, as far as I understand it. A number
of other toolchains in the embedded world have switched over, or plan to
do so - it is simply not worth the development effort. Niche C
compilers will continue to exist, but it's unlikely they will bother
with C23.

My other objection is that it's conceptually messy. The expected use
case is in an initializer for an array of unsigned char, but there are
no restrictions on where it can be used.

That is the point.

As a programmer, I want to
copy a file verbatim into an unsigned char array, but at least
conceptually #embed translates the file contents into a long sequence of expressions which are then processed as C code to recreate the raw data. There are bizarre cases (like my previous example initializing a struct
with members of various types) that are required to work. #embed is a preprocessor directive, but determining whether it can be optimized
requires feedback from later compiler phases. It's doable, but it's
*ugly*.

I have discussed in previous posts why I don't think there is an issue
there.

And I think alternative ways to achieve the effect would have their own problems and complications. (I believe there is a proposal for C++ that includes a std::embed() function that can use a constexpr string.)

Now that it's too late to change the definition, I've thought of
something that I think would have been a better way to specify #embed.

Define a new kind of string literal, with a "uc" prefix. `uc"foo"` is
of type `unsigned char[3]`. (Or `const unsigned char[3]`, if that's not
too radical.) Unlike other string literals, there is no implicit
terminating '\0'. Arbitrary byte values can of course be specified in hexadecimal: uc"\x01\x02\x03\x04". Since there's no terminating null character and C doesn't support zero-sized objects, uc"" is a syntax
error.

If you are worried about ugly, few things are uglier than a C string
literal with escaped hex characters. Well, escaped octal characters are
worse.

uc"..." string literals might be made even simpler, for example allowing
only hex digits and not requiring \x (uc"01020304" rather than uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
could be useful in other contexts, and programmers will want
flexibility. Maybe something like hex"01020304" (embedded spaces could
be ignored) could be defined in addition to uc"\x01\x02\x03\x04".

Specify that #embed expands to a sequence of one or more uc string
literals (or hex string literals if that's added), separated by
whitespace. If the embedded file might be empty, use the existing
if_empty() embed parameter. Without if_empty, #embed of an empty file
will expand to uc"", a syntax error.

Since a string literal is a single token, parsing it is likely to be
more efficient than parsing a sequence of integer constant expressions,
even with concatenation of multiple literals. Since a uc"..." string
literal is specifically of type unsigned char[], it can *only* be used
to initialize an unsigned char[] or unsigned char* object, addressing
the conceptual mess. If you want to use #embed to initialize an
array of some other type, you can use a union or some other form of type-punning.

A conforming C23 implementation could even implement this by providing uc"..." (and perhaps hex"...") literals as an extension and adding an implementation-defined embed parameter that generates them.

I am at a loss to see how this would be any improvement.

The efficiency gains of #embed are not because a list of integers is
inherently less efficient than a string literal of some kind. It is
because existing compilers store more information about each element,
and do more checking on each of them (such as for range). With #embed-generated integer lists the compiler would not need to store this
extra information or do the extra checks. Even for "non-optimised"
#embed, I cannot see it being beaten by any kind of string literal
solution by any non-negligible degree.


From David Brown@21:1/5 to bart on Sat Jun 15 19:17:23 2024

On 15/06/2024 00:39, bart wrote:

On 14/06/2024 22:30, Keith Thompson wrote:

Now that it's too late to change the definition, I've thought of
something that I think would have been a better way to specify #embed.

Define a new kind of string literal, with a "uc" prefix. `uc"foo"` is
of type `unsigned char[3]`. (Or `const unsigned char[3]`, if that's not
too radical.) Unlike other string literals, there is no implicit
terminating '\0'. Arbitrary byte values can of course be specified in
hexadecimal: uc"\x01\x02\x03\x04". Since there's no terminating null
character and C doesn't support zero-sized objects, uc"" is a syntax
error.

uc"..." string literals might be made even simpler, for example allowing
only hex digits and not requiring \x (uc"01020304" rather than
uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
could be useful in other contexts, and programmers will want
flexibility. Maybe something like hex"01020304" (embedded spaces could
be ignored) could be defined in addition to uc"\x01\x02\x03\x04".

That's something I added to string literals in my language within the
last few months. Nothing to do with embedding (but it can make hex
sequences within strings more efficient, if that approach was used).

Writing byte-at-a-time hex data was always a bit fiddly:

    0x12, 0x34, 0xAB, ...
    "\x12\x34\xAB...

It was made worse by my preference for `x` being in lower case, and the
hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look wrong.

What I did was create a new, variable-length string escape sequence
that looks like this:

"ABC\h1234AB...\nopq"     // hex sequence between ABC & nopq

Hex digits after \h or \H are read in pairs. White space is allowed
between pairs:

"ABC\H 12 34 AB ...\nopq"

The only thing I wasn't sure about was the closing backslash, which
looks at first like another escape code. But I think it is sound,
although it can still be tweaked.

How often would something like that be useful? I would have thought
that it is rare to see something that is basically text but has enough
odd non-printing characters (other than the common \n, \t, \e) to make
it worth the fuss. If you want to have binary data in something that
looks like a string literal, then just use straight-up two hex digits
per character - "4142431234ab". It's simpler to generate and parse. I
don't see the benefit of something that mixes binary and text data.


From bart@21:1/5 to David Brown on Sat Jun 15 20:27:41 2024

On 15/06/2024 18:17, David Brown wrote:

On 15/06/2024 00:39, bart wrote:

On 14/06/2024 22:30, Keith Thompson wrote:

Now that it's too late to change the definition, I've thought of
something that I think would have been a better way to specify #embed.

Define a new kind of string literal, with a "uc" prefix. `uc"foo"` is
of type `unsigned char[3]`. (Or `const unsigned char[3]`, if that's not
too radical.) Unlike other string literals, there is no implicit
terminating '\0'. Arbitrary byte values can of course be specified in
hexadecimal: uc"\x01\x02\x03\x04". Since there's no terminating null
character and C doesn't support zero-sized objects, uc"" is a syntax
error.

uc"..." string literals might be made even simpler, for example allowing
only hex digits and not requiring \x (uc"01020304" rather than
uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
could be useful in other contexts, and programmers will want
flexibility. Maybe something like hex"01020304" (embedded spaces could
be ignored) could be defined in addition to uc"\x01\x02\x03\x04".

That's something I added to string literals in my language within the
last few months. Nothing to do with embedding (but it can make hex
sequences within strings more efficient, if that approach was used).

Writing byte-at-a-time hex data was always a bit fiddly:

     0x12, 0x34, 0xAB, ...
     "\x12\x34\xAB...

It was made worse by my preference for `x` being in lower case, and
the hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look wrong.

What I did was create a new, variable-length string escape sequence
that looks like this:

   "ABC\h1234AB...\nopq"     // hex sequence between ABC & nopq

Hex digits after \h or \H are read in pairs. White space is allowed
between pairs:

   "ABC\H 12 34 AB ...\nopq"

The only thing I wasn't sure about was the closing backslash, which
looks at first like another escape code. But I think it is sound,
although it can still be tweaked.

How often would something like that be useful? I would have thought
that it is rare to see something that is basically text but has enough
odd non-printing characters (other than the common \n, \t, \e) to make
it worth the fuss. If you want to have binary data in something that
looks like a string literal, then just use straight-up two hex digits
per character - "4142431234ab". It's simpler to generate and parse. I don't see the benefit of something that mixes binary and text data.

That's not the same thing. That sequence "...1234..." occupies 4 bytes
(with values 49 50 51 52), not two bytes (with values 0x12 and 0x34, or
18 and 52).

Here's an example of wanting to print '€4.99', first in C (note that my editor doesn't support Unicode so this stuff is needed):

puts("\xE2\x82\xAC" "4.99");

The euro symbol occupies three bytes in UTF8. It's awkward to type: it
has loads of backslashes, it keeps switching case and it needs more concentration.

Plus I had to split the string since apparently \x doesn't stop at two
hex digits, it keeps going: it would have read \xAC4, which overflows
the 8-bit width of a character anyway, so I don't know what the point is
of reading more than 2 hex characters.

Using my feature, it looks like this:

println "\H E2 82 AC\4.99"

There must be loads of examples of wanting to write many byte values
within strings, which in C can also be used to initialise byte arrays (a
useful feature I've now adopted; see below).

Here's another example, in my language, which is the first 128 bytes of
an EXE file which is constant. It is currently defined like this,
probably created with a script:

[]byte stubdata = (
0x4D, 0x5A, 0x90, 0x00, 0x03, 0x00, 0x00, 0x00,
0x04, 0x00, 0x00, 0x00, 0xFF, 0xFF, 0x00, 0x00,
...

Using the new escape, I can just copy&paste a dump, and use a text
editor to put in the string context needed, which took under a minute:

[]byte stubdata=
b"\H 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00\"+
b"\H B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00\"+
b"\H 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00\"+
b"\H 00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 00\"+
b"\H 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68\"+
b"\H 69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F\"+
b"\H 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20\"+
b"\H 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00\"+
b"\H 50 45 00 00 64 86 04 00 00 00 00 00 00 00 00 00\"

(The 's'/'b' prefixes are needed for strings to have a type of (in C
terms) char[] rather than char*, a detail that C glosses over via some
magic. 's' gives you a zero terminator, 'b' as used here doesn't. The
"+" is used for compile-time string/data-string concatenation.)

In short, more is possible without needing to resort to tools. You can
directly work from a hex dump.


From Lawrence D'Oliveiro@21:1/5 to David Brown on Sat Jun 15 22:37:59 2024

On Sat, 15 Jun 2024 17:58:22 +0200, David Brown wrote:

But it also seems reasonable to expect that if a file is big enough to
cause trouble for #embed, then any other method of including it in a C
file will be at least as bad and probably /much/ worse.

But if you redefine the problem as “any method of including it in a C *program*”, then you realize that there are better techniques that do not involve C extensions.


From Lawrence D'Oliveiro@21:1/5 to bart on Sat Jun 15 22:39:50 2024

On Sat, 15 Jun 2024 20:27:41 +0100, bart wrote:

The "+" is used for compile-time string/data-string concatenation.)

Why didn’t you follow the C convention of implicit concatenation, just by placing literals next to each other?


From Lawrence D'Oliveiro@21:1/5 to BGB on Sat Jun 15 22:42:46 2024

On Fri, 14 Jun 2024 03:13:32 -0500, BGB wrote:

On 6/14/2024 1:53 AM, Bonita Montero wrote:

Am 13.06.2024 um 21:07 schrieb BGB:

One possible justification (albeit a weak one) is that if one
recompiles the program with optimizations turned on, in many cases
this may subtly change the behavior of the program (particularly in
relation to things like the contents of uninitialized variables and
dangling pointers, etc...). ...

If you rely on that you're misusing the language anyway.

It is a poor practice, but seemingly does occur in the wild (intentional
or not).

It seems to me that kind of thing does tend to get flushed out of open-
source code. Because such code is often compiled with different compilers,
on different architectures, using different tool chains etc. And
assumptions like these tend not to survive such treatment.


From bart@21:1/5 to Lawrence D'Oliveiro on Sun Jun 16 00:20:34 2024

On 15/06/2024 23:39, Lawrence D'Oliveiro wrote:

On Sat, 15 Jun 2024 20:27:41 +0100, bart wrote:

The "+" is used for compile-time string/data-string concatenation.)

Why didn’t you follow the C convention of implicit concatenation, just by placing literals next to each other?

Why is that better?

I did actually have that, but it wasn't as useful. It could only work at
the lexical level with actual string literals, for a start.

As it is now I can do this:

const x = "abc"
const y = "def"
const z = x + y # "abcdef"

These are named constants with proper scope, which are only resolved in
a later pass. It also applies to strings created by an embedded file:

s := "(" + sinclude("help.txt") + ")"

I can use parentheses and it will still work:

const cond = ...

print (cond | "abc" | "def") + "xyz"

It will display 'abcxyz' or 'defxyz' depending on 'cond', which is known
at compile-time.

I could choose to implement "*" also ...

(I've just spent 10 minutes doing that)

... so that I can do this, where having proper operators comes in useful:

"A" + "B" * 5 ABBBBB
("A" + "B") * 5 ABABABABAB

Here is a use-case:

const cols = 80
println "-" * cols # output divider line

This is in a lower-level language where strings are not first-class types.

How C does it is a hack that was fine for 1972.


From Lawrence D'Oliveiro@21:1/5 to bart on Sun Jun 16 01:16:57 2024

On Sun, 16 Jun 2024 00:20:34 +0100, bart wrote:

On 15/06/2024 23:39, Lawrence D'Oliveiro wrote:

On Sat, 15 Jun 2024 20:27:41 +0100, bart wrote:

The "+" is used for compile-time string/data-string concatenation.)

Why didn’t you follow the C convention of implicit concatenation, just
by placing literals next to each other?

Why is that better?

Less typing. It's surprising that few other languages that otherwise copy
things from C include that feature. But Python does. E.g.

    toc.write \
      (
            "// Total length: %(total_length)s\n"
            "CD_DA\n"
            "CD_TEXT\n"
            " {\n"
            " LANGUAGE_MAP { 0 : EN }\n"
            " LANGUAGE 0\n"
            " {\n"
            " TITLE \"%(disc_title)s\"\n"
            " PERFORMER \"\"\n"
              # get around off-by-one performer assignment bug in cdrdao
            " }\n"
            " }\n"
        %
            {
                "disc_title" : title_data["disc_title"],
                "total_length" : format_cd_time(title_data["total_nr_frames"], True),
            }
      )

I did actually have that, but it wasn't as useful. It could only work at
the lexical level with actual string literals, for a start.

Of course.


From Lawrence D'Oliveiro@21:1/5 to BGB on Sun Jun 16 03:15:58 2024

On Sat, 15 Jun 2024 20:42:47 -0500, BGB wrote:

I fairly promptly fixed this bug once discovered, and am then just left
to wonder how exactly it managed to work in the first place (or didn't
break already).

Been there, done that.

When I set about to revive the then-moribund NCSA Telnet code for
Macintosh, back around 1990, it looked like it had been abandoned because nobody had the stomach to do the porting from MPW C v2 (created for Apple
by Green Hills) to v3 (developed by Apple itself, thoroughly ANSI-
compliant).

Besides all the compile-time errors, there were dozens, maybe hundreds, of places to be checked for the different, incompatible definition of the Pascal-equivalent “Str255” type, to ensure there were no lurking bugs. I think I got them all.

Then I started up my build, got as far as opening a terminal session,
closed it again, quit ... and the app crashed.

I discovered that the shutdown loop to ensure all open sessions were
closed before quitting had an off-by-1 error in its termination condition:
it was accessing an element in the array of open sessions that didn’t
exist. Somehow this never manifested a problem with the old C compiler,
but it did with the new one.

By the way, that MPW C v3 compiler had some quite amusing error messages. Various people (including myself) independently posted lists of them (and
once I got accused of plagiarizing the list from someone else--as though someone had made them up). Quite a few people couldn’t believe such
messages were real.


From David Brown@21:1/5 to Lawrence D'Oliveiro on Sun Jun 16 16:55:51 2024

On 16/06/2024 00:37, Lawrence D'Oliveiro wrote:

On Sat, 15 Jun 2024 17:58:22 +0200, David Brown wrote:

But it also seems reasonable to expect that if a file is big enough to
cause trouble for #embed, then any other method of including it in a C
file will be at least as bad and probably /much/ worse.

But if you redefine the problem as “any method of including it in a C *program*”, then you realize that there are better techniques that do not involve C extensions.

Yes, as I said.


From David Brown@21:1/5 to bart on Sun Jun 16 16:54:53 2024

On 15/06/2024 21:27, bart wrote:

On 15/06/2024 18:17, David Brown wrote:

On 15/06/2024 00:39, bart wrote:

On 14/06/2024 22:30, Keith Thompson wrote:

Now that it's too late to change the definition, I've thought of
something that I think would have been a better way to specify #embed.

Define a new kind of string literal, with a "uc" prefix. `uc"foo"` is
of type `unsigned char[3]`. (Or `const unsigned char[3]`, if that's not
too radical.) Unlike other string literals, there is no implicit
terminating '\0'. Arbitrary byte values can of course be specified in
hexadecimal: uc"\x01\x02\x03\x04". Since there's no terminating null
character and C doesn't support zero-sized objects, uc"" is a syntax
error.

uc"..." string literals might be made even simpler, for example allowing
only hex digits and not requiring \x (uc"01020304" rather than
uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
could be useful in other contexts, and programmers will want
flexibility. Maybe something like hex"01020304" (embedded spaces could
be ignored) could be defined in addition to uc"\x01\x02\x03\x04".

That's something I added to string literals in my language within the
last few months. Nothing to do with embedding (but it can make hex
sequences within strings more efficient, if that approach was used).

Writing byte-at-a-time hex data was always a bit fiddly:

     0x12, 0x34, 0xAB, ...
     "\x12\x34\xAB...

It was made worse by my preference for `x` being in lower case, and
the hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look
wrong.

What I did was create a new, variable-length string escape sequence
that looks like this:

   "ABC\h1234AB...\nopq"     // hex sequence between ABC & nopq

Hex digits after \h or \H are read in pairs. White space is allowed
between pairs:

   "ABC\H 12 34 AB ...\nopq"

The only thing I wasn't sure about was the closing backslash, which
looks at first like another escape code. But I think it is sound,
although it can still be tweaked.

How often would something like that be useful? I would have thought
that it is rare to see something that is basically text but has enough
odd non-printing characters (other than the common \n, \t, \e) to make
it worth the fuss. If you want to have binary data in something that
looks like a string literal, then just use straight-up two hex digits
per character - "4142431234ab". It's simpler to generate and parse.
I don't see the benefit of something that mixes binary and text data.

That's not the same thing. That sequence "...1234..." occupies 4 bytes
(with values 49 50 51 52), not two bytes (with values 0x12 and 0x34, or
18 and 52).

Here's an example of wanting to print '€4.99', first in C (note that my editor doesn't support Unicode so this stuff is needed):

   puts("\xE2\x82\xAC" "4.99");

The euro symbol occupies three bytes in UTF8. It's awkward to type: it
has loads of backslashes, it keeps switching case and it needs more concentration.

Plus I had to split the string since apparently \x doesn't stop at two
hex digits, it keeps going: it would have read \xAC4, which overflows
the 8-bit width of a character anyway, so I don't know what the point is
of reading more than 2 hex characters.

Using my feature, it looks like this:

    println "\H E2 82 AC\4.99"

I don't see any improvement of significance. The improvement, if any,
is very minor.

(I gather you have other conveniences for your language's printing
features when converting various types, but that's a different matter.)

The obvious answer to writing this kind of thing is simply to switch to
an editor that supports UTF-8. That has been the obvious answer for a
couple of decades.

There must be loads of examples of wanting to write many byte values
within strings, which in C can also be used to initialise byte arrays (a useful feature I've now adopted; see below).

Here's another example, in my language, which is the first 128 bytes of
an EXE file which is constant. It is currently defined like this,
probably created with a script:

[]byte stubdata = (
    0x4D, 0x5A, 0x90, 0x00, 0x03, 0x00, 0x00, 0x00,
    0x04, 0x00, 0x00, 0x00, 0xFF, 0xFF, 0x00, 0x00,
    ...

Using the new escape, I can just copy&paste a dump, and use a text
editor to put in the string context needed, which took under a minute:

[]byte stubdata=
b"\H 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00\"+
b"\H B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00\"+
b"\H 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00\"+
b"\H 00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 00\"+
b"\H 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68\"+
b"\H 69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F\"+
b"\H 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20\"+
b"\H 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00\"+
b"\H 50 45 00 00 64 86 04 00 00 00 00 00 00 00 00 00\"

Why bother with the \H stuff? That's my point - use hex data for data,
and text for text. Mixing these is not common enough to make the extra
fuss worthwhile for such negligible extra convenience.

My suggestion is that it could be helpful to have binary blobs written
as hex digits without escapes anywhere, because it is /just/ binary
data. I don't object to having optional spaces - that's a fine idea.
But just write :

b"4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00"
b"B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00"

The extra "\H" adds nothing useful.

(The 's'/'b' prefixes are needed for strings to have a type of (in C
terms) char[] rather than char*, a detail that C glosses over via some
magic. 's' gives you a zero terminator, 'b' as used here doesn't. The
"+" is used for compile-time string/data-string concatenation.)

In short, more is possible without needing to resort to tools. You can directly work from a hex dump.


From bart@21:1/5 to David Brown on Sun Jun 16 20:00:45 2024

On 16/06/2024 15:54, David Brown wrote:

On 15/06/2024 21:27, bart wrote:

On 15/06/2024 18:17, David Brown wrote:

On 15/06/2024 00:39, bart wrote:

On 14/06/2024 22:30, Keith Thompson wrote:

Now that it's too late to change the definition, I've thought of
something that I think would have been a better way to specify #embed.

Define a new kind of string literal, with a "uc" prefix. `uc"foo"` is
of type `unsigned char[3]`. (Or `const unsigned char[3]`, if that's not
too radical.) Unlike other string literals, there is no implicit
terminating '\0'. Arbitrary byte values can of course be specified in
hexadecimal: uc"\x01\x02\x03\x04". Since there's no terminating null
character and C doesn't support zero-sized objects, uc"" is a syntax
error.

uc"..." string literals might be made even simpler, for example allowing
only hex digits and not requiring \x (uc"01020304" rather than
uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
could be useful in other contexts, and programmers will want
flexibility. Maybe something like hex"01020304" (embedded spaces could
be ignored) could be defined in addition to uc"\x01\x02\x03\x04".

That's something I added to string literals in my language within
the last few months. Nothing to do with embedding (but it can make hex
sequences within strings more efficient, if that approach was used).

Writing byte-at-a-time hex data was always a bit fiddly:

     0x12, 0x34, 0xAB, ...
     "\x12\x34\xAB...

It was made worse by my preference for `x` being in lower case, and
the hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look
wrong.

What I did was create a new, variable-length string escape sequence
that looks like this:

   "ABC\h1234AB...\nopq"     // hex sequence between ABC & nopq

Hex digits after \h or \H are read in pairs. White space is allowed
between pairs:

   "ABC\H 12 34 AB ...\nopq"

The only thing I wasn't sure about was the closing backslash, which
looks at first like another escape code. But I think it is sound,
although it can still be tweaked.

How often would something like that be useful? I would have thought
that it is rare to see something that is basically text but has
enough odd non-printing characters (other than the common \n, \t, \e)
to make it worth the fuss. If you want to have binary data in
something that looks like a string literal, then just use straight-up
two hex digits per character - "4142431234ab". It's simpler to
generate and parse. I don't see the benefit of something that mixes
binary and text data.

That's not the same thing. That sequence "...1234..." occupies 4 bytes
(with values 49 50 51 52), not two bytes (with values 0x12 and 0x34,
or 18 and 52).

Here's an example of wanting to print '€4.99', first in C (note that
my editor doesn't support Unicode so this stuff is needed):

    puts("\xE2\x82\xAC" "4.99");

The euro symbol occupies three bytes in UTF8. It's awkward to type: it
has loads of backslashes, it keeps switching case and it needs more
concentration.

Plus I had to split the string since apparently \x doesn't stop at two
hex digits, it keeps going: it would have read \xAC4, which overflows
the 8-bit width of a character anyway, so I don't know what the point
is of reading more than 2 hex characters.

Using my feature, it looks like this:

     println "\H E2 82 AC\4.99"

I don't see any improvement of significance. The improvement, if any,
is very minor.

The difference is that it can be typed fluently without that annoying \x between every number. Plus I can add white space for grouping without it affecting the data.

(I gather you have other conveniences for your language's printing
features when converting various types, but that's a different matter.)

The obvious answer to writing this kind of thing is simply to switch to
an editor that supports UTF-8.

It never happens that you want to type a bunch of hex byte values to
initialise a byte array? OK.

Why bother with the \H stuff? That's my point - use hex data for data,
and text for text. Mixing these is not common enough to make the extra
fuss worthwhile for such negligible extra convenience.

My suggestion is that it could be helpful to have binary blobs written
as hex digits without escapes anywhere, because it is /just/ binary
data. I don't object to having optional spaces - that's a fine idea.
But just write :

    b"4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00"
    b"B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00"

The extra "\H" adds nothing useful.

Is this a separate feature using 'b'? Because in my scheme, \H is just
another string escape code, which can be used in ordinary strings, and
b"" strings define char[] data which can include normal text data too.

So my example could have been written as b"MZ\h 90 00 03 ..."

I did look at having a separate feature, but I didn't want that. I ended
up with this scheme for data-strings, here expressed using C types:

                   Can initialise:

"abcd"             char* only
s"abcd"            char*, char[] or any T[]; zero-terminated
b"abcd"            char*, char[] or any T[]

sinclude"file"     char*, char[] or any T[]; zero-terminated
binclude"file"     char*, char[] or any T[]

The first 3 can include any string escapes including \H...\

The last two embed file data, binary or text. But if a normal C-style
string is needed with no embedded zeros except at the end, sinclude
should be used with a text file.

(The 's'/'b' prefixes are needed for strings to have a type of (in C
terms) char[] rather than char*, a detail that C glosses over via some
magic. 's' gives you a zero terminator, 'b' as used here doesn't. The
"+" is used for compile-time string/data-string concatenation.)

In short, more is possible without needing to resort to tools. You can
directly work from a hex dump.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Lawrence D'Oliveiro@21:1/5 to Chris M. Thomasson on Mon Jun 17 00:03:49 2024

On Sun, 16 Jun 2024 12:31:13 -0700, Chris M. Thomasson wrote:

Code up an L-System for fun:

Been done ... lots. Particularly fun when it works within my favourite 3D
app:

<https://github.com/krljg/lsystem>
<https://blendermarket.com/products/lsystem>

Just a couple that I found with a quick search <https://www.google.com/search?q=blender+addon+lsystem>.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Brown@21:1/5 to bart on Mon Jun 17 10:49:04 2024

On 16/06/2024 21:00, bart wrote:

On 16/06/2024 15:54, David Brown wrote:

On 15/06/2024 21:27, bart wrote:

On 15/06/2024 18:17, David Brown wrote:

On 15/06/2024 00:39, bart wrote:

On 14/06/2024 22:30, Keith Thompson wrote:

Now that it's too late to change the definition, I've thought of
something that I think would have been a better way to specify
#embed.

Define a new kind of string literal, with a "uc" prefix. `uc"foo"` is
of type `unsigned char[3]`. (Or `const unsigned char[3]`, if that's
not too radical.) Unlike other string literals, there is no implicit
terminating '\0'. Arbitrary byte values can of course be specified in
hexadecimal: uc"\x01\x02\x03\x04". Since there's no terminating null
character and C doesn't support zero-sized objects, uc"" is a syntax
error.

uc"..." string literals might be made even simpler, for example
allowing only hex digits and not requiring \x (uc"01020304" rather
than uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
could be useful in other contexts, and programmers will want
flexibility. Maybe something like hex"01020304" (embedded spaces could
be ignored) could be defined in addition to uc"\x01\x02\x03\x04".

That's something I added to string literals in my language within
the last few months. Nothing to do with embedding (but it can make hex
sequences within strings more efficient, if that approach was used).
Writing byte-at-a-time hex data was always a bit fiddly:

     0x12, 0x34, 0xAB, ...
     "\x12\x34\xAB...

It was made worse by my preference for `x` being in lower case, and
the hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look
wrong.

What I did was create a new, variable-length string escape
sequence that looks like this:

   "ABC\h1234AB...\nopq"     // hex sequence between ABC & nopq
Hex digits after \h or \H are read in pairs. White space is allowed
between pairs:

   "ABC\H 12 34 AB ...\nopq"

The only thing I wasn't sure about was the closing backslash, which
looks at first like another escape code. But I think it is sound,
although it can still be tweaked.

How often would something like that be useful? I would have thought
that it is rare to see something that is basically text but has
enough odd non-printing characters (other than the common \n, \t,
\e) to make it worth the fuss. If you want to have binary data in
something that looks like a string literal, then just use
straight-up two hex digits per character - "4142431234ab". It's
simpler to generate and parse. I don't see the benefit of something
that mixes binary and text data.

That's not the same thing. That sequence "...1234..." occupies 4
bytes (with values 49 50 51 52), not two bytes (with values 0x12 and
0x34, or 18 and 52).

Here's an example of wanting to print '€4.99', first in C (note that
my editor doesn't support Unicode so this stuff is needed):

    puts("\xE2\x82\xAC" "4.99");

The euro symbol occupies three bytes in UTF8. It's awkward to type:
it has loads of backslashes, it keeps switching case and it needs
more concentration.

Plus I had to split the string since apparently \x doesn't stop at
two hex digits, it keeps going: it would have read \xAC4, which
overflows the 8-bit width of a character anyway, so I don't know what
the point is of reading more than 2 hex characters.

Using my feature, it looks like this:

     println "\H E2 82 AC\4.99"

I don't see any improvement of significance. The improvement, if any,
is very minor.

The difference is that it can be typed fluently without that annoying \x between every number. Plus I can add white space for grouping without it affecting the data.

I realise you think your system is much nicer - otherwise you would not
have implemented it! /I/ don't think it is a big improvement. It is
certainly not big enough to be worth the effort of changing real
languages or tools used by lots of people rather than just a single
person. And I think the termination using "\" is a step backwards - now
"\" is no longer an escape character, but has different purposes in
different places. One and a half steps forward, one step back, is not
worth the effort - especially when you can so easily go several steps
forward with the format I suggested.

(I gather you have other conveniences for your language's printing
features when converting various types, but that's a different matter.)

The obvious answer to writing this kind of thing is simply to switch
to an editor that supports UTF-8.

It never happens that you want to type a bunch of hex byte values to initialise a byte array? OK.

It /does/ happen. In such cases, I type a bunch of hex values.

What doesn't happen is that I have a UTF-8 text and I choose to write
that using hex values. I much prefer to write the UTF-8 text using an
editor that supports UTF-8 and tools that work with UTF-8.

Why bother with the \H stuff? That's my point - use hex data for
data, and text for text. Mixing these is not common enough to make it
worth the extra fuss you have to give such negligible extra convenience.

My suggestion is that it could be helpful to have binary blobs written
as hex digits without escapes anywhere, because it is /just/ binary
data. I don't object to having optional spaces - that's a fine idea.
But just write :

     b"4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00"
     b"B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00"

The extra "\H" adds nothing useful.

Is this a separate feature using 'b'?

Yes - that's the point. It would be for expressing binary blob data in
a compact form as a string of hex digits, with or without spaces, and convenient for copy-and-paste from hex editors and other such sources.
You could happily use h"..." rather than b"..." if you prefer. And I
suppose it could be extended to support lumps bigger than 8 bits, but
then endian issues complicate matters and I suspect it is not worth the
effort.

Because in my scheme, \H is just
another string escape code, which can be used in ordinary strings,

That is what I would want to avoid. Being able to mix such data is a disadvantage, not an advantage. (IMHO, of course.)

and
b"" strings define char[] data which can include normal text data too.

So my example could have been written as b"MZ\h 90 00 03 ..."

And that kind of monstrosity is what I was trying to get away from.

I did look at having a separate feature, but I didn't want that. I ended
up with this scheme for data-strings, here expressed using C types:

                   Can initialise:

"abcd"             char* only
s"abcd"            char*, char[] or any T[]; zero-terminated
b"abcd"            char*, char[] or any T[]

sinclude"file"     char*, char[] or any T[]; zero-terminated
binclude"file"     char*, char[] or any T[]

It is a mistake to have too many similar-looking alternatives with
different rules as to when and where they can be used.

Changing existing languages is always difficult, or even impossible.
But my suggestion here is that there should be two different kinds of
literals:

"Hello, world!"

and

b"00 12 34"

The former is always a string, always UTF-8, in whatever format the
language uses for strings (zero-terminated, Pascal style, or whatever).
The latter is a compact way of writing binary blobs in hex when needed,
and is always a constant array of bytes.

The first 3 can include any string escapes including \H...\

The last two embed file data, binary or text. But if a normal C-style
string is needed with no embedded zeros except at the end, sinclude
should be used with a text file.

(The 's'/'b' prefixes are needed for strings to have a type of (in C
terms) char[] rather than char*, a detail that C glosses over via
some magic. 's' gives you a zero terminator, 'b' as used here
doesn't. The "+" is used for compile-time string/data-string
concatenation.)

In short, more is possible without needing to resort to tools. You can
directly work from a hex dump.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Michael S@21:1/5 to bart on Mon Jun 17 13:18:00 2024

On Sun, 16 Jun 2024 20:00:45 +0100
bart <[email protected]> wrote:

I don't see any improvement of significance. The improvement, if
any, is very minor.

The difference is that it can be typed fluently without that annoying
\x between every number.

It does not sound like a big obstacle. If you are typing something
long, just type '-' instead of '\x' and do find&replace after you
finished.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
