On 22/05/2024 17:55, David Brown wrote:
In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change in
it, especially compared to the minor changes in C17.
<https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf>
<https://en.wikipedia.org/wiki/C23_(C_standard_revision)>
<https://en.cppreference.com/w/c/23>
I like that it tidies up a lot of old stuff - it is neater to have
things like "bool", "static_assert", etc., as part of the language
rather than needing a half-dozen includes for such basic stuff.
I like that it standardises several useful extensions that have been
in gcc and clang (and possibly other compilers) for many years.
I'm not sure it will make a big difference to my own programming -
when I want "typeof" or "ckd_add()", I already use them in gcc. But
for people restricted to standard C, there's more new to enjoy. And I
prefer to use standard syntax when possible.
"constexpr" is something I think I will find helpful, in at least some
circumstances.
So I'm currently writing some code (you can follow my progress on
github, it is a new branch in the Baby X resource compiler project). And
it's just standard well understood algorithm code to manipulate XML
trees. And I certainly don't feel the need for static_assert.
But even boolean type and const.
Of course quite a lot of the functions don't
actually change the structures they are passed. But is littering the
code with const going to help? And why do you really need a boolean when
an int can hold either a zero or non-zero value?
And don't you just want a pared down, clean language?
On 22/05/2024 13:55, David Brown wrote:
[...]
I am waiting for MSVC support. There are a lot of simple features MSVC could implement and deliver in small increments. But it is very slow.
I would use these today if I had them:
- #warning
- [[nodiscard]]
- typeof
- digit separators
- bool true, false
I am not planning to use:
- enum with specific types.
- #elifdef
- nullptr
- auto
- constexpr
Not sure
- empty initializer
I like the idea of embed ...
<https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf>
... code will be written to use it.
Lawrence D'Oliveiro <[email protected]d> writes:
On Wed, 22 May 2024 22:23:26 -0300, Thiago Adams wrote:
I like the idea of embed ...
We’ve discussed this before. It just seems like a sop to those stuck
with antiquated, crippled build systems. In which case, how would they
get an up-to-date compiler that supports it?
Presumably by waiting until compilers support it, like any new feature.
static int haserror(LEXER *lex)
{
    return lex->error[0] ? 1 : 0;
}
error is a character buffer which holds the error message if an error
has been encountered. And for convenience it is placed in the
lexer. If there is no error, it holds the empty string. However it's
not entirely obvious that testing the message directly is the way you
should be testing for an error condition, so I wrote that little
function to make things clearer.
It's easy enough to make it return a boolean, of course. But I don't
see a real benefit.
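For comparison, a minimal sketch of the same helper returning bool. The LEXER layout below is an assumption made up for illustration (the real structure is not shown in this thread), and seterror() is a hypothetical helper; only the error buffer matters here.

```c
#include <stdbool.h>   /* not needed in C23, where bool is a keyword */
#include <string.h>

/* Hypothetical stand-in for the real LEXER; only 'error' is used. */
typedef struct {
    char error[256];   /* empty string means "no error" */
} LEXER;

/* Same test as the int version, but the return type documents intent. */
static bool haserror(const LEXER *lex)
{
    return lex->error[0] != '\0';
}

/* Illustrative helper: record an error message, truncating if needed. */
static void seterror(LEXER *lex, const char *msg)
{
    strncpy(lex->error, msg, sizeof lex->error - 1);
    lex->error[sizeof lex->error - 1] = '\0';
}
```

Whether that is a "real benefit" is the point under debate; the behaviour is identical, only the declared type changes.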
Em 5/22/2024 7:53 PM, Keith Thompson escreveu:
But const doesn't mean constant. It means read-only.
`const int r = rand();` is perfectly valid.
I dislike the C++ hack of making N a constant expression given
`const int N = 42;`; constexpr made that unnecessary. C23 makes the
same (IMHO) mistake.
If I had a time machine, I'd spell "const" as "readonly" and make
"const" mean what "constexpr" now means (evaluated at compile time).
[...]
Everything is a mess: const in C++, the differences from const in C,
etc. constexpr in C23 just makes the mess bigger.
auto is a mess as well, not well specified for pointers. Not sure if we
had this topic here, but auto * p in C is not specified.
I would remove from C23
- nullptr
- auto
- constexpr
- embed
I like the idea of embed but there is no implementation in production so
this is crazy!
David Brown <[email protected]> writes:
On 22/05/2024 19:42, Thiago Adams wrote:
[...]
- nullptr
I am fond of nullptr in C++, and will use it in C. Like most of the
C23 changes, it's not a big issue - after all, you get a lot of the
same effect with "#define nullptr (void*)(0)" or similar. But it
means your code has a visual distinction between the integer 0 and a
null pointer, and also lets the compiler or other static checking
system check better than using NULL would. (And I don't like NULL - I
dislike all-caps identifiers in general.)
Quibble: That should be
#define nullptr ((void*)0)
For example, this doesn't produce a syntax error for `sizeof nullptr`.
Better:
#if __STDC_VERSION__ < 202311L
#define nullptr ((void*)0)
#endif
C23's nullptr is of type nullptr_t, not void*. But you'd probably have
to go out of your way for that to be an issue (e.g., using nullptr in a generic selection).
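A small sketch of where the difference shows up, assuming only the fallback macro suggested above. NULLPTR_KIND, nullptr_kind() and is_null() are names invented for this example. With the macro, the generic selection picks the void * branch; with a real C23 nullptr (type nullptr_t) it should fall through to the default branch. Ordinary comparisons behave the same either way.

```c
/* Fallback for pre-C23 compilers, as suggested above. */
#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 202311L
#define nullptr ((void*)0)
#endif

/* Reports how the null pointer constant is typed: void * with the
   macro fallback, something else (nullptr_t) with real C23 nullptr. */
#define NULLPTR_KIND(x) _Generic((x), void *: "void *", default: "not void *")

static const char *nullptr_kind(void)
{
    return NULLPTR_KIND(nullptr);
}

/* Everyday use: comparing a pointer against nullptr works in both cases. */
static int is_null(const void *p)
{
    return p == nullptr;
}
```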
[...]
- constexpr
I will definitely use that. Sometimes I want a constant expression
for things like array sizes or static initialisers, and want to
calculate it. constexpr gives you that without having to resort to
macros. (I'd perhaps be even happier if I could just use const, as I
can in C++.)
But const doesn't mean constant. It means read-only.
`const int r = rand();` is perfectly valid.
I dislike the C++ hack of making N a constant expression given
`const int N = 42;`; constexpr made that unnecessary. C23 makes the
same (IMHO) mistake.
If I had a time machine, I'd spell "const" as "readonly" and make
"const" mean what "constexpr" now means (evaluated at compile time).
[...]
On Wed, 22 May 2024 14:42:58 -0300, Thiago Adams wrote:
I am waiting MSVC support. There are a lot of simple features MSVC
could implement and deliver in small increments. But it is very
slow.
And they wonder why developers are deserting the Windows platform for
Linux.
On 22/05/2024 17:11, David Brown wrote:
On 22/05/2024 19:42, Thiago Adams wrote:
On 22/05/2024 13:55, David Brown wrote:
In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change
in it, especially compared to the minor changes in C17.
<https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf>
<https://en.wikipedia.org/wiki/C23_(C_standard_revision)>
<https://en.cppreference.com/w/c/23>
- constexpr
I will definitely use that. Sometimes I want a constant expression
for things like array sizes or static initialisers, and want to
calculate it. constexpr gives you that without having to resort to
macros. (I'd perhaps be even happier if I could just use const, as I
can in C++.)
I am curious about that. Do you have a sample?
Not sure
- empty initializer
I don't see that one being a big hit, at least for me. But I see
little benefit in /not/ allowing it in the language, so it seems a
sensible addition.
This is what I use
struct X x = {0};
But I can do a find-replace and change everything to {}
When I create samples, I use new features like nullptr and {}.
The problem I see is using these features in real code and creating a
mess of styles.
I will definitely use that. Sometimes I want a constant expression
for things like array sizes or static initialisers, and want to
calculate it. constexpr gives you that without having to resort to
macros.
On 23/05/2024 09:17, David Brown wrote:
If I try to be precise about the terms "constant expression", "integer
constant expression", etc., I suspect I will get the details wrong
unless I spend a lot of time checking carefully. So I hope it is good
enough for me to be a bit lazy and quote the error messages from gcc
(with "-std=c23 -Wpedantic").
With this code, compilation fails "initialiser element is not a
constant" for y.
int x = 100;
int y = x / 20;
int zs[y];
With this code, compilation fails because the zs is actually a VLA,
and "variably modified 'zs' at file scope" is not allowed.
const int x = 100;
const int y = x / 20;
int zs[y];
This code, however, is fine:
constexpr int x = 100;
constexpr int y = x / 20;
int zs[y];
This also works, even for older standards:
enum { x = 100 };
enum { y = x / 20 };
int zs[y];
But constexpr works for other types, not just "int" which is the type
of all enumeration constants. (And "enum" constants are a somewhat
weird way to get this effect - "constexpr" looks neater.)
And in general, I like to be able to say, to the compiler and to
people reading the code, "this thing is really fixed and constant, and
stop compiling if you think I am wrong" rather than just "I promise I
won't change this thing - or if I do, I don't mind the nasal daemons".
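To illustrate the "other types" point, here is a hedged sketch that uses constexpr when the compiler reports C23 and falls back to the enum/macro idiom otherwise. The names scale, scaled and zs_count() are made up for the example, and the version test assumes the compiler sets __STDC_VERSION__ to 202311L or later in C23 mode.

```c
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
/* C23: named constants of any object type, usable in constant expressions. */
constexpr int x = 100;
constexpr int y = x / 20;
constexpr double scale = 2.5;
#else
/* Pre-C23 fallback: enum for the integers, a macro for the double. */
enum { x = 100, y = x / 20 };
#define scale 2.5
#endif

int zs[y];                          /* size must be an integer constant expression */
static double scaled = scale * 4;   /* static initialiser must be constant too */

/* Number of elements in zs, computed from the array type itself. */
static int zs_count(void)
{
    return (int)(sizeof zs / sizeof zs[0]);
}
```

The fallback branch shows the limitation being discussed: the enum covers the int constants, but the double needs a macro.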
We can write:
#define X 100
#define Y ((X) / 20)
int zs[Y];
I cannot see a good justification for constexpr.
I already see bad uses of constexpr in C++ code. It was used in cases
where we know for sure that the value is NOT computed at compile time.
This just makes review harder ("why did someone put this here?"); the
conclusion was that it was totally unnecessary and ignored by the
compiler. The programmer was trying to add something extra, like
"magic", hoping for something that would never happen.
On 23/05/2024 02:21, Thiago Adams wrote:
Em 5/22/2024 7:53 PM, Keith Thompson escreveu:
But const doesn't mean constant. It means read-only.
`const int r = rand();` is perfectly valid.
I dislike the C++ hack of making N a constant expression given
`const int N = 42;`; constexpr made that unnecessary. C23 makes the
same (IMHO) mistake.
If I had a time machine, I'd spell "const" as "readonly" and make
"const" mean what "constexpr" now means (evaluated at compile time).
[...]
Everything is a mess: const in C++, the differences from const in C,
etc. constexpr in C23 just makes the mess bigger.
auto is a mess as well, not well specified for pointers. Not sure if we
had this topic here, but auto * p in C is not specified.
I would remove from C23
- nullptr
- auto
- constexpr
- embed
I like the idea of embed but there is no implementation in production
so this is crazy!
'embed' was discussed a few months ago. I disagreed with the poor way it
was to be implemented: 'embed' notionally generates a list of
comma-separated numbers as tokens, where you have to take care of any
trailing zero yourself if needed. It would also be hopelessly
inefficient if actually implemented like that.
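For readers who have not seen the directive, a rough sketch of the usage being described. It assumes a preprocessor that provides the C23 __has_embed test and the standard suffix parameter, and it embeds this source file itself (__FILE__) purely so the example needs no separate data file. The names USE_EMBED, blob and blob_size() are invented here. On compilers without #embed it falls back to a hand-written list.

```c
#include <stddef.h>

/* Use #embed only where the preprocessor advertises it and can find
   the resource; __FILE__ is a stand-in for a real data file. */
#if defined(__has_embed)
#  if __has_embed(__FILE__) == __STDC_EMBED_FOUND__
#    define USE_EMBED 1
#  endif
#endif

static const unsigned char blob[] = {
#ifdef USE_EMBED
#  embed __FILE__ suffix(, 0)   /* we append the trailing zero ourselves */
#else
    'n', 'o', ' ', 'e', 'm', 'b', 'e', 'd', 0   /* fallback for older compilers */
#endif
};

/* Size of the embedded data, including the trailing zero. */
static size_t blob_size(void)
{
    return sizeof blob;
}
```

The directive expands "as if" to the comma-separated list, which is why it sits inside an ordinary brace-enclosed initialiser.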
On Wed, 22 May 2024 18:55:36 +0200, David Brown wrote:
<https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf>
Unicode identifiers!
typedef int typėdef;
On 5/22/2024 9:55 AM, David Brown wrote:
In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change in
it, especially compared to the minor changes in C17.
Love the way std::vectors respect alignas... C++20, iirc?
[...]
On Wed, 22 May 2024 22:11:44 +0200
David Brown <[email protected]> wrote:
I will definitely use that. Sometimes I want a constant expression
for things like array sizes or static initialisers, and want to
calculate it. constexpr gives you that without having to resort to
macros.
I don't say that everything that can be done with C23 constexpr can be
done with enum, but for uses like ones you mentioned above, 90%
probably can be done with enum.
On Wed, 22 May 2024 18:55:36 +0200
David Brown <[email protected]> wrote:
In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change
in it, especially compared to the minor changes in C17.
<https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf>
<https://en.wikipedia.org/wiki/C23_(C_standard_revision)>
<https://en.cppreference.com/w/c/23>
I like that it tidies up a lot of old stuff - it is neater to have
things like "bool", "static_assert", etc., as part of the language
rather than needing a half-dozen includes for such basic stuff.
I like that it standardises several useful extensions that have
been in gcc and clang (and possibly other compilers) for many years.
I'm not sure it will make a big difference to my own programming -
when I want "typeof" or "ckd_add()", I already use them in gcc. But
for people restricted to standard C, there's more new to enjoy. And
I prefer to use standard syntax when possible.
"constexpr" is something I think I will find helpful, in at least
some circumstances.
Removed
1) Old-style function declarations and definitions
2) Representations for signed integers other than two's complement
3) Permission that u/U-prefixed character constants and string
literals may be not UTF-16/32
4) Mixed wide string literal concatenation
5) Support for calling realloc() with zero size (the behavior becomes undefined)
6) __alignof_is_defined and __alignas_is_defined
7) static_assert is not provided as a macro defined in <assert.h>
(becomes a keyword)
8) thread_local is not provided as a macro defined
in <threads.h> (becomes a keyword)
1) good
2) good, but insufficient. The next logical step is to make both left
and right shifts of negative integers, by a count that does not exceed
the number of bits in the respective type, fully defined
3) IDNC
4) IDNC
5) IDNC
6) IDNC
7) bad. Breaks existing code for weak reason
8) bad. Breaks existing code for weak reason
In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change
in it, especially compared to the minor changes in C17.
On Thu, 23 May 2024 02:49:37 -0000 (UTC)
Lawrence D'Oliveiro <[email protected]d> wrote:
On Wed, 22 May 2024 14:42:58 -0300, Thiago Adams wrote:
I am waiting MSVC support. There are a lot of simple features MSVC
could implement and deliver in small increments. But it is very
slow.
And they wonder why developers are deserting the Windows platform for
Linux.
In practice, on my old home Windows PC (11 y.o. installation of 14 y.o.
OS) today, 2024-05-23, I can easily install and use gcc14.1.0 alongside
clang18.1.5 alongside one of the newest versions of MSVC (not sure
which one) alongside latest Intel ICC alongside any older version of
MSVC and ICC and with a little more effort and disk space alongside
older versions of clang and gcc at least as long back as gcc4.9. I can
use all of those either simultaneously or interchangeably.
I very much doubt that I can get similar variety of compiler versions
on Linux of similar age or even on one that is 5 years younger. Even on
most up to date Linux distros, in order to get such a compiler zoo, I'd
probably have to fight against package manager rather than be assisted
by it.
Michael S <[email protected]> writes:
On Wed, 22 May 2024 22:11:44 +0200
David Brown <[email protected]> wrote:
I will definitely use that. Sometimes I want a constant expression
for things like array sizes or static initialisers, and want to
calculate it. constexpr gives you that without having to resort to
macros.
I don't say that everything that can be done with C23 constexpr can
be done with enum, but for uses like ones you mentioned above, 90%
probably can be done with enum.
Are C23 enums signed? or unsigned? What is the supported enum
range?
Another area that was mostly unchanged since 1st edition of K&R is
storage classes. Even such obvious thing as removal of 'auto' class
took too long. If I am not mistaken, totally obsolete 'register' class
is still allowed. And I don't remember any additions.
On Wed, 22 May 2024 18:55:36 +0200
David Brown <[email protected]> wrote:
In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change
in it, especially compared to the minor changes in C17.
Why is the C Standard Committee, while recently quite liberal in
introducing new keywords (too liberal for my liking, many new things
do not really deserve keywords not prefixed by __), so conservative
in introducing program control constructs? I don't remember any
new program control construct introduced under the Committee regime.
And I want at least one.
Another area that was mostly unchanged since 1st edition of K&R is
storage classes. Even such obvious thing as removal of 'auto' class
took too long. If I am not mistaken, totally obsolete 'register' class
is still allowed.
And I don't remember any additions.
Personally I can think about at least two useful backward-compatible additions in that area.
So yes, I /could/ use enum constants for things that are not
enumerations. I /did/ use them for that. But going forward with C23,
I'll use constexpr instead.
Michael S <[email protected]> writes:
On Wed, 22 May 2024 18:55:36 +0200
David Brown <[email protected]> wrote:
In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change
in it, especially compared to the minor changes in C17.
Why is the C Standard Committee, while recently quite liberal in
introducing new keywords (too liberal for my liking, many new things
do not really deserve keywords not prefixed by __), so conservative
in introducing program control constructs? I don't remember any
new program control construct introduced under the Committee regime.
And I want at least one.
Which is?
New keywords are typically prefixed by an underscore and an upper case letter, such as C11's "_Generic". There are no (standard) keywords
starting with "__".
On 23/05/2024 16:19, Michael S wrote:
On Wed, 22 May 2024 18:55:36 +0200
David Brown <[email protected]> wrote:
In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change
in it, especially compared to the minor changes in C17.
Why is the C Standard Committee, while recently quite liberal in
introducing new keywords (too liberal for my liking, many new things
do not really deserve keywords not prefixed by __), so conservative
in introducing program control constructs? I don't remember any
new program control construct introduced under the Committee regime.
And I want at least one.
What program control construct would you like?
Another area that was mostly unchanged since 1st edition of K&R is
storage classes. Even such obvious thing as removal of 'auto' class
took too long. If I am not mistaken, totally obsolete 'register'
class is still allowed.
"register" is still in C23. (Some compilers pay attention to it.
gcc with optimisation disabled puts local variables on the stack,
except for those marked "register" that get put in registers.) It
got dropped from C++ when "auto" was re-purposed in C++11, but with
the keyword "register" kept for future use. I would not have
objected to the same thing happening in C23.
And I don't remember any additions.
_Thread_local was added in C11, with the alias thread_local in C23.
What would you like to see here?
Personally I can think about at least two useful backward-compatible additions in that area.
W.r.t. [asctime() and ctime() being removed]
IMHO, all old-UNIX-style APIs that return pointers to static
objects within library or rely on presence of static object within
library for purpose of preserving state for subsequent calls
should be systematically deprecated and for majority of them there
should be provided thread-safe alternatives akin to ctime_s().
That is, with exception of family of functions that uses FILE*.
Not that I like them very much, but they are ingrained too deeply.
So, picking just asctime and ctime out of long list of problematic
APIs does not appear particularly consistent. If they were asking
me where to start, I'd start with rand().
[...] Just want to say that strfrom* family is long overdue, but
still appear incomplete. The guiding principle should be that all
format specifiers available in printf() with sole exception of %s
should be provided as strfrom* as well.
On Thu, 23 May 2024 22:10:22 +0200 David Brown
<[email protected]> wrote:
What program control construct would you like?
Ability to break from nested loops. Ability to "continue" outer
loops would be nice too, but less important. [...]
1. global objects, declared in header files and included
several times. Where defined? [...] I want it to "just work"
everywhere. [...]
2. Reversing defaults for visibility of objects and functions
at file scope.
Something like:
#pragma export_by_default(off).
When this pragma is in effect, we need a way to make objects and
functions globally visible. I think that it's done best with new
storage class.
With regard to constexpr, mentioned above by James Kuyper, my
feeling about it is that it belongs to metaprogramming so I
would not consider it a real storage class.
Tim Rentsch <[email protected]> writes:
[...]
Having 'constexpr' be classified as a storage class illustrates
how poorly thought out it is.
constexpr is not classified as a storage class. N3220, like earlier
editions, says there are four storage durations: static, thread,
automatic, and allocated.
On 24/05/2024 02:06, Lawrence D'Oliveiro wrote:
On Fri, 24 May 2024 00:34:24 +0300, Michael S wrote:
On Thu, 23 May 2024 22:10:22 +0200 David Brown
<[email protected]> wrote:
What program control construct would you like?
Ability to break from nested loops.
At least 90% of the time, when I want to exit from an inner loop in C,
there will be some kind of cleanup I need to do in the outer loop
before that can exit too. So the ability to jump straight out will
rarely be used.
goto gives you the functionality you require.
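A minimal sketch of the goto idiom for leaving two loops at once. The function find_in_grid() and its row-major grid layout are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Searches a rows x cols grid (stored row-major) for 'target'.
   On success stores the position and returns true. */
static bool find_in_grid(const int *grid, size_t rows, size_t cols,
                         int target, size_t *row_out, size_t *col_out)
{
    bool found = false;
    size_t r = 0, c = 0;

    for (r = 0; r < rows; r++) {
        for (c = 0; c < cols; c++) {
            if (grid[r * cols + c] == target) {
                found = true;
                goto done;          /* leaves both loops at once */
            }
        }
    }
done:
    if (found) {
        *row_out = r;
        *col_out = c;
    }
    return found;
}
```

Any cleanup the outer loop needs can go after the label, which is roughly the pattern a labelled break would replace.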
Am 23.05.2024 um 21:49 schrieb Thiago Adams:
On 23/05/2024 16:25, Bonita Montero wrote:
I ask myself what the point is in further developing a language
like this that can actually no longer be saved.
do you mean C++?
No, C.
On 24/05/2024 02:06, Lawrence D'Oliveiro wrote:
On Fri, 24 May 2024 00:34:24 +0300, Michael S wrote:
On Thu, 23 May 2024 22:10:22 +0200 David Brown
<[email protected]> wrote:
What program control construct would you like?
Ability to break from nested loops.
At least 90% of the time, when I want to exit from an inner loop in
C, there will be some kind of cleanup I need to do in the outer
loop before that can exit too. So the ability to jump straight out
will rarely be used.
goto gives you the functionality you require.
I usually use goto for handling malloc() failures. So if an
allocation fails within a deeply nested loop, I will jump to code at
the end of the function, free up any half-constructed objects, and
return an error condition.
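Roughly, that pattern looks like the sketch below. table_create() and table_destroy() are hypothetical names, the example uses a single loop for brevity (the jump works the same from any nesting depth), and it assumes rows and cols are nonzero, since calloc() may return NULL for a zero-sized request.

```c
#include <stdlib.h>

/* Allocates a rows x cols table of doubles, zero-filled.
   Returns NULL on failure, with nothing leaked. */
static double **table_create(size_t rows, size_t cols)
{
    double **table = calloc(rows, sizeof *table);
    size_t r;

    if (!table)
        return NULL;
    for (r = 0; r < rows; r++) {
        table[r] = calloc(cols, sizeof **table);
        if (!table[r])
            goto fail;              /* jump to the cleanup code at the end */
    }
    return table;

fail:
    /* Free only the rows that were successfully allocated. */
    while (r > 0)
        free(table[--r]);
    free(table);
    return NULL;
}

/* Releases a table made by table_create(); NULL is accepted. */
static void table_destroy(double **table, size_t rows)
{
    size_t r;

    if (!table)
        return;
    for (r = 0; r < rows; r++)
        free(table[r]);
    free(table);
}
```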
Thiago Adams <[email protected]> writes:
On 23/05/2024 10:11, David Brown wrote:
[...]
On 23/05/2024 14:38, Thiago Adams wrote:
I already see bad uses of constexpr in C++ code. It was used in cases
where we know for sure that the value is NOT computed at compile time.
This just makes review harder ("why did someone put this here?"); the
conclusion was that it was totally unnecessary and ignored by the
compiler. The programmer was trying to add something extra, like
"magic", hoping for something that would never happen.
IME poor or confusing uses of "constexpr" are for functions, not
objects, and C23 does not support "constexpr" for functions.
The sample C++ was something like
constexpr char * s[] = {"a", "b"};
for (int i = 0; i < sizeof(s) / sizeof(s[0]); i++)
{
    // using s[i]
}
I checked in C, it is an error.
Apparently C23 has stricter rules for constexpr than C++ does. I can
imagine those rules being relaxed in future editions of the C standard.
Michael S <[email protected]> writes:
[comments on various new features in C23]
Overall I am quite disappointed by C23. IMO it's a step
backwards rather than forwards.
W.r.t. [asctime() and ctime() being removed]
IMHO, all old-UNIX-style APIs that return pointers to static
objects within library or rely on presence of static object within
library for purpose of preserving state for subsequent calls
should be systematically deprecated and for majority of them there
should be provided thread-safe alternatives akin to ctime_s().
That is, with exception of family of functions that uses FILE*.
Not that I like them very much, but they are ingrained too deeply.
So, picking just asctime and ctime out of long list of problematic
APIs does not appear particularly consistent. If they were asking
me where to start, I'd start with rand().
I agree with the suggestion that restartable versions of "dirty"
functions be added to the C standard. I strongly disagree that
the old ones should be taken out. If compilers choose to give
warnings, that's fine, but these functions should not be removed
just because some people think they are clunky.
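As an illustration of the "restartable" style, here is a sketch that formats a time into a caller-supplied buffer without touching any static library object. It assumes gmtime_r() is available (it is POSIX, and was also added to C23); format_utc() is a name made up for this example and is not a standard function.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for gmtime_r() on strict POSIX setups */
#include <stddef.h>
#include <time.h>

/* Formats 't' as UTC "YYYY-MM-DD HH:MM:SS" into the caller's buffer.
   Returns the number of characters written, or 0 on failure.
   No static storage is used, so concurrent calls do not interfere. */
static size_t format_utc(time_t t, char *buf, size_t bufsize)
{
    struct tm tm;

    if (!gmtime_r(&t, &tm))
        return 0;
    return strftime(buf, bufsize, "%Y-%m-%d %H:%M:%S", &tm);
}
```

The old functions can stay alongside this; the sketch only shows what the alternative shape looks like.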
[...] Just want to say that strfrom* family is long overdue, but
still appear incomplete. The guiding principle should be that all
format specifiers available in printf() with sole exception of %s
should be provided as strfrom* as well.
What's the motivation for having separate functions? To me this
looks like creeping featuritis.
On Thu, 23 May 2024 17:37:39 -0700
Tim Rentsch <[email protected]> wrote:
Michael S <[email protected]> writes:
[...] Just want to say that strfrom* family is long overdue, but
still appear incomplete. The guiding principle should be that all
format specifiers available in printf() with sole exception of %s
should be provided as strfrom* as well.
What's the motivation for having separate functions? To me this
looks like creeping featuritis.
My practical motivation is space-constrained environments, where I
possibly want one or two or three formatters. sprintf() gives me all
or nothing and all can be too expensive. Many embedded environments
have big and small variants of sprintf that can be chosen at link
time, but what's in small variant does not necessarily match a set
that I want in my specific project. And is not necessarily well
documented.
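To make the space argument concrete, a sketch of a single stand-alone formatter of the kind one might link instead of sprintf(). This is a hand-rolled illustration, not the strfrom* API; fmt_u32() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes 'value' in decimal to 'buf' (NUL-terminated).
   Returns the number of digits written, or 0 if the buffer is too small. */
static size_t fmt_u32(char *buf, size_t bufsize, uint32_t value)
{
    char tmp[10];               /* UINT32_MAX has 10 decimal digits */
    size_t n = 0, i;

    /* Produce digits least significant first. */
    do {
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    if (n + 1 > bufsize)
        return 0;
    /* Copy them out in the right order. */
    for (i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```

A project that needs only this and, say, a hex formatter pays for exactly those two routines.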
David Brown <[email protected]> writes:
On 23/05/2024 14:11, bart wrote:
[...]
'embed' was discussed a few months ago. I disagreed with the poor
way it was to be implemented: 'embed' notionally generates a list of
comma-separated numbers as tokens, where you have to take care of
any trailing zero yourself if needed. It would also be hopelessly
inefficient if actually implemented like that.
Fortunately, it is /not/ actually implemented like that - it is only
implemented "as if" it were like that. Real prototype implementations
(for gcc and clang - I don't know about other tools) are extremely
efficient at handling #embed. And the comma-separated numbers can be
more flexible in less common use-cases.
I'm aware of a proposed implementation for clang:
https://github.com/llvm/llvm-project/pull/68620
https://github.com/ThePhD/llvm-project
I'm currently cloning the git repo, with the aim of building it so I can
try it out and test some corner cases. It will take a while.
I'm not aware of any prototype implementation for gcc. If you are, I'd
be very interested in trying it out.
(And thanks for starting this thread!)
On 2024-05-23, David Brown <[email protected]> wrote:
So yes, I /could/ use enum constants for things that are not
enumerations. I /did/ use them for that. But going forward with C23,
I'll use constexpr instead.
The value of an enum is:
1. Compiler warns of incomplete switch cases.
2. In a debugger when you examine an enum-valued expression or
variable, you get the symbolic name.
3. Safety (with C++ enum rules: no implicit
conversion from ordinary integer type to enum).
Historically, C code bases have abused enums to define constants
like "enum { bufsize = 1024 }" for understandable reasons, but it is a
cringe-inducing hack, which is also incomplete and inflexible; e.g. what
if we want a floating-point constant?
I've benefited from (3) in C programs that were contrived
to be compilable as C++. (That practice, though, tends to increasingly
hamper your dialect choice, as the languages diverge and make
only small steps here and there to become closer.)
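A short sketch contrasting the two uses. The token_kind enumeration is invented for the example; the switch has no default label on purpose, so that gcc and clang (with -Wswitch, which -Wall enables) can warn when an enumerator is left unhandled.

```c
/* A genuine enumeration: a closed set of alternatives. */
enum token_kind { TOK_IDENT, TOK_NUMBER, TOK_STRING };

/* No default label, so the compiler can flag a missing case
   if a new enumerator is added later. */
static const char *token_kind_name(enum token_kind kind)
{
    switch (kind) {
    case TOK_IDENT:  return "identifier";
    case TOK_NUMBER: return "number";
    case TOK_STRING: return "string";
    }
    return "unknown";   /* reached only for out-of-range values */
}

/* The "abuse": an anonymous enum used purely as a named constant. */
enum { bufsize = 1024 };
static char buffer[bufsize];
```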
bart <[email protected]> writes:
[...]
I suspect that ones like 'embed' have been derived from C++ which
always likes to make things too wide-ranging and much harder to use
and implement than necessary.
No, C++ doesn't have #embed. (If it did, many C compilers would already
have it, since C and C++ commonly share the preprocessor
implementation.)
On 22/05/2024 20:50, David Brown wrote:
On 22/05/2024 21:10, Malcolm McLean wrote:
But even boolean type and const.
Const documents the code, makes the action of a function clearer to
the reader, and helps catch mistakes.
These are all things that make the language better, and have done so
for the past 25 years.
Of course quite a lot of the functions don't actually change the
structures they are passed. But is littering the code with const
going to help? And why do you really need a boolean when an int can
hold either a zero or non-zero value?
And don't you just want a pared down, clean language?
I want a language with the features I need and that help me to write
good clear code. Minimal is not helpful, any more than needlessly
complex is helpful.
So, take the code I'm working on at the moment.
It's an implementation of XPath (a subset, of course). XPath is a sort of
query language for XML. You pass a query string like
"/bookstore/book//title" and that selects all descendants of
[root]/bookstore/book with the element tag "title".
Now querying the document shouldn't change it. So in C++ it should
be passed in as an XMLDOC const &. In C, declaring the pointer a const
XMLDOC * conveys the intention, but doesn't actually achieve the safety
you want and get with C++.
However the algorithm I have just moved to needs a bit associated with
each node that it can turn on and off. Now in fact I did this via a hash
table. But it is very tempting and far more efficient to simply add a
hacky field to the XMLNODE structure - after all, I wrote the XML
parser. And in C++ "mutable" is designed for just this. But in C,
we're either const or not. And isn't it maybe better to leave the
const qualifier off the document pointer?
In fact, wouldn't we just be better off without const?
After all, you
need to read the function specifications anyway, and they should say
that querying for a path will not alter the document.
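For what it's worth, the "const document plus external bit" arrangement described above might look something like this. It is a sketch under assumptions, not the Baby X code: the XMLNODE layout, the id field and count_tag are all hypothetical, and a plain array indexed by node id stands in for the hash table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node layout; the real XMLNODE may look quite different. */
typedef struct xmlnode {
    const char *tag;
    const struct xmlnode *child;  /* first child, or NULL */
    const struct xmlnode *next;   /* next sibling, or NULL */
    size_t id;                    /* assumed: unique index below the node count */
} XMLNODE;

/* Count nodes with the given tag, visiting each node at most once.
   The tree is only read; the per-node bit lives in the caller's array,
   so the const qualifier on the tree can stay. */
size_t count_tag(const XMLNODE *node, const char *tag, bool *visited)
{
    size_t count = 0;

    for (; node != NULL; node = node->next) {
        if (visited[node->id])
            continue;                 /* already handled, skip the subtree */
        visited[node->id] = true;
        if (strcmp(node->tag, tag) == 0)
            count++;
        count += count_tag(node->child, tag, visited);
    }
    return count;
}
```

The cost is one extra lookup per node compared with a flag stored in the node itself, which is the trade-off between the side table and the "hacky field".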
On 5/23/2024 6:35 AM, David Brown wrote:
On 22/05/2024 23:24, Chris M. Thomasson wrote:
On 5/22/2024 9:55 AM, David Brown wrote:
In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change
in it, especially compared to the minor changes in C17.
Love the way std::vectors respect alignas... C++20, iirc?
[...]
I have no idea what you are talking about.
std::vector actually respects alignas, on MSVC at least. I did not know
this worked until I tried it. Iirc, Bonita was the one that sparked my
test. It aligned itself on the proper boundaries. Very nice.
But did you notice that this is c.l.c, not c.l.c++, and the topic is
C23, not C++23 ? Discussing comparisons or compatibility with C++ is
fair enough, but talking about pure C++ matters (such as
std::vector<>) is unlikely to be helpful.
C has it as well... Very useful!
Michael S <[email protected]> writes:
[...]
Removed[...]
7) static_assert is not provided as a macro defined in <assert.h>[...]
(becomes a keyword)
8) thread_local is not provided as a macro defined in <threads.h>
(becomes a keyword)
7) bad. Breaks existing code for weak reason
8) bad. Breaks existing code for weak reason
In pre-C23, _Static_assert and _Thread_local are keywords, and
static_assert and thread_local are macros that expand to those keywords.
In C23, _Static_assert, _Thread_local, static_assert, and thread_local
are all keywords. Code that simply uses the old ugly keywords would not break.
Code that does something like "#ifdef static_assert" could break. I suppose the
headers could have retained the old macro definitions.
#define static_assert static_assert
#define thread_local thread_local
On Thu, 23 May 2024 22:10:22 +0200
David Brown <[email protected]> wrote:
On 23/05/2024 16:19, Michael S wrote:
On Wed, 22 May 2024 18:55:36 +0200
David Brown <[email protected]> wrote:
In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change
in it, especially compared to the minor changes in C17.
Why C Standard Committee, while being recently quite liberal in
field of introducing new keywords (too liberal for my liking, many
new things do not really deserve keywords not prefixed by __) is so
conservative in introduction of program control constructs? I don't
remember any new program control introduced under Committee regime.
And I want at least one.
What program control construct would you like?
Ability to break from nested loops. Ability to "continue" outer loops
would be nice too, but less important.
I am not sure what syntax I want for this feature, never considered
myself a competent language designer.
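For reference, the usual workaround in today's C is a goto to a label placed after the outer loop. A minimal sketch, with a made-up function name and a flat row-major array standing in for whatever data is being searched:

```c
#include <stdbool.h>
#include <stddef.h>

/* Search a rows x cols grid stored row by row for target.
   On success, store the position and return true. */
bool find_value(const int *grid, size_t rows, size_t cols, int target,
                size_t *row_out, size_t *col_out)
{
    size_t r, c;

    for (r = 0; r < rows; r++) {
        for (c = 0; c < cols; c++) {
            if (grid[r * cols + c] == target)
                goto found;        /* leaves both loops at once */
        }
    }
    return false;                  /* loops ran to completion */

found:
    *row_out = r;
    *col_out = c;
    return true;
}
```

The alternatives are a flag tested in both loop conditions, or moving the loops into a function of their own and using return, which is more or less what the sketch does anyway.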
Another area that was mostly unchanged since 1st edition of K&R is
storage classes. Even such obvious thing as removal of 'auto' class
took too long. If I am not mistaken, totally obsolete 'register'
class is still allowed.
"register" is still in C23. (Some compilers pay attention to it.
gcc with optimisation disabled puts local variables on the stack,
except for those marked "register" that get put in registers.) It
got dropped from C++ when "auto" was re-purposed in C++11, but with
the keyword "register" kept for future use. I would not have
objected to the same thing happening in C23.
And I don't remember any additions.
_Thread_local was added in C11, with the alias thread_local in C23.
_Thread_local is a special-purpose thing, probably not applicable at
all for programming of small embedded systems, which nowadays is the
only type of programming in C that I do for money rather than as hobby.
With regard to constexpr, mentioned above by James Kuyper, my feeling
about it is that it belongs to metaprogramming so I would not consider
it a real storage class.
What would you like to see here?
Instead of solutions, let's talk about problems that I want to solve:
1. global objects, declared in header files and included several times.
Where defined?
For some linkers, mostly unixy linkers, in the case of non-initialized
objects (implicitly initialized to zero) it somehow works.
For linkers used on embedded systems it requires additional effort.
I think, for initialized globals it takes additional effort even with
unixy linkers.
I want it to "just work" everywhere. I think that the best way to get
it without breaking existing semantics is a new storage class.
2. Reversing defaults for visibility of objects and functions at file
scope.
Something like:
#pragma export_by_default(off).
When this pragma is in effect, we need a way to make objects and
functions globally visible. I think that it's done best with new
storage class.
Michael S <[email protected]> writes:
On Thu, 23 May 2024 17:37:39 -0700
Tim Rentsch <[email protected]> wrote:
Michael S <[email protected]> writes:
[...] Just want to say that strfrom* family is long overdue, but
still appear incomplete. The guiding principle should be that all
format specifiers available in printf() with sole exception of %s
should be provided as strfrom* as well.
What's the motivation for having separate functions? To me this
looks like creeping featuritis.
My practical motivation is space-constrained environments, where I
possibly want one or two or three formatters. sprintf() gives me
all or nothing and all can be too expensive. Many embedded
environments have big and small variants of sprintf that can be
chosen at link time, but what's in small variant does not
necessarily match a set that I want in my specific project. And is
not necessarily well documented.
Okay, I see now where you're coming from, although I'm not sure that
the strfrom*() functions will give you what you want (in terms of
memory footprint, etc). But I get your motivation.
Question: which of the four formats (%A, %E, %F, %G) are ones you
expect to use?
Also I'm curious: do all of your target platforms
use IEEE floating point, or do some use other representations?
On 23/05/2024 23:34, Michael S wrote:
On Thu, 23 May 2024 22:10:22 +0200
David Brown <[email protected]> wrote:
_Thread_local is a special-purpose thing, probably not applicable at
all for programming of small embedded systems, which nowadays is the
only type of programming in C that I do for money rather than as hobby.
I have never seen the point of it either. Why would anyone want a
variable that exists for /all/ threads in a program, but independently
per thread?
I can't say I have ever seen it as an effort. Almost all my C
"modules" come in pairs - "file.h" and "file.c". All non-local
variables (and all functions) are either static and declared only in
"file.c", or they are externally linked and have an "extern"
declaration in "file.h" and a definition (with or without
initialisation) in "file.c" (which #includes "file.h"). It is a very
simple and clean arrangement, easily checked by gcc warnings, and
there are never any undetected conflicts.
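The arrangement described above can be sketched in a single listing. The two "files" are shown one after the other so the example stays self-contained; the names counter.h, counter.c, counter_value and counter_next are invented for illustration.

```c
/* ---- counter.h (hypothetical header) ---- */
#ifndef COUNTER_H
#define COUNTER_H
extern int counter_value;   /* declaration only: no storage is reserved */
int counter_next(void);     /* externally linked function */
#endif

/* ---- counter.c (would begin with #include "counter.h") ---- */
int counter_value = 0;      /* the one definition, with its initialiser */
static int step = 1;        /* internal linkage: invisible to other files */

int counter_next(void)
{
    counter_value += step;
    return counter_value;
}
```

Because counter.c includes its own header, the compiler sees the declaration and the definition together and can complain if they disagree. The header may then be included any number of times without creating a second definition.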
On Fri, 24 May 2024 17:57:35 +0200
David Brown <[email protected]> wrote:
I can't say I have ever seen it as an effort. Almost all my C
"modules" come in pairs - "file.h" and "file.c". All non-local
variables (and all functions) are either static and declared only in
"file.c", or they are externally linked and have an "extern"
declaration in "file.h" and a definition (with or without
initialisation) in "file.c" (which #includes "file.h"). It is a very
simple and clean arrangement, easily checked by gcc warnings, and
there are never any undetected conflicts.
Declaration/definition pair is repeating yourself, which is not a good
thing.
Of course, the same applies to declaration/definition of externally
visible functions, but somehow in case of functions I am more tolerant
to repetitions than in case of variables. Probably, a psychological
phenomenon - I feel that functions are less trivial, so repetition is
less wasteful.
But I'd like to get rid of these repetitions too, I just did not figure
out a way to do it that does not compromise the even more important concern
of separation between interface and implementation (yes, I dislike Java
for that reason too).
bart <[email protected]> writes:
[...]
I normally use a private systems language which some here have claimed[...]
is just C with a different syntax.
I don't recall anyone claiming that.
I know C has alignas ...
I virtually always use goto for memory allocation failure.
It does mean that, strictly, the function is no longer a "structured" subroutine. But reality is usually that memory allocation failure will
mean program termination pretty soon.
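The pattern referred to here is usually written with a single error label at the end of the function. A small sketch, with a hypothetical RECORD type and function names that are not from any code in this thread:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type with two separately allocated members. */
typedef struct {
    char *name;
    int *values;
    size_t count;
} RECORD;

/* Safe to call on NULL or on a partly constructed record. */
void record_destroy(RECORD *rec)
{
    if (rec) {
        free(rec->name);
        free(rec->values);
        free(rec);
    }
}

RECORD *record_create(const char *name, size_t count)
{
    RECORD *rec = malloc(sizeof *rec);
    if (!rec)
        goto out_of_memory;
    rec->name = NULL;               /* keep members valid for the cleanup path */
    rec->values = NULL;
    rec->count = count;

    rec->name = malloc(strlen(name) + 1);
    if (!rec->name)
        goto out_of_memory;
    strcpy(rec->name, name);

    /* Allocate at least one element so a count of 0 is not read as failure. */
    rec->values = calloc(count ? count : 1, sizeof *rec->values);
    if (!rec->values)
        goto out_of_memory;

    return rec;

out_of_memory:
    record_destroy(rec);            /* one exit path for every failure */
    return NULL;
}
```

The function does end up with two exits, but every allocation failure funnels through the same cleanup code, which tends to be easier to audit than nested if blocks.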
Why would anyone want a variable that exists for /all/ threads in a
program, but independently per thread? The only use I can think of is
for errno (which is, IMHO, a horror unto itself) but since that is
defined by the implementation, it does not need to use _Thread_local.
Declaration/definition pair is repeating yourself, which is not a good [thing].
On Fri, 24 May 2024 06:54:35 -0700
Tim Rentsch <[email protected]> wrote:
Michael S <[email protected]> writes:
On Thu, 23 May 2024 17:37:39 -0700
Tim Rentsch <[email protected]> wrote:
Michael S <[email protected]> writes:
[...] Just want to say that strfrom* family is long overdue, but
still appear incomplete. The guiding principle should be that all
format specifiers available in printf() with sole exception of %s
should be provided as strfrom* as well.
What's the motivation for having separate functions? To me this
looks like creeping featuritis.
My practical motivation is space-constrained environments, where I
possibly want one or two or three formatters. sprintf() gives me
all or nothing and all can be too expensive. Many embedded
environments have big and small variants of sprintf that can be
chosen at link time, but what's in small variant does not
necessarily match a set that I want in my specific project. And is
not necessarily well documented.
Okay, I see now where you're coming from, although I'm not sure that
the strfrom*() functions will give you what you want (in terms of
memory footprint, etc). But I get your motivation.
Question: which of the four formats (%A, %E, %F, %G) are ones you
expect to use?
Rarely: any of those, mostly for debugging.
In production code: %e is most likely, but %f could happen.
But it's not just floating point. "Small" variants of sprintf()
on 32-bit platforms are often unable to handle %lld and %llu.
Also I'm curious: do all of your target platforms
use IEEE floating point, or do some use other representations?
Currently, only IEEE. [...]
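For readers who have not met the strfrom* functions: a call looks roughly like the sketch below. The wrapper name format_sci is made up. The feature-test macro is what glibc documents for exposing these declarations outside C23 mode, and it is an assumption that the target library behaves the same way; small embedded libraries may not provide strfromd at all.

```c
/* Ask the library to declare strfromd outside C23 mode.
   Assumption: glibc-style feature-test macro; other libraries may differ. */
#define __STDC_WANT_IEC_60559_BFP_EXT__ 1
#include <stdlib.h>
#include <stddef.h>

/* Format x with three digits after the decimal point, in %e style.
   Like snprintf, strfromd returns the length the full result needs,
   not counting the terminating null character. */
int format_sci(char *buf, size_t size, double x)
{
    return strfromd(buf, size, "%.3e", x);
}
```

The format string is restricted to a single conversion (a, A, e, E, f, F, g or G) with an optional precision, which is what would in principle let a linker pull in only the formatter that is actually used.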
Keith Thompson <[email protected]> writes:
David Brown <[email protected]> writes:
On 23/05/2024 14:11, bart wrote:[...]
[...]'embed' was discussed a few months ago. I disagreed with the poor
way it was to be implemented: 'embed' notionally generates a list of
comma-separated numbers as tokens, where you have to take care of
any trailing zero yourself if needed. It would also be hopelessly
inefficient if actually implemented like that.
Fortunately, it is /not/ actually implemented like that - it is only
implemented "as if" it were like that. Real prototype implementations
(for gcc and clang - I don't know about other tools) are extremely
efficient at handling #embed. And the comma-separated numbers can be
more flexible in less common use-cases.
I'm aware of a proposed implementation for clang:
https://github.com/llvm/llvm-project/pull/68620
https://github.com/ThePhD/llvm-project
I'm currently cloning the git repo, with the aim of building it so I can
try it out and test some corner cases. It will take a while.
I'm not aware of any prototype implementation for gcc. If you are, I'd
be very interested in trying it out.
(And thanks for starting this thread!)
I've built this from source, and it mostly works. I haven't seen it do
any optimization; the `#embed` directive expands to a sequence of comma-separated integer constants.
Which means that this:
#include <stdio.h>
int main(void) {
struct foo {
unsigned char a;
unsigned short b;
unsigned int c;
double d;
};
struct foo obj = {
#embed "foo.dat"
};
printf("a=%d b=%d c=%d d=%f\n", obj.a, obj.b, obj.c, obj.d);
}
given "foo.dat" containing bytes with values 1, 2, 3, and 4, produces
this output:
a=1 b=2 c=3 d=4.000000
Em 5/24/2024 5:19 PM, Keith Thompson escreveu:
Thiago Adams <[email protected]> writes:
On 24/05/2024 16:45, Keith Thompson wrote:
Thiago Adams <[email protected]> writes:
On 23/05/2024 18:49, Keith Thompson wrote:
error: 'constexpr' pointer initializer is not null
5 | constexpr char * s[] = {"a", "b"};
Then we were asking why constexpr was used in that case.
Why not?
When I see a constexpr I ask if the compiler is able to compute
everything at compile time. If not immediately it is a bad usage in my
view.
I don't understand. Do you object because it's not *immediately
obvious* that everything can be computed at compile time? If so, why
should it have to be?
My understanding is that constexpr is a tip for the compiler. Does not
ensure anything. Unless you use where constant expression is required.
So I don't like to see constexpr where I know it is not a constant
expression.
Your understanding is incorrect. "constexpr" is not a mere hint.
I think I can explain a little better
Let´s consider we have a compile time array of integers and a loop.
https://godbolt.org/z/e8cM1KGWT
#include <stdio.h>
#include <stdlib.h>
int main() {
constexpr int a[] = {1, 2, 3, 4, 5, 6, 7, 8};
for (int i = 0 ; i < sizeof(a)/sizeof(a[0]); i++)
{
printf("%d", a[i]);
}
}
What the programmer expected using a constant array in a loop?
The loop is in runtime, unless the compiler expanded the loop into 8
calls using constant expressions. But this is not the case.
This was the usage of constexpr I saw but with literal strings.
So, the array a is not used as constant even if it has constexpr.
On 5/24/2024 7:50 AM, David Brown wrote:
On 24/05/2024 01:05, Chris M. Thomasson wrote:
On 5/23/2024 6:35 AM, David Brown wrote:
On 22/05/2024 23:24, Chris M. Thomasson wrote:
On 5/22/2024 9:55 AM, David Brown wrote:
In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of
change in it, especially compared to the minor changes in C17.
Love the way std::vectors respect alignas... C++20, iirc?
[...]
I have no idea what you are talking about.
std::vector actually respects alignas, on MSVC at least. I did not
know this worked until I tried it. Iirc, Bonita was the one that
sparked my test. It aligned itself on the proper boundaries. Very nice.
But did you notice that this is c.l.c, not c.l.c++, and the topic is
C23, not C++23 ? Discussing comparisons or compatibility with C++
is fair enough, but talking about pure C++ matters (such as
std::vector<>) is unlikely to be helpful.
C has it as well... Very useful!
I know C has alignas (now as a keyword in C23, instead of just
_Alignas from C11).
I know C++ has alignas (from C++11 onwards).
What I don't understand is why you think std::vector<> "respects
alignas" in C++20 - alignment for std::vector<> works like alignment
for any other class in C++, and always has done.
And what I /really/ don't understand is why you think it is remotely
relevant here? Even "alignas" in C is not particular relevant to this
thread, except that it has become a keyword in C23 instead of a macro
defined to _Alignas in <stdalign.h>.
alignas is very nice because it can help me make a 100% portable version
of some of my old exotic lock-free memory allocators that use rounding
to get down to a header. Any point in the region can be rounded down to
get at the header for the block. It involves aligning the main region on
a large boundary, say 8192 bytes. This is a little trick for high
performance lock-free allocators.
Iirc, I can make std::vector align its elements to say, L2 cachelines,
and I can make std::vector align itself on a large boundary say 8192
bytes. All in std C++! That is nice.
David Brown <[email protected]> writes:
On 23/05/2024 18:40, Keith Thompson wrote:
Michael S <[email protected]> writes:
[...]
Removed[...]
7) static_assert is not provided as a macro defined in <assert.h>[...]
(becomes a keyword)
8) thread_local is not provided as a macro defined in <threads.h>
(becomes a keyword)
7) bad. Breaks existing code for weak reason
8) bad. Breaks existing code for weak reason
In pre-C23, _Static_assert and _Thread_local are keywords, and
static_assert and thread_local are macros that expand to those keywords.
In C23, _Static_assert, _Thread_local, static_assert, and thread_local
are all keywords. Code that simply uses the old ugly keywords would not
break.
Code that does something like "#ifdef static_assert" could break. I suppose the
headers could have retained the old macro definitions.
#define static_assert static_assert
#define thread_local thread_local
The sort of code that could theoretically break is when you have
definitions like this:
#define STATIC_ASSERT_NAME_(line) STATIC_ASSERT_NAME2_(line)
#define STATIC_ASSERT_NAME2_(line) assertion_failed_at_line_##line
#define static_assert(claim, warning) \
typedef struct { \
char STATIC_ASSERT_NAME_(__COUNTER__) [(claim) ? 2 : -2]; \
} STATIC_ASSERT_NAME_(__COUNTER__)
That works in any C version, until C23, almost as well as
_static_assert. I used this when C11 support was rare in the tools I
used.
You mean _Static_assert.
While using #define for a C keyword is undefined behaviour, in
practice I think you'd have a hard time finding code and a compiler
that used such a macro and which did not work just as well in C23
mode.
(I don't know if anyone is in the habit of declaring macros named
"thread_local".)
"static_assert" is already a macro defined in <assert.h> starting in
C11. The above code is valid in pre-C23, but will break in C11 and C17
if it includes <assert.h> directly or indirectly.
You can fix it by
adding "#undef static_assert" or by picking a different name, or by
making your macro definition conditional on __STDC_VERSION__ >= 202311L.
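The "pick a different name" option might look something like the sketch below. MY_STATIC_ASSERT and the helper macros are invented names, and the pre-C11 branch uses __LINE__ rather than __COUNTER__ so that it sticks to standard macros.

```c
#include <assert.h>   /* in C11 and C17 this defines the static_assert macro */

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
/* C11 and later: _Static_assert is a keyword in C11, C17 and C23 alike. */
#define MY_STATIC_ASSERT(claim, msg) _Static_assert(claim, msg)
#else
/* Older compilers: a negative array size forces a compile-time error. */
#define MY_SA_NAME2_(line) my_static_assert_at_line_##line
#define MY_SA_NAME_(line) MY_SA_NAME2_(line)
#define MY_STATIC_ASSERT(claim, msg) \
    typedef char MY_SA_NAME_(__LINE__)[(claim) ? 1 : -1]
#endif

MY_STATIC_ASSERT(sizeof(char) == 1, "char must be one byte");
MY_STATIC_ASSERT(sizeof(long) >= sizeof(int), "long must be at least as wide as int");

/* Hypothetical function, only here so the file contains some code. */
int checked_sum(int a, int b)
{
    return a + b;
}
```

Since the macro never uses the name static_assert, it makes no difference whether <assert.h> has been included or whether the compiler is in C23 mode.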
David Brown <[email protected]> writes:
On 23/05/2024 23:34, Michael S wrote:
On Thu, 23 May 2024 22:10:22 +0200
David Brown <[email protected]> wrote:
_Thread_local is a special-purpose thing, probably not applicable at
all for programming of small embedded systems, which nowadays is the
only type of programming in C that I do for money rather than as hobby.
I have never seen the point of it either. Why would anyone want a
variable that exists for /all/ threads in a program, but independently
per thread?
Very common in kernel programming (e.g. the use of '%gs' in x86_linux)
as a pointer to the 'per-cpu' data structure.
We use thread local to implement 'self' methods in certain
classes (so rather than passing pointers around, one can
simply call class::self() to get a pointer to the
class for each thread.
class c_processor {
...
/**
* Per-thread value of the processor object.
*/
static __thread c_processor *p_this;
...
public:
c_processor(c_system *, c_logger *, processor_number_t, bool);
~c_processor(void);
static c_processor *self(void) { return p_this; }
...
c_processor *pp = c_processor::self().
Em 5/24/2024 9:46 PM, Keith Thompson escreveu:
Thiago Adams <[email protected]> writes:
Em 5/24/2024 5:19 PM, Keith Thompson escreveu:
Thiago Adams <[email protected]> writes:
On 24/05/2024 16:45, Keith Thompson wrote:
Thiago Adams <[email protected]> writes:
On 23/05/2024 18:49, Keith Thompson wrote:
error: 'constexpr' pointer initializer is not null
5 | constexpr char * s[] = {"a", "b"};
Then we were asking why constexpr was used in that case.
Why not?
When I see a constexpr I ask if the compiler is able to compute
everything at compile time. If not immediately it is a bad usage in my
view.
I don't understand. Do you object because it's not *immediately
obvious* that everything can be computed at compile time? If so, why
should it have to be?
My understanding is that constexpr is a tip for the compiler. Does not
ensure anything. Unless you use where constant expression is required.
So I don't like to see constexpr where I know it is not a constant
expression.
Your understanding is incorrect. "constexpr" is not a mere hint.
I think I can explain a little better
Let´s consider we have a compile time array of integers and a loop.
https://godbolt.org/z/e8cM1KGWT
#include <stdio.h>
#include <stdlib.h>
int main() {
constexpr int a[] = {1, 2, 3, 4, 5, 6, 7, 8};
for (int i = 0 ; i < sizeof(a)/sizeof(a[0]); i++)
{
printf("%d", a[i]);
}
}
What the programmer expected using a constant array in a loop?
The loop is in runtime, unless the compiler expanded the loop into 8
calls using constant expressions. But this is not the case.
This was the usage of constexpr I saw but with literal strings.
So, the array a is not used as constant even if it has constexpr.
What do you mean by "used as constant"?
Something used to produce a constant expression.
In the loop the compiler would have to get the value in runtime from
array, or unroll the loop.
I just checked, trying to extract a constant value from the array
https://godbolt.org/z/v33Pqd7W8
#include <stdio.h>
#include <stdlib.h>
int main() {
constexpr int a[] = {1, 2, 3, 4, 5, 6, 7, 8};
static_assert(a[0] ==1 );
}
I was expecting this to work!
But gcc says
<source>:5:24: error: expression in static assertion is not constant
5 | static_assert(a[0] ==1 );
|
The mess is even bigger than I thought.
In c++ it works
https://godbolt.org/z/qG6vGhEMj
Em 5/25/2024 8:05 AM, David Brown escreveu:
In C (not C++), defining an object as "constexpr" gives you two things
compared to defining it as "const". One is that its value can be used
when you need a constant expression according to the rules of the
language (such as for the size of an array in a struct). The other is
that it gives a compile-time error if its initialiser is not itself a
constant expression - and that means an extra check and protection
against some kinds of programmer errors, and extra information to
people reading the code.
I don't expect it to make a difference in generated code from an
optimising compiler, in comparison to objects declared with "const".
In my view, for this sample constexpr generates noise.
It can also make
the compilation slower; otherwise, why not make everything constexpr by default?
I still haven't found a useful usage for constexpr that would compensate
for the mess created with const and constexpr.
I have already seen (I don't have them
now) proposals to make const more like constexpr in C. In C++ const is
already a constant expression!
The justification for C was VLA. They should consider VLA not VLA if it
has a constant expression. In other words, better break this than create
a mess.
#define makes the job of constexpr.
On Fri, 24 May 2024 17:57:35 +0200, David Brown wrote:
Why would anyone want a variable that exists for /all/ threads in a
program, but independently per thread? The only use I can think of is
for errno (which is, IMHO, a horror unto itself) but since that is
defined by the implementation, it does not need to use _Thread_local.
errno is indeed the example that immediately comes to mind for the use of this feature. It is supposed to have the semantics of an assignable
variable, so how else would you implement it, if not by some (possibly implementation-specific or special-case equivalent of) the _Thread_local mechanism?
I am in two minds over whether errno is a hack or not. On the one hand, it makes more sense for system calls (and library ones, too) to return an
error status directly; on the other hand, sometimes maybe you want to “accumulate” an error status after a series of calls, and errno is a convenient way of doing this.
As for other uses of thread-local, I think most of them have to do with optimizations, like threading itself. For example, imagine a bunch of
threads all contributing increments to a common counter: instead of continually blocking on access to that counter, they could each have their own thread-local counter, which periodically has its current value added
to the global counter and then zeroed.
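The counter idea can be sketched with C11's _Thread_local and <stdatomic.h>. The function names are invented, and thread creation is left out: each thread would simply call count_event() as often as it likes and flush_events() every so often.

```c
#include <stdatomic.h>

static atomic_long total_events;          /* shared by all threads */
static _Thread_local long local_events;   /* one independent copy per thread */

/* Cheap: touches only the calling thread's counter, no synchronisation. */
void count_event(void)
{
    local_events++;
}

/* Fold the calling thread's count into the shared total and reset it. */
void flush_events(void)
{
    atomic_fetch_add(&total_events, local_events);
    local_events = 0;
}

/* Current shared total; counts not yet flushed are not included. */
long total_event_count(void)
{
    return atomic_load(&total_events);
}
```

The shared total lags behind by whatever the threads have not flushed yet, which is the price for keeping the hot path free of contention.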
Bonita Montero ha scritto:
Am 23.05.2024 um 21:49 schrieb Thiago Adams:
On 23/05/2024 16:25, Bonita Montero wrote:
I ask myself what the point is in further developing a language
like this that can actually no longer be saved.
do you mean C++?
No, C.
I think you have a lot of confusion about programming languages. C and
C++ are not comparable languages.
On 25/05/2024 13:19, Thiago Adams wrote:
The justification for C was VLA. They should consider VLA not VLA if
it has a constant expression. In other words, better break this than
create a mess.
#define makes the job of constexpr.
#define is one way to make named items that can be used in constant expressions, yes. But if it can be done using #define or constexpr, I
think constexpr is the neater choice. Opinions can vary - that's my opinion.
On 2024-05-24, jak <[email protected]> wrote:
Bonita Montero ha scritto:
Am 23.05.2024 um 21:49 schrieb Thiago Adams:
On 23/05/2024 16:25, Bonita Montero wrote:
I ask myself what the point is in further developing a language
like this that can actually no longer be saved.
do you mean C++?
No, C.
I think you have a lot of confusion about programming languages. C and
C++ are not comparable languages.
Except for observations like that we can write useful, production
software that compiles as C or C++, but go on ...
Am 24.05.2024 um 09:32 schrieb jak:
Bonita Montero ha scritto:
Am 23.05.2024 um 21:49 schrieb Thiago Adams:
On 23/05/2024 16:25, Bonita Montero wrote:
I ask myself what the point is in further developing a language
like this that can actually no longer be saved.
do you mean C++?
No, C.
I think you have a lot of confusion about programming languages. C and
C++ are not comparable languages.
C and C++ have a lot in common since 95% of what you can do you can do
in C++ also in the same way. But C++ puts 500% on top of that to solve
your tasks with a fraction of the code and if you use that the code
looks totally different than C.
I'm pretty convinced that c++ will be abandoned long before c.
Maybe, but for sure not in favour of C.
Just for one example, c++ would have been abandoned years ago if c# didn't
produce CLI code only, because C# lacks nothing important compared to C++
and the learning curve is much steeper (it also benefits from
reflection).
Being a good C++ programmer needs a lot of experience, but once you have
that you get an order of magnitude more productivity. And often you decide
for simple approaches in C because complex approaches are a lot of work.
Often this complex and more efficient approach is easy to handle in C++
if you managed to understand the language.
David Brown <[email protected]> writes:
On 25/05/2024 03:29, Keith Thompson wrote:
Keith Thompson <[email protected]> writes:
David Brown <[email protected]> writes:
On 23/05/2024 14:11, bart wrote:[...]
[...]'embed' was discussed a few months ago. I disagreed with the poor
way it was to be implemented: 'embed' notionally generates a list of
comma-separated numbers as tokens, where you have to take care of
any trailing zero yourself if needed. It would also be hopelessly
inefficient if actually implemented like that.
Fortunately, it is /not/ actually implemented like that - it is only
implemented "as if" it were like that. Real prototype implementations
(for gcc and clang - I don't know about other tools) are extremely
efficient at handling #embed. And the comma-separated numbers can be
more flexible in less common use-cases.
I'm aware of a proposed implementation for clang:
https://github.com/llvm/llvm-project/pull/68620
https://github.com/ThePhD/llvm-project
I'm currently cloning the git repo, with the aim of building it so I can
try it out and test some corner cases. It will take a while.
I'm not aware of any prototype implementation for gcc. If you are, I'd
be very interested in trying it out.
(And thanks for starting this thread!)
I've built this from source, and it mostly works. I haven't seen it do
any optimization; the `#embed` directive expands to a sequence of
comma-separated integer constants.
Which means that this:
#include <stdio.h>
int main(void) {
struct foo {
unsigned char a;
unsigned short b;
unsigned int c;
double d;
};
struct foo obj = {
#embed "foo.dat"
};
printf("a=%d b=%d c=%d d=%f\n", obj.a, obj.b, obj.c, obj.d);
}
given "foo.dat" containing bytes with values 1, 2, 3, and 4, produces
this output:
a=1 b=2 c=3 d=4.000000
That is what you would expect by the way #embed is specified. You
would not expect to see any "optimisation", since optimisations should
not change the results (apart from choosing between alternative
valid results).
Where you will see the optimisation difference is between :
const int xs[] = {
#embed "x.dat"
};
and
const int xs[] = {
#include "x.csv"
};
where "x.dat" is a large binary file, and "x.csv" is the same data as
comma-separated values. The #embed version will compile very much
faster, using far less memory. /That/ is the optimisation.
Why would it compile faster? #embed expands to something similar to
CSV, which still has to be parsed.
Reference: <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf> 6.10.4.
The first one will probably initialize each int element of xs to a
single byte value extracted from x.dat. Is that what you intended?
#embed works best with arrays of unsigned char.
If you mean that the #embed will expand to something other than the
sequence of integer constants, how does it know to do that in this
context?
If you have a binary file containing a sequence of int values, you can
use #embed to initialize an unsigned char array that's aliased with or
copied to the int array.
The *embed element width* is typically going to be CHAR_BIT bits by
default. It can only be changed by an *implementation-defined* embed parameter. It seems odd that there's no standard way to specify the
element width.
It seems even more odd that the embed element width is
implementation defined and not set to CHAR_BIT by default.
A conforming implementation could set the embed element width to,
say, 4*CHAR_BIT and then not provide an implementation-defined embed parameter to specify a different width, making #embed unusable for
unsigned char arrays. (N3220 is a draft, not the final C23 standard,
but I haven't heard about any changes in this area.)
The kind of optimization I was thinking about was having #embed, in some cases, expand to something other than the specified sequence of comma-separated integer constants. Such an optimization would be
intended to improve compile-time speed and memory usage, not run-time performance.
With a straightforward implementation, the preprocessor has to generate
a sequence of integer constants as text, and then later compiler phases
have to parse that text sequence and generate the corresponding code.
Given:
const unsigned char data[4] = {
#embed "four_bytes.dat"
};
That 4 byte data file is translated to something like "1, 2, 3, 4", then converted into a stream of tokens, then those tokens are parsed, then,
given the context, the original 4-byte sequence is written into the
generated object file.
For a very large file, that could be a significant burden. (I don't
have any numbers on that.)
An optimized version might have the preprocessor generate some compiler-specific binary output, say something like "@rawdata N"
followed by N bytes of raw data. Later compiler phases recognize the "@rawdata" construct and directly dump the data into the object file in
the right place. Making #embed generate @rawdata is only part of the solution; the compiler has to implement @rawdata in a way that allows it
to be used inside an initializer, or perhaps in any other appropriate context.
This could be substantially more efficient for something like:
static const unsigned char data[] = {
#embed "bigfile.dat"
};
Of course it wouldn't handle my test case above. But #embed can take parameters, so it could generate the standard sequence by default and "@rawdata" if you ask for it.
I don't know whether this kind of optimization is worthwhile, i.e.,
whether the straightforward implementation really imposes significant compile-time performance penalties that @rawdata or equivalent can
solve. I also don't know whether existing implementations will
implement this kind of optimization (so far they haven't implemented
#embed at all).
jak <[email protected]> writes:
Kaz Kylheku wrote:
On 2024-05-24, jak <[email protected]> wrote:
Bonita Montero wrote:
On 23.05.2024 at 21:49, Thiago Adams wrote:
On 23/05/2024 16:25, Bonita Montero wrote:
I ask myself what the point is in further developing a language
like this that can actually no longer be saved.
do you mean C++?
No, C.
I think you have a lot of confusion about programming languages. C and
C++ are not comparable languages.
Except for observations like that we can write useful, production
software that compiles as C or C++, but go on ...
Indeed there are c++ compilers who, if used to compile c code, could
decide to call the c compiler to do the work, but if something in the
code is not strictly c, then the compilation will be in c++, the size
of the executable will increase significantly and will need of an
internal or external runtimer to work. If it were the same thing you
would not get different things.
Oh? Do you know of a C++ compiler that actually behaves this way?
I've never heard of such a thing.
C and C++ are closely related, and C and C++ compilers often share
backends, but the two languages have different grammars. The gcc
command, for example, can invoke either a C or C++ compiler, but it
knows which language it's compiling based on the source file name or
command line options, before it's even seen the content.
There are programs that are valid C and valid C++ but with different behavior. How would a compiler that behaves as you describe cope with
that?
On 26/05/2024 00:58, Keith Thompson wrote:
For a very large file, that could be a significant burden. (I don't
have any numbers on that.)
I do :
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#design-efficiency-metrics>
(That's from a proposal for #embed for C and C++. Generating the
numbers and parsing them is akin to using xxd.)
More useful links:
<https://thephd.dev/embed-the-details#results> <https://thephd.dev/implementing-embed-c-and-c++>
(These are from someone who did a lot of the work for the proposals, and prototype implementations, as far as I understand it.)
Note that I can't say how much of a difference this will make in real
life. I don't know how often people need to include multi-megabyte
files in their code. It certainly is not at a level where I would
change any of my existing projects from external generator scripts to
using #embed, but I might use it in future projects.
Keith Thompson wrote:
jak <[email protected]> writes:
Kaz Kylheku wrote:
On 2024-05-24, jak <[email protected]> wrote:
Bonita Montero wrote:
On 23.05.2024 at 21:49, Thiago Adams wrote:
On 23/05/2024 16:25, Bonita Montero wrote:
I ask myself what the point is in further developing a
language like this that can actually no longer be saved.
do you mean C++?
No, C.
I think you have a lot of confusion about programming languages.
C and C++ are not comparable languages.
Except for observations like that we can write useful, production
software that compiles as C or C++, but go on ...
Indeed there are c++ compilers who, if used to compile c code,
could decide to call the c compiler to do the work, but if
something in the code is not strictly c, then the compilation will
be in c++, the size of the executable will increase significantly
and will need of an internal or external runtimer to work. If it
were the same thing you would not get different things.
Oh? Do you know of a C++ compiler that actually behaves this way?
I've never heard of such a thing.
C and C++ are closely related, and C and C++ compilers often share backends, but the two languages have different grammars. The gcc
command, for example, can invoke either a C or C++ compiler, but it
knows which language it's compiling based on the source file name or command line options, before it's even seen the content.
There are programs that are valid C and valid C++ but with different behavior. How would a compiler that behaves as you describe cope
with that?
For example g++ makes something similar: if you pass a file .C it
compile the C code but if the file (.C) contains C++ code then
compile C++.
On 26/05/2024 12:09, David Brown wrote:
On 26/05/2024 00:58, Keith Thompson wrote:
For a very large file, that could be a significant burden. (I
don't have any numbers on that.)
I do :
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#design-efficiency-metrics>
(That's from a proposal for #embed for C and C++. Generating the
numbers and parsing them is akin to using xxd.)
More useful links:
<https://thephd.dev/embed-the-details#results> <https://thephd.dev/implementing-embed-c-and-c++>
(These are from someone who did a lot of the work for the
proposals, and prototype implementations, as far as I understand
it.)
Note that I can't say how much of a difference this will make in
real life. I don't know how often people need to include
multi-megabyte files in their code. It certainly is not at a level
where I would change any of my existing projects from external
generator scripts to using #embed, but I might use it in future
projects.
I've just done my own quick test (not in C, using embed in my
language):
[]byte clangexe = binclude("f:/llvm/bin/clang.exe")
proc main=
fprintln "clang.exe is # bytes", clangexe.len
end
This embeds the Clang C compiler which is 119MB. It took 1.3 seconds
to compile (note my compiler is not optimised).
If I tried it using text: a 121M-line include file, with one number
per line, it took 144 seconds (I believe it used more RAM than was available: each line will have occupied a 64-byte AST node, so nearly
8GB, on a machine with only 6GB available RAM, much of which was
occupied).
The figures at your link say it took 1 second for a 40MB test file,
on an Intel i7 with 24GB.
My compiler took just over 1.3 seconds (now annoyingly taking 1.4
seconds for a retest) for a file nearly 3 times bigger, on a much
more lowly machine (second cheapest PC in the shop), with 8GB.
So my implementation sounds faster. Of course, those 120M data bytes
haven't been optimised!
As for usage, this would be a tidy way of bundling a program like a C compiler if your program required it, although there are a number of alternatives in that case: the binary here doesn't need to exist in
the application's data space.
On Sun, 26 May 2024 13:44:32 +0200
jak <[email protected]> wrote:
Keith Thompson wrote:
jak <[email protected]> writes:
Kaz Kylheku wrote:
On 2024-05-24, jak <[email protected]> wrote:
Bonita Montero wrote:
On 23.05.2024 at 21:49, Thiago Adams wrote:
On 23/05/2024 16:25, Bonita Montero wrote:
I ask myself what the point is in further developing a
language like this that can actually no longer be saved.
do you mean C++?
No, C.
I think you have a lot of confusion about programming languages.
C and C++ are not comparable languages.
Except for observations like that we can write useful, production
software that compiles as C or C++, but go on ...
Indeed there are c++ compilers who, if used to compile c code,
could decide to call the c compiler to do the work, but if
something in the code is not strictly c, then the compilation will
be in c++, the size of the executable will increase significantly
and will need of an internal or external runtimer to work. If it
were the same thing you would not get different things.
Oh? Do you know of a C++ compiler that actually behaves this way?
I've never heard of such a thing.
C and C++ are closely related, and C and C++ compilers often share
backends, but the two languages have different grammars. The gcc
command, for example, can invoke either a C or C++ compiler, but it
knows which language it's compiling based on the source file name or
command line options, before it's even seen the content.
There are programs that are valid C and valid C++ but with different
behavior. How would a compiler that behaves as you describe cope
with that?
For example g++ makes something similar: if you pass a file .C it
compile the C code but if the file (.C) contains C++ code then
compile C++.
No, it does not.
g++ compiles as C++ unless you tell it to compile as C with '-x c'
option.
David Brown <[email protected]> writes:
[...]
The normal way for multi-threaded systems is to implement it as a
macro. It might be, for example :
#define errno __thread_data->_errno
or
#define errno *errno()
Both of those need more parentheses -- and I'm uncomfortable using the
same identifier for the macro and the function.
[...]
That is precisely why it is specified in the C standards as a macro,
not an external linkage object with static or thread-local storage
duration. (The use of errno in multi-threading C code long predates
C11 and _Thread_local.)
glibc and musl both have :
# define errno (*__errno_location ())
newlib (used on Cygwin) has something similar :
#define errno (*__errno())
Michael S wrote:
On Sun, 26 May 2024 13:44:32 +0200
jak <[email protected]> wrote:
Keith Thompson wrote:
jak <[email protected]> writes:
Kaz Kylheku wrote:
On 2024-05-24, jak <[email protected]> wrote:
Bonita Montero wrote:
On 23.05.2024 at 21:49, Thiago Adams wrote:
On 23/05/2024 16:25, Bonita Montero wrote:
I ask myself what the point is in further developing a
language like this that can actually no longer be saved.
do you mean C++?
No, C.
I think you have a lot of confusion about programming
languages. C and C++ are not comparable languages.
Except for observations like that we can write useful,
production software that compiles as C or C++, but go on ...
Indeed there are c++ compilers who, if used to compile c code,
could decide to call the c compiler to do the work, but if
something in the code is not strictly c, then the compilation
will be in c++, the size of the executable will increase
significantly and will need of an internal or external runtimer
to work. If it were the same thing you would not get different
things.
Oh? Do you know of a C++ compiler that actually behaves this way?
I've never heard of such a thing.
C and C++ are closely related, and C and C++ compilers often share
backends, but the two languages have different grammars. The gcc
command, for example, can invoke either a C or C++ compiler, but
it knows which language it's compiling based on the source file
name or command line options, before it's even seen the content.
There are programs that are valid C and valid C++ but with
different behavior. How would a compiler that behaves as you
describe cope with that?
For example g++ makes something similar: if you pass a file .C it
compile the C code but if the file (.C) contains C++ code then
compile C++.
No, it does not.
g++ compiles as C++ unless you tell it to compile as C with '-x c'
option.
You didn't read carefully or I didn't express myself well. I wrote
that the g++ compile c++ even if it is written inside a .c file.
However in doubt I preferred to try. If I pass to g++ a .c file that
contains c code, it compiles without any option, perhaps because it
reads as if it were c++ but in any case compiles it.
David Brown <[email protected]> writes:
On 24/05/2024 21:29, Keith Thompson wrote:
[...]
"static_assert" is already a macro defined in <assert.h> starting in
C11. The above code is valid in pre-C23, but will break in C11 and C17
if it includes <assert.h> directly or indirectly.
Yes. But including <assert.h> is optional.
Your header that defines your own "static_assert" macro might depend on
some other header outside your control. A future version of that other header might add a "#include <assert.h>", breaking your code.
There are solutions (check "#ifdef static_assert" for the macro and __STDC_VERSION__ for the keyword, etc.)
Perhaps it's not an issue for you, but it's a corner case to keep in
mind.
On 26/05/2024 15:46, jak wrote:
Michael S wrote:
On Sun, 26 May 2024 13:44:32 +0200
jak <[email protected]> wrote:
Keith Thompson wrote:
jak <[email protected]> writes:
Kaz Kylheku wrote:
On 2024-05-24, jak <[email protected]> wrote:
Bonita Montero wrote:
On 23.05.2024 at 21:49, Thiago Adams wrote:
On 23/05/2024 16:25, Bonita Montero wrote:
I ask myself what the point is in further developing a
language like this that can actually no longer be saved.
do you mean C++?
No, C.
I think you have a lot of confusion about programming
languages. C and C++ are not comparable languages.
Except for observations like that we can write useful,
production software that compiles as C or C++, but go on ...
Indeed there are c++ compilers who, if used to compile c code,
could decide to call the c compiler to do the work, but if
something in the code is not strictly c, then the compilation
will be in c++, the size of the executable will increase
significantly and will need of an internal or external runtimer
to work. If it were the same thing you would not get different
things.
Oh? Do you know of a C++ compiler that actually behaves this
way? I've never heard of such a thing.
C and C++ are closely related, and C and C++ compilers often
share backends, but the two languages have different grammars.
The gcc command, for example, can invoke either a C or C++
compiler, but it knows which language it's compiling based on
the source file name or command line options, before it's even
seen the content.
There are programs that are valid C and valid C++ but with
different behavior. How would a compiler that behaves as you
describe cope with that?
For example g++ makes something similar: if you pass a file .C it
compile the C code but if the file (.C) contains C++ code then
compile C++.
No.
No, it does not.
g++ compiles as C++ unless you tell it to compile as C with '-x c'
option.
No.
You didn't read carefully or I didn't express myself well. I wrote
that the g++ compile c++ even if it is written inside a .c file.
However in doubt I preferred to try. If I pass to g++ a .c file that contains c code, it compiles without any option, perhaps because it
reads as if it were c++ but in any case compiles it.
No.
The way gcc handles all this is actually quite straightforward.
First, there is no difference between the commands "gcc" and "g++" in
the languages supported, or the way the language is determined. The
only difference between these two is the standard libraries linked by default when generating a final executable - "g++" automatically
includes the C++ standard libraries, while "gcc" only has the C
standard libraries.
In neither case does "gcc" or "g++" actually handle the compilation -
these are driver front-ends that pass things on to the actual
compilers, assemblers and linkers (and any other bits and pieces
required).
The front-ends determine the language to use primarily from the
suffix of the source file it is given. ".c" files are compiled as C.
".cpp", ".c++", ".cc", ".C" (note the capital C), ".cp", ".cxx", and
".CPP" are compiled as C++. (There are many other extensions
supported for different languages.)
The language choice can be overridden by using the "-x" switch, such
as "-x c" or "-x c++". The standard can be specified with "-std=".
There is no automatic detection of C or C++ based on the /content/ of
the files.
<https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html>
On 26/05/2024 15:46, jak wrote:
Michael S wrote:
On Sun, 26 May 2024 13:44:32 +0200
jak <[email protected]> wrote:
Keith Thompson wrote:
jak <[email protected]> writes:
Kaz Kylheku wrote:
On 2024-05-24, jak <[email protected]> wrote:
Bonita Montero wrote:
On 23.05.2024 at 21:49, Thiago Adams wrote:
On 23/05/2024 16:25, Bonita Montero wrote:
I ask myself what the point is in further developing a
language like this that can actually no longer be saved.
do you mean C++?
No, C.
I think you have a lot of confusion about programming languages.
C and C++ are not comparable languages.
Except for observations like that we can write useful, production
software that compiles as C or C++, but go on ...
Indeed there are c++ compilers who, if used to compile c code,
could decide to call the c compiler to do the work, but if
something in the code is not strictly c, then the compilation will
be in c++, the size of the executable will increase significantly
and will need of an internal or external runtimer to work. If it
were the same thing you would not get different things.
Oh? Do you know of a C++ compiler that actually behaves this way?
I've never heard of such a thing.
C and C++ are closely related, and C and C++ compilers often share
backends, but the two languages have different grammars. The gcc
command, for example, can invoke either a C or C++ compiler, but it
knows which language it's compiling based on the source file name or
command line options, before it's even seen the content.
There are programs that are valid C and valid C++ but with different
behavior. How would a compiler that behaves as you describe cope
with that?
For example g++ makes something similar: if you pass a file .C it
compile the C code but if the file (.C) contains C++ code then
compile C++.
No.
No, it does not.
g++ compiles as C++ unless you tell it to compile as C with '-x c'
option.
No.
You didn't read carefully or I didn't express myself well. I wrote that
the g++ compile c++ even if it is written inside a .c file.
However in doubt I preferred to try. If I pass to g++ a .c file that
contains c code, it compiles without any option, perhaps because it
reads as if it were c++ but in any case compiles it.
No.
The way gcc handles all this is actually quite straightforward.
First, there is no difference between the commands "gcc" and "g++" in
the languages supported, or the way the language is determined. The
only difference between these two is the standard libraries linked by
default when generating a final executable - "g++" automatically
includes the C++ standard libraries, while "gcc" only has the C standard libraries.
In neither case does "gcc" or "g++" actually handle the compilation -
these are driver front-ends that pass things on to the actual compilers, assemblers and linkers (and any other bits and pieces required).
The front-ends determine the language to use primarily from the suffix
of the source file it is given. ".c" files are compiled as C. ".cpp", ".c++", ".cc", ".C" (note the capital C), ".cp", ".cxx", and ".CPP" are compiled as C++. (There are many other extensions supported for
different languages.)
The language choice can be overridden by using the "-x" switch, such as
"-x c" or "-x c++". The standard can be specified with "-std=".
There is no automatic detection of C or C++ based on the /content/ of
the files.
<https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html>
?
I really wrote that something similar (similar != equal) did g++ and
that, if you write c++ code in a file with the .c extension, the g++
compile it. I never wrote that it was automatically recognized.
In addition, you just explained why g++ compile a .c that contains c++
code. I don't understand: no what?
On Sun, 26 May 2024 12:51:12 +0100
bart <[email protected]> wrote:
On 26/05/2024 12:09, David Brown wrote:
On 26/05/2024 00:58, Keith Thompson wrote:
For a very large file, that could be a significant burden. (I
don't have any numbers on that.)
I do :
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#design-efficiency-metrics>
(That's from a proposal for #embed for C and C++. Generating the
numbers and parsing them is akin to using xxd.)
More useful links:
<https://thephd.dev/embed-the-details#results>
<https://thephd.dev/implementing-embed-c-and-c++>
(These are from someone who did a lot of the work for the
proposals, and prototype implementations, as far as I understand
it.)
Note that I can't say how much of a difference this will make in
real life. I don't know how often people need to include
multi-megabyte files in their code. It certainly is not at a level
where I would change any of my existing projects from external
generator scripts to using #embed, but I might use it in future
projects.
I've just done my own quick test (not in C, using embed in my
language):
[]byte clangexe = binclude("f:/llvm/bin/clang.exe")
proc main=
fprintln "clang.exe is # bytes", clangexe.len
end
This embeds the Clang C compiler which is 119MB. It took 1.3 seconds
to compile (note my compiler is not optimised).
If I tried it using text: a 121M-line include file, with one number
per line, it took 144 seconds (I believe it used more RAM than was
available: each line will have occupied a 64-byte AST node, so nearly
8GB, on a machine with only 6GB available RAM, much of which was
occupied).
On my old PC, which was not the cheapest box in the shop but is more than
10 years old, compilation speed for similarly organized (but much smaller)
text files is as follows:
MSVC 18.00.31101 (VS 2013) - 1950 KB/sec
MSVC 19.16.27032 (VS 2017) - 1180 KB/sec
MSVC 19.20.27500 (VS 2019) - 1180 KB/sec
clang 17.0.6 - 547 KB/sec (somewhat better with hex text)
gcc 13.2.0 - 580 KB/sec
So, MSVC compilers, esp. an old one, are somewhat faster than yours.
But if there was swapping involved it's not comparable. How much time
does it take for your compiler to produce 5MB byte array from text?
But both are much faster than compiling through text. Even "slow"
40MB/3 is 6-7 times faster than the fastest of compilers in my tests.
On 5/26/2024 9:18 AM, David Brown wrote:
On 26/05/2024 01:45, Keith Thompson wrote:
David Brown <[email protected]> writes:
[...]
The normal way for multi-threaded systems is to implement it as a
macro. It might be, for example :
#define errno __thread_data->_errno
or
#define errno *errno()
Both of those need more parentheses -- and I'm uncomfortable using the
same identifier for the macro and the function.
The second example was from the footnote in the C standard's section
on <errno.h>, so it can't be /that/ bad!
But I agree with your discomfort.
I would expect it to immediately explode, because AFAIK the usual preprocessor behavior is to keep expanding macros in a line until there
is nothing left to expand.
Well, granted, it is possible I could have misinterpreted how it was
supposed to work and had never noticed...
On Sun, 26 May 2024 16:29:35 +0200
David Brown <[email protected]> wrote:
On 26/05/2024 15:46, jak wrote:
Michael S wrote:
On Sun, 26 May 2024 13:44:32 +0200
jak <[email protected]> wrote:
Keith Thompson wrote:
jak <[email protected]> writes:
Kaz Kylheku wrote:
On 2024-05-24, jak <[email protected]> wrote:
Bonita Montero wrote:
On 23.05.2024 at 21:49, Thiago Adams wrote:
On 23/05/2024 16:25, Bonita Montero wrote:
I ask myself what the point is in further developing a
language like this that can actually no longer be saved.
do you mean C++?
No, C.
I think you have a lot of confusion about programming
languages. C and C++ are not comparable languages.
Except for observations like that we can write useful,
production software that compiles as C or C++, but go on ...
Indeed there are c++ compilers who, if used to compile c code,
could decide to call the c compiler to do the work, but if
something in the code is not strictly c, then the compilation
will be in c++, the size of the executable will increase
significantly and will need of an internal or external runtimer
to work. If it were the same thing you would not get different
things.
Oh? Do you know of a C++ compiler that actually behaves this
way? I've never heard of such a thing.
C and C++ are closely related, and C and C++ compilers often
share backends, but the two languages have different grammars.
The gcc command, for example, can invoke either a C or C++
compiler, but it knows which language it's compiling based on
the source file name or command line options, before it's even
seen the content.
There are programs that are valid C and valid C++ but with
different behavior. How would a compiler that behaves as you
describe cope with that?
For example g++ makes something similar: if you pass a file .C it
compile the C code but if the file (.C) contains C++ code then
compile C++.
No.
No, it does not.
g++ compiles as C++ unless you tell it to compile as C with '-x c'
option.
No.
You didn't read carefully or I didn't express myself well. I wrote
that the g++ compile c++ even if it is written inside a .c file.
However in doubt I preferred to try. If I pass to g++ a .c file that
contains c code, it compiles without any option, perhaps because it
reads as if it were c++ but in any case compiles it.
No.
The way gcc handles all this is actually quite straightforward.
First, there is no difference between the commands "gcc" and "g++" in
the languages supported, or the way the language is determined. The
only difference between these two is the standard libraries linked by
default when generating a final executable - "g++" automatically
includes the C++ standard libraries, while "gcc" only has the C
standard libraries.
In neither case does "gcc" or "g++" actually handle the compilation -
these are driver front-ends that pass things on to the actual
compilers, assemblers and linkers (and any other bits and pieces
required).
I don't know how it works in your environment.
I am 100% sure that it works like I wrote above in my environment. Specifically:
'g++ -c foo.c' calls binary cc1plus.exe
'g++ -c -x c foo.c' calls binary cc1.exe
'gcc -c foo.c' calls binary cc1.exe
'gcc -c foo.cpp' calls binary cc1plus.exe
'gcc -c foo.C' calls binary cc1plus.exe
The front-ends determine the language to use primarily from the
suffix of the source file it is given. ".c" files are compiled as C.
".cpp", ".c++", ".cc", ".C" (note the capital C), ".cp", ".cxx", and
".CPP" are compiled as C++. (There are many other extensions
supported for different languages.)
In my environment it applies to gcc, but not to g++.
In order to force my g++ to compile for other language you have to tell
it so explicitly.
The language choice can be overridden by using the "-x" switch, such
as "-x c" or "-x c++". The standard can be specified with "-std=".
Yes, of course.
There is no automatic detection of C or C++ based on the /content/ of
the files.
Yes, of course.
<https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html>
David Brown wrote:
[...]
?
I really wrote that something similar (similar != equal) did g++ and
that, if you write c++ code in a file with the .c extension, the g++
compile it. I never wrote that it was automatically recognized.
In addition, you just explained why g++ compile a .c that contains c++
code. I don't understand: no what?
On 26/05/2024 14:18, Michael S wrote:
On Sun, 26 May 2024 12:51:12 +0100
bart <[email protected]> wrote:
On 26/05/2024 12:09, David Brown wrote:
On 26/05/2024 00:58, Keith Thompson wrote:
For a very large file, that could be a significant burden. (I
don't have any numbers on that.)
I do :
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#design-efficiency-metrics>
(That's from a proposal for #embed for C and C++. Generating the
numbers and parsing them is akin to using xxd.)
More useful links:
<https://thephd.dev/embed-the-details#results>
<https://thephd.dev/implementing-embed-c-and-c++>
(These are from someone who did a lot of the work for the
proposals, and prototype implementations, as far as I understand
it.)
Note that I can't say how much of a difference this will make in
real life. I don't know how often people need to include
multi-megabyte files in their code. It certainly is not at a
level where I would change any of my existing projects from
external generator scripts to using #embed, but I might use it in
future projects.
I've just done my own quick test (not in C, using embed in my
language):
[]byte clangexe = binclude("f:/llvm/bin/clang.exe")
proc main=
fprintln "clang.exe is # bytes", clangexe.len
end
This embeds the Clang C compiler which is 119MB. It took 1.3
seconds to compile (note my compiler is not optimised).
If I tried it using text: a 121M-line include file, with one number
per line, it took 144 seconds (I believe it used more RAM than was
available: each line will have occupied a 64-byte AST node, so
nearly 8GB, on a machine with only 6GB available RAM, much of
which was occupied).
On my old PC, which was not the cheapest box in the shop but is more
than 10 years old, compilation speed for similarly organized (but much
smaller) text files is as follows:
MSVC 18.00.31101 (VS 2013) - 1950 KB/sec
MSVC 19.16.27032 (VS 2017) - 1180 KB/sec
MSVC 19.20.27500 (VS 2019) - 1180 KB/sec
clang 17.0.6 - 547 KB/sec (somewhat better with hex text)
gcc 13.2.0 - 580 KB/sec
So, MSVC compilers, esp. an old one, are somewhat faster than yours.
But if there was swapping involved it's not comparable. How much
time does it take for your compiler to produce 5MB byte array from
text?
Are you talking about a 5MB array initialised like this:
unsigned char data[] = {
45,
67,
17,
... // 5M-3 more rows
};
The timing for 120M entries was challenging as it exceeded physical
memory. However that test I can also do with C compilers. Results for
120 million lines of data are:
DMC - Out-of-memory
Tiny C - Silently stopped after 13 seconds (I thought it
had finished but no)
lccwin32 - Insufficient memory
gcc 10.x.x - Out of memory after 80 seconds
mcc - (My product) Memory failure after 27 seconds
Clang - (Crashed after 5 minutes)
MM 144s (Compiler for my language)
So the compiler for my language did quite well, considering!
Back to the 5MB test:
Tiny C 1.7s 2.9MB/sec (Tcc doesn't use any IR)
mcc 3.7s 1.3MB/sec (my product; uses intermediate ASM)
DMC -- -- (Out of memory; 32-bit compiler)
lccwin32 3.9s 1.3MB/sec
gcc 10.x 10.6s 0.5MB/sec
clang 7.4s 0.7MB/sec (to object file only)
MM 1.4s 3.6MB/sec (compiler for my language)
MM 0.7 7.1MB/sec (MM optimised via C and gcc-O3)
As a reminder, when using my version of 'embed' in my language,
embedding a 120MB binary file took 1.3 seconds, about 90MB/second.
But both are much faster than compiling through text. Even "slow"
40MB/3 is 6-7 times faster than the fastest of compilers in my
tests.
Do you have a C compiler that supports #embed?
It's generally understood that processing text is slow, if
representing byte-at-a-time data. If byte arrays could be represented
as sequences of i64 constants, it would improve matters. That could
be done in C, but awkwardly, by aliasing a byte-array with an
i64-array.
On 26/05/2024 17:05, Michael S wrote:
In my environment it applies to gcc, but not to g++.
In order to force my g++ to compile for other language you have to
tell it so explicitly.
No, g++ treats extensions other than ".c" the same way as gcc. (I
tested to be sure this time!) Try :
touch foo.f
gcc foo.f
g++ foo.f
You'll get the same complaint - either from missing Fortran support
or a failure to build the Fortran program. Even "g++ foo.m" tries to
compile as Objective-C, not Objective-C++.
On 26/05/2024 17:10, jak wrote:
[...]
?
I really wrote that something similar (similar != equal) did g++ and
that, if you write c++ code in a file with the .c extension, the g++
compile it. I never wrote that it was automatically recognized.
In addition, you just explained why g++ compile a .c that contains c++
code. I don't understand: no what?
I made an error here - "g++ foo.c" /will/ treat the file as C++. I apologise for that, as it made things a lot more confusing.
But that is not what you wrote. Perhaps you didn't write what you
intended to write. You said that g++ somehow determines whether to
compile code as C or C++ based on the /contents/ of the file, not the filename suffix. And that is completely wrong.
You also mixed up ".c" and ".C". gcc considers ".c" to be C code, while ".C" (with a capital C) is considered C++.
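Because the driver picks the language from the suffix (or "-x") and never from the file contents, the same source text can behave differently depending on which compiler ends up being run. A minimal sketch of one well-known difference follows; the function names are illustrative only and do not come from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* In C a character constant has type int; in C++ it has type char.
   The same expression therefore yields a different value depending on
   which compiler the driver selected. */
size_t char_constant_size(void)
{
    return sizeof('a');
}

/* Returns 1 when compiled as C++, 0 when compiled as C. */
int compiled_as_cplusplus(void)
{
#ifdef __cplusplus
    return 1;
#else
    return 0;
#endif
}

/* Checks that the observed size matches the rules of the language
   this translation unit was actually compiled as. */
void check_language_rules(void)
{
    if (compiled_as_cplusplus())
        assert(char_constant_size() == 1);
    else
        assert(char_constant_size() == sizeof(int));
}
```

Saved as foo.c, this should take the C branch under "gcc foo.c" and the C++ branch under "g++ foo.c" or "gcc -x c++ foo.c", in line with the driver behaviour described above.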
On Sun, 26 May 2024 17:10:01 +0200
jak <[email protected]> wrote:
?
I really wrote that something similar (similar != equal) did g++ and
that, if you write c++ code in a file with the .c extension, the g++
compile it. I never wrote that it was automatically recognized.
In addition, you just explained why g++ compile a .c that contains c++
code. I don't understand: no what?
Your English is already harder to understand than mine.
Congratulations, that is no small feat. But you still have far to go.
Keep practising.
On Sun, 26 May 2024 16:25:51 +0100
bart <[email protected]> wrote:
On 26/05/2024 14:18, Michael S wrote:
Are you talking about a 5MB array initialised like this:
unsigned char data[] = {
45,
67,
17,
... // 5M-3 more rows
};
Yes.
The timing for 120M entries was challenging as it exceeded physical
memory. However that test I can also do with C compilers. Results for
120 million lines of data are:
DMC - Out-of-memory
Tiny C - Silently stopped after 13 seconds (I thought it
had finished but no)
lccwin32 - Insufficient memory
gcc 10.x.x - Out of memory after 80 seconds
mcc - (My product) Memory failure after 27 seconds
Clang - (Crashed after 5 minutes)
MM 144s (Compiler for my language)
So the compiler for my language did quite well, considering!
That's an interesting test as well, but I don't want to run it on my HW
right now. Maybe at night.
Back to the 5MB test:
Tiny C 1.7s 2.9MB/sec (Tcc doesn't use any IR)
mcc 3.7s 1.3MB/sec (my product; uses intermediate ASM)
Faster than new MSVC, but slower than old MSVC.
DMC -- -- (Out of memory; 32-bit compiler)
lccwin32 3.9s 1.3MB/sec
gcc 10.x 10.6s 0.5MB/sec
clang 7.4s 0.7MB/sec (to object file only)
MM 1.4s 3.6MB/sec (compiler for my language)
MM 0.7 7.1MB/sec (MM optimised via C and gcc-O3)
That's quite impressive.
Does it generate object files or go directly to exe?
Even if the latter, it's still impressive.
As a reminder, when using my version of 'embed' in my language,
embedding a 120MB binary file took 1.3 seconds, about 90MB/second.
But both are much faster than compiling through text. Even "slow"
40MB/3 is 6-7 times faster than the fastest of compilers in my
tests.
Do you have a C compiler that supports #embed?
No, I just blindly believe the paper.
But it probably would be available in clang this year and in gcc around
start of the next year. At least I hope so.
It's generally understood that processing text is slow, if
representing byte-at-a-time data. If byte arrays could be represented
as sequences of i64 constants, it would improve matters. That could
be done in C, but awkwardly, by aliasing a byte-array with an
i64-array.
I don't think that conversion from text to binary is a significant
bottleneck here.
The Baby X resource compiler has a 'binary' tag to embed binary data.
The biggest file in my documents folder was a 33 mb boost zipped
image. And the resource compiler, built in debug mode, took five
seconds to convert that to a C source file with an array of unsigned
chars.
It then took gcc about 20 seconds to compile it to an object file.
The output file was 218 mb. It goes straight in the bin.
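The conversion step described here (binary data in, a C source file with an array of unsigned chars out) can be sketched roughly as follows. This is not the Baby X implementation; write_c_array and its parameters are hypothetical names chosen for illustration, and it produces the same kind of one-number-per-element text that the timing tests in this thread feed to the compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes `len` bytes from `data` to `out` as a C
   array definition named `name`. Returns 0 on success, -1 if a write
   failed. `len` must be non-zero, since C has no zero-length arrays. */
int write_c_array(FILE *out, const char *name,
                  const unsigned char *data, size_t len)
{
    assert(out != NULL && name != NULL && data != NULL && len > 0);

    if (fprintf(out, "unsigned char %s[%zu] = {\n", name, len) < 0)
        return -1;
    for (size_t i = 0; i < len; i++) {
        /* Break the line after every 12 values and after the last one. */
        const char *sep = (i + 1 == len || (i + 1) % 12 == 0) ? ",\n" : ", ";
        if (fprintf(out, "%u%s", (unsigned)data[i], sep) < 0)
            return -1;
    }
    if (fprintf(out, "};\n") < 0)
        return -1;
    return 0;
}
```

Each input byte becomes up to five characters of text, which is why the generated source is several times larger than the original file and slow to compile.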
On 26/05/2024 17:35, Michael S wrote:
On Sun, 26 May 2024 16:25:51 +0100
bart <[email protected]> wrote:
Back to the 5MB test:
Tiny C 1.7s 2.9MB/sec (Tcc doesn't use any IR)
mcc 3.7s 1.3MB/sec (my product; uses intermediate
ASM)
Faster than new MSVC, but slower than old MSVC.
My mcc is never going to be fast, because it uses ASM, which itself
will generate a text file several times larger than the C (so the
line "123," in C ends up as " db 123" in the ASM file).
Keith Thompson wrote:
Indeed there are c++ compilers who, if used to compile c code, could
decide to call the c compiler to do the work, but if something in the
code is not strictly c, then the compilation will be in c++, the size
of the executable will increase significantly and will need of an
internal or external runtimer to work. If it were the same thing you
would not get different things.
Oh? Do you know of a C++ compiler that actually behaves this way?
I've never heard of such a thing.
For example g++ makes something similar: if you pass a file .C it
compile the C code but if the file (.C) contains C++ code then
compile C++.
On Sun, 26 May 2024 19:01:21 +0100
bart <[email protected]> wrote:
On 26/05/2024 17:35, Michael S wrote:
On Sun, 26 May 2024 16:25:51 +0100
bart <[email protected]> wrote:
Back to the 5MB test:
Tiny C 1.7s 2.9MB/sec (Tcc doesn't use any IR)
mcc 3.7s 1.3MB/sec (my product; uses intermediate
ASM)
Faster than new MSVC, but slower than old MSVC.
My mcc is never going to be fast, because it uses ASM, which itself
will generate a text file several times larger than the C (so the
line "123," in C ends up as " db 123" in the ASM file).
Generation of asm at 7-8 MB/s sounds feasible even on slow computer.
And once you have asm in right format,
'gnu as' processes it quite fast.
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

// Reads comma-separated integer constants from stdin and writes each
// value as one byte to the file named by argv[1].
int main(int argz, char** argv)
{
  if (argz > 1) {
    FILE* fp = fopen(argv[1], "wb");
    if (fp) {
      char buf[2048];
      _Bool look_for_comma = 0;
      for (;;) {
        if (fgets(buf, sizeof(buf), stdin) != buf)
          break;
        char* p = buf;
        for (;;) {
          // unsigned char: passing a negative char to isgraph() is undefined
          unsigned char c = (unsigned char)*p;
          if (isgraph(c)) {
            if (look_for_comma) {
              if (c == ',') {
                look_for_comma = 0;
                ++p;
              } else {
                goto done;
              }
            } else {
              char* endp;
              long val = strtol(p, &endp, 0);
              if (endp==p) // not a number
                goto done;
              fputc((unsigned char)val, fp);
              p = endp;
              look_for_comma = 1;
            }
          } else {
            if (c == 0)
              break; // end of line
            ++p; // skip space or control character
          }
        }
      }
      done:
      fclose(fp);
    } else {
      perror(argv[1]);
      return 1;
    }
  }
  return 0;
}
On 5/26/2024 9:18 AM, David Brown wrote:...
On 26/05/2024 01:45, Keith Thompson wrote:
David Brown <[email protected]> writes:
#define errno *errno()
Both of those need more parentheses -- and I'm uncomfortable using the
same identifier for the macro and the function.
The second example was from the footnote in the C standard's section on
<errno.h>, so it can't be /that/ bad!
But I agree with your discomfort.
I would expect it to immediately explode, because AFAIK the usual preprocessor behavior is to keep expanding macros in a line until there
is nothing left to expand.
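For what it's worth, the rescanning rule in the C standard (6.10.3.4 in C11 and C17) says that the name of the macro currently being replaced is not replaced again when it turns up during the rescan, so a definition of this shape does not recurse. A small sketch with the extra parentheses added; my_errno and my_errno_storage are made-up names standing in for the real errno machinery:

```c
#include <assert.h>

/* Hypothetical stand-in for an implementation's error slot. */
static int my_errno_storage;

int *my_errno(void)
{
    return &my_errno_storage;
}

/* The macro shares its name with the function above. While my_errno is
   being replaced, the my_errno token in the replacement list is left
   alone, so the expansion stops after one step. */
#define my_errno (*my_errno())

void demo(void)
{
    my_errno = 42;                    /* becomes (*my_errno()) = 42; */
    assert(my_errno == 42);           /* reads back through the function */
    assert(my_errno_storage == 42);   /* same object underneath */
}
```

Note that after the #define, every later use of the identifier goes through the macro, so the function can no longer be called by its plain name. That is one practical reason to be uncomfortable with sharing the identifier.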
On Sun, 26 May 2024 19:19:59 +0100 Malcolm McLean <[email protected]> wrote:
... was a 33 mb boost zipped image.
If '33 mb' means 33 MB ...
People have always managed to embed
binary source files into their binary output files - using linker
tricks, or using xxd or other tools (common or specialised) to turn
binary files into initialisers for constant arrays (or structs).
#embed has two purposes. One is to save you from using external tools
for that kind of thing.
C++ is the wrong language for web applications.
I like Java more for that.
On Sun, 26 May 2024 13:09:36 +0200, David Brown wrote:
People have always managed to embed
binary source files into their binary output files - using linker
tricks, or using xxd or other tools (common or specialised) to turn
binary files into initialisers for constant arrays (or structs).
Don’t call them “tricks”. Call them “linker scripts” and “build procedures”. They can do some quite complex things.
#embed has two purposes. One is to save you from using external tools
for that kind of thing.
But it can only be a partial solution to that. It cannot replace the procedures needed to construct the binary data format.
It only solves the
easy part: including that binary data in the build.
That’s why I think it’s a waste of time.
On 27/05/2024 01:44, Lawrence D'Oliveiro wrote:
On Sun, 26 May 2024 13:09:36 +0200, David Brown wrote:
People have always managed to embed binary source files into their
binary output files - using linker tricks, or using xxd or other tools
(common or specialised) to turn binary files into initialisers for
constant arrays (or structs).
Don’t call them “tricks”. Call them “linker scripts” and “build
procedures”. They can do some quite complex things.
#embed has two purposes. One is to save you from using external tools
for that kind of thing.
But it can only be a partial solution to that. It cannot replace the
procedures needed to construct the binary data format.
The binary data already exists, or has been created.
The problem is getting it into your program as ready-to-use data rather
than have to bundle an unwieldy collection of files in a folder
somewhere and then have assorted routines to read them into memory.
It only solves the easy part: including that binary data in the build.
Apparently that is not so easy as you seem to think.
Or maybe you think
that 'embedding a file' just means adding it to a zip file?
Embedding applies also to text files not just binaries.
On 2024-05-26, jak <[email protected]> wrote:
Keith Thompson wrote:
Indeed there are c++ compilers who, if used to compile c code, could
decide to call the c compiler to do the work, but if something in the
code is not strictly c, then the compilation will be in c++, the size
of the executable will increase significantly and will need of an
internal or external runtimer to work. If it were the same thing you
would not get different things.
Oh? Do you know of a C++ compiler that actually behaves this way?
I've never heard of such a thing.
For example g++ makes something similar: if you pass a file .C it
compile the C code but if the file (.C) contains C++ code then
compile C++.
1. The file suffix is not "something /in the code/ that is not strictly C".
The front end of a compiler collection selecting a compiler based
on file suffix is not an example of switching language based
on syntax in the file.
2. g++ does not behave this way.
In fact .C (capital C) is one of the conventions for C++ files. I
seem to remember that the convention was used at AT&T and in fact you
can find examples of it in the source code of Cfront (the historic
C++ to C transpiler originally developed by B. Stroustrup).
For g++ to assume that a .C file is C and not C++ would be insanely
poor.
The g++ command even assumes that .c files are C++!
Conversely, when you use the gcc driver command on a .C file,
you get the C++ compiler!
Since you're posting to Usenet, you're obviously connected to the same Internet as the rest of us, so it's amazing you're not able to check
your facts. You know about g++, so presumably you have an installation of
it somewhere, where you could run a 30 second experiment.
On Sun, 26 May 2024 19:35:49 +0300, Michael S wrote:
Faster than new MSVC, but slower than old MSVC.
New MSVC is slower than old MSVC?!? Say it isn’t so!
David Brown wrote:
On 26/05/2024 17:10, jak wrote:
?
I really wrote that something similar (similar != equal) did g++ and
that, if you write c++ code in a file with the .c extension, the g++
compile it. I never wrote that it was automatically recognized.
In addition, you just explained why g++ compile a .c that contains c++
code. I don't understand: no what?
I made an error here - "g++ foo.c" /will/ treat the file as C++. I
apologise for that, as it made things a lot more confusing.
But that is not what you wrote. Perhaps you didn't write what you
intended to write. You said that g++ somehow determines whether to
compile code as C or C++ based on the /contents/ of the file, not the
filename suffix. And that is completely wrong.
You also mixed up ".c" and ".C". gcc considers ".c" to be C code,
while ".C" (with a capital C) is considered C++.
Sorry but no. I wrote that there are compilers who do it and when they replied, bringing the gcc as an example, I replied that the g++ does something similar.
and no, I have not confused the .c with the .C:
On Sun, 26 May 2024 23:06:47 +0300, Michael S wrote:
On Sun, 26 May 2024 19:19:59 +0100 Malcolm McLean
<[email protected]> wrote:
... was a 33 mb boost zipped image.
If '33 mb' means 33 MB ...
Yeah, I wondered about that. Never saw anybody measure things in “millibits” before ...
David Brown <[email protected]> writes:
On 26/05/2024 00:58, Keith Thompson wrote:
David Brown <[email protected]> writes:
On 25/05/2024 03:29, Keith Thompson wrote:
Keith Thompson <[email protected]> writes:
David Brown <[email protected]> writes:
On 23/05/2024 14:11, bart wrote:[...]
The compiler will generate results /as if/ it had expanded the file to
a list of numbers and parsed them. But it will not do that in
practice. (At least, not for more serious implementations - simple
solutions might do so to get support implemented quickly.)
I'll start by acknowledging that the prototype implementation apparently
*does* optimize #embed when it can. I was mistaken on that point.
#embed *must* expand to the standard-defined comma-delimited sequence in *some* cases.
Which means that the piece of the compiler that implements #embed has to recognize when it must generate that sequence, and when it can do
something more efficient.
I'd expect implementations to have extremely fast implementations for
initialising arrays of character types, and probably also for other
arrays of scalar types. More complicated examples - such as
parameters in a macro or function call - would probably use a
fall-back of generating naïve lists of integer constants.
My problem is not just with how the compiler can figure out when it can optimize, but how programmers are supposed to understand whatever rules
it uses. Can I rely on the optimization being performed if I use a
typedef for unsigned char, or if I use an enumeration type whose
underlying type is unsigned char, or if I have initialization elements
before and after the #embed directive?
Effective use of #embed requires too much "magic" for my taste -- particularly having the preprocessor rely on information from later
phases. The semantics of #embed don't rely on that information, but efficient use for large files does.
If you have a binary file containing a sequence of int values, you can
use #embed to initialize an unsigned char array that's aliased with or
copied to the int array.
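The "copied to" variant can be sketched as below. Since #embed support can't be assumed, the byte array is passed in as a parameter; in real use it would be the array initialised by #embed. The helper name bytes_to_ints is hypothetical, and the sketch assumes the file was written with the same int size and byte order as the target uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the bytes in `src` into `dst` as ints and
   returns the number of whole ints produced (at most `dst_count`).
   Trailing bytes that do not fill an int are ignored. memcpy is used
   instead of a pointer cast, so alignment and aliasing are not an issue. */
size_t bytes_to_ints(int *dst, size_t dst_count,
                     const unsigned char *src, size_t nbytes)
{
    size_t count = nbytes / sizeof(int);

    assert(dst != NULL && src != NULL);
    if (count > dst_count)
        count = dst_count;
    memcpy(dst, src, count * sizeof(int));
    return count;
}
```

The copy costs time and doubles the memory used for the data, which is part of why a standard way to embed elements wider than a byte would be useful.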
The *embed element width* is typically going to be CHAR_BIT bits by
default. It can only be changed by an *implementation-defined* embed
parameter. It seems odd that there's no standard way to specify the
element width.
It seems even more odd that the embed element width is
implementation defined and not set to CHAR_BIT by default.
I agree. But it may be left flexible for situations where the host
and target have different ideas about CHAR_BIT. (Targets with
CHAR_BIT other than 8 are very rare, hosts with CHAR_BIT other than 8
are non-existent, but C remains flexible.)
I would think that you'd want the element width to match CHAR_BIT *on
the target* (which is the only CHAR_BIT that's relevant or available).
If you're cross-compiling, you'd probably want to embed a file that
could have been used on the target system.
And if I'm not doing that kind of exotic cross-compiling, I can't rely
on the element width being CHAR_BIT *or* on any standard way to specify
that I want it to be CHAR_BIT.
Requiring the default width to be CHAR_BIT would, I'm guessing, solve
99% of cases. Allowing it to be specified by a parameter would solve
the remaining 1%. And I expect it *will* be CHAR_BIT in most or all implementations, and programmers will rely on that assumption. I think
the standard should guarantee that.
For a very large file, that could be a significant burden. (I don't
have any numbers on that.)
I do :
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#design-efficiency-metrics>
(That's from a proposal for #embed for C and C++. Generating the
numbers and parsing them is akin to using xxd.)
More useful links:
<https://thephd.dev/embed-the-details#results>
<https://thephd.dev/implementing-embed-c-and-c++>
(These are from someone who did a lot of the work for the proposals,
and prototype implementations, as far as I understand it.)
That second link does have a lot of good information. I think I had
seen it before, but I hadn't read it thoroughly. It refers to prototype implementations for both gcc and clang. I've built the prototype on my system, and godbolt.org has it, but the gcc prototype (for which the
article provides good performance data) doesn't seem to be available anywhere.
My experiments with the clang prototype have been a bit confusing. I
assumed that `clang -E` would give me meaningful results, but it always produces the comma-delimited sequence of integer constants, and even
that output is inconsistent. It looks like "-E" synthesizes naive and
not entirely correct output. Feeding that output to clang produces
warnings that I don't get without "-E". Some of this might be the
result of user error on my part.
I did some tests with 100MB file, both with #embed and with #include
using the output of "xxd". #embed *is* much faster.
According to <https://thephd.dev/implementing-embed-c-and-c++>, it
internally generates __builtin_pp_embed, which takes as arguments the expected type (always unsigned char for now), the filename as a string literal, and the data encoded as a base64 string literal. That's not
going to be as fast as a hypothetical pure binary blob, but apparently
it's still much faster than parsing a comma-delimited sequence.
I haven't been able to get "clang -E" in the prototype to generate __builtin_pp_embed, or to get clang to recognize it. There are internal things going on that I don't understand.
The author points out that using binary blobs would break tools that
work with -E preprocessed source files. If you could assume that the preprocessed output will be processed only by the same compiler, that wouldn't be an issue, but apparently that's not a safe assumption.
The author acknowledges that the prototype implementation doesn't handle
all cases correctly.
Prototypes have been made, and they do have such optimisations. How
things end up in real tools remains to be seen, of course.
Here's how I personally would have preferred for #embed to be specified:
- As in current C23 drafts, #embed with no parameters must operate *as
if* it expanded to a comma-delimited list of integer constant
expressions.
- With no parameters, both the common cases (initializing an array of
characters) and odd cases (e.g., initializing a struct object with
varying types and sizes of members) must work as specified.
- A standard-defined parameter allows control over optimization.
The parameter can be "optimize(true)" or "optimize(false)".
"optimize(false)" has no formal effect, but the compiler *should*
generate the canonical sequence of constants.
"optimize(true)" causes undefined behavior if #embed is used in a
context other than the initialization of an array of character type.
A naive compiler can quietly ignore the optimize() parameter and always generate the comma-delimited sequence. An exceedingly clever compiler
could ignore it and always make a correct decision about whether to
optimize #embed.
Without the optimize parameter, typical compilers are expected to
optimize #embed depending on the context in which it's used, and should produce the correct results in all cases. The parameter can be used to override the compiler's judgement.
Another possibility might have been to specify that #embed can *only* be
used to initialize an array of character type, and any other use either
has undefined behavior or is a constraint violation. That would avoid
all the complication of determining from context whether it can be
optimized, and would probably cover 99% of cases. But it's probably too
late for that.
On Mon, 27 May 2024 01:55:24 +0100, bart wrote:
On 27/05/2024 01:44, Lawrence D'Oliveiro wrote:
Nothing “unwieldy” about it. It’s a bunch of temporary intermediate build
products, generated from suitable source files like everything else in the build.
It only solves the easy part: including that binary data in the build.
Apparently that is not so easy as you seem to think.
Yes, it is as easy as I think. I’ve done this sort of thing, using
suitable build scripts.
Or maybe you think
that 'embedding a file' just means adding it to a zip file?
It’s whatever “including it in the build” means. It might indeed be a zip
component, as with resources for an Android app. Or it might be converted into an object file with a tool like objcopy, to be integrated into the executable.
Embedding applies also to text files not just binaries.
Same principle applies.
David Brown <[email protected]> writes:
On 26/05/2024 00:58, Keith Thompson wrote:
It knows because the compiler writers are actually quite smart. The C
standards may describe the translation process in a series of distinct
and independent phases, but that's not how it is done in practice.
The key point is that the compiler knows how the sequence of integers
is going to be used before it gets that far in the preprocessing.
I'd expect implementations to have extremely fast implementations for
initialising arrays of character types, and probably also for other
arrays of scalar types. More complicated examples - such as
parameters in a macro or function call - would probably use a
fall-back of generating naïve lists of integer constants.
My problem is not just with how the compiler can figure out when it can
optimize, but how programmers are supposed to understand whatever rules
it uses. Can I rely on the optimization being performed if I use a
typedef for unsigned char, or if I use an enumeration type whose
underlying type is unsigned char, or if I have initialization elements
before and after the #embed directive?
Macros in C are not recursive. That stops them exploding, but also means there's a lot you can't do with the preprocessor.
On 27/05/2024 03:48, Lawrence D'Oliveiro wrote:
Apparently that is not so easy as you seem to think.
Yes, it is as easy as I think. I’ve done this sort of thing, using
suitable build scripts.
Show me.
If I am not mistaken, gfortran by default treats extension .f
as "old FORTRAN" and extension .f90 as "new Fortran".
Whichever compiler you use, I strongly recommend using only ".c" for C
files, and only ".cpp" for C++ files.
On Sun, 26 May 2024 19:50:40 +0300, Michael S wrote:
If I am not mistaken, gfortran by default treats extension .f
as "old FORTRAN" and extension .f90 as "new Fortran".
The full list of recognized file extensions and their treatment is
here <https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gfortran/GNU-Fortran-and-GCC.html>.
On Mon, 27 May 2024 14:03:16 +0100, bart wrote:
On 27/05/2024 03:48, Lawrence D'Oliveiro wrote:
Apparently that is not so easy as you seem to think.
Yes, it is as easy as I think. I’ve done this sort of thing, using
suitable build scripts.
Show me.
Here <https://github.com/ldo/unicode_browser_android> is an old
example, from when I was trying to learn Android programming. It lets
you browse the Unicode code-point database, and do incremental
searches by partial matching on code-point names: e.g. you can type
“right arrow” and see candidate matches such as “U+219B RIGHTWARDS ARROW WITH STROKE”, “U+219D RIGHTWARDS WAVE ARROW”, “U+21A0 RIGHTWARDS
TWO HEADED ARROW” etc.
In the “util” subdirectory, you will find a Python script called “get_codes”. This processes a NamesList.txt file as downloaded from Unicode.org, and encodes the database as a binary blob with a specially-constructed header to allow quick loading and extraction of code-point information, including names, categories, related entries
etc. This blob gets built as a “resource file” into the .apk file,
where the Java code can find it.
David Brown <[email protected]> writes:
On 27/05/2024 01:17, Keith Thompson wrote:[...]
Here's how I personally would have preferred for #embed to be
specified:
- As in current C23 drafts, #embed with no parameters must operate *as
if* it expanded to a comma-delimited list of integer constant
expressions.
- With no parameters, both the common cases (initializing an array of
characters) and odd cases (e.g., initializing a struct object with
varying types and sizes of members) must work as specified.
- A standard-defined parameter allows control over optimization.
The parameter can be "optimize(true)" or "optimize(false)".
"optimize(false)" has no formal effect, but the compiler *should*
generate the canonical sequence of constants.
"optimize(true)" causes undefined behavior if #embed is used in a
context other than the initialization of an array of character type.
I disagree here. I want the compiler to generate the "as if" results
regardless of any optimisation, working as currently specified. And
/if/ the compiler is able to optimise the #embed, then I want it to do
so automatically - I see no situation in which I would ever want
"optimize(false)".
The issue I'm trying to address (very prematurely, no doubt) is
that the decision of whether to optimize #embed vs. generating the
naive comma-separated sequence is difficult to formalize, and easy
to get wrong in corner cases.
"restrict" is another performance
hint whose only formal effect is to introduce undefined behavior
if you use it incorrectly.
Let's say I define an array of a 1-byte enumeration type, initialized
with #embed for a very large binary file. Maybe one compiler recognizes
this as a case where it can perform the optimization, and another
doesn't.
If I can tell the compiler "trust me, I'm using this to
initialize raw byte data, and I'll take responsibility if I get it
wrong", I can see that being useful.
And maybe "optimize" isn't the best name. Perhaps "raw_bytes"?
Without some kind of programmer control, I'm concerned that the rules
for defining an array so #embed will be correctly optimized will be
spread as lore rather than being specified anywhere.
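The comparison with "restrict" above can be illustrated with a short sketch. The function scale_into is hypothetical; the point is only that the qualifier changes nothing for a correct call, and that the promise it makes is the caller's responsibility.

```c
#include <stddef.h>

/* `restrict` promises that dst and src do not overlap. The compiler may use
   that promise to optimize the loop; if a caller passes overlapping arrays,
   the behavior is undefined. For a correct call the result is the same as
   it would be without the qualifier. */
void scale_into(size_t n, float *restrict dst, const float *restrict src, float k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```

A parameter such as the proposed optimize(true) would work the same way: no visible effect when used as promised, undefined behavior otherwise.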
On Sun, 26 May 2024 18:12:17 +0200, David Brown wrote:
Macros in C are not recursive. That stops them exploding, but also means
there's a lot you can't do with the preprocessor.
String-based macros + recursive substitution = recipe for trouble.
On 28/05/2024 01:20, Scott Lurndal wrote:
Keith Thompson <[email protected]> writes:
You need the Baby X resource compiler.
David Brown <[email protected]> writes:
On 26/05/2024 00:58, Keith Thompson wrote:
It knows because the compiler writers are actually quite smart. The C
standards may describe the translation process in a series of distinct
and independent phases, but that's not how it is done in practice.
The key point is that the compiler knows how the sequence of integers
is going to be used before it gets that far in the preprocessing.
I'd expect implementations to have extremely fast implementations for
initialising arrays of character types, and probably also for other
arrays of scalar types. More complicated examples - such as
parameters in a macro or function call - would probably use a
fall-back of generating naïve lists of integer constants.
My problem is not just with how the compiler can figure out when it can
optimize, but how programmers are supposed to understand whatever rules
it uses. Can I rely on the optimization being performed if I use a
typedef for unsigned char, or if I use an enumeration type whose
underlying type is unsigned char, or if I have initialization elements
before and after the #embed directive?
A typical use case for me would be to build a binary file
with a bespoke application. I would expect the #embed of that
file to _maintain the binary layout in memory exactly the
same as in the file_. It would be the #embed user's
responsibility to ensure that the binary file would be identical
to the binary data expected by the declaration of the data structure
being embedded.
E.g. if the embedded file contained an array of some structure,
the binary format of the embedded file must match the binary format
that would be expected by the compiler (field sizes, alignment etc)
for an array of said structure.
The spec does say that the data in memory must match the data in the
file. So it seems that the preprocessor can simply add a private
attribute (e.g. just pass the #embed to the compiler a la #line or #file)
and the compiler will tag the symbol table entry for the symbol associated
with the #embed and the code generator can just open the file and
copy the data byte-for-byte to the object file.
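The use case described above, a byte image that holds an array of structures, might be consumed along these lines. This is only a sketch: struct record and load_record are hypothetical names, and it assumes the image was written by a tool that uses exactly the same field sizes, padding and byte order as the compiler that builds this code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record type that a bespoke tool might write to disk. */
struct record {
    uint8_t  kind;
    uint32_t value;
};

/* Copy record number `index` out of a raw byte image (for example an
   unsigned char array initialized from a file). Returns 0 on success and
   -1 if the image does not contain that many whole records. memcpy is
   used so that the image itself needs no particular alignment. */
int load_record(const unsigned char *image, size_t image_size,
                size_t index, struct record *out)
{
    if (index >= image_size / sizeof *out)
        return -1;
    memcpy(out, image + index * sizeof *out, sizeof *out);
    return 0;
}
```

Keeping the embedded data as an array of unsigned char and copying records out of it stays within what the standard promises for #embed, at the price of one copy per record.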
[email protected] (Scott Lurndal) writes:
Keith Thompson <[email protected]> writes:
David Brown <[email protected]> writes:
On 26/05/2024 00:58, Keith Thompson wrote:
It knows because the compiler writers are actually quite smart. The C
standards may describe the translation process in a series of distinct
and independent phases, but that's not how it is done in practice.
The key point is that the compiler knows how the sequence of integers
is going to be used before it gets that far in the preprocessing.
I'd expect implementations to have extremely fast implementations for
initialising arrays of character types, and probably also for other
arrays of scalar types. More complicated examples - such as
parameters in a macro or function call - would probably use a
fall-back of generating naïve lists of integer constants.
My problem is not just with how the compiler can figure out when it can
optimize, but how programmers are supposed to understand whatever rules
it uses. Can I rely on the optimization being performed if I use a
typedef for unsigned char, or if I use an enumeration type whose
underlying type is unsigned char, or if I have initialization elements
before and after the #embed directive?
A typical use case for me would be to build a binary file
with a bespoke application. I would expect the #embed of that
file to _maintain the binary layout in memory exactly the
same as in the file_.
I'm not sure why you'd expect that given the way #embed is specified --
*unless* you're using it to initialize an array of characters.
It would be the #embed user's
responsibility to ensure that the binary file would be identical
to the binary data expected by the declaration of the data structure
being embedded.
E.g. if the embedded file contained an array of some structure,
the binary format of the embedded file must match the binary format
that would be expected by the compiler (field sizes, alignment etc)
for an array of said structure.
The spec does say that the data in memory must match the data in the
file.
Where does it say that?
See <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf>
6.10.4. (N3220 is a C26 draft, but it's very close to C23.)
The spec says that #embed expands to a comma-delimited sequence of
integer constant expressions (and like anything, optimizations that
don't violate the specified behavior are allowed). If the
implementation-defined *embed element width* is CHAR_BIT (which is not
guaranteed), then you can expect the same data layout *if* you use it to
initialize an array of characters, preferably unsigned char.
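The difference between "a list of constants" and "a copy of the file's bytes" can be sketched without #embed itself. Below, the macro FILE_BYTES is a hypothetical stand-in for what #embed of a four-byte file would conceptually expand to, and the struct and function names are likewise invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the expansion of #embed on a file containing 4 bytes. */
#define FILE_BYTES 0x01, 0x02, 0x03, 0x04

/* Each constant initializes one element or member, whatever its type. */
struct mixed { unsigned char a; unsigned short b; unsigned int c; unsigned char d; };

const unsigned char as_bytes[]  = { FILE_BYTES };  /* one byte per constant     */
const unsigned int  as_ints[]   = { FILE_BYTES };  /* one int per constant      */
const struct mixed  as_struct   = { FILE_BYTES };  /* one member per constant   */

/* Returns 1 if the object representation is exactly the four file bytes. */
int layout_matches_file(const void *object, size_t object_size)
{
    static const unsigned char file[] = { FILE_BYTES };
    return object_size == sizeof file && memcmp(object, file, sizeof file) == 0;
}
```

Only the unsigned char array has the same layout as the file. The int array and the struct receive the same values, one per element or member, but their object representations are larger and padded differently on typical implementations.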
David Brown <[email protected]> writes:
On 28/05/2024 02:33, Keith Thompson wrote:[...]
Without some kind of programmer control, I'm concerned that the
rules for defining an array so #embed will be correctly optimized
will be spread as lore rather than being specified anywhere.
They might, but I really do not think that is so important, since
they will not affect the generated results.
Right, it won't affect the generated results (assuming I use it
correctly). Unless I use `#embed optimize(true)` to initialize
a struct with varying member sizes, but that's my fault because I
asked for it.
The point is compile-time performance, and perhaps even the ability
to compile at all.
I'm thinking about hypothetical cases where I want to embed a
*very* large file and parsing the comma-delimited sequence could
have unacceptable compile-time performance, perhaps even causing
a compile-time stack overflow depending on how the parser works.
Every time the compiler sees #embed, it has to decide whether to
optimize it or not, and the decision criteria are not specified
anywhere (not at all in the standard, perhaps not clearly in the
compiler's documentation).
OK, so basically this writes a file. Or, part of a file?
Where is the bit in the Java code that embeds it?
The point is this: /once you already have those discrete files/, how do
you painlessly embed them into your application?
On 5/27/2024 9:48 PM, Lawrence D'Oliveiro wrote:...
On Sun, 26 May 2024 18:12:17 +0200, David Brown wrote:
Macros in C are not recursive. That stops them exploding, but also means
there's a lot you can't do with the preprocessor.
It seems the preprocessor in BGBCC is likely not entirely conformant
in this case...
If given a recursive macro, it will most likely just explode and
probably crash the compiler...
Mostly as it handles macro-expansion by looping over the line and
performing macro-substitutions until no more substitutions are seen,
at which point it emits the line to the output buffer and moves on to
the next line.
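For contrast with the substitute-until-stable loop described above: in standard C, a macro's own name is not replaced again when it turns up during rescanning of its expansion, so a self-referential macro terminates after one step. A minimal sketch, with hypothetical names:

```c
/* A variable that the macro below deliberately shares a name with. */
static int counter = 10;

/* Self-referential object-like macro. During rescanning, the inner
   `counter` is not expanded again, so expansion stops after one step. */
#define counter (counter + 4)

/* After preprocessing, the body reads: return (counter + 4); */
int read_counter(void)
{
    return counter;
}
```

A preprocessor that keeps substituting until nothing changes would never finish on this input, because every pass produces another occurrence of the macro name.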
I've got a small commandline-tool that makes a const'd char-array
from any binary file.
On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:
I've got a small commandline-tool that makes a const'd char-array
from any binary file.
It seems to me it would be more efficient to use objcopy to turn that
binary file directly into an object file with symbols accessible from
C code defining its beginning and ending points. Then just link it
into the executable.
On Thu, 30 May 2024 02:32:03 -0000 (UTC)
Lawrence D'Oliveiro <[email protected]d> wrote:
On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:
I've got a small commandline-tool that makes a const'd char-array
from any binary file.
It seems to me it would be more efficient to use objcopy to turn that
binary file directly into an object file with symbols accessible from
C code defining its beginning and ending points. Then just link it
into the executable.
Of course, it is more efficient.
But:
- it covers fewer use cases.
- it exposes the array's name and size as global symbols, which is not
always desirable
- it feels too much like magic. It would feel less like magic if
done by the compiler rather than by an extra tool. Even better if done by
the compiler in a standardized manner.
But yes, in real life, in an embedded software project, that's what I'd do.
On 30/05/2024 03:32, Lawrence D'Oliveiro wrote:
On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:
I've got a small commandline-tool that makes a const'd char-array
from any binary file.
It seems to me it would be more efficient to use objcopy to turn
that binary file directly into an object file with symbols
accessible from C code defining its beginning and ending points.
Then just link it into the executable.
None of my compilers, whether for C or anything else, generate object
files.
However, suppose I wanted to link a file called 'logo.bmp' say, into
my program, which consisted of a file called main.c.
What is the entire process using your suggestion? What do I put into
main.c? Assume the data is represented by a char-array.
On Thu, 30 May 2024 14:34:00 +0100
bart <[email protected]> wrote:
On 30/05/2024 03:32, Lawrence D'Oliveiro wrote:
On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:
I've got a small commandline-tool that makes a const'd char-array
from any binary file.
It seems to me it would be more efficient to use objcopy to turn
that binary file directly into an object file with symbols
accessible from C code defining its beginning and ending points.
Then just link it into the executable.
None of my compilers, whether for C or anything else, generate
object files.
However, suppose I wanted to link a file called 'logo.bmp' say, into
my program, which consisted of a file called main.c.
What is the entire process using your suggestion? What do I put
into main.c? Assume the data is represented by a char-array.
extern unsigned char _binary_logo_bmp_start[];
extern unsigned char _binary_logo_bmp_size[];
The first symbol is an array itself.
The second symbol contains the length of the array. You use it in a
somewhat non-intuitive way:
size_t my_size = (size_t)_binary_logo_bmp_size;
Note that I have never used this method myself, I just took a look
at the output of objcopy with 'objdump -t', so please don't take my
words as a sure thing.
BTW, options in this case are rather simple:
objcopy -I binary -O elf32-little logo.bmp logo_bmp.o
Replace elf32-little with the relevant format for your software. However I
am not sure that it would work for non-ELF output formats.
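A sketch of how such symbols might be wrapped on the C side. The names blob_view and blob_from_range are hypothetical, and _binary_logo_bmp_end is assumed by analogy with the _start and _size names above. The helper takes plain pointers, so it can be tried without an objcopy-generated object file.

```c
#include <stddef.h>

/* A read-only view over a blob delimited by two addresses. */
struct blob_view {
    const unsigned char *data;
    size_t size;
};

/* Build a view from a start address and a one-past-the-end address. */
struct blob_view blob_from_range(const unsigned char *start,
                                 const unsigned char *end)
{
    struct blob_view v = { start, (size_t)(end - start) };
    return v;
}

/* With the object produced by objcopy linked in, use might look like this
 * (symbol names assumed to follow the pattern described above):
 *
 *   extern const unsigned char _binary_logo_bmp_start[];
 *   extern const unsigned char _binary_logo_bmp_end[];
 *   struct blob_view logo =
 *       blob_from_range(_binary_logo_bmp_start, _binary_logo_bmp_end);
 */
```

Computing the size from two addresses avoids casting the address of a size symbol to an integer, at the cost of one more extern declaration.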
Where do the _binary_logo_bmp_start and ...-size symbols come from?
That is, how do they get into the object file.
On 30/05/2024 15:08, Michael S wrote:
Replace elf32-little with relevant format for your software. However I
am not sure that it would work for non-ELF output formats.
There appears to be an objcopy utility that runs under Windows.
On 30/05/2024 16:48, bart wrote:
On 30/05/2024 15:08, Michael S wrote:
Replace elf32-little with relevant format for your software.
However I am not sure that it would work for non-ELF output
formats.
There appears to be an objcopy utility that runs under Windows.
objcopy can handle lots of formats, as source or target, and can run
on any general OS host. So the question is not if you can get
objcopy that runs on Windows, it is whether you can use this kind of blob-to-object-file conversion with the output in the Windows object
file format in the same way as you can for elf formats.
You know
vastly more about the Windows object file formats than I do, so maybe
you can answer this yourself.
On Fri, 31 May 2024 09:24:48 +0200
David Brown <[email protected]> wrote:
On 30/05/2024 16:48, bart wrote:
On 30/05/2024 15:08, Michael S wrote:
Replace elf32-little with relevant format for your software.
However I am not sure that it would work for non-ELF output
formats.
There appears to be an objcopy utility that runs under Windows.
objcopy can handle lots of formats, as source or target, and can run
on any general OS host. So the question is not if you can get
objcopy that runs on Windows, it is whether you can use this kind of
blob-to-object-file conversion with the output in the Windows object
file format in the same way as you can for elf formats.
That's quite a strange question.
You mean, you are able to imagine an object file format incapable of
representing an initialized data array?
You know
vastly more about the Windows object file formats than I do, so maybe
you can answer this yourself.
The objcopy supplied with msys2 appears to have a bug in -O selection handling,
but fortunately there exists an easy workaround. Read my post below if
you are interested.
On Thu, 30 May 2024 15:48:39 +0100
bart <[email protected]> wrote:
Where do the _binary_logo_bmp_start and ...-size symbols come from?
That is, how do they get into the object file.
objcopy generates names of the symbols from the name of input binary
file. I would think that it is possible to change these symbols to
something else, but I am not sure that it is possible within the same invocation of objcopy. It certainly is possible with a second pass.
Lawrence probably can give more authoritative answer.
Or as a last resort you can RTFM.
No, it does not work like that.
First, copy *exactly* what I said in my previous post.
Only after you have reproduced it, start to be smart.
_binary_hello_c_size is a link symbol rather than a variable.
Declaration:
extern char _binary_hello_c_size[];
Usage:
printf("%zd\n", (size_t)_binary_hello_c_size);
On 30/05/2024 16:03, Michael S wrote:
On Thu, 30 May 2024 15:48:39 +0100
bart <[email protected]> wrote:
Where do the _binary_logo_bmp_start and ...-size symbols come from?
That is, how do they get into the object file.
objcopy generates names of the symbols from the name of input binary
file. I would think that it is possible to change these symbols to something else, but I am not sure that it is possible within the
same invocation of objcopy. It certainly is possible with a second
pass. Lawrence probably can give more authoritative answer.
Or as a last resort you can RTFM.
I gave myself the simple task of incorporating the source text of
hello.c into a program, and printing it out.
My C program looked like this to start, as an initial test (ignoring declaring the size as an array, unless I had to):
#include <stdio.h>
typedef unsigned char byte;
extern byte _binary_hello_c_start[];
extern int _binary_hello_c_size;
int main(void) {
printf("%d\n", _binary_hello_c_size);
}
One small matter is those ugly, long identifiers. A bigger one in
this case is that I really want that embedded text to be zero
terminated; here it's unlikely to be.
However I still have to create the object file with the data. I tried
this:
objcopy -I binary -O pe-x86-64 hello.c hello.obj
The contents looked about right when I looked inside.
Now to build my program. Because my C compiler can't link object
files itself, I have to get it to generate an object file for the
program, then use an external linker:
C:\c>mcc -c c.c
Compiling c.c to c.obj
C:\c>gcc c.obj hello.obj
hello.obj: file not recognized: file format not recognized
collect2.exe: error: ld returned 1 exit status
Unfortunately gcc/ld doesn't recognise the output of objcopy. Even
though it accepts the output of mcc which is the same COFF format.
But even if it worked, you can see it would be a bit of a palaver.
Here's how builtin embedding worked using a feature of my older C
compiler:
#include <stdio.h>
#include <string.h>
char hello[] = strinclude("hello.c");
int main(void) {
printf("hello =\n%s\n", hello);
printf("strlen(hello) = %zu\n", strlen(hello));
printf("sizeof(hello) = %zu\n", sizeof(hello));
}
I build it and run it like this:
C:\c>bcc c
Compiling c.c to c.exe
C:\c>c
hello =
#include "stdio.h"
int main(void) {
printf("Hello, World!\n");
}
strlen(hello) = 70
sizeof(hello) = 71
C:\c>dir hello.c
31/05/2024 13:48 70 hello.c
It just works; no messing about with objcopy parameters; no long
unwieldy names; no link errors due to unsupported file formats; no
problems with missing terminators for embedded text files imported as strings; no funny ways of getting size info.
On Fri, 31 May 2024 16:28:11 +0300
Michael S <[email protected]> wrote:
On Fri, 31 May 2024 16:19:37 +0300
Michael S <[email protected]> wrote:
No, it does not work like that.
First, copy *exactly* what I said in my previous post.
Only after you have reproduced it, start to be smart.
_binary_hello_c_size is a link symbol rather than a variable.
Declaration:
extern char _binary_hello_c_size[];
Usage:
printf("%zd\n", (size_t)_binary_hello_c_size);
Thinking about it, I could be wrong.
I should test more, with a larger program.
I tested with a bigger program, and it still works.
So, what is written above is correct.
On Fri, 31 May 2024 16:19:37 +0300
Michael S <[email protected]> wrote:
No, it does not work like that.
First, copy *exactly* what I said in my previous post.
Only after you have reproduced it, start to be smart.
_binary_hello_c_size is a link symbol rather than a variable.
Declaration:
extern char _binary_hello_c_size[];
Usage:
printf("%zd\n", (size_t)_binary_hello_c_size);
Thinking about it, I could be wrong.
I should test more, with a larger program.
On Fri, 31 May 2024 13:55:33 +0100
No, it does not work like that.
First, copy *exactly* what I said in my previous post.
Only after you have reproduced it, start to be smart.
_binary_hello_c_size is a link symbol rather than a variable.
Declaration:
extern char _binary_hello_c_size[];
Usage:
printf("%zd\n", (size_t)_binary_hello_c_size);
One small matter is those ugly, long identifiers. A bigger one in
this case is that I really want that embedded text to be zero
terminated; here it's unlikely to be.
The tool is not made specifically for ASCII strings, it is more generic.
Unfortunately gcc/ld doesn't recognise the output of objcopy. Even
though it accepts the output of mcc which is the same COFF format.
It recognizes it if you lie to objcopy about the format.
Specify elf64-x86-64 instead of pe-x86-64 and everything suddenly
works.
It was all said in my posts from yesterday. It does not sound like you
had read them.
On 31/05/2024 14:48, Michael S wrote:
On Fri, 31 May 2024 16:28:11 +0300
Michael S <[email protected]> wrote:
On Fri, 31 May 2024 16:19:37 +0300
Michael S <[email protected]> wrote:
No, it does not work like that.
First, copy *exactly* what I said in my previous post.
Only after you have reproduced it, start to be smart.
_binary_hello_c_size is a link symbol rather than a variable.
Declaration:
extern char _binary_hello_c_size[];
Usage:
printf("%zd\n", (size_t)_binary_hello_c_size);
Thinking about it, I could be wrong.
I should test more, with a larger program.
I tested with a bigger program, and it still works.
So, what is written above is correct.
Can you show the full program and the full process?
Here's how builtin embedding worked using a feature of my older C compiler:
#include <stdio.h>
#include <string.h>
char hello[] = strinclude("hello.c");
int main(void) {
printf("hello =\n%s\n", hello);
printf("strlen(hello) = %zu\n", strlen(hello));
printf("sizeof(hello) = %zu\n", sizeof(hello));
}
On Fri, 31 May 2024 15:04:46 +0100
bart <[email protected]> wrote:
Can you show the full program and the full process?
test_objcopy.c:
#include <stdio.h>
int data1[42] = { 1,2,3 ,4,5};
extern unsigned char _binary_test_bi_start[];
extern unsigned char _binary_test_bi_end[];
extern unsigned char _binary_test_bi_size[];
extern unsigned char _binary_bin_to_list_c_start[];
extern unsigned char _binary_bin_to_list_c_end[];
extern unsigned char _binary_bin_to_list_c_size[];
int main()
{
printf("%-40s %p %zd\n", "_binary_test_bi_start",
_binary_test_bi_start, (size_t)_binary_test_bi_start);
printf("%-40s %p %zd\n", "_binary_test_bi_end",
_binary_test_bi_end, (size_t)_binary_test_bi_end);
printf("%-40s %p %zd\n", "_binary_test_bi_size",
_binary_test_bi_size, (size_t)_binary_test_bi_size);
printf("%-40s %p %zd\n", "_binary_bin_to_list_c_start",
_binary_bin_to_list_c_start, (size_t)_binary_bin_to_list_c_start);
printf("%-40s %p %zd\n", "_binary_bin_to_list_c_end",
_binary_bin_to_list_c_end, (size_t)_binary_bin_to_list_c_end);
printf("%-40s %p %zd\n", "_binary_bin_to_list_c_size",
_binary_bin_to_list_c_size, (size_t)_binary_bin_to_list_c_size);
return 0;
}
Test files: test.bi and bin_to_list.c.
Conversion to objects:
objcopy -I binary -O elf64-x86-64 test.bi test_bi.o
objcopy -I binary -O elf64-x86-64 bin_to_list.c test_c.o
Compilation:
gcc -s -Wall -Oz test_objcopy.c test_bi.o test_c.o
On 31/05/2024 15:34, Michael S wrote:
On Fri, 31 May 2024 15:04:46 +0100
bart <[email protected]> wrote:
Instead of one compiler, here I used two compilers, a tool 'objcopy'
(which bizarrely needs to generate ELF format files) and lots of extra
ugly code. I also need to disregard whatever the hell _binary_..._size does.
On 31/05/2024 15:34, Michael S wrote:
On Fri, 31 May 2024 15:04:46 +0100
bart <[email protected]> wrote:
Can you show the full program and the full process?
test_objcopy.c:
#include <stdio.h>
int data1[42] = { 1,2,3 ,4,5};
extern unsigned char _binary_test_bi_start[];
extern unsigned char _binary_test_bi_end[];
extern unsigned char _binary_test_bi_size[];
extern unsigned char _binary_bin_to_list_c_start[];
extern unsigned char _binary_bin_to_list_c_end[];
extern unsigned char _binary_bin_to_list_c_size[];
int main()
{
printf("%-40s %p %zd\n", "_binary_test_bi_start",
_binary_test_bi_start, (size_t)_binary_test_bi_start);
printf("%-40s %p %zd\n", "_binary_test_bi_end",
_binary_test_bi_end, (size_t)_binary_test_bi_end);
printf("%-40s %p %zd\n", "_binary_test_bi_size",
_binary_test_bi_size, (size_t)_binary_test_bi_size);
printf("%-40s %p %zd\n", "_binary_bin_to_list_c_start",
_binary_bin_to_list_c_start, (size_t)_binary_bin_to_list_c_start);
printf("%-40s %p %zd\n", "_binary_bin_to_list_c_end",
_binary_bin_to_list_c_end, (size_t)_binary_bin_to_list_c_end);
printf("%-40s %p %zd\n", "_binary_bin_to_list_c_size",
_binary_bin_to_list_c_size, (size_t)_binary_bin_to_list_c_size);
return 0;
}
Test files: test.bi and bin_to_list.c.
Conversion to objects:
objcopy -I binary -O elf64-x86-64 test.bi test_bi.o
objcopy -I binary -O elf64-x86-64 bin_to_list.c test_c.o
Compilation:
gcc -s -Wall -Oz test_objcopy.c test_bi.o test_c.o
OK, thanks. But I forgot to ask what results you got from running the
program. Because if I try your code, using hello.c and hello.exe as test
binary/source data, I get this output:
_binary_test_bi_start 00007ff6497620e0 140695771160800
_binary_test_bi_end 00007ff649762ae0 140695771163360
_binary_test_bi_size 00007ff509750a00 140690402380288
_binary_bin_to_list_c_start 00007ff649762ae0 140695771163360
_binary_bin_to_list_c_end 00007ff649762b26 140695771163430
_binary_bin_to_list_c_size 00007ff509750046 140690402377798
The sizes should have been 2560 and 70 respectively; those values are
a bit bigger than that.
However I see that you also have start and end addresses, which sounds a
much better way of determining the size. (In that case, what are those
*size symbols for?).
So I can put together a working test:
---------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
extern unsigned char _binary_hello_c_start[];
extern unsigned char _binary_hello_c_end[];
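char* makestr(const unsigned char* start, const unsigned char* end) {
size_t length = (size_t)(end - start);
char* s = malloc(length + 1);
if (s == NULL) return NULL; /* allocation failed */
memcpy(s, start, length);
s[length] = 0; /* the embedded data carries no terminator of its own */
return s;
}
int main() {
char* str = makestr(_binary_hello_c_start, _binary_hello_c_end);
if (str == NULL) return 1;
printf("Hello = \n%s", str);
free(str);
}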
---------------------------------
I can build it like this:
---------------------------------
C:\c>mcc -c c
Compiling c.c to c.obj
C:\c>objcopy -I binary -O elf64-x86-64 hello.c hello.obj
C:\c>gcc c.c hello.obj
---------------------------------
And run it like this:
---------------------------------
C:\c>a
Hello =
#include "stdio.h"
int main(void) {
printf("Hello, World!\n");
}
---------------------------------
Instead of one compiler, here I used two compilers, a tool 'objcopy'
(which bizarrely needs to generate ELF format files) and lots of extra
ugly code. I also need to disregard whatever the hell _binary_..._size
does.
But it works.
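If the embedded data only has to be written out, the copy and the added terminator can be avoided by passing the length explicitly. A small sketch under that assumption; write_embedded is a hypothetical helper, not something from the posts above:

```c
#include <stdio.h>
#include <stddef.h>

/* Write the bytes in [start, end) to `out`; returns the number written. */
size_t write_embedded(FILE *out, const unsigned char *start,
                      const unsigned char *end)
{
    size_t length = (size_t)(end - start);
    return fwrite(start, 1, length, out);
}
```

With the objcopy symbols from the test program this would be called as write_embedded(stdout, _binary_hello_c_start, _binary_hello_c_end). It also copes with binary data containing zero bytes, which a %s format would truncate.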
bart wrote:
On 31/05/2024 15:34, Michael S wrote:
On Fri, 31 May 2024 15:04:46 +0100
bart <[email protected]> wrote:
Instead of one compiler, here I used two compilers, a tool 'objcopy'
(which bizarrely needs to generate ELF format files) and lots of extra
ugly code. I also need to disregard whatever the hell _binary_..._size
does.
But it works.
You could use the pe-x86-64 format instead of the elf64-x86-64 to reduce
the size of the object.
bart wrote:
C:\c>objcopy -I binary -O elf64-x86-64 hello.c hello.obj
You could use the pe-x86-64 format instead of the elf64-x86-64 to reduce
the size of the object.
bart <[email protected]> writes:
On 31/05/2024 15:34, Michael S wrote:
On Fri, 31 May 2024 15:04:46 +0100
bart <[email protected]> wrote:
Instead of one compiler, here I used two compilers, a tool 'objcopy'
(which bizarrely needs to generate ELF format files) and lots of extra
ugly code. I also need to disregard whatever the hell _binary_..._size does.
$ objcopy -I binary -O elf64-x86-64 main.cpp /tmp/test.o
$ objdump -x /tmp/test.o
/tmp/test.o: file format elf64-little
/tmp/test.o
architecture: UNKNOWN!, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .data 000030e2 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 g .data 0000000000000000 _binary_main_cpp_start
00000000000030e2 g .data 0000000000000000 _binary_main_cpp_end
00000000000030e2 g *ABS* 0000000000000000 _binary_main_cpp_size
$ ls -l main.cpp
-rw-rw-r--. 1 scott scott 12514 May 9 2022 main.cpp
$ printf '%u\n' $(( 0x30e2 ))
12514
The value of the symbol _binary_main_cpp_size is the
number of bytes in the file.
(in other words,
_binary_main_cpp_size = _binary_main_cpp_end - _binary_main_cpp_start
)
In C code:
extern uint8_t _binary_main_cpp_size;
const size_t embed_size = (size_t)&_binary_main_cpp_size;
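The idea here is that the size is carried as the address of an absolute symbol, so the program never reads any memory at that address; it only converts the address to an integer. A minimal sketch of that conversion, going through uintptr_t; the helper name is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Interpret the address of an absolute symbol as a byte count.
 * Nothing is dereferenced; only the address value is used. */
size_t size_from_symbol_address(const void *symbol_address)
{
    return (size_t)(uintptr_t)symbol_address;
}
```

It would be used as size_from_symbol_address(&_binary_main_cpp_size). Whether the linked value still equals the file size depends on the toolchain and on how the executable is loaded, which is what the rest of this subthread is about.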
On 31/05/2024 19:36, Scott Lurndal wrote:
bart <[email protected]> writes:
On 31/05/2024 15:34, Michael S wrote:
On Fri, 31 May 2024 15:04:46 +0100
bart <[email protected]> wrote:
Instead of one compiler, here I used two compilers, a tool 'objcopy'
(which bizarrely needs to generate ELF format files) and lots of extra
ugly code. I also need to disregard whatever the hell _binary_..._size does.
$ objcopy -I binary -O elf64-x86-64 main.cpp /tmp/test.o
$ objdump -x /tmp/test.o
/tmp/test.o: file format elf64-little
/tmp/test.o
architecture: UNKNOWN!, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .data 000030e2 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 g .data 0000000000000000 _binary_main_cpp_start
00000000000030e2 g .data 0000000000000000 _binary_main_cpp_end
00000000000030e2 g *ABS* 0000000000000000 _binary_main_cpp_size
$ ls -l main.cpp
-rw-rw-r--. 1 scott scott 12514 May 9 2022 main.cpp
$ printf '%u\n' $(( 0x30e2 ))
12514
The value of the symbol _binary_main_cpp_size is the
number of bytes in the file.
(in other words,
_binary_main_cpp_size = _binary_main_cpp_end - _binary_main_cpp_start
)
In C code:
extern uint8_t _binary_main_cpp_size;
const size_t embed_size = (size_t)&_binary_main_cpp_size;
Did you see the output from my version of Michael S's program? The size
is just an address. If I do what you do:
extern unsigned char _binary_hello_c_size;
....
size_t size = (size_t)&_binary_hello_c_size;
printf("size: %zu\n", size);
It produces:
size: 140697695027270
Little of this seems to work, sorry. You guys keep saying, do this, do
that, no do it that way, go RTFM, but nobody has shown a complete
program that correctly shows the -size symbol to be giving anything
meaningful.
If I run this:
printf("%p\n", &_binary_hello_c_start);
printf("%p\n", &_binary_hello_c_end);
printf("%p\n", &_binary_hello_c_size);
I get:
00007ff6ef252010
00007ff6ef252056
00007ff5af240046
I can see that the first two can be subtracted to give the sizes of the
data, which is 70 or 0x46. 0x46 is the last byte of the address of
_size, so what's happening there? What's with the crap in bits 16-47?
I can extract the size using:
printf("%d\n", (unsigned short)&_binary_hello_c_size);
But something is not right. I've also asked what is the point of the
-size symbol if you can just do -end - -start, but nobody has explained.
On 5/26/2024 6:23 AM, Bonita Montero wrote:
On 26.05.2024 at 09:13, jak wrote:
About this I only agree partially because it depends a lot on the
context in which it is used. Moreover, I would not know how to indicate
an optimal programming language for all seasons.
C++ is in almost any case the better C.
What you describe is the greatest inconvenience of c++. To give only one
example, when they decided to rewrite the FB platform to accelerate it,
they thought of migrating from php to c++ and they had a collapse of the
staff suitable for the work, so they thought of relying on a compiler that
translated the php into c++, and many of the new languages were born to
try to remedy its complexity.
C++ is the wrong language for web applications.
I like Java more for that.
C++ is the wrong language for real time apps.
No memory allocation allowed.
I use C++ for my server side apps on my webserver. Works great.
I can see that the first two can be subtracted to give the sizes of the
data, which is 70 or 0x46. 0x46 is the last byte of the address of
_size, so what's happening there? What's with the crap in bits 16-47?
I can extract the size using:
printf("%d\n", (unsigned short)&_binary_hello_c_size);
But something is not right. I've also asked what is the point of the
-size symbol if you can just do -end - -start, but nobody has explained.
On 30/05/2024 03:32, Lawrence D'Oliveiro wrote:
On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:
I've got a small commandline-tool that makes a const'd char -array
from any binary file.
It seems to me it would be more efficient to use objcopy to turn that
binary file directly into an object file with symbols accessible from C
code defining its beginning and ending points. Then just link it into
the executable.
None of my compilers, whether for C or anything else, generate object
files.
On Thu, 30 May 2024 02:32:03 -0000 (UTC)
Lawrence D'Oliveiro <[email protected]d> wrote:
On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:
I've got a small commandline-tool that makes a const'd char -array
from any binary file.
It seems to me it would be more efficient to use objcopy to turn that
binary file directly into an object file with symbols accessible from C
code defining its beginning and ending points. Then just link it into
the executable.
Of course, it is more efficient.
But:
- it covers fewer use cases.
- it exposes the array's name and size as global symbols, which is not
always desirable
- it feels too much like magic. It would feel less like magic if
done by the compiler rather than by an extra tool. Even better if done by
the compiler in a standardized manner.
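For comparison, the array-generating tool mentioned at the start of this subthread can be sketched in a few lines of C. This is only an illustration of the approach, not Bonita Montero's actual tool; the function name emit_c_array and the output layout are assumptions:

```c
#include <stdio.h>
#include <stddef.h>

/* Write `data` as a C array definition named `name` to `out`.
 * Returns 0 on success, -1 on a write error or an empty input
 * (C has no zero-length arrays). */
int emit_c_array(FILE *out, const char *name,
                 const unsigned char *data, size_t size)
{
    if (size == 0)
        return -1;
    if (fprintf(out, "const unsigned char %s[%zu] = {", name, size) < 0)
        return -1;
    for (size_t i = 0; i < size; i++) {
        /* Start a new line every 12 bytes to keep the output readable. */
        if (fprintf(out, "%s0x%02x,", (i % 12 == 0) ? "\n    " : " ",
                    (unsigned)data[i]) < 0)
            return -1;
    }
    if (fprintf(out, "\n};\n") < 0)
        return -1;
    return 0;
}
```

Because the result is ordinary source text, the array can be given static linkage and its size is available through sizeof, which addresses the second point in the list above at the cost of compile time for large files.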
bart wrote:
I can see that the first two can be subtracted to give the sizes of
the data, which is 70 or 0x46. 0x46 is the last byte of the address of
_size, so what's happening there? What's with the crap in bits 16-47?
I can extract the size using:
printf("%d\n", (unsigned short)&_binary_hello_c_size);
But something is not right. I've also asked what is the point of the
-size symbol if you can just do -end - -start, but nobody has explained.
typedef unsigned char uchar;
extern uchar _binary_hello_c_size[];
long hello_c_size = _binary_hello_c_size - (uchar *)0;
On Thu, 30 May 2024 14:34:00 +0100, bart wrote:
On 30/05/2024 03:32, Lawrence D'Oliveiro wrote:
On Wed, 29 May 2024 13:58:20 +0200, Bonita Montero wrote:
I've got a small commandline-tool that makes a const'd char -array
from any binary file.
It seems to me it would be more efficient to use objcopy to turn that
binary file directly into an object file with symbols accessible from C
code defining its beginning and ending points. Then just link it into
the executable.
None of my compilers, whether for C or anything else, generate object
files.
That’s too bad. All the good compilers, for languages like C and others which are meant to execute efficiently, do.
bart <[email protected]> writes:
Little of this seems to work, sorry. You guys keep saying, do this, do
that, no do it that way, go RTFM, but nobody has shown a complete
program that correctly shows the -size symbol to be giving anything
meaningful.
If I run this:
printf("%p\n", &_binary_hello_c_start);
printf("%p\n", &_binary_hello_c_end);
printf("%p\n", &_binary_hello_c_size);
I get:
00007ff6ef252010
00007ff6ef252056
00007ff5af240046
I can see that the first two can be subtracted to give the sizes of the
data, which is 70 or 0x46. 0x46 is the last byte of the address of
_size, so what's happening there? What's with the crap in bits 16-47?
I can extract the size using:
printf("%d\n", (unsigned short)&_binary_hello_c_size);
But something is not right. I've also asked what is the point of the
-size symbol if you can just do -end - -start, but nobody has explained.
$ cat /tmp/m.c
#include <stdio.h>
#include <stdint.h>
extern uint64_t _binary_main_cpp_size;
extern uint8_t *_binary_main_cpp_start;
extern uint8_t *_binary_main_cpp_end;
int main()
{
printf("%p\n", &_binary_main_cpp_size);
printf("%p\n", &_binary_main_cpp_start);
printf("%p\n", &_binary_main_cpp_end);
return 0;
}
$ objcopy -I binary -B i386 -O elf64-x86-64 main.cpp /tmp/test.o
$ cc -o /tmp/m /tmp/m.c /tmp/test.o
$ /tmp/m
0x30e2
0x601034
0x604116
$ nm /tmp/m | grep _binary_main
0000000000604116 D _binary_main_cpp_end
00000000000030e2 A _binary_main_cpp_size
0000000000601034 D _binary_main_cpp_start
$ wc -c main.cpp
12514 main.cpp
$ printf 0x%x\\n 12514
0x30e2
The size symbol requires no space in the resulting
executable memory image, and it's more convenient than
having to do the math (at run time, since the compiler
can't know the actual values).
Here's my solution. It's a bit more complicated.
On 31/05/2024 13:55, bart wrote:
On 30/05/2024 16:03, Michael S wrote:
On Thu, 30 May 2024 15:48:39 +0100
bart <[email protected]> wrote:
Where do the _binary_logo_bmp_start and ...-size symbols come from?
That is, how do they get into the object file.
objcopy generates names of the symbols from the name of input binary
file. I would think that it is possible to change these symbols to
something else, but I am not sure that it is possible within the same
invocation of objcopy. It certainly is possible with a second pass.
Lawrence probably can give a more authoritative answer.
Or as a last resort you can RTFM.
I gave myself the simple task of incorporating the source text of
hello.c into a program, and printing it out.
Here's how builtin embedding worked using a feature of my older C
compiler:
#include <stdio.h>
#include <string.h>
char hello[] = strinclude("hello.c");
int main(void) {
printf("hello =\n%s\n", hello);
printf("strlen(hello) = %zu\n", strlen(hello));
printf("sizeof(hello) = %zu\n", sizeof(hello));
}
I build it and run it like this:
C:\c>bcc c
Compiling c.c to c.exe
C:\c>c
hello =
#include "stdio.h"
int main(void) {
printf("Hello, World!\n");
}
strlen(hello) = 70
sizeof(hello) = 71
C:\c>dir hello.c
31/05/2024 13:48 70 hello.c
It just works; no messing about with objcopy parameters; no long
unwieldy names; no link errors due to unsupported file formats; no
problems with missing terminators for embedded text files imported as
strings; no funny ways of getting size info.
int bbx_write_source (const char *source_xml, char *path, const char *source_xml_file, const char *source_xml_name)
{
XMLDOC *doc = 0;
char error[1024];
char buff[1024];
XMLNODE *root;
XMLNODE *node;
const char *name;
FILE *fpout = 0;
FILE *fpin = 0;
int ch;
doc = xmldocfromstring(source_xml, error, 1024);
if (!doc)
{
fprintf(stderr, "%s\n", error);
return -1;
}
root = xml_getroot(doc);
if (strcmp(xml_gettag(root), "FileSystem"))
return -1;
if (!root->child)
return -1;
if (strcmp(xml_gettag(root->child), "directory"))
return -1;
for (node = root->child->child; node != NULL; node = node->next)
{
if (!strcmp(xml_gettag(node), "file"))
{
name = xml_getattribute(node, "name");
snprintf(buff, 1024, "%s%s", path, name);
fpout = fopen(buff, "w");
if (!fpout)
break;
fpin = file_fopen(node);
if (!fpin)
break;
if (!strcmp(name, source_xml_file))
{
char *escaped = texttostring(source_xml);
if (!escaped)
break;
fprintf(fpout, "char %s[] = %s;\n", source_xml_name,
escaped);
free(escaped);
}
else
{
while ((ch = fgetc(fpin)) != EOF)
fputc(ch, fpout);
}
fclose(fpout);
fclose(fpin);
fpout = 0;
fpin = 0;
}
}
if (fpin || fpout)
{
/* a break above left a file open; fclose(NULL) is undefined */
if (fpin)
fclose(fpin);
if (fpout)
fclose(fpout);
return -1;
}
return 0;
}
It's leveraging the Baby X resource compiler, the XML parser, and my
filesystem programs. You can't include the source of a program in the
program as a C string, because then the source changes to include that
string. So what you do is this.
You first place a placeholder C source file containing a short dummy
string.
Then you convert the source to an XML file, and turn it into a string
with the Baby X Resource compiler. Then you drop the source into the
file, removing the placeholder.
Then the program walks the file list, detects that file, and replaces it
with the xml string it has been passed.
And this system works, and it's an easy way of adding source output to
programs. Of course the function now needs to be modified to walk the
entire tree recursively and I will need a makedirectory function. I've
got it to work for flat source directories.
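The step that turns the XML text into something that can be written after "char name[] =" is an escaping pass. The real texttostring is not shown in the post, so the following is only a guess at the kind of work it does; escape_c_string is a hypothetical stand-in that handles the common escapes and nothing else:

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated, double-quoted C string literal for `text`,
 * or NULL if allocation fails. The caller frees the result. */
char *escape_c_string(const char *text)
{
    /* Worst case: every character becomes a two-character escape,
     * plus two quotes and the terminating NUL. */
    size_t len = strlen(text);
    char *out = malloc(2 * len + 3);
    if (out == NULL)
        return NULL;

    char *p = out;
    *p++ = '"';
    for (size_t i = 0; i < len; i++) {
        switch (text[i]) {
        case '\\': *p++ = '\\'; *p++ = '\\'; break;
        case '"':  *p++ = '\\'; *p++ = '"';  break;
        case '\n': *p++ = '\\'; *p++ = 'n';  break;
        case '\t': *p++ = '\\'; *p++ = 't';  break;
        case '\r': *p++ = '\\'; *p++ = 'r';  break;
        default:   *p++ = text[i];           break;
        }
    }
    *p++ = '"';
    *p = '\0';
    return out;
}
```

A production version would also need to deal with other control characters and with compiler limits on the length of a single string literal, for example by splitting the output into several adjacent literals.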
On 01/06/2024 02:37, jak wrote:
bart wrote:
I can see that the first two can be subtracted to give the sizes of
the data, which is 70 or 0x46. 0x46 is the last byte of the address
of _size, so what's happening there? What's with the crap in bits 16-47?
I can extract the size using:
printf("%d\n", (unsigned short)&_binary_hello_c_size);
But something is not right. I've also asked what is the point of the
-size symbol if you can just do -end - -start, but nobody has explained.
typedef unsigned char uchar;
extern uchar _binary_hello_c_size[];
long hello_c_size = _binary_hello_c_size - (uchar *)0;
What result for the size did you get when you ran this?
It seems people are just guessing what might be the right code and
posting random fragments!
On 01/06/2024 02:25, Scott Lurndal wrote:
bart <[email protected]> writes:
Little of this seems to work, sorry. You guys keep saying, do this, do
that, no do it that way, go RTFM, but nobody has shown a complete
program that correctly shows the -size symbol to be giving anything
meaningful.
If I run this: [attempt to reproduce example]
$ cat /tmp/m.c
#include <stdio.h>
#include <stdint.h>
extern uint64_t _binary_main_cpp_size;
extern uint8_t *_binary_main_cpp_start;
extern uint8_t *_binary_main_cpp_end;
int main()
{
printf("%p\n", &_binary_main_cpp_size);
printf("%p\n", &_binary_main_cpp_start);
printf("%p\n", &_binary_main_cpp_end);
return 0;
}
$ objcopy -I binary -B i386 -O elf64-x86-64 main.cpp /tmp/test.o
$ cc -o /tmp/m /tmp/m.c /tmp/test.o
$ /tmp/m
0x30e2
0x601034
0x604116
$ nm /tmp/m | grep _binary_main
0000000000604116 D _binary_main_cpp_end
00000000000030e2 A _binary_main_cpp_size
0000000000601034 D _binary_main_cpp_start
$ wc -c main.cpp
12514 main.cpp
$ printf 0x%x\\n 12514
0x30e2
The size symbol requires no space in the resulting
executable memory image, and it's more convenient than
having to do the math (at run time, since the compiler
can't know the actual values).
Here's my transcript:
-------------------------------------
C:\c>copy hello.c main.cpp # create main.cpp, here it's 70 bytes
1 file(s) copied.
C:\c>type m.c # exact same code as yours
#include <stdio.h>
#include <stdint.h>
extern uint64_t _binary_main_cpp_size;
extern uint8_t *_binary_main_cpp_start;
extern uint8_t *_binary_main_cpp_end;
int main()
{
printf("%p\n", &_binary_main_cpp_size);
printf("%p\n", &_binary_main_cpp_start);
printf("%p\n", &_binary_main_cpp_end);
return 0;
}
C:\c>objcopy -I binary -O elf64-x86-64 main.cpp test.o # make test.o
C:\c>gcc m.c test.o -o m.exe # build m executable
C:\c>m # run m.exe
00007ff5d5480046 # and the size is ...
00007ff715492010
00007ff715492056
[similar results under WSL]
On 5/26/2024 6:23 AM, Bonita Montero wrote:
On 26.05.2024 at 09:13, jak wrote:
About this I only agree partially because it depends a lot on the
context in which it is used. Moreover, I would not know how to indicate
an optimal programming language for all seasons.
C++ is in almost any case the better C.
What you describe is the greatest inconvenience of c++. To give only one
example, when they decided to rewrite the FB platform to accelerate it,
they thought of migrating from php to c++ and they had a collapse of the
staff suitable for the work, so they thought of relying on a compiler that
translated the php into c++, and many of the new languages were born to
try to remedy its complexity.
C++ is the wrong language for web applications.
I like Java more for that.
C++ is the wrong language for real time apps. No memory allocation
allowed.
On 5/23/2024 2:25 PM, Bonita Montero wrote:
On 22.05.2024 at 18:55, David Brown wrote:
In an attempt to bring some topicality to the group, has anyone
started using, or considering, C23 ? There's quite a lot of change
in it, especially compared to the minor changes in C17.
<https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf>
<https://en.wikipedia.org/wiki/C23_(C_standard_revision)>
<https://en.cppreference.com/w/c/23>
I like that it tidies up a lot of old stuff - it is neater to have
things like "bool", "static_assert", etc., as part of the language
rather than needing a half-dozen includes for such basic stuff.
I like that it standardises several useful extensions that have
been in gcc and clang (and possibly other compilers) for many years.
I'm not sure it will make a big difference to my own programming -
when I want "typeof" or "ckd_add()", I already use them in gcc. But
for people restricted to standard C, there's more new to enjoy. And
I prefer to use standard syntax when possible.
"constexpr" is something I think I will find helpful, in at least
some circumstances.
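As an illustration of what the checked-arithmetic functions mentioned above provide (C23 spells them ckd_add, ckd_sub and ckd_mul in <stdckdint.h>), here is a portable pre-C23 sketch for int addition. The name checked_add_int is hypothetical. It follows the C23 convention of returning true on overflow, but unlike ckd_add it leaves the result untouched in that case instead of storing the wrapped value:

```c
#include <limits.h>
#include <stdbool.h>

/* Store a + b in *result and return false if the sum fits in an int;
 * return true (leaving *result untouched) if it would overflow. */
bool checked_add_int(int *result, int a, int b)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return true;
    *result = a + b;
    return false;
}
```

The range checks are done before the addition, so the signed overflow itself never happens.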
I ask myself what the point is in further developing a language
like this that can actually no longer be saved.
There is way more code written in C than C++. For instance, just about
all real time systems such as device and engine management are written
in C.
One of my friends writes the device code for a NAS manufacturer. The
code starts off with:
while (1)
{
... a bunch of code
}
On 01/06/2024 02:25, Scott Lurndal wrote:
bart <[email protected]> writes:
Little of this seems to work, sorry. You guys keep saying, do this, do
that, no do it that way, go RTFM, but nobody has shown a complete
program that correctly shows the -size symbol to be giving anything
meaningful.
But something is not right. I've also asked what is the point of the
-size symbol if you can just do -end - -start, but nobody has explained.
$ cat /tmp/m.c
#include <stdio.h>
#include <stdint.h>
extern uint64_t _binary_main_cpp_size;
extern uint8_t *_binary_main_cpp_start;
extern uint8_t *_binary_main_cpp_end;
int main()
{
printf("%p\n", &_binary_main_cpp_size);
printf("%p\n", &_binary_main_cpp_start);
printf("%p\n", &_binary_main_cpp_end);
return 0;
}
$ objcopy -I binary -B i386 -O elf64-x86-64 main.cpp /tmp/test.o
$ cc -o /tmp/m /tmp/m.c /tmp/test.o
$ /tmp/m
0x30e2
0x601034
0x604116
$ nm /tmp/m | grep _binary_main
0000000000604116 D _binary_main_cpp_end
00000000000030e2 A _binary_main_cpp_size
0000000000601034 D _binary_main_cpp_start
bart <[email protected]> writes:
On 01/06/2024 02:25, Scott Lurndal wrote:
bart <[email protected]> writes:
Little of this seems to work, sorry. You guys keep saying, do this, do
that, no do it that way, go RTFM, but nobody has shown a complete
program that correctly shows the -size symbol to be giving anything
meaningful.
If I run this: [attempt to reproduce example]
$ cat /tmp/m.c
#include <stdio.h>
#include <stdint.h>
extern uint64_t _binary_main_cpp_size;
extern uint8_t *_binary_main_cpp_start;
extern uint8_t *_binary_main_cpp_end;
int main()
{
printf("%p\n", &_binary_main_cpp_size);
printf("%p\n", &_binary_main_cpp_start);
printf("%p\n", &_binary_main_cpp_end);
return 0;
}
$ objcopy -I binary -B i386 -O elf64-x86-64 main.cpp /tmp/test.o
$ cc -o /tmp/m /tmp/m.c /tmp/test.o
$ /tmp/m
0x30e2
0x601034
0x604116
$ nm /tmp/m | grep _binary_main
0000000000604116 D _binary_main_cpp_end
00000000000030e2 A _binary_main_cpp_size
0000000000601034 D _binary_main_cpp_start
$ wc -c main.cpp
12514 main.cpp
$ printf 0x%x\\n 12514
0x30e2
The size symbol requires no space in the resulting
executable memory image, and it's more convenient than
having to do the math (at run time, since the compiler
can't know the actual values).
Here's my transcript:
-------------------------------------
C:\c>copy hello.c main.cpp # create main.cpp, here it's 70 bytes
1 file(s) copied.
C:\c>type m.c # exact same code as yours
#include <stdio.h>
#include <stdint.h>
extern uint64_t _binary_main_cpp_size;
extern uint8_t *_binary_main_cpp_start;
extern uint8_t *_binary_main_cpp_end;
int main()
{
printf("%p\n", &_binary_main_cpp_size);
printf("%p\n", &_binary_main_cpp_start);
printf("%p\n", &_binary_main_cpp_end);
return 0;
}
C:\c>objcopy -I binary -O elf64-x86-64 main.cpp test.o # make test.o
C:\c>gcc m.c test.o -o m.exe # build m executable
C:\c>m # run m.exe
00007ff5d5480046 # and the size is ...
00007ff715492010
00007ff715492056
[similar results under WSL]
For what it's worth I see the same behavior running on linux.
It looks like the culprit is gcc, which apparently relocates
the symbol even though it is marked with an A type.
OK, thanks. But I forgot to ask what results you got from running the
program. Because if I try your code, using hello.c and hello.exe as
test binary/source data, I get this output:
_binary_test_bi_start 00007ff6497620e0 140695771160800
_binary_test_bi_end 00007ff649762ae0 140695771163360
_binary_test_bi_size 00007ff509750a00 140690402380288
_binary_bin_to_list_c_start 00007ff649762ae0 140695771163360
_binary_bin_to_list_c_end 00007ff649762b26 140695771163430
_binary_bin_to_list_c_size 00007ff509750046 140690402377798
The sizes should have been 2560 and 70 respectively; those values are
a bit bigger than that.
However I see that you also have start and end addresses, which
sounds a much better way of determining the size. (In that case, what
are those *size symbols for?).
On 01/06/2024 02:25, Scott Lurndal wrote:
Nope, same thing. This doesn't inspire much confidence. With values
shown, the actual size IS contained within the _size value, but only as
the last 16 bits of the value.
gcc versions were 10.3.0 and 9.4.0 respectively; the latter is what is provided by Windows 11.
You also brought up the fact that the size is not known to the compiler anyway, which means a few things are not possible, like using the size
in a static context.
For what it's worth I see the same behavior running on linux.
It looks like the culprit is gcc, which apparently relocates
the symbol even though it is marked with an A type. After
running around in circles for a goodly amount of time, it
occurred to me to try compiling using clang, and that worked.
If I run this:
printf("%p\n", &_binary_hello_c_start);
printf("%p\n", &_binary_hello_c_end);
printf("%p\n", &_binary_hello_c_size);
I get:
00007ff6ef252010
00007ff6ef252056
00007ff5af240046
I can see that the first two can be subtracted to give the sizes of
the data, which is 70 or 0x46. 0x46 is the last byte of the address
of _size, so what's happening there? What's with the crap in bits
16-47?
On 01/06/2024 23:11, Michael S wrote:
On Fri, 31 May 2024 22:15:54 +0100
bart <[email protected]> wrote:
If I run this:
printf("%p\n", &_binary_hello_c_start);
printf("%p\n", &_binary_hello_c_end);
printf("%p\n", &_binary_hello_c_size);
I get:
00007ff6ef252010
00007ff6ef252056
00007ff5af240046
I can see that the first two can be subtracted to give the sizes of
the data, which is 70 or 0x46. 0x46 is the last byte of the address
of _size, so what's happening there? What's with the crap in bits
16-47?
It looks like ASLR. I don't see it because I test on Win7.
I understand those are high-loading addresses. I was asking what they
were doing as part of the size.
Apparently, that size value is wrongly relocated by some versions of
gcc-ld. Since allocations work on 64KB blocks, that explains why the
bottom 16 bits are unaffected.
So such a size value could still be used for objects up to 64KB-1, but
it sounds dodgy.
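The arithmetic behind that observation can be checked against the values posted earlier in the thread. A small sketch; the helper names low16 and span are made up for this illustration:

```c
#include <stdint.h>

/* Keep only the low 16 bits of a (possibly relocated) symbol value. */
uint64_t low16(uint64_t value)
{
    return value & 0xFFFFu;
}

/* Size computed from the two data addresses, which are relocated together. */
uint64_t span(uint64_t start, uint64_t end)
{
    return end - start;
}
```

With the posted numbers, low16(0x00007ff5af240046) and span(0x00007ff6ef252010, 0x00007ff6ef252056) both give 70, and low16(0x00007ff509750a00) gives 2560, the two expected file sizes. Only the end minus start form stays correct once the data is 64KB or larger.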
On Fri, 31 May 2024 22:15:54 +0100
bart <[email protected]> wrote:
If I run this:
printf("%p\n", &_binary_hello_c_start);
printf("%p\n", &_binary_hello_c_end);
printf("%p\n", &_binary_hello_c_size);
I get:
00007ff6ef252010
00007ff6ef252056
00007ff5af240046
I can see that the first two can be subtracted to give the sizes of
the data, which is 70 or 0x46. 0x46 is the last byte of the address
of _size, so what's happening there? What's with the crap in bits
16-47?
It looks like ASLR. I don't see it because I test on Win7.
Tim Rentsch <[email protected]> writes:
bart <[email protected]> writes:
On 01/06/2024 02:25, Scott Lurndal wrote:
bart <[email protected]> writes:
Little of this seems to work, sorry. You guys keep saying, do this, do
that, no do it that way, go RTFM, but nobody has shown a complete
program that correctly shows the -size symbol to be giving anything
meaningful.
If I run this: [attempt to reproduce example]
$ cat /tmp/m.c
#include <stdio.h>
#include <stdint.h>
extern uint64_t _binary_main_cpp_size;
extern uint8_t *_binary_main_cpp_start;
extern uint8_t *_binary_main_cpp_end;
int main()
{
printf("%p\n", &_binary_main_cpp_size);
printf("%p\n", &_binary_main_cpp_start);
printf("%p\n", &_binary_main_cpp_end);
return 0;
}
$ objcopy -I binary -B i386 -O elf64-x86-64 main.cpp /tmp/test.o
$ cc -o /tmp/m /tmp/m.c /tmp/test.o
$ /tmp/m
0x30e2
0x601034
0x604116
$ nm /tmp/m | grep _binary_main
0000000000604116 D _binary_main_cpp_end
00000000000030e2 A _binary_main_cpp_size
0000000000601034 D _binary_main_cpp_start
$ wc -c main.cpp
12514 main.cpp
$ printf 0x%x\\n 12514
0x30e2
The size symbol requires no space in the resulting
executable memory image, and it's more convenient than
having to do the math (at run time, since the compiler
can't know the actual values).
Here's my transcript:
-------------------------------------
C:\c>copy hello.c main.cpp # create main.cpp, here it's 70 bytes
1 file(s) copied.
C:\c>type m.c # exact same code as yours
#include <stdio.h>
#include <stdint.h>
extern uint64_t _binary_main_cpp_size;
extern uint8_t *_binary_main_cpp_start;
extern uint8_t *_binary_main_cpp_end;
int main()
{
printf("%p\n", &_binary_main_cpp_size);
printf("%p\n", &_binary_main_cpp_start);
printf("%p\n", &_binary_main_cpp_end);
return 0;
}
C:\c>objcopy -I binary -O elf64-x86-64 main.cpp test.o # make test.o
C:\c>gcc m.c test.o -o m.exe # build m executable
C:\c>m # run m.exe
00007ff5d5480046 # and the size is ...
00007ff715492010
00007ff715492056
[similar results under WSL]
For what it's worth I see the same behavior running on linux.
Which versions? It works fine on my linux system (FC20, GCC 4.8.3)
It looks like the culprit is gcc, which apparently relocates
the symbol even though it is marked with an A type.
gcc doesn't do 'relocations'. If you have a problem, it's
likely with binutils (i.e. ld(1)).
bart wrote:
On 01/06/2024 02:37, jak wrote:
bart wrote:
I can see that the first two can be subtracted to give the sizes
of the data, which is 70 or 0x46. 0x46 is the last byte of the
address of _size, so what's happening there? What's with the crap
in bits 16-47?
I can extract the size using:
printf("%d\n", (unsigned short)&_binary_hello_c_size);
But something is not right. I've also asked what is the point of
the -size symbol if you can just do -end - -start, but nobody has
explained.
typedef unsigned char uchar;
extern uchar _binary_hello_c_size[];
long hello_c_size = _binary_hello_c_size - (uchar *)0;
What result for the size did you get when you ran this?
It seems people are just guessing what might be the right code and
posting random fragments!
I wrote it that way precisely because I believed it was the clearest
way. [...]
On Fri, 31 May 2024 19:03:10 +0100
bart <[email protected]> wrote:
OK, thanks. But I forgot to ask what results you got from running the
program. Because if I try your code, using hello.c and hello.exe as
test binary/source data, I get this output:
_binary_test_bi_start 00007ff6497620e0 140695771160800
_binary_test_bi_end 00007ff649762ae0 140695771163360
_binary_test_bi_size 00007ff509750a00 140690402380288
_binary_bin_to_list_c_start 00007ff649762ae0 140695771163360
_binary_bin_to_list_c_end 00007ff649762b26 140695771163430
_binary_bin_to_list_c_size 00007ff509750046 140690402377798
The sizes should have been 2560 and 70 respectively; those values are
a bit bigger than that.
That's strange. I got expected results:
_binary_test_bi_start 000000013FDD30C0 5366427840
_binary_test_bi_end 000000013FDD67AC 5366441900
_binary_test_bi_size 00000000000036EC 14060
_binary_bin_to_list_c_start 000000013FDD67AC 5366441900
_binary_bin_to_list_c_end 000000013FDD711F 5366444319
_binary_bin_to_list_c_size 0000000000000973 2419
However I see that you also have start and end addresses, which
sounds a much better way of determining the size. (In that case, what
are those *size symbols for?).
I'd guess *_size is there for the benefit of less smart compilers that
cannot figure out that *_end - *_start is a constant expression
and cannot compile code like:
static ptrdiff_t bar = _binary_test_bi_end - _binary_test_bi_start;
... I like strings which you
can pass about (though to actually use the contents you need to convert
to char *, otherwise it is hopeless) ...
My compilers don't routinely generate object files, which would also
need an external dependency (a linker), but they can do if necessary
(eg. to statically link my code into another program with another
compiler).
while (1)
Lynn McGuire <[email protected]> writes:
On 5/26/2024 6:23 AM, Bonita Montero wrote:
On 26.05.2024 at 09:13, jak wrote:
About this I only agree partially because it depends a lot on the
context in which it is used. Moreover, I would not know how to
indicate an optimal programming language for all seasons.
C++ is in almost any case the better C.
What you describe is the greatest inconvenience of c++. To make
only one example, when they decided to rewrite the FB platform to
accelerate it, they thought of migrating from php to c++ and they
had a collapse of the staff suitable for work, so they thought of
relying on a compiler that translated the php into c++ and many of
the new languages were born to try to remedy its complexity.
C++ is the wrong language for web applications.
I like Java more for that.
C++ is the wrong language for real time apps.
That's an incorrect statement.
No memory allocation allowed.
It is trivially easy to write C++ code that doesn't
allocate memory dynamically.
I use C++ for my server side apps on my webserver. Works great.
I use C++ for operating systems (you can't get more real-time
than that)
and bare-metal hypervisors.
On Sat, 1 Jun 2024 11:37:45 +0100, bart wrote:
My compilers don't routinely generate object files, which would also
need an external dependency (a linker), but they can do if necessary
(eg. to statically link my code into another program with another
compiler).
Modular code design would indicate that there is no point the compiler duplicating functionality available in the linker.
On Sat, 01 Jun 2024 01:27:41 GMT
[email protected] (Scott Lurndal) wrote:
Lynn McGuire <[email protected]> writes:
On 5/26/2024 6:23 AM, Bonita Montero wrote:
On 26.05.2024 at 09:13, jak wrote:
About this I only agree partially because it depends a lot on the
context in which it is used. Moreover, I would not know how to
indicate an optimal programming language for all seasons.
C++ is in almost any case the better C.
What you describe is the greatest inconvenience of c++. To make
only one example, when they decided to rewrite the FB platform to
accelerate it, they thought of migrating from php to c++ and they
had a collapse of the staff suitable for work, so they thought of
relying on a compiler that translated the php into c++ and many of
the new languages were born to try to remedy its complexity.
C++ is the wrong language for web applications.
I like Java more for that.
C++ is the wrong language for real time apps.
That's an incorrect statement.
No memory allocation allowed.
It is trivially easy to write C++ code that doesn't
allocate memory dynamically.
I use C++ for my server side apps on my webserver. Works great.
I use C++ for operating systems (you can't get more real-time
than that)
Engine control is FAR more real-time than an OS, to list just one example
out of many.
Of course, nowadays most of these things are no longer done on general-purpose CPUs or even MCUs.
and bare-metal hypervisors.
It is hard to believe that you don't have at least one co-worker that
is begging to switch all new development to C approximately every week.
And couple of folks that beg for Rust.
On 02/06/2024 10:02, Michael S wrote:
On Sat, 01 Jun 2024 01:27:41 GMT
[email protected] (Scott Lurndal) wrote:
Lynn McGuire <[email protected]> writes:
On 5/26/2024 6:23 AM, Bonita Montero wrote:
On 26.05.2024 at 09:13, jak wrote:
About this I only agree partially because it depends a lot on
the context in which it is used. Moreover, I would not know how
to indicate an optimal programming language for all seasons.
C++ is in almost any case the better C.
What you describe is the greatest inconvenience of c++. To make
only one example, when they decided to rewrite the FB platform
to accelerate it, they thought of migrating from php to c++ and
they had a collapse of the staff suitable for work, so they
thought of relying on a compiler that translated the php into c++
and many of the new languages were born to try to remedy its
complexity.
C++ is the wrong language for web applications.
I like Java more for that.
C++ is the wrong language for real time apps.
That's an incorrect statement.
No memory allocation allowed.
It is trivially easy to write C++ code that doesn't
allocate memory dynamically.
I use C++ for my server side apps on my webserver. Works great.
I use C++ for operating systems (you can't get more real-time
than that)
Engine control is FAR more real-time than an OS, to list just one
example out of many.
Most engine control software runs on an RTOS - so you have at least
as tough real-time requirements for the OS as for the application.
The OS stuff Scott works with, AFAIK, is real-time OS's for specific
tasks such as high-end network equipment. It is not general-purpose
or desktop OS's (which I agree are not particularly real-time).
Of course, nowadays most of these things are no longer done on general-purpose CPUs or even MCUs.
I think you have got that backwards.
Most engine control /is/ done with general purpose microcontrollers,
or at least specific variants of them. They will use ARM Cortex-R or Cortex-M cores rather than Cortex-A cores (i.e., the "real-time"
cores or "microcontroller" cores rather than the "application" cores
you see in telephones, Macs, and ARM servers), but they are standard
cores. Another common choice is the PowerPC cores used in NXP's
engine controllers.
It used to be the case that engine control and other critical hard
real-time work was done with DSPs or FPGAs, but those days are long
past.
and bare-metal hypervisors.
It is hard to believe that you don't have at least one co-worker
that is begging to switch all new development to C approximately
every week. And couple of folks that beg for Rust.
It's possible that he has newbies amongst his co-workers, yes.
On Fri, 31 May 2024 17:55:13 -0500, Lynn McGuire wrote:
while (1)
Why not
while (true)
or even
for (;;)
?
In article <v3gou9$36n61$[email protected]>,
Lawrence D'Oliveiro <[email protected]d> wrote:
On Fri, 31 May 2024 17:55:13 -0500, Lynn McGuire wrote:
while (1)
Why not
while (true)
or even
for (;;)
?
Or even:
:loop
....
goto loop
I've always considered
for (;;)
preferable over
while (1)
On Sun, 2 Jun 2024 14:03:30 +0200
David Brown <[email protected]> wrote:
On 02/06/2024 10:02, Michael S wrote:
On Sat, 01 Jun 2024 01:27:41 GMT
[email protected] (Scott Lurndal) wrote:
Lynn McGuire <[email protected]> writes:
On 5/26/2024 6:23 AM, Bonita Montero wrote:
On 26.05.2024 at 09:13, jak wrote:
About this I only agree partially because it depends a lot on
the context in which it is used. Moreover, I would not know how
to indicate an optimal programming language for all seasons.
C++ is in almost any case the better C.
What you describe is the greatest inconvenience of c++. To make
only one example, when they decided to rewrite the FB platform
to accelerate it, they thought of migrating from php to c++ and
they had a collapse of the staff suitable for work, so they
thought of relying on a compiler that translated the php into c++
and many of the new languages were born to try to remedy its
complexity.
C++ is the wrong language for web applications.
I like Java more for that.
C++ is the wrong language for real time apps.
That's an incorrect statement.
No memory allocation allowed.
It is trivially easy to write C++ code that doesn't
allocate memory dynamically.
I use C++ for my server side apps on my webserver. Works great.
I use C++ for operating systems (you can't get more real-time
than that)
Engine control is FAR more real-time than an OS, to list just one
example out of many.
Most engine control software runs on an RTOS - so you have at least
as tough real-time requirements for the OS as for the application.
From what I read about this stuff (admittedly, long time ago) even
when there is a RTOS, the important part runs alongside RTOS rather than
"on" RTOS.
I.e. there is high priority interrupt that is never ever masked by OS in
the region that is anywhere close to expected time and all
time-sensitive work is done by ISR, with no sort of RTOS calls.
The OS stuff Scott works with, AFAIK, is real-time OS's for specific
tasks such as high-end network equipment. It is not general-purpose
or desktop OS's (which I agree are not particularly real-time).
I'd characterize the software running within a high-end NIC as very
soft real-time.
You only care for buffers to not overflow. And if they
overflow, it's not too bad either.
The flow is very much unidirectional
or bi-directional with direction almost independent of each other.
There are dependencies between directions, e.g. TCP acks, but they are
weak dependencies timing-wise.
Hard real time is about closed loops, most often closed control loops,
but not only those.
Of course, nowadays most of these things are no longer done on
general-purpose CPUs or even MCUs.
I think you have got that backwards.
Most engine control /is/ done with general purpose microcontrollers,
or at least specific variants of them. They will use ARM Cortex-R or
Cortex-M cores rather than Cortex-A cores (i.e., the "real-time"
cores or "microcontroller" cores rather than the "application" cores
you see in telephones, Macs, and ARM servers), but they are standard
cores. Another common choice is the PowerPC cores used in NXP's
engine controllers.
It used to be the case that engine control and other critical hard
real-time work was done with DSPs or FPGAs, but those days are long
past.
Are you sure?
It's much simpler and far more reliable to do such task with $5 PLD
(which today means FPGA that boots from internal flash, rather than
old day's PLD) than with MCU, regardless of price of MCU.
Even if the MCU is $4.99 cheaper, the difference is noise relative to
the price of the engine.
and bare-metal hypervisors.
It is hard to believe that you don't have at least one co-worker
that is begging to switch all new development to C approximately
every week. And couple of folks that beg for Rust.
It's possible that he has newbies amongst his co-workers, yes.
Well, Linus is not on his team, but if he was, he would say the same
thing. But probably at much higher rate than weekly.
On 02/06/2024 04:27, Lawrence D'Oliveiro wrote:
On Sat, 1 Jun 2024 11:37:45 +0100, bart wrote:
My compilers don't routinely generate object files, which would also
need an external dependency (a linker), but they can do if necessary
(eg. to statically link my code into another program with another
compiler).
Modular code design would indicate that there is no point the compiler
duplicating functionality available in the linker.
Python uses modules and yet doesn't have a linker.
On Sun, 2 Jun 2024 10:37:55 +0100, bart wrote:
On 02/06/2024 04:27, Lawrence D'Oliveiro wrote:
On Sat, 1 Jun 2024 11:37:45 +0100, bart wrote:
My compilers don't routinely generate object files, which would
also need an external dependency (a linker), but they can do if
necessary (eg. to statically link my code into another program
with another compiler).
Modular code design would indicate that there is no point the
compiler duplicating functionality available in the linker.
Python uses modules and yet doesn't have a linker.
What is importlib, then, if not something that links everything
together?
And guess what: it’s a module.
On 2024-06-02, Lew Pitcher <[email protected]> wrote:
I've always considered
for (;;)
preferable over
while (1)
Of course it is preferable. The idiom constitutes the language's
direct support for unconditional looping, not requiring that to be
requested by an extraneous always-true expression.
Using while (1) or while (true) is like i = i + 1 instead
of ++i, or while (*dst++ = *src++); instead of strcpy.
When Dennis Ritchie (if it was indeed he) chose for to be the
construct in which the guard expression may be omitted, so that it
may express unconditional looping, he expressed the intent that it be
henceforth used for that purpose.
To continue to use while (1) after the proper utensil is provided is
like eating with your hands instead of a fork.
On 02/06/2024 15:29, Michael S wrote:
On Sun, 2 Jun 2024 14:03:30 +0200
David Brown <[email protected]> wrote:
On 02/06/2024 10:02, Michael S wrote:
On Sat, 01 Jun 2024 01:27:41 GMT
[email protected] (Scott Lurndal) wrote:
Lynn McGuire <[email protected]> writes:
On 5/26/2024 6:23 AM, Bonita Montero wrote:
On 26.05.2024 at 09:13, jak wrote:
About this I only agree partially because it depends a lot on
the context in which it is used. Moreover, I would not know
how to indicate an optimal programming language for all
seasons.
C++ is in almost any case the better C.
What you describe is the greatest inconvenience of c++. To
make only one example, when they decided to rewrite the FB
platform to accelerate it, they thought of migrating from php
to c++ and they had a collapse of the staff suitable for
work, so they thought of relying on a compiler that translated
the php into c++ and many of the new languages were born to
try to remedy its complexity.
C++ is the wrong language for web applications.
I like Java more for that.
C++ is the wrong language for real time apps.
That's an incorrect statement.
No memory allocation allowed.
It is trivially easy to write C++ code that doesn't
allocate memory dynamically.
I use C++ for my server side apps on my webserver. Works
great.
I use C++ for operating systems (you can't get more real-time
than that)
Engine control is FAR more real-time than an OS, to list just one
example out of many.
Most engine control software runs on an RTOS - so you have at least
as tough real-time requirements for the OS as for the application.
From what I read about this stuff (admittedly, long time ago) even
when there is a RTOS, the important part runs alongside RTOS rather
than "on" RTOS.
I.e. there is high priority interrupt that is never ever masked by
OS in the region that is anywhere close to expected time and all time-sensitive work is done by ISR, with no sort of RTOS calls.
That's sort-of right. To be precise for something like this, we'd
have to say what exactly we mean by "engine controller". There are
many kinds of engine or motor, and many types of control that are
needed for them. Generally, there is a hierarchy of simpler but more time-critical parts up to more complex but more flexible parts of the
system.
As an example of a system of motor control that I've worked on
(electric motors rather than combustion engines), the most
timing-critical signal generation and safety (emergency stop,
overload protection, etc.) are all in hardware - typically dedicated peripherals in the microcontroller. Some safety parts might also be implemented in non-maskable interrupt functions that the RTOS can
never disable.
The low-level control of the motors is typically run by timer
interrupt functions. These may be disabled by the RTOS, but will
only be disabled for a very short (and predictable) time - interrupt disabling is usually essential to the way locks and inter-process communication works, including communication between these timer
functions and the rest of the code. Higher level control runs as
RTOS tasks of various priorities, and communication with other boards
is usually a lower priority task. Clearly these real-time tasks
cannot be more "real-time" than the RTOS itself. Other boards might
have high level non-realtime system determining things like path
finding, or user interfaces.
And until you get to the highest level stuff, there is no reason why
C++ is not suitable. But whether you use C++, C, Assembly, or Ada
for the low-level and more real-time critical code, you avoid dynamic
memory, exceptions, and other techniques that can have unpredictable
failure modes and unexpected delays. (The high-level stuff can be
written in any language.)
The OS stuff Scott works with, AFAIK, is real-time OS's for
specific tasks such as high-end network equipment. It is not
general-purpose or desktop OS's (which I agree are not
particularly real-time).
I'd characterize the software running within a high-end NIC as
very soft real-time.
I'd characterize it as whatever Scott says it is - he's the expert
there, not you or me.
You only care for buffers to not overflow. And if they
overflow, it's not too bad either.
That is true for some things, but most certainly not for all usage.
The flow is very much unidirectional
or bi-directional with direction almost independent of each other.
There are dependencies between directions, e.g. TCP acks, but they are
weak dependencies timing-wise.
There is a lot of networking that is not TCP/IP.
High-speed network interfaces are used for two purposes - to get high throughput, or to get low latencies. Throughput is not as sensitive
to timing and can tolerate some variation as long as the traffic is independent, but latency is a different matter.
Hard real time is about closed loops, most often closed control
loops, but not only those.
Of course, nowadays most of these things are no longer done on
general-purpose CPUs or even MCUs.
I think you have got that backwards.
Most engine control /is/ done with general purpose
microcontrollers, or at least specific variants of them. They
will use ARM Cortex-R or Cortex-M cores rather than Cortex-A cores
(i.e., the "real-time" cores or "microcontroller" cores rather
than the "application" cores you see in telephones, Macs, and ARM
servers), but they are standard cores. Another common choice is
the PowerPC cores used in NXP's engine controllers.
It used to be the case that engine control and other critical hard
real-time work was done with DSPs or FPGAs, but those days are long
past.
Are you sure?
Pretty sure, yes.
It's much simpler and far more reliable to do such task with $5 PLD
(which today means FPGA that boots from internal flash, rather than
old day's PLD) than with MCU, regardless of price of MCU.
No, it is not simpler or more reliable. Programmable logic is rarely
used for engine or motor control. You use microcontrollers with
appropriate peripherals, such as sophisticated PWM units and encoder interfaces, and advanced timers.
Even if the MCU is $4.99 cheaper, the difference is noise relative
to the price of the engine.
That part is true.
and bare-metal hypervisors.
It is hard to believe that you don't have at least one co-worker
that is begging to switch all new development to C approximately
every week. And couple of folks that beg for Rust.
It's possible that he has newbies amongst his co-workers, yes.
Well, Linus is not on his team, but if he was, he would say the same
thing. But probably at much higher rate than weekly.
Yes, but Linus Torvalds knows shit about C++. He knows a lot about
C, and many other things.
He also - not unreasonably - believes that if C++ was used in the
Linux kernel, lots of others who know nothing about using C++ in OS's
and low-level work would make a complete mess of things. You don't
want someone to randomly add std::vector<> or the like into kernel
code. You don't want people who take delight in smart-arse coding,
such as some regulars in c.l.c++, anywhere near the kernel.
But other OS's are not the Linux kernel - it has particularly unique challenges. If you have an appropriate team, C++ is vastly better
for writing RTOS kernels than C.
On Sun, 2 Jun 2024 11:02:13 +0300, Michael S wrote:
Engine control is FAR more real-time than an OS, to list just one example
out of many.
Speaking of (internal-combustion) engines, I wondered when we can get to
the point where the controller is operating down at the level of
individual spark plug activations and valve openings -- getting rid of
cams and timing belts, in other words.
With that level of control, could you get down to an idling speed of 0
rpm? That is, could the engine get itself going from absolute rest?
On Sun, 2 Jun 2024 21:44:01 +0200
David Brown <[email protected]> wrote:
On 02/06/2024 15:29, Michael S wrote:
On Sun, 2 Jun 2024 14:03:30 +0200
David Brown <[email protected]> wrote:
There is a lot of networking that is not TCP/IP.
High-speed network interfaces are used for two purposes - to get high
throughput, or to get low latencies. Throughput is not as sensitive
to timing and can tolerate some variation as long as the traffic is
independent, but latency is a different matter.
I think, nearly all work in high-end NIC is concentrated on throughput.
For low latency, the best you can do with high end NIC is to disable
all high-end features and to hope that in disabled state they do not
hurt you too badly.
It would be probably better to use specialized "dumb" NIC. I don't know
if such things exist, but considering that high-frequency trading is
still legal (IMHO, it shouldn't be) I would guess that they do.
Hard real time is about closed loops, most often closed control
loops, but not only those.
Of course, nowadays most of these things are no longer done on
general-purpose CPUs or even MCUs.
I think you have got that backwards.
Most engine control /is/ done with general purpose
microcontrollers, or at least specific variants of them. They
will use ARM Cortex-R or Cortex-M cores rather than Cortex-A cores
(i.e., the "real-time" cores or "microcontroller" cores rather
than the "application" cores you see in telephones, Macs, and ARM
servers), but they are standard cores. Another common choice is
the PowerPC cores used in NXP's engine controllers.
It used to be the case that engine control and other critical hard
real-time work was done with DSPs or FPGAs, but those days are long
past.
Are you sure?
Pretty sure, yes.
It's much simpler and far more reliable to do such task with $5 PLD
(which today means FPGA that boots from internal flash, rather than
old day's PLD) than with MCU, regardless of price of MCU.
No, it is not simpler or more reliable. Programmable logic is rarely
used for engine or motor control. You use microcontrollers with
appropriate peripherals, such as sophisticated PWM units and encoder
interfaces, and advanced timers.
I was not talking about electric motors.
Well, Linus is not on his team, but if he was, he would say the same
thing. But probably at much higher rate than weekly.
Yes, but Linus Torvalds knows shit about C++. He knows a lot about
C, and many other things.
He also - not unreasonably - believes that if C++ was used in the
Linux kernel, lots of others who know nothing about using C++ in OS's
and low-level work would make a complete mess of things. You don't
want someone to randomly add std::vector<> or the like into kernel
code. You don't want people who take delight in smart-arse coding,
such as some regulars in c.l.c++, anywhere near the kernel.
Or maybe he understands that [for a kernel] the proclaimed advantages of
C++ do not matter or matter too little. And the disadvantage, that it is
harder to see quickly what's going on, is real.
It is interesting to mention that experienced 46 y.o. Dave Cutler and
young student Linus Torvalds independently came to the same conclusion
w.r.t. kernel language choice.
That despite Cutler's employer being
very C++-oriented at that moment and despite most of the decisions
taken during the peak years of OO hype.
Unlike Torvalds, Cutler was not in a position to fully disable
development of 3-rd party kernel modules in C++, but he did his best to discourage this practice.
But other OS's are not the Linux kernel - it has particularly unique
challenges. If you have an appropriate team, C++ is vastly better
for writing RTOS kernels than C.
I find your statement unproven.
How many surviving and proliferating RTOS kernels are written in
each language?
On 03/06/2024 11:00, Michael S wrote:
On Sun, 2 Jun 2024 21:44:01 +0200
David Brown <[email protected]> wrote:
On 02/06/2024 15:29, Michael S wrote:
On Sun, 2 Jun 2024 14:03:30 +0200
David Brown <[email protected]> wrote:
There is a lot of networking that is not TCP/IP.
High-speed network interfaces are used for two purposes - to get high
throughput, or to get low latencies. Throughput is not as sensitive
to timing and can tolerate some variation as long as the traffic is
independent, but latency is a different matter.
I think, nearly all work in high-end NIC is concentrated on throughput.
For low latency, the best you can do with high end NIC is to disable
all high-end features and to hope that in disabled state they do not
hurt you too badly.
It would be probably better to use specialized "dumb" NIC. I don't know
if such things exist, but considering that high-frequency trading is
still legal (IMHO, it shouldn't be) I would guess that they do.
I think Scott can answer the high-end NIC questions a lot better than I
could.
Hard real time is about closed loops, most often closed control
loops, but not only those.
Of course, nowadays most of these things are no longer done on
general-purpose CPUs or even MCUs.
I think you have got that backwards.
Most engine control /is/ done with general purpose
microcontrollers, or at least specific variants of them. They
will use ARM Cortex-R or Cortex-M cores rather than Cortex-A cores
(i.e., the "real-time" cores or "microcontroller" cores rather
than the "application" cores you see in telephones, Macs, and ARM
servers), but they are standard cores. Another common choice is
the PowerPC cores used in NXP's engine controllers.
It used to be the case that engine control and other critical hard
real-time work was done with DSPs or FPGAs, but those days are long
past.
Are you sure?
Pretty sure, yes.
It's much simpler and far more reliable to do such task with $5 PLD
(which today means FPGA that boots from internal flash, rather than
old day's PLD) than with MCU, regardless of price of MCU.
No, it is not simpler or more reliable. Programmable logic is rarely
used for engine or motor control. You use microcontrollers with
appropriate peripherals, such as sophisticated PWM units and encoder
interfaces, and advanced timers.
I was not talking about electric motors.
Petrol and diesel engines have far less demanding requirements for the
timing of their control systems. The fastest control loops you need to
control them are a fraction of the speed of those used for high-end
electric motor control, and the corresponding acceptable jitter levels
are much less fussy. And they are invariably controlled by
microcontrollers, and have been for decades. (The microcontrollers you
use typically have some specialised timing peripherals.)
However, I would not be surprised to see programmable logic in the
controllers for jet engines, if that is what you are talking about. The
markets there are too small, and the control details too different
between different models, for there to be microcontrollers with
jet-engine peripherals. But regardless, it is all still hierarchical in
the same way, with a RTOS and real-time software tasks sitting above the
dedicated hardware and below the high-level control software.
It is interesting to mention that experienced 46 y.o. Dave Cutler and
young student Linus Torvalds independently came to the same conclusion
w.r.t. kernel language choice.
You /do/ understand that these decisions were made some 30 years ago?
The languages, developers, compilers, targets, and many other things
have changed in that time.
That despite Cutler's employer being
very C++-oriented at that moment and despite most of the decisions
taken during the peak years of OO hype.
Unlike Torvalds, Cutler was not in a position to fully disable
development of 3-rd party kernel modules in C++, but he did his best to
discourage this practice.
But other OS's are not the Linux kernel - it has particularly unique
challenges. If you have an appropriate team, C++ is vastly better
for writing RTOS kernels than C.
I find your statement unproven.
How many surviving and proliferating RTOS kernels are written in
each language?
Oh, there's little doubt that most publicly available RTOS kernels are
in C, not C++. That does not mean C is in any way /better/ for the
task. There are multiple reasons for C being the language of choice here:
1. Most well-known RTOS kernels have a history stretching back to the
previous century. C++ was not nearly as viable an option at that time,
for a great many reasons.
2. If you write your kernel in C++, you pretty much have to use C++ for
the application code unless you also write a C API for it.
If you write
your kernel in C, you can use almost any language for the application code.
David Brown <[email protected]> writes:
On 03/06/2024 11:00, Michael S wrote:
On Sun, 2 Jun 2024 21:44:01 +0200
David Brown <[email protected]> wrote:
Oh, there's little doubt that most publicly available RTOS kernels are
in C, not C++. That does not mean C is in any way /better/ for the
task. There are multiple reasons for C being the language of choice here:
1. Most well-known RTOS kernels have a history stretching back to the
previous century. C++ was not nearly as viable an option at that time,
for a great many reasons.
I would disagree with this. The Chorus microkernel (Chorus Systemes,
later purchased by Sun) was started in the late 1980's and was
written in C++ (with a small set of assembler functions). This was
using Cfront (2.1 and later 3.0). I'm pretty sure it is still in
use. This was long before templates, exceptions or the standard library.
2. If you write your kernel in C++, you pretty much have to use C++ for
the application code unless you also write a C API for it.
Clearly one can use C interfaces from C++ code. And one can develop
C++ wrapper around C-type functionality.
Our C++ kernels supported standard unix-style APIs between user
mode software and the kernel.
If you write
your kernel in C, you can use almost any language for the application code.
If you write your kernel in _any_ language, you can use _any_ language
for the application code, or the kernel isn't much use to anyone.
On Sun, 2 Jun 2024 11:02:13 +0300, Michael S wrote:
Engine control is FAR more real-time than an OS, to list just one example
out of many.
Speaking of (internal-combustion) engines, I wondered when we can get to
the point where the controller is operating down at the level of
individual spark plug activations and valve openings -- getting rid of
cams and timing belts, in other words.
With that level of control, could you get down to an idling speed of 0
rpm? That is, could the engine get itself going from absolute rest?
On 03/06/2024 18:50, Scott Lurndal wrote:
If you write your kernel in _any_ language, you can use _any_ language
for the application code, or the kernel isn't much use to anyone.
Many - I think most - RTOS's are linked as libraries, rather than
separately linked applications.
David Brown <[email protected]> writes:
1. Most well-known RTOS kernels have a history stretching back to
the previous century. C++ was not nearly as viable an option at
that time, for a great many reasons.
I would disagree with this. The Chorus microkernel (Chorus Systemes,
later purchased by Sun) was started in the late 1980's and was
written in C++ (with a small set of assembler functions). This was
using Cfront (2.1 and later 3.0). I'm pretty sure it is still in
use. This was long before templates, exceptions or the standard
library.
On Sun, 02 Jun 2024 13:24:23 +0000, Kenny McCormack wrote:
In article <v3gou9$36n61$[email protected]>,
Lawrence D'Oliveiro <[email protected]d> wrote:
On Fri, 31 May 2024 17:55:13 -0500, Lynn McGuire wrote:
while (1)
Why not
while (true)
or even
for (;;)
?
I've always considered
for (;;)
preferable over
while (1)
as the for (;;) expression does not require the compiler to expand
and evaluate a condition expression.
For the for (;;), the compiler sees the token stream <LPAREN>
<SEMICOLON> <SEMICOLON> <RPAREN>, and emits a closed loop, but
with while (1), the compiler sees <LPAREN> <CONSTANT> <RPAREN>,
and has to evaluate (either at compile time or at execution
time) the value of the <CONSTANT> to determine whether or
not to emit the closed loop logic.
It's pretty clear that the ICE is becoming a dinosaur.
On 2024-06-02, Lew Pitcher <[email protected]> wrote:
I've always considered
for (;;)
preferable over
while (1)
Of course it is preferable. The idiom constitutes the language's direct support for unconditional looping, not requiring that to be requested by
an extraneous always-true expression.
Using while (1) or while (true) is like i = i + 1 instead
of ++i, or while (*dst++ = *src++); instead of strcpy. [...]
On Mon, 03 Jun 2024 16:50:50 GMT
[email protected] (Scott Lurndal) wrote:
David Brown <[email protected]> writes:
1. Most well-known RTOS kernels have a history stretching back to
the previous century. C++ was not nearly as viable an option at
that time, for a great many reasons.
I would disagree with this. The Chorus microkernel (Chorus Systemes,
later purchased by Sun) was started in the late 1980's and was
written in C++ (with a small set of assembler functions). This was
using Cfront (2.1 and later 3.0). I'm pretty sure it is still in
use. This was long before templates, exceptions or the standard
library.
If Chorus is your idea of well-known then I wonder what you call
obscure.
On 6/3/2024 1:31 PM, Tim Rentsch wrote:
Kaz Kylheku <[email protected]> writes:
On 2024-06-02, Lew Pitcher <[email protected]> wrote:
I've always considered
for (;;)
preferable over
while (1)
Of course it is preferable. The idiom constitutes the language's direct
support for unconditional looping, not requiring that to be requested by
an extraneous always-true expression.
Using while (1) or while (true) is like i = i + 1 instead
of ++i, or while (*dst++ = *src++); instead of strcpy. [...]
Using for (;;) for an infinite loop is an abomination. Anyone
who advocates following that rule is an instrument of Satan.
Better than goto? ;^D
On 2024-06-02, Lew Pitcher <[email protected]> wrote:
I've always considered
for (;;)
preferable over
while (1)
Of course it is preferable. The idiom constitutes the language's direct support for unconditional looping, not requiring that to be requested by
an extraneous always-true expression.
Using while (1) or while (true) is like i = i + 1 instead
of ++i, or while (*dst++ = *src++); instead of strcpy.
When Dennis Ritchie (if it was indeed he) chose for to be the construct
in which the guard expression may be omitted, so that it may express
unconditional looping, he expressed the intent that it be henceforth used
for that purpose.
To continue to use while (1) after the proper utensil is provided is
like to eat with your hands instead of a fork.
So how does importlib manage to import importlib before importlib
itself is imported?
There is NO ahead-of-time linking of modules in Python as it is
understood in traditional compiled languages.
Besides, all such statements are executed at runtime, and can be
conditional.
When all sources are available, linker is merely an implementation
detail.
Even in old days of small RAMs, super-popular TurboPascal suite had
modules, but I don't think that it had linker.
All that suggests to me is that the language *needs* an explicit endless loop!
This version does binary/text to COFF only.
On Mon, 03 Jun 2024 16:50:50 GMT
[email protected] (Scott Lurndal) wrote:
David Brown <[email protected]> writes:
1. Most well-known RTOS kernels have a history stretching back to
the previous century. C++ was not nearly as viable an option at
that time, for a great many reasons.
I would disagree with this. The Chorus microkernel (Chorus Systemes,
later purchased by Sun) was started in the late 1980's and was
written in C++ (with a small set of assembler functions). This was
using Cfront (2.1 and later 3.0). I'm pretty sure it is still in
use. This was long before templates, exceptions or the standard
library.
If Chorus is your idea of well-known then I wonder what you call
obscure.
At the time, in the OS research community, Chorus was, indeed well-known.
Actually, somebody could write a loop like this:
for(int i=0;;++i)
Is that an endless loop or not?
At this point someone will suggest a macro like this:
#define forever for(;;)
All that suggests to me is that the language *needs* an explicit endless loop!
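For what it's worth, here is a minimal sketch of the two spellings mentioned above: the `forever` macro, and a `for` whose header carries a counter but no condition. The function names and the zero-terminated buffer are made up for illustration. With the controlling expression omitted, the loop is endless as far as the header is concerned, and the only exit is the `return` in the body.

```c
#include <stddef.h>

/* The macro suggested in the thread: a for with no controlling expression. */
#define forever for (;;)

/* Hypothetical helper: index of the first zero byte in buf.
   The caller must guarantee that buf contains a zero byte.
   The counter lives in the for header, but since the condition is
   omitted the loop only ends through the return. */
size_t first_zero_counted(const unsigned char *buf)
{
    for (size_t i = 0;; ++i) {
        if (buf[i] == 0)
            return i;
    }
}

/* Same logic spelled with the macro; the counter has to move outside. */
size_t first_zero_forever(const unsigned char *buf)
{
    size_t i = 0;
    forever {
        if (buf[i] == 0)
            return i;
        ++i;
    }
}
```

Both functions behave the same; the difference is only where the counter is declared and how loudly the loop header announces that it never ends on its own.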
We were writing a large unix compatible operating system in C++
before Linus released the first Linux.
On 2024-06-03, Scott Lurndal <[email protected]> wrote:
At the time, in the OS research community, Chorus was, indeed
well-known.
If Chorus at least doesn't vaguely ring a bell, you must have your
head up your ass as even a bachelor-level computer scientist.
In article <v3lb0u$2452$[email protected]>,
Chris M. Thomasson <[email protected]> wrote:
On 6/3/2024 1:31 PM, Tim Rentsch wrote:
Kaz Kylheku <[email protected]> writes:
On 2024-06-02, Lew Pitcher <[email protected]> wrote:
I've always considered
for (;;)
preferable over
while (1)
Of course it is preferable. The idiom constitutes the language's direct
support for unconditional looping, not requiring that to be requested by
an extraneous always-true expression.
Using while (1) or while (true) is like i = i + 1 instead
of ++i, or while (*dst++ = *src++); instead of strcpy. [...]
Using for (;;) for an infinite loop is an abomination. Anyone
who advocates following that rule is an instrument of Satan.
Better than goto? ;^D
I can't believe we're still having this conversation.
Surely, on any reasonably modern compiler, all three forms will generate exactly the same code.
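A small sketch for anyone who wants to check this on their own compiler: three functions that differ only in how the endless loop is spelled. The function names are invented for the example. The C standard says an omitted controlling expression in a `for` statement is replaced by a nonzero constant, so the three are required to behave identically; whether the generated code is also identical is something to confirm by comparing the assembly output of your own toolchain.

```c
#include <stdbool.h>

/* Each function counts up to limit and leaves via break; only the
   spelling of the "loop forever" header differs. */
int count_with_while_1(int limit)
{
    int n = 0;
    while (1) {
        if (n >= limit)
            break;
        ++n;
    }
    return n;
}

int count_with_while_true(int limit)
{
    int n = 0;
    while (true) {
        if (n >= limit)
            break;
        ++n;
    }
    return n;
}

int count_with_for(int limit)
{
    int n = 0;
    for (;;) {
        if (n >= limit)
            break;
        ++n;
    }
    return n;
}
```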
bart <[email protected]> writes:
[...]
All that suggests to me is that the language *needs* an explicit
endless loop!
No, it doesn't.
I suspect some of the people in this thread saying that one form
is obviously better than the others are joking.
On Mon, 3 Jun 2024 23:43:00 +0100, bart wrote:
All that suggests to me is that the language *needs* an explicit endless
loop!
I agree. Also it is common for a loop to have multiple exits, and I don’t like treating one of them as a special “termination condition” above the others, so I like to use “break” for all of them.
The “for” form not only caters for this, it allows handy initialization of
local variables that keep their value between loop iterations. E.g.
    for (unsigned int i = length_of(array);;)
      {
        if (i == 0)
          {
            ... not found ...
            break;
          } /*if*/
        --i;
        if (... array[i] matches what I want ...)
          {
            ... found ...
            break;
          } /*if*/
      } /*for*/
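A compilable version of that pattern might look like the sketch below. The function name `find_last`, the `NOT_FOUND` sentinel, and the equality test are placeholders I picked to fill in the "..." parts; the loop shape itself (counter initialised in the `for` header, no condition, both exits written as `break`) follows the outline above.

```c
#include <stddef.h>

/* Hypothetical sentinel meaning "no match". */
#define NOT_FOUND ((size_t)-1)

/* Searches array[0..length-1] from the end and returns the index of the
   last element equal to wanted, or NOT_FOUND. */
size_t find_last(const int *array, size_t length, int wanted)
{
    size_t result = NOT_FOUND;
    for (size_t i = length;;) {
        if (i == 0) {
            /* ran off the front: not found */
            break;
        }
        --i;
        if (array[i] == wanted) {
            /* found */
            result = i;
            break;
        }
    }
    return result;
}
```

Testing `i == 0` before the decrement is what keeps the unsigned counter from wrapping around, and it also makes the empty-array case fall out without special handling.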
On Mon, 3 Jun 2024 11:16:15 +0300, Michael S wrote:
When all sources are available, linker is merely an implementation
detail.
That’s assuming all the code is written in the same language, compilable with the same compiler.
For typical non-trivial open-source projects, this is usually not true.
And consider, even with C, the meaning of top-level “static” and the implications for compiling the source in separate pieces versus all at
once.
Even in old days of small RAMs, super-popular TurboPascal suite had
modules, but I don't think that it had linker.
The programs it built had sizes in, say, the tens of thousands of lines at most.
On Mon, 3 Jun 2024 11:13:32 +0100, bart wrote:
So how does importlib manage to import importlib before importlib
itself is imported?
I guess the same way a linker manages to link itself.
There is NO ahead-of-time linking of modules in Python as it is
understood in traditional compiled languages.
Python is a compiled language.
On 04/06/2024 07:17, Kaz Kylheku wrote:
On 2024-06-03, Scott Lurndal <[email protected]> wrote:
At the time, in the OS research community, Chorus was, indeed well-known.
If Chorus at least doesn't vaguely ring a bell, you must have your head
up your ass as even a bachelor-level computer scientist.
I think that is putting it a bit strongly - it is a /long/ time since
Chorus was relevant even in academic circles. And while it was
influential, I don't know that it was ever widely used (Scott will know
more about that, I guess).
On 6/3/2024 3:23 PM, Tim Rentsch wrote:
[email protected] (Scott Lurndal) writes:
[ ... (internal-combustion) engines, ... ]
It's pretty clear that the ICE is becoming a dinosaur.
Kind of makes it full circle, doesn't it? ;)
Though, annoyingly, there isn't a great alternative in some use cases:
Batteries: Lower energy density and require charging (slow);
Fuel Cells: More expensive and finicky.
On 6/4/2024 12:17 AM, Kaz Kylheku wrote:
On 2024-06-03, Scott Lurndal <[email protected]> wrote:
At the time, in the OS research community, Chorus was, indeed well-known.
If Chorus at least doesn't vaguely ring a bell, you must have your head
up your ass as even a bachelor-level computer scientist.
FWIW: When I was going to college for a CS major, the emphasis was
mostly on Microsoft technologies, and a lot of the classes were taught
in C#. I mostly stuck with C for my own uses though (and IIRC did write
one class project in C++/CLI).
It is utterly different from the linkers used with typical C code.
On 04/06/2024 03:10, Lawrence D'Oliveiro wrote:
On Mon, 3 Jun 2024 11:16:15 +0300, Michael S wrote:
When all sources are available, linker is merely an implementation
detail.
That’s assuming all the code is written in the same language,
compilable with the same compiler.
Why, how many C compilers do you use for the same project?
On 6/4/2024 2:21 PM, Scott Lurndal wrote:
BGB <[email protected]> writes:
On 6/3/2024 3:23 PM, Tim Rentsch wrote:
[email protected] (Scott Lurndal) writes:
[ ... (internal-combustion) engines, ... ]
It's pretty clear that the ICE is becoming a dinosaur.
Kind of makes it full circle, doesn't it? ;)
Though, annoyingly, there isn't a great alternative in some use cases:
Batteries: Lower energy density and require charging (slow);
Both of which are an order of magnitude better than just a
decade ago - and both energy density and charge time are
a subject of intense research (both in the automotive
and aircraft industries). I fully expect that energy density
per kilogram will be more than doubled in the next decade.
Still pretty tough to catch up with Ethanol or Gasoline, where it is also many orders of magnitude faster to refill a fuel tank than to charge a battery, ...
IIRC, there aren't many battery technologies that can manage a charge rate much over 1C to 3C (so, getting a recharge time much under ~ 20 minutes or so is unlikely).
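The "~ 20 minutes" figure follows from the definition of the C-rate: 1C means the full capacity is delivered in one hour, so a constant rate of N C takes 60/N minutes. A tiny sketch of that arithmetic, under the idealised assumption of a constant rate from empty to full (real chargers taper, so actual times would be longer); the function name is made up:

```c
/* Idealised time for a full charge at a constant C-rate.
   1C = full capacity in one hour, so minutes = 60 / C-rate.
   Assumption: no tapering and no losses; this is only the arithmetic. */
double full_charge_minutes(double c_rate)
{
    return 60.0 / c_rate;
}
```

So 1C gives 60 minutes and 3C gives 20 minutes, which is where the lower bound quoted above comes from.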
Vs, say, refilling something like a car in ~ 25 seconds or so at a fuel pump (but, could potentially be made faster if needed). Though, there are likely to be limits here short of redesigning the mechanical interface.
Say, it could be possible to refill a gas tank in around 3 seconds or so with enough pressure and active sensing, but whether this could be done reliably without undue risk of causing fuel tanks to rupture or similar is unclear (say, rather than pumping the fuel at 10 gal/min, they pump it at 90 gal/min, and effectively pressure-washing the inside of the fuel-tank).
Also would need a fairly strong fuel hose as well (likely steel reinforced to deal with the pressure within the hose).
The main traditional disadvantage of liquid fuel (and ICE's) vs batteries and electric motors, is the comparably low conversion efficiency. Liquid fuel would be stronger here if better conversion efficiencies were achieved (an ICE losing much of its potential energy as noise and heat).
So, ideally, need some sort of semi-efficient fuel to electricity conversion (possibly using a more modest size battery pack as a buffer stage).
Well, also some potential application areas, like human-scale robots, are hindered by not having any good way to power them (both ICE's and batteries sucking in this application area).
Fuel Cells: More expensive and finicky.
And if you're going to use renewable energy to crack water
into H2, why not just use the electricity itself (concentrate
on better storage technology rather than H2 (gas or liquid)
fuel cells).
Yeah, H2 just kinda sucks.
Ethanol is much better as a fuel in most regards.
But, effectively running fuel cells on Ethanol (rather than H2) is a more complex problem. Methanol is a little easier here, but still not great (also methanol poses a risk due to its high toxicity).
But, yeah, not really a good way to convert electricity into Ethanol or similar.
Methanol could be produced using electricity assuming one can scavenge enough CO2 (with water as an additional input, leaving O2 as a waste product).
Could in theory produce methanol simply using air and electricity as inputs (scavenging both H2O and CO2), but the conversion efficiency would likely be dismal (most of the energy use would be spent running an air compressor, though an air-motor could recover some of this on the output side).
Say:
Compress air into a big tank;
Collect water that accumulates in tank;
Bubble compressed air through an amine solution (this collects CO2 into the solution);
Pump amine through another tank where heat is applied to extract CO2 from the solution (it is then cooled and pumped back through the former tank, to collect more CO2);
Collected water is subjected to a momentary pressure drop (to remove dissolved CO2), and then sent in to an electrolysis stage (to get H2 gas), with the H2 and CO2 being pumped into a heated high-pressure reaction chamber (to produce water and methanol, say, 250C and 75bar), with the resulting water and methanol being collected, then fed through a distillation phase (likely dropping the pressure by a controlled amount so that the methanol vaporizes but leaving the water behind); the water is then ...
Likely, things like heat control/recovery would be needed to have any semblance of efficiency (as well, one would need to recover what energy they can when the waste products are returned to atmospheric pressure).
Pumping (followed by electrolysis) are likely to be the main energy uses, potentially much of the heating and cooling needed could be achieved through the compression and expansion stages (so potentially wouldn't need any additional energy input).
Would need to process a fairly large volume of air relative to any methanol produced though (so, I would expect mechanical losses in the compression and expansion stages would be where most of the energy loss would occur, such as due to friction in the pumps and similar).
Though, I do have some questionable experimental features, like a
compiler option to cause arrays and pointers to be bounds-checked, which
is sometimes useful in debugging (but in some cases can add bugs of its
own; also makes the binaries bigger and negatively affects performance,
...).
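To give a rough idea of where the size and speed cost comes from, here is a hand-written sketch of what a checked access amounts to. This is not how the compiler option described above is implemented (that is not stated in the thread); `checked_load` and its signature are invented for illustration. The point is only that every access needs the element count to be available somewhere, plus a compare and a branch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical checked read: returns false instead of reading outside
   base[0..count-1]. A compiler-inserted check would presumably trap or
   abort rather than return a flag; that part is an assumption too. */
bool checked_load(const int *base, size_t count, size_t index, int *out)
{
    if (index >= count)
        return false;      /* out of bounds: refuse the access */
    *out = base[index];
    return true;
}
```

Usage would be along the lines of `if (!checked_load(a, n, i, &v)) { /* report the bad index */ }`.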
Linux was talked about to some extent in one of the classes (but, more
in a high-level introductory sense). ...
But, at the time, the then new OS was Windows Vista ...
At the time, in the OS research community, Chorus was, indeed
well-known.
On Tue, 4 Jun 2024 12:35:43 +0100, bart wrote:
It is utterly different from the linkers used with typical C code.
It does symbol resolution, just like any linker. It handles dependencies (both direct and transitive), just like any linker.
(
I am interested in the C23 subject, but I found it almost impossible to
follow such a big thread. For me, it is even more difficult without the Google Groups interface, and it consumes a lot of time.
My suggestion is to split the C23 topics into smaller threads, one per
specific item, like embed etc.
)
On Mon, 03 Jun 2024 21:22:07 GMT, Scott Lurndal wrote:
At the time, in the OS research community, Chorus was, indeed
well-known.
As soon as you hear “microkernel”, you know it’s essentially a museum-piece now.
Lawrence D'Oliveiro <[email protected]d> writes:
On Mon, 03 Jun 2024 21:22:07 GMT, Scott Lurndal wrote:
At the time, in the OS research community, Chorus was, indeed
well-known.
As soon as you hear “microkernel”, you know it’s essentially a museum-piece now.
You're really a piece of work.
A modern hypervisor can be considered a microkernel.
Any language that has 'module' objects, even within the same source
file, is a linker.
The term usually refers to a program that takes independently compiled binaries containing native code ...
On Fri, 31 May 2024 22:15:54 +0100
bart <[email protected]> wrote:
If I run this:
printf("%p\n", &_binary_hello_c_start);
printf("%p\n", &_binary_hello_c_end);
printf("%p\n", &_binary_hello_c_size);
I get:
00007ff6ef252010
00007ff6ef252056
00007ff5af240046
I can see that the first two can be subtracted to give the sizes of
the data, which is 70 or 0x46. 0x46 is the last byte of the address
of _size, so what's happening there? What's with the crap in bits
16-47?
It looks like ASLR. I don't see it because I test on Win7.
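As far as I understand it, the `_size` symbol that objcopy emits is an absolute symbol whose *value* is the length, so taking its address gives the length, possibly with a load offset mixed in when the image is relocated, which would fit the ASLR explanation. Subtracting start from end avoids depending on that. The sketch below is self-contained, so it uses a 70-byte stand-in array instead of a real objcopy object; the names `fake_blob`, `blob_start`, `blob_end` and `blob_size` are all made up.

```c
#include <stddef.h>

/* Stand-ins for the objcopy-generated symbols. With a real object file
   one would instead declare something like
       extern const char _binary_hello_c_start[];
       extern const char _binary_hello_c_end[];
   and drop the array below. */
static const char fake_blob[70] = {0};
static const char *const blob_start = fake_blob;
static const char *const blob_end = fake_blob + sizeof fake_blob;

/* Size of the embedded data as the distance between the two markers.
   This does not depend on where the image was loaded. */
size_t blob_size(const char *start, const char *end)
{
    return (size_t)(end - start);
}
```

With the stand-in data, `blob_size(blob_start, blob_end)` is 70, i.e. 0x46, matching the difference between the first two addresses printed above.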
On Wed, 5 Jun 2024 09:10:34 +0100, bart wrote:
Any language that has 'module' objects, even within the same source
file, is a linker.
I did point out that linking involved pulling multiple files together, did
I not?
The term usually refers to a program that takes independently compiled
binaries containing native code ...
Binaries of some form, yes. Remember “native” is just as relative a term as “hardware” is.
On 5/31/2024 4:11 PM, Scott Lurndal wrote:
jak <[email protected]> writes:
bart ha scritto:
On 31/05/2024 15:34, Michael S wrote:
On Fri, 31 May 2024 15:04:46 +0100
bart <[email protected]> wrote:
<snip>
You could use the pe-x86-64 format instead of the elf64-x86-64 to reduce
the size of the object.
Instead of one compiler, here I used two compilers, a tool 'objcopy'
(which bizarrely needs to generate ELF format files) and lots of extra
ugly code. I also need to disregard whatever the hell _binary_..._size
does.
But it works.
By a half dozen bytes, perhaps, and only if your binutils have been
built to support pe-x86-64:
$ objcopy -I binary -O pe-x86-64 main.cpp /tmp/test1.o
objcopy:/tmp/test1.o: Invalid bfd target
The ELF64 format has a 64 byte header, the string table and the
symbol table, and the remainder is the binary
data. The PE header may save a few bytes by using 32-bit fields in
the PE COFF header and symbol table.
Note, you might want to trim your posts when replying with a one-sentence reply.
While I can't say much for using objcopy here (it is likely to be
hindered by however the program was compiled and linked, in any case),
in some other contexts PE/COFF can save more significant amounts of
space vs ELF.
In particular:
PE/COFF typically only stores symbols for imports and exports, rather
than for every symbol in the binary (though, IIRC, GCC+LD does tend to
generate PE/COFF output with every symbol present, *1, so this advantage
is mostly N/A if using GCC).
The PE/COFF base relocation format is more compact than the ELF64
relocation formats:
ELF64 tends to spend 24 bytes for every symbol, and 24 bytes for each
reloc; along with an ASCII string for every symbol.
It also tends to redirect most calls and loads/stores for global
variables through the GOT, rather than using PC-relative / RIP-relative
addressing (or fixed displacements relative to a Global Pointer),
causing the generated code to be larger (along with the size of the GOT).
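The 24-byte figures can be checked against the ELF-64 record layouts. The sketch below re-declares the symbol and relocation-with-addend records with fixed-width types rather than including a system header, so the struct names are mine (prefixed `sketch_`), while the field names follow the usual ELF naming. It assumes a C11 or later compiler for `static_assert`, and the helper `sketch_symtab_bytes` is just illustrative arithmetic that ignores the string table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout of an ELF-64 symbol table entry. */
struct sketch_elf64_sym {
    uint32_t      st_name;   /* offset of the name in the string table */
    unsigned char st_info;   /* type and binding */
    unsigned char st_other;  /* visibility */
    uint16_t      st_shndx;  /* section index */
    uint64_t      st_value;  /* address or value */
    uint64_t      st_size;   /* size of the object */
};

/* Layout of an ELF-64 relocation entry with explicit addend. */
struct sketch_elf64_rela {
    uint64_t r_offset;       /* where to apply the relocation */
    uint64_t r_info;         /* symbol index and relocation type */
    int64_t  r_addend;       /* constant addend */
};

static_assert(sizeof(struct sketch_elf64_sym) == 24, "symbol entry is 24 bytes");
static_assert(sizeof(struct sketch_elf64_rela) == 24, "rela entry is 24 bytes");

/* Bytes taken by nsyms symbol entries, not counting their name strings. */
size_t sketch_symtab_bytes(size_t nsyms)
{
    return nsyms * sizeof(struct sketch_elf64_sym);
}
```

So a thousand symbols cost 24,000 bytes of table before a single name string is stored, which is the kind of overhead being compared against PE/COFF here.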
*2: Seemingly the main way I am aware of to get small binaries is to use
an older version of MSVC (such as 6.0 to 9.0), as the binary-bloat
started to get much more obvious around Visual Studio 2010, but is less
of an issue with VS2005 or VS2008.
Sorry, I've lost track of what it is you are trying to prove. That
everything is a linker?
For my bounds-checking in C, there are no syntactic changes to C.
A modern hypervisor can be considered a microkernel.
Generally, using ELF32 on 64-bit targets isn't a thing...
On Thu, 6 Jun 2024 19:38:08 +0100, bart wrote:
Sorry, I've lost track of what it is you are trying to prove. That
everything is a linker?
You were the one trying to prove that linkerless programming was a good
idea, or something. And then you tried to distract attention from the weakness of your arguments by bringing Python into it.
Not working out so well now, is it?
It's you who can't get your head around the idea that someone could do
away with a 'linker'.
On 2024-06-07, bart <[email protected]> wrote:
It's you who can't get your head around the idea that someone could do
away with a 'linker'.
You can do away with linkers and linking.
But it's pretty helpful when
1. the same library is reused for many programs.
2. you're selling a library, and would like to ship a binary image of
that library.
Without linkage, you don't have a library ecosystem.
I think code generation went in the bulky direction when they started
adding auto-vectorization, and not really any option to be like "Yes, I
want SIMD instructions enabled, but, no, don't autovectorize."
Sometimes vectorization makes things faster, sometimes not, but one
thing it does do, is make the generated binaries bigger.
On 08/06/2024 01:39, Kaz Kylheku wrote:
On 2024-06-07, bart <[email protected]> wrote:
It's you who can't get your head around the idea that someone could do
away with a 'linker'.
You can do away with linkers and linking.
But it's pretty helpful when
1. the same library is reused for many programs.
You use a shared library.
2. you're selling a library, and would like to ship a binary image of
that library.
You ship a shared library.
Can also note that the compilers handle debugging info differently:
GCC tends to put debug data in the binary itself (as DWARF or STABS);
But, in general, I suspect MS doesn't care if the EXE and DLL files are
bulky and if their compiler doesn't win the performance game.
On 2024-06-08, bart <[email protected]> wrote:
On 08/06/2024 01:39, Kaz Kylheku wrote:
On 2024-06-07, bart <[email protected]> wrote:
It's you who can't get your head around the idea that someone could do
away with a 'linker'.
You can do away with linkers and linking.
But it's pretty helpful when
1. the same library is reused for many programs.
You use a shared library.
That's linking.
Static linking is the same thing as dynamic except it's being
precomputed: the libs are dynamically processed, but then rather
than the program being run, its image is dumped into an executable.
That executable then no longer needs to repeat that library processing
when started; everything is integrated. (There are ways to optimize
linking so not all the material must be present in memory all at once
as I describe it above.)
2. you're selling a library, and would like to ship a binary image of
that library.
You ship a shared library.
No, not always. There is such a thing as selling static libraries.
Numerical code, crypto, codecs.
A few times in my career I worked with purchased static libs.
There are some advantages to it, like that static calls can be
faster than dynamic, and unused material can be
removed at link time.
Another aspect is that it's possible for static libs to be platform-independent, to an extent, because some of the
object formats like COFF are widely recognized. Whereas
shared libs tend to be very OS specific. The vendor has to make
them separately for Windows, Linux, Solaris, BSD, Mac, ...
This gruntwork is a pain in the ass that is removed from
the core value of your code.
The integrator who buys your static lib can turn it into a
shared lib for their target system, if they are so inclined.
On Fri, 7 Jun 2024 16:58:08 -0500, BGB-Alt wrote:
I think code generation went in the bulky direction when they started
adding auto-vectorization, and not really any option to be like "Yes, I
want SIMD instructions enabled, but, no, don't autovectorize."
Sometimes vectorization makes things faster, sometimes not, but one
thing it does do, is make the generated binaries bigger.
And MSVC is the compiler that Microsoft use to build Windows itself, isn’t it?
On 07/06/2024 01:53, Lawrence D'Oliveiro wrote:
On Thu, 6 Jun 2024 15:38:21 -0500, BGB-Alt wrote:
*2: Seemingly the main way I am aware of to get small binaries is to
use an older version of MSVC (such as 6.0 to 9.0), as the binary-bloat
started to get much more obvious around Visual Studio 2010, but is
less of an issue with VS2005 or VS2008.
Newer version of proprietary compiler generates worse code than older
version?!?
If the code is calling extern gunctions that do IO, we woul expect these
to be massively more sophisticated on a modern ststem Witha little
comouter, pribtf just wtites acharacter raster and utimalthe he Os picks
the up and flushes it out to a pixel raster. And that' aal it's doing.
Whilst on a modrern syste, stdout can do whole lot of intricate things.
Lawrence D'Oliveiro <[email protected]d> writes:
On Fri, 7 Jun 2024 16:58:08 -0500, BGB-Alt wrote:
I think code generation went in the bulky direction when they started
adding auto-vectorization, and not really any option to be like "Yes,
I want SIMD instructions enabled, but, no, don't autovectorize."
Sometimes vectorization makes things faster, sometimes not, but one
thing it does do, is make the generated binaries bigger.
And MSVC is the compiler that Microsoft use to build Windows itself, isn’t it?
Last time I built NT, it used the command line compiler 'cl.exe', IIRC.
Granted that was 1998.
On 6/8/2024 1:28 PM, Malcolm McLean wrote:
On 07/06/2024 01:53, Lawrence D'Oliveiro wrote:
On Thu, 6 Jun 2024 15:38:21 -0500, BGB-Alt wrote:
*2: Seemingly the main way I am aware of to get small binaries is
to use an older version of MSVC (such as 6.0 to 9.0), as the
binary-bloat started to get much more obvious around Visual
Studio 2010, but is less of an issue with VS2005 or VS2008.
Newer version of proprietary compiler generates worse code than
older version?!?
If the code is calling extern gunctions that do IO, we woul expect
these to be massively more sophisticated on a modern ststem Witha
little comouter, pribtf just wtites acharacter raster and utimalthe
he Os picks the up and flushes it out to a pixel raster. And that'
aal it's doing. Whilst on a modrern syste, stdout can do whole lot
of intricate things.
That is a whole lot of typos...
But, even if it is built calling MSVCRT as a DLL (rather than static
linked), modern MSVC is still the worst of the bunch in this area.
A build as RISC-V + PIE with a static-linked C library still manages
to be smaller than an x64 build via MSVC with entirely dynamic-linked libraries.
And, around 72% bigger than the same program built as a
dynamic-linked binary with "GCC -O3" (while also often still being
around 40% slower).
Contrast, VS2008 can build programs with binary sizes closer to those
of GCC.
On Sat, 8 Jun 2024 14:52:26 -0500
BGB <[email protected]> wrote:
On 6/8/2024 1:28 PM, Malcolm McLean wrote:
On 07/06/2024 01:53, Lawrence D'Oliveiro wrote:
On Thu, 6 Jun 2024 15:38:21 -0500, BGB-Alt wrote:
*2: Seemingly the main way I am aware of to get small binaries is
to use an older version of MSVC (such as 6.0 to 9.0), as the
binary-bloat started to get much more obvious around Visual
Studio 2010, but is less of an issue with VS2005 or VS2008.
Newer version of proprietary compiler generates worse code than
older version?!?
If the code is calling extern gunctions that do IO, we woul expect
these to be massively more sophisticated on a modern ststem Witha
little comouter, pribtf just wtites acharacter raster and utimalthe
he Os picks the up and flushes it out to a pixel raster. And that'
aal it's doing. Whilst on a modrern syste, stdout can do whole lot
of intricate things.
That is a whole lot of typos...
But, even if it is built calling MSVCRT as a DLL (rather than static
linked), modern MSVC is still the worst of the bunch in this area.
A build as RISC-V + PIE with a static-linked C library still manages
to be smaller than an x64 build via MSVC with entirely dynamic-linked
libraries.
And, around 72% bigger than the same program built as a
dynamic-linked binary with "GCC -O3" (while also often still being
around 40% slower).
GCC on Windows or on Linux?
In my experience, gcc on Windows (ucrt64 variant, other gcc variants
are worse) very consistently produces bigger (stripped) exe than even
latest MSVCs which, as you correctly stated, are not as good as older versions at producing small code.
The size of 'Hello, world' program (x86-64, dynamically linked C RTL)
vs2013 - 6,144 bytes
vs2019 - 9,216 bytes
gcc (Debian Linux, -no-pie) - 14,400 bytes
gcc (Debian Linux) - 14,472 bytes
gcc (ucrt64 DLL) - 18,432 bytes
gcc (old DLL) - 42,496 bytes
On 09/06/2024 10:40, Michael S wrote:
On Sat, 8 Jun 2024 14:52:26 -0500
BGB <[email protected]> wrote:
On 6/8/2024 1:28 PM, Malcolm McLean wrote:
On 07/06/2024 01:53, Lawrence D'Oliveiro wrote:
On Thu, 6 Jun 2024 15:38:21 -0500, BGB-Alt wrote:
*2: Seemingly the main way I am aware of to get small binaries
is to use an older version of MSVC (such as 6.0 to 9.0), as the
binary-bloat started to get much more obvious around Visual
Studio 2010, but is less of an issue with VS2005 or VS2008.
Newer version of proprietary compiler generates worse code than
older version?!?
If the code is calling extern gunctions that do IO, we woul expect
these to be massively more sophisticated on a modern ststem Witha
little comouter, pribtf just wtites acharacter raster and
utimalthe he Os picks the up and flushes it out to a pixel
raster. And that' aal it's doing. Whilst on a modrern syste,
stdout can do whole lot of intricate things.
That is a whole lot of typos...
But, even if it is built calling MSVCRT as a DLL (rather than
static linked), modern MSVC is still the worst of the bunch in
this area.
A build as RISC-V + PIE with a static-linked C library still
manages to be smaller than an x64 build via MSVC with entirely
dynamic-linked libraries.
And, around 72% bigger than the same program built as a
dynamic-linked binary with "GCC -O3" (while also often still being
around 40% slower).
GCC on Windows or on Linux?
In my experience, gcc on Windows (ucrt64 variant, other gcc variants
are worse) very consistently produces bigger (stripped) exe than
even latest MSVCs which, as you correctly stated, are not as good
as older versions at producing small code.
The size of 'Hello, world' program (x86-64, dynamically linked C RTL)
vs2013 - 6,144 bytes
vs2019 - 9,216 bytes
gcc (Debian Linux, -no-pie) - 14,400 bytes
gcc (Debian Linux) - 14,472 bytes
gcc (ucrt64 DLL) - 18,432 bytes
gcc (old DLL) - 42,496 bytes
I get a lot worse than that:
C:\c>gcc hello.c
C:\c>dir a.exe
09/06/2024 11:04 367,349 a.exe
C:\c>gcc hello.c -s -Os
C:\c>dir a.exe
09/06/2024 11:04 88,064 a.exe
(It didn't like -Oz; did you mean something other than -Os?)
Both import msvcrt.dll. gcc is version 10.3.0.
tcc gives 2KB, and mcc gives 2.5KB.
(With the latter, I know it is because it comprises 5 blocks
of data each of which is at least 512 bytes: 2 for header stuff, plus
always 3 segments. The minimum hello.exe size I think is 700 bytes if
a few corners are cut.)
367KB sounds astonishing, but the first time I tried Dart, it gave me
a 5MB executable for 'hello.dart'.
On Sun, 9 Jun 2024 11:20:11 +0100
bart <[email protected]> wrote:
367KB sounds astonishing, but the first time I tried Dart, it gave
me a 5MB executable for 'hello.dart'.
golang tends to start at >1.5MB, but then it grows very slowly. It
appears to generate *very* self-contained executables. At least I
personally never encountered a case where a simple copy of the exe to a
new computer was insufficient.
Considering that go needs much more of run-time support than dart, I
can't find any reason for 5MB except "they don't care".
On 09/06/2024 12:12, Michael S wrote:
On Sun, 9 Jun 2024 11:20:11 +0100
bart <[email protected]> wrote:
GCC on Windows or on Linux?
In my experience, gcc on Windows (ucrt64 variant, other gcc
variants are worse) very consistently produces bigger (stripped)
exe than even latest MSVCs which, as you correctly stated, are
not as good as older versions at producing small code.
The size of 'Hello, world' program (x86-64, dynamically linked C RTL)
vs2013 - 6,144 bytes
vs2019 - 9,216 bytes
gcc (Debian Linux, -no-pie) - 14,400 bytes
gcc (Debian Linux) - 14,472 bytes
gcc (ucrt64 DLL) - 18,432 bytes
gcc (old DLL) - 42,496 bytes
I get a lot worse than that:
C:\c>gcc hello.c
C:\c>dir a.exe
09/06/2024 11:04 367,349 a.exe
C:\c>gcc hello.c -s -Os
C:\c>dir a.exe
09/06/2024 11:04 88,064 a.exe
(It didn't like -Oz; did you mean something other than -Os?)
No, I meant -Oz.
It was invented by clang, but newer gcc understand it.
I don't know what is a difference exactly, but -Oz tends to be a
little smaller.
In program as trivial as this, there should be no difference.
Both import msvcrt.dll. gcc is version 10.3.0.
My gcc variants are from msys2.
Where did you get yours?
It's gcc/TDM.
Anything else, I can spend 10 minutes following links
to a mingw download, only to end up back where I started from.
gcc/TDM is a much simpler installation.
tcc gives 2KB, and mcc gives 2.5KB.
x86-64 or i386?
All were for x64.
gcc's stdio.h header defines `printf` (which my hello.c uses) as an
inlined wrapper based around `__mingw_vasprintf()`. So there might
be further inlined stuff or that is statically linked, before it
finally ends up calling the real `printf`.
With gcc, I get 39.9KB for -m32 -Os -s.
If I use 'puts' instead, and -m32, then it gets down to 14KB.
On Sun, 9 Jun 2024 11:20:11 +0100
bart <[email protected]> wrote:
GCC on Windows or on Linux?
In my experience, gcc on Windows (ucrt64 variant, other gcc variants
are worse) very consistently produces bigger (stripped) exe than
even latest MSVCs which, as you correctly stated, are not as good
as older versions at producing small code.
The size of 'Hello, world' program (x86-64, dynamically linked C RTL)
vs2013 - 6,144 bytes
vs2019 - 9,216 bytes
gcc (Debian Linux, -no-pie) - 14,400 bytes
gcc (Debian Linux) - 14,472 bytes
gcc (ucrt64 DLL) - 18,432 bytes
gcc (old DLL) - 42,496 bytes
I get a lot worse than that:
C:\c>gcc hello.c
C:\c>dir a.exe
09/06/2024 11:04 367,349 a.exe
C:\c>gcc hello.c -s -Os
C:\c>dir a.exe
09/06/2024 11:04 88,064 a.exe
(It didn't like -Oz; did you mean something other than -Os?)
No, I meant -Oz.
It was invented by clang, but newer gcc understand it.
I don't know what is a difference exactly, but -Oz tends to be a little smaller.
In program as trivial as this, there should be no difference.
Both import msvcrt.dll. gcc is version 10.3.0.
My gcc variants are from msys2.
Where did you get yours?
tcc gives 2KB, and mcc gives 2.5KB.
x86-64 or i386?
On Sun, 9 Jun 2024 17:32:40 +0100
bart <[email protected]> wrote:
On 09/06/2024 12:12, Michael S wrote:
On Sun, 9 Jun 2024 11:20:11 +0100
bart <[email protected]> wrote:
GCC on Windows or on Linux?
In my experience, gcc on Windows (ucrt64 variant, other gcc
variants are worse) very consistently produces bigger (stripped)
exe than even latest MSVCs which, as you correctly stated, are
not as good as older versions at producing small code.
The size of 'Hello, world' program (x86-64, dynamically linked C RTL)
vs2013 - 6,144 bytes
vs2019 - 9,216 bytes
gcc (Debian Linux, -no-pie) - 14,400 bytes
gcc (Debian Linux) - 14,472 bytes
gcc (ucrt64 DLL) - 18,432 bytes
gcc (old DLL) - 42,496 bytes
I get a lot worse than that:
C:\c>gcc hello.c
C:\c>dir a.exe
09/06/2024 11:04 367,349 a.exe
C:\c>gcc hello.c -s -Os
C:\c>dir a.exe
09/06/2024 11:04 88,064 a.exe
(It didn't like -Oz; did you mean something other than -Os?)
No, I meant -Oz.
It was invented by clang, but newer gcc understand it.
I don't know what is a difference exactly, but -Oz tends to be a
little smaller.
In program as trivial as this, there should be no difference.
Both import msvcrt.dll. gcc is version 10.3.0.
My gcc variants are from msys2.
Where did you get yours?
It's gcc/TDM.
I never heard about TDM except from you.
Anything else, I can spend 10 minutes following links
to a mingw download, only to end up back where I started from.
gcc/TDM is a much simpler installation.
Somehow, I installed msys2 many times, using 2 or 3 different methods
and it worked every single time. It's a huge download, but it works.
There were cases where I had problems installing additional packages on
top of msys2, but they were always caused by idiotic policies of
corporate IT. On my personal systems it was always flawless.
This page appears to give correct, up-to-date instructions:
https://www.msys2.org/#installation
I can only tell you what works well for me. I can't force you to use it.
Also, I can't prevent you from trying to use something that no longer
works well due to absence of support, i.e. old msys/mingw.
On 09/06/2024 21:40, Michael S wrote:
I can only tell you what works well for me. I can't force you to
use it. Also, I can't prevent you from trying to use something that
no longer works well due to absence of support, i.e. old msys/mingw.
I was trying to install the LATEST version of gcc on Windows! That
would be 13.x, which I've done before, perhaps hitting on the right link
by chance.
'gcc' /can/ be run from a pure Windows command line, as I've been
using versions of it for years.
But they don't make it easy, as gcc is perceived to be tied to WSL,
MSYS2, MINGW or CYGWIN.
I've had another go at this elusive compiler, this time apparently successful. Here are the steps I used:
* Start from mingw-w64.com. Ignore where it says it's a 'complete
runtime environment for gcc'. There is also an actual compiler at
the end of the process!
* Click on Downloads on the left
* There is a list of prebuilt toolchains. The promising ones are
w64devkit, MingW-W64-builds, and possibly WinLibs.com?
I clicked on MinGW-W64-builds.
* That takes you down the page to MingW-Builds, but this is where I
had a bit of luck: as this is a one-line entry, I missed it and
started reading about WinLibs.com instead. But where are the
downloads? The link is in the small print on the last line of that
section.
* It takes you to winlibs.com. This looks disconcertingly like a 1990s
website. It surely can't be the right place? Just don't click on
MinGW-w64 as that just takes you back to square one.
* Scroll down to Downloads. There are 16 to choose from for each
version. I clicked (by mistake - I think) on the version /with/
LLVM etc, but I don't know what the difference is. I chose the MSVCRT
version.
The end result was a 1.4GB installation of gcc 14.1.0. Using 'gcc
hello.c -Os -s' gives a size of 48KB (with 10.3 it was 88KB). It still
imports msvcrt.dll, but not printf (it does import vfprintf).
On Sun, 9 Jun 2024 22:49:39 +0100...
bart <[email protected]> wrote:
On 09/06/2024 21:40, Michael S wrote:
I can only tell you what works well for me. I can't force you to
use it. Also, I can't prevent you from trying to use something that
no longer works well due to absence of support, i.e. old msys/mingw.
I was trying to install the LATEST version of gcc on Windows! That
would be 13.x, which I've done before, perhaps hitting on the right link
by chance.
'gcc' /can/ be run from a pure Windows command line, as I've been
using versions of it for years.
But they don't make it easy, as gcc is perceived to be tied to WSL,
MSYS2, MINGW or CYGWIN.
I've had another go at this elusive compiler, this time apparently
successful. Here are the steps I used:
The end result was a 1.4GB installation of gcc 14.1.0. Using 'gcc
hello.c -Os -s' gives a size of 48KB (with 10.3 it was 88KB). It still
imports msvcrt.dll, but not printf (it does import vfprintf).
It sounds like you ended up with a gcc distro based on a 12 y.o.
Microsoft DLL that does not support the majority of C11 library features
and likely does not support a few C99 library features as well.
If you were a little less stubborn, in 10 minutes you could have had a
distro based on the new ucrt DLL that is closer to the new C standard and
generates smaller binaries.
And likely occupies less than 1.4 GB.
BTW, I don't understand why MSVC produces smaller binaries with the old
MS C RTL DLL while gcc produces smaller binaries with the new MS C RTL
DLL. But that's an undeniable fact.
On 6/6/2024 7:57 PM, Lawrence D'Oliveiro wrote:
On Wed, 5 Jun 2024 04:01:28 -0500, BGB wrote:
For my bounds-checking in C, there are no syntactic changes to C.
But how efficient is it? Those research papers I mentioned reported
being able to get the execution overhead in Pascal down to something
like 5-10%.
Also somewhere around a 10% slowdown in this case, but this was with dedicated ISA level support and various specialized helper instructions
(to check/set/adjust the pointer bounds bits).
David Brown <[email protected]> writes:
On 28/05/2024 22:21, Keith Thompson wrote:
David Brown <[email protected]> writes:
On 28/05/2024 02:33, Keith Thompson wrote:
[...]
Without some kind of programmer control, I'm concerned that the rules
for defining an array so #embed will be correctly optimized will be
spread as lore rather than being specified anywhere.
They might, but I really do not think that is so important, since they
will not affect the generated results.
Right, it won't affect the generated results (assuming I use it
correctly). Unless I use `#embed optimize(true)` to initialize
a struct with varying member sizes, but that's my fault because I
asked for it.
I am still not understanding your point. (I am confident that you
have a point, even if I don't get it.)
I cannot see why there would be any need or use of manually adding
optimisation hints or controls in the source code. I cannot see why
there is any possibility of getting incorrect results in any way.
The point is compile-time performance, and perhaps even the ability
to compile at all.
I'm thinking about hypothetical cases where I want to embed a
*very* large file and parsing the comma-delimited sequence could
have unacceptable compile-time performance, perhaps even causing
a compile-time stack overflow depending on how the parser works.
Every time the compiler sees #embed, it has to decide whether to
optimize it or not, and the decision criteria are not specified
anywhere (not at all in the standard, perhaps not clearly in the
compiler's documentation).
Yes, I agree with that. And this is how it should be - this is not
something that should be specified. The C standards give minimum
requirements for things like the number of identifiers or the length
of lines. But pretty much all compilers, for most of the "translation
limits", say they are "limited by the memory of the host computer".
The same will apply to #embed. And some compilers will cope better
than others with huge #embed's, some will be faster, some more memory
efficient. Some will change from version to version. This is not
something that can sensibly be specified or formalized - like pretty
much everything in regard to compilation time, each compiler does the
best it can without any specifications. I'd expect compiler reference
manuals might have hints, such as saying #embed is fastest with
unsigned char arrays (or whatever), but no more than that.
But again - I see no reason for manual optimisation hints, and no
reason for any possible errors.
Let me outline a possible strategy for a compiler like gcc. (I have
not looked at the prototype implementations from thephd, nor any gcc
developer discussions.)
gcc splits the C pre-processor and the compiler itself, and
(currently) communicates dataflow in only one direction, via a
temporary file or a pipe. But the "gcc" (or "g++", according to
preference) driver program calls and coordinates the two programs.
If the pre-processor is called stand-alone, then it will generate a
comma-separated list of integers, helpfully split over multiple lines
of reasonable size. This will clearly always be correct, and always
work, within limits of a compiler's translation limits.
But when the gcc driver calls it, it will have a flag indicating that
the target compiler is gcc and supports an extended pre-processed
syntax (and also that the source is C23 - after all, the C
pre-processor can be used as a macro processor for other files with no
relation to C). Now the pre-processor has a lot more freedom.
Whenever it meets an #embed directive, it can generate a line :
#embed_data 123456
followed in the file by 123456 (or whatever) bytes of binary data.
The C compiler, when parsing this file, will pull that in as a single
blob. Then it is up to the C compiler - which knows how the #embed
data will be used - to tell if these bytes should be used as
parameters to a macro, initialisation for a char array, or whatever.
And it can use them as efficiently as practically possible. (It is
probably only worth using this for #embed data over a certain size -
smaller #embed's could just generate the integer sequences.)
Nowhere in this is there any call for manual optimisation hints, nor
any risk of incorrect results.
I've kept this on the back burner for a couple of weeks. I'm finally
getting around to posting a followup.
I'm not particularly concerned about compilers processing #embed
incorrectly. It's conceivable that a compiler could incorrectly decide
that it can optimize a particular #embed directive, but I expect
compilers to be conservative, falling back to the specified behavior if
they can't *prove* that an optimization is safe.
I see two conceptual problems with #embed as it's currently defined in
N3220.
First, there's a possible compile-time performance issue for very large embedded files. The (draft) standard calls for #embed to expand to a comma-separated list of integer constant expressions. (I'm not sure why
it didn't specify integer constants.)
My objection is based on the possibility that #embed for a *very* large
file might result in unacceptable time and memory usage during compile
time. I haven't looked into how existing compilers handle large initializers, but I can imagine that parsing such a list might consume
more than O(N) time and/or memory, or at least O(N) with a large
constant. (If parsing long lists of integer constants is expensive for
some compiler, this could be a motivation to optimize that particular
case.)
The intent of #embed is to copy the contents of a file at compile time
into an array of unsigned char -- but it's specified in a roundabout way
that requires bizarre usages to work "correctly". I expect at least
some compilers to optimize #embed for better compile-time performance,
but that requires them to determine when optimization is permitted with
no advice from the standard about how to do that. That's going to be
moderately difficult for compiler implementers; I'm not too concerned
about that. But it also imposes a burden on programmers, who will have
to use trial and error to determine how to ensure a #embed is optimized.
This all assumes that a naive #embed implementation is going to
cause real problems for very large embedded files (compile-time
stack overflows, unreasonably long compile times, or just using so
much memory that system performance is affected). If it turns out
that this isn't the case, then that objection is mostly addressed.
My other objection is that it's conceptually messy. The expected use
case is in an initializer for an array of unsigned char, but there are
no restrictions on where it can be used. As a programmer, I want to
copy a file verbatim into an unsigned char array, but at least
conceptually #embed translates the file contents into a long sequence of
expressions which are then processed as C code to recreate the raw data.
There are bizarre cases (like my previous example initializing a struct
with members of various types) that are required to work. #embed is a
preprocessor directive, but determining whether it can be optimized
requires feedback from later compiler phases. It's doable, but it's
*ugly*.
Now that it's too late to change the definition, I've thought of
something that I think would have been a better way to specify #embed.
Define a new kind of string literal, with a "uc" prefix. `uc"foo"` is
of type `unsigned char[3]`. (Or `const unsigned char[3]`, if that's not
too radical.) Unlike other string literals, there is no implicit
terminating '\0'. Arbitrary byte values can of course be specified in hexadecimal: uc"\x01\x02\x03\x04". Since there's no terminating null character and C doesn't support zero-sized objects, uc"" is a syntax
error.
uc"..." string literals might be made even simpler, for example allowing
only hex digits and not requiring \x (uc"01020304" rather than uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
could be useful in other contexts, and programmers will want
flexibility. Maybe something like hex"01020304" (embedded spaces could
be ignored) could be defined in addition to uc"\x01\x02\x03\x04".
Specify that #embed expands to a sequence of one or more uc string
literals (or hex string literals if that's added), separated by
whitespace. If the embedded file might be empty, use the existing
is_empty() embed parameter. Without is_empty, #embed of an empty file
will expand to uc"", a syntax error.
Since a string literal is a single token, parsing it is likely to be
more efficient than parsing a sequence of integer constant expressions,
even with concatenation of multiple literals. Since a uc"..." string
literal is specifically of type unsigned char[], it can *only* be used
to initialize an unsigned char[] or unsigned char* object, addressing
the conceptual mess. If you want to use #embed to initialize an
array of some other type, you can use a union or some other form of type-punning.
A conforming C23 implementation could even implement this by providing uc"..." (and perhaps hex"...") literals as an extension and adding an implementation-defined embed parameter that generates them.
On 14/06/2024 22:30, Keith Thompson wrote:
Now that it's too late to change the definition, I've thought of
something that I think would have been a better way to specify #embed.
Define a new kind of string literal, with a "uc" prefix. `uc"foo"` is
of type `unsigned char[3]`. (Or `const unsigned char[3]`, if that's not
too radical.) Unlike other string literals, there is no implicit
terminating '\0'. Arbitrary byte values can of course be specified in
hexadecimal: uc"\x01\x02\x03\x04". Since there's no terminating null
character and C doesn't support zero-sized objects, uc"" is a syntax
error.
uc"..." string literals might be made even simpler, for example allowing
only hex digits and not requiring \x (uc"01020304" rather than
uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
could be useful in other contexts, and programmers will want
flexibility. Maybe something like hex"01020304" (embedded spaces could
be ignored) could be defined in addition to uc"\x01\x02\x03\x04".
That's something I added to string literals in my language within the
last few months. Nothing to do with embedding (but it can make hex
sequences within strings more efficient, if that approach was used).
Writing byte-at-a-time hex data was always a bit fiddly:
0x12, 0x34, 0xAB, ...
"\x12\x34\xAB...
It was made worse by my preference for `x` being in lower case, and the
hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look wrong.
What I did was create a new, variable-length string escape sequence
that looks like this:
"ABC\h1234AB...\nopq" // hex sequence between ABC & nopq
Hex digits after \h or \H are read in pairs. White space is allowed
between pairs:
"ABC\H 12 34 AB ...\nopq"
The only thing I wasn't sure about was the closing backslash, which
looks at first like another escape code. But I think it is sound,
although it can still be tweaked.
On 15/06/2024 00:39, bart wrote:
On 14/06/2024 22:30, Keith Thompson wrote:
Now that it's too late to change the definition, I've thought of
something that I think would have been a better way to specify #embed.
Define a new kind of string literal, with a "uc" prefix. `uc"foo"` is
of type `unsigned char[3]`. (Or `const unsigned char[3]`, if that's not
too radical.) Unlike other string literals, there is no implicit
terminating '\0'. Arbitrary byte values can of course be specified in
hexadecimal: uc"\x01\x02\x03\x04". Since there's no terminating null
character and C doesn't support zero-sized objects, uc"" is a syntax
error.
uc"..." string literals might be made even simpler, for example allowing
only hex digits and not requiring \x (uc"01020304" rather than
uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
could be useful in other contexts, and programmers will want
flexibility. Maybe something like hex"01020304" (embedded spaces could
be ignored) could be defined in addition to uc"\x01\x02\x03\x04".
That's something I added to string literals in my language within the
last few months. Nothing to do with embedding (but it can make hex
sequences within strings more efficient, if that approach was used).
Writing byte-at-a-time hex data was always a bit fiddly:
0x12, 0x34, 0xAB, ...
"\x12\x34\xAB...
It was made worse by my preference for `x` being in lower case, and
the hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look wrong.
What I did was create a new, variable-length string escape sequence
that looks like this:
"ABC\h1234AB...\nopq" // hex sequence between ABC & nopq
Hex digits after \h or \H are read in pairs. White space is allowed
between pairs:
"ABC\H 12 34 AB ...\nopq"
The only thing I wasn't sure about was the closing backslash, which
looks at first like another escape code. But I think it is sound,
although it can still be tweaked.
How often would something like that be useful? I would have thought
that it is rare to see something that is basically text but has enough
odd non-printing characters (other than the common \n, \t, \e) to make
it worth the fuss. If you want to have binary data in something that
looks like a string literal, then just use straight-up two hex digits
per character - "4142431234ab". It's simpler to generate and parse. I don't see the benefit of something that mixes binary and text data.
But it also seems reasonable to expect that if a file is big enough to
cause trouble for #embed, then any other method of including it in a C
file will be at least as bad and probably /much/ worse.
On 6/14/2024 1:53 AM, Bonita Montero wrote:
On 13.06.2024 at 21:07, BGB wrote:
One possible justification (albeit a weak one) is that if one
recompiles the program with optimizations turned on, in many cases
this may subtly change the behavior of the program (particularly in
relation to things like the contents of uninitialized variables and
dangling pointers, etc...). ...
If you rely on that you're misusing the language anyway.
It is a poor practice, but seemingly does occur in the wild (intentional
or not).
On Sat, 15 Jun 2024 20:27:41 +0100, bart wrote:
The "+" is used for compile-time string/data-string concatenation.)
Why didn’t you follow the C convention of implicit concatenation, just by placing literals next to each other?
On 15/06/2024 23:39, Lawrence D'Oliveiro wrote:
On Sat, 15 Jun 2024 20:27:41 +0100, bart wrote:
The "+" is used for compile-time string/data-string concatenation.)
Why didn’t you follow the C convention of implicit concatenation, just
by placing literals next to each other?
Why is that better?
I did actually have that, but it wasn't as useful. It could only work at
the lexical level with actual string literals, for a start.
I fairly promptly fixed this bug once discovered, and am then just left
to wonder how exactly it managed to work in the first place (or didn't
break already).
On Sat, 15 Jun 2024 17:58:22 +0200, David Brown wrote:
But it also seems reasonable to expect that if a file is big enough to
cause trouble for #embed, then any other method of including it in a C
file will be at least as bad and probably /much/ worse.
But if you redefine the problem as “any method of including it in a C
*program*”, then you realize that there are better techniques that do
not involve C extensions.
On 15/06/2024 18:17, David Brown wrote:
On 15/06/2024 00:39, bart wrote:
On 14/06/2024 22:30, Keith Thompson wrote:
Now that it's too late to change the definition, I've thought of
something that I think would have been a better way to specify #embed.
Define a new kind of string literal, with a "uc" prefix. `uc"foo"` is
of type `unsigned char[3]`. (Or `const unsigned char[3]`, if that's not
too radical.) Unlike other string literals, there is no implicit
terminating '\0'. Arbitrary byte values can of course be specified in
hexadecimal: uc"\x01\x02\x03\x04". Since there's no terminating null
character and C doesn't support zero-sized objects, uc"" is a syntax
error.
uc"..." string literals might be made even simpler, for example allowing
only hex digits and not requiring \x (uc"01020304" rather than
uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
could be useful in other contexts, and programmers will want
flexibility. Maybe something like hex"01020304" (embedded spaces could
be ignored) could be defined in addition to uc"\x01\x02\x03\x04".
That's something I added to string literals in my language within the
last few months. Nothing to do with embedding (but it can make hex
sequences within strings more efficient, if that approach was used).
Writing byte-at-a-time hex data was always a bit fiddly:
0x12, 0x34, 0xAB, ...
"\x12\x34\xAB...
It was made worse by my preference for `x` being in lower case, and
the hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look
wrong.
What I did was create a new, variable-length string escape sequence
that looks like this:
"ABC\h1234AB...\nopq" // hex sequence between ABC & nopq
Hex digits after \h or \H are read in pairs. White space is allowed
between pairs:
"ABC\H 12 34 AB ...\nopq"
The only thing I wasn't sure about was the closing backslash, which
looks at first like another escape code. But I think it is sound,
although it can still be tweaked.
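The rules described above (hex digits read in pairs after \h or \H, white space allowed between pairs, a closing backslash ending the sequence) can be sketched as a small run-time expander in C. This is only an illustration of the described behaviour, not the implementation in the language being discussed; `expand_hex_escape` and `hex_value` are hypothetical names, and other escapes such as \n are deliberately not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: copy src to out, expanding \h...\ and \H...\
   sequences into raw bytes. Returns the number of bytes written, or -1
   if the sequence is malformed or out (capacity cap) is too small. */
long expand_hex_escape(const char *src, unsigned char *out, size_t cap)
{
    assert(src != NULL && out != NULL);
    size_t n = 0;
    while (*src) {
        if (src[0] == '\\' && (src[1] == 'h' || src[1] == 'H')) {
            src += 2;
            for (;;) {
                /* White space between pairs is ignored. */
                while (isspace((unsigned char)*src)) src++;
                /* The closing backslash ends the hex sequence. */
                if (*src == '\\') { src++; break; }
                int hi = hex_value((unsigned char)src[0]);
                if (hi < 0) return -1;      /* also catches a missing close */
                int lo = hex_value((unsigned char)src[1]);
                if (lo < 0) return -1;      /* digits must come in pairs */
                if (n >= cap) return -1;
                out[n++] = (unsigned char)(hi * 16 + lo);
                src += 2;
            }
        } else {
            if (n >= cap) return -1;
            out[n++] = (unsigned char)*src++;
        }
    }
    return (long)n;
}
```

Called on the C string "ABC\\H 12 34 AB\\nopq" it writes the three letters, the bytes 0x12 0x34 0xAB, and then "nopq" as ordinary text.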
How often would something like that be useful? I would have thought
that it is rare to see something that is basically text but has enough
odd non-printing characters (other than the common \n, \t, \e) to make
it worth the fuss. If you want to have binary data in something that
looks like a string literal, then just use straight-up two hex digits
per character - "4142431234ab". It's simpler to generate and parse.
I don't see the benefit of something that mixes binary and text data.
That's not the same thing. That sequence "...1234..." occupies 4 bytes
(with values 49 50 51 52), not two bytes (with values 0x12 and 0x34, or
18 and 52).
Here's an example of wanting to print '€4.99', first in C (note that my editor doesn't support Unicode so this stuff is needed):
puts("\xE2\x82\xAC" "4.99");
The euro symbol occupies three bytes in UTF8. It's awkward to type: it
has loads of backslashes, it keeps switching case and it needs more concentration.
Plus I had to split the string since apparently \x doesn't stop at two
hex digits, it keeps going: it would have read \xAC4, which overflows
the 8-bit width of a character anyway, so I don't know what the point is
of reading more than 2 hex characters.
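To make the \x behaviour concrete: a hexadecimal escape in C consumes every hex digit that follows it, so the literal has to be split (adjacent literals are joined only after escapes have been converted). Octal escapes stop after at most three digits, which gives a second spelling that needs no split. A small sketch, with `price_split`, `price_octal` and `price_spellings_match` as illustrative names only:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split literal: the \xAC escape ends at the closing quote, so the
   following '4' is an ordinary character. */
static const char price_split[] = "\xE2\x82\xAC" "4.99";

/* Octal escapes take at most three digits, so no split is needed:
   \342 \202 \254 are 0xE2 0x82 0xAC, and the next '4' is plain text. */
static const char price_octal[] = "\342\202\2544.99";

/* Returns 1 if both spellings produce the same bytes. */
int price_spellings_match(void)
{
    /* 3 UTF-8 bytes for the euro sign, 4 for "4.99", 1 terminator. */
    static_assert(sizeof price_split == 8, "unexpected literal size");
    return sizeof price_split == sizeof price_octal
        && memcmp(price_split, price_octal, sizeof price_split) == 0;
}
```

Either array can be passed to puts(); whether the euro sign is displayed correctly depends on the terminal expecting UTF-8.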
Using my feature, it looks like this:
println "\H E2 82 AC\4.99"
There must be loads of examples of wanting to write many byte values
within strings, which in C can also be used to initialise byte arrays (a useful feature I've now adopted; see below).
Here's another example, in my language, which is the first 128 bytes of
an EXE file which is constant. It is currently defined like this,
probably created with a script:
[]byte stubdata = (
0x4D, 0x5A, 0x90, 0x00, 0x03, 0x00, 0x00, 0x00,
0x04, 0x00, 0x00, 0x00, 0xFF, 0xFF, 0x00, 0x00,
...
Using the new escape, I can just copy&paste a dump, and use a text
editor to put in the string context needed, which took under a minute:
[]byte stubdata=
b"\H 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00\"+
b"\H B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00\"+
b"\H 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00\"+
b"\H 00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 00\"+
b"\H 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68\"+
b"\H 69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F\"+
b"\H 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20\"+
b"\H 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00\"+
b"\H 50 45 00 00 64 86 04 00 00 00 00 00 00 00 00 00\"
(The 's'/'b' prefixes are needed for strings to have a type of (in C
terms) char[] rather than char*, a detail that C glosses over via some
magic. 's' gives you a zero terminator, 'b' as used here doesn't. The
"+" is used for compile-time string/data-string concatenation.)
In short, more is possible without needing to resort to tools. You can directly work from a hex dump.
On 15/06/2024 21:27, bart wrote:
On 15/06/2024 18:17, David Brown wrote:
On 15/06/2024 00:39, bart wrote:
On 14/06/2024 22:30, Keith Thompson wrote:
Now that it's too late to change the definition, I've thought of
something that I think would have been a better way to specify #embed.
Define a new kind of string literal, with a "uc" prefix. `uc"foo"` is
of type `unsigned char[3]`. (Or `const unsigned char[3]`, if that's not
too radical.) Unlike other string literals, there is no implicit
terminating '\0'. Arbitrary byte values can of course be specified in
hexadecimal: uc"\x01\x02\x03\x04". Since there's no terminating null
character and C doesn't support zero-sized objects, uc"" is a syntax
error.
uc"..." string literals might be made even simpler, for example allowing
only hex digits and not requiring \x (uc"01020304" rather than
uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
could be useful in other contexts, and programmers will want
flexibility. Maybe something like hex"01020304" (embedded spaces could
be ignored) could be defined in addition to uc"\x01\x02\x03\x04".
That's something I added to string literals in my language within
the last few months. Nothing to do with embedding (but it can make hex
sequences within strings more efficient, if that approach was used).
Writing byte-at-a-time hex data was always a bit fiddly:
0x12, 0x34, 0xAB, ...
"\x12\x34\xAB...
It was made worse by my preference for `x` being in lower case, and
the hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look
wrong.
What I did was create a new, variable-length string escape sequence
that looks like this:
"ABC\h1234AB...\nopq" // hex sequence between ABC & nopq
Hex digits after \h or \H are read in pairs. White space is allowed
between pairs:
"ABC\H 12 34 AB ...\nopq"
The only thing I wasn't sure about was the closing backslash, which
looks at first like another escape code. But I think it is sound,
although it can still be tweaked.
How often would something like that be useful? I would have thought
that it is rare to see something that is basically text but has
enough odd non-printing characters (other than the common \n, \t, \e)
to make it worth the fuss. If you want to have binary data in
something that looks like a string literal, then just use straight-up
two hex digits per character - "4142431234ab". It's simpler to
generate and parse. I don't see the benefit of something that mixes
binary and text data.
That's not the same thing. That sequence "...1234..." occupies 4 bytes
(with values 49 50 51 52), not two bytes (with values 0x12 and 0x34,
or 18 and 52).
Here's an example of wanting to print '€4.99', first in C (note that
my editor doesn't support Unicode so this stuff is needed):
puts("\xE2\x82\xAC" "4.99");
The euro symbol occupies three bytes in UTF8. It's awkward to type: it
has loads of backslashes, it keeps switching case and it needs more
concentration.
Plus I had to split the string since apparently \x doesn't stop at two
hex digits, it keeps going: it would have read \xAC4, which overflows
the 8-bit width of a character anyway, so I don't know what the point
is of reading more than 2 hex characters.
Using my feature, it looks like this:
println "\H E2 82 AC\4.99"
I don't see any improvement of significance. The improvement, if any,
is very minor.
(I gather you have other conveniences for your language's printing
features when converting various types, but that's a different matter.)
The obvious answer to writing this kind of thing is simply to switch to
an editor that supports UTF-8.
Why bother with the \H stuff? That's my point - use hex data for data,
and text for text. Mixing these is not common enough to make it worth
the extra fuss you have to give such negligible extra convenience.
My suggestion is that it could be helpful to have binary blobs written
as hex digits without escapes anywhere, because it is /just/ binary
data. I don't object to having optional spaces - that's a fine idea.
But just write :
b"4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00"
b"B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00"
The extra "\H" adds nothing useful.
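The plain-hex form suggested here (two hex digits per byte, optional spaces, no escapes) is also easy to handle in ordinary C at run time. The sketch below is one possible decoder, not an existing library function and not a compile-time feature; `decode_hex_blob` and `hex_digit` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decode text such as "4D 5A 90 00" into bytes.
   White space between digit pairs is ignored. Returns the number of
   bytes written, or -1 on a stray character, an odd digit count, or
   when out (capacity cap) is too small. */
long decode_hex_blob(const char *text, unsigned char *out, size_t cap)
{
    assert(text != NULL && out != NULL);
    size_t n = 0;
    for (;;) {
        while (isspace((unsigned char)*text)) text++;
        if (*text == '\0') break;
        int hi = hex_digit((unsigned char)text[0]);
        if (hi < 0) return -1;
        int lo = hex_digit((unsigned char)text[1]);
        if (lo < 0 || n >= cap) return -1;
        out[n++] = (unsigned char)(hi * 16 + lo);
        text += 2;
    }
    return (long)n;
}
```

Because the input is nothing but hex digits and spaces, a dump can be pasted in as adjacent string literals and passed to the decoder unchanged.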
(The 's'/'b' prefixes are needed for strings to have a type of (in C
terms) char[] rather than char*, a detail that C glosses over via some
magic. 's' gives you a zero terminator, 'b' as used here doesn't. The
"+" is used for compile-time string/data-string concatenation.)
In short, more is possible without needing to resort to tools. You can
directly work from a hex dump.
Code up an L-System for fun:
On 16/06/2024 15:54, David Brown wrote:
On 15/06/2024 21:27, bart wrote:
On 15/06/2024 18:17, David Brown wrote:
On 15/06/2024 00:39, bart wrote:
On 14/06/2024 22:30, Keith Thompson wrote:
Now that it's too late to change the definition, I've thought of
something that I think would have been a better way to specify
#embed.
Define a new kind of string literal, with a "uc" prefix. `uc"foo"` is
of type `unsigned char[3]`. (Or `const unsigned char[3]`, if that's not
too radical.) Unlike other string literals, there is no implicit
terminating '\0'. Arbitrary byte values can of course be specified in
hexadecimal: uc"\x01\x02\x03\x04". Since there's no terminating null
character and C doesn't support zero-sized objects, uc"" is a syntax
error.
uc"..." string literals might be made even simpler, for example allowing
only hex digits and not requiring \x (uc"01020304" rather than
uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
could be useful in other contexts, and programmers will want
flexibility. Maybe something like hex"01020304" (embedded spaces could
be ignored) could be defined in addition to uc"\x01\x02\x03\x04".
That's something I added to string literals in my language within
the last few months. Nothing to do with embedding (but it can make hex
sequences within strings more efficient, if that approach was used).
Writing byte-at-a-time hex data was always a bit fiddly:
0x12, 0x34, 0xAB, ...
"\x12\x34\xAB...
It was made worse by my preference for `x` being in lower case, and
the hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look
wrong.
What I did was create a new, variable-length string escape
sequence that looks like this:
"ABC\h1234AB...\nopq" // hex sequence between ABC & nopq
Hex digits after \h or \H are read in pairs. White space is allowed
between pairs:
"ABC\H 12 34 AB ...\nopq"
The only thing I wasn't sure about was the closing backslash, which
looks at first like another escape code. But I think it is sound,
although it can still be tweaked.
How often would something like that be useful? I would have thought
that it is rare to see something that is basically text but has
enough odd non-printing characters (other than the common \n, \t,
\e) to make it worth the fuss. If you want to have binary data in
something that looks like a string literal, then just use
straight-up two hex digits per character - "4142431234ab". It's
simpler to generate and parse. I don't see the benefit of something
that mixes binary and text data.
That's not the same thing. That sequence "...1234..." occupies 4
bytes (with values 49 50 51 52), not two bytes (with values 0x12 and
0x34, or 18 and 52).
Here's an example of wanting to print '€4.99', first in C (note that
my editor doesn't support Unicode so this stuff is needed):
puts("\xE2\x82\xAC" "4.99");
The euro symbol occupies three bytes in UTF8. It's awkward to type:
it has loads of backslashes, it keeps switching case and it needs
more concentration.
Plus I had to split the string since apparently \x doesn't stop at
two hex digits, it keeps going: it would have read \xAC4, which
overflows the 8-bit width of a character anyway, so I don't know what
the point is of reading more than 2 hex characters.
Using my feature, it looks like this:
println "\H E2 82 AC\4.99"
I don't see any improvement of significance. The improvement, if any,
is very minor.
The difference is that it can be typed fluently without that annoying \x between every number. Plus I can add white space for grouping without it affecting the data.
(I gather you have other conveniences for your language's printing
features when converting various types, but that's a different matter.)
The obvious answer to writing this kind of thing is simply to switch
to an editor that supports UTF-8.
It never happens that you want to type a bunch of hex byte values to initialise a byte array? OK.
Why bother with the \H stuff? That's my point - use hex data for
data, and text for text. Mixing these is not common enough to make it
worth the extra fuss you have to give such negligible extra convenience.
My suggestion is that it could be helpful to have binary blobs written
as hex digits without escapes anywhere, because it is /just/ binary
data. I don't object to having optional spaces - that's a fine idea.
But just write :
b"4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00"
b"B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00"
The extra "\H" adds nothing useful.
Is this a separate feature using 'b'? Because in my scheme, \H is just
another string escape code, which can be used in ordinary strings, and
b"" strings define char[] data which can include normal text data too.
So my example could have been written as b"MZ\h 90 00 03 ..."
I did look at having a separate feature, but I didn't want that. I ended
up with this scheme for data-strings, here expressed using C types:
                    Can initialise:
"abcd"              char* only
s"abcd"             char*, char[] or any T[]; zero-terminated
b"abcd"             char*, char[] or any T[]
sinclude"file"      char*, char[] or any T[]; zero-terminated
binclude"file"      char*, char[] or any T[]
The first 3 can include any string escapes including \H...\
The last two embed file data, binary or text. But if a normal C-style
string is needed with no embedded zeros except at the end, sinclude
should be used with a text file.
(The 's'/'b' prefixes are needed for strings to have a type of (in C
terms) char[] rather than char*, a detail that C glosses over via
some magic. 's' gives you a zero terminator, 'b' as used here
doesn't. The "+" is used for compile-time string/data-string
concatenation.)
In short, more is possible without needing to resort to tools. You can
directly work from a hex dump.