On 01/09/2025 11:28, pozz wrote:
On 29/08/2025 15:04, David Brown wrote:
On 29/08/2025 12:39, pozz wrote:
On 29/08/2025 11:05, David Brown wrote:
On 28/08/2025 18:12, pozz wrote:
I don't think this is a pure C programming question, but it's related.
I'm building a static library mylib.a with all the object files
except main.o, and the final executable linking together main.o and
mylib.a.
My first question is "Why?" Why not simply link all the object
files together?
Because I want to start creating tests on my projects and one simple
approach is what I have described. All is in a static library (a
collection of object files), except main.o, which contains only main().
If I want to create test1.c (with its own main), I only have to link
test1.o with the library.
The only "benefit" you get from using a library in this situation is
that you might conceivably save a couple of lines in your makefile,
Could you explain?
Actually I'm using cmake and I write something similar to this for each
test:
add_executable(testA tests/testA.c)
target_link_libraries(testA PRIVATE ${PRJLIB})
target_link_options(testA PRIVATE -Wl,--gc-sections) # doesn't work
Without the library I need to explicitly link testA.c with the exact
source files required for the test. I would also have to take care of
the compilation options, because they should be the same for the main
executable and the test executable (otherwise I could be testing a
different thing). Linking against the library helps on this point,
because you are sure you're testing exactly the production binary.
I am not familiar enough with cmake to give a detailed answer. CMake
has always struck me as two steps ahead of make, and six and a half
steps behind it - I can't see any good justification or benefit that
outweighs its disadvantages and complexities. Of course that view is
highly subjective, based on what I need from a build tool, and my
existing familiarity with make.
But surely cmake has a convenient way of making a variable that is a
list of all the object files in your source directory (based itself on a
glob of all the C files in the source directory), and passing that to
the linker command as easily as passing the name of a static library?
and if you are using spinning rust drives and a PC from the 1990's,
you might save a second or two from the build. It is not worth the
bother.
No, it's not for that I'm using static library approach for testing.
I don't see any reason for using a static library when building test
binaries, compared to simply passing the list of object files to the linker.
But I suspect you'd see the same effect (linker garbage collection not
working as you want) whether you use a static library or individual
object files.
I want to remove from the final exe everything present in mylib.a
that is not used in main.o.
My second question is "Why?" When you are working on embedded
targets, saving space like this in the executable is often a very
good idea. On a PC, it is rarely worth the effort.
This question was born because I found another problem.
I have a library distributed as source code. One function in one
source file needs to access a global array that *must* be defined
somewhere in the project.
Okay.
As I wrote, I'm creating some tests. Not all the tests use that
function, so I thought it would be safe to omit the definition of
the global array, but this isn't true in my project and build
system. The linker complains because it can't find the global array.
I expected the unused function, with its references, would have been
removed by the linker.
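The situation described above can be sketched in C roughly as follows. The names project_table, PROJECT_TABLE_LEN and lib_table_sum are hypothetical, invented only for illustration; they are not taken from the actual library. So that the sketch compiles as a single file, the definition that would normally live in the project is included at the bottom.

```c
/* Hypothetical sketch: a library function that needs a project-defined array. */
#include <stddef.h>

#define PROJECT_TABLE_LEN 4   /* assumed length, for illustration only */

/* Library side: the array is declared here but not defined. */
extern const int project_table[PROJECT_TABLE_LEN];

/* Library function that references the project-defined array. */
int lib_table_sum(void)
{
    int sum = 0;
    for (size_t i = 0; i < PROJECT_TABLE_LEN; i++) {
        sum += project_table[i];
    }
    return sum;
}

/* Project side: this definition must exist somewhere in the final link.
 * If it is missing, the reference from lib_table_sum() stays undefined
 * unless the linker discards the section that holds lib_table_sum(). */
const int project_table[PROJECT_TABLE_LEN] = { 1, 2, 3, 4 };
```

A test program that never calls lib_table_sum() still links the object file containing it if anything else in that file is needed, which is why the missing definition shows up as a link error.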
The "--gc-sections" gnu linker option works for elf files, but is
"experimental" for coff/pe. Maybe it simply doesn't work as
effectively for mingw, which will be generating Windows-style coff/pe
binaries, while it works for WSL which uses Linux-style elf binaries.
Ok, I expected it was a very simple task for a linker. Don't put an
unused section in the final binary.
Cross-references back and forth can make it surprisingly complicated, I suspect. And it is not always a simple matter to determine which
symbols must be kept, and which can be cleared out. Things that might
cause complications include interrupt handlers or vectors, weak symbols
and symbol aliasing. (Standard COFF does not, AFAIK, support weak
symbols - COFF/PE does to some extent, but perhaps GNU ld does not
support them with COFF.)
But while I appreciate that you expected linker garbage collection to
work here, you still haven't answered why you feel it is important.
(If you are just trying to understand why it doesn't work, that's fine
by me.)
I explained below. Anyway I know of countermeasures to solve the
specific problem, but my original question is only to understand what
was happening and if I was wrong in something.
Again, I thought linker garbage collection was a simple task to do.
Suppose only mod1.o is present in mylib.a. foo1() and bar1() are
defined in mod1.c. main() is the only function in main.c and only
foo1() is called from main().
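A minimal sketch of what mod1.c might look like for this experiment is given here. The function bodies are assumptions made for illustration, since the original post does not show them. With -ffunction-sections on an ELF target, GCC would normally place each function in its own section (.text.foo1 and .text.bar1), which is what allows --gc-sections to drop bar1() independently of foo1().

```c
/* --- mod1.c (archived into mylib.a); bodies are hypothetical --- */

/* Called from main(), so its section is referenced and must be kept. */
int foo1(void)
{
    return 1;
}

/* Never called from main(); a candidate for section garbage collection. */
int bar1(void)
{
    return 2;
}

/* --- main.c would then contain only something like:
 *
 *     int foo1(void);
 *     int main(void) { return foo1() == 1 ? 0 : 1; }
 */
```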
gcc -O2 -ffunction-sections -fdata-sections -c -o mod1.o mod1.c
ar rcs mylib.a mod1.o
gcc -O2 -ffunction-sections -fdata-sections -c -o main.o main.c
gcc -Wl,--gc-sections,--print-gc-sections -o main[.exe] main.o mylib.a
objdump -d main[.exe] | grep bar1
MinGW on Windows builds a main.exe that contains bar1(), while gcc
in WSL doesn't.
Why?
Why are you using "-ffunction-sections" and "-fdata-sections" ?
Because each function is in its section and can be removed if unused
by the garbage collector of the linker (at least, so I understood).
They are potentially useful if you have a significant amount of extra
code and/or data that is in your source code and not wanted in the
final binary, /and/ that it is important for the binary to be as small
as reasonably possible.
Indeed they are used in the embedded world, where program memory is
limited.
I strongly recommend you be careful with these options, and think about
what they actually do. In particular, -fdata-sections can make your
code bigger and slower because it blocks the use of section anchors in a
lot of cases. Yes, I know a lot of embedded developers use
"-fdata-sections", and many IDE "wizards" enable it by default - they
are wrong to do so.
Imagine you have code like this :
int x, y, z;
int sum(void) { return x + y + z; }
With -fdata-sections, the compiler has no idea where x, y, and z will
end up - that is up to the linker. So on a load/store architecture
(such as ARM), it must generate code approximating :
int x, y, z;
int sum(void) {
const int * const px = &x;
const int * const py = &y;
const int * const pz = &z;
const int tx = *px;
const int ty = *py;
const int tz = *pz;
const int r1 = tx + ty;
const int r2 = r1 + tz;
return r2;
}
Without -fdata-sections, and without -fcommon (which is a terrible
option for many reasons), the compiler can generate roughly :
struct xyz {
int x, y, z;
} xyz;
int sum(void) {
const struct xyz * const pxyz = &xyz;
const int tx = pxyz->x;
const int ty = pxyz->y;
const int tz = pxyz->z;
const int r1 = tx + ty;
const int r2 = r1 + tz;
return r2;
}
This is significantly shorter and faster in practice - especially for
targets that have a "load double register" instruction.
-ffunction-sections is not such an issue for most microcontrollers, but
can limit or reduce the effectiveness of some optimisations (like
function cloning, hot/cold partitioning for cache usage, and branch and
jump instruction size reduction for some targets).
However, "-ffunction-sections" is not going to save you much space in
your flash unless you have a lot of extra source code that is not
actually needed in the program. Sometimes that can happen, especially
if the same source code base is used with multiple build variants. But
generally you don't want to write and test extra source code that you
are not actually using in the binary. And if you /really/ want to
squeeze for space, use LTO and you don't need -ffunction-sections or
-fdata-sections - but LTO can be "fun" for debugging. (I often use
-ffunction-sections because I do often have multiple build variants, but
I actively avoid -fdata-sections.)
As I understand it, you have an embedded program with lots of source
code, and you want to test different parts of it by compiling with
different small "main" functions on a PC. I am not sure if you said
the main program was for an embedded system, or if I am assuming that
because you are one of the few people keeping comp.arch.embedded alive
by starting new threads there :-)
You have some relationship with Sherlock Holmes :-)
No. Sherlock Holmes uses extrapolation (despite entirely incorrectly
claiming to use deductive logic) - he takes things he knows and follows outwards by patterns to guess other things. I used interpolation - I
took things I knew and used the patterns to fill in the gaps. So I was
quite confident in my guesses!
Yes, you described my situation very well. I have the principal main
for the production exe (embedded target) and a few test mains (native,
for the dev machine).
For the main program, function and data sections are probably not
needed because the source code that you build for the project is
needed for the program - if there was a lot that you didn't need, it
wouldn't be in the project build.
Indeed "-ffunction-sections" and "-fdata-sections" aren't important for
main target main.c.
For the test programs, function and data sections are not needed
because the size of the test binaries on the PC is irrelevant.
Of course, except when there are some functions that I'm not testing and
that need some references that are defined in the main main(), but not
in the test main().
That sounds like a code organisation issue. Basically, don't do that,
and it will not be a problem. Inter-module dependencies should usually
be a directed acyclic graph (a tree, without loops) with the "main"
module at the root. (Sometimes cycles or loops are required, but they
should be minimised.) The "main" module will depend, directly or
indirectly, on all the rest of the code modules - other modules should
not depend on the "main" module.
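One possible way to keep a library module from depending on the "main" module, assuming the library source can be modified, is to have the application hand the data to the library instead of the library reaching for an extern symbol. This is only a sketch of that idea, not something proposed in the thread; lib_set_table and lib_sum_table are hypothetical names.

```c
/* Hypothetical sketch: the library receives its data from the caller,
 * so it has no link-time dependency on a symbol defined by "main". */
#include <stddef.h>

/* Pointer and length supplied by the application at start-up. */
static const int *lib_table = NULL;
static size_t lib_table_len = 0;

/* Called once by the application (or a test) to register its data. */
void lib_set_table(const int *table, size_t len)
{
    lib_table = table;
    lib_table_len = len;
}

/* Uses whatever table was registered; returns 0 if none was set. */
int lib_sum_table(void)
{
    int sum = 0;
    for (size_t i = 0; i < lib_table_len; i++) {
        sum += lib_table[i];
    }
    return sum;
}
```

With this arrangement a test main that never uses the table simply never calls lib_set_table(), and the link succeeds without relying on section garbage collection.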
Again - if your questions are from curiosity as to why things are not
working as you expected, I fully appreciate that.
Yes, my question was mainly out of curiosity. I already fixed my test
by defining the unreferenced data in the test main, even if it isn't
really needed in the test.
If your questions are because you think there is a significant
advantage in using static libraries and section garbage collection in
your build process, I believe it is unlikely to be beneficial in
reality - so it does not matter if they don't work.
I think it is clear now. I needed linker garbage collection to produce
the test binary just to get rid of the undefined-reference error.
As I noted above, that is best solved in other ways.
Regarding the benefit of building a static library to link with both
the main main.c and the test main.c, I still think it's a good approach.
In the end, however, my guess is just that the limited coff/pe format
used by Windows binaries is the issue.
Ok.
Or it is a limitation of GNU ld when using Windows COFF/PE.
They can be helpful in some ways for omitting code and data that is
defined in the code, but not actually used in the executable. But
they can also make linking slower, and "-fdata-sections" can reduce
optimisations (especially if you have "-fcommon", which was the
default in older gcc).