• Re: Caller-saved vs. callee-saved registers

    From MitchAlsup1@21:1/5 to BGB on Mon Feb 3 16:49:37 2025
    On Sun, 2 Feb 2025 22:03:46 +0000, BGB wrote:

    On 2/2/2025 2:16 PM, BGB wrote:
    On 2/2/2025 8:52 AM, Anton Ertl wrote:
    [email protected] (MitchAlsup1) writes:

    Decided to throw this at DeepSeek and see what it came up with...

    After a big mountain of text (not read through all of it), it came up
    with an initial suggestion of:
    16 function arguments;
    16 callee-save registers;
    27 general scratch registers;

    What percentage of dynamically called functions need more than 8
    arguments?

    My guess is way under 1%; the most complicated Linux system call has 6.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to BGB on Mon Feb 3 21:35:28 2025
    On Mon, 3 Feb 2025 20:58:50 +0000, BGB wrote:

    On 2/3/2025 10:49 AM, MitchAlsup1 wrote:
    On Sun, 2 Feb 2025 22:03:46 +0000, BGB wrote:

    On 2/2/2025 2:16 PM, BGB wrote:
    On 2/2/2025 8:52 AM, Anton Ertl wrote:
    [email protected] (MitchAlsup1) writes:

    Decided to throw this at DeepSeek and see what it came up with...

    After a big mountain of text (not read through all of it), it came up
    with an initial suggestion of:
       16 function arguments;
       16 callee-save registers;
       27 general scratch registers;

    What percentage of dynamically called functions need more than 8
    arguments?

    My guess is way under 1%; the most complicated Linux system call has 6.

    Quick look at stats for the Doom output, 0.68% ...

    Not quite zero, I think it varies some, but yeah, generally in the "just under 1%" territory.

    Just as I thought !

    By the time one reaches 16 arguments, they (usually) reach 100%
    coverage.

    I have had managers refuse to accept subroutines with more than 8
    arguments, telling me that the arguments should be related in some
    way and that I should build some kind of structure representing how
    they are related, and pass it instead.

    But, yeah, I guess one can argue how much difference 8 versus 16
    arguments makes when it affects less than 1% of functions.


    But, yeah, statistical coverage in this case:
        2 arguments: 36.8%
        4 arguments: 79.2%
        6 arguments: 96.2%
        8 arguments: 99.3%
       10 arguments: 99.9%
       12 arguments: 100%
       16 arguments: 100%
    ---------------------

    ( Then, as I was writing this, randomly had to deal with PC suddenly
    becoming almost unusable for a little while as Firefox exploded and ate
    all the RAM, forcing excessive paging, and the struggle of trying to get
    the program terminated... Might be nice if Windows had a feature that
    could be set to auto-kill any processes that exceeds a certain RAM
    limit, say, 12 or 16GB, or at least have any new memory allocations
    fail... Maybe also protections from "fork bombs" in programs would be
    nice as well... ).

    That would be useful, which means Microsoft is not allowed to provide it.
    -------------

    If one were designing a custom language, they could impose a limit of 16/24/32 as the maximum number of allowed arguments. Most code would not notice, some other code might need to use a struct or something.

    About the only thing I ever use "lots of arguments" for is [*]printf().

    I tend to use a lot more arguments in Verilog, but then again, this is because there are no structures, and the only other option would be to
    use a "big ol' blob of bits" and then break it apart with wire/assign.

    In HW there is no overhead in using arguments to Verilog subroutines.
    1 argument, or 754 arguments it is all the same. SW fails to have
    this illusion.

  • From MitchAlsup1@21:1/5 to BGB on Mon Feb 3 23:22:19 2025
    On Mon, 3 Feb 2025 22:24:11 +0000, BGB wrote:

    On 2/3/2025 3:35 PM, MitchAlsup1 wrote:
    On Mon, 3 Feb 2025 20:58:50 +0000, BGB wrote:
    -------------------

    I had used 16 arguments in the XG2 ABI, but this was with 64 registers,
    and mostly because this basically entirely eliminated the existence of non-register arguments.


    This ABI variant was originally intended as the 128-bit ABI, and was
    expanded to 16 argument registers mostly because, in 128-bit mode,
    each 128-bit argument took 2 registers (as an even pair), so one was
    far more likely to eat up all the argument registers.
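
    A rough sketch of why even-pair passing uses up argument registers faster. It assumes arguments are assigned in order, that a 128-bit argument skips an odd register to stay on an even pair, that skipped registers are not backfilled, and that once one argument spills the rest do too. The function name and these rules are illustrative assumptions, not the actual XG2 allocator:

```python
def assign_arg_registers(arg_bits, num_arg_regs=16):
    """Map each argument width (64 or 128 bits) to a tuple of register
    indices, or None once the argument registers are exhausted.

    Hypothetical model: 128-bit arguments take an even-aligned pair.
    """
    next_reg = 0
    assignment = []
    for bits in arg_bits:
        need = 2 if bits == 128 else 1
        if need == 2 and next_reg % 2:
            next_reg += 1  # skip the odd register to keep the pair even-aligned
        if next_reg + need <= num_arg_regs:
            assignment.append(tuple(range(next_reg, next_reg + need)))
            next_reg += need
        else:
            assignment.append(None)      # would be passed on the stack
            next_reg = num_arg_regs      # assumed: later arguments spill too
    return assignment


# Five 128-bit arguments overflow 8 argument registers but fit in 16.
print(assign_arg_registers([128] * 5, num_arg_regs=8))
print(assign_arg_registers([128] * 5, num_arg_regs=16))
```

    Under this model a mixed list such as 64, 128, 64 also wastes register 1, which is part of why the pair rule makes the register budget tighter than the raw argument count suggests.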

    I had this problem in the Mc 88100 ABI.

    This ABI variant ended up being used as the base ABI for XG2 rather than
    the original (8 argument register) ABI variant (albeit, dropping back to 64-bit pointers). But, it was still sorta relevant for functions
    accepting 128-bit SIMD vectors (which were also passed as even register pairs).

    Another reason NOT to have SIMD...instructions--it is perfectly fine
    to have multi-lane calculations (and memory references) and perform
    them without SIMD instructions.

    Of course, I was also using spill space, and the added cost of 128 bytes
    of spill space to nearly every stack frame was enough to be noticed.

    Ended up special casing it to only require 128 bytes if either:
       Functions with more than 8 argument registers were called;
       A vararg function was called (such as "printf()").
    Other cases use 64 bytes.

    This greatly reduced the number of cases where 128 bytes were reserved,
    and most of these were due to printf's or similar.
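
    The special-casing rule above can be sketched as a small function. This is only an illustration: it assumes 8-byte argument registers (so 16 registers give 128 bytes and 8 give 64), and the function name and the (register count, is-vararg) input format are made up here:

```python
REG_BYTES = 8  # assumed: 64-bit argument registers


def spill_space_bytes(calls):
    """Return the spill space to reserve in a caller's stack frame.

    `calls` is an iterable of (arg_registers_used, is_vararg) pairs, one
    per function called from this frame (a hypothetical representation).
    """
    needs_full = any(nregs > 8 or is_vararg for nregs, is_vararg in calls)
    return (16 if needs_full else 8) * REG_BYTES


print(spill_space_bytes([(3, False), (6, False)]))  # only small fixed-arg calls
print(spill_space_bytes([(2, True)]))               # a printf-style call
```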

    Brian's LLVM compiler reserves nothing on the stack except locations
    that can be accessed within the subroutine--even when varargs is
    present. In fact: caller does not need to know the called subroutine
    is varargs.

    The RISC-V ABI doesn't use spill space, though it could be faked
    callee-side by adjusting the SP adjustment for the stack frame if
    needed (namely, adding an extra 64 bytes to the stack-size adjustment).


    By the time one reaches 16 arguments, they (usually) reach 100%
    coverage.

    I have had managers refuse to accept subroutines with more than 8
    arguments, telling me that the arguments should be related in some
    way and that I should build some kind of structure representing how
    they are related, and pass it instead.


    I am mostly just going by "in the wild" code here...


    But, yeah, I guess one can argue how much difference 8 versus 16
    arguments makes when it affects less than 1% of functions.


    But, yeah, statistical coverage in this case:
        2 arguments: 36.8%
        4 arguments: 79.2%
        6 arguments: 96.2%
        8 arguments: 99.3%
       10 arguments: 99.9%
       12 arguments: 100%
       16 arguments: 100%
    ---------------------

    Yeah, a case could be made that, practically, 8 is probably sufficient.

    In HW one always shoots at "more than sufficient" {knee of the curve}
    rather than "near optimal".

    Only practical merit of 16 being that one can then (almost) ignore the possibility of non-register arguments, if they make the compiler refuse
    to accept more than 16 arguments.

    Yes, but then you arrive at the subroutine without any free registers
    to perform a "few" calculations to know if you really want to enter
    the subroutine or take a quick exit.

    ---------------
    IIRC Android has a feature like this, albeit generally with a lower
    limit (say, if you try to "malloc()" a 1GB array, the process is
    terminated on the spot).

    Acceptable only when there is a means to actually allow malloc() of
    that 1GB array. I suspect said system is happy to allow you to
    mmap() a 1TB file.

  • From MitchAlsup1@21:1/5 to BGB on Tue Feb 4 22:24:00 2025
    On Tue, 4 Feb 2025 3:05:11 +0000, BGB wrote:

    On 2/3/2025 5:22 PM, MitchAlsup1 wrote:
    On Mon, 3 Feb 2025 22:24:11 +0000, BGB wrote:
    ----------------
    But, yeah, I guess one can argue how much difference 8 versus 16
    arguments makes when it affects less than 1% of functions.


    But, yeah, statistical coverage in this case:
        2 arguments: 36.8%
        4 arguments: 79.2%
        6 arguments: 96.2%
        8 arguments: 99.3%
       10 arguments: 99.9%
       12 arguments: 100%
       16 arguments: 100%
    ---------------------

    Yeah, a case could be made that, practically, 8 is probably sufficient.

    In HW one always shoots at "more than sufficient" {knee of the curve}
    rather than "near optimal".


    Though, the choice of 8 or 16 arguments is more a compiler/ABI level
    choice than a hardware one.

    Given your data: 4 is the knee of the curve and 8 is essentially
    optimal; it can be argued that 6 is essentially optimal, as seen
    from the eye of a HW guy. {{I think we used 8 in Mc 88110 without
    any 'real' issue}}
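
    One way to put numbers on the "knee" is to look at how much coverage each extra argument register buys, using the figures quoted in the thread. This is only a sketch: "gain per added register" is an assumed measure of the knee, not something computed in the original posts.

```python
# Cumulative coverage from the thread: percent of functions whose
# arguments all fit in N argument registers.
coverage = {2: 36.8, 4: 79.2, 6: 96.2, 8: 99.3, 10: 99.9, 12: 100.0, 16: 100.0}


def marginal_gains(cov):
    """Return (from_n, to_n, coverage points gained per added register)."""
    points = sorted(cov.items())
    gains = []
    for (n0, c0), (n1, c1) in zip(points, points[1:]):
        gains.append((n0, n1, round((c1 - c0) / (n1 - n0), 2)))
    return gains


for n0, n1, gain in marginal_gains(coverage):
    print(f"{n0:2d} -> {n1:2d} registers: +{gain:.2f} points per register")
```

    By this measure the payoff per register falls sharply after 4 and again after 6, and is close to nothing beyond 8.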
