• Re: System calls (was: VAX)

    From Scott Lurndal@21:1/5 to Anton Ertl on Wed Aug 13 15:03:08 2025
    [email protected] (Anton Ertl) writes:
    [email protected] (Scott Lurndal) writes:
    That said, Unix generally defined -1 as the return value for all
    other system calls, and code that checked for "< 0" instead of
    -1 when calling a standard library function or system call was fundamentally broken.

    That may be the interface of the C system call wrapper,

    It _is_ the interface that the programmers need to be
    concerned with when using POSIX C language bindings.

    Other language bindings offer alternative mechanisms.


    errno, but at the actual system call level, the error is indicated in
    an architecture-specific way, and the ones I have looked at before
    today use the sign of the result register or the carry flag. On those
    architectures where the sign is used, mmap(2) cannot return negative
    addresses, or must have a special wrapper.

    Why would the wrapper care if the system call failed? The
    return value from the kernel should be passed through to
    the application as per the POSIX language binding requirements.

    lseek(2) and mmap(2) both require the return of arbitrary 32-bit
    or 64-bit values, including those which when interpreted as signed
    values are negative.

    Clearly POSIX defines the interfaces and the underlying OS and/or
    library functions implement the interfaces. The kernel interface
    to the language library (e.g. libc) is irrelevant to typical programmers, except in the case where it doesn't provide the correct semantics.


    Let's look at what the system call wrappers do on RV64G(C) (which has
    no carry flag). For read(2) the wrapper contains:

    0x3ff7f173be <read+20>: ecall
    0x3ff7f173c2 <read+24>: lui a5,0xfffff
    0x3ff7f173c4 <read+26>: mv s0,a0
    0x3ff7f173c6 <read+28>: bltu a5,a0,0x3ff7f1740e <read+100>

    For dup(2) the wrapper contains:

    0x3ff7e7fe9a <dup+2>: ecall
    0x3ff7e7fe9e <dup+6>: lui a7,0xfffff
    0x3ff7e7fea0 <dup+8>: bltu a7,a0,0x3ff7e7fea6 <dup+14>

    and for mmap(2):

    0x3ff7e86b6e <mmap64+12>: ecall
    0x3ff7e86b72 <mmap64+16>: lui a5,0xfffff
    0x3ff7e86b74 <mmap64+18>: bltu a5,a0,0x3ff7e86b8c <mmap64+42>

    So instead of checking the sign bit, on RV64G the wrapper checks
    whether the result is >0xfffffffffffff000 as an unsigned number (lui
    sign-extends 0xfffff000 to 64 bits). This costs one instruction more
    than just checking the sign bit, and almost doubles the number of
    bytes read(2) can read in one call, the number of file descriptors
    that can be returned by dup(2), and the address range returnable by
    mmap(2). Will we ever see processes that need more than 8EB? Maybe
    not, but the designers of the RV64G(C) ABI obviously did not want to
    be the ones that are quoted as saying "8EB should be enough for
    anyone":-).
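The two error tests compared above can be sketched in C. This is a minimal model, not glibc's code: it assumes the Linux convention that errors come back as -4095..-1 in the result register, and the names `is_error_sign` and `is_error_range` are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign test: any raw result with the most significant bit set is
 * treated as an error (a single "branch if negative"). */
static bool is_error_sign(uint64_t raw)
{
    return (raw >> 63) != 0;
}

/* Range test, mirroring "lui a5,0xfffff; bltu a5,a0,error":
 * a5 holds 0xfffffffffffff000 after sign extension, and the branch
 * is taken only for raw values above it, i.e. -4095..-1. */
static bool is_error_range(uint64_t raw)
{
    return raw > UINT64_C(0xfffffffffffff000);
}
```

With the range test, 0x8000000000000000 (the 8EB boundary) is a success value, while the sign test would misreport it as an error; -4096, the last page-aligned address, is likewise accepted only by the range test.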

    Followups to comp.arch

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Scott Lurndal on Wed Aug 13 16:10:10 2025
    [email protected] (Scott Lurndal) writes:
    [email protected] (Anton Ertl) writes:
    [email protected] (Scott Lurndal) writes:
    That said, Unix generally defined -1 as the return value for all
    other system calls, and code that checked for "< 0" instead of
    -1 when calling a standard library function or system call was fundamentally broken.

    That may be the interface of the C system call wrapper,

    It _is_ the interface that the programmers need to be
    concerned with when using POSIX C language bindings.

    True, but not relevant for the question at hand.

    at the actual system call level, the error is indicated in
    an architecture-specific way, and the ones I have looked at before
    today use the sign of the result register or the carry flag. On those
    architectures where the sign is used, mmap(2) cannot return negative
    addresses, or must have a special wrapper.

    Why would the wrapper care if the system call failed?

    The actual system call returns an error flag and a register; on some
    architectures, only the register is used. If there is no error,
    the wrapper returns the content of the register. If the system call
    indicates an error, you see from the value of the register which error
    it is; the wrapper then typically transforms the register in some way
    (e.g., by negating it) and stores the result in errno, and returns -1.
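The translation described above (negate, store in errno, return -1) might look like this in C. The helper name `syscall_result_to_posix` is hypothetical, and the sketch assumes the Linux-style encoding of errors as -4095..-1 in a single register rather than a separate error flag.

```c
#include <errno.h>

/* Converts a raw kernel return value into the POSIX C convention. */
static long syscall_result_to_posix(long raw)
{
    /* -4096UL is the largest non-error value viewed as unsigned, so
     * only raw values in -4095..-1 take the error path. */
    if ((unsigned long)raw > -4096UL) {
        errno = (int)-raw;   /* negate to recover the errno value */
        return -1;           /* the C-level error indication */
    }
    return raw;              /* success: pass the value through */
}
```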

    lseek(2) and mmap(2) both require the return of arbitrary 32-bit
    or 64-bit values, including those which when interpreted as signed
    values are negative.

    For lseek(2):

    | Upon successful completion, lseek() returns the resulting offset
    | location as measured in bytes from the beginning of the file.

    Given that off_t is signed, lseek(2) can only return positive values.

    For mmap(2):

    | On success, mmap() returns a pointer to the mapped area.

    So it's up to the kernel which user-level addresses it returns. E.g.,
    32-bit Linux originally only produced user-level addresses below 2GB.
    When memories grew larger, on some architectures (e.g., i386) Linux
    increased that to 3GB.

    Clearly POSIX defines the interfaces and the underlying OS and/or
    library functions implement the interfaces. The kernel interface
    to the language library (e.g. libc) is irrelevant to typical programmers

    Sure, but system calls are first introduced in real kernels using the
    actual system call interface, and are limited by that interface. And
    that interface is remarkably similar between the early days of Unix
    and recent Linux kernels for various architectures. And when you look
    closely, you find how the system calls are designed to support returning
    the error indication, success value, and errno in one register.

    lseek64 on 32-bit platforms is an exception (the success value does
    not fit in one register), and looking at the machine code of the
    wrapper and comparing it with the machine code for the lseek wrapper,
    some funny things are going on, but I would have to look at the source
    code to understand what is going on. One other interesting thing I
    noticed is that the system call wrappers from libc-2.36 on i386 now
    draws the boundary between success returns and error returns at
    0xfffff000:

    0xf7d853c4 <lseek+68>: call *%gs:0x10
    0xf7d853cb <lseek+75>: cmp $0xfffff000,%eax
    0xf7d853d0 <lseek+80>: ja 0xf7d85410 <lseek+144>

    So now the kernel can produce 4095 error values, and the rest can be
    success values. In particular, mmap() can return all possible page
    addresses as success values with these wrappers. When I last looked
    at how system calls are done, I found just a check of the N or the C
    flag. I wonder how the kernel is informed that it can now return more addresses from mmap().
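The effect of the `cmp $0xfffff000,%eax` / `ja` pair can be checked with a small C sketch (function names hypothetical) that counts which 32-bit results the unsigned comparison classifies as errors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors "cmp $0xfffff000,%eax; ja error": ja is an unsigned
 * "above" test, so only values greater than 0xfffff000 are errors. */
static bool i386_is_error(uint32_t eax)
{
    return eax > UINT32_C(0xfffff000);
}

/* Counts error values in the top 4 KiB of the 32-bit range; values
 * below 0xfffff000 can never satisfy the test. A 64-bit loop counter
 * avoids wrapping around at UINT32_MAX. */
static uint32_t count_error_values(void)
{
    uint32_t n = 0;
    for (uint64_t v = 0xfffff000u; v <= UINT32_MAX; v++)
        if (i386_is_error((uint32_t)v))
            n++;
    return n;
}
```

Only 0xfffff001..0xffffffff are errors, so 0xfffff000, the last page-aligned address, and a typical result such as 0xf7f64000 both count as success.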

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From Scott Lurndal@21:1/5 to Anton Ertl on Wed Aug 13 18:15:23 2025
    [email protected] (Anton Ertl) writes:
    [email protected] (Scott Lurndal) writes:
    [email protected] (Anton Ertl) writes:
    [email protected] (Scott Lurndal) writes:
    That said, Unix generally defined -1 as the return value for all
    other system calls, and code that checked for "< 0" instead of
    -1 when calling a standard library function or system call was fundamentally
    broken.

    That may be the interface of the C system call wrapper,

    It _is_ the interface that the programmers need to be
    concerned with when using POSIX C language bindings.

    True, but not relevant for the question at hand.

    at the actual system call level, the error is indicated in
    an architecture-specific way, and the ones I have looked at before
    today use the sign of the result register or the carry flag. On those
    architectures where the sign is used, mmap(2) cannot return negative
    addresses, or must have a special wrapper.

    Why would the wrapper care if the system call failed?

    The actual system call returns an error flag and a register; on some
    architectures, only the register is used. If there is no error,
    the wrapper returns the content of the register. If the system call
    indicates an error, you see from the value of the register which error
    it is; the wrapper then typically transforms the register in some way
    (e.g., by negating it) and stores the result in errno, and returns -1.

    lseek(2) and mmap(2) both require the return of arbitrary 32-bit
    or 64-bit values, including those which when interpreted as signed
    values are negative.

    For lseek(2):

    | Upon successful completion, lseek() returns the resulting offset
    | location as measured in bytes from the beginning of the file.

    Given that off_t is signed, lseek(2) can only return positive values.

    Which was addressed by the LFS (Large File Summit), to support
    files > 2GB in size.

    There is also the degenerate case of open("/dev/mem"...) which
    requires lseek support over the entire physical address space
    and /dev/kmem which supports access to the kernel virtual memory
    address space, which on most systems has the high-order bit
    in the address set to one. Personally, I've used pread/pwrite
    in those cases (once 1003.4 was merged) rather than lseek/read
    and lseek/write.



    For mmap(2):

    | On success, mmap() returns a pointer to the mapped area.

    So it's up to the kernel which user-level addresses it returns. E.g.,
    32-bit Linux originally only produced user-level addresses below 2GB.
    When memories grew larger, on some architectures (e.g., i386) Linux
    increased that to 3GB.

    Aside from mmap-ing /dev/mem or /dev/kmem,
    one must also consider the use of MAP_FIXED, when supported,
    where the kernel doesn't choose the mapped address (although
    it is allowed to refuse to map certain ranges).

    The return value for mmap is 'void *'. The only special value
    for mmap(2) is MAP_FAILED (which is the unsigned equivalent of -1)
    which implies that a one-byte mapping at the end of the address
    space isn't supported.

    All that said, my initial point about -1 was that applications
    should always check for -1 (or MAP_FAILED), not for return
    values less than zero. The actual kernel interface to the
    C library is clearly implementation dependent although it
    must preserve the user-visible required semantics.
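As a minimal sketch of the check recommended above, this C function maps anonymous memory and tests the result only against MAP_FAILED, never for "< 0". The wrapper name `map_anonymous` is hypothetical, and MAP_ANONYMOUS is assumed to be available (it is on Linux and the BSDs).

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc in strict -std= modes */
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Maps len bytes of private anonymous memory.
 * Returns true and stores the address in *out on success. */
static bool map_anonymous(size_t len, void **out)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)   /* the only documented failure value */
        return false;
    *out = p;              /* any other value is a valid mapping */
    return true;
}
```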

  • From Dan Cross@21:1/5 to Scott Lurndal on Wed Aug 13 18:51:15 2025
    In article <MO1nQ.2$[email protected]>, Scott Lurndal <[email protected]> wrote:
    [email protected] (Anton Ertl) writes:
    [email protected] (Scott Lurndal) writes:
    [snip]
    errno, but at the actual system call level, the error is indicated in
    an architecture-specific way, and the ones I have looked at before
    today use the sign of the result register or the carry flag. On those
    architectures where the sign is used, mmap(2) cannot return negative
    addresses, or must have a special wrapper.

    Why would the wrapper care if the system call failed? The
    return value from the kernel should be passed through to
    the application as per the POSIX language binding requirements.

    For the branch to `cerror`. That is, the usual reason is (was?)
    to convert from the system call interface to the C ABI,
    specifically, to populate the (userspace, now thread-local)
    `errno` variable if there was an error. (I know you know this,
    Scott, but others reading the discussion may not.)

    Looking at the 32v code for VAX and 7th Edition on the PDP-11,
    on error the kernel returns a non-zero value and sets the carry
    bit in the PSW. The stub checks whether the C bit is set, and
    if so, copies R0 to `errno` and then sets R0 to -1. On the
    PDP-11, `as` supports the non-standard "bec" mnemonic as an
    alias for "bcc" and the stub is actually something like:

    / Do sys call....land in the kernel `trap` in m40.s
    bec 1f
    jmp cerror
    1:
    rts pc

    cerror:
    mov r0, _errno
    mov $-1, r0
    rts pc

    In other words, if the carry bit is not set, the system call
    was successful, so just return whatever it returned. Otherwise,
    the kernel is returning an error to the user, so do the dance of
    setting up `errno` and returning -1.

    (There's some fiddly bits with popping R5, which Unix used as
    the frame pointer, but I omitted those for brevity).
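The carry-flag convention of these stubs can be modeled in C. The `struct raw_return` and `stub_result` names are hypothetical stand-ins for the PSW carry bit and R0; the function mirrors the `bec 1f` / `cerror` path and, like the listing, leaves out the R5 frame-pointer handling.

```c
#include <errno.h>

/* What the kernel hands back: R0 plus the C bit of the PSW. */
struct raw_return {
    int carry;   /* C bit: set means R0 holds an error number */
    long r0;     /* result on success, errno value on failure */
};

/* The "bec 1f / jmp cerror" stub, written in C. */
static long stub_result(struct raw_return rr)
{
    if (!rr.carry)        /* bec: carry clear, return R0 unchanged */
        return rr.r0;
    errno = (int)rr.r0;   /* cerror: mov r0,_errno */
    return -1;            /* cerror: mov $-1,r0 */
}
```

Because the error indication travels in the carry bit, a success value with the high bit set, such as `(struct raw_return){0, -5}`, passes through untouched.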

    lseek(2) and mmap(2) both require the return of arbitrary 32-bit
    or 64-bit values, including those which when interpreted as signed
    values are negative.

    At least for lseek, that was true in the 1990 POSIX standard,
    where the programmer was expected to (maybe save and then) clear
    `errno`, invoke `lseek`, and then check the value of `errno`
    after return to see if there was an error, but this has been relaxed
    in subsequent editions (including POSIX 2024), where `lseek` now
    must fail with `EINVAL` if the resulting offset would be negative
    for a regular file, directory, or block-special file. (https://pubs.opengroup.org/onlinepubs/9799919799/functions/lseek.html;
    see "ERRORS")
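A sketch of the errno-based pattern described above, applied to lseek(2). The helper `seek_to` is hypothetical; treating (off_t)-1 as an error only when errno was also set is the defensive 1990-style check, and under current POSIX the (off_t)-1 test alone would be sufficient.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() in strict -std= modes */
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Seeks fd to an absolute offset. Returns 0 and stores the new
 * offset in *result on success, or the errno value on failure. */
static int seek_to(int fd, off_t off, off_t *result)
{
    errno = 0;                          /* clear before the call */
    off_t r = lseek(fd, off, SEEK_SET);
    if (r == (off_t)-1 && errno != 0)   /* -1 plus errno: a real error */
        return errno;
    *result = r;
    return 0;
}
```

On a regular file, seeking to offset -1 yields EINVAL, matching the current POSIX error list.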

    For mmap, at least, the only documented error return value is
    `MAP_FAILED`, and programmers must check for that explicitly.

    It strikes me that this implies that the _value_ of `MAP_FAILED`
    need not be -1; on x86_64, for instance, it _could_ be any
    non-canonical address.

    Clearly POSIX defines the interfaces and the underlying OS and/or
    library functions implement the interfaces. The kernel interface
    to the language library (e.g. libc) is irrelevant to typical programmers,
    except in the case where it doesn't provide the correct semantics.

    Certainly, these are hidden by the system call stubs in the
    libraries for language-specific bindings, and workaday
    programmers should not be trying to side-step those!

    - Dan C.

  • From Dan Cross@21:1/5 to Anton Ertl on Wed Aug 13 19:25:31 2025
    In article <[email protected]>,
    Anton Ertl <[email protected]> wrote:
    [email protected] (Scott Lurndal) writes:
    [email protected] (Anton Ertl) writes:
    [email protected] (Scott Lurndal) writes:
    That said, Unix generally defined -1 as the return value for all
    other system calls, and code that checked for "< 0" instead of
    -1 when calling a standard library function or system call was fundamentally
    broken.

    That may be the interface of the C system call wrapper,

    It _is_ the interface that the programmers need to be
    concerned with when using POSIX C language bindings.

    True, but not relevant for the question at hand.

    at the actual system call level, the error is indicated in
    an architecture-specific way, and the ones I have looked at before
    today use the sign of the result register or the carry flag. On those
    architectures where the sign is used, mmap(2) cannot return negative
    addresses, or must have a special wrapper.

    Why would the wrapper care if the system call failed?

    The actual system call returns an error flag and a register; on some
    architectures, only the register is used. If there is no error,
    the wrapper returns the content of the register. If the system call
    indicates an error, you see from the value of the register which error
    it is; the wrapper then typically transforms the register in some way
    (e.g., by negating it) and stores the result in errno, and returns -1.

    lseek(2) and mmap(2) both require the return of arbitrary 32-bit
    or 64-bit values, including those which when interpreted as signed
    values are negative.

    For lseek(2):

    | Upon successful completion, lseek() returns the resulting offset
    | location as measured in bytes from the beginning of the file.

    Given that off_t is signed, lseek(2) can only return positive values.

    This is incorrect; or rather, it's accidentally correct now, but
    was not previously. The 1990 POSIX standard did not explicitly
    forbid a file that was so large that the offset could
    overflow, which is why in 1990 POSIX you have to be careful about
    error handling when using `lseek`.

    It is true that POSIX 2024 _does_ prohibit seeking so far that
    the offset would become negative, however. But, POSIX 2024
    (still!!) supports multiple definitions of `off_t` for multiple
    environments, in which overflow is potentially unavoidable.
    This leads to considerable complexity in implementations that
    try to support such multiple environments in their ABI (for
    instance, for backward compatibility with old programs).

    For mmap(2):

    | On success, mmap() returns a pointer to the mapped area.

    So it's up to the kernel which user-level addresses it returns. E.g.,
    32-bit Linux originally only produced user-level addresses below 2GB.
    When memories grew larger, on some architectures (e.g., i386) Linux
    increased that to 3GB.

    The point is that the programmer shouldn't have to care. The
    programmer should check the return value against MAP_FAILED, and
    if it is NOT that value, then the returned address may be
    assumed valid. If such an address is not actually valid, that
    indicates a bug in the implementation of `mmap`.

    Clearly POSIX defines the interfaces and the underlying OS and/or
    library functions implement the interfaces. The kernel interface
    to the language library (e.g. libc) is irrelevant to typical programmers

    Sure, but system calls are first introduced in real kernels using the
    actual system call interface, and are limited by that interface. And
    that interface is remarkably similar between the early days of Unix
    and recent Linux kernels for various architectures.

    Not precisely. On x86_64, for example, some Unixes use a flag
    bit to determine whether the system call failed, and return
    (positive) errno values; Linux returns negative numbers to
    indicate errors, and constrains those to values between -4095
    and -1.

    Presumably that specific set of values is constrained by `mmap`:
    assuming a minimum 4KiB page size, the last architecturally
    valid address where a page _could_ be mapped is equivalent to
    -4096 and the first is 0. If they did not have that constraint,
    they'd have to treat `mmap` specially in the system call path.

    Linux _could_ decide to define `MAP_FAILED` as
    0x0fff_ffff_0000_0000, which is non-canonical on all extant
    versions of x86-64, even with 5-level paging, but maybe they do
    not because they're anticipating 6-level paging showing up at
    some point.
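Whether a value like 0x0fff_ffff_0000_0000 is non-canonical can be checked with a few shifts. This C sketch assumes the usual x86-64 rule that all bits from 63 down to the top implemented address bit must be equal, with 48 implemented bits under 4-level paging and 57 under 5-level paging; `is_canonical` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* va_bits is the number of implemented virtual-address bits
 * (48 for 4-level paging, 57 for 5-level paging). */
static bool is_canonical(uint64_t addr, unsigned va_bits)
{
    /* Bits 63..(va_bits-1) must be all zeros or all ones. */
    uint64_t upper = addr >> (va_bits - 1);
    uint64_t all_ones = UINT64_MAX >> (va_bits - 1);
    return upper == 0 || upper == all_ones;
}
```

0x0fffffff00000000 is non-canonical for both widths, whereas the current MAP_FAILED, (void *)-1, is canonical (all bits set); it is unambiguous only because it can never be a page-aligned mapping address.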

    And when you look
    closely, you find how the system calls are designed to support returning
    the error indication, success value, and errno in one register.

    lseek64 on 32-bit platforms is an exception (the success value does
    not fit in one register), and looking at the machine code of the
    wrapper and comparing it with the machine code for the lseek wrapper,
    some funny things are going on, but I would have to look at the source
    code to understand what is going on. One other interesting thing I
    noticed is that the system call wrappers from libc-2.36 on i386 now
    draw the boundary between success returns and error returns at
    0xfffff000:

    0xf7d853c4 <lseek+68>: call *%gs:0x10
    0xf7d853cb <lseek+75>: cmp $0xfffff000,%eax
    0xf7d853d0 <lseek+80>: ja 0xf7d85410 <lseek+144>

    So now the kernel can produce 4095 error values, and the rest can be
    success values. In particular, mmap() can return all possible page
    addresses as success values with these wrappers. When I last looked
    at how system calls are done, I found just a check of the N or the C
    flag.

    Yes; see above.

    I wonder how the kernel is informed that it can now return more
    addresses from mmap().

    Assuming you mean the Linux kernel, when it loads an ELF
    executable, the binary image itself is "branded" with an ABI
    type that it can use to make that determination.

    - Dan C.

  • From Dan Cross@21:1/5 to Scott Lurndal on Wed Aug 13 19:40:17 2025
    In article <%C4nQ.6540$[email protected]>,
    Scott Lurndal <[email protected]> wrote:
    [snip]
    all that said, my initial point about -1 was that applications
    should always check for -1 (or MAP_FAILED), not for return
    values less than zero. The actual kernel interface to the
    C library is clearly implementation dependent although it
    must preserve the user-visible required semantics.

    For some reason, I have a vague memory of reading somewhere that
    it was considered "more robust" to check for a negative return
    value, and not just -1 specifically. Perhaps this was just
    superstition, or perhaps someone had been bit by an overly
    permissive environment. It certainly seems like advice that we
    can safely discard at this point.

    - Dan C.

  • From Scott Lurndal@21:1/5 to Dan Cross on Wed Aug 13 20:28:13 2025
    [email protected] (Dan Cross) writes:
    In article <MO1nQ.2$[email protected]>, Scott Lurndal <[email protected]> wrote:
    [email protected] (Anton Ertl) writes:
    [email protected] (Scott Lurndal) writes:

    For mmap, at least the only documented error return value is
    `MAP_FAILED`, and programmers must check for that explicitly.

    It strikes me that this implies that the _value_ of `MAP_FAILED`
    need not be -1; on x86_64, for instance, it _could_ be any
    non-canonical address.

    And in the very unlikely case that a C compiler was developed
    for the Burroughs B4900, MAP_FAILED could be 0xC0EEEEEE (which
    is how the NULL pointer was encoded in the hardware). Because
    all the data was BCD, undigits (a-f) in an address were
    unconditionally illegal.

    There were instructions to search linked lists, so the hardware
    needed to understand the concept of a NULL pointer (as well
    as deal with the possibility of a loop, using a timer while
    the search instruction was executing).

  • From Anton Ertl@21:1/5 to Dan Cross on Wed Aug 13 21:23:34 2025
    [email protected] (Dan Cross) writes:
    In article <[email protected]>,
    Anton Ertl <[email protected]> wrote:
    For lseek(2):

    | Upon successful completion, lseek() returns the resulting offset
    | location as measured in bytes from the beginning of the file.

    Given that off_t is signed, lseek(2) can only return positive values.

    This is incorrect; or rather, it's accidentally correct now, but
    was not previously. The 1990 POSIX standard did not explicitly
    forbid a file that was so large that the offset could
    overflow, which is why in 1990 POSIX you have to be careful about
    error handling when using `lseek`.

    It is true that POSIX 2024 _does_ prohibit seeking so far that
    the offset would become negative, however.

    I don't think that this is accidental. In 1990 signed overflow had
    reliable behaviour on common 2s-complement hardware with the C
    compilers of the day. Nowadays the exotic hardware where this would
    not work that way has almost completely died out (and C is not used on
    the remaining exotic hardware), but now compilers sometimes do funny
    things on integer overflow, so it is better not to go there or anywhere
    near it.

    But, POSIX 2024
    (still!!) supports multiple definitions of `off_t` for multiple
    environments, in which overflow is potentially unavoidable.

    POSIX also has the EOVERFLOW error for exactly that case.
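The advice to stay away from signed overflow, together with the EOVERFLOW error, suggests how offset arithmetic can be done safely. This is a sketch, not any libc's implementation: `add_offset` is a hypothetical helper, and it uses int64_t rather than off_t because the width of off_t depends on the compilation environment.

```c
#include <errno.h>
#include <stdint.h>

/* Computes base + delta without ever performing a signed overflow
 * (undefined behavior in C). Returns EOVERFLOW if the sum is not
 * representable, EINVAL if it would be negative, 0 on success. */
static int add_offset(int64_t base, int64_t delta, int64_t *out)
{
    if (delta > 0 && base > INT64_MAX - delta)
        return EOVERFLOW;          /* sum would exceed INT64_MAX */
    if (delta < 0 && base < INT64_MIN - delta)
        return EOVERFLOW;          /* sum would go below INT64_MIN */
    int64_t sum = base + delta;    /* now known to be in range */
    if (sum < 0)
        return EINVAL;             /* negative file offsets are invalid */
    *out = sum;
    return 0;
}
```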

    Bottom line: The off_t returned by lseek(2) is signed and always
    positive.

    For mmap(2):

    | On success, mmap() returns a pointer to the mapped area.

    So it's up to the kernel which user-level addresses it returns. E.g.,
    32-bit Linux originally only produced user-level addresses below 2GB.
    When memories grew larger, on some architectures (e.g., i386) Linux
    increased that to 3GB.

    The point is that the programmer shouldn't have to care.

    True, but completely misses the point.

    Sure, but system calls are first introduced in real kernels using the
    actual system call interface, and are limited by that interface. And
    that interface is remarkably similar between the early days of Unix
    and recent Linux kernels for various architectures.

    Not precisely. On x86_64, for example, some Unixes use a flag
    bit to determine whether the system call failed, and return
    (positive) errno values; Linux returns negative numbers to
    indicate errors, and constrains those to values between -4095
    and -1.

    Presumably that specific set of values is constrained by `mmap`:
    assuming a minimum 4KiB page size, the last architecturally
    valid address where a page _could_ be mapped is equivalent to
    -4096 and the first is 0. If they did not have that constraint,
    they'd have to treat `mmap` specially in the system call path.

    I am pretty sure that in the old times, Linux-i386 indicated failure
    by returning a value with the MSB set, and the wrapper just checked
    whether the return value was negative. And for mmap() that worked
    because user-mode addresses were all below 2GB. Addresses further up
    were reserved for the kernel.

    I wonder how the kernel is informed that it can now return more
    addresses from mmap().

    Assuming you mean the Linux kernel, when it loads an ELF
    executable, the binary image itself is "branded" with an ABI
    type that it can use to make that determination.

    I have checked that with binaries compiled in 2003 and 2000:

    -rwxr-xr-x 1 root root 44660 Sep 26 2000 /usr/local/bin/gforth-0.5.0*
    -rwxr-xr-x 1 root root 92352 Sep 7 2003 /usr/local/bin/gforth-0.6.2*

    [~:160080] file /usr/local/bin/gforth-0.5.0
    /usr/local/bin/gforth-0.5.0: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, stripped
    [~:160081] file /usr/local/bin/gforth-0.6.2
    /usr/local/bin/gforth-0.6.2: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 2.0.0, stripped

    So there is actually a difference between these two. However, if I
    just strace them as they are now, they both happily produce very high
    addresses with mmap, e.g.,

    mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf7f64000

    I don't know what the difference is between "for GNU/Linux 2.0.0" and
    not having that, but the addresses produced by mmap() seem unaffected.

    However, by calling the binaries with setarch -L, mmap() returns only
    addresses < 2GB in all calls I have looked at. I guess if I had
    statically linked binaries, i.e., with old system call wrappers, I
    would have to use

    setarch -L <binary>

    to make it work properly with mmap(). Or maybe Linux is smart enough
    to do it by itself when it encounters a statically-linked old binary.
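On Linux, `setarch -L` (`--addr-compat-layout`) sets the ADDR_COMPAT_LAYOUT personality flag, and a process can query its own personality by passing 0xffffffff to personality(2). This Linux-specific sketch (helper name hypothetical) only reports the flag; it does not show how the kernel then chooses mmap addresses.

```c
#define _GNU_SOURCE
#include <sys/personality.h>

/* Returns 1 if ADDR_COMPAT_LAYOUT is set for this process, 0 if not,
 * and -1 if the query itself fails. */
static int has_compat_layout(void)
{
    int persona = personality(0xffffffff);   /* query without changing */
    if (persona == -1)
        return -1;
    return (persona & ADDR_COMPAT_LAYOUT) != 0;
}
```

Run normally, this would be expected to report 0; under `setarch -L <binary>` it would be expected to report 1.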

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From Anton Ertl@21:1/5 to Anton Ertl on Thu Aug 14 07:58:41 2025
    [email protected] (Anton Ertl) writes:
    I am pretty sure that in the old times, Linux-i386 indicated failure
    by returning a value with the MSB set, and the wrapper just checked
    whether the return value was negative.

    I have now checked this by chrooting into an old Red Hat 6.2 system
    (not RHEL) with glibc-2.1.3 (released in Feb 2000) and its system call wrappers. And already those wrappers use the current way of
    determining whether a system call returns an error or not:

    For mmap():

    0xf7fd984b <__mmap+11>: int $0x80
    0xf7fd984d <__mmap+13>: mov %edx,%ebx
    0xf7fd984f <__mmap+15>: cmp $0xfffff000,%eax
    0xf7fd9854 <__mmap+20>: ja 0xf7fd9857 <__mmap+23>

    Bottom line: If Linux-i386 ever had a different way of determining
    whether a system call has an error result, it was changed to the
    current way early on. Given that IIRC I looked into that later than
    in 2000, my memory is obviously not of Linux. I must have looked at
    source code for a different system.

    Actually, the whole wrapper is short enough to easily understand what
    is going on:

    0xf7fd9840 <__mmap>: mov %ebx,%edx
    0xf7fd9842 <__mmap+2>: mov $0x5a,%eax
    0xf7fd9847 <__mmap+7>: lea 0x4(%esp,1),%ebx
    0xf7fd984b <__mmap+11>: int $0x80
    0xf7fd984d <__mmap+13>: mov %edx,%ebx
    0xf7fd984f <__mmap+15>: cmp $0xfffff000,%eax
    0xf7fd9854 <__mmap+20>: ja 0xf7fd9857 <__mmap+23>
    0xf7fd9856 <__mmap+22>: ret
    0xf7fd9857 <__mmap+23>: push %ebx
    0xf7fd9858 <__mmap+24>: call 0xf7fd985d <__mmap+29>
    0xf7fd985d <__mmap+29>: pop %ebx
    0xf7fd985e <__mmap+30>: xor %edx,%edx
    0xf7fd9860 <__mmap+32>: add $0x400b,%ebx
    0xf7fd9866 <__mmap+38>: sub %eax,%edx
    0xf7fd9868 <__mmap+40>: push %edx
    0xf7fd9869 <__mmap+41>: call 0xf7fd7f80 <__errno_location>
    0xf7fd986e <__mmap+46>: pop %ecx
    0xf7fd986f <__mmap+47>: pop %ebx
    0xf7fd9870 <__mmap+48>: mov %ecx,(%eax)
    0xf7fd9872 <__mmap+50>: or $0xffffffff,%eax
    0xf7fd9875 <__mmap+53>: jmp 0xf7fd9856 <__mmap+22>

    One interesting difference from the current way of invoking a system
    call is that (as far as I understand the wrapper) the wrapper loads
    the arguments from memory (IA-32 ABI passes parameters on the stack)
    into registers and then performs the system call in some newfangled
    way, whereas here the arguments are left in memory, and apparently a
    pointer to the first argument is passed in %ebx; the system call is
    invoked in the old way: int $0x80.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From Anton Ertl@21:1/5 to Anton Ertl on Thu Aug 14 13:28:31 2025
    [email protected] (Anton Ertl) writes:
    Bottom line: If Linux-i386 ever had a different way of determining
    whether a system call has an error result, it was changed to the
    current way early on. Given that IIRC I looked into that later than
    in 2000, my memory is obviously not of Linux. I must have looked at
    source code for a different system.

    I looked around and found
    <[email protected]>. I mentioned the Linux
    approach there, but apparently it did not stick in my memory. I
    linked to <http://stackoverflow.com/questions/36845866/history-of-using-negative-errno-values-in-gnu>,
    and there fuz writes:

    |Historically, system calls returned either a positive value (in case
    |of success) or a negative value indicating an error code. This has
    |been the case from the very beginning of UNIX as far as I'm concerned.

    and Steve Summit earlier writes essentially the same. But Lars
    Brinkhoff read my posting and contradicted Steve Summit and fuz, e.g.:

    |PDP-11 Unix V1 does not do this. When there's an error, the system
    |call sets the carry flag in the status register, and returns the error
    |code in register R0. On success, the carry flag is cleared, and R0
    |holds a return value. Unix V7 does the same.

    Why do I know he read my posting? Because he wrote a followup: <[email protected]>.

    In <[email protected]> I wrote:

    |Some Linux ports use a second register to indicate that there is an
    |error, and SPARC even uses the carry flag.

    So apparently I had looked at the source code of the C wrappers (or of
    the Linux kernel) at that point. I definitely remember finding this
    in some source code.
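The two conventions described in this post can be modelled in C. One uses a status flag, such as the PDP-11 or SPARC carry bit, alongside an error code in the result register. The other puts a negated errno value in the result register itself. This is a minimal sketch, not any real libc's code. `struct raw_syscall_result`, `wrap_flag_convention` and `wrap_negative_convention` are hypothetical names standing in for what an assembler stub does with the registers right after the trap. The -4095..-1 error range is the Linux convention discussed later in the thread.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of what a stub sees when the trap returns. */
struct raw_syscall_result {
    bool carry;   /* status flag (e.g. carry bit): set on error */
    long r0;      /* result register */
};

/* Flag convention (PDP-11 Unix V1/V7 style): if the flag is set,
   r0 holds a positive errno code; otherwise r0 is the result. */
long wrap_flag_convention(struct raw_syscall_result raw)
{
    if (raw.carry) {
        errno = (int)raw.r0;
        return -1;
    }
    return raw.r0;
}

/* Negative-return convention (Linux style): values in -4095..-1 are
   negated errno codes; every other value is a successful result. */
long wrap_negative_convention(long raw)
{
    if (raw >= -4095 && raw <= -1) {
        errno = (int)-raw;
        return -1;
    }
    return raw;
}
```

In both cases the caller sees the same POSIX C binding: -1 with `errno` set on failure, the raw result otherwise.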

    - anton

  • From Dan Cross@21:1/5 to Anton Ertl on Thu Aug 14 15:14:56 2025
    In article <[email protected]>,
    Anton Ertl <[email protected]> wrote:
    [email protected] (Dan Cross) writes:
    In article <[email protected]>,
    Anton Ertl <[email protected]> wrote:
    For lseek(2):

    | Upon successful completion, lseek() returns the resulting offset
    | location as measured in bytes from the beginning of the file.

    Given that off_t is signed, lseek(2) can only return positive values.

    This is incorrect; or rather, it's accidentally correct now, but
    was not previously. The 1990 POSIX standard did not explicitly
    forbid a file that was so large that the offset could
    overflow, which is why under POSIX.1-1990 you have to be careful
    about error handling when using `lseek`.

    It is true that POSIX 2024 _does_ prohibit seeking so far that
    the offset would become negative, however.

    I don't think that this is accidental. In 1990 signed overflow had
    reliable behaviour on common 2s-complement hardware with the C
    compilers of the day.

    This is simply not true. If anything, there was more variety of
    hardware supported by C90, and some of those systems were 1's
    complement or sign/mag, not 2's complement. Consequently,
    signed integer overflow has _always_ had undefined behavior in
    ANSI/ISO C.

    However, conversion from signed to unsigned has always been
    well-defined, and follows effectively 2's complement semantics.

    Conversion from unsigned to signed is a bit more complex, and is
    implementation-defined, but not UB. Given that the system call
    interface is necessarily deeply intertwined with the implementation,
    I see no reason why the semantics of signed overflow should be
    an issue here.
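Here is a small illustration of the first point. Converting a negative value to an unsigned type is defined as reduction modulo 2^N, so the results checked below hold on every conforming implementation, however it represents signed integers. The reverse direction, an out-of-range unsigned value to a signed type, is implementation-defined, so the sketch deliberately stays on the signed-to-unsigned side. `as_u32` is a hypothetical helper name.

```c
#include <stdint.h>

/* Signed -> unsigned conversion is defined by value as reduction
   modulo 2^N (C11 6.3.1.3p2), so the result does not depend on the
   implementation's signed representation. */
uint32_t as_u32(int32_t v)
{
    return (uint32_t)v;
}
```

For example, `as_u32(-1)` is `0xFFFFFFFF` and `as_u32(-4095)` is `0xFFFFF001` everywhere. This is why a stub can compare against -4095 "as an unsigned value" without relying on anything implementation-specific.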

    Nowadays the exotic hardware where this would
    not work that way has almost completely died out (and C is not used on
    the remaining exotic hardware),

    If by "C is not used" you mean newer editions of the C standard
    are not used on very old computers with strange representations
    of signed integers, then maybe.

    but now compilers sometimes do funny
    things on integer overflow, so better not go there, or anywhere near
    it.

    This isn't about signed overflow. The issue here is conversion
    of an unsigned value to signed; almost certainly, the kernel
    performs the calculation of the actual file offset using
    unsigned arithmetic, and relies on the (assembler, mind you)
    system call stubs to map those to the appropriate userspace
    type.

    I think this is mostly irrelevant, as the system call stub,
    almost by necessity, must be written in assembler in order to
    have precise control over the use of specific registers and so
    on. From C's perspective, a program making a system call just
    calls some function that's defined to return a signed integer;
    the assembler code that swizzles the register that integer will
    be extracted from sets things up accordingly. In other words,
    the conversion operation that the C standard mentions isn't at
    play, since the code that does the "conversion" is in assembly.
    Again from C's perspective the return value of the syscall stub
    function is already signed with no need of conversion.

    No, for `lseek`, the POSIX rationale explains the reasoning here
    quite clearly: the 1990 standard permitted negative offsets, and
    programs were expected to accommodate this by special handling
    of `errno` before and after calls to `lseek` that returned
    negative values. This was deemed onerous and fragile, so they
    modified the standard to prohibit calls that would result in
    negative offsets.
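The POSIX.1-1990-era idiom described above can be sketched as follows. It clears `errno` before the call and treats a negative result as a failure only if `errno` was set. `checked_lseek` is a hypothetical helper, not a standard function. Under current POSIX a successful `lseek` never returns a negative offset, so comparing the result with `(off_t)-1` is enough on its own; `EOVERFLOW` covers offsets that do not fit in `off_t`.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: defensive lseek in the POSIX.1-1990 style.
   Returns 0 and stores the new offset in *out on success, or returns
   the errno value on failure. */
int checked_lseek(int fd, off_t offset, int whence, off_t *out)
{
    errno = 0;                        /* clear before the call */
    off_t r = lseek(fd, offset, whence);
    if (r < 0 && errno != 0)
        return errno;                 /* genuine failure: EBADF, ESPIPE, EINVAL, EOVERFLOW, ... */
    *out = r;                         /* under POSIX.1-1990 this could be negative */
    return 0;
}
```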

    But, POSIX 2024
    (still!!) supports multiple definitions of `off_t` for multiple
    environments, in which overflow is potentially unavoidable.

    POSIX also has the EOVERFLOW error for exactly that case.

    Bottom line: The off_t returned by lseek(2) is signed and always
    positive.

    As I said earlier, post POSIX.1-1990, this is true.

    For mmap(2):

    | On success, mmap() returns a pointer to the mapped area.

    So it's up to the kernel which user-level addresses it returns. E.g.,
    32-bit Linux originally only produced user-level addresses below 2GB.
    When memories grew larger, on some architectures (e.g., i386) Linux
    increased that to 3GB.

    The point is that the programmer shouldn't have to care.

    True, but completely misses the point.

    I don't see why. You were talking about the system call stubs,
    which run in userspace, and are responsible for setting up state
    so that the kernel can perform some requested action on entry,
    whether by trap, call gate, or special instruction, and then for
    tearing down that state and handling errors on return from the
    kernel.

    For mmap, there is exactly one value that may be returned from
    its stub that indicates an error; any other value, by
    definition, represents a valid mapping. Whether such a mapping
    falls in the first 2G, 3G, anything except the upper 256MiB, or
    some hole in the middle is the part that's irrelevant, and
    focusing on that misses the main point: all the stub has to do
    is detect the error, using whatever convention the kernel
    specifies for communicating such things back to the program, and
    ensure that in an error case, MAP_FAILED is returned from the
    stub and `errno` is set appropriately. Everything else is
    superfluous.
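Below is a minimal sketch of the stub logic just described. It assumes the Linux convention, raised further down in this thread, that the kernel reports failure as a negated errno value in the range -4095..-1. `mmap_result_from_raw` is a hypothetical name; real C libraries perform this step in assembler immediately after the system call instruction.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical post-processing step of an mmap stub. `raw` is the
   value left in the result register, viewed as an unsigned word.
   Only one outcome signals failure to the caller: MAP_FAILED with
   errno set. Every other value is a mapping address, whether or not
   its top bit happens to be set. */
void *mmap_result_from_raw(uintptr_t raw)
{
    if (raw > (uintptr_t)-4096) {     /* raw is one of -4095..-1 */
        errno = (int)(0 - raw);       /* recover the positive errno */
        return MAP_FAILED;
    }
    return (void *)raw;
}
```

Note that 0xf7f64000, an ordinary 32-bit mapping address with its top bit set, passes through unchanged.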

    Sure, but system calls are first introduced in real kernels using the
    actual system call interface, and are limited by that interface. And
    that interface is remarkably similar between the early days of Unix
    and recent Linux kernels for various architectures.

    Not precisely. On x86_64, for example, some Unixes use a flag
    bit to determine whether the system call failed, and return
    (positive) errno values; Linux returns negative numbers to
    indicate errors, and constrains those to values between -4095
    and -1.

    Presumably that specific set of values is constrained by `mmap`:
    assuming a minimum 4KiB page size, the last architecturally
    valid address where a page _could_ be mapped is equivalent to
    -4096 and the first is 0. If they did not have that constraint,
    they'd have to treat `mmap` specially in the system call path.
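The page-size argument can be checked mechanically. Page-aligned addresses are multiples of 4096. Because 2^N is itself a multiple of 4096, the word value of -k is congruent to 4096 - k modulo 4096. For k in 1..4095 that is never zero, while -4096 itself is aligned. The sketch assumes 4 KiB pages, as the paragraph does; `error_range_collides_with_pages` and `ASSUMED_PAGE_SIZE` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define ASSUMED_PAGE_SIZE 4096u   /* assumption from the text: 4 KiB pages */

/* True if any value in the error range -4095..-1, viewed as an
   unsigned word, is page-aligned and so could be a valid mmap result. */
bool error_range_collides_with_pages(void)
{
    for (uintptr_t k = 1; k <= 4095; k++) {
        uintptr_t v = (uintptr_t)0 - k;   /* the word value of -k */
        if (v % ASSUMED_PAGE_SIZE == 0)
            return true;
    }
    return false;
}
```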

    I am pretty sure that in the old times, Linux-i386 indicated failure
    by returning a value with the MSB set, and the wrapper just checked
    whether the return value was negative. And for mmap() that worked
    because user-mode addresses were all below 2GB. Addresses further up
    were reserved for the kernel.

    Define "Linux-i386" in this case. For the kernel, I'm confident
    that was NOT the case, and it is easy enough to research, since
    old kernel versions are online. Looking at e.g. 0.99.15, one
    can see that they set the carry bit in the flags register to
    indicate an error, along with returning a negative errno value: https://kernel.googlesource.com/pub/scm/linux/kernel/git/nico/archive/+/refs/tags/v0.99.15/kernel/sys_call.S

    By 2.0, they'd stopped setting the carry bit, though they
    continued to clear it on entry.

    But remember, `mmap` returns a pointer, not an integer, relying
    on libc to do the necessary translation between whatever the
    kernel returns and what the program expects. So if the behavior
    you describe were anywhere, it would be in libc. Given that
    they have, and had, a mechanism for signaling an error
    independent of C already, and that the fixup of the return
    value must necessarily happen in the syscall stub in whatever
    library the system used, relying solely on negative values to
    detect errors seems like a poor design decision for a C library.

    So if what you're saying were true, such a check would have to
    be in the userspace library that provides the syscall stubs; the
    kernel really doesn't care. I don't know what version of libc
    Torvalds started with, or if he did his own bespoke thing
    initially or something, but looking at some commonly used C
    libraries of a certain age, such as glibc 2.0 from 1997-ish, one
    can see that they're explicitly testing the error status against
    -4095 (as an unsigned value) in the stub. (e.g., in sysdeps/unix/sysv/linux/i386/syscall.S).

    But glibc-1.06.1 is a different story, and _does_ appear to
    simply test whether the return value is negative and then jump
    to an error handler if so. So mmap may have worked incidentally
    due to the restriction on where in the address space it would
    place a mapping in very early kernel versions, as you described,
    but that's a library issue, not a kernel issue: again, the
    kernel doesn't care.

    The old version of libc5 available on kernel.org behaves similarly; it
    looks like HJ Lu changed the error handling path to explicitly
    compare against -4095 in October of 1996.
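The difference between the two library checks can be shown on raw 32-bit register values. At the machine level a "negative" test only looks at the top bit. It therefore misclassifies a high mapping address such as 0xf7f64000 (the address in the strace output later in this thread) as an error, whereas an unsigned comparison against -4095 does not. Both function names are hypothetical. The second mirrors, roughly, an i386 `cmpl $-4095, %eax` followed by a `jae` to the error path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old style (as in glibc-1.06.1, per the text): any result with the
   sign bit set is treated as an error. */
bool is_error_sign_test(uint32_t eax)
{
    return (eax & UINT32_C(0x80000000)) != 0;
}

/* Later style: unsigned compare against -4095, so only the values
   -4095..-1 count as errors. */
bool is_error_range_test(uint32_t eax)
{
    return eax >= (uint32_t)-4095;
}
```

For 0xf7f64000 the first function reports an error and the second does not. Both agree that a result such as -12 (-ENOMEM) is an error.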

    So, fixed in the most common libc's used with Linux on i386 for
    nearly 30 years, well before the existence of x86_64.

    I wonder how the kernel is informed that it can now return more
    addresses from mmap().

    Assuming you mean the Linux kernel, when it loads an ELF
    executable, the binary image itself is "branded" with an ABI
    type that it can use to make that determination.

    I have checked that with binaries compiled in 2003 and 2000:

    -rwxr-xr-x 1 root root 44660 Sep 26 2000 /usr/local/bin/gforth-0.5.0*
    -rwxr-xr-x 1 root root 92352 Sep 7 2003 /usr/local/bin/gforth-0.6.2*

    [~:160080] file /usr/local/bin/gforth-0.5.0
    /usr/local/bin/gforth-0.5.0: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, stripped
    [~:160081] file /usr/local/bin/gforth-0.6.2
    /usr/local/bin/gforth-0.6.2: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 2.0.0, stripped

    So there is actually a difference between these two. However, if I
    just strace them as they are now, they both happily produce very high
    addresses with mmap, e.g.,

    mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf7f64000

    I don't see any reason why it wouldn't.

    I don't know what the difference is between "for GNU/Linux 2.0.0" and
    not having that,

    `file` is pulling that from a `PT_NOTE` segment defined in the
    program header for that second file. A better tool for picking
    apart the details of those binaries is probably `objdump`.
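For reference, the "for GNU/Linux 2.0.0" text comes from an ELF note of type `NT_GNU_ABI_TAG`, usually in a `.note.ABI-tag` section. Its name is "GNU", and its descriptor is four 32-bit words giving the OS (0 for Linux) and a minimum kernel version. binutils' `readelf -n` also prints decoded notes. The sketch below decodes one such note from a buffer already extracted from the `PT_NOTE` segment; locating the segment is left out. `describe_abi_tag` is a hypothetical helper. The sketch assumes glibc's `<elf.h>` and a note in host byte order.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: decode one NT_GNU_ABI_TAG note laid out as an
   Elf32_Nhdr, the name "GNU\0", then four 32-bit words (OS, major,
   minor, subminor). Writes e.g. "GNU/Linux 2.0.0" into out and
   returns 0, or returns -1 if the buffer is not such a note. */
int describe_abi_tag(const unsigned char *note, size_t len,
                     char *out, size_t outlen)
{
    Elf32_Nhdr hdr;
    uint32_t desc[4];

    if (len < sizeof hdr)
        return -1;
    memcpy(&hdr, note, sizeof hdr);
    if (hdr.n_type != NT_GNU_ABI_TAG || hdr.n_namesz != 4 ||
        hdr.n_descsz != sizeof desc ||
        len < sizeof hdr + 4 + sizeof desc ||
        memcmp(note + sizeof hdr, "GNU", 4) != 0)   /* compares "GNU\0" */
        return -1;
    memcpy(desc, note + sizeof hdr + 4, sizeof desc);
    snprintf(out, outlen, "%s %u.%u.%u",
             desc[0] == ELF_NOTE_OS_LINUX ? "GNU/Linux" : "other OS",
             (unsigned)desc[1], (unsigned)desc[2], (unsigned)desc[3]);
    return 0;
}
```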

    I'm mildly curious what version of libc those are linked against
    (e.g., as reported by `ldd`).

    but the addresses produced by mmap() seem unaffected.

    I don't see why it would be. Any common libc post 1997-ish
    handles errors in a way that permits this to work correctly. If
    you tried glibc 1.0, it might be a different story, but the
    Linux folks forked that in 1994 and modified it as "Linux libc"
    and the

    However, by calling the binaries with setarch -L, mmap() returns only
    addresses < 2GB in all calls I have looked at. I guess if I had
    statically linked binaries, i.e., with old system call wrappers, I
    would have to use

    setarch -L <binary>

    to make it work properly with mmap(). Or maybe Linux is smart enough
    to do it by itself when it encounters a statically-linked old binary.

    Unclear without looking at the kernel source code, but possibly.
    `setarch -L` turns on the "legacy" virtual address space layout,
    but I suspect that the number of binaries that _actually care_
    is pretty small, indeed.
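On Linux, `setarch -L` (long form `--addr-compat-layout`) sets the `ADDR_COMPAT_LAYOUT` bit in the process personality and then executes the program. The personality is normally inherited across `execve`. Below is a minimal sketch of how a program could detect that it was started this way, assuming Linux with glibc's `<sys/personality.h>`. Calling `personality(0xffffffff)` only queries the current value. `running_with_compat_layout` is a hypothetical name.

```c
#include <stdbool.h>
#include <sys/personality.h>

/* Returns true if the legacy ("compat") address-space layout was
   requested, e.g. because the program was started via setarch -L.
   Passing 0xffffffff queries the personality without changing it. */
bool running_with_compat_layout(void)
{
    int persona = personality(0xffffffff);
    if (persona == -1)
        return false;                 /* query failed; assume default layout */
    return (persona & ADDR_COMPAT_LAYOUT) != 0;
}
```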

    - Dan C.

  • From Scott Lurndal@21:1/5 to Dan Cross on Thu Aug 14 15:32:40 2025
    [email protected] (Dan Cross) writes:
    In article <[email protected]>,
    Anton Ertl <[email protected]> wrote:
    [email protected] (Dan Cross) writes:
    In article <[email protected]>,
    Anton Ertl <[email protected]> wrote:
    For lseek(2):

    | Upon successful completion, lseek() returns the resulting offset
    | location as measured in bytes from the beginning of the file.

    Given that off_t is signed, lseek(2) can only return positive values.

    This is incorrect; or rather, it's accidentally correct now, but
    was not previously. The 1990 POSIX standard did not explicitly
    forbid a file that was so large that the offset couldn't
    overflow, hence why in 1990 POSIX you have to be careful about
    error handling when using `lseek`.

    It is true that POSIX 2024 _does_ prohibit seeking so far that
    the offset would become negative, however.

    I don't think that this is accidental. In 1990 signed overflow had
    reliable behaviour on common 2s-complement hardware with the C
    compilers of the day.

    This is simply not true. If anything, there was more variety of
    hardware supported by C90, and some of those systems were 1's
    complement or sign/mag, not 2's complement. Consequently,
    signed integer overflow has _always_ had undefined behavior in
    ANSI/ISO C.

    Both Burroughs Large Systems (48-bit stack machine) and the
    Sperry 1100/2200 (36-bit) systems had (have, in emulation today)
    C compilers.

  • From Dan Cross@21:1/5 to Scott Lurndal on Thu Aug 14 15:44:34 2025
    In article <sknnQ.168942$[email protected]>,
    Scott Lurndal <[email protected]> wrote:
    [email protected] (Dan Cross) writes:
    In article <[email protected]>,
    Anton Ertl <[email protected]> wrote:
    [email protected] (Dan Cross) writes:
    In article <[email protected]>,
    Anton Ertl <[email protected]> wrote:
    For lseek(2):

    | Upon successful completion, lseek() returns the resulting offset
    | location as measured in bytes from the beginning of the file.

    Given that off_t is signed, lseek(2) can only return positive values.

    This is incorrect; or rather, it's accidentally correct now, but
    was not previously. The 1990 POSIX standard did not explicitly
    forbid a file that was so large that the offset couldn't
    overflow, hence why in 1990 POSIX you have to be careful about
    error handling when using `lseek`.

    It is true that POSIX 2024 _does_ prohibit seeking so far that
    the offset would become negative, however.

    I don't think that this is accidental. In 1990 signed overflow had
    reliable behaviour on common 2s-complement hardware with the C
    compilers of the day.

    This is simply not true. If anything, there was more variety of
    hardware supported by C90, and some of those systems were 1's
    complement or sign/mag, not 2's complement. Consequently,
    signed integer overflow has _always_ had undefined behavior in
    ANSI/ISO C.

    Both Burroughs Large Systems (48-bit stack machine) and the
    Sperry 1100/2200 (36-bit) systems had (have, in emulation today)
    C compilers.

    Yup. The 1100-series machines were (are) 1's complement. Those
    are the ones I usually think of when cursing that signed integer
    overflow is UB in C.

    I don't think anyone is compiling C23 code for those machines,
    but back in the late 1980s, they were still enough of a going
    concern that they could influence the emerginc C standard. Not
    so much anymore.

    Regardless, signed integer overflow remains UB in the current C
    standard, never mind definitionally following 2's complement
    semantics. Usually this is justified on the basis of performance
    arguments: some seemingly-important loop optimizations can be
    made if the compiler can assume that overflow Cannot Happen.

    And of course, even today, C still targets oddball platforms
    like DSPs and custom chips, where assumptions about the ubiquity
    of 2's comp may not hold.
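Because the compiler may assume that signed overflow never happens, a portable way to add two `int` values safely is to test the operands before the addition rather than inspecting the result afterwards. Below is a minimal sketch; `checked_add` is a hypothetical name. GCC and Clang also provide `__builtin_add_overflow` for the same purpose.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *sum and returns true only if the mathematical
   result fits in an int. The range test runs before the addition,
   so no overflowing (undefined) operation is ever evaluated. */
bool checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;
    return true;
}
```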

    - Dan C.

  • From Dan Cross@21:1/5 to Dan Cross on Thu Aug 14 15:25:13 2025
    In article <107kuhg$8ks$[email protected]>,
    Dan Cross <[email protected]> wrote:
    I don't see why it would be. Any common libc post 1997-ish
    handles errors in a way that permits this to work correctly. If
    you tried glibc 1.0, it might be a different story, but the
    Linux folks forked that in 1994 and modified it as "Linux libc"
    and the

    ...and the Linux folks changed this to the present mechanism in
    1996.

    (Sorry 'bout that.)

    - Dan C.
