[email protected] (Scott Lurndal) writes:
That said, Unix generally defined -1 as the return value for all
other system calls, and code that checked for "< 0" instead of
-1 when calling a standard library function or system call was fundamentally broken.
That may be the interface of the C system call wrapper,
errno, but at the actual system call level, the error is indicated in
an architecture-specific way, and the ones I have looked at before
today use the sign of the result register or the carry flag. On those
architectures, where the sign is used, mmap(2) cannot return negative
addresses, or must have a special wrapper.
Let's look at what the system call wrappers do on RV64G(C) (which has
no carry flag). For read(2) the wrapper contains:
0x3ff7f173be <read+20>: ecall
0x3ff7f173c2 <read+24>: lui a5,0xfffff
0x3ff7f173c4 <read+26>: mv s0,a0
0x3ff7f173c6 <read+28>: bltu a5,a0,0x3ff7f1740e <read+100>
For dup(2) the wrapper contains:
0x3ff7e7fe9a <dup+2>: ecall
0x3ff7e7fe9e <dup+6>: lui a7,0xfffff
0x3ff7e7fea0 <dup+8>: bltu a7,a0,0x3ff7e7fea6 <dup+14>
and for mmap(2):
0x3ff7e86b6e <mmap64+12>: ecall
0x3ff7e86b72 <mmap64+16>: lui a5,0xfffff
0x3ff7e86b74 <mmap64+18>: bltu a5,a0,0x3ff7e86b8c <mmap64+42>
So instead of checking for the sign flag, on RV64G the wrapper checks
if the result is >0xfffffffffffff000 (lui sign-extends its 32-bit
result on RV64). This costs one instruction more than just checking
the sign flag, and allows almost doubling the number of bytes read(2)
can read in one call, the number of file ids that can be returned by
dup(2), and the address range returnable by mmap(2). Will we ever see
processes that need more than 8EB? Maybe
not, but the designers of the RV64G(C) ABI obviously did not want to
be the ones that are quoted as saying "8EB should be enough for
anyone":-).
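The two checks can be compared in a small C sketch. The function names are made up for illustration; the threshold is modelled on what `lui a5,0xfffff` leaves in the register on RV64, assuming the usual rule that the 20-bit immediate goes to bits 31..12 and the 32-bit result is sign-extended.

```c
#include <stdint.h>

/* Value that `lui rd,imm20` leaves in rd on RV64: the immediate is
   placed in bits 31..12 and the 32-bit result is sign-extended. */
uint64_t lui_value(uint32_t imm20)
{
    uint64_t v = (uint64_t)(imm20 & 0xfffffu) << 12;
    if (v & UINT64_C(0x80000000))
        v |= UINT64_C(0xffffffff00000000);   /* sign extension */
    return v;
}

/* Sign-based check: every value with the top bit set is an error. */
int is_error_by_sign(uint64_t a0)
{
    return (a0 >> 63) != 0;
}

/* Check corresponding to `bltu a5,a0,...`: error iff a0 is above the
   threshold when both are compared as unsigned values. */
int is_error_by_range(uint64_t a0)
{
    return a0 > lui_value(0xfffffu);
}
```

With these definitions, a value such as 0x8000000000000000 is an error for the sign-based check but a success value for the range check; only the top 4095 values remain reserved for errors.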
Followups to comp.arch
- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <[email protected]>
[email protected] (Anton Ertl) writes:
[email protected] (Scott Lurndal) writes:
That said, Unix generally defined -1 as the return value for all
other system calls, and code that checked for "< 0" instead of
-1 when calling a standard library function or system call was fundamentally broken.
That may be the interface of the C system call wrapper,
It _is_ the interface that the programmers need to be
concerned with when using POSIX C language bindings.
at the actual system call level, the error is indicated in
an architecture-specific way, and the ones I have looked at before
today use the sign of the result register or the carry flag. On those
architectures, where the sign is used, mmap(2) cannot return negative
addresses, or must have a special wrapper.
Why would the wrapper care if the system call failed?
lseek(2) and mmap(2) both require the return of arbitrary 32-bit
or 64-bit values, including those which when interpreted as signed
values are negative.
Clearly POSIX defines the interfaces and the underlying OS and/or
library functions implement the interfaces. The kernel interface
to the language library (e.g. libc) is irrelevant to typical programmers
[email protected] (Scott Lurndal) writes:
[snip]
errno, but at the actual system call level, the error is indicated in
an architecture-specific way, and the ones I have looked at before
today use the sign of the result register or the carry flag. On those
architectures, where the sign is used, mmap(2) cannot return negative
addresses, or must have a special wrapper.
Why would the wrapper care if the system call failed? The
return value from the kernel should be passed through to
the application as per the POSIX language binding requirements.
lseek(2) and mmap(2) both require the return of arbitrary 32-bit
or 64-bit values, including those which when interpreted as signed
values are negative.
Clearly POSIX defines the interfaces and the underlying OS and/or
library functions implement the interfaces. The kernel interface
to the language library (e.g. libc) is irrelevant to typical programmers,
except in the case where it doesn't provide the correct semantics.
[email protected] (Scott Lurndal) writes:
[email protected] (Anton Ertl) writes:
[email protected] (Scott Lurndal) writes:
That said, Unix generally defined -1 as the return value for all
other system calls, and code that checked for "< 0" instead of
-1 when calling a standard library function or system call was fundamentally
broken.
That may be the interface of the C system call wrapper,
It _is_ the interface that the programmers need to be
concerned with when using POSIX C language bindings.
True, but not relevant for the question at hand.
at the actual system call level, the error is indicated in
an architecture-specific way, and the ones I have looked at before
today use the sign of the result register or the carry flag. On those
architectures, where the sign is used, mmap(2) cannot return negative
addresses, or must have a special wrapper.
Why would the wrapper care if the system call failed?
The actual system call returns an error flag and a register. On some
architectures, they support just a register. If there is no error,
the wrapper returns the content of the register. If the system call
indicates an error, you see from the value of the register which error
it is; the wrapper then typically transforms the register in some way
(e.g., by negating it) and stores the result in errno, and returns -1.
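As a rough C model of what such a wrapper does after the kernel returns (real wrappers are written in assembly; both function names and the flag parameter are hypothetical, and the [-4095, -1] error range is an assumption taken from the Linux wrappers discussed in this thread):

```c
#include <errno.h>

/* Convention 1 (hypothetical model): the kernel hands back a register
   plus a separate error flag, e.g. the carry flag. On error the
   register holds a positive error number. */
long finish_flag_style(long reg, int error_flag)
{
    if (error_flag) {
        errno = (int)reg;
        return -1;
    }
    return reg;
}

/* Convention 2: a single register. Values in [-4095, -1] are negated
   error numbers; everything else is a success value. */
long finish_register_style(long reg)
{
    if ((unsigned long)reg > (unsigned long)-4096L) {
        errno = (int)-reg;      /* undo the negation */
        return -1;
    }
    return reg;
}
```

In both cases the caller sees the same thing: -1 with errno set on failure, and the untouched register value on success.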
lseek(2) and mmap(2) both require the return of arbitrary 32-bit
or 64-bit values, including those which when interpreted as signed
values are negative.
For lseek(2):
| Upon successful completion, lseek() returns the resulting offset
| location as measured in bytes from the beginning of the file.
Given that off_t is signed, lseek(2) can only return positive values.
For mmap(2):
| On success, mmap() returns a pointer to the mapped area.
So it's up to the kernel which user-level addresses it returns. E.g.,
32-bit Linux originally only produced user-level addresses below 2GB.
When memories grew larger, on some architectures (e.g., i386) Linux
increased that to 3GB.
Clearly POSIX defines the interfaces and the underlying OS and/or
library functions implement the interfaces. The kernel interface
to the language library (e.g. libc) is irrelevant to typical programmers
Sure, but system calls are first introduced in real kernels using the
actual system call interface, and are limited by that interface. And
that interface is remarkably similar between the early days of Unix
and recent Linux kernels for various architectures.
And when you look
closely, you find how the system calls are designed to support returning
the error indication, success value, and errno in one register.
lseek64 on 32-bit platforms is an exception (the success value does
not fit in one register), and looking at the machine code of the
wrapper and comparing it with the machine code for the lseek wrapper,
some funny things are going on, but I would have to look at the source
code to understand what is going on. One other interesting thing I
noticed is that the system call wrappers from libc-2.36 on i386 now
draw the boundary between success returns and error returns at
0xfffff000:
0xf7d853c4 <lseek+68>: call *%gs:0x10
0xf7d853cb <lseek+75>: cmp $0xfffff000,%eax
0xf7d853d0 <lseek+80>: ja 0xf7d85410 <lseek+144>
So now the kernel can produce 4095 error values, and the rest can be
success values. In particular, mmap() can return all possible page
addresses as success values with these wrappers. When I last looked
at how system calls are done, I found just a check of the N or the C
flag.
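The arithmetic behind the 4095 figure can be spelled out in C. This is only a sketch of the comparison shown in the disassembly; the function names are invented for illustration.

```c
#include <stdint.h>

/* Mirrors `cmp $0xfffff000,%eax; ja error`: an unsigned "above" test. */
int i386_is_error(uint32_t eax)
{
    return eax > UINT32_C(0xfffff000);
}

/* Number of distinct register values classified as errors:
   0xfffff001 .. 0xffffffff. */
uint32_t i386_error_value_count(void)
{
    return UINT32_MAX - UINT32_C(0xfffff000);
}

/* Highest 4 KiB-aligned value; it is not above the boundary, so the
   wrapper treats it as a success value. */
uint32_t i386_highest_page_address(void)
{
    return UINT32_MAX & ~UINT32_C(0xfff);
}
```

The boundary itself, 0xfffff000, is the last page-aligned address and falls on the success side because `ja` is a strict comparison.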
I wonder how the kernel is informed that it can now return more
addresses from mmap().
[snip]
all that said, my initial point about -1 was that applications
should always check for -1 (or MAP_FAILED), not for return
values less than zero. The actual kernel interface to the
C library is clearly implementation dependent although it
must preserve the user-visible required semantics.
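At the application level that advice looks roughly like the following. The two helper functions are hypothetical and exist only to show the comparisons; MAP_ANONYMOUS is assumed to be available, as it is on Linux and the BSDs.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* assumption: glibc, to expose MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if an anonymous mapping of len bytes could be created and
   released again, 0 otherwise. The only failure test is MAP_FAILED. */
int try_anonymous_mapping(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)        /* not: (long)p < 0 */
        return 0;
    return munmap(p, len) == 0;
}

/* Returns 1 if the current offset of fd could be queried, 0 otherwise. */
int can_query_offset(int fd)
{
    off_t pos = lseek(fd, 0, SEEK_CUR);
    return pos != (off_t)-1;    /* not: pos < 0 */
}
```

A mapping at an address with the top bit set passes the MAP_FAILED test but would be misreported by a signed "< 0" test.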
In article <MO1nQ.2$[email protected]>,
Scott Lurndal <[email protected]> wrote:
[email protected] (Anton Ertl) writes:
[email protected] (Scott Lurndal) writes:
For mmap, at least the only documented error return value is
`MAP_FAILED`, and programmers must check for that explicitly.
It strikes me that this implies that the _value_ of `MAP_FAILED`
need not be -1; on x86_64, for instance, it _could_ be any
non-canonical address.
In article <[email protected]>,
Anton Ertl <[email protected]> wrote:
For lseek(2):
| Upon successful completion, lseek() returns the resulting offset
| location as measured in bytes from the beginning of the file.
Given that off_t is signed, lseek(2) can only return positive values.
This is incorrect; or rather, it's accidentally correct now, but
was not previously. The 1990 POSIX standard did not explicitly
forbid a file that was so large that the offset could
overflow, hence why in 1990 POSIX you have to be careful about
error handling when using `lseek`.
It is true that POSIX 2024 _does_ prohibit seeking so far that
the offset would become negative, however.
But, POSIX 2024
(still!!) supports multiple definitions of `off_t` for multiple
environments, in which overflow is potentially unavoidable.
For mmap(2):
| On success, mmap() returns a pointer to the mapped area.
So it's up to the kernel which user-level addresses it returns. E.g.,
32-bit Linux originally only produced user-level addresses below 2GB.
When memories grew larger, on some architectures (e.g., i386) Linux
increased that to 3GB.
The point is that the programmer shouldn't have to care.
Sure, but system calls are first introduced in real kernels using the
actual system call interface, and are limited by that interface. And
that interface is remarkably similar between the early days of Unix
and recent Linux kernels for various architectures.
Not precisely. On x86_64, for example, some Unixes use a flag
bit to determine whether the system call failed, and return
(positive) errno values; Linux returns negative numbers to
indicate errors, and constrains those to values between -4095
and -1.
Presumably that specific set of values is constrained by `mmap`:
assuming a minimum 4KiB page size, the last architecturally
valid address where a page _could_ be mapped is equivalent to
-4096 and the first is 0. If they did not have that constraint,
they'd have to treat `mmap` specially in the system call path.
I wonder how the kernel is informed that it can now return more
addresses from mmap().
Assuming you mean the Linux kernel, when it loads an ELF
executable, the binary image itself is "branded" with an ABI
type that it can use to make that determination.
I am pretty sure that in the old times, Linux-i386 indicated failure
by returning a value with the MSB set, and the wrapper just checked
whether the return value was negative.
Bottom line: If Linux-i386 ever had a different way of determining
whether a system call has an error result, it was changed to the
current way early on. Given that IIRC I looked into that later than
in 2000, my memory is obviously not of Linux. I must have looked at
source code for a different system.
In article <[email protected]>,
Anton Ertl <[email protected]> wrote:
For lseek(2):
| Upon successful completion, lseek() returns the resulting offset
| location as measured in bytes from the beginning of the file.
Given that off_t is signed, lseek(2) can only return positive values.
This is incorrect; or rather, it's accidentally correct now, but
was not previously. The 1990 POSIX standard did not explicitly
forbid a file that was so large that the offset could
overflow, hence why in 1990 POSIX you have to be careful about
error handling when using `lseek`.
It is true that POSIX 2024 _does_ prohibit seeking so far that
the offset would become negative, however.
I don't think that this is accidental. In 1990 signed overflow had
reliable behaviour on common 2s-complement hardware with the C
compilers of the day.
Nowadays the exotic hardware where this would
not work that way has almost completely died out (and C is not used on
the remaining exotic hardware),
but now compilers sometimes do funny
things on integer overflow, so better don't go there or anywhere near
it.
But, POSIX 2024
(still!!) supports multiple definitions of `off_t` for multiple
environments, in which overflow is potentially unavoidable.
POSIX also has the EOVERFLOW error for exactly that case.
Bottom line: The off_t returned by lseek(2) is signed and always
positive.
For mmap(2):
| On success, mmap() returns a pointer to the mapped area.
So it's up to the kernel which user-level addresses it returns. E.g.,
32-bit Linux originally only produced user-level addresses below 2GB.
When memories grew larger, on some architectures (e.g., i386) Linux
increased that to 3GB.
The point is that the programmer shouldn't have to care.
True, but completely misses the point.
Sure, but system calls are first introduced in real kernels using the
actual system call interface, and are limited by that interface. And
that interface is remarkably similar between the early days of Unix
and recent Linux kernels for various architectures.
Not precisely. On x86_64, for example, some Unixes use a flag
bit to determine whether the system call failed, and return
(positive) errno values; Linux returns negative numbers to
indicate errors, and constrains those to values between -4095
and -1.
Presumably that specific set of values is constrained by `mmap`:
assuming a minimum 4KiB page size, the last architecturally
valid address where a page _could_ be mapped is equivalent to
-4096 and the first is 0. If they did not have that constraint,
they'd have to treat `mmap` specially in the system call path.
I am pretty sure that in the old times, Linux-i386 indicated failure
by returning a value with the MSB set, and the wrapper just checked
whether the return value was negative. And for mmap() that worked
because user-mode addresses were all below 2GB. Addresses further up
were reserved for the kernel.
I wonder how the kernel is informed that it can now return more
addresses from mmap().
Assuming you mean the Linux kernel, when it loads an ELF
executable, the binary image itself is "branded" with an ABI
type that it can use to make that determination.
I have checked that with binaries compiled in 2003 and 2000:
-rwxr-xr-x 1 root root 44660 Sep 26 2000 /usr/local/bin/gforth-0.5.0*
-rwxr-xr-x 1 root root 92352 Sep 7 2003 /usr/local/bin/gforth-0.6.2*
[~:160080] file /usr/local/bin/gforth-0.5.0
/usr/local/bin/gforth-0.5.0: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, stripped
[~:160081] file /usr/local/bin/gforth-0.6.2
/usr/local/bin/gforth-0.6.2: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for
GNU/Linux 2.0.0, stripped
So there is actually a difference between these two. However, if I
just strace them as they are now, they both happily produce very high
addresses with mmap, e.g.,
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf7f64000
I don't know what the difference is between "for GNU/Linux 2.0.0" and
not having that,
but the addresses produced by mmap() seem unaffected.
However, by calling the binaries with setarch -L, mmap() returns only
addresses < 2GB in all calls I have looked at. I guess if I had
statically linked binaries, i.e., with old system call wrappers, I
would have to use
setarch -L <binary>
to make it work properly with mmap(). Or maybe Linux is smart enough
to do it by itself when it encounters a statically-linked old binary.
In article <[email protected]>,
Anton Ertl <[email protected]> wrote:
[email protected] (Dan Cross) writes:
In article <[email protected]>,
Anton Ertl <[email protected]> wrote:
For lseek(2):
| Upon successful completion, lseek() returns the resulting offset
| location as measured in bytes from the beginning of the file.
Given that off_t is signed, lseek(2) can only return positive values.
This is incorrect; or rather, it's accidentally correct now, but
was not previously. The 1990 POSIX standard did not explicitly
forbid a file that was so large that the offset could
overflow, hence why in 1990 POSIX you have to be careful about
error handling when using `lseek`.
It is true that POSIX 2024 _does_ prohibit seeking so far that
the offset would become negative, however.
I don't think that this is accidental. In 1990 signed overflow had
reliable behaviour on common 2s-complement hardware with the C
compilers of the day.
This is simply not true. If anything, there was more variety of
hardware supported by C90, and some of those systems were 1's
complement or sign/mag, not 2's complement. Consequently,
signed integer overflow has _always_ had undefined behavior in
ANSI/ISO C.
[email protected] (Dan Cross) writes:
In article <[email protected]>,
Anton Ertl <[email protected]> wrote:
[email protected] (Dan Cross) writes:
In article <[email protected]>,
Anton Ertl <[email protected]> wrote:
For lseek(2):
| Upon successful completion, lseek() returns the resulting offset
| location as measured in bytes from the beginning of the file.
Given that off_t is signed, lseek(2) can only return positive values.
This is incorrect; or rather, it's accidentally correct now, but
was not previously. The 1990 POSIX standard did not explicitly
forbid a file that was so large that the offset could
overflow, hence why in 1990 POSIX you have to be careful about
error handling when using `lseek`.
It is true that POSIX 2024 _does_ prohibit seeking so far that
the offset would become negative, however.
I don't think that this is accidental. In 1990 signed overflow had
reliable behaviour on common 2s-complement hardware with the C
compilers of the day.
This is simply not true. If anything, there was more variety of
hardware supported by C90, and some of those systems were 1's
complement or sign/mag, not 2's complement. Consequently,
signed integer overflow has _always_ had undefined behavior in
ANSI/ISO C.
Both Burroughs Large Systems (48-bit stack machine) and the
Sperry 1100/2200 (36-bit) systems had (have, in emulation today)
C compilers.
In article <[email protected]>,
Anton Ertl <[email protected]> wrote:
[email protected] (Dan Cross) writes:
In article <[email protected]>,
Anton Ertl <[email protected]> wrote:
For lseek(2):
| Upon successful completion, lseek() returns the resulting offset
| location as measured in bytes from the beginning of the file.
Given that off_t is signed, lseek(2) can only return positive values.
This is incorrect; or rather, it's accidentally correct now, but
was not previously. The 1990 POSIX standard did not explicitly
forbid a file that was so large that the offset could
overflow, hence why in 1990 POSIX you have to be careful about
error handling when using `lseek`.
It is true that POSIX 2024 _does_ prohibit seeking so far that
the offset would become negative, however.
I don't think that this is accidental. In 1990 signed overflow had
reliable behaviour on common 2s-complement hardware with the C
compilers of the day.
This is simply not true. If anything, there was more variety of
hardware supported by C90, and some of those systems were 1's
complement or sign/mag, not 2's complement. Consequently,
signed integer overflow has _always_ had undefined behavior in
ANSI/ISO C.
However, conversion from signed to unsigned has always been
well-defined, and follows effectively 2's complement semantics.
Conversion from unsigned to signed is a bit more complex, and is
implementation defined, but not UB. Given that the system call
interface is necessarily deeply intertwined with the implementation
I see no reason why the semantics of signed overflow should be
an issue here.
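A small C sketch of the two conversion directions, with invented helper names. The first relies on the conversion the standard defines; the second avoids the implementation-defined conversion by handling out-of-range values explicitly.

```c
#include <stdint.h>

/* Signed -> unsigned: defined by the C standard as reduction modulo
   2^64, so -4096 becomes 2^64 - 4096. */
uint64_t to_unsigned(int64_t v)
{
    return (uint64_t)v;
}

/* Unsigned -> signed without relying on the implementation-defined
   conversion of out-of-range values. */
int64_t to_signed(uint64_t u)
{
    if (u <= (uint64_t)INT64_MAX)
        return (int64_t)u;                  /* value fits, exact */
    /* u - 2^64, computed so that no intermediate result overflows */
    return -(int64_t)(UINT64_MAX - u) - 1;
}
```

For example, `to_unsigned(-4096)` yields 0xfffffffffffff000, and `to_signed` maps that value back to -4096.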
Nowadays the exotic hardware where this would
not work that way has almost completely died out (and C is not used on
the remaining exotic hardware),
If by "C is not used" you mean newer editions of the C standard
are not used on very old computers with strange representations
of signed integers, then maybe.
but now compilers sometimes do funny
things on integer overflow, so better don't go there or anywhere near
it.
This isn't about signed overflow. The issue here is conversion
of an unsigned value to signed; almost certainly, the kernel
performs the calculation of the actual file offset using
unsigned arithmetic, and relies on the (assembler, mind you)
system call stubs to map those to the appropriate userspace
type.
I think this is mostly irrelevant, as the system call stub,
almost by necessity, must be written in assembler in order to
have precise control over the use of specific registers and so
on. From C's perspective, a program making a system call just
calls some function that's defined to return a signed integer;
the assembler code that swizzles the register that integer will
be extracted from sets things up accordingly. In other words,
the conversion operation that the C standard mentions isn't at
play, since the code that does the "conversion" is in assembly.
Again from C's perspective the return value of the syscall stub
function is already signed with no need of conversion.
No, for `lseek`, the POSIX rationale explains the reasoning here
quite clearly: the 1990 standard permitted negative offsets, and
programs were expected to accommodate this by special handling
of `errno` before and after calls to `lseek` that returned
negative values. This was deemed onerous and fragile, so they
modified the standard to prohibit calls that would result in
negative offsets.
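The errno handling that the older convention called for might look roughly like this in C. `seek_checked` is a hypothetical helper, not part of POSIX or any libc; it only illustrates clearing errno before the call and inspecting it afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Seeks and reports failure through *err (0 on success). A return of
   (off_t)-1 only counts as a failure if the call also set errno. */
off_t seek_checked(int fd, off_t offset, int whence, int *err)
{
    assert(err != NULL);
    errno = 0;                              /* clear before the call */
    off_t pos = lseek(fd, offset, whence);
    if (pos == (off_t)-1 && errno != 0) {   /* inspect after the call */
        *err = errno;
        return (off_t)-1;
    }
    *err = 0;
    return pos;
}
```

With negative offsets prohibited, the plain comparison against (off_t)-1 is enough and the errno dance becomes unnecessary.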
But, POSIX 2024
(still!!) supports multiple definitions of `off_t` for multiple
environments, in which overflow is potentially unavoidable.
POSIX also has the EOVERFLOW error for exactly that case.
Bottom line: The off_t returned by lseek(2) is signed and always
positive.
As I said earlier, post POSIX.1-1990, this is true.
For mmap(2):
| On success, mmap() returns a pointer to the mapped area.
So it's up to the kernel which user-level addresses it returns. E.g.,
32-bit Linux originally only produced user-level addresses below 2GB.
When memories grew larger, on some architectures (e.g., i386) Linux
increased that to 3GB.
The point is that the programmer shouldn't have to care.
True, but completely misses the point.
I don't see why. You were talking about the system call stubs,
which run in userspace, and are responsible for setting up state
so that the kernel can perform some requested action on entry,
whether by trap, call gate, or special instruction, and then for
tearing down that state and handling errors on return from the
kernel.
For mmap, there is exactly one value that may be returned from
its stub that indicates an error; any other value, by
definition, represents a valid mapping. Whether such a mapping
falls in the first 2G, 3G, anything except the upper 256MiB, or
some hole in the middle is the part that's irrelevant, and
focusing on that misses the main point: all the stub has to do
is detect the error, using whatever convention the kernel
specifies for communicating such things back to the program, and
ensure that in an error case, MAP_FAILED is returned from the
stub and `errno` is set appropriately. Everything else is
superfluous.
Sure, but system calls are first introduced in real kernels using the
actual system call interface, and are limited by that interface. And
that interface is remarkably similar between the early days of Unix
and recent Linux kernels for various architectures.
Not precisely. On x86_64, for example, some Unixes use a flag
bit to determine whether the system call failed, and return
(positive) errno values; Linux returns negative numbers to
indicate errors, and constrains those to values between -4095
and -1.
Presumably that specific set of values is constrained by `mmap`:
assuming a minimum 4KiB page size, the last architecturally
valid address where a page _could_ be mapped is equivalent to
-4096 and the first is 0. If they did not have that constraint,
they'd have to treat `mmap` specially in the system call path.
I am pretty sure that in the old times, Linux-i386 indicated failure
by returning a value with the MSB set, and the wrapper just checked
whether the return value was negative. And for mmap() that worked
because user-mode addresses were all below 2GB. Addresses further up
were reserved for the kernel.
Define "Linux-i386" in this case. For the kernel, I'm confident
that was NOT the case, and it is easy enough to research, since
old kernel versions are online. Looking at e.g. 0.99.15, one
can see that they set the carry bit in the flags register to
indicate an error, along with returning a negative errno value:
https://kernel.googlesource.com/pub/scm/linux/kernel/git/nico/archive/+/refs/tags/v0.99.15/kernel/sys_call.S
By 2.0, they'd stopped setting the carry bit, though they
continued to clear it on entry.
But remember, `mmap` returns a pointer, not an integer, relying
on libc to do the necessary translation between whatever the
kernel returns and what the program expects. So if the behavior
you describe were anywhere, it would be in libc. Given that
they have, and had, a mechanism for signaling an error
independent of C already, and necessarily the fixup of the
return value must happen in the syscall stub in whatever library
the system used, relying solely on negative values to detect
errors seems like a poor design decision for a C library.
So if what you're saying were true, such a check would have to
be in the userspace library that provides the syscall stubs; the
kernel really doesn't care. I don't know what version libc
Torvalds started with, or if he did his own bespoke thing
initially or something, but looking at some commonly used C
libraries of a certain age, such as glibc 2.0 from 1997-ish, one
can see that they're explicitly testing the error status against
-4095 (as an unsigned value) in the stub. (e.g., in
sysdeps/unix/sysv/linux/i386/syscall.S).
But glibc-1.06.1 is a different story, and _does_ appear to
simply test whether the return value is negative and then jump
to an error handler if so. So mmap may have worked incidentally
due to the restriction on where in the address space it would
place a mapping in very early kernel versions, as you described,
but that's a library issue, not a kernel issue: again, the
kernel doesn't care.
The old version of libc5 available on kernel.org similarly; it
looks like HJ Lu changed the error handling path to explicitly
compare against -4095 in October of 1996.
So, fixed in the most common libc's used with Linux on i386 for
nearly 30 years, well before the existence of x86_64.
I wonder how the kernel is informed that it can now return more
addresses from mmap().
Assuming you mean the Linux kernel, when it loads an ELF
executable, the binary image itself is "branded" with an ABI
type that it can use to make that determination.
I have checked that with binaries compiled in 2003 and 2000:
-rwxr-xr-x 1 root root 44660 Sep 26 2000 /usr/local/bin/gforth-0.5.0*
-rwxr-xr-x 1 root root 92352 Sep 7 2003 /usr/local/bin/gforth-0.6.2*
[~:160080] file /usr/local/bin/gforth-0.5.0
/usr/local/bin/gforth-0.5.0: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, stripped
[~:160081] file /usr/local/bin/gforth-0.6.2
/usr/local/bin/gforth-0.6.2: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for
GNU/Linux 2.0.0, stripped
So there is actually a difference between these two. However, if I
just strace them as they are now, they both happily produce very high
addresses with mmap, e.g.,
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf7f64000
I don't see any reason why it wouldn't.
I don't know what the difference is between "for GNU/Linux 2.0.0" and
not having that,
`file` is pulling that from a `PT_NOTE` segment defined in the
program header for that second file. A better tool for picking
apart the details of those binaries is probably `objdump`.
I'm mildly curious what version of libc those are linked against
(e.g., as reported by `ldd`).
but the addresses produced by mmap() seem unaffected.
I don't see why it would be. Any common libc post 1997-ish
handles errors in a way that permits this to work correctly. If
you tried glibc 1.0, it might be a different story, but the
Linux folks forked that in 1994 and modified it as "Linux libc"
and the
However, by calling the binaries with setarch -L, mmap() returns only
addresses < 2GB in all calls I have looked at. I guess if I had
statically linked binaries, i.e., with old system call wrappers, I
would have to use
setarch -L <binary>
to make it work properly with mmap(). Or maybe Linux is smart enough
to do it by itself when it encounters a statically-linked old binary.
Unclear without looking at the kernel source code, but possibly.
`setarch -L` turns on the "legacy" virtual address space layout,
but I suspect that the number of binaries that _actually care_
is pretty small, indeed.