Forum: >>> Magnum BBS <<<

Re: System calls

From Thomas Koenig@21:1/5 to David Brown on Thu Aug 14 17:43:50 2025

David Brown <[email protected]> schrieb:

The point is that when the results of an integer computation are
too big, there is no way to get the correct answer in the types used.
Two's complement wrapping is /not/ correct. If you add two real-world
positive integers, you don't get a negative integer.

I believe it was you who wrote "If you add enough apples to a
pile, the number of apples becomes negative", so there is
clearly a defined physical meaning to overflow.

:-)
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Brown@21:1/5 to Dan Cross on Thu Aug 14 19:15:42 2025

On 14.08.2025 17:44, Dan Cross wrote:

In article <sknnQ.168942$[email protected]>,
Scott Lurndal <[email protected]> wrote:

Both Burroughs Large Systems (48-bit stack machine) and the
Sperry 1100/2200 (36-bit) systems had (have, in emulation today)
C compilers.

Yup. The 1100-series machines were (are) 1's complement. Those
are the ones I usually think of when cursing that signed integer
overflow is UB in C.

I don't think anyone is compiling C23 code for those machines,
but back in the late 1980s, they were still enough of a going
concern that they could influence the emerging C standard. Not
so much anymore.

They would presumably have been part of the justification for supporting
multiple signed integer formats at the time. UB on signed integer
arithmetic overflow is a different matter altogether.

Regardless, signed integer overflow remains UB in the current C
standard, nevermind definitionally following 2s complement
semantics. Usually this is done on the basis of performance
arguments: some seemingly-important loop optimizations can be
made if the compiler can assert that overflow Cannot Happen.

The justification for "signed integer arithmetic overflow is UB" is in
the C standards 6.5p5 under "Expressions" :

"""
If an exceptional condition occurs during the evaluation of an
expression (that is, if the result is not mathematically defined or not
in the range of representable values for its type), the behavior is
undefined.
"""

It actually has absolutely nothing to do with signed integer
representation, or machine hardware. It doesn't even have much to do
with integers at all. It is simply that if the calculation can't give a
correct answer, then the C standards don't say anything about the
results or effects.

The point is that when the results of an integer computation are
too big, there is no way to get the correct answer in the types used.
Two's complement wrapping is /not/ correct. If you add two real-world
positive integers, you don't get a negative integer.
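
To make that concrete, here is a minimal sketch of an addition that refuses to produce a wrong answer. The name checked_add is purely illustrative (it is not a standard function); the idea is only that the range test is done before the addition, so the overflowing operation is never actually evaluated.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: returns true and stores a + b in *out when the
   mathematical sum fits in an int; returns false (leaving *out untouched)
   when it does not. The range test happens before the addition, so the
   signed overflow itself is never executed. */
static bool checked_add(int a, int b, int *out)
{
    if (b > 0 && a > INT_MAX - b)
        return false;              /* sum would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b)
        return false;              /* sum would fall below INT_MIN */
    *out = a + b;
    return true;
}
```

With this, checked_add(INT_MAX, 1, &r) reports failure instead of handing back a negative number.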

And of course, even today, C still targets oddball platforms
like DSPs and custom chips, where assumptions about the ubiquity
of 2's comp may not hold.

Modern C and C++ standards have dropped support for signed integer
representations other than two's complement, because they are not in use
in any modern hardware (including any DSPs) - at least, not for
general-purpose integers. Both committees have consistently voted to
keep overflow as UB.


From Dan Cross@21:1/5 to [email protected] on Thu Aug 14 21:44:42 2025

In article <107l5ju$k78a$[email protected]>,
David Brown <[email protected]> wrote:

On 14.08.2025 17:44, Dan Cross wrote:

In article <sknnQ.168942$[email protected]>,
Scott Lurndal <[email protected]> wrote:

Both Burroughs Large Systems (48-bit stack machine) and the
Sperry 1100/2200 (36-bit) systems had (have, in emulation today)
C compilers.

Yup. The 1100-series machines were (are) 1's complement. Those
are the ones I usually think of when cursing that signed integer
overflow is UB in C.

I don't think anyone is compiling C23 code for those machines,
but back in the late 1980s, they were still enough of a going
concern that they could influence the emerging C standard. Not
so much anymore.

They would presumably have been part of the justification for supporting
multiple signed integer formats at the time.

C90 doesn't have much to say about this at all, other than
saying that the actual representation and ranges of the integer
types are implementation defined (G.3.5 para 1).

C90 does say that, "The representations of integral types shall
define values by use of a pure binary numeration system" (sec
6.1.2.5).

C99 tightens this up and talks about 2's comp, 1's comp, and
sign/mag as being the permissible representations (J.3.5, para
1).

UB on signed integer
arithmetic overflow is a different matter altogether.

I disagree.

Regardless, signed integer overflow remains UB in the current C
standard, nevermind definitionally following 2s complement
semantics. Usually this is done on the basis of performance
arguments: some seemingly-important loop optimizations can be
made if the compiler can assert that overflow Cannot Happen.

The justification for "signed integer arithmetic overflow is UB" is in
the C standards 6.5p5 under "Expressions" :

Not in ANSI/ISO 9899-1990. In that revision of the standard,
sec 6.5 covers declarations.

"""
If an exceptional condition occurs during the evaluation of an
expression (that is, if the result is not mathematically defined or not
in the range of representable values for its type), the behavior is
undefined.
"""

In C90, this language appears in sec 6.3 para 5. Note, however,
that they do not define what an exception _is_, only a few
things that _may_ cause one. See below.

It actually has absolutely nothing to do with signed integer
representation, or machine hardware.

Consider this language from the (non-normative) example 4 in sec
5.1.2.3:

|On a machine in which overflows produce an exception and in
|which the range of values representable by an *int* is
|[-32768,+32767], the implementation cannot rewrite this
|expression as [continues with the specifics of the example]....

That seems pretty clear that they're thinking about machines
that actually generate a hardware trap of some kind on overflow.

It doesn't even have much to do
with integers at all. It is simply that if the calculation can't give a
correct answer, then the C standards don't say anything about the
results or effects.

The point is that when the results of an integer computation are
too big, there is no way to get the correct answer in the types used.
Two's complement wrapping is /not/ correct. If you add two real-world
positive integers, you don't get a negative integer.

Sorry, but I don't buy this argument as anything other than a
justification after the fact. We're talking about history and
motivation here, not the behavior described in the standard.

In particular, C is a programming language for actual machines,
not a mathematical notation; the language is free to define the
behavior of arithmetic expressions in any way it chooses, though
one presumes it would do so in a way that makes sense for the
machines that it targets. Thus, it could have formalized the
result of signed integer overflow to follow 2's complement
semantics had the committee so chosen, in which case the result
would not be "incorrect", it would be well-defined with respect
to the semantics of the language. Java, for example, does this,
as does C11 (and later) atomic integer operations. Indeed, the
C99 rationale document makes frequent reference to twos
complement, where overflow and modular behavior are frequently
equivalent, being the common case. But aside from the more
recent atomics support, C _chose_ not to do this.

Also, consider that _unsigned_ arithmetic is defined as having
wrap-around semantics similar to modular arithmetic, and thus
incapable of overflow. But that's simply a fiction invented for
the abstract machine described informally in the standard: it
requires special handling on machines like the 1100 series,
because those machines might trap on overflow. The C committee
could just as well have said that the unsigned arithmetic
_could_ overflow and that the result was UB.

So why did C choose this way? The only logical reason is that
there were machines at the time where a) integer overflow
caused machine exceptions, and b) the representation of signed
integers was not well-defined, so that the actual value
resulting from overflow could not be rigorously defined. Given
that C90 mandated a binary representation for integers and so
the representation of unsigned integers is basically common,
there was no need to do that for unsigned arithmetic.

And of course, even today, C still targets oddball platforms
like DSPs and custom chips, where assumptions about the ubiquity
of 2's comp may not hold.

Modern C and C++ standards have dropped support for signed integer
representations other than two's complement, because they are not in use
in any modern hardware (including any DSPs) - at least, not for
general-purpose integers. Both committees have consistently voted to
keep overflow as UB.

Yes. As I said, performance is often the justification.

I'm not convinced that there are no custom chips and/or DSPs
that are not manufactured today. They may not be common, their
mere existence is certainly dumb and offensive, but that does
not mean that they don't exist. Note that the survey in, e.g., https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm
only mentions _popular_ DSPs, not _all_ DSPs.

Of course, if such machines exist, I will certainly concede that
I doubt very much that anyone is targeting them with C code
written to a modern standard.

- Dan C.


From David Brown@21:1/5 to Thomas Koenig on Fri Aug 15 17:49:58 2025

On 14.08.2025 19:43, Thomas Koenig wrote:

David Brown <[email protected]> schrieb:

The point is that when the results of an integer computation are
too big, there is no way to get the correct answer in the types used.
Two's complement wrapping is /not/ correct. If you add two real-world
positive integers, you don't get a negative integer.

I believe it was you who wrote "If you add enough apples to a
pile, the number of apples becomes negative", so there is
clearly a defined physical meaning to overflow.

:-)

Yes, I did say something along those lines - but perhaps not /exactly/
those words!


From David Brown@21:1/5 to Dan Cross on Fri Aug 15 17:49:53 2025

On 14.08.2025 23:44, Dan Cross wrote:

In article <107l5ju$k78a$[email protected]>,
David Brown <[email protected]> wrote:

On 14.08.2025 17:44, Dan Cross wrote:

In article <sknnQ.168942$[email protected]>,
Scott Lurndal <[email protected]> wrote:

Both Burroughs Large Systems (48-bit stack machine) and the
Sperry 1100/2200 (36-bit) systems had (have, in emulation today)
C compilers.

Yup. The 1100-series machines were (are) 1's complement. Those
are the ones I usually think of when cursing that signed integer
overflow is UB in C.

I don't think anyone is compiling C23 code for those machines,
but back in the late 1980s, they were still enough of a going
concern that they could influence the emerging C standard. Not
so much anymore.

They would presumably have been part of the justification for supporting
multiple signed integer formats at the time.

C90 doesn't have much to say about this at all, other than
saying that the actual representation and ranges of the integer
types are implementation defined (G.3.5 para 1).

C90 does say that, "The representations of integral types shall
define values by use of a pure binary numeration system" (sec
6.1.2.5).

C99 tightens this up and talks about 2's comp, 1's comp, and
sign/mag as being the permissible representations (J.3.5, para
1).

Yes. Early C didn't go into the details, then C99 described the systems
that could realistically be used. And now in C23 only two's complement
is allowed.

UB on signed integer
arithmetic overflow is a different matter altogether.

I disagree.

You have overflow when the mathematical result of an operation cannot be
expressed accurately in the type - regardless of the representation
format for the numbers. Your options, as a language designer or
implementer, of handling the overflow are the same regardless of the
representation. You can pick a fixed value to return, or saturate, or
invoke some kind of error handler mechanism, or return a "don't care"
unspecified value of the type, or perform a specified algorithm to get a
representable value (such as reduction modulo 2^n), or you can simply
say the program is broken if this happens (it is UB).
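
As a rough illustration of two of those options, here is a sketch of a saturating add and a modulo-2^n add written in portable C. The function names are made up for this example, and add_wrap assumes a two's complement int with UINT_MAX == 2*INT_MAX + 1 (true on every mainstream target, but an assumption all the same). Neither function depends on how the bits are laid out while deciding whether overflow occurred; only the values and the range matter.

```c
#include <limits.h>

/* Saturating policy: clamp to the nearest representable value. */
static int add_saturate(int a, int b)
{
    if (b > 0 && a > INT_MAX - b)
        return INT_MAX;
    if (b < 0 && a < INT_MIN - b)
        return INT_MIN;
    return a + b;
}

/* Wrapping policy: reduce the sum modulo 2^N. The addition is done in
   unsigned arithmetic, which is defined to wrap, and the result is then
   mapped back to int without an out-of-range conversion.
   Assumption: two's complement int, UINT_MAX == 2*INT_MAX + 1. */
static int add_wrap(int a, int b)
{
    unsigned int u = (unsigned int)a + (unsigned int)b;
    if (u <= (unsigned int)INT_MAX)
        return (int)u;
    return -(int)(UINT_MAX - u) - 1;
}
```

Both are well-defined for all inputs; they simply answer a different question from "what is a + b?".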

I don't see where the representation comes into it - overflow is a
matter of values and the ranges that can be stored in a type, not how
those values are stored in the bits of the data.

Regardless, signed integer overflow remains UB in the current C
standard, nevermind definitionally following 2s complement
semantics. Usually this is done on the basis of performance
arguments: some seemingly-important loop optimizations can be
made if the compiler can assert that overflow Cannot Happen.

The justification for "signed integer arithmetic overflow is UB" is in
the C standards 6.5p5 under "Expressions" :

Not in ANSI/ISO 9899-1990. In that revision of the standard,
sec 6.5 covers declarations.

"""
If an exceptional condition occurs during the evaluation of an
expression (that is, if the result is not mathematically defined or not
in the range of representable values for its type), the behavior is
undefined.
"""

In C90, this language appears in sec 6.3 para 5. Note, however,
that they do not define what an exception _is_, only a few
things that _may_ cause one. See below.

It's basically the same in C90 onwards, with just small changes to the
wording. And it /does/ define what is meant by an "exceptional
condition" (or just "exception" in C90) - that is done by the part in
parentheses.

It actually has absolutely nothing to do with signed integer
representation, or machine hardware.

Consider this language from the (non-normative) example 4 in sec
5.1.2.3:

|On a machine in which overflows produce an exception and in
|which the range of values representable by an *int* is
|[-32768,+32767], the implementation cannot rewrite this
|expression as [continues with the specifics of the example]....

That seems pretty clear that they're thinking about machines
that actually generate a hardware trap of some kind on overflow.

They are thinking about that possibility, yes. In C90, the term
"exception" here was not clearly defined - and it is definitely not the
same as the term "exception" in 6.3p5. The wording was improved in C99
without changing the intended meaning - there the term in the paragraph
under "Expressions" is "exceptional condition" (defined in that
paragraph), while in the example in "Execution environments", it says
"On a machine in which overflows produce an explicit trap". (C11
further clarifies what "performs a trap" means.)

But this is about re-arrangements the compiler is allowed to make, or
barred from making - it can't make re-arrangements that would mean
execution failed when the direct execution of the code according to the
C abstract machine would have worked correctly (without ever having
encountered an "exceptional condition" or other UB). Representation is
not relevant here - there is nothing about two's complement, ones'
complement, sign-magnitude, or anything else. Even the machine hardware
is not actually particularly important, given that most processors
support non-trapping integer arithmetic instructions and for those that
don't have explicit trap instructions, a compiler could generate "jump
if overflow flag set" or similar instructions to emulate traps
reasonably efficiently. (Many compilers support that kind of thing as
an option to aid debugging.)
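
A sketch of what such emulation can look like at the source level, assuming GCC or Clang: __builtin_add_overflow is a compiler extension rather than ISO C, and the wrapper names below are invented for this example. On typical targets the builtin compiles to an add followed by a branch on the overflow flag.

```c
#include <limits.h>
#include <stdlib.h>

/* Assumes GCC or Clang: __builtin_add_overflow is a compiler extension.
   Returns nonzero when a + b does not fit in an int. */
static int add_overflows(int a, int b)
{
    int ignored;
    return __builtin_add_overflow(a, b, &ignored);
}

/* Software "trap": abort the program instead of continuing with a
   meaningless result. */
static int add_or_abort(int a, int b)
{
    int sum;
    if (__builtin_add_overflow(a, b, &sum))
        abort();
    return sum;
}
```

The same compilers also offer this per translation unit via options such as -ftrapv or -fsanitize=signed-integer-overflow, without any change to the source.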

It doesn't even have much to do
with integers at all. It is simply that if the calculation can't give a
correct answer, then the C standards don't say anything about the
results or effects.

The point is that when the results of an integer computation are
too big, there is no way to get the correct answer in the types used.
Two's complement wrapping is /not/ correct. If you add two real-world
positive integers, you don't get a negative integer.

Sorry, but I don't buy this argument as anything other than a
justification after the fact. We're talking about history and
motivation here, not the behavior described in the standard.

It is a fair point that I am describing a rational and sensible reason
for UB on arithmetic overflow - and I do not know the motivation of the
early C language designers, compiler implementers, and authors of the
first C standard.

I do know, however, that the principle of "garbage in, garbage out" was
well established long before C was conceived. And programmers of that
time were familiar with the concept of functions and operations being
defined for appropriate inputs, and having no defined behaviour for
invalid inputs. C is full of other things where behaviour is left
undefined when no sensible correct answer can be specified, and that is
not just because the behaviour of different hardware could vary. It
seems perfectly reasonable to me to suppose that signed integer
arithmetic overflow is just another case, no different from
dereferencing an invalid pointer, dividing by zero, or any one of the
other UB's in the standards.

In particular, C is a programming language for actual machines,
not a mathematical notation; the language is free to define the
behavior of arithmetic expressions in any way it chooses, though
one presumes it would do so in a way that makes sense for the
machines that it targets.

Yes, that is true. It is, however, also important to remember that it
was based on a general abstract machine, not any particular hardware,
and that the operations were intended to follow standard mathematics as
well as practically possible - operations and expressions in C were not
designed for any particular hardware. (Though some design choices were
biased by particular hardware.)

Thus, it could have formalized the
result of signed integer overflow to follow 2's complement
semantics had the committee so chosen, in which case the result
would not be "incorrect", it would be well-defined with respect
to the semantics of the language. Java, for example, does this,
as does C11 (and later) atomic integer operations. Indeed, the
C99 rationale document makes frequent reference to twos
complement, where overflow and modular behavior are frequently
equivalent, being the common case. But aside from the more
recent atomics support, C _chose_ not to do this.

It could have made signed integer overflow defined behaviour, but it did
not. The C standards committee have explicitly chosen not to do that,
even after deciding that two's complement is the only supported
representation for signed integers in C23 onwards. It is fine to have
two's complement representation, and fine to have modulo arithmetic in
some circumstances, while leaving other arithmetic overflow undefined.
Unsigned integer operations in C have always been defined as modulo
arithmetic - addition of unsigned values is a different operation from
addition of signed values. Having some modulo behaviour does not in any
way imply that signed arithmetic should be modulo.

In Java, the language designers decided that integer arithmetic
operations would be modulo operations. Wrapping therefore gives the
correct answer for those operations - it does not give the correct
answer for mathematical integer operations. And Java loses common
mathematical identities which C retains - such as the identity that
adding a positive integer to another integer will increase its value.
Something always has to be lost when approximating unbounded
mathematical integers in a bounded implementation - I think C made the
right choices here about what to keep and what to lose, and Java made
the wrong choices. (Others may of course have different opinions.)
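
A small sketch of the identity in question. The function names are illustrative only, and the second function uses C's unsigned arithmetic as a stand-in for modulo semantics in general (it is not a model of Java's int).

```c
#include <limits.h>

/* Signed: a C compiler is allowed to fold this to "return 1", because the
   only case where it could be false (x == INT_MAX) is UB and need not be
   considered. Callers must not pass INT_MAX. */
static int successor_is_greater(int x)
{
    return x + 1 > x;
}

/* Modulo arithmetic: the identity genuinely fails at the top of the range,
   so the comparison has to be evaluated. */
static int successor_is_greater_wrapping(unsigned int x)
{
    return x + 1u > x;   /* 0 when x == UINT_MAX, because x + 1u wraps to 0 */
}
```

For every valid signed input the first function returns 1; the second returns 0 for UINT_MAX.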

In Zig, unsigned integer arithmetic overflow is also UB as these
operations are not defined as modulo. I think that is a good natural
choice too - but it is useful for a language to have a way to do
wrapping arithmetic on the occasions you need it.

Also, consider that _unsigned_ arithmetic is defined as having
wrap-around semantics similar to modular arithmetic, and thus
incapable of overflow.

Yes. Unsigned arithmetic operations are different operations from
signed arithmetic operations in C.

But that's simply a fiction invented for
the abstract machine described informally in the standard: it
requires special handling on machines like the 1100 series,
because those machines might trap on overflow. The C committee
could just as well have said that the unsigned arithmetic
_could_ overflow and that the result was UB.

They could have done that (as the Zig folk did).

So why did C choose this way? The only logical reason is that
there were machines at the time where a) integer overflow
caused machine exceptions, and b) the representation of signed
integers was not well-defined, so that the actual value
resulting from overflow could not be rigorously defined. Given
that C90 mandated a binary representation for integers and so
the representation of unsigned integers is basically common,
there was no need to do that for unsigned arithmetic.

Not at all. Usually when someone says "the only logical reason is...",
they really mean "the only logical reason /I/ can think of is...", or
"the only reason that /I/ can think of that /I/ think is logical is...".

For a language that can be used as a low-level systems language, it is
important to be able to do modulo arithmetic efficiently. It is needed
for a number of low-level tasks, including the implementation of large
arithmetic operations, handling timers, counters, and other bits and
pieces. So it was definitely a useful thing to have in C.
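
The timer case is the classic one. A minimal sketch, assuming a free-running 32-bit tick counter (the function name and the counter width are assumptions for the example):

```c
#include <stdint.h>

/* Elapsed ticks between two readings of a free-running 32-bit counter.
   Because the result is reduced modulo 2^32, the answer is still right
   when the counter has wrapped once between the two readings. */
static uint32_t ticks_elapsed(uint32_t start, uint32_t now)
{
    return now - start;
}
```

A reading of 0xFFFFFFF0 followed by a reading of 0x00000010 gives 0x20 ticks, with no special-casing of the wrap.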

For a language that can be used as a fast and efficient application
language, it must have a reasonable approximation to mathematical
integer arithmetic. Implementations should not be forced to have
behaviours beyond the mathematically sensible answers - if a calculation
can't be done correctly, there's no point in doing it. Giving nonsense
results does not help anyone - C programmers or toolchain implementers,
so the language should not specify any particular result. More sensible
defined overflow behaviour - saturation, error values, language
exceptions or traps, etc., would be very inefficient on most hardware.
So UB is the best choice - and implementations can do something
different if they like.

Too many options make a language bigger - harder to implement, harder to
learn, harder to use. So it makes sense to have modulo arithmetic for
unsigned types, and normal arithmetic for signed types.

I am not claiming to know that this is the reasoning made by the C
language pioneers. But it is definitely an alternative logical reason
for C being the way it is.

And of course, even today, C still targets oddball platforms
like DSPs and custom chips, where assumptions about the ubiquity
of 2's comp may not hold.

Modern C and C++ standards have dropped support for signed integer
representations other than two's complement, because they are not in use
in any modern hardware (including any DSPs) - at least, not for
general-purpose integers. Both committees have consistently voted to
keep overflow as UB.

Yes. As I said, performance is often the justification.

I'm not convinced that there are no custom chips and/or DSPs
that are not manufactured today. They may not be common, their
mere existence is certainly dumb and offensive, but that does
not mean that they don't exist. Note that the survey in, e.g., https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm
only mentions _popular_ DSPs, not _all_ DSPs.

I think you might have missed a few words in that paragraph, but I
believe I know what you intended. There are certainly DSPs and other
cores that have strong support for alternative overflow behaviour -
saturation is very common in DSPs, and it is also common to have a
"sticky overflow" flag so that you can do lots of calculations in a
tight loop, and check for problems once you are finished. I think it is
highly unlikely that you'll find a core with something other than two's
complement as the representation for signed integer types, though I
can't claim that I know /all/ devices! (I do know a bit about more
cores than would be considered popular or common.)
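
For readers who have not met the "sticky overflow" idiom, here is a software sketch of it in plain C. The struct and function names are invented for this example; a real DSP does the clamping and the flag update in hardware, per instruction, at no extra cost.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Accumulator with a sticky flag: once set, the flag stays set. */
struct sat_acc {
    int  value;
    bool overflowed;
};

/* Add x to the accumulator, saturating at the ends of the int range and
   recording that saturation happened. */
static void sat_acc_add(struct sat_acc *acc, int x)
{
    if (x > 0 && acc->value > INT_MAX - x) {
        acc->value = INT_MAX;
        acc->overflowed = true;
    } else if (x < 0 && acc->value < INT_MIN - x) {
        acc->value = INT_MIN;
        acc->overflowed = true;
    } else {
        acc->value += x;
    }
}

/* Tight loop with no per-iteration error handling; the caller inspects
   the flag once, after the loop has finished. */
static struct sat_acc sum_saturating(const int *data, size_t n)
{
    struct sat_acc acc = { 0, false };
    for (size_t i = 0; i < n; i++)
        sat_acc_add(&acc, data[i]);
    return acc;
}
```

The caller checks the overflowed member once at the end and decides whether the result can be trusted.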

Of course, if such machines exist, I will certainly concede that
I doubt very much that anyone is targeting them with C code
written to a modern standard.

Modern C is definitely used on DSPs with strong saturation support.
(Even ARM cores have saturated arithmetic instructions.) But they can
also handle two's complement wrapped signed integer arithmetic if the
programmer wants that - after all, it's exactly the same in the hardware
as modulo unsigned arithmetic (except for division). That doesn't mean
that wrapping signed integer overflow is useful or desired behaviour.


From Dan Cross@21:1/5 to [email protected] on Fri Aug 15 18:33:07 2025

In article <107nkv2$1753a$[email protected]>,
David Brown <[email protected]> wrote:

On 14.08.2025 23:44, Dan Cross wrote:

In article <107l5ju$k78a$[email protected]>,
David Brown <[email protected]> wrote:

[snip]
UB on signed integer
arithmetic overflow is a different matter altogether.

I disagree.

You have overflow when the mathematical result of an operation cannot be
expressed accurately in the type - regardless of the representation
format for the numbers. Your options, as a language designer or
implementer, of handling the overflow are the same regardless of the
representation. You can pick a fixed value to return, or saturate, or
invoke some kind of error handler mechanism, or return a "don't care"
unspecified value of the type, or perform a specified algorithm to get a
representable value (such as reduction modulo 2^n), or you can simply
say the program is broken if this happens (it is UB).

I don't see where the representation comes into it - overflow is a
matter of values and the ranges that can be stored in a type, not how
those values are stored in the bits of the data.

I understood your point. But we are talking about the history of
the language here, not the presently defined behavior.

We do, in fact, have historical source materials we can draw
from when discussing this; there's little need to guess. Here,
we know that the earliest C implementations simply ignored the
possibility of overflow. In K&R1, chap 2, sec 2.5 ("Arithmetic
Operators") on page 38, the authors write, "the action taken on
overflow or underflow depends on the machine at hand." In
Appendix A, sec 7 ("Expressions"), page 185, the authors write:
"The handling of overflow and divide check in expression
evaluation is machine-dependent. All existing implementations of C
ignore integer overflows; treatment of division by 0, and all
floating point exceptions, varies between machines, and is
usually adjustable by a library function."

In other words, different machines give different results; some
will trap, others will differ due to representation issues.
Nowhere here does it suggest that the language designers were
worried about getting the "wrong" result, as you have asserted.

Regardless, signed integer overflow remains UB in the current C
standard, nevermind definitionally following 2s complement
semantics. Usually this is done on the basis of performance
arguments: some seemingly-important loop optimizations can be
made if the compiler can assert that overflow Cannot Happen.

The justification for "signed integer arithmetic overflow is UB" is in
the C standards 6.5p5 under "Expressions" :

Not in ANSI/ISO 9899-1990. In that revision of the standard,
sec 6.5 covers declarations.

"""
If an exceptional condition occurs during the evaluation of an
expression (that is, if the result is not mathematically defined or not
in the range of representable values for its type), the behavior is
undefined.
"""

In C90, this language appears in sec 6.3 para 5. Note, however,
that they do not define what an exception _is_, only a few
things that _may_ cause one. See below.

It's basically the same in C90 onwards, with just small changes to the
wording.

Did I suggest otherwise?

And it /does/ define what is meant by an "exceptional
condition" (or just "exception" in C90) - that is done by the part in
parentheses.

That is an interpretation.

It actually has absolutely nothing to do with signed integer
representation, or machine hardware.

Consider this language from the (non-normative) example 4 in sec
5.1.2.3:

|On a machine in which overflows produce an exception and in
|which the range of values representable by an *int* is
|[-32768,+32767], the implementation cannot rewrite this
|expression as [continues with the specifics of the example]....

That seems pretty clear that they're thinking about machines
that actually generate a hardware trap of some kind on overflow.

They are thinking about that possibility, yes. In C90, the term
"exception" here was not clearly defined - and it is definitely not the
same as the term "exception" in 6.3p5. The wording was improved in C99
without changing the intended meaning - there the term in the paragraph
under "Expressions" is "exceptional condition" (defined in that
paragraph), while in the example in "Execution environments", it says
"On a machine in which overflows produce an explicit trap". (C11
further clarifies what "performs a trap" means.)

But this is about re-arrangements the compiler is allowed to make, or
barred from making - it can't make re-arrangements that would mean
execution failed when the direct execution of the code according to the
C abstract machine would have worked correctly (without ever having
encountered an "exceptional condition" or other UB). Representation is
not relevant here - there is nothing about two's complement, ones'
complement, sign-magnitude, or anything else. Even the machine hardware
is not actually particularly important, given that most processors
support non-trapping integer arithmetic instructions and for those that
don't have explicit trap instructions, a compiler could generate "jump
if overflow flag set" or similar instructions to emulate traps
reasonably efficiently. (Many compilers support that kind of thing as
an option to aid debugging.)

It doesn't even have much to do
with integers at all. It is simply that if the calculation can't give a
correct answer, then the C standards don't say anything about the
results or effects.

The point is that when the results of an integer computation are
too big, there is no way to get the correct answer in the types used.
Two's complement wrapping is /not/ correct. If you add two real-world
positive integers, you don't get a negative integer.

Sorry, but I don't buy this argument as anything other than a
justification after the fact. We're talking about history and
motivation here, not the behavior described in the standard.

It is a fair point that I am describing a rational and sensible reason
for UB on arithmetic overflow - and I do not know the motivation of the
early C language designers, compiler implementers, and authors of the
first C standard.

Then there's really nothing more to discuss. The intent here is
to understand the motivation of those folks.

Early C didn't even have unsigned; Dennis Ritchie's paper for
the History of Programming Languages conference said that it
came around 1977 (https://www.nokia.com/bell-labs/about/dennis-m-ritchie/chist.html;
see the section on "portability"), and in pre-ANSI C, struct
fields of `int` type were effectively unsigned (K&R1,
pp.138,197). I mentioned the quote from K&R1 about overflow
above, but we see some other hints about signed overflow
becoming negative in other documents. For instance, K&R2, p 118
gives the example of a hash function followed by the sentence,
"unsigned arithmetic ensures that the hash value is
non-negative." This does not suggest to me that the authors
thought that the wrapping behavior of twos-complement arithmetic
was "incorrect".
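
For readers without the book to hand, the following is a sketch in the spirit of that kind of string hash, not a quotation from K&R2. The name `str_hash`, the multiplier 31 and the table size 101 are assumptions made for illustration. The accumulator is `unsigned`, so it wraps modulo UINT_MAX + 1 instead of overflowing, and the final remainder can never be negative.

```c
#include <assert.h>
#include <stddef.h>

#define HASHSIZE 101u   /* table size, assumed for illustration */

/* Hypothetical string hash using unsigned (modular) arithmetic. */
unsigned str_hash(const char *s)
{
    unsigned hashval = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* The cast keeps high-bit characters from contributing a
         * negative value on platforms where plain char is signed. */
        hashval = (unsigned char)*s + 31u * hashval;
    }
    return hashval % HASHSIZE;   /* always in 0 .. HASHSIZE - 1 */
}
```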

I do know, however, that the principle of "garbage in, garbage out" was
well established long before C was conceived. And programmers of that
time were familiar with the concept of functions and operations being
defined for appropriate inputs, and having no defined behaviour for
invalid inputs. C is full of other things where behaviour is left
undefined when no sensible correct answer can be specified, and that is
not just because the behaviour of different hardware could vary. It
seems perfectly reasonable to me to suppose that signed integer
arithmetic overflow is just another case, no different from
dereferencing an invalid pointer, dividing by zero, or any one of the
other UB's in the standards.

Indeed; this is effectively what I've been saying: signed
integer overflow is UB because the behavior of overflow varied
between the machines of the day, so C could not make assumptions
about what value would result, in part because of representation
issues: at the hardware level, signed overflow of the largest
representable positive integer yields different _values_ between
1s comp and 2s comp machines. Who is to say which is correct?

In particular, C is a programming language for actual machines,
not a mathematical notation; the language is free to define the
behavior of arithmetic expressions in any way it chooses, though
one presumes it would do so in a way that makes sense for the
machines that it targets.

Yes, that is true. It is, however, also important to remember that it
was based on a general abstract machine, not any particular hardware,
and that the operations were intended to follow standard mathematics as
well as practically possible - operations and expressions in C were not
designed for any particular hardware. (Though some design choices were
biased by particular hardware.)

This is historically inaccurate.

C was developed by and for the PDP-11 initially, targeting Unix,
building from Martin Richards's BCPL (which Ritchie and Thompson
had used under Multics on the GE-645 machine, and GCOS on the
635) and Ken Thompson's B language, which he had implemented as
a chopped-down BCPL to be a systems programming language for
_very_ early Unix on the PDP-7. B was typeless, as the PDP-7
was word-oriented, and we see vestiges of this ancestral DNA in
C today. See Ritchie's C history paper for details.

Concerns for portability, leading to the development of the
abstract machine informally described by the C standard, came
much, much later in its evolutionary development.

Thus, it could have formalized the
result of signed integer overflow to follow 2's complement
semantics had the committee so chosen, in which case the result
would not be "incorrect", it would be well-defined with respect
to the semantics of the language. Java, for example, does this,
as does C11 (and later) atomic integer operations. Indeed, the
C99 rationale document makes frequent reference to twos
complement, where overflow and modular behavior are frequently
equivalent, being the common case. But aside from the more
recent atomics support, C _chose_ not to do this.

It could have made signed integer overflow defined behaviour, but it did
not. The C standards committee have explicitly chosen not to do that,
even after deciding that two's complement is the only supported
representation for signed integers in C23 onwards. It is fine to have
two's complement representation, and fine to have modulo arithmetic in
some circumstances, while leaving other arithmetic overflow undefined.
Unsigned integer operations in C have always been defined as modulo
arithmetic - addition of unsigned values is a different operation from
addition of signed values. Having some modulo behaviour does not in any
way imply that signed arithmetic should be modulo.

In Java, the language designers decided that integer arithmetic
operations would be modulo operations. Wrapping therefore gives the
correct answer for those operations - it does not give the correct
answer for mathematical integer operations. And Java loses common
mathematical identities which C retains - such as the identity that
adding a positive integer to another integer will increase its value.
Something always has to be lost when approximating unbounded
mathematical integers in a bounded implementation - I think C made the
right choices here about what to keep and what to lose, and Java made
the wrong choices. (Others may of course have different opinions.)
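
The identity in question can be written down directly. This is only a sketch with hypothetical function names: for `int`, overflow is undefined, so a C compiler is allowed to assume the comparison always holds; for `unsigned`, arithmetic is modular, so the comparison fails for exactly one input.

```c
#include <limits.h>

/* Signed: overflow is undefined, so a compiler may fold this to
 * "return 1". Calling it with INT_MAX is undefined behaviour. */
int signed_inc_is_greater(int x)
{
    return x + 1 > x;
}

/* Unsigned: arithmetic is modulo UINT_MAX + 1, so u + 1 wraps to 0
 * when u == UINT_MAX and the comparison is false for that input. */
int unsigned_inc_is_greater(unsigned u)
{
    return u + 1u > u;
}
```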

In Zig, unsigned integer arithmetic overflow is also UB as these
operations are not defined as modulo. I think that is a good natural
choice too - but it is useful for a language to have a way to do
wrapping arithmetic on the occasions you need it.

None of this seems relevant to understanding the motivations of
the members of the committee that produced the 1990 C standard,
other than agreeing that the decision could have been different.

I would add that very early C treated signed and unsigned
arithmetic as more or less equivalent. It wasn't until they
started porting C to machines other than the PDP-11 that it
started to matter.

Also, consider that _unsigned_ arithmetic is defined as having
wrap-around semantics similar to modular arithmetic, and thus
incapable of overflow.

Yes. Unsigned arithmetic operations are different operations from
signed arithmetic operations in C.

This is the second time you have mentioned this. Did I say
something that led you to believe that I suggested otherwise, or
am somehow unaware of this fact?

But that's simply a fiction invented for
the abstract machine described informally in the standard: it
requires special handling on machines like the 1100 series,
because those machines might trap on overflow. The C committee
could just as well have said that the unsigned arithmetic
_could_ overflow and that the result was UB.

They could have done that (as the Zig folk did).

Or the SML folks before the Zig folks.

So why did C choose this way? The only logical reason is that
there were machines at the time where a) integer overflow
caused machine exceptions, and b) the representation of signed
integers was not well-defined, so that the actual value
resulting from overflow could not be rigorously defined. Given
that C90 mandated a binary representation for integers and so
the representation of unsigned integers is basically common,
there was no need to do that for unsigned arithmetic.

Not at all. Usually when someone says "the only logical reason is...",
they really mean "the only logical reason /I/ can think of is...", or
"the only reason that /I/ can think of that /I/ think is logical is...".

I probably should have said that I'm also drawing from direct
references, as well as hints and inferences from other
historical documents; both editions of K&R as well as early Unix
source code and the "C Reference Manual" from 6th and 7th
Edition Unix (the language described in 7th Ed is quite
different from the language in 6th Ed; most of this was driven
by a) portability, and b) the need to support
phototypesetters, hence why the C implemented in 7th Ed and PCC
is sometimes called "Typesetter C"). This is complemented with
direct conversations with some of the original players, though
admittedly those were quite a while ago.

For a language that can be used as a low-level systems language, it is
important to be able to do modulo arithmetic efficiently. It is needed
for a number of low-level tasks, including the implementation of large
arithmetic operations, handling timers, counters, and other bits and
pieces. So it was definitely a useful thing to have in C.
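
The timer case is a small example of why defined modulo arithmetic is handy. The sketch below assumes a hypothetical free-running 32-bit tick counter; the function name is made up for illustration. Because unsigned subtraction is modular, the difference is right even when the counter has wrapped between the two readings (provided it wrapped at most once).

```c
#include <stdint.h>

/* Hypothetical helper: ticks elapsed between two readings of a
 * free-running 32-bit counter. Correct across a single wrap-around
 * because the subtraction is performed modulo 2^32. */
uint32_t elapsed_ticks(uint32_t start, uint32_t now)
{
    return (uint32_t)(now - start);
}
```

For instance, a start reading of 0xFFFFFFF0 and a later reading of 0x00000010 give 0x20 ticks, with no special case for the wrap.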

For a language that can be used as a fast and efficient application
language, it must have a reasonable approximation to mathematical
integer arithmetic. Implementations should not be forced to have
behaviours beyond the mathematically sensible answers - if a calculation
can't be done correctly, there's no point in doing it. Giving nonsense
results does not help anyone - C programmers or toolchain implementers,
so the language should not specify any particular result. More sensible
defined overflow behaviour - saturation, error values, language
exceptions or traps, etc., would be very inefficient on most hardware.
So UB is the best choice - and implementations can do something
different if they like.

This is where we differ: you keep asserting notions of
"correctness", without acknowledging that a) correctness differs
in this context, and b) the notion of what is "correct" has
itself differed over time as C has evolved.

Moreover, when you say, "if a calculation can't be done
correctly, there's no point in doing it" that seems highly
specific and reliant on your definition of correctness. My

Here's an example:

char foo = 128;
int x = foo + 1;
printf("%d\n", x);

What is printed? (Note: that's rhetorical)

On the systems I just tested, x86_64, ARM64 and RISCV64, I get
-127 for the first two, and 129 for the last.

Of course, we all know that this relies on implementation
defined behavior around whether `char` is treated as signed or
unsigned (and, as a result, around how an out-of-range constant is
converted to a signed type), but if what you say were true about GIGO, why is
this not _undefined_ behavior?
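
For anyone who wants to try this on their own platform, here is a self-checking variant of the snippet. It is a sketch under stated assumptions: 8-bit `char`, and an implementation (such as GCC or Clang) that documents out-of-range conversion to a signed type as reduction modulo 2^N. The function name is hypothetical.

```c
#include <limits.h>

/* Hypothetical helper: stores v in a plain char, then adds 1. The
 * conversion to char is implementation-defined when v does not fit;
 * the addition happens in int after promotion, so it cannot overflow
 * here. */
int char_plus_one(int v)
{
    char foo = (char)v;
    return foo + 1;
}
```

With the assumptions above, `char_plus_one(128)` yields -127 where `CHAR_MIN < 0` (plain `char` is signed) and 129 where `CHAR_MIN == 0` (plain `char` is unsigned).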

Too many options make a language bigger - harder to implement, harder to
learn, harder to use. So it makes sense to have modulo arithmetic for
unsigned types, and normal arithmetic for signed types.

I am not claiming to know that this is the reasoning made by the C
language pioneers. But it is definitely an alternative logical reason
for C being the way it is.

But we _can_ see what those pioneers were thinking by reading
the artifacts they left behind, which we know, again based on
primary sources, had an impact on the standards committee.

And of course, even today, C still targets oddball platforms
like DSPs and custom chips, where assumptions about the ubiquity
of 2's comp may not hold.

Modern C and C++ standards have dropped support for signed integer
representation other than two's complement, because they are not in use
in any modern hardware (including any DSP's) - at least, not for
general-purpose integers. Both committees have consistently voted to
keep overflow as UB.

Yes. As I said, performance is often the justification.

I'm not convinced that there are no custom chips and/or DSPs
that are not manufactured today. They may not be common, their
mere existence is certainly dumb and offensive, but that does
not mean that they don't exist. Note that the survey in, e.g.,
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm
only mentions _popular_ DSPs, not _all_ DSPs.

I think you might have missed a few words in that paragraph, but I
believe I know what you intended. There are certainly DSPs and other
cores that have strong support for alternative overflow behaviour -
saturation is very common in DSPs, and it is also common to have a
"sticky overflow" flag so that you can do lots of calculations in a
tight loop, and check for problems once you are finished. I think it is
highly unlikely that you'll find a core with something other than two's
complement as the representation for signed integer types, though I
can't claim that I know /all/ devices! (I do know a bit about more
cores than would be considered popular or common.)
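
In software terms, saturation with a sticky flag looks roughly like the sketch below. This models the idea in portable C and does not describe any particular DSP's instruction set; the function name and the `bool *sticky` parameter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical saturating add: clamps to the int32_t range and sets
 * *sticky on overflow. The flag is never cleared here, so it can be
 * checked once after a whole loop of calculations. */
int32_t sat_add32(int32_t a, int32_t b, bool *sticky)
{
    int64_t wide = (int64_t)a + (int64_t)b;   /* cannot overflow in 64 bits */
    assert(sticky != NULL);
    if (wide > INT32_MAX) { *sticky = true; return INT32_MAX; }
    if (wide < INT32_MIN) { *sticky = true; return INT32_MIN; }
    return (int32_t)wide;
}
```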

I was referring specifically to integer representation here, not
saturating (or other) operations, but sure.

Of course, if such machines exist, I will certainly concede that
I doubt very much that anyone is targeting them with C code
written to a modern standard.

Modern C is definitely used on DSPs with strong saturation support.
(Even ARM cores have saturated arithmetic instructions.) But they can
also handle two's complement wrapped signed integer arithmetic if the
programmer wants that - after all, it's exactly the same in the hardware
as modulo unsigned arithmetic (except for division). That doesn't mean
that wrapping signed integer overflow is useful or desired behaviour.
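
When wrapping is wanted, one common way to get it in C without invoking UB is to do the addition in the unsigned type and convert back. A sketch, with a hypothetical function name; it assumes an implementation such as GCC or Clang, which document the out-of-range conversion back to a signed type as reduction modulo 2^N (the standard leaves that conversion implementation-defined).

```c
#include <stdint.h>

/* Hypothetical wrapping add for int32_t. The addition is performed in
 * uint32_t, where it is defined modulo 2^32; only the final conversion
 * back to int32_t is implementation-defined. */
int32_t wrap_add32(int32_t a, int32_t b)
{
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    return (int32_t)sum;
}
```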

So again, the context here is understanding the initial
motivation. I've mentioned reasons why they don't change it now
(there _are_ arguments about correctness, but compiler writers
also argue strongly that making signed integer overflow well
defined would prohibit them from implementing what they consider
to be important optimizations).

- Dan C.
