On 14.08.2025 23:44, Dan Cross wrote:
In article <107l5ju$k78a$[email protected]>,
David Brown <[email protected]> wrote:
On 14.08.2025 17:44, Dan Cross wrote:
In article <sknnQ.168942$[email protected]>,
Scott Lurndal <[email protected]> wrote:
Both Burroughs Large Systems (48-bit stack machine) and the
Sperry 1100/2200 (36-bit) systems had (have, in emulation today)
C compilers.
Yup. The 1100-series machines were (are) 1's complement. Those
are the ones I usually think of when cursing that signed integer
overflow is UB in C.
I don't think anyone is compiling C23 code for those machines,
but back in the late 1980s, they were still enough of a going
concern that they could influence the emerging C standard. Not
so much anymore.
They would presumably have been part of the justification for supporting
multiple signed integer formats at the time.
C90 doesn't have much to say about this at all, other than
saying that the actual representation and ranges of the integer
types are implementation defined (G.3.5 para 1).
C90 does say that, "The representations of integral types shall
define values by use of a pure binary numeration system" (sec
6.1.2.5).
C99 tightens this up and talks about 2's comp, 1's comp, and
sign/mag as being the permissible representations (J.3.5, para
1).
Yes. Early C didn't go into the details, then C99 described the systems
that could realistically be used. And now in C23 only two's complement
is allowed.
UB on signed integer
arithmetic overflow is a different matter altogether.
I disagree.
You have overflow when the mathematical result of an operation cannot be
expressed accurately in the type - regardless of the representation
format for the numbers. Your options, as a language designer or
implementer, for handling the overflow are the same regardless of the
representation. You can pick a fixed value to return, or saturate, or
invoke some kind of error handler mechanism, or return a "don't care"
unspecified value of the type, or perform a specified algorithm to get a
representable value (such as reduction modulo 2^n), or you can simply
say the program is broken if this happens (it is UB).
I don't see where the representation comes into it - overflow is a
matter of values and the ranges that can be stored in a type, not how
those values are stored in the bits of the data.
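To make that list of options concrete, here is a minimal sketch of three of them for 32-bit addition. The function names are invented for illustration and come from no standard or library. Note that every function works purely on values and ranges; none of them inspects how the bits are laid out.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a + b is not representable in int32_t.
   The comparisons are arranged so the check itself cannot overflow. */
static bool add_would_overflow(int32_t a, int32_t b)
{
    return (b > 0 && a > INT32_MAX - b) || (b < 0 && a < INT32_MIN - b);
}

/* Option: saturate at the nearest representable value. */
static int32_t add_saturate(int32_t a, int32_t b)
{
    if (add_would_overflow(a, b))
        return b > 0 ? INT32_MAX : INT32_MIN;
    return a + b;
}

/* Option: reduce the result modulo 2^32. The sum is computed in unsigned
   arithmetic (which is defined to wrap) and mapped back to a signed value
   without relying on an out-of-range conversion. */
static int32_t add_wrap(int32_t a, int32_t b)
{
    uint32_t u = (uint32_t)a + (uint32_t)b;
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    return -(int32_t)(UINT32_MAX - u) - 1;
}

/* Option: report the error to the caller and leave *out untouched. */
static bool add_checked(int32_t a, int32_t b, int32_t *out)
{
    if (add_would_overflow(a, b))
        return false;
    *out = a + b;
    return true;
}
```

With these definitions, add_saturate(INT32_MAX, 1) yields INT32_MAX, add_wrap(INT32_MAX, 1) yields INT32_MIN, and add_checked(INT32_MAX, 1, &r) returns false. The remaining option, UB, is what plain `a + b` gives you.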
Regardless, signed integer overflow remains UB in the current C
standard, never mind definitionally following 2's complement
semantics. Usually this is done on the basis of performance
arguments: some seemingly-important loop optimizations can be
made if the compiler can assert that overflow Cannot Happen.
The justification for "signed integer arithmetic overflow is UB" is in
the C standards 6.5p5 under "Expressions" :
Not in ANSI/ISO 9899-1990. In that revision of the standard,
sec 6.5 covers declarations.
"""
If an exceptional condition occurs during the evaluation of an
expression (that is, if the result is not mathematically defined or not
in the range of representable values for its type), the behavior is
undefined.
"""
In C90, this language appears in sec 6.3 para 5. Note, however,
that they do not define what an exception _is_, only a few
things that _may_ cause one. See below.
It's basically the same in C90 onwards, with just small changes to the
wording. And it /does/ define what is meant by an "exceptional
condition" (or just "exception" in C90) - that is done by the part in parentheses.
It actually has absolutely nothing to do with signed integer
representation, or machine hardware.
Consider this language from the (non-normative) example 4 in sec
5.1.2.3:
|On a machine in which overflows produce an exception and in
|which the range of values representable by an *int* is
|[-32768,+32767], the implementation cannot rewrite this
|expression as [continues with the specifics of the example]....
That seems pretty clear that they're thinking about machines
that actually generate a hardware trap of some kind on overflow.
They are thinking about that possibility, yes. In C90, the term
"exception" here was not clearly defined - and it is definitely not the
same as the term "exception" in 6.3p5. The wording was improved in C99
without changing the intended meaning - there the term in the paragraph
under "Expressions" is "exceptional condition" (defined in that
paragraph), while in the example in "Execution environments", it says
"On a machine in which overflows produce an explicit trap". (C11
further clarifies what "performs a trap" means.)
But this is about re-arrangements the compiler is allowed to make, or
barred from making - it can't make re-arrangements that would mean
execution failed when the direct execution of the code according to the
C abstract machine would have worked correctly (without ever having
encountered an "exceptional condition" or other UB). Representation is
not relevant here - there is nothing about two's complement, ones'
complement, sign-magnitude, or anything else. Even the machine hardware
is not actually particularly important, given that most processors
support non-trapping integer arithmetic instructions and for those that
don't have explicit trap instructions, a compiler could generate "jump
if overflow flag set" or similar instructions to emulate traps
reasonably efficiently. (Many compilers support that kind of thing as
an option to aid debugging.)
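As a rough sketch of what such an emulated trap could look like when written by hand: the code below uses `__builtin_add_overflow`, which is a GCC and Clang extension rather than standard C, and the wrapper names are made up for illustration. GCC and Clang also offer `-ftrapv` and `-fsanitize=signed-integer-overflow` to get comparable checks without changing the source.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Returns true if a + b overflows int. The wrapped result is stored in
   *sum either way. Relies on a GCC/Clang builtin, not on ISO C. */
static bool add_overflows(int a, int b, int *sum)
{
    return __builtin_add_overflow(a, b, sum);
}

/* Emulated "trap on overflow": abort the program instead of continuing
   with a result that is mathematically wrong. */
static int add_or_trap(int a, int b)
{
    int sum;
    if (add_overflows(a, b, &sum))
        abort();
    return sum;
}
```

On common targets the builtin typically compiles to an add followed by a conditional branch on the overflow flag, which is the "jump if overflow flag set" pattern described above.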
It doesn't even have much to do
with integers at all. It is simply that if the calculation can't give a
correct answer, then the C standards don't say anything about the
results or effects.
The point is that when the results of an integer computation are
too big, there is no way to get the correct answer in the types used.
Two's complement wrapping is /not/ correct. If you add two real-world
positive integers, you don't get a negative integer.
Sorry, but I don't buy this argument as anything other than a
justification after the fact. We're talking about history and
motivation here, not the behavior described in the standard.
It is a fair point that I am describing a rational and sensible reason
for UB on arithmetic overflow - and I do not know the motivation of the
early C language designers, compiler implementers, and authors of the
first C standard.
I do know, however, that the principle of "garbage in, garbage out" was
well established long before C was conceived. And programmers of that
time were familiar with the concept of functions and operations being
defined for appropriate inputs, and having no defined behaviour for
invalid inputs. C is full of other things where behaviour is left
undefined when no sensible correct answer can be specified, and that is
not just because the behaviour of different hardware could vary. It
seems perfectly reasonable to me to suppose that signed integer
arithmetic overflow is just another case, no different from
dereferencing an invalid pointer, dividing by zero, or any one of the
other UB's in the standards.
In particular, C is a programming language for actual machines,
not a mathematical notation; the language is free to define the
behavior of arithmetic expressions in any way it chooses, though
one presumes it would do so in a way that makes sense for the
machines that it targets.
Yes, that is true. It is, however, also important to remember that it
was based on a general abstract machine, not any particular hardware,
and that the operations were intended to follow standard mathematics as
well as practically possible - operations and expressions in C were not designed for any particular hardware. (Though some design choices were
biased by particular hardware.)
Thus, it could have formalized the
result of signed integer overflow to follow 2's complement
semantics had the committee so chosen, in which case the result
would not be "incorrect", it would be well-defined with respect
to the semantics of the language. Java, for example, does this,
as does C11 (and later) atomic integer operations. Indeed, the
C99 rationale document makes frequent reference to twos
complement, where overflow and modular behavior are frequently
equivalent, being the common case. But aside from the more
recent atomics support, C _chose_ not to do this.
It could have made signed integer overflow defined behaviour, but it did
not. The C standards committee have explicitly chosen not to do that,
even after deciding that two's complement is the only supported
representation for signed integers in C23 onwards. It is fine to have
two's complement representation, and fine to have modulo arithmetic in
some circumstances, while leaving other arithmetic overflow undefined.
Unsigned integer operations in C have always been defined as modulo
arithmetic - addition of unsigned values is a different operation from
addition of signed values. Having some modulo behaviour does not in any
way imply that signed arithmetic should be modulo.
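A small sketch of that distinction, with illustrative function names: the two functions below look alike in source, but they are different operations with different domains.

```c
#include <limits.h>

/* Unsigned addition: defined for all inputs, result reduced
   modulo UINT_MAX + 1. */
static unsigned add_unsigned(unsigned a, unsigned b)
{
    return a + b;
}

/* Unsigned subtraction wraps in the same way. */
static unsigned sub_unsigned(unsigned a, unsigned b)
{
    return a - b;
}

/* Signed addition: defined only when the mathematical sum fits in int.
   Calling this with, say, (INT_MAX, 1) is undefined behaviour. */
static int add_signed(int a, int b)
{
    return a + b;
}
```

Here add_unsigned(UINT_MAX, 1u) is 0u and sub_unsigned(0u, 1u) is UINT_MAX, both guaranteed by the standard. No corresponding guarantee exists for add_signed at the edges of its range.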
In Java, the language designers decided that integer arithmetic
operations would be modulo operations. Wrapping therefore gives the
correct answer for those operations - it does not give the correct
answer for mathematical integer operations. And Java loses common
mathematical identities which C retains - such as the identity that
adding a positive integer to another integer will increase its value.
Something always has to be lost when approximating unbounded
mathematical integers in a bounded implementation - I think C made the
right choices here about what to keep and what to lose, and Java made
the wrong choices. (Others may of course have different opinions.)
In Zig, unsigned integer arithmetic overflow is also UB as these
operations are not defined as modulo. I think that is a good natural
choice too - but it is useful for a language to have a way to do
wrapping arithmetic on the occasions you need it.
Also, consider that _unsigned_ arithmetic is defined as having
wrap-around semantics similar to modular arithmetic, and thus
incapable of overflow.
Yes. Unsigned arithmetic operations are different operations from
signed arithmetic operations in C.
But that's simply a fiction invented for
the abstract machine described informally in the standard: it
requires special handling on machines like the 1100 series,
because those machines might trap on overflow. The C committee
could just as well have said that the unsigned arithmetic
_could_ overflow and that the result was UB.
They could have done that (as the Zig folk did).
So why did C choose this way? The only logical reason is that
there were machines at the time where a) integer overflow
caused machine exceptions, and b) the representation of signed
integers was not well-defined, so that the actual value
resulting from overflow could not be rigorously defined. Given
that C90 mandated a binary representation for integers and so
the representation of unsigned integers is basically common,
there was no need to do that for unsigned arithmetic.
Not at all. Usually when someone says "the only logical reason is...",
they really mean "the only logical reason /I/ can think of is...", or
"the only reason that /I/ can think of that /I/ think is logical is...".
For a language that can be used as a low-level systems language, it is important to be able to do modulo arithmetic efficiently. It is needed
for a number of low-level tasks, including the implementation of large arithmetic operations, handling timers, counters, and other bits and
pieces. So it was definitely a useful thing to have in C.
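Two typical uses, sketched with hypothetical helper names: measuring elapsed time on a free-running 32-bit tick counter, and adding multi-word numbers where the carry is detected through wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

/* Ticks elapsed between two readings of a free-running 32-bit counter.
   The modulo subtraction gives the right answer even if the counter
   wrapped once between the readings. */
static uint32_t ticks_elapsed(uint32_t start, uint32_t now)
{
    return now - start;
}

/* True once at least `timeout` ticks have passed since `start`. */
static bool has_expired(uint32_t start, uint32_t now, uint32_t timeout)
{
    return ticks_elapsed(start, now) >= timeout;
}

/* Adds two 64-bit numbers stored as two 32-bit limbs, least significant
   limb first. The low-limb sum wrapped exactly when it ends up smaller
   than one of its operands, which gives the carry. */
static void add_two_limbs(const uint32_t a[2], const uint32_t b[2],
                          uint32_t sum[2])
{
    uint32_t low = a[0] + b[0];
    uint32_t carry = low < a[0];
    sum[0] = low;
    sum[1] = a[1] + b[1] + carry;
}
```

For example, ticks_elapsed(0xFFFFFFF0u, 0x10u) is 0x20u, even though the counter rolled over between the two readings.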
For a language that can be used as a fast and efficient application
language, it must have a reasonable approximation to mathematical
integer arithmetic. Implementations should not be forced to have
behaviours beyond the mathematically sensible answers - if a calculation
can't be done correctly, there's no point in doing it. Giving nonsense
results does not help anyone - C programmers or toolchain implementers,
so the language should not specify any particular result. More sensible defined overflow behaviour - saturation, error values, language
exceptions or traps, etc., would be very inefficient on most hardware.
So UB is the best choice - and implementations can do something
different if they like.
Too many options make a language bigger - harder to implement, harder to
learn, harder to use. So it makes sense to have modulo arithmetic for
unsigned types, and normal arithmetic for signed types.
I am not claiming to know that this is the reasoning made by the C
language pioneers. But it is definitely an alternative logical reason
for C being the way it is.
And of course, even today, C still targets oddball platforms
like DSPs and custom chips, where assumptions about the ubiquity
of 2's comp may not hold.
Modern C and C++ standards have dropped support for signed integer
representation other than two's complement, because they are not in use
in any modern hardware (including any DSP's) - at least, not for
general-purpose integers. Both committees have consistently voted to
keep overflow as UB.
Yes. As I said, performance is often the justification.
I'm not convinced that there are no custom chips and/or DSPs
that are not manufactured today. They may not be common, their
mere existence is certainly dumb and offensive, but that does
not mean that they don't exist. Note that the survey in, e.g., https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm
only mentions _popular_ DSPs, not _all_ DSPs.
I think you might have missed a few words in that paragraph, but I
believe I know what you intended. There are certainly DSPs and other
cores that have strong support for alternative overflow behaviour -
saturation is very common in DSPs, and it is also common to have a
"sticky overflow" flag so that you can do lots of calculations in a
tight loop, and check for problems once you are finished. I think it is
highly unlikely that you'll find a core with something other than two's complement as the representation for signed integer types, though I
can't claim that I know /all/ devices! (I do know a bit about more
cores than would be considered popular or common.)
Of course, if such machines exist, I will certainly concede that
I doubt very much that anyone is targeting them with C code
written to a modern standard.
Modern C is definitely used on DSPs with strong saturation support.
(Even ARM cores have saturated arithmetic instructions.) But they can
also handle two's complement wrapped signed integer arithmetic if the programmer wants that - after all, it's exactly the same in the hardware
as modulo unsigned arithmetic (except for division). That doesn't mean
that wrapping signed integer overflow is useful or desired behaviour.