• Re: System calls

    From Thomas Koenig@21:1/5 to David Brown on Thu Aug 14 17:43:50 2025
    David Brown <[email protected]> schrieb:

    The point is that when the results of an integer computation are
    too big, there is no way to get the correct answer in the types used.
    Two's complement wrapping is /not/ correct. If you add two real-world
    positive integers, you don't get a negative integer.

    I believe it was you who wrote "If you add enough apples to a
    pile, the number of apples becomes negative", so there is
    clearly a defined physical meaning to overflow.

    :-)
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Dan Cross on Thu Aug 14 19:15:42 2025
    On 14.08.2025 17:44, Dan Cross wrote:
    In article <sknnQ.168942$[email protected]>,
    Scott Lurndal <[email protected]> wrote:
    Both Burroughs Large Systems (48-bit stack machine) and the
    Sperry 1100/2200 (36-bit) systems had (have, in emulation today)
    C compilers.

    Yup. The 1100-series machines were (are) 1's complement. Those
    are the ones I usually think of when cursing that signed integer
    overflow is UB in C.

    I don't think anyone is compiling C23 code for those machines,
    but back in the late 1980s, they were still enough of a going
    concern that they could influence the emerging C standard. Not
    so much anymore.


    They would presumably have been part of the justification for
    supporting multiple signed integer formats at the time. UB on signed
    integer arithmetic overflow is a different matter altogether.

    Regardless, signed integer overflow remains UB in the current C
    standard, nevermind definitionally following 2s complement
    semantics. Usually this is done on the basis of performance
    arguments: some seemingly-important loop optimizations can be
    made if the compiler can assert that overflow Cannot Happen.


    The justification for "signed integer arithmetic overflow is UB" is in
    the C standards 6.5p5 under "Expressions" :

    """
    If an exceptional condition occurs during the evaluation of an
    expression (that is, if the result is not mathematically defined or not
    in the range of representable values for its type), the behavior is
    undefined.
    """

    It actually has absolutely nothing to do with signed integer
    representation, or machine hardware. It doesn't even have much to do
    with integers at all. It is simply that if the calculation can't give a
    correct answer, then the C standards don't say anything about the
    results or effects.

    The point is that when the results of an integer computation are
    too big, there is no way to get the correct answer in the types used.
    Two's complement wrapping is /not/ correct. If you add two real-world
    positive integers, you don't get a negative integer.

    And of course, even today, C still targets oddball platforms
    like DSPs and custom chips, where assumptions about the ubiquity
    of 2's comp may not hold.


    Modern C and C++ standards have dropped support for signed integer
    representation other than two's complement, because they are not in use
    in any modern hardware (including any DSP's) - at least, not for
    general-purpose integers. Both committees have consistently voted to
    keep overflow as UB.

  • From Dan Cross@21:1/5 to [email protected] on Thu Aug 14 21:44:42 2025
    In article <107l5ju$k78a$[email protected]>,
    David Brown <[email protected]> wrote:
    On 14.08.2025 17:44, Dan Cross wrote:
    In article <sknnQ.168942$[email protected]>,
    Scott Lurndal <[email protected]> wrote:
    Both Burroughs Large Systems (48-bit stack machine) and the
    Sperry 1100/2200 (36-bit) systems had (have, in emulation today)
    C compilers.

    Yup. The 1100-series machines were (are) 1's complement. Those
    are the ones I usually think of when cursing that signed integer
    overflow is UB in C.

    I don't think anyone is compiling C23 code for those machines,
    but back in the late 1980s, they were still enough of a going
    concern that they could influence the emerging C standard. Not
    so much anymore.

    They would presumably have been part of the justification for supporting
    multiple signed integer formats at the time.

    C90 doesn't have much to say about this at all, other than
    saying that the actual representation and ranges of the integer
    types are implementation defined (G.3.5 para 1).

    C90 does say that, "The representations of integral types shall
    define values by use of a pure binary numeration system" (sec
    6.1.2.5).

    C99 tightens this up and talks about 2's comp, 1's comp, and
    sign/mag as being the permissible representations (J.3.5, para
    1).

    UB on signed integer
    arithmetic overflow is a different matter altogether.

    I disagree.

    Regardless, signed integer overflow remains UB in the current C
    standard, nevermind definitionally following 2s complement
    semantics. Usually this is done on the basis of performance
    arguments: some seemingly-important loop optimizations can be
    made if the compiler can assert that overflow Cannot Happen.

    The justification for "signed integer arithmetic overflow is UB" is in
    the C standards 6.5p5 under "Expressions" :

    Not in ANSI/ISO 9899-1990. In that revision of the standard,
    sec 6.5 covers declarations.

    """
    If an exceptional condition occurs during the evaluation of an
    expression (that is, if the result is not mathematically defined or not
    in the range of representable values for its type), the behavior is
    undefined.
    """

    In C90, this language appears in sec 6.3 para 5. Note, however,
    that they do not define what an exception _is_, only a few
    things that _may_ cause one. See below.

    It actually has absolutely nothing to do with signed integer
    representation, or machine hardware.

    Consider this language from the (non-normative) example 4 in sec
    5.1.2.3:

    |On a machine in which overflows produce an exception and in
    |which the range of values representable by an *int* is
    |[-32768,+32767], the implementation cannot rewrite this
    |expression as [continues with the specifics of the example]....

    That seems pretty clear that they're thinking about machines
    that actually generate a hardware trap of some kind on overflow.

    It doesn't even have much to do
    with integers at all. It is simply that if the calculation can't give a
    correct answer, then the C standards don't say anything about the
    results or effects.

    The point is that when the results of an integer computation are
    too big, there is no way to get the correct answer in the types used.
    Two's complement wrapping is /not/ correct. If you add two real-world
    positive integers, you don't get a negative integer.

    Sorry, but I don't buy this argument as anything other than a
    justification after the fact. We're talking about history and
    motivation here, not the behavior described in the standard.

    In particular, C is a programming language for actual machines,
    not a mathematical notation; the language is free to define the
    behavior of arithmetic expressions in any way it chooses, though
    one presumes it would do so in a way that makes sense for the
    machines that it targets. Thus, it could have formalized the
    result of signed integer overflow to follow 2's complement
    semantics had the committee so chosen, in which case the result
    would not be "incorrect", it would be well-defined with respect
    to the semantics of the language. Java, for example, does this,
    as does C11 (and later) atomic integer operations. Indeed, the
    C99 rationale document makes frequent reference to twos
    complement, where overflow and modular behavior are frequently
    equivalent, being the common case. But aside from the more
    recent atomics support, C _chose_ not to do this.

    Also, consider that _unsigned_ arithmetic is defined as having
    wrap-around semantics similar to modular arithmetic, and thus
    incapable of overflow. But that's simply a fiction invented for
    the abstract machine described informally in the standard: it
    requires special handling on machines like the 1100 series,
    because those machines might trap on overflow. The C committee
    could just as well have said that the unsigned arithmetic
    _could_ overflow and that the result was UB.

    So why did C choose this way? The only logical reason is that
    there were machines at the time where a) integer overflow
    caused machine exceptions, and b) the representation of signed
    integers was not well-defined, so that the actual value
    resulting from overflow could not be rigorously defined. Given
    that C90 mandated a binary representation for integers and so
    the representation of unsigned integers is basically common,
    there was no need to do that for unsigned arithmetic.

    And of course, even today, C still targets oddball platforms
    like DSPs and custom chips, where assumptions about the ubiquity
    of 2's comp may not hold.

    Modern C and C++ standards have dropped support for signed integer
    representation other than two's complement, because they are not in use
    in any modern hardware (including any DSP's) - at least, not for
    general-purpose integers. Both committees have consistently voted to
    keep overflow as UB.

    Yes. As I said, performance is often the justification.

    I'm not convinced that there are no custom chips and/or DSPs
    that are not manufactured today. They may not be common, their
    mere existence is certainly dumb and offensive, but that does
    not mean that they don't exist. Note that the survey in, e.g., https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm
    only mentions _popular_ DSPs, not _all_ DSPs.

    Of course, if such machines exist, I will certainly concede that
    I doubt very much that anyone is targeting them with C code
    written to a modern standard.

    - Dan C.

  • From David Brown@21:1/5 to Thomas Koenig on Fri Aug 15 17:49:58 2025
    On 14.08.2025 19:43, Thomas Koenig wrote:
    David Brown <[email protected]> schrieb:

    The point is that when the results of an integer computation are
    too big, there is no way to get the correct answer in the types used.
    Two's complement wrapping is /not/ correct. If you add two real-world
    positive integers, you don't get a negative integer.

    I believe it was you who wrote "If you add enough apples to a
    pile, the number of apples becomes negative", so there is
    clearly a defined physical meaning to overflow.

    :-)

    Yes, I did say something along those lines - but perhaps not /exactly/
    those words!

  • From David Brown@21:1/5 to Dan Cross on Fri Aug 15 17:49:53 2025
    On 14.08.2025 23:44, Dan Cross wrote:
    In article <107l5ju$k78a$[email protected]>,
    David Brown <[email protected]> wrote:
    On 14.08.2025 17:44, Dan Cross wrote:
    In article <sknnQ.168942$[email protected]>,
    Scott Lurndal <[email protected]> wrote:
    Both Burroughs Large Systems (48-bit stack machine) and the
    Sperry 1100/2200 (36-bit) systems had (have, in emulation today)
    C compilers.

    Yup. The 1100-series machines were (are) 1's complement. Those
    are the ones I usually think of when cursing that signed integer
    overflow is UB in C.

    I don't think anyone is compiling C23 code for those machines,
    but back in the late 1980s, they were still enough of a going
    concern that they could influence the emerging C standard. Not
    so much anymore.

    They would presumably have been part of the justification for supporting
    multiple signed integer formats at the time.

    C90 doesn't have much to say about this at all, other than
    saying that the actual representation and ranges of the integer
    types are implementation defined (G.3.5 para 1).

    C90 does say that, "The representations of integral types shall
    define values by use of a pure binary numeration system" (sec
    6.1.2.5).

    C99 tightens this up and talks about 2's comp, 1's comp, and
    sign/mag as being the permissible representations (J.3.5, para
    1).

    Yes. Early C didn't go into the details, then C99 described the systems
    that could realistically be used. And now in C23 only two's complement
    is allowed.


    UB on signed integer
    arithmetic overflow is a different matter altogether.

    I disagree.


    You have overflow when the mathematical result of an operation cannot be
    expressed accurately in the type - regardless of the representation
    format for the numbers. Your options, as a language designer or
    implementer, of handling the overflow are the same regardless of the
    representation. You can pick a fixed value to return, or saturate, or
    invoke some kind of error handler mechanism, or return a "don't care"
    unspecified value of the type, or perform a specified algorithm to get a
    representable value (such as reduction modulo 2^n), or you can simply
    say the program is broken if this happens (it is UB).
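
    As a rough illustration of two of those options, here is a minimal
    sketch in C of a saturating add and an add that reports failure to the
    caller. The function names are hypothetical and chosen only for this
    example; the range checks are done before the addition so that the
    sketch itself never executes an overflowing signed operation.

```c
#include <limits.h>
#include <stdbool.h>

/* Saturate: clamp the result to the nearest representable value. */
int add_saturating(int a, int b)
{
    if (b > 0 && a > INT_MAX - b)
        return INT_MAX;
    if (b < 0 && a < INT_MIN - b)
        return INT_MIN;
    return a + b;               /* cannot overflow at this point */
}

/* Error-handler style: return false instead of producing a value. */
bool add_checked(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;           /* result not representable in int */
    *out = a + b;
    return true;
}
```

    Neither function depends on how the bits of an int are laid out; both
    are written purely in terms of values and the range of the type.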

    I don't see where the representation comes into it - overflow is a
    matter of values and the ranges that can be stored in a type, not how
    those values are stored in the bits of the data.

    Regardless, signed integer overflow remains UB in the current C
    standard, nevermind definitionally following 2s complement
    semantics. Usually this is done on the basis of performance
    arguments: some seemingly-important loop optimizations can be
    made if the compiler can assert that overflow Cannot Happen.

    The justification for "signed integer arithmetic overflow is UB" is in
    the C standards 6.5p5 under "Expressions" :

    Not in ANSI/ISO 9899-1990. In that revision of the standard,
    sec 6.5 covers declarations.

    """
    If an exceptional condition occurs during the evaluation of an
    expression (that is, if the result is not mathematically defined or not
    in the range of representable values for its type), the behavior is
    undefined.
    """

    In C90, this language appears in sec 6.3 para 5. Note, however,
    that they do not define what an exception _is_, only a few
    things that _may_ cause one. See below.


    It's basically the same in C90 onwards, with just small changes to the
    wording. And it /does/ define what is meant by an "exceptional
    condition" (or just "exception" in C90) - that is done by the part in
    parentheses.

    It actually has absolutely nothing to do with signed integer
    representation, or machine hardware.

    Consider this language from the (non-normative) example 4 in sec
    5.1.2.3:

    |On a machine in which overflows produce an exception and in
    |which the range of values representable by an *int* is
    |[-32768,+32767], the implementation cannot rewrite this
    |expression as [continues with the specifics of the example]....

    That seems pretty clear that they're thinking about machines
    that actually generate a hardware trap of some kind on overflow.


    They are thinking about that possibility, yes. In C90, the term
    "exception" here was not clearly defined - and it is definitely not the
    same as the term "exception" in 6.3p5. The wording was improved in C99
    without changing the intended meaning - there the term in the paragraph
    under "Expressions" is "exceptional condition" (defined in that
    paragraph), while in the example in "Execution environments", it says
    "On a machine in which overflows produce an explicit trap". (C11
    further clarifies what "performs a trap" means.)

    But this is about re-arrangements the compiler is allowed to make, or
    barred from making - it can't make re-arrangements that would mean
    execution failed when the direct execution of the code according to the
    C abstract machine would have worked correctly (without ever having
    encountered an "exceptional condition" or other UB). Representation is
    not relevant here - there is nothing about two's complement, ones'
    complement, sign-magnitude, or anything else. Even the machine hardware
    is not actually particularly important, given that most processors
    support non-trapping integer arithmetic instructions and for those that
    don't have explicit trap instructions, a compiler could generate "jump
    if overflow flag set" or similar instructions to emulate traps
    reasonably efficiently. (Many compilers support that kind of thing as
    an option to aid debugging.)
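
    A sketch of what such an emulated trap could look like if written by
    hand. It assumes a GCC- or Clang-compatible compiler, since
    __builtin_add_overflow is a compiler extension and not part of
    standard C before C23's <stdckdint.h>; the function name add_or_trap
    is hypothetical. Options such as -ftrapv or
    -fsanitize=signed-integer-overflow in GCC and Clang are examples of the
    debugging aids meant here.

```c
#include <stdio.h>
#include <stdlib.h>

/* Add two ints; abort the program if the result is not representable.
 * Assumes GCC/Clang: __builtin_add_overflow returns nonzero on overflow
 * and typically compiles to an add followed by a jump on the overflow flag. */
int add_or_trap(int a, int b)
{
    int result;
    if (__builtin_add_overflow(a, b, &result)) {
        fputs("signed overflow in add_or_trap\n", stderr);
        abort();                /* stands in for a hardware trap */
    }
    return result;
}
```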


    It doesn't even have much to do
    with integers at all. It is simply that if the calculation can't give a
    correct answer, then the C standards don't say anything about the
    results or effects.

    The point is that when the results of an integer computation are
    too big, there is no way to get the correct answer in the types used.
    Two's complement wrapping is /not/ correct. If you add two real-world
    positive integers, you don't get a negative integer.

    Sorry, but I don't buy this argument as anything other than a
    justification after the fact. We're talking about history and
    motivation here, not the behavior described in the standard.

    It is a fair point that I am describing a rational and sensible reason
    for UB on arithmetic overflow - and I do not know the motivation of the
    early C language designers, compiler implementers, and authors of the
    first C standard.

    I do know, however, that the principle of "garbage in, garbage out" was
    well established long before C was conceived. And programmers of that
    time were familiar with the concept of functions and operations being
    defined for appropriate inputs, and having no defined behaviour for
    invalid inputs. C is full of other things where behaviour is left
    undefined when no sensible correct answer can be specified, and that is
    not just because the behaviour of different hardware could vary. It
    seems perfectly reasonable to me to suppose that signed integer
    arithmetic overflow is just another case, no different from
    dereferencing an invalid pointer, dividing by zero, or any one of the
    other UB's in the standards.


    In particular, C is a programming language for actual machines,
    not a mathematical notation; the language is free to define the
    behavior of arithmetic expressions in any way it chooses, though
    one presumes it would do so in a way that makes sense for the
    machines that it targets.

    Yes, that is true. It is, however, also important to remember that it
    was based on a general abstract machine, not any particular hardware,
    and that the operations were intended to follow standard mathematics as
    well as practically possible - operations and expressions in C were not
    designed for any particular hardware. (Though some design choices were
    biased by particular hardware.)

    Thus, it could have formalized the
    result of signed integer overflow to follow 2's complement
    semantics had the committee so chosen, in which case the result
    would not be "incorrect", it would be well-defined with respect
    to the semantics of the language. Java, for example, does this,
    as does C11 (and later) atomic integer operations. Indeed, the
    C99 rationale document makes frequent reference to twos
    complement, where overflow and modular behavior are frequently
    equivalent, being the common case. But aside from the more
    recent atomics support, C _chose_ not to do this.


    It could have made signed integer overflow defined behaviour, but it did
    not. The C standards committee have explicitly chosen not to do that,
    even after deciding that two's complement is the only supported
    representation for signed integers in C23 onwards. It is fine to have
    two's complement representation, and fine to have modulo arithmetic in
    some circumstances, while leaving other arithmetic overflow undefined.
    Unsigned integer operations in C have always been defined as modulo
    arithmetic - addition of unsigned values is a different operation from
    addition of signed values. Having some modulo behaviour does not in any
    way imply that signed arithmetic should be modulo.
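
    To make the unsigned case concrete, a minimal sketch (the function
    names are hypothetical): unsigned results are reduced modulo
    UINT_MAX + 1, so these operations are fully defined at the ends of
    the range.

```c
#include <limits.h>

/* Unsigned arithmetic is defined modulo UINT_MAX + 1. */
unsigned next_unsigned(unsigned x)
{
    return x + 1u;              /* UINT_MAX wraps to 0 */
}

unsigned prev_unsigned(unsigned x)
{
    return x - 1u;              /* 0 wraps to UINT_MAX */
}
```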

    In Java, the language designers decided that integer arithmetic
    operations would be modulo operations. Wrapping therefore gives the
    correct answer for those operations - it does not give the correct
    answer for mathematical integer operations. And Java loses common
    mathematical identities which C retains - such as the identity that
    adding a positive integer to another integer will increase its value.
    Something always has to be lost when approximating unbounded
    mathematical integers in a bounded implementation - I think C made the
    right choices here about what to keep and what to lose, and Java made
    the wrong choices. (Others may of course have different opinions.)
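
    The identity in question can be sketched in C as follows (function
    names are hypothetical). Because signed overflow is undefined, a
    compiler is permitted to treat the signed comparison as always true;
    whether a given compiler actually does so depends on the compiler and
    its optimisation settings. The unsigned version wraps, so the identity
    does not hold there.

```c
#include <limits.h>
#include <stdbool.h>

/* Signed: x + 1 either is greater than x or overflows (UB), so a
 * compiler may simplify this to "return true". */
bool signed_successor_is_greater(int x)
{
    return x + 1 > x;
}

/* Unsigned: wrapping is defined, so this is false when x == UINT_MAX. */
bool unsigned_successor_is_greater(unsigned x)
{
    return x + 1u > x;
}
```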

    In Zig, unsigned integer arithmetic overflow is also UB as these
    operations are not defined as modulo. I think that is a good natural
    choice too - but it is useful for a language to have a way to do
    wrapping arithmetic on the occasions you need it.

    Also, consider that _unsigned_ arithmetic is defined as having
    wrap-around semantics similar to modular arithmetic, and thus
    incapable of overflow.

    Yes. Unsigned arithmetic operations are different operations from
    signed arithmetic operations in C.

    But that's simply a fiction invented for
    the abstract machine described informally in the standard: it
    requires special handling on machines like the 1100 series,
    because those machines might trap on overflow. The C committee
    could just as well have said that the unsigned arithmetic
    _could_ overflow and that the result was UB.


    They could have done that (as the Zig folk did).

    So why did C choose this way? The only logical reason is that
    there were machines at the time where a) integer overflow
    caused machine exceptions, and b) the representation of signed
    integers was not well-defined, so that the actual value
    resulting from overflow could not be rigorously defined. Given
    that C90 mandated a binary representation for integers and so
    the representation of unsigned integers is basically common,
    there was no need to do that for unsigned arithmetic.


    Not at all. Usually when someone says "the only logical reason is...",
    they really mean "the only logical reason /I/ can think of is...", or
    "the only reason that /I/ can think of that /I/ think is logical is...".

    For a language that can be used as a low-level systems language, it is
    important to be able to do modulo arithmetic efficiently. It is needed
    for a number of low-level tasks, including the implementation of large
    arithmetic operations, handling timers, counters, and other bits and
    pieces. So it was definitely a useful thing to have in C.
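
    The timer case is a typical example. A minimal sketch, assuming a
    free-running 32-bit tick counter that wraps around (the function names
    are hypothetical): the modulo subtraction gives the right elapsed time
    even when the counter has wrapped between the two readings.

```c
#include <stdbool.h>
#include <stdint.h>

/* Elapsed ticks between two readings of a wrapping 32-bit counter.
 * The result is reduced modulo 2^32, so a wrap between start and now
 * is handled without any special case. */
uint32_t ticks_elapsed(uint32_t start, uint32_t now)
{
    return now - start;
}

/* True once at least `timeout` ticks have passed since `start`. */
bool timeout_expired(uint32_t start, uint32_t now, uint32_t timeout)
{
    return ticks_elapsed(start, now) >= timeout;
}
```

    This only works while the real elapsed time is less than 2^32 ticks.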

    For a language that can be used as a fast and efficient application
    language, it must have a reasonable approximation to mathematical
    integer arithmetic. Implementations should not be forced to have
    behaviours beyond the mathematically sensible answers - if a calculation
    can't be done correctly, there's no point in doing it. Giving nonsense
    results does not help anyone - C programmers or toolchain implementers,
    so the language should not specify any particular result. More sensible
    defined overflow behaviour - saturation, error values, language
    exceptions or traps, etc., would be very inefficient on most hardware.
    So UB is the best choice - and implementations can do something
    different if they like.

    Too many options make a language bigger - harder to implement, harder to
    learn, harder to use. So it makes sense to have modulo arithmetic for
    unsigned types, and normal arithmetic for signed types.

    I am not claiming to know that this is the reasoning made by the C
    language pioneers. But it is definitely an alternative logical reason
    for C being the way it is.

    And of course, even today, C still targets oddball platforms
    like DSPs and custom chips, where assumptions about the ubiquity
    of 2's comp may not hold.

    Modern C and C++ standards have dropped support for signed integer
    representation other than two's complement, because they are not in use
    in any modern hardware (including any DSP's) - at least, not for
    general-purpose integers. Both committees have consistently voted to
    keep overflow as UB.

    Yes. As I said, performance is often the justification.

    I'm not convinced that there are no custom chips and/or DSPs
    that are not manufactured today. They may not be common, their
    mere existence is certainly dumb and offensive, but that does
    not mean that they don't exist. Note that the survey in, e.g., https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm
    only mentions _popular_ DSPs, not _all_ DSPs.


    I think you might have missed a few words in that paragraph, but I
    believe I know what you intended. There are certainly DSPs and other
    cores that have strong support for alternative overflow behaviour -
    saturation is very common in DSPs, and it is also common to have a
    "sticky overflow" flag so that you can do lots of calculations in a
    tight loop, and check for problems once you are finished. I think it is
    highly unlikely that you'll find a core with something other than two's
    complement as the representation for signed integer types, though I
    can't claim that I know /all/ devices! (I do know a bit about more
    cores than would be considered popular or common.)
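
    The "sticky overflow" idea can be imitated in portable C, though
    without the efficiency of a hardware flag. This is only a software
    sketch with a hypothetical function name, combining saturation with a
    flag that is set on the first overflow and checked once after the
    loop; it does not model any particular DSP.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Sum an array with saturation. *overflowed is set if any step would
 * have left the range of int, and stays set for the rest of the loop. */
int sum_with_sticky_flag(const int *values, size_t count, bool *overflowed)
{
    int total = 0;
    bool sticky = false;

    for (size_t i = 0; i < count; i++) {
        int v = values[i];
        if ((v > 0 && total > INT_MAX - v) ||
            (v < 0 && total < INT_MIN - v)) {
            sticky = true;                       /* remember the problem */
            total = (v > 0) ? INT_MAX : INT_MIN; /* saturate and carry on */
        } else {
            total += v;
        }
    }
    *overflowed = sticky;
    return total;
}
```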

    Of course, if such machines exist, I will certainly concede that
    I doubt very much that anyone is targeting them with C code
    written to a modern standard.


    Modern C is definitely used on DSPs with strong saturation support.
    (Even ARM cores have saturated arithmetic instructions.) But they can
    also handle two's complement wrapped signed integer arithmetic if the
    programmer wants that - after all, it's exactly the same in the hardware
    as modulo unsigned arithmetic (except for division). That doesn't mean
    that wrapping signed integer overflow is useful or desired behaviour.
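
    For the occasions when wrapping signed addition really is wanted, a
    minimal sketch of getting it without relying on undefined behaviour
    (the function name is hypothetical). The addition is done on unsigned
    values, and the bits are copied back with memcpy, which is well
    defined because int32_t is required to be two's complement with no
    padding bits.

```c
#include <stdint.h>
#include <string.h>

/* Two's complement wrapping add, performed as modulo unsigned addition. */
int32_t add_wrapping_i32(int32_t a, int32_t b)
{
    uint32_t sum = (uint32_t)a + (uint32_t)b;   /* defined: modulo 2^32 */
    int32_t result;
    memcpy(&result, &sum, sizeof result);       /* reinterpret the bits */
    return result;
}
```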

  • From Dan Cross@21:1/5 to [email protected] on Fri Aug 15 18:33:07 2025
    In article <107nkv2$1753a$[email protected]>,
    David Brown <[email protected]> wrote:
    On 14.08.2025 23:44, Dan Cross wrote:
    In article <107l5ju$k78a$[email protected]>,
    David Brown <[email protected]> wrote:
    [snip]
    UB on signed integer
    arithmetic overflow is a different matter altogether.

    I disagree.

    You have overflow when the mathematical result of an operation cannot be
    expressed accurately in the type - regardless of the representation
    format for the numbers. Your options, as a language designer or
    implementer, of handling the overflow are the same regardless of the
    representation. You can pick a fixed value to return, or saturate, or
    invoke some kind of error handler mechanism, or return a "don't care"
    unspecified value of the type, or perform a specified algorithm to get a
    representable value (such as reduction modulo 2^n), or you can simply
    say the program is broken if this happens (it is UB).

    I don't see where the representation comes into it - overflow is a
    matter of values and the ranges that can be stored in a type, not how
    those values are stored in the bits of the data.

    I understood your point. But we are talking about the history of
    the language here, not the presently defined behavior.

    We do, in fact, have historical source materials we can draw
    from when discussing this; there's little need to guess. Here,
    we know that the earliest C implementations simply ignored the
    possibility of overflow. In K&R1, chap 2, sec 2.5 ("Arithmetic
    Operators") on page 38, the authors write, "the action taken on
    overflow or underflow depends on the machine at hand." In
    Appendix A, sec 7 ("Expressions"), page 185, the authors write:
    "The handling of overflow and divide check in expression
    evaluation is machine-dependent. All existing implementations of C
    ignore integer overflows; treatment of division by 0, and all
    floating point exceptions, varies between machines, and is
    usually adjustable by a library function."

    In other words, different machines give different results; some
    will trap, others will differ due to representation issues.
    Nowhere here does it suggest that the language designers were
    worried about getting the "wrong" result, as you have asserted.

    Regardless, signed integer overflow remains UB in the current C
    standard, nevermind definitionally following 2s complement
    semantics. Usually this is done on the basis of performance
    arguments: some seemingly-important loop optimizations can be
    made if the compiler can assert that overflow Cannot Happen.

    The justification for "signed integer arithmetic overflow is UB" is in
    the C standards 6.5p5 under "Expressions" :

    Not in ANSI/ISO 9899-1990. In that revision of the standard,
    sec 6.5 covers declarations.

    """
    If an exceptional condition occurs during the evaluation of an
    expression (that is, if the result is not mathematically defined or not
    in the range of representable values for its type), the behavior is
    undefined.
    """

    In C90, this language appears in sec 6.3 para 5. Note, however,
    that they do not define what an exception _is_, only a few
    things that _may_ cause one. See below.

    It's basically the same in C90 onwards, with just small changes to the
    wording.

    Did I suggest otherwise?

    And it /does/ define what is meant by an "exceptional
    condition" (or just "exception" in C90) - that is done by the part in
    parentheses.

    That is an interpretation.

    It actually has absolutely nothing to do with signed integer
    representation, or machine hardware.

    Consider this language from the (non-normative) example 4 in sec
    5.1.2.3:

    |On a machine in which overflows produce an exception and in
    |which the range of values representable by an *int* is
    |[-32768,+32767], the implementation cannot rewrite this
    |expression as [continues with the specifics of the example]....

    That seems pretty clear that they're thinking about machines
    that actually generate a hardware trap of some kind on overflow.

    They are thinking about that possibility, yes. In C90, the term
    "exception" here was not clearly defined - and it is definitely not the
    same as the term "exception" in 6.3p5. The wording was improved in C99
    without changing the intended meaning - there the term in the paragraph
    under "Expressions" is "exceptional condition" (defined in that
    paragraph), while in the example in "Execution environments", it says
    "On a machine in which overflows produce an explicit trap". (C11
    further clarifies what "performs a trap" means.)

    But this is about re-arrangements the compiler is allowed to make, or
    barred from making - it can't make re-arrangements that would mean
    execution failed when the direct execution of the code according to the
    C abstract machine would have worked correctly (without ever having
    encountered an "exceptional condition" or other UB). Representation is
    not relevant here - there is nothing about two's complement, ones'
    complement, sign-magnitude, or anything else. Even the machine hardware
    is not actually particularly important, given that most processors
    support non-trapping integer arithmetic instructions and for those that
    don't have explicit trap instructions, a compiler could generate "jump
    if overflow flag set" or similar instructions to emulate traps
    reasonably efficiently. (Many compilers support that kind of thing as
    an option to aid debugging.)
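
    As a rough illustration of the "check for overflow in software" idea
    above, here is a minimal sketch in portable C. The name checked_add is
    hypothetical and nothing like it appears in the standard; the range
    test runs before the addition, so the addition itself never overflows.
    (GCC and Clang offer -ftrapv and -fsanitize=signed-integer-overflow as
    examples of the compiler options mentioned above.)

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out and returns true only when
   the mathematical sum is representable as an int. On failure *out is
   left untouched and no overflowing addition is ever evaluated. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;   /* the sum would be out of range */
    *out = a + b;
    return true;
}
```

    A caller decides what "trap" means: abort, return an error, or fall
    back to a wider type.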

    It doesn't even have much to do
    with integers at all. It is simply that if the calculation can't give a
    correct answer, then the C standards don't say anything about the
    results or effects.

    The point is that when the results of an integer computation are
    too big, there is no way to get the correct answer in the types used.
    Two's complement wrapping is /not/ correct. If you add two real-world
    positive integers, you don't get a negative integer.

    Sorry, but I don't buy this argument as anything other than a
    justification after the fact. We're talking about history and
    motivation here, not the behavior described in the standard.

    It is a fair point that I am describing a rational and sensible reason
    for UB on arithmetic overflow - and I do not know the motivation of the
    early C language designers, compiler implementers, and authors of the
    first C standard.

    Then there's really nothing more to discuss. The intent here is
    to understand the motivation of those folks.

    Early C didn't even have unsigned; Dennis Ritchie's paper for
    the History of Programming Languages conference said that it
    came around 1977 (https://www.nokia.com/bell-labs/about/dennis-m-ritchie/chist.html;
    see the section on "portability"), and in pre-ANSI C, struct
    fields of `int` type were effectively unsigned (K&R1,
    pp.138,197). I mentioned the quote from K&R1 about overflow
    above, but we see some other hints about signed overflow
    becoming negative in other documents. For instance, K&R2, p 118
    gives the example of a hash function followed by the sentence,
    "unsigned arithmetic ensures that the hash value is
    non-negative." This does not suggest to me that the authors
    thought that the wrapping behavior of twos-complement arithmetic
    was "incorrect".
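
    For readers without the book to hand, a sketch in the style of that
    hash function follows. It is reconstructed for illustration rather
    than quoted; the table size HASHSIZE and the cast to unsigned char are
    assumptions of this sketch. The point is only that every intermediate
    value is unsigned, so the arithmetic wraps in a defined way and the
    result can always be used as an array index.

```c
#define HASHSIZE 101u   /* assumed table size for this sketch */

/* Forms a hash value for string s. All arithmetic is unsigned, so it
   wraps modulo UINT_MAX + 1 and the result is always in [0, HASHSIZE). */
unsigned hash(const char *s)
{
    unsigned hashval;

    for (hashval = 0; *s != '\0'; s++)
        hashval = (unsigned char)*s + 31u * hashval;
    return hashval % HASHSIZE;
}
```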

    I do know, however, that the principle of "garbage in, garbage out" was
    well established long before C was conceived. And programmers of that
    time were familiar with the concept of functions and operations being
    defined for appropriate inputs, and having no defined behaviour for
    invalid inputs. C is full of other things where behaviour is left
    undefined when no sensible correct answer can be specified, and that is
    not just because the behaviour of different hardware could vary. It
    seems perfectly reasonable to me to suppose that signed integer
    arithmetic overflow is just another case, no different from
    dereferencing an invalid pointer, dividing by zero, or any one of the
    other UB's in the standards.

    Indeed; this is effectively what I've been saying: signed
    integer overflow is UB because the behavior of overflow varied
    between the machines of the day, so C could not make assumptions
    about what value would result, in part because of representation
    issues: at the hardware level, signed overflow of the largest
    representable positive integer yields different _values_ between
    1s comp and 2s comp machines. Who is to say which is correct?

    In particular, C is a programming language for actual machines,
    not a mathematical notation; the language is free to define the
    behavior of arithmetic expressions in any way it chooses, though
    one presumes it would do so in a way that makes sense for the
    machines that it targets.

    Yes, that is true. It is, however, also important to remember that it
    was based on a general abstract machine, not any particular hardware,
    and that the operations were intended to follow standard mathematics as
    well as practically possible - operations and expressions in C were not
    designed for any particular hardware. (Though some design choices were
    biased by particular hardware.)

    This is historically inaccurate.

    C was developed on and for the PDP-11 initially, targeting Unix,
    building from Martin Richards's BCPL (which Ritchie and Thompson
    had used under Multics on the GE-645 machine, and GCOS on the
    635) and Ken Thompson's B language, which he had implemented as
    a chopped-down BCPL to be a systems programming language for
    _very_ early Unix on the PDP-7. B was typeless, as the PDP-7
    was word-oriented, and we see vestiges of this ancestral DNA in
    C today. See Ritchie's C history paper for details.

    Concerns for portability, leading to the development of the
    abstract machine informally described by the C standard, came
    much, much later in its evolutionary development.

    Thus, it could have formalized the
    result of signed integer overflow to follow 2's complement
    semantics had the committee so chosen, in which case the result
    would not be "incorrect", it would be well-defined with respect
    to the semantics of the language. Java, for example, does this,
    as does C11 (and later) atomic integer operations. Indeed, the
    C99 rationale document makes frequent reference to twos
    complement, where overflow and modular behavior are frequently
    equivalent, being the common case. But aside from the more
    recent atomics support, C _chose_ not to do this.

    It could have made signed integer overflow defined behaviour, but it did
    not. The C standards committee have explicitly chosen not to do that,
    even after deciding that two's complement is the only supported
    representation for signed integers in C23 onwards. It is fine to have
    two's complement representation, and fine to have modulo arithmetic in
    some circumstances, while leaving other arithmetic overflow undefined.
    Unsigned integer operations in C have always been defined as modulo
    arithmetic - addition of unsigned values is a different operation from
    addition of signed values. Having some modulo behaviour does not in any
    way imply that signed arithmetic should be modulo.

    In Java, the language designers decided that integer arithmetic
    operations would be modulo operations. Wrapping therefore gives the
    correct answer for those operations - it does not give the correct
    answer for mathematical integer operations. And Java loses common
    mathematical identities which C retains - such as the identity that
    adding a positive integer to another integer will increase its value.
    Something always has to be lost when approximating unbounded
    mathematical integers in a bounded implementation - I think C made the
    right choices here about what to keep and what to lose, and Java made
    the wrong choices. (Others may of course have different opinions.)
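
    The difference between the two kinds of addition can be sketched in a
    few lines of C. The function names are hypothetical. The first
    function relies on the defined modulo behaviour of unsigned
    arithmetic; the second expresses the identity mentioned above, which
    holds for every input that does not overflow, and a compiler is
    therefore permitted (not required) to fold it to a constant.

```c
#include <limits.h>

/* Unsigned addition is defined to wrap modulo UINT_MAX + 1,
   so wrap_inc(UINT_MAX) is 0 on every conforming implementation. */
unsigned wrap_inc(unsigned u)
{
    return u + 1u;
}

/* For int, x + 1 overflows only when x == INT_MAX, and that case is
   undefined. For every other x the comparison is true, so a compiler
   may assume it is always true. Do not call this with INT_MAX. */
int incremented_is_greater(int x)
{
    return x + 1 > x;
}
```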

    In Zig, unsigned integer arithmetic overflow is also UB as these
    operations are not defined as modulo. I think that is a good natural
    choice too - but it is useful for a language to have a way to do
    wrapping arithmetic on the occasions you need it.

    None of this seems relevant to understanding the motivations of
    the members of the committee that produced the 1990 C standard,
    other than agreeing that the decision could have been different.

    I would add that very early C treated signed and unsigned
    arithmetic as more or less equivalent. It wasn't until they
    started porting C to machines other than the PDP-11 that it
    started to matter.

    Also, consider that _unsigned_ arithmetic is defined as having
    wrap-around semantics similar to modular arithmetic, and thus
    incapable of overflow.

    Yes. Unsigned arithmetic operations are different operations from
    signed arithmetic operations in C.

    This is the second time you have mentioned this. Did I say
    something that led you to believe that I suggested otherwise, or
    am somehow unaware of this fact?

    But that's simply a fiction invented for
    the abstract machine described informally in the standard: it
    requires special handling on machines like the 1100 series,
    because those machines might trap on overflow. The C committee
    could just as well have said that the unsigned arithmetic
    _could_ overflow and that the result was UB.

    They could have done that (as the Zig folk did).

    Or the SML folks before the Zig folks.

    So why did C choose this way? The only logical reason is that
    there were machines at the time where a) integer overflow
    caused machine exceptions, and b) the representation of signed
    integers was not well-defined, so that the actual value
    resulting from overflow could not be rigorously defined. Given
    that C90 mandated a binary representation for integers and so
    the representation of unsigned integers is basically common,
    there was no need to do that for unsigned arithmetic.

    Not at all. Usually when someone says "the only logical reason is...",
    they really mean "the only logical reason /I/ can think of is...", or
    "the only reason that /I/ can think of that /I/ think is logical is...".

    I probably should have said that I'm also drawing from direct
    references, as well as hints and inferences from other
    historical documents; both editions of K&R as well as early Unix
    source code and the "C Reference Manual" from 6th and 7th
    Edition Unix (the language described in 7th Ed is quite
    different from the language in 6th Ed; most of this was driven
    by a) portability, and b) the need to support
    phototypesetters, hence why the C implemented in 7th Ed and PCC
    is sometimes called "Typesetter C"). This is complemented with
    direct conversations with some of the original players, though
    admittedly those were quite a while ago.

    For a language that can be used as a low-level systems language, it is
    important to be able to do modulo arithmetic efficiently. It is needed
    for a number of low-level tasks, including the implementation of large
    arithmetic operations, handling timers, counters, and other bits and
    pieces. So it was definitely a useful thing to have in C.
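
    The timer case can be sketched as follows, assuming a free-running
    32-bit tick counter (the counter and the name ticks_elapsed are
    hypothetical). Because the result is reduced modulo 2^32, the
    difference stays correct even when the counter has wrapped between the
    two readings, provided fewer than 2^32 ticks have passed.

```c
#include <stdint.h>

/* Hypothetical helper: ticks elapsed between two readings of a
   free-running 32-bit counter. The conversion of the result to
   uint32_t is modulo 2^32, so a wrap between start and now is
   handled without any special case. */
uint32_t ticks_elapsed(uint32_t start, uint32_t now)
{
    return (uint32_t)(now - start);
}
```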

    For a language that can be used as a fast and efficient application
    language, it must have a reasonable approximation to mathematical
    integer arithmetic. Implementations should not be forced to have
    behaviours beyond the mathematically sensible answers - if a calculation
    can't be done correctly, there's no point in doing it. Giving nonsense
    results does not help anyone - C programmers or toolchain implementers,
    so the language should not specify any particular result. More sensible
    defined overflow behaviour - saturation, error values, language
    exceptions or traps, etc., would be very inefficient on most hardware.
    So UB is the best choice - and implementations can do something
    different if they like.

    This is where we differ: you keep asserting notions of
    "correctness", without acknowledging that a) correctness differs
    in this context, and b) the notion of what is "correct" has
    itself differed over time as C has evolved.

    Moreover, when you say, "if a calculation can't be done
    correctly, there's no point in doing it" that seems highly
    specific and reliant on your definition of correctness. My

    Here's an example:

    char foo = 128;
    int x = foo + 1;
    printf("%d\n", x);

    What is printed? (Note: that's rhetorical)

    On the systems I just tested, x86_64, ARM64 and RISCV64, I get
    -127 for the first two, and 129 for the last.

    Of course, we all know that this relies on implementation
    defined behavior around whether `char` is treated as signed or
    unsigned (and resultingly conversion from an unsigned constant
    to signed), but if what you say were true about GIGO, why is
    this not _undefined_ behavior?
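
    A small sketch for checking which case a given implementation falls
    into; the function names are hypothetical. CHAR_MIN from <limits.h> is
    negative exactly when plain char is signed. The explicit cast only
    makes the conversion visible; where char is a signed 8-bit type the
    converted value is implementation-defined (before C23), and -128 is
    merely what common compilers produce.

```c
#include <limits.h>

/* Nonzero when plain char is a signed type on this implementation. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Mirrors the snippet above. Where char is unsigned this returns 129.
   Where char is signed and 8 bits wide, the conversion of 128 is
   implementation-defined; common compilers yield -128, giving -127. */
int foo_plus_one(void)
{
    char foo = (char)128;
    return foo + 1;
}
```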

    Too many options make a language bigger - harder to implement, harder to
    learn, harder to use. So it makes sense to have modulo arithmetic for
    unsigned types, and normal arithmetic for signed types.

    I am not claiming to know that this is the reasoning made by the C
    language pioneers. But it is definitely an alternative logical reason
    for C being the way it is.

    But we _can_ see what those pioneers were thinking by reading
    the artifacts they left behind, which we know, again based on
    primary sources, had an impact on the standards committee.

    And of course, even today, C still targets oddball platforms
    like DSPs and custom chips, where assumptions about the ubiquity
    of 2's comp may not hold.

    Modern C and C++ standards have dropped support for signed integer
    representation other than two's complement, because they are not in use
    in any modern hardware (including any DSP's) - at least, not for
    general-purpose integers. Both committees have consistently voted to
    keep overflow as UB.

    Yes. As I said, performance is often the justification.

    I'm not convinced that there are no custom chips and/or DSPs
    that are not manufactured today. They may not be common, their
    mere existence is certainly dumb and offensive, but that does
    not mean that they don't exist. Note that the survey in, e.g.,
    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm
    only mentions _popular_ DSPs, not _all_ DSPs.

    I think you might have missed a few words in that paragraph, but I
    believe I know what you intended. There are certainly DSPs and other
    cores that have strong support for alternative overflow behaviour -
    saturation is very common in DSPs, and it is also common to have a
    "sticky overflow" flag so that you can do lots of calculations in a
    tight loop, and check for problems once you are finished. I think it is
    highly unlikely that you'll find a core with something other than two's
    complement as the representation for signed integer types, though I
    can't claim that I know /all/ devices! (I do know a bit about more
    cores than would be considered popular or common.)
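
    For readers unfamiliar with the pattern, saturation plus a sticky
    overflow flag can be emulated in portable C along these lines. This is
    a sketch under stated assumptions, not how any particular DSP does it:
    the name sat_sum is hypothetical, and real hardware does the clamping
    in the arithmetic unit rather than with explicit range tests.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: sums n ints, clamping each step to
   [INT_MIN, INT_MAX]. *overflowed is sticky: it is cleared on entry
   and, once any step clamps, stays true for the rest of the loop. */
int sat_sum(const int *v, size_t n, bool *overflowed)
{
    int acc = 0;

    *overflowed = false;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > 0 && acc > INT_MAX - v[i]) {
            acc = INT_MAX;          /* clamp at the top of the range */
            *overflowed = true;
        } else if (v[i] < 0 && acc < INT_MIN - v[i]) {
            acc = INT_MIN;          /* clamp at the bottom of the range */
            *overflowed = true;
        } else {
            acc += v[i];            /* in range: ordinary addition */
        }
    }
    return acc;
}
```

    The caller runs the whole loop and inspects the flag once at the end.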

    I was referring specifically to integer representation here, not
    saturating (or other) operations, but sure.

    Of course, if such machines exist, I will certainly concede that
    I doubt very much that anyone is targeting them with C code
    written to a modern standard.

    Modern C is definitely used on DSPs with strong saturation support.
    (Even ARM cores have saturated arithmetic instructions.) But they can
    also handle two's complement wrapped signed integer arithmetic if the
    programmer wants that - after all, it's exactly the same in the hardware
    as modulo unsigned arithmetic (except for division). That doesn't mean
    that wrapping signed integer overflow is useful or desired behaviour.

    So again, the context here is understanding the initial
    motivation. I've mentioned reasons why they don't change it now
    (there _are_ arguments about correctness, but compiler writers
    also argue strongly that making signed integer overflow well
    defined would prohibit them from implementing what they consider
    to be important optimizations).

    - Dan C.
