• Re: python text, Byte Addressability And Beyond

    From John Levine@21:1/5 to All on Sat May 11 22:53:09 2024
    According to Anton Ertl <[email protected]>:
    > Looking up "splicing strings", I find that this is a term used in
    > connection with Python for specifying substrings. Python3 is a
    > language that lives the codepoint mistake to the extreme (and from
    > what I read, this was one of the major pain points in the
    > Python2->Python3 transition), but anyway, with UTF-8 one way to
    > represent a substring is to use the start index and length in bytes
    > (aka code units) rather than code points.
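A minimal Python sketch of that representation, assuming the text is held as UTF-8 `bytes`: a substring is a byte offset plus a byte length, and `bytes.find` already returns byte offsets, so nothing has to be decoded to locate it. The sample string is made up; it contains a combining diaeresis, so the byte offset and the code-point offset of the same substring differ.

```python
# Hold the text as UTF-8 and describe substrings by (byte offset, byte length).
text = "Ko\u0308nig Stra\u00dfe"       # "König Straße" with a combining diaeresis
data = text.encode("utf-8")

needle = "Stra\u00dfe".encode("utf-8")
start = data.find(needle)              # byte offset, found without decoding
length = len(needle)                   # byte length
sub = data[start:start + length]

print(start, length, sub.decode("utf-8"))   # prints: 8 7 Straße
# The same substring addressed by code points starts at a different index.
print(text.find("Stra\u00dfe"))              # prints: 7
```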

    Python3 has a complex internal string format that stores each string
    as 1, 2, or 4 byte values, depending on what the contents of the
    string are, so ASCII is one byte, UCS-2 is two bytes, and strings that
    contain code points beyond UCS-2 are four bytes. It's not clear how
    hard they try to shrink stuff down when taking substrings.
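One way to observe this from Python itself, assuming CPython 3.3 or later (other implementations store strings differently): adding one code point to a string grows `sys.getsizeof` by the per-code-point width PEP 393 chose. PEP 393 actually picks one byte whenever all code points are at most U+00FF, so Latin-1 text such as "é" is one byte per code point too, not just ASCII. The last check hints at the substring question: on CPython a slice that no longer contains the wide code point appears to be stored at the narrow width again. Both checks rely on implementation details, not documented guarantees.

```python
import sys

def width(ch: str) -> int:
    """Bytes per code point CPython uses for a string made only of `ch` (CPython-specific)."""
    # One extra code point adds exactly one storage unit to the object size.
    return sys.getsizeof(ch * 101) - sys.getsizeof(ch * 100)

for ch in ["a", "\u00e9", "\u0100", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", width(ch))

# A slice that drops the only wide code point: is it stored narrowly again?
wide = "x" * 100 + "\U0001F600"
narrow = wide[:100]
print(sys.getsizeof(narrow) == sys.getsizeof("x" * 100))
```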

    https://peps.python.org/pep-0393/

    Python lets you subscript strings either individual items or
    substrings, and I have written a fair amount of code that does that. I
    realize that if I were doing semantic processing on Greek or Arabic, I
    would not be subscripting and expecting it to return straightforwardly
    useful results.

    The string structure has a field for the length of the string in
    UTF-8, but they don't seem to use it for anything, at least not yet,
    --
    Regards,
    John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to John Levine on Sun May 12 05:40:45 2024
    John Levine <[email protected]> writes:
    > Python3 has a complex internal string format that stores each string
    > as 1, 2, or 4 byte values, depending on what the contents of the
    > string are, so ASCII is one byte, UCS-2 is two bytes, and strings that
    > contain code points beyond UCS-2 are four bytes. It's not clear how
    > hard they try to shrink stuff down when taking substrings.
    >
    > https://peps.python.org/pep-0393/

    This is a nice demonstration of the unnecessary complexity that the
    codepoint mistake leads to. In the general case they can have three representations of the same string: wstr, utf8, and data; only one of
    them needs to be non-NULL, and data is canonical if it is non-NULL
    (not sure what is canonical if wstr and utf8 are present but data is
    not). If data is in latin1 format, but not ASCII, outputting both
    UTF-8 and UTF-16 needs conversion (it's just 8bit->16bit expansion in
    the UTF-16 case, but that means that a fast block copy is
    insufficient). On top of that, they specify both zero termination and
    length indicators: length, utf8_length and wstr_length.

    Of course Python3 has baked this mistake into their API, and once
    software has been written for that API, the complexity becomes
    necessary.

    But if they had decided to just store the data as UTF-8 and use byte
    indexes and lengths in their API, and adjusted the rest of their API accordingly, they could have avoided this complexity and inefficiency,
    and only palindrome and anagram programs that limit themselves to character=codepoint would have become harder to write.

    > Python lets you subscript strings either individual items or
    > substrings, and I have written a fair amount of code that does that. I
    > realize that if I were doing semantic processing on Greek or Arabic, I
    > would not be subscripting and expecting it to return straightforwardly
    > useful results.

    I don't doubt that the API works, it just leads to unnecessary
    complexity in the implementation.

    > The string structure has a field for the length of the string in
    > UTF-8, but they don't seem to use it for anything, at least not yet,

    My understanding from the PEP is that they use it for specifying the
    length of the utf8 representation; of course, they also use zero
    termination, so if the utf8 field is only passed to functions that use zero-termination, the utf8_length field is not used. Given that, as
    soon as data has been initialized, the contents of the utf8 and wstr
    fields are no longer used (they are not canonical), I expect that the
    only function that is called for the utf8 field is that for converting
    from utf8 to the data form.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From Anton Ertl@21:1/5 to Thomas Koenig on Sun May 12 09:00:53 2024
    Thomas Koenig <[email protected]> writes:
    > Anton Ertl <[email protected]> wrote:
    >> The point I wanted to make is that there is the frequent
    >> misconception that dealing with individual arbitrary characters is
    >> something that is relatively common, and that one can do that by using
    >> UTF-32 (or UTF-16); it isn't, and one cannot.

    > Do you really mean one cannot change an individual character
    > using UTF-32?

    Correct. That's the "one cannot" part. A Unicode code point is not
    a character, and what UTF-32 gives you is one code point per code unit
    (a code unit is a fixed size container, 32 bits for UTF-32, 8 bits for
    UTF-8), not one character per code unit. But Unicode supports
    characters that consist of a sequence of several code points, see <https://en.wikipedia.org/wiki/Combining_character>, so if you just
    store one Unicode code point at the address where a different code point
    currently is, you have not overwritten a character, just a code point; admittedly, the result is that you have changed one or two characters,
    but that's probably not what the user wanted.

    E.g., consider the following Gforth code (others can tell you how to
    do it in Python):

    "Ko\u0308nig" cr type

    The output is:

    König

    That is, the second character consists of two Unicode code points, the
    "o" and the "\u0308" (Combining Diaeresis).

    (I think that somewhere along the way from the Forth system to the
    xterm through copying and pasting into Emacs the second character has
    become precomposed, but that's probably just as well, so you can see
    what I see).

    If I replace the third code point with an e, I get "Koenig". So by
    overwriting one code point, I insert a character into the string.

    If instead I replace the second code point with a "\u0316" (Combining
    Grave Accent Below):

    "K\u0316\u0308nig" cr type

    I get this (which looks as expected in my xterm, but not in Emacs)

    K̖̈nig

    The first character is now a K with a diaeresis above and an accent
    grave below and there are now a total of 4 characters, but still 6
    code points in the string; the second character has been deleted by
    this code-point replacement.
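Since the post invites a Python version: Python `str` objects are immutable, so the nearest equivalent of overwriting a code point is building a new string around the replaced index. `replace_code_point` is a made-up helper for this sketch; the indices correspond to the post's "second" and "third" code point.

```python
import unicodedata

def replace_code_point(s: str, i: int, new: str) -> str:
    """Return a copy of s with the code point at index i replaced by new."""
    return s[:i] + new + s[i + 1:]

s = "Ko\u0308nig"                              # K, o, COMBINING DIAERESIS, n, i, g
print(len(s), [unicodedata.name(c) for c in s[:3]])

koenig = replace_code_point(s, 2, "e")         # overwrite the diaeresis
print(koenig)                                  # Koenig: one character more

k_marks = replace_code_point(s, 1, "\u0316")   # overwrite the "o"
print(len(k_marks), k_marks)                   # still 6 code points, only 4 characters
```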

    Back to replacing characters instead of overwriting code points: If
    you want to replace the second character, you would need to replace
    two code points; if the replacement of the character has only one code
    point or more than two, you need to move the remaining three
    characters. You have this problem whether the string is represented
    as UTF-32 or UTF-8.
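A rough Python sketch of replacing a whole character instead of a code point. As a simplifying assumption, a character is taken to be one base code point followed by any code points with a nonzero canonical combining class; Unicode's real definition of user-perceived characters (extended grapheme clusters, UAX #29) has many more rules that this does not implement. The helper names are hypothetical.

```python
import unicodedata

def split_characters(s: str) -> list[str]:
    """Group each base code point with the combining marks that follow it.

    Simplified: ignores most UAX #29 rules (Hangul, emoji sequences, etc.).
    """
    chars: list[str] = []
    for cp in s:
        if chars and unicodedata.combining(cp):
            chars[-1] += cp          # attach the mark to the previous character
        else:
            chars.append(cp)
    return chars

def replace_character(s: str, index: int, new: str) -> str:
    """Replace the index-th character; everything after it shifts if widths differ."""
    chars = split_characters(s)
    chars[index] = new
    return "".join(chars)

s = "Ko\u0308nig"
print(split_characters(s))
t = replace_character(s, 1, "\u00f6")   # precomposed ö: two code points become one
print(t, len(s), len(t))
```

Whatever the storage format, the replacement changes the length of the string, so the tail has to move; the sketch simply rebuilds the string.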

    > I assume you mean "there is no need to do it".

    That, too. That is the "it isn't" part of the statement.

    If you stick with UTF-8
    and use byte lengths and byte indexes, you can do almost everything as
    well or better (with less complication and more efficiently) as by
    converting to UTF-32 and back.
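One reason byte indexes work well with UTF-8: every byte of a multi-byte sequence is 0x80 or above, so an ASCII byte such as a comma found by a byte search is always a real delimiter and never part of another character. A small Python sketch with made-up sample data:

```python
data = "K\u00f6nig,Stra\u00dfe,na\u00efve caf\u00e9".encode("utf-8")

# All bytes of multi-byte UTF-8 sequences have the high bit set,
# so no ASCII comma can hide inside one of them.
assert all(b >= 0x80 for ch in "\u00f6\u00df\u00e9" for b in ch.encode("utf-8"))

fields = data.split(b",")                      # split on bytes, no decoding needed
offsets = [0]
for f in fields[:-1]:
    offsets.append(offsets[-1] + len(f) + 1)   # byte offset where each field starts

print([f.decode("utf-8") for f in fields], offsets)
```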

    > Assume you're implementing a language which has a function of
    > setting an individual character in a string.

    That's a design mistake in the language, and I know no language that
    has this misfeature.

    Instead, what we see is one language (Python3) that has an even worse misfeature: You can set an individual code point in a string; see
    above for the things you get when you overwrite code points.

    But why would one want to set individual code points? What about
    setting individual code units (in the case of UTF-8, the code unit is
    a byte) or bits? If you think that replacing parts of a character is
    a feature, why not go all the way?
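Taking the argument "all the way" in Python: overwriting a single UTF-8 code unit in a `bytearray` can leave a sequence that is no longer valid UTF-8. The byte value written here is arbitrary.

```python
buf = bytearray("K\u00f6nig".encode("utf-8"))   # ö is the two bytes C3 B6
buf[2] = ord("e")                               # overwrite the second byte of ö

try:
    buf.decode("utf-8")
except UnicodeDecodeError as err:
    print("invalid UTF-8:", err.reason)
```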

    > How would you implement it? Run through the string?

    You have to do that anyway, because of combining characters.

    > Would you then also
    > store additional information somewhere so that the next character
    > that the user sets does not need to do it again?

    Probably not. I would discourage the users from using this misfeature
    and steer them to better alternatives.

    Alternatively, if it's a really important misfeature, I would use an
    editing-friendly string representation (maybe a piece table or rope)
    and/or maybe do some Python3-style craziness and have the string be
    represented by an array of characters, where every character is
    represented by a pointer into a UTF-8 sequence.
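A minimal piece-table sketch in Python, assuming UTF-8 storage and byte offsets as discussed in this thread; the class and method names are invented for illustration, and only insertion and deletion are supported. The original text is never modified: edits append to an "add" buffer and rearrange a list of (buffer, start, length) pieces.

```python
from dataclasses import dataclass

@dataclass
class Piece:
    source: str   # "orig" or "add"
    start: int    # byte offset into that buffer
    length: int   # length in bytes

class PieceTable:
    """Minimal piece table over UTF-8 bytes; all offsets are byte offsets."""

    def __init__(self, text: str) -> None:
        orig = text.encode("utf-8")
        self.buffers = {"orig": orig, "add": bytearray()}
        self.pieces = [Piece("orig", 0, len(orig))]

    def _split(self, offset: int) -> int:
        """Make sure a piece starts at `offset`; return that piece's index."""
        pos = 0
        for i, p in enumerate(self.pieces):
            if offset == pos:
                return i
            if offset < pos + p.length:
                k = offset - pos
                self.pieces[i:i + 1] = [Piece(p.source, p.start, k),
                                        Piece(p.source, p.start + k, p.length - k)]
                return i + 1
            pos += p.length
        if offset == pos:
            return len(self.pieces)
        raise IndexError(offset)

    def insert(self, offset: int, text: str) -> None:
        data = text.encode("utf-8")
        add = self.buffers["add"]
        i = self._split(offset)
        self.pieces.insert(i, Piece("add", len(add), len(data)))
        add.extend(data)

    def delete(self, offset: int, length: int) -> None:
        i = self._split(offset)
        j = self._split(offset + length)
        del self.pieces[i:j]

    def __str__(self) -> str:
        return b"".join(self.buffers[p.source][p.start:p.start + p.length]
                        for p in self.pieces).decode("utf-8")

pt = PieceTable("Ko\u0308nig")
pt.delete(1, 3)        # remove "o" + U+0308 (3 bytes in UTF-8)
pt.insert(1, "oe")
print(pt)              # Koenig
```

Replacing a character of a different width only adds and removes pieces; the bytes after it are not moved.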

    In the case of Python3, the sequence seems to have been that they
    started out with the bad idea that indexing a string by code point is
    the way to go, and then designed a first implementation catering to
    that premise, and published it without reconsidering the premise,
    despite the efficiency cost. And of course it was too inefficient for
    some use cases, but it was too late to switch to a more sensible
    design, so they invented the more complex, but more efficient (than
    the first implementation) PEP 393 implementation.

    - anton
  • From David Brown@21:1/5 to Anton Ertl on Sun May 12 13:10:56 2024
    On 12/05/2024 07:40, Anton Ertl wrote:
    > John Levine <[email protected]> writes:
    >> Python3 has a complex internal string format that stores each string
    >> as 1, 2, or 4 byte values, depending on what the contents of the
    >> string are, so ASCII is one byte, UCS-2 is two bytes, and strings that
    >> contain code points beyond UCS-2 are four bytes. It's not clear how
    >> hard they try to shrink stuff down when taking substrings.
    >>
    >> https://peps.python.org/pep-0393/
    >
    > This is a nice demonstration of the unnecessary complexity that the
    > codepoint mistake leads to.

    A lot of this is, I suspect, for historical reasons. When Python was
    young, most software and languages used either plain ASCII or a mess of
    code pages for 8-bit encodings (or an even bigger mess of 16-bit
    encodings for CJK languages). Unicode was the new hope for a unifying
    16-bit system that would work for all characters in all languages. So
    Python - like Java, Windows NT, QT, and some other systems of that era,
    chose UCS-2 as the modern, international and future-proof solution to
    strings and characters.

    It turns out that UCS-2 was not enough, and these have all been
    suffering from mixed APIs ever since.

  • From Anton Ertl@21:1/5 to David Brown on Sun May 12 16:12:26 2024
    David Brown <[email protected]> writes:
    > A lot of this is, I suspect, for historical reasons. When Python was
    > young, most software and languages used either plain ASCII or a mess of
    > code pages for 8-bit encodings (or an even bigger mess of 16-bit
    > encodings for CJK languages). Unicode was the new hope for a unifying
    > 16-bit system that would work for all characters in all languages. So
    > Python - like Java, Windows NT, QT, and some other systems of that era,
    > chose UCS-2 as the modern, international and future-proof solution to
    > strings and characters.
    >
    > It turns out that UCS-2 was not enough, and these have all been
    > suffering from mixed APIs ever since.

    That's certainly true for Java (first release 1995), Windows NT (first
    released 1993) and QT (first released 1995).

    At that time Unicode 1.x (released 1991) was supposed to be the wave
    of the future, and it offered the (to Westerners) familiar environment
    of character = code unit (= 16 bits), ignoring the experience of the
    East Asians with ASCII-compatible variable-width encodings. For new
    systems the 16-bit code unit seemed to be the way to go, and the mixed
    APIs directly stem from that, because they imagined that legacy
    software that uses 8-bit code units would be rewritten to use 16-bit
    code units after a while, but of course the new system has to run
    legacy software, so it also provided a legacy API.

    It did not work out. Software using 8-bit code units was (for the
    most part) not converted to use 16-bit code units, and 16 bits was
    found to be not enough for a universal character set.

    In the meantime, the Silicon Valley based Unicode effort was merged
    with the ISO-based Universal Coded Character Set (UCS) effort (the
    name Unicode was kept) and we got Unicode 2.0 in 1996. Now if code
    unit = character would have been as important as was thought in
    Silicon Valley, the logical step would have been to go for 32-bit
    characters. But the UCS effort had brought in the experience with ASCII-compatible variable-width encodings, and so we got not just
    fixed-width UTF-32, but also variable-width ASCII-compatible UTF-8 and variable-width UTF-16 (to be backwards compatible with the
    systems/interfaces that were designed for 16-bit code units in the
    early 1990s).

    And, lo and behold, the systems that had adopted 16-bit code units
    kept the 16-bit code units and accepted that characters were now variable-width, because variable width is obviously easier to add to
    an existing code base than switching the code unit size.
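What "variable-width UTF-16" means in practice, as a small Python sketch: a code point above U+FFFF is stored as a surrogate pair of two 16-bit code units, computed as below (the emoji is an arbitrary example). Systems whose string length counts UTF-16 code units, as Java's `String.length()` and JavaScript's `length` do, therefore report 2 for such a code point.

```python
def utf16_code_units(cp: int) -> list[int]:
    """Encode one code point as UTF-16 code units (a surrogate pair above U+FFFF)."""
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000                          # 20 bits to split across two units
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]

s = "a\U0001F600"
units = [u for ch in s for u in utf16_code_units(ord(ch))]
print([hex(u) for u in units])                # 3 code units for 2 code points

# Cross-check against Python's own UTF-16 encoder.
encoded = s.encode("utf-16-le")
print(units == [int.from_bytes(encoded[i:i + 2], "little")
                for i in range(0, len(encoded), 2)])
```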


    Plus at some point (not sure when) they decided that characters have
    to be composable, so even an encoding like UTF-32 with 32-bit code
    units would not be enough for a character. A 32-bit code unit would
    only be a code point.
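Unicode normalization shows the limits of precomposition. In this Python sketch, using the same two strings as the Gforth example in this thread, NFC turns "o" plus a combining diaeresis into the single code point ö. There is no precomposed form for K with a grave accent below and a diaeresis above, though, so that character stays three code points, i.e. three 32-bit code units in UTF-32.

```python
import unicodedata

for s in ["Ko\u0308nig", "K\u0316\u0308nig"]:
    nfc = unicodedata.normalize("NFC", s)
    utf32_units = len(nfc.encode("utf-32-le")) // 4   # one 32-bit unit per code point
    print(ascii(nfc), len(s), utf32_units)
```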

    At that point, all encodings are variable-width, so why not just use
    UTF-8. And that's what everyone who had not introduced a new platform
    between 1991 and 1996 did. E.g., that's what we see in Unix (from
    around 1970) and in Rust (started 2006, first release 2015).

    Except Python3. I am not familiar with Python, but from the
    discussions I have read my impression is: Python2 (released 2000)
    supported strings of bytes, and people put UTF-8 in there and worked
    with that. Python3 (released 2008) was supposed to be a cleanup and
    instead of refining the code-unit-based approach of Python2 they
    introduced a code-point-based approach, which supported fast indexing
    of code points, a worthless feature. And they found out how hard it
    is to migrate a code base.

    So whatever the reason for the code point mistake in Python3 was, that
    mistake was made long after Unicode 2.0 was introduced in 1996 and the
    success of UTF-8 made it clear that variable-width encodings work out
    fine.

    For comparison: The 1994 Forth standard was designed to support 16-bit characters, and one implementation, JaxForth, actually demonstrated
    that. Most Forth implementations kept 8-bit characters for the time
    being, many assuming that they would have to do something like mixed
    APIs at some point. But when we actually thought and worked on the
    issue in 2004/2005, we were delighted to discover that UTF-8 works
    very well in the existing code base (of our Forth system and others)
    and there are only a few places that need changes; the additional
    words proposed in <http://www.euroforth.org/ef05/ertl-paysan05.pdf>
    have mostly been standardized in Forth-2012, but are actually rarely
    used, because ordinary string words don't care whether a string is
    ASCII or UTF-8. Anyway, this demonstrates that by 2005 it was clear
    that variable-width encodings are very workable, so the Python3
    mistake cannot be explained with its 2008 release date.

    - anton
  • From John Levine@21:1/5 to All on Sun May 12 18:48:10 2024
    According to Anton Ertl <[email protected]>:
    >> It turns out that UCS-2 was not enough, and these have all been
    >> suffering from mixed APIs ever since.
    >
    > That's certainly true for Java (first release 1995), Windows NT (first
    > released 1993) and QT (first released 1995).

    Don't forget JavaScript, which means every browser is full of UCS-2 and/or UTF-16.

    > Except Python3. I am not familiar with Python, but from the
    > discussions I have read my impression is: Python2 (released 2000)
    > supported strings of bytes, and people put UTF-8 in there and worked
    > with that. Python3 (released 2008) was supposed to be a cleanup and
    > instead of refining the code-unit-based approach of Python2 they
    > introduced a code-point-based approach, which supported fast indexing
    > of code points, a worthless feature. And they found out how hard it
    > is to migrate a code base.

    It makes somewhat more sense than that.

    Python2 had a string type which was a variable-length array of 8-bit
    characters, and a Unicode type which was a variable-length array of
    code points. You could use a string to hold either ASCII text or
    arbitrary strings of bytes, depending on what operators and functions
    you used. Python3 reorganized this so that there is only one string
    type used for both ASCII and Unicode and a separate byte type for
    arbitrary strings of data.
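The Python 3 split described above, in a few lines: indexing `bytes` yields integers (byte values), indexing `str` yields one-code-point strings, and converting between the two is an explicit encode or decode.

```python
text = "K\u00f6nig"           # str: a sequence of code points ("König", precomposed ö)
raw = text.encode("utf-8")    # bytes: a sequence of 8-bit values

print(len(text), len(raw))    # 5 code points, 6 bytes
print(text[1], raw[1])        # 'ö' versus the integer 0xC3 (195)
print(raw.decode("utf-8") == text)
```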

    I can say from experience that the python3 approach is less confusing,
    and that in contexts where you know the strings are ASCII, e.g., mail
    or http message headers, subscripting makes sense, even though it
    mostly doesn't for sequences of Unicode code points. Even with code
    points it can make some sense, e.g., if you know you have text in an
    alphabetic language, you can find the code points that are white space
    to do stuff with words.
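A small Python illustration of both cases, with made-up sample data: slicing an ASCII-only message header by index, and finding words by looking for whitespace code points. `str.isascii()` (Python 3.7 and later) is one way to confirm the ASCII assumption before relying on index arithmetic.

```python
header = "Content-Type: text/plain; charset=utf-8"
assert header.isascii()            # the index arithmetic below assumes ASCII

colon = header.index(":")
name, value = header[:colon], header[colon + 1:].strip()
print(name, "|", value)

# In alphabetic text, whitespace code points still separate words,
# even when some letters are non-ASCII or decomposed.
words = "Ko\u0308nig von Gro\u00dfbritannien".split()
print(len(words), words[0] == "Ko\u0308nig")
```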

    In recent years python has been adding type declarations so you can
    say that a particular variable or function parameter has to be of a
    particular type or one of a union of types, e.g. it can be an int or a
    float or a Decimal object. They haven't yet created subtypes to limit
    the range of a type but I expect they will. That would let you say a
    Percent is an integer between 0 and 100, or an ASCII is a string
    with all the code points <= 0x7f. You could write more robust code
    that doesn't accidentally try to subscript into random non-ASCII code
    points.
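The current `typing` module can approximate part of this. `typing.NewType` creates a type that a static checker treats as distinct, but it performs no check at runtime, so in this sketch the range check lives in a helper function. `AsciiStr`, `as_ascii` and `header_name` are hypothetical names, not a standard API.

```python
from typing import NewType

AsciiStr = NewType("AsciiStr", str)   # distinct from str for a static type checker only

def as_ascii(s: str) -> AsciiStr:
    """Check at runtime that all code points are <= 0x7F, then tag the value."""
    if not s.isascii():
        raise ValueError(f"non-ASCII text: {s!r}")
    return AsciiStr(s)

def header_name(line: AsciiStr) -> str:
    # Index arithmetic is reasonable here because the type says the text is ASCII.
    return line[: line.index(":")]

print(header_name(as_ascii("Subject: hello")))
```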
  • From Stefan Monnier@21:1/5 to All on Tue May 14 12:24:31 2024
    >> Assume you're implementing a language which has a function of setting
    >> an individual character in a string.
    > That's a design mistake in the language, and I know no language that
    > has this misfeature.

    I suspect "individual character" meant "code point" above.
    Does Unicode even have the notion of "character", really?

    > Instead, what we see is one language (Python3) that has an even worse
    > misfeature: You can set an individual code point in a string; see
    > above for the things you get when you overwrite code points.

    I think it's fairly common for languages that started with strings
    as "arrays of 8bit chars".

    Emacs Lisp has this misfeature as well (and so does Common Lisp). 🙁
    It's really hard to get rid of it, even though it's used *very* rarely.
    In ELisp, strings are represented internally as utf-8 (tho it pretends
    to be an array of code points), so an assignment that replaces a single
    char can require reallocating the array!
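Why such an assignment can force a reallocation, sketched in Python with a `bytearray` standing in for the UTF-8 string data. This is not how Emacs implements it; the helper names are hypothetical. Locating the code point takes a linear scan, and replacing a one-byte code point with a two-byte one changes the buffer length, which `bytearray` handles by resizing.

```python
def cp_offset(buf: bytes, index: int) -> int:
    """Byte offset where code point `index` starts (len(buf) when index == count)."""
    count = 0
    for pos, b in enumerate(buf):
        if b & 0xC0 != 0x80:           # not a continuation byte: a code point starts here
            if count == index:
                return pos
            count += 1
    if count == index:
        return len(buf)
    raise IndexError(index)

def set_code_point(buf: bytearray, index: int, ch: str) -> None:
    """Overwrite code point `index` with `ch`; the buffer may grow or shrink."""
    start, end = cp_offset(buf, index), cp_offset(buf, index + 1)
    buf[start:end] = ch.encode("utf-8")

buf = bytearray("table".encode("utf-8"))
set_code_point(buf, 1, "\u00e9")        # 'a' (1 byte) -> 'é' (2 bytes)
print(buf.decode("utf-8"), len(buf))    # téble 6
```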

    > But why would one want to set individual code points?

    Because you know your string only contains "characters" made of a single
    code point?

    E.g. your string contains the representation of the border of a table
    (to be displayed in a tty), and you want to "move" the `+` of a column separator (or a prettier version that takes advantage of the wider
    choice offered by Unicode).
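Stefan's case works in Python as long as the invariant holds. In this sketch (the function name and the box-drawing characters are chosen for illustration), every cell of the border row is exactly one code point, so a code-point index is also a column index.

```python
def move_separator(row: str, old: int, new: int,
                   sep: str = "\u253c", fill: str = "\u2500") -> str:
    """Move a column separator in a border row made of box-drawing characters.

    Assumes every displayed cell is exactly one code point (no combining
    marks, no wide characters); otherwise code-point indexes are not columns.
    """
    cells = list(row)
    if cells[old] != sep:
        raise ValueError("no separator at the old position")
    cells[old], cells[new] = fill, sep
    return "".join(cells)

row = "\u2500" * 3 + "\u253c" + "\u2500" * 3    # ───┼───
print(move_separator(row, 3, 5))                # ─────┼─
```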


    Stefan

  • From MitchAlsup1@21:1/5 to Anton Ertl on Tue May 14 17:43:43 2024
    Anton Ertl wrote:

    > Thomas Koenig <[email protected]> writes:
    >
    > E.g., consider the following Gforth code (others can tell you how to
    > do it in Python):
    >
    > "Ko\u0308nig" cr type
    >
    > The output is:
    >
    > König
    >
    > That is, the second character consists of two Unicode code points, the
    > "o" and the "\u0308" (Combining Diaeresis).
    >
    > (I think that somewhere along the way from the Forth system to the
    > xterm through copying and pasting into Emacs the second character has
    > become precomposed, but that's probably just as well, so you can see
    > what I see).
    >
    > If I replace the third code point with an e, I get "Koenig". So by
    > overwriting one code point, I insert a character into the string.
    >
    > If instead I replace the second code point with a "\u0316" (Combining
    > Grave Accent Below):
    >
    > "K\u0316\u0308nig" cr type
    >
    > I get this (which looks as expected in my xterm, but not in Emacs)
    >
    > K̖̈nig
    >
    > The first character is now a K with a diaeresis above and an accent
    > grave below and there are now a total of 4 characters, but still 6
    > code points in the string; the second character has been deleted by
    > this code-point replacement.


    It seems to me (in my vast ignorance) that names for things should be
    written in the most appropriate set of characters in the language of
    the person/thing being named.

    Then, when such a name is "sent out to be displayed", it is a property
    of the display what character set(s) it can properly emit, and the
    display thereby alters the string of characters as appropriate to its
    capabilities.

    For example: take "K\u0316\u0308nig" cr type ==> K̖̈nig
    When displayed on an ASCII-only line printer it would be written Koenig
    When displayed on an enhanced-ASCII printer it would be written König
    When displayed on a fully functional printer it would be written K̖̈nig

    The problem is the mapping function between how it should be encoded
    in its own native language to what can be expressed on a particular
    device.

    Only the display device needs to understand this mapping and NOT the program/software/device holding the string.
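A rough Python sketch of the idea that the output side adapts the text to the device. The German-style fallback table and the use of encoding names as stand-ins for device capabilities are assumptions for illustration; real transliteration is language-specific and far larger.

```python
import unicodedata

# Hypothetical German-style fallbacks for devices that lack the letter.
FALLBACK = {"\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue", "\u00df": "ss",
            "\u00c4": "Ae", "\u00d6": "Oe", "\u00dc": "Ue"}

def can_show(text: str, encoding: str) -> bool:
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

def render(text: str, encoding: str) -> str:
    """Adapt text to a device whose repertoire is approximated by `encoding`."""
    out = []
    for cp in unicodedata.normalize("NFC", text):   # prefer precomposed letters
        if can_show(cp, encoding):
            out.append(cp)
        elif cp in FALLBACK and can_show(FALLBACK[cp], encoding):
            out.append(FALLBACK[cp])
        elif unicodedata.combining(cp):
            continue                                 # drop accents the device lacks
        else:
            out.append("?")
    return "".join(out)

for enc in ["ascii", "latin-1", "utf-8"]:
    print(enc, render("Ko\u0308nig", enc))
```

For "K\u0316\u0308nig" the ASCII rendering simply drops both marks, which shows how much of the mapping remains a per-language policy decision.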

    I think people in Japan should be able to use printf by using プリントフ.
    There is way too much "english" in the way computers are being used.
    It is similar to anthropomorphizing animal behavior.

  • From David Brown@21:1/5 to All on Tue May 14 20:35:37 2024
    On 14/05/2024 19:43, MitchAlsup1 wrote:

    > I think people in Japan should be able to use printf by using プリントフ.
    > There is way too much "english" in the way computers are being used.

    I disagree entirely here.

    For many things, international consistency is more important than
    picking local-sounding names for things that have no localised meaning.
    Having a Japanese name and spelling for "printf" doesn't give Japanese programmers any useful information, it is not easier to type or read,
    and simply ensures that they can't cooperate and collaborate with
    programmers using different languages. MS Office uses local languages
    for its macros and formulas in Excel - I've never heard anyone in Norway
    say they like it, and many who say it is a PITA that makes it hard to
    work with and hard to search for information. Most people IME who use
    macros a lot prefer to stick to English.

    It works the other way too. When discussing Karate or Judo, most
    practitioners the world over know what a "mawashi geri" or an "o soto
    gari" is - most consistently use the Japanese terms regardless of native languages. Most, that is, except Americans and some other English
    speakers who feel they have to use English language terms, losing a lot
    of the subtlety and nuances of the terms and being different from their international peers.

    And when people try to force localisation of terms that have no local
    words, the result is just to encourage people to move everything over to
    a single language (English).


    > It is similar to anthropomorphizing animal behavior.

    No, it is not.

  • From Thomas Koenig@21:1/5 to [email protected] on Tue May 14 20:47:12 2024
    MitchAlsup1 <[email protected]> wrote:

    > I think people in Japan should be able to use printf by using プリントフ

    I have to put up with a minor version of that - Microsoft decided to
    localize folder names ("Program Files" is displayed as "Programme"
    if you use German settings, except when you access it via the
    command line), and all Excel functions are localized; depending on
    whether you use the English or German version, arguments are separated
    by commas or semicolons. Of course, using the other one is a syntax error.

    Saving things in native Excel format is OK, but generating a CSV
    file from a program will either work or not, depending on locale
    ("," vs ";" and "." vs ",").
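One workaround for the CSV problem is to write exactly the variant the target locale expects. A Python sketch, assuming (as described above) that German-locale Excel wants semicolons between fields and a decimal comma; `to_german_csv` is a hypothetical helper and the data is made up.

```python
import csv
import io

def to_german_csv(rows: list[tuple[str, float]]) -> str:
    """Write rows with ';' as the field separator and ',' as the decimal separator."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=";", lineterminator="\n")
    writer.writerow(["Item", "Amount"])
    for name, value in rows:
        writer.writerow([name, f"{value:.2f}".replace(".", ",")])
    return buf.getvalue()

print(to_german_csv([("Rent", 1234.5), ("Power", 61.25)]), end="")
```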

    This is about as annoying as it gets...

  • From Anton Ertl@21:1/5 to Stefan Monnier on Sat May 18 05:29:20 2024
    Stefan Monnier <[email protected]> writes:
    >> [Anton Ertl:]
    >>> [Thomas Koenig:]
    >>> Assume you're implementing a language which has a function of setting
    >>> an individual character in a string.
    >> That's a design mistake in the language, and I know no language that
    >> has this misfeature.

    > I suspect "individual character" meant "code point" above.

    I meant character, not code point, as should have become clear from
    the following. I think that Thomas Koenig meant "character", too, but
    he may have been unaware of the difference between "character" and
    "Unicode code point".

    > Does Unicode even have the notion of "character", really?

    AFAIK it does not. But applications like palindrome checkers care
    about characters, not code points.
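A Python sketch of the difference. A character is approximated here as a base code point plus the combining marks that follow it, a simplification of Unicode's grapheme-cluster rules (UAX #29). The test string is contrived: it reads the same in both directions character by character but not code point by code point, and NFC normalization does not help because no precomposed forms exist for these combinations.

```python
import unicodedata

def characters(s: str) -> list[str]:
    """Base code point plus following combining marks (simplified grapheme clusters)."""
    out: list[str] = []
    for cp in s:
        if out and unicodedata.combining(cp):
            out[-1] += cp
        else:
            out.append(cp)
    return out

def is_palindrome_code_points(s: str) -> bool:
    return s == s[::-1]

def is_palindrome_characters(s: str) -> bool:
    chars = characters(s)
    return chars == chars[::-1]

s = "x\u0316\u0308yx\u0316\u0308"     # three characters: x-with-marks, y, x-with-marks
print(is_palindrome_code_points(s))                                # False
print(is_palindrome_code_points(unicodedata.normalize("NFC", s)))  # False
print(is_palindrome_characters(s))                                 # True
```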

    OTOH, most code can be implemented fine as working on strings, without
    knowing how many characters there are in the string (and it then does
    not need to know about code points, either). In other words, it can
    be implemented just as well when the strings are represented as
    strings of code units (whether UTF-8 (bytes), UTF-16 (16-bit code
    units) or UTF-32 (32-bit code units)), and then it does not help to
    convert UTF-8 to something else on input and something else to UTF-8
    on output.

    For the code that cares about characters, if it wants to work
    correctly for characters that cannot be precomposed into a single code
    point, it has to deal with characters that consist of multiple code
    points, i.e., that even in UTF-32 are variable-width. So given that
    you have to bite the variable-width bullet anyway, you can just as
    well use UTF-8.

    >> Instead, what we see is one language (Python3) that has an even worse
    >> misfeature: You can set an individual code point in a string; see
    >> above for the things you get when you overwrite code points.

    > I think it's fairly common for languages that started with strings
    > as "arrays of 8bit chars".

    Apart from Python3 not in those languages that I have looked at more
    closely wrt this feature.

    In particular, C was created by adding a byte type to B, and that type
    was called "char". It was allowed to be wider to cater for
    word-addressed machines, but on byte-addressed machines "char" is
    invariably a byte. To cater to Unicode, they used a two-pronged
    approach: they added wchar_t and multi-byte functions (IIRC both
    already in C89); wchar_t was obviously introduced to cater for the
    upcoming Unicode 1.0 (which satisfied code unit=code point=character),
    while the multibyte stuff was probably introduced originally for
    dealing with the ASCII-compatible East-Asian encodings.

    When UTF-8 arrived, the multi-byte functions proved to fit that well;
    but of course there is not much usage of those functions, because most
    code works fine without knowing about individual code points or
    characters. And UTF-8 turned out to be the answer to dealing with
    Unicode that the Unix programmers who had a lot of code working with
    strings of chars (i.e., bytes) were looking for.
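What the variable-width, multi-byte view of UTF-8 amounts to, as a Python sketch that decodes one code point at a time from its lead byte. Unlike C's `mbtowc` or Python's own decoder, it assumes well-formed input and does no error checking.

```python
def decode_utf8(buf: bytes) -> list[int]:
    """Decode well-formed UTF-8 into code points (no validation in this sketch)."""
    cps = []
    pos = 0
    while pos < len(buf):
        b0 = buf[pos]
        if b0 < 0x80:                  # 0xxxxxxx: 1 byte (ASCII)
            n, cp = 0, b0
        elif b0 >> 5 == 0b110:         # 110xxxxx: 2 bytes
            n, cp = 1, b0 & 0x1F
        elif b0 >> 4 == 0b1110:        # 1110xxxx: 3 bytes
            n, cp = 2, b0 & 0x0F
        else:                          # 11110xxx: 4 bytes
            n, cp = 3, b0 & 0x07
        for b in buf[pos + 1:pos + 1 + n]:
            cp = (cp << 6) | (b & 0x3F)   # 10xxxxxx continuation bytes
        cps.append(cp)
        pos += 1 + n
    return cps

s = "K\u00f6\u20ac\U0001F600"          # 1-, 2-, 3- and 4-byte sequences
print([hex(cp) for cp in decode_utf8(s.encode("utf-8"))])
```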

    Then Unicode 2.0 arrived and the Win32 API (which had embraced wchar_t
    and defined it as being 16-bit) stuck with 16-bit wchar_t, which
    breaks "code unit=code point"; this may not be in line with the
    intentions of the inventors of wchar_t (e.g., there are no
    multi-wchar_t functions in the C standard last time I looked), but
    that has been the existing practice in wchar_t use in C for more than
    a quarter-century.

    Unix, where wchar_t was (and still is) little used, switched to 32-bit
    wchar_t, but

    1) given that Unicode at some point (probably already in 2.0) broke
    "code point=character", that does not really help software like
    palindrome checkers.

    2) wchar_t is little-used in Unix-specific code.

    3) Code that wants to be portable between Unix and Windows and uses
    wchar_t cannot rely on "code unit=code point" anyway.

    So, in practice, C code does not make use of the ability to set an
    individual code point by overwriting a fixed-size code unit.

    Forth has chars that are 8 bits wide in traditional Forth systems on
    byte-addressed machines. The 1994 standard (in the middle of the
    reign of Unicode 1.0, and with lots of Californians on the
    standardization committee) provided the option to implement Forth
    systems with chars that take a fixed number >1 of bytes, and one
    system (JaxForth by Jack Woehr for Windows NT) implemented 16-bit
    chars.

    However, JaxForth was not very popular, and most code assumed that 1
    char = 1 (i.e., 8 bits on a byte-addressed machine), and given that
    there was no widely available system that deviated from that, even
    code that wanted to avoid this assumption could not be tested. And
    given that most code has this assumption and would not work on systems
    with 1 chars > 1, all the other systems stuck with 1 char = 1. A Chicken-and-Egg problem? Not really:

    When we looked at the problem in 2004, we found that most code works
    fine with UTF-8; that's because most code does not care about
    characters. Even code that uses words like C@ (load a char from
    memory) typically does it in a way that works with UTF-8. We proposed
    a number of words for dealing with variable-width xchars (what C calls multi-byte characters), and you can theoretically use them with the
    pre-Unicode East-Asian encodings as well as with UTF-8. These words
    were standardized in Forth-2012, but they are actually little-used
    (including by me), because most code actually works fine with opaque
    strings.
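    A minimal Python sketch of why byte-at-a-time code (the kind of loop
    one would write with C@ in Forth) keeps working on UTF-8: every byte
    of a multi-byte UTF-8 sequence is 0x80 or above, so comparing single
    bytes against an ASCII delimiter can never match in the middle of a
    code point. `split_ascii` is a hypothetical helper, not a Forth or
    Gforth word.

```python
def split_ascii(buf: bytes, delim: int) -> list[bytes]:
    """Split a UTF-8 buffer on an ASCII delimiter, looking at one byte at a time."""
    if delim >= 0x80:
        raise ValueError("only ASCII delimiters are safe for byte-wise scanning")
    parts, start = [], 0
    for i, b in enumerate(buf):          # iterating over bytes yields ints
        if b == delim:
            parts.append(buf[start:i])
            start = i + 1
    parts.append(buf[start:])
    return parts

words = split_ascii("Gr\u00fc\u00dfe aus K\u00f6ln".encode("utf-8"), ord(" "))
print([w.decode("utf-8") for w in words])
```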

    In Gforth, an xchar is a code point, not a character, so these words
    are currently less useful for writing palindrome checkers than one
    might hope. Maybe at some point we will look at the problem again,
    and provide words for dealing with characters, Unicode normalization,
    collating order and such things, but for now the pain is not big
    enough to tackle that problem.
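    The code point vs. character difference shows up immediately in a
    palindrome checker. The Python sketch below compares a naive
    code-point reversal with one that first attaches each mark (Unicode
    general category M*) to the code point before it. That grouping is
    only a rough approximation of Unicode's extended grapheme clusters
    (UAX #29), which cover many more cases (emoji sequences, Hangul,
    etc.); the helper names are illustrative.

```python
import unicodedata

def clusters(s: str) -> list[str]:
    """Rough approximation of user-perceived characters:
    attach every mark (category Mn, Mc, Me) to the preceding code point."""
    out: list[str] = []
    for ch in s:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch
        else:
            out.append(ch)
    return out

def is_palindrome_codepoints(s: str) -> bool:
    return s == s[::-1]          # reverses code points, detaching combining marks

def is_palindrome_chars(s: str) -> bool:
    c = clusters(s)
    return c == c[::-1]          # reverses whole (approximate) characters

word = "a\u0308ba\u0308"           # "äbä" with decomposed umlauts
print(is_palindrome_codepoints(word))  # False
print(is_palindrome_chars(word))       # True
```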

    Finally, I proposed to standardize the common practice 1 chars = 1;
    this proposal was accepted for standardization in 2016.

    Emacs Lisp has this misfeature as well (and so does Common Lisp). 🙁
    It's really hard to get rid of it, even though it's used *very* rarely.
    In ELisp, strings are represented internally as utf-8 (tho it pretends
    to be an array of code points), so an assignment that replaces a single
    char can require reallocating the array!
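    Why replacing one "char" in a UTF-8 string may force a reallocation
    can be sketched in Python with a `bytearray` holding UTF-8: if the
    replacement encodes to the same number of bytes, the update is a plain
    overwrite; otherwise the buffer must grow or shrink. `replace_bytes`
    is a hypothetical helper, not an ELisp or Python API, and the
    box-drawing character is just an example.

```python
def replace_bytes(buf: bytearray, start: int, end: int, new: str) -> bool:
    """Replace buf[start:end] (byte offsets) with `new` encoded as UTF-8.
    Returns True when the byte length is unchanged, i.e. a pure overwrite."""
    enc = new.encode("utf-8")
    in_place = len(enc) == end - start
    buf[start:end] = enc     # same length: overwrite; otherwise the buffer is resized
    return in_place

row = bytearray("| |".encode("utf-8"))
print(replace_bytes(row, 1, 2, "+"), row)        # ' ' and '+' are both 1 byte
print(replace_bytes(row, 1, 2, "\u253c"), row)   # the box-drawing cross needs 3 bytes
```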

    One way forward might be to also provide a string-oriented API with
    byte (code unit) indices, and recommend that people use that instead
    of the inefficient code-point-indexed API. For a high-level language
    like Elisp or Python, the internal representation can depend on which
    function was last used on the string. So if code uses only the
    string-oriented API, you may be able to avoid the costs of the
    code-point API completely.
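    The cost difference between the two APIs can be sketched in Python on
    a UTF-8 buffer: finding the byte position of the n-th code point needs
    a linear scan (counting bytes that are not continuation bytes of the
    form 0b10xxxxxx), whereas an API that takes byte offsets can slice
    directly. `byte_offset` is an illustrative helper, not part of any
    Emacs or Python API.

```python
def byte_offset(utf8: bytes, cp_index: int) -> int:
    """Byte offset of code point number cp_index in well-formed UTF-8 (O(n) scan)."""
    seen = -1
    for i, b in enumerate(utf8):
        if b & 0xC0 != 0x80:          # not a continuation byte: a code point starts here
            seen += 1
            if seen == cp_index:
                return i
    if cp_index == seen + 1:          # one past the last code point
        return len(utf8)
    raise IndexError(cp_index)

data = "Ko\u0308nig".encode("utf-8")   # b'Ko\xcc\x88nig'
start = byte_offset(data, 3)           # code point 3 is 'n', at byte 4
print(data[start:])                    # with a byte offset in hand, no scan is needed
```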

    But why would one want to set individual code points?

    Because you know your string only contains "characters" made of a single
    code point?

    This incorrect "knowledge" may be the reason why Emacs 27.1 displays

    K̖̈nig

    as if the first three-code-point character actually was three characters.

    E.g. your string contains the representation of the border of a table
    (to be displayed in a tty), and you want to "move" the `+` of a column
    separator (or a prettier version that takes advantage of the wider
    choice offered by Unicode).

    These kinds of things involve additional complications. Not only do
    you have to know the difference between code points and characters,
    you also have to know the visual width of a character which is 0-2 for fixed-width fonts to be used in xterm or the like. Actually, if you
    treat a combining mark as having width 0, you may be able to work with
    code points and do not need characters.
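    A rough Python sketch of the width rule described above, using only
    the standard `unicodedata` module: nonspacing and enclosing marks
    count as 0 columns, East Asian wide and fullwidth characters as 2,
    everything else as 1. Real terminals use wcwidth-style tables that
    also handle control characters, zero-width joiners, emoji and so on,
    so `cell_width` is an assumption-laden approximation.

```python
import unicodedata

def cell_width(ch: str) -> int:
    """Approximate number of terminal columns one code point occupies."""
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0      # combining mark: drawn over the previous column
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2      # wide or fullwidth, e.g. CJK ideographs
    return 1

def text_width(s: str) -> int:
    return sum(cell_width(ch) for ch in s)

print(text_width("K\u0316\u0308nig"))  # 4 columns for 6 code points
print(text_width("+-\u6f22\u5b57-+"))  # 8: each ideograph takes two columns
```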

    Why do you want to move the column separator and what do you want to
    overwrite with it? This is likely the result of another operation,
    and maybe that involves another string replacement; and displaying the
    result involves so much overhead that using a string replacement
    instead of a fixed-width store is probably not the dominant cost. And
    if the replacement string happens to have as many bytes as the
    replaced string (which would happen for, e.g., replacing " " with
    "+"), the operation is not so expensive anyway.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From Thomas Koenig@21:1/5 to Anton Ertl on Sat May 18 14:09:31 2024
    Anton Ertl <[email protected]> schrieb:
    [email protected] (MitchAlsup1) writes:
    It seems to me (in my vast ignorance) that names for things should be
    written in the most appropriate set of characters in the language of
    the person/thing being named.

    Then when such a name is "sent out to be displayed" that it is a property
    of the display what character set(s) it can properly emit, and thereby
    alter the string of characters as appropriate to its capabilities.

    For example:: Take > "K\u0316\u0308nig" cr type ==> K̖̈nig
    When displayed on a ASCII only line printer it would be written Koenig
    When displayed on a enhanced ASCII printer it would be written König
    When displayed on a full functional printer it would be written K̖̈nig

    Why do you think that K̖̈nig should be written as Koenig or König?

    On my display, this read K, n with a diacritic and something close to
    a cedille under the n.


    However, for König

    Again, the diaeresis is over the n, not the o.

    Unicode specifies that the precomposed form is
    König. And if you want a transcription into ASCII with the knowledge
    that it's German, the result would be Koenig.

    This is actually sometimes a (fairly minor) problem because the
    name on my passport actually reads "König" (o-diacritic), but
    people without knowledge of German tend to transcribe this as
    "Konig", whereas I transcribe it as "Koenig" on official forms
    such as the one I need to fill out prior to entering the US.

    This is why modern EU passports have a canonical form of the
    name, which then is "KOENIG".

  • From Terje Mathisen@21:1/5 to Thomas Koenig on Sat May 18 16:25:54 2024
    Thomas Koenig wrote:
    This is actually sometimes a (fairly minor) problem because the
    name on my passport actually reads "König" (o-diacritic), but
    people without knowledge of German tend to transcribe this as
    "Konig", whereas I transcribe it as "Koenig" on official forms
    such as the one I need to fill out prior to entering the US.

    This is why modern EU passports have a canonical form of the
    name, which then is "KOENIG".

    Same problem for my wife and kids, who have Norløff either as part of
    their surname or (my wife) as the whole surname.

    Canonical simplification of the 'ø' character is either 'o' or 'oe', and passports and airline tickets differ, something which can cause all
    sorts of issues with US passport control.
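    The ø case is also one where the obvious automatic approach fails. A
    common Python trick for ASCII folding is to apply NFKD and drop the
    combining marks; that turns König into Konig, but ø has no Unicode
    decomposition at all, so it is simply lost. Getting Koenig or
    Norloeff needs an explicit, language-dependent table; the table below
    is a hypothetical example, not an official transliteration standard.

```python
import unicodedata

def naive_fold(s: str) -> str:
    """Decompose, drop combining marks, then drop whatever is still non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.encode("ascii", "ignore").decode("ascii")

# Hypothetical German/Scandinavian-style table; mapping 'ø' to 'o' would be just as valid.
TABLE = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss", "ø": "oe", "å": "aa", "æ": "ae"}

def table_fold(s: str) -> str:
    s = unicodedata.normalize("NFC", s)           # so a decomposed ö is found in TABLE
    s = "".join(TABLE.get(ch, ch) for ch in s)
    return naive_fold(s)                          # fall back for anything not in TABLE

for name in ("K\u00f6nig", "Norl\u00f8ff"):
    print(name, naive_fold(name), table_fold(name))
```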

    Terje

    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"

  • From Thomas Koenig@21:1/5 to Terje Mathisen on Sat May 18 14:41:04 2024
    Terje Mathisen <[email protected]> schrieb:

    Canonical simplification of the 'ø' character is either 'o' or 'oe', and passports and airline tickets differ, something which can cause all
    sorts of issues with US passport control.

    Reminds me of either "Asterix and the Great Crossing" or "Asterix
    and the Normans", where Viking speech was indicated by having
    slashes through letters (like ø). When Obelix tries to speak
    their language, he also applies slashes, but does so randomly
    (like through a c) so nobody can understand him.

    Hmm... a challenge, can this be represented as Unicode codepoints?
    I would not be surprised if some Asterix fan had snuck it in while
    nobody was looking.

    (For those who don't know Asterix: It is a comic that was/is wildly
    popular in France and Germany at least, about Gauls who keep on
    resisting Roman occupation in the times of Julius Caesar, aided
    by a magic potion which gives them superhuman strength.)

  • From Anton Ertl@21:1/5 to Thomas Koenig on Sat May 18 15:43:05 2024
    Thomas Koenig <[email protected]> writes:
    Anton Ertl <[email protected]> schrieb:
    Why do you think that K̖̈nig should be written as Koenig or König?

    On my display, this read K, n with a diacritic and something close to
    a cedille under the n.

    That displays correctly then. The "close to cedille" is an accent
    grave below.

    However, for König

    Again, the diaeresis is over the n, not the o.

    That's strange: in the first case your display system composes the
    diaeresis correctly with the preceding glyph (at that point, a K with
    accent grave below), but in the o case, it incorrectly composes it
    with the next glyph.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From Anton Ertl@21:1/5 to Thomas Koenig on Sat May 18 15:48:35 2024
    Thomas Koenig <[email protected]> writes:
    Terje Mathisen <[email protected]> schrieb:

    Canonical simplification of the 'ø' character is either 'o' or 'oe', and
    passports and airline tickets differ, something which can cause all
    sorts of issues with US passport control.

    Reminds me of either "Asterix and the Great Crossing" or "Asterix
    and the Normans", where Viking speech was indicated by having
    slashes through letters (like ø). When Obelix tries to speak
    their language, he also applies slashes, but does so randomly
    (like through a c) so nobody can understand him.

    Hmm... a challenge, can this be represented as Unicode codepoints?

    Sure. See <https://en.wikipedia.org/wiki/Bar_(diacritic)>.
    Interestingly, the Obelix character ȼ you mention above has its own
    precomposed code point U+023C (Latin Small Letter C with Stroke) and
    its own Wikipedia page: https://en.wikipedia.org/wiki/%C8%BB, but you
    can also compose it from c and the combining short solidus overlay: c̷
    (this does not display correctly on emacs 27.1, but composes correctly
    on an xterm). There is no precomposed d with a slash (U+0111, Latin
    Small Letter D with Stroke, đ, has a horizontal bar instead), but you
    can compose one in the same way: d̷.
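    Python's standard `unicodedata` module offers a quick way to check the
    two spellings of Obelix's ȼ. The sketch below (helper name
    hypothetical) suggests that they are not canonically equivalent:
    U+023C has no decomposition, so neither NFC nor NFD maps one spelling
    to the other, unlike ö and o + U+0308.

```python
import unicodedata

precomposed = "\u023c"        # ȼ
overlay = "c\u0337"           # c + COMBINING SHORT SOLIDUS OVERLAY

def canonically_equivalent(a: str, b: str) -> bool:
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

print(unicodedata.name(precomposed))                 # LATIN SMALL LETTER C WITH STROKE
print(repr(unicodedata.decomposition(precomposed)))  # '' (no decomposition)
print(canonically_equivalent(precomposed, overlay))  # False
print(canonically_equivalent("\u00f6", "o\u0308"))   # True
```

    A search for "ȼ" therefore will not find "c̷" unless the program
    adds its own equivalence rules.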

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From John Levine@21:1/5 to All on Sat May 18 17:09:44 2024
    According to Thomas Koenig <[email protected]>:
    Considering the huge market for palindrome checkers, that is a
    real concern, especially if they involve characters for which
    UTF-32 is not sufficient, such as smileys.

    Is there any language whose characters cannot be represented in
    UTF-32?

    Chinese. There is a huge backlog of obscure but real Chinese characters
    that do not have a Unicode code point yet. An ISO committee is slowly
    working through them; every couple of years it approves a batch of
    several thousand.

    https://en.wikipedia.org/wiki/Ideographic_Research_Group

    --
    Regards,
    John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Stephen Fuld@21:1/5 to Anton Ertl on Sat May 18 17:11:32 2024
    Anton Ertl wrote:


    snip


    A similar concept was implemented in COBOL, where the designers thought
    that having to write

    ADD A TO B GIVING C

    or somesuch makes programming easier than writing

    C = A+B

    in FORTRAN.


    I would put a slightly different spin on it. I believe that the
    original COBOL was designed not so much to make programming easier, but
    to make *learning* programming (for non-programmers) easier, and
    because it was supposedly "self documenting", easier for managers, etc.
    to see how the program worked. Remember, when COBOL was developed
    (late 1950s), there weren't many programmers in existence, and it was
    felt that the "mathematical" syntax of Fortran would be too unfamiliar
    to the business people who developed the new programs to solve
    business problems, and who were generally not mathematicians.

    Of course, they were wrong about "self documenting", and as more people
    became programmers, the advantages of concise syntax made a big
    difference.





    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

  • From John Savard@21:1/5 to All on Sun May 19 15:32:49 2024
    On Tue, 14 May 2024 17:43:43 +0000, [email protected] (MitchAlsup1)
    wrote:

    I think people in Japan should be able to use printf by using プリントフ
    There is way too much "english" in the way computers are being used.
    It is similar to Anthropomorphizing animal behavior.

    One could quibble.

    If Japanese people needed to enter kana from their keyboards to write
    programs, that would be awkward; there is not yet a good way to enter
    that kind of text from a keyboard.

    However, I think your point is valid. At least in some contexts.

    Remember back in the early 8-bit days of computing, and before them,
    when schools were exposing children to PDP-8 computers?

    Children were learning to program computers in BASIC.

    Obviously, here, if children in other countries used modified versions
    of BASIC that used keywords in their own natural language, it would be
    much easier for them to get started with programming than if the
    keywords were simply arbitrary strings of letters, taken from a
    foreign language of which they may not necessarily have any knowledge.

    If Algol was supposed to be an _international_ algorithmic language,
    why weren't its keywords taken from Latin or Esperanto, instead of
    English?

    Historical note: Algol was originally called IAL; remember what JOVIAL
    stood for.

    But the objections about sharing code between countries, and the fact
    that English is so widely known in technical circles, are also true.
    It is a complicated issue, made worse by the fact that nationalism and ethnocentrism are often bad things.

    John Savard

  • From John Savard@21:1/5 to [email protected] on Sun May 19 15:36:45 2024
    On Sat, 18 May 2024 17:11:32 -0000 (UTC), "Stephen Fuld" <[email protected]d> wrote:

    and
    because it was supposedly "self documenting", easier for managers, etc.
    to see how the program worked.

    Of course, if they designed COBOL that way, why did they include a
    statement that let you re-direct GOTO statements from elsewhere in a
    program?

    I mean, that was just *asking* for dishonest programmers to direct the
    odd pennies into their bank accounts and so on.

    John Savard

  • From Anton Ertl@21:1/5 to John Savard on Mon May 20 11:46:20 2024
    John Savard <[email protected]d> writes:
    Remember back in the early 8-bit days of computing, and before them,
    when schools were exposing children to PDP-8 computers?

    Children were learning to program computers in BASIC.

    Obviously, here, if children in other countries used modified versions
    of BASIC that used keywords in their own natural language, it would be
    much easier for them to get started with programming than if the
    keywords were simply arbitrary strings of letters, taken from a
    foreign language of which they may not necessarily have any knowledge.

    Logo came in versions for different native languages, but looking at <https://de.wikipedia.org/wiki/Logo_(Programmiersprache)>, it shows
    English Logo examples before German Logo examples. I tried Logo on my
    C64; I don't know whether it was in English or German, but in any case
    I was not particularly impressed.

    The C64 as well as many other home computers came with BASIC, and
    BASIC was widely used, and before today I never heard or read any
    suggestion to use native-language commands in BASIC.

    I have seen some suggestions to provide native-language versions of
    Forth, but they never went anywhere (if they were serious). The main motivation here seems to have been that it's easy to do that in Forth,
    so is there a nail to which we can apply this hammer? I attend
    German-language Forth events where some of the participants are not
    good enough at English to, e.g., read articles about Forth in English,
    but none of them has Germanized his personal Forth system.

    Scratch is also designed for children and supports native-language
    switching, which eliminates one of the drawbacks of native-language
    versions.

    Like Logo, Scratch comes out of MIT, and I wonder if the idea that programmers have problems with names that are not in their native
    language is due to their American background.

    If Algol was supposed to be an _international_ algorithmic language,
    why weren't its keywords taken from Latin or Esperanto, instead of
    English?

    Algol 60 does not standardize a program representation in characters
    (a grave mistake fixed by most later programming languages). It
    also does not standardize reserved words (aka keywords); instead, it
    has symbols that are typically written in bold in publications to
    differentiate them from identifiers written in a normal typeface.

    It is up to the compiler implementor how the programmer has to provide
    these symbols; one way is to surround each such symbol with single
    quotes (used in ICT 1900 Algol). A compiler implementor could instead
    (or in addition) support native-language representations of these
    symbols, but I am not aware that this has happened. After all, it's
    an international language, not a national language; or maybe such
    attempts were made and sunk without much notice, for the same reasons
    we have been discussing all along.

    Elliott 803 Algol uses the reserved-word approach, which means that
    programs that use, e.g., "if" as an identifier don't work, but has the
    advantage that you don't need to put that many single quotes in the
    code. This is the approach that won in later programming languages,
    but it makes it hard to introduce new reserved words in later versions
    (they may conflict with existing programs).
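    The difference between quote stropping and reserved words can be
    sketched with a toy Python tokenizer. The keyword set is a
    hypothetical subset chosen for illustration, and the regular
    expression is far simpler than any real Algol lexer; the point is
    only that with stropping a bare `if` remains an ordinary identifier,
    while with reserved words it can never be one.

```python
import re

# Hypothetical, heavily simplified keyword set; real Algol 60 has more symbols.
KEYWORDS = {"begin", "end", "if", "then", "else", "integer"}

# A 'QUOTED' word, a bare word, ":=", or any other single non-space character.
TOKEN = re.compile(r"'([A-Za-z]+)'|([A-Za-z][A-Za-z0-9]*)|(:=|\S)")

def tokenize(src: str, stropped: bool) -> list[tuple[str, str]]:
    """Classify tokens as ("KW", word), ("ID", name) or ("OP", symbol).

    stropped=True:  only 'QUOTED' words are keywords (quote stropping).
    stropped=False: bare words in KEYWORDS are reserved.
    """
    tokens = []
    for quoted, word, op in TOKEN.findall(src):
        if quoted:
            tokens.append(("KW", quoted.lower()))
        elif word:
            if not stropped and word.lower() in KEYWORDS:
                tokens.append(("KW", word.lower()))
            else:
                tokens.append(("ID", word))
        else:
            tokens.append(("OP", op))
    return tokens

print(tokenize("'IF' if 'THEN' x := y", stropped=True))
print(tokenize("if if then x", stropped=False))
```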

    As for why the Algol standard was written in English and used names
    from English rather than from Latin, that's because Algol was designed
    in 1960 when English was the lingua franca among scholars, not before
    ~1700 when Latin served that role. And Esperanto never reached that
    status.

    But concerning Latin, at the last EuroForth conference (near Rome)
    Ulrich Hoffmann gave an amusing talk where he presented a Latinized
    Forth complete with Roman numerals. Unfortunately, that talk is not
    (yet?) online.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From Michael S@21:1/5 to John Savard on Mon May 20 18:27:57 2024
    On Sun, 19 May 2024 15:32:49 -0600
    John Savard <[email protected]d> wrote:

    Obviously, here, if children in other countries used modified versions
    of BASIC that used keywords in their own natural language, it would be
    much easier for them to get started with programming than if the
    keywords were simply arbitrary strings of letters, taken from a
    foreign language of which they may not necessarily have any knowledge.

    https://en.wikipedia.org/wiki/Non-English-based_programming_languages
    Long list.

  • From Anton Ertl@21:1/5 to Michael S on Mon May 20 17:00:08 2024
    Michael S <[email protected]> writes:
    https://en.wikipedia.org/wiki/Non-English-based_programming_languages
    Long list.

    Compared to all programming languages? Not really. The HOPL data
    base reports 8945 languages, and Landin already wrote "The next 700
    programming languages" (probably based on the idea that there were 700
    up to that point) in 1967.

    The first part of that article points out that while only a little over 1/3 of the
    programming languages were designed in countries where the primary
    language is English, the share of languages that use English-based
    keywords is far larger. And that's especially true for languages that
    achieved some popularity.

    My guess is that if you take the proportion of lines of code written
    in languages where the builtins and (if present) reserved words are
    based on English, you would get a result that's very close to 100%.
    Code where identifiers are all based on English may have a
    significantly lower percentage, though.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From MitchAlsup1@21:1/5 to John Savard on Mon May 20 17:44:48 2024
    John Savard wrote:


    Historical note: Algol was originally called IAL; remember what JOVIAL
    stood for.

    Who was Joe ?? in Jovial

  • From Stephen Fuld@21:1/5 to All on Mon May 20 19:26:39 2024
    MitchAlsup1 wrote:

    John Savard wrote:


    Historical note: Algol was originally called IAL; remember what
    JOVIAL stood for.

    Who was Joe ?? in Jovial


    Just in case you weren't joking,

    Jules' Own Version of the International Algebraic Language

    Jules was Jules Schwartz

    https://en.wikipedia.org/wiki/Jules_Schwartz



    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

  • From Thomas Koenig@21:1/5 to Anton Ertl on Sat May 18 08:29:12 2024
    Anton Ertl <[email protected]> schrieb:
    Stefan Monnier <[email protected]> writes:

    Does Unicode even have the notion of "character", really?

    AFAIK it does not. But applications like palindrome checkers care
    about characters, not code points.

    Considering the huge market for palindrome checkers, that is a
    real concern, especially if they involve characters for which
    UTF-32 is not sufficient, such as smileys.

    Is there any language whose characters cannot be represented in
    UTF-32?

  • From Anton Ertl@21:1/5 to [email protected] on Sat May 18 08:40:40 2024
    [email protected] (MitchAlsup1) writes:
    It seems to me (in my vast ignorance) that names for things should be
    written in the most appropriate set of characters in the language of
    the person/thing being named.

    Then when such a name is "sent out to be displayed" that it is a property
    of the display what character set(s) it can properly emit, and thereby
    alter the string of characters as appropriate to its capabilities.

    For example:: Take > "K\u0316\u0308nig" cr type ==> K̖̈nig
    When displayed on a ASCII only line printer it would be written Koenig
    When displayed on a enhanced ASCII printer it would be written König
    When displayed on a full functional printer it would be written K̖̈nig

    Why do you think that K̖̈nig should be written as Koenig or König?

    However, for König Unicode specifies that the precomposed form is
    König. And if you want a transcription into ASCII with the knowledge
    that it's German, the result would be Koenig.
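    The composition described here can be checked with Python's
    `unicodedata.normalize`: NFC turns o + U+0308 into the precomposed ö,
    while the K with grave-below and diaeresis stays three code points
    because Unicode has no precomposed character for it. A minimal
    sketch:

```python
import unicodedata

decomposed = "Ko\u0308nig"        # o followed by COMBINING DIAERESIS
precomposed = "K\u00f6nig"        # LATIN SMALL LETTER O WITH DIAERESIS
k_marks = "K\u0316\u0308nig"      # K + COMBINING GRAVE ACCENT BELOW + COMBINING DIAERESIS

nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == precomposed, len(decomposed), len(nfc))    # True 6 5
print(unicodedata.normalize("NFC", k_marks) == k_marks)  # True: nothing to compose
```

    The ASCII transcription Koenig, by contrast, is not a Unicode
    normalization at all; it needs knowledge that the text is German.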

    Only the display device needs to understand this mapping and NOT the
    program/software/device holding the string.

    Yes, that's why treating string data as opaque works for most of the
    code.

    I think people in Japan should be able to use printf by using プリントフ
    There is way too much "english" in the way computers are being used.

    I don't know how Japanese feel about that, but I certainly don't want
    to have to use some Germanized form of C or Forth. This kind of
    catering for different natural-language programmers has been tried and
    has not taken over the world. I guess that's because

    1) You need to learn a lot about what "printf" means and how it is
    used; remembering the name is only a minor aspect.

    2) Having a name common on all the world allows you to read programs
    from all over the world, use reference material from all over the
    world, etc.

    A similar concept was implemented in COBOL, where the designers thought
    that having to write

    ADD A TO B GIVING C

    or somesuch makes programming easier than writing

    C = A+B

    in FORTRAN. That has not found many followers, either. Interestingly,
    among the Algol descendants, the BCPL (and later B and C) syntax,
    which, e.g., replaced 'or' with || or |, and was otherwise more
    symbolic and less natural-language-oriented than its ancestor Algol
    60, was the most successful syntax style, even spreading to languages
    like Java that are closer to Algol 60 or Pascal in other respects.

    I have seen programmers define their own names based on their native
    language, however. But if they use names in their own language, these
    names should not depend on the environment.

    In the macro language of a game I play, you can refer to things
    through their name or through their numeric id. Unfortunately, the
    names are localized, so the only way to write portable macros is by
    using the unmnemonic numeric ids:-(.

    What is more common than localized programming languages is producing
    error messages in localized languages. I find this annoying, too,
    because it makes it harder to find out how others have solved the same
    problem.

    And, e.g., ENOTSUP in Unix, has such a specific meaning that the
    localized text does not help the person unfamiliar with Unix, while it
    makes life harder for people who know Unix enough to make sense of the
    message; i.e., even though my native language is German, I find
    "Operation not supported" easier to understand than "Operation wird
    nicht unterstützt"; in the latter case I first have to guess what the
    English error message would have been and then I can start analysing
    the problem.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From Anton Ertl@21:1/5 to Thomas Koenig on Sat May 18 10:14:44 2024
    Thomas Koenig <[email protected]> writes:
    Anton Ertl <[email protected]> schrieb:
    Stefan Monnier <[email protected]> writes:

    Does Unicode even have the notion of "character", really?

    AFAIK it does not. But applications like palindrome checkers care
    about characters, not code points.

    Considering the huge market for palindrome checkers, that is a
    real concern, especially if they involve characters for which
    UTF-32 is not sufficient, such as smileys.

    Is there any language whose characters cannot be represented in
    UTF-32?

    The goal of Unicode is to support all writing systems; AFAIK they are
    not yet finished, but they expect that these writing systems will all
    fit into the space provided by UTF-16 (i.e., a little over one million
    code points), but they found it necessary to introduce the concept of
    composing glyphs from multiple code points.
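    The "little over one million" figure follows from how UTF-16 is
    built. A few lines of Python arithmetic (not from the original post)
    make it explicit: the 65,536 code points of the Basic Multilingual
    Plane plus the 2^20 code points reachable via surrogate pairs give
    1,114,112 = 0x110000 positions, of which 2,048 are the surrogates
    themselves.

```python
BMP = 0x10000                      # U+0000..U+FFFF, one 16-bit unit each
SUPPLEMENTARY = 1024 * 1024        # 1024 high x 1024 low surrogates = 2**20
CODE_SPACE = BMP + SUPPLEMENTARY   # total size of the Unicode code space
SURROGATES = 0xE000 - 0xD800       # U+D800..U+DFFF are reserved for UTF-16

print(CODE_SPACE, hex(CODE_SPACE - 1))   # 1114112 0x10ffff
print(CODE_SPACE - SURROGATES)           # 1112064 usable scalar values
print(CODE_SPACE == 17 * 65536)          # True: 17 planes
```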

    So if your question is: "Is there any language where one character
    cannot be represented by a single Unicode code point?" The answer is
    that the Unicode designers certainly expect that there are such
    writing systems.

    And looking at <https://en.wikipedia.org/wiki/Telugu_script> (just an
    example), I see that the table of Unicode code points for Telugu <https://en.wikipedia.org/wiki/Telugu_script#Unicode> is much smaller
    than the tables of glyphs in <https://en.wikipedia.org/wiki/Telugu_script#Articulation_of_consonants>
    and <https://en.wikipedia.org/wiki/Telugu_script#Consonants_with_vowel_diacritics>, so the Telugu script seems to be one writing system that cannot be
    represented with only precomposed characters.

    I don't know if palindromes are a thing in Telugu, though.

    But, as your reference to the size of the market for palindrome
    checkers indicates, there is actually little code where dealing with
    individual characters is relevant. For code where individual
    characters are not relevant and opaque strings are sufficient, there
    is no reason to use UTF-32. And for code where individual characters
    are relevant, code points are not sufficient in general, so there is
    no reason to use UTF-32 for that, either.

    Interestingly, Emacs 27.1 manages to deal with "తెలుగు లిపి" (which
    contains 6 characters composed of a total of 11 code points) just
    fine, while it fails on König (with a decomposed Umlaut-o).

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From John Savard@21:1/5 to [email protected] on Wed May 22 02:16:21 2024
    On Mon, 20 May 2024 19:26:39 -0000 (UTC), "Stephen Fuld" <[email protected]d> wrote:

    Jules was Jules Schwartz

    https://en.wikipedia.org/wiki/Jules_Schwartz

    Not to be confused with Julius Schwartz.

    https://en.wikipedia.org/wiki/Julius_Schwartz

    John Savard

  • From Stefan Monnier@21:1/5 to All on Wed May 22 15:38:51 2024
    Assume you're implementing a language which has a function of setting
    an individual character in a string.
    That's a design mistake in the language, and I know no language that
    has this misfeature.
    I suspect "individual character" meant "code point" above.
    I meant character, not code point, as should have become clear from
    the following. I think that Thomas Koenig meant "character", too, but
    he may have been unaware of the difference between "character" and
    "Unicode code point".

    I don't know of any language (or even library) that supports the notion
    of "character" for Unicode strings. 🙁

    OTOH, most code can be implemented fine as working on strings, without knowing how many characters there are in the string (and it then does
    not need to know about code points, either).

    Indeed, most operations on strings are conversion of things to strings, concatenation of strings, search (typically for a substring or a regexp), extraction of substring where the boundaries result from an earlier
    search, and parsing (which at the bottom relies often on some sort of
    regexp or equivalent system).

    All of those work just fine on a UTF-8 sequence of bytes.
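None of the operations Stefan lists needs code-point indexing. Below is a minimal Python sketch that works directly on `bytes`. It relies on UTF-8 being self-synchronizing: a complete UTF-8 needle found in valid UTF-8 text always starts and ends on code-point boundaries, so the byte offsets returned by `find` are safe to slice at.

```python
# Search, extraction and concatenation on raw UTF-8 bytes.
# Non-ASCII letters are spelled as escapes: \u00fc=ü, \u00df=ß, \u00f6=ö.
text = "Gr\u00fc\u00dfe aus K\u00f6ln, sagt K\u00f6nig".encode("utf-8")
needle = "K\u00f6ln".encode("utf-8")

start = text.find(needle)                 # search: the result is a byte offset
end = start + len(needle)                 # boundary produced by the search
city = text[start:end]                    # substring extracted by byte indices
greeting = text[:text.find(b",")] + b"!"  # concatenation of byte strings

print(start)                              # 12 (the code-point index would be 10)
print(city.decode("utf-8"))               # Köln
print(greeting.decode("utf-8"))           # Grüße aus Köln!
```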

    Emacs Lisp has this misfeature as well (and so does Common Lisp). 🙁
    It's really hard to get rid of it, even though it's used *very* rarely.
    In ELisp, strings are represented internally as utf-8 (tho it pretends
    to be an array of code points), so an assignment that replaces a single
    char can require reallocating the array!
    One way forward might be to also provide a string-oriented API with
    byte (code unit) indices, and recommend that people use that instead
    of the inefficient code-point-indexed API.
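Setting one "character" can force a reallocation because, in UTF-8, the replacement may need a different number of bytes than the text it replaces. Here is a minimal Python sketch of the byte-indexed alternative Anton suggests, using a `bytearray` slice assignment. `replace_bytes` is a hypothetical helper, not an Emacs or Python API.

```python
def replace_bytes(buf: bytearray, start: int, end: int, replacement: str) -> None:
    """Hypothetical byte-indexed splice: replace buf[start:end] with the
    UTF-8 encoding of `replacement`. The offsets are assumed to come from
    an earlier search, so no code-point counting is needed; the bytearray
    grows or shrinks as the lengths require."""
    buf[start:end] = replacement.encode("utf-8")

buf = bytearray(b"Konig")
pos = buf.find(b"o")
replace_bytes(buf, pos, pos + 1, "\u00f6")   # 'o' is 1 byte, 'ö' is 2 bytes
print(len(buf), buf.decode("utf-8"))          # 6 König
```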

    I think the long term solution for ELisp will be to declare strings as basically immutable.

    Because you know your string only contains "characters" made of a single
    code point?

    This incorrect "knowledge" may be the reason why Emacs 27.1 displays

    K̖̈nig

    as if the first three-code-point character actually was three characters.

    No, the above seems like a problem in the redisplay code, and that code
    is quite aware of combining characters and stuff. You're probably
    seeing simply a missing rule to allow composition/shaping of your word.
    (the composition/shaping library operates on whole strings at a time,
    but Emacs tends to be quite conservative about the string-chunks it
    sends to that library).

    I recommend you `M-x report-emacs-bug`. The fix should be fairly simple.

    E.g. your string contains the representation of the border of a table
    (to be displayed in a tty), and you want to "move" the `+` of a column
    separator (or a prettier version that takes advantage of the wider
    choice offered by Unicode).
    These kinds of things involve additional complications.

    Very much so, indeed. It usually breaks down in many different ways
    because of the common-but-not-guaranteed assumptions.


    Stefan

  • From Anton Ertl@21:1/5 to Stefan Monnier on Sat May 25 15:48:07 2024
    Stefan Monnier <[email protected]> writes:
    [Anton Ertl:]
    I meant character, not code point, as should have become clear from
    the following. I think that Thomas Koenig meant "character", too, but
    he may have been unaware of the difference between "character" and
    "Unicode code point".

    I don't know of any language (or even library) that supports the notion
    of "character" for Unicode strings.

    My experiments with Telugu suggest that Emacs understands the concept
    of a character at least for the Telugu script (in contrast to
    decomposed Umlauts). If I press a cursor key in Telugu text, Emacs
    advances to the next character, not the next code point. However, if
    I press DEL or BS, it deletes a code point.

    Here's some text again for playing around with it:

    తెలుగు లిపి

    Anyway, the Emacs Lisp functions right-char (and, after testing, also left-char, forward-char, and backward-char) support the notion of
    character at least for some scripts. That may be the result of an
    interaction with the redisplay code that you mention later, but in
    that case it's that code that knows about characters in Unicode.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From Stephen Fuld@21:1/5 to John Savard on Sun May 26 03:50:46 2024
    John Savard wrote:

    On Sat, 18 May 2024 17:11:32 -0000 (UTC), "Stephen Fuld" <[email protected]d> wrote:

    and
    because it was supposedly "self documenting", easier for managers,
    etc. to see how the program worked.

    Of course, if they designed COBOL that way, why did they include a
    statement that let you re-direct GOTO statements from elsewhere in a
    program?

    That feature (Alter GOTO) was also in Fortran, as the, long since
    deprecated, assigned GOTO statement. I believe they were there to
    support some older computers that didn't have indexed jump/branch
    instructions, so achieved the effect by modifying the branch
    destination in the instruction itself. And yes, it was ugly and made comprehension of the program, and also debugging it, much harder.


    I mean, that was just asking for dishonest programmers to direct the
    odd pennies into their bank accounts and so on.

    Not really. You had to Alter the goto statement to some pre-existing
    label, not just anywhere in the code.



    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

  • From Thomas Koenig@21:1/5 to Stephen Fuld on Sun May 26 08:33:50 2024
    Stephen Fuld <[email protected]d> schrieb:
    John Savard wrote:

    On Sat, 18 May 2024 17:11:32 -0000 (UTC), "Stephen Fuld"
    <[email protected]d> wrote:

    and
    because it was supposedly "self documenting", easier for managers,
    etc. to see how the program worked.

    Of course, if they designed COBOL that way, why did they include a
    statement that let you re-direct GOTO statements from elsewhere in a
    program?

    That feature (Alter GOTO) was also in Fortran, as the, long since
    deprecated, assigned GOTO statement.

    Assigned is

    ASSIGN 10 to N

    GOTO N (10, 20, 30, 40)

    10 CONTINUE

    which I don't think is what John S. is describing.

    What old FORTRAN compilers had was, for debugging, an AT statement,
    which sucked control from the statement into a DEBUG section, without visibility at the place where it came from. The proverbial COME FROM statement, used as a debugging aid; in the DEBUG section, variables
    could be printed _or changed_.

    Rumor has it that the AD statement was regularly abused, so there
    were a lot of programs which did not run correctly unless debugging
    was enabled...

  • From Thomas Koenig@21:1/5 to Thomas Koenig on Sun May 26 10:16:27 2024
    Thomas Koenig <[email protected]> schrieb:

    Rumor has it that the AD statement was regularly abused,

    s/AD/AT

  • From Lawrence D'Oliveiro@21:1/5 to Anton Ertl on Mon May 27 01:09:44 2024
    On Sun, 12 May 2024 05:40:45 GMT, Anton Ertl wrote:

    This is a nice demonstration of the unnecessary complexity that the
    codepoint mistake leads to. ...

    But if they had decided to just store the data as UTF-8 and use byte
    indexes and lengths in their API, and adjusted the rest of their API accordingly, they could have avoided this complexity and
    inefficiency ...

    But UTF-8 is just a representation of code points, not characters. So I
    don’t understand why one way leads to “unnecessary complexity” and the other way does not.

  • From Lawrence D'Oliveiro@21:1/5 to Anton Ertl on Mon May 27 01:11:22 2024
    On Sun, 12 May 2024 16:12:26 GMT, Anton Ertl wrote:

    Plus at some point (not sure when) they decided that characters have to
    be composable ...

    I think that was true right from the beginning. Else you would have had a combinatorial explosion of alphabetic characters with diacritic marks.

  • From Anton Ertl@21:1/5 to Lawrence D'Oliveiro on Mon May 27 06:20:33 2024
    Lawrence D'Oliveiro <[email protected]d> writes:
    On Sun, 12 May 2024 05:40:45 GMT, Anton Ertl wrote:

    This is a nice demonstration of the unnecessary complexity that the
    codepoint mistake leads to. ...

    But if they had decided to just store the data as UTF-8 and use byte
    indexes and lengths in their API, and adjusted the rest of their API
    accordingly, they could have avoided this complexity and
    inefficiency ...

    But UTF-8 is just a representation of code points, not characters. So I don’t understand why one way leads to “unnecessary complexity” and the other way does not.

    In UTF-32 a character is a sequence of code points. In UTF-8 it is a
    sequence of code units. In either case, if you have to deal with
    characters, you have to deal with sequences (and most of the code does
    not have to deal with characters and even less code has to deal with
    code points). So converting to UTF-32 buys you nothing and is
    unnecessary complexity.
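Anton's point can be seen with a single decomposed character: "o" followed by U+0308 COMBINING DIAERESIS. This minimal Python sketch lists the character's code units in both encoding forms. In both, the one character is more than one unit long, so code that handles characters has to handle sequences either way.

```python
# One user-perceived character, written decomposed: 'o' + U+0308.
char = "o\u0308"

utf8_units = list(char.encode("utf-8"))    # 8-bit code units
raw32 = char.encode("utf-32-le")           # explicit byte order, so no BOM
utf32_units = [int.from_bytes(raw32[i:i + 4], "little")
               for i in range(0, len(raw32), 4)]   # 32-bit code units

print([hex(u) for u in utf8_units])    # ['0x6f', '0xcc', '0x88']
print([hex(u) for u in utf32_units])   # ['0x6f', '0x308']
```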

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From Anton Ertl@21:1/5 to Lawrence D'Oliveiro on Mon May 27 06:25:28 2024
    Lawrence D'Oliveiro <[email protected]d> writes:
    On Sun, 12 May 2024 16:12:26 GMT, Anton Ertl wrote:

    Plus at some point (not sure when) they decided that characters have to
    be composable ...

    I think that was true right from the beginning. Else you would have had a combinatorial explosion of alphabetic characters with diacritic marks.

    Unicode has precomposed variants of the Latin characters that are used
    in normal text. It does not have a precomposed character for, e.g.,
    K̖̈, but then such a character does not occur in normal text.
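Python's `unicodedata.normalize` shows the difference between the two cases. This minimal sketch assumes that the K̖̈ above is K followed by U+0316 COMBINING GRAVE ACCENT BELOW and U+0308 COMBINING DIAERESIS. NFC composes the decomposed "König" into its precomposed form, but has nothing to compose the K with.

```python
import unicodedata

# Decomposed "König": NFC replaces 'o' + U+0308 with precomposed U+00F6.
koenig = unicodedata.normalize("NFC", "Ko\u0308nig")
# K + U+0316 + U+0308 (assumed spelling of the example above): there is
# no precomposed form, so NFC leaves three code points.
k_marks = unicodedata.normalize("NFC", "K\u0316\u0308")

print(len(koenig), koenig == "K\u00f6nig")   # 5 True
print(len(k_marks))                          # 3
```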

    Unicode 1.0 with its expansion to 16-bit code units only makes sense
    if the resulting code units are characters. If at that point they had
    planned to have variable-width characters, they could have gone with
    something like UTF-8 from the start and spared us a lot of pain.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From Lawrence D'Oliveiro@21:1/5 to Anton Ertl on Mon May 27 07:34:48 2024
    On Sat, 18 May 2024 05:29:20 GMT, Anton Ertl wrote:

    Stefan Monnier <[email protected]> writes:

    Does Unicode even have the notion of "character", really?

    AFAIK it does not.

    It uses terms like “grapheme” and “text element” for the concept, leaving
    “character” without a fixed meaning.

  • From Lawrence D'Oliveiro@21:1/5 to Anton Ertl on Mon May 27 07:36:50 2024
    On Mon, 27 May 2024 06:20:33 GMT, Anton Ertl wrote:

    Lawrence D'Oliveiro <[email protected]d> writes:

    On Sun, 12 May 2024 05:40:45 GMT, Anton Ertl wrote:

    This is a nice demonstration of the unnecessary complexity that the
    codepoint mistake leads to. ...

    But if they had decided to just store the data as UTF-8 and use byte
    indexes and lengths in their API, and adjusted the rest of their API
    accordingly, they could have avoided this complexity and inefficiency
    ...

    But UTF-8 is just a representation of code points, not characters. So I don’t understand why one way leads to “unnecessary complexity” and the other way does not.

    In UTF-32 a character is a sequence of code points. In UTF-8 it is a sequence of code units.

    UTF-8 is a sequence of bytes encoding code points.
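To make the "bytes encoding code points" relationship concrete, here is a minimal Python sketch of the UTF-8 bit layout for a single code point, as specified in RFC 3629. It is checked against Python's built-in encoder. `utf8_encode` is an illustrative name, not a library function.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (bit layout per RFC 3629)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

for ch in "K\u00f6\u0c24\U0001F468":   # 1-, 2-, 3- and 4-byte examples
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
    print(f"U+{ord(ch):04X}", utf8_encode(ord(ch)).hex(" "))
```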

  • From Lawrence D'Oliveiro@21:1/5 to Stefan Monnier on Mon May 27 07:40:42 2024
    On Wed, 22 May 2024 15:38:51 -0400, Stefan Monnier wrote:

    I don't know of any language (or even library) that supports the notion
    of "character" for Unicode strings. 🙁

    Surely a “character” (or “grapheme” I think is (one of) the Unicode terms)
    is (represented by) a non-combining code point combined with all the immediately-following combining code points.
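The rule proposed here (a non-combining code point plus the combining code points that immediately follow it) can be sketched in Python with `unicodedata.category`, treating categories Mn, Mc and Me as combining. This is a simplification. It is not the extended grapheme cluster algorithm of Unicode's UAX #29. The ZWJ emoji sequence in the example is the kind of case John Levine raises in the reply below.

```python
import unicodedata

def naive_clusters(s: str) -> list[str]:
    """Group each code point with the combining marks (categories Mn, Mc,
    Me) that immediately follow it. A simplification, not UAX #29."""
    clusters: list[str] = []
    for ch in s:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch      # attach the mark to the preceding cluster
        else:
            clusters.append(ch)     # anything else starts a new cluster
    return clusters

telugu = "\u0c24\u0c46\u0c32\u0c41\u0c17\u0c41 \u0c32\u0c3f\u0c2a\u0c3f"
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"   # man ZWJ woman ZWJ girl

print(len(naive_clusters("Ko\u0308nig")))   # 5
print(len(naive_clusters(telugu)))          # 6
print(len(naive_clusters(family)))          # 5
```

The Telugu sample and the decomposed "König" come out right. The family sequence, however, is intended to render as a single emoji, and the rule splits it into five pieces because U+200D is not a combining mark.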

  • From Lawrence D'Oliveiro@21:1/5 to Anton Ertl on Mon May 27 07:42:32 2024
    On Mon, 20 May 2024 11:46:20 GMT, Anton Ertl wrote:

    Algol 60 does not standardize a program representation in characters (a
    grave mistake fixed by most later programming languages ...

    That would likely not have been considered feasible in 1960, given the
    wide variation in character sets between computer systems. Even I/O was considered to be in the too-hard basket back then.

  • From Lawrence D'Oliveiro@21:1/5 to All on Mon May 27 07:43:42 2024
    On Mon, 20 May 2024 17:44:48 +0000, MitchAlsup1 wrote:

    John Savard wrote:

    Historical note: Algol was originally called IAL; remember what JOVIAL
    stood for.

    Who was Joe ?? in Jovial

    Jules Schwartz <http://bitsavers.trailing-edge.com/pdf/sdc/jovial/Schwartz_-_The_Development_of_JOVIAL_1978.pdf>

  • From Lawrence D'Oliveiro@21:1/5 to John Savard on Mon May 27 07:45:59 2024
    On Sun, 19 May 2024 15:32:49 -0600, John Savard wrote:

    If Algol was supposed to be an _international_ algorithmic language,
    why weren't its keywords taken from Latin or Esperanto, instead of
    English?

    Much of its syntax came from mathematics, which is international.

    Semi-related question: are there non-English equivalents for mathematical operators like “grad”, “div” and “curl”?

  • From John Levine@21:1/5 to All on Mon May 27 15:16:13 2024
    According to Lawrence D'Oliveiro <[email protected]d>:
    On Wed, 22 May 2024 15:38:51 -0400, Stefan Monnier wrote:

    I don't know of any language (or even library) that supports the notion
    of "character" for Unicode strings. 🙁

    Surely a “character” (or “grapheme” I think is (one of) the Unicode terms)
    is (represented by) a non-combining code point combined with all the immediately-following combining code points.

    Take another look at the table I referred to yesterday. When you have
    ZWJ the rules of what combines with what gets awfully complicated.
    --
    Regards,
    John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From John Levine@21:1/5 to [email protected] on Mon May 27 16:41:26 2024
    It appears that Lawrence D'Oliveiro <[email protected]d> said:
    Much of its syntax came from mathematics, which is international.

    Semi-related question: are there non-English equivalents for mathematical operators like “grad”, “div” and “curl”?

    Grad is written as a nabla, an upside down delta, div as nabla followed by a center dot,
    and curl as nabla followed by a multiplication sign.
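In that notation, with f a scalar field and F a vector field (names chosen here for illustration), the three operators are:

```latex
\operatorname{grad} f = \nabla f, \qquad
\operatorname{div} \mathbf{F} = \nabla \cdot \mathbf{F}, \qquad
\operatorname{curl} \mathbf{F} = \operatorname{rot} \mathbf{F} = \nabla \times \mathbf{F}
```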

    I'm reasonably sure my 1970 math textbook used them but I can't find it at the moment.

    If you're asking how they're written in programming languages, I
    expect they use the English names since we have the better part of a
    century of anglophone numerical programming. Wikipedia says that curl
    is often called "rot" for rotation outside North America.

    I happen to have a copy of "Algol 60 Implementation" published in 1963
    which describes the KDF9 Algol compiler in considerable detail. They
    considered the translation of the Algol publication language to the
    5-bit paper tape code their computer used so trivial that they don't
    even describe it.
    --
    Regards,
    John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Tue May 28 01:08:06 2024
    On Mon, 27 May 2024 15:16:13 -0000 (UTC), John Levine wrote:

    According to Lawrence D'Oliveiro <[email protected]d>:
    On Wed, 22 May 2024 15:38:51 -0400, Stefan Monnier wrote:

    I don't know of any language (or even library) that supports the
    notion of "character" for Unicode strings. 🙁

    Surely a “character” (or “grapheme” I think is (one of) the Unicode terms) is (represented by) a non-combining code point combined with all
    the immediately-following combining code points.

    Take another look at the table I referred to yesterday. When you have
    ZWJ the rules of what combines with what gets awfully complicated.

    ZWJ is classed as “punctuation”, and has no combining class. So it forms a “character” or “grapheme” in its own right.
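The properties under discussion can be queried with Python's `unicodedata` module. A quick check shows that ZWJ's General_Category is Cf (format) and its canonical combining class is 0. Segmentation, however, is governed by a separate property, Grapheme_Cluster_Break. UAX #29 gives ZWJ its own value of that property and does not break a cluster before it. As far as I know, the standard-library module does not expose this property.

```python
import unicodedata

zwj = "\u200d"
print(unicodedata.name(zwj))        # ZERO WIDTH JOINER
print(unicodedata.category(zwj))    # Cf (Other, format)
print(unicodedata.combining(zwj))   # 0 (canonical combining class)
```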

  • From John Levine@21:1/5 to All on Tue May 28 01:25:38 2024
    According to Lawrence D'Oliveiro <[email protected]d>:
    On Mon, 27 May 2024 15:16:13 -0000 (UTC), John Levine wrote:

    According to Lawrence D'Oliveiro <[email protected]d>:
    On Wed, 22 May 2024 15:38:51 -0400, Stefan Monnier wrote:

    I don't know of any language (or even library) that supports the
    notion of "character" for Unicode strings. 🙁

    Surely a “character” (or “grapheme” I think is (one of) the Unicode terms) is (represented by) a non-combining code point combined with all
    the immediately-following combining code points.

    Take another look at the table I referred to yesterday. When you have
    ZWJ the rules of what combines with what gets awfully complicated.

    ZWJ is classed as “punctuation”, and has no combining class. So it forms a “character” or “grapheme” in its own right.

    Really, you need to look at that combined emoji table I told you about yesterday.



    --
    Regards,
    John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From John Levine@21:1/5 to [email protected] on Tue May 28 01:36:22 2024
    It appears that Lawrence D'Oliveiro <[email protected]d> said:
    On Tue, 28 May 2024 01:25:38 -0000 (UTC), John Levine wrote:

    Really, you need to look at that combined emoji table I told you about
    yesterday.

    I’m just telling you what the official Unicode spec says.

    Um, so am I. Those nine code point things are supposed to display
    as a single little picture, regardless of what some other bit of
    the spec may assert about ZWJ.

    --
    Regards,
    John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Tue May 28 01:22:46 2024
    On Mon, 27 May 2024 16:41:26 -0000 (UTC), John Levine wrote:

    It appears that Lawrence D'Oliveiro <[email protected]d> said:

    Much of its syntax came from mathematics, which is international.

    Semi-related question: are there non-English equivalents for
    mathematical operators like “grad”, “div” and “curl”?

    Grad is written as a nabla, an upside down delta, div as nabla followed
    by a center dot, and curl as nabla followed by a multiplication sign.

    That’s right, I’d forgotten about that.

    I happen to have a copy of "Algol 60 Implementation" published in 1963
    which describes the KDF9 Algol compiler in considerable detail. They considered the translation of the Algol publication language to the
    5-bit paper tape code their computer used so trivial that they don't
    even describe it.

    Only 32 code symbols? It must have used shifts, à la Baudot code. It
    probably was Baudot code.

  • From John Levine@21:1/5 to All on Tue May 28 01:34:36 2024
    According to Lawrence D'Oliveiro <[email protected]d>:
    On Mon, 27 May 2024 19:09:51 -0000 (UTC), John Levine wrote:

    According to EricP <[email protected]>:

    One could have instructions that make it easier to parse the variable
    length UTF-8 sequences into codepoints.

    That would be the CU14 instruction on zSeries, to turn UTF-8 into
    UTF-32. CU41 goes the other way.

    What is the point, in this day and age, of having special machine instructions to convert character encodings?

    Presumably it makes some inner loop faster. They have instructions
    to convert among all of UTF-8, UTF-16, and UTF-32, with an optional
    bit (available at extra cost) to check that the incoming code points
    are valid in the selected encoding.
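For comparison with what CU14 does in one instruction, here is a minimal Python sketch of UTF-8 to UTF-32 decoding. It is not IBM's algorithm, and `utf8_to_utf32` is an illustrative name. The `check` flag is only loosely modeled on the optional validity check described above. Structural errors are always rejected, while overlong forms, surrogates and values above U+10FFFF are rejected only when `check` is set.

```python
def utf8_to_utf32(data: bytes, check: bool = True) -> list[int]:
    """Decode UTF-8 bytes into a list of code points (UTF-32 values).

    Structural errors (bad lead byte, truncated sequence, missing
    continuation byte) are always rejected. With check=True, overlong
    forms, surrogates and values above U+10FFFF are rejected as well.
    """
    out: list[int] = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                  # 0xxxxxxx: ASCII
            n, cp, minimum = 0, lead, 0
        elif 0xC0 <= lead < 0xE0:        # 110xxxxx: 1 continuation byte
            n, cp, minimum = 1, lead & 0x1F, 0x80
        elif 0xE0 <= lead < 0xF0:        # 1110xxxx: 2 continuation bytes
            n, cp, minimum = 2, lead & 0x0F, 0x800
        elif 0xF0 <= lead < 0xF8:        # 11110xxx: 3 continuation bytes
            n, cp, minimum = 3, lead & 0x07, 0x10000
        else:                            # stray continuation byte or 0xF8..0xFF
            raise ValueError(f"invalid lead byte {lead:#x} at offset {i}")
        if i + n >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, n + 1):
            b = data[i + k]
            if (b & 0xC0) != 0x80:       # continuation bytes are 10xxxxxx
                raise ValueError(f"missing continuation byte at offset {i + k}")
            cp = (cp << 6) | (b & 0x3F)
        if check and (cp < minimum or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF):
            raise ValueError(f"invalid code point {cp:#x} at offset {i}")
        out.append(cp)
        i += n + 1
    return out

print([hex(cp) for cp in utf8_to_utf32("K\u00f6\u0c24\U0001F468".encode("utf-8"))])
```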

    zSeries has a lot of instructions like that. They even have packed
    decimal vector instructions (not to be confused with decimal floating
    point vector instructions, which they also have.) I can sort of guess
    why but I don't really know.

    They're almost certainly implemented in what they call millicode,
    vertical microcode that uses the hardware implemented subset of the
    instruction set plus a few extras to manage internal state info. So
    it's not extra hardware, just extra microcode.
    --
    Regards,
    John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Tue May 28 01:29:31 2024
    On Tue, 28 May 2024 01:25:38 -0000 (UTC), John Levine wrote:

    Really, you need to look at that combined emoji table I told you about yesterday.

    I’m just telling you what the official Unicode spec says.

  • From moi@21:1/5 to Lawrence D'Oliveiro on Tue May 28 15:43:25 2024
    On 28/05/2024 02:22, Lawrence D'Oliveiro wrote:
    On Mon, 27 May 2024 16:41:26 -0000 (UTC), John Levine wrote:

    It appears that Lawrence D'Oliveiro <[email protected]d> said:

    Much of its syntax came from mathematics, which is international.

    Semi-related question: are there non-English equivalents for
    mathematical operators like “grad”, “div” and “curl”?

    Grad is written as a nabla, an upside down delta, div as nabla followed
    by a center dot, and curl as nabla followed by a multiplication sign.

    That’s right, I’d forgotten about that.

    I happen to have a copy of "Algol 60 Implementation" published in 1963
    which describes the KDF9 Algol compiler in considerable detail. They
    considered the translation of the Algol publication language to the
    5-bit paper tape code their computer used so trivial that they don't
    even describe it.

    Only 32 code symbols? It must have used shifts, à la Baudot code. It probably was Baudot code.

    It was Ferranti 5-channel paper tape code: <http://www.findlayw.plus.com/KDF9/The%20KDF9%20Character%20Codes.pdf>
    --
    Bill F.

  • From Thomas Koenig@21:1/5 to Lawrence D'Oliveiro on Tue May 28 17:04:20 2024
    Lawrence D'Oliveiro <[email protected]d> schrieb:
    On Sun, 19 May 2024 15:32:49 -0600, John Savard wrote:

    If Algol was supposed to be an _international_ algorithmic language,
    why weren't its keywords taken from Latin or Esperanto, instead of
    English?

    Much of its syntax came from mathematics, which is international.

    Semi-related question: are there non-English equivalents for mathematical operators like “grad”, “div” and “curl”?

    German has "grad", "div" and "rot". People also use the nabla
    operator, which I personally don't like.

  • From Stefan Monnier@21:1/5 to All on Tue May 28 16:37:22 2024
    Anyway, the Emacs Lisp functions right-char (and, after testing, also left-char, forward-char, and backward-char) support the notion of
    character at least for some scripts. That may be the result of an interaction with the redisplay code that you mention later, but in
    that case it's that code that knows about characters in Unicode.

    Indeed, the concept is somewhat visible, but it's not really exposed in
    the language. I think what you're seeing is implemented elsewhere than
    in `forward-char`, it's a part of the interactive loop which sees that
    after `forward-char` you end up "in the middle" of a composition and it
    moves the point further, based on information that mostly belongs to the redisplay code.

    Try `C-u 2 C-f` and I suspect you'll see that it doesn't always advance
    by 2 characters but rather it advances by "2 code points + rounding up
    to the next character boundary".
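Stefan's description can be modeled in a few lines. This hypothetical Python sketch is not Emacs's implementation. It moves n code points and then rounds up past any combining marks, using the simple base-plus-marks approximation of a character boundary.

```python
import unicodedata

def is_mark(ch: str) -> bool:
    """True for combining marks (categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")

def forward_codepoints_then_round(s: str, pos: int, n: int) -> int:
    """Hypothetical model: move n code points forward, then keep moving
    while the new position sits on a combining mark, i.e. inside a
    base+marks cluster."""
    pos = min(pos + n, len(s))
    while pos < len(s) and is_mark(s[pos]):
        pos += 1
    return pos

telugu = "\u0c24\u0c46\u0c32\u0c41\u0c17\u0c41 \u0c32\u0c3f\u0c2a\u0c3f"
print(forward_codepoints_then_round(telugu, 0, 1))  # 2: one character (తె)
print(forward_codepoints_then_round(telugu, 0, 2))  # 2: still only one character
print(forward_codepoints_then_round(telugu, 0, 3))  # 4: two characters
```

With the Telugu sample from earlier in the thread, a count of 2 that starts at the first character moves only one character, because that character is two code points long.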


    Stefan

  • From Stefan Monnier@21:1/5 to All on Tue May 28 16:53:14 2024
    Um, so am I. Those nine code point things are supposed to display
    as a single little picture, regardless of what some other bit of
    the spec may assert about ZWJ.

    Maybe it's a good time to start taking bets for which will be the year
    that Unicode becomes Turing complete?


    Stefan

  • From Lawrence D'Oliveiro@21:1/5 to moi on Wed May 29 04:49:26 2024
    On Tue, 28 May 2024 15:43:25 +0100, moi wrote:

    On 28/05/2024 02:22, Lawrence D'Oliveiro wrote:

    On Mon, 27 May 2024 16:41:26 -0000 (UTC), John Levine wrote:

    I happen to have a copy of "Algol 60 Implementation" published in 1963
    which describes the KDF9 Algol compiler in considerable detail. They
    considered the translation of the Algol publication language to the
    5-bit paper tape code their computer used so trivial that they don't
    even describe it.

    Only 32 code symbols? It must have used shifts, à la Baudot code. It
    probably was Baudot code.

    It was Ferranti 5-channel paper tape code: <http://www.findlayw.plus.com/KDF9/The%20KDF9%20Character%20Codes.pdf>

    That doc says it’s a 6-bit code.

    By the way, don’t you hate sites that block user agents like wget?

  • From Anton Ertl@21:1/5 to Stefan Monnier on Wed May 29 06:59:55 2024
    Stefan Monnier <[email protected]> writes:
    Anyway, the Emacs Lisp functions right-char (and, after testing, also
    left-char, forward-char, and backward-char) support the notion of
    character at least for some scripts. That may be the result of an
    interaction with the redisplay code that you mention later, but in
    that case it's that code that knows about characters in Unicode.

    Indeed, the concept is somewhat visible, but it's not really exposed in
    the language. I think what you're seeing is implemented elsewhere than
    in `forward-char`, it's a part of the interactive loop which sees that
    after `forward-char` you end up "in the middle" of a composition and it
    moves the point further, based on information that mostly belongs to the redisplay code.

    Try `C-u 2 C-f` and I suspect you'll see that it doesn't always advance
    by 2 characters but rather it advances by "2 code points + rounding up
    to the next character boundary".

    Confirmed. So Emacs Lisp has a codepoint-oriented interface and then
    needs to compensate for that elsewhere. This does not indicate that a codepoint-oriented interface is a good idea, rather the opposite.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From moi@21:1/5 to Lawrence D'Oliveiro on Wed May 29 08:32:17 2024
    On 29/05/2024 05:49, Lawrence D'Oliveiro wrote:
    On Tue, 28 May 2024 15:43:25 +0100, moi wrote:

    On 28/05/2024 02:22, Lawrence D'Oliveiro wrote:

    On Mon, 27 May 2024 16:41:26 -0000 (UTC), John Levine wrote:

    I happen to have a copy of "Algol 60 Implementation" published in 1963 which describes the KDF9 Algol compiler in considerable detail. They
    considered the translation of the Algol publication language to the
    5-bit paper tape code their computer used so trivial that they don't
    even describe it.

    Only 32 code symbols? It must have used shifts, à la Baudot code. It
    probably was Baudot code.

    It was Ferranti 5-channel paper tape code:
    <http://www.findlayw.plus.com/KDF9/The%20KDF9%20Character%20Codes.pdf>

    That doc says it’s a 6-bit code.

    KDF9 characters are 6 bits.
    Ferranti paper tape characters are 5 bits.
    When dealing with the latter, the KDF9 paper tape reader
    sets the high bit of each input character to 1,
    and the paper tape punch discards the high bit.
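The reader and punch behaviour Bill describes amounts to a pair of bit operations. This minimal Python sketch models only the high-bit handling, not the Ferranti or KDF9 code tables, and the function names are illustrative.

```python
def tape_to_kdf9(tape_char: int) -> int:
    """Reader: a 5-bit paper-tape code becomes a 6-bit KDF9 character
    with its high bit (bit 5) set to 1."""
    if not 0 <= tape_char < 0b100000:
        raise ValueError("paper-tape characters are 5 bits")
    return tape_char | 0b100000

def kdf9_to_tape(kdf9_char: int) -> int:
    """Punch: keep the low 5 bits of a 6-bit KDF9 character and discard
    the high bit."""
    if not 0 <= kdf9_char < 0b1000000:
        raise ValueError("KDF9 characters are 6 bits")
    return kdf9_char & 0b011111

# Reading and then punching reproduces every 5-bit code.
assert all(kdf9_to_tape(tape_to_kdf9(c)) == c for c in range(32))
```

The round trip from tape to KDF9 and back is lossless. The reverse direction is not, since punching discards whatever the high bit held.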

    By the way, don’t you hate sites that block user agents like wget?

    No.
    I hate user agents like wget, which is why I block them.

    --
    Bill F.

  • From Anton Ertl@21:1/5 to Lawrence D'Oliveiro on Wed May 29 08:07:50 2024
    Lawrence D'Oliveiro <[email protected]d> writes:
    On Mon, 20 May 2024 11:46:20 GMT, Anton Ertl wrote:

    Algol 60 does not standardize a program representation in characters (a
    grave mistake fixed by most later programming languages ...

    That would likely not have been considered feasible in 1960, given the
    wide variation in character sets between computer systems.

    COBOL did it. LISP did it. It was feasible in 1960. It's just that
    the Algol 60 committee did not want to go there. And the Algol 68
    committee did not want to go there even though ASCII was standardized
    in 1963, and Algol 68 was only finished in 1974 AFAIK.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From Anton Ertl@21:1/5 to Lawrence D'Oliveiro on Wed May 29 08:20:03 2024
    Lawrence D'Oliveiro <[email protected]d> writes:
    On Mon, 27 May 2024 06:20:33 GMT, Anton Ertl wrote:
    In UTF-32 a character is a sequence of code points. In UTF-8 it is a
    sequence of code units.

    UTF-8 is a sequence of bytes encoding code points.

    Yes, but it is even rarer that code points are needed than that
    characters are needed. Another, better way of stating this is:

    In UTF-32 a character is a sequence of (32-bit) code units.
    In UTF-8 a character is a sequence of (8-bit) code units.

    Given that the data is present in files in UTF-8 form, any conversion
    to and from UTF-32 is just an unnecessary complication.
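    To illustrate the restatement above (a sketch, not from the thread; Python is used only for brevity): one user-perceived character, "é" written as "e" plus U+0301 COMBINING ACUTE ACCENT, is two code points. That makes it two 32-bit code units in UTF-32 and three 8-bit code units in UTF-8, so neither encoding gives "one unit per character".

```python
# One user-perceived character: "e" followed by U+0301 COMBINING ACUTE ACCENT.
ch = "e\u0301"

utf8 = ch.encode("utf-8")        # 8-bit code units
utf32 = ch.encode("utf-32-le")   # 32-bit code units (little-endian, no BOM)

print(len(ch))          # code points: 2
print(len(utf8))        # UTF-8 code units: 1 + 2 = 3
print(len(utf32) // 4)  # UTF-32 code units: 2
```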

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From Stefan Monnier@21:1/5 to All on Wed May 29 10:44:21 2024
    Confirmed. So Emacs Lisp has a codepoint-oriented interface and then
    needs to compensate for that elsewhere. This does not indicate that a codepoint-oriented interface is a good idea, rather the opposite.

    Note that the "round to the next character boundary" is actually
    generalized to non-Unicode concepts: you can mark a chunk of text as
    being "intangible" or make it invisible and the "round up" will
    correspondingly move to the next boundary to avoid the cursor being in
    the middle of an invisible or intangible chunk of text.
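    With UTF-8 storage, the basic Unicode part of this rounding is cheap because UTF-8 is self-synchronizing: continuation bytes always have the bit pattern 10xxxxxx. Below is a minimal Python sketch that assumes a buffer of valid UTF-8 and positions given as byte offsets. It is not Emacs's implementation, and it does not model the intangible/invisible text properties mentioned above.

```python
def round_up_to_boundary(buf: bytes, pos: int) -> int:
    """Move pos forward until it is not inside a UTF-8 multi-byte sequence.

    A byte offset is a code point boundary iff it is at the end of the
    buffer or the byte there is not a continuation byte (10xxxxxx).
    """
    while pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


buf = "a≤b".encode("utf-8")   # b'a\xe2\x89\xa4b'; "≤" occupies bytes 1..3
```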

    I'm not sure the codepoint-oriented API is the best option, but it's not completely clear what *is* the best option. You mention a byte-oriented
    API and you might be right that it's a better option, but in the case of
    Emacs that's what we used in Emacs-20.1 but it worked really poorly
    because of backward compatibility issues. I think if we started from
    scratch now (i.e. without having to contend with backward compatibility,
    and with a better understanding of Unicode (which barely existed back
    then)) it might work better, indeed, but that's not been an option 🙁


    Stefan

  • From Lawrence D'Oliveiro@21:1/5 to Anton Ertl on Thu May 30 02:50:33 2024
    On Wed, 29 May 2024 08:07:50 GMT, Anton Ertl wrote:

    Lawrence D'Oliveiro <[email protected]d> writes:

    On Mon, 20 May 2024 11:46:20 GMT, Anton Ertl wrote:

    Algol 60 does not standardize a program representation in characters
    (a grave mistake fixed by most later programming languages ...

    That would likely not have been considered feasible in 1960, given the
    wide variation in character sets between computer systems.

    COBOL did it. LISP did it.

    And so did Fortran. They all did it by severely curtailing their allowed character sets.

    It's just that the Algol 60 committee did not want to go there.

    They wanted symbols like “÷”, “×”, “↑”, “≤”, “≥”, “≠”, “≡”, “⊃”, “∨”, “∧”,
    “¬” ... you get the idea. I don’t think any computer system on earth could provide all those symbols at the time, or even, say, 20 years later.

  • From Lawrence D'Oliveiro@21:1/5 to Anton Ertl on Thu May 30 02:53:28 2024
    On Wed, 29 May 2024 08:20:03 GMT, Anton Ertl wrote:

    In UTF-32 a character is a sequence of (32-bit) code units.
    In UTF-8 a character is a sequence of (8-bit) code units.

    The point being, there is a 1:1 correspondence between the two
    representations of the same characters/code points. So your claim that use
    of one is somehow a “mistake” while the other is not, is spurious.
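    The 1:1 correspondence is easy to check: converting valid text between the two encodings round-trips losslessly. Both sides of this exchange accept that; the disagreement is over whether performing the conversion buys anything. Here is a small Python sketch (the helper names are made up for illustration):

```python
def utf8_to_utf32(data: bytes) -> bytes:
    """Convert UTF-8 bytes to little-endian UTF-32 (no BOM)."""
    return data.decode("utf-8").encode("utf-32-le")


def utf32_to_utf8(data: bytes) -> bytes:
    """Convert little-endian UTF-32 (no BOM) back to UTF-8 bytes."""
    return data.decode("utf-32-le").encode("utf-8")


sample = "Algol ≤ ≥ ≠ 🙁".encode("utf-8")
assert utf32_to_utf8(utf8_to_utf32(sample)) == sample   # lossless round trip
```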

  • From Lawrence D'Oliveiro@21:1/5 to moi on Thu May 30 02:43:39 2024
    On Wed, 29 May 2024 08:32:17 +0100, moi wrote:

    I hate user agents like wget, which is why I block them.

    Which is completely futile, which is why it’s so stupid to do.

  • From John Levine@21:1/5 to All on Thu May 30 03:25:14 2024
    According to Lawrence D'Oliveiro <[email protected]d>:
    On Wed, 29 May 2024 07:04:35 GMT, Anton Ertl wrote:

    Lawrence D'Oliveiro <[email protected]d> writes:

    Isn’t the point of RISC that these complex operations are
    more efficiently performed by a sequence of simpler instructions?

    The IBM z series are not RISCs.

    Doesn’t matter. The principles of designing high-performance architectures still apply: simpler instructions are better than more complex ones.

    Nobody buys a mainframe just for its compute speed.

    I do not entirely understand why IBM keeps adding special purpose
    instructions to z. Maybe it's partly marketing, but they have a
    largely captive audience so it has to be more than that. Given the
    millicode design, a lot of the instructions are basically microcoded subroutines that may well run faster than the normal code equivalent
    because they have access to more machine state. If anyone is about to
    say "then let all the instructions see all the state", see our
    discussion a week or two ago about architecture vs. implementation.

    If you want something that gives you more MIPS/$, IBM is happy to sell
    you POWER systems.

    --
    Regards,
    John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Stephen Fuld@21:1/5 to John Levine on Thu May 30 03:29:28 2024
    John Levine wrote:

    According to Lawrence D'Oliveiro <[email protected]d>:
    On Wed, 29 May 2024 07:04:35 GMT, Anton Ertl wrote:

    Lawrence D'Oliveiro <[email protected]d> writes:

    Isn’t the point of RISC that these complex operations are
    more efficiently performed by a sequence of simpler
    instructions?

    The IBM z series are not RISCs.

    Doesn’t matter. The principles of designing high-performance architectures still apply: simpler instructions are better than
    more complex ones.

    Nobody buys a mainframe just for its compute speed.

    I do not entirely understand why IBM keeps adding special purpose instructions to z. Maybe it's partly marketing, but they have a
    largely captive audience so it has to be more than that. Given the
    millicode design, a lot of the instructions are basically microcoded subroutines that may well run faster than the normal code equivalent
    because they have access to more machine state. If anyone is about to
    say "then let all the instructions see all the state", see our
    discussion a week or two ago about architecture vs. implementation.


    Thanks John. Your post and my previous one "crossed in the night". I
    think you answered my question.




    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

  • From Stephen Fuld@21:1/5 to Lawrence D'Oliveiro on Thu May 30 03:21:13 2024
    Lawrence D'Oliveiro wrote:


    snip

    They wanted symbols like “÷”, “×”, “↑”, “≤”, “≥”, “≠”, “≡”, “⊃”, “∨”,
    “∧”, “¬” ... you get the idea. I don’t think any computer system on earth
    could provide all those symbols at the time, or even, say, 20 years
    later.

    See APL. So many symbols that the language is almost impossible to
    read without a significant investment in learning them.

    https://en.wikipedia.org/wiki/APL_syntax_and_symbols#Monadic_functions


    Please note that I am not advocating this. It is at the opposite end
    of the spectrum from COBOL where you could get by with no special
    characters beyond periods. Neither was a good choice.



    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

  • From Tim Rentsch@21:1/5 to Stephen Fuld on Wed May 29 21:47:52 2024
    "Stephen Fuld" <[email protected]d> writes:

    Lawrence D'Oliveiro wrote:


    snip

    They wanted symbols like [...]

    See APL. So many symbols that the language is almost impossible to
    read without a significant investment in learning them.

    https://en.wikipedia.org/wiki/APL_syntax_and_symbols#Monadic_functions

    The problem with learning APL is not the character set. APL without
    any special characters (which I actually have some experience using)
    is still unlike any other programming language that existed in the
    1960s or 1970s.

  • From Stephen Fuld@21:1/5 to Tim Rentsch on Thu May 30 06:12:11 2024
    Tim Rentsch wrote:

    "Stephen Fuld" <[email protected]d> writes:

    Lawrence D'Oliveiro wrote:


    snip

    They wanted symbols like [...]

    See APL. So many symbols that the language is almost impossible to
    read without a significant investment in learning them.


    https://en.wikipedia.org/wiki/APL_syntax_and_symbols#Monadic_functions

    The problem with learning APL is not the character set. APL without
    any special characters (which I actually have some experience using)
    is still unlike any other programming language that existed in the
    1960s or 1970s.

    OK, but my main point was to show, by counter example, the error of
    Lawrence's statement quoted below


    I don’t think any computer system on earth could
    provide all those symbols at the time, or even, say, 20 years later.

    If the part about the difficulty of learning APL was wrong, then I
    apologise.




    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

  • From Tim Rentsch@21:1/5 to Stephen Fuld on Thu May 30 05:38:00 2024
    "Stephen Fuld" <[email protected]d> writes:

    Tim Rentsch wrote:

    "Stephen Fuld" <[email protected]d> writes:

    Lawrence D'Oliveiro wrote:

    snip

    They wanted symbols like [...]

    See APL. So many symbols that the language is almost impossible to
    read without a significant investment in learning them.

    https://en.wikipedia.org/wiki/APL_syntax_and_symbols#Monadic_functions

    The problem with learning APL is not the character set. APL without
    any special characters (which I actually have some experience using)
    is still unlike any other programming language that existed in the
    1960s or 1970s.

    OK, but my main point was to show, by counter example, the error of Lawrence's statement quoted below

    I see. I misunderstood the point of what you were saying. Sorry
    about that.

    I don't think any computer system on earth could provide all those
    symbols at the time, or even, say, 20 years later.

    If the part about the difficulty of learning APL was wrong, then I
    apologise.

    No apology needed. Even if the APL character set wasn't the main
    source of the difficulty, there is no question that the unusual
    choice of operator characters used contributed to the effort needed
    to understand and use APL.

  • From Anton Ertl@21:1/5 to John Levine on Thu May 30 12:27:17 2024
    John Levine <[email protected]> writes:
    According to Lawrence D'Oliveiro <[email protected]d>:
    On Wed, 29 May 2024 07:04:35 GMT, Anton Ertl wrote:

    Lawrence D'Oliveiro <[email protected]d> writes:

    Isn’t the point of RISC that these complex operations are
    more efficiently performed by a sequence of simpler instructions?

    The IBM z series are not RISCs.

    Doesn’t matter. The principles of designing high-performance architectures still apply: simpler instructions are better than more complex ones.

    Nobody buys a mainframe just for its compute speed.

    I do not entirely understand why IBM keeps adding special purpose instructions to z. Maybe it's partly marketing, but they have a
    largely captive audience so it has to be more than that.

    It's still marketing. I have listened to several talks about
    converting S/360 programs to C code that can be run on arbitrary
    hardware, and IBM's audience hears about such things, too, so IBM's
    sales force has to provide reasons for not jumping ship. And all
    these new features that sound like they are useful are such reasons.
    Things like decimal FP and CU14.

    The fact that these features provide no actual benefit is their best
    property: When Intel and ARM evaluate whether they should implement
    these features in their architectures, they find that the benefits of
    these features do not justify their costs, so they refrain from adding
    them to their architectures, preserving the marketing value of the
    feature to IBM.

    Given the
    millicode design, a lot of the instructions are basically microcoded subroutines that may well run faster than the normal code equivalent
    because they have access to more machine state.

    Maybe IBM adds a microarchitectural stream buffer to allow efficient implementation of CU14, but I doubt it. The marketing value of CU14
    is there whether there is such a stream buffer or not, so why go to
    the expense. If they already have such a stream buffer for other
    features, they might as well use it, though.

    Maybe they internally do the SIMDified RISCy variant I outlined, and
    then have a microcode loop. The SIMDified RISCy variant should be
    cheap enough to implement.

    Or maybe they just have a microcode routine that does what a C program
    would do. In that case there is no performance benefit to having a
    separate instruction, but the marketing benefit is still there.
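    For reference, here is roughly what such a software routine does: a scalar UTF-8 to UTF-32 decoding loop that validates as it goes. It is written in Python for clarity rather than speed, and it claims nothing about how IBM's millicode actually implements CU14 or about the SIMD variant mentioned above.

```python
def utf8_to_codepoints(buf: bytes) -> list[int]:
    """Decode UTF-8 into a list of code points (i.e. UTF-32 code units).

    Raises ValueError on malformed input: bad lead or continuation bytes,
    truncated sequences, overlong encodings, surrogates, or values
    above U+10FFFF.
    """
    out = []
    i, n = 0, len(buf)
    while i < n:
        b0 = buf[i]
        if b0 < 0x80:                    # 0xxxxxxx: ASCII, one byte
            out.append(b0)
            i += 1
            continue
        if 0xC2 <= b0 <= 0xDF:           # 110xxxxx (0xC0/0xC1 are always overlong)
            length, cp, minimum = 2, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:         # 1110xxxx
            length, cp, minimum = 3, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF4:         # 11110xxx, capped near U+10FFFF
            length, cp, minimum = 4, b0 & 0x07, 0x10000
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        for b in buf[i + 1:i + length]:
            if (b & 0xC0) != 0x80:       # continuation bytes must be 10xxxxxx
                raise ValueError(f"invalid continuation byte near offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        if cp < minimum or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point at offset {i}")
        out.append(cp)
        i += length
    return out
```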

    If you want something that gives you more MIPS/$, IBM is happy to sell
    you POWER systems.

    If you want something that gives you more MIPS (as well as more
    MIPS/$), lots of companies will be happy to sell you gear with AMD or
    Intel CPUs.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From Anton Ertl@21:1/5 to Lawrence D'Oliveiro on Thu May 30 12:47:35 2024
    Lawrence D'Oliveiro <[email protected]d> writes:
    On Wed, 29 May 2024 08:20:03 GMT, Anton Ertl wrote:

    In UTF-32 a character is a sequence of (32-bit) code units.
    In UTF-8 a character is a sequence of (8-bit) code units.

    The point being, there is a 1:1 correspondence between the two representations of the same characters/code points. So your claim that use
    of one is somehow a “mistake” while the other is not, is spurious.

    If the data you are working on is provided in files containing UTF-8, conversion to UTF-32 does not provide any benefits and is therefore an unnecessary complication, and therefore a mistake.
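    The "opaque string" style argued for here can be sketched in a few lines of Python: searching and splitting work directly on UTF-8 bytes with no conversion to UTF-32. This is safe because a valid UTF-8 needle can only match a valid UTF-8 haystack at a code point boundary, and ASCII bytes never occur inside multi-byte sequences. The sample strings are made up.

```python
haystack = "Größe: 10 ≤ x ≠ 20".encode("utf-8")
needle = "≠".encode("utf-8")

pos = haystack.find(needle)   # byte offset; no decoding needed
assert haystack[pos:pos + len(needle)].decode("utf-8") == "≠"

line = "Ertl,Wien,Österreich".encode("utf-8")
fields = line.split(b",")     # ',' (0x2C) never appears inside a multi-byte sequence
assert [f.decode("utf-8") for f in fields] == ["Ertl", "Wien", "Österreich"]
```

    Decoding is needed only at the edges, for example when a field must be displayed or measured in characters.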

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From Scott Lurndal@21:1/5 to Thomas Koenig on Thu May 30 14:08:04 2024
    Thomas Koenig <[email protected]> writes:
    Anton Ertl <[email protected]> schrieb:

    It's still marketing. I have listened to several talks about
    converting S/360 programs to C code that can be run on arbitrary
    hardware, and IBM's audience hears about such things, too, so IBM's
    sales force has to provide reasons for not jumping ship. And all
    these new features that sound like they are useful are such reasons.
    Things like decimal FP and CU14.

    The fact that these features provide no actual benefit is their best
    property:

    No actual benefit?

    If you make such a strong statement, I assume that you have done a
    thorough analysis of this feature for typical mainframe workloads
    and can support your claims with benchmarks.

    Note that the feature was introduced in Znext (2012). That it is
    still there must indicate that it gets some usage.

  • From Michael S@21:1/5 to Scott Lurndal on Thu May 30 17:12:22 2024
    On Thu, 30 May 2024 14:08:04 GMT
    [email protected] (Scott Lurndal) wrote:

    Thomas Koenig <[email protected]> writes:
    Anton Ertl <[email protected]> schrieb:

    It's still marketing. I have listened to several talks about
    converting S/360 programs to C code that can be run on arbitrary
    hardware, and IBM's audience hears about such things, too, so IBM's
    sales force has to provide reasons for not jumping ship. And all
    these new features that sound like they are useful are such
    reasons. Things like decimal FP and CU14.

    The fact that these features provide no actual benefit is their best
    property:

    No actual benefit?

    If you make such a strong statement, I assume that you have done a
    thorough analysis of this feature for typical mainframe workloads
    and can support your claims with benchmarks.

    Note that the feature was introduced in Znext (2012). That it is
    still there must indicate that it gets some usage.


    Not necessarily.
    After a feature has been given a publicly documented opcode, it's very
    hard to remove it.
    Naturally, I don't know if this particular feature got a publicly
    documented opcode, and I don't know where to look.

  • From Thomas Koenig@21:1/5 to Anton Ertl on Thu May 30 13:41:58 2024
    Anton Ertl <[email protected]> schrieb:

    It's still marketing. I have listened to several talks about
    converting S/360 programs to C code that can be run on arbitrary
    hardware, and IBM's audience hears about such things, too, so IBM's
    sales force has to provide reasons for not jumping ship. And all
    these new features that sound like they are useful are such reasons.
    Things like decimal FP and CU14.

    The fact that these features provide no actual benefit is their best
    property:

    No actual benefit?

    If you make such a strong statement, I assume that you have done a
    thorough analysis of this feature for typical mainframe workloads
    and can support your claims with benchmarks.

    Care to show exactly what you did, and what the results were?

  • From Thomas Koenig@21:1/5 to Michael S on Thu May 30 14:53:28 2024
    Michael S <[email protected]> schrieb:


    Naturally, I don't know if this particular feature got a publicly
    documented opcode, and I don't know where to look.

    Search for the famed "Principles of Operation" for z/Architecture.

  • From Anton Ertl@21:1/5 to Michael S on Thu May 30 15:28:51 2024
    Michael S <[email protected]> writes:
    On Thu, 30 May 2024 14:08:04 GMT
    [email protected] (Scott Lurndal) wrote:
    Note that the feature was introduced in Znext (2012). That it is
    still there must indicate that it gets some usage.


    Not necessarily.
    After a feature has been given a publicly documented opcode, it's very
    hard to remove it.

    Even if this reason did not exist, the marketing reason for having
    this instruction still exists, so why should they remove it?

    Naturally, I don't know if this particular feature got a publicly
    documented opcode, and I don't know where to look.

    These instructions have public opcodes, and I gave a URL and page
    number in <[email protected]>.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From John Levine@21:1/5 to All on Thu May 30 15:49:24 2024
    According to Anton Ertl <[email protected]>:
    Concerning benchmarks, last I heard IBM forbids benchmarking z
    hardware. Until they change this, I'll assume their z hardware is
    abysmally slow and any benchmarking would result in embarrassment, IBM
    knows this and that's why they forbid benchmarking.

    My guess is that it's not so much that it's slow but that, even more
    than usual, benchmarks show what you want them to show. For example,
    if you benchmark a portable version of gzip or heapsort it will look
    worse than one that knows to use the accelerator instructions.


    --
    Regards,
    John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Thomas Koenig@21:1/5 to Anton Ertl on Thu May 30 15:55:59 2024
    Anton Ertl <[email protected]> schrieb:
    Thomas Koenig <[email protected]> writes:
    Anton Ertl <[email protected]> schrieb:

    It's still marketing. I have listened to several talks about
    converting S/360 programs to C code that can be run on arbitrary
    hardware, and IBM's audience hears about such things, too, so IBM's
    sales force has to provide reasons for not jumping ship. And all
    these new features that sound like they are useful are such reasons.
    Things like decimal FP and CU14.

    The fact that these features provide no actual benefit is their best
    property:

    No actual benefit?

    If you make such a strong statement, I assume that you have done a
    thorough analysis of this feature for typical mainframe workloads
    and can support your claims with benchmarks.

    Care to show exactly what you did, and what the results were?

    It provides no actual benefit, because UTF-32 provides no actual
    benefit.

    In other words, you didn't.

    Thanks for the explanation.

  • From Anton Ertl@21:1/5 to Thomas Koenig on Thu May 30 15:04:35 2024
    Thomas Koenig <[email protected]> writes:
    Anton Ertl <[email protected]> schrieb:

    It's still marketing. I have listened to several talks about
    converting S/360 programs to C code that can be run on arbitrary
    hardware, and IBM's audience hears about such things, too, so IBM's
    sales force has to provide reasons for not jumping ship. And all
    these new features that sound like they are useful are such reasons.
    Things like decimal FP and CU14.

    The fact that these features provide no actual benefit is their best
    property:

    No actual benefit?

    If you make such a strong statement, I assume that you have done a
    thorough analysis of this feature for typical mainframe workloads
    and can support your claims with benchmarks.

    Care to show exactly what you did, and what the results were?

    It provides no actual benefit, because UTF-32 provides no actual
    benefit. In nearly all code you don't need code points. Dealing with
    data as mostly opaque strings in UTF-8 is less complicated *and* more
    efficient than converting them to UTF-32, working with UTF-32 strings,
    and converting back (even if the conversion was very cheap).

    Of course there are API mistakes (like Python3) that lead to some
    usage of UTF-32, but even on Intel and AMD CPUs where Python3 code
    probably consumes more cycles than on other hardware, that usage has
    not been enough to add instructions like CU14.

    IBM z also has CU12 and CU21 (for converting between UTF-8 and
    UTF-16), and such instructions could see some usage in environments
    where UTF-16 is big, such as Java, JavaScript, and Windows, but even
    in CPUs by Intel and AMD (with lots of Windows and JavaScript) and ARM (Android, i.e., Java) this has not led to instructions for converting
    between UTF-8 and UTF-16.
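    For readers unfamiliar with the UTF-16 side of CU12/CU21: code points above U+FFFF become a surrogate pair, which is the main extra work such a conversion does beyond UTF-8 decoding. Here is a minimal Python sketch of the encoding step. It shows the Unicode rule, not the semantics of IBM's instructions.

```python
def utf16_code_units(cp: int) -> list[int]:
    """Encode one code point as UTF-16 code units (surrogate pair above U+FFFF)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x10000:
        return [cp]                          # BMP: one 16-bit unit
    v = cp - 0x10000                         # 20 bits remain
    return [0xD800 | (v >> 10),              # high surrogate: top 10 bits
            0xDC00 | (v & 0x3FF)]            # low surrogate: bottom 10 bits


assert utf16_code_units(0x1F641) == [0xD83D, 0xDE41]   # U+1F641 SLIGHTLY FROWNING FACE
```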

    Concerning benchmarks, last I heard IBM forbids benchmarking z
    hardware. Until they change this, I'll assume their z hardware is
    abysmally slow and any benchmarking would result in embarrassment, IBM
    knows this and that's why they forbid benchmarking.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From Anton Ertl@21:1/5 to Stefan Monnier on Thu May 30 16:25:46 2024
    Stefan Monnier <[email protected]> writes:
    I'm not sure the codepoint-oriented API is the best option, but it's not completely clear what *is* the best option. You mention a byte-oriented
    API and you might be right that it's a better option, but in the case of Emacs that's what we used in Emacs-20.1 but it worked really poorly
    because of backward compatibility issues. I think if we started from
    scratch now (i.e. without having to contend with backward compatibility,
    and with a better understanding of Unicode (which barely existed back
    then)) it might work better, indeed, but that's not been an option

    Plus, editors are among the very few uses where you have to deal with individual characters, so the "treat it as opaque string" approach
    that works so well for most other code is not good enough there. The command-line editor of Gforth is one case where we use the xchar words
    (those for dealing with code points of UTF-8).
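    The operation an editor needs most often is "move the cursor one character left or right" within a UTF-8 buffer. Below is a minimal Python sketch that assumes valid UTF-8 and treats "character" as "code point". Grapheme clusters such as "e" plus a combining accent would need more work. The function names are hypothetical and are not Gforth's xchar words.

```python
def next_char(buf: bytes, pos: int) -> int:
    """Byte offset of the code point after the one starting at pos."""
    pos += 1
    while pos < len(buf) and (buf[pos] & 0xC0) == 0x80:   # skip 10xxxxxx bytes
        pos += 1
    return pos


def prev_char(buf: bytes, pos: int) -> int:
    """Byte offset of the code point before pos (pos must be a boundary > 0)."""
    pos -= 1
    while pos > 0 and (buf[pos] & 0xC0) == 0x80:          # back up over 10xxxxxx
        pos -= 1
    return pos


buf = "a≤🙁b".encode("utf-8")   # offsets: a=0, ≤=1..3, 🙁=4..7, b=8, end=9
```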

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From Stefan Monnier@21:1/5 to All on Thu May 30 14:01:53 2024
    The problem with learning APL is not the character set. APL without
    any special characters (which I actually have some experience using)
    is still unlike any other programming language that existed in the
    1960s or 1970s.

    There have been a few languages that took similar approaches, but the
    most recent and successful I've heard of is [jq](https://en.wikipedia.org/wiki/Jq_%28programming_language%29).


    Stefan

  • From moi@21:1/5 to Lawrence D'Oliveiro on Thu May 30 19:01:11 2024
    On 30/05/2024 03:43, Lawrence D'Oliveiro wrote:
    On Wed, 29 May 2024 08:32:17 +0100, moi wrote:

    I hate user agents like wget, which is why I block them.

    Which is completely futile, which is why it’s so stupid to do.

    What a know-all you are. And offensive with it.

    --
    Bill F.

  • From John Savard@21:1/5 to [email protected] on Thu May 30 22:22:34 2024
    On Thu, 30 May 2024 02:50:33 -0000 (UTC), Lawrence D'Oliveiro
    <[email protected]d> wrote:

    And so did Fortran. They all did it by severely curtailing their allowed character sets.

    It's just that the Algol 60 committee did not want to go there.

    They wanted symbols like “÷”, “×”, “↑”, “≤”, “≥”, “≠”, “≡”, “⊃”, “∨”, “∧”, “¬” ... you get the idea. I don’t think any computer system on earth could
    provide all those symbols at the time, or even, say, 20 years later.

    Well, the 120-character chain for the STRETCH computer's printer
    handled Algol's character set. And so did the punched card code for a
    couple of Russian computers. So the attempt was made.

    And then there was the LISP machine, with its infamous "Space Cadet"
    keyboard.

    Today, of course, we have Unicode, but that doesn't mean the entire
    Algol character set is conveniently accessible directly from the
    keyboard.

    John Savard

  • From John Savard@21:1/5 to Anton Ertl on Thu May 30 22:19:14 2024
    On Wed, 29 May 2024 08:07:50 GMT, [email protected]
    (Anton Ertl) wrote:

    Lawrence D'Oliveiro <[email protected]d> writes:
    On Mon, 20 May 2024 11:46:20 GMT, Anton Ertl wrote:

    Algol 60 does not standardize a program representation in characters (a
    grave mistake fixed by most later programming languages ...

    That would likely not have been considered feasible in 1960, given the
    wide variation in character sets between computer systems.

    COBOL did it. LISP did it. It was feasible in 1960. It's just that
    the Algol 60 committee did not want to go there.

    There was a famous article by Bob Bemer in 1960 in the Communications
    of the ACM in which he gave a table of all this variation in character
    sets between computers. This helped spur the adoption of ASCII.

    Algol 60 was intended as an international algorithmic language. In
    fact, Algol was first called IAL (International Algebraic Language),
    hence JOVIAL ("Jules' Own Version of the International Algebraic
    Language"). So it is _not_ particularly hard for me to believe that the international committee
    behind Algol 60 wished to support a wider variety of computers than
    the people behind COBOL and LISP. Yes, those languages, unlike
    FORTRAN, weren't the creations of a single manufacturer.

    But they _were_ fairly U.S.-centric, and Algol was *not*. For
    example, there were British computer systems that offered Algol
    compilers that based their character sets on modified 5-unit
    teleprinters.

    John Savard

  • From John Savard@21:1/5 to [email protected] on Thu May 30 22:25:47 2024
    On Thu, 30 May 2024 06:12:11 -0000 (UTC), "Stephen Fuld" <[email protected]d> wrote:

    If the part about the difficulty of learning APL was wrong, then I
    apologise.

    I would not say that it was wrong. APL "without special characters"
    was achieved by way of a transliteration scheme, where short codes
    represented the special characters. So instead of memorizing funny
    shapes, you memorized cryptic abbreviations.

    So the character set was _still_ the source of the difficulty of
    learning APL even if you happened to be using an implementation that
    didn't have any special characters.

    John Savard

  • From John Savard@21:1/5 to [email protected] on Thu May 30 22:31:41 2024
    On Thu, 30 May 2024 03:25:14 -0000 (UTC), John Levine
    <[email protected]> wrote:

    I do not entirely understand why IBM keeps adding special purpose instructions to z. Maybe it's partly marketing, but they have a
    largely captive audience so it has to be more than that.

    One possibility is to _keep_ that audience captive even after all the
    patents expire that are applicable to machines with the z/Architecture
    in its current state, if you are reluctant to believe that these new instructions genuinely improve performance.

    John Savard

  • From Michael S@21:1/5 to John Savard on Fri May 31 12:59:42 2024
    On Thu, 30 May 2024 22:19:14 -0600
    John Savard <[email protected]d> wrote:

    But they _were_ fairly U.S. - centric, and Algol was *not*. For
    example,

    U.S.-centric vs U.S. eccentric. http://www.cs.yale.edu/homes/perlis-alan/quotes.html

    Actually I am pretty sure that "eccentric" is not a fair
    characterisation of his personality, but can't resist.

  • From Terje Mathisen@21:1/5 to Thomas Koenig on Fri May 31 14:23:36 2024
    Thomas Koenig wrote:
    Anton Ertl <[email protected]> schrieb:

    It's still marketing. I have listened to several talks about
    converting S/360 programs to C code that can be run on arbitrary
    hardware, and IBM's audience hears about such things, too, so IBM's
    sales force has to provide reasons for not jumping ship. And all
    these new features that sound like they are useful are such reasons.
    Things like decimal FP and CU14.

    The fact that these features provide no actual benefit is their best
    property:

    No actual benefit?

    If you make such a strong statement, I assume that you have done a
    thorough analysis of this feature for typical mainframe workloads
    and can support your claims with benchmarks.

    Care to show exactly what you did, and what the results were?

    I am pretty sure Anton is correct, at least for data residing in RAM,
    since any reasonably efficient sw algorithm to do the same thing should
    be able to keep up with memory bandwidth, right?

    If the data is already in cache, then you have presumably already
    converted to whatever format you need to use internally while loading.

    It is only when working with smallish blocks (up to a few kB of data,
    fitting in the L1 cache) and needing to run some temporary operation on
    decoded codepoints that this would be a significant win.
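    For concreteness, here is roughly what a software equivalent of CU14
    (UTF-8 to UTF-32) looks like. It is a minimal C sketch, not IBM's
    algorithm and not a tuned implementation. The function name
    utf8_to_utf32 and the error policy (each malformed byte becomes U+FFFD
    and decoding resumes at the next byte) are assumptions made for this
    illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software counterpart of CU14: decode UTF-8 src[0..len)
   into UTF-32 code points in dst.  dst needs room for len entries
   (worst case: one code point per input byte).  Returns the number of
   code points written.  Malformed input (invalid lead byte, missing
   continuation byte, overlong form, surrogate, or a value above
   U+10FFFF) yields U+FFFD and decoding resumes at the next byte. */
size_t utf8_to_utf32(const unsigned char *src, size_t len, uint32_t *dst)
{
    size_t i = 0, n = 0;
    while (i < len) {
        uint32_t c = src[i];
        size_t need = 0;        /* number of continuation bytes */
        uint32_t min = 0;       /* smallest code point legal for this length */

        if (c < 0x80) {                       /* ASCII: one byte */
            dst[n++] = c;
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; min = 0x80;    c &= 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; min = 0x800;   c &= 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; min = 0x10000; c &= 0x07;
        } else {                              /* stray continuation or 0xF8..0xFF */
            dst[n++] = 0xFFFD;
            i++;
            continue;
        }

        int ok = i + need < len;              /* all continuation bytes present? */
        for (size_t k = 1; ok && k <= need; k++) {
            if ((src[i + k] & 0xC0) != 0x80)
                ok = 0;
            else
                c = (c << 6) | (src[i + k] & 0x3F);
        }
        if (ok && (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)))
            ok = 0;

        if (ok) {
            dst[n++] = c;
            i += need + 1;
        } else {
            dst[n++] = 0xFFFD;
            i++;
        }
    }
    return n;
}
```

    The ASCII case costs one test per byte. Whether a loop like this keeps
    up with memory bandwidth on a given machine is exactly what is being
    argued here, and the sketch does not settle it.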

    Terje

    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"

  • From Tim Rentsch@21:1/5 to John Savard on Fri May 31 09:47:58 2024
    John Savard <[email protected]d> writes:

    On Thu, 30 May 2024 06:12:11 -0000 (UTC), "Stephen Fuld" <[email protected]d> wrote:

    If the part about the difficulty of learning APL was wrong, then I
    apologise.

    I would not say that it was wrong. APL "without special characters"
    was achieved by way of a transliteration scheme, where short codes represented the special characters. So instead of memorizing funny
    shapes, you memorized cryptic abbreviations.

    So the character set was _still_ the source of the difficulty of
    learning APL even if you happened to be using an implementation that
    didn't have any special characters.

    The character set was a source of some of the difficulty of
    learning APL. Certainly not all of it.

  • From MitchAlsup1@21:1/5 to BGB on Fri May 31 17:21:53 2024
    BGB wrote:

    On 5/30/2024 11:25 AM, Anton Ertl wrote:
    Stefan Monnier <[email protected]> writes:
    I'm not sure the codepoint-oriented API is the best option, but it's
    not completely clear what *is* the best option. You mention a
    byte-oriented API and you might be right that it's a better option,
    but in the case of Emacs that's what we used in Emacs-20.1 but it
    worked really poorly because of backward compatibility issues. I think
    if we started from scratch now (i.e. without having to contend with
    backward compatibility, and with a better understanding of Unicode
    (which barely existed back then)) it might work better, indeed, but
    that's not been an option

    Plus, editors are among the very few uses where you have to deal with
    individual characters, so the "treat it as opaque string" approach
    that works so well for most other code is not good enough there. The
    command-line editor of Gforth is one case where we use the xchar words
    (those for dealing with code points of UTF-8).
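    The code-point stepping a line editor needs can be done directly on
    UTF-8 bytes, because continuation bytes are always recognizable
    (binary 10xxxxxx). Below is a minimal C sketch that assumes valid
    UTF-8. The names utf8_next and utf8_prev are hypothetical, and this is
    not Gforth's xchar implementation.

```c
#include <stddef.h>

/* Nonzero for UTF-8 continuation bytes (binary 10xxxxxx). */
static int is_cont(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Byte offset of the code point following the one that starts at pos
   (e.g. "cursor right").  Returns len at the end of the text. */
size_t utf8_next(const unsigned char *s, size_t len, size_t pos)
{
    if (pos >= len)
        return len;
    pos++;
    while (pos < len && is_cont(s[pos]))
        pos++;
    return pos;
}

/* Byte offset of the code point preceding pos (e.g. "cursor left"). */
size_t utf8_prev(const unsigned char *s, size_t pos)
{
    if (pos == 0)
        return 0;
    pos--;
    while (pos > 0 && is_cont(s[pos]))
        pos--;
    return pos;
}
```

    This steps by code point only. A combining accent after a base letter
    would still take its own cursor step.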


    Yeah.

    For text editors, this is one of the few cases where it makes sense to
    use 32- or 64-bit characters (say, combining the 'character' with some
    additional metadata such as formatting).

    Though, one thing that makes sense for text editors is if only the
    "currently being edited" lines are fully unpacked, whereas the others
    can remain in a more compact form (such as UTF-8), and are then
    unpacked as they come into view (say, treating the editor window as a
    32-entry modulo cache or similar).
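    One possible reading of the 32-entry modulo cache is sketched in C
    below. Decoded lines are kept in a direct-mapped table indexed by line
    number modulo 32, and a line is decoded from its compact UTF-8 form
    only on a miss. All names (cache_get, cache_invalidate, decode_utf8,
    decode_count) are hypothetical. To stay short, the decoder assumes
    well-formed UTF-8.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NSLOTS 32

struct slot {
    int valid;          /* zero-initialized, so the cache starts empty */
    long line;          /* which line (>= 0) this slot holds */
    uint32_t *cps;      /* decoded code points */
    size_t ncps;
};

static struct slot cache[NSLOTS];
static unsigned long decode_count;   /* counts misses, for observation */

/* Minimal decoder; ASSUMES well-formed UTF-8 (no validation). */
static size_t decode_utf8(const unsigned char *s, size_t len, uint32_t *out)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char b = s[i];
        int extra = b < 0x80 ? 0 : b < 0xE0 ? 1 : b < 0xF0 ? 2 : 3;
        uint32_t c = extra ? (uint32_t)(b & (0x3F >> extra)) : b;
        for (int k = 1; k <= extra && i + k < len; k++)
            c = (c << 6) | (s[i + k] & 0x3F);
        out[n++] = c;
        i += (size_t)extra + 1;
    }
    return n;
}

/* Return the decoded form of line `line`, whose UTF-8 text is s[0..len).
   On a miss, the slot's previous occupant is evicted and the line is
   decoded.  Returns NULL if allocation fails. */
const uint32_t *cache_get(long line, const unsigned char *s, size_t len,
                          size_t *ncps)
{
    struct slot *e = &cache[line % NSLOTS];
    if (!e->valid || e->line != line) {
        uint32_t *buf = malloc((len ? len : 1) * sizeof *buf);
        if (!buf)
            return NULL;
        free(e->cps);
        e->cps = buf;
        e->ncps = decode_utf8(s, len, buf);
        e->line = line;
        e->valid = 1;
        decode_count++;
    }
    *ncps = e->ncps;
    return e->cps;
}

/* Must be called whenever the text of `line` changes. */
void cache_invalidate(long line)
{
    struct slot *e = &cache[line % NSLOTS];
    if (e->valid && e->line == line)
        e->valid = 0;
}
```

    Lines 5 and 37 share a slot here, so scrolling through a window
    taller than 32 lines would thrash. That is the usual trade-off of
    direct mapping.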

    For the rest, say, one can have, say, a big buffer, with an array of
    lines giving the location and size of the line's text in the buffer.

    In a modern text editor, one can paste in {*.xls tables, *.jpg, *.gif,
    ...} along with text in different fonts and different backgrounds on a
    per-character basis.

    If a line is modified, it can be reallocated at the end of the buffer,
    and if the buffer gets full, it can be "repacked" and/or expanded as
    needed. When written back to a file, the buffer lines can be emitted
    in-order to the text file.
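    The buffer-plus-line-array scheme described above can be sketched in C
    as follows. This is an illustration under assumptions only: the struct
    layout, the names set_line and repack, and the doubling growth policy
    are all hypothetical, and error handling is minimal.

```c
#include <stdlib.h>
#include <string.h>

/* All line text lives in one growable buffer; lines[i] records where
   line i's bytes are and how many there are. */
struct line { size_t off, len; };

struct text {
    char *buf;          /* line text, in no particular order */
    size_t used, cap;   /* bytes in use / allocated */
    struct line *lines; /* one entry per line, in file order */
    size_t nlines;
};

/* Append n bytes at the end of the buffer, growing it if needed.
   Returns the offset of the copy, or (size_t)-1 if allocation fails.
   s must not point into t->buf, since realloc may move the buffer. */
static size_t append_bytes(struct text *t, const char *s, size_t n)
{
    if (t->used + n > t->cap) {
        size_t cap = t->cap ? t->cap : 64;
        while (cap < t->used + n)
            cap *= 2;
        char *nb = realloc(t->buf, cap);
        if (!nb)
            return (size_t)-1;
        t->buf = nb;
        t->cap = cap;
    }
    if (n > 0)
        memcpy(t->buf + t->used, s, n);
    t->used += n;
    return t->used - n;
}

/* Replace the text of line i.  The new text goes at the end of the
   buffer; the old bytes become dead space until the next repack. */
int set_line(struct text *t, size_t i, const char *s, size_t n)
{
    size_t off = append_bytes(t, s, n);
    if (off == (size_t)-1)
        return -1;
    t->lines[i].off = off;
    t->lines[i].len = n;
    return 0;
}

/* Copy the live lines, in file order, into a fresh buffer, dropping the
   dead space.  Writing the file out is the same in-order traversal. */
int repack(struct text *t)
{
    size_t total = 0, pos = 0;
    for (size_t i = 0; i < t->nlines; i++)
        total += t->lines[i].len;
    char *nb = malloc(total ? total : 1);
    if (!nb)
        return -1;
    for (size_t i = 0; i < t->nlines; i++) {
        if (t->lines[i].len > 0)
            memcpy(nb + pos, t->buf + t->lines[i].off, t->lines[i].len);
        t->lines[i].off = pos;
        pos += t->lines[i].len;
    }
    free(t->buf);
    t->buf = nb;
    t->used = pos;
    t->cap = total ? total : 1;
    return 0;
}
```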

    Not entirely sure how other text editors manage things here; I haven't
    really looked into it.

    If you think about it with the above features, you quickly realize it
    is not just text anymore.


    - anton

  • From John Levine@21:1/5 to All on Fri May 31 19:41:01 2024
    According to Terje Mathisen <[email protected]>:
    Read all about it: https://www.vm.ibm.com/library/other/22783213.pdf

    It's on page 7-251.

    Thanks!

    I did read all of it, and it was pretty close to how I would have
    designed a sw function to do the same, except for the very funky ABI:

    Both source and destination _must_ be in even-numbered registers, with the following odd register providing the count/length.

    That's the way they've been handling address+length pairs since they
    added long compare and move instructions in S/370. They're so common
    I'd expect there to be hardware to deal with them.

    Just from this little snippet I'm pretty sure this instruction has a
    sizeable startup overhead; compiler support is probably in the form of
    an intrinsic that knows about the need to allocate two register pairs,
    each pair starting at an even-numbered register.

    Same register allocation would be needed for a string compare or move,
    so that's nothing new.
    --
    Regards,
    John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From John Levine@21:1/5 to All on Fri May 31 19:44:49 2024
    According to John Savard <[email protected]d>:
    On Thu, 30 May 2024 03:25:14 -0000 (UTC), John Levine
    <[email protected]> wrote:

    I do not entirely understand why IBM keeps adding special purpose instructions to z. Maybe it's partly marketing, but they have a
    largely captive audience so it has to be more than that.

    One possibility is to _keep_ that audience captive even after all the
    patents expire that are applicable to machines with the z/Architecture
    in its current state, if you are reluctant to believe that these new instructions genuinely improve performance.

    Back in the last millennium there were a bunch of companies that made
    clones of IBM mainframes. They all failed. It's the whole ecosystem of
    hardware and software that keeps the customers, not individual
    features or patents.

    I have to say I'm somewhat surprised that IBM has put a lot of effort
    into running linux on zSeries, since that's about as un-captive as you
    can get. I would imagine that for some kinds of heavily threaded
    workloads they could be competitive since the z machines have upwards
    of a hundred CPUs with a shared mostly consistent cache.

    --
    Regards,
    John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From MitchAlsup1@21:1/5 to BGB on Fri May 31 19:12:49 2024
    BGB wrote:

    On 5/31/2024 12:21 PM, MitchAlsup1 wrote:


    For the rest, say, one can have, say, a big buffer, with an array of
    lines giving the location and size of the line's text in the buffer.

    In a modern text editor, one can paste in {*.xls tables, *.jpg, *.gif,
    ...} along with text in different fonts and different backgrounds on a
    per-character basis.


    Errm, I think we call this a word processor, not a text editor.

    So, you are calling AOL e-mail editor a word processor ??? !!?! Gasp !
    And every modern forum editor (this one not included) word processors !!

    Me thinks your definition is overly inclusive.

    Granted, text editors don't usually store font or formatting
    information in the text itself, but rather it exists temporarily for
    things like "syntax highlighting".


    If a line is modified, it can be reallocated at the end of the buffer,
    and if the buffer gets full, it can be "repacked" and/or expanded as
    needed. When written back to a file, the buffer lines can be emitted
    in-order to the text file.

    Not entirely sure how other text editors manage things here; I haven't
    really looked into it.

    If you think about it with the above features, you quickly realize it
    is not just text anymore.


    But, word processors are their own category...

    Typically, they also have their own specialized formats (though, "big
    blob of XML inside a ZIP package" seems to have become popular).

    Whereas text-editors typically use plain ASCII/UTF-8/UTF-16 files...
    The great "feature creep" in text editors is mostly that modern ones
    support syntax highlighting and emojis.



    An intermediate option would be a WYSIWYG editor for MediaWiki or
    Markdown. Though, annoyingly, there don't seem to be any that exist as standalone desktop programs (seemingly invariably they are written in JavaScript or similar and intended to operate inside a browser).

    I might eventually need to get around to writing something like this
    (mostly because I use MediaWiki notation for some of my own
    documentation). It would also arguably be more advanced than the system
    used by "info" and "man", though a tool along these lines could make
    sense (but possibly as an intermediate, with an interface more like
    "man" but able to jump between documents more like "info").



    Also, bug hunt is annoying. Find/fix one bug, but more bugs remain...
    My project is seemingly in a rather buggy state right at the moment.

    But, I guess, I did add things like file redirection and similar, along
    with a few more standard commands.

    So, in the working version, things like "cat file1 > file2" or
    "program > file" and similar are now technically possible...

    But, also, everything has turned into a crapstorm of crashes...



    - anton

  • From John Levine@21:1/5 to All on Fri May 31 19:47:36 2024
    According to Michael S <[email protected]>:
    U.S.-centric vs U.S. eccentric. http://www.cs.yale.edu/homes/perlis-alan/quotes.html

    Actually I am pretty sure that "eccentric" is not a fair
    characterisation of his personality, but can't resist.

    He was my thesis advisor and he was pretty eccentric. In a nice way,
    but still quite a character.

    --
    Regards,
    John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Scott Lurndal@21:1/5 to John Levine on Fri May 31 21:03:30 2024
    John Levine <[email protected]> writes:
    According to John Savard <[email protected]d>:
    On Thu, 30 May 2024 03:25:14 -0000 (UTC), John Levine
    <[email protected]> wrote:

    I do not entirely understand why IBM keeps adding special purpose instructions to z. Maybe it's partly marketing, but they have a
    largely captive audience so it has to be more than that.

    One possibility is to _keep_ that audience captive even after all the
    patents expire that are applicable to machines with the z/Architecture
    in its current state, if you are reluctant to believe that these new
    instructions genuinely improve performance.

    Back in the last millennium there were a bunch of companies that made
    clones of IBM mainframes. They all failed. It's the whole ecosystem of
    hardware and software that keeps the customers, not individual
    features or patents.

    I have to say I'm somewhat surprised that IBM has put a lot of effort
    into running linux on zSeries, since that's about as un-captive as you
    can get. I would imagine that for some kinds of heavily threaded
    workloads they could be competitive since the z machines have upwards
    of a hundred CPUs with a shared mostly consistent cache.

    I had heard somewhere that the linux use cases on Z run
    multiple VMs, rather than a single large SMP system.

  • From MitchAlsup1@21:1/5 to John Levine on Fri May 31 21:05:36 2024
    John Levine wrote:

    According to Michael S <[email protected]>:
    U.S.-centric vs U.S. eccentric. http://www.cs.yale.edu/homes/perlis-alan/quotes.html

    Actually I am pretty sure that "eccentric" is not a fair
    characterisation of his personality, but can't resist.

    He was my thesis advisor and he was pretty eccentric. In a nice way,
    but still quite a character.


    Back in my day, eccentric was used in the British fashion to point out
    a person with certain qualities that make him instantly memorable, but
    not in any bad way. The characters on Monty Python were eccentric !!

    Now it means a person with creepy qualities.

    My how the language has migrated.

  • From Scott Lurndal@21:1/5 to [email protected] on Fri May 31 21:01:13 2024
    [email protected] (MitchAlsup1) writes:
    BGB wrote:

    On 5/31/2024 12:21 PM, MitchAlsup1 wrote:


    For the rest, say, one can have, say, a big buffer, with an array of
    lines giving the location and size of the line's text in the buffer.

    In a modern text editor, one can paste in {*.xls tables, *.jpg, *.gif,
    ...} along with text in different fonts and different backgrounds on a
    per-character basis.


    Errm, I think we call this a word processor, not a text editor.

    So, you are calling AOL e-mail editor a word processor ???

    Yep.


    And every modern forum editor (this one not included) word processors

    Yep. They're certainly not text editors along the lines of vim or emacs.

  • From John Savard@21:1/5 to [email protected] on Fri May 31 21:51:56 2024
    On Fri, 31 May 2024 19:44:49 -0000 (UTC), John Levine
    <[email protected]> wrote:

    I have to say I'm somewhat surprised that IBM has put a lot of effort
    into running linux on zSeries, since that's about as un-captive as you
    can get.

    You can buy a zSeries machine more cheaply if it can only run Linux,
    but not any IBM operating systems. So this is presumably for the
    purpose of expanding the popularity of the z/Architecture without in
    any way threatening the profitability of their base market.

    If they took it to its logical conclusion, and packaged zArchitecture
    chips without the ability to run current IBM operating systems in the
    same way as POWER chips, I might actually be interested.

    John Savard

  • From Thomas Koenig@21:1/5 to John Savard on Sat Jun 1 07:47:49 2024
    John Savard <[email protected]d> schrieb:
    On Fri, 31 May 2024 19:44:49 -0000 (UTC), John Levine
    <[email protected]> wrote:

    I have to say I'm somewhat surprised that IBM has put a lot of effort
    into running linux on zSeries, since that's about as un-captive as you
    can get.

    You can buy a zSeries machine more cheaply if it can only run Linux,
    but not any IBM operating systems. So this is presumably for the
    purpose of expanding the popularity of the z/Architecture without in
    any way threatening the profitability of their base market.

    One of the main selling points is the hardware reliability, and
    you get this with Linux, too. Plus, you can always run z/OS in
    parallel with Linux, either in LPAR mode or as a guest under VM
    (or under KVM, if you're so inclined).

    Software availability is probably the main driver. Even SAP made
    SAP HANA Linux-only, and they have announced that other systems
    will be dropped, so IBM is probably very glad they did that Linux port.

    From what I heard, it actually started out as some people trying
    out a port of Linux as an unofficial hobby project, and finding
    it surprisingly easy.

    If they took it to its logical conclusion, and packaged zArchitecture
    chips without the ability to run current IBM operating systems in the
    same way as POWER chips, I might actually be interested.

    Would you like to buy one, then? That would be a large investment
    of money and space in your home... but then again, an 18-year-old
    once bought a z890, see https://www.youtube.com/watch?v=45X4VP8CGtk

  • From Thomas Koenig@21:1/5 to Terje Mathisen on Sat Jun 1 08:50:22 2024
    Terje Mathisen <[email protected]> schrieb:
    Thomas Koenig wrote:
    Anton Ertl <[email protected]> schrieb:

    It's still marketing. I have listened to several talks about
    converting S/360 programs to C code that can be run on arbitrary
    hardware, and IBM's audience hears about such things, too, so IBM's
    sales force has to provide reasons for not jumping ship. And all
    these new features that sound like they are useful are such reasons.
    Things like decimal FP and CU14.

    The fact that these features provide no actual benefit is their best
    property:

    No actual benefit?

    If you make such a strong statement, I assume that you have done a
    thorough analysis of this feature for typical mainframe workloads
    and can support your claims with benchmarks.

    Care to show exactly what you did, and what the results were?

    I am pretty sure Anton is correct, at least for data residing in RAM,
    since any reasonably efficient sw algorithm to do the same thing should
    be able to keep up with memory bandwidth, right?

    I'm not sure that would be the case for text containing some
    non-ASCII characters, where you cannot predict branches well
    (consider Å, Ø and Æ, which together appear to make up a bit more than
    2.5% according to a random statistic I just grabbed off the Internet),
    or ä, ö and ü which have around 1.5%
    occurrence together.

    In Chinese or Japanese text, I assume the spaces and punctuation
    are 7-bit ASCII (are they, actually?) so things would be even
    worse for branch prediction.
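    One common software answer to the branch-prediction concern (not
    something proposed in this thread) is to skip runs of ASCII several
    bytes at a time, using a single test per machine word, and to fall back
    to per-byte decoding only when a non-ASCII byte turns up. Here is a
    minimal C sketch. The function name ascii_prefix_len is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length of the leading run of ASCII bytes in s[0..len).
   Tests 8 bytes per iteration: a 64-bit word contains a non-ASCII byte
   exactly when some byte has its top bit set, regardless of byte order.
   memcpy avoids misaligned loads on strict-alignment machines. */
size_t ascii_prefix_len(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i + 8 <= len) {
        uint64_t w;
        memcpy(&w, s + i, 8);
        if (w & UINT64_C(0x8080808080808080))
            break;                    /* non-ASCII byte somewhere in this word */
        i += 8;
    }
    while (i < len && s[i] < 0x80)    /* finish byte by byte */
        i++;
    return i;
}
```

    Suppose non-ASCII letters make up a few percent of the characters.
    Then most 8-byte words would pass the test, and the per-byte path
    would be entered comparatively rarely. For CJK text, where nearly every
    character is a multibyte sequence, the fast path would almost never
    apply, which matches the worry expressed above.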

  • From Anton Ertl@21:1/5 to Thomas Koenig on Sat Jun 1 17:08:53 2024
    Thomas Koenig <[email protected]> writes:
    I'm not sure that would be the case for text containing some
    non-ASCII characters, where you cannot predict branches well
    (consider Å, Ø and Æ, which together appear to make up a bit more than
    2.5% according to a random statistic I just grabbed off the Internet),
    or ä, ö and ü which have around 1.5%
    occurrence together.

    In Chinese or Japanese text, I assume the spaces and punctuation
    are 7-bit ASCII (are they, actually?) so things would be even
    worse for branch prediction.

    Branch prediction for what purpose?

    A typical usage is processing of csv files containing participant
    lists of a course, and the results for the course. The participant
    names contain various non-ASCII characters*. The participant names
    are usually just copied literally from some inputs to some outputs. I
    don't know if the tools I use (awk, join, sort (in the C.utf8 locale)
    etc.) do some code point processing, but if they do, it's totally
    unnecessary.
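    Tools can pass such names through untouched because UTF-8 never uses
    bytes below 0x80 inside a multibyte sequence. Splitting on an ASCII
    delimiter byte is therefore safe without decoding anything. Below is a
    minimal C sketch of that idea. split_fields is a hypothetical helper,
    and it deliberately ignores CSV quoting.

```c
#include <stddef.h>
#include <string.h>

/* Split line in place at ',' and store pointers to up to max fields;
   the last stored field keeps any remaining text.  Returns the number
   of fields (0 if max is 0).  No UTF-8 decoding is needed: a 0x2C byte
   can only be a real comma, never part of a multibyte character.
   Quoted fields (as in RFC 4180 CSV) are not handled. */
size_t split_fields(char *line, char **fields, size_t max)
{
    size_t n = 0;
    char *p = line;
    if (max == 0)
        return 0;
    for (;;) {
        fields[n++] = p;
        char *comma = strchr(p, ',');
        if (comma == NULL || n == max)
            break;
        *comma = '\0';
        p = comma + 1;
    }
    return n;
}
```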

    In one case I sort on the names to produce reports, so in that case a
    different locale and actual knowledge of the characters for collating
    order purposes might be a good idea, but none of the report users has complained yet about the sorting. And the question is which locale
    one should use when some names are from Turkey, some from Hungary,
    some from Austria, some from Croatia etc.; how would
    LC_COLLATE=de_AT.UTF-8 deal with all the characters that don't occur
    in German? In any case, sorting in a locale other than C involves
    much more (and much more expensive operations) than just code point recognition. And the actual lion's share of the CPU time spent on
    report processing is the conversion from .md to .pdf using pandoc. I
    am sure that this would not be measurably faster if there were only
    ASCII characters in the .md files.
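    The following illustrates what plain byte-order sorting (as in the C
    locale) does with such names. Byte-wise comparison of UTF-8 strings
    gives code-point order, so a name starting with Ö lands after every
    name starting with an ASCII letter. A German, Hungarian or Turkish
    reader would expect it near O. This is a minimal C sketch.
    sort_names is a hypothetical helper, and the example names are made
    up.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings: plain byte comparison
   (strcmp compares as unsigned char), i.e. what sort does in the C
   locale.  For valid UTF-8 this equals code-point order, so any name
   starting with a non-ASCII letter sorts after all ASCII-initial ones. */
static int cmp_bytes(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

void sort_names(const char **names, size_t n)
{
    qsort(names, n, sizeof *names, cmp_bytes);
}
```

    Locale-aware ordering would instead go through setlocale and strcoll
    (or strxfrm). The result then depends on which locale is chosen and
    installed, which is the question raised above about mixed Turkish,
    Hungarian, Austrian and Croatian names.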

    Bottom line: Code point conversion instructions like CU14 solve a
    problem imagined by people who have no experience working with UTF-8.

    * I have never encountered names containing characters outside the
    Roman-based alphabets, though; they are probably all romanized at some
    earlier administrative step (probably when they register for the
    university, or earlier), but Cyrillic, Greek, or CJK characters would
    make no difference to the scripts; they would make it harder for
    course staff to pronounce the names, though, so it's probably good
    that the names are all romanized.

    BTW, the biggest problems stem from ASCII characters in names, in
    particular " " and "'".

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From John Savard@21:1/5 to [email protected] on Sat Jun 1 20:00:48 2024
    On Sat, 1 Jun 2024 07:47:49 -0000 (UTC), Thomas Koenig
    <[email protected]> wrote:
    John Savard <[email protected]d> schrieb:

    If they took it to its logical conclusion, and packaged zArchitecture
    chips without the ability to run current IBM operating systems in the
    same way as POWER chips, I might actually be interested.

    Would you like to buy one, then? That would be a large investment
    of money and space in your home... but then again, an 18-year-old
    once bought a z890, see https://www.youtube.com/watch?v=45X4VP8CGtk

    Well, when I said "packaged... in the same way as POWER chips", I
    meant that they would make systems with fewer CPUs than a mainframe
    which were in the category of ordinary desktop computers if they were
    to do that... which, of course, they won't.

    John Savard

  • From Thomas Koenig@21:1/5 to John Savard on Sun Jun 2 06:58:37 2024
    John Savard <[email protected]d> schrieb:
    On Sat, 1 Jun 2024 07:47:49 -0000 (UTC), Thomas Koenig
    <[email protected]> wrote:
    John Savard <[email protected]d> schrieb:

    If they took it to its logical conclusion, and packaged zArchitecture
    chips without the ability to run current IBM operating systems in the
    same way as POWER chips, I might actually be interested.

    Would you like to buy one, then? That would be a large investment
    of money and space in your home... but then again, an 18-year-old
    once bought a z890, see https://www.youtube.com/watch?v=45X4VP8CGtk

    Well, when I said "packaged... in the same way as POWER chips", I
    meant that they would make systems with fewer CPUs than a mainframe
    which were in the category of ordinary desktop computers if they were
    to do that... which, of course, they won't.

    You can buy POWER9 machines from RaptorCS. The command prompt
    does not look different from AMD64, but of course the coolness
    factor is much higher. (Also the noise level, if you do not order
    with soundproofing...)

    But maybe you can also run a 360/30 on an FPGA board, somebody
    has apparently implemented it in VHDL from the logic diagrams: https://github.com/ibm2030/IBM2030

  • From Michael S@21:1/5 to John Levine on Sun Jun 2 12:01:11 2024
    On Fri, 31 May 2024 19:44:49 -0000 (UTC)
    John Levine <[email protected]> wrote:

    According to John Savard <[email protected]d>:
    On Thu, 30 May 2024 03:25:14 -0000 (UTC), John Levine
    <[email protected]> wrote:

    I do not entirely understand why IBM keeps adding special purpose instructions to z. Maybe it's partly marketing, but they have a
    largely captive audience so it has to be more than that.

    One possibility is to _keep_ that audience captive even after all the
    patents expire that are applicable to machines with the
    z/Architecture in its current state, if you are reluctant to believe
    that these new instructions genuinely improve performance.

    Back in the last millennium there were a bunch of companies that made
    clones of IBM mainframes. They all failed. It's the whole ecosystem of
    hardware and software that keeps the customers, not individual
    features or patents.

    I have to say I'm somewhat surprised that IBM has put a lot of effort
    into running linux on zSeries, since that's about as un-captive as you
    can get. I would imagine that for some kinds of heavily threaded
    workloads they could be competitive since the z machines have upwards
    of a hundred CPUs with a shared mostly consistent cache.


    z15 appears to peak at 190/380 user-visible cores/threads.
    That's fewer than a quad-socket system of 56-core Intel Xeons
    (4 x 56 = 224 cores). Quad-socket Xeons are much less popular than
    they used to be 20 years ago, but HP/Dell/Lenovo would still sell you
    one if you insist.
    IBM's own Power System E980 can give you a whopping 1536 threads in
    maximal configuration.

    Maybe Telum pulls zArch ahead of those. I don't know much about it.

  • From John Dallman@21:1/5 to John Savard on Sun Jun 2 11:23:00 2024
    In article <[email protected]>, [email protected]d (John Savard) wrote:

    If they took it to its logical conclusion, and packaged
    zArchitecture chips without the ability to run current
    IBM operating systems in the same way as POWER chips,
    I might actually be interested.

    A deskside zSeries machine that would boot and run Linux (probably under
    z/VM) reasonably simply would be interesting to me. A big-endian machine
    with comprehensive hardware trapping has software QA uses in the current
    era of machines that hardly trap on anything apart from SEGV.

    I looked into Hercules, but the community for that is mostly interested
    in running historical 31-bit MVS and other elderly OSes. Hercules needs
    quite a lot of configuration set up, which requires using a lot of IBM mainframe terminology and concepts, and doesn't supply a configuration
    file for Linux. Setting it up is hard if you've never been a mainframe operator, and the community isn't all that helpful to outsiders.

    John

  • From John Dallman@21:1/5 to All on Sun Jun 2 11:23:00 2024
    In article <v3d9bh$s9a$[email protected]>, [email protected] (John Levine)
    wrote:

    I have to say I'm somewhat surprised that IBM has put a lot of
    effort into running linux on zSeries, since that's about as
    un-captive as you can get. I would imagine that for some kinds
    of heavily threaded workloads they could be competitive since
    the z machines have upwards of a hundred CPUs with a shared
    mostly consistent cache.

    It seems to have been the easiest way to get zSeries used for web serving
    and other internet tasks. Getting Linux software running on zSeries that
    way is /much/ easier than porting it to z/OS or z/VSE.

    John

  • From Thomas Koenig@21:1/5 to John Dallman on Sun Jun 2 13:32:40 2024
    John Dallman <[email protected]> schrieb:
    In article <[email protected]>, [email protected]d (John Savard) wrote:

    If they took it to its logical conclusion, and packaged
    zArchitecture chips without the ability to run current
    IBM operating systems in the same way as POWER chips,
    I might actually be interested.

    A deskside zSeries machine that would boot and run Linux (probably under z/VM) reasonably simply would be interesting to me. A big-endian machine
    with comprehensive hardware trapping has software QA uses in the current
    era of machines that hardly trap on anything apart from SEGV.

    There are POWER8 machines on sale on E-bay, on which you can run
    either Linux or AIX, and big-endian too, if you want.

    I also have a login shell open on such a machine right now, but that's
    on the gcc compile farm, which is only for open-source projects.

    $ cat foo.c
    #include <stdio.h>
    #include <string.h>

    int main()
    {
        int a;
        char c;
        a = 0;
        c = 1;
        memcpy (&a,&c,1);
        printf ("%d\n", a);
        return 0;
    }
    $ gcc foo.c
    $ ./a.out
    16777216
    $ uname -a
    Linux cfarm203 6.0.0-6-powerpc64 #1 SMP Debian 6.0.12-1 (2022-12-09) ppc64 GNU/Linux

  • From John Dallman@21:1/5 to Koenig on Sun Jun 2 18:29:00 2024
    In article <v3hs9o$3c8gd$[email protected]>, [email protected] (Thomas Koenig) wrote:

    There are POWER8 machines on sale on E-bay, on which you can run
    either Linux or AIX, and big-endian too, if you want.

    Yup. Considered that. Their trapping is not as comprehensive as zSeries,
    and I could not justify them.

    John

  • From John Levine@21:1/5 to All on Sun Jun 2 19:44:57 2024
    According to Thomas Koenig <[email protected]>:
    But maybe you can also run a 360/30 on an FPGA board, somebody
    has apparently implemented it in VHDL from the logic diagrams: https://github.com/ibm2030/IBM2030

    IBM made several S/360 and S/370 add-in boards for PCs. They worked
    but were never very popular, probably because nobody bought a
    mainframe for the CPU and PC peripherals are underpowered.

    --
    Regards,
    John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From John Levine@21:1/5 to All on Sun Jun 2 19:43:18 2024
    According to Anton Ertl <[email protected]>:
    Bottom line: Code point conversion instructions like CU14 solve a
    problem imagined by people who have no experience working with UTF-8.

    The original instructions were CU12 and CU21 which convert between
    UTF-8 and UTF-16. That really is useful, e.g., read a file of UTF-8
    into a program in Java or Javascript which uses UTF-16. I agree the
    UTF-32 versions added in zSeries are less likely to be useful.
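    For comparison with CU12, the UTF-8 to UTF-16 direction can be written
    in a few lines of C. The only wrinkle beyond UTF-8 decoding is that
    code points above U+FFFF become surrogate pairs. This is a minimal
    sketch that assumes valid UTF-8 input and does no error checking.
    utf8_to_utf16 is a hypothetical name. The sketch says nothing about
    how CU12 itself is implemented or how it reports errors.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert UTF-8 s[0..len) to UTF-16 units in dst; ASSUMES valid UTF-8.
   dst needs room for len units (worst case: one unit per input byte).
   Returns the number of 16-bit units written. */
size_t utf8_to_utf16(const unsigned char *s, size_t len, uint16_t *dst)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char b = s[i];
        int extra = b < 0x80 ? 0 : b < 0xE0 ? 1 : b < 0xF0 ? 2 : 3;
        uint32_t c = extra ? (uint32_t)(b & (0x3F >> extra)) : b;
        for (int k = 1; k <= extra; k++)
            c = (c << 6) | (s[i + k] & 0x3F);
        i += (size_t)extra + 1;
        if (c < 0x10000) {
            dst[n++] = (uint16_t)c;                       /* BMP: one unit */
        } else {
            c -= 0x10000;                                 /* 20 bits remain */
            dst[n++] = (uint16_t)(0xD800 | (c >> 10));    /* high surrogate */
            dst[n++] = (uint16_t)(0xDC00 | (c & 0x3FF));  /* low surrogate */
        }
    }
    return n;
}
```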

    --
    Regards,
    John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Anton Ertl@21:1/5 to John Dallman on Mon Jun 3 05:47:50 2024
    [email protected] (John Dallman) writes:
    In article <v3hs9o$3c8gd$[email protected]>, [email protected] (Thomas Koenig) wrote:

    There are POWER8 machines on sale on E-bay, on which you can run
    either Linux or AIX, and big-endian too, if you want.

    Yup. Considered that. Their trapping is not as comprehensive as zSeries,
    and I could not justify them.

    SPARCs are big-endian and trap on unaligned access (at least that was
    the case when I last used one long ago), while S/370 ff. does not trap
    on unaligned access. What's wrong with SPARC? What other trapping do
    you have in mind?
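    A typical portability bug that an alignment-trapping machine exposes
    is a misaligned load through a cast pointer. That load silently works
    on x86 and, per the above, on S/370 and its successors. Below is a
    minimal C sketch of two portable alternatives. The function names are
    hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Load a 32-bit value from an address that may be misaligned.
   Writing *(const uint32_t *)p instead is undefined behaviour in C and
   may trap (e.g. SIGBUS) on strict-alignment machines such as SPARC when
   p is not a multiple of 4; memcpy is always safe.  The result is in
   the host's native byte order. */
uint32_t load_u32_native(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Big-endian load assembled from individual bytes: independent of
   both alignment and host byte order. */
uint32_t load_u32_be(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8  | (uint32_t)p[3];
}
```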

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From Lawrence D'Oliveiro@21:1/5 to BGB on Mon Jun 3 08:06:01 2024
    On Fri, 31 May 2024 12:14:19 -0500, BGB wrote:

    Though, one thing that makes sense for text editors is if only the
    "currently being edited" lines are fully unpacked, whereas the others
    can remain in a more compact form (such as UTF-8), and are then unpacked
    as they come into view (say, treating the editor window as a 32-entry
    modulo cache or similar).
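    Here is a minimal sketch of the "modulo cache" idea. It assumes each line is stored as UTF-8 bytes, and only lines that come into view are decoded into a list of code points. The 32-slot size comes from the post. The class and method names are hypothetical.

```python
class LineCache:
    """Direct-mapped cache of decoded lines, indexed by line_no % SLOTS (hypothetical)."""

    SLOTS = 32

    def __init__(self, lines_utf8: list[bytes]):
        self.lines = lines_utf8              # compact backing store (UTF-8)
        self.tags = [None] * self.SLOTS      # which line number occupies each slot
        self.data = [None] * self.SLOTS      # decoded code points for that line
        self.misses = 0

    def get(self, line_no: int) -> list[int]:
        slot = line_no % self.SLOTS
        if self.tags[slot] != line_no:       # miss: decode and evict the old occupant
            self.misses += 1
            text = self.lines[line_no].decode("utf-8")
            self.data[slot] = [ord(c) for c in text]
            self.tags[slot] = line_no
        return self.data[slot]

    def put(self, line_no: int, codepoints: list[int]) -> None:
        """Write an edited line back in compact form and keep it cached."""
        self.lines[line_no] = "".join(map(chr, codepoints)).encode("utf-8")
        slot = line_no % self.SLOTS
        self.tags[slot] = line_no
        self.data[slot] = list(codepoints)
```

    Lines 5 and 37 map to the same slot, so scrolling between them evicts one another. A real editor would likely size the cache to comfortably exceed the window height.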

    That may make sense if you are implementing a *text* editor, like the vi/
    vim family. Remember that Emacs is usable for editing things other than
    text.

  • From Lawrence D'Oliveiro@21:1/5 to BGB on Mon Jun 3 08:07:44 2024
    On Fri, 31 May 2024 12:55:59 -0500, BGB wrote:

    On 5/31/2024 12:21 PM, MitchAlsup1 wrote:

    In a modern text editor, one can paste in {*.xls tables, *.jpg, *.gif,
    ..} along with text from different fonts and different backgrounds on a
    per character basis.

    Errm, I think we call this a word processor, not a text editor.

    Emacs has things called “text attributes” and “overlays”, for doing precisely this sort of thing. You can even use these things to define
    clickable buttons. Yet nobody would call Emacs a “word processor”.

  • From Lawrence D'Oliveiro@21:1/5 to Anton Ertl on Mon Jun 3 08:11:10 2024
    On Thu, 30 May 2024 12:47:35 GMT, Anton Ertl wrote:

    Lawrence D'Oliveiro <[email protected]d> writes:

    On Wed, 29 May 2024 08:20:03 GMT, Anton Ertl wrote:

    In UTF-32 a character is a sequence of (32-bit) code units.
    In UTF-8 a character is a sequence of (8-bit) code units.

    The point being, there is a 1:1 correspondence between the two
    representations of the same characters/code points. So your claim that
    use of one is somehow a “mistake” while the other is not, is spurious.

    If the data you are working on is provided in files containing UTF-8, conversion to UTF-32 does not provide any benefits and is therefore an unnecessary complication, and therefore a mistake.

    Assuming it does not provide any benefits is the mistake.

  • From Lawrence D'Oliveiro@21:1/5 to moi on Mon Jun 3 08:10:10 2024
    On Thu, 30 May 2024 19:01:11 +0100, moi wrote:

    On 30/05/2024 03:43, Lawrence D'Oliveiro wrote:

    On Wed, 29 May 2024 08:32:17 +0100, moi wrote:

    I hate user agents like wget, which is why I block them.

    Which is completely futile, which is why it’s so stupid to do.

    What a know-all you are. And offensive with it.

    You find it offensive that your block is so easy to bypass?

    Sucks to be you.

  • From Lawrence D'Oliveiro@21:1/5 to John Savard on Mon Jun 3 08:16:06 2024
    On Thu, 30 May 2024 22:22:34 -0600, John Savard wrote:

    And then there was the LISP machine, which started life with the
    infamous "Space Cadet" computer.

    “Space Cadet” keyboard, you mean? <https://www.deviantart.com/default-cube/art/Space-Cadet-Keyboard-650629356> (my exercise in recreating it).

  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Mon Jun 3 08:20:48 2024
    On Thu, 30 May 2024 14:42:14 -0000 (UTC), John Levine wrote:

    The condition code tells you which it was. If it was an interrupt, you
    just branch back and keep going.

    Does it really hurt performance for the CPU to keep track of the fact that
    an instruction has to be restarted after an interrupt?

    On the old VAX, there was a processor status bit called “First Part Done”, which was used for interruptible instructions. When an interrupt happened
    with such an instruction, the PC was not incremented past the instruction; instead, the saved PC pointed back at the instruction itself, while the
    saved processor status had the FPD bit set.

    So on a return from the interrupt, the CPU knew not to redo the
    instruction setup, but just continue executing the instruction from the
    current register state.
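    The "First Part Done" mechanism can be modelled in a few lines. This toy Python model is not VAX or PDP-10 microcode. The register names, the `movc` name and the budget-based simulated interrupt are assumptions made for illustration. All progress of a string move lives in registers. When interrupted, the PC stays on the instruction and FPD is set, so re-execution skips the operand setup.

```python
from dataclasses import dataclass, field


@dataclass
class CPU:
    mem: bytearray
    pc: int = 0
    fpd: bool = False                          # "First Part Done" status bit
    regs: dict = field(default_factory=dict)
    setups: int = 0                            # counts setup executions, for illustration

    def movc(self, src: int, dst: int, n: int, budget: int) -> bool:
        """Execute a string move at self.pc; return True if it completed.

        `budget` is how many bytes are moved before a simulated interrupt.
        """
        if not self.fpd:                       # first part: load operands into registers
            self.regs.update(src=src, dst=dst, left=n)
            self.setups += 1
            self.fpd = True
        r = self.regs
        while r["left"] > 0:
            if budget == 0:                    # interrupt: PC still points at this instruction
                return False
            self.mem[r["dst"]] = self.mem[r["src"]]
            r["src"] += 1
            r["dst"] += 1
            r["left"] -= 1
            budget -= 1
        self.fpd = False                       # finished: clear FPD and advance the PC
        self.pc += 1
        return True
```

    On resumption the operands passed in are ignored, because the registers already hold the instruction's state. That is what lets the hardware avoid redoing the setup.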

  • From John Dallman@21:1/5 to Anton Ertl on Mon Jun 3 09:20:00 2024
    In article <[email protected]>, [email protected] (Anton Ertl) wrote:

    SPARCs are big-endian and trap on unaligned access (at least that
    was the case when I last used one long ago), while S/370 ff. does
    not trap on unaligned access.

    OK, that shoots down S/370 for this job.

    John

  • From Michael S@21:1/5 to John Dallman on Mon Jun 3 13:08:21 2024
    On Mon, 3 Jun 2024 09:20 +0100 (BST)
    [email protected] (John Dallman) wrote:

    In article <[email protected]>, [email protected] (Anton Ertl) wrote:

    SPARCs are big-endian and trap on unaligned access (at least that
    was the case when I last used one long ago), while S/370 ff. does
    not trap on unaligned access.

    OK, that shoots down S/370 for this job.

    John

    What exactly is the job?
    Is it for pure personal amusement, or are there practical needs?

  • From John Levine@21:1/5 to All on Mon Jun 3 10:49:34 2024
    According to Lawrence D'Oliveiro <[email protected]d>:
    On Thu, 30 May 2024 14:42:14 -0000 (UTC), John Levine wrote:

    The condition code tells you which it was. If it was an interrupt, you
    just branch back and keep going.

    Does it really hurt performance for the CPU to keep track of the fact that
    an instruction has to be restarted after an interrupt?

    I should have been clearer: it's not just an interrupt. The CPU does
    some maximum amount of work for the instruction, and sets the
    condition code if it didn't do the whole string. Maybe it was an
    interrupt, maybe it just hit the limit. Many other instructions that
    process long chunks of data work the same way.
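    The software side of this convention is a simple loop. This sketch is loosely modelled on S/360-family instructions that stop after a CPU-determined amount of work and set condition code 3. The function names, the 256-byte limit and the dict standing in for registers are all hypothetical. Check the Principles of Operation for each real instruction's exact rules.

```python
CPU_LIMIT = 256   # hypothetical cap on bytes processed per execution


def copy_step(src: bytes, dst: bytearray, regs: dict) -> int:
    """One execution of a long-running copy.

    Processes at most CPU_LIMIT bytes, updates the address and length
    "registers" in `regs`, and returns a condition code:
    0 = finished, 3 = stopped early (execute again).
    """
    n = min(regs["len"], CPU_LIMIT)
    s, d = regs["src"], regs["dst"]
    dst[d:d + n] = src[s:s + n]
    regs["src"] += n
    regs["dst"] += n
    regs["len"] -= n
    return 0 if regs["len"] == 0 else 3


def copy_all(src: bytes) -> tuple[bytearray, int]:
    """Branch back while the condition code says 'not done' (like BC 1,loop)."""
    dst = bytearray(len(src))
    regs = {"src": 0, "dst": 0, "len": len(src)}
    executions = 1
    while copy_step(src, dst, regs) == 3:
        executions += 1
    return dst, executions
```

    Because no single execution runs unboundedly, the CPU can take an interrupt between executions without any special restart state.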

    On the old VAX, there was a processor status bit called “First Part Done”,

    Actually that was the PDP-6 and -10 for the byte instructions,

    --
    Regards,
    John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From moi@21:1/5 to Lawrence D'Oliveiro on Mon Jun 3 12:50:47 2024
    On 03/06/2024 09:10, Lawrence D'Oliveiro wrote:
    On Thu, 30 May 2024 19:01:11 +0100, moi wrote:

    On 30/05/2024 03:43, Lawrence D'Oliveiro wrote:

    On Wed, 29 May 2024 08:32:17 +0100, moi wrote:

    I hate user agents like wget, which is why I block them.

    Which is completely futile, which is why it’s so stupid to do.

    What a know-all you are. And offensive with it.

    You find it offensive that your block is so easy to bypass?

    Sucks to be you.

    You just cannot help yourself, can you? I am sorry for you,
    but my tolerance has limits and you have just passed them.

    --
    Bill F.

  • From John Levine@21:1/5 to All on Mon Jun 3 13:46:05 2024
    According to OrangeFish <[email protected]d>:
    On 2024-06-02 15:44, John Levine wrote:
    IBM made several S/360 and S/370 add-in boards for PCs. They worked
    but were never very popular, probably because nobody bought a
    mainframe for the CPU and PC peripherals are underpowered.

    Were they not marketed as a way of developing s/w on a PC without
    chewing up mainframe time?

    I heard it was software licensing. You were allowed to run stuff on
    your PC/360 without paying for an extra seat as you would if you were
    using a mainframe terminal.

    They still weren't very popular, even though they were technically quite clever.
    --
    Regards,
    John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From OrangeFish@21:1/5 to John Levine on Mon Jun 3 09:29:25 2024
    On 2024-06-02 15:44, John Levine wrote:
    IBM made several S/360 and S/370 add-in boards for PCs. They worked
    but were never very popular, probably because nobody bought a
    mainframe for the CPU and PC peripherals are underpowered.

    Were they not marketed as a way of developing s/w on a PC without
    chewing up mainframe time?

    OF

  • From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Mon Jun 3 14:13:10 2024
    Lawrence D'Oliveiro <[email protected]d> writes:
    On Thu, 30 May 2024 14:42:14 -0000 (UTC), John Levine wrote:

    The condition code tells you which it was. If it was an interrupt, you
    just branch back and keep going.

    Does it really hurt performance for the CPU to keep track of the fact that
    an instruction has to be restarted after an interrupt?

    Yes, of course. And it complicates the design, which makes it harder
    to verify, particularly for an out-of-order design.

  • From MitchAlsup1@21:1/5 to Scott Lurndal on Mon Jun 3 16:36:37 2024
    Scott Lurndal wrote:

    Lawrence D'Oliveiro <[email protected]d> writes:
    On Thu, 30 May 2024 14:42:14 -0000 (UTC), John Levine wrote:

    The condition code tells you which it was. If it was an interrupt, you
    just branch back and keep going.

    Does it really hurt performance for the CPU to keep track of the fact that
    an instruction has to be restarted after an interrupt?

    It is already a requirement that we have precise interrupts. Those Rqs
    impose that the unfinished instruction is pointed at by IP on return.

    Yes, of course. And it complicates the design, which makes it harder
    to verify, particularly for an out-of-order design.

    If you can back up mispredicted branches, you have all the OoO HW to
    restart a long running instruction.

  • From Josh Vanderhoof@21:1/5 to Lawrence D'Oliveiro on Mon Jun 3 18:50:28 2024
    Lawrence D'Oliveiro <[email protected]d> writes:

    On Fri, 31 May 2024 12:55:59 -0500, BGB wrote:

    On 5/31/2024 12:21 PM, MitchAlsup1 wrote:

    In a modern text editor, one can paste in {*.xls tables, *.jpg, *.gif,
    ..} along with text from different fonts and different backgrounds on a
    per character basis.

    Errm, I think we call this a word processor, not a text editor.

    Emacs has things called “text attributes” and “overlays”, for doing precisely this sort of thing. You can even use these things to define clickable buttons. Yet nobody would call Emacs a “word processor”.

    RMS did call it a word processor.

    https://lists.gnu.org/archive/html/emacs-devel/2013-11/msg00515.html

  • From Lawrence D'Oliveiro@21:1/5 to Thomas Koenig on Tue Jun 4 01:21:56 2024
    On Thu, 30 May 2024 13:41:58 -0000 (UTC), Thomas Koenig wrote:

    Anton Ertl <[email protected]> schrieb:

    The fact that these features provide no actual benefit is their best
    property:

    No actual benefit?

    If you make such a strong statement, I assume that you have done a
    thorough analysis of this feature for typical mainframe workloads and
    can support your claims with benchmarks.

    We already know the answer to that. It’s why RISC has taken over the computing world.

    Remember that “mainframe workloads” are primarily I/O bound, not CPU-bound. The whole concept of a “mainframe” arose in the era when CPU time was scarce and expensive, so you had all these intelligent I/O peripherals
    that could be given sequences of operations to perform, with minimal CPU intervention. It was all about maximizing throughput (batch operation),
    not minimizing latency (interactive operation).

    Nowadays, the whole concept is obsolete. So the only thing keeping it a
    viable business has to be marketing, not technical, reasons.

  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Tue Jun 4 01:26:20 2024
    On Fri, 31 May 2024 19:44:49 -0000 (UTC), John Levine wrote:

    Back in the last millennium there were a bunch of companies that made
    clones of IBM mainframes. They all failed.

    They didn’t “fail” as such. Companies like Amdahl and Wang were able to maintain profitable businesses for quite a few years, decades even. And
    then there were other entirely separate companies, like CDC where Seymour
    Cray invented the concept of the “supercomputer”, much to the surprise of his upper management who just wanted to sell “business” machines.

    All the mainframe companies apart from IBM eventually went out of business because the whole mainframe concept is obsolete. The only reason IBM is
    still going is because it was able to muster more marketing clout than all
    its competitors put together. But even that part of its business is in
    decline. The only part of the company currently making money would be its
    Red Hat acquisition. The rest will eventually wither away.

  • From Lawrence D'Oliveiro@21:1/5 to Thomas Koenig on Tue Jun 4 01:30:49 2024
    On Sat, 1 Jun 2024 07:47:49 -0000 (UTC), Thomas Koenig wrote:

    One of the main selling points [of zSeries] is the hardware
    reliability ...

    Quite an expensive way to get reliability. How does an outfit like Google achieve essentially 0% downtime? By running a swarm of half a million
    commodity servers, that’s how. Every part has been built to the lowest
    cost, except the power supply. And they discovered they can run their data centres a little hot, to save on cooling costs, at the expense of a
    slightly higher failure rate. Because if a few thousand servers are down
    at any particular time, none of their users even notices.
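    Here is the arithmetic behind the "swarm" argument, as a back-of-envelope sketch. Suppose each replica is independently unavailable with probability p. A request that any of k replicas can serve then fails only when all k are down, with probability p^k. Independence is the big assumption, because correlated failures such as power, network or a shared software bug are exactly what this model ignores. The 1% figure is purely illustrative, not a Google number.

```python
import math


def unavailability(p_down: float, replicas: int) -> float:
    """Probability that all replicas are down at once, assuming independent failures."""
    return p_down ** replicas


def nines(u: float) -> float:
    """Express availability 1 - u as a count of nines (u = 1e-4 gives about 4)."""
    return -math.log10(u)


p = 0.01   # illustrative: each commodity box is down 1% of the time
print(f"machines down out of 500,000: {p * 500_000:,.0f}")
for k in (1, 2, 3):
    print(f"{k} replica(s): {nines(unavailability(p, k)):.1f} nines")
```

    With these assumed numbers, about 5,000 machines would be down at any moment. Three-way replication would still give roughly six nines for any one piece of data.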

  • From MitchAlsup1@21:1/5 to Lawrence D'Oliveiro on Tue Jun 4 01:32:58 2024
    Lawrence D'Oliveiro wrote:



    If you make such a strong statement, I assume that you have done a
    thorough analysis of this feature for typical mainframe workloads and
    can support your claims with benchmarks.

    We already know the answer to that. It’s why RISC has taken over the computing world.

    Oh Wait !?!

    Remember that “mainframe workloads” are primarily I/O bound, not CPU-bound. The whole concept of a “mainframe” arose in the era when CPU
    time was scarce and expensive, so you had all these intelligent I/O
    peripherals that could be given sequences of operations to perform, with
    minimal CPU intervention. It was all about maximizing throughput (batch
    operation), not minimizing latency (interactive operation).

    One of the reasons those CPUs were microcoded was to allow I/O activities
    to have 50% of the compute power and 50% of the memory bandwidth. Thus,
    from one set of HW logic one got 2 different computers, one designed for
    COBOL, the other designed for I/O (of that era), sharing the same
    expensive lump of circuits.

    Nowadays, the whole concept is obsolete. So the only thing keeping it a
    viable business has to be marketing, not technical, reasons.

    Microcode that "runs the instruction pipeline" is obsolete. And if anyone
    slogged through the Nick Tredennick book they would understand why.

    Microcode is still viable at the function unit level in converting FMUL
    logic into performing FDIV and SQRT calculations at low added cost.
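    As an illustration of how multiplier hardware can be reused for division and square root, here is a Newton-Raphson sketch. Once a rough starting estimate is available, the iterations need only multiplies and subtracts. Real FPUs typically get the estimate from a small lookup table. This is not any particular FPU's algorithm, and Goldschmidt's method is a common hardware alternative. The function names are made up.

```python
import math


def reciprocal(d: float, iters: int = 5) -> float:
    """Approximate 1/d for d > 0 using only multiplies and subtracts.

    Each step x = x * (2 - m*x) squares the relative error of x.
    """
    m, e = math.frexp(d)               # d = m * 2**e with 0.5 <= m < 1
    x = 48 / 17 - 32 / 17 * m          # constant linear seed, max relative error 1/17
    for _ in range(iters):
        x = x * (2.0 - m * x)
    return math.ldexp(x, -e)           # 1/d = (1/m) * 2**-e


def rsqrt(a: float, iters: int = 8) -> float:
    """Approximate 1/sqrt(a) for a > 0 with y = y * (3 - m*y*y) / 2."""
    m, e = math.frexp(a)               # a = m * 2**e
    if e % 2:                          # make the exponent even so it halves exactly
        m, e = m * 2.0, e - 1          # now 0.5 <= m < 2
    y = 1.0                            # crude seed; hardware would use a small table
    for _ in range(iters):
        y = y * (3.0 - m * y * y) * 0.5
    return math.ldexp(y, -(e // 2))    # 1/sqrt(a) = (1/sqrt(m)) * 2**(-e/2)


def sqrt_nr(a: float) -> float:
    """sqrt(a) = a * (1/sqrt(a)): one more multiply, no divide."""
    return a * rsqrt(a)
```

    With the 1/17 seed, the reciprocal's error falls to about 3.5e-3, then 1.2e-5, then 1.4e-10. The fourth step reaches double precision, which is why a few passes through the multiplier suffice.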

  • From Lawrence D'Oliveiro@21:1/5 to Josh Vanderhoof on Tue Jun 4 01:36:52 2024
    On Mon, 03 Jun 2024 18:50:28 -0400, Josh Vanderhoof wrote:

    Lawrence D'Oliveiro <[email protected]d> writes:

    On Fri, 31 May 2024 12:55:59 -0500, BGB wrote:

    On 5/31/2024 12:21 PM, MitchAlsup1 wrote:

    In a modern text editor, one can paste in {*.xls tables, *.jpg,
    *.gif, ..} along with text from different fonts and different
    backgrounds on a per character basis.

    Errm, I think we call this a word processor, not a text editor.

    Emacs has things called “text attributes” and “overlays”, for doing precisely this sort of thing. You can even use these things to define
    clickable buttons. Yet nobody would call Emacs a “word processor”.

    RMS did call it a word processor.

    https://lists.gnu.org/archive/html/emacs-devel/2013-11/msg00515.html

    No he didn’t: “more features are still needed” to “extend Emacs to do WYSIWYG word processing”. So he admits it’s not doing that yet.

  • From Lawrence D'Oliveiro@21:1/5 to Thomas Koenig on Tue Jun 4 01:51:31 2024
    On Sun, 2 Jun 2024 06:58:37 -0000 (UTC), Thomas Koenig wrote:

    You can buy POWER9 machines from RaptorCS. The command prompt does not
    look different from AMD64, but of course the coolness factor is much
    higher.

    Linux is Linux. There was an article on theinquirer.net (defunct now) some years ago where a guy from SGI was giving an interactive demo (remotely,
    via SSH) on a thousand-core Altix super. It still looked like a Linux
    system. Though commands like “lspci” and “lscpu” produced output that went
    on ... and on ... and on ...

  • From Lynn Wheeler@21:1/5 to Lawrence D'Oliveiro on Mon Jun 3 16:42:17 2024
    Lawrence D'Oliveiro <[email protected]d> writes:
    Remember that “mainframe workloads” are primarily I/O bound, not CPU-bound. The whole concept of a “mainframe” arose in the era when CPU time was scarce and expensive, so you had all these intelligent I/O peripherals that could be given sequences of operations to perform, with minimal CPU intervention. It was all about maximizing throughput (batch operation),
    not minimizing latency (interactive operation).

    1980, I was con'ed into helping IBM STL lab that was overcrowded and
    moving 300 people to offsite bldg with dataprocessing back to STL
    datacenter. I was asked to do channel-extender support to place "local"
    channel "attached" controllers at the remote bldg (cutting various
    protocol round-trip latencies). Part of the issue was that the mainframe
    60s era also had limited memory and so there is enormous protocol
    round-trips utilizing data back in mainframe memory.

    1988, local IBM branch asks if I could help LLNL national lab
    standardize some serial stuff they had been playing with ... which quickly
    becomes fibre-channel standard (FCS). Some time later, some IBM
    engineers become involved with FCS and define a heavy-weight protocol
    that radically cuts the native throughput ... which was eventually
    released as FICON (used for mainframe I/O, w/extensive protocol
    round-trip latencies, significant impact for even short distances at
    gbit rates).

    Most recent public benchmark that I've found is IBM z196 "Peak I/O"
    benchmark which had 104 FICON (running over 104 FCS) getting 2M IOPS.
    About the same time, a native FCS was announced for E5-2600 blade
    claiming over a million IOPS (two such FCS have higher throughput than
    104 FICON running over 104 FCS). Note IBM docs recommend that SAPs
    (CPUs dedicated for running I/O) be kept to no more than 70% CPU
    ... which would be more like 1.5M (rather than 2M) IOPS.
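    The figures above can be checked with some back-of-envelope arithmetic. This sketch assumes, as the post implies, that IOPS scale linearly with SAP utilization. It treats the "over a million" native FCS figure as a lower bound. The variable names are just labels for the post's numbers.

```python
FICON_LINKS = 104
PEAK_IOPS = 2_000_000          # z196 "Peak I/O" benchmark, per the post
NATIVE_FCS_IOPS = 1_000_000    # "over a million" for one native FCS (lower bound)
SAP_CAP = 0.70                 # recommended maximum SAP utilization, per the post

per_ficon = PEAK_IOPS / FICON_LINKS          # IOPS carried by each FICON link
capped = PEAK_IOPS * SAP_CAP                 # if IOPS scale linearly with SAP busy
links_needed = PEAK_IOPS / NATIVE_FCS_IOPS   # native FCS links to match the peak

print(f"per FICON link:            {per_ficon:,.0f} IOPS")
print(f"at 70% SAP utilization:    {capped:,.0f} IOPS")
print(f"native FCS links to match: {links_needed:.0f}")
```

    Each FICON link carries roughly 19,000 IOPS, against over a million for one native FCS. The linear 70% estimate comes to 1.4M, in the same ballpark as the post's "more like 1.5M".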

    --
    virtualization experience starting Jan1968, online at home since Mar1970

  • From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Tue Jun 4 13:11:27 2024
    Lawrence D'Oliveiro <[email protected]d> writes:
    On Fri, 31 May 2024 19:44:49 -0000 (UTC), John Levine wrote:

    All the mainframe companies apart from IBM eventually went out of business because the whole mainframe concept is obsolete.

    Is that a fact?

    https://www.unisys.com/

  • From John Levine@21:1/5 to All on Tue Jun 4 14:18:56 2024
    According to Scott Lurndal <[email protected]>:
    Lawrence D'Oliveiro <[email protected]d> writes:
    On Fri, 31 May 2024 19:44:49 -0000 (UTC), John Levine wrote:

    All the mainframe companies apart from IBM eventually went out of business because the whole mainframe concept is obsolete.

    Is that a fact?

    https://www.unisys.com/

    I think that IBM is the only one that still makes CPUs. Aren't the
    Unisys machines all emulated on commodity microprocessors now?

    That doesn't keep them from working perfectly well, of course.

    --
    Regards,
    John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Scott Lurndal@21:1/5 to John Levine on Tue Jun 4 14:55:39 2024
    John Levine <[email protected]> writes:
    According to Scott Lurndal <[email protected]>:
    Lawrence D'Oliveiro <[email protected]d> writes:
    On Fri, 31 May 2024 19:44:49 -0000 (UTC), John Levine wrote:

    All the mainframe companies apart from IBM eventually went out of business because the whole mainframe concept is obsolete.

    Is that a fact?

    https://www.unisys.com/

    I think that IBM is the only one that still makes CPUs. Aren't the
    Unisys machines all emulated on commodity microprocessors now?

    Yes, although many of the custom CMOS systems are still operational.


    That doesn't keep them from working perfectly well, of course.

    Indeed.

  • From Stefan Monnier@21:1/5 to All on Tue Jun 4 16:03:39 2024
    For text editors, this is one of the few cases it makes sense to use 32 or
    64 bit characters (say, combining the 'character' with some additional metadata such as formatting).

    Even just 64bit is very tight to encode all the information in an emoji.
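    Here is a concrete version of the 64-bit cell idea. A Unicode code point needs only 21 bits, which leaves 43 bits for per-character metadata. The split below is entirely hypothetical: a 16-bit style index, a 16-bit colour index and 11 flag bits. It also illustrates Stefan's point. A cell holds one code point, so a multi-code-point emoji such as a ZWJ family sequence needs several cells.

```python
CP_BITS, STYLE_BITS, COLOR_BITS, FLAG_BITS = 21, 16, 16, 11   # hypothetical split
STYLE_SHIFT = CP_BITS
COLOR_SHIFT = STYLE_SHIFT + STYLE_BITS
FLAG_SHIFT = COLOR_SHIFT + COLOR_BITS
assert FLAG_SHIFT + FLAG_BITS == 64


def _mask(bits: int) -> int:
    return (1 << bits) - 1


def _field(value: int, bits: int, name: str) -> int:
    if not 0 <= value <= _mask(bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return value


def pack_cell(cp: int, style: int = 0, color: int = 0, flags: int = 0) -> int:
    """Pack one code point plus metadata into a 64-bit integer."""
    return (_field(cp, CP_BITS, "cp")
            | _field(style, STYLE_BITS, "style") << STYLE_SHIFT
            | _field(color, COLOR_BITS, "color") << COLOR_SHIFT
            | _field(flags, FLAG_BITS, "flags") << FLAG_SHIFT)


def unpack_cell(cell: int) -> tuple[int, int, int, int]:
    """Inverse of pack_cell: (code point, style, color, flags)."""
    return (cell & _mask(CP_BITS),
            cell >> STYLE_SHIFT & _mask(STYLE_BITS),
            cell >> COLOR_SHIFT & _mask(COLOR_BITS),
            cell >> FLAG_SHIFT & _mask(FLAG_BITS))
```

    The family emoji "\U0001F468\u200d\U0001F469\u200d\U0001F467" is five code points, counting the zero-width joiners. It would therefore occupy five cells, and the renderer has to reassemble it.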

    Though, one thing that makes sense for text editors is if only the
    "currently being edited" lines are fully unpacked, whereas the others can remain in a more compact form (such as UTF-8), and are then unpacked as they come into view (say, treating the editor window as a 32-entry modulo cache
    or similar).

    You sufficiently rarely need to care about "character boundaries" that
    such encoding/decoding is probably not worthwhile (especially if you
    consider the case of multi-MB lines).

    It's easy enough to move through UTF-8 itself.
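    Moving through UTF-8 directly is cheap because every continuation byte has the bit pattern 10xxxxxx. A cursor can therefore step to the next or previous code point boundary by skipping those bytes. Here is a minimal sketch with made-up function names. It works on code point boundaries only. Grapheme clusters, such as emoji with modifiers, would need Unicode segmentation rules on top.

```python
def is_continuation(b: int) -> bool:
    """True for UTF-8 continuation bytes (10xxxxxx)."""
    return (b & 0xC0) == 0x80


def next_boundary(buf: bytes, i: int) -> int:
    """Byte index of the code point boundary after position i (requires i < len(buf))."""
    i += 1
    while i < len(buf) and is_continuation(buf[i]):
        i += 1
    return i


def prev_boundary(buf: bytes, i: int) -> int:
    """Byte index of the code point boundary before position i (requires i > 0)."""
    i -= 1
    while i > 0 and is_continuation(buf[i]):
        i -= 1
    return i
```

    Each step examines at most four bytes, regardless of line length, so no decoded copy of a multi-MB line is needed just to move the cursor.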

    Not entirely sure how other text editors manage things here, not really looked into it.

    Several different options.
    Emacs uses a gap buffer, which is a quite primitive approach which in
    theory has poor worst case behavior but works surprisingly well in
    practice (especially with the speed at which current CPUs can copy/move
    large chunks of memory).
    Others use structures like ropes.

    https://coredumped.dev/2023/08/09/text-showdown-gap-buffers-vs-ropes/
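    For comparison with ropes, a gap buffer is small enough to sketch in full. The text lives in one array with an unused "gap" at the editing point. Inserting there is amortized O(1), and moving the gap costs a copy proportional to the distance moved. This is a minimal illustrative Python version, not Emacs's C implementation, which also handles markers, text properties and multibyte text.

```python
class GapBuffer:
    """Minimal gap buffer: text is buf[:start] + buf[end:]; buf[start:end] is the gap."""

    def __init__(self, text: str = "", gap: int = 16):
        self.buf = list(text) + [None] * gap
        self.start = len(text)
        self.end = len(self.buf)

    def _grow(self) -> None:
        extra = max(16, len(self.buf))             # roughly double the capacity
        self.buf[self.end:self.end] = [None] * extra
        self.end += extra

    def move_to(self, pos: int) -> None:
        """Move the gap so it starts at text position pos (copies only the distance moved)."""
        if pos < self.start:                       # shift text [pos, start) to just before end
            n = self.start - pos
            self.buf[self.end - n:self.end] = self.buf[pos:self.start]
            self.start, self.end = pos, self.end - n
        elif pos > self.start:                     # shift text after the gap down into it
            n = pos - self.start
            self.buf[self.start:self.start + n] = self.buf[self.end:self.end + n]
            self.start, self.end = pos, self.end + n

    def insert(self, pos: int, s: str) -> None:
        self.move_to(pos)
        for ch in s:
            if self.start == self.end:
                self._grow()
            self.buf[self.start] = ch
            self.start += 1

    def delete(self, pos: int, n: int) -> None:
        """Delete n characters starting at pos by widening the gap."""
        self.move_to(pos)
        self.end += n

    def text(self) -> str:
        return "".join(self.buf[:self.start] + self.buf[self.end:])
```

    Edits tend to cluster near the cursor, so the gap rarely moves far. That is a plausible reason the worst case seldom shows up in practice, as Stefan notes.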


    Stefan

  • From Stefan Monnier@21:1/5 to All on Tue Jun 4 16:58:06 2024
    Bottom line: Code point conversion instructions like CU14 solve a
    problem that people imagine who have no experience working with UTF-8.
    The original instructions were CU12 and CU21 which convert between
    UTF-8 and UTF-16. That really is useful, e.g., read a file of UTF-8
    into a program in Java or Javascript which uses UTF-16. I agree the
    UTF-32 versions added in zseries are less likely to be useful.

    It's all really instances of the same: conversion between UTF-N1 and
    UTF-N2 is only ever worthwhile if you receive something using UTF-N1
    and you have to return something that uses UTF-N2.

    If your task is described at a higher level and you're not constrained
    by some arbitrary choices in intermediate APIs then you're almost always
    better off working straight from the encoding you receive.
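    Substring search is one concrete case of working straight from the received encoding. UTF-8 is self-synchronizing, so a byte-level search for a valid UTF-8 needle in valid UTF-8 text can only match at code point boundaries. No conversion is needed. This small Python illustration uses only built-in `bytes` methods, and `find_all` is a made-up name.

```python
def find_all(haystack: bytes, needle: bytes) -> list[int]:
    """Byte offsets of every occurrence of needle in haystack, both UTF-8.

    No decoding is needed: a lead byte (0xxxxxxx or 11xxxxxx) never equals
    a continuation byte (10xxxxxx), so a match cannot start mid-character.
    """
    hits = []
    i = haystack.find(needle)
    while i != -1:
        hits.append(i)
        i = haystack.find(needle, i + 1)
    return hits
```

    If a character index is ever needed, it can be computed on demand. For example, `len(text[:off].decode())` converts byte offset `off` to a code point count.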


    Stefan

  • From Stefan Monnier@21:1/5 to All on Tue Jun 4 17:01:16 2024
    If you make such a strong statement, I assume that you have done a
    thorough analysis of this feature for typical mainframe workloads and
    can support your claims with benchmarks.
    We already know the answer to that. It’s why RISC has taken over the computing world.

    🙂


    Stefan

  • From Lawrence D'Oliveiro@21:1/5 to Stefan Monnier on Tue Jun 4 23:30:45 2024
    On Tue, 04 Jun 2024 16:03:39 -0400, Stefan Monnier wrote:

    Emacs uses a gap buffer, which is a quite primitive approach which in
    theory has poor worst case behavior but works surprisingly well in
    practice (especially with the speed at which current CPUs can copy/move
    large chunks of memory).
    Others use structures like ropes.

    https://coredumped.dev/2023/08/09/text-showdown-gap-buffers-vs-ropes/

    Interesting. Most of the article seems to be about constructing
    benchmarks, measuring them, discovering that gap buffers are just as good
    if not the best, and then trying to handwave that away before rinsing and repeating.

  • From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Tue Jun 4 23:25:18 2024
    On Tue, 04 Jun 2024 13:11:52 GMT, Scott Lurndal wrote:

    Lawrence D'Oliveiro <[email protected]d> writes:

    On Sat, 1 Jun 2024 07:47:49 -0000 (UTC), Thomas Koenig wrote:

    One of the main selling points [of zSeries] is the hardware
    reliability ...

    Quite an expensive way to get reliability. How does an outfit like
    Google achieve essentially 0% downtime? By running a swarm of half a million commodity servers, that’s how.

    And that's not expensive?

    Consider the equivalent number of mainframes, with their inbuilt
    diagnostics capabilities etc, to match that reliability.

  • From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Tue Jun 4 23:56:46 2024
    Lawrence D'Oliveiro <[email protected]d> writes:
    On Tue, 04 Jun 2024 13:11:52 GMT, Scott Lurndal wrote:

    Lawrence D'Oliveiro <[email protected]d> writes:

    On Sat, 1 Jun 2024 07:47:49 -0000 (UTC), Thomas Koenig wrote:

    One of the main selling points [of zSeries] is the hardware
    reliability ...

    Quite an expensive way to get reliability. How does an outfit like
    Google achieve essentially 0% downtime? By running a swarm of half a million commodity servers, that’s how.

    And that's not expensive?

    Consider the equivalent number of mainframes, with their inbuilt
    diagnostics capabilities etc, to match that reliability.

    Tandem and Stratus did it three decades ago.

  • From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Wed Jun 5 04:10:52 2024
    On Tue, 04 Jun 2024 23:56:46 GMT, Scott Lurndal wrote:

    Lawrence D'Oliveiro <[email protected]d> writes:

    On Tue, 04 Jun 2024 13:11:52 GMT, Scott Lurndal wrote:

    Lawrence D'Oliveiro <[email protected]d> writes:

    On Sat, 1 Jun 2024 07:47:49 -0000 (UTC), Thomas Koenig wrote:

    One of the main selling points [of zSeries] is the hardware
    reliability ...

    Quite an expensive way to get reliability. How does an outfit like Google achieve essentially 0% downtime? By running a swarm of half a million commodity servers, that’s how.

    And that's not expensive?

    Consider the equivalent number of mainframes, with their inbuilt diagnostics capabilities etc, to match that reliability.

    Tandem and Stratus did it three decades ago.

    At a high cost.

  • From John Dallman@21:1/5 to Michael S on Wed Jun 5 09:40:00 2024
    In article <[email protected]>, [email protected] (Michael S) wrote:

    [email protected] (John Dallman) wrote:
    In article <[email protected]>, [email protected] (Anton Ertl) wrote:
    SPARCs are big-endian and trap on unaligned access (at least
    that was the case when I last used one long ago), while S/370
    ff. does not trap on unaligned access.
    OK, that shoots down S/370 for this job.
    What exactly is the job?
    Is it for pure personal amusement, or are there practical needs?

    I would like to keep testing the commercial product I work on in a
    big-endian, alignment-trapping environment. However, there isn't much
    budget available for this. We have a SPARC box doing it, left over from
    when we actually supported Solaris, but as testing grows, its CPU power
    becomes less adequate for the job.

    New SPARC boxes are expensive, dealing with Oracle is hard work, and the architecture has no future.

    I've never been very serious about using Linux on IBM Z for this - it's expensive and dealing with IBM is hard work, although the architecture
    still seems to have a future - but if it doesn't trap misaligned accesses,
    it's disqualified.
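    Software can approximate part of what a big-endian, alignment-trapping test box catches. The helper below is hypothetical, an illustration rather than a substitute for running on SPARC or similar hardware. It reads an integer field with an explicit byte order, so the result never depends on the host. It also raises on any access that is not naturally aligned, mimicking a strict-alignment trap. Alignment is checked relative to the start of the buffer, which assumes the buffer itself is suitably aligned.

```python
import struct

FORMATS = {1: "B", 2: "H", 4: "I", 8: "Q"}


class AlignmentFault(Exception):
    """Raised where strict-alignment hardware would trap (e.g. SIGBUS)."""


def load_uint(buf: bytes, offset: int, size: int, big_endian: bool = True) -> int:
    """Read an unsigned integer of `size` bytes at `offset` with explicit byte order."""
    if offset % size:                              # natural-alignment check
        raise AlignmentFault(f"{size}-byte load at offset {offset}")
    fmt = (">" if big_endian else "<") + FORMATS[size]
    return struct.unpack_from(fmt, buf, offset)[0]
```

    This only catches accesses routed through the helper. Pointer casts in C or C++ code still need real hardware, or a tool such as a sanitizer, to expose.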

    John

  • From John Dallman@21:1/5 to D'Oliveiro on Wed Jun 5 09:40:00 2024
    In article <v3lqfs$48om$[email protected]>, [email protected]d (Lawrence
    D'Oliveiro) wrote:

    . . . there were other entirely separate companies, like CDC
    where Seymour Cray invented the concept of the _supercomputer_,
    much to the surprise of his upper management who just wanted
    to sell _business_ machines.

    Another view is that the supercomputer was implicit in the needs of the
    US nuclear weapons laboratories to do simulations of their designs.
    Computers are much cheaper than nuclear testing.

    John

  • From Michael S@21:1/5 to John Dallman on Wed Jun 5 12:49:57 2024
    On Wed, 5 Jun 2024 09:40 +0100 (BST)
    [email protected] (John Dallman) wrote:

    In article <[email protected]>,
    [email protected] (Michael S) wrote:

    [email protected] (John Dallman) wrote:
    In article <[email protected]>, [email protected] (Anton Ertl) wrote:
    SPARCs are big-endian and trap on unaligned access (at least
    that was the case when I last used one long ago), while S/370
    ff. does not trap on unaligned access.
    OK, that shoots down S/370 for this job.
    What exactly is the job?
    Is it for pure personal amusement, or are there practical needs?

    I would like to keep testing the commercial product I work on in a big-endian, alignment-trapping environment.

    Maybe now is the time to stop wanting to keep it?
    If I were you, I'd stop caring not only about a big-endian
    alignment-trapping environment, but about any alignment-trapping
    environment.

    However, there isn't much
    budget available for this. We have a SPARC box doing it, left over
    from when we actually supported Solaris, but as testing grows, its
    CPU power becomes less adequate for the job.

    New SPARC boxes are expensive, dealing with Oracle is hard work, and
    the architecture has no future.

    I've never been very serious about using Linux on IBM Z for this -
    it's expensive and dealing with IBM is hard work, although the
    architecture still seems to have a future - but if it doesn't trap
    misaligned accesses, it's disqualified.

    John

    One of the reasons it has a future is that it doesn't trap
    misaligned accesses.

  • From Michael S@21:1/5 to John Dallman on Wed Jun 5 13:07:39 2024
    On Wed, 5 Jun 2024 09:40 +0100 (BST)
    [email protected] (John Dallman) wrote:

    In article <v3lqfs$48om$[email protected]>, [email protected]d (Lawrence D'Oliveiro) wrote:

    . . . there were other entirely separate companies, like CDC
    where Seymour Cray invented the concept of the _supercomputer_,
    much to the surprise of his upper management who just wanted
    to sell _business_ machines.

    Another view is that the supercomputer was implicit in the needs of
    the US nuclear weapons laboratories to do simulations of their
    designs. Computers are much cheaper than nuclear testing.

    John

    Another view is that Lawrence D'Oliveiro made it up.
    Reading the Wikipedia article, it looks like CDC never had much of a
    "business machines" business. What they had was a "business machine peripherals" business and a government/scientific machines business. They
    also offered public cloud services, but that part of the company was
    losing money earned by other divisions.

  • From Anton Ertl@21:1/5 to John Dallman on Wed Jun 5 10:32:25 2024
    [email protected] (John Dallman) writes:
    I would like to keep testing the commercial product I work on in a big-endian, alignment-trapping environment.

    Computer architecture exhibits convergence. Starting in the 1960s it
    converged on byte addressing with 8-bit bytes and on 2s-complement,
    starting in the 1980s it converged on IEEE FP, and ending in the 2010s
    it converged on supporting unaligned accesses and on little-endian
    byte order. Your difficulties in getting hardware for testing whether
    software can work with alignment restrictions and with big-endian byte
    order are a result of that convergence. Maybe your desire to keep your
    software ready for big-endian hardware and hardware with alignment
    restrictions is misguided.
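    The two properties under discussion (byte order and alignment traps) can be made concrete with a small Python sketch. The helper `read_u32` is hypothetical, not from any library: it decodes a 32-bit field with an explicit big-endian byte order, so the result does not depend on the host's byte order, and its `strict` flag rejects misaligned offsets the way a trapping SPARC load would. Python itself never traps on misalignment, so the flag only emulates that check.

```python
import struct
import sys

def read_u32(buf: bytes, offset: int, strict: bool = False) -> int:
    """Decode an unsigned 32-bit big-endian integer at `offset`.

    Hypothetical helper for illustration. With strict=True it rejects
    offsets that are not a multiple of 4, emulating a CPU that traps on
    misaligned loads; Python itself never traps on misalignment.
    """
    if strict and offset % 4 != 0:
        raise ValueError(f"misaligned 4-byte load at offset {offset}")
    # '>' means big-endian, standard size, no alignment padding, so the
    # result is identical on little-endian and big-endian hosts.
    (value,) = struct.unpack_from(">I", buf, offset)
    return value

data = bytes([0xAA, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07])

print("host byte order:", sys.byteorder)
print(hex(read_u32(data, 4)))                     # 0x4050607 (aligned)
print(hex(read_u32(data, 1)))                     # 0x1020304 (misaligned, allowed)
print(hex(struct.unpack_from("<I", data, 1)[0]))  # 0x4030201: same bytes, little-endian
try:
    read_u32(data, 1, strict=True)
except ValueError as exc:
    print("trapped:", exc)
```

    A check like this only covers code paths that go through such a helper, so it is at best a partial substitute for real big-endian, alignment-trapping hardware.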

    New SPARC boxes are expensive, dealing with Oracle is hard work, and the architecture has no future.

    Ebay?

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <[email protected]>

  • From Scott Lurndal@21:1/5 to Michael S on Wed Jun 5 13:20:16 2024
    Michael S <[email protected]> writes:
    On Wed, 5 Jun 2024 09:40 +0100 (BST)
    [email protected] (John Dallman) wrote:

    In article <[email protected]>,
    [email protected] (Michael S) wrote:

    [email protected] (John Dallman) wrote:
    In article <[email protected]>,
    [email protected] (Anton Ertl) wrote:
    SPARCs are big-endian and trap on unaligned access (at least
    that was the case when I last used one long ago), while S/370
    ff. does not trap on unaligned access.
    OK, that shoots down S/370 for this job.
    What exactly is the job?
    Is it for pure personal amusement, or are there practical needs?

    I would like to keep testing the commercial product I work on in a
    big-endian, alignment-trapping environment.

    Maybe now is the time to stop wanting to keep it?

    Or he can use an ARM64 chip. They can be configured to
    trap all unaligned accesses and can be configured to run
    in big-endian mode.

    It's pretty easy to build a big-endian Linux for it.

  • From MitchAlsup1@21:1/5 to Anton Ertl on Wed Jun 5 17:00:09 2024
    Anton Ertl wrote:

    [email protected] (John Dallman) writes:
    I would like to keep testing the commercial product I work on in a big-endian, alignment-trapping environment.

    Computer architecture exhibits convergence. Starting in the 1960s it converged on byte addressing with 8-bit bytes and on 2s-complement,
    starting in the 1980s it converged on IEEE FP, and ending in the 2010s

    Although we did not converge on doing denorms properly until the
    mid-2000s.

    GPUs followed a more meandering path:: starting out with crappy but
    fast FP, then adopting IEEE containers, then over several generations
    adopting more and more of IEEE 754 semantics.

    Then there are the SW (and a few HW) holdouts that still believe that
    denorms are hard/slow and that we need mechanisms to flush them from the
    numerics. No, we don't; we need circuitry in which denorms are no slower
    than norms, without having slowed down the norms.
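    Why flushing denorms matters can be shown in Python, whose floats are IEEE 754 binary64 on mainstream platforms. With gradual underflow, subtracting two distinct tiny normal numbers still gives a nonzero (subnormal) result. Python has no flush-to-zero switch, so `flush_to_zero` below is a hypothetical software emulation, assuming FTZ simply replaces a subnormal result with a zero of the same sign.

```python
import math
import sys

SMALLEST_NORMAL = sys.float_info.min   # 2**-1022 for IEEE 754 binary64
SMALLEST_SUBNORMAL = math.ulp(0.0)     # 2**-1074 (math.ulp needs Python 3.9+)

def is_subnormal(x: float) -> bool:
    """True for nonzero finite values below the smallest normal magnitude."""
    return x != 0.0 and abs(x) < SMALLEST_NORMAL

def flush_to_zero(x: float) -> float:
    """Hypothetical software emulation of a flush-to-zero (FTZ) mode."""
    return math.copysign(0.0, x) if is_subnormal(x) else x

x = 1.5 * SMALLEST_NORMAL
y = SMALLEST_NORMAL
diff = x - y                           # exactly 2**-1023, a subnormal

print(x != y, diff != 0.0)             # True True with gradual underflow
print(flush_to_zero(diff) == 0.0)      # True: FTZ makes x - y == 0 although x != y
```

    With gradual underflow, x - y == 0 holds for finite x and y only when x == y; flushing subnormals breaks that guarantee, which is one reason to want fast denorm hardware rather than FTZ modes.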

    it converged on supporting unaligned accesses and on little-endian
    byte order. Your difficulties in getting hardware for testing whether software can work with alignment restrictions and with big-endian byte
    order are a result of that convergence. Maybe your desire to keep your software ready for big-endian hardware and hardware with alignment restrictions is misguided.

    New SPARC boxes are expensive, dealing with Oracle is hard work, and the architecture has no future.

    Ebay?

    - anton

  • From Lawrence D'Oliveiro@21:1/5 to John Dallman on Thu Jun 6 01:23:57 2024
    On Wed, 5 Jun 2024 09:40 +0100 (BST), John Dallman wrote:

    Another view is that the supercomputer was implicit in the needs of the
    US nuclear weapons laboratories to do simulations of their designs.

    And in code cracking. All very much a function of the Cold War.

    No coincidence that Cray’s fortunes took a downturn when that ended.

  • From MitchAlsup1@21:1/5 to Lawrence D'Oliveiro on Thu Jun 6 02:42:20 2024
    Lawrence D'Oliveiro wrote:

    On Wed, 5 Jun 2024 09:40 +0100 (BST), John Dallman wrote:

    Another view is that the supercomputer was implicit in the needs of the
    US nuclear weapons laboratories to do simulations of their designs.

    And in code cracking. All very much a function of the Cold War.

    No coincidence that Cray’s fortunes took a downturn when that ended.

    Cray sold the first CRAY-1 for $60M; this is what the nuclear physicists
    could afford, and that sale wrote off the entire development costs.

  • From George Neuner@21:1/5 to [email protected] on Thu Jun 6 00:42:43 2024
    On Tue, 4 Jun 2024 23:25:18 -0000 (UTC), Lawrence D'Oliveiro
    <[email protected]d> wrote:

    On Tue, 04 Jun 2024 13:11:52 GMT, Scott Lurndal wrote:

    Lawrence D'Oliveiro <[email protected]d> writes:

    On Sat, 1 Jun 2024 07:47:49 -0000 (UTC), Thomas Koenig wrote:

    One of the main selling points [of zSeries] is the hardware
    reliability ...

    Quite an expensive way to get reliability. How does an outfit like
    Google achieve essentially 0% downtime? By running a swarm of half a
    million commodity servers, that’s how.

    And that's not expensive?

    Consider the equivalent number of mainframes, with their inbuilt
    diagnostics capabilities etc, to match that reliability.

    Can't find it now and don't remember many details, but ...

    A long time ago, there was a story going around about Microsoft vs IBM regarding the day-to-day operation of their company web sites. It
    claimed that Microsoft was running a ~1000 machine server farm with a
    crew of ~100, whereas IBM was running 3 mainframes with a crew of ~10.
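    The commodity-swarm argument quoted above rests on redundancy arithmetic. A minimal sketch, assuming independent server failures and a hypothetical per-server availability `a`: a service that stays up while at least one of n replicas is up has availability 1 - (1 - a)^n.

```python
def service_availability(per_server: float, replicas: int) -> float:
    """Probability that at least one of `replicas` servers is up.

    Assumes failures are independent, which real fleets only approximate
    (shared power, network, or software bugs cause correlated outages).
    """
    if not 0.0 <= per_server <= 1.0:
        raise ValueError("per_server must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    return 1.0 - (1.0 - per_server) ** replicas

# Hypothetical 99%-available commodity servers, for illustration only.
for n in (1, 2, 3, 5):
    print(n, "replicas:", service_availability(0.99, n))
```

    The independence assumption carries most of the weight here; correlated failures shrink the benefit considerably, and the model says nothing about staffing costs.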

  • From Lawrence D'Oliveiro@21:1/5 to All on Thu Jun 6 07:50:48 2024
    On Thu, 6 Jun 2024 02:42:20 +0000, MitchAlsup1 wrote:

    Cray sold the first CRAY-1 for $60M; this is what the nuclear physicists
    could afford, and that sale wrote off the entire development costs.

    I think the Cray-1 line was the only product family from Cray Research/
    Cray Computer that made money. I don’t think the Cray-2 machines were profitable; only two (I think) Cray-3 units were built, and Seymour gave
    them both away; and the Cray-4 was never finished.

  • From John Levine@21:1/5 to All on Thu Jun 6 11:55:22 2024
    According to George Neuner <[email protected]>:
    Consider the equivalent number of mainframes, with their inbuilt diagnostics capabilities etc, to match that reliability.

    Can't find it now and don't remember many details, but ...

    A long time ago, there was a story going around about Microsoft vs IBM regarding the day-to-day operation of their company web sites. It
    claimed that Microsoft was running a ~1000 machine server farm with a
    crew of ~100, whereas IBM was running 3 mainframes with a crew of ~10.

    It depends on what you want to do.

    If you're doing something that is mostly read-only and easy to
    parallelize, then it makes sense to use a farm of cheap PCs. But if
    you are a bank or an airline, you need to be able to lock your
    database so that you debit a bank account or sell a plane seat exactly
    once. There is a rule of thumb that the cost of locking something
    grows roughly as the square of the number of things contending for
    the lock.

    For example, airline reservation systems are the classic example of a
    mainframe database. About 25 years ago, ITA Software had the bright
    idea to do searches for seats and prices on racks of cheap PCs, which
    worked great since it's read only, and if they suggest a seat or fare
    that turns out to have just sold out, too bad, try again. But when
    travel agents and airlines used it, they kept the ticketing info in a
    regular database because it has to work.
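    The split described above, with lock-free search and a locked ticketing step, can be sketched as a hypothetical in-memory model. It is not how ITA Software or any airline actually implemented it: `search` reads a possibly stale count without locking, while `book` takes a lock and re-checks availability, so each seat is sold exactly once and a stale suggestion simply fails with "try again".

```python
import threading

class SeatInventory:
    """Hypothetical model: lock-free (possibly stale) search, locked booking."""

    def __init__(self, seats: int) -> None:
        self._available = seats
        self._lock = threading.Lock()

    def search(self) -> int:
        # Read-only path: no lock, so the answer may be stale by the time
        # the customer acts on it. That is acceptable for search results.
        return self._available

    def book(self) -> bool:
        # Update path: re-check under the lock so a seat is sold exactly once.
        with self._lock:
            if self._available == 0:
                return False          # sold out: "too bad, try again"
            self._available -= 1
            return True

inventory = SeatInventory(seats=10)
results = []

def customer() -> None:
    if inventory.search() > 0:        # suggestion from the read-only tier
        results.append(inventory.book())

threads = [threading.Thread(target=customer) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("sold:", results.count(True), "rejected:", results.count(False))
print("left:", inventory.search())
```

    However the 50 customers interleave, exactly 10 bookings succeed; only the number of rejected late attempts varies from run to run.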

    --
    Regards,
    John Levine, [email protected], Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Scott Lurndal@21:1/5 to George Neuner on Thu Jun 6 13:49:48 2024
    George Neuner <[email protected]> writes:
    On Tue, 4 Jun 2024 23:25:18 -0000 (UTC), Lawrence D'Oliveiro
    <[email protected]d> wrote:

    On Tue, 04 Jun 2024 13:11:52 GMT, Scott Lurndal wrote:

    Lawrence D'Oliveiro <[email protected]d> writes:

    On Sat, 1 Jun 2024 07:47:49 -0000 (UTC), Thomas Koenig wrote:

    One of the main selling points [of zSeries] is the hardware
    reliability ...

    Quite an expensive way to get reliability. How does an outfit like
    Google achieve essentially 0% downtime? By running a swarm of half a
    million commodity servers, that’s how.

    And that's not expensive?

    Consider the equivalent number of mainframes, with their inbuilt diagnostics capabilities etc, to match that reliability.

    Can't find it now and don't remember many details, but ...

    A long time ago, there was a story going around about Microsoft vs IBM regarding the day-to-day operation of their company web sites. It
    claimed that Microsoft was running a ~1000 machine server farm with a
    crew of ~100, whereas IBM was running 3 mainframes with a crew of ~10.

    In 2010, when the City of Santa Ana decommissioned their Unisys V380[*],
    they replaced it with 21 Windows servers. At the time, the V380
    had been running production for almost thirty years.

    [*] Penultimate descendant of the Burroughs B3500.

  • From Lynn Wheeler@21:1/5 to George Neuner on Thu Jun 6 09:11:06 2024
    George Neuner <[email protected]> writes:

    On Tue, 4 Jun 2024 23:25:18 -0000 (UTC), Lawrence D'Oliveiro
    <[email protected]d> wrote:

    On Tue, 04 Jun 2024 13:11:52 GMT, Scott Lurndal wrote:

    Lawrence D'Oliveiro <[email protected]d> writes:

    On Sat, 1 Jun 2024 07:47:49 -0000 (UTC), Thomas Koenig wrote:

    One of the main selling points [of zSeries] is the hardware
    reliability ...

    Quite an expensive way to get reliability. How does an outfit like
    Google achieve essentially 0% downtime? By running a swarm of half a
    million commodity servers, that’s how.

    And that's not expensive?

    Consider the equivalent number of mainframes, with their inbuilt diagnostics capabilities etc, to match that reliability.

    Can't find it now and don't remember many details, but ...

    A long time ago, there was a story going around about Microsoft vs IBM regarding the day-to-day operation of their company web sites. It
    claimed that Microsoft was running a ~1000 machine server farm with a
    crew of ~100, whereas IBM was running 3 mainframes with a crew of ~10.

    Microsoft had hundreds of millions of customers who were more
    Internet-oriented, while IBM had thousands of customers who were much less
    Internet-oriented (and whose rate of changing information was much lower) ...
    and the IBM number may have covered only the web operation, as opposed to
    total support people.

    In Jan1979, I was conned into doing a benchmark for a national lab that was
    looking at getting 70 4341s for a compute farm (sort of the leading edge of
    the coming cluster supercomputing tsunami). 4341s were also selling into
    the same mid-range market as VAX, and in about the same numbers for small
    unit orders. The big difference was that large companies were ordering
    hundreds of vm/4341s at a time for deployment out into departmental areas
    (sort of the leading edge of the coming distributed computing tsunami).

    The IBM batch system (MVS) was looking at the exploding distributed
    computing market. The first problem was that the only disk product for
    non-datacenter environments was FBA (fixed-block architecture), and MVS
    only supported CKD (count-key-data). Eventually, CKD simulation was made
    available on FBA disks (no CKD disks have been made for decades; all CKD
    is now simulated on industry-standard fixed-block disks). It didn't do
    MVS much good, because distributed operation was looking at dozens of
    systems per support person, while MVS still required dozens of support
    people per system.

    Admittedly a 14-year-old comparison: a max-configured z196 mainframe
    benchmarked at 50BIPS ... still dozens of support people. The equivalent
    cloud megadatacenter was half a million or more E5-2600 blades, each
    benchmarked at 500BIPS, with enormous automation requiring 70-80 support
    people (per megadatacenter: at least 6000-7000 systems per person, and
    each system ten times a max-configured mainframe) ... also, the
    megadatacenter comparison was Linux (not Windows).
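    The ratios in this comparison can be cross-checked with a few lines of arithmetic. The inputs are the figures as quoted (50BIPS per max-configured z196, 500BIPS per E5-2600 blade, half a million blades, 70-80 support people); the calculation only restates them and adds no new data.

```python
Z196_BIPS = 50            # max-configured z196, as quoted
BLADE_BIPS = 500          # one E5-2600 blade, as quoted
BLADES = 500_000          # "half million or more" blades
STAFF_RANGE = (70, 80)    # support people per megadatacenter, as quoted

print("blade vs z196:", BLADE_BIPS / Z196_BIPS)   # -> 10.0
for staff in STAFF_RANGE:
    print(staff, "staff ->", BLADES / staff, "systems per person")
print("aggregate BIPS:", BLADES * BLADE_BIPS)
print("z196 equivalents:", BLADES * BLADE_BIPS // Z196_BIPS)
```

    With 70-80 staff this works out to roughly 6,250-7,140 blades per person, in line with the "6000-7000 systems per person" figure, and the whole megadatacenter corresponds to about five million max-configured z196s by this BIPS measure.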

    --
    virtualization experience starting Jan1968, online at home since Mar1970

  • From OrangeFish@21:1/5 to George Neuner on Thu Jun 6 16:24:03 2024
    On 2024-06-06 00:42, George Neuner wrote:
    On Tue, 4 Jun 2024 23:25:18 -0000 (UTC), Lawrence D'Oliveiro
    <[email protected]d> wrote:

    On Tue, 04 Jun 2024 13:11:52 GMT, Scott Lurndal wrote:

    Lawrence D'Oliveiro <[email protected]d> writes:

    On Sat, 1 Jun 2024 07:47:49 -0000 (UTC), Thomas Koenig wrote:

    One of the main selling points [of zSeries] is the hardware
    reliability ...

    Quite an expensive way to get reliability. How does an outfit like
    Google achieve essentially 0% downtime? By running a swarm of half a
    million commodity servers, that’s how.

    And that's not expensive?

    Consider the equivalent number of mainframes, with their inbuilt
    diagnostics capabilities etc, to match that reliability.

    Can't find it now and don't remember many details, but ...

    A long time ago, there was a story going around about Microsoft vs IBM regarding the day-to-day operation of their company web sites. It
    claimed that Microsoft was running a ~1000 machine server farm with a
    crew of ~100, whereas IBM was running 3 mainframes with a crew of ~10.

    Not the story but this reminds me of Microsoft Scalability Day: https://www.cnet.com/tech/tech-industry/scalability-day-falls-short/

    OF.

  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Fri Jun 7 02:26:01 2024
    On Thu, 6 Jun 2024 11:55:22 -0000 (UTC), John Levine wrote:

    If you're doing something that is mostly read-only and easy to
    parallelize, then it makes sense to use a farm of cheap PCs. But if you
    are a bank or an airline, you need to be able to lock your database so
    that you debit a bank account or sell a plane seat exactly once. There
    is a rule of thumb that the cost of locking something grows roughly as
    the square of the number of things contending for the lock.

    Remember that the number of users actually buying a product at any given
    time is only a small proportion (say 1%) of the number of users currently accessing the site.

    So, by that same square law, the locking problem is only 1/10,000 as bad
    as one might think.
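    The 1/10,000 figure follows directly from the square-law rule of thumb. A minimal sketch, assuming locking cost is modeled as k·n² for n contenders (k an arbitrary constant) and using a hypothetical site load:

```python
def lock_cost(contenders: float, k: float = 1.0) -> float:
    """Rule-of-thumb locking cost: grows as the square of the contenders."""
    return k * contenders ** 2

active_users = 10_000          # hypothetical site load, for illustration
buying_fraction = 0.01         # the "say 1%" from the post

naive = lock_cost(active_users)                      # everyone contends
actual = lock_cost(active_users * buying_fraction)   # only buyers contend
print(actual / naive)          # about 1e-4, i.e. 1/10,000
```

    The ratio is simply f², independent of site size, so the conclusion stands or falls with the 1% assumption.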

  • From Lawrence D'Oliveiro@21:1/5 to All on Fri Jun 7 02:19:46 2024
    A long time ago, there was a story going around about Microsoft vs IBM regarding the day-to-day operation of their company web sites. It
    claimed that Microsoft was running a ~1000 machine server farm with a
    crew of ~100, whereas IBM was running 3 mainframes with a crew of ~10.

    Those mainframes were probably running Linux.

    Not sure why a comparison with servers running Windows is relevant to the
    point I was making, anyway.

  • From Stephen Fuld@21:1/5 to Lawrence D'Oliveiro on Fri Jun 7 03:13:54 2024
    Lawrence D'Oliveiro wrote:

    On Thu, 6 Jun 2024 11:55:22 -0000 (UTC), John Levine wrote:

    If you're doing something that is mostly read-only and easy to
    parallelize, then it makes sense to use a farm of cheap PCs. But if
    you are a bank or an airline, you need to be able to lock your
    database so that you debit a bank account or sell a plane seat
    exactly once. There is a rule of thumb that the cost of locking
    something grows roughly as the square of the number of things
    contending for the lock.

    Remember that the number of users actually buying a product at any
    given time is only a small proportion (say 1%) of the number of users currently accessing the site.

    I don't know where you got that number, but even if it is true for a
    retail storefront type site, I am pretty sure it isn't true for a bank
    (what John was talking about, and a substantial part of mainframes'
    user base). Few people "browse" a bank's products. :-) Even for an
    airline (the other example John gave), I suspect that far more than 1%
    of the accesses are updates.



    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

  • From Lawrence D'Oliveiro@21:1/5 to Stephen Fuld on Fri Jun 7 03:32:53 2024
    On Fri, 7 Jun 2024 03:13:54 -0000 (UTC), Stephen Fuld wrote:

    Lawrence D'Oliveiro wrote:

    Remember that the number of users actually buying a product at any
    given time is only a small proportion (say 1%) of the number of users
    currently accessing the site.

    I don't know where you got that number ...

    From actual experience.

  • From Stephen Fuld@21:1/5 to Lawrence D'Oliveiro on Fri Jun 7 03:48:05 2024
    Lawrence D'Oliveiro wrote:

    On Fri, 7 Jun 2024 03:13:54 -0000 (UTC), Stephen Fuld wrote:

    Lawrence D'Oliveiro wrote:

    Remember that the number of users actually buying a product at any
    given time is only a small proportion (say 1%) of the number of
    users currently accessing the site.

    I don't know where you got that number ...

    From actual experience.

    OK. Was that experience with a bank or airline (what John was
    discussing)?



    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

  • From Terje Mathisen@21:1/5 to John Levine on Fri Jun 7 11:06:47 2024
    John Levine wrote:
    According to George Neuner <[email protected]>:
    Consider the equivalent number of mainframes, with their inbuilt
    diagnostics capabilities etc, to match that reliability.

    Can't find it now and don't remember many details, but ...

    A long time ago, there was a story going around about Microsoft vs IBM
    regarding the day-to-day operation of their company web sites. It
    claimed that Microsoft was running a ~1000 machine server farm with a
    crew of ~100, whereas IBM was running 3 mainframes with a crew of ~10.

    It depends on what you want to do.

    If you're doing something that is mostly read-only and easy to
    parallelize, then it makes sense to use a farm of cheap PCs. But if
    you are a bank or an airline, you need to be able to lock your
    database so that you debit a bank account or sell a plane seat exactly
    once. There is a rule of thumb that the cost of locking something
    grows roughly as the square of the number of things contending for
    the lock.

    Which is why you use my trick (probably old?) of setting up an array of
    N preliminary locks as gatekeepers: N would be approximately
    sqrt(number_of_competing_users), and only after winning that first stage
    are you allowed to compete for the "real" lock.

    I've shown a way here in c.arch to make this adaptive, so it would only
    kick in after a given amount of contention.
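    One possible reading of the two-stage gatekeeper scheme, as a Python sketch. This is not Terje's actual code, and the adaptive variant is not reproduced. The class name `GatedLock` and the round-robin assignment of threads to gates are assumptions: each thread first acquires its gate lock, and only gate winners compete for the real lock, with N set to about sqrt(expected contenders).

```python
import itertools
import math
import threading

class GatedLock:
    """Two-stage lock: N gate-keeper locks in front of one "real" lock.

    Hypothetical sketch. N is about sqrt(expected contenders), so at most
    N threads compete for the real lock at once and roughly n/N threads
    share each gate.
    """

    def __init__(self, expected_contenders: int) -> None:
        n_gates = max(1, math.isqrt(expected_contenders))
        self._gates = [threading.Lock() for _ in range(n_gates)]
        self._real = threading.Lock()
        self._slots = itertools.count()   # round-robin gate assignment
        self._tls = threading.local()     # per-thread gate slot

    def acquire(self) -> None:
        slot = getattr(self._tls, "slot", None)
        if slot is None:
            slot = next(self._slots) % len(self._gates)
            self._tls.slot = slot
        # Stage 1: compete only with threads assigned to the same gate.
        self._gates[slot].acquire()
        # Stage 2: only gate winners compete for the real lock.
        self._real.acquire()

    def release(self) -> None:
        self._real.release()
        self._gates[self._tls.slot].release()

    def __enter__(self) -> "GatedLock":
        self.acquire()
        return self

    def __exit__(self, *exc_info) -> None:
        self.release()

counter = 0
lock = GatedLock(expected_contenders=16)   # 4 gates

def worker() -> None:
    global counter
    for _ in range(1000):
        with lock:
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 16000
```

    Under CPython's GIL this shows the structure only, not a speedup; the benefit the post describes comes from reduced contention on the real lock in a truly parallel setting.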

    For example, airline reservation systems are the classic example of a mainframe database. About 25 years ago, ITA Software had the bright
    idea to do searches for seats and prices on racks of cheap PCs, which
    worked great since it's read only, and if they suggest a seat or fare
    that turns out to have just sold out, too bad, try again. But when
    travel agents and airlines used it, they kept the ticketing info in a
    regular database because it has to work.

    The main problem here is how long you are allowed to "soft lock" a set
    of seats that you are contemplating buying.
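    A common way to bound such a soft lock is a time-limited hold. A hypothetical single-threaded sketch, assuming a fixed hold duration and an injectable clock so it can be tested; the names `SeatHolds`, `hold`, and `confirm` are illustrative and not from any reservation system. A real system would do this inside the ticketing database's transactions.

```python
import time
from typing import Callable, Dict, Optional, Tuple

class SeatHolds:
    """Hypothetical time-limited holds ("soft locks") on individual seats."""

    def __init__(self, hold_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._hold_seconds = hold_seconds
        self._clock = clock
        self._holds: Dict[str, Tuple[str, float]] = {}   # seat -> (customer, expiry)
        self._sold: Dict[str, str] = {}                  # seat -> customer

    def _holder(self, seat: str) -> Optional[str]:
        entry = self._holds.get(seat)
        if entry is None:
            return None
        customer, expiry = entry
        if self._clock() >= expiry:
            del self._holds[seat]     # hold lapsed; seat is free again
            return None
        return customer

    def hold(self, seat: str, customer: str) -> bool:
        if seat in self._sold:
            return False
        holder = self._holder(seat)
        if holder is not None and holder != customer:
            return False
        self._holds[seat] = (customer, self._clock() + self._hold_seconds)
        return True

    def confirm(self, seat: str, customer: str) -> bool:
        # Purchase succeeds only while this customer's hold is still live.
        if self._holder(seat) != customer:
            return False
        del self._holds[seat]
        self._sold[seat] = customer
        return True

now = [0.0]                                       # fake clock for the example
holds = SeatHolds(hold_seconds=600, clock=lambda: now[0])
print(holds.hold("12A", "alice"))                 # True
print(holds.hold("12A", "bob"))                   # False: alice holds it
now[0] = 601.0                                    # alice's 10-minute hold lapses
print(holds.hold("12A", "bob"))                   # True
print(holds.confirm("12A", "alice"))              # False
print(holds.confirm("12A", "bob"))                # True
```

    The hold duration is the policy knob the question is about: longer holds are friendlier to the buyer but keep seats off the market for everyone else.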

    Terje


    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
