Looking up "splicing strings", I find that this is a term used in
connection with Python for specifying substrings. Python3 is a
language that lives the codepoint mistake to the extreme (and from
what I read, this was one of the major pain points in the
Python2->Python3 transition), but anyway, with UTF-8 one way to
represent a substring is to use the start index and length in bytes
(aka code units) rather than code points.
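As a minimal Python sketch of the byte-offset representation, a substring can be described by a start offset and a length in UTF-8 code units, and extracting it needs no decoding. `byte_slice` is a hypothetical helper, not a Python API. It assumes the offsets fall on code point boundaries, which is what `bytes.find` returns for a valid UTF-8 needle.

```python
def byte_slice(data: bytes, start: int, length: int) -> bytes:
    """Return the substring of UTF-8 `data` described by (start, length) in bytes.

    Hypothetical helper: offsets are assumed to lie on code point boundaries,
    e.g. offsets obtained from bytes.find().
    """
    return data[start:start + length]


text = "Grüße aus König".encode("utf-8")
needle = "König".encode("utf-8")

# bytes.find works directly on UTF-8: a valid UTF-8 needle can only match at
# a code point boundary, because lead bytes and continuation bytes differ.
start = text.find(needle)
length = len(needle)

print(start, length)  # 12 6 (the code point offset would be 10)
print(byte_slice(text, start, length).decode("utf-8"))  # König
```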
Python3 has a complex internal string format that stores each string
as 1, 2, or 4 byte values, depending on what the contents of the
string are, so ASCII is one byte, UCS-2 is two bytes, and strings that
contain code points beyond UCS-2 are four bytes. It's not clear how
hard they try to shrink stuff down when taking substrings.
https://peps.python.org/pep-0393/
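The 1/2/4-byte storage can be observed from Python itself. This is a rough, CPython-specific sketch: it compares `sys.getsizeof` for strings of 200 and 100 copies of one code point, so the fixed header cancels out. In CPython the one-byte kind actually covers all of Latin-1 (U+0000 to U+00FF), not only ASCII, which is why "é" also comes out at one byte; the expected results are 1, 1, 2 and 4 bytes per code point. Other Python implementations store strings differently.

```python
import sys


def bytes_per_code_point(ch: str) -> int:
    """Estimate CPython's storage per code point for a string made of `ch`.

    Compares a 200- and a 100-code-point string so that the fixed object
    header cancels out. CPython-specific; other implementations differ.
    """
    return (sys.getsizeof(ch * 200) - sys.getsizeof(ch * 100)) // 100


for sample in ["a", "é", "ā", "😀"]:
    print(f"U+{ord(sample):04X}", bytes_per_code_point(sample))
```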
Python lets you subscript strings to get either individual items or
substrings, and I have written a fair amount of code that does that. I
realize that if I were doing semantic processing on Greek or Arabic, I
would not be subscripting and expecting it to return straightforwardly
useful results.
The string structure has a field for the length of the string in
UTF-8, but they don't seem to use it for anything, at least not yet,
Anton Ertl <[email protected]> schrieb:
The point I wanted to make is that there is the frequent
misconception that dealing with individual arbitrary characters is
something that is relatively common, and that one can do that by using
UTF-32 (or UTF-16); it isn't, and one cannot.
Do you really mean one cannot change an individual character
using UTF-32?
I assume you mean "there is no need to do it".
If you stick with UTF-8
and use byte lengths and byte indexes, you can do almost everything as
well as or better (with less complication and more efficiently) than by
converting to UTF-32 and back.
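A small Python illustration of that claim, working on UTF-8 `bytes` without ever converting to a code-point array. It is only a sketch of the idea, not code from the thread; the results match what the corresponding `str` operations give.

```python
text = "Straße, Köln, Zürich".encode("utf-8")

# Splitting on an ASCII delimiter is safe: bytes below 0x80 never occur
# inside a multi-byte UTF-8 sequence.
fields = text.split(b", ")

# Replace a substring with one of a different byte length; no decoding needed.
replaced = text.replace("Köln".encode("utf-8"), "Cologne".encode("utf-8"))

print([f.decode("utf-8") for f in fields])
print(replaced.decode("utf-8"), len(replaced), "bytes")
```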
Assume you're implementing a language which has a function of
setting an individual character in a string.
How would you implement it? Run through the string?
Would you then also
store additional information somewhere so that the next character
that the user sets does not need to do it again?
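One possible answer to that question, sketched in Python under the assumption that "character" is taken to mean "code point" (the thread later argues these differ). `Utf8String`, `set_code_point` and the cache are hypothetical: the string is kept as UTF-8, the byte offset of code point i is found by scanning, and a cached (index, offset) pair avoids rescanning from the start when the next assignment is further along.

```python
class Utf8String:
    """Hypothetical mutable string stored as UTF-8, indexed by code point.

    Locating code point i means scanning from the start, or from a cached
    (code point index, byte offset) pair when moving forward; a replacement
    with a different UTF-8 length resizes the buffer.
    """

    def __init__(self, s: str) -> None:
        self.buf = bytearray(s.encode("utf-8"))
        self._cache = (0, 0)  # (code point index, byte offset) of last lookup

    def _is_continuation(self, off: int) -> bool:
        # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
        return off < len(self.buf) and self.buf[off] & 0xC0 == 0x80

    def _byte_offset(self, index: int) -> int:
        if index < 0:
            raise IndexError("negative code point index")
        cp, off = self._cache
        if index < cp:  # the cache only helps when scanning forward
            cp, off = 0, 0
        while cp < index and off < len(self.buf):
            off += 1
            while self._is_continuation(off):
                off += 1
            cp += 1
        if cp < index or off >= len(self.buf):
            raise IndexError("code point index out of range")
        self._cache = (cp, off)
        return off

    def set_code_point(self, index: int, ch: str) -> None:
        start = self._byte_offset(index)
        end = start + 1
        while self._is_continuation(end):
            end += 1
        # Slice assignment grows or shrinks the bytearray if the UTF-8
        # lengths differ, which may mean reallocating the buffer.
        self.buf[start:end] = ch.encode("utf-8")

    def __str__(self) -> str:
        return self.buf.decode("utf-8")


s = Utf8String("Ko\u0308nig")
s.set_code_point(2, "e")   # overwrite the combining diaeresis
print(str(s), len(s.buf))  # Koenig 6
```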
John Levine <[email protected]> writes:
Python3 has a complex internal string format that stores each string
as 1, 2, or 4 byte values, depending on what the contents of the
string are, so ASCII is one byte, UCS-2 is two bytes, and strings that
contain code points beyond UCS-2 are four bytes. It's not clear how
hard they try to shrink stuff down when taking substrings.
https://peps.python.org/pep-0393/
This is a nice demonstration of the unnecessary complexity that the
codepoint mistake leads to.
On 12/05/2024 07:40, Anton Ertl wrote:
John Levine <[email protected]> writes:
Python3 has a complex internal string format that stores each string
as 1, 2, or 4 byte values, depending on what the contents of the
string are, so ASCII is one byte, UCS-2 is two bytes, and strings that
contain code points beyond UCS-2 are four bytes. It's not clear how
hard they try to shrink stuff down when taking substrings.
https://peps.python.org/pep-0393/
This is a nice demonstration of the unnecessary complexity that the
codepoint mistake leads to.
A lot of this is, I suspect, for historical reasons. When Python was
young, most software and languages used either plain ASCII or a mess of
code pages for 8-bit encodings (or an even bigger mess of 16-bit
encodings for CJK languages). Unicode was the new hope for a unifying
16-bit system that would work for all characters in all languages. So
Python, like Java, Windows NT, QT, and some other systems of that era,
chose UCS-2 as the modern, international and future-proof solution to
strings and characters.
It turns out that UCS-2 was not enough, and these have all been
suffering from mixed APIs ever since.
That's certainly true for Java (first released 1995), Windows NT (first
released 1993) and QT (first released 1995).
Except Python3. I am not familiar with Python, but from the
discussions I have read my impression is: Python2 (released 2000)
supported strings of bytes, and people put UTF-8 in there and worked
with that. Python3 (released 2008) was supposed to be a cleanup and
instead of refining the code-unit-based approach of Python2 they
introduced a code-point-based approach, which supported fast indexing
of code points, a worthless feature. And they found out how hard it
is to migrate a code base.
Assume you're implementing a language which has a function of setting
an individual character in a string.
That's a design mistake in the language, and I know no language that
has this misfeature.
Instead, what we see is one language (Python3) that has an even worse misfeature: You can set an individual code point in a string; see
above for the things you get when you overwrite code points.
But why would one want to set individual code points?
Thomas Koenig <[email protected]> writes:
E.g., consider the following Gforth code (others can tell you how to
do it in Python):
"Ko\u0308nig" cr type
The output is:
König
That is, the second character consists of two Unicode code points, the
"o" and the "\u0308" (Combining Diaeresis).
(I think that somewhere along the way from the Forth system to the
xterm through copying and pasting into Emacs the second character has
become precomposed, but that's probably just as well, so you can see
what I see).
If I replace the third code point with an e, I get "Koenig". So by overwriting one code point, I insert a character into the string.
If instead I replace the second code point with a "\u0316" (Combining
Grave Accent Below):
"K\u0316\u0308nig" cr type
I get this (which looks as expected in my xterm, but not in Emacs)
K̖̈nig
The first character is now a K with a diaeresis above and an accent
grave below and there are now a total of 4 characters, but still 6
code points in the string; the second character has been deleted by
this code-point replacement.
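Since the post leaves the Python version to others, here is one possible rendering. Python `str` objects are immutable, so `replace_code_point` (a hypothetical helper) builds a new string by slicing. `count_characters` is likewise hypothetical and only approximates "characters" by treating every code point with canonical combining class 0 as the start of a new one; Unicode's real grapheme cluster rules are more involved. The counts it reports (6 code points with 5, 6 and 4 characters) match the ones described above.

```python
import unicodedata


def count_characters(s: str) -> int:
    """Rough character count: every code point that is not a combining mark
    (canonical combining class 0) starts a new character.

    Only an approximation of Unicode grapheme clusters; it ignores ZWJ
    sequences, Hangul jamo, regional-indicator pairs and more.
    """
    return sum(1 for c in s if unicodedata.combining(c) == 0)


def replace_code_point(s: str, i: int, new: str) -> str:
    # Python str is immutable, so "overwriting" builds a new string.
    return s[:i] + new + s[i + 1:]


koenig = "Ko\u0308nig"
variants = [
    koenig,
    replace_code_point(koenig, 2, "e"),       # third code point -> "e"
    replace_code_point(koenig, 1, "\u0316"),  # second code point -> U+0316
]
for s in variants:
    print(ascii(s), len(s), "code points,", count_characters(s), "characters")
```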
I think people in Japan should be able to use printf by using プリントフ
There is way too much "English" in the way computers are being used.
It is similar to anthropomorphizing animal behavior.
[Anton Ertl:]
[Thomas Koenig:]
Assume you're implementing a language which has a function of setting
an individual character in a string.
That's a design mistake in the language, and I know no language that
has this misfeature.
I suspect "individual character" meant "code point" above.
Does Unicode even have the notion of "character", really?
Instead, what we see is one language (Python3) that has an even worse
misfeature: You can set an individual code point in a string; see
above for the things you get when you overwrite code points.
I think it's fairly common for languages that started with strings
as "arrays of 8-bit chars".
Emacs Lisp has this misfeature as well (and so does Common Lisp). 🙁
It's really hard to get rid of it, even though it's used *very* rarely.
In ELisp, strings are represented internally as utf-8 (tho it pretends
to be an array of code points), so an assignment that replaces a single
char can require reallocating the array!
But why would one want to set individual code points?
Because you know your string only contains "characters" made of a single
code point?
E.g. your string contains the representation of the border of a table
(to be displayed in a tty), and you want to "move" the `+` of a column
separator (or a prettier version that takes advantage of the wider
choice offered by Unicode).
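A minimal sketch of that table-border case, assuming every cell of the border is exactly one code point and that the terminal draws each box-drawing character one column wide; combining marks or wide characters would break the index-to-column correspondence. `move_separator` is a hypothetical helper.

```python
def move_separator(row: str, old: int, new: int,
                   fill: str = "\u2500", sep: str = "\u253c") -> str:
    """Hypothetical helper: move the column separator `sep` from code point
    index `old` to `new` in a one-line table border made of `fill`."""
    cells = list(row)
    if cells[old] != sep:
        raise ValueError("no separator at the old position")
    cells[old], cells[new] = fill, sep
    return "".join(cells)


border = "\u2500" * 4 + "\u253c" + "\u2500" * 7   # ────┼───────
moved = move_separator(border, 4, 7)
print(moved)                                       # ───────┼────
```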
[email protected] (MitchAlsup1) writes:
It seems to me (in my vast ignorance) that names for things should be
written in the most appropriate set of characters in the language of
the person/thing being named.
Then when such a name is "sent out to be displayed" that it is a property
of the display what character set(s) it can properly emit, and thereby
alter the string of characters as appropriate to its capabilities.
For example:: Take > "K\u0316\u0308nig" cr type ==> K̖̈nig
When displayed on an ASCII-only line printer it would be written Koenig
When displayed on an enhanced ASCII printer it would be written König
When displayed on a fully functional printer it would be written K̖̈nig
Why do you think that K̖̈nig should be written as Koenig or König?
However, for König
Unicode specifies that the precomposed form is
König. And if you want a transcription into ASCII with the knowledge
that it's German, the result would be Koenig.
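Both points can be checked with Python's `unicodedata` module: NFC normalization turns "o" plus U+0308 into the precomposed ö, while the German "oe" transcription is language knowledge that Unicode does not supply. The `GERMAN_TO_ASCII` table below is a hypothetical example of such a mapping. For contrast, the language-neutral approach of dropping combining marks after NFD yields "Konig".

```python
import unicodedata

decomposed = "Ko\u0308nig"  # "o" followed by U+0308 COMBINING DIAERESIS
precomposed = unicodedata.normalize("NFC", decomposed)
print(precomposed == "K\u00f6nig", len(decomposed), len(precomposed))  # True 6 5

# Hypothetical German-specific ASCII transcription table; Unicode itself
# does not define this mapping.
GERMAN_TO_ASCII = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss",
})
print(precomposed.translate(GERMAN_TO_ASCII))  # Koenig

# Language-neutral fallback: decompose, then drop combining marks.
stripped = "".join(c for c in unicodedata.normalize("NFD", precomposed)
                   if not unicodedata.combining(c))
print(stripped)  # Konig
```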
Anton Ertl <[email protected]> schrieb:
[email protected] (MitchAlsup1) writes:
It seems to me (in my vast ignorance) that names for things should be
written in the most appropriate set of characters in the language of
the person/thing being named.
Then when such a name is "sent out to be displayed" that it is a property
of the display what character set(s) it can properly emit, and thereby
alter the string of characters as appropriate to its capabilities.
For example:: Take > "K\u0316\u0308nig" cr type ==> K̖̈nig
When displayed on an ASCII-only line printer it would be written Koenig
When displayed on an enhanced ASCII printer it would be written König
When displayed on a fully functional printer it would be written K̖̈nig
Why do you think that K̖̈nig should be written as Koenig or König?
On my display, this read K, n with a diacritic and something close to
a cedilla under the n.
However, for König
Again, the diaeresis is over the n, not the o.
Unicode specifies that the precomposed form is
König. And if you want a transcription into ASCII with the knowledge
that it's German, the result would be Koenig.
This is actually sometimes a (fairly minor) problem because the
name on my passport actually reads "König" (o-diacritic), but
people without knowledge of German tend to transcribe this as
"Konig", whereas I transcribe it as "Koenig" on official forms
such as the one I need to fill out prior to entering the US.
This is why modern EU passports have a canonical form of the
name, which then is "KOENIG".
Canonical simplification of the 'ø' character is either 'o' or 'oe', and passports and airline tickets differ, something which can cause all
sorts of issues with US passport control.
Anton Ertl <[email protected]> schrieb:
Why do you think that K̖̈nig should be written as Koenig or König?
On my display, this read K, n with a diacritic and something close to
a cedilla under the n.
However, for König
Again, the diaeresis is over the n, not the o.
Terje Mathisen <[email protected]> schrieb:
Canonical simplification of the 'ø' character is either 'o' or 'oe', and
passports and airline tickets differ, something which can cause all
sorts of issues with US passport control.
Reminds me of either "Asterix and the Great Crossing" or "Asterix
and the Normans", where Viking speech was indicated by having
slashes through letters (like ø). When Obelix tries to speak
their language, he also applies slashes, but does so randomly
(like through a c) so nobody can understand him.
Hmm... a challenge, can this be represented as Unicode codepoints?
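One hedged answer: Unicode has combining overlay marks such as U+0338 COMBINING LONG SOLIDUS OVERLAY, which can follow any letter, including c, so a random-slash effect is representable as ordinary code points (whether a given font draws it nicely is another matter). `obelix` is a hypothetical toy filter.

```python
import random
import unicodedata

SLASH = "\u0338"  # COMBINING LONG SOLIDUS OVERLAY


def obelix(text: str, rng: random.Random, p: float = 0.5) -> str:
    """Hypothetical filter: put a combining slash over randomly chosen letters."""
    out = []
    for c in text:
        out.append(c)
        if c.isalpha() and rng.random() < p:
            out.append(SLASH)
    return "".join(out)


print(unicodedata.name(SLASH))
print(obelix("I am not fat", random.Random(42)))
```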
Considering the huge market for palindrome checkers, that is a
real concern, especially if they involve characters for which
UTF-32 is not sufficient, such as smileys.
Is there any language whose characters cannot be represented in
UTF-32?
A similar concept was implemented in COBOL, where the designers thought
that having to write
ADD A TO B GIVING C
or somesuch makes programming easier than writing
C = A+B
in FORTRAN.
I think people in Japan should be able to use printf by using プリントフ
There is way too much "English" in the way computers are being used.
It is similar to anthropomorphizing animal behavior.
and
because it was supposedly "self documenting", easier for managers, etc.
to see how the program worked.
Remember back in the early 8-bit days of computing, and before them,
when schools were exposing children to PDP-8 computers?
Children were learning to program computers in BASIC.
Obviously, here, if children in other countries used modified versions
of BASIC that used keywords in their own natural language, it would be
much easier for them to get started with programming than if the
keywords were simply arbitrary strings of letters, taken from a
foreign language of which they may not necessarily have any knowledge.
If Algol was supposed to be an _international_ algorithmic language,
why weren't its keywords taken from Latin or Esperanto, instead of
English?
On Tue, 14 May 2024 17:43:43 +0000, [email protected] (MitchAlsup1)
wrote:
I think people in Japan should be able to use printf by using プリントフ
There is way too much "English" in the way computers are being used.
It is similar to anthropomorphizing animal behavior.
One could quibble.
If Japanese people needed to enter kana from their keyboards to write programs, that would be awkward; there is not yet a good way to enter
that kind of text from a keyboard.
However, I think your point is valid. At least in some contexts.
Remember back in the early 8-bit days of computing, and before them,
when schools were exposing children to PDP-8 computers?
Children were learning to program computers in BASIC.
Obviously, here, if children in other countries used modified versions
of BASIC that used keywords in their own natural language, it would be
much easier for them to get started with programming than if the
keywords were simply arbitrary strings of letters, taken from a
foreign language of which they may not necessarily have any knowledge.
If Algol was supposed to be an _international_ algorithmic language,
why weren't its keywords taken from Latin or Esperanto, instead of
English?
Historical note: Algol was originally called IAL; remember what JOVIAL
stood for.
But the objections about sharing code between countries, and the fact
that English is so widely known in technical circles, are also true.
It is a complicated issue, made worse by the fact that nationalism and ethnocentrism are often bad things.
John Savard
Long list.
Historical note: Algol was originally called IAL; remember what JOVIAL
stood for.
John Savard wrote:
Historical note: Algol was originally called IAL; remember what
JOVIAL stood for.
Who was Joe ?? in Jovial
Stefan Monnier <[email protected]> writes:
Does Unicode even have the notion of "character", really?
AFAIK it does not. But applications like palindrome checkers care
about characters, not code points.
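A small Python sketch of why a palindrome checker cares about characters: reversing code points moves combining marks onto the wrong base letter, while reversing (approximate) characters does not. `clusters` is a hypothetical approximation that attaches combining marks to the preceding code point; Unicode's full grapheme cluster rules also cover cases it ignores, such as ZWJ emoji sequences. "Ötö" is just a toy input.

```python
import unicodedata


def clusters(s: str) -> list:
    """Approximate characters: a non-combining code point plus any combining
    marks that follow it. Ignores ZWJ emoji sequences and other cases that
    Unicode's full grapheme cluster rules handle."""
    out = []
    for c in s:
        if out and unicodedata.combining(c):
            out[-1] += c
        else:
            out.append(c)
    return out


def naive_is_palindrome(s: str) -> bool:
    s = unicodedata.normalize("NFD", s.casefold())
    return s == s[::-1]          # compares code points


def is_palindrome(s: str) -> bool:
    cs = clusters(unicodedata.normalize("NFD", s.casefold()))
    return cs == cs[::-1]        # compares (approximate) characters


print(naive_is_palindrome("Ötö"), is_palindrome("Ötö"))  # False True
print(ascii("Ko\u0308nig"[::-1]))  # the diaeresis now follows the "n"
```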
It seems to me (in my vast ignorance) that names for things should be
written in the most appropriate set of characters in the language of
the person/thing being named.
Then when such a name is "sent out to be displayed" that it is a property
of the display what character set(s) it can properly emit, and thereby
alter the string of characters as appropriate to its capabilities.
For example:: Take > "K\u0316\u0308nig" cr type ==> K̖̈nig
When displayed on an ASCII-only line printer it would be written Koenig
When displayed on an enhanced ASCII printer it would be written König
When displayed on a fully functional printer it would be written K̖̈nig
Only the display device needs to understand this mapping and NOT the
program/software/device holding the string.
I think people in Japan should be able to use printf by using プリントフ
There is way too much "English" in the way computers are being used.
Anton Ertl <[email protected]> schrieb:
Stefan Monnier <[email protected]> writes:
Does Unicode even have the notion of "character", really?
AFAIK it does not. But applications like palindrome checkers care
about characters, not code points.
Considering the huge market for palindrome checkers, that is a
real concern, especially if they involve characters for which
UTF-32 is not sufficient, such as smileys.
Is there any language whose characters cannot be represented in
UTF-32?
MitchAlsup1 wrote:
John Savard wrote:
Historical note: Algol was originally called IAL; remember what
JOVIAL stood for.
Who was Joe ?? in Jovial
Just in case you weren't joking,
Jules Own Version of the International Algorithmic Language
Jules was Jules Schwartz
https://en.wikipedia.org/wiki/Jules_Schwartz
Assume you're implementing a language which has a function of setting
an individual character in a string.
That's a design mistake in the language, and I know no language that
has this misfeature.
I suspect "individual character" meant "code point" above.
I meant character, not code point, as should have become clear from
the following. I think that Thomas Koenig meant "character", too, but
he may have been unaware of the difference between "character" and
"Unicode code point".
OTOH, most code can be implemented fine as working on strings, without knowing how many characters there are in the string (and it then does
not need to know about code points, either).
Emacs Lisp has this misfeature as well (and so does Common Lisp). 🙁
It's really hard to get rid of it, even though it's used *very* rarely.
In ELisp, strings are represented internally as utf-8 (tho it pretends
to be an array of code points), so an assignment that replaces a single
char can require reallocating the array!
One way forward might be to also provide a string-oriented API with
byte (code unit) indices, and recommend that people use that instead
of the inefficient code-point-indexed API.
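An API that hands out byte (code unit) indices still needs a way to step over code points when that is really required. With UTF-8 this is a short loop, because continuation bytes (10xxxxxx) are recognizable, so one can resynchronize from any byte position. `next_boundary` and `prev_boundary` are hypothetical helpers for such an API, sketched in Python.

```python
def next_boundary(data: bytes, i: int) -> int:
    """Byte index of the code point after the one starting at (or containing)
    byte index i. Hypothetical helper for a byte-indexed string API."""
    i += 1
    # Skip UTF-8 continuation bytes (bit pattern 10xxxxxx).
    while i < len(data) and data[i] & 0xC0 == 0x80:
        i += 1
    return i


def prev_boundary(data: bytes, i: int) -> int:
    """Byte index of the code point before byte index i (0 at the start)."""
    i -= 1
    while i > 0 and data[i] & 0xC0 == 0x80:
        i -= 1
    return max(i, 0)


data = "Köln".encode("utf-8")   # K=0, ö=1..2, l=3, n=4
print(next_boundary(data, 1), next_boundary(data, 2), prev_boundary(data, 3))  # 3 3 1
```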
Because you know your string only contains "characters" made of a single
code point?
This incorrect "knowledge" may be the reason why Emacs 27.1 displays
K̖̈nig
as if the first three-code-point character actually was three characters.
E.g. your string contains the representation of the border of a table
(to be displayed in a tty), and you want to "move" the `+` of a column
separator (or a prettier version that takes advantage of the wider
choice offered by Unicode).
These kinds of things involve additional complications.
I meant character, not code point, as should have become clear from
the following. I think that Thomas Koenig meant "character", too, but
he may have been unaware of the difference between "character" and
"Unicode code point".
I don't know of any language (or even library) that supports the notion
of "character" for Unicode strings.
On Sat, 18 May 2024 17:11:32 -0000 (UTC), "Stephen Fuld" <[email protected]d> wrote:
and
because it was supposedly "self documenting", easier for managers,
etc. to see how the program worked.
Of course, if they designed COBOL that way, why did they include a
statement that let you re-direct GOTO statements from elsewhere in a
program?
I mean, that was just asking for dishonest programmers to direct the
odd pennies into their bank accounts and so on.
John Savard wrote:
On Sat, 18 May 2024 17:11:32 -0000 (UTC), "Stephen Fuld"
<[email protected]d> wrote:
and
because it was supposedly "self documenting", easier for managers,
etc. to see how the program worked.
Of course, if they designed COBOL that way, why did they include a
statement that let you re-direct GOTO statements from elsewhere in a
program?
That feature (Alter GOTO) was also in Fortran, as the assigned GOTO
statement, long since deprecated.
Rumor has it that the AD statement was regularly abused,
This is a nice demonstration of the unnecessary complexity that the
codepoint mistake leads to. ...
But if they had decided to just store the data as UTF-8 and use byte
indexes and lengths in their API, and adjusted the rest of their API accordingly, they could have avoided this complexity and
inefficiency ...
Plus at some point (not sure when) they decided that characters have to
be composable ...
On Sun, 12 May 2024 05:40:45 GMT, Anton Ertl wrote:
This is a nice demonstration of the unnecessary complexity that the
codepoint mistake leads to. ...
But if they had decided to just store the data as UTF-8 and use byte
indexes and lengths in their API, and adjusted the rest of their API
accordingly, they could have avoided this complexity and
inefficiency ...
But UTF-8 is just a representation of code points, not characters. So I
don’t understand why one way leads to “unnecessary complexity” and the
other way does not.
On Sun, 12 May 2024 16:12:26 GMT, Anton Ertl wrote:
Plus at some point (not sure when) they decided that characters have to
be composable ...
I think that was true right from the beginning. Else you would have had a
combinatorial explosion of alphabetic characters with diacritic marks.
Stefan Monnier <[email protected]> writes:
Does Unicode even have the notion of "character", really?
AFAIK it does not.
Lawrence D'Oliveiro <[email protected]d> writes:
On Sun, 12 May 2024 05:40:45 GMT, Anton Ertl wrote:
This is a nice demonstration of the unnecessary complexity that the
codepoint mistake leads to. ...
But if they had decided to just store the data as UTF-8 and use byte
indexes and lengths in their API, and adjusted the rest of their API
accordingly, they could have avoided this complexity and inefficiency
...
But UTF-8 is just a representation of code points, not characters. So I
don’t understand why one way leads to “unnecessary complexity” and the
other way does not.
In UTF-32 a character is a sequence of code points. In UTF-8 it is a sequence of code units.
I don't know of any language (or even library) that supports the notion
of "character" for Unicode strings. 🙁
Algol 60 does not standardize a program representation in characters (a
grave mistake fixed by most later programming languages ...
John Savard wrote:
Historical note: Algol was originally called IAL; remember what JOVIAL
stood for.
Who was Joe ?? in Jovial
If Algol was supposed to be an _international_ algorithmic language,
why weren't its keywords taken from Latin or Esperanto, instead of
English?
On Wed, 22 May 2024 15:38:51 -0400, Stefan Monnier wrote:
I don't know of any language (or even library) that supports the notion
of "character" for Unicode strings. 🙁
Surely a “character” (or “grapheme” I think is (one of) the Unicode terms)
is (represented by) a non-combining code point combined with all the
immediately-following combining code points.
Much of its syntax came from mathematics, which is international.
Semi-related question: are there non-English equivalents for mathematical
operators like “grad”, “div” and “curl”?
According to Lawrence D'Oliveiro <[email protected]d>:
On Wed, 22 May 2024 15:38:51 -0400, Stefan Monnier wrote:
I don't know of any language (or even library) that supports the
notion of "character" for Unicode strings. 🙁
Surely a “character” (or “grapheme” I think is (one of) the Unicode
terms) is (represented by) a non-combining code point combined with all
the immediately-following combining code points.
Take another look at the table I referred to yesterday. When you have
ZWJ the rules of what combines with what gets awfully complicated.
On Mon, 27 May 2024 15:16:13 -0000 (UTC), John Levine wrote:
According to Lawrence D'Oliveiro <[email protected]d>:
On Wed, 22 May 2024 15:38:51 -0400, Stefan Monnier wrote:
I don't know of any language (or even library) that supports the
notion of "character" for Unicode strings. 🙁
Surely a “character” (or “grapheme” I think is (one of) the Unicode
terms) is (represented by) a non-combining code point combined with all
the immediately-following combining code points.
Take another look at the table I referred to yesterday. When you have
ZWJ the rules of what combines with what gets awfully complicated.
ZWJ is classed as “punctuation”, and has no combining class. So it forms a
“character” or “grapheme” in its own right.
On Tue, 28 May 2024 01:25:38 -0000 (UTC), John Levine wrote:
Really, you need to look at that combined emoji table I told you about
yesterday.
I’m just telling you what the official Unicode spec says.
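A short Python sketch makes the disagreement concrete. `naive_graphemes` implements the rule proposed above (a non-combining code point plus all following code points with a nonzero canonical combining class). It handles "König" written with U+0308, but since U+200D ZERO WIDTH JOINER has combining class 0 (its general category is Cf, format, and it sits in the General Punctuation block), a family emoji ZWJ sequence that is normally displayed as one picture comes apart into 5 pieces. The extended grapheme cluster rules of Unicode Standard Annex #29 are what keep such sequences together.

```python
import unicodedata


def naive_graphemes(s: str) -> list:
    """The proposed rule: a non-combining code point plus all immediately
    following code points with a nonzero canonical combining class."""
    out = []
    for c in s:
        if out and unicodedata.combining(c) != 0:
            out[-1] += c
        else:
            out.append(c)
    return out


print([ascii(g) for g in naive_graphemes("Ko\u0308nig")])  # 5 groups

# MAN, ZWJ, WOMAN, ZWJ, GIRL: usually rendered as one family emoji.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
print(len(naive_graphemes(family)))                       # 5, not 1
print(unicodedata.combining("\u200D"), unicodedata.category("\u200D"))  # 0 Cf
```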
It appears that Lawrence D'Oliveiro <[email protected]d> said:
Much of its syntax came from mathematics, which is international.
Semi-related question: are there non-English equivalents for
mathematical operators like “grad”, “div” and “curl”?
Grad is written as a nabla, an upside down delta, div as nabla followed
by a center dot, and curl as nabla followed by a multiplication sign.
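In that notation, for a scalar field $f$ and a vector field $\mathbf{F}$:

```latex
\operatorname{grad} f = \nabla f, \qquad
\operatorname{div} \mathbf{F} = \nabla \cdot \mathbf{F}, \qquad
\operatorname{curl} \mathbf{F} = \nabla \times \mathbf{F}
```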
I happen to have a copy of "Algol 60 Implementation" published in 1963
which describes the KDF9 Algol compiler in considerable detail. They considered the translation of the Algol publication language to the
5-bit paper tape code their computer used so trivial that they don't
even describe it.
On Mon, 27 May 2024 19:09:51 -0000 (UTC), John Levine wrote:
According to EricP <[email protected]>:
One could have instructions that make it easier to parse the variable
length UTF-8 sequences into codepoints.
That would be the CU14 instruction on zSeries, to turn UTF-8 into
UTF-32. CU41 goes the other way.
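For comparison, the conversion such an instruction performs is a short loop in software. This Python sketch decodes well-formed UTF-8 into a list of code points (UTF-32 values); it does not model CU14's actual operands or its handling of malformed input, and it skips validation of overlong or invalid sequences.

```python
def utf8_to_code_points(data: bytes) -> list:
    """Decode well-formed UTF-8 into a list of code points (UTF-32 values).

    No validation of overlong or malformed sequences is done here.
    """
    cps = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                 # 0xxxxxxx: one byte
            cp, n = b, 0
        elif b >> 5 == 0b110:        # 110xxxxx: two bytes
            cp, n = b & 0x1F, 1
        elif b >> 4 == 0b1110:       # 1110xxxx: three bytes
            cp, n = b & 0x0F, 2
        else:                        # 11110xxx: four bytes
            cp, n = b & 0x07, 3
        for k in range(1, n + 1):    # append 6 bits per continuation byte
            cp = (cp << 6) | (data[i + k] & 0x3F)
        cps.append(cp)
        i += n + 1
    return cps


print([hex(cp) for cp in utf8_to_code_points("Kö€😀".encode("utf-8"))])
```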
What is the point, in this day and age, of having special machine
instructions to convert character encodings?
Really, you need to look at that combined emoji table I told you about yesterday.
On Mon, 27 May 2024 16:41:26 -0000 (UTC), John Levine wrote:
It appears that Lawrence D'Oliveiro <[email protected]d> said:
Much of its syntax came from mathematics, which is international.
Semi-related question: are there non-English equivalents for
mathematical operators like “grad”, “div” and “curl”?
Grad is written as a nabla, an upside down delta, div as nabla followed
by a center dot, and curl as nabla followed by a multiplication sign.
That’s right, I’d forgotten about that.
I happen to have a copy of "Algol 60 Implementation" published in 1963
which describes the KDF9 Algol compiler in considerable detail. They
considered the translation of the Algol publication language to the
5-bit paper tape code their computer used so trivial that they don't
even describe it.
Only 32 code symbols? It must have used shifts, à la Baudot code. It probably was Baudot code.
On Sun, 19 May 2024 15:32:49 -0600, John Savard wrote:
If Algol was supposed to be an _international_ algorithmic language,
why weren't its keywords taken from Latin or Esperanto, instead of
English?
Much of its syntax came from mathematics, which is international.
Semi-related question: are there non-English equivalents for mathematical operators like “grad”, “div” and “curl”?
Anyway, the Emacs Lisp functions right-char (and, after testing, also left-char, forward-char, and backward-char) support the notion of
character at least for some scripts. That may be the result of an interaction with the redisplay code that you mention later, but in
that case it's that code that knows about characters in Unicode.
Um, so am I. Those nine code point things are supposed to display
as a single little picture, regardless of what some other bit of
the spec may assert about ZWJ.
On 28/05/2024 02:22, Lawrence D'Oliveiro wrote:
On Mon, 27 May 2024 16:41:26 -0000 (UTC), John Levine wrote:
I happen to have a copy of "Algol 60 Implementation" published in 1963
which describes the KDF9 Algol compiler in considerable detail. They
considered the translation of the Algol publication language to the
5-bit paper tape code their computer used so trivial that they don't
even describe it.
Only 32 code symbols? It must have used shifts, à la Baudot code. It
probably was Baudot code.
It was Ferranti 5-channel paper tape code: <http://www.findlayw.plus.com/KDF9/The%20KDF9%20Character%20Codes.pdf>
Anyway, the Emacs Lisp functions right-char (and, after testing, also
left-char, forward-char, and backward-char) support the notion of
character at least for some scripts. That may be the result of an
interaction with the redisplay code that you mention later, but in
that case it's that code that knows about characters in Unicode.
Indeed, the concept is somewhat visible, but it's not really exposed in
the language. I think what you're seeing is implemented elsewhere than
in `forward-char`, it's a part of the interactive loop which sees that
after `forward-char` you end up "in the middle" of a composition and it
moves the point further, based on information that mostly belongs to the
redisplay code.
Try `C-u 2 C-f` and I suspect you'll see that it doesn't always advance
by 2 characters but rather it advances by "2 code points + rounding up
to the next character boundary".
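A toy Python model of the suspected behavior, not Emacs code: move `n` code points, then keep going while the next code point is a combining mark (a rough stand-in for "the middle of a composition"). `forward_chars` is hypothetical. On "K̖̈nig", moving 2 from the start lands after only one character, while moving 2 from the "n" does advance two characters.

```python
import unicodedata


def forward_chars(s: str, pos: int, n: int) -> int:
    """Hypothetical model: advance `n` code points, then round up past any
    combining marks so that the position is not inside a character."""
    pos = min(pos + n, len(s))
    while pos < len(s) and unicodedata.combining(s[pos]):
        pos += 1
    return pos


s = "K\u0316\u0308nig"
print(forward_chars(s, 0, 1))  # 3: one code point, then past both marks
print(forward_chars(s, 0, 2))  # 3: still only one character
print(forward_chars(s, 3, 2))  # 5: "n" and "i"
```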
On Tue, 28 May 2024 15:43:25 +0100, moi wrote:
On 28/05/2024 02:22, Lawrence D'Oliveiro wrote:
On Mon, 27 May 2024 16:41:26 -0000 (UTC), John Levine wrote:
I happen to have a copy of "Algol 60 Implementation" published in 1963
which describes the KDF9 Algol compiler in considerable detail. They
considered the translation of the Algol publication language to the
5-bit paper tape code their computer used so trivial that they don't
even describe it.
Only 32 code symbols? It must have used shifts, à la Baudot code. It
probably was Baudot code.
It was Ferranti 5-channel paper tape code:
<http://www.findlayw.plus.com/KDF9/The%20KDF9%20Character%20Codes.pdf>
That doc says it’s a 6-bit code.
By the way, don’t you hate sites that block user agents like wget?
On Mon, 20 May 2024 11:46:20 GMT, Anton Ertl wrote:
Algol 60 does not standardize a program representation in characters (a
grave mistake fixed by most later programming languages ...
That would likely not have been considered feasible in 1960, given the
wide variation in character sets between computer systems.
On Mon, 27 May 2024 06:20:33 GMT, Anton Ertl wrote:
In UTF-32 a character is a sequence of code points. In UTF-8 it is a
sequence of code units.
UTF-8 is a sequence of bytes encoding code points.
Confirmed. So Emacs Lisp has a codepoint-oriented interface and then
needs to compensate for that elsewhere. This does not indicate that a codepoint-oriented interface is a good idea, rather the opposite.
Lawrence D'Oliveiro <[email protected]d> writes:
On Mon, 20 May 2024 11:46:20 GMT, Anton Ertl wrote:
Algol 60 does not standardize a program representation in characters
(a grave mistake fixed by most later programming languages ...
That would likely not have been considered feasible in 1960, given the
wide variation in character sets between computer systems.
COBOL did it. LISP did it.
It's just that the Algol 60 committee did not want to go there.
In UTF-32 a character is a sequence of (32-bit) code units.
In UTF-8 a character is a sequence of (8-bit) code units.
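A quick Python check of that statement, using one character ("ö" written as "o" plus U+0308 COMBINING DIAERESIS): it takes three 8-bit code units in UTF-8 and two 32-bit code units in UTF-32. The `utf-32-le` codec is used so that no byte order mark is added.

```python
ch = "o\u0308"   # one character: "o" + COMBINING DIAERESIS

utf8 = ch.encode("utf-8")
utf32 = ch.encode("utf-32-le")   # little-endian, no byte order mark

print(len(utf8), "8-bit code units:", utf8.hex(" "))
print(len(utf32) // 4, "32-bit code units:",
      [hex(int.from_bytes(utf32[i:i + 4], "little")) for i in range(0, len(utf32), 4)])
```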
I hate user agents like wget, which is why I block them.
On Wed, 29 May 2024 07:04:35 GMT, Anton Ertl wrote:
Lawrence D'Oliveiro <[email protected]d> writes:
Isn’t the point of RISC that these complex operations are
more efficiently performed by a sequence of simpler instructions?
The IBM z series are not RISCs.
Doesn’t matter. The principles of designing high-performance architectures
still apply: simpler instructions are better than more complex ones.
According to Lawrence D'Oliveiro <[email protected]d>:
On Wed, 29 May 2024 07:04:35 GMT, Anton Ertl wrote:
Lawrence D'Oliveiro <[email protected]d> writes:
Isn’t the point of RISC that these complex operations are
more efficiently performed by a sequence of simpler
instructions?
The IBM z series are not RISCs.
Doesn’t matter. The principles of designing high-performance architectures still apply: simpler instructions are better than
more complex ones.
Nobody buys a mainframe just for its compute speed.
I do not entirely understand why IBM keeps adding special-purpose
instructions to z. Maybe it's partly marketing, but they have a
largely captive audience so it has to be more than that. Given the
millicode design, a lot of the instructions are basically microcoded
subroutines that may well run faster than the normal code equivalent
because they have access to more machine state. If anyone is about to
say "then let all the instructions see all the state", see our
discussion a week or two ago about architecture vs. implementation.
They wanted symbols like “÷”, “×”, “↑”, “≤”, “≥”, “≠”, “≡”, “⊃”, “∨”,
“∧”, “¬” ... you get the idea. I don’t think any computer system on earth
could provide all those symbols at the time, or even, say, 20 years
later.
Lawrence D'Oliveiro wrote:
snip
They wanted symbols like [...]
See APL. So many symbols that the language is almost impossible to
read without a significant investment in learning them.
https://en.wikipedia.org/wiki/APL_syntax_and_symbols#Monadic_functions
"Stephen Fuld" <[email protected]d> writes:
Lawrence D'Oliveiro wrote:
snip
They wanted symbols like [...]
See APL. So many symbols that the language is almost impossible to
read without a significant investment in learning them.
The problem with learning APL is not the character set. APL without
any special characters (which I actually have some experience using)
is still unlike any other programming language that existed in the
1960s or 1970s.
I don’t think any computer system on earth could
provide all those symbols at the time, or even, say, 20 years later.
Tim Rentsch wrote:
"Stephen Fuld" <[email protected]d> writes:
Lawrence D'Oliveiro wrote:
snip
They wanted symbols like [...]
See APL. So many symbols that the language is almost impossible to
read without a significant investment in learning them.
https://en.wikipedia.org/wiki/APL_syntax_and_symbols#Monadic_functions
The problem with learning APL is not the character set. APL without
any special characters (which I actually have some experience using)
is still unlike any other programming language that existed in the
1960s or 1970s.
OK, but my main point was to show, by counterexample, the error of Lawrence's statement quoted below:
I don't think any computer system on earth could provide all those
symbols at the time, or even, say, 20 years later.
If the part about the difficulty of learning APL was wrong, then I
apologise.
According to Lawrence D'Oliveiro <[email protected]d>:
On Wed, 29 May 2024 07:04:35 GMT, Anton Ertl wrote:
Lawrence D'Oliveiro <[email protected]d> writes:
Isn’t the point of RISC that these complex operations are
more efficiently performed by a sequence of simpler instructions?
The IBM z series are not RISCs.
Doesn’t matter. The principles of designing high-performance architectures still apply: simpler instructions are better than more complex ones.
Nobody buys a mainframe just for its compute speed.
I do not entirely understand why IBM keeps adding special purpose instructions to z. Maybe it's partly marketing, but they have a
largely captive audience so it has to be more than that.
Given the
millicode design, a lot of the instructions are basically microcoded subroutines that may well run faster than the normal code equivalent
because they have access to more machine state.
If you want something that gives you more MIPS/$, IBM is happy to sell
you POWER systems.
On Wed, 29 May 2024 08:20:03 GMT, Anton Ertl wrote:
In UTF-32 a character is a sequence of (32-bit) code units.
In UTF-8 a character is a sequence of (8-bit) code units.
The point being, there is a 1:1 correspondence between the two representations of the same characters/code points. So your claim that use
of one is somehow a “mistake” while the other is not, is spurious.
Anton Ertl <[email protected]> schrieb:
It's still marketing. I have listened to several talks about
converting S/360 programs to C code that can be run on arbitrary
hardware, and IBM's audience hears about such things, too, so IBM's
sales force has to provide reasons for not jumping ship. And all
these new features that sound like they are useful are such reasons.
Things like decimal FP and CU14.
The fact that these features provide no actual benefit is their best
property:
No actual benefit?
If you make such a strong statement, I assume that you have done a
thorough analysis of this feature for typical mainframe workloads
and can support your claims with benchmarks.
Thomas Koenig <[email protected]> writes:
Anton Ertl <[email protected]> schrieb:
It's still marketing. I have listened to several talks about
converting S/360 programs to C code that can be run on arbitrary
hardware, and IBM's audience hears about such things, too, so IBM's
sales force has to provide reasons for not jumping ship. And all
these new features that sound like they are useful are such
reasons. Things like decimal FP and CU14.
The fact that these features provide no actual benefit is their best
property:
No actual benefit?
If you make such a strong statement, I assume that you have done a
thorough analysis of this feature for typical mainframe workloads
and can support your claims with benchmarks.
Note that the feature was introduced in Znext (2012). That it is
still there must indicate that it gets some usage.
On Thu, 30 May 2024 14:08:04 GMT
[email protected] (Scott Lurndal) wrote:
Note that the feature was introduced in Znext (2012). That it is
still there must indicate that it gets some usage.
Not necessarily.
Once a feature has been given a publicly documented opcode, it's very
hard to remove it.
Naturally, I don't know whether this particular feature got a publicly
documented opcode, and I don't know where to look.
Concerning benchmarks, last I heard IBM forbids benchmarking z
hardware. Until they change this, I'll assume their z hardware is
abysmally slow and any benchmarking would result in embarrassment, IBM
knows this and that's why they forbid benchmarking.
Thomas Koenig <[email protected]> writes:
Anton Ertl <[email protected]> schrieb:
It's still marketing. I have listened to several talks about
converting S/360 programs to C code that can be run on arbitrary
hardware, and IBM's audience hears about such things, too, so IBM's
sales force has to provide reasons for not jumping ship. And all
these new features that sound like they are useful are such reasons.
Things like decimal FP and CU14.
The fact that these features provide no actual benefit is their best
property:
No actual benefit?
If you make such a strong statement, I assume that you have done a
thorough analysis of this feature for typical mainframe workloads
and can support your claims with benchmarks.
Care to show exactly what you did, and what the results were?
It provides no actual benefit, because UTF-32 provides no actual
benefit.
Anton Ertl <[email protected]> schrieb:
It's still marketing. I have listened to several talks about
converting S/360 programs to C code that can be run on arbitrary
hardware, and IBM's audience hears about such things, too, so IBM's
sales force has to provide reasons for not jumping ship. And all
these new features that sound like they are useful are such reasons.
Things like decimal FP and CU14.
The fact that these features provide no actual benefit is their best
property:
No actual benefit?
If you make such a strong statement, I assume that you have done a
thorough analysis of this feature for typical mainframe workloads
and can support your claims with benchmarks.
Care to show exactly what you did, and what the results were?
I'm not sure the codepoint-oriented API is the best option, but it's not completely clear what *is* the best option. You mention a byte-oriented
API and you might be right that it's a better option, but in the case of Emacs that's what we used in Emacs-20.1 but it worked really poorly
because of backward compatibility issues. I think if we started from
scratch now (i.e. without having to contend with backward compatibility,
and with a better understanding of Unicode (which barely existed back
then)) it might work better, indeed, but that's not been an option
On Wed, 29 May 2024 08:32:17 +0100, moi wrote:
I hate user agents like wget, which is why I block them.
Which is completely futile, which is why it’s so stupid to do.
And so did Fortran. They all did it by severely curtailing their allowed character sets.
It's just that the Algol 60 committee did not want to go there.
They wanted symbols like “÷”, “×”, “↑”, “≤”, “≥”, “≠”, “≡”, “⊃”, “∨”, “∧”, “¬” ... you get the idea. I don’t think any computer system on earth could
provide all those symbols at the time, or even, say, 20 years later.
Lawrence D'Oliveiro <[email protected]d> writes:
On Mon, 20 May 2024 11:46:20 GMT, Anton Ertl wrote:
Algol 60 does not standardize a program representation in characters (a
grave mistake fixed by most later programming languages ...
That would likely not have been considered feasible in 1960, given the
wide variation in character sets between computer systems.
COBOL did it. LISP did it. It was feasible in 1960. It's just that
the Algol 60 committee did not want to go there.
But they _were_ fairly U.S.-centric, and Algol was *not*. For
example,
On Thu, 30 May 2024 06:12:11 -0000 (UTC), "Stephen Fuld" <[email protected]d> wrote:
If the part about the difficulty of learning APL was wrong, then I
apologise.
I would not say that it was wrong. APL "without special characters"
was achieved by way of a transliteration scheme, where short codes represented the special characters. So instead of memorizing funny
shapes, you memorized cryptic abbreviations.
So the character set was _still_ the source of the difficulty of
learning APL even if you happened to be using an implementation that
didn't have any special characters.
On 5/30/2024 11:25 AM, Anton Ertl wrote:
Stefan Monnier <[email protected]> writes:
I'm not sure the codepoint-oriented API is the best option, but it's
not completely clear what *is* the best option. You mention a
byte-oriented API and you might be right that it's a better option,
but in the case of Emacs that's what we used in Emacs-20.1 but it
worked really poorly because of backward compatibility issues. I think
if we started from scratch now (i.e. without having to contend with
backward compatibility, and with a better understanding of Unicode
(which barely existed back then)) it might work better, indeed, but
that's not been an option
Plus, editors are among the very few uses where you have to deal with
individual characters, so the "treat it as opaque string" approach
that works so well for most other code is not good enough there. The
command-line editor of Gforth is one case where we use the xchar words
(those for dealing with code points of UTF-8).
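Moving the cursor by one code point in UTF-8 only requires skipping continuation bytes, which have the bit pattern 10xxxxxx. Below is a minimal Python sketch of that idea. It assumes well-formed UTF-8 input. The function names are made up for illustration and are not Gforth's xchar words.

```python
def is_continuation(b: int) -> bool:
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return b & 0xC0 == 0x80

def next_boundary(buf: bytes, i: int) -> int:
    """Byte index of the code point following the one that starts at i.

    Assumes 0 <= i < len(buf) and that i is on a code point boundary."""
    i += 1
    while i < len(buf) and is_continuation(buf[i]):
        i += 1
    return i

def prev_boundary(buf: bytes, i: int) -> int:
    """Byte index of the code point that ends just before i.

    Assumes 0 < i <= len(buf) and that i is on a code point boundary."""
    i -= 1
    while i > 0 and is_continuation(buf[i]):
        i -= 1
    return i

# Walk forward over "aé€😀" (1, 2, 3 and 4 bytes in UTF-8).
line = "a\u00e9\u20ac\U0001F600".encode("utf-8")
stops = [0]
while stops[-1] < len(line):
    stops.append(next_boundary(line, stops[-1]))
print(stops)  # [0, 1, 3, 6, 10]
```

Neither direction ever decodes a code point's value. Only the byte structure is inspected, which is why such operations stay cheap on UTF-8.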
Yeah.
For text editors, this is one of the few cases where it makes sense to
use 32 or 64 bit characters (say, combining the 'character' with some
additional metadata such as formatting).
Though, one thing that makes sense for text editors is if only the
"currently being edited" lines are fully unpacked, whereas the others
can remain in a more compact form (such as UTF-8), and are then
unpacked as they come into view (say, treating the editor window as a
32-entry modulo cache or similar).
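One way to read the "32-entry modulo cache" idea is as a direct-mapped cache:

- line n is unpacked into slot n mod 32;
- a dirty slot is re-encoded to UTF-8 when another line evicts it.

The class and method names below are hypothetical. This is a sketch of the idea as described, not of how any existing editor works.

```python
from __future__ import annotations

class LineCache:
    """Hypothetical 32-entry direct-mapped cache of unpacked lines.

    Lines are stored as UTF-8 bytes in `lines`. A line is decoded to a
    list of code points only when viewed or edited, into slot line_no % 32.
    """
    SLOTS = 32

    def __init__(self, lines: list[bytes]):
        self.lines = lines
        self.tags: list[int | None] = [None] * self.SLOTS
        self.data: list[list[int]] = [[] for _ in range(self.SLOTS)]
        self.dirty = [False] * self.SLOTS

    def _flush(self, slot: int) -> None:
        tag = self.tags[slot]
        if tag is not None and self.dirty[slot]:
            # Re-pack the edited line into compact UTF-8 form.
            self.lines[tag] = "".join(map(chr, self.data[slot])).encode("utf-8")
            self.dirty[slot] = False

    def get(self, line_no: int) -> list[int]:
        slot = line_no % self.SLOTS
        if self.tags[slot] != line_no:
            self._flush(slot)  # write back the previous occupant if edited
            text = self.lines[line_no].decode("utf-8")
            self.data[slot] = [ord(c) for c in text]
            self.tags[slot] = line_no
        return self.data[slot]

    def set_char(self, line_no: int, col: int, ch: str) -> None:
        # O(1) replacement of one code point in the unpacked line.
        self.get(line_no)[col] = ord(ch)
        self.dirty[line_no % self.SLOTS] = True

    def flush_all(self) -> None:
        for slot in range(self.SLOTS):
            self._flush(slot)
```

Two lines exactly 32 apart evict each other. That is harmless as long as no more than 32 consecutive lines are on screen at once.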
For the rest, say, one can have, say, a big buffer, with an array of
lines giving the location and size of the line's text in the buffer.
If a line is modified, it can be reallocated at the end of the buffer,
and if the buffer gets full, it can be "repacked" and/or expanded as
needed. When written back to a file, the buffer lines can be emitted
in-order to the text file.
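The buffer-plus-line-table scheme described above might look roughly like this. It is a sketch under simple assumptions:

- lines are split on newline;
- there is no undo;
- repacking happens on request rather than when the buffer fills.

All names are hypothetical.

```python
from __future__ import annotations

class LineBuffer:
    """Hypothetical sketch: one big UTF-8 buffer plus a table of
    (offset, length) entries, one per line."""

    def __init__(self, text: str):
        self.buf = bytearray()
        self.table: list[tuple[int, int]] = []
        for line in text.split("\n"):
            data = line.encode("utf-8")
            self.table.append((len(self.buf), len(data)))
            self.buf += data

    def line(self, n: int) -> str:
        off, length = self.table[n]
        return self.buf[off:off + length].decode("utf-8")

    def replace_line(self, n: int, text: str) -> None:
        # The old bytes become garbage; the new text is appended at the end.
        data = text.encode("utf-8")
        self.table[n] = (len(self.buf), len(data))
        self.buf += data

    def repack(self) -> None:
        # Copy the live lines, in order, into a fresh buffer, dropping garbage.
        new_buf = bytearray()
        for i, (off, length) in enumerate(self.table):
            self.table[i] = (len(new_buf), length)
            new_buf += self.buf[off:off + length]
        self.buf = new_buf

    def to_bytes(self) -> bytes:
        # Emit the lines in order, as when writing the buffer back to a file.
        return b"\n".join(self.buf[off:off + length] for off, length in self.table)
```

Because every line is kept as UTF-8, editing a line costs only re-encoding that one line. The rest of the buffer is never converted.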
Not entirely sure how other text editors manage things here, not really
looked into it.
- anton
Read all about it: https://www.vm.ibm.com/library/other/22783213.pdf
Thanks!
It's on page 7-251.
I did read all of it, and it was pretty close to how I would have
designed a sw function to do the same, except for the very funky ABI:
Both source and destination _must_ be an even register number, with the following odd register providing the count/length.
Just from this little snippet I'm pretty sure this instruction has a
sizeable startup overhead; compiler support is probably in the form of
an intrinsic that knows about the need to allocate two pairs of
registers, each pair starting at an even-numbered register.
On Thu, 30 May 2024 03:25:14 -0000 (UTC), John Levine
<[email protected]> wrote:
I do not entirely understand why IBM keeps adding special purpose instructions to z. Maybe it's partly marketing, but they have a
largely captive audience so it has to be more than that.
One possibility is to _keep_ that audience captive even after all the
patents expire that are applicable to machines with the z/Architecture
in its current state, if you are reluctant to believe that these new instructions genuinely improve performance.
On 5/31/2024 12:21 PM, MitchAlsup1 wrote:
For the rest, say, one can have, say, a big buffer, with an array of
lines giving the location and size of the line's text in the buffer.
In a modern text editor, one can paste in {*.xls tables, *.jpg, *.gif,
..}
along with text from different fonts and different backgrounds on a per
character basis.
Errm, I think we call this a word processor, not a text editor.
Granted, text editors don't usually store font or formatting
information in the text itself, but rather it exists temporarily for
things like "syntax highlighting".
If a line is modified, it can be reallocated at the end of the buffer,
and if the buffer gets full, it can be "repacked" and/or expanded as
needed. When written back to a file, the buffer lines can be emitted
in-order to the text file.
Not entirely sure how other text editors manage things here, not really
looked into it.
If you think about it with the above features, you quickly realize it
is not just text anymore.
But, word processors are their own category...
Typically, they also have their own specialized formats (though, "big
blob of XML inside a ZIP package" seems to have become popular).
Whereas text editors typically use plain ASCII/UTF-8/UTF-16 files...
The great "feature creep" in text editors is mostly that modern ones
support syntax highlighting and emojis.
An intermediate option would be a WYSIWYG editor that does MediaWiki or
Markdown. Though, annoyingly, there don't seem to be any that exist as standalone desktop programs (seemingly invariably they are written in JavaScript or similar and intended to operate inside a browser).
I might eventually need to get around to writing something like this
(mostly because I use MediaWiki notation for some of my own
documentation). Also arguably more advanced than the system used by
"info" and "man", though a tool along these lines could make sense (but
possibly as an intermediate, with an interface more like "man" but able
to jump between documents more like "info").
Also, bug hunt is annoying. Find/fix one bug, but more bugs remain...
My project is seemingly in a rather buggy state right at the moment.
But, I guess, did add things like file redirection and similar, along
with a few more standard commands.
So, in the working version, things like "cat file1 > file2"
or "program > file" and similar are now technically possible...
But, also, everything has turned into a crapstorm of crashes...
- anton
U.S.-centric vs U.S. eccentric. http://www.cs.yale.edu/homes/perlis-alan/quotes.html
Actually I am pretty sure that "eccentric" is not a fair
characterisation of his personality, but can't resist.
According to John Savard <[email protected]d>:
On Thu, 30 May 2024 03:25:14 -0000 (UTC), John Levine
<[email protected]> wrote:
I do not entirely understand why IBM keeps adding special purpose instructions to z. Maybe it's partly marketing, but they have a
largely captive audience so it has to be more than that.
One possibility is to _keep_ that audience captive even after all the patents expire that are applicable to machines with the z/Architecture
in its current state, if you are reluctant to believe that these new instructions genuinely improve performance.
Back in the last millennium there were a bunch of companies that made
clones of IBM mainframes. They all failed. It's the whole ecosystem of hardware and software that keeps the
customers, not individual features or patents.
I have to say I'm somewhat surprised that IBM has put a lot of effort
into running linux on zSeries, since that's about as un-captive as you
can get. I would imagine that for some kinds of heavily threaded
workloads they could be competitive since the z machines have upwards
of a hundred CPUs with a shared mostly consistent cache.
According to Michael S <[email protected]>:
U.S.-centric vs U.S. eccentric. http://www.cs.yale.edu/homes/perlis-alan/quotes.html
Actually I am pretty sure that "eccentric" is not a fair
characterisation of his personality, but can't resist.
He was my thesis advisor and he was pretty eccentric. In a nice way,
but still quite a character.
BGB wrote:
On 5/31/2024 12:21 PM, MitchAlsup1 wrote:
For the rest, say, one can have, say, a big buffer, with an array of
lines giving the location and size of the line's text in the buffer.
In a modern text editor, one can paste in {*.xls tables, *.jpg, *.gif,
..}
along with text from different fonts and different backgrounds on a per
character basis.
Errm, I think we call this a word processor, not a text editor.
So you are calling the AOL e-mail editor a word processor???
And every modern forum editor (this one not included) a word processor?
On Fri, 31 May 2024 19:44:49 -0000 (UTC), John Levine
<[email protected]> wrote:
I have to say I'm somewhat surprised that IBM has put a lot of effort
into running linux on zSeries, since that's about as un-captive as you
can get.
You can buy a zSeries machine more cheaply if it can only run Linux,
but not any IBM operating systems. So this is presumably for the
purpose of expanding the popularity of the z/Architecture without in
any way threatening the profitability of their base market.
If they took it to its logical conclusion, and packaged zArchitecture
chips without the ability to run current IBM operating systems in the
same way as POWER chips, I might actually be interested.
Thomas Koenig wrote:
Anton Ertl <[email protected]> schrieb:
It's still marketing. I have listened to several talks about
converting S/360 programs to C code that can be run on arbitrary
hardware, and IBM's audience hears about such things, too, so IBM's
sales force has to provide reasons for not jumping ship. And all
these new features that sound like they are useful are such reasons.
Things like decimal FP and CU14.
The fact that these features provide no actual benefit is their best
property:
No actual benefit?
If you make such a strong statement, I assume that you have done a
thorough analysis of this feature for typical mainframe workloads
and can support your claims with benchmarks.
Care to show exactly what you did, and what the results were?
I am pretty sure Anton is correct, at least for data residing in RAM,
since any reasonably efficient sw algorithm to do the same thing should
be able to keep up with memory bandwidth, right?
I'm not sure that would be the case for text containing some
non-ASCII characters, where you cannot predict branches well
(consider Å, Ø and Æ, which together appear to make up a bit more
than 2.5% according to a random statistic I just
grabbed off the Internet), or ä, ö and ü which have around 1.5%
occurrence together.
In Chinese or Japanese text, I assume the spaces and punctuation
are 7-bit ASCII (are they, actually?) so things would be even
worse for branch prediction.
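The worry here is about the branchy part of a software decoder. A common mitigation is an ASCII fast path. It tests eight bytes at once against the mask 0x8080808080808080 and falls back to per-byte work only when some high bit is set. This is sketched below as an assumption, not as how CU14 or any particular library works.

Python shows the logic here, not the speed. In C, the same test would be one 64-bit load and AND per word.

```python
HIGH_BITS = 0x8080808080808080  # the top bit of each of 8 bytes

def ascii_prefix_len(buf: bytes) -> int:
    """Length of the leading all-ASCII run, checked 8 bytes per step."""
    i, n = 0, len(buf)
    while i + 8 <= n:
        word = int.from_bytes(buf[i:i + 8], "little")
        if word & HIGH_BITS:
            break               # some byte in this word is non-ASCII
        i += 8
    while i < n and buf[i] < 0x80:
        i += 1                  # finish (or locate the stop) byte by byte
    return i
```

The word-at-a-time test does not remove the data-dependent branch; it only changes its granularity. With a few percent of non-ASCII letters, many words still take the slow path. In CJK text, where most characters are multi-byte, the fast path rarely applies at all.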
John Savard <[email protected]d> schrieb:
If they took it to its logical conclusion, and packaged zArchitecture
chips without the ability to run current IBM operating systems in the
same way as POWER chips, I might actually be interested.
Would you like to buy one, then? That would be a large investment
of money and space in your home... but then again, an 18-year-old
once bought a z890, see https://www.youtube.com/watch?v=45X4VP8CGtk
On Sat, 1 Jun 2024 07:47:49 -0000 (UTC), Thomas Koenig
<[email protected]> wrote:
John Savard <[email protected]d> schrieb:
If they took it to its logical conclusion, and packaged zArchitecture
chips without the ability to run current IBM operating systems in the
same way as POWER chips, I might actually be interested.
Would you like to buy one, then? That would be a large investment
of money and space in your home... but then again, an 18-year-old
once bought a z890, see https://www.youtube.com/watch?v=45X4VP8CGtk
Well, when I said "packaged... in the same way as POWER chips", I
meant that they would make systems with fewer CPUs than a mainframe
which were in the category of ordinary desktop computers if they were
to do that... which, of course, they won't.
According to John Savard <[email protected]d>:
On Thu, 30 May 2024 03:25:14 -0000 (UTC), John Levine
<[email protected]> wrote:
I do not entirely understand why IBM keeps adding special purpose instructions to z. Maybe it's partly marketing, but they have a
largely captive audience so it has to be more than that.
One possibility is to _keep_ that audience captive even after all the patents expire that are applicable to machines with the
z/Architecture in its current state, if you are reluctant to believe
that these new instructions genuinely improve performance.
Back in the last millennium there were a bunch of companies that made
clones of IBM mainframes. They all failed. It's the whole ecosystem of hardware and software that keeps the
customers, not individual features or patents.
I have to say I'm somewhat surprised that IBM has put a lot of effort
into running linux on zSeries, since that's about as un-captive as you
can get. I would imagine that for some kinds of heavily threaded
workloads they could be competitive since the z machines have upwards
of a hundred CPUs with a shared mostly consistent cache.
In article <[email protected]>, [email protected]d (John Savard) wrote:
If they took it to its logical conclusion, and packaged
zArchitecture chips without the ability to run current
IBM operating systems in the same way as POWER chips,
I might actually be interested.
A deskside zSeries machine that would boot and run Linux (probably under z/VM) reasonably simply would be interesting to me. A big-endian machine
with comprehensive hardware trapping has software QA uses in the current
era of machines that hardly trap on anything apart from SEGV.
There are POWER8 machines on sale on E-bay, on which you can run
either Linux or AIX, and bigendian too, if you want.
But maybe you can also run a 360/30 on an FPGA board, somebody
has apparently implemented it in VHDL from the logic diagrams: https://github.com/ibm2030/IBM2030
Bottom line: Code point conversion instructions like CU14 solve a
problem imagined by people who have no experience working with UTF-8.
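For reference, here is roughly what a UTF-8 to UTF-32 conversion like CU14 has to do, written as a plain software loop. It is a sketch of the conversion itself and checks the usual well-formedness rules:

- no overlong forms;
- no surrogates;
- nothing above U+10FFFF.

It does not model CU14's register-pair operands, condition codes or exact error handling.

```python
def utf8_to_utf32(src: bytes) -> list[int]:
    """Decode UTF-8 into a list of code points (UTF-32 code units).

    Raises ValueError on bad lead bytes, truncated sequences, bad
    continuation bytes, overlong forms, surrogates and values > U+10FFFF.
    """
    out: list[int] = []
    i, n = 0, len(src)
    while i < n:
        b0 = src[i]
        if b0 < 0x80:                       # ASCII: one byte, no branching on length
            out.append(b0)
            i += 1
            continue
        if 0xC2 <= b0 <= 0xDF:              # 2-byte sequence (C0/C1 are always overlong)
            length, cp, lowest = 2, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:            # 3-byte sequence
            length, cp, lowest = 3, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF4:            # 4-byte sequence
            length, cp, lowest = 4, b0 & 0x07, 0x10000
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        for b in src[i + 1:i + length]:
            if b & 0xC0 != 0x80:
                raise ValueError(f"bad continuation byte in sequence at offset {i}")
            cp = (cp << 6) | (b & 0x3F)     # append 6 payload bits
        if cp < lowest or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point in sequence at offset {i}")
        out.append(cp)
        i += length
    return out
```

The result is exactly what a UTF-32 string would contain. As argued above, most programs can work on the UTF-8 bytes directly and skip this step.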
In article <v3hs9o$3c8gd$[email protected]>, [email protected] (Thomas Koenig) wrote:
There are POWER8 machines on sale on E-bay, on which you can run
either Linux or AIX, and bigendian too, if you want.
Yup. Considered that. Their trapping is not as comprehensive as zSeries,
and I could not justify them.
Lawrence D'Oliveiro <[email protected]d> writes:
On Wed, 29 May 2024 08:20:03 GMT, Anton Ertl wrote:
In UTF-32 a character is a sequence of (32-bit) code units.
In UTF-8 a character is a sequence of (8-bit) code units.
The point being, there is a 1:1 correspondence between the two
representations of the same characters/code points. So your claim that
use of one is somehow a “mistake” while the other is not, is spurious.
If the data you are working on is provided in files containing UTF-8, conversion to UTF-32 does not provide any benefits and is therefore an unnecessary complication, and therefore a mistake.
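Here is a small illustration of working directly on UTF-8. A byte-oriented search returns a byte index that can be used for slicing right away. Because UTF-8 is self-synchronizing, a match of a well-formed needle cannot start or end in the middle of a multi-byte character. The sample text is made up.

```python
text = "Größe: 10 µm; Länge: 3 m".encode("utf-8")
needle = "Länge".encode("utf-8")

start = text.find(needle)           # a byte index into the UTF-8 data
value = text[start + len(needle):]  # slice directly by byte index
print(start)                        # 17 (the code point index would be 14)
print(value.decode("utf-8"))        # : 3 m
```

No conversion to UTF-32 is needed to find or extract the substring. The byte index is all the program ever uses.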
On 30/05/2024 03:43, Lawrence D'Oliveiro wrote:
On Wed, 29 May 2024 08:32:17 +0100, moi wrote:
I hate user agents like wget, which is why I block them.
Which is completely futile, which is why it’s so stupid to do.
What a know-all you are. And offensive with it.
And then there was the LISP machine, which started life with the
infamous "Space Cadet" computer.
The condition code tells you which it was. If it was an interrupt, you
just branch back and keep going.
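The software side of this can be modeled as below. An interruptible instruction does a CPU-determined amount of work, leaves its progress in its register operands, and sets a condition code. The program simply re-executes it until it reports completion.

This is a hypothetical model:

- the value 3 for "partial completion" is an assumption here;
- the per-byte work is a stand-in (uppercasing ASCII) rather than a real conversion.

```python
from __future__ import annotations

CC_DONE = 0     # operation finished
CC_PARTIAL = 3  # assumed: stopped after a CPU-determined amount of work

def convert_step(src: bytes, pos: int, dst: bytearray, budget: int = 4) -> tuple[int, int]:
    """Hypothetical model of one execution of an interruptible instruction.

    Processes at most `budget` bytes starting at `pos`, then returns the
    condition code and the updated position (the "register operands")."""
    end = min(len(src), pos + budget)
    dst.extend(src[pos:end].upper())  # stand-in for the real per-unit work
    return (CC_DONE if end == len(src) else CC_PARTIAL), end

def run(src: bytes) -> tuple[bytes, int]:
    dst, pos, executions = bytearray(), 0, 0
    while True:
        cc, pos = convert_step(src, pos, dst)
        executions += 1
        if cc != CC_PARTIAL:  # otherwise "just branch back and keep going"
            return bytes(dst), executions
```

All the resume state lives in the operands, so restarting costs the program nothing more than a conditional branch. Whether tracking that state costs the hardware anything is a separate question.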
SPARCs are big-endian and trap on unaligned access (at least that
was the case when I last used one long ago), while S/370 ff. does
not trap on unaligned access.
In article <[email protected]>, [email protected] (Anton Ertl) wrote:
SPARCs are big-endian and trap on unaligned access (at least that
was the case when I last used one long ago), while S/370 ff. does
not trap on unaligned access.
OK, that shoots down S/370 for this job.
John
On Thu, 30 May 2024 14:42:14 -0000 (UTC), John Levine wrote:
The condition code tells you which it was. If it was an interrupt, you
just branch back and keep going.
Does it really hurt performance for the CPU to keep track of the fact that
an instruction has to be restarted after an interrupt?
On the old VAX, there was a processor status bit called “First Part Done”,
On Thu, 30 May 2024 19:01:11 +0100, moi wrote:
On 30/05/2024 03:43, Lawrence D'Oliveiro wrote:
On Wed, 29 May 2024 08:32:17 +0100, moi wrote:
I hate user agents like wget, which is why I block them.
Which is completely futile, which is why it’s so stupid to do.
What a know-all you are. And offensive with it.
You find it offensive that your block is so easy to bypass?
Sucks to be you.
On 2024-06-02 15:44, John Levine wrote:
IBM made several S/360 and S/370 add-in boards for PCs. They worked
but were never very popular, probably because nobody bought a
mainframe for the CPU and PC peripherals are underpowered.
Were they not marketed as a way of developing s/w on a PC without
chewing up mainframe time?
Lawrence D'Oliveiro <[email protected]d> writes:
On Thu, 30 May 2024 14:42:14 -0000 (UTC), John Levine wrote:
The condition code tells you which it was. If it was an interrupt, you
just branch back and keep going.
Does it really hurt performance for the CPU to keep track of the fact
that an instruction has to be restarted after an interrupt?
Yes, of course. And it complicates the design, which makes it harder
to verify, particularly for an out-of-order design.
On Fri, 31 May 2024 12:55:59 -0500, BGB wrote:
On 5/31/2024 12:21 PM, MitchAlsup1 wrote:
Errm, I think we call this a word processor, not a text editor.
In a modern text editor, one can paste in {*.xls tables, *.jpg, *.gif,
..} along with text from different fonts and different backgrounds on a
per character basis.
Emacs has things called “text attributes” and “overlays”, for doing precisely this sort of thing. You can even use these things to define clickable buttons. Yet nobody would call Emacs a “word processor”.
Anton Ertl <[email protected]> schrieb:
The fact that these features provide no actual benefit is their best
property:
No actual benefit?
If you make such a strong statement, I assume that you have done a
thorough analysis of this feature for typical mainframe workloads and
can support your claims with benchmarks.
Back in the last millenium there were a bunch of companies that made
clones of IBM mainframes. They all failed.
One of the main selling points [of zSeries] is the hardware
reliability ...
If you make such a strong statement, I assume that you have done a
thorough analysis of this feature for typical mainframe workloads and
can support your claims with benchmarks.
We already know the answer to that. It’s why RISC has taken over the computing world.
Remember that “mainframe workloads” are primarily I/O bound, not
CPU-bound. The whole concept of a “mainframe” arose in the era when
CPU time was scarce and expensive, so you had all these intelligent
I/O peripherals that could be given sequences of operations to
perform, with minimal CPU intervention. It was all about maximizing
throughput (batch operation), not minimizing latency (interactive
operation).
Nowadays, the whole concept is obsolete. So the only thing keeping it a
viable business has to be marketing, not technical, reasons.
Lawrence D'Oliveiro <[email protected]d> writes:
On Fri, 31 May 2024 12:55:59 -0500, BGB wrote:
On 5/31/2024 12:21 PM, MitchAlsup1 wrote:
Errm, I think we call this a word processor, not a text editor.
In a modern text editor, one can paste in {*.xls tables, *.jpg,
*.gif, ..} along with text from different fonts and different
backgrounds on a per character basis.
Emacs has things called “text attributes” and “overlays”, for doing precisely this sort of thing. You can even use these things to define
clickable buttons. Yet nobody would call Emacs a “word processor”.
RMS did call it a word processor.
https://lists.gnu.org/archive/html/emacs-devel/2013-11/msg00515.html
You can buy POWER9 machines from RaptorCS. The command prompt does not
look different from AMD64, but of course the coolness factor is much
higher.
Remember that “mainframe workloads” are primarily I/O bound, not CPU-bound. The whole concept of a “mainframe” arose in the era when CPU time was scarce and expensive, so you had all these intelligent I/O peripherals that could be given sequences of operations to perform, with minimal CPU intervention. It was all about maximizing throughput (batch operation),
not minimizing latency (interactive operation).
On Fri, 31 May 2024 19:44:49 -0000 (UTC), John Levine wrote:
All the mainframe companies apart from IBM eventually went out of business because the whole mainframe concept is obsolete.
Lawrence D'Oliveiro <[email protected]d> writes:
On Fri, 31 May 2024 19:44:49 -0000 (UTC), John Levine wrote:
All the mainframe companies apart from IBM eventually went out of business because the whole mainframe concept is obsolete.
Is that a fact?
https://www.unisys.com/
According to Scott Lurndal <[email protected]>:
Lawrence D'Oliveiro <[email protected]d> writes:
On Fri, 31 May 2024 19:44:49 -0000 (UTC), John Levine wrote:
All the mainframe companies apart from IBM eventually went out of business because the whole mainframe concept is obsolete.
Is that a fact?
https://www.unisys.com/
I think that IBM is the only one that still makes CPUs. Aren't the
Unisys machines all emulated on commodity microprocessors now?
That doesn't keep them from working perfectly well, of course.
For text editors, this is one of the few cases where it makes sense to use 32- or
64-bit characters (say, combining the 'character' with some additional metadata such as formatting).
Though, one thing that makes sense for text editors is if only the
"currently being edited" lines are fully unpacked, whereas the others can remain in a more compact form (such as UTF-8), and are then unpacked as they come into view (say, treating the editor window as a 32-entry modulo cache
or similar).
I'm not entirely sure how other text editors manage things here; I haven't really looked into it.
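Here is a minimal Python sketch of the scheme described above, to make it concrete. It assumes a 32-bit cell split as 21 bits of code point plus 11 bits of formatting attribute. It also assumes the cache has a fixed number of slots indexed by `line_number % size`. The names `pack_cell`, `unpack_cell` and `LineCache` are hypothetical, and the sketch drops attributes on write-back for brevity.

```python
from __future__ import annotations

from array import array

CP_BITS = 21                             # max code point U+10FFFF needs 21 bits
CP_MASK = (1 << CP_BITS) - 1
ATTR_MASK = (1 << (32 - CP_BITS)) - 1    # assumption: 11 bits left for formatting


def pack_cell(ch: str, attr: int = 0) -> int:
    """Combine one code point and a small attribute into a 32-bit cell."""
    if not 0 <= attr <= ATTR_MASK:
        raise ValueError("attribute does not fit in 11 bits")
    return (attr << CP_BITS) | ord(ch)


def unpack_cell(cell: int) -> tuple[str, int]:
    """Split a 32-bit cell back into (character, attribute)."""
    return chr(cell & CP_MASK), cell >> CP_BITS


class LineCache:
    """Lines are stored as compact UTF-8; only lines being viewed or edited
    are unpacked into cell arrays, kept in `size` slots chosen by
    line_number % size (a direct-mapped, 'modulo' cache)."""

    def __init__(self, lines: list[str], size: int = 32) -> None:
        self.stored = [line.encode("utf-8") for line in lines]
        self.size = size
        self.slots: list[tuple[int, array] | None] = [None] * size

    def unpacked(self, n: int) -> array:
        """Return line n as a mutable array of cells, unpacking on a miss."""
        slot = n % self.size
        entry = self.slots[slot]
        if entry is not None and entry[0] == n:
            return entry[1]                  # hit: already unpacked
        if entry is not None:
            self._write_back(*entry)         # evict the previous occupant
        # typecode "I" is 4 bytes on common platforms
        cells = array("I", (pack_cell(c) for c in self.stored[n].decode("utf-8")))
        self.slots[slot] = (n, cells)
        return cells

    def _write_back(self, n: int, cells: array) -> None:
        # Sketch simplification: attributes are dropped when repacking;
        # an editor would have to keep them somewhere else.
        self.stored[n] = "".join(unpack_cell(c)[0] for c in cells).encode("utf-8")
```

Edits go into the unpacked array in place. A line is re-encoded to UTF-8 only when another line maps to the same slot and evicts it.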
Bottom line: Code point conversion instructions like CU14 solve a
problem imagined by people who have no experience working with UTF-8.
The original instructions were CU12 and CU21, which convert between
UTF-8 and UTF-16. That really is useful, e.g., to read a file of UTF-8
into a program in Java or JavaScript, which use UTF-16. I agree the
UTF-32 versions added in zSeries are less likely to be useful.
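To show what the UTF-8 to UTF-16 transformation involves, here is a minimal Python sketch of the conversion that CU12 performs in hardware. It models only the data transformation. It does not model the instruction's operands, condition codes or partial-completion behaviour, and the function name `utf8_to_utf16` is hypothetical.

```python
def utf8_to_utf16(data: bytes) -> list[int]:
    """Convert UTF-8 bytes to a list of UTF-16 code units.

    Sketch only: checks lead/continuation bytes, truncation and the
    U+10FFFF ceiling, but does not reject overlong encodings or encoded
    surrogates as a strict decoder would.
    """
    units: list[int] = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:                      # 0xxxxxxx: ASCII
            cp, extra = lead, 0
        elif 0xC0 <= lead < 0xE0:            # 110xxxxx: 2-byte sequence
            cp, extra = lead & 0x1F, 1
        elif 0xE0 <= lead < 0xF0:            # 1110xxxx: 3-byte sequence
            cp, extra = lead & 0x0F, 2
        elif 0xF0 <= lead < 0xF8:            # 11110xxx: 4-byte sequence
            cp, extra = lead & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + extra >= n:
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, extra + 1):
            cont = data[i + k]
            if (cont & 0xC0) != 0x80:        # continuation bytes are 10xxxxxx
                raise ValueError(f"bad continuation byte at offset {i + k}")
            cp = (cp << 6) | (cont & 0x3F)
        if cp > 0x10FFFF:
            raise ValueError(f"code point beyond U+10FFFF at offset {i}")
        i += extra + 1
        if cp < 0x10000:                     # BMP: one code unit
            units.append(cp)
        else:                                # supplementary plane: surrogate pair
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))
            units.append(0xDC00 | (cp & 0x3FF))
    return units
```

For example, `utf8_to_utf16("😀".encode("utf-8"))` returns `[55357, 56832]`, i.e. the surrogate pair 0xD83D 0xDE00. This is the representation a UTF-16 language such as Java or JavaScript would hold.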
Emacs uses a gap buffer, which is a quite primitive approach which in
theory has poor worst case behavior but works surprisingly well in
practice (especially with the speed at which current CPUs can copy/move
large chunks of memory).
Others use structures like ropes.
https://coredumped.dev/2023/08/09/text-showdown-gap-buffers-vs-ropes/
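To make the gap-buffer idea concrete, here is a toy Python version. Emacs's real implementation is in C and works on bytes; this sketch only shows the data structure, and the class name `GapBuffer` is hypothetical. Inserting at the cursor just fills the gap. Moving the cursor copies every character between the old and new positions across the gap, which is where the poor worst case comes from.

```python
class GapBuffer:
    """Text in one list with a gap of free slots at the cursor."""

    def __init__(self, text: str = "", gap: int = 8) -> None:
        self.buf: list = list(text) + [None] * gap
        self.gap_start = len(text)           # cursor position
        self.gap_end = len(self.buf)         # first slot after the gap

    def _grow(self) -> None:
        extra = max(8, len(self.buf))        # roughly double the buffer
        self.buf[self.gap_end:self.gap_end] = [None] * extra
        self.gap_end += extra

    def move_to(self, pos: int) -> None:
        """Move the cursor (the gap) to text position pos."""
        if not 0 <= pos <= len(self):
            raise IndexError(pos)
        while self.gap_start > pos:          # shift characters right across the gap
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:          # shift characters left across the gap
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def insert(self, s: str) -> None:
        for ch in s:
            if self.gap_start == self.gap_end:
                self._grow()
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def delete_before(self, count: int = 1) -> None:
        """Backspace: just widen the gap to the left."""
        self.gap_start = max(0, self.gap_start - count)

    def __len__(self) -> int:
        return len(self.buf) - (self.gap_end - self.gap_start)

    def __str__(self) -> str:
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])
```

Typing at one spot is therefore cheap, and a long jump costs one bulk move. That bulk move is the case where fast memory copies on current CPUs help.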
Lawrence D'Oliveiro <[email protected]d> writes:
On Sat, 1 Jun 2024 07:47:49 -0000 (UTC), Thomas Koenig wrote:
One of the main selling points [of zSeries] is the hardware
reliability ...
Quite an expensive way to get reliability. How does an outfit like
Google achieve essentially 0% downtime? By running a swarm of half a million commodity servers, that’s how.
And that's not expensive?
On Tue, 04 Jun 2024 13:11:52 GMT, Scott Lurndal wrote:
Lawrence D'Oliveiro <[email protected]d> writes:
On Sat, 1 Jun 2024 07:47:49 -0000 (UTC), Thomas Koenig wrote:
One of the main selling points [of zSeries] is the hardware
reliability ...
Quite an expensive way to get reliability. How does an outfit like
Google achieve essentially 0% downtime? By running a swarm of half a million commodity servers, that’s how.
And that's not expensive?
Consider the equivalent number of mainframes, with their inbuilt
diagnostics capabilities etc, to match that reliability.
Lawrence D'Oliveiro <[email protected]d> writes:
On Tue, 04 Jun 2024 13:11:52 GMT, Scott Lurndal wrote:
Lawrence D'Oliveiro <[email protected]d> writes:
On Sat, 1 Jun 2024 07:47:49 -0000 (UTC), Thomas Koenig wrote:
One of the main selling points [of zSeries] is the hardware
reliability ...
Quite an expensive way to get reliability. How does an outfit like Google achieve essentially 0% downtime? By running a swarm of half a million commodity servers, that’s how.
And that's not expensive?
Consider the equivalent number of mainframes, with their inbuilt diagnostics capabilities etc, to match that reliability.
Tandem and Stratus did it three decades ago.
[email protected] (John Dallman) wrote:
In article <[email protected]>, [email protected] (Anton Ertl) wrote:
SPARCs are big-endian and trap on unaligned access (at least
that was the case when I last used one long ago), while S/370
ff. does not trap on unaligned access.
OK, that shoots down S/370 for this job.
What exactly is the job?
Is it for pure personal amusement, or are there practical needs?
. . . there were other entirely separate companies, like CDC
where Seymour Cray invented the concept of the _supercomputer_,
much to the surprise of his upper management who just wanted
to sell _business_ machines.
In article <[email protected]>,
[email protected] (Michael S) wrote:
[email protected] (John Dallman) wrote:
In article <[email protected]>, [email protected] (Anton Ertl) wrote:
SPARCs are big-endian and trap on unaligned access (at least
that was the case when I last used one long ago), while S/370
ff. does not trap on unaligned access.
OK, that shoots down S/370 for this job.
What exactly is the job?
Is it for pure personal amusement, or are there practical needs?
I would like to keep testing the commercial product I work on in a big-endian, alignment-trapping environment.
However, there isn't much
budget available for this. We have a SPARC box doing it, left over
from when we actually supported Solaris, but as testing grows, its
CPU power becomes less adequate for the job.
New SPARC boxes are expensive, dealing with Oracle is hard work, and
the architecture has no future.
I've never been very serious about using Linux on IBM Z for this -
it's expensive and dealing with IBM is hard work, although the
architecture still seems to have a future - but if it doesn't trap
misaligned accesses, it's disqualified.
John
In article <v3lqfs$48om$[email protected]>, [email protected]d (Lawrence D'Oliveiro) wrote:
. . . there were other entirely separate companies, like CDC
where Seymour Cray invented the concept of the _supercomputer_,
much to the surprise of his upper management who just wanted
to sell _business_ machines.
Another view is that the supercomputer was implicit in the needs of
the US nuclear weapons laboratories to do simulations of their
designs. Computers are much cheaper than nuclear testing.
John
On Wed, 5 Jun 2024 09:40 +0100 (BST)
[email protected] (John Dallman) wrote:
In article <[email protected]>,
[email protected] (Michael S) wrote:
[email protected] (John Dallman) wrote:
In article <[email protected]>,
[email protected] (Anton Ertl) wrote:
SPARCs are big-endian and trap on unaligned access (at least
that was the case when I last used one long ago), while S/370
ff. does not trap on unaligned access.
OK, that shoots down S/370 for this job.
What exactly is the job?
Is it for pure personal amusement, or are there practical needs?
I would like to keep testing the commercial product I work on in a
big-endian, alignment-trapping environment.
Maybe now is the time to stop wanting to keep it?
[email protected] (John Dallman) writes:
I would like to keep testing the commercial product I work on in a big-endian, alignment-trapping environment.
Computer architecture exhibits convergence. Starting in the 1960s, it converged on byte addressing with 8-bit bytes and on 2s-complement;
starting in the 1980s, it converged on IEEE FP; and by the 2010s, it had
converged on supporting unaligned accesses and on little-endian
byte order. Your difficulty in getting hardware for testing whether software works with alignment restrictions and with big-endian byte
order is a result of that convergence. Maybe your desire to keep your software ready for big-endian hardware and hardware with alignment restrictions is misguided.
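For readers wondering what kind of bug big-endian testing catches, here is a minimal Python sketch. It is only an assumed illustration, not a description of the product under discussion, and the helper names are hypothetical. Code that reads data in the host's native byte order gives correct results on little-endian machines by accident and wrong results on big-endian ones. Alignment traps cannot be shown this way, because Python hides them.

```python
import struct
import sys


def read_u32_le(buf: bytes, offset: int = 0) -> int:
    """Portable: decode 4 bytes as little-endian, whatever the host order."""
    return int.from_bytes(buf[offset:offset + 4], "little")


def read_u32_native(buf: bytes, offset: int = 0) -> int:
    """Non-portable: '=' selects native byte order (standard 4-byte size),
    so the result depends on the machine the code runs on."""
    (value,) = struct.unpack_from("=I", buf, offset)
    return value


record = (258).to_bytes(4, "little")     # e.g. a file written on a little-endian box
print(sys.byteorder, read_u32_le(record), read_u32_native(record))
```

On a little-endian host this prints `little 258 258`, so the bug is invisible. On a big-endian host the native reader returns 33619968 (0x02010000) instead of 258.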
New SPARC boxes are expensive, dealing with Oracle is hard work, and the architecture has no future.
Ebay?
- anton
On Wed, 5 Jun 2024 09:40 +0100 (BST), John Dallman wrote:
Another view is that the supercomputer was implicit in the needs of the
US nuclear weapons laboratories to do simulations of their designs.
And in code cracking. All very much a function of the Cold War.
No coincidence that Cray’s fortunes took a downturn when that ended.
Cray sold the first CRAY-1 for $60M, which is what the nuclear physicists
could afford, and that sale wrote off the entire development cost.
On Tue, 4 Jun 2024 23:25:18 -0000 (UTC), Lawrence D'Oliveiro
<[email protected]d> wrote:
On Tue, 04 Jun 2024 13:11:52 GMT, Scott Lurndal wrote:
Lawrence D'Oliveiro <[email protected]d> writes:
On Sat, 1 Jun 2024 07:47:49 -0000 (UTC), Thomas Koenig wrote:
One of the main selling points [of zSeries] is the hardware
reliability ...
Quite an expensive way to get reliability. How does an outfit like Google achieve essentially 0% downtime? By running a swarm of half a million commodity servers, that’s how.
And that's not expensive?
Consider the equivalent number of mainframes, with their inbuilt diagnostics capabilities etc, to match that reliability.
Can't find it now and don't remember many details, but ...
A long time ago, there was a story going around about Microsoft vs IBM regarding the day-to-day operation of their company web sites. It
claimed that Microsoft was running a ~1000 machine server farm with a
crew of ~100, whereas IBM was running 3 mainframes with a crew of ~10.
On Tue, 4 Jun 2024 23:25:18 -0000 (UTC), Lawrence D'Oliveiro
<[email protected]d> wrote:
On Tue, 04 Jun 2024 13:11:52 GMT, Scott Lurndal wrote:
Lawrence D'Oliveiro <[email protected]d> writes:
On Sat, 1 Jun 2024 07:47:49 -0000 (UTC), Thomas Koenig wrote:
One of the main selling points [of zSeries] is the hardware
reliability ...
Quite an expensive way to get reliability. How does an outfit like
Google achieve essentially 0% downtime? By running a swarm of half a
million commodity servers, that’s how.
And that's not expensive?
Consider the equivalent number of mainframes, with their inbuilt
diagnostics capabilities etc, to match that reliability.
Can't find it now and don't remember many details, but ...
A long time ago, there was a story going around about Microsoft vs IBM regarding the day-to-day operation of their company web sites. It
claimed that Microsoft was running a ~1000 machine server farm with a
crew of ~100, whereas IBM was running 3 mainframes with a crew of ~10.
If you're doing something that is mostly read-only and easy to
parallelize, then it makes sense to use a farm of cheap PCs. But if you
are a bank or an airline, you need to be able to lock your database so
that you debit a bank account or sell a plane seat exactly once. There
is a rule of thumb that the cost of locking something grows roughly as
the square of the number of things contending for the lock.
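The rule of thumb above is stated without derivation. One simple way to read it, which is an assumption here rather than the poster's model, is that each contender may have to wait on or coordinate with every other one. Under that reading, the number of possible conflicts is the number of distinct pairs, n(n-1)/2. The function name `contending_pairs` is hypothetical.

```python
def contending_pairs(n: int) -> int:
    """Distinct pairs among n contenders: n * (n - 1) / 2."""
    return n * (n - 1) // 2


for n in (10, 100, 1000):
    # 10x more contenders -> roughly 100x more potential conflicts
    print(n, contending_pairs(n))
```

This prints 45, 4950 and 499500 for 10, 100 and 1000 contenders. That growth is roughly quadratic, which is consistent with the rule of thumb.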
On Thu, 6 Jun 2024 11:55:22 -0000 (UTC), John Levine wrote:
If you're doing something that is mostly read-only and easy to
parallelize, then it makes sense to use a farm of cheap PCs. But if
you are a bank or an airline, you need to be able to lock your
database so that you debit a bank account or sell a plane seat
exactly once. There is a rule of thumb that the cost of locking
something grows roughly as the square of the number of things
contending for the lock.
Remember that the number of users actually buying a product at any
given time is only a small proportion (say 1%) of the number of users currently accessing the site.
Lawrence D'Oliveiro wrote:
Remember that the number of users actually buying a product at any
given time is only a small proportion (say 1%) of the number of users
currently accessing the site.
I don't know where you got that number ...
On Fri, 7 Jun 2024 03:13:54 -0000 (UTC), Stephen Fuld wrote:
Lawrence D'Oliveiro wrote:
Remember that the number of users actually buying a product at any
given time is only a small proportion (say 1%) of the number of
users currently accessing the site.
I don't know where you got that number ...
From actual experience.
According to George Neuner <[email protected]>:
Consider the equivalent number of mainframes, with their inbuilt
diagnostics capabilities etc, to match that reliability.
Can't find it now and don't remember many details, but ...
A long time ago, there was a story going around about Microsoft vs IBM
regarding the day-to-day operation of their company web sites. It
claimed that Microsoft was running a ~1000 machine server farm with a
crew of ~100, whereas IBM was running 3 mainframes with a crew of ~10.
It depends on what you want to do.
If you're doing something that is mostly read-only and easy to
parallelize, then it makes sense to use a farm of cheap PCs. But if
you are a bank or an airline, you need to be able to lock your
database so that you debit a bank account or sell a plane seat exactly
once. There is a rule of thumb that the cost of locking something
grows roughly as the square of the number of things contending for
the lock.
Airline reservation systems are the classic example of a mainframe database. About 25 years ago, ITA Software had the bright
idea to do searches for seats and prices on racks of cheap PCs, which
worked great since it's read only, and if they suggest a seat or fare
that turns out to have just sold out, too bad, try again. But when
travel agents and airlines used it, they kept the ticketing info in a
regular database because it has to work.
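The split described above can be sketched in Python as follows. Searches run against a periodically refreshed read-only snapshot, which is cheap but may be stale. Selling a seat goes through a single lock-protected authoritative store, so each seat is sold exactly once. This is a hypothetical illustration (`BookingSystem` and its methods are invented names), and a `threading.Lock` stands in for the "regular database" and its transactions.

```python
import threading


class BookingSystem:
    """Hypothetical sketch: stale-tolerant search, exactly-once selling."""

    def __init__(self, seats) -> None:
        self._lock = threading.Lock()
        self._available = set(seats)          # authoritative state
        self._owner = {}                      # seat -> buyer
        self._snapshot = frozenset(self._available)

    def refresh_snapshot(self) -> None:
        """Copy authoritative state into the read-only search snapshot."""
        with self._lock:
            self._snapshot = frozenset(self._available)

    def search(self) -> list:
        # No lock: reads a snapshot that may already be out of date.
        return sorted(self._snapshot)

    def sell(self, seat, buyer) -> bool:
        """Return True if this call sold the seat, False if it was gone."""
        with self._lock:
            if seat not in self._available:
                return False                  # "too bad, try again"
            self._available.remove(seat)
            self._owner[seat] = buyer
            return True
```

A search may suggest a seat that `sell` then refuses. That outcome is acceptable for search, whereas a seat sold twice would not be.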