quadibloc <[email protected]> writes:
Eventually, IBM caught up with the Control
Data 6600 by perfecting pipelining in the IBM 360/91, and then combining
it with cache in the 360/195. From the Pentium II onwards, that's the
way computers are made nowadays.
Pipelining and caches are already used on the MIPS R2000 in 1986, and
the 486 in 1989.
You are probably thinking of OoO execution, where people usually write
as if the Tomasulo algorithm of the 360/91 had implemented the modern
concept of OoO execution. But the 360/91 only did OoO for FP, did not
support branch prediction, had imprecise exceptions, and the Tomasulo
algorithm was used primarily as a workaround for the dearth of FP
registers in the S/360.
The 360/91 had primitive branch prediction in "loop mode". It had an
eight-doubleword instruction queue (which it confusingly called a stack).
If a program did a backward branch of less than eight doublewords, it'd
stop prefetching and execute out of the queue until the program fell or
branched out.
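The loop-mode condition described above can be sketched in a few lines. This is only an illustration of the rule as stated in the post, not a model of the real hardware: the function name and the constants are hypothetical, the 8-byte doubleword is the S/360 definition, and alignment of the queue contents is ignored.

```python
QUEUE_DOUBLEWORDS = 8        # instruction queue size, as given in the post
BYTES_PER_DOUBLEWORD = 8     # an S/360 doubleword is 8 bytes


def enters_loop_mode(branch_addr: int, target_addr: int) -> bool:
    """Return True if a taken branch goes backward by less than the queue size.

    Simplifying assumption: only the byte distance matters, so the loop body
    is treated as already sitting in the queue whenever it is short enough.
    """
    distance = branch_addr - target_addr
    return 0 < distance < QUEUE_DOUBLEWORDS * BYTES_PER_DOUBLEWORD


# A short backward branch qualifies; a long backward or forward branch does not.
print(enters_loop_mode(0x1040, 0x1010))
print(enters_loop_mode(0x1040, 0x1000))
print(enters_loop_mode(0x1000, 0x1040))
```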
quadibloc <[email protected]> writes:
Eventually, IBM caught up with the Control
Data 6600 by perfecting pipelining in the IBM 360/91, and then combining
it with cache in the 360/195. From the Pentium II onwards, that's the
way computers are made nowadays.
Pipelining and caches are already used on the MIPS R2000 in 1986, and
the 486 in 1989.
You are probably thinking of OoO Execution, where people usually write
as if the Tomasulo algorithm of the 360/91 had implemented the modern
concept of OoO execution. But the 360/91 only did OoO for FP, did not
support branch prediction, had imprecise exceptions, and the Tomasulo
algorithm was used primarily as a workaround for the dearth of FP
registers in the S/360.
The innovation that made OoO execution generally usable, rather than a
publicity stunt like the 360/91, is the reorder buffer (ROB), which
allows instructions to be retired in order and speculatively
"executed" instructions to be canceled after an exception or branch
misprediction.
The Pentium Pro (introduced 1995-11-01), HP PA-8000 (introduced
1995-11-02), and MIPS R10000 (introduced 1996-01) are the first microprocessors which have full-blown OoO execution.
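To make the role of the ROB concrete, here is a minimal sketch of in-order retirement with squashing. It is a toy model under my own assumptions, not any particular machine's design: the class and field names are hypothetical, and "faulted" stands for either an exception or a mispredicted branch.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    done: bool = False       # result computed, possibly out of program order
    faulted: bool = False    # exception or mispredicted branch


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()   # program order, oldest entry at the left

    def issue(self, name):
        entry = Entry(name)
        self.entries.append(entry)
        return entry

    def retire(self):
        """Retire finished instructions from the head, in program order.

        Returns (retired names, faulting name or None, squashed names).
        """
        retired = []
        while self.entries and self.entries[0].done:
            head = self.entries.popleft()
            if head.faulted:
                # Everything younger was only speculatively executed: drop it.
                squashed = [e.name for e in self.entries]
                self.entries.clear()
                return retired, head.name, squashed
            retired.append(head.name)
        return retired, None, []


rob = ReorderBuffer()
a, b, c, d = (rob.issue(n) for n in "abcd")
c.done = True                 # c finishes first, out of order
print(rob.retire())           # ([], None, []): a is still pending, nothing retires
a.done = True
b.done = b.faulted = True     # b turns out to be a mispredicted branch
d.done = True
print(rob.retire())           # (['a'], 'b', ['c', 'd'])
```

The point of the sketch: c and d had finished executing, but because they sit behind b in the buffer their results never become architecturally visible, which is what gives precise exceptions and recovery from misprediction.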
But as someone pointed out to me, IBM has implemented OoO execution
between the 370/195 and the Pentium Pro: The ES/9000 models 900 and
820 (shipping from September 1991) "were the first models with
out-of-order execution since the System/370-195 of 1973. However
unlike the old S/360-91-derived systems, the models 900 and 820 had
full out-of-order execution for both integer and floating-point units,
with precise exception handling, and a fully superscalar pipeline." <https://en.wikipedia.org/wiki/IBM_System/390#ES/9000>. So apparently
they had a ROB, and AFAIK were the first machines to have one. These
models also had a branch target buffer; the article does not mention
branch prediction proper, but given a ROB and a branch target buffer,
it would be surprising if they did not predict branches.
So who came up with the concept of the ROB? I recently looked at one
of the HPS papers (Hwu, Patt, Shebanov on a High Performance Substrate
for the VAX from the mid-late 80s) again, and there was no ROB in that
paper. I did not check whether their later papers had it.
So apparently ROBs were not known in the mid-1980s in
academia, and in 1991 there was hardware with a ROB commercially
available, and a few years later it appeared in microprocessors.
I wonder how early and how much IBM talked about their ES/9000 OoO
implementation and features, but that may have inspired the teams at
Intel, HP and SGI; or maybe there was an earlier source that inspired
them all, but only in 1995/1996 was the number of transistors on a chip
enough to do OoO on a microprocessor.
Ironically, in the transition to CMOS (i.e., microprocessors) IBM
mainframe processors regressed back to in-order (and, I think,
single-issue) again (but with higher clock rates), and in the early
2000s they looked pretty outdated to me. In the meantime they have re-progressed to OoO again AFAIK.
Back to OoO: it's interesting that Tomasulo and the 360/91 are
mentioned often, but the ROB and its inventor(s?), which are at least
as important for the success of OoO execution, aren't.
- anton
On Mon, 19 May 2025 6:22:42 +0000, Anton Ertl wrote:
You are probably thinking of OoO Execution, where people usually write
as if the Tomasulo algorithm of the 360/91 had implemented the modern
concept of OoO execution. But the 360/91 only did OoO for FP, did not
support branch prediction, had imprecise exceptions, and the Tomasulo
algorithm was used primarily as a workaround for the dearth of FP
registers in the S/360.
Yes, I was thinking of OoO execution, as opposed to other forms of
pipelining - basic pipelining was used in the 7094 II and even the 6502.
The Pentium II (and Pentium Pro) also only used OoO for floating-point,
while the 68050 only used OoO for integers!
The innovation that made OoO execution generally usable, rather than a
publicity stunt like the 360/91, is the reorder buffer (ROB), which
allows instructions to be retired in order and speculatively
"executed" instructions to be canceled after an exception or branch
misprediction.
The Pentium Pro (introduced 1995-11-01), HP PA-8000 (introduced
1995-11-02), and MIPS R10000 (introduced 1996-01) are the first microprocessors which have full-blown OoO execution.
Wasn't one of the earliest forms of branch prediction the simple
heuristic of always taking it in one direction and not taking it in the
other direction? I seem to remember that being the case for some of the
early pipelined microprocessors. I believe it was called static branch
prediction, as opposed to the more modern dynamic branch prediction.
On Mon, 19 May 2025 19:09:12 +0000, Ze wrote:
Wasn't one of the earliest forms of branch prediction the simple
heuristic of always taking it in one direction and not taking it in the
other direction? I seem to remember that being the case for some of the
early pipelined microprocessors. I believe it was called static branch
prediction, as opposed to the more modern dynamic branch prediction.
The simple heuristic I remember was to assume that backward branches
would be more likely to be taken than not (on the grounds that they were
probably loops) while forward ones would more likely not be taken (I
guess as an excuse for not disturbing the pipeline too much).
On Tue, 20 May 2025 0:04:03 +0000, Lawrence D'Oliveiro wrote:
On Mon, 19 May 2025 19:09:12 +0000, Ze wrote:
Wasn't one of the earliest forms of branch prediction the simple
heuristic of always taking it in one direction and not taking it in the
other direction? I seem to remember that being the case for some of the
early pipelined microprocessors. I believe it was called static branch
prediction, as opposed to the more modern dynamic branch prediction.
The simple heuristic I remember was to assume that backward branches
would be more likely to be taken than not (on the grounds that they were
probably loops) while forward ones would more likely not be taken (I
guess as an excuse for not disturbing the pipeline too much).
CDC 7600 used this scheme. Backwards taken, forwards not-taken.
Was about 70% accurate for essentially zero storage and 1 (or few)
gates.
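For illustration, the backward-taken/forward-not-taken rule really is just a comparison of two addresses. The sketch below is mine, with hypothetical function names and a made-up branch trace; it is not meant to reproduce the 70% figure, only to show how such an accuracy would be measured.

```python
def predict_taken(branch_pc: int, target_pc: int) -> bool:
    """Static rule: backward branches predicted taken, forward ones not taken."""
    return target_pc < branch_pc


def accuracy(trace):
    """trace: iterable of (branch_pc, target_pc, actually_taken) tuples."""
    trace = list(trace)
    hits = sum(predict_taken(pc, tgt) == taken for pc, tgt, taken in trace)
    return hits / len(trace)


# Hypothetical trace: a loop-closing branch at 0x120 back to 0x100 that is
# taken 9 times and then falls through, plus a forward branch at 0x200 that
# is taken only on the first of 4 executions.
loop = [(0x120, 0x100, i < 9) for i in range(10)]
skip = [(0x200, 0x240, i == 0) for i in range(4)]
print(accuracy(loop + skip))   # 12 of the 14 branches are predicted correctly
```

The only state needed is the sign of the branch displacement, which is why the hardware cost is close to nothing.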
On Mon, 19 May 2025 06:22:42 GMT
[email protected] (Anton Ertl) wrote:
The Pentium Pro (introduced 1995-11-01), HP PA-8000 (introduced
1995-11-02), and MIPS R10000 (introduced 1996-01) are the first
microprocessors which have full-blown OoO execution.
What about PPC604? It had more limited OoO resources than the 3
processors you mentioned above, esp. a smaller number of reservation
stations, but it most certainly had reorder buffers, 16 of them.
So, by your own definitions, it should be called the first single-chip
full-blown CPU.
John Levine <[email protected]> writes:
The 360/91 had primitive branch prediction in "loop mode". It had an
eight-doubleword instruction queue (which it confusingly called a stack).
If a program did a backward branch of less than eight doublewords, it'd
stop prefetching and execute out of the queue until the program fell or
branched out.
The 68010 had a similar feature (with a smaller buffer), but I don't
think one would call it branch prediction. In any case, I meant
speculative execution based on branch prediction (but did not write it
that way), and the 360/91 did not do speculative execution AFAIK.
- anton
On Mon, 19 May 2025 17:46:45 GMT, [email protected]
(Anton Ertl) wrote:
John Levine <[email protected]> writes:
The 360/91 had primitive branch prediction in "loop mode". It had an
eight-doubleword instruction queue (which it confusingly called a stack).
If a program did a backward branch of less than eight doublewords, it'd
stop prefetching and execute out of the queue until the program fell or
branched out.
The 68010 had a similar feature (with a smaller buffer), but I don't
think one would call it branch prediction. In any case, I meant
speculative execution based on branch prediction (but did not write it
that way), and the 360/91 did not do speculative execution AFAIK.
- anton
Most DSPs have some kind of "loop buffer" from which they can execute
without fetching code from memory.
Anton Ertl <[email protected]> wrote:
quadibloc <[email protected]> writes:
Eventually, IBM caught up with the Control
Data 6600 by perfecting pipelining in the IBM 360/91, and then combining
it with cache in the 360/195. From the Pentium II onwards, that's the
way computers are made nowadays.
Pipelining and caches are already used on the MIPS R2000 in 1986, and
the 486 in 1989.
Or the 801. That may have been the first machine to have
separate I- and D-caches (was it?)
quadibloc <[email protected]> writes:
Eventually, IBM caught up with the Control Data 6600 by perfecting
pipelining in the IBM 360/91,
At the cost of about 3× the number of gates and power, along with a
clock roughly 67% faster (60ns versus 100ns cycle time). This advantage
vanished about the time of the first /91 deliveries, with the CDC 7600
going to a ~27ns clock along with pipelining and concurrent calculation.
Mc68010 had a "loop buffer" of a couple handful of instructions.
Mc68020 had 256B instruction cache no TLB
Mc68030 had 256B I$ 256B D$ and ~32E TLB tablewalks in HW
Michael S <[email protected]> writes:
On Mon, 19 May 2025 06:22:42 GMT
[email protected] (Anton Ertl) wrote:
The Pentium Pro (introduced 1995-11-01), HP PA-8000 (introduced
1995-11-02), and MIPS R10000 (introduced 1996-01) are the first
microprocessors which have full-blown OoO execution.
What about PPC604? It had more limited OoO resources than the 3
processors you mentioned above, esp. a smaller number of reservation
stations, but it most certainly had reorder buffers, 16 of them.
So, by your own definitions, it should be called the first
single-chip full-blown CPU.
Yes. The OoO nature with ROB is explained in <https://arstechnica.com/articles/paedia/cpu/ppc-1.ars/6>.
Somehow that did not register with me earlier (even though a colleague
had a Mac with a PPC 604e IIRC). I guess it's because Apple Marketing
is low on technical details, and if Motorola emphasized this aspect,
that did not pass the filters of the press. Also, IIRC the
performance was not so exceptional that it would direct a spotlight at
the underlying technology, whereas the Pentium Pro with its surprising
SPECint win certainly did. Finally, the successors of the 604 (in
particular, the PPC 7450) did not progress much further with OoO
execution and still had only mild OoO capabilities at a time when the
Pentium 4 already had a 128-entry ROB (and other structure sizes to
match). So given the lack of ambition in the 7450, I did not even
think about the possibility that the 604 might have been the first
microprocessor with OoO execution.
- anton
Steve Jobs (he always preferred Intel, but until this millennium did not
possess the political power to impose his preferences on the technical
team) lasting for about 3 years.
His hardware products at NeXT prove this is nonsense.
The last NeXT prototype that I saw in a Moto lab in Austin
used the 88110.
He was completely capable of forcing his will on Apple hardware
engineers. Project leads who disagreed were let go or put into
continuation engineering.
The switch was pragmatic and forced because of the weak PPC
roadmap, especially in the portable space.
On Fri, 30 May 2025 10:51:14 -0700, Al Kossow wrote:
Steve Jobs (he always preferred Intel, but until this millennium did not
possess the political power to impose his preferences on the technical
team) lasting for about 3 years.
His hardware products at NeXT prove this is nonsense.
Also, the entire history of the development of the first-generation
Macintosh -- Motorola all the way, even after the switch from 68K to
PowerPC.
The switch was pragmatic and forced because of the weak PPC
roadmap, especially in the portable space.
That’s why the last-gasp PowerPC processor that was used in any
Macintosh, the G5, came from IBM, not Motorola. I think the hope was
that IBM would step in where Motorola was faltering. But that hope
didn’t last long.
We (Unisys) had some systems designed around the 88100 in
that time frame. Apple's decision to go to PPC rather than
the 88110 caused us to evaluate all the current available
processors (SPARC, MIPS, x86, and PPC). For rather pragmatic
reasons (the target machine used the Intel Paragon backplane),
the Pentium Pro was the ultimate choice, used to build the
OPUS family of massively parallel (yet single-system image)
computer systems.
On Tue, 20 May 2025 21:21:07 GMT
[email protected] (Anton Ertl) wrote:
Michael S <[email protected]> writes:
[PPC604]
Yes. The OoO nature with ROB is explained in
<https://arstechnica.com/articles/paedia/cpu/ppc-1.ars/6>.
Somehow that did not register with me earlier (even though a colleague
had a Mac with a PPC 604e IIRC). I guess it's because Apple Marketing
is low on technical details, and if Motorola emphasized this aspect,
that did not pass the filters of the press. Also, IIRC the
performance was not so exceptional that it would direct a spotlight at
the underlying technology, whereas the Pentium Pro with its surprising
SPECint win certainly did. Finally, the successors of the 604 (in
particular, the PPC 7450) did not progress much further with OoO
execution
From a uArch perspective, PPC/MPC 7xx and 7xxx are really successors of
the 603 rather than of the 604.
An offspring that attempted to re-enter the PC processor market was the
PPC970 (a red-headed little brother of POWER4). This foray was
terminated by Steve Jobs (he always preferred Intel, but until this
millennium did not possess the political power to impose his preferences
on the technical team) after lasting for about 3 years.
quadibloc wrote:
The Pentium II (and Pentium Pro) also only used OoO for floating-point,
while the 68050 only used OoO for integers!
Huh???
The Pentium (all versions) had two pipes (u & v), both in-order, and
with severe limitations on which opcodes could run in v in parallel with
the primary opcode in the u pipe.
The P6/PentiumPro OTOH does true OoO for all instruction types.
John, you are usually much better informed!
On Mon, 19 May 2025 22:04:22 +0200, Terje Mathisen wrote:
quadibloc wrote:
The Pentium II (and Pentium Pro) also only used OoO for floating-point,
while the 68050 only used OoO for integers!
Huh???
The Pentium (all versions) had two pipes (u & v), both in-order, and
with severe limitations on which opcodes could run in v in parallel with
the primary opcode in the u pipe.
The P6/PentiumPro OTOH does true OoO for all instruction types.
John, you are usually much better informed!
I had read somewhere that the Pentium Pro and the Pentium II, like the System/360 Model 91, were OoO only in their floating-point pipelines. If
that source was faulty, and better sources say differently, I'll need to check on it.
John Savard
I don't know about the PPro in the integer section, but it was definitely
OoO in branches, the memory section, and in the FPU. So, I don't see
why they would not have had integer OoO.
The Anderson papers indicate the /91 was just heavily pipelined in the integer side.
Not good enough to keep up with CDC?
After about two years of promising that they would blow CDC out of the
water ...
On Sat, 26 Jul 2025 02:45:56 +0000, Lawrence D'Oliveiro wrote:
Not good enough to keep up with CDC?
After about two years of promising that they would blow CDC out of
the water ...
The IBM System/360 Model 91 wasn't even good enough to keep up with
the Model 85.
However, IBM still realized that OoO was useful, even if it delivered
less than the promised improvement in performance. So they went on to
the Model 195 which added cache to the Model 91 design. That did work
well enough that *I think* it actually did out-perform the CDC
machines of the time.
Even if it didn't, it performed well, and could have been considered a
superior alternative - the CDC 6600 had reliability problems, I
remember reading. So it would only have had to come close to the 7600
or whatever CDC had at the time in such a situation.
John Savard
From what I see in Wikipedia, it looks like all "number-crunching
oriented" S/360 Models, i.e. 85, 91 and 195, were failures from a
business POV, even if to slightly different degrees (85 less bad).
Maybe the S/370 Model 195 was more successful; I was not able to find
info about the number of units shipped.