MitchAlsup wrote:
John Levine <[email protected]> posted:
It appears that Waldek Hebisch <[email protected]> said:
My idea was that instruction decoder could essentially translate
ADDL (R2)+, R2, R3
into
MOV (R2)+, TMP
ADDL TMP, R2, R3
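A rough sketch of that translation, for illustration only. The function name crack and the temp names TMP0, TMP1 are made up here, and it only handles register operands and autoincrement sources with a register destination. It is not how any real VAX decoder worked.

```python
def crack(opcode, operands):
    """Split autoincrement source operands into separate MOV micro-ops.

    Hypothetical helper: the last operand is the destination and must be
    a plain register in this sketch. Sources are processed left to right,
    so a later use of the same register sees the incremented value.
    """
    uops = []
    sources = []
    temps = 0
    for op in operands[:-1]:
        if op.startswith("(") and op.endswith(")+"):
            tmp = f"TMP{temps}"          # one fresh temp per memory source
            temps += 1
            uops.append(f"MOV {op}, {tmp}")
            sources.append(tmp)
        else:
            sources.append(op)           # register operand, used directly
    uops.append(f"{opcode} {', '.join(sources + [operands[-1]])}")
    return uops


for line in crack("ADDL", ["(R2)+", "R2", "R3"]):
    print(line)
```

With a single autoincrement source this gives the two-line sequence above, using TMP0 in place of TMP.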
But how about this?
ADDL3 (R2)+,(R2)+,(R2)+
Now you need at least two temps, the second of which depends on the
first, and there are instructions with six operands. Or how about
this:
ADDL3 (R2)+,#1234,(R2)+
This is encoded as
OPCODE (R2)+ (PC)+ <1234> (R2)+
The immediate word is in the middle of the instruction. You have to decode the operands one at a time so you can recognize immediates and skip over them.
It must have seemed clever at the time, but ugh.
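To make the serial dependency concrete, here is a minimal sketch of finding where each operand specifier starts. The function names are hypothetical, indexed mode is left out, and the byte string assumes 1234 is decimal and assembled as a 4-byte little-endian immediate after the ADDL3 opcode byte (0xC1).

```python
def specifier_length(code, pos, operand_size):
    """Bytes used by the operand specifier at code[pos] (subset of modes)."""
    spec = code[pos]
    mode, reg = spec >> 4, spec & 0x0F
    if mode <= 3:                    # short literal, value is in the byte
        return 1
    if mode in (5, 6, 7):            # register, deferred, autodecrement
        return 1
    if mode == 8:                    # autoincrement; (PC)+ is an immediate
        return 1 + (operand_size if reg == 15 else 0)
    if mode == 9:                    # autoincrement deferred; @(PC)+ is absolute
        return 1 + (4 if reg == 15 else 0)
    if mode in (0xA, 0xB):           # byte displacement
        return 2
    if mode in (0xC, 0xD):           # word displacement
        return 3
    if mode in (0xE, 0xF):           # longword displacement
        return 5
    raise NotImplementedError("indexed mode (4) is left out of this sketch")


def operand_offsets(code, n_operands, operand_size):
    """Walk the specifiers in order; each start depends on all earlier ones."""
    pos = 1                          # skip the one-byte opcode
    offsets = []
    for _ in range(n_operands):
        offsets.append(pos)
        pos += specifier_length(code, pos, operand_size)
    return offsets, pos


# ADDL3 (R2)+,#1234,(R2)+  ->  C1 82 8F D2 04 00 00 82
ADDL3 = bytes([0xC1, 0x82, 0x8F, 0xD2, 0x04, 0x00, 0x00, 0x82])
offsets, length = operand_offsets(ADDL3, 3, 4)
print(offsets, length)
```

The third specifier sits at offset 7, and you only know that after decoding the second one. Note also that the first immediate byte, 0xD2, would look like a word-displacement specifier to anything that tried to parse it blind.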
What we must all realize is that each address mode in VAX was a microinstruction all unto itself.
And that is why it was not pipelineable in any real sense.
Yes. The instructions are designed to be parsed by a byte-code interpreter
in microcode. Even in the NVAX in 1992, Decode can only produce one
operand per clock.
If that operand is one of the complex memory address modes then it
might be possible to dispatch it and let the back end chew on it
while Decode works on the second operand.
But that assumes the operands are in slow memory. If they are in fast
registers then it stalls waiting for the second and third operands to be decoded, making a pipeline pointless.
And since programs mostly put operands in registers it stalls at Decode.
One might say we should just build a fast decoder. But if you look at
the instruction formats, even the simplest 2 register instructions are
3 bytes and would require looking at 24 instruction bits and 3 valid bits
or 27 bits at once. The 3 operand rs1,rs2,rd instructions are 36 bits.
That decoder has to deal with 2^27 or 2^36 possibilities!
And that just handles 2 and 3 register instructions, no memory references.
It is hypothetically possible with a pre-decode stage to compact those
down to 17 bits for 2 register and 21 bits for 3 register but that is
still too many possibilities. That just throws transistors at a problem
that never needed to exist in the first place, and would still not be affordable in 1992 CMOS, certainly not in 1975 TTL.
If we look at what the VAX is actually spending most of its time on,
2 and 3 register ALU operations, those can be decoded in parallel by
looking at 10 bits (8 opcode + 2 valid) for 2 register, and
15 bits (12 opcode + 3 valid) for 3 register instructions.
Which is quite doable in 1975 TTL in 1 clock.
And that allows the pipeline to not stall at Decode.
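For a sense of scale, here are the bit counts from this post side by side, treated naively as the number of input combinations a single-level decoder would face. The dictionary labels are mine, and the 32 + 4 breakdown of the 36-bit case is an assumption (4 bytes plus 4 valid bits).

```python
# Decoder input widths quoted above, in bits.
WIDTHS = {
    "2-reg, all instruction bytes": 3 * 8 + 3,   # 24 instruction + 3 valid
    "3-reg, all instruction bytes": 4 * 8 + 4,   # assumed 32 instruction + 4 valid
    "2-reg, pre-decoded": 17,
    "3-reg, pre-decoded": 21,
    "2-reg, opcode + valid only": 8 + 2,
    "3-reg, opcode + valid only": 12 + 3,
}


def input_combinations(bits):
    """Number of distinct input patterns for a decoder of this width."""
    return 2 ** bits


for name, bits in WIDTHS.items():
    print(f"{name:30s} {bits:2d} bits  {input_combinations(bits):>14,d} combinations")
```

This is only the raw input space, not a gate count, but it shows the gap: 2^10 is about a thousand patterns while 2^27 is over a hundred million.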