I am representing the PC as a fixed-point number because it records which
micro-op of an instruction's micro-op stream got interrupted. It
was easier to restart the micro-op stream than to defer interrupts to
the next instruction.
The lead micro-ops on an interrupt return are just NOP'd out. ATM there
is no micro-op state that needs to be saved and restored through context
switches, other than the index into the stream, which can be saved as
part of the PC.
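A minimal behavioral sketch of the restart idea, in Python rather than RTL. The `MicroOp` class, the `restart_stream` helper, and the micro-op names are hypothetical and only illustrate "NOP out everything before the saved index"; they are not taken from the actual core.

```python
from dataclasses import dataclass

@dataclass
class MicroOp:
    num: int           # position of this micro-op within its instruction
    name: str          # illustrative label only
    nop: bool = False  # True when the micro-op has been NOP'd out

def restart_stream(stream, resume_index):
    """Re-issue an instruction's micro-op stream after an interrupt return.

    Micro-ops before resume_index are assumed to have completed already,
    so they are turned into NOPs; the remaining ones are issued unchanged.
    """
    return [MicroOp(u.num, u.name, nop=(u.num < resume_index)) for u in stream]

# Hypothetical three-micro-op instruction interrupted at micro-op 2.
stream = [MicroOp(0, "load"), MicroOp(1, "add"), MicroOp(2, "store")]
resumed = restart_stream(stream, resume_index=2)
print([(u.name, u.nop) for u in resumed])
```

The only state carried across the interrupt in this model is `resume_index`, which matches the claim that nothing beyond the stream index has to be saved.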
On 2025-04-27 4:53 p.m., MitchAlsup1 wrote:
Why not just back up to the instruction boundary?
I think I was worried about an instruction disabling interrupts or
causing an exception that should be processed before the interrupt
occurred (keeping the interrupt precise). I did not want to need to
consider other exceptions that might have occurred before the interrupt.
Searching for an instruction boundary in either direction is, I think,
more logic than just recording the micro-op number. It is more FFs to
record the number, but fewer LUTs. Backing up to the instruction boundary
and marking an interrupt takes something like eight 10-bit comparators
plus muxes on the re-order buffer. Recording the micro-op number is just
stuffing 3 bits into the PC, plus three bits propagated down the
pipeline (FFs). The PC has two zero bits available already.
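One way to picture the fixed-point PC is sketched below in Python as a behavioral model. The exact bit layout is not given in the post, so this assumes 4-byte-aligned instructions (the "two zero bits") and places the 3-bit micro-op index as the fraction below the word address; `pack_pc` and `unpack_pc` are hypothetical names.

```python
UOP_BITS = 3                    # micro-op index width, per the post
UOP_MASK = (1 << UOP_BITS) - 1
ALIGN_BITS = 2                  # assumption: instructions are 4-byte aligned

def pack_pc(byte_addr, uop_index):
    """Combine an instruction address and a micro-op index into one value."""
    if byte_addr & ((1 << ALIGN_BITS) - 1):
        raise ValueError("instruction address must be 4-byte aligned")
    if not 0 <= uop_index <= UOP_MASK:
        raise ValueError("micro-op index out of range")
    word_addr = byte_addr >> ALIGN_BITS          # integer part
    return (word_addr << UOP_BITS) | uop_index   # fraction = micro-op index

def unpack_pc(fixed_pc):
    """Split a packed PC back into (byte address, micro-op index)."""
    word_addr = fixed_pc >> UOP_BITS
    return word_addr << ALIGN_BITS, fixed_pc & UOP_MASK

saved = pack_pc(0x1000, 5)
print(hex(saved), unpack_pc(saved))
```

Under this assumed layout the packed value is only one bit wider than the byte address, because two of the three index bits reuse the always-zero alignment bits.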
Also note: this is the "completion" interrupt model used in the 010, 020,
and 030 {not sure about the 040}.
It caused:
a) a bunch of bugs
b) a lot of strange stack state
c) which could be attacked by any thread with privilege.
Although it "sounds" like it saves {time, energy, forward progress}
the state saving/restoring on the stack pretty much consumes all of
that.
a) is a bit worrisome. Doing something out of the ordinary is always
asking for bugs. I am guessing programmers tried to manipulate the stack
state on the 68k series. I replicated the 010 as an FPGA core and IIRC I
stuff the machine state number onto the stack, which is bound to be a
different number than the 010's.
b) There is no state that needs to be stacked outside of what is
normally stacked for an interrupt.
c) The usual interrupt could also be attacked by a thread with privilege.
I have coded it both ways, so I may make it a core option. Right up
there with page relative branching. There is already a flag to generate
the core for performance instead of size.
- Retire saves enough state (PC, stack pointer, flags, priv mode)
to restart plus saves the exception auxiliary details.
These can all be copied to privileged control registers
to be read later by the exception handler software.
- uses the exception number to select a new exception handler PC and SP
to jam those registers, and switch to Supervisor mode.
The first few instructions in the exception handler can copy the
exception info from the control regs onto the supervisor stack
(old PC, old SP, old flags, old priv mode, aux info) that the handler
needs to fix the fault and restart.
The exception handler fixes the fault if it can and restores the restart
state (old PC, old SP, old flags, old priv mode) into the privileged
control registers. Afterward it executes a Return from Exception or
Interrupt (REI) instruction.
When Decode sees a REI instruction it:
- uses single step to wait for in-flight instructions to complete
- copies those privileged control registers back to PC, SP, flags, priv mode
- purges the Fetch stage so it restarts using the new PC and priv mode
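The entry and REI sequence described above can be sketched as a small behavioral model in Python. This is an illustration under assumptions, not anyone's actual design: the `Cpu` class, its field names, the exception number 14, and the handler addresses are all made up for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Cpu:
    pc: int = 0
    sp: int = 0
    flags: int = 0
    priv: str = "user"
    ctrl: dict = field(default_factory=dict)      # privileged control registers
    handlers: dict = field(default_factory=dict)  # exception number -> (PC, SP)

    def take_exception(self, number, aux):
        # Retire saves the restart state plus the auxiliary details.
        self.ctrl = {"pc": self.pc, "sp": self.sp, "flags": self.flags,
                     "priv": self.priv, "aux": aux}
        # The exception number selects the handler PC and SP,
        # and the CPU switches to supervisor mode.
        self.pc, self.sp = self.handlers[number]
        self.priv = "supervisor"

    def rei(self):
        # REI copies the control registers back; fetch restarts at the new PC.
        c = self.ctrl
        self.pc, self.sp, self.flags, self.priv = (
            c["pc"], c["sp"], c["flags"], c["priv"])

cpu = Cpu(pc=0x400, sp=0x8000, flags=0b0101, handlers={14: (0xF000, 0xE000)})
cpu.take_exception(14, aux={"bad_addr": 0xDEAD0000})
supervisor_stack = [dict(cpu.ctrl)]   # handler copies the info to its stack
cpu.ctrl = supervisor_stack.pop()     # handler restores the restart state
cpu.rei()
print(hex(cpu.pc), cpu.priv)
```

The model leaves out the single-step drain of in-flight instructions and the fetch purge, since those are pipeline effects with no architectural state of their own.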
On 2025-04-28 10:06 a.m., EricP wrote:
Exception handler needs the auxiliary info to know what to fix.
Robert Finch wrote:
I may have to review my setup. I thought the exception handler would be
able to determine what is going on given the exception PC. It can find
the excepting instruction. The bad address for a page fault / privilege
violation is available in the MMU via load/store instructions. There is
nothing stored in the pipeline other than a fault cause code.
Other information needed for micro-op execution is part of the ordinary
state of the CPU. Micro-ops use several GPRs dedicated to micro-op
usage.
When I first heard about micro-ops I envisioned them as being smaller
than the instructions in the ISA because of the term "micro". For
instance 16 or even 12 bits. I was having a heck of a time trying to
implement with 16-bit micro-ops. Then I clued in, why not just make them
bigger? They're not really micro-ops, they are more like mega-ops.
Current micro-op structure:
typedef struct packed {
    logic v;            // valid bit
    logic [2:0] count;  // number of micro-ops for instruction
    logic [2:0] num;    // the micro-op of the instruction
    logic [1:0] xRs2;   // extended register selection bits
    logic [1:0] xRs1;
    logic [1:0] xRd;
    logic [3:0] xop4;
    instruction_t ins;  // The instruction
} micro_op_t;
On 2025-04-29 5:39 p.m., MitchAlsup1 wrote:
In my case, the handler arrives with causation in R0, the first 64 bits
of the instruction in R1, and up to 3 operands to that instruction in
R2..R4. In the case of a page fault, the generated virtual address (R2)
and the faulting PTE (R3) are available to the handler. If the PTE is
GuestOS pertinent, the fault is delivered to GuestOS; if the PTE is
HyperVisor pertinent, the fault is delivered to HyperVisor.
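A rough Python sketch of that handler interface follows. The register roles come from the paragraph above; everything else is hypothetical, in particular the `HandlerArgs` and `deliver_page_fault` names and the way PTE ownership is decided, which the post does not describe and which is therefore passed in as a caller-supplied predicate.

```python
from typing import Callable, NamedTuple

class HandlerArgs(NamedTuple):
    r0_cause: int   # causation
    r1_inst: int    # first 64 bits of the faulting instruction
    r2: int = 0     # operand 1 (page fault: generated virtual address)
    r3: int = 0     # operand 2 (page fault: faulting PTE)
    r4: int = 0     # operand 3

def deliver_page_fault(args: HandlerArgs,
                       pte_belongs_to_guest: Callable[[int], bool]) -> str:
    """Return which level of software receives the page fault."""
    faulting_pte = args.r3
    return "GuestOS" if pte_belongs_to_guest(faulting_pte) else "HyperVisor"

# Demo-only ownership test (an assumption): treat bit 0 as "guest PTE".
is_guest = lambda pte: bool(pte & 1)
args = HandlerArgs(r0_cause=1, r1_inst=0x1234, r2=0x7FFF0000, r3=0x8001)
print(deliver_page_fault(args, is_guest))
```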
Do you have a code for when the micro-op wants to use the same register
as the original instruction supplied?
*poof* I forgot to take the operating mode into consideration. I think
this is easily fixed though.
Micro-ops use a subset of the regular ISA instructions, but the register
spec fields are expanded to seven bits so any register may be selected
for use. To use the same register as in the original instruction,
it is just a matter of setting the extra register spec bits
appropriately. Extra bits "00" select the integer GPRs. The
registers dedicated to micro-ops have codes outside of this range.
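As a small illustration of the expanded register spec, here is a Python sketch. It assumes the instruction's own register field is 5 bits wide (7 total minus the 2 extra bits) and that the extra bits sit above it; neither detail is stated in the post, and the function names are hypothetical.

```python
BASE_BITS = 5    # assumed width of the register field in the instruction
EXTRA_BITS = 2   # xRs1 / xRs2 / xRd width from the micro-op structure

def full_reg_spec(extra, base):
    """Form the 7-bit register number from extra bits and the base field."""
    if not 0 <= extra < (1 << EXTRA_BITS):
        raise ValueError("extra bits out of range")
    if not 0 <= base < (1 << BASE_BITS):
        raise ValueError("base register field out of range")
    return (extra << BASE_BITS) | base

def is_integer_gpr(spec):
    """Extra bits "00" select the integer GPRs."""
    return (spec >> BASE_BITS) == 0

same_as_instruction = full_reg_spec(0b00, 7)   # reuse the instruction's r7
micro_op_scratch = full_reg_spec(0b01, 7)      # a register outside the GPRs
print(same_as_instruction, micro_op_scratch)
```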
AMD uses 64-bit micro-ops, see the link I posted recently (and
again, below). It is actually a RISC-like ISA, which makes sense,
because you don't want to spend a lot of time decoding micro-ops.
The micro-ops are 64 bits long, and almost every field that could
appear in an instruction has its own fixed place.
https://bughunters.google.com/blog/5424842357473280/zen-and-the-art-of-microcode-hacking
Hmm... I don't know what your ISA looks like, but carrying the
original instruction in the micro-op looks strange. Why not take a page from
AMD's book? It looks like a reasonable philosophy, and obviously it
works for them, or they would have done something different by now.
And a paper on IBM Millicode which is kind of like Alpha PAL code
and may be similar to Robert's mega-ops.
The What and Why of System z Millicode 2012
https://share.confex.com/share/119/webprogram/Handout/Session11773/The%20What%20and%20Why%20of%20System%20z%20Millicode%20-%20%2311773.pdf
Thanks for that reference.
I struggle to see how "millicode" differs in essentials from
the "extracode" implementation of complex orders on the
Ferranti Orion & Atlas, or the ICT 1900 Series, of 60 years ago.
From what I can find, they sound somewhat similar.
Extracode is normal ISA instructions stored in a
separate memory accessible in a special mode.
Both PAL code and Millicode also use ISA-format instructions
stored in separate memory and a special mode. However they also
disable interrupts and give access to special hardware registers.