Forum: >>> Magnum BBS <<<

Re: fractional PCs

From MitchAlsup1@21:1/5 to Robert Finch on Sun Apr 27 20:53:34 2025

On Sun, 27 Apr 2025 11:36:05 +0000, Robert Finch wrote:

Representing the PC as a fixed-point number because it records which
micro-op of the micro-op stream for an instruction got interrupted. It
was easier to restart the micro-op stream than to defer interrupts to
the next instruction.

Why not just backup to the instruction boundary ??

The lead micro-ops on an interrupt return are just NOP'd out. ATM there
is no micro-op state that needs to be saved and restored through context
switches, other than the index into the stream which can be saved as
part of the PC.

Also note: this is the "completion" interrupt model used in the 010, 020, 030
{Not sure about 040}.

It caused:
a) a bunch of bugs
b) a lot of strange stack state
c) which could be attacked by any thread with privilege.

Although it "sounds" like it saves {time, energy, forward progress}
the state saving/restoring on the stack pretty much consumes all of
that.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From EricP@21:1/5 to Robert Finch on Mon Apr 28 10:06:20 2025

Robert Finch wrote:

On 2025-04-27 4:53 p.m., MitchAlsup1 wrote:

On Sun, 27 Apr 2025 11:36:05 +0000, Robert Finch wrote:

Representing the PC as a fixed-point number because it records which
micro-op of the micro-op stream for an instruction got interrupted. It
was easier to restart the micro-op stream than to defer interrupts to
the next instruction.

Why not just backup to the instruction boundary ??

I think I was worried about an instruction disabling interrupts or
causing an exception that should be processed before the interrupt
occurred. (keeping the interrupt precise). I did not want to need to
consider other exceptions that might have occurred before the interrupt.

I assume you are using the words interrupt and exception interchangeably.
Especially when discussing at the uArch level, I find it best to keep
these separate as the mechanisms are quite different.

External interrupts can use a single-step mechanism to stall at Fetch
and wait for the older instructions to complete, so it knows there are
no older exceptions or branch mispredict purges or control changes
(eg. there is no Interrupt Disable instruction in-flight).

Exceptions are defined as synchronous with a single instruction so you
can use the Instruction Queue/ROB to synchronize older vs younger ones.

Searching for an instruction boundary in either direction is, I think,
more logic than just recording the micro-op number. It is more FFs to
record the number, but fewer LUTs. There are something like eight 10-bit
comparators plus muxes on the re-order buffer to back up to the instruction
boundary and mark an interrupt. Recording the micro-op number is just
stuffing 3 bits into the PC, plus three bits propagated down the
pipeline (FFs). The PC has two zero bits available already.
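
To make the bit counting above concrete, here is a minimal Python sketch of a fixed-point PC. The layout is an assumption, not the actual encoding used in the core: it assumes 4-byte-aligned instructions (hence the two zero bits) and places a 3-bit micro-op index below the word address, so the packed value is one bit wider than a plain byte address. The names `pack_pc` and `unpack_pc` are hypothetical.

```python
UOP_BITS = 3     # width of the micro-op index (from the post)
ALIGN_BITS = 2   # assumed: instructions are 4-byte aligned


def pack_pc(byte_addr, uop_index):
    """Combine an aligned instruction address and a micro-op index."""
    if byte_addr % (1 << ALIGN_BITS) != 0:
        raise ValueError("instruction address must be 4-byte aligned")
    if not 0 <= uop_index < (1 << UOP_BITS):
        raise ValueError("micro-op index out of range")
    word_addr = byte_addr >> ALIGN_BITS          # drop the two zero bits
    return (word_addr << UOP_BITS) | uop_index   # index is the fraction


def unpack_pc(fixed_pc):
    """Split a fixed-point PC back into (byte address, micro-op index)."""
    uop_index = fixed_pc & ((1 << UOP_BITS) - 1)
    byte_addr = (fixed_pc >> UOP_BITS) << ALIGN_BITS
    return byte_addr, uop_index
```

For example, `unpack_pc(pack_pc(0x1000, 5))` gives back `(0x1000, 5)`.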

You don't need to do that back-up scanning thing or worry about
older exceptions if you transfer the exception info to the IQ/ROB uOp
as "results" and let Retire take care of it. At Retire you know
there are no older exceptions and that the committed state
managed by Retire is synchronized with this instruction's start.

An OoO instruction that executes normally will write back its results and
mark the uOp as Done. When the uOp gets to the Retire stage in the queue,
Retire sees it is Done, so it updates the committed state, PC and registers,
and removes the uOp.

If an instruction exception occurs during execution, at write back
instead it marks the uOp as Except and records the exception number
and any auxiliary info into the uOp, such as the address that page
faulted and what R/W/E access it was attempting.

When an Except uOp reaches Retire, it sees the Except flag and
- purges all the uOps in flight including the one with the exception
(so the PC and registers end up in the state they would be before
the Except instruction, making the exception precise).

Conceptually a big long CancelAll wire running to all function units
and front end stages, and halts Fetch until further notice.

- resets the future state to match the committed state

- Retire saves enough state (PC, stack pointer, flags, priv mode)
to restart plus saves the exception auxiliary details.
These can all be copied to privileged control registers
to be read later by the exception handler software.

- uses the exception number to select a new exception handler PC and SP
to jam those registers, and switch to Supervisor mode.
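
The Retire behaviour in the bullets above can be sketched as a small software model. This is only an illustration in Python, not anyone's RTL. The `Uop` fields, the `retire` function, the dict-based committed state, and the `handlers` table are all hypothetical names, and the model assumes one uOp per instruction with the queue held in program order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

PENDING, DONE, EXCEPT = "pending", "done", "except"


@dataclass
class Uop:
    next_pc: int                    # PC of the following instruction
    dest: Optional[str] = None      # destination register name, if any
    result: Optional[int] = None    # value written back by execution
    status: str = PENDING           # set to DONE or EXCEPT at write back
    exc_num: Optional[int] = None   # exception number (EXCEPT only)
    exc_aux: Optional[int] = None   # auxiliary info, e.g. faulting address


def retire(rob, committed, ctrl_regs, handlers):
    """Retire uOps in program order from the head of the deque `rob`.

    Returns the exception number if an Except uOp reached the head,
    otherwise None.
    """
    while rob and rob[0].status != PENDING:
        head = rob[0]
        if head.status == DONE:
            # Normal case: commit the result and advance the committed PC.
            if head.dest is not None:
                committed["regs"][head.dest] = head.result
            committed["pc"] = head.next_pc
            rob.popleft()
            continue
        # Except case: purge everything in flight, including this uOp.
        rob.clear()
        # The committed state still describes the start of the faulting
        # instruction, so it is exactly the restart state.
        ctrl_regs.update(old_pc=committed["pc"], old_sp=committed["sp"],
                         old_flags=committed["flags"],
                         old_priv=committed["priv"],
                         exc_num=head.exc_num, exc_aux=head.exc_aux)
        # Jam the handler PC and SP and switch to supervisor mode.
        committed["pc"], committed["sp"] = handlers[head.exc_num]
        committed["priv"] = "supervisor"
        return head.exc_num
    return None
```

Because the purge and the state save both happen at the head of the queue, no scan for older exceptions is needed: anything older has already retired.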

The first few instructions in the exception handler can copy the
exception info from the control regs onto the supervisor stack
(old PC, old SP, old flags, old priv mode, aux info) that the handler
needs to fix the fault and restart.

The exception handler fixes the fault if it can and restores the restart
state (old PC, old SP, old flags, old priv mode) into the privileged
control registers. Afterward it executes a Return from Exception or
Interrupt REI instruction.

When Decode sees a REI instruction it:
- uses single step to wait for in-flight instructions to complete
- copies those privileged control registers back to PC, SP, flag, priv mode
- purges the Fetch stage so it restarts using the new PC and priv mode

The lead micro-ops on an interrupt return are just NOP'd out. ATM there
is no micro-op state that needs to be saved and restored through context
switches, other than the index into the stream which can be saved as
part of the PC.

Also note: this is the "completion" interrupt model used in the 010, 020, 030
{Not sure about 040}.

It caused:
a) a bunch of bugs
b) a lot of strange stack state
c) which could be attacked by any thread with privilege.

Although it "sounds" like it saves {time, energy, forward progress}
the state saving/restoring on the stack pretty much consumes all of
that.

a) is a bit worrisome. Doing something out of the ordinary is always
asking for bugs. I am guessing programmers tried to manipulate the stack
state on the 68k series. I replicated the 010 as an FPGA core and IIRC I
stuff the machine state number onto the stack, which is bound to be a
different number than on the 010.
b) There is no state that needs to be stacked outside of what is
normally stacked for an interrupt.

Exception handler needs the auxiliary info to know what to fix.

For c) the usual interrupt as well
could be attacked by a thread with privilege.

I have coded it both ways, so I may make it a core option. Right up
there with page relative branching. There is already a flag to generate
the core for performance instead of size.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From EricP@21:1/5 to EricP on Mon Apr 28 10:50:39 2025

EricP wrote:

Oh, I forgot to mention this mechanism is not interrupt reentrant, so it
must disable and enable interrupts. (But it's still cheaper than x86.)

- Retire saves enough state (PC, stack pointer, flags, priv mode)
to restart plus saves the exception auxiliary details.
These can all be copied to privileged control registers
to be read later by the exception handler software.

And disables interrupts so that those control registers can't be
overwritten by an interrupt before we have a chance to save them.

- uses the exception number to select a new exception handler PC and SP
to jam those registers, and switch to Supervisor mode.

The first few instructions in the exception handler can copy the
exception info from the control regs onto the supervisor stack
(old PC, old SP, old flags, old priv mode, aux info) that the handler
needs to fix the fault and restart.

After handler saves the state to the kernel stack it enables interrupts.

And disables interrupts just before it starts the restore sequence below.

The exception handler fixes the fault if it can and restores the restart
state (old PC, old SP, old flags, old priv mode) into the privileged
control registers. Afterward it executes a Return from Exception or
Interrupt REI instruction.

The old flags-mode register also has an interrupt enable flag to restore.

When Decode sees a REI instruction it:
- uses single step to wait for in-flight instructions to complete
- copies those privileged control registers back to PC, SP, flag, priv mode

And restores the committed interrupt enable flag.

- purges the Fetch stage so it restarts using the new PC and priv mode

And Fetch's interrupt enable flag state is set to the committed state.
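
The enable/disable ordering added in this follow-up can be sketched from the software side. This is a minimal Python model, assuming a single set of privileged control registers represented as a dict. `Cpu`, `exception_entry`, `handler_prologue`, `handler_epilogue`, and `rei` are hypothetical names and not part of any real ISA.

```python
class Cpu:
    def __init__(self):
        self.pc, self.sp = 0x1000, 0x8000
        self.flags, self.priv = 0, "user"
        self.int_enable = True
        self.ctrl = {}     # the single set of privileged control registers
        self.kstack = []   # supervisor (kernel) stack

    def exception_entry(self, exc_num, aux, handler_pc, handler_sp):
        """Hardware side: save restart state, then mask interrupts."""
        self.ctrl = dict(pc=self.pc, sp=self.sp, flags=self.flags,
                         priv=self.priv, int_enable=self.int_enable,
                         exc_num=exc_num, aux=aux)
        self.int_enable = False   # protect ctrl from being overwritten
        self.pc, self.sp, self.priv = handler_pc, handler_sp, "supervisor"

    def handler_prologue(self):
        """Software side: copy ctrl to the kernel stack, then re-enable."""
        self.kstack.append(dict(self.ctrl))
        self.int_enable = True

    def handler_epilogue(self):
        """Software side: mask interrupts, then reload ctrl for REI."""
        self.int_enable = False
        self.ctrl = self.kstack.pop()

    def rei(self):
        """Hardware side: copy ctrl back, including the enable flag."""
        c = self.ctrl
        self.pc, self.sp = c["pc"], c["sp"]
        self.flags, self.priv = c["flags"], c["priv"]
        self.int_enable = c["int_enable"]
```

In this model a second exception taken after `handler_prologue` overwrites `ctrl` harmlessly, because the first frame is already on the kernel stack. That window is what the interrupt disable protects.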

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From MitchAlsup1@21:1/5 to Robert Finch on Mon Apr 28 22:02:23 2025

On Mon, 28 Apr 2025 2:32:52 +0000, Robert Finch wrote:

On 2025-04-27 4:53 p.m., MitchAlsup1 wrote:

On Sun, 27 Apr 2025 11:36:05 +0000, Robert Finch wrote:

Representing the PC as a fixed-point number because it records which
micro-op of the micro-op stream for an instruction got interrupted. It
was easier to restart the micro-op stream than to defer interrupts to
the next instruction.

Why not just backup to the instruction boundary ??

I think I was worried about an instruction disabling interrupts or
causing an exception that should be processed before the interrupt
occurred. (keeping the interrupt precise). I did not want to need to
consider other exceptions that might have occurred before the interrupt.

Searching for an instruction boundary in either direction is I think
more logic than just recording the micro-op number.

You say your interrupt-PC is fixed point so it can point at the
micro-Op that raised the exception (or was interrupted). It
seems to me that simply wiping the fractional bits from the PC
should put you at the instruction boundary. That is:: Round Down.
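
Round Down as described here is a single mask operation. A minimal Python sketch, assuming the low 3 bits of the saved PC hold the micro-op index (the exact position and width of the fractional bits are assumptions, as are the function names):

```python
FRAC_BITS = 3                      # assumed width of the micro-op index
FRAC_MASK = (1 << FRAC_BITS) - 1


def round_down(fixed_pc):
    """Clear the fractional bits so the PC points at the instruction start."""
    return fixed_pc & ~FRAC_MASK


def interrupted_uop(fixed_pc):
    """Return the micro-op index that was recorded in the fraction."""
    return fixed_pc & FRAC_MASK
```

Restarting from `round_down(pc)` re-executes the whole instruction, whereas keeping `interrupted_uop(pc)` is what allows resuming in the middle of the micro-op stream.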

It is more FFs to
record the number, but fewer LUTs. There are something like eight 10-bit
comparators plus muxes on the re-order buffer to back up to the instruction
boundary and mark an interrupt. Recording the micro-op number is just
stuffing 3 bits into the PC, plus three bits propagated down the
pipeline (FFs). The PC has two zero bits available already.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From MitchAlsup1@21:1/5 to Robert Finch on Tue Apr 29 21:39:49 2025

On Tue, 29 Apr 2025 2:35:27 +0000, Robert Finch wrote:

On 2025-04-28 10:06 a.m., EricP wrote:

Robert Finch wrote:

----------------

Exception handler needs the auxiliary info to know what to fix.

I may have to review my setup. I thought the exception handler would be
able to determine what is going on given the exception PC. It can find
the instruction excepting. The bad address for a page fault / privilege
violation is available in the MMU via load/store instructions. There is
nothing stored in the pipeline other than a fault cause code.

For My case: The handler arrives with causation in R0, the first 64-bits
of the instruction in R1, and up to 3 operands to that inst in R2..R4.
In the case of page fault, the generated virtual address R2, and the
faulting PTE R3 are available to the handler. If the PTE is GuestOS
pertinent, the fault is delivered to GuestOS, if the PTE is HyperVisor
pertinent, the fault is delivered to HyperVisor.

Other information needed for micro-op execution is part of the ordinary
state of the CPU. Micro-ops use several GPRs dedicated to micro-ops
usage.

Do you have a code for when the microOp wants to use the same register
as the original instruction supplied ??

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Thomas Koenig@21:1/5 to Robert Finch on Wed Apr 30 18:09:30 2025

Robert Finch <[email protected]> schrieb:

When I first heard about micro-ops I envisioned them as being smaller
than the instructions in the ISA because of the term "micro". For
instance 16 or even 12-bits. I was having a heck of time trying to
implement with 16-bit micro-ops. Then I clued in, why not just make them
bigger? They're not really micro-ops, it is more like mega-ops.

AMD uses 64-bit micro-ops, see the link I posted recently (and
again, below). It is actually a RISC-like ISA, which makes sense,
because you don't want to spend a lot of time decoding micro-ops.
They have a 64-bit micro-op length, and most fields that can appear
in an instruction have their own fixed place.

https://bughunters.google.com/blog/5424842357473280/zen-and-the-art-of-microcode-hacking

Current micro-op structure:

typedef struct packed {
logic v; // valid bit
logic [2:0] count; // number of micro-ops for instruction
logic [2:0] num; // the micro-op of the instruction
logic [1:0] xRs2; // extended register selection bits
logic [1:0] xRs1;
logic [1:0] xRd;
logic [3:0] xop4;
instruction_t ins; // The instruction
} micro_op_t;

Hmm... I don't know what your ISA looks like, but having the
original instruction looks strange. Why not take a page from
AMD's book? It looks like a reasonable philosophy, and obviously it
works for them, or they would have done something different by now.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From MitchAlsup1@21:1/5 to Robert Finch on Wed Apr 30 19:04:12 2025

On Wed, 30 Apr 2025 5:21:13 +0000, Robert Finch wrote:

On 2025-04-29 5:39 p.m., MitchAlsup1 wrote:

On Tue, 29 Apr 2025 2:35:27 +0000, Robert Finch wrote:

On 2025-04-28 10:06 a.m., EricP wrote:

Robert Finch wrote:

----------------

Exception handler needs the auxiliary info to know what to fix.

I may have to review my setup. I thought the exception handler would be
able to determine what is going on given the exception PC. It can find
the instruction excepting. The bad address for a page fault / privilege
violation is available in the MMU via load/store instructions. There is
nothing stored in the pipeline other than a fault cause code.

For My case: The handler arrives with causation in R0, the first 64-bits
of the instruction in R1, and up to 3 operands to that inst in R2..R4.
In the case of page fault, the generated virtual address R2, and the
faulting PTE R3 are available to the handler. If the PTE is GuestOS
pertinent, the fault is delivered to GuestOS, if the PTE is HyperVisor
pertinent, the fault is delivered to HyperVisor.

Other information needed for micro-op execution is part of the ordinary
state of the CPU. Micro-ops use several GPRs dedicated to micro-ops
usage.

Do you have a code for when the microOp wants to use the same register
as the original instruction supplied ??

*poof* I forgot to take the operating mode into consideration. I think
this is easily fixed though.

Micro-ops use a subset of the regular ISA instructions, but the register
spec fields are expanded to seven bits so any register may be selected
for use. To use the same register as what is in the original instruction
it is just a matter of setting the extra register spec bits
appropriately. Extra bits "00" gets access to the integer GPRs. The
registers dedicated to micro-ops have codes outside of this range.
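
As a rough illustration of the seven-bit register specifier, here is a minimal Python sketch. It assumes the instruction carries a 5-bit register field (seven bits minus the two extra bits) and that the extra bits form the high-order part of the selector. The bit placement and the helper names `full_reg` and `is_integer_gpr` are assumptions, not taken from the actual core.

```python
REG_FIELD_BITS = 5   # assumed width of the register field in the instruction
EXT_GPR = 0b00       # from the post: extra bits "00" select the integer GPRs


def full_reg(ext_bits, reg_field):
    """Build a 7-bit register selector from 2 extra bits and the ISA field."""
    if not 0 <= ext_bits < 4:
        raise ValueError("extra bits must fit in 2 bits")
    if not 0 <= reg_field < (1 << REG_FIELD_BITS):
        raise ValueError("register field must fit in 5 bits")
    return (ext_bits << REG_FIELD_BITS) | reg_field


def is_integer_gpr(reg7):
    """True if the selector names one of the ordinary integer GPRs."""
    return (reg7 >> REG_FIELD_BITS) == EXT_GPR
```

With this layout, a micro-op that leaves the extra bits at zero names exactly the register the original instruction named, and any nonzero extra bits reach outside the integer GPR range.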

When I first heard about micro-ops I envisioned them as being smaller
than the instructions in the ISA because of the term "micro". For
instance 16 or even 12-bits. I was having a heck of time trying to
implement with 16-bit micro-ops. Then I clued in, why not just make them
bigger? They're not really micro-ops, it is more like mega-ops.

Current micro-op structure:

typedef struct packed {
logic v; // valid bit
logic [2:0] count; // number of micro-ops for instruction
logic [2:0] num; // the micro-op of the instruction
logic [1:0] xRs2; // extended register selection bits
logic [1:0] xRs1;
logic [1:0] xRd;
logic [3:0] xop4;
instruction_t ins; // The instruction
} micro_op_t;

That is about right at ~48-bits:: you have to be able to encode
EVERYTHING you want to do.
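
The ~48-bit figure can be checked by adding up the field widths of the struct. A minimal Python sketch, assuming `instruction_t` is 32 bits wide (the thread does not state its width). The pack and unpack helpers are hypothetical and follow the SystemVerilog packed-struct convention that the first member occupies the most significant bits.

```python
INS_BITS = 32   # assumed width of instruction_t; not stated in the thread

# (name, width) in declaration order, as in the quoted micro_op_t
FIELDS = [("v", 1), ("count", 3), ("num", 3), ("xRs2", 2),
          ("xRs1", 2), ("xRd", 2), ("xop4", 4), ("ins", INS_BITS)]


def total_width(fields=FIELDS):
    """Sum of all field widths in bits."""
    return sum(width for _, width in fields)


def pack(values):
    """Pack a dict of field values; the first field lands in the MSBs."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Inverse of pack(): peel fields off starting from the LSBs."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out
```

Under the 32-bit assumption the struct is 17 bits of micro-op bookkeeping plus the instruction, 49 bits in total, which is in line with the estimate above.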

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From EricP@21:1/5 to Thomas Koenig on Fri May 2 11:18:35 2025

Thomas Koenig wrote:

Robert Finch <[email protected]> schrieb:

When I first heard about micro-ops I envisioned them as being smaller
than the instructions in the ISA because of the term "micro". For
instance 16 or even 12-bits. I was having a heck of time trying to
implement with 16-bit micro-ops. Then I clued in, why not just make them
bigger? They're not really micro-ops, it is more like mega-ops.

AMD uses 64-bit micro-ops, see the link I posted recently (and
again, below). It is actually a RISC-like ISA, which makes sense,
because you don't want to spend a lot of time decoding micro-ops.
They have a 64-bit micro-op length, and most fields that can appear
in an instruction have their own fixed place.

https://bughunters.google.com/blog/5424842357473280/zen-and-the-art-of-microcode-hacking

Current micro-op structure:

typedef struct packed {
logic v; // valid bit
logic [2:0] count; // number of micro-ops for instruction
logic [2:0] num; // the micro-op of the instruction
logic [1:0] xRs2; // extended register selection bits
logic [1:0] xRs1;
logic [1:0] xRd;
logic [3:0] xop4;
instruction_t ins; // The instruction
} micro_op_t;

Hmm... I don't know what your ISA looks like, but having the
original instruction looks strange. Why not take a page from
AMD's book? It looks like a reasonable philosophy, and obviously it
works for them, or they would have done something different by now.

There are a number of papers researching Intel and AMD microcode.
These look at the microcode format and mechanism. There are other
papers looking at the security in the microcode update system.

Analyzing and Exploiting Branch Mispredictions in Microcode 2025
https://arxiv.org/abs/2501.12890

CustomProcessingUnit Reverse Engineering and Customization of
Intel Microcode 2023
https://misc0110.net/files/cpu_woot23.pdf

Undocumented x86 instructions to control the CPU at the microarchitecture
level in modern Intel processors 2023
https://raw.githubusercontent.com/chip-red-pill/udbgInstr/main/paper/undocumented_x86_insts_for_uarch_control.pdf

Reverse Engineering x86 Processor Microcode 2017
https://www.usenix.org/system/files/conference/usenixsecurity17/sec17-koppe.pdf

And a paper on IBM Millicode which is kind of like Alpha PAL code
and may be similar to Robert's mega-ops.

The What and Why of System z Millicode 2012
https://share.confex.com/share/119/webprogram/Handout/Session11773/The%20What%20and%20Why%20of%20System%20z%20Millicode%20-%20%2311773.pdf

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From moi@21:1/5 to EricP on Fri May 2 17:03:28 2025

On 02/05/2025 16:18, EricP wrote:

And a paper on IBM Millicode which is kind of like Alpha PAL code
and may be similar to Robert's mega-ops.

The What and Why of System z Millicode 2012
https://share.confex.com/share/119/webprogram/Handout/Session11773/The%20What%20and%20Why%20of%20System%20z%20Millicode%20-%20%2311773.pdf

Thanks for that reference.

I struggle to see how "millicode" differs in essentials from
the "extracode" implementation of complex orders on the
Ferranti Orion & Atlas, or the ICT 1900 Series, of 60 years ago.

--
Bill F.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From EricP@21:1/5 to moi on Fri May 2 13:22:42 2025

moi wrote:

On 02/05/2025 16:18, EricP wrote:

And a paper on IBM Millicode which is kind of like Alpha PAL code
and may be similar to Robert's mega-ops.

The What and Why of System z Millicode 2012
https://share.confex.com/share/119/webprogram/Handout/Session11773/The%20What%20and%20Why%20of%20System%20z%20Millicode%20-%20%2311773.pdf

Thanks for that reference.

I struggle to see how "millicode" differs in essentials from
the "extracode" implementation of complex orders on the
Ferranti Orion & Atlas, or the ICT 1900 Series, of 60 years ago.

From what I can find, they sound somewhat similar.
Extracode is normal ISA instructions stored in a
separate memory accessible in a special mode.

Both PAL and Milli code also use the ISA format instructions
stored in separate memory and a special mode. However they also
disable interrupts and give access to special hardware registers.

The Manchester Mark I and Atlas: A Historical Perspective, 1978

It doesn't say if extracode disables interrupts while running but
it doesn't look like it does because it has 3 program counters
for main program, extracode, and interrupt. It wouldn't need
an interrupt PC if it disabled interrupts in extra mode.
Also doesn't mention special control registers.

"Generally speaking an extracode was a commonly used but
relatively complicated function which it was not economic
to implement directly as hardwired logic. Instead, an
extracode consisted of a sequence of normal instructions
(a "macro routine") held in the fixed store.
Entry to these macro routines was very rapid and
involved no preservation of central registers since there
was a dedicated extracode program counter (or control
register) and reserved B-lines, and any extracodes
needing working space used a private area of the
system working store. Amongst extracode instructions
available to the user were ones for carrying out the
common intrinsic functions such as square root, log,
cosine, etc."

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From moi@21:1/5 to EricP on Fri May 2 20:01:46 2025

On 02/05/2025 18:22, EricP wrote:

moi wrote:

On 02/05/2025 16:18, EricP wrote:

And a paper on IBM Millicode which is kind of like Alpha PAL code
and may be similar to Robert's mega-ops.

The What and Why of System z Millicode 2012
https://share.confex.com/share/119/webprogram/Handout/Session11773/The%20What%20and%20Why%20of%20System%20z%20Millicode%20-%20%2311773.pdf

Thanks for that reference.

I struggle to see how "millicode" differs in essentials from
the "extracode" implementation of complex orders on the
Ferranti Orion & Atlas, or the ICT 1900 Series, of 60 years ago.

From what I can find, they sound somewhat similar.
Extracode is normal ISA instructions stored in a
separate memory accessible in a special mode.

Separate memory on Atlas.

Both PAL and Milli code also use the ISA format instructions
stored in separate memory and a special mode. However they also
disable interrupts and give access to special hardware registers.

As do Orion and 1900 Series extracode.

--
Bill F.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
