• Re: fractional PCs

    From MitchAlsup1@21:1/5 to Robert Finch on Sun Apr 27 20:53:34 2025
    On Sun, 27 Apr 2025 11:36:05 +0000, Robert Finch wrote:

    Representing the PC as a fixed-point number because it records which
    micro-op of the micro-op stream for an instruction got interrupted. It
    was easier to restart the micro-op stream than to defer interrupts to
    the next instruction.

    Why not just back up to the instruction boundary ??

    The lead micro-ops on an interrupt return are just NOP'd out. ATM there
    is no micro-op state that needs to be saved and restored through context
    switches, other than the index into the stream which can be saved as
    part of the PC.

    Also note: this is "completion" interrupt model used in 010, 020, 030
    {Not sure about 040}.

    It caused:
    a) a bunch of bugs
    b) a lot of strange stack state
    c) which could be attacked by any thread with privilege.

    Although it "sounds" like it saves {time, energy, forward progress}
    the state saving/restoring on the stack pretty much consumes all of
    that.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From EricP@21:1/5 to Robert Finch on Mon Apr 28 10:06:20 2025
    Robert Finch wrote:
    On 2025-04-27 4:53 p.m., MitchAlsup1 wrote:
    On Sun, 27 Apr 2025 11:36:05 +0000, Robert Finch wrote:

    Representing the PC as a fixed-point number because it records which
    micro-op of the micro-op stream for an instruction got interrupted. It
    was easier to restart the micro-op stream than to defer interrupts to
    the next instruction.

    Why not just back up to the instruction boundary ??

    I think I was worried about an instruction disabling interrupts or
    causing an exception that should be processed before the interrupt
    occurred. (keeping the interrupt precise). I did not want to need to
    consider other exceptions that might have occurred before the interrupt.

    I assume you are using the words interrupt and exception interchangeably.
    Especially when discussing at the uArch level, I find it best to keep
    these separate as the mechanisms are quite different.

    External interrupts can use a single-step mechanism to stall at Fetch
    and wait for the older instructions to complete, so it knows there are
    no older exceptions or branch mispredict purges or control changes
    (eg. there is no Interrupt Disable instruction in-flight).

    Exceptions are defined as synchronous with a single instruction so you
    can use the Instruction Queue/ROB to synchronize older vs younger ones.

    Searching for an instruction boundary in either direction is I think
    more logic than just recording the micro-op number. It is more FFs to
    record the number, but fewer LUTs. There are something like 8 x 10-bit
    comparators plus muxes on the re-order buffer to back up to the
    instruction boundary and mark an interrupt. Just recording the micro-op
    number is just stuffing 3 bits into the PC, plus three bits propagated
    down the pipeline (FFs). The PC has two zero bits available already.

    You don't need to do that back-up scanning thing or worry about
    older exceptions if you transfer the exception info to the IQ/ROB uOp
    as "results" and let Retire take care of it. At Retire you know
    there are no older exceptions and that the committed state
    managed by Retire is synchronized with this instruction's start.

    An OoO instruction that executes normally will write back its results and
    mark the uOp as Done. When the uOp gets to the Retire stage in the queue,
    Retire sees it's done so updates the committed state, PC and registers,
    and removes it.

    If an instruction exception occurs during execution, at write back
    instead it marks the uOp as Except and records the exception number
    and any auxiliary info into the uOp, such as the address that page
    faulted and what R/W/E access it was attempting.

    When an Except uOp reaches Retire, it sees the Except flag and
    - purges all the uOps in flight including the one with the exception
    (so the PC and registers end up in the state they would be before
    the Except instruction, making the exception precise).

    Conceptually a big long CancelAll wire running to all function units
    and front end stages, and halts Fetch until further notice.

    - resets the future state to match the committed state

    - Retire saves enough state (PC, stack pointer, flags, priv mode)
    to restart plus saves the exception auxiliary details.
    These can all be copied to privileged control registers
    to be read later by the exception handler software.

    - uses the exception number to select a new exception handler PC and SP
    to jam those registers, and switch to Supervisor mode.
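
The Retire behaviour described above can be sketched as a small software model. This is only an illustration under stated assumptions, not anyone's actual design: the names `Uop`, `Core`, `retire_one`, `INSTR_BYTES` and the handler table are hypothetical, instructions are assumed to be a fixed 4 bytes, and SP and flags are left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Optional

INSTR_BYTES = 4  # assumption: fixed-size instructions, used to advance the PC


@dataclass
class Uop:
    pc: int                      # PC of the instruction this uOp belongs to
    dest: Optional[str] = None   # destination register name, if any
    result: int = 0
    done: bool = False           # set at write back on normal completion
    excepted: bool = False       # set at write back instead of done on a fault
    exc_num: int = 0
    aux: dict = field(default_factory=dict)  # e.g. faulting address, R/W/E


class Core:
    def __init__(self, handlers):
        self.rob = []            # in-flight uOps, oldest first
        self.pc = 0              # committed PC
        self.regs = {}           # committed registers
        self.priv = "user"
        self.ctrl = {}           # privileged control registers (hypothetical)
        self.handlers = handlers # exception number -> handler PC

    def retire_one(self):
        """Examine the oldest uOp and report what Retire did with it."""
        if not self.rob:
            return "empty"
        head = self.rob[0]
        if head.excepted:
            # Purge everything in flight, including the faulting uOp, so the
            # committed state is what it was before that instruction.
            self.rob.clear()
            # Save restart state and the auxiliary details for the handler.
            self.ctrl = {"old_pc": head.pc, "old_priv": self.priv,
                         "exc_num": head.exc_num, "aux": dict(head.aux)}
            # Jam the handler PC and switch to supervisor mode.
            self.pc = self.handlers[head.exc_num]
            self.priv = "supervisor"
            return "exception"
        if not head.done:
            return "stall"       # oldest uOp has not written back yet
        self.rob.pop(0)
        if head.dest is not None:
            self.regs[head.dest] = head.result
        self.pc = head.pc + INSTR_BYTES
        return "retired"
```

Because the Except flag is only acted on when the uOp is the oldest entry, any older exception would already have purged it, which is why no backward scan is needed.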

    The first few instructions in the exception handler can copy the
    exception info from the control regs onto the supervisor stack
    (old PC, old SP, old flags, old priv mode, aux info) that the handler
    needs to fix the fault and restart.

    The exception handler fixes the fault if it can and restores the restart
    state (old PC, old SP, old flags, old priv mode) into the privileged
    control registers. Afterward it executes a Return from Exception or
    Interrupt (REI) instruction.

    When Decode sees a REI instruction it:
    - uses single step to wait for in-flight instructions to complete
    - copies those privileged control registers back to PC, SP, flag, priv mode
    - purges the Fetch stage so it restarts using the new PC and priv mode

    The lead micro-ops on an interrupt return are just NOP'd out. ATM there
    is no micro-op state that needs to be saved and restored through context
    switches, other than the index into the stream which can be saved as
    part of the PC.

    Also note: this is "completion" interrupt model used in 010, 020, 030
    {Not sure about 040}.

    It caused:
    a) a bunch of bugs
    b) a lot of strange stack state
    c) which could be attacked by any thread with privilege.

    Although it "sounds" like it saves {time, energy, forward progress}
    the state saving/restoring on the stack pretty much consumes all of
    that.

    a) is a bit worrisome. Doing something out of the ordinary is always
    asking for bugs. I am guessing programmers tried to manipulate the
    stack state on the 68k series. I replicated the 010 as an FPGA core
    and IIRC I stuff the machine state number onto the stack, which is
    bound to be a different number than on the 010.
    b) There is no state that needs to be stacked outside of what is
    normally stacked for an interrupt.

    Exception handler needs the auxiliary info to know what to fix.

    For c) the usual interrupt as well
    could be attacked by a thread with privilege.

    I have coded it both ways, so I may make it a core option. Right up
    there with page relative branching. There is already a flag to generate
    the core for performance instead of size.



  • From EricP@21:1/5 to EricP on Mon Apr 28 10:50:39 2025
    EricP wrote:


    Oh I forgot to mention this mechanism is not interrupt reentrant so
    must disable and enable interrupts. (But it's still cheaper than x86.)

    - Retire saves enough state (PC, stack pointer, flags, priv mode)
    to restart plus saves the exception auxiliary details.
    These can all be copied to privileged control registers
    to be read later by the exception handler software.

    And disables interrupts so that those control registers can't be
    overwritten by an interrupt before we have a chance to save them.

    - uses the exception number to select a new exception handler PC and SP
    to jam those registers, and switch to Supervisor mode.

    The first few instructions in the exception handler can copy the
    exception info from the control regs onto the supervisor stack
    (old PC, old SP, old flags, old priv mode, aux info) that the handler
    needs to fix the fault and restart.

    After the handler saves the state to the kernel stack it enables interrupts.

    And disables interrupts just before it starts the restore sequence below.

    The exception handler fixes the fault if it can and restores the restart
    state (old PC, old SP, old flags, old priv mode) into the privileged
    control registers. Afterward it executes a Return from Exception or
    Interrupt (REI) instruction.

    The old flags-mode register also has an interrupt enable flag to restore.

    When Decode sees a REI instruction it:
    - uses single step to wait for in-flight instructions to complete
    - copies those privileged control registers back to PC, SP, flag, priv mode

    And restores the committed interrupt enable flag.

    - purges the Fetch stage so it restarts using the new PC and priv mode

    And Fetch's interrupt enable flag state is set to the committed state.
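
The interrupt-enable ordering in this post can be sketched as follows. It is a minimal model under assumptions: the class `Cpu` and the method names `hw_enter`, `sw_prologue`, `sw_epilogue` and `hw_rei` are hypothetical, and there is a single set of control registers, which is what makes the mechanism non-reentrant until the handler has copied them to the kernel stack.

```python
class Cpu:
    def __init__(self):
        # Committed state; "ie" is the interrupt enable flag.
        self.state = {"pc": 0x1000, "sp": 0x7000, "flags": 0,
                      "priv": "user", "ie": True}
        self.ctrl = {}       # the single set of privileged control registers
        self.kstack = []     # kernel (supervisor) stack

    def hw_enter(self, handler_pc, handler_sp, aux=None):
        """Retire-time entry: save state to control regs, disable interrupts."""
        self.ctrl = dict(self.state, aux=aux)
        self.state.update(pc=handler_pc, sp=handler_sp,
                          priv="supervisor", ie=False)

    def sw_prologue(self):
        """First handler instructions: copy control regs to the stack,
        then it is safe to enable interrupts again."""
        assert not self.state["ie"], "control regs must not be clobbered yet"
        self.kstack.append(self.ctrl)
        self.state["ie"] = True

    def sw_epilogue(self):
        """Restore sequence: disable interrupts, reload the control regs."""
        self.state["ie"] = False
        self.ctrl = self.kstack.pop()

    def hw_rei(self):
        """REI: copy control regs back, including the old interrupt enable."""
        for key in ("pc", "sp", "flags", "priv", "ie"):
            self.state[key] = self.ctrl[key]
```

A nested interrupt taken between `sw_prologue` and `sw_epilogue` overwrites `ctrl`, but by then the outer context is already on the kernel stack, so both levels unwind correctly.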

  • From MitchAlsup1@21:1/5 to Robert Finch on Mon Apr 28 22:02:23 2025
    On Mon, 28 Apr 2025 2:32:52 +0000, Robert Finch wrote:

    On 2025-04-27 4:53 p.m., MitchAlsup1 wrote:
    On Sun, 27 Apr 2025 11:36:05 +0000, Robert Finch wrote:

    Representing the PC as a fixed-point number because it records which
    micro-op of the micro-op stream for an instruction got interrupted. It
    was easier to restart the micro-op stream than to defer interrupts to
    the next instruction.

    Why not just back up to the instruction boundary ??

    I think I was worried about an instruction disabling interrupts or
    causing an exception that should be processed before the interrupt
    occurred. (keeping the interrupt precise). I did not want to need to
    consider other exceptions that might have occurred before the interrupt.

    Searching for an instruction boundary in either direction is I think
    more logic than just recording the micro-op number.

    You say your interrupt-PC is fixed point so it can point at the
    micro-Op that raised the exception (or was interrupted). It
    seems to me that simply wiping the fractional bits from the PC
    should put you at the instruction boundary. That is:: Round Down.

    It is more FFs to
    record the number, but fewer LUTs. There are something like 8 x 10-bit
    comparators plus muxes on the re-order buffer to back up to the
    instruction boundary and mark an interrupt. Just recording the micro-op
    number is just stuffing 3 bits into the PC, plus three bits propagated
    down the pipeline (FFs). The PC has two zero bits available already.
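
The fixed-point PC and the "round down" suggestion can be illustrated with a few lines of bit arithmetic. This is a sketch under an assumed layout: the 3-bit micro-op index is placed in fraction bits below the instruction PC. The thread does not say exactly how the three bits are merged with the two spare zero bits, so `pack_pc`, `unpack_pc` and `round_down` are hypothetical helpers.

```python
FRAC_BITS = 3                       # micro-op index width mentioned in the thread
FRAC_MASK = (1 << FRAC_BITS) - 1


def pack_pc(instr_pc, uop_num):
    """Combine an instruction PC and a micro-op index into one fixed-point PC."""
    if not 0 <= uop_num <= FRAC_MASK:
        raise ValueError("micro-op index does not fit in the fraction field")
    return (instr_pc << FRAC_BITS) | uop_num


def unpack_pc(fixed_pc):
    """Split a fixed-point PC into (instruction PC, micro-op index)."""
    return fixed_pc >> FRAC_BITS, fixed_pc & FRAC_MASK


def round_down(fixed_pc):
    """Wipe the fraction bits so the PC points at the instruction boundary."""
    return fixed_pc & ~FRAC_MASK
```

With this layout, restarting mid-stream means resuming at `unpack_pc(saved)[1]`, while restarting at the instruction boundary is just `round_down(saved)`.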

  • From MitchAlsup1@21:1/5 to Robert Finch on Tue Apr 29 21:39:49 2025
    On Tue, 29 Apr 2025 2:35:27 +0000, Robert Finch wrote:

    On 2025-04-28 10:06 a.m., EricP wrote:
    Robert Finch wrote:
    ----------------

    Exception handler needs the auxiliary info to know what to fix.

    I may have to review my setup. I thought the exception handler would be
    able to determine what is going on given the exception PC. It can find
    the instruction excepting. The bad address for a page fault / privilege
    violation is available in the MMU via load/store instructions. There is
    nothing stored in the pipeline other than a fault cause code.

    For My case: The handler arrives with causation in R0, the first 64-bits
    of the instruction in R1, and up to 3 operands to that inst in R2..R4.
    In the case of page fault, the generated virtual address R2, and the
    faulting PTE R3 are available to the handler. If the PTE is GuestOS
    pertinent, the fault is delivered to GuestOS, if the PTE is HyperVisor
    pertinent, the fault is delivered to HyperVisor.

    Other information needed for micro-op execution is part of the ordinary
    state of the CPU. Micro-ops use several GPRs dedicated to micro-ops
    usage.

    Do you have a code for when the microOp wants to use the same register
    as the original instruction supplied ??

  • From Thomas Koenig@21:1/5 to Robert Finch on Wed Apr 30 18:09:30 2025
    Robert Finch <[email protected]> schrieb:

    When I first heard about micro-ops I envisioned them as being smaller
    than the instructions in the ISA because of the term "micro". For
    instance 16 or even 12-bits. I was having a heck of a time trying to
    implement with 16-bit micro-ops. Then I clued in, why not just make them
    bigger? They're not really micro-ops, it is more like mega-ops.

    AMD uses 64-bit micro-ops, see the link I posted recently (and
    again, below). It is actually a RISC-like ISA, which makes sense,
    because you don't want to spend a lot of time decoding micro-ops.
    They have a 64-bit micro-op length, and most fields that can appear
    in any instruction have their own unique place.

    https://bughunters.google.com/blog/5424842357473280/zen-and-the-art-of-microcode-hacking


    Current micro-op structure:

    typedef struct packed {
        logic v;                // valid bit
        logic [2:0] count;      // number of micro-ops for instruction
        logic [2:0] num;        // the micro-op of the instruction
        logic [1:0] xRs2;       // extended register selection bits
        logic [1:0] xRs1;
        logic [1:0] xRd;
        logic [3:0] xop4;
        instruction_t ins;      // The instruction
    } micro_op_t;

    Hmm... I don't know what your ISA looks like, but having the
    original instruction looks strange. Why not take a page from
    AMD's book? It looks like a reasonable philosophy, and obviously it
    works for them, or they would have done something different by now.

  • From MitchAlsup1@21:1/5 to Robert Finch on Wed Apr 30 19:04:12 2025
    On Wed, 30 Apr 2025 5:21:13 +0000, Robert Finch wrote:

    On 2025-04-29 5:39 p.m., MitchAlsup1 wrote:
    On Tue, 29 Apr 2025 2:35:27 +0000, Robert Finch wrote:

    On 2025-04-28 10:06 a.m., EricP wrote:
    Robert Finch wrote:
    ----------------

    Exception handler needs the auxiliary info to know what to fix.

    I may have to review my setup. I thought the exception handler would be
    able to determine what is going on given the exception PC. It can find
    the instruction excepting. The bad address for a page fault / privilege
    violation is available in the MMU via load/store instructions. There is
    nothing stored in the pipeline other than a fault cause code.

    For My case: The handler arrives with causation in R0, the first 64-bits
    of the instruction in R1, and up to 3 operands to that inst in R2..R4.
    In the case of page fault, the generated virtual address R2, and the
    faulting PTE R3 are available to the handler. If the PTE is GuestOS
    pertinent, the fault is delivered to GuestOS, if the PTE is HyperVisor
    pertinent, the fault is delivered to HyperVisor.

    Other information needed for micro-op execution is part of the ordinary
    state of the CPU. Micro-ops use several GPRs dedicated to micro-ops
    usage.

    Do you have a code for when the microOp wants to use the same register
    as the original instruction supplied ??

    *poof* I forgot to take the operating mode into consideration. I think
    this is easily fixed though.

    Micro-ops use a subset of the regular ISA instructions, but the register
    spec fields are expanded to seven bits so any register may be selected
    for use. To use the same register as what is in the original instruction
    it is just a matter of setting the extra register spec bits
    appropriately. Extra bits "00" gets access to the integer GPRs. The
    registers dedicated to micro-ops have codes outside of this range.
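
The seven-bit register spec can be sketched like this. It assumes the two extra bits sit above a 5-bit register field taken from the original instruction (7 minus 2), which the post implies but does not state outright. `full_reg_spec` and `is_integer_gpr` are hypothetical names.

```python
EXT_BITS = 2    # width of xRs1 / xRs2 / xRd in the micro-op
BASE_BITS = 5   # assumption: register field width in the ISA instruction


def full_reg_spec(ext, base):
    """Form the 7-bit register number from the extra bits and the ISA field."""
    if not 0 <= ext < (1 << EXT_BITS):
        raise ValueError("extra bits out of range")
    if not 0 <= base < (1 << BASE_BITS):
        raise ValueError("base register field out of range")
    return (ext << BASE_BITS) | base


def is_integer_gpr(spec):
    """Extra bits "00" select the integer GPRs, per the post."""
    return (spec >> BASE_BITS) == 0
```

Under this assumption a micro-op reuses the instruction's own register by leaving the extra bits at zero, and reaches the micro-op-only registers with any non-zero extra bits.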

    When I first heard about micro-ops I envisioned them as being smaller
    than the instructions in the ISA because of the term "micro". For
    instance 16 or even 12-bits. I was having a heck of a time trying to
    implement with 16-bit micro-ops. Then I clued in, why not just make them
    bigger? They're not really micro-ops, it is more like mega-ops.

    Current micro-op structure:

    typedef struct packed {
        logic v;                // valid bit
        logic [2:0] count;      // number of micro-ops for instruction
        logic [2:0] num;        // the micro-op of the instruction
        logic [1:0] xRs2;       // extended register selection bits
        logic [1:0] xRs1;
        logic [1:0] xRd;
        logic [3:0] xop4;
        instruction_t ins;      // The instruction
    } micro_op_t;


    That is about right at ~48-bits:: you have to be able to encode
    EVERYTHING you want to do.
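
As a rough check on the size, the struct's fields can be packed in software. The sketch assumes `instruction_t` is 32 bits wide, which the thread never states, and follows the SystemVerilog rule that the first member of a packed struct occupies the most significant bits. `FIELDS`, `pack` and `unpack` are hypothetical names.

```python
# (name, width) in declaration order; first field is most significant.
FIELDS = [("v", 1), ("count", 3), ("num", 3), ("xRs2", 2), ("xRs1", 2),
          ("xRd", 2), ("xop4", 4),
          ("ins", 32)]  # assumption: instruction_t is 32 bits

OVERHEAD_BITS = sum(width for name, width in FIELDS if name != "ins")
TOTAL_BITS = sum(width for _, width in FIELDS)


def pack(**values):
    """Pack field values into one integer; missing fields default to 0."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if value < 0 or value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Split a packed micro-op back into a dict of field values."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out
```

The fields other than `ins` add up to 17 bits, so with a 32-bit instruction the micro-op comes to 49 bits, in line with the rough figure given above.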

  • From EricP@21:1/5 to Thomas Koenig on Fri May 2 11:18:35 2025
    Thomas Koenig wrote:
    Robert Finch <[email protected]> schrieb:

    When I first heard about micro-ops I envisioned them as being smaller
    than the instructions in the ISA because of the term "micro". For
    instance 16 or even 12-bits. I was having a heck of a time trying to
    implement with 16-bit micro-ops. Then I clued in, why not just make them
    bigger? They're not really micro-ops, it is more like mega-ops.

    AMD uses 64-bit micro-ops, see the link I posted recently (and
    again, below). It is actually a RISC-like ISA, which makes sense,
    because you don't want to spend a lot of time decoding micro-ops.
    They have a 64-bit micro-op length, and most fields that can appear
    in any instruction have their own unique place.

    https://bughunters.google.com/blog/5424842357473280/zen-and-the-art-of-microcode-hacking

    Current micro-op structure:

    typedef struct packed {
        logic v;                // valid bit
        logic [2:0] count;      // number of micro-ops for instruction
        logic [2:0] num;        // the micro-op of the instruction
        logic [1:0] xRs2;       // extended register selection bits
        logic [1:0] xRs1;
        logic [1:0] xRd;
        logic [3:0] xop4;
        instruction_t ins;      // The instruction
    } micro_op_t;

    Hmm... I don't know what your ISA looks like, but having the
    original instruction looks strange. Why not take a page from
    AMD's book? It looks like a reasonable philosophy, and obviously it
    works for them, or they would have done something different by now.

    There are a number of papers researching Intel and AMD microcode.
    These look at the microcode format and mechanism. There are other
    papers looking at the security in the microcode update system.

    Analyzing and Exploiting Branch Mispredictions in Microcode 2025
    https://arxiv.org/abs/2501.12890

    CustomProcessingUnit Reverse Engineering and Customization of
    Intel Microcode 2023
    https://misc0110.net/files/cpu_woot23.pdf

    Undocumented x86 instructions to control the cpu at the microarchitecture
    level in modern intel processors 2023
    https://raw.githubusercontent.com/chip-red-pill/udbgInstr/main/paper/undocumented_x86_insts_for_uarch_control.pdf

    Reverse Engineering x86 Processor Microcode 2019
    https://www.usenix.org/system/files/conference/usenixsecurity17/sec17-koppe.pdf

    And a paper on IBM Millicode which is kind of like Alpha PAL code
    and may be similar to Robert's mega-ops.

    The What and Why of System z Millicode 2012
    https://share.confex.com/share/119/webprogram/Handout/Session11773/The%20What%20and%20Why%20of%20System%20z%20Millicode%20-%20%2311773.pdf

  • From moi@21:1/5 to EricP on Fri May 2 17:03:28 2025
    On 02/05/2025 16:18, EricP wrote:

    And a paper on IBM Millicode which is kind of like Alpha PAL code
    and may be similar to Robert's mega-ops.

    The What and Why of System z Millicode 2012
    https://share.confex.com/share/119/webprogram/Handout/Session11773/The%20What%20and%20Why%20of%20System%20z%20Millicode%20-%20%2311773.pdf

    Thanks for that reference.

    I struggle to see how "millicode" differs in essentials from
    the "extracode" implementation of complex orders on the
    Ferranti Orion & Atlas, or the ICT 1900 Series, of 60 years ago.

    --
    Bill F.

  • From EricP@21:1/5 to moi on Fri May 2 13:22:42 2025
    moi wrote:
    On 02/05/2025 16:18, EricP wrote:

    And a paper on IBM Millicode which is kind of like Alpha PAL code
    and may be similar to Robert's mega-ops.

    The What and Why of System z Millicode 2012
    https://share.confex.com/share/119/webprogram/Handout/Session11773/The%20What%20and%20Why%20of%20System%20z%20Millicode%20-%20%2311773.pdf


    Thanks for that reference.

    I struggle to see how "millicode" differs in essentials from
    the "extracode" implementation of complex orders on the
    Ferranti Orion & Atlas, or the ICT 1900 Series, of 60 years ago.

    From what I can find, they sound somewhat similar.
    Extracode is normal ISA instructions stored in a
    separate memory accessible in a special mode.

    Both PAL and Milli code also use the ISA format instructions
    stored in separate memory and a special mode. However they also
    disable interrupts and give access to special hardware registers.

    The Manchester Mark I and Atlas: A Historical Perspective 1978

    It doesn't say if extracode disables interrupts while running but
    it doesn't look like it does because it has 3 program counters
    for main program, extracode, and interrupt. It wouldn't need
    an interrupt PC if it disabled interrupts in extra mode.
    Also doesn't mention special control registers.

    "Generally speaking an extracode was a commonly used but
    relatively complicated function which it was not economic
    to implement directly as hardwired logic. Instead, an
    extracode consisted of a sequence of normal instructions
    (a "macro routine") held in the fixed store.
    Entry to these macro routines was very rapid and
    involved no preservation of central registers since there
    was a dedicated extracode program counter (or control
    register) and reserved B-lines, and any extracodes
    needing working space used a private area of the
    system working store. Amongst extracode instructions
    available to the user were ones for carrying out the
    common intrinsic functions such as square root, log,
    cosine, etc."

  • From moi@21:1/5 to EricP on Fri May 2 20:01:46 2025
    On 02/05/2025 18:22, EricP wrote:
    moi wrote:
    On 02/05/2025 16:18, EricP wrote:

    And a paper on IBM Millicode which is kind of like Alpha PAL code
    and may be similar to Robert's mega-ops.

    The What and Why of System z Millicode 2012
    https://share.confex.com/share/119/webprogram/Handout/Session11773/The%20What%20and%20Why%20of%20System%20z%20Millicode%20-%20%2311773.pdf

    Thanks for that reference.

    I struggle to see how "millicode" differs in essentials from
    the "extracode" implementation of complex orders on the
    Ferranti Orion & Atlas, or the ICT 1900 Series, of 60 years ago.

    From what I can find, they sound somewhat similar.
    Extracode is normal ISA instructions stored in a
    separate memory accessible in a special mode.

    Separate memory on Atlas.

    Both PAL and Milli code also use the ISA format instructions
    stored in separate memory and a special mode. However they also
    disable interrupts and give access to special hardware registers.

    As do Orion and 1900 Series extracode.

    --
    Bill F.
