Working on the Qupls3/StarkCPU core it looks like there will be enough resources to support two sets of registers. The extra set of registers
comes for free for the register file as the BRAMs can support them. The
only increase is in the RAT. The issue I have to trade-off on now is
which of the four operating modes gets its own set of registers while
the other three share a set. However, the first eight registers will be shared between all modes so that arguments can be passed between them.
The ARM does this. My thought is that the application / user mode gets its own register set, and the rest of the system shares the other set.
That way there is no need to save and restore the app registers when
calling the system.
Another thought is to not include float registers for anything other
than apps. It would save 32 regs per mode, possibly allowing three
register sets to be provided.
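
A rough sketch of the banking scheme described above, for illustration only. The count of 32 architectural registers per mode, the mode names, and the function names are all assumptions; only "first eight shared" and "user mode gets its own set, the other three modes share one" come from the post.

```python
# Hypothetical sketch: map (operating mode, architectural register) to a
# slot in the physical register file. All constants are assumptions.
NUM_ARCH_REGS = 32   # assumed number of registers visible in each mode
NUM_SHARED = 8       # the first eight registers are common to all modes
USER, SUPERVISOR, HYPERVISOR, MACHINE = range(4)  # assumed mode names


def bank_of(mode: int) -> int:
    """User mode gets bank 0; the other three modes share bank 1."""
    return 0 if mode == USER else 1


def physical_index(mode: int, reg: int) -> int:
    """Return the register-file slot that backs `reg` while in `mode`."""
    if not 0 <= reg < NUM_ARCH_REGS:
        raise ValueError("register number out of range")
    if reg < NUM_SHARED:
        # Shared slots: arguments pass between modes without copying.
        return reg
    banked = NUM_ARCH_REGS - NUM_SHARED  # registers that exist once per bank
    return NUM_SHARED + bank_of(mode) * banked + (reg - NUM_SHARED)
```

With these assumed numbers the file needs 8 + 2 * 24 = 56 slots, and a system call leaves the user bank untouched because the system modes never index into it.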
On 4/16/2025 8:42 PM, Robert Finch wrote:
Working on the Qupls3/StarkCPU core it looks like there will be enough
resources to support two sets of registers. The extra set of registers
comes for free for the register file as the BRAMs can support them. The
only increase is in the RAT. The issue I have to trade-off on now is
which of the four operating modes gets its own set of registers while
the other three share a set. However, the first eight registers will be
shared between all modes so that arguments can be passed between them.
The ARM does this. My thought is that the application / user mode gets
its own register set, and the rest of the system shares the other set.
That way there is no need to save and restore the app registers when
calling the system.
Another thought is to not include float registers for anything other
than apps. It would save 32 regs per mode, possibly allowing three
register sets to be provided.
Not to mention speeding up context switches as you don't need to
save/restore the FP registers for those levels that don't have them, and
if only one level does have them, no need to save them if the switch is
to a level that doesn't have them, as they then can't be clobbered.
Stephen Fuld writes:
On 4/16/2025 8:42 PM, Robert Finch wrote:
Working on the Qupls3/StarkCPU core it looks like there will be enough
resources to support two sets of registers. The extra set of registers
comes for free for the register file as the BRAMs can support them. The
only increase is in the RAT. The issue I have to trade-off on now is
which of the four operating modes gets its own set of registers while
the other three share a set. However, the first eight registers will be
shared between all modes so that arguments can be passed between them.
The ARM does this. My thought is that the application / user mode gets
its own register set, and the rest of the system shares the other set.
That way there is no need to save and restore the app registers when
calling the system.
Another thought is to not include float registers for anything other
than apps. It would save 32 regs per mode, possibly allowing three
register sets to be provided.
Not to mention speeding up context switches as you don't need to
save/restore the FP registers for those levels that don't have them, and
if only one level does have them, no need to save them if the switch is
to a level that doesn't have them, as they then can't be clobbered.
Many modern CPUs including intel/amd have mechanisms that the OS
can use to determine if floating point registers have been used
since the user process was dispatched, including a trap to the
OS on the first floating point use. This allows them to avoid
saving and restoring the FP registers during context switches.
On 2025-04-17 2:26 p.m., MitchAlsup1 wrote:
On Thu, 17 Apr 2025 13:35:36 +0000, Scott Lurndal wrote:
Stephen Fuld writes:
On 4/16/2025 8:42 PM, Robert Finch wrote:
Working on the Qupls3/StarkCPU core it looks like there will be enough
resources to support two sets of registers. The extra set of registers
comes for free for the register file as the BRAMs can support them. The
only increase is in the RAT. The issue I have to trade-off on now is
which of the four operating modes gets its own set of registers while
the other three share a set. However, the first eight registers will be
shared between all modes so that arguments can be passed between them.
The ARM does this. My thought is that the application / user mode gets
its own register set, and the rest of the system shares the other set.
That way there is no need to save and restore the app registers when
calling the system.
Another thought is to not include float registers for anything other
than apps. It would save 32 regs per mode, possibly allowing three
register sets to be provided.
Not to mention speeding up context switches as you don't need to
save/restore the FP registers for those levels that don't have them, and
if only one level does have them, no need to save them if the switch is
to a level that doesn't have them, as they then can't be clobbered.
Many modern CPUs including intel/amd have mechanisms that the OS
can use to determine if floating point registers have been used
since the user process was dispatched, including a trap to the
OS on the first floating point use. This allows them to avoid
saving and restoring the FP registers during context switches.
The converse point is that the OS may want to use vector instructions
to move data around (Disk-cache to User-buffer) and thus have to have
access to those registers anyway.
But, yes, if you have 14 different kinds of register files, you need
something close to 13 of them under flags of control to thin the
work at context switch time.
I like having the extra register files; it is just a personal
programming convenience. It reduces the pressure on the general-purpose
register file.
It is also a matter of encoding the register selection in a 32-bit
instruction. I did not want to waste more than five bits on the
register selection. It is really a 52?-register machine, but the
instruction encodings do not allow all registers for all instructions.
One can use the move instruction to swap any registers.
But it makes the compiler more complex as it has to deal with different
architectural register files. I suppose one could code the compiler for
a machine with a flat 96 registers by surrounding operations with move
instructions. It is code bloat, but I do not know if it would affect
performance.
Thinking of including a register exchange instruction.
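
To make the "surround operations with move instructions" idea concrete, here is a minimal sketch of what such a lowering pass might look like. Everything specific in it is an assumption made for illustration: that the flat 96 registers split into three files of 32, that ordinary operations can only name file 0, that the move instruction can name any flat register, and that registers 29 to 31 are reserved as scratch. The names `lower`, `FILE_SIZE` and `SCRATCH` are made up.

```python
FILE_SIZE = 32           # registers addressable by a five-bit field
SCRATCH = (29, 30, 31)   # hypothetical file-0 registers reserved for lowering


def lower(op, dst, src1, src2):
    """Lower one three-operand op on flat registers 0..95 to encodable form.

    Returns a list of tuples: ("move", dst, src) or (op, dst, src1, src2).
    Assumes the register allocator never hands out the SCRATCH registers.
    """
    for r in (dst, src1, src2):
        if not 0 <= r < 3 * FILE_SIZE:
            raise ValueError("flat register number out of range")
    out = []
    operands = []
    for scratch, src in zip(SCRATCH, (src1, src2)):
        if src < FILE_SIZE:
            operands.append(src)                # already encodable
        else:
            out.append(("move", scratch, src))  # bring the source into file 0
            operands.append(scratch)
    target = dst if dst < FILE_SIZE else SCRATCH[2]
    out.append((op, target, operands[0], operands[1]))
    if target != dst:
        out.append(("move", dst, target))       # send the result back out
    return out
```

Under these assumptions an operation touching two registers outside file 0, such as `lower("add", 70, 5, 40)`, expands from one instruction to three, which gives a feel for the code bloat in question. Whether that costs performance would depend on how cheaply the core handles moves.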
On Wed, 16 Apr 2025 23:42:20 -0400, Robert Finch wrote:
Another thought is to not include float registers for anything other
than apps. It would save 32 regs per mode, possibly allowing three
register sets to be provided.
I had to re-read this to understand it.
Some architectures in the past did speed up context-switching between
user programs and the operating system by giving the operating system
its own set of registers.
And the operating system doesn't need to do floating-point math, only
user programs that do calculations. So what could go wrong with this
great idea?
The first thing that comes to mind is that even a computer with a GUI,
not merely a time-sharing mainframe, may have multiple user programs
running at once. So when a real-time clock interrupt hits, and it's
time to rotate from one compute-bound user program to another, those
floating-point registers *also* need to be saved to memory.
So you will probably need an outer ring of the OS that doesn't use
the special OS-specific set of registers.
John Savard
On Thu, 17 Apr 2025 13:35:36 +0000, Scott Lurndal wrote:
Many modern CPUs including intel/amd have mechanisms that the OS can use
to determine if floating point registers have been used since the user
process was dispatched, including a trap to the OS on the first floating
point use. This allows them to avoid saving and restoring the FP
registers during context switches.
That certainly is helpful. But if a user program happens not to have
done any floating-point computations during a particular time-slice,
but did perform some earlier which it will later continue, it could be
that while the FP registers don't need to be saved, they will still
need to be restored on returning to that task from the old copy.
So a mechanism like that still has to be used carefully, and it
doesn't just make the whole problem disappear.
John Savard
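
For illustration, the trap-on-first-use scheme discussed above can be modelled in a few lines. This is a toy simulation, not any real OS or CPU interface; the class and field names are invented. On x86 the corresponding hardware hook is the CR0.TS bit, which makes the first FP instruction raise a device-not-available (#NM) exception. The sketch also covers the case raised above: a task that skips FP for a time-slice still gets its old values back, because the restore happens in the trap handler from that task's saved copy.

```python
class Task:
    def __init__(self, name):
        self.name = name
        self.saved_fp = [0.0] * 32   # in-memory copy of this task's FP registers


class Cpu:
    def __init__(self):
        self.fp_regs = [0.0] * 32
        self.fp_enabled = False      # when False, the next FP access traps
        self.fp_owner = None         # task whose values are live in fp_regs
        self.current = None
        self.saves = 0               # counters, only to show the work avoided
        self.restores = 0

    def context_switch(self, task):
        # Integer state would be swapped here; FP state is left where it is.
        # Assumes context_switch is called before any FP access.
        self.current = task
        self.fp_enabled = self.fp_owner is task

    def fp_write(self, reg, value):
        if not self.fp_enabled:
            self._fp_trap()
        self.fp_regs[reg] = value

    def fp_read(self, reg):
        if not self.fp_enabled:
            self._fp_trap()
        return self.fp_regs[reg]

    def _fp_trap(self):
        # First FP use since dispatch by a task that does not own the registers:
        # spill the previous owner's values, then load the current task's copy.
        if self.fp_owner is not None:
            self.fp_owner.saved_fp = list(self.fp_regs)
            self.saves += 1
        self.fp_regs = list(self.current.saved_fp)
        self.restores += 1
        self.fp_owner = self.current
        self.fp_enabled = True
```

If task A uses FP, task B runs without touching FP, and A is dispatched again, no save or restore happens at all. The cost only appears when a different task actually touches the registers.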
I am wondering if there is a way to tag each register individually with
the process id (app id) and involve the register renamer such that there
is no need to save / restore the register context.
Robert Finch writes:
I am wondering if there is a way to tag each register individually with
the process id (app id) and involve the register renamer such that there
is no need to save / restore the register context.
Simultaneous multi-threading (SMT) tags each microinstruction and
instruction with a (hardware) thread id; two hardware threads may
belong to different processes. How registers are handled probably
depends on the actual implementation, not sure if any implementation
tags it, but they all can tell which physical register belongs to
which hardware thread (if any). And as long as you are not using more
hardware threads than your hardware has available, there is no need to
switch contexts, and therefore no need to save and restore it.
- anton
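
One way to picture the renamer keeping several contexts resident at once is a rename table keyed by (hardware thread id, architectural register). The sketch below is a simplified model under that assumption and does not describe any particular core; `RenameTable` and its methods are invented names, and it frees the old physical register immediately, where a real out-of-order core would wait until retirement.

```python
class RenameTable:
    """Toy rename table shared by several hardware threads."""

    def __init__(self, num_threads, arch_regs, phys_regs):
        if phys_regs < num_threads * arch_regs:
            raise ValueError("not enough physical registers for all threads")
        self.free = list(range(phys_regs))
        self.map = {}
        # Give every (thread, architectural register) pair its own slot.
        for tid in range(num_threads):
            for reg in range(arch_regs):
                self.map[(tid, reg)] = self.free.pop(0)

    def lookup(self, tid, reg):
        """Physical register currently holding `reg` for thread `tid`."""
        return self.map[(tid, reg)]

    def rename_dest(self, tid, reg):
        """Allocate a fresh physical register for a write to `reg`."""
        if not self.free:
            raise RuntimeError("no free physical registers; rename must stall")
        new = self.free.pop(0)
        old = self.map[(tid, reg)]
        self.map[(tid, reg)] = new
        self.free.append(old)  # simplification: real cores free at retirement
        return new
```

Because the thread id is part of the key, switching which thread issues instructions changes no mappings, so nothing has to be saved or restored as long as the number of resident contexts does not exceed `num_threads`.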