Under construction: Q+ background execution buffers for the block memory operations. For instance, a block store operation can be executed in the background while other instructions are executing. Store operations are issued when the MEM unit is not busy. Background instructions continue
to execute even when interrupts occur. The background operations may be useful for initializing blocks of memory that are not needed right away.
When the operation is issued, a handle for the buffer is returned in the destination register so that the status of the operation may be queried
or the operation cancelled.
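A toy software model may help make the handle / status / cancel idea concrete. This is a sketch under stated assumptions, not the Q+ design: the class and method names (`BackgroundBuffers`, `issue`, `cycle`, `cancel`) are hypothetical, memory is a Python list indexed by word address, and each idle MEM-unit clock performs one store for the first pending buffer.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    DONE = "done"
    CANCELLED = "cancelled"


@dataclass
class BlockStore:
    # Operands are copied out of the registers when the operation issues.
    addr: int        # next word address to write
    count: int       # words still to write
    value: int       # fill value
    status: Status = Status.PENDING


class BackgroundBuffers:
    """Toy model of background block-store buffers (hypothetical names).
    Memory is a Python list indexed by word address."""

    def __init__(self, memory, nbuffers=4):
        self.memory = memory
        self.buffers = [None] * nbuffers

    def issue(self, addr, count, value):
        """Grab a free buffer and return its index as the handle."""
        for handle, buf in enumerate(self.buffers):
            # This sketch recycles finished or cancelled buffers immediately.
            if buf is None or buf.status is not Status.PENDING:
                status = Status.PENDING if count > 0 else Status.DONE
                self.buffers[handle] = BlockStore(addr, count, value, status)
                return handle
        raise RuntimeError("no free background buffer")  # real hardware might stall instead

    def status(self, handle):
        return self.buffers[handle].status

    def cancel(self, handle):
        buf = self.buffers[handle]
        if buf.status is Status.PENDING:
            buf.status = Status.CANCELLED

    def cycle(self, mem_busy):
        """Called every clock, interrupts or not: if the MEM unit is idle,
        perform one store for the first pending buffer."""
        if mem_busy:
            return
        for buf in self.buffers:
            if buf is not None and buf.status is Status.PENDING:
                self.memory[buf.addr] = buf.value
                buf.addr += 1
                buf.count -= 1
                if buf.count == 0:
                    buf.status = Status.DONE
                return


mem = [0] * 16
bg = BackgroundBuffers(mem)
h = bg.issue(addr=4, count=3, value=0xFF)
for busy in [True, False, True, False, False]:   # MEM unit busy/idle per clock
    bg.cycle(mem_busy=busy)
print(bg.status(h), mem[4:7])   # Status.DONE [255, 255, 255]
```

Stores only advance on clocks where the MEM unit is idle, which is the "issued when the MEM unit is not busy" behaviour; a cancel simply stops further stores and leaves already-written words in place.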
On 2024-09-24 4:38 p.m., MitchAlsup1 wrote:
> On Tue, 24 Sep 2024 20:03:29 +0000, Robert Finch wrote:
>> [snip]
> This is how My 66000 performs:: LDM, STM, ENTER, EXIT, MM, and MS.
> Addresses are AGENED and then a state machine over in the memory
> unit performs the required steps. {{Not usefully different than the
> divider performing the individual steps of division.}} While the
> unit performs its duties, other units can be fed and complete
> other instructions.
> You just have to mark the affected registers to prevent hazards.
Q+ releases the registers right away, so things can continue on.
Q+ captures the register values at issue, then does not modify the
registers. Did not want an instruction with three updates happening. It
keeps track of its own values, in theory anyway. Have not got to testing
it yet. A status operation might be used to query the final operation results.
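The two hazard strategies in this exchange can be contrasted in a few lines. This is a hedged sketch with hypothetical names: `Scoreboard` models the "mark the affected registers" approach (registers stay busy until the block operation's state machine finishes), and `issue_with_snapshot` models the Q+ approach as described, where operand values are copied into the buffer at issue so the registers are released immediately.

```python
class Scoreboard:
    """Mark registers touched by an in-flight block operation busy
    until its state machine finishes."""

    def __init__(self):
        self.busy = set()

    def issue_block_op(self, regs):
        self.busy |= set(regs)

    def complete_block_op(self, regs):
        self.busy -= set(regs)

    def can_issue(self, srcs, dst):
        # A later instruction must wait if it reads or writes a busy register.
        return not ((set(srcs) | {dst}) & self.busy)


def issue_with_snapshot(regfile, src_regs):
    """Copy operand values into the background buffer at issue,
    so the architectural registers are free for reuse right away."""
    return {r: regfile[r] for r in src_regs}


regs = {1: 0x1000, 2: 64, 3: 0}   # hypothetical: r1 = address, r2 = count, r3 = fill value
sb = Scoreboard()
sb.issue_block_op([1, 2, 3])
print(sb.can_issue(srcs=[4], dst=2))   # False: r2 is still busy
snap = issue_with_snapshot(regs, [1, 2, 3])
regs[2] = 99                           # a later instruction reuses r2 immediately
print(snap[2])                         # 64: the buffer kept its own copy
```

The trade-off is buffer storage for the captured values versus stalls on any later instruction that touches a busy register.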
Altering Q+ to use 64-bit instructions and 256 registers instead of supporting a vector instruction set. Two pipeline stages can then be removed,
making for a simpler design. Code density will decrease <200%.
Relying on software to assign registers for vectors.
Also adding a predicate field to instructions. Branches are horrendously
slow in this simple implementation. It may be faster to predicate a
dozen instructions.
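Predication replaces a branch by computing a predicate, issuing both arms, and letting only the instructions whose predicate holds take effect (if-conversion). Python can only model this. The sketch below, with hypothetical function names, uses a branch-free mask select to stand in for "predicated-false instructions do not write back" and checks it against an ordinary branchy version.

```python
def branchy(x, a, b):
    """Reference version with a conditional branch."""
    if x < 0:
        a += 1
        b *= 2
    else:
        a -= 1
    return a, b


def select(p, t, f):
    """Branch-free select: mask is all ones when p is true, zero otherwise."""
    mask = -int(p)                     # -1 (all ones) or 0
    return (t & mask) | (f & ~mask)


def predicated(x, a, b):
    """If-converted version: compute a predicate, execute both arms,
    keep each result only where its predicate holds."""
    p = x < 0                          # compare writes a predicate
    a_then, b_then = a + 1, b * 2      # instructions predicated on p
    a_else = a - 1                     # instruction predicated on !p
    return select(p, a_then, a_else), select(p, b_then, b)


print(branchy(-3, 5, 7), predicated(-3, 5, 7))
```

With a dozen predicated instructions, the cost is issuing both arms every time; the win is never paying the slow branch.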
On 2024-10-04 2:19 a.m., Anton Ertl wrote:...
> 4) Keep the flags results along with GPRs: have carry and overflow as
> bit 64 and 65, N is bit 63, and Z tells something about bits 0-63.
> The advantage is that you do not have to track the flags separately
> (and, in case of AMD64, track each of C, O, and NZP separately), but
> instead can use the RAT that is already there for the GPRs. You can
> find a preliminary paper on that on
> <https://www.complang.tuwien.ac.at/anton/tmp/carry.pdf>.
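The flags-in-GPR layout quoted above can be sketched as a 66-bit register image: bits 0-63 hold the sum, bit 64 the carry, bit 65 the signed overflow, with N and Z derived from bits 0-63. This is a minimal sketch assuming a plain 64-bit add; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def add_with_flags(a, b):
    """Add two register values and return a 66-bit register image:
    bits 0-63 result, bit 64 carry, bit 65 signed overflow."""
    a &= MASK64                        # sources use only bits 0-63
    b &= MASK64
    total = a + b
    result = total & MASK64
    carry = total >> 64                # 0 or 1
    sa, sb, sr = a >> 63, b >> 63, result >> 63
    overflow = int(sa == sb and sr != sa)
    return result | (carry << 64) | (overflow << 65)


def flags(reg):
    """Derive C, V, N, Z from a 66-bit register image."""
    return {
        "C": (reg >> 64) & 1,
        "V": (reg >> 65) & 1,
        "N": (reg >> 63) & 1,
        "Z": int((reg & MASK64) == 0),
    }


print(flags(add_with_flags(MASK64, 1)))   # {'C': 1, 'V': 0, 'N': 0, 'Z': 1}
```

Because the flags travel with the destination register, renaming that register renames its flags too; no separate flags rename table is needed.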
One solution, not mentioned in your article, is to support arithmetic
with two bits fewer than the register width, so that the carry and
overflow can be stored in the register itself. On a 64-bit machine, have
all operations use only 62 bits. It would solve the issue of how to load
or store the carry and overflow bits associated with a register.
Sometimes arithmetic is performed with fewer bits anyway, as for pointer
representation. I wonder if pointer masking could somehow be involved.
It may be useful to have a bit indicating the presence of a pointer. Also
thinking of how to track a binary point position for fixed-point
arithmetic. Perhaps using the whole upper byte of a register for
status/control bits would work.
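The 62-bit idea can be sketched the same way, except the status bits fit inside an ordinary 64-bit register, so a plain 64-bit store and load preserve them. The placement (carry in bit 62, overflow in bit 63) is an assumption for illustration; the post does not say which bit holds which.

```python
import struct

WIDTH = 62
MASK62 = (1 << WIDTH) - 1
CARRY_BIT, OVF_BIT = 62, 63        # assumed placement of the two status bits


def add62(a, b):
    """Add using only bits 0-61; return a 64-bit register value with
    carry in bit 62 and signed overflow in bit 63."""
    a &= MASK62                    # status bits of the sources are ignored
    b &= MASK62
    total = a + b
    result = total & MASK62
    carry = total >> WIDTH
    sa, sb, sr = a >> (WIDTH - 1), b >> (WIDTH - 1), result >> (WIDTH - 1)
    overflow = int(sa == sb and sr != sa)
    return result | (carry << CARRY_BIT) | (overflow << OVF_BIT)


# An ordinary 64-bit store and load keep the status bits with the value.
reg = add62(MASK62, 1)
saved = struct.pack("<Q", reg)
assert struct.unpack("<Q", saved)[0] == reg
print(hex(reg))   # 0x4000000000000000 (result 0, carry set)
```

The cost is two bits of range on every operation; the gain is that spills, reloads, and context switches need no special handling for the flags.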
It may be possible with Q+ to support a second destination register
which is in a subset of the GPRs. For example, one of eight registers
could be specified to hold the carry/overflow status. That effectively
ties up a second ALU, though, as an extra write port is needed for the
instruction.
Today I am wondering how many predicate registers are enough.
Scanning webpages reveals a variety. The Itanium has 64 predicate registers, but they are
used for modulo loops and rotated. Register rotation is Itanium's method
of register renaming, so it needs more visible registers. In a classic superscalar design with a RAT where registers are renamed, it seems like
64 would be far too many. Cray had eight vector mask registers.
I think the RISC-V Hwacha has 16, if I read the diagram correctly.
I cannot see the compiler making use of very many predicate registers simultaneously. Since they are not used simultaneously, and register
renaming is in effect, there should not be a great need for many predicate registers.
Suppose one wants predicated logic in a loop, with the predicate being
set outside of the loop. It may be desirable to have several blocks of
logic in the loop predicated by different predicates, so more than one
predicate register is likely desirable.
Reserved four bits in the instruction for predicates. Do not want to waste bits, though, even with a 64-bit instruction.
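The loop case above is the argument for more than one predicate register: predicates set before the loop stay live for its whole body. This tiny interpreter is a sketch under assumptions: a 4-bit field names one of 16 predicate registers, "instructions" are `(pred_index, op)` pairs, and predicate register 0 is hardwired true for unconditional instructions. All of these names and conventions are hypothetical, not Q+ encodings.

```python
PRED_FIELD_BITS = 4                  # the four bits reserved in the instruction
NUM_PREDS = 1 << PRED_FIELD_BITS     # so up to 16 predicate registers can be named


def run_predicated_loop(body, preds, data):
    """body: list of (pred_index, op) pairs standing in for instructions.
    An op takes effect only if its predicate register is true.
    Hypothetical convention: preds[0] is hardwired true ("always")."""
    assert len(preds) == NUM_PREDS and preds[0]
    out = []
    for x in data:                   # loop body has no data-dependent branches
        for p, op in body:
            if preds[p]:             # models the predicate check, not an ISA branch
                x = op(x)
        out.append(x)
    return out


# Predicates are set once, outside the loop, and stay live for the whole loop.
preds = [True] + [False] * (NUM_PREDS - 1)
preds[1] = True     # p1: enable the scaling block
preds[2] = False    # p2: enable the clamping block
body = [
    (1, lambda x: x * 2),        # block predicated on p1
    (2, lambda x: min(x, 10)),   # block predicated on p2
    (0, lambda x: x + 1),        # always executed
]
print(run_predicated_loop(body, preds, [3, 7]))   # [7, 15]
```

Here two predicates are live at once across the loop; a rough count of such simultaneously live predicates is what determines how many registers, and field bits, are worth spending.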
On 2024-10-05 5:43 a.m., Anton Ertl wrote:
Been thinking some about the carry and overflow and what to do about
register spills and reloads during expression processing. My thought was
that on the machine with 256 registers, simply allocate a ridiculous
number of registers for expression processing, for example 25 or even
50. Then if the expression is too complex, have the compiler spit out an
error message to the programmer to simplify the expression. Remnants of
the ‘expression too complex’ error in BASIC. So, there are no spills or
reloads during expression processing.
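A compiler can decide "too complex" exactly with Sethi-Ullman numbering, which gives the minimum number of registers needed to evaluate an expression tree without spilling (evaluating the more demanding subtree first). The sketch below assumes a load/store machine where every leaf needs a register; `Node`, `check_expression`, and the default budget of 25 are illustrative choices taken from the numbers in the post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    op: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def regs_needed(n: Node) -> int:
    """Sethi-Ullman number on a load/store machine: every leaf needs a register.
    Assumes the subtree needing more registers is evaluated first."""
    if n.left is None:                       # leaf: variable or constant
        return 1
    if n.right is None:                      # unary operator
        return regs_needed(n.left)
    l, r = regs_needed(n.left), regs_needed(n.right)
    return max(l, r) if l != r else l + 1


class ExpressionTooComplex(Exception):
    pass


def check_expression(expr: Node, budget: int = 25) -> int:
    """Reject expressions that cannot be evaluated in `budget` registers without spilling."""
    k = regs_needed(expr)
    if k > budget:
        raise ExpressionTooComplex(
            f"expression too complex: needs {k} registers, budget is {budget}")
    return k


# (a + b) * (c + d) needs three registers.
e = Node("*", Node("+", Node("a"), Node("b")), Node("+", Node("c"), Node("d")))
print(check_expression(e))   # 3
```

Register need grows only with the depth of balanced subtrees, so a perfectly balanced tree needing 26 registers would have 2^25 leaves; a budget of 25 should almost never trip on real source code.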
I think the storextra / loadextra
registers used during context switching would work okay. But in Q+ there
are 256 regs, which would require eight storextra / loadextra registers
(256 registers × 2 bits = 512 bits = eight 64-bit words). I
think the store extra / load extra registers could be hidden in the
context save and restore hardware, not even requiring access via CSRs or
whatever. I suppose context loads and stores could be done in blocks of
32 registers. An issue is that the load extra needs to be done before
the registers are loaded.
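The packing arithmetic can be checked with a short sketch: two extra bits per register means one 64-bit word covers exactly 32 registers, so each block of 32 registers would pair with one extra word, and 256 registers need eight. The bit order within a word (carry in the low bit of each 2-bit slot) is an assumption, and `pack_extra` / `unpack_extra` are hypothetical names.

```python
NREGS = 256
BITS_PER_REG = 2                          # carry and overflow
REGS_PER_WORD = 64 // BITS_PER_REG        # 32: one extra word per block of 32 registers
NWORDS = NREGS // REGS_PER_WORD           # 8 extra words for the whole file


def pack_extra(carry, overflow):
    """carry, overflow: lists of 256 bits. Returns eight 64-bit 'store extra' words."""
    words = [0] * NWORDS
    for r in range(NREGS):
        w, slot = divmod(r, REGS_PER_WORD)
        bits = (carry[r] & 1) | ((overflow[r] & 1) << 1)
        words[w] |= bits << (slot * BITS_PER_REG)
    return words


def unpack_extra(words):
    """Inverse of pack_extra: recover per-register carry and overflow bits."""
    carry, overflow = [0] * NREGS, [0] * NREGS
    for r in range(NREGS):
        w, slot = divmod(r, REGS_PER_WORD)
        bits = (words[w] >> (slot * BITS_PER_REG)) & 0b11
        carry[r], overflow[r] = bits & 1, bits >> 1
    return carry, overflow


c = [r % 2 for r in range(NREGS)]
v = [(r // 3) % 2 for r in range(NREGS)]
print(len(pack_extra(c, v)))   # 8
```

On restore, the extra words would have to be reloaded first (as the post notes) so that each register's bits are available when that register itself is loaded.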
Another thought is to store additional info, such as a CRC of the register file, on context save and check it on context restore.
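A register-file CRC check could look roughly like this: compute a checksum over the saved register values, store it with the context, and verify it before the registers are reloaded. CRC-32 via Python's `zlib.crc32` is a swapped-in choice for illustration; hardware might use a different polynomial or width. The helper names are hypothetical.

```python
import struct
import zlib


def regfile_crc(regs):
    """CRC-32 over 64-bit register values packed little-endian."""
    data = struct.pack(f"<{len(regs)}Q", *regs)
    return zlib.crc32(data)


def save_context(regs):
    """Save a copy of the registers together with their CRC."""
    return list(regs), regfile_crc(regs)


def restore_context(saved, crc):
    """Verify the saved image before handing the registers back."""
    if regfile_crc(saved) != crc:
        raise ValueError("register file CRC mismatch on restore")
    return list(saved)


regs = [i * 0x0101010101010101 for i in range(256)]   # arbitrary 64-bit contents
saved, crc = save_context(regs)
print(restore_context(saved, crc) == regs)   # True
```

A single flipped bit in the saved image makes the CRC comparison fail, so a corrupted context is caught at restore time instead of silently resuming with bad register values.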