Thomas Gregoire wrote:
First time posting here, though I have perused quite a few threads on
this mailing list over the last few weeks!
I am having some fun teaching myself CPU design (the good old "what I
cannot create, I do not understand"!). For now, I am toying around with
a Xilinx FPGA board. I've implemented a small in-order, pipelined RV32IM
core (I know, baby steps!).
I now would like to understand OoO execution better--so I started
looking at Tomasulo's algorithm. While researching the topic, I stumbled
upon a bunch of discussions here regarding CDC 6600-style scoreboards,
and how they might provide an alternative path towards building an OoO
core with multiple-issue, precise exceptions, etc. Most lecture notes/textbooks I have found don't seem to have anything on the topic.
I have found the following references so far:
- "Design of a Computer: The Control Data 6600" by Thornton. The book is quite fascinating, though it's taking me a while to get used to the notation.
- The source code of Libre SoC. I am rather unfamiliar with nmigen
but even then, it looks quite readable.
- A number of threads on this mailing list as well as internal discussions on the Libre SoC website/mailing list.
Quite a few of these make direct references to book chapters from an unpublished book by Mitch Alsup--I was wondering if there was any way to
get a copy?
<
You could ask for it.
<
Also, are there any other resources that I missed and that you would recommend for anyone getting interested in this topic?
<
Basically, when you get down to the nitty gritty, a Scoreboard and a
<set of> reservation stations can be made to pretty much mimic each
other. The RS style uses "tags" to denote who is broadcasting results,
and the stations look at all the tags and decide which instructions
can be launched next cycle. A SB simply decodes the tags at the sending
end and ORs them into a single tag-vector.
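
To make the contrast concrete, here is a minimal Python sketch of the two
wakeup styles described above. It is only an illustration: the function
names, the set/bit-mask encodings, and the single-cycle model are all
assumptions, not anything taken from a real design.

```python
def rs_step(pending_tags, broadcast_tags):
    """RS style (assumed model): every station compares each tag it is
    still waiting on against every tag broadcast this cycle."""
    return [{t for t in tags if t not in broadcast_tags}
            for tags in pending_tags]


def decode_to_vector(broadcast_tags):
    """SB style (assumed model): decode each tag to one-hot at the
    sending end and OR the results into a single tag-vector."""
    vec = 0
    for t in broadcast_tags:
        vec |= 1 << t
    return vec


def sb_step(pending_masks, tag_vector):
    """Each waiting entry keeps a bit mask of the producers it waits on
    and clears the bits that appear in the tag-vector."""
    return [mask & ~tag_vector for mask in pending_masks]


# Three waiting instructions; producers 3 and 2 deliver results this cycle.
broadcast = [3, 2]
rs_left = rs_step([{1, 3}, {2}, {3}], broadcast)
sb_left = sb_step([0b1010, 0b0100, 0b1000], decode_to_vector(broadcast))

rs_ready = [len(tags) == 0 for tags in rs_left]
sb_ready = [mask == 0 for mask in sb_left]
```

In this toy model both styles reach the same launch decision; the
difference is whether the comparison is done on encoded tags at every
station or on decoded wires of one shared vector.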
<
Luke and I discussed this in our previous threads.
<
Reservation stations have the property that each station has to be able
to look at all the tags every cycle. A SB has to look at all the wires
of the tag-vector each cycle. The tag-vector operates at lower power
than the <set of> tag busses. The RS load grows linearly with the size
of the execution window; the SB load grows quadratically. So the SB is
better when the number of waiting events (launches) is smaller.
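
The linear-versus-quadratic point can be illustrated with a toy cost
model. Everything in it is an assumption made for illustration: it
counts tag comparators for the RS (entries x operands x result busses)
and dependency cells for the SB (entries x entries), which is only one
possible reading of "load" and ignores wire length, power and timing.

```python
def rs_compare_count(entries, busses, operands=2):
    """Assumed RS cost: one tag comparator per operand per result bus,
    so the count is linear in the number of window entries."""
    return entries * operands * busses


def sb_matrix_cells(entries):
    """Assumed SB cost: one dependency cell per (waiter, producer) pair,
    so the count is quadratic in the number of window entries."""
    return entries * entries


# With 4 result busses the two counts cross at 8 entries in this model.
small = (rs_compare_count(4, 4), sb_matrix_cells(4))
equal = (rs_compare_count(8, 4), sb_matrix_cells(8))
large = (rs_compare_count(16, 4), sb_matrix_cells(16))
```

Under these assumptions the SB count is lower for small windows and
higher for large ones, which matches the direction of the argument
above; the actual crossover point depends on the real implementation.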
<
But the SB has other properties that we don't use today because we
have the resources to do register renaming. A SB in a renamed register environment is essentially NO DIFFERENT in capabilities from a RS.
<
Right now, I prefer RS for instruction data flow and SB for memory
data flow.
<
Thomas