BGB wrote:
On 2/27/2024 7:33 PM, MitchAlsup1 wrote:
A thought::
Construct the 8-way cache from a pair of 4-way cache instances
and connect both into one 8-way with a single layer of logic
{multiplexing.}
Possible, I have decided for now to stick with 4-way...
But, even then, efforts at trying to optimize this seem to be causing
the LUT cost to increase rather than decrease...
Then you have tickled one of Verilog's insidious daemons.
How many elements in a way ?? and how many bits in an element ??
Is there a way to make a "way" into a single SRAM ?? (or part of a single
SRAM) ??
What I am getting at is that "conceptually" an n-way set associative
cache is not recognizably different from n copies of a 1/n-size direct
mapped cache coupled to a set/way selection multiplexer based on
address bit compares. {{And of course write set selection.}}
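A minimal behavioral sketch of that equivalence, in Python rather than Verilog so it can be run as-is. The class names, the round-robin victim choice, and the line-address split are all hypothetical and not taken from either poster's design; the point is only that each way is an independent direct-mapped array read at the same index, with the tag compares driving one way-select mux.

```python
class DirectMappedWay:
    """One way: a direct-mapped array, i.e. what one SRAM would hold."""

    def __init__(self, num_sets):
        self.valid = [False] * num_sets
        self.tag = [0] * num_sets
        self.data = [None] * num_sets

    def read(self, index):
        return self.valid[index], self.tag[index], self.data[index]

    def write(self, index, tag, data):
        self.valid[index] = True
        self.tag[index] = tag
        self.data[index] = data


class SetAssocCache:
    """n-way cache built from n direct-mapped ways plus a select mux."""

    def __init__(self, num_ways, num_sets):
        self.num_sets = num_sets
        self.ways = [DirectMappedWay(num_sets) for _ in range(num_ways)]
        # ASSUMPTION: round-robin replacement, chosen only for brevity.
        self.next_victim = [0] * num_sets

    def split(self, line_addr):
        # Low bits index the set, the remaining bits form the tag.
        return line_addr % self.num_sets, line_addr // self.num_sets

    def lookup(self, line_addr):
        index, tag = self.split(line_addr)
        # Every way is read at the same index (in hardware: in parallel).
        reads = [way.read(index) for way in self.ways]
        # Per-way tag compare; the hit vector is the mux select.
        hits = [valid and t == tag for valid, t, _ in reads]
        for way_num, hit in enumerate(hits):
            if hit:
                return way_num, reads[way_num][2]
        return None, None

    def fill(self, line_addr, data):
        # Write-way selection; assumes the caller only fills on a miss.
        index, tag = self.split(line_addr)
        way_num = self.next_victim[index]
        self.ways[way_num].write(index, tag, data)
        self.next_victim[index] = (way_num + 1) % len(self.ways)
        return way_num
```

In this model, going from 4-way to 8-way only lengthens the `ways` list and widens the final select, which is the "single layer of multiplexing" idea from earlier in the thread.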
My 1-wide My 66000 implementation carefully chose 3-way (or 6-way)
L1 caches because that exactly fit the number of bits in my SRAM
macro (-2 spare bits). So cramming 3 tags, 3 line-states, and 3-bit
LRU into one 128-bit SRAM word. The 24KB cache is 3-way while the
48KB cache is the 6-way. The read speed path is 1 gate longer in
6-way configuration.
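A rough sketch of the packing arithmetic, reading "(-2 spare bits)" as "two bits left over". Only the 128-bit word, the 3 ways, and the 3-bit LRU field come from the post. The field order and the split of each way's share into tag bits versus line-state bits are assumptions made up for illustration (3 state bits is a guess, not My 66000's actual encoding).

```python
WORD_BITS = 128   # from the post: one SRAM word
WAYS = 3          # from the post
LRU_BITS = 3      # from the post
SPARE_BITS = 2    # from the post, read as "2 bits unused"

# Bits left for each way's tag + line state: (128 - 2 - 3) / 3 = 41
PER_WAY_BITS = (WORD_BITS - SPARE_BITS - LRU_BITS) // WAYS

STATE_BITS = 3                        # ASSUMPTION: not stated in the post
TAG_BITS = PER_WAY_BITS - STATE_BITS  # follows from the assumption above


def pack(tags, states, lru):
    """Pack 3 (tag, state) pairs and the LRU field into one integer word.

    ASSUMPTION: LRU sits in the low bits, then way 0, way 1, way 2.
    """
    word = lru & ((1 << LRU_BITS) - 1)
    shift = LRU_BITS
    for tag, state in zip(tags, states):
        entry = ((tag & ((1 << TAG_BITS) - 1)) << STATE_BITS) \
            | (state & ((1 << STATE_BITS) - 1))
        word |= entry << shift
        shift += PER_WAY_BITS
    return word


def unpack(word):
    """Inverse of pack(): returns (tags, states, lru)."""
    lru = word & ((1 << LRU_BITS) - 1)
    tags, states = [], []
    shift = LRU_BITS
    for _ in range(WAYS):
        entry = (word >> shift) & ((1 << PER_WAY_BITS) - 1)
        states.append(entry & ((1 << STATE_BITS) - 1))
        tags.append(entry >> STATE_BITS)
        shift += PER_WAY_BITS
    return tags, states, lru
```

Under these assumptions a single read of the word delivers all three tags, all three line states, and the LRU bits at once, so the way compare needs no second array access.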
Seemingly, Vivado's response to all this is to turn it almost
entirely into LUT3 instances (with a small number of LUT6s here and there).
It seems to me it is failing to see the SRAM and has just built it out of flip-flops.
Looking at the LUT3's, there seem to be various truth-tables in use.
But, off-hand, the patterns aren't super obvious.
A few common ones seem to be:
( I0 & I1) | (!I1 & I2)
( I0 & I1) | I2
(!I0 & I1) | I2
...
These appear to be standard decoding pattern recognizers, to me (although
the top one looks like a binary multiplexer).
The first one seems to account for a strong majority of them, though. I think
it is a 1-bit MUX using I1 to select the other bit (I0 when I1 is set, I2 otherwise).
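A quick way to check the MUX reading is to enumerate all eight input combinations. The sketch below does that for the three listed patterns and also builds an 8-bit table with bit index {I2,I1,I0}; as far as I know that matches the usual Xilinx LUT INIT ordering, but treat the ordering as an assumption. The helper names are made up for this example.

```python
def lut3_init(fn):
    """Build an 8-bit truth table for fn(i0, i1, i2).

    ASSUMPTION: bit number (i2 << 2 | i1 << 1 | i0) holds the output
    for that input combination.
    """
    init = 0
    for idx in range(8):
        i0, i1, i2 = idx & 1, (idx >> 1) & 1, (idx >> 2) & 1
        if fn(i0, i1, i2):
            init |= 1 << idx
    return init


def pattern_a(i0, i1, i2):
    # ( I0 & I1) | (!I1 & I2)
    return (i0 & i1) | ((1 - i1) & i2)


def pattern_b(i0, i1, i2):
    # ( I0 & I1) | I2
    return (i0 & i1) | i2


def pattern_c(i0, i1, i2):
    # (!I0 & I1) | I2
    return ((1 - i0) & i1) | i2


def mux_on_i1(i0, i1, i2):
    # 2:1 mux: I1 is the select, picking I0 when set and I2 when clear.
    return i0 if i1 else i2


for name, fn in [("a", pattern_a), ("b", pattern_b), ("c", pattern_c)]:
    print(name, "0x{:02X}".format(lut3_init(fn)))

print("a is a mux on I1:", lut3_init(pattern_a) == lut3_init(mux_on_i1))
```

With that bit ordering the three tables come out as 0xB8, 0xF8 and 0xF4, and the first matches the 2:1 mux exactly. That fits a way-select or write-enable mux built from flip-flops rather than an inferred RAM.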
....