On 1/6/2024 10:36 AM, Quadibloc wrote:
> Given that I do not know a whole lot about how cache
> coherency is done, and Mitch asked me what approach
> I was planning to take...
>
> I went on a web search to find more information on
> the subject.
>
> I learned that MSI went to MESI... and then there were
> a bunch of "ownership" schemes, such as Berkeley,
> Illinois, Firefly, and Dragon.
>
> By 1999, AMD seems to have done something in that area
> with MOESI, and later on Intel came up with MESIF instead,
> where "F", for Forwarding, is _like_ owned data, but it
> is also saved to RAM. Engineers at Intel recently also
> wrote papers on "MOESI Prime", which has primed versions
> of two of the MOESI states to avoid the cache coherency
> mechanism causing RowHammer-like behavior.
>
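For readers who, like the OP, haven't dealt with these protocols before, here is a minimal sketch of a textbook MESI state machine for one cache line on a snooping bus. It is a generic, simplified illustration (no separate bus-upgrade transaction, no timing or races) and not any vendor's actual implementation; the names `State`, `Event`, and `mesi_next` are made up for the example.

```python
from enum import Enum

class State(Enum):
    """States of one cache line under textbook MESI."""
    M = "Modified"   # dirty, this cache holds the only copy
    E = "Exclusive"  # clean, this cache holds the only copy
    S = "Shared"     # clean, other caches may also hold copies
    I = "Invalid"

class Event(Enum):
    LOCAL_READ = 1   # this core loads from the line
    LOCAL_WRITE = 2  # this core stores to the line
    SNOOP_READ = 3   # another cache reads the line (BusRd)
    SNOOP_RFO = 4    # another cache reads it for ownership, i.e. to write (BusRdX)

def mesi_next(state, event, others_have_copy=False):
    """Return (next_state, must_write_back) for one line in one cache.

    others_have_copy only matters on a local read miss, where it picks
    Shared vs Exclusive. must_write_back is True when a Modified line is
    snooped and its dirty data has to go back to RAM (or to the requester).
    """
    if event is Event.LOCAL_READ:
        if state is State.I:
            return (State.S if others_have_copy else State.E), False
        return state, False                      # read hit: no change
    if event is Event.LOCAL_WRITE:
        # From I or S the cache must first invalidate the other copies;
        # E -> M needs no bus traffic at all (the point of adding E to MSI).
        return State.M, False
    if event is Event.SNOOP_READ:
        if state is State.I:
            return State.I, False
        return State.S, state is State.M         # M supplies/writes back its data
    if event is Event.SNOOP_RFO:
        return State.I, state is State.M         # someone else is about to write
    raise ValueError(event)
```

MOESI adds an Owned state so that a snooped read of a Modified line can be served cache-to-cache without writing it back to RAM first (M goes to O rather than S), while MESIF's Forward state designates one clean sharer to answer requests.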
I still haven't bothered with this...
Though, yeah, not having cache coherence between cores does make for an
ugly situation in that conventional threading doesn't work if one
schedules multiple threads for the same process on different CPU cores
(and cases where memory sharing is being used may require manual cache
flushing or eviction).
So, proper cache-coherence is still to-do. Need to come up with
something "hopefully cheap" though.
That, or maybe try to convince people to do multithreaded programming
without the assistance of conventional cache coherence (... yeah ...).
Though, at least with direct-mapped caches, it is possible to use dummy
buffers and pointer trickery to knock stuff out of the cache. So, say,
one can write algorithms such that each access to a shared memory object
alternates with an access to a dummy address that maps to the same cache
line (with the accesses arranged so that cores will knock dirty lines
out to RAM, discard any stale values, and then retrieve the "up to date"
values from RAM).
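A minimal simulation of the trick, assuming two cores with write-back, direct-mapped caches that never snoop each other. The geometry (16-byte lines, 4 lines per cache), the one-value-per-line data model, and the names `DirectMappedCache`, `alias_of`, and `flush_by_alias` are all hypothetical. The key point is that `addr` and `addr + CACHE_SIZE` index the same line, so touching the alias forces the old occupant out.

```python
LINE = 16                  # bytes per line (assumed)
NLINES = 4                 # lines per cache (assumed; tiny to keep it short)
CACHE_SIZE = LINE * NLINES

class DirectMappedCache:
    """Write-back, write-allocate, direct-mapped cache with NO coherence.
    Each line is modeled as holding a single value."""
    def __init__(self, ram):
        self.ram = ram                   # shared dict: line base address -> value
        self.lines = [None] * NLINES     # each entry: [tag, value, dirty]

    def _fill(self, addr):
        base = addr - addr % LINE
        idx = (addr // LINE) % NLINES
        tag = addr // CACHE_SIZE
        line = self.lines[idx]
        if line is not None and line[0] == tag:
            return line                  # hit
        if line is not None and line[2]:
            # Dirty victim: write it back to RAM before replacing it.
            self.ram[line[0] * CACHE_SIZE + idx * LINE] = line[1]
        self.lines[idx] = [tag, self.ram.get(base, 0), False]
        return self.lines[idx]

    def read(self, addr):
        return self._fill(addr)[1]

    def write(self, addr, value):
        line = self._fill(addr)
        line[1] = value
        line[2] = True

def alias_of(addr):
    """A dummy address that lands in the same direct-mapped line as addr."""
    return addr + CACHE_SIZE

def flush_by_alias(cache, addr):
    """Knock addr's line out of the cache (written back first if dirty)."""
    cache.read(alias_of(addr))

ram = {}
core_a, core_b = DirectMappedCache(ram), DirectMappedCache(ram)
X = 0x40                    # hypothetical shared variable
core_b.read(X)              # B caches the old value
core_a.write(X, 42)         # A's write sits only in A's cache
stale = core_b.read(X)      # B still sees the old value: no coherence
flush_by_alias(core_a, X)   # A: dirty line goes out to RAM
flush_by_alias(core_b, X)   # B: stale copy is discarded
fresh = core_b.read(X)      # B refetches the new value from RAM
```

Here the ordering is just the order of the statements; real code would still need some other way of knowing that A's write-back has landed before B refetches.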
In practice this is questionable though, as it does not carry over
cleanly to associative caches (it would require multiple sets of
accesses to various addresses to deal with multi-level eviction, say, to
get things out of the L2 cache and into DRAM, and/or convoluted access
patterns to evict things from a 2-way cache, ...).
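To illustrate why one dummy access stops being enough, here is a sketch of a set-associative cache with true LRU replacement per set. True LRU is a simplifying assumption (real hardware often uses pseudo-LRU or random replacement), and `LRUSetAssocCache` and `eviction_set` are hypothetical names.

```python
from collections import OrderedDict

class LRUSetAssocCache:
    """Set-associative cache, true LRU within each set (assumed policy)."""
    def __init__(self, line, sets, ways):
        self.line, self.sets, self.ways = line, sets, ways
        self.way_size = line * sets      # stride between addresses sharing a set
        self.data = [OrderedDict() for _ in range(sets)]  # tag -> None, oldest first

    def access(self, addr):
        idx = (addr // self.line) % self.sets
        tag = addr // self.way_size
        s = self.data[idx]
        if tag in s:
            s.move_to_end(tag)           # hit: mark most recently used
            return
        if len(s) == self.ways:
            s.popitem(last=False)        # miss in a full set: evict the LRU line
        s[tag] = None

    def holds(self, addr):
        idx = (addr // self.line) % self.sets
        return (addr // self.way_size) in self.data[idx]

def eviction_set(addr, way_size, ways):
    """`ways` distinct dummy addresses that share addr's set."""
    return [addr + k * way_size for k in range(1, ways + 1)]

X = 0x40

# The direct-mapped trick (a single alias) is not enough for a 2-way set:
c = LRUSetAssocCache(line=16, sets=4, ways=2)
c.access(X)
c.access(X + c.way_size)
one_alias_evicts = not c.holds(X)        # False

# Touching `ways` distinct aliases does evict it (under LRU):
c = LRUSetAssocCache(line=16, sets=4, ways=2)
c.access(X)
for a in eviction_set(X, c.way_size, c.ways):
    c.access(a)
all_aliases_evict = not c.holds(X)       # True
```

For multi-level eviction one would roughly take the stride from the larger cache's way size (with power-of-two geometries that stride is also a multiple of the L1 way size, so the same addresses collide in L1 too) and the alias count from the larger associativity; with pseudo-LRU or random replacement, even that only evicts with high probability rather than with certainty.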
Then again, maybe one can argue that by the time one is using
associative caches, one can probably justify having proper cache
coherence?...
Well, there is this, and there is also accessing memory through a
"no-cache" address (which has an auto-evict mechanism); but I have
observed that this mechanism is, seemingly, somehow slower than just
going through MMIO (or using sets of alternating memory addresses to
knock things out of the various cache levels).
> Anyways... there was something else I found while looking
> this stuff up.
>
> I had noted that one of the reasons for offering the
> programmer a choice of writing programs with 32-bit
> long instructions and nothing but 32-bit long instructions,
> or using block headers for blocks of 256 bits in code,
> was to allow instructions to be decoded in parallel.
>
Yeah, this is part of why only 32-bit encodings ended up being allowed
in bundles...
Allowing variable-length instructions in bundles would increase the
number of decoders required (and require more complicated/expensive
logic to MUX the outputs of those decoders).
> Mitch pointed out that one could just start decoding
> in parallel at every possible instruction start location,
> while also, in parallel, quickly resolving instruction
> lengths so as to find which decodes result in executions.
>
> I acknowledged that one could certainly do that, but
> since it was somewhat wasteful of heat and electricity,
> I didn't think of this as describing a _typical_
> implementation of my ISA (and hence parallel decoding
> was still an excuse for having a block structure rather
> than conventional CISC-like variable-length instructions).
>
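A rough software model of the approach Mitch describes, assuming 16-bit instruction parcels and a simplified RISC-V-style length rule (low two bits `11` means a 32-bit instruction, anything else 16-bit; longer formats ignored). The names `insn_length`, `speculative_decode`, and `decode_window` are illustrative, the "decoder" is only a stub, and this is not a description of how the Opteron front end was actually built.

```python
def insn_length(parcel):
    """Length in 16-bit parcels (assumed rule): low bits 0b11 -> 32-bit."""
    return 2 if (parcel & 0b11) == 0b11 else 1

def speculative_decode(parcels, i):
    """Stand-in for a full decoder: just grabs the raw parcels at position i
    (an instruction may straddle the end of the window; the slice truncates)."""
    return tuple(parcels[i:i + insn_length(parcels[i])])

def decode_window(parcels):
    # Phase 1 (all in parallel in hardware): a length decoder and a full
    # decoder at EVERY 16-bit position, whether or not an instruction starts there.
    lengths = [insn_length(p) for p in parcels]
    decodes = [speculative_decode(parcels, i) for i in range(len(parcels))]
    # Phase 2: find the real instruction starts by chaining lengths from 0.
    # Hardware would use a log-depth prefix network; a loop shows the result.
    starts, i = [], 0
    while i < len(parcels):
        starts.append(i)
        i += lengths[i]
    # Phase 3: keep only decodes at real starts; the rest was wasted work.
    wasted = len(parcels) - len(starts)
    return starts, [decodes[s] for s in starts], wasted

starts, insns, wasted = decode_window([0x0001, 0x0013, 0x1234, 0x0002, 0x0103, 0xABCD])
```

For a 128-bit fetch window this means 8 speculative decoders at 16-bit granularity versus 4 for fixed 32-bit instructions, and every decoder whose output is discarded in phase 3 has still burned power.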
In my case, the same basic logic was overloaded for both bundles and
64/96-bit instructions. As far as decoding is concerned, the jumbo
prefixes are instructions (just with some horizontal decoding magic
glued on).
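A sketch of the general idea of treating prefixes as ordinary per-lane instructions plus a "horizontal" combining step. The encoding here (a `0xFE` top byte marking a prefix, 24-bit extension fields, a 16-bit base immediate) is entirely hypothetical and is not the author's actual instruction format; `lane_decode` and `decode_bundle` are made-up names. With one prefix the result is a 64-bit instruction, with two a 96-bit one.

```python
JUMBO = 0xFE   # hypothetical top byte marking a jumbo prefix

def lane_decode(word):
    """Per-lane decode of one 32-bit word; every lane uses the same logic."""
    op = word >> 24
    if op == JUMBO:
        return {"prefix": True, "ext": word & 0xFFFFFF}      # 24 extension bits
    return {"prefix": False, "op": op, "imm": word & 0xFFFF}  # 16-bit base immediate

def decode_bundle(words):
    lanes = [lane_decode(w) for w in words]   # all lanes decoded in parallel
    out, pending = [], []                     # ext fields from preceding prefixes
    for lane in lanes:
        if lane["prefix"]:
            pending.append(lane["ext"])
            continue
        # Horizontal step: the nearest prefix supplies the next-higher bits.
        imm, shift = lane["imm"], 16
        for ext in reversed(pending):
            imm |= ext << shift
            shift += 24
        out.append((lane["op"], imm, 1 + len(pending)))  # (op, immediate, size in words)
        pending = []
    return out   # a prefix left dangling at the end of the bundle is ignored here

res = decode_bundle([0x01001234, 0xFE000001, 0x02005678,
                     0xFEABCDEF, 0xFE123456, 0x03009ABC])
```

In hardware the combining step would be fixed wiring between adjacent lanes rather than a loop; the loop only models the result.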
> Well, one of my search results showed that this was how
> they did it on the first 64-bit Opterons, from AMD, so
> that explains why this technique came so readily to
> Mitch's mind!
>
But, not necessarily cheap.
> John Savard