I was about to say that I don't think that is too bad, as you could
always use an extra channel for the metadata - or, more likely, rotate
the metadata through several channels, as in a RAID configuration.
Which - DRAM RAID - by the way, is already available. So in some ways
this is already being done.
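A minimal sketch of what "rotating the metadata through several channels" could look like, RAID-5 style. The channel count, the stripe numbering, and the simple modulo rotation are all illustrative assumptions, not a description of any shipping DRAM RAID.

```python
def metadata_channel(stripe, n_channels):
    """Channel holding the metadata for this stripe (assumed modulo rotation)."""
    return stripe % n_channels


def data_channels(stripe, n_channels):
    """Remaining channels, which hold the data for this stripe."""
    meta = metadata_channel(stripe, n_channels)
    return [c for c in range(n_channels) if c != meta]


# With 5 hypothetical channels, the metadata moves one channel per stripe,
# so no single channel becomes the metadata hot spot.
for stripe in range(5):
    print(stripe, metadata_channel(stripe, 5), data_channels(stripe, 5))
```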
But I hold back from saying that this isn't too bad, because it
rounds up the cache line traffic.
A workload of random word sized accesses, my favorite stalking horse,
would not only be rounded up from a word to a cache line - it would be
rounded up to two cache lines.
Writing would be worse, however: a workload of single word writes would
be rounded up from 1 word write to 2 cacheline reads & 2 cacheline
writes, since the net effect of RAID techniques is to convert partial
writes into 2-line read-modify-writes.
However, so would any of the various flavors of Poor Man's ECC.
(Well, except the lengthwise ... hmm... this suggests that keeping the
"ECC block", which is probably the cache coherency block size, equal to
half the burst bus transfer size, might tip the balance in our favor,
since (1) for sequential access patterns it would be the same, but (2)
for random word sized access patterns half the read accesses would only
require 1 burst transfer rather than 2, and ditto for writes: half would
need 1 rmw rather than 2.
=> Average cost for random is then 1.5r for reads and 1.5rmw for
writes. And 1.5rmw for random GUPS like patterns.
=> Whereas for dense patterns the overhead is just the size of the metadata,
with compression often making it unnecessary to fetch extra metadata.)
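The 1.5 figure is just an expected value, which can be sketched as below. The function name and parameters are hypothetical; the one-half fraction is the one stated above for the case where the ECC block is half the burst transfer size, and a fraction of zero corresponds to metadata that always lives in a different line.

```python
def expected_bursts(same_burst_fraction, near_cost=1, far_cost=2):
    """Expected burst transfers per random word access.

    same_burst_fraction: share of accesses whose metadata arrives in the
    same burst as the data (assumed 0.5 when the ECC block is half a burst).
    """
    return same_burst_fraction * near_cost + (1 - same_burst_fraction) * far_cost


# Metadata always in a separate line: every random word read costs 2 bursts.
print(expected_bursts(0.0))
# ECC block equal to half the burst: half the accesses need only 1 burst.
print(expected_bursts(0.5))
```

The same arithmetic applies to writes if each burst is counted as one read-modify-write instead of one read.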
> The use of an additional channel for metadata would
> only differ from "Poor Man's ECC" by using a
> hardwired location for the metadata in a separate
> channel (with the possibility to optimize that
> channel's width, burst length, etc. for its use to
> store metadata).
When David opened my eyes to the barriers to getting ubiquitous width
oriented ECC (e.g. 64+8),
* I first considered trying to persuade JEDEC to allow burst lengths of
5, or 9 - keeping the x8 or x16 interface, but allowing vendors who
cared to provide "ECC DRAM chips" that kept the same pinout, but which
optionally used 1 cycle longer bursts.
That was shot down, for reasons including the fact that DRAM
vendors already have spare DRAM bitcell arrays on chip, which they use
via things like laser editing to improve yield. (I must admit that I
never liked this argument. I countered that once the chip is sold there
is no more laser editing, so why not use the extra but unused
arrays for ECC, if there are enough of them? Although I do not like
that this would move towards ECC continuing to be a premium product.
Still, premium )
* Then I tried address scaling, the classic Poor Man's ECC, which keeps
the metadata adjacent.
* Other proposals have involved having the metadata non-adjacent, at a
fixed address.
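To make the last two options concrete, here is a rough sketch of the two address mappings, in units of cache lines. The 8:1 ratio (one metadata line per 8 data lines, by analogy with 64+8), the function names, and the `meta_base` parameter are assumptions for illustration only.

```python
LINES_PER_META = 8  # assumed: one metadata line covers 8 data lines


def adjacent_map(data_line):
    """Address scaling: each group of 8 data lines is followed by its metadata line.

    Returns (physical data line, physical metadata line).
    """
    group, offset = divmod(data_line, LINES_PER_META)
    base = group * (LINES_PER_META + 1)  # every group occupies 9 physical lines
    return base + offset, base + LINES_PER_META


def fixed_map(data_line, meta_base):
    """Non-adjacent: data addresses are unscaled; metadata sits in a
    separate region starting at the (hypothetical) line meta_base.
    """
    return data_line, meta_base + data_line // LINES_PER_META


print(adjacent_map(8))      # data and metadata land in the same 9-line group
print(fixed_map(8, 1024))   # data stays put; metadata is far away
```

With the adjacent mapping, a sequential stream always sweeps over the metadata lines. With the fixed mapping, the metadata fetch is a separate access that can be skipped whenever it is not needed.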
In-cache-line-at-fixed-address compression can avoid the need to fetch
many lines of metadata. So non-adjacent may be the overall winner,
since for sequential patterns that compress there is no bandwidth loss
[*], whereas with adjacent metadata
*: no bandwidth loss compared to uncompressed. I don't think that it is
reasonable to compare to compressed, since compressed main memory is
awkward. IBM's approach, of making compressed main memory really an
extra level in the hierarchy, is probably the way to go.
>
> If enough users of memory considered providing extra
> bits for metadata sufficiently important, I suspect
> providing such would not be too difficult or
> expensive. I suspect that a large part of the
> problem is that not enough users care enough.
Yep.
Many of these proposals are transient, evolutionary:
If enough people care about ECC, then DRAM systems will support ECC
properly - and then kluges like Poor Man's ECC - or the variety that
Nvidia is apparently deploying - may not be necessary.
But the kluges seem to help during the transition period, demonstrating
that there is a market.
It's not enough to have a clear vision of what the final architecture
should be. One also needs to have an idea of how to get there from here
- via steps, each of which can be sold.
> Yes, thus "cache". Of course, if the metadata is
> hint-oriented, then its loss is less critical. As
> has been noted, some security features can be
> provided with hint-like semantics. (Security metadata
> that relied on compressibility would be more vulnerable
> to attacks. While lack of compressibility might be
> used as a clue that an attack is being attempted--and
> lack of security metadata could be recognized as less
> secure--, such explicitly vulnerable security metadata
> makes me a bit uncomfortable, even though only presented
> as a hint [in part because the degree of added security
> might be greatly overestimated by the user].)
Yes, this makes me uncomfortable.
I say "security is a hint" only if I want to run a security enabled
binary on a system that has no security support at all.
On a system that has security support, I want to get the same answer all
the time. If there are data patterns that can lead to breakins, the
hackers will find them.
I am comfortable with putting performance information - e.g. branch
prediction info, typical physical memory addresses for prefetch AT THE
MEMORY CONTROLLER - in such transient space, that might be eliminated by
recompression with different data,