Ryzen 7000 Described, but not detailed


Quadibloc

May 23, 2022, 4:23:21 AM
Lisa Su's Computex keynote noted some facts about the Ryzen 7000
series in general, but she didn't give the details on each SKU in the
forthcoming lineup.

The 16-core version has, apparently, a turbo speed of around 5.5 GHz,
as shown in a game demo.

15% single-thread performance uplift was noted. The L2 cache was
doubled, however, and this was enough to give 15% extra performance
in the previous-generation X3D. One would have expected that the
higher frequency, and some IPC improvements in the architecture, would
lead to a bigger uplift for these chips - 15% from just the bigger cache,
plus a few percent more, at least, from all these other things.

John Savard

Anton Ertl

May 23, 2022, 5:38:04 AM
Quadibloc <jsa...@ecn.ab.ca> writes:
>Lisa Su's Computex keynote noted some facts about the Ryzen 7000
>series in general, but she didn't give the details on each SKU in the
>forthcoming lineup.
>
>The 16-core version has, apparently, a turbo speed of around 5.5 GHz,
>as shown in a game demo.
>
>15% single-thread performance uplift was noted. The L2 cache was
>doubled, however, and this was enough to give 15% extra performance
>in the previous-generation X3D.

The L3 cache was tripled to 96MB in the 5800X3D. But yes, between the
clock rate increase and the IPC increase from the larger L2, I expect
little additional IPC increase.

But increasing the clock to 5.5GHz is quite a feat. I wonder how they
did that? Is the new process so good, or have they managed to reduce
the gate delays per pipeline stage without reducing IPC?

What she apparently did not mention is Spectre fixes:-(. In a few
days it will be five years since Intel and AMD learned about Spectre.
It seems that they don't want to fix it. However, if Zen4 reuses most of
Zen3's microarchitecture, they still have an excuse.

She apparently also did not mention AVX512, which many expected for
Zen4.

Competitively, Intel is expected to increase L2 with their next
generation (Raptor Lake IIRC), and leave the microarchitecture
otherwise mostly unchanged. And you can always expect a clock
increase from Intel.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Torbjorn Lindgren

May 23, 2022, 2:44:38 PM
Quadibloc <jsa...@ecn.ab.ca> wrote:
>Lisa Su's Computex keynote noted some facts about the Ryzen 7000
>series in general, but she didn't give the details on each SKU in the
>forthcoming lineup.
>
>The 16-core version has, apparently, a turbo speed of around 5.5 GHz,
>as shown in a game demo.

No, what they showed is that it's POSSIBLE to take ONE Zen4 CPU up to
at least 5.5GHz. The slides only say at least one SKU will hit "5GHz+"
single-core boost speed; usually one would expect this to also be the
high-core-count variant, but that's not guaranteed either.

I don't think ANY details on what they were showing was given, if
that's true it could be the best chip they've produced so far
(ultimate golden sample) running on liquid helium (because not even
LN2 was enough). Or it could be a random high-bin sample running on
big air-tower or largish AIO. I do hope it's closer to the latter.

The 5950X already has a boost clock of up to 4.9GHz, so the stated 5+
GHz means there MAY be only a very small frequency increase compared
to Zen 3.

Basically we just don't know yet. "This fall" suggest AMD probably
know by now but...

The apparently hard DDR5 requirement could end up being a big
gift-wrapped package to Intel in the build-your-own segment since
IIRC Intel will AFAIK support DDR4 up to at least 13th gen (Raptor
Lake).

This depends on how memory prices develop over the next few months,
I'm told the price-difference isn't bad for big OEM's (Dell, HP, ...)
but it's still 50%++ premium on various online stores for various
sizes and worse yet the common 2x8GB DDR4 config (or 1x8) is a bad
idea on DDR5 due to only 16Gbps DDR5 chips being available - "good"
8GB sticks need 8Gbps memory chips which exists for DDR4 but not for
DDR5!


>15% single-thread performance uplift was noted. The L2 cache was
>doubled, however, and this was enough to give 15% extra performance
>in the previous-generation X3D. One would have expected that the
>higher frequency, and some IPC improvements in the architecture, would
>lead to a bigger uplift for these chips - 15% from just the bigger cache,
>plus a few percent more, at least, from all these other things.

The 5800X3D tripled the size of the shared *L3* cache for 15% IPC
improvement, while Zen 4 doubles the per-core *L2* cache. So, a
different increase and at a very different point in the cache structure.

So the 5950X has 8MB L2 (16x512KB) and 64MB L3, and a hypothetical
"7950X" would have 16MB L2 (16x1MB) and ?? L3, but probably 64MB L3
since they mention neither an increase nor 3D. They did later assure
some reporters that 3D cache hasn't been dropped.

As mentioned we don't even know what frequencies it'll hit, just the
5GHz+. 5.0GHz would be 2% increase, 5.5GHz would be 12% increase.
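
As a quick sanity check of those percentages against the 5950X's
4.9GHz boost (a minimal Python sketch of the arithmetic above; nothing
in it is a confirmed spec):

# Boost-clock uplift relative to the 5950X's 4.9 GHz single-core boost.
zen3_boost = 4.9  # GHz
for zen4_boost in (5.0, 5.5):  # the "5GHz+" floor and the demoed peak
    uplift = (zen4_boost / zen3_boost - 1) * 100
    print(f"{zen4_boost:.1f} GHz vs {zen3_boost} GHz: {uplift:.0f}% higher")
# Prints roughly 2% and 12%, matching the figures above.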

Given that AMD seems to imply a total improvement of 15% despite
doubling the per-core L2 cache and presumably core improvements I'm
going to guess that it won't increase speed much over the 5950X's
4.9GHz.

AMD has earlier promised that Zen4 would be "overclocking friendly" -
perhaps the combination implies that PBO3(?) and an increased power
target will allow higher frequencies ("single-click overclock"), that
the 15% improvement is without overclocking, and that the "stock"
boost is somewhere in the 5.0-5.2GHz range.

Quadibloc

May 23, 2022, 3:32:58 PM
On Monday, May 23, 2022 at 12:44:38 PM UTC-6, Torbjorn Lindgren wrote:

> The apparently hard DDR5 requirement could end up being a big
> gift-wrapped package to Intel in the build-your-own segment since
> IIRC Intel will AFAIK support DDR4 up to at least 13th gen (Raptor
> Lake).

True, but previous news reports said that Intel was telling motherboard
makers to "emphasize" DDR5 over DDR4 quite strongly, so it seems like
it doesn't want to accept the gift.

John Savard

MitchAlsup

May 23, 2022, 4:13:02 PM
Perhaps driven by DRAM manufacturers?
>
> John Savard

Torbjorn Lindgren

May 23, 2022, 5:19:43 PM
I suspect that's an old statement that made sense THEN, long before
the motherboards and CPUs came out - at that point everyone expected
DDR5 to only have a small premium. And at that point the unmatchable
bandwidth that DDR5 do give you was considered an Intels *advantage*
that AMD wouldn't be able to match for ~1 year! So they were trying to
lean into it.

It takes time to change course so I don't consider it surprising that
the initial set of motherboards only contains low- and (lower) mid-range
DDR4 motherboards, with the high-end reserved for DDR5.

And to be honest, to some extent it still largely makes sense: the
ultra-expensive $1000+ motherboards are really mostly for extreme
overclocking (XO), where both money and sense were discarded long
before the question of memory cost came up. It does still hurt Intel
somewhat in those segments; it would have been useful for XO users to
have access to a low-latency memory platform too, which at this point
is still highly tuned DDR4.

But the number of motherboards sold to XO users is fairly small, and
the remainder honestly seems to end up with people working on the
"more expensive=better" and "higher number=better" rules.

Where it has the potential to hurt demand is with the smallish (but
still way, WAY larger than XO) "workstation" class of products in the
upper mid-range, which generally want things like integrated TB4 and
10Gbps networking and which also end up DDR5-only (for now).

But most of that market goes via SA's or big OEMs, and as I
mentioned, all reports are that for *those* the price-delta isn't very
large!

Also, workstation class machines often live by the amount of memory
that can be stuffed into them, and a DDR4/4-slot/unregistered setup
is, as it looks right now, never going to exceed 128GB, but there's a
good chance the DDR5 variants will later be able to upgrade to 256GB!
So while it's a bummer, it also has POSSIBLE future upsides.

So, it's a problem but mainly for a subset of a subset, or subset of
subset of subset.

Until I considered the amount-of-memory angle I was a bit surprised we
haven't seen any DDR4 Z690/10G/TB4 motherboard (yet), but I can easily
see them deciding it's too small a market and/or deciding it can wait
until the 13th gen motherboards come out.

AFAIK those are likely coming out about the same time as the AM5/Ryzen
7xxx/DDR5-only platform; until then they may well be fine continuing
to sell X570/10G/TB4 motherboards (for the 5950X), no skin off their
nose that it's AMD instead of Intel.

It also depends on where they think DDR4 prices are going to go in
6-12 months' time; there's a lot of lead-time in creating new
motherboards.

Anton Ertl

May 24, 2022, 2:57:46 AM
Torbjorn Lindgren <t...@none.invalid> writes:
>I don't think ANY details on what they were showing was given, if
>that's true it could be the best chip they've produced so far
>(ultimate golden sample) running on liquid helium (because not even
>LN2 was enough).

A quick search showed 6362MHz all-core for a 5950X with LN2
overclocking, so it's unlikely that they have to go to these lengths
for 5500Mhz on the next generation.

>Basically we just don't know yet. "This fall" suggest AMD probably
>know by now but...

In particular the announcements of multiple mainboards suggest that
the whole ecosystem is pretty far along.

>The apparently hard DDR5 requirement could end up being a big
>gift-wrapped package to Intel in the build-your-own segment since
>IIRC Intel will AFAIK support DDR4 up to at least 13th gen (Raptor
>Lake).
>
>This depends on how memory prices develop over the next few months,
>I'm told the price-difference isn't bad for big OEM's (Dell, HP, ...)
>but it's still 50%++ premium on various online stores for various
>sizes

Currently more like a factor of 2 here: The cheapest available DDR4 is
2x16GB DDR4-2666 at EUR 96.90, the cheapest available DDR5 is 16GB
DDR5-4800 at EUR 80.90, but that's only at one dealer; otherwise it
starts at EUR 190.59 for 2x16GB DDR5-4800.
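
Worked out per gigabyte (a small Python sketch using only the prices
quoted above):

# EUR per GB for the cheapest listed 2x16GB kits.
ddr4_per_gb = 96.90 / 32    # 2x16GB DDR4-2666
ddr5_per_gb = 190.59 / 32   # 2x16GB DDR5-4800 (ignoring the one-dealer 16GB offer)
print(f"DDR4: {ddr4_per_gb:.2f} EUR/GB, DDR5: {ddr5_per_gb:.2f} EUR/GB, "
      f"ratio {ddr5_per_gb / ddr4_per_gb:.2f}x")
# Roughly 3.03 vs 5.96 EUR/GB, i.e. close to a factor of 2.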

>and worse yet the common 2x8GB DDR4 config (or 1x8) is a bad
>idea on DDR5 due to only 16Gbps DDR5 chips being available - "good"
>8GB sticks need 8Gbps memory chips which exists for DDR4 but not for
>DDR5!

So buy 1x16GB instead of 2x8GB. Same capacity, similar bandwidth,
same number of channels.

But I see that 8GB DIMMs for DDR5 exist, e.g., Crucial CT8G48C40U5
(EUR 59.23).

Anyway, it looks like AM5 will be a high-end platform as long as DDR5
prices stay high, but you can still buy AM4 stuff; AMD may have to
reduce the price over time, though.

>As mentioned we don't even know what frequencies it'll hit, just the
>5GHz+. 5.0GHz would be 2% increase, 5.5GHz would be 12% increase.
>
>Given that AMD seems to imply a total improvement of 15% despite
>doubling the per-core L2 cache and presumably core improvements I'm
>going to guess that it won't increase speed much over the 5950X's
>4.9GHz.

Either that, or the core improvements are mainly for clock speed
improvements (plus L2 increase), and they had their hands full with
porting to the new process (AMD's tick-tock approach) and all the
stuff they had to do on the I/O die (DDR5, PCIe5, faster USB,
graphics, display port).

- anton
--

Quadibloc

May 24, 2022, 4:42:55 AM
On Monday, May 23, 2022 at 3:38:04 AM UTC-6, Anton Ertl wrote:

> She apparently also did not mention AVX512, which many expected for
> Zen4.

That's disappointing, but then Intel isn't offering AVX-512 in their
consumer chips either.

However, I _did_ see *one* thing that was exciting in what she said about
Ryzen 7000.

It will have 24 PCIe 5 lanes. That isn't exciting by itself, since a graphics
card takes 16 of them, so that's not enough for two.

However, the chips will also have *on-chip graphics* even in the 8, 12, and 16
core SKUs, not just in the ones for budget builds. So, if one _isn't_ heavily into
gaming, and can be content with the (fairly good!) on-chip graphics...

finally one does not need to run out and buy a Threadripper if one wants to
put an *accelerator card* in one's system!

John Savard

Michael S

May 24, 2022, 6:01:25 AM
On Tuesday, May 24, 2022 at 11:42:55 AM UTC+3, Quadibloc wrote:
> On Monday, May 23, 2022 at 3:38:04 AM UTC-6, Anton Ertl wrote:
>
> > She apparently also did not mention AVX512, which many expected for
> > Zen4.
> That's disappointing, but then Intel isn't offering AVX-512 in their
> consumer chips either.
>

Intel does not offer AVX-512 in their *new* consumer chips.
Previous generation, what they call Gen11, mainly Rocket Lake desktop
CPUs and Tiger Lake notebook chips, offers AVX-512 just fine.

I took a look at the web site of our biggest network of computer shops.
In the list of "Recommended desktop PCs with Intel CPU" 18 out of 38
models have AVX-512.
For notebooks, they have no "recommended" list, so I looked at all models.
There are many, so my numbers are less precise, but from a quick glance
it looks like at the moment models with AVX-512 constitute an
overwhelming majority of Intel-based notebooks. Only 20 or 25 models out
of ~120 were either too old (gen10), too cheap (Pentium Gold) or too new
(gen12). The rest, i.e. ~75%, are based on i3/i5/i7/i9 Gen11 and support
AVX-512.

Anton Ertl

May 24, 2022, 6:41:25 AM
Quadibloc <jsa...@ecn.ab.ca> writes:
>On Monday, May 23, 2022 at 3:38:04 AM UTC-6, Anton Ertl wrote:
>
>> She apparently also did not mention AVX512, which many expected for
>> Zen4.
>
>That's disappointing, but then Intel isn't offering AVX-512 in their
>consumer chips either.

In Ice Lake, Tiger Lake (mobile), and Rocket Lake (desktop), they
offer AVX-512.

>However, the chips will also have *on-chip graphics* even in the 8, 12, and 16
>core SKUs, not just in the ones for budget builds. So, if one _isn't_ heavily into
>gaming, and can be content with the (fairly good!) on-chip graphics...

The graphics capability is expected to be weak. If you want fairly
good graphics, get, e.g., a Ryzen 5700G (AM4) or an unannounced AM5
6700G (the corresponding mobile APUs already exist).

But the graphics saves both the cost and the power consumption for the
low-end graphics card we have had to put in our Ryzen-based servers.

Michael S

May 24, 2022, 7:18:25 AM
On Tuesday, May 24, 2022 at 1:41:25 PM UTC+3, Anton Ertl wrote:
>
> But the graphics saves both the cost and the power consumption for the
> low-end graphics card we have had to put in our Ryzen-based servers.

It's not given that server OEMs would want to use built-in GPU.
Back when we were shopping for a small Xeon E3 v3 based server I
was surprised that Dell 1U servers did not support CPUs with
built-in graphics, like the 1275v3. Instead, we ended up buying an
E3 1271v3 + Matrox G200er.
I never understood Dell's reasons, but surely they had reasons.

However, a few years later, similar Dell gear based on the E-21xx did use the built-in GPU.

Torbjorn Lindgren

May 24, 2022, 8:51:33 AM
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>Torbjorn Lindgren <t...@none.invalid> writes:
>>I don't think ANY details on what they were showing was given, if
>>that's true it could be the best chip they've produced so far
>>(ultimate golden sample) running on liquid helium (because not even
>>LN2 was enough).
>
>A quick search showed 6362MHz all-core for a 5950X with LN2
>overclocking, so it's unlikely that they have to go to these lengths
>for 5500Mhz on the next generation.

Yeah but it's the first processors of a new design, in a new node and
not launched yet, so it's possible they had to push hard.

No, I don't consider it likely that they had to resort to LN2 cooling,
never mind helium, it was obviously a deliberate "maximum possible"
example to compare with the other extreme (fairly standard cooling for
current high-end PCs) to illustrate how little we actually KNOW.

We should also remember that it was hitting 5.5GHz single-core peaks
running a game, not something that would have pushed the heat
generation much harder like Prime95 Small.


>>and worse yet the common 2x8GB DDR4 config (or 1x8) is a bad
>>idea on DDR5 due to only 16Gbps DDR5 chips being available - "good"
>>8GB sticks need 8Gbps memory chips which exists for DDR4 but not for
>>DDR5!
>
>So buy 1x16GB instead of 2x8GB. Same capacity, similar bandwidth,
>same number of channels.

I wrote Gbps (speed), I obviously meant Gb (size) for the memory
chips. So I'll try to make sure I write "stick" for the actual memory
sticks.

So, an 1x16GB stick configuration is half the channels and (a bit more
than) half the bandwidth of 2x16 stick configuration at the same speed
(and speed is independent).

Yes, it's true that I said that a DDR5 2x8GB stick configuration is
"not optimal" but it'll still beat the **** out of a DDR5 1x16GB stick
configuration. (aka "pre-built computer special", they really LOVE
their single memory sticks to save $5 or so).

OTOH "not quite as bad" isn't the best possible epitaph for a solution
that still cost a lot more than DDR4.


>But I see that 8GB DIMMs for DDR5 exist, e.g., Crucial CT8G48C40U5
>(EUR 59.23).

Yes, I at least implicitly acknowledged this when I said how they were
constructed and why they were inferior.

To elaborate a bit more:

Due to there not being any 8Gb DDR5 memory chips any 8GB DDR5 memory
stick will be built using four 16Gb x16 chips (4*16Gb=8GB, 4x16 gives
64 data bits) and when the memory chips are set up for x16 they have
significantly less concurrency than when set up as x8 chips (less
groups and planes).

Don't ask me why that is, I just read data-sheets and read/watch
benchmark results :-)

Various new features in DDR5 likely lessen the impact of less
concurrency compared to DDR4 but it's still very noticeable in testing
I've seen.

A 16GB DDR5 stick OTOH is (so far) built using eight 16Gb x8 chips
(8*16Gb=16GB, 8x8 gives 64 data bits) and as mentioned x8 (and x4)
chips don't have this issue.
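
The stick-composition arithmetic above can be written out explicitly.
A minimal Python sketch (assuming a plain 64-bit unbuffered rank and
ignoring any extra on-die-ECC devices):

# Chips needed for one 64-bit rank, and the capacity that gives.
RANK_WIDTH_BITS = 64

def rank(chip_density_gbit, chip_width_bits):
    chips = RANK_WIDTH_BITS // chip_width_bits
    capacity_gbyte = chips * chip_density_gbit // 8
    return chips, capacity_gbyte

print(rank(16, 16))  # (4, 8)  -> 8GB stick from four 16Gb x16 chips
print(rank(16, 8))   # (8, 16) -> 16GB stick from eight 16Gb x8 chips
print(rank(8, 8))    # (8, 8)  -> the 8GB-from-x8-chips stick that needs 8Gb parts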


The reason this usually isn't a problem for DDR4 is that there's still
4Gb and 8Gb chips in active production so it's trivial to build
smaller memory sticks using eight x8 chips (down to 4GB).

We still see a few of these four x16 chip configurations on very cheap
DDR4 SO-DIMMs now and then and the result tends to be BAD, the impact
is actually much worse on DDR4.


Now, it wouldn't be hard for the manufacturers to build 8Gb DDR5 chips
but I expect they don't think there's enough demand for it. The
various memory manufacturers all have larger DDR5 chips in their
roadmap (24Gb!, 32Gb, 48Gb! and 64Gb) but I don't think anyone has 8Gb
DDR5 listed in their public roadmaps.


>Anyway, it looks like AM5 will be a high-end platform as long as DDR5
>prices stay high, but you can still buy AM4 stuff; AMD may have to
>reduce the price over time, though.

Yeah, if they have fab allocation it certainly makes sense to continue
to produce AM4 especially in light of the current (high) demand and
uncertainty of a new process node.

Whether they do have enough 7nm fab space allocated at TSMC to do that
is of course the question since TSMC is *full* for the foreseeable
future but it's not a question anyone outside AMD and TSMC is likely
to know, it's all planned 12-24 months in advance!

They also need the old 12/14nm GF IO die for any of the older chips
they build; again we don't know how supplies of that look (IIRC
they've been cutting down on GF deliveries).

On the positive side the Zen4 core dies and IO die both use different
fab processes (TSMC 5nm and 6nm respectively) so they're not in direct
competition in the fab! at least but I expect most processing after
that uses the same factories.


>Either that, or the core improvements are mainly for clock speed
>improvements (plus L2 increase), and they had their hands full with
>porting to the new process (AMD's tick-tock approach) and all the
>stuff they had to do on the I/O die (DDR5, PCIe5, faster USB,
>graphics, display port).

The new IO die certainly got a massive upgrade here; OTOH they could
have had people working on that for a long time, given how long
they've been using the current IO die.

Anton Ertl

May 24, 2022, 10:48:47 AM
Torbjorn Lindgren <t...@none.invalid> writes:
>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>>Torbjorn Lindgren <t...@none.invalid> writes:
...
>No, I don't consider it likely that they had to resort to LN2 cooling,
>never mind helium

From what I know, most modern CPUs stop working below -120C or so, so
already LN2 cooling is a balancing act, and LHe would not make sense
(less enthalpy of vaporization and much higher cost).

>>>and worse yet the common 2x8GB DDR4 config (or 1x8) is a bad
>>>idea on DDR5 due to only 16Gbps DDR5 chips being available - "good"
>>>8GB sticks need 8Gbps memory chips which exists for DDR4 but not for
>>>DDR5!
>>
>>So buy 1x16GB instead of 2x8GB. Same capacity, similar bandwidth,
>>same number of channels.
>
>I wrote Gbps (speed), I obviously meant Gb (size) for the memory
>chips. So I'll try to make sure I write "stick" for the actual memory
>sticks.
>
>So, an 1x16GB stick configuration is half the channels and (a bit more
>than) half the bandwidth of 2x16 stick configuration at the same speed
>(and speed is independent).

You compared 2x8GB DDR4 with DDR5. Even if it were true that there
are no 8GB DDR5 sticks, 1x16GB DDR5 is similar in performance to
2x8GB DDR4.

>Due to there not being any 8Gb DDR5 memory chips any 8GB DDR5 memory
>stick will be built using four 16Gb x16 chips (4*16Gb=8GB, 4x16 gives
>64 data bits) and when the memory chips are set up for x16 they have
>significantly less concurrency than when set up as x8 chips (less
>groups and planes).
>
>Don't ask me why that is, I just read data-sheets and read/watch
>benchmark results :-)

And what do the benchmark results say?

I remember that on one benchmark (on IIRC the Alder Lake review on
anandtech) DDR5 beat DDR4 by a lot due to having double the number of
channels with the same number of sticks, but for most applications the
CPU performance difference of different RAM configurations is
relatively minor.

>On the positive side the Zen4 core dies and IO die both use different
>fab processes (TSMC 5nm and 6nm respectively) so they're not in direct
>competition in the fab! at least but I expect most processing after
>that uses the same factories.

Is that really positive? They have to estimate the demand for their
various products far in advance. The might have estimated that AM5
demand will be much higher than it turns out to be thanks to DDR5
prices, and then they have capacity that they don't need (well maybe
do an Xbox SoC shrink early or something).

Anton Ertl

May 24, 2022, 11:11:04 AM
Michael S <already...@yahoo.com> writes:
>On Tuesday, May 24, 2022 at 1:41:25 PM UTC+3, Anton Ertl wrote:
>>
>> But the graphics saves both the cost and the power consumption for the
>> low-end graphics card we have had to put in our Ryzen-based servers.
>
>It's not given that server OEMs would want to use built-in GPU.

Yes, and many server customers want the graphics or whatever through
an extra Ethernet port that is served from a mainboard
graphics/control chip called a BMC. Funnily, whenever we bought a
board that advertised a BMC, it turned out that it's an optional
add-on that costs extra in addition to the already-high cost of the
server board.

We have a setup that works with classical VGA KVMs, and this add-on
nonsense ensures that we stay there (well, we have to think about
something for our new Intel servers that don't support VGA).

With the Ryzens, we could use consumer boards, which are not just much
cheaper than server boards, but, as we found recently, are also
arguably better server boards than the sold-as-server boards: The
consumer ASUS TUF Gaming B550M-Plus supports a 2280 and a 22110 NVMe
SSD, the server ASUS P12R-M/10G-2T supports only one 2280 NVMe SSD,
but costs more than twice as much (ok, we saved EUR 100 on a 10GbE
card, but had to pay extra for PCIe->M.2 adapters).

EricP

May 24, 2022, 12:17:26 PM
Torbjorn Lindgren wrote:
>
> To elaborate a bit more:
>
> Due to there not being any 8Gb DDR5 memory chips any 8GB DDR5 memory
> stick will be built using four 16Gb x16 chips (4*16Gb=8GB, 4x16 gives
> 64 data bits) and when the memory chips are set up for x16 they have
> significantly less concurrency than when set up as x8 chips (less
> groups and planes).
>
> Don't ask me why that is, I just read data-sheets and read/watch
> benchmark results :-)
>
> Various new features in DDR5 likely lessen the impact of less
> concurrency compared to DDR4 but it's still very noticeable in testing
> I've seen.
>
> A 16GB DDR5 stick OTOH is (so far) built using eight 16Gb x8 chips
> (8*16Gb=16GB, 8x8 gives 64 data bits) and as mentioned x8 (and x4)
> chips don't have this issue.
>
>
> The reason this usually isn't a problem for DDR4 is that there's still
> 4Gb and 8Gb chips in active production so it's trivial to build
> smaller memory sticks using eight x8 chips (down to 4GB).
>
> We still see a few of these four x16 chip configurations on very cheap
> DDR4 SO-DIMMs now and then and the result tends to be BAD, the impact
> is actually much worse on DDR4.

How many rows can each of these chips have open at once?
The internal configuration of the DRAMs would seem to be important.
I looked at some DDR4 chip specs and I couldn't see an explicit statement.

A picture showed a, say, 4 Gb chip having 4 groups, each group has 8 banks,
each bank has its own set of row address latches, row decode, sense amps,
and each row has 4096 column data latches. And each bank has 32768 rows.

So it looked like this chip could have 32 rows (banks) open at once.

For a 4 GB DIMM, it can use *1 (1 bit wide), *4 or *8 bit chips.
The DIMM has 16 chips, 8 per side.
If it uses *1's then all 16 chips must have the same 32 rows open,
and each chip supplies 1 bit per clock edge from 4096 column addresses.
If it uses *4's then it looked like I can get 4*32=128 rows open at once,
but each row supplies 4 bits per edge, from 1024 column addresses.
For *8's it is 256 rows, 8 bits per edge, from 512 column addresses

There was also how many beats or bunches-of-bits are read per column access.
It looked like that could scale from 2, 4 or 8 bits (1, 2 or 4 clocks).

So the memory controller has to scale from managing 32 to 256 open rows.
And control how many bits are read per beat per column address.

Does that all sound about right?


MitchAlsup

May 24, 2022, 12:48:19 PM
Memory controller can restrict itself to 32 banks and if it needs more, close a bank.
That is:: it does not have to be able to use all the rows.

Anton Ertl

May 24, 2022, 2:01:03 PM
EricP <ThatWould...@thevillage.com> writes:
>For a 4 GB DIMM, it can use *1 (1 bit wide), *4 or *8 bit chips.
>The DIMM has 16 chips, 8 per side.

The sides of a DIMM are selected independently (two ranks per DIMM, at
least for DDR4; I guess with DDR5 this doubles because the number of
channels doubles) on usual DIMMs. DIMMs have 64 data bit lines. So
if a DIMM has 8 chips per side, it has *8 bit chips. AFAIK for UDIMMs
(unbuffered) load limits mean that you have at most 8/9 chips per rank
(there (rarely) are double-capacity UDIMMs with twice as many chips
per side, but they also have double the number of ranks, and you can
use only two of those on a memory controller that supports 4
normal-capacity UDIMMs). DIMMs with four chips per side are
necessarily x16-bits per chip.

Narrower chips are probably for registered RDIMMs and load-reduced
LRDIMMs (for big servers).

>If it uses *1's then all 16 chips must have the same 32 rows open,

All chips of a channel selected by the same rank have the same rows
etc. open.

>There was also how many beats or bunches-of-bits are read per column access.
>It looked like that could scale from 2, 4 or 8 bits (1, 2 or 4 clocks).

For DDR3 and DDR4 a burst is at least 8 transfers long. For DDR5 at
least 16. In order to match 64-byte cache line width, the channel
width has been 64 bits with DDR3/4, and has been reduced to 32 bits in
DDR5 (with the 64 bits of a DIMM containing two channels).

>So the memory controller has to scale from managing 32 to 256 open rows.
>And control how many bits are read per beat per column address.

The latter is pretty much fixed; i.e., 64 bits for DDR4, 32-bits for
DDR5. In the early DDR days minimal bursts were so short that one
could do something called dual-channel mode where instead of getting 8
beats from one 64-bit channel, you would get 4 beats each from two
channels, giving better bandwidth, but requiring the channels to have
the same rows open.
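
The cache-line matching works out as follows (a small Python sketch of
the arithmetic above; minimum burst lengths and channel widths as
stated):

# Minimum burst length x channel width, in bytes, per generation.
configs = {
    "DDR3/DDR4": (8, 64),   # 8-transfer burst, 64-bit channel
    "DDR5":      (16, 32),  # 16-transfer burst, 32-bit sub-channel
}
for name, (burst, width_bits) in configs.items():
    print(f"{name}: {burst} x {width_bits} bits = {burst * width_bits // 8} bytes")
# Both come out to 64 bytes, i.e. one cache line per burst.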

Michael S

May 24, 2022, 3:47:47 PM
4096 columns are atypical. More typically, DDR4 devices have 1024 columns.
I am not even sure if 4096 columns are allowed by the JEDEC standards.

32K rows per bank mostly appear in older and smaller chips. Bigger devices,
esp, those with 4-bit data bus can have up to 256K rows per bank.

>
> So it looked like this chip could have 32 rows (banks) open at once.

DDR4 devices with 32 banks are rare. Most typically, DDR4 device has 4 bank groups of 4 banks each
for 16 banks total.

>
> For a 4 GB DIMM, it can use *1 (1 bit wide), *4 or *8 bit chips.

JEDEC standard defines x4, x8 and x16 DDR4 chips. x1 does not exist.

x4 used almost exclusively for high-capacity server DIMMs, typically buffered/registered.
Unbuffered x4 would be problematic due to high capacitive load on address/control signals.

x16 is used in el-cheapo PC DIMMs and in embedded devices.

DIMMs used in good PCs are almost exclusively made out of x8 DDR devices.

> The DIMM has 16 chips, 8 per side.
> If it uses *1's then all 16 chips must have the same 32 rows open,
> and each chip supplies 1 bit per clock edge from 4096 column addresses.
> If it uses *4's then it looked like I can get 4*32=128 rows open at once,
> but each row supplies 4 bits per edge, from 1024 column addresses.
> For *8's it is 256 rows, 8 bits per edge, from 512 columns addresses
>
> There was also how many beats or bunches-of-bits are read per column access.
> It looked like that could scale from 2, 4 or 8 bits (1, 2 or 4 clocks).
>
> So the memory controller has to scale from managing 32 to 256 open rows.
> And control how many bits are read per beat per column address.
>
> Does that all sound about right?

Most typical PC DDR4 configuration has up to 64 simultaneously open banks per DIMM channel
(16 banks x 4 ranks) and 2 DIMM channels.
In server you have more channels, but also more controllers.
Sometimes in server you have more than 4 ranks, but AFAIK in DDR4 it costs you a reduced data rate.
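
A minimal Python sketch of that bank arithmetic (the 4-groups-of-4 and
4-ranks-per-channel figures are just the typical values given above):

# Simultaneously open banks (one open row each) for a typical DDR4 setup.
bank_groups = 4
banks_per_group = 4
ranks_per_channel = 4        # upper end for unbuffered DIMMs
channels = 2

banks_per_device = bank_groups * banks_per_group          # 16
open_per_channel = banks_per_device * ranks_per_channel   # 64
print(banks_per_device, open_per_channel, open_per_channel * channels)
# -> 16 64 128 open banks across both channels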


Torbjorn Lindgren

May 25, 2022, 8:33:53 AM
Torbjorn Lindgren <t...@none.invalid> wrote:
>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>>Torbjorn Lindgren <t...@none.invalid> writes:
>>>I don't think ANY details on what they were showing was given, if
>>>that's true it could be the best chip they've produced so far
>>>(ultimate golden sample) running on liquid helium (because not even
>>>LN2 was enough).
>>
>>A quick search showed 6362MHz all-core for a 5950X with LN2
>>overclocking, so it's unlikely that they have to go to these lengths
>>for 5500Mhz on the next generation.
>
>Yeah but it's the first processors of a new design, in a new node and
>not launched yet, so it's possible they had to push hard.
>
>No, I don't consider it likely that they had to resort to LN2 cooling,
>never mind helium, it was obviously a deliberate "maximum possible"
>example to compare with the other extreme (fairly standard cooling for
>current high-end PCs) to illustrate how little we actually KNOW.
>
>We should also remember that it was hitting 5.5GHz single-core peaks
>running a game, not something that would have pushed the heat
>generation much harder like Prime95 Small.

And we now know it was pretty close to the low-bar, AMD's Director of
Technical Marketing Robert Hallock later disclosed in a PCWorld stream
that the 5.5GHz peak was done without overclocking using a reference
AM5 motherboard and a "consumer-based" Asetek 280mm - which build MOST
PC AIO's thanks to various patents on placing the pump on top of the
CPU (hence some model instead placing the pump in the radiator).

So more substantial cooling than I expect most people have on their
5950X (I'm guessing 240mm or big-air) but not chilled water (Intel 28
core 5GHz demo anyone) or anything crazy and it's a setup that should
be easy to match using consumer parts.

I was expecting it to be at least PBO boosting to hit 5.5GHz but even
that sounds like it's ruled out unless AMD has now decided PBO no
longer counts as overclocking.

Now, will it hit these frequencies on loads that are far more strenuous?
No idea, perhaps that may explain why they went with 5+GHz on the
poster. Or it was old information. Or...

Torbjorn Lindgren

May 25, 2022, 10:45:04 AM
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>Torbjorn Lindgren <t...@none.invalid> writes:
>>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>>>Torbjorn Lindgren <t...@none.invalid> writes:
>...
>>No, I don't consider it likely that they had to resort to LN2 cooling,
>>never mind helium
>
>From what I know, most modern CPUs stop working below -120C or so, so
>already LN2 cooling is a balancing act, and LHe would not make sense
>(less enthalpy of vaporization and much higher cost).

It's easy to find a number of XO (extreme overclocking) runs done
using LHe on both older and newer processors (like 9900K, 10900K,
i9-12900K, 3900X) so clearly it's used, even if it's rare compared to
LN2 XO.

From what XO'ers say, I think it's overstating it to say that most
CPUs stop working at -120C or so, some CPU types will run into
unsolvable problem at anywhere from -30 to -120C, while others are
fine even at "full pot" LN2 once you dial in all the required
voltages, IE they'll run even at effectively -195C (and full pot is
soooo much easier to do).

But if a CPU can handle "full LN2 pot" it's very likely it can go at
least a bit colder. I assume nothing will work at "full LHe pot", but
clearly it can get enough colder to be worthwhile in some cases.

It's certainly true that it's going to be a lot more expensive and
complicated than LN2 overclocking but... I hope no one is going to
argue AMD couldn't do it if they wanted and spent the required time!

So I used it as the hypothetical "extreme upper bound". As mentioned
elsewhere I didn't expect them to even use LN2, and in the end they
ended up close to (but above) my "lower bound".


>I remember that on one benchmark (on IIRC the Alder Lake review on
>anandtech) DDR5 beat DDR4 by a lot due to having double the number of
>channels with the same number of sticks, but for most applications the
>CPU performance difference of different RAM configurations is
>relatively minor.

I'm not sure it's correct to say DDR5 doubles the channel count, the
briefs I've read all say that DDR5 has two independent subchannels.
Which is close but not quite the same. Example brief [1].

I.e. a DDR4 stick has a 64/72-bit channel and a full burst is 8 blocks
for a total of 64 bytes. A DDR5 stick has two sub-channels of 32/40
bits and a full transfer is 16 blocks, so the full burst is actually
kept the same size, 64 bytes.

Thanks to the higher transfer speed the longer burst doesn't hurt DDR5
that much, and this arrangement gains "concurrency", which is important
with today's higher core counts and advanced OoO cores.

There are a number of other concurrency improvements in DDR5 too; they
tend to have twice as many banks in total compared to a similar-sized
DDR4 chip, which helps with this. Example:
Micron 16Gb DDR4/5: bank groups / banks per bank group (total banks):
DDR4: x8: 4/4 (16) x16: 2/4 (8) [2]
DDR5: x8: 8/4 (32) x16: 4/4 (16) [3]

Interestingly the Micron DDR5 "core" data sheet actually lists these
for 8/16/24/32/48/64 Gb chips despite AFAIK only the 16Gb being
available right now (from any manufacturer). That may mean they do
plan to launch 8Gb DDR5 chips at some point. Or not, it might just be
included in case.

But the bank increase does mean that DDR5 x16 based DIMMs should be
less dire than DDR4 x16 ones. There's very little testing of 2x8GB
DDR5, the best I've found was some bandwidth test by MSI [4].

My reading of the 12900K 2x8GB DDR4 vs 2x8GB DDR5 graph is that
they're surprisingly close. And then the 2x8GB DDR5 to 2x16GB DDR5
graphs confirm this with significant uplifts for the 2x16GB kit; even
latency goes down, despite both having only 1 rank.

So yeah, I think this suggests 2x8GB DDR5 with 16Gb x16 memory leaves
quite a bit of performance on the table. And while mostly matching
DDR4 might have been fine with near price parity, it's harder to
argue when DDR5 costs a lot more.

And as I expected DDR4 comes away with the latency crown, even though
Intel 12th gen's DDR4 controller tends to have higher latency than
10/11th gen (which we do see in an earlier graph doing that
comparison).

I expect all of this to change over time, the same happened with DDR3
and DDR2, it tends to take a little while for the full potential of a
new standard to be unlocked.

I also do agree that the performance difference for many tasks isn't
that big. There are some tasks that absolutely benefit from the
bandwidth provided by 2x16 or 2x32GB DDR5, and then it may be
important to know that a 2x8GB "16Gb x16" DDR5 configuration *may*
behave more like DDR4 benchmarks than DDR5 benchmarks for those tasks.


>>On the positive side the Zen4 core dies and IO die both use different
>>fab processes (TSMC 5nm and 6nm respectively) so they're not in direct
>>competition in the fab! at least but I expect most processing after
>>that uses the same factories.
>
>Is that really positive? They have to estimate the demand for their
>various products far in advance. They might have estimated that AM5
>demand will be much higher than it turns out to be thanks to DDR5
>prices, and then they have capacity that they don't need (well maybe
>do an Xbox SoC shrink early or something).

I should have said *possibly* positive or negative.

As I mentioned, I've seen multiple claims that DDR5 pricing to OEMs
(Dell/HP) and SA's isn't that different from DDR4; it's mainly in the
retail space that there are massive differences (these are US-based
reports). And OEM+SA is AFAIK a much larger market than retail so if that stays
the case it might help AMD's demand. But DDR5 prices could also go up,
or down.

So, yeah, could definitely go either way.


1. https://media-www.micron.com/-/media/client/global/documents/products/technical-marketing-brief/ddr5_key_module_features_tech_brief.pdf
2. https://www.micron.com/-/media/client/global/documents/products/data-sheet/dram/ddr4/16gb_ddr4_sdram.pdf
3. https://media-www.micron.com/-/media/client/global/documents/products/data-sheet/dram/ddr5/ddr5_sdram_core.pdf
4. https://www.msi.com/blog/a-closer-look-at-ddr5-benchmarks-with-intels-Alder-lake-cpus

EricP

May 25, 2022, 10:49:19 AM
All such discrepancies with reality are entirely due
to me misreading or misremembering the DRAM specs.

I was mostly trying to get an approximate idea of what kinds of
things a DRAM memory controller has to deal with, in an arm-wavy way.

> 32K rows per bank mostly appear in older and smaller chips. Bigger devices,
> esp, those with 4-bit data bus can have up to 256K rows per bank.
>
>> So it looked like this chip could have 32 rows (banks) open at once.
>
> DDR4 devices with 32 banks are rare. Most typically, DDR4 device has 4 bank groups of 4 banks each
> for 16 banks total.
>
>> For a 4 GB DIMM, it can use *1 (1 bit wide), *4 or *8 bit chips.
>
> JEDEC standard defines x4, x8 and x16 DDR4 chips. x1 does not exist.

Yeah... my bad. The spec says 1 Gig x 16 but I remembered it as 1 x 16 Gb.

> x4 used almost exclusively for high-capacity server DIMMs, typically buffered/registered.
> Unbuffered x4 would be problematic due to high capacitive load on address/control signals.
>
> x16 is used in el-cheapo PC DIMMs and in embedded devices.
>
> DIMMs used in good PCs are almost exclusively made out of x8 DDR devices.
>
>> The DIMM has 16 chips, 8 per side.
>> If it uses *1's then all 16 chips must have the same 32 rows open,
>> and each chip supplies 1 bit per clock edge from 4096 column addresses.
>> If it uses *4's then it looked like I can get 4*32=128 rows open at once,
>> but each row supplies 4 bits per edge, from 1024 column addresses.
>> For *8's it is 256 rows, 8 bits per edge, from 512 columns addresses
>>
>> There was also how many beats or bunches-of-bits are read per column access.
>> It looked like that could scale from 2, 4 or 8 bits (1, 2 or 4 clocks).
>>
>> So the memory controller has to scale from managing 32 to 256 open rows.
>> And control how many bits are read per beat per column address.
>>
>> Does that all sound about right?
>
> Most typical PC DDR4 configuration has up to 64 simultaneously open banks per DIMM channel
> (16 banks x 4 ranks) and 2 DIMM channels.
> In server you have more channels, but also more controllers.
> Sometimes in server you have more than 4 ranks, but AFAIK in DDR4 it costs you a reduced data rate.

One of the reasons I was looking at this has to do with ECC memory,
and specifically on how the main memory error scavenger works.
Currently the error scavenger resides in the memory controller and does a
sequential sweep over all of memory looking for errors and reporting them,
and repairing single bits errors before they become double bit.

Errors are reported to the cpu which logs them every so often.
OS's may use such logs to take bad pages off-line.

That configuration worked ok when main memory was relatively small
by todays standards, but because the scavenger scan is sequential
it has not scaled well as memory size has grown. It can now take
days to do each sweep and that is only going to get worse.

I was wondering if there was a way to allow each DRAM chip or DIMM
to scavenge itself in parallel. The scavenger scan and SECDED
logic could either be incorporated into the DRAM chip itself,
kind of like self-refresh, or else into an on-DIMM controller.

The ideal location for scavenging is on the DRAM chip itself,
and changing the widths to *9 or *18. But DRAM processes don't
like logic so this might not fly.

Next best would be an on-DIMM controller so at least each DIMM
can scavenge in parallel. But that looked like it might require
moving some of the memory controller logic, such as open row tracking,
into an on-DIMM controller.

And of course all of this implies interface changes
to the chips and/or DIMMs.


Anton Ertl

May 25, 2022, 2:09:53 PM
EricP <ThatWould...@thevillage.com> writes:
>One of the reasons I was looking at this has to do with ECC memory,
>and specifically on how the main memory error scavenger works.

Do you mean scrubbing?

>Currently the error scavenger resides in the memory controller and does a
>sequential sweep over all of memory looking for errors and reporting them,
>and repairing single bits errors before they become double bit.
>
>Errors are reported to the cpu which logs them every so often.
>OS's may use such logs to take bad pages off-line.
>
>That configuration worked ok when main memory was relatively small
>by todays standards, but because the scavenger scan is sequential
>it has not scaled well as memory size has grown. It can now take
>days to do each sweep and that is only going to get worse.

DDR5-4800 has ~38GB/s bandwidth per 64-bit superchannel. If you have
4096GB of memory to scrub with 8 superchannels, and you invest 1% of
the bandwidth (0.38GB/s per superchannel), it takes
4096GB/8/0.38(GB/s)=1347s, i.e. less than half an hour. If it takes
longer, it's because somebody decided to spend less bandwidth on it.
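
The same arithmetic as a small Python sketch, so the knobs are
explicit (the numbers are the ones used above):

# Full-memory scrub time for a given bandwidth budget.
total_mem_gb = 4096          # GB of installed memory
superchannels = 8
bw_per_channel_gbs = 38.0    # ~DDR5-4800, per 64-bit superchannel, GB/s
budget = 0.01                # fraction of bandwidth spent on scrubbing

scrub_bw = superchannels * bw_per_channel_gbs * budget    # total GB/s for scrubbing
print(f"{total_mem_gb / scrub_bw:.0f} s")                 # ~1347 s, under half an hour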

>I was wondering if there was a way to allow each DRAM chip

A DRAM chip does not have the ECC information in DDR4, it's on a
separate chip. DDR5 has something called on-die ECC, but apparently
that is not real ECC (I have not seen any information that explains it
and that looked trustworthy).

>or DIMM
>to scavenge itself in parallel.

At the DIMM level it would be possible, but it would require an
additional memory controller that talks to all chips on the DIMM
(extra cost and extra load on the lines). And you save only a factor
of two (with one such scrubber per DIMM) or four (with one scrubber
per DIMM side).

>Next best would be an on-DIMM controller so at least each DIMM
>can scavenge in parallel. But that looked like it might require
>moving some of the memory controller logic, such as open row tracking,
>into an on-DIMM controller.

Yes, and it would have to coordinate with the CPU's DRAM controller.

Quadibloc

May 26, 2022, 4:24:36 AM
On Wednesday, May 25, 2022 at 6:33:53 AM UTC-6, Torbjorn Lindgren wrote:

> And we now know it was pretty close to the low-bar, AMD's Director of
> Technical Marketing Robert Hallock later disclosed in a PCWorld stream
> that the 5.5GHz peak was done without overclocking using a reference
> AM5 motherboard and a "consumer-based" Asetek 280mm - which build MOST
> PC AIO's thanks to various patents on placing the pump on top of the
> CPU (hence some model instead placing the pump in the radiator).
>
> So more substantial cooling than I expect most people have on their
> 5950X (I'm guessing 240mm or big-air) but not chilled water (Intel 28
> core 5GHz demo anyone) or anything crazy and it's a setup that should
> be easy to match using consumer parts.

Given that people have achieved over 6 GHz with previous generations of
Ryzen parts, if the 5.5 GHz had been achieved with liquid nitrogen, I think
the Securities and Exchange Commission would be having a word with
Lisa Su.

Chilled water, like in an IBM mainframe, indeed counts as "something
crazy", but water cooling using a fan and radiator on the water outside
the case is what AMD _recommended_ for their 16-core chips, both the
5950 and its predecessor the 3950. (In fact, it was also recommended
for the 5900 and other members of that series.)

So one can't fault them for using the standard, recommended cooling
solution for their chip!

John Savard

EricP

May 26, 2022, 12:38:52 PM
Anton Ertl wrote:
> EricP <ThatWould...@thevillage.com> writes:
>> One of the reasons I was looking at this has to do with ECC memory,
>> and specifically on how the main memory error scavenger works.
>
> Do you mean scrubbing?

Yes

>> Currently the error scavenger resides in the memory controller and does a
>> sequential sweep over all of memory looking for errors and reporting them,
>> and repairing single bits errors before they become double bit.
>>
>> Errors are reported to the cpu which logs them every so often.
>> OS's may use such logs to take bad pages off-line.
>>
>> That configuration worked ok when main memory was relatively small
>> by todays standards, but because the scavenger scan is sequential
>> it has not scaled well as memory size has grown. It can now take
>> days to do each sweep and that is only going to get worse.
>
> DDR5-4800 has ~38GB/s bandwidth per 64-bit superchannel. If you have
> 4096GB of memory to scrub with 8 superchannels, and you invest 1% of
> the bandwidth (0.38GB/s per superchannel), it takes
> 4096GB/8/0.38(GB/s)=1347s, i.e. less than half an hour. If it takes
> longer, it's because somebody decided to spend less bandwidth on it.

The number I had from 2010 came from Intel's recommendation
for one of its memory controller hubs and was 1.5 hours/GB.
DRAMs haven't gotten faster since then, just bigger.

>> I was wondering if there was a way to allow each DRAM chip
>
> A DRAM chip does not have the ECC information in DDR4, it's on a
> separate chip. DDR5 has something called on-die ECC, but apparently
> that is not real ECC (I have not seen any information that explains it
> and that looked trustworthy).

I know. I'm exploring new, different configurations.

For example, if the DRAM internal row was a multiple of 72 then
an internal scrubber could read a row and chew away on it over
multiple cycles, ideally without interfering with normal operations.
A 72 bit value could be read/written over a *18 interface in
2 clocks (4 edges).

Reading a 64 byte cache line could then have 8 chips on one side
of the DIMM open to the same 32 rows, with reads staggered across
the chips so the whole cache line is read in a 16 clocks burst.

>> or DIMM
>> to scavenge itself in parallel.
>
> At the DIMM level it would be possible, but it would require an
> additional memory controller that talks to all chips on the DIMM
> (extra cost and extra load on the lines). And you save only a factor
> of two (with one such scrubber per DIMM) or four (with one scrubber
> per DIMM side).

With a scrubber per DIMM or per side, each DIMM can scrub itself
in parallel with all other DIMMs. But it still has to transfer
all the data off each chip to the local controller for checking.

If each chip did its own scrubbing in the background that
transfer bottleneck is avoided.

Anton Ertl

May 26, 2022, 1:37:56 PM
EricP <ThatWould...@thevillage.com> writes:
>Anton Ertl wrote:
>> DDR5-4800 has ~38GB/s bandwidth per 64-bit superchannel. If you have
>> 4096GB of memory to scrub with 8 superchannels, and you invest 1% of
>> the bandwidth (0.38GB/s per superchannel), it takes
>> 4096GB/8/0.38(GB/s)=1347s, i.e. less than half an hour. If it takes
>> longer, it's because somebody decided to spend less bandwidth on it.
>
>The number I had from 2010 came from Intel's recommendation
>for one of its memory controller hubs and was 1.5 hours/GB.
>DRAMs haven't gotten faster since then, just bigger.

DRAM bandwidth has gotten much bigger. 2010 would be DDR3 memory.
But even DDR3 memory supports bandwidths that allow much higher
scrubbing rates, even if you use only 1% of the bandwidth. If
Intel scrubbed this slowly by default, it's because they felt that
there is no need for faster scrubbing.

>> A DRAM chip does not have the ECC information in DDR4, it's on a
>> separate chip. DDR5 has something called on-die ECC, but apparently
>> that is not real ECC (I have not seen any information that explains it
>> and that looked trustworthy).
>
>I know. I'm exploring new, different configurations.
>
>For example, if the DRAM internal row was a multiple of 72 then
>an internal scrubber could read a row and chew away on it over
>multiple cycles, ideally without interfering with normal operations.

From the things I have read about DDR5's on-die ECC, it really does
per-die ECC, and fixes errors on the die, but does not send any ECC
information outside.

My guess is that they correct a whole row (minimizing the additional
bits necessary, e.g., 14 extra bits for SECDED of an 8Kb row); it's
possible that it's too expensive to compute ECC for an 8K+14bit row,
and in that case they may choose to use more extra bits and a cheaper
ECC function.

My other guess is that they do the SECDED on every refresh. That
mechanism is already in place, the refreshes occur anyway, and given
that the reason for on-die ECC is flaky RAM cells, it makes little
sense to do a refresh without correcting errors. The stated benefit
of on-die ECC is that this counteracts the tendency of RAM to become
less reliable as processes shrink.

But DDR5 also has ECC DIMMs that have 40 bits per sub-channel, which
allows end-to-end ECC checking. And I guess you run a scrubber on
that, too.

MitchAlsup

May 26, 2022, 3:31:17 PM
On Thursday, May 26, 2022 at 12:37:56 PM UTC-5, Anton Ertl wrote:
> EricP <ThatWould...@thevillage.com> writes:
> >Anton Ertl wrote:

> >For example, if the DRAM internal row was a multiple of 72 then
> >an internal scrubber could read a row and chew away on it over
> >multiple cycles, ideally without interfering with normal operations.
> From the things I have read about DDR5's on-die ECC, it really does
> per-die ECC, and fixes errors on the die, but does not send any ECC
> information outside.
>
> My guess is that they correct a whole row (minimizing the additional
> bits necessary, e.g., 14 extra bits for SECDED of an 8Kb row); it's
> possible that it's too expensive to compute ECC for an 8K+14bit row,
> and in that case they may choose to use more extra bits and a cheaper
> ECC function.
>
> My other guess is that they do the SECDED on every refresh. That
> mechanism is already in place, the refreshes occur anyway, and given
> that the reason for on-die ECC is flaky RAM cells, it makes little
> sense to do a refresh without correcting errors. The stated benefit
> of on-die ECC is that this counteracts the tendency of RAM to become
> less reliable as processes shrink.
<
Given an ability to perform SECDED, and the fact that reading a DRAM is
a destructive operation, DRAMs with SECDED would be performing
correction on EVERY activate, not just refreshes. {A refresh is just an
activate without using the data pins.}
>
> But DDR5 also has ECC DIMMs that have 40 bits per sub-channel, which
> allows end-to-end ECC checking. And I guess you run a scrubber on
> that, too.
<
Fundamentally, there is no reason to check ECC at the DRAM controller,
then pass data over the interconnect without ECC, and reconstruct another
ECC as data reaches the cache hierarchy. It is at least as good to just send
the RAW DRAM ECC throughout the memory hierarchy.
<
This allows the DRAM controller to forward data faster than it can correct
ECC and allows the cache hierarchy to make a correction when needed after
arrival, lowering overall latency.

Michael S

May 26, 2022, 5:27:40 PM
I suspect it's neither.
Nothing is checked except the content read either by external client or by internal scrubber.

> My other guess is that they do the SECDED on every refresh.

My guess is they don't. Refresh has to be relatively fast, on the order of 15-25ns.
Checking the whole row in such a short time would be expensive, both in terms
of area and in terms of power consumption.
Now, it's likely that they do use refresh time for scrubbing but check only 64
or at best 128 words, rather than the full row.
As to SECDED, at least in the case of Micron it is known that they are doing
SEC, but not DED. They didn't supply enough redundancy for the latter -
only 8 parity bits per 128 data bits. So some dual errors will be detected, but
many others would be mistaken for single errors and turned into triple errors.
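
That redundancy argument can be checked against the Hamming bound. A
minimal Python sketch (standard Hamming/extended-Hamming check-bit
counts, not Micron's actual code, which isn't public):

# Minimum check bits r for single-error correction of m data bits:
# smallest r with 2**r >= m + r + 1; SECDED (extended Hamming) adds one more.
def sec_bits(m):
    r = 1
    while 2 ** r < m + r + 1:
        r += 1
    return r

for m in (128, 8192):  # a 128-bit codeword vs. a whole 8Kb row
    r = sec_bits(m)
    print(f"{m} data bits: SEC needs {r} check bits, SECDED needs {r + 1}")
# 128 data bits: SEC needs 8, SECDED needs 9
#   -> 8 parity bits per 128 data bits is enough for SEC but not SECDED.
# 8192 data bits: SEC needs 14, SECDED needs 15.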

>That
> mechanism is already in place, the refreshes occur anyway, and given
> that the reason for on-die ECC is flaky RAM cells, it makes little
> sense to do a refresh without correcting errors. The stated benefit
> of on-die ECC is that this counteracts the tendency of RAM to become
> less reliable as processes shrink.
>
> But DDR5 also has ECC DIMMs that have 40 bits per sub-channel, which
> allows end-to-end ECC checking. And I guess you run a scrubber on
> that, too.

And since the overhead is +25% instead of the +12.5% of previous generations, it could
very likely mean the end of availability of "real" ECC in the consumer/prosumer space.
Not that it was too common until now...

Torbjorn Lindgren

May 26, 2022, 9:06:10 PM
Quadibloc <jsa...@ecn.ab.ca> wrote:
>On Wednesday, May 25, 2022 at 6:33:53 AM UTC-6, Torbjorn Lindgren wrote:
>> And we now know it was pretty close to the low-bar, AMD's Director of
>> Technical Marketing Robert Hallock later disclosed in a PCWorld stream
>> that the 5.5GHz peak was done without overclocking using a reference
>> AM5 motherboard and a "consumer-based" Asetek 280mm - which build MOST
>> PC AIO's thanks to various patents on placing the pump on top of the
>> CPU (hence some model instead placing the pump in the radiator).
>>
>> So more substantial cooling than I expect most people have on their
>> 5950X (I'm guessing 240mm or big-air) but not chilled water (Intel 28
>> core 5GHz demo anyone) or anything crazy and it's a setup that should
>> be easy to match using consumer parts.
>
>Given that people have achieved over 6 GHz with previous generations of
>Ryzen parts, if the 5.5 GHz had been achieved with liquid nitrogen, I think
>the Securities and Exchange Commission would be having a word with
>Lisa Su.

Except... it's not launched yet and won't be for many months ("fall"),
and it's possible not all SKUs will be in the first batch.

There are plenty of cases where pre-production HW has needed, well,
extra encouragement let's say, to reach the specs the final product
will hit.

Yes, it's close enough to the launch that this should hopefully not be
the case, but I think you're vastly overstating things, especially
given that the official listed spec when I was writing was 5GHz+, NOT
the 5.5GHz they were showing! But we can always agree to disagree on
that.

The situation when you're writing is very different, since they've
since made additional statements about how this was done (no
overclocking and on a 280mm AIO) and that at least one SKU should be
able to hit 5.5GHz boost without overclocking under the right
circumstances, but both that and the "system setup" were conspicuously
missing from the presentation.


>Chilled water, like in an IBM mainframe, indeed counts as "something
>crazy", but water cooling using a fan and radiator on the water outside
>the case is what AMD _recommended_ for their 16-core chips, both the
>5950 and its predecessor the 3950. (In fact, it was also recommended
>for the 5900 and other members of that series.)
>
>So one can't fault them for using the standard, recommended cooling
>solution for their chip!

The only actual statement I can find from AMD on the subject is that
for Ryzen they recommend "To get maximum performance... AMD recommends
a high performance all-in-one liquid or air cooler"[1].

The title does give the liquid coolers additional prominence that isn't
found in the text or the page title, but the list also includes a number of
air coolers of varying capacity, down to at least one single-tower cooler
with just a single 120mm fan! Not sure that I'd count the Dark Rock as
"high-performance air cooling" but it's in AMD's official list. They
also list several much bigger air tower coolers that would work fine!

Their recommendation for the up-to-280W-TDP ThreadRipper[2] OTOH only
lists 240mm or larger AIOs, or custom water blocks. Are you sure you didn't
get them mixed up?


So, how does that 280mm AIO rate in that list of officially
recommended Ryzen coolers? Pretty close to the top actually: the
"largest" coolers listed are 360mm, but those are barely better than
the significantly wider 280mm AIOs - 360mm radiators have only ~10%
more surface area than 280mm ones, and 140mm fans tend to have a better
fanblade-to-hub ratio, which will reduce this advantage to, say, 0-10%.
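(Rough numbers: nominally a 360mm radiator has 3 x 120mm x 120mm = 43,200mm^2 of
face area versus 2 x 140mm x 140mm = 39,200mm^2 for a 280mm, i.e. about 10% more
before accounting for fan hubs.)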

Or to look at it differently I'm guessing the 280mm AIO has about
35-40% more cooling capacity than the 240mm AIO and probably has at
least twice the cooling capacity of that Dark Rock Slim!

So I think I was fair when I said it was "pretty close to the lower
bar" that I think I earlier defined as a 240mm AIO or equivalent.
If anything the presence of the Dark Rock Slim suggests I might have
been kind.

Of course, after that AMD clarified things multiple times, including
AM5 socket power, but I see that they've already had to correct their
first statements about that. I don't think we've seen AMD handle a
reveal this badly before (at least not in the Ryzen era) - which is still
very high marks: it wasn't BAD as such, just "scattered".

Originally AMD said AM5 was designed for a max of 170W peak power; we
know that the corresponding numbers for AM4 were 142W peak power and
105W TDP. This has now been corrected to 230W peak power and a 170W TDP
max for AM5[3].

Their statement also for the first time mentions that they'll have
three TDP groups (170/105/65W) for AM5 rather than AM4's two (105/65W),
though it's still unclear whether they plan multiple 16-core SKUs with
different TDPs or not.

Intel's KS and AMD's 5800X3D SKUs have shown that some people will pay
extra for higher (specific) performance even at a higher TDP or with other
drawbacks (non-gaming performance for the X3D); perhaps the plan is for the
best-binned chips to go into a 170W, 5.5GHz-boost "premium" SKU.


1. https://www.amd.com/en/processors/ryzen-thermal-solutions
2. https://www.amd.com/en/thermal-solutions-threadripper
3. https://www.tomshardware.com/news/amd-corrects-socket-am5-for-ryzen-7000-power-specs-230w-peak-power-170w-tdp

Anton Ertl

unread,
May 27, 2022, 3:53:05 AMMay 27
to
Michael S <already...@yahoo.com> writes:
>On Thursday, May 26, 2022 at 8:37:56 PM UTC+3, Anton Ertl wrote:
>> My guess is that they correct a whole row (minimizing the additional
>> bits necessary, e.g., 14 extra bits for SECDED of an 8Kb row); it's
>> possible that it's too expensive to compute ECC for an 8K+14bit row,
>> and in that case they may choose to use more extra bits and a cheaper
>> ECC function.
>>
>
>I suspect it's neither.
>Nothing is checked except the content read either by external client or by internal scrubber.
>
>> My other guess is that they do the SECDED on every refresh.
>
>My guess is they don't. Refresh has to be relatively fast, order of 15-25ns.
>Checking the whole row in such short time would be expensive, but in terms
>of area and in terms of power consumption.

I think that the area is small compared to the DRAM array, and the
power consumption is small compared to the driving power for
refreshing the row (which is done anyway). But speed might be an
issue. Caches often have ECC, so error correction can be speedy, but
they only cover much shorter blocks.

One way would be to have shorter blocks, e.g., 9 blocks of 920-921
bits (10 of which are for ECC) for an 8Kb row. But if you have that
many extra bits (90 for an 8Kb row), there may be better ways to do
ECC than just to split the row into subblocks and check them
separately.
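
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch
(purely illustrative, using the plain Hamming SEC bound 2^r >= m + r + 1; the
vendors' actual codes may well differ):

/* ECC overhead for one 8Kb (8192-bit) DRAM row at different correction
   granularities, using the plain Hamming SEC bound. */
#include <stdio.h>

static int sec_bits(int m)            /* minimum SEC check bits for m data bits */
{
    int r = 1;
    while ((1 << r) < m + r + 1)
        r++;
    return r;
}

int main(void)
{
    const int row = 8192;
    const int block_sizes[] = { 8192, 911, 128 };  /* whole row, ~1/9 of a row, DDR5-style */

    for (unsigned i = 0; i < sizeof block_sizes / sizeof block_sizes[0]; i++) {
        int m = block_sizes[i];
        int blocks = (row + m - 1) / m;            /* blocks needed to cover the row */
        int extra = blocks * sec_bits(m);          /* total check bits for the row */
        printf("%4d-bit blocks: %2d block(s), %3d check bits, %.2f%% overhead\n",
               m, blocks, extra, 100.0 * extra / row);
    }
    return 0;
}

That prints 14 check bits for the whole row, 90 for nine ~911-bit blocks and
512 (8 per 128) for the DDR5-style blocking, so coarser blocks do save DRAM
bits at the price of a much wider ECC computation.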

>Now, it's likely that the do use refresh time for scrub but check only 64 or at best 128 words,
>rather than full row.

I guess your idea is to use the transfer lines that are there anyway
for the read and write accesses for the accessed part of a row and do
ECC on that (e.g. for an 8-bit-wide device with 16-beat minimum burst
length, that would be 128 bits). Concerning area, that requires 8 ECC
bits per 128 data bits, which is probably more than when doing ECC for
bigger blocks (bigger ECC circuits, but less extra DRAM). Concerning
timing, the ECC computation may be shorter, but you now have the
transfer there and back.

>As to SECDED, at least in case of Micron it is known that they are doing
>SEC, but not DED.

I guess there is also no way for the device to report a detected error.

>They didn't supply enough of redundancy for the later -
>only 8 parity bits per 128 data bits.

Maybe I should have looked at the data sheets instead of speculating.
So they have documented that they do it this way. Interesting.

So for scrubbing an 8Kb line they use 64 refreshes.

>> But DDR5 also has ECC DIMMs that have 40 bits per sub-channel, which
>> allows end-to-end ECC checking. And I guess you run a scrubber on
>> that, too.
>
>And since the overhead is +25% instead of +12.5% of previous generations, it could
> very likely mean the end to availability of "real" ECC in consumer/prosumer space.
>Not that it was too common until now...

Currently I see a factor of 1.8 between 32GB DDR4-3200 UDIMMs with and
without ECC, which should be plenty for absorbing the 25% overhead in
DRAM chips; of course that partly reflects the fact that there are low
sales numbers for ECC UDIMMs. Interestingly, a 32GB DDR4-3200 RDIMM
can be had for less than a UDIMM (and slower RDIMMs are significantly
cheaper) despite needing additional chips.

So basically, those who buy ECC UDIMMs are already paying a hefty
premium now (and if they buy Intel, they also pay it on the board and
maybe on the CPU; and AMD is partially following Intel by disabling
ECC on their non-Pro APUs, but they idiotically don't sell Pro APUs in
retail). I expect that this will stay the same.

In the long run, the (sub)channels are going to become narrower, which
would mean increasing hardware overhead for ECC. But given on-die
ECC, there is another option: each chip could generate a parity bit
and transmit/receive that on a separate line in order to detect
transmission errors. If there is a transmission error, the receiver
would request a retransmit (or maybe use more lines and perform error
correction if retransmission is impractical).
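
A minimal, purely illustrative sketch of that per-chip link-parity idea (the
16-beat x8 burst size and the names are assumptions, not from any standard):

/* One x8 chip sends a 16-beat burst (16 bytes) plus one parity bit on a
   separate line; the receiver recomputes the parity and, on a mismatch,
   would ask the controller to re-issue the transfer. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static int burst_parity(const uint8_t beats[16])
{
    uint8_t acc = 0;
    for (int i = 0; i < 16; i++)
        acc ^= beats[i];
    acc ^= acc >> 4;          /* fold the byte down to a single parity bit */
    acc ^= acc >> 2;
    acc ^= acc >> 1;
    return acc & 1;
}

static bool accept_burst(const uint8_t beats[16], int parity_line)
{
    return burst_parity(beats) == parity_line;
}

int main(void)
{
    uint8_t burst[16] = { 0xDE, 0xAD, 0xBE, 0xEF };   /* remaining beats are zero */
    int p = burst_parity(burst);                      /* what the chip would drive */

    printf("clean transfer accepted: %d\n", accept_burst(burst, p));
    burst[3] ^= 0x04;                                 /* inject a single-bit link error */
    printf("corrupted transfer accepted: %d (would trigger a retransmit)\n",
           accept_burst(burst, p));
    return 0;
}

One parity line per chip catches any odd number of flipped bits in the burst;
with a couple more lines one could do forward correction instead of a retry,
as mentioned above.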

As for commercial considerations, there is obviously little revenue to
be made with ECC UDIMMs, so market segmentation for UDIMMs with ECC
does not really work, and there is thus no commercial reason to block such a
development. Market segmentation between UDIMMs in the cheap segment
and RDIMMs and LRDIMMs in the expensive segment seems to work, but
there is no need to block ECC for UDIMMs for that.

Michael S

unread,
May 27, 2022, 6:01:43 AMMay 27
to
I didn't read the datasheet either, just their marketing materials named
"Introducing Micron ® DDR5 SDRAM: More Than a Generational Update."

Today I finally opened a datasheet and found out that my guesses were too optimistic.
It is not stated explicitly, but it can easily be deduced that in automatic ECS mode
each Refresh command (REFab) handles exactly 1 code block (128 bits) rather than
the 4 or 8 code blocks I was hoping for.
That can be deduced from the following statement in the datasheet:
"The maximum spacing between REFab commands or self refresh entry for the
device to complete the automatic scrub within the recommended 24 hours is tECSint."
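(Back-of-the-envelope, assuming a 16Gb device: 2^34 bits / 128 bits per code
block = 2^27, about 134 million code blocks, so finishing a full scrub pass
within 24 hours means one code block has to be checked roughly every
86400s / 2^27, i.e. about every 0.64ms on average.)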

>
> So for scrubbing an 8Kb line they use 64 refreshes.

Exactly :(

> >> But DDR5 also has ECC DIMMs that have 40 bits per sub-channel, which
> >> allows end-to-end ECC checking. And I guess you run a scrubber on
> >> that, too.
> >
> >And since the overhead is +25% instead of +12.5% of previous generations, it could
> > very likely mean the end to availability of "real" ECC in consumer/prosumer space.
> >Not that it was too common until now...
> Currently I see a factor of 1.8 between 32GB DDR4-3200 UDIMMs with and
> without ECC, which should be plenty for absorbing the 25% overhead in
> DRAM chips; of course that partly reflects the fact that there are low
> sales numbers for ECC UDIMMs. Interestingly, a 32GB DDR4-3200 RDIMM
> can be had for less than a UDIMM (and slower RDIMMs are significantly
> cheaper) despite needing additional chips.
>
> So basically, those who buy ECC UDIMMs are already paying a hefty
> premium now (and if they buy Intel, they also pay it on the board and
> maybe on the CPU; and AMD is partially following Intel by disabling
> ECC on their non-Pro APUs, but they idiotically don't sell Pro APUs in
> retail). I expect that this will stay the same.
>
> In the long run, the (sub)channels are going to become narrower,

You still assume that we are still ahead of the long run [of Moore's law].
I am less optimistic. It seems plausible that the provisions for 64Gb devices
that are already in DDR5 will suffice for more than a decade.
Maybe close to 2 decades. And we, Anton and Michael, are less
young than we used to be.

Anton Ertl

unread,
May 27, 2022, 8:15:03 AMMay 27
to
Michael S <already...@yahoo.com> writes:
>On Friday, May 27, 2022 at 10:53:05 AM UTC+3, Anton Ertl wrote:
>> In the long run, the (sub)channels are going to become narrower,
>
>You still assume that we are ahead of long run [of Moore law].
>I am less optimistic. I seems plausible that provisions for 64gb devices
>that is already in DDR5 will suffice for more than decade.
>May be, close to 2 decades.

Newer DDR standards are not primarily about chip capacity (currently
the largest DDR5 chips are no larger than what DDR4 supports: 16Gb),
but about transfer rate. I.e., if the transfer rate becomes faster
than 16x the internal cycle rate of the DRAM, they have to increase the
minimum burst length (and decrease the (sub)channel width) to benefit
from that increase. And currently I see no sign that the transfer
rate increases are slowing down. DDR4 was released 7 years after
DDR3; DDR5 was released 6 years after DDR4. On the PCIe front, the
release cadence seems to be even faster.
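
(The fixed point in that trade-off is the 64-byte cache line: (sub)channel width
times minimum burst length stays at 512 bits, so DDR4 uses 64 bits x BL8 and
DDR5 uses 32 bits x BL16; a hypothetical later generation that needed BL32 to
keep the DRAM core busy would imply 16-bit sub-channels.)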

Michael S

unread,
May 27, 2022, 12:11:46 PMMay 27
to
I can not formulate it with sufficient rigor, but my opinion is that the need
for new DDR standards is very much a function of the non-stability (hopefully growth)
of the market in terms of average capacity.
In terms of trade-offs between bandwidth, capacity, latency, energy consumption
and configurability, DDR3/4/5 interfaces are jacks of all trades, masters of none.
That is very valuable when there is rapid change and no time for specialization.
In other, more stable market conditions it is something that pushes you out of
all commodities into the area that economists call the "long tail of the Pareto curve".
That's not the worst place to be, but also not good enough to justify inventing the next
generation of the JEDEC Standard.

As to commodities, we already see enough to guess with good certainty how
they will look if growth of capacity stops.

On low-capacity (consumer) side of things it would be either LPDDR4 or something
based on LPDDR4 competing against current and future generations of HBM.

On high-capacity (server) side it would be yet another variation of "fully-buffered"
idea, most likely based on the phy as PCIeN. Yes, it failed all previous times,
but it failed exactly because the market needs were dynamic. This time it would
be different. Also, it's already different from the last 40 years in terms of much bigger
centralization of the server market and because of "this or that shite as a service".
As you can guess, I don't like these new trends, but I can't say that they are not
here just because of my dislike.




Stephen Fuld

unread,
May 27, 2022, 12:58:14 PMMay 27
to
I disagree. If it were true, then newer generations of the standard
would be much simpler; just add an address bit or two. The fact that
each standard provides a substantial increase in data rate/bandwidth
over the previous one, which is harder than adding an address pin, says
that data rate/bandwidth is the driving factor, and capacity comes in
second. Unfortunately, latency has proven resistant to substantial
improvements :-(



--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Anton Ertl

unread,
May 27, 2022, 2:14:35 PMMay 27
to
Michael S <already...@yahoo.com> writes:
>As to commodities, we already see enough to guess with good certainty how
>they will look if growth of capacity stops.
>
>On low-capacity (consumer) side of things it would be either LPDDR4 or something
>based on LPDDR4 competing against current and future generations of HBM.

The main shortcoming of LPDDRx memory is that it is soldered on the
board (you gain transfer rate and lower power consumption from that).
That seems to be too inflexible for most desktops (except Apple
customers).

HBM is connected through an MCM, and generally seems too expensive for
consumers.

>On high-capacity (server) side it would be yet another variation of "fully-buffered"
>idea, most likely based on the phy as PCIeN.

There is something new called CXL; the next version of CXL is expected
soon and is based on the PCIe 6.0 PHY. But from what I read about it, it
does not seem to be a replacement for RDIMMs or LRDIMMs; it's probably
more that you have some boards that have memory in the form of LRDIMMs
(if high capacity is relevant), that probably also have one or several
CPUs, and that are connected to other boards through CXL. I expect that
certain specialized big-iron things will make use of it somewhere, but
I don't expect that it's a replacement for RDIMMs or LRDIMMs.

MitchAlsup

unread,
May 27, 2022, 4:54:49 PMMay 27
to
DRAM chips have rather constant area. Passing over a linear amount of
real estate is a wire-delay problem. So the fact that transistors are getting faster
and wires are getting slower has finally become manifest in DRAM. {Actually,
DRAM bank access to pin drivers has stayed rather constant since DDR2.}
<
But, does anyone know the ratio of consumer DRAMs (all those PCs) versus
server DRAMs (all those servers) in total chips sold/bought/delivered?

MitchAlsup

unread,
May 27, 2022, 4:59:11 PMMay 27
to
On Friday, May 27, 2022 at 1:14:35 PM UTC-5, Anton Ertl wrote:
> Michael S <already...@yahoo.com> writes:
> >As to commodities, we already see enough to guess with good certainty how
> >they will look if growth of capacity stops.
> >
> >On low-capacity (consumer) side of things it would be either LPDDR4 or something
> >based on LPDDR4 competing against current and future generations of HBM.
> The main shortcoming of LPDDRx memory is that it is soldered on the
> board (you gain transfer rate and lower power consumption from that).
> That seems to be too unflexible for most desktops (except Apple
> customers).
>
> HBM is connected through an MCM, and generally seems too expensive for
> consumers.
<
LPDDR and HBM are fine for cell phones and GPUs and NON-Upgradable PCs.
<
Last time I looked, cell phones are 10× the market of PCs.
<
> >On high-capacity (server) side it would be yet another variation of "fully-buffered"
> >idea, most likely based on the phy as PCIeN.
<
PCIe 6.0 ? PAM4 and 16GHz clock rate.
<
> There is something new called CXL; the next version of CXL is expected
> soon and based on PCIe 6.0 PHY. But from what I read about it, it
> does not seem to be a replacement for RDIMMs or LRDIMMs; it's probably
> more that you have some boards that have memory in the form of LRDIMMs
> if high-capacity is relevant that probably also have one or several
> CPUs, and are connected to other boards through CXL. I expect that
> certain specialized big iron things will make use of it somewhere, but
> I don't expect that it's a replacement for RDIMMs or LRDIMMs.
<
The main problem is that pins are expensive (area, power, and skew).

Michael S

unread,
May 28, 2022, 2:44:33 PMMay 28
to
On Friday, May 27, 2022 at 9:14:35 PM UTC+3, Anton Ertl wrote:
> Michael S <already...@yahoo.com> writes:
> >As to commodities, we already see enough to guess with good certainty how
> >they will look if growth of capacity stops.
> >
> >On low-capacity (consumer) side of things it would be either LPDDR4 or something
> >based on LPDDR4 competing against current and future generations of HBM.
> The main shortcoming of LPDDRx memory is that it is soldered on the
> board (you gain transfer rate and lower power consumption from that).
> That seems to be too unflexible for most desktops (except Apple
> customers).

As mentioned by Mitch, DDR4/5 is already outsold by LPDDR4.
And certainly Apple is not the only factor in that. I didn't check,
but I would expect that any big laptop manufacturer sells more
LPDDR4 machines than ones based on DDR4/5.
Also I'd expect that all AIO desktops and the majority of mini-PC desktops
are LPDDR4.

>
> HBM is connected through an MCM, and generally seems too expensive for
> consumers.

Not every consumer is cheap. Think gamers, not the casual ones, but addicts.
Or everybody buying Apple.

> >On high-capacity (server) side it would be yet another variation of "fully-buffered"
> >idea, most likely based on the phy as PCIeN.
> There is something new called CXL; the next version of CXL is expected
> soon and based on PCIe 6.0 PHY. But from what I read about it, it
> does not seem to be a replacement for RDIMMs or LRDIMMs; it's probably
> more that you have some boards that have memory in the form of LRDIMMs
> if high-capacity is relevant that probably also have one or several
> CPUs, and are connected to other boards through CXL. I expect that
> certain specialized big iron things will make use of it somewhere, but
> I don't expect that it's a replacement for RDIMMs or LRDIMMs.

Yes, except that I expect that if they went to the trouble of doing an extension box
they are probably going to build it for cheaper and faster RDIMMs rather
than for LRDIMMs.
But my point is that the people that build these boxen don't need a new memory standard
every 5-7 years. Especially when capacity does not go up. Even with DDR4 they likely
have internal bandwidth that easily exceeds the bandwidth of their external COMM
channels, because there is no other way to get the required capacity.
So, if they need DDR5, it is only because of the promised extra capacity.

My other point is that in a stable market middlemen are pushed out.
These boxen would be built either by system manufacturers or, more
likely, by memory manufacturers. And in the second case the
memory-to-COMM-ASIC interface does not even have to follow any
JEDEC standard.

MitchAlsup

unread,
May 28, 2022, 3:42:20 PMMay 28
to
I bought a new machine about 6 months ago. It has more CPUs (12) and more
memory (32 GB) than I think I will ever need. When I look while having 6 WORD
documents open (200 pages each), 3 spreadsheets, and 4 CorelDraw files open
(30-70 pages each), I am sitting at 25%-30% memory capacity.
>
> My another point is that in stable market middlemen are pushed out.
> This boxen would be built either by system manufacturers or, more
> likely, by memory manufacturers. And in the second case
> memory-to-COMM-ASIC interface does not even have to follow any
> JEDEC standard.
<
Standards are for people who want "choice between" not simply "choice of".
The PC market may have already reached that "stable market" criterion
mentioned above. The only thing I want in a new PC is the ability to insert
my old SATA drives rather than spending many hours on SATA copying when I
get a new machine. {And I certainly don't want any of this in the cloud!}

Anton Ertl

unread,
May 29, 2022, 6:14:20 AMMay 29
to
Michael S <already...@yahoo.com> writes:
>On Friday, May 27, 2022 at 9:14:35 PM UTC+3, Anton Ertl wrote:
>> Michael S <already...@yahoo.com> writes:
>> >As to commodities, we already see enough to guess with good certainty how
>> >they will look if growth of capacity stops.
>> >
>> >On low-capacity (consumer) side of things it would be either LPDDR4 or something
>> >based on LPDDR4 competing against current and future generations of HBM.
>> The main shortcoming of LPDDRx memory is that it is soldered on the
>> board (you gain transfer rate and lower power consumption from that).
>> That seems to be too unflexible for most desktops (except Apple
>> customers).
>
>As mentioned by Mitch, DDR4/5 is already outsold by LPDDR4.

[Citation needed]

Whether that's true or not, it does not change the shortcomings, or
the situation for desktops.

>And certainly Apple is not the only factor in that. I didn't check
>but would expect that any big laptop manufacturer sells more
>LPDDR4 machines than of those base on DDR4/5.

Laptops are not desktops, but anyway, let's get some data on this:

Looking at the "RAM type" category in <https://geizhals.eu/?cat=nb>:

models
5235 DDR4
1578 LPDDR4
154 DDR5
255 LPDDR5

So there are certainly many more models with DDR4/5 than with LPDDR4,
and this probably also means more sales.

The DDR5 vs. LPDDR5 numbers may indicate that the situation will be
different for DDR5 vs. LPDDR5, or it may just have to do with the kind
of models and CPUs available this early in the game. Comparing the
CPU manufacturers for DDR5 vs. LPDDR5 models, I see:

 DDR5  LPDDR5
   37       6   AMD
  117      87   Intel
    0     162   Apple

So AMD and Intel CPUs are more often built into laptops with DDR5
rather than LPDDR5 memory.

The numbers of RAM slots are also interesting:

Slots
3223 unknown (includes 0)
4349 1+
3022 2+
143 4+

And the amount of soldered RAM (note that many laptops have some RAM
soldered and a slot for expansion):

3762 none
795 4GB
1428 8GB
1316 16GB
228 32GB
43 64GB (all Apple)

>Also I'd expect that all AIO desktops and majority of miniPC desktops
>are LPDDR4.

Ok, let's check this:

All-in-one <https://geizhals.eu/?cat=sysdiv&xf=450_All-in-One>:

348 DDR4
95 LPDDR4 (all Apple)

Mini PC <https://geizhals.eu/?cat=sysdiv&xf=450_Mini-PC>:
336 DDR4
3 DDR5
25 LPDDR4
36 LPDDR5 (all Apple)

>> HBM is connected through an MCM, and generally seems too expensive for
>> consumers.
>
>Not every consumer is cheap. Think gamers, not the casual ones, but addicts.

I doubt that they are a big enough market to justify the expense of
spinning separate HBM CPUs. And I am not sure if they would be
interested. From what I read, they are into specialty DIMMs with high
clocks and low latency numbers.

>Or everybody buying Apple.

None of the Apple systems use HBM. They use LPDDR4/5. And what would
HBM buy them? Apple gets more than enough bandwidth for their larger
configurations from LPDDR5; they are still weak on capacity, but HBM
would not change that.

Concerning consumers who are willing to pay, the marketing expert
knows how to skim that willingness by giving them something that costs
only a little more to produce (such as the specialty DIMMs discussed
above). An HBM system costs a lot more to produce: changed memory
controllers, changed connectors on the CPU die, MCM, changed socket
(or maybe something like the Pentium II/III and early Athlon slots),
changed mainboards, ...

For Apple's ecosystem it's more in reach, but even they don't go
there.

[CXL]
>But my point is that people that build this boxen don't need new memory standard
>every 5-7 years. Esp. when capacity does not go up. Even with DDR4 they likely
>have internal bandwidth that easily exceeds bandwidth of their external COMM
>channels, because there is no other way to get required capacity.
>So, if they need DDR5 then only because of promised extra capacity.

A pure memory box with bandwidth that cannot even compete with DDR4 in
bandwidth (and certainly not latency) does not look to me like something
that many people would buy. I doubt that we will see such products.

Michael S

unread,
May 29, 2022, 9:07:50 AMMay 29
to
I'm not sure that it follows.
Simple models like the Dell Latitude 3320 probably outsell more advanced models by a significant factor.
But overall I agree that I overestimated the popularity of LPDDR in non-Apple PCs.
I suppose they want to reach a level of 3D graphics performance that
is comparable with the best discrete cards from NVidia and AMD. Not necessarily
to match the very top, but to be close. For that they will need HBM.

> Apple gets more than enough bandwidth for their larger
> configurations from LPDDR5; they are still weak on capacity, but HBM
> would not change that.
>
> Concerning consumers who are willing to pay, the marketing expert
> knows how to skim that willingness by giving them something that costs
> only a little more to produce (such as the specialty DIMMs discussed
> above). An HBM system costs a lot more to produce: changed memory
> controllers, changed connectors on the CPU die, MCM, changed socket
> (or maybe something like the Pentium II/III and early Athlon slots),
> changed mainboards, ...
>
> For Apple's ecosystem it's more in reach, but even they don't go
> there.
>
> [CXL]
> >But my point is that people that build this boxen don't need new memory standard
> >every 5-7 years. Esp. when capacity does not go up. Even with DDR4 they likely
> >have internal bandwidth that easily exceeds bandwidth of their external COMM
> >channels, because there is no other way to get required capacity.
> >So, if they need DDR5 then only because of promised extra capacity.
> A pure memory box with bandwidth that cannot even compete with DDR4 in
> bandwidth (and certainly not latency) does not look to me like something
> that many people would buy. I doubt that we will see such products.

Who says it does not compete with DDR4 in bandwidth?
I expect that such a product, in order to be sellable, has to beat DDR4 in bandwidth
per pin, but not necessarily by a large factor. A factor of 1.5 to 2 is enough.
But most importantly it has to beat anything else on the market (i.e. mostly
RL-DIMMs) by at least a factor of 4 in *capacity* per pin. Preferably, it should
promise a factor of 6-8, at least as an option for a future upgrade.
In order to achieve that it would need pretty wide internal buses, at the very least
8x wider than the external COMM links, more likely 12x wider.
So, even with DDR4, the internal bandwidth easily beats the bandwidth of the link.
I just do not see how they can possibly get the required capacity otherwise.

As to latency, of course it would be bad. But the alternatives of similar capacity
are "super-NUMA" machines like the HP Superdome Flex, which are likely even worse if the app
isn't very NUMA-friendly. Another alternative is a big IBM z box. Here latency is
better (than HP), due to the relatively tightly connected processor racks, but the
price is high.

Anton Ertl

unread,
May 29, 2022, 12:06:16 PMMay 29
to
Michael S <already...@yahoo.com> writes:
>On Sunday, May 29, 2022 at 1:14:20 PM UTC+3, Anton Ertl wrote:
>> Looking at the "RAM type" category in <https://geizhals.eu/?cat=nb>:
>>
>> models
>> 5235 DDR4
>> 1578 LPDDR4
>> 154 DDR5
>> 255 LPDDR5
>>
>> So there are certainly many more models with DDR4/5 than with LPDDR4,
>> and this probably also means more sales.
>>
>
>I'm not sure that it follows.
>Simple models like Dell Latitude 3320 probably outsel more advanced models by significant factor.

Maybe. And maybe even cheaper models like, say, the ASUS Business P1
P1511CEA-BQ749 (offered by 81 dealers vs. 57 for the most-offered Dell
Latitude 3320 variant) outsell the Dell Latitude 3320 by a large
margin. Given that we have no sales numbers, I'll use the model
numbers as approximation.

>> None of the Apple systems use HBM. They use LPDDR4/5. And what would
>> HBM buy them?
>
>I suppose, they want to reach the level of 3D graphics performance that
>is comparable with best discrete cards of NVidia and AMD. Not necessarily
>to match the very top, but to be close. For that they will need HBM.

bits  bandwidth
1024   816GB/s   M1 Ultra
 256   576GB/s   Radeon RX 6950XT
 384  1008GB/s   Geforce RTX 3090 Ti
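
(Those bandwidths are just bus width times per-pin data rate; a quick sketch
with the commonly quoted approximate per-pin rates reproduces the table to
within a few GB/s:)

#include <stdio.h>

struct part { const char *name; int bus_bits; double gtps; };  /* gtps: per-pin GT/s, approximate */

int main(void)
{
    const struct part parts[] = {
        { "M1 Ultra (LPDDR5)",            1024,  6.4 },
        { "Radeon RX 6950XT (GDDR6)",      256, 18.0 },
        { "Geforce RTX 3090 Ti (GDDR6X)",  384, 21.0 },
    };
    for (unsigned i = 0; i < sizeof parts / sizeof parts[0]; i++)
        printf("%-30s %4d bits x %4.1f GT/s / 8 = %6.1f GB/s\n",
               parts[i].name, parts[i].bus_bits, parts[i].gtps,
               parts[i].bus_bits / 8.0 * parts[i].gtps);
    return 0;
}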

Given that the M1 Ultra has 96MB of system level cache (compared to
128MB of Infinity Cache for the 6950XT, with cache being no feature
that Nvidia touts), Apple already has comparable bandwidth. No HBM
needed.

>> [CXL]
>> >But my point is that people that build this boxen don't need new memory standard
>> >every 5-7 years. Esp. when capacity does not go up. Even with DDR4 they likely
>> >have internal bandwidth that easily exceeds bandwidth of their external COMM
>> >channels, because there is no other way to get required capacity.
>> >So, if they need DDR5 then only because of promised extra capacity.
>> A pure memory box with bandwidth that cannot even compete with DDR4 in
>> bandwidth (and certainly not latency) does not look to me like something
>> that many people would buy. I doubt that we will see such products.
>
>Who says it does not compete with DDR4 in bandwidth?

You do: "Even with DDR4 they likely have internal bandwidth that
easily exceeds bandwidth of their external COMM channels".

As for the rest: A box with more memory than supported by LRDIMMs on,
e.g., EPYC boards will probably find a market, but I doubt that it
will replace LRDIMMs.

Brett

unread,
May 29, 2022, 4:08:19 PMMay 29
to
The soldered RAM in a laptop should be LPDDR while the DIMM in the same
laptop is not low power, so you may have some double counting in this list.

In the fifth generation LPDDR is clearly winning.

Michael S

unread,
May 29, 2022, 4:12:20 PMMay 29
to
On Sunday, May 29, 2022 at 7:06:16 PM UTC+3, Anton Ertl wrote:
> Michael S <already...@yahoo.com> writes:
> >On Sunday, May 29, 2022 at 1:14:20 PM UTC+3, Anton Ertl wrote:
> >> Looking at the "RAM type" category in <https://geizhals.eu/?cat=nb>:
> >>
> >> models
> >> 5235 DDR4
> >> 1578 LPDDR4
> >> 154 DDR5
> >> 255 LPDDR5
> >>
> >> So there are certainly many more models with DDR4/5 than with LPDDR4,
> >> and this probably also means more sales.
> >>
> >
> >I'm not sure that it follows.
> >Simple models like Dell Latitude 3320 probably outsel more advanced models by significant factor.
> Maybe. And maybe even cheaper models like, say, the ASUS Business P1
> P1511CEA-BQ749 (offered by 81 dealers vs. 57 for the most-offered Dell
> Latitude 3320 variant)

Or maybe even a lower-end ASUS VivoBook E12, which resembles ACER gear.
Who knows?

> outsell the Dell Latitude 3320 by a large
> margin. Given that we have no sales numbers, I'll use the model
> numbers as approximation.

At the very least, we know that according to Gartner estimates Dell's PC division
as a whole outsells ASUS's PC division as a whole, 2.5 to 1. So, fewer dealers,
but more PCs sold by each dealer.
Also, I would think that [at retail] ASUS has many more models than Dell.
At least that's what I see in a couple of shops that I just checked. In one shop
the difference in favor of ASUS was 5 to 1.
So, it seems, Dell has to sell on average many more units of each model.
Of course, that does not mean that the most-sold Dell model is the Latitude 3320.

By the way, I don't know how it works in your country, but here Latitude is not
seen in shops at all. Only Inspiron, Vostro and the occasional Alienware.
But at work Latitude is everywhere.

Hopefully now you paid attention to the qualifiers "internal" and "external".

> As for the rest: A box with more memory than supported by LRDIMMs on,
> e.g., EPYC boards will probably find a market, but I doubt that it
> will replace LRDIMMs.

It does not have to replace LR-DIMMs in order to make DDR6 unnecessary or,
at least, to delay its introduction.
I rather see it as the opposite - LR-DIMMs are in the same boat as those future
external boxes. Both serve to extend the usefulness of a given generation of
DDRn, helping to reduce the necessity of the next generation.

Anton Ertl

unread,
May 29, 2022, 5:41:30 PMMay 29
to
Brett <gg...@yahoo.com> writes:
>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>> Looking at the "RAM type" category in <https://geizhals.eu/?cat=nb>:
>>
>> models
>> 5235 DDR4
>> 1578 LPDDR4
>> 154 DDR5
>> 255 LPDDR5
>
>The soldered ram in a laptop should be LPDDR while the DIMM in the same
>laptop is not low power,

While the memory controller in laptop chips like the Tiger Lake can
handle DDR4 or LPDDR4, they cannot mix them, for whatever reason. So
if a laptop has soldered RAM and a DIMM slot, the soldered RAM is DDR4
(or DDR5), not LPDDRx.

E.g., the Fujitsu Lifebook U7311 has 8GB of soldered RAM and a slot
for a SO-DIMM. dmidecode -t memory tells me [I shortened the output]:

Handle 0x000D, DMI type 17, 92 bytes
Memory Device
Size: 8 GB
Locator: Onboard (Controller0-ChannelA)
Type: DDR4
Type Detail: Synchronous
Speed: 3200 MT/s
Manufacturer: Samsung

Handle 0x0011, DMI type 17, 92 bytes
Memory Device
Size: 32 GB
Locator: DIMM0 (Controller1-ChannelA)
Bank Locator: BANK 0
Type: DDR4
Type Detail: Synchronous
Speed: 3200 MT/s
Manufacturer: Crucial Technology

The first is the soldered-on memory (Locator: Onboard), the second is
an additional SO-DIMM. Both are DDR4.

MitchAlsup

unread,
May 29, 2022, 7:52:19 PMMay 29
to
On Sunday, May 29, 2022 at 4:41:30 PM UTC-5, Anton Ertl wrote:
> Brett <gg...@yahoo.com> writes:
> >Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> >> Looking at the "RAM type" category in <https://geizhals.eu/?cat=nb>:
> >>
> >> models
> >> 5235 DDR4
> >> 1578 LPDDR4
> >> 154 DDR5
> >> 255 LPDDR5
> >
> >The soldered ram in a laptop should be LPDDR while the DIMM in the same
> >laptop is not low power,
> While the memory controller in laptop chips like the Tiger Lake can
> handle DDR4 or LPDDR4, they cannot mix them, for whatever reason. So
> if a laptop has soldered RAM and a DIMM slot, the soldered RAM is DDR4
> (or DDR5), not LPDDRx.
<
The inability to mix and match in laptop chips is likely a fuse programmed
at testing time.

Marcus

unread,
May 30, 2022, 3:37:56 AMMay 30
to
On 2022-05-28, MitchAlsup wrote:
> On Saturday, May 28, 2022 at 1:44:33 PM UTC-5, Michael S wrote:

[snip]

>> But my point is that people that build this boxen don't need new memory standard
>> every 5-7 years. Esp. when capacity does not go up. Even with DDR4 they likely
>> have internal bandwidth that easily exceeds bandwidth of their external COMM
>> channels, because there is no other way to get required capacity.
>> So, if they need DDR5 then only because of promised extra capacity.
> <
> I bought a new machine about 6 months ago. It has more CPUs (12) and more
> memory (32 GB) than I think I will ever need. When I look while having 6 WORD
> documents open (200 pages each), 3 spreadsheets, and 4 CorelDraw files open
> 30-70 pages each), I am sitting at 25%-30% memory capacity.

A few things that I have learned about CPU cores and memory consumption:

1. With every extra core you want more RAM. Twice the number of cores
roughly translates to twice the memory requirements. My typical use case
is parallel compilation (e.g. building LLVM), where each CPU core gets
its own compiler process, and each process consumes RAM independently of
the others (rough numbers below). For embarrassingly parallel tasks multi-core CPUs are king.

2. I regularly enough run virtual machines to care about having enough
RAM to support both a fully working host environment and a working VM
environment. Usually the VM uses 25-50% of the host RAM, so out of my
32 GB of RAM I sometimes have to make do with 16 GB of available RAM when the VM
is running.
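
(Rough numbers for the compile case, as a sketch rather than a measurement: a
heavily templated C++ translation unit can easily peak at 1-2 GB of compiler
RSS, so a -j12 build can want on the order of 12-24 GB by itself, before a VM
reserves another 8-16 GB out of the 32 GB.)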

/Marcus

Anton Ertl

unread,
May 30, 2022, 9:58:53 AMMay 30
to
Michael S <already...@yahoo.com> writes:
>On Sunday, May 29, 2022 at 7:06:16 PM UTC+3, Anton Ertl wrote:
>> Michael S <already...@yahoo.com> writes:
>> >On Sunday, May 29, 2022 at 1:14:20 PM UTC+3, Anton Ertl wrote:
>> >> Looking at the "RAM type" category in <https://geizhals.eu/?cat=nb>:
>> >>
>> >> models
>> >> 5235 DDR4
>> >> 1578 LPDDR4
>> >> 154 DDR5
>> >> 255 LPDDR5
>> >>
>> >> So there are certainly many more models with DDR4/5 than with LPDDR4,
>> >> and this probably also means more sales.
>> >>
>> >
>> >I'm not sure that it follows.
>> >Simple models like Dell Latitude 3320 probably outsel more advanced models by significant factor.
>> Maybe. And maybe even cheaper models like, say, the ASUS Business P1
>> P1511CEA-BQ749 (offered by 81 dealers vs. 57 for the most-offered Dell
>> Latitude 3320 variant)
>
>Or. may be, even lower end ASUS VivoBook E12

Unlikely at the moment, because none are offered. Looking at one
model, I see that it was last offered in February 2019.

>At very least, we know that according to Gartner estimates Dell PC division
>as a whole outsells ASUS PC division as whole, 2.5 to 1. So, fewer dealers,
>but more PCs sold by each dealer.
>Also, I would think that [at retail] ASUS has many more models than Dell.

ASUS  Dell
 662   468  <https://geizhals.eu/?cat=nb> (Laptops)
  61   227  <https://geizhals.eu/?cat=sysdiv> (Systems)

>By the way, I don't know how it works in your country, but here Latitude is not
>seen in shops at all. Only Inspiron , Vostro and occasional Alienware.
>But at work Latitude is everywhere.

I don't know how it works in my country, and I don't go to shops
looking at what brands of laptops they have, but I don't see that many
Dells overall when I see other people's laptops. For those that I
see, I have no idea what models they are. There's a central
laptop/tablet ordering system two times per year from the University
of Vienna that my university participates in, and I guess that many
people buy laptops through this system; the most recent one was
<https://ubook.at/fileadmin/content/broschueren/ubook_sose2022.pdf>.
The only Dell Latitude there cost EUR 2239, and I doubt that many
people bought it.

Anyway, if we assume that Dell sells particularly many laptops, and
that we need to study them separately, there the RAM type statistics
is:

models
362 DDR4
57 LPDDR4
17 DDR5
4 LPDDR5

So for Dell laptops the balance is even less in favour of LPDDR4. Do
people all buy these 57 LPDDR4 models, and Dell offers the other
models just for fun?

>> >> [CXL]
>> >> >But my point is that people that build this boxen don't need new memory standard
>> >> >every 5-7 years. Esp. when capacity does not go up. Even with DDR4 they likely
>> >> >have internal bandwidth that easily exceeds bandwidth of their external COMM
>> >> >channels, because there is no other way to get required capacity.
>> >> >So, if they need DDR5 then only because of promised extra capacity.
>> >> A pure memory box with bandwidth that cannot even compete with DDR4 in
>> >> bandwidth (and certainly not latency) does not look to me like something
>> >> that many people would buy. I doubt that we will see such products.
>> >
>> >Who says it does not compete with DDR4 in bandwidth?
>> You do: "Even with DDR4 they likely have internal bandwidth that
>> easily exceeds bandwidth of their external COMM channels".
>>
>
>Hopefully, now you paid attention to qualifiers "internal and "external".
>
>> As for the rest: A box with more memory than supported by LRDIMMs on,
>> e.g., EPYC boards will probably find a market, but I doubt that it
>> will replace LRDIMMs.
>
>It does not have to replace LR-DIMM in order to make DDR6 unnecessary or,
>at least, to delay its introduction.

Like DDR5, DDR6 will probably be about bandwidth, not capacity. The
fact that the largest DDR5 UDIMMs and RDIMMs at the moment are not
bigger than the largest DDR4 UDIMMs and RDIMMs, respectively, shows
that DDR5 is not about capacity (no DDR5 LRDIMMs yet, probably for
lack of corresponding servers).

If they can improve signaling speed enough that DDR5 is insufficient,
there will be DDR6, offering improved bandwidth. It will be there for
those who use UDIMMs, those who use RDIMMs, and those who use LRDIMMs.
And the box you have in mind will have to compete with that.

>I rather see it as an opposite - LR-DIMM are in the same boat with those future
>external boxes. Both serve to extend the usefulness of given generation of
>DDRn, helping to reduce the necessity of the next generation.

LRDIMMs increase the capacity, but not the bandwidth. But bandwidth
is the main reason for the next DDR generation.

Stephen Fuld

unread,
May 30, 2022, 2:05:22 PMMay 30
to
While all of that is true, I think it is only a part of the reason for
the long latency. Consider a hypothetical chip of the same physical
size as a current DRAM, but that uses SRAM instead. Of course, it will have
a smaller capacity and perhaps require more pins than the DRAM, but I
expect it will have lower latency. Is that correct?

I think that the original DRAM designers made the (IMO correct) tradeoff
to gain lower cost (fewer pins), at the cost of higher latency. This
was reinforced by the introduction of CPU caches, which are essentially
average latency reducers, hiding much of the longer latency. These
allowed DRAMS to successfully dominate the market.

However, cache is a diminishing returns game. Adding more improves
performance, but not as much as the last time you increased it. So it
may be true that reducing DRAM latency may become more important.

So what could be done? A lower overhead (fewer transfers) protocol
coupled with a different internal design and more pins (i.e. higher
cost) might be achievable. It is way out of my area of expertise, but I
just want to stimulate thoughts on this.

Of course, the marketability of such a design, even if achievable, is a
different question. :-(

MitchAlsup

unread,
May 30, 2022, 3:02:22 PMMay 30
to
Back in 1998, I designed a DRAM macro for a logic process. The DRAM
macro had 4× the density of SRAM and only a few picoseconds longer
access time (in the era of 250-500 MHz processors (4ns-2ns)). So, I take
issue with the claim that DRAM is inherently slower than SRAM. DRAM implemented in
a DRAM process (moderate P-channels, slow N-channels) would not be
as fast as DRAM implemented in a logic process.
<
Where designed = circuit design, timing design, layout, parasitic extraction,
SPICE simulation, sequencer, and interface to the cache busses.
<
In order to deal with the leakage of the logic process, the DRAMs were
refreshed on every cycle the cache was not being accessed. This averages
out to be some 1000× more often than the DRAM process DRAMs.
>
> I think that the original DRAM designers made the (IMO correct) tradeoff
> to gain lower cost (fewer pins), at the cost of higher latency. This
> was reinforced by the introduction of CPU caches, which are essentially
> average latency reducers, hiding much of the longer latency. These
> allowed DRAMS to successfully dominate the market.
<
packaged DRAMs are successful in the moderate latency, lowest cost per bit
{likely supplanted by Flash} market. Here, package cost is significant!
<
Did anyone ever wonder why DRAM manufacturers build $2B+ FABs in order
to produce $5 dies that go in $3 packages made in a $50M factory..........
>
> However, cache is a diminishing returns game. Adding more improves
> performance, but not as much as the last time you increased it. So it
> may be true that reducing DRAM latency may become more important.
<
Cache gains go as 1/SQRT(size): 4× the cache size buys ½ the miss rate.
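A tiny illustration of that rule (the 32KB / 10% starting point is just an
assumed baseline):

/* miss_rate ~ 1/sqrt(cache_size): every 4x in capacity halves the miss rate. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double base_kb = 32.0, base_miss = 0.10;   /* assumed baseline */

    for (double kb = base_kb; kb <= 8192.0; kb *= 4.0)
        printf("%6.0f KB cache -> ~%.2f%% miss rate\n",
               kb, 100.0 * base_miss * sqrt(base_kb / kb));
    return 0;
}

That walks 32 -> 128 -> 512 -> 2048 -> 8192 KB and prints roughly 10, 5, 2.5,
1.25 and 0.62 percent.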
>
> So what could be done? A lower overhead (fewer transfers) protocol
> coupled with a different internal design and more pins (i.e. higher
> cost) might be achievable. It is way out of my area of expertise, but I
> just want to stimulate thoughts on this.
<
Servers trade a couple of clocks of latency for lots more DRAM DIMMs
and the advantages that come from more DIMMs {banks, open rows,
refresh banks not in use,...} and for them this is a good tradeoff.

Michael S

unread,
May 30, 2022, 3:26:35 PMMay 30
to
What you described exists:
https://www.micron.com/products/dram/rldram-memory
It is very very very niche, but its existence proves that it is not absolutely non-marketable.

MitchAlsup

unread,
May 30, 2022, 7:34:40 PMMay 30
to
This data sheet reminds me a lot of the Pseudostatic DRAM from around 1991.
{Which I used for the DRAM timing parameters of the Mc 88120 design simulation.}

Brett

unread,
May 30, 2022, 8:27:00 PMMay 30
to
DRAM stopped scaling a decade ago; RAM will move onto the CPU package by the time
DDR6 is out. On-CPU RAM will give 10x the RAM bandwidth or more. Look at
the Apple M1 for the future.

MitchAlsup

unread,
May 30, 2022, 8:39:21 PMMay 30
to
DRAM stopped latency scaling a decade ago (or a bit more); density scaling
continues.
<
For most design points, (low) latency is more important than (high) bandwidth,
which is why DDR[k] is out of phase with what is desired.

Stephen Fuld

unread,
May 31, 2022, 2:02:43 AMMay 31