
What cache-architectures ...


Bonita Montero

Dec 18, 2019, 10:07:57 AM
... could make std::hardware_destructive_interference_size
and std::hardware_constructive_interference_size different?

Vir Campestris

Dec 20, 2019, 4:19:52 PM
On 18/12/2019 15:07, Bonita Montero wrote:
> .... could make std::hardware_destructive_interference_size
> and std::hardware_constructive_interference_size different?

Multi-level caches perhaps?

Andy

Chris M. Thomasson

Dec 20, 2019, 5:05:15 PM
Yeah. Was thinking that std::hardware_destructive_interference_size can
be the L1 line size, and std::hardware_constructive_interference_size can
be the L2 line size. However, this is completely up to the implementation.

Iirc, back on early hyperthreading, 128-byte lines would be split into
two 64-byte halves. Iirc, two adjacent 64-byte cache lines can interfere.

https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf

They recommend padding and aligning things on 128-byte boundaries, since
the two 64-byte halves can contend. Section 3.7.3, Hardware Prefetching
for Second-Level Cache.

Bonita Montero

Dec 21, 2019, 2:19:03 AM
> Yeah. Was thinking that std::hardware_destructive_interference_size can
> be the L1 line size, and std::hardware_constructive_interference_size can
> be the L2 line size. However, this is completely up to the implementation.

If the L2 cache has a larger line size than the L1 and the L2 is shared,
then both values should be at least the L2 line size.

> Iirc, back on early hyperthreading, 128-byte lines would be split into
> two 64-byte halves. Iirc, two adjacent 64-byte cache lines can interfere.
> https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf
> They recommend padding and aligning things on 128-byte boundaries, since
> the two 64-byte halves can contend. Section 3.7.3, Hardware Prefetching
> for Second-Level Cache.


Back then there were no multicore NetBurst CPUs, not even NetBurst-based
Xeons. So there wasn't any opportunity for false sharing.

Chris M. Thomasson

Dec 22, 2019, 8:24:48 PM
Iirc, back when hyperthreading first came out, there was still an
issue with two concurrent threads heavily working on their respective
contiguous 64-byte structures. They can interfere with one another wrt
using lock'ed rmw. Therefore, it is recommended to align _and_ pad
things on a 128-byte boundary. Something like the following structures,
just some pseudo-code:
_____________________
struct cache_buf
{
char buf[128];
};

struct cache_half
{
char buf[64];
};

struct cache_line
{
cache_half low;
cache_half high;
};

union cache_line_buf
{
cache_line line;
cache_buf buf;
};
_____________________

A cache_line_buf needs to be aligned in memory on a sizeof(cache_buf)
boundary. It's already padded.

Chris M. Thomasson

Dec 22, 2019, 8:35:14 PM
On 12/22/2019 5:24 PM, Chris M. Thomasson wrote:
> On 12/20/2019 11:18 PM, Bonita Montero wrote:
[...]

> Something like the following structures,
> just some pseudo-code:
> _____________________
> struct cache_buf
> {
>     char buf[128];
> };
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Ummm... Damn it! Heck, this should at least be unsigned char. Or,
uint8_t. Sorry Bonita. ;^o

Bonita Montero

Dec 24, 2019, 7:18:13 AM
> Iirc, back when hyperthreading first came out, there was still an
> issue with two concurrent threads heavily working on their respective
> contiguous 64-byte structures. They can interfere with one another wrt
> using lock'ed rmw. Therefore, it is recommended to align _and_ pad
> things on a 128-byte boundary. ...

Not true. The L1 cache of NetBurst CPUs had 64-byte lines and the
L2 cache 128-byte lines, with partial invalidation of the lower
or upper half through the L1 cache.
What you say might have been true for NetBurst-based dual-socket
Xeons.

Chris M. Thomasson

Dec 24, 2019, 4:39:57 PM
Fair enough. So, this is fixed. Thanks for the info. :^)

Bonita Montero

Dec 24, 2019, 8:30:53 PM
>> Not true. The L1 cache of NetBurst CPUs had 64-byte lines and the
>> L2 cache 128-byte lines, with partial invalidation of the lower
>> or upper half through the L1 cache.
>> What you say might have been true for NetBurst-based dual-socket
>> Xeons.

> https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf
> Section 3.7.3, Hardware Prefetching for Second-Level Cache.

Alignment of prefetches is a totally different topic.

Chris M. Thomasson

Dec 28, 2019, 2:52:21 AM
Imvvho, it can present an insight into deeper levels of the cache
architecture.

Bonita Montero

Dec 28, 2019, 2:54:23 AM
This has nothing to do with false or true sharing if the higher-level
cache with the larger line size is shared.

Chris M. Thomasson

Dec 28, 2019, 2:58:07 AM
It did with earlier models.

Chris M. Thomasson

Dec 28, 2019, 2:59:09 AM
The split 128... Best to keep things padded and properly aligned on
boundaries. The two halves are interesting.

Bonita Montero

Dec 28, 2019, 3:53:26 AM
>> It did with earlier models.

> The split 128... Best to keep things padded and properly aligned on
> boundaries. The two halves are interesting.

That was only relevant to MP P4-Xeons but not to the P4 desktop-CPUs.
On the latter all threads shared a single L2-cache.

Chris M. Thomasson

Dec 28, 2019, 6:56:29 PM
Good to hear.