alignas c++20...

Chris M. Thomasson

May 7, 2023, 4:25:09 PM

I can get this working with C++20, can anybody else? Over allocating and
aligning on a page size:
_________________________
#include <iostream>
#include <atomic>
#include <cstdint>
#include <cstdlib>

static constexpr std::size_t ct_page_size = 8192;
static constexpr std::size_t ct_l2_cacheline_size = 64;

struct alignas(ct_l2_cacheline_size) ct_cacheline
{
    std::atomic<std::uint64_t> m_state;
};

struct alignas(ct_page_size) ct_page
{
    ct_cacheline m_line_0;
    ct_cacheline m_line_1;
};

int main()
{
    if (sizeof(ct_cacheline) != ct_l2_cacheline_size)
    {
        std::cout << "Crap! ct_cacheline is not padded!\n";
    }

    if (sizeof(ct_page) != ct_page_size)
    {
        std::cout << "Crap! ct_page is not padded!\n";
    }

    ct_page* page = new ct_page;

    std::cout << "page = " << page << "\n";

    if ((std::uintptr_t)(page) % ct_page_size)
    {
        std::cout << "Crap! page is not aligned!\n";
    }

    if ((std::uintptr_t)(&page->m_line_1) % ct_l2_cacheline_size)
    {
        std::cout << "Crap! page->m_line_1 is not aligned!\n";
    }

    delete page;

    return 0;
}
_________________________

I get:

page = 0000017135410000

as an output. I get no crap output, so to speak... ;^)

Also, this is working fine in C++20 on msvc.
_________________________
#include <iostream>
#include <atomic>
#include <cstdint>
#include <cstdlib>
#include <vector>

static constexpr std::size_t ct_pages_n = 3;
static constexpr std::size_t ct_page_size = 8192;
static constexpr std::size_t ct_l2_cacheline_size = 64;

template<typename T>
static inline bool
ct_assert_align(T const* ptr, std::size_t alignment)
{
    return !((std::uintptr_t)(ptr) % alignment);
}

struct alignas(ct_l2_cacheline_size) ct_cacheline
{
    std::atomic<std::uint64_t> m_state;
};

struct alignas(ct_page_size) ct_page
{
    ct_cacheline m_line_0;
    ct_cacheline m_line_1;
    ct_cacheline m_line_2;
    ct_cacheline m_line_3;
};

static_assert(sizeof(ct_page) == ct_page_size);
static_assert(sizeof(ct_cacheline) == ct_l2_cacheline_size);

static void
ct_page_iter(
    std::vector<ct_page> const& pages,
    std::size_t n
) {
    for (std::size_t i = 0; i < n; ++i)
    {
        ct_page const& page = pages[i];

        if (!ct_assert_align(&page.m_line_2, ct_l2_cacheline_size))
        {
            std::cout << "Crap!\n";
        }
    }
}

int main()
{
    std::vector<ct_page> page(ct_pages_n);

    // validate first element wrt page alignment
    if (! ct_assert_align(&page[0], ct_page_size))
    {
        std::cout << "Big Time Crap!\n";
    }

    // validate cache lines
    ct_page_iter(page, page.size());

    return 0;
}
_________________________

Get any crap? It works for me. Over allocating and aligning is very
useful... It seems as if C++20 is working fine on MSVC.

Chris M. Thomasson

May 7, 2023, 8:21:32 PM

On 5/7/2023 1:24 PM, Chris M. Thomasson wrote:
> I can get this working with C++20, can anybody else? Over allocating and
> aligning on a page size:

[...]

>
> Get any crap? It works for me. Over allocating and aligning is very
> useful... It seems as if C++20 is working fine on MSVC.

MSVC c++20 works and aligns on page boundary and cache line boundaries:

https://i.ibb.co/93XFNh0/image.png

No crap output... ;^)

Chris M. Thomasson

May 7, 2023, 8:25:34 PM

On 5/7/2023 1:24 PM, Chris M. Thomasson wrote:

> I can get this working with C++20, can anybody else? Over allocating and
> aligning on a page size:
> _________________________

[...]

> Get any crap? It works for me. Over allocating and aligning is very
> useful... It seems as if C++20 is working fine on MSVC.

It's working with MSVC fine, however I don't think GCC is going to like
it very much. The alignments might be off...

I can do all of this manually, but I thought alignas would work fine
with new and std::vectors.
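
For comparison, here is a minimal sketch of what "doing it manually" could look like: over allocate with malloc, then round the raw pointer up to the boundary. The names ct_align_up and ct_aligned_block are made up for illustration and are not from the programs above; the sketch assumes the alignment is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper (not from the thread): round an address up to the
// next multiple of `alignment`. Assumes `alignment` is a power of two.
inline std::uintptr_t ct_align_up(std::uintptr_t addr, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    std::uintptr_t const mask = alignment - 1;
    return (addr + mask) & ~mask;
}

// Hypothetical RAII wrapper: over allocates by alignment - 1 bytes so that
// an aligned region of `size` bytes always fits inside the raw block.
struct ct_aligned_block
{
    void* m_raw;      // what malloc returned; this is what must be freed
    void* m_aligned;  // first suitably aligned address inside the raw block

    ct_aligned_block(std::size_t size, std::size_t alignment)
    :   m_raw(std::malloc(size + alignment - 1)),
        m_aligned(nullptr)
    {
        if (m_raw)
        {
            m_aligned = reinterpret_cast<void*>(
                ct_align_up(reinterpret_cast<std::uintptr_t>(m_raw), alignment));
        }
    }

    // Free the original pointer, never the aligned one.
    ~ct_aligned_block() { std::free(m_raw); }

    ct_aligned_block(ct_aligned_block const&) = delete;
    ct_aligned_block& operator=(ct_aligned_block const&) = delete;
};
```

Something like ct_aligned_block block(8192, 8192); would then hand back a page-aligned block.m_aligned, at the cost of wasting up to alignment - 1 bytes per allocation. That waste is what alignas plus new avoids having to write by hand.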

Pavel

May 14, 2023, 6:57:15 PM

Chris M. Thomasson wrote:
> I can get this working with C++20, can anybody else? Over allocating and
> aligning on a page size:
> _________________________

[snipped]

> _________________________
>
> I get:
>
> page = 0000017135410000
>
> as an output. I get no crap output, so to speak... ;^)
>

[snipped]
Seems to work for me as well

g++ (GCC) 13.1.1 20230429

$ ./a.out
page = 0x55f3f83da000
$

g++ (GCC) 13.1.1 20230429 on x86_64 GNU/Linux

-Pavel

Chris M. Thomasson

May 15, 2023, 5:55:16 PM

Nice! I do not have that version installed. How does it respond to the
program that uses a std::vector? MSVC likes it and everything is padded
and aligned. Iirc, std::vector should honor alignas, right?

Pavel

May 15, 2023, 11:01:54 PM

Chris M. Thomasson wrote:
> On 5/14/2023 3:56 PM, Pavel wrote:
>> Chris M. Thomasson wrote:
>>> I can get this working with C++20, can anybody else? Over allocating
>>> and aligning on a page size:
>>> _________________________
>> [snipped]
>>> _________________________
>>>
>>> I get:
>>>
>>> page = 0000017135410000
>>>
>>> as an output. I get no crap output, so to speak... ;^)
>>>
>> [snipped]
>> Seems to work for me as well
>>
>> g++ (GCC) 13.1.1 20230429
>>
>> $ ./a.out
>> page = 0x55f3f83da000
>> $
>>
>> g++ (GCC) 13.1.1 20230429 on x86_64 GNU/Linux
>
> Nice! I do not have that version installed. How does it respond to the
> program that uses a std::vector?

The vector prog's output is empty, so everything is good.

> MSVC likes it and everything is padded
> and aligned. Iirc, std::vector should honor alignas, right?
>

I don't think it is the vector specifically that honors alignas; rather,
I think an object of type ct_page, whether allocated by vector or in any
other legal way, gets aligned according to its alignment specifier. (On
my system this is implementation-specific: alignof(std::max_align_t) is
16, so the alignments of 64 are "extended" alignments, but they happen to
be supported.)
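
A small sketch of that point, as I understand the C++17 rules: alignment is a property of the type, and both a new-expression and std::allocator (which std::vector uses by default) obtain storage through the std::align_val_t overloads of operator new once the type's alignment exceeds __STDCPP_DEFAULT_NEW_ALIGNMENT__. The names ct_page_demo, ct_is_aligned, ct_check_aligned_new and ct_check_allocator are hypothetical, and the static_asserts assume a typical desktop implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Hypothetical stand-in for ct_page.
struct alignas(8192) ct_page_demo
{
    unsigned char m_bytes[8192];
};

// The requested alignment is larger than the fundamental alignment, so it
// is an "extended" alignment; support for it is implementation-defined.
static_assert(alignof(ct_page_demo) == 8192);
static_assert(alignof(ct_page_demo) > alignof(std::max_align_t));

// Above this threshold, allocation goes through the aligned operator new.
static_assert(alignof(ct_page_demo) > __STDCPP_DEFAULT_NEW_ALIGNMENT__);

inline bool ct_is_aligned(void const* ptr, std::size_t alignment)
{
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Raw storage obtained directly from the aligned operator new overload.
inline bool ct_check_aligned_new()
{
    std::align_val_t const al{alignof(ct_page_demo)};
    void* raw = ::operator new(sizeof(ct_page_demo), al);
    bool const ok = ct_is_aligned(raw, alignof(ct_page_demo));
    ::operator delete(raw, al);
    return ok;
}

// The same check through std::allocator, the default allocator of std::vector.
inline bool ct_check_allocator()
{
    std::allocator<ct_page_demo> alloc;
    ct_page_demo* p = alloc.allocate(2);
    bool const ok = ct_is_aligned(p, alignof(ct_page_demo))
                 && ct_is_aligned(p + 1, alignof(ct_page_demo));
    alloc.deallocate(p, 2);
    return ok;
}
```

If both checks return true, neither the vector nor new is doing anything special: they simply pass alignof(T) down to the allocation function.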

On a side note, see if your code benefits from using
std::hardware_constructive_interference_size instead of hardcoded 64 for
the cacheline_size.

HTH
-Pavel

Chris M. Thomasson

May 16, 2023, 6:47:45 PM

On 5/15/2023 8:01 PM, Pavel wrote:
> Chris M. Thomasson wrote:
>> On 5/14/2023 3:56 PM, Pavel wrote:
>>> Chris M. Thomasson wrote:
>>>> I can get this working with C++20, can anybody else? Over allocating
>>>> and aligning on a page size:
>>>> _________________________
>>> [snipped]
>>>> _________________________
>>>>
>>>> I get:
>>>>
>>>> page = 0000017135410000
>>>>
>>>> as an output. I get no crap output, so to speak... ;^)
>>>>
>>> [snipped]
>>> Seems to work for me as well
>>>
>>> g++ (GCC) 13.1.1 20230429
>>>
>>> $ ./a.out
>>> page = 0x55f3f83da000
>>> $
>>>
>>> g++ (GCC) 13.1.1 20230429 on x86_64 GNU/Linux
>>
>> Nice! I do not have that version installed. How does it respond to the
>> program that uses a std::vector?
> The vector prog's output is empty, so everything is good.

Excellent. Thanks for taking the time to give it a go, Pavel. I really
need to install a recent GCC. Stuck on MSVC right now for a project.

>> MSVC likes it and everything is padded and aligned. Iirc, std::vector
>> should honor alignas, right?
>>
> I don't think it is the vector specifically that honors alignas; rather,
> I think an object of type ct_page, whether allocated by vector or in any
> other legal way, gets aligned according to its alignment specifier. (On
> my system this is implementation-specific: alignof(std::max_align_t) is
> 16, so the alignments of 64 are "extended" alignments, but they happen to
> be supported.)

Remember those early hyperthreaded Pentiums? There was something called
the aliasing problem, which made two hyperthreaded threads falsely share
cache lines and destroyed performance. Iirc, the L2 cache lines were 128
bytes split into two 64-byte regions. The workaround was to offset the
stacks of each thread using alloca.

> On a side note, see if your code benefits from using
> std::hardware_constructive_interference_size instead of hardcoded 64 for
> the cacheline_size.

It will definitely benefit. I am wondering if
std::hardware_constructive_interference_size is always guaranteed to be
the l2 cache line size?

Pavel

May 16, 2023, 9:38:45 PM

np, my pleasure.

>
>>> MSVC likes it and everything is padded and aligned. Iirc, std::vector
>>> should honor alignas, right?
>>>
>> I don't think it is the vector specifically that honors alignas; rather,
>> I think an object of type ct_page, whether allocated by vector or in any
>> other legal way, gets aligned according to its alignment specifier. (On
>> my system this is implementation-specific: alignof(std::max_align_t) is
>> 16, so the alignments of 64 are "extended" alignments, but they happen to
>> be supported.)
>
> Remember those early hyperthreaded Pentiums? There was something called
> the aliasing problem, which made two hyperthreaded threads falsely share
> cache lines and destroyed performance. Iirc, the L2 cache lines were 128
> bytes split into two 64-byte regions. The workaround was to offset the
> stacks of each thread using alloca.

I remember there was a problem with those simulated threads not having a
cache of their own (I don't recall whether it was L2 or L1), but I don't
remember exactly what it was. The advice I remember was not to use
hyperthreading mode :-).

>
>
>> On a side note, see if your code benefits from using
>> std::hardware_constructive_interference_size instead of hardcoded 64
>> for the cacheline_size.
>
> It will definitely benefit. I am wondering if
> std::hardware_constructive_interference_size is always guaranteed to be
> the l2 cache line size?

Not in general; the Holy Standard's definition is, as expected, agnostic
of cache level :-).

GCC in particular implements both as the L1 cache line size (at least for
the architectures I care about):

Quote:
‘destructive-interference-size’
‘constructive-interference-size’
The values for the C++17 variables
‘std::hardware_destructive_interference_size’ and
‘std::hardware_constructive_interference_size’. The
destructive interference size is the minimum recommended
offset between two independent concurrently-accessed objects;
the constructive interference size is the maximum recommended
size of contiguous memory accessed together. Typically both
will be the size of an L1 cache line for the target, in bytes.
For a generic target covering a range of L1 cache line sizes,
typically the constructive interference size will be the small
end of the range and the destructive size will be the large
end.

The destructive interference size is intended to be used for
layout, and thus has ABI impact. The default value is not
expected to be stable, and on some targets varies with
‘-mtune’, so use of this variable in a context where ABI
stability is important, such as the public interface of a
library, is strongly discouraged; if it is used in that
context, users can stabilize the value using this option.

The constructive interference size is less sensitive, as it is
typically only used in a ‘static_assert’ to make sure that a
type fits within a cache line.
End-of-quote
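
A hedged sketch of how the two constants are typically used, following the quote above: the destructive size for layout (keeping independent objects apart), the constructive size in a static_assert. The ct_* names are hypothetical, and the fallback value of 64 is an assumption for libraries that do not define __cpp_lib_hardware_interference_size.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Use the library constants when available; otherwise assume 64 bytes.
#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t ct_destructive_size =
    std::hardware_destructive_interference_size;
inline constexpr std::size_t ct_constructive_size =
    std::hardware_constructive_interference_size;
#else
inline constexpr std::size_t ct_destructive_size = 64;   // assumption
inline constexpr std::size_t ct_constructive_size = 64;  // assumption
#endif

// Two counters updated by different threads: place them at least
// ct_destructive_size apart so they are not expected to share a line.
struct ct_counters
{
    alignas(ct_destructive_size) std::atomic<std::uint64_t> m_producer{0};
    alignas(ct_destructive_size) std::atomic<std::uint64_t> m_consumer{0};
};

static_assert(alignof(ct_counters) >= ct_destructive_size);
static_assert(sizeof(ct_counters) >= 2 * ct_destructive_size);

// Fields that are read together: check that they fit within one line.
struct ct_hot_pair
{
    std::uint32_t m_head;
    std::uint32_t m_tail;
};

static_assert(sizeof(ct_hot_pair) <= ct_constructive_size);

// Distance in bytes between the two counters of one ct_counters object.
inline std::size_t ct_counter_distance(ct_counters const& c)
{
    auto const a = reinterpret_cast<std::uintptr_t>(&c.m_producer);
    auto const b = reinterpret_cast<std::uintptr_t>(&c.m_consumer);
    assert(b > a);
    return static_cast<std::size_t>(b - a);
}
```

Since the destructive size affects layout, the ABI caveat in the quote applies to ct_counters: its size can change with the compiler's tuning options unless the value is pinned.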

HTH
-Pavel

Chris M. Thomasson

May 17, 2023, 9:27:14 PM

:^)

I am trying to target the l2 cache line, and also allow for alignment on
a large boundary, ct_page. This is a basic requirement for some of my
previous memory allocators. I can do this manually with some math. I am
trying to explore c++'s ability to do this for me, without using some
hackish means for alignment. The fun part is that we can round any
address in the ct_page down to its artificially large boundary. A header
structure can be sitting right before the boundary. Perfect, and highly
efficient for certain types of high performance per-thread allocators.
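
The rounding-down trick can be sketched roughly like this. For simplicity the sketch keeps a hypothetical header in the first bytes of the page itself rather than right before the boundary; ct_page_sketch, m_owner_thread and ct_page_from_ptr are made-up names, and the mask only works because the page size is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

inline constexpr std::size_t ct_page_size = 8192;  // same value as in the thread

static_assert((ct_page_size & (ct_page_size - 1)) == 0,
              "masking requires a power-of-two page size");

// Hypothetical page layout: a small header first, payload after it.
struct alignas(ct_page_size) ct_page_sketch
{
    std::uint32_t m_owner_thread;  // hypothetical header field
    unsigned char m_payload[ct_page_size - sizeof(std::uint32_t)];
};

static_assert(sizeof(ct_page_sketch) == ct_page_size);

// Clear the low bits of any address inside a page to recover the page base.
// Only valid for pointers that really point into a ct_page_sketch.
inline ct_page_sketch* ct_page_from_ptr(void* ptr)
{
    assert(ptr != nullptr);
    std::uintptr_t const addr = reinterpret_cast<std::uintptr_t>(ptr);
    std::uintptr_t const mask = static_cast<std::uintptr_t>(ct_page_size) - 1;
    return reinterpret_cast<ct_page_sketch*>(addr & ~mask);
}
```

A deallocation path in a per-thread allocator could then call ct_page_from_ptr on the pointer being freed and read m_owner_thread to decide whether the free is local or remote, with no lookup table involved.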

Perfect. Okay. Targeting the l2 cache line is the main goal.