
Custom Memory Manager & placement new


Christopher J. Pisz

Nov 22, 2016, 4:06:42 PM
So, I got asked on an interview today if I have ever done a custom
memory manager and if I am familiar with placement new.

Nope. I think I am accustomed to more high-level programming than what
they are looking for. Nonetheless, I'd like to learn about the concept.

Even after Googling placement new, I cannot fathom why I would use it.
Perhaps I have some old legacy hardware that does not actually have any
kind of programming interface, and the only way to communicate is to
plug things in at a specific location? Does that even happen these days?

What kind of scenarios are there with modern hardware and software where
you are using this kind of thing?

Also, why would one need a custom memory manager and what kinds of
custom things would their manager provide? I understand that a person
might have a limited amount of memory, but I don't see how taking over
what new and delete normally do is going to save you. Maybe you have to
only use specific locations, but similar to the previous question, why
would that happen?

Mark Blair

Nov 22, 2016, 4:13:22 PM
On 11/22/2016 3:06 PM, Christopher J. Pisz wrote:
> So, I got asked on an interview today if I have ever done a custom
> memory manager and if I am familiar with placement new.
>
> Nope. I think I am accustomed to more high-level programming than what
> they are looking for. Nonetheless, I'd like to learn about the concept.
>
> Even after Googling placement new, I cannot fathom why I would use it.
> Perhaps I have some old legacy hardware that does not actually have any
> kind of programming interface, and the only way to communicate is to
> plug things in at a specific location? Does that even happen these days?
>
> What kind of scenarios are there with modern hardware and software where
> you are using this kind of thing?

It's very useful when used with shared memory.

Chris M. Thomasson

Nov 22, 2016, 4:30:24 PM
Lock-free allocators are nice because they tend to outperform
malloc/free under load wrt multiple threads and/or processes
concurrently allocating and freeing blocks of memory. The Intel TBB is
pretty nice. Placement new is great for being able to integrate with
custom memory management solutions. Shared memory is a very nice example
where placement new comes into play.

Christopher J. Pisz

Nov 22, 2016, 4:35:28 PM
What does one use shared memory for? Faster IPC? What are some
examples of where you've used it?

From what I am reading, there is no shared memory per se on Windows.
Only virtual addresses that map to regions of a file. If that is
correct, perhaps this is the reason I have not come across this, being
a Windows guy.

Mark Blair

Nov 22, 2016, 4:39:31 PM
On 11/22/2016 3:35 PM, Christopher J. Pisz wrote:
> On 11/22/2016 3:13 PM, Mark Blair wrote:
>> On 11/22/2016 3:06 PM, Christopher J. Pisz wrote:
>>> So, I got asked on an interview today if I have ever done a custom
>>> memory manager and if I am familiar with placement new.
>>>
>>> Nope. I think I am accustomed to more high-level programming than what
>>> they are looking for. Nonetheless, I'd like to learn about the
>>> concept.
>>>
>>> Even after Googling placement new, I cannot fathom why I would use it.
>>> Perhaps I have some old legacy hardware that does not actually have any
>>> kind of programming interface, and the only way to communicate is to
>>> plug things in at a specific location? Does that even happen these days?
>>>
>>> What kind of scenarios are there with modern hardware and software where
>>> you are using this kind of thing?
>>
>> It's very useful when used with shared memory.
>>
>>>
>>> Also, why would one need a custom memory manager and what kinds of
>>> custom things would their manager provide? I understand that a person
>>> might have a limited amount of memory, but I don't see how taking over
>>> what new and delete normally do is going to save you. Maybe you have to
>>> only use specific locations, but similar to the previous question, why
>>> would that happen?
>>>
>>
>
> What does one use shared memory for? Faster IPC? What are some
> examples of where you've used it?

On Linux, between separate processes that share structured data. In one
case we have a Java GUI that uses JNI to interface to data populated
from a C++ daemon.

> From what I am reading, there is no shared memory per se on Windows.

There *is* shared memory on Windows, although it's been a while since
I've used it.

> Only virtual addresses that map to regions of a file. If that is
> correct, perhaps, this is the reason I have not come across this, being
> a Windows guy.

Most every OS uses virtual addresses, even for shared memory.

--
Mark

Chris M. Thomasson

Nov 22, 2016, 4:41:53 PM
Take a look at memory mapped files:

https://msdn.microsoft.com/en-us/library/ms810613.aspx

That is from 1993.

Louis Krupp

Nov 22, 2016, 4:52:13 PM
"Placement new" decouples memory allocation from invocation of the
constructor. If you already have memory that you want to use for an
object, whether it's static or automatic or allocated by something
other than malloc(), you can use placement new to run the object's
constructor with calling malloc().

As Chris said, there are alternatives to malloc() and free(), which
are general-purpose routines. When you're allocating and deallocating
memory for multiple objects of the same type (and therefore of the
same size), it sometimes makes sense to preallocate memory for a bunch
of contiguous objects and then keep track of which ones are in use.
When it's time to initialize a new object of that type, pick an unused
object and call placement new to call the constructor with the
object's address.

This page looks useful:

https://isocpp.org/wiki/faq/dtors#placement-new

Louis

Paavo Helde

Nov 22, 2016, 5:24:41 PM
On 22.11.2016 23:06, Christopher J. Pisz wrote:
> So, I got asked on an interview today if I have ever done a custom
> memory manager and if I am familiar with placement new.
>
> Nope. I think I am accustomed to more high-level programming than what
> they are looking for. Nonetheless, I'd like to learn about the concept.
>
> Even after Googling placement new, I cannot fathom why I would use it.
> Perhaps I have some old legacy hardware that does not actually have any
> kind of programming interface, and the only way to communicate is to
> plug things in at a specific location? Does that even happen these days?
>
> What kind of scenarios are there with modern hardware and software where
> you are using this kind of thing?

STL implementations make use of placement new quite a lot, in places
like std::vector or std::make_shared. How do you construct objects
inside the contiguous buffer of a std::vector? Placement new is the only
option for non-PODs. Whenever you implement something like that
yourself, placement new will be needed.

In general, any time you allocate and release many items together, it
is more efficient to allocate a single large block and place the
objects inside it. Technically this could appear either as using
placement new or as using a custom pooled allocator, depending on the
exact implementation. When deep-parsing large XML files, for example,
such techniques can give a 100x speedup and 10x better memory usage
compared to dynamically allocating each XML node individually.


> Also, why would one need a custom memory manager and what kinds of
> custom things would their manager provide? I understand that a person
> might have a limited amount of memory, but I don't see how taking over
> what new and delete normally do is going to save you. Maybe you have to
> only use specific locations, but similar to the previous question, why
> would that happen?

One might want to use a custom memory manager for debugging leaks or
other bugs, for example. One might also want to use a faster
multithreaded memory manager like Intel TBB's instead of the standard
one; this can make a world of difference in heavily multithreaded
programs.

Cheers
Paavo






Paavo Helde

Nov 22, 2016, 5:32:53 PM
On 22.11.2016 23:35, Christopher J. Pisz wrote:
> From what I am reading, there is no shared memory per se on Windows.
> Only virtual addresses that map to regions of a file.

Technically yes, but this file can be the pagefile, meaning basically
that it is just a region of memory.





Louis Krupp

Nov 22, 2016, 7:02:39 PM
... that should have been "without calling malloc()."

Chris Vine

Nov 22, 2016, 7:27:13 PM
On Tue, 22 Nov 2016 15:06:33 -0600
"Christopher J. Pisz" <cp...@austin.rr.com> wrote:
You have been given some good examples as regards the use of placement
new (custom memory allocators are somewhat orthogonal to that,
although you may well use placement new in your custom allocator).
Leaving aside things like shared memory, these uses of placement new
usually come down to efficiency.

Say you want to copy an array of some heavy-duty objects of class type
T into a new array allocated on the heap (heavy duty in the sense that
their constructors do some significant work). A simple implementation
may allocate the memory for the new array of T with the new expression,
which will default construct a T object in each element of the array,
and then copy the existing array into the new one with, say,
std::copy(). That will copy construct a new set of T objects over the
default constructed ones.

A better approach may be to allocate an uninitialized array using
std::malloc(), and then to use placement new to copy construct the T
objects from the existing array directly into that uninitialized array.
There is a standard algorithm which will do that for you -
std::uninitialized_copy(): that will call up placement new for each
element of the new array.

From C++11 on, you can use the same approach for local arrays
constructed on the stack. Using alignas(T), you can allocate an array
of char with the correct alignment for your T objects and then copy
construct the T objects into the array with placement new and/or
std::uninitialized_copy().

As another example, I don't think it is possible to construct a
reasonable circular buffer without using placement new, for the same
reasons.

Chris

Chris Vine

Nov 22, 2016, 7:57:55 PM
On Wed, 23 Nov 2016 00:27:01 +0000
Chris Vine <chris@cv
> Say you want to copy an array of some heavy-duty objects of class type
> T into a new array allocated on the heap (heavy duty in the sense that
> their constructors do some significant work). A simple implementation
> may allocate the memory for the new array of T with the new
> expression, which will default construct a T object in each element
> of the array, and then copy the existing array into the new one with,
> say, std::copy(). That will copy construct a new set of T objects
> over the default constructed ones.

Errm. That should say that T's copy assignment operator will copy
assign the existing array of T objects into the previously default
constructed ones.

Christopher J. Pisz

Nov 22, 2016, 8:35:00 PM
So is this where I create my own "allocator", the kind that is an
argument to so many STL types? Or is that not required, or even
related?

Are there any good tutorials where I can actually see the difference for
myself and a simple example of how it can be used?


asetof...@gmail.com

Nov 23, 2016, 1:23:37 AM
The problem with all your HLL ways of seeing things, and all the
garbage-collector ways of seeing things: you lose contact with the
memory management (beyond a simple instruction such as add eax, ebx).
But memory and the final instructions the CPU executes are one subject
for the programmer.

Juha Nieminen

Nov 23, 2016, 2:10:41 AM
Christopher J. Pisz <cp...@austin.rr.com> wrote:
> Even after Googling placement new, I cannot fathom why I would use it.

std::vector and std::deque (and possibly some of the others) internally
use placement new, and it's essential to their functionality.

Christopher J. Pisz

Nov 23, 2016, 3:59:06 AM
On 11/22/2016 3:06 PM, Christopher J. Pisz wrote:
> So, I got asked on an interview today if I have ever done a custom
> memory manager and if I am familiar with placement new.
>
> Nope. I think I am accustomed to more high-level programming than what
> they are looking for. Nonetheless, I'd like to learn about the concept.
>
SNIP

So, I've Googled and read a few articles tonight. I implemented a test
(found below) and holy crap. It makes a difference by a factor of 100 on
my machine with the current settings in Release. Going from 12 seconds
to .12 seconds is pretty darn significant. So, there is something to be
said for this.

The test I implemented was following an article from IBM. I don't
understand a couple of things that they did:

1) They allocate 1 element at a time, rather than numElements * sizeof
the object to be stored. Why?

Also, the trick they are using in allocating the size of the Data object
and treating it like a Node pointer seems to be essential to their
design. How can we change the source to allocate numElements * sizeof
the object instead, if we wanted to?


2) Given that they are allocating one element at a time, just up front,
why am I seeing a performance increase as drastic as I am?

3) I don't seem to get any performance increase if I change the pool
size from 32 to 1024. Why?

Here is the amalgamated source code:


// Common Library
// #include "PerformanceTimer.h"

// Standard Includes
#include <iostream>

//------------------------------------------------------------------------------
const size_t g_numElements(10000);
const size_t g_iterations(5000);
const size_t g_poolSize(32);

//------------------------------------------------------------------------------
// Data class with no memory management
class NoMemoryManagement
{
public:
NoMemoryManagement(double realPart, double complexPart);

private:

double m_realPart;
double m_complexPart;
};

//------------------------------------------------------------------------------
// Data class that overrides operator new and delete to use a simple memory manager
class SpecificToSimpleMM
{
public:
SpecificToSimpleMM(double realPart, double complexPart);

void * operator new(size_t size);
void operator delete(void * pointerToDelete);

private:

double m_realPart;
double m_complexPart;
};

//------------------------------------------------------------------------------
class IMemoryManager
{
public:
virtual ~IMemoryManager() {}

virtual void * allocate(size_t) = 0;
virtual void free(void *) = 0;
};

//------------------------------------------------------------------------------
// Memory Manager that allocates space for multiple objects, of a specific type, at a time
//
// Customized for objects of type SpecificToSimpleMM and works only in single-threaded environments.
// Keeps a pool of SpecificToSimpleMM objects available and has future allocations occur from this pool.
class SimpleMemoryManager : public IMemoryManager
{
// Node in the memory pool
struct FreeStoreNode
{
FreeStoreNode * m_next;
};

void expandPoolSize();
void cleanUp();

// The memory pool
FreeStoreNode * m_freeStoreHead;

public:

SimpleMemoryManager();
virtual ~SimpleMemoryManager();

virtual void * allocate(size_t);
virtual void free(void *);
};

//------------------------------------------------------------------------------
NoMemoryManagement::NoMemoryManagement(double realPart, double complexPart)
:
m_realPart(realPart)
, m_complexPart(complexPart)
{
}

//------------------------------------------------------------------------------
SimpleMemoryManager g_memoryManager; // Global, yuck! Just following the online example

//------------------------------------------------------------------------------
SpecificToSimpleMM::SpecificToSimpleMM(double realPart, double complexPart)
:
m_realPart(realPart)
, m_complexPart(complexPart)
{
}

//------------------------------------------------------------------------------
void * SpecificToSimpleMM::operator new(size_t size)
{
return g_memoryManager.allocate(size);
}

//------------------------------------------------------------------------------
void SpecificToSimpleMM::operator delete(void * pointerToDelete)
{
g_memoryManager.free(pointerToDelete);
}

//------------------------------------------------------------------------------
SimpleMemoryManager::SimpleMemoryManager()
{
expandPoolSize();
}

//------------------------------------------------------------------------------
SimpleMemoryManager::~SimpleMemoryManager()
{
cleanUp();
}

//------------------------------------------------------------------------------
void SimpleMemoryManager::expandPoolSize()
{
// The trick to the design of this memory manager is that each element that is
// allocated is the larger of a FreeStoreNode pointer or the size of the object being stored.
// In this case, we are storing SpecificToSimpleMM objects, which will be larger than a pointer.
// So, while we treat elements as FreeStoreNode pointers, the void pointer returned by operator new
// points to SpecificToSimpleMM objects.
size_t size = (sizeof(SpecificToSimpleMM) > sizeof(FreeStoreNode *)) ? sizeof(SpecificToSimpleMM) : sizeof(FreeStoreNode *);

FreeStoreNode * head = reinterpret_cast<FreeStoreNode *>(new char[size]);
m_freeStoreHead = head;

for (size_t i = 0; i < g_poolSize; i++)
{
head->m_next = reinterpret_cast<FreeStoreNode *>(new char[size]);
head = head->m_next;
}

head->m_next = 0;
}

//------------------------------------------------------------------------------
void SimpleMemoryManager::cleanUp()
{
// Only cleans up memory from the blocks we allocated that did not get used yet.
// Otherwise, we have to rely on the user deleting any object he newed, as normal.
FreeStoreNode * nextPtr = m_freeStoreHead;

for (; nextPtr; nextPtr = m_freeStoreHead)
{
m_freeStoreHead = m_freeStoreHead->m_next;
delete[] reinterpret_cast<char *>(nextPtr); // this block was allocated as a char array
}
}

//------------------------------------------------------------------------------
void * SimpleMemoryManager::allocate(size_t size)
{
if (!m_freeStoreHead)
{
expandPoolSize();
}

FreeStoreNode * head = m_freeStoreHead;
m_freeStoreHead = head->m_next;

return head;
}

//------------------------------------------------------------------------------
void SimpleMemoryManager::free(void * deleted)
{
FreeStoreNode * head = static_cast<FreeStoreNode *> (deleted);
head->m_next = m_freeStoreHead;
m_freeStoreHead = head;
}

//------------------------------------------------------------------------------
void RunNoMemoryManagement()
{
/*
// Start timer
Common::PerformanceTimer timer;
timer.Start();
*/
// Do the work
NoMemoryManagement * array[g_numElements];

for (int i = 0; i < g_iterations; i++)
{
for (int j = 0; j < g_numElements; j++)
{
array[j] = new NoMemoryManagement(i, j);
}

for (int j = 0; j < g_numElements; j++)
{
delete array[j];
}
}
/*
// Stop the timer
double secondsElapsed = timer.Stop();
std::cout << "Test with no memory management took " <<
secondsElapsed << " seconds.\n";
*/
}

//------------------------------------------------------------------------------
void RunSimpleMemoryManagement()
{
/*
// Start timer
Common::PerformanceTimer timer;
timer.Start();
*/
// Do the work
SpecificToSimpleMM * array[g_numElements];

for (int i = 0; i < g_iterations; i++)
{
for (int j = 0; j < g_numElements; j++)
{
array[j] = new SpecificToSimpleMM(i, j);
}

for (int j = 0; j < g_numElements; j++)
{
delete array[j];
}
}
/*
// Stop the timer
double secondsElapsed = timer.Stop();
std::cout << "Test with simple memory management took " <<
secondsElapsed << " seconds.\n";
*/
}

//------------------------------------------------------------------------------
int main(int argc, char * argv[])
{
RunNoMemoryManagement();
RunSimpleMemoryManagement();

return 0;
}


Chris Vine

Nov 23, 2016, 5:46:46 AM
On Tue, 22 Nov 2016 19:34:50 -0600
My examples (and the other ones that have been offered) are not
directly concerned with designing allocator objects for C++
containers, but there is a connection.

References to "custom memory allocation" can mean a number of different
things. The simplest is the class specific operator new, whereby you
provide static class versions of at least operator new(std::size_t) and
operator new[](std::size_t), and usually (to respect the expected
interface) of their nothrow counterparts. These operators should just
allocate uninitialized memory - when called, the new and new[]
expressions (as opposed to these operators new) will construct objects
of the relevant class type for you within the uninitialized memory that
these operators allocate.

With class specific operator new you also provide the equivalent
class specific operator delete and operator delete[] methods. In
addition, to respect the interface that a user of the class expects you
would normally provide a class specific placement new (operator
new(std::size_t, void*)), but this should usually just forward to the
global placement operator new (::operator new(std::size_t, void*)).
This is because with
placement new the uninitialized memory is provided externally by the
caller, not by operator new. The memory just gets handed on.

Then there are custom allocator objects for containers and certain
other types in the standard library that are allocator aware, as you
have mentioned. The idea is similar: looked at in the round, the
allocate() methods allocate uninitialized memory and the deallocate()
methods deallocate it. std::allocator_traits also provides a default
construct() method for placing items in that memory, which just calls
placement new, and a default destroy() method for removing items, which
just calls the data item's destructor directly: these construct() and
destroy() methods are syntactic sugar for placement new and a
destructor call. An obvious example of their usage is
std::vector::reserve(). If you call that method, std::vector's
allocator will provide a region of uninitialized memory for the
vector. When you push items into the vector, objects will be
constructed in that memory by std::vector's implementation using
placement new.

I gave the example of a fixed size circular buffer. Any reasonable
implementation for generic data types will, when the buffer is
constructed, allocate uninitialized memory for the buffer. Items will
be pushed onto the buffer using placement new. When popped off, old
items will be destroyed by calling the data type's destructor directly.
The buffer's memory will constantly be re-used for the in-place
construction of data items and will not be deallocated until the buffer
itself is destroyed.

I am afraid I don't actually know of any good tutorials. There is a bit
about allocators in section 34.4 of TC++PL, 4th edition.

Chris

Chris M. Thomasson

Nov 23, 2016, 4:24:28 PM
I remember a long time ago when I was exploring custom allocators in C++:

https://groups.google.com/d/topic/comp.lang.c++/48Tm8j8ag-0/discussion

The program had no undefined behavior and was confirmed as a compiler
bug in MSVC 8 and 9 by James Kanze:

https://groups.google.com/forum/#!original/comp.lang.c++/48Tm8j8ag-0/o8GXMeZG1fwJ

A quote from the post linked above:
________________________________________
Well, there's no undefined behavior. Your program seems
perfectly legal and well defined to me. It looks like a bug in
VC++, see §12.5/5:

When a delete-expression is executed, the selected
deallocation function shall be called with the address
of the block of storage to be reclaimed as its first
argument and (if the two-parameter style is used) the
size of the block as its second argument.

And I can't think of any way of interpreting "the size of the
block" to mean anything other than the size requested in the
call to operator new.
________________________________________

This was back when I was still trying out custom allocation techniques
in C++.

Chris M. Thomasson

Nov 23, 2016, 11:20:12 PM
I am sad to say that MSVC 2015 has the exact same error for me! I get:
_________________________________________
custom_allocator::allocate(00BBD930, 2234)
custom_allocator::deallocate(00BBD930, 2234)
custom_allocator::allocate(00BC0038, 11170)
custom_allocator::deallocate(00BC0038, 2234)
_________________________________________

This is very wrong!

However, it works online here in a different compiler:

http://cpp.sh/94sf7
_________________________________________
custom_allocator::allocate(0x20b0900, 2234)
custom_allocator::deallocate(0x20b0900, 2234)
custom_allocator::allocate(0x20b0900, 11178)
custom_allocator::deallocate(0x20b0900, 11178)
_________________________________________

and on GCC.

AFAICT, MSVC has a bug here in Version 14.0.25424.00 Update 3 at least...

Damn.

Chris Vine

Nov 24, 2016, 5:50:25 AM
On Wed, 23 Nov 2016 20:20:00 -0800
"Chris M. Thomasson" <inv...@invalid.invalid> wrote:
> AFAICT, MSVC has a bug here in Version 14.0.25424.00 Update 3 at
> least...
>
> Damn.

This is a really old bug in delete[] which has been around for ages,
and which Microsoft will readily admit to. However they say they can't
change it because it would break code which relies on the faulty
implementation (not least their own Windows operating system). It has
cropped up on this newsgroup a few times before. What is passed as the
size_t argument to operator delete[] is the element size of the type
for which new[] was called, and not the allocated block size. Quite
what led them to think that is what the standard requires is beyond me.

The first suggested work-around is to have your class specific operator
new[] and operator delete[] forward to global operator new[] and
global operator delete[] so that your custom allocator only works when
allocating individual objects (or to not provide class specific operator
new[] and operator delete[] at all, which has the same effect), which is
what most people do. If you really need custom allocation for arrays
and the object is not std::allocator aware, the other workaround is to
use the initial few bytes of any memory allocated with new[] (say, an
unsigned int's worth) to store the size of the allocation, and to return not
the beginning of the memory block allocated, but the address immediately
after the int (the start of "object space"). operator delete[] can then
read back four bytes and obtain the int value when it comes to delete[]
time.

mark

Nov 24, 2016, 6:50:55 AM
On 2016-11-24 11:50, Chris Vine wrote:
> The first suggested work-around is to have your class specific operator
> new[] and operator delete[] forward to global operator new[] and
> global operator delete[] so that your custom allocator only works when
> allocating individual objects (or to not provide class specific operator
> new[] and operator delete[] at all, which has the same effect), which is
> what most people do. If you really need custom allocation for arrays
> and the object is not std::allocator aware, the other workaround is to
> use the initial few bytes of any memory allocated with new[] (say, an
> unsigned int's worth) to store the size of the allocation, and to return not
> the beginning of the memory block allocated, but the address immediately
> after the int (the start of "object space"). operator delete[] can then
> read back four bytes and obtain the int value when it comes to delete[]
> time.

How do you read back those 4 bytes without relying on undefined behavior?

There are compilers (including GCC) that may mis-compile code like that
(it's C, but the restrictions on C++ pointer math should be similar enough):
"C memory object and value semantics: the space of de facto and ISO
standards"
https://www.cl.cam.ac.uk/~pes20/cerberus/notes30.pdf


Chris Vine

Nov 24, 2016, 8:07:54 AM
This is a workaround for microsoft's compiler for its x86/64 products.
You can do whatever that compiler and/or the x86/64 platform will
accept. This also happens to be microsoft's suggestion (although I
cannot say that I now remember what size of integer value they
recommend).

There are two issues, alignment and strict aliasing. On alignment,
with x86/64 that only goes to speed of access and not correctness. An
initial int value will in practice be fine. If you remain worried
about it after testing, you can use an initial segment size of
alignof(std::max_align_t) in which to put your integer value, which is
8 bytes on 32-bit windows and 16 bytes on 64-bit.

Strict aliasing is not an issue, provided you don't write or read the
memory block's initial integer value in a way which breaks C++'s strict
aliasing rules: that is, manipulate and read the initial integer value
only via unsigned char* or via std::memcpy()[1].

What else do you think is problematic?

Incidentally gcc is irrelevant to this. It provides the correct value
to operator delete[] to begin with.

Chris

[1] This is being pedantic. After casting operator delete[]'s void*
argument to some other pointer type, type information about the dynamic
type of the content is absent and indeed all objects formally in that
block have already been destroyed, so microsoft's compiler cannot (and
anyway will not) optimize against it. It is just a block of bytes at
that stage, awaiting release to the memory pool or whatever else the
custom allocator does.

mark

Nov 24, 2016, 10:31:46 AM
Your delete receives a pointer to the middle of the memory block. If you
do pointer arithmetic on that to get a pointer to the beginning of your
memory block, you have undefined behavior.

> [1] This is being pedantic.

This is not being pedantic. Some compiler developers interpret the
standard in an extremely aggressive (user-unfriendly) way and don't care
if they break real-world code if they might get some tiny performance
improvement. Maybe you can get away with this, if you restrict your code
exclusively to VC++, but with GCC or CLang I wouldn't count on it.

> After casting operator delete[]'s void* argument to some other
> pointer type, type information about the dynamic type of the content
> is absent and indeed all objects formally in that block have already
> been destroyed,

The pointer origin and original type can still be tracked in a lot of
cases and the compiler will be able to prove where the pointer came from.

The delete[] operator will receive a pointer that pointed to some array
S[]. The only pointer math that the standard allows is within the bounds
of S[] and one beyond. Casting to char* doesn't allow you to go outside
the bounds of S[]. Doing math on uintptr_t doesn't change that.

> so microsoft's compiler cannot (and anyway will not) optimize against
> it.

VC++ is less aggressive with optimizations relying on undefined
behavior, but is that guaranteed somewhere?

Chris Vine

Nov 24, 2016, 11:36:48 AM
On Thu, 24 Nov 2016 16:31:39 +0100
mark <ma...@invalid.invalid> wrote:
> On 2016-11-24 14:07, Chris Vine wrote:
[snip]
> > This is a workaround for microsoft's compiler for its x86/64
> > products. You can do whatever that compiler and/or the x86/64
> > platform will accept. This also happens to be microsoft's
> > suggestion (although I cannot say that I now remember what size of
> > integer value they recommend).
> >
> > There are two issues, alignment and strict aliasing. On alignment,
> > with x86/64 that only goes to speed of access and not correctness.
> > An initial int value will in practice be fine. If you remain
> > worried about it after testing, you can use an initial segment size
> > of alignof(std::max_align_t) in which to put your integer value,
> > which is 8 bytes on 32-bit windows and 16 bytes on 64-bit.
> >
> > Strict aliasing is not an issue. Just don't write or read the
> > memory block's initial integer value in a way which breaks C++'s
> > strict aliasing rules, that is, manipulate or read the initial
> > integer value via char* or via std::memcpy()[1].
> >
> > What else do you think is problematic?
>
> Your delete receives a pointer to the middle of the memory block. If
> you do pointer arithmetic on that to get a pointer to the beginning
> of your memory block, you have undefined behavior.

This is wrong. See below.

> > [1] This is being pedantic.
>
> This is not being pedantic. Some compiler developers interpret the
> standard in an extremely aggressive (user-unfriendly) way and don't
> care if they break real-world code if they might get some tiny
> performance improvement. Maybe you can get away with this, if you
> restrict your code exclusively to VC++, but with GCC or CLang I
> wouldn't count on it.

There is no getting away with anything if you access the integer value
with unsigned char* or std::memcpy(). See below.

But in any event, as I have already said, this is only concerned with
VC++. There is no way you would bother with this rigmarole with either
gcc or clang, which get operator delete[] right in the first place.

> > After casting operator delete[]'s void* argument to some other
> > pointer type, type information about the dynamic type of the content
> > is absent and indeed all objects formally in that block have already
> > been destroyed,
>
> The pointer origin and original type can still be tracked in a lot of
> cases and the compiler will be able to prove where the pointer came
> from.
>
> The delete[] operator will receive a pointer that pointed to some
> array S[]. The only pointer math that the standard allows is within
> the bounds of S[] and one beyond. Casting to char* doesn't allow you
> to go outside the bounds of S[]. Doing math on uintptr_t doesn't
> change that.

You are wrong about this. operator new[] and operator delete[] know
nothing of types, whether array S[] or anything else. All they know
about is raw bytes of allocated memory passed by void*. They are the
equivalent of malloc() and free() (and many global operator new[]
and operator delete[] just call malloc() and free() and do nothing
else).

I think you may be confusing this with the new[] and delete[]
_expressions_, which do know about types and will construct or receive a
typed array (albeit in the case under discussion beginning 4 bytes off
the beginning of the raw memory originally allocated by operator
new[]). By the time you get to operator delete[], the delete[]
_expression_ has already destroyed all the objects in that typed array.
All that is left as an object is the integer value that operator new[]
originally put at the beginning of the allocated memory.

You are doing three things in this putative operator delete[]:

(i) you are working back to the beginning of the memory block
originally allocated by your custom operator new[]. Using pointer
arithmetic with unsigned char* is guaranteed by the standard to give the
correct result and I cannot think why you think otherwise. At no point
is any pointer going beyond the memory block allocated by operator
new[], so there can be no undefined behaviour.

(ii) you are then either using array addressing with unsigned char* or
using std::memcpy() to copy the initial 4 bytes into a local unsigned
int. If you are doubtful about the first, see §3.10/10 of C++11/14,
last bullet. In practice use std::memcpy(), as in all decent compilers
the call will be optimized out - it is the recommended way of dealing
with aliasing for this kind of case.

(iii) you then deallocate this raw block of memory with your custom
deallocator.
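For VC++ specifically, a minimal sketch of these three steps might look
like the following (the class name, the header size, and the
last_recorded bookkeeping are my own illustrative choices, not part of
microsoft's actual recommendation):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Class-scope operator new[] records the requested size at the front of
// the raw block; operator delete[] works back to the block start with
// unsigned char* arithmetic and reads the integer via std::memcpy().
struct Widget {
    static std::size_t last_recorded;  // observed by the demo, for show only

    static void* operator new[](std::size_t n) {
        const std::size_t header = alignof(std::max_align_t);
        unsigned char* block =
            static_cast<unsigned char*>(std::malloc(header + n));
        if (!block) throw std::bad_alloc();
        unsigned int size = static_cast<unsigned int>(n);
        std::memcpy(block, &size, sizeof size);  // put the integer up front
        return block + header;                   // hand out the shifted pointer
    }

    static void operator delete[](void* p) noexcept {
        if (!p) return;
        const std::size_t header = alignof(std::max_align_t);
        // (i) work back to the beginning of the raw block
        unsigned char* block = static_cast<unsigned char*>(p) - header;
        // (ii) read the integer with std::memcpy() to stay aliasing-clean
        unsigned int size;
        std::memcpy(&size, block, sizeof size);
        last_recorded = size;
        // (iii) hand the raw block to the deallocator
        std::free(block);
    }

    double payload;
};

std::size_t Widget::last_recorded = 0;
```

After `new Widget[3]` followed by `delete[]`, last_recorded holds at
least 3 * sizeof(Widget) bytes (the new[] expression may have asked for
a few extra bytes for its own bookkeeping).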

> > so microsoft's compiler cannot (and anyway will not) optimize
> > against it.
>
> VC++ is less aggressive with optimizations relying on undefined
> behavior, but is that guaranteed somewhere?

If you use either unsigned char* or std::memcpy() the aliasing is
entirely compliant and guaranteed to work. See §3.10/10 of C++11/14 as
regards unsigned char*.

Jorgen Grahn

unread,
Nov 24, 2016, 12:28:29 PM11/24/16
to
On Tue, 2016-11-22, Mark Blair wrote:
> On 11/22/2016 3:35 PM, Christopher J. Pisz wrote:
>> On 11/22/2016 3:13 PM, Mark Blair wrote:
>>> On 11/22/2016 3:06 PM, Christopher J. Pisz wrote:
>>>> So, I got asked on an interview today if I am have ever done a custom
>>>> memory manager and if I am familiar with placement new.
>>>>
>>>> Nope. I think I am accustomed to more high level programming then what
>>>> they are looking for. None the less, I'd like to learn about the
>>>> concept.
>>>>
>>>> Even after Googling placement new, I cannot fathom why I would use it.
>>>> Perhaps I have some old legacy hardware that does not actually have any
>>>> kind of programming interface, and the only way to communicate is to
>>>> plug things in at a specific location? Does that even happen these days?
>>>>
>>>> What kind of scenarios are there with modern hardware and software where
>>>> you are using this kind of thing?
>>>
>>> It's very useful when used with shared memory.
>>>
...
>>
>> What does one use shared memory for? Faster IPC? What are some examples?
>> where you've used it?
>
> On Linux, between separate processes that share structured data. In one
> case we have a Java GUI that uses JNI to interface to data populated
> from a C++ daemon.

I think I'd tend to treat that memory as an unstructured buffer, or at
most a dumb struct. The object wrapping it could have a pointer to the
buffer, rather than be created using placement new. The class would
have to be very carefully written anyway.

I'm like Christopher -- never used placement new, never think I will.
If I have to, I'll read up on it, like with so many other language
features.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

mark

unread,
Nov 24, 2016, 1:04:11 PM11/24/16
to
No, I don't.

> By the time you get to operator delete[], the delete[]
> _expression_ has already destroyed all the objects in that typed array.
> All that is left as an object is the integer value that operator new[]
> originally put at the beginning of the allocated memory.

That doesn't change the fact that it was an S[]. If all the code is
available inline, the compiler can infer that it was an S[]; the type
never changed to something else, so it cannot be a pointer into a
larger char array. And you cannot leave the bounds of S[].

There is no special allowance for pointer math for delete[] in the standard.

> You are doing three things in this putative operator delete[]:
>
> (i) you are working back to the beginning of the memory block
> originally allocated by your custom operator new[]. Using pointer
> arithmetic with unsigned char* is guaranteed by the standard to give the
> correct result and I cannot think why you think otherwise.

It gives the numerically correct result.

> At no point is any pointer going beyond the memory block allocated by
> operator new[], so there can be no undefined behaviour.

It goes beyond the bounds of S[] - which it was just previously. How
does that pointer that pointed to S[] magically change to a pointer for
the outer block? That it was auto-cast to void* doesn't change its
history or provenance.

What do you think the dynamic type of a pointer to an S[] is over its
lifetime (after construction, after the destructor has run, when the
deallocation function is called)? If the dynamic type changes, how does
that happen? Where in the standard is that spelled out?

Do you think your pointer math is valid over the entire lifetime of S[]
or only after the destructor has run?

Chris Vine

unread,
Nov 24, 2016, 1:22:57 PM11/24/16
to
On Thu, 24 Nov 2016 19:04:01 +0100
You say you are not confusing operator new[] and delete[] with the
new[] and delete[] expressions, but you are contradicting yourself
because you plainly are. operator new[] and operator delete[] do not
deal in arrays. They deal in bytes of memory. End of story.

Richard

unread,
Nov 24, 2016, 1:27:53 PM11/24/16
to
[Please do not mail me a copy of your followup]

Louis Krupp <lkr...@nospam.pssw.com.invalid> spake the secret code
<2ie93c1mhlvmebomk...@4ax.com> thusly:

>"Placement new" decouples memory allocation from invocation of the
>constructor. If you already have memory that you want to use for an
>object, whether it's static or automatic or allocated by something
>other than malloc(), you can use placement new to run the object's
>constructor without calling malloc().

I've also seen it used to ensure that the memory allocation happens
inside a DLL but the constructor runs in whatever code is creating the
instance of a type defined by the DLL.

This is a Windows annoyance of how DLLs behave, but if you have this
problem then it is something you have to deal with. Another
alternative is to expose only interfaces with virtual d'tors and
provide factory functions for creating instances. (Gee, sounds like
COM!)
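To make the decoupling concrete, here is a minimal sketch of placement
new into a stack buffer (the Counter type and function name are mine,
purely for illustration):

```cpp
#include <new>  // declares the placement form of operator new

// A trivial type so the construct/destroy sequence is easy to follow.
struct Counter {
    int value;
    explicit Counter(int v) : value(v) {}
};

// Construct an object in storage we already own -- no heap allocation.
int placement_demo() {
    alignas(Counter) unsigned char buf[sizeof(Counter)];
    Counter* c = new (buf) Counter(41);  // placement new: constructor only
    int out = c->value + 1;
    c->~Counter();  // explicit destructor call; the storage is not freed
    return out;
}
```

The buffer could just as well be a shared memory segment or a
statically allocated pool; placement new only runs the constructor.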
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Terminals Wiki <http://terminals-wiki.org>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>

Richard

unread,
Nov 24, 2016, 1:39:05 PM11/24/16
to
[Please do not mail me a copy of your followup]

som...@somewhere.net spake the secret code
<o12dlk$gv$1...@dont-email.me> thusly:

>What does one use shared memory for? Faster IPC? What are some examples?
>where you've used it?

Yes, faster IPC. If two processes are on the same machine, they don't
need sockets to talk, just a shared memory segment. If you're using
some flavor of unix, the X Window System has had this for a very long
time to enable more efficient communication between clients running on
the same machine as the display server.

> From what I am reading, there is no shared memory per se on Windows.
>Only virtual addresses that map to regions of a file. If that is
>correct, perhaps, this is the reason I have not come across this, being
>a Windows guy.

It's the same thing under a different name:
<https://msdn.microsoft.com/en-us/library/windows/desktop/aa366551(v=vs.85).aspx>

MSDN calls it "memory-mapped files that the system paging file
stores". If you look at that example, there is no actual file
specified, which tells Windows "use the paging file", which is just a
slightly obtuse way of saying "give me a piece of virtual memory as a
file". That's because Windows is going to expose the sharing through
file handles so we need some way of getting a file handle for a chunk
of virtual memory in order to share a memory segment. They use a
"filename" (Global\\MyFileMappingObject) to identify the segment between
the two processes.

There are analogous APIs in unix for all of this and the result is
basically the same. You have a chunk of memory that can be shared
between two processes. For System V shared memory segments (I don't
think BSD ever came up with something that did the same thing), you
want to look at functions shmget, shmat, shmctl, etc.
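As a rough sketch of the idea on a POSIX system (the function name is
mine, and an anonymous MAP_SHARED mapping stands in for a named
segment):

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Parent and child share one int through an anonymous MAP_SHARED
// mapping; the child's write is visible to the parent after waitpid().
int shared_counter_demo() {
    void* p = mmap(nullptr, sizeof(int), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    int* shared = static_cast<int*>(p);
    *shared = 0;

    pid_t pid = fork();
    if (pid == 0) {            // child: write through the shared mapping
        *shared = 42;
        _exit(0);
    }
    waitpid(pid, nullptr, 0);  // parent: wait, then read the child's write
    int result = *shared;
    munmap(p, sizeof(int));
    return result;
}
```

A named segment (shm_open() on POSIX, CreateFileMapping() on Windows)
works the same way between unrelated processes.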

Richard

unread,
Nov 24, 2016, 1:41:11 PM11/24/16
to
[Please do not mail me a copy of your followup]

Jorgen Grahn <grahn...@snipabacken.se> spake the secret code
<slrno3e8pj.5...@frailea.sa.invalid> thusly:

>I think I'd tend to treat that memory as an unstructured buffer, or at
>most a dumb struct. The object wrapping it could have pointer to the
>buffer, rather than be created using placement new. The class would
>have to be very carefully written anyway.

All the instances I've seen treat the chunk of memory as a binary
buffer into which structures are serialized/deserialized.

Going back to the X Window System situation, this makes the perfect
place for the client to write its protocol stream and for the server
to read the protocol stream. They just don't communicate the protocol
bytes over a socket.

mark

unread,
Nov 24, 2016, 2:05:03 PM11/24/16
to
There are no bytes of memory in the standard.

Pointer arithmetic is only allowed within array bounds. Read the
standard, "5.7 Additive operators [expr.add]".

Either that void* is a pointer into an array of some type (that array
must include your size variable) or your pointer math is undefined
behavior.

Chris Vine

unread,
Nov 24, 2016, 3:15:29 PM11/24/16
to
On Thu, 24 Nov 2016 20:04:47 +0100
mark <ma...@invalid.invalid> wrote:
[snip]
> There are no bytes of memory in the standard.
>
> Pointer arithmetic is only allowed within array bounds. Read the
> standard, "5.7 Additive operators [expr.add]".
>
> Either that void* is a pointer into an array of some type (that array
> must include your size variable) or your pointer math is undefined
> behavior.

Despite all the previous things you have posted this is an interesting
point. I suppose the first thing to say is that contrary to what you
say there are bytes of memory in the standard. They are referred to as
a "block of storage" having a length in bytes.

Let's take the return value of operators new and new[]. §3.7.4.1/2 of
the standard says:

"If it is successful, it shall return the address of the start of a
block of storage whose length in bytes shall be at least as large as
the requested size. ... The pointer returned shall be suitably aligned
so that it can be converted to a pointer of any complete object type
with a fundamental alignment requirement (3.11) and then used to access
the object or array in the storage allocated (until the storage is
explicitly deallocated by a call to a corresponding deallocation
function)."

It therefore provides for the block of storage returned via void* by
operator new/new[] to be convertible to, amongst any other type, pointer
to unsigned char or pointer to unsigned int and then used to access an
object or array of objects of those types in the storage. It must
allow this, or there would be no purpose in allocating memory in
the first place. Upon constructing objects in that block of storage,
they become accessible and, if it is an array, §5.7 applies.

Objects or arrays are placed into the block of storage by placement new
or (for trivial types) just by memcpy()ing it in. That is what the
new/new[] expression (as opposed to operator new) does. Arguably on the
most pedantic level, when implementing the microsoft workaround you
should place the integer value into the memory as an array of char, so
that subsequently using pointer arithmetic on it would then meet the
requirements of §5.7.

As a matter of fact the microsoft workaround does not (as I recall it)
require one to go to the additional trouble of converting to an array of
char. The overarching point on that is, as I have now said for the
third time, that this is only concerned with VC++ and is the way they
deal with the issue. One can therefore take it that, with VC++, it
works. And for the reasons mentioned it seems to me that it is
required to work.

Paavo Helde

unread,
Nov 24, 2016, 3:19:27 PM11/24/16
to
The technique described by Chris is actually often used by compilers
themselves. When you delete[] an array allocated via custom memory
allocator, the compiler needs to know how many destructors to call. How
does it know? Simple, it asks for 4 or 8 bytes more memory from your
custom allocator, stores the object count in the beginning of the block
and shifts the pointer before returning it from the new[] - that's the
same technique as described by Chris.

Of course, implementation can do things which are forbidden for mere
mortals, but it still shows the memory is not considered as pure array
of N objects.

Demo: MSVC2013:

#include <iostream>
#include <string>
#include <cstdlib>  // for malloc()/free()

class A {
public:
    void* operator new[](size_t n) {
        void* p = malloc(n);
        std::cout << "Allocating " << n << " bytes at " << p << "\n";
        return p;
    }
    void operator delete[](void* p) {
        std::cout << "Releasing " << p << "\n";
        free(p);  // actually release the block obtained from malloc()
    }
    std::string s;
};

int main() {
    std::cout << "Creating array of 10 strings ("
              << 10 * sizeof(std::string) << " bytes)\n";
    A* a = new A[10];
    std::cout << "Array starts at " << a << "\n";
    delete[] a;
}


Creating array of 10 strings (400 bytes)
Allocating 408 bytes at 00000000000EB5E0
Array starts at 00000000000EB5E8
Releasing 00000000000EB5E0


Jerry Stuckle

unread,
Nov 24, 2016, 3:58:56 PM11/24/16
to
On 11/24/2016 1:38 PM, Richard wrote:
> [Please do not mail me a copy of your followup]
>
> som...@somewhere.net spake the secret code
> <o12dlk$gv$1...@dont-email.me> thusly:
>
>> What does one use shared memory for? Faster IPC? What are some examples?
>> where you've used it?
>
> Yes, faster IPC. If two processes are on the same machine, they don't
> need sockets to talk, just a shared memory segment. If you're using
> some flavor of unix, the X Window System has had this for a very long
> time to enable more efficient communication between clients running on
> the same machine as the display server.
>

Don't forget if any process writes to the shared memory, all access to
the shared memory must be protected by mutex semaphores. That slows
things down a bit - but it's still a lot faster than any other IPC.

--
==================
Remove the "x" from my email address
Jerry Stuckle
jstu...@attglobal.net
==================

Chris M. Thomasson

unread,
Nov 24, 2016, 6:14:55 PM11/24/16
to
On 11/24/2016 12:58 PM, Jerry Stuckle wrote:
> On 11/24/2016 1:38 PM, Richard wrote:
>> [Please do not mail me a copy of your followup]
>>
>> som...@somewhere.net spake the secret code
>> <o12dlk$gv$1...@dont-email.me> thusly:
>>
>>> What does one use shared memory for? Faster IPC? What are some examples?
>>> where you've used it?
>>
>> Yes, faster IPC. If two processes are on the same machine, they don't
>> need sockets to talk, just a shared memory segment. If you're using
>> some flavor of unix, the X Window System has had this for a very long
>> time to enable more efficient communication between clients running on
>> the same machine as the display server.
>>
>
> Don't forget if any process writes to the shared memory, all access to
> the shared memory must be protected by mutex semaphores.

Why? Keep in mind that C++11 has basically all the functionality we need
to efficiently avoid mutexes and/or semaphores when possible.


> That slows
> things down a bit - but it's still a lot faster than any other IPC.

Yes. Mutexes can definitely slow things down; however, they do have
their place.

Chris M. Thomasson

unread,
Nov 24, 2016, 7:01:53 PM11/24/16
to
On 11/22/2016 1:06 PM, Christopher J. Pisz wrote:
> So, I got asked on an interview today if I am have ever done a custom
> memory manager and if I am familiar with placement new.
>
> Nope. I think I am accustomed to more high level programming then what
> they are looking for. None the less, I'd like to learn about the concept.
>
> Even after Googling placement new, I cannot fathom why I would use it.
> Perhaps I have some old legacy hardware that does not actually have any
> kind of programming interface, and the only way to communicate is to
> plug things in at a specific location? Does that even happen these days?
>
> What kind of scenarios are there with modern hardware and software where
> you are using this kind of thing?
>
> Also, why would one need a custom memory manager and what kinds of
> custom things would their manager provide? I understand that a person
> might have a limited amount of memory, but I don't see how taking over
> what new and delete normally do is going to save you. Maybe you have to
> only use specific locations, but similar to the previous question, why
> would that happen?
>

Check this _Shi%_ out! Yikes, I see UB...

FWIW, one can run the code:

The origin of a Custom Allocator's base

online here:

http://cpp.sh/74po5

code:
___________________________________________________
#include <cstdio>
#include <cstdlib>
#include <cstddef>
#include <cstdint>
#include <cassert>


#define MAX_ALIGN (alignof(std::max_align_t))
#define CACHE_LINE 128
#define SB_ALIGN 4096
#define SB_BYTES (SB_ALIGN - 128 - 1)


static_assert(!(MAX_ALIGN & (MAX_ALIGN - 1)), "MAX_ALIGN Pow2 Error");
static_assert(!(CACHE_LINE & (CACHE_LINE - 1)), "Cache Line Pow2 Error");

static_assert(!(SB_ALIGN & (SB_ALIGN - 1)), "sblock Pow2 Error");
static_assert(!(CACHE_LINE % MAX_ALIGN), "Cache Line Alignment Error");
static_assert(!(SB_ALIGN % MAX_ALIGN), "sblock Alignment Error");


struct sblock_mem
{
    unsigned char mem[SB_ALIGN];
};


struct sblock_desc
{
    int a;
};


static_assert(sizeof(sblock_desc) <= CACHE_LINE, "sblock_desc Size Error");

union sblock_desc_union
{
    sblock_desc desc;
    unsigned char mem[CACHE_LINE];
};


struct sblock
{
    sblock_desc_union descu;
    sblock_mem blks;
};


#define ROUND_DOWN(mp_ptr, mp_align) \
    (((unsigned char*)(mp_ptr)) - \
     (((std::uintptr_t)(mp_ptr)) & ((mp_align) - 1)))

#define ROUND_UP(mp_ptr, mp_align) \
    (ROUND_DOWN(mp_ptr, mp_align) + (mp_align))


int main()
{
    unsigned char lbuf[sizeof(sblock) + SB_ALIGN] = { '\0' };
    unsigned char* abuf = ROUND_UP(lbuf, SB_ALIGN);
    sblock* sb = (sblock*)abuf;
    sb->descu.desc.a = 12345;

    unsigned char* sbp = (unsigned char*)sb;

    // 3967 is max offset because 4096-128 = 3968
    unsigned char* mem = sb->blks.mem + 3967;
    unsigned char* rdmem = ROUND_DOWN(mem, SB_ALIGN);

    sblock* sbfinal = (sblock*)rdmem;

    assert(sbfinal == sb);

    std::printf("%lu\n", (unsigned long)sizeof(sblock));
    std::printf("sbp:%p\n", (void*)sbp);
    std::printf("mem:%p\n", (void*)mem);
    std::printf("rdmem:%p\n", (void*)rdmem);
    std::printf("sbfinal:%p\n", (void*)sbfinal);
    std::printf("sbfinal->desc.a:%d\n", sbfinal->descu.desc.a);

    //std::fflush(stdout);
    //std::getchar();

    return 0;
}
___________________________________________________


The ROUND_* macros are making me think HACKED SHI%. But, it does seem to
work... Actually, IIRC, Intel TBB uses something very similar...

Chris M. Thomasson

unread,
Nov 24, 2016, 7:14:56 PM11/24/16
to
On 11/24/2016 4:01 PM, Chris M. Thomasson wrote:
> On 11/22/2016 1:06 PM, Christopher J. Pisz wrote:
>> So, I got asked on an interview today if I am have ever done a custom
>> memory manager and if I am familiar with placement new.
>>
>> Nope. I think I am accustomed to more high level programming then what
>> they are looking for. None the less, I'd like to learn about the concept.
>>
>> Even after Googling placement new, I cannot fathom why I would use it.
>> Perhaps I have some old legacy hardware that does not actually have any
>> kind of programming interface, and the only way to communicate is to
>> plug things in at a specific location? Does that even happen these days?
>>
>> What kind of scenarios are there with modern hardware and software where
>> you are using this kind of thing?
>>
>> Also, why would one need a custom memory manager and what kinds of
>> custom things would their manager provide? I understand that a person
>> might have a limited amount of memory, but I don't see how taking over
>> what new and delete normally do is going to save you. Maybe you have to
>> only use specific locations, but similar to the previous question, why
>> would that happen?
>>
>
> Check this _Shi%_ out! Yikes, I see UB...

Every time I posted this type of portion of my allocator code, I cringed
at the ROUND_* macros. But, it does allow one to use a free function
whose API just accepts a single void pointer. It rounds down to the
base, and has access to the super block. I used this technique to build
a multi-threaded allocator without using _any_ dynamic memory for a
resource constrained application. Everything was allocated on the stack.
A given number of threads could free allocations on other threads!

:^)

Scott Lurndal

unread,
Nov 27, 2016, 1:07:08 PM11/27/16
to
Jerry Stuckle <jstu...@attglobal.net> writes:
>On 11/24/2016 1:38 PM, Richard wrote:
>> [Please do not mail me a copy of your followup]
>>
>> som...@somewhere.net spake the secret code
>> <o12dlk$gv$1...@dont-email.me> thusly:
>>
>>> What does one use shared memory for? Faster IPC? What are some examples?
>>> where you've used it?
>>
>> Yes, faster IPC. If two processes are on the same machine, they don't
>> need sockets to talk, just a shared memory segment. If you're using
>> some flavor of unix, the X Window System has had this for a very long
>> time to enable more efficient communication between clients running on
>> the same machine as the display server.
>>
>
>Don't forget if any process writes to the shared memory, all access to
>the shared memory must be protected by mutex semaphores. That slows
>things down a bit - but it's still a lot faster than any other IPC.

There are many ways to ensure correctness
of shared data (between threads or between processes) without using
traditional mutexes or semaphores. Atomic accesses, for example,
can easily be used to maintain shared counters (e.g. gcc __sync_fetch_and_add,
__sync_and_and_fetch, et alia) while compare-and-swap can be
used to maintain linked lists without locks. Spinlocks,
read-copy-update and lock-free algorithms are all common in such
uses.
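A minimal sketch of such a lock-free shared counter in C++11 terms (the
function name and parameters are mine; std::atomic stands in for the
__sync builtins):

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Several threads bump one counter with fetch_add; no mutex is taken,
// yet the final total is exact.
int atomic_count(int nthreads, int per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; ++t) {
        threads.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& th : threads)
        th.join();
    return counter.load();
}
```

For cross-process use the counter would live in the shared segment
itself, and std::atomic<int> must be lock-free there for this to be
safe.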

As a note, if using mmap(2) with MAP_SHARED or shmat(2) between processes with
disparate address spaces, you must use special flavors of the pthread
mutex with the PTHREAD_PROCESS_SHARED (pthread_mutexattr_setpshared)
attribute or use traditional unix semop(2) functions. Regular pthread mutexes
are normally limited to within a process.

Chris M. Thomasson

unread,
Dec 5, 2016, 10:14:54 PM12/5/16
to
On 11/24/2016 4:01 PM, Chris M. Thomasson wrote:
[...]

> #define ROUND_DOWN(mp_ptr, mp_align) \
> (((unsigned char*)(mp_ptr)) - \
> (((std::uintptr_t)(mp_ptr)) & (mp_align - 1)))
>
> #define ROUND_UP(mp_ptr, mp_align) \
> (ROUND_DOWN(mp_ptr, mp_align) + (mp_align))

> ___________________________________________________
>
>
> The ROUND_* macros are making think HACKED SHI%. But, it does seem to
> work... Actually, IIRC, Intel TBB uses something very similar...

I am wondering if anybody has archs where the static_asserts fail?

Or, archs where they all pass, but the ROUND_* macros all fail?
