
Announcing CookMem, an OSS C++ memory context / allocator project


superdupe...@gmail.com

Sep 28, 2018, 11:10:45 PM
CookMem is a memory context / allocator written in C++11 that can be cleaned up easily and quickly by freeing its segments in bulk, rather than doing individual frees. It is extremely useful for dealing with third-party libraries that can potentially have memory leaks, or for simplifying memory management.

The secondary goal of this project is to provide performance comparable to dlmalloc and its derivatives. In fact, the algorithms and some of the code used here are mostly based on dlmalloc: I basically took the pain to understand dlmalloc and rewrote the logic in C++. Some key calculations, though, are kept the same as in dlmalloc, since there is no point in reinventing the wheel.

GitHub page: https://github.com/coconut2015/cookmem
Project documentation: https://coconut2015.github.io/cookmem/index.html

It is licensed under the Apache License 2.0.

Paavo Helde

Sep 29, 2018, 3:19:52 AM
On 29.09.2018 6:10, superdupe...@gmail.com wrote:
> CookMem is a a memory context / allocator written in C++11 that can be easily and quickly cleaned up by freeing the segments, rather than doing individual frees. It is extremely useful to deal with third party libraries that can potentially have memory leaks, or to simplify the memory management.

Just out of curiosity: how do you convince a crappy third-party library to
use this allocator? And how do you know when it's safe to release the
memory?

At first glance it seems to be lacking the standard
std::allocator interface and std::allocator_traits, which makes it
harder to use with non-crappy third-party C++ libraries.

Also, how does this compare with other existing pool allocators like the
Boost Pool Library?

superdupe...@gmail.com

Sep 29, 2018, 3:50:04 AM
On Saturday, September 29, 2018 at 12:19:52 AM UTC-7, Paavo Helde wrote:
> On 29.09.2018 6:10, superdupe...@gmail.com wrote:
> > CookMem is a a memory context / allocator written in C++11 that can be easily and quickly cleaned up by freeing the segments, rather than doing individual frees. It is extremely useful to deal with third party libraries that can potentially have memory leaks, or to simplify the memory management.
>
> Just for curiosity: how do you convince a crappy third-party library to
> use this allocator? And how do you know when it's safe to release the
> memory?

The safest approach is to replace malloc / free. It does not matter whether the third-party library is C or C++. Before calling the third-party library, switch malloc / free to use a memory pool. Once the call is finished, destroy the memory pool and switch back to whatever the previous memory pool or malloc / free was.

You can take a look at how I do the performance test comparing dlmalloc vs. CookMem:

https://github.com/coconut2015/cookmem/blob/master/performances/


> From the first glance it seems it is lacking the standard
> std::allocator interface and std::allocator_traits, which makes it
> harder to use with non-crappy third-party C++ libraries.

I believe the allocator interface just needs an allocate() and a deallocate(), so you can use cookmem::MemPool as the allocator.

That said, it is better to just replace malloc / free / new / delete, since mixing allocators is very error-prone (or at least my perception is that it is).

>
> Also, how does this compare with other existing pool allocators like the
> Boost Pool Library?

Before I started working on CookMem, I did look around for similar stuff, and nothing like it is available in C++ as open source. Boost Pool is for fixed-size allocations; that is why it is quite simple to implement and very limited in functionality.

Heng Yuan

Öö Tiib

Sep 29, 2018, 8:41:18 AM
Is my impression correct that it is only useful for single-threaded
software? That might be crippling in a world where even phones are
quad-core.


Paavo Helde

Sep 29, 2018, 1:33:15 PM
On 29.09.2018 10:49, superdupe...@gmail.com wrote:
> On Saturday, September 29, 2018 at 12:19:52 AM UTC-7, Paavo Helde wrote:
>> On 29.09.2018 6:10, superdupe...@gmail.com wrote:
>>> CookMem is a a memory context / allocator written in C++11 that can be easily and quickly cleaned up by freeing the segments, rather than doing individual frees. It is extremely useful to deal with third party libraries that can potentially have memory leaks, or to simplify the memory management.
>>
>> Just for curiosity: how do you convince a crappy third-party library to
>> use this allocator? And how do you know when it's safe to release the
>> memory?
>
> The safest approach is to replace malloc / free. It does not matter if the third party library is C or C++. Before calling the third party library, switch malloc / free to use a memory pool. Once the call is finished, destroy the memory pool and switch back to whatever the previous memory pool or malloc / free.

This sounds like enforcing a strictly single-threaded program, is this so?

Also, how exactly do you redirect malloc/free dynamically? It's already
pretty tricky enough to do it statically for the whole program,
especially in Windows.

This also assumes that the third-party library does not maintain any
internal state which uses dynamic memory, otherwise it might crash badly
in the next call or upon program exit. Not to speak of the fact that a
third-party library may even decide to spawn its own background threads,
which might not like it if global malloc/free were changed under their feet.

>
> You can take a look at how I do the performance test to compare dlmalloc vs CookMem:
>
> https://github.com/coconut2015/cookmem/blob/master/performances/
>
>
>> From the first glance it seems it is lacking the standard
>> std::allocator interface and std::allocator_traits, which makes it
>> harder to use with non-crappy third-party C++ libraries.
>
> I believe the allocator interface just need a allocate() and deallocate(). So you use cookmem::MemPool as allocator.
>
> That said, it is better to just replace malloc / free / new / delete since it is very error prone mixing allocators (or at least my perception is that it is error prone).

Yet you are proposing mixing allocators *on the fly*! If that's not
error-prone I do not know what is. Imagine having a local std::string
variable in a function which changes the global malloc/free.

>>
>> Also, how does this compare with other existing pool allocators like the
>> Boost Pool Library?
>
> Before I started working on CookMem, I did look around for similar stuff and nothing like it is available in C++ as open source. Boost Pool is for fixed size allocations. That is why it is quite simple to implement and very limited in functionality.

I have implemented a special pooled allocator for parsing and loading
large XML files into memory. This would be a use case for a
variable-size pooled allocator, but definitely this allocator would be
used only for a certain data structure and should not replace the global
malloc. For example, if I throw an exception during the XML parse I do
not want the error message string to be allocated in the pooled allocator!



superdupe...@gmail.com

Sep 29, 2018, 1:53:13 PM
Hi, currently I have not added locking code. It can easily be added later, but it is not a focus.

The intention is that each thread should have its own memory pool, through thread-local storage or some other means.

One use case: a thread is about to exit or be suspended, and we want to release all the memory and other resources associated with that thread. Using this memory pool, we can do so.

Obviously, there are other pieces you need in order to incorporate this memory pool.

Heng Yuan

superdupe...@gmail.com

Sep 29, 2018, 2:22:43 PM
To Paavo Helde,

I answered the single-thread question in another post; please refer to that. Although I did forget to add: if the 3rd-party library is itself multithreaded, you obviously would want to avoid using CookMem for now. Most of the time, though, the reason we want to use 3rd-party code is some utility that is usually single-threaded.

Regarding mixing allocators: there will be complicated cases where you have one allocator that should be freed and another allocator that is still in use. Here is an example that deals with this situation:

https://github.com/coconut2015/cookmem/blob/master/examples/ex_4.cpp

Obviously, it is not trivial. At least the memory pool provides a tool for you to extract specific stuff out of it before discarding the rest.

Now, as for shared memory / contexts used by third-party libraries: obviously you will need to be extremely careful. There is initialization-related memory and execution-related memory, and the scope of the memory pool needs to be carefully considered. At work I have encountered situations like this that had to be dealt with carefully, but it is definitely doable.

Overall, the use of a memory pool is not going to be trivial. You need a lot of other pieces (architectural design, etc.) to even begin using it. Also, it is not supposed to be an end-all, solve-all solution; it is just one solution. That said, sometimes it can be the best solution.

Heng Yuan

Marcel Mueller

Sep 29, 2018, 3:57:26 PM
Am 29.09.2018 um 09:49 schrieb superdupe...@gmail.com:
> The safest approach is to replace malloc / free. It does not matter if the third party library is C or C++. Before calling the third party library, switch malloc / free to use a memory pool. Once the call is finished, destroy the memory pool and switch back to whatever the previous memory pool or malloc / free.

This will sadly fail for many libraries, as they tend to use internal
caching, i.e. allocate some objects statically when they are first
needed. These objects are then used in later library calls.

So unless your library explicitly states that it uses /no persistent data
objects/, the resulting program will always have undefined behavior. And
I have never seen a library that guarantees to make no non-temporary
allocations - except for those which do not use dynamic memory at all,
of course.


Marcel

superdupe...@gmail.com

Sep 29, 2018, 4:16:13 PM
Yes, dealing with static globals can be a major issue, and there is no fixed solution. In fact, there could be no solution at all unless extensive changes are made.

One approach is to trigger the initialization of those globals using the system malloc / free, then switch to the memory pool for later uses that involve no caching. For one open-source library I used at work, this approach was quite successful and allowed us to do version upgrades of the library with minimal modifications.

With a memory pool, at least there can be a solution for certain cases, with restrictions. That is the point.

Öö Tiib

Sep 29, 2018, 5:31:34 PM
On Saturday, 29 September 2018 20:53:13 UTC+3, superdupe...@gmail.com wrote:
> On Saturday, September 29, 2018 at 5:41:18 AM UTC-7, Öö Tiib wrote:
> > On Saturday, 29 September 2018 06:10:45 UTC+3, superdupe...@gmail.com wrote:
> > > CookMem is a a memory context / allocator written in C++11 that can be easily and quickly cleaned up by freeing the segments, rather than doing individual frees. It is extremely useful to deal with third party libraries that can potentially have memory leaks, or to simplify the memory management.
> > >
> > > The secondary goal of this project is to provide a good enough performance that should be comparable to dlmalloc and its derivatives. In fact, the algorithms and some code used here are mostly based on dlmalloc. I basically took the pain to understand dlmalloc and rewrote the logics in C++. Some key calculations though are kept the same as dlmalloc since there is no point to reinvent wheels.
> > >
> > > Github page: https://github.com/coconut2015/cookmem
> > > Project documentation: https://coconut2015.github.io/cookmem/index.html
> > >
> > > It is licensed under APL 2.0.
> >
> > Is my impression correct that it is only useful for single-threaded
> > software? That might be crippling in world where even phones are
> > quad core.
>
> Hi, currently I did not add locking code. This can be easily added
> later but not a focus.

Then it is unfair to compete with malloc and free, which are thread-safe.
Have you benchmarked against malloc and free compiled as single-threaded?

> The intention is that each thread should have its memory pool through
> thread local or whatever other means.

That approach could be thinkable if there were a clear distinction from
malloc and free, which are not thread-local. Otherwise, how do we tell
the difference between things that are made strictly for thread-local
processing and things that may end their lifetime in the hands of some
other thread? Such complications would make it error-prone, but it could
still be useful as a performance optimization.

> One use case is that if one thread is about to exit, suspended. We
> want to release all the memory associated with that thread. Can
> we release all the resources associated with that thread. Using
> this memory pool can do so.

This scenario would only be thinkable when threads acted like
separate processes, communicating with each other only
through rather restricted means like sockets, pipes, or
operating-system memory-mapped files. That is in no way the case
with usual multi-threaded programs, and in those scenarios it would
likely be better to use actual multi-processing, with the processes
compiled as single-threaded.

> Obviously, there are other pieces you need to incorporate this memory pool.

I did not manage to follow what you meant.

superdupe...@gmail.com

Sep 29, 2018, 6:47:35 PM
On Saturday, September 29, 2018 at 2:31:34 PM UTC-7, Öö Tiib wrote:
> Then it is unfair to compete with malloc and free that are thread-safe.
> Have you benchmarked against malloc and free compiled as single-threaded?

I used the latest dlmalloc source (the mother of most, if not all, malloc implementations), which does not have locking enabled by default.

> That approach could be thinkable if there was clear distinction from
> malloc and free that are not thread-local. Otherwise how we make
> difference between things that are made strictly for thread-local
> processing and things that may end their life-time in hands of some
> other thread? Having such complications would make it error-prone,
> but still useful as performance optimization.

One approach is basically to have a thread-local variable that keeps track of which memory pool (including the system malloc/free) is being used, and to switch to the specific pool via that flag.

> This scenario would be only thinkable when threads acted like
> separate processes, communicating with each other only
> through rather restricted means like sockets or pipes or
> memory-mapped files of operating system. It is no way the case
> with usual multi-threaded programs and it would be likely better
> to use actual multi-processing and the processes compiled as
> single-threaded in these scenarios.

Imagine that you are running lots of threads to do some heavy processing.
Each thread has its own partition of the work (which may or may not need to call a 3rd-party library), with some shared data cache and thread-specific memory. Now one thread declares that there is an error / a solution and the job can be terminated early. What do you do at this point?

One possible solution is to gather the output info, then suspend and kill all the threads and release the associated resources. I am not saying that this is the only solution, but it certainly is a choice that pushes certain difficult aspects of task/resource management to specific places, rather than spreading them all over the place.

Now, you could argue that this type of work "should" be done in a process model. I do not want to get into a thread-vs-process debate. However, threads are typically lighter weight and have lower synchronization costs. Resource management is also simpler.

>
> > Obviously, there are other pieces you need to incorporate this memory pool.
>
> I did not manage to follow what you meant.

What I meant is that you cannot just drop in a memory pool and expect it to magically solve the problem. It requires certain architectural designs for the memory pool to provide its benefits. For typical non-server applications, one may never need it.

For a long-running server process, how to avoid memory leaks, fragmentation, etc. are challenging issues. CookMem is intended to be a part of the solution, rather than the solution by itself.

Heng Yuan

Öö Tiib

Sep 29, 2018, 9:27:40 PM
On Sunday, 30 September 2018 01:47:35 UTC+3, superdupe...@gmail.com wrote:
> On Saturday, September 29, 2018 at 2:31:34 PM UTC-7, Öö Tiib wrote:
> > Then it is unfair to compete with malloc and free that are thread-safe.
> > Have you benchmarked against malloc and free compiled as single-threaded?
>
> I used the latest dlmalloc source (the mother of most if not all
> malloc implementations) which does not have lock enabled by default.

That dlmalloc has thread-safety when USE_MALLOC_LOCK is defined, and that
must be defined in a multi-threaded program, unless someone wants to use
it narrowly for managing state that belongs fully to a single thread
only.
There are no authorities, mothers, or fathers among software development
products. Instead, lots of people write and rewrite and repair and break
similar algorithms from year to year.

>
> > That approach could be thinkable if there was clear distinction from
> > malloc and free that are not thread-local. Otherwise how we make
> > difference between things that are made strictly for thread-local
> > processing and things that may end their life-time in hands of some
> > other thread? Having such complications would make it error-prone,
> > but still useful as performance optimization.
>
> An approach is basically have a thread local variables to keep track
> of memory pool including system malloc/free is being used.
> And switch to use specific pool using the flag.

But your cookmem does not do that; it is just totally thread-unsafe?
Also, its architecture is not designed to carry the described thread-local
pools. It could be instrumented with locks at best.

> > This scenario would be only thinkable when threads acted like
> > separate processes, communicating with each other only
> > through rather restricted means like sockets or pipes or
> > memory-mapped files of operating system. It is no way the case
> > with usual multi-threaded programs and it would be likely better
> > to use actual multi-processing and the processes compiled as
> > single-threaded in these scenarios.
>
> Imagine that you are running lots of threads to do some heavy processing.
> Each thread has its own partition of work (which may or may not need
> to call 3rd party library), with some shared data cache, and thread
> specific memory. Now, one thread declares there is an error / solution
> and the job can be terminated early. What do you do at this point?
>
> One possible solution is to gather the output info and suspend and
> kill all the threads and release associated resources. I am not
> saying that this is the only solution, but it certainly is a choice
> that pushes certain difficult aspects of task/resource management
> to specific places, rather than spreading it allover the place.
>
> Now, you could argue that this type of work "should" be done in
> process model. I do not want to get into thread vs process
> debate. However, threads are typically lighter weight, and
> lower synchronization costs. Resource managements are also simpler.

All we are talking about is resource management, particularly memory
management. Indeed it is simpler with a thread-safe allocator.
With a thread-unsafe allocator, however, it will be rather complicated
and error-prone. Someone naively using cookmem in their network service
will possibly open it up to a wide array of attack vectors.

> > > Obviously, there are other pieces you need to incorporate this memory pool.
> >
> > I did not manage to follow what you meant.
>
> What I meant that you cannot just drop in memory pool and expect
> it to magically solve the problem. It requires certain architectural
> designs such that memory pool provides the benefits. For typical
> non-server applications, one may never need it.

Yes, but if the global malloc and free are replaced with thread-unsafe
ones, then the tools to control that aspect of the architecture are taken
out of the architect's hands. The fundamental building block, the
thread-safe allocator upon which an architect can build thread-local
pools if needed, is now missing from the landscape.

> For a long running server process, how to avoid memory leak,
> fragmentation, etc are challenging issues. CookMem is intended
> to be a part of solutions, rather than the solution by itself.

I did not ask about other possible programming problems. I was
particularly interested in the scenarios under which you consider
your thread-unsafe allocator to be viable as part of a solution.
We have plenty of other issues to solve, but those are orthogonal
to the thread-safety of the allocator.

superdupe...@gmail.com

Sep 29, 2018, 10:10:20 PM
To Öö Tiib

CookMem itself does not provide thread safety at all; in fact, I clearly stated that it does not have locking. Yet this is not an important feature for a memory context like CookMem, for the reasons below.

Instead of using a single malloc that always locks during allocation / deallocation, it is in fact more efficient for each thread to have its own lock-free memory context. If you have specific data that needs to be shared as shared / global memory, do the locking yourself in the shared / global memory context. After all, if the usage pattern is that most memory allocated in a thread is never shared, why pay the extra locking cost?

In the end, CookMem is just a tool. It is up to you to find if it is useful.

Heng Yuan

Öö Tiib

Sep 29, 2018, 10:55:04 PM
On Sunday, 30 September 2018 05:10:20 UTC+3, superdupe...@gmail.com wrote:
> To Öö Tiib
>
> CookMem itself does not provide thread safety at all. In fact, I
> clearly stated that it does not have locking. Yet, this is not an
> important feature for memory context like CookMem for the reasons
> below.

Ok.

> Instead of using a single malloc that always locks during
> allocation / deallocation, it is in fact more efficient that each
> thread has its own lock-free memory context. If you have specific
> data needs to be shared as shared / global memory, do the locking
> yourself in the shared / global memory context. After all, if the
> usage pattern is that most memory allocated in the local thread are
> not shared, why pay the extra locking cost?

But does cookmem provide that alternative lock-free thread-specific
memory context? My cursory skimming through it gave the impression that
no, and that it would be tricky to enhance it with that feature. Sure, I
may be wrong, but then can you explain how?

> In the end, CookMem is just a tool. It is up to you to find if it is useful.

That is certainly true, and so I wanted to understand in what scenarios
you consider cookmem to be actually viable as a tool.

superdupe...@gmail.com

Sep 30, 2018, 12:21:17 AM
On Saturday, September 29, 2018 at 7:55:04 PM UTC-7, Öö Tiib wrote:
>
> But does cookmem provide that alternative lock-free thread-specific
> memory context? My cursory skimming through it made impression that
> no, that it would be tricky to enhance it with the feature. Sure, I
> may be wrong, but then can you explain, how?

You can make it thread-specific. I do not have any code for this area, since the library is not limited to that use.

You can also take a look at:

https://en.wikipedia.org/wiki/Region-based_memory_management

CookMem is basically a memory context implemented using algorithms of dlmalloc.

Heng Yuan

Chris M. Thomasson

Sep 30, 2018, 1:02:50 AM
On 9/28/2018 8:10 PM, superdupe...@gmail.com wrote:
> CookMem is a a memory context / allocator written in C++11 that can be easily and quickly cleaned up by freeing the segments,

> rather than doing individual frees.

Like a region allocator?

Chris M. Thomasson

Sep 30, 2018, 1:12:21 AM
On 9/29/2018 3:47 PM, superdupe...@gmail.com wrote:
> On Saturday, September 29, 2018 at 2:31:34 PM UTC-7, Öö Tiib wrote:
>> Then it is unfair to compete with malloc and free that are thread-safe.
>> Have you benchmarked against malloc and free compiled as single-threaded?
>
> I used the latest dlmalloc source (the mother of most if not all malloc implementations) which does not have lock enabled by default.
>
>> That approach could be thinkable if there was clear distinction from
>> malloc and free that are not thread-local. Otherwise how we make
>> difference between things that are made strictly for thread-local
>> processing and things that may end their life-time in hands of some
>> other thread? Having such complications would make it error-prone,
>> but still useful as performance optimization.
>
> An approach is basically have a thread local variables to keep track of memory pool including system malloc/free is being used. And switch to use specific pool using the flag.
[...]

You want the "fast path" to be thread-local, completely isolated
and free of synchronization, implicit or explicit. The "really slow
path" can try to allocate more memory, or fail. I have a lot of
experience creating these types of allocators. Also, take a look at
the allocator in TBB.

https://www.threadingbuildingblocks.org/tutorial-intel-tbb-scalable-memory-allocator

Also, how does your algorithm deal with false sharing? That can tear
things apart at the seams performance-wise...

Chris M. Thomasson

Sep 30, 2018, 1:23:49 AM
There is a tricky way to create an allocator that uses nothing but
memory on a thread's stack. It even allows memory M allocated by
thread A to be freed by thread B. It is sync-free on the fast path and
lock-free on the slow path. I was experimenting with a tricky way
to get another path that can be wait-free. So the levels went something
like:
______________________
(completely thread local)
fast-path = sync-free

(can communicate with another thread)
slow-path = wait-free
snail-path = lock-free
______________________

This was a long time ago, 10 years at least.

Chris M. Thomasson

Sep 30, 2018, 4:08:49 PM
On 9/29/2018 10:02 PM, Chris M. Thomasson wrote:
> On 9/28/2018 8:10 PM, superdupe...@gmail.com wrote:
>> CookMem is a a memory context / allocator written in C++11 that can be
>> easily and quickly cleaned up by freeing the segments,
>
>> rather than doing individual frees.
>
> Like a region allocator?

Perhaps, something like Reaps might help you out:

https://people.cs.umass.edu/~emery/pubs/berger-oopsla2002.pdf

The key phrase: "rather than doing individual frees"

makes me think of region-like allocation.

superdupe...@gmail.com

Sep 30, 2018, 8:46:56 PM
On Sunday, September 30, 2018 at 1:08:49 PM UTC-7, Chris M. Thomasson wrote:
> > Like a region allocator?
>
> Perhaps, something like Reaps might help you out:
>
> https://people.cs.umass.edu/~emery/pubs/berger-oopsla2002.pdf
>
> The key phrase: "rather than doing individual frees"
>
> makes me think of region-like allocation.

This is probably the older paper; he is the author of Hoard. I basically just looked away the second I saw the GPL license. Likewise for Ravenbrook's MPS.

dlmalloc, nedmalloc, jemalloc, etc. are under permissive licenses. Since nedmalloc, which claims to be the fastest malloc, is based on dlmalloc, I took the pain to study dlmalloc instead, since I mostly wanted a better interface without really needing to deal with the multi-threading stuff.

And the result is the CookMem project.

Heng Yuan