[Boost-users] Zero-malloc memory pool for Boost Intrusive smart pointers

104 views
Skip to first unread message

Francesco via Boost-users

unread,
Mar 21, 2019, 9:02:33 PM3/21/19
to boost...@lists.boost.org, Francesco
Hi all,
I have developed for my needs a C++11 memory pool that can provides C++ object instances in the form of boost::intrusive_ptr<> that automatically return to the pool once their refcount goes to zero.

I found the need for such memory pool to have a zero-malloc memory pool algorithm for some performance-oriented applications that were relying on std::shared_ptr<>.

Here's the link in case anybody is interested:
with some (initial) benchmark as well.

I thought that this may be useful to anybody else already using Boost intrusive pointers as well...
Any comment is welcome!

Francesco Montorsi


Gavin Lambert via Boost-users

unread,
Mar 22, 2019, 12:44:42 AM3/22/19
to boost...@lists.boost.org, Gavin Lambert
On 22/03/2019 13:09, Francesco wrote:
> I have developed for my needs a C++11 memory pool that can provides C++
> object instances in the form of boost::intrusive_ptr<> that
> automatically return to the pool once their refcount goes to zero.
>
> I found the need for such memory pool to have a zero-malloc memory pool
> algorithm for some performance-oriented applications that were relying
> on std::shared_ptr<>.
>
> Here's the link in case anybody is interested:
> https://github.com/f18m/boost-intrusive-pool
> with some (initial) benchmark as well.

Did you look at std::make_shared<T>? It avoids the double allocation
for the control block that you're complaining about. (Although you
couldn't use it to allocate an arena block in this case -- while it does
support array allocation, there is then only one control block for the
whole array.)

Also the use of default construction and explicit recycle methods (and
requiring boost_intrusive_pool_item as a base class) is ugly.

You should be able to instead keep your pool_item type internal (not
exposed to the consumer) and use aligned_storage to hold uninitialised
storage for the actual item data, calling T's constructor and destructor
as needed to allocate and recycle the slot. (Or just wrap it in
std/boost::optional, although that can introduce some nesting ambiguities.)
_______________________________________________
Boost-users mailing list
Boost...@lists.boost.org
https://lists.boost.org/mailman/listinfo.cgi/boost-users

Francesco via Boost-users

unread,
Mar 22, 2019, 4:26:28 PM3/22/19
to boost...@lists.boost.org, Francesco, Gavin Lambert
Hi Gavin,

Il giorno ven 22 mar 2019 alle ore 05:44 Gavin Lambert via Boost-users <boost...@lists.boost.org> ha scritto:
Did you look at std::make_shared<T>?  It avoids the double allocation
for the control block that you're complaining about.  (Although you
couldn't use it to allocate an arena block in this case -- while it does
support array allocation, there is then only one control block for the
whole array.)

Yes I know about std::make_shared<> and indeed I was using that solution before implementing the memory pool based on intrusive pointers.
The problem is that in my workload I have several threads that need to get a smart pointer to a fairly-big object every time they process input data and they will release it later.
Now allocating with std::make_shared<>  a new object on every input data was causing my threads to do lots of mallocs/sec and that became the performace limiting factor.
The use of a memory pool instead allows me to do a very large memory allocation from time to time (in blocks of e.g. 1024 items) and still have each single item of the memory pool independently refcounted.

This usage pattern is not possible with std::shared_ptr<> and that's why I switched to boost::intrusive_ptr<> instead.
 
Also the use of default construction and explicit recycle methods (and
requiring boost_intrusive_pool_item as a base class) is ugly. 

You should be able to instead keep your pool_item type internal (not
exposed to the consumer) and use aligned_storage to hold uninitialised
storage for the actual item data, calling T's constructor and destructor
as needed to allocate and recycle the slot. 
My first implementation of the memory pool indeed was using C++11 perfect forwarding to call the ctor/dtor of type T when the item was pulled out the pool or was returning to it.
I later removed such feature because I realized that calling the ctor of the items to recycle produced issues with classes having multiple inheritance IIRC.
I don't think it's safe and sane to call the ctor of the same instance multiple times... 

I know that the implementation may be possibly refined from a C++/template style point of view but regarding performances I verified that it satifies well my needs and allows to write code using smart pointers that has zero-malloc when the memory pool has expanded to contain enough elements to handle the worst-simultaneous-active items...


(Or just wrap it in
std/boost::optional, although that can introduce some nesting ambiguities.)

This is not very clear to me... but I will take a look at std::optional.


Thanks for the suggestions though! 

Francesco

Gavin Lambert via Boost-users

unread,
Mar 22, 2019, 6:26:10 PM3/22/19
to boost...@lists.boost.org, Gavin Lambert
At 09:25 23/03/2019, Francesco wrote:
>Yes I know about std::make_shared<> and indeed I
>was using that solution before implementing the
>memory pool based on intrusive pointers.
>The problem is that in my workload I have
>several threads that need to get a smart pointer
>to a fairly-big object every time they process
>input data and they will release it later.

Which brings up another point; your current
implementation is not thread-safe. Which is
fine, but you should probably mention that
explicitly in your readme. A very common worker
pattern is to have all allocations on one thread
and all deallocations on another thread, and that
wouldn't work as-is.

>Now allocating with std::make_shared<>Â a new
>object on every input data was causing my
>threads to do lots of mallocs/sec and that
>became the performace limiting factor.
>The use of a memory pool instead allows me to do
>a very large memory allocation from time to time
>(in blocks of e.g. 1024 items) and still have
>each single item of the memory pool independently refcounted.

Yes, I realise that. It still could be possible
to have individual node allocations strung up
into a free list rather than requiring a larger
contiguous allocation. But you're correct that
arena allocation and intrusive_ptr is probably
more efficient in the end.

>My first implementation of the memory pool
>indeed was using C++11 perfect forwarding to
>call the ctor/dtor of type T when the item was
>pulled out the pool or was returning to it.
>I later removed such feature because I realized
>that calling the ctor of the items to recycle
>produced issues with classes having multiple inheritance IIRC.
>I don't think it's safe and sane to call the
>ctor of the same instance multiple times...Â

Calls must be paired -- you allocate aligned
uninitialised storage (aligned_storage), then
placement-new to call the constructor on that
storage, then eventually call the destructor, and
then later you can placement-new again.

You must never call the constructor twice in a
row or the destructor twice in a row (unless it's
trivial), but alternating is fine.

Again, this is how things like optional and
variant work internally -- and since they do that
part for you, it can be useful to re-use it
rather than re-implementing it.

Francesco via Boost-users

unread,
Mar 23, 2019, 10:46:02 AM3/23/19
to boost...@lists.boost.org, Francesco, Gavin Lambert
Hi Gavin,


Il giorno ven 22 mar 2019 alle ore 23:26 Gavin Lambert via Boost-users <boost...@lists.boost.org> ha scritto:

Which brings up another point; your current
implementation is not thread-safe.  Which is
fine, but you should probably mention that
explicitly in your readme.


 
>My first implementation of the memory pool
>indeed was using C++11 perfect forwarding to
>call the ctor/dtor of type T when the item was
>pulled out the pool or was returning to it.
>I later removed such feature because I realized
>that calling the ctor of the items to recycle
>produced issues with classes having multiple inheritance IIRC.
>I don't think it's safe and sane to call the
>ctor of the same instance multiple times...Â

Calls must be paired -- you allocate aligned
uninitialised storage (aligned_storage), then
placement-new to call the constructor on that
storage, then eventually call the destructor, and
then later you can placement-new again.

You must never call the constructor twice in a
row or the destructor twice in a row (unless it's
trivial), but alternating is fine.
Again, this is how things like optional and
variant work internally -- and since they do that
part for you, it can be useful to re-use it
rather than re-implementing it.

Interesting.
I remember I had some issues I could not figure out, but thanks for the advice. I may retry this approach (perfect forwarding to the ctor) in the future!

Thanks,
Francesco


 
Reply all
Reply to author
Forward
0 new messages