Öö Tiib <oot...@hot.ee> writes:
>On Wednesday, 7 December 2022 at 00:04:34 UTC+2, Scott Lurndal wrote:
>> "Chris M. Thomasson" <chris.m.t...@gmail.com> writes:
>> >On 12/6/2022 1:53 PM, Paavo Helde wrote:
>> >> 06.12.2022 22:39 Scott Lurndal wrote:
>> >>
>> >>>
>> >>> The fact that smart pointers do allocation/deallocation has made
>> >>> them useless for high-performance threaded code, IMO.
>> >>
>> >> You have got it backwards. Smart pointers are taken into use for coping
>> >> with the fact that objects need to be dynamically allocated and
>> >> deallocated, by the program logic.
>> >>
>> >> And this allocation/deallocation would happen relatively rarely.
>> Assumption not in evidence. I've personally had to rip smart pointers
>> out of code because the allocation/deallocation happened very
>> frequently. One of the applications was simulating a processor pipeline;
>> another was handling network packets. Both were written by well-educated
>> people familiar with C++.
>>
>> Granted, one can specify a more efficient allocator, but
>>
>> 1) most C++ programmers don't bother or don't know how
>> 2) Even then there is unnecessary overhead unless the allocator is pool-based.
>>
>> KISS applies, always.
>
>No one argues with that. It is just that keeping it simple is far from simple.
>For example, it is tricky to keep dynamic allocations minimal. That is
>not the fault of smart pointers.
In my experience, it has generally been sufficient to pre-allocate the
data structures and store them in a table or look-aside list,
as the maximum number is bounded.
For example, an application handling network packets on a processor
with 64 cores may need only 128 jumbo packet buffers if the packet-processing
thread count matches the core count. These can be preallocated
and then passed as regular pointers throughout the flow.
(Specialized DPUs have a custom hardware block, a network pool allocator,
that allocates hardware buffers to packets on ingress; those buffers
are then passed by hardware to the other blocks in the flow, such as
blocks that identify the flow, fragment/defragment a packet, or
apply encryption/decryption algorithms, all controlled by a
hardware scheduler block.)
Likewise, for a simulation of an internal processor interconnect,
such as a ring or mesh structure, there is a fixed maximum number
of flits that can be active at any point in time. Preallocating
them into a look-aside list eliminates allocation and deallocation
overhead on every flit.
When simulating a full SoC, the maximum number of in-flight objects is
likewise bounded and, for the most part, they can be preallocated.