
garbage collection


Andrew Goh

Jul 15, 2018, 10:47:11 AM
hi all,

this question is probably as old as c++ with new / delete, or even older than c++ itself

what's the current state-of-art for 'dynamic' memory management, in the notion of *garbage collection*?
e.g.
there are smart pointers today, e.g. shared_ptr

there is Boehm garbage collector
http://www.hboehm.info/gc/
is this still very popular or is this commonly being used today?

smart pointers are possibly simply 'good', but i've come across some articles stating that shared_ptr in boost is *slow* (some 10x slower than without). is that still true today?

the other thing: has anyone embedded smart pointers in place of (ordinary) pointers in complex linked data structures such as elaborate trees (e.g. AVL trees), complex hash maps / linked lists, or 'objects linked to objects' in complex graph-type scenarios, possibly with circular references?

notions of things like

root - o1 - o2 - o3 - o4 - o5

^^ this can be a complex tree / linked list / graph structure

would releasing the smart pointer between root-o1 automatically cause o1 to o5 to be 'garbage collected' and recovered?

the Boehm garbage collector is possibly good and may help resolve the circular references problem
however, the Boehm garbage collector is conservative and only does mark and sweep
this could leave a lot of memory uncollected
e.g. in the above example, when root-o1 is disconnected, gc would need to figure out all the linked nodes in the graph and garbage collect them
and the use of mark and sweep would leave memory fragmented after collection

is there such a thing as a mark and compact garbage collector? if a simple implementation is difficult, perhaps combined with the use of smart pointers?

thanks in advance

Paavo Helde

Jul 15, 2018, 12:34:54 PM
On 15.07.2018 17:47, Andrew Goh wrote:
> hi all,
>
> this is probably as old when c++ with new / delete is invented or even older than c++ has been around
>
> what's the current state-of-art for 'dynamic' memory management, in the notion of *garbage collection*?
> e.g.
> there is smart pointers today e.g. shared_ptr
>
> there is Boehm garbage collector
> http://www.hboehm.info/gc/
> is this still very popular or is this commonly being used today?

In C++ the only sensible way to have GC is to logically think that all
allocated objects are leaked, but that's ok as you have got infinite
memory. Behind the scenes the memory gets reused of course, but this
should not have any visible effect on the program behavior.

This means that any kind of non-memory resources will need extra care at
all levels as you need to ensure they get released properly and are not
left to GC which will run in unpredictable times in a random thread.

I believe the current consensus is that with RAII one can control both
memory and non-memory resources in the same way and with less hassle, so
RAII all the way down it is. Thus GC is effectively not needed in C++.

>
> smart pointers are possibly simply 'good' but i've come across some articles stating that share_ptr in boost is *slow* (some 10x slower than without), is that still true today?

The main slowness in C++ nowadays comes from multithread
synchronization. Both dynamic memory allocations and std::shared_ptr
reference count updates are guaranteed to be multithread-safe, so they
are inherently slow in this regard. Note that this only matters if you
have millions or billions of them.

Anyway, if the performance appears to be a problem for you then your
best bet is to reduce the number of multithread synchronizations.
Avoiding std::shared_ptr or using a single-threaded version of it does
not buy you much if you still allocate each node in your data structures
separately. One possible approach is to use memory pools instead so that
e.g. the whole AVL tree is constructed in a pool which will be allocated
and released in one or few steps.

Inside the memory pool one can implement links either as raw pointers or
better yet as integer offsets. The latter can save a lot of memory (been
there, done that, loading a 200 MB XML file into a DOM tree which would
not eat up all the CPU and RAM the computer had got).

Inside a memory pool cycles are a non-issue as well, the whole pool will
be released in a single step anyway.
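
A minimal sketch of the pool idea described above, under stated assumptions: the class name `PoolTree`, the sentinel `kNil`, and the use of a plain (unbalanced) binary search tree instead of an AVL tree are all hypothetical choices made to keep the example short. All nodes live in one `std::vector`, links are 32-bit indices rather than pointers, and the whole structure is released in one step when the vector is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pool-backed binary search tree (not balanced, for brevity).
class PoolTree {
public:
    static constexpr std::uint32_t kNil = 0xFFFFFFFFu;  // plays the role of a null link

    // Returns false if the key was already present.
    bool insert(int key) {
        std::uint32_t parent = kNil;
        std::uint32_t cur = root_;
        while (cur != kNil) {
            if (key == nodes_[cur].key) return false;
            parent = cur;
            cur = key < nodes_[cur].key ? nodes_[cur].left : nodes_[cur].right;
        }
        // Indices stay valid when the vector reallocates; raw pointers would not.
        const std::uint32_t fresh = static_cast<std::uint32_t>(nodes_.size());
        nodes_.push_back(Node{key, kNil, kNil});
        if (parent == kNil) root_ = fresh;
        else if (key < nodes_[parent].key) nodes_[parent].left = fresh;
        else nodes_[parent].right = fresh;
        return true;
    }

    bool contains(int key) const {
        std::uint32_t cur = root_;
        while (cur != kNil) {
            if (key == nodes_[cur].key) return true;
            cur = key < nodes_[cur].key ? nodes_[cur].left : nodes_[cur].right;
        }
        return false;
    }

    std::size_t size() const { return nodes_.size(); }

private:
    struct Node {
        int key;
        std::uint32_t left;   // index into nodes_, or kNil
        std::uint32_t right;  // index into nodes_, or kNil
    };
    std::vector<Node> nodes_;      // the "pool": one allocation, grown in few steps
    std::uint32_t root_ = kNil;
};
```

There is no per-node allocation and no per-node destructor call here, which is the point of the pool approach; individual nodes cannot be freed early, though.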

> notions of things like
>
> root - o1 - o2 - o3 - o4 - o5
>
> ^^ this can be a complex tree / linked list / graph structure
>
> would releasing the smart pointer between root-o1 automatically cause o1 to o5 to be 'garbage collected' and recovered?

Yes, if you need separate parts of your data structures to be released
ASAP, then you can indeed use smart pointers. If the number of objects
is rather in thousands than in millions this ought to be OK. Cycles will
need special care.
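
A small sketch of the root - o1 - ... - o5 case, assuming each node is the sole owner of its successor. The `Node` type and its `live` counter are hypothetical and exist only to make destruction observable.

```cpp
#include <cassert>
#include <memory>

// Hypothetical list node that counts live instances.
struct Node {
    static int live;
    std::shared_ptr<Node> next;  // each node owns its successor
    Node() { ++live; }
    ~Node() { --live; }
};
int Node::live = 0;

// Builds root -> o1 -> ... -> o5, cuts the root-o1 link, and reports
// how many nodes are still alive afterwards.
int live_after_cutting_chain() {
    Node root;
    Node* tail = &root;
    for (int i = 0; i < 5; ++i) {
        tail->next = std::make_shared<Node>();
        tail = tail->next.get();
    }
    assert(Node::live == 6);  // root plus o1..o5
    root.next.reset();        // drop root-o1: o1..o5 are destroyed in a cascade
    return Node::live;        // only root is left at this point
}
```

The cascade runs through nested destructor calls, so a very long chain can exhaust the call stack; that is one of the reasons this approach suits thousands of objects better than millions.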

>
> bhoehm garbage collector is possibly good and may help resolve the circular references problem
> however, bhoehm garbage collector is conservative, and does only mark and sweep
> this could leave a lot of memory uncollected
> e.g. in the above example when root-o1 is disconnected gc would need to figure out all the linked nodes in the graph and garbage collect them
> and that the use of mark and sweep would leave fragmented memory after collection

Even if the deallocation of nodes is delayed and performed in a
background thread or whatever, the time to allocate the nodes one-by-one
would be still significant. So I do not believe that using a Boehm
collector would automatically provide the best possible result.



woodb...@gmail.com

Jul 15, 2018, 2:51:01 PM
To put it a little differently, you may be able to use
multiple instances of single-threaded processes that
use raw pointers and/or unique_ptr. That's what I do
with the C++ Middleware Writer:
https://github.com/Ebenezer-group/onwards


Brian
Ebenezer Enterprises - In G-d we trust.
http://webEbenezer.net

Chris M. Thomasson

Jul 15, 2018, 11:09:56 PM
Think of proxy reference counts:

https://groups.google.com/d/topic/lock-free/X3fuuXknQF0/discussion
(read all, if interested...)

https://groups.google.com/d/topic/lock-free/QuLBH87z6B4/discussion

We can amortize the cost of reference counting by keeping multiple
objects under a proxy reference count. We can take a reference, then
iterate over a whole load of objects without having to mutate anything
within them, because they stay alive during the iteration. We take a
reference, and read full steam ahead. Fwiw, RCU is an example of a
clever proxy collector optimized for reads, wrt the way it works:

https://lwn.net/Articles/262464
(read all!)

So, to answer your question, something like RCU can keep everything
alive without a full blown heavy handed garbage collector

:^)
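
To illustrate only the "one count protects many objects" part of the idea above, here is a hedged sketch. It is not RCU and not a lock-free proxy collector: it is a plain copy-on-write snapshot guarded by a mutex, and the names `SnapshotList`, `acquire` and `sum` are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical container: readers take ONE reference to a whole snapshot
// instead of one reference per element.
class SnapshotList {
public:
    using Snapshot = std::shared_ptr<const std::vector<int>>;

    SnapshotList() : current_(std::make_shared<std::vector<int>>()) {}

    // Reader side: a single count increment, then iterate freely.
    Snapshot acquire() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return current_;
    }

    // Writer side: copy, modify, publish. The old snapshot is freed only
    // when the last reader holding it lets go.
    void push_back(int value) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto next = std::make_shared<std::vector<int>>(*current_);
        next->push_back(value);
        current_ = std::move(next);
    }

private:
    mutable std::mutex mutex_;
    Snapshot current_;
};

// Iterates every element while holding just one reference.
long sum(const SnapshotList& list) {
    SnapshotList::Snapshot snap = list.acquire();
    long total = 0;
    for (int v : *snap) total += v;
    return total;
}
```

A reader that acquired its snapshot before a write keeps seeing the old contents, which is the same deferred-reclamation behaviour the proxy schemes rely on, just with far more copying than a real implementation would do.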

bitrex

Jul 16, 2018, 1:45:11 AM
In C++ at least the ideal is that there is no "garbage" to collect.
Every resource that's "alive" is there for a reason. When the last
context where an object of type Foo needs to exist exits then the
object's destructor is called and all the resources it holds (that the
object ideally acquired via RAII) are freed automatically.

E.g. Java needs a garbage collector because there's only one place you
can allocate memory and instantiate resources, which is in a heap of some
type; when a function call exits there isn't really a good way to "know",
without some kind of reference counting, whether this resource or that in
the heap is still needed, or was only used that one time. How would you
know without some fashion of metadata?

C++ "knows" because unlike Java it has a stack and everything that's not
on the heap is on the stack, and the policy such as it is is that
anything left on the stack when the function call exits is out the door.
The stack is there to be used as much as possible as an option of first
resort; code which uses the operator new/smart pointers to heap-allocate
objects that could fit perfectly well on the stack where the resource is
used only within that function call or a nested set of function calls is
functionally retarded, just do Object object and pass around by
reference or value via the stack, it's cool and fast.
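
A short sketch of that style, with a hypothetical `Report` type: the object has automatic storage, is passed around by reference, and is returned by value, with no `new`, `delete` or smart pointer in sight.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical value type; any heap storage is owned by its members.
struct Report {
    std::string title;
    std::vector<int> samples;
};

// Takes the object by reference: no ownership transfer, no counting.
void add_sample(Report& report, int value) {
    report.samples.push_back(value);
}

// Builds the object on the stack and hands it back by value.
Report make_report() {
    Report report{"latency", {}};  // automatic storage
    add_sample(report, 10);
    add_sample(report, 20);
    return report;                 // moved or elided; nothing to clean up by hand
}
```

The members still allocate from the heap internally, but their destructors release that memory when the enclosing object goes out of scope.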

IMO whether shared_ptr is 10x slower or not is irrelevant because it has
few good use cases in modern C++; modern C++ which uses it all over the
place is simply not well-designed code. I think I've used it
something like three or four times. Ever.

bitrex

Jul 16, 2018, 1:58:19 AM
Or to be pedantic stuff like constexpr objects could be in read-only
Flash memory or ROM or EEPROM or something in the case of a
Harvard-architecture processor.

Siri Cruise

Jul 16, 2018, 3:18:18 AM
In article <c85406c3-1cf2-4802...@googlegroups.com>,
Andrew Goh <andre...@gmail.com> wrote:

> hi all,
>
> this is probably as old when c++ with new / delete is invented or even older
> than c++ has been around
>
> what's the current state-of-art for 'dynamic' memory management, in the
> notion of *garbage collection*?

Same as it has always been:
reference counting
mark sweep
ad hoc

reference count:
Cannot cope with cyclic references naively.
Garbage is identified immediately on becoming garbage.

mark sweep:
Can collect any kind of reference graph.
Might introduce noticeable pauses during a mark sweep.

ad hoc:
Has to be written specifically for each program.

> there is Boehm garbage collector
> http://www.hboehm.info/gc/
> is this still very popular or is this commonly being used today?

Boehm is a mark/sweep collector. Unlike other such collectors, it examines memory
without a map locating pointers, so it examines raw words and guesses whether
they are address references.

I use it daily.
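
The "guessing" described above can be illustrated with a toy. This is not how the Boehm collector is implemented; `Block` and `mark_from_words` are hypothetical names, and the sketch covers only the root-scanning step, without tracing from block to block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical heap block record: the range [begin, begin + size) plus a mark bit.
struct Block {
    std::uintptr_t begin;
    std::size_t size;
    bool marked;
};

// Conservative rule: any word whose value falls inside a block is treated
// as a pointer into that block, whether or not it really is one.
// Returns the number of blocks newly marked.
std::size_t mark_from_words(std::vector<Block>& heap,
                            const std::vector<std::uintptr_t>& words) {
    std::size_t newly_marked = 0;
    for (std::uintptr_t w : words) {
        for (Block& b : heap) {
            if (!b.marked && w >= b.begin && w - b.begin < b.size) {
                b.marked = true;
                ++newly_marked;
            }
        }
    }
    return newly_marked;
}
```

An integer that merely happens to look like an address keeps its block alive, which is the misidentification risk of a conservative collector.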

> smart pointers are possibly simply 'good' but i've come across some articles

Smart pointers are just a C++ specific technique to sneak in reference counting
(the smart pointer hides the count), mark/sweep (the smart pointer maintains the
list of base pointers), or ad hoc management.

> (e.g. AVL trees), complex hash maps / linked lists, in 'objects linked to
> objects' in complex graphs type of scenarios? possibly with circular
> references

Only mark/sweep can deal with cycles without any special effort by the
programmer. Reference counting requires the programmer to ensure at least one
edge in every cycle is weak. Ad hoc leaves the whole thing to the programmer.
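
A minimal sketch of the weak-edge rule using `std::shared_ptr` and `std::weak_ptr`. The `Parent` and `Child` types and their `live` counters are hypothetical, added only so the reclamation can be checked.

```cpp
#include <cassert>
#include <memory>

struct Child;

struct Parent {
    static int live;
    std::shared_ptr<Child> child;  // strong edge: the parent owns the child
    Parent() { ++live; }
    ~Parent() { --live; }
};

struct Child {
    static int live;
    std::weak_ptr<Parent> parent;  // weak edge: breaks the ownership cycle
    Child() { ++live; }
    ~Child() { --live; }
};

int Parent::live = 0;
int Child::live = 0;

// Builds a parent <-> child cycle and reports whether both objects
// were destroyed once the last external reference went away.
bool cycle_is_reclaimed() {
    {
        auto p = std::make_shared<Parent>();
        p->child = std::make_shared<Child>();
        p->child->parent = p;                 // back link does not add an owner
        assert(p.use_count() == 1);
        assert(!p->child->parent.expired());  // still reachable while p lives
    }
    return Parent::live == 0 && Child::live == 0;
}
```

If the back link were a `shared_ptr` instead, both counts would stay at one after the scope ends and neither destructor would run.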

> however, bhoehm garbage collector is conservative, and does only mark and
> sweep

On 64 bit macosx, the lowest address is 4 billion. Integer values are typically
smaller than this and real values look like extremely high addresses, so
misidentification is rare.

> is there something such as a mark and compact garbage collector? if the
> simple implementation is difficult perhaps with the combined used of smart
> pointers?

A compactor has to be able to update all pointers, which tends to be
language-specific. With hundreds of gigabytes of virtual memory, fragmentation
is less of a concern.

--
:-<> Siri Seal of Disavowal #000-001. Disavowed. Denied. Deleted. @
'I desire mercy, not sacrifice.' /|\
I'm saving up to buy the Donald a blue stone This post / \
from Metebelis 3. All praise the Great Don! insults Islam. Mohammed

Andrew Goh

Jul 16, 2018, 8:52:27 AM
hi all,
thanks for all the responses, actually coming from the java world c++ is quite as 'native' to me, i'd guess most 'java programmers' (and users of other similar managed languages) are somewhat spoilt by garbage collection.

i'm coming back into c++ for various reasons, among which: modern multi-core processors increasingly have features that 'high level' languages such as java can only reach by depending on the jvm or compiler to 'optimise' the 'low level' code. one has little influence over whether the jvm, compiler etc would actually use those features.
things that come to mind are the various SIMD instructions (AVX, AVX2, AVX512) and increasingly GPUs as well

c++ can use various processor intrinsics, link hand-optimised assembler object files, link processor-optimised libraries e.g. MKL or TBB
https://www.threadingbuildingblocks.org/
or even link as a cilk+ module, which probably makes code more portable and readable with all those vector optimizations
https://www.cilkplus.org/

thanks for the notes on smart pointers and Boehm gc, i'll certainly try them out

Vir Campestris

Jul 16, 2018, 4:20:34 PM
On 16/07/2018 13:52, Andrew Goh wrote:
> thanks for all the responses, actually coming from the java world c++ is quite as 'native' to me, i'd guess most 'java programmers' (and so do other similar managed languages) are somewhat spoilt by garbage collection.

I have a satnav that runs Android, so the app will be Java.

Every few weeks I have to reboot it. It usually goes deaf (ignores voice
control); sometimes it doesn't switch between day and night correctly;
and every so often it plain crashes and burns. To me it's obviously got
some kind of resource leak.

Andy

Vir Campestris

Jul 16, 2018, 4:46:09 PM
On 16/07/2018 06:45, bitrex wrote:
> IMO whether shared_ptr is 10x slower or not is irrelevant because it has
> few good use cases in modern C++, modern C++ which uses it all over the
> place is simply not well-designed code. I think I've used it in
> something like three or four times. ever.

I came into a project that was using raw pointers for objects with a
complex lifecycle. I halved the crash rate by swapping them for
shared_ptr and weak_ptr as appropriate. There were a mixture of leaks
and use-after-free problems.

They are overused; there are places where a unique_ptr will do the job.
But IMHO if you see thing* in modern C++ that's what the XP guys call a
code smell. It's dangerous.

Andy

woodb...@gmail.com

Jul 16, 2018, 5:49:38 PM

bitrex

Jul 16, 2018, 7:05:03 PM
I use raw pointers sometimes simply because I often use C++ to code for
devices with very small amounts of memory, like 4k sometimes. Smart
pointers are right out. But on devices like that I also never use "new"
except at startup so it works out.

C++1x provides many nice features over C that are welcome on platforms
like that regardless. It doesn't hurt performance or stupendously bloat
the code, or any of those old wives' tales.

Chris M. Thomasson

Jul 16, 2018, 7:09:35 PM
On 7/16/2018 12:18 AM, Siri Cruise wrote:
> In article <c85406c3-1cf2-4802...@googlegroups.com>,
> Andrew Goh <andre...@gmail.com> wrote:
>
>> hi all,
>>
>> this is probably as old when c++ with new / delete is invented or even older
>> than c++ has been around
>>
>> what's the current state-of-art for 'dynamic' memory management, in the
>> notion of *garbage collection*?
>
> Same as it has always been:
> reference counting
> mark sweep
> ad hoc
>
> reference count:
> Cannot cope with cyclic references naively.
> Garbage is identified immediately on becoming garbage.

Fwiw, proxy reference counting can handle cyclic links in
data-structures it protects. One point, to remove and delete an item,
one only needs to make it non-reachable to other threads, then defer the
deallocation until after a quiescent period has elapsed. Making it
unreachable can be as simple as removing it from a linked list, or
whatever data-structure one uses. Reader threads can be iterating
through cyclic linked data-structures while writer threads concurrently
add and remove elements to/from them.

Soviet_Mario

Jul 16, 2018, 9:38:41 PM
On 15/07/2018 18:34, Paavo Helde wrote:
> On 15.07.2018 17:47, Andrew Goh wrote:
>> hi all,
>>
>> this is probably as old when c++ with new / delete is
>> invented or even older than c++ has been around
>>
>> what's the current state-of-art for 'dynamic' memory
>> management, in the notion of *garbage collection*?
>> e.g.
>> there is smart pointers today e.g. shared_ptr
>>
>> there is Boehm garbage collector
>> http://www.hboehm.info/gc/
>> is this still very popular or is this commonly being used
>> today?
>
> In C++ the only sensible way to have GC is to logically
> think that all allocated objects are leaked, but that's ok
> as you have got infinite memory.

Uh ... what do you actually mean with this "you have
infinite memory" ?
To use it up freely, never caring, and to simply rely on
try/catch/finally in case some error occurs ?

> Behind the scenes the
> memory gets reused of course,

again I can't understand.
Do you mean at the OS' memory management level ?
Or by some "hidden" code generated by modern compilers ?
Or even something else ...

> but this should not have any
> visible effect on the program behavior.
>
> This means that any kind of non-memory resources will need
> extra care at all levels as you need to ensure they get
> released properly and are not left to GC which will run in
> unpredictable times in a random thread.
>
> I believe the current consensus is that with RAII one can

back at the (very outdated) time when I first read about RAII, memory
allocation was one such resource. Why this distinction now?

> control both memory and non-memory resources in the same way
> and with less hassle, so RAII all the way down it is. Thus
> GC is effectively not needed in C++.
>
>>

SNIP

ah, another question ... when you speak of POOLS, you mean
allocating on the heap (or even statically) a big contiguous
chunk and then overload new/delete into more specialized
(and faster) forms which pick up some space within that chunk ?

Ciao

>
>>
>> bhoehm garbage collector is possibly good and may help
>> resolve the circular references problem
>> however, bhoehm garbage collector is conservative, and
>> does only mark and sweep
>> this could leave a lot of memory uncollected
>> e.g. in the above example when root-o1 is disconnected gc
>> would need to figure out all the linked nodes in the graph
>> and garbage collect them
>> and that the use of mark and sweep would leave fragmented
>> memory after collection
>
> Even if the deallocation of nodes is delayed and performed
> in a background thread or whatever, the time to allocate the
> nodes one-by-one would be still significant. So I do not
> believe that using a Boehm collector would automatically
> provide the best possible result.
>
>
>


--
1) Resist, resist, resist.
2) If everyone pays taxes, the taxes are paid by everyone
Soviet_Mario - (aka Gatto_Vizzato)

Rosario19

Jul 17, 2018, 5:04:38 AM
there can be a leak in C++ even if the program has no leaks,
if the allocator (malloc()/free() or new/delete) does not have an algorithm
for returning to the OS the memory (zeroed first) that can

Rosario19

Jul 17, 2018, 5:06:00 AM
On Tue, 17 Jul 2018 11:09:21 +0200, Rosario19 wrote:

>the leak can be in C++ even if there are no leak,
>if the allocator (malloc()/free() or new()/delete()) not has one algo
>for return to OS the memory (zeroed first) that can

there can be a leak in C++ even if the program has no leaks,
if the allocator (malloc()/free() or new/delete) does not have an algorithm
for freeing to the OS the memory (zeroed first) that can

Paavo Helde

Jul 17, 2018, 9:05:12 AM
On 17.07.2018 4:38, Soviet_Mario wrote:
> On 15/07/2018 18:34, Paavo Helde wrote:
>> On 15.07.2018 17:47, Andrew Goh wrote:
>>> hi all,
>>>
>>> this is probably as old when c++ with new / delete is invented or
>>> even older than c++ has been around
>>>
>>> what's the current state-of-art for 'dynamic' memory management, in
>>> the notion of *garbage collection*?
>>> e.g.
>>> there is smart pointers today e.g. shared_ptr
>>>
>>> there is Boehm garbage collector
>>> http://www.hboehm.info/gc/
>>> is this still very popular or is this commonly being used today?
>>
>> In C++ the only sensible way to have GC is to logically think that all
>> allocated objects are leaked, but that's ok as you have got infinite
>> memory.
>
> Uh ... what do you actually mean with this "you have infinite memory" ?
> To use it up freely, never caring, and to simply rely on
> try/catch/finally in case some error occurs ?

GC basically means leaking all memory, to be cleaned up sometime later.
This has nothing to do with errors.

>> Behind the scenes the memory gets reused of course,
>
> again I can't understand.
> Do you mean at the OS' memory management level ?
> Or by some "hidden" code generated by modern compilers ?
> Or even sth else ...

By the garbage collector of course. The recent C++ standards include
optional support for the garbage collector, I guess this would count as
"hidden code added by the compiler". But I am not sure if any compiler
has actually implemented this. Alternatively, you add non-hidden code
like Boehm collector into your program.

If there is plenty of memory, the GC might indeed decide there is no
need to free any memory at all and it can leave the memory cleanup to OS
at the process exit. But this is a corner case.

>
>> but this should not have any visible effect on the program behavior.
>>
>> This means that any kind of non-memory resources will need extra care
>> at all levels as you need to ensure they get released properly and are
>> not left to GC which will run in unpredictable times in a random thread.
>>
>> I believe the current consensus is that with RAII one can
>
> at the very outdated time I firstly read of RAII, memory allocation was
> one of such resources. Why this distinction now ?

Memory can be released in any thread in any time without a direct impact
to the running program. That makes GC possible.

Other resources must be released more deterministically. A file must be
flushed and closed before it can be processed. A mutex lock must be
released in the same thread. A database transaction must be committed
before the program exit, or it would be lost. Etc.
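
For the mutex case, a minimal RAII sketch: `std::lock_guard` releases the lock at a known point, in the thread that took it. The `EventLog` class and its `is_unlocked` probe are hypothetical, added only to make the release observable.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical shared log guarded by a mutex.
class EventLog {
public:
    void append(int event) {
        std::lock_guard<std::mutex> lock(mutex_);  // acquired here
        events_.push_back(event);
    }  // released here, in the same thread, even if push_back throws

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return events_.size();
    }

    // Probe for illustration: try_lock succeeds only if no guard still
    // holds the mutex. Must not be called while this thread holds it.
    bool is_unlocked() const {
        if (!mutex_.try_lock()) return false;
        mutex_.unlock();
        return true;
    }

private:
    mutable std::mutex mutex_;
    std::vector<int> events_;
};
```

A collector-run finalizer could not provide this guarantee, since it may run late and on a different thread.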

>
>> control both memory and non-memory resources in the same way and with
>> less hassle, so RAII all the way down it is. Thus GC is effectively
>> not needed in C++.
>>
>>>
>
> SNIP
>
> ah, another question ... when you speak of POOLS, you mean allocating on
> the heap (or even statically) a big contiguous chunk and then overload
> new/delete into more specialized (and faster) forms which pick up some
> space within that chunk ?

Yes, few big chunks on the heap.
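
A hedged sketch of one such chunk, written as a standalone bump arena rather than as overloaded new/delete. The `Arena` class is hypothetical, handles only trivially destructible types, and never frees individual objects; everything goes away when the arena does.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>

// Hypothetical bump arena: one big chunk, handed out piece by piece,
// released all at once when the arena is destroyed.
class Arena {
public:
    explicit Arena(std::size_t bytes)
        : buffer_(new unsigned char[bytes]), capacity_(bytes), used_(0) {}

    // Copy-constructs a T inside the chunk, or returns nullptr if it no longer fits.
    template <typename T>
    T* create(const T& value) {
        static_assert(std::is_trivially_destructible<T>::value,
                      "this sketch never runs destructors");
        void* p = buffer_.get() + used_;
        std::size_t space = capacity_ - used_;
        // std::align bumps p up to the next suitably aligned address and
        // shrinks space by the padding, or returns nullptr if T does not fit.
        if (!std::align(alignof(T), sizeof(T), p, space)) return nullptr;
        used_ = (capacity_ - space) + sizeof(T);
        return new (p) T(value);
    }

    std::size_t used() const { return used_; }

private:
    std::unique_ptr<unsigned char[]> buffer_;
    std::size_t capacity_;
    std::size_t used_;
};
```

Allocation is a pointer bump with no locking and no per-object bookkeeping, which is where the speed comes from.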


Vir Campestris

Jul 17, 2018, 6:14:48 PM
On 17/07/2018 10:09, Rosario19 wrote:
> the leak can be in C++ even if there are no leak,
> if the allocator (malloc()/free() or new()/delete()) not has one algo
> for return to OS the memory (zeroed first) that can

I'm sorry, I'm not sure I understand you.

Heap management at the malloc/free level has been pretty much bug free
for 30 years or so.

Andy

Paavo Helde

Jul 17, 2018, 6:40:44 PM
I believe he/she/it is talking about the program not releasing freed
memory back to OS. This can be caused by memory fragmentation, small
leaks or just failing to call a special function for it
(scalable_allocation_command(TBBMALLOC_CLEAN_ALL_BUFFERS, NULL) for TBB,
for example).


Tim Rentsch

Jul 18, 2018, 6:42:50 AM
Andrew Goh <andre...@gmail.com> writes:

> [edited for summarization]
>
> what's the current state-of-art for 'dynamic' memory management, in
> the notion of *garbage collection*?
>
> there is Boehm garbage collector, http://www.hboehm.info/gc/
> [a conservative mark/sweep style collector, so it will reclaim
> cyclic structures] [how effective are conservative collectors at
> not leaking?]
>
> [questions about smart pointers as an alternative to some
> more general automatic management scheme]
>
> is there something such as a mark and compact garbage collector?
> if the simple implementation is difficult perhaps with the
> combined used of smart pointers?

You have asked a question about a very big topic.

Automatic memory management (often generically called "garbage
collection", or GC) has a wide range of approaches and
implementation strategies. None of these questions have
simple answers. (Before going further, disclaimer: I know
very little about C++'s smart pointers.)

Reference counting (which I believe is the approach C++ smart
pointers take) has several advantageous properties: it can be
implemented locally; it tends to be "smooth" in that memory is
deallocated in small slices rather than a bunch at once; freeing
happens more or less immediately, and deterministically; and it
is easy to understand. Reference counting also has several
disadvantageous properties: it will not reclaim cyclic structures
unless they explicitly have the cycles removed; it consumes more
resources (as a general rule) than more global schemes, in both
space and time; the invariants maintained must be kept exactly
right, which makes it reliant on destructors being run, which may
lead to hard-to-understand behavior in the presence of exception
processing. Reference counting works well for certain classes
of applications.

More global schemes, such as mark/sweep, compacting collectors,
or generation scavenging, centralize the memory management
function, and usually are what people mean when they say "GC".
Some techniques require more control over the language than is
available in C++. Some additional important comments:

Having GC available doesn't completely eliminate the need to do
manual or explicit memory management. It does reduce it by a
very large factor, but sometimes it's important to nil out a
pointer, or take some other step.

Early GC schemes were what is called "stop the world" collectors,
where all other processing stops until the collector is done.
That is no longer true in modern GC implementations. In fact,
for some time now there are GC algorithms where the collector
runs in a separate thread.

Conservative collectors sound horrifying, but in practice they
work effectively enough so the difference can be ignored in most
applications, especially on machines with 64-bit pointers. Some
measurements were done for the Boehm collector, which should be
easy to find if someone is interested to look.

GC has a reputation (at least with some people) as being slower
than manual memory management. This reputation is not deserved.
Bjarne wrote about this some years ago, and pointed out that it
is notoriously difficult to compare the time costs of the two
approaches. Certainly in some cases programs get faster after
switching to a GC-based scheme (it was the Boehm collector
in particular in one case I'm familiar with).

Probably the biggest downside of GC-style management is loss of
control over when (or sometimes even whether) finalizers are
run. This is a special case of the earlier comment about GC
not eliminating the need to manage some resources manually in
some cases.

Probably the biggest upside of GC-style memory management is a
big increase in productivity, which has been measured as a
factor somewhere between 1.5 and 2. I know of no studies that
disagree with these findings.

I think it is commonly true that various developers are either
pro-GC or anti-GC. I try to be more agnostic about it: there
are some cases where having GC is an enormous boon, and other
cases where it is essential to maintain manual control. I don't
think either stance is right all the time. So I hope this has
given you a flavor of the various plusses and minusses of the
different possibilities.

Paavo Helde

Jul 18, 2018, 10:24:09 AM
On 18.07.2018 13:42, Tim Rentsch wrote:

> Reference counting also has several
> disadvantageous properties: it will not reclaim cyclic structures
> unless they explicitly have the cycles removed;

Using std::weak_ptr might provide some mitigation here.

> it consumes more
> resources (as a general rule) than more global schemes, in both
> space and time;

Depends on the smartpointer. For example, std::unique_ptr consumes zero
resources in both space and time. Yes, std::shared_ptr is a bit
heavyweight because the reference-count update is required to be
multithread-safe. A single-thread only smartpointer is much faster in
multithreaded programs, but its usage obviously requires more care.
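
A rough way to see the difference, with the caveat that the size check reflects what mainstream standard library implementations do for the default deleter and is not guaranteed by the standard. Both function names are hypothetical.

```cpp
#include <cassert>
#include <memory>

// With the default deleter, unique_ptr is typically just a wrapped pointer.
// (Assumption: true on common implementations, not mandated by the standard.)
bool unique_ptr_is_pointer_sized() {
    return sizeof(std::unique_ptr<int>) == sizeof(int*);
}

// shared_ptr keeps a separate control block; each copy updates the
// shared count, and that update has to be thread-safe.
long owners_after_one_copy() {
    std::shared_ptr<int> a = std::make_shared<int>(1);
    std::shared_ptr<int> b = a;  // increments the shared count
    return a.use_count();        // counts a and b
}
```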

> the invariants maintained must be kept exactly
> right, which makes it reliant on destructors being run, which may
> lead to hard-to-understand behavior in the presence of exception
> processing.

Not sure what you want to say here? That if the program is buggy it
might not work properly? Or that resources other than memory cannot be
released by GC? Yes, that's the main problem with GC.

[...]
> Probably the biggest upside of GC-style memory management is a
> big increase in productivity, which has been measured as a
> factor somewhere between 1.5 and 2.

Compared to what? To the proper C++ code using std::vector, std::string,
std::make_unique, std::make_shared, or to the C or C-style C++ code
calling malloc/free or new/delete manually?

I'm sure GC suits fine some programs.

woodb...@gmail.com

Jul 18, 2018, 12:42:31 PM
That was my question also.

> std::make_shared, or to the C or C-style C++ code
> calling malloc/free or new/delete manually?
>

Vector and string use new/delete manually.


Brian
Ebenezer Enterprises - Enjoying programming again.
https://github.com/Ebenezer-group/onwards

Andrew Goh

Jul 18, 2018, 4:21:53 PM
hi all,
i tend to agree with Paavo Helde's point of view about smart pointers and all
the notion of 'leaking resources', memory in particular, would i think sound foreign to, say, a java or c# programmer

in java, you simply do a
MyClass myobject = new MyClass();

and you literally *forget* about it (yup, that's leaked); when the reference goes out of scope (say it was a local variable on the stack), garbage collection takes care of cleaning the object up

but i have to say these things are 'difficult' in c++, not because it isn't possible to do GC in C++, but more because

1) c++ allows raw pointers to be used. if you look at the above statement from java, that isn't a pointer, it is simply a reference, and the underlying memory manager takes care of cleaning those up

Bjarne Stroustrup basically mentioned some of the issues that arise when raw pointers are used along with garbage collection
http://www.stroustrup.com/C++11FAQ.html#gc-abi
the use of raw pointers may lead to ambiguous situations where it may not be possible for gc to tell if a particular object is after all still 'in use'

the cool thing is that 'smart' pointers were invented to take the place of raw pointers
and 'smart' pointers are basically *references* - not real pointers
but reference counted smart pointers can't resolve circular references (i.e. loops) without GC

there is also this article about Boost shared_ptr being '10 times slower than other means e.g. raw pointers', i'm not sure if this is still true today
http://flyingfrogblog.blogspot.com/2011/01/boosts-sharedptr-up-to-10-slower-than.html

ideally i'd prefer to have implementations amounting to 'a smarter smart pointer', i.e. the smart pointers are literally used as a full blown garbage collection implementation rather than simply a reference counting scheme

today's desktop cpus and various mobile arm based cpus are moving towards a trend of multi cores and in addition vector computing is becoming increasingly common e.g. AVX2, AVX512 coming next to all new X86_64 platforms. given this it would be better to see memory management as a 'process' rather than the 'single threaded' simplistic view of clean up as you go model.

the simplistic view is very 'single threaded' (i.e. clean up as you go) and that adds to the total elapsed time as you could imagine the extreme case of a multi-core cpu where all the other cores are simply idle while that single thread is busy doing all that 'clean up as you go'.
i'm literally ok with 'stop the world', which would introduce stalls so that 'garbage collection' can do all that 'mark and sweep' and even compacting memory management. the end result is often a big reduction of total elapsed time, which translates to faster / more productive program execution (sometimes significantly faster)

however, i do agree with bitrex that raw pointers are sometimes still a necessary evil. i've dabbled in arduino embedded programming, and on those mcus sometimes 20k of ram (note: not 2 megs or 2 gigs, just 20k) is all you have.
i still use c++ to keep the code organised, but then memory management becomes totally 'manual', clean up as you go. here it is not application efficiency but *memory efficiency* and *real-time-ness* that matter more

2) the other reason it is *hard* to do GC in C++ is that GC is not *insisted* upon in C++: it is not mandated in the spec, and the language is not built around garbage collection (unlike e.g. java and c#, which simply fail without GC)

this leaves the field open to different means of memory management, including smart pointers (e.g. shared_ptr), garbage collection (e.g. boehm GC) etc
and the *garbage collection / memory management* process is *very hard* in large complex apps. there are many *buggy* implementations of GC and memory management, as memory management / garbage collection is simply an 'add-on' to c++ rather than mandated

Vir Campestris

Jul 19, 2018, 4:24:08 PM
On 17/07/2018 23:40, Paavo Helde wrote:
> I believe he/she/it is talking about the program not releasing freed
> memory back to OS. This can be caused by memory fragmentation, small
> leaks or just failing to call a special function for it
> (scalable_allocation_command(TBBMALLOC_CLEAN_ALL_BUFFERS, NULL) for TBB,
> for example).

I've fixed bugs in that area by setting custom deleters in
(unique|shared)_ptr.
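
For readers who have not used them, a custom deleter might look roughly like this. It is a sketch only: FreeDeleter, CBuffer, the two factory functions and the g_released counter are names invented for the example, and it uses plain std::malloc/std::free rather than any particular allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

int g_released = 0;  // counts deleter invocations (illustration only)

// Deleter for memory that came from std::malloc, which must go back
// through std::free rather than delete[].
struct FreeDeleter {
    void operator()(void* p) const noexcept {
        std::free(p);
        ++g_released;
    }
};

using CBuffer = std::unique_ptr<char[], FreeDeleter>;

CBuffer make_buffer(std::size_t n) {
    return CBuffer(static_cast<char*>(std::malloc(n)));
}

// shared_ptr takes the deleter as a constructor argument instead of as
// part of its type.
std::shared_ptr<char> make_shared_buffer(std::size_t n) {
    return std::shared_ptr<char>(static_cast<char*>(std::malloc(n)),
                                 [](char* p) { std::free(p); ++g_released; });
}

// Returns how many times a deleter ran for one unique and one shared buffer.
int releases_after_use() {
    int before = g_released;
    {
        CBuffer a = make_buffer(16);
        if (a) std::strcpy(a.get(), "hello");
        auto b = make_shared_buffer(16);
        auto c = b;                  // second owner of the same buffer
        assert(b.use_count() == 2);  // the deleter still runs only once
    }
    return g_released - before;
}
```

The same shape works for any release function that has to be paired with its matching acquire.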

Andy

Jorgen Grahn

Jul 30, 2018, 6:52:21 AM
On Mon, 2018-07-16, Andrew Goh wrote:
> hi all,

> thanks for all the responses, actually coming from the java world
> c++ is quite as 'native' to me, i'd guess most 'java programmers'
> (and so do other similar managed languages) are somewhat spoilt by
> garbage collection.

Then you should use the things discussed above with care. A lot of
C++ code doesn't need ownership-by-pointer at all; things are often
owned by being on the stack, or owned by some other object, or owned
by a std::vector or similar.

Smart pointers are useful, but don't use them to pretend it's
Java or Python. Strict and rigid ownership is a good thing,
in the scenarios where it works.
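
A small sketch of what ownership without any pointers can look like. Track, Playlist and demo_total are made-up names for the illustration: the stack owns the Playlist, the Playlist owns a std::vector, and the vector owns its elements by value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Track {
    std::string title;   // owned by the Track
    int seconds;
};

class Playlist {
public:
    void add(std::string title, int seconds) {
        tracks_.push_back(Track{std::move(title), seconds});
    }
    std::size_t size() const { return tracks_.size(); }
    int total_seconds() const {
        int sum = 0;
        for (const Track& t : tracks_) sum += t.seconds;
        return sum;
    }
private:
    std::vector<Track> tracks_;   // owns every Track by value
};

int demo_total() {
    Playlist p;                   // lives on the stack
    p.add("intro", 30);
    p.add("main theme", 200);
    assert(p.size() == 2);
    return p.total_seconds();
}   // p, its vector, every Track and every string are destroyed here
```

There is no new, no delete and no smart pointer anywhere, and nothing is left over for a collector to find.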

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

woodb...@gmail.com

Jul 30, 2018, 11:10:37 AM
On Monday, July 30, 2018 at 5:52:21 AM UTC-5, Jorgen Grahn wrote:
> On Mon, 2018-07-16, Andrew Goh wrote:
> > hi all,
>
> > thanks for all the responses, actually coming from the java world
> > c++ is quite as 'native' to me, i'd guess most 'java programmers'
> > (and so do other similar managed languages) are somewhat spoilt by
> > garbage collection.
>
> Then you should use the things discussed above with care. A lot of
> C++ code doesn't need ownership-by-pointer at all; things are often
> owned by being on the stack, or owned by some other object, or owned
> by a std::vector or similar.
>
> Smart pointers are useful, but don't use them to pretend it's
> Java or Python.

The developer of unique_ptr describes a good alternative
to unique_ptr here:
https://stackoverflow.com/questions/38780596/how-to-handle-constructors-that-must-acquire-multiple-resources-in-an-exception#38780597

I sometimes use unique_ptr, but not that often.


Brian
Ebenezer Enterprises
http://webEbenezer.net
https://github.com/Ebenezer-group/onwards

Tim Rentsch

Aug 5, 2018, 10:45:30 AM
Paavo Helde <myfir...@osa.pri.ee> writes:

> On 18.07.2018 13:42, Tim Rentsch wrote:
>
>> Reference counting also has several
>> disadvantageous properties: it will not reclaim cyclic structures
>> unless they explicitly have the cycles removed;
>
> Using std::weak_ptr might provide some mitigation here.

Sure. But that starts to move into the domain of doing manual
management. One needs to think about where to use the weak
pointers: not enough, and cycles don't get reclaimed; too many,
and the data structure falls apart too early. That doesn't mean
reference counting is hopeless -- any non-trivial application does
some level of manual management, and even full-scale mark/sweep
environments need weak pointers in some circumstances. The
important thing is that making sure everything gets reclaimed
incurs a mental cost, and that cost is higher for reference
counting than it is for mark/sweep-style collectors.
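
As a rough sketch of both failure modes, assuming a toy TreeNode type and a g_alive counter invented for the example: an owning link downwards plus a weak link upwards gets reclaimed, while an object reachable only through weak pointers is gone immediately.

```cpp
#include <cassert>
#include <memory>

int g_alive = 0;  // number of TreeNode objects currently alive (illustration only)

struct TreeNode {
    std::shared_ptr<TreeNode> child;   // owning link, downwards
    std::weak_ptr<TreeNode> parent;    // non-owning link, upwards
    TreeNode()  { ++g_alive; }
    ~TreeNode() { --g_alive; }
};

// Parent and child refer to each other, but only one direction owns.
int alive_after_release() {
    int before = g_alive;
    {
        auto root = std::make_shared<TreeNode>();
        root->child = std::make_shared<TreeNode>();
        root->child->parent = root;
        // lock() yields temporary shared ownership for as long as needed.
        assert(root->child->parent.lock() == root);
    }   // root released: no cycle of owning links, so both nodes die
    return g_alive - before;
}

// "Too many" weak pointers: if the only link is weak, nothing keeps the
// object alive past the end of the full expression that created it.
bool dangling_when_all_weak() {
    std::weak_ptr<TreeNode> only_link = std::make_shared<TreeNode>();
    return only_link.expired();
}
```

Deciding which direction owns is exactly the manual design step described above; the library does not make that choice for you.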

>> it consumes more
>> resources (as a general rule) than more global schemes, in both
>> space and time;
>
> Depends on the smartpointer. For example, std::unique_ptr consumes
> zero resources in both space and time. Yes, std::shared_ptr is a bit
> heavyweight because the reference-count update is required to be
> multithread-safe. A single-thread only smartpointer is much faster in
> multithreaded programs, but its usage obviously requires more care.

(Disclaimer: I'm not a unique_ptr expert. I did look into the
issues here, and believe my comments below are correct, but I very
well may have missed or misunderstood something. All feedback
welcome.)

First, my comment was about reference counting, and unique_ptr is
not reference counting. Or it may be thought of as 1-bit RC -- it
either has a non-null pointer or it doesn't (and the "count" is in
the unique_ptr itself, not in shared per-object memory). In
any case it is definitely less automatic than full RC (which I take
as being equivalent to shared_ptr, with a similar disclaimer). As
such it requires more mental effort than using shared_ptr or some
other more complete RC scheme.

Second, regarding resource use. The idea that unique_ptr has no
space or time overhead is, I believe, incorrect. In particular,
when updating a unique_ptr from a different unique_ptr (temporary),
there is a space cost in the form of code space needed for a value
check, and a time cost in checking the value to see if a destructor
needs to be called. Those may not be large costs, but they are not
zero. Compare to a mark/sweep-style environment, where all that
needs to happen is copying the pointer.
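
A minimal sketch of the behavior in question, with a made-up Widget type and counter. It does not measure any cost; it only shows that move assignment has to deal with whatever the target already owns, which a plain pointer copy never does.

```cpp
#include <cassert>
#include <memory>
#include <utility>

int g_dtor_calls = 0;  // counts Widget destructions (illustration only)

struct Widget {
    ~Widget() { ++g_dtor_calls; }
};

// Target already owns an object: the assignment must destroy it.
int dtor_calls_when_target_owns() {
    auto target = std::make_unique<Widget>();
    auto source = std::make_unique<Widget>();
    int before = g_dtor_calls;
    target = std::move(source);     // the Widget target held is destroyed here
    assert(source == nullptr);      // ownership has left the source
    return g_dtor_calls - before;
}

// Target is empty: the same assignment finds nothing to destroy.
int dtor_calls_when_target_empty() {
    std::unique_ptr<Widget> target;
    auto source = std::make_unique<Widget>();
    int before = g_dtor_calls;
    target = std::move(source);
    return g_dtor_calls - before;
}
```

The two functions run the same statement with different outcomes, so the decision depends on the target's value at run time unless the compiler can prove it in advance.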

Third, besides needing more manual management, using unique_ptrs
carries an additional mental cost in having to understand the more
elaborate language semantics that they rely on. In particular,
unique_ptr has move semantics but not copy semantics. The kinds of
code patterns used for "normal" objects (ie, that do have copy
semantics) in many cases simply don't apply to unique_ptrs. To use
unique_ptrs requires a new kind of thinking, which means more mental
cost. I'm not saying these things can't be learned -- obviously
they can be. But there is an initial cost in learning, and more
importantly an ongoing cost in using (cost here meaning mental
effort) things like unique_ptr that don't use the usual semantic
constructs. If that use could be tucked away underneath a layer of
abstraction that would be one thing, but with unique_ptr it's right
there in your face and there's no getting around having to deal with
it. Is that a good tradeoff to make? I expect in some cases it is.
I believe though that in many cases a different tradeoff would be
better.
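
To illustrate the difference in code patterns, here is a short sketch (pass_through and move_only_demo are names invented for the example). The commented-out line is what one would write for an ordinary copyable object, and it does not compile for unique_ptr.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

static_assert(!std::is_copy_constructible<std::unique_ptr<int>>::value,
              "unique_ptr cannot be copied");
static_assert(std::is_move_constructible<std::unique_ptr<int>>::value,
              "unique_ptr can be moved");

// Taking a unique_ptr by value means the caller gives up ownership.
std::unique_ptr<int> pass_through(std::unique_ptr<int> p) {
    return p;   // returning a local or parameter moves implicitly
}

int move_only_demo() {
    auto a = std::make_unique<int>(7);
    // auto b = a;                // would not compile: the copy is deleted
    auto b = std::move(a);        // ownership transfers, a becomes null
    assert(a == nullptr);

    std::vector<std::unique_ptr<int>> v;
    v.push_back(std::move(b));    // containers need std::move as well

    auto c = pass_through(std::move(v.back()));
    assert(v.back() == nullptr);  // the vector slot is now empty
    return *c;
}
```

Every hand-off is spelled out with std::move, and each moved-from name is left null, which is the extra bookkeeping the paragraph above refers to.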

>> the invariants maintained must be kept exactly
>> right, which makes it reliant on destructors being run, which may
>> lead to hard-to-understand behavior in the presence of exception
>> processing.
>
> Not sure what you want to say here? [...]

Invariants maintained locally and incrementally are often harder
to get right and harder to understand than invariants maintained
centrally.

>> Probably the biggest upside of GC-style memory management is a
>> big increase in productivity, which has been measured as a
>> factor somewhere between 1.5 and 2.
>
> Compared to what?

Compared to doing all memory management manually.

> To the proper C++ code using std::vector,
> std::string, std::make_unique, std::make_shared, or to the C or
> C-style C++ code calling malloc/free or new/delete manually?

Even just using unique_ptr and shared_ptr, C++ provides a form of
reference counting, which is more automatic management than having
to manage all memory (de-)allocation manually. I don't know of any
studies that compare the productivity results of using reference
counting versus a more centralized form of garbage collection such
as mark/sweep. Based on personal experience I expect that some
level of increased productivity would be found, but it would be
nice to get some more objective results on that.

> I'm sure GC suits fine some programs.

Yes, and the question is which programs (and which classes of
programs). I would never claim that any approach is right for all
applications. But I think many people, or maybe even most people,
form opinions about that based on pre-conceptions that turn out
not to be right if examined more closely, so I like to encourage
people to look at things from different points of view, and see
if their perceptions might change on further investigations.