This is already present in at least one popular implementation (lcc-
win32), and would clearly be a useful addition to the language,
reducing the burden on programmers to manage memory and increasing
security.
For many applications of C (notably real-time) GC is completely
inappropriate.
Any implementation is free to add GC as an extension. It might be a
reasonable suggestion to standardise such an extension outside of the
core language standard.
--
Ian Collins.
among other things...
> Any implementation is free to add GC as an extension. It might be a
> reasonable suggestion to standardise such an extension outside of the
> core language standard.
>
agreed.
such a feature is probably better either ommited, or, at most, left as an
"optional" feature for the implementation. sadly, as well, one can't
'define' refcounting in a language like C absent otherwise constraining the
implementation (either the compiler ends up generating secret runtime calls,
or we end up specifying the details of how the GC is to be implemented).
one can't just say, "well, we will have a conservative mark/sweep
collector". though this is likely to be the case in practice,
standardization would limit us by making implementations less free to use
other options (conservative mark/sweep has a few good points, but also a few
major detractors...).
yes, of course, one also faces a slight issue with a compiler playing along
with a pre-existing C runtime. as such, the garbage collector ends up being
a seperate entity from the rest of the runtime library. though if needed, it
is a hackable issue (we 'rename' malloc and free to refer to different
functions, which may or may not use the old interface depending on what they
are working with).
for example, when in my compiler framework I give it its own C-usable GC
setup (internally, the compiler currently uses this tagged-reference and
refcounting beast, but this is not appropriate for a "general purpose"
allocator), it will exist seperately from the malloc/free interface.
it will not be part of the compiler proper, or of the standard runtime, but
a library extension.
most likely, this interface will be a hybridization and a 'reformulation' of
my existing garbage collectors.
so, I will have GC, but how I implement it may be different than how someone
else does...
> --
> Ian Collins.
There are cases (mostly fairly obscure ones) where GC can break valid
code. If a program stashes a pointer away in a manner that prevents
the GC code from find it and/or recognizing that it might be a
pointer, GC can deallocate memory that's still in use.
This isn't a fatal obstacle to using GC, but any standard for GC in C
(whether it's part of the language standard or not) needs to address
this point.
--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
And, ideally, will interact well with efforts to add
some form of optional GC to ISO C++ (as such efforts
are addressing the same issues).
-- James
A proper economic analysis must include the negative consequences
as well. Some of them have been mentioned in other responses;
I'll add that encouraging programmers to lose track of when their
dynamic objects are valid does not "increase security". There is
also insufficient generality, since there would be no destructors,
which are sometimes essential.
Lest anyone imagine that this is just a hypothetical
curiosity that would never actually occur: I've seen it.
The circumstance was a program whose authors expended a
good deal of effort on their license-locking scheme. To
defend against people using debuggers to trace their code
and patch around the license-enforcement bits, they would
checksum various parts of the code and data to detect
insertion of breakpoints in the former or diddling with
the latter. To make the checksumming code hard to find
they disguised all its pointers by XOR'ing with random
garbage: You might discover that the license key resided
at location thus-and-such, but a search through core
would find no pointer to that spot. Neither would GC ...
That wasn't the limit of the paranoia, either. The
checksumming functions were never called directly, but
only through garbled function pointers de-garbled on the
fly. When a checksum error was found, the checksummer
did not call exit() and thus risk appearing in a stack
trace, but instead diddled another function pointer used
in a periodic maintenance task: the exit() call would occur
five or ten seconds later, from an apparently unrelated
part of the program ... I learned the ins and outs of this
ludicrous scheme when it rose up and bit my behind while
I was trying to port the whole program to another platform,
where many of the utterly non-portable assumptions about
data layout did not hold ... <<shudder>>
Yes, it provokes less mistakes, by doing that automatically.
Since the programmer is no longer responsible for keeping accounting of
each memory piece, less mistakes are done in big software projects,
and the software is much more reliable.
> There is
> also insufficient generality, since there would be no destructors,
> which are sometimes essential.
Destructors can be programmed as usual. Most garbage collectors also
allow you to force a free() if you *know* that an object is destroyed.
For instance:
void FreeData(DataStructure *p)
{
fclose(p->File);
destroyHandle(p->Handle);
// etc etc...
GC_free(p);
}
This destructor would be the same, but the call to
free() could be replaced with a call to GC_free,
or (much better) the call to free would be replaced
with
memset(p,0,sizeof(DataStructure));
to avoid using it after it is destroyed.
--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
Ahh those horror stories.
I know a better one.
I wrote the contents of a pointer into a file, then, when the
program was started again, I called "free()" with that pointer.
Strange, strange...
It did NOT WORK!!!
We should NOT standardize free() Gosh!
Legal programs would break!
Would you mind providing your measurements
of the magnitude of "much more?"
It is very difficult to quantify but the best example I have is the
debugger of lcc-win32.
This is an application with 3 threads of execution: the program under
debug and the supervisor, the debugger itself, and the user interface
or IDE, that receives the commands from the user. The debugger must
constantly allocate and fill an enormous amount of information at each
breakpoint that the program under debug hits.
At each breakpoint data structures, buffers, and what have you must be
built, displayed, i.e. sent to the user interface that routes that
buffers into their respective windows. All that must be released
but of course not completely when the user restarts the program.
Some buffers must be kept to compare them with the previous ones and
show the changes in red color, other must be kept because they contain
info that is global to a module, etc etc.
You can imagine that malloc/free in this context just do not cut it.
I spent weeks and weeks debugging the malloc/free bugs, and it would
NEVER work. This application can be done without a GC of course, but
then you need BIG teams and BIG money and time, what I did not have.
The GC allowed me to write a debugger. Without it, I would have never
finished.
And many real applications have this degree of complexity. If you spend
your time debugging heap problems you are NOT writing you application
even if it looks like.
Now I can allocate memory in ANY thread in ANY part of the program and
do not have to worry about it. This is a BIG plus.
How big?
Can't tell you how big BIG is!
:-)
> Ahh those horror stories.
>
> I know a better one.
>
> I wrote the contents of a pointer into a file, then, when the
> program was started again, I called "free()" with that pointer.
>
> Strange, strange...
>
> It did NOT WORK!!!
>
> We should NOT standardize free() Gosh!
>
> Legal programs would break!
I fail to see what that's supposed to illustrate. If *course* it
didn't work. No valid program broke. A blatantly invalid program
broke.
A program that stores a pointer in some other form, then reconstitutes
it and accesses the object it points to, is a valid program. I can
think of a number of reasons to do this kind of thing. Eric presented
one (rather ugly) possibility. Another possibility is that a running
program might wish to temporarily compress (to save space) or encrypt
(for security reasons) a chunk of memory that happens to contain
pointers, and recover and use the pointers later. Or a pointer might
be used as, say, a hash table key, with the hash data structure
internally storing its keys in some non-obvious form; you can recover
the pointer value by traversing the keys.
I did *not* say that GC should not be standardized. I merely pointed
out that if it's going to be standardized, the fact that it could
break some programs that are currently valid will have to be dealt
with.
Presumably GC would be optional, since there are circumstances where
it's not appropriate. It would probably suffice to say that if a
pointer points to an object that's subject to garbage collection, and
if that pointer has not been continuously stored in some object in its
original representation, any attempt to access the object designated
by the pointer invokes undefined behavior. (Actually that wording
doesn't *quite* work, since it's possible for an operation that yields
the same pointer value to produce a different representation.)
This is probably less of an issue if the GC subsystem provides its own
allocator, and memory allocated by malloc(), calloc(), and realloc()
is not subject to garbage collection. Then programs would have to be
specifically written to use GC, and existing programs would be largely
unaffected. The OP didn't say which approach he advocates.
Again, I did *not* say that I oppose standardizing GC (I actually
don't have an opinion on the question). I merely pointed out some
things that must be taken into account if it is to be standardized.
I hope you're not suggesting that such issues should simply be
ignored.
>I believe that the Committee should add garbage collection in the next
>version of the Standard.
What exactly would this garbage collection be required to do?
A conservative collector (e.g. the Boehm collector) cannot reliably
collect all garbage (because there may be things that look like
pointers but aren't), and may discard non-garbage data if pointers
are encoded (e.g. when implementing other languages and putting
type bits in the low-order bits of a pointer).
A "real" garbage collector would be impractical with most, if not
all, existing implementations.
If there were requirements, it would presumably be legal to have a
no-op collector ("DDI" as it was known at one time). I suppose you
could say it was a quality-of-implementation issue.
The most plausible solution would be to require at least a conservative
collector, with guarantees about what would not be mistakenly collected,
and perhaps mechanisms for informing the collector about encodings.
-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
>If there were requirements, it would presumably be legal to have a
>no-op collector
I of course meant "If there were NO requirements ..."
Another, not so ludicrous example is the trick of using a
single link to traverse a list of nodes in either direction.
The key is that the link is the XOR of the connected node
addresses. I've seen this used as a space-saving feature
in code that had to minimize RAM usage.
GC works best in more highly constrained languages than C.
It can be useful in C also, but only if the programmer
follows some usage contraints. For example, Rob Pike used
a GC version of malloc in applications for the Blit terminal.
Personally, I haven't found GC to be necessary for C apps.
This problems can be made less critical if the
compiler emits information for the collector telling it exactly
where the pointers ARE. Then, random hits of some data in the
stack can't be considered pointers, and the probability
of leaking memory goes down.
Of course even with this schema, the problem of uninitialized
pointers would still be there i.e.
int fn(void)
{
char *p;
p = GC_malloc(343);
}
If at the point of the call to malloc the GC makes a scan of the stack,
the value of p can contain garbage, and that garbage *could* be
interpreted as a pointer. This is not serious since that garbage
would be released in a future collection when the program is elsewhere.
> A "real" garbage collector would be impractical with most, if not
> all, existing implementations.
>
Impractical no. A *lot* of work? Yes!
>
> The most plausible solution would be to require at least a conservative
> collector, with guarantees about what would not be mistakenly collected,
> and perhaps mechanisms for informing the collector about encodings.
I would not standardize in this area, it is too system specific. It
depends on the size of pointers, stack layout, the specific collector
and so many low level factors that a standard here would be very
difficult to design.
yes.
> This is probably less of an issue if the GC subsystem provides its own
> allocator, and memory allocated by malloc(), calloc(), and realloc()
> is not subject to garbage collection. Then programs would have to be
> specifically written to use GC, and existing programs would be largely
> unaffected. The OP didn't say which approach he advocates.
>
I would take this stance (that good old malloc and friends remain
non-GC'ed), albeit for different reasons than the above.
optional GC may be a better option, just the issue becomes more one of
specifics.
> Again, I did *not* say that I oppose standardizing GC (I actually
> don't have an opinion on the question). I merely pointed out some
> things that must be taken into account if it is to be standardized.
> I hope you're not suggesting that such issues should simply be
> ignored.
>
I will agree here.
sadly, there is a lot that is GC...
for example, we could more or less rip off Boehm's API (but not strictly
specify that the underlying GC is boehm, only that it looks similar...).
maybe even the GC could be a "conjoined twin" with the malloc implementation
(malloc simply representing "explicitly non-GC'ed objects").
or, one could use a custom API.
in any case, for a more or less unmodified compiler, it would probably have
to be conservative.
below is a general idea, taking largely from my own practices (albeit a new
API design...).
a simple idea for a custom API (also includes basic dynamic typing
facilities):
void *gcalloc(int sz); //alloc GC'ed object (untyped)
void *gcrealloc(void *p, int sz); //realloc GC'ed object
void *gcalloct(char *ty, int sz); //alloc typed object
char *gcgettype(void *p); //get type
...
note, type is a string, generally opaque and hopefully not clashing with
anything.
each object may be given a type (untyped objects are simply untyped...).
the typesystem is easily and readily extensible (since all we really need is
a new type name and a few functions).
dynamic typing is useful in many cases, even if not the primary model of the
language and hidden behind a large amount of wrapper code (as is the case
for C).
for the case below:
in general, I am specifying that a lot of wrappers be used (rather than
using the above in a more direct structs-based manner) in part to leave the
implementation more free with the internals (I had before faced pain being
forced from a more manual and ad-hoc pointer-casting approach to a more
uniformized approach).
note that I am assuming a fairly 'lightweight' allocator (namely, one where
I can allocate a very large number of objects absent much real overhead). in
general, I use 'cell' allocators (where the nominal size of each allocation
is 8 or 16 bytes). typically, a different allocation approach (often
piggybacking on malloc) is used for larger objects (more than a few kB).
the implementation could provide a number of built-in types as well (for
example, strings, opaque integers, ...).
char *gcstrdup(char *str);
int gcstringp(void *p); //is a string
may use a different naming convention, such as 'dy':
char *dystrdup(char *str);
int dystringp(void *p); //is a string
void *dyint(int v); //wrap integer
int dyintp(void *p); //is an integer
int dyintv(void *p); //integer value
void *dyfloat(float v); //wrap float
int dyfloatp(void *p); //is a float
float dyfloatv(void *p); //float value
reason, all this stuff is fairly useful...
likewise, for the builtin types, utility can also be incuded for arithmetic:
void *dyadd(void *a, void *b);
void *dymul(void *a, void *b);
so, a few likely useful built-in types:
int, full integer range, may or may not represent part of the range
specially (such as mapping it to some 'reserved' memory region);
long, may or may may not be distinct from int;
long long, represents full long long range;
float, floating point, much like int;
double, floating point (double);
string, normal string;
wstring, wide string (wchar_t);
symbol, like string, but different;
keyword, also like symbol, but different;
array, typed array of pointers;
cons, cells/lists;
symbolic hash tables;
...
likewise, a number of other builtin types (vec3, quat, int128, ...);
and certain API types (such as potentially 'FILE', ...).
since type names are cheap, little reason why not to have a good collection
of useful types...
void *dyarray(int sz);
int dyarrayp(void *p);
void *dyarrayget(void *p, int idx);
void *dyarrayset(void *p, int idx, void *q);
my case, cons cells are primarily a carry-over from lisp/scheme, but they
are useful:
basically, they are linked together into linked lists, each one only having
2 slots (traditionally called 'car' and 'cdr', but 'head' and 'tail' have
also been used before);
so, by linking lots of them together, one builds various structures (such as
ASTs, and other things).
void *dycons(void *car, void *cdr);
int dyconsp(void *p);
void *dycar(void *p); //get car
void *dycdr(void *p); //get cdr
void *dysetcar(void *p, void *v); //set car
void *dysetcdr(void *p, void *v); //set cdr
lists would be composed of cons cells.
void *dylist4(void *a, void *b, void *c, void *d);
being the same as:
dycons(a, dycons(b, dycons(c, dycons(d, NULL))))
likewise, things using cons cells also often provide functions for doing
more complex accesses:
void *dycadddr(void *p);
being the same as:
dycar(dycdr(dycdr(dycdr(p))))
or such...
yes, though possible, this has a few limitations:
more interdependencies between the compiler and runtime;
reduced performance (much like profiling, all these hidden little function
calls);
...
however, at least IME, the occurance of false hits is fairly low (I suspect
most actually come from what were once valid pointers). as a result, "stack
cleaning" would likely be more practical, but is similarly expensive...
the occurance of false hits is, however, not a major problem IME...
> Of course even with this schema, the problem of uninitialized
> pointers would still be there i.e.
>
> int fn(void)
> {
> char *p;
>
> p = GC_malloc(343);
>
> }
>
> If at the point of the call to malloc the GC makes a scan of the stack,
> the value of p can contain garbage, and that garbage *could* be
> interpreted as a pointer. This is not serious since that garbage
> would be released in a future collection when the program is elsewhere.
>
yes.
>
>> A "real" garbage collector would be impractical with most, if not
>> all, existing implementations.
>>
>
> Impractical no. A *lot* of work? Yes!
I don't get where this came from anyways.
I have been using GC in my projects with good success for years...
and it was only very recently that I have had my own C compiler...
in fact, I have used GC in C (and for that matter, dynamic typing) far more
than in my script VMs...
>>
>> The most plausible solution would be to require at least a conservative
>> collector, with guarantees about what would not be mistakenly collected,
>> and perhaps mechanisms for informing the collector about encodings.
>
> I would not standardize in this area, it is too system specific. It
> depends on the size of pointers, stack layout, the specific collector
> and so many low level factors that a standard here would be very
> difficult to design.
>
yes.
however, one could standardize, for a conservative collector, on the "most
conservative" option, namely in that a pointer can refer anywhere within the
object.
of course, this along with my otherwise "not horribly inefficient for masses
of tiny allocs", does constrain the implementation a little, but this is
still plenty possible.
in a linked-list allocator, it is a lot harder (well, along with being
efficient for smaller allocs).
in a cell allocator, we just locate the right part of the heap, and the cell
(shifting and masking the pointer), and skim backwards until we hit the
first cell in the object. this is usually fast enough.
my most recent GC (precise), I think, used 8 byte cells and 8 bits per cell
for a status (holds a cell type, some MM/GC flags, a ref count, ...). this
has a heap overhead of 12.5% (part of the reason why not to use cell
allocation for larger objects...).
my last majorly used (conservative) collector used 16 byte cells and 4
bits/cell (thus, 3.125% heap overhead).
the exact reasons for these differences are technical (type and usage
pattern mostly, for example a very high occurance 8 byte allocs, ...).
I have heard of certain allocators using 4 byte cells, but IMO there is
little reason for this (memory is generally not 'that' tight...).
one can then declare, GC works, so long as one is sane with it...
> GC works best in more highly constrained languages than C.
> It can be useful in C also, but only if the programmer
> follows some usage contraints. For example, Rob Pike used
> a GC version of malloc in applications for the Blit terminal.
>
for most code, the basic constraints (what is required for conservative GC)
are minimal (typically "don't do funky crap with pointers" and "be careful
when mixing the GC and data gained from malloc", ...).
> Personally, I haven't found GC to be necessary for C apps.
in any really non-trivial app, I feel it more than pays for itself.
now, for many small apps, one can get by without it (or for that matter even
bothering to free what they get from malloc, figuring it will all go away
anyways once the app closes), but for larger code, lacking GC can become
painful...
so, necessary? maybe not.
useful? yes.
or such...
Sure, but if it's going to be included in the C standard, then you
*must* have a reasonably rigorous definition of what "sane" means, and
you have to say something about what happens for "insane" code. It's
probably sufficient to say that the behavior is undefined, but you
must define when the behavior is defined and when it isn't.
There's a fairly common case: Numerical Recipes in C's
array allocation scheme.
The original NR had a memory allocation scheme for implementing
arrays with subscripts that didn't start at 0. This was used to
allow transliterating Numerical Recipes in FORTRAN to C. Multidimensional
arrays were implemented. So, if you allocated an array of floats
with dimensions [1..100][1..100], you got back a C array with a pointer
offset by -101 elements. Thus, the pointer to the array pointed far outside
the array.
This was NR's standard approach to array allocation. It could easily
break conservative garbage collection.
John Nagle
This is a memory population issue. As the fraction of the
address space in use increases, the odds that a random value is
a valid address increases. This continues until the address
space is increased. Right now, we're in the worst part of the
cycle - machines have a few gigabytes of memory but are
mostly 32 bit, with 4 gigabytes of address space.
This was less of a problem in, say, the VAX era, when
machines had maybe 4MB of a 4GB address space, and most apps
were far smaller. Today, we routinely have applications with
a gigabyte or more of heap.
For 64-bit machines, this is much less of a problem.
The odds of a random value being in the first few gigabytes
of the address space are low.
Note that conservative garbage collectors offer a potential
for a denial of service attack. If you can get some web service
to load a page of junk crafted to look like suitably spaced pointers,
you can prevent the garbage collector from releasing needed space,
and may be able to crash the application.
John Nagle
sane:
storing references unprotected in locals, globals, or in heap-based objects;
adjusting pointers (say, pointing somewhere into a string);
...
non-sane:
xor'ing or otherwise mutilating pointers (definition: the pointer is
mutilated in such a way that, if interpreted as a pointer, it no longer
refers to an area of memory that is part of the same allocated object). we
could probably require that the memory for pointers be properly aligned, if
such alignment is a natural part of the implementation (this would mean
pointers are not required to be found in certain cases, such as packed
structs).
now, what happens with non-sane code?
well, there is no guerantee that the referenced objects will be retained.
we can already say the GC will be conservative (to play well with C), but,
beyond this, we probably want it to be as loose as is reasonable.
it is unclear whether or not to require that references be tracable through
memory gained from malloc (doing so in the case of a pre-existing runtime
would requite hijacking malloc and free, which leads to another mass of
problems).
though seemingly sane to put the sole reference to an object in memory
gained via malloc, this leads to an implementation issue (malloc has to be
made somehow GC-aware, at least).
more practical would be seemingly to leave this as undefined...
so, examples of things that could break the GC:
int main()
{
char *p, *q;
struct { char b[13]; char *q; }qs __attribute__((__packed__));
struct { char *p; } *qm;
long l;
p=gcalloc(256);
l=((long)p)^0xDEADBEEF; //invalid, obscures pointer
qs.q=p; //invalid, breaks alignment rule
qm=malloc(sizeof(char *));
qm->p=p; //invalid, since malloc'ed memory not required to be
tracable
q=p+257; //invalid, q no longer points at p
q=p-1; //likewise invalid, as q still no longer points at p
*(char **)(q.b)=p; //technically invalid, gc not required to observe
non-pointers
...
//however:
q=p+64; //this is ok, since this is still the same object
...
Sure, but computer memory is not typically filled with random values.
It's likely that a *lot* of 64-bit integers will happen to hold
relatively small values. If valid memory addresses have non-zero
values only in, say, the bottom 33 or 34 bits, then you'll see more
arbitrary integers coincidentally matching memory addresses than you
would if everything were random.
If you can manage to set up your virtual memory so that valid memory
addresses aren't likely integer values, you can largely avoid this
problem.
Note that concurrent garbage collection requires some virtual
memory support, both in hardware and software. Concurrent collectors
need some way to detect which pages changed while they were running.
This is usually done via the "dirty page" bits in VM hardware.
There are many machines on
which C runs, generally small ones like Atmel AVR and ARM CPUs, which don't
have that machinery. The Boehm collector more or less assumes you're running
in something like a Windows or UNIX application environment. It's not
well suited to a small machine or a real-time system.
The whole "finalization" issue is a nightmare, but that's a
C++ issue, so we don't have to discuss it here. If you want to
experience the full horror, read up on how Microsoft Managed C++
did it. Calling destructors from the garbage collector leads
to a range of difficult problems, including deadlock, order
of destruction issues, and "re-animation", which can result
in a destructor running more than once.
John Nagle
It could easily break period.
This approach, while astute and efficient, is non-portable and invokes
undefined behaviour under the Standard. One way to deal with this kind of
pointer fiddling is to allocate these arrays as non-gc-reclaimable and
handle their life cycle by hand as before.
--
Chqrlie.
Most of the techniques described in this thread already invoke undefined
behaviour. Encoding pointers with %p and restoring them via scanf would
become problematic, and the same for any kind of reversible manipulation of
structures as byte arrays. It would not be so difficult to document these
things as incompatible with the garbage collection facilities, and to
provide workarounds (extra APIs for tagging objects as non-reclaimable).
--
Chqrlie
This particular hack on pointers should not be a problem since the pointer
most likely still points inside the object. It it does not, you have a
bigger problem because the assumption that the low order bits are fair to
patch does not hold.
>
> A "real" garbage collector would be impractical with most, if not
> all, existing implementations.
>
> If there were requirements, it would presumably be legal to have a
> no-op collector ("DDI" as it was known at one time). I suppose you
> could say it was a quality-of-implementation issue.
>
> The most plausible solution would be to require at least a conservative
> collector, with guarantees about what would not be mistakenly collected,
> and perhaps mechanisms for informing the collector about encodings.
Such as extra APIs to tag objects as non-reclaimable.
--
Chqrlie.
The easiest way to mark objects as non-reclaimable is to use
malloc/free. The collector will not touch those. This must be
emphasized: the old API must continue to exist and remain
functional.
Since the less memory the GC manages, the faster it is, it is better to
use automatic storage (VLAs) or malloc/free, when the allocation and
release of the memory block happen in a simple enough context (within
a single function of small size for instance) where the GC would not
bring any noticeable speedup in development time.
Of course, the GC memory is not subject to setjmp/longjmp leaks,
a consideration to remember even in those apparently "simple"
cases.
The Standard does not make this possible without invoking undefined
behaviour.
It does save a little memory, but this ugly hack is a bitch to debug.
> GC works best in more highly constrained languages than C.
> It can be useful in C also, but only if the programmer
> follows some usage contraints. For example, Rob Pike used
> a GC version of malloc in applications for the Blit terminal.
I remember how controvertial this was back in the mid-80s, when Rob Pike was
using every trick of the trade to pack all the features into available
ressources.
> Personally, I haven't found GC to be necessary for C apps.
Neither did I think, until very recently. GC can be used as a life saver
for daemons. It will prevent catastrophic failure in case of slow but
steady memory leaks. You can monitor memory usage and invoke it upon
abnormal memory consumption, if it finds any memory to reclaim, you know you
have a bug (memory leak), but the daemon can continue its chores without
running into the wall. Think of it as a flu shot to prevent a mild case of
common cold from degenerating into a deadly illness.
--
Chqrlie.
yes, well, note that I was not assuming that the GC actually be Boehm, but
in a basic form, its API might be reusable due to its popularity (but, with
the backend being in effect a different collector).
the alternative, as I was later suggesting, was to standardize on a
different API, in my case, also defining certain features not themselves
present in Boehm...
however, having a different API may well make more sense, to help avoid the
issues of assuming/implying a certain GC.
> The whole "finalization" issue is a nightmare, but that's a
> C++ issue, so we don't have to discuss it here. If you want to
> experience the full horror, read up on how Microsoft Managed C++
> did it. Calling destructors from the garbage collector leads
> to a range of difficult problems, including deadlock, order
> of destruction issues, and "re-animation", which can result
> in a destructor running more than once.
>
yes. actually, my GC's to some extent use destructors, but only rarely (most
types do not need them).
whether or not the GC should support destructors (or how much is allowed of
them), well then, people can debate on this.
> John Nagle
Designing such protection schemes is a solitary pleasure at trying to outwit
oneself to the extreme. Unsurprisingly, the first likely victims are the
developpers themselves and their colleagues, then the customers and their IT
crew.
--
Chqrlie.
I will agree here.
it is questionable and dangerous to change the behavior of the "old guard"
malloc/free interface, which is why the GC should have a seperate (and
optional) API.
> Since the less memory the GC manages, the faster it is, it is better to
> use automatic storage (VLAs) or malloc/free, when the allocation and
> release of the memory block happen in a simple enough context (within
> a single function of small size for instance) where the GC would not
> bring any noticeable speedup in development time.
>
> Of course, the GC memory is not subject to setjmp/longjmp leaks,
> a consideration to remember even in those apparently "simple"
> cases.
>
yes.
malloc'd memory must still be scanned for pointers to gc-reclaimable memory.
It is actually quite convenient to make malloc'd memory gc-reclaimable, so
you don't need to change existing code to take advantage of GC. Boehm does
that.
> Since the less memory the GC manages, the faster it is, it is better to
> use automatic storage (VLAs) or malloc/free, when the allocation and
> release of the memory block happen in a simple enough context (within
> a single function of small size for instance) where the GC would not
> bring any noticeable speedup in development time.
VLAs or alloca are definitely much faster than malloc/free for small blocks
of local storage in a function activation.
But your assumption is a bit simplistic: conservative GC must scan all
memory blocks that may contain pointers to gc-reclaimable memory. This
includes the heap, the stack, anonymous maps, modifiable data segments...
even if GC manages little memory. It is probably more efficient to let GC
manage the heap itself, so it can scan it more efficiently.
> Of course, the GC memory is not subject to setjmp/longjmp leaks,
> a consideration to remember even in those apparently "simple"
> cases.
Yes!
--
Chqrlie.
> "Douglas A. Gwyn" <DAG...@null.net> a écrit dans le message de news:
> 4702C32F...@null.net...
> > Another, not so ludicrous example is the trick of using a
> > single link to traverse a list of nodes in either direction.
> > The key is that the link is the XOR of the connected node
> > addresses. I've seen this used as a space-saving feature
> > in code that had to minimize RAM usage.
>
> The Standard does not make this possible without invoking undefined
> behaviour.
It does (casting pointers from and to uintptr_t is guaranteed; and nothing
prevent XORing unsigned integers; there is an implementation defined aspect
in C90: finding an integer type big enough).
Note that the C++ comittee is investigating on GC. Reading the related
papers may be interesting for thoose wanting one in C.
A+
--
Jean-Marc
and sometimes even validly registered software can randomly forget that it
was ever registered...
> --
> Chqrlie.
>
NO, you are being too optimistic.
You can cast the pointer to uintptr_t or intptr_t, more precisely, you can
cast your pointers to void * and then cast the result to intptr_t or
uintptr_t, you would then fiddle with the numeric value and keep that in
your structures, undoing the fiddling before casting back to void*, and then
to your type.
There are two problems with that: intptr_t and uintptr_t are *optional*,
and there is no guarantee whatsoever about the actual representation of
pointers so converted. If you *know* which bits to fiddle with, you would
be smart enough to tell GC about the underlying objects it should not
reclaim.
> Note that the C++ comittee is investigating on GC. Reading the related
> papers may be interesting for thoose wanting one in C.
The issue seems a lot more complex for C++. Do you have a pointer ?
--
Chqrlie.
>There are two problems with that: intptr_t and uintptr_t are *optional*,
You don't actually need to use intptr_t; you could copy the pointer
values to suitable arrays of bytes and xor the bytes. But that would
hardly be worth the effort.
>and there is no guarantee whatsoever about the actual representation of
>pointers so converted.
You don't need to known anything about how the pointers are encoded
into integers in order to make the xor trick work.
I suppose there might be a problem if the implementation doesn't always
convert a given pointer into the same integer. I don't know if this is
guaranteed, but it ought to be.
theoretically, but at least with most 'sane' architectures/compilers, there
is no problem with this...
> There are two problems with that: intptr_t and uintptr_t are *optional*,
> and there is no guarantee whatsoever about the actual representation of
> pointers so converted. If you *know* which bits to fiddle with, you would
> be smart enough to tell GC about the underlying objects it should not
> reclaim.
>
errm, you xor them as values, not twiddle specific bits...
just what is going on in the pointers is a much lesser issue.
a general rule of xor:
(i^j)^i=j, (i^j)^j=i
this is true regardless of the values of i or j...
>> Note that the C++ comittee is investigating on GC. Reading the related
>> papers may be interesting for thoose wanting one in C.
>
> The issue seems a lot more complex for C++. Do you have a pointer ?
>
you can search all over, but all the pointers come up void...
mark this down before making a sweeping decision...
> --
> Chqrlie.
>
> "Jean-Marc Bourguet" <j...@bourguet.org> a écrit dans le message de news:
> pxbve9o...@news.bourguet.org...
> > "Charlie Gordon" <ne...@chqrlie.org> writes:
> >
> >> "Douglas A. Gwyn" <DAG...@null.net> a écrit dans le message de news:
> >> 4702C32F...@null.net...
> >
> >> > Another, not so ludicrous example is the trick of using a
> >> > single link to traverse a list of nodes in either direction.
> >> > The key is that the link is the XOR of the connected node
> >> > addresses. I've seen this used as a space-saving feature
> >> > in code that had to minimize RAM usage.
> >>
> >> The Standard does not make this possible without invoking undefined
> >> behaviour.
> >
> > It does (casting pointers from and to uintptr_t is guaranteed; and nothing
> > prevent XORing unsigned integers; there is an implementation defined
> > aspect
> > in C90: finding an integer type big enough).
>
> NO, you are being too optimistic. You can cast the pointer to uintptr_t
> or intptr_t, more precisely, you can cast your pointers to void * and
> then cast the result to intptr_t or uintptr_t, you would then fiddle with
> the numeric value and keep that in your structures, undoing the fiddling
> before casting back to void*, and then to your type.
That's precisely what Douglas' example is all about.
> There are two problems with that: intptr_t and uintptr_t are *optional*,
Indeed. I forgot that. I wonder how many C99 implementations don't have
them.
> and there is no guarantee whatsoever about the actual representation of
> pointers so converted.
Well, it doesn't matter in this case.
> If you *know* which bits to fiddle with, you would be smart enough to
> tell GC about the underlying objects it should not reclaim.
Never said it was impossible nor even complicated. Retrofitting to
existing things would be the major source of problems.
> > Note that the C++ comittee is investigating on GC. Reading the related
> > papers may be interesting for thoose wanting one in C.
>
> The issue seems a lot more complex for C++. Do you have a pointer ?
C is WG14, C++ is WG21. So
http://www.open-std.org/jtc1/sc22/wg21/docs/papers
and look for GC. I think most of the issues and the major source of
objections would be the same.
Yours,
--
Jean-Marc
Yes, the XOR trick works as long as the uintptr_t type is available, an can
be emulated with unsigned char arrays if not. There is indeed no guarantee
that a given pointer always converts to the same integer, what the stanndard
specifies is that the operation is reversible and produces the original
pointer. To avoid this unlikely issue, the pointers to the head and tail of
the xor-linked list should also be stored as integers.
--
Chqrlie.
Yes, for the birectional links, it does not matter. I was also thinking
about another trick mentioned upthread where some information is stored in
unused bits in the pointers.
>> If you *know* which bits to fiddle with, you would be smart enough to
>> tell GC about the underlying objects it should not reclaim.
>
> Never said it was impossible nor even complicated. Retrofitting to
> existing things would be the major source of problems.
True, but retrofitting is not needed either. The GC would not be active
unless explicitly initialized and use with the appropriate functions.
--
Chqrlie.
> >> If you *know* which bits to fiddle with, you would be smart enough to
> >> tell GC about the underlying objects it should not reclaim.
> >
> > Never said it was impossible nor even complicated. Retrofitting to
> > existing things would be the major source of problems.
>
> True, but retrofitting is not needed either. The GC would not be active
> unless explicitly initialized and use with the appropriate functions.
The problems appears when mixing parts using a GC and other written with
the above tricks.
Yours,
--
Jean-Marc
In other words, the same kind of integration woes as for threaded
applications.
--
Chqrlie.
No, it's not guaranteed, but it's also not a problem.
Whatever integer value the pointer is converted to, is
guaranteed to be convertable back to a pointer value
that references the same object, even though it may be
represented differently. The XOR trick does not depend
on uniqueness in the ptr=>int direction.
Unfortunately, on many platforms it is difficult or impossible
for the run-time system to distinguish between GCable and
malloc()ed storage/pointers. Usually, extra time and storage
is required to implement a mechanism for distinguishment.
> Of course, the GC memory is not subject to setjmp/longjmp leaks,
> a consideration to remember even in those apparently "simple"
> cases.
Properly organized use of setjmp/longjmp doesn't suffer from
memory leaks. I mainly use it within an implementation of nested
exception handling, where it is easy to catch exceptions and
clean up unneeded data structures before passing control along.
When uintptr_t is defined, as it usually is, it can be done.
> It does save a little memory, but this ugly hack is a bitch to debug.
It is typically used only when the memory savings are substantial
enough to matter. If the node data is 8 bytes and a pointer is 4 bytes,
that saves 25%.
It's actually a straightforward (formerly well-known) technique and
shouldn't require any debugging. It does mean that when debugging
for other problems, it is somewhat harder to "walk" a list, but not
too bad if the debugger supports reasonably flexible address
expressions.
> Neither did I think, until very recently. GC can be used as a life saver
> for daemons. It will prevent catastrophic failure in case of slow but
> steady memory leaks. ...
A run-time bandage for erroneous code? That's a claim I hadn't
heard before. My preference is to ensure that the program doesn't
have such problems in the first place, and GC cannot do that.
uintptr_t exists in almost all conforming implementations,
and there is a macro the source code can test to determine
whether uintptr_t is available.
There is no need to know anything about the pointer
representation; the uintptr_t translation is guaranteed to
contain sufficient information to be converted back to point
to the correct object, and the XORing doesn't depend on any
aspect of the representation; each uintptr_t value that is
converted back to a pointer is always the same as was
obtained from converting the original pointer.
Theoretically, there could be issues with padding bits in
the uintptr_t integer representation, unless the XOR were
done as an array of unsigned char, but in practice that is
virtually never a problem.
Applications/platforms that really need the XOR trick also
probably can't afford the extra time/space overhead of
coordinating with the GC.
That is, however, unacceptable to those systems programmers who
find it necessary to resort to such tricks as bidrectional use
of a single link due to specific contraints of their target
environments. So long as they're using ptrdiff_t, their code
has been strictly conforming, and turning it into undefined
behavior would be harmful.
If the GC feature were optional AND under control of the
programmer, AND if the undefined behavior would occur only when
GC is explicitly used, that might be acceptable. However,
experience shows that various third-party libraries are likely
to use GC unconditionally, not giving the user any control over
this matter.
All of these issues have been discussed at length by
WG21 in their consideration of GC for C++, and it is
considered necessary to at least allow a mechanism to
note that some piece of code may hide pointers.
-- James
What kind of issues? The promise about converting a pointer to uintptr_t
and back doesn't depend on the representation of uintptr_t, does it?
Most implementations provide them. C99 makes it easier than for C90
since an extended integer type can be used if necessary. The only
real obstacles are when:
(a) addresses are much larger than supported integer types;
(b) tagged architecture makes it infeasible to perform integer ops
on addresses.
Problem (a) is unlikely on the vast majority of platforms, since there
will be an integer type at least 64 bits wide. It could arise on an
unusual architecture where the "ring" part of the address is e.g. a
unique OID, IP address, or something of the sort.
Problem (b) would also affect pointer arithmetic, so a conforming
implementation presumably will have come up with some workaround.
Actually it was never allowed in strictly conforming C, and was
one of my complaints when I reviewed the original edition of the
book. I don't know if it was corrected in later editions.
The technique actually does cause problems on some real systems.
That's true provided you never take the address of a list item after
constructing it. In particular, as Chqrlie pointed out, you have to
keep the pointers to the list ends as converted integers rather than
pointers.
Could you explain for me what the problem would be if you did take the
address of a list item after constructing it? Offhand, it seems to me
that you've just got two different pointers to the same thing; one has
been obscured by using the XOR technique, the other is an ordinary
pointer. I don't see any way that using the ordinary pointer
interferes with extracting and using the XOR'd pointer, or vice versa.
>Could you explain for me what the problem would be if you did take the
>address of a list item after constructing it? Offhand, it seems to me
>that you've just got two different pointers to the same thing; one has
>been obscured by using the XOR technique, the other is an ordinary
>pointer. I don't see any way that using the ordinary pointer
>interferes with extracting and using the XOR'd pointer, or vice versa.
No problem taking the address, but if you converted that address to an
integer it might not be the same integer you got when building the
list, so if you XORed it with a value in the list you might not get
the integer that converts to the next pointer.
I assume this is unlikely when traversing the list (I've never used
this hack^H^H^H^Htechnique). But it does mean that you have to keep the
integers for the two ends rather than the addresses they're derived
from.
I hadn't considered that anyone would do that. However, I'll concede
that it's a mistake that could easily be made by someone who's unaware
of the fact that pointers can have multiple representations identifying
the same memory location.
However, that's a far more restricted statement of the possible problem
situations than your previous one: "That's true provided you never take
the address of a list item after constructing it."
> I assume this is unlikely when traversing the list (I've never used
> this hack^H^H^H^Htechnique). But it does mean that you have to keep the
> integers for the two ends rather than the addresses they're derived
> from.
I never expected otherwise.
Using ptrdiff_t alone does not make such code strictly conforming. It is
possible to make conforming code to implement the XOR hack, but it is quite
convoluted and horrible.
> If the GC feature were optional AND under control of the
> programmer, AND if the undefined behavior would occur only when
> GC is explicitly used, that might be acceptable. However,
> experience shows that various third-party libraries are likely
> to use GC unconditionally, not giving the user any control over
> this matter.
Keeping GC under comtrol of the programmer seems like a fair constraint.
Mantading that third -party libraries document what features they rely on,
especially GC does not seem difficult or unreasonable. If they do use the
GC, then they are incomptible with other libraries and/or modules that do
ugly if portable pointer diddling. What should that be a problem ? It is
quite common to run into incompatibilities between modules for much simpler
reasons: name collisions, use of system specific extensions (threads, other
libraries, memory mapped files...) Programmers deal with these issues, find
work arounds or alternative solutions.
This argument against optional GC is weak.
--
Chqrlie.
I agree that if GC is used, it should handle the malloc heap as well.
>> Of course, the GC memory is not subject to setjmp/longjmp leaks,
>> a consideration to remember even in those apparently "simple"
>> cases.
>
> Properly organized use of setjmp/longjmp doesn't suffer from
> memory leaks. I mainly use it within an implementation of nested
> exception handling, where it is easy to catch exceptions and
> clean up unneeded data structures before passing control along.
That's true, but it is non trivial to do this right, and it would be so much
better for the benefit of all C programmers if such a facility was
standardized.
--
Chqrlie.
The work-around has already been discussed: use an array of unsigned char
the size of a void *, and perform the XOR on the elements of that array.
This is how ugly it need to be if one wants to be absolutely portable. Why
the commitee made [u]intptr_t optional beats me! What is wrong with not
supporting hardware with brain dead or obsolete architectures ?
--
Chqrlie.
Except you can only "walk" the list if you have the pointer to the previous
(or next) node in addition to the current node. Typically you don't have
that information in a function called with just a pointer to a node.
>> Neither did I think, until very recently. GC can be used as a life saver
>> for daemons. It will prevent catastrophic failure in case of slow but
>> steady memory leaks. ...
>
> A run-time bandage for erroneous code? That's a claim I hadn't
> heard before. My preference is to ensure that the program doesn't
> have such problems in the first place, and GC cannot do that.
Of course, so is mine. I'm not saying memory leaks should be ignored or
accepted. We use many tools to prevent them, from strict coding
conventions, type-checked allocation wrappers, systematic code reviews,
testing with debug malloc variants and valgrind... but for most real life
applications, *ensuring* the program does not have bugs it is close to
impossible, just like you cannot ensure that soldiers never get hurt.
Packing band-aid and medicine to prevent unexpected problems from having
catastrophic consequences is not a bad decision. It is impossible to test
all conditions, especially time sensitive conditions. Even 100% code
coverage does not buy you that, contrary to popular management belief. A
program running as a daemon with a memory leak will eventually crash and
possibly even crash the whole system from resource starvation. Embedding a
generic memory leak detection and fixing it autimatically on site at
run-time with a call to GC (along with reporting the problem for further
investigation) *is* effective at preventing these crashes. If your life
depended on that equipment, you would *want* self healing code ;-)
--
Chqrlie.
I thought what I was referring to was clear from the context; evidently
it wasn't.
I say, we keep the existing interface as is.
malloc is still manual...
> If the GC feature were optional AND under control of the
> programmer, AND if the undefined behavior would occur only when
> GC is explicitly used, that might be acceptable. However,
> experience shows that various third-party libraries are likely
> to use GC unconditionally, not giving the user any control over
> this matter.
former point, yes. if gc is used, its costs are expected, but of not, all is
as before.
latter point, errm, how is this any worse than the current issue or
fragmentary use of 3rd party GC's.
in this case, we simply state:
"a GC is supplied with the implementation and conforms to this standardized
interface".
with the current situation, nearly everyone and their dog has their own
incompatible GC framework...
now, what I want out of a GC is this:
does GC;
handles optional type info;
is good for small objects.
in any case, in my case I will probably implement something like this,
though I have yet to decide on the specifics...
I push for standardizing C nested exception handling every time
it seems like it might find a receptive audience. C would have
to have a different design than C++, since it doesn't have
classes. That in itself may serve as a "show stopper" for some.
I'd rather have standard nexted exception handling than GC.
The workaround in this case would have to be not using the library.
It is undoubtedly true that a lot of libraries are poorly designed;
my point was that if you introduce yet another way they can be
incompatible, you're making the situation even worse.
Not if the list has a header node, which is usually the case
for such data structures.
One man's "brain dead" is another's "innovative".
The C standard attempts to be feasible to implement on
any architecture that we reasonably expect to encounter,
including a much wider variety of processor environments
than just the typical "desktop PC". Nearly all of the
kinds of feature you wouldn't want to worry about
supporting are based on actual implementation experience.
Thinking further about it, you're right -- only the "value bits"
matter.
It doesn't have to be. Whatever integer it is, will reconvert to
an equivalent pointer value.
List walking isn't usually done using such a function.
> Packing band-aid and medicine to prevent unexpected problems from having
> catastrophic consequences is not a bad decision.
It's not necessarily a good one, either. My concern is that the use of
GC is likely to encourage programmers to become even more sloppy, since
they think that there is no need to keep track of their allocated objects.
It's the same kind of mindset that believes that run-time buffer limit
checks mean one doesn't have to get the buffer usage right.
I believe in run-time safety nets, with means to implement recovery
strategies so that the app regains control at a higher level. That's
the main reason I am a proponent of nested exception handling; it has
just the right control structure for that kind of design. I oppose
error-handling strategy (as opposed to detection) being implemented by
low-level support such as GC.
Oops, a later posting explained better what the problem might be.
I concur that the stored XOR code depends on using the original
(converted) value of the starting pointer, not some equivalent
value. This isn't a problem on the vast majority of platforms
anyway.
Do you have a written proposal for standard nested exceptions ?
--
Chqrlie.
Even then: pointers to both ends of the list must be kept in integer form
because these values are needed for traversal and further conversions are
not guaranteed to yield the integer value.
For occasional readers, keep in mind that this discussion is mostly a
private joke among Standard C afficionados: converting a pointer to an
appropriately typed integer is problematic on the DS9K (where uintptr_t is
probably not even available), but works as expected on all decent platforms.
--
Chqrlie.
That does not necessarily disqualify my describing them as brain dead or
obsolete.
But for political correctness, I will refer to them as "different".
An innovative example I can think of, that would definitely make uintptr_t
support inappropriate, is making all or some pointers fat: a fat pointer
includes extra information such as object size and base address, and
possibly some typing information. Such pointers would be far larger than
the integer types, but would allow far better runtime checks. The C
Standard allows such an implementation, but I have never actually seen one
(which does not prove anything).
Since such an implementation is neither brain dead, nor obsolete, I was
really too quick to dismiss the issue. Keeping support for one's complement
and sign/magnitude representations for integers is IMHA more open to debate.
--
Chqrlie.
No, but the list walker can call such a function. Once you are tracing it,
all you have left is a pointer to the node, and no way to walk the list. It
is just a drawback of the XOR linked-list trick.
>> Packing band-aid and medicine to prevent unexpected problems from having
>> catastrophic consequences is not a bad decision.
>
> It's not necessarily a good one, either. My concern is that the use of
> GC is likely to encourage programmers to become even more sloppy, since
> they think that there is no need to keep track of their allocated objects.
> It's the same kind of mindset that believes that run-time buffer limit
> checks mean one doesn't have to get the buffer usage right.
But the runtime does report the leaks, so it is just a matter of management
process to enforce good practice on the programmers.
> I believe in run-time safety nets, with means to implement recovery
> strategies so that the app regains control at a higher level. That's
> the main reason I am a proponent of nested exception handling; it has
> just the right control structure for that kind of design. I oppose
> error-handling strategy (as opposed to detection) being implemented by
> low-level support such as GC.
This is a general statement! Why refuse a working last resort solution ? My
use of GC prevents a potential crash with potentially dramatic consequences.
If the system crashes because of resource starvation, it is too late to
implement any kind of error-handling strategy. By the time the system has
rebooted, if it does, the damage may be irrecoverable. I do not discount
the value of nested exception handling, whose inclusion in the Standard I
would gladly support, when it comes to failures, bugs and unexpected runtime
conditions, combining different strategies may be necessary to provide the
best service possible.
--
Chqrlie.
we instead need damn near every library implementing their own allocator
and/or GC...
to a large degree, this is the case. many libraries, if you pass objects to
them, or recieve objects from them, you have to use their allocation and
freeing functions.
this would then be more of the same, but this time there is at least hope of
compatibility, if only because the same GC is available to several libs.
or such...
Do they? When this subject comes up, I rarely if ever see any
specific GC implementation mentioned other than Boehm. Are other GC
implementations in common use out there, or has Boehm become almost a
de facto standard? And if it has, how much added value would there be
in adding GC to the C standard?
--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
The AS/400 architecture is the most commonly presented example of a
system where pointers can't necessarily be stored in integers.
And of course any GC implementation has to recognize distinct
representations of the same pointer value. I suspect the most
commonly mentioned GC implementation, Boehm, doesn't do this, but it
doesn't claim to be completely portable.
There are no *general* GC frameworks but in *many* applications
certain structures are assigned from a pool that recycles memory
for those objects
All those development, heap management and many other "mini-gc"s
are very common and take a substantial part of the effort for
writing something in C.
--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
Please expand on the gory details for us mere ignorant desktop users.
--
Chqrlie.
Sorry, I don't know the details myself. Perhaps somebody else here
can jump in.
Boehm is the most popular stand-alone GC.
this is what people often use if they just want to throw GC at a codebase.
however, this is not to say this is the 'only' GC in common use.
by my statement 'everyone and their dog has their own GC', I mean this:
there are an endless number of smaller custom GCs (as a general rule, they
are easy enough to write that many people will write their own custom GC,
rather than reusing one from somewhere else).
sometimes, we may even encounter several such GCs being used, in the SAME
project...
now, most of these are not commonly reused, but written specifically for the
project in which they are being used.
(this is especially common in language VMs, which very often end up with
some or another custom GC, but this is also true of many non-VM apps).
now I could list a few examples, but this would be by no means be a
comprehensive list...
The S/38 (from which the AS/400 was derived) had the novel idea of
using a massive virtual address space rather than dealing with
trivialities like file systems. Hence, it had 128-bit pointers back in
the day when IBM mainframes were still using 24-bit addressing on 32-bit
hardware.
-Larry Jones
Just when I thought this junk was beginning to make sense. -- Calvin
I find this sort of debate rather depressing. People here seem to
come from a background where "malloc/free" is the way to do it, and then
garbage collection is a band-aid so that sloppy programmers don't have to
get their "free"s in the right places. I come from a background where
GC was the natural, indeed only, way to reclaim heap storage, and was
appalled to discover on converting code to C that I had to add in all the
"free"s by hand. In much of my code, there was actually no way to know
when objects could no longer be accessed [eg because an object might be
linked in to several lists, queues or other structures], so in the absence
of GC, using "free" *at all* was a bug waiting to strike.
I don't think that's anything to do with "sloppy code". It's
possible to use "malloc/free" in a sloppy way, it's possible to use them
very precisely, it's possible to over-rely on GC, it's possible to help
the collector to manage storage efficiently. 'Tis true that GC does not
mix well with some of the tricks that people get up to, such as reading
in pointers from files or constructing them on the fly by bit-twiddling
from other integer/pointer objects; but perhaps those tricks are not to
be encouraged anyway.
--
Andy Walker, School of MathSci., Univ. of Nott'm, UK.
a...@maths.nott.ac.uk
The lcc-win32 C compiler provides a garbage collector in the
standard distribution since more than 5 years now.
No, although I do have an implementation.
When I broached the subject before, there didn't seem to be
sufficient support from the rest of the committee.
But that is usually because it is not purely a single-object
allocation issue. For example, connections need to be closed,
buffers need to be flushed, etc. This is such a common need
that virtually all "file"-like interfaces (e.g. 9P protocol)
have both "open" and "close" hooks. GC would not address this.
As I recall, the IBM System/38 was largely based on Glenford
Myer's "SWARD" (SoftWare Architecture for Reliable Design).
He wrote a book about this which is well worth reading (I don't
have the title at hand). One of its main features was that the
data types were "tagged" so that type mismatches would trap.
It would not be absolutely impossible to implement uintptr_t on
such a platform, but it might be expensive in terms of cycles.
Although, especially as of C99, the implementation could provide
suitable integer types anyway.
> Since such an implementation is neither brain dead, nor obsolete, I was
> really too quick to dismiss the issue. Keeping support for one's complement
> and sign/magnitude representations for integers is IMHA more open to debate.
There are DSPs like that. Since there is no reason for the *vast*
majority of C code to be concerned over this representation detail,
there is hardly a compelling reason to constrain it. Even less so,
now that there is a standard way to specify twos-complement types
when one really needs them. (Also a standard way to detect whether
or not they are even available; that is important!)
Tell them that Microsoft uses it with their
__try/__except mechanisms.
THEN
you will see that they WILL listen
:-)
Microsoft's implementation of the exception handling
is built upon a linked list of handlers based on a special
segment setup to be pointed by the GS register.
When a procedure that uses exception handling starts, in its prologue it
will insert itself in the linked list, and disconnect itself at exit.
This could be done with a simple linked list of exception handlers
in a portable C implementation. The tricky part is to catch all
signals (SIGSEGV, etc) and reroute them to a common handler, that will
look at the list and call the current handler. Under windows, this
is done by the OS, but it could be done by a common routine that is
setup at program startup to trap all signals...
Of course your implementation could be completely different.
I implemented this kind of mechanism for a lisp interpreter
I wrote during the 90s.
Probably _Advances in Computer Architecture_ by Glenford J. Myers
(note: not Myer), published February 1982.
> It would not be absolutely impossible to implement uintptr_t on
> such a platform, but it might be expensive in terms of cycles.
--
Note that I did not suggest rebooting as the best high-level strategy
(although under some circumstances it might be; that's a decision that
should be deliberately made). If memory cannot be allocated, nested
exception handling should still work (mine does), and the handlers can
attempt whatever strategy was deemed appropriate by the designers.
Note that there is far too much code out there that doesn't check for
malloc returning null, and less still that implements a good recovery
procedure. Nested exception handling would handle that, while GC
wouldn't. Indeed, the GC-as-error-corrector argument only applies to
the specific problem of leaked (lost) allocations, not any of the
myriad of other similar problems. As I said, I'd rather the effort go
into a more generally useful error handling facility. There may be
good arguments for GC, but I think that one is especially weak.
I am talking about a lot more than this...
Would you care to post it here or on c.l.c for discussion?
Or just email me in private.
--
Chqrlie.
I agree. For me lack of objects is not a problem. You can just "throw"
and integer value which you can catch higher up and, if you want,
re-throw. I would also say that if nothing catches the exception then
the program is terminated. This is approximately how one Pascal
implementation I used a lot worked.
--
Flash Gordon
Another approach might be something similar to what Ada does (yeah, I
know, hear me out). Ada defines an "exception" as a special kind of
entity (it's not really a type, and exceptions aren't objects). An
exception can be declared in a manner similar to an object
declaration. For example ("--" introduces a comment):
declare
Kaboom: exception; -- "exception" is a keyword, not a type name
begin
raise Kaboom;
exception
when Kaboom =>
-- code to handle Kaboom exception
when others =>
-- code to handle exceptions other than Kaboom
end;
I'm not suggesting this verbose syntax for C, nor am I suggesting that
an exception should be an entirely new kind of entity.
Instead, make "exception" a special type declared in, say,
<stdexcept.h>, and allow *only* values of type exception* to be thrown
and caught. Make sure that "exception" is incompatible with any other
types, so that an attempt to throw anything else will be diagnosed at
compile time. (This is similar to the treatment of type FILE.)
Borrow C++'s syntax for try/throw/catch, but restrict it to a single
type.
An exception* value is initialized by using a function that associates
a string with an exception, so that the exception name can be printed
in error messages. Provide a function in <stdexcept.h> that returns
the string associated with an exception.
Something like this:
#include <stdexcept.h>
/* ... */
{
exception *kaboom = new_exception("Kaboom");
try {
throw kaboom;
}
catch(exception *x) { /* Note: the type here must be exception* */
if (x == kaboom) {
fprintf(stderr, "Kaboom!\n");
}
else {
const char *name = exc_get_name(x);
if (name == NULL) {
fprintf(stderr, "Caught unnamed exception\n");
}
else {
fprintf("stderr, "Caught exception \"%s\"\n", name);
}
}
}
}
This is off the top of my head, and I'm sure there are serious flaws
that I haven't thought of. Having to check for each possible
exception explicitly by an equality comparison is less convenient than
C++'s syntax. If we could use an integer type rather than a pointer
type, and somehow arrange for exception values to be unique and
constant, we could use a switch statement instead, but arranging for
the values to be constant would be tricky.
Printing exception names can be simplified if we guarantee that an
exception must have a non-null (but possibly empty?) name.
For C++ compatibility, a C++ implementation can provide a <cstdexcept>
header with the same declarations. If possible, C code that uses
exceptions should be valid C++ code that uses exceptions in a
restricted manner. Defining what happens when an exception propagates
from C to C++ or vice versa is left as an exercise (or is left
undefined, but it would be nice to be able to define it reasonably
cleanly).
It will be tempting to encode additional information in the name
string. For example, if I to associate an arbitrary object with an
exception, I can encode its address using sprintf with a "%p" format.
Consider providing a cleaner way to do this (but avoid creeping
featurism).
I don't see the main problem being what is thrown, I see it as how to
write exception safe (an early return can be caused by any function
call) code in C.
In C++, we can use RIAA to create objects to hold and release resources
held within a scope. These objects are also used to release the
resource in the case of an early return or an exception. C doesn't have
this facility, so there would have to be a means of pushing cleanup
handlers called when an exception is thrown for the facility to be useful.
--
Ian Collins.
Yes. Like this?
get resource;
try {
use resource;
} finally {
release resource;
}
Leaving 'try' (even due to an exception) will always execute 'finally',
and a jump into the 'try' is a constraint violation.
--
Hallvard
one does realize right that in terms of compiler implementation, exceptions
are a much bigger issue than GC...
a conservative GC can be largely ignored by the compiler, but exception
handling requires a lot of special logic within the compiler.
or such...
now, for throwing and catching exceptions:
in C, one option would probably use integers.
now, pure integers are a problem ("how do you effectively globally assign
them?...").
another option is 'symbols'. in effect, each exception is given a new
(virtual) variable, and we use the address of the exception variable to
identify the exception, we can also use the linker effectively as a way of
merging exceptions.
as another person noted, types are another possibility...
now, does all this belong in C?...
I am a little more doubtful on this one.
or such...
> --
> Flash Gordon