Summing up, highlighting, and replying to some interesting points in the two GC threads, in the form of quote-and-answer:
Erik:
What I find rather obvious is that any stream you read from may be closed when you hit end-of-file, regardless of where this happens. Users of a stream can test for open-ness with `open-stream-p', which is quite remarkably different from testing for end-of-file.
Ok, but, so what? The topic is closing _files_, not streams. Files may be randomly accessible. Reaching the end-of-file during a read is not a condition under which to automatically close the file.
Kent:
> Even if you only believe that GC-based file closing is a > safety-net -- there are still RISKS, there.
Risks are greatly magnified if you pretend you have more promised to you than you do. A person who is told he has an unreliable tool can still do good things with it if he treats the tool with the care it requires.
Indeed, but there are two caveats that are relevant to the thread(s).
First, a GC-based system that does not reliably close files from GC is significantly harder to program correctly than one that does. Recently, someone posted some implementations of `with-foo' macros from various CL implementations that illustrated the kinds of bugs that can bite you. Since the topic includes RISKS, such bugs should not be regarded as "unlikely events", but must be regarded as "available exploits".
Second, as much as there is a coherent thread here, I thought it was about what technologies to employ in a robust, general purpose implementation of a lisp-family language. One _might_ decide to make a trade-off sacrificing this kind of GC robustness for some other virtue, but it would be a significant trade-off, and should be evaluated as such. Any implementation that makes the trade-off should, as you've suggested, carry prominent warning labels.
Edi:
1. Does [conservative vs. precise GC] have any consequences in practice or are these just theoretical issues?
They are not simply theoretical. Recently, for example, an experiment was performed on GCC in which the precise GC was replaced with a conservative one. Memory usage sharply increased - a consequence of false roots. Another example: I once wrote a very simple regression test for a weak-references implementation, but the test itself was (in practice) unreliable because on some runs, logically dead objects, weakly referred to, were kept alive by false conservative roots.
2. And, _if_ I have to worry, are there any identifiable - maybe degenerated - usage patterns which will lead to accumulated garbage or can this happen with arbitrary data/programs?
By in large, there is absolutely nothing you can do to avoid the problems or the exploits they imply. Given a program which is leaking or consuming excessive memory due to conservative GC, you can often make small changes that reduce the problem -- but in general you can do nothing at all to prevent the problem from occurring or being exploited.
Pascal:
One of the most widely used gcs - the Boehm gc - is conservative. So it doesn't seem to be a problem in general.
C'mon. That a system is widely used does not mean that it is immune from problems. Rather, it means that whatever obscure problems it has are multiplied in significance.
Kaz:
> 1. Does this have any consequences in practice or are these > just theoretical issues?
It has consequences in practice. It means that the GC algorithm is incorrect, so that the application potentially contains memory leaks.
You'd think that that fact, combined with the relative simplicity of doing GC precisely in a language implementation, would turn people off of conservative GC. But no, people rationalize their abandonment of quality for whatever peculiar reasons they have.
Which is, I admit, a near restatement (intended to highlight) your comment:
The reason Boehm is widely used is because it's one of the few choices available to C programmers. It's not chosen because it's conservative, but in spite of that limitation, which can be rationalized away.
But Basile is right that:
Agreed, but Boehm's GC is still quite remarkable in my opinion.
For example, it is good as a debugging tool, and as a duct-tape-solution that improves some buggy C programs.
> What I find rather obvious is that any stream you read from > may be closed when you hit end-of-file, regardless of where > this happens. Users of a stream can test for open-ness with > `open-stream-p', which is quite remarkably different from > testing for end-of-file.
> Ok, but, so what? The topic is closing _files_, not streams. Files > may be randomly accessible. Reaching the end-of-file during a read > is not a condition under which to automatically close the file.
So generalize the open-ness testing idea. Usually the client code knows the closing condition:
For a serially processed file, it could be reaching its end.
For an internet socket, one of the endpoints break the connection, does QUIT, etc.
When some client code encounters the condition, the file/stream/socket/whatever can be closed. Other clients, since they are continuously testing for open-ness, will back off on demand.
Testing for open-ness allows the "don't care" approach to clean up for a larger set of problems and it is stunningly simple compared to writing a special GC engine that can free expensive resources in a timely manner.
Sure, you can think of cases where the closing condition is not known, but it is still a rather clever idea, a simple compromise approach that leaves you better off than before.
-- Cheers, The Rhythm is around me, The Rhythm has control. Ray Blaak The Rhythm is inside me, bl...@telus.net The Rhythm has my soul.
* l...@emf.emf.net (Tom Lord) | Ok, but, so what? The topic is closing _files_, not streams. Files | may be randomly accessible. Reaching the end-of-file during a read | is not a condition under which to automatically close the file.
I thought you said this was obvious, but it appears that you were not entirely truthful.
If you want freedom from closing/freeing a /stream/, the obvious solution is to close it when it has been exhausted. This is neatly localizable and abstractable.
If you want to refer to /files/, the obvious solution is to intern pathnames and use a mechanism that keeps a maximum number of files open which is used to close the least recently referenced file if you are maxed out, so you save on the open calls. If you close a file, you can keep its file-position with this structure for use if you need to open it again. In this scheme, you never really talk to any streams -- they are hidden by the abstraction, and hence open and close are localized operations.
Your argument was that certain algorithms were not implementable in Common Lisp because the language did not require the closing of operating-system files when the system garbage-collected streams. This is in fact wrong. There are far more possibilities that you have been willing to explore before you reached your conclusion.
-- Erik Naggum, Oslo, Norway
Act from reason, and failure makes you rethink and study harder. Act from faith, and failure makes you blame someone and push harder.
> > 1. Does this have any consequences in practice or are these > > just theoretical issues?
> It has consequences in practice. It means that the GC > algorithm is incorrect, so that the application potentially > contains memory leaks.
> You'd think that that fact, combined with the relative simplicity of > doing GC precisely in a language implementation, would turn people off > of conservative GC. But no, people rationalize their abandonment of > quality for whatever peculiar reasons they have.
> Which is, I admit, a near restatement (intended to highlight) your > comment:
> The reason Boehm is widely used is because it's one of the few > choices available to C programmers. It's not chosen because > it's conservative, but in spite of that limitation, which can > be rationalized away.
Continued programming in C and C++ is an exercise of piling rationalizations on top of rationalizations anyway, so this fits right in. You'd expect a whole lot of things to turn people off C and C++, but no, people rationalize their abandonment of quality for whatever peculiar reasons they have. Which, I dmit, is a near restatement ...
If you want freedom from closing/freeing a /stream/, the obvious solution is to close it when it has been exhausted. This is neatly localizable and abstractable.
Consider the example I posted a while back: a computation that involves a graph whose nodes may refer to streams which are network connections. When part of the graph become inaccessible, _that_ is when the stream needs to be closed. In that example, it is my program, not the stream peer, who marks the end-of-stream. My program has to make that decision based on the lifetimes of nodes in my graph. The garbage collector is already there, computing those lifetimes for me -- it is the right tool for the job.
If you want to refer to /files/, the obvious solution is to intern pathnames and use a mechanism that keeps a maximum number of files open which is used to close the least recently referenced file if you are maxed out, so you save on the open calls.
That's a wrong solution because closing and opening a file may return you different file from the one you started with. The solution you've proposed changes the semantics of the program.
Programs which can assume that the filename->file mapping remains constant throughout their run can use your trick, but others may not.
Some programs can not assume that the filename->file mapping remains constant but use this trick anyway. Some bogosities displayed by NFS are the result of that, for example.
Your argument was that certain algorithms were not implementable in Common Lisp because the language did not require the closing of operating-system files when the system garbage-collected streams. This is in fact wrong. There are far more possibilities that you have been willing to explore before you reached your conclusion.
That is not quite my argument, though I'd guess you can find places where I've stated it misleadingly.
"not implementable" is too strong a claim. "not cleanly" or "not easilly" or "not straightforwardly implementable" is closer.
Also, the requirement is weaker than "must close operating-system files when the system garbage-collects streams" -- strong-guarantee finalizers or weak references also give me enough mechanism to close streams when needed.
There is a kind of reductio against my argument if you think I meant to say "strictly not impelementable". One could write a graph tracer in lisp itself and, at the application level, garbage collect for streams. So if you think I said "strictly not impelementable", that reductio should give you a clue that you've misread me ;-)
* l...@emf.emf.net (Tom Lord) | Consider the example I posted a while back: a computation that | involves a graph whose nodes may refer to streams which are network | connections. When part of the graph become inaccessible, _that_ is | when the stream needs to be closed. In that example, it is my | program, not the stream peer, who marks the end-of-stream. My | program has to make that decision based on the lifetimes of nodes in | my graph. The garbage collector is already there, computing those | lifetimes for me -- it is the right tool for the job.
Tom, which war are you fighting? I am trying to show you that the way you see things has contributed to, if not produced, conclusions that you stick by as if they were The Exclusive Truth, which they are not, and that if you opened up a little bit, you would find that you can look at the same evidence in different ways and arrive at very different conclusions. If you have a problem, we will help you find a solution.
But there is, I suspect, a desire on your part to prove your initial hunch, and to do this you construct ever more problems when your "challenges" have been solved, but to anyone who listens to such "argumentation", the conclusion is fairly inevitable: You construct more "problems" in order to support a foregone conclusion.
But here's how to implement garbage collection of file descriptors: Scan the heap for streams and look into them for file descriptors. When you have collected all the file descriptors this way, and maybe accounted for system streams, loop over the file descriptor space and close all that are not supposed to be open. It would not hurt if you did this after a garbage collection, of course, but it can be done at any time with no ill effects (other than a slew of system calls that result in harmless error returns).
Please note that I said "garbage collection of file descriptors" and not "garbage collection of streams". This difference is important only because you have to be slightly more careful about caches and buffers. In a programming style that depended on this scheme, you would naturally want to force writing buffers to disk before you dropped the stream. This means that "but it would not work, because I want garbage collection to flush the stream buffers" is not a very smart counter-argument. You would make it only to strongly affirm my conjecture that you are making up problems as you go in order to defend your desire for a change in design.
| That's a wrong solution because closing and opening a file may | return you different file from the one you started with. The | solution you've proposed changes the semantics of the program.
Tom, listen. You do this trick when it /preserves/ the semantics of the program. You really do understand this and only pretend to be stupid, don't you? If the file referenced by a filename changes from open to open, you are not really working with /files/, which you said you were, so this is specious at best. It really looks as if you are making up problems, now.
| There is a kind of reductio against my argument if you think I meant | to say "strictly not impelementable". One could write a graph | tracer in lisp itself and, at the application level, garbage collect | for streams. So if you think I said "strictly not impelementable", | that reductio should give you a clue that you've misread me ;-)
All very good, but do you pick up clues that you have misread others? The above, for instance, is just plain silly. We all know that with sufficient work, everything is strictly implementable in any language. So I do not think you meant to say anything. Unlike you, I do not entertain specious arguments for their own sake, so it is quite foreign to me that you you have to defend yourself against something you think I think you meant to say. Really. So, please, clean up this mess and try to write better if you have such fears.
I believe, but not very strongly, that the reason for your failure to be happy with the language as it is, is that you brought with you a model of perfect language design before you came to Common Lisp and now you think Common Lisp ought to conform to it. I think the reason that I tend to learn quickly and "side" with authorities, is that most of the time, I attach no personal identity to the models I bring with me when I am relatively ignorant. When I occasionally do (and the first example that comes to mind is Perl), I also fail at learning it. Back in 1993/4, I worked with C++, and the whole experience was exceedingly painful because the desire to redesign and fix the language was so strong that I had to fight it all the time to get any work done. Every now and then, I see people who come to various newsgroups with a "question" that is really a scream of despair that reality is not what they wanted it to be. I used to have a .signature line that I took from SF Gate columnist Mark Morford (which he may well have taken from somewhere else and that joke was lost on me), which pissed off a lot more people than I sort of intend my .signature to do, reading "if this is not what you expected, please alter your expectations", but I think it is actually a really good epitaph to life in general.
The question is not whether Common Lisp (qua language) guarantees garbage collection to close streams cleanly, the questions are what you would do differently if it did and whether you can do the same things even if it does not. I realize that this is more problem- solution-oriented than many people appear to appreciate, but some times, the only available solution is not as stunningly beautiful as you had dreamt of, because it has to make space for something more.
So, since Common Lisp does not guarantee orderly closure of streams when the stream object is garbage collected, /but/ implementations tend to offer finalization because it is sometimes exceedingly nice to have a safety net, /and/ the mechanism is non-portable, so you need to build some sort of abstraction on top of it, anyway, you might as well implement something even cleaner on top of it all. When you do this, there is very little danger that you will need to worry about how the system cleans up after you, because you will do it yourself. I have suggested two ways to do this that work very well for their inherently limited set of circumstances, and you should be able to think of some more of your own if you can only come to peace with the fact that you have to do /something/ on your own to get the behavior you desire.
-- Erik Naggum, Oslo, Norway
Act from reason, and failure makes you rethink and study harder. Act from faith, and failure makes you blame someone and push harder.
l...@emf.emf.net (Tom Lord) wrote in message <news:v1s93f82omdi58@corp.supernews.com>... > They are not simply theoretical. Recently, for example, an experiment > was performed on GCC in which the precise GC was replaced with a > conservative one. Memory usage sharply increased - a consequence of > false roots.
That wasn't what I deduced from the thread in the gcc mailing list. My impression was that the experiment used the collector in fully conservative mode. I would guess that any retention problem is due to scanning the heap conservatively, especially since I think gcc plays some games in places to compress the data in the heap a little.
Given that gcc already has layout information, this doesn't strike me as the right approach there, if you were doing more than an experiment.
Root misidentifications generally cause bounded damage. See my 2002 POPL paper.
* l...@emf.emf.net (Tom Lord) | A MEWS is good, and yours has provided me with amusement and other | benefits, but yours is also more than a little bit flakey.
In other words, you had already made up your mind to want the language to guarantee closing of streams upon garbage collection long before you came here and nothing anyone more knowledgeable than you can say will change your mind. Thanks for confirming it.
My MEWS is indeed flakey. It should have triggered on you long ago. I made a grave mistake trying to respond helpfully to you. I apologize and regret the waste of time.
-- Erik Naggum, Oslo, Norway
Act from reason, and failure makes you rethink and study harder. Act from faith, and failure makes you blame someone and push harder.
Erik Naggum <e...@naggum.no> writes: > * l...@emf.emf.net (Tom Lord) > | A MEWS is good, and yours has provided me with amusement and other > | benefits, but yours is also more than a little bit flakey.
> In other words, you had already made up your mind to want the > language to guarantee closing of streams upon garbage collection > long before you came here and nothing anyone more knowledgeable > than you can say will change your mind. Thanks for confirming it.
> My MEWS is indeed flakey. It should have triggered on you long > ago. I made a grave mistake trying to respond helpfully to you. I > apologize and regret the waste of time.
Could any of you kind souls tell this poor newbie what manner of fragit a MEWS is?
Erik Naggum <e...@naggum.no> writes: > * t...@tenkan.org (Tim Daly, Jr.) > | Could any of you kind souls tell this poor newbie what manner of > | fragit a MEWS is?
> Moron Early Warning System. It is intended to spare the newsgroup > of the otherwise inevitable inkshed.
> They are not simply theoretical. Recently, for example, an > experiment was performed on GCC in which the precise GC was > replaced with a conservative one. Memory usage sharply > increased - a consequence of false roots.
That wasn't what I deduced from the thread in the gcc mailing list. My impression was that the experiment used the collector in fully conservative mode. I would guess that any retention problem is due to scanning the heap conservatively, especially since I think gcc plays some games in places to compress the data in the heap a little.
I am corrected on a minor point, but not refuted :-) May I try to form a balanced summary of our disagreement? I don't want to (and can't) refute your general advocacy of conservative GC -- but I do think there are important qualifications on that advocacy which you don't usually make clear and notice that some "important" projects have made design decisions that apparently ignore those qualifications. I wish that in your advocacy, you yourself gave a more balanced picture of the situation.
So, here:
To make sure I'm understand you: your guess about the GCC case is that it is not false _roots_ that caused the leakage, but rather mis-identified false pointers within the graph (in heap allocated objects). The distinction, just to be clear, is that a "possible root" is a location a conservative GC checks for an apparent pointer even though nothing else (except perhaps a stack pointer) points to the possible root; possible pointers within the graph (non-roots) are areas of allocated memory, checked for apparent pointers only if the allocated region is reached via the graph of possible pointers starting at the roots.
And, you go on to say, the problem of false pointers within the graph can be cured by providing the conservative collector with type information for the objects in the heap. Elsewhere you've also pointed out that with type information for the stack, registers, and static memory -- a conservative GC _becomes_ a precise GC: so one _perhaps_ practical way to address the problems is to advocate to compiler writers that they make that type information available at run-time.
In short, you are describing heuristics that can reduce the amount of leakage from a conservative collector in some situations -- important heuristics for some applications, no doubt. And (elsewhere) you point out a development project that can lead to precise (but compiler-specific) GC that is native to an otherwise ordinary C run-time system. (And, for completeness, you've also pointed out changes compiler makers can make that don't extend the language but that do probabilistically improve conservative GC performance, right?)
There are several problems in applying your arguments here.
First: while GCC may have some type information on hand, we have no idea whether or not the original questioner (the one who asked "are these concerns purely theoretical") has suitable type information, or in what form. Therefore, the GCC case as you've interpreted it illustrates that the concerns are practical, not theoretical: for example, they suggest the practical concern that memory performance of conservative GC varies greatly depending on how much type information you can provide. If providing type information is a non-trivial task, the cost of providing it has to be subtracted from the convenience of just "plugging in" conservative GC -- if you're going to (have to) do that extra work, perhaps going all the way to precise GC isn't really significantly harder.
Second: there is at least one common and one not-unheard of example of a non-root object for which type information can _not_ be provided without help from the compiler. The common case is a heap-allocated jump buffer (the kind filled by `longjmp'). The not-unheard of case is a heap-allocated stack copy (an admittedly non-standard use of C that has nevertheless been ported widely in SCM and systems derived from SCM).
Third: Even though you are explicitly guessing, I think you clearly overreach when you say "_any_ retention problem is due to scanning the heap conservatively" (emphasis added). I think you have to say in your (very plausible) guess, at best, "_most_ retention problems are due to scanning the heap conservatively". When you say _any_, it suggests that you think they will _all_ be fixed by not scanning allocated objects conservatively.
Fourth: in other contexts (and I hope I'm not mis-paraphrasing you here) you've made a "relative error rates" argument and a "small size of error" argument. The "relative error rates" arguments says that with a little tuning, perhaps with some type information -- conservative GC errors can be made of sufficiently low probability that they may be disregarded. The "small size of error" argument says that -- unless you are using data structures which are exotic, any leakage from conservative GC will be of a negligible amount of memory.
The "relative error rates" argument simply doesn't jive with my experience using a conservative GC in a mode in which the stack is scanned conservatively, but all of the heap allocated objects are scanned precisely. So, I don't believe it. I've measured conservative GC leaks in that situation with very small numbers of runs of short-running, tiny, non-memory-intensive programs. If I've measured that, I can only conclude that if the system is deployed widely and used heavily, the error rate will greatly exceed, for example, hardware failure rates.
The "relative error rates" argument _also_ doesn't respond to the issue of creating exploitable risks. That a "natural" occurrence of a bug is rare says nothing about the probability of its exploitation by attackers. Attackers are really quite sophisticated, these days and they seem to do their work mostly be creating otherwise "improbable" conditions.
The "small size of error" argument rests on two false premises:
Taken as a statement about the total amount of memory wasted, the "small size of error" argument relies on the premise that the total amount of memory protected by any one pointer is always small. Even qualifying that (as you have done) by advising programmers not to use, or to change their implementation of "lazy lists" is not qualification enough. The only way to evaluate this argument about quantity of memory leaked is probabilistically: multiplying the probability that a pointer protects a "lot" of memory by the probability that it will be misprotected. On the one hand, this is another relative error rate argument and thus does not answer questions about the risks of exploitation. On the other hand, what little information we have or are able to easily obtain about those two probabilities is very uncertain. Measuring those probabilities for a handful of particular programs is a dubious basis on which to form generalizations.
The "small size of error" argument speaks _only_ about the number of objects leaked: it relies on the premise that only leaks of a large number of objects are a problem. As you know, I like to point out that leaking even a relatively small number of some objects can cause serious problems, depending on the nature of those objects. Leaks of just a few very large objects is a considerable memory leak. Leaks of just a medium number of objects representing external resources can cause a system to fail. On the latter point, we've had our discussion elsewhere about whether or not all programs should explicitly close files and so forth. A similar discussion took place here and I'll point out that if you read it closely, you'll find three relevant points: (1) that even many programmers who agree that all files should be closed explicitly want the GC there as a "safety net"; (2) that it is easy to make mistakes while trying to write code in the style you advocate (as, for example, the buggy implementations of `with-foo' that we saw); (3) that in some cases, the most convenient way to express a program that must close files is to make the decision of when to close those files based on the lifetimes of garbage collected objects -- it the GC can not be relied upon to measure those lifetimes accurately, the the decision of when to close those files can not be made accurately.
The hyperbole of the Subject: line and similar text notwithstanding, nobody has ever argued that conservative GC should never be used. But I've at least pointed out that it is a problematic choice for use in systems that must be secure against exploits are that are intended as "general purpose" programming systems that strive to place the least arbitrary restrictions on how they may be programmed. Those observations combined with your own, that to use conservative GC well in some systems you must nevertheless write enough code to make it as close to a precise GC as possible, raise a very practical question: is the convenience of conservative GC compared to precise _really_ a worthwhile trade-off? For systems intended to be secure, I think it's pretty clear that the answer is no. For other systems, I think it's worth a bit of investment in design and tool-building: to construct a portable precise-GC for use with C that is unambiguously as easy (or easier) to use than conservative GC.
* l...@emf.emf.net (Tom Lord) | A MEWS is good, and yours has provided me with amusement and other | benefits, but yours is also more than a little bit flakey.
In other words, you had already made up your mind to want the language to guarantee closing of streams upon garbage collection long before you came here and nothing anyone more knowledgeable than you can say will change your mind. Thanks for confirming it.
The particular flakiness observed here is that you've made non-responsive technical arguments, then behaved as if they were responsive. The first such in talking to me was plausibly just confusion, but the latest are just jaw-droppingly dumb.
My MEWS is indeed flakey. It should have triggered on you long ago. I made a grave mistake trying to respond helpfully to you. I apologize and regret the waste of time.
So, are you in fact, at least in part, a modern Jive filter?
It seems like a reasonable hypothesis. Your flamage is more than a little bit repetitive -- even if you were going to express the same idea twice, you'd think that, in the interest of being understood, you might try exploring different expressions of it. But instead, in vocabulary, sentence structure, and content -- you're very narrow.
At first I was thrown off because some of your psychologizing of _some_ posters seemed to me to hit the mark. But then you used the same characterization on others to whom it clearly did not apply, suggesting that you're applying it blindly.
And the lisp community presumably does have lots of natural language tools floating around....
So, yeah, I'm starting to think "modern Jive filter", seeded with content suggestions for particular replies. And if someone feeds you a non-responsive argument, you disappoint us by making big logical gaffs.
If the hypothesis is true, it interests me as political statement. Often on technical lists, people abandon reasonable discourse and make replies that are logically very close to your flames -- but they do so in more popular language, often only by implication. In _that_ light, some of the "anti-Erik" chat on this list, particularly that that focusses on your tone or language use rather than content, is a fascinating side effect.
But the political implications of your Jive filter (whether literal or not) are undermined when you say something dumb on the technical front.
* l...@emf.emf.net (Tom Lord) | The particular flakiness observed here is that you've made | non-responsive technical arguments, then behaved as if they were | responsive. The first such in talking to me was plausibly just | confusion, but the latest are just jaw-droppingly dumb.
I take it that your ranting and raving about me is the antithesis of "jaw-droppingly dumb", then. Good luck with your life, Tom.
-- Erik Naggum, Oslo, Norway
Act from reason, and failure makes you rethink and study harder. Act from faith, and failure makes you blame someone and push harder.
> ... But > I've at least pointed out that it is a problematic choice for use in > systems that must be secure against exploits are that are intended as > "general purpose" programming systems that strive to place the least > arbitrary restrictions on how they may be programmed.
it still remains a storm-in-a-teacup academic argument; i have yet to see you point to any production system with (say) well-analyzed security issues due to boehm collector. when will you nail this down, publish your paper and finish with this discussion? :-]
oz --- you take a banana, you get a lunar landscape. -- j. van wijk
l...@emf.emf.net (Tom Lord) wrote in message <news:v217jd7svlbb45@corp.supernews.com>... > I am corrected on a minor point, but not refuted :-) May I try to form > a balanced summary of our disagreement? I don't want to (and can't) > refute your general advocacy of conservative GC -- but I do think > there are important qualifications on that advocacy which you don't > usually make clear and notice that some "important" projects have made > design decisions that apparently ignore those qualifications. I wish > that in your advocacy, you yourself gave a more balanced picture of > the situation.
[Long discussion not reproduced here.]
I was just trying to correct a minor point, but occasionally important, point, without restarting this discussion. Since I failed ...
I largely agree with your discussion, except that I don't see how security issues enter here. I think we agree that if we, for example, run untrusted code inside a conservatively collected JVM, we run the risk that malicious code will cause excessive retention, and thus cause the JVM to run out of memory. But how is that different from malicious code that just allocates a lot?
The "view" I'm trying to advocate is that:
1) Conservative collectors often, but not always, work well. Whether or not they work well varies considerably with the collector implementation, as well as with the particular application.
2) When applications don't work well with conservative GC, as for the gcc example, in my experience the problem is usually traceable to one or at most a handful of "false pointer" sources (or a handful of very large object allocations). A small amount of type information typically goes a long way. C workarounds for such problems are typically not hard. I'm not sufficiently familiar with various Lisp/Scheme implementations to understand their need/utility there.
3) All other things being equal, clearly type accurate GC is better than conservative collection. All other things are sometimes, but rarely equal. There are significant differences in whether or not you can handle legacy C code, the complexity of foreign function interfaces, and in the difficulty of getting the compiler-collector interface debugged. In real life, there are thus trade-offs, depending on whether these other things matter.
4) There are many degrees of "conservativism" in garbage collection. (E.g. conservatively scan everything -> conservatively scan only top stack frame.) They have significantly different tradeoffs, and are appropriate under different circumstances.
5) There are a small number of things we can actually prove about conservative GC. A pure call-by-value functional program can't require unbounded space solely as the result of a conservative stack/register scan (as opposed to heap scan). I suspect neither can gcc. (That doesn't preclude bounded leaks.)
6) We agree that it is occasionally useful to let the collector manage external resources such as file descriptors. By using a conservative GC, you may lose any guarantees about how many file descriptors can be simultaneously open, and thus you may run out of descriptors earlier than you expected. But for nearly all programming languages, you don't have those guarantees anyway, for several reasons. Object reachability is usually not precisely defined. And often the finalization facility isn't quite up to this task. (See my 2003 POPL paper or http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html) In my experience, this still seems to work fine inpractice, however, conservative GC or not.
I was just trying to correct a minor point, but occasionally important, point, without restarting this discussion. Since I failed ...
I'll now try to remove the bee in my bonnet about that -- but so that you know why it's been there:
In a sense, it's a discussion you and I have had over many months. It started when I was advocating for some work to build an original free software run-time system for high-level languages as a practical alternative to dragging the free software world into a swamp of .NET reimplementation. One of the goals I proposed was to use the new run-time system to make some of our existing sytems more robust by using precise rather than conservative GC. You and I got into this discussion about robustness, and my perception is that the implications of our discussion were widely misinterpreted: your "The sky isn't falling on systems that use conservative GC" was taken as "There is no robustness risk with conservative GC, no reason to try doing any better."
I largely agree with your discussion, except that I don't see how security issues enter here.
Ok, I'll try to fix that.
I think we agree that if we, for example, run untrusted code inside a conservatively collected JVM, we run the risk that malicious code will cause excessive retention, and thus cause the JVM to run out of memory. But how is that different from malicious code that just allocates a lot?
Malicious code run as untrusted code _does_ have unique problems when combined with conservative GC. For example, although I can resource-bound untrusted code -- e.g., to limit the allocations that it can make itself -- I can't prevent it from creating false roots which may protect objects not allocated by the untrusted code.
The risks are larger than just untrusted code, and larger than just running out of memory or other resources. I'll explain:
It's not just untrusted code that is the problem -- it's also untrusted data. With suitably crafted malicious _data_, you get some control over what's on the stack, both in variables and in spilled registers. Thus, malicious data can also create false roots.
False roots enable direct exploits that manage to cause big leaks but they also enable indirect exploits. An attacker can combine unintended retention of some objects with other exploits, for example, tricking the GC into keeping around an object that ultimately keeps a file open, then using another exploit to run code that accesses that file. You can find some shockingly (to me, at least) sophisticated examples of attacks that combine several minor bugs to add up to one big exploit in, for example, 2600. From what I could tell reading the articles there, attackers solve problems by saying "Ok, I need to do X, which can be broken down into problems A, B, and C" and then they go shopping for bugs that enable A, B, and C. Sometimes people respond to my concerns by saying "Well, I'll worry about it when you show me an actual exploit on a deployed system," but I don't think that attitude is justified given the sophsitication of attackers these days.
The ultimate consideration is that with precise GC, any possible exploits are bugs in my program text and I can fix those. With conservative GC, I introduce exploits that aren't there in the program text: there's nothing I can do to eliminate that _class_ of bugs other than switching to precise GC.
The "view" I'm trying to advocate is that:
1) Conservative collectors often, but not always, work well. Whether or not they work well varies considerably with the collector implementation, as well as with the particular application.
I agree, adding the observation that they can _never_ work perfectly. Precise GC also varies in quality and accuracy with implementation, but at least its faults can be seen as bugs in the implementation text and the text changed to fix those bugs.
2) When applications don't work well with conservative GC, as for the gcc example, in my experience the problem is usually traceable to one or at most a handful of "false pointer" sources (or a handful of very large object allocations). A small amount of type information typically goes a long way. C workarounds for such problems are typically not hard. I'm not sufficiently familiar with various Lisp/Scheme implementations to understand their need/utility there.
I think this expands on "varies considerably with collector implementation".
3) All other things being equal, clearly type accurate GC is better than conservative collection. All other things are sometimes, but rarely equal. There are significant differences in whether or not you can handle legacy C code, the complexity of foreign function interfaces, and in the difficulty of getting the compiler-collector interface debugged. In real life, there are thus trade-offs, depending on whether these other things matter.
Of course there are trade-offs and sometimes there are good reasons for choosing conservative GC. I think its unlikely one _really_ wants to choose conservative GC for something like a fresh Java implementation or .NET competitor targetted at enterprise systems, where you both (a) want be able to get as close to perfectly robust as you can afford and (b) never want to be caught in a situation where there is a permanent source of exploits that can not be plugged (even if any particular exploit that takes place can be worked around after the fact). Not all programming problems have solutions that satisfy (a) and (b), but GC does and those solutions are in the precise GC family.
That said, some of your trade-off concerns, such as complexity of ffi's or whether or not a special compiler/collector interface is needed are exaggerated: those concerns can be addressed by building tools (e.g., imagine a GC-lint program checker for C programs or a code generator that produces code for GC bookkeeping).
4) There are many degrees of "conservativism" in garbage collection. (E.g. conservatively scan everything -> conservatively scan only top stack frame.) They have significantly different tradeoffs, and are appropriate under different circumstances.
Obviously.
5) There are a small number of things we can actually prove about conservative GC. A pure call-by-value functional program can't require unbounded space solely as the result of a conservative stack/register scan (as opposed to heap scan). I suspect neither can gcc. (That doesn't preclude bounded leaks.)
Assuming that the program is not being lazilly evaluated, that sounds plausible, and kind of interesting, but not of much practical importance to the questions at hand.
6) We agree that it is occasionally useful to let the collector manage external resources such as file descriptors. By using a conservative GC, you may lose any guarantees about how many file descriptors can be simultaneously open, and thus you may run out of descriptors earlier than you expected.
Agreed, with the addition that running out of descriptors is not the _only_ danger: simply retaining a particular descriptor can open the door to exploits.
But for nearly all programming languages, you don't have those guarantees anyway, for several reasons. Object reachability is usually not precisely defined. And often the finalization facility isn't quite up to this task.
Agreed. This is an area where language designers and implementors need to do a much better job, and where a much better job for our purposes here can certainly be done. There's a hard problem of defining reachability so as to minimize lifetimes to the greatest extent practical -- but that hard problem isn't the one we're talking about here: we only need to provide a useful upper-bound on lifetimes.
In my experience, this still seems to work fine inpractice, however, conservative GC or not.
I can't reliably regression test the weak-references implementation in my Scheme interpreter that uses a partially-conservative GC, because it uses a partially-conservative GC. That discovery was the particular incident that first soured me on conservative GC.
Thanks for an interesting chat. It has helped me become better at articulating the risks, and it has cleared up my model of your "view" (which, it turns out, is closer to a shared view than I suspected).
In article <v26ddo3em0g...@corp.supernews.com>, l...@emf.emf.net (Tom
Lord) wrote: > ... I was advocating for some work to build an original free > software run-time system for high-level languages as a practical > alternative to dragging the free software world into a swamp of .NET > reimplementation.
I believe this is a very worthy goal. I am actually in active need of such a system (for reasons that I won't go into here). I've found that Hans's GC, Bruno's CLN system, and C++ get me a very long way towards where I want to be. If I wanted to use a precise GC instead of Hans's system, what would you recommend? (Note: I need a thread-safe GC.)
> > ... I was advocating for some work to build an original free > > software run-time system for high-level languages as a practical > > alternative to dragging the free software world into a swamp of .NET > > reimplementation.
> I believe this is a very worthy goal. I am actually in active need of > such a system (for reasons that I won't go into here).
Isn't that one of the goals of the Parrot project? AFAIK it's being developed as the default backend of Perl 6, but with cross-language-ability in mind, and some Ruby and Python people used to be quite excited about it. Would be a CL-to-Parrot-compiler be a good idea?
At least, remembering the april fools joke that gave the name to Parrot, one might ask Paul Graham to change "it" in Arc to "*dollar-underscore*" for better readability ;-)
Erik Naggum <e...@naggum.no> wrote in message <news:3251188682174136@naggum.no>... > If you want to refer to /files/, the obvious solution is to intern > pathnames and use a mechanism that keeps a maximum number of files > open which is used to close the least recently referenced file if > you are maxed out, so you save on the open calls. If you close a > file, you can keep its file-position with this structure for use if > you need to open it again. In this scheme, you never really talk to > any streams -- they are hidden by the abstraction, and hence open > and close are localized operations.
Can this reliably work on Unix?
Unix lets one unlink open files - the actual unlink is deferred until all of the corresponding closes occur.
Even if a CL implementation keeps track of the unlinks that it performs, it can't keep track of them all, and even if it could, it can't do anything other than keep the file open, so what does it do if the program wants 1055[1] open files that have been deleted?
-andy
[1] I think that 1024 is a relatively common limit on the number of open files per forked-thingie.
If you delete an open file, what exactly do you want to rely on?
Why would you want to enroll such an open file in this mechanism?
Or are you afraid of what might happen if you delete a file under a program that relies on it? If so, what makes you think that this clever way of dealing with files will make a substantial difference?
In other words, what exactly did you think I proposed?
-- Erik Naggum, Oslo, Norway
Act from reason, and failure makes you rethink and study harder. Act from faith, and failure makes you blame someone and push harder.
If you delete an open file, what exactly do you want to rely on?
That the data in the deleted file is still accessible to programs holding a descriptor for that file. That the file name may be re-mapped to a new inode, even while the previous inode is still in use.
Why would you want to enroll such an open file in this mechanism?
Because these semantics are part of how good unix programs achieve ACID semantics. It may be the case that sometime in the future, we'll be able to wrap unix filesystem calls in begin-/end-transaction wrappers -- but for a few decades now, people have made good use the very easily implemented semantics under discussion to achieve ACID properties without full-fledged transactions.
Or are you afraid of what might happen if you delete a file under a program that relies on it? If so, what makes you think that this clever way of dealing with files will make a substantial difference?
No, we unix hackers aren't afraid of what might happen on our platforms: we count on it and appreciate the reasons why it is so.
In other words, what exactly did you think I proposed?
Argument for argument's sake? If the earlier questions were honest, I've provided the answers to them.
In the unlikely event anyone else reads Tom Lord's increasingly spurious excuses for articles...
* Andy Freeman | Can this reliably work on Unix?
* Erik Naggum | If you delete an open file, what exactly do you want to rely on?
* Tom Lord | That the data in the deleted file is still accessible to programs | holding a descriptor for that file. That the file name may be | re-mapped to a new inode, even while the previous inode is still in | use.
Are you speaking for Andy Freeman? Did you understand the /context/ of the question he asked? Or are you just jumping into a discussion to which you have not been paying much attention so you can say something stupid and pretend it is somebody else's fault?
* Erik Naggum | Why would you want to enroll such an open file in this | mechanism?
Recall that the mechanism discussed here is one of keeping track of a number of /files/ by name, which can be opened and closed at will. If the files cannot be opened and closed at will in this fashion, then use your goddamn brain and do not use this technique! How hard can it be to grasp that just because a technique has limitations does not mean it is not useful where those limitations do not apply? What kind of a programmer are you if you are unable to grasp this?
* Tom Lord | Because these semantics are part of how good unix programs achieve | ACID semantics.
Let me get this straight. You want automatic flushing of buffers and closing of files upon garbage collection of streams so you can implement certain algorithms which you claim are not easy in Common Lisp as is (except that it is, you just use finalization). When a simple scheme is proposed that would free you of all concerns for when files are opened and closed (which seemed to be the point of your desires until you came up with more problems after the old ones were solved) for some particular uses (I proposed only two such schemes depending on usage factors, obviously not the exclusive list of methods, but the /obvious/ choices for how to implement what you claimed was /obvious/ yet did not understand), your counter-argument is that it should not be used because you want /ACID semantics/ of these garbage collected and automatically opened and closed streams?
Are you /insane/? Or are you just making up problems as you go so that nobody can ever satisfy your increasingly irrational demands?
| It may be the case that sometime in the future, we'll be able to | wrap unix filesystem calls in begin-/end-transaction wrappers -- but | for a few decades now, people have made good use the very easily | implemented semantics under discussion to achieve ACID properties | without full-fledged transactions.
Are you at all aware of what thread this is in? Does the word "context" not have any meaning to you at all? Is thinking so hard for you that you have to concentrate real hard and work long hours not to make bumbling mistakes? Is that why your miserable excuse for a brain overloaded when you were given standard sociological arguments about models and you called them "jaw-droppingly dumb"? I take some solace in the fact that the arguments were given by a professor of sociology at the U of Oslo who has published dozens of papers and books. When you, an obviously harebrained fool, pays no attention and provides evidence that you are so mind-numbingly retarded that you should be protected from yourself, that is one of the things he has discussed over the years -- how people who are poor on mental models seek out and need information that conforms to their models lest they become very nervous and agitated and feel an urge to escape rather than to think.
| Argument for argument's sake?
Yes, that does indeed seem to be your modus operandi.
| If the earlier questions were honest, | I've provided the answers to them.
Except for the fact that they were asked of Andy Freeman, not of you. Except that you do not appear to be particularly honest to begin with. Except that you answered questions completely out of context. Andy Freeman had already swayed from the context, but you showed me what kind of idiot you are when you decided to respond. You probably think that just because your feeble brainpower cannot deal with something, it must be dumb. Astonishingly stupid people tend to behave that way.
It must be fun to be you. I sometimes wish I had accepted the drugs I have been offered at various parties of the years so I could have been able to /relate/ to people like you and how your brains work.
-- Erik Naggum, Oslo, Norway
Act from reason, and failure makes you rethink and study harder. Act from faith, and failure makes you blame someone and push harder.
l...@emf.emf.net (Tom Lord) writes: > It may be the case that sometime in the future, we'll be able to > wrap unix filesystem calls in begin-/end-transaction wrappers....
It certainly was the case in the past for several non-unix file system calls.