> Why would you want to enroll such an open file in this mechanism?
If it's the only mechanism for files, I don't have much choice.
> Or are you afraid of what might happen if you delete a file under a program that relies on it? If so, what makes you think that this clever way of dealing with files will make a substantial difference?
It affects observable behavior.
> In other words, what exactly did you think I proposed?
It looked like a proposal for a scheme that provided the illusion of a "large" number of open files on a system that didn't actually provide same.
* Andy Freeman | If it's the only mechanism for files, I don't have much choice.
Ah, I see you invented the "only mechanism" part on your own and attributed it to me. How manifestly indecent of you to do so.
Please back up and reattach your argumentation to what I wrote -- it is currently free-floating without connection to what I wrote, yet you comment on it as if I had said something I had not. Then make your own contribution explicit and see what difference it makes.
* Erik Naggum | In other words, what exactly did you think I proposed?
* Andy Freeman | It looked like a proposal for a scheme that provided the illusion of a "large" number of open files on a system that didn't actually provide same.
Could you do me the favor of /reading/ what I wrote and /please/ try to avoid introducing noise of your own into it if you are going to comment on it? Intellectual honesty demands that you try to keep at least somewhat clear of polluting the information you comment on. Not that intellectual honesty is in high esteem in this newsgroup, but I still get fairly annoyed when people make up things and then pretend I said them.
-- Erik Naggum, Oslo, Norway
Act from reason, and failure makes you rethink and study harder. Act from faith, and failure makes you blame someone and push harder.
> If you delete an open file, what exactly do you want to rely on?
That data operations using valid handles are unaffected by unlink.
I think of create&unlink as operations that maintain "root" pointers to data. Open turns said root pointers into handles that can be used to manipulate said data; close frees said handles.
With that separation of powers, unlink shouldn't have any effect on handle operations.
Yes, I know that the unix model isn't quite that straightforward....
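Freeman's model, in which names are root pointers and open handles work independently of them, matches what a local POSIX filesystem does on unlink. A minimal Python sketch, assuming a POSIX system and a local filesystem (NFS complicates this, and on Windows the unlink call typically fails while the file is open): the name disappears, yet the descriptor keeps reading the data. `read_after_unlink` is a hypothetical helper name.

```python
import os
import tempfile

def read_after_unlink():
    """Write through a descriptor, unlink the name, then read back via the same descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"still here")
        os.unlink(path)                      # drops the name (the "root pointer")
        name_exists = os.path.exists(path)   # the directory entry is gone
        os.lseek(fd, 0, os.SEEK_SET)         # the open handle is unaffected
        data = os.read(fd, 64)
    finally:
        os.close(fd)                         # storage is reclaimed after the last close
    return data, name_exists

if __name__ == "__main__":
    print(read_after_unlink())
```

On a local POSIX filesystem this should return `(b'still here', False)`.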
* Andy Freeman | That data operations using valid handles are unaffected by unlink.
If you want to rely on this, do you still want the stream to be closed when it is garbage collected, or do you want some control over when it ceases to exist?
I guess I am trying to figure out why you brought this up in the context of garbage-collected streams with finalization semantics.
Also, despite what you believe, this is not the only mechanism. The standard language semantics prevails. Someone wanted to be relieved of closing streams "manually" and wanted them to be closed when they became unreferenced. Again despite what you believe I said, I have offered three different ways to address this problem. (One of them automatic reaping of unreferenced file handles.) How you could possibly have invented the premise that one of these three would be the only one available is beyond me.
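Naggum does not say how automatic reaping of unreferenced file handles would be implemented. As one hedged illustration, Python's `weakref.finalize` can close a descriptor when its wrapper becomes unreachable while still letting the program close it early, which is the kind of control over "when it ceases to exist" raised above. `ReapedHandle` is a hypothetical name, not part of any library.

```python
import os
import weakref

class ReapedHandle:
    """Hypothetical wrapper: its descriptor is closed explicitly or, failing that, when collected."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDONLY)
        # The callback must not refer to self, or the wrapper could never become unreachable.
        self._finalizer = weakref.finalize(self, os.close, self.fd)

    def close(self):
        self._finalizer()        # runs os.close(fd) at most once; later calls do nothing

    @property
    def closed(self):
        return not self._finalizer.alive
```

Calling `close()` gives deterministic release; if the program never calls it, the finalizer closes the descriptor once the wrapper is collected, with no promise about exactly when.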
-- Erik Naggum, Oslo, Norway
* Andy Freeman wrote:
> Yes, I know that the unix model isn't quite that straightforward....
NFS. And so this comes back to where we started: the moment you start playing clever games with files or streams and GC you suddenly discover you're in a world where you have to do distributed GC across possibly heterogeneous systems with differing underlying semantics. Or alternatively, you don't discover this, but your programs just randomly break every once in a while.
Erik Naggum <e...@naggum.no> wrote in message <news:3251561642580056@naggum.no>...
> * Andy Freeman | If it's the only mechanism for files, I don't have much choice.
> Ah, I see you invented the "only mechanism" part on your own and attributed it to me. How manifestly indecent of you to do so.
Except that I didn't attribute anything to anyone. I asked a question to help me understand what Naggum wrote.
The thread has discussed both "only" and "special case" mechanisms and the message in question doesn't specify its category. Thus, my question.
> Please back up and reattach your argumentation to what I wrote
No thanks. Been there, did that, got the t-shirt.
I'd like to learn about general mechanisms because I'd like to avoid yet another context-specific mechanism. If that's not on the table....
> * Andy Freeman | It looked like a proposal for a scheme that provided the illusion of a "large" number of open files on a system that didn't actually provide same.
> Could you do me the favor of /reading/ what I wrote
The mechanism in question kept track of files and closed&reopened them behind the programmer's back in certain circumstances; the close&reopen was not explicitly requested by the programmer. (The suggested mechanism to choose files to close was LRU.) True, Naggum didn't say why the mechanism was closing files - I assumed that a system limitation might be relevant.
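To make the observable-behavior point concrete, here is a minimal sketch of the kind of scheme Freeman describes: a pool that keeps at most `max_open` real files open, evicts the least recently used one, and silently reopens it by name on the next access. `LRUFilePool` and `VirtualFile` are hypothetical names; Naggum's post gives no implementation. Because a reopen goes through the path, unlinking the name between eviction and reuse breaks a handle that a plain open descriptor would have kept working.

```python
from collections import OrderedDict

class VirtualFile:
    """Hypothetical handle whose real OS file may be closed and reopened by the pool."""

    def __init__(self, pool, path):
        self.pool = pool
        self.path = path
        self.pos = 0                      # remembered so a reopen can resume

    def read(self, n):
        f = self.pool._real_file(self)    # may transparently reopen by *name*
        f.seek(self.pos)
        data = f.read(n)
        self.pos = f.tell()
        return data

class LRUFilePool:
    """Keeps at most max_open real files open; evicts the least recently used one."""

    def __init__(self, max_open):
        self.max_open = max_open
        self._open = OrderedDict()        # VirtualFile -> real file object, oldest first

    def open(self, path):
        vf = VirtualFile(self, path)
        self._real_file(vf)
        return vf

    def _real_file(self, vf):
        f = self._open.get(vf)
        if f is not None:
            self._open.move_to_end(vf)    # mark as most recently used
            return f
        if len(self._open) >= self.max_open:
            _, victim = self._open.popitem(last=False)
            victim.close()                # closed behind the programmer's back
        f = open(vf.path, "rb")           # raises if the name was unlinked meanwhile
        self._open[vf] = f
        return f

    def close_all(self):
        for f in self._open.values():
            f.close()
        self._open.clear()
```

With `max_open=1`, opening a second file evicts the first; after `os.unlink` on the first path, reading through its `VirtualFile` raises `FileNotFoundError`, whereas an ordinary descriptor kept open across the unlink would still read the data.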
l...@emf.emf.net (Tom Lord) writes, amongst other things:
> It's not just untrusted code that is the problem -- it's also untrusted data. With suitably crafted malicious _data_, you get some control over what's on the stack, both in variables and in spilled registers. Thus, malicious data can also create false roots.
> False roots enable direct exploits that manage to cause big leaks but they also enable indirect exploits. An attacker can combine unintended retention of some objects with other exploits, for example, tricking the GC into keeping around an object that ultimately keeps a file open, then using another exploit to run code that accesses that file.
i'm sorry, i find this quite incoherent. can you specify a step-by-step exploit on a server of your choice running with boehm GC, and explain how and why you ended up with the GC and false roots as your *only* course of action for the exploit? if you hacked into server's stack (say), why are there easier exploits available to you? (etc) i'm not saying your point is invalid, i'm just not seeing it clearly out of all this hand waving. a serious example would help reinforce our security toolkits.
oz --- a nought, an ought, a knot, a not easily perceived distinction. -- t. duff
> l...@emf.emf.net (Tom Lord) writes, amongst other things:
> > It's not just untrusted code that is the problem -- it's also untrusted data. With suitably crafted malicious _data_, you get some control over what's on the stack, both in variables and in spilled registers. Thus, malicious data can also create false roots.
> > False roots enable direct exploits that manage to cause big leaks but they also enable indirect exploits. An attacker can combine unintended retention of some objects with other exploits, for example, tricking the GC into keeping around an object that ultimately keeps a file open, then using another exploit to run code that accesses that file.
> i'm sorry, i find this quite incoherent.
That's ok. I think your message leads someplace interesting.
I'm sorry, too, because this is a long reply. It's in two parts. The first just tries to clear up what I think are the misunderstandings that led you to judge my contribution "incoherent" -- that's boring, but necessary. The second part is a lot more interesting, in my view: it's an actual gosh-darn engineering question: how to spend money on software development that involves a choice between conservative and precise collectors. So, here we go:
* Fixing the Apparent Miscommunication
> i'm sorry, i find this quite incoherent.
Initially, at least, that appears to be because you misread it. You go on to say:
> if you hacked into server's stack (say), why are[n't] there easier exploits available to you? (etc)
The misunderstanding seems to be over my phrase "you get some control over what's on the stack".
There are popularized exploits that involve, for lack of a better term, "stack smashing". For example, a bug permits a buffer overrun on a stack-based buffer. An attacker supplies data that causes the overrun. That data both contains arbitrary code and replaces the return address of the stack frame with a pointer to that arbitrary code (or, maybe it just points the return address to existing code that shouldn't run at that particular time but that, if run, will have malicious effect). When you talk about "easier exploits" -- I think that is the kind of thing you are talking about, no?
That is not the kind of "control over the stack" I'm talking about. I'm talking about control over the stack which does _not_ require a bug: control over the values in variables; control over the values in registers when they are spilled. If the stack is scanned conservatively, those values, suitably constructed (which can be forced in some cases by choice of attacker-supplied-data) are false roots and cause errant retention by the GC.
The fundamental problem here is that with conservative scanning, the stack values take on a new meaning that has nothing to do with the program text: the conservative GC sees them as potential roots. That overloading gives attackers a new avenue by which they may program your application (with malicious data) to behave in unintended ways.
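A toy model may make the false-root mechanism clearer. The Python sketch below is not how Boehm GC is implemented; it only contrasts a conservative root scan (any stack word that points into a heap object keeps it alive) with a precise one (only slots known to hold pointers count). The heap layout, addresses, and names are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Obj:
    addr: int                                   # start address of the object
    size: int                                   # size in bytes
    refs: list = field(default_factory=list)    # start addresses this object points to

def roots_from_stack(stack, conservative):
    """stack: list of (word, is_pointer). Only a precise scanner can use is_pointer."""
    return [word for word, is_pointer in stack if conservative or is_pointer]

def find_object(word, heap, conservative):
    if not conservative:
        return heap.get(word)                   # precise: must be an exact object address
    for obj in heap.values():                   # conservative: any word pointing into an object
        if obj.addr <= word < obj.addr + obj.size:
            return obj
    return None

def mark(stack, heap, conservative):
    """Return the start addresses of objects the collector would retain."""
    work = []
    for word in roots_from_stack(stack, conservative):
        obj = find_object(word, heap, conservative)
        if obj is not None:
            work.append(obj)
    marked = set()
    while work:
        obj = work.pop()
        if obj.addr in marked:
            continue
        marked.add(obj.addr)
        work.extend(heap[a] for a in obj.refs if a in heap)
    return marked

if __name__ == "__main__":
    heap = {
        0x4000: Obj(0x4000, 8),                 # genuinely live object
        0x2000: Obj(0x2000, 16, [0x3000]),      # dead object that owns an open file ...
        0x3000: Obj(0x3000, 64),                # ... and its buffer
    }
    stack = [(0x4000, True), (0x2008, False)]   # a real pointer, and an integer from request data
    print(sorted(mark(stack, heap, conservative=True)))
    print(sorted(mark(stack, heap, conservative=False)))
```

The integer `0x2008` is a value the attacker chose through request data. The conservative scan treats it as an interior pointer into the object at `0x2000` and retains that object and everything it references; the precise scan retains only the genuinely live object.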
> can you specify a step-by-step exploit on a server of your choice running with boehm GC, and explain how and why you ended up with the GC and false roots as your *only* course of action for the exploit?
No. Nor would I specify one here, even if I could. Nor do I believe that doing so is a necessary part of pointing out the security risks.
> if you hacked into server's stack (say), why are there easier exploits available to you? (etc) i'm not saying your point is invalid, i'm just not seeing it clearly out of all this hand waving. a serious example would help reinforce our security toolkits.
An attacker against a system which uses conservative GC can sometimes (and the program text doesn't make clear when) provide malicious data that targets particular objects to be retained that, with precise GC, would not be retained, or that, with conservative GC absent malicious data, would be unlikely to be retained. How to combine that capability with other partial exploits, or how to use it directly as a complete exploit, is left as an exercise to the attacker; I've mentioned some of my ideas already (resource exhaustion; preserving sensitive resources to make them available to a code exploit).
* The Engineering Question
It all comes down to opportunities and how much they cost and probabilities, all of which are impossible to measure precisely.
What is the probability of a conservative GC bug being used in an exploit, or causing a costly failure due to a naturally occurring bug? This is very hard to guess -- sadly, this thread probably raises the probability of exploits. But non-costly failures from conservative GC (minor storage leaks) are very measurable -- so we should not "guess low" on the probability of a costly failure.
What is the cost of developing against a precise vs. conservative GC? There's the initial cost (Boehm-family collectors are ready off the shelf). Then there's the continuing cost (compare how much I'll spend tuning conservative GC to eliminate retention bugs vs. how much I'll spend on precise GC to eliminate bookkeeping bugs). I think that in the current historic state of affairs, it's safe to say that the initial cost of precise GC is higher (though not by a huge amount) -- but that the ongoing costs can be made about equal (portable precise GC needs either code generators or gclint).
What's the lock-in cost of conservative GC? In other words, if we decide today to go with conservative GC, how much do we have to pay later if we need to switch to precise? We should note that code written presuming a conservative GC, especially a conservative-stack-scanning GC, is not easy to convert to precise GC -- one has to add bookkeeping. We can't count on there being an automatable conversion process -- we'd have to restrict the code somewhat to guarantee that automatic conversion was possible (and then write gclint even though we're using conservative GC). As run-time systems grow, this conversion cost is going to just keep getting higher. We might have a slight out, if we plan to one-day modify our C compiler to spew type information, but if that day comes, we'll both lose portability and raise the cost of modifying or replacing our compiler. Finally, if we need to do this conversion quickly, the costs must be suitably multiplied. I don't think these lock-in costs are easy to estimate, other than that they aren't trivial, and they will grow over time.
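What "adding bookkeeping" means can be sketched with an explicit root registry in the spirit of a shadow stack. This is a hypothetical Python illustration (`ShadowStack` and `precise_mark` are invented names), not gclint or any real collector's API: each function must register its live pointers, and a precise collector scans only what was registered. A forgotten registration is the kind of bookkeeping bug mentioned above, and it fails in the opposite direction from a false root: the object is not retained when it should be.

```python
from contextlib import contextmanager

class ShadowStack:
    """Hypothetical explicit root registry: the bookkeeping a portable precise GC needs."""

    def __init__(self):
        self.frames = []

    @contextmanager
    def frame(self, *refs):
        self.frames.append(list(refs))   # register this function's live pointers
        try:
            yield
        finally:
            self.frames.pop()            # unregister on every exit path

    def roots(self):
        return [r for f in self.frames for r in f]

def precise_mark(roots, edges):
    """edges: dict mapping each object to the list of objects it references."""
    marked, work = set(), list(roots)
    while work:
        obj = work.pop()
        if obj not in marked:
            marked.add(obj)
            work.extend(edges.get(obj, ()))
    return marked
```

Converting code written for a conservative collector means finding every place that should open such a frame, which is why the conversion cost grows with the size of the run-time system.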
What's the pay-back of going to precise GC early? At least incrementally better memory performance; the ability to reliably regression test code that involves GC semantics (e.g., weak reference implementations); and freedom from worrying about conservative GC lock-in costs, liability costs, and the probability of nasty conservative GC bugs or exploits.
Putting that all together, the only lossage of investing in precise GC is the initial cost, and there's plenty of experience in the field that tells us that cost isn't very high: wanna upper-bound it at 3 man-years (min of 1 calendar-year)?
It's a no-brainer. Walk away from conservative GC; invest a bit in precise. If need be, we can put all this in the form of a kind of Drake's equation for the bean counters.
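Lord does not write the equation out. One hedged reading, with terms taken from his paragraphs above (initial cost, continuing cost, probability times cost of a costly failure, and expected lock-in conversion cost), is a plain expected-cost sum. The function name and parameters are hypothetical, and any numbers plugged in are estimates, not data.

```python
def expected_cost(initial, ongoing_per_year, years,
                  p_costly_failure=0.0, failure_cost=0.0,
                  p_switch=0.0, switch_cost=0.0):
    """Hypothetical 'Drake's equation' for a GC choice; every input is an estimate."""
    return (initial
            + ongoing_per_year * years                 # tuning or bookkeeping effort
            + p_costly_failure * failure_cost          # exploits or costly natural failures
            + p_switch * switch_cost)                  # lock-in: cost of converting later
```

Evaluate it once per option with your own estimates. The claim that lock-in costs grow over time could be modeled by making `switch_cost` a function of `years`.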
Erik Naggum <e...@naggum.no> wrote in message <news:3251567553621728@naggum.no>...
> * Andy Freeman | That data operations using valid handles are unaffected by unlink.
> If you want to rely on this, do you still want the stream to be closed when it is garbage collected, or do you want some control over when it ceases to exist?
Streams? My question was about how a described mechanism for dealing with files interacts with the semantics for (local) Unix filesystems.
One possible useful answer is "it breaks badly if you do <whatever>".
> I guess I am trying to figure out why you brought this up in the context of garbage-collected streams with finalization semantics.
I didn't. I noted Naggum's distinction between streams and files and asked about files.
> Also, despite what you believe, this is not the only mechanism.
I don't "believe", I asked a simple question that could have been answered with "no, this isn't a general mechanism", possibly with an added "it's good when ..." or even "no general mechanism is possible because ...".
I note again that general mechanisms had been discussed in the thread and that the proposed mechanism was not labelled. Thus, a question.
l...@emf.emf.net (Tom Lord) wrote in message <news:v26ddo3em0ge5f@corp.supernews.com>...
> I think it's unlikely one _really_ wants to choose conservative GC for something like a fresh Java implementation or .NET competitor targeted at enterprise systems, where you both (a) want to be able to get as close to perfectly robust as you can afford and (b) never want to be caught in a situation where there is a permanent source of exploits that can not be plugged (even if any particular exploit that takes place can be worked around after the fact). Not all programming problems have solutions that satisfy (a) and (b), but GC does and those solutions are in the precise GC family.
> That said, some of your trade-off concerns, such as complexity of ffi's or whether or not a special compiler/collector interface is needed are exaggerated: those concerns can be addressed by building tools (e.g., imagine a GC-lint program checker for C programs or a code generator that produces code for GC bookkeeping).
I still disagree about security exploits (see below).
And it still seems to me that the tradeoffs are substantial enough that I would consider them. If the foreign-function interface were already specified and mandatory, were performant enough that I could live with it everywhere, and I had enough resources to deal with the extra implementation effort, I would go for the type-accurate collector. In the case of the gcj effort for example, I suspect none of those are really true. (The foreign-function interface (JNI) is specified, but very complex and its performance often left something to be desired. The main gcj developers decided early on not to make it mandatory.) In the case of Mono, I know at least the last point was an issue.
It also seems to me that providing the option of scanning some things, e.g. C frames on the stack or C allocated objects, conservatively is always good, since it gives you options you wouldn't otherwise have. The real issues are whether
1) This costs you enough collector performance to negate the flexibility advantages. (I think the answer here is mostly unclear for applications for which generational collection works well, and mostly "no" for others. Our collector seems very competitive for the latter, but less so for the former. Mostly copying collectors probably help there.)
2) Once you have the facility, do you want to use it in cases for which you could generate precise layout information with more implementation effort and/or other overhead?
> 6) We agree that it is occasionally useful to let the collector manage external resources such as file descriptors. By using a conservative GC, you may lose any guarantees about how many file descriptors can be simultaneously open, and thus you may run out of descriptors earlier than you expected.
> Agreed, with the addition that running out of descriptors is not the _only_ danger: simply retaining a particular descriptor can open the door to exploits.
I disagree. If you are using finalizers to close files, you shouldn't be relying on the timing of the file close for security. It's possible that a type-accurate collector might foil an exploit based on having the file open. But if that's the case, you just got lucky. You have a bug in the client code.
There are usually other ways for malicious code to delay such a file close. For example, it may fail to release a lock that's needed by a finalizer preceding the file close in a finalization queue. Or it may force the heap to grow, thus decreasing GC frequency. None of these depend on conservative GC.
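The point that a close can be delayed without any help from conservative scanning can be simulated. The sketch below assumes a single finalizer thread that runs queued finalizers strictly in order, a hypothetical model rather than any particular runtime: while other code holds a lock that an earlier finalizer needs, the file-closing finalizer queued behind it cannot run.

```python
import queue
import threading

def demo_delayed_close():
    """One finalizer thread runs finalizers in queue order; a held lock stalls all of them."""
    lock = threading.Lock()
    closed = []
    finalizers = queue.Queue()

    def needs_lock():                 # an earlier finalizer that must take the lock
        with lock:
            pass

    def close_file():                 # the finalizer we actually care about
        closed.append("fd")

    def finalizer_thread():
        for _ in range(2):
            finalizers.get()()        # take the next finalizer and run it

    lock.acquire()                    # misbehaving code holds the lock
    finalizers.put(needs_lock)
    finalizers.put(close_file)
    t = threading.Thread(target=finalizer_thread, daemon=True)
    t.start()
    t.join(timeout=0.2)
    before = list(closed)             # still empty: close_file is stuck behind needs_lock
    lock.release()
    t.join()
    return before, closed
```

`demo_delayed_close()` should return `([], ['fd'])`: nothing is closed while the lock is held, and the close happens only after the lock is released. No garbage collector, precise or conservative, is involved.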
> But for nearly all programming languages, you don't have those guarantees anyway, for several reasons. Object reachability is usually not precisely defined. And often the finalization facility isn't quite up to this task.
> Agreed. This is an area where language designers and implementors need to do a much better job, and where a much better job for our purposes here can certainly be done. There's a hard problem of defining reachability so as to minimize lifetimes to the greatest extent practical -- but that hard problem isn't the one we're talking about here: we only need to provide a useful upper-bound on lifetimes.
I doubt you can practically get that in this case, especially if you are relying on a general purpose finalization mechanism.
(I think the most serious problem with current specifications of finalization is that they don't provide a reasonable LOWER bound on finalization time. You need some guarantee that the file descriptor is not still in a register and being accessed when the object holding it becomes inaccessible and is finalized. My impression is that current implementations don't even give you that guarantee. It's unclear whether the specifications do, either.)
> In my experience, this still seems to work fine in practice, however, conservative GC or not.
> I can't reliably regression test the weak-references implementation in my Scheme interpreter that uses a partially-conservative GC, because it uses a partially-conservative GC. That discovery was the particular incident that first soured me on conservative GC.
Isn't that at least partially a bug in the regression test? If you check that most weak references are cleared, the probability of a spurious failure should rapidly go to zero with an increasing sample size. That's how I tend to write such tests. Given the state of "reachability" definitions, that's probably all you can really check for anyway, at least without knowing a lot about your optimizer.
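The suggested style of test can be sketched as one that asserts a fraction of weak references are cleared rather than all of them. This is Python; CPython frees unreferenced objects promptly through reference counting, so there every reference should be cleared, and the threshold only matters under a collector that may spuriously retain a few objects. `fraction_cleared` and the 0.9 threshold are illustrative choices, not taken from any real test suite.

```python
import gc
import weakref

class Node:
    """Plain class instances support weak references; object() itself does not."""

def fraction_cleared(n=1000):
    objs = [Node() for _ in range(n)]
    refs = [weakref.ref(o) for o in objs]
    del objs                           # drop the only strong references
    gc.collect()
    return sum(r() is None for r in refs) / n

def test_weak_refs_mostly_cleared(threshold=0.9):
    # Tolerate a few spurious retentions (e.g. from conservative scanning)
    # instead of demanding that every single reference be cleared.
    assert fraction_cleared() >= threshold
```

Raising `n` makes a spurious failure less likely for a given retention rate, which is the sample-size argument above.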