Is there experience with writing self-hosting GC for Lisp or Scheme
systems? By "self-hosting" I mean that the GC is principally written
in Lisp/Scheme, and compiled by the native compiler. I do not mean
something written in a secondary language (like C) and compiled by a
separate compiler, or something written all in assembly language.
Obviously there are interesting problems to be solved it making such a
thing work.
What did the old Lisp Machines do? It's my understanding that the GC
was basically all written in assembly language; is that correct?
Is there research/experience on doing it? Guesses about the best ways
to make it work?
Thomas
No, it was mostly written in Lisp, IIRC. There was some hardware and
microcode acceleration, for things like ephemeral GC, but the main
algorithm was implemented in Lisp.
To do it, the Lisp system has to provide subprimitives that allow a program
to access the low-level data, so it can look at type tags and memory
addresses, perform direct memory reads and writes, etc. The GC code will
probably have to restrict itself to a subset of the full language's
capabilities; it might not even be able to cons, or it might have to do its
consing in a special region of memory that's set aside for it.
--
Barry Margolin, bar...@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
It depends what you mean by "principally written in Lisp/Scheme." If
you mean "Common Lisp or R5RS Scheme with no other functions allowed,"
I doubt that is practical. On the other hand, if you assume that
allocation, copying, and tag-bit twiddling operations are callable
from Lisp, then it is quite possible, at least for simple collectors.
>
>Obviously there are interesting problems to be solved it making such a
>thing work.
>
>What did the old Lisp Machines do? It's my understanding that the GC
>was basically all written in assembly language; is that correct?
>
>Is there research/experience on doing it? Guesses about the best ways
>to make it work?
The Utah Common Lisp collector was written in Lisp (not by me). It
was a stop-and-copy collector, and it certainly didn't look like any
other Lisp program. IIRC, (car foo) would get you the forwarding
pointer of the cons cell. On the other hand, coding the GC algorithm
in Lisp was pretty straight-forward, given the right primitives.
I'm not sure if it's advantageous to write a Lisp collector in Lisp.
Because the normal Lisp world is so inconsistant and screwed up while
running the collector, normal Lisp advantages like debuggability and
access to a repl simply don't apply.
Tim
> No, it [lispm gc] was mostly written in Lisp, IIRC. There was some
> hardware and microcode acceleration, for things like ephemeral GC,
> but the main algorithm was implemented in Lisp.
That's really cool to hear. Are there papers on implementation
around?
> To do it, the Lisp system has to provide subprimitives that allow a program
> to access the low-level data, so it can look at type tags and memory
> addresses, perform direct memory reads and writes, etc. The GC code will
> probably have to restrict itself to a subset of the full language's
> capabilities; it might not even be able to cons, or it might have to do its
> consing in a special region of memory that's set aside for it.
Sure, of course I was taking for granted the existence of suitable
peek/poke primitives.
I'm wondering about what if the entire set of language structures were
usable (for scheme, probably maybe even call/cc)--at least, as the
hard case. And if the hard case works, then why bother restricting
the language?!
So I had thought of the "special memory region" technique; the problem
is that you need to be darn sure that region doesn't run out.
Moreover, it might amount to reserving significant amounts of storage
for this case (to hold a stack, for example). Though a conventional
("in the metalanguage") GC might well need just as much extra space.
> It depends what you mean by "principally written in Lisp/Scheme." If
> you mean "Common Lisp or R5RS Scheme with no other functions allowed,"
> I doubt that is practical. On the other hand, if you assume that
> allocation, copying, and tag-bit twiddling operations are callable
> from Lisp, then it is quite possible, at least for simple collectors.
Um, it's clear that it *can't* be done from the primitives of the
standard, but that's not the point. I'm thinking about this from a
systems design perspective: if you wanted a Lisp/Scheme system and you
didn't want to write *two* compilers....
> The Utah Common Lisp collector was written in Lisp (not by me). It
> was a stop-and-copy collector, and it certainly didn't look like any
> other Lisp program. IIRC, (car foo) would get you the forwarding
> pointer of the cons cell. On the other hand, coding the GC algorithm
> in Lisp was pretty straight-forward, given the right primitives.
Well, I'll be more explicit by what I mean by "the obvious problems".
The obvious problems are: the collector itself uses memory. Obviously
the memory isn't in the same arena as everything else.
What techniques are in use for making that work?
> I'm not sure if it's advantageous to write a Lisp collector in Lisp.
> Because the normal Lisp world is so inconsistant and screwed up while
> running the collector, normal Lisp advantages like debuggability and
> access to a repl simply don't apply.
How about Lisp advantages like: "it's a better language".
That "obvious problem" isn't particularly a problem. A two-space
copying collector doesn't use memory other than what it allocates from
to-space, except for a couple of pointers which can hang out in
globals. More complex algorithms can use preallocated memory,
allocate it directly from the OS, whatever.
>
>What techniques are in use for making that work?
>
>> I'm not sure if it's advantageous to write a Lisp collector in Lisp.
>> Because the normal Lisp world is so inconsistant and screwed up while
>> running the collector, normal Lisp advantages like debuggability and
>> access to a repl simply don't apply.
>
>How about Lisp advantages like: "it's a better language".
>
Some of us are prepared to be more ecumenical in our views :)
I suppose that access to macros might be a bonus when writing a
collector in Lisp, but assuming that much Lisp functionality won't be
available in the collector or will be available in some weird and
crippled form, and that it's desirable for the collector not to cons
itself, the Lisp you write for the collector ends up looking a lot
like C.
Tim
> That "obvious problem" isn't particularly a problem. A two-space
> copying collector doesn't use memory other than what it allocates from
> to-space, except for a couple of pointers which can hang out in
> globals. More complex algorithms can use preallocated memory,
> allocate it directly from the OS, whatever.
Sure, stop and copy doesn't require much dynamic allocation in the
collector at all. Well, except if procedure invocation is doing
allocations, which might be the behavior of a compiler. In which
case, the number allocations will still be something like order n.
That's the sort of problem I have in mind...given that the collector
does do allocations--even if it's official space complexity is still a
constant--is there good knowledge about how to set the bounds?
> I suppose that access to macros might be a bonus when writing a
> collector in Lisp, but assuming that much Lisp functionality won't be
> available in the collector or will be available in some weird and
> crippled form, and that it's desirable for the collector not to cons
> itself, the Lisp you write for the collector ends up looking a lot
> like C.
Um, so the question is to tackle the hard case--that is, not crippling
the language.
And part of the motivation is to avoid the need to write two
compilers.
Thomas> So I have a question for the people who know more than me.
Thomas> Is there experience with writing self-hosting GC for Lisp or Scheme
Thomas> systems? By "self-hosting" I mean that the GC is principally written
Thomas> in Lisp/Scheme, and compiled by the native compiler. I do not mean
Thomas> something written in a secondary language (like C) and compiled by a
Thomas> separate compiler, or something written all in assembly language.
Thomas> Obviously there are interesting problems to be solved it
Thomas> making such a thing work.
Mostly not consing!
Thomas> What did the old Lisp Machines do? It's my understanding that the GC
Thomas> was basically all written in assembly language; is that correct?
As opposed to the new Lisp Machines? :)
The Lisp Machine GC (like everything else that comprised
the entire machine operating system) was written in Lisp.
Lisp Machine Lisp was the superset from which most of Common Lisp
descends, and included functions to access the hardware in various
ways, including accessing memory below the normal (Lisp-object level)
storage conventions. For example, you could hack type codes and
header words, and BLT words around.
The lowest-level machine instruction codes of the Lisp Machines was
intended for implementing Lisp, and mostly corresponded directly to
Lisp functions. The Lisp compiler just translated from higher Lisp
constructs and syntax into the more primitive Lisp constructs
implemented in the hardware. Nobody programmed in "assembly language"
on those machines -- that would be the same as programming in Lisp,
only more cumbersome.
The hardware instructions included: the various Lisp function call
and return protocols (including multiple values and CATCH and so on),
branching, stack and bindings manipulation, CONS, CAR, CDR, SET-TO-CAR,
SET-TO-CDR, RPLACA, RPLACD, GETF, MEMBER, ASSOC, EQ, EQL, GREATERP,
LESSP, LOGIOR, LOGTEST, ENDP, PLUSP, MINUSP, TYPEP, ZEROP, ADD, SUB,
MULTIPLY, etc, CEILING, TRUNCATE, ASH, ROT, LSH, array instructions
such as AREF, ASET, STORE-ARRAY-LEADER, %INSTANCE-REF, %INSTANCE-SET,
%GENERIC-DISPATCH, and so forth. The subprimitive instructions were
things like typed memory allocation and BLTs, %POINTER-DIFFERENCE,
%SET-TAG, %SET-CDR-CODE, STORE-CONDITIONAL (atomic), and stuff like
%CHECK-PREEMPT-REQUEST for handling hardware interrupts and reading
the clock and writing device control registers. Also, some instructions
were available to the compiler for common optimizations, such as an
instruction that corresponded to (SETQ X (CDR X)) seen in loops.
The actual code for the GC was just plain written in Lisp, although it
called "subprimitive functions" (eg. instructions like %POINTER-TYPE-P).
The code was written using normal Lisp syntax, such as the LOOP macro,
and was compiled by the normal Lisp compiler and run like any other.
Thomas> Is there research/experience on doing it?
Thomas> Guesses about the best ways to make it work?
See also T (an early implementation of Scheme by Rees and Pitman)
<http://www.paulgraham.com/thist.html>
> Is there experience with writing self-hosting GC for Lisp or Scheme
> systems? By "self-hosting" I mean that the GC is principally written
> in Lisp/Scheme, and compiled by the native compiler. I do not mean
> something written in a secondary language (like C) and compiled by a
> separate compiler, or something written all in assembly language.
A former Lucid employee (now a co-worker of mine) claims that Lucid
Common Lisp's collector was written entirely in Lisp.
There is a publicly available paper describing some aspects of one
particular revision of their collector:
ftp://publications.ai.mit.edu/ai-publications/1000-1499/AITR-1417.ps.Z
> So I had thought of the "special memory region" technique; the problem
> is that you need to be darn sure that region doesn't run out.
> Moreover, it might amount to reserving significant amounts of storage
> for this case (to hold a stack, for example). Though a conventional
> ("in the metalanguage") GC might well need just as much extra space.
This isn't a language issue, really, it's a general design issue for
GC algorithms.
--
/|_ .-----------------------.
,' .\ / | No to Imperialist war |
,--' _,' | Wage class war! |
/ / `-----------------------'
( -. |
| ) |
(`-. '--.)
`. )----'
> So I have a question for the people who know more than me.
>
> Is there experience with writing self-hosting GC for Lisp or Scheme
> systems? By "self-hosting" I mean that the GC is principally written
> in Lisp/Scheme, and compiled by the native compiler. I do not mean
> something written in a secondary language (like C) and compiled by a
> separate compiler, or something written all in assembly language.
If I recall correctly, Scheme48 was written in PreScheme, a Scheme
dialect, and that includes it's GC.
--
Craig Brozefsky <cr...@red-bean.com>
http://www.red-bean.com/~craig
Ask me about Common Lisp Enterprise Eggplants at Red Bean!
Read the first half of Jones & Lins book
(http://www1.fatbrain.com/asp/bookinfo/bookinfo.asp?theisbn=0471941484) for
a good survey of the state of the GC art as of about seven years or so ago.
If you're doing a basic uniprocessor implementation, it should give you the
info you need. It also has a good survey of bounded space strategies. The
multiproc and some RT stuff has moved on a bit beyond what's offered in the
book, but you can catch up by checking out the last few proceedings of the
International Symposia on Memory Management (even though you'll have to pan
a lot of Java crap to get to a few Lisp applicable nuggets).
faa
For writing a GC, it isn't. GC algorithms are all about manipulating
typed data, Lisp is about flexible types. There is something of an
impedance mismatch here. With the proper type and location reification
operators, it is certainly possible to do this in Lisp/Scheme (in fact
it's probably easier in Scheme because of guaranteed TCO), but that
doesn't mean that the expression of the GC algorithm itself is more
"natural".
That said, I'm waiting for someone to well and truly resurrect certain
aspects of the LM ideal - specifically a GC-friendly OS - although I
suspect that it may be better written in SML.
This is all IMHO. Someone is sure to flame me for this...
david rush
--
There is only one computer program, and we're all just writing
pieces of it.
-- Thant Tessman (on comp.lang.scheme)
Thomas> So I have a question for the people who know more than me.
I will try to answer anyway :-)
Thomas> Is there experience with writing self-hosting GC for Lisp or Scheme
Thomas> systems?
I have been pondering the concept of a "systems lisp" and I think that
Hnery Bakers Linear Lisp[1] has some very interesting properties for
such applications.
First of all Linear Lisp does not need a garbage collector. Since each
object is referenced only once, it is possible to deallocate an object
as soon as the unique reference to it is overwritten.
Secondly, it is quite possible that it gets easier to reason about the
memory requirements of Linear Lisp routines due to the highly regular
memory usage patterns.
The question is of course then how to integrate a Linear Lisp into the
system. In some sense it is cheating since we are using a second
language for the GC part, even though we are not cheating very much
since a Linear Lisp would be a proper subset of the real lisp.
The straightforward solution would be to extend the compiler with a
special linear mode, but I think that it would even be possible to
build a linear sublanguage on top of an existing system, if proper
access to lowlevel memory management was available.
Check your favorite Hnery Baker archive for his papers on Linear Lisp.
[1] Linear Lisp is a lisp in which all objects (the term is used here
in its generic sense) is referenced exactly once. This means that
if you want to pass on an object, you either must destroy the
original reference or explicitly copy the object.
------------------------+-----------------------------------------------------
Christian Lynbech |
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
- pet...@hal.com (Michael A. Petonic)
David> That said, I'm waiting for someone to well and truly resurrect certain
David> aspects of the LM ideal - specifically a GC-friendly OS - although I
David> suspect that it may be better written in SML.
Are you implying that SML is better suited for writing an OS than
Lisp? If so, what is the advantages of SML in that problem domain?
Christian Lynbech <lyn...@get2net.dk> writes:
> >>>>> "David" == David Rush <ku...@bellsouth.net> writes:
>
> David> That said, I'm waiting for someone to well and truly resurrect certain
> David> aspects of the LM ideal - specifically a GC-friendly OS - although I
> David> suspect that it may be better written in SML.
>
> Are you implying that SML is better suited for writing an OS than
> Lisp?
Yes.
> If so, what is the advantages of SML in that problem domain?
Static typing. Provable correctness. I *hate* slow buggy OSes.
david rush
--
Einstein said that genius abhors consensus because when consensus is
reached, thinking stops. Stop nodding your head.
-- the Silicon Valley Tarot
Well, that is hard to prove. We already had an OS in Lisp (several,
actually), it was even sold, but I aren't aware of any written
in SML...
>> If so, what is the advantages of SML in that problem domain?
>
> Static typing. Provable correctness. I *hate* slow buggy OSes.
Last time I checked, SML compilers generated pretty slow code.
When I ported programs to CMUCL they ran several times faster.
Provable correctness? SML has a really great language definition,
which is provably correct. But I don't think it follows that an OS
written in SML will be any less buggy than one written in any other
language with a less strict definition. It just doesn't follow.
Same with static typing. By your logic, OS's should be written rather
in *Haskell*.
Regards,
--
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."
PGP key ID 0x42B32FC9
>> If so, what is the advantages of SML in that problem domain?
David> Static typing. Provable correctness. I *hate* slow buggy OSes.
PreScheme gives you exactly that, which is what the GC of Scheme 48 is
written in. Not that it buys you much there except for performance,
as the (or this particular) GC primarily deals with words and bits.
--
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla
>>>>>> "David" == David Rush <ku...@bellsouth.net> writes:
>
> David> That said, I'm waiting for someone to well and truly resurrect
> certain David> aspects of the LM ideal - specifically a GC-friendly OS -
> although I David> suspect that it may be better written in SML.
>
> Are you implying that SML is better suited for writing an OS than Lisp?
> If so, what is the advantages of SML in that problem domain?
One word: types.
Matthias
> Last time I checked, SML compilers generated pretty slow code.
Must have been a long time ago.
> When I ported programs to CMUCL they ran several times faster.
See the Great PL Shootout page. There are three ML compilers (two SML
and one Ocaml) ahead of CMUCL. Two of them are *significantly* ahead.
> Provable correctness? SML has a really great language definition, which is
> provably correct. But I don't think it follows that an OS written in
> SML will be any less buggy than one written in any other language with a
> less strict definition. It just doesn't follow.
That is probably so. (Actually, "any less buggy" is probably not so.
But that's just my opinion, and I cannot prove it.)
> Same with static typing. By your logic, OS's should be written rather in *Haskell*.
Actually, no. I think that ML's fairly restrictive type system would be
just the right fit.
By the way, the type system is not just for making sure your
implementation is correct. It can also be used to *structure* the OS
itself -- in such a way that things could (at least in theory) become
much more efficient.
Let me give you an example:
In Unix-like OSes, we have the familiar "read" syscall, which roughly has
the following type signature:
val unix_read : int * pointer * int -> int
When unix_read is being called, the OS must check several things:
1. First argument:
- must be a number obtained from a previous call to open/dup/...
- must not have been subjected to "close" in the meantime
- must have been opened for reading
2. Second argument:
- must point to a writable memory region owned by the current process
- the region must contain at least <third argument> many writable
bytes
All this is being checked (by a combination of hardware protection and
actual dynamic checking *at runtime*) when the call is being made.
With a strong static type system that provides abstraction facilities,
one could have the following alternative interface:
---------------------
module FileDesc : sig
type readable
type writeable
type ('r, 'w) filedes
val open_for_reading : ... -> (readable, unit) filedes
val open_for_writing : ... -> (unit, writable) filedes
val open_for_both : ... -> (readable, writable) filedes
...
end
module Buffer : sig
type buffer
val alloc : int -> buffer
val size : buffer -> int
val getData : buffer -> char_array
...
end
open FileDesc
open Buffer
val mlos_read : (readable, 'w) filedes * buffer -> int
val mlos_write : ('r, writable) filedes * buffer -> int
--------------
The point is that both file descriptors and buffers are *unforgeable*
abstract types. So whenever a user program invokes mlos_read, it can
only do so if it has obtained a valid file descriptor and a valid buffer
beforehand. Thus, mlos_read does not need to do *any* checking of its
arguments at runtime because there is a compile-time proof that
everything will be ok. And the important contribution of the programming
language is that it lets you define such abstractions (they do not have
to be built-in).
This is just an example, and I have left out many of the details. In
fact, the above may not be the "best" design once we start living in
a compiler-checked strongly-typed world. But I think it serves to
demonstrate the idea.
Matthias
> The point is that both file descriptors and buffers are *unforgeable*
> abstract types. So whenever a user program invokes mlos_read, it can
> only do so if it has obtained a valid file descriptor and a valid buffer
> beforehand. Thus, mlos_read does not need to do *any* checking of its
> arguments at runtime because there is a compile-time proof that
> everything will be ok. And the important contribution of the programming
> language is that it lets you define such abstractions (they do not have
> to be built-in).
What magic prevents me from stopping the program at a crucial moment
and inserting some bogus stuff in these `unforgeable' types? For
instance: OS gives me a file descriptor, I then hack at it with a hex
editor, and hand it back. Oops, now I'm all over your machine.
--tim
* tmo...@sea-tmoore-l.dotcast.com (Tim Moore)
| I suppose that access to macros might be a bonus when writing a collector
| in Lisp, but assuming that much Lisp functionality won't be available in
| the collector or will be available in some weird and crippled form, and
| that it's desirable for the collector not to cons itself, the Lisp you
| write for the collector ends up looking a lot like C.
Why is this? It seem obviously false to me. The first thing you would
do in a copying garbage collector would be to switch the active half of
the two-space memory organization. The GC's task is only to move live
objects from the other half to this half, right? It should be doable
with the full language available and with no real constraints on consing.
Even reorganizing old space in a generational garbage collector should be
doable while consing from the fresh memory arena.
/// 2002-03-01
--
In a fight against something, the fight has value, victory has none.
In a fight for something, the fight is a loss, victory merely relief.
This issue is completely orthogonal to the language that the GC is written
in. After all, even if it's written in assembler, if the algorithm
requires dynamic memory allocation it will need to be done, and you'll need
to be concerned about running out of space.
Many (perhaps most) GC algorithms are able to use tricks that avoid needing
dynamic memory allocation. For instance, in a recursive copying GC, you
can use the objects in old-space to implement the stack.
Last year. I used SML/NJ, which you are probably familiar with :-))
Don't get me wrong: I think SML/NJ is a great system, I am not such
a performance junkie.
>> When I ported programs to CMUCL they ran several times faster.
>
> See the Great PL Shootout page. There are three ML compilers (two SML
> and one Ocaml) ahead of CMUCL. Two of them are *significantly* ahead.
I am aware of that web page. Frankly I don't understand why CMUCL
looks so bad there. Whenever /I/ implemented anything in several
languages, admittedly something bigger than the programs you find
there, CMUCL left most other compilers far behind, sometimes even
gcc. And yes, the OCaml compiler is just great w.r.t. performance,
note that I was talking about SML. Again: I actually prefer SML to
OCaml for various reasons; but it is not my experience that it is
particularly fast.
>> Same with static typing. By your logic, OS's should be written rather
>> in *Haskell*.
>
> Actually, no. I think that ML's fairly restrictive type system would be
> just the right fit.
I am not surprised that you think so ;-) But the point remains true:
If you advocate using a language that forbids the programmer to do
all kinds of things in order to make programs safer, why not go all
the way? Shouldn't we use Haskell, then? Why should that *one*
point, that we know there won't be any runtime type errors in SML,
make something like an OS so instantaniously safer? I thought there
were /lots and lots/ of different kinds of runtime errors usually
occurring in programs. Writing device drivers and hacking around
in kernels is my job. I do that every day. The problems I face
there are of a totally different nature. Buggy hardware for instance.
Timing problems. Synchronization. Braindead protocols. Deadlocks.
Non-aligned buffers. Memory management. Wild pointers. Registers
that don't work as advertised. I don't think much of these will
go away only because I won't have runtime type errors. What I could
need down there is an expressive language that makes programming
those things easier, on a more abstract level. And nothing is more
expressive than Lisp.
> By the way, the type system is not just for making sure your
> implementation is correct. It can also be used to *structure* the OS
> itself -- in such a way that things could (at least in theory) become
> much more efficient.
And all of a sudden we don't disagree anymore: Sure, a rich type
system is a nice thing. I am wondering if you are aware of the
fact that Common Lisp in fact /has/ a very rich type system: Your
example could be very easily rewritten in CLOS, for instance.
Static typing fanatics often pretend that there is no typing
besides static typing, but Common Lisp is /not/ an untyped language.
Very strange that this has to be repeated over and over again.
If you call a Lisp function with an argument it doesn't expect and
can't handle it will signal an error which you can catch. It won't
do any weird things like an untyped language would. And the
important observation is that experienced programmers hardly if
ever /get/ any of those runtime type errors. And when they do,
then only because of some trivial oversight. Typically when
that happens, a type-error is signalled the first time you run the
program and the bug is trivially fixed. Compare that with your
experience: When was the last time SML/NJ complained about a
type error you programmed that revealed a significant bug in your
program? Sure, when I was a newbie in ML I got type errors all
the time and I was very impressed by how the compiler found all
those errors I made. But when I became more experienced, those
errors occurred more and more rarely. In the end they found
simple typos, not more. Or complained about things which I
knew were perfectly correct but were too hard for the type
inference algorithm to figure out. Then it became very annoying.
We'll probably never agree on this, but I wish people would stop
making claims like ``Lisp is an untyped language''; you didn't
make that claim, but all too many people draw that conclusion.
What happens when you close a file? Suppose I get an opened-for-reading
file descriptor back from open, store it somewhere, and its type is known
to be such that I can mlos_read from it. Do we have a different way to
keep track of this open file than storing (a reference to) the file
descriptor in a variable? If not, how does close communicate the type
back to the compiler so that there is only compile-time type checking
that prevents you from calling mlos_read on a closed file descriptor?
It is probably something very trivial in SML, but I keep getting confused
by such things as the run-time behavior of streams, and wonder how a file
descriptor that has hit end of file is prevented at compile-time from
being used to read more data, and other such simple things.
///
I've been hearing good things about MLton. And OCaml is widely known
for it's excellent code. My experience of SML/NJ was that the 0.93
codebase produced acceptably fast code. Then it got slower, and I
switched to Scheme (those facts are *not* related, actually).
> >> Same with static typing. By your logic, OS's should be written rather
> >> in *Haskell*.
Actually OSes are so inherently stateful that I can't see how using a
pure functional language would help. Perhaps my imagination is simply
inadequate.
> Writing device drivers and hacking around
> in kernels is my job. I do that every day. The problems I face
> there are of a totally different nature. Buggy hardware for instance.
> Timing problems. Synchronization. Braindead protocols. Deadlocks.
> Non-aligned buffers. Memory management. Wild pointers. Registers
> that don't work as advertised. I don't think much of these will
> go away only because I won't have runtime type errors. What I could
> need down there is an expressive language that makes programming
> those things easier, on a more abstract level. And nothing is more
> expressive than Lisp.
Actually, I found SML to be incredibly expressive, but admittedly my
SML programs generally turned into huge masses of functors...
david rush
--
Don't you think it would be a useful item to add to your intellectual
tookit to be capable of saying, when a ton of wet steaming bullshit
lands on your head, "My goodness, this appears to be bullshit?"
-- Douglas MacArthur Shaftoe, in _Cryptonomicon_
>> What I could need down there is an expressive language that
>> makes programming those things easier, on a more abstract level.
>> And nothing is more expressive than Lisp.
> Actually, I found SML to be incredibly expressive, but admittedly my
> SML programs generally turned into huge masses of functors...
Oh, I agree that SML is very expressive. SML would certainly be
the language of my choice... if Common Lisp didn't exist :-)
Matthias> When unix_read is being called, the OS must check several things:
Matthias> 1. First argument:
Matthias> - must be a number obtained from a previous call to open/dup/...
Matthias> - must not have been subjected to "close" in the meantime
Matthias> - must have been opened for reading
It may be that there are things I do not know about the ML system, or
how an ML OS would work file objects, but as I know file objects from
UNIX (be it represented by a small integer or something else), such an
object has state; so how would it be verifiable at compile time that a
particular call to `read' would not be operating on an object that has
been closed?
Is the ML compiler smart enough to know that in regions following the
close operation, the object has another type and would that work in a
multiprocess setting?
Or did it just misinterpret the extent of your claims (in which case I
must state for the record that it was with no malicious intent :-)
Well, there are certainly a lot of things that one has to be very careful
about here. For one, the OS clearly cannot let you run any old code that
comes out of a hex editor. A low-tech solution might be cryptographic
fingerprints inserted by a certified compiler (questionable, because
compilers tend to be buggy). A better solution might eventually emerge
from the idea of proof-carrying code.
Matthias
> > Are you implying that SML is better suited for writing an OS than Lisp?
> > If so, what is the advantages of SML in that problem domain?
> One word: types.
I hope that wasn't an intentional troll, but merely a bizarre and
horrible misunderstanding of the entire way Lisp works.
--
-> -/- - Rahul Jain - -\- <-
-> -\- http://linux.rice.edu/~rahul -=- mailto:rj...@techie.com -/- <-
-> -/- "I never could get the hang of Thursdays." - HHGTTG by DNA -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
Version 11.423.999.221020101.23.50110101.042
(c)1996-2002, All rights reserved. Disclaimer available upon request.
Actually, they do. Writing the GC in subset of Lisp that doesn't cons has
benefits like:
- use of the many code-construction mechanisms that Lisp allows (macros etc)
- easy sharing of the constants needed by the compiler to implement primitives
(assuming your Lisp compiler is written in Lisp :-)
- the ability to develop and test code on-the-fly (sure, if you screw up badly
then you are lost, but many cases this doesn't happen).
--
Martin Simmons, Xanalys Software Tools
zne...@xanalys.com
rot13 to reply
> * Matthias Blume
> | The point is that both file descriptors and buffers are *unforgeable*
> | abstract types. So whenever a user program invokes mlos_read, it can
> | only do so if it has obtained a valid file descriptor and a valid
> buffer | beforehand. Thus, mlos_read does not need to do *any* checking
> of its | arguments at runtime because there is a compile-time proof that
> | everything will be ok. And the important contribution of the
> programming | language is that it lets you define such abstractions
> (they do not have | to be built-in).
>
> What happens when you close a file?
You wouldn't.
> Suppose I get an
> opened-for-reading file descriptor back from open, store it somewhere,
> and its type is known to be such that I can mlos_read from it. Do we
> have a different way to keep track of this open file than storing (a
> reference to) the file descriptor in a variable? If not, how does
> close communicate the type back to the compiler so that there is only
> compile-time type checking that prevents you from calling mlos_read on
> a closed file descriptor?
Well, if you insist on having "close" around, there are linear type
systems or unique types that have been built into some languages (but not
ML). Alternatively, one could probably use monads in some clever way.
> It is probably something very trivial in SML, but I keep getting
> confused by such things as the run-time behavior of streams, and
> wonder how a file descriptor that has hit end of file is prevented at
> compile-time from being used to read more data, and other such simple
> things.
Notice that I did not include "hitting the end of the file" into my
earlier description of which runtime checks I want to avoid. Of course,
nobody should claim that *every* runtime check can be eliminated by
static analysis, doing so would be silly. But the checks that can
be eliminated should be, IMO.
Matthias
> In article <pan.2002.03.01.10....@shimizu-blume.com>,
> Matthias Blume wrote:
>> On Fri, 01 Mar 2002 06:58:35 -0500, Nils Goesche wrote:
>>
>>> Last time I checked, SML compilers generated pretty slow code.
>>
>> Must have been a long time ago.
>
> Last year. I used SML/NJ, which you are probably familiar with :-))
> Don't get me wrong: I think SML/NJ is a great system, I am not such a
> performance junkie.
>
Thanks. I am not such a performance junkie either. :-)
(You brought that topic up.) But I acknowledge that poor performance can
be a showstopper in some situations.
>>> When I ported programs to CMUCL they ran several times faster.
>>
>> See the Great PL Shootout page. There are three ML compilers (two SML
>> and one Ocaml) ahead of CMUCL. Two of them are *significantly* ahead.
>
> I am aware of that web page. Frankly I don't understand why CMUCL looks
> so bad there. Whenever /I/ implemented anything in several languages,
> admittedly something bigger than the programs you find there, CMUCL left
> most other compilers far behind, sometimes even gcc.
Maybe you are just more experienced in cranking out efficient code for
CMUCL than you are for SML. I am serious about this: The stuff on "that
web page" are actually putting SML/NJ at a disadvantage because Bagley
insisted on not using our latest versions -- which especially for the
x86 platform produces much better code almost all of the time. Moreover,
some of the code was written in a slightly strange style. For example,
using 110.39 sped up the matrix multiplication code by a factor of 3,
and re-coding that code in a (to me) more natural style sped it up by
another factor of 2.
> And yes, the OCaml
> compiler is just great w.r.t. performance, note that I was talking about
> SML. Again: I actually prefer SML to OCaml for various reasons; but it
> is not my experience that it is particularly fast.
Certainly. We perfectly agree here.
>
>>> Same with static typing. By your logic, OS's should be written rather
>>> in *Haskell*.
>>
>> Actually, no. I think that ML's fairly restrictive type system would
>> be just the right fit.
>
> I am not surprised that you think so ;-) But the point remains true: If
> you advocate using a language that forbids the programmer to do all
> kinds of things in order to make programs safer, why not go all the way?
> Shouldn't we use Haskell, then?
We could. I am just a bit paranoid about programming in a language where
switching from writing a + b to b + a (literally!) can change the
complexity class of your program from O(n) to O(n^2) etc. Don't get me
wrong, Haskell is a wonderful language as far as beauty is concerned.
But I prefer ML because my brain is wired in such a way that I can more
easily predict how well code that I write down will perform in practice.
Other people's brains are wired differently, and they may well go ahead
and code an OS in Haskell.
> Why should that *one* point, that we
> know there won't be any runtime type errors in SML, make something like
> an OS so instantaniously safer?
It is not "instantanious". It actually takes quite a bit of experience
in setting up the right invariants and then code them into the type
system.
>> By the way, the type system is not just for making sure your
>> implementation is correct. It can also be used to *structure* the OS
>> itself -- in such a way that things could (at least in theory) become
>> much more efficient.
>
> And all of a sudden we don't disagree anymore: Sure, a rich type system
> is a nice thing. I am wondering if you are aware of the fact that
> Common Lisp in fact /has/ a very rich type system: Your example could be
> very easily rewritten in CLOS, for instance.
But the CLOS type system will not be checked at compile time, at least
not the same way that the ML typesystem is being checked. That's at
least my understanding. Correct me if I am wrong. Here is the acid
test:
I want to store in a variable x a 32-bit quantity that has the exact same
representation as an 'unsigned int' in C but for which I have a
compile-time guarantee that it will never be an odd number. What I want
to be able to do on these numbers are the "usual" four arithmetic
operations (with division rounding down to the next even number), and
perhaps some comparisons. And a conversion from/to "unsigned int"
(again, with rounding down to the next even). In SML I write:
structure Even :> sig
type even
val + : even * even -> even
val - : even * even -> even
val * : even * even -> even
val / : even * even -> even
val compare : even * even -> order
val fromWord: Word32.word -> even
val toWord: even -> Word32.word
end = struct
fun roundeven (x: Word32.word) =
Word32.andb (x, 0wxfffffffe)
type even = Word32.word
val op + = Word32.+
val op - = Word32.-
val op * = Word32.*
fun x / y = roundeven (Word32./ (x, y))
val compare = Word32.compare
fun toWord x = x
val fromWord = roundeven
end
Now, whenever I see a value of type "even", I can be absolutely certain
that it will be an even number -- but the point is that all the
operations (except /) are *precisely* as efficient as the ones on Word32.
How do you do this in Common Lisp?
> Static typing fanatics
> often pretend that there is no typing besides static typing, but Common
> Lisp is /not/ an untyped language. Very strange that this has to be
> repeated over and over again. If you call a Lisp function with an
> argument it doesn't expect and can't handle it will signal an error
> which you can catch.
Right, but that happens only after I have already called it -- which is
too late. This is precisely the thing that I described earlier: OS
syscalls do dynamic runtime checks. But what I want is to get rid of
these runtime checks -- safely (i.e., without compromising the integrity
of the OS).
> It won't do any weird things like an untyped
> language would. And the important observation is that experienced
> programmers hardly if ever /get/ any of those runtime type errors. And
> when they do, then only because of some trivial oversight.
That is irrelevant because you use types in quite a different way than
the one I am proposing. Catching the occasional programming error (which
happens more rarely the more experienced the programmer is) is one thing,
catching the malicious attempt at breaking an abstraction (which happens
more frequently the more experienced the malicious hacker is) is another.
I don't want a guarantee that works 99.9% of the time because that's not
good enough for this particular application. What I want is something
that works always, but I want to eliminate the usual runtime penalty for it.
Matthias
Yes, but the two-space model usually assumes equal sized spaces. If from-space
is almost full of live data you won't have much room in the to-space for data
allocated during the copying.
Also, the subset of Lisp available will be constrained to things that can run
safely at any allocation point. E.g. you can't use anything that alloctates
while holding a lock (I'm assuming the is keeping some data structure
consistent).
>>>>>> "Matthias" == Matthias Blume <matt...@shimizu-blume.com> writes:
>
> Matthias> When unix_read is being called, the OS must check several
> things:
>
> Matthias> 1. First argument:
>
> Matthias> - must be a number obtained from a previous call to
> open/dup/... Matthias> - must not have been subjected to "close"
> in the meantime Matthias> - must have been opened for reading
>
> It may be that there are things I do not know about the ML system, or
> how an ML OS would work file objects, but as I know file objects from
> UNIX (be it represented by a small integer or something else), such an
> object has state; so how would it be verifiable at compile time that a
> particular call to `read' would not be operating on an object that has
> been closed?
Well, I did explicitly say that I want to structure things differently,
so I was not really thinking of a Unix-style treatment of files.
As I have already said in a reply to another such question, I would
probably not have a "close" operation at all -- for pretty much the same
reason that we don't need "free" in Lisp or ML.
And again, not all runtime checks can be eliminated this way -- in
particular not those that are testing for inherently dynamic properties.
> Is the ML compiler smart enough to know that in regions following the
> close operation, the object has another type and would that work in a
> multiprocess setting?
I don't know what this has to do with multiprocess settings. (Notice
that even in Unix you can close a file in one process and another process
can still read/write from its (clone of that) file descriptor.)
And there are static typing mechanisms that would allow one to express the
effect that "close" has (linear types, unique types, monads).
Matthias
I assume that if this technique is used at all, it is used all the time,
it is not suddenly used in a system that did not do this previously, but
your argument looks like you think someone would suddenly start to do
some allocation in the garbage collector of a Common Lisp system that had
never done this before. I find this a rather peculiar inability to see
the simplest possible ramification of a suggestion: that it be used all
the time. So, at least some of the garbage in from-space is the direct
result of its collection phase of what is now to-space, right? In other
words, there is at least as much space as you allocated last time you
collected.
Also, what happens in a system that does not allocate during collection?
Do they crash and burn if they they have to grow their new-space because
it did not release enough space on the last allocation? No? If not,
survival tactics of a similar kind _might_ be re-usable in a collector
that does allocation, don't you think?
In other words, your fake "problem" does not exist. If it should spring
up nonetheless, the solution is the same as for a system that does not
allocate during collection. Worst case, we start garbage collection some
time prior to actually hitting the roof of the available space. Yet none
of these very trivial solutions came to mind between starting to write
your article and deciding to post it after it was written. How come?
| Also, the subset of Lisp available will be constrained to things that can
| run safely at any allocation point. E.g. you can't use anything that
| alloctates while holding a lock (I'm assuming the is keeping some data
| structure consistent).
This makes even less sense than your previous, non-existing problem.
What other parts of the system inhibits allocation while being locked?
Please remember that garbage collection happens at a time when the memory
allocation subsystem has the control of the system, so if you had a lock
and you never allocated anything before releasing it, you would never
trigger a garbage collection in the first place.
Do not waste my time by trying to make me think for you. Think and post
in that order. please.
///
The guy hanging on the other side of the socket connection represented
by your file descriptor will most certainly not appreciate this
attitude of yours :-)
I'm not sure what you find obviously false; certainly some algorithms,
when implemented in Lisp, will look more like C or Fortran than
typical Lisp code just by nature of the algorithm itself. I believe
GC fits in that category. Nevertheless, here are some additional
reasons why the GC-in-Lisp -- which I haven't looked at in 10 years --
did not look like typical Lisp code to me:
Instead of using the normal Lisp accessor function like car and aref,
open-coded equivalents of the same were used everywhere to avoid type
checking and other surprises. Perhaps we could have achieved the same
by using (optimize inline) and other declarations. But we didn't.
The semantics of the accessors was further muddied by forwarding
pointers, which were stashed in odd spots. Even simple things
like car require a check for a forwarding pointer, or not, depending
on the context. I could imagine that this requirement would break
pieces of the full language, for example conditions and handler-bind
if the handlers are stored in a list.
I wouldn't say that a GC can't cons in the course of doing a
collection, just that it's desirable not to. It may be awkward to
reclaim the GC's own storage as garbage immediately, especially if in
a two-space collector the storage is simply allocated in to-space.
You can put in arbitrary effort to making the full language available
for use in the GC. On the other hand, in a Common Lisp implementation
there are a lot of things you into which you can put arbitrary effort :)
Tim
> In article <pan.2002.03.01.15....@shimizu-blume.com>,
> Matthias Blume wrote:
>> As I have already said in a reply to another such question, I would
>> probably not have a "close" operation at all -- for pretty much the
>> same reason that we don't need "free" in Lisp or ML.
>
> The guy hanging on the other side of the socket connection represented
> by your file descriptor will most certainly not appreciate this attitude
> of yours :-)
Well, there are two things that "close" does to a socket: it signals the
other guy that communication is over, and it deallocates the data
structures associated with this communication (if I am the last one
holding on to it). What I was proposing was to not do the deallocation.
The runtime check for "is communication over?", of course, cannot be
eliminated because it is inherently dynamic. (Actually, with unique
or linear types, or with monads, there might be a way around even this.)
Anyway, for the last time, I am not saying that every dynamic check can
be eliminated. But many can, and those I would like to...
Matthias
> Followup-to ignored because I read c.l.s although I'm not sure that I
> really want to see the flamewar that is brewing...
>
> Christian Lynbech <lyn...@get2net.dk> writes:
>> >>>>> "David" == David Rush <ku...@bellsouth.net> writes:
>>
>> David> That said, I'm waiting for someone to well and truly resurrect
>> certain David> aspects of the LM ideal - specifically a GC-friendly OS -
>> although I David> suspect that it may be better written in SML.
>>
>> Are you implying that SML is better suited for writing an OS than
>> Lisp?
>
> Yes.
>
>> If so, what is the advantages of SML in that problem domain?
>
> Static typing. Provable correctness. I *hate* slow buggy OSes.
IMHO a new OS would be much more interesting if written in a highly dynamic
language like Lisp than a static language like SML.
Operating Systems are very dynamic environments and you really want it to
adapt to the actual need. How much runtime introspection do you have in SML?
How easy is it to introspect and patch running programs? How easy is it to
add scripting capabilities to programs written in SML? (all rather trivial
and elegant in Lisp)
I don't think that there is a need for yet another OS that would only add
compile-time type-safety to its mostly static and autistic
program-components.
What I would want to see in new OS developments are things like moving the
general OS behaviour from patiens to agens. I want the OS to actively
observate the user(s) and try to actively act in a way that helps him/them
to reach his/their goals. I cannot imagine developing such a thing in a
language that is any less dynamic than lisp.
I sometimes wish that the programming environments in our so called "modern
operating systems" would not distinguish so strongly between user
interaction (shells, scripting) and "real" program development. A language
like Lisp would enable a smooth transition from using it as a shell or for
little scripting tasks up to writing whole applications. Communication
between programs would be trivial and scripting would come for free.
ciao,
Jochen
In other words, the "provable correctness" claim that was stated by another
in an earlier post is false. It is only static type claims that are
enforced, a non-trivial, but fairly gross level of checking and certainly
one that can be circumvented in any "useful" system (read any system that
needs to input untagged binary object representations).
faa
> Matthias Blume wrote:
>> And again, not all runtime checks can be eliminated this way -- in
>> particular not those that are testing for inherently dynamic
>> properties.
>
> In other words, the "provable correctness" claim that was stated by
> another in an earlier post is false.
Better: The claim is not related to the above problem. (Dynamic tests
do not preclude provable correctness -- in fact, correctness will
often require certain dynamic tests to be present (and correct :-).)
> It is only static type claims that
> are enforced, a non-trivial, but fairly gross level of checking and
Well, how far one can push these things remains to be seen. It is
certainly not the technology of today.
> certainly one that can be circumvented in any "useful" system (read any
> system that needs to input untagged binary object representations).
This I would dispute (your definition of "useful", that is). In our
more and more networked world, it is less and less likely that untagged
binary objects can be tolerated as input (and then used, e.g., as
executable code). This is completely independent of whether or
not static typing is involved. For safety and security, it is absolutely
necessary to perform checks on dynamic data. Static typing is simply a
technique to reduce the number of them by providing compile-time proofs
that some of them will always come out true, no matter what.
Matthias
You if you send him a message telling him you're done communicating.
> This I would dispute (your definition of "useful", that is). In our
> more and more networked world, it is less and less likely that untagged
> binary objects can be tolerated as input (and then used, e.g., as
> executable code). This is completely independent of whether or
> not static typing is involved. For safety and security, it is absolutely
> necessary to perform checks on dynamic data. Static typing is simply a
> technique to reduce the number of them by providing compile-time proofs
> that some of them will always come out true, no matter what.
You may dispute it. The fact that there are legacy machines out there that
provide these capabilities and the fact that it is cheaper to use this
representation (in the sense of sending fewer bits) will ensure that it has
some value in certain configurations and will need to be handled in a
dynamic way even in static-typed systems.
My concern is that many people blur the line between levels of safety (note
the earlier posters confusion between "statically type-checked" and
"provably correct") and that many static typing afficianados like to take
advantage of this blurring to make assertions that languages that do not
provide these checks are somehow deficient rather than having simply
selected a different point on the single axis labeled "type-safe", perhaps
to gain ground on other valuation axes.
faa
Here's one way:
Don't provide 'open' and 'close' operations.
Instead, provide (in Haskell syntax, my ML is rusty):
type ReadableFile = ... opaque
type WritableFile = ... opaque
withInputFile :: (ReadableFile -> IO ()) -> FilePath -> IO ()
withOutputFile :: (WritableFile -> IO ()) -> FilePath -> IO ()
readChar :: ReadableFile -> IO (Maybe Char)
... etc.
'withInputFile proc filename' opens the file for reading,
calls 'proc <filehandle>', closes the file, and returns ().
(I'm ignoring error conditions for the moment).
There's no way to get a ReadableFile except by passing
a callback procedure to withInputFile, and since this returns ()
the file handle can't "escape" after it's been closed.
(Well, you could use call/cc, mutable variables, or some
other trick to sneak an open file handle out of its dynamic
extent, but I think the above should be safe in a pure HM
type system).
--Joe English
> Read the first half of Jones & Lins book
> (http://www1.fatbrain.com/asp/bookinfo/bookinfo.asp?theisbn=0471941484) for
> a good survey of the state of the GC art as of about seven years or so ago.
> If you're doing a basic uniprocessor implementation, it should give you the
> info you need. It also has a good survey of bounded space strategies. The
> multiproc and some RT stuff has moved on a bit beyond what's offered in the
> book, but you can catch up by checking out the last few proceedings of the
> International Symposia on Memory Management (even though you'll have to pan
> a lot of Java crap to get to a few Lisp applicable nuggets).
No, it doesn't, because you aren't thinking about the actual
complexity of the problem.
Jones & Lins uses normal space complexity measurements, which don't
apply here, because garbage created by the GC is not itself cleaned;
as a result, such an algorithm uses more space than its actual space
complexity.
> Also, the subset of Lisp available will be constrained to things that can run
> safely at any allocation point. E.g. you can't use anything that alloctates
> while holding a lock (I'm assuming the is keeping some data structure
> consistent).
This kind of restriction is *exactly* what I want to avoid. I'll be
posting a revised version of my question next week.
> Instead of using the normal Lisp accessor function like car and aref,
> open-coded equivalents of the same were used everywhere to avoid type
> checking and other surprises. Perhaps we could have achieved the same
> by using (optimize inline) and other declarations. But we didn't.
So now perhaps it is clear why my original problem is a *problem* and
not a triviality.
See, open coding such functions is exactly what I want to avoid. The
question is "implement GC in lisp", not "implement GC in some
pseudo-Lisp with all the guts marked off-limits".
> I wouldn't say that a GC can't cons in the course of doing a
> collection, just that it's desirable not to. It may be awkward to
> reclaim the GC's own storage as garbage immediately, especially if in
> a two-space collector the storage is simply allocated in to-space.
So this is a failure to look at the actual space of solutions. For
example, if the GC thread has a special arena to allocate from, then
it's perfectly fine to cons.
Well, not in the same way, I guess. I declare whatever I want
to be checked...
I am not sure what to make of this. I'd never define a type like
this, but you could do something like
(deftype even (num-type &optional size)
`(and ,(if size
(list num-type size)
num-type)
(satisfies evenp)))
(defun blark (x)
(declare (type (even unsigned-byte 32) x))
(+ x 42))
and whether that will be slower or faster than if you left out the
declaration depends only on compilation settings. Sure, you'll say
that I won't have any guarantee that x will in fact be an even
number. Well, that's right: There is no guarantee that you
won't have any runtime type errors in CL. That's the point:
I think it doesn't matter :-) I don't have any guarantee that
I won't try to use too large array indices, either, for instance.
> I don't want a guarantee that works 99.9% of the time because that's not
> good enough for this particular application. What I want is something
> that works always, but I want to eliminate the usual runtime penalty for it.
I know what you mean. There are lots of things I'd like to have, too.
But there is no such thing as a free lunch. Your perfect type
inference algorithm comes with a price: It will always be too
restrictive, there is nothing one can do about that. You think
that price is acceptable, I think it isn't.
This reminds me of what some people percieve as contradicting goals
of physicists and mathematicians: The physicists want to get the
job done. If they can't prove some formula they need rigorously
they'll use it anyway. They have no time to do otherwise.
Some of them don't like mathematicians because they think that
mathematicians were always trying to force them to prove everything
rigorously or whatever, but strangely I have never met a
mathematician who actually did that. In the same way, I want
to get the program done. Fast. I can't spend days and days
to find workarounds for some obscure restrictions on types
of elements of modules or something that are only there to ensure
the soundness of your carefully designed type system when I know
that what I am trying to do will work anyway in that case.
And I want to be able to quickly redesign and change arbitrarily large
parts of my software, and your type system will certainly get
in the way again.
You'll disagree, of course. Let's leave it at that. There is
no way you or I can /prove/ that our respective views on
practical programming is correct. Only practice can show -- people
have to try out for themselves :-)
Regards,
--
Nils Goesche
Ask not for whom the <CONTROL-G> tolls.
PGP key ID 0xC66D6E6F
One thing that I think you should bear in mind is what these checks
cost, or rather don't. Imagine the case where the OS has been given a
file handle (or something that claims to be one) and a buffer
(likewise). In a dynamically checked system you have to do a check to
make sure the filehandle is a filehandle of a suitable kind, and a
check to make sure the buffer points at writable space belonging to
whoever made the call. And then you have to fetch data from the disk
and put it into the buffer. Even if the data is already cached you're
doing a memory-to-memory copy of probably hundreds of bytes. It's in
the nature of modern processors that this will *hugely* outweigh the
cost of the checks unless you have done something terribly wrong. On a
PDP11 with fast memory and a slow processor this might have mattered,
on a modern machine where you spend most of your time waiting for
memory it's an irrelevance.
So the efficiency argument is really a red-herring. A couple of more
interesting arguments are that the OS design might be simplified and
that errors might happen at a more convenient time. The former I
don't know about (to make such a system work would require such
radical changes that it's hard to reason about simplicity), the latter
is also fairly bogus because you still need to handle things like the
disk filling up, ECC errors from main memory and so on - no amount of
theorem proving will make these go away or mean that well-written
applications (databases say) need to try and cope.
--tim
> > See the Great PL Shootout page. There are three ML compilers (two SML
> > and one Ocaml) ahead of CMUCL. Two of them are *significantly* ahead.
>
> I am aware of that web page. Frankly I don't understand why CMUCL
> looks so bad there. Whenever /I/ implemented anything in several
> languages, admittedly something bigger than the programs you find
> there, CMUCL left most other compilers far behind, sometimes even
> gcc. And yes, the OCaml compiler is just great w.r.t. performance,
> note that I was talking about SML. Again: I actually prefer SML to
> OCaml for various reasons; but it is not my experience that it is
> particularly fast.
I looked again at that page, and (randomly) chose the sieve of
Eratosthenes. Here, the CMUCL code uses a fixnum array instead of a
char array as the gcc code. On my machine, I get
Evaluation took:
1.07 seconds of real time
0.92 seconds of user run time
0.0 seconds of system run time
0 page faults and
32784 bytes consed.
Choosing a boolean array with CMUCL, I get
Evaluation took:
0.3 seconds of real time
0.3 seconds of user run time
0.0 seconds of system run time
0 page faults and
32784 bytes consed.
NIL
which would make it the third-fastest (after gcc with 0.23 and ghc
with 0.27).
Considering that it is written in the methodology:
http://www.bagley.org/~doug/shootout/bench/sieve/
that one should use small arrays (for fitting in cache size) I find
the comparison quite unfair. I'll mail an improved version to Doug
Bagley.
Yours, Nicolas.
I'm sorry for it, but the above is nonsense. CMUCL allocates full
words also for booleans (as can be seen from the consed bytes).
[Additionally, the original code contained an omission (it does not
reinitialize the array for each test run), which I have augmented with
another error...]
The correct result for CMUCL on my machine is
Evaluation took:
0.58 seconds of real time
0.58 seconds of user run time
0.0 seconds of system run time
0 page faults and
32784 bytes consed.
Compared with gcc (same version as used for the test)
real 0m0.268s
user 0m0.260s
sys 0m0.000s
such that CMUCL is a factor of 2 slower which is quite typical. I
don't understand why Doug Bagley reports 0.94 secs for CMUCL, and I'll
ask him for that.
Nicolas.
P.S. 1: Here is the code I used:
(defun main ()
(let ((flags (make-array 8193 :element-type 'fixnum :initial-element 1)))
(loop repeat 900 of-type fixnum for count of-type fixnum = 0 then 0 do
(loop for i fixnum from 2 upto 8192 do
(unless (zerop (aref flags i))
(loop for k fixnum from (* 2 i) upto 8192 by i do
(setf (aref flags k) 0))
(incf count)))
(dotimes (i 8192) (setf (aref flags i) 1))
finally (format t "Count: ~D~%" count))))
(time (main))
P.S. 2: I tested also strings and bit arrays but results were worse.
> * Matthias Blume wrote:
>> Right, but that happens only after I have already called it -- which is
>> too late. This is precisely the thing that I described earlier: OS
>> syscalls do dynamic runtime checks. But what I want is to get rid of
>> these runtime checks -- safely (i.e., without compromising the
>> integrity of the OS).
>
> One thing that I think you should bear in mind is what these checks
> cost, or rather don't. Imagine the case where the OS has been given a
> file handle (or something that claims to be one) and a buffer
> (likewise). In a dynamically checked system you have to do a check to
> make sure the filehandle is a filehandle of a suitable kind, and a check
> to make sure the buffer points at writable space belonging to whoever
> made the call. And then you have to fetch data from the disk and put it
> into the buffer. Even if the data is already cached you're doing a
> memory-to-memory copy of probably hundreds of bytes. It's in the nature
> of modern processors that this will *hugely* outweigh the cost of the
> checks unless you have done something terribly wrong. On a PDP11 with
> fast memory and a slow processor this might have mattered, on a modern
> machine where you spend most of your time waiting for memory it's an
> irrelevance.
Well, "read" was just one example -- and maybe not the best one, at least
not if you want to retain an Unix-style interface. This is part of the
problem, really. For example, in a strongly typed world I could imagine
the kernel handing back directly a pointer to its internal buffer data
structures (knowing that the type system will prevent user code to mess
with its invariants). This way you get zero-copy I/O -- which for a
while has been pretty much the holy grail of OS implementation --
essentially for free.
Or, another example, consider "fstat": In a world where kernel and user
space can safely share memory (because the type system will prevent "bad"
things from happening), all you need to do is give user
code a pointer to the kernel's data structures. No context switch
necessary, no buffer copying, no checks whether the buffer exists and
is writable, etc. pp.
Matthias
That makes some more assumptions, E.g.
1) The first time you won't have any garbage from the previous collection.
2) A collection cannot allocate more than the previous one.
3) All allocation during collection is garbage by the time the next collection
starts.
Of course it would be possible to make a system that will work most of the time
(or in specific cases), but that has to be made clear from the start.
> Also, what happens in a system that does not allocate during collection?
> Do they crash and burn if they they have to grow their new-space because
> it did not release enough space on the last allocation? No? If not,
> survival tactics of a similar kind _might_ be re-usable in a collector
> that does allocation, don't you think?
Could do, provided they work in the middle of a collection rather than at the
end of one where they are normally used.
> In other words, your fake "problem" does not exist. If it should spring
> up nonetheless, the solution is the same as for a system that does not
> allocate during collection. Worst case, we start garbage collection some
> time prior to actually hitting the roof of the available space. Yet none
> of these very trivial solutions came to mind between starting to write
> your article and deciding to post it after it was written. How come?
Actually, I discarded the "allow some extra space" idea because it doesn't work:
it also assumes 3 above.
>
> | Also, the subset of Lisp available will be constrained to things that can
> | run safely at any allocation point. E.g. you can't use anything that
> | alloctates while holding a lock (I'm assuming the is keeping some data
> | structure consistent).
>
> This makes even less sense than your previous, non-existing problem.
> What other parts of the system inhibits allocation while being locked?
> Please remember that garbage collection happens at a time when the memory
> allocation subsystem has the control of the system, so if you had a lock
> and you never allocated anything before releasing it, you would never
> trigger a garbage collection in the first place.
By "available/use" I meant "available/use in the GC" (since that would result in
recursive entry to the lock if the GC was triggered by allocating while holding
a lock).
But this is already done. Instead of the kernel handling a pointer
back, the user process hands a pointer in and the kernel reads direct
into that.
I really think that going after efficiency is not the thing to do:
conventional OSs have really good implementations by now. You need to
aim for something which is not solved in conventional systems, not
even those that check dynamically (so buffer overflow isn't it). I
have no idea what the win might be (I think there isn't one, but then
I'm not a fan of static type systems, so I would think that).
--tim
The Jalapeno Java JVM:
http://www.research.ibm.com/jalapeno/publication.html#oopsla99_jvm
Squeak Smalltalk:
ftp://st.cs.uiuc.edu/Smalltalk/Squeak/docs/OOPSLA.Squeak.html
David Rush wrote:
> Actually OSes are so inherently stateful that I can't see how using a
> pure functional language would help. Perhaps my imagination is simply
> inadequate.
Simple constraint systems (so simple they are essentially spreadsheets
for program state) are at once functional and all about state. My Cell
system /feels/ functional, but its purpose is to keep program state
internally consistent. This has puzzled me for a while. :)
--
kenny tilton
clinisys, inc
---------------------------------------------------------------
"Be the ball...be the ball...you're not being the ball, Danny."
- Ty, Caddy Shack
I vastly prefer open insults to veiled ones, thank you.
I am sorry I criticized you, as you are obviously not ready for public
debate about your beliefs. Please let me know when you become ready.
So what? It also allows one to clearly delineate the boundaries of
such type-unsafe behavior. This is one of the reasons why I am not a
fan of side-effect-free functional languages. In Scheme, SML (and I
assume Common Lisp) I can isolate side-effecting code so that the rest
of the program is still amenable to algebraic reasoning. SML gives a
type system strong enough to isolate poorly-typed code so that it
doesn't pollute the clean bits.
Strong type systems are a bridge from the messy real world to a much
cleaner mathematical one.
david rush
--
There's man all over for you, blaming on his boots the faults of his feet.
-- Samuel Becket (Waiting For Godot)
Actually *I* agree, but having said that, it largely depends on the
application domain. For "five-9s" systems (OSes, telephone switches,
life-support systems...) I want as much surety of correctness as I can
get, so it's worth the time it takes to work withing the constraints
of static type discipline. For human-oriented systems (failure modes
can be handled by humans), especially when they must be developed
quickly, I'll take a dynamically-typed system any day of the week. In
these cases the delayed detection of failures is a benefit because it
allows me to cope with the fluidity of the marketplace.
Mathias thinks I'm a heretic for this, I'm sure, but that's the way
*my* brain is wired.
david rush
--
Coffee should be black as hell, strong as death and sweet as love.
-- Turkish proverb
> The Jalapeno Java JVM:
> http://www.research.ibm.com/jalapeno/publication.html#oopsla99_jvm
>
> Squeak Smalltalk:
> ftp://st.cs.uiuc.edu/Smalltalk/Squeak/docs/OOPSLA.Squeak.html
Neither of these have a self-hosting GC.
I have read that some time ago, but I keep wondering how this choice
influences expressivity: it seems that you cannot directly represent
any circular structure. Is that true?
If so, would one need to implement a layer on top (maybe using hash-tables)
to represent any form of circular structure?
Would that force the application programmer to implement their own specialized
GC for that structure? Similar to triggers and integrity constraints in a
relational DB?
Bernhard
--
---------------------------------------------------------------------
Bernhard Pfahringer, Dept. of Computer Science, University of Waikato
http://www.cs.waikato.ac.nz/~bernhard +64 7 838 4041
---------------------------------------------------------------------
Bernhard> I have read that some time ago, but I keep wondering how this choice
Bernhard> influences expressivity: it seems that you cannot directly represent
Bernhard> any circular structure. Is that true?
Good question. It is too long since I read Bakers papers on the
subject, but I do recall him having some kind of demonstration that
certain things are still possible, even in a linear lisp.
Even if one cannot have a circular structure "by reference", one can
of course have it "by name" (ie. rather than having a pointer to a
previous element you could use some form of symbolic representation
of the element).
------------------------+-----------------------------------------------------
Christian Lynbech | Ericsson Telebit, Skanderborgvej 232, DK-8260 Viby J
Phone: +45 8938 5244 | email: christia...@ted.ericsson.se
Fax: +45 8938 5101 | web: www.ericsson.com
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
- pet...@hal.com (Michael A. Petonic)
From your posts it seem like you want some primitives that get in the
guts of the OS/Hardware layers of the machine. Am I correct?
Cheers
--
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group tel. +1 - 212 - 998 3488
719 Broadway 12th Floor fax +1 - 212 - 995 4122
New York, NY 10003, USA http://bioinformatics.cat.nyu.edu
"Hello New York! We'll do what we can!"
Bill Murray in `Ghostbusters'.
Or in a depth-first stop-and-copy (i.e. recursive copying), you can also
show that the amount of stack space is smaller or equal to the amount
of space that's yet to be copied, so you can put your stack at the end of
the to space.
See Appel's paper about hash-consing-during-GC where they did just that.
Stefan
The OS disallows "hacking at it with a hex editor".
Unless you're some kind of super-privileged user, of course (just like
you can write all over /proc/kmem if you're root).
Stefan
I didn't quite mean it quite so literally. Imagine I get a blob of
code, how do I know that it doesn't fake things? The only way I can
see to do this is a completely trusted compiler, which can sign its
output, so you're still dynamically checking, you just do it once,
when the program starts (isn't this what MS push with ActiveX?). Or I
guess you can do some kind of proof on the program before running it
(Java?).
Given the negligible cost of checks, I'd kind of rather the OS just
did them though.
--tim
Actually no type system can deal with it because the communication might
be closed because someone tripped over the ethernet cable.
I.e. it really is inherently dynamic.
As for those type systems you mention, the most relevant research
in the area is obviously the "Vault" system which is specifically designed
to make sure that operations are executed in the proper order (open, then
read, then close) in device drivers.
Stefan
They can also statically enforce the presence of the relevant dynamic checks.
Stefan
>>>>>> "Matthias" == Matthias Blume <matt...@shimizu-blume.com> writes:
>> Well, there are two things that "close" does to a socket: it signals
>> the other guy that communication is over, and it deallocates the data
>> structures associated with this communication (if I am the last one
>> holding on to it). What I was proposing was to not do the
>> deallocation. The runtime check for "is communication over?", of
>> course, cannot be eliminated because it is inherently dynamic.
>> (Actually, with unique or linear types, or with monads, there might be
>> a way around even this.)
>
> Actually no type system can deal with it because the communication might
> be closed because someone tripped over the ethernet cable. I.e. it
> really is inherently dynamic.
Yes, I know. I was being too brief to be correct. What I was thinking
of was not to completely eliminate the "end-of-communication?" test --
which for obvious reasons is not possible.
What I did think of was a type system that won't let me do this test again
after it said "yes" once. (Of course, this is not very interesting other
than for checking the sanity of my program's internal logic.)
Matthias
In both cases, the GC is written in a modified version of the
high-level language. If that's not self-hosting, then you better
explain more precisely what you mean by the term.
If you mean that the GC has to be written in the UNmodified language,
this is pretty obvioulsy unrealistic -- you need forms of memory
access that the pure HLL won't give you.
Tim> Or I guess you can do some kind of proof on the program before
Tim> running it (Java?).
Back in the good old days when it was still generally believed that
Java was a secure solution, I saw an article picking that idea
apart.
On top of a bunch of bugs in the various implementations they also
demonstrated (according to my recollection) that it was possible to
circumvent the security mechanisms of Java, not through valid Java
code, but it was possible by hacking directly at the JVM bytecodes.
The fix was to add signing of applets, such that also for Java you
need to trust the SW supplier.
I must admit not to have a firm reference on the paper. I'll do a
little digging if people start accusing me of lying :-)
------------------------+-----------------------------------------------------
Christian Lynbech |
> From your posts it seem like you want some primitives that get in the
> guts of the OS/Hardware layers of the machine. Am I correct?
I thought I was bright-shining clear. What I want is a GC written in
the language itself, with all the normal language facilities
available.
Having special peek/poke primitives is certainly necessary for that
task, but not sufficient.
Consider, for example, that memory management for implementations of
the C language are normally written in C.
Consider that if a Lisp system's GC is written in some other language
(like, say, C) then you now need two compilers to build the language.
If your only use for a C compiler is to compile your GC, then you have
really wasted a vast effort in writing one.
Thomas
I understand your points. What I wanted to point out is that the
`malloc' library you write under Unix is different from the one your
write under Windows. In (Common) Lisp, you have another layer to get
past by: the specific CL implementation, which may or may not give you
the necessary hooks to control the OS interface in a way that does not
interfere with the (Common) Lisp system itself.
So the question is: how do you get past this `impasse'? (I surely
don't claim to know how).
> The fix was to add signing of applets, such that also for Java you
> need to trust the SW supplier.
This is nice to know, and enables me to make my point more succinctly:
(a) you need signing, and (b) do you think the average software
vendor's digital signature is worth the bits its made of? Better
check those system calls...
--tim
It seems quite natural that someone who writes a Common Lisp system would
write its guts in some other language first. After a while, it would be
possible to bootstrap the building process in the system itself, but it
would seem natural to build some lower-level Lisp that would enable a
highly portable substrate to be written, and then cross-compilation would
be a breeze, but it still seems fairly reasonable to start off with a
different compiler or language if you want anybody to repeat the building
process from scratch, not just for GC, but for the initial substrate. I
remember having to compile GNU CC on SPARC with the SunOS-supplied C
compiler and then with the GNU CC thus built, in order to arrive at a
"native build" and that when Sun stopped shipping compilers with their
application-only operating system, someone was nice enough to make
binaries available for the rest of the world.
Why is GC so special in your view?
>> I thought I was bright-shining clear. What I want is a GC written
>> in the language itself, with all the normal language facilities
>> available.
>> Having special peek/poke primitives is certainly necessary for that
>> task, but not sufficient.
>> Consider, for example, that memory management for implementations
>> of the C language are normally written in C.
>> Consider that if a Lisp system's GC is written in some other
>> language (like, say, C) then you now need two compilers to build
>> the language. If your only use for a C compiler is to compile your
>> GC, then you have really wasted a vast effort in writing one.
> I understand your points. What I wanted to point out is that the
> `malloc' library you write under Unix is different from the one your
> write under Windows. In (Common) Lisp, you have another layer to
> get past by: the specific CL implementation, which may or may not
> give you the necessary hooks to control the OS interface in a way
> that does not interfere with the (Common) Lisp system itself.
> So the question is: how do you get past this `impasse'? (I surely
> don't claim to know how).
I don't think the `impasse' is passable.
Consider that the various Unix kernels out there do NOT use "all of
C;" they use subsets that on the one hand likely permit all the
_operators_ and control structures of the base language, but which
_EXCLUDE_ great gobs of "The Standard C Library," notably anything
that forcibly depends on malloc().
One of the Frequently Asked Questions about Linux is "So why don't you
port it to C++? Wouldn't that make it lots better?"
The _real_ answer to that: "Because the developers prefer C."
But another pointed reason not to is that C++ subsumes into the base
language a bunch of stuff that, in C, is part of LIBC, and, which, in
many cases, depends on having malloc()/free() (or equivalents thereof)
around to do their work, what with constructors and destructors and
the like.
In order to build an OS kernel in C++, you have to very carefully pick
a subset that doesn't require any underlying "runtime support." By
the time you gut C++ that way, what you've got is basically C with
classes, and there's little point to calling it a "C++-based OS."
With Lisp, it's much the same story; you will at the "base" have to
have some basic set of functions and operations that DO NOT REQUIRE
RUNTIME SUPPORT, because the point of the exercise is to _implement_
that runtime support.
This actually suggests there being merit to the hoary question of
"What's a good `base CL?'" where you bootstrap with some minimal set
of operators, functions, and macros, and then implement the rest of
the system on top of that.
A necessary "base" would include some basic set of operators/functions
necessary for writing the garbage collector which do not themselves
make any use of dynamic memory allocation. [Might this mean that the
'base' would exclusively use stack-based memory allocation? I'd tend
to think so...]
The notion that the system could bootstrap itself without that limited
'base' seems very wishful.
I'll bet an interesting OS to look at would be SPIN, which was
implemented in Modula-3. M3 offers the same "chewy garbage collection
goodness" of Lisp; presumably the SPIN kernel has to have certain
sections that implement the "memory management runtime support" in
such a way that they require no such runtime support.
Forth would be another candidate; one of the longstanding traditions
there is the notion of implementing "target compilers" which start
with a basic set of CODE words (e.g. - assembly language) and then use
that as a bootstrap on top of which to implement the rest of the
language.
That actually points to a somewhat reasonable approach:
- Write a function that issues assembly language instructions
into a function;
- Write some functions that issue groups of assembly language
instructions ("macros" in the assembler sense);
- Implement a set of memory management functions using that
"bootstrap";
- Then you've got the basis for implementing everything else on
top of that.
The notion of doing that without something like assembly language
macros underneath is just wishful thinking...
--
(reverse (concatenate 'string "moc.adanac@" "enworbbc"))
http://www3.sympatico.ca/cbbrowne/macros.html
Rules of the Evil Overlord #123. "If I decide to hold a contest of
skill open to the general public, contestants will be required to
remove their hooded cloaks and shave their beards before entering."
<http://www.eviloverlord.com/>
No-one was talking about Java.
Matthias
> I understand your points. What I wanted to point out is that the
> `malloc' library you write under Unix is different from the one your
> write under Windows. In (Common) Lisp, you have another layer to get
> past by: the specific CL implementation, which may or may not give you
> the necessary hooks to control the OS interface in a way that does not
> interfere with the (Common) Lisp system itself.
I'm not talking about writing it for an existing CL system, I'm
talking about writing it from the standpoint of a systems designer.
> Why is GC so special in your view?
One might well need bootstrap in designing and initially building the
system. But now, one needs *only* GCC to build GCC, and not anything
else. Once one has a running system with GCC, you don't any longer
need the pcc compilers that GCC was originally built with.
> That actually points to a somewhat reasonable approach:
> - Write a function that issues assembly language instructions
> into a function;
> - Write some functions that issue groups of assembly language
> instructions ("macros" in the assembler sense);
> - Implement a set of memory management functions using that
> "bootstrap";
> - Then you've got the basis for implementing everything else on
> top of that.
>
> The notion of doing that without something like assembly language
> macros underneath is just wishful thinking...
I'm working on a CL system that is pretty much based on (x86) assembly
macros like you are describing. It looks something like this:
(defun (setf car) (value cell)
(check-type cell cons)
(with-inline-assembly (:returns :eax)
(:load-lexical cell :ebx)
(:load-lexical value :eax)
(:movl :eax (:ebx -1))))
--
Frode Vatvedt Fjeld
I actually tried to argue that the same would true of a Common Lisp
system, but that portability constraints dictate that those who want to
port a Common Lisp compiler to System X on the Y processor should be able
to use the portable assembler (C) instead of having to start off writing
non-portable assembler and use the system's assembler to bootstrap from.
Needing *only* GCC, as you say, is predicated on the existence of a
binary for your system to begin with. How do people port GCC to a new
platform om which they intend to build the GNU system? My take on this
is that it is no less dependent on some other existing C compiler than
the similar problem for CL compilers is. Duane, please help. :)
To be somewhat pedantic, it isn't necessary to eschew *all* dynamic
allocation in a GC. You just have to collect more than you cons.
IIRC, they first write a /cross/ compiler for the new system that
runs on an old system. Then they use the cross compiler to compile
gcc itself and voila... done. Hey, sounds easy, doesn't it? :-))
Regards,
--
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."
PGP key ID 0x42B32FC9
I assume they add support for the new target to gcc, compile gcc on an
existing system targeted at the new system and then run this new
compiler on the new system.
--tim
But then why the restriction that you "must" have the "full" langauge
available? Sure, Squeak uses a subset "slang" which maps fairly directly
to C and is intended to generate C which is compiled by a separate C
compiler, but it *runs* inside Squeak. You can run/debug a slang based VM
in Squeak (well, it can be done, at least :)). It's *way* slower, but
presumably that's a "mere" implementational issue (the Squeak community
doesn't have the resources to be able to afford *not* to delegate this bit
to C compilers).
There are Smalltalks (and lisps) that let you inline C or asm code...would
that be ok?
Cheers,
Bijan Parsia.
Correct, though it is often complicated by object file formats. One approach is
to generate textual assembly language on the host machine, which is then
assembled and linked on the target machine (using existing tools). Another
approach is to retarget the equivalent GNU tools and generate the binaries
directly on the host machine.
--
Martin Simmons, Xanalys Software Tools
zne...@xanalys.com
rot13 to reply
NAK! This implies that nobody can modify the compiler. If you
have a compiler that signs its output, then somebody can open up
the source code and find the signing key. Then the signing key
can be used to sign arbitrary output. That means you cannot
release the source code for your compiler.
Or maybe read priveleges to it are root-only and root can set the
signing key for a particular installation -- but then you have a
problem that nobody can compile on one system and run on another.
Far far better to have potentially-dangerous processes running in
their own memory arenas where the OS can keep an eye on them in
case they try messing anything up.
Bear
It sounds like _vastly_ more work than building on the native system with
a native assembler and linker to build the first executables until you
could replace those, too.
Back in the old days, I wrote 8080 and Z80 code on the PDP-10 and its
cross-assembler for "microcomputers", because it was so fantastically
more convenient to work on a real computer and deploy on a toy than work
on the toy computer -- mostly all I did on the toy computer was to write
an excellent terminal emulation program, in assembler. However, the only
reason this was more convenient was that it was a royal pain in the butt
to try to use the toy computer for any development. However, I had to
copy the ROMs in that machine to the PDP-10 and basically regenerate its
symbol table in order to make things work correctly. Luckily, it had an
emulator, and curiously, the PDP-10 emulated the code about 100 times
faster than my toy computer executed it. Were it not for the 100,000
times difference in the cost of acquisition and ownership of the two
computers, I would certainly have replaced my Exidy Sorcerer with a
PDP-10. Come to think of, my current home computer is strong enough to
emulate a PDP-10 about 100 times faster than the real thing, too...
> Tim Bradshaw wrote:
>>
>> * Stefan Monnier wrote:
>> >>>>>> "Tim" == Tim Bradshaw <t...@cley.com> writes:
>> >> instance: OS gives me a file descriptor, I then hack at it with a
>> >> hex
>>
>> > The OS disallows "hacking at it with a hex editor". Unless you're
>> > some kind of super-privileged user, of course (just like you can
>> > write all over /proc/kmem if you're root).
>>
>> I didn't quite mean it quite so literally. Imagine I get a blob of
>> code, how do I know that it doesn't fake things? The only way I can
>> see to do this is a completely trusted compiler, which can sign its
>> output, so you're still dynamically checking, you just do it once, when
>> the program starts (isn't this what MS push with ActiveX?). Or I guess
>> you can do some kind of proof on the program before running it (Java?).
>>
>> Given the negligible cost of checks, I'd kind of rather the OS just did
>> them though.
>>
>>
> NAK! This implies that nobody can modify the compiler. [ ... ]
Yes. But there are far better methods than just signing the output of
the compiler. In particular, read up on proof-carrying code: It does
not require a certifying compiler (you can even write the code by hand
as long as you also write the corresponding proof). Code (and
proof!) can come from anywhere. Finally, the trusted computing base can be
far smaller than a typical compiler.
Matthias
I went back and forth with Doug on this. There are several issues you
need to think about:
-- If you use bit vectors, you pay for some shifting of bits etc.
CMUCL actually generates somewhat suboptimal code for the X86 platform
(and extra mask and returning a value that's not used)
-- If you use eight bit BYTE's and don't coerce the compiler to use machine
integers to address the array. You again pay for shifting for fixnum
untagging.
-- If you use fixnums or machine integers (32 bit) fixnums can address
them w/o untagging, and you get fast results BUT this is cheating (won't
scale and it will spill over to L2 cache even with a small array. Doug's
machine has 1/2 speed L2 cache (P-II) so your results will vary if you
have a full speed L2 (eg celeron will beat regular P-II).
this is about all I remember.
There was also the additional issue of loop macro and declarations if
remember correctly.
Doug probably has a changelog of all this somewhere. But disassemble and the
compiler trace facility of CMUCL should be helpful also.
cheers,
BM
No need to sign output. Simply disallow any binaries that were not
created by that machine's compiler. In order to run source code it
must be passed through, and checked, by the compiler on that machine.
>
> Or maybe read priveleges to it are root-only and root can set the
> signing key for a particular installation -- but then you have a
> problem that nobody can compile on one system and run on another.
>
Oh well. FreeBSD gets by (though it's not required to compile, they
still do a lot).
> Far far better to have potentially-dangerous processes running in
> their own memory arenas where the OS can keep an eye on them in
> case they try messing anything up.
Context-switches are expensive, remember. An OS/compiler that removed
as many layers as possible between program and underlying hardware would
be much faster; if the compiler has a chance to examine every piece
of code that goes in the system then it may be able to do this.
--
; Matthew Danish <mda...@andrew.cmu.edu>
; OpenPGP public key: C24B6010 on keyring.debian.org
; Signed or encrypted mail welcome.
; "There is no dark side of the moon really; matter of fact, it's all dark."
This is probably doable, but in my experience with cross-compilation, you
do not just generate code, you effectively generate a module that works
with a much larger system. To make this _really_ work, you have to have
intimate knowledge of the target system. Since the compiler is often the
first thing you build on a new system in order to build the other tools
you want to use there, my thinking is that you save a lot of time using a
pre-existing compiler and like tool, particularly to ensure that you get
the linking information right for that particular environment, what with
all the shared library dependencies and whatnot.
> I actually tried to argue that the same would true of a Common Lisp
> system, but that portability constraints dictate that those who want to
> port a Common Lisp compiler to System X on the Y processor should be able
> to use the portable assembler (C) instead of having to start off writing
> non-portable assembler and use the system's assembler to bootstrap from.
You'll always need an assembler, of course; there isn't any way around
that. And there are advantages for systems like the old KCL and its
descendents which use GCC as the back end for the compiler.
But I'm thinking about a different problem space, not the one you are.
> Needing *only* GCC, as you say, is predicated on the existence of a
> binary for your system to begin with. How do people port GCC to a new
> platform om which they intend to build the GNU system? My take on this
> is that it is no less dependent on some other existing C compiler than
> the similar problem for CL compilers is. Duane, please help. :)
People port GCC to new platforms by having GCC cross-compile code.
No reliance on other compilers is necessary.
MIT Scheme gets ported the same way.
> NAK! This implies that nobody can modify the compiler. If you
> have a compiler that signs its output, then somebody can open up
> the source code and find the signing key. Then the signing key
> can be used to sign arbitrary output. That means you cannot
> release the source code for your compiler.
No, a trusted compiler is simply the only object that has the ability
to create compiled-procedure objects. No problem at all!
Well, the problem is still that only the one compiler is the trusted
one. Two solutions for that problem are to use a subsetted bytecode
thing, like the Java VM, and to use proof-carrying code to validate
compiler output.
> This is probably doable, but in my experience with cross-compilation, you
> do not just generate code, you effectively generate a module that works
> with a much larger system. To make this _really_ work, you have to have
> intimate knowledge of the target system. Since the compiler is often the
> first thing you build on a new system in order to build the other tools
> you want to use there, my thinking is that you save a lot of time using a
> pre-existing compiler and like tool, particularly to ensure that you get
> the linking information right for that particular environment, what with
> all the shared library dependencies and whatnot.
No, Tim was totally right. You don't use the pre-existing compiler in
general; often times the manufacturer isn't providing one.
Often you are the first person writing one: this is now rather often
the case with GCC.
> But then why the restriction that you "must" have the "full" langauge
> available?
Because I'm looking for solutions to the hard problem, not ways of
solving a different problem.
> * Nils Goesche
> | IIRC, they first write a /cross/ compiler for the new system that
> | runs on an old system. Then they use the cross compiler to compile
> | gcc itself and voila... done. Hey, sounds easy, doesn't it? :-))
>
> It sounds like _vastly_ more work than building on the native system with
> a native assembler and linker to build the first executables until you
> could replace those, too.
You really have no clue how GCC works if you think it's more trouble.
Really, GCC is totally equipped to do cross-compilation (as are all
the other parts of the toolchain).
> * tb+u...@becket.net (Thomas Bushnell, BSG)
> | One might well need bootstrap in designing and initially building the
> | system. But now, one needs *only* GCC to build GCC, and not anything
> | else. Once one has a running system with GCC, you don't any longer
> | need the pcc compilers that GCC was originally built with.
>
> I actually tried to argue that the same would true of a Common Lisp
> system, but that portability constraints dictate that those who want to
> port a Common Lisp compiler to System X on the Y processor should be able
> to use the portable assembler (C) instead of having to start off writing
> non-portable assembler and use the system's assembler to bootstrap from.
>
> Needing *only* GCC, as you say, is predicated on the existence of a
> binary for your system to begin with. How do people port GCC to a new
> platform om which they intend to build the GNU system? My take on this
> is that it is no less dependent on some other existing C compiler than
> the similar problem for CL compilers is. Duane, please help. :)
I don't view self-hosting as precluding any bootstrapping which is
necessary to get to that self-hosting state. In fact, I would be
surprised to hear of _any_ kind of self-hosting which doesn't require
a non-self-hosted bootstrap. This applies to both cross-compiling
from another architecture and re-compiling on the same architecture
starting with a different compiler.
Thomas Bushnell's challenge is a good one. And this thread has been
a good one, as well. Several times I considered answering some of the
statements made on this thread, but have refrained because there are
so many issues and stochastic requirements. So I thought I'd put
together several ideas and present them at once, from the point of
view of an Allegro CL developer.
As an initial summary, I submit that the entire lisp _could_ be
written entirely in lisp, but that it is not convenient to do so,
given the fact that we run our lisp on Unix and MS systems, which
are all C based, and even embedded systems tend to have libc
equivalent interfaces. However, I do disagree that it is necessary
to require that the whole language be available for a GC written in
lisp, and will explain that later as well.
First, a background review of Allegro CL's structure, for those
who don't yet know:
1. Most of Allegro CL is written in Allegro CL, and compiles per
architecture to bits (represented in code vector lisp objects)
using the Allegro CL compile and compile-file functions.
2. A subsection of the kernel or "runtime" of Allegro CL is an
extension of CL I call runtime or "rs" code, which also use the
Allegro CL compiler, extended and hooked to produce assembler
source as output.
3. Some small part of Allegro CL is written in C. On some
architectures, the C++ compiler is used, but it is mostly written
in C style. The major purpose of the C code is to parse the .h
header files of the system for the os interface. We try mostly
to limit our C code to os-interface functionality and regularization.
In addition, as a kind of #3a: We also have written our garbage-collector
and our fasl-file reader in C.
The binaries from 2, 3, and 3a are all linked together using the system
linker to either produce a single executable, or to produce a simple
executable main and a shared-library. In both cases, that link output
serves dual purpose as a bootstrap mechanism to load pure lisp code in
(i.e. from #1) or to re-estalish the environment dumped in a previous
lisp session.
The rs code in #2 is sort of a combination of superset/subset of regular
CL code; it understands both C and Lisp calling conventions, but does not
set up a "current function" for its own operation. Since the produced
code is just assembler source, and does not set up a function object,
local constants are not allowed; only constants that are in the lisp's
global table can be referenced by rs code. Recently, I added an
exception to this; string constants can now be represented in rs
code - these will become .asciz or equivalent directives in the
assembler source. This allows such rs functions as
(def-runtime-q print-answer (n)
(q-c-call printf "The answer is %d
" n))
I have also recently extended the rs code to allow for large
stack-allocated specialized arrays; we've always been able to
allocate stack-based simple-vectors in rs code, but due to the
rs code stack frame descriptors we provided for gc purposes, non-lisp
data had been restricted to a few hundred bytes until now.
Theoretically, due to these and other changes, we should now be
able to rewrite both the fasl reader and the garbage-collector
in rs code, but it hasn't been a high priority. For the garbage
collector especially, there must be an incentive to make such a
potentially regressive move; it may be that a new gc to handle
concurrent MP might be just that incentive.
For #3, I was almost ready to disagree with Thomas Bushnell
because I believed that it is necessary to use C functionality
to interface to C library functions. This is especially true
for the need to parse .h files, and to get the correct
definitions and interfaces based on particular #define constants.
If you doubt this, just try to figure out, for example, HP's
sigcontext structure, which has layer upon layer of C macrology
to define a large number of incompatible structure and interface
definitions.
However, I had to back off on any such disagreement, becuase it
certainly is _possible_ to write any of these interface functions
in lisp, using such facilities as our Cbind tool to pre-parse the
header files and to thus present all pertinent information to
the lisp-in-lisp code. However, I still am not inclined to do such
a thing, because it would be specialized toward lisp bootstrap, and
thus not useful for anything else. And why not use C at what it does
best (parse C header files)? Besides, even our Cbind facility uses
the gcc front-end to do the initial parsing, so in essence a non-lisp
compiler part would still be used. Bottom line; it is more convenient
to write our os-interface code in C, because it interfaces to C
libraries. I suppose that we would remove such C interfaces if we
were porting our lisp to a Lisp operatring system.
Finally, I'd like to disagree wholeheartedly with the notion that
the full language must be available for the whole lisp implementation.
Specifically, I am responding to this point by Thomas Bushnell:
>> I thought I was bright-shining clear. What I want is a GC written in
>> the language itself, with all the normal language facilities
>> available.
It is the notion of "availability" that I take issue with. To
make my point, consider the statement that in every English
sentence, all letters of the alphabet are available. That is,
of course, a true statement. And as in the specific example where
"The quick brown fox jumped over the lazy dogs." it is obviously
possible to construct a sentence which _indeed_ does use every
letter in the alphabet. However, does this require that every
sentence be constructed in such a way? Of course not! It is thus
not the whole alphabet which is available to a particular sentence,
but only those letters which in fact work toward constructing the
sentence, which are in fact "available". Thus, for normal
conversation, the letter "q" is not generally available to me to
use unless I am using a word which has a "q" in it (or unless I'm
specifically talking about the letter "q" itself).
Let's extend this notion to an extensible language like Lisp.
Consider the start of a CL function foo:
(defun foo ()
...)
Now, the body of foo can refer to any CL functionality, including
foo itself. However, it would generally be bad programming (i.e.
a bug) to allow a call to foo within foo which results in an
infinite recursion. Thus, to some extent, foo is not fully
available to use as one wishes within foo.
Similar truths apply to a garbage-collector. It might be
perfectly acceptable for a gc function to call cons, but it
had better be prepared to deal with the case where there is
no more room for a new cons cell, which would thus cause a
recursive call to the garbage-collector (presumably an infinite
recursion, since the reason for the initial gc call might have
been for lack of space).
And, as The Oracle in Matrix says, "What's really going to
bake your noodle ..." is that at least in CL, there is no
definition of what a garbage-collector actually _is_. There are
a few references, but no definitions or specs...
--
Duane Rettig Franz Inc. http://www.franz.com/ (www)
1995 University Ave Suite 275 Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253 du...@Franz.COM (internet)
> As an initial summary, I submit that the entire lisp _could_ be
> written entirely in lisp, but that it is not convenient to do so,
> given the fact that we run our lisp on Unix and MS systems, which
> are all C based, and even embedded systems tend to have libc
> equivalent interfaces.
So it should be pointed out that one of the reasons I'm interested in
this question is that I'm interested in lisp systems running on bare
metal.
> However, I do disagree that it is necessary
> to require that the whole language be available for a GC written in
> lisp, and will explain that later as well.
I agree that it may not be *necessary* depending on what that means.
But note that I began by asking about both Scheme and CL; the point is
that of course I could confine myself to a tiny subset of CL and do
things in PL/I (er, I mean "the loop macro").
However, the real things I want are fairly simple: I want complex
closures and I want cons. I might want call/cc, at least, I'm not
willing to exclude that a priori.
> For #3, I was almost ready to disagree with Thomas Bushnell
> because I believed that it is necessary to use C functionality
> to interface to C library functions.
If you really need to, you can do that, and it may well be the most
efficient implementation strategy if you want to run on Unix. (As, of
course, you do.)
A "pure" implementation means that you would do the same work the C
library people do, and make Lisp equivalents for the C header files
yourself. Remember, *I'm* always thinking of this from a systems
design perspective, so "tell the other group to do the work" isn't
really a solution. :) But if the other group is doing the work anyway
(as is the case for people running a Lisp environment on Unix, then of
course, it's convenient to piggyback on them.
> For #3, I was almost ready to disagree with Thomas Bushnell
> because I believed that it is necessary to use C functionality
> to interface to C library functions.
The actual interfaces you need are the *kernel* interfaces, not
interfaces to the C library. From the systems design perspective,
your system would be *replacing* the C library, not borrowing it. If
you do want to borrow it, then it might be most convenient to use C to
hook into it, though as you correctly note, even then you can get
around it.
> Similar truths apply to a garbage-collector. It might be
> perfectly acceptable for a gc function to call cons, but it
> had better be prepared to deal with the case where there is
> no more room for a new cons cell, which would thus cause a
> recursive call to the garbage-collector (presumably an infinite
> recursion, since the reason for the initial gc call might have
> been for lack of space).
So the *point* of my question is, in part, just this problem. Now, if
there *isn't* a solution, then you have to subset the language, omit
cons, and then code your GC.
But why do that if there is a convenient solution?
One strategy: suppose to GC an arena of N bytes takes N/10 bytes of
memory to hold dynamically allocated GC data structures.
One strategy is to just save that space always, so it's there. Or, if
one is using stop-and-copy, then it's even easier to find space.
If one is in a multi-threaded world, and each thread gets its own
allocation arena for normal allocation, and you are using a
stop-the-world approach to GC, then you can't reliably assume
(perhaps) that all the threads have left their arena in an ideal
state. That means that the GC will probably have to allocate out of a
totally separate arena from what other programs use. When it's done,
a quick GC pass (allocating from the main heap) can be run to clean
the special GC arena, and copy anything remaining there onto the main
heap.