* Thomas Bushnell, BSG | Consider that if a Lisp system's GC is written in some other language | (like, say, C) then you now need two compilers to build the language. | If your only use for a C compiler is to compile your GC, then you have | really wasted a vast effort in writing one.
It seems quite natural that someone who writes a Common Lisp system would write its guts in some other language first. After a while, it would be possible to bootstrap the building process in the system itself, and it would seem natural to build some lower-level Lisp in which a highly portable substrate could be written, after which cross-compilation would be a breeze. It still seems fairly reasonable to start off with a different compiler or language if you want anybody to be able to repeat the building process from scratch, not just for the GC, but for the initial substrate. I remember having to compile GNU CC on SPARC with the SunOS-supplied C compiler and then again with the GNU CC thus built, in order to arrive at a "native build", and that when Sun stopped shipping compilers with their application-only operating system, someone was nice enough to make binaries available for the rest of the world.
Why is GC so special in your view?
/// -- In a fight against something, the fight has value, victory has none. In a fight for something, the fight is a loss, victory merely relief.
>> > From your posts it seem like you want some primitives that get in the
>> > guts of the OS/Hardware layers of the machine. Am I correct?
>> I thought I was bright-shining clear. What I want is a GC written
>> in the language itself, with all the normal language facilities
>> available.
>> Having special peek/poke primitives is certainly necessary for that
>> task, but not sufficient.
>> Consider, for example, that memory management for implementations
>> of the C language are normally written in C.
>> Consider that if a Lisp system's GC is written in some other
>> language (like, say, C) then you now need two compilers to build
>> the language. If your only use for a C compiler is to compile your
>> GC, then you have really wasted a vast effort in writing one.
> I understand your points. What I wanted to point out is that the
> `malloc' library you write under Unix is different from the one your
> write under Windows. In (Common) Lisp, you have another layer to
> get past by: the specific CL implementation, which may or may not
> give you the necessary hooks to control the OS interface in a way
> that does not interfere with the (Common) Lisp system itself.
> So the question is: how do you get past this `impasse'? (I surely
> don't claim to know how).
I don't think the `impasse' is passable.
Consider that the various Unix kernels out there do NOT use "all of C;" they use subsets that on the one hand likely permit all the _operators_ and control structures of the base language, but which _EXCLUDE_ great gobs of "The Standard C Library," notably anything that forcibly depends on malloc().
One of the Frequently Asked Questions about Linux is "So why don't you port it to C++? Wouldn't that make it lots better?"
The _real_ answer to that: "Because the developers prefer C."
But another pointed reason not to is that C++ subsumes into the base language a bunch of stuff that, in C, is part of LIBC and that, in many cases, depends on having malloc()/free() (or equivalents thereof) around to do its work, what with constructors and destructors and the like.
In order to build an OS kernel in C++, you have to very carefully pick a subset that doesn't require any underlying "runtime support." By the time you gut C++ that way, what you've got is basically C with classes, and there's little point to calling it a "C++-based OS."
With Lisp, it's much the same story; you will at the "base" have to have some basic set of functions and operations that DO NOT REQUIRE RUNTIME SUPPORT, because the point of the exercise is to _implement_ that runtime support.
This actually suggests there being merit to the hoary question of "What's a good `base CL?'" where you bootstrap with some minimal set of operators, functions, and macros, and then implement the rest of the system on top of that.
A necessary "base" would include some basic set of operators/functions necessary for writing the garbage collector which do not themselves make any use of dynamic memory allocation. [Might this mean that the 'base' would exclusively use stack-based memory allocation? I'd tend to think so...]
The notion that the system could bootstrap itself without that limited 'base' seems very wishful.
I'll bet an interesting OS to look at would be SPIN, which was implemented in Modula-3. M3 offers the same "chewy garbage collection goodness" of Lisp; presumably the SPIN kernel has to have certain sections that implement the "memory management runtime support" in such a way that they require no such runtime support.
Forth would be another candidate; one of the longstanding traditions there is the notion of implementing "target compilers" which start with a basic set of CODE words (e.g. - assembly language) and then use that as a bootstrap on top of which to implement the rest of the language.
That actually points to a somewhat reasonable approach:
 - Write a function that issues assembly language instructions into a function;
 - Write some functions that issue groups of assembly language instructions ("macros" in the assembler sense);
 - Implement a set of memory management functions using that "bootstrap";
 - Then you've got the basis for implementing everything else on top of that.
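[A minimal sketch of the steps above, in Python for brevity. Everything here is hypothetical and not taken from any real system: the "assembly language" is a list of tuples for a made-up three-register machine, `emit` is the function that issues one instruction, the `m_...` helpers play the role of assembler macros, and the memory manager is nothing more than a bump allocator assembled from those macros.]

```python
# Toy model: the "machine" has registers r0..r2 and a flat memory of cells.
# Every name below is an illustrative assumption.
MEM_SIZE = 64
HEAP_PTR = 0      # memory cell 0 holds the address of the next free cell
HEAP_START = 1    # first cell handed out by the allocator


def emit(code, op, *args):
    """Issue one instruction into the function body being assembled."""
    code.append((op,) + args)


# "Macros" in the assembler sense: functions that issue groups of instructions.
def m_load_heap_ptr(code, reg):
    emit(code, "load", reg, HEAP_PTR)       # reg <- mem[HEAP_PTR]


def m_bump_heap_ptr(code, reg, size_reg):
    emit(code, "add", reg, size_reg)        # reg <- reg + size_reg
    emit(code, "store", reg, HEAP_PTR)      # mem[HEAP_PTR] <- reg


def assemble_alloc():
    """Memory management built only from the macros: a bump allocator.
    Input: r1 = size in cells. Output: r0 = address of the new block."""
    code = []
    m_load_heap_ptr(code, "r0")             # r0 = old free pointer (the result)
    emit(code, "mov", "r2", "r0")           # r2 = copy that gets advanced
    m_bump_heap_ptr(code, "r2", "r1")
    emit(code, "ret")
    return code


def run(code, regs, mem):
    """Interpreter standing in for the real CPU."""
    for ins in code:
        op = ins[0]
        if op == "load":
            regs[ins[1]] = mem[ins[2]]
        elif op == "store":
            mem[ins[2]] = regs[ins[1]]
        elif op == "mov":
            regs[ins[1]] = regs[ins[2]]
        elif op == "add":
            regs[ins[1]] += regs[ins[2]]
        elif op == "ret":
            break
    return regs["r0"]


def make_memory():
    mem = [0] * MEM_SIZE
    mem[HEAP_PTR] = HEAP_START
    return mem


ALLOC = assemble_alloc()


def alloc(mem, size):
    """Everything else in the system would call this entry point."""
    return run(ALLOC, {"r0": 0, "r1": size, "r2": 0}, mem)
```

[The allocator does no bounds checking and never frees anything; the point is only that it is assembled from instruction-issuing functions and needs no runtime support of its own.]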
The notion of doing that without something like assembly language macros underneath is just wishful thinking...
--
(reverse (concatenate 'string "moc.adanac@" "enworbbc"))
http://www3.sympatico.ca/cbbrowne/macros.html
Rules of the Evil Overlord #123. "If I decide to hold a contest of skill open to the general public, contestants will be required to remove their hooded cloaks and shave their beards before entering." <http://www.eviloverlord.com/>
On Tue, 05 Mar 2002 16:14:56 -0500, Tim Bradshaw wrote: > * Christian Lynbech wrote:
>> The fix was to add signing of applets, such that also for Java you need >> to trust the SW supplier.
> This is nice to know, and enables me to make my point more succinctly: > (a) you need signing, and (b) do you think the average software vendor's > digital signature is worth the bits its made of? Better check those > system calls...
Marco Antoniotti <marc...@cs.nyu.edu> writes: > I understand your points. What I wanted to point out is that the > `malloc' library you write under Unix is different from the one your > write under Windows. In (Common) Lisp, you have another layer to get > past by: the specific CL implementation, which may or may not give you > the necessary hooks to control the OS interface in a way that does not > interfere with the (Common) Lisp system itself.
I'm not talking about writing it for an existing CL system, I'm talking about writing it from the standpoint of a systems designer.
Erik Naggum <e...@naggum.net> writes: > Why is GC so special in your view?
One might well need bootstrap in designing and initially building the system. But now, one needs *only* GCC to build GCC, and not anything else. Once one has a running system with GCC, you don't any longer need the pcc compilers that GCC was originally built with.
Christopher Browne <cbbro...@acm.org> writes: > That actually points to a somewhat reasonable approach: > - Write a function that issues assembly language instructions > into a function; > - Write some functions that issue groups of assembly language > instructions ("macros" in the assembler sense); > - Implement a set of memory management functions using that > "bootstrap"; > - Then you've got the basis for implementing everything else on > top of that.
> The notion of doing that without something like assembly language > macros underneath is just wishful thinking...
I'm working on a CL system that is pretty much based on (x86) assembly macros like you are describing. It looks something like this:
* tb+use...@becket.net (Thomas Bushnell, BSG) | One might well need bootstrap in designing and initially building the | system. But now, one needs *only* GCC to build GCC, and not anything | else. Once one has a running system with GCC, you don't any longer | need the pcc compilers that GCC was originally built with.
I actually tried to argue that the same would be true of a Common Lisp system, but that portability constraints dictate that those who want to port a Common Lisp compiler to System X on the Y processor should be able to use the portable assembler (C) instead of having to start off writing non-portable assembler and use the system's assembler to bootstrap from.
Needing *only* GCC, as you say, is predicated on the existence of a binary for your system to begin with. How do people port GCC to a new platform on which they intend to build the GNU system? My take on this is that it is no less dependent on some other existing C compiler than the similar problem for CL compilers is. Duane, please help. :)
> A necessary "base" would include some basic set of operators/functions > necessary for writing the garbage collector which do not themselves > make any use of dynamic memory allocation. [Might this mean that the > 'base' would exclusively use stack-based memory allocation? I'd tend > to think so...]
To be somewhat pedantic, it isn't necessary to eschew *all* dynamic allocation in a GC. You just have to collect more than you cons.
In article <3224397443760...@naggum.net>, Erik Naggum wrote: > * tb+use...@becket.net (Thomas Bushnell, BSG) >| One might well need bootstrap in designing and initially building the >| system. But now, one needs *only* GCC to build GCC, and not anything >| else. Once one has a running system with GCC, you don't any longer >| need the pcc compilers that GCC was originally built with.
> I actually tried to argue that the same would true of a Common Lisp > system, but that portability constraints dictate that those who want to > port a Common Lisp compiler to System X on the Y processor should be able > to use the portable assembler (C) instead of having to start off writing > non-portable assembler and use the system's assembler to bootstrap from.
> Needing *only* GCC, as you say, is predicated on the existence of a > binary for your system to begin with. How do people port GCC to a new > platform om which they intend to build the GNU system? My take on this > is that it is no less dependent on some other existing C compiler than > the similar problem for CL compilers is. Duane, please help. :)
IIRC, they first write a /cross/ compiler for the new system that runs on an old system. Then they use the cross compiler to compile gcc itself and voila... done. Hey, sounds easy, doesn't it? :-))
Regards, -- Nils Goesche "Don't ask for whom the <CTRL-G> tolls."
* Erik Naggum wrote: > Needing *only* GCC, as you say, is predicated on the existence of a > binary for your system to begin with. How do people port GCC to a new > platform om which they intend to build the GNU system? My take on this > is that it is no less dependent on some other existing C compiler than > the similar problem for CL compilers is. Duane, please help. :)
I assume they add support for the new target to gcc, compile gcc on an existing system targeted at the new system and then run this new compiler on the new system.
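[The sequence being described can be modelled as a small bookkeeping exercise. The following is a sketch only: `Compiler`, `build` and `port` are hypothetical names, and the model tracks just two facts about each compiler binary, namely which machine it runs on and which machine it generates code for. It says nothing about object file formats, linkers or libraries.]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compiler:
    runs_on: str   # machine that can execute this compiler binary
    targets: str   # machine this compiler generates code for


def build(source_targets, using, on_machine):
    """Compile the compiler's source, configured to generate code for
    `source_targets`, with the compiler `using`, executed on `on_machine`.
    The new binary runs on whatever machine `using` generates code for."""
    if using.runs_on != on_machine:
        raise ValueError(f"{using} cannot be executed on {on_machine}")
    return Compiler(runs_on=using.targets, targets=source_targets)


def port(old, new):
    """Bring a self-hosting compiler from machine `old` to machine `new`."""
    native_old = Compiler(runs_on=old, targets=old)
    # Step 1: on the old machine, build a cross compiler (runs on old, targets new).
    cross = build(new, using=native_old, on_machine=old)
    # Step 2: still on the old machine, cross-compile the compiler itself.
    native_new = build(new, using=cross, on_machine=old)
    # Step 3: on the new machine, the compiler can now rebuild itself.
    rebuilt = build(new, using=native_new, on_machine=new)
    return cross, native_new, rebuilt
```

[With hypothetical machine names, `port("sparc", "newchip")` yields a cross compiler that runs on "sparc" and targets "newchip", followed by two native "newchip" compilers. At no step is a compiler from the new machine's vendor involved, but an existing compiler on the old machine is needed at step 1.]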
> One might well need bootstrap in designing and initially building the > system. But now, one needs *only* GCC to build GCC, and not anything > else. Once one has a running system with GCC, you don't any longer > need the pcc compilers that GCC was originally built with.
But then why the restriction that you "must" have the "full" language available? Sure, Squeak uses a subset "slang" which maps fairly directly to C and is intended to generate C which is compiled by a separate C compiler, but it *runs* inside Squeak. You can run/debug a slang based VM in Squeak (well, it can be done, at least :)). It's *way* slower, but presumably that's a "mere" implementational issue (the Squeak community doesn't have the resources to be able to afford *not* to delegate this bit to C compilers).
There are Smalltalks (and lisps) that let you inline C or asm code...would that be ok?
"Tim Bradshaw" <t...@cley.com> wrote in message news:ey3g03dgaml.fsf@cley.com... > * Erik Naggum wrote: > > Needing *only* GCC, as you say, is predicated on the existence of a > > binary for your system to begin with. How do people port GCC to a new > > platform om which they intend to build the GNU system? My take on this > > is that it is no less dependent on some other existing C compiler than > > the similar problem for CL compilers is. Duane, please help. :)
> I assume they add support for the new target to gcc, compile gcc on an > existing system targeted at the new system and then run this new > compiler on the new system.
Correct, though it is often complicated by object file formats. One approach is to generate textual assembly language on the host machine, which is then assembled and linked on the target machine (using existing tools). Another approach is to retarget the equivalent GNU tools and generate the binaries directly on the host machine. -- Martin Simmons, Xanalys Software Tools zne...@xanalys.com rot13 to reply
> * Stefan Monnier wrote: > >>>>>> "Tim" == Tim Bradshaw <t...@cley.com> writes: > >> instance: OS gives me a file descriptor, I then hack at it with a hex
> > The OS disallows "hacking at it with a hex editor". > > Unless you're some kind of super-privileged user, of course (just like > > you can write all over /proc/kmem if you're root).
> I didn't quite mean it quite so literally. Imagine I get a blob of > code, how do I know that it doesn't fake things? The only way I can > see to do this is a completely trusted compiler, which can sign its > output, so you're still dynamically checking, you just do it once, > when the program starts (isn't this what MS push with ActiveX?). Or I > guess you can do some kind of proof on the program before running it > (Java?).
> Given the negligible cost of checks, I'd kind of rather the OS just > did them though.
NAK! This implies that nobody can modify the compiler. If you have a compiler that signs its output, then somebody can open up the source code and find the signing key. Then the signing key can be used to sign arbitrary output. That means you cannot release the source code for your compiler.
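[To make the objection concrete, here is a sketch under one loud assumption: the compiler signs with a symmetric key that ships inside its own source, modelled with an HMAC from the Python standard library. `SIGNING_KEY`, `compiler_sign`, `loader_accepts` and `forge` are hypothetical names invented for this illustration.]

```python
import hashlib
import hmac

# Assumption: the key is embedded in the compiler's published source code.
SIGNING_KEY = b"key-embedded-in-compiler-source"


def compiler_sign(binary: bytes) -> bytes:
    """What the trusted compiler does to its own output."""
    return hmac.new(SIGNING_KEY, binary, hashlib.sha256).digest()


def loader_accepts(binary: bytes, signature: bytes) -> bool:
    """What the OS loader checks once, when the program starts."""
    expected = hmac.new(SIGNING_KEY, binary, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)


def forge(binary: bytes, key_read_from_source: bytes) -> bytes:
    """What anyone who has read the compiler source can do."""
    return hmac.new(key_read_from_source, binary, hashlib.sha256).digest()
```

[A binary that never went through the compiler passes `loader_accepts` as soon as its author has read the key out of the source, so the signature proves nothing about where the code came from.]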
Or maybe read privileges to it are root-only and root can set the signing key for a particular installation -- but then you have a problem that nobody can compile on one system and run on another.
Far far better to have potentially-dangerous processes running in their own memory arenas where the OS can keep an eye on them in case they try messing anything up.
* Nils Goesche | IIRC, they first write a /cross/ compiler for the new system that | runs on an old system. Then they use the cross compiler to compile | gcc itself and voila... done. Hey, sounds easy, doesn't it? :-))
It sounds like _vastly_ more work than building on the native system with a native assembler and linker to build the first executables until you could replace those, too.
Back in the old days, I wrote 8080 and Z80 code on the PDP-10 and its cross-assembler for "microcomputers", because it was so fantastically more convenient to work on a real computer and deploy on a toy than work on the toy computer -- mostly all I did on the toy computer was to write an excellent terminal emulation program, in assembler. However, the only reason this was more convenient was that it was a royal pain in the butt to try to use the toy computer for any development. Even so, I had to copy the ROMs in that machine to the PDP-10 and basically regenerate its symbol table in order to make things work correctly. Luckily, it had an emulator, and curiously, the PDP-10 emulated the code about 100 times faster than my toy computer executed it. Were it not for the 100,000 times difference in the cost of acquisition and ownership of the two computers, I would certainly have replaced my Exidy Sorcerer with a PDP-10. Come to think of it, my current home computer is strong enough to emulate a PDP-10 about 100 times faster than the real thing, too...
On Wed, 06 Mar 2002 12:09:55 -0500, Ray Dillinger wrote: > Tim Bradshaw wrote:
>> * Stefan Monnier wrote: >> >>>>>> "Tim" == Tim Bradshaw <t...@cley.com> writes: >> >> instance: OS gives me a file descriptor, I then hack at it with a >> >> hex
>> > The OS disallows "hacking at it with a hex editor". Unless you're >> > some kind of super-privileged user, of course (just like you can >> > write all over /proc/kmem if you're root).
>> I didn't quite mean it quite so literally. Imagine I get a blob of >> code, how do I know that it doesn't fake things? The only way I can >> see to do this is a completely trusted compiler, which can sign its >> output, so you're still dynamically checking, you just do it once, when >> the program starts (isn't this what MS push with ActiveX?). Or I guess >> you can do some kind of proof on the program before running it (Java?).
>> Given the negligible cost of checks, I'd kind of rather the OS just did >> them though.
> NAK! This implies that nobody can modify the compiler. [ ... ]
Yes. But there are far better methods than just signing the output of the compiler. In particular, read up on proof-carrying code: It does not require a certifying compiler (you can even write the code by hand as long as you also write the corresponding proof). Code (and proof!) can come from anywhere. Finally, the trusted computing base can be far smaller than a typical compiler.
>>>>> "NN" == Nicolas Neuss <Nicolas.Ne...@iwr.uni-heidelberg.de> writes:
[...] NN> I'm sorry for it, but the above is nonsense. CMUCL allocates NN> full words also for booleans (as can be seen from the consed NN> bytes). [Additionally, the original code contained an NN> omission (it does not reinitialize the array for each test NN> run), which I have augmented with another error...]
I went back and forth with Doug on this. There are several issues you need to think about:
-- If you use bit vectors, you pay for some shifting of bits etc. CMUCL actually generates somewhat suboptimal code for the X86 platform (an extra mask, and returning a value that's not used).
-- If you use eight-bit BYTEs and don't coerce the compiler to use machine integers to address the array, you again pay for shifting, this time for fixnum untagging.
-- If you use fixnums or machine integers (32 bit), fixnums can address them w/o untagging, and you get fast results, BUT this is cheating: it won't scale, and it will spill over to L2 cache even with a small array. Doug's machine has a 1/2 speed L2 cache (P-II), so your results will vary if you have a full speed L2 (e.g. a Celeron will beat a regular P-II).
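[The arithmetic behind these three cases can be sketched as follows. One assumption is not stated in the post: a fixnum is taken to be stored shifted left by two bits with zero tag bits, as on a 32-bit implementation, so the tagged value already equals the byte offset into an array of 4-byte elements. All function names are hypothetical, and Python integers stand in for machine words.]

```python
TAG_BITS = 2  # assumption: two zero tag bits at the low end of every fixnum


def tag_fixnum(n):
    """Machine representation of the fixnum n."""
    return n << TAG_BITS


def offset_in_word_array(tagged_index):
    # 4-byte elements: index * 4 is exactly the tagged value, so no shift.
    return tagged_index


def offset_in_byte_array(tagged_index):
    # 1-byte elements: the tag has to be shifted out first.
    return tagged_index >> TAG_BITS


def location_in_bit_vector(tagged_index):
    # Bits packed into 32-bit words: untag, then split into word index and mask.
    index = tagged_index >> TAG_BITS
    return index >> 5, 1 << (index & 31)


def footprint_bytes(n_elements, bits_per_element):
    """Storage needed for the flags themselves, ignoring any array header."""
    return (n_elements * bits_per_element + 7) // 8
```

[For 8192 flags, `footprint_bytes` gives 32768 bytes with 32-bit elements, 8192 with bytes and 1024 with bits, which is the trade between cheap addressing and cache footprint described above.]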
this is about all I remember.
There was also the additional issue of the loop macro and declarations, if I remember correctly.
Doug probably has a changelog of all this somewhere. But disassemble and the compiler trace facility of CMUCL should be helpful also.
On Wed, Mar 06, 2002 at 05:09:55PM +0000, Ray Dillinger wrote: > Tim Bradshaw wrote:
> > * Stefan Monnier wrote: > > >>>>>> "Tim" == Tim Bradshaw <t...@cley.com> writes: > > >> instance: OS gives me a file descriptor, I then hack at it with a hex
> > > The OS disallows "hacking at it with a hex editor". > > > Unless you're some kind of super-privileged user, of course (just like > > > you can write all over /proc/kmem if you're root).
> > I didn't quite mean it quite so literally. Imagine I get a blob of > > code, how do I know that it doesn't fake things? The only way I can > > see to do this is a completely trusted compiler, which can sign its > > output, so you're still dynamically checking, you just do it once, > > when the program starts (isn't this what MS push with ActiveX?). Or I > > guess you can do some kind of proof on the program before running it > > (Java?).
> > Given the negligible cost of checks, I'd kind of rather the OS just > > did them though.
> NAK! This implies that nobody can modify the compiler. If you > have a compiler that signs its output, then somebody can open up > the source code and find the signing key. Then the signing key > can be used to sign arbitrary output. That means you cannot > release the source code for your compiler.
No need to sign output. Simply disallow any binaries that were not created by that machine's compiler. In order to run source code it must be passed through, and checked, by the compiler on that machine.
> Or maybe read priveleges to it are root-only and root can set the > signing key for a particular installation -- but then you have a > problem that nobody can compile on one system and run on another.
Oh well. FreeBSD gets by (though it's not required to compile, they still do a lot).
> Far far better to have potentially-dangerous processes running in > their own memory arenas where the OS can keep an eye on them in > case they try messing anything up.
Context-switches are expensive, remember. An OS/compiler that removed as many layers as possible between program and underlying hardware would be much faster; if the compiler has a chance to examine every piece of code that goes in the system then it may be able to do this.
-- ; Matthew Danish <mdan...@andrew.cmu.edu> ; OpenPGP public key: C24B6010 on keyring.debian.org ; Signed or encrypted mail welcome. ; "There is no dark side of the moon really; matter of fact, it's all dark."
* Tim Bradshaw | I assume they add support for the new target to gcc, compile gcc on an | existing system targeted at the new system and then run this new compiler | on the new system.
This is probably doable, but in my experience with cross-compilation, you do not just generate code, you effectively generate a module that works with a much larger system. To make this _really_ work, you have to have intimate knowledge of the target system. Since the compiler is often the first thing you build on a new system in order to build the other tools you want to use there, my thinking is that you save a lot of time using a pre-existing compiler and like tool, particularly to ensure that you get the linking information right for that particular environment, what with all the shared library dependencies and whatnot.
Erik Naggum <e...@naggum.net> writes: > I actually tried to argue that the same would true of a Common Lisp > system, but that portability constraints dictate that those who want to > port a Common Lisp compiler to System X on the Y processor should be able > to use the portable assembler (C) instead of having to start off writing > non-portable assembler and use the system's assembler to bootstrap from.
You'll always need an assembler, of course; there isn't any way around that. And there are advantages for systems like the old KCL and its descendants which use GCC as the back end for the compiler.
But I'm thinking about a different problem space, not the one you are.
> Needing *only* GCC, as you say, is predicated on the existence of a > binary for your system to begin with. How do people port GCC to a new > platform om which they intend to build the GNU system? My take on this > is that it is no less dependent on some other existing C compiler than > the similar problem for CL compilers is. Duane, please help. :)
People port GCC to new platforms by having GCC cross-compile code. No reliance on other compilers is necessary.
Ray Dillinger <b...@sonic.net> writes: > NAK! This implies that nobody can modify the compiler. If you > have a compiler that signs its output, then somebody can open up > the source code and find the signing key. Then the signing key > can be used to sign arbitrary output. That means you cannot > release the source code for your compiler.
No, a trusted compiler is simply the only object that has the ability to create compiled-procedure objects. No problem at all!
Well, the problem is still that only the one compiler is the trusted one. Two solutions for that problem are to use a subsetted bytecode thing, like the Java VM, and to use proof-carrying code to validate compiler output.
Erik Naggum <e...@naggum.net> writes: > This is probably doable, but in my experience with cross-compilation, you > do not just generate code, you effectively generate a module that works > with a much larger system. To make this _really_ work, you have to have > intimate knowledge of the target system. Since the compiler is often the > first thing you build on a new system in order to build the other tools > you want to use there, my thinking is that you save a lot of time using a > pre-existing compiler and like tool, particularly to ensure that you get > the linking information right for that particular environment, what with > all the shared library dependencies and whatnot.
No, Tim was totally right. You don't use the pre-existing compiler in general; often times the manufacturer isn't providing one.
Often you are the first person writing one: this is now rather often the case with GCC.
Erik Naggum <e...@naggum.net> writes: > * Nils Goesche > | IIRC, they first write a /cross/ compiler for the new system that > | runs on an old system. Then they use the cross compiler to compile > | gcc itself and voila... done. Hey, sounds easy, doesn't it? :-))
> It sounds like _vastly_ more work than building on the native system with > a native assembler and linker to build the first executables until you > could replace those, too.
You really have no clue how GCC works if you think it's more trouble. Really, GCC is totally equipped to do cross-compilation (as are all the other parts of the toolchain).
Erik Naggum <e...@naggum.net> writes: > * tb+use...@becket.net (Thomas Bushnell, BSG) > | One might well need bootstrap in designing and initially building the > | system. But now, one needs *only* GCC to build GCC, and not anything > | else. Once one has a running system with GCC, you don't any longer > | need the pcc compilers that GCC was originally built with.
> I actually tried to argue that the same would true of a Common Lisp > system, but that portability constraints dictate that those who want to > port a Common Lisp compiler to System X on the Y processor should be able > to use the portable assembler (C) instead of having to start off writing > non-portable assembler and use the system's assembler to bootstrap from.
> Needing *only* GCC, as you say, is predicated on the existence of a > binary for your system to begin with. How do people port GCC to a new > platform om which they intend to build the GNU system? My take on this > is that it is no less dependent on some other existing C compiler than > the similar problem for CL compilers is. Duane, please help. :)
I don't view self-hosting as precluding any bootstrapping which is necessary to get to that self-hosting state. In fact, I would be surprised to hear of _any_ kind of self-hosting which doesn't require a non-self-hosted bootstrap. This applies to both cross-compiling from another architecture and re-compiling on the same architecture starting with a different compiler.
Thomas Bushnell's challenge is a good one. And this thread has been a good one, as well. Several times I considered answering some of the statements made on this thread, but have refrained because there are so many issues and stochastic requirements. So I thought I'd put together several ideas and present them at once, from the point of view of an Allegro CL developer.
As an initial summary, I submit that the entire lisp _could_ be written entirely in lisp, but that it is not convenient to do so, given the fact that we run our lisp on Unix and MS systems, which are all C based, and even embedded systems tend to have libc equivalent interfaces. However, I do disagree that it is necessary to require that the whole language be available for a GC written in lisp, and will explain that later as well.
First, a background review of Allegro CL's structure, for those who don't yet know:
1. Most of Allegro CL is written in Allegro CL, and compiles per architecture to bits (represented in code vector lisp objects) using the Allegro CL compile and compile-file functions.
2. A subsection of the kernel or "runtime" of Allegro CL is written in an extension of CL I call runtime or "rs" code, which also uses the Allegro CL compiler, extended and hooked to produce assembler source as output.
3. Some small part of Allegro CL is written in C. On some architectures, the C++ compiler is used, but it is mostly written in C style. The major purpose of the C code is to parse the .h header files of the system for the os interface. We try mostly to limit our C code to os-interface functionality and regularization.
In addition, as a kind of #3a: We also have written our garbage-collector and our fasl-file reader in C.
The binaries from 2, 3, and 3a are all linked together using the system linker to either produce a single executable, or to produce a simple executable main and a shared-library. In both cases, that link output serves a dual purpose as a bootstrap mechanism to load pure lisp code in (i.e. from #1) or to re-establish the environment dumped in a previous lisp session.
The rs code in #2 is sort of a combination of superset/subset of regular CL code; it understands both C and Lisp calling conventions, but does not set up a "current function" for its own operation. Since the produced code is just assembler source, and does not set up a function object, local constants are not allowed; only constants that are in the lisp's global table can be referenced by rs code. Recently, I added an exception to this; string constants can now be represented in rs code - these will become .asciz or equivalent directives in the assembler source. This allows such rs functions as
(def-runtime-q print-answer (n) (q-c-call printf "The answer is %d " n))
I have also recently extended the rs code to allow for large stack-allocated specialized arrays; we've always been able to allocate stack-based simple-vectors in rs code, but due to the rs code stack frame descriptors we provided for gc purposes, non-lisp data had been restricted to a few hundred bytes until now.
Theoretically, due to these and other changes, we should now be able to rewrite both the fasl reader and the garbage-collector in rs code, but it hasn't been a high priority. For the garbage collector especially, there must be an incentive to make such a potentially regressive move; it may be that a new gc to handle concurrent MP might be just that incentive.
For #3, I was almost ready to disagree with Thomas Bushnell because I believed that it is necessary to use C functionality to interface to C library functions. This is especially true for the need to parse .h files, and to get the correct definitions and interfaces based on particular #define constants. If you doubt this, just try to figure out, for example, HP's sigcontext structure, which has layer upon layer of C macrology to define a large number of incompatible structure and interface definitions.
However, I had to back off on any such disagreement, because it certainly is _possible_ to write any of these interface functions in lisp, using such facilities as our Cbind tool to pre-parse the header files and to thus present all pertinent information to the lisp-in-lisp code. However, I still am not inclined to do such a thing, because it would be specialized toward lisp bootstrap, and thus not useful for anything else. And why not use C at what it does best (parse C header files)? Besides, even our Cbind facility uses the gcc front-end to do the initial parsing, so in essence a non-lisp compiler part would still be used. Bottom line: it is more convenient to write our os-interface code in C, because it interfaces to C libraries. I suppose that we would remove such C interfaces if we were porting our lisp to a Lisp operating system.
Finally, I'd like to disagree wholeheartedly with the notion that the full language must be available for the whole lisp implementation. Specifically, I am responding to this point by Thomas Bushnell:
>> I thought I was bright-shining clear. What I want is a GC written in
>> the language itself, with all the normal language facilities
>> available.
It is the notion of "availability" that I take issue with. To make my point, consider the statement that in every English sentence, all letters of the alphabet are available. That is, of course, a true statement. And, as the example "The quick brown fox jumped over the lazy dogs." shows, it is obviously possible to construct a sentence which _indeed_ does use every letter in the alphabet. However, does this require that every sentence be constructed in such a way? Of course not! It is thus not the whole alphabet which is available to a particular sentence; only those letters which in fact work toward constructing the sentence are in fact "available". Thus, for normal conversation, the letter "q" is not generally available to me to use unless I am using a word which has a "q" in it (or unless I'm specifically talking about the letter "q" itself).
Let's extend this notion to an extensible language like Lisp. Consider the start of a CL function foo:
(defun foo () ...)
Now, the body of foo can refer to any CL functionality, including foo itself. However, it would generally be bad programming (i.e. a bug) to allow a call to foo within foo which results in an infinite recursion. Thus, to some extent, foo is not fully available to use as one wishes within foo.
Similar truths apply to a garbage-collector. It might be perfectly acceptable for a gc function to call cons, but it had better be prepared to deal with the case where there is no more room for a new cons cell, which would thus cause a recursive call to the garbage-collector (presumably an infinite recursion, since the reason for the initial gc call might have been for lack of space).
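To make the recursion hazard concrete, here is a minimal toy sketch in portable CL. Everything in it is hypothetical and exists only for illustration: the "heap" is just a counter of free cells, and toy-cons, toy-gc, *in-gc*, *gc-work-cells* and heap-exhausted are made-up names, not part of any real implementation. It shows one way a collector that conses can notice it has been re-entered and signal a condition instead of recursing forever.

```lisp
;; Toy model (an assumption, not a real allocator): the heap is a counter.
(defvar *free-cells* 2 "Cells left in the toy heap.")
(defvar *in-gc* nil "True while the toy collector is running.")
(defvar *gc-work-cells* 1 "Cells the toy collector conses while it runs.")

(define-condition heap-exhausted (error) ())

(defun toy-gc ()
  "Pretend collector: conses its working cells, then 'reclaims' two cells."
  (let ((*in-gc* t))
    ;; Allocation from inside the collector goes through toy-cons too.
    (loop repeat *gc-work-cells* do (toy-cons :work nil))
    (setf *free-cells* 2)))

(defun toy-cons (a d)
  "Allocate one cell; collect first if the toy heap is empty."
  (when (zerop *free-cells*)
    (if *in-gc*
        (error 'heap-exhausted)   ; re-entered from the collector: refuse to recurse
        (toy-gc)))
  (decf *free-cells*)
  (cons a d))
```

With *free-cells* at 0 and the default *gc-work-cells* of 1, a call to toy-cons enters toy-gc, which tries to cons, finds no room, and signals heap-exhausted rather than calling itself again. With *gc-work-cells* bound to 0 the same call succeeds, which is the "gc that does not cons" case.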
And, as The Oracle in Matrix says, "What's really going to bake your noodle ..." is that at least in CL, there is no definition of what a garbage-collector actually _is_. There are a few references, but no definitions or specs...
-- Duane Rettig Franz Inc. http://www.franz.com/ (www) 1995 University Ave Suite 275 Berkeley, CA 94704 Phone: (510) 548-3600; FAX: (510) 548-8253 du...@Franz.COM (internet)
Duane Rettig <du...@franz.com> writes:
> As an initial summary, I submit that the entire lisp _could_ be
> written entirely in lisp, but that it is not convenient to do so,
> given the fact that we run our lisp on Unix and MS systems, which
> are all C based, and even embedded systems tend to have libc
> equivalent interfaces.
So it should be pointed out that one of the reasons I'm interested in this question is that I'm interested in lisp systems running on bare metal.
> However, I do disagree that it is necessary
> to require that the whole language be available for a GC written in
> lisp, and will explain that later as well.
I agree that it may not be *necessary* depending on what that means.
But note that I began by asking about both Scheme and CL; the point is that of course I could confine myself to a tiny subset of CL and do things in PL/I (er, I mean "the loop macro").
However, the real things I want are fairly simple: I want complex closures and I want cons. I might want call/cc; at least, I'm not willing to exclude that a priori.
> For #3, I was almost ready to disagree with Thomas Bushnell
> because I believed that it is necessary to use C functionality
> to interface to C library functions.
If you really need to, you can do that, and it may well be the most efficient implementation strategy if you want to run on Unix. (As, of course, you do.)
A "pure" implementation means that you would do the same work the C library people do, and make Lisp equivalents for the C header files yourself. Remember, *I'm* always thinking of this from a systems design perspective, so "tell the other group to do the work" isn't really a solution. :) But if the other group is doing the work anyway (as is the case for people running a Lisp environment on Unix), then of course it's convenient to piggyback on them.
> For #3, I was almost ready to disagree with Thomas Bushnell
> because I believed that it is necessary to use C functionality
> to interface to C library functions.
The actual interfaces you need are the *kernel* interfaces, not interfaces to the C library. From the systems design perspective, your system would be *replacing* the C library, not borrowing it. If you do want to borrow it, then it might be most convenient to use C to hook into it, though as you correctly note, even then you can get around it.
> Similar truths apply to a garbage-collector. It might be
> perfectly acceptable for a gc function to call cons, but it
> had better be prepared to deal with the case where there is
> no more room for a new cons cell, which would thus cause a
> recursive call to the garbage-collector (presumably an infinite
> recursion, since the reason for the initial gc call might have
> been for lack of space).
So the *point* of my question is, in part, just this problem. Now, if there *isn't* a solution, then you have to subset the language, omit cons, and then code your GC.
But why do that if there is a convenient solution?
One strategy: suppose that to GC an arena of N bytes takes N/10 bytes of memory to hold dynamically allocated GC data structures.
Then just keep that much space reserved at all times, so it's always there when the GC needs it. Or, if one is using stop-and-copy, then it's even easier to find space.
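The reserve idea can be sketched with a little bookkeeping. This is only an illustration under stated assumptions: the arena is modelled as byte counters rather than real memory, the N/10 figure is the supposition from the text above rather than a measured number, and arena, make-reserved-arena, mutator-alloc and gc-alloc are hypothetical names.

```lisp
;; Toy bookkeeping only: no real memory is managed here.
(defstruct arena
  size          ; total bytes N
  reserve       ; bytes held back for the collector, (floor N 10)
  (used 0)      ; bytes handed out to ordinary code
  (gc-used 0))  ; bytes handed out to the collector

(defun make-reserved-arena (n)
  "Make an arena of N bytes with a tenth of it set aside for the GC."
  (make-arena :size n :reserve (floor n 10)))

(defun mutator-limit (arena)
  "Bytes ordinary code may use: everything except the reserve."
  (- (arena-size arena) (arena-reserve arena)))

(defun mutator-alloc (arena nbytes)
  "Return T and account for NBYTES if they fit below the mutator limit, else NIL."
  (when (<= (+ (arena-used arena) nbytes) (mutator-limit arena))
    (incf (arena-used arena) nbytes)
    t))

(defun gc-alloc (arena nbytes)
  "Return T and account for NBYTES if they fit in the reserve, else NIL."
  (when (<= (+ (arena-gc-used arena) nbytes) (arena-reserve arena))
    (incf (arena-gc-used arena) nbytes)
    t))
```

For an arena of 1000 bytes, ordinary allocation is refused once 900 bytes are in use, while gc-alloc can still hand out up to 100 bytes. The point of the design is that the collector's own consing never competes with the allocation that triggered the collection.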
If one is in a multi-threaded world, and each thread gets its own allocation arena for normal allocation, and you are using a stop-the-world approach to GC, then you can't reliably assume (perhaps) that all the threads have left their arena in an ideal state. That means that the GC will probably have to allocate out of a totally separate arena from what other programs use. When it's done, a quick GC pass (allocating from the main heap) can be run to clean the special GC arena, and copy anything remaining there onto the main heap.
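The last step, cleaning out the special GC arena and moving whatever is still needed onto the main heap, might look roughly like this. It is a toy model with assumed names: both heaps are plain lists, and gc-scratch-alloc, drain-gc-arena and the still-needed-p predicate are hypothetical stand-ins for what a real collector would do by tracing.

```lisp
;; Toy model: each "heap" is just a list of objects.
(defvar *main-heap* '() "Objects living on the ordinary heap.")
(defvar *gc-arena* '() "Scratch objects the collector allocated for itself.")

(defun gc-scratch-alloc (obj)
  "Record OBJ as allocated in the collector's private arena."
  (push obj *gc-arena*)
  obj)

(defun drain-gc-arena (still-needed-p)
  "Move objects satisfying STILL-NEEDED-P to the main heap, then empty the arena."
  (dolist (obj *gc-arena*)
    (when (funcall still-needed-p obj)
      (push obj *main-heap*)))
  (setf *gc-arena* '())
  *main-heap*)
```

After the main collection finishes, calling drain-gc-arena leaves the scratch arena empty for the next collection, and only the survivors cost space on the main heap.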