Why Arc isn't especially OO


Bill Clementson

Dec 12, 2001, 12:01:03 AM
FYI, Paul Graham has posted a new article on his web site discussing why Arc
isn't especially OO:
http://www.paulgraham.com/noop.html

--
Bill Clementson


Kenny Tilton

Dec 12, 2001, 2:09:52 AM

Bill Clementson wrote:
>
> FYI, Paul Graham has posted a new article on his web site discussing why Arc
> isn't especially OO:
> http://www.paulgraham.com/noop.html
>

Go figure. OO is a natural form of expression for me. I was doing OO
before the languages were OO. Maybe that is Graham's point, but I think
formalizing my ad hoc pseudo-OO-wannabe techniques was a Good Thing. And
CLOS especially does not really lock you in too badly, especially with
the MOP.

programmers sure do differ on basic stuff. :)

kenny
clinisys

Tim Bradshaw

Dec 12, 2001, 6:03:52 AM
"Bill Clementson" <bill_cl...@yahoo.com> wrote in message news:<jwBR7.11052$7y.115433@rwcrnsc54>...

> FYI, Paul Graham has posted a new article on his web site discussing why Arc
> isn't especially OO:
> http://www.paulgraham.com/noop.html


He's very smart, isn't he?

I think there's a big backlash happening / going to happen against
OO-as-it-has-become, which really means some terribly restrictive
cultish thing involving obsessive data hiding, restrictive inheritance
protocols making programming unduly difficult (you want to support
this protocol? well, sorry you can't mix in this class that does it,
you have to reimplement it all with nice different bugs), tying
everything into class definitions and so on. Of course this is
nothing to do with what object oriented programming actually once was,
or with what CLOS is.

So he's worked this out, I guess, and is positioning arc as non-OO.
He's missed the point a bit with CLOS, since it turns out that CLOS
isn't OO either (That's OO-the-cult as opposed to being object
oriented...), but maybe that's just anti-CLism, or just a matter of
style (I'm kind of an inverse Paul Graham: I use CLOS all the time,
but I don't think I do OOP).

Having taught CL courses over the last 5-6 years where I've tried to
demonstrate that CL really is OO, even if it has these inconvenient
things like method definitions outside classes and so on, I can see
that now I'll have to reposition it as `not *actually* OO even though
it looks a bit like it'.

--tim
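
To make the point about methods living outside classes concrete, here is a
minimal sketch in plain CLOS. The shape classes and the OVERLAP-P generic
function are invented for illustration; the methods dispatch on both
arguments and belong to neither class:

  (defclass circle () ((radius :initarg :radius)))
  (defclass square () ((side :initarg :side)))

  (defgeneric overlap-p (shape-a shape-b)
    (:documentation "Dispatches on the classes of both arguments."))

  (defmethod overlap-p ((a circle) (b circle)) :circle-circle)
  (defmethod overlap-p ((a circle) (b square)) :circle-square)
  (defmethod overlap-p ((a square) (b circle)) (overlap-p b a))
  (defmethod overlap-p ((a square) (b square)) :square-square)

  ;; (overlap-p (make-instance 'square :side 2)
  ;;            (make-instance 'circle :radius 1))   ; => :CIRCLE-SQUARE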

Espen Vestre

Dec 12, 2001, 6:09:49 AM
tfb+g...@tfeb.org (Tim Bradshaw) writes:

> things like method definitions outside classes and so on, I can see
> that now I'll have to reposition it as `not *actually* OO even though
> it looks a bit like it'.

I can see your point - but isn't it a bit sad to surrender to the cultish
definition of OO? On the other hand - maybe we should start referring
to CLOS programming as "generic function programming" and refer, in
footnotes, to "Cult OO" as "a severely constrained special case of GFP"
;-)?
--
(espen)

Scott McKay

Dec 12, 2001, 9:41:36 AM

"Espen Vestre" <espen@*do-not-spam-me*.vestre.net> wrote in message
news:w6k7vsb...@wallace.ws.nextra.no...

I've tried to make the point to Paul that he is setting up a Java-like
strawman of OO, then knocking it down. So what, big deal. So
instead of thinking about the useful things OO can provide, he's
throwing out the baby with the bathwater.

Here's a suggestion: stop calling what languages such as Common Lisp
and Dylan provide "object-oriented programming". God forbid, don't
call it "function-oriented programming", because that will get other
people's knickers in a twist. Some years ago I suggested
"protocol-oriented programming".

Nils Goesche

Dec 12, 2001, 10:33:09 AM
Kenny Tilton <kti...@nyc.rr.com> writes:

Well, he doesn't really say doing OO is bad, does he? Just that he
personally never felt the need to use CLOS. Actually, I sympathize
with most of the articles he writes about his `Arc' language; I don't
like the look of the long `multiple-value-bind' either. But what I
absolutely, positively don't get is why he thinks all those (IMO)
minor issues justify all the effort of the creation of a new
language. He could easily write a few macros that would turn CL into
the very language he describes in those articles, or something very
similar. I guess he has become rich and doesn't know how to use his
time for something useful :-)

Regards,
--
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x42B32FC9
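
As an illustration of the kind of macro Nils is alluding to, here is a
two-line abbreviation for the long MULTIPLE-VALUE-BIND; the name MVB is
invented here, not something Graham proposed:

  (defmacro mvb (vars form &body body)
    "Abbreviation for MULTIPLE-VALUE-BIND."
    `(multiple-value-bind ,vars ,form ,@body))

  ;; (mvb (q r) (floor 17 5)
  ;;   (list q r))                ; => (3 2)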

Erik Naggum

Dec 12, 2001, 12:21:44 PM
* "Scott McKay" <s...@mediaone.net>

| Some years ago I suggested "protocol-oriented programming".

That is probably _the_ most constructive suggestion I have seen.

///
--
The past is not more important than the future, despite what your culture
has taught you. Your future observations, conclusions, and beliefs are
more important to you than those in your past ever will be. The world is
changing so fast the balance between the past and the future has shifted.

Andreas Bogk

Dec 12, 2001, 12:30:35 PM
"Scott McKay" <s...@mediaone.net> writes:

> Here's a suggestion: stop calling what languages such as Common Lisp
> and Dylan provide "object-oriented programming". God forbid, don't
> call it "function-oriented programming", because that will get other
> people's knickers in a twist. Some years ago I suggested
> "protocol-oriented programming".

There's a solid scientific theory behind what we do, the
lambda& calculus[0]. And it does count as both functional
and object-oriented. So why hide?

Andreas

[0] For those who don't know, not the regular lambda calculus,
but extended by multiple dispatch. See
http://www.di.ens.fr/~castagna/pub.frametop.html
Buy the book if you're interested in a calculus explaining
CLOS and Dylan.

--
"In my eyes it is never a crime to steal knowledge. It is a good
theft. The pirate of knowledge is a good pirate."
(Michel Serres)

Coby Beck

Dec 12, 2001, 1:43:38 PM

"Nils Goesche" <car...@cartan.de> wrote in message
news:lkpu5k4...@pc022.bln.elmeg.de...

> Kenny Tilton <kti...@nyc.rr.com> writes:
>
> > Bill Clementson wrote:
> > >
> > > FYI, Paul Graham has posted a new article on his web site
> > > discussing why Arc isn't especially OO:
> > > http://www.paulgraham.com/noop.html
> >
> > Go figure. OO is a natural form of expression for me. I was doing OO
> > before the languages were OO. Maybe that is Graham's point, but I think
> > formalizing my ad hoc pseudo-OO-wannabe techniques was a Good Thing. And
> > CLOS especially does not really lock you in too badly, especially with
> > the MOP.
> >
> > programmers sure do differ on basic stuff. :)
>
> Well, he doesn't really say doing OO is bad, does he? Just that he
> personally never felt the need to use CLOS. Actually, I sympathize
> with most of the articles he writes about his `Arc' language; I don't
> like the look of the long `multiple-value-bind' either. But what I
> absolutely, positively don't get is why he thinks all those (IMO)
> minor issues justify all the effort of the creation of a new
> language. He could easily write a few macros that would turn CL into
> the very language he describes in those articles, or something very
> similar.

This is what struck me very hard too. But I didn't read it in great detail...

--
Coby
(remove #\space "coby . beck @ opentechgroup . com")


Tim Bradshaw

Dec 12, 2001, 1:49:28 PM
"Scott McKay" <s...@mediaone.net> wrote in message news:<A0KR7.13893$5W5.5...@typhoon.ne.mediaone.net>...

> Some years ago I suggested "protocol-oriented
> programming".

I like this, or something with protocol in it, anyway.

--tim

Kent M Pitman

Dec 12, 2001, 3:37:20 PM
"Coby Beck" <cb...@mercury.bc.ca> writes:

I haven't read it at all, but I can still speak to the issue of this point.
I think it's related to the IF* thing, though I mention that only to show
another instance of a small matter turned big, when other small matters don't,
so that one has more understanding that this is a general issue, not one
specific to Paul.

It comes back to my thesis that languages are political parties and to
the thought experiment raised in my paper
http://world.std.com/~pitman/PS/Lambda.html where I suggest that EVEN
IF you vacated some other language's semantics and made it identical
to Lisp, or vice versa, the two languages might be momentarily at the
same point in design space, but their velocity vector, or perhaps
better, their acceleration (since we're really talking gravitational
force here), is different. Over time, as long as there ARE two
languages, they would drift, not move forever after in lockstep
evolving the same. In time, they would be different again. It's
almost like them snapping back to their original destiny (as I vaguely
recall might have been the thesis in Asimov's The End of Eternity,
where I think they said that you could make local changes in time
without affecting the whole future of time because things would
eventually converge again). (Ironic, I suppose that the "controlled"
aspect of design might lead to divergence but the "uncontrolled"
aspect of natural forces might lead to convergence. Hmm. I'll have
to think about that.) Anyway, my point is that the analysis here
about "small change" presumes you are measuring distance in design
space rather than acceleration due to gravity. If you measure the
latter, you may find that the difference is much greater, even if the
two things are coincidentally co-located.

In plain English, it isn't where you are but where you're going that
defines whether there might be a need to diverge.

And unlike in physics, politics uses "negotiated gravitational
attraction" where the quark named "spin" (as in "spin control") is
what makes the difference in who's going to be attracted to whom.

I don't know if that makes it right for people to diverge. But
hopefully it makes it easier to understand that they did it for
reasons that are perhaps less trivial than they seem. Negotiations on
getting someone back into the fold should always be on the issue of
"direction", not "position", or they will fall on deaf ears.

bc1...@attbi.com

Dec 12, 2001, 9:27:19 PM
Kent M Pitman <pit...@world.std.com> writes:

> "Coby Beck" <cb...@mercury.bc.ca> writes:
>
> > "Nils Goesche" <car...@cartan.de> wrote in message
> > news:lkpu5k4...@pc022.bln.elmeg.de...
> > > Kenny Tilton <kti...@nyc.rr.com> writes:

> > > <snip ...>


> > > language. He could easily write a few macros that would turn CL into
> > > the very language he describes in those articles, or something very
> > > similar.
> >
> > This is what struck me very hard too. But I didn't read it in great
> > detail...

It could be that this is what will be (has been?) done. Paul Graham references
comments that were made by Jonathan Rees. On Jonathan's home page, under the
link related to jobs he has had, he mentions that he is currently working for a
company called Clear Methods Inc. His comments about the company:

"a startup still somewhat in stealth mode.
We are using the enhanced productivity made possible by Lisp to make
tools for web application construction and deployment. The company's
Lisp dialect is disguised as an extension of XML, both to obtain
excellent XML integration and because of the immense popular prejudice
against Lisp."

http://mumble.net/jar/industry.html

The web page for Clear Methods is a bit Spartan, indicating a Q4/2001 launch
and a slogan "Ending the Software Crisis":
http://www.clearmethods.com/

It is unclear who owns Clear Methods or what their product is. Maybe Arc is
either a "feeler" to gauge potential support or a "teaser" for functionality
that will be released by a company as a commercial product. This is, of course,
just speculation on my part ...

--
Bill Clementson

Kent M Pitman

Dec 13, 2001, 12:04:44 AM
bc1...@attbi.com writes:

> It is unclear who owns Clear Methods or what their product is. Maybe
> Arc is either a "feeler" to gauge potential support or a "teaser"
> for functionality that will be released by a company as a commercial
> product. This is, of course, just speculation on my part ...

I'm pretty sure that whatever Clear Methods' product is, it isn't
Paul Graham's Arc, if that's what you're hinting at.

Language designers just like to chat among each other, like any other
interest group.

bc1...@attbi.com

Dec 13, 2001, 12:44:13 AM
Kent M Pitman <pit...@world.std.com> writes:

> bc1...@attbi.com writes:
>
> > It is unclear who owns Clear Methods or what their product is. Maybe
> > Arc is either a "feeler" to gauge potential support or a "teaser"
> > for functionality that will be released by a company as a commercial
> > product. This is, of course, just speculation on my part ...
>
> I'm pretty sure that whatever Clear Methods' product is, it isn't
> Paul Graham's Arc, if that's what you're hinting at.

I wasn't "hinting" at anything; I noticed that there were some
similarities between the stated goals for Arc and the work that Jonathan
Rees was doing and made a speculative comment on that. If they did happen to
be collaborating on a product, I would view that as a positive thing as it
would make more sense to me. I have trouble understanding why anyone would try
to create a new dialect of Lisp and, if Arc happened to be a precursor to a
product built on Lisp but customized for web site creation, I would find that
more logical than an attempt to create a new lisp dialect just for the hell of
it.

--
Bill Clementson

Lieven Marchand

Dec 13, 2001, 12:41:36 PM
Kent M Pitman <pit...@world.std.com> writes:

> It comes back to my thesis that languages are political parties and to
> the thought experiment raised in my paper
> http://world.std.com/~pitman/PS/Lambda.html where I suggest that EVEN
> IF you vacated some other language's semantics and made it identical
> to Lisp, or vice versa, the two languages might be momentarily at the
> same point in design space, but their velocity vector, or perhaps
> better, their acceleration (since we're really talking gravitational
> force here), is different. Over time, as long as there ARE two
> languages, they would drift, not move forever after in lockstep
> evolving the same. In time, they would be different again. It's
> almost like them snapping back to their original destiny

The C/C++ community did perform a form of your thought experiment,
with one of the original goals of the C++ standardisation committee
being to let ANSI Classic C be as much a subset of ISO C++ as
possible, which they succeeded in remarkably well. (Most differences
are of the trivia kind.) The updated C standard has taken off in a
very different direction and the new C++ standard will probably move
further from the common subset too. And this is with a reasonable
amount of overlap in the two standard communities.

--
Lieven Marchand <m...@wyrd.be>
She says, "Honey, you're a Bastard of great proportion."
He says, "Darling, I plead guilty to that sin."
Cowboy Junkies -- A few simple words

James A. Crippen

Dec 13, 2001, 10:36:22 PM
Nils Goesche <car...@cartan.de> writes:

> But what I absolutely, positively don't get is why he thinks all
> those (IMO) minor issues justify all the effort of the creation of
> a new language. He could easily write a few macros that would
> turn CL into the very language he describes in those articles, or
> something very similar. I guess he has become rich and doesn't
> know how to use his time for something useful :-)

Now now, don't be too mean (although I'd love to be rich and have time
to tinker with new languages too). I'd bet that the first
implementation *will* be in a Lisp, but the whole point is to have a
small and easily compiled language. Scheme is small but not easily
compilable, and CL is large. Recall that his method of web-service
using Lisp was to spawn processes for each new connection. Smaller
processes (using smaller languages) spawn more quickly, and take up
less memory. They also run faster.

I understand his purpose perfectly. CL is really too large for the
purpose he applied it to. If you don't need CLOS or hash tables for
instance, then you're losing lots of code space in your binary.
Trimming down CL is a start, but he's taking the chance to tinker with
a new language design, which is fun enough, and might eventually be
useful.

And the Lisp Hacker that has not written their own Lisp-like language
is not the true Lisp Hacker.

'james

--
James A. Crippen <ja...@unlambda.com> ,-./-. Anchorage, Alaska,
Lambda Unlimited: Recursion 'R' Us | |/ | USA, 61.20939N, -149.767W
Y = \f.(\x.f(xx)) (\x.f(xx)) | |\ | Earth, Sol System,
Y(F) = F(Y(F)) \_,-_/ Milky Way.

Gabe Garza

Dec 13, 2001, 11:11:44 PM
ja...@unlambda.com (James A. Crippen) writes:

> I understand his purpose perfectly. CL is really too large for the
> purpose he applied it to. If you don't need CLOS or hash tables for
> instance, then you're losing lots of code space in your binary.

The purpose of responding to http requests, or the implementation
method of spawning a new process for each request? If you instead
implement http responses using threads (as, e.g., cl-http does) you
get faster "process" start-up times than spawning a new OS-level process
and the size of the executable (environment, interpreter, compiler,
delivered app, whatever) and language becomes largely irrelevant.

Of course, this assumes that Lisp has threads, which, unfortunately,
is arguable.

Gabe Garza

Erik Naggum

Dec 14, 2001, 1:10:24 AM
* James A. Crippen

| I understand his purpose perfectly. CL is really too large for the
| purpose he applied it to. If you don't need CLOS or hash tables for
| instance, then you're losing lots of code space in your binary.

I do not understand this position at all. Regardless of the "tree
shaking" issues, "too large language" is simply false. The concept of a
language being "too large" is vacuous. Just because starting up a new
process is thought to be slow, does not mean that the language is too
large. First, startup time has nothing to do with language size.
Second, code size has nothing to do with startup time. Third, neither
has binary size or disk footprint or any of the other numerous idiot
measures of size. As a matter of counter-evidential fact, most of the
Schemes I used before I came to my senses are still way, way slower to
start up than the Common Lisps I have used, especially with actual code
being used, and Scheme is about the smallest language you can get.

Execing a new image is usually quite expensive, but forking is not, at
least under reasonable Unices, so it should be possible to fork a running
image of a large binary with almost no overhead. Reasonable Unices even
do copy-on-write and will not waste memory needlessly, either. This
means that the startup time has become _completely_ irrelevant.

Furthermore, the one-thread-per-connection model is inefficient compared
to a tightly integrated I/O loop because a context switch is so much more
expensive than merely running the same code with several file descriptors.

Making languages smaller to get faster startup time in a system that has
decided on the fork and exec model appears to me no smarter or more effective
than shaving to lose weight.

| And the Lisp Hacker that has not written their own Lisp-like language is
| not the true Lisp Hacker.

Nor is he who has not seen the folly of the endeavor in time.

Alexander Kjeldaas

Dec 14, 2001, 4:41:32 AM
Erik Naggum wrote:

I have no experience with server programming in lisp, but I would think
that a fork-once-in-a-while model would be an efficient method for
implementing a server in a garbage-collected language. In the
fork-once-in-a-while model, you have a "tree-shaked" mother process and
fork off a pool of processes that handle connections. Each process can
handle more than one connection, but when the processes have consed too
much, you kill them off and fork a new process instead of starting garbage
collection.

In this system, you have to explicitly communicate with the mother process
whenever you want to change global state (session management etc. for a
http server). In such a system you could also take all kinds of shortcuts
with the memory allocator since you don't want to garbage collect in the
slave process.

Now, what I am saying is that you might be best off by using fork() to
overcome perceived limitations of garbage collection for implementation of
server processes. A server usually receives a request, processes that
request, delivers a result, and starts from scratch. Another way to say
this is that most lisps (correct me if I am wrong) don't support the most
efficient method for setting up and tearing down memory heaps for server
applications. This might be a grave accusation given that I stated that I
have no experience in this area, but read on :-)

A generational garbage collector probably works fairly well for a server
application. Most objects live fairly short, and if you garbage collect
between requests, you'll get decent performance. However, as you suggest
above, the most efficient method to implement most server applications is
as an event-driven state machine. In such an implementation, you can
expect to have, say 200 simultaneous requests. In this situation, there is
no point in time where a generational garbage collector will work well. If
you garbage collect after a request has finished, you will only reclaim
0.5% of the local heap. In order to get decent performance out of the
generational garbage collector you will have to garbage collect every, say
200 requests which would reclaim on average 50% of the local heap. The
downside of this is that you would use twice the memory compared to a
server written in C.

I think a solution to this problem is to support multiple heaps. Each
request should use a separate heap that can be torn down after the
request has been serviced. This requires a tiny bit of compiler support,
but I think it would make it possible to implement a server application
with virtually no memory or garbage collection overhead compared to a
server written in C.

The operations I think are needed to support multiple heaps are: an
operation to create heaps with various characteristics, and the ability to
call a function so that consing happens within the given heap for
everything that is done within that function (recursively).

Example:

  (let ((my-heap (make-heap :no-external-references-allowed)))
    (call-with-heap my-heap #'my-func))

In order to take full advantage of the ability to create heaps with various
characteristics, I would need a little help from the memory system and the
compiler. For instance, I would like to create a heap with the invariant
that no object outside the heap (except for the evaluation stack) should be
able to refer to an object inside the heap. This is a fantastic heap
because it allows me to allocate objects fast, and it allows me to throw
away the whole heap after I am done with it - exactly the thing I want for
server programming.

The compiler support that I need for this heap is that when I do
(call-with-heap my-heap #'my-func)
above, the compiler should make a copy of all stack variables, and it
should make all global variables copy-on-write. That way it is impossible for
my-func and functions called by it to "communicate" with the outside world.
To communicate with the outside world, a message passing interface could be
used, just as we would do in a fork()-based server.

I think this is one cool thing that Arc could implement - and something
that would make it ideal for server programming.

There are of course lots of additional issues that need to be considered
before this can work, but I think this idea is sound and can make a real
difference for a certain class of programs (a class which is getting more
and more important it seems).

astor

Tim Bradshaw

Dec 14, 2001, 5:27:15 AM
ja...@unlambda.com (James A. Crippen) wrote in message news:<m3ofl2h...@kappa.unlambda.com>...

> Scheme is small but not easily
> compilable, and CL is large.

CL is large? What on earth are languages like Java / C++ then? Oh I
know, there's some theoretical minimal core of Java which is `small'
but I don't think you can count it as a java system unless it has at
least a Gb of libraries. Likewise perl, python (if not now, soon) &c
&c.

CL is really really small.

And this `it takes a long time to start' stuff. CLISP starts *really*
fast on my semi-antique Sun.

--tim

Scott McKay

Dec 14, 2001, 8:31:07 AM

"Andreas Bogk" <and...@andreas.org> wrote in message
news:87667cp...@teonanacatl.andreas.org...

> "Scott McKay" <s...@mediaone.net> writes:
>
> > Here's a suggestion: stop calling what languages such as Common Lisp
> > and Dylan provide "object-oriented programming". God forbid, don't
> > call it "function-oriented programming", because that will get other
> > people's knickers in a twist. Some years ago I suggested
> > "protocol-oriented programming".
>
> There's a solid scientific theory behind what we do, the
> lambda& calculus[0]. And it does count as both functional
> and object-oriented. So why hide?
>

Because Java, e.g., has staked out what it calls "object-oriented"
and the model is broken. We just disassociate ourselves with
the name. And anybody who ever attended an old Lisp & Functional
Programming Conference will remember how bitter the fights are
over the word "functional".

Sometimes retreating to find a better point of entry is a better way
to win a battle.

Summary: it's not hiding, it's just politics.


Carl Shapiro

Dec 14, 2001, 9:39:20 AM
Alexander Kjeldaas <astor...@fast.no> writes:

> I think a solution to this problem is to support multiple heaps. Each
> request should use a separate heap that can be teared down after the
> request has been serviced. This requires a tiny bit of compiler support,
> but I think it would make it possible to implement a server application
> with virtually no memory or garbage collection overhead compared to a
> server written in C.

On the LispM one can create an "area" which is explicitly marked as
temporary and later clear it when you are absolutely certain that you
are finished with the storage. However, this is an extremely dangerous
way to manually manage memory. You will find yourself in serious
trouble if anything pointed into that area prior to resetting it.

It is much safer (and simpler) to pool the objects you will be using
for each request.
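
A minimal sketch of the pooling ("resource") approach Carl suggests, using
nothing beyond standard Common Lisp; the names and the fixed buffer size are
illustrative only:

  (defvar *buffer-pool* '()
    "Buffers kept alive and reused across requests.")

  (defun allocate-buffer (&optional (size 4096))
    (or (pop *buffer-pool*)
        (make-array size :element-type '(unsigned-byte 8))))

  (defun free-buffer (buffer)
    (push buffer *buffer-pool*)
    nil)

  (defmacro with-pooled-buffer ((var) &body body)
    `(let ((,var (allocate-buffer)))
       (unwind-protect (progn ,@body)
         (free-buffer ,var))))

Each request borrows a buffer inside WITH-POOLED-BUFFER and returns it on the
way out, so storage is reused instead of being consed and discarded per request.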

Kent M Pitman

Dec 14, 2001, 9:55:27 AM
Carl Shapiro <csha...@panix.com> writes:

Indeed. The Lisp Machine experimented briefly with the idea of using
temporary areas for consing during compilation, figuring that all the results of
compilation went away afterward. The number of bugs this turned up in programs
was extraordinary and they backed out of it, in favor of the resource kind
of model Carl mentions.

Of course, the lispm used a dynamic flag to control consing. This
means that code that didn't know it was executing in a temporary area
ended up consing into the other heap. One might say a lexical flag was
better, but then the code would either be specialized for a given heap
(and you'd need duplicate code specialized to other heaps in order to
reuse the code) or else you'd need explicit dataflow in from an outer caller
saying what heap to use and every call to anything at all that consed would
need to take an extra arg saying what area to cons into. Each of these is
messy.

A good generational GC gives you all of this with none of these
hassles, even without resourcing. Generational GC is ideally designed
for things like server situations that drop all their pointers when
they are done. The amount of work required to reclaim storage in such
situations is very small and the application robustness is MUCH higher.
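
A toy illustration (not the actual LispM mechanism) of the failure mode Kent
and Carl describe: a dynamic flag decides where consing goes, an "innocent"
memoizing function ends up consing its cache entry into the temporary area,
and the cache is left dangling when the area is reset:

  (defvar *temporary-area* nil
    "When non-NIL, an area: a cell whose CAR lists everything consed in it.")
  (defvar *cache* (make-hash-table :test #'equal))

  (defun make-area () (cons '() :area))

  (defun area-cons (x y)
    "Cons, recording the new cell in the current temporary area, if any."
    (let ((cell (cons x y)))
      (when *temporary-area*
        (push cell (car *temporary-area*)))
      cell))

  (defun lookup (key)
    "Memoizes its result without knowing where it conses."
    (or (gethash key *cache*)
        (setf (gethash key *cache*) (area-cons key (length key)))))

  (defun reset-area (area)
    "Simulate reclaiming the area: clobber every cell consed in it."
    (dolist (cell (car area))
      (setf (car cell) :freed
            (cdr cell) :freed)))

  ;; (let ((*temporary-area* (make-area)))
  ;;   (lookup "foo")                  ; cache entry consed in the area
  ;;   (reset-area *temporary-area*))
  ;; (lookup "foo")                    ; => (:FREED . :FREED), a stale cache entry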

Marco Antoniotti

Dec 14, 2001, 10:19:54 AM

"Scott McKay" <s...@mediaone.net> writes:

I wholeheartedly agree. I would go as far as claiming that a good
time to retreat is when the "hype level" gets too high :)

Anyway, I really like the term "protocol oriented programming". I
would also adopt KMP's "identity oriented programming".

As an aside, there is still the question of "distributed object
oriented programming". Maybe, "distributed protocol programming" is
what we want?

Cheers

--
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group tel. +1 - 212 - 998 3488
719 Broadway 12th Floor fax +1 - 212 - 995 4122
New York, NY 10003, USA http://bioinformatics.cat.nyu.edu
"Hello New York! We'll do what we can!"
Bill Murray in `Ghostbusters'.

Kent M Pitman

Dec 14, 2001, 11:05:11 AM
Marco Antoniotti <mar...@cs.nyu.edu> writes:

> Anyway, I really like the term "protocol oriented programming". I
> would also adopt KMP's "identity oriented programming".

I think the word "programming" is what is really causing the problem.

Programming is an "internal" thing. That is, it's on what Java calls
"the private side". But from the outside, it doesn't really matter if
a thing is programmed one way or another, IMO. That is, you can black
box things as we've done with BUILT-IN-CLASS, and it's not a lot different
than a sealed standard class (a la Dylan).

From the outside, the issue is whether there is discovery and of what nature
that discovery is.

One might claim that Lisp and Java both have protocols, but that Lisp is
gf-centric and Java is interface-centric. It's a pity these two aren't
unified, frankly, since each has its virtues. C++ I don't see as having
protocols, really, and not even gf's, just overloading.

When I refer to "identity oriented", I'm less referring to how one programs
(which I regard as a personal matter) and more referring to the external
way one treats an object they didn't program.

And there's the question of reflectivity. Can a program ask what another
program has available, or must one consult the doc?

I'd be comfortable saying Lisp is a protocol-oriented, gf-centric,
call-by-identity, partially reflective, dynamic language.

I'd say Java is a protocol-oriented, class/interface-centric,
call-mostly-by-identity, partially reflective, half-dynamic/half-static
language.

I'd say C++ is a protocol-wannabe, class-centric, call-by-this-and-that,
overly static language.

> As an aside, there is still the question of "distributed object
> oriented programming".

For those who want to be dooped.

Though the acronym doop has already been used for dynamic object
oriented programming and is a good one to stay away from.

> Maybe, "distributed protocol programming" is what we want?

Dunno even what "distributed" contributes here.

cbbr...@acm.org

Dec 14, 2001, 12:26:43 PM

This points nicely to the "C10K" perspective
<http://www.kegel.com/c10k.html>.

There are multiple strategies, that may even be mixable, and the
"obvious" answers _aren't necessarily correct_.

I'd argue that having a Whopping Huge Shared Binary, in the context of
OSes that cope well with shared binaries, is a Huge Win if you want to
spawn new processes a lot.

If the CLOS or hash table support sits disused in a corner of a Big
Shared Binary, it doesn't do anything to hurt performance. The same
is true for just about any other bit of "linked-in" stuff you might
point to.

The place where "Big Language Specification" becomes a _problem_ for
spawning a process is when that leads to having to do a lot of
load-time work. If setting up CLOS requires running lots of load-time
code, and requires allocating a lot of memory, that's a "big lose."

This is exactly the thing that is irritating about making use of
interesting libraries with CL, Perl, Python, Scheme, or, for that
matter, C/C++.

-> If you use Perl for something, and are using lots of CPAN libs,
application startup time will include loading up and allocating
memory to those libs. The loaded modules are NOT going to be
shared; if you spawn 15 Perl processes that each load an LDAP
driver, you may have just 1 copy of the Perl binaries, but you've
got 15 copies of the LDAP wrappers.

-> If you run LaTeX, and are always using some custom macro set, that
macro set has to get loaded, consuming time and memory each time.
If there are multiple instances, the loaded macros repeat by
instance; they're not shared.

-> It takes my CMU/CL instance something like 15 seconds to load in
the CLG library. That means that application start time is
_guaranteed_ to have an initial latency of _at least_ 15 seconds,
and involves having a bunch of _dynamic_ memory allocation.

-> Netscape Navigator/Communicator loads up all sorts of font info,
user info, blah, blah, blah, which leads to bloated start up
times, and, again, no sharing of the outside-the-binary data.

You don't want to have a lot of respawnings of CL environments;
generating the user's environment seems to be expensive enough that
you don't want to do it too often.

This is where the Apache "mod_lisp" module came in; if loads get heavy
and a single CL instance doesn't cope well enough when hit by lots of
concurrent requests, it would probably be a good thing to spawn
multiple CL instances, and divide requests up amongst them.

.. And I love the serendipity of the occasionally spectacularly
entertaining randomly selected .signature; I couldn't have chosen a
better one if I tried :-).
--
(concatenate 'string "cbbrowne" "@ntlug.org")
http://www.ntlug.org/~cbbrowne/finances.html
"Real concurrency---in which one program actually continues to
function while you call up and use another---is more amazing but of
small use to the average person. How many programs do you have that
take more than a few seconds to perform any task?"
-- New York Times, 4/25/89

Alexander Kjeldaas

Dec 14, 2001, 1:07:56 PM
Carl Shapiro wrote:

I think several lisps have the ability to allocate such areas. However, I
have not heard of an implementation that has the compiler support
needed to make it safe to use them. I am not advocating using areas the
way you describe them.

astor

Daniel Barlow

Dec 14, 2001, 1:02:30 PM
cbbr...@acm.org writes:

> The place where "Big Language Specification" becomes a _problem_ for
> spawning a process is when that leads to having to do a lot of
> load-time work. If setting up CLOS requires running lots of load-time
> code, and requires allocating a lot of memory, that's a "big lose."

To be picky: allocation is cheap. Anonymous memory allocation might
even be as cheap as adjusting the value of the "next-free-word"
pointer. _Writing_ to that allocated memory (e.g. doing fixups) is
the expensive bit, because then your VM actually has to do some work.

KDE suffered badly from this at one time; due to C++ vtables they have
a zillion relocations in every program, which makes applications
take an age to start up. There's a resource about this at

http://www.suse.de/~bastian/Export/linking.txt


-dan

--

http://ww.telent.net/cliki/ - Link farm for free CL-on-Unix resources

Alexander Kjeldaas

Dec 14, 2001, 1:27:07 PM
Kent M Pitman wrote:
>
> Indeed. The Lisp Machine experimented briefly with the idea of using
> temporary areas for consing during compilation, figuring all the result of
> compilation went away after. The number of bugs this turned up in
> programs was extraordinary and they backed out of it, in favor of the
> resource kind of model Carl mentions.
>
> Of course, the lispm used a dynamic flag to control consing. This
> means that code that didn't know it was executing in a temporary area
> ended up consing into the other heap. One might say a lexical flag was
> better, but then the code would either be specialized for a given heap
> (and you'd need duplicate code specialized to other heaps in order to
> reuse the code) or else you'd need explicit dataflow in from an outer
> caller saying what heap to use and every call to anything at all that
> consed would
> need to take an extra arg saying what area to cons into. Each of these is
> messy.

It seems that the method of using a temporary area described is indeed
messy. You need to make sure that all code called conses in the same heap
in order to preserve the correctness of the system. Conceptually this
means one needs an extra parameter to all functions in the system - the
argument would indicate which heap should be used. However, since this is a
part of the lisp system, there are a variety of tricks that can be used to
virtually eliminate any overhead associated with this "extra" parameter. So
I don't agree with your last comment. Having an extra "argument" to every
function call is not necessarily expensive, nor messy.

>
> A good generational GC gives you all of this with none of these
> hassles, even without resourcing. Generational GC is ideally designed
> for things like server situations that drop all their pointers when
> they are done. The amount of work required to reclaim storage in such
> situations is very small and the application robustness is MUCH higher.
>

Could you give an example of a generational GC system which would compete
with a "C-program" in the scenario described in my previous post: an
event-driven server which has 200 outstanding requests at any given time?

astor

Marco Antoniotti

Dec 14, 2001, 1:53:22 PM

Kent M Pitman <pit...@world.std.com> writes:

> Marco Antoniotti <mar...@cs.nyu.edu> writes:
>
> > Maybe, "distributed protocol programming" is what we want?
>
> Dunno even what "distributed" contributes here.

This is in reference to an old article by Scott McKay about
"distributed objects" and how they seem to fit well with a current
practice, e.g. in Java. The issue is whether the gf-centric nature of
CL can be adapted to these "distribution" ideas.

Sorry for the brevity

Duane Rettig

Dec 14, 2001, 1:40:59 PM
Alexander Kjeldaas <astor...@fast.no> writes:

> I have no experience with server programming in lisp, but I would think
> that a fork-once-in-a-while model would be an efficient method for
> implementing a server in a garbage-collected language. In the
> fork-once-in-a-while model, you have a "tree-shaked" mother process and
> fork off a pool of processes that handle connections. Each process can
> handle more than one connection, but when the processes have consed too
> much, you kill them off and fork a new process instead of starting garbage
> collection.

We've had many discussions at Franz about the many aspects of server
technology that can be implemented. The "poor man's global-gc" (i.e.
throw the lisp away and start over) is a serious idea. It is essentially
an extrapolation of the idea that the most efficient garbage-collectors
tend not to collect garbage, but instead they leave the garbage behind
and collect the good stuff!

Throwing away a lisp process is one way to do gc globally in a generational
situation, but there are other ways. We've had good measurements so far
from a new concept we introduced into Allegro CL 6.0, where we allow you
to "close" sections of oldest heap space so that the global-gc doesn't
bother trying to collect from those areas. In essence, you would build
your application, globally gc it and size the areas as low as possible,
and then "close" the application so that it never gets globally-gc'd.
Then, when the application creates new workspaces and they become tenured
to oldspaces, those "open" oldspaces are the only ones that get globally
gc'd. Such a global-gc is much faster than gc-ing the whole oldspace.

Another feature of closed old areas is that once such a heap is dumped
out, it can be shared by multiple program invocations on any operating
system which allows copy-on-write mapping. The reason for this is that
those closed old-areas tend not to change at all, and so the "copy"
portion of the copy-on-write is invoked less. This means more efficient
startup times for second, third, ... lisps, and more sharing of the
heap (as well, of course, as the obvious sharing of the shared-libraries
and the text section of the main). All in all, it makes it feasible
to get very good efficiency by using multiple lisp invocations.

> In this system, you have to explicitly communicate with the mother process
> whenever you want to change global state (session management etc. for a
> http server). In such a system you could also take all kinds of shortcuts
> with the memory allocator since you don't want to garbage collect in the
> slave process.

Yes, such communication is needed for coordination of multiple processes.
We've come up with a new Lisp RPC module, which allows one to hide
most of the low-level communication grunge, and instead allows communication
via lisp calls. It was based on technology that was developed for
efficient communication with Java. The developer who did this figured
that since we were doing these great things to communicate with other
languages, it would also be nice to have multiple Lisp processes be able
to talk to each other at a high level as well!

> Now, what I am saying is that you might be best off by using fork() to
> overcome perceived limitations of garbage collection for implementation of
> server processes. A server usually receives a request, processes that
> request, delivers a result, and starts from scratch. Another way to say
> this is that most lisps (correct me if I am wrong) don't support the most
> efficient method for setting up and tearing down memory heaps for server
> applications. This might be a grave accusation given that I stated that I
> have no experience in this area, but read on :-)

Well, yes, this is a grave accusation. It may be true, but before you
make such statements, you should solidify your definition of "the most
efficient method for setting up and tearing down memory heaps for server
applications". I'm not sure if you are figuring that it's simply
obvious what this most efficient method is, or if you are leading to
your conclusion in this article about multiple heaps, but it is not
clear to me what "the most efficient method" means to you.

> A generational garbage collector probably works fairly well for a server
> application. Most objects live fairly short, and if you garbage collect
> between requests, you'll get decent performance. However, as you suggest
> above, the most efficient method to implement most server applications is
> as an event-driven state machine. In such an implementation, you can
> expect to have, say 200 simultaneous requests. In this situation, there is
> no point in time where a generational garbage collector will work well. If
> you garbage collect after a request has finished, you will only reclaim
> 0.5% of the local heap. In order to get decent performance out of the
> generational garbage collector you will have to garbage collect every, say
> 200 requests which would reclaim on average 50% of the local heap. The
> downside of this is that you would use twice the memory compared to a
> server written in C.

The tuning of a generational gc is an art, and we tend to take them on
a case-by-case basis. We also try to make such art available to
programmers who will need it without bogging others down in details,
by providing as much control as possible with good defaults. In order
to test your theory, we would have to have a real test case. I think
that it is dangerous to make generalizations as above without actually
trying things out; you may be surprised in both directions - some
assumptions you've made about how bad things will be will be wrong, and
some assumptions about how good things will be will be wrong. It
largely depends on the application itself, and on which of the
application's parameters "hits the wall" before others.

> I think a solution to this problem is to support multiple heaps.

[ ...]

Others have posted some history in response to the idea of multiple heaps.
In addition, a fundamental deterrent to having multiple heaps is
a Fragmentation Problem: in a linear address-space which is almost
full nowadays (i.e. in 32-bit architectures), it is hard to predict
the best sizes for such multiple-space memory segments because such
address-space is so precious. For example, in a threaded environment,
where each thread has its own stack, how many threads can you reasonably
fit into your process? You might be surprised at how fast the address
space fills up just to accommodate stacks. Perhaps 64-bits will give us
some breathing room for a generation or so...

--
Duane Rettig Franz Inc. http://www.franz.com/ (www)
1995 University Ave Suite 275 Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253 du...@Franz.COM (internet)

cbbr...@acm.org

Dec 14, 2001, 2:04:09 PM
Daniel Barlow <d...@telent.net> writes:
> cbbr...@acm.org writes:
> > The place where "Big Language Specification" becomes a _problem_
> > for spawning a process is when that leads to having to do a lot of
> > load-time work. If setting up CLOS requires running lots of
> > load-time code, and requires allocating a lot of memory, that's a
> > "big lose."

> To be picky: allocation is cheap. Anonymous memory allocation might
> even be as cheap as adjusting the value of the "next-free-word"
> pointer. _Writing_ to that allocated memory (e.g. doing fixups) is
> the expensive bit, because then your VM actually has to do some
> work.

I guess I thought it would go without saying that this wasn't merely
doing a malloc(BIGNUM); I was certainly thinking more along the lines
of (for instance) a CLOS implementation going off and instantiating a
whole horde of (say) metaobject protocol objects.

The pathological case would be the scenario where you build the
(sometimes-often-disputed) "minimal CL core," and that core then loads
in, as a combination of load-time/run-time, the rest of the CL system.

In effect, I'm not at all concerned about the VM issues; I'm concerned
about the work involved computing whatever it is that's getting put
into VM.

> KDE suffered badly from this at one time; due to C++ vtables they
> have a zillion relocations in every program, which makes
> applications take an age to start up. There's a resource about this
> at

> http://www.suse.de/~bastian/Export/linking.txt

I've noticed that no KDE application "boots up" quickly; the article
is an interesting cautionary analysis that's worth a read even if
you're not doing C++...

It would probably be a neat thing if they could set up some sort of
"lazy library-loading" scheme so that (for instance) the "MS Word
import module" would get loaded either on demand, when the user
actually asked for it, or based on some sort of "lazy loader" thread
that would load in extra libraries a bit later. Either approach
represents rather a "hack," and could well be pretty
platform-dependent. Which might be OK if KDE were exclusively a Linux
thing, but they do try to make sure it runs elsewhere...
--
(reverse (concatenate 'string "ac.notelrac.teneerf@" "454aa"))
http://www.ntlug.org/~cbbrowne/oses.html
Philosophy is a game with objectives and no rules.
Mathematics is a game with rules and no objectives.

Scott McKay

Dec 14, 2001, 3:28:40 PM

"Alexander Kjeldaas" <as...@fast.no> wrote in message
news:9vdg9u$s6c$1...@news.ost.eltele.no...

> Could you give an example of a generational GC system which would compete
> with a "C-program" in the scenario described in my previous post: an
> event-driven server which has 200 outstanding requests at any given time?

If the C program and the program using a generational GC both allocate
roughly the same amount of memory and don't have memory leaks, a
modern generational GC will whup the pants off malloc/free. Why?
Because GC gets to amortize the cost of calling 'free' over many calls
to malloc.

This effect has been amply documented.


Scott McKay

Dec 14, 2001, 3:32:19 PM

"Marco Antoniotti" <mar...@cs.nyu.edu> wrote in message
news:y6cr8px...@octagon.mrl.nyu.edu...

>
> Kent M Pitman <pit...@world.std.com> writes:
>
> > Marco Antoniotti <mar...@cs.nyu.edu> writes:
> >
> > > Maybe, "distributed protocol programming" is what we want?
> >
> > Dunno even what "distributed" contributes here.
>
> This is in reference to an old article by Scott McKay about
> "distributed objects" and how they seem to fit well with a current
> practice, e.g. in Java. The issue is whether the gf-centric nature of
> CL can be adapted to these "distribution" ideas.
>

Yeah, for me it's a little easier to get my head around "distributed objects".
Distributed generic-functions-with-multi-methods makes my brain hurt.
I've been doing some reading and thinking about this, but I don't have
anything to say that would clarify it. The Java "enterprise bean" model
actually works quite well for some things, but there are other parts of
it that are hugely confusing.

I found the references to Java Spaces and Kali helpful, if anyone else is
interested.


Carl Shapiro

Dec 14, 2001, 4:22:26 PM
Alexander Kjeldaas <as...@fast.no> writes:

> I think several lisps have the ability to allocate such areas. However, I
> have not heard of an implementation that has the compiler support
> needed to make it safe to use them. I am not advocating using areas the
> way you describe them.

The problems one often runs into with these sorts of allocation
schemes are not easily solved even with help from the compiler. For
example, how do you plan on handling common operations that absolutely
have to side effect the environment such as I/O and logging? What
about asynchronous operations that hold onto a buffer but may not
complete by the time your form unwinds? I suppose you could get away
with wrapping only small extents of code with a "cons over here
instead" form. Nevertheless, I cannot see how one would make
effective use out of this magic feature without submitting to a
painfully rigorous coding convention. However, if you have given any
thought to solving these screw cases, I would very much like to hear
about it.

Kent M Pitman

Dec 14, 2001, 4:36:05 PM
Carl Shapiro <csha...@panix.com> writes:

> Alexander Kjeldaas <as...@fast.no> writes:
>
> > I think several lisps have the ability to allocate such areas. However, I
> > have not heard of an implementation that has the compiler support
> > needed to make it safe to use them. I am not advocating using areas the
> > way you describe them.
>
> The problems one often runs into with these sorts of allocation
> schemes are not easily solved with help from the compiler. For
> example, how do you plan on handling common operations that absolutely
> have to side effect the environment such as I/O and logging?

Exactly. Or program-level caching, debugging traces, etc.

> What
> about asynchronous operations that hold onto a buffer but may not
> complete by the time your form unwinds?

Not to mention code to manage keyboard interrupts, synchronous
condition handlers and other closures passed down into this code from
elsewhere in the system, daemon programs like "if-added" daemons that
wake up and run when very general-purpose operations are done, method
combination issues, etc.

> I suppose you could get away
> with wrapping only small extents of code with a "cons over here
> instead" form. Nevertheless, I cannot see how one would make
> effective use out of this magic feature without submitting to a
> painfully rigorous coding convention. However, if you have given any
> thought to solving these screw cases, I would very much like to hear
> about it.

Right.

And keep in mind throughout that there is substantial hidden cost in
(a) proving such weird dataflows would work, (b) retooling your system
to use these specialized techniques, (c) discussions within your
company about the fact committing to these techniques now means you
can't do a variety of extensions in straightforward ways and must seek
workarounds, (d) lack of real-time response to customer requests
because working within such specialized schemes can be very tricky to
write and can involve very complex QA to be sure you've gotten right,
(e) more support dollars to manage because bugs are hard to understand
when reported and hard to track down without good reports, and on and
on.

Michael J. Ferrador

Dec 14, 2001, 5:22:19 PM
has anyone had experience porting lisp to (only 2 examples I know)

PDP-11/ >=40? (I & D (2) space)

370/XA ((8 bits of) Dataspaces, (1?) Hyperspace)

The dataspace(s) is a 0 to 2^n space, no OS, associated
in the MMU to a regular address space, addressable by
extra bit(s) in the instruction or addressing mode.

LispMe (scheme) moved some (vector?) datatypes to their own
Motorola DragonBall 68K +/- 32K relative branch segments

> Kent M Pitman wrote:
> >
> > Indeed. The Lisp Machine experimented briefly with the idea of using
> > temporary areas for consing during compilation, figuring all the result of
> > compilation went away after. The number of bugs this turned up in
> > programs was extraordinary and they backed out of it, in favor of the
> > resource kind of model Carl mentions.
> >
> > Of course, the lispm used a dynamic flag to control consing. This
> > means that code that didn't know it was executing in a temporary area
> > ended up consing into the other heap. One might say a lexical flag was
> > better, but then the code would either be specialized for a given heap
> > (and you'd need duplicate code specialized to other heaps in order to
> > reuse the code) or else you'd need explicit dataflow in from an outer
> > caller saying what heap to use and every call to anything at all that
> > consed would need to take an extra arg saying what area to cons into.
> > Each of these is messy.
> >

Alexander Kjeldaas <as...@fast.no> wrote in message
news:9vdg9u$s6c$1...@news.ost.eltele.no...

> It seems that the method of using a temporary area described is indeed
> messy. You need to make sure that all code called conses in the same heap
> in order to preserve the correctness of the system. Conseptually this
> means one needs an extra parameter to all functions in the system - the
> argument would indicate which heap should be used. However, since this is a
> part of the lisp system, there are a variety of tricks that can be used to
> virtually eliminate any overhead associated with this "extra" parameter. So
> I don't agree with your last comment. Having an extra "argument" to every
> function call is not necessarily expensive, nor messy.

on 370/XA dataspaces (8 bit select of 31/32 ? bit spaces)
make the type number part of the pointer, to the right space?

or should each namespace / package get it's own address space?

---

any other ISA's have this feature, that I didn't know about?

might be on the way out with 64 bit CPU's, but if there is a programming
benefit, how does an 8 / 56 bit mode sound?

Daniel Barlow

Dec 14, 2001, 5:22:32 PM
Alexander Kjeldaas <astor...@fast.no> writes:

> I think a solution to this problem is to support multiple heaps. Each
> request should use a separate heap that can be teared down after the
> request has been serviced. This requires a tiny bit of compiler support,
> but I think it would make it possible to implement a server application
> with virtually no memory or garbage collection overhead compared to a
> server written in C.

This seems so obvious that there is probably something horribly wrong
with it, but what about a generational GC with a per-request nursery.
When the request is done, you GC that heap, copying the live data to
the next generation. The cost of a copying GC is mostly related to
what you keep rather than what you leave behind, so in the case that
you throw everything away, this is probably not going to hurt too
much. If when you come to measure performance you find that you're
not throwing as much away as you wanted to, you have to fix things so
that you are - i.e. the system may get slower, but it remains safe.

Certainly it's going to require less compiler support than making all
global variables copy-on-write. And you still have the option, when
you need it, of manipulating global state.

I have a feeling that I've read a paper (could have been Henry Baker?)
describing something similar where he puts the nursery on the stack
and transports it to safety when each function exits. It was a few
years ago and I may be misremembering, though.

Erann Gat

Dec 14, 2001, 5:14:49 PM
In article <YhtS7.17056$5W5.7...@typhoon.ne.mediaone.net>, "Scott
McKay" <s...@mediaone.net> wrote:

> If the C program and the program using a generational GC both allocate
> roughly the same amount of memory and don't have memory leaks, a
> modern generational GC will whup the pants off malloc/free. Why?
> Because GC gets to amortize the cost of calling 'free' over many calls
> to malloc.
>
> This effect has been amply documented.

References please?

E.

Kent M Pitman

unread,
Dec 14, 2001, 6:04:18 PM12/14/01
to
"Michael J. Ferrador" <n2...@orn.com> writes:

> has anyone had experience porting lisp to (only 2 examples I know)
>
> PDP-11/ >=40? (I & D (2) space)
>
> 370/XA ((8 bits of) Dataspaces, (1?) Hyperspace)
>
> The dataspace(s) is a 0 to 2^n space, no OS, associated
> in the MMU to a regular address space, addressable by
> extra bit(s) in the instruction or addressing mode.
>
> LispMe (scheme) moved some (vector?) datatypes to their own
> Motorola DragonBall 68K +/- 32K relative branch segments

There may be specific situations for which this works, like the large
data tables one uses when decoding a gif that is not shared with
anyone and that is discarded upon being done with the gif. But in
general the problem isn't the bits at all. That is eminently doable.
The problem is the linguistic effort of specifying which program uses
which bits. Lisp is a language that is SPECIFICALLY not about
keeping track of data layout, and it is completely counter to the
entire design of the language to have to think about this. It would
perturb every aspect of every program, IMO. The attempts I have seen
to do this by some hidden/automatic means have failed utterly, and I
think for this reason: The reason it works in C/malloc is because
people want to, for each and every program, think about memory
management. That works, if you like that kind of thing--the "fast GC"
of malloc/free is paid for in the pain and suffering of every
programming minute. Not thinking about memory management works for
Lisp and GC; the freedom to program incurs a GC overhead that we try
to keep small but is nevertheless always present. But hybrids "not
thinking but still doing" are doomed to failure because they expect
magic to happen. There is no magic in the Lisp approach nor the C
approach. There is magic in the hybrid because it expects the best of
both worlds doing the work of neither.

I don't see what this would buy.



> or should each namespace / package get it's own address space?

No. *package* is dynamic information and would cause bad side-effects on
code that was tested with a different prevailing *package*. If you mean the
package of the symbol that is the definition name, that is arbitrary and would
not be suitable because of sharing of symbols among packages, because packages
share code (e.g., through macros) so that there is no necessary relation between the
package of the function called and the package of the data, and it begs the
question of whose package is active in:
  (let ((*package* (find-package 'alpha)))
    (funcall (let ((*package* (find-package 'beta)))
               (compile nil
                        (let ((*package* (find-package 'gamma)))
                          `(lambda () ,(read-from-string *some-definition*)))))))
where *some-definition* holds perhaps a string like:
"(delta:some-macro (epsilon:some-function zeta:*some-special* 'eta:some-constant))"
What on earth would it mean to have this function partition its consing
by package?

> any other ISA's have this feature, that I didn't know about?
>
> might be on the way out with 64 bit CPU's, but if there is a programming
> benefit
> how does an 8 / 56 bit mode sound?

What is all this discussion of bits??? There are a million ways to implement
anything we decide the language should mean. Our problem is not absence
of low-level substrate to support cool ideas, it is absence of linguistic
clarity to know what is being asked for by the programmer such that we don't
gc an area that might be pointed into by other areas we didn't bother
to check or other such horror stories like that. It's all well and good
to have hairy bookkeeping schemes but ultimately what you say in the language
has to really clearly keep the programmer from getting into accidental
trouble. I can't imagine a discussion that significantly relies on any
discussion of bits, bytes, or pointers will improve matters. The discussion
that needs to happen has to talk about dataflow through explicit parameter
passing, dynamic and static variable bindings, etc., as well as general
theories of what compilers may know or infer from text in a program, and how
users can plainly express intent in a way that will not get them into trouble.
Bits and bytes ultimately will underlie any such implementation, but no
discussion of bits and bytes seems to me to elucidate such an implementation.

JMO.

Bruce Hoult

unread,
Dec 14, 2001, 6:06:06 PM12/14/01
to
In article <87heqtw...@noetbook.telent.net>, Daniel Barlow
<d...@telent.net> wrote:

> Alexander Kjeldaas <astor...@fast.no> writes:
>
> > I think a solution to this problem is to support multiple heaps. Each
> > request should use a separate heap that can be teared down after the
> > request has been serviced. This requires a tiny bit of compiler
> > support,
> > but I think it would make it possible to implement a server application
> > with virtually no memory or garbage collection overhead compared to a
> > server written in C.

This is actually how a number of Macintosh programs I've written (in C++
or Pascal) work.

On classic MacOS (nearly) the whole of memory is a big heap, and the OS
creates sub-heaps for each Application running. You can do the same
yourself. Allocate some memory using NewPtr(), initialize it as a heap,
set it to be the current heap, allocate away, and then when you're done
just set the current heap back to the main Application heap and
DisposePtr() your temporary heap. You can even have a number of them at
the same time -- e.g. per request or per thread -- and just switch the
current heap each time you change threads or requests. Instead of
disposing of an old heap, you might just reinitialize it. It's all easy.

Of course you don't get any help from GC (what GC?) but it's no worse
than any other malloc/free style programming.


> I have a feeling that I've read a paper (could have been Henry Baker?)
> describing something similar where he puts the nursery on the stack
> and transports it to safety when each function exits. It was a few
> years ago and I may be misremembering, though.

The "Chicken" Scheme compiler uses this technique. The code is CPS
converted and translated to C, so it consists of a collection of C
functions that *never* return and therefore use the C stack in a
continually growing, linear manner. The C stack is the GC nursery and
when the stack overflows a GC is done and then the stack is unwound
using longjmp(). For best performance the C stack is limited to a
fixed size determined experimentally by a test program when you install
Chicken. On my 700 MHz Athlon it chooses 8 KB. On an 867 MHz G4 Mac it
chooses 128 KB.

-- Bruce

Nils Goesche

unread,
Dec 14, 2001, 12:30:43 PM12/14/01
to
Kent M Pitman <pit...@world.std.com> writes:

> I'd be comfortable saying Lisp is a protocol-oriented, gf-centric,
> call-by-identity, partially reflective, dynamic language.
>
> I'd say Java is a protocol-oriented, class/interface-centric,
> call-mostly-by-identity, partially reflective, half-dynamic/half-static
> language.
>
> I'd say C++ is a protocol-wannabe, class-centric, call-by-this-and-that,
> overly static language.

I know (parts of) Java, C++ and CLOS; but could somebody be so kind
and enlighten me about what ``protocol'' means in this context (or
MOP)? I always thought a protocol is something like Q.931 or X.75 :-)

Regards,
--
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x42B32FC9

Thomas F. Burdick

unread,
Dec 14, 2001, 7:47:17 PM12/14/01
to
g...@jpl.nasa.gov (Erann Gat) writes:

Have you done your due diligence here? If you spent 15, 20 minutes
looking for papers, I'm certain you'd find the references you need.
If you're really curious/doubtful, put in the effort. If you still
can't find any papers, *then* come back and ask other people to do the
work for you.

BTW, a simple thought experiment should suffice to convince doubters
that this is at least *possible*: When you do manual deallocation, you
recursively trace all the dead objects, tracing and freeing their
children. A GC only traces the live objects. Since a properly
behaving gen-gc collects when the 0 generation is mostly garbage, it
should do less tracing than manual management. Of course, this leaves
out a ton of other factors, but since most people's doubts I've heard
about GC efficiency center around tracing...

--
/|_ .-----------------------.
,' .\ / | No to Imperialist war |
,--' _,' | Wage class war! |
/ / `-----------------------'
( -. |
| ) |
(`-. '--.)
`. )----'

Bruce Hoult

unread,
Dec 14, 2001, 7:56:45 PM12/14/01
to
In article <gat-141201...@eglaptop.jpl.nasa.gov>,
g...@jpl.nasa.gov (Erann Gat) wrote:

You don't even need generational GC. Take any random C program that
uses malloc/free and link in Boehm and "#define free(p)". I've done
this with quite a number of (proprietry) programs and never yet seen a
performance decrease, and often substantial performance increases. Not
to mention bugs and leaks disappearing.

Of course it may just be that Boehm is simply better written than your
average OS or compiler vendor's malloc/free, but that's hardly an
argument against using it.

-- Bruce

Thomas F. Burdick

unread,
Dec 14, 2001, 8:32:30 PM12/14/01
to
Alexander Kjeldaas <astor...@fast.no> writes:

> I think a solution to this problem is to support multiple heaps. Each
> request should use a separate heap that can be teared down after the
> request has been serviced. This requires a tiny bit of compiler support,
> but I think it would make it possible to implement a server application
> with virtually no memory or garbage collection overhead compared to a
> server written in C.

I agree with the idea that each request should get its own pool.
Apache does its memory management this way, and it's a pretty
reasonable thing to do for the C world. I'm working on a (server)
application (in Lisp) that manages memory in a similar way. Its
"pools", though, are really ephemeral generations. In a special case,
they can have no roots, and "collection" just frees the whole pool
without checking for pointers from older generations.

> The operations I think are needed to support multiple heaps are: an
> operation to create heaps with various characteristics, and the ability to
> call a function so that consing happens within the given heap for
> everything that is done within that function (recursively).

That's not how we're doing it. We have a dynamic variable that
contains the current pool, and the ability to make new pools.
Primarily, we use this to better identify generations (at the end of
servicing a request, there's almost no memory retained -- but this is
a function of the dynamic contours of the program, not necessarily of
the number of bytes consed). This isn't managed directly by the
programmer, though. Conceptually, I put the primitives in the same
realm as GO: they're powerful, low-level tools that I only want to
use when building the tools that I'll actually use to write the
application.

> Example:
> (let ((my-heap (make-heap :no-external-references-allowed)))
>   (call-with-heap my-heap #'my-func))

So, what you're trying to express here is that everything you're
allocating will have dynamic-extent. Funny, my compiler recognizes
this by looking at dynamic-extent declarations. And, yeah, they're a
little bit of a pain to type all the time, but that's good. I want
the compiler to assume that everything is indefinite extent, unless I
promise it otherwise. Forcing the programmer to think before making
promises as grand as limiting the extent of an object is IMO good.

> In order to take full advantage of the ability to create heaps with various
> characteristics, I would need a little help from the memory system and the
> compiler. For instance, I would like to create a heap with the invariant
> that no object outside the heap (except for the evaluation stack) should be
> able to refer to an object inside the heap. This is a fantastic heap
> because it allows me to allocate objects fast, and it allows me to throw
> away the whole heap after I am done with it - exactly the thing I want for
> server programming.

I agree, but I think you're moving in to dangerously manual territory
with the way you're doing this. And, FWIW, you really don't need a
new language for this. A new compiler, probably, but there's nothing
you want that can't be done with simple extensions to CL.

> The compiler support that I need for this heap is that when I do
> (call-with-heap my-heap #'my-func)
> above, the compiler should make a copy of all stack variables, and it
> should make all global variables copy-on-write. That way it impossible for
> my-func and functions called by it to "communicate" with the outside world.
> To communicate with the outside world, a message passing interface could be
> used, just as we would do in a fork()-based server.
>
> I think this is one cool thing that Arc could implement - and something
> that would make it ideal for server programming.

fork(), shmork(). C servers need to kill their children off from time
to time because they're full of memory leaks. We shouldn't need to
deal with this in Lisp. All the overhead of OS preemption is *waaay*
too much to pay.

Thomas F. Burdick

unread,
Dec 14, 2001, 9:07:27 PM12/14/01
to
cbbr...@acm.org writes:

> -> It takes my CMU/CL instance something like 15 seconds to load in
> the CLG library. That means that application start time is
> _guaranteed_ to have an initial latency of _at least_ 15 seconds,
> and involves having a bunch of _dynamic_ memory allocation.

Why don't you just dump a core with it already loaded? It *would*
take me a long time to start up Hemlock, Garnet, etc., except that I
only have to do it once per CMUCL upgrade. I do love my
kitchen-sink.core file :-).

Erik Naggum

unread,
Dec 14, 2001, 10:01:16 PM12/14/01
to
* Nils Goesche <car...@cartan.de>

| I know (parts of) Java, C++ and CLOS; but could somebody be so kind
| and enlighten me about what ``protocol'' means in this context (or
| MOP)? I always thought a protocol is something like Q.931 or X.75 :-)

Those are _communications_ protocol specifications. There are many kinds
of protocols, but they are all concerned with distributed synchronized
state transition and context-dependent information exchange. The reasons
for specifying communications protocols with such heavy-handed machinery
are (1) that there is high value in conserving bandwidth, and (2) that
they must be implementable as completely separate projects. This is not
different from specifying functions and data types and state transitions
in a single software project. For instance, you can only call read-line
on an open character stream, you can call close only on an open stream,
etc. These specifications are entirely equivalent to what you must do in
a communications protocol.

In essence, a remote communications peer may be regarded as an "object"
to which one sends "messages" in order to influence or interrogate its
state. The "information hiding" aspect of some object-oriented models
acutely parallels the design of communicating peers, about which one
should not know too much and whose state one really should not force to
change outside of the protocol. What an object or module or whatever
exports may be seen
as the elements of its (interaction) protocol.

The distinction between a protocol and an API is that the protocol will
make states and state transitions explicit, while they are usually hidden
in commentaries and error conditions in API. Something designed and
implemented as a _protocol_ will usually be much more orderly, simply
because the number of states and state transitions are reduced when they
are made explicit and visible to the programmer.

E.g., global variables scare many programmers because one may change the
state of the whole system so easily and it is so hard to restore state.
A global variable is hard to model accurately, but this is in sharp
contrast to the special variable, whose bindings may be regarded as
arguments to the functions called inside them: each function propagates
the entire set of global bindings in effect when the call is made, and
their unbinding is modeled accurately. However, most protocols that invent
variable
exchange fail to implement nesting behavior properly. In fact, nesting
and recursion appear to be uncommon in protocols designed by people who
prefer a "flat" state space. Numerous programming techniques would be so
much more _conscious_ if they were designed as parts of protocols.

///
--
The past is not more important than the future, despite what your culture
has taught you. Your future observations, conclusions, and beliefs are
more important to you than those in your past ever will be. The world is
changing so fast the balance between the past and the future has shifted.

Frank A. Adrian

unread,
Dec 15, 2001, 1:50:06 AM12/15/01
to
Scott McKay wrote:

> Because Java, e.g., has staked out what it calls "object-oriented"

This can be validated by the fact that of 27 papers in this year's OOPSLA
proceedings, 25 were directly about Java. The other two were on a VM for UML
and on a graphical display notion for classes.

> and the model is broken.

And this can be validated by the fact that most of those papers were on
trying to get Java to do things that Lisp has been able to do for the last
15 or so years...

faa

Erann Gat

unread,
Dec 15, 2001, 3:24:05 AM12/15/01
to
In article <xcv8zc5...@apocalypse.OCF.Berkeley.EDU>,

t...@apocalypse.OCF.Berkeley.EDU (Thomas F. Burdick) wrote:

> g...@jpl.nasa.gov (Erann Gat) writes:
>
> > In article <YhtS7.17056$5W5.7...@typhoon.ne.mediaone.net>, "Scott
> > McKay" <s...@mediaone.net> wrote:
> >
> > > If the C program and the program using a generational GC both allocate
> > > roughly the same amount of memory and don't have memory leaks, a
> > > modern generational GC will whup the pants off malloc/free. Why?
> > > Because GC gets to amortize the cost of calling 'free' over many calls
> > > to malloc.
> > >
> > > This effect has been amply documented.
> >
> > References please?
>
> Have you done your due diligence here? If you spent 15, 20 minutes
> looking for papers, I'm certain you'd find the references you need.

Well, I did a Google search on "generational garbage collection malloc
free whup pants" and got no results. :-)

I left out the "whup pants" part and "I'm feeling lucky" takes you to the
Boem GC page, where it says that the performance is generally "comparable"
and "may be substantially [better]" in multi-threaded applications. Not
exactly unqualified support for a general claim of pants-whupping.

But what I was really after was not documentation for the claim per se,
but rather finding out what Scott considered "ample documentation",
whether his claim had any implicit restrictions, and what he meant
(quantitatively) by "whup the pants off." I thought simply asking for
references would make the inquiry a little more concise, but I guess some
of the intent got lost.

E.

Scott McKay

unread,
Dec 15, 2001, 10:45:34 AM12/15/01
to

"Scott McKay" <s...@mediaone.net> wrote in message
news:YhtS7.17056$5W5.7...@typhoon.ne.mediaone.net...

By the way, the site at the company I work at -- www.hotdispatch.com --
is written in Lisp, and we have app servers that listen on 100+ sockets
at a time. We cycle the servers about once a week, but that's usually
because it's the simplest way to ensure they have all patches loaded, etc.
We don't have memory leak or performance problems.

We're using Xanalys (nee Harlequin) Lisp on Netras.

Alexander Kjeldaas

unread,
Dec 15, 2001, 12:22:56 PM12/15/01
to
Kent M Pitman wrote:

> Carl Shapiro <csha...@panix.com> writes:
>
>> Alexander Kjeldaas <as...@fast.no> writes:
>>
>> > I think several lisps have the ability to allocate such areas.
>> > However, I have not heard of an implementation that has the compiler
>> > support
>> > needed to make it safe to use them. I am not advocating using areas
>> > the way you describe them.
>>
>> The problems one often runs into with these sorts of allocation
>> schemes are not easily solved with help from the compiler. For
>> example, how do you plan on handling common operations that absolutely
>> have to side effect the environment such as I/O and logging?
>
> Exactly. Or program-level caching, debugging traces, etc.
>

Let me backtrack a little and repeat what I believe we are discussing here.
As I understand it, you are critical of the idea that a compiler/lisp
system can provide support for a separate heap (heap A) so that it would be
impossible to create a pointer in some other heap in the system that would
point to an object within heap A.

The above does _not_ mean that one has to avoid side effects. Let me give
an analogy which works pretty well for the particular type of heap I used
in my example (the no-external-references-allowed heap).

Imagine two UNIX processes A and B. A has forked off B and is B's parent.
Now B in effect has its own heap which is independent from A's heap. When
B reads from its heap, it will in effect read from the same physical pages
as A's heap. When B tries to write to its own heap, it will be denied
this. The OS will copy the page B tried to write to and make it B's page.
If B wants to change A's heap, it can do so, but it has to communicate with
A using some kind of RPC protocol. Using RPC, B can change A's heap at
will, but it cannot do one thing - it cannot make A's heap refer to an
object in B's heap. Similarly, A can change B's heap.

This analogy is not quite right - Unix processes execute independently of
each other. Please ignore that for a moment and imagine that you could
have two processes which executed as one. Process A would call process B
using RPC, and B could call A.

When you do an RPC call, arguments are marshalled. This is what I meant
with a "message passing interface" that would have to be used when
communicating between heaps. Say a function is executing in a
no-external-references-allowed heap context and wants to call another
function that should use the main heap context. Thec all would have to be
made in a manner such that arguments would be "marshalled" and no
references to the no-external-references-allowed heap could get through.
Such an "RPC call" doesn't need to be a normal call, you would explicitly
specify when you wanted to modify the global heap.

It is possible to imagine problems doing such a call if the argument is a
very complex structure. You could get into "deep copy" and other
problems. It is not necessary to make this complicated. A system could
only allow simple objects to be passed as arguments: numbers, characters,
and strings would probably be enough.

This means that certain parts of the system such as I/O and logging might
have an "RPC" wrapper so that all parts of the system had the same view of
the global state, but I don't see that as a major problem.

>> What
>> about asynchronous operations that hold onto a buffer but may not
>> complete by the time your form unwinds?

I am not sure I understand this particular problem.

>
> Not to mention code to manage keyboard interrupts, synchronous
> condition handlers and other closures passed down into this code from
> elsewhere in the system, daemon programs like "if-added" daemons that
> wake up and run when very general-purpose operations are done, method
> combination issues, etc.
>

Keyboard interrupts: Can they not run as any other code in the main heap
context?

Condition handlers: don't let anything other than simple objects be passed
as arguments to conditions that cross a "heap-boundary" (a place in the
call-stack where one function uses one heap context and another function
uses another heap context). Or else you must make sure you set up a
handler within the no-external-references-allowed heap. In most cases,
this would be a pretty natural thing to do.

Closures passed down: You are not allowed to pass anything but simple
objects as arguments when calling a function which uses another heap
context.

Daemon programs: A closure created in a no-external-references-allowed heap
could be called as long as you only pass simple objects as arguments or
have a way of marshalling the arguments.

I am not sure what issues related to method combination you refer to.

>> I suppose you could get away
>> with wrapping only small extents of code with a "cons over here
>> instead" form. Nevertheless, I cannot see how one would make
>> effective use out of this magic feature without submitting to a
>> painfully rigorous coding convention. However, if you have given any
>> thought to solving these screw cases, I would very much like to hear
>> about it.
>
> Right.
>
> And keep in mind throughout that there is substantial hidden cost in
> (a) proving such weird dataflows would work, (b) retooling your system
> to use these specialized techniques, (c) discussions within your
> company about the fact committing to these techniques now means you
> can't do a variety of extensions in straightforward ways and must seek
> workarounds, (d) lack of real-time response to customer requests
> because working within such specialized schemes can be very tricky to
> write and can involve very complex QA to be sure you've gotten right,
> (e) more support dollars to manage because bugs are hard to understand
> when reported and hard to track down without good reports, and on and
> on.

I don't think something like this would be added to a commercial lisp in
the near future :-) I am discussing this within the context of Arc as
something that would be cool to have, and something that I think would be
especially well suited for server programming - which is something Arc aims
for, and something I believe generational GC has problems with under high
system loads (200 concurrent requests).

astor

Alexander Kjeldaas

unread,
Dec 15, 2001, 12:59:52 PM12/15/01
to
Scott McKay wrote:

My point is that a generational GC and a C program will not possibly
allocate "rougly the same amount of memory". Let me repeat the argument
from my initial posting:

The problem for an event-driven server is that if you have 200 outstanding
requests at any given time, you have 200 "contexts" where you allocate
memory. These contexts are independent of each other and there is no
single point in time where all of them can be "freed" (remember the
requirement that you have 200 outstanding requests _at any given time_). If
you want to keep your memory usage comparable to what a C program would
use, you will have to GC fairly often - say every 20 requests. Since only
10% of the local heap will be freed by the generational GC (20 of 200
requests), the performance of the generational GC will not be great.

An alternative to GCing often is GCing seldom. You would GC every 200th
request. In this case you would be able to free 50% of the local heap, but
you would use roughly twice the memory of a C program. The performance of
the generational GC is still not great (it only removes 50% of the memory
it scans).

Or you would GC every 1000th request, and use huge amounts of memory.

These calculations are not quite fair as a generational GC will have less
fragmentation than malloc()/free(), but I think the general picture is
correct.

If you think a generational GC is _extremely_ fast compared to malloc() you
might argue that even if a generational GC has to collect the local heap 10
times as often for an event-driven server than it would have for a normal
program, that it still would beat malloc/free, then I think you are wrong.
I don't think there is any generational GC implementation with a documented
performance which is that good, but please prove me wrong.

So I am not saying that a generational GC always suck, and I am not
interested in a general discussion about whether generational GC is faster
than malloc()/free(). I am interested in peculiar problems related to
server programming.

astor

Alexander Kjeldaas

unread,
Dec 15, 2001, 1:09:32 PM12/15/01
to
Daniel Barlow wrote:

> Alexander Kjeldaas <astor...@fast.no> writes:
>
>> I think a solution to this problem is to support multiple heaps. Each
>> request should use a separate heap that can be teared down after the
>> request has been serviced. This requires a tiny bit of compiler support,
>> but I think it would make it possible to implement a server application
>> with virtually no memory or garbage collection overhead compared to a
>> server written in C.
>
> This seems so obvious that there is probably something horribly wrong
> with it, but what about a generational GC with a per-request nursery.
> When the request is done, you GC that heap, copying the live data to
> the next generation. The cost of a copying GC is mostly related to
> what you keep rather than what you leave behind, so in the case that
> you throw everything away, this is probably not going to hurt too
> much. If when you come to measure performance you find that you're
> not throwing as much away as you wanted to, you have to fix things so
> that you are - i.e. the system may get slower, but it remains safe.
>
> Certainly it's going to require less compiler support than making all
> global variables copy-on-write. And you still have the option, when
> you need it, of manipualting global state.
>

I think the basic problem with this approach, which would be great if it
could be implemented effectively, is that you need to trace references
across all the local heaps. You will have problems GCing just one heap.
So while I think this would be great, I am not sure it is possible to
implement.

> I have a feeling that I've read a paper (could have been Henry Baker?)
> describing something similar where he puts the nursery on the stack
> and transports it to safety when each function exits. It was a few
> years ago and I may be misremembering, though.
>

astor

Alexander Kjeldaas

unread,
Dec 15, 2001, 1:41:34 PM12/15/01
to
Duane Rettig wrote:

A closed old area seems like a good idea, but maybe a more radical idea can
get even more radical improvements :-). Even if you never have to GC the
old area it seems that you still have problems with the amount of garbage
you can reclaim in the local heap each time you GC.

>> Now, what I am saying is that you might be best off by using fork() to
>> overcome perceived limitations of garbage collection for implementation
>> of
>> server processes. A server usually receives a request, processes that
>> request, delivers a result, and starts from scratch. Another way to say
>> this is that most lisps (correct me if I am wrong) don't support the most
>> efficient method for setting up and tearing down memory heaps for server
>> applications. This might be a grave accusation given that I stated that
>> I have no experience in this area, but read on :-)
>
> Well, yes, this is a grave accusation. It may be true, but before you
> make such statements, you should solidify your definition of "the most
> efficient method for setting up and tearing down memory heaps for server
> applications". I'm not sure if you are figuring that it's simply
> obvious what this most efficient method is, or if you are leading to
> your conclusion in this article about multiple heaps, but it is not
> clear to me what "the most efficient method" means to you.
>

We are discussing server applications, and I am thinking of a model where a
request comes in, is processed, and a result is returned. A little global
state might be updated in the process, but most memory consed is never
again referenced, and can be freed when we finish with our request.

The most efficient method to handle memory efficiently in this situation,
assuming memory is a fairly scarce resource, is to have each request use a
separate pool of memory, and reclaim memory from that pool when the request
finishes.

>
> The tuning of a generational gc is an art, and we tend to take them on
> a case-by-case basis. We also try to make such art available to
> programmers who will need it without bogging others down in details,
> by providing as much control as possible with good defaults. In order
> to test your theory, we would have to have a real test case. I think
> that it is dangerous to make generalizations as above without actually
> trying things out; you may be surprised in both directions - some
> assumptions you've made about how bad things will be will be wrong, and
> some assumptions about how good things will be will be wrong. It
> largely depends on the application itself, and on which of the
> application's parameters "hits the wall" before others.
>

It is hard to argue against this :-) I don't have any numbers except for
the back-of-an-envelope calculations posted initially, and I am not
prepared to implement this idea to prove my point - at least not while this
thread is active :-)

> Others have posted some history in response to the idea of multiple heaps.
> In addition, a fundamental deterrent to having multiple heaps is
> a Fragmentation Problem: in a linear address-space which is almost
> full nowadays (i.e. in 32-bit architectures), it is hard to predict
> the best sizes for such multiple-space memory segments because such
> address-space is so precious. For example, in a threaded environment,
> where each thread has its own stack, how many threads can you reasonably
> fit into your process? You might be surprised at how fast the address
> space fills up just to accommodate stacks. Perhaps 64-bits will give us
> some breathing room for a generation or so...
>

I know all too well that this is a problem, but a heap doesn't have to use
a linear address space.

astor

Alexander Kjeldaas

unread,
Dec 15, 2001, 2:55:01 PM12/15/01
to
Thomas F. Burdick wrote:

>> Example:
>> (let ((my-heap (make-heap :no-external-references-allowed)))
>>   (call-with-heap my-heap #'my-func))
>
> So, what you're trying to express here is that everything you're
> allocating will have dynamic-extent.

I think the example is somewhat wrong. Since a server often is event-driven
you want to keep the heap around until you are done with it. The state
needed to deal with multiple concurrent queries cannot be stored on a
single unified stack.

I even think destruction of such extra heaps could be done by the GC [no
that doesn't mean we're back to square one, because the GC would only have
to GC a tiny tiny local heap compared to what it would look like if
everything was allocated there].

Example:

(defvar *active-query-heaps* (make-hash-table))

(defun handle-query (cmd)
  ...)

(defun handle-io (cmd)
  (declare (simple-string cmd))
  (let* ((heap (gethash (queryid cmd) *active-query-heaps*))
         (result (call-with-heap heap #'handle-query cmd)))
    (declare (simple-string result))
    (when (query-finished result)
      (setf (gethash (queryid cmd) *active-query-heaps*) nil)
      ;; GC of tiny "main" local heap will detect dead heap and free
      ;; the resulting memory without having to scan that heap.
      (gc))))

> Funny, my compiler recognizes
> this by looking at dynamic-extent declarations. And, yeah, they're a
> little bit of a pain to type all the time, but that's good. I want
> the compiler to assume that everything is indefinate extent, unless I
> promise it otherwise. Forcing the programmer to think before making
> promises as grand as limiting the extent of an object is IMO good.
>

dynamic-extent looks interesting, but I have a few problems with it:
It looks unsafe to use. You promise to limit the extent of an object, but
will the system catch your errors for you? If it does, at what price? I
am not suggesting that objects are given limited extent. Also, it seems
that you must use a threaded model since you can't use a single unified
stack to hold information if you use a event-driven design. Third, the
lisp implementation probably needs to deal with non-linear stacks on a
32-bit implementation at least.

astor

Scott McKay

unread,
Dec 15, 2001, 4:22:47 PM12/15/01
to

"Alexander Kjeldaas" <astor...@fast.no> wrote in message
news:9vg32p$lg7$1...@news.ost.eltele.no...

>
> The problem for an event-driven server is that if you have 200 outstanding
> requests at any given time, you have 200 "contexts" where you allocate
> memory. These contexts are independent of each other and there is no
> single point in time where all of them can be "freed" (remember the
> requirement that you have 200 outstanding requests _at any given time_).
> If you want to keep your memory usage comparable to what a C program
> would use, you will have to GC fairly often - say every 20 requests.
> Since only 10% of the local heap will be freed by the generational GC
> (20 of 200 requests), the performance of the generational GC will not
> be great.
>
> An alternative to GCing often is GCing seldom. You would GC every 200th
> request. In this case you would be able to free 50% of the local heap,
> but you would use roughly twice the memory of a C program. The
> performance of the generational GC is still not great (it only removes
> 50% of the memory it scans).

Your description leaves me completely confused. It's clearly the case
that a Lisp program can be written that *allocates* the same amount
of memory as a C program that does the same thing, right? And it's
clearly the case that, if the requests made to the Lisp and the C program
are the same, that the same amount of memory will either become garbage
or be released via 'free' at the same time, right? So the only difference
is exactly when the memory will actually be freed? The GC will run as
often as necessary to meet certain criteria, such as "x% of memory
should be available for allocation". And the GC will reclaim all the
garbage (*) when it runs, right?

I just don't think you are setting up a scenario that demonstrates any
difference, except that the GC is likely to have lower overhead than
repeated calls to 'free'. (OK, check out Paul Wilson's big survey
papers if you want pointers to more information on this.)

If you are really concerned about certain kinds of allocation, you
can also "resource" the objects in questions.

What can I say, I'm one of the authors of a Lisp web server that
meets your criteria (except we usually have about 100 "open"
requests), and it works just fine.

----------
(*) "Reclaim all the garbage", to first approximation. A precise GC will
reclaim all the garbage. In practice, a conservative GC will reach a stable
state in which some small percentage of memory will not be reclaimed,
but this is usually "in the noise".


Thomas F. Burdick

unread,
Dec 15, 2001, 5:06:30 PM12/15/01