However, not only is there no obvious convention for "copy
construction" (creating one object from another), but of the several
references I have on CLOS, there isn't even any mention of the need to
copy objects. Is there something going on here that is more subtle?
At the moment, the best solution I have come up with is to provide a
method that operates on instances of a standard class to do a shallow
copy. That method (say, copy-object) could then be specialized to
provide deep copies where necessary.
Another approach might be to create a new kind of meta-class, with an
extra slot option for specifying how copying is done (deep or
shallow). At the moment, this approach is a little out of my reach,
and, in any case, does not address the concern I have that I am
perhaps missing the point entirely.
TIA for any help.
The notion of "copying objects" is very application-dependent. Should it
be a deep or shallow copy? What if it should be deep for some slots and
shallow for others? Similar issues arise for comparing for equality.
Because there's no obvious, general solution to this, it's left to the
programmer to determine what's right for his application. We don't even
define a standard generic function that users can write methods for -- the
notion is *application* dependent, not necessarily *type* dependent.
I.e. there could be two applications using objects of the same class, but
one of them needs to do shallow copies while the other needs to do deep
copies. If they each tried to define the COPY-OBJECT method for the class,
they'll conflict with each other; it's much better to force them to come up
with methods in their own package.
I'm sure Kent will respond with a much longer article describing the entire
philosophy (or maybe just a URL pointing to the paper he wrote about it a
few years ago). But the above is the nutshell version.
You may be wondering why other languages, such as C++, don't subscribe to
this same philosophy -- they *do* have copy and equality methods, and even
define default ones when programmers don't create them explicitly. I can
think of a few explanations:
1) It's less common to link multiple applications into the same memory
image, so conflicts are less common.
2) C++ passes structure and class objects by value, rather than as an
object reference, so copying is inherent to the function call sequence.
Therefore, they needed to come up with a standard way to copy objects.
3) The designers of those languages just didn't care about the issues the
way we do (a corollary of the "worse is better" philosophy).
Barry Margolin, bar...@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Don't bother cc'ing followups to me.
There are variety of copying requirements in my application space
(distributed business applications). Let me give you two examples,
which are illustrative of some of the issues you allude to. I would
very much appreciate your insights into how you might handle them in
The first is example is the exemplar: I store an "exemplar" object so
that defaults can be set dynamically by users (rather than relying on
:initargs or :initforms). All new objects are created by making copies
from the exemplar. Referenced objects are not copied, but nested
objects are. In C++, this functionality is provided by the default
copy constructor for a given class.
A second example is object pooling. In garbage collected systems it is
sometimes worthwhile in the interest of efficiency to pre-allocate a
set of objects that are simply reused when discarded. Object identity
is determined in the equal sense, not the eq sense, usually by
providing an explicit identifier. Thus, it makes sense to perform an
assignment operation between two objects, copying the values from one
into the slots in the other. The pre-allocated objects are never
garbage collected, but neither are there any new allocations that need
to be made, reducing the demands placed on the garbage-collection. In
C++, this is handled by "overloading" the assignment operator.
These are both situations where an appropriate set of pre-defined
methods for assignment and copy construction might be useful. There
are some difficulties involved: for example, how to guarantee that
copy semantics are consistent across classes for a given purpose? In
both C++ and CLOS, it is possible override the default implementation
of a method (virtual function in C++). This is necessary in case a
class is doing something tricky, like caching data. (This is why Java
has provided a "transient" flag for attributes that are not truly part
of an object's identity.)
However, this conflicts with the need for consistency between classes.
In this sense, the C++ approach is also inadequate. In fact, I have no
difficulty with the need to implement my own copying facility in CLOS.
I just want to ensure that I am not missing some subtlety in the
design of the language that obviates the need for such copies or
perhaps addresses the requirement in a different manner.
>Because there's no obvious, general solution to this, it's left to the
>programmer to determine what's right for his application. We don't even
>define a standard generic function that users can write methods for -- the
>notion is *application* dependent, not necessarily *type* dependent.
>I.e. there could be two applications using objects of the same class, but
>one of them needs to do shallow copies while the other needs to do deep
>copies. If they each tried to define the COPY-OBJECT method for the class,
>they'll conflict with each other; it's much better to force them to come up
>with methods in their own package.
Fair enough, although I always concede to the "it's always different
each time" argument rather warily. I agree entirely that there is more
than one kind of copy operation, but I am not convinced it isn't
possible to enumerate them and implement general algorithms, provided
there is a way to provide the appropriate "hints" in the class
declaration (like Java's "transient" flag). Same thing with equality.
>I'm sure Kent will respond with a much longer article describing the entire
>philosophy (or maybe just a URL pointing to the paper he wrote about it a
>few years ago). But the above is the nutshell version.
Thank you very much for your feedback. Any references would also be
very helpful. I am presently using Franz's online reference, the MOP
book (Kiczales,et al), and Sonya Keene's book. I found no discussions
about these issues (ironically, the word "object" does not even appear
in the Kiczales index), so other suggestions as to where else I might
look are welcome.
>You may be wondering why other languages, such as C++, don't subscribe to
>this same philosophy
>2) C++ passes structure and class objects by value, rather than as an
>object reference, so copying is inherent to the function call sequence.
>Therefore, they needed to come up with a standard way to copy objects.
This is part of it. You can pass things by reference and in real-world
applications, you usually do. However, semantically, copy and
assignment have to be provided to make objects behave like built-in
types. In practice, having only one way to copy or assign an object is
restrictive and relies a lot on convention. (C++ language lawyers have
talked themselves in circles on this.) So I think they are only there
to allow for consistency, such as being able to pass objects by value.
This is actually consistent with CLOS in that they are not expected to
really obviate the need to deal with application specific
requirements. It's just CLOS doesn't really need assignment or copying
to be consistent with Lisp, whereas C++ does to be consistent with C.
>3) The designers of those languages just didn't care about the issues the
>way we do (a corollary of the "worse is better" philosophy).
I sometimes feel sorry for C++ designers. On the one hand, they got
themselves into this mess by starting with C as their basis. But they
get it from both sides: the C community disparages them for
introducing too much complexity (almost all of which is merely being
acknowledged by the language), and then language gurus hammer them for
not caring about elegance. One thing I am sure of, Bjarne Stroustrup
(the designer of C++) is deeply concerned about elegance! I think
maybe C++ is a sort of misguided attempt to bring powerful programming
constructs (object-orientation, generic types, etc.) to the "masses,"
C provided a pretty large installed base to work from. It turns out
that there aren't that many commercial software developers who are
comfortable with these concepts or capable of using them effectively.
This is the same principle that makes it difficult to languages like
Lisp to gain momentum. Like I said, I think C++'s approach (and by
extension, Java's) are actually consistent with CLOS approach, at
least in that it is internally consistent.
there is no such thing as a copy of an object. let me clarify before you
object. let's define a copy of an original object to be an object equal
to the original in all respects save that if you alter the copy, you do
not also alter the original. that seems to imply deep copy, but I think
it means shallow copy.
now, since "copy" is defined in terms of the possible alterations, it
should be intuitively evident that there is a _purpose_ to a copy that
needs to be communicated. e.g., when I want a copy of a _list_, I want a
fresh cdr chain that I can modify, but I do _not_ thereby want copies of
all the elements of the list. Common Lisp offers COPY-LIST to copy the
cdr chain, COPY-ALIST to copy the cdr chain and any cons cells that are
members of the list, and COPY-TREE to copy all cons cells. this is
pretty specific, don't you think? try to communicate this to a generic
copy for lists.
and, since "copy" is defined in terms of "equal", which does not include
the object identity at some level or another, and possibly any number of
other properties, all the problems of a generic EQUAL apply with full
force. you'll note that there is no EQUAL for class instances, either.
I yield the floor to Kent Pitman on EQUAL. :)
Y2K conversion simplified: Januark, Februark, March, April, Mak, June,
Julk, August, September, October, November, December.
Actually, I don't really object at all. In some deep sense, this is
true for any object, even built-in types. For example, there is only
one integer 3. However, as it turns out, just like with built-in
types, there are number of uses for making copies anyway. A couple of
these are discussed in my reply to Barry's message. Essentially, these
are typically reduced to a logical versus physical view of the system.
I think. =]
> now, since "copy" is defined in terms of the possible alterations, it
> should be intuitively evident that there is a _purpose_ to a copy that
> needs to be communicated. e.g., when I want a copy of a _list_, I want a
> fresh cdr chain that I can modify, but I do _not_ thereby want copies of
> all the elements of the list. Common Lisp offers COPY-LIST to copy the
> cdr chain, COPY-ALIST to copy the cdr chain and any cons cells that are
> members of the list, and COPY-TREE to copy all cons cells. this is
> pretty specific, don't you think? try to communicate this to a generic
> copy for lists.
I suppose I could argue that you could provide analagous methods for
classes or that I could define such a generic copy if I provided the
parameters or "hints" (either as arguments or in the class definition,
or both), but it's sort of beside the point. They aren't there. So I
must still not be getting it.
One thing I've figured out is that deep vs shallow copy isn't very
meaningful in CLOS, since everything's a reference anyway. There isn't
much point to a shallow copy: if I understand this right, you'd just
end up with a bunch of new slots pointing to the exact same data.
A deep copy suffers from the inverse problem; namely, you'd always end
up copying the values. In other words, since everything's a reference,
there's no way to figure out which values are shared between objects
and which aren't. That's probably so obvious to everyone else that it
isn't worth mentioning. But coming from C++, it is news to me!
There is another difficulty with a deep copy: what happens with a
circular reference? Clearly, in this case, the implication is that the
objects should be shared. However, you can't just silently assume
that, because it also might be a mistake.
There are two solutions. The one CLOS has chosen still mystifies me a
little bit; namely, to leave it entirely up to the programmer. Why not
support the second approach, which is to allow shared values to be
flagged as such. You could also flag attributes which are not properly
part of the object's identity. In other words, you'd provide a way to
let the class "know" about copying and identity.
Such a change appears simple enough to implement on the surface, but I
am a little concerned that I am heading into some labryinthian issues.
First, I'd have to record these additional flags with the slot
definitions. Then there are some issues with accessors. If an
attribute isn't shared, shouldn't I always copy it before storing it
as the slot value? Or do I just assume the client does it for me?
> and, since "copy" is defined in terms of "equal", which does not include
> the object identity at some level or another, and possibly any number of
> other properties, all the problems of a generic EQUAL apply with full
> force. you'll note that there is no EQUAL for class instances, either.
Yes, I had, only just today. This issue is also becoming clearer with
some thought (and help from this newsgroup!). How do I know how to
compare all the components that make up an object? In particular, am I
interested in eq or equal? Tricky, but it seems it could be resolved
the same way as copy -- you just need more information in the class
> I yield the floor to Kent Pitman on EQUAL. :)
Very well. Thank you for your response.
You can use defstruct. It provides a copy method. You can use the
`include' option or inheritance, and define methods on structures. The
function defstruct creates new types, but it's very similiar to a
The presence of a pre-defined copier should not be assumed to mean
that there is a unique concept of a copier. There is copying, and
then there is copying correctly. The copier does something reliable
but the thing it reliably does is not a reliable solution to the needs
of the person asking for a copier. (Sigh.) That's what it means for
a solution to not be uniquely determined, and that's why it's sad for
people to be using the unqualified term "copy" where they really mean
something much more specific.
Rather than rant on and on in words, since that never seems to do
any good, I'll let you puzzle the about why the defstruct-provided
copier doesn't work in the following example. Of course, a seasoned
programmer will just say "you can't use the copier in that case".
But implicitly they wil be agreeing with me. A language that offers
features that you have to know not to use in spite of their apparent
desire to have you use them is asking the user to get into trouble.
This is why I think it's bad for COPY to be in the language, why
I think EQUAL is questioanble in its current form, and why people
should think harder about what EQUAL and COPY are about before assuming
the language could or should be offering it.
(defstruct (node (:print-function
(lambda (object stream depth)
(declare (ignore depth))
(write-string "<" stream)
(loop for node = (find-head object) then (node-next node)
for first = t then nil
when (node-p node)
do (unless first (write-string "," stream))
(if (eq node object)
(write-string "[" stream))
(prin1 (node-cell node) stream)
(if (eq node object)
(write-string "]" stream))
do (error "Bad node end: " node))
(write-string ">" stream))))
(defun find-head (node)
(loop (let ((prev (node-prev node)))
(if prev (setq node prev) (return))))
(defun kons (x y)
(check-type y (or null node) "a lizt")
(let ((node (make-node :cell x :next y)))
(when y (setf (node-prev y) node))
(defun lizt (arg &rest more-args) ;it's just a toy, so recursion is ok
(cond ((not more-args) (kons arg '()))
(t (kons arg (apply #'lizt more-args)))))
(setq l1 (lizt 3 4 5))
; => <,4,5>
(setq l2 (node-next l1))
; => <3,,5>
(setq c2 (copy-node l2)) ;try the defstruct-supplied node copier
; => <3,4,5>
; Ask yourself why c2 prints differently than l2 and you'll be on
; track to the issue.
; for further reading about equality and copying,
; if you haven't already, see my "Lisp Pointers" article at
I'm going to steer clear of your comments about copying and equality, since
others have responded admirably.
> A second example is object pooling. In garbage collected systems it is
> sometimes worthwhile in the interest of efficiency to pre-allocate a
> set of objects that are simply reused when discarded. Object identity
> is determined in the equal sense, not the eq sense, usually by
> providing an explicit identifier. Thus, it makes sense to perform an
> assignment operation between two objects, copying the values from one
> into the slots in the other. The pre-allocated objects are never
> garbage collected, but neither are there any new allocations that need
> to be made, reducing the demands placed on the garbage-collection. In
> C++, this is handled by "overloading" the assignment operator.
I'd like to discuss some of your apparent assumptions about dynamic storage
allocation. The technique you call object pooling is known to me as
'resourcing', because some Lisp systems have a set of built-in functions and
macros for this, and the central function is named defresource. As you've
noted, this is a technique for garbage avoidance, and it can reduce the
amount of time spent performing dynamic storage management. However,
resourcing is not a panacea, and it is not useful only for systems that
First, the payback of resourcing comes from reducing the amortized costs of
dynamic storage management. That is, you can save time by not doing storage
allocation and deallocation. However, the cost of allocating a resourced
object is the sum of (1) finding an appropriate object to reuse and (2)
initializing the object.
Let's break these down. At its simplest, finding an appropriate object can
be as simple (and inexpensive) as "take the first one from the pool." So in
C or C++ this is cheaper than malloc(), and in Lisp it's probably only
fractionally more expensive than consing new storage (since consing the
storage is probably a primitive operation, while the call to
allocate-resource probably involves a few function calls.)
Once you have an object, however you get the storage, you may still have to
initialize it. The initialization costs are the same whether you allocate
new storage or reuse a resource. So in the most general case, resourcing is
probably not a huge win in Lisp. (But I want to return to this later,
because resourcing can be a big win under very specific conditions.)
Now let's look at malloc()/free() vs. cons/GC. I want to do this because
you wrote: "In garbage collected systems it is sometimes worthwhile in the
interest of efficiency to pre-allocate a set of objects that are simply
reused when discarded." I want to make it very clear that this does not
hold up as a general statement of fact.
First of all, most non-GC'd systems do their storage allocation using a
manager built upon a malloc()/free() substrate. There are a few wrinkles in
malloc()/free() implementations, but they all have these general features:
- extra-data headers (and sometimes trailers) to hold housekeeping info
- allocated storage is not implicitly moveable
- free storage is usually kept on a linked list
- allocation of all block sizes is usually from one (sometimes two) arenas
- adjacent free blocks are usually consolidated
The upshot of all this is that malloc()/free() work quite well, and very
efficiently, for programs that have the following characteristics
- allocate storage
- hold onto most (or all) of the storage throughout operation
This is true of a lot of C programs. C++ programs, OTOH, have a high rate
of allocation and deallocation because they typically copy objects rather
than referencing them through pointers. (It's a rare C++ program that uses
object identity as equality.) Under these conditions
- objects of varying sizes come and go at high rates, all the time
If your program is going to run for a few minutes or hours, this is no big
deal. But if your program has to remain running for weeks or months, most
malloc()/free() implementations will eventually degrade your program's
performance. Why? Unless the dynamic storage manager makes heroic efforts
to aggregate allocation blocks of similar sizes (and, outside of specialized
memory allocators for real-time systems, I have yet to see a malloc()/free()
implementation that does) your heap will probably accumulate a lot of
odd-sized free blocks in between your program's allocated blocks. The more
free blocks, the bigger the free list, and the more work that must be done
to find an available block and to test for adjacent free blocks to coalesce.
In fact, a significant portion of the arena's address space could be
searched for each allocation.
Once again, malloc()/free() can be very inexpensive for short-running
programs. But performance can degrade significantly when malloc()/free() is
used over a long time period for a high volume of short-lived objects.
Now let's look at a modern cons/GC algorithm. First, allocating new storage
is usually just a matter of bumping a pointer. An incremental generational
collector either does a little bit of GC for every allocation, or runs
asynchronously in the background. Furthermore, the GC maintains multiple
generations, where short-lived objects are frequently collected (and
inexpensively, because the frequency of collection limits the number of
objects in the youngest generation, and hence the size of the "arena" for
the generation.) Longer-lived objects are stored in generations that are
GCd less frequently. Over time, the amoritized GC cost of keeping a
long-lived object becomes very low.
In Lisp, pointers are not explicitly exposed to the programmer, but the
implementation has knowledge of which memory locations are pointers and
which are data. This means that objects can be moved in memory and pointers
updated to follow the relocated objects. Because of this, the GC can move
used blocks of storage and create maximal blocks of free space. And,
finally coming full circle, the availability of large contiguous blocks of
free space means that allocation is usually just a pointer adjustment to
slice off a chunk of free space.
There are other refinements to modern GC that you can read about in "Garbage
Collection: Algorithms for Automatic Dynamic Memory Management", Jones &
Lins, 1996, John Wiley & Sons, ISBN 0-471-94148-4 .
And finally, the inevitable qualifications and disclaimers, and the
unavoidable anectdote <g>... There are still a lot of bad GC
implementations in existence. Look to the long-established Lisp vendors to
have something that really works. GC is hard to do well, and the
part-timers and Johnny-come-lately's have a lot of work ahead of them.
"Conservative" bolt-on GC implementations and non-moving collectors will
never achieve the same performance as a modern GC in Lisp.
Now for the anectdote. I'm currently fixing a C++ server implementation
that must run 24x7. This system runs on huge multi-CPU systems and does
rule-based processing of about 1,000 two-hundred-byte records every second.
Over the course of a week or two, performance degrades by 50% or more
because of the underlying malloc()/free() allocator. (We do not use a GC;
all allocation and deallocation is explicit.) We must periodically bounce
the bottleneck process to reset the allocator. My proposed fix is to use a
custom memory allocator combining multiple arenas and resourcing.
I have yet to observe this kind of long-term performance degradation in a
I said I'd close this posting with a comment on where I think resourcing is
appropriate in a GC'd system. If the cost of initializing an object is
high, and the initialization can be done just once, then resourcing is a
definite win. If you can't meet both of those conditions, you're probably
better off relying on a modern GC.
[... snip ...]
David B. Lamkins <http://www.teleport.com/~dlamkins/>
> I suppose I could argue that you could provide analagous methods for
> classes or that I could define such a generic copy if I provided the
> parameters or "hints" (either as arguments or in the class definition,
> or both), but it's sort of beside the point. They aren't there. So I
> must still not be getting it.
There is such a continuum of possible interpretations of a "copy"
operation that hints or arguments would have a very hard time covering
it. This really has to do with the concept of object identity. The
defining quality of an object is that it has a identity with mutable
state. In that sense, objects can only be equal to themselves. What is
usually meant with an "equality" operator is an equivalence
relation. Although the compiler cannot check it, your "copy" operator
should return an object that is equivalent under your "equality"
operator. Notice that with cached slots, complex tree structures
etc. this can allow the physical representation of both objects to be
In practice, I find I rarely write copy operators. I usually use a
creation operator with some of the arguments coming from the original
object. Likewise, instead of general EQUAL functions, I try to name
the equivalence I'm writing with a more descriptive name.
> One thing I've figured out is that deep vs shallow copy isn't very
> meaningful in CLOS, since everything's a reference anyway. There isn't
> much point to a shallow copy: if I understand this right, you'd just
> end up with a bunch of new slots pointing to the exact same data.
Which could be quite useful in a style
(let ((new-object (shallow-copy old-object)))
(setf (slot new-object) (calculate-new-slot old-object)))
I find I use copy-list more than copy-tree.
> Yes, I had, only just today. This issue is also becoming clearer with
> some thought (and help from this newsgroup!). How do I know how to
> compare all the components that make up an object? In particular, am I
> interested in eq or equal? Tricky, but it seems it could be resolved
> the same way as copy -- you just need more information in the class
Don't let the above discourage you from presenting a design. By the
time you've taken into account circularity, logical versus physical
representation, Java's transient facility and C++ mutables, it will be
quite interesting ;-)
Also note that a lot of things cannot be copied: open network sockets,
operating system semaphores, child process id's ...
Lieven Marchand <m...@bewoner.dma.be>
Few people have a talent for constructive laziness. -- Lazarus Long
> David B. Lamkins <dlam...@teleport.com> wrote:
>> However, the cost of allocating a resourced object is the sum of
>> (1) finding an appropriate object to reuse and
>> (2) initializing the object.
> Don't forget about
> (3) the cost of accessing an object not used for a long time.
> (4) the cost of reduced locality.
> Accessing an old object typically means a cache miss, whereas - in systems
> with generational GC - a new object will be located in the current "new
> generation" area, an area which is likely better covered by the cache.
Yes. Thanks for pointing them out.
BTW, I want to take this opportunity to recommend an _excellent_ book on
dynamic storage management, in case there are people here who haven't
discovered it. Here's the reference, and a capsule review from my page
Garbage Collection: Algorithms for Automatic Dynamic Memory Management,
Jones & Lins, 1996, John Wiley & Sons, ISBN 0-471-94148-4
"If you're at all interested in dynamic storage management [even if you only
use malloc() and free()] you should read this book. As a comprehensive
survey, it sheds light upon a lot of misapprehensions about the true cost of
various techniques. As an implementation guide, it offers -- for the first
time anywhere -- a collection of code and psuedocode for many kinds of
dynamic storage managers in a single reference."
> Even worse, in systems with virtual memory, the memory page containing the
> old object might be swapped out. Accessing it will cause a memory page
> to be swapped in from disk, just to erase and overwrite its contents!
> Whereas generational GC systems and good malloc() implementations typically
> get new pages via mmap("/dev/zero")/VirtualAlloc, i.e. they are provided
> by the system and simply zeroed out.
Yes. I recall that VMS did a similar thing; they used the moniker
In the malloc() implementations I've inspected [only about three or four,
but the techniques seem to be quite consistent], additional pages get mapped
in only to expand the arena, which happens _after_ attempts at finding a
free block by traversing the free list and possibly coalescing adjacent free
blocks during the traversal.
Of course, I can envision a malloc() implementation that would expand the
arena more aggressively by making a different tradeoff between reusing old
storage and allocating new pages. But this seems like a poor choice since
it puts more pressure on VM, resulting in increased paging over the long
> Working with a lot of resourced objects will also, in the long run,
> tend to reduce the locality of reference. Without resource mechanisms,
> two objects allocated one after the other will normally be near to each
> other in memory, which is beneficial for cache and paging. When they
> are taken from resource lists, they are likely to be far away in memory.
Good point. But you're assuming a generational GC and a particular style of
resourcing which allocates objects as needed and only puts them in a
resource pool upon release.
If you have a non-moving allocator (e.g. malloc) you can get excellent
locality by preallocating a suitable number of objects all at once and
putting them into the resource pool. Then their locality remains the same
[presumably all in the same locale, since you've allocated them at the same
time] until the need arises to expand the pool. Even then, you can expand
the pool in the same way that it was initially allocated -- allocate a group
of objects all at once and immediately free them to the pool.
The behavior is a bit harder to reason about when you have a generational
GC, but I think that preallocation helps. Objects get promoted to older
generations based upon their number of collections they've survived. If you
preallocate a group of objects for a resource pool, they'll all get promoted
during the same collection, whether they're still in the pool or allocated
for use. The hard part is conveying to the GC a hint about the desired
locality of the pooled objects.
> Bruno http://clisp.cons.org/~haible/
David B. Lamkins <http://www.teleport.com/~dlamkins/>
Recently undead Isabelle to the archangel Gabriel in "The Prophecy II":
"So, you're keeping me alive because you don't know DOS?"