
Welcome to comp.sw.components!!!


Bill Wolfe

Sep 3, 1989, 2:19:37 PM

Welcome to the comp.sw.components newsgroup!!!

The following is a description of the scope of the newsgroup:

$ This newsgroup will facilitate discussions about software components
$ and their design, implementation, and utilization. It is probable that
$ this will be mainly a group of professional component developers, people
$ who are training to become such a professional, and/or people who have an
$ interest in developing their own components from time to time.
$
$ As a "public service", we are also here to help and advise all component
$ users who need advice, and to provide a forum for reviewing the software
$ components provided by vendors. We are also here to allow the people who
$ developed any products which are criticized to show why perhaps the
$ criticism may not be entirely justified, or to seek ideas regarding how
$ the component(s) could be improved, and to provide a means by which new
$ techniques can be spread throughout the component development community.
$
$ Also, questions such as "How can I handle problem X which arises when I
$ try to implement component Y in language Z?" and "I have a nifty priority
$ queue, and would like a B+ tree; anyone wanna trade?" are encouraged.

> What, pray tell, is a "software component"?

In _Software_Components_With_Ada_, Booch defines a software component
as "a container for expressing abstractions of data structures and
algorithms". A software component is generally expected to be:

- cohesive; it should denote a single abstraction. Examples are
the stack, queue, list, tree, and graph abstractions.

- reusable; it should provide a sufficiently general and useful
service to ensure that it will be used over and over
again by many different programmers for many diverse
application areas. A primary reason for the existence
of software components is that they need only to be
written once; once written, the costs of development
can be spread over thousands of applications, making
the cost to any particular application trivial. This
certainly is not the case for the alternative, building
the same functionality up from scratch for each new
application.

- secure; it must perform reliably under any reasonable
circumstances. If an insertion into a tree causes
memory to be exhausted, it is the responsibility of
the component to see to it that there is no change
of state (hence, the consequence of the memory being
exhausted is simply that the insertion does not occur;
the component is left in whatever state it was in prior
to the attempted insertion), and to report the failure
to the user (e.g., by raising an exception along the lines
of Unable_To_Insert_Due_To_Memory_Exhaustion). Also,
advanced languages such as Ada make it possible to
simultaneously pound a component with arbitrarily
large numbers of service requests from many different
tasks; the component must service all such requests
in a serializable manner, without leading to an
inconsistent internal state. A well-written component
will, furthermore, satisfy such requests in parallel
(i.e., will not just achieve serializability by simply
processing requests one at a time, thus causing big delays).
Finally, it must be impossible for a user to examine
or alter the internal state of a component, unless the
user calls a procedure or function designed to provide
information regarding the component's state (e.g., the
position of the current item in a linked list); this
requirement is typically satisfied by language constructs
like the Ada "limited private" clause.
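To make the "limited private" point concrete, here is a rough
sketch of what such a specification might look like in Ada (the
names Stacks, Push, and Unable_To_Push are invented for
illustration, not taken from any actual library):

   generic
      type Item is private;
   package Stacks is

      type Stack is limited private;
      -- "limited private": clients cannot copy, compare, or inspect
      -- the representation; only the operations below are available.

      Unable_To_Push : exception;
      -- raised when storage is exhausted; the stack is left unchanged

      procedure Push (S : in out Stack; X : in Item);
      procedure Pop (S : in out Stack; X : out Item);
      function Is_Empty (S : Stack) return Boolean;

   private
      type Node; -- completed in the package body
      type Stack is access Node;
   end Stacks;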

Intuitively, a software component is analogous to the hardware
components (e.g., integrated circuits) which have been used by
hardware engineers for a very long time. By adopting their
methods, we hope to reduce the costs of software development,
improve the quality of our software, and accelerate software
production. This will lead to lower financial barriers to
software development, and thereby accelerate the use of software
systems throughout the economy, with global economic benefits.

If you'd like to read more about software components, Booch's
_Software_Components_With_Ada_ is a good place to start (ISBN
0-8053-0610-2, Benjamin/Cummings, QA76.73.A35B65). Also, simply
keep reading this newsgroup!! :-)


Bill Wolfe, wtw...@hubcap.clemson.edu

Scott Henninger

Sep 4, 1989, 1:38:09 PM
|>From: wtw...@hubcap.clemson.edu (Bill Wolfe)
|Subject: Welcome to comp.sw.components!!!
|Date: 3 Sep 89 18:19:37 GMT

|
|
| - reusable; it should provide a sufficiently general and useful
| service to ensure that it will be used over and over
| again by many different programmers for many diverse
| application areas. A primary reason for the existence
| of software components is that they need only to be
| written once; once written, the costs of development
| can be spread over thousands of applications, making
| the cost to any particular application trivial. This
| certainly is not the case for the alternative, building
| the same functionality up from scratch for each new
| application.

This is a narrow viewpoint of software reuse that does no justice to the
difficulty of constructing a good software reuse system. If the above
statement were true, then the problem would have been solved years ago with
the advent of software libraries. Unfortunately, there are a number of problems with
software libraries that make them less than fully useful:

1. Users do not know what components are available; i.e., it is often easier
for users to write a component themselves than to search for something in
the library. This is an information retrieval problem.

2. Users do not know how to use the available components or what they do.
Tools are needed to help the users with these tasks.

3. Library components rarely fit a need exactly. No matter how hard a
library developer tries to cover all contingencies, someone will
eventually need something slightly different from what is offered.
Facilities are needed to allow modification so the components can be
tailored to meet a specific user's needs.

The following article expounds on these viewpoints:

"Cognitive View of Reuse and Redesign"
G. Fischer, IEEE Software, July 1987, 4(4), pp. 60-72.

In addition, I have only addressed the reuse of implementation units. It
may be that we can get a much larger return on investment by reusing
design artifacts or other work products associated with creating a
software system.



| Intuitively, a software component is analogous to the hardware
| components (e.g., integrated circuits) which have been used by
| hardware engineers for a very long time. By adopting their
| methods, we hope to reduce the costs of software development,
| improve the quality of our software, and accelerate software
| production. This will lead to lower financial barriers to
| software development, and thereby accelerate the use of software
| systems throughout the economy, with global economic benefits.

Unfortunately, this is a misleading analogy. In a canonical sense,
hardware only has to implement one thing - a Von Neumann machine (there
are slight adjustments to this architecture, but the point stands).
Software is called upon to implement a vastly larger number of abstract
machines. A DEC VAX and an IBM 3090 are general-purpose computing machines
(with different implementations). You can port programs that run on one to
the other. A math package and a windowing package perform very different
functions. You would not think of porting a logarithm function to a
window system.

Because of this fact, certain elements, such as an ALU, are universally
necessary in hardware, but do not exist for software. Although one can
contrive certain building blocks for software, there are no elements
(other than a hardware platform and a compiler/interpreter) that must
*necessarily* exist for a software system to work. This makes software
and hardware *fundamentally* different, and to use one as an analogy for
the other will only lead to the sort of difficulties being experienced by
those who believe in this approach. For example, notice that most efforts to
create software component systems are criticized for their limited scope
of applications. I would argue that this is a direct consequence of the
hardware analogy, which is doing more harm than good.

-- Scott
sco...@boulder.colorado.edu

William Thomas Wolfe, 2847

Sep 4, 1989, 4:22:57 PM
From sco...@boulder.Colorado.EDU (Scott Henninger):
> [a component should provide a generally useful reusable service]

>
> This is a narrow viewpoint of software reuse that does no justice to the
> difficulty of constructing a good software reuse system.

The statement described good characteristics of a software component;
it did not and was not intended to address the environment in which
the component was embedded. However, the topic has arisen in this
newsgroup; in particular, I posted a favorable review some four to
six weeks ago of the article "Domain Analysis -- From Art Form to
Engineering Discipline" (Proceedings of the Fifth International
Workshop on Software Specification and Design, May 19-20, 1989,
SEN V14 N3), which views reusers as learning systems and provides
a framework for assessing the progress of a reuse system. Around
early June, there was a discussion of STARS research which was
directed toward constructing an effective Ada software component
retrieval system through the use of a knowledge-based librarian.

These topics should probably be included in the welcome message,
but any implication or inference that these topics have not been
covered in this newsgroup is simply incorrect.



> | Intuitively, a software component is analogous to the hardware
> | components (e.g., integrated circuits) which have been used by

> | hardware engineers for a very long time. [...]
>
> Unfortunately, this is a misleading analogy. [...] Although one can
> contrive certain building blocks for software, [...] most efforts to


> create software components systems are criticized for their limited scope
> of applications. I would argue that this is a direct consequence of the
> hardware analogy, which is doing more harm than good.

The presently limited extent to which software components cover the
large number of available domains is largely a consequence of the fact
that domain analysis (as described in the cited article) is still an
embryonic field. IMHO, the hardware analogy is very appropriate.


Bill Wolfe, wtw...@hubcap.clemson.edu

William Thomas Wolfe, 2847

Sep 4, 1989, 9:21:47 PM

On the subject of why reuse does not take place more frequently,
the article appearing just before the one Scott cited ("Can Programmers
Reuse Software?" IEEE Software, July '87, pp. 52-60) comments:

If the worth of reusing an ADT could be accurately assessed,
we would expect a person to reuse an ADT even if only a few
percent of the creation effort could be saved... Software
development personnel untrained in software reuse are
influenced by some unimportant features [of a component]
and are not influenced by some important ones... users
cannot properly assess the worth of reusing a candidate
ADT...

Also, regarding the hardware analogy, it does not draw a direct
correspondence between computer software and computer hardware;
rather, it draws an analogy to the extensive catalogs of ICs,
resistors, capacitors, etc. used by electrical engineers to build
all kinds of application-specific products. It seeks to diminish
the "art form" mindset, replacing it with "engineering discipline".


Bill Wolfe, wtw...@hubcap.clemson.edu

Scott Henninger

Sep 5, 1989, 11:07:34 AM
|>From: billwolf%hazel.cs.c...@hubcap.clemson.edu (William Thomas Wolfe, 2847 )
|Date: 4 Sep 89 20:22:57 GMT

|
|>From sco...@boulder.Colorado.EDU (Scott Henninger):
|> [a component should provide a generally useful reusable service]
|>
|> This is a narrow viewpoint of software reuse that does no justice to the
|> difficulty of constructing a good software reuse system.
|
| [...] "Domain Analysis -- From Art Form to

| Engineering Discipline" (Proceedings of the Fifth International
| Workshop on Software Specification and Design, May 19-20, 1989,
| SEN V14 N3), which views reusers as learning systems and provides
| a framework for assessing the progress of a reuse system. [information
| retrieval approaches]

This is on the right track. I still argue that there is a significant tools
problem that must be solved before reuse becomes a successful software
engineering strategy.
-- Scott
sco...@boulder.colorado.edu

Scott Henninger

Sep 5, 1989, 11:11:53 AM

|>From: billwolf%hazel.cs.c...@hubcap.clemson.edu (William Thomas Wolfe, 2847 )

|Subject: Re: Reasons for low reuse
|Date: 5 Sep 89 01:21:47 GMT


|
| ("Can Programmers Reuse Software?" IEEE Software, July '87, pp. 52-60)

So we must ask ourselves: do we want to "train" people to become successful
reusers, or should we provide adequate support for reusing software? It is not
clear to me that all of the training in the world will solve the problem. The
problem is twofold (maybe more): First of all, there are potentially millions
of components to choose from. No one could possibly keep track of all of them.
Support for retrieving useful components is therefore needed. Secondly, once
you've got a component, you must tailor it to your specific needs. This means
understanding the code or some description of the code. Anyone who has debugged
code written by others can attest to just how difficult this is. The
"conceptual closeness" measures devised by Prieto-Diaz are a first step in
trying to assess the modifiability of a component. One could also argue that
conceptual level (as opposed to implementation level) descriptions are needed
(no, current documentation techniques are not sufficient).

| Also, regarding the hardware analogy, it does not draw a direct
| correspondence between computer software and computer hardware;
| rather, it draws an analogy to the extensive catalogs of ICs,
| resistors, capacitors, etc. used by electrical engineers to build
| all kinds of application-specific products.

Again, this assumes that the components can be used without modification. This
is not the norm with software, but is with hardware. Also, we must keep in mind
that resistors and capacitors implement one kind of machine. For software, we
must find the equivalents of resistors and capacitors for each domain we write
software for.

| It seeks to diminish the "art form" mindset, replacing it with "engineering
| discipline".

To claim that this is true is to claim that all the world's domains can be
reduced to engineering principles. While it's a worthy goal, I'm not so sure
that it can be done.

I would like to hear other opinions. Anyone out there?
-- Scott
sco...@boulder.colorado.edu

3929

Sep 5, 1989, 3:38:43 PM
In article <11...@boulder.Colorado.EDU> sco...@boulder.Colorado.EDU (Scott Henninger) writes:
>| Also, regarding the hardware analogy, it does not draw a direct
>| correspondence between computer software and computer hardware;
>| rather, it draws an analogy to the extensive catalogs of ICs,
>| resistors, capacitors, etc. used by electrical engineers to build
>| all kinds of application-specific products.
>
>Again, this assumes that the components can be used without modification. This
>is not the norm with software, but is with hardware. Also, we must keep in mind
>that resistors and capacitors implement one kind of machine. For software, we
>must find the equivalents of resistors and capacitors for each domain we write
>software for.
>
>| It seeks to diminish the "art form" mindset, replacing it with "engineering
>| discipline".
>
>To claim that this is true is to claim that all the world's domains can be
>reduced to engineering principles. While it's a worthy goal, I'm not so sure
>that it can be done.
>
>I would like to hear other opinions. Anyone out there?
>-- Scott
> sco...@boulder.colorado.edu

I think I'm probably jumping in way over my head here, but I'll hazard a very
naive opinion... :-).

It would seem to me that the hardware analogy is actually rather appropriate.
But before anyone goes and rants all over me, think about how many engineers
we have out there. Obviously, "engineering" is still an art too. Namely, it
is the art of examining a body of solutions to previous problems and trying
to find one or more solutions which can be adapted to solve the current
problem. I don't see any way to claim that this isn't still an art, since
we would have replaced all those engineers with computers if it weren't an
art.

This makes me believe that there will always be a certain art to software
development. I would never claim that engineers are not in some sense of the
word artists. In this respect, software component development seems to have
two major fronts:

1) Coming up with better techniques for identifying commonality between problems
(and therefore their solutions). This is perhaps as much a "way of thinking"
task as it is an infrastructure development task. Not to say that there isn't
a *lot* of infrastructure that we can develop. Quite the contrary, I would
agree with previous statements that software component technologies are rather
embryonic (lots of room for fortunes to be made.... :-).

2) Convincing the world (through case studies) that reuse can save them time and
dollars. This is the only thing that will convince the "suits" that this
wonderful infrastructure is worth paying for and worth the cost of retraining
their people to use.

If this is all very obvious to everyone, sorry...

Thanks,
--------------------------------------------------------------------------------
Brian R. Gilstrap ...!{ {killer,bellcore}!texbell, uunet }!swbatl!uucibg
One Bell Center +----------------------------------------------------------
Rm 17-G-4 | "Winnie-the-Pooh read the two notices very carefully,
St. Louis, MO 63101 | first from left to right, and afterwards, in case he had
(314) 235-3929 | missed some of it, from right to left." -- A. A. Milne
--------------------------------------------------------------------------------
Disclaimer:
Me, speak for my company? You must be joking. I'm just speaking my mind.

William Thomas Wolfe, 2847

Sep 5, 1989, 5:24:56 PM
From sco...@boulder.Colorado.EDU (Scott Henninger):

> | ("Can Programmers Reuse Software?" IEEE Software, July '87, pp. 52-60)
>
> [...] It is not clear to me that all of the training in the world will
> solve the problem. The problem is twofold (maybe more): First of all,
> there are potentially millions of components to choose from. [...]

The first problem that has to be overcome is the Not Invented
Here syndrome; we must have programmers who want to use components.
This can be addressed by training (at the university level), and
also by appeal to management (in the "real world"). Incidentally,
the hardware analogy is absolutely wonderful for convincing
non-technical managers of the value of reuse; they grasp it
instantly, and immediately become enthusiastic supporters.

> Secondly, once you've got a component, you must tailor it to your
> specific needs. This means understanding the code or some description
> of the code. Anyone who has debugged code written by others can attest
> to just how difficult this is.

Wonderful reasons for NOT DOING IT!!! In particular, components
should be designed in a generic manner, such that the "customization"
can be done automatically. As an example, I manage the Clemson Ada
Software Repository, which contains ADTs like Generic_Priority_Queue
(among others). The user can dream up a new data type, define a few
simple operations over the type (such as "=" and assignment), write
a single line of code to do the instantiation, and voila: the user
now has a new data type (a priority queue which can manage objects
of the user's new type), and a considerable range of powerful
operations over that queue, without ever looking at anyone else's
implementation code; the entire process takes a grand total of
about 5 minutes.
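
As a hypothetical illustration of that five-minute process (the
generic's name and formal parameters below are invented for the
example, not copied verbatim from the repository):

   with Generic_Priority_Queue;
   procedure Demo is

      type Job is
         record
            Id : Integer;
            Priority : Integer;
         end record;
      -- assignment and "=" are predefined for Job; the user supplies
      -- only the priority comparison

      function Higher (Left, Right : Job) return Boolean;

      -- the single line of instantiation: a complete priority-queue
      -- ADT over the new type, with nobody's implementation code
      -- ever examined
      package Job_Queues is new Generic_Priority_Queue
         (Item => Job, Precedes => Higher);

      Q : Job_Queues.Queue;

      function Higher (Left, Right : Job) return Boolean is
      begin
         return Left.Priority > Right.Priority;
      end Higher;

   begin
      Job_Queues.Insert (Q, (Id => 1, Priority => 10));
   end Demo;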



> | Also, regarding the hardware analogy, it does not draw a direct
> | correspondence between computer software and computer hardware;
> | rather, it draws an analogy to the extensive catalogs of ICs,
> | resistors, capacitors, etc. used by electrical engineers to build
> | all kinds of application-specific products.
>
> Again, this assumes that the components can be used without modification.
> This is not the norm with software, but is with hardware. Also, we must
> keep in mind that resistors and capacitors implement one kind of machine.
> For software, we must find the equivalents of resistors and capacitors for
> each domain we write software for.

Resistors and capacitors are domain-independent, and so it is with
abstract data types such as B+ trees, linked lists, directed graphs,
etc.; these are the resistors and capacitors of the software profession.

There are also special-purpose ICs in the hardware industry, and
these are the things which must be developed according to the methods
described by Guillermo Arango in his brilliant article on Domain Analysis.
The software equivalent of a special-purpose IC would be a package
containing a set of domain-specific data types, functions, and procedures.


Bill Wolfe, wtw...@hubcap.clemson.edu

William Thomas Wolfe, 2847

Sep 5, 1989, 5:40:24 PM
From article <7...@swbatl.UUCP>, by uuc...@swbatl.UUCP (3929):

> Obviously, "engineering" is still an art too. Namely, it is the art of
> examining a body of solutions to previous problems and trying to find one

> or more solutions which can be adapted to solve the current problem. I
> don't see any way to claim that this isn't still an art, since we would
> have replaced all those engineers with computers if it weren't an art.

There is a general progression which all fields go through:

Art form => Engineering discipline => Hard Science

The progress along this route is proportional to what is known
about the field. The continuum indicates the progression from
all heuristics (toward the left) to all deterministic solutions
(toward the right); the fields which fall in between the two
extremes are those for which heuristics are used to fill in the
areas in which there still is not sufficient science.

When a field is referred to as an engineering discipline, it means that
informal heuristics are on the way to becoming an endangered species,
and the software field is progressing very quickly in that direction.


Bill Wolfe, wtw...@hubcap.clemson.edu

John Nagle

Sep 6, 1989, 11:05:45 PM

The "reuse" concept is fundamentally flawed. Software components are
not "reused", they are "used". This is to say that good software components are
not leftovers from past projects, but units designed from the beginning to be
components. They are productized, documented, and sold as components.

There is a small but successful industry selling software components
for microcomputers. Graphics packages, database management packages,
and communications packages seem to be the most common offerings.

Having used a few of these products, my main observation is that
using software components from multiple vendors tends to result in
annoying problems. A typical problem stems from several components
assuming that they can safely replace some portion of the
standard C library (such as "printf" or "exit") with their own version.
This is a good reason for only buying components for which one gets
source code, so that such problems can be resolved.

Another problem is incompatibility of components with new releases
of other tools. When a new release of a C compiler comes out, it seems
to be necessary, annoyingly, to upgrade some components at the same time.
This can create problems, since the upgraded components may not be available
for some time after the release of the compiler. Operating system releases
can cause similar trouble. This can be a serious problem; in the worst case,
dependence on a software component which is not being upgraded can result
in the inability to continue work on a program.

John Nagle

Scott Henninger

Sep 7, 1989, 2:29:53 PM
|>From: billwolf%hazel.cs.c...@hubcap.clemson.edu (William Thomas Wolfe, 2847 )
|
| The first problem that has to be overcome is the Not Invented
| Here syndrome; we must have programmers who want to use components.
| This can be addressed by training (at the university level), and
| also by appeal to management (in the "real world").

To say this is to say that current reuse environments are adequate. They
are not. The fact of the matter is that it is easier and takes less time
to code it yourself than reuse existing code with current technology.
Besides, and I repeat, *people are not aware of what reusable components
exist*. This is primarily a tools problem, not a training problem.
Training can only get you so far. Supporting tools must exist before
people will practice what is preached to them.

It should also be noted that the NIH syndrome exists in any design
discipline. To paraphrase Herbert Simon: "when designing artifacts for
people, *do not postulate a new man*". The NIH syndrome is simply a part
of human nature. You, President Bush, or anyone else, ARE NOT GOING TO
CHANGE IT. Claiming that the "the problem has to be overcome" in this
case makes the critical mistake of trying to change human nature.

|> Secondly, once you've got a component, you must tailor it to your
|> specific needs. This means understanding the code or some description
|> of the code. Anyone who has debugged code written by others can attest
|> to just how difficult this is.
|
|
| Wonderful reasons for NOT DOING IT!!! In particular, components
| should be designed in a generic manner, such that the "customization"
| can be done automatically.

Unfortunately, this is a yet unrealized goal. The fact of the
matter is that designing generic code is extremely difficult and time
consuming. And it will NEVER be able to handle all of the cases that
programmers will come up with. The ability to tailor existing code is
essential.

| As an example, I manage the Clemson Ada
| Software Repository, which contains ADTs like Generic_Priority_Queue
| (among others). The user can dream up a new data type, define a few
| simple operations over the type (such as "=" and assignment), write
| a single line of code to do the instantiation, and voila: the user
| now has a new data type (a priority queue which can manage objects
| of the user's new type), and a considerable range of powerful
| operations over that queue, without ever looking at anyone else's
| implementation code; the entire process takes a grand total of
| about 5 minutes.

This is precisely my point. Whenever people talk about generic code, they
inevitably give examples of *simple* ADTs; ones that you learn in your
freshman programming class. I would argue that this would constitute an
extremely small percentage of the code for, say, an accounting program, a
user interface, or microcode for a tape drive. The extreme effort
expended in creating a generic stack, queue, or whatever brings little
savings to one trying to create a real application.


| There are also special-purpose ICs in the hardware industry, and
| these are the things which must be developed according to the methods
| described by Guillermo Arango in his brilliant article on Domain Analysis.
| The software equivalent of a special-purpose IC would be a package
| containing a set of domain-specific data types, functions, and procedures.

I agree with this in principle. What we have to avoid is making the
arrogant statement that Computer Science will give us the ability to
formalize all domains; i.e. succeed where others have been "failing" for
tens, hundreds or thousands of years (all the way back to Plato).
Guillermo's approach of "controlled approximations to *satisficing*
solutions" is exactly what I'm arguing for.

Let me make a clarifying remark. I have always stated that the hardware
analogy is *misleading* and is doing a great deal of harm to current thinking
about software reuse. There are surface similarities, but the problem is
much greater for software. A hardware engineer can flip through a few
hundred pages of hardware component descriptions and find what he needs.
Because software is so much more diverse than hardware, the number of
potential components increases by orders of magnitude. This makes the
problem *fundamentally* different - tools are needed to assist the
programmer.

-- Scott
sco...@boulder.colorado.edu

Rob MacLachlan

Sep 8, 1989, 1:09:02 PM

>From: billwolf%hazel.cs.c...@hubcap.clemson.edu (William Thomas Wolfe, 2847 )
>Subject: Re: Reasons for low reuse
>Date: 5 Sep 89 21:40:24 GMT

>
> There is a general progression which all fields go through:
>
> Art form => Engineering discipline => Hard Science
>
> The progress along this route is proportional to what is known
> about the field.

This is a ridiculously sweeping generalization, and also one pretty
discredited in philosophy-of-science circles. This "physics envy" has been
a justification for much bad science (e.g. behaviorism.)

As to the hardware analogy, I am not surprised non-technical managers find
it seductive: oversimplifications usually are. It does show that it would
be *nice* to have standard software components, but it doesn't show that it
is possible.

There is an illusory similarity because the analogy compares *computer*
hardware and *computer* software. You could have said:
"It takes only 47 kinds of screws to build an automobile, so we should
be able to make do with 47 software components."

But of course, that would sound silly...

Rob

Bruce Cohen;685-2439;61-028

Sep 8, 1989, 3:23:47 PM
In article <11...@boulder.Colorado.EDU> sco...@boulder.Colorado.EDU (Scott Henninger) writes:
>Let me make a clarifying remark. I have always stated that the hardware
>analogy is *misleading* and is doing a great deal of harm to current thinking
>about software reuse. There are surface similarities, but the problem is
>much greater for software. A hardware engineer can flip through a few
>hundred pages of hardware component descriptions and find what he needs.
>Because software is so much more diverse than hardware, the number of
>potential components increases by orders of magnitude. This makes the
>problem *fundamentally* different - tools are needed to assist the
>programmer.
>
There is another reason why the hardware problem is simpler than the
software one: the hardware domain has been deliberately restricted to make
standard components easy to specify. It's taken many years, but we
have arrived at the point where any new family of components MUST adhere to
a set of standard interfaces (five volt power supply, high-true logic,
standard logic level voltage minima and maxima, etc.). Standardizing
packages has enforced constraints on the partitioning of hardware systems
(if you have only 40 pins, you can't transfer 200 signals directly). And
the maximum manufacturable number of logic gates on a chip at a given time
places a limit on what a designer will attempt on one chip; software size
limitations are much more elastic.

Beyond these standardizations is another one: most hardware designs these
days are for digital transformers: widgets which take in a set of digital
signals and put out another set in response. Market conditions typically
dictate things like timing requirements, word size, etc. Similar
constraints on software are again less demanding: ASCII for character
representation does not really restrict the input much since there are so
many possible meaningful combinations of characters in any useful input
language.

>-- Scott
> sco...@boulder.colorado.edu

"Men in padded bras don't look the same falling from high places."
- R.A. McAvoy - "The Third Eagle"

Bruce Cohen
bru...@orca.wv.tek.com
Interactive Technologies Division, Tektronix, Inc.
M/S 61-028, P.O. Box 1000, Wilsonville, OR 97070

William Thomas Wolfe, 2847

Sep 9, 1989, 3:21:33 PM
From article <13...@well.UUCP>, by na...@well.UUCP (John Nagle):

> Having used a few of these products, my main observation is that
> using software components from multiple vendors tends to result in
> annoying problems. A typical problem stems from several components
> assuming that they can safely replace some standard portion of the
> standard C library (such as "printf" or "exit") with their own version.
> This is a good reason for only buying components for which one gets
> source code, so that such problems can be resolved.

Alternatively, you can use a language whose designers recognized
the need for portability; Ada prohibits subsets and supersets, and
issues validation certificates to compilers only after they have
passed a rigorous validation suite; even then, the certificate is
only good for about 18 months, at which point the compiler must face
a newer, stronger validation suite if it is to obtain a new certificate.


Bill Wolfe, wtw...@hubcap.clemson.edu

William Thomas Wolfe, 2847

Sep 9, 1989, 4:13:43 PM
From sco...@boulder.Colorado.EDU (Scott Henninger):

> To say this is to say that current reuse environments are adequate. They
> are not. The fact of the matter is that it is easier and takes less time
> to code it yourself than reuse existing code with current technology.

Not true; the work required to repeat the design, implementation,
and testing of a component is considerable. Furthermore, the
component's implementation is likely to be considerably more
sophisticated than the probable result of an attempt to do it
from scratch.

> Besides, and I repeat, *people are not aware of what reusable components
> exist*. This is primarily a tools problem, not a training problem.

My perspective is that if the market exists (i.e., if people are
looking for components and wishing for better tools), the tools
will spring up to meet the demand. Where people don't bother to
look, the tools aren't going to appear courtesy of the Tool Fairy.

> The NIH syndrome is simply a part of human nature.

There are a great many cultures throughout the world which practice
ideas which appear to be alien from the perspectives of other cultures.
This does not imply that a baby from one of those other cultures could
not be socialized into such a culture, such that the ostensibly alien
practices seem perfectly natural. Hence, I suggest that professional
training can be designed such that practices are internalized which
preclude the possibility of rejecting an idea on the basis of its origin.



> designing generic code is extremely difficult and time consuming.

The engineering of ANY product is usually difficult and time-consuming,
especially given that the designer of a generic component is trying to
ensure a great many things that the user doesn't even know s/he needs.
As a simple example, suppose that a user takes a component and uses it
in a sequential application. Later, the application is maintained such
that multitasking is introduced. Fortunately, the component's designer
implemented the component such that multiple requests can be processed
in parallel, while still maintaining serializability and recoverability.

This demonstrates that the benefits of using a component can extend far
beyond the initial construction of the application. Such an application
will probably be of considerably higher quality than a corresponding
application which did not make use of sophisticated software components.
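
One sketch, in Ada 83 terms, of the kind of component interior that
makes such a scenario work: a guardian task whose rendezvous
serializes concurrent requests (the names are invented, and a
production component would be more elaborate, e.g. allowing
non-interfering requests to proceed in parallel):

   package Counters is
      task type Guarded_Counter is
         entry Increment;
         entry Value (V : out Integer);
      end Guarded_Counter;
   end Counters;

   package body Counters is
      task body Guarded_Counter is
         Count : Integer := 0;
      begin
         loop
            select
               accept Increment do -- the rendezvous serializes requests
                  Count := Count + 1;
               end Increment;
            or
               accept Value (V : out Integer) do
                  V := Count;
               end Value;
            or
               terminate; -- shut down when no callers remain
            end select;
         end loop;
      end Guarded_Counter;
   end Counters;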



> And it will NEVER be able to handle all of the cases that programmers
> will come up with. The ability to tailor existing code is essential.

Given the existence of a highly sophisticated component, any user who
tries to modify it will probably only get into trouble. It's much
better to use the component in the implementation of a higher-level
user-defined interface than to try to modify a component's implementation.



> Whenever people talk about generic code, they inevitably give examples
> of *simple* ADTs; ones that you learn in your freshman programming class.
> I would argue that this would constitute an extremely small percentage of
> the code for, say, an accounting program, a user interface, or microcode
> for a tape drive.

Domain-specific components will come with their own abstract data types,
and will normally make considerable use of those types in the supplied
procedures and functions. While it is certainly conceivable that
generic parameters would prove useful in situations in which the
domain-specific processing is performed on arbitrary data types,
or using arbitrary user-defined procedures or functions, the fact
that generic code is much more vital to the area of "container" ADTs
has resulted in a short-term concentration of interest in that area.

Also, some of the situations you describe are ones in which a different
technique is applicable: packaging. An abstract interface will be
constructed (e.g., a standard CRT (or CPU) interface), and the
implementation of that package will translate those commands into
whatever is necessary to implement them on a given hardware system.
When porting the application it is only necessary to rewrite the
machine-dependent parts of the system, which will have been segregated
into the bodies of specific packages. The protection provided by the
package specification will ensure that the rest of the application
will remain unaffected.
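
A sketch of that packaging technique (the names are invented; only
the package body changes from machine to machine):

   package CRT_Interface is
      -- the abstract, machine-independent view seen by the application
      procedure Move_Cursor (Row, Column : in Positive);
      procedure Clear_Screen;
   end CRT_Interface;

   package body CRT_Interface is
      -- everything machine-dependent is segregated here; porting the
      -- application means rewriting only bodies like this one

      procedure Move_Cursor (Row, Column : in Positive) is
      begin
         null; -- emit whatever escape sequence this terminal requires
      end Move_Cursor;

      procedure Clear_Screen is
      begin
         null; -- likewise machine-dependent
      end Clear_Screen;

   end CRT_Interface;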

> Because software is so much more diverse than hardware, the number of
> potential components increases by orders of magnitude. This makes the
> problem *fundamentally* different - tools are needed to assist the
> programmer.

Tools *are* needed, but they won't sell if there isn't enough demand!!
If organizations can be convinced to commit to a reuse program, then
the tools will quickly follow.


Bill Wolfe, wtw...@hubcap.clemson.edu

Golden Richard

Sep 9, 1989, 7:17:17 PM
Another (perhaps secondary) reason that I see for the current "low reuse"
syndrome is the number of extremely sloppy components that are available.
While working on a project at the University of New Orleans, we decided
to "save time" and purchase a number of C components from a company in
New Jersey (whose name I should actually mention, but I won't). To make
a long story short, some of the development ended up taking 10X longer than
if we had simply written everything ourselves. Bugs started to creep in
and workarounds sometimes took days.

I'm all for reusable software components, but some reasonably formal system
of specifying *exactly* what the components do and a verification method
for providing better than "a 50-50 chance" of the components meeting specs
are absolutely necessary. A major problem with using someone
else's code in object form is that, barring a rigorous specification, it's
virtually impossible to ascertain *exactly* what a component does. Having
source is helpful, but who wants to poke through hundreds (thousands?) of
pages of code "verifying" functionality?

-=-
Golden Richard III OSU Department of Computer and Information Science
gric...@cis.ohio-state.edu "I'm absolutely positive! ...or not."

William Thomas Wolfe, 2847

Sep 9, 1989, 9:31:36 PM
From gric...@skiff.cis.ohio-state.edu (Golden Richard):

> I'm all for reusable software components, but some reasonably formal system
> of specifying *exactly* what the components do and a verification method
> for providing better than "a 50-50 chance" of the components meeting specs
> are absolutely necessary.

Actually, one of the purposes of this group is to provide a forum
in which manufacturers of substandard components can be identified
(and given an opportunity to defend themselves).

If such information is widely distributed throughout the component
user community, then such companies can be subjected to the economic
force of the market. They will have to either improve their products
or go out of business.

So if you have knowledge of a company which is marketing sloppy
products (such as that so-far-unnamed company selling C components
in New Jersey), please be quite specific; this will result in the
strengthening of the entire components market, to everyone's benefit.


Bill Wolfe, wtw...@hubcap.clemson.edu

William Thomas Wolfe, 2847

Sep 10, 1989, 1:06:26 PM
From article <61...@pt.cs.cmu.edu>, by r...@wb1.cs.cmu.edu (Rob MacLachlan):

>> There is a general progression which all fields go through:
>>
>> Art form => Engineering discipline => Hard Science
>>
>> The progress along this route is proportional to what is known
>> about the field.
>
> This is a ridiculously sweeping generalization, and also one pretty
> discredited in philosophy-of-science circles. This "physics envy"
> has been a justification for much bad science (e.g. behaviorism.)

This is not "physics envy", just an empirical description.

To show this, let's consider one of the "hardest" cases: the painting,
sculpture, etc. which collectively is referred to as Art. One might
think that there is no prospect for the evolution of these areas into
engineering discipline, or even into hard science. After all, we're
no closer to defining Art than we are to eradicating the cockroach...
but the two problems are actually "hard" for much the same reason.

The cockroach eradication problem is hard because we have not yet
been able to precisely define the cockroach system. Although much
is known (just as much is known about humans), we still do not know
the detailed operations of every organ. In particular, we do not
completely understand the operations of a cockroach's brain, much
less a human brain. If we were able to precisely define a cockroach,
at the molecular level, it would then be a matter of designing special
viruses which were programmed to seek out and destroy anything having
a specific genetic sequence which uniquely identifies the targeted
type of insect. In this way, the problem of cockroach eradication
would move from the present state (between art form and engineering
discipline) to almost a science. If the viruses could be programmed
such that it was provably impossible to elude the virus (through
genetic mutations) due to the high degree of genetic coverage the
virus possessed, and such that they would recursively chase down
every cockroach in the universe under consideration (essentially
driving the insect into extinction), then we would have the problem
down to a science.

Now why is it that we have trouble defining Art? It is precisely
because we do not have a precise definition for the term "human".
Given such a definition, the Art problem could conceivably be expressed
in terms of the production of specific neurochemicals in the user's
brain, or the formation of specific ideas (whatever the representation
of an idea is...), or the formation of specific memories. Having done
this, Art becomes an engineering discipline. As more is known about
how to satisfy the desired constraints upon the user's brain, Art
continues to move into the realm of science. Like the cockroach
eradication problem, the progress in this field has been blocked for
a very long time because it depends upon progress in some very hard
underlying problems which have taken a very long time to solve.

(I have directed followups to sci.philosophy.tech, since this is
getting pretty distant from comp.sw.components...)


Bill Wolfe, wtw...@hubcap.clemson.edu

Scott Simpson

Sep 11, 1989, 1:39:13 PM
In article <60...@tut.cis.ohio-state.edu> Golden Richard <gric...@cis.ohio-state.edu> writes:
>I'm all for reusable software components, but some reasonably formal system
>of specifying *exactly* what the components do and a verification method
>for providing better than "a 50-50 chance" of the components meeting specs
>are absolutely necessary. A major problem with using someone
>else's code in object form is that, barring a rigorous specification, it's
>virtually impossible to ascertain *exactly* what a component does.

I've been reading Bertrand Meyer's book (Object-Oriented Software
Construction), and one of the ways Eiffel
enforces correctness is by specifying semantic assertions that must be
met by routines in a class. Eiffel uses preconditions (require),
postconditions (ensure) and class invariants (invariant).
Preconditions and postconditions must only be met at the beginning and
end of a routine. Class invariants must be met at all "stable" times
(between routine calls). If you don't meet an assertion, an exception
is raised. Also, heirs to a class must meet all the assertions of its
parent, so you can't change behavior by inheriting. For example, the
ADT for a stack may look like (this is not Eiffel, just an ADT spec)


TYPES                                 -- Syntax
  STACK[X]                            -- Generic stack with type X
FUNCTIONS                             -- Syntax
  empty: STACK[X] -> BOOLEAN          -- Accessor function. Stack on left
  new:   -> STACK[X]                  -- Creation function. Stack on right
  push:  X x STACK[X] -> STACK[X]     -- Transformer function.
                                      -- Stack on both sides.
  pop:   STACK[X] /-> STACK[X]        -- Transformer. /-> denotes a partial
                                      -- function, i.e. one that doesn't
                                      -- work for all values (like an
                                      -- empty stack)
  top:   STACK[X] /-> X               -- Accessor
PRECONDITIONS                         -- Semantics. Clarifies partial functions.
  pre pop(s:STACK[X]) = (not empty(s))
  pre top(s:STACK[X]) = (not empty(s))
AXIOMS                                -- Semantics
  For all x:X, s:STACK[X]:
    empty(new())
    not empty(push(x,s))
    top(push(x,s)) = x
    pop(push(x,s)) = s

There is a lot of information there that specifies exactly how a stack
behaves. You can model virtually all of this in Eiffel.

Here is an Eiffel implementation:

class STACK[T]
export
push, pop, top, empty, nb_elements, full -- What clients can see
feature
implementation: ARRAY[T];
max_size: INTEGER;
nb_elements: INTEGER;

Create(n : INTEGER) is
-- Allocate stack for maximum of n elements
-- (or for no elements if n <= 0)
do
if n>0 then max_size := n end;
implementation.Create(1, max_size)
end; -- Create

empty : BOOLEAN is
-- Is stack empty?
do
Result := (nb_elements = 0)
end; -- empty

full : BOOLEAN is
-- Is stack full?
do
Result := (nb_elements = max_size)
end; -- full

pop is
-- Remove top element
require -- Precondition!
not empty -- i.e. nb_elements > 0
do
nb_elements := nb_elements - 1
ensure -- Postcondition!
not full;
nb_elements = old nb_elements - 1
end; -- pop

top : T is
-- Top element
require
not empty -- i.e. nb_elements > 0
do
Result := implementation.entry(nb_elements)
end; -- top

push(x : T) is
-- Add x on top
require
not full -- i.e. nb_elements < max_size in this representation
do
nb_elements := nb_elements + 1;
implementation.enter(nb_elements, x)
ensure
not empty;
top = x;
nb_elements = old nb_elements + 1
end; -- push

invariant -- These must always be satisfied at stable states.
0 <= nb_elements; nb_elements <= max_size;
empty = (nb_elements = 0)
end -- class STACK

The only AXIOM that is not ensured by this code is

pop(push(x,s)) = s

so you can see that enforcing correctness can be facilitated by the use
of assertions.
Scott Simpson
TRW Space and Defense Sector
usc!trwarcadia!simpson (UUCP)
trwarcadia!sim...@usc.edu (Internet)

Gary Murphy

Sep 12, 1989, 10:31:39 AM
In article <60...@tut.cis.ohio-state.edu> Golden Richard <gric...@cis.ohio-state.edu> writes:
>Another (perhaps secondary) reason that I see for the current "low reuse"
>syndrome is the number of extremely sloppy components that are available.
>...[an example of buying object code] ... A major problem with using someone

>else's code in object form is that, barring a rigorous specification, it's
>virtually impossible to ascertain *exactly* what a component does. Having
>source is helpful, but who wants to poke through hundreds (thousands?) of
>pages of code "verifying" functionality?

I've had similar experiences with C library functions packaged with a
compiler, with 3rd party code that uses 'slightly' variant library
functions, and some OOP systems where the interface is formally spelled
out but the rest kept elsewhere - in all three cases, I've either needed
to hunt down the real source or waste a great deal of time probing a
black box.

In this day and age, few programmers would even poke through the lines
they wrote themselves without a symbolic debugger, and if source is included,
all it takes is a single re-compile to create the step-able version.

And speaking of reuse, and this problem of source-code secrets, how does
this sit with the misplaced metaphor of software-as-literature with respect
to copyrights? (If I quote Tolstoy's first edition, and the passage is
stated contrarily in the fifth, do I still owe royalties? :-).
--
Gary Murphy - Cognos Incorporated - (613) 738-1338 x5537
3755 Riverside Dr - P.O. Box 9707 - Ottawa Ont - CANADA K1G 3N3
e-mail: decvax!utzoo!dciem!nrcaer!cognos!garym
Cosmic Irreversibility: 1 pot T -> 1 pot P, 1 pot P /-> 1 pot T

Don Dwiggins

Sep 12, 1989, 3:02:24 PM
Being real process-oriented lately, I'd like to put this reuse discussion in
a broader setting. Many assertions have been made about the possibilities
and issues of reuse, each based on unstated or only partially stated
assumptions about the process and setting of the supposed act of reuse. In
fact, there can be many such settings, each implying different constraints
and characteristics of the components, the collection of them, and the
"reuser". For example,

The personal toolkit: an experienced mechanic collects high-quality
tools, assumes responsibility for them, and takes them with him/her.
Having created a module to solve a problem, an experienced software
engineer might well generalize the solution a bit, polish the
implementation and the interface description, and drop it in the kit.

The corporate resource: rather than continually reinventing solutions to
recurring problems, the company creates a library and supporting staff
to capture the solutions and make them available to new projects. Note
that there's likely some domain leverage here; the company's projects
will often cover a lot of the same ground.

The commercial product: this is what seems to have been the assumption
behind most of the discussions so far. Even here, there are many
possible relationships among the component and library creators,
sellers, and buyers, and probably many possible types of "component"
(how about selling detailed designs or generic sources like MacApp?)

In all these settings, I think that the major problems are organizational,
psychological, and institutional -- meeting the differing goals of the
various folks involved. It'll probably take some "process prototyping" to
work these out. The technical problems that arise will be solved as part of
all this (and some technical issues will turn out to be non-problems).

What I'd like to see is some reuse success and horror stories along this
line: what was the setting and process, how did it work out, what problems
arose and how were they solved (if they were)?


--
Don Dwiggins "Solvitur Ambulando"
Ashton-Tate, Inc.
dwig...@ashtate.uucp

Ian Hopper

Sep 12, 1989, 11:43:32 PM
I have always felt that some form of Garbage Collection is
needed in order to do a decent job on reusable "Collection"
modules. Since the current favorite language (C++) does
not consistently (or easily) support GC, we do not see
good quality collection modules.

I believe the Smalltalk books come to the same conclusion, but
neither Smalltalk nor its creators are perfect.

The classic approach would be to copy-in and copy-out the contents
of such containers, but simply passing pointers to contents is
often much faster.

Do we focus our efforts on languages that have garbage collection,
or are we forced to use the copy-in/copy-out approach? Regrettably,
the USING code is quite different in the various cases.

Looking forward to responses,
-- Ian Hopper

PS: Writers of collection modules in GC-based environments should
not forget to create dynamically-expanding versions of their collections.
It is very annoying to have to estimate the size of collections in
advance, things always blow up on large data sets.

Sarge Gerbode

Sep 13, 1989, 10:59:06 PM
In article <64...@hubcap.clemson.edu> Bill Wolfe writes:

>From article <61...@pt.cs.cmu.edu>, by r...@wb1.cs.cmu.edu (Rob MacLachlan):
>There is a general progression which all fields go through:

>Art form => Engineering discipline => Hard Science

>The progress along this route is proportional to what is known
>about the field.

It seems to me, rather, that the three disciplines can better be
distinguished by their purposes.

Art is (I believe) principally concerned with the production and
communication of subjective experience (feelings, thoughts,
sensations, intimations, and the like).

Engineering is primarily concerned with control over the physical
universe.

Science is concerned with discovery of the laws of nature.

Any of these disciplines can be done with a greater or lesser amount
of knowledge. Various forms of crafts (such as those we learned in
Boy Scout camp) are primitive forms of engineering that require
considerably less knowledge than designing a spacecraft. The
fingerpainting we did in school is considerably less knowledgeable
than the work of Rembrandt. A baby's attempt to find out about
physical reality by alternately tasting, feeling, smelling, and seeing
an object is a primitive form of science -- he, too, is concerned with
discovering the regularities in his environment.

I don't think an expert in art is any less knowledgeable than his
counterpart in engineering, or that an expert engineer is more
ignorant than a top-level scientist. It is just that the subject
matter is different and the purpose is different.

Is one form of knowledge more "proper" than another? I don't see how
one could say that.
--
Sarge Gerbode -- UUCP: pyramid!thirdi!metapsy!sarge
Institute for Research in Metapsychology
950 Guinda St. Palo Alto, CA 94301

Russell Turpin

Sep 14, 1989, 1:21:43 PM

"Is one form of knowledge more 'proper' than another? I don't
see how one could say that."

Sarge Gerbode


"...the proper study of man is man."

Alexander Pope (?)

Barry W. Kort

Sep 14, 1989, 9:39:33 PM
In article <68...@cs.utexas.edu> tur...@cs.utexas.edu (Russell Turpin) writes:

> "Is one form of knowledge more 'proper' than another? I don't
> see how one could say that."

> Sarge Gerbode

I don't know about "proper", but some forms are more useful.

When a jigsaw puzzle is assembled so as to reveal the big picture,
the information is more usable than when the pieces lie jumbled
up in the box.

Facts which are sorted and organized are more useful than a
collection of unorganized data. We structure and organize
information by putting it into outline form, corresponding
to a hierarchical or "tree" topology. Complex systems of
information are compiled into "semantic networks" which
have richer topology than simple trees. Hypercard is a
good example of a tool for organizing information into a
semantic network, where you can easily navigate through
the knowledge base. Modern computer-based thesauri are
another example of a nicely structured semantic network.

So I feel it is useful (maybe even proper) to knit pieces
of information into a fabric of knowledge.

--Barry Kort

Scott Henninger

Sep 15, 1989, 1:09:45 PM
|>From: hop...@ntmtv.UUCP (Ian Hopper)

|
|I have always felt that some form of Garbage Collection is
|needed in order to do a decent job on reusable "Collection"
|modules. Since the current favorite language (C++) does
|not consistently (or easily) support GC, we do not see
|good quality collection modules.

Quality in what sense? Program efficiency? Cognitive efficiency (i.e. how
readable it is)? Correctness? Reliability?


-- Scott
sco...@boulder.colorado.edu

William Thomas Wolfe, 2847

Sep 16, 1989, 4:17:00 PM
From article <1...@ntmtv.UUCP>, by hop...@ntmtv.UUCP (Ian Hopper):

> I have always felt that some form of Garbage Collection is
> needed in order to do a decent job on reusable "Collection"
> modules.

Actually, garbage collection is unacceptable to certain
users, real-time users in particular. Therefore, if one
is designing a component for the largest possible audience,
that component must manage its own space. Anything less is
going to result in a component which is essentially useless
when real-time software systems are being constructed.

As far as languages go, I don't make use of GC (for the reason
outlined above), and therefore don't care whether a language
has GC or not. Considerations like whether or not *secure* ADTs
can be constructed (are limited private types available?), whether
they can be constructed to be safe and efficient in the presence of
multitasking, and so on, are much more important.

If there *is* garbage collection, whether required by a language
or provided as a compiler feature (e.g., in Ada, where GC is an
option that compilers are free to provide if the market wants it),
my primary concern is that I be able to turn the damn thing off.

Fortunately, Ada provides the pragma CONTROLLED, allowing me to
specify that a given type is not to be touched by any garbage
collector, a directive which compilers are required to obey; so
there is room for coexistence as far as GC is concerned.


Bill Wolfe, wtw...@hubcap.clemson.edu

John Gateley

unread,
Sep 17, 1989, 10:03:26 PM9/17/89
to
In article <64...@hubcap.clemson.edu> billwolf%hazel.cs.c...@hubcap.clemson.edu writes:
<From article <1...@ntmtv.UUCP>, by hop...@ntmtv.UUCP (Ian Hopper):
<> I have always felt that some form of Garbage Collection is
<> needed in order to do a decent job on reusable "Collection"
<> modules.
<
< Actually, garbage collection is unacceptable to certain
< users, real-time users in particular.

This is not true: there exists at least one real-time garbage collection
algorithm (don't have the reference right now, email if you want it).

John
gat...@m2.csc.ti.com

William Thomas Wolfe, 2847

unread,
Sep 18, 1989, 7:21:36 AM9/18/89
to
From gat...@m2.csc.ti.com (John Gateley):
> There exists at least one real-time garbage collection algorithm
> (don't have the reference right now, email if you want it).

I recently read of a linear-time algorithm (ignoring constant
factors, of course) -- the only problem is that it required
sixteen times the amount of space actually needed by the program!!!

The point is, any system which uses GC is going to be considerably
less efficient (and constant factors are important here!) than a
competing system in which the fact that an object is no longer
needed is directly communicated to the storage management system.

Now the advocates of GC frequently claim that space management is
too heavy a burden to be placed on the application programmer, and
this may well be true. However, there is a much better way to deal
with this problem. All that is necessary is to encapsulate the
details of storage management within abstract data types, such that
the destruction algorithm is integrated into the block exit mechanism.

By doing this, we place a level of abstraction between the application
programmer and the details of storage management. The use of pointers
is confined to the ADT's implementation, and the application programmer
is shielded from all the resulting complexities. The interface also
simplifies the work of the ADT's implementor; the destruction algorithm
is usually quite straightforward.
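
[ A minimal C++ sketch of the block-exit idea described above; the
IntList type and its operations are hypothetical illustrations, not
code from any poster's actual library: ]

    // A singly-linked list that owns all of its nodes.  Pointers are
    // confined to the implementation; the user never sees them.
    class IntList {
        struct Node { int value; Node* next; };
        Node* head = nullptr;
    public:
        IntList() = default;
        IntList(const IntList&) = delete;            // forbid aliasing
        IntList& operator=(const IntList&) = delete; // in this sketch
        void insert(int v) { head = new Node{v, head}; }
        // The destruction algorithm, integrated into block exit:
        ~IntList() {
            while (head != nullptr) {
                Node* dead = head;
                head = head->next;
                delete dead;
            }
        }
    };

    void application() {
        IntList l;        // declared at the start of a block
        l.insert(1);
        l.insert(2);
    }                     // ~IntList reclaims every node here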

Worrying about GC is essentially barking up the wrong tree; what we
need is better reuse mechanisms, so that we can also exploit the
information, supplied implicitly (with no special effort on the part
of the application programmer) upon each and every block exit, that
a given variable is no longer needed. Thus, we can have our
programming convenience WITHOUT eating our memory or CPU cycles, and
thereby keep everyone happy (except perhaps those hackers who get a
charge out of playing with pointers).


Bill Wolfe, wtw...@hubcap.clemson.edu

Steve Tynor

unread,
Sep 18, 1989, 9:05:17 AM9/18/89
to
...

> By doing this, we place a level of abstraction between the application
> programmer and the details of storage management. The use of pointers
> is confined to the ADT's implementation, and the application programmer
> is shielded from all the resulting complexities. The interface also
> simplifies the work of the ADT's implementor; the destruction algorithm
> is usually quite straightforward.

Freeing the memory _is_ usually straightforward. Knowing when it's safe to
do so _isn't_. That's when GC (or reference counting) becomes necessary.
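
[ A tiny hypothetical C++ fragment illustrating the point: the free
itself is one line, but deciding whether that line is safe requires
knowing about every other reference in the program: ]

    struct Buffer { int data[64]; };

    void demo() {
        Buffer* a = new Buffer;
        Buffer* b = a;     // a second reference, perhaps created far
                           // away in some other module
        delete a;          // freeing the memory _is_ straightforward...
        // ...but b now dangles; using or deleting it is undefined
        // behaviour.  Knowing that b exists is the hard part.
    }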

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
If the facts do not conform to the theory, they must be disposed of.

Steve Tynor
Georgia Tech Research Institute
Artificial Intelligence Branch
ty...@prism.gatech.edu

Rob MacLachlan

unread,
Sep 18, 1989, 10:22:11 AM9/18/89
to
>From: hop...@ntmtv.UUCP (Ian Hopper)
>Newsgroups: comp.sw.components

>Subject: Re: Reasons for low reuse
>Date: 13 Sep 89 03:43:32 GMT

>
>I have always felt that some form of Garbage Collection is
>needed in order to do a decent job on reusable "Collection"
>modules.
[...]

>
>The classic approach would be to copy-in and copy-out the contents
>of such containers, but simply passing pointers to contents is
>often much faster.
>
>Do we focus our efforts on languages that have garbage collection,
>or are we forced to use the copy-in/copy-out approach? Regrettably,
>the USING code is quite different in the various cases.

It seems to me that once you admit dynamic allocation into the language, you
must at least concede the \desirability/ of having GC. The only question is
whether the cost of supporting GC is excessive. There are two costs that I
see:
1] A performance penalty due to GC time and tag manipulation. Unless GC
support is controlled by a compiler switch, some of this overhead is also
present in programs that don't use GC. GC can also cause unpredictable
delays.
2] At the language level, a certain discipline in pointer use is required
so that the garbage collector can locate pointers and not be confused by
random data that might be a pointer.

Being a Lisp programmer, I understandably don't find either of
these costs prohibitive. The performance penalty isn't as much as people
suppose: a lot of progress has been made in GC algorithms in the past ten
years, and these algorithms have recently begun to see practical use in
languages such as Lisp, Smalltalk and ML. Also, as you hinted at above,
there are often hidden efficiencies that can only be safely realized in
systems that support GC. As for the language style issue, this isn't a
big problem for high-level languages other than C.

I think that Lisp is a constructive proof of the reuse potential in systems
that support "generic types", GC and first-class functions (e.g. procedures
in records, etc.) As many whining system managers will tell you, Lisp
systems are big. That bigness is mainly because a Lisp image has lots of
code in it; Lisp programs tend to have big working sets because lots of
that code is used.

Coming from a Lispy language perspective, the main challenge is not to make
the system smaller, but to make it bigger so that there is more stuff out
there that the programmer can use. (Of course, we want to maintain similar
cost-effectiveness, rather than just bloating with junk.)

>PS: Writers of collection modules in GC-based environments should not
>forget to create dynamically-expanding versions of their collections.

>It is very annoying to have to estimate the size of collections in
>advance; things always blow up on large data sets.

Yep, this is often something that we have to flame new Lisp programmers for
when we are breaking them in. Of course, if you use a linked-list
implementation of your data structures, this happens for free.
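
[ As a hedged illustration of the dynamically-expanding collections
Hopper asks for, a hypothetical C++ array that doubles its capacity
rather than blowing up on large data sets: ]

    #include <cstddef>

    class GrowArray {
        int* data = nullptr;
        std::size_t used = 0, cap = 0;
    public:
        GrowArray() = default;
        GrowArray(const GrowArray&) = delete;
        GrowArray& operator=(const GrowArray&) = delete;
        ~GrowArray() { delete[] data; }
        void append(int v) {
            if (used == cap) {                    // full: double it
                std::size_t ncap = cap ? 2 * cap : 8;
                int* ndata = new int[ncap];
                for (std::size_t i = 0; i < used; ++i)
                    ndata[i] = data[i];
                delete[] data;
                data = ndata;
                cap  = ncap;
            }
            data[used++] = v;
        }
    };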

Rob

Ted Dunning

unread,
Sep 18, 1989, 11:19:31 AM9/18/89
to

i find that i am mostly in agreement with mr. wolfe when
In article <64...@hubcap.clemson.edu> billwolf%hazel.cs.c...@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) writes:


The point is, any system which uses GC is going to be considerably
less efficient (and constant factors are important here!) than a
competing system in which the fact that an object is no longer
needed is directly communicated to the storage management
system.

this is not always true. if the objects are heavily shared, or if the
point in time at which the object becomes unreferencable is obscure, then
this is a difficult problem. people have used reference counting in
the past to solve this problem, but the general consensus has always
been (on lisp implementations) that devoting enough bits to reference
counting was impractical (it certainly is if every value with pointer
semantics must be tagged) and that reference counting collection was
more expensive than stop and copy. in addition, reference counting
does not compact storage as does any of the various forms of stop and
copy.
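
[ a concrete c++ illustration of the sharing problem: with reference
counts (std::shared_ptr here), a cycle keeps itself alive; the names
are hypothetical: ]

    #include <memory>

    struct Node {
        std::shared_ptr<Node> next;   // reference-counted link
    };

    void demo() {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;                  // a -> b
        b->next = a;                  // b -> a: a cycle
    }   // both counts drop to 1, never 0: the pair is never freed,
        // though a tracing collector would reclaim it.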

All that is necessary is to encapsulate the
details of storage management within abstract data types, such that
the destruction algorithm is integrated into the block exit
mechanism.

this is not sufficient. we also need to handle assignment (the old
value of the assignee becomes less referenced), and formal argument
passing (where we suddenly have one more reference). in c++ there is
also all of the complexity of implicit and explicit type conversion.

trickier is the issue of continuations and first class functional
objects. somehow all variables that support reference counting styles
of storage management which are referenced by a continuation, or by a
functional object, must be marked as referenced and not reclaimed on
block exit.

By doing this, we place a level of abstraction between the application
programmer and the details of storage management.

certainly the goal.

The use of pointers
is confined to the ADT's implementation, and the application programmer
is shielded from all the resulting complexities.

again, motherhood and apple pie.

The interface also
simplifies the work of the ADT's implementor; the destruction algorithm
is usually quite straightforward.

this is only straightforward in a language that does not support
advanced concepts (like ada and c++). once you talk about a
language as strong as scheme, then you have a much more interesting
problem, one that is, incidentally, easily handled by conventional
garbage collectors.

in fact, reference counting is not the only possibility in a language
like c++ which provides some knowledge of scope entry and exit for the
programmer. instead, it is possible to keep the equivalent of an
obarray, pushing and popping references to variables as needed. then
when gc is desired, one can do a stop and copy or mark/sweep with an
extra pass to correct all the pointers.
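
[ a hypothetical c++ sketch of that root-registry idea: each scope
pushes the addresses of its pointer variables, so a collector invoked
on demand can find every root and, after a copy, fix them up: ]

    #include <vector>

    // global registry of root pointers -- the "obarray" equivalent.
    static std::vector<void**> g_roots;

    struct RootGuard {                 // push on entry, pop on exit
        explicit RootGuard(void** slot) { g_roots.push_back(slot); }
        ~RootGuard() { g_roots.pop_back(); }
    };

    void gc();  // mark/sweep or stop-and-copy over g_roots; the
                // collector body is omitted from this sketch.

    void demo() {
        void* obj = nullptr;           // some heap object
        RootGuard guard(&obj);         // register the root
        // ... allocate, mutate, perhaps call gc() ...
    }                                  // root popped here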

Worrying about GC is essentially barking up the wrong tree; what we
need is better reuse mechanisms, so that we can also gain the added
benefit of exploiting the implicitly supplied information regarding
the fact that a given variable is no longer needed which is naturally
produced (with no special effort on the part of the application
programmer) upon each and every block exit.

this _is_ garbage collection. it just isn't stop and do x garbage
collection.

it should be kept in mind that in general, collection overhead increases
as you go from generational collection to stop and copy to mark sweep
to reference counting to `real time' collection methods (almost all of
which are based indirectly on baker's algorithm in the april '78
issue of cacm). if you care about average throughput, use stop and
copy, and maybe a generational copier. if you want a very simple
system, use ref counting. nobody really uses the real time methods
yet because they are expensive and complex.

--
t...@nmsu.edu
Most of all, he loved the fall
when the cottonwoods leaves turned gold
and floated down the trout streams
under the clear blue windswept skies.

William Thomas Wolfe, 2847

unread,
Sep 18, 1989, 1:41:43 PM9/18/89
to
From article <18...@hydra.gatech.EDU>, by ty...@prism.gatech.EDU (Steve Tynor):

> Freeing the memory _is_ usually straightforward. Knowing when it's safe to
> do so _isn't_. That's when GC (or reference counting) becomes necessary.

It's safe to do so:

1) When the variable goes out of scope

2) When the programmer specifies that it's to be destroyed sooner.

So assuming that we use abstract data types like good little
programmers, where's the need for reference counting?


Bill Wolfe, wtw...@hubcap.clemson.edu

William Thomas Wolfe, 2847

unread,
Sep 18, 1989, 1:51:25 PM9/18/89
to
>The classic approach would be to copy-in and copy-out the contents
>of such containers, but simply passing pointers to contents is
>often much faster.

There are two approaches:

1) Call it a container of objects. In this case, the semantics
are to store a copy, and the programmer may well rely on this,
perhaps destroying or modifying the original while relying on
the container to keep the earlier version handy.

2) Call it a container of pointers. In this case, the semantics
are to store a pointer, and the programmer must refrain from
destroying the original.

What we should NOT do is allege that we have a container of objects
when in reality we have a container of pointers; the two are very
different concepts with respect to the effects of object modification.
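
[ In C++ terms (a hypothetical illustration), the two approaches look
like this; the difference shows up as soon as the original is
modified: ]

    #include <vector>

    struct Item { int x; };

    void demo() {
        Item original{1};

        std::vector<Item>  by_value;    // a container of objects
        std::vector<Item*> by_ref;      // a container of pointers

        by_value.push_back(original);   // stores a copy
        by_ref.push_back(&original);    // stores a reference

        original.x = 99;

        // by_value[0].x is still 1: the earlier version was kept.
        // by_ref[0]->x is 99, and the entry dangles if `original`
        // is destroyed while the container still holds it.
    }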


Bill Wolfe, wtw...@hubcap.clemson.edu

Anton Rang

unread,
Sep 18, 1989, 5:05:02 PM9/18/89
to
In article <64...@hubcap.clemson.edu> billwolf%hazel.cs.c...@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) writes:
>From article <18...@hydra.gatech.EDU>, by ty...@prism.gatech.EDU (Steve Tynor):
>> Freeing the memory _is_ usually straightforward. Knowing when it's safe to
>> do so _isn't_. That's when GC (or reference counting) becomes necessary.
>
> It's safe to do so:
>
> 1) When the variable goes out of scope
>
> 2) When the programmer specifies that it's to be destroyed sooner.

If there's any explicit pointer manipulation, these conditions
aren't sufficient, because an object can be pointed to in more than
one place. For instance, if I have two objects (A and B) sharing
pointers to object X, then object X should be destroyed (assuming no
other references) when both A and B have been destroyed.
Personally, I prefer controlling my own data structures...leaving it
up to a garbage collector has always seemed a little sloppy to me.

+----------------------------------+------------------+
| Anton Rang (grad student) | "VMS Forever!" |
| University of Wisconsin--Madison | ra...@cs.wisc.edu |
+----------------------------------+------------------+

Stephen Crawley

unread,
Sep 18, 1989, 8:25:38 PM9/18/89
to
The most important distinction between ordinary and real-time programs
is that the real-time programs need to execute in predictable time.
An ordinary synchronous garbage collector that stops execution for a
non-trivial amount of time is clearly not acceptable in this context.
But an asynchronous garbage collector that allows GC to be done in
short bursts during periods of inactivity MAY be ok for real-time
programming.

The key question that determines whether or not asynch GC will work in
real-time is whether there is enough idle time to keep up with garbage
collection. In many cases the answer should be yes! And if the answer
is no, it may be possible to MAKE more idle time by 1) using
a faster processor or 2) moving the task of garbage collection onto a
second processor. Only when we are pushing the limits of the hardware
will the answer be no.

A second question that affects the viability of GC in a real-time program
is how much garbage gets generated. I claim that for most of the
real-time programs being written today, the answer is "not a lot". If a
programmer can decide to free a piece of dynamic storage on exiting a
scope block, then a smart compiler could make the same decision in many cases.
Similarly a smart compiler should be able to optimise storage reclamation
in other cases. [It is my guess that the main reason that compilers that
optimise GC don't exist is that the industry is prejudiced against GC]

There are of course cases that cannot be solved by freeing stuff on
scope exit, or by other static tricks. In such cases, the non-GC'ed
solution MUST be handled by the application. Bill Wolfe seems to be saying
that such situations almost never arise in real-time problems. This may
be true right now, but it won't be long before the military start asking
for real-time AI!

Finally, I'd like to rebut the idea that ADT's that do their own storage
management are appropriate for use in a GC'ed environment. If a type
does its own management, then in order to get it to do its stuff, it must
export an explicit destructor operation or its equivalent. How / when does
the destructor get called? There are 2 options as I see it:

1) A type that refers to a non-GC'ed type becomes non-GC'ed, and must
have a destructor operation. We soon find that large parts of our
application have to understand storage management policy.

2) We arrange that the garbage collector provides hooks for calling
an object's destructor operation when the object becomes garbage
in the conventional sense. This makes garbage collection more
complicated. For example, when the GC finds some garbage instances
of ADT's with a destructor, it must decide on the correct order to
destroy them.

In either case, we end up with the problems we were trying to avoid by
using garbage collection: complication for the application, and lots of
potential for storage leaks and other bugs.

In summary, we need to recognise 3 things:

o A GC'ed language is the only sane approach to many hard programming
problems. Some of these problems WILL also be real-time problems.

o A GC'ed language is not necessarily inappropriate for real-time
problems. It depends on the problem, the tools (GC and compiler)
and on the hardware.

o GC'ed languages and non-GC'ed packages don't mix, so we should
not try to mix them. This is a case where software reuse is NOT
appropriate.

-- Steve

William Thomas Wolfe, 2847

unread,
Sep 18, 1989, 8:34:24 PM9/18/89
to
From ra...@cs.wisc.edu (Anton Rang):

>> It's safe to do so:
>> 1) When the variable goes out of scope
>> 2) When the programmer specifies that it's to be destroyed sooner.
>
> If there's any explicit pointer manipulation, these conditions
> aren't sufficient, because an object can be pointed to in more than
> one place.

True, but the basic thrust of the position is that the programmer
should generally not use pointers explicitly; rather, the use of
pointers should be made safe via encapsulation within a secure ADT.



> Personally, I prefer controlling my own data structures...leaving it
> up to a garbage collector has always seemed a little sloppy to me.

Same here. I welcome GC algorithms as an ADT *debugging* tool
(raising an exception if the algorithm ever succeeds in finding
any loose storage), but it just costs way too much at RUN time.


Bill Wolfe, wtw...@hubcap.clemson.edu

Steve Tynor

unread,
Sep 19, 1989, 9:48:23 AM9/19/89
to
>From ra...@cs.wisc.edu (Anton Rang):
>>> It's safe to do so:
>>> 1) When the variable goes out of scope
>>> 2) When the programmer specifies that it's to be destroyed sooner.
>>
>> If there's any explicit pointer manipulation, these conditions
>> aren't sufficient, because an object can be pointed to in more than
>> one place.
>
> True, but the basic thrust of the position is that the programmer
> should generally not use pointers explicitly; rather, the use of
> pointers should be made safe via encapsulation within a secure ADT.

Whether a reference to a shared object is a pointer, or is hidden via some
clever encapsulation, I can't see how the programmer can be expected to know
when to explicitly deallocate the reference without maintaining reference
counts (whether he does it, or the ADT - it's still reference counting) or
relying on GC (or assuming that references are _never_ shared - which seems
like a simplistic assumption).

I am (as we speak) trying to implement an ADT (in Ada) which contains objects
whose lifetimes are not lexically scoped and to which many other objects may
refer. I'd be very happy to avoid reference counts and GC if someone can
suggest a way to do so!

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
WITH disclaimer; USE disclaimer;

joh...@p.cs.uiuc.edu

unread,
Sep 19, 1989, 10:53:00 AM9/19/89
to

> Written 6:21 am Sep 18, 1989 by billwolf%haz...@hubcap.clemson.edu
> Now the advocates of GC frequently claim that space management is
> too heavy a burden to be placed on the application programmer, and
> this may well be true. However, there is a much better way to deal
> with this problem. All that is necessary is to encapsulate the
> details of storage management within abstract data types, such that
> the destruction algorithm is integrated into the block exit mechanism.

> By doing this, we place a level of abstraction between the application
> programmer and the details of storage management. The use of pointers
> is confined to the ADT's implementation, and the application programmer
> is shielded from all the resulting complexities. The interface also
> simplifies the work of the ADT's implementor; the destruction algorithm
> is usually quite straightforward.

Bill Wolfe is being hopelessly idealistic. I've used C++ a lot, which
takes the approach that he advocates, and there are many times when it
is very hard for an ADT to figure out when to destroy an object. Moreover,
memory management is the source of many bugs and takes a lot of programmer
time to get right. In general, we want to avoid programming language
features that encourage bugs and cause programmers to spend a lot of time.
C++ programmers tend to minimize their use of dynamic memory management,
probably partly for this reason.

Moreover, there are times when memory management would be more efficient
if it were implemented by the compiler instead of by the programmer.
For example, C++ programmers use reference counting a lot, and there are
a number of algorithms by which even stupid compilers can eliminate a
lot of reference counting. Of course, programmers can perform these
optimizations by hand, but I believe that compilers should be in charge
of optimization, not programmers. Most languages with garbage collection
are implemented by interpreters or by very simple compilers, and so we
haven't seen much optimization of memory management, but we will in the
future, as more and more languages include garbage collection.

Many garbage collection systems are quite efficient. For example,
some generation scavenging systems have a 3% to 5% overhead. The
problem is that the efficient systems are not real-time, and the
real-time garbage collection algorithms are not efficient. The
algorithm by Appel et al. in the '88 SIGPLAN programming language
implementation conference is supposed to be both, but I am not sure
that it has ever been used in a real-time environment. In any
case, there were no timings given, and "real-time" is always relative.

Ralph Johnson

William Thomas Wolfe, 2847

unread,
Sep 19, 1989, 11:12:17 AM9/19/89
to
From s...@cl.cam.ac.uk (Stephen Crawley):

> The key question that determines whether or not asynch GC will work in
> real-time is whether there is enough idle time to keep up with garbage
> collection. [...] Only when we are pushing the limits of the hardware

> will the answer be no.

Another key question is "How can we make this product such that it
uses as little memory and CPU as possible, so that we might underbid
our competitors?"; the reuse of components which do their own space
management will permit the elimination of GC, thus providing an edge.



> Finally, I'd like to rebut the idea that ADT's that do their own storage

> management are appropriate for use in a GC'ed environment. [...]
> [how to handle interactions between managed and unmanaged storage?]

I'm not an expert on how this is done, but I'd guess that memory is
divided into two regions: managed and unmanaged. The GC system could
*read* the managed region (to detect pointers into the unmanaged area),
but only for the purpose of operating upon the unmanaged region; it
must refrain from taking any actions with respect to managed storage.

Probably there are considerably more sophisticated techniques
involved; I'd suggest porting this thread into comp.lang.ada
if someone is seriously interested in the question, since Ada
compilers which provide GC are also required to enforce the
CONTROLLED pragma (which specifies that this type manages its
own storage).


Bill Wolfe, wtw...@hubcap.clemson.edu

William Thomas Wolfe, 2847

unread,
Sep 19, 1989, 11:41:20 AM9/19/89
to
From ty...@prism.gatech.EDU (Steve Tynor):

>>>> It's safe to do so:
>>>> 1) When the variable goes out of scope
>>>> 2) When the programmer specifies that it's to be destroyed sooner.
>
> Whether a reference to a shared object is a pointer, or is hidden via some
> clever encapsulation, I can't see how the programmer can be expected to know
> when to explicitly deallocate the reference without maintaining reference
> counts (whether he does it, or the ADT - it's still reference counting) or
> relying on GC (or assuming that references are _never_ shared - which seems
> like a simplistic assumption).

OK, first let me clarify my position. The primary use for pointers
is simply to implement abstract data types -- lists, trees, graphs, etc.;
by shielding the user from the details of implementation, we can keep
the user from having to worry about manipulating pointers.

Now your question, I believe, is "What if the user wants to store
pointers in a container?"... the response is that the user probably
should seriously consider the possibility that s/he is making an
inappropriate use of pointers.

Let's suppose the user has an object with lots of "inertia", which
causes the user to initially think that pointers are a good way to
"move" this object as necessary. This can be addressed with a
technique which applies to an even harder problem: the case in
which the object is not even locally present. The solution is
to assign unique identification numbers to objects of the type,
keep the objects in a database (which may of course be a distributed
database), and then deal with the identification numbers instead.

Identification numbers have many nice properties: they can be arbitrary
abstractions (e.g., encoding certain properties of their client objects),
they are totally machine-independent, and they are not subject to any
a priori limitations. Information regarding the maximum lifetime of the
object is a particularly good property to encode into the identification
number, since the holder can then quickly check whether the object has
expired without even consulting the database. Where objects can have
an infinite lifetime, a protocol can be devised whereby the database
must be checked at least once per some arbitrary time period by all
users, which will permit identification numbers to be recycled after
one full time period of nonexistence. Other protocols might be used
depending upon the characteristics of the specific object involved.
The techniques for managing the active range of identification numbers
can also be quite sophisticated.
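
[ A hypothetical C++ sketch of such an identification number, with the
expiration time encoded in the high bits so that a holder can test for
expiry without consulting the database; the 32/32 layout is
illustrative only: ]

    #include <cstdint>
    #include <ctime>

    using ObjectId = std::uint64_t;  // high 32 bits: expiration time
                                     // low  32 bits: serial number

    ObjectId make_id(std::uint32_t serial, std::time_t expires) {
        return (static_cast<ObjectId>(expires) << 32) | serial;
    }

    bool expired(ObjectId id) {      // no database consultation needed
        auto expires = static_cast<std::time_t>(id >> 32);
        return std::time(nullptr) > expires;
    }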

> I am (as we speak) trying to implement a ADT (in Ada) which contains
> objects whose lifetimes are not lexically scoped and to which many
> other objects may refer. I'd be very happy to avoid reference counts
> and GC if someone can suggest a way to do so!

Consider it done.


Bill Wolfe, wtw...@hubcap.clemson.edu

Steve Tynor

unread,
Sep 19, 1989, 12:55:49 PM9/19/89
to
In article <65...@hubcap.clemson.edu> billwolf%hazel.cs.c...@hubcap.clemson.edu writes:
...
> expired without even consulting the database. Where objects can have
> an infinite lifetime, a protocol can be devised whereby the database
> must be checked at least once per some arbitrary time period by all
> users, which will permit identification numbers to be recycled after
> one full time period of nonexistence. Other protocols might be used

Gee. This sounds an awful lot like garbage collection to me.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Progress means replacing something wrong with something more subtly wrong.

William Thomas Wolfe, 2847

unread,
Sep 19, 1989, 2:49:23 PM9/19/89
to
From article <19...@hydra.gatech.EDU>, by ty...@prism.gatech.EDU (Steve Tynor):

>> expired without even consulting the database. Where objects can have
>> an infinite lifetime, a protocol can be devised whereby the database
>> must be checked at least once per some arbitrary time period by all
>> users, which will permit identification numbers to be recycled after
>> one full time period of nonexistence. Other protocols might be used
>
> Gee. This sounds an awful lot like garbage collection to me.

No... GC would involve verifying (at tremendous cost) that no more
occurrences of a given identification number exist anywhere. This
protocol specifies that identification numbers EXPIRE AUTOMATICALLY
unless certain conditions are met. GC is restricted to a particular
machine; this protocol will function even in a distributed environment.

Usually, the semantics of the object will allow even better controls,
such as a fixed expiration date, but this protocol covers the worst case.


Bill Wolfe, wtw...@hubcap.clemson.edu

Stephen Crawley

unread,
Sep 19, 1989, 8:23:22 PM9/19/89
to
Bill Wolf (?) writes:
>>> It's safe to [reclaim a dynamically allocated ADT]:

>>> 1) When the variable goes out of scope
>>> 2) When the programmer specifies that it's to be destroyed sooner.

Anton Rang replies:


>> If there's any explicit pointer manipulation, these conditions
>> aren't sufficient, because an object can be pointed to in more than
>> one place.

In addition:
1) doesn't work if a given scope doesn't exit before you've run
out of space OR if the value in a variable is passed out of the scope.
2) doesn't work unless your programmer writes perfect code.

Bill Wolf replies:


> True, but the basic thrust of the position is that the programmer
> should generally not use pointers explicitly; rather, the use of
> pointers should be made safe via encapsulation within a secure ADT.

Anton is absolutely right. It makes no difference whether references
to an ADT are implicit or explicit. The ADT cannot know whether or not
the application that uses it still holds a reference to a given instance.
In all but the simplest cases the application must take a hand in the ADT's
storage management, either by invoking the ADT's deallocator explicitly
or by telling the ADT when to increment/decrement a reference count.

Ian Hopper

unread,
Sep 20, 1989, 12:38:18 AM9/20/89
to
From article <11...@boulder.Colorado.EDU>, by sco...@boulder.Colorado.EDU (Scott Henninger):

That is a perfectly reasonable question, given my vague posting.
Sorry to waste your time on the first pass. (Novice poster.)
All of your suggestions apply, I would say. Performance can be an
application-specific issue, and so is GC.

I interpret "quality" in a reusable package to mean: lack of internal
bugs and an interface that allows the package to be used without hassles.
Typical hassles are listed as items 2-5 below.

The (unstated) question I am getting at is: do we need to assume
mark-and-sweep-like garbage collection in order to write significant
amounts of reusable software?

One disclaimer: I don't have a copy of the Gorlen "OOPS" C++ library,
so I am forced to speculate on what is available from it. I did not
wish to imply that it (or any other particular library) is broken or
"bad", rather that such "good" implementations are difficult, and there
are very few of them around. Only OOPS is in a language without a
standard garbage collector; the others, Smalltalk and Eiffel, have GC's.
Eiffel's classes, however, suffer from the fixed-at-creation
problem my "PS" was referring to. Smalltalk may not have the
performance that some applications need and lacks static type checking.

I believe that one of the following applies to a reusable collection "module":

1) There is a mark-and-sweep garbage collector available.

I claim this is "best", provided you can afford GC. I know
of several GC's that ease the "hiccups" that the classic M-S
approach entailed.

2) The library implements a mark-sweep garbage collector.
Perhaps restricting the gc capability to objects that
comply with some restrictions.

This is OK, but there are restrictions on use. M-S garbage
collectors are often not portable.

3) The library uses some form of reference counting and punts the
problem of circular references to the client of the library.

If circular references are provably not a problem for the
particular case, then this is perfect. Again, there may
be additional restrictions on what sorts of objects can be
put into a collection. (SubClass of: ReferenceCountableObject.)

4) The library copies collection elements into and out of the
collection objects.

This is slow for contents that are large objects in their own right.
Putting an object in and then retrieving it yields
a copy, rather than the original object. This may be inappropriate.
Objects cannot tell whether they are a member of a collection,
except by value. Membership in multiple collections is not
easily represented, except by value.

5) The collection class forces its users to carefully track
the "ownership" of objects. This makes it difficult to
allow an object to be a member of a varying number of
collections without extra book-keeping.

The source of many subtle bugs in users' code. I believe that
this is the "typical" approach.

(Still :) Interested in responses,

Ian

William Thomas Wolfe, 2847

unread,
Sep 20, 1989, 10:46:24 AM9/20/89
to
From s...@cl.cam.ac.uk (Stephen Crawley):

>>>> It's safe to [reclaim a dynamically allocated ADT]:
>>>> 1) When the variable goes out of scope
>>>> 2) When the programmer specifies that it's to be destroyed sooner.
>> [...] the basic thrust of the position is that the programmer

>> should generally not use pointers explicitly; rather, the use of
>> pointers should be made safe via encapsulation within a secure ADT.
>
> It makes no difference whether references to an ADT is implicit
> or explicit. The ADT cannot know whether or not the application
> that uses it still holds a reference to a given instance.

The pointers are INTERNAL to the ADT, and the user will never
see them. When the ADT goes out of scope, it dies just like
any predefined type would. We are considering here an ADT which
is declared at the beginning of a (procedure, function, block).
This ADT will do dynamic allocation INTERNALLY so as to facilitate
its own expansion and contraction as necessary. Upon block exit,
the ADT's destroy procedure will be invoked, and all the space
occupied by the ADT will be released.


Bill Wolfe, wtw...@hubcap.clemson.edu

Stephen Crawley

unread,
Sep 20, 1989, 6:30:10 PM9/20/89
to

From Bill Wolfe:


OK ... in the trivial case your scheme does work. The conditions
for it to work include:

a) NO references to the ADT are passed out of scope.

b) the scope exits before you run short of space.

c) the programmer (i.e. the application!) does not incorrectly tell
the ADT to release an instance too soon.

If any of a), b) or c) fails to hold, the program dies. Condition a) can
be shown to be true, provided that you can trace all of the side-effects
that occur within the scope and PROVE that they do not violate the
condition. Condition b) is hard to reason about formally except to the
extent that it is possible to place an upper bound on the amount of
memory allocated within the scope. Condition c) is virtually impossible
to reason about ... unless you do something drastic like forbidding
all side-effects.

There are other problems with your scheme:

o Point 2) is in fact the application taking a hand in an ADT's
storage management ... something that Bill has been saying is
unnecessary. [If you leave out 2), then your scheme is equivalent
to a stack based storage management with an extra stack!]

o GC'ed memory management can reclaim junk (heap nodes that are no longer
needed) whenever it runs out of space, whereas a scope based scheme
can in general only identify and hence reclaim junk on scope exit.
In other words, scope based reclamation of memory will in general
need more memory.

Finally, don't forget the case where condition a) is not true; i.e.
some of the dynamically allocated ADT instances are passed out of scope
either explicitly (as a result) or implicitly (by side-effect).

----

This whole GC versus non-GC debate seems to hinge on the following trade-off.
Do we want programs that

are slower (by say 25%), but are intrinsically free of a large class of
storage management errors.

or

are usually faster (not always), but may use more storage, may have
potential storage leaks and other storage management bugs, and which
arguably take longer to develop.

----

I would recommend that people should read Chapter 16 (Memory Management)
of Bertrand Meyer's book "Object Oriented Software Construction" for a
calm and rational analysis of the pros and cons of various methods of
memory management.

Stephen Crawley

unread,
Sep 20, 1989, 9:27:59 PM9/20/89
to
[Sorry about the Re:^n's ... bugs in my news reader]

Bill Wolfe proposes a scheme for getting around the need for garbage
collection by storing objects in a database, using object expiration
times to avoid proper garbage collection. Unfortunately this scheme
doesn't achieve its aim.

Bill says:

"Where objects can have an infinite lifetime, a protocol can be devised
whereby the database must be checked at least once per some arbitrary
time period by all users, which will permit identification numbers
to be recycled after one full time period of nonexistence."

Exactly. In order to check each object that it still refers to, each
"user" (actually application) must traverse the graph of its objects in
the database. This process is exactly equivalent to the mark phase of
a garbage collection, except that the application is doing it!!!

One other point that Bill Wolfe misses is that the CPU overheads of
managing objects in this way can be orders of magnitude greater
than the cost of using ordinary pointers.

Don't get me wrong, I've been doing research in this particular area for
a number of years, and I strongly believe that object databases are the
way of the future. But they ain't cheap, and they ain't a substitute
for garbage collection.

-- Steve

Stephen Crawley

unread,
Sep 20, 1989, 9:52:38 PM9/20/89
to
Bill Wolfe writes:

> OK, first let me clarify my position. The primary use for pointers
> is simply to implement abstract data types -- lists, trees, graphs, etc.;
> by shielding the user from the details of implementation, we can keep
> the user from having to worry about manipulating pointers.

Frankly, I can't see how this is possible. Suppose I have an ADT 'A'
that has a generator operation NEW and a destructor function DISPOSE.
Suppose I want a generic ADT LIST[t: type] that can be used in the
following way.

   a: A
   aa: A

   while true do
      a := A$NEW(...)
      aa := A$NEW(...)

      l: LIST[A] := LIST[A]$MAKE_EMPTY()

      LIST[A]$INSERT(l, a)
      LIST[A]$INSERT(l, A$NEW(...))
      LIST[A]$INSERT(l, a)

      aa := LIST[A]$REMOVE_HEAD(l)

      etc

   end while


The generation of instances of A must be under the control of the
application and assignment of A must be handled. I must be able to
put an A instance into a list any number of times.

How do I implement the LIST ADT so that it invokes the A$DISPOSE
operation at the right time? When is the right time? What happens
if I want to put an instance of A into 2 LIST's at the same time?

William Thomas Wolfe, 2847

unread,
Sep 21, 1989, 12:08:26 PM9/21/89
to
From s...@cl.cam.ac.uk (Stephen Crawley):

>> This ADT will do dynamic allocation INTERNALLY so as to facilitate
>> its own expansion and contraction as necessary. Upon block exit,
>> the ADT's destroy procedure will be invoked, and all the space
>> occupied by the ADT will be released.
>
> OK ... in the trivial case your scheme does work. The conditions
> for it to work include:
>
> a) NO references to the ADT are passed out of scope.

In keeping with the general principle that application programmers
should not be using pointers; appropriate ADTs should be used instead.

> b) the scope exits before you run short of space.

Or you have programmed your application to properly handle the
exception STORAGE_ERROR. The sequence is then as follows:

1) An ADT request comes in for which the ADT decides it will
need more space. The ADT attempts to obtain the space, but
STORAGE_ERROR is raised. Fortunately, the ADT is programmed
so as to guarantee to the user that the ADT will remain in
its original state (i.e., the transaction will abort).

2) The ADT now either directly propagates STORAGE_ERROR or
something equivalent (Unable_To_Comply, for example) to
the user, in accordance with the ADT's specification.

3) The user can then take whatever steps are deemed appropriate
in order to handle the exception. Note carefully that only
the user is in a position to decide what data structures
to throw away in such a crisis.
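
[ A hedged C++ analogue of steps 1-3 above, with std::bad_alloc
playing the role of STORAGE_ERROR; the Container name and layout are
hypothetical: ]

    #include <new>

    class Container {
        struct Node { int key; Node* next; };
        Node* head = nullptr;
    public:
        Container() = default;
        Container(const Container&) = delete;
        Container& operator=(const Container&) = delete;
        // Step 1: `new` throws before any state has changed, so the
        // transaction aborts with the container intact.  Step 2: the
        // exception propagates to the user unaltered.
        void insert(int key) {
            head = new Node{key, head};   // may throw std::bad_alloc
        }
        ~Container() {
            while (head) { Node* d = head; head = head->next; delete d; }
        }
    };

    void user(Container& c) {
        try {
            c.insert(42);
        } catch (const std::bad_alloc&) {
            // Step 3: only the user knows which data structures are
            // expendable; discard one and retry the operation.
        }
    }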

> c) the programmer (i.e. the application!) does not incorrectly tell
> the ADT to release an instance too soon.

Since the default is that the ADT will die automatically upon going
out of scope, the user will presumably refrain from invoking Destroy
without a fairly good reason.

> o GC'ed memory management can reclaim junk (heap nodes that are no longer
> needed) whenever it runs out of space, whereas a scope based scheme
> can in general only identify and hence reclaim junk on scope exit.
> In other words, scope based reclamation of memory will in general
> need more memory.

Not true. Consider -- what precisely is junk? If we define it as
"memory that is no longer needed due to deallocate operations", then
the ADT will automatically free up all such memory by virtue of the
semantics of the deallocation process. An order to deallocate
constitutes a guarantee from the ADT that it is completely safe
to do so; the ADT knows this because it has received an order from the
user (e.g., Remove_Item) which removes the need for the space,
and since the ADT is in total control of the space involved, it can
proceed to issue the deallocation order and thereby reduce the size
of the ADT's representation.

If we define junk as "memory which is strictly not necessary at this
point in the program", then we require a "Read Programmer's Mind"
instruction. Not happening to have such an instruction, we must rely
on the programmer to communicate the fact that a data structure is
no longer necessary via the Destroy command.

Incidentally, the ADT can provide as one of its services a function
which indicates the amount of space currently occupied by the ADT,
which the user can make use of when it becomes necessary to decide
which data structures to throw away. This is made possible by the
'SIZE attribute in Ada; the ADT simply adds up the 'SIZE of all the
space it is occupying, and returns that value to the user. The user
can then simply select the data structure for which the cost function
(Size/Value) is maximal and order it Destroyed, and then proceed to
retry the aborted operation.
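
[ A hypothetical C++ sketch of the space-reporting service and the
Size/Value selection rule described above; all names are illustrative: ]

    #include <cstddef>

    // A hypothetical ADT that tracks its own footprint (the role the
    // 'SIZE attribute plays in the Ada description above).
    class Structure {
        std::size_t bytes = 0;   // updated on every internal
                                 // allocation and deallocation
    public:
        std::size_t size() const { return bytes; }
        double value() const { return 1.0; }   // application-defined
        void destroy();                        // body omitted here
    };

    // The user destroys the structure whose Size/Value cost is
    // largest, then retries the aborted operation.
    Structure* most_expendable(Structure* a, Structure* b) {
        double ca = a->size() / a->value();
        double cb = b->size() / b->value();
        return ca >= cb ? a : b;
    }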



> Finally, don't forget that the case where condition a) is not true; i.e.
> some of the dynamically allocated ADT instance are passed out of scope
> either explicitly (as a result) or implicitly (by side-effect).

Which can't happen unless either the applications programmer is
using pointers directly (naughty naughty), or something called by the
application programmer (in effect, an extension of the application
programmer) is doing it on his/her behalf. In the latter case,
this presumably is in accordance with the called unit's specification,
and hence it remains the fault of the applications programmer.

The rule is simple: Leave the use of pointers to ADT implementors,
and simply reuse the code they produce. Exploit to the absolute
fullest the hard guarantees (proper space management, serializability,
recoverability, concurrent processing, etc.) that the ADT implementors
worked so hard to provide you with, and thy Manager will be pleased.



> This whole GC versus non-GC debate seems to hinge on the following trade-off.
> Do we want programs that are slower (by say 25%), but are intrinsically
> free of a large class of storage management errors, or are usually faster
> (not always), but may use more storage, may have potential storage leaks
> and other storage management bugs, and which arguably take longer to
> develop.

All of the negatives you cite (except "may use more storage", which
is bogus) are overcome by the reuse paradigm. The ADT is developed
with extremely high quality at a high cost, but this cost is then
spread over an extremely large number of users until the end of time;
the result is very much an economic win-win situation.


Bill Wolfe, wtw...@hubcap.clemson.edu

Thomas Maslen

unread,
Sep 21, 1989, 4:00:40 PM9/21/89
to
In the last week or so there has been an interesting, er, discussion on
the desirability of GC in production code. Bill "ADA is the one true
way" Wolfe and perhaps Anton Rang have argued the negative case, while
Ian Hopper, Scott Henninger, John Gateley, Steve Tynor, Rob MacLachlan,
Stephen Crawley, Ted Dunning and Ralph Johnson have been on the
affirmative [my apologies if I've mis-characterized anyone's position].
No prizes for guessing my biases: I think the people arguing for GC
have the right idea.

The paragraph above added exactly no technical content to this
discussion. What we need is something that does. If we're to have
any hope of convincing the ADAholics that maybe, just maybe, languages
need GC in order to be useful, we need a crisp example of a problem
that:
- requires various unrelated parts of a program to use one
instance of an ADT (either because it's too large to copy, or
because we really do want updates to apply to the one
instance),
- doesn't have any obvious point at which we can say "we're sure
we've finished with it now, but we definitely couldn't have
got rid of it much earlier",
- has sufficiently stringent efficiency requirements that we
can't pass around magic numbers, or whatever it was that
Bill W was proposing a few days ago, which have to be
translated before we can actually use the ADT,
- is sufficiently tricky that we can't rely on the programmer
magically remembering to "do the right thing",
- is arguably representative of a non-trivial class of real
problems.
I don't have such a problem handy, just a gut feeling that the insides
of real compilers are rich with suitable examples. Any volunteers?
If nobody has a decent example, then maybe Bill W has a point.

Since this is likely to involve non-negligible volumes of code, and
some of us try to digest lunch while reading this group, I'd like to
suggest that E-mail may be more appropriate than clogging up the
group.

Next, a side bet:

[mucho stuff deleted]


> In keeping with the general principle that application programmers
> should not be using pointers; appropriate ADTs should be used instead.

[more stuff deleted]


> The rule is simple: Leave the use of pointers to ADT implementors,
> and simply reuse the code they produce. Exploit to the absolute
> fullest the hard guarantees (proper space management, serializability,
> recoverability, concurrent processing, etc.) that the ADT implementors
> worked so hard to provide you with, and thy Manager will be pleased.

My guess is that any ADA solutions we see to my hypothetical problem
will involve auxiliary "appropriate ADTs" which are just limited
private types containing a pointer to the reference-counted real ADT.

Finally, has anyone made use of the (Boehm et al) GC-for-C that was
recently posted to comp.sources.unix? Any comments?

Thomas Maslen mas...@polya.stanford.edu

Dave Jones

unread,
Sep 21, 1989, 5:16:34 PM9/21/89
to

A very wise man once said, "Floating point representation is a mistake.
If you don't know enough about the problem to scale the quantities, you
don't know enough about the problem."

May he rest in peace. He was martyred by an angry horde of Fortran
application programmers.

If he had lived, today he might say, "Garbage collection is a mistake.
If you don't know enough about the problem to free the unreferenced
memory, you don't know enough about the problem."

If I had lived, I probably would agree with him.

I have some library packages, oops -- I meant to say "ADT's" --
which help. I can declare Stacks, which can be marked. Later you can
deallocate all the memory above the mark in one very fast pop. That
covers many many situations. I also have Heaps which contain blocks
all of one size. Not only is it quicker than your typical malloc(),
sometimes by a factor of several hundred, but if you are through with
all the records in a Heap, just destroy the entire Heap. Both of these,
like all my librar.. er ADT's, grow in size indefinitely, as required.
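
[ A hedged C++ sketch of the second idea, a Heap of same-size blocks
that is destroyed wholesale; the names and the slab size are
illustrative: ]

    #include <cstddef>
    #include <vector>

    class Heap {                        // fixed-size-block pool
        std::size_t block;              // size of each block
        std::size_t left = 0;           // blocks left in current slab
        char* next = nullptr;           // next free block
        std::vector<char*> slabs;       // grows indefinitely, as required
    public:
        explicit Heap(std::size_t block_size) : block(block_size) {}
        Heap(const Heap&) = delete;
        Heap& operator=(const Heap&) = delete;
        void* allocate() {              // far cheaper than malloc():
            if (left == 0) {            // usually just a pointer bump
                next = new char[block * 1024];
                slabs.push_back(next);
                left = 1024;
            }
            void* p = next;
            next += block;
            --left;
            return p;
        }
        ~Heap() {                       // "just destroy the entire Heap"
            for (char* s : slabs) delete[] s;
        }
    };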

Michael Wolf

unread,
Sep 21, 1989, 8:26:16 PM9/21/89
to
In article <11...@polya.Stanford.EDU> mas...@Polya.Stanford.EDU (Thomas Maslen) writes:
>My guess is that any ADA solutions we see to my hypothetical problem
>will involve auxiliary "appropriate ADTs" which are just limited
>private types containing a pointer to the reference-counted real ADT.

I don't see what the problem is with passing around private types
with pointers to reference-counted ADT's. It adds a small amount of
overhead to pointer references, and a couple bytes per object for
reference counts. If you've got a language which enforces the
"private" part, you're pretty damn sure that your reference count
is correct, and it's then easy to tell when it is or isn't okay
to deallocate an object. The only thing you need is a destructor
routine called on each "private" pointer when it goes out of scope
or gets deallocated.
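
[ A minimal C++ sketch of such a counted handle; the Handle and
Payload names are hypothetical, and Handle is the only type users
ever touch: ]

    class Payload { /* the real ADT's representation */ };

    class Handle {
        struct Rep { Payload data; int refs; };
        Rep* rep;
        void release() {
            if (--rep->refs == 0) delete rep;   // last reference gone
        }
    public:
        Handle() : rep(new Rep{Payload{}, 1}) {}
        Handle(const Handle& h) : rep(h.rep) { ++rep->refs; }
        Handle& operator=(const Handle& h) {
            ++h.rep->refs;        // increment first: self-assign safe
            release();
            rep = h.rep;
            return *this;
        }
        ~Handle() { release(); }  // the destructor routine described
    };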

The nice thing about this approach is that you KNOW that garbage
collection isn't going to unpredictably impact your performance.

Now, if you can find a GC algorithm that's as efficient, I'll consider
switching. Also, I don't see that writing such a reference counting
ADT is all that much tougher than writing the same ADT without
the reference counting, so it doesn't take too much extra development
time.

There may be algorithms which require so many pointer references, and
so little memory allocation/deallocation that a little GC would be
more efficient, but I'm betting they're in the minority. Of course,
it's possible to get applications where you can't afford the space
overhead of a couple bytes per object, and then GC would be the
way to go.

Michael Wolf These opinions are bootleg from Taiwan, so it comes
mw...@Teknowledge.COM as no surprise to find that they're defective.

Steve Tynor

unread,
Sep 22, 1989, 8:25:18 AM9/22/89
to
In article <82...@goofy.megatest.UUCP> djo...@megatest.UUCP (Dave Jones) writes:
...

>If he had lived, today he might say, "Garbage collection is a mistake.
>If you don't know enough about the problem to free the unreferenced
>memory, you don't know enough about the problem."
>
>If I had lived, I probably would agree with him.
...

>all the records in a Heap, just destroy the entire Heap. Both of these,
>like all my librar.. er ADT's, grow in size indefinitely, as required.
^^^^^^^^^^^^^^^^^^^^^^^^^
I assume this was intended to be followed by a ":-)".

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
No problem is so formidable that you can't just walk away from it.

Giuliano Carlini

unread,
Sep 22, 1989, 8:49:45 PM9/22/89
to
From article <65...@hubcap.clemson.edu>, by billwolf%hazel.cs.c...@hubcap.clemson.edu (William Thomas Wolfe, 2847 ):

>
> Upon block exit,
> the ADT's destroy procedure will be invoked, and all the space
> occupied by the ADT will be released.
>
>
> Bill Wolfe, wtw...@hubcap.clemson.edu

This can't work. What if the ADT is returned as the result of the
block? In order to handle this you either need reference counting, or
garbage collection.

giuliano
uunet!ashton!giuliano
--
giuliano
Net: ...!uunet!ashtate!giuliano
Home: 1322 27th San Pedro, CA 90731 (213)538-6554
Work: Ashton-Tate, 20101 Hamilton Ave., Torrance CA 90502-1319 (213)538-7683

Stephen Crawley

unread,
Sep 22, 1989, 9:50:46 PM9/22/89
to
Michael Wolf writes:

> I don't see what the problem is with passing around private types
> with pointers to reference-counted ADT's. It adds a small amount of
> overhead to pointer references, and a couple bytes per object for
> reference counts. If you've got a language which enforces the
> "private" part, you're pretty damn sure that your reference count
> is correct, and it's then easy to tell when it is or isn't okay
> to deallocate an object.

Ok so far ... sort of

> The only thing you need is a destructor
> routine called on each "private" pointer when it goes out of scope
> or get's deallocated.

And there's the problem with this scheme.

As I tried to explain before, scope based deallocation works if and
only if:

1) the scope DOES exit. [The global scope and the scope of main()
NEVER exit for the sake of this argument.]
2) the scope exits BEFORE we start running out of space
3) and we can afford the extra storage overhead of remembering
that we allocated object X in this scope Y. [You either need
to build a chain of objects allocated at each scope level or
a new heap for each scope level ...]

In general, one of the above criteria does not hold. This means that
we must rely on some other mechanism to tell the ADT to apply the
destructor to a given instance at the appropriate time.

I claim that this means that the application programmer needs to deal
with this in a significant percentage of cases. This is possible,
but it is very difficult to get correct when (for instance) the private
pseudo-pointer type is itself passed around the place in complicated
ways or when it is stored in dynamic data structures.

Bill Wolfe maintains that the problem can be solved with higher-level
library ADTs that implement generic lists, trees, graphs etc. I claim he
is incorrect and have presented an example (see <9...@scaup.cl.cam.ac.uk>)
which I believe demonstrates this.

Lets try not to rehash old arguments ...

-- Steve

Stephen Crawley

unread,
Sep 22, 1989, 10:46:59 PM9/22/89
to
Bill Wolfe wrote:
>>> This ADT will do dynamic allocation INTERNALLY so as to facilitate
>>> its own expansion and contraction as necessary. Upon block exit,
>>> the ADT's destroy procedure will be invoked, and all the space
>>> occupied by the ADT will be released.

I replied:


>> OK ... in the trivial case your scheme does work. The conditions
>> for it to work include:
>>
>> a) NO references to the ADT are passed out of scope.

Bill replied:


> In keeping with the general principle that application programmers
> should not be using pointers; appropriate ADTs should be used instead.

And I claim that it is IMPOSSIBLE to build sufficiently general library
ADT's! See article <9...@scaup.cl.cam.ac.uk> for an example. [And I've
a better one up my sleeve ;-)]

I wrote:
>> b) the scope exits before you run short of space.

Bill replies:


> Or you have programmed your application to properly handle the
> exception STORAGE_ERROR. The sequence is then as follows:
>

> [Bill then describes a scheme where the application catches a
> STORAGE_ERROR exception, causes the current transaction to be
> aborted and throws away some (global) data structures after
> asking the user.]

This is unrealistic. It assumes that there are enough large global
data structures to be thrown away. It also assumes that the application
contains the necessary code for interacting with the end user; i.e.
asking questions that he/she has a chance of understanding. Neither
will be true in general.

I wrote:
>> c) the programmer (i.e. the application!) does not incorrectly tell
>> the ADT to release an instance too soon.

Bill replies:


> Since the default is that the ADT will die automatically upon going
> out of scope, the user will presumably refrain from invoking Destroy

^^^^
[he means programmer I think ...]



> without a fairly good reason.

In this case the *good reason* is that the scope does not exit soon
enough; if the programmer doesn't reclaim the ADT before scope exit,
the program will run out of space and die.

I wrote:
>> GC'ed memory management can reclaim junk (heap nodes that are no longer
>> needed) whenever it runs out of space, whereas a scope based scheme
>> can in general only identify and hence reclaim junk on scope exit.
>> In other words, scope based reclamation of memory will in general
>> need more memory.

Bill replied:


> Not true. Consider -- what precisely is junk?

I define junk to mean dynamically allocated space which was allocated
in the current scope (or passed down from a higher scope) that is no
longer accessible from an in-scope variable anywhere in the program.

In a complex program, such junk may be generated in such a way that
it is IMPOSSIBLE for any application code or any library ADT to keep track
of its accessibility ... without implementing a garbage collector!

Bill continues:


> If we define it as "memory that is no longer needed due to
> deallocate operations" [...]

No, that's not what I meant ...

Bill continues:


> If we define junk as "memory which is strictly not necessary at this
> point in the program"

This is approximately what I mean ...

> then we require a "Read Programmer's Mind" instruction.

Oh no we don't!!!! A garbage collector can (and will) reclaim a lot of
this junk by tracing all in-scope variables etc. The garbage collector
can do a "perfect" job of this. Storage reclamation code in an ADT or
the application can only do a half-hearted job in difficult cases.
Such a difficult case might involve detecting that a subgraph containing
cycles has become detached from its root ... the case where reference
counting does not work. My argument is that the difference between a
perfect job and an imperfect job may be enough to cause the program to die.
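To make the cycle case concrete, here is a minimal sketch in Ada; the
names and the bare Count field are mine, purely for illustration:

   -- Two nodes that refer only to each other keep each other's
   -- count at one forever; no count-driven scheme frees them.
   procedure Cycle_Demo is
      type Node;
      type Ref is access Node;
      type Node is
         record
            Count : Natural := 0;   -- references held by other nodes
            Next  : Ref;
         end record;
      A, B : Ref;
   begin
      A := new Node;
      B := new Node;
      A.Next := B;  B.Count := B.Count + 1;
      B.Next := A;  A.Count := A.Count + 1;
      -- Sever our own references: each node still holds a count
      -- of 1 on the other, so neither count ever reaches zero.
      -- A tracing collector would reclaim both.
      A := null;
      B := null;
   end Cycle_Demo;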

I therefore maintain that my "may use more storage" point is CORRECT.

Finally (sigh) I wrote:
>> Finally, don't forget the case where condition a) is not true; i.e.
>> some of the dynamically allocated ADT instances are passed out of scope
>> either explicitly (as a result) or implicitly (by side-effect).

Bill replies:


> Which can't happen unless either the applications programmer is
> using pointers directly (naughty naughty), or something called by the
> application programmer (in effect, an extension of the application
> programmer) is doing it on his/her behalf. In the latter case,
> this presumably is in accordance with the called unit's specification,
> and hence remains the fault of the applications programmer.

Sigh.

> The rule is simple: Leave the use of pointers to ADT implementors,
> and simply reuse the code they produce. Exploit to the absolute
> fullest the hard guarantees (proper space management, serializability,
> recoverability, concurrent processing, etc.) that the ADT implementors
> worked so hard to provide you with, and thy Manager will be pleased.

Except that ADT implementors haven't, because it is impossible. And my
and their Managers had damn well better not start asking for the impossible,
'cos it will be on THEIR heads when we can't deliver! (Don't worry, I will
have asked for a transfer ... or resigned by then.)

-- Steve

Ted Dunning

Sep 23, 1989, 12:58:29 AM

one thing that all of the c++ programmers will know by heart, but most
of the ada programmers may not have thought of is that one of the most
common ways for a reference to an adt to be destroyed is by
overwriting due to assignment.

of course, handling this is simple if you can overload assignment.
likewise, if you can outlaw assignment, then the second best solution
of naming it something else is possible.

but where does this leave ada? with no way to overload or forbid
assignment, the writer of a reliable reference counting adt is left in a
real pickle.
--
t...@nmsu.edu
Most of all, he loved the fall
when the cottonwood leaves turned gold
and floated down the trout streams
under the clear blue windswept skies.

d...@hollin.prime.com

Sep 23, 1989, 4:13:00 PM

I believe you are both correct.

Most created objects can be managed with explicit creation/destruction code,
placed in abstract data type specifications or otherwise part of the
application, because their lifetimes are naturally connected to the algorithms
(or messages) that use them. Sometimes, however, dynamic reclamation (garbage
collection) is truly necessary (to avoid unnaturalness or inefficiency). A
useful parallel is the need for both stack and heap data storage in
general-purpose programming languages.

When an application can use explicit creation and destruction for most of its
object management, the garbage collection overhead is reduced, and this has
all sorts of good consequences.

David Spector
Prime Computer, Inc.
d...@primerd.prime.com (until my layoff, expected soon)

William Thomas Wolfe, 2847

Sep 23, 1989, 4:48:04 PM
From s...@cl.cam.ac.uk (Stephen Crawley):

> Bill Wolfe proposes a scheme for getting around the need for garbage
> collection by storing objects in a database, using object expiration
> times to avoid proper garbage collection. Unfortunately this scheme
> doesn't achieve its aim.
>
> "Where objects can have an infinite lifetime, a protocol can be devised
> whereby the database must be checked at least once per some arbitrary
> time period by all users, which will permit identification numbers
> to be recycled after one full time period of nonexistence."
>
> Exactly. In order to check each object that it still refers to, each
> "user" (actually application) must traverse the graph of its objects in
> the database. This process is exactly equivalent to the mark phase of
> a garbage collection, except that the application is doing it!!!

Not exactly. To start with, the user does not have to do any work
at all if the identification number's expiration period is also used
as the period over which the user will lose interest in the object if
it is not further utilized. In this case, the identification number
will be allowed to expire and the user need only throw it away.

If the user does make use of the number within the expiration period,
then the number is automatically revalidated for another period unless
the database replies that the object has been destroyed, in which case
the identification number expires as well.

In the worst case, the user will have to do a database access just to
verify that the object still exists and that the identification number
remains valid for another expiration period. Since many objects have
a finite lifetime, this burden is borne only by those users who make
use of infinite-lifetime objects. The task of maintaining the validity
of the identification number can be done automatically if desired using
a suitably designed identification-number abstraction which carries a
user-settable automatic revalidation on-off switch.

There are no reference counts kept at the database; the database only
maintains the object, the associated identification number, and any
authorization protocols which may be required of users who wish to
create, read, modify, or destroy. The expiration period protocol
allows identification numbers to be distributed without limitation
(to systems which are unknown to the database, possibly not even
reachable). It is fault-tolerant, allowing users to die with their
identification numbers intact without interfering with the destruction
of the associated object. If the user wakes up, the user can either
continue to utilize the identification number (if it has not expired),
or throw it away due to its expiration.
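A package specification along these lines gives the flavor; this is
only a sketch, and every name in it (Object_Ids, Revalidate, and so on)
is hypothetical rather than any existing product's interface:

   with Calendar;
   package Object_Ids is

      type Object_Id is limited private;

      Expired : exception;

      -- Touch the object in the database: extends the number's
      -- validity for another expiration period, or raises Expired
      -- if the database reports the object destroyed.
      procedure Revalidate (Id : in out Object_Id);

      -- The user-settable automatic revalidation switch.
      procedure Set_Auto_Revalidation (Id : in out Object_Id;
                                       On : in     Boolean);

   private

      type Object_Id is
         record
            Number      : Natural := 0;
            Valid_Until : Calendar.Time;
            Auto        : Boolean := True;
         end record;

   end Object_Ids;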

> One other point that Bill Wolfe misses is that the CPU overheads of
> managing objects in this way can be orders of magnitude greater
> than the cost of using ordinary pointers.

Where the situation permits (as it usually does), pointers which are
encapsulated within ADT implementations will give better performance.

We are managing objects in this way because they present worst-case
properties which do not permit us to use more efficient techniques.


Bill Wolfe, wtw...@hubcap.clemson.edu

William Thomas Wolfe, 2847

Sep 23, 1989, 5:06:02 PM
From s...@cl.cam.ac.uk (Stephen Crawley):

>> OK, first let me clarify my position. The primary use for pointers
>> is simply to implement abstract data types -- lists, trees, graphs, etc.;
>> by shielding the user from the details of implementation, we can keep
>> the user from having to worry about manipulating pointers.
>
> Frankly, I can't see how this is possible. Suppose I have an ADT 'A'
> that has a generator operation NEW and a destructor function DISPOSE.
> Suppose I want a generic ADT LIST[t: type] that can be used in the
> following way.

> -- create X and Y, two instances of type A;
> -- create a list L, containing objects of type A
>
> -- loop
> --    append X to the list L
> --    append a constant object of type A to the list L
> --    append X to the list L again
> --    assign the head of list L to the variable Y
> -- end loop;

> The generation of instances of A must be under the control of the
> application and assignment of A must be handled. I must be able to
> put an A instance into a list any number of times.
>
> How do I implement the LIST ADT so that it invokes the A$DISPOSE
> operation at the right time? When is the right time? What happens
> if I want to put an instance of A into 2 LIST's at the same time?

When you make your instantiation of the list type for objects of
type A, you must fulfill the instantiation requirements. This will
normally include supplying assignment and destruction procedures for
objects of type A. If you send an object into the list, the assignment
procedure will be used to create a resident copy. If you direct the
removal of an object from the list, the destruction procedure will be
used to destroy it. (See Booch, "Software Components with Ada")

If the type A was an object, then the objects will be created and
destroyed as appropriate. If the type A was a pointer, then the
pointers will be created and destroyed as appropriate. It all depends
on what the user supplies for type A and how the user directs that the
assignment and destruction operations for type A are to be performed.

The "generic" part means that the implementor proceeds with respect to
an arbitrary, user-supplied type which will have the specific operations
required of it, without worrying about what that type is or how the
operations over it are to be implemented; these are decisions which
the user has agreed to make in exchange for the ADT's services.
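In outline, such a generic looks something like this; it is a sketch
of the idea only, not Booch's actual component, and the names are
illustrative:

   generic
      type Item is limited private;
      with procedure Assign  (Target : in out Item;
                              Source : in     Item) is <>;
      with procedure Destroy (Object : in out Item) is <>;
   package Generic_List is
      type List is limited private;
      procedure Append      (L : in out List; X : in Item);  -- uses Assign
      procedure Remove_Head (L : in out List);               -- uses Destroy
   private
      type Node;
      type Link is access Node;
      type Node is
         record
            Value : Item;   -- the resident copy, built via Assign
            Next  : Link;
         end record;
      type List is
         record
            Head, Tail : Link;
         end record;
   end Generic_List;

Whether Item is instantiated with an object type or a pointer type is
then entirely the instantiator's business, exactly as described above.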

Bill Wolfe, wtw...@hubcap.clemson.edu

William Thomas Wolfe, 2847

Sep 23, 1989, 5:13:43 PM
From t...@nmsu.edu (Ted Dunning):

> but where does this leave ada? no overloading or forbidding
> assignment leaves the writer of a reliable reference counting adt in a
> real pickle.

The fact that it is not possible in Ada 83 to overload ":=" has
not posed practical difficulties, since it is standard practice
to simply require the generic parameter structure

   with procedure Assign (Target : in out Object_Type;
                          Source : in     Object_Type) is <>;

thus substituting "Assign" for ":=" in a straightforward way.

There remains a technical problem in that we can't use "Assign"
to initialize variables at the moment of creation, which is probably
why this is one of the more popular Ada 9X revision requests.
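A generic Swap shows the gap; the sketch below is mine, with
hypothetical names. The local Temp is created without any Assign call,
so a reference-counting Assign that inspected its Target's old value
would be reading an undefined one:

   generic
      type Object_Type is limited private;
      with procedure Assign (Target : in out Object_Type;
                             Source : in     Object_Type) is <>;
   procedure Swap (X, Y : in out Object_Type);

   procedure Swap (X, Y : in out Object_Type) is
      Temp : Object_Type;   -- created with no Assign call
   begin
      Assign (Temp, X);     -- Temp's "old value" here is undefined
      Assign (X, Y);
      Assign (Y, Temp);
   end Swap;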

I wouldn't use either form of assignment to do a reference-counting
system, though... it's too easy to create unkillable garbage if one
process dies while still holding claim to a reference.


Bill Wolfe, wtw...@hubcap.clemson.edu

William Thomas Wolfe, 2847

Sep 23, 1989, 8:36:03 PM
From d...@hollin.prime.com:
> Sometimes, [...] dynamic reclamation (garbage collection) is
> truly necessary (to avoid unnaturalness or inefficiency). A
> useful parallel is the need for both stack and heap data storage in
> general-purpose programming languages.

I think we've established that managed and unmanaged storage
paradigms can coexist, and that components which manage their
own storage can avoid the inefficiencies of garbage collection.

We also know that the user MUST be involved in storage management,
if for no other reason than to decide which data structures to
throw away in the event of a storage crisis.

What I'd like to know is: Is there a hard counterexample which
will demonstrate that situations exist in which the inefficiencies
of garbage collection cannot be avoided? Someone has already outlined
the properties that such a counterexample should have, and mentioned
a "gut feeling" that such a counterexample exists. Unfortunately,
I have a similar "gut feeling" that GC is never worth the cost.


Bill Wolfe, wtw...@hubcap.clemson.edu

Ted Dunning

Sep 24, 1989, 1:57:28 AM

In article <65...@hubcap.clemson.edu> billwolf%hazel.cs.c...@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) writes:


   I wouldn't use either form of assignment to do a
   reference-counting system, though... it's too easy to create
   unkillable garbage if one process dies while still holding
   claim to a reference.

ok... got it. part of the purpose of ada is to improve maintenance of
software, which implies good readability.

but this lack of := overloading means that we need to use a
procedure... but let's call it setq instead of assign.

now what is the effect of parameter passing by value (i.e.
initializing) on the reference counts in a reference counting adt? is
there any way to catch the fact that passing a variable into a
procedure constitutes another reference? no, you say?

and finally, why don't we just use the ada equivalent of
unwind-protect to decrement reference counts in the event a process
dies. you say it doesn't exist and pre-processors don't exist either
and extensions are outlawed?

well why didn't we just use lisp to implement our adt's? (oh, right i
forgot about the 5% overhead for garbage collection).

--
t...@nmsu.edu
remember, when extensions and subsets are outlawed,
only outlaws will have extensions or subsets

William Thomas Wolfe, 2847

Sep 24, 1989, 1:51:07 PM
From s...@cl.cam.ac.uk (Stephen Crawley):

>> [Bill then describes a scheme where the application catches a
>> STORAGE_ERROR exception, causes the current transaction to be
>> aborted and throws away some (global) data structures after
>> asking the user.]
>
> This is unrealistic. It assumes that there are enough large global
> data structures to be thrown. It also assumes that the application
> contains the necessary code for interacting with the end user; i.e.
> asking questions that he/she has a chance of understanding. Neither
> will be true in general.

This doesn't bear even a remote resemblance to what I said.

The ADT aborts the transaction because it can't satisfy the
request. The user receives an exception to that effect. The user
does something to alleviate the problem (e.g., selecting and
destroying a data structure) in the process of handling the
exception, and proceeds to retry the operation.

Incidentally, the questions you are asking (and this is NOT,
repeat NOT, intended to reflect negatively on Mr. Crawley) in
this and at least one previous article indicate that you do not
really understand generics or exception handling at all, and I
would therefore suggest that you investigate some basic Ada texts
in order to understand these basic concepts and better participate
in the discussion.

> I define junk to mean dynamically allocated space which was allocated
> in the current scope (or passed down from a higher scope) that is no
> longer accessible from an in-scope variable anywhere in the program.
> In a complex program, such junk may be generated in such a way that
> it is IMPOSSIBLE for any application code or any library ADT to keep track
> of its accessibility ... without implementing a garbage collector!

And I contend that if the program is that complex, then it is
badly written; the programmer probably has not made use of the
basic principle that "abstraction is the fundamental means by
which the computer scientist can combat complexity". Abstract
data types are an integral part of this process.

>> then we require a "Read Programmer's Mind" instruction.
>
> Oh no we don't!!!! A garbage collector can (and will) reclaim a lot of
> this junk by tracing all in-scope variables etc. The garbage collector
> can do a "perfect" job of this. Storage reclamation code in an ADT or
> the application can only do a half-hearted job in difficult cases.
> Such a difficult case might involve detecting that a subgraph containing
> cycles has become detached from its root ... the case where reference
> counting does not work. My argument is that the difference between a
> perfect job and an imperfect job may be enough to cause the program to die.

Not a difficult case at all, if we have (for example) a generic
Simple_Directed_Graph ADT. Again, the argument here reflects
a lack of understanding of the basic ideas behind generic software
components.


>> The rule is simple: Leave the use of pointers to ADT implementors,
>> and simply reuse the code they produce. Exploit to the absolute
>> fullest the hard guarantees (proper space management, serializability,
>> recoverability, concurrent processing, etc.) that the ADT implementors
>> worked so hard to provide you with, and thy Manager will be pleased.
>
> Except that ADT implementors haven't because it is impossible.

It is quite possible; start by reading Booch's "Software Components
with Ada", and once you understand that you will be in a position
to consider some more advanced techniques. Then if you still want
to argue against my position, you will at least understand it well
enough to launch a reasonable attack.


Bill Wolfe, wtw...@hubcap.clemson.edu

William Thomas Wolfe, 2847

Sep 24, 1989, 2:04:53 PM
From t...@nmsu.edu (Ted Dunning):
> let's call [Ada assignment procedure] setq instead of assign.
>
> now what is the effect of parameter passing by value (i.e.
> initializing) on the reference counts in a reference counting adt?

The answer is that in general, there IS no "parameter passing by
value"!!!!! From the Ada 83 reference manual, 6.2.7:

      ...The language does not define which [parameter passing
      mechanism] is to be adopted for parameter passing, nor
      whether different calls to the same subprogram are to use
      the same mechanism. The execution of a program is erroneous
      if its effect depends on which mechanism is selected by the
      implementation.

This leaves the compilers free to implement parameter passing
any way they want to. The semantics of parameter passing are
simply this:

If the parameter is "in", you can read it but not write it.

If the parameter is "out", you can write it but not read it.

If the parameter is "in out", you can read and write it.

The parameter will be of the specified type, and can be
referenced by the specified name.

If the user does not specify a parameter value, the specified
default value will be supplied.

Thou shalt assume nothing else about the passing of parameters.
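For instance (illustrative names only):

   procedure Demo (A : in     Integer;    -- may be read, not written
                   B :    out Integer;    -- may be written, not read
                   C : in out Integer) is -- may be read and written
   begin
      C := C + A;
      B := C;
   end Demo;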

> [...] pre-processors don't exist either and extension are outlawed?

Actually, preprocessors do exist; I believe the Classic Ada product
(which provides inheritance and dynamic binding to the Ada O-O types)
works via preprocessing. However, they must be kept separate from
any validated compiler system.



> well why didn't we just use lisp to implement our adt's?

No concept of data typing, perhaps?


Bill Wolfe, wtw...@hubcap.clemson.edu

Golden Richard

Sep 24, 1989, 11:19:02 PM
In article <9...@scaup.cl.cam.ac.uk> s...@cl.cam.ac.uk (Stephen Crawley) writes:
[pro-GC arguments deleted]

I find the current GC wars rather interesting, but I find myself siding
naturally with Bill simply because I have *never* encountered a situation
that absolutely demanded GC. Even without automatic scope-exit destructors
for ADTs, programmer-controlled storage management isn't difficult.
I'd be most interested in seeing some concrete example where GC was the
'only way to go'.

-=-
Golden Richard III OSU Department of Computer and Information Science
gric...@cis.ohio-state.edu "I'm absolutely positive! ...or not."

Stephen Crawley

Sep 25, 1989, 4:38:36 AM
I wrote:
>> Bill Wolfe proposes a scheme for getting around the need for garbage
>> collection by storing objects in a database, using object expiration
>> times to avoid proper garbage collection. Unfortunately this scheme
>> doesn't achieve its aim.
>>
>> "Where objects can have an infinite lifetime, a protocol can be devised
>> whereby the database must be checked at least once per some arbitrary
>> time period by all users, which will permit identification numbers
>> to be recycled after one full time period of nonexistence."
>>
>> Exactly. In order to check each object that it still refers to, each
>> "user" (actually application) must traverse the graph of its objects in
>> the database. This process is exactly equivalent to the mark phase of
>> a garbage collection, except that the application is doing it!!!

Bill Wolfe replies


> Not exactly. To start with, the user does not have to do any work
> at all if the identification number's expiration period is also used
> as the period over which the user will lose interest in the object if
> it is not further utilized. In this case, the identification number
> will be allowed to expire and the user need only throw it away.

In other words the objects don't have a long lifetime. This is a red
herring!! The case we were discussing is the one where the objects DO
have long lifetimes.

Bill continues:


> If the user does make use of the number within the expiration period,
> then the number is automatically revalidated for another period ...

Granted. But unless there is a GUARANTEE that EVERY object of interest
is touched periodically by the application, there is a risk that objects
of interest to the user will be destroyed prematurely.

> ... unless the database replies that the object has been destroyed,
> in which case the identification number expires as well.

From the user's point of view, this is unacceptable behaviour by the
application / database cabal.

Bill continues:


> In the worst case, the user will have to do a database access just to
> verify that the object still exists and that the identification number
> remains valid for another expiration period. Since many objects have
> a finite lifetime, this burden is borne only by those users who make
> use of infinite-lifetime objects.

Not true. There is no way for an application to intuit the lifetime of a
given set of database entries. If it makes a guess, and is wrong it has
committed the sin of destroying user data. We can't expect the user to
specify an accurate lifetime either. The user can't accurately predict the
future any more than an application can. It would be totally unreasonable to
then come back to a user and say "21 of your objects expired and were
deleted. Oh, you didn't want that? Well tough."

In this day and age, users are more and more naive, and systems must
be correspondingly forgiving. Therefore, the burden of storage management
must be borne by ALL applications that deal with POTENTIALLY long lived
data items. In practice this means ALL applications that store data in
the database.

Bill continues:


> The task of maintaining the validity of the identification number can
> be done automatically if desired using a suitably designed identification-
> number abstraction which carries a user-settable automatic revalidation
> on-off switch.

I'm not sure, but this sounds like Bill is finally admitting that the
application needs to include a GC style marker.

However ... user-settable object refreshing is like giving a hand grenade
to a two year old. I can just imagine it now:

   User:    I came in this morning and half of my PC's database
            has disappeared.
   Support: Hmmm ... why is object refreshing turned off???
   User:    Joe told me that turning off object refreshing would
            make the database go faster.

Bill continues to describe an object access control scheme, which is
pretty much beside the point ...

I wrote:
>> One other point that Bill Wolfe misses is that the CPU overheads of
>> managing objects in this way can be orders of magnitude greater
>> than the cost of using ordinary pointers.

Bill replies:


> We are managing objects in this way because they present worst-case
> properties which do not permit us to use more efficient techniques.

What about garbage collection???

Surely Bill's scheme with object numbers and expiration times and (as
Bill finally admits) object refreshing in application code is MUCH MORE
EXPENSIVE than an efficient garbage collector! Bear in mind that a
previous article (sorry no ref.) claimed measurements of overheads of
only 3-5% for a modern garbage collector.

-- Steve

Stephen Crawley

Sep 25, 1989, 6:55:13 AM
Golden Richard III writes:

> I find the current GC wars rather interesting, but I find myself siding
> naturally with Bill simply because I have *never* encountered a situation
> that absolutely demanded GC. Even without automatic scope-exit destructors
> for ADTs, programmer-controlled storage management isn't difficult.
> I'd be most interested in seeing some concrete example where GC was the
> 'only way to go'.

Here are some problems where programmer-controlled storage management
would be very difficult or impossible.

Interactive editing of cyclic graph data structures. You have a
heterogeneous cyclic graph data structure and the editor must be able
to make arbitrary sequences of changes to the data structure. How
do you detect that a subgraph has become detached?

This sort of problem arises in many real life problems such as graphical
editors and in various programming environment tools. In practice, many
such applications don't try very hard to reclaim storage. When the
application has leaked so much space that the processor starts to thrash,
the user can always save his work and restart ...

Dynamic loading & unloading of modules in a running program. An
application consists of a static set of core modules and a number
of other modules that are dynamically link-loaded on demand. When
the programmer recompiles a dynamic module, he may want to reload
it and start using it without shutting down the entire application.
But to do this, the dynamic loader needs to know if the module is
currently being 'used' ... e.g. if some thread is executing a
procedure exported by the module or if some other part of the
application has salted away a reference into the module (e.g. a
procedure pointer).

This problem arose when I was using Mesa / XDE to implement my entity system,
a persistent object system. I did get the dynamic loading to work ... and
it made a big performance difference, but dynamic unloading could not be
done safely, since the Mesa / XDE environment has no garbage collector.

[Incidentally, the lack of garbage collection caused so many reliability
problems that I was forced to give up using Mesa / XDE altogether
about 2 years ago. The new version of the entity system uses MIT CLU
as the base language. CLU is garbage collected, and though the GC I am
using is slow and has some unpleasant attributes, and though I've had
to do some extensive butchery on unsavoury parts of the CLU runtime system,
I do not regret making the switch]

-- Steve

Stephen Crawley

Sep 25, 1989, 8:04:15 AM
Bill Wolfe writes:
> Incidentally, the questions you are asking (and this is NOT,
> repeat NOT, intended to reflect negatively on Mr. Crawley) in
> this and at least one previous article indicate that you do not
> really understand generics or exception handling at all, and I
> would therefore suggest that you investigate some basic Ada texts
> in order to understand these basic concepts and better participate
> in the discussion.

Nice one Bill.

I'll have you know Mr Wolfe that I've been using languages with exception
handling (CLU and Mesa) and generic data types (CLU) for many years. I
understand the concepts quite adequately thank you. I've also implemented
a variety of storage managers, including garbage collectors both for short
term and long term data structures, and I've done quite a bit of work on
hybrid (i.e. partly GC'ed, partly application controlled) storage management
solutions.

Frankly I don't see why it is necessary to understand ADA intimately
to talk about software reuse. The fact that the two issues seem to be so
firmly associated in your mind is very sad, since it seems to blind
you to the possibility of non-Ada solutions.

Mr Wolfe: I used to believe fervently (like you) that the overhead of
garbage collection was unacceptable. Many years of bitter experience
trying to get by without it have led me to change my mind. Perhaps if
you had had my experience your views would be different. You never know,
you might even be wrong!

I'll answer your "technical" points later.

-- Steve

Gary Wright

Sep 25, 1989, 11:14:32 AM
In article <62...@tut.cis.ohio-state.edu> Golden Richard <gric...@cis.ohio-state.edu> writes:
>In article <9...@scaup.cl.cam.ac.uk> s...@cl.cam.ac.uk (Stephen Crawley) writes:
>[pro-GC arguments deleted]
>
>I find the current GC wars rather interesting, but I find myself siding
>naturally with Bill simply because I have *never* encountered a situation
>that absolutely demanded GC. Even without automatic scope-exit destructors
>for ADTs, programmer-controlled storage management isn't difficult.
>I'd be most interested in seeing some concrete example where GC was the
>'only way to go'.

Well, I have *never* encountered a situation that absolutely demanded using
a high level language. That is why I still do all my programming in
assembly. :-)

If our criteria for deciding whether GC is better than explicit memory
management are based on absolutes, we will never come to a conclusion.

Explicit memory management is, theoretically speaking, equivalent to GC
(we are all using Turing machines, aren't we?). What we need to
determine is if a language that supports GC allows programs to be
expressed in a form that is "better" than a language without GC.
Clearly, it will be hard to come to a conclusion about which form of
expression is "better" than another. We also need to determine if the
benefits derived from the ease of expression outweigh any performance
degradation.

This trade off has been made in the past. A hand-coded assembly program
can probably be made to be more efficient in time and space than a program
generated by a high-level language compiler. Yet most programmers
don't use assembly. Perhaps this analogy holds for the current debate
regarding GC.

It is interesting to note the advances that have been made in
optimizing compilers. Perhaps a sufficiently smart compiler for a language
that supports GC can figure out when GC can be avoided? That is to say,
the compiler can notice when the last reference to an object will be lost
and can explicitly "dispose" of the object. In general, the compiler won't
be able to determine this (thus GC), but for the simple cases, why not?

I have seen a number of people claim that GC can not be used in a
real-time system and then conclude that GC is a bad thing that should
not be used. If this isn't a non sequitur I don't know what is. There
will always be special cases that will require special techniques.
What I am interested in are language features that make my life easier
for the majority of applications that don't fall under the category of
"special case". I also would like compilers that can do the dirty,
tedious work of deciding when to use special techniques instead of a
more general one. Areas in which I see this now or in the future are:

register usage
parameter passing techniques
dynamic vs. static binding (in the context of OOP)
function inlining
GC vs non-GC

I am sure others can come up with more examples.

My prediction is that more and more people will start using languages
with GC, but there will always remain those who think it is a "bad thing",
and there will always be situations in which GC should not be used.

There are people today who claim that we don't need anything more than
assemblers, and there are situations today in which it is the only
reasonable solution.
--
Gary Wright ...!uunet!hsi!wright
Health Systems International wri...@hsi.com

Stephen Crawley

Sep 25, 1989, 12:44:36 PM
Bill Wolfe writes:
>
> When you make your instantiation of the list type for objects of
> type A, you must fulfill the instantiation requirements. This will
> normally include supplying assignment and destruction procedures for
> objects of type A. If you send an object into the list, the assignment
> procedure will be used to create a resident copy. If you direct the
> removal of an object from the list, the destruction procedure will be
> used to destroy it. (See Booch, "Software Components with Ada")
>

First objection: I don't want to put a copy of the instance of A into the list
twice. I want to put the SAME instance of A into the list twice. If
A is a mutable type (i.e. has operations that can change a value) this
is a significant semantic difference.

Actually, though you SAY "make a resident copy [of the object]" I think
you MEAN something slightly different; "make a resident copy of the ref
to the object". Unfortunately, Ada makes a distinction between pointers
and non-pointers and this makes such things hard to describe concisely.
I'll assume you meant this, and also that the object has an internal
reference count that is incremented by the ADT's "assign" and decremented
by its "destroy". (The ref count is essential if an object is to be appear
more than once in a list)

Second objection: You do not say how the assign and destruct operations
get invoked when an instance of A is returned by the LIST ADT to the
application. In general this requires some cooperation on the part
of the application.

In ADA it seems that by declaring A in the ADT as a "limited private" type
the ADT can force the application to cooperate as far as assignment is
concerned. (Since limited private types do not support ANY operations except
for those declared by the PACKAGE, the application programmer can be forced
to apply the ASSIGN or DESTRUCT operation to the result of every A operation
that returns an A instance in order to get his program to type check.)

However, I can't see how the application is forced to call DESTROY on an A
variable when its declaration scope exits. [Please correct me if I am wrong
... I am NOT an ADA expert.]

If this is the case, then it is a trivial matter to generate an example
where correct storage management depends on the application programmer
not making a mistake. In other words, this is not a correct solution.
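For reference, the idiom under discussion looks roughly like the sketch
below (the names and the ref-counted representation are mine); note that
nothing in it makes the final DESTROY call happen automatically:

   package A_Pkg is
      type A is limited private;   -- clients get no ":=" or "="
      procedure Create  (X : in out A);
      procedure Assign  (Target : in out A; Source : in A);
      procedure Destroy (X : in out A);  -- the client must remember this
   private
      type A_Rep is
         record
            Ref_Count : Natural := 0;
         end record;
      type A is access A_Rep;
   end A_Pkg;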

---

The ADA partial solution to storage management for lists strikes me as
ugly. For every ADT A, the corresponding PACKAGE needs to declare two
types and a bunch of extra operators (ASSIGN, DESTROY and others that
are forced on us by exporting the type of A as LIMITED PRIVATE). Finally,
since := can't be overloaded, the programmer has to use an awkward idiom
to assign an A instance to a variable.

Then there is the issue of efficiency. [After all, efficiency IS the
reason we are not using GC'ed storage...] Every time an instance of
the ADT A is assigned, we execute the ASSIGN operator, which would be
coded something like:

TYPE
   A_obj = RECORD [
      ref_count: int,
      ...
   ]

   A = POINTER TO A_obj

-- Assign: updates the A variable given by 'dest'
-- to hold the A object given by 'source'
assign = PROCEDURE (source: IN A, dest: IN OUT A)
   -- Bump 'source' first, so that a self-assignment (source = dest,
   -- ref_count = 1) cannot deallocate the object before it is reattached.
   source^.ref_count := source^.ref_count + 1
   IF dest ~= 'uninitialised' THEN
      -- There is already an A value in 'dest'; drop its reference.
      dest^.ref_count := dest^.ref_count - 1
      IF dest^.ref_count = 0 THEN
         -- deallocate dest^ and all of its fields.
      END
   END
   dest := source
END assign

That is a LOT of work for a simple assignment!

Contrast it with the GC'ed case where assignment is probably 1 machine
instruction unless you are using some useless reference counting garbage
collection scheme.

If the ratio of list access to list building operations is large enough
(I'd guess maybe 10 to 1 if ASSIGN isn't expanded inline), it will be
MORE EFFICIENT TO USE GARBAGE COLLECTION.

-- Steve

Golden Richard

Sep 25, 1989, 2:20:44 PM
In article <5...@hsi86.hsi.UUCP> wri...@hsi.com (Gary Wright) writes:
>In article <62...@tut.cis.ohio-state.edu> Golden Richard <gric...@cis.ohio-state.edu> writes:
>>In article <9...@scaup.cl.cam.ac.uk> s...@cl.cam.ac.uk (Stephen Crawley) writes:
>>[pro-GC arguments deleted]
>
>Well, I have *never* encountered a situation that absolutely demanded using
>a high level language. That is why I still do all my programming in
>assembly. :-)
>
>If our criteria for deciding whether GC is better than explicit memory
>management is based on absolutes, we will never come to a conclusion.
>

I never intended to sound as if I were daring someone to come up with an
application which I couldn't handle with programmer-managed reclamation.
The "absolutely demanded" phrase in my article was prompted by references
to (unnamed) applications which supposedly require GC to ease the burden
on the programmer.

To rephrase my statement, I have never encountered an application where
GC was necessarily preferable to programmer-managed reclamation.
Furthermore, I can't imagine a properly designed application where
storage management is so complicated that the usual techniques for
programmer-managed storage reclamation aren't adequate. If the
programmer can't define when a certain object can go poof, I suspect
a serious lack of understanding of the problem at hand.

William Thomas Wolfe, 2847

Sep 25, 1989, 2:53:17 PM
From wri...@hsi.UUCP (Gary Wright):

> I have seen a number of people claim that GC can not be used in a
> real-time system and then conclude that GC is a bad thing that should
> not be used.

You're missing several important steps here having to do with
performance degradation and the violation of the general principle
that one should never do at run time that which can be done instead
at compile time, design time, etc.; in general, the sooner a problem
is handled (and similarly, the sooner an error is detected), the less
costly it will be to handle it. Why pay and pay and pay at run time
when an efficient self-managing solution (valid forevermore) can be
used (and reused) instead?


Bill Wolfe, wtw...@hubcap.clemson.edu

Andrew P. Mullhaupt

Sep 25, 1989, 8:10:37 PM

The debate raging over garbage collection in this newsgroup is an
interesting contrast to the "effect of free()" thread in comp.lang.c; there
are many there who argue that they need to examine pointers after de-allocating
them(!) One argument advanced is that even though it's a bad thing, since it's
usually OK, it should become standard practice. This is to my mind in character
for C programmers. In this newsgroup, we find other programmers arguing that
programmers should have no direct use for pointers, and so the language can
determine the correct lifetime of dynamic data.

I suggest it is going to prove necessary for the languages which have
this "Pascalian" point of view to close all loopholes (typecasts to integer,
etc.) to prevent the widespread use of C style perfidious trickery to escape
from sensible programming practice. (If you give them an inch...)

I would also point out that despite the claims of its adherents, APL
(A Pointerless Language), having no explicit pointers, does not do a
great job with storage management. Often space is allocated, then de-allocated
as soon as possible, only to be re-allocated for the same object, somewhere
else in the workspace, with the attendant risks of fragmentation and garbage
collection. The programmer is often forced to write peculiar code (even for
APL) in order to try and provide memory management cues to the interpreter.
There is often no relief for the problem when storage efficiency requires
re-use of a local variable for more than one purpose. At least they have the
cute idea of always garbage collecting immediately after printing the last
response to the user. (This has the effect of making garbage collection less
noticeable when working in the typical software engineer's paradise of a tiny
test problem, and excruciatingly painful when you get a trivial syntax error
in a workspace where you just started a huge operation.) The point is that it
makes a lot of difference when garbage collection is performed, and if the
programmer can control it.

Another APL story: When calling FORTRAN from APL, you must allocate
the storage for the FORTRAN subprogram in the APL workspace. There are many
important FORTRAN subprograms which involve two calls sharing data in an
auxiliary array residing in the APL workspace (i.e. at the mercy of the APL
garbage collector). It can happen that APL decides it must perform garbage
collection between the two calls to FORTRAN, and the auxiliary array is moved
to a new home. As can be expected in cases like this, the manual guarantees
unpredictable results. The classic example of this is a Fast Fourier Transform
which sets up all the trig calls in one call and performs the transform in a
second. Repeated transforms can then be done by repeating the second call only
if you're calling from FORTRAN (or C, or Pascal, etc.), but this is too daring
from APL. And since there is no way to force APL to leave your array alone, the
only hope, and common practice, is to ensure that the garbage collection is
performed before the first call to FORTRAN, and to look at the size of the
available workspace to make sure the arrays to be transformed are not large
enough to trigger the calamity again. This means that you always suffer the
performance hit of garbage collection, and you always have to essentially
manage your own storage. If writing an FFT in APL wasn't so slow, and the
highly vectorized FORTRAN FFT so fast, and the data arrays so large as to
endanger garbage collection, this wouldn't be much of a payoff. But they are.

This last example falls into the category of "application programmer
doing something naughty (by extension through library routine) with (what are
essentially) pointers." The interesting part is that *BOTH* APL and FORTRAN get
to accuse you of this: APL doesn't want you to rely on the array being in any
given place, and FORTRAN doesn't want you to be able to move it. This is the
kind of unforeseen conflict which prevents "RE-USABILITY" in many cases. From
their separate points of view, both the FORTRAN and APL implementors have
correct and sensible implementations, but their products are being used (to
very good result) in a way which neither expected, and through silly tricks
which make for highly unreadable and difficult to maintain code.

The moral of the story is:

1. Put in garbage collection, but also put in sufficient controls for the
programmer.

2. If you need to make assertions about scope of dynamic data, FORCE them.

3. Programmers (even us Pascal lovers) will not stop at anything when the
big push comes to the big shove, and we'll likely get around sensible
and normal guidelines which are not supported by adequate sensible facilities.

Later,
Andrew Mullhaupt.

Disclaimer:
Any opinions here are not necessarily those of my employer.

(Did you know that Morgan Stanley & Co. Inc., is the world's largest APL
shop? IBM told us so...)

Jan Steinman

Sep 25, 1989, 8:50:47 PM
<62...@tut.cis.ohio-state.edu (Golden Richard)>
<<wri...@hsi.UUCP (Gary Wright)>>

<...I can't imagine a properly designed application where storage management is
so complicated that the usual techniques for programmer-managed storage
reclamation aren't adequate. If the programmer can't define when a certain
object can go poof, I suspect a serious lack of understanding of the problem at
hand.>

One who cannot imagine a need for garbage collection other than to correct for
programmer deficiencies simply lacks imagination.

The situation is not so simple in dynamic-bound systems. Are you claiming that
Smalltalk, Lisp, Eiffel, Objective C, Gnu Emacs, et al. have garbage
collectors simply because those who program in them lack serious understanding
of their problems?

In particular, systems designed for rapid prototyping may have storage systems
that have completely non-deterministic usage patterns. I'm certain those who
have used non-GC systems to build things similar to the NeXT InterfaceBuilder
or the Smalltalk Browser have come to appreciate the need for GC.

To bring it back to the newsgroup, the problem of reusable components is
intimately concerned with automatic storage management. The need to document
storage procedures is yet another reason truly reusable components are rare.

I really agree with Gary: <If our criteria for deciding whether GC is better
than explicit memory management are based on absolutes, we will never come to
a conclusion.> Once the religious give up their stubborn, blind opposition to
GC, we can all begin to discover when to use it, when not to, and how to
control it so that it stays out of the way when not used. I can pound most
nails with the handle of a screwdriver, but I'm a fool if I declare there
will never be a hammer in my toolbox!

In particular, I don't think the combination of asynchronous GC with programmer
storage handling has been adequately explored. Creation and destruction of
many data can be handled by a compiler, and most real-time systems can have
scheduled "idle time" in which to perform overhead activities. (A SONAR
project I worked on performed GC during the few milliseconds when the beam was
sweeping through the hull.)

Jan Steinman - N7JDB
Electronic Systems Laboratory
Box 500, MS 50-370, Beaverton, OR 97077
(w)503/627-5881 (h)503/657-7703

Golden Richard

Sep 25, 1989, 11:44:25 PM
In article <59...@tekgvs.LABS.TEK.COM> ja...@tekgvs.LABS.TEK.COM (Jan Steinman) writes:
><62...@tut.cis.ohio-state.edu (Golden Richard)>

[stuff deleted]

>The situation is not so simple in dynamic-bound systems. Are you claiming that
>Smalltalk, Lisp, Eiffel, Objective C, Gnu Emacs, et. al. have garbage
>collectors simply because those who program in them lack serious understanding
>of their problems?
>

I am claiming no such thing. In languages where the choice for GC has been
made *for* the programmer, little other choice remains. As a matter of fact,
I do not even intend to say that GC is inferior to programmer-managed
storage reclamation. The fact that I am relatively ignorant of applications
best suited to using GC is why I'm asking the pro-GC spokespersons to please
give a concrete example where one would truly need or want GC over the
alternative.

I'm certainly not of the "assembly language" school that believes that
everything should be explicitly and intricately hacked out by the
programmer. In particular, automatic scope entry/exit creation/destruction
is a godsend. I just don't see the point in making things more complicated
than they need to be.

Golden Richard

Sep 25, 1989, 11:48:25 PM
If anyone received multiple (somewhat mangled and incomplete) copies of my
previous article, I would appreciate email. I was having some technical
problems. Thanks in advance. ...and now back to the storage wars...

Ted Dunning

Sep 26, 1989, 1:07:52 AM

In article <65...@hubcap.clemson.edu> billwolf%hazel.cs.c...@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) writes:

> now what is the relation of parameter passing by value (i.e.
> initializing) on the reference counts in a reference counting adt?

   The answer is that in general, there IS no "parameter passing by
   value"!!!!! From the Ada 83 reference manual, 6.2.7:

   [ ada-noise deleted ]

   Thou shalt assume nothing else about the passing of parameters.

one of the problems is that an in parameter is still a reference which
must be accounted for. since there is no mechanism to handle
initialization distinct from assignment, it is difficult to write a
robust reference counting collection mechanism.

another serious problem is that if in-out parameters are implemented
as copy-in/copy-out (which is legal), then the copy-out operation is
the same as a bit-wise assignment, whereas it should be overloaded
with whatever assignment semantics are appropriate for the collectable
type.

[ please don't read the following embarrassingly ad hominem riposte ]

mister wolfe, in the process of vilifying people for having proved
their refined sensibilities by falling asleep while listening to
adaists, has in fact shown that he has not himself investigated the strengths
of other approaches, or that perhaps he could not appreciate the
purpose of constructs without having the point driven home with a
hammer (or ada reference manual).


> [...] pre-processors don't exist either and extensions are outlawed?

   Actually, preprocessors do exist; I believe the Classic Ada product
   (which provides inheritance and dynamic binding to the Ada O-O types)
   works via preprocessing. However, they must be kept separate from
   any validated compiler system.

and thus will never be standardized or recognized by the ada community
at large? (i.e. those who believe true ada is gospel). or will the
situation become one of various developers (re)inventing various
language constructs and using ada as a particularly odious target
language for their latest compiler?

> well why didn't we just use lisp to implement our adt's?

   No concept of data typing, perhaps?

hmmmm..... seems to me that mister wolfe hasn't done much lisp
programming if he thinks that lisp has no concept of data typing.
perhaps he should read CLtL a bit. particularly chapter 2 (data
types), chapter 4 (type specifiers), section 6.2 (data type
predicates), chapter 9 (declarations), as well as chapters 12, 13, 14,
15, 16, 17, 18 and 19 which describe the builtin data types and type
extension methods available to the common lisp programmer.


of course, since mister wolfe is so wonderfully omniscient, he will
know that CLtL is the commonly used abbreviation for Common Lisp, the
Language by guy steele.


he should also consider the reasonable assertion that ml is in fact a
dialect of lisp. where does acceptance of this put his insistence
that lisp has no concept of data typing? in comparison, ada's
concepts of typing are very feeble indeed.

Ted Theodore (W) Leung

Sep 26, 1989, 10:32:39 AM
In article <5...@hsi86.hsi.UUCP> wri...@hsi.com (Gary Wright) writes:
>It is interesting to note the advances that have been made in
>optimizing compilers. Perhaps a sufficiently smart compiler for a language
>that supports GC can figure out when GC can be avoided? That is to say,
>the compiler can notice when the last reference to an object will be lost
>and can explicity "dispose" of the object. In general, the compiler won't
>be able to determine this (thus GC) but for the simple cases, why not?

There has been some work done in this area.....

@incollection{NDJSSM81,
author = "Neil D. Jones and Steven S. Muchnick",
title = "Flow Analysis and Optimization of LISP-like Structures",
booktitle = "Program Flow Analysis: Theory and Applications",
publisher = "Prentice Hall",
year = "1981",
editor = "Steven S. Muchnick and Neil D. Jones",
chapter = "4",
pages = "102-131"
}


Jones and Muchnick present some techniques which allow exactly the
kinds of determinations which you are talking about.
--------------------------------------------------------------------
Internet/CSnet: t...@cs.brown.edu | Ted Leung
BITNET: t...@BROWNCS.BITNET | Box 1910, Brown University
UUCP: uunet!brunix!twl | Providence, RI 02912

Gary Wright

Sep 26, 1989, 11:50:08 AM
In article <65...@hubcap.clemson.edu> billwolf%hazel.cs.c...@hubcap.clemson.edu writes:
>From wri...@hsi.UUCP (Gary Wright):
>> I have seen a number of people claim that GC can not be used in a
>> real-time system and then conclude that GC is a bad thing that should
>> not be used.
>
> You're missing several important steps here having to do with
> performance degradation and the violation of the general principle
> that one should never do at run time that which can be done instead
> at compile time, design time, etc.; in general, the sooner a problem
> is handled (and similarly, the sooner an error is detected), the less
> costly it will be to handle it.

In my posting I tried to emphasize the fact that there is probably a
trade-off between using a language that supports GC vs. one that
doesn't. It is not clear to me that the relative costs are such as to
render GC useless. I do believe that it is harder to measure the costs
associated with non-GC languages. Please provide some reasoning for your
statements which indicate that you feel differently. Simply claiming that
the costs do not support GC is not enough. I will try to provide some
reasoning for GC below.

I agree that as much as possible should be done at compile or
design time. I would prefer that the compiler did most of the dirty work.
Also, in the context of reusable software, design decisions should, as much
as possible, not limit later reuse of the software (e.g. a clear violation
of this is C++'s "virtual" and "inline" keywords).

> Why pay and pay and pay at run time
> when an efficient self-managing solution (valid forevermore) can be
> used (and reused) instead?

It is not clear to me that garbage collection entails a particularly
harsh performance penalty. It is also not clear to me that the
self-managing solution is simple enough to allow such components to be
easily created and modified (i.e. modification via inheritance, which
Ada does not support).

After reading your posting, I went back and read the sections in
"Object-oriented Software Construction" regarding memory management.
What follows is mostly Bertrand Meyer's ideas regarding memory
management. I tend to agree, so I will simply present his ideas with
my comments.

Bertrand Meyer claims that programmer-controlled deallocation "is
unacceptable for two reasons: security and complication of program
writing."

By security, Meyer means that programmers are mortals, and they will
make mistakes and will dispose of objects that still have active
references. Complication refers to the fact that simply disposing
of an object is not sufficient, all internal objects must also
be disposed. Meyer calls this the "recursive dispose problem":

This means that a specific release procedure must be
written for any type describing objects that may refer
to other objects. The result will be a set of mutually
recursive procedures of great complication.

[...]
Instead of concentrating on what should be his job -
solving an application problem - the programmer turns
into a bookkeeper, or garbage collector (whichever
metaphor you prefer). The complexity of programs
is increased, hampering readability and hence other
qualities such as ease of error detection and
ease of modification. This further affects security:
the more complex a program, the more likely it is to
contain errors.

This is the trade-off I was talking about. You can do without GC,
but the increased complexity of your programs has its own
penalty also.
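As a concrete sketch of the recursive dispose problem (my example, not
Meyer's): a binary tree whose nodes refer to heap payloads needs a set
of release procedures, one per type, that call one another:

   with Unchecked_Deallocation;
   package Dispose_Demo is
      type Leaf is
         record
            Key : Integer;
         end record;
      type Leaf_Ptr is access Leaf;
      type Node;
      type Tree is access Node;
      type Node is
         record
            Item        : Leaf_Ptr;
            Left, Right : Tree;
         end record;
      procedure Destroy (T : in out Tree);
   end Dispose_Demo;

   package body Dispose_Demo is
      procedure Free_Leaf is new Unchecked_Deallocation (Leaf, Leaf_Ptr);
      procedure Free_Node is new Unchecked_Deallocation (Node, Tree);

      -- Release the children, then the payload, then the node itself;
      -- every type that refers to other objects needs such a procedure.
      procedure Destroy (T : in out Tree) is
      begin
         if T /= null then
            Destroy (T.Left);
            Destroy (T.Right);
            Free_Leaf (T.Item);
            Free_Node (T);
         end if;
      end Destroy;
   end Dispose_Demo;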

Meyer goes on to describe an alternative, the "Self-Management Approach"
in which the designer of an ADT takes care of the storage management
associated with the ADT. This is the approach I believe Bill Wolf
is advocating. This is indeed a "responsible alternative" to leaving
everything to the programmer. Meyer advocates the use of this technique
for languages that do not support GC. In his example, all objects to
be added to a linked list, are copied into storage that is managed
by the linked list. This would seem to be quite inefficient for large
objects. I'm not sure how storage could be safely managed when an ADT
only holds references to objects.

In his example, procedures that add nodes to or delete nodes from the
linked list cause new nodes to be created or destroyed. If an ADT
provides direct procedures to add or destroy objects within the ADT,
then the ADT has not really hidden the problem from the programmer at
all. At the most, it has hidden the recursive dispose problem.

This approach does not solve the previously mentioned problems of
security and complication. Instead of the applications programmer
worrying about storage, the ADT designer must worry about it. My hunch
is that the distinction between an ADT designer and an applications
programmer is clear for objects like linked lists, stacks etc. but that
it is not so clear the farther away from the basic data structures you
get. I wonder if the overhead related to having ADT's managing storage
via copying, and redundant code (all the recursive dispose procedures)
doesn't come close to the overhead for GC.

It should be noted, that Meyer advocates a GC facility that is
incremental in nature (not bursty), and can be explicitly turned on or
off when necessary (e.g. when real-time constraints exist).

Your turn. :-)

Gary Wright

Sep 26, 1989, 12:01:07 PM
In article <62...@tut.cis.ohio-state.edu>
Golden Richard <gric...@cis.ohio-state.edu> writes:

>If the
>programmer can't define when a certain object can go poof, I suspect
>a serious lack of understanding of the problem at hand.

But I wonder, if the code is structured so that a programmer can
define when a certain object can go "poof", whether the program
can still be understood by anyone other than the original programmer,
and whether the code can be easily reused.

Unfortunately, I don't have a good example at hand (a major problem
with this discussion in general, as has been noted by others).

Rob MacLachlan

unread,
Sep 26, 1989, 1:38:08 PM9/26/89
to

From: wri...@hsi.UUCP (Gary Wright)
Subject: Re: Re: Garbage Collection & ADTs
Date: 25 Sep 89 15:14:32 GMT
[...]

>What we need to determine is if a language that supports GC allows programs
>to be expressed in a form that is "better" than a language without GC.

You are right on target here. It is silly to claim that "problem X can only
be solved in language Y"; the interesting question is whether language Y
admits a solution to X that is:
-- More efficient
-- More robust
-- More easily (quickly) written
-- Easier to understand (to maintain)
-- Aids code reuse [token concession to the nominal subject]
-- More likely to be correct, etc.

>Perhaps a sufficiently smart compiler for a language that supports GC can
>figure out when GC can be avoided? That is to say, the compiler can notice
>when the last reference to an object will be lost and can explicity
>"dispose" of the object.

For a long time Lisp compilers have attempted to recognize when some objects
can be allocated according to a stack discipline, rather than being heap
allocated. This has been done primarily with numbers, lists and closures.
The ML compiler has an interesting optimization along these lines: if all
references to an allocated object can be identified as being in the same
function, then the allocation can be replaced with a collection of temporary
variables.

Of course, both of these allocation optimizations are aimed at eliminating
cons-and-throw-away behavior within a single function. Programmers used to
thinking of storage allocation as expensive would never write such code.
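
A C caricature of the transformation (my illustration; real Lisp and ML
compilers do this on their own intermediate form): an allocation whose
references never escape the function can be replaced by plain
temporaries.

#include <stdlib.h>

typedef struct Pair { double x, y; } Pair;

/* Before: cons-and-throw-away; the Pair never escapes.
   (Error handling omitted for brevity.) */
double dist2_heap(double x, double y)
{
    double d;
    Pair *p = malloc(sizeof *p);
    p->x = x;
    p->y = y;
    d = p->x * p->x + p->y * p->y;
    free(p);
    return d;
}

/* After: the allocation is replaced by temporaries. */
double dist2_local(double x, double y)
{
    double px = x, py = y;
    return px * px + py * py;
}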

Rob

William Thomas Wolfe, 2847

unread,
Sep 26, 1989, 1:59:31 PM9/26/89
to
From s...@cl.cam.ac.uk (Stephen Crawley):

>> ... unless the database replies that the object has been destroyed,
>> in which case the identification number expires as well.
>
> From the user's point of view, this is unacceptable behaviour by the
> application / database cabal.

The model involves a database which stores objects which can be
created, destroyed, read, and written by various users. The users
might be partitioned into different groups (e.g., those authorized
to destroy vs. those who cannot). Now from the perspective of a
user who can read and write but not destroy, the object's lifetime
is potentially infinite, depending upon the whims of those who have
the power to destroy. By revalidating the identification number,
such users receive continued assurances that the object still exists.

If the object is destroyed, the identification numbers will then all
expire automatically, regardless of how widely they were distributed.



> Bill replies:
>> We are managing objects in this way because they present worst-case
>> properties which do not permit us to use more efficient techniques.
>
> What about garbage collection???

Won't work under the conditions I described (distributed environment).


Bill Wolfe, wtw...@hubcap.clemson.edu

William Thomas Wolfe, 2847

unread,
Sep 26, 1989, 2:25:30 PM9/26/89
to
From s...@cl.cam.ac.uk (Stephen Crawley):

> Here are some problems where programmer-controlled storage management
> would be very difficult or impossible.
>
> Interactive editing of cyclic graph data structures. You have a
> heterogeneous cyclic graph data structure and the editor must be able
> to make arbitrary sequences of changes to the data structure. How
> do you detect that a subgraph has become detached?

The fact that a subgraph has become detached does not imply that
it is no longer part of a graph. Not all graphs are fully connected.
Therefore, it isn't possible to release the storage used until there
is some sort of directive which indicates that a given set of nodes
and/or arcs is to be destroyed.

> Dynamic loading & unloading of modules in a running program.
> [...] the dynamic loader needs to know if the module is
> currently being 'used' ... e.g. if some thread is executing a
> procedure exported by the module or if some other part of the
> application has salted away a reference into the module (e.g. a
> procedure pointer)
>
> This problem arose when I was using Mesa / XDE to implement my entity system;
> a persistent object system. I did get the dynamic loading to work ... and
> it made a big performance difference, but dynamic unloading could not be
> done safely, since the Mesa / XDE environment has no garbage collector.

This is an operating-system problem, not an application problem.
As such, the situation is not "applicable to a wide range of
applications". The solution is operating-system dependent; one
approach might be to use idle time to scan the processes in the
process queue, to determine whether any process that was started
while the old module was in effect still makes use of it.

Operating systems do a lot of things (pointer arithmetic, etc.)
which application systems should not be doing; this falls into
exactly that category.


Bill Wolfe, wtw...@hubcap.clemson.edu

William Thomas Wolfe, 2847

unread,
Sep 26, 1989, 2:41:39 PM9/26/89
to
From article <9...@scaup.cl.cam.ac.uk>, by s...@cl.cam.ac.uk (Stephen Crawley):

> Bill Wolfe writes:
>>
>> When you make your instantiation of the list type for objects of
>> type A, you must fulfill the instantiation requirements. This will
>> normally include supplying assignment and destruction procedures for
>> objects of type A. If you send an object into the list, the assignment
>> procedure will be used to create a resident copy. If you direct the
>> removal of an object from the list, the destruction procedure will be
>> used to destroy it. (See Booch, "Software Components with Ada")
>>
>
> First objection: I don't want to put a copy of the instance of A into
> the list twice. I want to put the SAME instance of A into the list
> twice. If A is a mutable type (i.e. has operations that can change
> a value) this is a significant semantic difference.

Then what you want is a list of pointers, not a list of objects.

> Second objection: You do not say how the assign and destruct operations
> get invoked when an instance of A is returned by the LIST ADT to the
> application.

The ADT requires the user to supply the operations, and will invoke
them when appropriate. When the user requests that something be
inserted, the ADT will invoke the assignment procedure to create
a copy. When the user requests that something be removed, the
ADT will invoke the Destroy procedure to destroy it.

If the thing being stored is an object, then the ADT makes and
subsequently destroys its copy of that object. If the thing
being stored is a pointer, then the ADT makes and subsequently
destroys its copy of that pointer.
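
In C, the contract Bill describes might be sketched with function
pointers (my illustration; Ada would use generic formal subprograms,
and this is not the actual interface of the Booch components):

#include <stdlib.h>

typedef void *(*assign_fn)(const void *);  /* returns a fresh copy */
typedef void (*destroy_fn)(void *);

typedef struct Node { void *item; struct Node *next; } Node;
typedef struct List {
    Node *head;
    assign_fn assign;    /* supplied by the user at instantiation */
    destroy_fn destroy;  /* likewise */
} List;

int list_insert(List *l, const void *item)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->item = l->assign(item);   /* ADT makes its own copy */
    n->next = l->head;
    l->head = n;
    return 1;
}

void list_remove_head(List *l)
{
    Node *n = l->head;
    if (n == NULL)
        return;
    l->head = n->next;
    l->destroy(n->item);         /* ADT destroys its copy */
    free(n);
}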

> I can't see how the application is forced to call DESTROY on an A
> variable when its declaration scope exits.

As mentioned above, the ADT creates and destroys on its own with
respect to the stored object. As far as the ADT's destruction
goes, the automatic destruction upon scope exit is not a part of
Ada 83; it is presently being considered as an Ada 9X revision.
This revision will greatly improve the ability to specify and
use components which manage their own space.

> [Example of an assignment procedure using reference counting]

Try writing the assignment procedure without reference counting,
since nobody ever does reference counting inside assignment anyway...


Bill Wolfe, wtw...@hubcap.clemson.edu

Steve Tynor

unread,
Sep 26, 1989, 3:08:40 PM9/26/89
to
...


>From s...@cl.cam.ac.uk (Stephen Crawley):
>> Here are some problems where programmer-controlled storage management
>> would be very difficult or impossible.
>>
>> Interactive editing of cyclic graph data structures. You have a
>> heterogeneous cyclic graph data structure and the editor must be able
>> to make arbitrary sequences of changes to the data structure. How
>> do you detect that a subgraph has become detached?
>
> The fact that a subgraph has become detached does not imply that
> it is no longer part of a graph. Not all graphs are fully connected.
> Therefore, it isn't possible to release the storage used until there
> is some sort of directive which indicates that a given set of nodes
> and/or arcs is to be destroyed.

But what if _in_this_case_, a detached subgraph is garbage and should be
deallocated? G.C. surely simplifies the program and would enhance readability.
If you can easily detect when to deallocate - more power to you. But it's
not always easy.

...


>> Dynamic loading & unloading of modules in a running program.
>> [...] the dynamic loader needs to know if the module is
>> currently being 'used' ... e.g. if some thread is executing a
>> procedure exported by the module or if some other part of the
>> application has salted away a reference into the module (e.g. a
>> procedure pointer)

...


> This is an operating-system problem, not an application problem.
> As such, the situation is not "applicable to a wide range of
> applications". The solution is operating-system dependent; one
> approach might be to use idle time to scan the processes in the
> process queue to determine if there are any processes which were
> started while the old module was in effect which made use of it.

Again, this sounds like garbage collection to me (this time the OS is garbage
collecting the unloaded module...)

> Operating systems do a lot of things (pointer arithmetic, etc.)
> which application systems should not be doing; this falls into
> exactly that category.

Are compilers not mini-operating systems (certainly, the Ada run-time
environment is!)? I'd much rather have my compiler doing my pointer arithmetic
(and storage management) than expect my application to do so. I'm not
against explicit allocation/deallocation - I just want an alternative for the
cases when it's either too convoluted or impossible.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Eschew Obfuscation.

Steve Tynor
Georgia Tech Research Institute
Artificial Intelligence Branch
ty...@prism.gatech.edu

William Thomas Wolfe, 2847

unread,
Sep 26, 1989, 3:10:49 PM9/26/89
to
From wri...@hsi.UUCP (Gary Wright):

> Bertrand Meyer claims that programmer-controlled deallocation "is
> unacceptable for two reasons: security and complication of program
> writing." By security, Meyer means that programmers [...] will
> make mistakes and will dispose of objects that still have active
> references.

Unless they leave the task to an ADT instead.



> Complication refers to the fact that simply disposing
> of an object is not sufficient, all internal objects must also
> be disposed. Meyer calls this the "recursive dispose problem":
>
> This means that a specific release procedure must be
> written for any type describing objects that may refer
> to other objects. The result will be a set of mutually
> recursive procedures of great complication.

Not true. The user supplies a Destroy procedure for whatever
is being stored. The ADT, in the course of destroying itself,
will call upon the user's Destroy procedure to handle the user's
data type. The writer of a Destroy procedure need only consider
destroying his own ADT, since the Destroy procedures supplied for
the stored types can be relied upon to destroy all lower levels.
It's really rather simple.
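
To sketch the layering in C (hypothetical names; an Ada version would
use generics): each level's Destroy handles only its own storage and
delegates the rest downward.

#include <stdlib.h>

typedef struct Queue Queue;      /* some lower-level component */
void queue_destroy(Queue *q);    /* provided by its implementor */

typedef struct Account {
    char *owner;
    Queue *pending;              /* lower-level component inside */
} Account;

void account_destroy(Account *a)
{
    free(a->owner);
    queue_destroy(a->pending);   /* rely on the level below */
    free(a);
}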

> Instead of the applications programmer worrying about storage, the ADT
> designer must worry about it.

Precisely. As an ADT designer, I consider the task of storage
management to be among the least of my problems. The difficult
part is providing all the concurrency-related hard guarantees,
which languages like Eiffel manage to avoid by not providing
multitasking in the first place.

> My hunch is that the distinction between an ADT designer and an
> applications programmer is clear for objects like linked lists,
> stacks etc. but that it is not so clear the farther away from the
> basic data structures you get.

The distinction is clear: the ADT specification separates the
application programmer from the ADT implementor, regardless of
whether you perceive the structure as "basic" or not.

> I wonder if the overhead related to having ADT's managing storage
> via copying, and redundant code (all the recursive dispose procedures)
> doesn't come close to the overhead for GC.

Probably not, since this approach can exploit more information;
it doesn't have to worry about figuring out whether or not a
piece of space can be released, since it already knows it can.
The Destroy procedures convey information which does not have
to be rediscovered at great and continuing expense.

> It should be noted, that Meyer advocates a GC facility that is
> incremental in nature (not bursty), and can be explicitly turned on or
> off when necessary (e.g. when real-time constraints exist).
>
> Your turn. :-)

We know that GC can coexist with managed components, as in Ada,
where components can declare that the GC system (if any) must
not operate on objects of a managed type. We can easily agree
that if GC is provided, there must be a way for managed types
to turn it off as far as they are concerned, while leaving it
potentially on for others. If a given piece of code contains
only managed types, then the compiler should notice this fact
and exclude GC code from the executable.

The original point was that not all users can tolerate GC
(real-time users in particular); on the other hand, all GC users
can use managed components with no problems. Therefore, if
a component is to be designed for the widest possible audience,
that component must manage its own storage. If components are
used which meet the highest possible standards, then we don't
have to worry about whether our components will stop working
when we do maintenance (which might introduce multitasking,
real-time operation, etc.) on our application; using GC introduces
such weaknesses, in addition to giving an *avoidable* run-time cost.


Bill Wolfe, wtw...@hubcap.clemson.edu

William Thomas Wolfe, 2847

unread,
Sep 26, 1989, 3:19:44 PM9/26/89
to
From t...@nmsu.edu (Ted Dunning):

> one of the problems is that an in parameter is still a reference which
> must be accounted for. since there is no mechanism to handle
> initialization distinct from assignment, it is difficult to write a
> robust reference counting collection mechanism.

OK, I agree. Nobody ever does this anyway, and I can't imagine
why anyone would want to.

>> Actually, preprocessors do exist; I believe the Classic Ada product
>> (which provides inheritance and dynamic binding to the Ada O-O types)
>> works via preprocessing. However, they must be kept separate from
>> any validated compiler system.
>
> and thus will never be standardized or recognized by the ada community
> at large?

There is a systematic revision process for the Ada language.

I will be happy to send an Ada 9X revision request form to
Mr. Dunning or anyone else who requests it.

> hmmmm..... seems to me that mister wolfe hasn't done much lisp
> programming if he thinks that lisp has no concept of data typing.
> perhaps he should read CLtL a bit. particularly chapter 2 (data
> types), chapter 4 (type specifiers), section 6.2 (data type
> predicates), chapter 9 (declarations), as well as chapters 12, 13, 14,
> 15, 16, 17, 18 and 19 which describe the builtin data types and type
> extension methods available to the common lisp programmer.

OK, I'm not up on the very latest versions of Lisp (of which I
hear that there are many). How about multitasking capabilities?


Bill Wolfe, wtw...@hubcap.clemson.edu

William Thomas Wolfe, 2847

unread,
Sep 26, 1989, 3:25:42 PM9/26/89
to
From am...@Morgan.COM (Andrew P. Mullhaupt):

> In this newsgroup, we find other programmers arguing that
> programmers should have no direct use for pointers, and so the language can
> determine the correct lifetime of dynamic data.

Not exactly. The argument is that pointers should only be
used by ADT implementors, not by application programmers.

> I suggest it is going to prove necessary for the languages which have
> this "Pascalian" point of view to close all loopholes (typecasts to
> integer, etc.) to prevent the widespread use of C style perfidious
> trickery to escape from sensible programming practice. (If you give
> them an inch...)

The real problem is not the elimination of mechanisms such as
Ada's Unchecked_Conversion, but the maintenance of a software
engineering perspective; bad code can be written in any language.


Bill Wolfe, wtw...@hubcap.clemson.edu

William Thomas Wolfe, 2847

unread,
Sep 26, 1989, 4:48:19 PM9/26/89
to
From ty...@prism.gatech.EDU (Steve Tynor):

>> The fact that a subgraph has become detached does not imply that
>> it is no longer part of a graph. Not all graphs are fully connected.
>> Therefore, it isn't possible to release the storage used until there
>> is some sort of directive which indicates that a given set of nodes
>> and/or arcs is to be destroyed.
>
> But what if _in_this_case_, a detached subgraph is garbage and should
> be deallocated?

OK, assume that we do want to kill detached subgraphs. Which of
the two (or more) connected components is the one we wish to keep?

If we assume that there is some distinguished node or arc which
can never be deleted and which also serves to mark the "permanent"
connected component, then presumably our graph abstraction will
allow us to mark one such node or arc as being endowed with these
magic properties. Now we need an oracle which will tell us if
a connected component contains the magic node or arc. The ability
to determine whether one node (or arc) is reachable if we start at
another node or arc is a generally useful capability which should
probably be provided to the user anyway, particularly since the
algorithm need only be looked up in a standard text on graph theory
and typed in. Since any service which is not explicitly used will
be optimized away, the user pays no real price for having the
service provided. Having provided the service to the user, we can
now use it ourselves to determine which component to throw away.
Now this is obviously a special-case response to this particular
problem, but the problem is also a special case itself.
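
The oracle is plain graph reachability; a minimal sketch of mine, using
depth-first search over an adjacency list (for an undirected view,
follow arcs in both directions):

typedef struct GNode {
    struct GNode **succ;   /* successor nodes */
    int nsucc;
    int visited;           /* must be cleared before each query */
} GNode;

int reachable(GNode *from, GNode *target)
{
    int i;
    if (from == target)
        return 1;
    if (from->visited)
        return 0;
    from->visited = 1;
    for (i = 0; i < from->nsucc; i++)
        if (reachable(from->succ[i], target))
            return 1;
    return 0;
}

/* Usage: after an edit, a component not containing the magic node
   is garbage: if (!reachable(node, magic_node)) destroy it. */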

Even if computing it ourselves has the same computational complexity
as computing it with a garbage collection algorithm, there is still
a gain in that we can search a specific area at a specific time, and
thereby improve the predictability of our service. We know that
there will be no operation whose time complexity is totally out
of proportion to the size of our graph merely because we happen to
be paying for all garbage everywhere on the machine to be collected.

Finally, there is the problem of the GC system repeatedly being
invoked (at great expense) when the amount of storage remaining
becomes fairly small. With a managed system, the user is given
an opportunity to strategically throw away data structures, and
thereby free up some space that the garbage collector could not
possibly take because it is still "in use".


Bill Wolfe, wtw...@hubcap.clemson.edu

Stephen Crawley

unread,
Sep 26, 1989, 6:37:12 PM9/26/89
to
Bill Wolfe writes:
> The model involves a database which stores objects which can be
> created, destroyed, read, and written by various users. The users
> might be partitioned into different groups (e.g., those authorized
> to destroy vs. those who cannot). Now from the perspective of a
> user who can read and write but not destroy, the object's lifetime
> is potentially infinite, depending upon the whims of those who have
> the power to destroy. By revalidating the identification number,
> such users receive continued assurances that the object still exists.
>
> If the object is destroyed, the identification numbers will then all
> expire automatically, regardless of how widely they were distributed.

A user explicitly destroying objects is equivalent to an application
explicitly deallocating heap nodes. In both cases, a dangling reference
is left that will cause trouble the next time it is used.

Sensible object-oriented database designs don't allow users to delete
objects. Instead objects are removed from the database when they are
no longer accessible. Dangling references are impossible.

But even assuming a database design where users may delete objects
explicitly, what good does it do a user to check that an object number
is still valid? If it is valid now, there is no guarantee that it
will still be valid in 5 seconds' time. If it isn't valid, there is
nothing the user can do about it. Sounds like a total waste of time to me ...

Bill writes:
>>> We are managing objects in this way because they present worst-case
>>> properties which do not permit us to use more efficient techniques.
>>
>> What about garbage collection???
>
> Won't work under the conditions I described (distributed environment).

Sorry Bill, you are just plain wrong there.

I have designed and am implementing an algorithm for garbage collection
of a distributed object system. It works, and it is tolerably fast.
(Garbage collection of N interlinked databases requires at most order
N**2 messages.) This stuff will be written up for publication in due
course.

-- Steve

joh...@p.cs.uiuc.edu

unread,
Sep 26, 1989, 7:20:54 PM9/26/89
to

> by Bill Wolfe

> I think we've established that managed and unmanaged storage
> paradigms can coexist, and that components which manage their
> own storage can avoid the inefficiencies of garbage collection.

In fact, I don't believe that managed and unmanaged storage paradigms
should coexist in one program. You should either use components that
use automatic garbage collection or components that do not. While it
is possible to link C code to Lisp or Smalltalk, it is always tricky
and error-prone.

> We also know that the user MUST be involved in storage management,
> if for no other reason than to decide which data structures to
> throw away in the event of a storage crisis.

Automatic garbage collection prevents storage crises. The system I
use generates an exception if it runs out of memory in spite of a
garbage collection, but I wouldn't dream of trying to handle that
automatically. Moreover, it only happens during debugging, and
usually means an infinite loop.

Bill Wolfe keeps making statements to the effect that garbage collection
is expensive. That is totally false. Garbage collection is cheap.
Anyone who is worried by a 5% overhead should be considering assembly
language. Garbage collection is cheap and is going to get cheaper.
For example, there has been little work on using optimizing compilers
to reduce the cost of garbage collection. I bet that a good compiler
can make automatic garbage collection cheaper than doing it yourself.

The problem with garbage collection is that the efficient algorithms are
not real-time. There is work being done in this area, and perhaps we
will soon find a way to make real-time programming compatible with
automatic memory management, but I don't think it is there yet.
However, the problem with garbage collection is NOT efficiency when
you measure the cost of garbage collection over the life of a program.

Ralph Johnson

Dave Jones

unread,
Sep 26, 1989, 7:38:56 PM9/26/89
to
From article <62...@tut.cis.ohio-state.edu>, by gric...@hockey.cis.ohio-state.edu (Golden Richard):

>
> If the programmer can't define when a certain object can go poof, I suspect
> a serious lack of understanding of the problem at hand.
>

Hear, hear.

But as I pointed out earlier, it is also useful to have stack-objects
and heap-objects that allow you to reclaim a whole bunch of objects all
at one go.
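
Those heap-objects are essentially arenas; a minimal sketch under
hypothetical names of mine (alignment and growth are ignored for
brevity):

#include <stdlib.h>

typedef struct Arena {
    char *base;
    size_t used, cap;
} Arena;

void *arena_alloc(Arena *a, size_t n)  /* bump allocation */
{
    void *p;
    if (a->used + n > a->cap)
        return NULL;                   /* sketch: no growth */
    p = a->base + a->used;
    a->used += n;
    return p;
}

void arena_free_all(Arena *a)          /* reclaim every object at one go */
{
    free(a->base);
    a->base = NULL;
    a->used = a->cap = 0;
}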

William Thomas Wolfe, 2847

unread,
Sep 26, 1989, 8:12:27 PM9/26/89
to
From joh...@p.cs.uiuc.edu:

> In fact, I don't believe that managed and unmanaged storage paradigm
> should coexist in one program. You should either use components that
> use automatic garbage collection or components that do not. While it
> is possible to link C code to Lisp or Smalltalk, it is always tricky
> and error prone.

Since Ada components can coexist with no difficulty, this is
not a true statement in general.

> Automatic garbage collection prevents storage crises. The system I
> use generate an exception if it runs out of memory in spite of a
> garbage collection, but I wouldn't dream of trying to handle that
> automatically.

Those who write embedded systems would tend to disagree.

> [The cost is only 5%]

At what cost in terms of extra space requirements?

> [Real-time GC is coming, Real Soon Now]

When it gets there, let me know.


Bill Wolfe, wtw...@hubcap.clemson.edu

Dave Jones

unread,
Sep 26, 1989, 8:25:10 PM9/26/89
to
From article <5...@hsi86.hsi.UUCP>, by wri...@hsi.UUCP (Gary Wright):
..

>
> Well, I have *never* encountered a situation that absolutely demanded using
> a high level language. That is why I still do all my programming in
> assembly. :-)
>

Sorry, but I'm too tired of this assembler-hll analogy for a smiley to save
you. Grumble. Ggrrrrrrrrrrrrmpf. Okay, okay, here: :-) (Sort of.)

The analogy is particularly inappropriate in this context, because
Assembler and HLL are two techniques for preparing a program, whereas
automatically garbage-collected and explicitly garbage-collected
programs are different end-results.*

Automated garbage collection is runtime overhead.

It will take quite a bit of convincing to make me believe
that it is universally necessary or desirable. I'm not saying it's
impossible, only that it's going to take better arguments than I've seen
so far. So in the meantime, I want the option of not using it. So
far I have used it approximately, now let's see... approximately zero
times, give or take once.


* The principal advantage of high-level languages over assembler
is usually that they make the program portable. It is a myth that assembly
language programming is necessarily tedious and slow. Back in the mid to
late seventies, there were macro assemblers and macro packages that made
assembly language programming virtually as easy as programming in high-level
languages. I wrote one once. It handled procedure call/return, nested
loops, structures, and all that sort of languagy kind of stuff quite nicely.
Those macro packages were the evolutionary antecedents of HLLs.

Languages like C++ do automate some rather tedious name-space manipulation,
and that is another win. But many of the even "higher" level languages kind
of miss the boat by making the syntax of the language reflect library-routine
functionality. I like new AWK, for example, but I really
wish it were written as C++ classes, so that I could integrate it with
my own stuff.

Dave Jones

unread,
Sep 26, 1989, 9:29:10 PM9/26/89
to
From article <9...@scaup.cl.cam.ac.uk>, by s...@cl.cam.ac.uk (Stephen Crawley):
> Golden Richard III writes:
>
> Here are some problems where programmer-controlled storage management
> would be very difficult or impossible.
>
> Interactive editting of cyclic graph data structures. You have a
> heterogeneous cyclic graph data structure and the editor must be able
> to make arbitrary sequences of changes to the data structure. How
> do you detect that a subgraph has become detached?
>

Funny you should ask, because I'm working on such a problem even as we
speak, sans garbage collector. The editor is not interactive -- it is
essentially an optimizer which reduces graphs by calculating equivalence
classes of subgraphs -- but I don't see that it matters much whether it
is interactive or not. I handled the situation by making an "arrow" a proper
data object, not just a vanilla pointer. One extra pointer's worth of
memory per arrow, but cheap at twice the price!

Before I had that brainstorm I had all sorts of problems, the very least
of which was reclaiming memory. (A wise man once said, if you
don't understand the problem well enough to know when an object
is unreachable, you don't understand the problem well enough.)

Here it is, simplified only a little, to be generic:

typedef struct {
    Info node_info;    /* Info that the node carries */
    List arrows_out;   /* List of all arrows pointing out of this node */
    List arrows_in;    /* List of all arrows pointing into this node */
} Node;

typedef struct {
    Symbol label;      /* Symbol which labels this arrow */
    Node *from;        /* Node at the base of this arrow */
    Node *to;          /* Node at the point of this arrow */
} Arrow;
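
With arrows as first-class objects, reclamation becomes a local
operation. A hypothetical follow-on sketch (list_remove is an
assumption of mine, not part of the code above; assumes <stdlib.h>):

/* Unlink an arrow from both incident nodes, then free it --
   explicit reclamation stays local, no collector required. */
void delete_arrow(Arrow *a)
{
    list_remove(&a->from->arrows_out, a);  /* leaves the source node */
    list_remove(&a->to->arrows_in, a);     /* leaves the target node */
    free(a);
}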

Randell Jesup

unread,
Sep 26, 1989, 9:29:39 PM9/26/89
to
In article <20...@hydra.gatech.EDU> ty...@prism.gatech.EDU (Steve Tynor) writes:
>In article <65...@hubcap.clemson.edu>

>>From s...@cl.cam.ac.uk (Stephen Crawley):
>>> Here are some problems where programmer-controlled storage management
>>> would be very difficult or impossible.
>>>
>>> Interactive editting of cyclic graph data structures. You have a
>>> heterogeneous cyclic graph data structure and the editor must be able
>>> to make arbitrary sequences of changes to the data structure. How
>>> do you detect that a subgraph has become detached?
...

>But what if _in_this_case_, a detached subgraph is garbage and should be
>deallocated? G.C. surely simplifies the program and would enhance readability.
>If you can easily detect when to deallocate - more power to you. But, it's
>not always easy.

But GC is unlikely to reclaim that storage. Yes, the graph no longer
has pointers to the disconnected subgraph. However, the subgraph may be
circular (you said it was cyclic), so all the nodes of the subgraph may still
have references to them. Can current GC systems recognize such a loop and
remove it in its entirety? How complex can such a loop be and still be
recognized?

--
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, je...@cbmvax.cbm.commodore.com BIX: rjesup
Common phrase heard at Amiga Devcon '89: "It's in there!"

Stephen Crawley

unread,
Sep 27, 1989, 5:07:57 AM9/27/89
to
I wrote:
>> First objection: I don't want to put a copy of the instance of A into
>> the list twice. I want to put the SAME instance of A into the list
>> twice. If A is a mutable type (i.e. has operations that can change
>> a value) this is a significant semantic difference.

Bill Wolfe replies:


> Then what you want is a list of pointers, not a list of objects.

Huh??? Since when has it been a property of an object that I can only
have one reference to it? You seem to have a particularly restricted
idea of what an object is!

No Bill. Given normal use of the term object == instance of an ADT, I
want what I said I want ... the ability to put the same object into
a list twice.

Alternatively, if you insist that I use a different generic list ADT
(say POINTER_LIST), please explain how it does storage management
without pushing the tricky part (in particular the decision about when
reference counts should be incremented or decremented) back to application
code.

I wrote:
>> I can't see how the application is forced to call DESTROY on an A
>> variable when its declaration scope exits.

Bill replies:


> As mentioned above, the ADT creates and destroys on its own with
> respect to the stored object.

Yes Bill ... I understood all of that ... and I understood the mechanism
that Ada uses to force the application to cooperate in most cases.

> As far as the ADT's destruction
> goes, the automatic destruction upon scope exit is not a part of
> Ada 83; it is presently being considered as an Ada 9X revision.
> This revision will greatly improve the ability to specify and
> use components which manage their own space.

But my point is that the mechanism doesn't exist NOW. Given that it
is being CONSIDERED for Ada 9X, it is a safe bet that it won't be
available in production Ada systems for at least 3-5 years. What
do we do till then? Use inferior software components that can't
get space management right for us?

---

I sketched the code for a LIST ASSIGN operation and pointed out that
the reference counting overheads would in many situations be higher
than the overheads of GC.

Bill Wolfe replies:


> Try writing the assignment procedure without reference counting,
> since nobody ever does reference counting inside assignment anyway...

The generic list ADT problem that I outlined requires that SOMETHING
does reference counting. It is a direct consequence of the stipulation
that the generic list must allow instances of an ADT to appear in a list
more than once.

Now you have been insisting all along that our generic list ADT and our
list member ADT (say A) can between them deal with all of the storage
management decisions associated with instances of A. Clearly we cannot
expect the application to be explicitly involved in reference counting.
In Ada, this means that the ASSIGN operation must do it. There is no
other option as far as I can see.
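
For concreteness, the kind of bookkeeping meant here might look like
this in C (a sketch of mine; in Ada it would live inside the list's
ASSIGN and DESTROY, and release of the payload is elided):

#include <stdlib.h>

typedef struct Obj { int refcount; /* ... payload ... */ } Obj;

Obj *obj_new(void)                 /* fresh object, one reference */
{
    Obj *o = calloc(1, sizeof *o);
    if (o != NULL)
        o->refcount = 1;
    return o;
}

void assign(Obj **dst, Obj *src)   /* *dst := src */
{
    if (src != NULL)
        src->refcount++;           /* count the new reference first,  */
    if (*dst != NULL && --(*dst)->refcount == 0)
        free(*dst);                /* so self-assignment stays safe   */
    *dst = src;
}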

Bill, if you are going to contradict me on this, please post the complete
Ada source code of the generic list ADT and another ADT that can be used
to instantiate the list. Be sure it solves the problem as I've specified
it ... not some trivialisation.

-- Steve

Stephen Crawley

unread,
Sep 27, 1989, 5:24:49 AM9/27/89
to
Andrew P. Mullhaupt writes:
>> In this newsgroup, we find other programmers arguing that
>> programmers should have no direct use for pointers, and so the
>> language can determine the correct lifetime of dynamic data.

Bill Wolfe replies:


> Not exactly. The argument is that pointers should only be
> used by ADT implementors, not by application programmers.

Some of the best-designed programming languages don't have explicit
pointers at all!! CLU, Eiffel and ML are good examples.

-- Steve
