Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

garbage collection in pure tcl?

207 views
Skip to first unread message

John Wiersba

unread,
Feb 28, 2002, 7:19:54 PM2/28/02
to
What is the best way to create dynamic data structures in (pure) tcl
which will get collected as they are no longer referenced?

For example, I want to create a package which returns to its caller a
handle to a new "object". And when the handle is no longer
referenced, I want the "object" to disappear. This precludes, I
believe, using a global (or namespace-global) array of object names
and returning the name to the caller, since the data associated with
the name will never be freed without an explicit action to free it.

Is there some (preferably pure) tcl package which can do this? I
don't know much about incrTcl, but from what I remember reading, its
objects *didn't* have this property -- they needed to be freed
explicitly.

If this topic has already been discussed (and I'm sure it has at some
point), I'd be happy to read the archives if someone could give me a
good search string to find the thread.

Thanks!
-- John Wiersba

Richard Suchenwirth

unread,
Mar 1, 2002, 2:27:18 AM3/1/02
to
John Wiersba wrote:
> What is the best way to create dynamic data structures in (pure) tcl
> which will get collected as they are no longer referenced?
>
> For example, I want to create a package which returns to its caller a
> handle to a new "object". And when the handle is no longer
> referenced, I want the "object" to disappear. This precludes, I
> believe, using a global (or namespace-global) array of object names
> and returning the name to the caller, since the data associated with
> the name will never be freed without an explicit action to free it.

> Is there some (preferably pure) tcl package which can do this? I
> don't know much about incrTcl, but from what I remember reading, its
> objects *didn't* have this property -- they needed to be freed
> explicitly.

Global variables (including all those in namespaces, which is just a way of
structuring and hiding globals) persist until explicitly freed, or [exit].

Local variables are "collected" on leaving the procedure in which they were
defined - the memory consumed is not directly returned to the OS, but kept
to Tcl for later new allocations.

If you know how long a data structure will have to live, it is easiest done
in a proc that spans its expected lifetime. Besides, it could be possible to
devise a mechanism with write and unset traces, but I'm not sure how to make
that bullet-proof (Tcl's equivalent to pointers are names in stack frames,
which are just any strings...)

Best regards, Richard Suchenwirth

Martin Lemburg

unread,
Mar 1, 2002, 4:26:24 AM3/1/02
to
jrw3...@yahoo.com (John Wiersba) wrote in message news:<d1c10fb8.0202...@posting.google.com>...

Hi John,

as far as I know have Tcl_Objs a reference counter.

Every creation of a reference of a Tcl_Obj in an interpreter causes to
increase this reference counter.
Every deletion of a reference decreaes this reference counter.
If the reference counter is lower or equal 0 the Tcl_Obj is freed.

This means in tcl (not in C(++)) that every variable every data you
use seen as a Tcl_Obj is freed if you delete the last reference to it!

This is not a garbage collection, but in tcl you have no reason to
worry about increasing memory usage, not freed memory, if you think
about object references.

Example:

% set element 1;
1
% set list [list 1 2 3 4 $element];
1 2 3 4 1

The Tcl_Obj stored in the variable element has the reference counter
1.
Every Tcl_Obj stored in the variable list has the reference counter 1
except the last Tcl_Obj taken from the variable element.
This last element of the list has now the reference count 2.

% lappend list $element;
1 2 3 4 1 1

Now the Tcl_Obj stored in the variable element has the reference
counter 3.

% set list [lrange $list 0 4];
1 2 3 4 1

Now the Tcl_Obj stored in the variable element has the reference
counter 2.

% set element "Hello world!";

Now the Tcl_Obj previously stored in the variable element has the
reference counter 1.

% set list [lreplace $list end];
1 2 3 4

Now the Tcl_Obj with the content "1", previously stored in the
variable element, has the reference counter 0 and is deallocated!

So in pure tcl you don't have to worry about the memory usage!

Greetings

Martin Lemburg

Kristoffer Lawson

unread,
Mar 1, 2002, 5:25:32 AM3/1/02
to
John Wiersba <jrw3...@yahoo.com> wrote:
> For example, I want to create a package which returns to its caller a
> handle to a new "object". And when the handle is no longer
> referenced, I want the "object" to disappear. This precludes, I
> believe, using a global (or namespace-global) array of object names
> and returning the name to the caller, since the data associated with
> the name will never be freed without an explicit action to free it.

I don't think there's any way to do this in pure Tcl. At the moment,
the reference count is not available at the script level, although this
wouldn't be terribly difficult to do I guess. You can however handle this
at the C level where you can check the reference count and set up
freeing functions for when the reference count reaches 0. There are some
hidden problems involved but it could be what you're looking for.

See: http://tcl.activestate.com/man/tcl8.3/TclLib/ObjectType.htm

--
/ http://www.fishpool.com/~setok/

John Wiersba

unread,
Mar 1, 2002, 10:40:56 AM3/1/02
to
"Richard Suchenwirth" <richard.s...@kst.siemens.de> wrote in message news:<3c7f2d55@spider>...

> Global variables (including all those in namespaces, which is just a way of
> structuring and hiding globals) persist until explicitly freed, or [exit].

Here's what I'm getting at: AFAIK the "normal" way of creating
anonymous (dynamically created) tcl data structures is to use a global
array to hold names whose value is the data structure and then return
the name as the handle. In other languages, this would be
accomplished by returning a reference or pointer to the data
structure.

In languages with some form of garbage collection, these anonyous data
structures are automatically collected when they're no longer
referenced by any client. However in tcl (and languages without a
garbage collector), this doesn't work. In particular, tcl can't free
the data structure because it's referenced by the array which holds
the correspondence between the handle and the data structure itself.
To free the data structure someone has to explicitly remove the entry
from the array.

I believe (from what little I've read) that incrTcl has the same
problem. It can create objects (i.e. data structures) just fine, but
they have to be freed explicitly.

To me, this is tcl's biggest drawback. I can't create complicated
data structures like I can (in perl for instance) without having to
track them in my code to free them at the right time. This is in
direct opposition to the way tcl handles strings, which *are* garbage
collected (or memory-managed) when they're no longer needed. So, tcl
is great for string processing in this respect (but better than C, for
example, where strings have to be memory managed, too). But for more
complicated data structures, it seems to me that tcl is in exactly the
same boat as C, instead of being at a higher level.

What I'm looking for is some advice as to how people deal with this
memory management issue for data structures that are more complicated
than strings. Or maybe there is some nice package that deals with
this issure. Or maybe I'm wrong about incrTcl.

Any thoughts?

-- John Wiersba

Cameron Laird

unread,
Mar 1, 2002, 10:52:26 AM3/1/02
to
In article <d1c10fb8.02030...@posting.google.com>,

John Wiersba <jrw3...@yahoo.com> wrote:
>"Richard Suchenwirth" <richard.s...@kst.siemens.de> wrote in message news:<3c7f2d55@spider>...
>> Global variables (including all those in namespaces, which is just a way of
>> structuring and hiding globals) persist until explicitly freed, or [exit].
>
>Here's what I'm getting at: AFAIK the "normal" way of creating
>anonymous (dynamically created) tcl data structures is to use a global
>array to hold names whose value is the data structure and then return
>the name as the handle. In other languages, this would be
>accomplished by returning a reference or pointer to the data
>structure.
>
>In languages with some form of garbage collection, these anonyous data
>structures are automatically collected when they're no longer
>referenced by any client. However in tcl (and languages without a
.
.
.

>example, where strings have to be memory managed, too). But for more
>complicated data structures, it seems to me that tcl is in exactly the
>same boat as C, instead of being at a higher level.
>
>What I'm looking for is some advice as to how people deal with this
>memory management issue for data structures that are more complicated
>than strings. Or maybe there is some nice package that deals with
>this issure. Or maybe I'm wrong about incrTcl.
>
>Any thoughts?
>
>-- John Wiersba

My instinct is that yes, we're stuck that way.
Briefly, the interpreter can't know when we're
done with anything except string data, so there's
no opportunity for (other) garbage collection.

It'll be next week before I can make this clearer
(or correct it).
--

Cameron Laird <Cam...@Lairds.com>
Business: http://www.Phaseit.net
Personal: http://starbase.neosoft.com/~claird/home.html

joe_english

unread,
Mar 1, 2002, 11:14:41 AM3/1/02
to
John Wiersba wrote:

>To me, this is tcl's biggest drawback. I can't create complicated
>data structures like I can (in perl for instance) without having to

>track them in my code to free them at the right time. [...]


>
>What I'm looking for is some advice as to how people deal with this
>memory management issue for data structures that are more complicated
>than strings.

I usually store the entire data structure in a single array,
and free the whole thing at once with [unset] when it's no
longer needed. I don't usually worry about freeing individual
nodes or subtrees when they become unreferenced, but in the
rare case where this is a problem I do manual garbage collection
by traversing the original data structure, copying the values
to a new array as I go, then do

unset Old; array set Old [array get New]; unset New


--Joe English

John Wiersba

unread,
Mar 1, 2002, 11:38:57 AM3/1/02
to
martin....@ugs.com (Martin Lemburg) wrote in message news:<1285da5d.02030...@posting.google.com>...

> as far as I know have Tcl_Objs a reference counter.
> ...

> This is not a garbage collection, but in tcl you have no reason to
> worry about increasing memory usage, not freed memory, if you think
> about object references.
> ...

> So in pure tcl you don't have to worry about the memory usage!

This is exactly what I'm talking about. Data structures in tcl can be
passed by value (if they can be stored in a string) or named. But if
you want to pass something by reference, you have to simulate that by
passing the name of a named data structure or the index value of a
list or array index. And then you lose the ability to have those data
structures garbage collected when there are no longer any "references"
to them.

Maybe I'm missing something and there's a better way? I'd *much*
prefer not to have to write an extension in C.

-- John Wiersba

David N. Welton

unread,
Mar 1, 2002, 11:54:36 AM3/1/02
to
jrw3...@yahoo.com (John Wiersba) writes:

> This is exactly what I'm talking about. Data structures in tcl can
> be passed by value (if they can be stored in a string) or named.
> But if you want to pass something by reference, you have to simulate
> that by passing the name of a named data structure or the index
> value of a list or array index. And then you lose the ability to
> have those data structures garbage collected when there are no
> longer any "references" to them.

Right, Tcl does not have references, and as much as you can simulate,
and get around that creatively, it causes problems.

> Maybe I'm missing something and there's a better way? I'd *much*
> prefer not to have to write an extension in C.

Maybe someone can suggest something creative that will work for your
particular case.

--
David N. Welton
Consulting: http://www.dedasys.com/
Personal: http://www.dedasys.com/davidw/
Free Software: http://www.dedasys.com/freesoftware/
Apache Tcl: http://tcl.apache.org/

John Wiersba

unread,
Mar 1, 2002, 11:59:39 AM3/1/02
to
Cameron had replied (in an email) with the following:

> My instinct is that yes, we're stuck that way.
> Briefly, the interpreter can't know when we're
> done with anything except string data, so there's
> no opportunity for (other) garbage collection.

> It'll be next week before I can make this clearer
> (or correct it).

Cameron,

I understand what you're saying. This is such a big issue to me, that
it almost completely undermines my ability to enjoy using tcl for
larger projects (I *do* like many features of tcl, especially its
minimalism and generalization of underlying concepts). The solution
of using dynamically generated namespace names or using one big array
to hold all objects and their properties just feels like such a huge
kludge to me.

I was hoping that someone had come up with some technique or trick or
conventional use to circumvent this. This is exactly the same state
perl was in before perl 5 (i.e. no references).

I had been hoping that incrTcl would solve this problem, but from my
(small amount of) reading, it appears not. I'm (still) hoping that
there's some OOP extension to tcl that *does* solve it. To me, it's
not OOP unless objects are garbage collected. And actually, I don't
really need much of anything else from OOP except garbage collection.
Inheritance and polymorphism are highly overrated for general
programming, IMHO. Dynamically allocated (and collected) abstractions
are the key feature.

Does anyone have any suggestions or pointers on how they deal with
this issue in their code?

-- John

Cameron Laird

unread,
Mar 1, 2002, 12:12:22 PM3/1/02
to
In article <d1c10fb8.02030...@posting.google.com>,
John Wiersba <jrw3...@yahoo.com> wrote:
.
.
.

>Inheritance and polymorphism are highly overrated for general
>programming, IMHO. Dynamically allocated (and collected) abstractions
>are the key feature.
.
.

Brett Schwarz

unread,
Mar 1, 2002, 12:43:36 PM3/1/02
to
>
> I believe (from what little I've read) that incrTcl has the same
> problem. It can create objects (i.e. data structures) just fine, but
> they have to be freed explicitly.
>

For the most part you are right. However, you can create an object
within a method with the "local" command, and that object will only
live while in that method (i.e. it will get garbage collected when
that method is terminated). Granted, this has limited usability, but
may work in your situation. So, something like:

itcl::body SomeClass::someMethod {

.
.
.
itcl::local SomeOtherClass localObject
.
.
.

}

Also, on the XOTcl mailing list, they were talking about binding an
object to a variable, so that when the variable is deleted, the object
is delete. I am not sure how they are going to implement that. I think
they were looking at setting a trace on the variable that would delete
the object. I think this is kind of what Richard was alluding to as
well.

I believe TclBlend does something similar to this as well. It sets up
some relationship between the Java object, and a Tcl object. Such as:

set obj1 [java::new SomeClass]

I think as long as obj1 lives, then that java object also lives. Don't
quote me on this one though, I really haven't used TclBlend.

--brett

Kevin Kenny

unread,
Mar 1, 2002, 12:47:52 PM3/1/02
to
John Wiersba <jrw3...@yahoo.com> wrote:
>What I'm looking for is some advice as to how people deal with this
>memory management issue for data structures that are more complicated
>than strings. Or maybe there is some nice package that deals with
>this issure. Or maybe I'm wrong about incrTcl.

Cameron Laird replied:


> My instinct is that yes, we're stuck that way.
> Briefly, the interpreter can't know when we're
> done with anything except string data, so there's
> no opportunity for (other) garbage collection.

We are indeed "stuck that way" for some interpretations of "stuck.' More
precisely, "fixing this problem is a big job, and it'll take a major release
because it will not be entirely backward compatible."

I believe the issue of object lifetime, and the related issues of mutable
objects and shimmering, may be the single largest problem confronting Tcl as
it exists today; it has had impact on all the object-oriented extensions,
on Feather, on TclBlend and Jacl, on TclMico/Combat, on TCOM, ... the
list goes on and on.

I have been thinking a lot about instance management in Tcl in recent
months; alas, my thoughts are definitely not ready to convert into TIPs.
Moreover, I expect that even after I or others have written relevant TIPs,
there will be an uphill road getting them approved; the TCT is innately
(and appropriately) conservative about breaking compatibility, and any
change that really fixes this problem makes [rofound and subtle changes to
the behavior of many familiar features. The problem is that once you've
exposed the "object nature" of values, you've broken the contract that
"everything is a string." Some values become special, and losing the
internal representation of such values acquires side effects. We will
have to feel our way carefully forward, making sure that Tcl's behavior
remains comprehensible and documentable.

By the way, even Java-style garbage collection is not a panacea for these
issues. If you look at the troubled history of Java's finalize() method,
you'll see that implicit garbage collection simply does not work for certain
types of resources.
--
73 de ke9tv/2, Kevin KENNY GE Corporate R&D, Niskayuna, New York, USA

John Wiersba

unread,
Mar 1, 2002, 5:26:19 PM3/1/02
to
Kevin Kenny <ken...@acm.org> wrote
> ...
> By the way, even Java-style garbage collection is not a panacea for these
> issues. If you look at the troubled history of Java's finalize() method,
> you'll see that implicit garbage collection simply does not work for certain
> types of resources.

It's true that reference-counting has its problems and
non-reference-counting also has "issues" as you mentioned. However, I
find that there are lots and lots of cases where I just want to
allocate a new "object" to return to my caller and I don't want my
caller to have to free the object (it might be shared, for instance).

OK, so I'll accept that the caller has to free the object. Does
someone have a recipe for how to handle this kind of situation? Is
there some boilerplate wiki-ish formula which shows the canonical way
of allocating an object to be returned to a caller and later freed by
the caller (or not)? Any gotcha's to look out for? I'm especially
thinking of situations where the data structure might be big (in the
megabytes) so that freeing it at some point will be imperative.

John Wiersba

unread,
Mar 1, 2002, 5:30:43 PM3/1/02
to
Joe English wrote in message news:<a5o9d...@enews1.newsguy.com>...

>
> I usually store the entire data structure in a single array,
> and free the whole thing at once with [unset] when it's no
> longer needed. I don't usually worry about freeing individual
> nodes or subtrees when they become unreferenced, but in the
> rare case where this is a problem I do manual garbage collection
> by traversing the original data structure, copying the values
> to a new array as I go, then do
>
> unset Old; array set Old [array get New]; unset New
>
> --Joe English

Joe,

Could you give a small example of this paradigm? Also, I belive what
you're talking about handles a single object of some type (one array
== one object). How do you handle multiple objects of the same type?

Is there a template of this kind of thing on the wiki or in one of the
tcl books?

Chang Li

unread,
Mar 1, 2002, 5:30:56 PM3/1/02
to
Kevin Kenny <ken...@acm.org> wrote in message news:<3C7FBEC8...@acm.org>...

Do you think garbage collection is needed when Tcl has no object-oriented
extension in kernel?

Chang

joe_english

unread,
Mar 2, 2002, 12:58:58 PM3/2/02
to
John Wiersba wrote:
>
>Could you give a small example of this paradigm?

I can't think of any _small_ examples of big hairy data
structures :-), but dom::tcl sort of works this way.
<URL: http://www.sf.net/projects/tclxml >

In an online help system I developed, I organized the
help text this way:

variable Help;
Help(topics) -- list of topic IDs
Help(styleSheet) -- text widget tag configurations
Help(TOC) -- list of {topicID title} pairs
Help(...) -- other global information
Help($topicId!text) -- help topic text
Help($topicId!next)
Help($topicId!prev) -- IDs of "next" and "previous" topic links
Help($topicId!...) -- Other topic-specific information

Hm, this actually isn't a terribly complicated structure,
but I've used a similar approach in many other programs,
some with mindbogglingly hairy tree- and graph-like structures.
I wish I could show you a more compelling example, but all I
have handy is proprietary code.

The basic idea, using hierarchical array element names
as keys, scales up pretty well.

With this approach, saving and reloading complex data
structures to a file is *really* easy: basically just
[puts $fp [array get X]] and [array set X [read $fp]]
(though I usually use a variation on this theme that
checks for file format compatibility and is friendlier
to revision control systems).

Keeping track of when the data structure has been modified
(e.g., to support "Save file before closing?" dialog boxes)
is also simple; just stick a write trace on the array.

> Also, I belive what
>you're talking about handles a single object of some type (one array
>== one object). How do you handle multiple objects of the same type?

It depends.

Often when there are multiple objects of the same type they're
actually part of a *bigger* tree- or graph- like structure, so I
use a big array to manage the larger structure instead. Often it's
one array == one object == one file.

For GUI programs that manage multiple files, then it's
one (top-level) window == one file == one array; here I
often use the Tk window id as the global array name,
and use [bind $toplevel <Destroy> { unset %W }] for
memory management.

For transient objects, I use [array get] to pass them around
by value and [array set] on a local variable to operate on them.
That way Tcl takes care of memory management.


--Joe English

jeng...@flightlab.com

lvi...@yahoo.com

unread,
Mar 4, 2002, 7:13:28 AM3/4/02
to

According to John Wiersba <jrw3...@yahoo.com>:
: So, tcl

:is great for string processing in this respect (but better than C, for
:example, where strings have to be memory managed, too). But for more
:complicated data structures, it seems to me that tcl is in exactly the
:same boat as C, instead of being at a higher level.

Good summary. The way Tcl was designed, to solve this type of problem,
one would implement an extension which would work the way they wanted.

For a more general solution, those who have needed this function over the past
10+ yrs could brain storm together how to extend the Tcl code without
damaging all those thousands of existing applications and extensions who
have not needed the function to add to it.

As someone (perhaps it was you) said earlier, this is pretty much the state
Perl was in before perl 5.
--
"I know of vanishingly few people ... who choose to use ksh." "I'm a minority!"
<URL: mailto:lvi...@cas.org> <URL: http://www.purl.org/NET/lvirden/>
Even if explicitly stated to the contrary, nothing in this posting
should be construed as representing my employer's opinions.

Kristoffer Lawson

unread,
Mar 4, 2002, 7:25:48 AM3/4/02
to
John Wiersba <jrw3...@yahoo.com> wrote:

> I was hoping that someone had come up with some technique or trick or
> conventional use to circumvent this. This is exactly the same state
> perl was in before perl 5 (i.e. no references).

Ah, but Tcl does have references. Each object has a reference count
internally. When you pass a string backwards and forwards, it's not
copied. Instead the reference count is increased and when needed the
object is copied. Ie. you can create object types that free other
structures when the reference count drops to 0.

I think TclJava uses this to manage Java objects.

>
> I had been hoping that incrTcl would solve this problem, but from my
> (small amount of) reading, it appears not. I'm (still) hoping that
> there's some OOP extension to tcl that *does* solve it. To me, it's
> not OOP unless objects are garbage collected.

Actually I used to think the same, and I still think garbage collection
of objects is extremely useful, but I haven't missed it anywhere near
as much as I expected to. Especially when using child objects (thus
guaranteeing that they disappear when the parent does). I'm not sure
if this is possible in IncrTcl? Still, I agree that we could do with
garbage collection. The problems arise when one uses object references
inside strings and you assume they exist, such as:

after 500 "[MyClass new] foobar"

As Tcl strings are not structured in any way, you have a handle to
a MyClass object, but such an object does not exist as the reference
count is 0. The handle is just contained in the string. You could probably
work out a scheme for structuring Tcl strings to keep references alive,
but that's something for a future version.

If you can avoid situations like the above, then Tcl's existing
reference count mechanism should work OK. But you still need the OO
extension to provide that in C code.

--
/ http://www.fishpool.com/~setok/

Kristoffer Lawson

unread,
Mar 4, 2002, 7:33:59 AM3/4/02
to
Brett Schwarz <brett_...@yahoo.com> wrote:
> Also, on the XOTcl mailing list, they were talking about binding an
> object to a variable, so that when the variable is deleted, the object
> is delete. I am not sure how they are going to implement that. I think
> they were looking at setting a trace on the variable that would delete
> the object. I think this is kind of what Richard was alluding to as
> well.

No need to trace it. You just specify a function to be called when
the new object type should be released (ie. when the reference count
reaches 0). This is quite possible to do with the C API.
This is pretty much what TclBlend does IIRC. It's definitely not
fool-proof, but can work most of the time.

--
/ http://www.fishpool.com/~setok/

Donal K. Fellows

unread,
Mar 4, 2002, 10:36:15 AM3/4/02
to
Chang Li wrote:
> Do you think garbage collection is needed when Tcl has no object-oriented
> extension in kernel?

Yes. There'd be all sorts of new cool things you could do with GC'ed entities
(lambda closures, unnamed objects, stack frames, etc.) Right now, the lack of
GC makes working with such things painful, to say the least...

Donal.
--
Donal K. Fellows http://www.cs.man.ac.uk/~fellowsd/ fell...@cs.man.ac.uk
-- I'm curious; where does this statistic come from? Does its home, perchance,
ever see sunlight? -- Jason A Williams <jason+...@compsoc.man.ac.uk>

Donal K. Fellows

unread,
Mar 4, 2002, 11:09:22 AM3/4/02
to
Kristoffer Lawson wrote:
> Ah, but Tcl does have references. Each object has a reference count
> internally. When you pass a string backwards and forwards, it's not
> copied. Instead the reference count is increased and when needed the
> object is copied. Ie. you can create object types that free other
> structures when the reference count drops to 0.

It's dead easy to do at the C level. There's just one problem; Tcl is a bit too
eager to throw away objects, and there's currently no way to stop it from
happening. And the core is very keen on using its own object types for things,
and this is because it's been coded with the assumption that you can always
recover the internal representation from the string, which is true for all Tcl's
own object types in the context in which they are used, but *not* if you've just
GC'ed the linked structure.

There are a few other problems too (e.g. what does it mean to duplicate a
reference) but lifespan management is the number one target. And I don't think
anything will happen on this front before Tcl9; whatever we do will break
extensions (the object type system will need to change) and maybe some scripts
too.

> I think TclJava uses this to manage Java objects.

And IIRC it has some problems because of it.

Donal K. Fellows

unread,
Mar 4, 2002, 11:17:18 AM3/4/02
to
Kevin Kenny wrote:
> I have been thinking a lot about instance management in Tcl in recent
> months; alas, my thoughts are definitely not ready to convert into TIPs.
[...]

Care to put any of that thought into words? I've been thinking about this for
quite a while too, and I have a feeling that if you could somehow attach to
strings a notion of what objects are "contained" within that string, you could
do something reasonably smart with operations like string-concatenation and
[eval]. Possibly even sufficient (if you make sure all extensions -
particularly Tk and Itcl - are object-aware) if you also have some way to
inhibit changes to other types while still letting code cope with the fact that
its attempts to cache things for later have failed...

Kevin Kenny

unread,
Mar 4, 2002, 6:55:39 PM3/4/02
to
> Kevin Kenny wrote:
> > I have been thinking a lot about instance management in Tcl in recent
> > months; alas, my thoughts are definitely not ready to convert into TIPs.
> [...]
"Donal K. Fellows" wrote:
> Care to put any of that thought into words? I've been thinking about this for
> quite a while too, and I have a feeling that if you could somehow attach to
> strings a notion of what objects are "contained" within that string, you could
> do something reasonably smart with operations like string-concatenation and
> [eval].

You've shamed me into it. I've started a Wiki page at
http://wiki.tcl.tk/3073.html

Kristoffer Lawson

unread,
Mar 5, 2002, 5:28:43 AM3/5/02
to
Donal K. Fellows <fell...@cs.man.ac.uk> wrote:
>
> It's dead easy to do at the C level. There's just one problem; Tcl is a bit too
> eager to throw away objects, and there's currently no way to stop it from
> happening. And the core is very keen on using its own object types for things,
> and this is because it's been coded with the assumption that you can always
> recover the internal representation from the string, which is true for all Tcl's
> own object types in the context in which they are used, but *not* if you've just
> GC'ed the linked structure.

That brings in the old discussion of whether objects in OO-extensions should
have a real string representation. I really liked this idea and it would
immediately solve those issues. I know there are problems with it too, but
has anybody created an experimental framework for this? If not, I'll do
the following:

$projects add objectStringRepresentation
puts [$projects amount]
=> Too many.

;-)

>> I think TclJava uses this to manage Java objects.
>
> And IIRC it has some problems because of it.

Yes. I believe there are issues such as losing internal representation
because of shimmering or because the handle becomes a part of another
string.

--
/ http://www.fishpool.com/~setok/

Kristoffer Lawson

unread,
Mar 5, 2002, 5:36:45 AM3/5/02
to
Donal K. Fellows <fell...@cs.man.ac.uk> wrote:
> Kevin Kenny wrote:
>> I have been thinking a lot about instance management in Tcl in recent
>> months; alas, my thoughts are definitely not ready to convert into TIPs.
> [...]
>
> Care to put any of that thought into words? I've been thinking about this for
> quite a while too, and I have a feeling that if you could somehow attach to
> strings a notion of what objects are "contained" within that string, you could
> do something reasonably smart with operations like string-concatenation and
> [eval].

This is what I was after too when I mentioned structured strings. I thought
about this when doing the list object experiment a while back which avoided
copying lists too often with [lreplace] etc. There I only had one level
of redirection which was enough for that purpose to speed those commands
up by an order of a magnitude (unfortunately they also slightly increased
indexing times). The same could've been done with strings: thus making
things like [string replace] faster. If that logic was extended you
could add deep structure to strings, thus retaining references to other
object types.

So you would get two benefits:

- References to objects contained in the string are retained.
- Several string operations are much faster.

The downside is that the operation of those references can become quite
obscure to the Tcl coder. You somewhat lose the idea that strings are
just strings (or appear to be so). You need to understand that they are
structured in a particular fashion and what happens when changes are made
to that string. This could be dangerous.

--
/ http://www.fishpool.com/~setok/

Donal K. Fellows

unread,
Mar 5, 2002, 8:35:57 AM3/5/02
to
Kristoffer Lawson wrote:
> That brings in the old discussion of whether objects in OO-extensions should
> have a real string representation. I really liked this idea and it would
> immediately solve those issues.

Depends what you mean by "real string rep." IMHO, concatenating the name of the
object type/class name and the hex version of the pointer to the internal
structure (or some other global counter thingy) would be unique enough for
anyone, and dead cheap to implement. The key problem is lifespan management,
and particularly keeping the objects live through (selected) string
manipulations.

> I know there are problems with it too, but
> has anybody created an experimental framework for this?

Nope, not that does the essential extra bit and augments the string itself.
Without that critical component (which is also the compatability-buster) there's
not much point. With it, you have virtually sufficient to do masses of new cool
stuff in Tcl.

> If not, I'll do the following:
> $projects add objectStringRepresentation
> puts [$projects amount]
> => Too many.
> ;-)

YM:
% puts [$projects amount]
number too large to represent

:^) or maybe :^( depending on your PoV...

-- Prices aren't rising - discounts are falling! It's WalMart-in-reverse! I'm
not spending more, I'm saving less! -- Chris Ahlstrom <ahls...@home.com>

Donal K. Fellows

unread,
Mar 5, 2002, 10:46:49 AM3/5/02
to
Kristoffer Lawson wrote:
> So you would get two benefits:
>
> - References to objects contained in the string are retained.
> - Several string operations are much faster.
>
> The downside is that the operation of those references can become quite
> obscure to the Tcl coder. You somewhat lose the idea that strings are
> just strings (or appear to be so). You need to understand that they are
> structured in a particular fashion and what happens when changes are made
> to that string. This could be dangerous.

Maybe, maybe not. It depends on the exact rules you enforce, and that's at
least part of the reason why this is a *big* job, at least of the scale of the
TIP#72 implementation (and that was a bit of a nightmare, I can tell you) though
at least here it would be completely cross-platform...

John Wiersba

unread,
Mar 5, 2002, 7:41:14 PM3/5/02
to
Joe English wrote in message news:<a5r3t...@enews3.newsguy.com>...

> In an online help system I developed, I organized the
> help text this way:
>
> variable Help;
> Help(topics) -- list of topic IDs
> Help(styleSheet) -- text widget tag configurations
> Help(TOC) -- list of {topicID title} pairs
> Help(...) -- other global information
> Help($topicId!text) -- help topic text
> Help($topicId!next)
> Help($topicId!prev) -- IDs of "next" and "previous" topic links
> Help($topicId!...) -- Other topic-specific information
>
> Hm, this actually isn't a terribly complicated structure,
> but I've used a similar approach in many other programs,
> some with mindbogglingly hairy tree- and graph-like structures.

Thanks for the example, Joe. It looks to me that if you had help
associated with multiple windows, you might end up with something
like:

Help($winID!$topicID!text)

To me that looks like a paradigm for handling multiple objects (each
winID is an object). And if you have a recursive data structure, you
end up with something like:

Help($winID!$topicID!window_link) = $winID_2
Help($winID_2!$topicID!text) ...

I tried something like this one time and had the following problems:

- It created a huge array, since all objects were kept in one array
and each object had many attributes which were in the same array.

- It was slow because as the keys get long, the time to build the key
and hash it, and look up the key in the hash table becomes significant
(because the key is long and to confirm a hit you have to compare the
key byte by byte).

- It was slow because, to see all the info related to one object, you
had to do a pattern match with "array get" or "array names" which
ended up looking through all objects for a single object.

The major concern I had was that it appeared to be slower than I would
have liked or expected. I don't know how good tcl's code is for
managing large, growing arrays, e.g. are the hash chains kept short?

What I would like better would be separate arrays for each object:

Help_$winID_1($topicID!window_link) = $winID_2
Help_$winID_2($topicID!text) ...

I'm not sure if anyone has tried such a thing (I haven't).

-- John Wiersba

Kristoffer Lawson

unread,
Mar 6, 2002, 6:09:41 AM3/6/02
to
Donal K. Fellows <fell...@cs.man.ac.uk> wrote:
> Kristoffer Lawson wrote:
>> That brings in the old discussion of whether objects in OO-extensions should
>> have a real string representation. I really liked this idea and it would
>> immediately solve those issues.
>
> Depends what you mean by "real string rep." IMHO, concatenating the name of the
> object type/class name and the hex version of the pointer to the internal
> structure (or some other global counter thingy) would be unique enough for
> anyone, and dead cheap to implement. The key problem is lifespan management,
> and particularly keeping the objects live through (selected) string
> manipulations.

No, that's not what I mean, but that the string representation of an object
could be used to completely regenerate the object if for some reason it
has disappeared. It would contain all relevant information, and could also
be used for serialisation. That way, as long as you have the appropriate
data available somewhere, you never lose your object.

--
/ http://www.fishpool.com/~setok/

Donal K. Fellows

unread,
Mar 6, 2002, 6:25:46 AM3/6/02
to
Kristoffer Lawson wrote:
> No, that's not what I mean, but that the string representation of an object
> could be used to completely regenerate the object if for some reason it
> has disappeared.

Tcl_Obj's are currently supposed to have this property.

> It would contain all relevant information, and could also
> be used for serialisation. That way, as long as you have the appropriate
> data available somewhere, you never lose your object.

The problem is that regenerability doesn't square well with identity; 47 is a
regenerable mathematical object (I've just serialized it in this message, and
everyone gets the same thing back out from it!) but it does not have identity in
a conventional sense, since it isn't all that different from this 47, is it? By
comparison, assignable objects (like those that many OO people naturally assume)
are very different in that two different objects really are very different (if
you assign to a field of one and the other one does not change, that proves that
the objects are different.) Philosophically, this all depends on what it means
for things to be the same as each other, and there is definitely not agreement
on this within the broad OO community (I've been to some very interesting
seminars on this sort of thing; assignment makes theoretical models of objects
quite complex.)

This stuff is very deep, and I'm not nearly a good enough communicator to
express all the complexities in a single message.

"I wouldn't just call you wrong. I'd go further and call you an argumentative
net-kook idiot who can't do his own research before opening his mouth and
yammering bullshit." -- Theo de Raadt <der...@zeus.theos.com>

David N. Welton

unread,
Mar 6, 2002, 6:34:31 AM3/6/02
to
"Donal K. Fellows" <fell...@cs.man.ac.uk> writes:

> This stuff is very deep, and I'm not nearly a good enough
> communicator to express all the complexities in a single message.

I think you are doing a good job, and await more enlightenment:-)

Arjen Markus

unread,
Mar 6, 2002, 6:59:24 AM3/6/02
to

On the subject of assignment: the book "Structure and interpretation
of computer programs" is a very enlightening publication
<http://mini.net/tcl/2843.html>

On the subject of references: they are possible in Tcl - see my
(definitely imperfect) implementation <http://mini.net/tcl/2933.html>

This implementation does not do any garbage collection though. David
challenged me to consider that and I have been thinking about
using a totally different representation of the data that might
solve that (at least as far as reference counts are concerned),
such as [interp alias] or [namespace inscope] to add and keep track
of extra data without them being invisible.

May be someone can actually implement something like that :-)

Regards,

Arjen

Arjen Markus

unread,
Mar 6, 2002, 9:17:15 AM3/6/02
to
John Wiersba wrote:
>

> What I would like better would be separate arrays for each object:
>
> Help_$winID_1($topicID!window_link) = $winID_2
> Help_$winID_2($topicID!text) ...
>
> I'm not sure if anyone has tried such a thing (I haven't).
>
> -- John Wiersba

You could use the tree structure from Tcllib for this - or the
trick it uses: put things in different namespaces. This way
you can organise the data in a set of small arrays and not have
to worry about long keys (perhaps best to use Tcllib, though).

Regards,

Arjen

Stephen Trier

unread,
Mar 6, 2002, 11:24:27 AM3/6/02
to
In article <3C86049C...@wldelft.nl>,

Arjen Markus <Arjen....@wldelft.nl> wrote:
>On the subject of references: they are possible in Tcl - see my
>(definitely imperfect) implementation <http://mini.net/tcl/2933.html>
>
>This implementation does not do any garbage collection though. [...]

A few days ago, I appended a first try at a pure-Tcl garbage collector
to <URL: http://wiki.tcl.tk/1318.html>. It works with Kevin Kenny's
code for the lambda function and LISP-style lists based on lambda.
It works on the same principle as garbage collectors for C: Walk the
whole data space, seeking anything that looks like it might be a
pointer. I intended to adapt it to your implementation of references,
but haven't had time yet.

The biggest problem for this pure-Tcl GC is speed. Even on toy problems,
it is slow. An incremental algorithm driven from the event loop would
help in interactive applications, but it's still going to burn a lot of
CPU cycles.

I don't understand what you are getting at for your alternative
implementation, Arjen. Are you trying to leverage Tcl's reference
counting, or are you trying to narrow down the number of places a
garbage collector has to look?

Stephen

--
Stephen Trier
Technical Development Lab
Cleveland FES Center
s...@po.cwru.edu

Donal K. Fellows

unread,
Mar 6, 2002, 11:38:09 AM3/6/02
to
"David N. Welton" wrote:
> "Donal K. Fellows" <fell...@cs.man.ac.uk> writes:
>> This stuff is very deep, and I'm not nearly a good enough
>> communicator to express all the complexities in a single message.
>
> I think you are doing a good job, and await more enlightenment:-)

Hmm. I'm starting to get out of my depth here, but one group of OO-ers assume
object identity as a primitive operation (by saying every object has a unique
name), and another group (to which I belong) impose it as a derived operation
(by allowing for an optional operation, called something like getName, which is
guaranteed to distinguish two objects if there is some other operation that will
distinguish them.) This is all working in the semantic domain with only
immutable objects (modelled as relations IIRC, but I'd need to look up my notes
from the seminar series to be sure.) Things get much more complex with mutable
objects, because you also have to allow for time; if you can distinguish between
the state of an object before and after an assignment to one of its member
fields, is it really the same object? Or are you really doing some kind of
voodoo magic and really substituting it for another object underneath everyone's
feet? (It might help you if you understand that there are real language/object
systems that do just that; they have the advantage of being very easy to
parallelize on a massive scale at the compiler level, but they don't have
variables in a conventional sense.)

I'm missing out a lot of the explanation, of course, but that's because it is a
pain to convert math to ASCII and I ought to get a bit more work done today...
:^)

Darren New

unread,
Mar 6, 2002, 12:10:03 PM3/6/02
to
"Donal K. Fellows" wrote:
> This stuff is very deep, and I'm not nearly a good enough communicator to
> express all the complexities in a single message.

Not badly described. If you want an example, consider what might be the
string-rep of an object that contains a circular reference involving
other objects. E.g., A points to B, B points to C, C points to A, and
now you just want the string rep for A, and you're going to write it to
a file and read it back in later. Do you also save B and C? If so, and
you read it back before you GC B and C, does the new A point to them
again?

--
Darren New
San Diego, CA, USA (PST). Cryptokeys on demand.
To the user, everything works just as expected,
assuming the user's expectations are correct.

joe_english

unread,
Mar 6, 2002, 5:01:28 PM3/6/02
to
John Wiersba wrote:
>
>I tried something like this one time and had the following problems:
>
>- It created a huge array, since all objects were kept in one array
>and each object had many attributes which were in the same array.
>
>- It was slow because as the keys get long, the time to build the key
>and hash it, and look up the key in the hash table becomes significant
>(because the key is long and to confirm a hit you have to compare the
>key byte by byte).
>
>- It was slow because, to see all the info related to one object, you
>had to do a pattern match with "array get" or "array names" which
>ended up looking through all objects for a single object.

Instead of doing a pattern match [array names $node.*], I maintain
a list of the direct children of a node as one of the node properties:

$node1.fields { f1 f2 f3 ... }
$node1.fields.f1 "field1 value"
$node1.fields.f2 "field2 value"
...

For recursive structures I do something similar, but the subnode IDs
refer to top-level keys:

$node1.subnodes { node2 node3 node4 }
$node1.fields { ... }
$node2.subnodes { node5 node27 node99 }
$node2.fields { ... }
...

This way the keys don't get *too* long -- the maximum key length
is determined by the depth of the type hierarchy, not the depth
of the data hierarchy -- and it's not necessary to do a linear
scan through the entire list of [array names] to retrieve
subobjects.

>The major concern I had was that it appeared to be slower than I would
>have liked or expected. I don't know how good tcl's code is for
>managing large, growing arrays, e.g. are the hash chains kept short?

Tcl_HashTables are among the better implementations I've seen.
Not the absolute best, but pretty close. (FWIW: Tcl uses a
simple multiplicative hash function for string keys, closed
hashing (each bucket is a linked list), and quadruples the
table size when the mean chain length exceeds 3. Hash values
are stored with the table and don't need to be recomputed
during a rehash.)


>What I would like better would be separate arrays for each object:
>
>Help_$winID_1($topicID!window_link) = $winID_2
>Help_$winID_2($topicID!text) ...
>
>I'm not sure if anyone has tried such a thing (I haven't).

That's what tcldom::c does; each document is a separate array.

This approach works well too, but it doesn't really save memory
or time. In fact it's slightly _more_ expensive than using
a single arrray; there's extra overhead for each global variable,
and each access takes _two_ hash table lookups instead of one.

--Joe English

Donal K. Fellows

unread,
Mar 7, 2002, 4:56:48 AM3/7/02
to
Stephen Trier wrote:
> A few days ago, I appended a first try at a pure-Tcl garbage collector
> to <URL: http://wiki.tcl.tk/1318.html>. It works with Kevin Kenny's
> code for the lambda function and LISP-style lists based on lambda.
> It works on the same principle as garbage collectors for C: Walk the
> whole data space, seeking anything that looks like it might be a
> pointer. I intended to adapt it to your implementation of references,
> but haven't had time yet.

Ow! It's easier to get a basic lambda through [unknown] modifications (so long
as you don't need variables that persist between invokations) but trying to
figure out what commands are no longer needed (I assume they have to match some
kind of pattern first, or you might end up deleting stuff required for commands
that just haven't been sourced yet) by crawling over the whole data space? As I
said, ow!

Donal.
--
"If something like this happened in the real world, not only would I be under
NDA, but I would also be afraid of having my own personal butt involved in
the legal equivalent of the massacre of Little Big Horn, with the lawyers
starring as the Indians." -- Chuck Swiger <ch...@friday.com>

Donal K. Fellows

unread,
Mar 7, 2002, 5:08:19 AM3/7/02
to
Darren New wrote:
> "Donal K. Fellows" wrote:
>> This stuff is very deep, and I'm not nearly a good enough communicator
>> to express all the complexities in a single message.
>
> Not badly described. If you want an example, consider what might be the
> string-rep of an object that contains a circular reference involving
> other objects. E.g., A points to B, B points to C, C points to A, and
> now you just want the string rep for A, and you're going to write it to
> a file and read it back in later. Do you also save B and C? If so, and
> you read it back before you GC B and C, does the new A point to them
> again?

That's purely a serialization problem; Java's solution here is as good as any
I've studied up to this point. However, I'm not sure that regenerable objects
(in the sense that I was intending) can properly contain cyclic graphs (or even
DAGs), except as shorthand for infinite trees (or larger finite trees in the DAG
case.) The problem is that if A->B->C->A, then the contained A must be exactly
equivalent to the containing A; no problem if you have object identity, but
requiring some cunning mathematical tricks in the regenerable case (the real
object definition requires a fixed-point operator to define in finite space, and
I'm a little bit rusty on the denotational semantics front at the moment.)

Kristoffer Lawson

unread,
Mar 7, 2002, 5:29:48 AM3/7/02
to
Donal K. Fellows <fell...@cs.man.ac.uk> wrote:

> comparison, assignable objects (like those that many OO people naturally assume)
> are very different in that two different objects really are very different (if
> you assign to a field of one and the other one does not change, that proves that
> the objects are different.)

You're right. There is a real issue there. I was about to say use IDs for
each object which can be looked up and used if they exist in a table,
but of course that doesn't help as strings may contain old data and the
actual data has disappeared. Are there any solutions to these kinds of
problems? I'm actually quite interested now.

--
/ http://www.fishpool.com/~setok/

Arjen Markus

unread,
Mar 7, 2002, 7:24:11 AM3/7/02
to
"Donal K. Fellows" wrote:
>
> Stephen Trier wrote:
> > A few days ago, I appended a first try at a pure-Tcl garbage collector
> > to <URL: http://wiki.tcl.tk/1318.html>. It works with Kevin Kenny's
> > code for the lambda function and LISP-style lists based on lambda.
> > It works on the same principle as garbage collectors for C: Walk the
> > whole data space, seeking anything that looks like it might be a
> > pointer. I intended to adapt it to your implementation of references,
> > but haven't had time yet.
>
> Ow! It's easier to get a basic lambda through [unknown] modifications (so long
> as you don't need variables that persist between invokations) but trying to
> figure out what commands are no longer needed (I assume they have to match some
> kind of pattern first, or you might end up deleting stuff required for commands
> that just haven't been sourced yet) by crawling over the whole data space? As I
> said, ow!

I have just created persistent variables (that change due to an
invocation to reflect the changes in state) using [interp alias].

See: <http://mini.net/tcl/3096.html>

Regards,

Arjen

Stephen Trier

unread,
Mar 7, 2002, 2:50:46 PM3/7/02
to
In article <3C873960...@cs.man.ac.uk>,

Donal K. Fellows <fell...@cs.man.ac.uk> wrote:
>(I assume they have to match some kind of pattern first, or you might end
>up deleting stuff required for commands that just haven't been sourced yet)

Yes, exactly. The garbage collector on the wiki uses the rule that it
may delete anything in the ::cell namespace and nothing in any other
namespace. It searches for the glob pattern "::cell::*" when looking
for references.

It's a proof of concept with many limitations. It works for toy
applications. Its usefulness for real applications is unknown.

Darren New

unread,
Mar 7, 2002, 2:56:13 PM3/7/02
to
"Donal K. Fellows" wrote:
> That's purely a serialization problem; Java's solution here is as good as any
> I've studied up to this point.

Except that when you do it in Java, you know it's happening. I thought
the point of the "serialization" in Tcl was to turn an object into a
string and back again transparently and with garbage collection. A bit
different, when you pass an object to a function and it suddenly is a
new and different object, unrelated to the object you passed in.
I.e....

set a [big::complex::object]
set b $a

if {![catch {do_something $a}]} {...}

# Now, is $a still the same as $b? One would hope. But maybe not, if it
got serialized and repacked again to get passed thru catch or
do_something, even if neither of those tries to change its arguments.

But, yes, if your objects are more functional, this is much less of a
problem.

Arjen Markus

unread,
Mar 8, 2002, 2:57:08 AM3/8/02
to
Stephen Trier wrote:
>
> In article <3C873960...@cs.man.ac.uk>,
> Donal K. Fellows <fell...@cs.man.ac.uk> wrote:
> >(I assume they have to match some kind of pattern first, or you might end
> >up deleting stuff required for commands that just haven't been sourced yet)
>
> Yes, exactly. The garbage collector on the wiki uses the rule that it
> may delete anything in the ::cell namespace and nothing in any other
> namespace. It searches for the glob pattern "::cell::*" when looking
> for references.
>
> It's a proof of concept with many limitations. It works for toy
> applications. Its usefulness for real applications is unknown.
>
> Stephen

Have you seen my attempt at something like and unlike that at:
<http://wiki.tcl.tk/tcl/3096.html> ?

Regards,

Arjen

Donal K. Fellows

unread,
Mar 8, 2002, 4:55:32 AM3/8/02
to

Well, it is possible to use a timestamping solution so that you can hook up to
existing objects, but only if they still exist. If the object has gone, then
you have to create a copy, and it will have to be a copy of the object as it was
at a particular moment in time. Theoretically, you can use tricks like weak
references (which don't inhibit GC) to make that copy occur as late as possible,
but that only really works in some situations, and can have the down-side of
making object release an arbitrarily expensive operation. It might also impact
on reliability (has the data really been written to a stream, or was it just
some kind of smart reference that refers back to the object I'm holding here?)

However, if we set aside the problem of what to do when serializing to
(potentially) persistent data stores or other processes, you still have the
problem if I do this:

create an object A

serialize it once, storing A1 in a string (which consists of a globally
unique label for A, plus the state of A at this point)

update some fields

serialize it again, storing A2 in another string (with the same properties
as above.)

(At this point, deserializing either A1 or A2 will work, hooking back into the
original object.)

nuke A

deserialize A1, recreating A with the same name and in the state that it
was in when A1 was created

deserialize A2, which just relinks to A but in a state inconsistent with
the situation when A2 was created!

Obviously, deserializing the state from A2 over A is not right (you end up with
the same situation, but reversed.) You could try work around this by storing
not just storing the state of A, but a complete history of all assignments
instead so you could always use the latest version. But this is not right
either, if you consider a more complex situation where you (in effect) modify
and backtrack a few times; there simply might not be a consistent history (or
not without understanding that the object can, in effect, travel backwards as
well as forwards through time) and it is entirely conceivable that each
serialization might be substantively incompatible with all the others.

In general, it is only a wise idea to reconnect objects in some circumstances
(e.g. where the objects have no visible internal state, where the objects are
immutable, and where the objects have no useful sense of identity[*]) and that's
quite a high-level decision. I wouldn't expect even a sophisticated library to
be able to guess the correct thing to do in enough circumstances to be truly
useful.

Donal.
[* In this case, it doesn't matter if you substitute one for another, and
reconnection is just a performance and/or memory optimization. ]

-- A large ASCII art .sig is the Usenet equivalent of walking around in
public with your asscrack sticking out past the top of your sweatpants.
You be the judge. -- Mike Hoye <mh...@prince.carleton.ca>

John Wiersba

unread,
Mar 11, 2002, 4:48:55 PM3/11/02
to
Kevin Kenny <ken...@acm.org> wrote in message news:<3C84097B...@acm.org>...

>
> You've shamed me into it. I've started a Wiki page at
> http://wiki.tcl.tk/3073.html

Wow, I wasn't thinking of along these lines at all. Although it
maintains (some) compatibility with the current tcl, it looks prone to
extremely subtle bugs. I was thinking more along the lines of perl's
solution: just expose references at the user level and GC them with
reference counts.

What are the major issues with this approach, besides development
effort? Could such a language be backward compatible with tcl? Have
there been discussions along these lines in the past (in the newsgroup
archives)?

I imagine a new command, to_ref, which would return a reference to its
argument. A string would return a string ref, a list would return a
list ref, an array (name?) would return an array ref. Maybe the
caller would have to disambiguate and array name from a string. I
believe strings and lists can already be disabiguated in the
underlying tcl data structures.

The list functions could be enhanced to check for a list ref, in
addition to a string or list, and operate appropriately. The array
operations would have to be extended to take an array name or an array
reference. Or maybe there would be a parallel set of functions to
handle array refs?

At the expense of compatibility, maybe a reworking of the primitive
commands would enhance things. Maybe clean up the semantics of names
vs. values vs. references. For example, get rid of array names and
use array references instead, pass lists by reference instead of by
value, etc. Maybe some of this would make some of the "shimmering"
issues go away?

I realize this doesn't sound quite like the same old tcl. However,
I've never liked pass-by-name; maybe it would be a better tcl?

-- John Wiersba

Donal K. Fellows

unread,
Mar 12, 2002, 6:13:34 AM3/12/02
to
John Wiersba wrote:
> Wow, I wasn't thinking of along these lines at all. Although it
> maintains (some) compatibility with the current tcl, it looks prone to
> extremely subtle bugs. I was thinking more along the lines of perl's
> solution: just expose references at the user level and GC them with
> reference counts.

How would that work? If I decide to build a command for later evaluation like
this:

set cmd "foobar [getaref1] [getaref2]"

where would the references go?

> What are the major issues with this approach, besides development
> effort? Could such a language be backward compatible with tcl? Have
> there been discussions along these lines in the past (in the newsgroup
> archives)?

I don't currently understand the semantics of your approach as it pertains to
Tcl, but I will point out that variables containing code were far rarer in Perl4
than they are in current Tcl. Indeed, Perl never had an
"everything-is-a-string" mantra, and it was always the case that converting
between numbers and strings could lose information; the Perl world just accepted
that. The situation in Tcl is not equivalent...

> I imagine a new command, to_ref, which would return a reference to its
> argument. A string would return a string ref, a list would return a
> list ref, an array (name?) would return an array ref. Maybe the
> caller would have to disambiguate and array name from a string. I
> believe strings and lists can already be disabiguated in the
> underlying tcl data structures.

The problem is that some things can have an awful lot of interpretations (these
correspond approximately to types); to illustrate, consider the question "What
is 1?" Sure, it can be an integer, but it can also be a float, or a string, or
a variable name, or a command name, or a namespace, or a script, or a
subcommand, or... See? At the moment, the "1" is not locked into belonging to
any particular type; its interpretation is fluid.

The problem with this comes when you consider something with a definite
lifetime, particularly like an object. For various reasons, you can't present
an object as a tuple/list of its fields (that's just a snapshot of the object,
not the object itself) so you need to pick a handle-based representation as your
default (anyone wanting anything else should know they are taking snapshots,
since it is semantically meaningful; I think I explained this last week.) So
you have a handle; i.e. a reference. When should the object go away? Either
when told to (explicit deletion, which makes all existing handles to the object
invalid, reducing them to nothing other than funny strings) or when no
references to the object exist any more (a.k.a. garbage collection.) So, how do
you do this? Well, the obvious way is to use the built-in refcounting abilities
of Tcl_Objs (anything else is a *humungous* coding job, and one that is trying
to hit a moving target too!) Which works (and is used by a number of
extensions, not all of which are public) but is very fragile as references are
lost too easily, particularly when doing string operations like substitution
into double-quoted words or evaluation of scripts; from 8.3 onwards, Tcl has
some tricks to work around this, but the mechanism is not robust enough for
anything other than a performance optimisation. Trust me on this point, and I
wrote that code!

There are two other ways in which Tcl_Objs could do with enhancing;
a) allow objects to hold more than two simultaneous interpretations
b) allow objects to veto changes to their interpretation

Now, the Tcl_Obj structure is the most commonly allocated object in the whole
Tcl core, and by a very large factor, so extending it to include another field
will have major memory consumption implications. It would also greatly
complicate both the code to do format conversion (which is not as clear as I
would like, I'd point out) and make debugging much more complex too. So
realistically speaking, a) is probably out of our reach, even with a major
version change.

It would be far easier to tackle b) though, but that can't be done until we
start work on 9.0

It should be noted that currently Tcl has very simple (operational) semantics
that can be grasped with very little training; the commands might be complex,
but Tcl itself is not. This Is Good.

> At the expense of compatibility, maybe a reworking of the primitive
> commands would enhance things. Maybe clean up the semantics of names
> vs. values vs. references. For example, get rid of array names and
> use array references instead, pass lists by reference instead of by
> value, etc. Maybe some of this would make some of the "shimmering"
> issues go away?

I don't think so, but I might be wrong. Put together a fleshed-out suggestion
and I'll be able to give better feedback. (That's the problem with the current
situation; too much hot-air[*] and not enough code.)

> I realize this doesn't sound quite like the same old tcl. However,
> I've never liked pass-by-name; maybe it would be a better tcl?

Maybe, maybe not. Can't tell without details.

Donal (don't know how relevant all of what I said is! :^)
[* Me included! ]

-- If I teach this course thoroughly enough, none of you will attempt the exam
questions on this topic, and I shall consider this to be a complete success.
-- Arthur Norman

John Wiersba

unread,
Mar 12, 2002, 11:48:48 AM3/12/02
to
Thanks for your long reply, Donal. You're right that what I wrote is
currently just all speculation and dreaming. I was hoping someone had
been down that road before and maybe even gotten somewhere.


"Donal K. Fellows" <fell...@cs.man.ac.uk> wrote
>

> How would that work? If I decide to build a command for later evaluation like
> this:
>
> set cmd "foobar [getaref1] [getaref2]"
>
> where would the references go?

I was thinking of:

set cmd [list foobar [getaref1] [getaref2]]

It seems to me that the two major culprits in this regard are the use
of strings as lists and the use of pass-by-name.

In the example above, if the "list" command were used, then the value
assigned to cmd would not be just a string as in your code, it would
be a list. Each item in a list would retain its properties. If you
use a string as a list, the list functions would still work, but the
elements of the list would only be (sub)strings -- they couldn't be
anything else. But, if you use the list functions on a true list,
then the elements of the list could be references or strings. On the
other hand, if you used a list as a string, any non-string elements
would lose their special properties and become strings, just like it
works today.

I admit that the "format"-style substitutions would have to be handled
differently if you wanted the command to contain elements other than
strings.

-- John Wiersba

Kevin Kenny

unread,
Mar 12, 2002, 12:20:23 PM3/12/02
to
> "Donal K. Fellows" <fell...@cs.man.ac.uk> wrote
> >
> > How would that work? If I decide to build a command for later evaluation like
> > this:
> >
> > set cmd "foobar [getaref1] [getaref2]"
> >
> > where would the references go?

John Wiersba wrote:
> I was thinking of:
>
> set cmd [list foobar [getaref1] [getaref2]]
>
> It seems to me that the two major culprits in this regard are the use
> of strings as lists and the use of pass-by-name.

Your idea works well in Scheme, where all code is representable as lists.
In Tcl, all too often, one wants to compose a script that comprises more
than one command. There is no way to compose such a script without
flattening it to a string and reparsing it. This limitation is
fundamental to the nature of the language, in which the only
truly primitive operation is string substitution.

Experienced Tcl'ers have an entire bag of tricks for finessing this
situation -- which is also the chief cause of "Quoting Hell" -- but
sometimes there really is no alternative to interpolating references
into a string.

Needing to interpolate object references into flattened strings is what
most often causes the references to be lost. One possible solution, as
I discussed earlier in this thread, is an object that behaves like a
string but carries annotations that show the provenance of its
substrings. Such an object can hold on to object references while
still appearing like a string to the script.

--
73 de ke9tv/2, Kevin KENNY GE Corporate R&D, Niskayuna, New York, USA

Stephen Trier

unread,
Mar 12, 2002, 7:22:20 PM3/12/02
to
Despite my better judgement, I'll float the following idea:

There exist illegal sequences in UTF-8 which could be used to code
pointers to objects that can be garbage-collected. Garbage collection
would require examining the strings in the system to find references
in this translated state as well as looking for references in their
natural, pointer form.

One way to think of this is that anything returning a reference will
return a handle, much like is done now, but the handle is a string
representation of the reference itself, recognizable to the garbage
collector and sufficient for reconstruction of the reference from
the string alone.

The I/O functions would have to guard against input and output of
these illegal UTF-8 sequences. String functions that work on UTF-8
should pass the coded references unchanged. I don't know how to make
Tcl_UniChars work with this scheme.

It's a klugy idea, but it preserves the notion that everything is a
string.

Donal K. Fellows

unread,
Mar 13, 2002, 6:47:19 AM3/13/02
to
John Wiersba wrote:
> "Donal K. Fellows" <fell...@cs.man.ac.uk> wrote
>> How would that work? If I decide to build a command for later evaluation like
>> this:
>> set cmd "foobar [getaref1] [getaref2]"
>> where would the references go?
>
> I was thinking of:
> set cmd [list foobar [getaref1] [getaref2]]

That works already. Just don't look at the string form of $cmd anywhere. Which
is why I was saying things as they stand are fragile (it doesn't matter if you
can recreate an exactly equivalent object from the string form, which is the
assumption behind Tcl_Objs currently; in that case, the special handling of
[list]-constructed forms is just a performance optimisation.)

> It seems to me that the two major culprits in this regard are the use
> of strings as lists and the use of pass-by-name.

The problem is that commands != lists. (What's the problem with pass-by-name?
I don't currently see it; in a reference-oriented world, you just make the name
be the reference.)

Donal.

-- There are worse futures that burning in hell. Imagine aeons filled with
rewriting of your apps as WinN**X API will mutate through eternity...
-- Alexander Nosenko <n...@titul.ru>

John Wiersba

unread,
Mar 13, 2002, 12:37:18 PM3/13/02
to
Kevin Kenny <ken...@acm.org> wrote
> John Wiersba wrote:
> > I was thinking of:
> >
> > set cmd [list foobar [getaref1] [getaref2]]
>
> Your idea works well in Scheme, where all code is representable as lists.
> In Tcl, all too often, one wants to compose a script that comprises more
> than one command. There is no way to compose such a script without
> flattening it to a string and reparsing it.

Hmm. I'll have to think about this more. What if there were a
command "do" whose arguments were lists, each of which was a command?
Wouldn't that work?

> This limitation is
> fundamental to the nature of the language, in which the only
> truly primitive operation is string substitution.

I don't see why it's fundamental. Tcl defines string->list and
list->string mappings, but why does that impose the restriction you
mention?



> Experienced Tcl'ers have an entire bag of tricks for finessing this
> situation -- which is also the chief cause of "Quoting Hell" -- but
> sometimes there really is no alternative to interpolating references
> into a string.

It seems to me that (this variant of) tcl would be a cleaner language
if we could avoid "quoting hell".

-- John Wiersba

John Wiersba

unread,
Mar 13, 2002, 12:57:30 PM3/13/02
to
"Donal K. Fellows" <fell...@cs.man.ac.uk> wrote
> John Wiersba wrote:
> > I was thinking of:
> > set cmd [list foobar [getaref1] [getaref2]]
>
> That works already. Just don't look at the string form of $cmd anywhere. Which
> is why I was saying things as they stand are fragile (it doesn't matter if you
> can recreate an exactly equivalent object from the string form, which is the
> assumption behind Tcl_Objs currently; in that case, the special handling of
> [list]-constructed forms is just a performance optimisation.)

I agree that if you converted a list back to its string form and threw
away the list form, you would lose the object references. But no
one's forcing you to do either (and if they are, maybe there's a way
of cleaning that up).

> > It seems to me that the two major culprits in this regard are the use
> > of strings as lists and the use of pass-by-name.
>
> The problem is that commands != lists.

You lost me on this one: "commands != lists". I thought that was more
or less the definition of a command. Kevin says: "scripts != lists".
Is that what you mean?

> (What's the problem with pass-by-name?
> I don't currently see it; in a reference-oriented world, you just make the name
> be the reference.)

I don't understand this one either. Pass-by-name is nowhere near the
same semantics as pass-by-reference. The problems with pass-by-name
and not having references in general are:

1) You have to name so many things that you would never think of
naming in a language with references.
2) You can't create many complex data structures except by the kludge
of using globals (in some namespace) so that you can give everything a
name.

Having references available is *so* much cleaner.

I'm still not convinced that tcl (even without complex enhancements
like magic strings with substructure) is incompatible with references.
Not having references is IMHO the one severe disadvantage which makes
tcl "incomplete" as a programming language (just as perl 4 was a "toy"
language in some sense).

-- John Wiersba

Roy Terry

unread,
Mar 13, 2002, 2:37:50 PM3/13/02
to
John Wiersba wrote:
>
> I'm still not convinced that tcl (even without complex enhancements
> like magic strings with substructure) is incompatible with references.
> Not having references is IMHO the one severe disadvantage which makes
> tcl "incomplete" as a programming language (just as perl 4 was a "toy"
> language in some sense).
If your version of "complete" makes code harder to
write/understand then I'll happily stay with the toy language
as it already allows me to accomplish way more than
others and actually enjoy it.

Michael A. Cleverly

unread,
Mar 13, 2002, 3:21:47 PM3/13/02
to
On Wed, 13 Mar 2002, Roy Terry wrote:

> If your version of "complete" makes code harder to
> write/understand then I'll happily stay with the toy language
> as it already allows me to accomplish way more than
> others and actually enjoy it.

QOTW?

David N. Welton

unread,
Mar 13, 2002, 3:33:20 PM3/13/02
to
Roy Terry <royt...@earthlink.net> writes:

> If your version of "complete" makes code harder to write/understand
> then I'll happily stay with the toy language as it already allows me
> to accomplish way more than others and actually enjoy it.

If you think that data structures via namespaces and global variables
and commands are "complete" or "easier to write/understand"
then... oh, forget it, I don't want to ruin the nice atmosphere here.

Keep an open mind, though,

Roy Terry

unread,
Mar 13, 2002, 5:09:45 PM3/13/02
to

"David N. Welton" wrote:
>
> Roy Terry <royt...@earthlink.net> writes:
>
> > If your version of "complete" makes code harder to write/understand
> > then I'll happily stay with the toy language as it already allows me
> > to accomplish way more than others and actually enjoy it.
>
> If you think that data structures via namespaces and global variables
> and commands are "complete" or "easier to write/understand"
> then... oh, forget it, I don't want to ruin the nice atmosphere here.

Didn't say that.
And I agree that namespaces add complexity for
most people. I will differ re globals - everybody's favorite
whipping boy:

Globals are *easy* to understand.

People *can* make messes with globals same as they can
make messes with classes and references.

Globals can be used for many problems w/o making a mess.

When you use globals for persistent data you should organize it
carefully.

When you use an invisible linked hierachy of pointers/objects/references
to store persistent data you should organize it carefully.

Is that not fair?

As far as "completeness" I take it as pure rhetorical
air, as given any language I can imagine an arbirary number
of features I believe it should have to be complete for me.

Cheers,
Roy

(Still enjoying it)

John Wiersba

unread,
Mar 13, 2002, 6:34:41 PM3/13/02
to
Roy Terry <royt...@earthlink.net> wrote in message news:<3C8FAA8B...@earthlink.net>...

I agree. The idea would be to make code *easier* to understand. Why
do you think having references would make code harder to understand?

joe_english

unread,
Mar 13, 2002, 7:44:56 PM3/13/02
to
John Wiersba wrote:
>Kevin Kenny wrote

>> In Tcl, all too often, one wants to compose a script that comprises more
>> than one command. There is no way to compose such a script without
>> flattening it to a string and reparsing it.
>
>Hmm. I'll have to think about this more. What if there were a
>command "do" whose arguments were lists, each of which was a command?
>Wouldn't that work?

That's actually a pretty neat idea.

It would probably be more flexible to interpret the arguments
as scripts instead of as commands, since a list in canonical form
is the same when interpreted either way.

proc sequence {args} {
foreach arg $args { uplevel $arg }
}

This would be a safer way to extend scripts than just
appending "\n$extraStuff", and as an added bonus it may
also benefit from the "eval pure list" optimization and
reduced shimmering.


--Joe English

David N. Welton

unread,
Mar 14, 2002, 3:58:03 AM3/14/02
to
Roy Terry <royt...@earthlink.net> writes:

> As far as "completeness" I take it as pure rhetorical air, as given
> any language I can imagine an arbirary number of features I believe
> it should have to be complete for me.

There are a lot of practical things that one can do with references
that cannot easily be done in Tcl.

http://mini.net/tcl/2933.html

is a good example.

Donal K. Fellows

unread,
Mar 14, 2002, 4:08:33 AM3/14/02
to
Joe, English wrote:
> proc sequence {args} {
> foreach arg $args { uplevel $arg }
> }
>
> This would be a safer way to extend scripts than just
> appending "\n$extraStuff", and as an added bonus it may
> also benefit from the "eval pure list" optimization and
> reduced shimmering.

Downside: it's a pain to write (at the moment.)

sequence [list foo1 $bar1 $bar2] [list foo2 $bar3 $bar4] \
[list foo3 $bar5 $bar6] [list foo4 $bar7 $bar8]

And it interacts *very* badly with debuggers; look at the code in any way, and
you lose the benefit of the pure-list stuff. Even if you force pure-list-ness,
you still have a problem in that sub-objects might lose their special
properties, potentially leading to early GC.

Donal (this is why KK and I have an interest in tinkering with this.)

-- This may scare your cat into premature baldness, but Sun are not the only
sellers of Unix. -- Anthony Ord <n...@rollingthunder.clara.co.uk>

Donal K. Fellows

unread,
Mar 14, 2002, 4:35:02 AM3/14/02
to
John Wiersba wrote:
> "Donal K. Fellows" <fell...@cs.man.ac.uk> wrote
>> John Wiersba wrote:
>>> I was thinking of:
>>> set cmd [list foobar [getaref1] [getaref2]]
>> That works already. Just don't look at the string form of $cmd anywhere.
>> Which is why I was saying things as they stand are fragile (it doesn't
>> matter if you can recreate an exactly equivalent object from the string
>> form, which is the assumption behind Tcl_Objs currently; in that case, the
>> special handling of [list]-constructed forms is just a performance
>> optimisation.)
>
> I agree that if you converted a list back to its string form and threw
> away the list form, you would lose the object references. But no
> one's forcing you to do either (and if they are, maybe there's a way
> of cleaning that up).

Right now, that's not possible. When an object with both a list rep and a
string rep is passed to [eval] (well, Tcl_EvalObjEx() IIRC) then you have to use
the string form (see below), and then it also makes sense to cache the results
in case you [eval] the object again.

set script {
# This is a comment
command1 arg1
command2 arg2
}
puts [llength $script]; # Prints '9'
eval $script; # Had better not evaluate '#' w 8 args!

Without knowing if the provenance of the list form, you have to use the string
form, and that destroys the list form. The only non-fragile way to fix this
that I can see (that isn't completely inimicable to Tcl's current semantics) is
to find a way to store object references in the string form. Which can't be
done compatibly.

Or can it? No code outside the core allocates Tcl_Objs for itself ('cos that'd
be a memory leak) so an extra field at the end would just be ignored by old
code. Hmmmm...

>>> It seems to me that the two major culprits in this regard are the
>>> use of strings as lists and the use of pass-by-name.
>> The problem is that commands != lists.
>
> You lost me on this one: "commands != lists". I thought that was more
> or less the definition of a command. Kevin says: "scripts != lists".
> Is that what you mean?

Only in part. Due to a coffee-shortage, I don't remember the rest of what I was
thinking of here. (Something to do with subtyping...)

>> (What's the problem with pass-by-name?
>> I don't currently see it; in a reference-oriented world, you just make
>> the name be the reference.)
>
> I don't understand this one either. Pass-by-name is nowhere near the
> same semantics as pass-by-reference.

OK, perhaps you can tell me this: what is the difference between a name and a
human-readable reference? You certainly don't need to force everyone to pick
the names for the references (Tcl doesn't do this with file handles, for
example.)

> The problems with pass-by-name and not having references in general are:
>
> 1) You have to name so many things that you would never think of
> naming in a language with references.
> 2) You can't create many complex data structures except by the kludge
> of using globals (in some namespace) so that you can give everything a
> name.

So you don't have so much in the way of complex data structures? Having seen
the knots many people wind themselves into with data structures, I'm not
convinced that this is a bad thing...

> Having references available is *so* much cleaner.

If only I bought the difference between references and names, I'd be happier
(the Tcl style of pass-by-name is really pass-by-reference with unusual kinds of
references; real pass-by-name is something else entirely.)

Donal K. Fellows

unread,
Mar 14, 2002, 10:59:17 AM3/14/02
to
Stephen Trier wrote:
> Despite my better judgement, I'll float the following idea:

:^)

> There exist illegal sequences in UTF-8 which could be used to code
> pointers to objects that can be garbage-collected. Garbage collection
> would require examining the strings in the system to find references
> in this translated state as well as looking for references in their
> natural, pointer form.

Ouch. Can I run and hide now?

> It's a klugy idea, but it preserves the notion that everything is a
> string.

Ah, but does it preserve our collective sanity? (I think it will break things.
I think it will break things that lots of code assumes will never break. I've
been looking at this message for two days, and I'm still not sure quite what I
think of it. And GC of a long string not containing references will be an O(n)
operation. Ouch.)

Donal.

John Wiersba

unread,
Mar 14, 2002, 8:12:49 PM3/14/02
to
"Donal K. Fellows" <fell...@cs.man.ac.uk> wrote
> Downside: it's a pain to write (at the moment.)
>
> sequence [list foo1 $bar1 $bar2] [list foo2 $bar3 $bar4] \
> [list foo3 $bar5 $bar6] [list foo4 $bar7 $bar8]

It does look a little more painful, but from that point of view, so is

[expr {$x+$y}]

And the upside is that you're saying exactly what you mean:

set cmd1 [list foo1 $bar1 $bar2]
set cmd2 ...
sequence $cmd1 $cmd2 ...

Maybe there should be a sequencelist command which takes a list of
commands:

sequencelist [list $cmd1 $cmd2]

which would all the backslash-at-the-end-of-line nonsense get rid of
extra evals

sequencelist $cmdlist
eval sequence $cmdlist # same thing

> And it interacts *very* badly with debuggers; look at the code in any way, and
> you lose the benefit of the pure-list stuff. Even if you force pure-list-ness,
> you still have a problem in that sub-objects might lose their special
> properties, potentially leading to early GC.
>
> Donal (this is why KK and I have an interest in tinkering with this.)

I don't understand why debuggers are a problem here.

Also, why do you think sub-objects might lose their special
properties? Since they're list elements and each list element is a
separate thing, its properties can be kept because it doesn't have to
get mangled into the middle of a string.

-- John Wiersba

Donal K. Fellows

unread,
Mar 15, 2002, 4:26:48 AM3/15/02
to
John Wiersba wrote:
> I don't understand why debuggers are a problem here.

Because they print things, i.e. generate string reps. For semantic safety (i.e.
compatability with existing code, which is kind of important in a non-major
release) this forces a trip through the parser/compiler and that caches its
results (usually a Good Thing.) And *that* deallocates the tree of references,
which is where things go wrong with the current Tcl_Obj semantics, since if that
command-list held the only reference to something, that reference is *gone* and
the object is GCed. The only way out of this is to have a way of embedding
references in the (UTF-8) string rep itself (and we'd like to do it efficiently
too, of course.)

There is also a separate issue with allowing objects to veto type loss, which is
needed for a few places in the core ("references as commands" springs to mind
here, which is useful for things like lambda expressions) but that's not the
major item.

Donal.

-- Maybe we should do cultural exchange and rename those languages _Smltk and
ThirdLetterOfAlphabet. -- Mark van Gulik <gho...@home.net>

John Wiersba

unread,
Mar 15, 2002, 3:53:39 PM3/15/02
to
"Donal K. Fellows" <fell...@cs.man.ac.uk> wrote
> OK, perhaps you can tell me this: what is the difference between a name and a
> human-readable reference? You certainly don't need to force everyone to pick
> the names for the references (Tcl doesn't do this with file handles, for
> example.)

I think it's more a matter of how the reference is used. Do you just
dereference it to use it or do you have to use it as a key to look
something up in another data structure? Can it be garbage-collected
for you or do you have to memory-manage it yourself? If you equate
tcl names with references, then one limitation is that you can't
garbage-collect tcl's names whereas other languages *can* collect
their references. This can make a big difference in the cleanliness
of code as people coming to java from C have learned.

> So you don't have so much in the way of complex data structures? Having seen
> the knots many people wind themselves into with data structures, I'm not
> convinced that this is a bad thing...
>
> > Having references available is *so* much cleaner.
>
> If only I bought the difference between references and names, I'd be happier
> (the Tcl style of pass-by-name is really pass-by-reference with unusual kinds of
> references; real pass-by-name is something else entirely.)

OK, so here's an example of where references are useful: say you want
to write code to manage an array of arrays. Do you create a single
array with a double index? If you do, then how do you pass one of the
inner arrays to a function that needs to modify that inner array? I
suppose, you could pass the array by value-result (i.e. as a flattened
list, both coming in and out of the function), but that's not very
efficient. This is the kind of data structure that references can
handle easily but tcl names cannot.

-- John Wiersba

Donal K. Fellows

unread,
Mar 18, 2002, 6:09:07 AM3/18/02
to
John Wiersba wrote:
> "Donal K. Fellows" <fell...@cs.man.ac.uk> wrote
>> OK, perhaps you can tell me this: what is the difference between a name
>> and a human-readable reference? You certainly don't need to force
>> everyone to pick the names for the references (Tcl doesn't do this with
>> file handles, for example.)
>
> I think it's more a matter of how the reference is used. Do you just
> dereference it to use it or do you have to use it as a key to look
> something up in another data structure? Can it be garbage-collected
> for you or do you have to memory-manage it yourself? If you equate
> tcl names with references, then one limitation is that you can't
> garbage-collect tcl's names whereas other languages *can* collect
> their references. This can make a big difference in the cleanliness
> of code as people coming to java from C have learned.

I don't think this answers my question. Suppose you have a variable reference,
how would you use it? Well, one way (that is entirely consistent with the way
Tcl currently operates) is to pass it to [upvar] so as to get a local named
version of the reference (and automatically adds another reference to the
reference[*] from the local stack frame) which can then be used. How is that
radically different from the current situation?

What are names but humanly-readable references? Well, with an additional need
for explicit lifespan management (but that's only true when references refer to
things that are globally acccessible; local variables are already automatically
GCed when their containing stack frame is disposed.) If you go to a system
where you are putting live references in things, all this really permits you to
do is to dispense with storing those things in the global name->reference maps.
Sure, that's quite a big gain but it needs to be done carefully.

> OK, so here's an example of where references are useful: say you want
> to write code to manage an array of arrays. Do you create a single
> array with a double index?

When you say "array" do you mean in the C or Tcl sense?

> If you do, then how do you pass one of the
> inner arrays to a function that needs to modify that inner array?

Assuming you mean Tcl arrays (C ones get to use the new [lset] command) then the
efficient way is to pass in the name of the whole array (we'd be using a single
array because there is no particular performance gain for keeping the array
small, and a maintainability loss for splitting it up) and an index fragment.
It is even possible to use a variation upon this to perform slicing along
arbitrary dimensions, which is difficult when you've stuffed the array full of
references...

> I suppose, you could pass the array by value-result (i.e. as a flattened
> list, both coming in and out of the function), but that's not very
> efficient. This is the kind of data structure that references can
> handle easily but tcl names cannot.

Initially, I must admit I was thinking you were talking about C-style arrays,
where taking arbitrary segments is actually quite difficult (and which is a
limitation of the way C does arrays.) Still, I see your point. Even though you
wouldn't do it that way in Tcl as it currently stands (see my comments above
about a monolithic array.)

-- What's the reason of long discussions, if at the end someone says, "we have
not yet thought about it..." -- Andreas Leitgeb <a...@pc7499.gud.siemens.at>

John Wiersba

unread,
Mar 18, 2002, 11:25:28 AM3/18/02
to
"Donal K. Fellows" <fell...@cs.man.ac.uk> wrote
> I don't think this answers my question. Suppose you have a variable reference,
> how would you use it? Well, one way (that is entirely consistent with the way
> Tcl currently operates) is to pass it to [upvar] so as to get a local named
> version of the reference (and automatically adds another reference to the
> reference[*] from the local stack frame) which can then be used. How is that
> radically different from the current situation?

In my book, there's a big difference between tcl names and references.
A snippet of code can't use a name without knowing its context, i.e.
how to interpret it. A reference can just be dereferenced. With
upvar, you have to know where the name is coming from (i.e. up one
level in the call stack). But with references, they can be used by
anyone at any level of the call stack. See below for more.



> What are names but humanly-readable references? Well, with an additional need
> for explicit lifespan management (but that's only true when references refer to
> things that are globally acccessible; local variables are already automatically
> GCed when their containing stack frame is disposed.) If you go to a system
> where you are putting live references in things, all this really permits you to
> do is to dispense with storing those things in the global name->reference maps.
> Sure, that's quite a big gain but it needs to be done carefully.

No GC is a big loss to take. That's what started this whole thread.



> > OK, so here's an example of where references are useful: say you want
> > to write code to manage an array of arrays. Do you create a single
> > array with a double index?
>
> When you say "array" do you mean in the C or Tcl sense?

I meant tcl arrays.



> > If you do, then how do you pass one of the
> > inner arrays to a function that needs to modify that inner array?
>
> Assuming you mean Tcl arrays (C ones get to use the new [lset] command) then the
> efficient way is to pass in the name of the whole array (we'd be using a single
> array because there is no particular performance gain for keeping the array
> small, and a maintainability loss for splitting it up) and an index fragment.
> It is even possible to use a variation upon this to perform slicing along
> arbitrary dimensions, which is difficult when you've stuffed the array full of
> references...

Then you have to establish a global convention in your code that
you're passing not just a name of an array, but the name of an array
plus an index fragment. Every bit of code in your program that might
operate on an anonymous array must use that convention. If you could
just get a reference to that "virtual" inner array, you could pass it
to any piece of code which could treat it as a plain old array
reference, not a slice of a larger array. This name+fragment
convention looks to me like hacking around a problem because there's
no direct way to solve it (efficiently). The only direct solution in
tcl is to pass by value (which is very inefficient) and that will
frequently have the wrong semantics (no data structure sharing).

> > I suppose, you could pass the array by value-result (i.e. as a flattened
> > list, both coming in and out of the function), but that's not very
> > efficient. This is the kind of data structure that references can
> > handle easily but tcl names cannot.
>
> Initially, I must admit I was thinking you were talking about C-style arrays,
> where taking arbitrary segments is actually quite difficult (and which is a
> limitation of the way C does arrays.) Still, I see your point. Even though you
> wouldn't do it that way in Tcl as it currently stands (see my comments above
> about a monolithic array.)

Maybe someday when I have enough time (when I retire?), I'll play
around with a tcl-like language which has built-in references...

-- John Wiersba

Donal K. Fellows

unread,
Mar 19, 2002, 5:36:36 AM3/19/02
to
John Wiersba wrote:
> Maybe someday when I have enough time (when I retire?), I'll play
> around with a tcl-like language which has built-in references...

Don't wait around too long or we might get there before you...

Donal.
-- Fools have been claiming the death of Unix since it was developed in the
late 60's. Unfortunately (for them) the death of Unix keeps getting
postponed, due to circumstances beyond their control.
-- Terry Porter <tjpo...@gronk.porter.net>

Chin Huang

unread,
Mar 12, 2002, 10:56:23 PM3/12/02
to
In article <3C83946F...@cs.man.ac.uk>,
Donal K. Fellows <fell...@cs.man.ac.uk> wrote:
>Yes. There'd be all sorts of new cool things you could do with GC'ed entities
>(lambda closures, unnamed objects, stack frames, etc.) Right now, the lack of
>GC makes working with such things painful, to say the least...

I realized the mechanism tcom uses to implement handles might be the
basis for a more general reference counted object implementation. I
threw together a quick and dirty proof of concept and put it at the
http://www.vex.net/~cthuang/ref/ Web site. This is an extension which
allows you to create a handle to an [incr Tcl] object. The handle
represents a reference counted object and when the last reference is
released the [incr Tcl] object is destroyed.

In the following example, the command

::ref::new $account [list delete object $account]

returns a handle to a reference counted object which delegates its
operations to the [incr Tcl] object specified by $account. The second
argument to the ::ref::new command specifies the script to execute when
the last reference to the object is released. In this case, it deletes
the $account [incr Tcl] object.

I added the ::ref::type command so I can show the object survives when
the handle's internal represtation shimmers to a string. The
::ref::type command returns the name of the internal representation type
of its argument.

Here's the example:


package require ref

package require Itcl
namespace import itcl::*

class Account {
public variable balance 0

destructor {
puts "Account::destructor"
}

public method deposit {amount} {
set balance [expr $balance + $amount]
}

public method withdraw {amount} {
set balance [expr $balance - $amount]
}
}

proc createAccount {} {
set account [Account #auto]

# Create a reference counted object that delegates operations to an
# Itcl object. When the last reference is released, destroy the
# Itcl object.
return [::ref::new $account [list delete object $account]]
}

set accountObj [createAccount]

# The internal representation type is "cmdName".
puts "Internal representation type is [::ref::type $accountObj]"
puts [$accountObj cget -balance]
$accountObj deposit 30
puts [$accountObj cget -balance]

# This command changes the internal representation type to "string".
puts [string length $accountObj]
puts "Internal representation type is [::ref::type $accountObj]"

# This command restores the internal representation type to "cmdName".
$accountObj withdraw 20
puts "Internal representation type is [::ref::type $accountObj]"
puts [$accountObj cget -balance]

# Release the last reference.
unset accountObj

Kristoffer Lawson

unread,
Mar 21, 2002, 4:38:31 AM3/21/02
to
John Wiersba <jrw3...@yahoo.com> wrote:
> In my book, there's a big difference between tcl names and references.
> A snippet of code can't use a name without knowing its context, i.e.
> how to interpret it. A reference can just be dereferenced. With
> upvar, you have to know where the name is coming from (i.e. up one
> level in the call stack). But with references, they can be used by
> anyone at any level of the call stack. See below for more.

You talk a lot about the need for references in the context of arrays.
I understand that for actual objects there is an understandable need for
some form of referencing (and except for the GC-problem, global names
would seem like good enough), but arrays I'd like to use simply as values.
Ie. pass an array and its fields in, the function alters it as needed and
passes it back, if the caller needs an edited version. For most things
I really don't want to use references.

Now I wonder if there are OO-paradigms that do this..?

--
/ http://www.fishpool.com/~setok/

John Wiersba

unread,
Mar 24, 2002, 8:37:09 PM3/24/02
to
Kristoffer Lawson <se...@gfanrend.fishpool.fi> wrote in message news:<a7c9mn$8tg$1...@tron.sci.fi>...

By arrays, I mean associative arrays, e.g. hashes, e.g. tcl arrays.
It's certainly nice to be able to pass both lists and arrays (hashes)
by value and by reference in different contexts. I think there are
many cases where the code is modifying an array variable where pass by
reference would be appropriate.

-- John Wiersba

Kristoffer Lawson

unread,
Mar 25, 2002, 8:18:08 AM3/25/02
to
John Wiersba <jrw3...@yahoo.com> wrote:
> By arrays, I mean associative arrays, e.g. hashes, e.g. tcl arrays.
> It's certainly nice to be able to pass both lists and arrays (hashes)
> by value and by reference in different contexts. I think there are
> many cases where the code is modifying an array variable where pass by
> reference would be appropriate.

Yes, I mean associative arrays as well. While I can't do it now in Tcl
without a lot of array gets and sets, for a lot of things I'd probably prefer
to send values back and forth. Ie. send a value representing the whole array,
and get a similar result back, with appropriate changes made.
When the code gets bigger it's often easier to understand that than having
many references lying around the place.

--
/ http://www.fishpool.com/~setok/

Donal K. Fellows

unread,
Mar 25, 2002, 9:50:07 AM3/25/02
to
Chin Huang wrote:
> I realized the mechanism tcom uses to implement handles might be the
> basis for a more general reference counted object implementation. I
> threw together a quick and dirty proof of concept and put it at the
> http://www.vex.net/~cthuang/ref/ Web site. This is an extension which
> allows you to create a handle to an [incr Tcl] object. The handle
> represents a reference counted object and when the last reference is
> released the [incr Tcl] object is destroyed.

Which is all very well, except Tcl (and many extensions, particularly including
Tk) is currently far too eager to throw internal reps away. I can't, for
example, keep a reference alive when it is just stored in a binding script or
-command option. *That* is what needs the work, and it looks like quite a big
job. Once lifespan management is right, references such as you describe should
work pretty much as you'd expect (and until the problem is fixed, reference
schemes such as you describe will continue to be fragile; I should know, I've
written my own in the past.)

"I wouldn't just call you wrong. I'd go further and call you an argumentative
net-kook idiot who can't do his own research before opening his mouth and
yammering bullshit." -- Theo de Raadt <der...@zeus.theos.com>

0 new messages