Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Dict sharing vs. duplication

65 views
Skip to first unread message

Alexandre Ferrieux

unread,
Dec 20, 2007, 5:34:53 PM12/20/07
to
Hi,

Now that 8.5 is out thanks to intensive work from our beloved
Conspicuous Tcl Illuminati, I'm getting prepared to use dicts
really...

Looking at the dict manpage, I first discovered that no obvious naming
scheme tells in-place operations from functional ones. Of course
that's not worse than lappend/lreplace, and reusing the same names
makes sense !

Then, I wondered about what copy-on-write optimization became in dict
world.
Indeed, if $x is a dict value,

set y $x

copies it, in an efficient way (by reference, incrementing just the
refcount).
Now, the dict value is "shared" (refcount>=2). So as soon as someone
touches it:

dict incr x foo

it gets duplicated, so that $x and $y can have separate destinies.
That's copy-on-write.

Now suppose I have a huge dict, stored in some global, and that I pass
its value back and forth to various routines. Its refcount will go up
with each level of proc call where it is passed as an argument, since
the argument is a local holding a bona fide reference. Suddenly, under
a stack depth of (say) three, I decide to update the dict, eg with a
[dict incr] on the global variable. Won't the huge dict be immediately
duplicated ?

If the above story holds true, then passing a dict value is not a
direct replacement for passing an array name for upvarring: it has
stringent restrictions, and careful refcounting is needed as soon as r/
w operations occur...
Disclaimer: this is no criticism. This semantics is traditional[*] for
all first-class values in Tcl; however if again it's the case, it
should just be advertised so that people don't get caught by
surprise...

Or am I missing something ?

-Alex

[*] But is it necessary ? What would break in Tcl if we allowed
mutable shared values, eg lists or even strings ?
Granted, one would need to backpropagate strep invalidation to all
referers. This would have a cost. But are there other drawbacks ?

Michael Schlenker

unread,
Dec 20, 2007, 6:42:12 PM12/20/07
to
Alexandre Ferrieux schrieb:
The Tcl_Obj representing your 'foo' entry at least needs to be
duplicated and incremented. And as this changes the whole dict, you also
get a duplicate of the hash table structure.

DictIncrCmd in generic/tclDictObj.c tells you the details about this,
there is some optimization in place so that huge string reps aren't
copied. The code in DupDictInternalRep then tells you that the other
values aren't copied, but only refcount increased.

So yes, there is a copy, but its kept close to the minimum possible. You
need a new hashtable struct...

Michael

tom.rmadilo

unread,
Dec 20, 2007, 9:29:28 PM12/20/07
to
On Dec 20, 3:42 pm, Michael Schlenker <schl...@uni-oldenburg.de>
wrote:

This is interesting but I would like to ask: is this just a
performance convenience that we should not think about and rely upon?
It seems that if you want to use a dict as a single reference, you
should just pass around the name of the dict, or use upvar, etc. Dict
offers the potential to make a copy, which is cool, it seems great
that the implementation tries to minimize copying, I just wonder if
Tcl'ers should think about this. In Tcl the best way to not duplicate
objects is to refer to them directly or through upvar.

George Peter Staplin

unread,
Dec 20, 2007, 10:16:38 PM12/20/07
to
Alexandre Ferrieux wrote:
> Hi,
>
> Now that 8.5 is out thanks to intensive work from our beloved
> Conspicuous Tcl Illuminati, I'm getting prepared to use dicts
> really...

I think dicts have a place in Tcl for some data patterns. They
serialize automatically to a string rep, and that's quite useful for
some things, but I can't help but desire to have [structure] be in the
core too, and I think others may see a reason for them too.

structure is the basis for NexTk.


proc Foo-check {inst arg} {
#When this returns 1 then the structure key will be set.
#When this returns 0 or an error the structure key will not be set.
expr {$arg eq "foo" || $arg eq "bar"}
}

proc Foo-dance {inst args} {
puts "Foo is dancing with $args"
}

proc Foo {inst args} {
structure $inst

$inst -text "" -metadata ""

structure-make-method $inst dance [list Foo-dance $inst]
#This prevents typos from creating new data structure keys.
structure-lock-creation $inst

if {[llength $args]} {
if {[catch {$inst {*}$args} err]} {
structure-free $inst ;#or rename $inst {}
return -code error $err
}
}

structure-key-callback $inst -metadata [list Foo-check $inst]

return $inst
}

% Foo .f -text "Hello"
.f
% .f -text
Hello
% .f -metadata
% .f -metadata foo
1
% .f -metadata
foo
% .f -metadata bozovalue
0
% .f -metadata
foo
% .f dance Happy Elves
Foo is dancing with Happy Elves


There are other neat things, like traces for keys, and per key/structure
locking. Structure traces are more like Tcl traces, in that they run
after the variable/key has been set. The structure callbacks make the
common pattern of verifying an object's valid input patterns much
easier.

I basically use [structure] as a prototype object system (prototype in
OO terms, rather than as in a prototype that is for testing.) Each
ntk_window is a structure instance from which a window/widget tree is
formed based on a _children [list] member.


George

Alexandre Ferrieux

unread,
Dec 21, 2007, 2:36:26 AM12/21/07
to
On Dec 21, 12:42 am, Michael Schlenker <schl...@uni-oldenburg.de>
wrote:
> Alexandre Ferrieux schrieb:
>

> > Won't the huge dict be immediately
> > duplicated ?
>
> So yes, there is a copy, but its kept close to the minimum possible. You
> need a new hashtable struct...

Yeah, the "minimum possible" here is still way too much.
The point of my post was to check whether a smart trick existed to
avoid duplication entirely, just like there is never any automatic
duplication of an array. It seems that no such trick exists, right ?

-Alex

Alexandre Ferrieux

unread,
Dec 21, 2007, 2:39:09 AM12/21/07
to
On Dec 21, 3:29 am, "tom.rmadilo" <tom.rmad...@gmail.com> wrote:
>
> In Tcl the best way to not duplicate
> objects is to refer to them directly or through upvar.

What we've just shown is that [array/upvar] is the *only* way to be
sure to avoid duplications. As soon as your dict value gets bound to a
formal parameter or other variable, you're at risk.

-Alex


tom.rmadilo

unread,
Dec 21, 2007, 3:24:26 AM12/21/07
to
On Dec 20, 11:39 pm, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:

I admit I was somewhat confused by dict, but they still have names
just like anything else in Tcl, and if you pass the name only, it
shouldn't 'bind' anything. I'm actually more confused by the
terminology of bound values and formal parameters applied to Tcl.

The way I seem to look at this is that a dict is somewhat like an
array, but you can easily get its value and pass it around. I'm not a
fan (yet) by any means, but the ability to easily get the value could
be important in certain situations. For one, I'm thinking of the error
code stuff. It occurs to me that these are array-like, but if an array
were created with an inner proc, it would go out of scope and
evaporate, whereas a dict could be passed back to the caller and added
to as things unwind. I'm just guessing here, but it seems like the
ability to pass array-link stuff back and add to it is something
unique.

It is pretty surprising that you could get partial non-copy of data
with a dict (or partial reference, whatever). Maybe it saves memory or
whatever, but it seems to me it is much safer to decide where you are
going to keep the single copy of your data, and just refer to it by
name (or via upvar). Thinking about saving space by guessing what
internals are doing, or not, and when, seems like a diversion. Arrays
work very well, their behavior is well understood. Dicts are pretty
new and I think it will take some time to understand when to use them.
I'm a pretty verbose writer, but dicts seem way too verbose for my
taste, and using 'for' when you get 'foreach' is somewhat strange.

As a data structure I don't find much of a use for this one. I use an
array to enforce unique keys and I use lists to enforce order. I
combine the two in a number of ways: array of lists, list of array
names, etc. But dict is similar to a structure, but is also somewhat
squishy for the way I think about data. The one thing that I like is
that they are easy to serialize, so you can easily write them out or
pass them around.

sch...@uni-oldenburg.de

unread,
Dec 21, 2007, 4:39:38 AM12/21/07
to

K combinator when passing the value, so you do not retain a copy...

Michael

miguel

unread,
Dec 21, 2007, 5:28:57 AM12/21/07
to

Sadly that will not work, as TWO refCounts are added when passing an arg
to a proc:
- one for the ref in the objc/objv array (Tcl stack within TEBC)
- one for the ref of the proc's variable

That means that having more refs in the caller's context (which you
remove with the K trick) is irrelevant to the COW behaviour.

Avoiding this double-ref is not easy, and will not happen soon (except
if someone comes up with a great idea). It would involve either of
(a) a non-trivial change in api semantics (callee manages objc/objv,
as opposed to caller)
(b) some elaborate (aka cliff-hanging, aka hacky) special handling
for argument variables at runtime (as opposed to all other vars). This
would have a (perceptible?) cost in runtime, and a (definite!) cost in
added complexity.

I have been thinking about changing the api only for procs (a), but not
as a hi-prio project.

Larry W. Virden

unread,
Dec 21, 2007, 7:52:36 AM12/21/07
to
On Dec 21, 2:39 am, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:

>
> What we've just shown is that [array/upvar] is the *only* way to be
> sure to avoid duplications. As soon as your dict value gets bound to a
> formal parameter or other variable, you're at risk.


Weird - I thought the whole reason for dict was so that we had a data
structure that was treated as a first class object - instead, we have
Just Another Second Class Structure (JASCS)?

sch...@uni-oldenburg.de

unread,
Dec 21, 2007, 8:40:13 AM12/21/07
to

Its a first class object. But the Tcl value semantics and the no
duplication do not play together well. A dict is no better than a list
in that respect.

What some people would like is reference semantics.

Michael

miguel

unread,
Dec 21, 2007, 9:04:36 AM12/21/07
to

upvar?

Andreas Leitgeb

unread,
Dec 21, 2007, 9:34:11 AM12/21/07
to
Alexandre Ferrieux <alexandre...@gmail.com> wrote:
> Now suppose I have a huge dict, stored in some global, and that I pass
> its value back and forth to various routines. Its refcount will go up
> with each level of proc call where it is passed as an argument, since
> the argument is a local holding a bona fide reference. Suddenly, under
> a stack depth of (say) three, I decide to update the dict, eg with a
> [dict incr] on the global variable. Won't the huge dict be immediately
> duplicated ?

All the sharing and refcounting is not visible at script-level, where
it is entirely equivalent to a full deep copy.

so:
set globvar [dict create foo 4 bar 2]
proc foo {d} {
dict incr ::globvar bar; # global variable globvar now: foo 4 bar 3
dict incr d foo; # local variable d now: foo 5 bar 2
}
foo $globvar; # global variable globvar now: foo 4 bar 3

at the point where the value stored in globvar is replaced
by a new modified dict value, it is then different from the copy
previously made to the parameter-variable of procedure foo.

If your intention was, to have changes in globvar being reflected
in local "d", then you need upvar to make "d" a link to globvar.

Alexandre Ferrieux

unread,
Dec 21, 2007, 10:14:17 AM12/21/07
to
On Dec 21, 3:34 pm, Andreas Leitgeb <a...@gamma.logic.tuwien.ac.at>
wrote:

>
> If your intention was, to have changes in globvar being reflected
> in local "d", then you need upvar to make "d" a link to globvar.

Yes, but it's a bit more subtle than that: if, inside a proc, I
simultaneously have a local pointing to the same (shared) dict value,
even if I only write to ::globvar, the duplication will occur.

So the only safe (in the sense of avoiding duplications) way of
writing to a global dict, is to make sure that no extra references
(even for read-only purposes) exist at the time. This is hard to do in
general, since script-level reasoning normally avoids refcounts (and
rightly so) !

Hence, the seemingly unavoidable conclusion is: if you have one large
hashtable in your process, and sometimes write to it, make it an
array, not a dict.

-Alex


Donal K. Fellows

unread,
Dec 21, 2007, 10:20:22 AM12/21/07
to
Larry W. Virden wrote:
> Weird - I thought the whole reason for dict was so that we had a data
> structure that was treated as a first class object - instead, we have
> Just Another Second Class Structure (JASCS)?

It's as much a first-class value type as a list or a number. If you
choose to regard that as second-class, that's your option.

Donal.

Donal K. Fellows

unread,
Dec 21, 2007, 10:24:13 AM12/21/07
to
Alexandre Ferrieux wrote:
> Hence, the seemingly unavoidable conclusion is: if you have one large
> hashtable in your process, and sometimes write to it, make it an
> array, not a dict.

Wrong conclusion. Dictionaries are values. Really. If you're wanting
to have some shared state represented in the form of a dictionary, put
it in a variable that is visible in multiple locations (e.g. through
the use of [global]) and use it like that. This is the *same* as if
you were working with a list.

Donal.

Alexandre Ferrieux

unread,
Dec 21, 2007, 10:27:42 AM12/21/07
to
On Dec 21, 11:28 am, miguel <mso...@users.sf.net> wrote:
>
> Avoiding this double-ref is not easy, and will not happen soon (except
> if someone comes up with a great idea).
> I have been thinking about changing the api only for procs (a), but not
> as a hi-prio project.

Please don't. It is sufficient to make the consequences for dicts of
"being first-class" clear enough to all beginners (I mean dict-
beginners. I am one :-).

If you agree with me, we can be at peace with a statement like:

Dicts are not a complete replacement for arrays. They are first-
class, so are easier to transport from place to place as values, but
this freedom comes with a price: they are basically immutable (like
lists), in the sense that the only way to update them in-place is to
make sure they are not shared. Being "not shared" means basically they
are stored in only one var, and this var is accessed through upvar.

On the contrary, Arrays are collections of variables, which gives
them fundamental mutability, without ever being duplicated behind the
scenes. The cost is the slightly less elegant idiom of passing by name
(upvar).

Bottom line for newbies: if your dict is large and r/w, better
make it an array, or count references carefully.

Do we agree on the diagnosis ?

-Alex

Donal K. Fellows

unread,
Dec 21, 2007, 10:32:57 AM12/21/07
to
Alexandre Ferrieux wrote:
> Bottom line for newbies: if your dict is large and r/w, better
> make it an array, or count references carefully.
>
> Do we agree on the diagnosis ?

No. ;-)

Donal.

Alexandre Ferrieux

unread,
Dec 21, 2007, 10:33:43 AM12/21/07
to
On Dec 21, 4:24 pm, "Donal K. Fellows" <donal.k.fell...@man.ac.uk>
wrote:

Yes, but it means you always pass it by name ([global] being a special
case of [upvar]), losing the beauty of first-class-ness...

Moreover, if you write a generic procedure like an iterator, taking an
arbitrary script as argument, then the dict must be passed by name to
the iterator, just in case the arbitrary script needs write access.

-Alex

miguel

unread,
Dec 21, 2007, 11:05:13 AM12/21/07
to
Alexandre Ferrieux wrote:
> On Dec 21, 11:28 am, miguel <mso...@users.sf.net> wrote:
>> Avoiding this double-ref is not easy, and will not happen soon (except
>> if someone comes up with a great idea).
>> I have been thinking about changing the api only for procs (a), but not
>> as a hi-prio project.
>
> Please don't. It is sufficient to make the consequences for dicts of
> "being first-class" clear enough to all beginners (I mean dict-
> beginners. I am one :-).

I've been thinking about this in order to reduce the proc-call overhead,
not specifically wrt dicts or lists.

> If you agree with me, we can be at peace with a statement like:
>
> Dicts are not a complete replacement for arrays. They are first-
> class, so are easier to transport from place to place as values, but
> this freedom comes with a price: they are basically immutable (like
> lists), in the sense that the only way to update them in-place is to
> make sure they are not shared. Being "not shared" means basically they
> are stored in only one var, and this var is accessed through upvar.
>
> On the contrary, Arrays are collections of variables, which gives
> them fundamental mutability, without ever being duplicated behind the
> scenes. The cost is the slightly less elegant idiom of passing by name
> (upvar).
>
> Bottom line for newbies: if your dict is large and r/w, better
> make it an array, or count references carefully.
>
> Do we agree on the diagnosis ?

Dicts are not a complete replacement for arrays? They are a different
animal, not a replacement for each other - although they may be
substitutes for some use cases. The description of dicts as "first-class
arrays" is plain wrong.

Dicts are just a special interpretation for Tcl values, ie, strings.
They share the internal properties of all values - managed by refcounts,
COW semantics, first-class properties. There are specific commands and
internal optimizations for them, as there are for other specially
formatted values: lists, integers, command and variable names, scripts,
etc, etc. Note that values have no name - if you want to name a value,
you have to assign it to some variable (and then not modify the variable).

Arrays otoh are variables that represent a collection of variables. An
array has a name (many names? qualified or not, ...), can carry traces,
can be upvar'ed to. So do the individual elements. It cannot be
represented as a string ([array get] just gives a dict with the same
structure, but *not* the array).

IMHO, it is wrong to stress these things in a "dict vs array" manner; it
really is just a "value vs variable" comparison. Do not confuse the
container (a variable) with its contents (a value). Once the general
case is understood, things become simple. If you insist on seeing this
as a multitude of special cases you just obscure the matter.

Bottom line for newbies: if your values are large and r/w, better store
them in a variable and access it through upvar. No matter if list or
dict or whatever. Note that there is one choice you may have forgotten:
you may as you say upvar to an array; you may also create a variable
whose value is a dict and upvar to the scalar variable:

proc process v {
upvar 1 $v local
dict set local ....
}
set dictVar [dict create ...]
process dictVar

I do agree that when the use case allows for either an array or a dict
it might be valuable to have a specialized list of choice criteria
available. In the wiki?

tom.rmadilo

unread,
Dec 21, 2007, 12:50:31 PM12/21/07
to
On Dec 21, 7:27 am, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:

> Bottom line for newbies: if your dict is large and r/w, better


> make it an array, or count references carefully.
>
> Do we agree on the diagnosis ?

I was going to hold off on the language discussion, hoping it would go
away, but it isn't.

The newbie should completely forget about reference counts, period.
This is not a script level concept in Tcl and will only confuse anyone
trying to think in Tcl. Go find some other language to talk about
references, but don't confuse newbie Tcl'ers with reference counts,
first-class-objects, lazy copying, etc. This is real bs in Tcl. If the
developers who wrote the code tried to optimize performance by not
doing an immediate copy, who cares, this has nothing to do with what
the script level code says to do, which is to make a copy:

[set a $b] makes a copy of the value of b and store it in another
place called a, when or how this is done should never be depended upon
by anyone at the script level.

A dict has a value, a real value which can be passed around. A dict
also has a name, a real name which can be passed around. What is the
difference between this and anything else? An array has a value which
can be accessed with [array get myarray].

But to understand the difference between how dict and array are used
is more instructive than looking at their implementation. They are
used for different types of data. Arrays are limited to one-to-one key-
value relationships. This makes the values independent from each
other, they are orthogonal, at right-angles, whatever. Being able to
address the array entries as individuals makes a lot of sense. But a
dict is not like this. A dict is more like a grouped list of lists.
The grouping may be added to by many different somewhat independent
actions, but the dict itself is a single entity of dependent content.

However, array and dict are are also commands, not just structures.
Talking about a dict or array as if they are mere datatypes completely
misses the point, and an important concept in Tcl: we let the
developer decide what to do. dict and array are ways of accessing and
manipulating data, and in pure Tcl you can invent hundreds of other
ways of manipulating data, or storing it.

In addition, any developer who can't figure out how to use globals or
namespace variables to store and maintain large amounts of reusable
data should start at that point and really forget completely about
reference counts. It is usually a bug to copy data if you just need to
manipulate it, regardless of how small or large the data is in memory.
The reason is that your code then makes it clear that you want the
actual global value of a variable at a particular point in your code,
not what it was when your procedure started to run, but the current
value. If you want the value at the time your procedure starts to run,
get a copy.

In addition, [upvar] is not inelegant! This statement can only result
from a complete misunderstanding of the Tcl language. If we need to
help newbies, the best advice is for them to assume that the Tcl
language works the way it works for very good reasons, and there is
nothing inelegant with something that works very well. (How can
someone like reference counting and misunderstand upvar?) Newbies need
to learn the language of Tcl, and suggesting that they think about
things they cannot use in terms which don't apply to the Tcl language
is a bad way to help them out.

Another statement above is misleading to the newbie: that things
should be handled differently with procs/commands"

"I have been thinking about changing the api only for procs..."

Whaa? Lets go back to syntax rule #1:

"A Tcl script is a string containing one or more commands."

The desire to special-case stuff is always a huge danger in
programming; don't fall into the trap of trying to invent a cool
special shiny object for Tcl, it'll stick out like a wart on your
nose.

Reference counting is not a solution to poor programming.

miguel

unread,
Dec 21, 2007, 1:39:12 PM12/21/07
to
tom.rmadilo wrote:
> Another statement above is misleading to the newbie: that things
> should be handled differently with procs/commands"
>
> "I have been thinking about changing the api only for procs..."
>
> Whaa? Lets go back to syntax rule #1:
>
> "A Tcl script is a string containing one or more commands."
>
> The desire to special-case stuff is always a huge danger in
> programming; don't fall into the trap of trying to invent a cool
> special shiny object for Tcl, it'll stick out like a wart on your
> nose.

Sorry, I was obviously not clear. No changes planned at the script
level, this refers at worst to the C-level api, at best to only internal
apis of the bytecode subsystem (compiler and engine).

I was replying to Alexandre, definitely not a newbie and clearly worried
about refcounts. He is programming at the script level, but worrying
about performance and how things are implemented below: not a newbie
concern, and a valid one.

The point I was trying to make (again, talking about the C level and
definitely not to newbies):

(a) currently the command api results in our adding two refcounts to
each value that is passed as an argument to a proc: one by the caller,
one by the proc itself.

(b) this invalidates the proposed "solution" of clearing the var in the
caller using the K combinator, at least when the callee is a proc

(c) two extra refcounts is one-too-many: proper management could be done
with just one extra refcount (assuming some coordination between caller
and callee, ie, an api change). Today the caller cleans up the arguments
on return; we'd need the callee to take over that job.

(d) I have been thinking of ways to avoid the one-too-many in order to
simplify some of the code that is run at each proc call, when setting up
and then cleaning up the proper environment for the proc body. Whatever
is done here, it will NOT introduce a difference between procs and
commands at the script level - possibly not even at the C extension
level. It may yet turn out to be a confined to just the bytecode subsystem.

No, I am not conspiring to destroy the language. One of the most basic
features of Tcl is that all commands are equal, that stays. If it got to
choosing between "that stays" or "miguel stays", even I would vote for
my prompt removal.

Miguel

tom.rmadilo

unread,
Dec 21, 2007, 1:47:24 PM12/21/07
to
On Dec 21, 9:50 am, "tom.rmadilo" <tom.rmad...@gmail.com> wrote:
> The desire to special-case stuff is always a huge danger in
> programming; don't fall into the trap of trying to invent a cool
> special shiny object for Tcl, it'll stick out like a wart on your
> nose.

The delayed copy of a dictionary into a value seems to have an
immediate usefulness in the [dict] commands themselves. From the
manpage there is this example:

dict set capital en [dict get $capital C]

If $capital isn't instantly copied, then it probably doesn't double in
size with the above operation. That would be good. But why the syntax?

For instance, why not:

dict set capital en [dict get capital C] ????

My guess is that in [dict get capital ...] 'capital' isn't guaranteed
to be a dict, it could be a list:

% dict get {a b c d} a
b

The [dict get] command says: "treat arg1 as a dictionary value". If
this wasn't the case, then you would always need to initialize a
dictionary before you could use any [dict] commands on it. Talk about
copying too much, you would likely end up with the original list, plus
a dictionary object, even within a single context.

tom.rmadilo

unread,
Dec 21, 2007, 2:48:49 PM12/21/07
to

Very well explained, and I agree completely. I do appreciate the level
of non-newbie-ness of everyone contributing to this thread, no
question about that. I also fully support the C level developers of
the core commands doing whatever they think works as long as the
[tclsh] users don't have to care about it. One of the key benefits of
these two distinct levels is that the C developers are free to
optimize all they want without informing the upper levels.

I think I mis-understood what Alexandre was getting at, although I'm
again guessing. And that is that if a newbie looked at some small
chunk of code to figure out how to use a dict, or pass around
references to it, it might not be obvious without a little more
explanation.

You are forced to use the name of an array which can't be passed
around by value. In fact, this is somewhat helpful from time to time
in discovering bugs due to naming. You can't have both $a and $a(a),
first one created wins.

But [dict] may actually improve the situation because a dict name will
hardly ever look like a dict value, and of course, you don't even have
to have a name for a dict, so you should be able to apply several
layers of filtering without any intermediate variables. And this is
what is going to be confusing.

Donal K. Fellows

unread,
Dec 21, 2007, 5:20:22 PM12/21/07
to
Alexandre Ferrieux wrote:
> Yes, but it means you always pass it by name ([global] being a special
> case of [upvar]), losing the beauty of first-class-ness...

Same as with lists.

> Moreover, if you write a generic procedure like an iterator, taking an
> arbitrary script as argument, then the dict must be passed by name to
> the iterator, just in case the arbitrary script needs write access.

I have no idea what you are talking about there. You seem to be totally
confused and determined to ascribe magic to everything in sight, whereas
there is damn little magic (and all of that is purely a matter of some
performance optimizations). Given that, what do you think this does?

proc iterDict {dict varKey varValue script} {
upvar 1 $varKey key $varValue val
foreach key [dict keys $dict] {
set val [dict get $dict $key]
uplevel 1 $script
}
}

Also note that [dict for] does pretty much the same thing. ;-)

Donal.

tom.rmadilo

unread,
Dec 21, 2007, 6:46:15 PM12/21/07
to
On Dec 21, 2:20 pm, "Donal K. Fellows"

<donal.k.fell...@manchester.ac.uk> wrote:
> there is damn little magic (and all of that is purely a matter of some
> performance optimizations).

Regardless of any magic, I am more interested in surprise behavior.
The only surprise I see here is that (according to some of the
comments in this thread) the $mydict construct doesn't necessarily
create an instant copy of the data. This is a different kind of
surprise, if true. It means that the [dict] commands can operate on
things which are not dicts in the formal sense, but just have the
structure of a dict, yet when an actual dict is passed around by
value, it is a somewhat thin value, which could fatten up if
necessary. I hope this is a correct statement, because it means that
filtering of a fat dict into a much smaller dict will be nearly ideal.

For instance, I've been working on an nvlist, and the first question
was to pass by value or by reference. I started by value and switched
to by reference. However I lost the ability to chain filtering
(without intermediate results) due to the need to reference the data
by name. Dict seems to neatly solve this difficult choice. I really
hope this analysis is right, but either way a dict is somewhere
between a list and an array, pretty much right in line with Tcl
structures.

ZB

unread,
Dec 21, 2007, 10:05:07 PM12/21/07
to
Dnia 21.12.2007 tom.rmadilo <tom.r...@gmail.com> napisał/a:

> As a data structure I don't find much of a use for this one. I use an
> array to enforce unique keys and I use lists to enforce order. I
> combine the two in a number of ways: array of lists, list of array
> names, etc.

A good example would be a typical "personal database" (or just any other):

- array of lists won't be suitable for this one

- list of array names isn't comfortable: if an array has to contain personal
data (say: arr(name), arr(address), arr(phone) and so on), so although you
have to use different array names, like f.e. arr001, arr002 or similar way
- but the real index will be list index, pointing to the array name (and
not array name itself). One can even just generate array names at random,
because they have just to be different, and will not have any other meaning.
And so there's an "intermediate level" when accessing the data. Of course,
one can live with that - but it isn't convenient neither elegant.

Much easier will this be solved using dictionaries, which are kind of records
(or C-structures). For example, such way:

set tmpDict [dict create name John address Washington phone 123455678]
set dBase [list]
lappend dBase $tmpDict
set tmpDict [dict create name Mary address Chicago phone 3453453453]
lappend dBase $tmpDict
...and so on...

Such way we've got a replacement for "ordinary", integer-indexed linear
array, which holds records as data, with record fields accessible just by
field names - very common structure. You can then sort $dBase list whichever
way you want, and the list index will be real pointer to particular record.

Just my thoughts since "my" thread "Still no records". BTW: take a look at
an example http://pinds.com/2000/09/18/datastructures-in-tcl - the guy had
to use "clever naming", and it's quite interesting solution - but pay
attention, how natural solution would be just using dictionaries (not
present at that time).
--
ZB

tom.rmadilo

unread,
Dec 22, 2007, 1:45:34 AM12/22/07
to

On Dec 21, 7:05 pm, ZB <zbREMOVE_THIS@AND_THISispid.com.pl> wrote:
> Dnia 21.12.2007 tom.rmadilo <tom.rmad...@gmail.com> napisa³/a:
> an example http://pinds.com/2000/09/18/datastructures-in-tcl- the guy had

> to use "clever naming", and it's quite interesting solution - but pay
> attention, how natural solution would be just using dictionaries (not
> present at that time).

Of course Mr. Pind doesn't use Tcl anymore (od'd on ruby then
crashed), so who cares? Using clever names for arrays which you will
just put into a list is moronic. Lists are ordered structures and the
only requirement for arrays is that each needs a unique name. If you
use some kind of non-opaque name for the array names that relates to
order, you instantly lose the ability to insert, delete or reorder the
elements (like lreverse). This is the typical solution you find from
developers who can't figure out how to use the data structures which
exist, basically: arrays have unique keys and lists have unique
order.

But records are not that difficult to create in Tcl. I think the
problem is that whenever something is not present in the base Tcl
language it is seen as a missing feature. Since you can write a record
API in a page or two of Tcl code, it isn't so much a missing feature
as a lack of effort by those who want it. The argument against writing
it in Tcl is always that everything needs to be fast, like as fast as
possible. Believe me, if you write it in plain ol' Tcl and people use
it, you can port it to C. But just complaining about the lack of a
data structure which can be coded up over a weekend or two doesn't
help anyone.

If you want a head-start, I'll give you my nvlist code, which is
generalized beyond the dict code in some (tiny) ways, but not quite to
the record stage. And it is interesting to note here that my nvlist
API is designed to be special-cased with thin wrappers to provide
meaningfully named operators. Trying to push too many features into
core API may seem like a good idea at first, but a thin core API has
the advantage of composition, specialization and slight renaming so
that API users will easily understand the meaning of an API, and you
will be able to search for uses of the API.

So here are some examples:

http://junom.com/gitweb/gitweb.perl?p=twsdl.git&a=search&h=HEAD&st=grep&s=nvlist

This shows a search of uses of nvlist in a larger API. Note that in
most cases here there is a relatively small list of data items, but
each has more than the simple name-value relationship implied by the
name of the structure. This is a typical complex situation which isn't
easily handled by a dictionary, which has indexes on keys and values
only. This nvlist code is built upon [lsearch] and can use any range
of list members as a key, for instance, a 5 member list {a b c d e},
could be selected based upon a search of {* b c d *}.

Okay, that is the nvlist example, how about lists of array names:

http://junom.com/gitweb/gitweb.perl?p=twt.git;a=commit;h=c22ea29e5906d75bd9ed044a677030b3d7929e3a

There are two sets of files here. One is the original pg-
browse.tcl/.adp for browsing a database using foreign keys as links
between the tables. If you are interested, the pg-browse code uses the
OpenACS database API which creates a new specialized data structure,
which can't be easily used without the other associated API.

The second set of files is pg-tmp.tcl/.tmpl Here the database API
returns a list of array names. I wrote this API several years ago and
never considered the case of appending (append) additional rows with
another query, or adding columns (extend) to each row using a loop.

The original OpenACS API has lots of switches, and if you look at the
code it is quite complex, many long files, and the result is a
completely inaccessible data structure.

The new API uses lists and arrays. You can create and modify these
using regular Tcl code, no special API. So, when I needed to port this
relatively complex sequence of queries, I was actually surprised that
my original code fully supported the 'extend' and 'append' features
with no additional code.

My main point here is that it is very important to not create problems
for yourself with new specialized data structures. I think that the
new dict structure very much follows this idea. It does so because
there is a text/string/list representation which can be created in any
number of ways, and many of the operations return values which can be
turned into lists or arrays or new dicts without any special code.
This is very important, much more important than adding a super
functional but isolated data structure.

And don't use Lars Pind as an example of good coding in Tcl, not even
close.

Alexandre Ferrieux

unread,
Dec 22, 2007, 6:55:32 AM12/22/07
to
On Dec 21, 11:20 pm, "Donal K. Fellows"

<donal.k.fell...@manchester.ac.uk> wrote:
> Alexandre Ferrieux wrote:
> > Yes, but it means you always pass it by name ([global] being a special
> > case of [upvar]), losing the beauty of first-class-ness...
>
> Same as with lists.

Yes, but it's irrelevant to somebody who is hesitating between an
array and a dict.

> > Moreover, if you write a generic procedure like an iterator, taking an
> > arbitrary script as argument, then the dict must be passed by name to
> > the iterator, just in case the arbitrary script needs write access.
>
> I have no idea what you are talking about there. You seem to be totally
> confused and determined to ascribe magic to everything in sight, whereas

Totally confused ? You've made clear enough how stupid you believe I
am, but read on.

> there is damn little magic (and all of that is purely a matter of some
> performance optimizations). Given that, what do you think this does?
>
>    proc iterDict {dict varKey varValue script} {
>       upvar 1 $varKey key $varValue val
>       foreach key [dict keys $dict] {
>          set val [dict get $dict $key]
>          uplevel 1 $script
>       }
>    }

It does exactly what I described in the first place: when $script is
executed, at least one extra reference to the dict is in the $dict
local in the iterDict frame (and even one more according to Miguel).
Hence, if the script inadvertently writes to the dict (by indirect
side-effect, eg by calling some other procs which access it through
the global (assuming we did iterDict $::d k v {...}) ans write), then
the unwanted, wasted duplication occurs.

> Also note that [dict for] does pretty much the same thing. ;-)

Pretty much or exactly ? (Since it's a C function I guess it doesn't
suffer from the ref-wasting issues of procs)

-Alex

PS: to help you avoid biting me once more, look at Miguel's analysis.
He does understand my concerns.

Alexandre Ferrieux

unread,
Dec 22, 2007, 7:13:12 AM12/22/07
to
On Dec 21, 5:05 pm, miguel <mso...@users.sf.net> wrote:
>
> IMHO, it is wrong to stress these things in a "dict vs array" manner; it
> really is just a "value vs variable" comparison. Do not confuse the
> container (a variable) with its contents (a value). Once the general
> case is understood, things become simple. If you insist on seeing this
> as a multitude of special cases you just obscure the matter.

I'm not confusing anything. I do understand Tcl internals pretty well,
though I like to put myself in the scripter's seat at the same time to
verify how well we hide the internal complexity.

As you yourself said in a nearby message, my only concern is about
performance, in situations which may be hard to analyse purely from
the scripter's POV.

The reason I'm putting this in terms of "dict vs array" is that, for a
scripter, there is a large functional overlap between the two (I did
not write "redundancy", I want to live through 2007), hence a natural
question like "when should I use one or the other ?".

> Bottom line for newbies: if your values are large and r/w, better store
> them in a variable and access it through upvar. No matter if list or
> dict or whatever.

Yes, but in case you need associative access, doing this is hardly
different (for the client code) from what you'd do with an array... I
have nothing against upvar, I'm just trying to enumerate pros and
cons, some of which are of a stylistic nature, and on this scale it's
an ex-aequo :-)

> I do agree that when the use case allows for either an array or a dict
> it might be valuable to have a specialized list of choice criteria
> available. In the wiki?

That would definitely help !

-Alex

Alexandre Ferrieux

unread,
Dec 22, 2007, 7:30:36 AM12/22/07
to
On Dec 22, 12:46 am, "tom.rmadilo" <tom.rmad...@gmail.com> wrote:
>
> The only surprise I see here is that (according to some of the
> comments in this thread) the $mydict construct doesn't necessarily
> create an instant copy of the data. This is a different kind of
> surprise, if true.

1) It's true;
2) It's a property of all Tcl values, not specific to dicts;
3) It's called COW (copy-on-write) semantics, and is a very valid and
powerful idiom. It can be witnessed at impressive work in all modern
unices' fork(): virtual memory pages are shared by both processes, and
only a write access induces a duplication.
4) It's not the focus of the debate here. Nobody argues against it.
5) The only issue is that "being shared" may happen by surprise
(forgotten locals etc)
6) As Donal and Miguel rightly pointed out, the performance hit would
be just the same with a large list/string, which is also mutable
through [lappend,lset]/[append].
7) True, but an array is never duplicated.

Hope this helps,

-Alex

Fredderic

unread,
Dec 22, 2007, 8:40:13 AM12/22/07
to

is slow and limited.

I have a situation I think I'm mentioned before on here, where I have
backend data manipulation functions, middle generic algorithms (some of
which in turn use others within that same layer), and then task-specific
front-end functions that combine the available backends and middle
algorithms in ways that are meaningful to their specific task.

The problem comes when a backend wants to be passed a reference, and
the variable it's interested in happens to be proc-local. Now, you
can't just pass a name, because it's not globally accessible, and by the
time you get to the back-end procs, you don't know how many levels back
it is. Which means that you have to pass TWO pieces of information.
The name of the variable, and the level at which it is known to exist.

True references would remove the need for such special handling
completely. You'd just pass a reference to the variable, which would
be valid as long as the variable itself was, and local to whatever
proc it was passed into.

But here comes one of my three gripes with [dict]s. What if the value
you want to pass by reference, is not only proc-local, but also within a
[dict]? [upvar] can handle arrays just fine. But it lacks the means to
reach into a [dict], let alone a nested [dict].

If TCL data structures had the capacity to export references, again,
this all would be resolved. You could simply use [dict reference] (for
example) to obtain a reference to a value at an arbitrary depth, and
then pass that reference on as needed.

I have implemented a reference type mechanism by creating a temporary
global shadow variable, and using traces to bind them together. The
mechanism can quite readily handle just about anything, and as it uses
lazy updating, it actually has reasonably decent performance. But it
is messy, complicated, and shouldn't be needed. Most importantly, it
won't handle new types of data added by extensions.


On a side-note, my other two gripes with [dict]s, are they don't
support traces, and there's no get-and-report or get-with-default
methods (both of which I use extensively from my home-grown pre-dict
keyed list implementation). A forth minor gripe is that the string
representation of a [dict] (and an [array], which I suppose it's
intended to be compatible with) is particularly useless.


Fredderic

miguel

unread,
Dec 22, 2007, 11:19:15 AM12/22/07
to
Fredderic wrote:
> On Fri, 21 Dec 2007 11:04:36 -0300,
> miguel <mso...@users.sf.net> wrote:
>
>> sch...@uni-oldenburg.de wrote:
>>> Larry W. Virden wrote:
>>>> Weird - I thought the whole reason for dict was so that we had a
>>>> data structure that was treated as a first class object - instead,
>>>> we have Just Another Second Class Structure (JASCS)?
>>> Its a first class object. But the Tcl value semantics and the no
>>> duplication do not play together well. A dict is no better than a
>>> list in that respect.
>>> What some people would like is reference semantics.
>> upvar?
>
> is slow and limited.

It is bytecompiled in 8.5 - not that slow anymore? It should be faster
than accessing a global variable by name.

> I have a situation I think I'm mentioned before on here, where I have
> backend data manipulation functions, middle generic algorithms (some of
> which in turn use others within that same layer), and then task-specific
> front-end functions that combine the available backends and middle
> algorithms in ways that are meaningful to their specific task.
>
> The problem comes when a backend wants to be passed a reference, and
> the variable it's interested in happens to be proc-local. Now, you
> can't just pass a name, because it's not globally accessible, and by the
> time you get to the back-end procs, you don't know how many levels back
> it is. Which means that you have to pass TWO pieces of information.
> The name of the variable, and the level at which it is known to exist.
>
> True references would remove the need for such special handling
> completely. You'd just pass a reference to the variable, which would
> be valid as long as the variable itself was, and local to whatever
> proc it was passed into.

Interesting concept: globally accessible references for proc-local vars.
I wrote a proof-of-concept implementation at http://wiki.tcl.tk/20526
(NOT TESTED). There are more elaborate solutions at
http://wiki.tcl.tk/14636 and http://wiki.tcl.tk/9549

There might be a (more or less) simple way to implement this in C if the
speed of the above approach is not satisfactory. If you can flesh out
the proposal a bit more, you may want to TIP it.

> But here comes one of my three gripes with [dict]s. What if the value
> you want to pass by reference, is not only proc-local, but also within a
> [dict]? [upvar] can handle arrays just fine. But it lacks the means to
> reach into a [dict], let alone a nested [dict].

There is a semantic problem here. Tcl values are inmutable - if you want
to pass a value, just pass the value: fish it out from its container and
pass it on.

In this context, "reaching into a dict" makes no sense: a dict is just a
special interpretation of a string, and you cannot modify strings. You
can create a new one from the old one - but that is not the same dict,
it is a new one. Of course, you can then store it in the same variable
where the previous dict was stored ...

What you seem to want is to modify a variable that currently holds a
dict value. That is what [upvar] is for.

> If TCL data structures had the capacity to export references, again,
> this all would be resolved. You could simply use [dict reference] (for
> example) to obtain a reference to a value at an arbitrary depth, and
> then pass that reference on as needed.

Not in Tcl - at least, not a R/W reference: it would violate COW semantics.

> I have implemented a reference type mechanism by creating a temporary
> global shadow variable, and using traces to bind them together. The
> mechanism can quite readily handle just about anything, and as it uses
> lazy updating, it actually has reasonably decent performance. But it
> is messy, complicated, and shouldn't be needed. Most importantly, it
> won't handle new types of data added by extensions.

Oh - possibly similar to what I did up there ... should have read to the
end first :}

> On a side-note, my other two gripes with [dict]s, are they don't
> support traces, and there's no get-and-report or get-with-default
> methods (both of which I use extensively from my home-grown pre-dict
> keyed list implementation). A forth minor gripe is that the string
> representation of a [dict] (and an [array], which I suppose it's
> intended to be compatible with) is particularly useless.

Right, dicts do not support any of that. Neither do lists, nor integers,
nor strings. Can't. Not in Tcl.

tom.rmadilo

unread,
Dec 22, 2007, 12:02:54 PM12/22/07
to

On Dec 22, 5:40 am, Fredderic <my-name-h...@excite.com> wrote:
> On Fri, 21 Dec 2007 11:04:36 -0300,
>
> miguel <mso...@users.sf.net> wrote:

Have you ever tried namespaces? It is a very powerful concept, and can
allow you to avoid upvar in most situations. I say most because there
is nothing inconvenient with using upvar if you don't have to pass
around a level indicator.

I did a search through a recent large body of code I wrote, heavily
focused on this concept. Upvar is used only a handful of times, and
never with a level reference. In fact, all of the uses are helper
functions, not the main line of programming:

http://junom.com/gitweb/gitweb.perl?p=twsdl.git&a=search&h=HEAD&st=grep&s=upvar

As an example, I use a namespace name as a pure reference to provide
total isolation between three different components: protocol, data/xml
container and invoked procedure. And the reason is exactly as you
state it: it is not good design to think in terms of levels if there
is not much of a real relationship between them, thus using upvar is a
kludge more than a programming style. (I like to think of namespaces
as named stacks.)

Here is a series of examples using namespaces and upvar together to
use an nvlist:

http://junom.com/gitweb/gitweb.perl?p=twsdl.git;a=blob;f=packages/tws/tcl/nvlist-init.tcl

The idea is simpler than a dict and very space efficient. Obviously
the general concept of a name-value list is like a serialized array,
however the examples above show how a few simple API can introduce the
concepts of both enumeration and records. The last example compares
the nvlist with the example given for [dict] in the Tcl manpages. The
space efficiency is worth noting: you create two nvlists, one for
values and one for the names, so names are only included once and can
be used to get the index of the value list.

But note: this two nvlist solution is for space efficiency and is not
a self-describing structure like a dict.

Donal K. Fellows

unread,
Dec 22, 2007, 12:58:08 PM12/22/07
to
Alexandre Ferrieux wrote:
> Yes, but it's irrelevant to somebody who is hesitating between an
> array and a dict.

Use an array if you need a collection of variables. Use a dict if you
need a collection of values (indexed by name). This is exactly the key
difference between the two things. End of story.

> Totally confused ? You've made clear enough how stupid you believe I
> am, but read on.

I think you're either not asking the question in a clear-enough
fashion or you're using some words to mean different things than what
I understand them to mean. A general failure to communicate. In this
situation all I can do is advise you to Use The Source, Luke.

> It does exactly what I described in the first place: when $script is
> executed, at least one extra reference to the dict is in the $dict
> local in the iterDict frame (and even one more according to Miguel).
> Hence, if the script inadvertently writes to the dict (by indirect
> side-effect, eg by calling some other procs which access it through
> the global (assuming we did iterDict $::d k v {...}) ans write), then
> the unwanted, wasted duplication occurs.

So what? Everything works exactly as if everything was copied instead
of being shared. All that happens is that Tcl postpones the copy until
it is really needed, which is often "never". If you have some other
magical way of doing it that doesn't involve *values* that mutate as
you look at them, do speak up.

> > Also note that [dict for] does pretty much the same thing. ;-)
>
> Pretty much or exactly ? (Since it's a C function I guess it doesn't
> suffer from the ref-wasting issues of procs)

Why do you care? (They are syntactically different, since you ask.)

> PS: to help you avoid biting me once more, look at Miguel's analysis.
> He does understand my concerns.

You're busy constructing the Himalayas out of a minor case of acne.

Donal.

tom.rmadilo

unread,
Dec 22, 2007, 1:42:51 PM12/22/07
to
On Dec 22, 8:19 am, miguel <mso...@users.sf.net> wrote:

> Fredderic wrote:
> In this context, "reaching into a dict" makes no sense: a dict is just a
> special interpretation of a string, and you cannot modify strings. You
> can create a new one from the old one - but that is not the same dict,
> it is a new one. Of course, you can then store it in the same variable
> where the previous dict was stored ...
>
> What you seem to want is to modify a variable that currently holds a
> dict value. That is what [upvar] is for.
>
> > If TCL data structures had the capacity to export references, again,
> > this all would be resolved. You could simply use [dict reference] (for
> > example) to obtain a reference to a value at an arbitrary depth, and
> > then pass that reference on as needed.
>
> Not in Tcl - at least, not a R/W reference: it would violate COW semantics.

Here is the weird voodoo in this situation: it sounds like you are
creating data within the context of a procedure instead of in the
context of a namespace (including global ::). If the data is truly
important, it should not be created or stored as a local variable
which needs to be accessed via an upvar where you guess the level
somehow. If you use a namespace, the location of your data will be
unambiguous. There is simply no good reason to link unrelated code via
upvar, or to modify data via unrelated procedures. Write a tiny set of
wrapper procs for handling it, it is cleaner, easier to read and when
you change your storage or modification methods, your client code can
remain blissfully unaware of the situation. The problem you describe
is purely the result of abusing [upvar]. Or provide an actual example
of code which doesn't work.

Thinking beyond the Tcl scripting level is something which might be
fun to do on the weekend, but it isn't going to solve any of your
actual structural issues. This is why I risk my life here by repeating
that experienced programmers don't appreciate the two level in Tcl.
The lower level can and may contain all sorts of voodoo code,
including reference counts and COW or whatever. These should never be
'exported' to the upper scripting layer in actual practice, or even in
your thinking processes. These layers are ideologically distinct, you
can't mix and match concepts and have a rational discussion.

And what happens is that the technically advanced programmers have the
most trouble keeping the layers functionally distinct, so this sounds
initially like an insult. At the Tcl scripting level, placing a dollar
sign in front of a variable name gives the value of the variable. It
is voodoo that this may not result in an actual copy, but if a
programmer can't figure out when they are making a copy, a little
extra used up space is the least of their problems. But, something is
interesting here with the dict data structure and the dict commands. I
have to admit that I am starting to appreciate them a lot more since
yesterday:

namespace eval ::my {
variable myDict ...
}

proc ::my::findValues { pattern } {
variable myDict
return [dict filter $myDict value $pattern]
}

Does the variable myDict get duplicated here? If so, then the
namespace code is just as broken as the [dict] code, because I thought
the whole point of using namespaces and variables was to easily pass
around data by reference. I admit I never considered the duplication
potential, but in the above code, you could have another proc to do
the update, completely isolating the details of your data from the
clients who use it. At the very least, it avoids copying any more
information into the client than they actually need, and if they need
to update, they only pass in the changes.

Alexandre Ferrieux

unread,
Dec 22, 2007, 1:51:30 PM12/22/07
to
On Dec 22, 6:58 pm, "Donal K. Fellows" <donal.k.fell...@man.ac.uk>
wrote:
>

> You're busy constructing the Himalayas out of a minor case of acne.

OK. It was a pleasure talking to an open mind.

-Alex

miguel

unread,
Dec 22, 2007, 2:02:56 PM12/22/07
to
tom.rmadilo wrote:
> If the data is truly
> important, it should not be created or stored as a local variable
> which needs to be accessed via an upvar where you guess the level
> somehow.

Can't say I agree. Proc-local vars have at least two nice properties not
shared by namespace variables:

(a) they are collected automatically when the proc returns: automatic
lifetime management

(b) you do not have to worry about long-range name collisions (with eg
some package that might be loaded)

I do find the idea of "absolute references to local vars" (ie,
accessible via upvar without having to guess the level) at least intriguing.

Donal K. Fellows

unread,
Dec 22, 2007, 3:36:30 PM12/22/07
to

You're still making mountains out of molehills.

While there are patterns of use for both arrays and dictionaries which
overlap, they are fundamentally distinct. Arrays support traces on
elements, which are variables. Dicts are ordered and nestable values.

Donal.

Donal K. Fellows

unread,
Dec 22, 2007, 3:40:23 PM12/22/07
to
miguel wrote:
> I do find the idea of "absolute references to local vars" (ie,
> accessible via upvar without having to guess the level) at least
> intriguing.

As things stand, if you have the absolute level code (which is "#0" for
global, of course) you can do "absolute references" but they're not
things that persist once the call returns. But you can do that now.

Longer term, I think some sort of closure scheme would be cool, and that
would allow for "long-lived local variables". But it's tricky in many
ways. :-)

Donal.

Donal K. Fellows

unread,
Dec 22, 2007, 3:45:56 PM12/22/07
to
tom.rmadilo wrote:
> namespace eval ::my {
> variable myDict ...
> }
> proc ::my::findValues { pattern } {
> variable myDict
> return [dict filter $myDict value $pattern]
> }
>
> Does the variable myDict get duplicated here?

No. (Well, not really. There's a local variable table entry, yes, but it
points to the namespace variable instead of holding a value.) The value
inside it does temporarily gain two references as part of the process of
passing it to [dict filter], but that's normal and not duplication. (And
I can report right now that [dict filter] does not attempt to avoid
building a dictionary for its result. I suppose that could be optimized
in the future, but it's quite a bit of work for a rare case; I'd be more
likely to write a new set of bugs in the process... :-))

Donal.

tom.rmadilo

unread,
Dec 22, 2007, 4:36:00 PM12/22/07
to
On Dec 22, 12:45 pm, "Donal K. Fellows"
<donal.k.fell...@manchester.ac.uk> wrote:

> tom.rmadilo wrote:
> > Does the variable myDict get duplicated here?
>
> No. (Well, not really. There's a local variable table entry, yes, but it
> points to the namespace variable instead of holding a value.) The value
> inside it does temporarily gain two references as part of the process of
> passing it to [dict filter], but that's normal and not duplication.


Okay, so, miguel has pointed out that there is some need to use local
variables as a way to clean up after yourself. Even though this seems
like very short lived data, which should be as compact as possible,
the idea is that a reference type [dict reference] would somehow avoid
something. The something is apparently the inability to figure out
what level to use for upvar. But is this the issue? You already have
to pass the variable name from parent to child or caller to callee,
and that is all you need. You don't need a level:

proc ::my::dictExists { dictName } {

upvar $dictName localDict
if {[info exists localDict]} {
set size [::my::dictSize localDict]
} else {
set size -1
}
}

proc ::my::dictSize { dictName } {

upvar $dictName localDict
return [dict size $localDict]
}

proc ::my::dictPrint { dictName {type value} {filter *} } {

upvar $dictName localDict

if {[dictExists localDict] == -1 } {
puts "Dict $dictName Does Not exist!"
return
} else {
puts "Dict $dictName where $type like $filter :"
}

dict for {key value} [dict filter $localDict $type $filter] {
puts "$key = $value"
}

return
}

The idea is that you pass the reference name and use upvar at each
level.

Donal K. Fellows

unread,
Dec 22, 2007, 4:59:18 PM12/22/07
to
tom.rmadilo wrote:
> Okay, so, miguel has pointed out that there is some need to use local
> variables as a way to clean up after yourself. Even though this seems
> like very short lived data, which should be as compact as possible,
> the idea is that a reference type [dict reference] would somehow avoid
> something. The something is apparently the inability to figure out
> what level to use for upvar. But is this the issue? You already have
> to pass the variable name from parent to child or caller to callee,
> and that is all you need. You don't need a level:

For various reasons, always give a level with [upvar]. It makes things
faster and tackles some ugly side cases too. :-)

But apart from that, I'm not grokking what's going on with a "dict
reference". Maybe it's the Christmas spirit getting in the way, but I
just don't get what sort of magic is supposed to be going on with it.
Without that, I can't tell if it's a missing feature or something that
should be done a different way. Or even something that's already been
done already without fanfare. :-)

> The idea is that you pass the reference name and use upvar at each
> level.

That's what I do anyway; it makes the code much easier to reuse.

Donal.

Fredderic

unread,
Dec 22, 2007, 11:55:17 PM12/22/07
to

Here's the catch. The problem occurred from a collision of old and new
code (there's code going back a decade in there). Both pieces of code
do their own jobs quite well, but they really weren't designed to work
together. A [dict] is being constructed, and will at the end be
assigned to a namespaced variable, which will be referenced hence-forth
through [upvar]. But at this point, I'm only constructing one branch
of a larger tree, and will be returned back to the caller wherein it
will be plugged into the root [dict] (although at some time in the
future it could be conceivable that it itself will end up part of a
sub-branch, and the new code is being written with that possibility
in mind).


> Thinking beyond the Tcl scripting level is something which might be
> fun to do on the weekend, but it isn't going to solve any of your
> actual structural issues. This is why I risk my life here by repeating
> that experienced programmers don't appreciate the two level in Tcl.
> The lower level can and may contain all sorts of voodoo code,
> including reference counts and COW or whatever. These should never be
> 'exported' to the upper scripting layer in actual practice, or even in
> your thinking processes. These layers are ideologically distinct, you
> can't mix and match concepts and have a rational discussion.

I'm not terribly worried about COW side-effects (reminds me of that
darned "cows with guns" song)... It's not a particularly huge structure
I'm working with. Essentially I am just pulling out a branch, modifying
that, and then plugging it back in where it came from. The issue is
that there's no facility within the legacy code to return the modified
branch, so it's necessary to pass some form of reference. At the
moment, as mentioned, I'm essentially extracting the relevant
sub-[dict], sticking it into a global variable, passing that variable
by name, and then transferring the contents back into the dict at the
end.

The part I'm fussing about, isn't even so much easy access to the value
I want, but most importantly, abstraction of WHERE it might reside at
the time. It just happens that WHERE in this case, has ended up being
somewhere other than a simple variable, and that the intervening layers
aren't expecting anything other than simple values; a variable or
array element name counts, but [dict]s still require arcane magic.


> And what happens is that the technically advanced programmers have the
> most trouble keeping the layers functionally distinct, so this sounds
> initially like an insult. At the Tcl scripting level, placing a dollar
> sign in front of a variable name gives the value of the variable. It
> is voodoo that this may not result in an actual copy, but if a
> programmer can't figure out when they are making a copy, a little
> extra used up space is the least of their problems.

I think I've got that down pretty well, now. The playing around I did
with the wiki [future] package to give myself evil unset trace within a
Tcl_Obj, helped clear all that up pretty well. (Yes, I know that's an
evil evil idea, but it solves an equally evil performance issue rather
well.)


Fredderic

Fredderic

unread,
Dec 23, 2007, 12:13:26 AM12/23/07
to
On Sat, 22 Dec 2007 13:36:00 -0800 (PST),
"tom.rmadilo" <tom.r...@gmail.com> wrote:

> The idea is that you pass the reference name and use upvar at each
> level.

But what do you do, if the intervening levels weren't expecting any a
variable reference, and if you don't know how many intervening levels
lie between your present proc, and the proc which will actually be
doing the manipulation..?

Most of this discussion seems to have assumed a new piece of code,
written (and designed) from scratch. I'm mixing new and half-decade-old
code in frightening ways.

In fact, the oldest piece of code in there actually dates back to about
'96. (I think the file date would show that, if the project hadn't
been moved between six different machines, including three different
architectures, during its lifetime) It was a bunch of [proc]s I wrote
when I was still fairly new to TCL, and was still more used to C...
[proc]s of the form:

proc strlwr {text} {string tolower $text}

Just last night I went through and re-wrote them as:

interp alias {} strlwr {} string tolower

Most of those [proc]s haven't been used in anything new this decade, but
there's enough legacy code still using it that I can't ditch the old
tcl file just yet, and it's not evil enough to bother re-writing its
dependants.


Fredderic

Fredderic

unread,
Dec 23, 2007, 12:58:11 AM12/23/07
to
On Sat, 22 Dec 2007 09:02:54 -0800 (PST),
"tom.rmadilo" <tom.r...@gmail.com> wrote:

> Here is a series of examples using namespaces and upvar together to
> use an nvlist:
>
> http://junom.com/gitweb/gitweb.perl?p=twsdl.git;a=blob;f=packages/tws/tcl/nvlist-init.tcl
>
> The idea is simpler than a dict and very space efficient. Obviously
> the general concept of a name-value list is like a serialized array,
> however the examples above show how a few simple API can introduce the
> concepts of both enumeration and records. The last example compares
> the nvlist with the example given for [dict] in the Tcl manpages. The
> space efficiency is worth noting: you create two nvlists, one for
> values and one for the names, so names are only included once and can
> be used to get the index of the value list.

Storing a name-value list as a list of names, and a list of values,
rather than name-value pairs, has some nice properties in TCL. I did
it that way in my keylist implementation.

You can search on either keys or values VERY easily (even before the
advent of -index in [lsearch]), and given an item index (or many), you
can obtain either the name or the value with equal ease. It's not an
"obvious" structure, I suppose, but it's very nice. It even plays
better with foreach, than your name-value pairs as sub-lists, plus I
suspect it has an even better space saving.

In the list of names, list of values concept, there's only three
lists. One with two items (well, three or four, as I mention below),
and two with several each. In yours, there appears to be n/2+1 lists,
where n is the number of items.

The only downside of my approach (and the thing which has prevented me
from replacing it with [dict]) is that I was tempted, and on one day
when I wasn't thinking too clearly, did, make use of the fact that
there's only two items in the top-level [keyl] list, by having a
"keyed-list magic identifier" first, and allowing extra "out-of-band"
data to be stored in a fourth item. The former isn't so bad, but the
latter was a mistake I'm regretting. ;)

The basic pattern was this:

proc keylget {keyl key} {
return [lindex $keyl [lsearch -exact [lindex $keyl 0] $item]]
}

proc keylset {keylName key value} {
upvar 1 $keylName keyl
lassign $keyl keys vals
set idx [lsearch -exact $keys $key]
if { $idx >= 0 } {
set vals [lreplace $keys $idx $idx $value]
} else {
lappend keys $key
lappend vals $val
}
set keyl [list $keys $vals]
}

proc keylkeys {keyl} {
return [lindex $keyl 0]
}

and so forth... Although I've got a little extra code to "upgrade" a
key-value list of the sort [array get] returns (based on the magic
identifier in item 0, which is ugly enough that I didn't think I'd
ever see anything remotely like it any regular list ;) ). In my
implementation, though, [keylset] actually wraps [keylset_l] which
accepts a list of key and value pairs to set (the kind of list you'd
find sitting in args), and [keylget] accepts an optional variable name
into which to place the value (returning true/false instead, which is
insanely handy).

There's also a [keylget_d] which looks basically like this:

proc keylget_d {keyl key default} {
keylget $keyl $key default
return $default
}

It might have different performance with very large sets of data, but
you get that with any algorithm. :)


Fredderic

Alexandre Ferrieux

unread,
Dec 23, 2007, 8:55:50 AM12/23/07
to
On Dec 22, 9:36 pm, "Donal K. Fellows"

<donal.k.fell...@manchester.ac.uk> wrote:
> Alexandre Ferrieux wrote:
> > OK. It was a pleasure talking to an open mind.
>
> You're still making mountains out of molehills.

You'd be kind to leave me the judge as to the metric height of your
contempt.
Moreover, the "failure to communicate" hypothesis is a bit weakened by
Miguel's understanding.
He must be cheating, reading my mind instead.

> While there are patterns of use for both arrays and dictionaries which
> overlap, they are fundamentally distinct. Arrays support traces on
> elements, which are variables. Dicts are ordered and nestable values.

You can endlessly state those obvious truths; that doesn't come within
a ligh-year to addressing what was simply a warning to people who
might "switch from arrays to dicts" without knowing of all the
consequences.

-Alex

Andreas Leitgeb

unread,
Dec 23, 2007, 11:13:14 AM12/23/07
to
Alexandre Ferrieux <alexandre...@gmail.com> wrote:
> Moreover, if you write a generic procedure like an iterator, taking an
> arbitrary script as argument, then the dict must be passed by name to
> the iterator, just in case the arbitrary script needs write access.

You seem to have wrong expectations of dicts. Dicts are values just
like e.g. numbers are:

set a 42
set b $a ;# at this point, b points to the same number-object as a
incr b; # now, while writing 43 back to b, the previous value sharing
# is broken, and b now points to separate value-object for 43.
puts "a=$a b=$b" ; # -> a=42 b=43
I surely wouldn't want to see "a=43 ..." after this piece of code.

The same happens with dicts.

It doesn't matter that only one of the bits flipped, we get a new
number. And in the same way it doesn't matter how much of the dict-
structure remains, when you replace just one element,you get a new
dict.

(take the last paragraph with a grain of salt :-)

Alexandre Ferrieux

unread,
Dec 23, 2007, 11:27:51 AM12/23/07
to
On Dec 23, 5:13 pm, Andreas Leitgeb <a...@gamma.logic.tuwien.ac.at>
wrote:

> Alexandre Ferrieux <alexandre.ferri...@gmail.com> wrote:
> > Moreover, if you write a generic procedure like an iterator, taking an
> > arbitrary script as argument, then the dict must be passed by name to
> > the iterator, just in case the arbitrary script needs write access.
>
> You seem to have wrong expectations of dicts.  Dicts are values just
> like e.g. numbers are:

Yeah, I know, thanks.

-Alex

Donal K. Fellows

unread,
Dec 23, 2007, 12:02:48 PM12/23/07
to
Alexandre Ferrieux wrote:
> You'd be kind to leave me the judge as to the metric height of your
> contempt.

If I have any contempt, it is for this whole "you hate me" style in
this thread. Please stop that; it's unedifying.

> Moreover, the "failure to communicate" hypothesis is a bit weakened by
> Miguel's understanding.
> He must be cheating, reading my mind instead.

I'm not saying anything about communication between you and him. It's
*me* that's misunderstanding you. Since we're different people (yes,
there are pictures to prove that) and so start from different points
of view, the fact that one of us is having a problem is not impossible
at all.

> You can endlessly state those obvious truths; that doesn't come within
> a ligh-year to addressing what was simply a warning to people who
> might "switch from arrays to dicts" without knowing of all the
> consequences.

I don't recommend that anyone changes their code at all without good
reason. What I will note is that for some things that people were
previously using arrays for, dictionaries will be a better fit. For
other things, they won't. No big surprise there. There may be some
need to map out the frontier between these things on a wiki page; I
think that's the right venue anyway, since I know I do not know
exactly which is best in all circumstances. In my mind it's a bit like
studying how to get good performance in Tcl; the core documentation
says nothing much, and there's a good number of wiki pages on the
topic.

FWIW, there doesn't seem to be a page on dict vs. array yet and I
can't be bothered to seed one. It's an opportunity for others
though...

Donal.

tom.rmadilo

unread,
Dec 23, 2007, 1:04:26 PM12/23/07
to
On Dec 22, 9:58 pm, Fredderic <my-name-h...@excite.com> wrote:
> On Sat, 22 Dec 2007 09:02:54 -0800 (PST),
>
> "tom.rmadilo" <tom.rmad...@gmail.com> wrote:
> > Here is a series of examples using namespaces and upvar together to
> > use an nvlist:
>
> >http://junom.com/gitweb/gitweb.perl?p=twsdl.git;a=blob;f=packages/tws...

> You can search on either keys or values VERY easily (even before the
> advent of -index in [lsearch]), and given an item index (or many), you
> can obtain either the name or the value with equal ease. It's not an
> "obvious" structure, I suppose, but it's very nice. It even plays
> better with foreach, than your name-value pairs as sub-lists, plus I
> suspect it has an even better space saving.
>

With my nvlist, if you use the toName or toValue searches, you get a
list of names (at index 0) or values (at index 1), this works for
foreach. But if you do a search on the nvlist, you get another nvlist;
that is, you get a list of matching entries in the nvlist, not values
from within an entry.

namespace eval ::tws::nvlist::weekdays {
variable dayList
set tDays [search dayList {* T*}]
}
==>{{2 TUE} {4 THU}}


> In the list of names, list of values concept, there's only three
> lists. One with two items (well, three or four, as I mention below),
> and two with several each. In yours, there appears to be n/2+1 lists,
> where n is the number of items.

I'm not sure how you are counting lists in my example. Each nvlist is
one list:

{{1 MON} {2 TUE} {3 WED}...}

or:

{ {xsdvar xsd http://...} {xsivar xsi http://...} ...}

So to use two nvlists for a record-like idea, you just have one other
list. You could use the createEnum proc:

::tws::nvlist::createEnum schemaNamesIdx varName prefix namespace ;==
creates ==>
{{0 varName} {1 prefix} {2 namespace}}

The purpose of the nvlist (in my case) is not to bring lightning speed
to data manipulation, but to rid my code of kludge uses of [lsearch]
which avoids data being created as a list and transformed into an
array for searching. It also regularizes the use of lsearch which has
a significant gotcha if you use the switch -all or not.

Another purpose is to use nvlist to create thin, well named search
procs, which I tried to demonstrate through a few examples linked to
above. So a namespace could use the nvlist procs to create something
more understandable than toName or toValue. The weekdays example is
simple and maybe less confusing:

namespace eval ::tws::nvlist::weekdays {

variable dayList

namespace import ::tws::nvlist::*
createEnum dayList "" MON TUE WED THU FRI SAT SUN
}

proc ::tws::nvlist::weekdays::toDay { num } {
variable dayList
return [toValue dayList $num]
}

proc ::tws::nvlist::weekdays::toNum { day } {
variable dayList
return [toName dayList $day]
}

Nvlist also supports many-to-many mappings, but can still enforce
uniqueness if you use the addItem proc.

The purpose of my example was to illustrate that there is more to
organizing an application than just the procs you use. How you
organize, store and pass data is very important. A reference type,
whatever that is, on a dict will never solve this issue.

Alexandre Ferrieux

unread,
Dec 23, 2007, 1:17:14 PM12/23/07
to
On 23 déc, 18:02, "Donal K. Fellows" <donal.k.fell...@man.ac.uk>
wrote:
>

> > Moreover, the "failure to communicate" hypothesis is a bit weakened by
> > Miguel's understanding.
> > He must be cheating, reading my mind instead.
>
> I'm not saying anything about communication between you and him. It's
> *me* that's misunderstanding you.

Then the "use the source, Luke" is a gratuitous, unfair stab.
FWIW, I would *love* to have a better brain-compatibility with you,
Donal.
In the past, it hadn't struck me as so hard to reach your wavelength.
In fact, I suspect that touching dicts hits a nerve. I can understand
this; take it easy, I do find hashtables-as-values a brilliant
concept, and its implementation a tour de force. (If there's one thing
I find hard to swallow, it is the near-miss of http://wiki.tcl.tk/10813
, namely the lack of a seamless bridge between dicts and arrays).

> I don't recommend that anyone changes their code at all without good
> reason.

You don't, but the style of the "Dictionaries as alternative to
arrays" page in the ActiveTcl Help just released, is much less subtle.
I find it dangerous. Hence my warning.

-Alex

tom.rmadilo

unread,
Dec 23, 2007, 1:17:55 PM12/23/07
to
On Dec 22, 8:55 pm, Fredderic <my-name-h...@excite.com> wrote:
> On Sat, 22 Dec 2007 10:42:51 -0800 (PST),

> Here's the catch. The problem occurred from a collision of old and new


> code (there's code going back a decade in there). Both pieces of code
> do their own jobs quite well, but they really weren't designed to work
> together. A [dict] is being constructed, and will at the end be
> assigned to a namespaced variable, which will be referenced hence-forth
> through [upvar].

Why? If you have a namespace var you have an unambiguous reference,
you don't need upvar, at least to find the variable.

> The part I'm fussing about, isn't even so much easy access to the value
> I want, but most importantly, abstraction of WHERE it might reside at
> the time. It just happens that WHERE in this case, has ended up being
> somewhere other than a simple variable, and that the intervening layers
> aren't expecting anything other than simple values; a variable or
> array element name counts, but [dict]s still require arcane magic.
>

You are describing stuff that nobody can see. There is no code to look
at. I try to illustrate the way I look at things with examples which
work so everyone will know and can judge for themselves. You are
trying to identify a complex code issue using words. What is more
confusing is that you are using dicts (released last week) and the
code is a decade old. Why are you trying to use brand new features on
something which must have been working in the past, and then complain
that it isn't what you need, exactly? I can tell you that if we get a
magic reference to dict, old code will still not work, how could it?
That would presuppose that it used some other reference type in the
past, which doesn't exist.

tom.rmadilo

unread,
Dec 23, 2007, 1:54:31 PM12/23/07
to
On Dec 22, 9:13 pm, Fredderic <my-name-h...@excite.com> wrote:
> On Sat, 22 Dec 2007 13:36:00 -0800 (PST),
>
> "tom.rmadilo" <tom.rmad...@gmail.com> wrote:
> > The idea is that you pass the reference name and use upvar at each
> > level.
>
> But what do you do, if the intervening levels weren't expecting any a
> variable reference, and if you don't know how many intervening levels
> lie between your present proc, and the proc which will actually be
> doing the manipulation..?

In general, this is called SOL. This situation isn't unique to dict or
even Tcl, if you don't know what you are doing or what data you are
referencing, a 'reference type' is the last thing you need to work
with. This is what happens with C, right. A pointer can point to
anything. Confused programmers end up making huge mistakes. One of the
goals of Tcl is to eliminate these mistakes. In Tcl you can only refer
to things which have names, you can use values to a command or
procedure. The command or procedure will interpret these values in an
opaque way. The fact that you can guess what this opaqueness is
doesn't count. But there is no magic here, you still have to pass the
same stuff to a command or proc. Commands and procs still have to do
their job and code has to be composed to work together. And when
developers write new code and new data types, they do so because it
didn't exist before, to solve particular problems, but not to be a
plugin replacement for something else which existed before.

>
> Most of this discussion seems to have assumed a new piece of code,
> written (and designed) from scratch. I'm mixing new and half-decade-old
> code in frightening ways.


Most programmers assume that they don't know the future. How, please
explain how, the old code worked in the past, and also if it doesn't
work with a dict, which was released last week, how is it going to
work with the magic, never-before-existing reference type? Either your
code is magic or the new reference type is magic, or both.

tom.rmadilo

unread,
Dec 23, 2007, 2:26:32 PM12/23/07
to
On Dec 23, 10:17 am, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:

I think I'm starting to get it. Use an array if you want to ensure
that you have a single copy of the data, and don't assume that you can
just modify existing code which uses an array and use a dict instead,
most likely you can't.

Arrays have been around for so long that we tend to forget the actual
properties which may most distinguish them from other data at the Tcl
scripting level. The main one is that arrays must be preinitialized.
You have to set some array variable before you can use any array
syntax or array commands. The second is another side of the first: all
arrays have names. There are no anonymous arrays. A major functional
property of arrays is that they do not have values at the Tcl
scripting level. You can't pass them into a proc and a proc cannot
return an array.

The impact is often subtle. If you must pass the data in an array to a
value eating proc, you must export the array contents into a list. In
addition, you cannot do a direct filter to create a subarray. Instead
you must pollute your working context with several helper variables,
run a foreach and create a new array by name. I have lots of examples
of doing this, it is more or less ad-hoc, but it works. It works, but
it is difficult to maintain, and any subtle changes within the context
of use could lead to the need to rethink the ad-hoc code.

The requirement of a name for such a fundamental object is something
unique in Tcl.

Roy Terry

unread,
Dec 23, 2007, 2:56:56 PM12/23/07
to
tom.rmadilo wrote:
> On Dec 23, 10:17 am, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
> wrote:
>> On 23 déc, 18:02, "Donal K. Fellows" <donal.k.fell...@man.ac.uk>
>> wrote:
>
>>> I don't recommend that anyone changes their code at all without good
>>> reason.
>> You don't, but the style of the "Dictionaries as alternative to
>> arrays" page in the ActiveTcl Help just released, is much less subtle.
>> I find it dangerous. Hence my warning.
>
> I think I'm starting to get it. Use an array if you want to ensure
> that you have a single copy of the data, and don't assume that you can
> just modify existing code which uses an array and use a dict instead,
> most likely you can't.
>
> Arrays have been around for so long that we tend to forget the actual
> properties which may most distinguish them from other data at the Tcl
> scripting level. The main one is that arrays must be preinitialized.
> You have to set some array variable before you can use any array
> syntax or array commands. The second is another side of the first: all
> arrays have names. There are no anonymous arrays. A major functional
> property of arrays is that they do not have values at the Tcl
> scripting level. You can't pass them into a proc and a proc cannot
> return an array.
I would warn against this verbiage as being too vague and misleading
Better to just read the man pages than publish this.
- There is no "anonymous" array but there is the array with name ""
- you can certainly filter any array easily (what does "direct" mean?)
array set sub [array get main x*]
- array's must be pre-initialized? Before what specifically?
There is "array names non-existent-array" ==> no error

tom.rmadilo

unread,
Dec 23, 2007, 4:44:58 PM12/23/07
to
On Dec 23, 11:56 am, Roy Terry <r...@terryhome.org> wrote:

> tom.rmadilo wrote:
> > You can't pass them into a proc and a proc cannot
> > return an array.
>
> I would warn against this verbiage as being too vague and misleading
> Better to just read the man pages than publish this.
> - There is no "anonymous" array but there is the array with name ""
> - you can certainly filter any array easily (what does "direct" mean?)
> array set sub [array get main x*]
> - array's must be pre-initialized? Before what specifically?
> There is "array names non-existent-array" ==> no error
>

Well, thanks for making my main point, which is we are so used to
arrays we are relatively blind to the features since we have been
using them or working around them for years.

Let me combine several of your points. The word you don't understand
is 'filter' not 'direct'. If you understand 'filter', there is no
problem. Filter means that you get a subset of the input as output,
directly, as a value. If your filter works like this, an additional
filter can be applied directly without creating a variable. With an
array you cannot:

array set abNames [array get [array get originalArray a*] ab*]

Instead you have to:
array set aNames [array get originalArray a*]
array set abNames [array get aNames ab*]

This demonstrates 'pre-initialize' pretty well, and 'direct' pretty
well, and that arrays do not exist without a name, even "" is a name.
Also, [array get originalArray] returns a list, not an array.

I do agree that comp.lang.tcl is not an official manpage, especially
my writings here. Hopefully nobody will interpret these comments as a
knock against arrays, dicts or lists. Each has specific features. You
can compare and contrast the features, but good or bad is very context
sensitive.

Fredderic

unread,
Dec 24, 2007, 9:34:57 PM12/24/07
to
On Sun, 23 Dec 2007 10:54:31 -0800 (PST),
"tom.rmadilo" <tom.r...@gmail.com> wrote:

> On Dec 22, 9:13 pm, Fredderic <my-name-h...@excite.com> wrote:
>> On Sat, 22 Dec 2007 13:36:00 -0800 (PST),
>> "tom.rmadilo" <tom.rmad...@gmail.com> wrote:
>>> The idea is that you pass the reference name and use upvar at each
>>> level.
>> But what do you do, if the intervening levels weren't expecting any
>> a variable reference, and if you don't know how many intervening
>> levels lie between your present proc, and the proc which will
>> actually be doing the manipulation..?
> In general, this is called SOL. This situation isn't unique to dict or
> even Tcl, if you don't know what you are doing or what data you are
> referencing, a 'reference type' is the last thing you need to work
> with. This is what happens with C, right. A pointer can point to
> anything. Confused programmers end up making huge mistakes.

I think maybe a little too much Egg-Nog was consumed last night...?

Let me lay it down as simply as I can... I wrote the original code
half a decade ago. It could do with some cleaning up, but it still
does what it was intended to do, even in the face of new requirements.

But, this is a modular application, and that is a separate module. I
don't want to have to be concerned with how many levels of proc are
being invoked to achieve the desired function, it should continue to
work as it does now. I don't want to have to concern myself with the
exact implementation of unrelated code, I want to be able to
modify/extend/refactor the implementation of that code at some stage (to
bring it more in line with 8.5 practises) without breaking every piece
of code that relies on that modules API. This means, intentionally not
knowing how it works internally.

In short, I DO know what I'm doing, and the code works just fine. The
problem has been resolved quite satisfactorially, and I have mentioned
in this thread how I got around the old modules short-comings. What
this thread is mostly about, for me at least, is finding a better
solution that blends better with TCL both now and future. I've had to
think around this limitation of TCL before, from time to time, as I
dare say many of us have. This time, though, I'm dealing with existing
code with existing uses elsewhere in the project.


>> Most of this discussion seems to have assumed a new piece of code,
>> written (and designed) from scratch. I'm mixing new and
>> half-decade-old code in frightening ways.
> Most programmers assume that they don't know the future. How, please
> explain how, the old code worked in the past, and also if it doesn't
> work with a dict, which was released last week, how is it going to
> work with the magic, never-before-existing reference type? Either
> your code is magic or the new reference type is magic, or both.

You're making assumptions... I never said it worked with the dict at
all. I quite clearly (I believe, I've written it a few times) stated
that it is middle-code that's causing the problem. You present it with
the data (which it stores in a structure it DOES understand), and a set
of procedures specific to handling the details of your data. Quite
akin, in fact, to the way TCL throws around Tcl_Obj's. A list doesn't
care whether its storing ints, strings, or other lists, it doesn't need
to understand their internal structure. It simply knows it's a list,
and how to perform list-like manipulations.

Well this particular module is responsible for caching, organising, and
indexing a chunk of data in a manner that is used at a few distinct
points in the project. It doesn't care exactly what that data is, as
long as it is given a set of functions (the back-end that I mentioned)
to interact with the specific portions of that information that it needs
to do its job, such as the collation information for indexing, and
optional lifetime information to adjust the caching.


[[ in another message just below the last ]]


>> Here's the catch. The problem occurred from a collision of old and
>> new code (there's code going back a decade in there). Both pieces
>> of code do their own jobs quite well, but they really weren't
>> designed to work together. A [dict] is being constructed, and will
>> at the end be assigned to a namespaced variable, which will be
>> referenced hence-forth through [upvar].
> Why? If you have a namespace var you have an unambiguous reference,
> you don't need upvar, at least to find the variable.

I really don't know how to explain it any simpler. Or for that matter,
why it matters. That's all quite immaterial to the real issue at hand,
and is given only for background curiosity. This manipulation is
called-for WITHIN a branch of a multi-level dict that is being
constructed. It will be called apon again later, but by then the
data's final location will be more static, and the issue much
simpler to deal with. Still, a concrete solution in core would
certainly make things a little simpler there, also.


> You are describing stuff that nobody can see. There is no code to look
> at. I try to illustrate the way I look at things with examples which
> work so everyone will know and can judge for themselves. You are
> trying to identify a complex code issue using words.

Of this, I am aware. Which is why I tried to leave off clouding the
issue with unneccesary details. The key issue, is bring able to reach
from an arbitrary point in the call stack, to an arbitrary point in a
value held in a procedure-local variable. The rest was only mentioned
at all because people like to know there's a need for something, before
they invest time and thought into it.

In C, you have pointers which are valid as long as the data they're
pointing to is valid, and so long as it stays where it was when the
pointer was taken. It doesn't matter where that piece of data is in
the overall structure of things, it can be a simple variable, or within
a struct stored in a one of a list of arrays hanging off a hash. C
takes all the references and algorithms, and boils them down to a
single simple flat value called a pointer. Moreover, code that pointer
is passed to doesn't need to know how that pointer was derived. It
doesn't need to know that there are arrays and lists and hashes and
what-not involved. It just needs to know what that single piece of
data is, and how to interact with it.


> I can tell you that if we get a magic reference to dict, old code
> will still not work, how could it?

Because I'm not talking about a magic reference to a dict, I'm talking
about a magic reference INTO a dict. To the data WITHIN the dict.
Essentially, I'm looking for the dict equivalent to [upvar n foo(bar)
cow], where foo(bar) and n are passed as a single simple variable.
Being able to construct some form of reference that contains within it
the information needed to access the desired storage space, just as
TCL understands how to access the storage space named "bar" within the
encompassing storage space of "foo".

In the meantime, I temporarily copy foo(bar) into ::doob, and pass the
name ::doob as an arbitrary value. But this requires either
fore-knowledge of the full path to the data, and tricky (and likely
fragile) code to reach it. The ability to ask a dict to return a
reference to a given element, will reduce that element to a simple
variable or even just a value.

In fact, I suspect a value is the form it would end up taking, and
perhaps some new command as a substitute to [upvar]. Sort of like how
[apply] turns a list with a given structure into an executing code. One
possible idea that pops into mind, would be to use a new [return] code
to carry, instead of a result or an error, reference information, which
would be built up much the same as the long error traces are, to
provide a description of the entire path to an element of data.
Perhaps even an executable script that can be invoked with the same
semantics as good old [set]. But that's probably a discussion best
held by people better at such designing than I, and I'm sure there are
better ways such ends could be achieved.


... and a Merry Christmas to all whom it matters ... :)

Fredderic

Fredderic

unread,
Dec 24, 2007, 10:58:36 PM12/24/07
to
On Sun, 23 Dec 2007 10:04:26 -0800 (PST),
"tom.rmadilo" <tom.r...@gmail.com> wrote:

> On Dec 22, 9:58 pm, Fredderic <my-name-h...@excite.com> wrote:
>> On Sat, 22 Dec 2007 09:02:54 -0800 (PST),
>> "tom.rmadilo" <tom.rmad...@gmail.com> wrote:
>>> http://junom.com/gitweb/gitweb.perl?p=twsdl.git;a=blob;f=packages/tws...
>> You can search on either keys or values VERY easily (even before the
>> advent of -index in [lsearch]), and given an item index (or many),
>> you can obtain either the name or the value with equal ease. It's
>> not an "obvious" structure, I suppose, but it's very nice. It even
>> plays better with foreach, than your name-value pairs as sub-lists,
>> plus I suspect it has an even better space saving.
> With my nvlist, if you use the toName or toValue searches, you get a
> list of names (at index 0) or values (at index 1), this works for
> foreach. But if you do a search on the nvlist, you get another nvlist;
> that is, you get a list of matching entries in the nvlist, not values
> from within an entry.

Which means extra work to unpack that. I'm not sure you're saving much
here...

> namespace eval ::tws::nvlist::weekdays {
> variable dayList
> set tDays [search dayList {* T*}]
> }
> ==>{{2 TUE} {4 THU}}

So to obtain a simple flat list of the names of days starting in T, how would that look?

A far bigger problem with your search function, though, is that it
depends on there being only one space within each data element.
Imagine, if an element along that same vein contained {3 Cows That Moo},
and I wanted a list of items starting with the letter T. Your example
would match that one there, also, which happens to start with a C.
I've even come across situations, back when I were a little greener,
where TCL decided to quote a value introducing a { where it wasn't
expected. This may well catch out people new to TCL that might be
thinking your nvlist might simplify their life in the same way that it
apparently does yours.


>> In the list of names, list of values concept, there's only three
>> lists. One with two items (well, three or four, as I mention
>> below), and two with several each. In yours, there appears to be
>> n/2+1 lists, where n is the number of items.
> I'm not sure how you are counting lists in my example. Each nvlist is
> one list: {{1 MON} {2 TUE} {3 WED}...}

As I understand it, only until you start pulling items out of its
elements. Look up the TUE, based on the number 2, and it'll likely end
up being split into a sub-list of two elements. After a while, you end
up with a whole bunch of little lists wrapped up in one larger list.

If you did something simpler with plain strings that weren't lists,
that wouldn't happen. But the example you chose yourself is one that's
likely to be used in a list-of-lists context. For most applications
I've ever come across, the string "2 TUE" is neither a suitable input or
output format for a day of the week.


> or:
> { {xsdvar xsd http://...} {xsivar xsi http://...} ...}

That again, looks to me like it'd end up being split. The duality of
TCL values means your main list would end up being represented by
sub-lists. (At least, I'd hope it would... If I'm wrong here, someone
PLEASE let me know!)


> So to use two nvlists for a record-like idea, you just have one other
> list. You could use the createEnum proc:
> ::tws::nvlist::createEnum schemaNamesIdx varName prefix namespace ;==
> creates ==> {{0 varName} {1 prefix} {2 namespace}}

Okay... I soooo don't see how that's any easier to work with. To be
honest, it looks more like obfuscation to me, than any practical
simplification. But I do accept that that's liable to be a very
personal and/or peculiar observation. ;)


> The purpose of the nvlist (in my case) is not to bring lightning speed
> to data manipulation, but to rid my code of kludge uses of [lsearch]
> which avoids data being created as a list and transformed into an
> array for searching. It also regularizes the use of lsearch which has
> a significant gotcha if you use the switch -all or not.

Mine avoided [array]s just fine, AND achieved pretty darned good
performance, too.


> Another purpose is to use nvlist to create thin, well named search
> procs, which I tried to demonstrate through a few examples linked to
> above. So a namespace could use the nvlist procs to create something
> more understandable than toName or toValue. The weekdays example is
> simple and maybe less confusing:
>

> proc ::tws::nvlist::weekdays::toDay { num } {
> variable dayList
> return [toValue dayList $num]
> }

I don't know. Maybe it's just me, but that conveys less information to
be than a simple [lsearch] statement. Sure, it might be a little more
tolerant to data format changes, but if that's a problem, then it's
probably saying something about your initial design that you don't want
to hear, and if it becomes a regular occurrence, then you'll be wanting
to think about a fairly radical change that will render either method
pretty much a mute issue, anyhow.


> Nvlist also supports many-to-many mappings, but can still enforce
> uniqueness if you use the addItem proc.

Okay... So your nvlist is basically wrapping up [list], [lsearch],
[lreplace], and [lrange] in new names with less name cohesion and
functionality, at the expense of performance. Is that about right?

How did we end up talking about your nvlist anyhow... From what you're
saying, it's got less and less to do with [dict]s or anything else in
this thread by the paragraph. *shrugs*

Except, perhaps, to demonstrate where a unified reference entity might
come in handy. Though I'm not sure how that'd fit into your API...


> The purpose of my example was to illustrate that there is more to
> organizing an application than just the procs you use. How you
> organize, store and pass data is very important. A reference type,
> whatever that is, on a dict will never solve this issue.

I'm not sure what issue you're referring to, but it'd solve mine just
fine. A simple example;

proc rand {limit} {expr {int(rand()*$limit)}}
interp alias {} magic0 {} reader
interp alias {} magic1 {} do-magic
interp alias {} magic2 {} do-magic
interp alias {} magic3 {} do-magic
interp alias {} magic4 {} do-magic
proc do-magic {ref key} {return [magic[rand 5] $ref $key]}

proc reader {ref key} {upref $ref dict; return [dict get $dict $key]}
set fu [dict create 0 SUN 1 MON 2 TUE 3 WED 4 THU 5 FRI 6 SAT]
do-magic [ref fu] 2 ==> TUE

To make it a closer match to my original question, "reader" would be
passed as a third argument to do-magic, and ref would involve at least
one nesting level of the main [dict], the item names of which aren't
known at the time "reader" is defined, and to top it all off, "fu" would
be a proc-local variable.


Fredderic

Fredderic

unread,
Dec 25, 2007, 12:24:48 AM12/25/07
to
On Sun, 23 Dec 2007 13:44:58 -0800 (PST),
"tom.rmadilo" <tom.r...@gmail.com> wrote:

> On Dec 23, 11:56 am, Roy Terry <r...@terryhome.org> wrote:
> > tom.rmadilo wrote:
> > I would warn against this verbiage as being too vague and misleading
> > Better to just read the man pages than publish this.
> > - There is no "anonymous" array but there is the array with name ""
> > - you can certainly filter any array easily (what does "direct"
> > mean?) array set sub [array get main x*]
> > - array's must be pre-initialized? Before what specifically?
> > There is "array names non-existent-array" ==> no error
> Well, thanks for making my main point, which is we are so used to
> arrays we are relatively blind to the features since we have been
> using them or working around them for years.

You know, I can't say I agree. For me, someone who has to look up the
man page of [lsort] to remember whether the option is -dictionary or
-directory, I can't say I've experienced your point of being too
familiar with arrays to remember what they do... But nevermind...
We'll all different. :)


> Let me combine several of your points. The word you don't understand
> is 'filter' not 'direct'. If you understand 'filter', there is no
> problem. Filter means that you get a subset of the input as output,
> directly, as a value. If your filter works like this, an additional
> filter can be applied directly without creating a variable. With an
> array you cannot:
>
> array set abNames [array get [array get originalArray a*] ab*]
> Instead you have to:
> array set aNames [array get originalArray a*]
> array set abNames [array get aNames ab*]

No, but you CAN do this:

array set abNames [dict filter [array get originalArray a*] key ab*]

Not sure what sort of performance hit you'd take doing that... But it
rather nicely highlights one of the differences in intent between
arrays and dicts...

Now, if dicts could be given the ability to carry traces against its
items, it could BECOME arrays. The array(element) notation would
become sugar for [dict] in the same way that $var is sugar for [set].
THAT would be sweeeeet... ;)


I think also, though, getting back to the issue, that making this point
too close to the start is also missing the wood for the trees... Your
filtering idea can't support one of the main points of arrays; it's
"variable" nature. [array get] doesn't capture, for example, attached
traces. And for there to be any point to building a page out of this
issue, that is the point that has to be made right up there among the
first few. Arrays contain variables, dicts and lists contain (as well
as themselves being) values. (Perhaps that's a better way of saying
it?)

Something like an [array getall] that returns each element as a dict
with items like "key", "value", "read", "write" and "unset", would be
(is) needed, methinks... (Probably more efficient to return it as a
string instead of a list of real [dict]s, though...?)


This does bring up one gripe I've had with TCL's string rep of an array
for a long time... It's a freaking useless way to represent paired
data, at least until [lsearch] and friends learns how to group elements
in a list. I call for a -group option...!

lsearch -inline -group 2 -item 0 [array get originalArray] ab*

where -group essentially cuts the list up into sub-lists of n items
each (the last sub-list being padded with empty items).


> This demonstrates 'pre-initialize' pretty well, and 'direct' pretty
> well, and that arrays do not exist without a name, even "" is a name.

That last point needs to be said elsewhere. I'd personally advise
against confusing the issue by trying to explain as well why "empty"
isn't the same as "anonymous". Just refer them to an appropriate wiki
page on a definition of what "anonymous" is, and isn't. ;)


> Also, [array get originalArray] returns a list, not an array.

Personally, I'd describe it as a [dict]. Practically, I'd make it a
[dict]. But you already know that. ;)


Fredderic

Fredderic

unread,
Dec 25, 2007, 12:35:06 AM12/25/07
to
On 23 Dec 2007 16:13:14 GMT,
Andreas Leitgeb <a...@gamma.logic.tuwien.ac.at> wrote:

> It doesn't matter that only one of the bits flipped, we get a new
> number. And in the same way it doesn't matter how much of the dict-
> structure remains, when you replace just one element, you get a new
> dict.

One thing I did raise, but haven't seen confirmed or dispelled by
anyone, is what happens when you change an element of a sub-dict.

By my understanding, siblings and more distant relatives of that
sub-dict should be referenced rather than actually being copied.

The only parts that should be "new", are the structures that lie in the
direct parentage of the item that was changed.


Fredderic

Alexandre Ferrieux

unread,
Dec 25, 2007, 8:46:22 AM12/25/07
to
On Dec 25, 6:35 am, Fredderic <my-name-h...@excite.com> wrote:
>
> One thing I did raise, but haven't seen confirmed or dispelled by
> anyone, is what happens when you change an element of a sub-dict.

No duplication is done in the absence of sharing: "deep accessors"
like multi-key [dict set/unset] and multi-index [lset] allow a bit of
mutability.

Mutability is indeed nontrivial is EIAS world, because when you touch
a deeply buried substructure you must invalidate all string reps
"above". Currently, Tcl_Obj's don't maintain a list of backreferences,
so this invalidation cannot be done "upwards". Instead, the two deep
accessors above know the root of the tree (a variable holding the top
list or dict value), and thus can safely invalidate exactly the string
reps that deserve it, namely the "direct parentage" of the mutated
value, as you say. But they don't copy the structures themselves
(assuming no extra references of course).

By slight generalization, if someone wants mutability inside a hybrid
nested structure (like a list of dicts of lists ...), there's no
direct primitive like a "blend" of [dict set] and [lset]. But it is
nonetheless doable in Tcl, if not as efficiently, by temporarily
removing the substructure from its parent at each level:

set ldl [list a b c [dict create x 1 y 2 z [list d e f]]]

# Exercise: mutate f to F in-place (path: 3 z 2)

set dl [lindex $ldl 3]
lset ldl 3 nope
set l [dict get $dl z]
dict set dl z nope
lset l 2 F
dict set dl z $l
lset ldl 3 $dl


I'm sure you can generalize this to a usable hybrid deep accessor ;-)

-Alex

miguel

unread,
Dec 25, 2007, 10:37:12 AM12/25/07
to
Alexandre Ferrieux wrote:
> On Dec 25, 6:35 am, Fredderic <my-name-h...@excite.com> wrote:
>> One thing I did raise, but haven't seen confirmed or dispelled by
>> anyone, is what happens when you change an element of a sub-dict.
>
> No duplication is done in the absence of sharing: "deep accessors"
> like multi-key [dict set/unset] and multi-index [lset] allow a bit of
> mutability.

Somewhat: what is being mutated is a variable's value, not the value
itself. Even when we manage to do it in-place (ie, the same Tcl_Obj as
seen from C), conceptually we are not mutating the value: we are just
saving some code and memory traffic by taking shortcuts in the "destroy
old, create new" process.

Note that "mutability" of this kind is just about everywhere in the
core, not just for "deep accessors". Other commands do similar things,
eg [incr].

> Mutability is indeed nontrivial is EIAS world, because when you touch
> a deeply buried substructure you must invalidate all string reps
> "above". Currently, Tcl_Obj's don't maintain a list of backreferences,
> so this invalidation cannot be done "upwards". Instead, the two deep
> accessors above know the root of the tree (a variable holding the top
> list or dict value), and thus can safely invalidate exactly the string
> reps that deserve it, namely the "direct parentage" of the mutated
> value, as you say. But they don't copy the structures themselves
> (assuming no extra references of course).
>
> By slight generalization, if someone wants mutability inside a hybrid
> nested structure (like a list of dicts of lists ...), there's no
> direct primitive like a "blend" of [dict set] and [lset]. But it is
> nonetheless doable in Tcl, if not as efficiently, by temporarily
> removing the substructure from its parent at each level:
>
> set ldl [list a b c [dict create x 1 y 2 z [list d e f]]]
>
> # Exercise: mutate f to F in-place (path: 3 z 2)
>
> set dl [lindex $ldl 3]
> lset ldl 3 nope
> set l [dict get $dl z]
> dict set dl z nope
> lset l 2 F
> dict set dl z $l
> lset ldl 3 $dl

Interestingly something like this is being implemented in L, it wouldn't
be too difficult to do it in Tcl too. I would say thet the main barriers
are:

(1) coming up with a suitable command (and its syntax) for Tcl to
perform this deep manipulation

(2) accepting that an obj's string rep can mutate even when the
requested access errored out (this is a non-issue for L). To illustrate
the issue:

% set x {{1 2} {1 1}}
{1 2} {1 1}
% lset x 2 1
list index out of range
% set x
{1 2} {1 1}

Would it be acceptable for the last return to be '{1 2} {1 1}' instead?
The lset and dict implementations in the core are careful to not
invalidate string reps until *after* the modification succeeded, at a
small but not zero cost. For "deep structures" (nested dicts and lists)
the data management code for that behaviour becomes more involved and
costly. OTOH, it is relatively easy to invalidate the string reps "on
the way in", at the cost of having done so even if ultimately there is
an error and the modif fails.

Alexandre Ferrieux

unread,
Dec 25, 2007, 12:31:00 PM12/25/07
to
On Dec 25, 4:37 pm, miguel <mso...@users.sf.net> wrote:
> [...] Even when we manage to do it in-place (ie, the same Tcl_Obj as

> seen from C), conceptually we are not mutating the value: we are just
> saving some code and memory traffic by taking shortcuts in the "destroy
> old, create new" process.

If you like :-)

> Note that "mutability" of this kind is just about everywhere in the
> core, not just for "deep accessors". Other commands do similar things,
> eg [incr].

I know, but I was toying with the idea of going one step further,
allowing to mutate values. I do realize that allowing it everywhere
would be costly (because of the back refs); then maybe a possibility
would be to have two separate mutable containers (mlist and mdict),
with the constraint that only a mutable container can contain other
mutable containers. Then we'd implement the backrefs only for the
mutable variants, thus preserving the performance of existing
immutable containers... But of course we'd need strong incentive to
undertake all this, not just the appeal of the idea ;-)

Now back to today's Tcl where only variables are mutable ;-)

> (2) accepting that an obj's string rep can mutate even when the
> requested access errored out (this is a non-issue for L). To illustrate
> the issue:
>
> % set x {{1   2} {1 1}}
> {1   2} {1 1}
> % lset x 2 1
> list index out of range
> % set x
> {1   2} {1 1}
>
> Would it be acceptable for the last return to be '{1 2} {1 1}' instead?

My $1e-39: definitely yes. If someone is using [lset] on a string
+list, any noncanonical string rep that may have been in its history
will be lost in case of success. So, having the string rep normalized
*also* in the case of error is no big deal IMHO. Now you decide :-)

-Alex

tom.rmadilo

unread,
Dec 25, 2007, 4:39:51 PM12/25/07
to

On Dec 24, 7:58 pm, Fredderic <my-name-h...@excite.com> wrote:
> > namespace eval ::tws::nvlist::weekdays {
> > variable dayList
> > set tDays [search dayList {* T*}]
> > }
> > ==>{{2 TUE} {4 THU}}
>
> So to obtain a simple flat list of the names of days starting in T, how would that look?
>
> A far bigger problem with your search function, though, is that it
> depends on there being only one space within each data element.
> Imagine, if an element along that same vein contained {3 Cows That Moo},
> and I wanted a list of items starting with the letter T. Your example
> would match that one there, also, which happens to start with a C.
> I've even come across situations, back when I were a little greener,
> where TCL decided to quote a value introducing a { where it wasn't
> expected. This may well catch out people new to TCL that might be
> thinking your nvlist might simplify their life in the same way that it
> apparently does yours.
>

I'm not making any claims about nvlist, this is an example. Unlike
your vaporware code, you can test and actually comment on my examples.
Your code works perfectly because it doesn't exist. I'm also not
trying to solve every problem ever existing with one page of code.
Sorry if you were mislead.

> >> In the list of names, list of values concept, there's only three
> >> lists. One with two items (well, three or four, as I mention
> >> below), and two with several each. In yours, there appears to be
> >> n/2+1 lists, where n is the number of items.
> > I'm not sure how you are counting lists in my example. Each nvlist is
> > one list: {{1 MON} {2 TUE} {3 WED}...}
>
> As I understand it, only until you start pulling items out of its
> elements. Look up the TUE, based on the number 2, and it'll likely end
> up being split into a sub-list of two elements.

% ::tws::nvlist::weekdays::toDay 2
TUE

> After a while, you end
> up with a whole bunch of little lists wrapped up in one larger list.
>

Where are these things going to end up? The point of the nvlist
interface is to eliminate extra variables needed to get to the data
that you want. Hopefully if you want the day abbreviation associated
with the number '2', you run the correct API in the correct place,
such as inline as an arg or as the second arg to [set]. Again, the
point: one data structure, multiple methods of access. You reuse both
the data and the methods.

> If you did something simpler with plain strings that weren't lists,
> that wouldn't happen. But the example you chose yourself is one that's
> likely to be used in a list-of-lists context. For most applications
> I've ever come across, the string "2 TUE" is neither a suitable input or
> output format for a day of the week.
>
> > or:
> > { {xsdvar xsd http://...} {xsivar xsi http://...} ...}
>
> That again, looks to me like it'd end up being split. The duality of
> TCL values means your main list would end up being represented by
> sub-lists. (At least, I'd hope it would... If I'm wrong here, someone
> PLEASE let me know!)

You are wrong, unless you somehow think that there is a distinction
between a list with atomic elements and one which could be interpreted
as a list of lists (by that distinction, you are probably not wrong).
There is no such distinction in Tcl. This is the whole point of [dict]
access methods: assume this is a dict, could be a list, string,
whatever. If you stop trying to make the type/structure reside within
the object, you might get what is going on. This is also one reason I
use nvlist instead of lsearch. In Tcl you can't tell if the return
value is a list or a string: {Hi how are you}. List or string? if you
use lsearch -inline -all, you would get instead:
{{Hi how are you}} for a single result, or {Hi how are you} for four
results. Big difference. Until we can distinguish between string and
list in Tcl, I'll keep using this type of approach to avoid data
dependent issues.

> > The purpose of the nvlist (in my case) is not to bring lightning speed
> > to data manipulation, but to rid my code of kludge uses of [lsearch]
> > which avoids data being created as a list and transformed into an
> > array for searching. It also regularizes the use of lsearch which has
> > a significant gotcha if you use the switch -all or not.
>
> Mine avoided [array]s just fine, AND achieved pretty darned good
> performance, too.

I think I said that the point was to clean up my code which had a
number of perfectly valid, and most likely faster, ad-hoc methods of
doing these simple mappings. The replacement is a wrapper around a
number of basic Tcl commands, so instead of 3-4 lines of code and
additional variable name pollution, I have one line of code which is
less cluttered, and for me, easier to read. If I use [lsearch], then I
have to remember, and anyone else reading the code needs to remember
the mapping of field number to field name. For instance, this is why
people use arrays, because they remove the ambiguity about what you
wanted to get. A dict does the same thing. You don't need to remember
the internal structure to get what you want. For me this is an
important property. Speed, if that is what you are interested in,
great, if that ever becomes an issue for me, I can always create a
specialized case. I would rather my code isolate and encapsulate
issues and ideas first.

The fact that you have created your own list interface to simplify
your coding is part of my point. There are a hundred ways to slice the
problem, right now. As long as you are clear about what problem you
are solving it doesn't matter.

> > Another purpose is to use nvlist to create thin, well named search
> > procs, which I tried to demonstrate through a few examples linked to
> > above. So a namespace could use the nvlist procs to create something
> > more understandable than toName or toValue. The weekdays example is
> > simple and maybe less confusing:
>
> > proc ::tws::nvlist::weekdays::toDay { num } {
> > variable dayList
> > return [toValue dayList $num]
> > }
>
> I don't know. Maybe it's just me, but that conveys less information to
> be than a simple [lsearch] statement. Sure, it might be a little more
> tolerant to data format changes, but if that's a problem, then it's
> probably saying something about your initial design that you don't want
> to hear, and if it becomes a regular occurrence, then you'll be wanting
> to think about a fairly radical change that will render either method
> pretty much a mute issue, anyhow.
>

Are you pretending to not understand what an interface is? Why do we
have procedures at all in Tcl, isn't that the real issue? There would
be no problem here if everything was just one big harry script.

> > Nvlist also supports many-to-many mappings, but can still enforce
> > uniqueness if you use the addItem proc.
>
> Okay... So your nvlist is basically wrapping up [list], [lsearch],
> [lreplace], and [lrange] in new names with less name cohesion and
> functionality, at the expense of performance. Is that about right?
>

Yes! Well, mostly. Isn't all code trying to achieve some form of
specialization? I'm not writing a language, I'm using one. This is
what you do. You take general ideas, like opening a channel, and
create a command which writes to a log. How stupid! I should have just
opened and closed the file each time instead of boiling it all down
into a single [log Error "too simple, too slow"].

Maybe read up on code reuse. This has actual meaning, and it doesn't
mean keep using the same 4-5 lines of ad-hoc code over-and-over-again.
It means recognizing that you are doing some kind of pattern which
could be condensed. Is it worth the effort? Your choice, but at least
recognize the situation for what it is and stop blabbering about
performance before anything is even written.

> How did we end up talking about your nvlist anyhow...
>

> Except, perhaps, to demonstrate where a unified reference entity might
> come in handy. Though I'm not sure how that'd fit into your API...
>

Ahhhh! You remembered, good. Yes a unified reference to important
data. The point is that you want to reuse the data in multiple places
for multiple purposes (maybe even adding/modifying it). You need an
unambiguous reference and I have outlined two methods for achieving
this goal that work right now, as I have demonstrated.

Where does that fit in into my API? My API requires it! The nvlist
must be available to the caller by reference. For instance:

% ::tws::nvlist::filterIndex ::tws::nvlist::weekdays::dayList 1 {* T*}
TUE THU

::tws::nvlist::weekdays::dayList is the unambiguous reference.

This also answers your question of how to get days matching T*, but
this is definitely ugly. Maybe a new proc set:

proc ::tws::nvlist::matchName { nvlistVar name } {

upvar $nvlistVar nvlist

return [filterIndex nvlist 0 [list $name *]]
}

proc ::tws::nvlist::matchValue { nvlistVar value } {

upvar $nvlistVar nvlist

return [filterIndex nvlist 1 [list * $value]]
}

% ::tws::nvlist::matchValue ::tws::nvlist::weekdays::dayList T*
TUE THU

Interfacing is how you isolate fragile or old chunks of code from your
application, not by providing deep references into it. It's a no-
brainer. Maybe the object gurus can explain it better.

> > The purpose of my example was to illustrate that there is more to
> > organizing an application than just the procs you use. How you
> > organize, store and pass data is very important. A reference type,
> > whatever that is, on a dict will never solve this issue.
>
> I'm not sure what issue you're referring to, but it'd solve mine just
> fine. A simple example;
>
> proc rand {limit} {expr {int(rand()*$limit)}}
> interp alias {} magic0 {} reader
> interp alias {} magic1 {} do-magic
> interp alias {} magic2 {} do-magic
> interp alias {} magic3 {} do-magic
> interp alias {} magic4 {} do-magic
> proc do-magic {ref key} {return [magic[rand 5] $ref $key]}
>
> proc reader {ref key} {upref $ref dict; return [dict get $dict $key]}
> set fu [dict create 0 SUN 1 MON 2 TUE 3 WED 4 THU 5 FRI 6 SAT]
> do-magic [ref fu] 2 ==> TUE
>

How do you maintain 'ref'? Either you don't see the problem, or you
are hoping I will miss it. 'ref' is the magic here. If you can
maintain 'ref' as a unique name, then there is no problem.

However, then you go on, and it appears you want to push the 'ref'
into another interp. The main problem is being able to move the 'ref'
into the new interp, which is supposed to provide name isolation.
Essentially, you want to violate the isolation by allowing shared data
across interps.

Fredderic

unread,
Dec 26, 2007, 8:19:54 AM12/26/07
to
On Tue, 25 Dec 2007 13:39:51 -0800 (PST),
"tom.rmadilo" <tom.r...@gmail.com> wrote:
> On Dec 24, 7:58 pm, Fredderic <my-name-h...@excite.com> wrote:
>>> namespace eval ::tws::nvlist::weekdays {
>>> variable dayList
>>> set tDays [search dayList {* T*}]
>>> }
>>> ==>{{2 TUE} {4 THU}}
>> So to obtain a simple flat list of the names of days starting in T,
>> how would that look?

Awww.... No answer? I was honestly curious.


>> A far bigger problem with your search function, though, is that it
>> depends on there being only one space within each data element.
>> Imagine, if an element along that same vein contained {3 Cows That
>> Moo}, and I wanted a list of items starting with the letter T.
>> Your example would match that one there, also, which happens to
>> start with a C. I've even come across situations, back when I were
>> a little greener, where TCL decided to quote a value introducing a
>> { where it wasn't expected. This may well catch out people new to
>> TCL that might be thinking your nvlist might simplify their life in
>> the same way that it apparently does yours.
> I'm not making any claims about nvlist, this is an example. Unlike
> your vaporware code, you can test and actually comment on my examples.
> Your code works perfectly because it doesn't exist. I'm also not
> trying to solve every problem ever existing with one page of code.
> Sorry if you were mislead.

Oh, testy... I'm not competing with you, here, just trying to figure
out if your example (which you're obviously quite chuffed with) holds
anything useful to me other than the idea of using namespaces and
pretty names (which I think we all get), and trying to figure out why
you've been using nested list type examples with a flat simple list
mechanism, and so many layers of extra code. Personally, a [toDay]
procedure with two lines of code will work just as well, with less room
for mistakes, more visible internal mechanisms (for anyone who needs
to look), and better performance to boot. (No, I'm not performance
centric, though coming from an embedded assembler background, where
every clock cycle counts, ofttimes literally, I do always keep an eye
open for it.)


>>>> In the list of names, list of values concept, there's only three
>>>> lists. One with two items (well, three or four, as I mention
>>>> below), and two with several each. In yours, there appears to be
>>>> n/2+1 lists, where n is the number of items.
>>> I'm not sure how you are counting lists in my example. Each
>>> nvlist is one list: {{1 MON} {2 TUE} {3 WED}...}
>> As I understand it, only until you start pulling items out of its
>> elements. Look up the TUE, based on the number 2, and it'll likely
>> end up being split into a sub-list of two elements.
> % ::tws::nvlist::weekdays::toDay 2
> TUE

Yes, nice to see you're capable of mimicking a monkey on a keyboard.
That's not what I asked. Anyhow.....


>> After a while, you end
>> up with a whole bunch of little lists wrapped up in one larger list.
> Where are these things going to end up? The point of the nvlist
> interface is to eliminate extra variables needed to get to the data
> that you want. Hopefully if you want the day abbreviation associated
> with the number '2', you run the correct API in the correct place,
> such as inline as an arg or as the second arg to [set]. Again, the
> point: one data structure, multiple methods of access. You reuse both
> the data and the methods.

But to what end? You invoke a [search] function with a string
pattern, looking within what is basically list data, and then extract
the relevant value. Where you could have used [lsearch -index] to look
only at exactly the portion of the item you want.

variable dayList
lindex [lsearch -inline -index 0 $dayList 2] 1

I honestly fail to see why that is so complicated. I know, just by
looking at it, exactly what it's doing. It's looking for an item in
the list where the first value (a numeric corresponding to the word) is
"2". From this item, it takes the word associated with that number
"TUE". Your [toDay] is clear enough, but I can't tell at first sight
tell exactly what your [search] does. Does it, for example, return the
first match, or all matches? And what if I need to look for an RE?
Will your [search] do that for me, or do I need to remember yet
another name? There are times when pretty names are helpful, but the
best I can figure is that with your nvlist thing, it's just obfuscation
of the basic list functionality.


>> If you did something simpler with plain strings that weren't lists,
>> that wouldn't happen. But the example you chose yourself is one
>> that's likely to be used in a list-of-lists context. For most
>> applications I've ever come across, the string "2 TUE" is neither a
>> suitable input or output format for a day of the week.

We'll just put it down to bad choice of example, then, shall we? Along
with just about every other example associated with the nvlist code.

If you're going to put forth a flat list wrapper, then use it like a
flat list. Don't try to turn it into a keyed list manager.


>>> or:
>>> { {xsdvar xsd http://...} {xsivar xsi http://...} ...}
>> That again, looks to me like it'd end up being split. The duality
>> of TCL values means your main list would end up being represented by
>> sub-lists. (At least, I'd hope it would... If I'm wrong here,
>> someone PLEASE let me know!)
> You are wrong, unless you somehow think that there is a distinction
> between a list with atomic elements and one which could be interpreted
> as a list of lists (by that distinction, you are probably not wrong).

That IS what I'm saying. Taking the string "2 TUE", and operating on
it with [lindex], causes it to be re-interpreted as a list. Having a
bunch of these within a list, is, I think, the wrong data structure to
be using in this example of yours. My structure for tacking this type
of task (which is one of the main things I was trying to get across) is
rather to store the original list a little differently. Given the list:

{{1 MON} {2 TUE} {3 WED} {4 THU} {5 FRI} {6 SAT} {7 SUN}}

I'd instead be storing it like this:

{{1 2 3 4 5 6 7} {MON TUE WED THU FRI SAT SUN}}

Apart from being composed of only three distinct lists, keeping the keys
and values separate allowed (in the pre-dict and pre-lsearch-index days)
you to readily search either by key, or by value. I could find all the
days starting with T:

[lsearch -all -inline [lindex $dayList 1] T*] => {TUE THU}

Which just by looking at it, tells me that it'll return a list of all
the keys matching the pattern T*. Now this, is something that is
useful to package behind an API. It's a given structure of data,
presented in a consistent manner, that provides MORE functionality than
the basic primitives it's utilising.

Your nvlist looks pretty on the front, but it hides so much useful
functionality, not to mention an unnecessary extra level of namespace,
that I really don't see it as an improvement. If I wanted to use RE's
instead of globs, I'd have to create a new function. If I wanted
case-insensitive searching, I'd have to create a new function. In
fact, doing anything other than the most basic primitives, would require
creating a new function to handle it.

Furthermore, searching a list of items with the form {key value} using
a string glob, as you did, is also wrong. It relies on the keys and
values not containing any spaces.


> There is no such distinction in Tcl. This is the whole point of [dict]
> access methods: assume this is a dict, could be a list, string,
> whatever. If you stop trying to make the type/structure reside within
> the object, you might get what is going on.

But you DID put the type/structure within the object. {2 TUE} is a
structure, not a simple value. nvlist really only saves anything, if
the order of the items, and their position within the list, matters.
But even then, it degenerates to little more than pretty names for
built in functionality, and a set of extra caveats.


> This is also one reason I use nvlist instead of lsearch. In Tcl you
> can't tell if the return value is a list or a string: {Hi how are
> you}. List or string? if you use lsearch -inline -all, you would get
> instead: {{Hi how are you}} for a single result, or {Hi how are you}
> for four results. Big difference.

What, you can't tell whether there's an -all in the command or not? If
-all is given, then it's a list. If -all is not given, then it's a
single item. It ain't magic...

And as I pointed out, your [search] method, without looking at how it's
defined, doesn't indicate this either. Does it return the first match,
or does it return multiple? How, exactly, is that better than
[lsearch]? All it's doing is hiding details which the person using it
will have to remember, instead. Me, as someone with a terrible memory,
finds the verbosity of [lsearch] in this instance, rather handy.


> Until we can distinguish between
> string and list in Tcl, I'll keep using this type of approach to
> avoid data dependent issues.

You've just created an author-dependant issue, however.


>> Mine avoided [array]s just fine, AND achieved pretty darned good
>> performance, too.
> I think I said that the point was to clean up my code which had a
> number of perfectly valid, and most likely faster, ad-hoc methods of
> doing these simple mappings. The replacement is a wrapper around a
> number of basic Tcl commands, so instead of 3-4 lines of code and
> additional variable name pollution, I have one line of code which is
> less cluttered, and for me, easier to read.

nvlist is "a wrapper around a number of basic Tcl commands", but in
your examples, you don't use nvlist directly. Instead, you wrap it,
again. That's the bit that gets me. nvlist, itself, doesn't add any
functionality. Heck, it barely even simplifies anything. The nvlist
create command is essentially just a [set], anyhow, if I remember
correctly. I really don't see how that makes anything easier to read!

The key here, of course, is "for you". I've been speaking from a "for
someone else" point of view (being, as I am, part of the "someone
else" set of people with regard to your code). Hand it to someone
unfamiliar with your nvlist work, and it ends up being more cluttered
because they have to keep in mind the syntax and rules for a set of
new, and quite redundant, commands. Unless, of course, they happen to
think just like you. ;)

As for the variable name pollution, I'd rather slightly pollute a
TEMPORARY variable space, than complicate things with redundant
commands that only serve to reduce functionality, decrease performance,
while introducing extra complexity in other areas (namely, in having to
remember more commands, what they do, what they return, and what
gotcha's are associated with them), and oh yeah, creating
PERMANENT command pollution instead (even if it is contained within a
slight amount of namespace pollution).


> If I use [lsearch], then I have to remember, and anyone else reading
> the code needs to remember the mapping of field number to field name.
> For instance, this is why people use arrays, because they remove the
> ambiguity about what you wanted to get. A dict does the same thing.
> You don't need to remember the internal structure to get what you
> want. For me this is an important property. Speed, if that is what
> you are interested in, great, if that ever becomes an issue for me, I
> can always create a specialized case. I would rather my code isolate
> and encapsulate issues and ideas first.

I think you misunderstand me. I've got nothing at all, against the
creation of procs like [toDay] and [toNum] (as long as they're within
a namespace that imparts greater meaning, or named a little more
informatively). I do that myself quite often. What I don't see as
being any practical use, is your nvlist creation. And to do anything
with your nvlist API directly on the data, rather than through a second
layer of wrapping, brings us right back to step one. You STILL need to
remember the field order, etc. So if that's nvlist's only real
benefit, well it's really not that much.

I have actually had a situation where I specifically didn't want to
remember the field order (or more specifically, I wanted it to be
extensible and flexible). So the first item in the list, actually
contains the names of the fields in the subsequent items. I'd written
wrappers for lindex, lsearch, and the rest, which skipped the first
item for most operations, using it only when the given index isn't a
number, in which case it used [lsearch -exact] to look it up. But even
there, the functions were essentially something like:

proc my_search {args} {
set list [lrange [lindex $args end-1] 1 end]
eval lsearch [lreplace $args end-1 end-1 $list]
}

proc my_lindex {list index} {
if { ! [string is digit -strict $index] } {
set index [lsearch -exact [lindex $list 0] $index]
}
eval [linsert [list lindex $index] 1 [lrange $list 1 end]]
}

(Thankfully, in 8.5, we don't need those convoluted [eval] lines
anymore)


>>> Another purpose is to use nvlist to create thin, well named search
>>> procs, which I tried to demonstrate through a few examples linked
>>> to above. So a namespace could use the nvlist procs to create
>>> something more understandable than toName or toValue. The
>>> weekdays example is simple and maybe less confusing:
>>> proc ::tws::nvlist::weekdays::toDay { num } {
>>> variable dayList
>>> return [toValue dayList $num]
>>> }
>> I don't know. Maybe it's just me, but that conveys less
>> information to be than a simple [lsearch] statement. Sure, it
>> might be a little more tolerant to data format changes, but if
>> that's a problem, then it's probably saying something about your
>> initial design that you don't want to hear, and if it becomes a
>> regular occurrence, then you'll be wanting to think about a fairly
>> radical change that will render either method pretty much a mute
>> issue, anyhow.
> Are you pretending to not understand what an interface is? Why do we
> have procedures at all in Tcl, isn't that the real issue? There would
> be no problem here if everything was just one big harry script.

[toDay] is part of a useful interface. What, exactly, is [toValue]?
Given just that procedure definition, and no prior knowledge of your
nvlist package, can you tell me the format of dayList? What will it
return if I give it 0 or 42? Will it give me an empty string, an
error, or some other value? What I'm saying, is that [toDay], in fact,
most of the procedures in your example usage of nvlist (not nvlist
itself), are much better off being implemented using straight TCL.

That's an entire new API I would have to learn, if I were to start
using your nvlist code. And should I want to do anything non-trivial
with it... Then that's where that SOL acronym comes into play. That's
why I say nvlist is more like obfuscation, than it is practically
useful.


>>> Nvlist also supports many-to-many mappings, but can still enforce
>>> uniqueness if you use the addItem proc.
>> Okay... So your nvlist is basically wrapping up [list], [lsearch],
>> [lreplace], and [lrange] in new names with less name cohesion and
>> functionality, at the expense of performance. Is that about right?
> Yes! Well, mostly. Isn't all code trying to achieve some form of
> specialization? I'm not writing a language, I'm using one. This is
> what you do. You take general ideas, like opening a channel, and
> create a command which writes to a log. How stupid! I should have just
> opened and closed the file each time instead of boiling it all down
> into a single [log Error "too simple, too slow"].

Opening a channel and creating a command which writes to a log, is
doing something useful, and probably in only a fraction of the code
you took to do something that itself, can be done in two or three lines
of code. I have a function which takes a list of lists, matches a word
against all the words of those sub-lists, with case insensitivity and
partial matching. It then returns the first word of each sub-list that
contained at least one match. Now that is useful. But stripping away
existing functionality so you've got prettier names, and get to show
off your knowledge of namespaces, is something very different.


> Maybe read up on code reuse. This has actual meaning, and it doesn't
> mean keep using the same 4-5 lines of ad-hoc code over-and-over-again.
> It means recognizing that you are doing some kind of pattern which
> could be condensed. Is it worth the effort? Your choice, but at least
> recognize the situation for what it is and stop blabbering about
> performance before anything is even written.

I employ code reuse quite often. I just prefer it to be doing
something useful, rather than merely saving 4-5 lines of code. Oh,
speaking of which, what 4-5 lines of code, exactly, does [toDay] save?
Or [toNum], or half the other procs in your nvlist examples. Most of
them can be re-written in 1-2 lines at most, with very little reduction
in readability, and a whole slab less ambiguity.


>> How did we end up talking about your nvlist anyhow...
>> Except, perhaps, to demonstrate where a unified reference entity
>> might come in handy. Though I'm not sure how that'd fit into your
>> API...
> Ahhhh! You remembered, good. Yes a unified reference to important
> data. The point is that you want to reuse the data in multiple places
> for multiple purposes (maybe even adding/modifying it). You need an
> unambiguous reference and I have outlined two methods for achieving
> this goal that work right now, as I have demonstrated.

No you haven't. Explain to me how any of this has anything to do with
my original situation, of needing to access a value nested within a
multi-level [dict], held within a proc-local variable, some arbitrary
(and not predictable) number of levels deep. Or the more general issue
I was getting at, of being able to offer read/write access any
arbitrary value within a future composite structure held within a
variable that's local to any arbitrary call frame or namespace. The
last is what I was talking about with generalised references.


> Where does that fit in into my API? My API requires it! The nvlist
> must be available to the caller by reference. For instance:
> % ::tws::nvlist::filterIndex ::tws::nvlist::weekdays::dayList 1 {* T*}
> TUE THU

Geeze... If that's not the flimsiest tie-in ever, I don't know what
is. ;)


> ::tws::nvlist::weekdays::dayList is the unambiguous reference.

Which involves absolutely none of what I was talking about. If the
data my back-end functions needed to access were mere namespaced
variables, I wouldn't have ever brought it up in the first place.


> This also answers your question of how to get days matching T*, but
> this is definitely ugly. Maybe a new proc set:
>
> proc ::tws::nvlist::matchName { nvlistVar name } {
> upvar $nvlistVar nvlist
> return [filterIndex nvlist 0 [list $name *]]
> }

Okay, [filterIndex], if it does what I think it does, might actually be
useful. Though I think it'd be far more so as a simple list
manipulator, than having anything at all to do with nvlist.

The ability to convert a list-of-lists into a list-of-nth-items is
actually quite useful. Employing it within your nvlist, or on the
output of an nvlist search, makes far more sense than it being part of
your nvlist, which restricts its usefulness (and hence limits code
reuse, which you were going on about earlier).


> Interfacing is how you isolate fragile or old chunks of code from your
> application, not by providing deep references into it. It's a no-
> brainer. Maybe the object gurus can explain it better.

Actually, again, you missed the point. I was wanting to provide deep
references AROUND the old and fragile code, so the old and fragile code
didn't need to burden itself with new ideas and techniques.


>>> The purpose of my example was to illustrate that there is more to
>>> organizing an application than just the procs you use. How you
>>> organize, store and pass data is very important. A reference type,
>>> whatever that is, on a dict will never solve this issue.
>> I'm not sure what issue you're referring to, but it'd solve mine
>> just fine. A simple example;
>>
>> proc rand {limit} {expr {int(rand()*$limit)}}
>> interp alias {} magic0 {} reader
>> interp alias {} magic1 {} do-magic
>> interp alias {} magic2 {} do-magic
>> interp alias {} magic3 {} do-magic
>> interp alias {} magic4 {} do-magic
>> proc do-magic {ref key} {return [magic[rand 5] $ref $key]}
>> proc reader {ref key} {upref $ref dict; return [dict get $dict
>> $key]} set fu [dict create 0 SUN 1 MON 2 TUE 3 WED 4 THU 5 FRI 6
>> SAT] do-magic [ref fu] 2 ==> TUE
> How do you maintain 'ref'? Either you don't see the problem, or you
> are hoping I will miss it. 'ref' is the magic here. If you can
> maintain 'ref' as a unique name, then there is no problem.

Oh, I see the problem, or I'd have likely done it by now myself. It's
fairly easy to construct a ref. Already done that. Someone who did
get what I was talking about, provided a rather neat (albeit simple)
implementation in the wiki (http://wiki.tcl.tk/20526). It's simple,
doesn't handle the depths to which I needed mine to reach, but it's
actually better than how I started out with mine. But that's what
happens when your doing something to


> However, then you go on, and it appears you want to push the 'ref'
> into another interp. The main problem is being able to move the 'ref'
> into the new interp, which is supposed to provide name isolation.
> Essentially, you want to violate the isolation by allowing shared data
> across interps.

huh?!? Errr..... If you say so... It's good to see yet another
example of your keen understanding of TCL. :)


Fredderic

Fredderic

unread,
Dec 26, 2007, 9:46:49 AM12/26/07
to
On Tue, 25 Dec 2007 05:46:22 -0800 (PST),
Alexandre Ferrieux <alexandre...@gmail.com> wrote:

Thanks for the top explanation... It's basically what I understood, but
cemented a few things that have been a little murky in my understanding.


> By slight generalization, if someone wants mutability inside a hybrid
> nested structure (like a list of dicts of lists ...), there's no
> direct primitive like a "blend" of [dict set] and [lset]. But it is
> nonetheless doable in Tcl, if not as efficiently, by temporarily
> removing the substructure from its parent at each level:
>
> set ldl [list a b c [dict create x 1 y 2 z [list d e f]]]
>
> # Exercise: mutate f to F in-place (path: 3 z 2)
>
> set dl [lindex $ldl 3]
> lset ldl 3 nope
> set l [dict get $dl z]
> dict set dl z nope
> lset l 2 F
> dict set dl z $l
> lset ldl 3 $dl
>
> I'm sure you can generalize this to a usable hybrid deep accessor ;-)

To deal with the mixed container issue, I actually use an [xset] and
[xget] pair, that would be used like this:

xset {ldl list {3} dict {z} list {2}} F

They also accept an optional [upvar] level before the variable name
(incrementing a pure number as appropriate), but essentially does just
what you describe. Except for the removal... But you can bet I'll be
fixing that. ;)

For some reason replacing the item with a dummy, changing it, and then
putting it back again, just seems wrong. But now that you've pointed
it out, it does make perfect sense.

Just for kicks, my ref implementation basically figures out the
appropriate level prefix using [info level], attaches the path if
given, and uses it to set up a bunch of traces on a global shadow
variable, that in turn use [xset] and [xget] as required. Works pretty
well, and it should work even better with that little improvement.
(I've also added in the take on the reference idea from the wiki page,
for the simple case with no path.)

Many thanks indeed.

Now, if only we had real references within TCL, this sort of thing
wouldn't be needed. You'd just generate a reference to the value you
want, by whatever means, and you could set and get it to your hearts
desire. TCL core could assuredly do it without nearly as much fiddling
about. ;)


Fredderic

Alexandre Ferrieux

unread,
Dec 26, 2007, 12:33:16 PM12/26/07
to
On Dec 26, 3:46 pm, Fredderic <my-name-h...@excite.com> wrote:
>
> Now, if only we had real references within TCL, this sort of thing
> wouldn't be needed.  You'd just generate a reference to the value you
> want, by whatever means, and you could set and get it to your hearts
> desire.  TCL core could assuredly do it without nearly as much fiddling
> about.  ;)

If by "real references" you mean "mutable values", then yes, it is
sexy, but not trivial (see nearby post: a new family of types like
mlist, mdict, and mstring; non-miscible with immutable ones; and
backref housekeeping all over the place... sigh...).

So, shall we count you as a +1 for the idea ?

-Alex

Fredderic

unread,
Dec 27, 2007, 12:33:00 AM12/27/07
to

If that's the only way, then sure... ;)

Though I can't help thinking that adding a second set of types that
mirror what's already there is dangerous, at best...

Your main reason as I understand it, for wanting mutable containers, is
because of the issue with backrefs. From what I can see, the whole
fabric of TCL would break down if you slip in a mutable value. An
awful lot depends on the COW practises being maintained, and
mutability seems somehow redundant to me, beyond being a way to
accelerate some interactions while there is only a single reference.

Our existing [upvar] mechanism manages to gain access to a value, by
pointing to the variable holding it instead, thus avoiding a second
reference. I've been thinking that in keeping with what I see of TCL
tradition, a ref could work basically the same way, at least until
someone comes up with a better internal representation. Once the
mechanism is in place and we've seen its broader impact, one could
then look at integrating it more tightly, or perhaps, discover that
such is not even needed.

Furthermore, there are other ideas floating around which will likely
involve modifications to the internal structure of various things, and
which may well themselves open new ways of achieving this goal as well.
For now, I think adding an extension to the existing [upref] mechanism,
and adding the necessary support to the structures that support Tcl_Obj
types, would achieve the goals with minimal impact. Your mutability
ideas, the way I understand them, still require being able to reach
through a compound value to be of much use. A ref, the way I see it,
solves both problems; it contains the extra information that your
mutability would need to be practical, but with only minor burden to
the existing value representations, and at the same time provides the
mechanism for referring to the deep value you with to change.

Essentially, my thinking goes along these lines; for every type of
internal representation of a Tcl_Obj, there is a structure containing
four functions to manipulate that representation. An extra reader and
writer pair that accept a path (and perhaps a third to construct a
path, allowing it to be an opaque entity unique to each individual
type), would I think, satisfy both ideas quite nicely. It would have
all the information needed to invalidate the string reps and whatever
else, and could even be kept internal to the object and invalidated or
maintained as needed. A ref would then be constructed as a series of
these paths, starting with one representing a simple [upvar].

I'm still not sure how one would go about requesting a path, or how
you'd handle the case where a portion of the path ceases to exist. I'm
guessing the ref would need to retain enough information to rebuild the
whole path, and each path segment would need to be marked as invalid,
but none the less remain until it's released. This is where my ideas
end in a great big implosion. ;)


Fredderic

Alexandre Ferrieux

unread,
Dec 27, 2007, 2:17:33 PM12/27/07
to
On Dec 27, 6:33 am, Fredderic <my-name-h...@excite.com> wrote:
> On Wed, 26 Dec 2007 09:33:16 -0800 (PST),
>
> Alexandre Ferrieux <alexandre.ferri...@gmail.com> wrote:
> > So, shall we count you as a +1 for the idea ?
>
> If that's the only way, then sure...  ;)

Guess I couldn't hope for much more ;-)

> Though I can't help thinking that adding a second set of types that
> mirror what's already there is dangerous, at best...

Yup, it's ugly. The alternative of course is to extend existing types,
but I'd hate to bring the bad news to c.l.t: "hey guys now we're
mutable but 20% slower everywhere, including usual immutable stuff"...

> Your main reason as I understand it, for wanting mutable containers, is
> because of the issue with backrefs.  From what I can see, the whole
> fabric of TCL would break down if you slip in a mutable value.  An
> awful lot depends on the COW practises being maintained,

Not sure I understand this wording; I'd rather say it the other way
around:

- I would like mutability of values for situations where COW would be
too costly: large, deep structures, to be modified incrementally,
without taking tedious steps to keep the refcount to one (the "nope"
trick is just that: a trick ;-)

- To do this without "breaking the whole fabric of Tcl" implies to be
able to invalidate the string reps of all referers of a mutated
structure.

- To reach all referers of a given thing implies to have a chain of
backwards references.

- Since we don't want to pollute lists and dicts with them, we create
extra types.

-Alex

tom.rmadilo

unread,
Dec 27, 2007, 4:57:31 PM12/27/07
to
I think that you are going to find that the concept of a deep
structure cannot be applied to a dict. In addition, the concept of a
reference to a deep value/key is also going to be impossible. The
reason is that, first, a dict is a single value. But what is
surprising is that, regardless of how a dict is represented at the C
level (as a chain of dicts), only the top dict can have a name. The
internal 'dict' structures cannot be accessed even within the [dict]
command. For instance, the [dict incr dvar key] only works for top
level keys, you can't incr an internal key. This is the easiest
example:
% dict set a b 1
b 1
% dict incr a b
b 2
% dict set a b c 1
b {c 1}
% dict incr a b c
expected integer but got "c 1"
% dict set a b c {d 1}
b {c {d 1}}
% dict incr a b c d
wrong # args: should be "dict incr varName key ?increment?"

There is also confusion between values and keys in [dict]. Unless the
entire dictionaryValue is used, you can't understand the effect of an
operation upon the dict. In other words, the entire value is important
when making some kind of update. Simple example:

% dict set j 2 2 3 4
2 {2 {3 4}}
% dict set j 2 2 3
2 {2 3}
% dict set j 2 2 3 4
missing value to go with key
% dict set j 2 2 {3 4}
2 {2 {3 4}}

So in the last line, 3 is both a value and a key:
% dict get $j 2 2 3
4

There is also confusion on what key should be used:
% dict set j 2 2 {3 4 3 5}
2 {2 {3 4 3 5}}
% dict get $j 2 2 3
5

Does key 2 2 3 have two values or one? This is not a statement about
the [dict] command, but an illustration that you would need more than
just a reference to handle updating, manipulating or even selecting
data from a dict.

Andreas Leitgeb

unread,
Dec 27, 2007, 5:12:11 PM12/27/07
to
Fredderic <my-nam...@excite.com> wrote:
> Andreas Leitgeb <a...@gamma.logic.tuwien.ac.at> wrote:
>> It doesn't matter that only one of the bits flipped, we get a new
>> number. And in the same way it doesn't matter how much of the dict-
>> structure remains, when you replace just one element, you get a new
>> dict.

> By my understanding, siblings and more distant relatives of that


> sub-dict should be referenced rather than actually being copied.

I think your understandig is right here. This is exactly the grain of
salt I mentioned, when comparing dicts to numbers.

> The only parts that should be "new", are the structures that lie in the
> direct parentage of the item that was changed.

indeed.

Alex shows his "problem" as being the one, that COW happens, rather than
all current references would then contain the new sub-value, but this
would be equivlent to what I spoke about numbers.

Instead, Alex seems to struggle for a reference-type, and leaving aside
that I don't think it will happen any near time, I think, dicts are the
wrong place to put that functionality in. If we had reference-objects,
and magically solved all the problems with them, then adding such a
reference as an element into some dict wouldn't be a problem, in the
same sense as puttig one into a list wouldn't be one, then.

Donal K. Fellows

unread,
Dec 27, 2007, 6:15:45 PM12/27/07
to
tom.rmadilo wrote:
> only the top dict can have a name.

Strictly no. You're naming the variable holding the dict, not the dict
itself. Big difference.

> The
> internal 'dict' structures cannot be accessed even within the [dict]
> command.

No, but some dict subcommands are limited for syntactic reasons. The
[dict update] and [dict with] allow more complex changes to be built and
carried out.

> Does key 2 2 3 have two values or one?

You've been using it as a list of keys, not as a single key. ;-)

Donal.

Fredderic

unread,
Dec 27, 2007, 10:48:07 PM12/27/07
to
On Thu, 27 Dec 2007 11:17:33 -0800 (PST),
Alexandre Ferrieux <alexandre...@gmail.com> wrote:

>> Though I can't help thinking that adding a second set of types that
>> mirror what's already there is dangerous, at best...
> Yup, it's ugly. The alternative of course is to extend existing types,
> but I'd hate to bring the bad news to c.l.t: "hey guys now we're
> mutable but 20% slower everywhere, including usual immutable stuff"...

I'm pretty sure we're both after the exact same thing, just from
different directions...

How, given your mutable types, are you supposed to for example, fetch
a value from the mutable structure, and then increment it for private
use without the modified value being transferred back into the
structure. As far as I can tell, you're asking that ALL standard
accesses to a mutable structure perform copy-on-read, just so that you
can avoid a few copy-on-writes. Masses of the TCL core, at the very
least, would likely need to be modified to specifically recognise the
special case of a mutable structure, in addition to their existing
special handling of shared objects.

Maybe I'm missing something important, but I still don't see why we
need to change the existing structures at all, is what I'm saying. All
the backref information is already present in the path needed to fully
name the value being modified, just in the wrong form. But, the crux
is that in that in order to reach the value you wish to change, you
have to start at the base (held by a variable either local or within a
namespace), and progress through the structure of that value following
the given path, at the end of which is the value you wish to manipulate.

For each of your incremental changes, you will be doing this same
process. Invoking some TCL command which will start at the base, and
work its way along a path to the desired value, and which need only
remember the Tcl_Obj's it passes through along the way so that it can
invalidate their string reps when it's done with the update. This is
essentially what my [xset] procedure does, but in TCL. In this
setting, placing backref information into the Tcl_Obj internal reps, is
an unnecessary redundancy.

The only way I can see your idea working, would be to get a value from
the structure using the usual means (with [dict get] or [lindex]), and
then by changing that value have the change propagated back into the
structure. Which is a totally new concept to TCL, since as soon as you
obtain the value you wish to change, it becomes a shared object. All
the standard TCL code, both core and script, will immediately delete
the value and replace it with an entirely new fresh one. The only way
to avoid that, would be to mark the variable as containing mutable
data, and have all the standard TCL code check for that and initiate the
propagation as needed. (Which goes back to what I was saying in the
first place.)

On the other hand, my reference idea, in its simplest form, would be
just a fully qualified variable name and associated path, wrapped up in
a single entity. But where your idea puts the burden on the Tcl_Obj
internal type system to keep track of backrefs and whether changes to
a variables value are to be back-propogated or not, my idea puts the
burden on the function performing the access, and insists (pending some
kind of maintained caching added at a latter date) that the entire path
is resolved each time, which in either case means the backref
information is readily available at the time of the modification.

The presentation of my idea as a ref, is essentially just an extension
of the existing [upvar] concept, in which a variable is used to
bookmark another variable, without regard to the value held therein (in
fact, without regard even to whether that variable even exists). My
ref implementation would do much the same, but add to that a
description of the path from the base of that value to some point
within it. For this reason the source variable would probably have to
exist at the very least, at the time the reference is created. Though
the reference would likely contain enough information to re-construct
the path should the source variable be deleted fully or partially at a
later point (as long as the source frame was still present, in the
case of a [proc]-local variable).

Your idea will be quick for a large number of incremental changes to a
deep structure, but incur a penalty virtually everywhere else. Mine
will be slow at incremental changes to a deep structure, but have no
impact at all elsewhere. Furthermore, from what I understand, your
proposal basically ends up being an optimisation of mine, that can be
added on later if it's sufficiently useful.


> - To do this without "breaking the whole fabric of Tcl" implies to be
> able to invalidate the string reps of all referers of a mutated
> structure.

I don't think so. As I've pointed out, I believe the problem runs
much, much deeper than that.

On a further note, it would be nice if references could function
automatically, but I think in either of our ideas, a new function would
be needed to perform the operation. My idea would need some kind of
enhanced [set] command that will use the value of the reference, rather
than setting it. ;) Conversely, it MIGHT be acceptable depending on
the API selected for the creation of the reference, for [set] to
recognise the value of a reference as a special case, resulting in a
usage pattern along the lines of [set $ref value], which isn't too
bad, and avoids the need for a new command.


> - To reach all referers of a given thing implies to have a chain of
> backwards references.
> - Since we don't want to pollute lists and dicts with them, we create
> extra types.

I'm not sure how you would improve the speed of my references idea,
without incurring some cost in speed/memory, either. The lightest
impact I can imagine, would be that of a single pointer added to each
list or dict allowing a list of references to be attached, and wouldn't
occur much overhead in the case of no references passing through that
structure. In fact, the only references that would need to be in the
list at all, are the ones which have this item presently within their
path cache. As soon as the reference is told to flush its path cache
(even just from this point forwards), it can also be removed from the
reference list, until such time as it rebuilds the cache.

A more memory-impacting approach would be to attach a reference list to
every single list/dict item, or perhaps to the Tcl_Obj structure
itself, but would allow references to be invalidated in the fastest
possible way. They could then be removed from the list entirely, until
such time as they rebuild their path cache. Although if we were going
to that extreme, then your backrefs idea would probably be just as
workable, with a separate [deep-set] command that uses the backref
information. It would also likely be quite easy to change my reference
idea between the two methods.

Another added bonus of the ref idea, is that it could be used as a sort
of iterator. Exactly how that would work in the context of a dict, I
don't know (you might need to take a snapshot of the current list of
keys, and then iterate over that list). But for lists, no matter how
deep, it would be fairly trivial to add [ref incr] style commands for
iteration (which I think someone else was asking about).

Either way, there'd be no need for new mutable structures.


Fredderic

tom.rmadilo

unread,
Dec 28, 2007, 12:18:46 PM12/28/07
to
On Dec 27, 3:15 pm, "Donal K. Fellows"

<donal.k.fell...@manchester.ac.uk> wrote:
> tom.rmadilo wrote:
> > only the top dict can have a name.
>
> Strictly no. You're naming the variable holding the dict, not the dict
> itself. Big difference.
>

Wait, what? is this different than a regular variable?

% set a {3 4 3 5}
% dict keys $a
3
% dict values $a
5
% set a {3 5}
% dict keys $a
3
% dict values $a
5

What is the difference between a as a variable holding a list and a as
a variable holding a dict? Also, shouldn't either keys or values
indicate a difference between the two dictionary values above? Maybe
not, but my point is that if not, how can a dictionary be considered a
structure that can be accessed in parts without consideration for the
entire value? Looks like it can't even be read without knowing the
full value.

> > The
> > internal 'dict' structures cannot be accessed even within the [dict]
> > command.
>
> No, but some dict subcommands are limited for syntactic reasons. The
> [dict update] and [dict with] allow more complex changes to be built and
> carried out.
>

Yes or No, not no, but. My point must have got lost somewhere. At the
Tcl scripting level, a dict is not a nested series of similar
structures. Maybe at the C level it is, but that could be for
different reasons. This is different than a Tcl namespace, where the
child namespaces have identical access properties as the parent. So
lets back up to a simple example:

% dict set a b 1
b 1
% dict incr a b
b 2
% dict set a b c 1
b {c 1}
% dict incr a b c
expected integer but got "c 1"

You can't get to the key b c in the dict stored in variable a.
Regardless of the distinction between a variable name and a dict, all
keys are not equal in a dict, so it isn't a nested structure at the
Tcl level.


> > Does key 2 2 3 have two values or one?
>
> You've been using it as a list of keys, not as a single key. ;-)

I was thinking that either [dict keys] or [dict values] or both would
indicate a difference between {3 5} and {3 4 3 5}.

Finally, I noticed this (I'm repeating myself I know):


% dict set j 2 2 {3 4}
2 {2 {3 4}}

% dict get $j 2 2 3
4

3 is both a key and 'part of' a value. This again implies that you
can't think of a dict as a nested series of dicts with a fixed meaning
for all operations.

Alexandre Ferrieux

unread,
Dec 28, 2007, 1:41:01 PM12/28/07
to
On Dec 28, 4:48 am, Fredderic <my-name-h...@excite.com> wrote:
> On Thu, 27 Dec 2007 11:17:33 -0800 (PST),
>
> Alexandre Ferrieux <alexandre.ferri...@gmail.com> wrote:
> >> Though I can't help thinking that adding a second set of types that
> >> mirror what's already there is dangerous, at best...
> > Yup, it's ugly. The alternative of course is to extend existing types,
> > but I'd hate to bring the bad news to c.l.t: "hey guys now we're
> > mutable but 20% slower everywhere, including usual immutable stuff"...
>
> I'm pretty sure we're both after the exact same thing, just from
> different directions...

Not sure; below is a more script-level description, while so far I
have rather described the implementation.

# These are toy examples; in real life
# mutability is only useful for larger structures

set a [mlist 1 2 3 [mdict create a 4 b 5 c [mlist 6 7 8]]]
# $a is a mutable "1 2 3 {a 4 b 5 c {6 7 8}}"
set b [mlist 10 20 30 [lindex $a 3]]
# $b is a mutable "10 20 30 {a 4 b 5 c {6 7 8}}"
set a2 $a
set b2 $b
# sharing again one level up

proc foo d {bar [dict get $d c]}
proc bar l {mlset $l 1 777}
foo [lindex $a 3]

# now $a and $a2 are "1 2 3 {a 4 b 5 c {6 777 8}}"
# and $b and $b2 are "10 20 30 {a 4 b 5 c {6 777 8}}"

As you can see, no "copy-on-read" at all. Mutable values can (and love
to) be shared, be it through extra variables or via substructure (like
[lindex $a 3]). No copies are made in the code above.
Here is what happens to (back) references:

1) set a [mlist ...] plumbs the whole tree with proper backrefs.
2) set b [mlist ... [lindex $a 3]] adds a backref from the mdict
value which is currently [lindex $a 3] to the mlist value currently
reachable as $b
3) set a2 $a;set b2 $b add no backrefs, since variables are not
transparent containers (they don't have a separate string rep; only
the Tcl_Obj's they contain, do).

4) mlset $l 1 777 mutates in-place the given mlist, regardless of its
level of sharing, but also scans the transitive closure of its
backrefs, and for each Tcl_Obj thus reached, invalidates the string
rep. In this example, the invalidated string reps are:

"6 7 8" (the leaf mlist)
"a 4 b 5 c {6 777 8}" (the containing dict)
"1 2 3 {a 4 b 5 c {6 7 8}}"
"10 20 30 {a 4 b 5 c {6 7 8}}"

As you can see, variables play no role in the whole scheme. Everything
is about *values*. No "paths" either, allowing the abstraction
provided by the two layers of procs artificially used to reach down
the mutated mlist.

-Alex

Alexandre Ferrieux

unread,
Dec 28, 2007, 2:11:56 PM12/28/07
to
On Dec 28, 7:41 pm, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:

>        "a 4 b 5 c {6 777 8}" (the containing dict)

Typo. I meant

> "a 4 b 5 c {6 7 8}" (the containing dict)

of course.

Also, notice that the deep-scanning of backrefs stops as soon as a
string rep is null (induction hypothesis: when a string rep is
invalidated, all its referers' are too).
So in real life, this scanning may be very shallow most of the time
(unless an idiot keeps putsing the whole toplevel structure to stderr
for fun).

-Alex

Alexandre Ferrieux

unread,
Dec 28, 2007, 2:17:44 PM12/28/07
to
On Dec 27, 11:12 pm, Andreas Leitgeb <a...@gamma.logic.tuwien.ac.at>
wrote:

>
> Instead, Alex seems to struggle for a reference-type, and leaving aside
> that I don't think it will happen any near time, I think, dicts are the
> wrong place to put that functionality in.

No, we widened the picture. Dicts are no special case
(mdict,mlist,mstring)...
Look in a nearby post ;-)

-Alex

Donal K. Fellows

unread,
Dec 30, 2007, 7:07:55 PM12/30/07
to
tom.rmadilo wrote:
> Wait, what? is this different than a regular variable?

No. Why would it be? ;-)

> % set a {3 4 3 5}
> % dict keys $a
> 3
> % dict values $a
> 5

Ugh, that's not a normalized dictionary value. Legal though, and the
last duplicate key's value prevails. Note that from the perspective of
the dictionary code, it's the same (apart from string representation) as
the dict "3 5" so if you produce a new dictionary based on it (e.g.
using the [dict set] command) you'll lose the fact that there was "3 4"
in there. This is a continuation of the theme of lists not preserving
whitespace I suppose, and it's one of the areas (apart from trivial
syntax) where dicts differ from TclX's keyed lists (which I did look at
very hard when planning the implementation).

> What is the difference between a as a variable holding a list and a as
> a variable holding a dict?

Between the variables? Nothing. Their values (or maybe just the internal
representation of those values) differ, but that's all.

> Also, shouldn't either keys or values
> indicate a difference between the two dictionary values above? Maybe
> not, but my point is that if not, how can a dictionary be considered a
> structure that can be accessed in parts without consideration for the
> entire value? Looks like it can't even be read without knowing the
> full value.

I'm tired and full of cold tonight, so I don't really understand the
conclusions you're drawing from this example or just how you're drawing
them. :-( But I'll try anyway...

To me, the best way to do a "modifiable reference to somewhere inside a
dictionary" is to pass in a tuple, being the name of a variable holding
the overall modifiable value, and a list of keys constituting the path
into the dictionary to the bit to modify. It's a bit like passing in the
start of a C structure plus an offset instead of a pointer directly
inside. If you only want to read and a snapshot will do, just get the
value immediately and pass that, and the Tcl_Obj value reference system
(that already exists) will make that nice and fast for you. Note that
none of this is special to dicts; nested lists have identical issues
(though a different interpretation of the path descriptor). Mixing it up
is possible too, but we don't make that easy to do at the script level.

> % dict incr a b c
> expected integer but got "c 1"

This is purely a limitation of [dict incr] and is due to the fact that
if there was allowed an optional increment and an optional key path,
it's ambiguous which the programmer meant. There are several [dict]
subcommands where this is the case, and you can't look at the value of
the keys to determine this because they're arbitrary strings. Also,
requiring everyone to package up their keys as lists (one way to get
around this problem) would make important simpler cases suck.

In other words, it's deliberate and/but unfortunate.

> Regardless of the distinction between a variable name and a dict, all
> keys are not equal in a dict, so it isn't a nested structure at the
> Tcl level.

You've jumped to conclusions. A single key is *utterly* different to a
pair of keys (and necessarily so, since keys are arbitrary values). At
no point have I ever claimed otherwise.

> I was thinking that either [dict keys] or [dict values] or both would
> indicate a difference between {3 5} and {3 4 3 5}.

Why? Dictionaries don't have duplicate values, and that means that "3 4
3 5" must be dictionary-equivalent[*] to either "3 4" or "3 5" (or be an
error, which is a lot less friendly). I chose the latter because it's
much easier to implement right. :-)

> 3 is both a key and 'part of' a value. This again implies that you
> can't think of a dict as a nested series of dicts with a fixed meaning
> for all operations.

You've lost me. Blame my tiredness.

Donal.
[* NB: this is not the same as string-equivalent or list-equivalent. ]

Fredderic

unread,
Dec 31, 2007, 1:49:14 AM12/31/07
to
On Mon, 31 Dec 2007 00:07:55 GMT,
"Donal K. Fellows" <donal.k...@manchester.ac.uk> wrote:

>> % dict incr a b c
>> expected integer but got "c 1"
> This is purely a limitation of [dict incr] and is due to the fact that
> if there was allowed an optional increment and an optional key path,
> it's ambiguous which the programmer meant. There are several [dict]
> subcommands where this is the case, and you can't look at the value of
> the keys to determine this because they're arbitrary strings. Also,
> requiring everyone to package up their keys as lists (one way to get
> around this problem) would make important simpler cases suck.
> In other words, it's deliberate and/but unfortunate.

Easy solution. Just get my deep-references idea happening. Then you
can just use the standard [incr] on any value at any depth within the
dict, and everyone's happy again. *grins*

Sorry..... Just couldn't help it. :)


Fredderic

Fredderic

unread,
Dec 31, 2007, 7:04:44 AM12/31/07
to
On Fri, 28 Dec 2007 10:41:01 -0800 (PST),
Alexandre Ferrieux <alexandre...@gmail.com> wrote:

> As you can see, variables play no role in the whole scheme. Everything
> is about *values*. No "paths" either, allowing the abstraction
> provided by the two layers of procs artificially used to reach down
> the mutated mlist.

Yup. Sounds like a different take on the same thing to me.

Someone might be able to sort out which is the better way, by
2009... ;)


Fredderic

Alexandre Ferrieux

unread,
Dec 31, 2007, 7:34:26 AM12/31/07
to
On Dec 31, 1:04 pm, Fredderic <my-name-h...@excite.com> wrote:
> On Fri, 28 Dec 2007 10:41:01 -0800 (PST),
>
> Alexandre Ferrieux <alexandre.ferri...@gmail.com> wrote:
> > As you can see, variables play no role in the whole scheme. Everything
> > is about *values*. No "paths" either, allowing the abstraction
> > provided by the two layers of procs artificially used to reach down
> > the mutated mlist.
>
> Yup.  Sounds like a different take on the same thing to me.
>
> Someone might be able to sort out which is the better way, by
> 2009...  ;)

Now that we know the two approaches are different, can you please re-
describe yours down to the details, on the same example as I used for
mine ?

-Alex

tom.rmadilo

unread,
Dec 31, 2007, 8:39:47 AM12/31/07
to
On Dec 30, 4:07 pm, "Donal K. Fellows"

<donal.k.fell...@manchester.ac.uk> wrote:
> To me, the best way to do a "modifiable reference to somewhere inside a
> dictionary" is to pass in a tuple, being the name of a variable holding
> the overall modifiable value, and a list of keys constituting the path
> into the dictionary to the bit to modify. It's a bit like passing in the
> start of a C structure plus an offset instead of a pointer directly
> inside. If you only want to read and a snapshot will do, just get the
> value immediately and pass that, and the Tcl_Obj value reference system
> (that already exists) will make that nice and fast for you. Note that
> none of this is special to dicts; nested lists have identical issues
> (though a different interpretation of the path descriptor). Mixing it up
> is possible too, but we don't make that easy to do at the script level.

My point exactly, I'm just not getting the language right. Reaching
into a dict is an operation which requires the whole dictionary value
and the whole key going in and the whole value of the key. If you had
a deep reference, you would not have the same read/update
characteristics as you get on the Tcl scripting level. I still think
it is even more squishy than what you get in a C struct, which would
never reinterpret what is the value and what is the structure path
based upon the query. This feature makes any direct/deep reference
less similar to the familiar examples in other languages.

Fredderic

unread,
Jan 1, 2008, 3:18:18 AM1/1/08
to

For a pure TCL implementation, I already have.

set a [dict create a 1 b 2 c [list [list 3 4 5] [list 6 7 8]]]]
set path {a {dict c} {list 1 1}}
xset $path 777

Which basically transposes the source variable into a global shadow
variable (as per the wiki page), and then recursively descends the path
to do the similar thing with the targeted value. When needed
(triggered by a read on the globalised variable), the value is merged
back in using the fetch-and-remove method described earlier. It's
ugly, messy, and I'm not 100% certain it's robust, but it seems to work.

My reference idea, is to essentially codify that into TCL core, because
although it works quite well, it is none the less dreadfully slow for
large-scale modifications, having to rip it apart and put it back
together all the time. The idea is for a references internal
representation to be a cache of the Tcl_Obj's involved.

set r [ref a c 1 1] => {{#0 a} {dict c} {list 1 1}}
upref $r x ;# or just allow [upvar] to handle refs natively
set x 777

The purpose of [ref] is just to fully expand the given path, and could
quite readily (in fact, maybe even optimally) return a simple string
value directly, with no internal representation at all. Within it, the
first element would contain all the information needed for [upvar] to
reach the initial value. In this case, referring to the variable "a"
at the apropriate level ("#0" here, though it could also be presented
as "::a"). The short of it is that this works:

upvar {*}[lindex $r 0] somevar

Thereafter, come the path components; {dict c} specifies an element
path {c} in a dict. And {list 1 1} likewise corresponds to a list path
{1 1}. The [upref] call would "listify" the ref, and then walk along
each element invoking a new facility of Tcl_Obj value's type structure
to construct a Tcl_Obj cache, providing all your backref's in one place.
(Leaf type values would simply need to have NULL for this function.) A
little smarts along the way would allow it to know whether any
invalidation is even required, and where and how much. In the case
above, $a is constructed directly, so wouldn't have any string reps to
invalidate, and the reference would know that.

Providing the type of each segment of the path is actually optional,
and in the spirit of TCL's internal details hiding, it might be better
represented as just {{#0 a} c 1 1}. Otherwise people will likely start
building refs simply for the structure introspection value of it.

The catch is that compound values (such as list and dict) would need to
keep a list of active references (a single pointer will do). Or more
specifically, the segment of an active reference corresponding to it.
This would most easily be achieved by making the reference segment a
member of two linked lists; a list spanning the ref (allowing it to be
walked forwards to invalidate cache segments, or backwards to
invalidate string reps), and through the segments of any references
associated directly with this value. When invalidating the references
associated with a value, you would start with the segment pointed to
from the list/dict rep and destroy it, any reference segments after it
(walking forwards), and then move down the second list to the next
reference which has cached through this compound value. At the end of
which, the values references list can be set to NULL, and will stay
that way until one of the references needs to rebuild its Tcl_Obj.

The next step, to save this from getting nasty should there be 10,000
references all passing through a given object, reference segments would
be ref-counted themselves, and shared where possible (so there'd
actually be two next pointers allowing it to fan out in one
direction). The reference itself would have a pointer only to the LAST
cached segment (I'm not sure of a better way to maintain that pointer,
than to have a pointer to the reference stored in the initial [upvar]
segment, or perhaps having the reference itself a valid head of the
chain, which in either case means having to walk back to update it). On
the up-side, this is ONLY done for references that actively pass
through that value, and once the reference invalidation has been
performed, it no longer needs to know that any references even exist
anymore. The Tcl_Obj cache would still, however, exist up to that
point, and prior values would still be tracking the reference like
usual. Some care would just need to be taken in how the references are
undone to get it as concise as possible.

This will require that each reference segment carries at least 6
pointers, plus an int, making it a relatively heavy structure in its
own right. But hopefully there shouldn't be too many of them active at
any given time, and the idea is a kind of run-length-compression of
the structure. Most use cases I've ever seen have only involved at
most one span each of lists or dicts. The segment structure would be
something generally along the lines of:

struct ref_segment_t {
ref_segment_t *prev_seg, *next_seg, *sibil_seg;
ref_segment_t *prev_ref, *next_ref;
int children;
Tcl_Obj *value, **list;
}

It's possibly a fair bit more work than your backref idea, but works
with the existing mechanism (almost - it does if you force it to
re-evaluate the entire path on every access, which would be an
acceptable first step to see how much real-world interest it gets),
doesn't require every single compound structure present and future to
be duplicated with a whole new command each, it matches everyone's
existing expectations by being essentially a super-set of [upvar]
(meaning much less confusing, and probably hence less prone to subtle
little script-level bugs), and should adapt automagically to new
constructs that may be added in the future (once the necessary extra
member is added to the Tcl_Obj internal rep type structure) including
by extensions. It should even be possible at some point to add
script-level support, such as allowing an OO data member to be part of
the ref chain. Although there is the slight downside, admittedly, that
it's not quite as "neat and tidy" as yours in the case where you wish
to rummage around a bit. But I'm willing to accept that. If you're
storing segment types in the text representation of the reference, this
method could even re-construct a branch to set a value if part of the
structure has been removed.


Fredderic

Alexandre Ferrieux

unread,
Jan 1, 2008, 10:53:29 AM1/1/08
to
On Jan 1, 9:18 am, Fredderic <my-name-h...@excite.com> wrote:
> On Mon, 31 Dec 2007 04:34:26 -0800 (PST),
>
> Alexandre Ferrieux <alexandre.ferri...@gmail.com> wrote:
> > Now that we know the two approaches are different, can you please re-
> > describe yours down to the details, on the same example as I used for
> > mine ?
>
> For a pure TCL implementation, I already have.
>
>      set a [dict create a 1 b 2 c [list [list 3 4 5] [list 6 7 8]]]]

Sorry, but that's not exactly my example... But never mind ;-)

>      set path {a {dict c} {list 1 1}}
>      xset $path 777

> My reference idea, is to essentially codify that into TCL core, because


> although it works quite well, it is none the less dreadfully slow for
> large-scale modifications, having to rip it apart and put it back
> together all the time.

> [detailed description]

Sorry. Stack overflow. Must be too simple for me :-)
I think you should TIP it so that it survives this thread, and my new-
year-vacation aversion for complexity...

> Providing the type of each segment of the path is actually optional,
> and in the spirit of TCL's internal details hiding, it might be better
> represented as just {{#0 a} c 1 1}.

No. As Donal already pointed out, in EIAS world there is a type
ambiguity between list and dict for any string which is basically a
even-length list. Suppose $a is "a 0 c {0 {x x} 1 {y y}". If the first
"1" in your path is interpreted as a list index, the access should
return "x". If it is seen as a dict key, int should return "y"...

Hence, your paths need individual component types. But we can
abbreviate them "d" and "l" if you prefer !

> [again details, as requested ;-]

Ouch again...

> It's possibly a fair bit more work than your backref idea,

If by "a fair bit" you mean over ten times, then yes, I agree :-}

Again, I think the natural next step is to TIP.
My personal take on this is "way too complicated for the expected
benefit", but let others judge.
A TIP may stay in this state for years and pop up at some later time.
This will not likely happen with a c.l.t thread !

-Alex

Fredderic

unread,
Jan 2, 2008, 4:53:38 AM1/2/08
to
On Tue, 1 Jan 2008 07:53:29 -0800 (PST),
Alexandre Ferrieux <alexandre...@gmail.com> wrote:

> On Jan 1, 9:18 am, Fredderic <my-name-h...@excite.com> wrote:
>> On Mon, 31 Dec 2007 04:34:26 -0800 (PST),
>> Alexandre Ferrieux <alexandre.ferri...@gmail.com> wrote:
>>> Now that we know the two approaches are different, can you please
>>> re- describe yours down to the details, on the same example as I
>>> used for mine ?
>> For a pure TCL implementation, I already have.
>>      set a [dict create a 1 b 2 c [list [list 3 4 5] [list 6 7 8]]]]
> Sorry, but that's not exactly my example... But never mind ;-)

Yeah well, your example wasn't much use, as I mentioned later on. ;)


>>      set path {a {dict c} {list 1 1}}
>>      xset $path 777
>> My reference idea, is to essentially codify that into TCL core,
>> because although it works quite well, it is none the less
>> dreadfully slow for large-scale modifications, having to rip it
>> apart and put it back together all the time.
>> [detailed description]
> Sorry. Stack overflow. Must be too simple for me :-)

Ahhh..... *nods* Too simple... Must have been. :)


> I think you should TIP it so that it survives this thread, and my
> new-year-vacation aversion for complexity...

Not sure I'm quite that brave just yet... *chuckles*


>> Providing the type of each segment of the path is actually optional,
>> and in the spirit of TCL's internal details hiding, it might be
>> better represented as just {{#0 a} c 1 1}.
> No. As Donal already pointed out, in EIAS world there is a type
> ambiguity between list and dict for any string which is basically a
> even-length list. Suppose $a is "a 0 c {0 {x x} 1 {y y}". If the first
> "1" in your path is interpreted as a list index, the access should
> return "x". If it is seen as a dict key, int should return "y"...
> Hence, your paths need individual component types. But we can
> abbreviate them "d" and "l" if you prefer !

"EIAS world"? That term escapes me just now... Though I think I can
guess. You didn't really need the example there, I know exactly what
you mean.

The trouble with abbreviating it, is that it's supposed to be future
resistant, and someone might add a new type that also happens to start
with the letter d or l. ;)

In my script implementation, it's sorted out using a [switch]
statement. In the core implementation, it would be hopefully using the
names TCL types have for themselves internally. In the case of lists
and dicts, they're exactly that, "list" and "dict". Using those terms
allows a direct lookup in the registered types list, for validation and
type coercion, as needed. But, as you have reminded me (I did mention
there was some alcohol involved in my reasoning), the whole thing fails
miserably (for both of our ideas, I think), if someone goes and changes
the string rep at some level.


>> It's possibly a fair bit more work than your backref idea,
> If by "a fair bit" you mean over ten times, then yes, I agree :-}

I'm actually not sure it's quite as much as you think, or that yours is
quite as simple. In I get some time I'll try to put together the most
basic painfully slow case as an extension. If I manage to pull that
off, then I might feel brave enough TIP it myself.

In the meantime, I've gone and shot myself in the foot. I pointed out
a flaw in an embedded web server's design, so now I get to fix it. ;)


> Again, I think the natural next step is to TIP. My personal take on
> this is "way too complicated for the expected benefit", but let
> others judge. A TIP may stay in this state for years and pop up at
> some later time. This will not likely happen with a c.l.t thread !

Yeah, I know... There's a few issues that need to be ironed out,
though, before it becomes a really viable idea. :(


Fredderic

suchenwi

unread,
Jan 2, 2008, 6:46:04 AM1/2/08
to
On 2 Jan., 10:53, Fredderic <my-name-h...@excite.com> wrote:
> "EIAS world"? That term escapes me just now... Though I think I can
> guess. You didn't really need the example there, I know exactly what
> you mean.

EIAS == Everything Is A String (the Tcl mantra)

Alexandre Ferrieux

unread,
Jan 2, 2008, 6:56:32 AM1/2/08
to
On Jan 2, 10:53 am, Fredderic <my-name-h...@excite.com> wrote:
> On Tue, 1 Jan 2008 07:53:29 -0800 (PST),
>
> Alexandre Ferrieux <alexandre.ferri...@gmail.com> wrote:
>
> > No. As Donal already pointed out, in EIAS world there is a type
> > ambiguity between list and dict for any string which is basically a
> > even-length list.
>
> "EIAS world"?  That term escapes me just now...  Though I think I can
> guess.

Everything Is A String. This means the semantics of Tcl can be
entirely reasoned about at the level of string reps. Internal reps are
just accelerator tricks. This precludes any polymorphic primitive on
Tcl values. The previous sentence is not meant as criticism ;-)

> the whole thing fails
> miserably (for both of our ideas, I think), if someone goes and changes
> the string rep at some level.

Not for both ;-)
In my case, if someone goes and changes a string rep, by definition
the corresponding internal rep will be killed, which means the proper
cleanup of all its forward an back refs. So no "failure", just a large
side-effet which is known from day one by whoever's interested in COW
and efficiency.

> I'm actually not sure it's quite as much as you think, or that yours is
> quite as simple.

In the meantime, I have imagined a "lite" version of the Mutability
approach:

Forget about the mirror types (mlist,mdict...).
Just add a boolean flag in all Tcl_Objs, saying DONT_CACHE_STRING_REP.
The idea is that "not caching string rep" == "mutable".
For most objects, it stays False and you get the usual immutable, COW
semantics.

But we provide a primitive named "mutable", taking a single argument,
and returning a "copy" (or not if not shared) with the flag set to
True:

set d [mutable [dict create a 1 b 2]]

Then you manipulate those objects according to their usual "types" and
semantics, except for three things:

(1) As the flag's name implies, their string rep is immediately
invalidated after each use;

(2) This allows them to disable COW. So, for dicts and lists, this
means that [lset] and [dict set] etc. will react differently to the
object being shared. Instead of duplicating it, they will now operate
in-place;

(3) The flag "recurses up", which means only a mutable container can
contain a mutable object. The converse is allowed without restriction:
a mutable container can contain both mutable and immutable values.
Thus

set l [list $d $d]

with the above $d raises an error "Attempt to insert a mutable value
into an immutable list".
while

set l [mutable list 1 2 3]
lappend l [list a b c]

is allowed.

As you can see, the implementation of this consists of:

(a) defining the flag in the Tcl_Obj structure
(b) invalidating string reps immediately after use for the flagged
objects.
(c) modifying the COW behavior of "in-place" operations like
[lset] and [dict set]
(d) enforcing contagion of mutability in [list] [lappend] etc...

That's it: no backrefs, no doubly-chained whatever, no scanning, no
lifecycle worries...
(I admit that (b) needs some thought to become reasonbaly modular)

-Alex

miguel

unread,
Jan 2, 2008, 10:48:37 AM1/2/08
to

You do not seem to understand the utmost difficulty (is it actually
impossible? I think so myself.) to do that in the inmutable-values-COW
world of Tcl. The ONLY solution (either explicit, or behind the scenes)
is what Donal describes.

OTOH, I'd be delighted to be proved wrong by an actual implementation of
your deep-references, which I'd rush to commit after verifying that it
is actually correct. :)

Fredderic

unread,
Jan 3, 2008, 1:26:25 PM1/3/08
to
On Wed, 2 Jan 2008 03:56:32 -0800 (PST),
Alexandre Ferrieux <alexandre...@gmail.com> wrote:

> As you can see, the implementation of this consists of:
> (a) defining the flag in the Tcl_Obj structure
> (b) invalidating string reps immediately after use for the flagged
> objects.
> (c) modifying the COW behavior of "in-place" operations like
> [lset] and [dict set]
> (d) enforcing contagion of mutability in [list] [lappend] etc...
> That's it: no backrefs, no doubly-chained whatever, no scanning, no
> lifecycle worries...
> (I admit that (b) needs some thought to become reasonbaly modular)

I actually quite like that idea... Although adding a flag to Tcl_Obj
hits every structure, even simple ints, just as badly as the backrefs
would (one of the main reasons I preferred my refs idea over yours).
Actually, I get the feeling, possibly even worse. The backrefs wouldn't
need to be checked constantly like the flag does.

There might be one option, though. It might be possible to mix your
ideas; use a second internal rep type, per compound structure, and
have the internal type structure carry the flag. Everything else can
be identical between mutable and non-mutable lists/dicts/etc.
Literally, all four existing members of the Tcl_ObjType structure
would contain the exact same values, only the new Tcl_ObjType pointer
would contain a different value.

The [mutable] command would take a single argument, and if it's a
container with a mutable variant, then "upgrade" it. The list and dict
manipulations would need to be aware of both forms, and adjust
appropriately (as you already mentioned). The flag could contain one of
two values; either NULL or a Tcl_ObjType.

I'm fragging tired right now, so I'm not sure if I'm thinking right.
But if the "flag" for any type that supports mutability (both the
mutable and immutable one), contained the Tcl_ObjType of the mutable
version, it should do the trick quite neatly. For types that don't have
mutable versions (such as a simple int that isn't a container at all),
the flag would simply be NULL (meaning don't bother).

That should make [mutable]'s job easy; just change the type (since the
types are otherwise identical, even down to the functions they
invoke). And I think for your contagion in (d), it should work also.
If the flag of the base value contains a Tcl_ObjType different to the
current Tcl_ObjType, then it's an immutable container, and can be
left alone (this will also be true for a NULL mutability flag). If the
value of the flag is itself, then it's a mutable container, which
activates the special handling. If you then try to add a value that
has a non-null mutable flag, you need to make sure that its mutable
flag is the same type as the value itself, and "upgrade" it to its
mutable version if not.

And the best part, is that there's extremely little bloat anywhere.
I've got plenty of cases of 10's of thousands of Tcl_Objs kicking
around. That's 10's of thousands of extra flags, very few of which
will ever be set. This variant of your idea effectively turns a value
that's already present in each and every one of them, into the flag.
(Unless anyone knows any other flags they were thinking of putting into
a Tcl_Obj itself, but were afraid to ask...?)


The only downside is that I believe my refs idea can be implemented in
its basic form as an extension (which I hope to do if I get a chance).
I don't think yours can. At least not without even more work to fudge
it, than required in the real in-core implementation. ;)

If there's a TIP to be done, I think this is the more ready one to do.
I still think it's less TCLish than my refs idea, but it certainly does
sound lighter, and that's worth a lot in my book.


Fredderic

Alexandre Ferrieux

unread,
Jan 3, 2008, 5:25:35 PM1/3/08
to
On Jan 3, 7:26 pm, Fredderic <my-name-h...@excite.com> wrote:
> On Wed, 2 Jan 2008 03:56:32 -0800 (PST),
>
> Alexandre Ferrieux <alexandre.ferri...@gmail.com> wrote:
> > As you can see, the implementation of this consists of:
> >     (a) defining the flag in the Tcl_Obj structure
> >     (b) invalidating string reps immediately after use for the flagged
> > objects.
> >     (c) modifying the COW behavior of "in-place" operations like
> > [lset] and [dict set]
> >     (d) enforcing contagion of mutability in [list] [lappend] etc...
> > That's it: no backrefs, no doubly-chained whatever, no scanning, no
> > lifecycle worries...
> > (I admit that (b) needs some thought to become reasonbaly modular)
>
> I actually quite like that idea...  Although adding a flag to Tcl_Obj
> hits every structure, even simple ints, just as badly as the backrefs
> would (one of the main reasons I preferred my refs idea over yours).

Right. In fact I had somehow hoped there would be room for a bit in an
existing integer field, without even looking at the structure :-) Now
looking at it, there's the high bit of refCount. There's also the low
bit of all pointers (assuming pointers are at least 2-aligned... don't
know if this breaks on some exotic processors).
But none of these tricks is satisfactory anyway, since they add
frequent bit-masking operations !

So I like your flag-in-the-type idea very much.

> Actually, I get the feeling, possibly even worse.  The backrefs wouldn't
> need to be checked constantly like the flag does.

No. The flag would only be checked by some specific routines of some
specific types, ie wherever the flag induces a change in behavior ((b),
(c),(d) above). Everywhere else it would just be ignored (or zeroed in
another new primitive [immutable], reverse of [mutable], doing just a
copy and demoting to non-mutable state). Moreover, the backrefs would
*cost* at unlink time.

> There might be one option, though.  It might be possible to mix your
> ideas; use a second internal rep type, per compound structure, and
> have the internal type structure carry the flag.  Everything else can
> be identical between mutable and non-mutable lists/dicts/etc.

Very nice indeed. Just like [mlist], [mdict], etc. but without giving
them mirror named constructors; a perfect case of OO inheritance, with
the subclasses being hidden enough to let the script level imagine
that they are just [list] and [dict]. Gets my vote !

> Literally, all four existing members of the Tcl_ObjType structure
> would contain the exact same values, only the new Tcl_ObjType pointer
> would contain a different value.
> The [mutable] command would take a single argument, and if it's a
> container with a mutable variant, then "upgrade" it.  The list and dict
> manipulations would need to be aware of both forms, and adjust
> appropriately (as you already mentioned).
> The flag could contain one of two values; either NULL or a Tcl_ObjType.

Very smart. Gets my vote again !

> I'm fragging tired right now, so I'm not sure if I'm thinking right.

Let us just imagine when you're *not* tired ;-)

> But if the "flag" for any type that supports mutability (both the
> mutable and immutable one), contained the Tcl_ObjType of the mutable
> version, it should do the trick quite neatly.  For types that don't have
> mutable versions (such as a simple int that isn't a container at all),
> the flag would simply be NULL (meaning don't bother).
>
> That should make [mutable]'s job easy; just change the type (since the
> types are otherwise identical, even down to the functions they
> invoke).

Yes. Maybe for readability and symmetry (because of [immutable]), I
would be ready to waste a few more bytes per Tcl_ObjType (for which
memory is no issue), and instead add *two* fields:

struct Tcl_ObjType *alterEgo;
int flags; /* IS_MUTABLE == (flags & 1) */

The idea being that (1) mutability is a more readable test, and (2)
both [mutable] and [immutable] can have a very simple, type-agnostic
implementation (alterEgo pointing to the x-mutable counterpart).

> If the flag of the base value contains a Tcl_ObjType different to the
> current Tcl_ObjType, then it's an immutable container, and can be
> left alone (this will also be true for a NULL mutability flag).  If the
> value of the flag is itself, then it's a mutable container, which
> activates the special handling.  If you then try to add a value that
> has a non-null mutable flag, you need to make sure that its mutable
> flag is the same type as the value itself, and "upgrade" it to its
> mutable version if not.

Hmmm... Contagion works the other way around (that's OK, we both are
tired ;-)

A mutable container can contain either kind of values. An immutable
container cannot contain mutable values, and moreover cannot promote
them to this status because of the absence of backrefs. Indeed, the
immutable container may itself be contained in one or several other
(immutable) containers, without knowing them. So it could not make
them mutable in turn...

So the only "special handling" to be done is to *forbid* (by raising
an error) any attempt to "poke" a mutable value into an immutable
container (through one of the constructors, or one of the COW-setters
like [lset] and [dict set]).

> And the best part, is that there's extremely little bloat anywhere.
> I've got plenty of cases of 10's of thousands of Tcl_Objs kicking
> around.  That's 10's of thousands of extra flags, very few of which
> will ever be set.  This variant of your idea effectively turns a value
> that's already present in each and every one of them, into the flag.
> (Unless anyone knows any other flags they were thinking of putting into
> a Tcl_Obj itself, but were afraid to ask...?)

Again, I have a slight preference for the more traditional "flags"
bitfield rather than pointer comparison (with the added benefit of
symmetrically switching mutability since each type knows its alter
ego).
Moreover, this makes 31 bits available for those "afraid to ask" ;-)

> The only downside is that I believe my refs idea can be implemented in
> its basic form as an extension (which I hope to do if I get a chance).
> I don't think yours can.  At least not without even more work to fudge
> it, than required in the real in-core implementation.  ;)

Fully agreed. But I had not considered seriously adding such a core
feature as mutability from within an extension...

> If there's a TIP to be done, I think this is the more ready one to do.
> I still think it's less TCLish than my refs idea, but it certainly does
> sound lighter, and that's worth a lot in my book.

OK. Since you contributed the key part making it realistic, I suggest
that you write the TIP (with my blessing and help if needed ;-)

Thanks a lot !

-Alex


>
> Fredderic

Alexandre Ferrieux

unread,
Jan 3, 2008, 7:22:53 PM1/3/08
to
On Jan 3, 7:26 pm, Fredderic <my-name-h...@excite.com> wrote:
> On Wed, 2 Jan 2008 03:56:32 -0800 (PST),
>
> Alexandre Ferrieux <alexandre.ferri...@gmail.com> wrote:
> > As you can see, the implementation of this consists of:
> >     (b) invalidating string reps immediately after use for the flagged
> > objects.
> > (I admit that (b) needs some thought to become reasonbaly modular)

A small addition to this:

After looking at the way GetStringFromObj is used in the code
(especially within ListObj.c and DictObj.c), it turns out that
invalidating "immediately after use" is next to impossible. Indeed:

(1) GetStringFromObj returns storage which is kept track of via the -
>bytes of a Tcl_Obj. So if we start zeroing obj->bytes at some early
point, we need another means not to leak the returned string...

(2) List (and Dict, which are a close relative) stringifiers happen
to be calling GetStringFromObj twice for each and every element. This
makes sense since so far the string rep is cached... It would be a
shame to break that assumption !

Now, I have a way out. Instead of invalidating the string
"immediately", just arrange for invalidating it on next access *if* at
least one mutable container has been poked into in between. Yes, it's
an interp-wide criterion. It *is* paranoid, in that it may invalidate
the reps of untouched containers, but at this cost it is *valid*,
while being reasonably easy to implement:

- keep a global mutables_gencount integer field in the interp
- bump this gencount whenever a mutable is poked into
- store a strep_gencount field in Dict and List intreps (not
Tcl_Obj's !) saying at which time the strep was computed
- in TclGetStringFromObj, in case of a mutable container, reach
down the intrep to get this strep_gencount, and first invalidate the
strep if mutables_gencount!=strep_gencount.

That may sound like a long shot, but it's little more C than English,
and I think it does the job.

-Alex

Fredderic

unread,
Jan 4, 2008, 1:03:23 AM1/4/08
to
On Thu, 3 Jan 2008 16:22:53 -0800 (PST),
Alexandre Ferrieux <alexandre...@gmail.com> wrote:

> On Jan 3, 7:26 pm, Fredderic <my-name-h...@excite.com> wrote:
>> On Wed, 2 Jan 2008 03:56:32 -0800 (PST),
>> Alexandre Ferrieux <alexandre.ferri...@gmail.com> wrote:
>>> As you can see, the implementation of this consists of:
>>>     (b) invalidating string reps immediately after use for the
>>> flagged objects.
>>> (I admit that (b) needs some thought to become reasonbaly modular)
> A small addition to this:
> After looking at the way GetStringFromObj is used in the code
> (especially within ListObj.c and DictObj.c), it turns out that
> invalidating "immediately after use" is next to impossible. Indeed:

Yeah, I realised as soon as I hit Send, that there was something I'd
forgotten to raise... :(

I've never poked around the actual source... Haven't been brave enough
to even download it. If I have the source of tools I work with, I tend
to start spending more time poking around within them, than using them
to do what I need to do. ;)


> (1) GetStringFromObj returns storage which is kept track of via the -
> bytes of a Tcl_Obj. So if we start zeroing obj->bytes at some early
> point, we need another means not to leak the returned string...
> (2) List (and Dict, which are a close relative) stringifiers happen
> to be calling GetStringFromObj twice for each and every element. This
> makes sense since so far the string rep is cached... It would be a
> shame to break that assumption !

*nods* Personally, I think the best bet for dealing with this might be
to add a flag to the Tcl_ObjType indicating that the type should ALWAYS
be asked to update its string rep, and have GetStringFromObj() check
this flag as well as the non-existence of the string rep (which would
be a sliver quicker that digging into the type structure to find this
single-bit flag). That would allow string rep lifetime management to
be optionally put squarely in the hands of the value it's representing.


> Now, I have a way out. Instead of invalidating the string
> "immediately", just arrange for invalidating it on next access *if* at
> least one mutable container has been poked into in between. Yes, it's
> an interp-wide criterion. It *is* paranoid, in that it may invalidate
> the reps of untouched containers, but at this cost it is *valid*,
> while being reasonably easy to implement:
> - keep a global mutables_gencount integer field in the interp
> - bump this gencount whenever a mutable is poked into
> - store a strep_gencount field in Dict and List intreps (not
> Tcl_Obj's !) saying at which time the strep was computed
> - in TclGetStringFromObj, in case of a mutable container, reach
> down the intrep to get this strep_gencount, and first invalidate the
> strep if mutables_gencount!=strep_gencount.
> That may sound like a long shot, but it's little more C than English,
> and I think it does the job.

That'd probably do the trick... Though trying to arrange for that
directly, the way you suggest, would be nasty. But my suggestion just
above should handle the issue rather neatly;

I've never liked generation counters... They've always felt flawed to
me, especially when they're ticking away somewhere like this. I wonder
how long it'd take to wrap a 32-bit generation counter with a script
that's running a fairly tight update loop? I know it's a long shot,
the counter would have to land EXACTLY on the same value for it to
miss. But if a large heavily-used TCL project, say tclhttpd, were to
fall afoul of it, it'd be an absolute bitch of a bug to track down.

I'm not sure what anyone else thinks, but I'd be willing to accept ALL
mutable value string reps being invalidated as soon as any mutable
object changes. It could be accomplished quite easily by having a
Tcl_Obj pointer within the internal rep, that forms a linked list. The
head would be held by the interp, and there'd need to be a function in
Tcl_ObjType to obtain the next Tcl_Obj in the chain (since it's held
by the internal representation). The values update string rep handler,
which is being invoked on EVERY attempt to read the string rep, would
first check whether it has a string rep already. If it does, it'd
invoke the core function to run the invalidation list, and then check
again, simply returning it if it's still there. If either check turned
up as not having an existing string rep, then it'd update the string
rep as per usual, and add itself to the list.

The reason for the double-check, is that the interp would have an
"a mutable value has changed" flag, that any mutable value would call a
function to set any time a change was made. So long as that function
isn't called (meaning the flag is still clear), any subsequent fetches
of the string rep would still invoke the core mass invalidator, but
wouldn't result in any string reps being flushed.

In the case of a value without a string rep, containing one that
does, the first update string rep function would see that it doesn't
have a string rep, so would continue on to generating a new one. It'd
fetch the string rep of its child, which would immediately run the
invalidation list. If no changes had been made to any mutable values
since it had generated its, then it'd simply return its string rep and
all's well. If the mass invalidator detects that a mutable value has
been changed, then it'd invalidate this second values string rep, the
second check would fail, and it would go and regenerate its own string
rep. Either way, the first string rep updater would pick up the value,
build its own string rep, add itself to the list, and be done. So long
as no more mutable values change, you could continue to read the string
rep of either value to your hearts content. They'd invoke the mass
invalidator every time, but it'd just be returning without action.


Fredderic

Alexandre Ferrieux

unread,
Jan 4, 2008, 6:22:54 PM1/4/08
to
On Jan 4, 7:03 am, Fredderic <my-name-h...@excite.com> wrote:
> On Thu, 3 Jan 2008 16:22:53 -0800 (PST),
>
> Alexandre Ferrieux <alexandre.ferri...@gmail.com> wrote:
>
> >  (1) GetStringFromObj returns storage which is kept track of via the -
> > bytes of a Tcl_Obj. So if we start zeroing obj->bytes at some early
> > point, we need another means not to leak the returned string...
> >  (2) List (and Dict, which are a close relative) stringifiers happen
> > to be calling GetStringFromObj twice for each and every element. This
> > makes sense since so far the string rep is cached... It would be a
> > shame to break that assumption !
>
> Personally, I think the best bet for dealing with this might be
> to add a flag to the Tcl_ObjType indicating that the type should ALWAYS
> be asked to update its string rep, and have GetStringFromObj() check
> this flag as well as the non-existence of the string rep (which would
> be a sliver quicker that digging into the type structure to find this
> single-bit flag).  That would allow string rep lifetime management to
> be optionally put squarely in the hands of the value it's representing.

How does this address the GetString-twice problem ?

> > Now, I have a way out. Instead of invalidating the string
> > "immediately", just arrange for invalidating it on next access *if* at
> > least one mutable container has been poked into in between. Yes, it's
> > an interp-wide criterion. It *is* paranoid, in that it may invalidate
> > the reps of untouched containers, but at this cost it is *valid*,
> > while being reasonably easy to implement:
> >     - keep a global mutables_gencount integer field in the interp
> >     - bump this gencount whenever a mutable is poked into
> >     - store a strep_gencount field in Dict and List intreps (not
> > Tcl_Obj's !) saying at which time the strep was computed
> >     - in TclGetStringFromObj, in case of a mutable container, reach
> > down the intrep to get this strep_gencount, and first invalidate the
> > strep if mutables_gencount!=strep_gencount.
> > That may sound like a long shot, but it's little more C than English,
> > and I think it does the job.
>
> That'd probably do the trick...  Though trying to arrange for that
> directly, the way you suggest, would be nasty.  But my suggestion just
> above should handle the issue rather neatly;

Can you explain how you avoid the double string computation in list
and dict stringifiers ?

> I've never liked generation counters...  They've always felt flawed to
> me, especially when they're ticking away somewhere like this.  I wonder
> how long it'd take to wrap a 32-bit generation counter with a script
> that's running a fairly tight update loop?  I know it's a long shot,
> the counter would have to land EXACTLY on the same value for it to
> miss.  But if a large heavily-used TCL project, say tclhttpd, were to
> fall afoul of it, it'd be an absolute bitch of a bug to track down.

I admit I have indulged into ignoring that 1/2^32 chance :-)

> I'm not sure what anyone else thinks, but I'd be willing to accept ALL
> mutable value string reps being invalidated as soon as any mutable

> object changes.  It could be accomplished quite easily (!)

Sorry, you lost me again... But at a macroscopic scale, it looks like
your linked invalidation list is not much easier to maintain than
backrefs, which have the nice property of invalidating exactly what
deserves it. I may be mistaken of course; just trying a random shot in
the dark ;-)

-Alex

Fredderic

unread,
Jan 11, 2008, 1:32:51 AM1/11/08
to
On Fri, 4 Jan 2008 15:22:54 -0800 (PST),
Alexandre Ferrieux <alexandre...@gmail.com> wrote:

Urgh... I've been out of the loop for a bit. This is gonna hurt...
*chuckles*


> On Jan 4, 7:03 am, Fredderic <my-name-h...@excite.com> wrote:
>> On Thu, 3 Jan 2008 16:22:53 -0800 (PST),
>> Alexandre Ferrieux <alexandre.ferri...@gmail.com> wrote:
>>>  (1) GetStringFromObj returns storage which is kept track of via
>>> the - bytes of a Tcl_Obj. So if we start zeroing obj->bytes at
>>> some early point, we need another means not to leak the returned
>>> string... (2) List (and Dict, which are a close relative)
>>> stringifiers happen to be calling GetStringFromObj twice for each
>>> and every element. This makes sense since so far the string rep
>>> is cached... It would be a shame to break that assumption !
>> Personally, I think the best bet for dealing with this might be
>> to add a flag to the Tcl_ObjType indicating that the type should
>> ALWAYS be asked to update its string rep, and have
>> GetStringFromObj() check this flag as well as the non-existence of
>> the string rep (which would be a sliver quicker that digging into
>> the type structure to find this single-bit flag).  That would allow
>> string rep lifetime management to be optionally put squarely in the
>> hands of the value it's representing.
> How does this address the GetString-twice problem ?

The present fix for the problem, is to cache the string rep in the
Tcl_Obj. This suggestion doesn't change that. The string rep is still
being cached, still in the usual place, and can still be invalidated by
the usual means. All my suggestion does, is give the update string rep
function a chance to decide for itself, whether the string rep is still
valid, and to rebuild if it deems it necessary.

Assuming nothing gets changed (other than other string reps being
generated) between those two GetString's, the string rep updater itself
will simply return the string as-is on the second round.


>>> Now, I have a way out. Instead of invalidating the string
>>> "immediately", just arrange for invalidating it on next access
>>> *if* at least one mutable container has been poked into in
>>> between. Yes, it's an interp-wide criterion. It *is* paranoid, in
>>> that it may invalidate the reps of untouched containers, but at
>>> this cost it is *valid*, while being reasonably easy to implement:
>>>     - keep a global mutables_gencount integer field in the interp
>>>     - bump this gencount whenever a mutable is poked into
>>>     - store a strep_gencount field in Dict and List intreps (not
>>> Tcl_Obj's !) saying at which time the strep was computed
>>>     - in TclGetStringFromObj, in case of a mutable container,
>>> reach down the intrep to get this strep_gencount, and first
>>> invalidate the strep if mutables_gencount!=strep_gencount.
>>> That may sound like a long shot, but it's little more C than
>>> English, and I think it does the job.
>> That'd probably do the trick...  Though trying to arrange for that
>> directly, the way you suggest, would be nasty.  But my suggestion
>> just above should handle the issue rather neatly;
> Can you explain how you avoid the double string computation in list
> and dict stringifiers ?

The same way yours does. I haven't changed your idea, just shifted it
a little. Instead of making GetStringFromObj do some funky magic, allow
it the ability to palm a little more control over to the value methods
themselves. More flexibility, less magic.


>> I've never liked generation counters...  They've always felt flawed
>> to me, especially when they're ticking away somewhere like this.  I
>> wonder how long it'd take to wrap a 32-bit generation counter with
>> a script that's running a fairly tight update loop?  I know it's a
>> long shot, the counter would have to land EXACTLY on the same value
>> for it to miss.  But if a large heavily-used TCL project, say
>> tclhttpd, were to fall afoul of it, it'd be an absolute bitch of a
>> bug to track down.
> I admit I have indulged into ignoring that 1/2^32 chance :-)

These are always the worst bugs to find... I used to be quite fond of
generation counters, until an app would run reliably for a few weeks,
and then lock itself up in a tight loop counting from 0 to 4 or 5 or
something, the wrong way, every time the app refreshed its window. So
I'd kill it, adjust the position of a few logging messages, recompile,
and let it run for another couple weeks. ;)

Since then I generally use an internal list. It's about the same size,
"is it my generation" and "update my generation" operations are just as
fast. The only kicker is moving onto the next generation takes a bit
of a hit.


>> I'm not sure what anyone else thinks, but I'd be willing to accept
>> ALL mutable value string reps being invalidated as soon as any
>> mutable object changes.  It could be accomplished quite easily (!)
> Sorry, you lost me again... But at a macroscopic scale, it looks like
> your linked invalidation list is not much easier to maintain than
> backrefs, which have the nice property of invalidating exactly what
> deserves it. I may be mistaken of course; just trying a random shot in
> the dark ;-)

lol I think it falls somewhere in between your backrefs, and
generation counting. It uses the same amount of storage as the
generation counting, which is still less than the backrefs, if I
understand your idea for them correctly. And the maintenance cost
shouldn't be too much worse than generation counting, either, and a
fair bit better than maintaining backrefs for everything (otherwise
you wouldn't have suggested generation counting in the first place ;) ).


Fredderic

Alexandre Ferrieux

unread,
Jan 11, 2008, 5:51:18 PM1/11/08
to
On Jan 11, 7:32 am, Fredderic <my-name-h...@excite.com> wrote:
>
> Since then I generally use an internal list.  It's about the same size,
> "is it my generation" and "update my generation" operations are just as
> fast.  The only kicker is moving onto the next generation takes a bit
> of a hit.

Can you elaborate on this idea in abstracto, outside the current Tcl-
mutables-string-reps context, so that I can grok the idea once for
all ? Fewer than 500 words please, I'm getting short-stacked these
days ;-)

-Alex

Alexandre Ferrieux

unread,
Jan 12, 2008, 5:08:04 AM1/12/08
to
On Jan 11, 11:51 pm, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:

> On Jan 11, 7:32 am, Fredderic <my-name-h...@excite.com> wrote:
>
> > Since then I generally use an internal list.  It's about the same size,
> > "is it my generation" and "update my generation" operations are just as
> > fast.  The only kicker is moving onto the next generation takes a bit
> > of a hit.

OK, forget my previous request, I get it now. Sorry for being slow
again.
The key interpretation item I didn't grok immediately, was that the
semantics of the list you're maintaining is: "all mutables whose
string rep is non-NULL". Then:

- whenever a mutable is modified, a purge occurs, the list
being scanned in full and destroyed on the go, while invalidating the
string reps of its elements.

- whenever a mutable is asked to produce its string rep, by
definition if it's not already there, then the mutable deserves to be
put on the purge list. So there is a bijection between mutable-string-
rep computations and additions to the list. Hence there is no need to
check whether a given object is already on the list or not: the non-
nullity of its string rep embodies this boolean test. *That* is the
key I had missed previously, and that's why I believed this approach
was just as complex as backrefs !

- when a mutable is *freed*, technically no purge is needed,
but housekeeping is due to remove it from the list if it has a string
rep. There a O(N) scanning is unavoidable, but that's not worse (and
actually a constant factor better) than the purge itself.

So, I receive you loud and clear now, and I do think that your purge-
list solution is *very* nice, superior to both backrefs and generation
counting.

However, I have just discovered yet another roadblock (but now I'm
confident you'll dodge it ;-).
The problem is that of cyclic structures: while it's easy to enforce
upwards contagion (only mutables may contain mutables), within a group
of mutables we must not allow a reference cycle; the reason is
twofold:

1) cyclic structures need special machinery to have finite string
reps
(doable but really complex)

2) cyclic structures completely beat refcounting gc.
(that one is the killer)

So the question is now: how do we *detect* those cycles ?
The canonical way is of course to scan the transitive closure of
forward *or* backward references.
Now there are differences between the two:
- forward references are already there, but can be *very* numerous
(containers are here to store enough grain for winter, right ?)
- backrefs, on the contrary, should be rather few. But they're a
mess to implement since they must closely monitor Incr/DecrRefCount
everywhere.

Hmm, come to think of it, maybe we can do with forward refs here.
Indeed, when we're about to insert mutable B as child of mutable A, in
the forward-ref case we have to scan B.
If this "plumbing" occurs at the beginning of B's life, then B might
not be that big...
Yeah, I'm starting to see this as an acceptable isolation of the worst
case, which is to insert a fully-loaded mutable B (say a list of one
million items) in one or several mutables A1, A2, etc...

Now, yet another approach, of course, is just to *document* the issues
of cyclicity (and just detect it ast string-rep-production time):

WARNING - it is possible to produce cyclic structures
with mutable containers. Don't do this because such
objects will leak memory, and trying to convert them
to strings will raise an error.

Do you agree ? Ready for a team TIP ?

-Alex


Fredderic

unread,
Jan 14, 2008, 3:17:03 PM1/14/08
to
On Sat, 12 Jan 2008 02:08:04 -0800 (PST),
Alexandre Ferrieux <alexandre...@gmail.com> wrote:

> On Jan 11, 11:51 pm, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
> wrote:
>> On Jan 11, 7:32 am, Fredderic <my-name-h...@excite.com> wrote:
>>> Since then I generally use an internal list.  It's about the same
>>> size, "is it my generation" and "update my generation" operations
>>> are just as fast.  The only kicker is moving onto the next
>>> generation takes a bit of a hit.
> OK, forget my previous request, I get it now. Sorry for being slow
> again. The key interpretation item I didn't grok immediately, was
> that the semantics of the list you're maintaining is: "all mutables
> whose string rep is non-NULL". Then:
> - whenever a mutable is modified, a purge occurs, the list
> being scanned in full and destroyed on the go, while invalidating the
> string reps of its elements.

The only real alternative I can think of to guarantee no generation
counter glitches, would be to rather painfully locate each and every
mutable Tcl_Obj, and wipe out its string rep every time the generation
counter wraps. But I don't think I much care for that idea.


> - whenever a mutable is asked to produce its string rep, by
> definition if it's not already there, then the mutable deserves to be
> put on the purge list. So there is a bijection between mutable-string-
> rep computations and additions to the list. Hence there is no need to
> check whether a given object is already on the list or not: the non-
> nullity of its string rep embodies this boolean test. *That* is the
> key I had missed previously, and that's why I believed this approach
> was just as complex as backrefs !

There are two options, here.

You could use the space you were previously looking at for your backref
or generation counter to carry an internal list. In that case, if the
internal purge list pointer is non-NULL (it'll be pointing to the
next value on the list, but that fact is irrelevant here), then we're on
the list. At purge time the list would be followed, and simply nulled
out as we went along, with no allocation or releasing of memory
required for maintaining the list itself. But yes, it would probably
be easier at that point to just add backrefs. (That's something I never
quite got with your generation counter idea. I never could understand
how it improved a whole lot on the original backrefs.)

The external list idea is easier, I think, and can be readily
accelerated using a dynamic array (possibly even a standard TCL
list, if I understand their nature correctly), to avoid having to
allocate and release two pieces of memory for each string rep added or
purged. You'd simply allocate a chunk at a time, fill it up with
modified mutable Tcl_Obj's, and then move onto another. There may even
be some benefit to purging old filled-up chunks from the main loop, for
those event-driven type apps.


> - when a mutable is *freed*, technically no purge is needed,
> but housekeeping is due to remove it from the list if it has a string
> rep. There a O(N) scanning is unavoidable, but that's not worse (and
> actually a constant factor better) than the purge itself.

Personally, I was thinking of simply performing a purge. But that's a
jolly good idea. ;)

In either case, you could simply tie it into the free-internal-rep
function that already exists for every internal rep type.


> However, I have just discovered yet another roadblock (but now I'm
> confident you'll dodge it ;-).
> The problem is that of cyclic structures: while it's easy to enforce
> upwards contagion (only mutables may contain mutables), within a group
> of mutables we must not allow a reference cycle; the reason is
> twofold:
> 1) cyclic structures need special machinery to have finite string
> reps (doable but really complex)

I don't see how that's any different to the situation now. Although
it'd confuse the hell out of backrefs - that's one thing I was going
to poke at you in the beginning, but subsequently forgot all about.
The generation counter idea should be totally immune to it. But ...

I think the purge list idea should be just as safe as TCL is right now
(which may or may not be saying much ;) ). The reason is that it's
tied 1-to-1 with string rep creation.

If we're careful to add ourselves to the purge list after we've
constructed our string rep, just before we actually store it in the
Tcl_Obj, but only if we don't have a string rep already (which will
happen if we've already been visited due to the structure being
cyclic), we should be immune also.

It's descending the structure in the first place, where it blows up,
nothing at all to do with mutability, or purge lists, etc. Both purge
lists and generation counters can be made safe due to their being
essentially unordered.

The issue has been in my mind since for debugging my version of the
[future] extension, I have an undocumented [future info] command that
dumps the Tcl_Obj fields: type->name, bytes, and refCount. It's quite
interesting, actually, I've been running around [future info]ing all
sorts of interesting things. But it also provides the Tcl_Obj itself
as a "value" field, which, while not itself cyclic, did get me thinking
about it.


> 2) cyclic structures completely beat refcounting gc.
> (that one is the killer)
> So the question is now: how do we *detect* those cycles ?

How's it done right now?

% set x [list abc def ghi]
abc def ghi
% lset x 1 $x
abc {abc def ghi} ghi
% lset x 1 $x
abc {abc {abc def ghi} ghi} ghi
% future info $x
type list value {abc {abc {abc def ghi} ghi} ghi} refs 2

There SHOULD, to my understanding, be extra references to $x due to
being cyclic. Instead there's just the usual two. I'm guessing
there's some magic going on to break cyclic structures before they
form... Such a structure should be either non-displayable (which
obviously isn't the case here), or non-constructable (which I think is
what's happening). At a guess, I'd say the cycle is being detected as
it's built, and a copy forced.


> Hmm, come to think of it, maybe we can do with forward refs here.
> Indeed, when we're about to insert mutable B as child of mutable A, in
> the forward-ref case we have to scan B. If this "plumbing" occurs at
> the beginning of B's life, then B might not be that big... Yeah, I'm
> starting to see this as an acceptable isolation of the worst case,
> which is to insert a fully-loaded mutable B (say a list of one
> million items) in one or several mutables A1, A2, etc...

Probably. But as I said, cyclic structures are possible NOW. So I'm
wondering how they're being handled, now.


> Do you agree ? Ready for a team TIP ?

Hmmm..... Not at all sure how the whole TIP process works. I flipped
through a few pages of the WIKI on TIPping, a couple days ago, but don't
know that I really get what's expected, and I'm not sure I want to be
the one to do the reference implementation. ;)

I'm also still not sure I like the idea as a whole, it feels to me like
it breaks "the TCL way" in some respects. The implementation side of
it seems to be well and truly nailed down. But I'm not sure I like the
usability aspects. My old references idea was an extension to an
existing well-defined and well-understood concept. This mutability idea
still seems very wishy-washy to me. The need to transition values
between mutable and immutable, and the fact that it's a magic state
with no visible impact at the script level (a reference, on the other
hand, needs to be specifically invoked to bring out its magic), worries
me a little.

If you use the regular manipulators (set, append, lappend, etc.), most
existing scripts can be expected to trash your data. And if you add
mutable manipulators, then you need to duplicate all the existing
methods of in-place manipulation of values in TCL. Unless you only
allow [m(utable-)set], and anything more complicated will involve
copying the value to be manipulated (it's not too bad, if you stick
to only manipulating leaf values). But try to add a new branch to a
large mutable tree that way. The only way I can think of to avoid this
issue, is to have a "mutate" command which rips a value out of the
structure containing it, invalidating any string reps as it does,
invokes a script on that value to perform the manipulation, and then
plugs it back in where it fits.

But that's all coming back to the semantics of my references, where you
can't just pluck out any old value and fiddle with it. You need to
extend a reference to the value you want, make it magic, and then
manipulate the value. Essentially the same way you do with [upvar].

Also I don't know how references could be done any better. In their
basic form, they can reach any value, without regard for mutability,
you simply have to watch the existing rules about sharing. But in that
form, they're also slow. Not too bad for the odd access here and there,
but atrocious for anything more substantial. Although I suppose you
could apply the same purge list technique to the reference backref
caches, and use the same marking technique to indicate there are
references about. But there's still a problem dealing with shared
reference segments along the path you're actively manipulating.

But go ahead and start the mutability TIP if you like. I think it's
definitely viable.


Fredderic

Alexandre Ferrieux

unread,
Jan 14, 2008, 5:20:32 PM1/14/08
to
On Jan 14, 9:17 pm, Fredderic <my-name-h...@excite.com> wrote:
>
> You could use the space you were previously looking at for your backref
> or generation counter to carry an internal list.  In that case, if the
> internal purge list pointer is non-NULL (it'll be pointing to the
> next value on the list, but that fact is irrelevant here), then we're on
> the list.  At purge time the list would be followed, and simply nulled
> out as we went along, with no allocation or releasing of memory
> required for maintaining the list itself.  But yes, it would probably
> be easier at that point to just add backrefs.  (That's something I never
> quite got with your generation counter idea.  I never could understand
> how it improved a whole lot on the original backrefs.)

The difference is that backrefs have to be maintained whenever the
plumbing between containers is done or un-done. And this plumbing, as
I said earlier, is currently not done through a well-define central
Plumb/UnPlumb pair of functions. Rather, it's just "copy pointer, then
Tcl_IncrRefCount()"... While the generation counter needs no
housekeeping. Too bad it has that blind spot at 2^32 :-(

Please notice that heavy as they might look, backrefs are order of
magnitude more efficient on the wasted-stringrep scale than the purge
list when stringification is frequent. But granted, that situation can
be assumed to be rare and/or pathological, so I'm 100% with the purge
list now :-)

>
> The external list idea is easier, I think, and can be readily
> accelerated using a dynamic array (possibly even a standard TCL
> list, if I understand their nature correctly), to avoid having to
> allocate and release two pieces of memory for each string rep added or
> purged.

Tcl lists have the extra constraint of contiguity (for O(1) [lindex]),
which means that they are rather painfully reallocated (hence copied)
every time they reach their current physical size. While the purge
list needs no random access, just a sequential one, so Lisp-like
linked lists are a better match IMHO. Then, the individual "cons"
cells can be allocate by chunks like you suggest (and like Tcl_Obj's).

> > The problem is that of cyclic structures: while it's easy to enforce
> > upwards contagion (only mutables may contain mutables), within a group
> > of mutables we must not allow a reference cycle; the reason is
> > twofold:
> >  1) cyclic structures need special machinery to have finite string
> > reps (doable but really complex)
>
> I don't see how that's any different to the situation now.

No. Currently Tcl is unable to build cycles, thanks to the unblinking
gaze of COW semantics.
Even if you try to poke the "root" of a nested list into one of its
leaves with [lset], by the time you get to the guts of the [lset]
procedure, the list value has a refcount of at least 2, and lset
copies it accordingly.

> It's descending the structure in the first place, where it blows up,

No. Nothing blows up since no cycles are created (as you correctly
guessed later)...

> > Do you agree ? Ready for a team TIP ?
>
> Hmmm.....  Not at all sure how the whole TIP process works.

Can you elaborate ? What's wrong with TIPs ?

> I'm also still not sure I like the idea as a whole, it feels to me like
> it breaks "the TCL way" in some respects.  The implementation side of
> it seems to be well and truly nailed down.  But I'm not sure I like the
> usability aspects.

Ouch - cold shower :-{

> My old references idea was an extension to an
> existing well-defined and well-understood concept.  This mutability idea
> still seems very wishy-washy to me.

> But go ahead and start the mutability TIP if you like. I think it's
> definitely viable.

OK. Thanks anyway for taking part in the brainstorming so far !

-Alex

Donal K. Fellows

unread,
Jan 14, 2008, 6:30:42 PM1/14/08
to
Fredderic wrote:

> Alexandre Ferrieux wrote:
> > Do you agree ? Ready for a team TIP ?
>
> Hmmm..... Not at all sure how the whole TIP process works. I flipped
> through a few pages of the WIKI on TIPping, a couple days ago, but don't
> know that I really get what's expected, and I'm not sure I want to be
> the one to do the reference implementation. ;)

The whole point of a TIP is to try to record the architectural design
decisions so that when we do maintenance in a few years, we can figure
out what the creator of some code intended. (It was triggered by
finding quite a bit of code in 8.4a1 whose purpose was unclear and
where we couldn't contact the author any longer.) Anyway, writing a
TIP's not much harder at all than writing up your ideas on a wiki
page; experience suggests that the hard bit is getting to the point
where you have a design and patch as opposed to an experiment and a
bunch of badly-written hacks. ;-)

Donal.

It is loading more messages.
0 new messages