Of dicts and arrays

Alexandre Ferrieux

unread,

May 12, 2007, 6:36:38 PM5/12/07

to

Hi,

After reading the rationale behind dicts in TIP 111
http://www.tcl.tk/cgi-bin/tct/tip/111.html , I'm a bit frustrated.
Sure enough, arrays not being first-class values is annoying (and
since Python has them, we couldn't survive the insult very long,
right ?).

But shall we keep two hashtable types side-by-side in the core of the
language ?
Isn't there a way of *implementing* arrays based on dicts, since
obviously the younger brother is brighter ?

-Alex

miguel sofer

unread,

May 12, 2007, 6:52:59 PM5/12/07

to

On May 12, 7:36 pm, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:
> Hi,
>
> After reading the rationale behind dicts in TIP 111http://www.tcl.tk/cgi-bin/tct/tip/111.html, I'm a bit frustrated.

> Sure enough, arrays not being first-class values is annoying (and
> since Python has them, we couldn't survive the insult very long,
> right ?).
>
> But shall we keep two hashtable types side-by-side in the core of the
> language ?
> Isn't there a way of *implementing* arrays based on dicts, since
> obviously the younger brother is brighter ?
>
> -Alex

Mmhhh ... tough.

A dict is a value, its elements are values. Values in Tcl can't carry
state.

An array is a variable, its elements are also variables. Variables can
carry state.

Think for instance of traces: you can trace a whole array, or specific
elements. What would be the semantics of tracing a value? How would
they be represented in the string rep (remember EIAS)?

Alexandre Ferrieux

unread,

May 12, 2007, 7:05:25 PM5/12/07

to

On May 13, 12:52 am, miguel sofer <miguel.so...@gmail.com> wrote:
> On May 12, 7:36 pm, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
> wrote:
>
> > Isn't there a way of *implementing* arrays based on dicts, since
> > obviously the younger brother is brighter ?
>

> A dict is a value, its elements are values. Values in Tcl can't carry
> state.
> An array is a variable, its elements are also variables. Variables can
> carry state.

Yes, but I was thinking about faking the copy-on-write semantics by
enforcing a refcount of always 1.
That is, implement the array as something that uses a dict internally,
but never exposes a reference to the dict value.
This way, for the [set a($x) $y] operation, an in-place variant of
[dict set] can be defined and used, since the refcount is 1.

> Think for instance of traces: you can trace a whole array, or specific
> elements. What would be the semantics of tracing a value? How would
> they be represented in the string rep (remember EIAS)?

I don't want to trace dicts, keys, or values. I'm not talking about
merging the notions of dicts and array. I am aware of the heavy
machinery needed to implement variables. I'm only after a
reimplementation of arrays which leverages the existence of dicts...

-Alex

Neil Madden

unread,

May 12, 2007, 7:11:31 PM5/12/07

to

Hmm... both *are* implemented using the same hashtable implementation
though, right?

Oh, and there's namespaces too, of course...

-- Neil

sleb...@gmail.com

unread,

May 12, 2007, 7:17:32 PM5/12/07

to

On May 13, 6:36 am, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:

> But shall we keep two hashtable types side-by-side in the core of the
> language ?
> Isn't there a way of *implementing* arrays based on dicts, since
> obviously the younger brother is brighter ?

The problem lies with the fact that each element of an array is a
proper "variable" in and of itself while dicts, like lists, are
(conceptually) just specially formatted strings. Since nobody have yet
to figured out how to efficiently trace individual list elements like:

trace add variable {lindex $x 4} write {puts hello}

therefore nobody have yet figured out how to efficiently trace
individual dict elements.

Traceability is the only argument for keeping arrays. Indeed it is the
primary reason why arrays cannot be made first class values. And
traceability is important because it allows us to use individual array
elements in places like -textvariable and arguments to vwait.

see http://wiki.tcl.tk/10813

This have been proposed numerous times and a lot of people, myself
included, would love to see dicts being merged with arrays. But no
matter how many times this have been proposed nobody has yet proposed
how to resolve the difference between what-is-a-value and what-is-a-
variable which is key to the difference between arrays and dicts (or
list for that matter).

sleb...@gmail.com

unread,

May 12, 2007, 7:22:49 PM5/12/07

to

On May 13, 7:05 am, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:

But then, what exactly do you mean by:

Donal K. Fellows

unread,

May 12, 2007, 7:31:56 PM5/12/07

to

Alexandre Ferrieux wrote:
> Yes, but I was thinking about faking the copy-on-write semantics by
> enforcing a refcount of always 1.

Erk! That will either be desperately slow or have bizarre side-
effects. Or both. Tcl 8.* is really not designed to work like that...

> That is, implement the array as something that uses a dict internally,
> but never exposes a reference to the dict value.

Does it make you happier to think that both use a Tcl_HashTable
internally already? :-)

> I don't want to trace dicts, keys, or values. I'm not talking about
> merging the notions of dicts and array. I am aware of the heavy
> machinery needed to implement variables. I'm only after a
> reimplementation of arrays which leverages the existence of dicts...

I'd also like an array system that can have pluggable back-end
implementations, doing the plugging on a per-array basis through an
[array] subcommand, of course. I've not had time to work on making
this idea a reality (work intruded) but I think it should be fairly
straight forward as hardly any code looks inside the implementations
of arrays (I'm not even sure that Tcl_ExecuteByteCode does, and that
function gets its fingers in everywhere!) But the step from there to
the provision of a way to use [dict] itself as an array backend is
quite a big one unless you are willing to drop trivial serializability
(the trace metadata is required as much existing code rightfully
expects to be able to trace on individual array elements). Without
that serializability, what is the point in using dictionaries
themselves? You could just use the back-end directly.

Donal.

miguel sofer

unread,

May 12, 2007, 7:47:08 PM5/12/07

to

On May 12, 8:31 pm, "Donal K. Fellows" <donal.k.fell...@man.ac.uk>
wrote:
> ...

> of arrays (I'm not even sure that Tcl_ExecuteByteCode does, and that

> ...

I'm ready to fight you until I faint in a pool of my own blood before
letting TEBC become Tcl_ExecuteByteCode! It shouldn't even be
TclExecuteByteCode, hope to someday make it a static ExecuteByteCode
instead ;)

Alexandre Ferrieux

unread,

May 12, 2007, 7:57:20 PM5/12/07

to

On May 13, 1:31 am, "Donal K. Fellows" <donal.k.fell...@man.ac.uk>
wrote:

> Alexandre Ferrieux wrote:
>
> > I don't want to trace dicts, keys, or values. I'm not talking about
> > merging the notions of dicts and array. I am aware of the heavy
> > machinery needed to implement variables. I'm only after a
> > reimplementation of arrays which leverages the existence of dicts...
>
> I'd also like an array system that can have pluggable back-end
> implementations, doing the plugging on a per-array basis through an
> [array] subcommand, of course. I've not had time to work on making
> this idea a reality (work intruded) but I think it should be fairly
> straight forward as hardly any code looks inside the implementations
> of arrays (I'm not even sure that Tcl_ExecuteByteCode does, and that
> function gets its fingers in everywhere!)

Yes, I've seen your mention of this on the wiki page.
I find it very elegant. Too bad work intrudes !

> But the step from there to
> the provision of a way to use [dict] itself as an array backend is
> quite a big one unless you are willing to drop trivial serializability
> (the trace metadata is required as much existing code rightfully
> expects to be able to trace on individual array elements).

Why ? Imagine:

# define array a with a dict backend
array plug a getter setter
proc getter x {return [dict get $::my_secret_dict $x]}
proc setter {x y} {dict set-in-place $::my_secret_dict $x $y}

([dict set-in-place] works efficiently only for unshared dicts, it
falls back to [dict set] and hence copy for a shared one)
The trace metadata can be kept only on the array side. It does not
concern the dict below. Hence, it need not be serialized if some
intruder puts its hands onto $::my_secret_dict and (say) requires its
string rep.

> that serializability, what is the point in using dictionaries
> themselves? You could just use the back-end directly.

Elegance. And even interoperability, with extra commands like

array snap a
--> returns $::my_secret_dict, momentarily shared. Will be copied
on next write operation.

array swallow a $dict
--> sets ::my_secret_dict to $dict

Again, the idea is *not* that of http://wiki.tcl.tk/10813 , but rather
to rebuild arrays as an introspectable layer above dicts.
The key is the specific behavior of the putative [dict set-in-place].

-Alex

Neil Madden

unread,

May 12, 2007, 8:39:46 PM5/12/07

to

Alexandre Ferrieux wrote:
...

>> But the step from there to
>> the provision of a way to use [dict] itself as an array backend is
>> quite a big one unless you are willing to drop trivial serializability
>> (the trace metadata is required as much existing code rightfully
>> expects to be able to trace on individual array elements).
>
> Why ? Imagine:
>
> # define array a with a dict backend
> array plug a getter setter
> proc getter x {return [dict get $::my_secret_dict $x]}
> proc setter {x y} {dict set-in-place $::my_secret_dict $x $y}
>
> ([dict set-in-place] works efficiently only for unshared dicts, it
> falls back to [dict set] and hence copy for a shared one)

[dict set] could (and possibly already does) implement this kind of
optimisation anyway for unshared dicts. So the setter would look like:

proc setter {x y} { dict set ::my_secret_dict $x $y }

You'd presumably also want an [unset], and possibly things like [incr],
[append] etc to support things like [incr a(foo)].

> The trace metadata can be kept only on the array side. It does not
> concern the dict below. Hence, it need not be serialized if some
> intruder puts its hands onto $::my_secret_dict and (say) requires its
> string rep.

This does all seem quite possible, and quite nice, but whether it should
replace the current array implementation seems less clear.

>> that serializability, what is the point in using dictionaries
>> themselves? You could just use the back-end directly.
>
> Elegance. And even interoperability, with extra commands like
>
> array snap a
> --> returns $::my_secret_dict, momentarily shared. Will be copied
> on next write operation.
>
> array swallow a $dict
> --> sets ::my_secret_dict to $dict

How would these differ from the existing [array get/set]? I don't know
if these currently return/accept dicts or just lists, but it seems like
a reasonable optimisation.

-- Neil

Neil Madden

unread,

May 12, 2007, 8:52:54 PM5/12/07

to

Neil Madden wrote:

(Ugh, hate replying to myself but thought of this immediately after
hitting send...)

...

> You'd presumably also want an [unset], and possibly things like [incr],
> [append] etc to support things like [incr a(foo)].

...

Which leads to the idea that you'd specify an ensemble that supports all
these various commands -- i.e., much like the "dict" command itself:

# array plug varName ensemble initialValue
array plug a dict [dict create]

Which actually creates a variable "a" and stores the initial value in it
(and is used for traces etc). Then operations on the variable get
translated into calls to the ensemble passing in the fully-qualified
name of the variable, so e.g.:

set a(foo) 12 --> dict set ::a foo 12
incr a(foo) --> dict incr ::a foo
unset a(foo) --> dict unset ::a foo

-- Neil

sleb...@gmail.com

unread,

May 13, 2007, 4:05:08 AM5/13/07

to

On May 13, 7:57 am, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:
> Again, the idea is *not* that ofhttp://wiki.tcl.tk/10813, but rather

> to rebuild arrays as an introspectable layer above dicts.
> The key is the specific behavior of the putative [dict set-in-place].

Then congratulations, tcl 8.5 already does most of what you want. You
do realise that array and dict share the same back end don't you? I
don't get what your suggestion adds since from a programmer's
perspective we'll still have two separate hash types: arrays and
dicts. And their behavior will be exactly what we have now.

Michael Schlenker

unread,

May 13, 2007, 5:50:23 AM5/13/07

to

Neil Madden schrieb:

They differ in key types afaik, arrays use malloced char* keys, while
dicts use Tcl_Obj* keys. But otherwise they are using the same hashtable
implementation.

Michael

Alexandre Ferrieux

unread,

May 13, 2007, 1:41:44 PM5/13/07

to

On May 13, 2:52 am, Neil Madden <n...@cs.nott.ac.uk> wrote:

> Neil Madden wrote:
>
> Which leads to the idea that you'd specify an ensemble that supports all
> these various commands -- i.e., much like the "dict" command itself:
>

> set a(foo) 12 --> dict set ::a foo 12
> incr a(foo) --> dict incr ::a foo
> unset a(foo) --> dict unset ::a foo

Lovely ! And much more general than my initial request.
If you TIP, this, I'll put all the weight of my zero-voice, non-TCT-
membership to advocate it.

-Alex

Alexandre Ferrieux

unread,

May 13, 2007, 2:38:08 PM5/13/07

to

On May 13, 10:05 am, "slebet...@yahoo.com" <slebet...@gmail.com>
wrote:

You never refactor, do you ? ;-)

To answer your question, no, until Neil's reply yesterday I hadn't
realized how much they shared.
Since arrays are often said to be suboptimal regarding key allocation
and non-TclObj awareness, and dicts have all these good properties, I
assumed they were largely different. Apparently not. I stand
corrected.

Then I'll denature my initial request to a much milder one: make sure
that an efficient bidirectional bridge exists between arrays and
dicts.
The reason for this, is that when you have two very similar APIs, you
get a racemic usage. And more often than you'd like, when using two
libraries by different authors, you'll end up converting back and
forth between the two species.

As Neil pointed out, the bridge already exists in [array get]/[array
set], and they could transparently be redefined to use dicts instead
of even-sized lists (which have the same string rep). I don't know the
implementation enough to infer how easy it'll be to make the
conversion fast (i.e. without recomputation). My idea about using a
dict as backend for array gave precisely this for free.

-Alex