Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

sorting a list of dicts by a nested key...

217 views
Skip to first unread message

Andreas Leitgeb

unread,
Jan 27, 2022, 12:48:23 PM1/27/22
to
This is one problem I had, but completely worked around it,
by changing it to nested lists (and lsort -index ...).

Nevertheless, I'm curious, what would be the best approach to
get from outset:

set list {{key 42 a 1} {key 54 b 5} {c 6 key 30}}

to the list sorted by the value for "key" from the nested dicts:

{c 6 key 30} {key 42 a 1} {key 54 b 5}

So far I can only think of lsort -command ...

heinrichmartin

unread,
Jan 27, 2022, 5:34:52 PM1/27/22
to
Not an answer, but sharing my experience. I mostly avoid lists of dicts and prefer dicts of dicts by using the key in the outer-most structure, i.e.

set data {42 {key 42 a 1} 54 {key 54 b 5} 30 {c 6 key 30}}

I assume that [dict keys] is quite cheap as long as $data does not shimmer ... forearch key [lsort [dict keys $data]] {set record [dict get $data $key]; ...}

Andreas Leitgeb

unread,
Jan 27, 2022, 6:31:45 PM1/27/22
to
Thanks!
Very good point!

Compared to [lsort -command ...], even an extra iteration to turn the
outer list to a dict would have been a breeze...

Harald Oehlmann

unread,
Jan 28, 2022, 2:13:18 AM1/28/22
to
Not an answer neither. If you arange tk get key at the first position
always, you can use lsort -index 1.

My big data structures are normally array of dicts, to:
- restrict shimmering on the toplevel key level (array elements does not
shimmer the whole array)
- then use highly nested dicts to have "dict index a(b) $i1 $i2 $i3".

Harald
-

Andreas Leitgeb

unread,
Jan 28, 2022, 3:34:45 AM1/28/22
to
Harald Oehlmann <wort...@yahoo.de> wrote:
> Not an answer neither. If you arange to get key at the first position
> always, you can use lsort -index 1.

That's a bit dirty, isn't it? ;-) shimmering all nested dicts...

I named it "key" but that was meant in the sense of sort-key,
not in the sense of fixed object key.

It may change between one sort task and the next one.

If the value for key changes, the key will move to the right
within the dict, so "keeping the key at front" may not be as
easy in practise.

There is also a possible extra challenge, if keys aren't necessarily
unique. While it wouldn't matter in your approach, Martin's approach
then would have to be adapted to an outer dict holding lists of
inner dicts of same key. Will likely still beat the lsort -command ...

> My big data structures are normally array of dicts, to:
> - restrict shimmering on the toplevel key level (array elements does not
> shimmer the whole array)

To some degree, arrays may do more shimmering harm than dicts...
in particular the keys for arrays are always shimmered to string,
unlike for dicts... - at least that's what I've gathered here in
the past. Not entirely sure, if still true.

> - then use highly nested dicts to have "dict index a(b) $i1 $i2 $i3".

"dict get", or did I miss any latest features?

some of the dict-subcommands unfortunately are not able to reach out
into nested dicts...

Just recently I used an array of single-level-dicts, to be able to "incr"
leaf-elements indexed by two keys.

Thanks for joining into the discussion.

Harald Oehlmann

unread,
Jan 28, 2022, 3:51:59 AM1/28/22
to
Am 28.01.2022 um 09:34 schrieb Andreas Leitgeb:
> To some degree, arrays may do more shimmering harm than dicts...
> in particular the keys for arrays are always shimmered to string,
> unlike for dicts... - at least that's what I've gathered here in
> the past. Not entirely sure, if still true.

That is an important point, but I suppose, this is also true for dicts.
They both use a hash-table for internal organisation and the hash-table
requires always a string representation.

Thank you,
Harald

Andreas Leitgeb

unread,
Jan 28, 2022, 5:17:14 AM1/28/22
to
There is a subtle difference between "shimmering to string" and merely
obtaining a string for the hash and then not storing the string back
into the object.

Adding a string-rep to an object is not always an improvement.
Existing string-reps sometimes inhibit alternative "direct routes"
from some object to another.

e.g.: Given a dict object with a string-rep, it is possible, that the
string-rep was originally a list's string-rep that still contains
duplicate keys in the dict-sense, but these are perfectly legal for
a list. Therefore, the list would be re-parsed from the string rep
rather than from the dict-rep.

Without a string-rep, the list can (or at least could) be directly
obtained from the dict.

Harald Oehlmann

unread,
Jan 28, 2022, 5:29:43 AM1/28/22
to
Andreas,

thank you, good point,
Harald

heinrichmartin

unread,
Jan 28, 2022, 5:52:50 AM1/28/22
to
On Friday, January 28, 2022 at 11:17:14 AM UTC+1, Andreas Leitgeb wrote:
> Adding a string-rep to an object is not always an improvement.
> Existing string-reps sometimes inhibit alternative "direct routes"
> from some object to another.
>
> e.g.: Given a dict object with a string-rep, it is possible, that the
> string-rep was originally a list's string-rep that still contains
> duplicate keys in the dict-sense, but these are perfectly legal for
> a list. Therefore, the list would be re-parsed from the string rep
> rather than from the dict-rep.
>
> Without a string-rep, the list can (or at least could) be directly
> obtained from the dict.

One never stops learning. This means it actually makes a difference whether I start a dict with [dict create] or not (namely no string rep vs string rep).
On the other hand, I can still stick to initialising short constant dicts with braces (rather than [dict create]), because it will not be measurable in e.g.,

set defaults {foo 1 bar 2} ;# that is imho more readable than [dict create foo 1 bar 2]

Harald Oehlmann

unread,
Jan 28, 2022, 6:32:06 AM1/28/22
to
Yes, it makes a difference:

% set defaults {foo 1 bar 2}
% tcl::unsupported::representation $defaults
value is a pure string with a refcount of 4, object pointer at 007D4430,
string representation "foo 1 bar 2"
% dict get $defaults foo
1
% tcl::unsupported::representation $defaults
value is a dict with a refcount of 4, object pointer at 007D4430,
internal representation 040879E0:00000000, string representation "foo 1
bar 2"

Now without string interpretation:
Remark the funny ";list" at the end of the command to avoid to put the
result as command result, which then would be converted by the sehll to
a string for printing, which would add a string representation

% set defaults [dict create foo 1 bar 2];list
% tcl::unsupported::representation $defaults
value is a dict with a refcount of 7, object pointer at 007D4430,
internal representation 040879E0:00000000, string representation "foo 1
bar 2"

Harald

Harald Oehlmann

unread,
Jan 28, 2022, 6:58:57 AM1/28/22
to

Am 28.01.2022 um 12:32 schrieb Harald Oehlmann:
> set defaults [dict create foo 1 bar 2];list
> % tcl::unsupported::representation $defaults

Sorry, my trick with the ";list" did not work, there is still a string
representation. Aim was to avoid this.
I may check out later, sorry,
Harald

Andreas Leitgeb

unread,
Jan 28, 2022, 7:05:36 AM1/28/22
to
heinrichmartin <martin....@frequentis.com> wrote:
> On Friday, January 28, 2022 at 11:17:14 AM UTC+1, Andreas Leitgeb wrote:
>> Adding a string-rep to an object is not always an improvement.

> One never stops learning. This means it actually makes a difference
> whether I start a dict with [dict create] or not (namely no string rep
> vs string rep).

Indeed it does, but only for that specific initial dict.

you can see it, if you:
set d {a 1 a 2 a 3}
dict size $d ;# -> 1
llength $d ;# -> 6, because it used orig string-rep.

As soon as you add further keys, or just change the value
for a key, you get a new dict and that new dict will then
be free of string-rep, even if the orig one was unshared
and thus modified in-place. Any mod on the object kills the
string rep.

This goes a step further, in that, when you create an empty
dict [dict create] or empty list [list], tcl will just return
a shared ref to the empty string object, because it doesn't
matter for those small objects.

Andreas Leitgeb

unread,
Jan 28, 2022, 8:06:05 AM1/28/22
to
Harald Oehlmann <wort...@yahoo.de> wrote:
> Now without string interpretation:
> Remark the funny ";list" at the end of the command to avoid to put the
> result as command result, which then would be converted by the sehll to
> a string for printing, which would add a string representation
>
> % set defaults [dict create foo 1 bar 2];list
> % tcl::unsupported::representation $defaults
> value is a dict with a refcount of 7, object pointer at 007D4430,
> internal representation 040879E0:00000000, string representation "foo 1
> bar 2"

I was surprised, that there even is a string rep in that case, so I
checked that I could reproduce it (yes, I could), and then tried
another one:

% set defaults [dict create foo [expr {1+1}] bar 4];list
% tcl::unsupported::representation $defaults
value is a dict with a refcount of 2, object pointer at 0x55d874107190,
internal representation 0x55d874172b60:(nil), no string representation

Apparently, there exists a special case, where all key and value
elements of a new dict are all strings themselves, and then a string-rep
of the dict is (unnecessarily?) pre-generated.

heinrichmartin

unread,
Jan 28, 2022, 8:21:16 AM1/28/22
to
On Friday, January 28, 2022 at 2:06:05 PM UTC+1, Andreas Leitgeb wrote:
Nope, note the refcount 7. Always start such tests in a fresh tclsh - or even better in a file.
But even in a fresh tclsh, I can reproduce the string rep ... and demonstrate how Tcl re-uses objects.

$ tclsh
% tcl::unsupported::representation [dict create foo bar]
value is a dict with a refcount of 3, object pointer at 0x10306b0, internal representation 0x10afad0:(nil), string representation "foo bar"
% tcl::unsupported::representation [dict create foo bar]
value is a dict with a refcount of 4, object pointer at 0x10306b0, internal representation 0x10afad0:(nil), string representation "foo bar"
% tcl::unsupported::representation [dict create foo bar]
value is a dict with a refcount of 5, object pointer at 0x10306b0, internal representation 0x10afad0:(nil), string representation "foo bar"
% tcl::unsupported::representation [dict {*}{create foo bar}]
value is a dict with a refcount of 6, object pointer at 0x10306b0, internal representation 0x10afad0:(nil), string representation "foo bar"
% tcl::unsupported::representation [dict {*}{create foo bar}]
value is a dict with a refcount of 7, object pointer at 0x10306b0, internal representation 0x10afad0:(nil), string representation "foo bar"
% tcl::unsupported::representation [dict {*}{create foo baz}]
value is a dict with a refcount of 3, object pointer at 0x10cc9e0, internal representation 0x10b18d0:(nil), string representation "foo baz"

Your special case seems to be crafted by the interactive mode - not even non-normalized string reps in the command could break that.
But the interpreter on its own behaves as expected:

$ cat test.tcl
set a [dict create foo 1 bar 2]
puts [tcl::unsupported::representation $a]
puts $a
puts [tcl::unsupported::representation $a]
puts [tcl::unsupported::representation {foo 1 bar 2}]

$ tclsh test.tcl
value is a dict with a refcount of 2, object pointer at 0x10d0610, internal representation 0x10e9ec0:(nil), no string representation
foo 1 bar 2
value is a dict with a refcount of 2, object pointer at 0x10d0610, internal representation 0x10e9ec0:(nil), string representation "foo 1 bar 2"
value is a pure string with a refcount of 1, object pointer at 0x10d0eb0, string representation "foo 1 bar 2"

Andreas Leitgeb

unread,
Jan 28, 2022, 8:58:02 AM1/28/22
to
heinrichmartin <martin....@frequentis.com> wrote:
> Your special case seems to be crafted by the interactive mode -
> not even non-normalized string reps in the command could break that.

I did try to alter values first, using different numbers for value of
bar, and always got a string rep, despite the ";list"-trick to avoid
stringification of result in the shell. Only when I computed one of
the inner values, it worked as expected.

I fail to explain this with interactive mode alone. If it were that,
then 2 versus [expr {1+1}] wouldn't make a reproducible difference,
either.

Interactive mode, however, may of course play a part, namely by not
compiling the commands... - The interpreted path may well take a
different route for dict create with fixed arguments, than for dict
create with computed ones.

Nevertheless, this is no bug, as long as the difference can only be seen
by tcl::unsupported::-means.

Eric

unread,
Jan 28, 2022, 9:00:06 AM1/28/22
to
Hi,

This is due to management of literals table.
I had the same "issue", trying to reduce size of generated code using [list] instead of {}.
See https://core.tcl-lang.org/tcl/tktview?name=a64ef79518

Eric
0 new messages