proc x { l } {
    foreach e $l {
        set a($e) 1
    }
    return [llength [array names a]]
}
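A quick sanity check (untested here; each call should print the number
of distinct elements):

puts [x {1}]         ;# 1
puts [x {1 1 1 1}]   ;# 1
puts [x {1 2}]       ;# 2
puts [x {1 2 1}]     ;# 2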
Not too bad, I guess. Any other thoughts?
Chris
Probably not very helpful, but TclX has a function (lrmdups) which removes
duplicate values from lists, so you could say:
llength [lrmdups "1 2 2 3 3"]
=> 3
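For completeness, a minimal sketch of how that might look in a script,
assuming TclX is installed (the package is loaded as Tclx):

package require Tclx

# lrmdups returns a copy of the list with duplicates removed
puts [llength [lrmdups {1 2 2 3 3}]]   ;# prints 3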
Aleks
Hi Chris,
Maybe llength [lsort -unique $lst] ?
Hm, on second thought, while this is shorter to write (and hence may
be deemed more "elegant"), it will probably be less efficient in most
cases - sorting has a complexity of O(n log n) (IIRC), which is in any
case worse than O(n). So your above approach, with its O(n) complexity,
isn't "too bad".
HTH
Helmut Giese
Why the need for an elegant function? Does this one not perform well
enough for your specific need? Or maybe the right question is, why do
you not think that's an elegant solution?
Given an 8000-element list with each item having 8 non-sequential
duplicates, the above takes only 8ish ms on my middle-of-the-road
laptop. That seems pretty good for any sort of data I typically
manipulate. Are you dealing with huge amounts of data, or have to call
this function in a big loop?
If you want a one-liner (if "one-liner == elegant"), you can try this:
[llength [lsort -unique $list]]
... but that, somewhat surprisingly, takes longer than your brute force
solution on the same data set (11ms vs 8ms).
proc x {l} {
    llength [lsort -unique $l]
}
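For anyone who wants to reproduce the comparison, here is a rough sketch
of one possible timing harness (the proc names and the synthetic test
list are made up here to approximate the "8000 elements, 8 non-sequential
duplicates each" shape described above, not the actual data set):

proc count_array {l} {
    foreach e $l {set a($e) 1}
    llength [array names a]
}

proc count_sort {l} {
    llength [lsort -unique $l]
}

# synthetic test list: 1000 distinct values, 8 copies each,
# interleaved so the duplicates are not adjacent
set big {}
for {set i 0} {$i < 8} {incr i} {
    for {set j 0} {$j < 1000} {incr j} {
        lappend big $j
    }
}

# average each approach over 100 runs
puts "array approach: [time {count_array $big} 100]"
puts "lsort approach: [time {count_sort $big} 100]"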
Michael
> I'm looking for an elegant and efficient way to determine how many
> distinct elements are in a list. For example [x [list 1]] and [x
> [list 1 1]] and [x [list 1 1 1 1]] should all return 1, [x [list 1 2]]
> and [x [list 1 2 1]] should return 2. One brute force approach that
> suggests itself (untested) is:
>
> proc x { l } {
>     foreach e $l {
>         set a($e) 1
>     }
>     return [llength [array names a]]
> }
The only slight improvement I can imagine would be this:
proc count_uniq {list} {
    foreach $list [list [unset list]] break
    llength [info locals]
}
I use something similar (without the llength) to find the set of unique
values in a list. It's been able to handle anything I've thrown at it
to date. (The little bit of trickery with [list] is because foreach
insists on having at least one real item in its list of values)
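A quick usage sketch (hypothetical; note that the trick relies on the
list elements being usable as plain local variable names, and an empty
input list will make foreach complain that its varlist is empty):

# uses count_uniq exactly as defined above
puts [count_uniq {1 2 2 3 3}]   ;# 3
puts [count_uniq {a b a}]       ;# 2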
Fredderic
> > I'm looking for an elegant and efficient way to determine how many
> > distinct elements are in a list. For example [x [list 1]] and [x
> > [list 1 1]] and [x [list 1 1 1 1]] should all return 1, [x [list 1
> > 2]] and [x [list 1 2 1]] should return 2. One brute force approach
> > that suggests itself (untested) is:
> Why the need for an elegant function? Does this one not perform well
> enough for your specific need? Or maybe the right question is, why do
> you not think that's an elegant solution?
The general view of elegance seems to be either code efficiency, or
code readability. I tend to switch views depending on what I'm writing
at the time. For a small generic function that does a specific task
which could prove handy in all sorts of unforeseen circumstances (and
whose entire purpose is summed up by the procedure's one-word name
alone), I like to go for the efficiency definition. For a
more complex task used in specific cases as part of a larger process, I
go for code readability.
Besides which, in this case (typical of this sort of generic utility
function), I find the most "readable" versions to be only slightly
easier to read than any of the highly efficient and "unreadable"
versions I could imagine off the top of my head anyhow.
> If you want a one-liner (if "one-liner == elegant"), you can try this:
> [llength [lsort -unique $list]]
> ... but that, somewhat surprisingly, takes longer than your brute
> force solution on the same data set (11ms vs 8ms).
In most cases, the instant you mention the term "sort", efficiency
pretty much goes out the window (unless you can cheat somehow, or
unless the alternative would be very, very ugly indeed). ;)
Fredderic
Are you aware of the time command? It answers that question very easily.
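For example, a minimal sketch (time runs the script the given number of
times and reports the average in microseconds per iteration):

set lst {1 2 2 3 3 4 4 4}
puts [time {llength [lsort -unique $lst]} 1000]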
Strangely, I find TkX but not TclX is in [package names]. I imagine
that means my installation is screwed up and I'll look into that.
I like this solution for its expressiveness: it says what I'm doing:
"count the items in the list after removing duplicates".
That expressiveness is one aspect of what I consider "elegant".
Efficiency and maintainability are others.
Thanks all.