
lsort -unique without sort ...?


Anne Rozinat

Sep 19, 2001, 5:35:51 PM
Hello,

does anybody know if it is possible to use the [lsort -unique $mylist]
command without sorting anything?
I only want to remove duplicates from my list, but the initial order is
important!

Thanks.
Anne


Jeff Hobbs

Sep 19, 2001, 7:02:52 PM
Anne Rozinat wrote:
...

There is a command for that in TclX (lrmdups), but IIRC it is written
in pure Tcl, and there are faster varieties. Keeping the initial
order and removing dups is costly. The reason it's there in lsort
is that it was an easy option to add given a sorted list.
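
For context, a quick sketch of what lsort -unique actually does (the duplicates are removed, but only after sorting, which is exactly what the OP wants to avoid):

```tcl
set mylist {b a c a b}
lsort -unique $mylist
;# => a b c   (duplicates gone, but the original order is lost)
```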

--
Jeff Hobbs The Tcl Guy
Senior Developer http://www.ActiveState.com/
Tcl Support and Productivity Solutions

Bruce Hartweg

Sep 19, 2001, 8:03:44 PM
Anne Rozinat wrote:

This (untested) snippet should do the trick, but no guarantees on speed
for large lists:

set new {}
catch {unset tmpAry}
foreach elem $old {
    if {[info exists tmpAry($elem)]} {
        # already have this one
    } else {
        set tmpAry($elem) 1
        lappend new $elem
    }
}

Bruce

Les Cargill

Sep 19, 2001, 10:21:35 PM

Note: this is probably lousy usage ("\{" for constructing lists),
but it works (and exposes the power of the "-index" option):

What is the preferred method, anyhow, folks? I always end up back doing
this to construct lists, for some reason.

--- BEGIN EXCERPT ----
proc uniq_ue { l } {

    set interim ""
    set k 0
    foreach elem $l {
        set interim "$interim \{ $k $elem \}"
        incr k
    }
    set interim [lsort -index 1 -unique $interim]
    set interim [lsort -index 0 -unique $interim]
    set ret ""
    foreach elem $interim {
        set ret "$ret [lindex $elem 1]"
    }
    ## Garnish with an ending space so strings match
    set ret "$ret "
    return $ret
}


set l { fifteen two three five seven four eight seven two }
set l2 { fifteen three five four eight seven two }

set l3 [ uniq_ue $l ]
##This is test code, which takes advantage of string matching.
if {$l3 != $l2} {
    puts "Sorry, no dice."
    puts "Got <$l3>"
    puts "Expected <$l2>"
} else {
    puts "Worked"
}

## Perhaps, an exit
##exit

--- END EXCERPT ---
> Thanks.
> Anne

--
http://home.att.net/~lcargill

Bruce Hartweg

Sep 20, 2001, 3:51:10 AM

"Les Cargill" <lcar...@worldnet.att.net> wrote in message news:3BA9533B...@worldnet.att.net...

>
>
> Anne Rozinat wrote:
> >
> > Hello,
> >
> > does anybody know, if it is possible to use the [lsort -unique $mylist]
> > command, without sorting anything?
> > I only want to remove duplications from my list but initial order is
> > important!
> >
>
> Note: this is probably lousy usage ("\{" for constructing lists ),
> but it works ( and exposes the power of the "-index" option ):
>
Use the list command to create lists (and use lappend to grow lists),
e.g.:

set interim {}
set k 0
foreach elem $l {
    lappend interim [list $k $elem]
    incr k
}

(later you can also say "lappend ret [lindex $elem 1]")

Not only is lappend faster, the list command ensures proper quoting of $elem.
If the original list had a single element consisting of multiple words,
your method would not end up with a 2-element sublist, but instead
an (N+1)-element one (N = number of words in the original element).
(To see why this is bad, try running the following list through your code.)

set lst1 [list a b c "a multi-word element" d "some string" e "some other string"]
set lst2 [uniq_ue $lst1]
if {[string equal [string trim $lst1] [string trim $lst2]]} {
    puts "Everything OK"
} else {
    puts "Uh-oh, something screwed up!"
}

Also, instead of worrying about adding a space to make sure strings match
(which is fine in a known test example, but in actual code how do you
know if your string will need it?), it is easier to use a string trim on
both args at comparison time. (If you are really trying to compare 2 lists,
you might also want to use join to make sure they match as strings, because
the following 2 LISTS are the same:
set l1 { a b c d}
set l2 "a b c d"
but they will NOT be the same string. However, [join $l1] will be the same
as [join $l2].)
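
To make that concrete, here is the same point as a session sketch (using the l1/l2 values above):

```tcl
set l1 { a b c d}
set l2 "a b c d"
string equal $l1 $l2                 ;# => 0  (the leading space makes the strings differ)
string equal [join $l1] [join $l2]   ;# => 1  (joining normalizes the whitespace)
```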


A final note: since the OP stated that maintaining order is important, it is
worth noting that this method (even when fixed to ensure correctness) removes
the earlier duplicates, so at first glance the order 'seems' changed:

> set l { fifteen two three five seven four eight seven two }
> set l2 { fifteen three five four eight seven two }

To me the first list has 2 extra dupes at the end, but this removes the ones
earlier in the list, so it looks like some re-ordering has been done. This is
OK, but if order is important then you want to know whether it is keeping the
first occurrence of an element, or the last, or some middle value. I'm not even
sure that you will always get the last one (but probably so; it depends on the
implementation of the lsort -unique).
See my earlier post for a method that always keeps first occurrences.

Bruce


Tom Wilkason

Sep 20, 2001, 5:20:50 PM
"Anne Rozinat" <an...@pimpdaddy.de> wrote in message
news:9ob2sm$o52$03$1...@news.t-online.com...

# At the cost of memory you can use:
proc lunique {LIST} {
    set newList [list]   ;# initialize, so an empty input returns an empty list
    foreach item $LIST {
        if {![info exists hash($item)]} {
            lappend newList $item
            set hash($item) 1
        }
    }
    return $newList
}

# or at the cost of speed use
proc lunique {LIST} {
    set newList [list]
    foreach item $LIST {
        if {[lsearch -exact $newList $item] == -1} {
            lappend newList $item
        }
    }
    return $newList
}
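
Both versions keep the first occurrence of each element and preserve the original order, for example:

```tcl
lunique {2 2 3 3 5 5 5 2 2 1 1}
;# => 2 3 5 1
```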

Tom Wilkason

Les Cargill

Sep 20, 2001, 9:03:59 PM

Bruce Hartweg wrote:
>
<snip>

Thanks, Bruce. Excellent tips, all. I've developed a few bad habits.

> A final note, since the OP stated that maintaining order is important, it is good
> to note that this method (even when fixed to ensure correctness) seems to remove
> the initial duplicates so at first glance the order 'seems' changed
>
> > set l { fifteen two three five seven four eight seven two }
> > set l2 { fifteen three five four eight seven two }
>
> to me the first list has 2 extra dupes at the end, but this removes the ones
> earlier in the list so it looks like some re-ordering has been done. This is
> OK, but if order is important then you want to know if it is keeping the first
> occurrence of an element, or last, or some middle value. I'm not even sure
> that you will always get the last one (but probably so - depends on implementation
> of the lsort -unique).

I forgot that lsort is probably a quicksort at heart, and quicksort
is not stable.

> See my earlier post for a method that always keep first occurrences.
>
> Bruce

--
http://home.att.net/~lcargill

Donal K. Fellows

Sep 21, 2001, 5:59:09 AM
Les Cargill wrote:
> I forgot that lsort is probably a quicksort at heart, and quicksort
> is not stable.

[lsort] is *guaranteed* to be a stable sort with good performance even in
worst-cases like fully- or reverse-sorted lists. (It's merge-sort, not
quicksort.) THIS IS DOCUMENTED IN THE FIRST PARAGRAPH OF THE MANPAGE.
A quick "D'oh!" is in order, don't you think? :^)

Donal.
--
-- Itanium-based servers combined with Microsoft's latest server software
offer customers superior performance, greater choice, reliability and
investment protection at significantly lower costs than proprietary
solutions. -- Abhi Talwalkar (Intel VP)

morgan mair fheal

Sep 21, 2001, 6:32:21 AM
In article <9ob2sm$o52$03$1...@news.t-online.com>, "Anne Rozinat"
<an...@pimpdaddy.de> wrote:

proc unique L {
    foreach e $L {set A($e) 1}
    array names A
}


so many ways to code
so few megabytes to code in

Bruce Hartweg

Sep 21, 2001, 9:26:58 AM

"morgan mair fheal" <mair_...@www.yahoo.com> wrote in message news:mair_fheal-21...@c36.ppp.tsoft.com...

> In article <9ob2sm$o52$03$1...@news.t-online.com>, "Anne Rozinat"
> <an...@pimpdaddy.de> wrote:
>
> >Hello,
> >
> >does anybody know, if it is possible to use the [lsort -unique $mylist]
> >command, without sorting anything?
> >I only want to remove duplications from my list but initial order is
> >important!
>
> proc unique L {
> foreach e $L {set A($e) 1}
> array names A
> }
>
But the poster didn't want an order change in the list; this will
give the list back in some semi-random order, neither sorted nor
the original.
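
A sketch of the problem, reusing the proc from the quoted post (the exact output order is unspecified because it follows the internal hash table, so it may differ between runs and Tcl versions):

```tcl
proc unique L {
    foreach e $L {set A($e) 1}
    array names A
}
puts [unique {fifteen two three five seven four eight}]
;# prints each unique element once, but in hash-table order, not input order
```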

Bruce

[file delete /var/tmp/foo] Helfter

Sep 21, 2001, 9:38:03 AM
In article <mair_fheal-21...@c36.ppp.tsoft.com>,

mair_...@www.yahoo.com (morgan mair fheal) writes:
>In article <9ob2sm$o52$03$1...@news.t-online.com>, "Anne Rozinat"
><an...@pimpdaddy.de> wrote:
>
>>Hello,
>>
>>does anybody know, if it is possible to use the [lsort -unique $mylist]
>>command, without sorting anything?
>>I only want to remove duplications from my list but initial order is
>>important!
>
>proc unique L {
> foreach e $L {set A($e) 1}
> array names A
>}

I do not believe that this will maintain the original order of the list,
which was a requirement in the first post.


>
>
>so many ways to code
>so few megabytes to code in

--
Todd M. Helfter
Database Analyst/Programmer
Purdue University
t...@purdue.edu

Peter Gijsemans

Sep 21, 2001, 12:54:56 PM
This little proc keeps the order of the list and removes the doubles.

proc unique_list { original_list unique_list } {
    #
    # Takes as input a list, sorts it, and deletes all the doubles
    # so that the returned list contains unique elements only.
    #
    # @param original_list : name of the input list
    # @param unique_list   : name of the output list containing only
    #                        unique elements
    #

    # pass the lists by name
    upvar $original_list ol
    upvar $unique_list ul

    set unique_el ""

    foreach el [lsort $ol] {
        if { $unique_el != $el } {
            set unique_el $el
            lappend ul $el
        }
    }
}


--
_______________________________________________________
PETER GIJSEMANS ("`-/")_.-'"``-._
Alcatel Bell Space N.V. . . `; -._ )-;-,_`)
Berkenrodelei 33 (v_,)' _ )`-.\ ``-'
B-2660 Hoboken - Belgium _.- _..-_/ / ((.'
phone : ++32(0)3.829.51.87 __((,.-'___((,/_____________

Peter Gijsemans

Sep 21, 2001, 1:07:51 PM
Oops,

sorry, the previous procedure does scramble the list order.

Peter.

Raman Narayan

Sep 21, 2001, 12:41:03 PM
Hi Anne,

I don't know why no one has mentioned using lsort itself :)
instead of writing various hand-rolled flavors!

I hope this helps you:

proc nosort {val1 val2} {
    if {$val1 == $val2} {
        return 0
    } else {
        return 1
    }
}
set testlist [list 2 2 3 3 5 5 5 2 2 1 1]
lsort -command nosort -unique $testlist
=> 1 2 5 3 2   ;# unique list in reverse order

set uniquelist [lsort -command nosort -unique $testlist]

then just reverse the list as follows (or write an optimized "reverselist"):

set newlist {}
foreach el $uniquelist {
    set newlist [linsert $newlist 0 $el]
}
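
(Worth noting from a later vantage point: Tcl 8.5 added a built-in [lreverse] command, so on interpreters newer than the 8.3/8.4 of this thread the reversing loop above collapses to a single call:)

```tcl
set newlist [lreverse $uniquelist]
```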


Regards
Raman

Raman Narayan

Sep 21, 2001, 1:29:43 PM

Hi Peter,

I don't think it really works at preserving the order; in fact it would
miss out some elements if they get repeated! The problem seems to be due
to using "lsort" to scan through the list.

A testcase for you to check:
set testlist [list 2 2 3 4 4 5 5 3 3 2 2 4 4]
set test {}
unique_list testlist test

PS: I did post a solution you may want to check out.

Regards
Raman

Glenn Jackman

Sep 21, 2001, 3:50:45 PM
Raman Narayan <ram...@exar.com> wrote on [Fri, 21 Sep 2001 09:41:03 -0700]:
>Hi Anne,
>
> I don't know why anyone had ever mentioned using lsort itself :)
>instead of writing various flavors of sort!
>
>I hope this helps you:
>
>proc nosort {val1 val2} {
>if {$val1 == $val2} {
>return 0
>} else {
>return 1
>}
>}
>set testlist [list 2 2 3 3 5 5 5 2 2 1 1]
>lsort -command nosort -unique $testlist
>=> 1 2 5 3 2 #unique list in reverse order

Does not remove _all_ duplicates, only consecutively repeated
elements. OP would expect to see {2 3 5 1} given your input.

btw, changing "return 1" to "return -1" would not reverse the
returned list.

--
Glenn Jackman

Raman Narayan

Sep 21, 2001, 5:31:39 PM
Hi Glenn,

> Does not remove _all_ duplicates, only consecutively repeated
> elements. OP would expect to see {2 3 5 1} given your input.
>

It is difficult to get performance and still get reduced output
with "lsort" without "sorting" the output (basically, making the
duplicate entries contiguous is what makes the "unique" part of "lsort"
work correctly). I just made the assumption that all the duplicate
entries are contiguous :) to make "lsort" usable for the task at hand.
The basic idea is that you can get some leverage out of the existing
"lsort" itself.

>
> btw, changing "return 1" to "return -1" would not reverse the
> returned list.

I too know that it does not work, thanks for pointing it out.

>
> --
> Glenn Jackman

Regards
Raman

Peter.DeRijk

Sep 25, 2001, 8:40:02 AM

The list_remdup in Extral (http://rrna.uia.ac.be/extral/) will do this.
The algorithm basically works like this:
foreach e $list {
    if {![info exists a($e)]} {
        lappend result $e
        set a($e) {}
    }
}

There is also a C version in the package that is faster:
code:
set list 10
for {set a 1} {$a < 10000} {incr a} {
    lappend list $a
}
time {list_remdup $list}

C version: 16748 microseconds per iteration
Tcl version: 133676 microseconds per iteration

--
Peter De Rijk der...@uia.ua.ac.be
<a href="http://rrna.uia.ac.be/~peter/">Peter</a>
To achieve the impossible, one must think the absurd.
to look where everyone else has looked, but to see what no one else has seen.

Keith Lea

Sep 25, 2001, 10:55:42 PM
"Bruce Hartweg" <brha...@bigfoot.com> wrote in message
news:3BA93260...@bigfoot.com...

>
> this (untested) snippet should do the trick - but no guarantees on speed
> for large lists
>
> set new {}
> catch {unset tmpAry}
> foreach elem $old {
> if [info exists tmpAry($elem)] {
> # already have this one
> } else {
> set tmpAry($elem) 1
> lappend new $elem
> }
> }

You could easily speed it up:

proc removeDupes {old} {
    ## create an empty list
    set new [list]

    ## go through each element of the given list..
    foreach element $old {
        ## it's not a duplicate!
        if {![info exists tempArray($element)]} {
            ## add this element to the new list
            lappend new $element

            ## and remember that we already saw this value
            set tempArray($element) 0
        }
    }

    ## and return the new list
    return $new
}

It takes 820 microseconds per iteration on my P2-400 :) That's an improvement
of about 10% over yours :)

enjoy. :D

-kl

