Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

How to select unique elements in a list?

195 views
Skip to first unread message

Robert Villegas

unread,
Aug 19, 1997, 3:00:00 AM8/19/97
to

I am pretty sure that all the replies to this query


> First, let me say that Union does not do what I'm looking for! I am
> interested in a function which returns the unique elements from a list.
> Let's call the function Distinct. Then, I want
>
> Distinct[{i1,i2,i3,i2}]
>
> to return
>
> {i1,i3}


did not preserve the original ordering of the list when they extracted
singletons. Although the original posting did not ask for this, some
applications would want to retain the original order, so it's interesting to
find a reasonably fast order-preserving version. We can build on the fast
output of any of the proposed methods to fix up the ordering.

First, a function that will reorder a subset according to the ordering of
the original list:

RestoreOrder[subset_, list_] :=
Last /@
Sort[ {Position[list, #, {1}, 1][[1, 1]], #}& /@
subset
]


Now apply the RestoreOrder operator to any of the methods:

Singletons[list_] :=
RestoreOrder[#, list]& @
Cases[Split @ Sort[list], {singleton_} :> singleton]

Not as fast as the proposals, but it preserves more information.

We might generalize this to selecting elements that occur k times instead
of just once:

MultiplicitySelect[list_, k_] :=
RestoreOrder[#, list]& [
First /@
Cases[Split @ Sort[list], Table[_, {k}] ]
]


By the way, the same idea can produce an order-preserving variant of Union,
which is something I've needed fairly frequently:

EliminateRepetition[list_List] := RestoreOrder[Union[list], list]

Robby Villegas


Robert Villegas

unread,
Aug 22, 1997, 3:00:00 AM8/22/97
to

About my order-restoring function that I used on the singletons:

RestoreOrder[subset_, list_] :=
Last /@
Sort[ {Position[list, #, {1}, 1][[1, 1]], #}& /@
subset
]


I just remembered that Todd Gayley and I realized a far faster indexing
method for a somewhat similar problem several months ago: build a rule
table mapping each element to its position. Here is the new, faster
RestoreOrder:

RestoreOrder[subset_, list_] :=
With[{elemIndex = Dispatch @ Thread[list -> Range @ Length[list]]},
Last /@
Sort[{Replace[#, elemIndex], #}& /@ subset]
]


Here's a timing comparison on a 15,000-element list with lots of singletons:

In[10]:=
bigList = Table[Random[Integer, {1, 10*^3}], {15*^3}];

In[11]:=
Timing[oldResult = Singletons[bigList];]

Out[11]=
{50.93 Second,Null}

In[12]:=
Timing[newResult = SingletonsNew[bigList];]

Out[12]=
{0.96 Second,Null}

In[13]:=
oldResult === newResult

Out[13]=
True

In[14]:=
Length[oldResult]

Out[14]=
3230


What you sacrifice with the big hash table, of course, is memory.
MaxMemoryUsed[] increases by a couple megabytes or more when this
is run on a list of 10,000 integers. MemoryInUse[] doesn't change
that much, which means the memory used by the temporary hash table
is available for future Mathematica use, but not for other
applications.

If you're going to re-order many times with respect to the same master
list, you might want RestoreOrder to remember the hash table so it
doesn't have to regenerate it each time. Generating a hash table is
pretty fast, so don't bother unless you need it quite a few times or
you're on a slow machine. Perhaps something like the following:


ClearAll[RestoreOrder]

Options[RestoreOrder] = {Remember -> False};

RestoreOrder[subset_, list_, opts___] :=
With[{record = $RestoreOrderRecord[list]},
With[{elemIndex =
Replace[record,
_$RestoreOrderRecord :>
Dispatch @ Thread[list -> Range @ Length[list]]
]
},
If[(Remember /. Flatten[{opts, Options[RestoreOrder]}]) &&
Head[record] === $RestoreOrderRecord,
$RestoreOrderRecord[list] = elemIndex
];

Last /@
Sort[{Replace[#, elemIndex], #}& /@ subset]
]
]

Now use the option Remember -> True for individual cases or do
SetOptions[RestoreOrder, Remember -> True] to make it remember all
future hash tables (then you'll be sacrificing MemoryInUse[] as well
as MaxMemoryUsed[]).


Maybe there's a more elegant way to implement the remembering mechanism,
but this works.


Robby Villegas


Nigel King

unread,
Aug 26, 1997, 3:00:00 AM8/26/97
to

In article <5rmq5u$p...@smc.vnet.net>,
cr...@c2.telstra-mm.net.au (Colin Rose) wrote:

>"Carl Woll" <ca...@u.washington.edu> wrote:
>
> First, let me say that Union does not do what I'm looking for! I am
> interested in a function which returns the unique elements from a list.
> Let's call the function Distinct. Then, I want
>
> Distinct[{i1,i2,i3,i2}] to return {i1,i3}
>

> RemoveDuplicates[a___,i_,i_,b___]:=RemoveDuplicates[a,b]
> Distinct[a_List]:=List@@RemoveDuplicates@@Sort[a]

I am not sure that people have noticed in this that the removal above does
not remove odd numbers of duplicates

In[19]:=Distinct[{1,2,2,2,3}]
Out[19]={1,2,3}

any solutions? As noticed pattern matching is not easy!

Nigel King
Northern Telecom


0 new messages