[Haskell-cafe] Create a list without duplicates from a list with duplicates

ne...@lyra.net

unread,

Feb 8, 2008, 7:12:59 AM2/8/08

to haskel...@haskell.org

Hallo!

Let's suppose I have a list [a,b,c,d,c,d]. I'd like to write
a function that returns a new list without duplicates (in
the example [a,b,c,d]). How can I do that? What is the most
general way? I'd like to use the same function for a list of
Int or String or some other user defined data type.

Thank for your attention!
_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Jed Brown

unread,

Feb 8, 2008, 7:21:01 AM2/8/08

to ne...@lyra.net, haskel...@haskell.org

On 8 Feb 2008, ne...@lyra.net wrote:

> Hallo!
>
> Let's suppose I have a list [a,b,c,d,c,d]. I'd like to write
> a function that returns a new list without duplicates (in
> the example [a,b,c,d]). How can I do that? What is the most
> general way? I'd like to use the same function for a list of
> Int or String or some other user defined data type.

Look at Data.List:

nub :: (Eq a) => [a] -> [a]
nub = nubBy (==)

nubBy :: (a -> a -> Bool) -> [a] -> [a]
nubBy eq [] = []
nubBy eq (x:xs) = x : nubBy eq (filter (\ y -> not (eq x y)) xs)

Jed

Felipe Lessa

unread,

Feb 8, 2008, 7:25:45 AM2/8/08

to Jed Brown, haskel...@haskell.org

2008/2/8 Jed Brown <j...@59a2.org>:

> Look at Data.List:
>
> nub :: (Eq a) => [a] -> [a]
> nub = nubBy (==)
>
> nubBy :: (a -> a -> Bool) -> [a] -> [a]
> nubBy eq [] = []
> nubBy eq (x:xs) = x : nubBy eq (filter (\ y -> not (eq x y)) xs)

And then there's also

sort :: (Ord a) => [a] -> [a]

which should have better performance, O(n log n) against O(n²) I
guess, but of course will change the order of the elements. If you
really don't mind the order at all, you could also use Data.Set in the
first place.

Cheers,

--
Felipe.

Dan Weston

unread,

Feb 8, 2008, 3:37:16 PM2/8/08

to Felipe Lessa, haskel...@haskell.org

As noted, (Data.Set.toList . Data.Set.fromList) is the best traditional
solution if you don't care about order (or Data.Set.toAscList for a
sorted result).

If order is important, the new bijective Data.Bimap class
http://code.haskell.org/~scook0/haddock/bimap/Data-Bimap.html
may be your best bet (I haven't yet tried it myself).

Meanwhile, here is a hand-rolled solution to order-preserving nubbing:

> import Data.List(groupBy,sortBy,sort)
> import Data.Maybe(listToMaybe)
>
> efficientNub :: (Ord a) => [a] -> [a]
> efficientNub = flip zip [0..] -- carry along index
> >>> sort -- sort by value, then index
> >>> groupBy equalFsts -- group adjacent equal values
> >>> map head -- keep only primus inter pares
> >>> sortBy compareSnds -- sort by index
> >>> map fst -- discard index
>
> where equalFsts (x1,y1) (x2,y2) = x1 == x2
> compareSnds (x1,y1) (x2,y2) = compare y1 y2
> x >>> y = y . x

There is a hidden proof obligation here:

Exercise: Prove that (groupBy equalFsts >>> map head) is a total
function, using the defintion of groupBy from Data.List:

groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _ [] = []
groupBy eq (x:xs) = (x:ys) : groupBy eq zs
where (ys,zs) = span (eq x) xs

> ------------------------------------------------------------------------

Tillmann Rendel

unread,

Feb 8, 2008, 4:08:33 PM2/8/08

to Dan Weston, haskel...@haskell.org

Dan Weston wrote:
> Meanwhile, here is a hand-rolled solution to order-preserving nubbing:
>
> > import Data.List(groupBy,sortBy,sort)
> > import Data.Maybe(listToMaybe)
> >
> > efficientNub :: (Ord a) => [a] -> [a]
> > efficientNub = flip zip [0..] -- carry along index
> > >>> sort -- sort by value, then index
> > >>> groupBy equalFsts -- group adjacent equal values
> > >>> map head -- keep only primus inter pares
> > >>> sortBy compareSnds -- sort by index
> > >>> map fst -- discard index
> >
> > where equalFsts (x1,y1) (x2,y2) = x1 == x2
> > compareSnds (x1,y1) (x2,y2) = compare y1 y2
> > x >>> y = y . x

I would try something like

efficientNub = catMaybes . snd . mapAccumR f empty where
f s x | member x s = (s, Nothing)
| otherwise = (insert x s, x)

that is, threading the Set of already given results through.

Tillmann

Stuart Cook

unread,

Feb 8, 2008, 10:35:33 PM2/8/08

to haskel...@haskell.org

On Sat, Feb 9, 2008 at 7:36 AM, Dan Weston <west...@imageworks.com> wrote:
> If order is important, the new bijective Data.Bimap class
> http://code.haskell.org/~scook0/haddock/bimap/Data-Bimap.html
> may be your best bet (I haven't yet tried it myself).

Let me try:

nub :: (Ord a) => [a] -> [a]
nub = map snd . Data.Bimap.toAscList . Data.Bimap.fromList . reverse
zip [1..]

> nub "hello, world!"
"helo, wrd!"

Without the call to (reverse), this would still be an order-preserving
nub, except that it would preserve the relative order of the *last*
occurrence of each element. Actually, this makes me wonder whether
fromList's behaviour should be changed, and whether I should add a
"non-clobbering" variant of insert.

Stuart

ChrisK

unread,

Feb 9, 2008, 8:20:14 AM2/9/08

to haskel...@haskell.org, haskel...@haskell.org

For Bimap is there anything like Data.Map.insertWithKey ?

Stuart Cook wrote:
> On Sat, Feb 9, 2008 at 7:36 AM, Dan Weston <west...@imageworks.com> wrote:
>> If order is important, the new bijective Data.Bimap class
>> http://code.haskell.org/~scook0/haddock/bimap/Data-Bimap.html
>> may be your best bet (I haven't yet tried it myself).
>
> Let me try:
>
> nub :: (Ord a) => [a] -> [a]
> nub = map snd . Data.Bimap.toAscList . Data.Bimap.fromList . reverse

> . zip [1..]

Stuart Cook

unread,

Feb 9, 2008, 9:58:24 AM2/9/08

to ChrisK, haskel...@haskell.org

On Sun, Feb 10, 2008 at 12:19 AM, ChrisK <has...@list.mightyreason.com> wrote:
> For Bimap is there anything like Data.Map.insertWithKey ?

No. I wanted to implement the insertWith family, but it wasn't clear
to me what should happen if the value produced by the user's function
already exists, bound to something else.

One possibility might be to do nothing (or error) if that happens,
making it the user's responsibility to not do such things.

Whatever the case may be, it should be possible for client code to
define such an operation, with no more than a constant-factor
overhead. Perhaps I should wait and see if people find themselves
doing this before guessing at an implementation.

(Incidentally, almost everything in Map that's missing from Bimap is
missing either because it doesn't make sense, or because of corner
cases like this one.)

Henning Thielemann

unread,

Feb 9, 2008, 7:09:24 PM2/9/08

to ne...@lyra.net, haskel...@haskell.org

On Fri, 8 Feb 2008, ne...@lyra.net wrote:

> Hallo!
>
> Let's suppose I have a list [a,b,c,d,c,d]. I'd like to write
> a function that returns a new list without duplicates (in
> the example [a,b,c,d]). How can I do that? What is the most
> general way? I'd like to use the same function for a list of
> Int or String or some other user defined data type.

List.nub

I assume you couldn't find this function due its cryptic name. Don't
worry, many others failed before.

Dan Weston

unread,

Feb 11, 2008, 3:34:28 PM2/11/08

to ne...@lyra.net, Henning Thielemann, haskel...@haskell.org

Of course the most *general* way requires an Eq constraint:

> List.nub :: Eq a => [a] -> [a]

But there are better functions (already mentioned) with the less general
Ord constraint.

Int and String are instances of Ord. "some other user defined data type"
probably is too, but if you mean "any other user defined data type",
even Eq may not be satisfied, e.g.

> data NotEvenEq a = FuncsNotEq (a->a)

The hammer you use depends on what you're hammering on.

Dan

Henning Thielemann wrote:
> On Fri, 8 Feb 2008, ne...@lyra.net wrote:
>
>> Hallo!
>>
>> Let's suppose I have a list [a,b,c,d,c,d]. I'd like to write
>> a function that returns a new list without duplicates (in
>> the example [a,b,c,d]). How can I do that? What is the most
>> general way? I'd like to use the same function for a list of
>> Int or String or some other user defined data type.
>
> List.nub