Removing duplicates from a list while maintaining order

2,826 views
Skip to first unread message

Paul Bian

unread,
Jun 3, 2015, 1:25:55 AM6/3/15
to racket...@googlegroups.com
Hi all,

Trying to learn a bit about recursion in racket.
The question is to remove duplicates from a list while maintaining the order of the list.

one function to remove duplicates from the left,

i.e. 1 2 1 3 2 4 5 -> 1 2 3 4 5

and one from the right.

i.e. 1 2 1 3 2 4 5 -> 1 3 2 4 5

I've gotten the code to work by writing some helper functions, but my question is, is it possible to solve these problems WITHOUT:

- local definition (i.e. beginner student language in dr racket)
- loops
- built in 'remove', 'reverse', 'member?'
- writing your own 'remove', 'reverse', 'member?' functions

OR if 'member?' is allowed... I can figure out the right side case without additional functions, but not for the left case, is there a way to do this without the remove duplicates function that i wrote?

---
Code:

http://pastebin.com/h0dBdUaR

(define (RemoveElement element lst)
;if unique-left is applied to (list 1 4 2 1 5 4), the result would be (list 1 4 2 5)
(cond
[(empty? lst) empty]
[(= element (first lst)) (RemoveElement element (rest lst))]
[else (cons (first lst) (RemoveElement element (rest lst)))]))

(define (unique-left lst)
(cond
[(empty? lst) empty]
[(member? (first lst) (rest lst)) (cons (first lst) (unique-left (RemoveElement (first lst) (rest lst))))]
[else (cons (first lst) (unique-left (rest lst)))]
)
)

(define (unique-right lst)
;the result of applying it to (list 1 4 2 1 5 4) would be (list 2 1 5 4)
(cond
[(empty? lst) empty]
[(not (member? (first lst) (rest lst))) (cons (first lst) (unique-right (rest lst)))]
[else (unique-right (rest lst))]))

mazert

unread,
Jun 3, 2015, 5:33:20 AM6/3/15
to racket...@googlegroups.com
Le 03/06/2015 07:25, Paul Bian a écrit :
> Hi all,

> one function to remove duplicates from the left,
>
> i.e. 1 2 1 3 2 4 5 -> 1 2 3 4 5
>
> and one from the right.
>
> i.e. 1 2 1 3 2 4 5 -> 1 3 2 4 5

Hello,

For the first function, I think you can't do it without a local define
or by adding a second parameter (for result). Here is my solution :

(define (right L)
(cond ((empty? L) (list))
((member? (car L) (cdr L)) (right (cdr L)))
(else (cons (car L) (right (cdr L))))))

(define (left L res)
(cond ((empty? L) res)
((not (member? (car L) res)) (left (cdr L) (append res
(list(car L)))))
(else (left (cdr L) res))))


For left, I use an empty list at initialisation (res), and I can check
if an element have already been added. I didn't use reverse but it is
recommanded instead of append cause it cost less of computing time.

append = reverse = O(n) . But i use append more than one time.

In practise I never use remove, cause it cost a lot of computing time
too O(pos) (cause have to reindex each element of the list each time),
except if you remove the first element, but (cdr) does it ;) .


Matthias Felleisen

unread,
Jun 3, 2015, 8:54:08 AM6/3/15
to Paul Bian, racket...@googlegroups.com

In BSL:

;; -----------------------------------------------------------------------------
;; [Listof X] -> [Listof X]
;; remove duplicates ...

;; -----------------------------------------------------------------------------
;; ... keeping the copy on the left
(require racket/base) ;; to import remove*, a variant of which should be in *SL

(check-expect (unique-l (list 1 2 1 3 1 2)) (list 1 2 3))

(define (unique-l l)
(cond
[(empty? l) '()]
[else (cons (first l) (remove* (list (first l)) (unique-l (rest l))))]))

;; -----------------------------------------------------------------------------
;; ... keeping the copy on the right
(check-expect (unique-r (list 1 2 1 3 1 2)) (list 3 1 2))

(define (unique-r l)
(cond
[(empty? l) '()]
[else (if (member? (first l) (rest l))
(unique-r (rest l))
(cons (first l) (unique-r (rest l))))]))
> --
> You received this message because you are subscribed to the Google Groups "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to racket-users...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Paul Bian

unread,
Jun 3, 2015, 11:04:38 AM6/3/15
to racket...@googlegroups.com
Thanks for all the help guys, it really helps with my understanding.

So it seems to me there's no real easy way to do this while keeping all the restrictions intact.

I sat here thinking about it for quite a while before posting, as I thought I missed some simple solution, as I'm inexperienced.

Prabhakar Ragde

unread,
Jun 3, 2015, 8:07:20 PM6/3/15
to racket...@googlegroups.com
This is a homework question of mine (though I don't think the OP is
doing homework for a course for credit; I'd be curious to know where he
found it). I usually state the restrictions on 'reverse' and 'remove',
but not 'member?'. The restriction on 'reverse' is so that students
don't write one of the functions by reversing, calling the other,
reversing again. Also sometimes when students violate structural
recursion, they get a reversed result, and I don't want them applying
'reverse' as a quick fix. The restriction on 'remove' is on the built-in
function of that name, because it only removes one element, and using it
leads students away from structural recursion. Recently I have taken to
hinting that they should write something like remove*. --PR

Matthias Felleisen

unread,
Jun 3, 2015, 9:25:30 PM6/3/15
to Prabhakar Ragde, racket...@googlegroups.com

Apologies for posting a solution. Since the OP had some code, I thought I'd show him the essence .. throwing in (require racket) as the key to any instructor who'd grade (and care that a student had cheated).

-- Matthias





On Jun 3, 2015, at 8:07 PM, Prabhakar Ragde wrote:

> This is a homework question of mine (though I don't think the OP is doing homework for a course for credit; I'd be curious to know where he found it). I usually state the restrictions on 'reverse' and 'remove', but not 'member?'. The restriction on 'reverse' is so that students don't write one of the functions by reversing, calling the other, reversing again. Also sometimes when students violate structural recursion, they get a reversed result, and I don't want them applying 'reverse' as a quick fix. The restriction on 'remove' is on the built-in function of that name, because it only removes one element, and using it leads students away from structural recursion. Recently I have taken to hinting that they should write something like remove*. --PR
>

Paul Bian

unread,
Jun 3, 2015, 10:02:57 PM6/3/15
to racket...@googlegroups.com
I'm not a student, and haven't touched scheme since first year university in 2006. I overheard the problem discussed and wanted to give it a shot. Essentially wondering since I couldn't do it with all the constraints, whether I'm lacking some fundamental understanding.

Michael Tiedtke

unread,
Jun 4, 2015, 1:08:22 AM6/4/15
to Paul Bian, racket...@googlegroups.com
While developing with theoretical restrictions is good for practicing in practice I suggest to rely on high level list handling routines like sort, filter, map and for-each as these usually are optimized. You might want to have a look at different implementations of SRFI 1 delete-duplicates. Or think about it in terms of sets with SRFI 113. As the problem is related to sorting you might want to have a look at the by now deprecated SRFI 32: its reference implementation includes heap sort, quick sort and perhaps others.

See http://docs.racket-lang.org/srfi/srfi-std/srfi-1.html#Deletion

delete-duplicates [=] -> list
"Be aware that, in general, delete-duplicates runs in time O(n2) for n-element lists. Uniquifying long lists can be accomplished in O(n lg n) time by sorting the list to bring equal elements together, then using a linear-time algorithm to remove equal elements. Alternatively, one can use algorithms based on element-marking, with linear-time results."


See http://srfi.schemers.org/srfi-113/srfi-113.html#Copyingandconversion

(list->set comparator list)
"Returns a newly allocated set, created as if by set using comparator, that contains the elements of list. Duplicate elements (in the sense of the equality predicate) are omitted."


http://srfi.schemers.org/srfi-95/srfi-95.html (http://srfi.schemers.org/srfi-32/srfi-32.txt)
"To choose optimal sort algorithms requires nearly as much understanding as to write them. Most users don't."

If you have "big data" with duplicates you should try to use a directed graph instead of well formed lists. It's fast and has a small memory footprint when there are many duplicates, i.e. a lot of redundancy in the data. Language contains a lot of redundancy (both at the word and at the letter level). This direction leads to searching (sth like the whole WWW in O(n)) and compression algorithms ...

Have fun,
Michael
Reply all
Reply to author
Forward
0 new messages