# Idiomatic way to remove duplicates in a slice

24,455 views

### Sathish VJ

Aug 11, 2012, 10:33:54 PM8/11/12
Is there an efficient, idiomatic way to remove duplicates in a slice?
If possible, I do want to retain the original order of items.

### Hailiang

Aug 12, 2012, 12:47:40 AM8/12/12
There is an implementation in a previous post "Set implementation benchmarking" by Karl Seguin, and I think it is simple enough and retain the order of elements:
func (this *OnDemandArraySet) RemoveDuplicates() {
length := len(this.data) - 1
for i := 0; i < length; i++ {
for j := i + 1; j <= length; j++ {
if (this.data[i] == this.data[j]) {
this.data[j] = this.data[length]
this.data = this.data[0:length]
length--
j--
}
}
}
}

Hailiang

### Paul Hankin

Aug 12, 2012, 2:43:20 AM8/12/12
On Sunday, 12 August 2012 04:33:54 UTC+2, Sathish VJ wrote:
Is there an efficient, idiomatic way to remove duplicates in a slice?
If possible, I do want to retain the original order of items.

I'd do the obvious, and build a map of the things found so far when iterating over the slice, omitting ones that have already been found. Like this:

package main

import (
"fmt"
)

func RemoveDuplicates(xs *[]string) {
found := make(map[string]bool)
j := 0
for i, x := range *xs {
if !found[x] {
found[x] = true
(*xs)[j] = (*xs)[i]
j++
}
}
*xs = (*xs)[:j]
}

func main() {
xs := []string{"to", "be", "or", "not", "to", "be", "that"}
RemoveDuplicates(&xs)
fmt.Println(xs)
}

--
Paul

### DisposaBoy

Aug 12, 2012, 3:11:39 AM8/12/12

On Sunday, August 12, 2012 3:33:54 AM UTC+1, Sathish VJ wrote:
Is there an efficient, idiomatic way to remove duplicates in a slice?
If possible, I do want to retain the original order of items.

If your dataset isn't so large that you have to start to being paranoid about memory usage then the simplest thing I can think is to just use a map along-side the slice.
Two simple approaches are:

### Michael Jones

Aug 12, 2012, 5:34:12 AM8/12/12
These are all great answers. The "better" question seems less about slices and more about context--how do you know if there are duplicates (where 'how' stands for how many, how efficiently, how much extra storage, etc.)

The first answer is an "inverse insertion sort" that will use no extra memory and do O(n**2) comparisons. The other answers use Go's map feature and will be linear to logarithmic in n (depending on the hidden implementation and data sizes vs. table allocations.) This is a huge difference if you have 1,000,000 values to de-duplicate.

Even so, the first way could* easily be faster for small n. If you know the range of n, the size of the slice's data type, and the available working storage, then you could properly decide between these two paths, since each is better in some cases.

Michael

*Pure conjecture...
--
Michael T. Jones | Chief Technology Advocate  |

### Glenn Brown

Aug 12, 2012, 6:21:11 AM8/12/12
to Sathish VJ, Glenn Brown, golan...@googlegroups.com

> Is there an efficient, idiomatic way to remove duplicates in a slice?
> If possible, I do want to retain the original order of items.

This one runs in linear time: http://play.golang.org/p/HJEm6qy5wb
It uses a map to track which entries have been seen, so it only works with map-able types.

package main

import "fmt"

func removeDuplicates(a []int) []int {
result := []int{}
seen := map[int]int{}
for _, val := range a {
if _, ok := seen[val]; !ok {
result = append(result, val)
seen[val] = val
}
}
return result
}

func main() {
x := []int{9, 1, 2, 3, 5, 6, 9, 2, 3, 3, 6, 8, 1}
x = removeDuplicates(x)
fmt.Println(x)
}

### Tim Harig

Aug 12, 2012, 7:16:36 AM8/12/12
to golang-nuts
On Sat, Aug 11, 2012 at 07:33:54PM -0700, Sathish VJ wrote:
> Is there an efficient, idiomatic way to remove duplicates in a slice?
> If possible, I do want to retain the original order of items.

It seems to me, that if you do not want duplication, don't allow
duplicates into the slice. Keep the data itself in a map or btree
structure that will make duplicates obvious as you are trying to
store them. You can then use a slice of pointers to the objects in
the map/btree to preserve your order if you really want to preserver
linearity.

If you need to represent duplication in your slice at some point, then
use a form of indirection that includes reference counting. Then all
you have to do, is remove any pointer in your slice to an object that
contains a reference count greater than one. This can be done in the
data type itself represented by an interface or by using an intermediate
smart pointer.

### ali oygur

Dec 14, 2016, 6:41:28 AM12/14/16
to golang-nuts
there is good gist.

https://gist.github.com/alioygur/16c66b4249cb42715091fe010eec7e33

12 Ağustos 2012 Pazar 06:33:54 UTC+4 tarihinde Sathish VJ yazdı:

### Michael Jones

Dec 14, 2016, 12:15:01 PM12/14/16
to ali oygur, golang-nuts

Do we know anything about the structure of the slice?

It is inherent in the general notion that there must be a comparison of a new item with existing ones to establish uniqueness. Even so, knowledge about the structure of the input can make this comparison easier.

If the input is already ordered, then testing against the prior element is a test against all elements.

If the input is integers in a small range then a bit mask presence test is good.

If avoiding allocation is important one could sort the list O(n log n) and then scan it O(n)

The hash (map/dictionary) approach is good, general, and slightly more space efficient with a slice of struct{} since presence testing is all that is necessary.

An O(n) scan to build a histogram would let you know about ranges and potential sections for subsequent processing.

Many well-known approaches.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.

### Ayan George

Dec 14, 2016, 12:36:01 PM12/14/16
On 12/14/2016 12:14 PM, Michael Jones wrote:
> Do we know anything about the structure of the slice?
>
>

Just FYI -- this thread was started on 8/11/2012 and you, Michael, even
responded to it at the time.. I think it was pretty well hashed out
back then and I'm not sure why this was resurrected:

-ayan

### Michael Jones

Dec 14, 2016, 1:07:11 PM12/14/16
Ha ha! Thought it seemed a familiar question. Sometimes they resurface and it seems kind to treat new ones fairly. I should have looked though. Thank you!

### John Souvestre

Dec 14, 2016, 1:45:01 PM12/14/16
to golang-nuts

Ø  there is good gist.

Ø  this algorithm 10x faster …

I don’t believe it is faster as the slice size increases.  What you propose is O(N**2) vs O(N).  Try a test with a slice 10 or 100 times the size.

John

John Souvestre - New Orleans LA

--