Idiomatic way to remove duplicates in a slice

Sathish VJ

unread,

Aug 11, 2012, 10:33:54 PM8/11/12

to golan...@googlegroups.com

Is there an efficient, idiomatic way to remove duplicates in a slice?

If possible, I do want to retain the original order of items.

Hailiang

unread,

Aug 12, 2012, 12:47:40 AM8/12/12

to golan...@googlegroups.com

There is an implementation in a previous post "Set implementation benchmarking" by Karl Seguin, and I think it is simple enough and retain the order of elements:

func (this *OnDemandArraySet) RemoveDuplicates() {

length := len(this.data) - 1

for i := 0; i < length; i++ {

for j := i + 1; j <= length; j++ {

if (this.data[i] == this.data[j]) {

this.data[j] = this.data[length]

this.data = this.data[0:length]

length--

j--

}

Hailiang

Paul Hankin

unread,

Aug 12, 2012, 2:43:20 AM8/12/12

to golan...@googlegroups.com

On Sunday, 12 August 2012 04:33:54 UTC+2, Sathish VJ wrote:

Is there an efficient, idiomatic way to remove duplicates in a slice?
If possible, I do want to retain the original order of items.

I'd do the obvious, and build a map of the things found so far when iterating over the slice, omitting ones that have already been found. Like this:

package main

import (

"fmt"

)

func RemoveDuplicates(xs *[]string) {

found := make(map[string]bool)

j := 0

for i, x := range *xs {

if !found[x] {

found[x] = true

(*xs)[j] = (*xs)[i]

j++

}

*xs = (*xs)[:j]

}

func main() {

xs := []string{"to", "be", "or", "not", "to", "be", "that"}

RemoveDuplicates(&xs)

fmt.Println(xs)

}

http://play.golang.org/p/GG2RQ1ot6t

--

Paul

DisposaBoy

unread,

Aug 12, 2012, 3:11:39 AM8/12/12

to golan...@googlegroups.com

On Sunday, August 12, 2012 3:33:54 AM UTC+1, Sathish VJ wrote:

Is there an efficient, idiomatic way to remove duplicates in a slice?
If possible, I do want to retain the original order of items.

If your dataset isn't so large that you have to start to being paranoid about memory usage then the simplest thing I can think is to just use a map along-side the slice.

Two simple approaches are:

http://play.golang.org/p/q3bZ3hpOzD

http://play.golang.org/p/i3unzb-yXv

Michael Jones

unread,

Aug 12, 2012, 5:34:12 AM8/12/12

to DisposaBoy, golan...@googlegroups.com

These are all great answers. The "better" question seems less about slices and more about context--how do you know if there are duplicates (where 'how' stands for how many, how efficiently, how much extra storage, etc.)

The first answer is an "inverse insertion sort" that will use no extra memory and do O(n**2) comparisons. The other answers use Go's map feature and will be linear to logarithmic in n (depending on the hidden implementation and data sizes vs. table allocations.) This is a huge difference if you have 1,000,000 values to de-duplicate.

Even so, the first way could* easily be faster for small n. If you know the range of n, the size of the slice's data type, and the available working storage, then you could properly decide between these two paths, since each is better in some cases.

Michael

*Pure conjecture...

--

Michael T. Jones | Chief Technology Advocate | m...@google.com | +1 650-335-5765

Glenn Brown

unread,

Aug 12, 2012, 6:21:11 AM8/12/12

to Sathish VJ, Glenn Brown, golan...@googlegroups.com

> Is there an efficient, idiomatic way to remove duplicates in a slice?
> If possible, I do want to retain the original order of items.

This one runs in linear time: http://play.golang.org/p/HJEm6qy5wb
It uses a map to track which entries have been seen, so it only works with map-able types.

package main

import "fmt"

func removeDuplicates(a []int) []int {
result := []int{}
seen := map[int]int{}
for _, val := range a {
if _, ok := seen[val]; !ok {
result = append(result, val)
seen[val] = val
}
}
return result
}

func main() {
x := []int{9, 1, 2, 3, 5, 6, 9, 2, 3, 3, 6, 8, 1}
x = removeDuplicates(x)
fmt.Println(x)
}

Tim Harig

unread,

Aug 12, 2012, 7:16:36 AM8/12/12

to golang-nuts

On Sat, Aug 11, 2012 at 07:33:54PM -0700, Sathish VJ wrote:
> Is there an efficient, idiomatic way to remove duplicates in a slice?
> If possible, I do want to retain the original order of items.

It seems to me, that if you do not want duplication, don't allow
duplicates into the slice. Keep the data itself in a map or btree
structure that will make duplicates obvious as you are trying to
store them. You can then use a slice of pointers to the objects in
the map/btree to preserve your order if you really want to preserver
linearity.

If you need to represent duplication in your slice at some point, then
use a form of indirection that includes reference counting. Then all
you have to do, is remove any pointer in your slice to an object that
contains a reference count greater than one. This can be done in the
data type itself represented by an interface or by using an intermediate
smart pointer.

ali oygur

unread,

Dec 14, 2016, 6:41:28 AM12/14/16

to golang-nuts

there is good gist.

https://gist.github.com/alioygur/16c66b4249cb42715091fe010eec7e33

12 Ağustos 2012 Pazar 06:33:54 UTC+4 tarihinde Sathish VJ yazdı:

Michael Jones

unread,

Dec 14, 2016, 12:15:01 PM12/14/16

to ali oygur, golang-nuts

Do we know anything about the structure of the slice?

It is inherent in the general notion that there must be a comparison of a new item with existing ones to establish uniqueness. Even so, knowledge about the structure of the input can make this comparison easier.

If the input is already ordered, then testing against the prior element is a test against all elements.

If the input is integers in a small range then a bit mask presence test is good.

If avoiding allocation is important one could sort the list O(n log n) and then scan it O(n)

The hash (map/dictionary) approach is good, general, and slightly more space efficient with a slice of struct{} since presence testing is all that is necessary.

An O(n) scan to build a histogram would let you know about ranges and potential sections for subsequent processing.

Many well-known approaches.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ayan George

unread,

Dec 14, 2016, 12:36:01 PM12/14/16

to golan...@googlegroups.com

On 12/14/2016 12:14 PM, Michael Jones wrote:
> Do we know anything about the structure of the slice?
>
>

Just FYI -- this thread was started on 8/11/2012 and you, Michael, even
responded to it at the time.. I think it was pretty well hashed out
back then and I'm not sure why this was resurrected:

https://groups.google.com/forum/#!topic/golang-nuts/-pqkICuokio

-ayan

Michael Jones

unread,

Dec 14, 2016, 1:07:11 PM12/14/16

to Ayan George, golan...@googlegroups.com

Ha ha! Thought it seemed a familiar question. Sometimes they resurface and it seems kind to treat new ones fairly. I should have looked though. Thank you!