
Duplicate elements in array


Shuaib Zahda

Oct 28, 2007, 8:47:39 AM
Hello

I am trying to output the duplicate elements in an array. I looked through
the Ruby API and found the uniq method, which returns the array with the
duplicates removed. What I want is to know which elements are duplicated.
For example

array = ["apple", "banana", "apple", "orange"]
=> ["apple", "banana", "apple", "orange"]
array.uniq
=> ["apple", "banana", "orange"]

I want a method that tells me that "apple" is the duplicated element.

I tried this, but it does not work:

array - array.uniq

Any ideas?

Regards
Shuaib
--
Posted via http://www.ruby-forum.com/.
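
A short note on why the subtraction comes back empty: Array#- removes every element of the receiver that also appears in the argument, however many times it occurs, so nothing is left once the unique values are subtracted. A minimal sketch using the array from the question (the variable name difference is only illustrative):

```ruby
array = ["apple", "banana", "apple", "orange"]

# Array#- drops *all* occurrences of any element found in the argument,
# so both "apple" entries are removed along with everything else.
difference = array - array.uniq

p difference  # prints []
```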

Mohit Sindhwani

Oct 28, 2007, 9:15:40 AM

I don't know a good way to do it, but one way to get the result would be
to force it into a hash since that eliminates duplicates.


I'm sure there's a better way to do it, but here's what I got.

array = ["apple", "banana", "apple", "orange", "fat", "cow", "cow"]
h = Hash.new
duplicates = []

array.each {|item|
  if h.has_key?(item) then
    duplicates << item
  else
    h[item] = 0 #it doesn't matter what we store
  end
}

puts duplicates

Cheers
Mohit.


Mohit Sindhwani

Oct 28, 2007, 9:16:28 AM
Sean O'Halpin wrote:
> Here's one way (I'm sure there must be a simpler approach - just can't
> think of it right now):
>
> array = ["apple", "banana", "apple", "orange"]
> counts = array.inject(Hash.new {|h,k| h[k] = 0 }) { |hash, item| hash[item] += 1; hash }
> p counts #=> {"apple"=>2, "banana"=>1, "orange"=>1}
> p counts.select { |k,v| v > 1 }.map{ |k, v| k}.flatten #=> ["apple"]
>
> Regards,
> Sean

I so have to get the hang of inject, flatten and map.

Cheers,
Mohit.
10/28/2007 | 9:16 PM.
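
For anyone else getting the hang of inject: the quoted one-liner can be unrolled into steps. This is only a sketch of the same idea with illustrative variable names (counts, dups); it is not a replacement for Sean's code.

```ruby
array = ["apple", "banana", "apple", "orange"]

# inject threads an accumulator (here a Hash whose default value is 0)
# through the block; whatever the block returns becomes the accumulator
# for the next element.
counts = array.inject(Hash.new(0)) do |hash, item|
  hash[item] += 1
  hash  # return the hash itself, because += evaluates to an Integer
end

p counts["apple"]   # prints 2
p counts["banana"]  # prints 1

# select keeps the entries counted more than once; map pulls out the keys.
dups = counts.select { |_, count| count > 1 }.map { |item, _| item }
p dups  # prints ["apple"]
```

The flatten in the quoted version is not needed here, since map already yields a flat list of keys.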


Robert Klemme

Oct 28, 2007, 9:35:37 AM

irb(main):007:0> array = %w{apple banana apple orange}
=> ["apple", "banana", "apple", "orange"]
irb(main):008:0> array.inject(Hash.new(0)) {|ha,e| ha[e]+=1;ha}.delete_if {|k,v| v==1}.keys
=> ["apple"]

Kind regards

robert
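
For readers on a current Ruby: the counting step can probably be handed to Enumerable#tally. This assumes Ruby 2.7 or later, which did not exist when this thread was written, so treat it as a modern sketch rather than what Robert posted.

```ruby
array = %w{apple banana apple orange}

# tally builds the same element => count hash that the inject call
# constructs by hand (assumes Ruby 2.7+).
dups = array.tally.select { |_, count| count > 1 }.keys

p dups  # prints ["apple"]
```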

Shuaib Zahda

Oct 28, 2007, 9:48:23 AM
Thanks a lot guys.
It works.

I really appreciate your help

Cheers

Harry Kakueki

Oct 28, 2007, 9:55:07 AM
On 10/28/07, Shuaib Zahda <shuaib...@gmail.com> wrote:

arr,dup = ["apple", "banana", "apple", "orange"],[]
(arr.length-1).times do
  a = arr.shift
  dup << a if arr.include?(a)
end
p dup.uniq

Harry

--
A Look into Japanese Ruby List in English
http://www.kakueki.com/
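
One thing to watch with the shift-based loop above: it consumes arr as it goes, so only the last element remains afterwards. If the original array is still needed, a non-destructive variant along these lines should work. It is a sketch with illustrative names, and like the loop above it rescans the array for every element, so it suits short arrays.

```ruby
arr = ["apple", "banana", "apple", "orange"]

# Keep the elements that occur more than once, then collapse repeats.
# Array#count(obj) scans the whole array, so this is quadratic overall.
dups = arr.select { |e| arr.count(e) > 1 }.uniq

p dups        # prints ["apple"]
p arr.length  # prints 4 (arr is left intact)
```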

Sean O'Halpin

Oct 28, 2007, 1:44:53 PM
On 10/28/07, Mohit Sindhwani <mo_...@onghu.com> wrote:
> I so have to get the hang of inject, flatten and map.
>
> Cheers,
> Mohit.
> 10/28/2007 | 9:16 PM.

Hi,

They are definitely worth looking into - inject in particular is a
powerful tool (Robert Klemme can make it do anything!). However, the
following benchmark shows that a slight modification of your approach
is actually pretty efficient. (The modification is to store the
duplicates in a hash rather than an array so you can return the list
of duplicates using Hash#keys).

Regards,
Sean

# Mohit Sindhwani (with slight adjustment)
def duplicates_1(array)
  seen = { }
  duplicates = { }
  array.each {|item| seen.key?(item) ? duplicates[item] = true : seen[item] = true}
  duplicates.keys
end

# Robert Klemme
def duplicates_2(array)
  array.inject(Hash.new(0)) {|ha,e| ha[e]+=1;ha}.delete_if {|k,v| v==1}.keys
end

# from facets
def duplicates_3(array)
  array.inject(Hash.new(0)){|h,v| h[v]+=1; h}.reject{|k,v| v==1}.keys
end

require 'benchmark'

def do_benchmark(title, n, methods, *args, &block)
  puts '-' * 40
  puts title
  puts '-' * 40
  Benchmark.bm(methods.map{ |x| x.to_s.length}.max + 2) do |x|
    methods.each do |meth|
      x.report(meth.to_s) { n.times do send(meth, *args, &block) end }
    end
  end
end

# get some data (Ubuntu specific I guess - YMMV)
array = File.read('/etc/dictionaries-common/words').split(/\n/)

# test w/o dups
do_benchmark('no duplicates', 10, [:duplicates_1, :duplicates_2,
             :duplicates_3], array)

# create some duplicates
array = array[0..999] * 100
do_benchmark('duplicates', 10, [:duplicates_1, :duplicates_2,
             :duplicates_3], array)

__END__
$ ruby bm-duplicates.rb
----------------------------------------
no duplicates
----------------------------------------
                  user     system      total        real
duplicates_1  2.200000   0.010000   2.210000 (  2.215057)
duplicates_2  5.820000   0.000000   5.820000 (  5.812414)
duplicates_3  6.580000   0.010000   6.590000 (  6.586708)
----------------------------------------
duplicates
----------------------------------------
                  user     system      total        real
duplicates_1  1.560000   0.000000   1.560000 (  1.562587)
duplicates_2  2.660000   0.000000   2.660000 (  2.665301)
duplicates_3  2.590000   0.000000   2.590000 (  2.595189)
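
In the same spirit as duplicates_1, the two "flag" hashes could arguably be swapped for Set from the standard library. The method name duplicates_with_set is hypothetical and this variant was not part of the benchmark above, so no claim is made about its speed.

```ruby
require 'set'

# Hypothetical variant of duplicates_1 using Set instead of hashes.
def duplicates_with_set(array)
  seen = Set.new
  dups = Set.new
  # Set#add? returns nil when the item is already present,
  # which is exactly the "seen before" signal we need.
  array.each { |item| dups << item unless seen.add?(item) }
  dups.to_a
end

p duplicates_with_set(%w{apple banana apple orange cow cow apple})
# prints ["apple", "cow"]
```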

Mohit Sindhwani

Oct 28, 2007, 2:15:20 PM

Thanks Sean! Makes me feel quite nice about it.

So, hashes are faster than arrays?

Cheers,
Mohit.
10/29/2007 | 2:13 AM.


Sean O'Halpin

Oct 28, 2007, 2:47:28 PM

It depends what you're doing with them and how big they are. But in
this instance, I changed your solution to use a hash because you were
appending the duplicates to an array which resulted in adding an entry
to that array every time you detected a duplicate. This didn't show up
in your example because your data contained at most two instances of
an item. If you change your example to:

array = ["apple", "banana", "apple", "orange", "fat", "cow", "cow",
         "apple", "apple"]

h = Hash.new
duplicates = []

array.each {|item|
  if h.has_key?(item) then
    duplicates << item
  else
    h[item] = 0 #it doesn't matter what we store
  end
}

puts duplicates

it outputs

apple
cow
apple
apple

which is probably not what you want.

Regards,
Sean
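
On the narrower question of membership tests: Array#include? scans the array element by element, while Hash#key? does a hashed lookup, so "have I seen this item?" checks tend to favour a hash as the data grows. A rough way to see it on your own machine, with made-up data and illustrative names (timings will vary, so none are shown):

```ruby
require 'benchmark'

# Made-up data: 2,000 distinct strings.
words = (1..2_000).map { |i| "word#{i}" }

as_array = words.dup
as_hash  = {}
words.each { |w| as_hash[w] = true }

# Look every word up once in each structure.
array_time = Benchmark.realtime { words.each { |w| as_array.include?(w) } }
hash_time  = Benchmark.realtime { words.each { |w| as_hash.key?(w) } }

puts "Array#include?: #{array_time} s"
puts "Hash#key?:      #{hash_time} s"
```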

Jimmy Kofler

Oct 28, 2007, 5:31:53 PM
> Duplicate elements in array
> Posted by Shuaib Zahda (shuaib85) on 28.10.2007 13:47

> Hello
>
> I am trying to output the duplicate elements in an array. I looked into
> the api of ruby I found uniq method which outputs the array with no
> duplication. What i want is to know which elements is duplicated.

Here's yet another way to do it:
http://snippets.dzone.com/posts/show/4148

Cheers,

j.k.

Mohit Sindhwani

Oct 28, 2007, 11:44:24 PM
Thanks for the explanation, Sean. Actually, I guess it's not clear if
the OP wants to know each occurrence of the duplicates or just the list
of duplicates. But, there are now solutions for both cases!

Cheers,
Mohit.
10/29/2007 | 11:44 AM.
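
If both answers are wanted at once (which elements repeat, and where each occurrence sits), recording positions is one option. A small sketch with illustrative names (positions, repeated), assuming Ruby 1.9 or later so that Hash#select returns a Hash:

```ruby
array = ["apple", "banana", "apple", "orange", "cow", "cow", "apple"]

# Map each element to the list of indexes where it appears.
positions = Hash.new { |h, k| h[k] = [] }
array.each_with_index { |item, i| positions[item] << i }

# Keep only the elements that appear more than once.
repeated = positions.select { |_, indexes| indexes.size > 1 }

p repeated.keys      # prints ["apple", "cow"]
p repeated["apple"]  # prints [0, 2, 6]
p repeated["cow"]    # prints [4, 5]
```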

Peña, Botp

Oct 29, 2007, 12:02:28 AM
From: Sean O'Halpin [mailto:sean.o...@gmail.com]
# $ ruby bm-duplicates.rb
# ----------------------------------------
# no duplicates
# ----------------------------------------
#                   user     system      total        real
# duplicates_1  2.200000   0.010000   2.210000 (  2.215057)
# duplicates_2  5.820000   0.000000   5.820000 (  5.812414)
# duplicates_3  6.580000   0.010000   6.590000 (  6.586708)
# ----------------------------------------
# duplicates
# ----------------------------------------
#                   user     system      total        real
# duplicates_1  1.560000   0.000000   1.560000 (  1.562587)
# duplicates_2  2.660000   0.000000   2.660000 (  2.665301)
# duplicates_3  2.590000   0.000000   2.590000 (  2.595189)

I just tested this using Ruby 1.9 on a P4 box running Windows XP. I included Ruby's group_by and got surprising results.

C:\ruby1.9\bin>diff test-old.rb test.rb
19a20,24
> #1.9's group_by
> def duplicates_4(array)
> array.group_by{|e|e}.select{|_,k| k.size>1}.keys
> end
>
26c31
< Benchmark.bm(methods.map{ |x| x.to_s.length}.max + 2) do |x|
---
> Benchmark.bmbm(methods.map{ |x| x.to_s.length}.max + 2) do |x|
34c39
< array = File.read('/etc/dictionaries-common/words').split(/\n/)
---
> array = File.read('american-english').split(/\n/)
38c43
< :duplicates_3], array)
---
> :duplicates_3,:duplicates_4], array)
43c48
< :duplicates_3], array)
---
> :duplicates_3,:duplicates_4], array)

C:\ruby1.9\bin>


C:\ruby1.9\bin>ruby test.rb
----------------------------------------
no duplicates
----------------------------------------
Rehearsal -------------------------------------------------
duplicates_1    7.609000   0.094000   7.703000 (  7.984000)
duplicates_2   10.438000   0.109000  10.547000 ( 11.608000)
duplicates_3   14.609000   0.219000  14.828000 ( 14.874000)
duplicates_4   11.422000   0.141000  11.563000 ( 14.201000)
--------------------------------------- total: 44.641000sec

                    user     system      total        real
duplicates_1    7.219000   0.125000   7.344000 (  8.109000)
duplicates_2    9.844000   0.078000   9.922000 ( 10.374000)
duplicates_3   14.391000   0.172000  14.563000 ( 18.498000)
duplicates_4   11.172000   0.172000  11.344000 ( 12.998000)
----------------------------------------
duplicates
----------------------------------------
Rehearsal -------------------------------------------------
duplicates_1    3.375000   0.000000   3.375000 (  3.765000)
duplicates_2    3.218000   0.000000   3.218000 (  3.828000)
duplicates_3    3.250000   0.000000   3.250000 (  3.672000)
duplicates_4    2.032000   0.047000   2.079000 (  2.077000)
--------------------------------------- total: 11.922000sec

                    user     system      total        real
duplicates_1    3.375000   0.000000   3.375000 (  3.437000)
duplicates_2    3.188000   0.000000   3.188000 (  3.218000)
duplicates_3    3.219000   0.015000   3.234000 (  3.281000)
duplicates_4    1.844000   0.000000   1.844000 (  1.859000)

C:\ruby1.9\bin>

kind regards -botp
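
For anyone unfamiliar with group_by: it returns a hash that maps each block result to the array of elements that produced it, so with an identity block every value ends up keyed to its own occurrences. The sketch below unpacks duplicates_4 on the small array from the start of the thread; groups and dups are illustrative names, and Ruby 1.9 or later is assumed.

```ruby
array = %w{apple banana apple orange}

# With an identity block, each distinct value maps to all of its occurrences.
groups = array.group_by { |e| e }

p groups["apple"]   # prints ["apple", "apple"]
p groups["banana"]  # prints ["banana"]

# A value is a duplicate when its group holds more than one member.
dups = groups.select { |_, members| members.size > 1 }.keys
p dups  # prints ["apple"]
```

The switch from Benchmark.bm to Benchmark.bmbm in the diff explains the extra "Rehearsal" blocks in the output: bmbm runs every report once as a warm-up and then times it again.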

Robert Klemme

Oct 29, 2007, 6:28:20 AM
2007/10/28, Sean O'Halpin <sean.o...@gmail.com>:

> On 10/28/07, Robert Klemme <short...@googlemail.com> wrote:
> >
> > irb(main):007:0> array = %w{apple banana apple orange}
> > => ["apple", "banana", "apple", "orange"]
> > irb(main):008:0> array.inject(Hash.new(0)) {|ha,e|
> > ha[e]+=1;ha}.delete_if {|k,v| v==1}.keys
> > => ["apple"]
> >
> Succinct ~and~ efficient!

Thanks!

> Do you have a mail filter checking for any posts
> containing 'inject'? :)

I don't need that since most of them were written by me. :-) (slight
exaggeration)
*chuckle*

Kind regards

robert
