I am trying to output the duplicate elements in an array. I looked through
the Ruby API and found the uniq method, which returns the array with
duplicates removed. What I want to know is which elements are duplicated.
For example
array = ["apple", "banana", "apple", "orange"]
=> ["apple", "banana", "apple", "orange"]
array.uniq
=> ["apple", "banana", "orange"]
I want a method that tells me that "apple" is the duplicated element.
I tried this, but it does not work:
array - array.uniq
Any ideas?
Regards
Shuaib
--
Posted via http://www.ruby-forum.com/.
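A note on why the subtraction comes back empty: Array#- removes every element that also appears in the right-hand array, no matter how many times it occurs on the left, so nothing survives. A minimal sketch using the array from the question (the variable name difference is only for illustration):

```ruby
array = ["apple", "banana", "apple", "orange"]

# Array#- drops every element that is present in the right-hand array,
# so both copies of "apple" are removed along with everything else.
difference = array - array.uniq
p difference
```

Since array.uniq contains every distinct value of array, the result is always an empty array, whatever the input.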
I don't know a good way to do it, but one way to get the result would be
to force it into a hash since that eliminates duplicates.
I'm sure there's a better way to do it, but here's what I got.
array = ["apple", "banana", "apple", "orange", "fat", "cow", "cow"]
h = Hash.new
duplicates = []
array.each {|item|
  if h.has_key?(item) then
    duplicates << item
  else
    h[item] = 0 #it doesn't matter what we store
  end
}
puts duplicates
Cheers,
Mohit.
10/28/2007 | 9:16 PM.
irb(main):007:0> array = %w{apple banana apple orange}
=> ["apple", "banana", "apple", "orange"]
irb(main):008:0> array.inject(Hash.new(0)) {|ha,e|
ha[e]+=1;ha}.delete_if {|k,v| v==1}.keys
=> ["apple"]
Kind regards
robert
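For anyone new to inject, the one-liner above can be read as three separate steps. This is only an unpacked sketch of the same expression; the names counts and duplicated are introduced here for illustration and are not part of the original post.

```ruby
array = %w{apple banana apple orange}

# Step 1: count occurrences. Hash.new(0) makes missing keys start at 0,
# and the block must return the hash so inject can pass it along.
counts = array.inject(Hash.new(0)) { |ha, e| ha[e] += 1; ha }

# Step 2: drop the entries seen only once (delete_if modifies counts in place).
counts.delete_if { |k, v| v == 1 }

# Step 3: the keys that remain are the duplicated elements.
duplicated = counts.keys
p duplicated
```

With this input, duplicated holds just "apple", matching the irb result above.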
I really appreciate your help
Cheers
arr,dup = ["apple", "banana", "apple", "orange"],[]
(arr.length-1).times do
  a = arr.shift
  dup << a if arr.include?(a)
end
p dup.uniq
Harry
--
A Look into Japanese Ruby List in English
http://www.kakueki.com/
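One thing to be aware of with the shift-based loop is that it consumes arr as it goes, so the original array is mostly gone afterwards. A rough sketch of the same idea that leaves its input alone might look like the following; the method name duplicates_by_scan is made up for this example, and like the original it rescans the rest of the array for every element, so it is only suitable for small inputs.

```ruby
# Hypothetical helper: same "is it repeated later on?" test as the
# shift-based loop, but without modifying the array passed in.
def duplicates_by_scan(array)
  found = []
  array.each_with_index do |item, index|
    # Only look at the elements after the current position.
    rest = array[(index + 1)..-1]
    found << item if rest.include?(item)
  end
  found.uniq
end

arr = ["apple", "banana", "apple", "orange"]
p duplicates_by_scan(arr)
p arr.length
```

Here arr still has all four elements after the call.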
Hi,
They are definitely worth looking into - inject in particular is a
powerful tool (Robert Klemme can make it do anything!). However, the
following benchmark shows that a slight modification of your approach
is actually pretty efficient. (The modification is to store the
duplicates in a hash rather than an array so you can return the list
of duplicates using Hash#keys).
Regards,
Sean
# Mohit Sindhwani (with slight adjustment)
def duplicates_1(array)
  seen = { }
  duplicates = { }
  array.each {|item| seen.key?(item) ? duplicates[item] = true :
    seen[item] = true}
  duplicates.keys
end
# Robert Klemme
def duplicates_2(array)
  array.inject(Hash.new(0)) {|ha,e| ha[e]+=1;ha}.delete_if {|k,v| v==1}.keys
end
# from facets
def duplicates_3(array)
  array.inject(Hash.new(0)){|h,v| h[v]+=1; h}.reject{|k,v| v==1}.keys
end
require 'benchmark'
def do_benchmark(title, n, methods, *args, &block)
  puts '-' * 40
  puts title
  puts '-' * 40
  Benchmark.bm(methods.map{ |x| x.to_s.length}.max + 2) do |x|
    methods.each do |meth|
      x.report(meth.to_s) { n.times do send(meth, *args, &block) end }
    end
  end
end
# get some data (Ubuntu specific I guess - YMMV)
array = File.read('/etc/dictionaries-common/words').split(/\n/)
# test w/o dups
do_benchmark('no duplicates', 10, [:duplicates_1, :duplicates_2,
  :duplicates_3], array)
# create some duplicates
array = array[0..999] * 100
do_benchmark('duplicates', 10, [:duplicates_1, :duplicates_2,
  :duplicates_3], array)
__END__
$ ruby bm-duplicates.rb
----------------------------------------
no duplicates
----------------------------------------
                    user     system      total        real
duplicates_1    2.200000   0.010000   2.210000 (  2.215057)
duplicates_2    5.820000   0.000000   5.820000 (  5.812414)
duplicates_3    6.580000   0.010000   6.590000 (  6.586708)
----------------------------------------
duplicates
----------------------------------------
                    user     system      total        real
duplicates_1    1.560000   0.000000   1.560000 (  1.562587)
duplicates_2    2.660000   0.000000   2.660000 (  2.665301)
duplicates_3    2.590000   0.000000   2.590000 (  2.595189)
Thanks Sean! Makes me feel quite nice about it.
So, hashes are faster than arrays?
Cheers,
Mohit.
10/29/2007 | 2:13 AM.
It depends what you're doing with them and how big they are. But in
this instance, I changed your solution to use a hash because you were
appending the duplicates to an array which resulted in adding an entry
to that array every time you detected a duplicate. This didn't show up
in your example because your data contained at most two instances of
an item. If you change your example to:
array = ["apple", "banana", "apple", "orange", "fat", "cow", "cow",
         "apple", "apple"]
h = Hash.new
duplicates = []
array.each {|item|
  if h.has_key?(item) then
    duplicates << item
  else
    h[item] = 0 #it doesn't matter what we store
  end
}
puts duplicates
it outputs
apple
cow
apple
apple
which is probably not what you want.
Regards,
Sean
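For comparison, here is a sketch of the hash-keyed variant run on the same extended array. It follows the same logic as duplicates_1 from the benchmark, written out with if/else for readability; the method name duplicate_keys is only used for this illustration.

```ruby
# Same idea as duplicates_1: record duplicates as hash keys so that
# seeing the same item a third or fourth time does not add another entry.
def duplicate_keys(array)
  seen = {}
  duplicates = {}
  array.each do |item|
    if seen.key?(item)
      duplicates[item] = true # repeated writes to the same key collapse
    else
      seen[item] = true
    end
  end
  duplicates.keys
end

array = ["apple", "banana", "apple", "orange", "fat", "cow", "cow",
         "apple", "apple"]
puts duplicate_keys(array)
```

Each duplicated item is now reported once, however many times it repeats.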
Here's yet another way to do it:
http://snippets.dzone.com/posts/show/4148
Cheers,
j.k.
Cheers,
Mohit.
10/29/2007 | 11:44 AM.
I just tested this using Ruby 1.9 on a P4 box running Windows XP. I included Ruby's group_by and got surprising results.
C:\ruby1.9\bin>diff test-old.rb test.rb
19a20,24
> #1.9's group_by
> def duplicates_4(array)
> array.group_by{|e|e}.select{|_,k| k.size>1}.keys
> end
>
26c31
< Benchmark.bm(methods.map{ |x| x.to_s.length}.max + 2) do |x|
---
> Benchmark.bmbm(methods.map{ |x| x.to_s.length}.max + 2) do |x|
34c39
< array = File.read('/etc/dictionaries-common/words').split(/\n/)
---
> array = File.read('american-english').split(/\n/)
38c43
< :duplicates_3], array)
---
> :duplicates_3,:duplicates_4], array)
43c48
< :duplicates_3], array)
---
> :duplicates_3,:duplicates_4], array)
C:\ruby1.9\bin>
C:\ruby1.9\bin>ruby test.rb
----------------------------------------
no duplicates
----------------------------------------
Rehearsal -------------------------------------------------
duplicates_1    7.609000   0.094000   7.703000 (  7.984000)
duplicates_2   10.438000   0.109000  10.547000 ( 11.608000)
duplicates_3   14.609000   0.219000  14.828000 ( 14.874000)
duplicates_4   11.422000   0.141000  11.563000 ( 14.201000)
--------------------------------------- total: 44.641000sec
                    user     system      total        real
duplicates_1    7.219000   0.125000   7.344000 (  8.109000)
duplicates_2    9.844000   0.078000   9.922000 ( 10.374000)
duplicates_3   14.391000   0.172000  14.563000 ( 18.498000)
duplicates_4   11.172000   0.172000  11.344000 ( 12.998000)
----------------------------------------
duplicates
----------------------------------------
Rehearsal -------------------------------------------------
duplicates_1    3.375000   0.000000   3.375000 (  3.765000)
duplicates_2    3.218000   0.000000   3.218000 (  3.828000)
duplicates_3    3.250000   0.000000   3.250000 (  3.672000)
duplicates_4    2.032000   0.047000   2.079000 (  2.077000)
--------------------------------------- total: 11.922000sec
                    user     system      total        real
duplicates_1    3.375000   0.000000   3.375000 (  3.437000)
duplicates_2    3.188000   0.000000   3.188000 (  3.218000)
duplicates_3    3.219000   0.015000   3.234000 (  3.281000)
duplicates_4    1.844000   0.000000   1.844000 (  1.859000)
C:\ruby1.9\bin>
kind regards -botp
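For readers who have not used group_by, here is a small sketch of what duplicates_4 does on the array from the original question, with the intermediate hash pulled out into a variable. The last variant uses Enumerable#tally, which was not part of this thread and assumes Ruby 2.7 or later; the variable names are made up for the example.

```ruby
array = %w{apple banana apple orange}

# group_by builds a hash from each value to the list of its occurrences,
# e.g. "apple" maps to ["apple", "apple"].
groups = array.group_by { |e| e }

# Keep only the groups with more than one member, then take the keys.
by_group = groups.select { |_, members| members.size > 1 }.keys

# Assumption: Ruby 2.7+. tally counts occurrences directly, so the
# counting step does not need inject or group_by.
by_tally = array.tally.select { |_, count| count > 1 }.keys

p by_group
p by_tally
```

Both variants report "apple" as the only duplicate for this input.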
Thanks!
> Do you have a mail filter checking for any posts
> containing 'inject'? :)
I don't need that since most of them were written by me. :-) (slight
exaggeration)
*chuckle*
Kind regards
robert