How to improve code efficiency of my example

17 views
Skip to first unread message

wbsu...@gmail.com

unread,
Sep 11, 2015, 2:32:01 PM9/11/15
to Ruby on Rails: Talk
# I was just at a job interview and was asked to write a function that did a frequency of various words from a string.
# I wrote something out on the board similar to my freq() method below. When I got home I coded this example up as
# an exercise
#
# I was then asked how could I improve the efficiency of the function. My guess is it would need to involve multi processing ?
# multi threading would not help I don't think. Offhand I wasn't sure how to answer the question. Multi processing via fork is
# not easy because the freq has to be combined at the end. That seems to have been the best answer but how to do it in
# my simple function I don't know ..
#
# they also asked what is the complexity which I forget that stuff as once in a while I review it, but then within a few months I forget it again and
# I have many areas I am trying to study ..
#
#

require 'benchmark'

class FreqTest
  attr_reader :data_str

  def initialize
    gen_data
  end

    def freq
        map = {}
        tokens = @data_str.scan(/\w+/)
        tokens.each do |tok|
            map[tok] ||= 0
            map[tok] += 1
        end
        map
  end

  private
 
    def gen_data
        parts = %w(ze pu ke no zi al loo kuna zetch pe def tuna kaga)
        @data_str = ""
        loop_cnt = rand(100000) + 100
        loop_cnt.times do
            str = ""
            (rand(4) + 1).times{str << parts[rand(parts.length)]}
            # puts "<" + str + ">"
            @data_str << str + " "
        end
    end

end

ftest = FreqTest.new


puts ftest.data_str     

puts Benchmark.measure{
    ftest.freq
}

Frederick Cheung

unread,
Sep 12, 2015, 1:57:51 PM9/12/15
to Ruby on Rails: Talk


On Friday, September 11, 2015 at 7:32:01 PM UTC+1, wbsu...@yahoo.com wrote:
# I was just at a job interview and was asked to write a function that did a frequency of various words from a string.
# I wrote something out on the board similar to my freq() method below. When I got home I coded this example up as
# an exercise
#
# I was then asked how could I improve the efficiency of the function. My guess is it would need to involve multi processing ?
# multi threading would not help I don't think. Offhand I wasn't sure how to answer the question. Multi processing via fork is
# not easy because the freq has to be combined at the end. That seems to have been the best answer but how to do it in
# my simple function I don't know ..
#

At the moment your inner loop body looks like this:

map[tok] ||= 0
map[tok] += 1

which reads from the hash twice and writes to it once or twice depending on whether the value is already set.

You could do
existing = map[tok] || 0
map[tok] = existing + 1

which always does one hash read and one hash write. You could also use a hash with default value

map = Hash.new(0)
And then in your loop, just map[tok]+=1

You could also argue that building up an array of words is wasteful - you're allocating a great big array of words but you don't actually need the array - just one work at a time.. You could instead do

@data_str.scan(/\w+/) do |token|
  map[token] += 1
end

In a quick benchmark this is about 15% faster than the original on my machine

Personally I wouldn't consider parallelising this to be something that makes it more efficient - just something that makes it faster.  Forking should work fine. As you say you will have to combine the results, but that's not a problem. You might want to look at the map-reduce approach for more examples of this

Fred

Reply all
Reply to author
Forward
0 new messages