Re: Fog::Storage::Rackspace::Directories#get consuming 67% of the process time

57 views
Skip to first unread message

Kyle Rames

unread,
Apr 8, 2013, 1:45:46 PM4/8/13
to ruby...@googlegroups.com
I am still researching this, but it sounds like Ruby's time parsing might just be really slow:


Kyle


On Monday, April 8, 2013 12:03:17 PM UTC-5, ha wrote:
Hi guys,

I'm troubleshooting some latency issues in one of our backend services and I'm getting a strange result.

Not sure if I'm reading the result correctly, so see the pic of the ruby-prof output.

About my environment:
* ruby-1.9.3-p194 [ x86_64 ]
* fog (1.8.0)
* daemons (1.1.8)

It seems to me that 67% of the time is being spent in "Directories::get"

67.02% (95.29%) Fog::Storage::Rackspace::Directories#get [56 calls, 56 total]

And 32% spent to set last_modified attribute and time parsing.
32.26% (64.75%) Fog::Storage::Rackspace::File#last_modified= [101185 calls, 101241 total]


Any clue on what is going on?

Thanks,
./Helio

ha

unread,
Apr 8, 2013, 2:08:37 PM4/8/13
to ruby...@googlegroups.com
Thanks Kyle,

Going through the Fog code I got this block that seems to be doing the parse.
        when :time
          class_eval <<-EOS, __FILE__, __LINE__
            def #{name}=(new_#{name})
              attributes[:#{name}] = if new_#{name}.nil? || new_#{name} == "" || new_#{name}.is_a?(Time)
                new_#{name}
              else
                Time.parse(new_#{name})
              end
            end
          EOS

That is inside lib/fog/core/attributes.rb.

I'm trying to set that directly in my code to move from the false to the true side of the if!


./Helio

Kyle Rames

unread,
Apr 8, 2013, 2:43:09 PM4/8/13
to ruby...@googlegroups.com
The more I research this, the more I am convenced this is an issue with Ruby's time implementation. 

According to the home_run gem (which is alternative date implementation for Ruby), the standard Date library was re-written in C and received a significant speed boost. 


However, when I execute the following in irb

require 'time'
Time.method(:parse).source_location

It says that it is coming from a ruby file.

["/Users/kyle.rames/.rvm/rubies/ruby-1.9.3-p286/lib/ruby/1.9.1/time.rb", 263] 

Does anyone have an ideas? Are we using the wrong method for this?

Kyle

geemus (Wesley Beary)

unread,
Apr 8, 2013, 4:06:50 PM4/8/13
to ruby...@googlegroups.com
Hmm. I don't think it should just be the time parsing. ha, could you share the code where the problem is occurring? Also, running with `ENV['EXCON_DEBUG']="true"` may be enlightening (I'm suspecting there are more api calls happening than one might think.


--
You received this message because you are subscribed to the Google Groups "ruby-fog" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ruby-fog+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

ha

unread,
Apr 8, 2013, 4:26:13 PM4/8/13
to ruby...@googlegroups.com
I'm not sure I'll be able to get something runnable but I can list the steps my process goes through Fog library to remote store files in Rackspace.

BTW: I changed the code (I listed above) to the one below and re-tested:
        when :time
          puts "This is name: #{name}"
          class_eval <<-EOS, __FILE__, __LINE__
            def #{name}=(new_#{name})
              attributes[:#{name}] = if new_#{name}.nil? || new_#{name} == "" || new_#{name}.is_a?(Time)
                new_#{name}
              else
                # Let's see what happens...
                #Time.parse(new_#{name})
                new_#{name}
              end
            end
          EOS

The result I get in the ruby-prof is around 30% better, just by eliminating Time.parse(...).

./Helio

geemus (Wesley Beary)

unread,
Apr 8, 2013, 4:40:18 PM4/8/13
to ruby...@googlegroups.com
Cool, yeah whatever extra info you can provide would help narrow this down.

ha

unread,
Apr 8, 2013, 4:50:56 PM4/8/13
to ruby...@googlegroups.com
--> Pull some XML, parse, read some images
Then each image (original, and a few resized ones) go through a code similar to this:

      @my_fog = Fog::Storage.new(
        provider:           'Rackspace',
        rackspace_username: rack_username,
        rackspace_api_key:  rack_api_key
      )

--> Call a method to store_image that looks like a lot this:

    def store_image(filename, image_file)
      container = @my_fog.directories.get(image_container) # image_container is like '<RACK_ENV>_images<CONTAINER_NUMBER>'

      new_object = container.files.create(
        :key          => "#{@hashed_url}/#{filename}.jpg",
        :body         => open(image_file),
        :public       => true,
        :content_type => 'image/jpeg'
      )

      new_object.public_url
    rescue => e
      msg = e.respond_to?(:message) ? e.message : e
      raise MyERROR.new("Error storing image: #{msg}", "IMAGE_STORE")
    end

Then the process go back to the beginning and keep doing it until all the XMLs are parsed and all images are stored in Rackspace.


It does seems easy for me to detach this code from the app and see if I can get a script doing just this so I can optimize.
I'll first switch from using Fog and storing the files locally so I can then compare with same image sample.


Some of the number I'm having:
1. Remote storage (Fog+Rackspace): 3 hours for about 3k images
2. Local storage: 15 minutes
3. The code I sent below without using Time.parse(...): I expect to take 2 hours, but I haven't run a full simulation with production data


Thanks a lot for helping!


./Helio

geemus (Wesley Beary)

unread,
Apr 8, 2013, 4:58:00 PM4/8/13
to ruby...@googlegroups.com
One thing jumps out immediately that could probably solve your issue. The first line of store_image reads:

container = @my_fog.directories.get(image_container)

This does an http request and pulls back data/parses (and as we saw can be quite slow in some cases). In this case, if you can reasonably assume that this directory already exists, you can skip the http request by doing this:

container = @my_fog.directories.new(:key => image_container)

That will create just a local reference without making any remote calls, which you can then chain the other commands on to do file creation. Pretty sure that simple change should pretty much take care of the issues you mentioned (though there may be further room for improvement). Sorry that this is non-obvious (I've seen another of other people encounter the same issue). Definitely welcome any suggestions about how we can make this behavior clearer in the documentation or elsewhere.

Thanks!
wes

ha

unread,
Apr 9, 2013, 8:52:50 AM4/9/13
to ruby...@googlegroups.com
Thanks Wes!

That change alone had a big impact in the processing time.
Isn't it as fast as I'd like to but now the bottleneck lies somewhere else!

I took the time to look at the documentation and it seems pretty clear the best way to do it, is how you suggested in your email.
In the 'Getting Started' page it is suggested to use:
directory = storage.directories.create(
  :key => 'data'
)

And also in the Rackspace 'Provider Documentation' it suggests a code very similar to the above in the upload_file.rb.
I guess the person who implemented the service I'm optimizing use other documentation!


Thanks,
./Helio

geemus (Wesley Beary)

unread,
Apr 9, 2013, 10:20:18 AM4/9/13
to ruby...@googlegroups.com
Great to hear. Documenting it has been a bit tricky because it can depend on the situation. Create as above is required if the directory doesn't always exist and in some cases you may want to do the get instead of new (even though it is much slower) because you need/want to ensure that the directory is there (or is still there). That said, we can probably clean some of that up. Thanks for the feedback and just let us know if you run into anything else that seems particularly out of control.
Thanks!
wes

Rupak Ganguly

unread,
Apr 9, 2013, 3:39:38 PM4/9/13
to ruby...@googlegroups.com
Hi Helio,

As Wes said, if you are sure that the container exists, you can use:
 container = @my_fog.directories.new(:key => image_container)

But, if you are not sure, then,

In case of HP Cloud, I have an extra 'head' method for the Directory model. I am sure Rackspace has a HEAD api call. The situations where you want to be sure that the container is present, a 'directories.head()' call is much more performant than the  'directories.get()' call. Although, at the worst case, if you have to refresh the 'directory.files' collection just to be sure of files you already got in the directory, I guess you will have to take the hit and call 'directories.get'.

But, otherwise, you could do something like:

@my_fog.directories.head("image_container").files.create(:key => "sample.txt", :body => "This is a sample file")

Hope it helps.


Thanks,
Rupak Ganguly
Ph: 678-648-7434

Kyle Rames

unread,
Apr 9, 2013, 3:57:10 PM4/9/13
to ruby...@googlegroups.com
Unfortunately, the Rackspace implementation does not currently have a head method. However, you could do something like the following:

container = nil
begin
  @my_fog.service.head_container("image_container")
  container = @my_fog.directories.new(:key => image_container)
rescue Fog::Rackspace::Errors::NotFound => e
  container = @my_fog.directories.create(:key => image_container)
end  


Kyle

ha

unread,
Apr 12, 2013, 3:46:49 PM4/12/13
to ruby...@googlegroups.com
Kyle,
Your code is pretty concise! :-)

I know the containers exists before hand, because I create them before the App starts.
That is actually the current logic and I don't know exactly the reason behind that (I came in after the first release was up & running).

I check the docs and difference between (@my_fog.directories.)new and create is that create calls new, save, and then return the new object.

Is it a fair assumption that the create will indeed make an extra request to rackspace?

Considering that I have my containers created, that before deployment I check whether they exsist, and nobody else (I want to believe on this one...) touches them, I feel like not using the NotFound exception.

But the code is so beautiful and complete that I feel like using it! :-)

Thanks,
./Helio

Kyle Rames

unread,
Apr 12, 2013, 4:01:31 PM4/12/13
to ruby...@googlegroups.com
Unfortunately create will make a request. Hopefully we can look at changing it in the future.

And thanks for the complement! :)

Kyle
Reply all
Reply to author
Forward
0 new messages