[Eventmachine-talk] EM sending and receiving large files


Dan Mayer

Sep 28, 2008, 10:18:11 PM
to eventmac...@rubyforge.org
We have been trying to send large files with EventMachine and noticed a few issues. If we just use send_data with the contents of a file, it is slow, and the server eats about 98% of the CPU. The send_file call only supports files up to 32K, whereas we are sending files as large as 5MB. Lastly, we have been unable to use stream_file_data, because it depends on evma_fastfilereader, which I couldn't find anywhere to install anymore.
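For reference on the send_data case, here is a minimal pure-Ruby sketch of reading a file in fixed-size chunks instead of loading it into one String. It does not use EventMachine at all; each_file_chunk and the 16 KB CHUNK_SIZE are hypothetical choices made for illustration, and in an EM connection each yielded chunk is what you would hand to send_data.

```ruby
require "tempfile"

# Hypothetical chunk size, chosen only for illustration.
CHUNK_SIZE = 16 * 1024

# Yields the file's bytes CHUNK_SIZE at a time instead of reading the whole
# file into a single String.
def each_file_chunk(path, chunk_size = CHUNK_SIZE)
  File.open(path, "rb") do |file|
    # IO#read(n) returns nil once the end of the file has been reached.
    while (chunk = file.read(chunk_size))
      yield chunk
    end
  end
end

# Demo with a throwaway 40,000-byte file.
Tempfile.create("chunk-demo") do |f|
  f.binmode
  f.write("x" * 40_000)
  f.flush
  sizes = []
  each_file_chunk(f.path) { |chunk| sizes << chunk.bytesize }
  p sizes # => [16384, 16384, 7232]
end
```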

Some of these issues have been discussed in this thread:
http://groups.google.com/group/eventmachine/browse_thread/thread/3cc6b0ee1a8419?pli=1

Has anyone been sending large files with EventMachine who could share some tips? In our case we are using EM for both the client and the server. We are trying to sync a directory of many files; is this just not a recommended usage of EM? Besides looking for ways to make this work better on EM, are there other recommended ways to send and receive large amounts of file data with Ruby?

Thanks,
Dan

--
Dan Mayer
Co-founder, Devver
(http://devver.net)
follow us on twitter: http://twitter.com/devver
My Blog (http://mayerdan.com)

Kirk Haines

Sep 28, 2008, 10:46:27 PM
to eventmac...@rubyforge.org
On Sun, Sep 28, 2008 at 8:18 PM, Dan Mayer <d...@devver.net> wrote:
> Lastly, we have been unable to use stream_file_data, because it depends on evma_fastfilereader, which I couldn't find anywhere to install anymore.

Hmmm. I think that was a confused oversight on Francis's and my part. evma_fastfilereader should be part of EM. Until it is, you can get it by installing Swiftiply.

> Has anyone been sending large files with EventMachine who could share some tips?

Using stream_file_data, I regularly transfer very large files with Swiftiply.

Kirk Haines

James Tucker

Sep 29, 2008, 7:56:15 AM
to eventmac...@rubyforge.org
On 29 Sep 2008, at 03:46, Kirk Haines wrote:

> Hmmm. I think that was a confused oversight on Francis's and my part. evma_fastfilereader should be part of EM. Until it is, you can get it by installing Swiftiply.

I've been meaning to grab it and commit it to EM, as it's also responsible for the last failing test in the suite (run from trunk) after the last month's work. Assuming no other issues are raised, I will get this committed to the EM code base.

_______________________________________________
Eventmachine-talk mailing list
Eventmac...@rubyforge.org
http://rubyforge.org/mailman/listinfo/eventmachine-talk

Dan Mayer

Sep 29, 2008, 8:45:11 PM
to eventmac...@rubyforge.org
Thanks for the tip on installing Swiftiply, that made stream_file_data work perfectly.

Unfortunately, it didn't solve our problem. Large files were still taking a long time to transfer, so I looked deeper into the issue; I had always assumed the delay was the transfer itself. Running a profiler against our code was enlightening as always: it appears our message buffer is adding a significant amount of the time. If I completely remove the message buffer the server uses to split up multiple messages, either send_data or stream_file_data (with larger files) drops to less than 1 second. After searching around a bit I found BufferedTokenizer, which is one of the protocols for EM. Switching from our apparently bad buffer to the one included with EM brought us from 10 seconds to 1.2 seconds.
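The idea behind a tokenizer like BufferedTokenizer can be sketched in a few lines. This is a hypothetical illustration in the spirit of EM's class, not its source: SimpleTokenizer is an invented name, and its extract(data) method is modeled on BufferedTokenizer#extract, which takes newly received data and returns an array of complete tokens. The key property is that each call splits only the newly arrived chunk rather than re-splitting everything buffered so far.

```ruby
# Hypothetical delimiter tokenizer in the spirit of EM's BufferedTokenizer.
# Only the newly arrived chunk is split; earlier partial data is kept as an
# array of pieces and joined once, when its token is finally complete.
# Limitation: a multi-character delimiter that straddles two chunks is not
# detected. Also avoid " " as the delimiter: String#split treats a single
# space specially (awk-style whitespace splitting).
class SimpleTokenizer
  def initialize(delimiter = "\n")
    @delimiter = delimiter
    @pieces = [] # fragments of the current, still-incomplete token
  end

  # Returns an array of the tokens completed by this chunk (possibly empty).
  def extract(data)
    return [] if data.empty?

    # The -1 limit keeps a trailing "" when data ends with the delimiter,
    # which signals that the last token in this chunk is complete.
    parts = data.split(@delimiter, -1)
    @pieces << parts.shift
    return [] if parts.empty? # no delimiter in this chunk

    parts.unshift(@pieces.join) # buffered pieces + head of this chunk
    @pieces.clear
    @pieces << parts.pop        # new partial token ("" if none)
    parts
  end
end

tok = SimpleTokenizer.new("|")
p tok.extract("ab|c")  # => ["ab"]
p tok.extract("d|e|")  # => ["cd", "e"]
p tok.extract("f")     # => []
p tok.extract("|")     # => ["f"]
```

In a connection, receive_data would then reduce to something like `@tokenizer.extract(data).each { |msg| process(msg) }`, where @tokenizer is a hypothetical instance variable.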

Thanks for the help; it looks like everything is back on track for our EM performance.

thanks,
Dan Mayer

Aman Gupta

Sep 29, 2008, 9:49:29 PM
to eventmac...@rubyforge.org
Do you know what specifically about your buffer was causing issues?
Were you using String#<<?

Aman

Dan Mayer

Sep 29, 2008, 10:07:30 PM
to eventmac...@rubyforge.org
Aman (and hopefully others on the list who are interested),

Here is a profiler dump after I optimized a bit; I got ours from 26-ish seconds down to 10 by getting rid of things like String#<<:
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 14.44     3.49      0.66      668     0.99     0.99  String#split
 13.13     4.09      0.60      665     0.90     0.90  String#index
  4.16     4.28      0.19      668     0.28     3.29  DataBuffer#grab
  3.06     4.42      0.14      661     0.21     6.87  EmServerExample#receive_data
  0.88     4.46      0.04     2007     0.02     0.02  Array#length
  0.66     4.49      0.03     2007     0.01     0.01  Fixnum#>
  0.66     4.52      0.03      662     0.05     3.31  DataBuffer#append

What is the fastest way to do appending to strings?
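On the append question: String#<< mutates the receiver in place, while String#+ returns a new String holding copies of both operands, so `buf = buf + data` copies everything buffered so far on every append. The sketch only demonstrates that object-identity difference and makes no timing claim. Note also that in the profile above, DataBuffer#append's own time is 0.66%, while String#split and String#index (called from DataBuffer#grab) together account for over 27%.

```ruby
# String#+ allocates a brand-new String on every call, copying both
# operands. String#<< appends to the existing String object in place.
buf = "".dup
original_id = buf.object_id

buf << "abc"                 # mutates buf in place
same_after_append = buf.object_id == original_id

buf = buf + "def"            # builds a new String and rebinds buf
same_after_plus = buf.object_id == original_id

p buf                # => "abcdef"
p same_after_append  # => true
p same_after_plus    # => false
```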

This is really messy, since I was messing around trying a bunch of optimizations and other things before finding and switching to the EM buffer.

class DataBuffer
  FRONT_DELIMITER = "0x5b".hex.chr # '['
  #']'[0].to_s(16).hex.chr
  BACK_DELIMITER = "0x5d".hex.chr # ']'
  # crazy delimiter because normal ones kept showing up in binary files
  DELIMITER = "|#{FRONT_DELIMITER}#{FRONT_DELIMITER}#{FRONT_DELIMITER}GT_DELIM#{BACK_DELIMITER}#{BACK_DELIMITER}#{BACK_DELIMITER}#{BACK_DELIMITER}|"
  # added to replace dynamically making these
  DELIM_ESCAPE = /#{Regexp.escape(DELIMITER)}/
  DELIM_ESCAPE_END = /#{Regexp.escape(DELIMITER)}\Z/

  def initialize
    @unprocessed = ""
    @commands = []
  end

  def grab
    # NOTE: this re-splits the entire unprocessed buffer on every call
    new_messages = @unprocessed.split(DELIM_ESCAPE)
    while new_messages.length > 1
      @commands << new_messages.shift
    end
    msg_length = new_messages.length
    if msg_length > 0
      if msg_length == 1 && (@unprocessed =~ DELIM_ESCAPE_END)
        # @commands << new_messages.shift
        @commands.push(new_messages.shift)
        @unprocessed = ""
      else
        # put the rest of the last statement back into the buffer
        while (cut = @unprocessed.index(DELIM_ESCAPE))
          @unprocessed = (@unprocessed[cut..@unprocessed.length]).sub(DELIMITER, "")
        end
      end
    end
    if @commands.length > 0
      return @commands.shift
    else
      return nil # if @commands.length == 0
    end
  end

  def prepare(str)
    str.to_s + DELIMITER
  end

  def append(data)
    # @unprocessed << data
    @unprocessed = @unprocessed + data
  end
end

... client / server code usage ...
send_data(@buffer.prepare("some_msg"))

def receive_data(data)
  @buffer.append(data)
  while (command = @buffer.grab)
    process(command)
  end
end

def process(data)
  puts "got data: #{data}"
end
...
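The comment on DELIMITER notes that ordinary delimiters kept showing up in binary files. A common way around that, which is not what this thread ended up using, is length-prefixed framing: each message is preceded by its byte length, so the payload never has to be scanned for a delimiter. This sketch assumes a 4-byte big-endian header (`Array#pack("N")`), which caps a message at just under 4 GiB; LengthPrefixedBuffer and its prepare/extract methods are hypothetical names chosen to echo DataBuffer#prepare and BufferedTokenizer#extract.

```ruby
# Hypothetical length-prefixed framing: every frame is a 4-byte big-endian
# length ("N") followed by that many payload bytes, so the payload may
# contain any bytes at all, including anything that looks like a delimiter.
class LengthPrefixedBuffer
  HEADER_SIZE = 4

  def initialize
    @buffer = "".b # binary (ASCII-8BIT) accumulator
  end

  # Frames a message for sending, e.g. send_data(buffer.prepare(msg)).
  def prepare(message)
    payload = message.to_s.b
    [payload.bytesize].pack("N") + payload
  end

  # Appends received bytes and returns every frame that is now complete.
  def extract(data)
    @buffer << data.b
    messages = []
    offset = 0
    while @buffer.bytesize - offset >= HEADER_SIZE
      length = @buffer.byteslice(offset, HEADER_SIZE).unpack("N").first
      break if @buffer.bytesize - offset - HEADER_SIZE < length # frame incomplete
      messages << @buffer.byteslice(offset + HEADER_SIZE, length)
      offset += HEADER_SIZE + length
    end
    # Drop consumed bytes once per call rather than once per message.
    @buffer = @buffer.byteslice(offset, @buffer.bytesize - offset) if offset > 0
    messages
  end
end

buffer = LengthPrefixedBuffer.new
wire = buffer.prepare("hello") + buffer.prepare("a|b\nc")
p buffer.extract(wire[0, 7])   # => []
p buffer.extract(wire[7..-1])  # => ["hello", "a|b\nc"]
```

Because each frame's size is known up front, extract never rescans payload bytes; it only reads headers and slices.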

I am probably going to look more closely at the EM buffer and our code, and I am sure I will realize we did something pretty dumb.

Thanks,
Dan

James Tucker

Sep 30, 2008, 7:25:17 AM
to eventmac...@rubyforge.org
Dan,

If you have some time, could you run your data sets against this other BufferedTokenizer implementation:


Performance varies depending on the specific data set and the chunk size being added to the buffer. Ruby's GC certainly starts to cause performance issues with too many objects, so I'm trying to strike a balance.

Any input would be welcome,

Kind regards,

J.

Tony Arcieri

Sep 30, 2008, 12:46:29 PM
to eventmac...@rubyforge.org
On Tue, Sep 30, 2008 at 5:25 AM, James Tucker <jftu...@gmail.com> wrote:
> Dan,
>
> If you have some time, would you be able to use your data sets against this other BufferedTokenizer implementation:

A string-based one should generally be faster on Ruby 1.8.
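For comparison, a string-based tokenizer keeps all pending data in a single String and finds tokens with String#index. This is a hypothetical sketch (StringTokenizer is an illustrative name), not the implementation James linked or the one in EM, and whether it is actually faster depends on the interpreter, the chunk sizes, and how many chunks make up a token.

```ruby
# Hypothetical string-based tokenizer: all pending bytes live in one String,
# and complete tokens are located with String#index instead of String#split.
class StringTokenizer
  def initialize(delimiter = "\n")
    @delimiter = delimiter
    @buffer = "".dup
  end

  # Appends data and returns an array of complete tokens (possibly empty).
  def extract(data)
    # Everything already buffered is known to contain no full delimiter, so
    # start searching just early enough to catch one straddling the boundary.
    search_from = [@buffer.length - @delimiter.length + 1, 0].max
    @buffer << data
    tokens = []
    start = 0
    while (pos = @buffer.index(@delimiter, search_from))
      tokens << @buffer[start...pos]
      start = pos + @delimiter.length
      search_from = start
    end
    # Keep only the unfinished tail; skip the copy when nothing was consumed.
    @buffer = @buffer[start..-1] if start > 0
    tokens
  end
end

tok = StringTokenizer.new("|")
p tok.extract("ab|c")  # => ["ab"]
p tok.extract("d|e|")  # => ["cd", "e"]
p tok.extract("f")     # => []
p tok.extract("|")     # => ["f"]
```

Starting each search just before the newly appended data avoids rescanning a long partial token on every chunk, and it also catches a multi-character delimiter that is split across two chunks.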

--
Tony Arcieri
medioh.com

James Tucker

Sep 30, 2008, 1:55:26 PM
to eventmac...@rubyforge.org
In a few tests I did here, the differences were mostly related to the size of the incoming chunks and the number of chunks per token.

Speed differences between 1.8 and 1.9 vary; each has its own advantages at certain tasks, but the two implementations were overall quite comparable on both interpreters.

What I'm hoping to get an idea of is where and why the differences really come up.

Dan Mayer

Oct 1, 2008, 12:55:09 AM
to eventmac...@rubyforge.org
Sure, no problem. Sorry it took me so long to get back to this; I got slammed with some items I had to take care of today.

I ran it on a small test set of data, and the results were very similar: the current tokenizer in EM seemed to outperform your pastie by very small amounts. Tomorrow I can run it against a much larger, real project, and I will let you know if I notice any significant differences.

I am cleaning up some of the code I have been using, and will likely write a post about various methods of sending files through EM in the next couple of days. I noticed it wasn't easy to find examples of the various options on the web, so it might help a few people running into similar problems.

peace,
Dan Mayer

Dan Mayer

Oct 8, 2008, 11:42:16 AM
to eventmac...@rubyforge.org
One final follow up.

I posted some quick benchmarks comparing sending files with our buffer, EM's buffer, the buffer James Tucker suggested, and stream_file_data. I also included some benchmarks with compression, along with the code I used for testing. Since I hadn't easily found a good way to send files, I thought it might help some people in the future. It was nice to be able to just switch buffers and get a 10X improvement in speed.
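Since the benchmarks include compression, here is a minimal sketch of compressing a payload with Ruby's standard zlib library before sending it and inflating it once it has been reassembled on the other side. The post does not say here which library or settings were used, so Zlib, the default compression level, and the helper names pack_payload and unpack_payload are all assumptions. One caveat when combining this with a delimiter-based buffer: compressed output is arbitrary binary data, so it can contain the delimiter bytes.

```ruby
require "zlib"

# Hypothetical helpers: deflate a payload before framing/sending it and
# inflate it after the full message has been received.
def pack_payload(data, level = Zlib::DEFAULT_COMPRESSION)
  Zlib::Deflate.deflate(data, level)
end

def unpack_payload(compressed)
  Zlib::Inflate.inflate(compressed)
end

original = ("some repetitive file contents\n" * 1000).b
compressed = pack_payload(original)
restored = unpack_payload(compressed)

p compressed.bytesize < original.bytesize  # => true
p restored == original                     # => true
```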

http://devver.net/blog/2008/10/sending-files-with-eventmachine/

If anyone has any thoughts, tips, or alternative buffers let me know.

thanks,
Dan

Aman Gupta

Oct 9, 2008, 12:17:49 AM
to eventmac...@rubyforge.org
> If anyone has any thoughts, tips, or alternative buffers let me know.

You might also try Tony's C buffer:

http://github.com/igrigorik/em-http-request/tree/master/ext/buffer/em_buffer.c
http://github.com/tarcieri/rev/tree/master/ext/rev/rev_buffer.c

Aman

Tony Arcieri

Oct 9, 2008, 1:47:59 AM
to eventmac...@rubyforge.org
Although that buffer may be the source of the problems you were experiencing with Rev... that'd be good to know.
--
Tony Arcieri
medioh.com