We have been trying to send large files with EventMachine and have run into a few issues. If we just call send_data with the contents of a file, it is slow and the server eats about 98% of the CPU. The send_file_data call only supports files up to 32K, and we are sending files as large as 5 MB. Lastly, we have been unable to use stream_file_data, because it depends on evma_fastfilereader, which I can't find anywhere to install anymore.
Has anyone been sending large files with EventMachine who could share some tips? In our case we are using EM for both the client and the server. We are trying to sync a directory of many files; is this just not a recommended use of EM? Besides solutions that make this work better on EM, are there other recommendations for better ways to send and receive large amounts of file data with Ruby?
On Sun, Sep 28, 2008 at 8:18 PM, Dan Mayer <d...@devver.net> wrote:
> We have been trying to send large files with EventMachine and noticed a few
> issues. If we just use send data with the contents of a file inside it is
> slow, and the server eats about 98% of the CPU. The send_file call only
> supports files up to 32K, which we are sending files as large as 5mb. Lastly
> we have been unable to use stream_file_data, because it has a dependency on
> evma_fastfilereader, which I couldn't seem to find anywhere to install
> anymore.
Hmmm. I think that was an oversight on Francis's or my part. evma_fastfilereader should be part of EM. Until it is, you can get it by installing Swiftiply.
> Has anyone been sending large file with eventmachine that could share some
> tips. In our case we are using EM for both the client and the server. We are
> trying to sync over a directory of many files, is this just not a
> recommended usage of EM? Besides looking for solutions to make this work
> better on EM, are there other recommendations of better ways to send and
> receive large amounts of file data with Ruby?
Using stream_file_data I regularly transfer very large files with Swiftiply.
> Hmmm. I think that was confused oversight on Francis/my part.
> evma_fastfilereader should be part of EM. Until it is, you can get
> it by installing Swiftiply.
I've been meaning to come and grab it and commit it to EM, as it's also the last failing test in the suite run from trunk after the last month's work. Assuming no other issues are raised, I will get this committed to the EM code base.
Thanks for the tip on installing Swiftiply; that made stream_file_data work perfectly.
Unfortunately, it didn't solve our problem: large files were still taking a long time to transfer. So I looked deeper into the issue. I had always assumed the delay was simply slow transfer time, but running a profiler against our code was enlightening, as always: our message buffer accounts for a significant amount of the time. If I completely remove the message buffer that the server uses to split up multiple messages, either send_data or stream_file_data (with larger files) drops to less than 1 second. After searching around a bit I found BufferedTokenizer, which is one of the protocols for EM. Switching from our apparently bad buffer to the one included with EM brought us from 10 seconds to 1.2 seconds.
Thanks for the help; it looks like everything is back on track for our EM performance.
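For readers who have not used a buffer of this kind, here is a minimal sketch in plain Ruby of delimiter-based tokenizing: chunks arrive in arbitrary pieces, complete messages are handed back, and any partial message stays buffered. The class name SimpleTokenizer and its methods are hypothetical and written only for illustration; this is not EventMachine's BufferedTokenizer and makes no claim about how that class is implemented.

```ruby
# Illustrative sketch only: SimpleTokenizer is a hypothetical class,
# not EventMachine's BufferedTokenizer.
class SimpleTokenizer
  def initialize(delimiter = "\n")
    @delimiter = delimiter
    @buffer = String.new
  end

  # Append a chunk and return every complete token it finishes.
  # A trailing partial token stays buffered for the next call.
  def extract(chunk)
    @buffer << chunk
    # The -1 limit keeps a trailing empty string when the data ends
    # exactly on a delimiter, so the last element is always the remainder.
    tokens = @buffer.split(@delimiter, -1)
    @buffer = tokens.pop || String.new
    tokens
  end

  # Return whatever is still buffered and reset the buffer.
  def flush
    rest = @buffer
    @buffer = String.new
    rest
  end
end

tok = SimpleTokenizer.new("|")
tok.extract("one|tw")   # => ["one"]
tok.extract("o|three")  # => ["two"]
tok.flush               # => "three"
```

In a connection handler, receive_data would pass each incoming chunk to extract and process each returned token.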
On Mon, Sep 29, 2008 at 5:56 AM, James Tucker <jftuc...@gmail.com> wrote:
> On 29 Sep 2008, at 03:46, Kirk Haines wrote:
> Hmmm. I think that was confused oversight on Francis/my part.
> evma_fastfilereader should be part of EM. Until it is, you can get it by
> installing Swiftiply.
>
> I've been meaning to come and grab it and commit it to EM, as it's also the
> last failing test in the suite run from trunk after the last months work.
> Assuming there are no other issues raised, I will get this committed to the
> EM code base.
On Mon, Sep 29, 2008 at 5:45 PM, Dan Mayer <d...@devver.net> wrote:
> Thanks for the tip on installing Swiftiply, that made stream_file_data work
> perfectly.
Aman (and hopefully others interested on the list),
Here is a profiler dump after I optimized a bit, I got ours from 26ish seconds down to 10 by getting rid of things like String#<<

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 14.44     3.49      0.66      668     0.99     0.99  String#split
 13.13     4.09      0.60      665     0.90     0.90  String#index
  4.16     4.28      0.19      668     0.28     3.29  DataBuffer#grab
  3.06     4.42      0.14      661     0.21     6.87  EmServerExample#receive_data
  0.88     4.46      0.04     2007     0.02     0.02  Array#length
  0.66     4.49      0.03     2007     0.01     0.01  Fixnum#>
  0.66     4.52      0.03      662     0.05     3.31  DataBuffer#append
What is the fastest way to do appending to strings?
This is really messy, since I was messing around with a bunch of optimizations and other things before finding and switching to the EM buffer.
class DataBuffer
  FRONT_DELIMITER = "0x5b".hex.chr # '['
  #']'[0].to_s(16).hex.chr
  BACK_DELIMITER = "0x5d".hex.chr # ']'
  #crazy delimiter because normal ones kept showing up in binary files
  DELIMITER = "|#{FRONT_DELIMITER}#{FRONT_DELIMITER}#{FRONT_DELIMITER}GT_DELIM#{BACK_DELIMITER}#{BACK_DELIMITER}#{BACK_DELIMITER}#{BACK_DELIMITER}|"
  #added to replace, dynamically making these
  DELIM_ESCAPE = /#{Regexp.escape(DELIMITER)}/
  DELIM_ESCAPE_END = /#{Regexp.escape(DELIMITER)}\Z/

  def initialize
    @unprocessed = ""
    @commands = []
  end

  def grab
    new_messages = @unprocessed.split(DELIM_ESCAPE)
    while new_messages.length > 1
      @commands << new_messages.shift
    end
    msg_length = new_messages.length
    if msg_length > 0
      if msg_length == 1 && (@unprocessed =~ DELIM_ESCAPE_END)
        # @commands << new_messages.shift
        @commands.push(new_messages.shift)
        @unprocessed = ""
      else
        #put the rest of the last statement back into the buffer
        # NOTE: the variable name "cur" is reconstructed; the list archive's
        # address obfuscation mangled these two lines.
        while (cur = @unprocessed.index(DELIM_ESCAPE))
          @unprocessed = (@unprocessed[cur..@unprocessed.length]).sub(DELIMITER, "")
        end
      end
    end
    if @commands.length > 0
      return @commands.shift
    else
      return nil #if @commands.length==0
    end
  end

  def prepare(str)
    str.to_s + DELIMITER
  end

  def append(data)
    # @unprocessed << data
    @unprocessed = @unprocessed + data
  end
end

... client / server code usage...
send_data(@buffer.prepare("some_msg"))

def receive_data(data)
  @buffer.append(data)
  while (command = @buffer.grab)
    process(command)
  end
end

def process(data)
  puts "got data: #{data}"
end
...
I am probably going to look closer at the EM buffer and our code and I am sure I will realize something pretty dumb that we did.
Thanks, Dan
On Mon, Sep 29, 2008 at 7:49 PM, Aman Gupta <themastermi...@gmail.com> wrote:
> Do you know what specifically about your buffer was causing issues?
> Were you using String#<<
>
> Aman
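The two questions above (whether String#<< was involved, and what the fastest way to append to a string is) are easy to measure rather than guess. The sketch below uses Ruby's standard Benchmark library to compare String#+, which builds a new string on every call, with String#<<, which appends to the receiver in place. The chunk size and chunk count are hypothetical stand-ins for received data, and timings will differ between interpreters and workloads, so treat this as a starting point rather than a verdict.

```ruby
require 'benchmark'

CHUNK = "x" * 1024   # hypothetical 1 KB chunk standing in for received data
COUNT = 500          # hypothetical number of chunks

# String#+ returns a new string each time, copying the old contents.
def append_with_plus(chunk, count)
  buffer = String.new
  count.times { buffer = buffer + chunk }
  buffer
end

# String#<< appends to the existing string in place.
def append_with_shovel(chunk, count)
  buffer = String.new
  count.times { buffer << chunk }
  buffer
end

Benchmark.bm(10) do |bm|
  bm.report("String#+")  { append_with_plus(CHUNK, COUNT) }
  bm.report("String#<<") { append_with_shovel(CHUNK, COUNT) }
end
```

Both helpers build the same final string, so any difference in the report comes from how the appending is done.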
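Dan,

If you have some time, would you be able to use your data sets against this other BufferedTokenizer implementation:

http://pastie.textmate.org/private/ykjtuipjedrwgzwgggu5w

There are varying cases for performance depending on the specific data sets and chunk size being added to the buffer. Ruby's GC certainly starts to cause performance issues with too many objects, so I'm trying to strike a balance.

Any input would be welcome,

Kind regards,

J.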
On Tue, Sep 30, 2008 at 5:25 AM, James Tucker <jftuc...@gmail.com> wrote:
> Dan,
> If you have some time, would you be able to use your data sets against this
> other BufferedTokenizer implementation:
> http://pastie.textmate.org/private/ykjtuipjedrwgzwgggu5w
> A string-based one should generally be faster on Ruby 1.8
In the few tests I did here, the differences were mostly related to the size of the incoming chunks and the number of chunks per token.
The 1.8 vs. 1.9 speed differences vary; each has its own advantages at certain tasks, but the two implementations were overall quite comparable on both interpreters.
What I'm hoping to get an idea of is where and why the differences really come up.
Sure, no problem. Sorry it took me so long to get back to this; I got slammed with some items that I had to take care of today.
I ran it on a small test set of data, and the results were very similar. The current tokenizer in EM seemed to outperform your pastie by a very small margin. Tomorrow I can run it against a much larger, real project, and I will let you know if I notice any significant differences.
I am cleaning up some of the code I have been using, and will likely make a post about the various methods of sending files through EM in the next couple of days. Examples of the various options weren't the easiest to find on the web, so it might help a few people running into similar problems.
> On 30 Sep 2008, at 03:07, Dan Mayer wrote:
> Aman (and hopefully others interested on the list),
I posted some quick benchmarks comparing sending files with our buffer, EM's buffer, the buffer James Tucker suggested, and stream_file_data. I also included some benchmarks with compression, along with the code I used for testing. Since I hadn't easily found a good way to send files, I thought it might help some people in the future. It was nice to be able to just switch buffers and get a 10x improvement in speed.
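As a rough illustration of what compressing a payload before sending involves, here is a small sketch using Ruby's standard zlib library. It is not the benchmark code mentioned above, and the payload is a hypothetical, highly repetitive string chosen only because it compresses well; real files, especially formats that are already compressed, may shrink far less.

```ruby
require 'zlib'

# Hypothetical payload; repetitive text compresses well, many binary formats do not.
payload = "some repetitive file contents\n" * 1_000

compressed = Zlib::Deflate.deflate(payload)    # what the sender would transmit
restored   = Zlib::Inflate.inflate(compressed) # what the receiver would recover

puts "original:   #{payload.bytesize} bytes"
puts "compressed: #{compressed.bytesize} bytes"
puts "round trip ok: #{restored == payload}"
```

Compressed output is arbitrary binary data, so it can contain any byte sequence, including whatever delimiter a message buffer splits on.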
On Tue, Sep 30, 2008 at 10:55 PM, Dan Mayer <d...@devver.net> wrote:
> Sure no problem. Sorry it took me so long to get back to this, I got
> slammed with some items that I had to take care of today.
>> On Tue, Sep 30, 2008 at 5:25 AM, James Tucker <jftuc...@gmail.com> wrote:
>>> Dan,
>>> If you have some time, would you be able to use your data sets against
>>> this other BufferedTokenizer implementation:
>>> http://pastie.textmate.org/private/ykjtuipjedrwgzwgggu5w
>>> There are varying cases for performance depending on the specific data
>>> sets and chunk size being added to the buffer. Ruby's GC certainly starts to
>>> cause performance issues with too many objects, so I'm trying to strike a
>>> balance.
>>> Any input would be welcome,
>>> Kind regards,
>>> J.
>>> On 30 Sep 2008, at 03:07, Dan Mayer wrote:
>>> Aman (and hopefully others interested on the list),
>>> Here is a profiler dump after I optimized a bit, I got ours from 26ish >>> seconds down to 10 by getting rid of things like String#<< >>> 14.44 3.49 0.66 668 0.99 0.99 String#split >>> 13.13 4.09 0.60 665 0.90 0.90 String#index >>> 4.16 4.28 0.19 668 0.28 3.29 DataBuffer#grab >>> 3.06 4.42 0.14 661 0.21 6.87 >>> EmServerExample#receive_data >>> 0.88 4.46 0.04 2007 0.02 0.02 Array#length >>> 0.66 4.49 0.03 2007 0.01 0.01 Fixnum#> >>> 0.66 4.52 0.03 662 0.05 3.31 DataBuffer#append
>>> What is the fastest way to do appending to strings?
>>> This is a really messy since I was messing around trying a bunch >>> optimizations and other things, before finding and switching to the EM >>> buffer.
>>> class DataBuffer
>>>   FRONT_DELIMITER = "0x5b".hex.chr # '['
>>>   #']'[0].to_s(16).hex.chr
>>>   BACK_DELIMITER = "0x5d".hex.chr # ']'
>>>   #crazy delimiter because normal ones kept showing up in binary files
>>>   DELIMITER = "|#{FRONT_DELIMITER}#{FRONT_DELIMITER}#{FRONT_DELIMITER}GT_DELIM#{BACK_DELIMITER}#{BACK_DELIMITER}#{BACK_DELIMITER}#{BACK_DELIMITER}|"
>>>   #added to replace, dynamically making these
>>>   DELIM_ESCAPE = /#{Regexp.escape(DELIMITER)}/
>>>   DELIM_ESCAPE_END = /#{Regexp.escape(DELIMITER)}\Z/
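The comment about ordinary delimiters showing up inside binary files points at a general limitation of delimiter-based framing. One common alternative, which nobody in this thread proposes and which is shown here only as a hedged sketch, is to prefix each message with its byte length so the payload never needs to be scanned or escaped. The `frame` and `unframe` helpers and the 4-byte big-endian header are assumptions made for this example.

```ruby
# Prefix the payload with its size as a 4-byte big-endian integer.
def frame(payload)
  [payload.bytesize].pack('N') + payload.b
end

# Pull every complete message out of the buffer.
# Returns [messages, remaining_bytes]; the remainder is kept for the next read.
def unframe(buffer)
  messages = []
  while buffer.bytesize >= 4
    length = buffer.unpack('N').first
    break if buffer.bytesize < 4 + length # message not fully received yet

    messages << buffer.byteslice(4, length)
    buffer = buffer.byteslice(4 + length, buffer.bytesize)
  end
  [messages, buffer]
end
```

In a `receive_data` handler, the remainder returned by `unframe` would be kept and the next chunk appended to it before calling `unframe` again. The cost is that both client and server must agree on the header format, whereas a delimiter can be inspected by eye.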
>>> I am probably going to look closer at the EM buffer and our code and I am sure I will realize something pretty dumb that we did.
>>> Thanks,
>>> Dan
>>> On Mon, Sep 29, 2008 at 7:49 PM, Aman Gupta <themastermi...@gmail.com> wrote:
>>>> Do you know what specifically about your buffer was causing issues? Were you using String#<<?
>>>> Aman
>>>> On Mon, Sep 29, 2008 at 5:45 PM, Dan Mayer <d...@devver.net> wrote:
>>>> > Thanks for the tip on installing Swiftiply, that made stream_file_data work perfectly.
>>>> > Unfortunately, it didn't solve our problem. Large files were still taking a long time to transfer, so I looked deeper into the issue. I had always been assuming the delay was the slow transfer time itself. Running a profiler against our code was enlightening as always: it appears our message buffer accounts for a significant amount of the time. If I completely get rid of any message buffer on the server used to split up multiple messages, either send_data or stream_file_data (with larger files) drops to less than 1 second. After searching around a bit I found BufferedTokenizer, which is one of the protocols for EM. Switching from our apparently bad buffer to the one included with EM brought us from 10 seconds to 1.2 seconds.
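To make the buffering step concrete, here is a minimal pure-Ruby sketch of the general technique a delimiter-based tokenizer uses: append each incoming chunk, split on the delimiter, hand back the complete messages, and keep the trailing partial one. `SimpleTokenizer` is a hypothetical class written for this illustration. It is not EventMachine's BufferedTokenizer implementation and makes no claim about matching its performance.

```ruby
class SimpleTokenizer
  # The delimiter is treated as a literal string. It must not be a single
  # space, because String#split gives " " special whitespace semantics.
  def initialize(delimiter = "\n")
    @delimiter = delimiter
    @buffer = ''.dup
  end

  # Feed one chunk of received data; returns the complete messages found.
  def extract(data)
    @buffer << data
    # A limit of -1 keeps trailing empty fields, so the last element is
    # always the unfinished remainder (possibly an empty string).
    tokens = @buffer.split(@delimiter, -1)
    @buffer = tokens.pop || ''.dup
    tokens
  end

  # Return whatever partial message is still buffered and reset.
  def flush
    rest = @buffer
    @buffer = ''.dup
    rest
  end
end
```

A `receive_data(data)` handler would call `extract(data)` and process each returned message. Note that the whole buffer is re-split on every chunk, which is the kind of cost the `String#split` and `String#index` rows in a profile would reflect, so an optimized tokenizer is likely to do less work per call than this sketch.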
>>>> > Thanks for the help, looks like everything is back on track for our EM performance.
>>>> > thanks,
>>>> > Dan Mayer
>>>> > On Mon, Sep 29, 2008 at 5:56 AM, James Tucker <jftuc...@gmail.com> wrote:
>>>> >> On 29 Sep 2008, at 03:46, Kirk Haines wrote:
>>>> >> On Sun, Sep 28, 2008 at 8:18 PM, Dan Mayer <d...@devver.net> wrote:
>>>> >>> We have been trying to send large files with EventMachine and noticed a few issues. If we just use send_data with the contents of a file inside, it is slow, and the server eats about 98% of the CPU. The send_file call only supports files up to 32K, but we are sending files as large as 5 MB. Lastly, we have been unable to use stream_file_data, because it has a dependency on evma_fastfilereader, which I couldn't seem to find anywhere to install anymore.
>>>> >> Hmmm. I think that was confused oversight on Francis/my part. evma_fastfilereader should be part of EM. Until it is, you can get it by installing Swiftiply.
>>>> >> I've been meaning to come and grab it and commit it to EM, as it's also the last failing test in the suite run from trunk after the last month's work. Assuming there are no other issues raised, I will get this committed to the EM code base.
>>>> >>> Has anyone been sending large files with EventMachine who could share some tips? In our case we are using EM for both the client and the server. We are trying to sync over a directory of many files. Is this just not a recommended usage of EM? Besides looking for solutions to make this work better on EM, are there other recommendations of better ways to send and receive large amounts of file data with Ruby?
>>>> >> Using stream_file_data I regularly transfer very large files with Swiftiply.