Ruby Net::SFTP issue


Chris McCann

May 11, 2013, 6:28:56 PM
to sdr...@googlegroups.com
I've built a Rails app that retrieves zip files from a remote server and processes them (loads the CSV files contained therein into a MySQL database).  

The file retrieval is done using Ruby's Net::SFTP module running in the background via DelayedJob.  It works well, unless the zip file is over 4GB, in which case it quits pretty much exactly after transferring 4GB of data.  There are no errors whatsoever in the log -- I just see that the file download was started and never finished.

The file is actually 8.5GB, and downloading it manually with sftp in the terminal works just fine.

I've searched the interwebs for any information about this problem without any success.  The only clue I have is that trying to unzip the files using rubyzip also failed for large zip files, because rubyzip doesn't handle the Zip64 format, which any zip file over 4GB uses.  I'm wondering if Net::SFTP has a similar 32-bit limit on the size of file it can handle.  Nothing in the Net::SFTP docs says anything about 32-bit limitations. Thoughts?

If anyone has a clue, I'm all ears.  I'm not sure if it's a Ruby problem, a Rails problem, or perhaps a problem on the other end where the remote SFTP server disconnects after 4GB has been transferred.  Odd, to say the least.

Thanks,

Chris

Ian Young

May 11, 2013, 6:32:32 PM
to sdr...@googlegroups.com
You're not trying to put the files on a FAT32 filesystem, are you? Because that has a 4GB limit.



Neal Clark

May 11, 2013, 6:33:15 PM
to sdr...@googlegroups.com
Having absolutely no experience doing any of the things your app is doing, my first thought is: try a compression format that lets you break things up into multiple files, e.g. rar, or try split(1). If it works for files under 4GB, keeping them under 4GB seems like an easy "first try" solution.

-n

Chris McCann

May 11, 2013, 6:48:31 PM
to sdr...@googlegroups.com
No, they're being downloaded onto a 64-bit CentOS machine.


Chris McCann

May 11, 2013, 6:50:35 PM
to sdr...@googlegroups.com
I don't have control over the file format -- the files are provided by a third-party contractor.  The command-line sftp client downloads them fine, so I suspect it's a Ruby limitation, and I really can't ask them to change the compression format for that.


James Miller

May 11, 2013, 6:52:19 PM
to sdr...@googlegroups.com
Can you paste some code for the SFTP download? I'd definitely start by confirming whether or not the Ruby script can exceed a 4GB download by extracting that out to its own script, since it sounds like you're unsure whether the problem is in the download or the unarchive.

Similarly, download the 8.5GB file manually and run it through rubyzip to see if it chokes on it.

Once you nail down the culprit, then we can work on fixing it :)



Neal Clark

May 11, 2013, 6:53:57 PM
to sdr...@googlegroups.com
So it sounds like it's not a Net::SFTP issue? Have you tried shelling out to unzip the files instead of using rubyzip?

Chris McCann

May 11, 2013, 6:55:51 PM
to sdr...@googlegroups.com
Bensie,

Here's the code:

  def download_file!
    Net::SFTP.start(@host, @user, password: @password) do |sftp|
      @remote_path  = File.join(@remote_dir, @remote_file_name)
      @extract_path = File.join(@tmp_dir, @remote_file_name)
      sftp.download!(@remote_path, @extract_path, read_size: 64000) do |event, downloader, *args|
        case event
        when :open
          # args[0] : file metadata
          log_event "Starting download: #{@remote_path}"
        when :finish
          log_event "Completed download of #{@remote_path}"
        end
      end
    end
    @extract_path
  end

I have not been able to get the Ruby script to download more than 4GB, and I have also confirmed that rubyzip can't unzip a file > 4GB (it explicitly does not support Zip64).

Cheers,

Chris


Chris McCann

May 11, 2013, 6:57:16 PM
to sdr...@googlegroups.com
The issue of unzipping it is separate -- I've confirmed rubyzip does not support Zip64 (> 4GB) files.

The issue I'm contending with now is that Net::SFTP doesn't seem to transfer files > 4GB, and given the rubyzip issue, I wondered if they might have the same root cause, a 32-bit integer limitation.

James Miller

May 11, 2013, 7:55:36 PM
to sdr...@googlegroups.com
While this doesn't help solve anything, I can confirm the same issue on my 64-bit setup - files over 4GB just hang forever. Might need to shell out...

Chris McCann

May 11, 2013, 8:12:12 PM
to sdr...@googlegroups.com
Thanks, Bensie, good to have another data point. 

Bradly

May 11, 2013, 8:48:41 PM
to sdr...@googlegroups.com
I'm not sure if it still does, but I'm pretty sure net/sftp used to load the entire file into memory on download. If that's the case, perhaps something is watching and killing the process? Does net/sftp allow you to grab the file in chunks?

bradleyland

May 12, 2013, 11:24:45 PM
to sdr...@googlegroups.com
A lot of libraries (Ruby and elsewhere) handle large files rather naively. Fortunately, Net::SFTP looks like it has some smarts for this. The docs for Session#download! say: "If local [the second argument] is omitted, downloads the file to an in-memory buffer and returns the result as a string; otherwise, returns the Net::SFTP::Operations::Download instance." The next place to look is at Download: http://net-ssh.github.io/net-sftp/classes/Net/SFTP/Operations/Download.html. It looks like you've already been there though, because you're using the event handler. The file is written as the download progresses, so it doesn't look like it's cramming the whole thing in memory. I also watched memory usage while downloading a large file, and the ruby process stays around 30 MB, so I don't think it's an issue related to in-memory buffering.
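
For reference, the two call styles look roughly like this (just a sketch; the host, credentials, and paths are placeholders):

  require 'net/sftp'

  Net::SFTP.start('example.com', 'user', password: 'secret') do |sftp|
    # One-argument form: buffers the whole file and returns it as a String.
    data = sftp.download!('/remote/data.zip')
    puts data.bytesize

    # Two-argument form: streams to the local path as each chunk arrives.
    sftp.download!('/remote/data.zip', '/tmp/data.zip')
  end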

I'm still digging to see if I can find a reason why it would hang at 4 GB.

Chris McCann

May 12, 2013, 11:43:04 PM
to sdr...@googlegroups.com
Thanks very much, Brad. I looked through the source to see if I could find a 32-bit integer somewhere that might be the problem but didn't find any smoking guns. Let me know if you do. 


bradleyland

May 13, 2013, 12:05:11 AM
to sdr...@googlegroups.com
Yeah, Ruby automatically promotes Fixnum to Bignum when you exceed the Fixnum range, so it would be odd for a counter overflow in pure Ruby code to hang this up. This is really puzzling, and I'm wondering if the bug isn't further down the stack, in Net::SSH or something.
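
A quick sanity check of that promotion behavior (trivial, but it rules out a pure-Ruby wraparound):

  # Byte offsets past 2**32 don't wrap in Ruby itself; integers promote as needed.
  offset = 2**32 - 1
  offset += 1
  puts offset           # => 4294967296
  puts (2**64).class    # => Bignum on Ruby 1.9/2.0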

I'm curious, have you tried Session#download instead of Session#download!? The former is non-blocking, though that difference shouldn't really affect your operation unless you specifically need the call to block until the transfer finishes.
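
Roughly, the non-blocking form looks like this (a sketch; the connection details and paths are placeholders):

  Net::SFTP.start('example.com', 'user', password: 'secret') do |sftp|
    downloader = sftp.download('/remote/big.zip', '/tmp/big.zip')
    # other work could happen here while the transfer runs...
    downloader.wait   # block until the transfer completes
  end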

This really sucks to debug! Waiting for a 4 GB file transfer is like waiting for paint to dry.

Chris McCann

May 13, 2013, 12:18:04 AM
to sdr...@googlegroups.com
I haven't tried the #download method, though I could easily do that.  I was contemplating setting up a local SFTP server on my dev machine to try to debug this, under the assumption that the file transfer would be very fast running locally.

I also haven't looked into Net::SSH, but I sort of discounted it because a timeout or other issue there would presumably make the transfer fail at varying points, not always exactly on a 4GB boundary.  It fails at exactly the same number of bytes each time.  Still, there may be something to glean by looking there, too.

James Miller

May 13, 2013, 12:21:27 AM
to sdr...@googlegroups.com
Definitely don't try testing this over the Internet - that will take forever to debug. If you're on a Mac, just turn on SSH access and create test files of the desired size with the dd command. SFTP runs over the SSH protocol, so there is nothing else you need to set up.
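
If you'd rather stay in Ruby for the whole test, something along these lines should exercise the >4GB path over loopback (a sketch; it assumes key-based SSH auth to localhost, and the paths and size are arbitrary):

  require 'net/sftp'

  # Create a ~5GB sparse file to stand in for the real archive.
  File.open('/tmp/bigfile.bin', 'wb') { |f| f.truncate(5 * 1024**3) }

  # Copy it back over the loopback interface via SFTP.
  Net::SFTP.start('localhost', ENV['USER']) do |sftp|
    sftp.download!('/tmp/bigfile.bin', '/tmp/bigfile-copy.bin')
  end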

Chris McCann

May 13, 2013, 12:31:59 AM
to sdr...@googlegroups.com
Good gouge, Bensie, thanks.

bradleyland

May 13, 2013, 9:08:08 AM
to sdr...@googlegroups.com
Yeah, I'm using localhost as the host argument to Net::SFTP.start so the file transfer occurs over the loopback adapter. It still takes over 5 minutes on a MacBook Air with an SSD, because SSH won't transfer more than 64K per chunk. It's like transferring water in a thimble.

bradleyland

May 13, 2013, 9:47:53 AM
to sdr...@googlegroups.com
Have a look at this Gist:


That's the test script I've been using, along with stack traces after the interrupt. The `when :get` case statement writes progress, so we can see exactly how many bytes were transferred, plus we get some insight into the read loop's activity level. Observing the script, we can see the Net::SFTP read loop stop updating at 4294049792 bytes. Operations::Download#on_read updates progress every time a read finishes, so when it stops, we can assume that something has hung internally.
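
The progress handler in that script is shaped roughly like this (a sketch from memory, not the Gist verbatim):

  sftp.download!(remote_path, local_path) do |event, downloader, *args|
    case event
    when :get
      # args[0] = file metadata, args[1] = byte offset, args[2] = data just read
      print "\r#{args[1] + args[2].length} bytes transferred"
    when :finish
      puts "\ndone"
    end
  end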

In that same Gist, you can see the stack trace from the script when given SIGINT after it hangs. It gives further clues that the problem is actually inside Net::SSH. I tried both Session#download! and Session#download. Both show the hang at net-ssh-2.6.7/lib/net/ssh/ruby_compat.rb:30 when given the interrupt. There's a call to IO.select here, which makes a system call to select(2) (we've arrived at the threshold of Ruby's C implementation) and returns an array. If we wanted to keep debugging, we'd want to attach a debugger to the process and have a look at the stack, focusing on the arguments passed to IO.select. Given the file offset range we're in, a datatype boundary issue is likely.

And at this point, you're really traveling down the rabbit hole. I'm an amateur Rubyist, so we're down to the point where my knowledge withers and I start grabbing at straws.

This would make a good ticket for Net::SSH (and maybe Net::SFTP). I'm sure the authors of those libraries would be interested in seeing what we've discovered here. If you're not up for it, I'd be happy to take it to them. If you do open a ticket, be sure to post a link here so we can follow the progress.

In the interim, it looks like your best bet for a workaround is shelling out.

Chris McCann

May 13, 2013, 10:54:33 AM
to sdr...@googlegroups.com
Wow, top-notch work, Brad.  I'll happily submit this as a ticket as you suggested and give full credit to you for the sleuthing, unless you'd like to submit it.

bradleyland

May 13, 2013, 12:13:28 PM
to sdr...@googlegroups.com
Thanks :) I tried to get the debugger gem hooked up to an invocation of that script, but once it hangs, the rdb console goes unresponsive. It's pretty ugly! Fortunately, `gdb -p <process_id>` was still able to attach. You can see the output of a 'where' call to gdb at the end of the same Gist:


I think I'd like to open the ticket since I have a debugger running and attached. I have it running on a machine that I can leave idle waiting for additional gdb calls, so that might help the maintainer, given the time it takes to reproduce.


I'll ping you in the issue so you can stay up to speed.

Chris McCann

May 13, 2013, 12:15:02 PM
to sdr...@googlegroups.com
Brad,

Sounds good.  My github handle is testflyjets.

Cheers,

Chris


bradleyland

May 13, 2013, 3:15:20 PM
to sdr...@googlegroups.com
Bug report is up: https://github.com/net-ssh/net-sftp/issues/27

The output of gdb `where` is very telling. I think the hang is definitely within the system call, so I'm expecting that this might go upstream a few levels (maybe to Ruby core?).

Do you know anything about Open3? I recommend using this library when shelling out from within applications that you intend to run unattended. You gain a lot more flexibility in handling the stdin, stdout, and stderr of the application you're spawning. That can be important when things go wrong. You can pretty easily wrap up stdout and stderr so you can log them to your Ruby application logs appropriately. It's also essential when you're working with shell applications that expect things on stdin. I've used it to wrap gnuplot in the past.
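
A minimal sketch of that pattern for the unzip step (extract_zip! is just a hypothetical helper; it assumes the system unzip is Info-ZIP 6.0 or later, which handles Zip64):

  require 'open3'

  def extract_zip!(zip_path, extract_dir)
    # Shell out to the system unzip, capturing stdout/stderr so they can be logged.
    stdout, stderr, status = Open3.capture3('unzip', '-o', zip_path, '-d', extract_dir)
    Rails.logger.info(stdout)
    unless status.success?
      Rails.logger.error("unzip failed: #{stderr}")
      raise "unzip exited with status #{status.exitstatus}"
    end
    extract_dir
  end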
