Errno::ECONNRESET when a command takes a long time to run

francois.beausoleil

Aug 28, 2008, 1:42:49 PM
to Capistrano
I know it's a configuration issue on my server, but I can't find what
to change... I receive the following backtrace regularly:

/Library/Ruby/Gems/1.8/gems/net-ssh-2.0.2/lib/net/ssh/buffered_io.rb:64:in `recv': Connection reset by peer - recvfrom(2) (Errno::ECONNRESET)
        from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.2/lib/net/ssh/buffered_io.rb:64:in `fill'
        from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.2/lib/net/ssh/connection/session.rb:200:in `postprocess'
        from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.2/lib/net/ssh/connection/session.rb:196:in `each'
        from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.2/lib/net/ssh/connection/session.rb:196:in `postprocess'
        from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/processable.rb:31:in `process_iteration'
        from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/processable.rb:43:in `ensure_each_session'
        from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/processable.rb:41:in `each'
        from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/processable.rb:41:in `ensure_each_session'
         ... 25 levels...
        from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/cli/execute.rb:14:in `execute'
        from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/bin/cap:4
        from /usr/bin/cap:19:in `load'
        from /usr/bin/cap:19

This happens while running commands remotely that take a "long" time
to execute. Something on the order of minutes (restoring backups or
filling an EC2 volume for example). This is on Ubuntu Hardy 8.04
using OpenSSH_4.7p1 Debian-8ubuntu1.2, OpenSSL 0.9.8g 19 Oct 2007.

I did check man sshd_config, but couldn't find a timeout option I
could identify. I found this thread
http://www.wideopen.com/archives/redhat-list/2001-April/msg01778.html
on the RedHat mailing list, but it doesn't have a response. This
seemed promising: http://johnsonsolutions.blogspot.com/2008/02/config-idle-timeout-for-openssh-fedora.html,
alas, the OpenSSH version above didn't recognize the IdleTimeout
option. And the OpenSSH general mailing list didn't turn up anything
relevant on a search for "timeout".

BTW, I do get kicked out of the machine regularly when I simply SSH
into the box and run the same commands, so this is nothing Capistrano-
specific.

Any help would be *very* appreciated!

Thanks!
François

Jamis Buck

Aug 28, 2008, 1:51:13 PM
to capis...@googlegroups.com
It's not necessarily a server configuration issue. See this thread for
a discussion of this same issue, with a workaround:

http://groups.google.com/group/capistrano/browse_thread/thread/a919f2e77b7c8a28/3ca39907f6f4f3a1?#3ca39907f6f4f3a1

(or, if that long URL gets truncated, here's a tinyurl: http://tinyurl.com/67w7py)

- Jamis

francois.beausoleil

Aug 28, 2008, 2:52:47 PM
to Capistrano
Thanks Jamis. I read the thread you mentioned. What you are advising
is to change this code:

desc "Restores the latest binary dump"
task :latest_physical, :roles => %w(slave) do
  run "cd /tmp/restore && clis3 get #{backup} | tar xzv"
  run "rm -rfv /var/lib/mysql/* && cp -rv /tmp/restore/backup/* /var/lib/mysql && rm -f /var/lib/mysql/*.pid"
end

Into this:

desc "Restores the latest binary dump"
task :latest_physical, :roles => %w(slave) do
  teardown_connections_to(sessions.keys)
  run "cd /tmp/restore && clis3 get #{backup} | tar xzv"
  teardown_connections_to(sessions.keys)
  run "rm -rfv /var/lib/mysql/* && cp -rv /tmp/restore/backup/* /var/lib/mysql && rm -f /var/lib/mysql/*.pid"
end

I haven't tried it yet, just want to confirm I'm reading things
correctly.

Thanks!
François

Jamis Buck

Aug 28, 2008, 2:59:22 PM
to capis...@googlegroups.com
Actually, you'd probably want to put the teardown at the very end of
the task, so that tasks run after latest_physical will have to force a
reconnect. Maybe I misunderstood though--is the timeout happening on
the second run()? If so, then yes, you'd put the teardown between the
two runs.

- Jamis

francois.beausoleil

Aug 28, 2008, 3:03:56 PM
to Capistrano
The timeout happens while the run is working, not outside. The file
I'm extracting is 2.5 GB compressed, 30 GB decompressed.

Jamis Buck

Aug 28, 2008, 3:05:58 PM
to capis...@googlegroups.com
Hmm. That doesn't seem right. Which version of Capistrano are you using?

- Jamis

francois.beausoleil

Aug 28, 2008, 3:11:30 PM
to Capistrano
Latest:

$ cap --version
Capistrano v2.4.3


Jamis Buck

Aug 28, 2008, 3:13:27 PM
to capis...@googlegroups.com
Can you send the output of a cap invocation that includes the timeout error?

- Jamis

francois.beausoleil

Aug 28, 2008, 3:22:34 PM
to Capistrano
Here goes!

  * executing "rm -rfv /var/lib/mysql/* && cp -rv /tmp/restore/backup/* /var/lib/mysql && rm -f /var/lib/mysql/*.pid"
    servers: ["ec2-75-101-245-196.compute-1.amazonaws.com"]
    [ec2-75-101-245-196.compute-1.amazonaws.com] executing command
 ** [out :: ec2-75-101-245-196.compute-1.amazonaws.com] removed `/var/lib/mysql/debian-5.0.flag'
 ** [out :: ec2-75-101-245-196.compute-1.amazonaws.com] removed `/var/lib/mysql/domU-12-31-38-00-3D-B4.pid'
 ** [out :: ec2-75-101-245-196.compute-1.amazonaws.com] removed `/var/lib/mysql/ib_logfile0'
 ** [out :: ec2-75-101-245-196.compute-1.amazonaws.com] removed `/var/lib/mysql/ib_logfile1'
 ** [out :: ec2-75-101-245-196.compute-1.amazonaws.com] removed `/var/lib/mysql/ibdata0'

I sent this backtrace in the first message of this thread too.

Kenneth Kalmer

Aug 28, 2008, 3:31:15 PM
to capis...@googlegroups.com
Now I wish I were skilled enough to dive into net/ssh; maybe in the near future.

Jamis, I'm sure the SSH protocol has a NOOP similar to FTP's. I base this only on being able to set it in PuTTY and SecureCRT. I did a quick Google search and found this (http://drupal.star.bnl.gov/STAR/comp/sofi/facility-access/ssh-stable-con), which refers to an RFC. They also discuss parameters for the *nix ssh command that can be used, which should help François as well.

Best

--
Kenneth Kalmer
kenneth...@gmail.com
http://opensourcery.co.za
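
For readers following along, the keepalive parameters for the OpenSSH client and server could look roughly like the fragment below. This is only a sketch: the option names are standard OpenSSH settings, but the interval and count values are illustrative assumptions, the Host pattern is guessed from the EC2 hostname in the output above, and nothing in this thread confirms that these settings cure the reset. They apply to the ssh command and to sshd; whether Net::SSH honors the client-side ones depends on its version.

```
# ~/.ssh/config (client side); values are illustrative assumptions
Host *.compute-1.amazonaws.com
    ServerAliveInterval 60    # send a keepalive request after 60 s of silence
    ServerAliveCountMax 5     # give up after 5 unanswered requests
    TCPKeepAlive yes          # also enable TCP-level keepalive probes

# /etc/ssh/sshd_config (server side counterpart); values are illustrative assumptions
ClientAliveInterval 60
ClientAliveCountMax 5
```

The ServerAlive/ClientAlive messages travel inside the encrypted SSH channel, so they keep traffic flowing even when a long-running command prints nothing for a while.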

francois.beausoleil

Aug 28, 2008, 4:14:47 PM
to Capistrano
Thanks Kenneth,

This is nice, but I fail to see the relevance for Capistrano. Can we
set TCPKeepAlive and TCP_NODELAY in Net::SSH? Because that's what
seems to be needed here.

Thanks!
François

Jamis Buck

Aug 28, 2008, 6:39:41 PM
to capis...@googlegroups.com
Networking is not my forte. Net::SSH does not currently support either
TCPKeepAlive or TCP_NODELAY, primarily because I have no idea how to
implement or test those. If someone wants to take a stab at it, it
could be a valuable addition to Net::SSH.

- Jamis
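
At the plain-socket level, the two options discussed above correspond to SO_KEEPALIVE and TCP_NODELAY. The following is a minimal sketch using only Ruby's standard socket library; it is not Net::SSH code, and the helper name enable_keepalive_and_nodelay and the throwaway loopback server are made up for illustration. How Net::SSH would expose its underlying socket for this is not shown, and with default kernel settings the first keepalive probe may only be sent after a long idle period (two hours by default on Linux), so this alone may not be enough.

```ruby
require "socket"

# Hypothetical helper: turn on TCP keepalive probes and disable
# Nagle's algorithm on an already-connected TCP socket.
def enable_keepalive_and_nodelay(socket)
  socket.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)
  socket.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, true)
  socket
end

# Throwaway loopback server on an OS-assigned port, just to have a peer.
server = TCPServer.new("127.0.0.1", 0)
client = TCPSocket.new("127.0.0.1", server.addr[1])

enable_keepalive_and_nodelay(client)

# Read the options back to confirm the kernel accepted them.
keepalive_on = client.getsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE).bool
nodelay_on   = client.getsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY).bool
puts "keepalive=#{keepalive_on} nodelay=#{nodelay_on}"

client.close
server.close
```

Reading the options back with getsockopt is one way to test the change, which is the part Jamis mentions being unsure about.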
