Switchpipe stops responding after a random period of time
ChrisR  
From: ChrisR <EvilGeen...@gmail.com>
Date: Thu, 27 Mar 2008 10:47:26 -0700 (PDT)
Local: Thurs, Mar 27 2008 1:47 pm
Subject: Switchpipe stops responding after a random period of time
I have just deployed 7 rails applications using switchpipe onto a
Solaris 11 machine.

I now have a major problem where switchpipe just stops responding to
requests and I have to give it a "./script/switchpipe restart" to get
it going again.  It seems to happen after a random amount of time,
between 1 and 3 hours.  It has never lasted more than 3 hours.

Has anybody come across this before?

Any help is greatly appreciated as I really want to use switchpipe.

Thanks
Chris


Peter Cooper  
From: Peter Cooper <pcoo...@gmail.com>
Date: Thu, 27 Mar 2008 11:46:59 -0700 (PDT)
Local: Thurs, Mar 27 2008 2:46 pm
Subject: Re: Switchpipe stops responding after a random period of time
On Mar 27, 5:47 pm, ChrisR <EvilGeen...@gmail.com> wrote:

> I have just deployed 7 rails applications using switchpipe onto a
> Solaris 11 machine.

> I now have a major problem where switchpipe just stops responding to
> requests and i have to give it a "./script/switchpipe restart" to get
> it going again.  It seems to happen after a random amount of time,
> 1hr~3hours.  It has never lasted more than 3 hours.

> Has anybody come across this before?

This was a common issue prior to 1.04. I imagine you are running 1.04
or higher (trunk) though, which in my own tests has been running for
over a month now uninterrupted on one of my machines. Someone else has
reported the same issue with the trunk version, however, and it began
after they upgraded Apache. I'm still waiting for more information
from them so I can replicate this issue, but if you can provide more
info, it might help triangulate the issue :) Essentially, I need to
know what's in front of your SwitchPipe: Apache? Nginx?

Once I can replicate the problem, it'll probably be a pretty easy fix.
The pre 1.04 issue was related to some UNIX quirks I wasn't taking
into account, so that could also be the issue here since I have not
extensively tested on anything other than BSD or Linux so far.

Cheers,
Pete


Jason Stirk  
From: "Jason Stirk" <jst...@gmail.com>
Date: Fri, 28 Mar 2008 15:14:12 +1100
Local: Fri, Mar 28 2008 12:14 am
Subject: Re: [SwitchPipe] Re: Switchpipe stops responding after a random period of time

Ooh! My ears are burning!

I'm having a similar problem, but anywhere from a few hours to over 24. The
last time I restarted I think it lasted about 8-10 hours (just had to
restart then).

This is happening on a Gentoo box (2.6.18-xen kernel), Ruby 1.8.6-111. I did
notice that the daemons gem I'm running is 1.0.9 vs. 1.0.10, which is the
latest. All other gems that the site asks to install are at their latest version.

Apache was a minor upgrade - it's now 2.2.8, and was formerly working well
on 2.2.7(-something). I'm not certain that this is the cause, but it's one
of the only things to have changed that I can think of.

I've just gone through and put in a bunch of logging requests throughout the
code in the hope that one of them will lead me to where it's stalling. I'll
keep you posted if I find anything interesting.

Best regards,
Jason

On 28/03/2008, Peter Cooper <pcoo...@gmail.com> wrote:


ChrisR  
From: ChrisR <EvilGeen...@gmail.com>
Date: Fri, 28 Mar 2008 03:36:30 -0700 (PDT)
Local: Fri, Mar 28 2008 6:36 am
Subject: Re: Switchpipe stops responding after a random period of time
The Delegate proxy (http://www.delegate.org) is the only thing in
front of switchpipe.  When it crashes, I can't even access it directly
(without going through the proxy).

Maybe you could give me a modified version of switchpipe.rb that
created a log file with more information, then I could give you back
the log file today after it hangs.

Thanks
Chris


ChrisR  
From: ChrisR <EvilGeen...@gmail.com>
Date: Fri, 28 Mar 2008 08:32:35 -0700 (PDT)
Local: Fri, Mar 28 2008 11:32 am
Subject: Re: Switchpipe stops responding after a random period of time
I'm sure it's probably nothing, but it says:
VERSION = "1.03"

at the top of the switchpipe.rb in the trunk and in v1.04


Peter Cooper  
From: Peter Cooper <pcoo...@gmail.com>
Date: Mon, 31 Mar 2008 04:43:34 -0700 (PDT)
Local: Mon, Mar 31 2008 7:43 am
Subject: Re: Switchpipe stops responding after a random period of time

On Mar 28, 4:32 pm, ChrisR <EvilGeen...@gmail.com> wrote:

> I'm sure its probably nothing but it says :
> VERSION = "1.03"

> at the top of the switchpipe.rb in the trunk and in v1.04

Thanks for the extra updates. I'm travelling at Ruby conferences at
the moment but when I get back I'll rig up SwitchPipe with Apache 2 in
front of it (I have only used it with Apache 1.3) and see if I can
recreate some of these issues.

Cheers,
Pete


Jason Stirk  
From: "Jason Stirk" <jst...@gmail.com>
Date: Mon, 31 Mar 2008 23:03:36 +1100
Local: Mon, Mar 31 2008 8:03 am
Subject: Re: [SwitchPipe] Re: Switchpipe stops responding after a random period of time

Pete,

Brief status report on my logging exercises. Last failure took about 48
hours to come about, which made it a bit slow to work out where it's dying.

Data was still coming in, but a backend instance was never getting launched.
I've since added more logging in between the request coming in, and when
it's processed in the hope of working out exactly where it's aborting.

Chris contacted me off list, and I've sent him a diff with the additional
logger calls. If his SwitchPipe is dying as often as he says, hopefully it
will be able to give us a really good idea of where it's dying without the
48 hour-or-so wait that I'm battling against.

I've attached the diff just so that you can stay clued in to where I'm
looking. Nothing fancy (just calls to the logger).

Enjoy the conference!

Best regards,
Jason

On 31/03/2008, Peter Cooper <pcoo...@gmail.com> wrote:

  switchpipe-logging-31Mar2008.diff
ChrisR  
From: ChrisR <EvilGeen...@gmail.com>
Date: Mon, 31 Mar 2008 07:34:14 -0700 (PDT)
Local: Mon, Mar 31 2008 10:34 am
Subject: Re: Switchpipe stops responding after a random period of time
I applied the diff file and waited for it to hang. Here are the last
few entries of the log file:

D, [2008-03-31T14:58:25.260526 #4153] DEBUG -- : Received data (494 bytes)
D, [2008-03-31T14:58:25.260786 #4153] DEBUG -- : Got request (494 bytes)
D, [2008-03-31T14:58:25.261239 #4153] DEBUG -- : Entered process()
D, [2008-03-31T14:58:26.670501 #4153] DEBUG -- : Received data (698 bytes)
D, [2008-03-31T14:58:26.670759 #4153] DEBUG -- : Got request (698 bytes)
D, [2008-03-31T14:58:26.671257 #4153] DEBUG -- : Entered process()
D, [2008-03-31T14:58:47.779328 #4153] DEBUG -- : Received data (494 bytes)
D, [2008-03-31T14:58:47.779589 #4153] DEBUG -- : Got request (494 bytes)
D, [2008-03-31T14:58:47.780121 #4153] DEBUG -- : Entered process()
D, [2008-03-31T14:58:49.857491 #4153] DEBUG -- : Received data (669 bytes)
D, [2008-03-31T14:58:49.857720 #4153] DEBUG -- : Got request (669 bytes)
D, [2008-03-31T14:58:49.858210 #4153] DEBUG -- : Entered process()
D, [2008-03-31T15:00:04.834368 #4153] DEBUG -- : Received data (1018 bytes)
D, [2008-03-31T15:00:04.834629 #4153] DEBUG -- : Got request (1018 bytes)
D, [2008-03-31T15:00:04.835248 #4153] DEBUG -- : Entered process()

So it's getting stuck at "Entered process()" for every request, which
is just before the line

 EventMachine.defer(before, after)

The mystery continues. Is it EventMachine's fault?

Thanks

Chris


Jason Stirk  
From: "Jason Stirk" <jst...@gmail.com>
Date: Fri, 4 Apr 2008 13:35:36 +1100
Local: Thurs, Apr 3 2008 10:35 pm
Subject: Re: [SwitchPipe] Re: Switchpipe stops responding after a random period of time

Hi Chris,

Sorry about the delay - I've been waiting for things to crash again.

I'm getting exactly the same thing as you. It's like EventMachine.defer() is
swallowing the request, never to be seen again.

I've hacked the EventMachine.defer code to give me some logs of what's
actually happening. EventMachine starts 20 threads that sit around and watch
a Queue that has all the incoming requests. I'm wondering if something's
happening that's making these 20 threads slowly die, or block or something
like that. The logging I've added just tells me which thread is handling
each request, how many items are waiting, etc.
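
As a rough illustration of that kind of instrumentation (a minimal sketch
using only Ruby's standard library, not the actual EventMachine.defer code;
the pool size, job names, and log wording are all made up for the example),
each worker logs which thread took a job and how many jobs are still waiting:

```ruby
require 'logger'
require 'stringio'
require 'thread'

# Log to an in-memory buffer so the example is self-contained.
log_io = StringIO.new
log    = Logger.new(log_io)
queue  = Queue.new

# Hypothetical two-worker pool; each worker reports what it picked up.
workers = Array.new(2) do |i|
  Thread.new do
    while (job = queue.pop) != :stop
      log.debug("worker #{i} took job #{job.inspect}, #{queue.size} still waiting")
    end
  end
end

%w[a b c].each { |job| queue << job }   # three pretend requests
2.times { queue << :stop }              # one stop marker per worker
workers.each(&:join)

log_lines = log_io.string.lines
puts log_lines
```

If workers were quietly disappearing, the set of worker numbers showing up
in the log would shrink over time while the "still waiting" count grew.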

Hopefully I'll have something to share within a few days.

In the meantime, it's a dirty dirty hack, but have you thought about
restarting switchpipe every hour or so before it dies via cron? That's what
I'm currently looking at as the alternative if we can't track down the bug.
Nasty "solution", I know.

All the best,
Jason

On 01/04/2008, ChrisR <EvilGeen...@gmail.com> wrote:


Jason Stirk  
From: "Jason Stirk" <jst...@gmail.com>
Date: Fri, 4 Apr 2008 15:12:41 +1100
Local: Fri, Apr 4 2008 12:12 am
Subject: Re: [SwitchPipe] Re: Switchpipe stops responding after a random period of time

Hi all,

I believe I've found the problem, and created a fix. Diff against r40
attached.

I finally managed to reproduce the error once I could see what was happening
with the EventMachine threads.

It seems that "GET / HTTP/1.0" requests were being issued to the SwitchPipe
process, presumably sent by mod_proxy trying to check that the back-end was
still alive. I remember seeing log history of this before, but I'm unable to
find the exact configuration as of yet.

When these requests come in, SwitchPipe tries to work out which application
to launch - first based upon the host-name of the request (which my
particular configuration doesn't define), and then by the path of the
request. The code that works out the path (app_from_path) in the case of the
request for "/" will return false. This then causes an exception to be
raised in App.find_by_path as it tries to call to_sym() on false. This
causes the Thread to die, with no notification. As EventMachine creates 20
Threads initially, and never maintains them, each request for "/" causes a
Thread to die. Eventually, EventMachine runs out of Threads and the
requests just keep getting queued up. This is when SwitchPipe obviously
stalls.
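
The failure mode can be reproduced in miniature with plain Ruby threads.
This is only a sketch under assumptions: start_pool is a hypothetical
stand-in for a fixed-size worker pool with no rescue, not EventMachine's
real implementation, and the pool is 3 threads rather than 20 to keep it
short. Each "bad" job calls to_sym on false, just like the failing lookup:

```ruby
require 'thread'

# Hypothetical fixed-size pool: each worker pops jobs forever and has no rescue.
def start_pool(size, queue, results)
  Array.new(size) do
    Thread.new do
      # Keep the demo quiet on Rubies that report thread exceptions by default.
      if Thread.current.respond_to?(:report_on_exception=)
        Thread.current.report_on_exception = false
      end
      loop { results << queue.pop.call }
    end
  end
end

queue   = Queue.new
results = Queue.new
workers = start_pool(3, queue, results)

# Three requests that hit the bug: false has no to_sym, so each kills a worker.
3.times { queue << lambda { false.to_sym } }

# Wait for the workers to die; join re-raises their error, so swallow it here.
workers.each do |t|
  begin
    t.join
  rescue NoMethodError
    nil
  end
end

queue << lambda { :ok }   # a perfectly valid request
sleep 0.1                 # give any surviving worker a chance to take it

alive_workers = workers.count(&:alive?)
stuck_jobs    = queue.size
puts "alive workers: #{alive_workers}, queued jobs: #{stuck_jobs}"
```

Once every worker has died, the valid job simply sits in the queue, which
matches the "Entered process()" log lines with nothing after them.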

The attached patch changes App.find_by_path so that if to_sym cannot be
called on the supplied argument, it returns nil. This then causes the rest
of the process() code to realise that an application could not be found, and
the connection is dropped. It also adds a few extra lines of logging to
identify when some errors occur.
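
To show the effect of the guard in isolation, here is a small standalone
sketch. The APPS table and the two method names are hypothetical (the real
code keeps its apps in a class variable inside SwitchPipe); only the
respond_to?(:to_sym) check is taken from the patch itself:

```ruby
# Hypothetical app table standing in for SwitchPipe's internal registry.
APPS = { :blog => 'blog app', :wiki => 'wiki app' }

# Behaviour before the patch: blows up when path is false.
def find_by_path_unpatched(path)
  APPS[path.to_sym]
end

# Behaviour after the patch: returns nil when path cannot be symbolized.
def find_by_path_patched(path)
  APPS[path.to_sym] if path.respond_to?(:to_sym)
end

puts find_by_path_patched('blog').inspect   # a known path still resolves
puts find_by_path_patched(false).inspect    # the "/" case no longer raises

begin
  find_by_path_unpatched(false)
rescue NoMethodError => e
  puts "unpatched lookup raised #{e.class}"
end
```

Returning nil lets the caller treat "no app found" as an ordinary outcome
instead of an exception that takes the worker thread down with it.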

With the patch applied, issuing "GET / HTTP/1.0" requests to the SwitchPipe
process no longer causes Threads to die, but rather abandons the connection
cleanly.

Please let me know if the patch works for you - it appears to be working
here. Many thanks to Chris for his help.

Best regards,
Jason Stirk
Achernar Solutions
http://achernarsolutions.com.au/

On 04/04/2008, Jason Stirk <jst...@gmail.com> wrote:

[ switchpipe-fix-04Apr2008.diff.txt ]
Index: lib/switchpipe.rb
===================================================================
--- lib/switchpipe.rb   (revision 40)
+++ lib/switchpipe.rb   (working copy)
@@ -363,7 +363,9 @@

     # Return an app matching on the path
     def self.find_by_path(path)
-      @@apps[path.to_sym]
+      # NOTE: We need to ignore when we can't symbolize the path, as
+      #       app_from_path() returns FALSE in some cases
+      @@apps[path.to_sym] if path.respond_to?(:to_sym)
     end

     # Return an app matching on the hostname
@@ -419,6 +421,7 @@

     rescue
       # When all else fails..
+      LOG.error "Caught #{$!} - close_connection called"
       close_connection
     end

@@ -434,6 +437,7 @@
           found_by = :path
         end

+        LOG.debug('Failed to find app') unless app
         return '' unless app

         # Log the request
@@ -718,4 +722,4 @@
       "#{@host}:#{@port}"
     end
   end
-end
\ No newline at end of file
+end


ChrisR  
From: ChrisR <EvilGeen...@gmail.com>
Date: Fri, 4 Apr 2008 07:49:50 -0700 (PDT)
Local: Fri, Apr 4 2008 10:49 am
Subject: Re: Switchpipe stops responding after a random period of time