Hi all,
I believe I've found the problem, and created a fix. Diff against r40
attached.
I finally managed to reproduce the error once I could see what was happening
with the EventMachine threads.
It seems that "GET / HTTP/1.0" requests were being issued to the SwitchPipe
process, presumably sent by mod_proxy trying to check that the back-end was
still alive. I remember seeing log history of this before, but I'm unable to
find the exact configuration as of yet.
When these requests come in, SwitchPipe tries to work out which application
to launch - first based upon the host-name of the request (which my
particular configuration doesn't define), and then by the path of the
request. The code that works out the path (app_from_path) in the case of the
request for "/" will return false. This then causes an exception to be
raised in App.find_by_path as it tries to call to_sym() on false. This
causes the Thread to die, with no notification. As EventMachine creates 20
Threads initially, and never maintains them, each request for "/" causes a
Thread to die. Eventually, EventMachine runs out the Threads and the
requests just keep getting queued up. This is when SwitchPipe obviously
stalls.
The attached patch changes App.find_by_path so as that if to_sym can not be
called on the supplied argument, it returns nil. This then causes the rest
of the process() code to realise that an application could not be found, and
the connection is dropped. It also adds a few extra lines of logging to
identify when some errors occur.
With the patch applied, issuing "GET / HTTP/1.0" requests to the SwitchPipe
process no longer causes Threads to die, but rather abandons the connection
cleanly.
Please let me know if the patch works for you - it appears to be working
here. Many thanks to Chris for his help.
Best regards,
Jason Stirk
Achernar Solutions
http://achernarsolutions.com.au/
On 04/04/2008, Jason Stirk <jst...@gmail.com> wrote:
> Hi Chris,
> Sorry about the delay - I've been waiting for things to crash again.
> I'm getting exactly the same thing as you. It's like EventMachine.defer()
> is swallowing the request, never to be seen again.
> I've hacked the EventMachine.defer code to give me some logs of what's
> actually happening. EventMachine starts 20 threads that sit around and watch
> a Queue that has all the incoming requests. I'm wondering if something's
> happening that's making these 20 threads slowly die, or block or something
> like that. The logging I've added just tells me which thread is handling
> each request, how many items are waiting, etc.
> Hopefully I'll have something to share within a few days.
> In the mean time, it's a dirty dirty hack, but have you thought about
> restarting switchpipe every hour or so before it dies via cron? That's what
> I'm currently looking at as the alternative if we can't track down the bug.
> Nasty "solution", I know.
> All the best,
> Jason
> On 01/04/2008, ChrisR <EvilGeen...@gmail.com> wrote:
> > I applied the diff file and waited for it to hang, here is the last
> > few entries of the log file:
> > D, [2008-03-31T14:58:25.260526 #4153] DEBUG -- : Received data (494
> > bytes)
> > D, [2008-03-31T14:58:25.260786 #4153] DEBUG -- : Got request (494
> > bytes)
> > D, [2008-03-31T14:58:25.261239 #4153] DEBUG -- : Entered process()
> > D, [2008-03-31T14:58:26.670501 #4153] DEBUG -- : Received data (698
> > bytes)
> > D, [2008-03-31T14:58:26.670759 #4153] DEBUG -- : Got request (698
> > bytes)
> > D, [2008-03-31T14:58:26.671257 #4153] DEBUG -- : Entered process()
> > D, [2008-03-31T14:58:47.779328 #4153] DEBUG -- : Received data (494
> > bytes)
> > D, [2008-03-31T14:58:47.779589 #4153] DEBUG -- : Got request (494
> > bytes)
> > D, [2008-03-31T14:58:47.780121 #4153] DEBUG -- : Entered process()
> > D, [2008-03-31T14:58:49.857491 #4153] DEBUG -- : Received data (669
> > bytes)
> > D, [2008-03-31T14:58:49.857720 #4153] DEBUG -- : Got request (669
> > bytes)
> > D, [2008-03-31T14:58:49.858210 #4153] DEBUG -- : Entered process()
> > D, [2008-03-31T15:00:04.834368 #4153] DEBUG -- : Received data (1018
> > bytes)
> > D, [2008-03-31T15:00:04.834629 #4153] DEBUG -- : Got request (1018
> > bytes)
> > D, [2008-03-31T15:00:04.835248 #4153] DEBUG -- : Entered process()
> > So its getting stuck at "Entered process()" for every request, which
> > is just before the line
> > EventMachine.defer(before, after)
> > The mystery continues, is it EventMachine's fault?
> > Thanks
> > Chris