The app starts up with the usual complement of 10 (or whatever I set
server.thread_pool to) HTTP server threads (CherryPy WorkerThreads),
along with a few other threads as usual.
After a period of minimal activity (a few hits per minute at most),
the CherryPy WorkerThreads die off and are not replaced. This
continues until there are no CherryPy WorkerThreads left, at which
point any HTTP requests simply hang (clients can connect, but receive
no response until they time out). I have to restart the application
to get it accepting requests again.
Anyone seen this sort of behaviour before with TG or CP? I'm going to
continue to debug it, but any hints would be helpful.
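For anyone trying to confirm the same symptom, one rough way to watch the worker threads disappear is to take a census of live threads by class name at intervals. This is only a sketch in modern Python using the standard `threading` module; `count_threads_by_class` is a hypothetical helper name, and it assumes the server's workers are instances of a class called `WorkerThread`, as in the description above.

```python
import threading
from collections import Counter


def count_threads_by_class():
    """Return a Counter mapping thread class name -> number of live threads.

    threading.enumerate() only lists threads that are currently alive, so a
    shrinking count for a given class means threads of that class have exited.
    """
    return Counter(type(t).__name__ for t in threading.enumerate())
```

Calling this periodically (for example from a debug-only controller method) and logging `count_threads_by_class()["WorkerThread"]` should show whether the pool is shrinking over time.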
TurboGears Complete Version Information
TurboGears requires:
* TurboGears 1.0.4b1
* cElementTree 1.0.5-20051216
* elementtree 1.2.6
* SQLAlchemy 0.3.11
* TurboKid 1.0.3
* TurboJson 1.0
* TurboCheetah 0.9.5
* simplejson 1.3
* setuptools 0.6c7
* RuleDispatch 0.5a0.dev-r2306
* PasteScript 0.9.7
* FormEncode 0.7.1
* DecoratorTools 1.5
* configobj 4.3.2
* CherryPy 2.2.1
* Cheetah 2.0rc7
* kid 0.9.6
* RuleDispatch 0.5a0.dev-r2306
* Cheetah 2.0rc7
* PyProtocols 1.0a0
* Cheetah 2.0rc7
* PasteDeploy 0.9.6
* Paste 0.9.7
Toolbox Gadgets
* info (TurboGears 1.0.4b1)
* catwalk (TurboGears 1.0.4b1)
* shell (TurboGears 1.0.4b1)
* designer (TurboGears 1.0.4b1)
* widgets (TurboGears 1.0.4b1)
* admi18n (TurboGears 1.0.4b1)
Identity Providers
* sqlobject (TurboGears 1.0.4b1)
* sqlalchemy (TurboGears 1.0.4b1)
tg-admin Commands
* info (TurboGears 1.0.4b1)
* shell (TurboGears 1.0.4b1)
* quickstart (TurboGears 1.0.4b1)
* update (TurboGears 1.0.4b1)
* sql (TurboGears 1.0.4b1)
* i18n (TurboGears 1.0.4b1)
* toolbox (TurboGears 1.0.4b1)
Visit Managers
* sqlobject (TurboGears 1.0.4b1)
* sqlalchemy (TurboGears 1.0.4b1)
Template Engines
* cheetah (TurboCheetah 0.9.5)
* json (TurboJson 1.0)
* kid (TurboKid 1.0.3)
* genshi-markup (Genshi 0.4.4)
* genshi-text (Genshi 0.4.4)
* genshi (Genshi 0.4.4)
Widget Packages
* file_fields (FileFields 0.1a6.dev-r612)
* tgcaptcha (TGCaptcha 0.11)
TurboGears Extensions
* visit (TurboGears 1.0.4b1)
* identity (TurboGears 1.0.4b1)
* file_server (FileFields 0.1a6.dev-r612)
* tg_media_farm (TGMediaFarm 0.6.2)
Cheers,
Chris Miles
Even with TG configured to output everything to log files, these
tracebacks are still written to stderr. This happens within
WorkerThread.run() which simply calls traceback.print_exc() for
unhandled exceptions (in this case we were seeing socket timeout
errors). There's no option to write these to the log files, which I
think could be considered a CherryPy bug.
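To illustrate the difference, here is a minimal sketch of a worker loop that reports unhandled exceptions through the `logging` module rather than calling `traceback.print_exc()`. It is not CherryPy's code: `worker_loop`, the `"worker"` logger name, and the `None` shutdown sentinel are all assumptions made for the example, written in modern Python.

```python
import logging
import queue
import threading

log = logging.getLogger("worker")  # hypothetical logger name


def worker_loop(tasks):
    """Run callables taken from `tasks` until a None sentinel arrives."""
    while True:
        task = tasks.get()
        if task is None:
            break
        try:
            task()
        except Exception:
            # log.exception() sends the message and the current traceback to
            # whatever handlers are configured (e.g. a log file), so nothing
            # is written directly to stderr by this loop.
            log.exception("unhandled exception in worker")
```

Because the exception is caught and logged inside the loop, the thread keeps serving later tasks, and the traceback ends up wherever the logging configuration sends it.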
Our init.d script for this app looked like this (stripped down to only
the important bits):
{{{
. /etc/rc.d/init.d/functions
PROG_RUN=/appdir/start-tgapp.py
PROG_CONF=/appdir/turbogears-prod.cfg
PROG_USER=apache
daemon --user $PROG_USER --check $PROG_RUN "( $PROG_RUN $PROG_CONF & )"
}}}
Our fix was to change the daemon line to:
{{{
daemon --user $PROG_USER --check $PROG_RUN "( $PROG_RUN $PROG_CONF 2>/dev/null & )"
}}}
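Redirecting to /dev/null discards the tracebacks entirely. If they are worth keeping, one possible alternative is to replace `sys.stderr` inside the start script with a small file-like object that forwards each line to a logger. This is only a sketch in modern Python; `StderrToLog` is a hypothetical class, and it assumes the chosen logger writes to a file rather than back to stderr (otherwise the writes would loop).

```python
import logging
import sys


class StderrToLog:
    """File-like object that forwards complete lines to a logger."""

    def __init__(self, logger, level=logging.ERROR):
        self.logger = logger
        self.level = level
        self._buffer = ""

    def write(self, text):
        # Buffer partial writes and emit one log record per complete line.
        self._buffer += text
        while "\n" in self._buffer:
            line, self._buffer = self._buffer.split("\n", 1)
            if line:
                self.logger.log(self.level, line)
        return len(text)

    def flush(self):
        # Emit whatever is left in the buffer as a final record.
        if self._buffer:
            self.logger.log(self.level, self._buffer)
            self._buffer = ""
```

It would be installed early in the start script with something like `sys.stderr = StderrToLog(logging.getLogger("stderr"))`, after the logger has been given a file handler.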
Cheers,
Chris
Socket timeouts occur when the client kills the connection in an
unclean fashion. This is the kind of thing your robot does not test
right now, but it happens often in the real world.
This issue is a real one and we should try to lobby for a maintenance
release of CP 2.2 (while waiting for a better alternative, which would
be upgrading to ... hush ... still in the secret lab).
Florent.