I am running a TurboGears app (CP 2) on Debian in production, behind
Apache with mod_rewrite, and have a nasty problem: sometimes the CherryPy
process dies. At first I thought it had something to do with the fact
that I was starting the CherryPy process with the "nohup" command to
daemonize it, like:
$ nohup ./start-app.py prod.cfg &
So I wrote a start-stop-daemon script, and now the app is started this way:
start-stop-daemon --chuid $USER --start --pidfile $PIDFILE \
    --chdir $APP_PATH --background --make-pidfile --startas \
    $DAEMON -- $CONFIG
But still, sometimes the process just dies. I searched Google Groups
and found Remi's message
(http://groups.google.com/group/cherrypy-users/msg/3e6424f2a1d659f4)
that mentions that CP can die "if tries to print something to the
console but there is no console". Does that mean the "print" statement
can't be used when the app is daemonized?
The problem is that I can't reproduce the crash manually. The
application seems to die randomly, and I don't know what causes it.
Thanks for any help!
Ksenia
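If stray "print" output is a suspect, one way to rule it out is to point
sys.stdout and sys.stderr at a log file before the server starts. This is
only a minimal sketch: the helper name redirect_output and the log path
are made up for illustration and are not part of CherryPy or TurboGears.

```python
import sys


def redirect_output(log_path):
    """Send everything written to stdout/stderr to log_path instead.

    Hypothetical helper: call it once, early in the start script,
    before the server is started.
    """
    # buffering=1 means line-buffered, so each printed line reaches the
    # file promptly even if the process is killed later.
    log = open(log_path, "a", buffering=1)
    sys.stdout = log
    sys.stderr = log
    return log


# Example (path is an assumption):
# redirect_output("/var/tmp/tg-app-stdout.log")
```

With this in place a "print" never depends on a console being attached,
so it can be excluded as a cause even if the crashes continue.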
I'm noticing similar behavior: my TG app (TG 1.0b1 / CP 2.2.1 /
Python 2.4.2) dies without any clues or logged errors... It turns out
that I'm using a config almost identical to what David Dahl
was using in the post you refer to (lighty and all).
It's been happening often enough now that it is causing some concern.
I was under the impression that something might be wrong with
my config... It happens even with no activity, when I leave
the TG app running overnight. It usually fails in less than 36 hrs.
I'm actually starting my TG app in a terminal:
python(2.4.2) -v my-app-startup.py prod.cfg
with my logging specific config being:
----------------------------------------
[global]
server.environment="production"
server.thread_pool = 5
log_debug_info_filter.on = True
server.log_to_screen = True
server.log_file="/var/tmp/tg-app-server.log"

[logging]

[[handlers]]

[[[access_out]]]
# set the filename as the first argument below
args="('server.log',)"
class='FileHandler'
level='INFO'
formatter='message_only'

[[loggers]]

[[[myapp]]]
level='ERROR'
qualname='myapp'
handlers=['error_out']

[[[access]]]
level='INFO'
qualname='turbogears.access'
handlers=['access_out']
propagate=0
----------------------------------------
Is there a way to run CP with, say, pdb enabled,
so that the failure shows up in the terminal?
I don't understand the difference between running
TG/CP in a terminal as opposed to running it in
a console (screen), as David alludes to in his
original post.
Please let us know what you find.
Thanks much,
/venkat
http://www.cherrypy.org/wiki/BehindApache says:
Note: The "os.setpgid(os.getpid(), 0)" line seems
to prevent Apache from killing the CP process after
a period of inactivity (many thanks to Matt Lewis
for this trick).
Robert Brewer
System Architect
Amor Ministries
fuma...@amor.org
Thanks Robert,
But this line is from autostart.cgi, which I don't use. I start the CP
process in the terminal, independent of Apache...
Ksenia.
> It's been happening often enough now that it is causing some concern.
> I was under the impression that something might be wrong with
> my config... It happens even with no activity, when I leave
> the TG app running overnight. It usually fails in less than 36 hrs.
The same here. The difference is that I always have some activity: I
wrote a script that checks every 5 minutes, using a HEAD request, whether
the site is still up.
I've checked the log from last night, and I see that the process
died 3 hours after the last GET request.
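For anyone who wants the same kind of monitoring, a HEAD check along
those lines might look roughly like this. It is a sketch using only the
standard library, not the actual script; the function name site_is_up,
the default timeout and the "anything below 500 counts as up" rule are
assumptions.

```python
import http.client


def site_is_up(host, port=80, path="/", timeout=10):
    """Return True if a HEAD request gets a non-5xx response."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        status = conn.getresponse().status
        return status < 500
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a broken response: treat as down.
        return False
    finally:
        conn.close()


# Example, run from cron every 5 minutes (host is a placeholder):
# if not site_is_up("www.example.com"):
#     print("site is down")
```

Logging the time of every failed check makes it easier to line up the
moment of death with other system logs afterwards.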
My logging-related config is:
server.environment="production"
server.thread_pool = 5

[logging]

[[handlers]]

[[[access_out]]]
# set the filename as the first argument below
args="('server.log',)"
class='FileHandler'
level='INFO'
formatter='message_only'

[[loggers]]

[[[myapp]]]
level='DEBUG'
qualname='myapp'
handlers=['debug_out']

[[[allinfo]]]
level='INFO'
handlers=['debug_out']

[[[access]]]
level='INFO'
qualname='turbogears.access'
handlers=['access_out']
propagate=0
>
> Is there a way to run CP with say pdb enabled,
> so the failure may show up in the terminal...
I don't know.
>
> I don't understand the difference between running
> TG/CP in a terminal as opposed to running it in
> a console (screen), as David alludes to in his
> original post.
I don't think I understand this either.
Thanks for your reply!
Ksenia.
I decided to try running my TG app under pdb...
I launched it in a terminal like so:
python -v -m pdb start-myapp.py prog.cfg
Lo and behold, it's been running OK for >48 hrs
this way... no crashes, CP or otherwise...
Goes to show that "when something is instrumented
its behavior is slightly changed..." :-)
It's getting difficult to solve this problem...
/venkat
Have you considered a trace logger? It will slow down your system, but should at least show you what the "last call" before the crash was. I use PyConquer whenever I need to debug the guts of CherryPy itself: http://projects.amor.org/misc/wiki/PyConquer (in fact, I wrote it to debug CP ;)
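For readers without PyConquer at hand, the general idea of a trace
logger can be sketched with the standard library's sys.settrace. This is
not PyConquer's API, only a rough stand-in; make_call_tracer and
install_tracer are hypothetical names.

```python
import sys
import threading


def make_call_tracer(log):
    """Return a trace function that writes one line per Python call."""
    def tracer(frame, event, arg):
        if event == "call":
            code = frame.f_code
            log.write("%s:%d %s\n"
                      % (code.co_filename, frame.f_lineno, code.co_name))
            # Flush each line so the last call is on disk even if the
            # process is killed without warning.
            log.flush()
        # Returning None means: no line-by-line tracing inside the call.
        return None
    return tracer


def install_tracer(log):
    """Trace calls in the current thread and in threads started later."""
    tracer = make_call_tracer(log)
    threading.settrace(tracer)
    sys.settrace(tracer)


# Example (path is an assumption), before starting the server:
# install_tracer(open("/var/tmp/tg-app-trace.log", "a"))
```

As noted above, this slows everything down considerably, so it is
something to switch on only while hunting the crash.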
Well, I spoke too early there... and it crashed again :-)
After some more searching on the 'net, I found someone
saying that lighttpd sometimes silently crashes if it finds
some config specified in a way it doesn't like...
The reason for the failure was the sudden removal of the
Unix domain socket (UDS) that was set up between lighty and
TG (CP). TG was configured to create the UDS, and both
it and lighty were configured to use it. Unknown to
TG (CP), that UDS was removed, very likely by
lighty. There were no errors logged by TG (CP) about the
sudden removal of the UDS, but in lighty's error log
I noticed this:
...
2006-12-11 10:31:33: (/proj/lighttpd/src/mod_fastcgi.c.1739) connect failed: No such file or directory on unix:/tmp/tgapp_fcgi_socket
2006-12-11 10:31:33: (/proj/lighttpd/src/mod_fastcgi.c.2851) backend died, we disable it for a 5 seconds and send the request to another backend instead: reconnects: 0 load: 1
...
I doubt that TG (CP) was behind the removal of that UDS.
I changed the following fastcgi config section from:
...
$HTTP["host"] =~ "" {
    fastcgi.server = ( "" =>
        ( "127.0.0.1" =>
            (
                "socket" => "/tmp/tgapp_fcgi_socket",
                "check-local" => "disable",
            )
        )
    )
}
...
to:
...
$HTTP["host"] =~ "" {
    fastcgi.server = ( "" =>
        ((
            "socket" => "/tmp/tgapp_fcgi_socket",
            "check-local" => "disable",
            "min-procs" => 1,
            "max-procs" => 5
        ))
    )
}
...
Note the removal of "127.0.0.1" as the "backend" specifier,
which makes it more generic, and the addition of the "procs"
entries.
In the last 48 hrs I haven't seen a crash... hopefully this
fixes the problem.
/venkat
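Since neither side logs the disappearance of the socket file itself, a
small watchdog check can at least record when it goes missing. The
following is a sketch only; socket_exists is a hypothetical helper, and
the path in the example is the one from the config above.

```python
import os
import stat


def socket_exists(path):
    """Return True if path exists and is a Unix domain socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing file, or a directory we are not allowed to look into.
        return False
    return stat.S_ISSOCK(mode)


# Example, run periodically next to the app:
# if not socket_exists("/tmp/tgapp_fcgi_socket"):
#     print("fcgi socket is gone")
```

Timestamps from such a check can then be compared with the "connect
failed" entries in lighty's error log.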
I know I'm coming in a little late to this thread, but I wanted to share
how I solved a similar problem with Lighttpd. It may be the same
problem manifesting itself differently, but anyway...
I use Ubuntu, so I installed their version of Lighttpd and followed a
sample from a web page on how to set up fcgi. It worked great, but for
some reason the crazy server kept on crashing, without warning or log
entry. After looking around a bit I noticed that almost everyone else
asking questions on this was also using Ubuntu. The solution that
worked for me (it may not work for you if it is not the same problem, of
course) was to _uninstall_ the Ubuntu version of Lighttpd and compile it
from scratch with relevant options set for my environment. It hasn't
crashed since then.
Like I said I can't claim this is the end-all solution for this problem
but it may help some....
Good luck!
Tom Fillmore
OK, I've found the reason... it turned out to be very simple, but new to me
(I'm still learning Debian): the oom-killer. Wow.
Sorry for the noise.
Ksenia.
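For others chasing the same symptom: the kernel reports OOM kills in its
own log, not in the application's. A rough way to scan a kernel log for
them is sketched below. The exact wording of the messages differs
between kernel versions, so the pattern is a guess at the common
phrases, and find_oom_lines is a hypothetical helper.

```python
import re

# Phrases that commonly appear in OOM-killer reports (assumption: the
# wording varies by kernel version, so adjust the pattern as needed).
OOM_PATTERN = re.compile(r"oom-killer|out of memory", re.IGNORECASE)


def find_oom_lines(lines):
    """Return the log lines that look like OOM-killer reports."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


# Example (log path is an assumption and depends on the syslog setup):
# with open("/var/log/kern.log") as fh:
#     for hit in find_oom_lines(fh):
#         print(hit)
```

The same lines can usually be found by looking through the output of
"dmesg" by hand.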
Uh, yes. Wow. Maybe the CP Engine should write OOM_DISABLE to
/proc/<pid>/oom_adj right after it starts up?
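A sketch of what that could look like, as an illustration and not as
anything CherryPy does. It assumes Linux: older kernels expose
/proc/<pid>/oom_adj, where OOM_DISABLE is -17, and newer ones expose
/proc/<pid>/oom_score_adj, where -1000 has the same effect. Lowering
either value normally needs root (CAP_SYS_RESOURCE). The function name
protect_from_oom and the proc_root parameter are hypothetical.

```python
import os

OOM_DISABLE = -17  # "never kill" value for the legacy oom_adj file


def protect_from_oom(pid=None, proc_root="/proc"):
    """Try to exempt a process from the OOM killer.

    Returns the path that was written, or None if nothing worked
    (file missing, or not enough privileges).
    """
    pid = os.getpid() if pid is None else pid
    candidates = [
        # Newer interface first, then the legacy one.
        (os.path.join(proc_root, str(pid), "oom_score_adj"), "-1000"),
        (os.path.join(proc_root, str(pid), "oom_adj"), str(OOM_DISABLE)),
    ]
    for path, value in candidates:
        if not os.path.exists(path):
            continue
        try:
            with open(path, "w") as fh:
                fh.write(value)
            return path
        except OSError:
            # Typically a permission error; try the next candidate.
            continue
    return None
```

Keep in mind that exempting one process only means the kernel picks
another victim when memory runs out, so finding what is using the
memory is still the real fix.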