bash(16731)---python2.7(24545)-+-python2.7(24564)---{python2.7}(24565)
                               |-python2.7(24572)-+-python2.7(1110)
                               |                  |-python2.7(8647)
                               |                  |-python2.7(11747)
                               |                  |-python2.7(14117)
                               |                  |-python2.7(14302)
w2p -K arm:ticker,arm,arm,arm
09:09:47.752 [24576] Process-4:488,
09:14:28.907 [24576] Process-4:488, Process-4:1125,
09:15:59.526 [24576] Process-4:488, Process-4:1125, Process-4:1301,
09:20:35.924 [24576] Process-4:488, Process-4:1880, Process-4:1125, Process-4:1301,
def async(self, task):
    ...
    out = multiprocessing.Queue()
    queue = multiprocessing.Queue(maxsize=1)
    p = multiprocessing.Process(target=executor, args=(queue, task, out))
    ...
    if p.is_alive():
        p.terminate()
        logger.debug(' +- Zombie (%s)' % multiprocessing.active_children())

def executor(queue, task, out):
    """The function used to execute tasks in the background process."""
    logger.debug(' task started PID:%s -> %s' % (os.getppid(), os.getpid()))
    ...
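About that "+- Zombie" line: according to the Python docs, multiprocessing.active_children() also joins any child that has already finished, so a process that keeps appearing across calls (like Process-4:488 in the log above) had not exited yet when the call was made. A minimal Python 3 sketch of that behaviour, assuming Linux and the fork start method; quick_task and slow_task are hypothetical stand-ins for scheduler tasks:

```python
import multiprocessing
import time

# Assumes Linux and the "fork" start method, as on the poster's system.
ctx = multiprocessing.get_context("fork")


def quick_task():
    pass  # hypothetical task that finishes immediately


def slow_task():
    time.sleep(30)  # hypothetical task that is still running when we look


quick = ctx.Process(target=quick_task, name="quick")
slow = ctx.Process(target=slow_task, name="slow")
quick.start()
slow.start()

time.sleep(1)  # "quick" exits meanwhile; we deliberately never join() it

# active_children() polls every known child, reaps the ones that already
# exited, and returns only those still running.
alive_names = [p.name for p in multiprocessing.active_children()]
print(alive_names)  # ['slow']

slow.terminate()  # clean up properly: terminate...
slow.join()       # ...and join, so no zombie is left behind
```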
I'd say there are a LOT of strange things going on on your system, since you're reporting several different issues that nobody ever faced and all in the last week.
Zombie processes shouldn't be there unless you improperly killed a worker process. Python can't really do anything about that, and that's why there's a specific API to kill (or terminate) a worker.
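For reference, a zombie is a child that has already exited but whose exit status its parent has not collected yet (via wait()/waitpid(), which is what Process.join() ends up calling on POSIX). Until the parent reaps it, the kernel keeps its process-table entry in state Z. A minimal Linux-only Python 3 sketch that creates one on purpose with os.fork() and then reaps it; proc_state() is a hypothetical helper that reads /proc/<pid>/stat:

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state of `pid` from /proc (Linux-only)."""
    with open("/proc/%d/stat" % pid) as f:
        stat = f.read()
    # The command name is wrapped in parentheses and may contain spaces,
    # so the state letter is the first field after the last ')'.
    return stat[stat.rindex(")") + 2]


pid = os.fork()
if pid == 0:
    os._exit(0)  # child: exit immediately without any cleanup

# Parent: the child exits, but nobody has waited on it yet, so the kernel
# keeps its entry around in the "Z" (zombie) state.
deadline = time.time() + 5
state = proc_state(pid)
while state != "Z" and time.time() < deadline:
    time.sleep(0.05)
    state = proc_state(pid)
print("before waitpid:", state)  # Z

# Reaping: waitpid() collects the exit status and removes the zombie entry.
reaped_pid, status = os.waitpid(pid, 0)
print("reaped:", reaped_pid == pid, "exit code:", os.WEXITSTATUS(status))
```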
bash(16731)                                    // my shell
\---python2.7(24545)                           // scheduler.py (-K)
    \-+-python2.7(24564)---{python2.7}(24565)  // idling worker
      |-python2.7(24572)                       // worker with picked task
        \-python2.7(1110)                      // still waiting for semaphore (TIMEOUT)
        \-python2.7(8647)                      // still waiting for semaphore (TIMEOUT)
        \-python2.7(11747)                     // still waiting for semaphore (TIMEOUT)
        \-python2.7(14117)                     // runs the actual task (RUNNING)
        \-python2.7(14302)                     // still waiting for semaphore (TIMEOUT)
p = multiprocessing.Process(target=executor, args=(queue, task, out))
p.start()
try:
    # task runs
    p.join(run_timeout)
except:
    # this should be raised only when a general error on the task happened, so it's a STOPPED one
    p.terminate()
    p.join()
else:
    # this is the codepath your task takes, since it's the one landing TIMEOUT tasks
    if p.is_alive():
        # this is ultimately the call that SHOULD kill the process you later find as a zombie
        p.terminate()
        ...
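The control flow above boils down to: join with a timeout, and if the child is still alive afterwards, label the run TIMEOUT and stop it. The sketch below is a simplified Python 3 illustration of that pattern, not web2py's actual scheduler code; run_with_timeout, the status strings and the two sample tasks are assumptions, and it assumes Linux with the fork start method:

```python
import multiprocessing
import time

# Assumes Linux and the "fork" start method, as on the poster's system.
ctx = multiprocessing.get_context("fork")


def run_with_timeout(target, args=(), run_timeout=1.0):
    """Run target(*args) in a child process and classify the outcome.

    Hypothetical helper; the status names loosely mimic the scheduler's labels.
    """
    p = ctx.Process(target=target, args=args)
    p.start()
    p.join(run_timeout)  # returns after run_timeout even if the child still runs
    if p.is_alive():
        # Timed out: stop the child *and* join it, so it gets reaped
        # instead of lingering as a zombie.
        p.terminate()
        p.join()
        return "TIMEOUT", p.exitcode
    if p.exitcode == 0:
        return "COMPLETED", p.exitcode
    return "FAILED", p.exitcode


def sleeper(seconds):
    time.sleep(seconds)


def failing():
    raise SystemExit(3)  # child exits with status 3


ok = run_with_timeout(sleeper, (0.1,))
timed_out = run_with_timeout(sleeper, (10,))
failed = run_with_timeout(failing)
print(ok)         # ('COMPLETED', 0)
print(timed_out)  # ('TIMEOUT', -15), i.e. killed by SIGTERM on Linux
print(failed)     # ('FAILED', 3)
```

The important detail for this thread is the p.join() right after p.terminate(): without it, the terminated child stays in the process table until something else reaps it.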
From where I stand, if the task ends up labelled as TIMEOUT (with the corresponding "task timeout" debug line), that can only originate there. Maybe the culprit is there... can you add a p.join() after that p.terminate(), and maybe a few debug lines? i.e.:
...
else:
    if p.is_alive():
        logger.debug(' MD: terminating')
        p.terminate()
        logger.debug(' MD: terminated')
        logger.debug(' MD: joining')
        p.join()
        logger.debug(' MD: joined')
        logger.debug(' task timeout')
        try:
            # we try to get a traceback here
            tr = queue.get(timeout=2)
            ...
[01] logs $> pstree -p 16731
bash(16731)---python2.7(28670)-+-python2.7(28678)---{python2.7}(28679)
                               |-python2.7(28680)-+-python2.7(29554)
                               |                  `-{python2.7}(28681)
...
11-04 07:56:47.508 [28680] task starting CPID:29554
...
... configured timeout 30 seconds ...
...
11-04 07:57:02.510 [28680] MD: terminating
11-04 07:57:02.512 [28680] MD: terminated
11-04 07:57:02.512 [28680] MD: joining
[01] logs $> pstack 29554
#0 0x00000030db40ce51 in sem_wait () from /lib64/libpthread.so.0
#1 0x00002ba981e3123d in PyThread_acquire_lock () from /usr/local/lib/libpython2.7.so.1.0
#2 0x00002ba981e35202 in lock_PyThread_acquire_lock () from /usr/local/lib/libpython2.7.so.1.0
#3 0x00002ba981dfb9c9 in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.7.so.1.0
...
#11 0x00002ba981d82252 in function_call () from /usr/local/lib/libpython2.7.so.1.0
#12 0x00002ba981d54318 in PyObject_Call () from /usr/local/lib/libpython2.7.so.1.0
...
#46 0x00002ba981d82252 in function_call () from /usr/local/lib/libpython2.7.so.1.0
#47 0x00002ba981d54318 in PyObject_Call () from /usr/local/lib/libpython2.7.so.1.0
#48 0x00002ba981d6499f in instancemethod_call () from /usr/local/lib/libpython2.7.so.1.0
#49 0x00002ba981d54318 in PyObject_Call () from /usr/local/lib/libpython2.7.so.1.0
#50 0x00002ba981db701c in slot_tp_init () from /usr/local/lib/libpython2.7.so.1.0
#51 0x00002ba981db0f58 in type_call () from /usr/local/lib/libpython2.7.so.1.0
...
#61 0x00002ba981e20f37 in PyRun_SimpleFileExFlags () from /usr/local/lib/libpython2.7.so.1.0
#62 0x00002ba981e33726 in Py_Main () from /usr/local/lib/libpython2.7.so.1.0
#63 0x00000030da81d9f4 in __libc_start_main () from /lib64/libc.so.6
#64 0x0000000000400629 in _start ()
11-04 08:40:59.521 [13556] web2py.Scheduler - DEBUG - MD: terminating (os kill)
11-04 08:40:59.521 [13556] web2py.Scheduler - DEBUG - MD: terminated
11-04 08:40:59.522 [13556] web2py.Scheduler - DEBUG - MD: joining
11-04 08:40:59.524 [13556] web2py.Scheduler - DEBUG - MD: joined
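With the "(os kill)" variant the join returns immediately, whereas with terminate() it hung at "MD: joining". One well-known way terminate() can fail to end a child is that SIGTERM is caught or ignored inside it, while SIGKILL cannot be. A hedged Python 3 sketch of an escalating stop (terminate, join with a timeout, then SIGKILL), assuming Linux and the fork start method; stubborn_task and stop_process are hypothetical, and the child ignores SIGTERM on purpose to force the fallback:

```python
import multiprocessing
import os
import signal
import time

# Assumes Linux and the "fork" start method, as on the poster's system.
ctx = multiprocessing.get_context("fork")


def stubborn_task(ready):
    # Hypothetical task that ignores SIGTERM, so terminate() alone cannot stop it.
    signal.signal(signal.SIGTERM, signal.SIG_IGN)
    ready.set()  # tell the parent the handler is in place
    time.sleep(60)


def stop_process(p, grace=2.0):
    """Send SIGTERM, wait up to `grace` seconds, then SIGKILL; always reap."""
    p.terminate()                       # SIGTERM on POSIX
    p.join(grace)                       # bounded wait instead of hanging forever
    if p.is_alive():
        os.kill(p.pid, signal.SIGKILL)  # SIGKILL cannot be caught or ignored
        p.join()                        # reap the child so no zombie is left
    return p.exitcode


ready = ctx.Event()
p = ctx.Process(target=stubborn_task, args=(ready,))
p.start()
ready.wait(5)
code = stop_process(p)
print(code)  # -9, i.e. -SIGKILL: the SIGTERM was ignored
```

os.kill() is used because it works on both Python 2.7 and 3; on Python 3.7+ Process.kill() sends the same SIGKILL.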
BTW: do your tasks write a lot to stdout/stderr and/or return huge results?
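Large results are relevant to a documented multiprocessing pitfall ("Joining processes that use queues" in the Python docs): a child that has put data on a multiprocessing.Queue does not exit until its feeder thread has flushed that data into the underlying pipe, so a parent that join()s before reading a large result can wait forever. A minimal Python 3 sketch, assuming Linux and the fork start method; big_result and the 10 MB payload are illustrative:

```python
import multiprocessing

# Assumes Linux and the "fork" start method, as on the poster's system.
ctx = multiprocessing.get_context("fork")


def big_result(out):
    # Hypothetical task returning a large result (10 MB of text).
    out.put("x" * (10 * 1024 * 1024))


out = ctx.Queue()
p = ctx.Process(target=big_result, args=(out,))
p.start()

# Joining first would hang: the child cannot exit while its data is still
# buffered and nobody reads the pipe. A bounded join makes this visible.
p.join(timeout=2)
stuck_before_get = p.is_alive()  # True: blocked flushing the queue

# Correct order: drain the queue first, then join.
result = out.get()
p.join()
print(stuck_before_get, len(result), p.exitcode)  # True 10485760 0
```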
--
Resources:
- http://web2py.com
- http://web2py.com/book (Documentation)
- http://github.com/web2py/web2py (Source code)
- https://code.google.com/p/web2py/issues/list (Report Issues)
---
You received this message because you are subscribed to the Google Groups "web2py-users" group.