Unexpected killing of workers (wr_app_wkr_add_timeout_cb)

gom

Jul 9, 2010, 8:43:53 AM
to WebROaR - Ruby Application Server
Hello,

I am running my Rails app on WebROaR, and it has been about half a month
since the site launched. It is running fine, thanks to your team.

Today the box hit a very heavy load average (10-15),
and the workers seemed to run very slowly. As Nikunj described in the
"TimeOutsettings?" thread, the Head sent 2 pings to a worker,
then killed it (as seen in webroar.log), and started another.

"TimeOutsettings?"
http://groups.google.com/group/webroar/browse_thread/thread/cc91fbdaf6f067c0

But after this ping-then-kill cycle had repeated many times, different
messages appeared. I googled them but found no clue.

----
Fri Jul 9 13:02:49 2010-11583-Info:PID of created worker = 15712, Rails application=/path/to/rails
Fri Jul 9 13:06:59 2010-11583-Info:wr_app_wkr_add_timeout_cb: killing worker, pid = 15712
----

Does this indicate an abnormal situation?
After 30 minutes, the workers returned to normal without any
intervention on my part,
but I'm afraid the same thing may happen again.

Thanks in advance,
gom

Nikunj Limbaseeya

Jul 10, 2010, 7:20:52 AM
to web...@googlegroups.com

gom,

Thank you for using WebROaR.

In my response in the 'TimeOut settings?' thread, I summarized the mechanism
used to identify a worker that hangs while processing a request.

To summarize that scenario:
   - The head waits 60 seconds for the worker to return the processed request.
   - After 60 seconds it sends the first PING signal and waits 15 seconds for
   a reply.
   - After those 15 seconds it sends a second PING signal and again waits 15
   seconds for a reply.
   - If the worker does not respond during this interval, the head assumes
   the worker is in an unstable state and unable to process further requests.

We have defined values for worker idle time (WR_WKR_IDLE_TIME), ping wait
time (WR_PING_WAIT_TIME) and number of ping trials (WR_PING_TRIALS) in
'wr_config.h' file.
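
As a rough illustration of the timing described above, the timeline for a worker that never replies can be sketched in Python. This is not WebROaR's code (the server itself is written in C). The constant names are borrowed from 'wr_config.h' as mentioned in this thread, the values are the ones quoted in the summary, and the helper functions `seconds_until_kill` and `head_timeline` are hypothetical names introduced only for this sketch.

```python
# Illustrative values only: names follow the thread, values are the ones
# quoted in the summary above (assumed, not read from wr_config.h).
WR_WKR_IDLE_TIME = 60   # seconds the head waits for the processed request
WR_PING_WAIT_TIME = 15  # seconds the head waits for a reply to each PING
WR_PING_TRIALS = 2      # number of PING signals sent before giving up


def seconds_until_kill(idle_time=WR_WKR_IDLE_TIME,
                       ping_wait=WR_PING_WAIT_TIME,
                       ping_trials=WR_PING_TRIALS):
    """Time a completely silent worker survives before it is given up on."""
    return idle_time + ping_trials * ping_wait


def head_timeline(idle_time=WR_WKR_IDLE_TIME,
                  ping_wait=WR_PING_WAIT_TIME,
                  ping_trials=WR_PING_TRIALS):
    """Return (seconds, event) pairs for a worker that never replies."""
    events = []
    clock = idle_time
    for trial in range(1, ping_trials + 1):
        events.append((clock, f"send PING {trial}"))
        clock += ping_wait
    events.append((clock, "kill worker"))
    return events


if __name__ == "__main__":
    for seconds, event in head_timeline():
        print(f"t={seconds:>3}s  {event}")
```

With the quoted values, a worker that stays silent is given 60 + 2 * 15 = 90 seconds in total before the head treats it as unstable.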

Let's summarize the scenario for creating a new worker.

  - The head creates a new worker and waits for it to contact back.
  - If the worker does not contact back within 25 seconds, the head assumes there is a problem loading the application and that the worker will never contact back, so it kills the worker.
  - The head then creates a new worker and repeats the above step.

If three consecutive workers time out without contacting back, the head assumes that there is not enough memory or processing power to create a new worker.
In this case WebROaR waits for 30 minutes before creating a new worker.

Previously, the values for the worker add timeout (WR_WKR_ADD_TIMEOUT) and the wait time before creating new workers (WR_WKR_ADD_WAIT_TIME) were defined in 'wr_config.h'. In the current code, these values are stored in variables in the wr_config_server_init function (wr_config.c).
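
The worker-creation steps above can be sketched as a small Python simulation. This is only a model of the behaviour described in this reply, not the logic in wr_config.c. The 25-second and 30-minute values come from the description above. `MAX_CONSECUTIVE_TIMEOUTS`, `simulate_worker_creation`, and the choice to reset the failure counter after the pause are assumptions made for illustration.

```python
# Illustrative values taken from the description above (assumed, not read
# from the WebROaR source).
WR_WKR_ADD_TIMEOUT = 25          # seconds a new worker has to contact back
WR_WKR_ADD_WAIT_TIME = 30 * 60   # seconds to pause after repeated timeouts
MAX_CONSECUTIVE_TIMEOUTS = 3     # hypothetical name for the "three workers" rule


def simulate_worker_creation(contact_times):
    """Simulate the head creating workers one after another.

    contact_times holds, per attempt, the seconds the worker needs to
    contact back, or None if it never does. Returns (seconds, event) pairs.
    """
    clock = 0
    failures = 0
    events = []
    for contact in contact_times:
        events.append((clock, "create worker"))
        if contact is not None and contact <= WR_WKR_ADD_TIMEOUT:
            clock += contact
            events.append((clock, "worker connected"))
            return events
        # The worker did not contact back in time, so the head kills it.
        clock += WR_WKR_ADD_TIMEOUT
        events.append((clock, "kill worker"))
        failures += 1
        if failures == MAX_CONSECUTIVE_TIMEOUTS:
            # Assumption: the counter starts over once the pause has elapsed.
            events.append((clock, "pause worker creation"))
            clock += WR_WKR_ADD_WAIT_TIME
            failures = 0
    return events


if __name__ == "__main__":
    # Three workers never contact back; the fourth connects after 5 seconds.
    for seconds, event in simulate_worker_creation([None, None, None, 5]):
        print(f"t={seconds:>4}s  {event}")
```

In this model, three failed attempts use up 75 seconds, the head then pauses for 1800 seconds, and the next worker is created at t=1875s.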

In your case, due to the high load, three consecutive workers timed out and the server waited 30 minutes before creating new workers. After 30 minutes, a worker was created and connected to the server successfully.

I hope this reply answers your questions.

Thanks,
Nikunj

gom

Jul 14, 2010, 5:18:25 AM
to WebROaR - Ruby Application Server
Nikunj,

Thank you for the detailed explanation of how the workers are created.
With your comment, I now understand why my app returned to a normal
state by itself.


Thanks,
gom


On Jul 10, 8:20 PM, Nikunj Limbaseeya <nikunj.limbase...@webroar.in>
wrote: