Workers being killed silently, no errors


Andrew Havens

Aug 7, 2012, 11:07:50 AM
to delay...@googlegroups.com
We have been running delayed_job in development mode successfully for a few months now (we have not yet released this app to production). We hadn't noticed any issues until recently, but they might have gone unnoticed for a while. We recently upgraded some gems, so that may have something to do with it.

Now, whenever delayed_job is given a job, the worker daemon dies immediately with no explanation. Running script/delayed_job status gives us:

pid-file for killed process 1143 found (/appPath/tmp/pids/delayed_job.pid), deleting.
delayed_job: no instances running

The strange thing is that it works fine when it is not daemonized, i.e. when started with script/delayed_job run.

What can we do to debug this issue?

Gems:
rails 3.2.7
mongoid 3.0.3
delayed_job 3.0.3
delayed_job_mongoid 2.0.0

Andrew Havens

Aug 7, 2012, 1:16:32 PM
to delay...@googlegroups.com
If you could point me to the place in the source code where I could add some extra logging/begin-rescue statements, that would help me to figure this out.

Matt Griffin

Aug 7, 2012, 1:32:48 PM
to delay...@googlegroups.com
Workers die because either 1) you killed them or 2) ruby segfaults or 3) some unhandled exception causes the worker to drop out of the execution loop.

When I'm trying to track things down I normally just run the worker undaemonized and see what it spits out.
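For case 3, one way to see an otherwise silent exception is to wrap the suspect code, log the error, and re-raise it. The following is only a minimal sketch: `with_crash_logging` is a hypothetical helper, not part of delayed_job, and it cannot catch case 2, because a segfault aborts the interpreter rather than raising a Ruby exception.

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper (not a delayed_job API): log any exception, then re-raise.
def with_crash_logging(logger)
  yield
rescue Exception => e # deliberately broad so nothing slips past unlogged
  logger.error("#{e.class}: #{e.message}")
  logger.error(e.backtrace.join("\n")) if e.backtrace
  raise
end

# Demo: log into an in-memory buffer instead of a real log file.
log_io = StringIO.new
logger = Logger.new(log_io)

begin
  with_crash_logging(logger) { raise ArgumentError, 'boom' }
rescue ArgumentError
  # The error is re-raised after being logged, so the caller still sees it.
end

puts log_io.string
```

In a real app the logger would point at a file rather than a StringIO, so the message survives the process dying.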

Andrew Havens

Aug 7, 2012, 2:41:31 PM
to delay...@googlegroups.com
> Workers die because either 1) you killed them or 2) ruby segfaults or 3) some unhandled exception causes the worker to drop out of the execution loop.
https://github.com/collectiveidea/delayed_job/blob/master/lib/delayed/worker.rb#L131

Since I am not killing them, either Ruby is segfaulting or there is an unhandled exception. I noticed that in a previous version of the delayed_job_mongoid gem the "reserve job" logic was wrapped in a begin/rescue, but it was removed in recent versions. Could this be the cause?




Andrew Havens

Aug 7, 2012, 2:44:20 PM
to delay...@googlegroups.com
> When I'm trying to track things down I normally just run the worker undaemonized and see what it spits out.

I mentioned that it works when I run it undaemonized as "script/delayed_job run". Is that what you meant? Or did you mean something different?

Andrew Havens

Aug 7, 2012, 7:06:34 PM
to delay...@googlegroups.com
> Workers die because either 1) you killed them or 2) ruby segfaults or 3) some unhandled exception causes the worker to drop out of the execution loop.

After much debugging, it appears to be a segfault in Random.rand. Updating to ruby-1.9.3-head did not fix the problem. However, calling r = Random.new; r.rand seemed to work. =[

Ricardo Bernardeli

Aug 7, 2012, 11:41:16 PM
to delay...@googlegroups.com
Hi!

You could also call rand directly.

1.9.3p194 :001 > rand
 => 0.02043360718963627 
1.9.3p194 :002 > rand 100
 => 18 

I don't know if Kernel.rand and Random.rand are the same implementation.
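A quick way to check this locally is to seed the default generator, draw once through each entry point, and compare. This is only a sketch: the seed 1234 is arbitrary, and it shows whether both calls read from the same default generator on the Ruby you run it on, not how either is implemented internally.

```ruby
# Seed the default generator, then draw through Kernel#rand.
srand(1234)
from_kernel = rand

# Re-seed with the same value, then draw through Random.rand (class method).
srand(1234)
from_random = Random.rand

# Equal draws mean both entry points read from the same seeded generator.
same_generator = (from_kernel == from_random)

puts "Kernel#rand: #{from_kernel}"
puts "Random.rand: #{from_random}"
puts "same default generator? #{same_generator}"

# Random.new, by contrast, builds a separate generator object with its own state.
own = Random.new(1234)
puts "independent instance: #{own.rand}"
```

If the two draws match, swapping Random.rand for a bare rand would not avoid a problem in the shared default generator, whereas Random.new uses separate state.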

On 7 August 2012 20:06, Andrew Havens <misbe...@gmail.com> wrote:
> Workers die because either 1) you killed them or 2) ruby segfaults or 3) some unhandled exception causes the worker to drop out of the execution loop.
>
> After much debugging, it appears to be a segfault in Random.rand. Updating to ruby-1.9.3-head did not fix the problem. However, calling r = Random.new; r.rand seemed to work. =[



Andrew Havens

Aug 8, 2012, 10:09:36 AM
to delay...@googlegroups.com
> You could also call rand directly.

Good to know, thanks. I'm really only guessing that this was the problem since it works when I change it. There is no output nor error message anywhere, so I don't have any way of knowing.

Do you know if there is any way that I can see a log or dump of the segmentation fault? I'm new to this sort of thing so I don't really understand it.
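One general approach: Ruby prints its segmentation fault report to standard error, and a daemonized process often has stderr detached from any terminal, so the report is lost. Pointing stderr at a file early in the process keeps it. The sketch below shows only the redirection itself. The log path is a hypothetical placeholder, and where to hook this into the worker's startup is left open.

```ruby
require 'tmpdir'

# Hypothetical location for the demo; in a Rails app this might sit under log/.
crash_log = File.join(Dir.mktmpdir, 'worker_stderr.log')

original_stderr = $stderr.dup     # keep a handle so the demo can restore stderr
$stderr.reopen(crash_log, 'a')    # from here on, stderr output is appended to the file
$stderr.sync = true               # write through immediately instead of buffering

warn 'worker starting'            # stands in for anything written to stderr

$stderr.reopen(original_stderr)   # restore stderr for the rest of the demo
captured = File.read(crash_log)
puts captured
```

A real worker would skip the restore step and leave stderr pointing at the file for the life of the process.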

Ricardo Bernardeli

Aug 8, 2012, 3:54:38 PM
to delay...@googlegroups.com
Well, have you set up a log for delayed_job?

This is my config/initializers/delayed_job.rb

# -*- encoding : utf-8 -*-
Delayed::Worker.logger = Logger.new(File.join(Rails.root, 'log', 'delayed_job.log'))

Everything that happens should be there!

Andrew Havens

Aug 8, 2012, 4:54:31 PM
to delay...@googlegroups.com
After further investigation, this turned out to be a bug in the daemons gem, which delayed_job depends on. It was not re-seeding the random number generator after forking a new process. I patched it and submitted it here: http://rubyforge.org/tracker/index.php?func=detail&aid=29627&group_id=524&atid=2086
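For readers who cannot reach the tracker link, the general idea of seeding after a fork can be sketched as below. This is an illustration under assumptions, not the actual patch: `rand_in_child` is a hypothetical helper, and the code needs a platform where fork is available (so not Windows).

```ruby
# Hypothetical helper: fork a child, seed its generator, and return one draw.
def rand_in_child
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    srand               # choose a fresh seed inside the child process
    writer.puts rand    # draw from the freshly seeded default generator
    writer.close
    exit!(0)            # leave without running the parent's at_exit handlers
  end
  writer.close
  value = reader.read.to_f
  reader.close
  Process.wait(pid)
  value
end

samples = Array.new(2) { rand_in_child }
puts samples.inspect
```

The key line is the bare srand call at the start of the child block, which runs before the child draws any random numbers.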


Ratnakar Vanapalli

Dec 6, 2014, 5:33:35 AM
to delay...@googlegroups.com, misbe...@gmail.com
Hi,

I have the same issue. What do I need to do?

Everyone says the fix involves Random.new.rand, but where do I need to place it? Can you please help me as soon as possible? This is very urgent for me.

Thanks for the help.

saj...@royalbrothers.com

May 21, 2018, 8:18:08 AM
to delayed_job
Hi Andrew Havens,

I'm facing the same issue. The link that you provided is not reachable. Can you please help me resolve this issue and tell me where to place Random.new.rand?