> Does anybody have any clues?
tl;dr: try this for now:
http://github.com/guns/delayed_job/blob/delayed_job_daemon/lib/delayed/daemon_tasks.rb
There are a lot of people having the same issue as you. It seems the
Daemons-backed `delayed_job' script has developed some issues in the
push towards 2.1.0.
Presently there are two solutions:
1) Run `rake jobs:work' in production and use a process monitoring
daemon like monit or god to manage your worker processes
2) Rearchitect the delayed_job script as a classic pre-forking daemon
that dispatches jobs to worker processes. See
http://github.com/collectiveidea/delayed_job/issues#issue/25
The second solution is the one we should be pushing towards, but it
doesn't exist presently. When it is developed, though, I suspect the
Daemons library will not be involved. I believe it is at the center of
this issue.
The design of the Daemons gem hinges upon using process groups and
pidfiles to manage daemonized scripts. Besides being brittle, it seems
like an odd decision, since forking daemons usually involve a master
process that forks child processes and manages them with SIGTERM,
SIGHUP, and SIGCLD.
However, Daemons does not fork proper children, but instead double
forks standalone daemons, managing them via signals to the process
group. Also, if a monitoring process is spawned, it checks every 30s
to see whether your processes are still alive. This is necessary because
the Daemons process neither receives SIGCLD signals nor can it use a
simple blocking `Process.wait' call to detect the death of its child
processes.
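For contrast, a parent that forks its workers as proper children can simply block in `Process.wait`, hear about every death as it happens, and start a replacement. A minimal sketch, assuming hypothetical `spawn_worker` and `supervise` helpers; the worker body is a stand-in that exits almost immediately, and `max_respawns` exists only so the example terminates:

```ruby
# Hypothetical worker: a real one would loop over jobs. This stand-in
# exits after a moment so the supervisor has a death to notice.
def spawn_worker
  fork do
    sleep 0.05
    exit!(0) # exit! skips at_exit hooks inherited from the parent
  end
end

# Fork `count` workers and replace each one that dies, up to
# `max_respawns` replacements in total. Returns the number of respawns.
def supervise(count, max_respawns)
  workers = []
  count.times { workers << spawn_worker }
  respawns = 0
  until workers.empty?
    pid = Process.wait # blocks until any child exits; no 30s polling
    workers.delete(pid)
    next unless respawns < max_respawns

    workers << spawn_worker
    respawns += 1
  end
  respawns
end
```

Calling `supervise(2, 3)` forks two workers, replaces the first three that die, and returns 3 once the last child has been reaped.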
Finally, though it may not actually be part of the problem, the author
of the gem consistently traps signals without re-raising them or
restoring the default signal handlers. If you work with processes, you
should always propagate signals (see
http://www.cons.org/cracauer/sigint.html); `exit(130)` is not the same
as `Process.kill :TERM, $$`.
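To make the difference concrete, the sketch below forks two children that both trap TERM. One handler calls `exit!(130)`; the other restores the default disposition and re-sends TERM to itself with `Process.kill(:TERM, $$)`. Only the second is reported to the parent as killed by a signal. The helper names are hypothetical, and `'SYSTEM_DEFAULT'` is Ruby's name for the operating system's default handler (plain `'DEFAULT'` reinstalls Ruby's own handler instead):

```ruby
# Fork a child that installs `handler` for SIGTERM and then idles.
# The pipe makes the parent wait until the trap is really in place,
# so the signal cannot arrive before the handler does.
def fork_with_term_handler(&handler)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    trap(:TERM, &handler)
    writer.write('.') # tell the parent we're ready
    writer.close
    sleep             # idle until TERM arrives
  end
  writer.close
  reader.read(1)      # blocks until the child's trap is installed
  reader.close
  pid
end

# Send SIGTERM to `pid`, reap it, and return its Process::Status.
def terminate(pid)
  Process.kill(:TERM, pid)
  Process.wait(pid)
  $?
end

# Handler that swallows the signal: the parent only sees a normal exit.
# exit! skips at_exit hooks the child inherited from the parent.
swallowing = fork_with_term_handler { exit!(130) }
swallowed = terminate(swallowing)

# Handler that propagates the signal: clean up, restore the OS default
# disposition, then re-send TERM to ourselves.
propagating = fork_with_term_handler do
  # ... cleanup would go here ...
  trap(:TERM, 'SYSTEM_DEFAULT')
  Process.kill(:TERM, $$)
end
propagated = terminate(propagating)

puts "exit!(130): exited=#{swallowed.exited?} status=#{swallowed.exitstatus}"
puts "re-sent:    signaled=#{propagated.signaled?} " \
     "signal=#{Signal.signame(propagated.termsig)}"
```

The first child exits normally with status 130 and `signaled?` false; the second is reported as killed by SIGTERM, which is what a parent shell or supervisor needs in order to react correctly.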
I don't mean to harsh on the Daemons gem. It seems like it could work
fine for simple scripts, but I don't think I would write a daemonizing
library in the same way. If I did write such a library, these are the
steps I would follow:
* silence the standard streams
* double fork the master to detach from the controlling process
* call setsid(2) to create a new process group
* create some child processes
* resurrect dead children on receipt of SIGCLD, or when wait(2)
returns a pid
* restart children on SIGHUP
* reopen logfiles on SIGUSR1 (optional but nice)
* terminate and reap children on SIGTERM, restore the default handler
for TERM, and resend TERM to self
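A rough Ruby sketch of those steps, assuming a hypothetical `Master` class, a caller-supplied worker block and a caller-supplied log path (none of this is from delayed_job or the Daemons gem). Two liberties are taken: the first three steps are delegated to Ruby's built-in `Process.daemon`, which detaches from the controlling terminal and redirects the standard streams to /dev/null but may fork only once depending on the platform; and instead of reacting to SIGCLD, the loop polls `Process.wait` with `WNOHANG` every 100ms to keep the example short:

```ruby
# Hypothetical pre-forking master; not delayed_job's or any gem's API.
class Master
  SIGNALS = %i[HUP USR1 TERM].freeze

  def initialize(count, log_path, &work)
    @count = count       # number of workers to keep running
    @log_path = log_path # hypothetical log location
    @work = work         # block each worker runs
    @workers = []        # pids of workers not yet reaped
    @queue = []          # signals noted by trap handlers
  end

  def run(daemonize: false)
    # Steps 1-3 (approximately): detach from the terminal and send the
    # standard streams to /dev/null. true = keep the current directory.
    Process.daemon(true) if daemonize
    @log = File.open(@log_path, 'a')
    # Handlers only record the signal; the loop below does the work.
    SIGNALS.each { |sig| trap(sig) { @queue << sig } }
    @count.times { spawn_worker }                 # step 4
    loop do
      reap_and_respawn                            # step 5
      case @queue.shift
      when :HUP  then restart_workers             # step 6
      when :USR1 then @log.reopen(@log_path, 'a') # step 7
      when :TERM then shutdown                    # step 8, does not return
      end
      sleep 0.1 # poll instead of reacting to SIGCLD
    end
  end

  private

  def log(message)
    @log.puts("[master] #{message}")
    @log.flush
  end

  def spawn_worker
    pid = fork do
      # Workers must not keep the master's queueing handlers.
      SIGNALS.each { |sig| trap(sig, 'SYSTEM_DEFAULT') }
      @work.call
      exit!(0)
    end
    @workers << pid
  end

  # Reap every worker that has exited and start a replacement for each.
  def reap_and_respawn
    while (pid = Process.wait(-1, Process::WNOHANG))
      @workers.delete(pid)
      spawn_worker
    end
  rescue Errno::ECHILD
    nil # no children at all
  end

  def restart_workers
    log('SIGHUP received! Restarting workers.')
    # Stop every worker; reap_and_respawn replaces them on the next pass.
    @workers.each { |pid| Process.kill(:TERM, pid) }
  end

  def shutdown
    log('SIGTERM received! Stopping workers.')
    @workers.each { |pid| Process.kill(:TERM, pid) }
    @workers.each { |pid| Process.wait(pid) }
    trap(:TERM, 'SYSTEM_DEFAULT') # restore the default handler for TERM
    Process.kill(:TERM, $$)       # and resend TERM to self
  end
end
```

Because the trap handlers only append to a queue, nothing beyond that runs in signal context; a production version would probably replace the 100ms poll with a self-pipe so signals and child deaths are handled immediately.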
--
Right now I am waiting for Brandon to finalize the API for 2.1.0, and
after that, time permitting, I would like to attempt to write a
replacement for script/delayed_job.
Until then, however, I do have a forking daemon that I wrote some weeks
ago for a file-uploading site. It doesn't dispatch jobs like the final
solution should (the workers just fight for a db lock), but it does
reliably launch and manage multiple worker processes. You can see the
code here (I recommend reading the header):
http://github.com/guns/delayed_job/blob/delayed_job_daemon/lib/delayed/daemon_tasks.rb
It's just a beginning, it's not beautiful, and it should be more
configurable, but I have it running flawlessly in production right
now. Also, I've been keeping it up to date with all of Brandon's
changes.
Directions:
Point your Gemfile at the repo:
    gem 'delayed_job', :git => 'git://github.com/guns/delayed_job.git',
                       :branch => 'delayed_job_daemon'
Run as a rake task:
    WORKERS=n RAILS_ENV=production rake jobs:daemon:start
where `n' is the number of processes you'd like to spawn.
Further directions are in the header of `daemon_tasks.rb', the file
linked above.
Cheers,
guns
> Are there any libraries out there that do this well? Unicorn is the
> only example I've seen, but it's a little too dense for me to wrap my
> mind around, no matter how many times I read
> http://tomayko.com/writings/unicorn-is-unix.
I don't know of any ruby libraries, but writing a forking Unix daemon
is really well-travelled territory, and a big project like delayed_job
would really benefit from pulling the logic in-house.
> The single biggest issue with delayed_job right now is the daemon,
> so I'm happy to put all other work aside to focus on this. Heck, if
> we could get this thing rewritten soonish, I'd love to just skip 2.1
> and go straight to 3.0 with a killer daemon.
That would be awesome! My schedule has been tight recently though, so
I don't know if I'd be able to get a final version out the door before
the end of the month. We should look at David's branch and try
polishing that if he is already most of the way there.
On 15 Sep 2010, at 10:35 AM, David Genord II wrote:
> I have actually done a second go round at just this. For this round
> I modeled my work off of phusion passenger.
Extracting from Passenger sounds like an excellent idea. I cloned your
branch and did a quick `git diff -w v2.1.0.pre' to scan your code.
Unfortunately, I wasn't able to run script/delayed_job from your
branch (I think it conflicts with the Rails 3 release or Ruby 1.9.2),
but I think the overall design is solid. I do have some nitpicks wrt
Passenger's paranoid signal handling, and I think the command line
script needs to be overhauled, but these are little issues.
One clarification:
> The basic overview is that script/delayed_job creates a server which
> spawns a producer to find and reserve jobs. These jobs are sent back
> to the server which spawns consumers on demand to run the jobs.
> One of the other things I would like to implement from this change
> is a configuration file. So that you could do something like the
> following and the server would handle all the scheduling
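A toy version of that producer/server/consumer flow, using nothing but `fork` and pipes, might look like the sketch below. The names are hypothetical and the "jobs" are plain integers standing in for reserved delayed_job records; each consumer just writes a line to a shared pipe instead of running real work:

```ruby
# Producer: a child that "finds and reserves" jobs and reports their
# ids to the server, one per line. Here the ids are simply handed in.
def spawn_producer(job_ids, out)
  fork do
    job_ids.each { |id| out.puts(id) }
    out.close # flushes, and lets the server see EOF
    exit!(0)
  end
end

# Consumer: a short-lived child that runs a single job. `results`
# stands in for whatever side effect a real job would have.
def spawn_consumer(job_id, results)
  fork do
    results.puts("done #{job_id}")
    results.flush
    exit!(0)
  end
end

# Server: spawn the producer, then spawn a consumer on demand for every
# job id the producer sends back. Returns once every child has exited.
def serve(job_ids, results)
  from_producer, to_server = IO.pipe
  producer = spawn_producer(job_ids, to_server)
  to_server.close # only the producer writes, so EOF means it is done
  consumers = from_producer.each_line.map { |line| spawn_consumer(line.to_i, results) }
  from_producer.close
  ([producer] + consumers).each { |pid| Process.wait(pid) }
end
```

Forking one consumer per job keeps the sketch short; a pool of long-lived consumers fed over pipes would avoid paying the fork (and Rails boot) cost per job.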
So the goal would be to have a machine-global spawn server that can
handle multiple producers for different apps, and then manage the
global worker pool based on an rc file? That would be really awesome.
It would require, though, that the delayed_job script be divorced from
any single app, in favor of a single master script with a single
global config file. This would make administration of delayed_job dead
simple. :)
Is your branch mostly there, or would you like some help implementing
the details?
guns
> The config setup and spawn server would not be machine global, just within the app. While what you describe would be more useful, it would also be exponentially more difficult.
Oh I don't think it would be _exponentially_ more difficult. :)
> I'll dig through it tonight and see what I need and what I could use help with.
Well then I'll hang back and keep an eye on your branch. I hope we'll have a nice solution for the next release.
guns
> I just tested ruby 1.9.2-p0 with rails 3 and everything worked. What
> OS/DB/Backend were you testing my branch against?
It was on OS X 10.6 / MySQL 5 / ActiveRecord.
I didn't spend much time looking into it; I'll give it another go
later today, with a new app and an existing one.
> I can see a new process being created nicely, with a worker process.
> The only trouble is: this process seems to kill itself continuously
> without ever doing anything.
> 2010-09-17T13:50:44+0200: [delayed_worker.master] SIGHUP received! Restarting workers.
> 2010-09-17T13:50:45+0200: [delayed_worker.master] SIGHUP received! Restarting workers.
> 2010-09-17T13:50:54+0200: [delayed_worker.master] SIGHUP received! Restarting workers.
> 2010-09-17T13:50:56+0200: [delayed_worker.master] SIGHUP received! Restarting workers.
> 2010-09-17T13:51:04+0200: [delayed_worker.master] SIGHUP received! Restarting workers.
> Small addition: I deployed to my production environment, and there
> the solution from guns is working fine.
Oops!
Well, we are moving towards David's branch, but if you're still interested in kicking around with mine, I pushed a bugfix that corrects the issue you were having.
Currently the master polls tmp/restart.txt for timestamp updates and sends a SIGHUP to itself when the timestamp changes (like Passenger). I thought I was correctly handling the case where tmp/restart.txt didn't exist, but I was wrong! It's fixed now.
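The restart.txt check is small enough to sketch. This is not the code from the branch, just a minimal illustration assuming a hypothetical `RestartWatcher` that the master would poll on each pass: it compares the file's mtime with the last one it saw and treats a missing file as "no restart requested" rather than as a change:

```ruby
# Hypothetical helper: reports whether tmp/restart.txt has been touched
# since the last check, Passenger-style.
class RestartWatcher
  def initialize(path)
    @path = path
    @last_mtime = current_mtime # nil if the file doesn't exist yet
  end

  # True once per modification; a missing file never triggers a restart.
  def changed?
    mtime = current_mtime
    return false if mtime.nil? || mtime == @last_mtime

    @last_mtime = mtime
    true
  end

  private

  def current_mtime
    File.mtime(@path)
  rescue Errno::ENOENT
    nil
  end
end
```

In the master loop, something like `Process.kill(:HUP, Process.pid) if watcher.changed?` would then hand off to the existing SIGHUP handling.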
guns