Gearman doesn't have to do it; that's not the concept.
--
http://www.danielmantovani.com
"If you’ve never written anything thoughtful, then you’ve never had
any difficult, important, or interesting thoughts. That’s the secret:
people who don’t write, are people who don’t think."
You can always submit the job from cron. I use the ruby client and the
rufus scheduler library to do scheduled submissions.
Gearman can still be used for the IPC, but the actual scheduling would
have to be done by your application.
So you could have a gearman job that says 'queue this for 2 hours from
now'. Your worker accepts it immediately, waits 2 hours, then queues
another job that actually does the work.
But you probably need to handle failures, so having your worker store
the scheduled job in a DB would be wise.
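As a rough sketch of that pattern (everything here is hypothetical: the array is a stand-in for a real DB table, and the block you pass to `drain_due` is where your Gearman client submit would go):

```ruby
# Hypothetical store for the pattern above: a worker accepts a
# "run this later" job and records it durably (an array here; use a
# real DB in practice), and a periodic loop resubmits jobs once due.
class DelayedJobStore
  Job = Struct.new(:func, :payload, :run_at)

  def initialize
    @jobs = []
  end

  # Called by the worker that accepts the scheduling job.
  def schedule(func, payload, run_at)
    @jobs << Job.new(func, payload, run_at)
  end

  # Pull everything that is due and hand it to a submitter
  # (e.g. a Gearman client doing a background submit).
  def drain_due(now = Time.now)
    due, @jobs = @jobs.partition { |j| j.run_at <= now }
    due.each { |j| yield j }
    due.size
  end
end
```

A cron job or small daemon would call `drain_due` every minute or so; surviving a crash is then just a matter of the store being a real table instead of an array.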
Rhett
As well as a modified Ruby client that can support it:
http://github.com/gearman-ruby/gearman-ruby
This functionality is implicitly described in the Gearman protocol
(SUBMIT_JOB_SCHED), and we needed it too, so we implemented it. :D
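For reference, the framing is simple: a request packet is the magic bytes `\0REQ`, a big-endian type, a big-endian size, then NUL-separated arguments. The sketch below builds the related SUBMIT_JOB_EPOCH packet (function name, unique ID, epoch, data); the type number 36 is my reading of the protocol doc, so verify it against your server version.

```ruby
# Sketch of the wire framing for a SUBMIT_JOB_EPOCH request.
# Type number 36 is an assumption from the protocol doc.
SUBMIT_JOB_EPOCH = 36

def submit_job_epoch_packet(func, unique, epoch, data)
  body = [func, unique, epoch.to_s, data].join("\0")
  "\0REQ" + [SUBMIT_JOB_EPOCH, body.bytesize].pack('NN') + body
end
```

A client would write this straight to the gearmand socket; SUBMIT_JOB_SCHED is the same framing with cron-style fields (minute, hour, day of month, month, day of week) in place of the epoch argument.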
Colin
--
=begin
Colin Curtin
Engineer, Cramer Development
http://cramerdev.com
email: co...@cramerdev.com
skype: colin.t.curtin
phone: +1.805.694.UNIX (8649)
=end
That is awesome. Please flag the branches for merge when you find yourself reaching the point where you want them merged.
Cheers,
-Brian
I said "we should add that" offhand at a conference years ago
(SUBMIT_JOB_EPOCH) but later decided it was a bad idea.
It'll probably work for you and a good number of other folks, but
robustness/scaling is hard to bolt on top of that. I'm sorta partial to
the concept of a worker that accepts special "run this job in the future"
jobs: it notes the job args into a database/file/whatever and is later
able to pull them back in batches and resubmit into gearman. You can do
that either in the worker or by some other daemon/cron/whatever.
This way you don't waste memory on jobs scheduled far into the future,
and you can quickly requeue them en masse without hammering a DB with
many processes.
All of our "future" jobs also happen to be ones that need to persist, so
we end up running them out of another system mildly similar to this idea
(TheSchwartz). But it's slow and will eventually be more like what I just
described.
Not discouraging what you're doing; more like sharing some extra
thoughts on this.
For example, we do something like this:
client schedules bg job at time x
worker picks up job
- works on data
- transient failure? exponentially back off and re-schedule the job farther in the future (similar to SMTP)
- otherwise do whatever with the results and re-schedule for a week (or however far) in the future
- lather, rinse, repeat
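The backoff step above could be sketched like this (BASE_DELAY and MAX_DELAY are arbitrary values I picked for illustration, not anything from the thread):

```ruby
# Hypothetical SMTP-style backoff schedule for transient failures:
# the delay doubles with each attempt, capped so a permanently
# flapping job can't schedule itself years into the future.
BASE_DELAY = 60       # seconds before the first retry (made up)
MAX_DELAY  = 6 * 3600 # never back off more than 6 hours (made up)

def next_run_at(now, attempt)
  delay = [BASE_DELAY * (2**attempt), MAX_DELAY].min
  now + delay
end
```

On a transient failure the worker would resubmit the job with a scheduled time of `next_run_at(Time.now, attempts)`, bumping the attempt counter it carries in the job payload.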
Without this feature, we'd have to implement something that would schedule tasks à la cron, then interface with it, manage it, etc. I don't want to have several thousand cron jobs, or another daemon that needs to be maintained; I'd rather just have those jobs in the queue with a future run date. This way, when we peek at the persistent queue with our queue-management tool, I can see what's supposed to run, when it's supposed to run, and the data associated with it.
Admittedly, as a linked list, its performance isn't fantastic by any means. In the worst case, N-1 jobs in the queue are jobs in the future and the one you want is all the way at the end of the list. I intend to re-implement it as a heap-based priority queue for efficiency: if a job is to be run now it ends up on top; otherwise the heap is ordered by timestamp. I'm not sure if we'll implement that in the C server or in the Ruby one we're hacking on (or maybe both).
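That heap could be sketched as a plain binary min-heap keyed on the run timestamp (nothing below is from the actual server code; it's just the data structure being described):

```ruby
# Min-heap of (run_at, job) pairs: the next runnable job is always
# on top, instead of being found by scanning a linked list.
class JobHeap
  def initialize
    @a = []
  end

  def push(run_at, job)
    @a << [run_at, job]
    i = @a.size - 1
    # Sift the new entry up until its parent runs no later than it.
    while i > 0
      parent = (i - 1) / 2
      break if @a[parent][0] <= @a[i][0]
      @a[parent], @a[i] = @a[i], @a[parent]
      i = parent
    end
  end

  # Pop the earliest job, but only if it is already due.
  def pop_due(now)
    return nil if @a.empty? || @a[0][0] > now
    top = @a[0]
    last = @a.pop
    unless @a.empty?
      # Move the last entry to the root and sift it back down.
      @a[0] = last
      i = 0
      loop do
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        smallest = left if left < @a.size && @a[left][0] < @a[smallest][0]
        smallest = right if right < @a.size && @a[right][0] < @a[smallest][0]
        break if smallest == i
        @a[smallest], @a[i] = @a[i], @a[smallest]
        i = smallest
      end
    end
    top[1]
  end
end
```

Push and pop are O(log N), so the worst case above (everything scheduled for the future except one due job) costs one peek at the root instead of a full scan.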
I've already hacked support / tested most of the persistent queue modules, but we're using SQLite because it's easy to query so we can quickly examine the queue (even though TC is faster). I've even tested it with > 10K jobs and it seems to do fine...
But, one of the things I find most appealing about Gearman is that because it's so transparent and flexible you can do it any way you want... TIMTOWTDI eh? :)
-John
Putting future jobs into the gearmand ties them to that gearmand and to
the limitations of how you can persist jobs to it. In the case of
writebacks to the DB:
- New jobs go into the DB with a timestamp of when they get executed,
number of retries, etc.
- A special worker/client thing selects a chunk from the DB ordered by
timestamp, then marks some rows as locked for some time, locked forever,
or however you feel like implementing this.
- Client then re-injects jobs into gearmand, which can be any gearmand.
Workers retain a little extra code to note back to the DB on
failure/completion, or fire another job which does so.
Could implement that in a few other ways (worker with parallel
synchronous job requests, etc).
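The select-and-lock step could be sketched like this (column names and the lock scheme are made up; the Hashes stand in for DB rows):

```ruby
# Sketch of the claim step: pick due, unlocked jobs ordered by
# run_at, then lock them for a while so other readers skip them.
# In SQL this is roughly:
#   SELECT ... WHERE run_at <= now AND locked_until < now
#   ORDER BY run_at LIMIT n
# followed by an UPDATE of locked_until on the chosen rows.
def claim_due(rows, now, limit, lock_for)
  due = rows.select { |r| r[:run_at] <= now && r[:locked_until] < now }
            .sort_by { |r| r[:run_at] }
            .first(limit)
  # Lock the claimed rows; the caller then re-injects each job
  # into any gearmand and clears or deletes the row on completion.
  due.each { |r| r[:locked_until] = now + lock_for }
  due
end
```

Because the claim is just a timed lock, a reader that dies mid-batch simply lets its rows come due again once `locked_until` passes.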
This has some key benefits for us. As a distributed system, jobs that come
in can go out anywhere. You can easily balance jobs across gearmands as
things scale up/down. Future scheduled jobs also don't take up RAM in
gearmand and can have notes easily customized per app.
Adding more DBs and randomly choosing which one each new job goes into
scales up persistence. Each DB only really needs one reader, so they'll
scale independently of each other for writes.
SQL lets us easily inspect future job queues. Apps can modify the jobs,
cancel them, etc. I don't think we even specified a packet for cancelling
a future-specced job.
Otherwise we end up needing to build a lot more into the gearmand to get
something similar to this. Seems like an easy way to add too much code to
the wrong place to me. I'd tear all my hair out if I ever had to add a
daemon to a cluster but then had to write special tools and patch the
daemon to even out memory usage.
-Dormando