Gearman doesn't have to do it; that's not the concept.
--
http://www.danielmantovani.com
"If you’ve never written anything thoughtful, then you’ve never had
any difficult, important, or interesting thoughts. That’s the secret:
people who don’t write, are people who don’t think."
You can always submit the job from cron. I use the ruby client and the
rufus scheduler library to do scheduled submissions.
Gearman can still be used for the IPC, but the actual scheduling would
have to be done by your application.
So you could have a gearman job that says 'queue this for 2 hours from
now'. Your worker accepts it immediately, waits 2 hours, then queues
another job that actually does the work.
But you probably need to handle failures, so having your worker store
the scheduled job in a DB would be wise.
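As a rough sketch of that pattern (everything here is hypothetical: the array is a stand-in for a real DB table, and the block you pass to `drain_due` is where your Gearman client submit would go):

```ruby
# Hypothetical store for the pattern above: a worker accepts a
# "run this later" job and records it durably (an array here; use a
# real DB in practice), and a periodic loop resubmits jobs once due.
class DelayedJobStore
  Job = Struct.new(:func, :payload, :run_at)

  def initialize
    @jobs = []
  end

  # Called by the worker that accepts the scheduling job.
  def schedule(func, payload, run_at)
    @jobs << Job.new(func, payload, run_at)
  end

  # Pull everything that is due and hand it to a submitter
  # (e.g. a Gearman client doing a background submit).
  def drain_due(now = Time.now)
    due, @jobs = @jobs.partition { |j| j.run_at <= now }
    due.each { |j| yield j }
    due.size
  end
end
```

A cron job or small daemon would call `drain_due` every minute or so; surviving a crash is then just a matter of the store being a real table instead of an array.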
Rhett
As well as a modified Ruby client that can support it:
http://github.com/gearman-ruby/gearman-ruby
This functionality is implicitly described in the Gearman protocol
(SUBMIT_JOB_SCHED), and we needed it too, so we implemented it. :D
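For reference, the framing is simple: a request packet is the magic bytes `\0REQ`, a big-endian type, a big-endian size, then NUL-separated arguments. The sketch below builds the related SUBMIT_JOB_EPOCH packet (function name, unique ID, epoch, data); the type number 36 is my reading of the protocol doc, so verify it against your server version.

```ruby
# Sketch of the wire framing for a SUBMIT_JOB_EPOCH request.
# Type number 36 is an assumption from the protocol doc.
SUBMIT_JOB_EPOCH = 36

def submit_job_epoch_packet(func, unique, epoch, data)
  body = [func, unique, epoch.to_s, data].join("\0")
  "\0REQ" + [SUBMIT_JOB_EPOCH, body.bytesize].pack('NN') + body
end
```

A client would write this straight to the gearmand socket; SUBMIT_JOB_SCHED is the same framing with cron-style fields (minute, hour, day of month, month, day of week) in place of the epoch argument.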
Colin
--
=begin
Colin Curtin
Engineer, Cramer Development
http://cramerdev.com
email: co...@cramerdev.com
skype: colin.t.curtin
phone: +1.805.694.UNIX (8649)
=end
That is awesome. Please flag the branches for merge when you find yourself reaching the point where you want them merged.
Cheers,
-Brian
I said "we should add that" offhand at a conference years ago
(SUBMIT_JOB_EPOCH) but later decided it was a bad idea.
It'll probably work for you and a good number of other folks, but
robustness/scaling is hard to bolt on top of that. I'm sorta partial to
the concept of a worker that accepts special "run this job in the future"
jobs: it notes the job args into a database/file/whatever and is later
able to pull them back in batches and resubmit into gearman. You can do
that either in the worker or by some other daemon/cron/whatever.
This way you don't waste memory on jobs scheduled far into the future,
and you can quickly requeue them en masse without hammering a DB with
many processes.
All of our "future" jobs also happen to be ones that need to persist, so
we end up running them out of another system mildly similar to this idea
(TheSchwartz). But it's slow and will eventually be more like what I just
described.
Not discouraging what you're doing; more like sharing some extra
thoughts on this.
For example, we do something like this:
client schedules bg job at time x
worker picks up job
- works on data
- transient failure? exponentially back off and re-schedule the job farther in the future (similar to SMTP)
- otherwise do whatever with the results and re-schedule for a week (or however far) in the future
- lather, rinse, repeat
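The backoff step above could be sketched like this (BASE_DELAY and MAX_DELAY are arbitrary values I picked for illustration, not anything from the thread):

```ruby
# Hypothetical SMTP-style backoff schedule for transient failures:
# the delay doubles with each attempt, capped so a permanently
# flapping job can't schedule itself years into the future.
BASE_DELAY = 60       # seconds before the first retry (made up)
MAX_DELAY  = 6 * 3600 # never back off more than 6 hours (made up)

def next_run_at(now, attempt)
  delay = [BASE_DELAY * (2**attempt), MAX_DELAY].min
  now + delay
end
```

On a transient failure the worker would resubmit the job with a scheduled time of `next_run_at(Time.now, attempts)`, bumping the attempt counter it carries in the job payload.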
Without this feature, we'd have to implement something that would schedule tasks à la cron, then interface with it, manage it, etc. I don't want to have several thousand cron jobs, or another daemon that needs to be maintained; I'd rather just have those jobs in the queue with a future run date. This way, when we peek at the persistent queue with our queue-management tool, I can see what's supposed to run, when it's supposed to run, and the data associated with it.
Admittedly, as a linked list, its performance isn't fantastic by any means. In the worst case, N-1 jobs in the queue are jobs in the future and the one you want is all the way at the end of the list. I intend to re-implement it as a heap-based priority queue for efficiency: if a job is to be run now it ends up on top; otherwise the heap is ordered by timestamp. I'm not sure if we'll implement that in the C server or in the Ruby one we're hacking on (or maybe both).
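That heap could be sketched as a plain binary min-heap keyed on the run timestamp (nothing below is from the actual server code; it's just the data structure being described):

```ruby
# Min-heap of (run_at, job) pairs: the next runnable job is always
# on top, instead of being found by scanning a linked list.
class JobHeap
  def initialize
    @a = []
  end

  def push(run_at, job)
    @a << [run_at, job]
    i = @a.size - 1
    # Sift the new entry up until its parent runs no later than it.
    while i > 0
      parent = (i - 1) / 2
      break if @a[parent][0] <= @a[i][0]
      @a[parent], @a[i] = @a[i], @a[parent]
      i = parent
    end
  end

  # Pop the earliest job, but only if it is already due.
  def pop_due(now)
    return nil if @a.empty? || @a[0][0] > now
    top = @a[0]
    last = @a.pop
    unless @a.empty?
      # Move the last entry to the root and sift it back down.
      @a[0] = last
      i = 0
      loop do
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        smallest = left if left < @a.size && @a[left][0] < @a[smallest][0]
        smallest = right if right < @a.size && @a[right][0] < @a[smallest][0]
        break if smallest == i
        @a[smallest], @a[i] = @a[i], @a[smallest]
        i = smallest
      end
    end
    top[1]
  end
end
```

Push and pop are O(log N), so the worst case above (everything scheduled for the future except one due job) costs one peek at the root instead of a full scan.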
I've already hacked support / tested most of the persistent queue modules, but we're using SQLite because it's easy to query so we can quickly examine the queue (even though TC is faster). I've even tested it with > 10K jobs and it seems to do fine...
But, one of the things I find most appealing about Gearman is that because it's so transparent and flexible you can do it any way you want... TIMTOWTDI eh? :)
-John
Putting future jobs into the gearmand ties them to that gearmand and to
the limitations of how you can persist jobs to it. In the case of
writebacks to the DB:
- New jobs go into the DB with a timestamp of when they get executed,
number of retries, etc.
- A special worker/client thing selects a chunk from the DB ordered by
timestamp, then marks some rows as locked for some time, locked forever,
or however you feel like implementing this.
- Client then re-injects jobs into gearmand, which can be any gearmand.
Workers retain a little extra code to note back to the DB on
failure/completion, or fire another job which does so.
Could implement that in a few other ways (worker with parallel
synchronous job requests, etc).
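The select-and-lock step could be sketched like this (column names and the lock scheme are made up; the Hashes stand in for DB rows):

```ruby
# Sketch of the claim step: pick due, unlocked jobs ordered by
# run_at, then lock them for a while so other readers skip them.
# In SQL this is roughly:
#   SELECT ... WHERE run_at <= now AND locked_until < now
#   ORDER BY run_at LIMIT n
# followed by an UPDATE of locked_until on the chosen rows.
def claim_due(rows, now, limit, lock_for)
  due = rows.select { |r| r[:run_at] <= now && r[:locked_until] < now }
            .sort_by { |r| r[:run_at] }
            .first(limit)
  # Lock the claimed rows; the caller then re-injects each job
  # into any gearmand and clears or deletes the row on completion.
  due.each { |r| r[:locked_until] = now + lock_for }
  due
end
```

Because the claim is just a timed lock, a reader that dies mid-batch simply lets its rows come due again once `locked_until` passes.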
This has some key benefits for us. As a distributed system, jobs that come
in can go out anywhere. You can easily balance jobs across gearmands as
things scale up/down. Future scheduled jobs also don't take up RAM in
gearmand and can have notes easily customized per app.
Adding more DBs and randomly choosing which one each new job goes into
scales up persistence. Each DB only really needs one reader, so they'll
scale independently of each other for writes.
SQL lets us easily inspect future job queues. Apps can modify the jobs,
cancel them, etc. I don't think we even specified a packet for cancelling
a future-specced job.
Otherwise we end up needing to build a lot more into the gearmand to get
something similar to this. Seems like an easy way to add too much code to
the wrong place to me. I'd tear all my hair out if I ever had to add a
daemon to a cluster but then had to write special tools and patch the
daemon to even out memory usage.
-Dormando