Best way to enqueue multiple jobs that depend on each other?

Brice Stacey

May 28, 2012, 9:46:05 PM
to boston-r...@googlegroups.com
Howdy -

I am trying to figure out the best way to enqueue multiple jobs that depend on other jobs completing first. For context, I'm uploading a file, parsing it (first job), and then running a map/reduce over it (second job). I'm using MongoDB (Mongoid) with Redis (Resque).

I have a couple ideas:

1) One long job (obviously bad design)
2) At the end of each job, have it enqueue the next job (this feels wrong, like they're too coupled)
3) Set up an observer for when parsing is complete, say when parsed goes from false to true (I'm leaning toward this, but worried about unexpected consequences)
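A minimal sketch of option 2 (each job enqueues its successor), assuming the usual Resque shape: job classes with a @queue and a self.perform, enqueued via Resque.enqueue(JobClass, *args). FakeResque, LOG, ParseUpload and MapReduceUpload are hypothetical names; an in-memory array stands in for Redis so the example runs without a server.

```ruby
# Hypothetical in-memory stand-in for Resque: real Resque jobs are classes
# with a @queue ivar and a self.perform method, enqueued via
# Resque.enqueue(JobClass, *args). Here a plain array plays the role of Redis.
module FakeResque
  QUEUE = []

  def self.enqueue(klass, *args)
    QUEUE << [klass, args]
  end

  # Run jobs in FIFO order until the queue is empty, like a single worker.
  def self.work_off
    until QUEUE.empty?
      klass, args = QUEUE.shift
      klass.perform(*args)
    end
  end
end

LOG = []

class ParseUpload
  @queue = :uploads

  def self.perform(upload_id)
    LOG << "parsed #{upload_id}"
    # Option 2: the last step of this job enqueues its successor.
    FakeResque.enqueue(MapReduceUpload, upload_id)
  end
end

class MapReduceUpload
  @queue = :uploads

  def self.perform(upload_id)
    LOG << "map/reduced #{upload_id}"
  end
end

FakeResque.enqueue(ParseUpload, 42)
FakeResque.work_off
p LOG # => ["parsed 42", "map/reduced 42"]
```

The coupling Brice worries about is visible here: ParseUpload names MapReduceUpload directly, so adding a third step means editing the first job.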

Would love to hear people's ideas.

Brice

Maurício Linhares

May 28, 2012, 9:49:43 PM
to boston-r...@googlegroups.com
Go with 2, but have something other than the job itself do the enqueuing.

I have something similar today. Once the first job finishes, it pings the web app back, and the web app decides which job should run next and enqueues it on Resque.
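A rough sketch of this "report back to a coordinator" pattern, assuming the coordinator is just a method call rather than an HTTP endpoint. Coordinator, NEXT_STEP, job_finished and the rows field are hypothetical; QUEUE stands in for Resque.

```ruby
# Hypothetical sketch: a job only reports that it finished; a separate
# coordinator (the web app in this setup, a plain module here) decides
# what runs next and enqueues it.
QUEUE = []

module Coordinator
  # Routing table: which job follows which. The lambdas are where any
  # "logic when deciding the next step" would live.
  NEXT_STEP = {
    parse:      ->(result) { result[:rows].zero? ? nil : :map_reduce },
    map_reduce: ->(_result) { nil } # end of the pipeline
  }.freeze

  def self.job_finished(job, upload_id, result)
    nxt = NEXT_STEP.fetch(job).call(result)
    QUEUE << [nxt, upload_id] if nxt
    nxt
  end
end

# A parse that found rows leads to map/reduce; an empty file stops there.
Coordinator.job_finished(:parse, 1, { rows: 10 })
Coordinator.job_finished(:parse, 2, { rows: 0 })
p QUEUE # => [[:map_reduce, 1]]
```

Keeping the routing table in one place means the jobs never reference each other; the trade-off is an extra round trip after every job.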
--
You received this message because you are subscribed to the Boston Ruby Group mailing list
To post to this group, send email to boston-r...@googlegroups.com
To unsubscribe from this group, send email to boston-rubygro...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/boston-rubygroup

Brice Stacey

May 29, 2012, 9:53:09 PM
to boston-r...@googlegroups.com
I actually already implemented 2, but instead of a web service I just enqueue the next job from the previous one. I wasn't sure if it's the best choice, but I guess it seems good enough.

Thanks,

Brice

Maurício Linhares

May 29, 2012, 9:57:49 PM
to boston-r...@googlegroups.com
If there is no logic in there to select what the next job is, I think it's fine.

In my case there is some logic involved in deciding what the next step is once the first job is done.

Ben Tucker

May 29, 2012, 11:48:15 PM
to boston-r...@googlegroups.com
Not saying this is necessarily the best pattern, but I had a similar need on a project a couple years ago.  What we did was to enqueue both jobs at the onset, but job #2 started with a dependency check and if it failed the job was re-queued with an exponential backoff. This worked well and I liked that each job was only responsible for itself, but it's certainly not as efficient as having #1 directly or indirectly enqueue #2 upon completion.
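A minimal sketch of enqueue-both-then-check with exponential backoff. Core Resque has no delayed enqueue, so SCHEDULED stands in for a delay queue (a plugin such as resque-scheduler, with Resque.enqueue_in, would fill that role in practice). BASE_DELAY, MAX_DELAY, MAX_ATTEMPTS, backoff_delay and map_reduce_job are hypothetical names and values.

```ruby
# Hypothetical sketch: job #2 checks its dependency first and, if it is not
# met, reschedules itself with an exponentially growing delay.
SCHEDULED    = [] # [delay_in_seconds, job_args] pairs
BASE_DELAY   = 5  # seconds; assumed value
MAX_DELAY    = 300 # cap so retries don't drift out indefinitely
MAX_ATTEMPTS = 10  # give up (and surface an error) eventually

def backoff_delay(attempt)
  [BASE_DELAY * (2**attempt), MAX_DELAY].min
end

# Returns :done when the real work ran, :rescheduled otherwise.
def map_reduce_job(upload, attempt = 0)
  unless upload[:parsed]
    raise "dependency never satisfied for #{upload[:id]}" if attempt >= MAX_ATTEMPTS
    SCHEDULED << [backoff_delay(attempt), [upload, attempt + 1]]
    return :rescheduled
  end
  upload[:reduced] = true # the actual map/reduce would go here
  :done
end

upload = { id: 7, parsed: false }
map_reduce_job(upload)    # => :rescheduled, retry in 5s
map_reduce_job(upload, 1) # => :rescheduled, retry in 10s
upload[:parsed] = true
map_reduce_job(upload, 2) # => :done
p SCHEDULED.map(&:first)  # => [5, 10]
```

The attempt cap matters: without it, a parse job that failed permanently would leave job #2 polling forever.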

Ben



Jacob Burkhart

May 30, 2012, 1:07:17 PM
to boston-r...@googlegroups.com
I am interested in this problem.

In fact I wrote some code: https://github.com/jacobo/resque-delegation

In fact, look at my crazy DSL for generating job dependencies in the
making of a sandwich:
https://github.com/jacobo/resque-delegation/blob/master/spec/delegation_spec.rb

It could work.

BUT, I've come to decide it's best to keep state in models and in your
database, where it is easier to diagnose what went wrong (as opposed
to having that state in redis).

So make a model for your task, and have the model change state as parts
of it are completed. And have it enqueue jobs as it moves along its
states. Simple. Effective.
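A sketch of the "model owns the state" idea, assuming a plain Ruby object in place of a persisted document. In a real app the state would be a stored field (for example on a Mongoid document), so a stuck upload shows up as a record sitting in one state. Upload, TRANSITIONS, advance! and the job names are hypothetical; ENQUEUED stands in for Resque.

```ruby
# Hypothetical sketch: the model holds the workflow state and enqueues the
# next job each time it transitions.
ENQUEUED = []

class Upload
  # Each state maps to [next_state, job that moves the upload there].
  TRANSITIONS = {
    uploaded: [:parsed, :parse_job],
    parsed:   [:reduced, :map_reduce_job],
    reduced:  nil # terminal state
  }.freeze

  attr_reader :id, :state

  def initialize(id)
    @id = id
    @state = :uploaded
    enqueue_next
  end

  # Called by a job when it finishes its step; refuses out-of-order calls.
  def advance!(expected_state)
    raise "#{id}: expected #{expected_state}, was #{state}" unless state == expected_state
    @state = TRANSITIONS.fetch(state).first
    enqueue_next
  end

  private

  def enqueue_next
    step = TRANSITIONS.fetch(state)
    ENQUEUED << [step.last, id] if step
  end
end

u = Upload.new(3)     # enqueues [:parse_job, 3]
u.advance!(:uploaded) # parse finished: enqueues [:map_reduce_job, 3]
u.advance!(:parsed)   # map/reduce finished: terminal, nothing enqueued
p u.state             # => :reduced
p ENQUEUED            # => [[:parse_job, 3], [:map_reduce_job, 3]]
```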

Luke

May 30, 2012, 9:17:31 PM
to boston-r...@googlegroups.com
My thoughts, in my favorite style (bullets for life):

  • if job B depends on A's completion before it can start, then job B should handle enqueuing itself after A
  • downstream job (B) being aware of its predecessors (A) seems more likely than the other way around
    • for example, a job to generate an animated weather map knows that the job cleaning up radar data needs to run first (it knows its inputs) (B knows about A)
    • the job to gather radar data doesn't necessarily know what its outputs are being used for (A doesn't know about B)
  • To implement the system above, where B knows about A, have your calling code enqueue B.  B starts with a dependency check.  If B's dependencies aren't completed, it enqueues A and then re-enqueues itself.
  • Rake isn't a queue, but it is dependency-based.  Think of it like
    • task :render_weather_map => :scrub_radar_data
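A small sketch of "B knows about A" in the spirit of the Rake line above: each job declares only its inputs, and asking for a job yields its unfinished prerequisites first, in run order. DEPENDS_ON, plan and fetch_radar_data are hypothetical names, and this sketch does no cycle detection.

```ruby
# Hypothetical sketch: downstream jobs declare their inputs; upstream jobs
# know nothing about who consumes their output.
DEPENDS_ON = {
  render_weather_map: [:scrub_radar_data],
  scrub_radar_data:   [:fetch_radar_data],
  fetch_radar_data:   []
}.freeze

# Depth-first walk listing jobs in the order they must run, skipping
# anything already complete and visiting each job once.
def plan(job, done, order = [])
  return order if done.include?(job) || order.include?(job)
  DEPENDS_ON.fetch(job).each { |dep| plan(dep, done, order) }
  order << job
end

p plan(:render_weather_map, [])
# => [:fetch_radar_data, :scrub_radar_data, :render_weather_map]
p plan(:render_weather_map, [:fetch_radar_data])
# => [:scrub_radar_data, :render_weather_map]
```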

Luke

May 30, 2012, 9:18:04 PM
to boston-r...@googlegroups.com
... and of course I hit <Tab>:w<CR> while typing that email and it sent.  Oops.  (Tab is mapped to Esc in insert mode)

Luke

May 30, 2012, 9:24:50 PM
to boston-r...@googlegroups.com
In conclusion, just have each job check that its dependencies are complete. You should be able to tell from the state of the data, I would hope: a dependency always shows up in the state of *some* data somewhere.

In the weather example, it might look like:

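# define_job and enqueue are a hypothetical job DSL
define_job(:generate_weather_map) do
  period = weather_data_source.current_period
  # Dependency check: has the scrub job finished for this period?
  unless period.scrubbed_at.present?
    enqueue(:scrub_weather_data, weather_data_period_id: period.id)
    # Re-enqueue ourselves behind the scrub job
    enqueue(:generate_weather_map, weather_data_period_id: period.id)
    next false # `next`, not `return`, exits a block
  end
  weather_data_source.map = generate_weather_map(period) # hypothetical rendering helper
  true
end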
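A self-contained simulation of the check above, assuming a single FIFO queue (an array) and a Struct in place of the real weather data models. Period, PERIOD, JOBS and RAN are hypothetical names. It shows the intended ordering: the caller enqueues only the map job, which pulls its prerequisite in ahead of itself.

```ruby
# Hypothetical simulation of the dependency check with a FIFO queue.
Period = Struct.new(:id, :scrubbed_at, :map)

PERIOD = Period.new(1, nil, nil)
QUEUE  = []
RAN    = []

JOBS = {
  scrub_weather_data: ->(_args) {
    PERIOD.scrubbed_at = Time.now
    true
  },
  generate_weather_map: ->(args) {
    # Dependency check: is the input data scrubbed yet?
    if PERIOD.scrubbed_at.nil?
      QUEUE << [:scrub_weather_data, args]   # enqueue the prerequisite...
      QUEUE << [:generate_weather_map, args] # ...then ourselves behind it
      next false
    end
    PERIOD.map = "map for period #{args[:weather_data_period_id]}"
    true
  }
}.freeze

# The caller only enqueues the job it actually wants.
QUEUE << [:generate_weather_map, { weather_data_period_id: PERIOD.id }]
until QUEUE.empty?
  name, args = QUEUE.shift
  RAN << [name, JOBS.fetch(name).call(args)]
end
p RAN
# => [[:generate_weather_map, false], [:scrub_weather_data, true], [:generate_weather_map, true]]
```

With several workers pulling from the same queue, the re-enqueued map job could start before the scrub finishes; it would then simply fail the check again, which is why the check has to be idempotent.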

Joey

May 31, 2012, 9:38:35 PM
to Boston Ruby Group
In practice, I believe option (2) is not as fragile/coupled as you
mention. I.e., the later jobs are triggered by the completion of the
earlier jobs. (Job #1 doesn't need to directly trigger job #2. Rather,
the shared model [a file/upload in your case] would trigger the
appropriate processing once the preconditions are satisfied.)
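A sketch of the shared model triggering the next step, which is also roughly Brice's option 3: job #1 only flips parsed, and the model enqueues map/reduce on the false-to-true transition. UploadedFile and ENQUEUED are hypothetical; a plain setter stands in for what, with Mongoid, would more likely be a save callback combined with its dirty tracking.

```ruby
# Hypothetical sketch: the model, not the job, decides that downstream
# work should start once its precondition holds.
ENQUEUED = []

class UploadedFile
  attr_reader :id, :parsed

  def initialize(id)
    @id = id
    @parsed = false
  end

  def parsed=(value)
    was = @parsed
    @parsed = value
    # Fire only on the false -> true transition, so re-saving an
    # already-parsed file does not enqueue a duplicate job.
    ENQUEUED << [:map_reduce, id] if !was && value
  end
end

f = UploadedFile.new(9)
f.parsed = true # transition: enqueues [:map_reduce, 9]
f.parsed = true # no transition: nothing new
p ENQUEUED      # => [[:map_reduce, 9]]
```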

Also, in appropriate cases we've used this gem--very useful when
generating a batch of jobs that _can_ be parallelized, along with some
clean-up/finalization tasks that depend on the rest being complete:
https://github.com/aaw/resque-multi-step
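A sketch of the fan-out/fan-in shape described above, where a finalization step waits for a batch of parallel jobs. This is not resque-multi-step's API; Batch and job_finished are hypothetical. Each job decrements a shared counter and whichever finishes last enqueues the finalizer. Across real workers the counter must be atomic (for example a Redis DECR); a Mutex-guarded integer plays that role in this single process.

```ruby
# Hypothetical fan-in sketch: the last job to finish triggers finalization.
class Batch
  attr_reader :enqueued

  def initialize(size)
    @remaining = size
    @lock = Mutex.new
    @enqueued = []
  end

  # Called at the end of each parallel job; true only for the last one.
  def job_finished
    last = @lock.synchronize { (@remaining -= 1).zero? }
    @enqueued << [:finalize] if last
    last
  end
end

batch = Batch.new(3)
threads = (1..3).map { Thread.new { batch.job_finished } }
results = threads.map(&:value)
p results.count(true) # => 1 (exactly one job triggers finalization)
p batch.enqueued      # => [[:finalize]]
```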


