[delayed_job] Avoid Duplicate Jobs in Queue


Jim Mulholland

May 20, 2010, 7:34:18 PM
to delayed_job
Is there a simple way to avoid adding duplicate jobs to a Delayed Job
queue?

I added an "enqueue" monkey patch in an initializer that wrapped
"unless self.find_by_handler(object.to_yaml" around the create
statement which works but seems rather hacky. Is there a better way?

Here is my monkey patch:


module Delayed
  module Backend
    module Base
      module ClassMethods
        # Add a job to the queue
        def enqueue(*args)
          object = args.shift
          unless object.respond_to?(:perform)
            raise ArgumentError, 'Cannot enqueue items which do not respond to perform'
          end

          priority = args.first || 0
          run_at   = args[1]

          # Do not enqueue duplicate jobs
          unless self.find_by_handler(object.to_yaml)
            self.create(:payload_object => object,
                        :priority       => priority.to_i,
                        :run_at         => run_at)
          end
        end
      end
    end
  end
end

Brandon Keepers

May 21, 2010, 9:12:01 AM
to delay...@googlegroups.com
DelayedJob does not currently have anything built in to do this. Can you explain your use case?

Anytime I've needed something like this, it made more sense to push it into the application.

class MyModel
  def method_that_gets_delayed
    if state == 'pending'
      update_attribute :state, 'processing'
      # do work
      update_attribute :state, 'complete'
    end
  end
end

Now if the job gets queued multiple times for some reason, it will only do the processing if the work still needs to be done.

=b

Jim Mulholland

May 21, 2010, 5:40:06 PM
to delayed_job
Hi Brandon,

Thanks for the reply. I think our use case is fairly unusual.

In short, we are using DJ as an as-needed cron system. We host
several sites that fetch Tweets from Twitter. Instead of setting up a
cron job to run every x minutes, we add a fetch-tweets job to our
queue every time a user hits one of our sites, assuming one is not
already there. This saves us from constantly fetching tweets for
low-traffic sites, which cuts down on server load and on Twitter API
hits.
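
In code, the check at enqueue time looks roughly like this
(simplified; FetchTweetsJob is a stand-in for our real job class, and
find_by_handler is the same lookup the monkey patch above uses):

# Simplified stand-in for our real fetch-tweets job.
FetchTweetsJob = Struct.new(:site_id) do
  def perform
    # fetch and store tweets for this site here
  end
end

# On each request (e.g. in a before_filter), enqueue the job only
# if an identical one is not already waiting in the queue.
def enqueue_tweet_fetch(site)
  job = FetchTweetsJob.new(site.id)
  Delayed::Job.enqueue(job) unless Delayed::Job.find_by_handler(job.to_yaml)
end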

I'll stick with what we have for now.

Thanks again!

Daniel Morrison

May 24, 2010, 5:10:45 AM
to delay...@googlegroups.com
I think your case is fairly common, but I'd suggest you try Brandon's solution.

In one situation, for example, I have to expire a record 15 minutes after it was last edited. So every time the record gets changed I schedule an 'expire' job. At any time, I might have a handful of these jobs in the queue.

Then, my expire method looks like:

def expire
  do_the_expiration if updated_at < 15.minutes.ago
end
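
The scheduling side looks roughly like this (simplified; Record and
schedule_expire are illustrative names, using delayed_job's delay
proxy with a :run_at option):

class Record < ActiveRecord::Base
  after_save :schedule_expire

  # Queue an expire check to run 15 minutes from now on every save.
  def schedule_expire
    delay(:run_at => 15.minutes.from_now).expire
  end
end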

That way, if there are old jobs around, they simply become no-ops, and I can safely ignore them.

I use this pattern quite a bit, and recommend it. Maybe you can check when you last fetched, and only do a new one if enough time has passed.

Cheers,

-Daniel

david rawk

May 1, 2014, 3:41:38 AM
to delay...@googlegroups.com, dan...@collectiveidea.com
The only problem I see with the status approach is that the state is stored: if something goes wrong and a worker dies mid-job, what moves the status on from 'processing'? The record stays stuck and the work never runs again.
The monkey patch method avoids the duplication.
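
That said, one way to guard against the dead-worker case while
keeping the status approach (a rough sketch building on Brandon's
example; the one-hour threshold is an arbitrary choice):

class MyModel < ActiveRecord::Base
  STALE_AFTER = 1.hour

  def method_that_gets_delayed
    if state == 'pending' || stale_processing?
      update_attribute :state, 'processing'
      # do work
      update_attribute :state, 'complete'
    end
  end

  # A record stuck in 'processing' past the threshold is assumed to
  # belong to a dead worker and is eligible to be picked up again.
  def stale_processing?
    state == 'processing' && updated_at < STALE_AFTER.ago
  end
end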

PS: Thanks for the code. I know it's not your goal to hand out a how-to when you're looking for a better way, but like you I'll use it until I come up with something better.