Gearman queue handling

Josias

May 5, 2020, 7:00:15 PM

to Gearman

Hey!
We currently use Gearman with workers to distribute the processing of videos. My problem is that when one user submits a big batch of videos, that batch takes over the queue.
So I'd like a better queueing setup. Gearman is fine and does the job as far as I know, but I've been advised to use Kafka or RabbitMQ for this.
Since some batches can be large, we can't fix this by allocating just one worker per user. We need to distribute the work across whichever workers are available at any given time, without one user blocking the others.

What do you guys recommend?

Thanks in advance for your help!

dormando

May 5, 2020, 7:09:39 PM

to Gearman

There are a couple of job priorities. It might be simplest to just submit
batch jobs with the lower priority; then realtime jobs still get gobbled
off the top.

Depending on how your workers are set up you could also register multiple
functions, with one for realtime and one for batches. If workers then
wait on both functions they can consciously take N jobs from batch
before they stop listening on batch and only do some realtime work.
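
A minimal sketch of the two-function idea, in Python, simulating a single worker rather than talking to gearmand. The two lists stand in for two registered functions; the quota parameters and the job names are hypothetical, and a real worker would switch between functions through its Gearman client library instead of popping from local queues.

```python
from collections import deque


def process_order(realtime, batch, realtime_quota=1, batch_quota=2):
    """Return the order in which one worker would run the jobs.

    The worker takes up to `realtime_quota` realtime jobs, then up to
    `batch_quota` batch jobs, and repeats until both queues are empty.
    If one queue is empty, the worker simply keeps draining the other.
    """
    realtime, batch = deque(realtime), deque(batch)
    order = []
    while realtime or batch:
        # Realtime turn: batch is ignored while this loop runs.
        for _ in range(realtime_quota):
            if not realtime:
                break
            order.append(realtime.popleft())
        # Batch turn: capped at batch_quota so batch cannot hog the worker.
        for _ in range(batch_quota):
            if not batch:
                break
            order.append(batch.popleft())
    return order


order = process_order(["r1", "r2", "r3"], ["b1", "b2", "b3", "b4", "b5"])
print(order)
```

With these quotas a large batch still makes steady progress, but a realtime job never waits behind more than two batch jobs on the same worker.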

Clint Byrum

May 5, 2020, 8:46:06 PM

to gea...@googlegroups.com

Howdy!

As dormando said, if you have a clear distinction between "need ASAP" and "when we get to it", you can just run ASAP jobs as HIGH priority and batch jobs as LOW, and monitor both queues, ensuring you always have some extra resources to get to those LOW jobs. Beware, though: these are *preemptive* priorities, which means *all* HIGH jobs will be sent to workers before any NORMAL or LOW jobs are.
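
To make the warning concrete, here is a small Python model of a strict-priority queue. It is a sketch of the behaviour described above, not gearmand's implementation; the class name, method names, and job labels are invented for illustration.

```python
import heapq
from itertools import count

# Lower number = served first. These constants are local to this sketch.
HIGH, NORMAL, LOW = 0, 1, 2


class PriorityJobQueue:
    """Strict-priority queue: FIFO within a priority, HIGH always first."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that preserves submission order

    def submit(self, job, priority=NORMAL):
        heapq.heappush(self._heap, (priority, next(self._seq), job))

    def grab(self):
        """Return the next job a worker would receive, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


q = PriorityJobQueue()
for job, prio in [("batch-1", LOW), ("batch-2", LOW), ("live-1", HIGH),
                  ("batch-3", LOW), ("live-2", HIGH)]:
    q.submit(job, prio)

served = [q.grab() for _ in range(5)]
print(served)  # ['live-1', 'live-2', 'batch-1', 'batch-2', 'batch-3']
```

Both HIGH jobs jump ahead of every LOW job regardless of submission order. If HIGH jobs keep arriving faster than workers can clear them, the LOW jobs in this model never run, which is why spare capacity for the low queue matters.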

However, if you don't actually have a clear and easy way to make that distinction at submit time, then the problem is not gearmand. RabbitMQ and Kafka are going to behave the same way if applied in the same simple manner. The simple mechanics of a single FIFO queue are your problem.

You may need a Quality of Service algorithm.

https://en.wikipedia.org/wiki/Network_scheduler has a bunch of them listed. Token bucket filter might work. Basically wherever you're submitting jobs directly to workers, you'll want to precede that with a QoS check of some kind. Some algorithms will buffer and delay, some will drop/reject. You have to figure out what works for your code base and economics.
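
A minimal token bucket sketch in Python, keyed per user, as one possible shape for such a QoS check. The class names, the rate and capacity values, and the choice to reject rather than buffer are all assumptions made for illustration; the clock is injectable only so the example can be run deterministically.

```python
import time


class TokenBucket:
    """Allows bursts of up to `capacity` jobs, refilled at `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so a first burst passes
        self.clock = clock
        self.last = clock()

    def allow(self, cost=1):
        now = self.clock()
        # Refill according to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerUserLimiter:
    """One bucket per user, so one user's burst does not drain another's."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self._args = (rate, capacity, clock)
        self._buckets = {}

    def allow(self, user_id):
        bucket = self._buckets.get(user_id)
        if bucket is None:
            bucket = self._buckets[user_id] = TokenBucket(*self._args)
        return bucket.allow()


# Fake clock for a repeatable demonstration.
now = [0.0]
limiter = PerUserLimiter(rate=1.0, capacity=2, clock=lambda: now[0])
burst = [limiter.allow("alice") for _ in range(3)]
other = limiter.allow("bob")
now[0] = 1.0  # one second later, alice has earned one token back
later = limiter.allow("alice")
print(burst, other, later)
```

The submitting code would call `allow(user_id)` before handing a job to the queue. What happens on `False` (hold the job in a per-user backlog and retry, or reject it) is the buffer-versus-drop choice mentioned above.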

However, in either case, you might be being penny-wise and pound-foolish here. Cloud resources are cheap and virtually limitless; software development resources are the opposite. It's almost trivial to expand your worker pool elastically as demand rises and falls, but not so much to debug a scheduling algorithm.


Josias

Feb 11, 2022, 5:38:00 AM

to Gearman

Thank you both for your feedback! I just realized I didn't respond back then.
