newbie question about managing parallel video encoding through multiple job servers


Brice

Mar 31, 2011, 11:47:50 AM
to gearman
Hi,

Here is the situation: I developed a PHP application that runs ffmpeg
locally to encode videos. For each submitted video, I encode 3 new
versions of it (low/mid/high def). Submitted videos are handled one
after the other, so that each version of those videos is also encoded
one after the other.

Today, I have too many users, and at the end of the day people are
waiting too long for their videos to be handled by my single server.
So I decided to investigate the many possibilities offered by Gearman
^^

I have two hardware servers on which I want to install Gearman as job
servers. That would make 3 machines :
- one web server
- two encoding servers (on which I will install Gearman)

So far I have managed to set up workers on the two servers. Those two
workers (which have the same name) are set to run the proper encoding
function.

I have also managed to set up a Gearman client in PHP on the web server. I
can register the two job servers' IPs from that client. I ran a
successful test: when I run my background job, one of the two servers
runs it and the video is encoded (the video comes from a web server
partition that I mounted on both encoding servers).

My question is this: how does Gearman know which of the
job servers is the most available to run the job? In other words,
how can I be sure that the server with the most available CPU is
running the new job?

I am very concerned that one encoding server will receive all the jobs from
Gearman and run them more slowly than if Gearman dispatched jobs fairly
between all the encoding servers, so that each of them
uses about the same amount of CPU.

To sum up my question, does Gearman take CPU into consideration when a
client asks for a job to be done by multiple workers?

Thx for your answers!

-- sorry for my poor English, I did my best to explain myself
clearly ;-)

Brice

Brian Moon

Mar 31, 2011, 1:33:16 PM
to gea...@googlegroups.com, Brice
Well, this is handled sort of virtually. You see, what gearmand does
is send a wake-up to all the workers that know how to do job A. The
first one that responds OH, ME! ME! ME! gets the job. If a server is
bogged down with work, it is likely that it will not be the first to
respond to such a question. Have you observed something else? Beware of
premature optimization and worrying about things that are not to be worried
about.
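
The pull model described above can be sketched as a toy simulation. This is not Gearman code: `simulate_pull_dispatch`, the job costs, and the worker speeds are made-up names and numbers. It assumes every job is already queued and that the next job always goes to whichever worker becomes idle first.

```python
import heapq


def simulate_pull_dispatch(job_costs, worker_speeds):
    """Hand each queued job to whichever worker becomes idle first.

    job_costs: amount of work per job (arbitrary units).
    worker_speeds: worker name -> work units processed per second.
    Returns a dict of worker name -> number of jobs taken.
    """
    # Min-heap of (time at which the worker is idle again, worker name).
    idle_at = [(0.0, name) for name in sorted(worker_speeds)]
    heapq.heapify(idle_at)
    taken = {name: 0 for name in worker_speeds}

    for cost in job_costs:
        now, name = heapq.heappop(idle_at)  # earliest idle worker grabs the job
        taken[name] += 1
        busy_for = cost / worker_speeds[name]
        heapq.heappush(idle_at, (now + busy_for, name))
    return taken


# Hypothetical setup: 30 equal jobs, one worker twice as fast as the other.
counts = simulate_pull_dispatch([10.0] * 30, {"fast": 2.0, "slow": 1.0})
print(counts)
```

In this toy model the faster worker ends up taking more jobs without the dispatcher knowing anything about CPU, simply because it is idle more often.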

Brian.
http://brian.moonspot.net

shawn wilson

Mar 31, 2011, 1:43:28 PM
to gea...@googlegroups.com
On Thu, Mar 31, 2011 at 11:47 AM, Brice <brv...@gmail.com> wrote:
>
> To sum up my question, does Gearman take CPU into consideration when a
> client asks for a job to be done by multiple workers?
>

i think this is the wrong question. i would optimize your encoding so
that each encoding process takes up as much of the processor as
possible. then i'd see if i could check how many jobs each worker has
queued.

ps - that didn't answer your question, i just figured i'd give you
something different to think about.

Tim Uckun

Mar 31, 2011, 7:56:59 PM
to gea...@googlegroups.com
On Fri, Apr 1, 2011 at 6:33 AM, Brian Moon <br...@moonspot.net> wrote:
> Well, this is handled sort of virtually. You see, what gearmand does is
> send a wake-up to all the workers that know how to do job A. The first one
> that responds OH, ME! ME! ME! gets the job. If a server is bogged down with
> work, it is likely that it will not be the first to respond to such a question.
> Have you observed something else? Beware of premature optimization and worrying
> about things that are not to be worried about.


My solution has been to put more workers on the beefier machines and
fewer workers on the weaker machines. Works just fine.
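
One way to pick those per-machine worker counts is from the core count. This is only a sketch under assumptions: `workers_for_machine`, `threads_per_job`, and `reserved_cores` are hypothetical names, and it assumes each encoding job keeps a fixed number of cores busy.

```python
def workers_for_machine(cpu_cores, threads_per_job=1, reserved_cores=0):
    """Suggest how many worker processes to start on one machine.

    cpu_cores: cores available on the machine.
    threads_per_job: cores each encoding job is expected to keep busy.
    reserved_cores: cores left free for other services (assumption).
    Always returns at least 1 so the machine still takes some work.
    """
    usable = max(cpu_cores - reserved_cores, 1)
    return max(usable // threads_per_job, 1)


# Hypothetical machines: an 8-core box and a 2-core box.
print(workers_for_machine(8))  # beefier machine gets more workers
print(workers_for_machine(2))  # weaker machine gets fewer
```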

Artur Bodera

Apr 1, 2011, 5:38:54 AM
to gearman
On Mar 31, 4:47 pm, Brice <brv...@gmail.com> wrote:
> To sum up my question, does Gearman take CPU into consideration when a
> client asks for a job to be done by multiple workers?

No. It's worse than round-robin. First available worker gets the job,
regardless of anything else.

For targeting specific workers you can employ:
1. job checking (i.e. the worker receives job data and fails it
because the avg load is too high to start encoding).
2. worker name pre/suffixing (i.e. encodevideo becomes encodevideo_1
for first server, encodevideo_2 for second one)
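
Both ideas can be sketched in a few lines. The helper names and the threshold below are assumptions, not part of any Gearman library, and what happens to a job the worker refuses (retry, resubmit) is left to the caller. `os.getloadavg()` is only available on Unix-like systems.

```python
import os


def should_accept_job(load_1min, cpu_cores, max_load_per_core=1.0):
    """Option 1: accept a job only if the 1-minute load per core is below a limit.

    max_load_per_core is an arbitrary threshold chosen for illustration.
    """
    return load_1min / cpu_cores < max_load_per_core


def function_name_for_server(base_name, server_index):
    """Option 2: build a per-server function name such as encodevideo_1."""
    return f"{base_name}_{server_index}"


def this_machine_can_accept():
    """Check the local machine (Unix only, because of os.getloadavg)."""
    load_1min, _, _ = os.getloadavg()
    cores = os.cpu_count() or 1
    return should_accept_job(load_1min, cores)


print(function_name_for_server("encodevideo", 1))
print(function_name_for_server("encodevideo", 2))
```

A worker would call `this_machine_can_accept()` before starting the encode and fail the job when it returns False.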

Also, I personally use:
nice -n 19 /path/to/ffmpeg -threads 1 -y ....

This gives the ffmpeg process the lowest scheduling priority (so the
server can still perform other foreground work) and forces ffmpeg to use
only 1 thread (so it will use only 1 CPU core on an 8-core machine, i.e.
you can have a total of 8 independent encoding workers running there).
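
From a worker script, the same command line could be assembled like this. It is a sketch: `build_encode_command` and `remaining_args` are hypothetical names, and the elided part of the command above stays up to the caller. Note that ffmpeg applies options to the next file named on the command line, so where `-threads` sits relative to the input and output can matter; this sketch just mirrors the order in the post.

```python
def build_encode_command(ffmpeg_path, remaining_args, threads=1, niceness=19):
    """Return an argument list equivalent to:

        nice -n 19 /path/to/ffmpeg -threads 1 -y <remaining_args...>

    remaining_args stands for whatever input/output options follow
    in the real command; nothing is assumed about them here.
    """
    return [
        "nice", "-n", str(niceness),
        ffmpeg_path,
        "-threads", str(threads),
        "-y",
        *remaining_args,
    ]


# Hypothetical file names, for illustration only.
cmd = build_encode_command("/path/to/ffmpeg", ["-i", "in.mov", "out.mp4"])
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` so that no shell quoting is needed.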

Arthur

isaiah van der Elst

Apr 1, 2011, 11:56:17 AM
to gea...@googlegroups.com
>No. It's worse than round-robin.

This is incorrect. The round-robin algorithm forces jobs onto workers, even if they are not ready for them. Gearman will only send a job to workers that have signaled they are ready to work.

Gearman does not take the worker's CPU into consideration when distributing work. However, the workers with better CPUs should complete their jobs faster and therefore grab jobs more often.

If the server is favoring a worker, it's most likely due to the RTT. When a job becomes available, the server will notify all waiting workers. The first worker to grab the job from the server gets it. So if your system is favoring the slower machines, it's likely due to the slower machines having a better RTT.
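
The RTT effect can be illustrated with a deliberately simplified model. `first_to_grab`, the worker names, and the timings are invented; it assumes symmetric network latency and that every listed worker is idle and waiting when the job arrives.

```python
def first_to_grab(workers):
    """Return the worker whose grab request reaches the server first.

    workers: name -> (rtt_ms, reaction_ms)
    The wake-up takes rtt/2 to arrive, the worker needs reaction_ms to
    respond, and the grab takes another rtt/2 to travel back.
    """
    def arrival_ms(name):
        rtt_ms, reaction_ms = workers[name]
        return rtt_ms + reaction_ms

    return min(workers, key=arrival_ms)


# Hypothetical numbers: a fast machine far away vs. a slow machine nearby.
winner = first_to_grab({
    "fast_cpu_far": (40.0, 1.0),
    "slow_cpu_near": (2.0, 5.0),
})
print(winner)
```

With these made-up numbers the nearby, slower machine wins the race, which matches the explanation above.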

Brice

Apr 5, 2011, 12:56:14 PM
to gearman
Thank you guys for your answers, that was very useful :D