Excerpts from Edwin Fuquen's message of 2012-10-03 09:17:19 -0700:
Right, querying for all the handles would not scale well, since gearmand
would have to lock the structure, make a copy, and then slam it all out
the network port. Gearmand is kept simple to reduce the impact and
chance of failure. The more you can push off to workers and clients, the
more distributed and scalable your system becomes.
For this problem, you can run lots of workers that keep track; there is
no need to persist anything, unless you really are super paranoid about
losing track.
In pseudo php:
$statusObject = new AggregateStatusThing('memcached_server+key_or_something');

$client = new GearmanClient();
$client->addServer();
$worker = new GearmanWorker();
$worker->addServer();

// Record progress as status packets come back for the real jobs.
$client->setStatusCallback(function ($task) use ($statusObject) {
    $statusObject->updateReport($task->jobHandle(),
                                $task->taskNumerator(),
                                $task->taskDenominator());
});

// Drop the handle from the report once its job completes.
$client->setCompleteCallback(function ($task) use ($statusObject) {
    $statusObject->finishJob($task->jobHandle());
});

// The tracker just re-submits the payload to the real transcode
// function and then watches it on the client side.
$worker->addFunction('transcode', function ($job) use ($client) {
    $client->addTask('_real_transcode', $job->workload());
});

// Better to use non-blocking mode here (GEARMAN_WORKER_NON_BLOCKING via
// addOptions()) so everything can run at one time, but I think you get
// the point.
while ($worker->work()) {
    $client->runTasks();
}
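
AggregateStatusThing is just a placeholder. If you wanted one, a minimal
sketch could stash the per-handle progress in memcached under a single
shared key (the class, the key layout, and the Memcached calls here are
my own illustration, not anything Gearman ships with):

// Hypothetical helper: a shared per-handle progress map in memcached,
// so any number of tracker workers and readers see the same view.
class AggregateStatusThing
{
    private $memcached;
    private $key;

    public function __construct($spec)
    {
        // $spec looks like 'memcached_server+key_or_something'
        list($server, $this->key) = explode('+', $spec, 2);
        $this->memcached = new Memcached();
        $this->memcached->addServer($server, 11211);
    }

    public function updateReport($handle, $numerator, $denominator)
    {
        $report = $this->memcached->get($this->key) ?: array();
        $report[$handle] = array('done' => $numerator, 'total' => $denominator);
        // Racy across many trackers; real code would use cas() or one
        // key per handle.
        $this->memcached->set($this->key, $report);
    }

    public function finishJob($handle)
    {
        $report = $this->memcached->get($this->key) ?: array();
        unset($report[$handle]);
        $this->memcached->set($this->key, $report);
    }
}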
Worst case, one of these workers dies and you lose track of the work it
was tracking (the work will still run, you just won't have insight into
completion). You can mitigate the impact of these failures by running
many of these tracker workers.
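And because the aggregate lives in memcached rather than inside
gearmand, anything else (a status page, a cron job) can read it back
without ever touching the job server. With the hypothetical key from
above, that is just:

$m = new Memcached();
$m->addServer('memcached_server', 11211);
print_r($m->get('key_or_something'));
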
>
> Has anyone else had this use case? Is there a better solution with
> the gearman-server? If the feature doesn't exist can we get it in
> there? It would seem like it would be something easy to add.
>
I'd rather see the above included as a helper worker to run in many
places than see it built into gearmand.