gearadmin --status / different stats for different servers


Aaron Lozier

May 3, 2017, 12:59:18 PM
to Gearman
I currently have three worker servers (as in, physical servers). The first runs 5 Gearman workers and the other two run 10 each, using supervisor to keep these processes running.

I am troubleshooting multiple problems with my background job processing, including:

1) Not all workers are being utilized. I generally see no more than 14 in use at one time, with other jobs remaining in the queued state.
2) High CPU utilization grinding my servers to a halt (granted, not really Gearman's problem).
3) Jobs stalling, or status not retrievable after many tries.
4) The job servers seem to be out of sync.

Since it is difficult to get at the root of these issues, I'm trying to focus first on one that is a bit confounding and could possibly be the root of the others.

On each worker server, if I run gearadmin --status, I get radically different results:

worker #1 (running 5 workers)

run_observer 68 5 5


worker #2 (running 10 workers)

run_observer 11 4 6


worker #3 (running 10 workers)

run_observer 65 10 10


So, depending on which server I ask, I have anywhere from 11 to 68 jobs queued, 4 to 10 jobs running (the total should be more like 14), and 21 workers available in all. Basically, none of these numbers make sense no matter how I interpret them; shouldn't the servers be in sync?


Am I completely misunderstanding how the worker servers are supposed to be set up in a multi-server environment?


As far as I can tell, jobs ARE being distributed to different workers. It's just that the utilization and stats seem far off from what I'm expecting.


The results of gearadmin --workers are similarly skewed. It should be noted, however, that when I run this command I DO see the private IP of all three worker servers listed, albeit with significantly varying distributions.
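Worth noting: each gearmand only reports its own queue, so cluster-wide totals have to be gathered by querying every server and summing. A rough sketch of that aggregation (Python, purely illustrative; assumes the standard `gearadmin --status` columns of function name, jobs queued, jobs running, workers registered):

```python
def parse_status(text):
    """Parse `gearadmin --status` output into {function: (queued, running, workers)}."""
    stats = {}
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) == 4:
            fn, queued, running, workers = parts
            stats[fn] = (int(queued), int(running), int(workers))
    return stats

def aggregate(per_server):
    """Sum stats for the same function across several servers' outputs."""
    totals = {}
    for stats in per_server:
        for fn, (q, r, w) in stats.items():
            tq, tr, tw = totals.get(fn, (0, 0, 0))
            totals[fn] = (tq + q, tr + r, tw + w)
    return totals

# The three per-server outputs quoted above:
outputs = ["run_observer 68 5 5", "run_observer 11 4 6", "run_observer 65 10 10"]
totals = aggregate(parse_status(o) for o in outputs)
print(totals)  # {'run_observer': (144, 19, 21)}
```

Summed this way, the three outputs become coherent cluster totals: 144 jobs queued, 19 running, and the 21 registered workers Aaron mentions.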




Clint Byrum

May 3, 2017, 2:04:01 PM
to gearman
Excerpts from Aaron Lozier's message of 2017-05-03 09:59:18 -0700:
I think you missed the point a little, but you're not alone, as over the
years gearman has unfortunately suffered from a deafening silence
resulting in misconceptions.

The idea is that all workers, and all clients, should be set up to contact
all gearmands. At scale, like if you have hundreds of gearmands or
workers, you'll probably want to limit that a bit, but you're not up at
100+ gearmands and 10000+ workers, so let's ignore that.

The reason for adding all the servers is that gearmand is designed to
distribute work at the edges. So clients should _also_ list all the
servers, or a portion of them at scale. If you've used memcached, this
should seem familiar.

Now, due to an oversight when libgearman was written [1], we make this
a lot less cool than it could be. In the original Perl system that
LiveJournal built, the unique ID of each job was hashed on the client
and sent to a server based on that hash. This allowed multiple clients
submitting the same job to the same cluster to coalesce on a single
running worker's output. This is particularly effective at thundering-herd
prevention when you use a slow database and a fast cache such as memcached
(which was also created by LiveJournal... go figure! ;)

But for background jobs, what you should be most interested in is just
making sure clients submit their jobs to a randomly chosen gearmand. For
that you can just randomize the list when adding servers.

[1] https://github.com/gearman/gearmand/issues/64
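The client-side hashing scheme described above (what the original Perl system did, not what libgearman currently does) can be sketched roughly like this; the server list and unique ID here are purely hypothetical:

```python
from zlib import crc32

# Hypothetical gearmand addresses; in practice this is the configured server list.
servers = ["10.0.0.1:4730", "10.0.0.2:4730", "10.0.0.3:4730"]

def pick_server(unique_id, servers):
    """Hash the job's unique ID so every client maps the same ID to the same
    gearmand, letting duplicate submissions coalesce onto one running job."""
    return servers[crc32(unique_id.encode()) % len(servers)]

# Any two clients submitting the same unique ID pick the same server:
assert pick_server("task_42", servers) == pick_server("task_42", servers)
```

For plain background jobs, by contrast, randomizing the server order (as Clint suggests) is enough.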

>
> As far as I can tell, jobs ARE being distributed to different workers. It's
> just that the utilization and stats seem far off from what I'm expecting.
>
>
> The results of gearadmin --workers are similarly skewed. It should be noted
> however, that when I run this command I DO see the private IP of all three
> worker servers listed, albeit with significantly varying distributions.
>

It sounds to me like you have 3 servers running both worker processes
and gearmand. I'm guessing you just have your workers using 127.0.0.1 as
the sole gearmand to contact. This is the mistake.

You should add all of the gearmand ips to all of your clients and
workers. You should randomize the order of that list on the clients,
so that each one selects a different gearmand as its primary job target
(they'll fail over to the others automatically if the first one fails).

Last note, for workers, they'll always connect to and wait for jobs on
all of the servers you add. This way wherever work is submitted, it will
be presented to all of the workers, and thus, those that are not busy
will get first crack, and you should see much better distribution of work.

Aaron Lozier

May 3, 2017, 3:29:13 PM
to Gearman

I really appreciate you taking the time to respond. Your description of the "deafening silence" surrounding Gearman is an apt one. I feel like I have read everything that has been written about Gearman, desperate to find some clarity. For better or for worse, I have invested a huge amount of time and energy into implementing Gearman in our application. It would be one thing if it was failing for me completely, but it's much more subtle than that. The actual situation I've been dealing with is that upon a full reboot of all worker servers, everything works flawlessly, for a while. It seems around the 2-3 day mark that cracks start appearing, and the symptoms are varied. I forgot to mention another problem: sometimes jobs sit in the queue for a very long time before beginning to run. Actually, that could be related to my #1 above.

I have read your response a number of times, and as far as I can tell, I am already set up the way you describe. I have a function which executes the client, and another that executes a worker. For simplicity, I felt it was best to only run the client on the first worker server, but I have multiple instances of the worker function running on the three servers as I described above, managed by supervisor.

My application is written in the Zend Framework with Doctrine ORM, for what it's worth.

Worker:

    /**
     * Start gearman worker
     */
    public function gearmanWorkerAction()
    {
        set_error_handler([$this, 'errHandle']);
        register_shutdown_function([$this, 'fatalErrorShutdownHandler']);

        $this->_start_version = filemtime(realpath(APPLICATION_PATH . DIRECTORY_SEPARATOR . '..'));
        $this->_worker_start_time = time();

        $this->_helper->layout->disableLayout();
        $this->_helper->viewRenderer->setNoRender(true);

        $worker = new \GearmanWorker();
        foreach ($this->_getWorkerServers() as $worker_server) {
            $worker->addServer($worker_server[0], $worker_server[1]);
        }
        $worker->addFunction("run_observer", array($this, "run_observer"));
        while ($worker->work());
    }

Client:

    /**
     * Queue any tasks that need to be run by gearman worker. This should be on a cron job.
     */
    public function gearmanClientAction()
    {                        
        $client = new GearmanClient();
        foreach($this->_getWorkerServers() as $worker_server){
            $client->addServer($worker_server[0],$worker_server[1]);
        }
         
        $this->_helper->layout->disableLayout();
        $this->_helper->viewRenderer->setNoRender(true);
       
        $this->_checkSubmittedTasks($client);
        /**
         * Make sure all scheduled jobs configured in .ini files are registered in the database
         */
        $this->_registerConfiguredGearmanTasks();

        $gearman_tasks = $this->getEntityManager()->getRepository('\Ia\Entity\GearmanTask')->getTasksToRun();
        if(count($gearman_tasks)>0){
            /**
             * Register callbacks
             */
            $client->setStatusCallback(array($this,'_gearmanTaskStatus'));
            $client->setDataCallback(array($this,'_gearmanTaskData'));
            $client->setCreatedCallback(array($this,'_gearmanTaskCreated'));
            $client->setCompleteCallback(array($this,'_gearmanTaskComplete'));
            $client->setExceptionCallback(array($this,'_gearmanTaskException'));
            $client->setFailCallback(array($this,'_gearmanTaskFail'));
            foreach($gearman_tasks as $gearman_task){
                if($gearman_task->priority == \Ia\Entity\GearmanTask::PRIORITY_HIGH
                    || APPLICATION_ENV == 'production'){
                    $gearman_task->setState(\Ia\Entity\GearmanTask::STATE_SUBMITTED);
                    $workload = serialize(array($gearman_task->resource,unserialize($gearman_task->params)));
                    $unique_id = ($gearman_task->type==\Ia\Entity\GearmanTask::TYPE_SCHEDULED) ? $gearman_task->id.'_'.$gearman_task->next_run : $gearman_task->id;
                    switch($gearman_task->priority){
                        case \Ia\Entity\GearmanTask::PRIORITY_HIGH:
                            $client->addTaskHigh("run_observer", $workload, null, $unique_id);
                            break;
                        case \Ia\Entity\GearmanTask::PRIORITY_LOW:
                            $client->addTaskLow("run_observer", $workload, null, $unique_id);
                            break;
                        default:
                            $client->addTask("run_observer", $workload, null, $unique_id);
                            break;
                    }
                    $this->getEntityManager()->persist($gearman_task);
                    $this->getEntityManager()->flush();
                }    
            }
            $client->runTasks();
        }
    }

Get Worker Servers:

    protected function _getWorkerServers()
    {
        if(count($this->_worker_servers)==0){
            if(\Ia\Config::get('gearman_worker/servers')){
                foreach(\Ia\Config::get('gearman_worker/servers') as $worker_server){
                    $parts = explode(':',$worker_server);
                    if(count($parts)==2){
                        $this->_worker_servers[] = $parts;
                    }
                }
            }
            if(count($this->_worker_servers)==0){
                $this->_worker_servers[] = ['127.0.0.1', 4730];
            }
        }
        shuffle($this->_worker_servers);
        return $this->_worker_servers;
    }

So as you can see, I am setting all worker servers in both the client and worker actions.

To recap:

1) I have gearmand running on all three servers (gearman-job-server).
2) worker1 has a cron executing gearmanClientAction() every minute. If it has jobs to run, it sends them to Gearman as shown above and remains running to monitor the progress of the jobs and to update my MySQL table, so I can watch the status in a UI I have written.
3) worker1 always keeps 5 instances of gearmanWorkerAction() open using supervisor, whereas workers 2 and 3 each have 10. I put fewer on worker1 with the thinking that since it's also handling the client, I want to keep more resources available on that server. Maybe unnecessary, or maybe overkill.

Does this raise any red flags that you can see?

Aaron

Aaron Lozier

May 3, 2017, 4:38:58 PM
to Gearman
Can you recommend a really good monitoring tool for Gearman? I'm beginning to wonder how many of my problems could be related to improper reporting of the actual status of jobs, i.e. what is running and what isn't. I'm still seeing odd numbers for how many workers are registered, available, etc. These do not line up with how many workers I *should* have registered, and I also can't account for the number of jobs it claims are running.

Clint Byrum

May 3, 2017, 5:45:21 PM
to gearman
Excerpts from Aaron Lozier's message of 2017-05-03 12:29:13 -0700:
> I really appreciate you taking the time to respond. Your description of the
> "deafening silence" surrounding Gearman is an apt one. I feel like I have
> read everything that has been written about Gearman, desperate to find some
> clarity. For better or for worse, I have invested a huge amount of time and
> energy into implementing Gearman into our application. It would be one
> thing if it was failing for me completely, but it's much more subtle than
> that. The actual situation I've been dealing with is that upon a full
> reboot of all worker servers, everything works flawlessly - for awhile. It
> seems around the 2-3 day mark that cracks start appearing. And the symptoms
> are varied. I forgot to mention another problem: sometimes workers sit in
> the queue for a very long time before beginning to run. Actually that could
> be related to my #1 above.
>

Indeed. We're trying to bring it back, slowly. Great to have you here
and interested. :)

> I have read your response a number of times, and as far as I can tell, I am
> already set up the way you describe. I have a function which executes the
> client, and another that executes a worker. For simplicity, I felt it was
> best to only run the client on the first worker server, but I have multiple
> instances of the worker function running on the three servers as I
> described above, managed by supervisor.
>

I've used gearmand for something similar. It's been a long time though.
Cool! My assumption is proven wrong. I do have a couple of other
concerns.

> To re-cap:
>
> 1) I have gearman running on all three servers (gearman-job-server)
> 2) worker1 contains a cron executing gearmanClientAction() every minute. If
> it has jobs to runs, it sends to gearman as shown above and remains running
> to monitor the progress of the jobs and to update my mysql table so I can
> watch the status on a UI I have written.
> 3) worker1 is always keeping open 5 instances of gearmanWorkerAction()
> using supervisor, whereas worker 2 and 3 each have 10. I put less on
> worker1 with the thinking that since it's also handling the client, I want
> to keep more resource availability on that server. Maybe unnecessary, or
> maybe overkill.
>
> Does this raise any red flags that you can see?
>

Your setup is fine. However, how many tasks might be added in one of
these cron jobs? You may actually want to farm out waiting for statuses
and updating your status display to workers too, rather than have
your cron job process stay resident and wait for all of the results from
that particular run. Otherwise I could see a situation where you have
a very busy client that spends a ton of time processing WORK_STATUS and
WORK_COMPLETE messages. Internally in gearmand, processing these on one
connection may cause quite a bit of contention for that thread from
all the other workers, though I haven't looked at that code in a while
to see if that's really true.

But ultimately, that's just a theory. I can't see any glaring red flags.

Have you tried 1.1.15?

Brian Moon

May 4, 2017, 12:03:11 AM
to gea...@googlegroups.com
On 5/3/17 14:29, Aaron Lozier wrote:
> The actual situation I've been
> dealing with is that upon a full reboot of all worker servers,
> everything works flawlessly - for awhile. It seems around the 2-3 day
> mark that cracks start appearing. And the symptoms are varied. I forgot
> to mention another problem: sometimes workers sit in the queue for a
> very long time before beginning to run. Actually that could be related
> to my #1 above.

When this happens for us, it tends to only happen in environments where
the job load is very low. I find gearmand has a lot of CLOSE_WAIT
connections (see below). My current working theory is that I have
workers disconnecting in an unclean manner, which leaves gearmand with
these connections. Because gearmand manages workers based on their TCP
connection, it thinks it has more workers than it really does. This
makes the stats wrong and can cause delays in jobs getting run. A
restart of gearmand fixes it. I would be curious if yours is in this
state as well.
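A quick way to check for this state is to tally the connection states out of `lsof` output such as the capture below; a small sketch (Python, with a hypothetical helper name):

```python
import re
from collections import Counter

def tally_states(lsof_output):
    """Count TCP connection states (LISTEN / ESTABLISHED / CLOSE_WAIT)
    appearing in `lsof` output for a process."""
    return Counter(re.findall(r"\((LISTEN|ESTABLISHED|CLOSE_WAIT)\)", lsof_output))

sample = """\
TCP *:4730 (LISTEN)
TCP localhost:4730->localhost:58159 (CLOSE_WAIT)
TCP localhost:4730->localhost:59252 (ESTABLISHED)
"""
counts = tally_states(sample)
print(counts["CLOSE_WAIT"], counts["ESTABLISHED"])  # 1 1
```

Fed the output of `lsof -p $(pidof gearmand)` from a periodic job, this would make a CLOSE_WAIT leak visible over time.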

$ lsof -p `pidof gearmand` | fgrep TCP
gearmand 29119 gearman 9u IPv4 4252675769 0t0 TCP *:4730 (LISTEN)
gearmand 29119 gearman 8383u IPv4 70041600 0t0 TCP localhost:4730->localhost:58159 (CLOSE_WAIT)
gearmand 29119 gearman 8384u IPv4 70041297 0t0 TCP localhost:4730->localhost:58143 (CLOSE_WAIT)
gearmand 29119 gearman 8385u IPv4 70048500 0t0 TCP localhost:4730->localhost:58325 (CLOSE_WAIT)
gearmand 29119 gearman 8386u IPv4 70065763 0t0 TCP localhost:4730->localhost:58706 (CLOSE_WAIT)
gearmand 29119 gearman 8387u IPv4 70062636 0t0 TCP localhost:4730->localhost:58649 (CLOSE_WAIT)
gearmand 29119 gearman 8388u IPv4 70084044 0t0 TCP localhost:4730->localhost:59166 (CLOSE_WAIT)
gearmand 29119 gearman 8389u IPv4 70076871 0t0 TCP localhost:4730->localhost:58987 (CLOSE_WAIT)
gearmand 29119 gearman 8390u IPv4 70086486 0t0 TCP localhost:4730->localhost:59249 (CLOSE_WAIT)
gearmand 29119 gearman 8391u IPv4 70079035 0t0 TCP localhost:4730->localhost:59051 (CLOSE_WAIT)
gearmand 29119 gearman 8392u IPv4 70087618 0t0 TCP localhost:4730->localhost:59284 (CLOSE_WAIT)
gearmand 29119 gearman 8393u IPv4 70091785 0t0 TCP localhost:4730->localhost:59368 (CLOSE_WAIT)
gearmand 29119 gearman 8394u IPv4 70099088 0t0 TCP localhost:4730->localhost:59527 (CLOSE_WAIT)
gearmand 29119 gearman 8395u IPv4 70122030 0t0 TCP localhost:4730->localhost:60108 (CLOSE_WAIT)
gearmand 29119 gearman 8396u IPv4 70132483 0t0 TCP localhost:4730->localhost:60382 (CLOSE_WAIT)
gearmand 29119 gearman 8397u IPv4 70127153 0t0 TCP localhost:4730->localhost:60334 (CLOSE_WAIT)
gearmand 29119 gearman 8398u IPv4 70143785 0t0 TCP localhost:4730->localhost:60615 (CLOSE_WAIT)
gearmand 29119 gearman 8399u IPv4 70144617 0t0 TCP localhost:4730->localhost:60735 (CLOSE_WAIT)
gearmand 29119 gearman 8400u IPv4 70152139 0t0 TCP localhost:4730->localhost:60943 (CLOSE_WAIT)
gearmand 29119 gearman 8401u IPv4 70158506 0t0 TCP localhost:4730->localhost:60985 (CLOSE_WAIT)
gearmand 29119 gearman 8402u IPv4 70167842 0t0 TCP localhost:4730->localhost:32997 (CLOSE_WAIT)
gearmand 29119 gearman 8403u IPv4 70086588 0t0 TCP localhost:4730->localhost:59252 (ESTABLISHED)
gearmand 29119 gearman 8404u IPv4 70171222 0t0 TCP localhost:4730->localhost:33077 (CLOSE_WAIT)
gearmand 29119 gearman 8405u IPv4 70090847 0t0 TCP localhost:4730->localhost:59373 (ESTABLISHED)
gearmand 29119 gearman 8406u IPv4 70099123 0t0 TCP localhost:4730->localhost:59530 (ESTABLISHED)
gearmand 29119 gearman 8407u IPv4 70122033 0t0 TCP localhost:4730->localhost:60111 (ESTABLISHED)
gearmand 29119 gearman 8408u IPv4 70131217 0t0 TCP localhost:4730->localhost:60338 (ESTABLISHED)
gearmand 29119 gearman 8409u IPv4 70132986 0t0 TCP localhost:4730->localhost:60387 (ESTABLISHED)
gearmand 29119 gearman 8410u IPv4 70144012 0t0 TCP localhost:4730->localhost:60618 (ESTABLISHED)
gearmand 29119 gearman 8411u IPv4 70144638 0t0 TCP localhost:4730->localhost:60740 (ESTABLISHED)
gearmand 29119 gearman 8412u IPv4 70152148 0t0 TCP localhost:4730->localhost:60947 (ESTABLISHED)
gearmand 29119 gearman 8413u IPv4 70158518 0t0 TCP localhost:4730->localhost:60988 (ESTABLISHED)
gearmand 29119 gearman 8414u IPv4 70167846 0t0 TCP localhost:4730->localhost:33002 (ESTABLISHED)
gearmand 29119 gearman 8415u IPv4 70171866 0t0 TCP localhost:4730->localhost:33082 (ESTABLISHED)

--

Brian.
--------
http://brian.moonspot.net/

Aaron Lozier

May 4, 2017, 9:42:35 AM
to Gearman


On Wednesday, May 3, 2017 at 4:45:21 PM UTC-5, Clint "SpamapS" Byrum wrote:
Excerpts from Aaron Lozier's message of 2017-05-03 12:29:13 -0700:
> I really appreciate you taking the time to respond. Your description of the
> "deafening silence" surrounding Gearman is an apt one. I feel like I have
> read everything that has been written about Gearman, desperate to find some
> clarity. For better or for worse, I have invested a huge amount of time and
> energy into implementing Gearman into our application. It would be one
> thing if it was failing for me completely, but it's much more subtle than
> that. The actual situation I've been dealing with is that upon a full
> reboot of all worker servers, everything works flawlessly - for awhile. It
> seems around the 2-3 day mark that cracks start appearing. And the symptoms
> are varied. I forgot to mention another problem: sometimes workers sit in
> the queue for a very long time before beginning to run. Actually that could
> be related to my #1 above.
>

Indeed. We're trying to bring it back, slowly. Great to have you here
and interested. :)

Thanks! I am encouraged to hear the project is being revived. I was beginning to feel I should investigate other systems, but for PHP at least, I don't think there's anything better out there. Certainly not anything that has all of the features Gearman has. Correct me if I'm wrong. Add to that the fact that what I've built seems SO close to working flawlessly.

Ok, so to confirm I am understanding: your suggestion is to abandon my "addTask" functions in favor of "addTaskBackground," let the worker jobs handle the responsibility of updating the MySQL table with percentage complete, status, etc., and do away with the callback functions. This seems doable, although I do seem to remember having trouble on my first attempt at using the addTaskBackground functions. I couldn't find a way to get the Gearman status of these jobs after they were submitted (QUEUED, RUNNING, COMPLETE). But I'll give it another shot, if that's your suggestion.
 

Have you tried 1.1.15?

No, but I will work on that as well and let you know how it goes.

Aaron Lozier

May 4, 2017, 9:45:50 AM
to Gearman
Interesting. I was just pondering this morning, before I saw your response, whether part of my trouble could be the worker processes themselves. In my implementation, I automatically exit the worker process if it has been running for more than an hour (assuming it is not currently processing a job) OR if a new deployment has occurred. I felt these checks were necessary to ensure the worker wouldn't continue running outdated code. It seems reasonable to assume, based on what others have written, that Gearman may not be realizing these workers are no longer available.

Suggestions on this, Clint?
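The lifetime check described above might look roughly like this (Python for illustration; the real implementation is PHP, and `work_one_job` stands in for `$worker->work()`):

```python
import time

MAX_LIFETIME = 3600  # exit after an hour so workers pick up freshly deployed code

def worker_loop(work_one_job, start_version, current_version, now=time.time):
    """Process jobs until the worker is stale: too old, or the code has been
    redeployed. Returning normally (rather than being killed) lets the client
    library close its gearmand connections cleanly, which matters for the
    stale-worker / CLOSE_WAIT accounting Brian describes above."""
    started = now()
    while True:
        if now() - started > MAX_LIFETIME:
            return "expired"
        if current_version() != start_version:
            return "redeployed"
        work_one_job()  # blocks until one job has been processed
```

Supervisor then restarts the process, so a fresh worker re-registers with all gearmands.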

Aaron Lozier

May 4, 2017, 9:46:36 AM
to Gearman
Forgot to mention: I did try the command you suggested, but even after running for 24 hours, I only see established connections.

root@worker2:~# lsof -p `pidof gearmand` | fgrep TCP
gearmand 794 gearman 9u IPv4 8173 0t0 TCP *:4730 (LISTEN)
gearmand 794 gearman 10u IPv6 8174 0t0 TCP *:4730 (LISTEN)
gearmand 794 gearman 33u IPv4 294742 0t0 TCP 10.138.240.28:4730->10.138.112.183:55239 (ESTABLISHED)
gearmand 794 gearman 34u IPv4 2058447 0t0 TCP 10.138.240.28:4730->10.138.112.183:55463 (ESTABLISHED)
gearmand 794 gearman 35u IPv4 556773 0t0 TCP 10.138.240.28:4730->10.138.112.183:57352 (ESTABLISHED)
gearmand 794 gearman 36u IPv4 1226662 0t0 TCP 10.138.240.28:4730->10.138.112.183:43889 (ESTABLISHED)
gearmand 794 gearman 37u IPv4 1803366 0t0 TCP 10.138.240.28:4730->10.138.112.183:37649 (ESTABLISHED)
gearmand 794 gearman 38u IPv4 796462 0t0 TCP 10.138.240.28:4730->10.138.112.183:60539 (ESTABLISHED)
gearmand 794 gearman 39u IPv4 61229 0t0 TCP 10.138.240.28:4730->10.138.112.183:55998 (ESTABLISHED)
gearmand 794 gearman 40u IPv4 37213 0t0 TCP 10.138.240.28:4730->10.138.112.183:54403 (ESTABLISHED)
gearmand 794 gearman 41u IPv4 1097150 0t0 TCP 10.138.240.28:4730->10.138.112.183:33877 (ESTABLISHED)
gearmand 794 gearman 42u IPv4 1629211 0t0 TCP 10.138.240.28:4730->10.138.112.183:51950 (ESTABLISHED)
gearmand 794 gearman 43u IPv4 102537 0t0 TCP 10.138.240.28:4730->10.138.112.183:60353 (ESTABLISHED)
gearmand 794 gearman 44u IPv4 1059021 0t0 TCP 10.138.240.28:4730->10.138.112.183:59459 (ESTABLISHED)
gearmand 794 gearman 45u IPv4 2059753 0t0 TCP 10.138.240.28:4730->10.138.112.183:58456 (ESTABLISHED)
gearmand 794 gearman 46u IPv4 52910 0t0 TCP 10.138.240.28:4730->10.138.112.183:55176 (ESTABLISHED)
gearmand 794 gearman 47u IPv4 1516057 0t0 TCP 10.138.240.28:4730->10.138.240.28:58078 (ESTABLISHED)
gearmand 794 gearman 48u IPv4 2072308 0t0 TCP 10.138.240.28:4730->10.138.112.183:37881 (ESTABLISHED)
gearmand 794 gearman 49u IPv4 1516042 0t0 TCP 10.138.240.28:4730->10.138.112.183:34500 (ESTABLISHED)
gearmand 794 gearman 50u IPv4 2037465 0t0 TCP 10.138.240.28:4730->10.138.112.183:40807 (ESTABLISHED)
gearmand 794 gearman 52u IPv4 2065869 0t0 TCP 10.138.240.28:4730->10.138.0.156:52906 (ESTABLISHED)
gearmand 794 gearman 53u IPv4 2066152 0t0 TCP 10.138.240.28:4730->10.138.0.156:52923 (ESTABLISHED)
gearmand 794 gearman 54u IPv4 2054916 0t0 TCP 10.138.240.28:4730->10.138.112.183:55065 (ESTABLISHED)
gearmand 794 gearman 55u IPv4 1654587 0t0 TCP 10.138.240.28:4730->10.138.112.183:53906 (ESTABLISHED)
gearmand 794 gearman 56u IPv4 1541322 0t0 TCP 10.138.240.28:4730->10.138.112.183:38534 (ESTABLISHED)
gearmand 794 gearman 57u IPv4 11177 0t0 TCP 10.138.240.28:4730->10.138.0.156:38832 (ESTABLISHED)
gearmand 794 gearman 58u IPv4 26468 0t0 TCP 10.138.240.28:4730->10.138.112.183:53051 (ESTABLISHED)
gearmand 794 gearman 59u IPv4 1342442 0t0 TCP 10.138.240.28:4730->10.138.112.183:49107 (ESTABLISHED)
gearmand 794 gearman 60u IPv4 32813 0t0 TCP 10.138.240.28:4730->10.138.112.183:53783 (ESTABLISHED)
gearmand 794 gearman 61u IPv4 1946294 0t0 TCP 10.138.240.28:4730->10.138.112.183:42830 (ESTABLISHED)
gearmand 794 gearman 63u IPv4 1687104 0t0 TCP 10.138.240.28:4730->10.138.112.183:55248 (ESTABLISHED)
gearmand 794 gearman 64u IPv4 1630342 0t0 TCP 10.138.240.28:4730->10.138.240.28:47814 (ESTABLISHED)
gearmand 794 gearman 65u IPv4 190775 0t0 TCP 10.138.240.28:4730->10.138.112.183:43088 (ESTABLISHED)
gearmand 794 gearman 66u IPv4 2031486 0t0 TCP 10.138.240.28:4730->10.138.112.183:38689 (ESTABLISHED)
gearmand 794 gearman 67u IPv4 2063770 0t0 TCP 10.138.240.28:4730->10.138.112.183:60613 (ESTABLISHED)
gearmand 794 gearman 68u IPv4 1685918 0t0 TCP 10.138.240.28:4730->10.138.240.28:57036 (ESTABLISHED)
gearmand 794 gearman 69u IPv4 1993026 0t0 TCP 10.138.240.28:4730->10.138.112.183:37511 (ESTABLISHED)
gearmand 794 gearman 70u IPv4 2073329 0t0 TCP 10.138.240.28:4730->10.138.240.28:55105 (ESTABLISHED)
gearmand 794 gearman 71u IPv4 2047229 0t0 TCP 10.138.240.28:4730->10.138.240.28:52196 (ESTABLISHED)
gearmand 794 gearman 73u IPv4 2066159 0t0 TCP 10.138.240.28:4730->10.138.0.156:52927 (ESTABLISHED)
gearmand 794 gearman 74u IPv4 1856009 0t0 TCP 10.138.240.28:4730->10.138.112.183:58012 (ESTABLISHED)
gearmand 794 gearman 75u IPv4 1815629 0t0 TCP 10.138.240.28:4730->10.138.0.156:50528 (ESTABLISHED)
gearmand 794 gearman 76u IPv4 2066170 0t0 TCP 10.138.240.28:4730->10.138.112.183:33272 (ESTABLISHED)
gearmand 794 gearman 77u IPv4 2024398 0t0 TCP 10.138.240.28:4730->10.138.112.183:55905 (ESTABLISHED)
gearmand 794 gearman 78u IPv4 2073037 0t0 TCP 10.138.240.28:4730->10.138.112.183:35879 (ESTABLISHED)
gearmand 794 gearman 79u IPv4 1962985 0t0 TCP 10.138.240.28:4730->10.138.112.183:44088 (ESTABLISHED)
gearmand 794 gearman 80u IPv4 2059882 0t0 TCP 10.138.240.28:4730->10.138.112.183:58833 (ESTABLISHED)
gearmand 794 gearman 81u IPv4 2073353 0t0 TCP 10.138.240.28:4730->10.138.112.183:36764 (ESTABLISHED)
gearmand 794 gearman 82u IPv4 2000079 0t0 TCP 10.138.240.28:4730->10.138.0.156:51981 (ESTABLISHED)
gearmand 794 gearman 83u IPv4 2070961 0t0 TCP 10.138.240.28:4730->10.138.0.156:52936 (ESTABLISHED)
gearmand 794 gearman 84u IPv4 2066311 0t0 TCP 10.138.240.28:4730->10.138.112.183:33604 (ESTABLISHED)
gearmand 794 gearman 85u IPv4 2071678 0t0 TCP 10.138.240.28:4730->10.138.112.183:34495 (ESTABLISHED)
gearmand 794 gearman 86u IPv4 2070970 0t0 TCP 10.138.240.28:4730->10.138.0.156:52938 (ESTABLISHED)


On Wednesday, May 3, 2017 at 11:03:11 PM UTC-5, brianlmoon wrote:

Aaron Lozier

May 4, 2017, 10:32:41 AM
to Gearman
If I do away with the callbacks, the main problem I have is figuring out if a job has been queued. Do I have any options here other than simply polling?

Clint Byrum

May 4, 2017, 1:06:27 PM
to gearman
Excerpts from Aaron Lozier's message of 2017-05-04 06:42:33 -0700:
I think there are some other options certainly. You can use AMQP based
solutions like qpid or RabbitMQ. There's also 0mq. But those are all
lower level, so they'll require you to decide things like whether to use
fanout or reliable messaging.

Not quite. What I am suggesting is that you have two queues. One for
submitting and waiting for work, and the other stays the same as the
one you have now. So you'd fire off all the background "submit this and
wait for a status" in your cron job, and then have as many workers as you
need to spread out the load of waiting for WORK_STATUS and WORK_COMPLETE
messages and updating your monitoring DB. But those would still submit
as single foreground jobs, so that they can continue to do what the cron
job does.
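The two-queue split being suggested can be sketched as follows; this is only an in-memory model of the idea (deques standing in for gearmand function queues, and the `watch_status` function name is made up):

```python
from collections import deque

# Two logical queues, modeled in memory purely for illustration;
# in the real setup both would be registered gearman functions.
submit_queue = deque()  # background "run this task" jobs
watch_queue = deque()   # "watch job X and update the monitoring DB" jobs

def cron_tick(task_ids):
    """The cron client: submit each task in the background, plus one watcher
    job per task, then return immediately instead of staying resident."""
    for tid in task_ids:
        submit_queue.append(("run_observer", tid))
        watch_queue.append(("watch_status", tid))

def status_worker():
    """One of many workers that absorb the waiting: each takes a watcher job
    and would handle WORK_STATUS / WORK_COMPLETE, updating the MySQL table."""
    while watch_queue:
        _, tid = watch_queue.popleft()
        # real code: poll the job's status here and write it to the DB

cron_tick(["a", "b", "c"])  # cron exits right away...
status_worker()             # ...while workers carry the status-waiting load
```

The point of the split is that no single resident client process has to pump all the status traffic through one gearmand connection.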

This still sounds far-fetched, though. I think you do have TCP/connection
problems. I just don't know exactly what they are.

> >
> > Have you tried 1.1.15?
> >
>
> No, but I will work on that as well and let you know how it goes.
>

It would help us if for no other reason than we can more reasonably
address any code issues if you find them. There's been a lot of bug
fixing since the 1.0 series was released.

Aaron Lozier

May 5, 2017, 3:30:21 PM
to Gearman
Hmm, well, I may have misunderstood, but I already refactored my code such that the worker jobs themselves are responsible for reporting their own status and progress. Couldn't hurt, I guess. Also upgraded Gearman. Also discovered at least one problem on my end: I recently made an update such that a worker exits upon completion of one job. The problem I ran into is that if the job completed very quickly, supervisor detected this as a failure and eventually stopped spawning that worker. This is easily fixable, at least.

What are your thoughts on workers exiting but Gearman still thinking they are there? I have seen others report this type of issue. Is it a real problem? Is there a workaround?