gearadmin --status / different stats for different servers


Aaron Lozier

May 3, 2017, 12:59:18 PM
to Gearman
I currently have three worker servers (as in, physical servers). The first runs 5 Gearman workers and the other two run 10 each, using supervisor to keep these processes running.

I am troubleshooting multiple problems with my background job processing, including:

1) Not all workers are being utilized. I generally see no more than 14 in use at one time, with other jobs remaining in the queued state.
2) High CPU utilization grinding my servers to a halt (granted, not really Gearman's problem).
3) Jobs stalling, or status not retrievable after many tries.
4) The job servers seem to be out of sync.

Since it is difficult to get at the root of these issues, I'm trying to focus first on one that is a bit confounding and could possibly be the root of the others.

On each worker server, if I run gearadmin --status, I get radically different results:

worker #1 (running 5 workers)

run_observer 68 5 5


worker #2 (running 10 workers)

run_observer 11 4 6


worker #3 (running 10 workers)

run_observer 65 10 10


So, depending on which server I ask, I have anywhere from 11 to 68 jobs queued, 4 to 10 jobs running (the total should be more like 14), and 21 workers available in all. Basically, none of these numbers make sense no matter how I interpret them; shouldn't the servers be in sync?


Am I completely misunderstanding how the worker servers are supposed to be set up in a multi-server environment?


As far as I can tell, jobs ARE being distributed to different workers. It's just that the utilization and stats seem far off from what I'm expecting.


The results of gearadmin --workers are similarly skewed. It should be noted, however, that when I run this command I DO see the private IP of all three worker servers listed, albeit with significantly varying distributions.
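Worth noting: each gearmand only reports its own queue, so cluster-wide totals have to be gathered by querying every server and summing. A rough sketch of that aggregation (Python, purely illustrative; assumes the standard `gearadmin --status` columns of function name, jobs queued, jobs running, workers registered):

```python
def parse_status(text):
    """Parse `gearadmin --status` output into {function: (queued, running, workers)}."""
    stats = {}
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) == 4:
            fn, queued, running, workers = parts
            stats[fn] = (int(queued), int(running), int(workers))
    return stats

def aggregate(per_server):
    """Sum stats for the same function across several servers' outputs."""
    totals = {}
    for stats in per_server:
        for fn, (q, r, w) in stats.items():
            tq, tr, tw = totals.get(fn, (0, 0, 0))
            totals[fn] = (tq + q, tr + r, tw + w)
    return totals

# The three per-server outputs quoted above:
outputs = ["run_observer 68 5 5", "run_observer 11 4 6", "run_observer 65 10 10"]
totals = aggregate(parse_status(o) for o in outputs)
print(totals)  # {'run_observer': (144, 19, 21)}
```

Summed this way, the three outputs become coherent cluster totals: 144 jobs queued, 19 running, and the 21 registered workers Aaron mentions.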




Clint Byrum

May 3, 2017, 2:04:01 PM
to gearman
Excerpts from Aaron Lozier's message of 2017-05-03 09:59:18 -0700:
I think you missed the point a little, but you're not alone, as over the
years gearman has unfortunately suffered from a deafening silence
resulting in misconceptions.

The idea is that all workers, and all clients, should be set up to contact
all gearmands. At scale, like if you have hundreds of gearmands or
workers, you'll probably want to limit that a bit, but you're not up at
100+ gearmands and 10000+ workers, so let's ignore that.

The reason for adding all the servers is that gearmand is designed to
distribute work at the edges. So clients should _also_ list all the
servers, or a portion of them at scale. If you've used memcached, this
should seem familiar.

Now, due to an oversight when libgearman was written [1], we make this
a lot less cool than it could be. In the original Perl system that
LiveJournal built, the unique ID of each job was hashed on the client
and sent to a server based on that hash. This allowed multiple clients
submitting the same job to the same cluster to coalesce on a single
running worker's output. This is particularly effective at thundering-herd
prevention when you use a slow database and a fast cache such as memcached
(which was also created by LiveJournal... go figure! ;)

But for background jobs, what you should be most interested in is just
making sure clients submit their jobs to a randomly chosen gearmand. For
that you can just randomize the list when adding servers.

[1] https://github.com/gearman/gearmand/issues/64
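The client-side hashing scheme described above (what the original Perl system did, not what libgearman currently does) can be sketched roughly like this; the server list and unique ID here are purely hypothetical:

```python
from zlib import crc32

# Hypothetical gearmand addresses; in practice this is the configured server list.
servers = ["10.0.0.1:4730", "10.0.0.2:4730", "10.0.0.3:4730"]

def pick_server(unique_id, servers):
    """Hash the job's unique ID so every client maps the same ID to the same
    gearmand, letting duplicate submissions coalesce onto one running job."""
    return servers[crc32(unique_id.encode()) % len(servers)]

# Any two clients submitting the same unique ID pick the same server:
assert pick_server("task_42", servers) == pick_server("task_42", servers)
```

For plain background jobs, by contrast, randomizing the server order (as Clint suggests) is enough.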

>
> As far as I can tell, jobs ARE being distributed to different workers. It's
> just that the utilization and stats seem far off from what I'm expecting.
>
>
> The results of gearadmin --workers are similarly skewed. It should be noted
> however, that when I run this command I DO see the private IP of all three
> worker servers listed, albeit with significantly varying distributions.
>

It sounds to me like you have 3 servers running both worker processes
and gearmand. I'm guessing you just have your workers using 127.0.0.1 as
the sole gearmand to contact. This is the mistake.

You should add all of the gearmand ips to all of your clients and
workers. You should randomize the order of that list on the clients,
so that each one selects a different gearmand as its primary job target
(they'll fail over to the others automatically if the first one fails).

Last note, for workers, they'll always connect to and wait for jobs on
all of the servers you add. This way wherever work is submitted, it will
be presented to all of the workers, and thus, those that are not busy
will get first crack, and you should see much better distribution of work.

Aaron Lozier

May 3, 2017, 3:29:13 PM
to Gearman

I really appreciate you taking the time to respond. Your description of the "deafening silence" surrounding Gearman is an apt one. I feel like I have read everything that has been written about Gearman, desperate to find some clarity. For better or for worse, I have invested a huge amount of time and energy into implementing Gearman in our application. It would be one thing if it was failing for me completely, but it's much more subtle than that. The actual situation I've been dealing with is that upon a full reboot of all worker servers, everything works flawlessly, for a while. It seems around the 2-3 day mark that cracks start appearing, and the symptoms are varied. I forgot to mention another problem: sometimes jobs sit in the queue for a very long time before beginning to run. Actually, that could be related to my #1 above.

I have read your response a number of times, and as far as I can tell, I am already set up the way you describe. I have a function which executes the client, and another that executes a worker. For simplicity, I felt it was best to only run the client on the first worker server, but I have multiple instances of the worker function running on the three servers as I described above, managed by supervisor.

My application is written in the Zend Framework with Doctrine ORM, for what it's worth.

Worker:

    /**
     * Start gearman worker
     */
    public function gearmanWorkerAction()
    {
        set_error_handler([$this, 'errHandle']);
        register_shutdown_function([$this, 'fatalErrorShutdownHandler']);

        $this->_start_version = filemtime(realpath(APPLICATION_PATH . DIRECTORY_SEPARATOR . '..'));
        $this->_worker_start_time = time();

        $this->_helper->layout->disableLayout();
        $this->_helper->viewRenderer->setNoRender(true);

        $worker = new \GearmanWorker();
        foreach ($this->_getWorkerServers() as $worker_server) {
            $worker->addServer($worker_server[0], $worker_server[1]);
        }
        $worker->addFunction("run_observer", array($this, "run_observer"));
        while ($worker->work());
    }

Client:

    /**
     * Queue any tasks that need to be run by gearman worker. This should be on a cron job.
     */
    public function gearmanClientAction()
    {                        
        $client = new GearmanClient();
        foreach($this->_getWorkerServers() as $worker_server){
            $client->addServer($worker_server[0],$worker_server[1]);
        }
         
        $this->_helper->layout->disableLayout();
        $this->_helper->viewRenderer->setNoRender(true);
       
        $this->_checkSubmittedTasks($client);
        /**
         * Make sure all scheduled jobs configured in .ini files are registered in the database
         */
        $this->_registerConfiguredGearmanTasks();

        $gearman_tasks = $this->getEntityManager()->getRepository('\Ia\Entity\GearmanTask')->getTasksToRun();
        if(count($gearman_tasks)>0){
            /**
             * Register callbacks
             */
            $client->setStatusCallback(array($this,'_gearmanTaskStatus'));
            $client->setDataCallback(array($this,'_gearmanTaskData'));
            $client->setCreatedCallback(array($this,'_gearmanTaskCreated'));
            $client->setCompleteCallback(array($this,'_gearmanTaskComplete'));
            $client->setExceptionCallback(array($this,'_gearmanTaskException'));
            $client->setFailCallback(array($this,'_gearmanTaskFail'));
            foreach($gearman_tasks as $gearman_task){
                if($gearman_task->priority == \Ia\Entity\GearmanTask::PRIORITY_HIGH
                    || APPLICATION_ENV == 'production'){
                    $gearman_task->setState(\Ia\Entity\GearmanTask::STATE_SUBMITTED);
                    $workload = serialize(array($gearman_task->resource,unserialize($gearman_task->params)));
                    $unique_id = ($gearman_task->type==\Ia\Entity\GearmanTask::TYPE_SCHEDULED) ? $gearman_task->id.'_'.$gearman_task->next_run : $gearman_task->id;
                    switch($gearman_task->priority){
                        case \Ia\Entity\GearmanTask::PRIORITY_HIGH:
                            $client->addTaskHigh("run_observer", $workload, null, $unique_id);
                            break;
                        case \Ia\Entity\GearmanTask::PRIORITY_LOW:
                            $client->addTaskLow("run_observer", $workload, null, $unique_id);
                            break;
                        default:
                            $client->addTask("run_observer", $workload, null, $unique_id);
                            break;
                    }
                    $this->getEntityManager()->persist($gearman_task);
                    $this->getEntityManager()->flush();
                }    
            }
            $client->runTasks();
        }
    }

Get Worker Servers:

    protected function _getWorkerServers()
    {
        if(count($this->_worker_servers)==0){
            if(\Ia\Config::get('gearman_worker/servers')){
                foreach(\Ia\Config::get('gearman_worker/servers') as $worker_server){
                    $parts = explode(':',$worker_server);
                    if(count($parts)==2){
                        $this->_worker_servers[] = $parts;
                    }
                }
            }
            if(count($this->_worker_servers)==0){
                $this->_worker_servers[] = ['127.0.0.1', 4730];
            }
        }
        shuffle($this->_worker_servers);
        return $this->_worker_servers;
    }

So as you can see, I am setting all worker servers in both the client and worker actions.

To recap:

1) I have gearmand running on all three servers (gearman-job-server).
2) worker1 has a cron executing gearmanClientAction() every minute. If it has jobs to run, it sends them to Gearman as shown above and remains running to monitor the progress of the jobs and to update my MySQL table, so I can watch the status in a UI I have written.
3) worker1 always keeps 5 instances of gearmanWorkerAction() open using supervisor, whereas workers 2 and 3 each have 10. I put fewer on worker1 with the thinking that since it's also handling the client, I want to keep more resources available on that server. Maybe unnecessary, or maybe overkill.

Does this raise any red flags that you can see?

Aaron

Aaron Lozier

May 3, 2017, 4:38:58 PM
to Gearman
Can you recommend a really good monitoring tool for Gearman? I'm beginning to wonder how many of my problems could be related to improper reporting of the actual status of jobs, i.e. what is running and what isn't. I'm still seeing odd numbers for how many workers are registered, available, etc. These do not line up with how many workers I *should* have registered, and I also can't account for the number of jobs it claims are running.

Clint Byrum

May 3, 2017, 5:45:21 PM
to gearman
Excerpts from Aaron Lozier's message of 2017-05-03 12:29:13 -0700:
> I really appreciate you taking the time to respond. Your description of the
> "deafening silence" surrounding Gearman is an apt one. I feel like I have
> read everything that has been written about Gearman, desperate to find some
> clarity. For better or for worse, I have invested a huge amount of time and
> energy into implementing Gearman into our application. It would be one
> thing if it was failing for me completely, but it's much more subtle than
> that. The actual situation I've been dealing with is that upon a full
> reboot of all worker servers, everything works flawlessly - for awhile. It
> seems around the 2-3 day mark that cracks start appearing. And the symptoms
> are varied. I forgot to mention another problem: sometimes workers sit in
> the queue for a very long time before beginning to run. Actually that could
> be related to my #1 above.
>

Indeed. We're trying to bring it back, slowly. Great to have you here
and interested. :)

> I have read your response a number of times, and as far as I can tell, I am
> already set up the way you describe. I have a function which executes the
> client, and another that executes a worker. For simplicity, I felt it was
> best to only run the client on the first worker server, but I have multiple
> instances of the worker function running on the three servers as I
> described above, managed by supervisor.
>

I've used gearmand for something similar. It's been a long time though.
Cool! My assumption is proven wrong. I do have a couple of other
concerns.

> To re-cap:
>
> 1) I have gearman running on all three servers (gearman-job-server)
> 2) worker1 contains a cron executing gearmanClientAction() every minute. If
> it has jobs to runs, it sends to gearman as shown above and remains running
> to monitor the progress of the jobs and to update my mysql table so I can
> watch the status on a UI I have written.
> 3) worker1 is always keeping open 5 instances of gearmanWorkerAction()
> using supervisor, whereas worker 2 and 3 each have 10. I put less on
> worker1 with the thinking that since it's also handling the client, I want
> to keep more resource availability on that server. Maybe unnecessary, or
> maybe overkill.
>
> Does this raise any red flags that you can see?
>

Your setup is fine. However, how many tasks might be added in one of
these cron jobs? You may actually want to farm out waiting for statuses
and updating your status display to workers too, rather than have
your cron job process stay resident and wait for all of the results from
that particular run. Otherwise I could see a situation where you have
a very busy client that spends a ton of time processing WORK_STATUS and
WORK_COMPLETE messages. Internally in gearmand, processing these on one
connection may cause quite a bit of contention for that thread from
all the other workers, though I haven't looked at that code in a while
to see if that's really true.

But ultimately, that's just a theory. I can't see any glaring red flags.

Have you tried 1.1.15?

Brian Moon

May 4, 2017, 12:03:11 AM
to gea...@googlegroups.com
On 5/3/17 14:29, Aaron Lozier wrote:
> The actual situation I've been
> dealing with is that upon a full reboot of all worker servers,
> everything works flawlessly - for awhile. It seems around the 2-3 day
> mark that cracks start appearing. And the symptoms are varied. I forgot
> to mention another problem: sometimes workers sit in the queue for a
> very long time before beginning to run. Actually that could be related
> to my #1 above.

When this happens for us, it tends to only happen in environments where
the job load is very low. I find gearmand has a lot of CLOSE_WAIT
connections (see below). My current working theory is that I have
workers disconnecting in an unclean manner, which leaves gearmand with
these connections. Because gearmand manages workers based on their TCP
connection, it thinks it has more workers than it really does. This
makes the stats wrong and can cause delays in jobs getting run. A
restart of gearmand fixes it. I would be curious if yours is in this
state as well.
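A quick way to check for this state is to tally the connection states out of `lsof` output such as the capture below; a small sketch (Python, with a hypothetical helper name):

```python
import re
from collections import Counter

def tally_states(lsof_output):
    """Count TCP connection states (LISTEN / ESTABLISHED / CLOSE_WAIT)
    appearing in `lsof` output for a process."""
    return Counter(re.findall(r"\((LISTEN|ESTABLISHED|CLOSE_WAIT)\)", lsof_output))

sample = """\
TCP *:4730 (LISTEN)
TCP localhost:4730->localhost:58159 (CLOSE_WAIT)
TCP localhost:4730->localhost:59252 (ESTABLISHED)
"""
counts = tally_states(sample)
print(counts["CLOSE_WAIT"], counts["ESTABLISHED"])  # 1 1
```

Fed the output of `lsof -p $(pidof gearmand)` from a periodic job, this would make a CLOSE_WAIT leak visible over time.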

$ lsof -p `pidof gearmand` | fgrep TCP
gearmand 29119 gearman 9u IPv4 4252675769 0t0 TCP *:4730 (LISTEN)
gearmand 29119 gearman 8383u IPv4 70041600 0t0 TCP localhost:4730->localhost:58159 (CLOSE_WAIT)
gearmand 29119 gearman 8384u IPv4 70041297 0t0 TCP localhost:4730->localhost:58143 (CLOSE_WAIT)
gearmand 29119 gearman 8385u IPv4 70048500 0t0 TCP localhost:4730->localhost:58325 (CLOSE_WAIT)
gearmand 29119 gearman 8386u IPv4 70065763 0t0 TCP localhost:4730->localhost:58706 (CLOSE_WAIT)
gearmand 29119 gearman 8387u IPv4 70062636 0t0 TCP localhost:4730->localhost:58649 (CLOSE_WAIT)
gearmand 29119 gearman 8388u IPv4 70084044 0t0 TCP localhost:4730->localhost:59166 (CLOSE_WAIT)
gearmand 29119 gearman 8389u IPv4 70076871 0t0 TCP localhost:4730->localhost:58987 (CLOSE_WAIT)
gearmand 29119 gearman 8390u IPv4 70086486 0t0 TCP localhost:4730->localhost:59249 (CLOSE_WAIT)
gearmand 29119 gearman 8391u IPv4 70079035 0t0 TCP localhost:4730->localhost:59051 (CLOSE_WAIT)
gearmand 29119 gearman 8392u IPv4 70087618 0t0 TCP localhost:4730->localhost:59284 (CLOSE_WAIT)
gearmand 29119 gearman 8393u IPv4 70091785 0t0 TCP localhost:4730->localhost:59368 (CLOSE_WAIT)
gearmand 29119 gearman 8394u IPv4 70099088 0t0 TCP localhost:4730->localhost:59527 (CLOSE_WAIT)
gearmand 29119 gearman 8395u IPv4 70122030 0t0 TCP localhost:4730->localhost:60108 (CLOSE_WAIT)
gearmand 29119 gearman 8396u IPv4 70132483 0t0 TCP localhost:4730->localhost:60382 (CLOSE_WAIT)
gearmand 29119 gearman 8397u IPv4 70127153 0t0 TCP localhost:4730->localhost:60334 (CLOSE_WAIT)
gearmand 29119 gearman 8398u IPv4 70143785 0t0 TCP localhost:4730->localhost:60615 (CLOSE_WAIT)
gearmand 29119 gearman 8399u IPv4 70144617 0t0 TCP localhost:4730->localhost:60735 (CLOSE_WAIT)
gearmand 29119 gearman 8400u IPv4 70152139 0t0 TCP localhost:4730->localhost:60943 (CLOSE_WAIT)
gearmand 29119 gearman 8401u IPv4 70158506 0t0 TCP localhost:4730->localhost:60985 (CLOSE_WAIT)
gearmand 29119 gearman 8402u IPv4 70167842 0t0 TCP localhost:4730->localhost:32997 (CLOSE_WAIT)
gearmand 29119 gearman 8403u IPv4 70086588 0t0 TCP localhost:4730->localhost:59252 (ESTABLISHED)
gearmand 29119 gearman 8404u IPv4 70171222 0t0 TCP localhost:4730->localhost:33077 (CLOSE_WAIT)
gearmand 29119 gearman 8405u IPv4 70090847 0t0 TCP localhost:4730->localhost:59373 (ESTABLISHED)
gearmand 29119 gearman 8406u IPv4 70099123 0t0 TCP localhost:4730->localhost:59530 (ESTABLISHED)
gearmand 29119 gearman 8407u IPv4 70122033 0t0 TCP localhost:4730->localhost:60111 (ESTABLISHED)
gearmand 29119 gearman 8408u IPv4 70131217 0t0 TCP localhost:4730->localhost:60338 (ESTABLISHED)
gearmand 29119 gearman 8409u IPv4 70132986 0t0 TCP localhost:4730->localhost:60387 (ESTABLISHED)
gearmand 29119 gearman 8410u IPv4 70144012 0t0 TCP localhost:4730->localhost:60618 (ESTABLISHED)
gearmand 29119 gearman 8411u IPv4 70144638 0t0 TCP localhost:4730->localhost:60740 (ESTABLISHED)
gearmand 29119 gearman 8412u IPv4 70152148 0t0 TCP localhost:4730->localhost:60947 (ESTABLISHED)
gearmand 29119 gearman 8413u IPv4 70158518 0t0 TCP localhost:4730->localhost:60988 (ESTABLISHED)
gearmand 29119 gearman 8414u IPv4 70167846 0t0 TCP localhost:4730->localhost:33002 (ESTABLISHED)
gearmand 29119 gearman 8415u IPv4 70171866 0t0 TCP localhost:4730->localhost:33082 (ESTABLISHED)

--

Brian.
--------
http://brian.moonspot.net/

Aaron Lozier

May 4, 2017, 9:42:35 AM
to Gearman


On Wednesday, May 3, 2017 at 4:45:21 PM UTC-5, Clint "SpamapS" Byrum wrote:
Excerpts from Aaron Lozier's message of 2017-05-03 12:29:13 -0700:
> I really appreciate you taking the time to respond. Your description of the
> "deafening silence" surrounding Gearman is an apt one. I feel like I have
> read everything that has been written about Gearman, desperate to find some
> clarity. For better or for worse, I have invested a huge amount of time and
> energy into implementing Gearman into our application. It would be one
> thing if it was failing for me completely, but it's much more subtle than
> that. The actual situation I've been dealing with is that upon a full
> reboot of all worker servers, everything works flawlessly - for awhile. It
> seems around the 2-3 day mark that cracks start appearing. And the symptoms
> are varied. I forgot to mention another problem: sometimes workers sit in
> the queue for a very long time before beginning to run. Actually that could
> be related to my #1 above.
>

Indeed. We're trying to bring it back, slowly. Great to have you here
and interested. :)

Thanks! I am encouraged to hear the project is being revived. I was beginning to feel I should investigate other systems, but for PHP at least, I don't think there's anything better out there. Certainly not anything that has all of the features Gearman has. Correct me if I'm wrong. Add to that the fact that what I've built seems SO close to working flawlessly.

Ok, so to confirm I am understanding: your suggestion is to abandon my "addTask" functions in favor of "addTaskBackground," let the worker jobs handle the responsibility of updating the MySQL table with percentage complete, status, etc., and do away with the callback functions. This seems doable, although I do seem to remember having trouble on my first attempt at using the addTaskBackground functions. I couldn't find a way to get the Gearman status of these jobs after they were submitted (QUEUED, RUNNING, COMPLETE). But I'll give it another shot, if that's your suggestion.
 

Have you tried 1.1.15?

No, but I will work on that as well and let you know how it goes.

Aaron Lozier

May 4, 2017, 9:45:50 AM
to Gearman
Interesting. I was just pondering this morning, before I saw your response, whether part of my trouble could be the worker processes themselves. In my implementation, I automatically exit the worker process if it has been running for more than an hour (assuming it is not currently processing a job) OR if a new deployment has occurred. I felt these checks were necessary to ensure the worker wouldn't continue running outdated code. It seems reasonable to assume, based on what others have written, that Gearman may not be realizing these workers are no longer available.

Suggestions on this, Clint?
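The lifetime check described above might look roughly like this (Python for illustration; the real implementation is PHP, and `work_one_job` stands in for `$worker->work()`):

```python
import time

MAX_LIFETIME = 3600  # exit after an hour so workers pick up freshly deployed code

def worker_loop(work_one_job, start_version, current_version, now=time.time):
    """Process jobs until the worker is stale: too old, or the code has been
    redeployed. Returning normally (rather than being killed) lets the client
    library close its gearmand connections cleanly, which matters for the
    stale-worker / CLOSE_WAIT accounting Brian describes above."""
    started = now()
    while True:
        if now() - started > MAX_LIFETIME:
            return "expired"
        if current_version() != start_version:
            return "redeployed"
        work_one_job()  # blocks until one job has been processed
```

Supervisor then restarts the process, so a fresh worker re-registers with all gearmands.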

Aaron Lozier

May 4, 2017, 9:46:36 AM
to Gearman
Forgot to mention: I did try the command you suggested, but even after running for 24 hours, I only see established connections.

root@worker2:~# lsof -p `pidof gearmand` | fgrep TCP
gearmand 794 gearman 9u IPv4 8173 0t0 TCP *:4730 (LISTEN)
gearmand 794 gearman 10u IPv6 8174 0t0 TCP *:4730 (LISTEN)
gearmand 794 gearman 33u IPv4 294742 0t0 TCP 10.138.240.28:4730->10.138.112.183:55239 (ESTABLISHED)
gearmand 794 gearman 34u IPv4 2058447 0t0 TCP 10.138.240.28:4730->10.138.112.183:55463 (ESTABLISHED)
gearmand 794 gearman 35u IPv4 556773 0t0 TCP 10.138.240.28:4730->10.138.112.183:57352 (ESTABLISHED)
gearmand 794 gearman 36u IPv4 1226662 0t0 TCP 10.138.240.28:4730->10.138.112.183:43889 (ESTABLISHED)
gearmand 794 gearman 37u IPv4 1803366 0t0 TCP 10.138.240.28:4730->10.138.112.183:37649 (ESTABLISHED)
gearmand 794 gearman 38u IPv4 796462 0t0 TCP 10.138.240.28:4730->10.138.112.183:60539 (ESTABLISHED)
gearmand 794 gearman 39u IPv4 61229 0t0 TCP 10.138.240.28:4730->10.138.112.183:55998 (ESTABLISHED)
gearmand 794 gearman 40u IPv4 37213 0t0 TCP 10.138.240.28:4730->10.138.112.183:54403 (ESTABLISHED)
gearmand 794 gearman 41u IPv4 1097150 0t0 TCP 10.138.240.28:4730->10.138.112.183:33877 (ESTABLISHED)
gearmand 794 gearman 42u IPv4 1629211 0t0 TCP 10.138.240.28:4730->10.138.112.183:51950 (ESTABLISHED)
gearmand 794 gearman 43u IPv4 102537 0t0 TCP 10.138.240.28:4730->10.138.112.183:60353 (ESTABLISHED)
gearmand 794 gearman 44u IPv4 1059021 0t0 TCP 10.138.240.28:4730->10.138.112.183:59459 (ESTABLISHED)
gearmand 794 gearman 45u IPv4 2059753 0t0 TCP 10.138.240.28:4730->10.138.112.183:58456 (ESTABLISHED)
gearmand 794 gearman 46u IPv4 52910 0t0 TCP 10.138.240.28:4730->10.138.112.183:55176 (ESTABLISHED)
gearmand 794 gearman 47u IPv4 1516057 0t0 TCP 10.138.240.28:4730->10.138.240.28:58078 (ESTABLISHED)
gearmand 794 gearman 48u IPv4 2072308 0t0 TCP 10.138.240.28:4730->10.138.112.183:37881 (ESTABLISHED)
gearmand 794 gearman 49u IPv4 1516042 0t0 TCP 10.138.240.28:4730->10.138.112.183:34500 (ESTABLISHED)
gearmand 794 gearman 50u IPv4 2037465 0t0 TCP 10.138.240.28:4730->10.138.112.183:40807 (ESTABLISHED)
gearmand 794 gearman 52u IPv4 2065869 0t0 TCP 10.138.240.28:4730->10.138.0.156:52906 (ESTABLISHED)
gearmand 794 gearman 53u IPv4 2066152 0t0 TCP 10.138.240.28:4730->10.138.0.156:52923 (ESTABLISHED)
gearmand 794 gearman 54u IPv4 2054916 0t0 TCP 10.138.240.28:4730->10.138.112.183:55065 (ESTABLISHED)
gearmand 794 gearman 55u IPv4 1654587 0t0 TCP 10.138.240.28:4730->10.138.112.183:53906 (ESTABLISHED)
gearmand 794 gearman 56u IPv4 1541322 0t0 TCP 10.138.240.28:4730->10.138.112.183:38534 (ESTABLISHED)
gearmand 794 gearman 57u IPv4 11177 0t0 TCP 10.138.240.28:4730->10.138.0.156:38832 (ESTABLISHED)
gearmand 794 gearman 58u IPv4 26468 0t0 TCP 10.138.240.28:4730->10.138.112.183:53051 (ESTABLISHED)
gearmand 794 gearman 59u IPv4 1342442 0t0 TCP 10.138.240.28:4730->10.138.112.183:49107 (ESTABLISHED)
gearmand 794 gearman 60u IPv4 32813 0t0 TCP 10.138.240.28:4730->10.138.112.183:53783 (ESTABLISHED)
gearmand 794 gearman 61u IPv4 1946294 0t0 TCP 10.138.240.28:4730->10.138.112.183:42830 (ESTABLISHED)
gearmand 794 gearman 63u IPv4 1687104 0t0 TCP 10.138.240.28:4730->10.138.112.183:55248 (ESTABLISHED)
gearmand 794 gearman 64u IPv4 1630342 0t0 TCP 10.138.240.28:4730->10.138.240.28:47814 (ESTABLISHED)
gearmand 794 gearman 65u IPv4 190775 0t0 TCP 10.138.240.28:4730->10.138.112.183:43088 (ESTABLISHED)
gearmand 794 gearman 66u IPv4 2031486 0t0 TCP 10.138.240.28:4730->10.138.112.183:38689 (ESTABLISHED)
gearmand 794 gearman 67u IPv4 2063770 0t0 TCP 10.138.240.28:4730->10.138.112.183:60613 (ESTABLISHED)
gearmand 794 gearman 68u IPv4 1685918 0t0 TCP 10.138.240.28:4730->10.138.240.28:57036 (ESTABLISHED)
gearmand 794 gearman 69u IPv4 1993026 0t0 TCP 10.138.240.28:4730->10.138.112.183:37511 (ESTABLISHED)
gearmand 794 gearman 70u IPv4 2073329 0t0 TCP 10.138.240.28:4730->10.138.240.28:55105 (ESTABLISHED)
gearmand 794 gearman 71u IPv4 2047229 0t0 TCP 10.138.240.28:4730->10.138.240.28:52196 (ESTABLISHED)
gearmand 794 gearman 73u IPv4 2066159 0t0 TCP 10.138.240.28:4730->10.138.0.156:52927 (ESTABLISHED)
gearmand 794 gearman 74u IPv4 1856009 0t0 TCP 10.138.240.28:4730->10.138.112.183:58012 (ESTABLISHED)
gearmand 794 gearman 75u IPv4 1815629 0t0 TCP 10.138.240.28:4730->10.138.0.156:50528 (ESTABLISHED)
gearmand 794 gearman 76u IPv4 2066170 0t0 TCP 10.138.240.28:4730->10.138.112.183:33272 (ESTABLISHED)
gearmand 794 gearman 77u IPv4 2024398 0t0 TCP 10.138.240.28:4730->10.138.112.183:55905 (ESTABLISHED)
gearmand 794 gearman 78u IPv4 2073037 0t0 TCP 10.138.240.28:4730->10.138.112.183:35879 (ESTABLISHED)
gearmand 794 gearman 79u IPv4 1962985 0t0 TCP 10.138.240.28:4730->10.138.112.183:44088 (ESTABLISHED)
gearmand 794 gearman 80u IPv4 2059882 0t0 TCP 10.138.240.28:4730->10.138.112.183:58833 (ESTABLISHED)
gearmand 794 gearman 81u IPv4 2073353 0t0 TCP 10.138.240.28:4730->10.138.112.183:36764 (ESTABLISHED)
gearmand 794 gearman 82u IPv4 2000079 0t0 TCP 10.138.240.28:4730->10.138.0.156:51981 (ESTABLISHED)
gearmand 794 gearman 83u IPv4 2070961 0t0 TCP 10.138.240.28:4730->10.138.0.156:52936 (ESTABLISHED)
gearmand 794 gearman 84u IPv4 2066311 0t0 TCP 10.138.240.28:4730->10.138.112.183:33604 (ESTABLISHED)
gearmand 794 gearman 85u IPv4 2071678 0t0 TCP 10.138.240.28:4730->10.138.112.183:34495 (ESTABLISHED)
gearmand 794 gearman 86u IPv4 2070970 0t0 TCP 10.138.240.28:4730->10.138.0.156:52938 (ESTABLISHED)


On Wednesday, May 3, 2017 at 11:03:11 PM UTC-5, brianlmoon wrote:

Aaron Lozier

May 4, 2017, 10:32:41 AM
to Gearman
If I do away with the callbacks, the main problem I have is figuring out if a job has been queued. Do I have any options here other than simply polling?

Clint Byrum

May 4, 2017, 1:06:27 PM
to gearman
Excerpts from Aaron Lozier's message of 2017-05-04 06:42:33 -0700:
I think there are some other options certainly. You can use AMQP based
solutions like qpid or RabbitMQ. There's also 0mq. But those are all
lower level, so they'll require you to decide things like whether to use
fanout or reliable messaging.

Not quite. What I am suggesting is that you have two queues. One for
submitting and waiting for work, and the other stays the same as the
one you have now. So you'd fire off all the background "submit this and
wait for a status" in your cron job, and then have as many workers as you
need to spread out the load of waiting for WORK_STATUS and WORK_COMPLETE
messages and updating your monitoring DB. But those would still submit
as single foreground jobs, so that they can continue to do what the cron
job does.
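The two-queue split being suggested can be sketched as follows; this is only an in-memory model of the idea (deques standing in for gearmand function queues, and the `watch_status` function name is made up):

```python
from collections import deque

# Two logical queues, modeled in memory purely for illustration;
# in the real setup both would be registered gearman functions.
submit_queue = deque()  # background "run this task" jobs
watch_queue = deque()   # "watch job X and update the monitoring DB" jobs

def cron_tick(task_ids):
    """The cron client: submit each task in the background, plus one watcher
    job per task, then return immediately instead of staying resident."""
    for tid in task_ids:
        submit_queue.append(("run_observer", tid))
        watch_queue.append(("watch_status", tid))

def status_worker():
    """One of many workers that absorb the waiting: each takes a watcher job
    and would handle WORK_STATUS / WORK_COMPLETE, updating the MySQL table."""
    while watch_queue:
        _, tid = watch_queue.popleft()
        # real code: poll the job's status here and write it to the DB

cron_tick(["a", "b", "c"])  # cron exits right away...
status_worker()             # ...while workers carry the status-waiting load
```

The point of the split is that no single resident client process has to pump all the status traffic through one gearmand connection.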

This still sounds far-fetched, though. I think you do have TCP/connection
problems. I just don't know exactly what they are.

> >
> > Have you tried 1.1.15?
> >
>
> No, but I will work on that as well and let you know how it goes.
>

It would help us if for no other reason than we can more reasonably
address any code issues if you find them. There's been a lot of bug
fixing since the 1.0 series was released.

Aaron Lozier

May 5, 2017, 3:30:21 PM
to Gearman
Hmm, well, I may have misunderstood, but I already refactored my code such that the worker jobs themselves are responsible for reporting their own status and progress. Couldn't hurt, I guess. Also upgraded Gearman. Also discovered at least one problem on my end: I recently made an update such that a worker exits upon completion of one job. The problem I ran into is that if the job completed very quickly, supervisor detected this as a failure and eventually stopped spawning that worker. This is easily fixable, at least.

What are your thoughts on workers exiting but Gearman still thinking they are there? I have seen others report this type of issue. Is it a real problem? Is there a workaround?