Please help, not sure if I am using Gearman for the right use case

Jitendra Singh
Jun 7, 2018, 7:41:13 PM
to Gearman
Hi Group,

I am a complete newcomer to background processing and to Gearman. We have a project with around 200,000 records stored on a server. The server has a REST API, but its endpoint returns the complete set of 200,000 records with only their ID and lastupdate fields. My use case is to then make a further API call for each of these IDs to get back the complete data of that particular entity. I first thought of using Guzzle in async mode, but I am unsure whether the server on which Guzzle is configured will itself cope with that many HTTP requests. Once I have the data, I also need to transform each record from JSON to XML.

All of these tasks are going to take a decent amount of time and processing, so my questions are: is Gearman the right solution for this, and if it is, what approach should I take to chunk all of these records, or does Gearman do that chunking itself? Once again, I am very new to the concepts of Gearman. I have read through its documentation and it makes me think Gearman is the right solution, but I might be completely wrong, so I would really appreciate it if someone could guide me through this situation. Thank you.
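
Just for reference, the JSON-to-XML transformation I have in mind is roughly something like this (only a toy sketch, with made-up field names):

<?php
// Toy sketch of the JSON -> XML step; the field names here are made up.
function recordJsonToXml($json)
{
    $data = json_decode($json, true);
    if ($data === null) {
        throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
    }

    $xml = new SimpleXMLElement('<record/>');
    foreach ($data as $key => $value) {
        // Only flat scalar fields in this sketch; nested data would need recursion.
        $xml->addChild($key, htmlspecialchars((string) $value));
    }

    return $xml->asXML();
}

echo recordJsonToXml('{"id": 42, "lastupdate": "2018-06-07"}');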

Clint Byrum
Jun 7, 2018, 8:57:08 PM
to gea...@googlegroups.com
Hi! Welcome and thanks for considering Gearman. See in-line replies.

Quoting Jitendra Singh (2018-06-07 16:41:12)
Gearman is good for spreading a lot of tasks out to a lot of worker
processes. So if you want to read the JSON, transform to XML, do something
else expensive, and send it along somewhere else, that's a good use
case for a gearman worker. You can then spin up as many workers as you
need to parallelize the job and get it done faster. So you'd probably
have one main "client" that would pull the 200,000 record IDs, and
then submit them all as individual gearman jobs. The workers would read
the job ID out of gearman, and then hit your REST API to get details,
and do their transformation. If you want the results to go back to one
place, you can have them sent back to the original client as results,
or you can do these as "background" jobs and just have the workers write
the results wherever they go.
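
With the PHP gearman extension, the shape of that would be roughly as below. This is only a sketch; fetchAllRecordIds(), httpGetRecord(), jsonToXml() and storeResult() are placeholders for your own code.

<?php
// client.php -- pull the ID list and submit one background job per record.
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);

foreach (fetchAllRecordIds() as $id) {     // placeholder: whatever pulls the 200,000 IDs
    // Keep the payload small: just the record ID.
    $client->doBackground('transform_record', (string) $id);
}

<?php
// worker.php -- run as many copies of this as you have CPU/network to spare.
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);

$worker->addFunction('transform_record', function (GearmanJob $job) {
    $id   = $job->workload();
    $json = httpGetRecord($id);   // placeholder: Guzzle GET for this record's details
    $xml  = jsonToXml($json);     // placeholder: the JSON -> XML transform
    storeResult($id, $xml);       // placeholder: write the result wherever it goes
});

while ($worker->work()) {
    // block here, handling one job at a time as gearmand hands them out
}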

That said, 200,000 records is almost nothing. Have you tried just
walking through every record and applying transformations to each one? Is
that too slow? If not, just do that. If, however, you want to use more
CPU/memory/disk/network to get it done faster, then yes, gearman may be
a good choice.
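
In other words, the boring baseline is just a plain loop, something like this (same placeholder helpers as in the sketch above):

<?php
// Plain sequential baseline: no gearman, no forking.
foreach (fetchAllRecordIds() as $id) {   // placeholder for the listing endpoint
    $json = httpGetRecord($id);          // placeholder for the per-ID GET
    $xml  = jsonToXml($json);            // placeholder for the transform
    storeResult($id, $xml);
}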

Jitendra Singh
Jun 7, 2018, 9:54:12 PM
to Gearman
Hi Clint,

First of all, thank you very much; I really appreciate your reply and help.

So I will show my use case again, this time as a diagram, and also highlight my main concern.




[Flow]

So, as you can see, I have a server sitting in between which has to make 200,000 PHP Guzzle GET requests and also transform each response. My main concern is whether Gearman can handle 200,000 workers; I don't really know what the upper limit on the number of workers is. And if it can handle that many, is that the best approach, or should I chunk those 200,000 records, say into batches of 1,000, and send them batch by batch? I am afraid of jobs being killed because there are too many workers for the server to handle.

And yes, I do want to speed up the process, since there is a requirement for the data to be synced every 2 hours, and I am not sure that can be achieved without Gearman. My goal is to make this process of HTTP requests and conversion scalable. Once again, I am really thankful for your advice and look forward to any new suggestions.
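
For the HTTP side, what I had in mind with Guzzle in async mode and batching is roughly the following (only a rough sketch using GuzzleHttp\Pool with a concurrency cap; the URL and the ID list are just examples):

<?php
use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;

$client = new Client(['base_uri' => 'https://example.com/api/']);   // example URL only

$ids = [/* ... the 200,000 IDs from the listing endpoint ... */];

$requests = function (array $ids) {
    foreach ($ids as $id) {
        yield new Request('GET', "records/{$id}");   // example path
    }
};

$pool = new Pool($client, $requests($ids), [
    'concurrency' => 50,    // only 50 requests in flight at any one time
    'fulfilled'   => function ($response, $index) {
        // transform this record's JSON to XML here
    },
    'rejected'    => function ($reason, $index) {
        // log the failure, maybe retry it later
    },
]);

$pool->promise()->wait();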

Thank You.

Clint Byrum
Jun 8, 2018, 11:57:16 AM
to gea...@googlegroups.com
Quoting Jitendra Singh (2018-06-07 18:54:12)

If you have one server in the middle, you should just fork off processes
and feed them data via a local pipe.
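
Whether you literally pcntl_fork() or just spawn child processes, the shape is the same. Here's a rough sketch using proc_open() and stdin pipes; worker.php is a hypothetical script that reads one record ID per line and does the fetch + transform:

<?php
// Sketch: one box, N child processes, IDs fed to each child over its stdin pipe.
$numWorkers = 16;
$procs = [];
$pipes = [];

for ($i = 0; $i < $numWorkers; $i++) {
    // child stdin is a pipe we write to; stdout/stderr are inherited
    $spec = [0 => ['pipe', 'r'], 1 => STDOUT, 2 => STDERR];
    $procs[$i] = proc_open('php worker.php', $spec, $childPipes);
    $pipes[$i] = $childPipes[0];     // write end of the child's stdin
}

// Round-robin the record IDs across the children, one ID per line.
$n = 0;
foreach (fetchAllRecordIds() as $id) {   // placeholder for the listing call
    fwrite($pipes[$n++ % $numWorkers], $id . "\n");
}

// Closing stdin gives each child EOF so it knows it is done.
foreach ($pipes as $p) {
    fclose($p);
}
foreach ($procs as $proc) {
    proc_close($proc);   // waits for the child to exit
}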

Gearman is for using *many* servers in that role. If you want to have many
*worker computers*, continue with gearman. If it's just one... then no,
don't use gearman.

Yes gearman can handle 200,000 workers. Just make sure to keep the
payloads small, and you may want to have multiple gearmands so that you
can split the queue up. If your gearman client is written correctly,
it will balance the job submissions to each gearmand evenly (all workers
will connect to all gearmands). Verify this though, I recall that libgearman
was broken at some point and was just doing first-available submission.
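
Concretely, on both sides you just add every gearmand (the hostnames here are made up):

<?php
// Client side: add every gearmand so submissions can be spread across them.
$client = new GearmanClient();
$client->addServer('gearmand1.internal', 4730);   // made-up hostnames
$client->addServer('gearmand2.internal', 4730);

// Worker side: connect every worker to all gearmands so no queue goes unserved.
$worker = new GearmanWorker();
$worker->addServer('gearmand1.internal', 4730);
$worker->addServer('gearmand2.internal', 4730);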
