Existing production-ready libraries for parallelizing HTTP requests?

t...@airmap.com

Aug 19, 2019, 12:20:53 PM8/19/19
to golang-nuts
tl;dr Do you know of any libraries for parallelizing HTTP requests with per-server concurrency control and handling of retries?

I'm writing a service that fetches many independent small binary blobs (map tiles) over HTTP from several upstream servers and packages them together into a single archive. I want to parallelize the fetching of the blobs. Currently there are O(10) upstream servers and O(1000) blobs fetched from each.

Making parallel HTTP requests in Go is trivially easy and is demonstrated in many Go tutorials and blog posts. However, I'm looking for a "production ready" library that supports:
* Per upstream server concurrency limits.
* Overall (across all upstream servers) concurrency limits.
* Controllable retries with exponential backoff in the case of upstream server errors.
* Timeouts for upstream requests.
* context.Context support.

This would seem to be a common enough task that I would expect to find an existing library that does all of the above. Existing Go web scrapers, e.g. colly, likely have this functionality internally but do not expose it in their API and are instead focused on crawling web pages.

Do you know of any such library?

Many thanks,
Tom


Thomas Bushnell, BSG

Aug 20, 2019, 3:33:36 PM8/20/19
to t...@airmap.com, golang-nuts
I am of the opinion that a case like this is best handled by simply writing the thing you want.

Concurrency limits are easily managed by using tokens to gate fetches. One simple technique is to make a channel of struct{} with capacity equal to the maximum number of concurrent connections you are allowed. You can either fill it with tokens at startup, then receive from it before each request and send back to the channel when done, or start with it empty, send to it before each request, and receive from it afterwards. The two are equivalent.

I'm not sure of the point of overall concurrency limits in general, but the same technique works. It's unlikely to be a problem, IMO, for the size of job you describe.

Retries are best done inside each fetch; wrap http.Get with the logic you want. There is no one-size-fits-all here. There is a public backoff library available, but it's a bit complex, and the code could easily be simpler if you implement exactly what you want directly.

For contexts, just use the http package's (*Request).WithContext method. That accomplishes timeouts too.


Sam Fourman Jr.

Aug 20, 2019, 11:39:10 PM8/20/19
to Thomas Bushnell, BSG, t...@airmap.com, golang-nuts
I am also in the same boat as Tom; there is certainly demand for this type of library.

-- Sam Fourman




roger peppe

Aug 21, 2019, 4:17:35 AM8/21/19
to t...@airmap.com, golang-nuts
For just the exponential backoff part, you might want to take a look at https://godoc.org/gopkg.in/retry.v1, which provides easily pluggable retry strategies, including exponential backoff with jitter, and lets you write your code as a normal for loop - no awkward callbacks.

  cheers,
    rog.
