Performance hit when using goroutine

Hawari Rahman

May 22, 2018, 1:12:04 AM
to golang-nuts
Hi Everyone,

For background: I have an API that retrieves data from a database. One endpoint makes two function calls:
- SelectCollections - for retrieving the collection records
- HydrateCollection - for hydrating each collection (called on each result of SelectCollections).

Previously we ran HydrateCollection sequentially in a for loop, which produced the following:

2018/05/22 10:59:05 HydrateCollection took 41.497134ms
2018/05/22 10:59:05 HydrateCollection took 42.695152ms
2018/05/22 10:59:05 HydrateCollection took 22.870913ms
2018/05/22 10:59:05 HydrateCollection took 17.833512ms
2018/05/22 10:59:05 HydrateCollection took 18.765984ms
2018/05/22 10:59:05 HydrateCollection took 16.338906ms
2018/05/22 10:59:05 HydrateCollection took 14.660654ms
2018/05/22 10:59:05 HydrateCollection took 20.818716ms
2018/05/22 10:59:05 HydrateCollection took 18.57054ms
2018/05/22 10:59:05 Resolving Items took 172.9182ms
2018/05/22 10:59:05 SelectCollections took 173.992051ms

After switching to goroutines to run HydrateCollection, the results are as follows:

2018/05/22 11:21:36 HydrateCollection took 9.187861ms
2018/05/22 11:21:36 HydrateCollection took 63.651507ms
2018/05/22 11:21:36 HydrateCollection took 79.199976ms
2018/05/22 11:21:36 HydrateCollection took 122.349986ms
2018/05/22 11:21:37 HydrateCollection took 150.627746ms
2018/05/22 11:21:37 HydrateCollection took 168.432517ms
2018/05/22 11:21:37 HydrateCollection took 171.602705ms
2018/05/22 11:21:37 HydrateCollection took 179.127794ms
2018/05/22 11:21:37 HydrateCollection took 185.137562ms
2018/05/22 11:21:37 Resolving items took 185.434821ms
2018/05/22 11:21:37 SelectCollections took 187.358141ms

Why does HydrateCollection take such a significant performance hit now that I'm using goroutines?

HydrateCollection takes a pointer to a Collection struct and mutates it inside the goroutine, so I think that may be a problem. Also, HydrateCollection is a method on a Repository pointer that holds a database connection; the same Repository pointer is used by SelectCollections.

hydrateChan := make(chan error, len(collections))
for index := range collections {
    go func(coll *Collection) {
        hydrateChan <- r.hydrateCollection(coll)
    }(&collections[index])
}

for i := 0; i < len(collections); i++ {
    err := <-hydrateChan
    if err != nil {
        logrus.Error(err)
    }
}

Is there something that I need to address in the previous code snippet?

Burak Serdar

May 22, 2018, 1:36:25 AM
to Hawari Rahman, golang-nuts
It is difficult to guess without knowing what HydrateCollection does.
But these numbers are increasing monotonically. Is it possible that
whatever connection HydrateCollection is operating on cannot handle
multiple concurrent requests? Maybe some transaction isolation
problem? So all HydrateCollection instances start running at the same
time, but the second one cannot actually start its work until the
first is done, and third cannot start until the second is done, etc.?
You can test if that's the case by logging the start and end times of
each HydrateCollection call and the collection each one is working on.

>
>
> Why does the HydrateCollection take a significant performance hit now that
> I'm using goroutine?
>
> HydrateCollection function takes a pointer to Collection struct, and
> mutating it inside the goroutine, so I think that it may be a problem. Also
> the HydrateCollection itself is a method of a Repository pointer which holds
> a database connection, the same Repository pointer is used by
> SelectCollections.
>
> hydrateChan := make(chan error, len(collections))
> for index := range collections {
>     go func(coll *Collection) {
>         hydrateChan <- r.hydrateCollection(coll)
>     }(&collections[index])
> }
>
> for i := 0; i < len(collections); i++ {
>     err := <-hydrateChan
>     if err != nil {
>         logrus.Error(err)
>     }
> }
>
> Is there something that I need to address in the previous code snippet?

Hawari Rahman

May 22, 2018, 2:21:33 AM
to golang-nuts
Hi Burak,

Thank you for your reply,
Inside the HydrateCollection function there are database queries (at least three), run using the same *sql.DB throughout the application.

> Is it possible that
> whatever connection HydrateCollection is operating on cannot handle
> multiple concurrent requests?

In that case, what should I look out for? I've always assumed that using a global *sql.DB would suffice.

Jan Mercl

May 22, 2018, 2:35:05 AM
to Hawari Rahman, golang-nuts
On Tue, May 22, 2018 at 7:12 AM Hawari Rahman <hawari....@gmail.com> wrote:

> Is there something that I need to address in the previous code snippet?

It's necessary to analyze and understand the code to figure out if adding concurrency makes sense. Just running many tasks in goroutines does not per se guarantee any improvements. Actually it can be the opposite.

Later you wrote that all of the goroutines access a DB. It's possible that some/all DB methods do lock the DB. So only one goroutine at a time can actually do something useful. The code is then essentially serial and adding many goroutines to execute it just adds scheduling and communication overhead for no good reason.

Also, even where concurrency makes sense, launching too many goroutines can again make things worse. Only benchmarks can tell for sure, but often the right number of workers is roughly the same as the number of CPU cores available.

--

-j

Hawari Rahman

May 22, 2018, 2:35:20 AM
to golang-nuts
After some more detailed investigation, it seems the most time-consuming step is a "Select by IDs" query. To hydrate a Collection object, I need to retrieve other objects through a "SELECT WHERE id = ANY(?)" query in Postgres. I'm curious how the performance can differ so greatly when concurrency is introduced.

Justin Israel

May 22, 2018, 4:45:29 AM
to Hawari Rahman, golang-nuts


On Tue, May 22, 2018, 6:35 PM Hawari Rahman <hawari....@gmail.com> wrote:
> After some more detailed investigation, it seems the most time-consuming step is a "Select by IDs" query. To hydrate a Collection object, I need to retrieve other objects through a "SELECT WHERE id = ANY(?)" query in Postgres. I'm curious how the performance can differ so greatly when concurrency is introduced.

Is the query doing a table scan or using the index?

Dave Cheney

May 22, 2018, 4:46:13 AM
to golang-nuts
The best tool to investigate this problem is the execution tracer. It will show you the activity of goroutines over time, making it easy to spot contention.

Hawari Rahman

May 22, 2018, 4:59:14 AM
to golang-nuts
Hi Justin,

Yes, it is using an index scan, which makes me even more puzzled.

Hawari Rahman

May 22, 2018, 5:02:28 AM
to golang-nuts
Hi Dave,

Yeah, I guess that's the way I'm going to go now. By the way, is there any catch in how I'm using channels in my snippet?

Burak Serdar

May 22, 2018, 9:30:39 AM
to Hawari Rahman, golang-nuts
On Tue, May 22, 2018 at 12:21 AM, Hawari Rahman
<hawari....@gmail.com> wrote:
> Hi Burak,
>
> Thank you for your reply,
> Inside the HydrateCollection function there are database queries (at least
> three), run using the same *sql.DB throughout the application.
>
>> Is it possible that
>> whatever connection HydrateCollection is operating on cannot handle
>> multiple concurrent requests?
>
>
> In that case, what should I look out for? I've always assumed that using a
> global *sql.DB will always suffice.


One thing you can try is to get multiple db connections instead of
using a shared one. If your db driver is serializing concurrent
operations on the same connection, the goroutines would run
sequentially, which seems to be the case.

Dave Cheney

May 22, 2018, 11:13:18 AM
to golang-nuts
The execution tracer will show this as it tracks resources that goroutines block on.

Seriously, I’m just going to keep suggesting the execution tracer until you try it :)