High futex contention when reading from DB with mgo

494 views
Skip to first unread message

Chandra Sekar S

unread,
Jan 4, 2013, 2:48:18 PM1/4/13
to golan...@googlegroups.com
Hi,

I have this very simple piece of code which is similar to a hot path in our application. GOMAXPROCS is left at the default of 1.


When I tested (siege, concurrency 50), I found that it gave poor throughput. To confirm it was not a DB issue, I wrote a clone in node.js and ended up with twice the throughput.

I tried profiling and got these results.

Total: 2759 samples
     491  17.8%  17.8%      491  17.8% runtime.futex
     329  11.9%  29.7%      603  21.9% syscall.Syscall
     107   3.9%  33.6%      107   3.9% runtime.memmove
      66   2.4%  36.0%      165   6.0% hash_lookup
      65   2.4%  38.3%      334  12.1% runtime.mallocgc
      63   2.3%  40.6%      743  26.9% labix.org/v2/mgo/bson.(*decoder).readDocTo
      50   1.8%  42.4%      336  12.2% labix.org/v2/mgo/bson.(*decoder).readElemTo
      48   1.7%  44.2%      101   3.7% sort.StringSlice.Less
      47   1.7%  45.9%       49   1.8% sweepspan
      46   1.7%  47.6%       46   1.7% runtime.markallocated

How can I improve this?

--
Chandra Sekar.S

bryanturley

unread,
Jan 4, 2013, 3:38:41 PM1/4/13
to golan...@googlegroups.com

I know this has been noticed before by others, and I think they are working on a fix for it.
What version of go are you using?

I can't find it in the issue tracker though perhaps it has been solved in the lastest versions?

Dave Cheney

unread,
Jan 4, 2013, 9:28:27 PM1/4/13
to Chandra Sekar S, golan...@googlegroups.com
Can you please provide

The OS and version of Go you are using

The svg output of this profile. 

At some point someone will ask you test with tip, so you should go ahead and profile your code under tip. 

Cheers

Dave
--
 
 

Chandru

unread,
Jan 4, 2013, 11:09:23 PM1/4/13
to golan...@googlegroups.com
I am on tip.

Ubuntu 12.10
go version devel +be6ca9f6bfe8 Fri Jan 04 17:07:21 2013 +0100 linux/amd64

Profile SVG attached.

--
Chandra Sekar.S
mgo_profile.svg

Dave Cheney

unread,
Jan 4, 2013, 11:48:40 PM1/4/13
to Chandru, golan...@googlegroups.com
Thank you, can you rerun your test with

GOGCTRACE=1 and post the results
> --
>
>

Chandru

unread,
Jan 5, 2013, 1:19:14 AM1/5/13
to golan...@googlegroups.com
GC trace attached.

Is this trace always useful in debugging performance issues? If so, how should I interpret it?

--
Chandra Sekar.S
mgo_gc_trace

Alexey Borzenkov

unread,
Jan 5, 2013, 1:27:46 AM1/5/13
to Chandru, golan...@googlegroups.com
Interesting, like in my case I see you have an even split between futexwakeup and futexsleep calls. I wonder how many cores do you have on the server?


--
 
 

Dave Cheney

unread,
Jan 5, 2013, 1:33:56 AM1/5/13
to Alexey Borzenkov, Chandru, golan...@googlegroups.com
I think we can alter your test to emit a contention graph, which might
point to where the high contention is occuring.

I am surprised that the futex call is showing up in the profile, that
is usually an indication that GOMAXPROCS has been raised.
> --
>
>

Chandru

unread,
Jan 5, 2013, 1:42:30 AM1/5/13
to golan...@googlegroups.com
2 cores. Even binding the process to a single core with taskset didn't change things.

How do I get the contention graph?

--
Chandra Sekar.S

Dave Cheney

unread,
Jan 5, 2013, 1:55:35 AM1/5/13
to Chandru, golan...@googlegroups.com
Try this, http://play.golang.org/p/AjLmQE0Cvu

Not tested, might need a bit of tweaking. You could restructure your
code into a testing.Benchmark which would give you all this for free.

The result can be fed to pprof as before.
> --
>
>

Dave Cheney

unread,
Jan 5, 2013, 2:07:13 AM1/5/13
to Chandru, golan...@googlegroups.com
Try this

http://play.golang.org/p/0AuJ06v2wJ

place it in a pkg called `mgo`

Then you can use the usual testing flags, described at the bottom of
godoc cmd/go, and compare your results as improvements are made.

Cheers

Dave

Chandru

unread,
Jan 5, 2013, 2:13:27 AM1/5/13
to Dave Cheney, golan...@googlegroups.com
The profile output of your first example,

Total: 1567.764 seconds
1470.712  93.8%  93.8% 1470.712  93.8% sync.(*Cond).Wait
  97.009   6.2% 100.0%   97.009   6.2% runtime.chanrecv1
   0.043   0.0% 100.0%    0.043   0.0% sync.(*Mutex).Lock
   0.000   0.0% 100.0%    0.010   0.0% bufio.(*Reader).ReadLine
   0.000   0.0% 100.0%    0.010   0.0% bufio.(*Reader).ReadSlice
   0.000   0.0% 100.0%    0.010   0.0% bufio.(*Reader).fill
   0.000   0.0% 100.0%    0.010   0.0% io.(*LimitedReader).Read
   0.000   0.0% 100.0% 1470.713  93.8% labix.org/v2/mgo.(*Iter).Next
   0.000   0.0% 100.0%    0.000   0.0% labix.org/v2/mgo.(*Iter).getMore
   0.000   0.0% 100.0%    0.037   0.0% labix.org/v2/mgo.(*Query).Iter

I've also attached the SVG.

--
Chandra Sekar.S
block_prof.svg

Chandru

unread,
Jan 5, 2013, 2:20:59 AM1/5/13
to golan...@googlegroups.com
With the benchmark test, I get these.

Total: 3.370 seconds
   2.713  80.5%  80.5%    2.713  80.5% runtime.chanrecv1
   0.657  19.5% 100.0%    0.657  19.5% sync.(*Cond).Wait
   0.000   0.0% 100.0%    0.657  19.5% _/home/chandru/benchmark/mgo.BenchmarkMgoItems
   0.000   0.0% 100.0%    0.657  19.5% labix.org/v2/mgo.(*Iter).Next
   0.000   0.0% 100.0%    0.126   3.7% labix.org/v2/mgo.(*mongoSocket).readLoop
   0.000   0.0% 100.0%    0.126   3.7% labix.org/v2/mgo.fill
   0.000   0.0% 100.0%    2.587  76.8% main.main
   0.000   0.0% 100.0%    0.126   3.7% net.(*conn).Read
   0.000   0.0% 100.0%    0.126   3.7% net.(*netFD).Read
   0.000   0.0% 100.0%    0.126   3.7% net.(*pollServer).WaitRead

SVG attached.

--
Chandra Sekar.S
bench_prof.svg

Dave Cheney

unread,
Jan 5, 2013, 2:24:53 AM1/5/13
to Chandru, golan...@googlegroups.com
My brief interpretation of these results is your go process is waiting
on the mongo process, the high futex calls are because Iter calls
cond.Wait internally. This is similar to a goroutine waiting in
syscall.Syscall. The profiler just reports whoever is on top of the
call stack.

I cannot explain why the speed of mongodb is different to your other
(node ? was it ) test case, maybe others more experienced with tuning
mongodb can assist further.

Chandru

unread,
Jan 5, 2013, 3:02:09 AM1/5/13
to Dave Cheney, golan...@googlegroups.com
Thanks a ton for your help, Dave. Will take this discussion to the mgo mailing list.

--
Chandra Sekar.S

Dave Cheney

unread,
Jan 5, 2013, 3:11:24 AM1/5/13
to Chandru, golan...@googlegroups.com
No worries. Sorry I can't explain why, only how at this stage. 

Alexey Borzenkov

unread,
Jan 5, 2013, 4:02:19 AM1/5/13
to Chandra Sekar S, golang-nuts
Chandra,

I'm not entirely sure, but I looked at mgo source code and I have a hunch that lower throughput could be due to how mgo manages its sockets: it only acquires them for a duration of preparing an op and writing it to that socket, which means that with high concurrency multiple queries might be serializing on the same connection, hence lower throughput. I haven't looked into node.js driver, but normally mongo drivers seem to acquire sockets for the whole request-response duration. In the past I thought this mgo behaviour was actually good, since it improved concurrency, but if my hunch is correct this might be the side effect, since on the server side mongodb runs ops serially anyway...


--
 
 

Dmitry Vyukov

unread,
Jan 5, 2013, 5:13:51 AM1/5/13
to golan...@googlegroups.com
Hi,

This is a known issue in goroutine scheduler. Try setting GOMAXPROCS to 2/4/8, it can alleviate the problem somewhat.

Chandru

unread,
Jan 5, 2013, 5:49:57 AM1/5/13
to golang-nuts
That did improve it, but why? Shouldn't setting GOMAXPROCS higher make it worse?

While it has reduced the time spent on futex to 5.7%, syscall.Syscall still remains at 11.1%, hurting throughput.

--
Chandra Sekar.S

Dave Cheney

unread,
Jan 5, 2013, 6:02:44 AM1/5/13
to Chandru, golang-nuts
> While it has reduced the time spent on futex to 5.7%, syscall.Syscall still
> remains at 11.1%, hurting throughput.

I wouldn't obsess about those numbers, they aren't real numbers, just
proxies for time spent outside the view of your program, and hence the
profile. I would focus on straight mgo throughput, as that is your
original goal. For example, those syscall.Syscalls come from the read
path, from mgo.Fill. If you can reduce the number of reads the mgo
driver is doing, the amount of time reported as syscall.Syscall will
fall.

Chandru

unread,
Jan 5, 2013, 6:09:31 AM1/5/13
to golang-nuts
Yes, I understand that. That discussion belongs to the mgo list.

However, I still haven't understood why increasing GOMAXPROCS reduced futex contentions. Doesn't it always increase it?

--
Chandra Sekar.S

Dmitry Vyukov

unread,
Jan 5, 2013, 6:43:52 AM1/5/13
to Chandru, golang-nuts
On Sat, Jan 5, 2013 at 2:49 PM, Chandru <chand...@gmail.com> wrote:
> That did improve it, but why? Shouldn't setting GOMAXPROCS higher make it
> worse?

With GOMAXPROCS=1 runtime tries hard to keep only 1 running thread at
a time, thus excessive thread parks/unparks (futex ops). When
GOMAXPROCS=8 runtime may try not that hard to park and unpark threads.


> While it has reduced the time spent on futex to 5.7%, syscall.Syscall still
> remains at 11.1%, hurting throughput.

If you care only about performance, then you need to replace your
program with "func main() {}". That will be the fastest :)
Seriously, syscall.Syscall are socket read() and write() calls. It's
your useful work, runtime can't just eliminate it.

Chandru

unread,
Jan 5, 2013, 7:02:19 AM1/5/13
to golang-nuts
Thanks for taking the time to explain futex. Is proc.c source the best place to learn how the scheduler works or is there another documentation I've missed?

While I understand time spent on syscall is actual work, I'm trying to reduce it further because I could achieve much better performance (almost twice) with the node.js implementation. I may have either used mgo wrong or mgo's implementation may need some improvement.

--
Chandra Sekar.S

Dmitry Vyukov

unread,
Jan 5, 2013, 7:10:08 AM1/5/13
to Chandru, golang-nuts
On Sat, Jan 5, 2013 at 4:02 PM, Chandru <chand...@gmail.com> wrote:
> Thanks for taking the time to explain futex. Is proc.c source the best place
> to learn how the scheduler works or is there another documentation I've
> missed?

I am not aware of any other documentation.

> While I understand time spent on syscall is actual work, I'm trying to
> reduce it further because I could achieve much better performance (almost
> twice) with the node.js implementation. I may have either used mgo wrong or
> mgo's implementation may need some improvement.

When you set GOMAXPROCS to a higher value, are you able to fully load all cores?

I see that 30% of time is spent in bson decoding. I guess node.js uses
C library to that, right? Go implementation is not 100% optimal. I may
suggest trying to use gccgo which must generate much better straight
line code. However, I never actually investigated what benefits it can
provide for such program. Maybe somebody on the list can provide some
insights on this.

Chandru

unread,
Jan 5, 2013, 7:25:30 AM1/5/13
to golang-nuts
Yes, the node.js driver uses a C++ bson module. I was focusing on the time spent on each function (default sort order) and missed the total time in which it was running.

That could be it. Thanks.

--
Chandra Sekar.S

Alexey Borzenkov

unread,
Jan 5, 2013, 7:46:36 AM1/5/13
to Chandru, golang-nuts
Hi Chandru,

Could you try if this version would have better throughput?


I haven't tested it though, so it might have so typos...


--
 
 

Chandru

unread,
Jan 5, 2013, 11:32:00 AM1/5/13
to golang-nuts
Nope. That made it worse and made futex the bottleneck again.

--
Chandra Sekar.S
Reply all
Reply to author
Forward
0 new messages