Re: Profiling: 97.9% in runtime.mach_semaphore_timedwait

958 views
Skip to first unread message

Andy Balholm

unread,
Mar 29, 2013, 2:36:34 PM3/29/13
to golan...@googlegroups.com
As you stated in http://skydb.io/blog/introduction-to-behavioral-databases.html:

Transactions & Locks = Slow

The channel contains a lock. By pushing every request through the channel, you're creating tremendous contention on the lock. Is there any way to eliminate that bottleneck?

I haven't really looked at your code; I'm just thinking about what can happen when you have too many goroutines trying to read or write the same channel. If you could eliminate the bottleneck, your code would probably run about 50 times faster (since it's wasting 98% of its time in lock contention—at least I suspect that is where mach_semaphore_timedwait is being called). It would probably also be able to take advantage of both CPU cores.

b...@skylandlabs.com

unread,
Mar 29, 2013, 3:37:40 PM3/29/13
to golan...@googlegroups.com
Andy-

I tried pulling out the server level channel and it only slightly improved performance. I'll play around with it some more and see if I can figure it out. The locks are unfortunately necessary on insert. The query speed is really where I'm trying to optimize and that effectively uses no locks and is blazingly fast (10MM records/sec/core). I'm just hoping to bump up my insert performance. Something doesn't seem quite right with it.


Ben

Carlos Castillo

unread,
Mar 29, 2013, 5:31:39 PM3/29/13
to golan...@googlegroups.com
Have you looked at the result of the web command in pprof? It shows a graphical representation of the profile, and you can see which code is calling mach_semaphore_timedwait (and with what frequency). Sometimes I see the problem better with this view of the data.

Also, are you using go-1.0.3 or tip (soon to be go-1.1)? Tip includes block profiling, where you can find out how long and where your code is blocking.

On Friday, March 29, 2013 9:46:53 AM UTC-7, b...@skylandlabs.com wrote:
I recently rewrote my open source database in Go and it's been a great experience. I ran into a profiling issue that I couldn't find anything on Google about. I'm importing data from the GitHub Archive into my database and while running profiling it's showing that 97.9% of the time is spent in runtime.mach_semaphore_timedwait:

blackdog2:sky benbjohnson$ go tool pprof http://localhost:8585/debug/pprof/profile
Gathering CPU profile from http://localhost:8585/debug/pprof/profile?seconds=30 for 30 seconds to
  /Users/benbjohnson/pprof/a.out.1364574658.localhost-port8585
Be patient...
Wrote profile to /Users/benbjohnson/pprof/a.out.1364574658.localhost-port8585
Welcome to pprof!  For help, type 'help'.
(pprof) top10
Total: 1504 samples
    1473  97.9%  97.9%     1473  97.9% runtime.mach_semaphore_timedwait
      18   1.2%  99.1%       18   1.2% runtime.sigprocmask
      13   0.9% 100.0%       13   0.9% runtime.mach_semaphore_signal
       0   0.0% 100.0%       31   2.1% net.(*pollServer).Run
       0   0.0% 100.0%       13   0.9% net.(*pollServer).WakeFD
       0   0.0% 100.0%       18   1.2% net.(*pollster).WaitFD
       0   0.0% 100.0%     1473  97.9% runtime.MHeap_Scavenger
       0   0.0% 100.0%       13   0.9% runtime.chansend
       0   0.0% 100.0%       13   0.9% runtime.chansend1
       0   0.0% 100.0%       18   1.2% runtime.exitsyscall

The database demultiplexes the Go HTTP router through a single channel to grab a reference to the table and then it sends the actual processing of the insert of the data to a separate event loop inside a servlet.


The server is running at about 110% CPU (on my dual-core MacBook Pro) and the importer is inserting one item at a time. I'm getting throughput of about 400 items/second which seems low.

What is causing the runtime.mach_semaphore_timedwait to occur? I was thinking that it could just be idle waiting for the importer in between inserts but the CPU usage seems really high for that.


Ben Johnson

Ben Johnson

unread,
Mar 29, 2013, 5:56:58 PM3/29/13
to Carlos Castillo, golan...@googlegroups.com
Thanks, Carlos. I attached the svg output. This one doesn't actually show any of my code though. I assume it's because all the samples that were taken were done in the runtime. The svg output before I removed the server level channel did show some of my code but I don't have a copy any longer. It seems to all be coming from runtime.MHeap_Scavenger. Would the GC waiting on something?

Ben
skyd-pprof.svg

Dave Cheney

unread,
Mar 29, 2013, 8:13:42 PM3/29/13
to Ben Johnson, Carlos Castillo, golan...@googlegroups.com
Can you try two things for me.

1. run your program under GOGCTRACE=1, you don't need to be profiling.
This may confirm a suspicion that you have a spinning goroutine
blocking progress of the gc.

2. profile it on a system other than OS X.

Cheers

Dave
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "golang-nuts" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/golang-nuts/G1uBDhG-TCE/unsubscribe?hl=en-US.
> To unsubscribe from this group and all its topics, send an email to
> golang-nuts...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>
>
>

Ben Johnson

unread,
Mar 29, 2013, 8:42:08 PM3/29/13
to Dave Cheney, Carlos Castillo, golan...@googlegroups.com
Dave-

Thanks for the help. I attached the output of running with GOGCTRACE=1. I'll setup a linux box to profile on as well and get back to you with those results.


Ben

skyd.out

Dave Cheney

unread,
Mar 29, 2013, 8:46:13 PM3/29/13
to Ben Johnson, Carlos Castillo, golan...@googlegroups.com
Thanks. That output looks fine, there is no obvious blocking for the
gc, in fact you're barely using any heap at all.

Ingo Oeser

unread,
Mar 29, 2013, 9:51:47 PM3/29/13
to golan...@googlegroups.com
On Friday, March 29, 2013 8:37:40 PM UTC+1, b...@skylandlabs.com wrote:
Andy-

I tried pulling out the server level channel and it only slightly improved performance. I'll play around with it some more and see if I can figure it out. The locks are unfortunately necessary on insert. The query speed is really where I'm trying to optimize and that effectively uses no locks and is blazingly fast (10MM records/sec/core). I'm just hoping to bump up my insert performance. Something doesn't seem quite right with it.


AFAICS there are two levels of channels:
* per servlet
* per server

Since both channels are unbuffered and you just call async and wait afterwards, couldn't you just skip the Servlet channel processing loop and execute the function directly?
Did you try using buffered channels to get better batching?

Order of operations isn't guaranteed with the current model anyway. Only that exactly one message per leveldb is executed at any time. Is that the goal?
If it is, you might try a sync.Mutex instead of a unbuffered channel.

PS: I like the project very much. Would be very useful at work.

Dmitry Vyukov

unread,
Mar 30, 2013, 1:02:31 AM3/30/13
to Ben Johnson, Carlos Castillo, golang-nuts
Looks like profiler bug -- all profiling signals are delivered to the
blocked scavenger thread. Perhaps you are using unsupported version of
OS X.
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>
>
> On Mar 29, 2013, at 3:31 PM, Carlos Castillo wrote:
>

Ben Johnson

unread,
Apr 1, 2013, 12:44:10 AM4/1/13
to golang-nuts
Thanks again to everyone that helped me out. I spent the weekend doing some research and reworking a lot of the structure of my database. I had a misunderstanding about how LevelDB locks iterators so I thought I had to serialize access. Luckily that's not the case so I dropped my message loops and got everything down to a single lock per servlet and the code is greatly simplified. Hurray!

It also turns out that the GitHub Archive data is not strictly chronological as I originally thought so I was performing a lot of random inserts (which my database is not optimized for) instead of appends (which my database is optimized for). When I ran against append only data I was able to get about 2,000+ appends per second which seems more reasonable.

I'll let you all know once I'm able to profile on Linux and I'll post the results.

Ingo Oeser- Sky has a Google Group if you're interested (https://groups.google.com/d/forum/skydb). Let me know if I can help.


Ben

Val

unread,
Dec 26, 2014, 3:47:49 PM12/26/14
to golan...@googlegroups.com
FWIW, I also encounter the "99% spent in runtime.mach_semaphore_timedwait" in my toy program.

On OSX 10.9.3 with go1.3, then go1.4 as well, but not every time (I sometimes get lucky and produce a meaningful graph).
This corroborates the previous posts in this thread. My code does only heavy computation so it looks a bit like a "spinning loop": no channels, almost no heap allocation, very few function calls.

I can't give the equivalent results in Linux, but will be able to in a few days.

Jonathan Lawlor

unread,
Dec 26, 2014, 4:11:22 PM12/26/14
to golan...@googlegroups.com
This is an old thread, but there is an issue with OSX: https://github.com/golang/go/issues/6047
Reply all
Reply to author
Forward
0 new messages