High CPU usage in syscall.Syscall when reading from multicast


Mario Kozjak

Apr 29, 2015, 9:46:19 AM
to golan...@googlegroups.com, Petar Koretić
Hi,

I wrote the following program: 

It reads data from a UDP socket and writes it to a file.
According to the attached SVG, there is heavy use of syscall.Syscall, which seems to lead to high CPU usage when listening on a multicast group.

The test was done on Ubuntu 14.04.1 LTS (Trusty Tahr), both on the host and in Docker.
sysctl.conf:
net.core.rmem_max=12582912
net.core.rmem_default=12582912

For one multicast socket read and file write, it uses around 40% CPU on an Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz (6 cores).
The linked program uses "github.com/davecheney/profile" to profile CPU usage. I got similar results without the profiler, by the way.
A profiling file is also attached to this email.
(bear in mind I'm using different stream sources/mcast addresses)

Host:

a) one stream:

avg cpu usage: 58.38 (pidstat)
(pidstat 3 -p 63890 - Linux 3.16.0-33-generic (vod1) 04/29/2015 _x86_64_ (24 CPU))

b) two streams:

avg cpu usage: 72.30 (pidstat)

c) three streams:

avg cpu usage: 80.67 (pidstat)


I have also tested the whole thing on KVM (Intel Xeon E312xx (Sandy Bridge)) without runtime.GOMAXPROCS(2):
cat /proc/cpuinfo | grep processor | wc -l
1

a) one stream:

cat /proc/18113/status | grep ctxt
voluntary_ctxt_switches:        1286
nonvoluntary_ctxt_switches:     2902

avg cpu usage: 1.14 (pidstat)


b) two streams:

cat /proc/18113/status | grep ctxt
voluntary_ctxt_switches:        6233
nonvoluntary_ctxt_switches:     14674

avg cpu usage: 7.14 (pidstat)


c) three streams:

cat /proc/18113/status | grep ctxt
voluntary_ctxt_switches:        19426
nonvoluntary_ctxt_switches:     49421

avg cpu usage: 12.81 (pidstat)


Context-switch counts were taken at similar intervals (two minutes after adding each new stream recording).


Does anyone have any idea why CPU usage is so high here? Are there any problematic parts in the gorecord code that would introduce such load?
Bear in mind the RPC methods in the code aren't used during testing.


Thank you in advance!

With regards,
Mario Kozjak
pprof009.svg
cpu.pprof

Mario Kozjak

May 4, 2015, 3:30:42 AM
to golan...@googlegroups.com, petar....@gmail.com
Any ideas on this, maybe?

Dave Cheney

May 4, 2015, 4:32:31 AM
to golan...@googlegroups.com, petar....@gmail.com
I had a quick look at the code and my suggestion is to focus on the error paths. In most cases when your code hits an error, it immediately jumps back to the top of the loop and retries the operation; this is basically busy spinning and probably accounts for the high CPU usage.

Also, in many cases where you are calling a Read method you are discarding the number of bytes returned. You should not do this, as you cannot assume that Read will return len(buf) bytes; you must reslice the buffer after the Read call with the number of bytes actually read.

Mario Kozjak

May 4, 2015, 5:23:53 AM
to golan...@googlegroups.com, Petar Paradzik, petar....@gmail.com
Hello, Dave!

First of all, thanks for taking a peek at the code!

For the errors you've pointed out, I believe you're talking about line 846 (https://github.com/mkozjak/gorecord/blob/cpu_profile/gorecord.go#L846).
I just did another test, printing the error if it occurs. Not a single one came up over a few minutes of testing (with one or two different multicast streams, for example).
That means 'continue' is never triggered.

The CPU was at 5.05% on average for one stream (read from UDP socket and write to file) and 10.52% for two.
The test was done on the KVM machine.

Regarding zero bytes returned from the socket: I have also tested this just now and I never get 0 bytes.
The multicast is definitely arriving. Each packet is 1328 bytes, so 'pkt' is always full. So that should not be the cause of the high CPU usage either.

Do you have any other thoughts on this, maybe?

With regards,
Mario Kozjak

James Bardin

May 4, 2015, 10:05:24 AM
to golan...@googlegroups.com, petar....@gmail.com, parad...@gmail.com


On Monday, May 4, 2015 at 5:23:53 AM UTC-4, Mario Kozjak wrote:

Regarding zero bytes returned from the socket: I have also tested this just now and I never get 0 bytes.
The multicast is definitely arriving. Each packet is 1328 bytes, so 'pkt' is always full. So that should not be the cause of the high CPU usage either.


That's still no reason not to do it correctly. You could be sending null bytes to something that doesn't expect them at some point, or, being UDP, you could get a fragmented packet with one part lost, junk packets hitting the socket, a broken packet from the source, etc. But yes, it shouldn't affect the CPU usage unless it's causing errors further down the line.

From the source, if you're receiving packets at any appreciable speed, you're going to be thrashing the allocator and GC. You create a new slice for every single 1328-byte packet. While it looks nice and clean to have a channel that takes slices to send to your writer goroutine, you can't get any appreciable throughput this way; there's just too much garbage generated. You need to reuse your buffers.

James Bardin

May 4, 2015, 10:07:37 AM
to golan...@googlegroups.com, petar....@gmail.com, parad...@gmail.com

From the source, if you're receiving packets at any appreciable speed, you're going to be thrashing the allocator and GC. You create a new slice for every single 1328-byte packet. While it looks nice and clean to have a channel that takes slices to send to your writer goroutine, you can't get any appreciable throughput this way; there's just too much garbage generated. You need to reuse your buffers.

Oh, and as for the profile, the pprof file isn't much use without the binary that created it, but I would lighten the load on the GC first.

Mario Kozjak

May 4, 2015, 11:07:15 AM
to golan...@googlegroups.com, parad...@gmail.com, petar....@gmail.com
Hi, James

Reusing the buffer didn't lower the CPU usage at all. For one channel it was about 10% on KVM, and 40% on the real server (!) (both the same servers as in the first post).


Regards,
mk

Mario Kozjak

May 4, 2015, 11:53:23 AM
to golan...@googlegroups.com, parad...@gmail.com, petar....@gmail.com
Also, here is an example application written in Qt, using a maximum of 2.5% CPU per channel (multicast listen and write to file) on both KVM and the host:

Mario Kozjak

May 10, 2015, 10:36:48 AM
to golan...@googlegroups.com, petar....@gmail.com, da...@cheney.net, parad...@gmail.com
Are there any other ideas? Is there something incorrect in my code that would produce such CPU consumption
(aside from the things already pointed out, which don't really account for this problem)?

I've already tried the Go git version, and the problem persists there, too.

Regards,
mk

Zlatko Calusic

Jun 24, 2015, 5:41:20 PM
to Mario Kozjak, golan...@googlegroups.com
On 04.05.2015 11:23, Mario Kozjak wrote:
> Hello, Dave!
>
> First of all, thanks for taking a peek at the code!
>
> For the errors you've pointed out, I believe you're talking about the
> line 846
> (https://github.com/mkozjak/gorecord/blob/cpu_profile/gorecord.go#L846).
> I just did another test printing out the error if it occurs. Not a
> single one came around for a few minutes of testing (with one or two
> different multicast streams, in example).
> Which means 'continue' is never triggered.
>
> The cpu was at 5.05% on average for one stream (read from udp socket and
> write to file) and 10.52% for two.
> The test was done on a kvm machine.
>
> Regarding zero bytes returned from the socket.. I have also tested this
> right now and I never get 0 bytes.
> The multicast is definitely arriving. Each packet is 1328 bytes large.
> So 'pkt' is always full. So that should not be the case of a big cpu
> usage either.
>
> Do you have any other thoughts on this, maybe?
>

Hello Mario,

I see you are using the obsolete and no-longer-maintained
code.google.com/p/go.net repo.

Why not switch to golang.org/x/net and see if you get any improvement
with the newer code? Additionally, the switch will let you compile with
the soon-to-be-released Go 1.5 and see if that brings any further improvement.

Hope it helps.

Regards,

--
Zlatko

Mario Kozjak

Jul 21, 2015, 10:19:00 AM
to golan...@googlegroups.com, da...@cheney.net, Petar Paradzik, Petar Koretić
Hello,

Seems like this line is somewhat problematic for our use case, where we have multiple multicast groups that need to be bound on the same port.


By removing the syscall.AF_INET case and rebuilding Go, we got a drastic change in performance.
--
Mario Kozjak

Mario Kozjak

Mar 17, 2016, 9:01:39 PM
to da...@cheney.net, golan...@googlegroups.com, Petar Koretić

Hello. We are seeing major improvements on this issue with Go 1.6.

Did anything change in this area, and what was the potential problem?
