Endlessly increasing CPU usage problem

Ian Ragsdale

Jan 30, 2013, 12:12:20 PM
to golan...@googlegroups.com
Hi all, I have a go process with a really strange (interesting?) CPU usage issue.  Over the course of about 24 hours, the CPU usage increases from very low (what I would expect) to 100% of CPU, for no apparent reason.

This process keeps a set of IMAP connections open to send a notification when a message comes in, and right now it's handling maybe 20 accounts, so it's got something like 20 persistent connections open and then performs an HTTP request anytime it sees a new message.  So, it should be sitting idle waiting on network data to come in some very high percentage of the time and then making a call using the built-in HTTP client on an occasional basis.

So, I'd expect CPU usage to be very low until some messages come in and then spike, but instead I see this slow, steady CPU usage increase over time. Normally I would take that as a sign of some data structure somewhere getting steadily larger, but the memory usage stays very steady around 15MB, so that doesn't seem to be happening, and I don't see anything obvious growing when looking at a heap profile.

Ordinarily, here's where I'd expect doing a CPU profile to be quite useful, but I'm seeing some really strange results. I've set up a goroutine to do a 3 minute profile and then sleep for 5 minutes continuously, so I can look at the results over time. The first profile looks good and makes sense, but the issue hasn't shown up yet. All subsequent profiles seem valid in the pprof tool, but always contain fewer than 10 samples, which is obviously not right for a 3 minute profile. Those samples don't really give me any info to go on.
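
A minimal sketch of the profile-then-sleep goroutine described above, assuming the standard runtime/pprof package. The helper names (profileOnce, profileLoop) and the cpu-NNN.prof file naming are illustrative and not taken from the original program.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"runtime/pprof"
	"time"
)

// profileOnce records a CPU profile lasting d into the file at path.
func profileOnce(path string, d time.Duration) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		return err
	}
	time.Sleep(d)
	pprof.StopCPUProfile() // flushes the profile to f
	return nil
}

// profileLoop alternates between profiling and sleeping, writing one
// numbered file per round. rounds <= 0 means "run forever".
// It returns the number of profiles written.
func profileLoop(prefix string, profileFor, sleepFor time.Duration, rounds int) (int, error) {
	n := 0
	for rounds <= 0 || n < rounds {
		path := fmt.Sprintf("%s-%03d.prof", prefix, n)
		if err := profileOnce(path, profileFor); err != nil {
			return n, err
		}
		n++
		time.Sleep(sleepFor)
	}
	return n, nil
}

func main() {
	go func() {
		// 3 minutes of profiling, then 5 minutes of sleep, as in the post.
		if _, err := profileLoop("cpu", 3*time.Minute, 5*time.Minute, 0); err != nil {
			log.Printf("profiling stopped: %v", err)
		}
	}()
	select {} // stand-in for the real program's work
}
```

Writing each round to its own file keeps the early (healthy) profile around for comparison with later ones.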

So, does anybody have any suggestions about what could be causing this or how I might track it down? A heap profile doesn't show any large data structures, a goroutine sample doesn't show a large number of goroutines piled up (and the memory usage would show that), and a CPU profile isn't showing me anything. I'm kind of stuck.

Thanks,
Ian

minux

Jan 30, 2013, 12:18:16 PM
to Ian Ragsdale, golan...@googlegroups.com
Perhaps you can use the "net/http/pprof" package and look at the live profile data
or view the goroutine stack trace at any instant.
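
A minimal sketch of that suggestion, assuming the program can afford to open one extra local port. The helper name startPprofServer and the address localhost:6060 are illustrative choices; the blank import of net/http/pprof is what registers the /debug/pprof/ handlers.

```go
package main

import (
	"log"
	"net"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/ handlers on http.DefaultServeMux
)

// startPprofServer serves http.DefaultServeMux (and with it the pprof
// handlers) on addr in a background goroutine. It returns the listener so
// the caller can read the bound address or close it later.
func startPprofServer(addr string) (net.Listener, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, err
	}
	go func() {
		// A nil handler means http.DefaultServeMux.
		if err := http.Serve(ln, nil); err != nil {
			log.Printf("pprof server stopped: %v", err)
		}
	}()
	return ln, nil
}

func main() {
	ln, err := startPprofServer("localhost:6060")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("pprof listening on http://%s/debug/pprof/", ln.Addr())
	select {} // stand-in for the real program's work
}
```

With this running, http://localhost:6060/debug/pprof/goroutine?debug=2 shows the full goroutine stacks, and go tool pprof http://localhost:6060/debug/pprof/profile collects a live CPU profile.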

bryanturley

Jan 30, 2013, 12:19:09 PM
to golan...@googlegroups.com
Perhaps track system stats external to your program as well.
Using a full system profiler might help if your code is causing the system (kernel/other) to behave badly.

Dave Cheney

Jan 30, 2013, 5:00:25 PM
to Ian Ragsdale, golan...@googlegroups.com
Hi Ian,

Can you please provide the following details

* your Go version and operating system details
* the full output from the process when sent a SIGQUIT, once it has entered the high CPU usage scenario.
* the full output from running the process with GOGCTRACE=1 as above. 
* the source, if possible, or at least a sample that reproduces the issue.
* if you are using Linux, try running the process under the perf(1) tool. 
* if you are able, try running the process under strace(1).

My gut feeling is that your code, or a library, is not checking an error code from a socket operation and is entering a tight loop. If this is the case, temporarily breaking the networking on this host may induce the failure.
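
To illustrate the failure mode described above, here is a small sketch of a read loop that does check the error. The function names readLines and demo are hypothetical, and net.Pipe stands in for a real socket; nothing here is taken from the original program.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"net"
)

// readLines passes each line from conn to handle and returns as soon as a
// read fails. Dropping the error (for example `line, _ := r.ReadString('\n')`
// inside `for {}`) would spin at full CPU once the peer goes away, because
// every later read fails immediately instead of blocking.
func readLines(conn net.Conn, handle func(string)) error {
	r := bufio.NewReader(conn)
	for {
		line, err := r.ReadString('\n')
		if line != "" {
			handle(line)
		}
		if err != nil {
			if errors.Is(err, io.EOF) {
				return nil // peer closed the connection cleanly
			}
			return err
		}
	}
}

// demo feeds one line through an in-memory connection, then closes it,
// and returns what the read loop saw.
func demo() ([]string, error) {
	client, server := net.Pipe()
	go func() {
		fmt.Fprint(server, "* 1 EXISTS\n")
		server.Close()
	}()
	var got []string
	err := readLines(client, func(l string) { got = append(got, l) })
	return got, err
}

func main() {
	got, err := demo()
	fmt.Printf("lines=%q err=%v\n", got, err)
}
```

The important part is that the loop has an exit path for every error, so a dead connection ends the goroutine (or triggers a reconnect with a delay) rather than being retried immediately.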

Cheers

Dave

Ian Ragsdale

Jan 30, 2013, 8:57:33 PM
to Dave Cheney, golan...@googlegroups.com
Thanks Dave, those look like great suggestions.  I'm running Go 1.0.3 on Ubuntu:

# go version
go version go1.0.3

# uname -a
Linux 4bf343d4-786f-4fa6-b37c-8bdd018fc63b 2.6.32-350-ec2 #57-Ubuntu SMP Thu Nov 15 15:59:03 UTC 2012 x86_64 GNU/Linux

I'll work on gathering the SIGQUIT & GOGCTRACE output - didn't know about those techniques.  I can't really share the source, and I'm not sure if there's a good way to cut it down to a sharable chunk, but I'll give it a shot.  Strace was going to be my next move.

I think your theory in general makes sense, as I'm familiar with that kind of failure, but if that was the case, would it not immediately shoot up to 100% cpu usage as soon as the problem occurred?  In this situation, the cpu usage creeps up quite slowly, in a nearly perfect linear fashion over the course of many hours.

- Ian

Anon

Nov 13, 2018, 9:26:29 AM
to golang-nuts
Hi Ian,

I am facing the same issue. The CPU usage on my servers increases from very low to 100% of CPU. Did you find the solution to your problem?

Go version: 1.10.1
OS: CentOS

Thanks,

Rustam Abdullaev

Mar 8, 2021, 12:25:11 PM
to golang-nuts
You might have a ticker leak.
Check that you're not accidentally calling time.NewTicker in a loop.
Get a pprof heap dump and look for time.NewTicker objects. The total size of them should not be large. If it is, then you have a ticker leak.
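
A sketch of the pattern to look for, with the function names pollLeaky and pollFixed made up for illustration. The leaky version creates a ticker on every pass through the loop and never stops it; the fixed version creates one ticker and stops it on return.

```go
package main

import (
	"fmt"
	"time"
)

// pollLeaky shows the bug: a new ticker per iteration, never stopped.
// Each abandoned ticker keeps firing in the background.
func pollLeaky(d time.Duration, rounds int, work func(int)) {
	for i := 0; i < rounds; i++ {
		t := time.NewTicker(d) // BUG: should be created once, outside the loop
		<-t.C
		work(i)
	}
}

// pollFixed creates a single ticker for the whole loop and stops it
// when the function returns.
func pollFixed(d time.Duration, rounds int, work func(int)) {
	t := time.NewTicker(d)
	defer t.Stop()
	for i := 0; i < rounds; i++ {
		<-t.C
		work(i)
	}
}

func main() {
	pollFixed(10*time.Millisecond, 3, func(i int) { fmt.Println("tick", i) })
}
```

If only a single wait is needed per iteration, time.Sleep or a time.Timer that is stopped after use avoids the ticker entirely.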

Uli Kunitz

Mar 9, 2021, 3:43:30 PM
to golang-nuts
Ian,

I recommend using a newer version. Go 1.0.3 was released in September 2012.

Danny Carr

Jan 19, 2023, 12:56:40 AM
to golang-nuts
@Rustam Abdullaev you're a lifesaver!! I was having this same issue with rising CPU and was going crazy trying to find the resource leak until I saw your comment - pulled up a heap dump and saw 75% time.NewTicker, found a dangling ticker I was instantiating in a goroutine, added a defer ticker.Stop() and voila, problem solved.
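
A minimal sketch of that kind of fix, assuming a goroutine that does periodic work until it is told to stop. The name startWorker and the done/exited channels are illustrative, not the actual code from this post.

```go
package main

import (
	"fmt"
	"time"
)

// startWorker runs fn on every tick until done is closed. The deferred Stop
// releases the ticker when the goroutine exits. The returned channel is
// closed once the goroutine has finished.
func startWorker(interval time.Duration, done <-chan struct{}, fn func()) <-chan struct{} {
	exited := make(chan struct{})
	go func() {
		defer close(exited)
		ticker := time.NewTicker(interval)
		defer ticker.Stop() // without this, the ticker outlives the goroutine
		for {
			select {
			case <-ticker.C:
				fn()
			case <-done:
				return
			}
		}
	}()
	return exited
}

func main() {
	done := make(chan struct{})
	exited := startWorker(10*time.Millisecond, done, func() { fmt.Println("tick") })
	time.Sleep(35 * time.Millisecond)
	close(done)
	<-exited
}
```

Putting the defer directly after time.NewTicker makes it hard to add an early return later that forgets to stop the ticker.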

For my own edification and that of future debuggers: as far as I understand it, the issue with leaked tickers is that the time package uses the netpoller (the interaction there is beyond me) to continuously loop over all timers on the heap, checking whether each is ready to fire and running the associated function if so. For tickers, that means every new ticker produces a new call to time.sendTime() on the configured interval until its timer is removed from the heap via Ticker.Stop(). That call to time.sendTime() runs a non-blocking send of time.Now() to the ticker channel (which explicitly never gets closed, as per the ticker docs). Altogether, that comprises the 'tight loop' that Dave suggested earlier in this thread.
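
The non-blocking send mentioned above can be illustrated with the usual select/default idiom. This is only a sketch of the idiom, not the time package's actual source; trySend and demo are hypothetical names, and the one-slot buffer is an assumption made for the demonstration.

```go
package main

import (
	"fmt"
	"time"
)

// trySend delivers now on c if there is room and otherwise drops it,
// so the sender never blocks on a slow or absent receiver.
func trySend(c chan<- time.Time, now time.Time) bool {
	select {
	case c <- now:
		return true
	default:
		return false
	}
}

// demo sends twice on a one-slot channel that nobody reads from.
func demo() (first, second bool) {
	c := make(chan time.Time, 1) // one-slot buffer (assumed for this sketch)
	first = trySend(c, time.Now())  // buffer empty: the value is stored
	second = trySend(c, time.Now()) // buffer full: the value is dropped
	return first, second
}

func main() {
	first, second := demo()
	fmt.Println(first, second) // prints "true false"
}
```

Because a dropped send costs almost nothing on its own, one leaked ticker is invisible; the CPU cost only shows up once many of them are firing on every interval.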

Is it just me or could this be made a little clearer in the docs? I suppose you're meant to implicitly understand that any allocated resources need to be cleaned up, but this rather mystifying rising CPU issue seems like it could be worth an explicit mention.