runtime/pprof - pausing CPU profiling

149 views
Skip to first unread message

Stephen Illingworth

unread,
Nov 12, 2024, 6:16:56 AMNov 12
to golang-nuts
Hello,

I want to create a runtime CPU profile but to restrict the profile to a specific part of the program. I currently have a way of taking a CPU profile for the entire program so have experience with the runtime/pprof package but this is a new requirement for me.

The 'section' to be profiled represents a significant amount of code and I'm looking to see if any particular part of that section can be improved. The problem I have is that the program outside of that section is significantly larger (ie. larger CPU load) and dominates the profile data.

The section to be profiled is entered and exited 60 times per second (at a minimum).

In effect, I want to pause the CPU profile when execution exits the section and to resume when the profile enters it again. Stopping and restarting the profile is too slow.

But maybe there's a completely different way to do this. Does anyone have experience with this type of thing?

Regards
Stephen

Jason E. Aten

unread,
Nov 12, 2024, 9:06:56 PMNov 12
to golang-nuts
Idea 0: Can you just click on the graphviz boxes in the pprof http page
for your section and then just show the source for only those? 
Might give you a quick idea of which are the hottest spots only 
among those func of interest.

Idea 1: Can you write a test or benchmark that just exercises the important section?

I suspect you could then run the profiler on your benchmark/test.
If not, it might be worth refactoring your code to allow this. 
Otherwise you could write an alternative "main" as a harness that just calls
the important section.

Idea 2: referring to the StartCPUProfile docs:
> On Unix-like systems, StartCPUProfile does not work by default for Go code built 
> with -buildmode=c-archive or -buildmode=c-shared. StartCPUProfile relies on the 
> SIGPROF signal, but that signal will be delivered to the main program's SIGPROF 
> signal handler (if any) not to the one used by Go. To make it work, call os/signal.Notify 
> for syscall.SIGPROF, but note that doing so may break any profiling being done by the main program.

This makes it sounds like the SIGPROF signal is used to do the sampling. So
you might be able to manipulate it (e.g. ignore it when not in your code). Seems like
a hack, but might be worth experimenting with.

Idea 3: post processing. You could probably take apart the profile log after it is
recorded and discard the samples that are irrelevant.

Idea 4: simple manual timing. Insert time.Now() and time.Since() calls at
strategic points, and measure the improvement as you tweak your code.

scott beeker

unread,
Nov 13, 2024, 1:05:36 AMNov 13
to Jason E. Aten, golang-nuts

create a runtime CPU profile but to restrict the profile to a specific part of the program.  runtime/pprof - pausing CPU profiling

To create a runtime CPU profile for a specific part of a Go program, you can use the `runtime/pprof` package. Here's how you can do it:

1. **Import the Package**: Add `import "runtime/pprof"` to your Go file.

2. **Start Profiling**: Use `pprof.StartCPUProfile(file)` to start profiling, where `file` is an `os.File` object to write the profile data.

3. **Restrict Profiling**: Place `pprof.StartCPUProfile()` before the code section you want to profile and `pprof.StopCPUProfile()` immediately after that section.

4. **Stop Profiling**: Ensure you call `defer pprof.StopCPUProfile()` to stop profiling and flush data when the function exits[2][3][4].

Citations:
[1] Gperftools CPU Profiler https://gperftools.github.io/gperftools/cpuprofile.html
[2] Unlocking Performance Insights: CPU Profiling in Go https://www.codingexplorations.com/blog/unlocking-performance-insights-cpu-profiling-in-go
[3] Profiling Go Programs https://go.dev/blog/pprof
[4] runtime/pprof - Go Packages https://pkg.go.dev/runtime/pprof
[5] Runtime profiling - HPC Wiki https://hpc-wiki.info/hpc/Runtime_profiling
[6] CPU Profiler — go-profiler-notes documentation - Datadog has moved https://datadoghq.dev/go-profiler-notes/profiling/cpu-profiler.html
[7] Profiling Go programs with pprof - Julia Evans https://jvns.ca/blog/2017/09/24/profiling-go-with-pprof/
[8] CPU profiling in the Performance Profiler - Visual Studio (Windows) https://learn.microsoft.com/en-us/visualstudio/profiling/cpu-usage?WT.mc_id=studentamb_228125&view=vs-2022


--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/golang-nuts/c6c0833e-a1d0-4132-87cf-c03b7f6a0ab6n%40googlegroups.com.

Stephen Illingworth

unread,
Nov 13, 2024, 3:41:06 AMNov 13
to golang-nuts
Thank's Jason. These are all good ideas:

On Wednesday 13 November 2024 at 02:06:56 UTC Jason E. Aten wrote:
Idea 0: Can you just click on the graphviz boxes in the pprof http page
for your section and then just show the source for only those? 
Might give you a quick idea of which are the hottest spots only 
among those func of interest.

Yes. This was my original thought. We can "focus" on the function that's at the start of the subtree I'm interested in. However, the statistics don't scale. In other words, I'd like the root node of the subtree to represent a 100% of cumulative CPU time in the profile.
 
Idea 1: Can you write a test or benchmark that just exercises the important section?

I suspect you could then run the profiler on your benchmark/test.
If not, it might be worth refactoring your code to allow this. 
Otherwise you could write an alternative "main" as a harness that just calls
the important section.

Maybe. However, the code that is outside of the subtree is important because of the data that is used by the subtree. The problem is that the path to the subtree is complex and also compute heavy. I suppose I could create a suitable test harness for this purpose, but it's really live information I'm interested in here.

But I'll give this some more thought.
 
Idea 2: referring to the StartCPUProfile docs:
> On Unix-like systems, StartCPUProfile does not work by default for Go code built 
> with -buildmode=c-archive or -buildmode=c-shared. StartCPUProfile relies on the 
> SIGPROF signal, but that signal will be delivered to the main program's SIGPROF 
> signal handler (if any) not to the one used by Go. To make it work, call os/signal.Notify 
> for syscall.SIGPROF, but note that doing so may break any profiling being done by the main program.

This makes it sounds like the SIGPROF signal is used to do the sampling. So
you might be able to manipulate it (e.g. ignore it when not in your code). Seems like
a hack, but might be worth experimenting with.

I had a quick look at the pprof package to see if there was anything obvious I could do. I didn't go into runtime package however, which I suspect I would need to. I'd probably get out of my depth very quickly once I was in there.

Idea 3: post processing. You could probably take apart the profile log after it is
recorded and discard the samples that are irrelevant.

I think this is probably the best and most sustainable solution. A post-processing filter to create a profile containing just the subtree and scaled statistics would be good.
 
Idea 4: simple manual timing. Insert time.Now() and time.Since() calls at
strategic points, and measure the improvement as you tweak your code.

I would normally go for something like this but the subtree is quite complex. Manual timing points wouldn't necessarily reveal what I want to see - and I would need to know where the (possible) issues are before I added the timing calls! I suppose I could use something like this to bisect where a problem might be, but that sounds like it would take a long time and not be repeatable.

But as I said, these are all great ideas. Thanks for taking the time to outline them.


As I was writing this reply it occurred to me that maybe a continuous profiling solution might be good for my case. Does anyone have any recommendations for continuous profiling with the kind of detail offered by pprof?

Thanks
Stephen


Stephen Illingworth

unread,
Nov 13, 2024, 3:43:18 AMNov 13
to golang-nuts
On Wednesday 13 November 2024 at 06:05:36 UTC scott beeker wrote:

2. **Start Profiling**: Use `pprof.StartCPUProfile(file)` to start profiling, where `file` is an `os.File` object to write the profile data.

3. **Restrict Profiling**: Place `pprof.StartCPUProfile()` before the code section you want to profile and `pprof.StopCPUProfile()` immediately after that section.

4. **Stop Profiling**: Ensure you call `defer pprof.StopCPUProfile()` to stop profiling and flush data when the function exits[2][3][4].

Hi Scott. Thanks for replying.

I've tried but this unfortunately, the Start and Stop processes are too expensive and really require writing to a different file for every stop. The nature of the program means I need to do the Start/Stop process 60+ times per second, so it would generate a lot of files and be very slow on top.

Jason E. Aten

unread,
Nov 13, 2024, 7:02:12 AMNov 13
to Stephen Illingworth, golang-nuts
> I've tried but this unfortunately, the Start and Stop processes are too expensive and really require writing to a different file for every stop. The nature of the program means I need to do the Start/Stop process 60+ times per second, so it would generate a lot of files and be very slow on top.

Note that the StartCPUProfile call takes an io.Writer. Therefore a simple *bytes.Buffer suffices. No files need be created. You can be much faster than the file system.

> func StartCPUProfile(w io.Writer) error

Writing a benchmark for your subsystem is the hardest mentally, but most likely the best choice for the long term.


--
You received this message because you are subscribed to a topic in the Google Groups "golang-nuts" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/golang-nuts/Emrx_W9eSig/unsubscribe.
To unsubscribe from this group and all its topics, send an email to golang-nuts...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/golang-nuts/e21aad0f-04f9-4740-8ce7-34afaa65d17dn%40googlegroups.com.

Stephen Illingworth

unread,
Nov 13, 2024, 7:54:52 AMNov 13
to golang-nuts
On Wednesday 13 November 2024 at 12:02:12 UTC Jason E. Aten wrote:
> I've tried but this unfortunately, the Start and Stop processes are too expensive and really require writing to a different file for every stop. The nature of the program means I need to do the Start/Stop process 60+ times per second, so it would generate a lot of files and be very slow on top.

Note that the StartCPUProfile call takes an io.Writer. Therefore a simple *bytes.Buffer suffices. No files need be created. You can be much faster than the file system.

Yes. I've tried that too. But it's the call to StopCPUProfile() which takes a lot of time. Using a bytes.Buffer it takes over 200ms to complete. It's not much quicker than using a os.File as the writer implementation.

My deadline for everything is less than 16ms so the 'pausing' process needs to take a lot less time than that.

I could maybe live with the delay for benchmarking purposes but I'm also left with the problem of "concatenated profiles". pprof doesn't support that. If it did or if there was a way of post-processing the concatenated profiles into one profile that might be a satisfactory outcome.
 
> func StartCPUProfile(w io.Writer) error

Writing a benchmark for your subsystem is the hardest mentally, but most likely the best choice for the long term.

You're absolutely right of course. But I was hoping for a mechanism that could be activated when a user detects a problem with their own input data. That might still be possible but I'll have to give it some thought.

Thanks for the reply!

Jason E. Aten

unread,
Nov 13, 2024, 8:31:53 AMNov 13
to Stephen Illingworth, golang-nuts
All good. I still think that the hack of blocking and restoring the runtime's SIGPROF signal 
handler might actually work though.  It might not work. But if it does, it is almost
exactly what you were looking for.

Ian or others more knowledgeable myself might be able to advise better if it _should_
work, but maybe give it a shot. You could ask on gopherslack too for ideas.




--
You received this message because you are subscribed to a topic in the Google Groups "golang-nuts" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/golang-nuts/Emrx_W9eSig/unsubscribe.
To unsubscribe from this group and all its topics, send an email to golang-nuts...@googlegroups.com.

Jason E. Aten

unread,
Nov 13, 2024, 8:41:53 AMNov 13
to Stephen Illingworth, golang-nuts
(On gopherslack, the  #darkarts folks are performance oriented....)

Ian Lance Taylor

unread,
Nov 13, 2024, 10:57:42 AMNov 13
to Jason E. Aten, Stephen Illingworth, golang-nuts
On Wed, Nov 13, 2024, 5:31 AM Jason E. Aten <j.e....@gmail.com> wrote:
All good. I still think that the hack of blocking and restoring the runtime's SIGPROF signal 
handler might actually work though.  It might not work. But if it does, it is almost
exactly what you were looking for.

Ian or others more knowledgeable myself might be able to advise better if it _should_
work, but maybe give it a shot. You could ask on gopherslack too for ideas.


It does seem like blocking SIGPROF should work.  Still takes a system call, of course.

That said for this kind of analysis I would probably try the Linux perf tool.

Ian



On Wed, Nov 13, 2024 at 12:55 PM Stephen Illingworth <stephen.i...@gmail.com> wrote:
On Wednesday 13 November 2024 at 12:02:12 UTC Jason E. Aten wrote:
> I've tried but this unfortunately, the Start and Stop processes are too expensive and really require writing to a different file for every stop. The nature of the program means I need to do the Start/Stop process 60+ times per second, so it would generate a lot of files and be very slow on top.

Note that the StartCPUProfile call takes an io.Writer. Therefore a simple *bytes.Buffer suffices. No files need be created. You can be much faster than the file system.

Yes. I've tried that too. But it's the call to StopCPUProfile() which takes a lot of time. Using a bytes.Buffer it takes over 200ms to complete. It's not much quicker than using a os.File as the writer implementation.

My deadline for everything is less than 16ms so the 'pausing' process needs to take a lot less time than that.

I could maybe live with the delay for benchmarking purposes but I'm also left with the problem of "concatenated profiles". pprof doesn't support that. If it did or if there was a way of post-processing the concatenated profiles into one profile that might be a satisfactory outcome.
 
> func StartCPUProfile(w io.Writer) error

Writing a benchmark for your subsystem is the hardest mentally, but most likely the best choice for the long term.

You're absolutely right of course. But I was hoping for a mechanism that could be activated when a user detects a problem with their own input data. That might still be possible but I'll have to give it some thought.

Thanks for the reply!

--
You received this message because you are subscribed to a topic in the Google Groups "golang-nuts" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/golang-nuts/Emrx_W9eSig/unsubscribe.
To unsubscribe from this group and all its topics, send an email to golang-nuts...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/golang-nuts/cd406d0c-c281-4e8f-99ac-6ffb5b3a16efn%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/golang-nuts/CAPNEFAYSGUGx2S6P9Mr4poOzBNSZ5u-4pBKcwMuHdCza76n8OQ%40mail.gmail.com.

Rhys Hiltner

unread,
Nov 13, 2024, 1:49:59 PMNov 13
to Ian Lance Taylor, Jason E. Aten, Stephen Illingworth, golang-nuts
There are a few options for post-processing, all within the set of supported Go APIs. One is to use the `-focus` flag of `go tool pprof` so it only displays samples that match a particular call stack. If the code to be profiled is in a single function, that might be all you need. Another is to set and clear "goroutine labels" / "tags" around the parts of the code that you want to inspect. That would mean changing the program to use https://pkg.go.dev/runtime/pprof#Do (or SetGoroutineLabels), and then asking "go tool pprof" to filter the samples. It looks like the `-tagfocus` flag should do that for you.

The 200ms pause when stopping a CPU profile is mostly a sleep; if I recall correctly, the runtime stops asking for SIGPROF deliveries immediately but then waits for any stragglers. And if you end up with several profile files, you can pass them all as arguments to `go tool pprof` and it will merge them for you.

If you're looking for a continuous profiling option that works within Go (no sidecar), is centered around files (no collection infrastructure to manage), and gives you the full power of Go's native profiles, you might enjoy https://pkg.go.dev/github.com/rhysh/autoprof.

Rhys


Reply all
Reply to author
Forward
0 new messages