VTune is very useful for squeezing the last bit of performance out of Go programs, once you have done
the usual optimisation: minimized allocations, I/O, etc.
pprof is more than adequate for the average programmer. But when you need to super-optimise
functions which implement math kernels, crypto functions, video codecs, etc., then without a profiler
based on hardware performance counters, such as VTune or Linux perf
(https://perf.wiki.kernel.org/index.php/Main_Page), you are shooting in the dark.
VTune not only tells you which functions are taking the most time, but WHY they are taking a long time:
how long the code spends waiting on cache misses, and the different kinds of stall cycles which
kill performance on a modern CPU.
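To make the cache-miss point concrete, here is a rough sketch (my own illustration, not taken from VTune or any real workload). Both functions compute the same sum over the same matrix, but one walks memory in order and the other strides across rows. The names sumRowMajor, sumColMajor and newMatrix, and the size n, are made up for the example. A time-based profiler would only show that the second function is slower; a counter-based profiler should be able to attribute the difference to cache misses. Actual timings depend on the machine.

```go
package main

import (
	"fmt"
	"time"
)

// n is an arbitrary size chosen so the matrix (n*n*8 bytes = 32 MiB)
// is larger than typical CPU caches. This is an assumption, adjust as needed.
const n = 2048

// newMatrix builds an n x n matrix where m[i][j] = i + j.
func newMatrix() [][]int64 {
	m := make([][]int64, n)
	for i := range m {
		m[i] = make([]int64, n)
		for j := range m[i] {
			m[i][j] = int64(i + j)
		}
	}
	return m
}

// sumRowMajor visits elements in the order they are laid out in memory,
// so consecutive accesses mostly hit the same cache line.
func sumRowMajor(m [][]int64) int64 {
	var s int64
	for i := 0; i < n; i++ {
		for j := 0; j < n; j++ {
			s += m[i][j]
		}
	}
	return s
}

// sumColMajor visits the same elements column by column, jumping to a
// different row (and usually a different cache line) on every access.
func sumColMajor(m [][]int64) int64 {
	var s int64
	for j := 0; j < n; j++ {
		for i := 0; i < n; i++ {
			s += m[i][j]
		}
	}
	return s
}

func main() {
	m := newMatrix()

	t0 := time.Now()
	a := sumRowMajor(m)
	rowTime := time.Since(t0)

	t1 := time.Now()
	b := sumColMajor(m)
	colTime := time.Since(t1)

	fmt.Println("row-major sum:", a, "time:", rowTime)
	fmt.Println("col-major sum:", b, "time:", colTime)
}
```

The two functions execute the same number of additions, so any difference in run time comes from how the memory system behaves, which is exactly the kind of thing hardware counters expose.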
VTune and perf are also great tools for teaching us about processors, and for helping us understand what influences
the rate at which they execute instructions.
The problem with VTune is that it is quite unfriendly and expensive (> $3000 for a single floating license)!
It also does not work on ARM processors (such as Apple M1).
There has been a proposal to add performance counters to pprof:
https://go.googlesource.com/proposal/+/refs/changes/08/219508/2/design/36821-perf-counter-pprof.md
If accepted, this would give the power of VTune to the masses for free.