Thanks for taking the time to clarify.
So it sounds like we shouldn't be relying on cpu_ms for much of anything. As for the average mcycles attributed to each request in the dashboard, it sounds like that's just a best guess made by the App Engine container (or whichever system is responsible for gathering statistics). Do you think it's reasonable to assume that it simply looks at the load (computed by the VM, perhaps?) between the beginning and end of each request to compute this average?
If that model is correct, I presume there's essentially no way to get reliable numbers when there's any concurrency -- e.g., if the original request goes into I/O wait, the runtime would likely switch to another goroutine, and some of that goroutine's load would be attributed to the original (unrelated) request.
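To make the double-counting concrete, here's a toy sketch (all names and the 1 mcycle/ms rate are made up, just to illustrate the arithmetic): if the accounting charges every cycle consumed during a request's wall-clock window to that request, two overlapping requests get charged more cycles in total than were actually burned.

```go
package main

import "fmt"

// window is a hypothetical request's wall-clock span, in ms.
type window struct{ start, end int }

// cyclesBetween fakes a system-wide cycle counter: pretend the machine
// burns 1 mcycle per ms of wall time, regardless of which goroutine ran.
func cyclesBetween(w window) int { return w.end - w.start }

func main() {
	a := window{0, 100} // request A: mostly I/O wait
	b := window{10, 60} // request B: CPU-bound, runs while A waits

	// Naive attribution: charge each request everything in its window.
	fmt.Println("charged to A:", cyclesBetween(a)) // 100
	fmt.Println("charged to B:", cyclesBetween(b)) // 50
	// Total charged (150) exceeds the cycles actually available in the
	// 100ms of wall time: B's work was also counted against A's window.
}
```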
Assuming *all* of that's correct, do you think it's safe to say that we should just stick with using pprof locally to profile?
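For reference, the sort of local profiling I have in mind is just the standard runtime/pprof flow, roughly like this (the workload function is obviously a stand-in):

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// busy is a stand-in for whatever handler code we actually want to profile.
func busy() int {
	sum := 0
	for i := 0; i < 1e7; i++ {
		sum += i
	}
	return sum
}

func main() {
	f, err := os.Create("cpu.prof")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Sample the CPU for the lifetime of main and write the profile out.
	if err := pprof.StartCPUProfile(f); err != nil {
		panic(err)
	}
	defer pprof.StopCPUProfile()

	fmt.Println(busy())
}
```

Then `go tool pprof cpu.prof` gives us real per-function numbers instead of whatever the dashboard is guessing at.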
Cheers,
joel.