I've been looking at these kinds of graphs for a while. There have been a few easy wins, but much of this bookkeeping overhead will be hard to eliminate with flyby micro-optimizations. If we're going to solve this, we need to transition to an engineering culture that treats performance as a primary concern when writing code and architecting new components. We're probably also going to need to rip up some of our old code along the way.
It's easy to say that we're in this situation because of the language that we're using, but I don't think that's fair, accurate, or helpful. Go's biggest obstacle to performance is that it makes it so easy to write moderately inefficient code. It makes it easy to allocate memory without the full cost being obvious because it's hidden by the GC, it makes it trivial to toss around and abuse expensive synchronization mechanisms to make up for poorly thought-out object relations, and it imposes a cost on abstraction in the form of heap allocations and dynamic dispatch.
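To make the "cost of abstraction" point concrete, here is a minimal sketch of how an innocent-looking interface conversion can turn into a heap allocation. The `Shape` and `Rect` types and the sink variables are hypothetical and exist only for illustration; the measurement uses `testing.AllocsPerRun` from the standard library. Exact numbers can vary with compiler version and escape analysis, so treat the output as indicative rather than guaranteed.

```go
package main

import (
	"fmt"
	"testing"
)

// Shape is a hypothetical interface, used only to illustrate boxing.
type Shape interface{ Area() float64 }

// Rect is a hypothetical concrete type that satisfies Shape.
type Rect struct{ W, H float64 }

func (r Rect) Area() float64 { return r.W * r.H }

// Package-level sinks keep the compiler from discarding the work.
var (
	floatSink float64
	shapeSink Shape
)

// measureDirect calls Area on the concrete type. The value can stay on
// the stack and the call can be resolved statically.
func measureDirect() float64 {
	i := 0
	return testing.AllocsPerRun(1000, func() {
		i++
		r := Rect{W: float64(i), H: 2}
		floatSink = r.Area()
	})
}

// measureBoxed stores the same value in an interface that escapes.
// The Rect has to be copied to the heap, and Area is dispatched dynamically.
func measureBoxed() float64 {
	i := 0
	return testing.AllocsPerRun(1000, func() {
		i++
		shapeSink = Rect{W: float64(i), H: 2}
		floatSink = shapeSink.Area()
	})
}

func main() {
	fmt.Printf("direct: %.0f allocs/op\n", measureDirect())
	fmt.Printf("boxed:  %.0f allocs/op\n", measureBoxed())
}
```

Nothing in the source of `measureBoxed` says "allocate", which is exactly why this kind of cost is easy to miss without measuring.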
But that doesn't mean that we can't use it to write highly performant code. The new cost-based optimizer is a perfect example of this. The team working on it has kept their eye on performance metrics throughout its development process. They began by creating a series of micro-benchmarks, they adopted a zero-allocation mindset, and they justified changes using benchmark results and profiles. As a result, the cost-based optimizer appears to be faster than the old heuristic-based optimizer even though it does significantly more work.
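As a rough illustration of that workflow (not the optimizer team's actual benchmarks), the sketch below compares an allocating helper with one that reuses a caller-supplied buffer and reports allocations per operation. The helper names `keySprintf` and `appendKey` are made up for this example; `testing.Benchmark`, `b.ReportAllocs`, and `strconv.AppendInt` are standard library APIs.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// keySprintf is a hypothetical helper that builds a key the convenient way.
// It allocates a new string on every call.
func keySprintf(id int) string {
	return fmt.Sprintf("key-%d", id)
}

// appendKey is the hypothetical zero-allocation variant: it writes into a
// buffer owned by the caller and only allocates if the buffer is too small.
func appendKey(buf []byte, id int) []byte {
	buf = append(buf[:0], "key-"...)
	return strconv.AppendInt(buf, int64(id), 10)
}

// Sinks keep the compiler from eliminating the benchmarked work.
var (
	strSink  string
	byteSink []byte
)

func benchSprintf(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		strSink = keySprintf(i)
	}
}

func benchAppend(b *testing.B) {
	b.ReportAllocs()
	buf := make([]byte, 0, 32) // allocated once, outside the loop
	for i := 0; i < b.N; i++ {
		buf = appendKey(buf, i)
	}
	byteSink = buf
}

func main() {
	// testing.Benchmark lets us run benchmarks outside of `go test`.
	for name, fn := range map[string]func(*testing.B){
		"Sprintf": benchSprintf,
		"Append":  benchAppend,
	} {
		res := testing.Benchmark(fn)
		fmt.Printf("%-8s %s %s\n", name, res.String(), res.MemString())
	}
}
```

In a real package these would live in a `_test.go` file and run with `go test -bench . -benchmem`, so that a change can be justified by comparing the numbers before and after.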
One of the biggest mistakes we make is believing that code that is off the "hot path" (for some definition of "hot path") can be arbitrarily inefficient. If nothing else, https://github.com/cockroachdb/cockroach/issues/30208 should come as a warning sign for how impactful small inefficiencies (a few allocations and a few lock acquisitions) that are only performed once per transaction can be.
To start moving in the right direction, I'd like to get some mindshare around the tools engineers have at their disposal to write efficient code. Would people be interested in a lunch and learn about profiling using pprof?