Here's an entirely unscientific method to determine the overhead of profiling. The Go distribution contains a set of basic benchmarks, one of which is a loopback-based HTTP client/server benchmark. Running that benchmark with and without profiling gives a rough estimate of the overhead of profiling.
lucky(~/go/test/bench/go1) % go test -run=XXX -bench=HTTPClientServer
goos: linux
goarch: amd64
BenchmarkHTTPClientServer-4 20000 84296 ns/op
PASS
ok _/home/dfc/go/test/bench/go1 4.274s
lucky(~/go/test/bench/go1) % go test -run=XXX -bench=HTTPClientServer -cpuprofile=/tmp/c.p
goos: linux
goarch: amd64
BenchmarkHTTPClientServer-4 20000 85316 ns/op
PASS
ok _/home/dfc/go/test/bench/go1 4.402s
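To put a number on the two runs above, here is a minimal sketch of the arithmetic. The helper name overheadPercent is made up for illustration; the ns/op figures are copied from the output above. It works out to roughly 1.2% per operation, which from a single pair of runs should be read as a ballpark and nothing more.

```go
package main

import "fmt"

// overheadPercent (hypothetical helper) returns how much slower the
// profiled run is than the baseline, as a percentage of the baseline ns/op.
func overheadPercent(baselineNs, profiledNs float64) float64 {
	return (profiledNs - baselineNs) / baselineNs * 100
}

func main() {
	// ns/op figures copied from the two benchmark runs above.
	fmt.Printf("CPU profiling overhead: %.2f%%\n", overheadPercent(84296, 85316))
}
```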
You could use this to experiment with the other kinds of profiles: memory, block, trace, etc.
If you wanted to go a step further, you could add profiling to your own project with my
github.com/pkg/profile package, then compare the results of an HTTP load test with and without profiling enabled.
Thanks
Dave