Is the GC team interested in "real programs" that could be good benchmarks for GC, or do you have what you need?
I ask because I have a program that is GC-limited. It took 6.8 seconds to run under go1.7.3 and now takes only 4.3 seconds under go1.8beta2, which seems like a much bigger improvement than the release notes led me to expect. Secondly, if I modify the program to save two slice allocations per iteration of the inner loop, the runtime drops to 2.5 seconds under both Go versions. That seems like a very large difference, though I don't know whether it's unexpected.
Details:
The program constructs a linear program based on certain logic, and then transforms my Go representation into a format that can be read by Gurobi (a commercial LP solver). The full program writes this representation to a file, though I disabled that write for the benchmarks above. Constructing the LP in Go form takes only about 1% of the runtime, though it does create a lot of persistent objects. In the conversion to Gurobi format, each constraint is turned into a []byte; the two slice allocations are temporary storage used while converting a constraint between the two formats. The problem above has ~40,000 constraints and ~17,000 variables, so each of the two allocations is roughly "c := make([]float64, 17000)" (the same size on every iteration of the loop).

I suspect two things. First, there are a lot of long-lived objects, which makes each GC take a while even though the number of new/dead objects is small. Second, because the slices are large, they trigger a GC on each iteration of the loop even though the number of new objects is small (if I remove the two allocations but also make GOGC small, the running time is unaffected).
If it would be useful, I can turn this into a benchmark.