I am quite unfamiliar with how compilers work, but I think there are some inconsistencies that could be fixed in how functions that create and return a string are compiled.
Here is a memory benchmark:
http://play.golang.org/p/Ba4fekX7Dt
BenchmarkBuildBytes creates and returns a byte slice.
BenchmarkBuildString creates a byte slice and then converts it to a string.
BenchmarkConvertString calls BenchmarkBuildBytes and converts the result to a string.
Here are the results:
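In case the playground link is not reachable, here is a minimal sketch of the three cases. The helper names (buildBytes, buildString, convertString), the length constant and the fill loop are illustrative assumptions and may differ from the linked code. testing.AllocsPerRun is used here only as a quick way to count allocations outside of "go test"; the numbers it prints depend on the Go version and on escape analysis, so none are quoted.

```go
package main

import (
	"fmt"
	"testing"
)

// length is the slice size under test (assumed: 1000, as in the first run).
const length = 1000

// buildBytes creates and returns a byte slice.
func buildBytes() []byte {
	b := make([]byte, length)
	for i := range b {
		b[i] = 'a'
	}
	return b
}

// buildString creates a byte slice and converts it to a string
// inside the same function.
func buildString() string {
	b := make([]byte, length)
	for i := range b {
		b[i] = 'a'
	}
	return string(b)
}

// convertString converts the slice returned by buildBytes to a string.
func convertString() string {
	return string(buildBytes())
}

// Package-level sinks keep the results alive so the calls are not optimized away.
var (
	sinkBytes  []byte
	sinkString string
)

func main() {
	fmt.Println("buildBytes allocs:   ", testing.AllocsPerRun(100, func() { sinkBytes = buildBytes() }))
	fmt.Println("buildString allocs:  ", testing.AllocsPerRun(100, func() { sinkString = buildString() }))
	fmt.Println("convertString allocs:", testing.AllocsPerRun(100, func() { sinkString = convertString() }))
}
```

The B/op and allocs/op figures in this post come from running the real benchmarks with memory reporting enabled (for example "go test -bench . -benchmem"), not from this sketch.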
BenchmarkBuildBytes 1024 B/op 1 allocs/op
BenchmarkBuildString 1024 B/op 1 allocs/op
BenchmarkConvertString 2048 B/op 2 allocs/op
The results show that BenchmarkBuildString allocates only once, like BenchmarkBuildBytes, which is nice. However, BenchmarkConvertString allocates twice.
First question: Could the compiler optimize BenchmarkConvertString so that it allocates only once?
With this optimization, a package could provide only a function returning a byte slice, and the user could convert the result to a string without any additional allocation. Without it, an efficient API needs to provide two versions of each function: one that returns a byte slice and one that returns a string.
I feel that it should be possible for the compiler to optimize this, since the byte slice is converted as soon as it is returned by the function. But again, I am not very familiar with how compilers work.
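To illustrate what I mean by "two versions", the standard library's strconv package already follows this pattern: strconv.AppendInt produces bytes and strconv.Itoa produces a string. The wrappers below (asBytes and asString) are hypothetical names used only for this example.

```go
package main

import (
	"fmt"
	"strconv"
)

// asBytes returns the decimal form of n as a byte slice.
func asBytes(n int64) []byte {
	return strconv.AppendInt(nil, n, 10)
}

// asString returns the decimal form of n as a string.
func asString(n int64) string {
	return strconv.Itoa(int(n))
}

func main() {
	b := asBytes(12345)
	s := asString(12345)
	// A caller holding only the byte version has to convert it,
	// which is the conversion discussed in the first question.
	fmt.Println(string(b) == s)
}
```

If the conversion of a freshly returned slice were free, only the byte-slice version would be needed.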
Now here are the results when the length is set to 10,000 instead of 1,000:
BenchmarkBuildBytes 10496 B/op 1 allocs/op
BenchmarkBuildString 26880 B/op 2 allocs/op
BenchmarkConvertString 20992 B/op 2 allocs/op
Not only does the optimization in BenchmarkBuildString no longer work, but BenchmarkBuildString also uses even more memory than BenchmarkConvertString (26880 B/op versus 20992 B/op).
Finally, here are the results when the length is set to 100,000:
BenchmarkBuildBytes 106496 B/op 1 allocs/op
BenchmarkBuildString 212992 B/op 2 allocs/op
BenchmarkConvertString 212992 B/op 2 allocs/op
Here the optimization in BenchmarkBuildString still does not work, but at least BenchmarkBuildString does not use more memory than BenchmarkConvertString.
Second question: Are these results expected, or is there a bug? Could the optimization in BenchmarkBuildString apply whatever the slice length? If not, could BenchmarkBuildString use at most the same amount of memory as BenchmarkConvertString?