I am working on optimizing code to calculate discrete cosine transforms. I am doing this by implementing static code for a given array size instead of code that can dynamically handle a slice of any size. Overall, I am seeing amazing performance - kudos Go compiler. I have one last optimization opportunity that I would like to achieve. Unfortunately, my Gofu is not sufficient to figure it out on my own. Is there a way to convert a fixed-size array to a slice without incurring 2 allocations?
go version go1.22.4 linux/amd64
Example:
func DCT2DFast8(in []float64) (out [64]float64) {...} - Optimized static code, 0 allocations when called directly. See the following benchmark.
func BenchmarkDCT2DFast8(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = DCT2DFast8(ary2d_flat[8])
	}
}
BenchmarkDCT2DFast8-12 1372852 757.9 ns/op 0 B/op 0 allocs/op
Same functionality, but wrapped in the generalized function that can be called for a slice of any size:
func DCT_2D(input []float64, sz int) []float64 {
	...
	if sz == 8 {
		r := DCT2DFast8(result)
		return []float64(r[:])
	}
	...
}
BenchmarkDCT_2D_8-12 340670 3227 ns/op 1024 B/op 2 allocs/op
Is there a more performant way, meaning 1 or zero allocations, to convert a fixed size array to a slice?
Ideally, I would like for the following to work:
return DCT2DFast8(result)[:]
Unfortunately, this does not compile: the function's return value is a transient, non-addressable value, and a slice expression on an array requires an addressable operand.
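To make the problem concrete, here is a minimal, self-contained sketch of what I mean. The names `fast8` and `wrap` are hypothetical stand-ins for `DCT2DFast8` and the `sz == 8` branch of `DCT_2D`, and the body of `fast8` is a placeholder that only doubles its input; it is not my real DCT code.

```go
package main

import "fmt"

// fast8 is a hypothetical stand-in for DCT2DFast8: it returns a fixed-size
// array by value. The body just doubles each input; it is not a real DCT.
func fast8(in []float64) (out [64]float64) {
	for i := range out {
		out[i] = 2 * in[i]
	}
	return out
}

// wrap is a hypothetical stand-in for the sz == 8 branch of DCT_2D.
func wrap(in []float64) []float64 {
	// return fast8(in)[:] // does not compile: the returned array is not addressable
	r := fast8(in) // storing the result in a variable makes it addressable
	return r[:]    // the returned slice refers to r, so r has to outlive this call
}

func main() {
	in := make([]float64, 64)
	for i := range in {
		in[i] = float64(i)
	}
	out := wrap(in)
	fmt.Println(len(out), out[1], out[63]) // prints "64 2 126"
}
```

The `r := ...; return r[:]` pattern in `wrap` is the same one I use in `DCT_2D` above, and that is where the extra allocations show up in my benchmark.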
The current static implementation, even with the wrapper, is 2+ times faster than the more generalized form:
BenchmarkDCT_2D_8-12 144020 7922 ns/op 2112 B/op 19 allocs/op
But it falls far short of the direct array-returning call, which is roughly 10 times faster than the generalized form (757.9 ns/op vs. 7922 ns/op).
Thank you in advance!
lbe