Optimization Question: Convert Array to Slice without Allocation

William Gilmore

Jul 30, 2024, 4:54:32 PM

to golang-nuts

I am working on optimizing code to calculate discrete cosine transforms. I am doing this by implementing static code for a given array size rather than using code that can dynamically handle a slice of any size. Overall, I am seeing amazing performance - kudos, Go compiler. I have one last optimization opportunity that I would like to achieve. Unfortunately, my Go-fu is not sufficient to figure it out on my own. Is there a way to convert a fixed-size array to a slice without incurring 2 allocations?

Go Environment information:

go version go1.22.4 linux/amd64

Example:

func DCT2DFast8(in []float64) (out [64]) {...} - Optimized static code, 0 allocations when called directly. See following benchmark.

func BenchmarkDCT2DFast8(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _ = DCT2DFast8(ary2d_flat[8])
    }
}

BenchmarkDCT2DFast8-12 1372852 757.9 ns/op 0 B/op 0 allocs/op

Same functionality but wrapped in the generalized function that can be called for any size slice

func DCT_2D(input []float64, sz int) []float64 {
    ...

    if sz == 8 {
        r := DCT2DFast8(result)
        return []float64(r[:])
    }
    ...
}

BenchmarkDCT_2D_8-12 340670 3227 ns/op 1024 B/op 2 allocs/op

Is there a more performant way, meaning 1 or zero allocations, to convert a fixed size array to a slice?

Ideally, I would like for the following to work:

return (DCT2DFast8(result)[:]

Unfortunately, this does not since the function's return value is transient and the slice expression cannot operate on it.
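To illustrate the restriction, here is a minimal sketch, not the actual DCT code. The names fill8 and asSlice are hypothetical stand-ins, and the array is shortened to 8 elements. It shows that a function's array result cannot be sliced directly, while a local copy of it can. Because the returned slice outlives the local variable, the compiler will likely move that variable to the heap, which would account for at least one allocation.

```go
package main

import "fmt"

// fill8 is a hypothetical stand-in for DCT2DFast8: it returns an array by value.
func fill8() [8]float64 {
	var out [8]float64
	for i := range out {
		out[i] = float64(i) * 0.5
	}
	return out
}

// asSlice stores the returned array in a local variable, which is
// addressable, and then slices that variable.
func asSlice() []float64 {
	// return fill8()[:] // does not compile: a call result is not addressable
	r := fill8()
	return r[:]
}

func main() {
	s := asSlice()
	fmt.Println(len(s), cap(s), s[3])
}
```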

The current static implementation is 2+ times faster than the more generalized form.

BenchmarkDCT_2D_8-12 144020 7922 ns/op 2112 B/op 19 allocs/op

But it falls far short of the 11 times speedup achieved when the array result is used directly.

Thank you in advance!

lbe

Brian Candler

Jul 31, 2024, 3:30:16 AM

to golang-nuts

On Tuesday 30 July 2024 at 21:54:32 UTC+1 William Gilmore wrote:

func DCT2DFast8(in []float64) (out [64]) {...} - Optimized static code, 0 allocations when called directly. See following benchmark.

I am guessing the return should be something like out [64]float64?

Ideally, I would like for the following to work:

return (DCT2DFast8(result)[:]

(Aside: parentheses don't match)

Unfortunately, this does not since the function's return value is transient and the slice expression cannot operate on it.

By "transient" I think you mean "not addressable". Try returning a pointer to the array, instead of an array value: