Optimization Question: Convert Array to Slice without Allocation

William Gilmore

Jul 30, 2024, 4:54:32 PM
to golang-nuts
I am working on optimizing code that calculates discrete cosine transforms. I am doing this by writing static code for a given array size instead of using code that can dynamically handle a slice of any size. Overall, I am seeing amazing performance (kudos to the Go compiler). I have one last optimization opportunity that I would like to achieve, but my Go-fu is not sufficient to figure it out on my own. Is there a way to convert a fixed-size array to a slice without incurring 2 allocations?

Go environment information:
    go version go1.22.4 linux/amd64
Example:

func DCT2DFast8(in []float64) (out [64]) {...}: optimized static code, with 0 allocations when called directly, as the following benchmark shows.

    func BenchmarkDCT2DFast8(b *testing.B) {
      for i := 0; i < b.N; i++ {
        _ = DCT2DFast8(ary2d_flat[8])
      }
    }


BenchmarkDCT2DFast8-12     1372852       757.9 ns/op       0 B/op       0 allocs/op

Here is the same functionality, wrapped in the generalized function that can be called for a slice of any size:

    func DCT_2D(input []float64, sz int) []float64 {
        ...
        if sz == 8 {
            r := DCT2DFast8(result)
            return r[:] // r[:] is already a []float64, so no conversion is needed
        }
        ...
    }

BenchmarkDCT_2D_8-12      340670      3227 ns/op    1024 B/op       2 allocs/op

Is there a more performant way (meaning 1 or zero allocations) to convert a fixed-size array to a slice?

Ideally, I would like for the following to work:

    return (DCT2DFast8(result)[:]

Unfortunately, this does not work, since the function's return value is transient and the slice expression cannot operate on it.
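A minimal sketch of the rule at work, with a hypothetical makeArray standing in for DCT2DFast8 (its values are placeholders, not a DCT): slicing the call result directly does not compile, but storing the result in a variable first makes it addressable, so it can be sliced. Because the slice is then returned, the variable outlives the call, and escape analysis generally has to move it to the heap.

```go
package main

import "fmt"

// makeArray is a hypothetical stand-in for DCT2DFast8: it returns a
// fixed-size array by value. The values are placeholders, not a DCT.
func makeArray() [64]float64 {
	var out [64]float64
	for i := range out {
		out[i] = float64(i)
	}
	return out
}

// sliceOfArray stores the call result in a variable, which is addressable,
// and slices that. Since the slice outlives the call, r generally escapes.
func sliceOfArray() []float64 {
	// return makeArray()[:] // does not compile: the call result is not addressable
	r := makeArray()
	return r[:]
}

func main() {
	s := sliceOfArray()
	fmt.Println(len(s), s[63]) // 64 63
}
```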

Even wrapped this way, the static implementation is more than 2 times faster than the fully generalized form:

BenchmarkDCT_2D_8-12      144020      7922 ns/op    2112 B/op      19 allocs/op

But it still falls far short of the roughly 11 times speedup I get when calling DCT2DFast8 directly and using its array result.
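One common Go alternative that avoids allocating per call, not proposed in this thread, is to let the caller supply the destination slice and reuse it. A minimal sketch, assuming a hypothetical dct8Into with placeholder arithmetic (it just doubles each input, it is not a DCT); it uses the slice-to-array-pointer conversion available since Go 1.17 and measures allocations with testing.AllocsPerRun.

```go
package main

import (
	"fmt"
	"testing"
)

// dct8Into is a hypothetical variant of DCT2DFast8 that writes its 64
// results into a caller-supplied slice instead of returning an array.
// The body is a placeholder: it doubles each input value.
func dct8Into(dst, in []float64) []float64 {
	// Slice-to-array-pointer conversion (Go 1.17+); panics if len(dst) < 64.
	out := (*[64]float64)(dst)
	for i := range out {
		out[i] = 2 * in[i]
	}
	return dst[:64]
}

func main() {
	in := make([]float64, 64)
	for i := range in {
		in[i] = float64(i)
	}
	dst := make([]float64, 64) // allocated once by the caller and reused

	allocs := testing.AllocsPerRun(100, func() {
		dct8Into(dst, in)
	})
	fmt.Println(dst[10], allocs) // 20 0
}
```

The trade-off is an API change: the caller owns the buffer, so results must be copied out before the buffer is reused for the next transform.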

Thank you in advance! 

lbe

Brian Candler

Jul 31, 2024, 3:30:16 AM
to golang-nuts
On Tuesday 30 July 2024 at 21:54:32 UTC+1 William Gilmore wrote:

func DCT2DFast8(in []float64) (out [64]) {...}: optimized static code, with 0 allocations when called directly, as the following benchmark shows.


I am guessing the return type should be something like out [64]float64?

 

Ideally, I would like for the following to work:

    return (DCT2DFast8(result)[:]


(Aside: parentheses don't match)

 
Unfortunately, this does not work, since the function's return value is transient and the slice expression cannot operate on it.

By "transient" I think you mean "not addressable". Try returning a pointer to the array instead of an array value:
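A minimal sketch of this suggestion, assuming DCT2DFast8 is changed to return *[64]float64. The names dct8Ptr and dct2D are hypothetical stand-ins with placeholder arithmetic. Slicing through a pointer to an array is shorthand for slicing the pointed-to array, which is addressable, so the one-line form compiles. Note that the array still generally has to live on the heap because the slice outlives the call, so this fixes the syntax rather than guaranteeing fewer allocations.

```go
package main

import "fmt"

// dct8Ptr is a hypothetical stand-in for DCT2DFast8 changed to return a
// pointer to its result array. The body is a placeholder, not a real DCT.
func dct8Ptr(in []float64) *[64]float64 {
	var out [64]float64
	for i := range out {
		out[i] = in[i] + 1
	}
	// The returned pointer outlives this call, so escape analysis
	// usually places out on the heap.
	return &out
}

// dct2D is a hypothetical stand-in for the generalized DCT_2D wrapper.
func dct2D(input []float64, sz int) []float64 {
	if sz == 8 {
		// Legal: p[:] on a pointer to an array means (*p)[:],
		// and *p is addressable even though the call result is not.
		return dct8Ptr(input)[:]
	}
	return nil // other sizes are elided in this sketch
}

func main() {
	s := dct2D(make([]float64, 64), 8)
	fmt.Println(len(s), s[0]) // 64 1
}
```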

