I'm trying to prove that an optimization technique for ring buffers is effective. One of the techniques is using a bitmask instead of modulo to compute the wrap-around index. However, in my environment, modulo is slightly faster in a test where 1 billion items are enqueued/dequeued by a single goroutine. What do you think could be the cause?
Environment:
* go version go1.21.4 darwin/arm64
* Apple M1 Pro
RingBuffer with modulo:
```
type RingBuffer0 struct {
	writeIdx uint64
	readIdx  uint64
	buffers  []any
	size     uint64
}

func NewRingBuffer0(size uint64) *RingBuffer0 {
	rb := &RingBuffer0{}
	rb.init(size)
	return rb
}

func (rb *RingBuffer0) init(size uint64) {
	rb.buffers = make([]any, size)
	rb.size = size
}

func (rb *RingBuffer0) Enqueue(item any) error {
	if rb.writeIdx-rb.readIdx == rb.size {
		return ErrBufferFull
	}
	rb.buffers[rb.writeIdx%rb.size] = item
	rb.writeIdx++
	return nil
}

func (rb *RingBuffer0) Dequeue() (any, error) {
	if rb.writeIdx == rb.readIdx {
		return nil, ErrBufferEmpty
	}
	item := rb.buffers[rb.readIdx%rb.size]
	rb.readIdx++
	return item, nil
}
```
RingBuffer with bitmask:
Change each modulo calculation to the code below:
* rb.buffers[rb.writeIdx&(rb.size-1)] = item
* item := rb.buffers[rb.readIdx&(rb.size-1)]
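For completeness, the full bitmask variant I'm testing looks like this (`RingBuffer1` and the error values are just names for this sketch; note that the mask trick only indexes correctly when `size` is a power of two, so the constructor checks for that):

```
import "errors"

var (
	ErrBufferFull  = errors.New("ring buffer is full")
	ErrBufferEmpty = errors.New("ring buffer is empty")
)

type RingBuffer1 struct {
	writeIdx uint64
	readIdx  uint64
	buffers  []any
	mask     uint64 // size - 1; only valid when size is a power of two
}

func NewRingBuffer1(size uint64) *RingBuffer1 {
	// The bitmask wrap-around only works for power-of-two sizes.
	if size == 0 || size&(size-1) != 0 {
		panic("size must be a power of two")
	}
	return &RingBuffer1{buffers: make([]any, size), mask: size - 1}
}

func (rb *RingBuffer1) Enqueue(item any) error {
	if rb.writeIdx-rb.readIdx == uint64(len(rb.buffers)) {
		return ErrBufferFull
	}
	rb.buffers[rb.writeIdx&rb.mask] = item
	rb.writeIdx++
	return nil
}

func (rb *RingBuffer1) Dequeue() (any, error) {
	if rb.writeIdx == rb.readIdx {
		return nil, ErrBufferEmpty
	}
	item := rb.buffers[rb.readIdx&rb.mask]
	rb.readIdx++
	return item, nil
}
```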
Test:
```
func TestSingle(rb RingBuffer) {
	start := time.Now()
	total := 500000
	for i := 0; i < total; i++ {
		for j := 0; j < 1000; j++ {
			rb.Enqueue(j)
		}
		for j := 0; j < 1000; j++ {
			rb.Dequeue()
		}
	}
	end := time.Now()
	count := total * 2000 // 1000 enqueues + 1000 dequeues per iteration
	duration := end.Sub(start).Milliseconds()
	fmt.Printf("%d ops in %d ms\n", count, duration)
}
```
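In case it helps to reproduce, here is a sketch that tries to isolate just the index arithmetic with standard Go benchmarks (my assumption: keeping `size` in a package-level variable stops the compiler from strength-reducing `%` into `&` at compile time, so the two loops should actually differ):

```
import "testing"

var sink uint64

// Package-level so the divisor is not a compile-time constant.
var size uint64 = 1024

func BenchmarkModulo(b *testing.B) {
	var idx uint64
	for i := 0; i < b.N; i++ {
		idx = uint64(i) % size
	}
	sink = idx // keep the result live so the loop isn't eliminated
}

func BenchmarkMask(b *testing.B) {
	mask := size - 1
	var idx uint64
	for i := 0; i < b.N; i++ {
		idx = uint64(i) & mask
	}
	sink = idx
}
```

Running these with `go test -bench .` should show whether the gap comes from the division itself or from something else in the ring buffer code.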