Re: What's the best way to grow a slice's capacity?

Gerard

unread,

Apr 23, 2013, 10:02:34 AM4/23/13

to golan...@googlegroups.com

If the cap of the slice is large enough, then append doesn't allocate new memory. See goplay

Op dinsdag 23 april 2013 15:37:26 UTC+2 schreef Arne Hormann het volgende:

What's the best way to grow a slice by a given size a priori (when the data you want to append is not yet available)?

I first wondered when I read go.crypto/nacl/secretbox, where out = append(out, 0) is called in a loop.
To me, that seems to be the worst way to do it.

I see two additional ways: append & make and make & copy.
Example: http://play.golang.org/p/PP_RI_WKbJ

The make & copy way is pretty obvious, but it's longer and optimizing it in the compiler is probably harder.
I prefer the append way, but it allocates a new slice it discards immediately.

So... append in a loop, make & copy or append & make - or something else?

Arne Hormann

unread,

Apr 23, 2013, 10:09:28 AM4/23/13

to golan...@googlegroups.com

I know - but that's not always the case. Sometimes I want to start with a small slice and only grow it incrementally when needed.

Coincidentially, that's the scenario I'm interested in.

Julien Schmidt

unread,

Apr 23, 2013, 10:20:48 AM4/23/13

to golan...@googlegroups.com

Assume that the slice / buffer length is always the full capacity.

Arne Hormann

unread,

Apr 23, 2013, 10:24:54 AM4/23/13

to golan...@googlegroups.com

From an off-list response:

> What makes you think it discards it [the underlying array] immediately?

Taken my play example: the slice you have is len == 16, in append you make another slice with len == 16. The slice you get is len == 32.

If there's no free space after the first slice, append has to allocate a new slice with len == 32 and copy the contents of the first and second slice into it.

The second slice - the make([]byte, 16) in append - was allocated but discarded as it'll never be used. Still, the memory in it was zeroed and copied, which is a waste.

Maxim Khitrov

unread,

Apr 23, 2013, 10:45:05 AM4/23/13

to Arne Hormann, golan...@googlegroups.com

First, depending on your exact performance requirements, you may be
better off with a linked list or a slice of slices. That would
eliminate the extra copy operations.

If you do need just once slice, you can still use append without
creating a temporary slice (grow3):

http://play.golang.org/p/jMGLmaxUTt

I think this is what secretbox is trying to do. In my benchmarks, this
version is significantly faster than the other two. It relies on the
fact that append doubles the slice capacity on each reallocation. It
might also benefit from a few other optimizations, such as growing the
array in place if there is sufficient free memory after it.

Another example is bytes.Buffer, which uses make & copy:

http://tip.golang.org/src/pkg/bytes/buffer.go#L77

> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

chris dollin

unread,

Apr 23, 2013, 10:46:57 AM4/23/13

to Arne Hormann, golang-nuts

On 23 April 2013 15:09, Arne Hormann <arneh...@gmail.com> wrote:

I know - but that's not always the case. Sometimes I want to start with a small slice and only grow it incrementally when needed.

The first use of append [1] on a full slice will grow the slice

*by a multiplicative factor*, so the resulting slice has room
to append additonal elements.

Chris

[1] Well, an append that appends 1 or more items.

--
Chris "allusive" Dollin

chris dollin

unread,

Apr 23, 2013, 10:55:20 AM4/23/13

to Arne Hormann, golang-nuts

Oh, /that/ slice, the second argument slice for append. SIlly of me: I was
focussed on the first argument / result slice.

But why make the slice bigger with irrelevant values (the "wasted zeroes")

rather than the values you want to append to it?

[Note that this is a place where escape analysis could spot that the

second argument slice need not be heap allocated, and the zeroing
and copying is pretty much a given if you're going to make a slice

bigger ...]

Chris

--
Chris "allusive" Dollin

Arne Hormann

unread,

Apr 23, 2013, 11:14:33 AM4/23/13

to golan...@googlegroups.com, Arne Hormann, ehog....@googlemail.com

Am Dienstag, 23. April 2013 16:55:20 UTC+2 schrieb chris dollin:

On 23 April 2013 15:24, Arne Hormann <arneh...@gmail.com> wrote:

From an off-list response:
> What makes you think it discards it [the underlying array] immediately?

Taken my play example: the slice you have is len == 16, in append you make another slice with len == 16. The slice you get is len == 32.
If there's no free space after the first slice, append has to allocate a new slice with len == 32 and copy the contents of the first and second slice into it.

The second slice - the make([]byte, 16) in append - was allocated but discarded as it'll never be used. Still, the memory in it was zeroed and copied, which is a waste.

Oh, /that/ slice, the second argument slice for append. SIlly of me: I was
focussed on the first argument / result slice.

But why make the slice bigger with irrelevant values (the "wasted zeroes")

rather than the values you want to append to it?

This originally came up during a review of the go-sql-driver/mysql zero copy buffer. I don't know the values before they are read.

It's hard to know before how large the largest column ever fetched will be - doubling the buffer size is probably fine, but it should allocate as little memory as possible and not allocate in more than step.

Speed is probably more important than memory (Julien's call), so I guess append for doubling will be the way to go.

I just want to make sure the implementation uses the best way to do it.

[Note that this is a place where escape analysis could spot that the
second argument slice need not be heap allocated, and the zeroing
and copying is pretty much a given if you're going to make a slice
bigger ...]

That's exactly what I meant when I wrote copy & make is probably harder to optimize - the make-case for append should be pretty easy.

roger peppe

unread,

Apr 23, 2013, 1:04:20 PM4/23/13

to Arne Hormann, golang-nuts, ehedgehog

I'd do something like this:

// grow returns a slice with the same contents as slice
// but with a capacity at least as large as newCap.
func grow(slice []byte, newCap int) []byte {
if cap(slice) >= newCap {
return slice
}
newSlice := make([]byte, len(slice), newCap)
copy(newSlice, slice)
return newSlice

Gerard

unread,

Apr 24, 2013, 3:56:22 AM4/24/13

to golan...@googlegroups.com, Arne Hormann, ehog....@googlemail.com

Thinking about it once more, I would like to mention two things:

I think that clear code is more important than micro-optimisation, especially with todays processor power.
If you really want to know the answer, perform benchmark tests (with actual data and 32/64 bit environment and Go1.1).

Julien Schmidt

unread,

Apr 24, 2013, 9:40:34 AM4/24/13

to golan...@googlegroups.com, Arne Hormann

I tried an approach based on Maxim's grow3:

Here is the current code (with debug output):

// grow buffer if necessary

if need > len(b.buf) {
fmt.Printf("REFILL len=%d, need=%d \r\n", len(b.buf), need)
for cap(b.buf) < need {
b.buf = append(b.buf[:cap(b.buf)], 0)
fmt.Printf("APPENDED cap=%d \r\n", cap(b.buf))
}
b.buf = b.buf[:cap(b.buf)]
fmt.Printf("END len=%d, need=%d \r\n", len(b.buf), need)
}

need is the new size we need, b.buf is the buffer so len(b.buf) is the current size.

Here is the debug output from a test:

REFILL len=4096, need=1048551
APPENDED cap=5120
APPENDED cap=6400
APPENDED cap=8000
APPENDED cap=10000
APPENDED cap=12500
APPENDED cap=15625
APPENDED cap=19531
APPENDED cap=24413
APPENDED cap=30516
APPENDED cap=38145
APPENDED cap=47681
APPENDED cap=59601
APPENDED cap=74501
APPENDED cap=93126
APPENDED cap=116407
APPENDED cap=145508
APPENDED cap=181885
APPENDED cap=227356
APPENDED cap=284195
APPENDED cap=355243
APPENDED cap=444053
APPENDED cap=555066
APPENDED cap=693832
APPENDED cap=867290
APPENDED cap=1084112
END len=1084112, need=1048551

Any suggestions how this could be optimized?

Julien Schmidt

unread,

Apr 24, 2013, 9:54:55 AM4/24/13

to golan...@googlegroups.com, Arne Hormann

I'm now tending to use grow2 instead. I'm writing a benchmark now...

Hotei

unread,

Apr 24, 2013, 9:57:53 AM4/24/13

to golan...@googlegroups.com

There's nothing magic about the append function. If the current append doesn't suit your purpose it's trivial to write your own and have it grow your slices by whatever increment percentage you want, including a sliding scale percentage if that's useful.

chris dollin

unread,

Apr 24, 2013, 10:00:29 AM4/24/13

to Julien Schmidt, golang-nuts, Arne Hormann

On 24 April 2013 14:40, Julien Schmidt <g...@julienschmidt.com> wrote:

I tried an approach based on Maxim's grow3:

Here is the current code (with debug output):

// grow buffer if necessary
if need > len(b.buf) {
fmt.Printf("REFILL len=%d, need=%d \r\n", len(b.buf), need)
for cap(b.buf) < need {
b.buf = append(b.buf[:cap(b.buf)], 0)
fmt.Printf("APPENDED cap=%d \r\n", cap(b.buf))
}
b.buf = b.buf[:cap(b.buf)]
fmt.Printf("END len=%d, need=%d \r\n", len(b.buf), need)
}
need is the new size we need, b.buf is the buffer so len(b.buf) is the current size.

Here is the debug output from a test:

...

Any suggestions how this could be optimized?

If you don't know what the elements are, but do know how many you

need, the grow2() version makes the most sense to me.

If the allocated size is going to be beig enough, you can

still add the elements using append() so that len() is not

misleading.

chris dollin

unread,

Apr 24, 2013, 10:01:37 AM4/24/13

to Hotei, golang-nuts

On 24 April 2013 14:57, Hotei <hote...@gmail.com> wrote:

There's nothing magic about the append function.

Apart from its polymorphic type.

Chris

--
Chris "polymorph self" Dollin

Maxim Khitrov

unread,

Apr 24, 2013, 10:02:12 AM4/24/13

to Julien Schmidt, golan...@googlegroups.com, Arne Hormann

grow3 works well when the extra requested space is less than the
current capacity. When you need to more than double the existing
capacity, the make & copy approach is likely to be faster. Here's an
example:

http://play.golang.org/p/sLHRRReVvI

Julien Schmidt

unread,

Apr 24, 2013, 10:26:55 AM4/24/13

to golan...@googlegroups.com, Arne Hormann

On Wednesday, April 24, 2013 4:02:12 PM UTC+2, Maxim Khitrov wrote:

grow3 works well when the extra requested space is less than the
current capacity.

That was also my impression.

When you need to more than double the existing
capacity, the make & copy approach is likely to be faster. Here's an
example:

http://play.golang.org/p/sLHRRReVvI

This looks really nice! I think I'll skip the benchmark and stick to that code.

In the current use-case this code part is not really performance critical. It's more a general interest. Maybe I'll write a benchmark some time later to find the exact point where append gets more expensive.

Julien Schmidt

unread,

Jun 2, 2013, 9:19:34 AM6/2/13

to golan...@googlegroups.com, Arne Hormann

First of all, the built-in append function has not same characteristics as the example Append-function in Go at http://golang.org/doc/effective_go.html#slices. You often read, that append would double the size when it must reallocate, which is not entirely true. The slice is currently growed by factor 1.25 if the size is greater than 1024: http://code.google.com/p/go/source/browse/src/pkg/runtime/slice.c#208

Also I decided to run a benchmark. The (heavily copy-pasted) source: http://play.golang.org/p/xMWTHiu9JY

Results:

System: Intel i5-2500K @ 3.30 GHz, 8192 MB RAM, Windows 7 x64

PASS

BenchmarkGrowAppend1 50000 32121 ns/op 10240 B/op 1 allocs/op

BenchmarkGrowMake1 50000 32101 ns/op 9471 B/op 1 allocs/op

BenchmarkGrowAppend4 50000 32261 ns/op 10240 B/op 1 allocs/op

BenchmarkGrowMake4 50000 31961 ns/op 9471 B/op 1 allocs/op

BenchmarkGrowAppend64 50000 31721 ns/op 10240 B/op 1 allocs/op

BenchmarkGrowMake64 50000 32241 ns/op 9471 B/op 1 allocs/op

BenchmarkGrowAppend256 50000 31861 ns/op 10240 B/op 1 allocs/op

BenchmarkGrowMake256 50000 31981 ns/op 9471 B/op 1 allocs/op

BenchmarkGrowAppend1024 50000 32241 ns/op 10240 B/op 1 allocs/op

BenchmarkGrowMake1024 50000 32121 ns/op 10239 B/op 1 allocs/op

BenchmarkGrowAppend4096 10000 130507 ns/op 50688 B/op 6 allocs/op

BenchmarkGrowMake4096 100000 21841 ns/op 11601 B/op 1 allocs/op

BenchmarkGrowAppend16384 10000 264215 ns/op 146688 B/op 12 allocs/op

BenchmarkGrowMake16384 1000000 1372 ns/op 20480 B/op 1 allocs/op

BenchmarkGrowAppend65536 5000 576232 ns/op 454912 B/op 21 allocs/op

BenchmarkGrowMake65536 50000 66623 ns/op 79872 B/op 2 allocs/op

BenchmarkGrowAppend262144 2000 1049059 ns/op 1577216 B/op 33 allocs/op

BenchmarkGrowMake262144 50000 73044 ns/op 276480 B/op 2 allocs/op

ok command-line-arguments 110.262s

Same system, Linux (Ubuntu 12.04) @ VirtualBox VM with 2048 MB RAM, 2 CPU Cores [VT-x/AMD-V, Nested Paging and PAE/NX enabled]

testing: warning: no tests to run

PASS

BenchmarkGrowAppend1 200000 12270 ns/op 5181 B/op 1 allocs/op

BenchmarkGrowMake1 500000 7012 ns/op 4352 B/op 1 allocs/op

BenchmarkGrowAppend4 100000 16431 ns/op 5242 B/op 1 allocs/op

BenchmarkGrowMake4 500000 6958 ns/op 4352 B/op 1 allocs/op

BenchmarkGrowAppend64 100000 16332 ns/op 5242 B/op 1 allocs/op

BenchmarkGrowMake64 500000 7048 ns/op 4352 B/op 1 allocs/op

BenchmarkGrowAppend256 100000 16635 ns/op 5242 B/op 1 allocs/op

BenchmarkGrowMake256 500000 7032 ns/op 4352 B/op 1 allocs/op

BenchmarkGrowAppend1024 200000 7847 ns/op 5120 B/op 1 allocs/op

BenchmarkGrowMake1024 200000 11910 ns/op 5175 B/op 1 allocs/op

BenchmarkGrowAppend4096 50000 43127 ns/op 30607 B/op 4 allocs/op

BenchmarkGrowMake4096 200000 11180 ns/op 8192 B/op 1 allocs/op

BenchmarkGrowAppend16384 50000 70581 ns/op 106342 B/op 8 allocs/op

BenchmarkGrowMake16384 200000 12060 ns/op 20480 B/op 1 allocs/op

BenchmarkGrowAppend65536 10000 220643 ns/op 371557 B/op 13 allocs/op

BenchmarkGrowMake65536 50000 37278 ns/op 70313 B/op 1 allocs/op

BenchmarkGrowAppend262144 5000 556190 ns/op 1441009 B/op 20 allocs/op

BenchmarkGrowMake262144 50000 59496 ns/op 267776 B/op 1 allocs/op

ok command-line-arguments 124.845s

So the extra case-differentiation currently doesn't make sense. If you know that that the underlying array must be reallocated, it is better to use just make.

Reply all

Reply to author

Forward