Re: What's the best way to grow a slice's capacity?

11,342 views
Skip to first unread message

Gerard

unread,
Apr 23, 2013, 10:02:34 AM4/23/13
to golan...@googlegroups.com
If the cap of the slice is large enough, then append doesn't allocate new memory. See goplay


Op dinsdag 23 april 2013 15:37:26 UTC+2 schreef Arne Hormann het volgende:
What's the best way to grow a slice by a given size a priori (when the data you want to append is not yet available)?

I first wondered when I read go.crypto/nacl/secretbox, where out = append(out, 0) is called in a loop.
To me, that seems to be the worst way to do it.

I see two additional ways: append & make and make & copy.

The make & copy way is pretty obvious, but it's longer and optimizing it in the compiler is probably harder.
I prefer the append way, but it allocates a new slice it discards immediately.

So... append in a loop, make & copy or append & make - or something else?

Arne Hormann

unread,
Apr 23, 2013, 10:09:28 AM4/23/13
to golan...@googlegroups.com
I know - but that's not always the case. Sometimes I want to start with a small slice and only grow it incrementally when needed.
Coincidentially, that's the scenario I'm interested in.

Julien Schmidt

unread,
Apr 23, 2013, 10:20:48 AM4/23/13
to golan...@googlegroups.com
Assume that the slice / buffer length is always the full capacity.

Arne Hormann

unread,
Apr 23, 2013, 10:24:54 AM4/23/13
to golan...@googlegroups.com
From an off-list response:
> What makes you think it discards it [the underlying array] immediately?

Taken my play example: the slice you have is len == 16, in append you make another slice with len == 16. The slice you get is len == 32.
If there's no free space after the first slice, append has to allocate a new slice with len == 32 and copy the contents of the first and second slice into it.
The second slice - the make([]byte, 16) in append - was allocated but discarded as it'll never be used. Still, the memory in it was zeroed and copied, which is a waste.

Maxim Khitrov

unread,
Apr 23, 2013, 10:45:05 AM4/23/13
to Arne Hormann, golan...@googlegroups.com
First, depending on your exact performance requirements, you may be
better off with a linked list or a slice of slices. That would
eliminate the extra copy operations.

If you do need just once slice, you can still use append without
creating a temporary slice (grow3):

http://play.golang.org/p/jMGLmaxUTt

I think this is what secretbox is trying to do. In my benchmarks, this
version is significantly faster than the other two. It relies on the
fact that append doubles the slice capacity on each reallocation. It
might also benefit from a few other optimizations, such as growing the
array in place if there is sufficient free memory after it.

Another example is bytes.Buffer, which uses make & copy:

http://tip.golang.org/src/pkg/bytes/buffer.go#L77
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

chris dollin

unread,
Apr 23, 2013, 10:46:57 AM4/23/13
to Arne Hormann, golang-nuts
On 23 April 2013 15:09, Arne Hormann <arneh...@gmail.com> wrote:
I know - but that's not always the case. Sometimes I want to start with a small slice and only grow it incrementally when needed.

The first use of append [1] on a full slice will grow the slice
*by a multiplicative factor*, so the resulting slice has room
to append additonal elements.

Chris


[1] Well, an append that appends 1 or more items.
--
Chris "allusive" Dollin

chris dollin

unread,
Apr 23, 2013, 10:55:20 AM4/23/13
to Arne Hormann, golang-nuts
Oh, /that/ slice, the second argument slice for append. SIlly of me: I was
focussed on the first argument / result slice.

But why make the slice bigger with irrelevant values (the "wasted zeroes")
rather than the values you want to append to it?

[Note that this is a place where escape analysis could spot that the
second argument slice need not be heap allocated, and the zeroing
and copying is pretty much a given if you're going to make a slice
bigger ...]

Chris

--
Chris "allusive" Dollin

Arne Hormann

unread,
Apr 23, 2013, 11:14:33 AM4/23/13
to golan...@googlegroups.com, Arne Hormann, ehog....@googlemail.com


Am Dienstag, 23. April 2013 16:55:20 UTC+2 schrieb chris dollin:
On 23 April 2013 15:24, Arne Hormann <arneh...@gmail.com> wrote:
From an off-list response:
> What makes you think it discards it [the underlying array] immediately?

Taken my play example: the slice you have is len == 16, in append you make another slice with len == 16. The slice you get is len == 32.
If there's no free space after the first slice, append has to allocate a new slice with len == 32 and copy the contents of the first and second slice into it.
The second slice - the make([]byte, 16) in append - was allocated but discarded as it'll never be used. Still, the memory in it was zeroed and copied, which is a waste.

Oh, /that/ slice, the second argument slice for append. SIlly of me: I was
focussed on the first argument / result slice.

But why make the slice bigger with irrelevant values (the "wasted zeroes")
rather than the values you want to append to it?

This originally came up during a review of the go-sql-driver/mysql zero copy buffer. I don't know the values before they are read.
It's hard to know before how large the largest column ever fetched will be - doubling the buffer size is probably fine, but it should allocate as little memory as possible and not allocate in more than step.
Speed is probably more important than memory (Julien's call), so I guess append for doubling will be the way to go.
I just want to make sure the implementation uses the best way to do it.
 
[Note that this is a place where escape analysis could spot that the
second argument slice need not be heap allocated, and the zeroing
and copying is pretty much a given if you're going to make a slice
bigger ...]

That's exactly what I meant when I wrote copy & make is probably harder to optimize - the make-case for append should be pretty easy.

roger peppe

unread,
Apr 23, 2013, 1:04:20 PM4/23/13
to Arne Hormann, golang-nuts, ehedgehog
I'd do something like this:


// grow returns a slice with the same contents as slice
// but with a capacity at least as large as newCap.
func grow(slice []byte, newCap int) []byte {
if cap(slice) >= newCap {
return slice
}
newSlice := make([]byte, len(slice), newCap)
copy(newSlice, slice)
return newSlice

Gerard

unread,
Apr 24, 2013, 3:56:22 AM4/24/13
to golan...@googlegroups.com, Arne Hormann, ehog....@googlemail.com
Thinking about it once more, I would like to mention two things:
  1. I think that clear code is more important than micro-optimisation, especially with todays processor power.
  2. If you really want to know the answer, perform benchmark tests (with actual data and 32/64 bit environment and Go1.1).

Julien Schmidt

unread,
Apr 24, 2013, 9:40:34 AM4/24/13
to golan...@googlegroups.com, Arne Hormann
I tried an approach based on Maxim's grow3:

Here is the current code (with debug output):
// grow buffer if necessary
if need > len(b.buf) {
fmt.Printf("REFILL len=%d, need=%d \r\n", len(b.buf), need)
for cap(b.buf) < need {
b.buf = append(b.buf[:cap(b.buf)], 0)
fmt.Printf("APPENDED cap=%d \r\n", cap(b.buf))
}
b.buf = b.buf[:cap(b.buf)]
fmt.Printf("END len=%d, need=%d \r\n", len(b.buf), need)
}
need is the new size we need, b.buf is the buffer so len(b.buf) is the current size.

Here is the debug output from a test:
REFILL len=4096, need=1048551
APPENDED cap=5120
APPENDED cap=6400
APPENDED cap=8000
APPENDED cap=10000
APPENDED cap=12500
APPENDED cap=15625
APPENDED cap=19531
APPENDED cap=24413
APPENDED cap=30516
APPENDED cap=38145
APPENDED cap=47681
APPENDED cap=59601
APPENDED cap=74501
APPENDED cap=93126
APPENDED cap=116407
APPENDED cap=145508
APPENDED cap=181885
APPENDED cap=227356
APPENDED cap=284195
APPENDED cap=355243
APPENDED cap=444053
APPENDED cap=555066
APPENDED cap=693832
APPENDED cap=867290
APPENDED cap=1084112
END len=1084112, need=1048551

Any suggestions how this could be optimized? 

Julien Schmidt

unread,
Apr 24, 2013, 9:54:55 AM4/24/13
to golan...@googlegroups.com, Arne Hormann
I'm now tending to use grow2 instead. I'm writing a benchmark now...

Hotei

unread,
Apr 24, 2013, 9:57:53 AM4/24/13
to golan...@googlegroups.com
There's nothing magic about the append function.  If the current append doesn't suit your purpose it's trivial to write your own and have it grow your slices by whatever increment percentage you want, including a sliding scale percentage if that's useful.

chris dollin

unread,
Apr 24, 2013, 10:00:29 AM4/24/13
to Julien Schmidt, golang-nuts, Arne Hormann
On 24 April 2013 14:40, Julien Schmidt <g...@julienschmidt.com> wrote:
I tried an approach based on Maxim's grow3:

Here is the current code (with debug output):
// grow buffer if necessary
if need > len(b.buf) {
fmt.Printf("REFILL len=%d, need=%d \r\n", len(b.buf), need)
for cap(b.buf) < need {
b.buf = append(b.buf[:cap(b.buf)], 0)
fmt.Printf("APPENDED cap=%d \r\n", cap(b.buf))
}
b.buf = b.buf[:cap(b.buf)]
fmt.Printf("END len=%d, need=%d \r\n", len(b.buf), need)
}
need is the new size we need, b.buf is the buffer so len(b.buf) is the current size.

Here is the debug output from a test:

...

Any suggestions how this could be optimized? 

If you don't know what the elements are, but do know how many you
need, the grow2() version makes the most sense to me.

If the allocated size is going to be beig enough, you can
still add the elements using append() so that len() is not
misleading.

chris dollin

unread,
Apr 24, 2013, 10:01:37 AM4/24/13
to Hotei, golang-nuts
On 24 April 2013 14:57, Hotei <hote...@gmail.com> wrote:
There's nothing magic about the append function.

Apart from its polymorphic type.

Chris

--
Chris "polymorph self" Dollin

Maxim Khitrov

unread,
Apr 24, 2013, 10:02:12 AM4/24/13
to Julien Schmidt, golan...@googlegroups.com, Arne Hormann
grow3 works well when the extra requested space is less than the
current capacity. When you need to more than double the existing
capacity, the make & copy approach is likely to be faster. Here's an
example:

http://play.golang.org/p/sLHRRReVvI

Julien Schmidt

unread,
Apr 24, 2013, 10:26:55 AM4/24/13
to golan...@googlegroups.com, Arne Hormann

On Wednesday, April 24, 2013 4:02:12 PM UTC+2, Maxim Khitrov wrote:
grow3 works well when the extra requested space is less than the
current capacity.
That was also my impression.
 
When you need to more than double the existing
capacity, the make & copy approach is likely to be faster. Here's an
example:

http://play.golang.org/p/sLHRRReVvI
This looks really nice! I think I'll skip the benchmark and stick to that code.

In the current use-case this code part is not really performance critical. It's more a general interest. Maybe I'll write a benchmark some time later to find the exact point where append gets more expensive. 

Julien Schmidt

unread,
Jun 2, 2013, 9:19:34 AM6/2/13
to golan...@googlegroups.com, Arne Hormann
First of all, the built-in append function has not same characteristics as the example Append-function in Go at http://golang.org/doc/effective_go.html#slices. You often read, that append would double the size when it must reallocate, which is not entirely true. The slice is currently growed by factor 1.25 if the size is greater than 1024: http://code.google.com/p/go/source/browse/src/pkg/runtime/slice.c#208 

Also I decided to run a benchmark. The (heavily copy-pasted) source: http://play.golang.org/p/xMWTHiu9JY

Results:

System: Intel i5-2500K @ 3.30 GHz, 8192 MB RAM, Windows 7 x64

PASS
BenchmarkGrowAppend1       50000     32121 ns/op   10240 B/op       1 allocs/op
BenchmarkGrowMake1         50000     32101 ns/op    9471 B/op       1 allocs/op
BenchmarkGrowAppend4       50000     32261 ns/op   10240 B/op       1 allocs/op
BenchmarkGrowMake4         50000     31961 ns/op    9471 B/op       1 allocs/op
BenchmarkGrowAppend64      50000     31721 ns/op   10240 B/op       1 allocs/op
BenchmarkGrowMake64        50000     32241 ns/op    9471 B/op       1 allocs/op
BenchmarkGrowAppend256     50000     31861 ns/op   10240 B/op       1 allocs/op
BenchmarkGrowMake256       50000     31981 ns/op    9471 B/op       1 allocs/op
BenchmarkGrowAppend1024    50000     32241 ns/op   10240 B/op       1 allocs/op
BenchmarkGrowMake1024      50000     32121 ns/op   10239 B/op       1 allocs/op
BenchmarkGrowAppend4096    10000    130507 ns/op   50688 B/op       6 allocs/op
BenchmarkGrowMake4096     100000     21841 ns/op   11601 B/op       1 allocs/op
BenchmarkGrowAppend16384   10000    264215 ns/op  146688 B/op      12 allocs/op
BenchmarkGrowMake16384   1000000      1372 ns/op   20480 B/op       1 allocs/op
BenchmarkGrowAppend65536    5000    576232 ns/op  454912 B/op      21 allocs/op
BenchmarkGrowMake65536     50000     66623 ns/op   79872 B/op       2 allocs/op
BenchmarkGrowAppend262144   2000   1049059 ns/op 1577216 B/op      33 allocs/op
BenchmarkGrowMake262144    50000     73044 ns/op  276480 B/op       2 allocs/op
ok   command-line-arguments 110.262s


Same system, Linux (Ubuntu 12.04) @ VirtualBox VM with 2048 MB RAM, 2 CPU Cores [VT-x/AMD-V, Nested Paging and PAE/NX enabled]

testing: warning: no tests to run
PASS
BenchmarkGrowAppend1      200000     12270 ns/op    5181 B/op       1 allocs/op
BenchmarkGrowMake1        500000      7012 ns/op    4352 B/op       1 allocs/op
BenchmarkGrowAppend4      100000     16431 ns/op    5242 B/op       1 allocs/op
BenchmarkGrowMake4        500000      6958 ns/op    4352 B/op       1 allocs/op
BenchmarkGrowAppend64     100000     16332 ns/op    5242 B/op       1 allocs/op
BenchmarkGrowMake64       500000      7048 ns/op    4352 B/op       1 allocs/op
BenchmarkGrowAppend256    100000     16635 ns/op    5242 B/op       1 allocs/op
BenchmarkGrowMake256      500000      7032 ns/op    4352 B/op       1 allocs/op
BenchmarkGrowAppend1024   200000      7847 ns/op    5120 B/op       1 allocs/op
BenchmarkGrowMake1024     200000     11910 ns/op    5175 B/op       1 allocs/op
BenchmarkGrowAppend4096    50000     43127 ns/op   30607 B/op       4 allocs/op
BenchmarkGrowMake4096     200000     11180 ns/op    8192 B/op       1 allocs/op
BenchmarkGrowAppend16384   50000     70581 ns/op  106342 B/op       8 allocs/op
BenchmarkGrowMake16384    200000     12060 ns/op   20480 B/op       1 allocs/op
BenchmarkGrowAppend65536   10000    220643 ns/op  371557 B/op      13 allocs/op
BenchmarkGrowMake65536     50000     37278 ns/op   70313 B/op       1 allocs/op
BenchmarkGrowAppend262144   5000    556190 ns/op 1441009 B/op      20 allocs/op
BenchmarkGrowMake262144    50000     59496 ns/op  267776 B/op       1 allocs/op
ok   command-line-arguments 124.845s

So the extra case-differentiation currently doesn't make sense. If you know that that the underlying array must be reallocated, it is better to use just make.
Reply all
Reply to author
Forward
0 new messages