`for i, v := range` causes allocation

456 views
Skip to first unread message

Luke Wilson

unread,
Apr 12, 2021, 12:01:38 PM4/12/21
to golan...@googlegroups.com
I've heard several times from members of the community (on Matrix and possibly on answers) that a simple iteration like
const mixed = "\b5Ὂg̀9! ℃ᾭG"
for _, c := range mixed {
	... do something with c (but not write to it)

will actually silently allocate a slice of runes and decode the string into it, before iteration. I've heard it is done to prevent problems that occur when a programmer might overwrite data being iterated, which should be a no-brainer for programmers in general, but sure, whatever. So is it true in the case for constants? Is it true always, or only when writes occur to the source string or `c` in that case?

And if it always occurs, wouldn't it be great optimization to only decode runes when we get to them?

Since hearing of this, I started doing all of my utf-8 iteration of string runes as such:

import "unicode/utf8"
str := "Some UTF-8 string."
var i int
for i < len(str) {
	r, size := utf8.DecodeRune(str[i:])
	// do something with r...
	i += size
}

Which admittedly is a bit of a burden, and possibly premature optimization if I'm wrong. But I just think it shouldn't silently allocate a ginormous slice in the background to iterate the runes of a string I might not read all of, and especially considering `for i, v := range` should be idiomatic.

I tried to make a test on Compiler Explorer modifying the string being iterated, and not. I learned that I can't read the assembly gc produces. But what I gathered is that it takes a pointer to the original string and actually calls runtime.decoderune on each iteration. Modifications to mixed change variable mixed (to point to the newly allocated string(s)), while again, the original string pointer is kept in stack while iterating. I'd like to have somebody who can read the assembly confirm.

If that's the case, then it completely voids my argument and concerns. And will make many people, including myself, very happy to learn that it is in fact optimal.

In addition, I wonder if it's the same for other types being iterated? What if a byte slice for example has some bytes modified during iteration? It must create a copy, but it shouldn't copy for loops that do not write to the iterated variable.

Sorry for the rant, I'm passionate about it.

Luke

Jan Mercl

unread,
Apr 12, 2021, 12:33:51 PM4/12/21
to Luke Wilson, golang-nuts
On Mon, Apr 12, 2021 at 6:01 PM Luke Wilson <theluk...@gmail.com> wrote:

> I've heard several times from members of the community (on Matrix and possibly on answers) that a simple iteration like
>
> const mixed = "\b5Ὂg̀9! ℃ᾭG"
> for _, c := range mixed {
> ... do something with c (but not write to it)
>
> will actually silently allocate a slice of runes and decode the string into it, before iteration. I've heard it is done to prevent problems that occur when a programmer might overwrite data being iterated

I believe no silent allocation and no conversion to a slice of runes
occurs. A single instance of variable c, of type rune, exists within
the loop. There's no problem with modifying 'c'. A problem exists if
the _address_ of 'c' is assumed to point to different variables in
each cycle. That's not the case.

Ian Lance Taylor

unread,
Apr 12, 2021, 12:34:45 PM4/12/21
to Luke Wilson, golang-nuts
On Mon, Apr 12, 2021 at 9:01 AM Luke Wilson <theluk...@gmail.com> wrote:
>
> I've heard several times from members of the community (on Matrix and possibly on answers) that a simple iteration like
>
> const mixed = "\b5Ὂg̀9! ℃ᾭG"
> for _, c := range mixed {
> ... do something with c (but not write to it)
>
> will actually silently allocate a slice of runes and decode the string into it, before iteration. I've heard it is done to prevent problems that occur when a programmer might overwrite data being iterated, which should be a no-brainer for programmers in general, but sure, whatever. So is it true in the case for constants? Is it true always, or only when writes occur to the source string or `c` in that case?

There are different Go compilers, but I'm not aware of any compiler
that implements range over a string by first converting the string to
a slice of runes. The default Go compiler implements this by calling
runtime.decoderune in a loop to fetch each new rune. You can see that
code at https://golang.org/src/runtime/utf8.go#L51. There won't be
any additional allocation in this statement.


> Since hearing of this, I started doing all of my utf-8 iteration of string runes as such:
>
> import "unicode/utf8"
> str := "Some UTF-8 string."
> var i int
> for i < len(str) {
> r, size := utf8.DecodeRune(str[i:])
> // do something with r...
> i += size
> }
>
> Which admittedly is a bit of a burden, and possibly premature optimization if I'm wrong. But I just think it shouldn't silently allocate a ginormous slice in the background to iterate the runes of a string I might not read all of, and especially considering `for i, v := range` should be idiomatic.

There is nothing wrong with that code, but it's essentially the same
as what the simple range over a string does.


> In addition, I wonder if it's the same for other types being iterated? What if a byte slice for example has some bytes modified during iteration? It must create a copy, but it shouldn't copy for loops that do not write to the iterated variable.

Ranging over a slice does not create a copy of the slice.

However, ranging over an array (not a pointer to an array) does, in
the general case, create a copy of the array. That is required by the
semantics of range. In some cases that copy can be optimized away.

Ian

Luke Wilson

unread,
Apr 12, 2021, 12:42:34 PM4/12/21
to Jan Mercl, ia...@golang.org, golan...@googlegroups.com
On 4/12/21 11:31 AM, Jan Mercl wrote:
> I believe no silent allocation and no conversion to a slice of runes
> occurs. A single instance of variable c, of type rune, exists within
> the loop. There's no problem with modifying 'c'. A problem exists if
> the _address_ of 'c' is assumed to point to different variables in
> each cycle. That's not the case.

And Ian Lance Taylor:

> There are different Go compilers, but I'm not aware of any compiler
> that implements range over a string by first converting the string to
> a slice of runes. The default Go compiler implements this by calling
> runtime.decoderune in a loop to fetch each new rune. You can see that
> code athttps://golang.org/src/runtime/utf8.go#L51. There won't be
> any additional allocation in this statement.

Sweet, I am very excited to hear this. Thanks Ian, and Jan. I'll let
people know.

Brian Candler

unread,
Apr 12, 2021, 2:58:22 PM4/12/21
to golang-nuts
On Monday, 12 April 2021 at 17:01:38 UTC+1 theluk...@gmail.com wrote:
I've heard several times from members of the community (on Matrix and possibly on answers) that a simple iteration like
const mixed = "\b5Ὂg̀9! ℃ᾭG"
for _, c := range mixed {
	... do something with c (but not write to it)

will actually silently allocate a slice of runes and decode the string into it, before iteration. I've heard it is done to prevent problems that occur when a programmer might overwrite data being iterated, which should be a no-brainer for programmers in general, but sure, whatever. So is it true in the case for constants? Is it true always, or only when writes occur to the source string or `c` in that case?


Writing to c won't make any difference (because it's a completely separate variable, an int32 which doesn't share any memory with the original string); and writing to the source string is impossible (because strings are immutable in go).

Could it be possible that you're thinking of the opposite case?  Given an input slice b []byte, if you choose to iterate over it as a series of UTF-8 codepoints using

    for _, c := range string(b) {
        ...
    }


then I would expect the string(b) conversion to make a byte-for-byte copy of b, because a string is immutable but b isn't.

This isn't the same as exploding the string into a slice of int32 runes up-front, but it might explain where the rumour came from.
Reply all
Reply to author
Forward
0 new messages