Amateur question: when should you use runes?

Kamil Ziemian

Nov 15, 2021, 12:59:33 PM

to golang-nuts

Hello,

I have read quite a few blog posts and articles, and listened to a good number of talks about strings, runes and encoding in Go. I'm now reading the Go Language Spec and I got stuck in the section about runes. I mean, it isn't hard in itself, but it raises too many questions for me. I decided that I need to learn more about Unicode and UTF-8, so from today I'm reading the Unicode Technical Site (?), currently the Glossary (https://www.unicode.org/glossary/). But I can't understand one thing: when, in practice, should you use runes?

My understanding at this moment is this. Unicode assigns every symbol a number (for now I disregard normalization and other more advanced stuff), and rune is an alias for int32 that stores the integer value of this number. UTF-8 is a variable-size encoding that uses one or more bytes to encode a symbol, and its bytes shouldn't, and DON'T, represent the integer value of the symbol's Unicode number. The virtues of UTF-8 are clear to me, as is how it saves space, but I can't find a reason why I should transform my text into runes. The Go stdlib has an io.RuneReader interface, so a reason must exist, but I just can't find it. Maybe it has something to do with sending information over the internet? I don't know; this is totally outside my humble knowledge.

You could say that since I don't see a reason to use runes, I probably shouldn't care about them. This is a valid point, but I want to know Go reasonably well, and constantly finding code with runes whose reason for existence I don't understand (e.g. functions in the stdlib that operate on runes) is quite demoralising to me.

Best
Kamil

Robert Engels

Nov 15, 2021, 2:00:13 PM

to Kamil Ziemian, golang-nuts

When your string contains Unicode characters outside ASCII, dealing with it as individual bytes is difficult. When you range over a string, each iteration yields one rune, a “Unicode character” (which may be multiple bytes), along with its byte index, which is easy to use.

See go.dev/blog/strings

burak serdar

Nov 15, 2021, 2:08:47 PM

to Kamil Ziemian, golang-nuts

In general, you should work with runes whenever you are working with text that is entered by humans, or text that will be read by humans.

When you work with a string as a stream of bytes, you either assume the string does not contain any bytes over 127 (i.e. it is pure ASCII), or you have to decode the UTF-8 yourself. Working with runes eliminates both problems.

Sachin Raut

Nov 15, 2021, 3:52:27 PM

to burak serdar, Kamil Ziemian, golang-nuts

An example of using the rune data type:

Let's say we want to retrieve the 5th digit from an integer (1234567).

Step 1: convert the integer to a string using strconv.Itoa()

Step 2: convert that string to a slice of runes using []rune(...)

Step 3: retrieve the 5th digit from the slice of runes

Step 4: convert the retrieved digit back to a string

Here is an example: https://play.golang.org/p/M22Awjcu2-0
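
The playground code behind the link is not reproduced here; the following is a sketch of the same four steps, assuming a non-negative integer (nthDigit is a hypothetical helper name).

```go
package main

import (
	"fmt"
	"strconv"
)

// nthDigit (hypothetical) returns the n-th digit (1-based) of a non-negative
// integer as a string, following the four steps. It panics if n is out of range.
func nthDigit(num, n int) string {
	s := strconv.Itoa(num) // Step 1: integer -> string
	r := []rune(s)         // Step 2: string -> []rune
	d := r[n-1]            // Step 3: pick the n-th rune (index n-1)
	return string(d)       // Step 4: rune -> string
}

func main() {
	fmt.Println(nthDigit(1234567, 5)) // 5
}
```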

Kamil Ziemian

Nov 17, 2021, 1:00:03 PM

to golang-nuts

Thank you all for the answers. I have a lot to do in the next few weeks; after that I will go back to runes and think more about your answers.

Best,
Kamil

Brian Candler

Nov 18, 2021, 3:44:59 AM

to golang-nuts

The example given of using runes to split a string of digits into individual digits isn't a great one, because treating the string as an array of bytes would work just as well *in that situation*.

Where it matters is when you're processing a string with codepoints > 127, for example "hellø wörld". Those special characters are encoded as multiple bytes in UTF-8, but each one is presented as a single rune.

Jan Mercl

Nov 18, 2021, 4:29:13 AM

to Kamil Ziemian, golang-nuts

On Mon, Nov 15, 2021 at 7:00 PM Kamil Ziemian <kziem...@gmail.com> wrote:

> ... when, in practice, should you use runes?

For example, the API of the unicode package uses runes extensively:
https://pkg.go.dev/unicode.

> My understanding at this moment is this. Unicode assigns every symbol a number (for now I disregard normalization and other more advanced stuff), and rune is an alias for int32 that stores the integer value of this number.

Unicode text is a sequence of codepoints. Some represent [visible]
glyphs, some do not. A codepoint is just a numeric code, like 32 is the
numeric code of the ASCII space symbol. (Unicode is a superset of
ASCII.)

> UTF-8 is a variable-size encoding that uses one or more bytes to encode a symbol, and its bytes shouldn't, and DON'T, represent the integer value of the symbol's Unicode number.

UTF-8 encoded Unicode text represents those codepoint sequences
losslessly, most of the time in less space than one int32 per
codepoint. A UTF-8 encoded codepoint still represents the Unicode
codepoint, i.e. it is a numeric value, but semantically, not directly
in the bit pattern of the byte sequence.

> The virtues of UTF-8 are clear to me, as is how it saves space, but I can't find a reason why I should transform my text into runes?

No need to search for a reason. It'll run into you sooner or later
while you keep writing non-trivial, Unicode-aware code ;-)

But if I had to come up with an example: consider a text editor
that reads a file, splits it into lines, which are UTF-8 strings, and
keeps them in this encoded form for space efficiency. While a line is
being edited, however, it is probably better to convert the current
line into a rune slice (lineRunes := []rune(line[cursorY])) when the
cursor enters the line, and work with that form until the cursor moves
to a different line, at which moment the []rune is converted back to a
string (line[cursorY] = string(lineRunes)).
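
A minimal sketch of the editor idea, assuming a toy editor type with only the fields needed here (editor, enterLine, insertAt and leaveLine are all hypothetical names): lines stay UTF-8 strings, and only the line under the cursor is held as []rune, so a cursor column counts characters rather than bytes.

```go
package main

import "fmt"

// editor (hypothetical) stores every line as a UTF-8 string and keeps only
// the line under the cursor decoded as a []rune while it is being edited.
type editor struct {
	line      []string
	cursorY   int
	lineRunes []rune // decoded copy of line[cursorY]
}

// enterLine decodes the line the cursor moves onto.
func (e *editor) enterLine(y int) {
	e.cursorY = y
	e.lineRunes = []rune(e.line[y])
}

// insertAt inserts r at rune (not byte) position x of the current line.
func (e *editor) insertAt(x int, r rune) {
	e.lineRunes = append(e.lineRunes, 0)       // grow by one slot
	copy(e.lineRunes[x+1:], e.lineRunes[x:])   // shift the tail right
	e.lineRunes[x] = r
}

// leaveLine encodes the edited line back into a UTF-8 string.
func (e *editor) leaveLine() {
	e.line[e.cursorY] = string(e.lineRunes)
	e.lineRunes = nil
}

func main() {
	e := &editor{line: []string{"first line", "hell wörld"}}
	e.enterLine(1)
	e.insertAt(4, 'ø') // column 4 counts characters, not bytes
	e.leaveLine()
	fmt.Println(e.line[1]) // hellø wörld
}
```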

-j
