Is there an easy way to get a single Unicode character from a string?


赵铭宇

May 15, 2015, 8:35:51 PM
to golan...@googlegroups.com
I asked this question at http://stackoverflow.com/questions/30263607/how-to-get-a-single-unicode-character-from-string-in-golang,
and I did get some ways to do it.
But I am still confused.
I can use a for range loop to visit every Unicode character, which means that getting a single Unicode character is possible.
Why isn't there an easy way to get a single Unicode character by index?
Why not add this feature to the string type in Go?

Rob Pike

May 15, 2015, 8:39:00 PM
to 赵铭宇, golan...@googlegroups.com

Ian Lance Taylor

May 15, 2015, 8:40:29 PM
to 赵铭宇, golang-nuts
To get the first Unicode character, you can write
    []rune(s)[0]
To get the next one, use 1 instead of 0, and so on.

To do it more efficiently, you can use
http://godoc.org/golang.org/x/exp/utf8string .

Ian
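
For anyone reading along, here is a minimal sketch of the []rune conversion with a bounds check. The helper name runeAt is made up for illustration and is not part of any library. Note that the conversion decodes the whole string on every call, so it costs time and memory proportional to the string length.

```go
package main

import "fmt"

// runeAt (hypothetical helper) returns the i-th Unicode code point of s,
// counting from 0, and reports whether that index exists.
// The []rune conversion copies and decodes the entire string.
func runeAt(s string, i int) (rune, bool) {
	rs := []rune(s)
	if i < 0 || i >= len(rs) {
		return 0, false
	}
	return rs[i], true
}

func main() {
	s := "héllo, 世界"
	r, ok := runeAt(s, 7)
	fmt.Printf("%c %v\n", r, ok)        // 世 true
	fmt.Println(len(s), len([]rune(s))) // 14 9 (bytes vs. code points)
}
```

The last line shows why plain s[i] does not do the job: len(s) counts bytes, not characters.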

adon...@google.com

May 18, 2015, 5:07:01 PM
to golan...@googlegroups.com, zhx19394...@gmail.com
On Friday, 15 May 2015 20:40:29 UTC-4, Ian Lance Taylor wrote:
To get the first Unicode character, you can write
    []rune(s)[0]
To get the next one, use 1 instead of 0, and so on.

To do it more efficiently, you can use
http://godoc.org/golang.org/x/exp/utf8string .

If you don't want to pull in an experimental library, you can use a range loop:

for _, r := range s {
    print(r) // use the first rune
    break
}

or:

r, _ := utf8.DecodeRuneInString(s) // package unicode/utf8; r is utf8.RuneError if s is empty or not valid UTF-8
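
Both approaches extend from the first rune to the n-th one. The following is only a sketch using the standard library; the function names nthRune and firstRune are invented for this example. It walks the string with a range loop, so it allocates nothing but still takes time proportional to n.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// nthRune (hypothetical helper) returns the n-th code point of s, counting
// from 0, by walking the string with a range loop. No allocation is needed.
func nthRune(s string, n int) (rune, bool) {
	if n < 0 {
		return 0, false
	}
	for _, r := range s {
		if n == 0 {
			return r, true
		}
		n--
	}
	return 0, false // s has fewer than n+1 code points
}

// firstRune (hypothetical helper) returns the first code point of s and the
// number of bytes it occupies, using utf8.DecodeRuneInString.
func firstRune(s string) (rune, int) {
	return utf8.DecodeRuneInString(s)
}

func main() {
	r, ok := nthRune("héllo, 世界", 8)
	fmt.Printf("%c %v\n", r, ok) // 界 true

	first, size := firstRune("世界")
	fmt.Printf("%c %d\n", first, size) // 世 3
}
```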

simon place

May 18, 2015, 7:56:41 PM
to golan...@googlegroups.com
Isn't this all because runes are 32 bits, so fixed length, whereas strings are UTF-8, which is variable length, so there is no way to 'index' directly to a character? Think of strings as compressed rune arrays.

Unicode assigns a reference number (a code point) to each character; runes and UTF-8 just encode that number differently.

Understanding this, if you wanted to index repeatedly into the same string, you could do one pass and record the byte offset of each UTF-8 encoded character, then use that to get to the character. But offsets are the same size as runes or larger, so saving the offset of, say, every 16th character and then moving forward to the one actually required might make sense if you had a lot of long, unchanging strings.
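
A rough sketch of that every-16th-character idea, using only the standard library. The type indexedString, its constructor and the stride constant are all hypothetical names chosen for this example, and the stride of 16 is just the figure mentioned above, not a tuned value.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// stride is the spacing between recorded checkpoints (an assumption taken
// from the "every 16th character" suggestion).
const stride = 16

// indexedString (hypothetical type) pairs a string with a sparse table of
// byte offsets so that the i-th code point can be found in bounded time.
type indexedString struct {
	s       string
	offsets []int // offsets[k] is the byte offset of code point k*stride
	n       int   // total number of code points in s
}

// newIndexedString makes one pass over s and records every stride-th offset.
func newIndexedString(s string) *indexedString {
	is := &indexedString{s: s}
	for off := range s {
		if is.n%stride == 0 {
			is.offsets = append(is.offsets, off)
		}
		is.n++
	}
	return is
}

// At jumps to the nearest checkpoint at or before i, then decodes forward
// at most stride-1 code points to reach the requested one.
func (is *indexedString) At(i int) (rune, bool) {
	if i < 0 || i >= is.n {
		return 0, false
	}
	off := is.offsets[i/stride]
	for skip := i % stride; skip > 0; skip-- {
		_, size := utf8.DecodeRuneInString(is.s[off:])
		off += size
	}
	r, _ := utf8.DecodeRuneInString(is.s[off:])
	return r, true
}

func main() {
	is := newIndexedString("héllo, 世界")
	r, ok := is.At(7)
	fmt.Printf("%c %v\n", r, ok) // 世 true
}
```

The table holds one int per 16 code points, so the memory overhead is small compared with a full []rune copy, at the price of decoding up to 15 characters per lookup.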