Invalid UTF-8 bytes in range string


Isaac Wagner

Nov 30, 2009, 5:51:15 PM
to golan...@googlegroups.com
What happens if range string (in a loop) encounters an invalid UTF-8 byte
sequence?

--
Isaac Wagner

SnakE

Nov 30, 2009, 6:03:30 PM
to i...@isaacwagner.me, golan...@googlegroups.com
2009/12/1 Isaac Wagner <i...@isaacwagner.me>:
> What happens if range string (in a loop) encounters an invalid UTF-8 byte
> sequence?

http://golang.org/doc/go_spec.html#For_statements

For strings, the "range" clause iterates over the Unicode code points in the string. On successive iterations, the index variable will be the index of the first byte of successive UTF-8-encoded code points in the string, and the second variable, of type int, will be the value of the corresponding code point. If the iteration encounters an invalid UTF-8 sequence, the second variable will be 0xFFFD, the Unicode replacement character, and the next iteration will advance a single byte in the string.
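A minimal sketch of the behavior quoted above, for anyone who wants to see it run. The helper `rangeSteps`, the `step` type, and the sample string are made up for illustration; note also that current Go releases give the second range variable type `rune` rather than `int`, which is assumed here.

```go
package main

import "fmt"

// step records what one iteration of range yields (hypothetical type).
type step struct {
	Index int  // byte index where the iteration started
	Rune  rune // decoded code point, or 0xFFFD for an invalid byte
}

// rangeSteps is a hypothetical helper that collects every (index, rune)
// pair produced by ranging over s.
func rangeSteps(s string) []step {
	var steps []step
	for i, r := range s {
		steps = append(steps, step{i, r})
	}
	return steps
}

func main() {
	// The byte 0xFF can never appear in valid UTF-8.
	for _, st := range rangeSteps("a\xffb") {
		fmt.Printf("index %d: %U\n", st.Index, st.Rune)
	}
	// Output:
	// index 0: U+0061
	// index 1: U+FFFD
	// index 2: U+0062
}
```

The index moves from 1 to 2 after the bad byte, which matches the "advance a single byte" wording in the spec.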

Isaac Wagner

Nov 30, 2009, 6:12:47 PM
to golan...@googlegroups.com
Hum. Thanks for the prompt response... (Perhaps I should take a short trip
over to the documentation next time.)

--
Isaac Wagner

Tetsu

Nov 30, 2009, 7:57:22 PM
to golang-nuts
Every invalid byte in a sequence is converted to 0xFFFD on its own.
For example:
input bytes: 61, F1, 80, 80
decoded code points: 0061, FFFD, FFFD, FFFD
BTW, this conversion doesn't follow the Unicode Standard 5.2
recommendation, which would replace the truncated sequence F1 80 80
with a single FFFD. But it's not a big deal.

-Tetsu