Invalid UTF-8 bytes in range string

Isaac Wagner

Nov 30, 2009, 5:51:15 PM

to golan...@googlegroups.com

What happens if range string (in a loop) encounters an invalid UTF-8 byte
sequence?

--
Isaac Wagner

SnakE

Nov 30, 2009, 6:03:30 PM

to i...@isaacwagner.me, golan...@googlegroups.com

2009/12/1 Isaac Wagner <i...@isaacwagner.me>

> What happens if range string (in a loop) encounters an invalid UTF-8 byte
> sequence?

http://golang.org/doc/go_spec.html#For_statements

For strings, the "range" clause iterates over the Unicode code points in the string. On successive iterations, the index variable will be the index of the first byte of successive UTF-8-encoded code points in the string, and the second variable, of type int, will be the value of the corresponding code point. If the iteration encounters an invalid UTF-8 sequence, the second variable will be 0xFFFD, the Unicode replacement character, and the next iteration will advance a single byte in the string.
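The rule quoted above can be checked with a short program. This is only a minimal sketch: the helper name rangeString and the test string are made up for illustration, and in current Go the second range variable has type rune rather than int.

```go
package main

import "fmt"

// rangeString is a hypothetical helper (not part of any library). It records
// what a for-range loop over s yields on each iteration: the byte index of
// the iteration and the code point delivered for it.
func rangeString(s string) (indexes []int, runes []rune) {
	for i, r := range s {
		indexes = append(indexes, i)
		runes = append(runes, r)
	}
	return indexes, runes
}

func main() {
	// "a", a stray 0xFF byte (never valid in UTF-8), U+00E9 (two bytes), "b".
	s := "a\xff\u00e9b"
	idx, rs := rangeString(s)
	for k := range idx {
		fmt.Printf("index %d: U+%04X\n", idx[k], rs[k])
	}
}
```

The stray byte at index 1 comes back as U+FFFD and the loop resumes one byte later at index 2, while the valid two-byte code point at index 2 moves the next index to 4.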

Isaac Wagner

Nov 30, 2009, 6:12:47 PM

to golan...@googlegroups.com

Hum. Thanks for the prompt response... (Perhaps I should take a short trip
over to the documentation next time.)

--
Isaac Wagner

Tetsu

Nov 30, 2009, 7:57:22 PM

to golang-nuts

Every invalid byte in a sequence is converted to 0xFFFD individually.
For example:
bytes:       61, F1, 80, 80
code points: 0061, FFFD, FFFD, FFFD
BTW, this conversion doesn't follow the Unicode Standard 5.2
recommendation, which would replace the truncated sequence F1 80 80
with a single FFFD. But it's not a big deal.

-Tetsu
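The byte sequence from this example (61 F1 80 80) can be fed to a range loop directly. A rough sketch, with a made-up helper countReplacements that simply counts iterations yielding U+FFFD; it cannot tell a decoding error apart from a U+FFFD that is genuinely encoded in the string.

```go
package main

import "fmt"

// countReplacements is a hypothetical helper. It counts how many iterations
// of a for-range loop over s yield U+FFFD. A correctly encoded U+FFFD in s
// would be counted as well.
func countReplacements(s string) int {
	n := 0
	for _, r := range s {
		if r == '\uFFFD' {
			n++
		}
	}
	return n
}

func main() {
	// 0x61 is "a"; 0xF1 starts a four-byte sequence that is cut short
	// after two continuation bytes.
	s := "\x61\xf1\x80\x80"
	for i, r := range s {
		fmt.Printf("byte offset %d: %04X\n", i, r)
	}
	fmt.Println("replacement characters:", countReplacements(s))
}
```

Each of the three bytes after the "a" gets its own iteration and its own FFFD, so the count is 3 rather than the single replacement suggested by the Unicode recommendation.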
