Decoding UTF16 sequences to ASCII string


Tong Sun

Nov 3, 2015, 1:01:45 PM
to golang-nuts
Hi, 

I thought my test program, https://github.com/suntong/lang/blob/master/lang/Go/src/text/utf16B.go, worked very well: the output, which is enclosed at the bottom of the program, is exactly what I expected. But when I redirected the output to a file and viewed it with `less`, I found that the output is still in UTF-16, not an ASCII string.

I've put it on the Go Playground, but please don't let its current output deceive you.

Please help me make the output a true ASCII string. As you can see, I tried `utf16.Decode`, but that failed.

Thanks




Manlio Perillo

Nov 3, 2015, 1:37:23 PM
to golang-nuts
On Tuesday, 3 November 2015 at 19:01:45 UTC+1, Tong Sun wrote:
> [...]

Your program is not really correct.
The correct code is this:

The output looks correct on a terminal (or in a browser) because the \x00 byte is not printed and has zero width.
However, look at:

Note that a Go string is just a sequence of bytes, conventionally assumed to be UTF-8 encoded.
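
For readers without the playground links, here is a minimal sketch of the effect described above. It is not the code from the original program; the input bytes and the helper name naiveString are assumptions made up for illustration. It converts little-endian UTF-16 bytes straight to a string and prints the result with %q, which makes the hidden \x00 bytes visible.

```go
package main

import "fmt"

// naiveString is a hypothetical helper: it reinterprets raw UTF-16LE bytes
// as a Go string without decoding them, so every 0x00 byte is kept.
func naiveString(raw []byte) string {
	return string(raw)
}

func main() {
	// Assumed input: "Hi" in UTF-16LE, where each ASCII character
	// is followed by a 0x00 byte.
	raw := []byte{'H', 0x00, 'i', 0x00}
	s := naiveString(raw)

	fmt.Println(s)        // many terminals render this as if it were just "Hi"
	fmt.Printf("%q\n", s) // prints "H\x00i\x00"
	fmt.Println(len(s))   // prints 4, not 2
}
```

Redirecting the first line to a file and opening it in a pager that shows control characters exposes the same NUL bytes.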


Regards  Manlio 

Jakob Borg

Nov 3, 2015, 1:47:05 PM
to Tong Sun, golang-nuts
You need to give utf16.Decode a slice of uint16s, each representing one character (or code point or whatever):

http://play.golang.org/p/cCufRfEQso

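In case the playground link is unavailable, the idea can be sketched roughly as follows. This is not the linked code: the helper name decodeUTF16LE and the sample input are hypothetical, and it assumes the bytes are little-endian with no byte order mark.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16LE is a hypothetical helper: it pairs up little-endian bytes
// into uint16 code units and decodes them into a Go (UTF-8) string.
// A trailing odd byte, if any, is ignored.
func decodeUTF16LE(raw []byte) string {
	units := make([]uint16, 0, len(raw)/2)
	for i := 0; i+1 < len(raw); i += 2 {
		units = append(units, binary.LittleEndian.Uint16(raw[i:]))
	}
	return string(utf16.Decode(units))
}

func main() {
	// Assumed input: "Hi" encoded as UTF-16LE.
	raw := []byte{'H', 0x00, 'i', 0x00}
	fmt.Printf("%q\n", decodeUTF16LE(raw)) // prints "Hi"
}
```

For big-endian input, binary.BigEndian would be used instead.
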
//jb


Tong Sun

Nov 3, 2015, 1:52:46 PM
to Jakob Borg, golang-nuts
Oh Thank you Jakob, and Manlio!

On Tue, Nov 3, 2015 at 1:46 PM, Jakob Borg wrote:
> You need to give utf16.Decode a slice of uint16s, each representing one
> character (or code point or whatever):
>
> http://play.golang.org/p/cCufRfEQso
>
> //jb
>

Manlio Perillo

Nov 3, 2015, 2:35:52 PM
to golang-nuts, ja...@nym.se
On Tuesday, 3 November 2015 at 19:52:46 UTC+1, Tong Sun wrote:
> Oh Thank you Jakob, and Manlio!


Note that the correct solution is to use a UTF-16 decoder:

The decoding is a bit more complex, in general, due to surrogate pairs:
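
As a rough illustration of the surrogate pair point (not the code that was linked here; the wrapper name decodeUnits and the sample values are assumptions), a code point outside the Basic Multilingual Plane takes two uint16 code units, and the standard library's utf16.Decode merges them back into one rune:

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// decodeUnits is a hypothetical thin wrapper around utf16.Decode that
// returns the decoded code points as a Go (UTF-8) string.
func decodeUnits(units []uint16) string {
	return string(utf16.Decode(units))
}

func main() {
	// U+1F600 is encoded in UTF-16 as the surrogate pair 0xD83D 0xDE00.
	units := []uint16{'A', 0xD83D, 0xDE00}
	s := decodeUnits(units)

	fmt.Println(len(units))     // prints 3 (code units)
	fmt.Println(len([]rune(s))) // prints 2 (code points)

	// A lone surrogate is invalid; utf16.Decode replaces it with U+FFFD.
	fmt.Printf("%U\n", []rune(decodeUnits([]uint16{0xD83D})))
}
```

So treating each uint16 as one character only works for text that stays inside the Basic Multilingual Plane.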

> [...]

Regards  Manlio 