On Sunday, July 10, 2011 at 12:29 AM, hka...@gmail.com wrote:
UTF16toString()
func UTF16ToString(s []uint16) string {
    for i, v := range s {
        if v == 0 {
            s = s[0:i]
            break
        }
    }
    return string(utf16.Decode(s))
}
Alex
It is a Windows-specific function. It only exists if GOOS=windows. Use the utf16 package instead.
Alex
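For anyone reading along on a non-Windows system, here is a minimal sketch of the same helper built only on the utf16 package. The name utf16ToString is made up for this example, and the import path shown is the one used by current Go releases (unicode/utf16); at the time of this thread the package was imported as plain "utf16".

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// utf16ToString is a hypothetical portable stand-in for
// syscall.UTF16ToString: it cuts the input at the first NUL code
// unit, if there is one, and decodes what remains as UTF-16.
func utf16ToString(s []uint16) string {
	for i, v := range s {
		if v == 0 {
			s = s[:i]
			break
		}
	}
	return string(utf16.Decode(s))
}

func main() {
	// "hi", then a NUL terminator, then a unit that must be ignored.
	units := []uint16{'h', 'i', 0, 'x'}
	fmt.Println(utf16ToString(units)) // prints "hi"
}
```

This only covers input that is already a []uint16. Raw bytes still have to be paired up into code units first, with the right byte order.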
On 7/9/2011 7:35 PM, Athiwat Chunlakhan wrote:
> I know about those two packages already, but the question here is how do I use them to get my result?
>
It looks clumsy to me, but something like:
import (
    "encoding/binary"
    "unicode/utf16" // plain "utf16" and "utf8" in older Go releases
    "unicode/utf8"
)

func decode_utf16_to_utf8_string(s string) string {
    if len(s)%2 != 0 { // Error, can't be UTF-16
        return ""
    }
    // Pair up the bytes into 16-bit code units (little-endian assumed)
    as_uint16 := make([]uint16, len(s)/2)
    for i := 0; i < len(s); i += 2 {
        as_uint16[i/2] = binary.LittleEndian.Uint16([]byte(s[i : i+2]))
    }
    as_rune_slice := utf16.Decode(as_uint16)
    // I think 4 bytes is enough, worst case, but UCS-2 is actually
    // only 3 bytes
    // http://stackoverflow.com/questions/6466071/whats-the-longest-in-bytes-utf-8-character-which-is-present-in-ucs-2
    as_bytes := make([]byte, len(as_rune_slice)*4)
    offset := 0
    for _, r := range as_rune_slice {
        offset += utf8.EncodeRune(as_bytes[offset:], r)
    }
    return string(as_bytes[:offset])
}
I haven't verified it, but that should at least convert the types
correctly at each step. If you wanted, you could probably change the
algorithm a bit to avoid some of the intermediate buffers. However,
because of characters outside the Basic Multilingual Plane, which
UTF-16 encodes as surrogate pairs, not every char can be encoded in a
single uint16. If you're sure the data doesn't contain that extended
set (i.e. it is plain UCS-2), then you could change the first as_uint16
loop to decode straight to a rune. Something like:
func decode_utf16_to_utf8_string(s string) string {
    if len(s)%2 != 0 { // Error, can't be UTF-16
        return ""
    }
    as_uint16 := make([]uint16, 1)
    // UCS-2 into UTF-8 has a worst-case of growing by 1 byte per rune
    as_bytes := make([]byte, len(s)*3/2)
    offset := 0
    for i := 0; i < len(s); i += 2 {
        as_uint16[0] = binary.LittleEndian.Uint16([]byte(s[i : i+2]))
        // utf16.Decode returns a slice; one code unit in, one rune out
        r := utf16.Decode(as_uint16)[0]
        offset += utf8.EncodeRune(as_bytes[offset:], r)
    }
    return string(as_bytes[:offset])
}
John
=:->
I'm confused as to why that function is in the syscall package at all. I see that it and related function are needed by some Windows syscalls, ...
... why not have those functions in the utf16 package itself?
as__uint16 := make([]uint16, 1)
I think it is trivial to implement if anyone needs it. Also, I'm not sure where this function would be applicable other than in the Windows syscall package.
Alex
... On the other hand, if they are merely trivial helper functions with no use outside of certain Windows syscalls, they should not be exposed.