It looks like UTF-16 Big Endian.
If so, the first 2 bytes should be chr(254) and chr(255) (the BOM).
If it's UTF-32 Big Endian, the first 4 bytes should be 0, 0, 254, 255.
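A minimal sketch of that check in C, in case it helps. The name detect_bom is hypothetical, and the two Little Endian marks (FF FE and FF FE 00 00) are added here for completeness; they are not mentioned above.

```c
#include <stddef.h>

/* Hypothetical helper: returns a short label for the BOM found at the
   start of buf, or "none" when no known BOM is present. */
const char *detect_bom(const unsigned char *buf, size_t len)
{
    /* Test the 4-byte UTF-32 marks first: the UTF-32 LE BOM
       (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE). */
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 &&
        buf[2] == 0xFE && buf[3] == 0xFF)
        return "UTF-32BE";
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE &&
        buf[2] == 0x00 && buf[3] == 0x00)
        return "UTF-32LE";
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return "UTF-16BE";   /* chr(254), chr(255) */
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return "UTF-16LE";
    return "none";
}
```

Read the first 4 bytes of the file into a buffer and pass them in. A file without a BOM returns "none", which says nothing about its real encoding.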
Dan
--
You received this message because you are subscribed to the Google Groups "Harbour Users" group.
Unsubscribe: harbour-user...@googlegroups.com
Web: https://groups.google.com/group/harbour-users
This is just a workaround; it is not the correct way to convert UTF16BE to <whatever>.
I know that my good friend fdaniele does it that way (from UTF16 LE), and I told him I was "horrified" by that. Eh eh, just joking.
It's a quick and dirty method, but it works, more or less.
Unfortunately, there is no Windows API function to convert from UTF16BE. You should change the "endianness" of the file using a function that swaps the two bytes of each encoded character. You can write a C function that calls _swab (which is a C runtime library function, not a Windows API function).
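A rough sketch of the swap in plain C, assuming the whole file is already in a memory buffer. The name swap_utf16_bytes is hypothetical; the loop is a portable stand-in for what _swab does with the Microsoft C runtime, and it is not wrapped for Harbour here.

```c
#include <stddef.h>

/* Hypothetical helper: swaps each pair of bytes in place, turning
   UTF-16BE into UTF-16LE (or the other way round).
   A trailing odd byte, if any, is left untouched. */
void swap_utf16_bytes(unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i + 1 < len; i += 2) {
        unsigned char tmp = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
    }
}
```

The BOM is swapped along with everything else, so FE FF becomes FF FE and the buffer then reads as a regular UTF-16LE text.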
Once the bytes are swapped, you can convert from UTF16LE using the Harbour functions or the Windows API function WideCharToMultiByte (again, you need a C wrapper).
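To show what that conversion step does, here is a hand-written UTF-16LE to UTF-8 converter in portable C. It is only an illustration and a swapped-in technique: it is neither WideCharToMultiByte nor the Harbour functions, and the name utf16le_to_utf8 is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: converts UTF-16LE bytes to UTF-8.
   Returns the number of bytes written to out, or (size_t)-1 on
   malformed input or when out is too small. */
size_t utf16le_to_utf8(const unsigned char *in, size_t in_len,
                       unsigned char *out, size_t out_cap)
{
    size_t i = 0, o = 0;
    if (in_len % 2 != 0) return (size_t)-1;
    while (i < in_len) {
        /* Low byte first: this is the Little Endian order. */
        unsigned long cp = in[i] | ((unsigned long)in[i + 1] << 8);
        i += 2;
        if (cp >= 0xD800 && cp <= 0xDBFF) {          /* high surrogate */
            unsigned long lo;
            if (i + 1 >= in_len) return (size_t)-1;
            lo = in[i] | ((unsigned long)in[i + 1] << 8);
            if (lo < 0xDC00 || lo > 0xDFFF) return (size_t)-1;
            i += 2;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;                       /* lone low surrogate */
        }
        if (cp < 0x80) {
            if (o + 1 > out_cap) return (size_t)-1;
            out[o++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            if (o + 2 > out_cap) return (size_t)-1;
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            if (o + 3 > out_cap) return (size_t)-1;
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            if (o + 4 > out_cap) return (size_t)-1;
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

The function does not skip a leading BOM: FF FE in the input comes out as EF BB BF, the UTF-8 BOM. Drop the first two bytes before calling it if that is not wanted.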
In UTF-16 you have a character codified in 2 bytes. The second byte is 00 unless the character is in the first 256 ASCII codes (they more or less match the ANSI codify, 1 byte needed), in LE you have the 00 as second byte, in BE the order is inverted.
Anyway, an editor should let you open a UTF-16 BE text file and save it as, say, UTF-8. UTF-8 can represent every character that UTF-16 can, so nothing is lost in that conversion; characters can be lost only if you save to a single-byte ANSI code page instead.
As for the e" preceding the string, this is the representation of a UTF16 string in a Harbour memvar.
Dan
Correction: my explanation of the UTF16 encoding is very approximate, and there is a typo.
It should read:
In UTF-16 a character is encoded in 2 bytes. In LE the second byte is 00 IF the character is among the first 256 character codes (they more or less match the ANSI encoding); otherwise both bytes are significant. In BE the order is inverted.
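A tiny sketch of that byte layout in C, just for illustration. The name utf16_unit_bytes is hypothetical, and it only covers characters that fit in a single 2-byte unit (codes below 0x10000).

```c
/* Hypothetical helper: writes the two bytes of one UTF-16 code unit
   (character code below 0x10000) in the requested byte order. */
void utf16_unit_bytes(unsigned int unit, int big_endian, unsigned char out[2])
{
    unsigned char high = (unsigned char)((unit >> 8) & 0xFF);
    unsigned char low  = (unsigned char)(unit & 0xFF);
    if (big_endian) {
        out[0] = high;   /* BE: the 00 of a low code comes first */
        out[1] = low;
    } else {
        out[0] = low;    /* LE: the 00 of a low code comes second */
        out[1] = high;
    }
}
```

For "A" (code 0x41) this gives 41 00 in LE and 00 41 in BE; for the euro sign (code 0x20AC) it gives AC 20 in LE and 20 AC in BE, with no 00 byte at all.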