I am writing a pure C/C++ program to convert a character string from UCS-2
to UTF-8. I cannot find enough information on Google about the mapping table
(or formula) between UCS-2 and UTF-8.
I want to implement the program using pure bit operations (&, | and
shifting), and I do not want to invoke any OS-specific APIs.
Any reference samples or the mapping tables (formula) between UCS-2 and UTF-8?
thanks in advance,
George
http://www.unicode.org - it's not as simple as a mapping table though.
Are you sure you mean UCS2 and not UTF-16, btw?
> Any reference samples or the mapping tables (formula) between UCS-2 and
> UTF-8?
There are thousands of open source programs out there, e.g. iconv or yudit
which both can do this.
Uli
I think UCS-2 should be the same as UTF-16, right? Any differences?
regards,
George
Uli
I can not open unicode.org, could you help to post the related content
please? :-)
regards,
George
It's a bit too big for a Usenet posting. Also, I'm too lazy to extract
everything that might be relevant to your case.
Uli
> Sorry Uli,
>
> I can not open unicode.org, could you help to post the related content
> please? :-)
"
Q: What is the difference between UCS-2 and UTF-16?
A: UCS-2 is what a Unicode implementation was up to Unicode 1.1, *before*
surrogate code points and UTF-16 were added as concepts to Version 2.0 of
the standard. This term should now be avoided.
When interpreting what people have meant by "UCS-2" in past usage, it is
best thought of as not a data format, but as an indication that an implementation
does not interpret any supplementary characters. In particular, for the purposes
of data exchange, UCS-2 and UTF-16 are identical formats. Both are 16-bit,
and have exactly the same code unit representation.
The effective difference between UCS-2 and UTF-16 lies at a different level,
when one is interpreting a sequence of code units as code points or as characters.
In that case, a UCS-2 implementation would not handle processing like character
properties, codepoint boundaries, collation, etc. for supplementary characters.
[MD] & [KW]
"
FWIW,
- Kim
Wikipedia seems to have a page on everything:
http://en.wikipedia.org/wiki/UTF-8
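For anyone who wants the formula spelled out: every UCS-2 value fits in one,
two, or three UTF-8 bytes, chosen by range (below 0x80, below 0x800, and
everything else), with the bits distributed by shifts and masks. Here is a
minimal sketch in C using only bit operations. The function name
ucs2_to_utf8 and the choice to return 0 for surrogate values
(0xD800..0xDFFF) are my own assumptions for illustration, not part of any
standard API.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one UCS-2 code unit (U+0000..U+FFFF) as UTF-8.
 * 'out' must have room for at least 3 bytes.
 * Returns the number of bytes written (1 to 3), or 0 if 'c' is a
 * surrogate value (0xD800..0xDFFF), which is not a character by itself.
 * Hypothetical helper: the name and the 0-on-error convention are
 * illustrative assumptions. */
size_t ucs2_to_utf8(uint16_t c, unsigned char *out)
{
    if (c < 0x80) {
        /* 7 bits: 0xxxxxxx */
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        /* 11 bits: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c >= 0xD800 && c <= 0xDFFF) {
        /* Surrogates only have meaning in pairs (UTF-16). */
        return 0;
    }
    /* 16 bits: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (c >> 12));
    out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}
```

For example, U+20AC (the euro sign) comes out as the three bytes E2 82 AC,
and U+00E9 as the two bytes C3 A9. Converting a whole string is a loop over
the input code units that appends the bytes from each call.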
--
With best wishes,
Igor Tandetnik
With sufficient thrust, pigs fly just fine. However, this is not
necessarily a good idea. It is hard to be sure where they are going to
land, and it could be dangerous sitting under them as they fly
overhead. -- RFC 1925
--
=====================================
Alexander Nickolov
Microsoft MVP [VC], MCSD
email: agnic...@mvps.org
MVP VC FAQ: http://vcfaq.mvps.org
=====================================
"Igor Tandetnik" <itand...@mvps.org> wrote in message
news:Om6IwUD1...@TK2MSFTNGP02.phx.gbl...
"Alexander Nickolov" <agnic...@mvps.org> wrote in message
news:O9t3EpG1...@TK2MSFTNGP05.phx.gbl...
regards,
George
Just to confirm: do you mean that UCS-2 and UTF-16 are the same thing?
regards,
George
Is there a mapping table (or formula) between UTF-8 and UCS-2? I cannot
find one. If you have one, could you post it, please?
regards,
George
George:
No, Kim did not say that. In fact he said the opposite. UCS-2 is
obsolete, and you should not use it. Recent versions of Windows support
UTF-16, complete with surrogate pairs.
--
David Wilkinson
Visual C++ MVP
It's right there in the article I gave you a link to. It is quite
obvious that you didn't bother following the link. In which case I, too,
can't be bothered to help you any further, sorry.
What does "surrogate pairs" mean?
regards,
George
http://en.wikipedia.org/wiki/UTF-16#Encoding_of_characters_outside_the_BMP
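To make the arithmetic on that page concrete: a surrogate pair is a high
code unit (0xD800..0xDBFF) followed by a low one (0xDC00..0xDFFF), each
carrying 10 bits of a code point above U+FFFF. A rough sketch in C follows,
together with the 4-byte UTF-8 form such code points need. The names
combine_surrogates and supplementary_to_utf8 are hypothetical, and
returning 0 for invalid input is just a convention chosen for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Combine a UTF-16 surrogate pair into one code point
 * (U+10000..U+10FFFF). Returns 0 if the two units do not form a
 * valid high/low pair. Hypothetical helper for illustration. */
uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF) {
        return 0;
    }
    /* Each surrogate contributes its low 10 bits; add back 0x10000. */
    return 0x10000u + (((uint32_t)(hi & 0x3FF) << 10) | (uint32_t)(lo & 0x3FF));
}

/* Encode a supplementary code point as 4 UTF-8 bytes:
 * 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
 * 'out' must have room for 4 bytes. Returns 4, or 0 if 'cp' is
 * outside U+10000..U+10FFFF. Hypothetical helper for illustration. */
size_t supplementary_to_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x10000u || cp > 0x10FFFFu) {
        return 0;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For example, the pair D834 DD1E combines to U+1D11E, which encodes as the
four bytes F0 9D 84 9E. A converter that only understands UCS-2 would see
those two code units as two separate, meaningless values.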
Good link.
regards,
George
I will read the link.
regards,
George