Unicode in C++
Newsgroups: comp.lang.c++
From: Michael Davis <m...@nospam.com>
Date: Wed, 08 Jun 2005 11:08:10 -0400
Local: Wed, Jun 8 2005 11:08 am
Subject: Unicode in C++
Hi,

I've known C/C++ for years, but only ever used ascii strings. I have a
client who wants to know how gcc handles unicode. I've found the functions
utf8_mbtowc, utf8_mbstowcs, utf8_wctomb and utf8_wcstombs, but I'm
wondering if there are any other libraries or functions which can do things
like handle different kinds of encodings?

Thanks
Michael Davis


Newsgroups: comp.lang.c++
From: Rolf Magnus <ramag...@t-online.de>
Date: Wed, 08 Jun 2005 17:49:30 +0200
Local: Wed, Jun 8 2005 11:49 am
Subject: Re: Unicode in C++

Michael Davis wrote:
> Hi,

> I've known C/C++ for years, but only ever used ascii strings. I have a
> client who wants to know how gcc handles unicode. I've found the functions
> utf8_mbtowc, utf8_mbstowcs, utf8_wctomb and utf8_wcstombs, but I'm
> wondering if there are any other libraries or functions which can do
> things like handle different kinds of encodings?

There is iconv.
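
A minimal sketch of what an iconv call can look like from C++. The helper name convert is made up for illustration, the output buffer size is a rough assumption, and the iconv() signature is the POSIX/glibc one (some older systems declare the input pointer as const char**). Which encoding names are accepted depends on the installed iconv.

```cpp
#include <iconv.h>
#include <stdexcept>
#include <string>

// Convert `input` from encoding `from` to encoding `to` using iconv.
// Simplified sketch: throws on any error instead of resuming the conversion.
std::string convert(const std::string& input, const char* from, const char* to)
{
    iconv_t cd = iconv_open(to, from);          // note: target encoding comes first
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    std::string in = input;                     // iconv() takes a non-const char**
    std::string out(in.size() * 4 + 4, '\0');   // generous output buffer (assumption)
    char* inptr = &in[0];
    char* outptr = &out[0];
    size_t inleft = in.size();
    size_t outleft = out.size();

    size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        throw std::runtime_error("iconv failed");

    out.resize(out.size() - outleft);           // keep only the bytes actually written
    return out;
}
```

For example, convert("\xC3\xAD", "UTF-8", "UTF-16LE") turns the two UTF-8 bytes of U+00ED into the two UTF-16 bytes ED 00.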

Newsgroups: comp.lang.c++
From: Michael Davis <m...@nospam.com>
Date: Wed, 08 Jun 2005 12:19:19 -0400
Local: Wed, Jun 8 2005 12:19 pm
Subject: Re: Unicode in C++

Rolf Magnus wrote:
> Michael Davis wrote:

>> Hi,

>> I've known C/C++ for years, but only ever used ascii strings. I have a
>> client who wants to know how gcc handles unicode. I've found the
>> functions utf8_mbtowc, utf8_mbstowcs, utf8_wctomb and utf8_wcstombs, but
>> I'm wondering if there are any other libraries or functions which can do
>> things like handle different kinds of encodings?

> There is iconv.

Thanks!
md

Newsgroups: comp.lang.c++
From: elv...@gmail.com
Date: 9 Jun 2005 02:43:26 -0700
Local: Thurs, Jun 9 2005 5:43 am
Subject: Re: Unicode in C++
The proper std:: way is to use the wchar_t and wstring types, which can
hold Unicode strings, together with the wide streams
(fstream -> wfstream, ostream -> wostream, istream -> wistream, etc).
To display characters properly (in a window or console) or to save them
in a file you have to use one of the locales (regional settings) that
are available on your computer.

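E.g. to find the name of an available locale:

#include <iostream>
#include <locale>
#include <stdexcept>

int main()
{
  try
  {
    // throws std::runtime_error if no locale with this name is available
    std::locale AvailLocale("german");
    std::cout << AvailLocale.name() << std::endl;
  }
  catch (const std::runtime_error& e)
  {
    std::cout << e.what() << std::endl;
  }
}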

You should get something like this:
German_Germany.1252 (on Windows)
de_DE.iso8859-1 (on Unix/Linux)
See
http://cvs.sourceforge.net/viewcvs.py/dmx/dmx/xc/nls/locale.alias?rev...
for a more detailed list.

To save a pure Unicode string to a file you need to upgrade the STL
http://www.codeproject.com/vcpp/stl/upgradingstlappstounicode.asp?pri...
or use the C-like way (fwrite), but that is not a common way of doing
it and it is platform dependent.

Use the available locales, e.g.:

std::locale Ger("German_Germany.1252"); // Windows locale name
std::wcout.imbue(Ger); // attach the locale to the stream
std::wstring ws(L"A german text...");
std::wcout << ws << std::endl;
// to get the current locale of a stream use:
std::locale CurrentLocale = std::wcout.getloc();

It is good to use a text editor that can display/manage these locales.

Also visit
http://www.langer.camelot.de/Articles/Cuj/Internationalization/I18N.html


Newsgroups: comp.lang.c++
From: "Rapscallion" <raps725...@spambob.com>
Date: 9 Jun 2005 03:06:35 -0700
Local: Thurs, Jun 9 2005 6:06 am
Subject: Re: Unicode in C++

elv...@gmail.com wrote:
> A proper std:: way is using wchar_t, wstring types  - can handle
> Unicode strings.
> (fstream -> wfstream, ostream -> wostream, istream -> wistream, etc)

By 'Unicode' you mean UTF-16, right?

Newsgroups: comp.lang.c++
From: Ron Natalie <r...@spamcop.net>
Date: Thu, 09 Jun 2005 06:24:32 -0400
Local: Thurs, Jun 9 2005 6:24 am
Subject: Re: Unicode in C++
Rapscallion wrote:
> elv...@gmail.com wrote:

>>A proper std:: way is using wchar_t, wstring types  - can handle
>>Unicode strings.
>>(fstream -> wfstream, ostream -> wostream, istream -> wistream, etc)

> By 'Unicode' you mean UTF-16, right?

Not necessarily.  While Windows equates UNICODE with UTF-16, many
of the UNIX implementations use a 32-bit wchar_t and UNICODE.

Unfortunately, while the various W-versions of the functions can
support wide char (presumably some UNICODE version) strings, most
of the major C++ interfaces don't support them.   The assumption of
the standardizers is that there is some multibyte-char type that you
can use for the system interfaces.   It's really stupid and causes a
pain in the butt on systems that really don't have that mapping (like
Windows).


Newsgroups: comp.lang.c++
From: Rolf Magnus <ramag...@t-online.de>
Date: Thu, 09 Jun 2005 12:25:05 +0200
Local: Thurs, Jun 9 2005 6:25 am
Subject: Re: Unicode in C++

Rapscallion wrote:
> elv...@gmail.com wrote:
>> A proper std:: way is using wchar_t, wstring types  - can handle
>> Unicode strings.
>> (fstream -> wfstream, ostream -> wostream, istream -> wistream, etc)

> By 'Unicode' you mean UTF-16, right?

By 'Unicode' he should mean wide characters of an unspecified encoding. On
my compiler, it's definitely not UTF-16, because wchar_t is 32 bits.
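
Since the width of wchar_t differs between compilers, it can be checked rather than assumed. A small sketch; the function names wchar_bits and wchar_holds_any_code_point are made up for illustration, and the 21-bit threshold comes from the largest Unicode code point, U+10FFFF.

```cpp
#include <climits>
#include <cstddef>

// Number of bits in wchar_t on the current compiler/platform.
inline std::size_t wchar_bits()
{
    return sizeof(wchar_t) * CHAR_BIT;
}

// True if a single wchar_t can hold any Unicode code point (up to U+10FFFF,
// which needs 21 bits). With a 16-bit wchar_t this returns false.
inline bool wchar_holds_any_code_point()
{
    return wchar_bits() >= 21;
}
```

This only reports the size of the type. It says nothing about which encoding the implementation actually stores in a wchar_t.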

Newsgroups: comp.lang.c++
From: elv...@gmail.com
Date: 9 Jun 2005 11:52:42 -0700
Local: Thurs, Jun 9 2005 2:52 pm
Subject: Re: Unicode in C++
Unicode is a very big character set in which each character has its own
index (code point). There are many thousands of characters in this set.
Unicode is a standard, not a character encoding. There is also a
standard named ISO 10646. Theoretically ISO 10646 can handle billions of
characters. The first 65 536 characters of ISO 10646 are identical with
the Unicode standard. The advantage of Unicode or ISO 10646 is that they
cover almost every character you would ever need.

Non-wide characters - represented with char:
Many charsets (ISO 8859-1, ISO 8859-2, ...) include only 256 characters,
which means that it is not possible to cover every language with such a
small number of characters. But many applications are not able to manage
Unicode at this time, so they use one of the encodings/character
representations available in your OS:

standardized charsets ISO 8859...
or windows-125X ...
or Mac x-mac-ce ...etc
or UTF-8.

UTF? Yes, but apart from UTF-8 it is represented with wide chars.
UTF-8 is a way to write a character to a file: ASCII characters are
represented with one byte and all other characters are represented with
two to four bytes.
example (U+00ED, í): 11000011-10101101
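
The bit patterns in the example can be read mechanically: the first byte of a UTF-8 sequence tells how many bytes the character takes. A sketch, with made-up helper names, that classifies by bit pattern only and assumes well-formed input:

```cpp
#include <cstddef>
#include <string>

// Length in bytes of the UTF-8 sequence that starts with `lead`,
// or 0 if `lead` is a continuation byte (10xxxxxx) or has no valid pattern.
// Classifies by bit pattern only; it does not reject the few lead values
// that never occur in valid UTF-8.
inline int utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Count characters (code points) by counting the bytes that are not
// continuation bytes. Assumes `s` is well-formed UTF-8; nothing is validated.
inline std::size_t utf8_count(const std::string& s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80)
            ++n;
    return n;
}
```

For the example bytes, 11000011 (0xC3) is a two-byte lead and 10101101 (0xAD) is a continuation byte.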

UTF-16: Most characters are represented with two bytes. Some two-byte
values have a special meaning: they are surrogates, used in pairs (four
bytes) for characters above U+FFFF.
example (U+00ED, little-endian): 11101101-00000000
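
The two-byte values with a special meaning are the surrogates (0xD800 to 0xDFFF), which are used in pairs for code points above U+FFFF. A sketch of the encoding step; the function name utf16_encode is made up for illustration and the input is not validated.

```cpp
#include <cstdint>
#include <vector>

// Encode one Unicode code point as UTF-16 code units.
// Assumes cp <= 0x10FFFF and that cp is not itself a surrogate.
inline std::vector<std::uint16_t> utf16_encode(std::uint32_t cp)
{
    std::vector<std::uint16_t> units;
    if (cp < 0x10000) {
        // Fits in a single 16-bit unit.
        units.push_back(static_cast<std::uint16_t>(cp));
    } else {
        cp -= 0x10000;  // 20 bits remain, split 10/10
        units.push_back(static_cast<std::uint16_t>(0xD800 + (cp >> 10)));   // high surrogate
        units.push_back(static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
    }
    return units;
}
```

U+00ED gives the single unit 0x00ED, while U+1F600 gives the pair 0xD83D 0xDE00. The byte order in which each unit is stored in a file is a separate choice (UTF-16LE or UTF-16BE).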

To represent as many languages as possible use wchar_t (one
character) and wstring (string). These types are __usually__ 4 bytes
wide and able to cover all characters in the Unicode standard, but
wchar_t can also be 2 bytes. The w means wide characters. To use them
you have to use the streams for wide characters.  Please see
std::locale, std::locale::facet. When using w-objects you have to be
sure about your current encoding/charset.

Usually we express text in programs with chars (and we can be happy
enough with chars), but sometimes we want to use a very different
language that is not covered by the available encodings (with 256
characters: windows-125X, ISO88...).  We can handle text inside the
program as Unicode (and be happy as well), but in C++ we usually write
to a file using an encoding (non-Unicode) available in our OS, because
writing Unicode directly is not possible when using std::. One way is
http://www.codeproject.com/vcpp/stl/upgradingstlappstounicode.asp?pri...
another way is using the C function fwrite:

wchar_t myWString[] = L"Some strange characters.";
// write the characters without the terminating L'\0'
fwrite(myWString, sizeof(wchar_t),
       sizeof(myWString)/sizeof(wchar_t) - 1, myFile);

but it is not portable: the size and byte order of wchar_t differ
between platforms.
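
One way around the portability problem is to convert the wide string to a fixed byte encoding such as UTF-8 before writing, so the file does not depend on the size or byte order of wchar_t. A sketch only: wide_to_utf8 and write_utf8_file are made-up names, the input is assumed to be well-formed, and wchar_t is assumed to hold either UTF-32 code points or UTF-16 code units.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Convert a wide string to UTF-8 bytes. Surrogate pairs (16-bit wchar_t)
// are combined into one code point; lone surrogates are not rejected.
std::string wide_to_utf8(const std::wstring& ws)
{
    std::string out;
    for (std::size_t i = 0; i < ws.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(ws[i]);
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < ws.size()) {
            unsigned long lo = static_cast<unsigned long>(ws[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }
        if (cp < 0x80) {                       // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Write the UTF-8 bytes in binary mode so no newline translation occurs.
bool write_utf8_file(const char* path, const std::wstring& ws)
{
    std::ofstream f(path, std::ios::binary);
    const std::string bytes = wide_to_utf8(ws);
    f.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return f.good();
}
```

Whoever reads the file then has to know it is UTF-8, but that is a property of the file and not of the compiler that produced it.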

