Is it possible to have std::cout (or better yet std::ostream in general)
accept also std::wstring / wchar_t* as strings?
* Where / which operator<< would need to be specialized or overloaded?
* I would need to convert from (a known) unicode encoding to (a known)
multibyte encoding and then feed that to cout?
* Are there any stdlib implementations out there that already support this?
cheers,
Martin
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
No. Use std::wcout instead. That is what it is there for. Please note that
the standard does not require that mixing and matching cout and wcout will
work as intended. It is best to go all or nothing.
> * Where / which operator<< would need to be specialized or overloaded?
> * I would need to convert from (a known) unicode encoding to (a known)
> multibyte encoding and then feed that to cout?
The conversions/locale part of the library was crafted by somebody who
really knew what they were doing. Unfortunately nobody else seems to fully
understand it. This part is one of the least understood parts of the library
in general.
As far as I can tell, to convert from a wide string to a normal string, you
can use the built in conversions, but they will not work right if your wide
string uses characters not available in your standard string encoding.
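A minimal sketch of that built-in conversion, assuming the std::ctype<wchar_t> facet is what is meant here. The helper name narrow_lossy and the '?' fallback are illustrative choices, not anything the standard names:

```cpp
#include <cassert>
#include <locale>
#include <string>

// Narrow a wide string one character at a time with the ctype<wchar_t>
// facet of the given locale. Characters with no single-char equivalent
// in that locale are replaced by `fallback`.
// (narrow_lossy is a hypothetical helper name used for illustration.)
std::string narrow_lossy(const std::wstring& wide,
                         char fallback = '?',
                         const std::locale& loc = std::locale::classic())
{
    std::string result(wide.size(), '\0');
    if (!wide.empty()) {
        const std::ctype<wchar_t>& facet =
            std::use_facet<std::ctype<wchar_t> >(loc);
        facet.narrow(wide.data(), wide.data() + wide.size(),
                     fallback, &result[0]);
    }
    // narrow() writes exactly one char per wchar_t, so the lengths match.
    assert(result.size() == wide.size());
    return result;
}
```

Plain ASCII text survives unchanged. A character such as the first letter of L"\u015Eule" has no counterpart in the classic "C" locale, so it is expected to come back as the fallback character, which is exactly the loss described above.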
If you want to convert to a multibyte representation, that requires writing a
specialization of std::codecvt. Unfortunately, the standard specifies so
little about this that if the standard library you have has any useful
support, it is completely non-portable. In fact the Dinkumware library does
not even try to support additional specializations of std::codecvt, but
provides additional facets that are interface compatible with codecvt.
Don't even get me started on the std::messages facet, which was designed
with "catopen" style I18N, as well as Windows' unusual resource-based I18N
system, in mind.
The simple fact here is that a large chunk of the locale facets part of the
specification are extremely underspecified, to the point that they are
barely useful. For fun read chapter 15 in the libstdc++ manual, where the
author shows just how unclear the specification really is here.
> Hi all,
>
> Is it possible to have std::cout (or better yet std::ostream in general)
> accept also std::wstring / wchar_t* as strings?
cout is an object... how can you overload just for this particular object?
It has to be for std::ostream I suppose?
> * Where / which operator<< would need to be specialized or overloaded?
> * I would need to convert from (a known) unicode encoding to (a known)
> multibyte encoding and then feed that to cout?
> * Are there any stdlib implementations out there that already support
> this?
jyoti@jyoti-desktop:~$ cat test.cpp
#include <iostream>
#include <string>
int main(void)
{
    std::wstring str = L"Hello World.\n";
    std::wcout << str;
    return 0;
}
jyoti@jyoti-desktop:~$ g++ --pedantic -Wall test.cpp
jyoti@jyoti-desktop:~$ ./a.out
Hello World.
jyoti@jyoti-desktop:~$
Did you bother yourself with a google search?
Regards,
Jyoti
Not easily, because ostream is a typedef for the class template
basic_ostream instantiated with char and char_traits<char>,
and string is a class template parameterized (by default) by char and
char_traits<char>. These common template parameters are the basis
of their interoperation and of the underlying buffer that the stream
works on.
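The relationship can be checked at compile time. This is only a sketch and assumes a C++11 compiler for static_assert and <type_traits>; the constant name kStreamCharTypesDiffer is made up for illustration:

```cpp
#include <ostream>
#include <string>
#include <type_traits>

// std::ostream is nothing more than basic_ostream over char.
static_assert(std::is_same<std::ostream,
                  std::basic_ostream<char, std::char_traits<char> > >::value,
              "ostream is basic_ostream<char, char_traits<char>>");

// std::string shares that character type, std::wstring matches wostream.
static_assert(std::is_same<std::string::value_type,
                           std::ostream::char_type>::value,
              "string and ostream agree on char");
static_assert(std::is_same<std::wstring::value_type,
                           std::wostream::char_type>::value,
              "wstring and wostream agree on wchar_t");

// The mismatch behind the original question: an ostream's character
// type is not the character type of a wstring.
constexpr bool kStreamCharTypesDiffer =
    !std::is_same<std::ostream::char_type,
                  std::wstring::value_type>::value;
```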
> * Where / which operator<< would need to be specialized or overloaded?
> * I would need to convert from (a known) unicode encoding to (a known)
> multibyte encoding and then feed that to cout?
Just define
ostream& operator<<(ostream&,const wstring&);
Except that it wouldn't be portable - I don't think that ostream is
defined in the standard as being any particular multibyte encoding and
I'm absolutely certain that the encoding of wchar_t isn't defined at
all (hence new classes in the next standard to explicitly deal with
UTF-16 and UTF-32).
> * Are there any stdlib implementations out there that already support this?
>
If there are, they cannot be portable, and doing it yourself isn't
difficult (for UTF-16 to UTF-8).
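A hand-rolled version might look like the sketch below. It assumes that wchar_t holds either UTF-16 code units (as on Windows) or UTF-32 code points (as on most Unix systems), which the standard does not promise, and that whatever reads the stream expects UTF-8. The helper names append_utf8 and to_utf8 are illustrative, and no validation of unpaired surrogates is attempted:

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <string>

// Append the UTF-8 encoding of one Unicode code point to `out`.
inline void append_utf8(std::string& out, unsigned long cp)
{
    assert(cp <= 0x10FFFF);  // precondition: a valid Unicode code point
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// ASSUMPTION: wchar_t carries UTF-16 code units or UTF-32 code points.
std::string to_utf8(const std::wstring& wide)
{
    std::string out;
    for (std::size_t i = 0; i < wide.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(wide[i]);
        // Combine a UTF-16 surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < wide.size()) {
            unsigned long low = static_cast<unsigned long>(wide[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}

// The overload asked about: a wstring written to a narrow stream as UTF-8.
std::ostream& operator<<(std::ostream& os, const std::wstring& wide)
{
    return os << to_utf8(wide);
}
```

With this in scope, std::cout << std::wstring(L"\u015Eule") emits the bytes C5 9E followed by "ule". Whether those bytes show up as the intended letter still depends on the terminal or file consumer being set up for UTF-8.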
Hi Martin,
Use wcout for printing wstring.
std::wstring str = L"this is a unicode string";
std::wcout << L"string" << str;
Regards,
Karthik
Yes, that's how it should have been designed.
> * Where / which operator<< would need to be specialized or overloaded?
The question seems odd: it's like asking, which key located to the upper left on
my QWERTY keyboard should I press to type a Q?
Well, for std::wstring it's
std::ostream& operator<<( std::ostream& stream, std::wstring const& ws )
> * I would need to convert from (a known) unicode encoding to (a known)
> multibyte encoding
Assuming that std::wstring encodes Unicode in some way (it doesn't have to, it's
application specific), yes that's what std::wcout does.
> and then feed that to cout?
Yes.
> * Are there any stdlib implementations out there that already support this?
Most, if by "support" you mean whether the required functionality is just one
routine or operator invocation away.
More interesting, are there any stdlib implementations that *don't* support it?
And yes, there are: MinGW 3.4.5 for Windows.
Cheers & hth.,
- Alf
Not a C++ question :-)? Judging by your terminology (multibyte versus
unicode encoding), you are on Windows. If so, and if you are on VC,
the simplest thing I can think of is to do this in some header:
inline std::ostream& operator<<(std::ostream& stream,
                                const std::wstring& string)
{
return stream << CW2A(string.c_str());
}
(CW2A is in atlconv.h.)
If not Win, it really depends on what encoding your system uses. For
example, you can easily (Linux) be in UTF-8, and all is done for you
already.
Goran.
Not as far as I know.
You can make a std::wofstream convert a std::wstring to UTF-8, though.
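One way to do that is sketched below. It assumes the std::codecvt_utf8 facet from <codecvt>, which arrived with C++11 (so after this discussion) and has been deprecated since C++17; a named UTF-8 locale would be an alternative where one is installed. The function names and any file name passed in are placeholders:

```cpp
#include <cassert>
#include <codecvt>   // std::codecvt_utf8 (C++11, deprecated since C++17)
#include <cstdio>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Write `text` to `path` as UTF-8 by imbuing the wide file stream with
// a UTF-8 conversion facet. Returns false if opening or writing failed.
bool write_utf8_file(const char* path, const std::wstring& text)
{
    std::wofstream out;
    // The locale takes ownership of the facet. Imbue before opening so
    // the file buffer uses the facet from the very first character.
    out.imbue(std::locale(out.getloc(), new std::codecvt_utf8<wchar_t>));
    out.open(path, std::ios::binary);
    out << text;
    out.close();
    return !out.fail();
}

// Read the raw bytes back, without any conversion, to inspect the result.
std::string read_bytes(const char* path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Where wchar_t is only 16 bits wide, this facet converts from UCS-2 rather than full UTF-16, so characters outside the Basic Multilingual Plane need a different facet there.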
> * Where / which operator<< would need to be specialized or overloaded?
For wchar_t* it's not possible, for std::wstring it's disallowed by
the standard.
The reason as I see it is that there is a large "gap" between what the
standard specifies and the users (of the language, i.e. the programmers)
perceive and need. The users deal with "concrete" encodings such as
ISO-8859-1, EUC-KR, UTF-8, UCS-4 and are interested in converting
characters between these encodings. However, the standard specifies
so little about encodings and leaves the rest to the "implementation",
whose documentation many users don't know where to look for.
(I wonder if actual implementations document this area very well at all.)
Things may get a little better with the upcoming new types char16_t and
char32_t, hopefully.
>
> The simple fact here is that a large chunk of the locale facets part of
> the specification are extremely underspecified, to the point that they
> are barely useful. For fun read chapter 15 in the libstdc++ manual, where
> the author shows just how unclear the specification really is here.
Exactly.
--
Seungbeom Kim
Sadly the whole std C++ unicode interoperability is a mess (at least on
Windows). Maybe it'll get better (usable) with C++0x, but I doubt it. (My
impression is that MS likes to force programmers into the Windows-specific
APIs for the unicode stuff instead of making the std APIs usable.)
>
>> * Where / which operator<< would need to be specialized or overloaded?
>
> The question seems odd: it's like asking, which key located to the upper
> left on
> my QWERTY keyboard should I press to type a Q?
>
:-)
> Well, for std::wstring it's
>
> std::ostream& operator<<( std::ostream& stream, std::wstring const& ws )
>
So you see - it's specializing not overloading. You resolved my doubt.
The reason I phrased the question as is had more to do with wchar_t* ...
I'm unsure if specializing it could cause some problems and I'm unsure
if I need some special stuff for const wchar_t* etc.
>
>> * I would need to convert from (a known) unicode encoding to (a known)
>> multibyte encoding
>
> Assuming that std::wstring encodes Unicode in some way (it doesn't have
> to, it's
> application specific), yes that's what std::wcout does.
>
>
>> and then feed that to cout?
>
> Yes.
>
>
>> * Are there any stdlib implementations out there that already support
>> this?
>
> Most, if by "support" you mean whether the required functionality is
> just one
> routine or operator invocation away.
>
It's not one but two (with wchar_t*) and with implementation I meant to
include the platform, that is, if there is an implementation/platform
that includes the function as well as the knowledge of _which_ unicode
and multibyte encoding to actually use.
> More interesting, are there any stdlib implementations that *don't*
> support it?
>
> And yes, there are: MinGW 3.4.5 for Windows.
>
Um. Why?
cheers,
Martin
Sorry, my bad, mea culpa, etc., I must beg forgiveness. I was probably thinking
of the lack of general support for wide streams with that compiler. But the lack
of general support isn't in the way of passing wide strings to std::cout.
Cheers, & sorry for dis-information (!),
- Alf
> > * Where / which operator<< would need to be specialized or overloaded?
> > * I would need to convert from (a known) unicode encoding to (a known)
> > multibyte encoding and then feed that to cout?
> > * Are there any stdlib implementations out there that already support
> > this?
>
> jyoti@jyoti-desktop:~$ cat test.cpp
> #include <iostream>
> #include <string>
>
> int main(void)
> {
>     std::wstring str = L"Hello World.\n";
>     std::wcout << str;
>     return 0;
> }
>
> jyoti@jyoti-desktop:~$ g++ --pedantic -Wall test.cpp
> jyoti@jyoti-desktop:~$ ./a.out
> Hello World.
All ASCII characters? I am not impressed... :/ What is wstring good
for then? What have we gained?
How is wstring better than an array of integers?
> Did you bother yourself with a google search?
Challenge: Write a C++ program that takes the user's name, compares it
to a hard-coded name that contains a Unicode character, e.g.
"Şule". And produces either "Seni tanıyorum!" (I know you) for a
match, or "Tanıştığımıza memnun oldum" (nice to meet you) for no
match.
Simple student program... Google all you want.
I am very curious on how to write this program portably. I am also
very skeptical. I don't think C++ is for non-ASCII alphabets at all.
Ali
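The comparison half of that challenge can be sketched with wide literals and universal character names, which keep the source file itself pure ASCII. The sketch assumes the compiler maps \u escapes in wide literals to the corresponding Unicode values; the function names greet and demo_greet are made up for illustration. Reading the name from the console and printing the reply are the parts that depend on locale and terminal setup, and they are deliberately left out:

```cpp
#include <cassert>
#include <string>

// Return the Turkish greeting for a known or unknown name.
// \u015E is capital S with cedilla, \u0131 is dotless i,
// \u015F is small s with cedilla, \u011F is small g with breve.
std::wstring greet(const std::wstring& name)
{
    static const std::wstring known = L"\u015Eule";
    if (name == known) {
        return L"Seni tan\u0131yorum!";
    }
    return L"Tan\u0131\u015Ft\u0131\u011F\u0131m\u0131za memnun oldum";
}

// Usage sketch: both branches of the challenge.
inline void demo_greet()
{
    assert(greet(L"\u015Eule") == L"Seni tan\u0131yorum!");
    assert(greet(L"Sule") ==
           L"Tan\u0131\u015Ft\u0131\u011F\u0131m\u0131za memnun oldum");
}
```

The comparison is code unit by code unit, so a name typed with a combining cedilla instead of the precomposed letter would not match without Unicode normalization, which the standard library does not provide.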
FWIW, my OP was about outputting wstring to cout and *not wcout* and the
problems one might face.
>> Did you bother yourself with a google search?
>
> Challenge: Write a C++ program that takes the user's name, compares it
> to a hard-coded name that contains a Unicode character, e.g.
> "Şule". And produces either "Seni tanıyorum!" (I know you) for a
> match, or "Tanıştığımıza memnun oldum" (nice to meet you) for no
> match.
>
> Simple student program... Google all you want.
>
> I am very curious on how to write this program portably. I am also
> very skeptical. I don't think C++ is for non-ASCII alphabets at all.
>
Oversimplification.
The problem with portability here is not so much (standard) C++ I think.
Even if you restrict yourself to a subset of C++ compilers that will
correctly handle unicode (UTF-16 or UTF-32) for wstring, you still have
the problem that you have to know where the string is going to and that
the stdlib implementation on the platform needs to correctly convert the
wstring to the appropriate encoding of the "display device" (i.e. either
the console or some graphical library).
For example on Windows:
* For GUI programs you have no problems. You feed the unicode characters
(be they in a wstring or a CString) to the Win API and they will be
correctly displayed (*if* you have set a font that has the characters)
* For console programs you're pretty much f'ed because while you can get
it to work it's so horribly complicated that I'd call it broken.
(see a posting of mine in m.p.v.stl:
http://www.microsoft.com/communities/newsgroups/en-us/default.aspx?dg=microsoft.public.vc.stl&tid=db2fda2b-0423-4b5b-84f6-7c10b56ffefa&cat=en_US_ef479e33-8471-4f51-8ac6-ddac3f808cbf&lang=en&cr=US&sloc=&p=1
)
So what is one to do?
One solution I could think of for the console (I have never tried it)
is to write an operator<<(std::ostream, std::wstring) that will convert
the wstring from the platform specific wide unicode encoding to UTF-8.
Of course that would mean that cout should expect UTF-8 so you can't use
it with the "normal" ASCII codepage.
br,
Martin
I think you'll have to replace "non-ASCII" by some other phrase.
Whatever limitation of C++ that you're complaining about (it has many
that you might be referring to), it can't be about lack of C++ support
for non-ASCII character set encodings, since C++ does not require ASCII.
The committee went rather far out of its way to avoid mandating any
particular character set.
The standard only puts a few restrictions on character set encodings:
1) All-bits-0 must be the null character.
2) There's a basic source character set of 96 characters that
must all be represented with positive and distinct values.
3) The digits '0' through '9' must be represented by consecutive values.
ASCII meets these requirements, but as a matter of deliberate design,
they were chosen to allow EBCDIC and many other encodings to be used as
well.
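What those guarantees buy in practice can be shown with a small sketch, assuming a C++11 compiler for constexpr and static_assert. The function name digit_value is illustrative:

```cpp
// Portable on any conforming implementation, ASCII or EBCDIC:
// '0' through '9' are guaranteed to have consecutive values.
constexpr int digit_value(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

// The null character is all-bits-zero, and the digits are contiguous.
static_assert('\0' == 0, "null character has value zero");
static_assert('9' - '0' == 9, "digits are consecutive");
```

No comparable guarantee exists for letters: c - 'a' is not a portable way to get an alphabet index, because in EBCDIC the lowercase letters are not contiguous.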
The only actual references to a specific character set are those to
ISO/IEC 10646, and in particular the UTF-8 encoding thereof; but support
for ISO/IEC 10646 is not mandatory, except insofar as it is the basis
for Universal Character Names (2.2p2). It will be different in the next
version of the standard, where char16_t, char32_t types will have
ISO/IEC 10646 encodings, with corresponding character and string
literals prefixed with 'u' or 'U', respectively. If __STDC_ISO_10646__
is defined, wchar_t values will also represent ISO/IEC 10646 characters.
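As a sketch of what the new literals look like, assuming a compiler that implements the revised standard as eventually published (C++11). The constant names are illustrative:

```cpp
// u"..." yields UTF-16 code units, U"..." yields UTF-32 code points.
constexpr char16_t kSmileyUtf16[] = u"\U0001F600";
constexpr char32_t kSmileyUtf32[] = U"\U0001F600";

// One code point outside the BMP: two UTF-16 units, one UTF-32 unit
// (each array also holds the terminating null).
static_assert(sizeof(kSmileyUtf16) / sizeof(char16_t) == 3,
              "surrogate pair plus terminator");
static_assert(sizeof(kSmileyUtf32) / sizeof(char32_t) == 2,
              "single code point plus terminator");
static_assert(kSmileyUtf16[0] == 0xD83D && kSmileyUtf16[1] == 0xDE00,
              "UTF-16 surrogate pair for U+1F600");
static_assert(kSmileyUtf32[0] == 0x1F600, "UTF-32 holds the code point");
```

Unlike wchar_t, the encoding of these two types does not vary between platforms, which is what makes the checks above possible at compile time.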