
Make std::cout accept std::wstring ?


Martin B.

Oct 8, 2009, 11:03:17 PM
Hi all,

Is it possible to have std::cout (or better yet std::ostream in general)
accept also std::wstring / wchar_t* as strings?

* Where / which operator<< would need to be specialized or overloaded?
* I would need to convert from (a known) unicode encoding to (a known)
multibyte encoding and then feed that to cout?
* Are there any stdlib implementations out there that already support this?

cheers,
Martin

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Joe Smith

Oct 9, 2009, 3:34:43 PM

"Martin B." <0xCDC...@gmx.at> wrote in message
news:haku90$alu$1...@news.eternal-september.org...

> Hi all,
>
> Is it possible to have std::cout (or better yet std::ostream in general)
> accept also std::wstring / wchar_t* as strings?

No. Use std::wcout instead. That is what it is there for. Please note that
the standard does not require that mixing and matching cout and wcout will
work as intended. It is best to go all or nothing.

> * Where / which operator<< would need to be specialized or overloaded?
> * I would need to convert from (a known) unicode encoding to (a known)
> multibyte encoding and then feed that to cout?

The conversions/locale part of the library was crafted by somebody who
really knew what they were doing. Unfortunately nobody else seems to fully
understand it. This part is one of the least understood parts of the library
in general.

As far as I can tell, to convert from a wide string to a normal string, you
can use the built-in conversions, but they will not work right if your wide
string uses characters not available in your standard string encoding.

If you want to convert to a multibyte representation, that requires writing a
specialization of std::codecvt. Unfortunately, the standard specifies so
little about this that if the standard library you have has any useful
support, it is completely non-portable. In fact, the Dinkumware implementation
does not even try to support additional specializations of std::codecvt, but
provides additional facets that are interface-compatible with codecvt.
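
For the standard-mandated codecvt<wchar_t, char, mbstate_t> facet the
mechanics look roughly like this (untested sketch; it simply gives up
whenever the target narrow encoding cannot represent a character, which
is exactly the limitation described above):

#include <locale>
#include <string>
#include <vector>

// Sketch only: convert a wide string to the narrow encoding of `loc`
// using its codecvt facet. Returns an empty string on conversion failure.
std::string narrow(const std::wstring& ws, const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> cvt_t;
    const cvt_t& cvt = std::use_facet<cvt_t>(loc);

    std::mbstate_t state = std::mbstate_t();
    std::vector<char> buf(ws.size() * cvt.max_length() + 1);

    const wchar_t* from_next = 0;
    char* to_next = 0;
    cvt_t::result r = cvt.out(state,
                              ws.data(), ws.data() + ws.size(), from_next,
                              &buf[0], &buf[0] + buf.size(), to_next);

    if (r != cvt_t::ok)   // 'error' or 'partial': some character was not representable
        return std::string();
    return std::string(&buf[0], to_next);
}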

Don't even get me started on the std::messages facet, which was designed
with "catopen"-style I18N, as well as Windows' unusual resource-based I18N
system, in mind.

The simple fact here is that a large chunk of the locale facets part of the
specification is extremely underspecified, to the point that it is barely
useful. For fun, read chapter 15 in the libstdc++ manual, where the author
shows just how unclear the specification really is here.

jyoti

Oct 9, 2009, 3:35:38 PM
On Fri, 09 Oct 2009 08:33:17 +0530, Martin B. <0xCDC...@gmx.at> wrote:

> Hi all,
>
> Is it possible to have std::cout (or better yet std::ostream in general)
> accept also std::wstring / wchar_t* as strings?

cout is an object... how can you overload just for this particular object?
It has to be for std::ostream I suppose?

> * Where / which operator<< would need to be specialized or overloaded?
> * I would need to convert from (a known) unicode encoding to (a known)
> multibyte encoding and then feed that to cout?
> * Are there any stdlib implementations out there that already support
> this?

jyoti@jyoti-desktop:~$ cat test.cpp
#include <iostream>

int main(void)
{ std::wstring str = L"Hello World.\n";
std::wcout << str;
return 0;
}
jyoti@jyoti-desktop:~$ g++ --pedantic -Wall test.cpp
jyoti@jyoti-desktop:~$ ./a.out
Hello World.
jyoti@jyoti-desktop:~$


Did you bother yourself with a google search?

Regards,
Jyoti

Nick Hounsome

Oct 9, 2009, 5:15:41 PM
On 9 Oct, 04:03, "Martin B." <0xCDCDC...@gmx.at> wrote:
> Hi all,
>
> Is it possible to have std::cout (or better yet std::ostream in general)
> accept also std::wstring / wchar_t* as strings?

Not easily, because ostream is a typedef for the template class
basic_ostream parameterized by char and char_traits<char>, and string
is likewise based on char and char_traits<char> (via basic_string).
These common template parameters are the basis of their interoperation
and of the underlying buffer that the stream works on.

> * Where / which operator<< would need to be specialized or overloaded?
> * I would need to convert from (a known) unicode encoding to (a known)
> multibyte encoding and then feed that to cout?

Just define

ostream& operator<<(ostream&,const wstring&);

Except that it wouldn't be portable - I don't think the standard defines
ostream as using any particular multibyte encoding, and I'm absolutely
certain that the encoding of wchar_t isn't defined at all (hence the new
char16_t and char32_t types in the next standard to deal explicitly with
UTF-16 and UTF-32).

> * Are there any stdlib implementations out there that already support this?
>

If there are, they cannot be portable, and doing it yourself isn't
difficult (for UTF-16 to UTF-8).
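
Untested sketch, assuming the wstring really does hold UTF-16 (i.e. a
16-bit wchar_t, as on Windows) and that UTF-8 is an acceptable narrow
encoding on the other end:

#include <cstddef>
#include <ostream>
#include <string>

// UTF-16 (in a std::wstring) to UTF-8. Assumes well-formed input;
// lone surrogates are encoded naively rather than rejected.
std::string utf16_to_utf8(const std::wstring& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        unsigned long cp = static_cast<unsigned long>(in[i]) & 0xFFFF;

        // Combine a surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size())
        {
            unsigned long lo = static_cast<unsigned long>(in[i + 1]) & 0xFFFF;
            if (lo >= 0xDC00 && lo <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }

        if (cp < 0x80)
            out += static_cast<char>(cp);
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// The operator mentioned above would then just forward to it:
std::ostream& operator<<(std::ostream& os, const std::wstring& ws)
{
    return os << utf16_to_utf8(ws);
}

(On a platform where wchar_t is already UTF-32 the masking to 16 bits
above would be wrong - hence the assumption.)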

Karthikeyan

Oct 9, 2009, 5:14:15 PM
On Oct 9, 8:03 am, "Martin B." <0xCDCDC...@gmx.at> wrote:
> Hi all,
>
> Is it possible to have std::cout (or better yet std::ostream in general)
> accept also std::wstring / wchar_t* as strings?
>
> * Where / which operator<< would need to be specialized or overloaded?
> * I would need to convert from (a known) unicode encoding to (a known)
> multibyte encoding and then feed that to cout?
> * Are there any stdlib implementations out there that already support this?
>
> cheers,
> Martin

{ clc++m banner removed -mod }

Hi Martin,
Use wcout for printing wstring.

std::wstring str = L"this is a unicode string";

std::wcout << L"string: " << str.c_str();

Regards,
Karthik

Alf P. Steinbach

Oct 9, 2009, 5:30:54 PM
* Martin B.:

>
> Is it possible to have std::cout (or better yet std::ostream in general)
> accept also std::wstring / wchar_t* as strings?

Yes, that's how it should have been designed.


> * Where / which operator<< would need to be specialized or overloaded?

The question seems odd: it's like asking, which key located to the upper left on
my QWERTY keyboard should I press to type a Q?

Well, for std::wstring it's

std::ostream& operator<<( std::ostream& stream, std::wstring const& ws )


> * I would need to convert from (a known) unicode encoding to (a known)
> multibyte encoding

Assuming that std::wstring encodes Unicode in some way (it doesn't have to, it's
application specific), yes that's what std::wcout does.


> and then feed that to cout?

Yes.


> * Are there any stdlib implementations out there that already support this?

Most, if by "support" you mean whether the required functionality is just one
routine or operator invocation away.

More interesting, are there any stdlib implementations that *don't* support it?

And yes, there are: MinGW 3.4.5 for Windows.


Cheers & hth.,

- Alf

Goran

Oct 9, 2009, 5:34:51 PM
On Oct 9, 5:03 am, "Martin B." <0xCDCDC...@gmx.at> wrote:
> Hi all,
>
> Is it possible to have std::cout (or better yet std::ostream in general)
> accept also std::wstring / wchar_t* as strings?
>
> * Where / which operator<< would need to be specialized or overloaded?
> * I would need to convert from (a known) unicode encoding to (a known)
> multibyte encoding and then feed that to cout?

Not a C++ question :-)? Judging by your terminology (multibyte versus
unicode encoding), you are on Windows. If so, and if you are on VC,
the simplest thing I can think of is to do this in some header:

inline ostream& operator<<(ostream& stream, const std::wstring& string)
{
    return stream << CW2A(string.c_str());
}

(CW2A is in atlconv.h.)

If not Windows, it really depends on what encoding your system uses. For
example, on Linux you can easily be in UTF-8 already, in which case all
is done for you.

Goran.

Mathias Gaunard

Oct 9, 2009, 5:32:21 PM
On 9 oct, 05:03, "Martin B." <0xCDCDC...@gmx.at> wrote:
> Hi all,
>
> Is it possible to have std::cout (or better yet std::ostream in general)
> accept also std::wstring / wchar_t* as strings?

Not as far as I know.
You can make a std::wofstream convert a std::wstring to UTF-8, though.


> * Where / which operator<< would need to be specialized or overloaded?

For wchar_t* it's not possible; for std::wstring it's disallowed by
the standard.

Seungbeom Kim

Oct 9, 2009, 11:10:24 PM
Joe Smith wrote:
>
> The conversions/locale part of the library was crafted by somebody who
> really knew what they were doing. Unfortunately nobody else seems to fully
> understand it. This part is one of the least understood parts of the
> library in general.

The reason, as I see it, is that there is a large "gap" between what the
standard specifies and what the users (of the language, i.e. the programmers)
perceive and need. The users deal with "concrete" encodings such as
ISO-8859-1, EUC-KR, UTF-8, UCS-4 and are interested in converting
characters between these encodings. However, the standard specifies
very little about encodings and leaves the rest to the "implementation",
whose documentation many users don't even know where to find.
(I wonder if actual implementations document this area very well at all.)

Things may get a little better with the upcoming new types char16_t and
char32_t, hopefully.

>
> The simple fact here is that a large chunk of the locale facets part of
> the specification is extremely underspecified, to the point that it is
> barely useful. For fun, read chapter 15 in the libstdc++ manual, where
> the author shows just how unclear the specification really is here.

Exactly.

--
Seungbeom Kim

Martin B.

Oct 12, 2009, 3:08:15 PM
Alf P. Steinbach wrote:
> * Martin B.:
>>
>> Is it possible to have std::cout (or better yet std::ostream in
>> general) accept also std::wstring / wchar_t* as strings?
>
> Yes, that's how it should have been designed.
>

Sadly, the whole std C++ Unicode interoperability story is a mess (at least
on Windows). Maybe it'll get better (usable) with C++0x, but I doubt it. (My
impression is that MS likes to force programmers into the Windows-specific
APIs for the Unicode stuff instead of making the std APIs usable.)

>
>> * Where / which operator<< would need to be specialized or overloaded?
>
> The question seems odd: it's like asking, which key located to the upper
> left on
> my QWERTY keyboard should I press to type a Q?
>

:-)

> Well, for std::wstring it's
>
> std::ostream& operator<<( std::ostream& stream, std::wstring const& ws )
>

So you see - it's specializing not overloading. You resolved my doubt.

The reason I phrased the question the way I did had more to do with wchar_t* ...
I'm unsure whether specializing it could cause some problems, and I'm unsure
whether I need some special handling for const wchar_t* etc.

>
>> * I would need to convert from (a known) unicode encoding to (a known)
>> multibyte encoding
>
> Assuming that std::wstring encodes Unicode in some way (it doesn't have
> to, it's
> application specific), yes that's what std::wcout does.
>
>
>> and then feed that to cout?
>
> Yes.
>
>
>> * Are there any stdlib implementations out there that already support
>> this?
>
> Most, if by "support" you mean whether the required functionality is
> just one
> routine or operator invocation away.
>

It's not one but two (counting wchar_t*), and by "implementation" I meant
to include the platform - that is, whether there is an implementation/platform
that provides the function as well as the knowledge of _which_ Unicode
and multibyte encoding to actually use.

> More interesting, are there any stdlib implementations that *don't*
> support it?
>
> And yes, there are: MinGW 3.4.5 for Windows.
>

Um. Why?


cheers,
Martin

Alf P. Steinbach

Oct 12, 2009, 7:43:16 PM
* Martin B.:

> Alf P. Steinbach wrote:
>
>> More interesting, are there any stdlib implementations that *don't*
>> support [passing wide strings to std::cout]?

>>
>> And yes, there are: MinGW 3.4.5 for Windows.
>>
>
> Um. Why?

Sorry, my bad, mea culpa, etc., I must beg forgiveness. I was probably thinking
of the lack of general support for wide streams with that compiler. But the lack
of general support isn't in the way of passing wide strings to std::cout.


Cheers, & sorry for dis-information (!),

- Alf

Ali Çehreli

Oct 16, 2009, 5:26:44 AM
On Oct 9, 12:35 pm, jyoti <jyoti.mic...@gmail.com> wrote:

> On Fri, 09 Oct 2009 08:33:17 +0530, Martin B. <0xCDCDC...@gmx.at> wrote:

> > * Where / which operator<< would need to be specialized or overloaded?
> > * I would need to convert from (a known) unicode encoding to (a known)
> > multibyte encoding and then feed that to cout?
> > * Are there any stdlib implementations out there that already support
> > this?
>
> jyoti@jyoti-desktop:~$ cat test.cpp
> #include <iostream>
>
> int main(void)
> { std::wstring str = L"Hello World.\n";
> std::wcout << str;
> return 0;}
>
> jyoti@jyoti-desktop:~$ g++ --pedantic -Wall test.cpp
> jyoti@jyoti-desktop:~$ ./a.out
> Hello World.

All ASCII characters? I am not impressed... :/ What is wstring good
for then? What have we gained?

How is wstring better than an array of integers?

> Did you bother yourself with a google search?

Challenge: Write a C++ program that takes the user's name, compares it
to a hard-coded name that contains a non-ASCII Unicode character, e.g.
"Şule", and produces either "Seni tanıyorum!" (I know you) for a
match, or "Tanıştığımıza memnun oldum" (nice to meet you) for no
match.

Simple student program... Google all you want.

I am very curious on how to write this program portably. I am also
very skeptical. I don't think C++ is for non-ASCII alphabets at all.

Ali

Martin B.

Oct 16, 2009, 11:16:30 AM
Ali Çehreli wrote:
> On Oct 9, 12:35 pm, jyoti <jyoti.mic...@gmail.com> wrote:
>> On Fri, 09 Oct 2009 08:33:17 +0530, Martin B. <0xCDCDC...@gmx.at> wrote:
>
>>> * Where / which operator<< would need to be specialized or overloaded?
>>> * I would need to convert from (a known) unicode encoding to (a known)
>>> multibyte encoding and then feed that to cout?
>>> * Are there any stdlib implementations out there that already support
>>> this?
>> jyoti@jyoti-desktop:~$ cat test.cpp
>> #include <iostream>
>>
>> int main(void)
>> { std::wstring str = L"Hello World.\n";
>> std::wcout << str;
>> return 0;}
>>
>> jyoti@jyoti-desktop:~$ g++ --pedantic -Wall test.cpp
>> jyoti@jyoti-desktop:~$ ./a.out
>> Hello World.
>
> All ASCII characters? I am not impressed... :/ What is wstring good
> for then? What have we gained?
>
> How is wstring better than an array of integers?
>

FWIW, my OP was about outputting wstring to cout and *not wcout* and the
problems one might face.

>> Did you bother yourself with a google search?
>
> Challenge: Write a C++ program that takes the user's name, compares it
> to a hard-coded name that contains a non-ASCII Unicode character, e.g.
> "Şule", and produces either "Seni tanıyorum!" (I know you) for a
> match, or "Tanıştığımıza memnun oldum" (nice to meet you) for no
> match.
>
> Simple student program... Google all you want.
>
> I am very curious on how to write this program portably. I am also
> very skeptical. I don't think C++ is for non-ASCII alphabets at all.
>

Oversimplification.

The problem with portability here is not so much (standard) C++, I think.
Even if you restrict yourself to a subset of C++ compilers that correctly
handle Unicode (UTF-16 or UTF-32) for wstring, you still have the problem
that you have to know where the string is going, and that the stdlib
implementation on the platform needs to correctly convert the wstring to
the appropriate encoding of the "display device" (i.e. either the console
or some graphical library).

For example on Windows:
* For GUI programs you have no problems. You feed the Unicode characters
(be they in a wstring or a CString) to the Win API and they will be
correctly displayed (*if* you have set a font that has the characters).
* For console programs you're pretty much f'ed, because while you can get
it to work, it's so horribly complicated that I'd call it broken.
(see a posting of mine in m.p.v.stl:
http://www.microsoft.com/communities/newsgroups/en-us/default.aspx?dg=microsoft.public.vc.stl&tid=db2fda2b-0423-4b5b-84f6-7c10b56ffefa&cat=en_US_ef479e33-8471-4f51-8ac6-ddac3f808cbf&lang=en&cr=US&sloc=&p=1
)

So what is one to do?
One solution I could think of for the console (I have never tried it)
is to write an operator<<(std::ostream&, const std::wstring&) that converts
the wstring from the platform-specific wide Unicode encoding to UTF-8.
Of course that would mean that whatever cout goes to should expect UTF-8,
so you can't use it with the "normal" ASCII codepage.

br,
Martin

James Kuyper

Oct 16, 2009, 11:11:28 AM
Ali Çehreli wrote:
...

> I am very curious on how to write this program portably. I am also
> very skeptical. I don't think C++ is for non-ASCII alphabets at all.

I think you'll have to replace "non-ASCII" by some other phrase.
Whatever limitation of C++ you're complaining about (it has many
that you might be referring to), it can't be about lack of C++ support
for non-ASCII character set encodings, since C++ does not require ASCII.
The committee went rather far out of its way to avoid mandating any
particular character set.

The standard only puts a few restrictions on character set encodings:
1) All-bits-0 must be the null character.
2) There's a basic source character set of 96 characters that must all
be represented with positive and distinct values.
3) The digits '0' through '9' must be represented by consecutive values.

ASCII meets these requirements, but as a matter of deliberate design,
they were chosen to allow EBCDIC and many other encodings to be used as
well.

The only actual references to a specific character set are those to
ISO/IEC 10646, and in particular the UTF-8 encoding thereof; but support
for ISO/IEC 10646 is not mandatory, except insofar as it is the basis
for Universal Character Names (2.2p2). It will be different in the next
version of the standard, where char16_t, char32_t types will have
ISO/IEC 10646 encodings, with corresponding character and string
literals prefixed with 'u' or 'U', respectively. If __STDC_ISO_10646__
is defined, wchar_t values will also represent ISO/IEC 10646 characters.
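
For illustration, with the new prefixes that would look something like
this (sketch against the C++0x draft; \u015E is the Ş from the example
upthread):

#include <string>

int main()
{
    const char*    u8s = u8"\u015Eule";  // "Şule" as UTF-8 code units
    std::u16string s16 = u"\u015Eule";   // UTF-16 code units (char16_t)
    std::u32string s32 = U"\u015Eule";   // UTF-32 code points (char32_t)

    // Unlike wchar_t / L"..." literals, the encodings of u8/u/U literals
    // are fixed by the (draft) standard.
    return (u8s[0] != 0 && s16.size() == 4 && s32.size() == 4) ? 0 : 1;
}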

Jim Michaels

Apr 1, 2014, 8:13:58 PM
> br,
> Martin

That post is not visible - it just redirects to a microsoft.com page.

I don't have a working answer to my question yet. Or are things so thoroughly
broken that it's impossible to implement streams?

Richard

Apr 4, 2014, 10:30:19 PM

[Please do not mail me a copy of your followup]

Jim Michaels <jmic...@this.is.invalid> spake the secret code
<ed48f1e6-da62-43f5...@googlegroups.com> thusly:

> don't have a working answer to my question yet. or are things so
> throughly broken that it's impossible to implement streams?

I can't see the earlier articles in this thread, they've expired off
my server.

However, these are my thoughts:

A stream with a narrow character encoding is fundamentally
incompatible with a wide character encoded string.

Therefore, some sort of adapter must be supplied that encodes a wide
character string into some sort of narrow character encoding. MBCS is
one such mechanism that could be used for adapting the wstring, but
not the only one.

This is why the standard library doesn't do this "out of the box" for
you -- there are many different ways to narrow a wide string depending
on the desired character encoding (UTF-8 is another one, for
instance).
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
The Terminals Wiki <http://terminals.classiccmp.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>

Martin Ba

Apr 10, 2014, 4:20:41 PM
On 05.04.2014 04:30, Richard wrote:
>
> [Please do not mail me a copy of your followup]
>
> Jim Michaels <jmic...@this.is.invalid> spake the secret code
> <ed48f1e6-da62-43f5...@googlegroups.com> thusly:
>
>> don't have a working answer to my question yet. or are things so
>> throughly broken that it's impossible to implement streams?
>
> I can't see the earlier articles in this thread, they've expired off
> my server.
>

The original thread can be found here:
"Make std::cout accept std::wstring ?"
https://groups.google.com/d/msg/comp.lang.c++.moderated/pPiJlLxe-ko/KQT-PYnZ0MgJ

The mentioned MS usenet group link is this:
microsoft.public.vc.stl > "wcout, VS2008 and UTF-16"
https://groups.google.com/d/msg/microsoft.public.vc.stl/5j92GNZaie0/jyaVRuOISWIJ


> However, these are my thoughts:
>
> A stream with a narrow character encoding is fundamentally
> incompatible with a wide character encoded string.
>
> Therefore, some sort of adapter must be supplied that encodes a wide
> character string into some sort of narrow character encoding. MBCS is
> one such mechanism that could be used for adapting the wstring, but
> not the only one.
>
> This is why the standard library doesn't do this "out of the box" for
> you -- there are many different ways to narrow a wide string depending
> on the desired character encoding (UTF-8 is another one, for
> instance).
>

Have you ever tried:

std::string message_of_unknown_encoding = ...;
const char* pMsg = message_of_unknown_encoding.c_str();
std::wcout << pMsg; // note: char data into wcout

This will work, using `widen()`, and break horribly if the encoding
doesn't fit. (Incidentally, it won't work with the std::string object itself.)

Note that, since back then I have been selectively using:

+ + + +
namespace output_operator_extension
{
    // My custom converter for the char encoding used here:
    std::string wstr2str(const std::wstring& wstr);

    template<class Tr>
    std::basic_ostream<char, Tr>&
    operator<<(std::basic_ostream<char, Tr>& os, const std::wstring& str)
    {
        return operator<<(os, wstr2str(str));
    }

    template<class Tr>
    std::basic_ostream<char, Tr>&
    operator<<(std::basic_ostream<char, Tr>& os, const wchar_t* const& str)
    {
        return operator<<(os, std::wstring(str));
    }
}

// Don't know or care if conformant. Works and needed on MSVC 8 (2005):
namespace std {
    using output_operator_extension::operator<<;
}
+ + + +

cheers,
Martin

--
Like any language, C++ allows you to shoot yourself
in the foot -- but with C++, you sometimes don't
realize you shot yourself until it's too late. (Jeff Langr)

Richard

Apr 19, 2014, 5:35:27 AM
[Please do not mail me a copy of your followup]

Martin Ba <0xcdc...@gmx.at> spake the secret code
<li6147$46d$1...@dont-email.me> thusly:

>The original thread can be found here:

Thanks, read it. I couldn't read the link to the MS communities site,
though.

>Have you ever tried:
>
> std::string message_of_unknown_encoding = ...;
> const char* pMsg = message_of_unknown_encoding.c_str();
> std::wcout << pMsg; // note: char data into wcout

While this compiles and may do something, I can't see how it could do
anything useful since we haven't specified the encodings for widen()
to use (or vice-versa for narrow() to use).
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
The Terminals Wiki <http://terminals.classiccmp.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>


Martin B.

Apr 22, 2014, 2:23:28 AM
On 19.04.2014 11:35, Richard wrote:
> [Please do not mail me a copy of your followup]
>
> Martin Ba <0xcdc...@gmx.at> spake the secret code
> <li6147$46d$1...@dont-email.me> thusly:
>
>> The original thread can be found here:
>
> Thanks, read it. I couldn't read the link to the MS communities site,
> though.
>
>> Have you ever tried:
>>
>> std::string message_of_unknown_encoding = ...;
>> const char* pMsg = message_of_unknown_encoding.c_str();
>> std::wcout << pMsg; // note: char data into wcout
>
> While this compiles and may do something, I can't see how it could do
> anything useful since we haven't specified the encodings for widen()
> to use (or vice-versa for narrow() to use).
>

I didn't mean doing this in isolation. [widen][1] is quite well defined,
in that it uses the locale, which specifies a character set; this can
work at runtime when "everything" is correctly set up.

What I was kind of referring to was your statement:

> This is why the standard library doesn't
> do this "out of the box" ...

Which is only 3/4 true.

Of the 4 possibilities the std has for inserting a string into an
other-char-type stream, we have (using xout as an example):

* cout << wchar_t* : FAIL
* cout << wstring : FAIL
* wcout << string : FAIL
* wcout << char* : WORKS! (well, kind of)

I just meant to point out that there isn't any inherent technical thing
that would prevent us from inserting one char-type into the stream type
for the "opposite" char-type.

Of course `widen` and `narrow` don't make *any* sense as soon as you
have any MBCS (UTF-8), but I think that's quite another story.
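
A quick illustration of those four cases (untested sketch):

#include <iostream>
#include <string>

int main()
{
    std::wcout << "narrow literal\n";   // WORKS: each char is widen()ed into the wide stream

    // std::cout << L"wide";            // compiles, but matches the const void*
    //                                  // overload and prints a pointer value
    // std::cout << std::wstring(L"x"); // does not compile: no such operator<<
    // std::wcout << std::string("x");  // does not compile: no such operator<<
    return 0;
}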


cheers,
Martin

[1] : http://en.cppreference.com/w/cpp/locale/ctype/widen


--

Richard

Apr 23, 2014, 7:07:07 PM
[Please do not mail me a copy of your followup]

"Martin B." <0xCDC...@gmx.at> spake the secret code
<lj3uat$k29$1...@dont-email.me> thusly:

>What I was kind of referring to was your statement:
>
>> This is why the standard library doesn't
>> do this "out of the box" ...
>
>Which is only 3/4 true.

Fair enough, widen() and narrow() are there and if you imbue the
proper locale in a stream, then it knows what to do.

>Of the 4 possibilities the std has for inserting a string into a
>other-char-type stream, we have (xout as example) :
>
>* cout << wchar_t* : FAIL
>* cout << wstring : FAIL
>* wcout << string : FAIL
>* wcout << char* : WORKS! (well, kind of)
>
>I just meant to point out that there isn't any inherent technical thing
>that would prevent us from inserting one char-type into the stream type
>for the "opposite" char-type.
>
>Of course `widen` and `narrow` don't make *any* sense as soon as you
>have any MBCS (UTF-8), but I think that's quite another story.

Yes, this is true since widen and narrow both operate on a single
character and therefore don't have enough information to handle the
MBCS scenario.

The best (only?) explanation of all this locale and stream stuff I've
found is in: "Standard C++ IOStreams and Locales: Advanced
Programmer's Guide and Reference" by Angelika Langer and Klaus Kreft
<http://amzn.to/1l33Sv0>
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
The Terminals Wiki <http://terminals.classiccmp.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>


Martin B.

Apr 28, 2014, 2:13:09 AM
On 24.04.2014 01:07, Richard wrote:
>
> The best (only?) explanation of all this locale and stream stuff I've
> found is in: "Standard C++ IOStreams and Locales: Advanced
> Programmer's Guide and Reference" by Angelika Langer and Klaus Kreft
> <http://amzn.to/1l33Sv0>
>

Yes. I have that book.

It's great to understand and make use of what we have.

When reading it today it also shows how over-complicated iostreams is in
most everything it does. When I look at this from 2014, I just can't
stop shaking my head.

--
Good C++ code is better than good C code, but
bad C++ can be much, much worse than bad C code.

Richard

May 2, 2014, 2:37:13 AM
[Please do not mail me a copy of your followup]

"Martin B." <0xCDC...@gmx.at> spake the secret code
<ljjg32$r3e$1...@dont-email.me> thusly:

>When reading it today it also shows how over-complicated iostreams is in
>most everything it does. When I look at this from 2014, I just can't
>stop shaking my head.

When I read it, I too wondered if we couldn't achieve the same goals
with something simpler.

The practical consequence of its design is that people point to
(f)printf() being significantly faster than operator<< on a stream,
for the case where (f)printf and operator<< should yield the same
output (i.e. there is no localization to a locale other than the "C"
locale and there is no code point conversion going on).
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
The Terminals Wiki <http://terminals.classiccmp.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>


Martin B.

May 2, 2014, 8:33:51 PM
On 02.05.2014 08:37, Richard wrote:
> [Please do not mail me a copy of your followup]
>
> "Martin B." <0xCDC...@gmx.at> spake the secret code
> <ljjg32$r3e$1...@dont-email.me> thusly:
>
>> When reading it today it also shows how over-complicated iostreams is in
>> most everything it does. When I look at this from 2014, I just can't
>> stop shaking my head.
>
> When I read it, I too wondered if we couldn't achieve the same goals
> with something simpler.
>

No one has yet stepped up and come up with anything that gained traction.
Sadly.

> The practical consequence of its design is that people point to
> (f)printf() being significantly faster than operator<< on a stream,
> for the case where (f)printf and operator<< should yield the same
> output (i.e. there is no localization to a locale other than the "C"
> locale and there is no code point conversion going on).
>

The "funny" thing is that - in my experience from a few tests here and
there and looking at the MSVC implementation of iostreams - the
performance hit is *not* from any locale applications or other
conversion going on. (After all, printf also has to use and apply the
coreect globale locale.)

For example, on MSVC, one rather expensive step in using an ostream is
constructing the actual (o)stringstream object (even without output buffer
allocation). (I think it had to do with copying/binding the global
locale to the object.) A total waste, as in the next line I had to imbue
the stream with a different locale, and of course there's no ctor
taking a locale.
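
I.e. you always end up with the construct-then-imbue dance, something
like this (sketch, names made up):

#include <locale>
#include <sstream>
#include <string>

std::string format_with(const std::locale& loc, double value)
{
    std::ostringstream oss;  // binds a copy of the global locale on construction...
    oss.imbue(loc);          // ...which is thrown away again right here
    oss << value;
    return oss.str();
}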

But it's not only the performance of iostreams that I find irritating.

Just look at the good/bad/fail bit mess.


cheers,
Martin


--

James K. Lowden

May 5, 2014, 6:57:27 PM
On Fri, 2 May 2014 18:33:51 CST
"Martin B." <0xCDC...@gmx.at> wrote:

> But it's not only the performance of iostreams that I find irritating.
>
> Just look at the good/bad/fail bit mess.

What are you referring to, exactly?

The iostream status bit exemplifies C++ minimality. The stream is:

* OK, or
* has experienced an error but remains usable, or
* has hit EOF, or
* is in an unusable state

Tested as a boolean, the stream is true/false, which is all you
need to know before proceeding.

Those states are as specific as I can imagine without getting into
details from the OS. It deals cleanly with with the EOF ambiguity of
read(2) and, unlike stdio, distinguishes between recoverable and
nonrecoverable errors.

I suspect complaints about iostreams stem not from any accidental
complexity in the design, but from inherent complexity in the problem
of I/O. In fact, I have yet to see a better design in any language.

--jkl

Seungbeom Kim

May 8, 2014, 8:39:57 AM

On 2014-05-05 15:57, James K. Lowden wrote:
>
> The iostream status bit exemplifies C++ minimality. The stream is:
>
> * OK, or
> * has experienced an error but remains usable, or
> * has hit EOF, or
> * is in an unusable state

This so far seems to imply that a stream is in one of the four states
at any given time. However:

> Tested as a boolean, the stream is true/false, which is all you
> need to know before proceeding.

The interface where each one returns true/false seems to imply that
any of the 2^4 = 16 combinations is possible.
Common sense tells us that "OK" and "unusable", for example, cannot both
be true at the same time, but the interface still allows too many
possibilities.

Okay, good() is just a special case, a shorthand for "nothing else
is true." Then are the other three cases independent of one another,
and is any of the other 2^3=8 combinations possible?

It turns out that bad() implies fail(), because fail() also checks
badbit, so there is not really a single function that tells whether
the stream "has experienced an error but remains usable" (though you
could combine two functions).

But is it really possible that badbit is true and failbit is false?
The standard doesn't tell the answer.
If badbit also implies failbit, using a bitmask type for them could
be okay for an implementation detail, but exposing them in the public
interface in a way that implies independence is confusing at best
and an enum of three enumerators (e.g. { good, fail, bad }) could have
been much clearer (assuming that eofbit is independent of the others).
If badbit and failbit are truly independent, fail() checking also for
badbit adds confusion to the relationship between the bitmasks and the
functions.
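
In code, the usual reading loop ends up something like this (sketch; the
comments reflect my reading of the standard, not a definitive one):

#include <iostream>
#include <limits>

int main()
{
    int x;
    while (std::cin >> x)       // operator void*/bool: true iff !fail(), i.e. neither failbit nor badbit
        std::cout << x << '\n';

    if (std::cin.bad())         // badbit: the stream itself is broken
        std::cerr << "lost the stream\n";
    else if (std::cin.eof())    // eofbit (here usually together with failbit)
        std::cerr << "end of input\n";
    else                        // failbit only: bad characters; recoverable
    {
        std::cin.clear();       // drop failbit
        std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    return 0;
}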

>
> Those states are as specific as I can imagine without getting into
> details from the OS. It deals cleanly with the EOF ambiguity of
> read(2) and, unlike stdio, distinguishes between recoverable and
> nonrecoverable errors.

Can you please explain what is the EOF ambiguity of read(2)?

> I suspect complaints about iostreams stem not from any accidental
> complexity in the design, but from inherent complexity in the problem
> of I/O. In fact, I have yet to see a better design in any language.

In addition to what I said above, maybe the naming of the iostate
bits/functions could have been better: what's the difference between 'fail'
and 'bad'; which one is more severe; does one of them imply the other;
does 'eof' imply any of them; are 'good' and 'bad' opposites; does
'while (cin>>x)' check cin.good(), !cin.fail(), or !cin.bad(); etc.
It took me a long time before I could answer these questions.

--
Seungbeom Kim

James K. Lowden

May 9, 2014, 9:11:15 AM
On Thu, 8 May 2014 06:39:57 CST
Seungbeom Kim <musi...@bawi.org> wrote:

> > Those states are as specific as I can imagine without getting into
> > details from the OS. It deals cleanly with the EOF ambiguity
> > of read(2) and, unlike stdio, distinguishes between recoverable and
> > nonrecoverable errors.
>
> Can you please explain what is the EOF ambiguity of read(2)?

When read(2) returns zero bytes, it may mean that no data
were pending on a socket with nonblocking I/O, or EOF.

Granted, files aren't sockets, and the standard library doesn't
implement network I/O for iostreams. But, as you know, just as a
stream is not a socket, neither is it a file. It strikes me as
prescient that EOF and "no data read" are distinct.

> > I suspect complaints about iostreams stem not from any accidental
> > complexity in the design, but from inherent complexity in the
> > problem of I/O. In fact, I have yet to see a better design in any
> > language.
>
> In addition to what I said above, maybe the naming of the iostate
> bits/ functions could have been better: what's the difference between
> 'fail' and 'bad'; which one is more severe; does one of them imply
> the other; does 'eof' imply any of them; are 'good' and 'bad'
> opposites; does 'while (cin>>x)' check cin.good(), !cin.fail(), or !
> cin.bad(); etc. It took me a long time before I could answer these
> questions.

I agree with most of what you said, Seungbeom. The states are complex
and as implemented it's hard to imagine how bad doesn't imply fail, or
how eof doesn't imply fail. I remember the same confusion you do about
what ! cin means.

If you disassociate a stream from files and sockets, though -- if it's
"just a stream" -- then we *can* imagine bad independent of fail. If
the stream can reflect the physical removal of the datasource -- say,
an Ethernet plug or USB connection -- it could report "bad" in lieu of
a read operation. We can also imagine EOF being signified as part of
the communications protocol, not requiring a short read to be
discovered. Why not?

I'm simply saying that complexity is inherent in the problem domain:
the states eof, failed, and bad are distinct. They overlap in our
minds (and experience) only because of the libraries and operating
systems they've been implemented on.

> enum of three enumerators (e.g. { good, fail, bad }) could have
> been much clearer (assuming that eofbit is independent of the others).

and of course that's what we have, in the form of good(), bad(), fail()
and eof(). Is it so important that it be an enumerated type? Would
that actually simplify anything?

--jkl


--

Tobias Müller

May 9, 2014, 6:54:54 PM
"James K. Lowden" <jklo...@speakeasy.net> wrote:
> On Thu, 8 May 2014 06:39:57 CST
> Seungbeom Kim <musi...@bawi.org> wrote:
>
>>> Those states are as specific as I can imagine without getting into
>>> details from the OS. It deals cleanly with the EOF ambiguity
>>> of read(2) and, unlike stdio, distinguishes between recoverable and
>>> nonrecoverable errors.
>>
>> Can you please explain what is the EOF ambiguity of read(2)?
>
> When read(2) returns zero bytes, it may mean that no data
> were pending on a socket with nonblocking I/O, or EOF.
>
> Granted, files aren't sockets, and the standard library doesn't
> implement network I/O for iostreams. But, as you know, just as a
> stream is not a socket, neither is it a file. It strikes me as
> prescient that EOF and "no data read" are distinct.

AFAIK read/recv only return 0 on EOF. If no data is pending for nonblocking
IO, it returns -1 with errno set to EWOULDBLOCK or EAGAIN.
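
I.e. roughly (POSIX sketch, not ISO C++):

#include <cerrno>
#include <cstddef>
#include <unistd.h>   // POSIX read(2)

// Returns false on EOF or a real error; sets got to the number of bytes read
// (0 if a nonblocking descriptor simply has no data yet).
bool read_some(int fd, char* buf, std::size_t len, ssize_t& got)
{
    got = read(fd, buf, len);
    if (got > 0)  return true;                    // got some bytes
    if (got == 0) return false;                   // EOF / peer closed
    if (errno == EAGAIN || errno == EWOULDBLOCK)  // nonblocking: no data yet, not EOF
    {
        got = 0;
        return true;
    }
    return false;                                 // real error; inspect errno
}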

Tobi

James K. Lowden

May 10, 2014, 9:09:09 AM
On Fri, 9 May 2014 15:54:54 -0700 (PDT)
Tobias Müller <tro...@bluewin.ch> wrote:

>>> Can you please explain what is the EOF ambiguity of read(2)?
>>
>> When read(2) returns zero bytes, it may mean that no data
>> were pending on a socket with nonblocking I/O, or EOF.
....
> AFAIK read/recv only return 0 on EOF. If no data is pending for
> nonblocking IO, it returns -1 with errno set to EWOULDBLOCK or EAGAIN.

My mistake, thank you for the correction. I was thinking of fread(3),
not the same thing at all, and having nothing to do with nonblocking
I/O, and somehow had convinced myself that read(2) shared the same
ambiguity.

--jkl

Seungbeom Kim

May 11, 2014, 8:44:01 AM
On 2014-05-09 06:11, James K. Lowden wrote:
> On Thu, 8 May 2014 06:39:57 CST
> Seungbeom Kim <musi...@bawi.org> wrote:
>>
>> Can you please explain what is the EOF ambiguity of read(2)?
>
> When read(2) returns zero bytes, it may mean that no data
> were pending on a socket with nonblocking I/O, or EOF.

I thought I'd get an error of EWOULDBLOCK if no data were pending
on a nonblocking socket.

> If you disassociate a stream from files and sockets, though -- if it's
> "just a stream" -- then we *can* imagine bad independent of fail. If
> the stream can reflect the physical removal of the datasource -- say,
> an Ethernet plug or USB connection -- it could report "bad" in lieu of
> a read operation. We can also imagine EOF being signified as part of
> the communications protocol, not requiring a short read to be
> discovered. Why not?

Are you saying that an error from the physical source sets badbit, and
an error purely within the stream layer (e.g. conversion to an integer
from characters that have been read from the source successfully) sets
failbit? I hope it's as simple as that, but I'm not sure how consistent
that understanding is with what the standard says (which still feels vague):

* badbit indicates a loss of integrity in an input or output sequence
(such as an irrecoverable read error from a file);
* failbit indicates that an input operation failed to read the expected
characters, or that an output operation failed to generate the desired
characters.

> I'm simply saying that complexity is inherent in the problem domain:
> the states eof, failed, and bad are distinct. They overlap in our
> minds (and experience) only because of the libraries and operating
> systems they've been implemented on.

I tend to agree that I/O can be inherently complex. I also understand
that eof is somewhat distinct, but I don't get (yet) how precisely fail
and bad are distinct from each other, nor in what different ways I can
deal with the two kinds of errors.

>> enum of three enumerators (e.g. { good, fail, bad }) could have
>> been much clearer (assuming that eofbit is independent of the others).
>
> and of course that's what we have, in the form of good(), bad(), fail()
> and eof(). Is it so important that it be an enumerated type? Would
> that actually simplify anything?

Again, the difference is in the number of (seemingly) possible states.

For example, given a simple traffic signal with red, yellow, and
green lights where one and only one of them is ON at any given time,
'enum { red, yellow, green } state();' is a better interface than
'bool red(), yellow(), green();' because with the former you know
exactly which one of the three states you are in, and you don't
need to worry 'What if red() and yellow() are both ON?'.

Yes, if you dig into the standard, you can learn that bad() implies
fail() and good() implies !fail(), and with some logic you can infer
that effectively you have three MECE states {!fail(), fail() && !bad(),
bad()} (modulo eof()). But this is getting far from being intuitive.

Maybe it doesn't matter as much because I have never had to deal with
bad() explicitly, and operator void*/bool() and eof() are all I have
needed so far...

--
Seungbeom Kim

James K. Lowden

May 12, 2014, 1:40:05 AM
On Sun, 11 May 2014 05:44:01 -0700 (PDT)
Seungbeom Kim <musi...@bawi.org> wrote:

> On 2014-05-09 06:11, James K. Lowden wrote:
> > On Thu, 8 May 2014 06:39:57 CST
> > Seungbeom Kim <musi...@bawi.org> wrote:
> >>
> >> Can you please explain what is the EOF ambiguity of read(2)?
> >
> > When read(2) returns zero bytes, it may mean that no data
> > were pending on a socket with nonblocking I/O, or EOF.
>
> I thought I'd get an error of EWOULDBLOCK if no data were pending
> on a nonblocking socket.

Quite right. I was thinking of fread(), which when it returns zero has
to be checked with feof() and ferror(). I remembered there was
ambiguity, but not the condition. :-/

> > If you disassociate a stream from files and sockets, though -- if
> > it's "just a stream" -- then we *can* imagine bad independent of
> > fail. If the stream can reflect the physical removal of the
> > datasource -- say, an Ethernet plug or USB connection -- it could
> > report "bad" in lieu of a read operation. We can also imagine EOF
> > being signified as part of the communications protocol, not
> > requiring a short read to be discovered. Why not?
>
> Are you saying that an error from the physical source sets badbit, and
> an error purely within the stream layer (e.g. conversion to an integer
> from characters that have been read from the source successfully) sets
> failbit?

Not *does*, but could. The design and semantics of the iostream status
word perforce antedate implementation and standardization.

I'm not making any statement about the standard; I'm just defending the
design. There's a difference between "the last operation failed" (e.g.
could not convert) and "the stream is no longer usable" (e.g. the
device has been disconnected).

As currently implemented, afaik there's no way badbit will change state
without a failed operation, so bad implies fail. You would be rightly
surprised to find failbit behaving as volatile, spontaneously changing
to reflect the device state. I'm only saying that's an artifact of
the current implementation, not the design per se.

> * badbit indicates a loss of integrity in an input or output sequence
> (such as an irrecoverable read error from a file);
> * failbit indicates that an input operation failed to read the
> expected characters, or that an output operation failed to generate
> the desired characters.

Right, in the first case the stream is kaput, pushing up daisies, an
ex-stream. Nothing you do, even putting the floppy disk back in the
drive, will bring it back.

In the second case, you still have a chance. The tellg() pointer hasn't
budged, and you can re-try the input with a different conversion.
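
For example (made-up sketch):

#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::istringstream in("forty-two");

    int n;
    if (!(in >> n))      // extraction fails: failbit set, position unchanged
    {
        in.clear();      // reset the state; the stream is still usable
        std::string word;
        in >> word;      // retry the same input with a different conversion
        std::cout << "got the word: " << word << '\n';
    }
    return 0;
}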

Arguably, it's confusing because the first is a property of the stream
and the second is a property of the operation. Traditionally functions
return status -- erc = func(...) -- but the iostream operator syntax
precludes that, so the status is stashed in the stream. The
property of the stream becomes, "the last operation failed".

> >> enum of three enumerators (e.g. { good, fail, bad }) could have
> >> been much clearer (assuming that eofbit is independent of the
> >> others).
> >
> > and of course that's what we have, in the form of good(), bad(),
> > fail() and eof(). Is it so important that it be an enumerated
> > type? Would that actually simplify anything?
>
> Again, the difference is in the number of (seemingly) possible states.
>
> For example, given a simple traffic signal with red, yellow, and
> green lights where one and only one of them is ON at any given time,
> 'enum { red, yellow, green } state();' is a better interface than
> 'bool red(), yellow(), green();' because with the former you know
> exactly which one of the three states you are in, and you don't
> need to worry 'What if red() and yellow() are both ON?'.

Agreed. I think the question boils down to whether or not failbit and
badbit represent orthogonal states.

We know operations can fail without the stream "losing integrity". We
can imagine the stream going bad spontaneously (if, say, the operator
unmounts the drive) even if current implementation requires a failed
operation to report it. ISTM the two states are quite independent.

--jkl


--

Chris Vine

May 12, 2014, 6:48:01 PM
On Sun, 11 May 2014 22:40:05 -0700 (PDT)
"James K. Lowden" <jklo...@speakeasy.net> wrote:
[snip]
> Agreed. I think the question boils down to whether or not failbit and
> badbit represent orthogonal states.
>
> We know operations can fail without the stream "losing integrity". We
> can imagine the stream going bad spontaneously (if, say, the operator
> unmounts the drive) even if current implementation requires a failed
> operation to report it. ISTM the two states are quite independent.

What the standard says (§27.5.3.1.4) for badbit, eofbit and failbit is:

badbit: indicates a loss of integrity in an input or output sequence
(such as an irrecoverable read error from a file);

eofbit: indicates that an input operation reached the end of an input
sequence;

failbit: indicates that an input operation failed to read the expected
characters, or that an output operation failed to generate the desired
characters.

This seems to imply that, as you say, you could get badbit set without
failbit being set. If so, failbit would be set as soon as you actually
attempted an input or output on a stream with a set badbit (or a set
eofbit, for that matter).

But ... this is all very well in theory. In practice a stream object only has
such information as it is able to obtain from its streambuffer, which
is the object that interfaces with the operating system's input/output
primitives. In the general case this information in turn depends on
the return values of the streambuffer's overridable uflow()/underflow()/
overflow() methods for character input/output, and xsgetn() and
xsputn() for block input/output, together with the values of the buffer
pointers (if any).

According to the standard, underflow() returns "traits::to_int_type(c),
where c is the first character of the pending sequence, without moving
the input sequence position past it. If the pending sequence is null
then the function returns traits::eof() to indicate failure". uflow()
does the same but increments the next pointer. overflow() "returns
traits::eof() or throws an exception if the function fails. Otherwise,
returns some value other than traits::eof() to indicate success".

xsgetn() and xsputn() are even less informative. They return the
number of characters read into the buffer argument, or written out
respectively. Stream failure or end-of-file is deduced from there
being a short read or short write.
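
A minimal unbuffered output streambuffer of that kind might look like
this (untested sketch over a C FILE*, just returning what the standard
mandates):

#include <cstdio>
#include <ostream>
#include <streambuf>

class file_streambuf : public std::streambuf
{
public:
    explicit file_streambuf(std::FILE* f) : file_(f) {}

protected:
    // Called per character, since we set up no put buffer.
    virtual int_type overflow(int_type c)
    {
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::not_eof(c);          // flush request: nothing buffered
        const char ch = traits_type::to_char_type(c);
        return std::fwrite(&ch, 1, 1, file_) == 1
            ? c                                      // success: anything but eof()
            : traits_type::eof();                    // failure: the stream will set badbit
    }

    // Block output; a short count is how the stream deduces failure.
    virtual std::streamsize xsputn(const char* s, std::streamsize n)
    {
        return static_cast<std::streamsize>(
            std::fwrite(s, 1, static_cast<std::size_t>(n), file_));
    }

private:
    std::FILE* file_;
};

int main()
{
    file_streambuf buf(stdout);
    std::ostream out(&buf);
    out << "written through a custom streambuf\n";
    return out.fail() ? 1 : 0;
}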

Everyone will probably have written a streambuffer in order to do
anything useful with sockets or safe temporary files, amongst other
things. When I have written a streambuffer I have just had the
streambuffer's virtual functions return what is mandated by the
standard. As to what this achieves in practice, as I recall on a read
std::istream normally sets eofbit and failbit together, and sets badbit
when a further read attempt is made. As would be expected, the standard
line-reading functions behave differently because a short read still
satisfies the request for a line of input: as far as I recall
std::istream::getline() and std::getline() set eofbit whenever they
encounter traits::eof or a short read from the streambuffer but only
set failbit when there are no characters of a line to return at all (ie
on the next read), when badbit is also set. On a failed write (which
in practice means a failed flush), I seem to recall that std::ostream
sets all three bits immediately.

This rarely matters because, as you say, the fail() method returns true
if either badbit or failbit is set.

I guess it would be possible for something deriving from basic_ostream
or basic_istream to be more discriminating if provided with
supplementary error information by a custom streambuffer's additional
error reporting functions, but has anyone actually written a
streambuffer/stream combination which does this?

Chris