I think it boils down to this, though. Pretty much the only thing
the standard library provides is a buffer whose sole
non-implementation-defined property is a terminating zero, and I
think that's not even remotely sufficient to seriously claim "string
support". Never mind that most functions taking "string" arguments
go even lower level with a raw char*...
[...]
> The last point to make is that the poster's exemplar of QString does
> not actually seem to perform the function that he thinks it does.
> You still have to offer QString its input in a codeset it recognises
> (UTF-8 or UTF-16), for obvious reasons; for anything else the user
> has to make her own conversions using fromLatin1(), fromLocal8Bit()
> or use some external conversion function, and if you don't it will
> fail.
Of course. I never claimed QString will magically guess the encoding.
However, sorting out the encoding during construction is a much
cleaner, less error-prone approach than keeping the encoding
implicitly on the side, as with std::string, and working on a raw
byte buffer (where, strictly speaking, the standard doesn't even say
it's a byte). Errors made during construction (e.g. by another
library which, like so many, just dumps the local encoding into
everything) are typically revealed much sooner that way, within the
component near the error, rather than somewhere else entirely, where
they'll make a _real_ mess.
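To illustrate the point, here's a minimal sketch (my own, not Qt's actual implementation) of a string type that, like QString::fromUtf8(), settles the encoding at construction time. A mis-encoded buffer is rejected on the spot, in the component that produced it, instead of surviving as bytes under the "it's probably UTF-8" convention:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical illustration: the constructor decides the encoding up
// front instead of carrying it around as an implicit convention.
// Simplified validator: overlong forms and surrogates are not rejected.
class Utf8String {
public:
    explicit Utf8String(const std::string& bytes) : bytes_(bytes) {
        if (!isValidUtf8(bytes))
            throw std::invalid_argument("input is not valid UTF-8");
    }
    const std::string& bytes() const { return bytes_; }

private:
    static bool isValidUtf8(const std::string& s) {
        std::size_t i = 0;
        while (i < s.size()) {
            unsigned char c = static_cast<unsigned char>(s[i]);
            std::size_t len;
            if (c < 0x80)                len = 1;  // ASCII
            else if ((c & 0xE0) == 0xC0) len = 2;  // 2-byte sequence
            else if ((c & 0xF0) == 0xE0) len = 3;  // 3-byte sequence
            else if ((c & 0xF8) == 0xF0) len = 4;  // 4-byte sequence
            else return false;                     // stray continuation byte
            if (i + len > s.size()) return false;  // truncated sequence
            for (std::size_t j = 1; j < len; ++j)
                if ((static_cast<unsigned char>(s[i + j]) & 0xC0) != 0x80)
                    return false;                  // not a continuation byte
            i += len;
        }
        return true;
    }
    std::string bytes_;
};
```

A library dumping Latin-1 into this (say, the single byte 0xE9 for 'é') throws right there, instead of handing corrupt bytes to code three modules away.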
Conversions usually don't require external tools, BTW: the methods
in the QString interface are just convenience wrappers for the most
popular encodings; a much wider range is available via QTextCodec.
> And you still have to ensure that if you are comparing strings
> they are correctly normalized (to use or not use combining
> characters). And QString carries out comparisons of two individual
> characters using QChar, which only covers characters in the basic
> multilingual plane. And its character access functions also only
> return QChar and not a 32 bit type capable of holding a unicode code
> point. Indeed, as far as I can tell (but I stand ready to be
> corrected) it appears that all individual character access functions
> in QString only correctly handle the BMP, including its iterators and
> the way it indexes for its other methods such as chop(), indexOf()
> and size(). It even appears to allow a QString to be modified by
> indexed 16 bit code units. If so, that is hopeless.
That depends on what you use it for. When using it to communicate
with the OS, e.g. the file system, console, GUI etc., I have yet to
see this become an issue (and that includes Asian locales). That's
one of the biggest advantages QString has in my eyes: it's not the
class itself, it's the entire environment it's integrated into, and
integrated well.

I've mentioned this several times, but it's obviously a point
everybody painstakingly avoids addressing, for fear of admitting that
pretty much every function in the standard library that goes beyond
manipulating a mere byte buffer (i.e. pretty much everything
interfacing with the system and the environment, or in other words
everything you can't just as well implement yourself in portable user
space without resorting to platform-specific extensions) can't handle
UTF-8 (except by accident). And while QString may not be perfect,
it's several orders of magnitude better in these areas than anything
the standard provides, which is basically undefined behaviour for
anything not restricted to 7 bits of some unqualified encoding (good
luck trying to feed a UTF-8-encoded filename to fopen() on e.g.
Windows...)
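To make the fopen() point concrete: on Windows, the narrow-char fopen() runs the filename through the local ANSI code page, so UTF-8 bytes get mangled; the usual workaround is the platform-specific _wfopen() with a UTF-16 name. A hedged sketch of the wrapper everyone ends up writing (error handling omitted; fixed-size buffers for brevity):

```cpp
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

// Open a file whose name is UTF-8 encoded. On POSIX systems filenames
// are byte-transparent, so the UTF-8 bytes pass straight through; on
// Windows we must convert to UTF-16 and use the platform-specific
// _wfopen(), because fopen() interprets the name in the ANSI code page.
FILE* fopen_utf8(const char* utf8_name, const char* mode)
{
#ifdef _WIN32
    wchar_t wname[MAX_PATH], wmode[8];
    MultiByteToWideChar(CP_UTF8, 0, utf8_name, -1, wname, MAX_PATH);
    MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, 8);
    return _wfopen(wname, wmode);
#else
    return fopen(utf8_name, mode);  // byte-transparent on POSIX
#endif
}
```

Which is exactly the kind of platform-specific escape hatch the standard forces on you, and exactly what Qt's file APIs already do behind the scenes.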
> I used to use frequently a string class which was designed for UTF-8.
> In the end, I stopped using it because I found it had little actual
> advantage over std::string. You still had to validate what went into
> this string class to ensure that it really was UTF-8, and convert if
> necessary. The class provided an operator[]() method which returned
> a whole unicode code point which was nice (and which QString appears
> not to), but in the end I made my own iterator class for std::string
> which iterates over the string by whole code points (and dereferences
> to a 32 bit type), and in practice I found that was just as good.
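For reference, the core of the iterator described above boils down to one decoding step like this (my reconstruction, not the poster's actual class; it assumes the std::string already holds validated UTF-8):

```cpp
#include <cstddef>
#include <string>

// Decode the UTF-8 sequence starting at byte offset i of s into a full
// 32-bit code point, advancing i past the sequence. Assumes valid UTF-8.
inline char32_t nextCodePoint(const std::string& s, std::size_t& i)
{
    unsigned char c = static_cast<unsigned char>(s[i++]);
    if (c < 0x80) return c;                 // 1-byte sequence (ASCII)
    int extra = (c & 0xE0) == 0xC0 ? 1      // 2-byte sequence
              : (c & 0xF0) == 0xE0 ? 2      // 3-byte sequence
              : 3;                          // 4-byte sequence
    char32_t cp = c & (0x3F >> extra);      // payload bits of the lead byte
    while (extra-- > 0)                     // 6 bits per continuation byte
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    return cp;
}
```

An iterator wrapping this step dereferences to char32_t, i.e. a whole code point including everything outside the BMP, which is precisely what indexing by 16-bit QChar units can't give you.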
So in conclusion:
1) you initially used a string class that's not in the standard library
2) you then had to extend the standard library yourself for string
handling
3) you need external libraries for string transcoding just to get
started with the "std::string holds UTF-8" convention
4) you need external libraries to actually do anything with these
strings beyond the simplest forms of buffer manipulation
5) you need yet other external libraries if you want these "strings"
to actually interface with something like the file system. Which
to me is really beyond insanity.
Yet people insist the standard has string support, and as far as I'm
concerned that simply does not compute. We can agree to disagree.