Heinz-Mario Frühbeis <
D...@earlybite.individcore.de> wrote:
> I have an issue with XDrawString from XLib, because it prints only
> special characters instead of ü, ö, a.s.o.
> E.g.:
> string test = "Hüh";
> XDrawString(..., test, ...) // isn't printing ü
> But what works is:
> string test = "h";
> char nChar = (char) 252; // is ASCII ü
That's where things start to go wrong: there is no 'ü' in
ASCII. ASCII defines only the values up to 127 (ASCII is an
abbreviation for American Standard Code for Information
Interchange, the Americans don't use umlauts and, moreover,
back when it was designed it wasn't uncommon to use only 7
bits for representing characters). There are lots of different
encodings that use the values above 127, one of them being
'iso_8859-1' (commonly used for Western European languages),
and that's probably where you got the idea that 'ü' is
represented by the value 252 (it's the same in a number of
other iso_8859-x encodings but not in all of them: in the
encoding used for Cyrillic, 'iso_8859-5', 252 represents 'ќ').
And XDrawString() will render that value as 'ü' only if you
also use a font that is made for one of these encodings.
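To make the 7-bit limit concrete, here is a minimal sketch of a
check for "pure ASCII" (the name is_ascii() is made up for this
example, it's not from any library):

```cpp
#include <string>

// Hypothetical helper: true if every byte of 's' lies in the
// ASCII range 0..127.
bool is_ascii(const std::string& s)
{
    // Iterate as unsigned char: plain char may be signed, in which
    // case a byte like 252 would show up as a negative value.
    for (unsigned char c : s)
        if (c > 127)
            return false;
    return true;
}
```

A string holding the byte 252 fails this check no matter which
of the iso_8859-x encodings you have in mind for it.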
Next problem: if you have
> string test = "Hüh";
in your code then what is stored in 'test' depends on what
encoding your editor uses. Nowadays it's not unlikely that
this is UTF-8 and, if you look at the individual bytes, e.g.
by going through 'test.c_str()', you will find that it
contains 4 chars, the first one being 0x48, the second 0xC3,
the third 0xBC and the fourth 0x68. The 0x48 and 0x68 are 'H'
and 'h' as in ASCII (ASCII characters, i.e. everything up to
127, are encoded the same way in UTF-8) and the combination
of 0xC3 and 0xBC is the way UTF-8 encodes 'ü', officially
called "LATIN SMALL LETTER U WITH DIAERESIS". If you use a
font for iso_8859-1 encoding XDrawString() will render them
as 'Ã' and '¼', but if you used a UTF-8 font it would be
rendered as 'ü'.
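If you want to see for yourself what ended up in the string,
something along these lines should do. It's just a sketch, the
name hex_bytes() is invented for this example:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: return the bytes of 's' as space-separated
// two-digit hex values, e.g. "48 C3 BC 68".
std::string hex_bytes(const std::string& s)
{
    std::string out;
    char buf[4];   // two hex digits plus the terminating '\0'
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(c));
        if (!out.empty())
            out += ' ';
        out += buf;
    }
    return out;
}
```

Called as hex_bytes("H\xC3\xBCh"), i.e. with the UTF-8 bytes
spelled out so that the result doesn't depend on the editor's
encoding, it returns "48 C3 BC 68". Pass it your own 'test'
string to find out what your editor actually stored.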
It gets trickier when you use input coming from outside
the program: what you will read depends on the encoding
used by whatever sends the data - if you e.g. try to draw
strings entered into a terminal what 'test' will contain
depends on what encoding the terminal uses. And if you use
input you got from Xlib functions things get even a bit
more "interesting".
> test = test + nChar;
> test = test + "h";
> XDrawString(..., test, ...) // is now printing ü
> So I wanted to replace ü with (char) 252...
> But I do get it working, this is one of my tries:
Well, it works somehow because you use a (probably iso_8859-1)
font and forced the value 252 into the string (std::string
doesn't care at all about the encoding, you can actually
store any binary data in a std::string). And the length()
method doesn't tell you how many "letters" there are, since
that would depend on how the letters are encoded, but just
the plain number of bytes.
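To illustrate the difference between bytes and "letters": if
you assume the string holds valid UTF-8 you can count the
characters (code points, to be precise) yourself. A sketch,
with utf8_length() being a made-up name:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count the code points in a string that is
// ASSUMED to contain valid UTF-8. Continuation bytes have the bit
// pattern 10xxxxxx, so every byte that doesn't match that pattern
// starts a new character.
std::size_t utf8_length(const std::string& s)
{
    std::size_t n = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80)
            ++n;
    return n;
}
```

For the UTF-8 encoded "Hüh" (bytes 0x48 0xC3 0xBC 0x68) the
length() method returns 4 while utf8_length() returns 3.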
The following is for sure not the code you're using (it
won't even compile), so anything one can say about it may
have nothing to do with the problems you're facing...
> strimg nCaption = "ausführen";
      ^
> if(nCaption != ""){
What's that test for? The find() method of std::string works
quite fine on an empty string. And why, if you insist on this
test, not use the empty() method?
> if(nCaption.find("u") > 0){
Why do you look for "u" in the string? And the find() method
does return the position of what you were looking for, which
can include 0 (the very start of the string). What you should
compare to is std::string::npos, which is what gets returned if
the string does not contain what you were looking for. So your
test asks: is there a "u" somewhere in the string beyond the
first byte or none at all? What you need here is
    if (nCaption.find("ü") != std::string::npos)
to make sense since it asks: is there an "ü" in that string?
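Here are the two tests side by side, as a small sketch (both
function names are invented for this example):

```cpp
#include <string>

// Hypothetical helper: true if 'needle' occurs anywhere in
// 'haystack', position 0 included.
bool contains(const std::string& haystack, const std::string& needle)
{
    return haystack.find(needle) != std::string::npos;
}

// The test from the quoted code, for comparison: true if 'needle'
// is found beyond position 0 OR isn't found at all (npos is the
// largest value of an unsigned type, so it's also > 0).
bool quoted_test(const std::string& haystack, const std::string& needle)
{
    return haystack.find(needle) > 0;
}
```

With "uhu" and "u" contains() is true and quoted_test() is
false, with "aha" and "u" it's exactly the other way round.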
> char nChar = (char) 252;
> const char* nChar1 = &nChar;
> std::string umlaut = "ü";
> nCaption.replace(nCaption.find("ü"), umlaut.length() , nChar1);
The third argument to the replace() method must, when you
call it with a char pointer, be a pointer to a C string
(which must include a '\0' at the end!), but what you pass
it is a pointer to a single char with no '\0' following it
(or only just by chance). Be prepared for lots of strange
looking stuff to get inserted for the 'ü'...
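If you want to insert a single char there's no need for a
pointer at all: std::string::replace() has an overload that
takes a count and a char. A sketch of how that could look
(replace_first_with_char() is a made-up name):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replace the first occurrence of 'from' in
// 's' by the single char 'to'. Returns false if 'from' isn't found.
bool replace_first_with_char(std::string& s, const std::string& from, char to)
{
    const std::size_t pos = s.find(from);
    if (pos == std::string::npos)
        return false;
    // replace(pos, len, count, ch) inserts 'count' copies of 'ch',
    // so nothing beyond that one char ever gets read.
    s.replace(pos, from.length(), 1, to);
    return true;
}
```

Calling it with the UTF-8 bytes "\xC3\xBC" as 'from' and
(char) 252 as 'to' turns the 10 byte long UTF-8 version of
"ausführen" into the 9 byte long iso_8859-1 version.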
> }
> }
All that could have been written much simpler and cleaner as
    std::string nCaption = "ausführen";
    const char iso_8859_1_uuml[] = "\xFC";  // 252, 'ü' in iso_8859-1
    const size_t len = std::string("ü").length();
    size_t pos = 0;
    // Continue behind each replacement so a 1-byte "ü" can't loop forever
    while ((pos = nCaption.find("ü", pos)) != std::string::npos) {
        nCaption.replace(pos, len, iso_8859_1_uuml);
        pos += 1;
    }
But that only "solves" the problem for 'ü', you've got to do
the same replacements for all non-ASCII characters you're
prepared to deal with (and decide what to do with those that
can't be represented in your chosen encoding)... And if you
switch to a font that's made for UTF-8 this will actually
break things.
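Instead of one replacement per character you could convert
the whole string in one go. What follows is only a sketch
under the assumption that the input is valid UTF-8 and that
iso_8859-1 is the target. The name utf8_to_latin1() and the
'?' used for everything that can't be represented are my
choices, nothing Xlib or the standard library prescribes:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: convert UTF-8 text to iso_8859-1.
// ASSUMES valid UTF-8 input. Code points above U+00FF, which
// iso_8859-1 can't represent, are replaced by 'fallback'.
std::string utf8_to_latin1(const std::string& in, char fallback = '?')
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ) {
        const unsigned char c = in[i];
        if (c < 0x80) {
            // ASCII: one byte, identical in both encodings
            out += static_cast<char>(c);
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < in.size()) {
            // Two-byte sequence for U+0080..U+00FF:
            // 110000xx 10yyyyyy -> xxyyyyyy
            const unsigned char c2 = in[i + 1];
            out += static_cast<char>(((c & 0x03) << 6) | (c2 & 0x3F));
            i += 2;
        } else {
            // Everything else: emit the fallback and skip the lead
            // byte plus all continuation bytes (10xxxxxx) after it
            out += fallback;
            i += 1;
            while (i < in.size() &&
                   (static_cast<unsigned char>(in[i]) & 0xC0) == 0x80)
                ++i;
        }
    }
    return out;
}
```

For 'ü' (0xC3 0xBC) this yields the single byte 0xFC, i.e. 252,
while e.g. the Euro sign (0xE2 0x82 0xAC) comes out as '?'.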
Regards, Jens
--
\ Jens Thoms Toerring ___
j...@toerring.de
\__________________________
http://toerring.de