const char units[] = "µA";

mathieu

Aug 6, 2007, 9:09:28 AM

Hi,

Sorry if this is slightly off topic. But I have been banging my head
trying to figure out how to solve my issue.

I have a dictionary (a simple lookup from a key to a string) that in
some cases can return a string containing the symbol 'µ'. So my
question is simply, can I write:

const char units[] = "µA";

Or will the compilation depend on my computer locale ?

I'd really appreciate it if someone could point me to a presentation of
this issue in C/C++ programming and the handling of non-ASCII characters.

thanks !
-Mathieu
PS: As a side note, can I print this character in most xterms/consoles?

#include <iostream>

int main()
{
    const char units[] = "µA";
    std::cout << units << std::endl;
    return 0;
}

Gernot Frisch

Aug 6, 2007, 9:16:49 AM

> question is simply, can I write:

> const char units[] = "µA";

> Or will the compilation depend on my computer locale ?

Many codepages define µ (mue) at index 230. However, you may not
rely on this. If you need the character, consider using Unicode or (as
we have done for half a decade now) write "mue" instead of the character.
Just my $.02
-
Gernot

Victor Bazarov

Aug 6, 2007, 10:07:01 AM

Another penny added here: often, in lieu of the actual "Mu", the Latin
lowercase U is used: uA. I am guessing it's because they look a bit
alike ('u' doesn't have the first stroke that in "Mu" extends beyond
the baseline).

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask

mathieu

Aug 6, 2007, 12:03:44 PM

On Aug 6, 4:07 pm, "Victor Bazarov" <v.Abaza...@comAcast.net> wrote:
> Gernot Frisch wrote:
> >> question is simply, can I write:

> >> const char units[] = "µA";

> >> Or will the compilation depend on my computer locale ?
>

> > Many codepages have µ (mue) at index 230 defined. However, you may not

For some reason I did not get the correct character doing:

const unsigned char mue = 230;

instead I had to do:

const char mue = -75;

> > rely on this. If you need the character consider using unicode or (as
> > we do for half a decade now) write "mue" instead of the character.

I am definitely not switching to Unicode simply because of one single
non-ASCII character.

> Another penny added here: often, in lieu of the actual "Mu", the Latin
> lowercase U is used: uA. I am guessing it's because they look a bit
> alike ('u' doesn't have the first stroke that in "Mu" extends beyond
> the baseline).

Thanks. But that does not answer my initial question: what does the
C/C++ standard say about:

assert( 'µ' == -75 )

Am I missing something obvious here ?

Thanks
-Mathieu

Victor Bazarov

Aug 6, 2007, 1:17:08 PM

mathieu wrote:
> On Aug 6, 4:07 pm, "Victor Bazarov" <v.Abaza...@comAcast.net> wrote:
>> Gernot Frisch wrote:
>>>> question is simply, can I write:

>>>> const char units[] = "µA";

>>>> Or will the compilation depend on my computer locale ?
>>

>>> Many codepages have µ (mue) at index 230 defined. However, you may
>>> not
>
> For some reason I did not get the correct character doing:
>
> const unsigned char mue = 230;

It's implementation-specific. I did [get the lowercase Mu that way].

>
> instead I had to do:
>
> const char mue = -75;

Really? I would think -26 should do the same.

> [..] But that does not answer my initial question, what does the
> C/C++ standard say about:
>
> assert( 'µ' == -75 )

>
> Am I missing something obvious here ?

The standard (I suppose you mean the C++ Standard) says nothing about
any specific encoding of characters.

James Kanze

Aug 6, 2007, 4:51:52 PM

On Aug 6, 3:09 pm, mathieu <mathieu.malate...@gmail.com> wrote:
> Sorry if this is slightly off topic. But I have been banging my head
> trying to figure out how to solve my issue.

> I have a dictionary (simple lookup from a key to string) that in
> some case can return a string containing the symbol 'µ'. So my
> question is simply, can I write:

> const char units[] = "µA";

> Or will the compilation depend on my computer locale ?

More likely, it will depend on your compiler, perhaps some
compiler options, and perhaps the locale. Or perhaps not.

Formally, what you want is "\u03BCA". In theory, at least, you
should be able to type this in, and get it. Except that in a
lot of cases, there will be no encoding of "\u03BC" in what the
compiler considers its standard "narrow character" encoding;
you'd have to do something like:

wchar_t const units[] = L"\u03BCA" ;

After that, it will depend on what locale is imbued in the wide
stream you write this to.

> I'd really appreciate if someone could point me to a presentation of
> this issue in C/C++ programming and handling of non-ASCII character.

It's difficult, because so much is implementation dependent.
(Not even necessarily implementation defined---and some of it
goes beyond C++.)

> thanks !
> -Mathieu
> Ps: As a side note can I print this character in most xterm/console ?

> #include <iostream>

> int main()
> {
> const char units[] = "µA";
> std::cout << units << std::endl;
> return 0;
> }

I can't. To begin with, the machines at work don't even have a
font installed that has a mu character. Which, of course, makes
it very difficult for the compiler to generate code which will
display one.

In general, what the compiler will do when it encounters the mu
character is up to the compiler. I think (but I'm far from
certain) that g++ will simply copy the bytes in a string
constant through; if you can enter the mu character in your
editor (again, I can't), and it looks like a mu, then there's a
chance that the above might work. A chance---no guarantee.
(And of course, if you start the program in a different xterm,
configured to use a font with a different encoding, you'll see
something entirely different.)

--
James Kanze (GABI Software) email:james.kanze:gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

James Kanze

Aug 6, 2007, 5:06:41 PM

On Aug 6, 3:16 pm, "Gernot Frisch" <M...@Privacy.net> wrote:
> > question is simply, can I write:

> > const char units[] = "µA";

> > Or will the compilation depend on my computer locale ?

> Many codepages have µ (mue) at index 230 defined. However, you may not
> rely on this.

He mentioned xterm, so I rather doubt he's talking about
Windows, and their non-standard terminology. In Unicode, it's
03BC, which would be 0xCE, 0xBC in UTF-8. (There's also an ISO
8859 encoding for Greek, but I don't know what it is exactly.)

Modern Linux does attempt to allow UTF-8 everywhere, so it's
just possible that if he's under Linux, and the sysadmins have
configured everything just right (and he's set his environment
up just right), that using this two character sequence would
work.

Under Windows, of course, he could just use L"\u03BC", which if
his IO uses the correct locale (I'm not sure how Windows
compilers map this sort of stuff), should work.

> If you need the character consider using unicode or (as
> we do for half a decade now) write "mue" instead of the character.

The solution I've always seen has been to use the small letter
u. Looks sort of like a mu, and is present in the basic
character set.

James Kanze

Aug 6, 2007, 5:09:19 PM

On Aug 6, 7:17 pm, "Victor Bazarov" <v.Abaza...@comAcast.net> wrote:

[...]

> The standard (I suppose you mean the C++ Standard) says nothing about
> any specific encoding of characters.

It also doesn't guarantee the availability of any characters
other than those in the basic character set. My machines at
work are configured to use ISO 8859-1, for example, and there's
no way he can get a mu in a narrow character.

Thomas J. Gritzan

Aug 6, 2007, 5:33:56 PM

James Kanze schrieb:

> On Aug 6, 7:17 pm, "Victor Bazarov" <v.Abaza...@comAcast.net> wrote:
>
> [...]
>> The standard (I suppose you mean the C++ Standard) says nothing about
>> any specific encoding of characters.
>
> It also doesn't guarantee the availability of any characters
> other than those in the basic character set. My machines at
> work are configured to use ISO 8859-1, for example, and there's
> no way he can get a mu in a narrow character.

On my Linux box with ISO 8859-1 I get a mu with Alt-GR+m. Also, there is:

http://de.wikipedia.org/wiki/ISO_8859-1

"B5" is µ (mu). What other ISO 8859-1 do you have that has no mu?

--
Thomas
http://www.netmeister.org/news/learn2quote.html

James Kanze

Aug 7, 2007, 4:07:32 AM

On Aug 6, 11:33 pm, "Thomas J. Gritzan" <Phygon_ANTIS...@gmx.de>
wrote:
> James Kanze schrieb:

> http://de.wikipedia.org/wiki/ISO_8859-1

Sometimes, even verifying before posting doesn't work. I
actually grepped for the string MU, and the only character which
came up was 0xD7 MULTIPLICATION SIGN. There is no MU in ISO
8859-1, according to ISO (or at least, according to the Unicode
translation tables I downloaded from the Unicode site), BUT...

There is a character 0xB5 MICRO SIGN, and if that isn't exactly
the same thing as a mu, I don't know. Except of course that
grep MU won't find it:-). In context, of course, what the OP
really wants is the micro sign, and not mu. Supposing that
there is a difference.

So "\u00B5" should work, provided the fonts and the locales are
set up for ISO 8859-1 (or 8859-15).

--
James Kanze (GABI Software) email:james...@gmail.com