Non-ASCII characters in CFString literal, HELP!

Francois

Feb 26, 2007, 2:28:02 PM

I am puzzled by this warning in some programs. I am French and use a lot
of accented characters. Worse, although these accented characters are typed
correctly in my source files, when they are sent programmatically,
e.g. to a text field, they appear as garbage.
I have tried changing the file encoding in Xcode preferences and setting
'char' as unsigned in the build settings, with no result at all.
This seems new to me; I never experienced it in Panther (I'm now
on 10.4.8 with Xcode 2.4.1).
Please, what can I do to solve this irritating problem?

Patrick Machielse

Feb 26, 2007, 3:48:35 PM

Francois <dan...@9online.fr> wrote:

> I am puzzled by this warning in some programs. I am French and use a lot
> of accented characters. Worse, although these accented characters are typed
> correctly in my source files, when they are sent programmatically,
> e.g. to a text field, they appear as garbage.
> I have tried changing the file encoding in Xcode preferences and setting
> 'char' as unsigned in the build settings, with no result at all.
> This seems new to me; I never experienced it in Panther (I'm now
> on 10.4.8 with Xcode 2.4.1).

The warning may have been enabled by default recently, but the issue is
'age old'. C source files should only contain ASCII characters (unlike
Java source files, which can contain any Unicode character). The usual
way of handling 'richer' strings is to store them in a separate file --
which can be UTF-8 encoded -- and read them in at runtime. This strategy
also helps you when you want to localize/internationalize your program.
In general it is a good idea to separate code and strings, which are
really resources.

patrick

David Phillip Oster

Feb 26, 2007, 9:29:50 PM

Use CFCopyLocalizedString()

Francois

Feb 27, 2007, 4:20:22 AM

Thanks, but it looks like a Carbon function. No doubt it works in Cocoa,
but being clueless about Carbon, it tells me nothing.

Francois

Feb 27, 2007, 4:53:54 AM

Patrick Machielse wrote:
> Francois <dan...@9online.fr> wrote:

>
> The warning may have been enabled by default recently, but the issue is
> 'age old'. C source files should only contain ASCII characters (unlike
> Java source files, which can contain any Unicode character). The usual
> way of handling 'richer' strings is to store them in a separate file --
> which can be UTF-8 encoded -- and read them in at runtime. This strategy
> also helps you when you want to localize/internationalize your program.
> In general it is a good idea to separate code and strings, which are
> really resources.
>
> patrick

Your suggestion, Patrick, is excellent, but I failed to make it work. I
created a "define.h" file, UTF-8 encoded (and BTW all my code files are
UTF-8 encoded), where I put string definitions like:
NSString *TXT = @"ANNÉE D'ÉDITION N° : " ;
But the result is unchanged: garbage wherever the program puts the
non-ASCII characters.
If the same accented text is typed on a view in IB, it looks OK at run time.
I'm pretty sure it's a very simple point I'm missing (perhaps in the build
settings or the Info.plist file). It is all the more infuriating because
I'm presently reusing the same code (NSWindowController subclasses) from
an older program which doesn't show this annoying behavior at all.
That old chap uses the default system encoding (Western Mac OS, which I
believe to be Lucida Grande, and, by Zeus, Lucida Grande does have all the
accented characters of French!).
Well, I'm stuck!

Patrick Machielse

Feb 27, 2007, 5:19:24 AM

Francois <dan...@9online.fr> wrote:

> Your suggestion, Patrick, is excellent, but I failed to make it work. I
> created a "define.h" file, UTF-8 encoded (and BTW all my code files are
> UTF-8 encoded), where I put string definitions like:
> NSString *TXT = @"ANNÉE D'ÉDITION N° : " ;
> But the result is unchanged: garbage wherever the program puts the
> non-ASCII characters.

If you #include or #import your define.h file, the result is no
different from when you just type it directly into your code.

You could:

- put 'ANNÉE D'ÉDITION N° : ' in a separate file strings.txt
- read it using: +[NSString stringWithContentsOfFile:encoding:error:]

Really though, you probably want to use the NSLocalizedString macros:

<http://developer.apple.com/documentation/MacOSX/Conceptual/BPInternational/Articles/StringsFiles.html>
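
To illustrate why loading at run time sidesteps the compiler: the bytes of an external file reach the program exactly as stored on disk. The following is only a minimal sketch in plain C, assuming a UTF-8 encoded text file such as the strings.txt mentioned above; `read_utf8_file` is a hypothetical helper name, not an Apple API.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not an Apple API): read a whole file into a
 * NUL-terminated buffer. The bytes come back exactly as stored on disk,
 * so a UTF-8 file yields UTF-8 bytes no matter how the compiler treats
 * string literals. The caller frees the result; NULL means failure. */
char *read_utf8_file(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    char *buf = NULL;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    if (size >= 0 && fseek(f, 0, SEEK_SET) == 0) {
        buf = malloc((size_t)size + 1);
        if (buf != NULL) {
            size_t got = fread(buf, 1, (size_t)size, f);
            buf[got] = '\0';   /* terminate after the bytes actually read */
        }
    }
    fclose(f);
    return buf;
}
```

In a Cocoa program the same job is done by the NSString method named above, passing NSUTF8StringEncoding as the encoding.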

> If the same accented text is typed on a view in IB it looks OK at run-time

Nib files are archives which store text as Unicode (I suppose).

patrick

Sherm Pendley

Feb 27, 2007, 6:48:22 AM

Francois <dan...@9online.fr> writes:

> Yr suggestion, Patrick, is excellent but I failed to make it work.

You didn't do what he suggested.

> created a "define.h" file UTF-8 encoded ( and BTW all my code files
> are UTF-8 encoded) where I put strings definitions like:
> NSString *TXT = @"ANNÉE D'ÉDITION N° : " ;
> But the result is unchanged : garbage where the non-ASCII chars are
> put by the program.

Patrick suggested putting your strings in an external file and loading them
at runtime. That's not what you did; instead, you put them in another source
file.

> If the same accented text is typed on a view in IB it looks OK at run-time,
> I'm pretty sure it's a very simple point I miss

The point is that GCC only understands ASCII. IB is not GCC.

sherm--

--
Web Hosting by West Virginians, for West Virginians: http://wv-www.net
Cocoa programming in Perl: http://camelbones.sourceforge.net

Francois

Feb 27, 2007, 8:29:22 AM

Sherm Pendley wrote:
> Francois <dan...@9online.fr> writes:
>
>> Yr suggestion, Patrick, is excellent but I failed to make it work.
>
> You didn't do what he suggested.
>
>> created a "define.h" file UTF-8 encoded ( and BTW all my code files
>> are UTF-8 encoded) where I put strings definitions like:
>> NSString *TXT = @"ANNÉE D'ÉDITION N° : " ;
>> But the result is unchanged : garbage where the non-ASCII chars are
>> put by the program.
>
> Patrick suggested putting your strings in an external file and loading them
> at runtime. That's not what you did; instead, you put them in another source
> file.
>
>> If the same accented text is typed on a view in IB it looks OK at run-time,
>> I'm pretty sure it's a very simple point I miss
>
> The point is that GCC only understands ASCII. IB is not GCC.
>
> sherm--
>

You are right. Thanks to all for the advice. I found a (maybe clumsy)
solution by using:
[NSString stringWithUTF8String:"à votre santé"]
every time I have to load such an accented string at run time. It takes
more time to type but, fortunately, not all French words are accented.
This will do for the time being, but I'll try later to experiment with the
solutions suggested.

Sherm Pendley

Feb 27, 2007, 8:58:26 AM

Francois <dan...@9online.fr> writes:

> You are right, Thanks to all for their advice. I found a ( maybe
> clumsy) solution by using:
> [NSString stringWithUTF8String:"à votre santé"]

If you're thankful for the advice, why are you ignoring it?

> every time I have to load such an accented string at run time.

You're *NOT* loading the string at run time. It's in your source code.

You were told the correct way to do this, so what's the point in continuing
to use half-baked hacks?

Michael Ash

Feb 27, 2007, 11:44:37 AM

Francois <dan...@9online.fr> wrote:
> You are right, Thanks to all for their advice. I found a ( maybe clumsy)
> solution by using:

> [NSString stringWithUTF8String:"à votre santé"]

> every time I have to load such an accented string at run time. Takes
> more time to type but, fortunately, not all french words are accented.
> This will go for the time being but I'll try later to experiment the
> solutions suggested.

This may work but it is only by coincidence. The behavior of non-ASCII
characters in your source code (other than in comments) is undefined.

Stop looking for shortcuts and put your strings in an external file the
way everyone has been saying.

--
Michael Ash
Rogue Amoeba Software

Michael Ash

Feb 27, 2007, 11:45:29 AM

Actually, it's a CoreFoundation function, not a Carbon function.

One would hope that after looking at the documentation a bit and seeing it
as part of CFBundle, it would spark the idea to look in NSBundle to find
the Cocoa equivalent, NSLocalizedString().

Sean McBride

Feb 28, 2007, 10:28:53 PM

In article <ervcc1$gbi$1...@aioe.org>, Francois <dan...@9online.fr> wrote:

As others have said, the problem is gcc. Other than the workarounds
already suggested, you might look at using another compiler. Your
choices are: CodeWarrior (discontinued, thus not recommended), icc (from
Intel), and xlc (from IBM). I don't know if any of them accept
better-than-ASCII source files.

Michael Ash

Feb 28, 2007, 10:49:12 PM

xlc is PPC-only and last I heard it did not do Objective-C. Same for icc
except it's x86-only. CodeWarrior is PPC-only and very dead, although it
does Objective-C, but I don't know if it does it convincingly enough to
actually work with Cocoa.

Oh, and they're all quite expensive, particularly compared to gcc's $free.

I think this is swatting a fly with a machine gun....

Sean McBride

Mar 4, 2007, 11:06:38 AM

In article <11727209...@nfs-db1.segnet.com>,
Michael Ash <mi...@mikeash.com> wrote:

> > As others have said, the problem is gcc. Other than the workarounds
> > already suggested, you might look at using another compiler. Your
> > choices are: CodeWarrior (discontinued, thus not recommended), icc (from
> > Intel), and xlc (from IBM). I don't know if any of them accept
> > better-than-ASCII source files.
>
> xlc is PPC-only and last I heard it did not do Objective-C. Same for icc
> except it's x86-only. CodeWarrior is PPC-only and very dead, although it
> does Objective-C, but I don't know if it does it convincingly enough to
> actually work with Cocoa.
>
> Oh, and they're all quite expensive, particularly compared to gcc's $free.
>
> I think this is swatting a fly with a machine gun....

Maybe, that's for the OP to decide. Just trying to suggest options....

Also, you might not think it such a drastic solution if you couldn't
code in your mother tongue. Imagine gcc were a Japanese invention and
did not support English characters! :)

It is 2007 after all, and gcc's ASCII-only-ness is exceedingly lame.

Michael Ash

Mar 4, 2007, 5:52:20 PM

I would be careful calling it a "solution" when it doesn't actually fix
the OP's problem. The problem is non-ASCII NSString literals (the error
refers to CF but actually means both CF and NS) but one or two of the
compilers listed don't even support Objective-C. The one that does
(CodeWarrior) probably doesn't reliably support non-ASCII NSString
literals anyway. Even if it did, it won't support characters outside of
MacRoman because of how NSConstantString is implemented.

> It is 2007 after all, and gcc's ASCII-only-ness is exceedingly lame.

For this particular case, blame C or Cocoa, not gcc. Gcc is perfectly
happy to take C string constants in any encoding and just pass them
through to the other side. So as long as you make sure your files are
UTF-8 you can do something like this:

[NSString stringWithUTF8String:"Arbitrary Unicode Goes Here"]

However the C standard does not guarantee any particular behavior in this
case, so it's bad practice to rely on it in your code. Likewise, for ObjC
string constants, gcc just passes the data through. The problem is that
only MacRoman is accepted so for most people the data gets corrupted. It's
also not officially supported by the language/libraries, so once again
it's bad to rely on it. From what I can see, gcc is already doing as much
as it can in this department.
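
For anyone who wants the UTF-8 route without a single non-ASCII byte in the source file, one option is to spell the bytes with hex escapes. This is only a sketch and not something proposed earlier in the thread; the names `kAnneeEdition` and `annee_edition_byte_count` are made up for the illustration.

```c
#include <string.h>

/* "ANNEE D'EDITION" with its two E-acute letters, written using
 * ASCII-only source text. E-acute is U+00C9, which UTF-8 encodes as
 * the two bytes 0xC3 0x89.
 * A hex escape swallows every hex digit that follows it, so the literal
 * is split right after each escape: "\x89" "E", never "\x89E". */
const char kAnneeEdition[] =
    "ANN\xC3\x89" "E D'\xC3\x89" "DITION";

/* Illustrative helper: number of bytes in the text, excluding the NUL. */
size_t annee_edition_byte_count(void)
{
    return strlen(kAnneeEdition);
}
```

The resulting bytes can be handed to stringWithUTF8String: just like the literal above, and the source file stays pure ASCII, so the editor's encoding setting no longer matters. The price is readability.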

If you're referring to ASCII-only-ness outside of string constants (like
for identifiers and such), that certainly could be remedied and there's
little reason not to IMO, but once again the resulting code would be
unportable.

Sean McBride

Mar 5, 2007, 9:39:13 PM

In article <11730487...@nfs-db1.segnet.com>,
Michael Ash <mi...@mikeash.com> wrote:

> Sean McBride <cwa...@cam.org> wrote:
> > In article <11727209...@nfs-db1.segnet.com>,
> > Michael Ash <mi...@mikeash.com> wrote:
> >
> >> > As others have said, the problem is gcc. Other than the workarounds
> >> > already suggested, you might look at using another compiler. Your
> >> > choices are: CodeWarrior (discontinued, thus not recommended), icc (from
> >> > Intel), and xlc (from IBM). I don't know if any of them accept
> >> > better-than-ASCII source files.
> >>
> >> xlc is PPC-only and last I heard it did not do Objective-C. Same for icc
> >> except it's x86-only. CodeWarrior is PPC-only and very dead, although it
> >> does Objective-C, but I don't know if it does it convincingly enough to
> >> actually work with Cocoa.
> >>
> >> Oh, and they're all quite expensive, particularly compared to gcc's $free.
> >>
> >> I think this is swatting a fly with a machine gun....
> >
> > Maybe, that's for the OP to decide. Just trying to suggest options....
> >
> > Also, you might not think it such a drastic solution if you couldn't
> > code in your mother tongue. Imagine gcc were a Japanese invention and
> > did not support English characters! :)
>
> I would be careful calling it a "solution" when it doesn't actually fix
> the OP's problem.

Quite right. I should have said 'potential maybe-solution'. :) As I
said way way back: "you might look at using another compiler. [...] I
don't know if any of them accept better-than-ASCII source files.". We
have now looked at it, and clearly it won't help the OP.

> The problem is non-ASCII NSString literals (the error
> refers to CF but actually means both CF and NS) but one or two of the
> compilers listed don't even support Objective-C.

It is indeed lamentable that we have pretty much only one compiler
choice on our platform (at least for anything that needs Obj-C, and more
and more stuff needs it these days).

> The one that does
> (CodeWarrior) probably doesn't reliably support non-ASCII NSString
> literals anyway. Even if it did, it won't support characters outside of
> MacRoman because of how NSConstantString is implemented.
>
> > It is 2007 after all, and gcc's ASCII-only-ness is exceedingly lame.
>
> For this particular case, blame C or Cocoa, not gcc. Gcc is perfectly
> happy to take C string constants in any encoding and just pass them
> through to the other side. So as long as you make sure your files are
> UTF-8 you can do something like this:
>
> [NSString stringWithUTF8String:"Arbitrary Unicode Goes Here"]
>
> However the C standard does not guarantee any particular behavior in this
> case, so it's bad practice to rely on it in your code. Likewise, for ObjC
> string constants, gcc just passes the data through. The problem is that
> only MacRoman is accepted so for most people the data gets corrupted. It's
> also not officially supported by the language/libraries, so once again
> it's bad to rely on it. From what I can see, gcc is already doing as much
> as it can in this department.

Interesting. A quick look at gcc's man page has also set me
straight. OK, so maybe it's not gcc that sucks, but something sure
sucks! :) It's 2007, I should be able to do something like [NSString
stringWithUTF8String:"Arbitrary Unicode Goes Here"] already! Can you
sense it frustrates me?! :)

Maybe Apple will use the upcoming 64 bit ABI change to improve
NSConstantString and friends?

> If you're referring to ASCII-only-ness outside of string constants (like
> for identifiers and such), that certainly could be remedied and there's
> little reason not to IMO, but once again the resulting code would be
> unportable.

No, wasn't talking about that... I believe you are talking about
"extended identifiers" which seems to be on their todo list:
<http://gcc.gnu.org/c99status.html>. I'm no language lawyer, but it
seems to be part of C99, and so would be "portable". (Not that C99 is
widely supported by compiler vendors.)

Michael Ash

Mar 6, 2007, 12:00:45 AM

Sean McBride <cwa...@cam.org> wrote:
> In article <11730487...@nfs-db1.segnet.com>,

>> I would be careful calling it a "solution" when it doesn't actually fix
>> the OP's problem.
>
> Quite right. I should have said 'potential maybe-solution'. :) As I
> said way way back: "you might look at using another compiler. [...] I
> don't know if any of them accept better-than-ASCII source files.". We
> have now looked at it, and clearly it won't help the OP.

So it seems, unfortunately.

>> The problem is non-ASCII NSString literals (the error
>> refers to CF but actually means both CF and NS) but one or two of the
>> compilers listed don't even support Objective-C.
>
> It is indeed lamentable that we have pretty much only one compiler
> choice on our platform (at least for anything that needs Obj-C, and more
> and more stuff needs it these days).

I agree completely. I have to say that I'm very happy with the main
compiler being free, and doubly happy that it's something as widespread
and... well, "good" wouldn't describe it but let's just say un-bad... as
gcc. But it would be very nice if there was some real competition to the
Xcode/gcc pair, particularly the Xcode part.

CodeWarrior was obviously dying for a long time before it was finally
axed, but I still lamented the loss.

>> However the C standard does not guarantee any particular behavior in this
>> case, so it's bad practice to rely on it in your code. Likewise, for ObjC
>> string constants, gcc just passes the data through. The problem is that
>> only MacRoman is accepted so for most people the data gets corrupted. It's
>> also not officially supported by the language/libraries, so once again
>> it's bad to rely on it. From what I can see, gcc is already doing as much
>> as it can in this department.
>
> Interesting. Also, a quick look at gcc's man page has also set me
> straight. OK, so maybe it's not gcc that sucks, but something sure
> sucks! :) It's 2007, I should be able to do something like [NSString
> stringWithUTF8String:"Arbitrary Unicode Goes Here"] already! Can you
> sense it frustrates me?! :)

It doesn't bother me too much, but it would be nice if it worked. For what
I do, if I start using accented characters and the like then it's time to
start moving the strings to an external file anyway, but there's not much
reason that this shouldn't be allowed.

The main problem is that there's no guarantee of the encoding of the file.
You can of course tell Xcode that it should be UTF-8 and write your code
to assume string constants are UTF-8, but the compiler can't know about
all of this and I think that's why the standard doesn't specify this
behavior. A simple fix would be to look for a BOM and use UTF-8 if it's
present, and warn about potential brokenness if it's not.
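
The BOM check itself is tiny. A rough sketch in C, with `has_utf8_bom` being a made-up name and the caller assumed to have already read the first bytes of the file into a buffer:

```c
#include <stddef.h>

/* Returns 1 if the buffer starts with the UTF-8 byte order mark
 * (the three bytes EF BB BF), 0 otherwise. */
int has_utf8_bom(const unsigned char *buf, size_t len)
{
    return len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF;
}
```

A tool using this would treat the file as UTF-8 when the function returns 1 and emit the warning otherwise.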

> Maybe Apple will use the upcoming 64 bit ABI change to improve
> NSConstantString and friends?

It wouldn't surprise me. I'm told that they are using it to improve a lot
of other things. Too bad all of the first-gen Intel machines won't be able
to take advantage of it.

>> If you're referring to ASCII-only-ness outside of string constants (like
>> for identifiers and such), that certainly could be remedied and there's
>> little reason not to IMO, but once again the resulting code would be
>> unportable.
>
> No, wasn't talking about that... I believe you are talking about
> "extended identifiers" which seems to be on their todo list:
> <http://gcc.gnu.org/c99status.html>. I'm no language lawyer, but it
> seems to be part of C99, and so would be "portable". (Not that C99 is
> widely supported by compiler vendors.)

You learn something every day. I had no idea that C99 included this. And
here I was thinking that gcc actually implemented all of C99....

I would consider that to be portable in the theoretical sense but not the
practical sense. Which is just another way of saying that no, it's not
really portable yet. :)

Sean McBride

Mar 6, 2007, 7:22:59 PM

In article <11731572...@nfs-db1.segnet.com>,
Michael Ash <mi...@mikeash.com> wrote:

> I agree completely. I have to say that I'm very happy with the main
> compiler being free, and doubly happy that it's something as widespread
> and... well, "good" wouldn't describe it but let's just say un-bad... as
> gcc. But it would be very nice if there was some real competition to the
> Xcode/gcc pair, particularly the Xcode part.

Agreed.

> CodeWarrior was obviously dying for a long time before it was finally
> axed, but I still lamented the loss.

Likewise!

> > Interesting. Also, a quick look at gcc's man page has also set me
> > straight. OK, so maybe it's not gcc that sucks, but something sure
> > sucks! :) It's 2007, I should be able to do something like [NSString
> > stringWithUTF8String:"Arbitrary Unicode Goes Here"] already! Can you
> > sense it frustrates me?! :)
>
> It doesn't bother me too much, but it would be nice if it worked. For what
> I do, if I start using accented characters and the like then it's time to
> start moving the strings to an external file anyway, but there's not much
> reason that this shouldn't be allowed.
>
> The main problem is that there's no guarantee of the encoding of the file.
> You can of course tell Xcode that it should be UTF-8 and write your code
> to assume string constants are UTF-8, but the compiler can't know about
> all of this and I think that's why the standard doesn't specify this
> behavior. A simple fix would be to look for a BOM and use UTF-8 if it's
> present, and warn about potential brokenness if it's not.

Well, the compiler has to either be told the encoding (and there is
already a command-line flag for gcc to do so), or detect the encoding
(BOM, as you say), or assume the encoding. If I read the man page right,
it already assumes UTF-8 today (see -finput-charset). UTF-8 seems like a
good assumption since it will work with ASCII (i.e. 99% of the C out there).

So wait... shouldn't [NSString stringWithUTF8String:"Arbitrary Unicode
Goes Here"] actually work today?! I seem to remember that the last time this
discussion came up (wherever that was), the consensus was that we could not.

I just tried, and it does indeed seem to work if my file is UTF-8 and
Xcode is told the file is UTF-8.

So I guess it's only @"" and CFSTR("") that need fixing!

> > Maybe Apple will use the upcoming 64 bit ABI change to improve
> > NSConstantString and friends?
>
> It wouldn't surprise me. I'm told that they are using it to improve a lot
> of other things.

I better head over to Radar...

> > No, wasn't talking about that... I believe you are talking about
> > "extended identifiers" which seems to be on their todo list:
> > <http://gcc.gnu.org/c99status.html>. I'm no language lawyer, but it
> > seems to be part of C99, and so would be "portable". (Not that C99 is
> > widely supported by compiler vendors.)
>
> You learn something every day. I had no idea that C99 included this. And
> here I was thinking that gcc actually implemented all of C99....
>
> I would consider that to be portable in the theoretical sense but not the
> practical sense. Which is just another way of saying that no, it's not
> really portable yet. :)

:)

Michael Ash

Mar 7, 2007, 11:24:31 AM

Sean McBride <cwa...@cam.org> wrote:
> So wait... shouldn't [NSString stringWithUTF8String:"Arbitrary Unicode
> Goes Here"] actually work today?! I seem to remember that the last time this
> discussion came up (wherever that was), the consensus was that we could not.
>
> I just tried, and it does indeed seem to work if my file is UTF-8 and
> Xcode is told the file is UTF-8.

This has worked for quite some time. The issue is that, as far as I know,
it's not supported so the code could break later. (I actually don't know
if it's not supported in gcc, or if it's just not supported in C, or if
I'm just wrong about this.) Another issue is that I don't entirely trust
Xcode not to start changing my encoding behind my back.

> So I guess it's only @"" and CFSTR("") that need fixing!

Right. As far as I know, they get the raw data passed to them just as C
strings do. The difference is that they're actually Unicode aware and so
they expect a certain encoding, unlike "" which just treats it as a bag of
bytes. And the problem is that, in Apple's infinite wisdom, the encoding
they expect is MacRoman. I can't imagine the encoding for @"" was MacRoman
in the NeXT days, so they must have changed it for OS X. Why they chose
MacRoman instead of UTF-8 is beyond me.
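
To make the "garbage" concrete, here is a small sketch in C of what happens when UTF-8 bytes are read one byte per character as MacRoman. The table is deliberately partial (only the three high bytes this demo needs) and both function names are invented for the illustration.

```c
#include <stddef.h>

/* Partial MacRoman table: only the three high bytes this demo needs.
 * Returns the Unicode code point, or U+FFFD for bytes not covered here. */
unsigned macroman_to_unicode(unsigned char b)
{
    if (b < 0x80)
        return b;                    /* the ASCII range is shared */
    switch (b) {
    case 0x89: return 0x00E2;        /* a with circumflex */
    case 0xA9: return 0x00A9;        /* copyright sign */
    case 0xC3: return 0x221A;        /* square root sign */
    default:   return 0xFFFD;        /* not in this partial table */
    }
}

/* Decode one byte per character, the way a MacRoman reader would.
 * Writes at most max code points to out; returns how many were written. */
size_t decode_as_macroman(const char *s, unsigned *out, size_t max)
{
    size_t n = 0;
    while (*s != '\0' && n < max)
        out[n++] = macroman_to_unicode((unsigned char)*s++);
    return n;
}
```

Feeding it the UTF-8 bytes C3 A9 (a single e-acute) yields two code points, a square root sign followed by a copyright sign, which is the kind of two-symbol garbage that shows up in place of each accented letter.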

Jens Ayton

Mar 7, 2007, 12:39:02 PM

Michael Ash wrote:

>
> You can of course tell Xcode that it should be UTF-8 and write your code
> to assume string constants are UTF-8, but the compiler can't know about
> all of this

...unless of course Xcode was updated to pass -finput-charset=foo parameters
to gcc as appropriate. There's also an -fexec-charset=bar option, to specify
the encoding string and character constants should be converted to when
generating the binary. (In other words, to support UTF-8 CF/NSString
constants, you'd want -fexec-charset=UTF-8 -finput-charset=X, where X is what
Xcode has been told you're using.)
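
A quick way to see what those options actually put into the binary is to dump the bytes of a literal. The helper below is only a sketch in C with a made-up name, `hex_bytes`; it assumes nothing about Xcode and simply formats whatever bytes the compiler emitted.

```c
#include <stdio.h>
#include <string.h>

/* Write the bytes of s as space-separated lowercase hex into out
 * (capacity cap). Returns out, or NULL if the buffer is too small. */
char *hex_bytes(const char *s, char *out, size_t cap)
{
    size_t len = strlen(s);
    /* Each byte takes "xx " (3 chars); the last space slot holds the NUL. */
    size_t need = len == 0 ? 1 : len * 3;
    if (cap < need)
        return NULL;
    out[0] = '\0';
    for (size_t i = 0; i < len; i++)
        sprintf(out + i * 3, i + 1 < len ? "%02x " : "%02x",
                (unsigned)(unsigned char)s[i]);
    return out;
}
```

Passing it a literal written with a C99 universal character name, such as "\u00e9", shows the execution-charset conversion at work: with a UTF-8 execution charset I would expect "c3 a9", and a different -fexec-charset should change the dump accordingly.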

--
Jens Ayton