Hi,
I am in the early stages of converting an existing Seaside application to full UTF-8 support. It feels a bit scary, as there are so many layers to change, test and check, from Seaside through file access down to the database...
I just encountered something that surprised me, and I am almost sure I am missing something.
Here is the situation:
My application currently runs in the local code page ISO-8859-15, and people can upload files via their web browser. There is no reliable indication of what encoding a file is in, but it is fairly likely to be UTF-8 or some Western European code page like Windows-1252.
Until it isn't.
In the concrete case, the file has no BOM and contains German umlauts as well as a Czech character.
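For reference, a common heuristic for BOM-less uploads is to try a strict UTF-8 decode first and fall back to a single-byte code page only if that fails. The following is a minimal Python sketch of that idea, not VA Smalltalk code; the function name decode_upload and the choice of Windows-1252 as the fallback are assumptions for illustration. Note that a file in a different single-byte code page (for example Windows-1250 for Czech) would still decode "successfully" here, just with wrong characters.

```python
from typing import Tuple


def decode_upload(data: bytes) -> Tuple[str, str]:
    """Guess the encoding of an uploaded file that carries no reliable label.

    Returns the decoded text and the name of the codec that was used.
    """
    # Strict UTF-8 first: non-ASCII text in a single-byte code page is
    # unlikely to form valid UTF-8 sequences by accident. "utf-8-sig"
    # also strips a BOM if one happens to be present.
    try:
        return data.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        pass
    # Windows-1252 leaves five byte values undefined and Python's codec
    # rejects them, so ISO-8859-1 (which maps every byte) is the last resort.
    try:
        return data.decode("cp1252"), "cp1252"
    except UnicodeDecodeError:
        return data.decode("latin-1"), "latin-1"


if __name__ == "__main__":
    print(decode_upload("\u00fc\u00e4\u00f6\u010d".encode("utf-8")))
    print(decode_upload("\u00fc\u00e4\u00f6".encode("cp1252")))
```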
One of the ideas for our transition to pure Unicode is to use UnicodeString for everything going to and from Seaside, files and the database, and not to touch too much in the "middle" for a start. (There is so much WriteStream on: String new and such that touching everything in one step feels scary.) But I digress...
So here is what I tried:
"Create a unicode String with only German Umlauts"
(UnicodeString escaped: '\u{FC}\u{E4}\u{F6}') asSBString --> 'üäö'
"Now add a character not present in the local codepage"
(UnicodeString escaped: '\u{FC}\u{E4}\u{F6}\u{10D}') asSBString --> nil
(UnicodeString escaped: '\u{FC}\u{E4}\u{F6}\u{10D}') isAscii --> false
(note: \u{10D} is č, a Czech character that exists in neither ISO-8859-1 nor ISO-8859-15)
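The same behaviour can be reproduced outside Smalltalk. The Python sketch below is only an analogy (to_latin9 is a hypothetical helper, not part of any VA Smalltalk API): a strict conversion to ISO-8859-15 fails on U+010D, which the helper maps to None to mirror asSBString answering nil, while Python's built-in "ignore" and "replace" error handlers show what an IGNORE-style conversion would produce.

```python
from typing import Optional

TEXT = "\u00fc\u00e4\u00f6\u010d"  # the same string as in the Smalltalk example


def to_latin9(text: str, errors: str = "strict") -> Optional[bytes]:
    """Encode to ISO-8859-15; answer None (like asSBString's nil) on failure."""
    try:
        return text.encode("iso8859_15", errors=errors)
    except UnicodeEncodeError:
        # Only reachable with errors="strict": U+010D has no ISO-8859-15 byte.
        return None


if __name__ == "__main__":
    print(to_latin9(TEXT))             # None
    print(to_latin9(TEXT, "ignore"))   # b'\xfc\xe4\xf6'
    print(to_latin9(TEXT, "replace"))  # b'\xfc\xe4\xf6?'
```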
My expectation was that, as with the old convertFromCodePage:... methods, there would be a way to keep as much of the original as possible and reduce the number of "broken" characters in the result.
But if my local code page doesn't support a character, UnicodeString simply gives up and returns nil instead of a String that closely approximates the original.
I couldn't find a way to reproduce what the iconv library provides with options like this:
aString convertFromCodePage: 'UTF-8' toCodePage: 'ISO-8859-15//TRANSLIT'
What am I missing? Is there really no way to convert a UnicodeString to a "local" String with TRANSLIT or IGNORE semantics?
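To make the desired TRANSLIT behaviour concrete, here is a rough Python approximation, again only an illustration and not an existing VA Smalltalk or iconv API (translit is a hypothetical name). Characters the target code page can represent are kept as-is; for the rest, the sketch applies Unicode NFKD decomposition and drops combining marks, so č becomes c; anything still unrepresentable becomes a fallback ('?', or nothing for IGNORE-like behaviour). This is weaker than glibc's locale-dependent transliteration tables: letters without a decomposition, such as Ł (U+0141), end up as the fallback rather than L.

```python
import unicodedata


def translit(text: str, encoding: str = "iso8859_15", fallback: str = "?") -> bytes:
    """Approximate iconv's //TRANSLIT for a single-byte target code page."""
    # Compose first so that input already in decomposed form ("c" + U+030C)
    # is handled the same way as the precomposed character.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        try:
            out.append(ch.encode(encoding))  # representable: keep unchanged
            continue
        except UnicodeEncodeError:
            pass
        # Decompose (e.g. U+010D -> "c" + U+030C combining caron) and keep
        # only the non-combining parts that the target code page can encode.
        base = "".join(
            c for c in unicodedata.normalize("NFKD", ch)
            if not unicodedata.combining(c)
        )
        encoded = base.encode(encoding, errors="ignore")
        out.append(encoded if encoded else fallback.encode(encoding))
    return b"".join(out)


if __name__ == "__main__":
    result = translit("\u00fc\u00e4\u00f6\u010d")
    print(result)                       # b'\xfc\xe4\xf6c'
    print(result.decode("iso8859_15"))  # üäöc
```

Passing fallback="" turns the sketch into a TRANSLIT-plus-IGNORE combination, dropping characters it cannot approximate.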
Joachim