This actually affects a whole bunch of scripts, now that I think of it:
Georgian, Armenian, Arabic, Hebrew, a number of Indic scripts,
Canadian Syllabics, Cherokee, and more...
It might be easier to special-case the East Asian scripts, actually
(Chinese, Japanese, Korean, Vietnamese, Thai?, what else?)
Avram
More importantly, what do you do with transliterated names?
Bruce
The handling for transliterated names depends on the style and the
language. Frank has worked this out, but we need (ahem) language data
exposed to the processor to make it work.
Presumably we'll want to fold in some of the publicly available locale
data on things like default name part ordering to make this more
robust.
On Fri, May 27, 2011 at 1:18 AM, Frank Bennett <bierc...@gmail.com> wrote:
>> This actually affects a whole bunch of scripts, now that I think of it:
>> Georgian, Armenian, Arabic, Hebrew, a number of Indic scripts,
>> Canadian Syllabics, Cherokee, and more...
>>
>> It might be easier to special-case the East Asian scripts, actually
>> (Chinese, Japanese, Korean, Vietnamese, Thai?, what else?)
>
> Khmer is another. That might be the way to go, but let's see how far
> we can get with incremental tweaks for Persian. It's great to be
> getting direct feedback from the field!
We know already that the current logic gets "Georgian, Armenian,
Arabic, Hebrew, a number of Indic scripts, Canadian Syllabics,
Cherokee, and more..." wrong as it is. (South-)East Asian is the
special case.
Avram
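
To make the "special-case (South-)East Asian" idea concrete, here is a rough
sketch of what a character-level check might look like. It is not the
processor's actual logic; the function name and the list of script prefixes
are assumptions made up for illustration. It relies only on Python's standard
unicodedata module and treats a name as the special case if any letter's
Unicode character name starts with one of the listed script names.

```python
import unicodedata

# Hypothetical list: Unicode character-name prefixes for the scripts
# treated as the (South-)East Asian special case in this sketch.
EAST_ASIAN_PREFIXES = ("CJK", "HIRAGANA", "KATAKANA", "HANGUL", "THAI", "KHMER")


def is_east_asian_name(name: str) -> bool:
    """Return True if any letter in `name` belongs to a listed script."""
    for ch in name:
        if not ch.isalpha():
            continue  # skip spaces, punctuation, combining marks
        # unicodedata.name() returns e.g. "HANGUL SYLLABLE GIM";
        # the second argument is the fallback for unnamed code points.
        if unicodedata.name(ch, "").startswith(EAST_ASIAN_PREFIXES):
            return True
    return False


if __name__ == "__main__":
    for sample in ["毛泽东", "김대중", "შოთა რუსთაველი", "Nguyễn Du"]:
        print(sample, is_east_asian_name(sample))
```

Note that Vietnamese names written in the Latin alphabet cannot be caught
this way, which is one reason character tests alone are not enough and
explicit language data would still be needed.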
For Persian, I guess the first question is whether there is value in
character-based language discrimination at all. Are Persian name
formatting conventions different from, say, Arabic, and if so, is
there a way to distinguish the two at the character level?
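
On the second half of that question, one possible character-level heuristic,
sketched below under stated assumptions: the Persian alphabet adds four
letters to the Arabic one (U+067E, U+0686, U+0698, U+06AF), and Persian text
normally uses U+06A9 and U+06CC where Arabic text uses U+0643 and U+064A. The
function name and return codes are hypothetical. The check is weak: many
names contain none of these letters, and the same letters also appear in
other languages written in the Arabic script (Urdu and Kurdish, for example),
so it cannot settle the question on its own.

```python
from typing import Optional

# Letters the Persian alphabet adds to the Arabic one: peh, tcheh, jeh, gaf.
PERSIAN_EXTRA = {"\u067E", "\u0686", "\u0698", "\u06AF"}
# Forms normally used in Persian text: keheh and Farsi yeh.
PERSIAN_FORMS = {"\u06A9", "\u06CC"}
# Forms normally used in Arabic text: kaf and yeh.
ARABIC_FORMS = {"\u0643", "\u064A"}


def guess_arabic_script_language(name: str) -> Optional[str]:
    """Hypothetical helper: return 'fa', 'ar', or None if undecidable."""
    chars = set(name)
    if chars & (PERSIAN_EXTRA | PERSIAN_FORMS):
        return "fa"
    if chars & ARABIC_FORMS:
        return "ar"
    return None  # no distinguishing letter present


if __name__ == "__main__":
    samples = {
        "Parviz": "\u067E\u0631\u0648\u06CC\u0632",
        "Ali": "\u0639\u0644\u064A",
        "Muhammad": "\u0645\u062D\u0645\u062F",
    }
    for label, text in samples.items():
        print(label, guess_arabic_script_language(text))
```

The third sample shows the limitation: a name made up only of shared letters
gives no signal either way.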