I am using Sadahiro Tomoyuki's Lingua::JA::Sort::JIS module to sort
Japanese names of stores. I have come close to achieving the order my
client has asked for but am having a little difficulty matching their
request exactly. The problem seems to be collating kana glyphs with
manyogana glyphs. (Please excuse me if I am misusing any terms - this
is my first introduction to Japanese.)
Here is an example of 13 store names ordered with
Lingua::JA::Sort::JIS::msort:
1. 伊勢丹 JR京都店
2. アペックス 福山
3. アミュプラザ 鹿児島
4. オクノ 旭川
5. さくら野百貨店 仙台
6. さつま屋 鹿児島
7. スタンス 米子
8. そごう 神戸店
9. そごう 千葉店
10. そごう 大宮店
11. そごう 横浜店
12. ダイアモンドシティアルル 橿原
13. ニューズ 熊本
My client tells me that entry 1 should actually come after entry 3
and before entry 4. From the description of manyogana at the link
below, I think they are saying that the glyph 伊 should be collated
by its katakana adaptation イ, which makes sense:
http://en.wikipedia.org/wiki/Manyogana
Note that I'm basing many of my statements on staring at and comparing
these glyphs online, so I might be far off.
So my questions are:
1. Is my client correct in their ordering?
2. I believe I've tried all the combinations of collation levels and
kanji classes in the Lingua::JA::Sort::JIS jcmp function but have not
achieved the desired ordering. Have I perhaps missed the correct
combination?
3. Is the solution to first convert the manyogana characters to
katakana and then do the msort? If so does anyone know of a Perl module
to do this or a nice reference that I could use more programmatically
than the image on the link above?
4. Can anyone think of any other glyphs or classes of Japanese glyphs
similar to manyogana that I should be worried about?
Thanks for any help you can give me!
Best,
Mike
The solution that I am using then is to store with each piece of
kana/kanji text, a kana-only phonetization of that text. I then rely on
the content editors to know the context of the text and supply an
accurate phonetization in kana. (In other words, I'm putting the
responsibility on someone else!) There does exist a determinate
ordering of the kana-only text and so this becomes a tractable problem.
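
A minimal sketch of this approach, in Python for illustration only. The
readings below are hypothetical values of the kind a content editor
would supply, and the helper names (kata_to_hira, reading_key) are made
up for this example. The key folds katakana to hiragana and then
compares by code point, which is a simplification and not the full
JIS X 4061 ordering; in practice the kana readings would be passed to a
proper collator such as msort.

```python
import unicodedata

def kata_to_hira(text):
    # Katakana U+30A1..U+30F6 sit exactly 0x60 above their hiragana
    # counterparts; other characters (e.g. the long-vowel mark) pass through.
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )

def reading_key(reading):
    # NFKC turns half-width katakana into full-width before folding.
    return kata_to_hira(unicodedata.normalize("NFKC", reading))

# (display name, editor-supplied kana reading); readings are assumptions.
stores = [
    ("伊勢丹 JR京都店", "イセタン じぇいあーるきょうとてん"),
    ("アペックス 福山", "あぺっくす ふくやま"),
    ("アミュプラザ 鹿児島", "あみゅぷらざ かごしま"),
    ("オクノ 旭川", "おくの あさひかわ"),
]

# Sort on the reading, display the written form.
ordered = sorted(stores, key=lambda s: reading_key(s[1]))
for name, _ in ordered:
    print(name)
```

Because the sort never looks at the kanji, 伊勢丹 is placed by its
reading いせたん and lands between the ア and オ entries.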
Mike
: The solution that I am using then is to store with each piece of
: kana/kanji text, a kana-only phonetization of that text. I then rely on
: the content editors to know the context of the text and supply an
: accurate phonetization in kana. (In other words, I'm putting the
: responsibility on someone else!) There does exist a determinate
: ordering of the kana-only text and so this becomes a tractable problem.
This is indeed best practice; have a look at the Sharp Zaurus and other
Japanese PIMs, which regularly offer a "pronunciation" field next to the
"name in written form" field; the former is kana, the latter is kanji.
Oliver.
--
Dr. Oliver Corff e-mail: co...@zedat.fu-berlin.de