Possibility of Bopomofo integration?

rypervenche

Apr 17, 2011, 1:17:59 AM

to cjklib-devel

Hi there. I was wondering if it would be possible to add 注音符號 (Zhuyin
fuhao, Bopomofo) support to cjklib. I know that MDBG.net offers this
on its website when searching CC-CEDICT, so maybe you could
find a way to get it from there. If not, I would be willing to help
write something. It wouldn't be difficult at all, just a substitution
for Pinyin.

Please let me know, as I hope to be using this very soon.

Thank you.

Christoph Burgmer

Apr 17, 2011, 2:55:16 PM

to cjklib...@googlegroups.com, rypervenche

Hi, Zhuyin fuhao would be very interesting. So far I haven't gotten around to
having a look at it.

Cjklib's romanisation handling is fairly complex, mostly because romanisations
come with many special cases. For Zhuyin fuhao, however, it should be fairly
straightforward, and I'll try to point you to the places worth looking at.

First of all, you need a mapping from Zhuyin fuhao to a romanisation already
supported by cjklib; without it you get no benefit from the existing data. This
will probably be Pinyin, as it is the romanisation with the most data (Unihan,
CEDICT, ...). So you should have a table mapping each syllable from Zhuyin to
Pinyin (actually, a mapping from initial to initial, e.g. "h", and from final
to final, e.g. "an", would suffice).
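A minimal sketch of such an initial/final table, written in Python since that is what cjklib uses. The names INITIALS, FINALS and zhuyin_to_pinyin are hypothetical and not part of cjklib. It only covers toneless syllables that start with an initial: standalone syllables such as ㄧ (yi), ㄨ (wu) or ㄩ (yu) need extra spelling rules that are left out here. It does apply the two Pinyin spelling rules that pure substitution misses: the empty rime after ㄓㄔㄕㄖㄗㄘㄙ is written "i", and ü is written "u" after j, q and x.

```python
# Hypothetical tables (not cjklib's): Zhuyin initial -> Pinyin initial.
INITIALS = {
    "ㄅ": "b", "ㄆ": "p", "ㄇ": "m", "ㄈ": "f",
    "ㄉ": "d", "ㄊ": "t", "ㄋ": "n", "ㄌ": "l",
    "ㄍ": "g", "ㄎ": "k", "ㄏ": "h",
    "ㄐ": "j", "ㄑ": "q", "ㄒ": "x",
    "ㄓ": "zh", "ㄔ": "ch", "ㄕ": "sh", "ㄖ": "r",
    "ㄗ": "z", "ㄘ": "c", "ㄙ": "s",
}

# Zhuyin final (medial + rime) -> Pinyin final, as spelled after an initial.
FINALS = {
    "": "i",  # empty rime, only after zh/ch/sh/r/z/c/s (e.g. ㄓ -> zhi)
    "ㄚ": "a", "ㄛ": "o", "ㄜ": "e", "ㄞ": "ai", "ㄟ": "ei",
    "ㄠ": "ao", "ㄡ": "ou", "ㄢ": "an", "ㄣ": "en", "ㄤ": "ang", "ㄥ": "eng",
    "ㄧ": "i", "ㄧㄚ": "ia", "ㄧㄝ": "ie", "ㄧㄠ": "iao", "ㄧㄡ": "iu",
    "ㄧㄢ": "ian", "ㄧㄣ": "in", "ㄧㄤ": "iang", "ㄧㄥ": "ing",
    "ㄨ": "u", "ㄨㄚ": "ua", "ㄨㄛ": "uo", "ㄨㄞ": "uai", "ㄨㄟ": "ui",
    "ㄨㄢ": "uan", "ㄨㄣ": "un", "ㄨㄤ": "uang", "ㄨㄥ": "ong",
    "ㄩ": "ü", "ㄩㄝ": "üe", "ㄩㄢ": "üan", "ㄩㄣ": "ün", "ㄩㄥ": "iong",
}

SIBILANTS = "ㄓㄔㄕㄖㄗㄘㄙ"


def zhuyin_to_pinyin(syllable):
    """Convert one toneless Zhuyin syllable that starts with an initial."""
    initial, final = syllable[:1], syllable[1:]
    if initial not in INITIALS:
        raise ValueError(f"zero-initial syllable not covered: {syllable!r}")
    if final == "" and initial not in SIBILANTS:
        raise ValueError(f"invalid syllable: {syllable!r}")
    if final not in FINALS:
        raise ValueError(f"unknown final in {syllable!r}")
    pinyin_initial = INITIALS[initial]
    pinyin_final = FINALS[final]
    # Pinyin writes ü as plain u after j, q and x (ㄐㄩ -> ju, ㄒㄩㄢ -> xuan).
    if pinyin_initial in ("j", "q", "x"):
        pinyin_final = pinyin_final.replace("ü", "u")
    return pinyin_initial + pinyin_final
```

For example, zhuyin_to_pinyin("ㄏㄢ") gives "han" and zhuyin_to_pinyin("ㄐㄩ") gives "ju".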

The next step is to implement a ReadingOperator so that cjklib can work with
single syllables. This is important because it lets a continuous stream of
characters be split into single syllables. See, for example,
MandarinBrailleOperator, which probably comes closest to Zhuyin.
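As a rough, standalone illustration of the splitting part (this is not how MandarinBrailleOperator works internally, and split_zhuyin is a hypothetical name), a Zhuyin string can be parsed by its syllable structure: an optional initial, an optional medial (ㄧ, ㄨ, ㄩ), an optional rime, then an optional tone mark. This sketch assumes the Taiwan convention of writing the neutral-tone dot ˙ before the syllable. Toneless sequences that could be split in more than one way are resolved greedily, so ㄧㄚ is always read as one syllable (ya), never as ㄧ + ㄚ.

```python
INITIALS = set("ㄅㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏㄐㄑㄒㄓㄔㄕㄖㄗㄘㄙ")
MEDIALS = set("ㄧㄨㄩ")
RIMES = set("ㄚㄛㄜㄝㄞㄟㄠㄡㄢㄣㄤㄥㄦ")
TONES = set("ˉˊˇˋ")  # tone marks written after the syllable
NEUTRAL = "˙"         # neutral-tone mark, assumed to precede the syllable


def split_zhuyin(text):
    """Greedily split a Zhuyin string into syllables (hypothetical helper)."""
    syllables = []
    i = 0
    while i < len(text):
        start = i
        if text[i] == NEUTRAL:
            i += 1
        j = i
        # Take at most one initial, one medial and one rime, in that order.
        if j < len(text) and text[j] in INITIALS:
            j += 1
        if j < len(text) and text[j] in MEDIALS:
            j += 1
        if j < len(text) and text[j] in RIMES:
            j += 1
        if j == i:
            # No syllable body here (space, punctuation, stray mark): skip it.
            i = start + 1
            continue
        if j < len(text) and text[j] in TONES:
            j += 1
        syllables.append(text[start:j])
        i = j
    return syllables
```

With this, split_zhuyin("ㄋㄧˇㄏㄠˇ") yields ["ㄋㄧˇ", "ㄏㄠˇ"]. A real operator would additionally need to check each piece against the list of valid syllables.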

The final step is to implement the Converter that converts single syllables
between two Readings. Following the Braille suggestion above, you can have a
look at PinyinBrailleConverter, which converts between Pinyin and Braille in
both directions.
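To show what converting single syllables in both directions involves, here is a minimal sketch with a tiny, hypothetical syllable table; it does not use PinyinBrailleConverter or any other cjklib API. Zhuyin leaves the first tone unmarked and writes ˊ, ˇ, ˋ for tones 2 to 4, and this sketch assumes numbered Pinyin with 5 for the neutral tone (written ˙ before the Zhuyin syllable). The reverse direction simply inverts the same tables.

```python
# Hypothetical toneless syllable table (a real one would list every syllable).
BASE = {"ㄋㄧ": "ni", "ㄏㄠ": "hao", "ㄓㄨㄥ": "zhong", "ㄨㄣ": "wen",
        "ㄇㄚ": "ma", "ㄧ": "yi"}
BASE_REVERSE = {pinyin: zhuyin for zhuyin, pinyin in BASE.items()}

# Zhuyin tone mark -> tone number; the first tone has no mark.
TONE_MARKS = {"": 1, "ˊ": 2, "ˇ": 3, "ˋ": 4}
TONE_NUMBERS = {number: mark for mark, number in TONE_MARKS.items()}
NEUTRAL = "˙"  # neutral tone, mapped to the assumed tone number 5


def zhuyin_to_numbered_pinyin(syllable):
    """Zhuyin syllable -> numbered Pinyin; KeyError if not in the table."""
    if syllable.startswith(NEUTRAL):
        base, tone = syllable[1:], 5
    elif syllable and syllable[-1] in "ˊˇˋ":
        base, tone = syllable[:-1], TONE_MARKS[syllable[-1]]
    else:
        base, tone = syllable, 1
    return f"{BASE[base]}{tone}"


def numbered_pinyin_to_zhuyin(syllable):
    """Numbered Pinyin syllable -> Zhuyin; KeyError if not in the table."""
    base, tone = syllable[:-1], int(syllable[-1])
    zhuyin = BASE_REVERSE[base]
    if tone == 5:
        return NEUTRAL + zhuyin
    return zhuyin + TONE_NUMBERS[tone]
```

For example, zhuyin_to_numbered_pinyin("ㄋㄧˇ") gives "ni3" and numbered_pinyin_to_zhuyin("wen2") gives "ㄨㄣˊ". Keeping one table and deriving the reverse mapping from it guarantees that both directions stay consistent.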

Once all steps are completed, you can
- convert between Zhuyin and (if you choose it) Pinyin, and through the
existing converters between Zhuyin and all other Mandarin romanisations
- use the Unihan character <-> reading mapping to look up character
pronunciations in Zhuyin
- use CEDICT and other Mandarin dictionaries in Zhuyin.
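To make the last point concrete, a rough sketch of what reading a CC-CEDICT entry in Zhuyin amounts to, outside of cjklib's own dictionary classes. CC-CEDICT lines have the form "Traditional Simplified [pin1 yin1] /gloss/gloss/", with tone-numbered Pinyin in brackets. The function entry_in_zhuyin, the tiny table and the sample lines in the usage note are all made up for illustration; a real version would reuse the full syllable converter and handle CEDICT's "u:" spelling of ü.

```python
import re

# Hypothetical, tiny table: toneless numbered-Pinyin syllable -> Zhuyin.
PINYIN_TO_ZHUYIN = {"ni": "ㄋㄧ", "hao": "ㄏㄠ", "zhong": "ㄓㄨㄥ", "wen": "ㄨㄣ"}
TONE_SUFFIX = {"1": "", "2": "ˊ", "3": "ˇ", "4": "ˋ"}

# Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]+)\] /(.*)/$")


def entry_in_zhuyin(line):
    """Parse one CEDICT-style line and return its reading in Zhuyin."""
    match = CEDICT_LINE.match(line)
    if match is None:
        raise ValueError(f"not a CEDICT entry: {line!r}")
    traditional, simplified, reading, glosses = match.groups()
    zhuyin = []
    for syllable in reading.split():
        # Proper nouns are capitalised in CEDICT (e.g. Zhong1), so lower-case.
        base, tone = syllable[:-1].lower(), syllable[-1]
        if tone == "5":
            zhuyin.append("˙" + PINYIN_TO_ZHUYIN[base])
        else:
            zhuyin.append(PINYIN_TO_ZHUYIN[base] + TONE_SUFFIX[tone])
    return traditional, simplified, " ".join(zhuyin), glosses.split("/")
```

Given the sample line "你好 你好 [ni3 hao3] /hello/hi/", this returns the reading "ㄋㄧˇ ㄏㄠˇ" along with the glosses ["hello", "hi"].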