Problem with traditional characters that are still used as simplified characters with a special meaning (or reading)


Pit H.

Sep 10, 2017, 3:27:58 AM
to digl...@googlegroups.com
The character 乾 (as a traditional character, read qián or gān) is such an example: it is simplified to 干 for all meanings read gān, but keeps its traditional shape 乾 (i.e., s=t) for all meanings read qián, in order to differentiate the two readings.

Samples for use of 乾 in simplified context as a standalone character and in compounds:

乾 qián the first of the Eight Diagrams; sky; family name
乾坤 qiánkūn heaven and earth; male and female
扭转乾坤 niǔzhuǎn qiánkūn to reverse the course of events
乾隆 Qiánlóng title of the fourth emperor's reign in the Qing Dynasty, 1736-1795
乾明 Qiánmíng Buddhist temple's name
乾造 qiánzào (fortune-telling) a man's horoscope

Morpheus-Eastern, however, treats 乾 as a traditional-only form, and thus also displays dictionary entries of 乾 that are read gān, which should be shown as 干 in a simplified context such as the samples above.

This is only a minor problem, but it should be dealt with when we design the t2s conversion routine. There are only about a dozen characters where the traditional shape has been split into two simplified shapes.
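
As a rough illustration of how a t2s routine could handle such split characters, here is a minimal Python sketch. It assumes a per-character default mapping (乾 to 干) plus a word-level exception table that is checked first, longest match first. The names `CHAR_T2S`, `WORD_T2S` and `t2s` and the tiny tables are hypothetical and are not taken from Morpheus-Eastern.

```python
# Hypothetical excerpt of a per-character traditional -> simplified table.
CHAR_T2S = {"乾": "干", "轉": "转", "淨": "净"}

# Hypothetical word-level exceptions: compounds read qián keep the shape 乾.
WORD_T2S = {
    "扭轉乾坤": "扭转乾坤",
    "乾坤": "乾坤",
    "乾隆": "乾隆",
    "乾明": "乾明",
    "乾造": "乾造",
}

# Try longer words first so that 扭轉乾坤 wins over 乾坤.
_WORDS_LONGEST_FIRST = sorted(WORD_T2S, key=len, reverse=True)


def t2s(text: str) -> str:
    """Convert traditional text to simplified, honouring word exceptions."""
    out = []
    i = 0
    while i < len(text):
        for word in _WORDS_LONGEST_FIRST:
            if text.startswith(word, i):
                out.append(WORD_T2S[word])
                i += len(word)
                break
        else:
            # No exception word starts here: fall back to the character table.
            out.append(CHAR_T2S.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(t2s("乾淨"))      # gān reading: 乾 becomes 干
print(t2s("乾隆"))      # qián reading: 乾 is kept
print(t2s("扭轉乾坤"))
```

A standalone 乾 in the qián sense (the trigram, the family name) cannot be recognized this way and would fall back to 干, so the exception table only covers compounds.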

Michael Bykov

Sep 26, 2017, 11:14:14 AM
to digl...@googlegroups.com
2017-09-10 10:27 GMT+03:00 Pit H. <lingua...@gmail.com>:
The character 乾 (as a traditional character, read qián or gān) is such an example: it is simplified as 干 for all meanings read gān but remains to be used in its traditional shape 乾 (i.e., s=t) for all meanings read qián, in order to differentiate the two readings.

Samples for use of 乾 in simplified context as a standalone character and in compounds:


But how can the program determine the context?


 



Pit H.

Sep 26, 2017, 3:22:55 PM
to digl...@googlegroups.com

>>but how program can determine the context?<<

In simplified Chinese context, there should be no problem, as 乾 (qian2) and 干 (gan1 or gan4) have clearly different pronunciations and meanings. The problem is that, for some reason, Morpheus does not differentiate between 乾 and 干, and I do not know what rules Morpheus uses to determine whether a Chinese text is simplified or traditional. But once simplified context is "detected", there should be a clear distinction.

In traditional Chinese context, it's different. Everything would be written 乾 (gan1 or gan4 or qian2, in order of frequency), and only context (compounds) can decide which pronunciation applies.
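
To make the compound-based decision concrete, here is a minimal Python sketch. It assumes a small list of compounds in which 乾 is read qián and falls back to gān, the most frequent reading, everywhere else. The list `QIAN_COMPOUNDS` and the function names are hypothetical, and the sketch makes no attempt to separate gan1 from gan4.

```python
AMBIGUOUS = "乾"

# Hypothetical list of compounds in which 乾 is read qián.
QIAN_COMPOUNDS = ("乾坤", "乾隆", "乾明", "乾造")


def reading_at(text: str, i: int) -> str:
    """Guess the reading of the 乾 found at index i of a traditional text."""
    for word in QIAN_COMPOUNDS:
        # Align the compound so that its 乾 sits on position i.
        start = i - word.index(AMBIGUOUS)
        if start >= 0 and text.startswith(word, start):
            return "qián"
    # No known qián compound covers this position: use the frequent reading.
    return "gān"


def readings(text: str) -> list:
    """Return (index, reading) for every 乾 in the text."""
    return [(i, reading_at(text, i)) for i, ch in enumerate(text) if ch == AMBIGUOUS]


print(readings("乾隆皇帝喜歡乾淨"))
```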





Pit H.

Sep 26, 2017, 3:28:48 PM
to digl...@googlegroups.com
To answer your question even more clearly: let me know what Morpheus does now to "determine" whether a context is traditional or simplified, or whether you make this decision only on a character-by-character (or word-by-word) basis. If the latter is true, you should introduce some routine that recognizes when at least one "typical" (= non-ambiguous) simplified character is used in a text, which would "determine" that this text is simplified Chinese.
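
Such a routine could look roughly like the following Python sketch. It assumes a set of characters that occur only in simplified text; the set `SIMPLIFIED_ONLY` below is a tiny hypothetical sample, and the function names are made up for illustration. A real list would have to be much longer.

```python
# Hypothetical sample of characters that exist only as simplified forms
# (traditional: 這 們 說 國 為 轉 對 時).
SIMPLIFIED_ONLY = frozenset("这们说国为转对时")


def is_simplified(text: str) -> bool:
    """True if at least one unambiguous simplified character occurs."""
    return any(ch in SIMPLIFIED_ONLY for ch in text)


def readings_of_qian(text: str) -> tuple:
    """Readings to offer for 乾, depending on the detected context."""
    if is_simplified(text):
        # In simplified text 乾 survives only in the qián reading.
        return ("qián",)
    # Otherwise keep all readings, in the order of frequency given above.
    return ("gān", "gàn", "qián")


print(readings_of_qian("这是乾隆年间的事"))
print(readings_of_qian("乾隆年間"))
```

A text without any character from the set stays undetermined, so the second call keeps all readings rather than assuming the text is traditional.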