Looking to help with Plover for Chinese

Devin Tankersley

Apr 21, 2017, 8:40:05 AM
to Plover
I'm an American currently doing an MA in linguistics in Taiwan, and I've long felt plagued by the (slow) speed at which I type in Chinese. I came across Plover not that long ago, and after a few false starts, I have recently been able to get into the swing of learning to type with it.

Soon after getting into the groove of learning, I started to wonder how stenography could work for Chinese. By looking through some of the older posts here and around some other sites, I learned of the existence of the Yawei stenography machine (亞偉速錄機), and I went ahead and ordered a textbook for it, since I was super curious how it was all set up, and the details I could find online were quite sparse.

Here's what the layout looks like:

A N I G D  D G I N A
O E U W Z  Z W U E O
      B X  X B

The layout is largely similar to Plover's, with the keys renamed, of course, and with the * key(s) and # bar done away with. Notably, the layout is mirrored down the middle, so that the left and right hands each have access to the same keys -- as each side governs one syllable, the typist can easily type two syllables at a time, with the left hand governing the first syllable and the right governing the second. This is quite handy for a number of reasons. First, Mandarin Chinese quite likes grouping morphemes into disyllabic groups (students of Mandarin largely learn these as compounds). Second, this helps deal with many of the problems around homophones. For example, a syllable like 'fu', stroked as XBU (Yawei does not consider tone), could be 夫 fū "man/husband", 服 fú "serve/clothes", 府 fǔ "seat of government", or 婦 fù "(married) woman", among a variety of other possibilities, but the disyllabic 'fufu' (stroked XBU:XBU) is most likely 夫婦 fūfù "husband and wife", with just a few other (rather rare) possibilities.

After receiving my Yawei textbook in the mail, I excitedly went through the opening material and the first chapter. I had hoped to kind of "hack" Plover into being able to use this method, largely by using the existing layout but just pretending as if the keys are as they should be on a Yawei machine, and adding phrases to the user dictionary as I go. But I quickly realized that this is simply not feasible due to the nature of the default steno layout in Plover, as the left hand would be lacking three keys (i.e. Yawei splits Plover's left S into A and O, and Yawei's left D and Z are subsumed under Plover's *).

I looked around a bit to see what other users had done to adapt Plover to other languages, and I am fairly certain that someone more coding-savvy than I am could modify the layout so that Yawei could be implemented. Were that to happen, I would happily do my part to fill out a dictionary and possibly even make some practice tools for learning the system, if people seem interested in that. I am not sure about the copyright situation with something like this, but because Plover seems to be based in part on existing theories, I hope it would be deemed appropriate. That being said, after a more thorough search, I found a website that contains much of the same information as the textbook, so it seems fair to say that this information is publicly available: http://ptr.chaoxing.com/nodedetailcontroller/visitnodedetail?knowledgeId=2183315

As a side note, I have considered buying a Yawei steno machine, but I am hesitant to do so for two main reasons. First, such a machine will likely only be able to type simplified Chinese, and I have my doubts about how easy it would be to modify the corresponding software to allow for traditional characters. Second, as an MA student without full-time employment, I do not have a lot of disposable income to spend on something that might not be able to type what I need it to.

tl;dr: if someone could adapt Plover's layout to fit Yawei's (as shown above), I would super appreciate it and would work on writing up the dictionary for Chinese steno on Plover.

Ted Morin

Apr 21, 2017, 8:56:09 AM
to Plover
Hi Devin,

Thanks for your research; this is some very useful detail. With the next version of Plover we do plan to support changing its language and layout, so the Yawei layout is definitely on the table. Having you around as a reference for implementation questions would be great.

It's easiest for us to manage issues on GitHub, so I've copied your message to our issue tracker: https://github.com/openstenoproject/plover/issues/749

I'll ask further questions there.

All the best,
Ted

--
You received this message because you are subscribed to the Google Groups "Plover" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ploversteno+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Devin Tankersley

Apr 28, 2017, 6:10:31 AM
to Plover
Yawei Chinese steno overview

1. Yawei Layout
The layout for the Yawei steno machine is overall quite similar to that of English steno: two rows of keys for the fingers and two keys for each thumb. The layout is given below:


A N I G D   D G I N A
O E U W Z   Z W U E O
      B X   X B

The keyboard layout is mirrored down the middle, such that each side has the same keys available and the same fingers have access to the same keys (e.g. D, G, Z, and W are available to either index finger). By convention, Yawei steno codes are written in a single key order, regardless of which side the keys are actually on. The order of the keys is given below:

X B D Z G W I U N E A O
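Since Plover itself is written in Python, here is a minimal Python sketch of how the keys pressed on one hand might be written out in this canonical order. `YAWEI_ORDER` and `canonical` are hypothetical names, not part of Plover.

```python
# Canonical Yawei key order as given above (one order for both hands).
YAWEI_ORDER = "X B D Z G W I U N E A O".split()

def canonical(keys):
    """Return the keys pressed on one hand as a string in canonical order."""
    unknown = set(keys) - set(YAWEI_ORDER)
    if unknown:
        raise ValueError(f"not Yawei keys: {sorted(unknown)}")
    return "".join(k for k in YAWEI_ORDER if k in keys)

print(canonical({"A", "N"}))       # NA (the rime 'an')
print(canonical({"U", "X", "B"}))  # XBU ('fu')
```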

(Yawei textbooks tend to reorder NA as AN, since it corresponds to the sound 'an'. This helps with ease of reading, but for the sake of consistency this convention will not be followed here.)

Above the keyboard on the Yawei steno machine is a row of six function keys which are primarily used to select output if the input is ambiguous. The screen above the function keys is a display for the output.

One area where Yawei steno differs from Plover (and most English steno theories?) is the placement of the fingers: the index fingers cover the two innermost columns on each side (i.e. D, Z, G, and W), while the middle, ring, and pinky fingers each get one column (I and U for the middle, N and E for the ring, and A and O for the pinky).

2. Phonetic Input
Four of the keys are for consonants (B, D, Z, G) and six are for vowels/rimes (I, U, N, E, A, O). X and W are largely used as function keys, but X is also used in conjunction with the consonant keys.

Each of the sound keys (i.e. all keys except for X and W) also corresponds to a syllable of Mandarin, so that single key combinations can be used to form disyllabic compounds. The correspondences for each of these keys are as follows:

B = bu, D = de, Z = zhi, G = ge
I = yi, U = wu, N = en, E = e, A = a, O = (w)o

Below are some example combinations of single keys, one on each hand (note: the Yawei convention of showing which side a given key is on is to use a colon (:) much in the same way a hyphen (-) is used in English steno, and so for the sake of clarity the English steno convention will be followed):

D-Z 得知 dezhi    B-D 不得 bude    G-Z 擱置 gezhi
I-U 義務 yiwu     A-I 阿姨 ayi     N-D 恩德 ende
E-U 訛誤 ewu      O-D 我的 wode    I-G 一個 yige
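To make the left-first reading concrete, here is a minimal sketch that reads a two-hand stroke of single keys as two syllables, with the left hand first. The `SINGLE_KEY` table copies the correspondences above, with O given as 'wo' for simplicity. The function name is hypothetical.

```python
# Syllable carried by each sound key on its own (from the table above).
SINGLE_KEY = {
    "B": "bu", "D": "de", "Z": "zhi", "G": "ge",
    "I": "yi", "U": "wu", "N": "en", "E": "e", "A": "a", "O": "wo",
}

def read_pair(stroke):
    """Read a 'LEFT-RIGHT' stroke of single keys as two syllables."""
    left, right = stroke.split("-")
    return f"{SINGLE_KEY[left]} {SINGLE_KEY[right]}"

print(read_pair("D-Z"))  # de zhi (得知)
print(read_pair("O-D"))  # wo de (我的)
```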

Relatedly, each consonant has an associated vowel, so that if that vowel is used, there is no need to stroke the rest of the syllable. That is, there is no need to stroke BU for 'bu', as B already covers 'bu'. If another vowel is used, e.g. BA 'ba', then the default vowel is removed.
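The default-vowel rule can be sketched for the four bare consonant keys and a handful of rimes. A bare consonant spells its default syllable, and an explicit rime replaces the default vowel. The rime table is deliberately partial, and the names are hypothetical.

```python
# Default syllables of the four consonant keys, and their bare initials.
DEFAULT = {"B": "bu", "D": "de", "Z": "zhi", "G": "ge"}
INITIAL = {"B": "b", "D": "d", "Z": "zh", "G": "g"}
# A few rimes only, for illustration; not the full rime table.
RIME = {"A": "a", "O": "o", "NA": "an", "AO": "ao"}

def spell(consonant, rime=""):
    """Spell a consonant key plus an optional rime code as pinyin."""
    if not rime:
        return DEFAULT[consonant]           # bare consonant: default vowel
    return INITIAL[consonant] + RIME[rime]  # explicit rime replaces it

print(spell("B"))       # bu
print(spell("B", "A"))  # ba
```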

The other consonants of Mandarin are stroked using combinations of the consonant keys, generally with the two thumb keys X and B changing the quality of the consonant (e.g. unaspirated to aspirated, oral to nasal, etc.), but not its place of articulation. Below is the full list of consonants and their Yawei codes:

B 'bu'     BG 'pu'    XB 'mu'    XBU 'fu'
D 'de'     BD 'te'    XBD 'ne'   XD 'le'
Z 'zhi'    BZ 'chi'   XZ 'shi'   XBZ 'ri'
DZ 'zi'    BDZ 'ci'   XDZ 'si'
G 'ge'     XBG 'ke'   XG 'he'
GI 'ji'    XGI 'qi'   XI 'xi'

The vowel/rime keys are also used in conjunction to form the various rimes of Mandarin. While many of the key combinations resemble their pinyin equivalents, some of them differ rather significantly. Below is the full list of rime codes:

I 'yi'       U 'wu'      IU 'yu/ü'    N 'en'
E 'e, ei'    NE 'eng'    A 'a'        O '(w)o'
AO 'ao'      NA 'an'     EO 'ou'      IN 'yin'
UE 'wei'     IA 'ya'     NO 'ang'     IO 'ai'
IE 'ye'      EA 'yo'     XE 'er'      UN 'wen'
UA 'wa'      UIO 'wai'   INE 'ying'   IUE 'yue'
IUN 'yun'    IAO 'yao'   INA 'yan'    UEO 'ong/weng'
IUEO 'yong'  IEO 'you'   INO 'yang'   UNO 'wang'
UAN 'wan'    IUAN 'yuan'

Note that given the nature of pinyin spelling rules, there is some variation in how the corresponding syllables will be spelled out, e.g. in the above table U stands for 'wu', but the pinyin syllable 'gu' is nevertheless GU, as the leading w/y is dropped if another consonant is used. Also, while the correspondence between Yawei steno codes and pinyin is nearly one-to-one, the keys E and O potentially refer to two different rime categories.

The key E covers both 'e' and 'ei', and this works because these two rimes only contrast in a small handful of cases, namely 'ge' vs. 'gei', 'ze' vs. 'zei', 'she' vs. 'shei', and 'me' vs. 'mei'. In the first case, 'ge' is stroked as G, leaving 'gei' to be stroked as GE, so there is no conflict. In the second and third cases, 'ze' and 'she' are stroked DZE and XZE respectively, so to make a distinction, I is added to the codes for 'zei' and 'shei', giving DZIE and XZIE. In the final case, 'me' is stroked as XBE and 'mei' is stroked as XBIU.

The O key, which on its own is 'wo', is also used for 'o', which only occurs as a rime on its own (i.e. without a preceding consonant) in sentence-final particles, usually written 哦, 喔, or 噢. I am not yet certain how Yawei handles this latter case.

2.1 Alternative Phonetic Codes
Given the gaps in the combinations of the consonant and rime codes, there are a few alternative strokes for some syllables that may be easier to type. Some use "stronger" fingers (thumb, index, middle) instead of "weaker" ones (ring or pinky). Others use adjacent keys, so that one finger can depress both keys at once, or simply use fewer keys. Some of these drop rime keys and/or replace one or more keys with the function key W.

All of the alternative codes are given below with their fully phonetic code on the right for comparison.

WIU=GIU 'ju'           XWIU=XGIU 'qu'
WIUN=GIUN 'jun'        XWIUN=XGIUN 'qun'
WIUE=GIUE 'jue'        XWIUE=XGIUE 'que'
WIUNA=GIUNA 'juan'     XWIUNA=XGIUNA 'quan'
XBZIU=XBDIU 'nü'       XZIU=XDIU 'lü'
BWIU=XDIU 'lü'         XWIUO=XGIUO 'huai'
WIUEO=GIUEO 'jiong'    GWA=GUA 'gua'
XBGW=XBGUA 'kua'       ZIU=ZUA 'zhua'
XZWU=XZUA 'shua'       GWI=GUNA 'guan'
GWIU=DZUNA 'zuan'      BGWIU=XDZUNA 'suan'
WUNO=ZUNO 'zhuang'     BWIUE=XDIUE 'lüe'
XBWIUE=XBDIUE 'nüe'    XWIUEO=XGIUEO 'qiong'
XBGI=XBGUIO 'kuai'     XDN=XDNE 'leng'
BDN=BDNE 'teng'        BDIN=BDINE 'ting'
DIN=DINE 'ding'        BUI=XBUA 'fa'
XGW=XGUA 'hua'
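If dictionary entries are keyed by phonetic code, one way to support these alternatives is to map each alternative onto its fully phonetic code. Another is simply to copy each entry under both outlines. This sketch uses a few rows from the list above. `ALTERNATIVES`, `to_phonetic`, and the entry 關 for GUNA are hypothetical illustrations.

```python
# Alternative code -> fully phonetic code (a few rows from the list above).
ALTERNATIVES = {
    "WIU": "GIU",    # ju
    "XWIU": "XGIU",  # qu
    "GWA": "GUA",    # gua
    "GWI": "GUNA",   # guan
    "DIN": "DINE",   # ding
}

def to_phonetic(code):
    """Return the fully phonetic code; codes without an alternative pass through."""
    return ALTERNATIVES.get(code, code)

# Alternatively, copy each existing entry under its alternative outline too.
entries = {"GUNA": "關"}  # hypothetical single-syllable entry
for alt, full in ALTERNATIVES.items():
    if full in entries:
        entries[alt] = entries[full]
```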

3. Briefs in Yawei Steno
Yawei handles certain types of briefs in a rather formulaic manner, categorizing them based on how the function keys are used. Each category is described below.

3.1 Specific Character Codes (Single Briefs) 漢字特定碼
3.1.1 One Hand Briefs [numbers, etc]
These briefs are mostly used for certain commonly used individual characters (as well as a few phrases), and also for the characters for numbers. Stroking the specific code on the left hand together with one of the function keys on the right gives a two-character brief. For example, WU is 五 wǔ "five", and WU-W gives 五月 wǔyuè "May (the fifth month)".

WO     líng "zero"
WI     yī "one"     WI-X 一切 yíqiè "all"    WI-W 一般 yìbān "ordinary"
XWE    èr "two"     XWE-X 二者 èrzhě "both"
WN     sān "three"  WN-X 三好 sān hǎo "three virtues"    WN-W 三月 sānyuè "March"
ZW     sì "four"    ZW-W 四月 sìyuè "April"
WU     wǔ "five"    WU-W 五月 wǔyuè "May"
WEO    liù "six"    WEO-W 六月 liùyuè "June"
XGWI   qī "seven"   XGWI-W 七月 qīyuè "July"
BW     bā "eight"   BW-W 八月 bāyuè "August"
GW     jiǔ "nine"   GW-W 九月 jiǔyuè "September"
XZW    shí "ten"    XZW-X 十分 shífēn "very"    XZW-W 十月 shíyuè "October"
WIO    百分之 bǎifēn zhī "percent"
WIAN   千分之 qiānfēn zhī "(a) thousandth"

Certain specific character codes are used to distinguish otherwise homophonous phrases, as in the following examples where the specific character code for 是 shì "to be" is used to distinguish it from other characters that are also pronounced 'shi':

XZI-G 是個 'shi ge' vs XZ-G 詩歌 'shige'
XZI-B 是不 'shi bu' vs XZ-B 師部 'shibu'

3.1.2 Two Hand Single Briefs 單音詞特定碼
For most of the possible syllables in Mandarin, up to three of the most commonly used characters with that pronunciation are chosen for briefs of the form X-[code], W-[code], or XW-[code]. That is, for a given syllable, stroking X on the left hand and the code for that syllable on the right results in a particular character, stroking W on the left hand results in another, and so on. Consider the following examples with homophonous syllables.

IEO 'you'    X-IEO yǒu "have"        W-IEO yóu "by"          XW-IEO yòu "again"
IU 'yu'      X-IU yú "at"            W-IU yǔ "and"           XW-IU yú "at"*
GI 'ji'      X-GI jí "and"           W-GI jǐ "how many"      XW-GI jí "that is"
D 'de'       X-D de "of"             W-D de "aspect marker"
DZ 'zi'      X-DZ zì "self, from"    W-DZ zì "character"
XZ 'shi'     X-XZ 使 shǐ "cause"     W-XZ shí "time"
BDZ 'ci'     X-BDZ cǐ "this"         W-BDZ cì "instance"
BD 'te'      X-BD tè "special"
O 'wo'       X-O wǒ "I, me"
XBDIU 'nü'   X-XBDIU nǚ "woman"
*The codes for 於 and 于 here are reversed from the original due to their differing distributions in Traditional Chinese.

In fact, there are only 10 XW briefs, and not every syllable has an X or W brief (though any syllable that has a W brief also has an X brief).

XW-BZNE chéng "ride"
XW-BZU chǔ "place"
XW-IEO yòu "again"
XW-IU 於/于 yú "at"
XW-GI jí "that is, thus"
XW-XZN shén "god"
XW-XZNE shěng "province"
XW-UE wéi "only"
XW-XINA xiàn "county"
XW-XNE zhēng "struggle"

In addition, the codes for the banker's numerals are the single hand number codes on the right hand, with W stroked on the left, as shown below.

W-WI yī "one"      W-XWE èr "two"       W-WN sān "three"
W-ZW sì "four"     W-WU wǔ "five"       W-WEO liù "six"
W-XGWI qī "seven"  W-BW bā "eight"      W-GW jiǔ "nine"
W-XZW shí "ten"    W-WIO bǎi "hundred"  W-WIAN qiān "thousand"

3.2 Two Character Briefs (1-2 Briefs)  雙音詞語略碼
The format of 1-2 briefs is either [syl]-X or [syl]-W, where [syl] is a given syllable code. This means that for each possible syllable, there can be up to two briefs that start with that syllable. Below are a few examples:

B 'bu'     B-X 不能 bùnéng "cannot"      B-W 部分 bùfèn "part"
XB 'mu'    XB-X 目的 mùdì "goal"         XB-W 目前 mùqián "at present"
D 'de'     D-X 得到 dédào "receive"      D-W 德國 Déguó "Germany"
XZ 'shi'   XZ-X 時間 shíjiān "time"      XZ-W 時候 shíhòu "(at what) time"

3.3 Three Character Briefs (1-3 Briefs) 三音詞語略碼
The format of 1-3 briefs is [syl]/X-X, where [syl] is a given syllable code, stroked with either hand. Note that not all possible syllables actually have a corresponding 1-3 brief, i.e. there are some gaps.
Below are some examples of 1-3 briefs:

BD 'te'    BD/X-X 特別是 tèbié shì "especially"
XBU 'fu'   XBU/X-X 服務員 fúwùyuán "customer service representative"
XGI 'qi'   XGI/X-X 企業家 qǐyèjiā "entrepreneur"
DI 'di'    DI/X-X 第一次 dì yí cì "the first time"

3.4 Four Character Briefs (2-4 Briefs) 四音詞語略碼
The format of 2-4 briefs is [syl1]-[syl4]/X-X, where [syl1] is the code for the first syllable of the target phrase, and [syl4] is the code for the fourth syllable of the target phrase. See the examples below:

B 'bu', XDZ 'si'          B-XDZ/X-X 不好意思 bùhǎoyìsi "sorry, excuse me"
XDZUE 'sui', BDZ 'ci'     XDZUE-BDZ/X-X 雖然如此 suīrán rú cǐ "be that as it may"
XZ 'shi', G 'ge'          XZ-G/X-X 市場價格 shìchǎng jiàgé "market value"
B 'bu', GI 'ji'           B-GI/X-X 不切實際 bú qiè shíjì "unrealistic"
DINA 'dian', BO 'bo'      DINA-BO/X-X 電視廣播 diànshì guǎngbō "TV broadcast"

3.5 Multiple Character Briefs (Multi Briefs) 多音詞語略碼
The format of Multi Briefs is [syl1]-[syl2]/[syl#]-XO, where [syl1] is the code for the first syllable of the target phrase, [syl2] is the second, and [syl#] is the code for the final syllable. See the examples below:

DZEO 'zou', DZ 'zi', XDU 'lu'  DZEO-DZ/XDU-XO 走自己的路 zǒu zìjǐ de lù "to go down one's own path"
BDI 'ti', GAO 'gao', XDIU 'lü' BDI-GAO/XDIU-XO 提高工作效率 tígāo gōngzuò xiàolǜ "increase work productivity"
DZIO 'zai', ZE 'zhe', XIA 'xia' DZIO-ZE/XIA-XO 在這種情況下 zài zhè zhǒng qíngkuàng xià "under this sort of circumstance"
GIA 'jia', XDI 'li', IA 'ya'   GIA-XDI/IA-XO 加利福尼亞 jiālìfúníyà "California"
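The brief categories in sections 3.2 to 3.5 are distinguished purely by stroke shape, so a rough classifier can be sketched from the stated formats. This is an illustration under assumptions, not how the Yawei software works. It looks only at where X, W, and XO appear, not at whether the syllable codes are valid. The names are hypothetical.

```python
FUNCTION_ONLY = {"X", "W", "XW"}  # strokes that are not syllable codes

def brief_type(outline):
    """Guess which brief pattern (1-2, 1-3, 2-4, multi) an outline follows.

    Only the stroke shapes are checked; the syllable codes are not validated.
    """
    strokes = outline.split("/")
    if len(strokes) == 1:
        left, _, right = strokes[0].partition("-")
        if left and left not in FUNCTION_ONLY and right in ("X", "W"):
            return "1-2"  # [syl]-X or [syl]-W
    elif len(strokes) == 2:
        first, last = strokes
        left, _, right = first.partition("-")
        if last == "X-X":
            # [syl1]-[syl4]/X-X has syllables on both hands; [syl]/X-X has one.
            return "2-4" if left and right else "1-3"
        if last.endswith("-XO") and left and right:
            return "multi"  # [syl1]-[syl2]/[syl#]-XO
    return None

print(brief_type("B-X"))             # 1-2
print(brief_type("DZEO-DZ/XDU-XO"))  # multi
```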

3.6 Disyllabic Suffix Briefs (Suffix Briefs) 後置成分雙音詞特定碼
The format of Suffix Briefs is [syl1]-[suffix], where [syl1] is the code for the first syllable of the target phrase, and [suffix] is the brief for the corresponding two syllable ending. The structure of the [suffix] is to take the consonant of the first syllable + W + the rime of the second syllable. For example, the suffix brief for ZU-I 主義 zhǔyì "-ism" is ZWI. At the time of writing, there are only 11 suffixes that are used in this way, four of which are given in examples below.

XZE 'she'     XZE-ZWI 社會主義 shèhuìzhǔyì "socialism"
XBA 'ma'      XBA-ZWI 馬克思主義 Mǎkèsīzhǔyì "Marxism"
IO 'ai'       IO-ZWI 愛國主義 àiguózhǔyì "patriotism"
XBUNE 'feng'  XBUNE-XZWUE 封建社會 fēngjiàn shèhuì "feudal society"
GUE 'gui'     GUE-ZWU 規章制度 guīzhāng zhìdù "rules and regulations"
GAO 'gao'     GAO-XWAO 高等學校 gāoděng xuéxiào "college/university (i.e. schools of higher learning)"
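The suffix construction rule combines the consonant of the first suffix syllable, W, and the rime of the second, written in canonical key order. A sketch follows. The function name is hypothetical, and the initial and rime codes are passed in directly rather than derived from pinyin.

```python
# Canonical Yawei key order (see section 1).
YAWEI_ORDER = "X B D Z G W I U N E A O".split()

def suffix_code(initial, rime):
    """Consonant of the first suffix syllable + W + rime of the second."""
    keys = set(initial) | {"W"} | set(rime)
    return "".join(k for k in YAWEI_ORDER if k in keys)

print(suffix_code("Z", "I"))    # ZWI (主義)
print(suffix_code("XZ", "UE"))  # XZWUE (社會)
```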

4. Other Types of Output
4.1 Punctuation
Many of the commonly used punctuation marks come in pairs, with both members of a pair using the same code, but on opposite sides. For example, the full-width comma (，) is -DGI and the full-width period (。) is DGI-. Below are a few more such pairs:

ZG-      -ZG
DGIN-    -DGIN
DW-      -DW
DZIU-    -DZIU

Some of the punctuation marks stand "on their own", as it were, and there is a whole category of additional punctuation marks and symbols that are stroked with XU- combined with a mnemonic syllable. This category overlaps somewhat with the commonly used marks above. The mnemonics are given in square brackets, with an explanation in English in parentheses.

XBDG-
XU-GINE  #  [井] (similar in appearance)
XU-BIO   %  [百(分號)] (using 100, as percent is "per 100")
XU-IUAN     [圓(括號)] (meaning "round")
XU-IUN      [云(諧音)] (similar in sound to the above)
XU-UN       [問(號)] (meaning "question (mark)")
XU-GIU      [句(號)] (meaning "sentence (mark)")

4.2 Arabic Numerals and Math Notation
Arabic numerals (1, 2, 3, etc.) and other symbols used for math are output by stroking XN on one side, and depressing a key on the other side. By stroking XN, a new sort of layout is activated, as shown below:

< ÷ × - +   1 3 5 7 9
> ) ( / =   2 4 6 8 0
      % X   X .

Stroking XN- and a key on the right side gives the numbers, while -XN and a key on the left gives the symbols. Using the X keys as shown above would give a space, i.e. XN-X and X-XN both give a single space. Below are a few more examples:

XN-D = 1      XN-B = .
XN-Z = 2      XN-O = 0
D-XN = +      W-XN = /
A-XN = <      B-XN = %
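The XN layout amounts to two lookup tables, one per hand. The following minimal sketch, with hypothetical names, reproduces the examples above. XN on the left plus a right-hand key gives a digit, and a left-hand key plus XN on the right gives a symbol.

```python
# Left-hand keys (stroked as KEY-XN) -> symbols, from the XN layout above.
LEFT_SYMBOLS = {
    "A": "<", "N": "÷", "I": "×", "G": "-", "D": "+",
    "O": ">", "E": ")", "U": "(", "W": "/", "Z": "=",
    "B": "%", "X": " ",
}
# Right-hand keys (stroked as XN-KEY) -> digits, space, and the decimal point.
RIGHT_DIGITS = {
    "D": "1", "G": "3", "I": "5", "N": "7", "A": "9",
    "Z": "2", "W": "4", "U": "6", "E": "8", "O": "0",
    "X": " ", "B": ".",
}

def xn_output(stroke):
    """Return the character for an XN stroke, or None if it is not one."""
    left, _, right = stroke.partition("-")
    if left == "XN" and len(right) == 1:
        return RIGHT_DIGITS.get(right)
    if right == "XN" and len(left) == 1:
        return LEFT_SYMBOLS.get(left)
    return None

print(xn_output("XN-D"), xn_output("A-XN"))  # 1 <
```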

Note that there is usually a space between a numeral and a Chinese character, but this is not supplied by the machine, so the user is expected to supply the space using XN-X or X-XN.

4.3 Latin Letters and Pinyin
Latin letters (A, a, B, b, etc.) can be input individually using XU- (for capital letters) or XUE- (for lowercase letters) combined with a code on the right-hand side. These codes are roughly what one would expect from the phonetic input: XBU corresponds to 'fu' in pinyin, so XU-XBU is 'F'. Below are a few more representative examples:

XU-A    A    XUE-A    a
XU-GI   J    XUE-GI   j
XU-BD   T    XUE-BD   t
XU-UE   V    XUE-UE   v
XU-IA   Y    XUE-IA   y
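Here is a similar sketch for Latin letters. It uses only the codes from the examples above, plus XBU for 'F' from the preceding paragraph. The table and function names are hypothetical.

```python
# Right-hand codes for a few letters, taken from the examples above.
LETTER_CODES = {"A": "A", "GI": "J", "BD": "T", "UE": "V", "IA": "Y", "XBU": "F"}

def latin_letter(stroke):
    """XU-[code] gives a capital letter; XUE-[code] gives a lowercase one."""
    left, _, right = stroke.partition("-")
    letter = LETTER_CODES.get(right)
    if letter is None:
        return None
    if left == "XU":
        return letter
    if left == "XUE":
        return letter.lower()
    return None

print(latin_letter("XU-GI"), latin_letter("XUE-BD"))  # J t
```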

Latin letter output is also handled via pinyin output, where the user inputs phonetic code (no briefs), and then strokes WUE or WUEO to output the corresponding pinyin for that phonetic code. According to the textbook, this is to avoid typing out the wrong characters for a name, with the note that the stenographer would go back and correct the script once they know how the name is written. Below are the examples given in the text:

XDI/WUE/XGUEO-XGI/WUE   lihongqi (i.e. 李宏起, a personal name)
ZNO/WUE/GIA-GIE/WUE     zhangjiajie (i.e. 張家界, a place name)

Note that spaces are not added automatically.
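The pinyin conversion can be thought of as buffering phonetic syllables until WUE or WUEO is stroked. The sketch below assumes a tiny hypothetical syllable table that covers just the two examples. For simplicity, it drops any phonetic input that is never converted.

```python
# Hypothetical syllable table covering only the two examples above.
PINYIN = {"XDI": "li", "XGUEO": "hong", "XGI": "qi",
          "ZNO": "zhang", "GIA": "jia", "GIE": "jie"}
CONVERT = {"WUE", "WUEO"}

def pinyin_output(strokes):
    """Emit buffered syllables as pinyin each time WUE/WUEO is stroked."""
    out, pending = [], []
    for stroke in strokes:
        if stroke in CONVERT:
            out.append("".join(pending))  # no space is added
            pending = []
        else:
            pending.extend(PINYIN[s] for s in stroke.split("-"))
    return "".join(out)  # unconverted trailing input is dropped in this sketch

print(pinyin_output("XDI/WUE/XGUEO-XGI/WUE".split("/")))  # lihongqi
```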

An apparent feature of Yawei steno machines, not mentioned in the textbook but included in the software package on the company website, is English output via Chinese. That is, rather than typing out each individual letter as shown above, it is apparently possible to stroke a Chinese phrase and then input a code afterwards that converts that phrase into English.

I'm not sure of the exact intention behind this, but I believe it's meant to deal with situations where the speaker switches to English for a word or two. For example, in Taiwan it's not uncommon for people to use certain English words, such as "idea", "care", "logo", etc., in a Mandarin sentence. Consider a sentence like the one below, which switches into English:

我沒有什麼idea - wǒ méiyǒu shénme 'idea' - "I don't have any ideas."

For the stenographer, it would take a bit longer to stroke the individual letters of "idea" than it would to stroke a Chinese equivalent to "idea" (意見 yìjiàn, for example), and then use a code similar to the brief codes to have that converted into English. I forget what the actual code is, but I can certainly find it again.

5. Strategies for Single Character Output
Given the issue of homophony, typing a given Chinese character sans context and using only phonetic input forces the user to choose which character to output. Forced character selection, while a part of the Yawei steno process, is not the only strategy for dealing with this problem, however. Below are two strategies (that don't involve the single character briefs discussed above) for typing individual characters.

5.1 Selection Via Elimination 聯詞消字定字法
This strategy involves typing out a phrase containing the target character, then deleting a non-target character that is also in that phrase. For example, to output 知 zhī "know", the textbook recommends stroking Z-DAO for 知道 zhīdào "to know", and then pressing -W to delete 道 dào, leaving the target character 知 zhī.

This strategy also works with phrases where the target character is at the end, in which case the user must delete the preceding character. For example, one way to type 家 jiā "house" would be to stroke BNA-GIA for 搬家 bānjiā "to move (house)", then press W- to delete the second-to-last character on the stack, namely 搬 bān.

In theory, using W to delete either the last or the second-to-last character also works when stroking phrases of more than two characters. In practice, I'm not sure how often this is employed.

Note that briefs may also be used for this strategy, so to type 起 qǐ "rise" for example, one could stroke XGI-X for 起來 qǐlái "come up", then delete 來 lái using -W.
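The two deletion strokes used in this strategy can be modeled as operations on the output so far. This is a minimal illustration with a hypothetical function name, not the machine's actual implementation.

```python
def apply_deletion(output, stroke):
    """Apply a Yawei deletion stroke to the output text.

    -W deletes the last character; W- deletes the second-to-last one.
    """
    chars = list(output)
    if stroke == "-W" and chars:
        del chars[-1]
    elif stroke == "W-" and len(chars) >= 2:
        del chars[-2]
    return "".join(chars)

print(apply_deletion("知道", "-W"))  # 知
print(apply_deletion("搬家", "W-"))  # 家
```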

5.2 Character Shape Codes 形碼
In a situation where the stenographer must type a character whose pronunciation is unclear to them, there is a way to type the character by looking at its "shape", i.e. its component parts. On the machine, to enter this mode, the typist strokes XN-XN, then types the target character(s), and strokes XN-XN again to leave this mode.

To give an example of how this works, if the user wanted to type 謝 xiè "to thank" in this way, they would break the character up into two parts, picking the largest coherent parts available. In the case of 謝 xiè, that would be 言 yán and 射 shè. The typist should avoid picking out a smaller part of a larger whole, so picking out 言 yán and 寸 cùn, for example, would not work. With the two parts found, the user then strokes those two parts phonetically, which in the above case would be IAN-XZE.

Another example would be the comparatively rare character 趑 zī (the first character in 趑趄 zījū "walk with difficulty"): this character can be broken up into 走 zǒu and 次 cì, so after entering the character shape input mode, the user would stroke DZEO-BDZ. The user should not pick the smaller 欠 qiàn component that could also be found within 次 cì / 趑 zī.

For characters that cannot be easily broken up into distinct parts this way, the typist can instead use the phonetic code for one part and the stroke code for the other part, or use the stroke codes for the first and last strokes of the character. For example, the character 孓 jué could be typed by using the phonetic code XD 'le', standing for 了, combined with XBDA 'nà', which stands for 捺 "a stroke going down and to the right", giving XD-XBDA. See two more examples below:

ZE-XDZ -- 'zhé' "a bent/turning stroke" + 私/厶 'si'
XGNE-XGNE -- 'héng' "a horizontal stroke" x2 (the first and last strokes are the same in this case)

If a character cannot be broken up at all, then the left hand types it phonetically while the right hand gives the code for the first stroke. For example, 之 zhī "of" is stroked Z-DINA in this way, with Z 'zhi' for the sound and DINA 'dian' referring to 點 diǎn "a dot" (the top part of the character is a "dot" stroke).

5.3 (Forced) Character Selection
In the event that a given stroke cannot be disambiguated, it is up to the typist to pick the target output using the function keys above the keyboard. The Yawei steno software gives each phrase a sort of "index" representing where that phrase appears in the list of choices. For example, if a given output is the third choice on the first page, its index will be (1-3), with the page first and the position on that page second. XZ-GINA 'shijian', for instance, could refer to a number of different phrases, so under the heading for XZ-GINA 'shijian', each output would be given an index, as in the hypothetical example below.

實踐 (1-1)        "(put into) practice"
事件 (1-2)        "event, incident"
始建 (1-3)        "start building"
失檢 (1-4)        "be indiscreet"
屍檢 (1-5)        "autopsy"
時艱 (2-1)        "hard times"
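Assuming five choices per page, as in the hypothetical list above, the (page-slot) index is just a division with remainder. The real page size may differ, and the function name is hypothetical.

```python
def choice_index(rank, page_size=5):
    """Turn a 0-based position in the candidate list into (page, slot)."""
    page, slot = divmod(rank, page_size)
    return page + 1, slot + 1

candidates = ["實踐", "事件", "始建", "失檢", "屍檢", "時艱"]
for rank, phrase in enumerate(candidates):
    print(phrase, "(%d-%d)" % choice_index(rank))  # 實踐 (1-1) ... 時艱 (2-1)
```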

Note that this kind of output selection could also be used for single characters, but given the amount of homophony in such circumstances and how long it would take to "find" the target character, it would probably be best to avoid such an approach.

6. Other Functions [deletion, selection, movement, etc]
As mentioned above, the Yawei steno machine also gives the typist certain typesetting functions, such as punctuation, deletion, adding spaces and line breaks, and "moving" the cursor. Below is a brief overview of some of these other functions:

-X        adds a space
W-        deletes the second-to-last character
-W        deletes the last character
W-W       deletes the preceding stroke
XBW-      adds a line break and indents two spaces
-XBW      adds a line break without indenting
WUE       converts the previous phonetic input into Pinyin output
WUEO      same as above
XAN-I     move cursor up
XAN-U     move cursor down
XAN-W     move cursor left
XAN-E     move cursor right
XU-XZEO   move to the beginning of the line (= HOME); right hand strokes shou "beginning"
XU-XBO    move to the end of the line (= END); right hand strokes mò "end"
XU-IAN    move up one page (= PgUp)
XU-UEO    move down one page (= PgDn)



7. Concluding Remarks
So that's the basics of Yawei steno for Chinese. I have some thoughts about how this system could be adapted for Plover, and I am working on a sample dictionary as a sort of "proof of concept". I will try to organize these thoughts soon and provide another write-up along with the sample dictionary.

Ted Morin

Apr 28, 2017, 9:33:49 AM
to Plover
This is looking really good. I've been thinking about it, and I think the mirrored order is okay, and so is the ambiguous input. I think we can implement a tool plugin for Plover that displays the choices when you enter an ambiguous entry. Speaking of which, what happens if you don't hit any of the choices? Is there a default?

Joshua Taylor

Apr 28, 2017, 3:29:05 PM
to plove...@googlegroups.com

This looks awesome! Now you're making me want to learn Mandarin just so I can steno in it. :D

-- Joshua Taylor

Devin Tankersley

May 15, 2017, 4:25:13 AM
to Plover
Hey there,

So if nothing is used to disambiguate the input (such as selecting the target output using the selection keys), I would think that it simply goes with the "first" output, i.e. if there are a few options, it'll go with whatever output corresponds to the 1-1 index (= first entry on the first page of options).

I am just about done writing up some of the issues (and potential solutions) I have thought of for adapting Yawei steno, and I hope to post that here soon. Also, I plan to include a sort of "proof of concept" sample dictionary based on the dictionary files from Rime Input Method (https://github.com/rime/brise), modeled after the English steno dictionaries in Plover.

And I am glad this stuff is interesting enough that it might get more people interested in learning Chinese!

-Devin T.

Devin Tankersley

May 15, 2017, 9:41:47 AM
to Plover
Adapting Yawei to Plover

1. Free variation of input (on either hand) vs fixed to one hand
Yawei steno (ostensibly) allows typing a given syllable code on either hand, potentially allowing for alternative stroking possibilities. For example, the three character phrase 是不是 shì bú shì "is it the case that... (lit. is [or] is not)" could theoretically be stroked XZI-B/XZI, XZI/B-XZI, or even XZI/B/XZI. (Note that given its high frequency of use, 是 shì "to be" has a dedicated code that differs from other characters with the same 'shi' reading.)

One potential way of adapting this to Plover would be to include all three possible stroking orders in the dictionary. One problem we might run into, however, is the issue of handedness: if the system does not have to consider which side the one handed input comes in on, then we can probably just include the three entries as written above; if, however, the system must take into account which side one handed input occurs on, then we'd have to add many more entries into the dictionary, as listed below:

是不是 shì bú shì:
XZI-B/XZI-
XZI-B/-XZI
XZI-/B-XZI
-XZI/B-XZI
XZI-/B-/XZI-
XZI-/B-/-XZI
XZI-/-B/-XZI
XZI-/-B/XZI-
-XZI/B-/XZI-
-XZI/B-/-XZI
-XZI/-B/-XZI
-XZI/-B/XZI-
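The twelve outlines above can be generated mechanically. Split the phrase into strokes of one or two syllables, and fix each two-syllable stroke as left-right. Then let each one-syllable stroke go on either hand. The following sketch uses hypothetical helper names.

```python
from itertools import product

def segmentations(n):
    """Yield every way to split n syllables into strokes of 1 or 2 syllables."""
    if n == 0:
        yield []
        return
    for size in (1, 2):
        if size <= n:
            for rest in segmentations(n - size):
                yield [size] + rest

def stroke_variants(syllables):
    """List every outline for a phrase.

    A two-syllable stroke is fixed as LEFT-RIGHT; a one-syllable
    stroke may go on either hand ("CODE-" or "-CODE").
    """
    outlines = []
    for seg in segmentations(len(syllables)):
        options, i = [], 0
        for size in seg:
            if size == 2:
                options.append([f"{syllables[i]}-{syllables[i + 1]}"])
            else:
                options.append([f"{syllables[i]}-", f"-{syllables[i]}"])
            i += size
        outlines.extend("/".join(combo) for combo in product(*options))
    return outlines

variants = stroke_variants(["XZI", "B", "XZI"])
print(len(variants))  # 12
```

The count grows quickly with phrase length, which is the practical argument for side-agnostic lookup made below.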

Having to list each and every possible way to stroke a given phrase like this would clearly take up quite a bit of space, so it would probably be pretty beneficial if the system could take in one handed input without considering the side.

Of course, to adapt Yawei as is, there are certain areas where the input side does matter. If both sides are given input, then the left and right have to be distinguished so that the left side is "first". More to the point, a small number of punctuation marks share the same code and differ only in which side they are input on. For example, -DGI is a full-width comma (，), whereas DGI- is a full-width period (。); -ZG is a list comma (、), and ZG- is a question mark (？).

One way to deal with this, perhaps, would be to have the majority of the phrases listed in the dictionary file without respect to hand side, but have the punctuation marks that require a specific side be specified in the dictionary. In other words, 是不是 shì bú shì "is [or] is not" could be given an entry like XZI B XZI (using spaces to mark syllable boundaries, and not stroke or side boundaries), but the comma would be listed as -DGI, with the side it occurs on as part of its entry. I am not sure how easy it would be to implement something like this, but this might be one way of dealing with the issue.
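The mixed approach could look roughly like this. Side-specific entries are checked first, and everything else is looked up by its syllables with hand and stroke boundaries removed. The dictionaries and function names are hypothetical, and the punctuation entries are the four mentioned above.

```python
# Hypothetical side-specific entries: same keys, different hand, different mark.
SIDED = {"-DGI": "，", "DGI-": "。", "-ZG": "、", "ZG-": "？"}
# Hypothetical side-agnostic entries, keyed by space-separated syllables.
PHRASES = {"XZI B XZI": "是不是"}

def to_syllables(outline):
    """Drop hand and stroke boundaries: 'XZI-/-B/XZI-' -> 'XZI B XZI'."""
    syllables = []
    for stroke in outline.split("/"):
        syllables.extend(part for part in stroke.split("-") if part)
    return " ".join(syllables)

def lookup(outline):
    """Check side-specific entries first, then fall back to syllables."""
    if outline in SIDED:
        return SIDED[outline]
    return PHRASES.get(to_syllables(outline))

print(lookup("-XZI/B-XZI"), lookup("DGI-"))  # 是不是 。
```

With this design, the twelve outlines listed earlier all reach the single entry "XZI B XZI".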

2. The character selection issue
Unlike English steno, there are times in Yawei steno when the input corresponds to a number of possible outputs, and the user is permitted (or obliged) to select the target output from a list of options using the screen and function keys on the machine. One way of dealing with this would be to include a number "index" for inputs that correspond to more than one output. The user would then simply stroke the number for the desired entry right after the stroke(s) for that word.

Consider the pair 收到 shōudào and 受到 shòudào, both of which can be translated as "receive" but have different uses/shades of meaning; these two would both be stroked as XZEO-DAO. In some cases it would be possible to rely on context to disambiguate the two if such phrases are included in the dictionary. For example, XZEO-DAO could resolve to 受到 shòudào if 影響 yǐngxiǎng "influence" follows, since 受到影響 shòudào yǐngxiǎng means "to receive influence (from)", while 收到影響 shōudào yǐngxiǎng wouldn't make much sense, as 收到 shōudào is used more for physical items. So, if XZEO-DAO/INE-XINO for 受到影響 shòudào yǐngxiǎng is included in the dictionary, then there is no need to choose between the two verbs.
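The context trick relies on preferring the longest matching multi-stroke entry, similar in spirit to how Plover favours longer dictionary matches. Below is a simplified greedy longest-match sketch in plain Python (not Plover's real translator, which also handles undo and retroactive re-translation); the sample dictionary is hypothetical.

```python
# Hypothetical sample dictionary; multi-stroke outlines are joined with "/".
DICTIONARY = {
    "XZEO-DAO": "收到",               # default reading when nothing else matches
    "XZEO-DAO/INE-XINO": "受到影響",  # context fixes the verb
    "INE-XINO": "影響",
}
MAX_STROKES = max(len(key.split("/")) for key in DICTIONARY)


def translate(strokes):
    """At each position, use the longest outline found in the dictionary."""
    output, i = [], 0
    while i < len(strokes):
        for n in range(min(MAX_STROKES, len(strokes) - i), 0, -1):
            outline = "/".join(strokes[i:i + n])
            if outline in DICTIONARY:
                output.append(DICTIONARY[outline])
                i += n
                break
        else:
            output.append(strokes[i])  # untranslated stroke passes through as-is
            i += 1
    return "".join(output)


print(translate(["XZEO-DAO"]))              # 收到
print(translate(["XZEO-DAO", "INE-XINO"]))  # 受到影響
```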

However, not all contexts can be anticipated, and there may be times when the user will have to manually pick which one is needed. For such cases, the user could rely on stroking something like XZEO-DAO/1 for 收到 shōudào and XZEO-DAO/2 for 受到 shòudào. Since the number bar is not used in Yawei, the top row of keys is otherwise free, and I think this would be a handy addition. Note that if the user doesn't select a specific output, i.e. they don't use the numbers, then I think it should just default to the first entry unless it can be otherwise disambiguated, as shown in the example above.

One problem with this could be the fact that the index given to a particular output is still arbitrary, which means that in some cases the user might have to simply guess which index corresponds to the target output until they have it memorized. If the "suggestions" window could be modified to show potential outputs, e.g. if the user strokes XZEO-DAO, then 收到 shōudào shows up under /1 and 受到 shòudào shows up under /2, that would reduce the amount of guesswork required by a great deal.
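A minimal sketch of the numbered-candidate idea, assuming each ambiguous outline maps to an ordered list of candidates: no number means the first candidate, a number stroke picks by 1-based index, and the same list can feed a suggestions-style display. The table and function names are hypothetical; this is not an existing Plover feature.

```python
# Hypothetical homophone table: each outline maps to an ordered candidate list.
CANDIDATES = {
    "XZEO-DAO": ["收到", "受到"],
}


def resolve(outline, index=None):
    """No number stroke -> first candidate; /1, /2, ... pick by 1-based index."""
    options = CANDIDATES.get(outline)
    if not options:
        return None
    if index is None:
        return options[0]
    if 1 <= index <= len(options):
        return options[index - 1]
    return None  # number stroke out of range


def suggestions(outline):
    """Lines a suggestions window could show for an ambiguous outline."""
    return [f"/{n}: {text}" for n, text in enumerate(CANDIDATES.get(outline, []), start=1)]


print(resolve("XZEO-DAO"), resolve("XZEO-DAO", 2))  # 收到 受到
print(suggestions("XZEO-DAO"))                     # ['/1: 收到', '/2: 受到']
```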

3. Choosing what items get briefed
The structure of briefs is rather mechanical/formulaic in Yawei: basically, each possible syllable is assigned up to three single-character briefs, up to two 2-character briefs, and one 3-character brief, as in the example below.

DINA 'dian'
X-DINA 點 diǎn "point, o'clock, order, dot, etc."
W-DINA 電 diàn "electric(ity)"
DINA-X 電腦 diànnǎo "computer"
DINA-W 電話 diànhuà "phone"
DINA/X-X 電視機 diànshìjī "television"
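Because the pattern is so regular, brief outlines for a syllable can be generated mechanically. The sketch below is a hypothetical helper that fills the slots into a JSON-style mapping from outline to text, which is the general shape of a Plover JSON dictionary (the Yawei-style key names here are assumed, not Plover's). The third single-character slot, XW-, is inferred from the description of single-character briefs later in this post.

```python
import json

# Slot patterns following the 'dian' example; "{s}" is the syllable code.
SLOTS = {
    "single_1": "X-{s}",
    "single_2": "W-{s}",
    "single_3": "XW-{s}",  # assumed third single-character slot
    "double_1": "{s}-X",
    "double_2": "{s}-W",
    "triple": "{s}/X-X",
}


def build_briefs(syllable, words):
    """Map each filled slot's outline to its word; unfilled slots are simply left out."""
    return {
        pattern.format(s=syllable): words[slot]
        for slot, pattern in SLOTS.items()
        if slot in words
    }


dian = build_briefs("DINA", {
    "single_1": "點", "single_2": "電",
    "double_1": "電腦", "double_2": "電話",
    "triple": "電視機",
})
print(json.dumps(dian, ensure_ascii=False, indent=2))
```

A syllable with gaps, such as 'ha' below, would just produce fewer entries, e.g. `build_briefs("XGA", {"triple": "哈爾濱"})`.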

For some syllables, however, there are few or even zero assigned briefs. For example, the syllable 'ha', stroked XGA, is only assigned a 3 character brief, namely 哈爾濱 Hā'ěrbīn "Harbin" (a city in northeast China). However, there are a number of other words beginning with 'ha' that could be included: the onomatopoeia for laughter is (unsurprisingly) 'hāhā' or 哈哈, so that could be given a 2 character brief; a commonly used Taiwanese word for frog is 蛤蟆 hámà, so that could also be included.

To be honest, I am not sure why there are such gaps in the brief system, but my best guess would be that students were expected to have these memorized already, and perhaps the designers of the textbook only wanted the students to focus on memorizing certain high-frequency words or technical vocabulary. It is also possible that the actual Yawei steno machine does fill in these gaps, but that they are simply not listed in the textbook.

One other point I'd like to bring up about the Yawei briefs is the overt emphasis on political vocabulary, and the fact that in some cases the phrases chosen for briefs are rather dated. For example, the 3 character brief for GIE 'jie' is 解放軍 jiěfàngjūn "Liberation Army". However, based on the Rime pinyin dictionary, which assigns each entry a weight based on how frequently it was used in a particular corpus (more on that below), I have found that the most common three character phrase starting with 'jie' is 接下來 jiēxiàlái "moving on, next, etc." Personally, I believe having that as the 3 character brief assigned to 'jie' is more appropriate, given that it has a greater range of use (many presenters use this kind of phrasing, but not everyone feels the need to bring up the Liberation Army all the time). Placing preference on Chinese political terms is also hardly neutral, and I would hope that the end result of this endeavor could be amenable to all sides of the Taiwan Strait.
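Picking a brief by frequency is essentially "filter by first syllable and length, then take the maximum weight". The sketch below assumes tab-separated `text`, `code`, `weight` lines, the column order used in Rime's `*.dict.yaml` files; the sample lines and their weights are invented for illustration and are not real Rime data.

```python
# Invented sample lines (text<TAB>code<TAB>weight); the weights are made up.
SAMPLE = "\n".join([
    "解放軍\tjie fang jun\t120",
    "接下來\tjie xia lai\t900",
    "節目\tjie mu\t5000",
    "# a comment line",
])


def parse_entries(text):
    """Return (text, [syllables], weight) tuples, skipping comments and non-entry lines."""
    entries = []
    for line in text.splitlines():
        if line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2 or not fields[1].strip():
            continue
        try:
            weight = float(fields[2]) if len(fields) > 2 else 0.0
        except ValueError:
            weight = 0.0  # simplification: non-numeric weights (e.g. percentages) count as 0
        entries.append((fields[0], fields[1].split(), weight))
    return entries


def top_phrase(entries, first_syllable, length):
    """Highest-weighted phrase with `length` characters whose first syllable matches."""
    matches = [e for e in entries if len(e[0]) == length and e[1][0] == first_syllable]
    return max(matches, key=lambda e: e[2])[0] if matches else None


entries = parse_entries(SAMPLE)
print(top_phrase(entries, "jie", 3))  # 接下來
```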

4. Building a dictionary based on Rime
The Rime Input Method Engine is an open-source collection of different input methods for Chinese all rolled into one. The Rime team has uploaded the fruits of their labor for anyone who would like to use them, including two (or possibly three) dictionary files used in the input methods. One of these dictionaries uses weights to control which outputs are ranked at the top (and thus more likely to appear), so that rare combinations are available but not prioritized.

What I've done is taken this weighted dictionary and combined it with the other dictionary file, which includes some of the other pronunciations used for certain characters and phrases. This is relevant to me as some of the pronunciations that are standard in Taiwan are not used in China, and again I would hope that the end result of all this is something that could be used by people on either side. So, after a bit of trial and error, along with a few false starts, I was able to combine them in such a way that the best parts of both are retained, i.e. a dictionary file ordered by frequency of usage with pronunciation that reflects both types of Standard Mandarin.
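The exact merging procedure isn't spelled out here, so the following is only one possible approach, sketched in Python: keep every weighted entry, add readings that appear only in the second (unweighted) file, let each added reading borrow the best weight of the same text, and sort by weight. The data uses the well-known 垃圾 lājī (Mainland) / lèsè (Taiwan) split, with invented weights.

```python
def merge(weighted, extra):
    """Combine (text, reading, weight) entries with unweighted (text, reading) entries.

    Readings found only in `extra` borrow the highest weight of the same text,
    so Taiwan and Mainland readings rank together. Ties keep their input order.
    """
    merged = {(text, reading): weight for text, reading, weight in weighted}
    best = {}
    for text, _reading, weight in weighted:
        best[text] = max(weight, best.get(text, 0.0))
    for text, reading in extra:
        merged.setdefault((text, reading), best.get(text, 0.0))
    return sorted(
        ((text, reading, weight) for (text, reading), weight in merged.items()),
        key=lambda entry: entry[2],
        reverse=True,
    )


weighted = [("垃圾", "la ji", 300.0), ("電話", "dian hua", 5000.0)]  # invented weights
extra = [("垃圾", "le se")]
for entry in merge(weighted, extra):
    print(entry)
```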

Using this merged dictionary file, I intend to come up with a list of briefs based on frequency of usage, avoiding proper names as the choice for 2 character and 3 character briefs. (For 4 character and 5+ character briefs, there is a greater range of possibilities, and it doesn't feel like it would be promoting certain phrases over others to have proper names.) I will also include this sample dictionary for everyone to take a look at. Because I don't know how the handedness and input part will work out, for entries with an even number of characters, each pair of characters will be stroked together with the side specified, and for entries with an odd number of characters, the final character will be left "ambiguous". See below for an example of each, where the hyphen (-) is used to mark the side.

Even number of characters:
GUA-XTINA/ZDI-XIA = 瓜田李下 guā tián lǐ xià
"in suspicious circumstances"

Odd number of characters:
GIA-ZDI/XBU-XBDI/IA = 加利福尼亞 jiālìfúníyà
"California"

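The rule for writing these outlines is simple enough to express as a one-line helper. This is a hypothetical Python sketch that takes the per-syllable Yawei codes as given (converting pinyin to Yawei codes is a separate step not shown): codes are paired left-right within each stroke, and an odd final code is written on its own with no side marker.

```python
def outline_for(codes):
    """Join syllable codes two per stroke ('L-R'); a leftover final code gets no hyphen."""
    strokes = ["-".join(codes[i:i + 2]) for i in range(0, len(codes), 2)]
    return "/".join(strokes)


print(outline_for(["GUA", "XTINA", "ZDI", "XIA"]))        # GUA-XTINA/ZDI-XIA
print(outline_for(["GIA", "ZDI", "XBU", "XBDI", "IA"]))   # GIA-ZDI/XBU-XBDI/IA
```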
I've also gone through and assigned single character briefs for each of the possible syllables (recall that this sort of brief uses X, W, or XW on one side and a syllable on the other to output a single character, e.g. X-IEO for 有 yǒu "have" and W-IEO for 又 yòu "again"), generally assigning the character with the greatest weight to the first brief, the character with the second-greatest weight to the second brief, and so on. In a few places, I felt it was a pity that Yawei steno doesn't take advantage of more of these briefs for high-frequency characters that would benefit from a direct way to be typed, so I added many such briefs that are not present in the original Yawei theory. Note that this does not cause any conflicts, but it is a somewhat significant difference.
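The weight-ordered assignment described above can be sketched as follows, assuming the characters for one syllable and their weights have already been collected (the helper name and the weights in the example are hypothetical).

```python
SINGLE_SLOTS = ["X-{s}", "W-{s}", "XW-{s}"]


def single_character_briefs(yawei_code, characters):
    """Give the heaviest characters for a syllable the X-, W- and XW- briefs, in that order.

    `characters` is a list of (character, weight) pairs sharing the same syllable;
    if there are fewer than three, the remaining slots stay unused.
    """
    ranked = sorted(characters, key=lambda pair: pair[1], reverse=True)
    return {slot.format(s=yawei_code): char for slot, (char, _weight) in zip(SINGLE_SLOTS, ranked)}


print(single_character_briefs("IEO", [("又", 400.0), ("有", 9000.0)]))
# {'X-IEO': '有', 'W-IEO': '又'}
```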

Eventually I will work through the other types of briefs, but that will take a bit of time, and I'd like to let others have a sample dictionary to potentially work from, so that will have to wait a little bit.

5. Selecting characters by shape + English output via translation
Two features of Yawei steno that might take quite a while to implement include the function where the user can type a particular character by making reference to its shape and the Chinese-to-English output function. The former involves first stroking XN-XN to enter into "shape code" mode, which means that all of the following strokes will be interpreted as referring to the shapes of a particular character and not the phonetic form, until the user strokes XN-XN again, returning the machine back to its normal mode. Reusing an example from my previous write up, to type the rather rare character 趑 zī (the first character in 趑趄 zījū "walk with difficulty"), whose pronunciation may not be known to the user, one can first stroke XN-XN to enter into this mode, then stroke DZEO-BDZ, where DZEO 'zou' refers to 走 zǒu, which occurs on the left side of the character, and BDZ 'ci' refers to 次 cì, which is on the right.
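The shape-code mode amounts to a translator with two dictionaries and a toggle stroke. Here is a minimal, hypothetical Python sketch (not a Plover plugin): XN-XN flips the mode and outputs nothing, and other strokes are looked up in whichever dictionary is active. The two tiny dictionaries stand in for the specialized shape data discussed next.

```python
class ModalTranslator:
    """Hypothetical translator with a phonetic mode and a shape-code mode."""

    TOGGLE = "XN-XN"  # stroke that switches modes

    def __init__(self, phonetic, shape):
        self.phonetic = phonetic
        self.shape = shape
        self.shape_mode = False

    def stroke(self, outline):
        """Translate one outline; the toggle stroke itself produces no text."""
        if outline == self.TOGGLE:
            self.shape_mode = not self.shape_mode
            return ""
        table = self.shape if self.shape_mode else self.phonetic
        # Unknown outlines are echoed back untranslated.
        return table.get(outline, outline)


translator = ModalTranslator(
    phonetic={"DZEO": "走"},
    shape={"DZEO-BDZ": "趑"},  # 走 on the left + 次 on the right
)
text = "".join(translator.stroke(s) for s in ["XN-XN", "DZEO-BDZ", "XN-XN", "DZEO"])
print(text)  # 趑走
```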

Implementing this feature requires the use of a highly specialized dictionary, which may be extracted from the official Yawei software, but I am not confident in my ability to extract such data easily. The alternative is then to build one from scratch, but such a feature seems best left for later, as it would be an incredible time sink.

The Chinese-to-English function is potentially slightly more straightforward. Here, the user strokes a phrase in Chinese (I think that the upper limit on the phrase is four characters, which means that it seems possible to use phrases longer than a single stroke), then strokes XUEO-XBW to replace that phrase of Chinese with its corresponding English translation. A few hypothetical examples using English words commonly used by Taiwanese people in their Mandarin are given below:

Care: DZIO-I/XUEO-XBW (cf. 在意 zàiyì)
Either: IAO-XBE/XUEO-XBW (cf. 要麼 yàome)
Logo: BIAO-Z/XUEO-XBW (cf. 標誌 biāozhì)
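A simplified sketch of the replace-with-English step, using the hypothetical examples above. It assumes the English table is keyed on the Chinese text that was just output (which is how the Yawei function appears to behave), that only the most recent translation is replaced, and that the first listed English form is used when there are several. All names and tables are hypothetical.

```python
# Hypothetical sample tables built from the examples above.
PHONETIC = {"DZIO-I": "在意", "IAO-XBE": "要麼", "BIAO-Z": "標誌"}
ENGLISH = {"在意": ["care"], "要麼": ["either"], "標誌": ["logo"]}
TO_ENGLISH = "XUEO-XBW"


def run(strokes):
    """Return output chunks; TO_ENGLISH swaps the previous Chinese chunk for English."""
    chunks = []
    for outline in strokes:
        if outline == TO_ENGLISH:
            # Keyed on the Chinese text already output, not on its phonetic code.
            if chunks and chunks[-1] in ENGLISH:
                chunks[-1] = ENGLISH[chunks[-1]][0]  # first listed option by default
            continue
        chunks.append(PHONETIC.get(outline, outline))
    return chunks


print(" ".join(run(["DZIO-I", "XUEO-XBW", "BIAO-Z"])))  # care 標誌
```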

On the Yawei software GUI, it shows a number of sample English words, many with multiple Chinese translations. Likewise, for a single Chinese translation, there may be multiple corresponding English outputs. For example, 吃 chī "to eat" corresponds to "ate", "eat", and "eaten", and I'm not sure how the user is meant to choose which one is their target output -- perhaps the function keys are also used here to choose the output?

One slightly confusing aspect of this function is that it appears to take Chinese output as the input to the English translation. I have come to this conclusion based on entries like the one for "pillow", where all of the Chinese translations would be ambiguous if it only took the phonetics into account, as shown below:

Translations listed for "pillow":
枕頭 zhěntóu = ZN-XDEO (could also be 針頭 zhēntóu "syringe")
枕 zhěn = ZN (could also be 真 zhēn "true", 針 zhēn "needle", 震 zhèn "earthquake", etc.)
擱 gē = G (could also be 各 gè "each", 歌 gē "song", 隔 gé "separate", etc.)
墊 diàn = DINA (could also be 點 diǎn "dot", 電 diàn "electricity", 顛 diān "top", etc.)

Development of a streamlined version of this feature could be sped up by adapting one of the open-source Chinese-English dictionaries, which would certainly be much faster than writing one from scratch. That being said, while Chinese-to-English functionality seems like it would be a nice feature, I don't think that it is a particularly urgent addition.

6. The name
Because the outcome of this adaptation won't necessarily be exactly the same as Yawei steno (possibly changing out the briefs, not having all the same features, etc.), I suggest giving Chinese steno for Plover another name. I have a few ideas, and I am not fully committed to any particular one, but I'll go through what I think could be good choices.

My initial thought was to use the Chinese translation of "plover", which is 鴴 héng, and I quite like this character for a few reasons. Composed of the characters 行 xíng/háng "to go / line, profession" and 鳥 niǎo "bird", it visually evokes speed and business, and also, you know, a bird! Both of these elements seem to fit in well with the Plover logo and all that. As a side note, 行 xíng also describes a kind of Chinese semi-cursive calligraphy (行書 xíngshū) characterized by connecting many of the strokes. It is not, however, the quick, shorthand-like style used by early court "stenographers" in imperial China, as I had originally thought. (This style is called 草書 cǎoshū "rough script".)

After giving it a bit more thought, I decided it would be a little awkward to have just a single character as the name, since in Mandarin most names are at least two characters long. Also, single characters are not necessarily easy to type without selection, so rounding it out to a two-character compound might be for the best. So far, I've landed on 燕鴴 yànhéng (lit. "swallow plover"), which is not actually a plover but a pratincole; still a cute bird, though! The oriental pratincole (Glareola maldivarum), also called the eastern collared pratincole, is found all over East Asia, and given the cultural influence Chinese has had in the region, it seems like a fitting choice.

In addition, romanizing the name as Yanheng gives a slight nod to two Chinese steno theories: Yawei and Suoheng. Of course, the Chinese steno for Plover system that I am proposing is going to take more from Yawei than Suoheng, especially since the Suoheng keyboard layout is not entirely compatible, but those in the know might recognize the connection to these established steno theories.

********
That's it for now! If anyone has any more questions, please let me know and I'll try to get back to you asap. I'll also continue working on setting up the briefs and other codes from Yawei, so that hopefully it'll all come together with a good, working dictionary.
yanheng sample dict.json

Terry Waltz

May 15, 2017, 4:21:59 PM
to Plover
Please keep me in the loop with this. I have been looking for a means to implement the Yawei-like system for some time now for Chinese.

Terry

Diego Pomeranec

Aug 31, 2017, 5:22:11 PM
to Plover
Hello everyone.
Let me introduce myself. I am Diego Pomeranec from Argentina, the founder of Pomeranec Media. We provide Spanish/English/Portuguese real-time closed captioning services for live events such as congresses, conferences, theater, etc.
Right now, I am looking to hire two Chinese captioners for my client's events next year (May 3rd to 5th, 2018, for 24 hours).
Can anyone help me find two Chinese captioners?
Regards
Diego Pomeranec

Selena Stehn

Aug 31, 2017, 7:49:32 PM
to plove...@googlegroups.com
Hi, Diego.

You can check with Jade King via Facebook. She does know Chinese captioners. I am not positive, but I think she may provide them as well. Stanley Sakai may know if she does, or may know of some as well.

-Selena

Sent from my iPhone
--

Diego Pomeranec

Aug 31, 2017, 7:57:00 PM
to plove...@googlegroups.com
Thank you for your help. I haven't found her on Facebook. Can you send me her FB profile?
Regards
Diego

Sent from Diego Pomeranec's iPhone

On Aug 31, 2017, at 20:49, Selena Stehn <sst...@live.com> wrote:

> Jade King

Selena Stehn

Aug 31, 2017, 8:02:54 PM
to plove...@googlegroups.com
Her website is: jadeluxe.wordpress.com.

Her FB contact info is there, as well, I believe.



Sent from my iPhone