The Phonetics-Enhanced English (PEE) Scheme ver 0.01
Quick Examples
Full-fledged: A qu̹ic̹k brőw̵n fox jumps ōve͠r t̹h̵è lāzy dog.
Lite: A qu̹ick brőwn fox jumps ōver t̹hè lāzy dog.
Usage
Users are meant to learn the PEE scheme "by example": they work out
how new words sound from the diacritics they have already seen in
known words. They don't need to study the rules of the scheme
systematically, although gradual rule instruction in the context of
reading (by an automatic code-switching system) would certainly help.
Depending on the user's level of English phonics knowledge, lighter
("lite") versions of this scheme can be used. Moreover, words and word
parts the user has already acquired (e.g. -tion) don't need diacritics.
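One way a lite rendering could be produced automatically is sketched below in Python. It is only an illustration under stated assumptions: the function names `strip_pee` and `lite`, the `known_words` set, and the word-by-word policy (a known word loses all of its marks at once) are not part of the scheme. The sketch relies on the standard `unicodedata` module and treats every nonspacing mark (category `Mn`) as a PEE diacritic.

```python
import unicodedata

# Assumption: stroked letters have no Unicode decomposition, so they are
# mapped back to plain letters by hand.
STROKED = {"\u024d": "r", "\u0268": "i"}
SEPARATOR = "\u00b7"  # syllable separator

def strip_pee(text):
    """Return text with all combining marks and separators removed."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch == SEPARATOR or unicodedata.category(ch) == "Mn":
            continue
        out.append(STROKED.get(ch, ch))
    return "".join(out)

def lite(text, known_words):
    """Strip marks only from words the reader already knows."""
    result = []
    for token in text.split(" "):
        plain = strip_pee(token)
        key = plain.strip(".,;:!?\"'()").lower()
        result.append(plain if key in known_words else token)
    return " ".join(result)
```

For instance, `lite(sentence, {"quick"})` would leave every word of a marked-up sentence untouched except "quick", which is reduced to its plain spelling.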
Design Remarks
Version 0.01 is a working but not optimized scheme. Some diacritics
used in this version could be replaced by ones that are visually
clearer in some fonts, and diacritic assignment could be more
scientific, logical and memorable.
Diacritics can be assigned in several ways: (1) each diacritic
corresponds to a phoneme, regardless of the letter modified; (2) each
diacritic corresponds to a certain phonetic aspect, regardless of the
letter modified; (3) each diacritic is just an arbitrarily chosen
symbol to differentiate a letter's possible phonemes. This version
uses all three of these approaches.
Encoding
PEE is based on Unicode.
Unicode often provides both combining characters, which add a
diacritic to the character before them, and "pre-composed" characters,
which are single code points for letters that already carry a
diacritic. We should prefer whichever form renders better in most
fonts. For example, the combining sequence H̵ (H + U+0335) looks less
disturbing than the pre-composed Ħ (U+0126).
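Whether a letter plus combining mark has a pre-composed equivalent can be checked with Unicode normalization. The Python sketch below uses the standard `unicodedata` module; the helper name `precomposed_form` is hypothetical. Note one consequence for this scheme: U+0340 and U+0341 are canonically equivalent to U+0300 and U+0301, so any text that passes through NFC or NFD normalization will have them replaced.

```python
import unicodedata

def precomposed_form(base, mark):
    """Return the single pre-composed character for base + mark, or None."""
    composed = unicodedata.normalize("NFC", base + mark)
    return composed if len(composed) == 1 else None

# i + U+0301 composes to U+00ED; H + U+0335 has no composed form,
# because U+0126 is a separate letter with no canonical decomposition.
print(precomposed_form("i", "\u0301") == "\u00ed")
print(precomposed_form("H", "\u0335") is None)
```

A rendering pipeline that wants to keep the combining sequences chosen here should therefore avoid normalizing PEE text, or should store the preferred form after normalization.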
Combining characters are listed on Wikipedia:
http://en.wikipedia.org/wiki/Combining_character
Pre-composed characters can be found by visiting the Wikipedia page
for a basic Latin letter (e.g. http://en.wikipedia.org/wiki/A_%28letter%29)
and then looking at "Letter <X> with diacritics" at the bottom of the
page, where <X> is that basic letter.
All phonetic transcriptions in [...] in this document are in IPA
(International Phonetic Alphabet).
The Scheme
1. Unrepresentable or Variable Sounds (UNREP/VAR)
Example: bu͂siness
A "~" above (U+0342, which is clearer than U+0303 in some fonts) or
below (U+0330) a vowel/consonant letter means either that this
letter's sound can't be represented by diacritics in this version
(because it is a rare exception to English orthography or occurs in a
loanword), or that the letter's sound varies with context (for
example, the "ea" in "read" sounds different depending on whether
"read" is in the present tense or is a past tense/past participle).
Pre-composed characters are preferred if they display better in most
fonts.
UNREP/VAR always appears above a vowel letter (a͂, e͂, i͂, o͂, u͂, w͂,
y͂), and usually appears below a consonant letter. If there is not
enough space below a consonant letter (e.g. g), it appears above the
letter.
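The placement rule above can be sketched in Python as follows. This is an illustration only: the function name `unrep_var_mark` is hypothetical, and the set of consonants treated as having no space below (`g`, `j`, `p`, `q`) is an assumption, since the scheme itself names only "g".

```python
# Vowel letters as listed by the scheme for UNREP/VAR.
VOWEL_LETTERS = set("aeiouwy")
# Assumption: consonants with descenders; the scheme only mentions "g".
NO_SPACE_BELOW = set("gjpq")

MARK_ABOVE = "\u0342"  # "~" above
MARK_BELOW = "\u0330"  # "~" below

def unrep_var_mark(letter):
    """Return the letter followed by UNREP/VAR in its preferred position."""
    lower = letter.lower()
    if lower in VOWEL_LETTERS or lower in NO_SPACE_BELOW:
        return letter + MARK_ABOVE
    return letter + MARK_BELOW
```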
UNREP/VAR does not affect vowel/consonant letters around the letter
modified.
2. Silences
Example: také
Diacritics for silence do not affect the vowel/consonant letters
around the letter modified.
2.1. Single-Letter Silence
A "/" above (U+0341, which is clearer than U+0301 in some fonts) or
below (U+0317) a vowel/consonant letter silences this letter.
Pre-composed characters are preferred if they display better in most
fonts. An example is í (U+00ED).
The "/" always appears above a vowel letter (á, é, í, ó, ú, ẃ,
ý), and usually appears below a consonant letter. If there is not
enough space below a consonant letter (e.g. g), it appears above the
letter.
A short "-" inside (U+0335) a letter can also silence this letter. It
serves in cases where we want to avoid excessive separate diacritics.
Examples include "how̵", "hey̵" and "dooɍ".
Pre-composed characters are preferred if they display better in most
fonts. An example is ɍ (U+024D). Where the combining sequence displays
better, it is kept instead, as with H̵ (H + U+0335).
2.2. Double-Letter Silence
A reverse arch above (U+035D) two letters silences these letters.
Examples include rig͝ht and chequ͝e.
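As a rough illustration of how the silence marks could be processed, the Python sketch below removes every letter that carries one. The function name `drop_silent` is hypothetical, and the sketch assumes the usual Unicode ordering in which a double diacritic such as U+035D follows the first of the two letters it spans. It works on the NFD form, where U+0341 has already been replaced by U+0301.

```python
import unicodedata

SINGLE_SILENCE = {"\u0301", "\u0317", "\u0335"}  # NFD turns U+0341 into U+0301
DOUBLE_SILENCE = "\u035d"
R_WITH_STROKE = "\u024d"  # pre-composed silent "r"

def drop_silent(word):
    """Return word without the letters marked as silent."""
    clusters = []  # each entry: [base letter, [combining marks]]
    for ch in unicodedata.normalize("NFD", word):
        if clusters and unicodedata.category(ch) == "Mn":
            clusters[-1][1].append(ch)
        else:
            clusters.append([ch, []])

    silent = [False] * len(clusters)
    for i, (base, marks) in enumerate(clusters):
        if base == R_WITH_STROKE or SINGLE_SILENCE.intersection(marks):
            silent[i] = True
        if DOUBLE_SILENCE in marks:
            # The double mark sits on this letter and spans the next one.
            silent[i] = True
            if i + 1 < len(clusters):
                silent[i + 1] = True

    kept = "".join(
        base + "".join(marks)
        for (base, marks), is_silent in zip(clusters, silent)
        if not is_silent
    )
    return unicodedata.normalize("NFC", kept)
```

The result is the sequence of sounded letters, with any other diacritics left in place, which could then be passed on to further steps such as vowel lookup.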
3. Stress (STR)
Example: wikipẹdia
A "." below (U+0323) a vowel letter (ạ, ẹ, ị, ọ, ụ, ẉ, ỵ) means
the syllable this vowel letter belongs to is stressed.
Pre-composed characters are preferred if they display better in most
fonts. Examples include ị (U+1ECB) and ỵ (U+1EF5).
A multi-syllable word without STR has variable stress, e.g.
"present", which is stressed differently as a noun and as a verb.
4. Syllable Separator
Example: a·way
A "·" (U+00B7) between two characters separates two syllables. It is
necessary if a word's spelling is not left-associative, i.e. if
grouping letters greedily from the left would put a letter in the
wrong syllable. For example, "away" should be separated as "a·way"
instead of "aw·ay".
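The separator and the stress mark can be read together by a program. The Python sketch below is a minimal illustration; the function names are hypothetical, and a word carrying both a separator and STR (used in the test of this sketch) is an assumed combination that does not appear among the examples in this document.

```python
import unicodedata

SEPARATOR = "\u00b7"  # syllable separator
STRESS = "\u0323"     # combining dot below (STR)

def syllables(word):
    """Split a word at the explicit syllable separators."""
    return word.split(SEPARATOR)

def stressed_syllable(word):
    """Index of the first syllable carrying STR, or None if unmarked."""
    for index, syllable in enumerate(syllables(word)):
        if STRESS in unicodedata.normalize("NFD", syllable):
            return index
    return None
```

Normalizing to NFD first means the check works whether the dot below was typed as a combining mark or as a pre-composed letter.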
5. Vowels
All vowel diacritics affect vowel letters around the letter modified.
5.1. Schwas
5.1.1. Single-Letter Short Schwa
A "\" above (U+0340, which is clearer than U+0300 in some fonts) a
vowel letter (à, è, ì, ò, ù, ẁ, ỳ) means that letter, along with
vowel letters around it, has a schwa sound (IPA [ə]). An example is
"wikipedià".
Pre-composed characters are preferred if they display better in most
fonts. An example is ì (U+00EC).
5.1.2. Double-Letter Short Schwa
A "⁓" above (U+0360) "ar", "er", "ir", "or", "ur" and "re" (a͠r, e͠r,
i͠r, o͠r, u͠r, r͠e) means it is a schwa sound. An example is worke͠r.
5.1.3. Double-Letter Long Schwa
An arch above (U+0361) "ar", "er", "ir", "or", "ur" and "yr" (a͡r,
e͡r, i͡r, o͡r, u͡r, y͡r) means it is a long schwa sound (IPA [əː]). An
example is wo͡rk.
5.2. Short Vowels
Without diacritics, the vowel letters a, e, i/y, o and u sound [æ],
[e], [i], [ɔ] and [ʌ], e.g. "bat", "bet", "bit"/"gym", "bot" and
"but".
Diacritic-equipped vowel letters for short vowels are categorized into
four "classes". The first class has the same sounds as the above
diacritic-free letters do.
In each class below, the slots are in the order a, e, i/y, o, u; "-"
marks a slot with no letter assigned.
1st-Class Short (U+0306): ă [æ], ĕ [e], ĭ/y̆ [i], ŏ [ɔ], ŭ [ʌ]
(e.g. băt, bĕt, bĭt/gy̆m, bŏt, bŭt)
2nd-Class Short (U+0307): ȧ [e], ė [i], -, ȯ [ʌ], u̇ [u]
(e.g. ȧny, dėsign, sȯme, pu̇t)
3rd-Class Short (U+0311): ȃ [i], ȇ [ɔ], -, ȏ [u], -
(e.g. privȃte, ȇncore, bȏok)
4th-Class Short (U+030D): a̍ [ɔ], -, -, -, -
(e.g. swa̍p)
Laid out as a grid, with the classes as rows and the letters a, e,
i/y, o and u as columns, letters on the same anti-diagonal have the
same sound. For example, ȃ, ė and ĭ/y̆ all sound [i].
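The anti-diagonal observation can be checked mechanically. The Python sketch below encodes the four short-vowel classes as rows and the letters a, e, i/y, o, u as columns, with `None` for unassigned slots; the names `SHORT_VOWELS` and `anti_diagonal_sounds` are hypothetical.

```python
# Rows: 1st to 4th class. Columns: a, e, i/y, o, u. None = no letter assigned.
SHORT_VOWELS = [
    ["æ", "e", "i", "ɔ", "ʌ"],         # U+0306
    ["e", "i", None, "ʌ", "u"],        # U+0307
    ["i", "ɔ", None, "u", None],       # U+0311
    ["ɔ", None, None, None, None],     # U+030D
]

def anti_diagonal_sounds(table):
    """Group the sounds of all assigned slots by anti-diagonal (row + column)."""
    groups = {}
    for row, sounds in enumerate(table):
        for col, sound in enumerate(sounds):
            if sound is not None:
                groups.setdefault(row + col, set()).add(sound)
    return groups

groups = anti_diagonal_sounds(SHORT_VOWELS)
# True when every anti-diagonal carries exactly one sound.
print(all(len(sounds) == 1 for sounds in groups.values()))
```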
5.3. Long Vowels
For convenience, [juː] and [ju] are considered long vowels in this
section.
Diacritic-equipped vowel letters for long vowels are also categorized
into four classes.
1st-Class Long (U+0304): ā [ei], ē [iː], ī/ȳ [ai], ō [əu], ū/w̄ [juː]
(e.g. tāke, mēet, līght/mȳ, gō, hūge/new̄)
2nd-Class Long (U+0308): ä [ɑː], ë [ei], ï/ÿ [iː], ö [ɔː], ü/ẅ [uː]
(e.g. cär, ëight, machïne/quaÿ, förce, blüe/jeẅ)
3rd-Class Long (U+030F): ȁ [ɔː], -, -, ȍ [uː], ȕ [ju]
(e.g. tȁll, fȍod, cȕre)
4th-Class Long (U+030B): -, -, -, ő [au], -
(e.g. rőund)
("-" marks a slot with no letter assigned; slots are in the order a,
e, i/y, o, u.)
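A program could look up the sound of a long-vowel letter by decomposing it into base letter and diacritic, so that pre-composed and combining forms are treated alike. The Python sketch below transcribes the four long-vowel classes into a dictionary; the names `LONG_VOWELS` and `long_vowel_ipa` are hypothetical, and the sketch handles a single marked letter only.

```python
import unicodedata

MACRON, DIAERESIS, DOUBLE_GRAVE, DOUBLE_ACUTE = "\u0304", "\u0308", "\u030f", "\u030b"

# (base letter, diacritic) -> IPA, transcribed from the four classes.
LONG_VOWELS = {
    ("a", MACRON): "ei", ("e", MACRON): "iː", ("i", MACRON): "ai",
    ("y", MACRON): "ai", ("o", MACRON): "əu", ("u", MACRON): "juː",
    ("w", MACRON): "juː",
    ("a", DIAERESIS): "ɑː", ("e", DIAERESIS): "ei", ("i", DIAERESIS): "iː",
    ("y", DIAERESIS): "iː", ("o", DIAERESIS): "ɔː", ("u", DIAERESIS): "uː",
    ("w", DIAERESIS): "uː",
    ("a", DOUBLE_GRAVE): "ɔː", ("o", DOUBLE_GRAVE): "uː", ("u", DOUBLE_GRAVE): "ju",
    ("o", DOUBLE_ACUTE): "au",
}

def long_vowel_ipa(letter):
    """Return the IPA for one long-vowel letter, or None if it is not one."""
    decomposed = unicodedata.normalize("NFD", letter)
    if len(decomposed) != 2:
        return None
    return LONG_VOWELS.get((decomposed[0], decomposed[1]))
```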
6. R's
Diacritics for "r" do not affect the vowel/consonant letters around
the letter modified.
The default "r" (without any diacritic) has the [r] sound.
A right-half circle below (U+0339) "r" (r̹) means this "r" sounds
[ər]. An example is exper̹ience.
A "-" in "r" (ɍ, U+024D) means this "r" is silenced.
7. Consonants
Consonant diacritics do not affect the vowel/consonant letters around
the letter modified.
7.1. [tʃ], [ʃ] and [ʒ]
Graphemes for [tʃ], [ʃ] and [ʒ] are marked with U+032F, U+032E and
U+0331 respectively:
[tʃ]: ch, c̯h̵, t̯, t̯c͝h, t̯ɨ, c̯, c̯z̵, t͝sc̯h̵
[ʃ]: sh, s̮h̵, t̮ɨ, c̮ɨ, s̮s̮ɨ, s̮ɨ, s̮s̮, c̮h̵, s̮, s̮c̮ɨ, c̮é,
s̮ch̵, s̮c̮
[ʒ]: s̱ɨ, s̱, ẕ, ẕh̵, ṯɨ, s̱h̵
7.2. X's
The default "x" (without any diacritic) has the [ks] sound.
A right-half circle below (U+0339) "x" (x̹) means this "x" sounds
[gz]. An example is ex̹ample.
A "\" below (U+0316) "x" (x̖) means this "x" sounds [kʃ]. An example
is anx̖ɨous.
A left-half circle below (U+031C) "x" (x̜) means this "x" sounds [z].
An example is x̜ylophone.
7.3. Other Consonants
Graphemes for other consonants that may need diacritics are below:
[t]: éd̖
[ɡ]: g, gg, gu͝e, gh̵
[k]: c̹, k, c̹k, c̹h̵, c̹c̹, qu̵, q, c̹q, c̹u̵, qu͝e, kk, kh̵
[ŋ]: ng, n̹g̵, n̹, n̹g̵u͝e, n̹g̵h̵
[f]: f, ph, p̀h̵, ff, ğh̵
[v]: v, vv, f̹
[θ]: th, t̜h̵, c͝ht̜h̵, p͝ht̜h̵, t̜t̜h̵
[ð]: t̹h, t̹h̵
[s]: s, c, ss, sc, st̵, p̵s, sc͝h, cc, sé, cé
[z]: s̹, z, x̜, zz, s̹s̹, zé
[dʒ]: g̀, j, d̹g̀, d̹g̀é, d̹, d̹ɨ, g̀ɨ, g̀é, d̹j, g̀g̀
[w]: u̹