On 2015-04-19 at 16:18, TR NS wrote:
> When I first learned about loglan/lojban I thought all words were of the
> form `CVCCV` or `CCVCV`, and compound words (lujvo) were always
> `(CVC|CCV)...CV`.
That seems correct for original, pre-rafsi Loglan. Words with other
shapes were invented later.
> I knew names had more flexibility. Later I learned
> that borrowed words did too. Even later I learned that compound words
> too had more flexible rules, e.g. hyphen letters. And finally in the end,
> staring me in the face, it dawned on me that even simple words like
> `brivla` did not fit the original formations.
Par for the course. Most of the learning materials leave the morphology
unexplained, and even the CLL leaves some questions open.
> After reading a fair bit, I find myself more confused than ever. Is
> there any single set of definitive rules defining what is and what isn't
> a legal lojban word?
Each parser has its own set of word formation rules. Most words in use
are recognized by all parsers but there are edge cases. I'll try to lay
out how camxes does it, since the existing documents explain it a little
disjointedly.
------ MEZOHE'S CONDENSED CLL 2.0 MORPHOLOGY CHAPTER COUNTERFEIT ------
=== Phonemes ===
At the most basic level, an utterance is made of phonemes. Here are the
main classes of phonemes (there are subclasses as seen later):
- consonants {zunsna}:
bdgjvz (voiced), cfkpstx (unvoiced), lmnr (syllabic)
- glides {karmlisna}: i u
- h {me'o .y'y}: '
- word break (glottal stop) {depybu'i}: .
- vowels {karsna}: a e i o u
- diphthongs: au ai ei oi
- y {me'o .ybu}: y
The comma {me'o slaka bu} isn't a phoneme, but is used to separate
syllables for clarity. Removing it has no effect.
i and u are vowels, unless a vowel, diphthong, or y follows, in which
case they are glides. Glide-diphthong pairs win over glide-vowel pairs,
which win over diphthongs: {iai} is glide i + ai, and {aia} is a + glide
i + a, not ai + a.
At this level, strings of consonants follow these rules:
- consonants can be next to consonants, word breaks, vowels,
diphthongs, and y
- no consonant can be followed by itself
- voiced consonants can't be next to unvoiced ones, and vice versa
(syllabic consonants are neither, so they can be next to both)
- sibilants (cjsz) can't be next to each other
- x can't be next to c or k
- the substrings mz, nts, ntc, ndz, ndj are not allowed
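A rough Python sketch of the consonant rules in the list above may help. This is not how camxes implements them, and the function names are made up for illustration. It only looks at a run of consonants, not at the vowels or word breaks around it.

```python
VOICED = set("bdgjvz")
UNVOICED = set("cfkpstx")
SIBILANTS = set("cjsz")
FORBIDDEN_SUBSTRINGS = ("mz", "nts", "ntc", "ndz", "ndj")


def pair_ok(a: str, b: str) -> bool:
    """Check two adjacent consonants against the pairwise rules."""
    if a == b:
        return False  # no consonant can be followed by itself
    if (a in VOICED and b in UNVOICED) or (a in UNVOICED and b in VOICED):
        return False  # voicing mismatch (l, m, n, r are in neither set)
    if a in SIBILANTS and b in SIBILANTS:
        return False  # sibilants can't be next to each other
    if "x" in (a, b) and (a in "ck" or b in "ck"):
        return False  # x can't be next to c or k
    return True


def consonant_string_ok(run: str) -> bool:
    """Check a run of consonants (no vowels in it) against all the rules."""
    if any(bad in run for bad in FORBIDDEN_SUBSTRINGS):
        return False
    return all(pair_ok(a, b) for a, b in zip(run, run[1:]))
```

For instance, consonant_string_ok("st") holds, while "sd" (voicing), "cs" (sibilants), "kx", and "nts" are all rejected.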
Glides must follow a word break, vowel, diphthong, or y, and be followed
by a vowel, diphthong, or y. i as a glide can't follow a diphthong
ending in i, and u as a glide can't follow the diphthong au.
h can't be next to a consonant, glide, or glottal stop.
Vowels, diphthongs, and y can be next to consonants, glides, h, and word
breaks.
=== Syllables ===
These are the shapes syllables {slaka} can have:
* Vowel syllable
- a word break, a glide, or up to three consonants
- then a vowel or a diphthong
- then optionally a consonant
- e.g. .a, spa, pan, blaif, stra
* h-syllable
- the letter '
- then a vowel or diphthong
- then optionally a consonant
- e.g. 'u, 'ei, 'am
* y-syllable
- a word break, a glide, or up to three consonants
- then the letter y
- e.g. by, .y, gry, zbly
* hy-syllable
- the string "'y"
* consonantal syllable {zunsnaslaka}
- a consonant
- then a syllabic consonant
- e.g. fl, sm, rn
When a syllable starts with more than one consonant, the rules for these
clusters {zunsnagri} are more restrictive than the general ones above.
These are the permissible initial doubles, stolen with love from CLL:
pl pr fl fr
bl br vl vr
cp cf ct ck cm cn cl cr
jb jv jd jg jm
sp sf st sk sm sn sl sr
zb zv zd zg zm
tc tr ts kl kr
dj dr dz gl gr
ml mr xl xr
And the permissible initial triples:
cfr cfl sfr sfl jvr jvl zvr zvl
cpr cpl spr spl jbr jbl zbr zbl
ckr ckl skr skl jgr jgl zgr zgl
ctr str jdr zdr
cmr cml smr sml jmr jml zmr zml
When segmenting text into syllables, when a consonant could possibly
either start a syllable or end one, it's always taken to start one. In
other words, onsets are greedy, codas are lazy.
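The cluster tables and the greedy-onset rule can be sketched in Python as follows. This is an illustration under my own assumptions, not the camxes grammar: it only splits a run of consonants sitting between two vowel nuclei, it ignores consonantal syllables, and the names is_onset and split_medial are hypothetical.

```python
INITIAL_PAIRS = set("""
pl pr fl fr bl br vl vr
cp cf ct ck cm cn cl cr
jb jv jd jg jm
sp sf st sk sm sn sl sr
zb zv zd zg zm
tc tr ts kl kr
dj dr dz gl gr
ml mr xl xr
""".split())

INITIAL_TRIPLES = set("""
cfr cfl sfr sfl jvr jvl zvr zvl
cpr cpl spr spl jbr jbl zbr zbl
ckr ckl skr skl jgr jgl zgr zgl
ctr str jdr zdr
cmr cml smr sml jmr jml zmr zml
""".split())

CONSONANTS = set("bcdfgjklmnprstvxz")


def is_onset(cluster: str) -> bool:
    """True if the cluster may start a syllable, per the tables above."""
    if len(cluster) == 1:
        return cluster in CONSONANTS
    if len(cluster) == 2:
        return cluster in INITIAL_PAIRS
    if len(cluster) == 3:
        return cluster in INITIAL_TRIPLES
    return False


def split_medial(run: str):
    """Split consonants between two nuclei into (coda, onset).

    The longest valid onset is tried first (greedy onset); the coda may
    hold at most one consonant. Returns None if no split works.
    """
    for onset_len in range(min(3, len(run)), 0, -1):
        cut = len(run) - onset_len
        coda, onset = run[:cut], run[cut:]
        if len(coda) <= 1 and is_onset(onset):
            return coda, onset
    return None
```

With this, the "st" of pastu goes entirely to the second syllable (pa,stu), while the "dl" of vedli has to be split (ved,li) because dl is not a permissible initial pair.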
=== Words ===
Words can be cmavo, cmevla, or brivla. cmavo and brivla are made of
syllables, while cmevla are free strings of phonemes.
cmavo are composed of:
- one vowel- or y-syllable, with at most one initial consonant and no
final consonant
- optionally followed by any number of h- or hy-syllables without any
final consonants
Examples: .a, ba, bai, ba'i, ba'ai, by, by'i, ia, iai, iy, ua'ai'y
There are two exceptions: "ybu", also spelled "y.bu", is a single cmavo
despite the medial consonant and word break, and "y" surrounded by word
breaks and not followed by "bu" is a word break itself, not a cmavo.
cmavo can be stressed on any syllable.
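The basic cmavo shape can be written as a regular expression. The Python sketch below is only an approximation: it accepts an optional leading dot, strips commas, and does not handle the "ybu" and lone "y" exceptions. The names CMAVO_RE and looks_like_cmavo are invented here.

```python
import re

CMAVO_RE = re.compile(
    r"""
    \.?                                  # optional leading word break
    (?:[iu]|[bcdfgjklmnprstvxz])?        # at most one glide or consonant
    (?:au|ai|ei|oi|[aeiou]|y)            # nucleus: diphthong, vowel, or y
    (?:'(?:au|ai|ei|oi|[aeiou]|y))*      # any number of h- or hy-syllables
    """,
    re.VERBOSE,
)


def looks_like_cmavo(word: str) -> bool:
    """Shape check only; commas are dropped since they carry no meaning."""
    return CMAVO_RE.fullmatch(word.replace(",", "")) is not None
```

All of the examples listed above pass this check, while strings such as "bab" (final consonant) or "bla" (two initial consonants) do not.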
cmevla are arbitrary strings of phonemes, following phoneme but not
syllable restrictions, starting with a word break, containing no word
breaks, and ending with a consonant followed by a word break. They can
be stressed on any vowel, diphthong, or syllabic consonant.
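As a small illustration, the outer shape of a cmevla (dot, phonemes with no dot among them, final consonant, dot) fits in one regular expression. This Python sketch checks only that outer shape and the alphabet; the phoneme adjacency rules are not checked, and the names are hypothetical.

```python
import re

# Leading dot, any Lojban letters (no dots), a final consonant, closing dot.
CMEVLA_RE = re.compile(r"\.[abcdefgijklmnoprstuvxyz',]*[bcdfgjklmnprstvxz]\.")


def looks_like_cmevla(text: str) -> bool:
    """Shape check only: delimiting word breaks and a final consonant."""
    return CMEVLA_RE.fullmatch(text) is not None
```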
A brivla is composed of any number of initial rafsi followed by a final
rafsi. It must begin with a vowel syllable, end with a vowel- or
h-syllable, and have at least two syllables. It may not be a slinkuhi,
and may not start with a sequence of cmavo that yields a valid word when
removed. Stress (marked here with a grave accent) is on the second-last
vowel- or h-syllable.
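The stress rule can be sketched in Python like this, assuming the word has already been split into syllables with commas. The helper names are made up; a syllable counts as a vowel- or h-syllable here simply if it contains one of a e i o u, and the accent goes on the vowel proper rather than on a leading glide.

```python
GRAVE = {"a": "à", "e": "è", "i": "ì", "o": "ò", "u": "ù"}
DIPHTHONGS = {"au", "ai", "ei", "oi"}


def stress_index(syllable: str):
    """Index of the letter that takes the accent, or None if no vowel."""
    positions = [i for i, ch in enumerate(syllable) if ch in "aeiou"]
    if not positions:
        return None  # y-, hy-, and consonantal syllables are skipped
    first, last = positions[0], positions[-1]
    run = syllable[first:last + 1]
    if len(run) == 1 or run in DIPHTHONGS:
        return first
    return first + 1  # the run starts with a glide; accent what follows


def mark_stress(word: str) -> str:
    """Put a grave accent on the second-last vowel- or h-syllable."""
    syllables = word.split(",")
    stressable = [n for n, s in enumerate(syllables)
                  if stress_index(s) is not None]
    if len(stressable) < 2:
        raise ValueError("need at least two vowel- or h-syllables")
    n = stressable[-2]
    s, i = syllables[n], stress_index(syllables[n])
    syllables[n] = s[:i] + GRAVE[s[i]] + s[i + 1:]
    return ",".join(syllables)
```

For example, mark_stress("cnar,jy,fra,ga,ri") skips the y-syllable and gives "cnar,jy,fra,gà,ri", and mark_stress("fi,pr,koi") skips the consonantal syllable and gives "fì,pr,koi".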
A final rafsi is:
- a zihevla:
- a vowel syllable
- followed by any number of vowel, h-, or consonantal syllables
- followed by a vowel- or h-syllable with no final consonant
- is not a gismu or sequence of more than one rafsi
- e.g. cpi,kù,ku àl,ga fì,pr,koi glàu,ka sprà,'e
- or a gismu:
- a CV vowel syllable followed by a CCV one
- or a CVC one then a CV one
- or a CCV one then a CV one
- e.g. pà,stu vèd,li tsà,ni
- or a short final rafsi:
- a CVV or CCV vowel syllable, e.g. xau, cpa
- or a CV vowel syllable followed by a 'V h-syllable,
e.g. fà'i
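The gismu and short final rafsi shapes are simple enough to check with regular expressions. The Python sketch below looks at the letter pattern only, taking the VV of CVV to be a diphthong since it has to fit in one syllable. It does not verify that the consonant clusters are permissible, and the names are invented for illustration.

```python
import re

C = "[bcdfgjklmnprstvxz]"
V = "[aeiou]"

# CVCCV (CV + CCV or CVC + CV) or CCVCV (CCV + CV)
GISMU_RE = re.compile(f"{C}{V}{C}{C}{V}|{C}{C}{V}{C}{V}")
# CVV with a diphthong, CCV, or CV'V
SHORT_FINAL_RE = re.compile(f"{C}(?:au|ai|ei|oi)|{C}{C}{V}|{C}{V}'{V}")


def final_rafsi_shape(word: str):
    """Return 'gismu', 'short final rafsi', or None (shape check only)."""
    word = word.replace(",", "")  # commas only separate syllables
    if GISMU_RE.fullmatch(word):
        return "gismu"
    if SHORT_FINAL_RE.fullmatch(word):
        return "short final rafsi"
    return None
```

Note that "brivla" itself comes back as None: its shape is CCVCCV, so it is neither a gismu nor a short rafsi.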
An initial rafsi is any one of these:
- a gismu followed by the syllable "'y"
e.g. fasnu'y
- a gismu with its final vowel replaced with y
e.g. fasny
- a zihevla followed by the syllable "'y"
e.g. sorpeka'y
- a CV vowel syllable followed by a Cy y-syllable
e.g. fa,ky
- a short y-less rafsi, unless the following rafsi is a zihevla rafsi:
- a vowel syllable of the form CVV, CVVr, CVC, or CCV
- or a CV syllable followed by a 'V or 'Vr syllable
e.g. gau gaur gas jbu li,'a li,'ar
- a short y-less rafsi followed by a short final rafsi followed by "'y"
e.g. cau,cni,'y ri,'ar,ju,'o,'y mul,fau,'y jbo,jbe,'y
- a zihevla that ends in a vowel syllable with its final vowel replaced
with y, unless the result breaks up into a string of any other rafsi
e.g. ka,'or,ty a,sny
If a CVVr or CV'Vr rafsi is followed by a rafsi beginning with "r", and
only then, the final "r" of the first rafsi is replaced with an "n".
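The r-to-n swap just described, together with the apostrophe rule stated in the next paragraph, can be sketched in Python as below. The function join_rafsi is hypothetical and deliberately small: it does not cover the case where the left rafsi has to be swapped for one ending in "y".

```python
import re

C = "[bcdfgjklmnprstvxz]"
V = "[aeiou]"
VOWELS = set("aeiou")

# Left rafsi of shape CVVr (VV a diphthong) or CV'Vr
R_HYPHEN_RE = re.compile(f"{C}(?:au|ai|ei|oi)r|{C}{V}'{V}r")


def join_rafsi(left: str, right: str) -> str:
    """Glue two rafsi together, applying the two adjustment rules."""
    if right.startswith("r") and R_HYPHEN_RE.fullmatch(left):
        left = left[:-1] + "n"  # final r becomes n before an r
    if left.endswith("y") and right[:1] in VOWELS:
        right = "'" + right     # ' goes between y and a vowel
    return left + right
```

So join_rafsi("gaur", "tcini") leaves the r alone and gives "gaurtcini", whereas a right-hand piece starting with "r" turns "gaur" into "gaun". The right-hand strings used to try this out are just illustrative inputs.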
If a rafsi ending in "y" is followed by a rafsi beginning with a vowel,
and only then, an "'" is prepended to the second rafsi. In other
situations where sticking two rafsi together violates phoneme or
syllable rules, the left rafsi needs to be replaced with one ending with
"y".
A brivla consisting of just a zihevla is called a zihevla, one
consisting of just a gismu is a gismu, and all others are called lujvo.
A slinkuhi {valslinku'i} is a [consonant followed by a brivla that up to
its first y-syllable, or if no y-syllables, in its entirety, is composed
of non-zihevla rafsi] that itself can't be broken up into a string of rafsi.
e.g. _p_rà,'i _s_pòr,te _z_bla,zdà,vro _c_nar,jy,fra,gà,ri
Other non-words also behave like slinkuhi, in that prepending a cmavo
makes them a word, but these arise from rules other than the one named
slinkuhi.
e.g. cpa cpau cpra cprau (brivla must have 2+ syllables)
cl,pàr,nu (brivla must start with a vowel syllable)
A tosmabru {valrtosmabru} is a sequence of cmavo followed by a brivla.
tosmabru can be coerced into being brivla by adding a consonant at the
end of the last syllable of the first cmavo.
e.g. gau,tcì,ni -> gau tcini; cmavo + gismu
gaur,tcì,ni -> gaurtcini; a single lujvo
.a,'u,nain,mo -> .a'u nainmo; cmavo + zihevla
.a,'ur,nain,mo -> .a'urnainmo; a single zihevla
boi,kèi,foi -> boi kèi foi; three cmavo
boir,kèi,foi -> boirkeifoi; a single lujvo
=== Word breaks, glottal stops ===
All word breaks may be pronounced as glottal stops, and some word breaks
have to. Glottal stops are required before and after all cmevla, as well
as before all words starting with a vowel or "y". They are also required
after certain cmavo:
- When pronouncing two words together would break a phonotactic rule,
they need to be separated with a glottal stop.
e.g. "au" "uàn,mo" -> {.au .uanmo}
- Each pair of cmavo of the form CV Cy followed by either a brivla or a
cmavo of the form CVV or CV'V needs a glottal stop between the last
and second-last word.
e.g. "ca" "vy" "càr,vi" -> {ca vy. carvi} /Sa.vy?.'Sar.vi/
(/Sa.vy.'Sar.vi/ would be {cavycarvi}, a lujvo)
- Every stressed cmavo followed by a brivla starting with a consonant
cluster needs a glottal stop after the cmavo.
e.g. "bà" "sna,jù,'i" -> {bà. snaju'i} /'ba?.sna.'Zu.hi/
(/'ba.sna.'Zu.hi/ would be {basna jù'i}, a gismu and a cmavo)
=== Parser peculiarities ===
jbofihe, popular before camxes came along, has different rules than camxes.
* Vowel syllables
- They may start with any number of consonants, and the rule for
initial triples doesn't exist. The only restriction is that all
pairs in the initial cluster need to be valid initial pairs.
e.g. {stsmla'u} is a word
- They may end with up to two consonants, not just one.
e.g. {bongnanba} is a word
- Syllables beginning with glides are their own type, and if not
preceded by a glottal stop, they continue the word like an
h-syllable.
e.g. {.aierne} is one word, not two,
{.ia} always starts with a glottal stop
- Syllables beginning with vowels don't require a word boundary
before them.
e.g. {sincrboa} is a word, {.joan.} is a word
(Or, more accurately, jbofihe has no notion of syllables in the sense
that camxes does, but even under jbofihe practically no one would use
words that violated these modified syllable rules)
* cmevla
Dotside doesn't apply: the beginning of cmevla can also be delimited by
some cmavo, namely {la}, {lai}, {la'i}, or {doi}. If one of these cmavo
precedes a cmevla, no initial glottal stop is required. cmevla can't
contain any of these cmavo. For example, {la .larfin.} parses as three
words: "la", "la", "rfin".
* brivla
zihevla as final rafsi, rafsi beginning with vowels, and rafsi ending in
"'y" do not exist.
e.g. {bardykentauru}, {.algyro'i}, {sorpeka'ykla} aren't words
rafsi with CVCy shape are illegal if the corresponding CVC rafsi is
legal in the situation.
e.g. {jbobanyjvo} isn't a word, only {jbobanjvo} is
rafsi with CVVr or CV'Vr shape are only recognized as rafsi if using the
corresponding CVV or CV'V rafsi would result in tosmabru.
e.g. {lerpi'oci'arci'e} is a zihevla,
{lerpi'oci'aci'e} is a lujvo,
{ci'arci'e} is a lujvo
All brivla must have a consonant cluster within the first five letters
after ' and y are removed. {ko'oinde} is not a word.
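A literal reading of this last rule can be sketched in Python as follows. It is an illustration, not jbofihe's code: the name has_early_cluster is made up, and also dropping commas and dots before counting is my own assumption.

```python
CONSONANTS = set("bcdfgjklmnprstvxz")


def has_early_cluster(word: str) -> bool:
    """True if two consonants are adjacent within the first five letters,
    counted after removing ' and y (and, by assumption, commas and dots)."""
    letters = [ch for ch in word if ch not in "'y,."]
    head = letters[:5]
    return any(a in CONSONANTS and b in CONSONANTS
               for a, b in zip(head, head[1:]))
```

For {ko'oinde} the first five counted letters are k o o i n, which contain no cluster, so the check fails, matching the example above.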
----------------------------------------------------------------------
I hope that I didn't overlook too many rules and that the text is fairly
understandable. Do tell if something is wrong or unclear.
mu'o do