jbovlaste, vlatai, camxes and morphology

Riley Martinez-Lynch

Apr 18, 2014, 1:21:44 PM
to loj...@googlegroups.com
A couple of issues (#26, #37) have recently been raised against jbovlaste that can be traced to "vlatai", the tool used to verify and classify the morphology of new words. vlatai is built as part of jbofi'e; accordingly, it has not been substantially updated for some time and is known to have bugs, including failing to parse some valid words.

One suggestion has been to replace vlatai with camxes. I've taken some initial steps in that direction, but need some help verifying what the correct behavior should be for the cases where camxes and jbofi'e return different results.

I ran all of the words in jbovlaste through camxes and filed issues #38, #39, #40 and #41 to record the problems encountered with cmavo, cmene, fu'ivla and lujvo, respectively. A few examples:
  • camxes parses {y} as "initialSpaces", but it is considered a cmavo in jbovlaste.
  • camxes doesn't parse {bu} compounds like {denpa bu} as cmavo
  • {ybu} is a "cmavo cluster" in jbovlaste but a single cmavo per camxes
  • {aierne} is a fu'ivla in jbovlaste but a "cmavo + fu'ivla" per camxes
  • {selda'ergau} is a fu'ivla in jbovlaste but a lujvo per camxes
  • {zei} compounds aren't recognized as lujvo by camxes (nor in vlatai: jbovlaste has a workaround)
If you can offer verifications or corrections for any of these issues, please respond here or add comments to the issues on GitHub.

I'd also like to know if there's consensus on which is more correct or current: "cmavo cluster" (jbovlaste) or "compound cmavo" (CLL), or if there's a distinction between these terms.

Thank you!

--Riley
mi'e la mukti mu'o

Jorge Llambías

Apr 18, 2014, 2:15:03 PM
to loj...@googlegroups.com
On Fri, Apr 18, 2014 at 2:21 PM, Riley Martinez-Lynch <shun...@gmail.com> wrote:
> • camxes parses {y} as "initialSpaces", but it is considered a cmavo in jbovlaste.

More generally, it will be parsed as a space not just initially, i.e. not as a word. The reason we did this was so that it would not, for example, be quoted with "zo", so that you are allowed to hesitate between "zo" and the word you want to quote.
> • camxes doesn't parse {bu} compounds like {denpa bu} as cmavo

"denpa bu" is considered two words, not one cmavo. It can't be quoted with the single-word quoter "zo". "zo denpa bu" is the quoted word "denpa" converted into a lerfu with "bu".
 
> • {ybu} is a "cmavo cluster" in jbovlaste but a single cmavo per camxes

Yes, camxes considers this one a single word. Since "y" itself is not considered a word, it is not something that "bu" can attach to, so in order to keep "ybu" as a lerfu we had to make it a sui generis word. It can't be "y bu", because then "y" is just hesitation and "bu" will attach to whatever precedes it. It can be quoted with "zo".
> • {aierne} is a fu'ivla in jbovlaste but a "cmavo + fu'ivla" per camxes
Yes, camxes considers i/u in iV uV to be semi-consonants and does not require a pause in front of them, so ".aierne" breaks up into two words, just as "caierne" does.

> • {selda'ergau} is a fu'ivla in jbovlaste but a lujvo per camxes

Yes, camxes allows the -r- hyphen in front of CVV cmavo always, not just when required. This is to facilitate lujvo making, so that if you already know the lujvo "da'ergau" and you then want to make a new lujvo by adding a rafsi in front, you don't have to remember to remove the hyphen.
 

> • {zei} compounds aren't recognized as lujvo by camxes (nor in vlatai: jbovlaste has a workaround)

Right, they are not considered a single word, so they can't be quoted with "zo".
 
> If you can offer verifications or corrections for any of these issues, please respond here or add comments to the issues on GitHub.

> I'd also like to know if there's consensus on which is more correct or current: "cmavo cluster" (jbovlaste) or "compound cmavo" (CLL), or if there's a distinction between these terms.

I think they are just two names for the same thing. Perhaps "cmavo cluster" covers any string of cmavo (jbovlaste won't care if it makes any sense to cluster them together), while "compound cmavo" is probably meant to be a string of cmavo that occurs frequently in a grammatical context, but this is a distinction I just made up.

mu'o mi'e xorxes

Jorge Llambías

Apr 18, 2014, 2:20:59 PM
to loj...@googlegroups.com
On Fri, Apr 18, 2014 at 3:15 PM, Jorge Llambías <jjlla...@gmail.com> wrote:

> > • {selda'ergau} is a fu'ivla in jbovlaste but a lujvo per camxes

> Yes, camxes allows the -r- hyphen in front of CVV cmavo always, not just when required. This is to facilitate lujvo making, so that if you already know the lujvo "da'ergau" and you then want to make a new lujvo by adding a rafsi in front, you don't have to remember to remove the hyphen.

I meant "the -r- hyphen *after* CVV cmavo". I also forgot to mention that the same applies to the -y- hyphen after CVC cmavo, so that for example "selyda'ergau" should also be accepted, although the -y- hyphen is not required there either.
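
As a rough illustration of this rule, the Python sketch below lists the spellings that differ only by these optional hyphens. The helpers `is_cvc`, `is_cvv` and `hyphen_variants` are hypothetical; the sketch does not run camxes, it assumes the input rafsi already spell a lujvo that is valid without the extra hyphens, and it ignores the -n- hyphen and every other CLL lujvo constraint.

```python
from itertools import product

CONSONANTS = set("bcdfgjklmnprstvxz")
VOWELS = set("aeiou")


def is_cvc(rafsi: str) -> bool:
    """True for a CVC-shaped rafsi such as "sel"."""
    return (len(rafsi) == 3 and rafsi[0] in CONSONANTS
            and rafsi[1] in VOWELS and rafsi[2] in CONSONANTS)


def is_cvv(rafsi: str) -> bool:
    """True for a CVV-shaped rafsi, written either like "gau" or like "da'e"."""
    core = rafsi.replace("'", "")
    shape_ok = (len(core) == 3 and core[0] in CONSONANTS
                and core[1] in VOWELS and core[2] in VOWELS)
    return shape_ok and rafsi in (core, core[:2] + "'" + core[2:])


def hyphen_variants(rafsi: list[str]) -> list[str]:
    """Spellings obtained by optionally adding -y- after a non-final CVC rafsi
    and -r- after a non-final CVV rafsi (hypothetical helper; assumes any
    hyphen that CLL requires is already present in the input)."""
    choices = []
    for i, r in enumerate(rafsi):
        is_last = i == len(rafsi) - 1
        if not is_last and is_cvc(r):
            choices.append([r, r + "y"])
        elif not is_last and is_cvv(r):
            choices.append([r, r + "r"])
        else:
            choices.append([r])
    return ["".join(parts) for parts in product(*choices)]


print(hyphen_variants(["sel", "da'e", "gau"]))
```

For sel + da'e + gau this yields four spellings: selda'egau, selda'ergau, selyda'egau and selyda'ergau. The first has no optional hyphen at all; per the rule as described above, camxes should accept the other three as well.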

Jorge Llambías

Apr 18, 2014, 3:08:36 PM
to loj...@googlegroups.com
On Fri, Apr 18, 2014 at 2:21 PM, Riley Martinez-Lynch <shun...@gmail.com> wrote:

> I ran all of the words in jbovlaste through camxes and filed issues #38, #39, #40 and #41 to record issues encountered with (respectively) cmavo, cmene, fu'ivla and lujvo.

The issues with cmevla and fu'ivla have to do with which syllables are considered acceptable in Lojban words. CLL is not completely clear on that, and that's why different parsers went with different things. A syllable consists of an onset, a nucleus and a coda. For camxes, a valid syllable has: 

 - a single consonant or nothing as coda
 - one of these 10 valid nuclei: a, e, i, o, u, ai, au, ei, oi, y
 - one of many valid onsets. 

Valid onsets are:
 . (glottal stop), only at the beginning of the word
 ' (apostrophe), not at the beginning of the word
 C (any other consonant)
 i/u
 Ci/Cu
 CC (any one of the permissible initial pairs listed in CLL) 
 CCC (a slightly more restricted list than what can be derived from the rules in CLL. Basically those that fall within this pattern [sczj][bdfgkmnptvx][lr] with each CC being a permissible initial pair.)
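
The onset, nucleus and coda rules listed above are simple enough to check mechanically. Here is a minimal Python sketch of a single-syllable checker, assuming syllables have already been split out, are written in lowercase, and that a word-initial vowel is written with its leading "." (glottal stop). The names are hypothetical; this is not camxes's actual grammar, it leaves out the consonantal syllables, and it may disagree with camxes on edge cases such as word-initial vowels.

```python
CONSONANTS = set("bcdfgjklmnprstvxz")
NUCLEI = {"a", "e", "i", "o", "u", "ai", "au", "ei", "oi", "y"}

# The 48 permissible initial consonant pairs from CLL.
INITIAL_PAIRS = set(
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr jb jd jg jm jv "
    "kl kr ml mr pl pr sf sk sl sm sn sp sr st tc tr ts vl vr xl xr "
    "zb zd zg zm zv".split()
)


def is_valid_onset(onset: str, word_initial: bool) -> bool:
    if onset == ".":                      # glottal stop: word-initial only
        return word_initial
    if onset == "'":                      # apostrophe: never word-initial
        return not word_initial
    if onset in ("i", "u"):               # semi-consonants
        return True
    if len(onset) == 1:                   # C
        return onset in CONSONANTS
    if len(onset) == 2:                   # Ci / Cu, or a permissible initial pair
        if onset[0] in CONSONANTS and onset[1] in "iu":
            return True
        return onset in INITIAL_PAIRS
    if len(onset) == 3:                   # CCC restricted to [sczj][bdfgkmnptvx][lr]
        return (onset[0] in "sczj" and onset[1] in "bdfgkmnptvx"
                and onset[2] in "lr"
                and onset[:2] in INITIAL_PAIRS and onset[1:] in INITIAL_PAIRS)
    return False


def is_valid_syllable(syllable: str, word_initial: bool = False) -> bool:
    """Try every onset length (1-3 letters); the remainder must be a nucleus,
    optionally followed by a single-consonant coda."""
    for n in range(1, min(3, len(syllable)) + 1):
        onset, rest = syllable[:n], syllable[n:]
        if not is_valid_onset(onset, word_initial):
            continue
        if rest in NUCLEI:
            return True
        if rest[:-1] in NUCLEI and rest[-1:] in CONSONANTS:
            return True
    return False
```

Under these rules "au" on its own has no onset, while "'au" and "iau" do, which is why "mli'au" or "mli,iau" would be preferred over "mli,au". A syllable like "mast" also fails because its coda has two consonants, matching the explanation given for {relmast}, although the current camxes morphology is said to be much more lenient for cmevla.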

And then there are the 64 consonantal syllables, consisting of any consonant followed by [lmnr] (but not repeating the same consonant).
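
The figure of 64 can be checked by counting: 17 consonant letters, each followed by one of l, m, n or r, give 68 pairs, and dropping the four doubled ones (ll, mm, nn, rr) leaves 64. A short Python check, assuming the standard Lojban consonant letters bcdfgjklmnprstvxz:

```python
# The 17 consonant letters of Lojban (the apostrophe and "." are not included).
CONSONANTS = "bcdfgjklmnprstvxz"


def consonantal_syllables() -> list[str]:
    """Each consonant followed by l, m, n or r, excluding doubled letters."""
    return [c + s for c in CONSONANTS for s in "lmnr" if c != s]


syllables = consonantal_syllables()
print(len(syllables))  # 64
```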

vlatai allows CCiV/CCuV syllables, codas with more than one consonant, and empty onsets (as in "oa"). I think those three account for most of the differences between what camxes and vlatai allow.

Pierre Abbat

Apr 18, 2014, 4:02:07 PM
to loj...@googlegroups.com
On Friday, April 18, 2014 15:15:03 Jorge Llambías wrote:
> On Fri, Apr 18, 2014 at 2:21 PM, Riley Martinez-Lynch <shun...@gmail.com> wrote:
> > - {aierne} is a fu'ivla in jbovlaste but a "cmavo + fu'ivla" per camxes
>
> Yes, camxes considers i/u in iV uV to be semi-consonants and does not
> require a pause in front of them, so ".aierne" breaks up into two words,
> just as "caierne" does.

The fix for that is "ai'erne". The ve fu'ivla has an initial /h/.

Other words affected by camxes:
"tarksako" (dandelion) -> "tarsako" or "traksako"
"kriofla" (cloves) -> "kriiofla"
"fasxolarkto" (koala) -> "fasxolarto". The latter was actually my original
proposal, but Nick prefers the version with "k" because he can see the Greek
word for "bear" in it. Without the "k", the bear turns to bread.

"martio", "prilio", "madjio", "djunio", and "djulio" are all valid, but a bug
in vlatai calls them invalid because they differ by one apostrophe from lujvo.

"damskrima" (badly formed word for "fencing", the sport) is valid, but another
bug in vlatai calls it invalid. Better forms are "dambrskrima" and "eskrima".

"mliau" (meow) is affected by whether triphthongs are allowed. If they are,
it's invalid because it has only one syllable. If they aren't, it's "mli,au".

Pierre
--
ve ka'a ro klaji la .romas. se jmaji

Jorge Llambías

Apr 18, 2014, 4:20:51 PM
to loj...@googlegroups.com
On Fri, Apr 18, 2014 at 5:02 PM, Pierre Abbat <ph...@bezitopo.org> wrote:

> "mliau" (meow) is affected by whether triphthongs are allowed. If they are,
> it's invalid because it has only one syllable. If they aren't, it's "mli,au".

camxes will want "mli,iau" or "mli'au" because ",au" doesn't have an onset.

Wuzzy

Apr 18, 2014, 8:45:54 PM
to loj...@googlegroups.com
On Fri, 18 Apr 2014 16:08:36 -0300, Jorge Llambías <jjlla...@gmail.com> wrote:

> The issues with cmevla and fu'ivla have to do with which syllables are
> considered acceptable in Lojban words. CLL is not completely clear on
> that, and that's why different parsers went with different things.
Wow. If this is true, then things are seriously messed up in Lojban. The
morphology is a core part of the language. And that part is not well
defined? I never really noticed that but I fear this is actually true.
Seriously, this sucks. I think a revision of the CLL is badly
needed.

If there is no formalization, parsers are doomed to give different
or nonsensical results. If parsers are doomed to give such
results, the issues for jbovlaste cannot be fixed. But then this means
the bug lies neither in jbovlaste, nor in vlatai, nor in camxes. The
bug appears to actually be in Lojban itself. Seriously, I hate to type
this, but: This is messed up. :-(

Jorge Llambías

Apr 18, 2014, 10:17:38 PM
to loj...@googlegroups.com
On Fri, Apr 18, 2014 at 9:45 PM, Wuzzy <alm...@aol.com> wrote:
> On Fri, 18 Apr 2014 16:08:36 -0300, Jorge Llambías <jjlla...@gmail.com> wrote:

> > The issues with cmevla and fu'ivla have to do with which syllables are
> > considered acceptable in Lojban words. CLL is not completely clear on
> > that, and that's why different parsers went with different things.
> Wow. If this is true, then things are seriously messed up in Lojban. The
> morphology is a core part of the language. And that part is not well
> defined? I never really noticed that but I fear this is actually true.
> Seriously, this sucks. I think a revision of the CLL is badly
> needed.

You probably never noticed because it affects very marginal words. The core part of the morphology is well defined. 

> If there is no formalization, parsers are doomed to give different
> or nonsensical results. If parsers are doomed to give such
> results, the issues for jbovlaste cannot be fixed. But then this means
> the bug lies neither in jbovlaste, nor in vlatai, nor in camxes. The
> bug appears to actually be in Lojban itself. Seriously, I hate to type
> this, but: This is messed up. :-(

Personally, I think other parts of the language are probably more messed up and require more attention, but yes, an official definition of the morphology would be nice and relatively easy to do.

Riley Martinez-Lynch

Apr 19, 2014, 11:09:41 AM
to loj...@googlegroups.com
Thank you for your detailed reply!
> > • camxes parses {y} as "initialSpaces", but it is considered a cmavo in jbovlaste.
> More generally, it will be parsed as a space not just initially, i.e. not as a word. The reason we did this was so that it would not, for example, be quoted with "zo", so that you are allowed to hesitate between "zo" and the word you want to quote.
> > • {ybu} is a "cmavo cluster" in jbovlaste but a single cmavo per camxes
> Yes, camxes considers this one a single word. Since "y" itself is not considered a word, it is not something that "bu" can attach to, so in order to keep "ybu" as a lerfu we had to make it a sui generis word. It can't be "y bu", because then "y" is just hesitation and "bu" will attach to whatever precedes it. It can be quoted with "zo".

In the cases of {y} and {ybu}, it sounds like no immediate action is required. They are correctly classified in jbovlaste, and unlikely to be used in new compounds added to jbovlaste.
> > • camxes doesn't parse {bu} compounds like {denpa bu} as cmavo
> "denpa bu" is considered two words, not one cmavo. It can't be quoted with the single-word quoter "zo". "zo denpa bu" is the quoted word "denpa" converted into a lerfu with "bu".

In this case, it seems like jbovlaste should be updated and corrected: "cmavo cluster" is not accurate. Do you have a suggestion for what to call forms like this? "bu letterals" or just "letterals"? Incidentally, this classification does not appear to come from vlatai: It doesn't recognize {denpa bu} at all. 
> > • {aierne} is a fu'ivla in jbovlaste but a "cmavo + fu'ivla" per camxes
> Yes, camxes considers i/u in iV uV to be semi-consonants and does not require a pause in front of them, so ".aierne" breaks up into two words, just as "caierne" does.

It sounds like the cmene and fu'ivla that camxes rejects, reclassifies, or splits into multiple words are not morphologically valid, and should be marked as invalid, corrected so that they parse as intended, or both.
> > • {zei} compounds aren't recognized as lujvo by camxes (nor in vlatai: jbovlaste has a workaround)
> Right, they are not considered a single word, so they can't be quoted with "zo".

This case seems a lot like {denpa bu}: The category that jbovlaste is using is less accurate than it could be. Should these entries be reclassified as "zei lujvo"? Something else?

> > I'd also like to know if there's consensus on which is more correct or current: "cmavo cluster" (jbovlaste) or "compound cmavo" (CLL), or if there's a distinction between these terms.
> I think they are just two names for the same thing. Perhaps "cmavo cluster" covers any string of cmavo (jbovlaste won't care if it makes any sense to cluster them together), while "compound cmavo" is probably meant to be a string of cmavo that occurs frequently in a grammatical context, but this is a distinction I just made up.

Given that jbovlaste is not intended to store nonsense clusters of cmavo, it seems like it might make sense to adopt the CLL terminology and reclassify "cmavo clusters" as "compound cmavo". Any objections?
 

Jorge Llambías

Apr 19, 2014, 11:48:35 AM
to loj...@googlegroups.com
On Sat, Apr 19, 2014 at 12:09 PM, Riley Martinez-Lynch <shun...@gmail.com> wrote:
> > "denpa bu" is considered two words, not one cmavo. It can't be quoted with the single-word quoter "zo". "zo denpa bu" is the quoted word "denpa" converted into a lerfu with "bu".

> In this case, it seems like jbovlaste should be updated and corrected: "cmavo cluster" is not accurate. Do you have a suggestion for what to call forms like this? "bu letterals" or just "letterals"? Incidentally, this classification does not appear to come from vlatai: It doesn't recognize {denpa bu} at all.
> > > • {zei} compounds aren't recognized as lujvo by camxes (nor in vlatai: jbovlaste has a workaround)
> > Right, they are not considered a single word, so they can't be quoted with "zo".
> This case seems a lot like {denpa bu}: The category that jbovlaste is using is less accurate than it could be. Should these entries be reclassified as "zei lujvo"? Something else?

"bu letteral" and "zei lujvo" sound fine to me. In Lojban they'd be "zo bu zei lerfu" and "zo zei zei lujvo".

"bu" and "zei" have to be deactivated with "zo" so they don't grab whatever word comes before.

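A toy Python sketch of the word-grouping behaviour described in this thread may make this concrete: a bare "y" is dropped like a pause, "zo" quotes exactly one following word (even "bu" or "zei"), and "bu" or "zei" then attach to whatever unit precedes them. `group_words` and its tuple labels are hypothetical; this is not how camxes is implemented, and it ignores other quoting and erasure words such as "zoi", "lo'u" and "si".

```python
from typing import List, Tuple, Union

Unit = Union[str, Tuple]  # a plain word, or a labelled tuple built from words


def group_words(tokens: List[str]) -> List[Unit]:
    """Toy grouping of whitespace-separated Lojban tokens (hypothetical helper)."""
    # A bare "y" behaves like a pause, not a word, so it is dropped.
    # "ybu" is a distinct single word and is kept as-is.
    words = [t for t in tokens if t != "y"]
    units: List[Unit] = []
    i = 0
    while i < len(words):
        if words[i] == "zo" and i + 1 < len(words):
            unit: Unit = ("quote", words[i + 1])  # zo quotes exactly one word
            i += 2
        else:
            unit = words[i]
            i += 1
        # "bu" and "zei" attach to whatever unit comes right before them.
        while i < len(words) and words[i] in ("bu", "zei"):
            if words[i] == "bu":
                unit = ("letteral", unit)
                i += 1
            elif i + 1 < len(words):
                unit = ("zei-lujvo", unit, words[i + 1])
                i += 2
            else:
                break  # a trailing "zei" has nothing to join
        units.append(unit)
    return units


print(group_words("zo bu zei lerfu".split()))
print(group_words("mi y bu".split()))
```

Here "zo bu zei lerfu" comes out as one zei-lujvo built on the quoted word "bu", while "mi y bu" turns "mi" into a letteral, which is why "ybu" has to be a single word of its own.
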
> > > I'd also like to know if there's consensus on which is more correct or current: "cmavo cluster" (jbovlaste) or "compound cmavo" (CLL), or if there's a distinction between these terms.
> > I think they are just two names for the same thing. Perhaps "cmavo cluster" covers any string of cmavo (jbovlaste won't care if it makes any sense to cluster them together), while "compound cmavo" is probably meant to be a string of cmavo that occurs frequently in a grammatical context, but this is a distinction I just made up.

> Given that jbovlaste is not intended to store nonsense clusters of cmavo, it seems like it might make sense to adopt the CLL terminology and reclassify "cmavo clusters" as "compound cmavo". Any objections?

Perhaps the issue was that "compound cmavo" sounds as if the result of the compounding were a single cmavo, which it is not, so "cmavo cluster" (or "cmavo compound") is more accurate.

Gleki Arxokuna

Apr 19, 2014, 1:28:17 PM
to loj...@googlegroups.com
2014-04-19 4:45 GMT+04:00 Wuzzy <alm...@aol.com>:
> On Fri, 18 Apr 2014 16:08:36 -0300, Jorge Llambías <jjlla...@gmail.com> wrote:

> > The issues with cmevla and fu'ivla have to do with which syllables are
> > considered acceptable in Lojban words. CLL is not completely clear on
> > that, and that's why different parsers went with different things.
> Wow. If this is true, then things are seriously messed up in Lojban. The
> morphology is a core part of the language. And that part is not well
> defined? I never really noticed that but I fear this is actually true.
> Seriously, this sucks. I think a revision of the CLL is badly
> needed.

The point is that camxes was indeed a major change in the language, but it didn't have much to do with CLL; that's why few people mention it when speaking about changes in Lojban.

However, I wonder why {relmast} is no longer valid. How can it even break self-segregation?



Gleki Arxokuna

Apr 19, 2014, 1:30:54 PM
to loj...@googlegroups.com

2014-04-19 19:48 GMT+04:00 Jorge Llambías <jjlla...@gmail.com>:
> Perhaps the issue was that "compound cmavo" sounds as if the result of the compounding were a single cmavo, which it is not, so "cmavo cluster" (or "cmavo compound") is more accurate.

"cmavo compound" reminds me of "compounds", which are lujvo. But cmavo clusters don't work like lujvo (although, yes, their meaning does depend on the relative position of the cmavo, because of scope).

Jorge Llambías

Apr 19, 2014, 1:59:13 PM
to loj...@googlegroups.com
On Sat, Apr 19, 2014 at 2:28 PM, Gleki Arxokuna <gleki.is...@gmail.com> wrote:


> The point is that camxes was indeed a major change in the language, but it didn't have much to do with CLL; that's why few people mention it when speaking about changes in Lojban.


"not much to do with CLL" is something of an exaggeration... It only deviates from CLL in relatively marginal or doubtful cases.

 
> However, I wonder why {relmast} is no longer valid. How can it even break self-segregation?

It's because of the final double consonant -st, not because it breaks self-segregation. The current version of the morphology will accept almost anything for cmevla, so the test was probably done with an older, stricter version.

Riley Martinez-Lynch

Apr 22, 2014, 7:36:46 PM
to loj...@googlegroups.com
I created issues #42, #43 and #44 to propose renaming the jbovlaste valsi type "cmavo cluster" to "cmavo compound" and creating two new types, "bu letteral" and "zei lujvo".

The purpose of these changes is to reconcile the way that the morphology of words is analyzed by vlatai/jbofi'e and camxes as described in issues #38 and #41, in anticipation of replacing vlatai with camxes per #26.

Please join the discussion, either here or in the comments on GitHub, if you think that these issues should be handled in some other way.

I'm uncertain how to proceed with the cmene (#39) and fu'ivla (#40) that are invalid ("nonLojbanWord") under camxes or that parse differently. The options I see are:
  • Respell the words (according to rules and/or community guidance) so that camxes accepts them as cmene and fu'ivla
  • Don't respell the words, but either reclassify them or otherwise flag them as using obsolete morphology
  • Do nothing: Don't respell or reclassify the words 
  • Delete the problematic words entirely
I appreciate your guidance!

-- Riley