Where is the latest/official PEG grammar?


scope845h...@icebubble.org

Mar 30, 2020, 1:14:32 AM
to lojban
So, I see that the lojban.org Web site is now a Wiki (a cute term for a
poorly organized list of lists of lists to information of dubious
accuracy). (And why does it default to Chinese?) I'm looking for the
current official (or at least the latest) PEG grammar for Lojban. But,
after about an hour of clicking in circles, I still couldn't find
anything. :(

The closest thing I could find was on an external site:

http://subvert-the-dominant-paradigm.net/~jbominji/code/lojban_grammar.peg

But that PEG contains obvious errors, e.g. "anaphoracataphora",
"openclosed", etc. Is there a "correct" PEG for Lojban lying around
(hiding?) anywhere? Or at least a "most correct" version?

ki'e .oiro'e

Ilmen

Mar 30, 2020, 6:11:36 PM
to loj...@googlegroups.com
https://github.com/lojban/ilmentufa/blob/master/camxes.peg

scope845h...@icebubble.org

Apr 1, 2020, 9:24:54 PM
to loj...@googlegroups.com
Ilmen <ilmen....@gmail.com> writes:

> https://github.com/lojban/ilmentufa/blob/master/camxes.peg

I don't get it. That PEG also has "anaphoracataphora", "openclosed",
etc. bugs in it. How is that the most current grammar?

scope845h...@icebubble.org

Apr 10, 2020, 11:22:41 PM
to loj...@googlegroups.com
Bob LeChevalier <loj...@lojban.org> writes:

> I don't know why those names are used in any particular PEG
> grammar.

...

> For the tag "anaphoracataphora", I suspect that is not a "bug" because
> anaphora and cataphora are identical in terms of grammar so it is
> merely a naming convention whether to include some separator between
> the two words or not. I don't know what construct "openclosed" refers
> to, but likely it also is a conflation of two types of constructs with
> the same grammar.

No, these are not the names of non-terminals in the PEG, they appear in
the text of comments in the PEG. They should read "anaphora/cataphora",
"open/closed", etc. It's as if someone did a global
s|([a-zA-Z])/([a-zA-Z])|\1\2| search and replace on the grammar. The
fact that nobody seems to have noticed these errors strongly suggests
that these PEG versions of the grammar have not been given sufficient
attention. (doi camxes)
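
To make the suspected search-and-replace concrete, here is a minimal Python sketch of that substitution. The sample lines are hypothetical and not copied from any grammar file, and the `strip_slashes` helper is a name made up for illustration. It also assumes that choices in the rules are written with spaces around "/", which would explain why only the comments were damaged.

```python
import re

# The substitution quoted above: delete a "/" that sits directly between two letters.
SLASH_BETWEEN_LETTERS = re.compile(r"([a-zA-Z])/([a-zA-Z])")

def strip_slashes(line: str) -> str:
    """Apply the substitution to one line (hypothetical helper)."""
    return SLASH_BETWEEN_LETTERS.sub(r"\1\2", line)

# Hypothetical sample lines, written only for this illustration.
comment = "; anaphora/cataphora, open/closed"
rule = "example <- first / second"

print(strip_slashes(comment))  # ; anaphoracataphora, openclosed
print(strip_slashes(rule))     # example <- first / second  (unchanged)
```

If that assumption holds, a choice operator surrounded by spaces survives the substitution, so the rules themselves would be unaffected.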

> The latest *official* grammar remains the YACC grammar included in The
> Complete Lojban Language since its publication in 1997. There have
> been attempts to redevelop the grammar in a PEG form, but NONE of the
> PEG grammars has been certified by byfy (or its new successor group)
> as an official replacement for that old one,

What are we waiting for? Lojban has been around since 1985 bi'o 1987,
yet we still don't have a complete grammar or a correct parser. The
YACC has received official blessing, but it's not complete. Handling of
elidable terminators is hand-hacked into the code. The lexer is also
hand-coded, and implements only a good approximation to Lojban
morphology. So far, we have neither a complete specification of the
language, nor a fully correct parser.

> My understanding is that they are more-or-less equivalent to the old
> YACC grammar, but I don't know that this equivalence was ever formally
> proven.

Maybe that's what we're waiting for? A proof that the PEG grammar is
backward-compatible with the YACC? If so, the second half of Bryan
Ford's thesis paper on PEGs describes how to transform parsing
expressions into other forms which could be compared with the YACC.

For reference, that thesis paper is: http://bford.info/pub/lang/peg.pdf

Gleki Arxokuna

Apr 11, 2020, 10:35:56 AM
to lojban


On Saturday, April 11, 2020 at 06:22:41 UTC+3, scope845h...@icebubble.org wrote:
> Bob LeChevalier <loj...@lojban.org> writes:
>
>> I don't know why those names are used in any particular PEG
>> grammar.
>
> ...
>
>> For the tag "anaphoracataphora", I suspect that is not a "bug" because
>> anaphora and cataphora are identical in terms of grammar so it is
>> merely a naming convention whether to include some separator between
>> the two words or not. I don't know what construct "openclosed" refers
>> to, but likely it also is a conflation of two types of constructs with
>> the same grammar.
>
> No, these are not the names of non-terminals in the PEG, they appear in
> the text of comments in the PEG. They should read "anaphora/cataphora",
> "open/closed", etc. It's as if someone did a global
> s|([a-zA-Z])/([a-zA-Z])|\1\2| search and replace on the grammar. The
> fact that nobody seems to have noticed these errors strongly suggests
> that these PEG versions of the grammar have not been given sufficient
> attention. (doi camxes)
>
>> The latest *official* grammar remains the YACC grammar included in The
>> Complete Lojban Language since its publication in 1997. There have
>> been attempts to redevelop the grammar in a PEG form, but NONE of the
>> PEG grammars has been certified by byfy (or its new successor group)
>> as an official replacement for that old one,
>
> What are we waiting for?

PEG has its own deficiencies.

> Lojban has been around since 1985 bi'o 1987,
> yet we still don't have a complete grammar or a correct parser. The
> YACC has received official blessing, but it's not complete. Handling of
> elidable terminators is hand-hacked into the code. The lexer is also
> hand-coded, and implements only a good approximation to Lojban
> morphology. So far, we have neither a complete specification of the
> language, nor a fully correct parser.

True. PEG won't put us forward significantly.

>> My understanding is that they are more-or-less equivalent to the old
>> YACC grammar, but I don't know that this equivalence was ever formally
>> proven.
>
> Maybe that's what we're waiting for? A proof that the PEG grammar is
> backward-compatible with the YACC?

It's not backward compatible by definition.

> If so, the second half of Bryan
> Ford's thesis paper on PEGs describes how to transform parsing
> expressions into other forms which could be compared with the YACC.

Transformation doesn't necessarily imply equivalence.

scope845h...@icebubble.org

Apr 14, 2020, 11:34:47 AM
to loj...@googlegroups.com
Gleki Arxokuna <gleki.is...@gmail.com> writes:

> PEG has its own deficiencies.

What deficiencies?

> True. PEG won't put us forward significantly.

Why wouldn't it? Having a complete specification of the language seems
like it would be a HUGE step forward, to me. From PEG, it would be but
a short distance to having a working, fully-correct parser.

>> Maybe that's what we're waiting for? A proof that the PEG grammar is
>> backward-compatible with the YACC?
>
>
> It's not backward compatible by definition.

It would have to be. Otherwise, currently (and historically)
grammatical Lojban wouldn't be grammatical under the new (PEG
specification of the) grammar. Am I wrong?

>> If so, the second half of Bryan
>> Ford's thesis paper on PEGs describes how to transform parsing
>> expressions into other forms which could be compared with the YACC.
>
>
> Transformation doesn't necessarily imply equivalence.

No, but it would render the PEG in a form which could be *compared* to
the YACC. If you cut out the morphology rules, and allow for changes
for handling elidable terminators and metalinguistic erasers, the
remainder (the bulk) of the grammar should be formally equivalent to the
current YACC grammar. That would prove that the YACC and PEG parse
essentially the same language. Given such a proof, is there any reason
why such a PEG would NOT be accepted as the new baseline grammar?

Gleki Arxokuna

Apr 14, 2020, 11:57:50 AM
to lojban


On Tuesday, April 14, 2020 at 18:34:47 UTC+3, scope845h...@icebubble.org wrote:
> Gleki Arxokuna <gleki.is...@gmail.com> writes:
>
>> PEG has its own deficiencies.
>
> What deficiencies?
>
>> True. PEG won't put us forward significantly.
>
> Why wouldn't it? Having a complete specification of the language seems
> like it would be a HUGE step forward, to me. From PEG, it would be but
> a short distance to having a working, fully-correct parser.
>
>>> Maybe that's what we're waiting for? A proof that the PEG grammar is
>>> backward-compatible with the YACC?
>>
>> It's not backward compatible by definition.
>
> It would have to be. Otherwise, currently (and historically)
> grammatical Lojban wouldn't be grammatical under the new (PEG
> specification of the) grammar. Am I wrong?
>
>>> If so, the second half of Bryan
>>> Ford's thesis paper on PEGs describes how to transform parsing
>>> expressions into other forms which could be compared with the YACC.
>>
>> Transformation doesn't necessarily imply equivalence.
>
> No, but it would render the PEG in a form which could be *compared* to
> the YACC. If you cut out the morphology rules,

You need to cut out much more. Compare how many lines the BNF grammar has. One can learn it by heart. Now compare to camxes grammars.

> and allow for changes
> for handling elidable terminators and metalinguistic erasers, the
> remainder (the bulk) of the grammar should be formally equivalent to the
> current YACC grammar. That would prove that the YACC and PEG parse
> essentially the same language. Given such a proof, is there any reason
> why such a PEG would NOT be accepted as the new baseline grammar?

Because PEG formalism doesn't allow checking for ambiguities. E.g. PEG is unambiguous even if you add to it rules and subrules that would never match. 
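
A minimal Python sketch of the "rules that would never match" point, using toy combinators. The names `lit`, `seq`, `choice`, and `eof` are made up for this illustration and do not come from camxes or any PEG library. The toy rule `word <- "a" / "ab"` has a second alternative that can never be chosen, yet the grammar still yields at most one parse for every input, so nothing flags the dead branch.

```python
from typing import Callable, Optional

# A parser takes (text, position) and returns the new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]

def lit(s: str) -> Parser:
    """Match the literal string s at the current position."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def seq(*parsers: Parser) -> Parser:
    """Match each parser in turn; fail if any of them fails."""
    def parse(text: str, pos: int) -> Optional[int]:
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            pos = result
        return pos
    return parse

def choice(*parsers: Parser) -> Parser:
    """PEG prioritized choice: commit to the first alternative that succeeds."""
    def parse(text: str, pos: int) -> Optional[int]:
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result  # later alternatives are never tried
        return None
    return parse

def eof(text: str, pos: int) -> Optional[int]:
    """Succeed only at the end of the input."""
    return pos if pos == len(text) else None

# Toy rule: word <- "a" / "ab". Whenever "ab" could match, "a" matches first.
word = choice(lit("a"), lit("ab"))
sentence = seq(word, eof)

print(sentence("a", 0))   # 1
print(sentence("ab", 0))  # None: "a" was committed to, then eof failed
```

Swapping the alternatives to `"ab" / "a"` would accept both inputs, which shows that the order of alternatives is part of the language definition in a PEG.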

But seriously PEG/CFG are not powerful enough even by BPFK standards (see BPFK pages in the wiki)

scope845h...@icebubble.org

Apr 17, 2020, 11:41:05 AM
to loj...@googlegroups.com
Gleki Arxokuna <gleki.is...@gmail.com> writes:

>> > Transformation doesn't necessarily imply equivalence.
>>
>> No, but it would render the PEG in a form which could be *compared* to
>> the YACC. If you cut-out the morphology rules,
>
>
> You need to cut out much more. Compare how many lines the BNF grammar has.
> One can learn it by heart. Now compare to camxes grammars.

Did you read what I wrote? I'm not talking about MEMORIZING the
grammar. I'm proposing comparison of the formal grammar rules (however
many thousands of lines they may be) derived from the YACC and PEG,
respectively, to see if they parse equivalent languages. That would
prove the equivalence of the PEG to the YACC.

> Because PEG formalism doesn't allow checking for ambiguities. E.g. PEG is
> unambiguous even if you add to it rules and subrules that would never
> match.
>
> But seriously PEG/CFG are not powerful enough even by BPFK standards (see
> BPFK pages in the wiki)

None of what you have written here makes any sense to me. What do you
mean?

Gleki Arxokuna

Apr 17, 2020, 3:46:32 PM
to lojban


On Friday, April 17, 2020 at 18:41:05 UTC+3, scope845h...@icebubble.org wrote:
> Gleki Arxokuna <gleki.is...@gmail.com> writes:
>
>>>> Transformation doesn't necessarily imply equivalence.
>>>
>>> No, but it would render the PEG in a form which could be *compared* to
>>> the YACC. If you cut out the morphology rules,
>>
>> You need to cut out much more. Compare how many lines the BNF grammar has.
>> One can learn it by heart. Now compare to camxes grammars.
>
> Did you read what I wrote? I'm not talking about MEMORIZING the
> grammar. I'm proposing comparison of the formal grammar rules (however
> many thousands of lines they may be) derived from the YACC and PEG,
> respectively, to see if they parse equivalent languages. That would
> prove the equivalence of the PEG to the YACC.

camxes contains morphology, chapter 21 of the CLL doesn't. camxes has multiple rules from BPFK (like handling magic words), chapter 21 doesn't.

If you mean replacing the priority choice operator with unordered alternation, then camxes becomes ambiguous (more than one parse tree for almost any worthwhile sentence).
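
A toy Python sketch of that effect, not taken from camxes. The grammar `S <- X S / (end of input)` with `X <- "a" / "aa"` and both counting functions are invented for this illustration. Read as a context-free grammar with unordered alternation, the input "aaa" has three derivations; read as a PEG, `X` always commits to "a" and there is exactly one.

```python
from functools import lru_cache

# Toy rule X <- "a" / "aa", listed in PEG priority order.
ALTERNATIVES = ("a", "aa")

def count_cfg_parses(text: str) -> int:
    """Count derivations when X's alternatives are unordered (CFG reading)."""
    @lru_cache(maxsize=None)
    def count(pos: int) -> int:
        if pos == len(text):
            return 1  # the empty remainder has exactly one derivation
        return sum(
            count(pos + len(alt))
            for alt in ALTERNATIVES
            if text.startswith(alt, pos)
        )
    return count(0)

def count_peg_parses(text: str) -> int:
    """PEG reading: X commits to the first alternative that matches."""
    pos = 0
    while pos < len(text):
        for alt in ALTERNATIVES:
            if text.startswith(alt, pos):
                pos += len(alt)
                break
        else:
            return 0  # no alternative matched, so the input is rejected
    return 1  # a PEG yields at most one parse

print(count_cfg_parses("aaa"))  # 3
print(count_peg_parses("aaa"))  # 1
```

The count under unordered alternation grows with input length (five derivations for "aaaa"), while the PEG count stays at one.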
 

>> Because PEG formalism doesn't allow checking for ambiguities. E.g. PEG is
>> unambiguous even if you add to it rules and subrules that would never
>> match.
>>
>> But seriously PEG/CFG are not powerful enough even by BPFK standards (see
>> BPFK pages in the wiki)
>
> None of what you have written here makes any sense to me. What do you
> mean?

PEG is not the perfect parser by BPFK standards. Scope of da/BAI/bridi is not handled, fuhe is not handled. By CLL standards internal grammar of UI is not handled.

scope845h...@icebubble.org

Apr 19, 2020, 11:49:52 AM
to loj...@googlegroups.com
Gleki Arxokuna <gleki.is...@gmail.com> writes:

>> Did you read what I wrote? I'm not talking about MEMORIZING the
>> grammar. I'm proposing comparison of the formal grammar rules (however
>> many thousands of lines they may be) derived from the YACC and PEG,
>> respectively, to see if they parse equivalent languages. That would
>> prove the equivalence of the PEG to the YACC.
>>
>
> camxes contains morphology, chapter 21 of the CLL doesn't. camxes has
> multiple rules from BPFK (like handling magic words), chapter 21 doesn't.

I don't think you're actually reading what I write. Are you a bot? ;)

> PEG is not the perfect parser by bpfk standards. Scope of da/BAI/bridi is
> not handled, fuhe is not handled. By CLL standards internal grammar of UI
> is not handled.

OK, now I think you're just saying things for the sake of making
nonsense. Having realized, now, that you cannot (or are not willing to)
make sense, I shall stop asking you to clarify what you mean.

scope845h...@icebubble.org

Apr 24, 2020, 7:11:28 PM
to loj...@googlegroups.com
Bob LeChevalier <loj...@lojban.org> writes:

> On 4/14/2020 1:59 PM, scope845h...@icebubble.org wrote:
>> Gleki Arxokuna <gleki.is...@gmail.com> writes:

>> None of what you have written here makes any sense to me. What do you
>> mean?
>
> As I said in my other answer (which I seem to have been sending only
> to you and not to the list, so I will continue that way), the official

Hm. That reply of mine wasn't addressed to you, it was addressed to Gleki
Arxokuna <gleki.is...@gmail.com>.

>> yet we still don't have a complete grammar
>
> The official YACC grammar in CLL is considered complete.

I realize that the YACC is "considered" authoritative, but it is not
complete. For starters, it requires a separate lexer. Neither the
lexer nor parser are usable unless you're in an environment where you
can run code written in C. And, if you do get them to run, the results
are not correct. Neither elidable terminators nor magic words are
handled correctly, and there is no formal specification (just narrative
descriptions in the CLL) for how they should work. For example, I have
yet to see a parser which handles SA correctly.

> I don't understand any PEG grammar; it is gobbledygook to me.

PEG is fairly straightforward. You just have to learn the operators
used in the parsing expressions, and their precedences.

> If a PEG formalization cannot be easily used by a real human being to
> learn and use the language, more easily than the official YACC
> version, the PEG formalization is pretty much useless.

> But there is little real value in a PEG grammar that is merely
> identical to the YACC specification, with no added functionality,
> which is why provable equivalence isn't important enough to bother
> with.

No, no, there would be HUGE value in it! A PEG formalization would be
useful because (1) it would, finally, be a complete specification of
Lojban orthography, morphology, and grammar; (2) it would, finally,
provide proof that Lojban is unambiguous; (3) it would be readily
portable to any computing system, using any programming language; and
(4) it would provide parse trees that could be used to implement a
variety of useful tools for processing Lojban text.

Proving equivalence between the PEG and the YACC is vitally important
because (A) there should be some way to be sure that PEG-based tools are
designed and implemented correctly; and (B) if a PEG formulation is ever
adopted as the official grammar, we would want to make sure it's fully
compatible with the historical YACC version of the grammar.

> It might be nice to have a lexer/parser that can operate based on
> an official formal grammar but not at the expense of someone being
> able to actually use the formal grammar to learn the language.

> I don't even like E-BNF, which many people apparently prefer to the
> YACC grammar.

The E-BNF is quite readable, although the E-BNF in the CLL has MANY
errors in it. I find the YACC almost completely unintelligible. I only
refer to the YACC when verifying or making corrections to the E-BNF.

> There have been attempts to formalize the morphology as an algorithm,
> which my wife worked on with a couple other people.

Yes, I know. I remember talking with her about it at Logfest in 2006.
Now, 14 years later, I still haven't figured out what Lojban's morphology
rules are supposed to be. That's actually why I'm reading the PEG: to
figure out Lojban's morphology rules.

> started playing with PEG grammars. Again, Nora's algorithm was "good
> enough" in that it completely specified the rules, even if it didn't
> match any formalization scheme.

What we have isn't good enough, because it's an incomplete specification
of Lojban morphology. Aside from the PEG, there is no way to distinguish
fu'ivla from lujvo, for instance. There are a lot of constructs which
could be classified either way, and the CLL doesn't provide enough rules
to disambiguate those cases.

> In the early oughts, we started trying to formalize the morphology in
> a fixed algorithm, NOT in any schema such as YACC or PEG or even BNF,
> and we reached a more or less satisfactory conclusion, though the

Where might this algorithm be documented? (If you're referring to the
lujvo-making algorithm printed in the CLL, it's not complete.)

> But no one was ever satisfied with any particular formalization, and
> it has never been a big priority.

I don't understand how formalizing the morphology CAN'T be an important
priority; it's essential to proving the unambiguity of the language.

> result was never officially approved because people were pursuing the
> PEG approach by then. Nora wrote a simplistic Turbo-Pascal program to
> verify that algorithm matched human understanding (which is the

Pascal code is not readily usable in modern computing environments, and
can't readily be translated into rules which ARE useful in modern
software. Nor is it particularly readable, if one is trying to learn
(decipher) the Lojban morphology rules.

> Who is waiting? There's probably no real market for anything more
> sophisticated than we have now. And the approval of "dotside" would

Everyone, I think? That's why there's so much interest in PEG
formalizations. What we have now is a collection of toys. What we want
is a collection of tools. So far, all of our "tools" are really just
assorted collections of hacks: cobbled-together bits of software which
implement approximations of Lojban, each implemented for/in its own very
specific computing environment.

BTW, thanks for your post RE: Jeff Prothero.

Gleki Arxokuna

Apr 25, 2020, 3:33:45 AM
to lojban


On Saturday, April 25, 2020 at 02:11:28 UTC+3, scope845h...@icebubble.org wrote:
> Bob LeChevalier <loj...@lojban.org> writes:

Lojban grammar can NOT be expressed via PEG. PEG is not powerful enough. 

mukti

Apr 27, 2020, 7:31:29 PM
to lojban
I want to address a few of scope845's questions and follow-ups.

First, the question of what is "official" is a sensitive one. There has been a tendency in the lojban community to reject linguistic prescription (making assertions about how people should use the language) in favor of description: How are people actually using the language?

The YACC parser which is represented in the first edition of The Complete Lojban Language has been formally recognized by LLG. It has not, to the best of my knowledge, been actively maintained in some time, and does not accept some more recent developments in lojban usage.

For the last 15 years or so, there has been more development activity in the camxes line of parsers. This started with a Java/PEG parser by Robin Lee Powell that included morphological reforms spearheaded by Jorge Llambias. Ten years later, another significant step was taken when Masato Hagiwara ported the PEG grammar to JavaScript. The best maintained descendant of this parser is the Ilmentufa parser linked above. It fixes bugs in the original parser and adds support for some varieties of usage.

One of the challenges in developing parsers-- and please someone correct me if this is no longer true-- is that while camxes established a corpus of texts which it expects to be acceptable, this practice hasn't always been followed in subsequent parsers, and work to establish a parser-independent AST for lojban has yet to be done: Even when you can verify that parsers accept or reject the same texts, it's less certain that they are analyzing those texts in the same way.
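
As a rough illustration of checking that two parsers accept or reject the same texts, here is a minimal Python sketch of a differential test over a corpus. Everything in it is hypothetical: `toy_parser_a` and `toy_parser_b` are stand-in recognizers that do not implement any Lojban grammar, and `disagreements` is a name made up for this example. Agreement on a corpus only fails to find a counterexample; it is not a proof of equivalence, and it says nothing about whether the parse trees match.

```python
from typing import Callable, Iterable, List, Tuple

# A recognizer returns True if its parser accepts the text.
Recognizer = Callable[[str], bool]

def disagreements(
    parser_a: Recognizer,
    parser_b: Recognizer,
    corpus: Iterable[str],
) -> List[Tuple[str, bool, bool]]:
    """Return (text, a_accepts, b_accepts) for every text the parsers disagree on."""
    found = []
    for text in corpus:
        a_accepts = parser_a(text)
        b_accepts = parser_b(text)
        if a_accepts != b_accepts:
            found.append((text, a_accepts, b_accepts))
    return found

# Hypothetical stand-ins for real parsers, for demonstration only.
def toy_parser_a(text: str) -> bool:
    """Accept letters and spaces only."""
    return all(c.isalpha() or c == " " for c in text)

def toy_parser_b(text: str) -> bool:
    """Accept letters, spaces, and apostrophes."""
    return all(c.isalpha() or c in " '" for c in text)

corpus = ["mi klama", "coi ro do", "ki'e"]
print(disagreements(toy_parser_a, toy_parser_b, corpus))
# [("ki'e", False, True)]
```

In practice the two recognizers would wrap real parsers, and a fuller check would also compare a normalized form of the resulting parse trees.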

Anyway, scope845, if you are interested in pushing lojban parsers forward, there's a lot of work to be done, and I think that you'll find people are receptive to help doing that work.
