Help parsing Lojban from Python? (Hey, Riley! :)

Robin Lee Powell

Aug 26, 2021, 10:11:44 PM
to loj...@googlegroups.com, shun...@gmail.com

In service to making certain parts of the lojban.org infra a bit
more resilient, I'm updating some stuff that uses
https://github.com/lojban/python-camxes . This relies on java and
the camxes jar, which, whatever, but it's also built on LEPL, which
no longer works (see for example
https://github.com/modoboa/modoboa/issues/1780 ).

https://github.com/teleological/camxes-py is a pure Python
replacement, but it's a CLI program rather than a library, and it's
really not designed to be used as one. I'd love it if someone
updated and fixed that.

Unless there's another option? What's the state of the art in this
space?

Robin Powell

Aug 27, 2021, 12:10:56 AM
to lojban
Some additional details are now at https://github.com/lojban/visual-camxes/blob/master/TODO , and camxes.lojban.org is on Python 3.6 with standard Flask.

Riley Martinez-Lynch

Aug 27, 2021, 11:15:57 PM
to Robin Lee Powell, loj...@googlegroups.com
Robin, I'd be happy to make whatever changes are needed to make it work. I don't see the CLI as an essential part of the interface, and if I can do something to make it easier to access programmatically, I'd like to do that. Glad to take cues here, or if you want to jump on a call or chat, I can do that too.

> On Aug 26, 2021, at 10:11 PM, Robin Lee Powell <robinle...@gmail.com> wrote:

Robin Lee Powell

Aug 28, 2021, 12:02:28 AM
to Riley Martinez-Lynch, loj...@googlegroups.com
Feel free to come find me on Libera IRC, or suggest a preferred chat
option for you.

The stuff I want is actually quite simple, though:

(1) I want to confirm that camxes-py is the preferred Python option
these days

(2) I want to be able to run "run" (see
https://github.com/teleological/camxes-py/blob/master/camxes.py#L89
) or something like it in a direct, straightforward way, i.e.:

import camxespy
tree = camxespy.run("mi klama", transformer='camxes-morphology')

, and tree should contain an obvious python representation of the
parse tree.

This requires, AFAICT (I don't actually know Python very well), that
camxes-py have a library structure that it doesn't currently have,
and that the options be configurable in some way other than
OptionParser.

I can actually do all that myself, but I'm not really a pythonista
and what I do won't be idiomatic at all.
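Until camxes-py grows that kind of library interface, one stopgap is to call its existing CLI from Python and decode what it prints. A minimal sketch, assuming that python camxes.py "mi klama" prints the parse tree as JSON (as in the output quoted later in this message) and that camxes.py sits in the working directory. The names run, DEFAULT_COMMAND and extra_args are illustrative, not camxes-py's API, and no particular option name is assumed for choosing a transformer; any such flags are passed through untouched.

```python
import json
import subprocess
import sys
from typing import Any, Sequence

# Assumption: camxes.py is in the current directory and prints its parse tree
# as JSON on stdout. Point DEFAULT_COMMAND at your own checkout as needed.
DEFAULT_COMMAND = (sys.executable, "camxes.py")


def run(text: str,
        command: Sequence[str] = DEFAULT_COMMAND,
        extra_args: Sequence[str] = ()) -> Any:
    """Parse `text` with the camxes-py CLI and return the decoded JSON tree.

    `extra_args` is inserted before the text unchanged (e.g. whatever option
    selects a transformer); no specific flag name is assumed here.
    """
    completed = subprocess.run(
        [*command, *extra_args, text],
        capture_output=True,  # requires Python 3.7+
        text=True,            # decode stdout/stderr as str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return json.loads(completed.stdout)
```

Usage would then look like tree = run("mi klama"). Each call starts a fresh interpreter, so this is only a bridge until camxes-py exposes a function that can be imported and called directly.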

Stretch goals:

(3) Update to most-recent parsimonious; it currently breaks on
0.8.1, but works on 0.6.2

(4) Update to Python 3, but I'm perfectly capable of making a PR for
this myself.

(5) Make a mode that collapses productions with only one child, i.e.
make the output look like this (in terms of productions, not syntax):

rlpowell@stodi> echo "mi klama" | camxes -f
Flat layout requested.
text=( sentence=( CMAVO=( KOhA=( mi ) ) BRIVLA=( gismu=( klama ) ) ) )

Instead of this:

root@66324b4aed4b:/src# python camxes.py "mi klama"
["text",["text_1",["paragraphs",["paragraph",["statement",["statement_1",["statement_2",["statement_3",["sentence",[["terms",["terms_1",["terms_2",["abs_term",["abs_term_1",["sumti",["sumti_1",["sumti_2",["sumti_3",["sumti_4",["sumti_5",["sumti_6",["KOhA_clause",[["KOhA","mi"]]]]]]]]]]]]]]],["CU"]],["bridi_tail",["bridi_tail_1",["bridi_tail_2",["bridi_tail_3",["selbri",["selbri_1",["selbri_2",["selbri_3",["selbri_4",["selbri_5",["selbri_6",["tanru_unit",["tanru_unit_1",["tanru_unit_2",["BRIVLA_clause",[["BRIVLA",["gismu","klama"]]]]]]]]]]]]]],["tail_terms",["VAU"]]]]]]]]]]]]]]]

, but as I said before this is not hard to do after the fact once
you have the parse tree.
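Since the tree above is plain nested lists, the collapsing can indeed be done after the fact. A minimal sketch, assuming the node shapes visible in that output: a string is matched text, a list starting with a string is a named production followed by its children, and a list starting with a list (like the one directly under "sentence") is an unnamed group of siblings. It splices unnamed groups into their parent and replaces any production whose only child is another production with that child, keeping the innermost name. The function names are illustrative, and this does not try to reproduce the java camxes -f layout exactly.

```python
from typing import List, Union

# Assumed node shapes, read off the camxes-py output above:
#   "klama"                   -> matched text
#   ["gismu", "klama"]        -> named production followed by its children
#   [["terms", ...], ["CU"]]  -> unnamed group of sibling nodes
Node = Union[str, list]


def is_named(node: Node) -> bool:
    """True for ["name", child, ...] nodes; False for text and unnamed groups."""
    return isinstance(node, list) and len(node) > 0 and isinstance(node[0], str)


def collapse_children(nodes: list) -> List[Node]:
    """Collapse each node, splicing the members of unnamed groups into the result."""
    out: List[Node] = []
    for node in nodes:
        if isinstance(node, list) and not is_named(node):
            out.extend(collapse_children(node))  # unnamed group: flatten it
        else:
            out.append(collapse(node))
    return out


def collapse(node: Node) -> Node:
    """Replace each production whose only child is another production by that child."""
    if isinstance(node, str):
        return node
    if not is_named(node):
        children = collapse_children(node)
        return children[0] if len(children) == 1 else children
    name, children = node[0], collapse_children(node[1:])
    if len(children) == 1 and is_named(children[0]):
        return children[0]  # keep the innermost (most specific) production name
    return [name, *children]
```

On a trimmed-down version of the "mi klama" tree, this yields ["sentence", ["KOhA", "mi"], ["CU"], ["bridi_tail", ["gismu", "klama"], ["VAU"]]]. Keeping the outermost name of each chain instead is an equally defensible choice and only changes the line marked "keep the innermost".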


On Fri, Aug 27, 2021 at 11:15:54PM -0400, Riley Martinez-Lynch
wrote:

Riley Lynch

Aug 29, 2021, 10:13:10 AM
to Robin Lee Powell, loj...@googlegroups.com
I'm going to be spending most of the day driving, but before I do that, I'll try to address a few questions here, and then can follow up by mail or IRC.

(1) I want to confirm that camxes-py is the preferred Python option
these days

I'm not aware of other parsers in python. I specifically developed the parser because I wanted a python implementation to complement your java implementation and Masato and Ilmen's javascript parsers.

I notice now that Randall Holmes has developed a Python PEG parser for Loglan.

(2) I want to be able to run … it in a direct, straightforward way … and tree should contain an obvious python representation of the
parse tree
(5) Make a mode that collapses productions with only one child

Running will be the easy part. The representation of the parse tree raises some interesting questions.

For camxes-py, I created a transformation of the parse tree which replicated the output of Ilmentufa. I did this so that I could run against the test corpus that you set up for java camxes and verify not only that the python parser could accept the same corpus as the java and javascript parsers, but that it was comprehending the same structures.

That said, the output exposes a lot of the mechanics of the parser specification and obscures the semantics. Ideally, I'd like for the test suites to target compatibility with a semantically-structured representation of the parse. There's been some work on Ilmentufa to post-process the parse tree into something more palatable. Have you taken a look at that?

(3) Update to most-recent parsimonious; it currently breaks on 0.8.1, but works on 0.6.2

I wrote against the most recent version of parsimonious at that time. Glad to see work has continued. I remember the author was working on some performance enhancements, and one problem with camxes-py in its current form is that it is slow.

(4) Update to Python 3

I agree that this should be done.

Robin Lee Powell

Aug 29, 2021, 1:59:42 PM
to Riley Lynch, loj...@googlegroups.com
Oh, it turns out I was looking in the "wrong" repo:
https://github.com/lojban/camxes-py has the Python 3 stuff done
already, by mezohe

On Sun, Aug 29, 2021 at 10:12:56AM -0400, Riley Lynch wrote:
> I'm going to be spending most of the day driving, but before I do that,
> I'll try to address a few questions here, and then can follow up by mail or
> IRC.
>
> (1) I want to confirm that camxes-py is the preferred Python option
> these days
>
> I'm not aware of other parsers in python. I specifically developed the
> parser because I wanted a python implementation to complement your java
> implementation and Masato and Ilmen's javascript parsers.

Well, there's https://github.com/lojban/python-camxes :)

> I notice now that Randall Holmes has developed a Python PEG parser for
> Loglan.
>
> (2) I want to be able to run … it in a direct, straightforward way … and
> tree should contain an obvious python representation of the
> parse tree
> (5) Make a mode that collapses productions with only one child
>
> Running will be the easy part. The representation of the parse tree raises
> some interesting questions.

I'm *far* more interested in someone else doing the running part,
FWIW; I feel competent to play with the parse tree after the fact,
but I don't really know idiomatic Python so if I try to make a
library out of what's there it's going to suck.

> For camxes-py, I created a transformation of the parse tree which
> replicated the output of Ilmentufa. I did this so that I could run against
> the test corpus that you set up for java camxes and verify not only that
> the python parser could accept the same corpus as the java and javascript
> parsers, but that it was comprehending the same structures.
>
> That said, the output exposes a lot of the mechanics of the parser
> specification and obscures the semantics. Ideally, I'd like for the test
> suites to target compatibility with a semantically-structured
> representation of the parse. There's been some work on Ilmentufa to
> post-process the parse tree into something more palatable. Have you taken a
> look at that?

Nope, I actually didn't realize that ilmentufa was a thing until
this conversation. (I'd heard of it, but didn't know what it was.)

(Side comment: the "About" page for both camxes and jboski now
points to all alternatives I'm aware of.)

> (3) Update to most-recent parsimonious; it currently breaks on 0.8.1, but
> works on 0.6.2
>
> I wrote against the most recent version of parsimonious at that time. Glad
> to see work has continued. I remember the author was working on some
> performance enhancements, and one problem with camxes-py in its current
> form is that it is slow.

Again, I'm perfectly happy to do that part, fwiw.

Ilmen

Aug 29, 2021, 5:53:26 PM
to loj...@googlegroups.com
Javascript Camxes (Ilmentufa) is at:

https://github.com/lojban/ilmentufa/ (repository)

https://lojban.github.io/ilmentufa/camxes.html (one of the HTML
interfaces)

https://lojban.github.io/ilmentufa/glosser/glosser.htm (another HTML
interface, allowing nested-box output)

Ilmentufa can also be used via command line by running "run_camxes.js"
with Node.js (see the readme file for details).

—Ilmen.

Robin Lee Powell

Oct 18, 2021, 1:05:26 AM
to loj...@googlegroups.com
Thanks to https://github.com/olpa for getting this done!

I have now done the next step: camxes.lojban.org now uses pure
Python parsing and runs on Python 3.9; see
https://github.com/lojban/visual-camxes/commit/eb850f7aad2e99bbcbc29c43a0c79ebbfdd06646
and https://github.com/lojban/visual-camxes/commit/4dd9cf4698753a3b69129d8793df0f8617f1fea0