vlastezba: First beta version released!

Johan Pretorius

Apr 20, 2011, 5:02:28 AM
to lojban-beginners
Hi all

You can download it from here: http://sourceforge.net/projects/vlastezba/files/vlastezba.jar/download

I have completed the cmavo cluster breakout code, and tested it as far as I was able.

It should be easy enough to run if you have Java 1.6 installed: just run java -jar vlastezba.jar and it will print out usage instructions.

Please download it and test it to pieces!  I'd love all your feedback.

Note that it isn't very smart at this stage. For instance, it won't know what to do if you feed it a string of Lojban that doesn't have any spaces in it.  The only clever bit is that it's able to break apart cmavo clusters that are written without spaces.
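
To illustrate what "breaking apart a cmavo cluster" involves, here is a minimal sketch. It is not vlastezba's actual code: the class name, the method name and the tiny cmavo set are assumptions made up for the example, and a real tool would load the full cmavo list. It tries the longest known prefix first and backtracks when the remainder cannot be split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: split an unspaced cmavo cluster into known cmavo.
class CmavoClusterSketch {
    // Tiny stand-in for the real cmavo list (an assumption for this example).
    static final Set<String> CMAVO =
            new HashSet<String>(Arrays.asList("coi", "ro", "do", "ba'e", "mi"));

    // Returns the cmavo making up the cluster, or null if it cannot be split.
    static List<String> split(String cluster) {
        if (cluster.isEmpty()) {
            return new ArrayList<String>();
        }
        // Try the longest candidate prefix first; backtrack if the rest fails.
        for (int end = cluster.length(); end > 0; end--) {
            String head = cluster.substring(0, end);
            if (!CMAVO.contains(head)) {
                continue;
            }
            List<String> rest = split(cluster.substring(end));
            if (rest != null) {
                rest.add(0, head);
                return rest;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(split("coirodo"));             // [coi, ro, do]
        System.out.println(split("ba'eba'erba'ercatra")); // null: not a pure cmavo cluster
    }
}
```

Returning null for anything that is not entirely made of known cmavo keeps the sketch honest about its limits: it says nothing about gismu, lujvo or cmene.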

Regards,
Johan

--
Johan Pretorius
Cell: 0829268327

.alyn.post.

Apr 20, 2011, 10:29:11 AM
to lojban-b...@googlegroups.com
Do you have an external representation for your valsi parsing
result? If I hand you the string "coirodo" is there a print
form of that along the lines of ("coi" "ro" "do")?

I would be interested in seeing the result from processing a large
data set of words and phrases and comparing that to jbogenturfa'i.
In order to do this I'd need some output format from your program
that I could parse.

jbogenturfa'i uses the morphology PEG grammar that xorxes developed,
so it contains code which I think is similar (and should be
identical in result) to what you are doing:

$ echo "coirodo"|jbogenturfahi --rafske
((cmavo (COI "coi")) (cmavo (PA "ro")) (cmavo (KOhA "do")))

I'd be curious to know whether they are in fact producing identical
results.

-Alan

--
.i ma'a lo bradi ku penmi gi'e du

Johan Pretorius

Apr 20, 2011, 10:51:51 AM
to lojban-b...@googlegroups.com
Hi Alan,

That would indeed be an interesting experiment; I'd be quite keen to see the results myself.

Right now, if you just call

   java -jar vlastezba.jar test.txt

with some Lojban text (legal or otherwise) in test.txt, it will print one valsi per line on stdout.  So "coirodo" would result in:
   coi
   ro
   do
(You can make it look up the definitions by passing a second parameter, but that will just add junk to the output that I don't think you'd want.)

Right now it doesn't check grammar at all, so you can throw any random collection of words at it. I don't intend for it to ever check grammar; there are tools out there that are far better at that than I could ever hope to make it.

It also won't give you a classification of valsi - it doesn't "know" when it's dealing with a cmavo (or indeed what class), or a gismu, or a lujvo.  This I DO intend to fix.

I want to add other output formats anyway, so if you want me to do something specific to make your comparison easier, let me know.  Now would be a good time, as I'm going away on holiday for a week, and wanted to spend at least a little bit of time on vlastezba.

In fact, if you are comfortable with Java, feel free to make it do what you need. The source code is on sourceforge.net (http://sourceforge.net/projects/vlastezba/), and is GPL'ed :-)

mu'o mi'e iu'an

Brian Shannon

Apr 20, 2011, 11:02:36 AM
to lojban-b...@googlegroups.com
"This software may not be copied or distributed in any form without
the written permission of Postilion."

This is *not* GPL'ed. Assuming you are the sole copyright holder,
follow the GNU guide to license your software under the GPL.
http://www.gnu.org/licenses/gpl-howto.html

Johan Pretorius

Apr 20, 2011, 11:06:05 AM
to lojban-b...@googlegroups.com
Oh wow, sorry about that... that would be my Eclipse being configured for work purposes.  I'll get rid of that right away.

Thanks for the link!
Johan

.alyn.post.

Apr 20, 2011, 11:12:14 AM
to lojban-b...@googlegroups.com
I can more-or-less work with what it does now, so that is
sufficient for experimentation.

I routinely write code like |if(var=="foo")| when I mean
|if(var.equals("foo"))|; my Java isn't what it could be.
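
For readers who haven't run into this, a small sketch of the difference: in Java, == on strings compares object references, while equals compares the characters. The class and method names here are made up for the illustration.

```java
// Demonstrates reference comparison (==) versus value comparison (equals).
class StringCompareSketch {
    static boolean sameReference(String a, String b) {
        return a == b;       // true only if both variables refer to the same object
    }

    static boolean sameText(String a, String b) {
        return a.equals(b);  // true if the character sequences match
    }

    public static void main(String[] args) {
        String literal = "foo";
        String built = new String("foo"); // a distinct object holding the same text
        System.out.println(sameReference(literal, built)); // false
        System.out.println(sameText(literal, built));      // true
    }
}
```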

I'm able to parse XML for tree-structured data, which is probably
the easiest choice for interoperability:

XML:

<pruce>
<selruhe>coi ro do</selruhe>
<teryruhe>
<cmavo selmaho="COI">coi</cmavo>
<cmavo selmaho="PA">ro</cmavo>
<cmavo selmaho="KOhA">do</cmavo>
</teryruhe>
</pruce>

If this makes you cringe, then how about:

csv:

klesi,valsi
COI,coi
PA,ro
KOhA,do

Which unfortunately doesn't include the input string; I don't see a
simple way to do that that is normal (as in normal form).
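
As a rough sketch of how a Java tool could emit the two formats above (this is not vlastezba code; the class, the Entry type and the method names are all assumptions for the example), something like the following would do. It assumes every word is a cmavo, as in the sample, and does no XML or CSV escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two output formats proposed above.
class ValsiOutputSketch {
    // One classified word: klesi is the class (here a selma'o), valsi the word.
    static final class Entry {
        final String klesi;
        final String valsi;

        Entry(String klesi, String valsi) {
            this.klesi = klesi;
            this.valsi = valsi;
        }
    }

    // CSV form: a header line followed by one "klesi,valsi" line per word.
    static String toCsv(List<Entry> entries) {
        StringBuilder sb = new StringBuilder("klesi,valsi\n");
        for (Entry e : entries) {
            sb.append(e.klesi).append(',').append(e.valsi).append('\n');
        }
        return sb.toString();
    }

    // XML form: includes the input string, then one element per word.
    static String toXml(String input, List<Entry> entries) {
        StringBuilder sb = new StringBuilder("<pruce>\n");
        sb.append("<selruhe>").append(input).append("</selruhe>\n");
        sb.append("<teryruhe>\n");
        for (Entry e : entries) {
            sb.append("<cmavo selmaho=\"").append(e.klesi).append("\">")
              .append(e.valsi).append("</cmavo>\n");
        }
        sb.append("</teryruhe>\n</pruce>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<Entry>();
        entries.add(new Entry("COI", "coi"));
        entries.add(new Entry("PA", "ro"));
        entries.add(new Entry("KOhA", "do"));
        System.out.print(toXml("coi ro do", entries));
        System.out.print(toCsv(entries));
    }
}
```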

-Alan

.alyn.post.

Apr 20, 2011, 11:22:15 AM
to lojban-b...@googlegroups.com
I'm not getting the result you report:

$ echo "coirodo"|java -jar vlastezba.jar /dev/fd/0
Read file [/dev/fd/0], got [0] unique words.

This is also happening if I write the file and try it:

$ cat test.txt
coirodo
$ java -jar vlastezba.jar test.txt
Read file [test.txt], got [0] unique words.

Here is my java version:

$ java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07-334-10M3326)
Java HotSpot(TM) Client VM (build 19.1-b02-334, mixed mode)

-Alan

Johan Pretorius

Apr 20, 2011, 11:24:31 AM
to lojban-b...@googlegroups.com
Okay, the licensing is fixed now.

Alan, the fact that you know that's a problem puts you in 1% of the population :-)

Anyway, I'm not diametrically opposed to XML just for the sake of being opposed... it's worth looking at, especially, as you say, for interoperability.

Do you think it's necessary to include the input string?  I foresee vlastezba being used for large bodies of text; at least, that's how I intend to use it myself. I feed it the Terry the Tiger story and let it build me something I can print out, so that my sucky vocabulary doesn't stop me from reading the story, albeit slowly.

Maybe it's a good idea to make that configurable.

-Johan

Brian Shannon

Apr 20, 2011, 11:31:35 AM
to lojban-b...@googlegroups.com
On 20/04/2011, Johan Pretorius <preto...@gmail.com> wrote:

.alyn.post.

Apr 20, 2011, 11:42:36 AM
to lojban-b...@googlegroups.com
I had not considered the use case of a large story, I was thinking
of individual test strings and my need to know how input was paired
with output. Particularly, I didn't want erroneous input in one
test case to cause another input to parse incorrectly. I can (and
should, really) work around this by calling the program multiple
times.

BTW, what result does your program produce for:

ba'e ba'er ba'ercatra

That should be something like:

((cmavo (BAhE "ba'e")) (cmene "ba'er") (lujvo "ba'ercatra"))

With different results being produced depending on whether the
spaces are there or not. I'm curious if you're handling that
correctly.

-Alan

Johan Pretorius

Apr 20, 2011, 12:02:09 PM
to lojban-b...@googlegroups.com
To be honest, I sucked that example out of my ear based on how it was meant to work.  The reason you didn't get that result is that there were two bugs in the code:
 - We were ignoring the first line of any file (fixed).
 - When the last word of the file is a compound cmavo, it gets misparsed: you only get the first cmavo from the cluster, and the rest are ignored.  This one needs more careful thought.

So, with the new jar file that I'm uploading as I speak, you should be able to do what you tried to; just make sure you tack a gismu or something onto the end of your file, so that you get accurate results.  This time I tested it before making any wild claims :-)

Johan Pretorius

Apr 20, 2011, 12:08:55 PM
to lojban-b...@googlegroups.com
What is {ba'er} and what is {ba'ercatra}?

When the spaces are there, I get:
   Read file [coirodo.txt], got [3] unique words.
   ba'er    
   ba'e     
   ba'ercatra

When I take out the spaces, I get...
   Read file [coirodo.txt], got [1] unique words.
   ba'eba'erba'ercatra

Which is... correct, I think?  ba'er isn't a cmavo, so that thing isn't a cmavo cluster.


Johan Pretorius

Apr 20, 2011, 12:13:37 PM
to lojban-b...@googlegroups.com
I have added this as a bug to the project so it doesn't fall into a crack and disappear.

.alyn.post.

Apr 20, 2011, 12:16:07 PM
to lojban-b...@googlegroups.com
Ha! I took a look at your parser and I can see how both those
mistakes could be made. :-)

I haven't updated my .jar file, I tried instead to work around the
bugs you report below:

$ echo "^Mba'e ba'er ba'ercatra broda"|java -jar vlastezba.jar /dev/fd/0
Read file [/dev/fd/0], got [4] unique words.


ba'er
ba'e
ba'ercatra

broda

Ok, that seems pretty reasonable. If I remove all the spaces, I still
expect there to be two words:

$ echo "^Mba'eba'erba'ercatra broda"|java -jar vlastezba.jar /dev/fd/0
Read file [/dev/fd/0], got [2] unique words.


ba'eba'erba'ercatra
broda

That "ba'eba'erba'ercatra" should be two words, "ba'e" and the lujvo
"ba'erba'ercatra"

It also appears I can break things by using '.':

$ echo "^Mba'e.ba'erba'ercatra broda"|java -jar vlastezba.jar /dev/fd/0
Read file [/dev/fd/0], got [2] unique words.


ba'e.ba'erba'ercatra
broda

There aren't any Lojban words with '.' in them.

This problem is perhaps better demonstrated here:

$ echo "^Mcoi.ro.do broda"|java -jar vlastezba.jar /dev/fd/0
lojban.vlastezba.TokenizerFailure: Could not find any cmavo in [coi.ro.do] - last candidate cmavo was [.r], cmavo list is: {coi}
at lojban.vlastezba.LojbanTokenizer.breakOutCmavo(LojbanTokenizer.java:292)
at lojban.vlastezba.LojbanTokenizer.getNextWord(LojbanTokenizer.java:182)
at lojban.vlastezba.LojbanTokenizer.nextWord(LojbanTokenizer.java:473)
at lojban.vlastezba.GlossaryCreator.loadHashMap(GlossaryCreator.java:128)
at lojban.vlastezba.GlossaryCreator.createGlossary(GlossaryCreator.java:32)
at lojban.vlastezba.GlossaryCreator.main(GlossaryCreator.java:183)


Canonically, whitespace in Lojban is some or all of the whitespace
character class in your locale (until we get our own locale, probably the
English whitespace character class) plus the '.' character. Informally, it
often also includes any punctuation other than ' (and its UTF-8
variant) and the comma.
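
A minimal sketch of that first rule, treating '.' like whitespace before any word-level parsing happens. This is an illustration only, not a patch to LojbanTokenizer; the class and method names are invented, and it uses Java's \s class as a stand-in for "the whitespace character class in your locale".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-tokenizer: splits text on runs of whitespace and '.'.
class LojbanSeparatorSketch {
    static List<String> chunks(String text) {
        List<String> out = new ArrayList<String>();
        // Inside a character class, '.' is a literal dot, not "any character".
        for (String piece : text.split("[\\s.]+")) {
            if (!piece.isEmpty()) { // a leading separator yields one empty piece
                out.add(piece);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(chunks("coi.ro.do broda"));             // [coi, ro, do, broda]
        System.out.println(chunks("\rba'e.ba'erba'ercatra broda")); // [ba'e, ba'erba'ercatra, broda]
    }
}
```

The apostrophe and comma are deliberately left alone, since they occur inside Lojban words.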

-Alan

.alyn.post.

Apr 20, 2011, 12:29:23 PM
to lojban-b...@googlegroups.com
ba'e, ba'er, and ba'ercatra are to me a fun demonstration of an
interesting aspect of Lojban morphology.

ba'e is a cmavo for emphasis. ba'e is *also* a rafsi for balre (blade).
Since a rafsi can't appear as a word by itself, it is perfectly legal
for the letters making up a rafsi to have a separate, independent cmavo
definition. ba'er is also a valid cmene, so it is a cute example of an
interesting case where ba'e, by itself, is a cmavo, but in a lujvo
means "blade." In the particular lujvo I chose, the required 'r' after
ba'e, when *not* followed by more lujvo-stuff makes ba'er a cmene, having
nothing to do with either emphasis or bladeness.
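
The three shapes can be told apart with a deliberately simplified check, sketched below. It is an approximation, not the full morphology: it only looks at whether the word ends in a consonant (cmene) and whether it contains two adjacent consonants (brivla, which covers lujvo), and it ignores the finer rules such as where the cluster must occur. The class and method names are made up, and the input is assumed to be a single non-empty word without pauses.

```java
// Simplified, hypothetical word-shape check; not a complete morphology.
class ValsiShapeSketch {
    static boolean isConsonant(char c) {
        return "bcdfgjklmnprstvxz".indexOf(c) >= 0;
    }

    static String classify(String word) {
        // A word ending in a consonant is treated as a cmene.
        if (isConsonant(word.charAt(word.length() - 1))) {
            return "cmene";
        }
        // Two adjacent consonants mark a brivla (gismu, lujvo, fu'ivla).
        for (int i = 0; i + 1 < word.length(); i++) {
            if (isConsonant(word.charAt(i)) && isConsonant(word.charAt(i + 1))) {
                return "brivla";
            }
        }
        // No cluster and a final vowel: treat as a cmavo.
        return "cmavo";
    }

    public static void main(String[] args) {
        System.out.println(classify("ba'e"));       // cmavo
        System.out.println(classify("ba'er"));      // cmene
        System.out.println(classify("ba'ercatra")); // brivla
    }
}
```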

-Alan

Pierre Abbat

Apr 20, 2011, 1:02:30 PM
to lojban-b...@googlegroups.com
On Wednesday 20 April 2011 12:29:23 .alyn.post. wrote:
> ba'e, ba'er, and ba'ercatra are to me a fun demonstration of an
> interesting aspect of Lojban morphology.

Dan and Baher are killers. Dan shoots people; Baher stabs people. la dan.
dancatra .i la ba'e .ba'er. ba'ercatra

mu'omi'e .pier.
--
I believe in Yellow when I'm in Sweden and in Black when I'm in Wales.

.alyn.post.

unread,
Apr 21, 2011, 8:35:57 AM4/21/11
to lojban-b...@googlegroups.com
Attached you'll find a copy of cmavo.txt and gismu.txt with only the
actual words retained. The definition and class information has
been removed. I ran this file through jbogenturfa'i one line at a
time and output all of the words seen on each line. The result is
called baseline.txt. I did the same for vlastezba, creating
output.txt. I've also attached the script I used to run vlastezba,
which works around the two previously reported issues with the
parser (one of which you've already fixed).

The file diff.txt is a comparison between jbogenturfa'i and
vlastezba. While this is not an exhaustive test (earlier issues
I reported are not covered by these test cases), I do consider it a
sort of minimal test. There are no lujvo, fu'ivla, cmene, or
non-Lojban words here, however.

vlastezba crashes on 80 of the 2433 tests. For the remaining tests,
it produces results consistent with jbogenturfa'i. I suspect most
of these tests are crashing on the same or nearly the same problem;
they're clustered in the input around certain word forms.

Will you fix your program so it no longer crashes on these 80 lines?
Will you add regression testing for these cases so you can verify
that the program still works after changes?
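
A comparison of this kind can be sketched as follows. This is not the attached script. It assumes each tool's result has already been loaded as one list of words per input line, and that a crash is recorded as `None`; both conventions and the name `compare_runs` are made up for illustration.

```python
def compare_runs(baseline, candidate):
    """Compare two tools line by line.

    baseline:  list of word lists, one per input line (reference tool).
    candidate: list of word lists, or None where the tool crashed.
    Returns (crashed_line_numbers, mismatched_line_numbers), 1-based.
    """
    crashed, mismatched = [], []
    for lineno, (want, got) in enumerate(zip(baseline, candidate), start=1):
        if got is None:
            crashed.append(lineno)
        elif sorted(want) != sorted(got):
            # Word order within a line is ignored; only the words matter.
            mismatched.append(lineno)
    return crashed, mismatched


baseline = [["coi", "ro", "do"], ["a'e"], ["mi"]]
candidate = [["do", "coi", "ro"], None, ["mi", "mi"]]
print(compare_runs(baseline, candidate))
```

Keeping crashes separate from mismatches makes it easy to report a figure like "crashes on 80 of 2433" alongside any real disagreements.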

-Alan


vlastezba-cipra.tgz

Johan Pretorius

unread,
Apr 21, 2011, 3:30:26 PM4/21/11
to lojban-b...@googlegroups.com

Hi Alan

I certainly intend to (that is, both fix the errors and add regression tests)! The next week or two is an excellent time for it, too.  I spent today travelling; hopefully tomorrow will afford me a few hours to focus on this.

Thank you for the comparative testing you did! It is something I might not have thought of, left to my own devices :-)

-Johan
sent from my X10 Mini


--
.i ma'a lo bradi ku penmi gi'e du


.alyn.post.

unread,
Apr 21, 2011, 3:41:42 PM4/21/11
to lojban-b...@googlegroups.com
\o/

I look forward to hearing an update from you once you've fixed this
round of bugs. I have another file containing test phrases that has
been passed around and used by a few Lojbanic programs. I don't have
a good description of what the results from that file *should* be, so
I haven't done much with it yet. (The amount of manual labor
required to validate it is beyond my limit.)

Your word boundary detection works differently from what I do by
parsing with PEG, so it would be neat to see how our programs differ
on this comparatively more complicated file. It would at least
narrow down the set of things in that file that should be hand
checked.

-Alan


Johan Pretorius

unread,
Apr 21, 2011, 4:42:40 PM4/21/11
to lojban-b...@googlegroups.com
Hi Alan

I thought I would have a quick look before going to bed... and it seems that the troublesome word forms you mention are mostly (99%) cmavo that don't start with a consonant.

When writing vlastezba, I tried to adhere as strictly as possible to the word forms specified in the CLL, section 4.2. There, all cmavo that start with a vowel have a dot prepended.  So the forms, as I understand them, are as follows, where any two adjacent vowels may or may not have a ' inserted between them:
.V
.y.
CV
.VV
CVV
Cy
CVVV (experimental use only)
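
A minimal sketch of these forms as regular expressions, taking the list above literally (so the dot is part of the form). The names `FORMS` and `cmavo_form` are made up for illustration, and any pair of vowels is accepted here, which is looser than the real rules.

```python
import re

C = "[bcdfgjklmnprstvxz]"
V = "[aeiou]"

# Each entry is (form name, pattern). The dots are matched literally,
# and an apostrophe is optional between adjacent vowels.
FORMS = [
    (".V", rf"\.{V}"),
    (".y.", r"\.y\."),
    ("CV", rf"{C}{V}"),
    (".VV", rf"\.{V}'?{V}"),
    ("CVV", rf"{C}{V}'?{V}"),
    ("Cy", rf"{C}y"),
    ("CVVV", rf"{C}{V}'?{V}'?{V}"),
]


def cmavo_form(word):
    """Return the name of the matching form, or None if nothing matches."""
    for name, pattern in FORMS:
        if re.fullmatch(pattern, word.lower()):
            return name
    return None


for w in (".a", ".y.", "do", ".u'e", "coi", "by", "a'e", "ybu"):
    print(w, cmavo_form(w))
```

With the dot baked into the patterns, "a'e" and "ybu" match none of the forms.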

The only dot I could find in cmavo.txt was in "na.a", which seems like a mistake to me.  Anyway, I took cmavo.txt, prepended dots to all lines starting with a vowel, and ran it again. This time vlastezba choked on "ybu".  I remember now that I hadn't defined y as either a consonant or a vowel, because the CLL was clear that it is NOT a vowel, but it wasn't at all clear that it was a consonant.  Indeed, if "ybu" is a valid cmavo, that would seem to imply that "y" is a vowel, yet I clearly recall reading a list of vowels that excluded "y", with the explanation that it isn't really a vowel.

So I removed the "ybu" line from the file, and this time it parsed okay (see attached cmavo_dots.stdout).  Do you mind running the attached cmavo_dots.txt through jbogenturfa'i to see if we get fewer than 80 discrepancies this time? (I would do it myself, but I don't have a copy of jbogenturfa'i or, for that matter, a copy of Linux.)

So I have three questions:
1) Is "y" a vowel or a consonant?
2) Are dots before vowel-only cmavo required or not?
3) If the dots are required, am I being reasonable to expect them to be present?

-Johan


cipra.zip

.alyn.post.

unread,
Apr 21, 2011, 5:01:12 PM4/21/11
to lojban-b...@googlegroups.com
Here are the definitions of consonants and vowels from the PEG file:


consonant <- voiced
           / unvoiced
           / syllabic

syllabic  <- l / m / n / r

voiced    <- b / d / g / j / v / z
unvoiced  <- c / f / k / p / s / t / x

vowel     <- a / e / i / o / u

You'll have to ask another question. :-)
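
Transcribed into sets, the definitions above make the point directly: "y" lands in neither class. This is only a sketch of the PEG rules quoted here, and the helper name `letter_class` is made up for illustration.

```python
SYLLABIC = set("lmnr")
VOICED = set("bdgjvz")
UNVOICED = set("cfkpstx")
CONSONANTS = VOICED | UNVOICED | SYLLABIC
VOWELS = set("aeiou")


def letter_class(ch: str) -> str:
    """Classify one letter according to the sets above."""
    if ch in VOWELS:
        return "vowel"
    if ch in CONSONANTS:
        return "consonant"
    return "neither"


print(letter_class("y"))
```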

ybu is valid, it being the y letteral in selma'o BY (I believe it has
also been two words, Y+BU, in the past though I could be wrong about
that. For your purposes either definition is probably ok.)

Will you quote the phrase you're referring to regarding pauses? I
see this one:

The cmavo “.u'e” begins with a vowel, and like all words
beginning with a vowel, requires a pause (represented by “.”)
before it.

I understand "represented" to be different than "required." In
particular, these lines are all the same:

.a'e.i'o
a'e.i'o
a'e i'o
.a'e i'o

That is, '.' and ' ' are treated the same. Section 3.3 has this:

Technically, the period is an optional reminder to the reader of a
mandatory pause that is dictated by the rules of the language;
because these rules are unambiguous, a missing period can be
inferred from otherwise correct text. Periods are included only as
an aid to the reader.

A period also may be found apparently embedded in a word. When
this occurs, such a written string is not one word but two,
written together to indicate that the writer intends a unitary
meaning for the compound. It is not really necessary to use a
space between words if a period appears.

Given my understanding of the above, I believe your requiring a
period to represent this pause to be in error.
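
A minimal sketch of that reading, assuming the tokenizer simply treats a period exactly like whitespace (the function name `split_on_pauses` is made up for illustration):

```python
import re


def split_on_pauses(text: str):
    """Split on any run of periods and whitespace, dropping empty pieces."""
    return [piece for piece in re.split(r"[.\s]+", text) if piece]


lines = [".a'e.i'o", "a'e.i'o", "a'e i'o", ".a'e i'o"]
for line in lines:
    print(repr(line), split_on_pauses(line))
```

All four spellings yield the same two words, a'e and i'o. This only handles the explicit separators; breaking apart a cluster written with no separator at all is a separate problem.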

I will rerun my baseline and reply back with it.

-Alan


Jonathan Jones

unread,
Apr 21, 2011, 5:09:29 PM4/21/11
to lojban-b...@googlegroups.com

All Lojban words beginning with a vowel are supposed to have "." prepended. However, the dots are not /required/ in text as long as the word boundaries are clearly defined.

It is assumed that in such cases the reader knows that the pause normally indicated by "." is implicitly there, so in practice, many jbopre don't write it, either because of aesthetics, laziness, or possibly other reasons.

For example, {coi djeims mi e do klama lo vecnu} is the same as {coi.djeims. mi .e do klama lo vecnu}.

Personally, I feel that omitting /any/ "." is a bad practice that is potentially confusing, especially to newbies, so I endeavor not to do so.

to pu benji di'u fo lo mi me la.android. fonxa toi

mu'o mi'e.aionys.


Jonathan Jones

unread,
Apr 21, 2011, 5:10:15 PM4/21/11
to lojban-b...@googlegroups.com

Jonathan Jones

unread,
Apr 21, 2011, 5:11:52 PM4/21/11
to lojban-b...@googlegroups.com

Jonathan Jones

unread,
Apr 21, 2011, 5:12:09 PM4/21/11
to lojban-b...@googlegroups.com

to pu benji di'u fo lo mi me la.android. fonxa toi

mu'o mi'e.aionys.


Jonathan Jones

unread,
Apr 21, 2011, 5:15:09 PM4/21/11
to lojban-b...@googlegroups.com

.alyn.post.

unread,
Apr 21, 2011, 5:27:44 PM4/21/11
to lojban-b...@googlegroups.com
.i .uanai la'e zo di'u .i lo se benji ku no nilcla
.i mu'o mi'e .alyn.


Johan Pretorius

unread,
Apr 21, 2011, 5:34:13 PM4/21/11
to lojban-b...@googlegroups.com
Hmmm.... okay, clearly a more thorough reading of the CLL is required... I kinda just jumped in at the section I thought I needed. :-)

Given this, I'll have to go over my parsing logic again. It is incorrect as it stands: I assumed the dot is an integral part of the word form.

Thanks for all your help, guys!






Jonathan Jones

unread,
Apr 21, 2011, 6:13:06 PM4/21/11
to lojban-b...@googlegroups.com
la.djimeil.app. na'e nelci lo mi fonxa
--
mu'o mi'e .aionys.

.i.a'o.e'e ko cmima le bende pe lo pilno be denpa bu .i doi.luk. mi patfu do zo'o
(Come to the Dot Side! Luke, I am your father. :D )

.alyn.post.

unread,
Apr 22, 2011, 12:27:58 AM4/22/11
to lojban-b...@googlegroups.com
On Thu, Apr 21, 2011 at 03:15:09PM -0600, Jonathan Jones wrote:
> Personally, I feel that omitting /any/
> "." is a bad practice that is potentially confusing, especially to
> newbies, so I endeavor not to do so.
>

.i ji'a mi nelci lo la'e di'u se catlu


.i mu'o mi'e .alyn.

.alyn.post.

unread,
Apr 22, 2011, 12:46:49 AM4/22/11
to lojban-b...@googlegroups.com
On Thu, Apr 21, 2011 at 10:42:40PM +0200, Johan Pretorius wrote:
> Do you mind running the attached
> cmavo_dots.txt through jbogenturfa'i to see if we get fewer than 80
> discrepancies this time (I would do it myself, but I don't have a copy of
> jbogenturfa'i or, for that matter, a copy of Linux)
>

.i .ui.ue ki'u lo do se pruce la jbogenturfa'i la vlastezba cu jalge mintu
to zoi .fanva.

With your provided input file, I get the same result that your
program does!

.fanva. toi

.i mu'o mi'e .alyn.

Jonathan Jones

unread,
Apr 22, 2011, 2:19:42 AM4/22/11
to lojban-b...@googlegroups.com
.i.uanai xu do cusku la'e zo .ie

Johan Pretorius

unread,
Apr 22, 2011, 12:45:28 PM4/22/11
to lojban-b...@googlegroups.com
Hi Alan, all

Alan, can I please ask you to run the attached four files through jbogenturfa'i, and send me back the results?  I have a visual tool (kdiff3) to compare them to my results, which makes it easier for me to figure out what is going on.

New release!  Get it here: http://sourceforge.net/projects/vlastezba/files/vlastezba_21.jar/download

In this release, I have fixed a bunch of things:
 - Dots are no longer assumed to be an integral part of a word.  If a dot is found, it is now treated as a word separator, in exactly the same way as a space.  Beyond this, dots are completely ignored and are removed from the input stream.
 - "ybu" and "y'y" now parse.  Since no clarity was to be had about whether y is a vowel, a consonant, neither, or both, I just added those two as special cases.  I already had a free-standing "y" as a special case in there, because it is explicitly mentioned in the CLL (section 4.3, I think).
 - The last cmavo cluster in a file is no longer misparsed.  Specifically, I added a regression test and a unit test for "coirodo" appearing on a single line in its own file, and it finds 3 words, as you would expect.
 - Output is now always ordered alphabetically.  Previously it was in arbitrary order because I stored the words in an unordered HashMap.
 - Previously we seemed to produce some duplicates (I guess this could happen if there was extra whitespace in the words).  This only happened in about 0.5% of cases.  I did not consciously fix this, but it no longer seems to happen.
 - Internally, the logic is much better organized.  The parsing logic is no longer all stuffed into a single class; instead there is a class hierarchy with a class for each word class, the idea being that each will have its own specialized processing.  The main point of doing this was to enrich the results returned by the tokenizer, which gives us more flexibility in future (for example, if we find a lujvo, we will know what its rafsi are, so we can give the user a list of those, look up their gismu's definitions, and so on).
 - Added regression tests.  There are 4 files: the Terry the Tiger story, the Berenstain Bears story, a file containing only "coirodo" on a single line, and a file containing a list of all recognized cmavo (about 1000 lines).  I also added a script that runs all of these through vlastezba, compares the outputs against "expected" results, and writes the diffs into a single file (test_result.txt).  Note that the "expected" results are baselined off this release, so the tests cannot report any problems yet.  However, the next time a change is made, it will be possible to see how the regression tests are affected.  The expected results can then be manually updated to be more correct, making the tests more correct over time.
 - Added 2 unit tests to the existing ones, specifically to test these two cases: "coirodo" and "ybu", since both were problems that got fixed in this release.
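
The regression idea in the list above (run each input, compare against a stored "expected" word list, collect the diffs) can be sketched like this. It is not the script shipped with the release: the `cases` layout, the `tokenize` callback and both function names are made up for illustration, and only Python's standard `difflib` is used.

```python
import difflib


def diff_against_expected(name, expected_lines, actual_lines):
    """Return unified-diff lines; an empty list means the case passed."""
    return list(difflib.unified_diff(
        expected_lines, actual_lines,
        fromfile=f"{name}.expected", tofile=f"{name}.actual", lineterm=""))


def run_regression(cases, tokenize):
    """cases maps a case name to (input_text, expected_sorted_words)."""
    report = []
    for name, (text, expected) in sorted(cases.items()):
        # Sort and de-duplicate so the output order is deterministic.
        actual = sorted(set(tokenize(text)))
        report.extend(diff_against_expected(name, expected, actual))
    return report


# Stand-in tokenizer for the demo; a real run would call the word splitter.
cases = {"coirodo": ("coi ro do", ["coi", "do", "ro"])}
print(run_regression(cases, str.split))
```

Writing the returned lines to a single file gives something like the test_result.txt described above. An empty report right after baselining is expected, since the expected results were produced by the same code.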

By the way, does anybody know how to do a formal release on SourceForge?  Aside from just uploading the jar file, which is what I'm doing currently.

Regards,
iu'an
tests.zip

.alyn.post.

unread,
Apr 22, 2011, 1:23:38 PM4/22/11
to lojban-b...@googlegroups.com
I have run these four files through jbogenturfahi with the --rafske
option. I have attached both the raw output[1] and the post-processed
output[2].

The post-processed output is hopefully what you want, a sorted list
of words, one per line, that appear in each input file.

1: The raw output is in Scheme, and contains more information but is
also more difficult to parse without a Scheme reader.
2: The program I used to perform post-processing is attached as
well, though it also requires having Scheme. I include it for
informational purposes.
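
For anyone without a Scheme reader, a rough way to get from the raw output to a sorted word list is sketched below. It is not the attached post-processing program. It assumes the raw output looks like the --rafske sample shown earlier in this thread, where each word is the only double-quoted string in its form and contains no escaped quotes; the function name is made up for illustration.

```python
import re


def words_from_rafske(scheme_text: str):
    """Collect every double-quoted string, de-duplicated and sorted."""
    return sorted(set(re.findall(r'"([^"]*)"', scheme_text)))


sample = '((cmavo (COI "coi")) (cmavo (PA "ro")) (cmavo (KOhA "do")))'
for word in words_from_rafske(sample):
    print(word)
```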

-Alan


jbogenturfahi-cipra.zip

.alyn.post.

unread,
Apr 22, 2011, 1:28:32 PM4/22/11
to lojban-b...@googlegroups.com
On Fri, Apr 22, 2011 at 06:45:28PM +0200, Johan Pretorius wrote:
> By the way, does anybody know how to do a formal release on SourceForge?
> Aside from just uploading the jar file, which is what I'm doing currently.
>
> Regards,
> iu'an

The last time I did a release on sourceforge, I had to upload my
tarball to an ftp server, then go to a special page on sourceforge
and "select" my file from the list of recent uploads. It would then
be attached as a release to my project.

That was some years ago, so the process may have changed.

-Alan

Johan Pretorius

unread,
Apr 22, 2011, 4:25:03 PM4/22/11
to lojban-b...@googlegroups.com
Thanks!

Looking at the comparison, I spotted these bugs in vlastezba:
  1. For some reason, {bybycy} and {bycy} don't fall apart.  I expect them to, they're cmavo clusters.
  2. Same thing with {mycy}
  3. Same thing with {pycy}
  4. Same thing with {lety}
  5. vlastezba does not ignore case (it should... in cmavo_list, vlastezba does not break apart these two at all: {cu'uko'aBAI*}, {da'aremoiMOI*}
And these, which appear to be bugs in jbogenturfa'i:
  1. It seems to think {jefyfa'o} is not a cmavo cluster.
  2. Same thing with {ticyve'u}
  3. Same thing with {cikygau}
  4. Same thing with {vacysai}
cmavo_list has two entries that have asterisks as their last character.  As far as I understand, asterisks mean nothing in Lojban.  Does that mean we need to ignore them (i.e. delete them from the input stream)?

And I just realized I had been rude - I never sent you my output files!  My apologies, they're attached now.

I'm really excited that we're getting such close results!

Cheers,
Johan
tests.zip

.alyn.post.

unread,
Apr 22, 2011, 4:52:56 PM4/22/11
to lojban-b...@googlegroups.com
On Fri, Apr 22, 2011 at 10:25:03PM +0200, Johan Pretorius wrote:
> Thanks!
>
> Looking at the comparison, I spotted these bugs in vlastezba:
>
> 1. For some reason, {bybycy} and {bycy} don't fall apart. I expect them
>    to, they're cmavo clusters.
> 2. Same thing with {mycy}
> 3. Same thing with {pycy}
> 4. Same thing with {lety}
> 5. vlastezba does not ignore case (it should... in cmavo_list, vlastezba
>    does not break apart these two at all: {cu'uko'aBAI*}, {da'aremoiMOI*}
>

I believe that the PEG grammar does not forbid capitalization
outside of cmene, which the CLL implies. It might actually be
ok to properly mark stress outside of a cmene, I'm not super
clear. But yeah, the PEG grammar does not distinguish between
capital- and lower-case lerfu.

I think the two examples you cited, {cu'uko'aBAI*} and {da'aremoiMOI*},
are a parsing error. The BAI* and MOI* are the second column in the
file. Did I send it that way, or did that happen on your end?

The * there indicates that the cmavo cluster is not actually of the
indicated selma'o, but acts as if it were.


> And these, which appear to be bugs in jbogenturfa'i:
>
> 1. It seems to think {jefyfa'o} is not a cmavo cluster.
> 2. Same thing with {ticyve'u}
> 3. Same thing with {cikygau}
> 4. Same thing with {vacysai}
>

I believe all of these are a single class, those being lujvo with
impermissible consonant clusters (ff, cv, kg, and cs). When
constructing lujvo from rafsi, impermissible consonant clusters must
be buffered with 'y'. The CLL section discussing this is here:

"The lujvo-making algorithm"
http://dag.github.com/cll/4/11/
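
As a rough illustration of the 'y' buffering only: the sketch below joins two rafsi and inserts "y" when the consonant pair at the joint is impermissible. The pair rules are the ones the CLL gives for consonant pairs (no doubled consonant, no voiced/unvoiced mix except with l, m, n, r, no two of c, j, s, z, and not cx, kx, xc, xk or mz). The real lujvo-making algorithm has further steps (r/n hyphens, long rafsi and so on) that are left out here, and the function names are made up for illustration.

```python
VOICED = set("bdgjvz")
UNVOICED = set("cfkpstx")
CONSONANTS = VOICED | UNVOICED | set("lmnr")
CJSZ = set("cjsz")
FORBIDDEN_PAIRS = {"cx", "kx", "xc", "xk", "mz"}


def is_permissible_pair(a: str, b: str) -> bool:
    """Check one consonant pair against the simplified pair rules."""
    if a == b:
        return False
    if (a in VOICED and b in UNVOICED) or (a in UNVOICED and b in VOICED):
        return False
    if a in CJSZ and b in CJSZ:
        return False
    return a + b not in FORBIDDEN_PAIRS


def join_with_y(first: str, second: str) -> str:
    """Join two rafsi, buffering an impermissible joint with 'y'."""
    a, b = first[-1], second[0]
    if a in CONSONANTS and b in CONSONANTS and not is_permissible_pair(a, b):
        return first + "y" + second
    return first + second


for pair in (("jef", "fa'o"), ("tic", "ve'u"), ("cik", "gau"), ("vac", "sai")):
    print(pair, join_with_y(*pair))
```

The four joints here are ff, cv, kg and cs, so each gets a "y", giving jefyfa'o, ticyve'u, cikygau and vacysai. A joint like "nc" is permissible and is left alone.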

> cmavo_list have two entries that have asterisks as their last character.
> As far as I understand, asterisks mean nothing in Lojban. Does that mean
> we need to ignore them (i.e. delete them from the input stream?)
>
> And I just realized I had been rude - I never sent you my output files! My
> apologies, they're attached now.
>

Thank you!

> I'm really excited that we're getting such close results!
>

Indeed!

-Alan

Pierre Abbat

unread,
Apr 22, 2011, 5:04:50 PM4/22/11
to lojban-b...@googlegroups.com
On Friday 22 April 2011 16:25:03 Johan Pretorius wrote:
> 1. It seems to think {jefyfa'o} is not a cmavo cluster.
> 2. Same thing with {ticyve'u}
> 3. Same thing with {cikygau}
> 4. Same thing with {vacysai}

They're lujvo, formed from {jeftu fanmo}, {tcica vecnu}, {cikna gasnu}, and
{vanci sanmi} respectively. If you want them to be cmavo clusters, you have
to pause after "y".

Pierre
--
li fi'u vu'u fi'u fi'u du li pa

Jonathan Jones

unread,
Apr 22, 2011, 5:07:46 PM4/22/11
to lojban-b...@googlegroups.com
On Fri, Apr 22, 2011 at 2:25 PM, Johan Pretorius <preto...@gmail.com> wrote:
> And these, which appear to be bugs in jbogenturfa'i:
>   1. It seems to think {jefyfa'o} is not a cmavo cluster.
>   2. Same thing with {ticyve'u}
>   3. Same thing with {cikygau}
>   4. Same thing with {vacysai}

That's because they're lujvo.

--
mu'o mi'e .aionys.

.i.e'ucai ko cmima lo pilno be denpa bu .i doi.luk. mi patfu do zo'o

Johan Pretorius

unread,
Apr 22, 2011, 5:23:47 PM4/22/11
to lojban-b...@googlegroups.com
On Fri, Apr 22, 2011 at 10:52 PM, .alyn.post. <alyn...@lodockikumazvati.org> wrote:
> I believe that the PEG grammar does not forbid capitalization
> outside of cmene, which the CLL implies.  It might actually be
> ok to properly mark stress outside of a cmene, I'm not super
> clear.  But yeah, the PEG grammar does not distinguish between
> capital- and lower-case lerfu.

Okay, so it's probably worth fixing in case somebody decides to put stress on some part of a cmavo cluster to make it fit into the rhythm of their poem.

> I think the two examples you cited, {cu'uko'aBAI*} and {da'aremoiMOI*},
> are a parsing error.  The BAI* and MOI* are the second column in the
> file.  Did I send it that way, or did that happen on your end?

I suspect you sent them that way. I remember being about to make a similar file for myself when you conveniently saved me the trouble by sending me your copy :-)  I didn't check though, so it's still possible that it happened on my end.

> >     1. It seems to think {jefyfa'o} is not a cmavo cluster.
> >     2. Same thing with {ticyve'u}
> >     3. Same thing with {cikygau}
> >     4. Same thing with {vacysai}
>
> I believe all of these are a single class, those being lujvo with
> impermissible consonant clusters (ff, cv, kg, and cs).  When
> constructing lujvo from rafsi, impermissible consonant clusters must
> be buffered with 'y'...

Also (Pierre):

> They're lujvo, formed from {jeftu fanmo}, {tcica vecnu}, {cikna gasnu}, and
> {vanci sanmi} respectively. If you want them to be cmavo clusters, you have
> to pause after "y".

And (Jonathan):

> That's because they're lujvo.

Huh, you mean they're lujvo?  Erm... Then how am I to distinguish between a cmavo cluster and a lujvo with impermissible consonant clusters?
Oh wait, I see a possible answer... cmavo can never contain "y", except for .y. and Cy?

-Johan

Pierre Abbat

unread,
Apr 22, 2011, 6:52:27 PM4/22/11
to lojban-b...@googlegroups.com
On Friday 22 April 2011 17:23:47 Johan Pretorius wrote:

> Huh, you mean they're lujvo? Erm... Then how am I to distinguish between
> a cmavo cluster and a lujvo with impermissible consonant clusters?
> Oh wait, I see a possible answer... cmavo can never contain "y", except for
> .y. and Cy?

You have to put a dot or a space in the cmavo cluster, such as "le kymoi"
or "leky.moi". (Neither valfendi nor jbofi'e accepts "kymoi", but it can't be
a lujvo or the beginning of one. camxes accepts it.) With a dot in it, it
still looks like a cluster (as is the previously mentioned "na.a").

Pierre
--
La sal en el mar es más que en la sangre.
Le sel dans la mer est plus que dans le sang.

Johan Pretorius

unread,
Apr 22, 2011, 7:08:04 PM4/22/11
to lojban-b...@googlegroups.com
> > Huh, you mean they're lujvo?  Erm... Then how am I to distinguish between
> > a cmavo cluster and a lujvo with impermissible consonant clusters?
> > Oh wait, I see a possible answer... cmavo can never contain "y", except for
> > .y. and Cy?
>
> You have to put a dot or a space in the cmavo cluster, such as "le kymoi"
> or "leky.moi". (Neither valfendi nor jbofi'e accepts "kymoi", but it can't be
> a lujvo or the beginning of one. camxes accepts it.) With a dot in it, it
> still looks like a cluster (as is the previously mentioned "na.a").

So what you're saying is that "bybycy" isn't a legal cmavo cluster, and that it should be written "by.by.cy" instead?

.alyn.post.

Apr 22, 2011, 7:16:03 PM4/22/11
to lojban-b...@googlegroups.com
> Instead, it should be written "by.by.cy"?

Well, bybyby is a legal cmavo cluster. I'm still trying to
understand why/how jbofi'e and valfendi don't like kymoi, so
I can't further explain what the underlying rule is.

Pierre Abbat

Apr 22, 2011, 10:12:38 PM4/22/11
to lojban-b...@googlegroups.com
On Friday 22 April 2011 19:16:03 .alyn.post. wrote:
> Well, bybyby is a legal cmavo cluster. I'm still trying to
> understand why/how jbofi'e and valfendi don't like kymoi, so
> I can't further explain what the underlying rule is.

If the piece ends in a vowel and the last 'y' is followed (possibly with more
letters between) by CVV, CV'V, or CCV, valfendi thinks it may be a brivla and
does not split after the 'y'. Only later does it figure out that "k" is not a
rafsi and call it invalid. A string consisting only of CV and Cy, such
as "lepybazyvo", it splits into cmavo.
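
A rough sketch of that heuristic in Python, in case it helps. This is not valfendi's actual code; the function names `may_be_brivla` and `split_simple_cmavo` are invented here, and treating a piece with no 'y' at all as "not a brivla candidate" is an assumption made only to keep the sketch small.

```python
import re

V = "aeiou"
C = "bcdfgjklmnprstvxz"

# CVV, CV'V, or CCV anywhere after the last 'y'.
BRIVLA_TAIL = re.compile(rf"[{C}][{V}]'?[{V}]|[{C}][{C}][{V}]")
# One CV or Cy unit.
SIMPLE_UNIT = re.compile(rf"[{C}][{V}y]")
ONLY_SIMPLE_UNITS = re.compile(rf"(?:[{C}][{V}y])+")


def may_be_brivla(piece: str) -> bool:
    """Heuristic described above: ends in a vowel, and a CVV / CV'V / CCV
    pattern occurs somewhere after the last 'y'."""
    if not piece or piece[-1] not in V:
        return False
    last_y = piece.rfind("y")
    if last_y == -1:
        return False  # assumption: the heuristic only concerns pieces with 'y'
    return BRIVLA_TAIL.search(piece, last_y + 1) is not None


def split_simple_cmavo(piece: str):
    """Split a piece made only of CV and Cy units into cmavo.
    Returns None when the piece is held back as a possible brivla
    or contains anything other than CV / Cy units."""
    if may_be_brivla(piece) or not ONLY_SIMPLE_UNITS.fullmatch(piece):
        return None
    return SIMPLE_UNIT.findall(piece)


if __name__ == "__main__":
    print(may_be_brivla("kymoi"))            # "moi" after the 'y' is CVV
    print(split_simple_cmavo("lepybazyvo"))  # only CV and Cy units
```

With these definitions "kymoi" is held back as a possible brivla (because of the CVV "moi" after the 'y'), while "lepybazyvo" comes apart into le, py, ba, zy, vo.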

Pierre
--
I believe in Yellow when I'm in Sweden and in Black when I'm in Wales.

Johan Pretorius

Apr 23, 2011, 8:18:08 PM4/23/11
to lojban-b...@googlegroups.com
Hi all

vlastezba v0.1 is released!

Get it here: https://sourceforge.net/projects/vlastezba/files/v0.1/

Highlights:
  • We no longer confuse lujvo containing impermissible consonant pairs with cmavo clusters.
  • We now catch lerfu-word strings (albeit as a special case)
  • Updated regression and unit tests for lerfu-word strings and lujvo containing impermissible consonant pairs.
  • Added a definition retrieval strategy that looks at an XML file generated by jbovlaste. It's a whole lot faster and doesn't eat bandwidth.
  • Removed custom CLI processing code, replaced with Apache Commons CLI library.
  • Added rafsi to output (when retrieving definitions)
  • Added SkipWords option and corresponding regression test.
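
To give an idea of what an XML-based definition lookup can look like, here is a small Python sketch. It is not vlastezba's code (vlastezba is Java), and the element and attribute names used (`valsi`, `word`, `type`, `rafsi`, `definition`) are my assumption about the layout of a jbovlaste export; the sample document is a hand-written stand-in, and `load_definitions` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the real export layout is assumed, not verified.
SAMPLE_XML = """<dictionary>
  <direction from="lojban" to="English">
    <valsi word="klama" type="gismu">
      <rafsi>kla</rafsi>
      <rafsi>la'a</rafsi>
      <definition>x1 comes/goes to destination x2 from origin x3 via route x4 using means/vehicle x5.</definition>
    </valsi>
  </direction>
</dictionary>"""


def load_definitions(xml_text: str) -> dict:
    """Build an in-memory index: word -> type, rafsi list, definition."""
    root = ET.fromstring(xml_text)
    index = {}
    for valsi in root.iter("valsi"):
        index[valsi.get("word")] = {
            "type": valsi.get("type"),
            "rafsi": [r.text for r in valsi.findall("rafsi")],
            "definition": valsi.findtext("definition", default=""),
        }
    return index


if __name__ == "__main__":
    entry = load_definitions(SAMPLE_XML)["klama"]
    print(entry["type"], entry["rafsi"])
    print(entry["definition"])
```

Loading the file once into a dictionary like this is what makes a local lookup fast compared with fetching each definition over the network.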
Comments and criticisms are most welcome!  In particular: Alan, I think our output will be identical for the current set of test files.  You mentioned you had a somewhat more complex file lying around?

Regards,
Johan

preto...@gmail.com

Oct 28, 2014, 6:30:04 PM10/28/14
to lojban-b...@googlegroups.com, preto...@gmail.com
Hi all,

Resurrecting a very old thread to make sure the group's archive is updated with the fact that I have moved the vlastezba source code from Sourceforge to Bitbucket.


The reason is that the Bitbucket interface is much more modern and accommodating (being all web based). I've updated the Sourceforge summary page to indicate this new status.

It is set up as a Mercurial repository, because that's a little simpler for me to work with.

No material changes so far to the program itself.

Regards,
Johan

Gleki Arxokuna

Oct 29, 2014, 3:09:03 AM10/29/14
to lojban-b...@googlegroups.com
2014-10-29 1:30 GMT+03:00 <preto...@gmail.com>:
> Hi all,
>
> Resurrecting a very old thread to make sure the group's archive is updated with the fact that I have moved the vlastezba source code from Sourceforge to Bitbucket.


access denied.
 



Johan Pretorius

Oct 29, 2014, 11:10:05 AM10/29/14
to lojban-beginners
Ah, okay. I've made it a public repository now, please try again.



Regards,
Johan

