Technical, Help Request: What information *should* a Lojban dictionary system have?

Robin Lee Powell

Sep 11, 2010, 5:50:35 PM
to lojba...@lojban.org, bp...@lojban.org, jbov...@lojban.org

(*Please* redirect all followups to the main list (I'd say the
jbovlaste list, but that's a lot harder to get on, so...))

Some of us have had brief chats about what a re-done jbovlaste would
look like. The UI part is pretty well understood, inasmuch as web
UIs are decently consistent these days and besides, people like
http://vlasisku.lojban.org/, so that provides a good starting point.

Much more interesting to me is the back-end data: What sorts of
things *should* a Lojbanic dictionary store, ideally?

What got this started is the realization that Lojban isn't English,
and that, in particular, the brivla definitions seem anti-Lojbanic.
When I see

x1 gets/procures/acquires/obtains/accepts x2 from source x3

that kind of looks to me like a verb; I see the big thing in the
middle as being "the meaning" of "the verb".

Lojban isn't like that: brivla are as much or more about the
*places* than about the central meaning-concept.

This led me to wonder what a definition format that really
focused on the places would look like; I don't really have an answer
yet, but this in turn led to a lot of other stuff.

In particular, it seemed to me that if you had the right kind of
information about the places, you could generate the sort of
definition I pasted above automatically from that.

Then we had the smart.fm thing, which made it obvious that not all
definitions suit all situations; it was very important there to pare
the definitions down to bare essentials. It was also a giant pain.
So I got to thinking about what sort of data we'd have to have to
generate different levels of detail in the definitions.

As part of that, I ended up extracting some data from jbofihe, some
of the data it uses to generate English glosses, like this:

[([klama1 (go-er(s)):] mi /I, me/) /[is, does]/ <<klama /go-ing/>> ([klama2 (destination(s)):] le /the/ zarci /trading place(s)/)]

Which is kind of ugly, but if you strip out anything that's not
between /.../, you get:

I, me [is, does] go-ing the trading place(s)

which is really rather good. Good enough that one of my
girlfriends, who has never studied a word of Lojban, reads my blog
posts that way.
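A minimal Python sketch of the stripping step described above. It assumes only what the sample line shows: the English gloss text sits between pairs of slashes, and everything else (place labels, `<<...>>` markers) lies outside them. It is not jbofihe's own code.

```python
import re

# In the annotated line, English gloss text is whatever sits between a pair
# of slashes; bracketed place labels and <<...>> markers are outside them.
GLOSS_RE = re.compile(r"/([^/]*)/")


def plain_gloss(annotated: str) -> str:
    """Keep only the /.../ segments of an annotated line, joined by spaces."""
    return " ".join(m.group(1) for m in GLOSS_RE.finditer(annotated))


line = ("[([klama1 (go-er(s)):] mi /I, me/) /[is, does]/ <<klama /go-ing/>> "
        "([klama2 (destination(s)):] le /the/ zarci /trading place(s)/)]")
print(plain_gloss(line))  # I, me [is, does] go-ing the trading place(s)
```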

So this left me thinking that I want dictionary software which
could, given the right data, serve *all* of these purposes: formal
dictionary definitions, casual definitions, and glossing (which
implies very detailed information about the individual places).

I don't know exactly what this looks like, but I *think* we can get
all that by just talking about the places themselves. The
resulting formal definition might look a bit different; I'm not sure
yet, which is why I'm posting this: I want help coming up with
something awesome.

A reasonable starting point for discussion is what jbofihe uses to
generate its glosses, I think:

# x1 gets/procures/acquires/obtains/accepts x2 from source x3 [previous possessor not implied]
cpacu1:A;acquire
cpacu2:P;acquired
cpacu3:D;source* of acquisition
cpacu3t:source

And here's what the letters mean:

Letter │ Type        ││ Noun            │ Verb      │ Qualifier │ Tag
───────┼─────────────┼┼─────────────────┼───────────┼───────────┼─────────────────
A      │ Act         ││ X-er(s)         │ X-ing     │ X-ing     │ X-er(s)
D      │ Discrete    ││ X(s)            │ being X   │ X         │ X
S      │ Substance   ││ X               │ being X   │ X         │ X
P      │ Property    ││ X thing(s)      │ being X   │ X         │ X thing(s)
R      │ Rev. prop   ││ thing(s) X      │ being X   │ X         │ thing(s) X
I      │ Idiomatic   ││ thing(s) X-ing  │ X-ing     │ X-ing     │ thing(s) X-ing
E      │ Event       ││ X(s)            │ being X   │ X         │ X

That actual format is .. not great :), but the information is
fantastic.
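As a rough illustration of getting that information out of the line format and into structured records, here is a Python sketch that parses the cpacu lines above and fills in the letter table's templates. Two readings are assumptions, not documented facts: that a trailing "t" after the place number (as in `cpacu3t`) marks an alternate tag-form gloss, and that "*" in a gloss marks where the template's suffix attaches. The names `PlaceGloss`, `parse` and `render` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches lines such as "cpacu1:A;acquire" and "cpacu3t:source".
# Assumption: a "t" after the place number marks the alternate tag-form gloss.
LINE_RE = re.compile(
    r"^(?P<word>[a-z']+)(?P<place>\d)(?P<tag>t?):"
    r"(?:(?P<kind>[ADSPRIE]);)?(?P<gloss>.+)$"
)

# Templates transcribed from the letter table, in the order
# (noun, verb, qualifier, tag); "X" stands for the gloss.
ROLES = ("noun", "verb", "qualifier", "tag")
TEMPLATES = {
    "A": ("X-er(s)", "X-ing", "X-ing", "X-er(s)"),
    "D": ("X(s)", "being X", "X", "X"),
    "S": ("X", "being X", "X", "X"),
    "P": ("X thing(s)", "being X", "X", "X thing(s)"),
    "R": ("thing(s) X", "being X", "X", "thing(s) X"),
    "I": ("thing(s) X-ing", "X-ing", "X-ing", "thing(s) X-ing"),
    "E": ("X(s)", "being X", "X", "X"),
}


@dataclass
class PlaceGloss:
    word: str
    place: int
    is_tag: bool
    kind: Optional[str]  # None on tag-form lines, which carry no type letter
    gloss: str


def parse(text: str) -> list:
    """Parse place-gloss lines, skipping blank lines and '#' comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError(f"unrecognised line: {line!r}")
        entries.append(PlaceGloss(m["word"], int(m["place"]), m["tag"] == "t",
                                  m["kind"], m["gloss"]))
    return entries


def render(entry: PlaceGloss, role: str) -> str:
    """Fill in the table template for the entry's type letter.

    Assumption: a '*' in the gloss marks where the template's suffix goes,
    so "source* of acquisition" becomes "source(s) of acquisition".
    """
    prefix, suffix = TEMPLATES[entry.kind][ROLES.index(role)].split("X")
    head, _, tail = entry.gloss.partition("*")
    return prefix + head + suffix + tail


CPACU = """\
# x1 gets/procures/acquires/obtains/accepts x2 from source x3 [previous possessor not implied]
cpacu1:A;acquire
cpacu2:P;acquired
cpacu3:D;source* of acquisition
cpacu3t:source
"""
places = parse(CPACU)
for p in places:
    if p.kind is not None:
        print(p.place, render(p, "noun"), "/", render(p, "verb"))
```

Keeping the type letter and the bare gloss as separate fields is what lets one record produce noun, verb, qualifier and tag forms on demand, instead of storing each form as its own string.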

How can we expand that so that we could, in theory, have enough
information to serve all masters? What would the resulting
dictionary definitions look like?

-Robin

--
http://singinst.org/ : Our last, best hope for a fantastic future.
Lojban (http://www.lojban.org/): The language in which "this parrot
is dead" is "ti poi spitaki cu morsi", but "this sentence is false"
is "na nei". My personal page: http://www.digitalkingdom.org/rlp/

Stela Selckiku

Sep 11, 2010, 6:10:31 PM
to loj...@googlegroups.com
On Sat, Sep 11, 2010 at 5:50 PM, Robin Lee Powell
<rlpo...@digitalkingdom.org> wrote:
>
> Much more interesting to me is the back-end data: What sorts of
> things *should* a Lojbanic dictionary store, ideally?

I don't know if it's exactly the sort of thing you're talking about,
but what I would find most useful is a space for long, free-form
rambling and discussion about the meanings of words (and of specific
places).

mi'e la stela selckiku
mu'o

Robin Lee Powell

Sep 11, 2010, 6:13:55 PM
to loj...@googlegroups.com

We've found that allowing that tends to result in the same stuff
being hashed over and over. If it was thoroughly tied to one
location, though, it might not be so bad.

Ask jcowan about his Elephant idea, too.

Stela Selckiku

Sep 11, 2010, 6:26:14 PM
to loj...@googlegroups.com
On Sat, Sep 11, 2010 at 6:13 PM, Robin Lee Powell
<rlpo...@digitalkingdom.org> wrote:
>
> We've found that allowing that tends to result in the same stuff
> being hashed over and over.

Hmm, well perhaps it's a matter of preference, because that's just
what I'd hope would happen. Lots of different discussion of the same
places, from all different perspectives, expressed in various ways.
That way we'd end up with some clarity about the deep semantics, or at
least some clarity about what the various opinions are.

Kevin Reid

Sep 11, 2010, 6:31:12 PM
to loj...@googlegroups.com
On Sep 11, 2010, at 17:50, Robin Lee Powell wrote:

> What got this started is the realization that Lojban isn't English,
> and that, in particular, the brivla definitions seem anti-Lojbanic.
> When I see
>
> x1 gets/procures/acquires/obtains/accepts x2 from source x3
>
> that kind of looks to me like a verb; I see the big thing in the
> middle as being "the meaning" of "the verb".

...


> In particular, it seemed to me that if you had the right kind of
> information about the places, you could generate the sort of
> definition I pasted above automatically from that.


I think this is a dangerous idea, and arguably less Lojbanic than the
current definition format.

Lojban is not English; selbri places are not prepositions. The
definition of a selbri is the *RELATION* among *ALL* of the places,
not the combination of the meanings of individual places. Insofar as a
gismu, at least, can readily be described by describing each of its
places individually, it is incoherent and ought to be trimmed back.

For example:

sumji
x1 is a mathematical sum/result/total of x2 plus/increased by x3

If you rewrite this definition in terms of the places

x1 sum
x2 summand #1/2
x3 summand #2/2

then this has *OBSCURED* the actual relationship, by omitting the
information about how x1 is related to x2 except implicitly.
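To make the point concrete, a small Python illustration (not a proposed storage format): treated as a relation, sumji is a single check over all three places together, while the place-by-place labels carry nothing that could perform that check. The names `sumji` and `SUMJI_PLACES` are illustrative only.

```python
# sumji as a relation: one predicate over all three places at once.
def sumji(x1: float, x2: float, x3: float) -> bool:
    """True when x1 is the sum of x2 plus x3."""
    return x1 == x2 + x3


# The place-by-place rewrite: three labels, none of which says how x1
# depends on x2 and x3.
SUMJI_PLACES = {"x1": "sum", "x2": "summand #1/2", "x3": "summand #2/2"}

print(sumji(5, 2, 3))  # True
print(sumji(5, 2, 2))  # False
```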

What would be an improvement in the area of "avoiding the verb" is
having the brivla definitions be given with each of the places as the
English subject; for example:

spati
x1 is a plant of species x2.
x2 is the species of plant x1.

(I've omitted the / words from the gismu list definition for brevity.)

The latter (with the numbers changed) could also be used as a
definition of {selspati}, and that sort of thing does occur in
jbovlaste (for example, I wrote a definition for {tersmu} so that I
could index it under the English verb "understand"), but I think that
it would be better if a single brivla entry (especially for lujvo,
which are not as obviously sel-convertible) contained the “English
verb” perspectives for *all* of its places.
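The "with the numbers changed" step is the mechanical part of this. A Python sketch, assuming places are written as x1 to x5 in the sentence and using the standard SE swaps (se, te, ve, xe exchange x1 with x2, x3, x4, x5). It only renumbers: good English phrasing for each new subject still has to be written by hand, which is the argument for storing every perspective in the entry itself. `convert_perspective` is a hypothetical name.

```python
import re

# SE-series conversion swaps x1 with the numbered place.
SE_SERIES = {"se": 2, "te": 3, "ve": 4, "xe": 5}


def convert_perspective(sentence: str, cmavo: str) -> str:
    """Renumber an 'xN is ...' sentence for the converted brivla."""
    n = SE_SERIES[cmavo]

    def swap(m):
        k = int(m.group(1))
        if k == 1:
            return f"x{n}"
        if k == n:
            return "x1"
        return m.group(0)  # places not involved in the swap stay as they are

    return re.sub(r"\bx([1-5])\b", swap, sentence)


print(convert_perspective("x2 is the species of plant x1.", "se"))
```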

--
Kevin Reid <http://switchb.org/kpreid/>

Jonathan Jones

Sep 11, 2010, 6:46:25 PM
to bpfk...@googlegroups.com, lojba...@lojban.org, bp...@lojban.org, jbov...@lojban.org
Perhaps a good starting point would be to have a definition for each place of each word? That's what I did for the gismu for the Smart.fm lessons.

--
mu'o mi'e .aionys.

.i.a'o.e'e ko klama le bende pe denpa bu

2088410807

Jonathan Jones

Sep 11, 2010, 6:51:28 PM
to bpfk...@googlegroups.com, lojba...@lojban.org, bp...@lojban.org, jbov...@lojban.org
The full list is here.

Jonathan Jones

Sep 11, 2010, 6:55:56 PM
to loj...@googlegroups.com
On Sat, Sep 11, 2010 at 4:31 PM, Kevin Reid <kpr...@switchb.org> wrote:
<snip>

For example:

sumji
 x1 is a mathematical sum/result/total of x2 plus/increased by x3

sumji    x1 is the sum of x2 plus x3
lo sumji    a total of a sum
lo se sumji    something summed
lo te sumji    something summed with

<snip>
...[F]or example:


spati
 x1 is a plant of species x2.

spati    x1 is a plant of species x2
lo spati    a plant
lo se spati    a species of plant
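Lists like these can be produced from one short gloss per place, since the SE series (se, te, ve, xe) brings place 2, 3, 4 or 5 into the x1 slot. A Python sketch, assuming the per-place glosses are stored in place order; `place_cards` is a hypothetical helper.

```python
# The SE-series prefix that brings place N into the x1 slot (index 0 = none).
SE_PREFIX = ("", "se ", "te ", "ve ", "xe ")


def place_cards(word: str, glosses: list) -> list:
    """One 'lo ... <TAB> gloss' line per place, in place order."""
    if len(glosses) > len(SE_PREFIX):
        raise ValueError("SE conversion only reaches x5")
    return [f"lo {SE_PREFIX[i]}{word}\t{g}" for i, g in enumerate(glosses)]


for card in place_cards("sumji", ["a total of a sum", "something summed",
                                  "something summed with"]):
    print(card)
```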

Robin Lee Powell

Sep 11, 2010, 8:05:38 PM
to loj...@googlegroups.com
On Sat, Sep 11, 2010 at 06:10:31PM -0400, Stela Selckiku wrote:

Forgot to mention: you know jbovlaste already has this, yeah?

Robert LeChevalier

Sep 11, 2010, 8:24:13 PM
to Lojban List
Robin Lee Powell wrote:
> Much more interesting to me is the back-end data: What sorts of
> things *should* a Lojbanic dictionary store, ideally?
>
> What got this started is the realization that Lojban isn't English,
> and that, in particular, the brivla definitions seem anti-Lojbanic.
> When I see
>
> x1 gets/procures/acquires/obtains/accepts x2 from source x3
>
> that kind of looks to me like a verb; I see the big thing in the
> middle as being "the meaning" of "the verb".
>
> Lojban isn't like that: brivla are as much or more about the
> *places* than about the central meaning-concept.
>
> This led me to wonder what a definition format that really
> focused on the places would look like; I don't really have an answer
> yet, but this in turn led to a lot of other stuff.

A lot of thought, as well as research on dictionary writing, went into
the format I used, which is what appears above. The decisive factor
that led to the above was brevity (which is a factor in most
dictionaries that see print).

If people are ONLY going to look at dictionaries on-line, then brevity
may be less important, but if there is ever to be a printed (or
printable) dictionary, with a significant number of entries, something
like the above becomes necessary.

Another thing to remember - while Lojban isn't English, until we get to
Lojban-only dictionaries, all dictionaries are translation dictionaries,
and there are very different rules because of how such dictionaries are
used. There, the key to format is the target language - the one
people will be focused on when translating. One would therefore
expect a different format for the English-to-Lojban side than for the
Lojban-to-English side. I tried for this in the quasi-automated
dictionary format, which used keyword manipulation to turn the above
into a series of entries like:

procurer: x1 of cpacu; (followed by some form of the definition above)
acquirer: x1 of cpacu; (followed by some form of the definition above)
obtainer: x1 of cpacu; ...
accepter: x1 of cpacu; ...
acquisition: x2 of cpacu ...
source of acquisition: x3 of cpacu ...
acquisition: nu event of cpacu ...
acquisition: pu'u process of cpacu ...
get: x1+x2 of cpacu ... (meaning that at least those two places must be
filled in order to translate the English)
etc.

This turns each lojban-to-English definition into a rich multitude of
English to Lojban definitions. It can take a lot of work, even with
something like Cowan's perl script that did the keyword manipulation,
and you have to think about which keywords are likely to be useful.
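The core of that keyword manipulation is an inversion: each Lojban entry lists (English keyword, place) pairs, and the English side groups them by keyword. A Python sketch of that idea only; it is not Cowan's perl script, which isn't shown here. The keyword data is transcribed from the cpacu list above, and `english_index` is a hypothetical name.

```python
from collections import defaultdict

# (English keyword, which place(s) it names), transcribed from the list above.
CPACU_KEYWORDS = [
    ("procurer", "x1"), ("acquirer", "x1"), ("obtainer", "x1"),
    ("accepter", "x1"), ("acquisition", "x2"),
    ("source of acquisition", "x3"), ("acquisition", "nu event"),
    ("acquisition", "pu'u process"), ("get", "x1+x2"),
]


def english_index(keywords_by_word: dict) -> dict:
    """Invert {lojban_word: [(keyword, places)]} into {keyword: ["places of word"]}."""
    index = defaultdict(list)
    for word, keywords in keywords_by_word.items():
        for english, places in keywords:
            index[english].append(f"{places} of {word}")
    return dict(sorted(index.items()))


for english, targets in english_index({"cpacu": CPACU_KEYWORDS}).items():
    print(f"{english}: {'; '.join(targets)}")
```

Choosing which keywords to list stays manual, as noted above; the inversion itself is the cheap part.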

lojban.org should have the files I generated doing this work somewhere,
so people can see what the result looked like. It wasn't necessarily
all that pretty, but it met the functional need. And the automated
processing made it possible to create a dictionary in less than several
lifetimes.

I won't pretend to know how to apply these insights to online
dictionaries, since the only kind I ever use are those that display
entries looking like regular English entries (a formatting style that
developed over a couple hundred years of dictionary writing).

lojbab

Robin Lee Powell

Sep 11, 2010, 8:40:44 PM
to loj...@googlegroups.com
On Sat, Sep 11, 2010 at 06:31:12PM -0400, Kevin Reid wrote:
> On Sep 11, 2010, at 17:50, Robin Lee Powell wrote:
>
> >What got this started is the realization that Lojban isn't
> >English, and that, in particular, the brivla definitions seem
> >anti-Lojbanic. When I see
> >
> > x1 gets/procures/acquires/obtains/accepts x2 from source x3
> >
> >that kind of looks to me like a verb; I see the big thing in the
> >middle as being "the meaning" of "the verb".
> ...
> >In particular, it seemed to me that if you had the right kind of
> >information about the places, you could generate the sort of
> >definition I pasted above automatically from that.
>
>
> I think this is a dangerous idea, and arguably less Lojbanic than
> the current definition format.

'swhy I asked. :)

> Lojban is not English; selbri places are not prepositions. The
> definition of a selbri is the *RELATION* among *ALL* of the
> places, not the combination of the meanings of individual places.

Fair enough.

What if there was detailed information about the places *and*
something like what we now call gloss words? That is, words that
describe the whole relationship?

> What would be an improvement in the area of "avoiding the verb" is
> having the brivla definitions be given with each of the places as
> the English subject; for example:
>
> spati
> x1 is a plant of species x2.
> x2 is the species of plant x1.
>
> (I've omitted the / words from the gismu list definition for
> brevity.)
>
> The latter (with the numbers changed) could also be used as a
> definition of {selspati}, and that sort of thing does occur in
> jbovlaste (for example, I wrote a definition for {tersmu} so that
> I could index it under the English verb "understand"), but I think
> that it would be better if a single brivla entry (especially for
> lujvo, which are not as obviously sel-convertible) contained the
> “English verb” perspectives for *all* of its places.

That's a very interesting idea.

Now, how would that look such that it was stored in a more
machine-understandable way and not as actual strings of English
text?

It's the giant strings of English text thing that I most want to get
away from; it's caused us nothing but trouble IME.
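One possible shape, sketched in Python under the assumption that each place gets its own fields (a short keyword plus Kevin's "this place as English subject" sentence) rather than the whole definition living in one string. The sentences are still English, but each one is addressable by place, and the record serializes cleanly to JSON. `Place` and `Entry` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Place:
    keyword: str      # short noun for the place, e.g. "plant"
    perspective: str  # the relation restated with this place as English subject


@dataclass
class Entry:
    word: str
    places: list = field(default_factory=list)  # Place records, in place order

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_json(cls, text: str) -> "Entry":
        data = json.loads(text)
        return cls(data["word"], [Place(**p) for p in data["places"]])


spati = Entry("spati", [
    Place("plant", "x1 is a plant of species x2."),
    Place("species", "x2 is the species of plant x1."),
])
print(spati.to_json())
```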

-Robin

Robin Lee Powell

Sep 11, 2010, 8:43:21 PM
to loj...@googlegroups.com
On Sat, Sep 11, 2010 at 08:24:13PM -0400, Robert LeChevalier wrote:
> Robin Lee Powell wrote:
> >Much more interesting to me is the back-end data: What sorts of
> >things *should* a Lojbanic dictionary store, ideally?
> >
> >What got this started is the realization that Lojban isn't
> >English, and that, in particular, the brivla definitions seem
> >anti-Lojbanic. When I see
> >
> > x1 gets/procures/acquires/obtains/accepts x2 from source x3
> >
> >that kind of looks to me like a verb; I see the big thing in the
> >middle as being "the meaning" of "the verb".
> >
> >Lojban isn't like that: brivla are as much or more about the
> >*places* than about the central meaning-concept.
> >
> >This led me to wonder what a definition format that really
> >focused on the places would look like; I don't really have an
> >answer yet, but this in turn led to a lot of other stuff.
>
> A lot of thought, as well as research on dictionary writing, went
> into the format I used, which is what appears above. The decisive
> factor that led to the above was brevity (which is a factor in
> most dictionaries that see print).

I'll respond to the rest later, but just to be clear: I have no
particular aversion to that format as such *for a print dictionary*.
My issue is with storing the definitions that way internally to the
database.

I was speculating on the nature of that format, it's true, but not
with any serious intent; just noodling.

Kevin Reid

Sep 11, 2010, 8:55:14 PM
to loj...@googlegroups.com
On Sep 11, 2010, at 20:40, Robin Lee Powell wrote:

> What if there was detailed information about the places *and*
> something like what we now call gloss words? That is, words that
> describe the whole relationship?

That's a verb and prepositions, which is Bad.

> Now, how would that look such that it was stored in a more
> machine-understandable way and not as actual strings of English
> text?
>
> It's the giant strings of English text thing that I most want to get
> away from; it's caused us nothing but trouble IME.

I argue that the English text is necessary, and that the problem
you're seeing comes from our always using the same English text. We
already use many/several/a variety of English words separated/
interspersed/divided by spaces to convey the broadness of gismu; we
need to do the same with sentence structure to convey the nonverbness
of gismu.


Another question we should be asking (and I haven't thought about the
answer yet) is: what definition format encourages word inventors to
think clearly about their definition?

The definition sentences that jbovlaste/the gismu list use now feel
more like a 'formal definition' than dropping a bunch of English
keywords into slots would. (Though of course I haven't tried it.
Quick, someone name all the biases I'm exhibiting.)

Oh yeah, here's another argument: There are better things we could be
doing than rewriting all the jbovlaste definitions. Like rewriting
only the cmavo definitions :)

Robin Lee Powell

Sep 11, 2010, 9:02:25 PM
to loj...@googlegroups.com
On Sat, Sep 11, 2010 at 08:55:14PM -0400, Kevin Reid wrote:
> On Sep 11, 2010, at 20:40, Robin Lee Powell wrote:
> >
> >Now, how would that look such that it was stored in a more
> >machine-understandable way and not as actual strings of English
> >text?
> >
> >It's the giant strings of English text thing that I most want to
> >get away from; it's caused us nothing but trouble IME.
>
> I argue that the English text is necessary, and that the problem
> you're seeing comes from our always using the same English
> text.

Then you are very confused about the problem I am seeing.

I want to be able to generate *at least* a formal dictionary,
flashcards, simple definitions, and glossing, from one datasource,
easily, with the data source being as simple as possible.

Every time we've gone to do something new with this data, we've had
to do a giant amount of work to convert the data, and then an even
more giant amount of work to hand-fix all the errors thus
introduced. It's crappy, and I want it to stop.
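A Python sketch of the "one datasource, many outputs" goal, assuming a record that keeps both a whole-relation sentence (per Kevin's objection that the places alone lose it) and per-place fields. The formal definition, flashcards and glosses are then cheap views over the same record rather than separate conversions. All names here are hypothetical, and the per-place keywords and glosses are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    keyword: str  # noun for the place, e.g. "acquirer"
    gloss: str    # bare-essentials gloss for flashcards and interlinear glossing


@dataclass(frozen=True)
class Brivla:
    word: str
    relation: str  # whole-relation sentence, kept because places alone lose it
    places: tuple  # Place records, in place order


def formal_definition(b: Brivla) -> str:
    return f"{b.word}: {b.relation}"


def flashcards(b: Brivla) -> list:
    return [(f"x{i} of {b.word}", p.keyword) for i, p in enumerate(b.places, 1)]


def gloss(b: Brivla, place: int) -> str:
    return b.places[place - 1].gloss


CPACU = Brivla(
    "cpacu",
    "x1 gets/procures/acquires/obtains/accepts x2 from source x3",
    (Place("acquirer", "acquire-er(s)"),
     Place("acquisition", "acquired thing(s)"),
     Place("source of acquisition", "source(s) of acquisition")),
)
print(formal_definition(CPACU))
print(flashcards(CPACU))
print(gloss(CPACU, 3))
```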

My talk about making the definitions more Lojbanic was entirely a
side comment.

> Another question we should be asking (and I haven't thought about
> the answer yet) is: what definition format encourages word
> inventors to think clearly about their definition?

That is an interesting question.

> Oh yeah, here's another argument: There are better things we could
> be doing than rewriting all the jbovlaste definitions. Like
> rewriting only the cmavo definitions :)

This is true, but I wasn't talking about right now, I was talking in
theory/at some point. I've had to do this sort of conversion work
at least 6 separate times now; I'm very very tired of it. I want to
do it once and have done forever.

-Robin

Kevin Reid

Sep 11, 2010, 9:12:48 PM
to loj...@googlegroups.com
On Sep 11, 2010, at 21:02, Robin Lee Powell wrote:

> I want to be able to generate *at least* a formal dictionary,
> flashcards, simple definitions, and glossing, from one datasource,
> easily, with the data source being as simple as possible.


Aaaaaaah. I see now. Such as how smart.fm needed additional text to
describe each place of a gismu.

I think that this problem falls under NLP, because the text needed in
each case is not *regular* (that is, related in a systematic way to
the other cases). For example, I've tweaked the smart.fm definitions
because the existing more-regular (though still hand-written) ones
were misleading in English. (I don't think I kept notes on which ones
I changed; the only example I remember was x2 of troci => "something
tried", which is wrong but not a good example of the generic type of
wrongness.)

Jonathan Jones

Sep 11, 2010, 11:57:42 PM
to loj...@googlegroups.com

Well, there is a reason why I made the goal open access..

Alan Post

Sep 11, 2010, 9:22:19 PM
to loj...@googlegroups.com
On Sat, Sep 11, 2010 at 06:02:25PM -0700, Robin Lee Powell wrote:
> I want to be able to generate *at least* a formal dictionary,
> flashcards, simple definitions, and glossing, from one datasource,
> easily, with the data source being as simple as possible.
>
> Every time we've gone to do something new with this data, we've had
> to do a giant amount of work to convert the data, and then an even
> more giant amount of work to hand-fix all the errors thus
> introduced. It's crappy, and I want it to stop.
>
> My talk about making the definitions more Lojbanic was entirely a
> side comment.
>

I've just purchased a handheld computer called a Ben NanoNote.[1]
I intend to use this machine primarily to study Lojban, and to that
purpose I've ported makfa[2] and intend to port jbofihe and mnemo[3].
I think these three programs are sufficient to create a portable study
environment, giving me opportunities to study and play with Lojban
at times I don't carry my laptop.

I've realized working on these programs that each one of them uses a
different copy of the dictionary. That strikes me, architecturally,
as a problem. I'd really like these programs to be drawing from the
same data source: the dictionary does change, and even if we imagine
the community dictionary never changing, *I'm* not currently happy
with it and want to make changes to my local copy.
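One way to have every program read the same shared data while still allowing personal edits is a layered lookup. A Python sketch, assuming for simplicity that the dictionary is a plain word-to-definition mapping; a real shared format would be richer, and the local edit shown is a made-up example. `collections.ChainMap` searches its mappings in order and sends writes and deletions to the first one only.

```python
from collections import ChainMap

# Shared community data and a personal override layer (illustrative contents).
community = {
    "cpacu": "x1 gets/procures/acquires/obtains/accepts x2 from source x3",
    "spati": "x1 is a plant of species x2",
}
local = {"spati": "x1 is a plant of species x2 [my local edit]"}

# Lookups check `local` first, then `community`; writes and deletions go to
# `local` only, so the shared copy is never modified.
dictionary = ChainMap(local, community)

print(dictionary["spati"])  # x1 is a plant of species x2 [my local edit]
print(dictionary["cpacu"])
dictionary["klama"] = "x1 comes/goes to destination x2 from origin x3 via route x4 using means/vehicle x5"
```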

I have longer term plans to add to the corpus of Lojban-using
programs, and very much support defining a data format that supports
the use cases we put it to.

-Alan

1: http://sharism.cc/
2: http://projects.qi-hardware.com/index.php/p/openwrt-packages/source/tree/master/makfa
3: http://en.qi-hardware.com/wiki/User:Alanpost/Backlog
--
ko djuno fi le do sevzi

Lindar

Sep 12, 2010, 2:44:15 AM
to lojban
My one and only major must have request at this time is this:

Please make djisku, djicusku, djicysku, and djicycusku all link to the
exact same page/definition. They are all the same word/defined things,
so shouldn't each one be added? When adding new lujvo to the list, it
should add them by component rafsi and generate a score list.

Jonathan Jones

Sep 12, 2010, 3:01:11 AM
to loj...@googlegroups.com

So, exactly what jvozba does with tanru, but also with a definition of the lujvo?

Lindar

Sep 12, 2010, 3:30:44 AM
to lojban
> So, exactly what jvozba <http://jwodder.freeshell.org/lojban/jvozba.cgi?tanru=djica+cusku&vlal...> does
> with tanru, but also with a definition of the lujvo?

Yeah, the idea was partially inspired by that. So if I put in
{djicysku}, then it'll bring up a list like that of all the forms with
scores + component selrafsi + definition, and every form entered
brings up the same page.
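A Python sketch of the "every form brings up the same page" idea: decompose each lujvo form back to its source gismu and use that sequence as the page key. This is a deliberate simplification. It uses a small hand-made rafsi table for djica and cusku only, handles only the "y" hyphen (not "r"/"n" hyphens or apostrophes), and does not compute lujvo scores. `decompose` and `page_key` are hypothetical names.

```python
# Small hand-made rafsi table: rafsi form -> source gismu. Each gismu also has
# a 4-letter form (used with a "y" hyphen) and its full form (final only).
RAFSI = {
    "dji": "djica", "djic": "djica", "djica": "djica",
    "cus": "cusku", "sku": "cusku", "cusk": "cusku", "cusku": "cusku",
}


def decompose(lujvo: str):
    """Split a lujvo into its source gismu, or return None if it can't be split.

    Simplification: only the "y" hyphen is handled.
    """
    if not lujvo:
        return ()
    for end in range(min(len(lujvo), 5), 2, -1):
        head, rest = lujvo[:end], lujvo[end:]
        if head not in RAFSI:
            continue
        if len(head) == 4:  # 4-letter forms must be followed by the y-hyphen
            if not rest.startswith("y") or len(rest) == 1:
                continue
            rest = rest[1:]
        if len(head) == 5 and rest:  # a full gismu can only come last
            continue
        tail = decompose(rest)
        if tail is not None:
            return (RAFSI[head],) + tail
    return None


def page_key(form: str):
    """All forms built from the same gismu sequence share one page key."""
    parts = decompose(form)
    return " ".join(parts) if parts else None


for form in ("djisku", "djicusku", "djicysku", "djicycusku"):
    print(form, "->", page_key(form))
```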

Dag Odenhall

Sep 12, 2010, 4:30:01 AM
to loj...@googlegroups.com

Good idea for vlasisku, search by source metaphor: glicybangu redirects
you to glibau.

purpleposeidon

Sep 12, 2010, 6:14:43 AM
to loj...@googlegroups.com
Making the authoritative definition of gismu using the "x1: widget"
model doesn't seem very awesome to me. And is providing additional
information with them going to allow us to generate something we don't
already have?

We should dump all the Smart.fm data gently into jbovlaste. And since
we're adding one new type of definition, we should also add others,
such as the (hrefs of) the pixra on .uikipedias.


mu'omi'e.sipnabebnadjeims.

Oren

Sep 12, 2010, 9:48:21 AM
to loj...@googlegroups.com
In response to the questions about keeping all sumti linked to their brethren in each gismu definition, and in partial response to the question of how to encourage creative interpretation of definitions, why not try something like one of these:

http://www.lojban.org/tiki/cpacuvisualization

co'o mi'e korbi


--
You received this message because you are subscribed to the Google Groups "lojban" group.
To post to this group, send email to loj...@googlegroups.com.
To unsubscribe from this group, send email to lojban+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/lojban?hl=en.


Remo Dentato

Sep 12, 2010, 11:20:40 AM
to loj...@googlegroups.com
I would suggest not putting too much work into specifying the se-, te-, etc.
words. I fear that having a fixed sense for, say, "terdunda" would be
misleading for newbies and useless for those more knowledgeable. The
right way to translate the "places" of a brivla depends heavily on the
context.

I'm convinced that we should insist on making people understand the
place structure and use their creativity to find the right way to
translate it into their language, rather than suggesting to them that there is
a *right* translation for that place.

I tried to do this for Italian when I started learning the language and
pretty soon decided that it would be a waste of my (and other
Italians') time to try to find every possible meaning for gismu places.

just my 2c

Remo

Robin Lee Powell

Sep 12, 2010, 11:26:50 AM
to loj...@googlegroups.com
On Sat, Sep 11, 2010 at 07:22:19PM -0600, Alan Post wrote:
> I've just purchased a handheld computer called a Ben NanoNote.[1]
> I intend to use this machine primarily to study Lojban, and to that
> purpose I've ported makfa[2] and intend to port jbofihe and mnemo[3].
[snip]
> 3: http://en.qi-hardware.com/wiki/User:Alanpost/Backlog

mnemo seems to be http://mnemo.sourceforge.net/

Just out of curiosity: out of all the *many* spaced-repetition-based
flashcard programs, why *that* one, which seems to be abandonware?

Robin Lee Powell

Sep 12, 2010, 11:27:20 AM
to loj...@googlegroups.com
On Sat, Sep 11, 2010 at 07:22:19PM -0600, Alan Post wrote:
> I've realized working on these programs that each one of them uses
> a different copy of the dictionary. That strikes me,
> architecturally, as a problem. I'd really like these programs to
> be drawing from the same data source: the dictionary does change,
> and even if we imagine the community dictionary never changing,
> *I'm* not currently happy with it and want to make changes to my
> local copy.
>
> I have longer term plans to add to the corpus of Lojban-using
> programs, and very much support defining a data format that
> supports the use cases we put it to.

Yes, *exactly*. So the question is, what would such a thing look
like?

Robin Lee Powell

Sep 12, 2010, 11:32:29 AM
to loj...@googlegroups.com
On Sun, Sep 12, 2010 at 05:20:40PM +0200, Remo Dentato wrote:
> I would suggest not putting too much work into specifying the se-,
> te-, etc. words. I fear that having a fixed sense for, say, "terdunda"
> would be misleading for newbies and useless for those more
> knowledgeable. The right way to translate the "places" of a brivla
> depends heavily on the context.
>
> I'm convinced that we should insist on making people understand
> the place structure and use their creativity to find the right way
> to translate it into their language, rather than suggesting to them that
> there is a *right* translation for that place.

I agree for {te dunda}, but not {terdunda}; the latter is a lujvo,
and as such should have a firm/strict definition like all other
brivla.

Robin Lee Powell

Sep 12, 2010, 11:33:10 AM
to loj...@googlegroups.com
On Sun, Sep 12, 2010 at 09:48:21AM -0400, Oren wrote:
> In response to the questions about keeping all sumti linked to
> their brethren in each gismu definition,

You mean rafsi, yes?

> and in partial response to the question of how to encourage
> creative interpretation of definitions, why not try something like
> one of these:
>
> http://www.lojban.org/tiki/cpacuvisualization

I don't see what that adds at all? It's just keywords and some
bubbles?

Robin Lee Powell

Sep 12, 2010, 11:35:29 AM
to loj...@googlegroups.com

That is unquestionably a good idea.

Robert LeChevalier

Sep 12, 2010, 11:44:18 AM
to loj...@googlegroups.com
Robin Lee Powell wrote:
> I'll respond to the rest later, but just to be clear: I have no
> particular aversion to that format as such *for a print dictionary*.
> My issue is with storing the definitions that way internally to the
> database.

Then, as I understand what you want, I agree with the other person who
said this is an NLP problem.

In particular, I think we shouldn't try to reinvent the NLP wheel.

There are numerous AI systems out there with predicate based languages,
which need and have engines that store and process the definitions of
predicates. Most of those could serve as a model, but those like the
CYC project, which were designed specifically to enable NLP, would seem
especially likely.

Whether we can use our non-profit nature and our academic ties to
perhaps get CYC or one of the other such projects to let us use (or
possibly modify) their tools for Lojban is something I don't have the
expertise or the connections for. But there are academics in the Lojban
community who might be able to help or suggest people to contact (Nick
Nicholas being an obvious person to start with, since I know he has
considered the NLP aspects of Lojban, but I'm sure he is hardly the only
person in the community who has worked in that arena).

I'm not really sure you will ever come up with a way to automatically turn
whatever internal format you use into a publishable print-dictionary
without any need for hand-editing, especially since jbovlaste and its
ilk aren't trying to serve merely as a Lojban-English dictionary, but as
a Lojban-everything dictionary.

In any case, designing the database strikes me as a project that needs
serious research into how large-dictionary databases are designed. At
one point, we sought a grant that would have been used for such research
- the only grant (pre-)proposal that we've ever prepared. But that
didn't get very far %^).

lojbab

Oren

Sep 12, 2010, 12:56:42 PM
to loj...@googlegroups.com
On Sun, Sep 12, 2010 at 11:33, Robin Lee Powell <rlpo...@digitalkingdom.org> wrote:
I don't see what that adds at all?  It's just keywords and some
bubbles?

Yeah, it's just a thought, nothing more. I like visuals, and that would be something I would be happy to code if people seemed amenable to it.

People mention NLP... where's the largest bilingual English-Lojban corpus? I recall the monolingual Lojban corpus being somewhat large, but do we have much data for generating word-pair probabilities or translation rules?
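For a sense of what even a small sentence-aligned corpus allows, a Python sketch that scores Lojban/English word pairs by how often they co-occur in translated sentence pairs. It uses the Dice coefficient, a simple association score rather than a true probability model such as IBM Model 1, and the three-sentence corpus is made up purely for illustration.

```python
from collections import Counter
from itertools import product


def dice_scores(pairs):
    """Dice coefficient for every (Lojban word, English word) pair that co-occurs.

    `pairs` is a list of (lojban_sentence, english_sentence) translations;
    counts are per sentence, so a word repeated in one sentence counts once.
    """
    lojban_n, english_n, joint_n = Counter(), Counter(), Counter()
    for lojban, english in pairs:
        lw, ew = set(lojban.split()), set(english.split())
        lojban_n.update(lw)
        english_n.update(ew)
        joint_n.update(product(lw, ew))
    return {(l, e): 2 * n / (lojban_n[l] + english_n[e])
            for (l, e), n in joint_n.items()}


# A toy parallel corpus, made up for illustration only.
toy = [("mi klama", "I go"), ("do klama", "you go"), ("mi citka", "I eat")]
scores = dice_scores(toy)
print(scores[("klama", "go")])  # 1.0
print(scores[("mi", "go")])     # 0.5
```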

Alan Post

Sep 12, 2010, 1:57:55 PM
to loj...@googlegroups.com
On Sun, Sep 12, 2010 at 08:26:50AM -0700, Robin Lee Powell wrote:
> On Sat, Sep 11, 2010 at 07:22:19PM -0600, Alan Post wrote:
> > I've just purchased a handheld computer called a Ben NanoNote.[1]
> > I intend to use this machine primarily to study Lojban, and to that
> > purpose I've ported makfa[2] and intend to port jbofihe and mnemo[3].
> [snip]
> > 3: http://en.qi-hardware.com/wiki/User:Alanpost/Backlog
>
> mnemo seems to be http://mnemo.sourceforge.net/
>
> Just out of curiosity: out of all the *many* spaced-repetition-based
> flashcard programs, why *that* one, which seems to be abandonware?
>

I would love a better suggestion.

I need something that works within a Unix console, and doesn't
depend on X11. Something that works in the Linux framebuffer
is also acceptable.

This was the *only* program I found that worked from the console.
If anyone knows of other programs that work on the console or
framebuffer, and don't depend on X11, please suggest them to me!

-Alan

Alan Post

Sep 12, 2010, 2:26:58 PM
to loj...@googlegroups.com
On Sun, Sep 12, 2010 at 08:27:20AM -0700, Robin Lee Powell wrote:
> On Sat, Sep 11, 2010 at 07:22:19PM -0600, Alan Post wrote:
> > I've realized working on these programs that each one of them uses
> > a different copy of the dictionary. That strikes me,
> > architecturally, as a problem. I'd really like these programs to
> > be drawing from the same data source: the dictionary does change,
> > and even if we imagine the community dictionary never changing,
> > *I'm* not currently happy with it and want to make changes to my
> > local copy.
> >
> > I have longer term plans to add to the corpus of Lojban-using
> > programs, and very much support defining a data format that
> > supports the use cases we put it to.
>
> Yes, *exactly*. So the question is, what would such a thing look
> like?
>

I did not, last night, have anything firm to contribute to the
conversation, so I outlined my problem without having anything solid
to propose.

I don't have a firm idea yet of exactly what is being proposed, what
problem it is trying to solve, and what problem it isn't trying to
solve--I don't have enough context to really follow the e-mail
discussion and integrate ideas as I've seen them.

I've created a wiki page that I hope captures the essence of everything
that has been discussed so far:

http://lojban.org/tiki/tiki-index.php?page=Dictionary

I would be helped by formalizing all of these ideas into actionable
proposals, and I think the best way to do this is to summarize
discussion into formal, structured proposals on the wiki. The page
above is my start at doing that.

My contribution after sleeping on it is to ask: Our experience with
dictionaries all comes from languages without formal grammars. We have
a formal grammar. Is the way we're thinking about a dictionary limited
by our experience of languages without formal grammars? Can we do
something combining definitions and grammar that can't be done in
any other language?

I don't have an answer yet, I've added it to the page above and hope
other people will replace questions with proposals on that page as
we discuss them here.

-Alan

Alan Post

Sep 12, 2010, 9:04:02 PM
to loj...@googlegroups.com
On Sun, Sep 12, 2010 at 12:26:58PM -0600, Alan Post wrote:
> I've created a wiki page that I hope captures the essence of everything
> that has been discussed so far:
>
> http://lojban.org/tiki/tiki-index.php?page=Dictionary
>
> I would be helped by formalizing all of these ideas into actionable
> proposals, and I think the best way to do this is to summarize
> discussion into formal, structured proposals on the wiki. The page
> above is my start at doing that.
>

I've added a proposal to the dictionary page:

http://lojban.org/tiki/tiki-index.php?page=Dictionary

It's pretty bad, and certainly incomplete. It is superior to what
was there, which was nothing at all. I've tried to identify the use
cases we're trying to solve, and have added a section after the
proposal explaining how it solves the use case identified.

It is extremely basic. Will you help me make it better?

Robin Lee Powell

Sep 14, 2010, 3:50:23 PM
to loj...@googlegroups.com
On Sun, Sep 12, 2010 at 11:57:55AM -0600, Alan Post wrote:
> On Sun, Sep 12, 2010 at 08:26:50AM -0700, Robin Lee Powell wrote:
> > On Sat, Sep 11, 2010 at 07:22:19PM -0600, Alan Post wrote:
> > > I've just purchased a handheld computer called a Ben
> > > NanoNote.[1] I intend to use this machine primarily to study
> > > Lojban, and to that purpose I've ported makfa[2] and intend to
> > > port jbofihe and mnemo[3].
> > [snip]
> > > 3: http://en.qi-hardware.com/wiki/User:Alanpost/Backlog
> >
> > mnemo seems to be http://mnemo.sourceforge.net/
> >
> > Just out of curiosity: out of all the *many* spaced-repetition
> > based flashcard programs, why *that* one, which seems to be
> > abandonware?
> >
>
> I would love a better suggestion.
>
> I need something that works within a Unix console, and doesn't
> depend on X11. Something that works in the Linux framebuffer is
> also acceptable.

Yeah, there's not much.

http://www.phy.duke.edu/~rgb/General/general.php#flashcard , maybe

I used to use http://www.emacswiki.org/emacs/FlashCard , which is
Emacs-based and will work in a pure terminal, but you have to be
running Emacs.

There's also LogFlash, but I find using that pretty painful. There's a
version that runs on Linux at
http://www.lojban.org/tiki/tiki-index.php?page=Lojbanic+Software

I don't know what else there is.
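For anyone tempted to write their own console tool instead, the core of a spaced-repetition flashcard program is small. Below is a minimal sketch of the published SuperMemo SM-2 scheduling rule. It is an assumption that any of the programs mentioned above (mnemo, FlashCard.el, LogFlash) uses SM-2; many use their own variants.

```python
from dataclasses import dataclass


@dataclass
class Card:
    """Scheduling state for one flashcard (e.g. one gismu)."""
    reps: int = 0       # consecutive successful reviews
    ease: float = 2.5   # SM-2 easiness factor, starting value 2.5
    interval: int = 0   # days until the next review


def review(card: Card, quality: int) -> Card:
    """Return the card's new state after a review graded 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality >= 3:  # recalled: lengthen the interval
        if card.reps == 0:
            interval = 1
        elif card.reps == 1:
            interval = 6
        else:
            interval = round(card.interval * card.ease)
        reps = card.reps + 1
    else:  # forgotten: start the repetition sequence over
        reps, interval = 0, 1
    # Easier answers raise the ease factor, harder ones lower it; floor at 1.3.
    ease = card.ease + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return Card(reps=reps, ease=max(1.3, ease), interval=interval)


card = Card()
for _ in range(3):
    card = review(card, 5)
print(card.interval)  # next review is due in this many days
```

Three perfect reviews give intervals of 1, 6, and then 16 days; a failed review resets the interval to 1 day while keeping the (reduced) ease factor.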
