gismu database

vpbr...@gmail.com

Aug 11, 2017, 9:24:50 PM
to lojban
In case it is useful to others, and in the hope of gathering helpful diffs/patches if anyone makes updates,
I'd like to share my evolving database of the gismu.

lojban_gismu_dict.txt
https://app.box.com/s/tq9jcjlrwj5ah21hy0hldhd1bn25wo7m

The format looks like this.

<gismu> <cvc-rafsi>     <ccv-rafsi>     <cvv-rafsi>     #<frequency-rank>
fanva: <short-gloss>
morji: <mnemonics-cognates>
klesi: <gismu-category>
<predicate-template-with-slots>
A/fa: <type-of-A>: <A-case-role>: <gloss-for-lo-gismu>
B/fe: <type-of-B>: <B-case-role>: <gloss-for-lo-se-gismu>
C/fi: <type-of-C>: <C-case-role>: <gloss-for-lo-te-gismu>
D/fo: <type-of-D>: <D-case-role>: <gloss-for-lo-ve-gismu>
E/fu: <type-of-E>: <E-case-role>: <gloss-for-lo-xe-gismu>
mupli: <example-sentence-filling-all-slots>
lujvo: <examples-of-short-lujvo-with-these-short-rafsi>
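The A-E slot lines carry separate glosses for lo gismu, lo se gismu, lo te gismu, and so on. In standard Lojban, se, te, ve, and xe swap the first place with the second, third, fourth, and fifth place respectively, so each of those glosses is simply the x1 of a converted place structure. Below is a minimal Python sketch of that relationship; the function names and the list-of-glosses representation are hypothetical, not part of the database itself.

```python
from typing import List, Optional

# SE-series conversion cmavo and the place each one swaps with x1.
CONVERSIONS = {"se": 2, "te": 3, "ve": 4, "xe": 5}


def convert(places: List[str], cmavo: str) -> List[str]:
    """Return the places of '<cmavo> <gismu>' given the gismu's places x1..xn."""
    k = CONVERSIONS[cmavo]
    if k > len(places):
        raise ValueError(f"{cmavo} needs a predicate with at least {k} places")
    swapped = list(places)
    swapped[0], swapped[k - 1] = swapped[k - 1], swapped[0]
    return swapped


def lo_gloss(places: List[str], cmavo: Optional[str] = None) -> str:
    """Gloss for 'lo [cmavo] gismu': the x1 of the (possibly converted) predicate."""
    return (convert(places, cmavo) if cmavo else places)[0]


# Place glosses copied from the cusku entry (A..D).
cusku = ["expresser", "expressed words", "audience", "expressive medium"]
print(lo_gloss(cusku, "te"))  # audience
print(convert(cusku, "ve"))
```

Asking for xe on cusku raises a ValueError, since cusku has no fifth place.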

cusku   cus     sku             #14
fanva: express
morji: express say
klesi:
person A expresses or says text B for audience C via expressive medium D;
A/fa: PRS: ACT: expresser
B/fe: TXT: PRD: expressed words
C/fi: PRS: DST: audience
D/fo: THI: INS: expressive medium
mupli: le gunka jatna cu cusku se duhu miha ba mutce gunka kei miha lo samselmri
lujvo: cuskahi cuskuhi skudji skuspu biksku cnisku
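Because the entry format is line-oriented, it is easy to read back in. A minimal Python sketch, assuming the rafsi on the header line are whitespace-separated (so a missing column is inferred from each rafsi's letter shape), that keys and slot fields are separated by ':', and that any other line is the predicate template. parse_entry and rafsi_kind are hypothetical helpers, not part of the posted file.

```python
import re

KEYS = {"fanva", "morji", "klesi", "mupli", "lujvo"}
# A/fa: <type>: <case-role>: <gloss>
SLOT_RE = re.compile(r"^([A-E])/(f[aeiou]):\s*([^:]*):\s*([^:]*):\s*(.*)$")
VOWELS = set("aeiou")


def rafsi_kind(r: str):
    """Guess a rafsi's column (cvc, ccv or cvv) from its letter pattern."""
    if len(r) == 4 and r[2] in "'h":  # CV'V, with h standing in for the apostrophe
        return "cvv"
    if len(r) == 3:
        shape = "".join("v" if ch in VOWELS else "c" for ch in r)
        if shape in ("cvc", "ccv", "cvv"):
            return shape
    return None


def parse_entry(text: str) -> dict:
    """Parse one gismu entry into a dict of header fields, keyed lines and slots."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    head = lines[0].split()
    entry = {"gismu": head[0], "rafsi": {}, "rank": None, "slots": {}}
    for tok in head[1:]:
        if tok.startswith("#"):
            entry["rank"] = int(tok[1:])
        elif rafsi_kind(tok):
            entry["rafsi"][rafsi_kind(tok)] = tok
    for ln in lines[1:]:
        slot = SLOT_RE.match(ln)
        key, sep, rest = ln.partition(":")
        if slot:
            letter, tag, typ, case, gloss = slot.groups()
            entry["slots"][letter] = {"tag": tag, "type": typ.strip(),
                                      "case": case.strip(), "gloss": gloss.strip()}
        elif sep and key in KEYS:
            entry[key] = rest.strip()
        else:
            entry["template"] = ln  # the predicate template line
    return entry


SAMPLE = """
cusku   cus     sku             #14
fanva: express
morji: express say
klesi:
person A expresses or says text B for audience C via expressive medium D;
A/fa: PRS: ACT: expresser
B/fe: TXT: PRD: expressed words
C/fi: PRS: DST: audience
D/fo: THI: INS: expressive medium
mupli: le gunka jatna cu cusku se duhu miha ba mutce gunka kei miha lo samselmri
lujvo: cuskahi cuskuhi skudji skuspu biksku cnisku
"""

entry = parse_entry(SAMPLE)
print(entry["rafsi"], entry["rank"])  # {'cvc': 'cus', 'ccv': 'sku'} 14
print(entry["slots"]["C"]["gloss"])   # audience
```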

The short glosses are the actual English text one would put between x1/A and x2/B, usually without a leading "is" or a trailing "of".
The gismu categories (klesi) are a planned enhancement and are not filled in yet.
In many cases the definition templates are simplified from the prolix definitions usually found.
The mnemonic tags for argument types and case roles are defined in the following lists. They are quite debatable.

types-cases.txt
https://app.box.com/s/26vo9vgncz0f1xrnukc79l8ftl7bczo5

The database includes all the standard gismu and definitions, but much of the other information in it is still very incomplete.
I am in the process of adding many of la gleki's example sentences from his dictionary, which are useful for learning.
Enjoy.

mihe bremenli

suke...@gmail.com

Aug 12, 2017, 3:59:03 AM
to lojban
coi do (hello)

Thanks for sharing. As I'm developing spell-checking dictionaries, I have some questions...
  • What sources do you use for your database?
  • Do you think it would be wise to integrate your entries into my dictionaries, given that I currently use jbovlaste as my only source?
  • Is there any data in it that I could use to improve spell checking?
ki'e .i co'o (thanks, goodbye)

.i mi'e la .sykyndyr. (I'm sykyndyr)

vpbr...@gmail.com

Aug 12, 2017, 5:58:20 PM
to lojban
Sources.
The gismu, rafsi, rough gloss, and prolix definition came from jbovlaste.
The short glosses were my improvements, based on the definitions.
The mnemonics came from the etymologies in the 6 source languages, scored for relevance by software using similarity, then adjusted by hand.
The frequency rank is my own blend of several frequency sources.
The glosses for conversion sumti came from a flash-card deck (I forget which) and were polished by me.
The type and case tags are my own homebrew, but they are often taken from annotations in the jbovlaste definition itself.
The example sentences are from la gleki's dictionary, from la muplis, and from my own creativity.
The lujvo examples were selected by software from a big file of lujvo that I analyzed, but they ought to be chosen for interest.
The classification of gismu will be based on a listing I found that sorts the gismu into categories, which I am refining.
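The post says only that candidate mnemonics were scored by software for similarity before hand-editing; the actual measure isn't given. As one possible stand-in (not necessarily what was used), Python's standard difflib.SequenceMatcher ratio can rank candidate source words against a gismu. The candidate words in the usage line are made up for illustration and are not taken from the real gismu etymologies.

```python
from difflib import SequenceMatcher
from typing import List


def similarity(gismu: str, word: str) -> float:
    """SequenceMatcher ratio 2*M/T: M matched characters, T total length of both."""
    return SequenceMatcher(None, gismu, word.lower()).ratio()


def rank_mnemonics(gismu: str, candidates: List[str], keep: int = 3) -> List[str]:
    """Return up to `keep` candidate words, most similar to the gismu first."""
    return sorted(candidates, key=lambda w: similarity(gismu, w), reverse=True)[:keep]


# Illustrative candidates only (not the real etymology data).
print(rank_mnemonics("cusku", ["express", "discuss", "say"]))
# ['discuss', 'say', 'express']
```

Here "discuss" wins because it contains the run "cus"; a hand-editing pass would still be needed, since letter overlap says nothing about meaning.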

Processing this data into a dictionary might give a slightly better result than a raw dump of jbovlaste, but it would still call for proofreading.

Spell checking of lojban is more algorithmic than for most languages.
The gismu and cmavo lists are fixed, except for the classes reserved for experimental forms.
The possible lujvo are limited not by a dictionary but by the morphology algorithm.
Even the fuhivla are open-ended, though constrained by the rules.
Still, a spell checker would usefully flag impossible words as needing attention, distinguishing them from words that are merely not yet defined.
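To illustrate the distinction between impossible and not-yet-defined words, here is a minimal Python sketch restricted to gismu shape. It assumes a toy gismu list, treats h as an apostrophe substitute (as this thread does in cuskahi and mihe), and deliberately skips the consonant-cluster permissibility rules and the lujvo/fu'ivla morphology algorithm that a real checker would need. classify and its category names are hypothetical.

```python
import re
from typing import Set

CONSONANTS = "bcdfgjklmnprstvxz"
VOWELS = "aeiou"
# Characters a Lojban word may contain; h stands in for the apostrophe here.
LEGAL = set(CONSONANTS + VOWELS + "y'h.,")
# gismu are five letters shaped CVCCV or CCVCV (cluster rules not checked).
GISMU_SHAPE = re.compile(
    f"^(?:[{CONSONANTS}][{VOWELS}][{CONSONANTS}]{{2}}"
    f"|[{CONSONANTS}]{{2}}[{VOWELS}][{CONSONANTS}])[{VOWELS}]$"
)


def classify(word: str, gismu_list: Set[str]) -> str:
    """Sort a word into a rough spell-checking bucket."""
    w = word.lower()
    if any(ch not in LEGAL for ch in w):
        return "impossible"                  # letter outside the Lojban alphabet
    if w in gismu_list:
        return "known gismu"
    if GISMU_SHAPE.match(w):
        return "gismu-shaped, not in list"   # undefined or experimental gismu
    return "needs full morphology check"     # cmavo, lujvo, fu'ivla, or invalid


TOY_GISMU = {"cusku", "gunka", "jatna", "mutce"}
for w in ["cusku", "cuska", "qusku", "cuskahi"]:
    print(w, "->", classify(w, TOY_GISMU))
```

With this toy list, qusku is flagged as impossible, cuska as a well-formed but undefined gismu, and cuskahi is handed on to the lujvo rules rather than rejected.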

mihe bremenli